* [RFC PATCH 00/45] KVM: Arm SMMUv3 driver for pKVM
@ 2023-02-01 12:52 ` Jean-Philippe Brucker
  0 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:52 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

The pKVM hypervisor, recently introduced on arm64, provides a separation
of privileges between the host and hypervisor parts of KVM, where the
hypervisor is trusted by guests but the host is not [1]. The host is
initially trusted during boot, but its privileges are reduced after KVM
is initialized so that, if an adversary later gains access to the large
attack surface of the host, it cannot access guest data.

Currently with pKVM, the host can still instruct DMA-capable devices
like the GPU to access guest and hypervisor memory, which undermines
this isolation. Preventing DMA attacks requires an IOMMU, owned by the
hypervisor.

This series adds a hypervisor driver for the Arm SMMUv3 IOMMU. Since the
hypervisor part of pKVM (called nVHE here) is minimal, moving the whole
host SMMU driver into nVHE isn't really an option. It is too large and
complex and requires infrastructure from all over the kernel. We add a
reduced nVHE driver that deals with populating the SMMU tables and the
command queue, and the host driver still deals with probing and some
initialization.


Patch overview
==============

A significant portion of this series just moves and factors code to
avoid duplication. Things get interesting only around patch 15, which
adds two helpers that track pages mapped in the IOMMU and ensure those
pages are not donated to guests. Then patches 16-27 add the hypervisor
IOMMU driver, split into a generic part that can be reused by other
drivers, and code specific to SMMUv3.

Patches 34-40 introduce the host component of the pKVM SMMUv3 driver,
which initializes the configuration and forwards mapping requests to the
hypervisor. Ideally there would be a single host driver with two sets of
IOMMU ops, and while I believe more code can still be shared, the
initialization is very different and having separate driver entry points
seems clearer.

Patches 41-45 provide a rough example of power management through SCMI.
Although the host decides on power management policies, the hypervisor
must at least be aware of power changes so that it doesn't access
powered-down interfaces. We expect the platform controller to enforce
dependencies so that DMA doesn't bypass a powered-down IOMMU. But these
things are unfortunately platform dependent, and the SCMI patches are
only illustrative.

These patches in particular are best reviewed with git's --color-moved:
1,2	iommu/io-pgtable-arm: Split*
7,29-32	iommu/arm-smmu-v3: Move*

A development branch is available at
https://jpbrucker.net/git/linux pkvm/smmu


Design
======

We've explored three solutions so far. This posting implements the third
one, which is slightly more invasive in the hypervisor but the most
flexible.

1. Sharing stage-2 page tables

This is the simplest solution: share the stage-2 page tables (which
translate host physical addresses to system physical addresses) between
the CPU and the SMMU. Whatever the host can access on the CPU, it can
also access with DMA. Memory that is not accessible to the host, because
it was donated to the hypervisor or to guests, cannot be accessed by DMA
either.

pKVM normally populates the host stage-2 page tables lazily, when the
host first accesses the memory. However, this relies on CPU page faults,
and DMA generally cannot fault. The whole stage-2 must therefore be
populated at boot. That's easy to do because the HPA->PA mapping for the
host is an identity mapping.

It gets more complicated when donating some pages to guests, which
involves removing those pages from the host stage-2. To save memory and
be TLB efficient, the stage-2 uses block mappings (1GB or 2MB contiguous
ranges, rather than individual 4kB units). When donating a page from such
a range, the hypervisor must remove the block mapping and replace it with
a table that excludes the donated page. Since a device may be
simultaneously performing DMA on other pages in the range, this
replacement operation must be atomic. Otherwise DMA may reach the SMMU
during the small window where the mapping is invalid, and fatally abort.

The Arm architecture supports atomic replacement of block mappings only
since Armv8.4 (FEAT_BBM), and even then it is optional. So this solution,
while tempting, is not sufficient.

2. Pinning DMA mappings in the shared stage-2

Building on the first solution, we can let the host notify the hypervisor
about pages used for DMA. This way block mappings are broken into tables
when the host sets up DMA, and donating neighbouring pages to guests won't
cause block replacement.

This solution adds runtime overhead because calls to the DMA API are now
forwarded to the hypervisor, which needs to update the stage-2 mappings.

All in all, I believe this is a good solution if the hardware is up to the
task. But sharing page tables requires matching capabilities between the
stage-2 MMU and SMMU, and we don't expect all platforms to support the
required features, especially on mobile platforms where chip area is
costly.

3. Private I/O page tables

A flexible alternative uses private page tables in the SMMU, entirely
disconnected from the CPU page tables. With this, the SMMU can implement
a reduced set of features, or even shed a stage of translation. This also
provides a virtual I/O address space to the host, which allows more
efficient memory allocation for large buffers and for devices with
limited addressing capabilities.

This is the solution implemented in this series. The host creates
IOVA->HPA mappings with two hypercalls, map_pages() and unmap_pages(), and
the hypervisor populates the page tables. Page tables are abstracted into
IOMMU domains, which allow multiple devices to share the same address
space. Another four hypercalls, alloc_domain(), attach_dev(), detach_dev()
and free_domain(), manage the domains.
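
To make this concrete, here is a rough sketch of how the host side might
forward these requests. The helper and hypercall names below
(__pkvm_host_iommu_*) are illustrative placeholders, not necessarily the
identifiers used in the series:

#include <asm/kvm_host.h>

/*
 * Hypothetical host-side glue, for illustration only: attach a device
 * (identified by an SMMU index and stream ID) to an IOMMU domain, then
 * map a range of pages into it. Each step is a hypercall handled by the
 * nVHE hypervisor, which owns the I/O page tables.
 */
static int example_dma_setup(u32 smmu_id, u32 sid, u32 domain_id,
                             unsigned long iova, phys_addr_t paddr,
                             size_t size, int prot)
{
        int ret;

        ret = kvm_call_hyp_nvhe(__pkvm_host_iommu_alloc_domain, domain_id);
        if (ret)
                return ret;

        ret = kvm_call_hyp_nvhe(__pkvm_host_iommu_attach_dev, smmu_id, sid,
                                domain_id);
        if (ret)
                return ret;

        /* PAGE_SIZE granules; the hypervisor validates and maps them */
        return kvm_call_hyp_nvhe(__pkvm_host_iommu_map_pages, domain_id, iova,
                                 paddr, PAGE_SIZE, size / PAGE_SIZE, prot);
}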

Although the hypervisor already has pgtable.c to populate CPU page tables,
we import the io-pgtable library because it is more suited to IOMMU page
tables. It supports arbitrary page and address sizes, non-coherent page
walks, quirks and errata workarounds specific to IOMMU implementations,
and atomically switching between tables and blocks without lazy remapping.
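
For reference, this is roughly how the io-pgtable API is used today in
the host kernel (the hypervisor variant in this series differs slightly,
for example it splits a configure() step out of allocation). The
configuration values below are examples only:

#include <linux/io-pgtable.h>
#include <linux/sizes.h>

/* Allocate a stage-2 LPAE page table with illustrative parameters */
static struct io_pgtable_ops *
example_alloc_s2_pgtable(struct device *dev,
                         const struct iommu_flush_ops *tlb_ops, void *cookie)
{
        struct io_pgtable_cfg cfg = {
                .pgsize_bitmap  = SZ_4K | SZ_2M | SZ_1G,
                .ias            = 44,   /* input (IOVA) size in bits */
                .oas            = 44,   /* output (PA) size in bits */
                .coherent_walk  = true,
                .tlb            = tlb_ops,
                .iommu_dev      = dev,
        };

        return alloc_io_pgtable_ops(ARM_64_LPAE_S2, &cfg, cookie);
}

The returned ops expose map_pages()/unmap_pages()/iova_to_phys(), which
end up in the common arm_lpae_* helpers that patch 1 moves to
io-pgtable-arm-common.c, so the hypervisor can call them too.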


Performance
===========

Both solutions 2 and 3 add overhead to DMA mappings, and since the
hypervisor relies on global locks at the moment, they scale poorly.
Interestingly, solution 3 can be optimized to scale really well on the
map() path. We can remove the hypervisor IOMMU lock in map()/unmap() by
holding domain references, and then use the hyp vmemmap to track DMA state
of pages atomically, without updating the CPU stage-2 tables. Donation and
sharing would then need to inspect the vmemmap. On the unmap() path, the
single command queue for TLB invalidations still requires locking.
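
The lockless map() idea could look something like the following; this is
a sketch of the proposed optimization, not code from this series, and the
pkvm_page_* helpers (operating on a per-page counter in the hyp vmemmap)
are hypothetical:

#include <linux/atomic.h>

/* Pin a page for DMA, but only if the host still owns it (count >= 0) */
static bool pkvm_page_pin_for_dma(atomic_t *dma_refcount)
{
        return atomic_inc_unless_negative(dma_refcount);
}

static void pkvm_page_unpin_for_dma(atomic_t *dma_refcount)
{
        atomic_dec(dma_refcount);
}

/*
 * Donation succeeds only if no DMA mapping pins the page; marking the
 * counter negative makes concurrent pin attempts fail from then on.
 */
static bool pkvm_page_can_be_donated(atomic_t *dma_refcount)
{
        return atomic_cmpxchg(dma_refcount, 0, -1) == 0;
}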

To give a rough idea, these are dma_map_benchmark results on a 96-core
server (4 NUMA nodes, SMMU on node 0). I'm adding these because I found
the magnitudes interesting, but do take them with a grain of salt: my
methodology wasn't particularly thorough (although the numbers seem
repeatable). Numbers represent the average time needed for one
dma_map/dma_unmap call in μs; lower is better.

			1 thread	16 threads (node 0)	96 threads
host only		0.2/0.7		0.4/3.5			1.7/81
pkvm (this series)	0.5/2.2		28/51			291/542
pkvm (+optimizations)	0.3/1.9		0.4/38			0.8/304


[1] https://lore.kernel.org/kvmarm/20220519134204.5379-1-will@kernel.org/


David Brazdil (1):
  KVM: arm64: Introduce IOMMU driver infrastructure

Jean-Philippe Brucker (44):
  iommu/io-pgtable-arm: Split the page table driver
  iommu/io-pgtable-arm: Split initialization
  iommu/io-pgtable: Move fmt into io_pgtable_cfg
  iommu/io-pgtable: Add configure() operation
  iommu/io-pgtable: Split io_pgtable structure
  iommu/io-pgtable-arm: Extend __arm_lpae_free_pgtable() to only free
    child tables
  iommu/arm-smmu-v3: Move some definitions to arm64 include/
  KVM: arm64: pkvm: Add pkvm_udelay()
  KVM: arm64: pkvm: Add pkvm_create_hyp_device_mapping()
  KVM: arm64: pkvm: Expose pkvm_map/unmap_donated_memory()
  KVM: arm64: pkvm: Expose pkvm_admit_host_page()
  KVM: arm64: pkvm: Unify pkvm_pkvm_teardown_donated_memory()
  KVM: arm64: pkvm: Add hyp_page_ref_inc_return()
  KVM: arm64: pkvm: Prevent host donation of device memory
  KVM: arm64: pkvm: Add __pkvm_host_share/unshare_dma()
  KVM: arm64: pkvm: Add IOMMU hypercalls
  KVM: arm64: iommu: Add per-cpu page queue
  KVM: arm64: iommu: Add domains
  KVM: arm64: iommu: Add map() and unmap() operations
  KVM: arm64: iommu: Add SMMUv3 driver
  KVM: arm64: smmu-v3: Initialize registers
  KVM: arm64: smmu-v3: Setup command queue
  KVM: arm64: smmu-v3: Setup stream table
  KVM: arm64: smmu-v3: Reset the device
  KVM: arm64: smmu-v3: Support io-pgtable
  KVM: arm64: smmu-v3: Setup domains and page table configuration
  iommu/arm-smmu-v3: Extract driver-specific bits from probe function
  iommu/arm-smmu-v3: Move some functions to arm-smmu-v3-common.c
  iommu/arm-smmu-v3: Move queue and table allocation to
    arm-smmu-v3-common.c
  iommu/arm-smmu-v3: Move firmware probe to arm-smmu-v3-common
  iommu/arm-smmu-v3: Move IOMMU registration to arm-smmu-v3-common.c
  iommu/arm-smmu-v3: Use single pages for level-2 stream tables
  iommu/arm-smmu-v3: Add host driver for pKVM
  iommu/arm-smmu-v3-kvm: Pass a list of SMMU devices to the hypervisor
  iommu/arm-smmu-v3-kvm: Validate device features
  iommu/arm-smmu-v3-kvm: Allocate structures and reset device
  iommu/arm-smmu-v3-kvm: Add per-cpu page queue
  iommu/arm-smmu-v3-kvm: Initialize page table configuration
  iommu/arm-smmu-v3-kvm: Add IOMMU ops
  KVM: arm64: pkvm: Add __pkvm_host_add_remove_page()
  KVM: arm64: pkvm: Support SCMI power domain
  KVM: arm64: smmu-v3: Support power management
  iommu/arm-smmu-v3-kvm: Support power management with SCMI SMC
  iommu/arm-smmu-v3-kvm: Enable runtime PM

 drivers/iommu/Kconfig                         |   10 +
 virt/kvm/Kconfig                              |    3 +
 arch/arm64/kvm/hyp/nvhe/Makefile              |    6 +
 drivers/iommu/Makefile                        |    2 +-
 drivers/iommu/arm/arm-smmu-v3/Makefile        |    6 +
 arch/arm64/include/asm/arm-smmu-v3-regs.h     |  478 ++++++++
 arch/arm64/include/asm/kvm_asm.h              |    7 +
 arch/arm64/include/asm/kvm_host.h             |    5 +
 arch/arm64/include/asm/kvm_hyp.h              |    4 +-
 arch/arm64/kvm/hyp/include/nvhe/iommu.h       |  115 ++
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |   11 +-
 arch/arm64/kvm/hyp/include/nvhe/memory.h      |   15 +-
 arch/arm64/kvm/hyp/include/nvhe/mm.h          |    2 +
 arch/arm64/kvm/hyp/include/nvhe/pkvm.h        |   29 +
 .../arm64/kvm/hyp/include/nvhe/trap_handler.h |    2 +
 drivers/gpu/drm/panfrost/panfrost_device.h    |    2 +-
 drivers/iommu/amd/amd_iommu_types.h           |   17 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  510 +-------
 drivers/iommu/arm/arm-smmu/arm-smmu.h         |    2 +-
 drivers/iommu/io-pgtable-arm.h                |   30 -
 include/kvm/arm_smmu_v3.h                     |   61 +
 include/kvm/iommu.h                           |   74 ++
 include/kvm/power_domain.h                    |   22 +
 include/linux/io-pgtable-arm.h                |  190 +++
 include/linux/io-pgtable.h                    |  114 +-
 arch/arm64/kvm/arm.c                          |   41 +-
 arch/arm64/kvm/hyp/nvhe/hyp-main.c            |  101 +-
 arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c   |  625 ++++++++++
 .../arm64/kvm/hyp/nvhe/iommu/io-pgtable-arm.c |   97 ++
 arch/arm64/kvm/hyp/nvhe/iommu/iommu.c         |  393 ++++++
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         |  209 +++-
 arch/arm64/kvm/hyp/nvhe/mm.c                  |   27 +-
 arch/arm64/kvm/hyp/nvhe/pkvm.c                |   66 +-
 arch/arm64/kvm/hyp/nvhe/power/scmi.c          |  233 ++++
 arch/arm64/kvm/hyp/nvhe/setup.c               |   47 +-
 arch/arm64/kvm/hyp/nvhe/timer-sr.c            |   43 +
 drivers/gpu/drm/msm/msm_iommu.c               |   22 +-
 drivers/gpu/drm/panfrost/panfrost_mmu.c       |   22 +-
 drivers/iommu/amd/io_pgtable.c                |   26 +-
 drivers/iommu/amd/io_pgtable_v2.c             |   43 +-
 drivers/iommu/amd/iommu.c                     |   29 +-
 drivers/iommu/apple-dart.c                    |   38 +-
 .../arm/arm-smmu-v3/arm-smmu-v3-common.c      |  632 ++++++++++
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c   |  864 +++++++++++++
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |    2 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |  679 +----------
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c    |    7 +-
 drivers/iommu/arm/arm-smmu/arm-smmu.c         |   41 +-
 drivers/iommu/arm/arm-smmu/qcom_iommu.c       |   41 +-
 drivers/iommu/io-pgtable-arm-common.c         |  766 ++++++++++++
 drivers/iommu/io-pgtable-arm-v7s.c            |  190 +--
 drivers/iommu/io-pgtable-arm.c                | 1082 ++---------------
 drivers/iommu/io-pgtable-dart.c               |  105 +-
 drivers/iommu/io-pgtable.c                    |   57 +-
 drivers/iommu/ipmmu-vmsa.c                    |   20 +-
 drivers/iommu/msm_iommu.c                     |   18 +-
 drivers/iommu/mtk_iommu.c                     |   14 +-
 57 files changed, 5743 insertions(+), 2554 deletions(-)
 create mode 100644 arch/arm64/include/asm/arm-smmu-v3-regs.h
 create mode 100644 arch/arm64/kvm/hyp/include/nvhe/iommu.h
 delete mode 100644 drivers/iommu/io-pgtable-arm.h
 create mode 100644 include/kvm/arm_smmu_v3.h
 create mode 100644 include/kvm/iommu.h
 create mode 100644 include/kvm/power_domain.h
 create mode 100644 include/linux/io-pgtable-arm.h
 create mode 100644 arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c
 create mode 100644 arch/arm64/kvm/hyp/nvhe/iommu/io-pgtable-arm.c
 create mode 100644 arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
 create mode 100644 arch/arm64/kvm/hyp/nvhe/power/scmi.c
 create mode 100644 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-common.c
 create mode 100644 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
 create mode 100644 drivers/iommu/io-pgtable-arm-common.c

-- 
2.39.0


* [RFC PATCH 01/45] iommu/io-pgtable-arm: Split the page table driver
  2023-02-01 12:52 ` Jean-Philippe Brucker
@ 2023-02-01 12:52   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:52 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

To allow the KVM IOMMU driver to populate page tables using the
io-pgtable-arm code, move the shared bits into io-pgtable-arm-common.c.

Here we move the bulk of the common code, and a subsequent patch handles
the bits that require more care. phys_to_virt() and virt_to_phys() need
special handling here because the hypervisor will have its own versions.
It will also implement its own versions of __arm_lpae_alloc_pages(),
__arm_lpae_free_pages() and __arm_lpae_sync_pte(), since the hypervisor
needs some assistance for allocating pages.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 drivers/iommu/Makefile                        |   2 +-
 drivers/iommu/io-pgtable-arm.h                |  30 -
 include/linux/io-pgtable-arm.h                | 187 ++++++
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |   2 +-
 drivers/iommu/io-pgtable-arm-common.c         | 500 ++++++++++++++
 drivers/iommu/io-pgtable-arm.c                | 634 +-----------------
 6 files changed, 696 insertions(+), 659 deletions(-)
 delete mode 100644 drivers/iommu/io-pgtable-arm.h
 create mode 100644 include/linux/io-pgtable-arm.h
 create mode 100644 drivers/iommu/io-pgtable-arm-common.c

diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index f461d0651385..c616acf534f8 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -7,7 +7,7 @@ obj-$(CONFIG_IOMMU_DEBUGFS) += iommu-debugfs.o
 obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
-obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
+obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o io-pgtable-arm-common.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_DART) += io-pgtable-dart.o
 obj-$(CONFIG_IOASID) += ioasid.o
 obj-$(CONFIG_IOMMU_IOVA) += iova.o
diff --git a/drivers/iommu/io-pgtable-arm.h b/drivers/iommu/io-pgtable-arm.h
deleted file mode 100644
index ba7cfdf7afa0..000000000000
--- a/drivers/iommu/io-pgtable-arm.h
+++ /dev/null
@@ -1,30 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-#ifndef IO_PGTABLE_ARM_H_
-#define IO_PGTABLE_ARM_H_
-
-#define ARM_LPAE_TCR_TG0_4K		0
-#define ARM_LPAE_TCR_TG0_64K		1
-#define ARM_LPAE_TCR_TG0_16K		2
-
-#define ARM_LPAE_TCR_TG1_16K		1
-#define ARM_LPAE_TCR_TG1_4K		2
-#define ARM_LPAE_TCR_TG1_64K		3
-
-#define ARM_LPAE_TCR_SH_NS		0
-#define ARM_LPAE_TCR_SH_OS		2
-#define ARM_LPAE_TCR_SH_IS		3
-
-#define ARM_LPAE_TCR_RGN_NC		0
-#define ARM_LPAE_TCR_RGN_WBWA		1
-#define ARM_LPAE_TCR_RGN_WT		2
-#define ARM_LPAE_TCR_RGN_WB		3
-
-#define ARM_LPAE_TCR_PS_32_BIT		0x0ULL
-#define ARM_LPAE_TCR_PS_36_BIT		0x1ULL
-#define ARM_LPAE_TCR_PS_40_BIT		0x2ULL
-#define ARM_LPAE_TCR_PS_42_BIT		0x3ULL
-#define ARM_LPAE_TCR_PS_44_BIT		0x4ULL
-#define ARM_LPAE_TCR_PS_48_BIT		0x5ULL
-#define ARM_LPAE_TCR_PS_52_BIT		0x6ULL
-
-#endif /* IO_PGTABLE_ARM_H_ */
diff --git a/include/linux/io-pgtable-arm.h b/include/linux/io-pgtable-arm.h
new file mode 100644
index 000000000000..594b5030b450
--- /dev/null
+++ b/include/linux/io-pgtable-arm.h
@@ -0,0 +1,187 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef IO_PGTABLE_H_
+#define IO_PGTABLE_H_
+
+#include <linux/io-pgtable.h>
+
+extern bool selftest_running;
+
+typedef u64 arm_lpae_iopte;
+
+struct arm_lpae_io_pgtable {
+	struct io_pgtable	iop;
+
+	int			pgd_bits;
+	int			start_level;
+	int			bits_per_level;
+
+	void			*pgd;
+};
+
+/* Struct accessors */
+#define io_pgtable_to_data(x)						\
+	container_of((x), struct arm_lpae_io_pgtable, iop)
+
+#define io_pgtable_ops_to_data(x)					\
+	io_pgtable_to_data(io_pgtable_ops_to_pgtable(x))
+
+/*
+ * Calculate the right shift amount to get to the portion describing level l
+ * in a virtual address mapped by the pagetable in d.
+ */
+#define ARM_LPAE_LVL_SHIFT(l,d)						\
+	(((ARM_LPAE_MAX_LEVELS - (l)) * (d)->bits_per_level) +		\
+	ilog2(sizeof(arm_lpae_iopte)))
+
+#define ARM_LPAE_GRANULE(d)						\
+	(sizeof(arm_lpae_iopte) << (d)->bits_per_level)
+#define ARM_LPAE_PGD_SIZE(d)						\
+	(sizeof(arm_lpae_iopte) << (d)->pgd_bits)
+
+#define ARM_LPAE_PTES_PER_TABLE(d)					\
+	(ARM_LPAE_GRANULE(d) >> ilog2(sizeof(arm_lpae_iopte)))
+
+/*
+ * Calculate the index at level l used to map virtual address a using the
+ * pagetable in d.
+ */
+#define ARM_LPAE_PGD_IDX(l,d)						\
+	((l) == (d)->start_level ? (d)->pgd_bits - (d)->bits_per_level : 0)
+
+#define ARM_LPAE_LVL_IDX(a,l,d)						\
+	(((u64)(a) >> ARM_LPAE_LVL_SHIFT(l,d)) &			\
+	 ((1 << ((d)->bits_per_level + ARM_LPAE_PGD_IDX(l,d))) - 1))
+
+/* Calculate the block/page mapping size at level l for pagetable in d. */
+#define ARM_LPAE_BLOCK_SIZE(l,d)	(1ULL << ARM_LPAE_LVL_SHIFT(l,d))
+
+/* Page table bits */
+#define ARM_LPAE_PTE_TYPE_SHIFT		0
+#define ARM_LPAE_PTE_TYPE_MASK		0x3
+
+#define ARM_LPAE_PTE_TYPE_BLOCK		1
+#define ARM_LPAE_PTE_TYPE_TABLE		3
+#define ARM_LPAE_PTE_TYPE_PAGE		3
+
+#define ARM_LPAE_PTE_ADDR_MASK		GENMASK_ULL(47,12)
+
+#define ARM_LPAE_PTE_NSTABLE		(((arm_lpae_iopte)1) << 63)
+#define ARM_LPAE_PTE_XN			(((arm_lpae_iopte)3) << 53)
+#define ARM_LPAE_PTE_AF			(((arm_lpae_iopte)1) << 10)
+#define ARM_LPAE_PTE_SH_NS		(((arm_lpae_iopte)0) << 8)
+#define ARM_LPAE_PTE_SH_OS		(((arm_lpae_iopte)2) << 8)
+#define ARM_LPAE_PTE_SH_IS		(((arm_lpae_iopte)3) << 8)
+#define ARM_LPAE_PTE_NS			(((arm_lpae_iopte)1) << 5)
+#define ARM_LPAE_PTE_VALID		(((arm_lpae_iopte)1) << 0)
+
+#define ARM_LPAE_PTE_ATTR_LO_MASK	(((arm_lpae_iopte)0x3ff) << 2)
+/* Ignore the contiguous bit for block splitting */
+#define ARM_LPAE_PTE_ATTR_HI_MASK	(((arm_lpae_iopte)6) << 52)
+#define ARM_LPAE_PTE_ATTR_MASK		(ARM_LPAE_PTE_ATTR_LO_MASK |	\
+					 ARM_LPAE_PTE_ATTR_HI_MASK)
+/* Software bit for solving coherency races */
+#define ARM_LPAE_PTE_SW_SYNC		(((arm_lpae_iopte)1) << 55)
+
+/* Stage-1 PTE */
+#define ARM_LPAE_PTE_AP_UNPRIV		(((arm_lpae_iopte)1) << 6)
+#define ARM_LPAE_PTE_AP_RDONLY		(((arm_lpae_iopte)2) << 6)
+#define ARM_LPAE_PTE_ATTRINDX_SHIFT	2
+#define ARM_LPAE_PTE_nG			(((arm_lpae_iopte)1) << 11)
+
+/* Stage-2 PTE */
+#define ARM_LPAE_PTE_HAP_FAULT		(((arm_lpae_iopte)0) << 6)
+#define ARM_LPAE_PTE_HAP_READ		(((arm_lpae_iopte)1) << 6)
+#define ARM_LPAE_PTE_HAP_WRITE		(((arm_lpae_iopte)2) << 6)
+#define ARM_LPAE_PTE_MEMATTR_OIWB	(((arm_lpae_iopte)0xf) << 2)
+#define ARM_LPAE_PTE_MEMATTR_NC		(((arm_lpae_iopte)0x5) << 2)
+#define ARM_LPAE_PTE_MEMATTR_DEV	(((arm_lpae_iopte)0x1) << 2)
+
+/* Register bits */
+#define ARM_LPAE_VTCR_SL0_MASK		0x3
+
+#define ARM_LPAE_TCR_T0SZ_SHIFT		0
+
+#define ARM_LPAE_TCR_TG0_4K		0
+#define ARM_LPAE_TCR_TG0_64K		1
+#define ARM_LPAE_TCR_TG0_16K		2
+
+#define ARM_LPAE_TCR_TG1_16K		1
+#define ARM_LPAE_TCR_TG1_4K		2
+#define ARM_LPAE_TCR_TG1_64K		3
+
+#define ARM_LPAE_TCR_SH_NS		0
+#define ARM_LPAE_TCR_SH_OS		2
+#define ARM_LPAE_TCR_SH_IS		3
+
+#define ARM_LPAE_TCR_RGN_NC		0
+#define ARM_LPAE_TCR_RGN_WBWA		1
+#define ARM_LPAE_TCR_RGN_WT		2
+#define ARM_LPAE_TCR_RGN_WB		3
+
+#define ARM_LPAE_TCR_PS_32_BIT		0x0ULL
+#define ARM_LPAE_TCR_PS_36_BIT		0x1ULL
+#define ARM_LPAE_TCR_PS_40_BIT		0x2ULL
+#define ARM_LPAE_TCR_PS_42_BIT		0x3ULL
+#define ARM_LPAE_TCR_PS_44_BIT		0x4ULL
+#define ARM_LPAE_TCR_PS_48_BIT		0x5ULL
+#define ARM_LPAE_TCR_PS_52_BIT		0x6ULL
+
+#define ARM_LPAE_VTCR_PS_SHIFT		16
+#define ARM_LPAE_VTCR_PS_MASK		0x7
+
+#define ARM_LPAE_MAIR_ATTR_SHIFT(n)	((n) << 3)
+#define ARM_LPAE_MAIR_ATTR_MASK		0xff
+#define ARM_LPAE_MAIR_ATTR_DEVICE	0x04
+#define ARM_LPAE_MAIR_ATTR_NC		0x44
+#define ARM_LPAE_MAIR_ATTR_INC_OWBRWA	0xf4
+#define ARM_LPAE_MAIR_ATTR_WBRWA	0xff
+#define ARM_LPAE_MAIR_ATTR_IDX_NC	0
+#define ARM_LPAE_MAIR_ATTR_IDX_CACHE	1
+#define ARM_LPAE_MAIR_ATTR_IDX_DEV	2
+#define ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE	3
+
+#define ARM_MALI_LPAE_TTBR_ADRMODE_TABLE (3u << 0)
+#define ARM_MALI_LPAE_TTBR_READ_INNER	BIT(2)
+#define ARM_MALI_LPAE_TTBR_SHARE_OUTER	BIT(4)
+
+#define ARM_MALI_LPAE_MEMATTR_IMP_DEF	0x88ULL
+#define ARM_MALI_LPAE_MEMATTR_WRITE_ALLOC 0x8DULL
+
+#define ARM_LPAE_MAX_LEVELS		4
+
+#define iopte_type(pte)					\
+	(((pte) >> ARM_LPAE_PTE_TYPE_SHIFT) & ARM_LPAE_PTE_TYPE_MASK)
+
+#define iopte_prot(pte)	((pte) & ARM_LPAE_PTE_ATTR_MASK)
+
+static inline bool iopte_leaf(arm_lpae_iopte pte, int lvl,
+			      enum io_pgtable_fmt fmt)
+{
+	if (lvl == (ARM_LPAE_MAX_LEVELS - 1) && fmt != ARM_MALI_LPAE)
+		return iopte_type(pte) == ARM_LPAE_PTE_TYPE_PAGE;
+
+	return iopte_type(pte) == ARM_LPAE_PTE_TYPE_BLOCK;
+}
+
+#define __arm_lpae_virt_to_phys	__pa
+#define __arm_lpae_phys_to_virt	__va
+
+/* Generic functions */
+int arm_lpae_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
+		       phys_addr_t paddr, size_t pgsize, size_t pgcount,
+		       int iommu_prot, gfp_t gfp, size_t *mapped);
+size_t arm_lpae_unmap_pages(struct io_pgtable_ops *ops, unsigned long iova,
+			    size_t pgsize, size_t pgcount,
+			    struct iommu_iotlb_gather *gather);
+phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
+				  unsigned long iova);
+void __arm_lpae_free_pgtable(struct arm_lpae_io_pgtable *data, int lvl,
+			     arm_lpae_iopte *ptep);
+
+/* Host/hyp-specific functions */
+void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp, struct io_pgtable_cfg *cfg);
+void __arm_lpae_free_pages(void *pages, size_t size, struct io_pgtable_cfg *cfg);
+void __arm_lpae_sync_pte(arm_lpae_iopte *ptep, int num_entries,
+			 struct io_pgtable_cfg *cfg);
+
+#endif /* IO_PGTABLE_H_ */
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index a5a63b1c947e..df288f29a5c1 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -3,6 +3,7 @@
  * Implementation of the IOMMU SVA API for the ARM SMMUv3
  */
 
+#include <linux/io-pgtable-arm.h>
 #include <linux/mm.h>
 #include <linux/mmu_context.h>
 #include <linux/mmu_notifier.h>
@@ -11,7 +12,6 @@
 
 #include "arm-smmu-v3.h"
 #include "../../iommu-sva.h"
-#include "../../io-pgtable-arm.h"
 
 struct arm_smmu_mmu_notifier {
 	struct mmu_notifier		mn;
diff --git a/drivers/iommu/io-pgtable-arm-common.c b/drivers/iommu/io-pgtable-arm-common.c
new file mode 100644
index 000000000000..74d962712d15
--- /dev/null
+++ b/drivers/iommu/io-pgtable-arm-common.c
@@ -0,0 +1,500 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * CPU-agnostic ARM page table allocator.
+ * A copy of this library is embedded in the KVM nVHE image.
+ *
+ * Copyright (C) 2022 Arm Limited
+ *
+ * Author: Will Deacon <will.deacon@arm.com>
+ */
+
+#include <linux/io-pgtable-arm.h>
+
+#include <linux/sizes.h>
+#include <linux/types.h>
+
+#define iopte_deref(pte, d) __arm_lpae_phys_to_virt(iopte_to_paddr(pte, d))
+
+static arm_lpae_iopte paddr_to_iopte(phys_addr_t paddr,
+				     struct arm_lpae_io_pgtable *data)
+{
+	arm_lpae_iopte pte = paddr;
+
+	/* Of the bits which overlap, either 51:48 or 15:12 are always RES0 */
+	return (pte | (pte >> (48 - 12))) & ARM_LPAE_PTE_ADDR_MASK;
+}
+
+static phys_addr_t iopte_to_paddr(arm_lpae_iopte pte,
+				  struct arm_lpae_io_pgtable *data)
+{
+	u64 paddr = pte & ARM_LPAE_PTE_ADDR_MASK;
+
+	if (ARM_LPAE_GRANULE(data) < SZ_64K)
+		return paddr;
+
+	/* Rotate the packed high-order bits back to the top */
+	return (paddr | (paddr << (48 - 12))) & (ARM_LPAE_PTE_ADDR_MASK << 4);
+}
+
+static void __arm_lpae_clear_pte(arm_lpae_iopte *ptep, struct io_pgtable_cfg *cfg)
+{
+
+	*ptep = 0;
+
+	if (!cfg->coherent_walk)
+		__arm_lpae_sync_pte(ptep, 1, cfg);
+}
+
+static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
+			       struct iommu_iotlb_gather *gather,
+			       unsigned long iova, size_t size, size_t pgcount,
+			       int lvl, arm_lpae_iopte *ptep);
+
+static void __arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
+				phys_addr_t paddr, arm_lpae_iopte prot,
+				int lvl, int num_entries, arm_lpae_iopte *ptep)
+{
+	arm_lpae_iopte pte = prot;
+	struct io_pgtable_cfg *cfg = &data->iop.cfg;
+	size_t sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
+	int i;
+
+	if (data->iop.fmt != ARM_MALI_LPAE && lvl == ARM_LPAE_MAX_LEVELS - 1)
+		pte |= ARM_LPAE_PTE_TYPE_PAGE;
+	else
+		pte |= ARM_LPAE_PTE_TYPE_BLOCK;
+
+	for (i = 0; i < num_entries; i++)
+		ptep[i] = pte | paddr_to_iopte(paddr + i * sz, data);
+
+	if (!cfg->coherent_walk)
+		__arm_lpae_sync_pte(ptep, num_entries, cfg);
+}
+
+static int arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
+			     unsigned long iova, phys_addr_t paddr,
+			     arm_lpae_iopte prot, int lvl, int num_entries,
+			     arm_lpae_iopte *ptep)
+{
+	int i;
+
+	for (i = 0; i < num_entries; i++)
+		if (iopte_leaf(ptep[i], lvl, data->iop.fmt)) {
+			/* We require an unmap first */
+			WARN_ON(!selftest_running);
+			return -EEXIST;
+		} else if (iopte_type(ptep[i]) == ARM_LPAE_PTE_TYPE_TABLE) {
+			/*
+			 * We need to unmap and free the old table before
+			 * overwriting it with a block entry.
+			 */
+			arm_lpae_iopte *tblp;
+			size_t sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
+
+			tblp = ptep - ARM_LPAE_LVL_IDX(iova, lvl, data);
+			if (__arm_lpae_unmap(data, NULL, iova + i * sz, sz, 1,
+					     lvl, tblp) != sz) {
+				WARN_ON(1);
+				return -EINVAL;
+			}
+		}
+
+	__arm_lpae_init_pte(data, paddr, prot, lvl, num_entries, ptep);
+	return 0;
+}
+
+static arm_lpae_iopte arm_lpae_install_table(arm_lpae_iopte *table,
+					     arm_lpae_iopte *ptep,
+					     arm_lpae_iopte curr,
+					     struct arm_lpae_io_pgtable *data)
+{
+	arm_lpae_iopte old, new;
+	struct io_pgtable_cfg *cfg = &data->iop.cfg;
+
+	new = paddr_to_iopte(__arm_lpae_virt_to_phys(table), data) |
+		ARM_LPAE_PTE_TYPE_TABLE;
+	if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_NS)
+		new |= ARM_LPAE_PTE_NSTABLE;
+
+	/*
+	 * Ensure the table itself is visible before its PTE can be.
+	 * Whilst we could get away with cmpxchg64_release below, this
+	 * doesn't have any ordering semantics when !CONFIG_SMP.
+	 */
+	dma_wmb();
+
+	old = cmpxchg64_relaxed(ptep, curr, new);
+
+	if (cfg->coherent_walk || (old & ARM_LPAE_PTE_SW_SYNC))
+		return old;
+
+	/* Even if it's not ours, there's no point waiting; just kick it */
+	__arm_lpae_sync_pte(ptep, 1, cfg);
+	if (old == curr)
+		WRITE_ONCE(*ptep, new | ARM_LPAE_PTE_SW_SYNC);
+
+	return old;
+}
+
+int __arm_lpae_map(struct arm_lpae_io_pgtable *data, unsigned long iova,
+		   phys_addr_t paddr, size_t size, size_t pgcount,
+		   arm_lpae_iopte prot, int lvl, arm_lpae_iopte *ptep,
+		   gfp_t gfp, size_t *mapped)
+{
+	arm_lpae_iopte *cptep, pte;
+	size_t block_size = ARM_LPAE_BLOCK_SIZE(lvl, data);
+	size_t tblsz = ARM_LPAE_GRANULE(data);
+	struct io_pgtable_cfg *cfg = &data->iop.cfg;
+	int ret = 0, num_entries, max_entries, map_idx_start;
+
+	/* Find our entry at the current level */
+	map_idx_start = ARM_LPAE_LVL_IDX(iova, lvl, data);
+	ptep += map_idx_start;
+
+	/* If we can install a leaf entry at this level, then do so */
+	if (size == block_size) {
+		max_entries = ARM_LPAE_PTES_PER_TABLE(data) - map_idx_start;
+		num_entries = min_t(int, pgcount, max_entries);
+		ret = arm_lpae_init_pte(data, iova, paddr, prot, lvl, num_entries, ptep);
+		if (!ret)
+			*mapped += num_entries * size;
+
+		return ret;
+	}
+
+	/* We can't allocate tables at the final level */
+	if (WARN_ON(lvl >= ARM_LPAE_MAX_LEVELS - 1))
+		return -EINVAL;
+
+	/* Grab a pointer to the next level */
+	pte = READ_ONCE(*ptep);
+	if (!pte) {
+		cptep = __arm_lpae_alloc_pages(tblsz, gfp, cfg);
+		if (!cptep)
+			return -ENOMEM;
+
+		pte = arm_lpae_install_table(cptep, ptep, 0, data);
+		if (pte)
+			__arm_lpae_free_pages(cptep, tblsz, cfg);
+	} else if (!cfg->coherent_walk && !(pte & ARM_LPAE_PTE_SW_SYNC)) {
+		__arm_lpae_sync_pte(ptep, 1, cfg);
+	}
+
+	if (pte && !iopte_leaf(pte, lvl, data->iop.fmt)) {
+		cptep = iopte_deref(pte, data);
+	} else if (pte) {
+		/* We require an unmap first */
+		WARN_ON(!selftest_running);
+		return -EEXIST;
+	}
+
+	/* Rinse, repeat */
+	return __arm_lpae_map(data, iova, paddr, size, pgcount, prot, lvl + 1,
+			      cptep, gfp, mapped);
+}
+
+static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
+					   int prot)
+{
+	arm_lpae_iopte pte;
+
+	if (data->iop.fmt == ARM_64_LPAE_S1 ||
+	    data->iop.fmt == ARM_32_LPAE_S1) {
+		pte = ARM_LPAE_PTE_nG;
+		if (!(prot & IOMMU_WRITE) && (prot & IOMMU_READ))
+			pte |= ARM_LPAE_PTE_AP_RDONLY;
+		if (!(prot & IOMMU_PRIV))
+			pte |= ARM_LPAE_PTE_AP_UNPRIV;
+	} else {
+		pte = ARM_LPAE_PTE_HAP_FAULT;
+		if (prot & IOMMU_READ)
+			pte |= ARM_LPAE_PTE_HAP_READ;
+		if (prot & IOMMU_WRITE)
+			pte |= ARM_LPAE_PTE_HAP_WRITE;
+	}
+
+	/*
+	 * Note that this logic is structured to accommodate Mali LPAE
+	 * having stage-1-like attributes but stage-2-like permissions.
+	 */
+	if (data->iop.fmt == ARM_64_LPAE_S2 ||
+	    data->iop.fmt == ARM_32_LPAE_S2) {
+		if (prot & IOMMU_MMIO)
+			pte |= ARM_LPAE_PTE_MEMATTR_DEV;
+		else if (prot & IOMMU_CACHE)
+			pte |= ARM_LPAE_PTE_MEMATTR_OIWB;
+		else
+			pte |= ARM_LPAE_PTE_MEMATTR_NC;
+	} else {
+		if (prot & IOMMU_MMIO)
+			pte |= (ARM_LPAE_MAIR_ATTR_IDX_DEV
+				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
+		else if (prot & IOMMU_CACHE)
+			pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE
+				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
+	}
+
+	/*
+	 * Also Mali has its own notions of shareability wherein its Inner
+	 * domain covers the cores within the GPU, and its Outer domain is
+	 * "outside the GPU" (i.e. either the Inner or System domain in CPU
+	 * terms, depending on coherency).
+	 */
+	if (prot & IOMMU_CACHE && data->iop.fmt != ARM_MALI_LPAE)
+		pte |= ARM_LPAE_PTE_SH_IS;
+	else
+		pte |= ARM_LPAE_PTE_SH_OS;
+
+	if (prot & IOMMU_NOEXEC)
+		pte |= ARM_LPAE_PTE_XN;
+
+	if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_NS)
+		pte |= ARM_LPAE_PTE_NS;
+
+	if (data->iop.fmt != ARM_MALI_LPAE)
+		pte |= ARM_LPAE_PTE_AF;
+
+	return pte;
+}
+
+int arm_lpae_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
+		       phys_addr_t paddr, size_t pgsize, size_t pgcount,
+		       int iommu_prot, gfp_t gfp, size_t *mapped)
+{
+	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
+	struct io_pgtable_cfg *cfg = &data->iop.cfg;
+	arm_lpae_iopte *ptep = data->pgd;
+	int ret, lvl = data->start_level;
+	arm_lpae_iopte prot;
+	long iaext = (s64)iova >> cfg->ias;
+
+	if (WARN_ON(!pgsize || (pgsize & cfg->pgsize_bitmap) != pgsize))
+		return -EINVAL;
+
+	if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
+		iaext = ~iaext;
+	if (WARN_ON(iaext || paddr >> cfg->oas))
+		return -ERANGE;
+
+	/* If no access, then nothing to do */
+	if (!(iommu_prot & (IOMMU_READ | IOMMU_WRITE)))
+		return 0;
+
+	prot = arm_lpae_prot_to_pte(data, iommu_prot);
+	ret = __arm_lpae_map(data, iova, paddr, pgsize, pgcount, prot, lvl,
+			     ptep, gfp, mapped);
+	/*
+	 * Synchronise all PTE updates for the new mapping before there's
+	 * a chance for anything to kick off a table walk for the new iova.
+	 */
+	wmb();
+
+	return ret;
+}
+
+void __arm_lpae_free_pgtable(struct arm_lpae_io_pgtable *data, int lvl,
+			     arm_lpae_iopte *ptep)
+{
+	arm_lpae_iopte *start, *end;
+	unsigned long table_size;
+
+	if (lvl == data->start_level)
+		table_size = ARM_LPAE_PGD_SIZE(data);
+	else
+		table_size = ARM_LPAE_GRANULE(data);
+
+	start = ptep;
+
+	/* Only leaf entries at the last level */
+	if (lvl == ARM_LPAE_MAX_LEVELS - 1)
+		end = ptep;
+	else
+		end = (void *)ptep + table_size;
+
+	while (ptep != end) {
+		arm_lpae_iopte pte = *ptep++;
+
+		if (!pte || iopte_leaf(pte, lvl, data->iop.fmt))
+			continue;
+
+		__arm_lpae_free_pgtable(data, lvl + 1, iopte_deref(pte, data));
+	}
+
+	__arm_lpae_free_pages(start, table_size, &data->iop.cfg);
+}
+
+static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
+				       struct iommu_iotlb_gather *gather,
+				       unsigned long iova, size_t size,
+				       arm_lpae_iopte blk_pte, int lvl,
+				       arm_lpae_iopte *ptep, size_t pgcount)
+{
+	struct io_pgtable_cfg *cfg = &data->iop.cfg;
+	arm_lpae_iopte pte, *tablep;
+	phys_addr_t blk_paddr;
+	size_t tablesz = ARM_LPAE_GRANULE(data);
+	size_t split_sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
+	int ptes_per_table = ARM_LPAE_PTES_PER_TABLE(data);
+	int i, unmap_idx_start = -1, num_entries = 0, max_entries;
+
+	if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
+		return 0;
+
+	tablep = __arm_lpae_alloc_pages(tablesz, GFP_ATOMIC, cfg);
+	if (!tablep)
+		return 0; /* Bytes unmapped */
+
+	if (size == split_sz) {
+		unmap_idx_start = ARM_LPAE_LVL_IDX(iova, lvl, data);
+		max_entries = ptes_per_table - unmap_idx_start;
+		num_entries = min_t(int, pgcount, max_entries);
+	}
+
+	blk_paddr = iopte_to_paddr(blk_pte, data);
+	pte = iopte_prot(blk_pte);
+
+	for (i = 0; i < ptes_per_table; i++, blk_paddr += split_sz) {
+		/* Unmap! */
+		if (i >= unmap_idx_start && i < (unmap_idx_start + num_entries))
+			continue;
+
+		__arm_lpae_init_pte(data, blk_paddr, pte, lvl, 1, &tablep[i]);
+	}
+
+	pte = arm_lpae_install_table(tablep, ptep, blk_pte, data);
+	if (pte != blk_pte) {
+		__arm_lpae_free_pages(tablep, tablesz, cfg);
+		/*
+		 * We may race against someone unmapping another part of this
+		 * block, but anything else is invalid. We can't misinterpret
+		 * a page entry here since we're never at the last level.
+		 */
+		if (iopte_type(pte) != ARM_LPAE_PTE_TYPE_TABLE)
+			return 0;
+
+		tablep = iopte_deref(pte, data);
+	} else if (unmap_idx_start >= 0) {
+		for (i = 0; i < num_entries; i++)
+			io_pgtable_tlb_add_page(&data->iop, gather, iova + i * size, size);
+
+		return num_entries * size;
+	}
+
+	return __arm_lpae_unmap(data, gather, iova, size, pgcount, lvl, tablep);
+}
+
+static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
+			       struct iommu_iotlb_gather *gather,
+			       unsigned long iova, size_t size, size_t pgcount,
+			       int lvl, arm_lpae_iopte *ptep)
+{
+	arm_lpae_iopte pte;
+	struct io_pgtable *iop = &data->iop;
+	int i = 0, num_entries, max_entries, unmap_idx_start;
+
+	/* Something went horribly wrong and we ran out of page table */
+	if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
+		return 0;
+
+	unmap_idx_start = ARM_LPAE_LVL_IDX(iova, lvl, data);
+	ptep += unmap_idx_start;
+	pte = READ_ONCE(*ptep);
+	if (WARN_ON(!pte))
+		return 0;
+
+	/* If the size matches this level, we're in the right place */
+	if (size == ARM_LPAE_BLOCK_SIZE(lvl, data)) {
+		max_entries = ARM_LPAE_PTES_PER_TABLE(data) - unmap_idx_start;
+		num_entries = min_t(int, pgcount, max_entries);
+
+		while (i < num_entries) {
+			pte = READ_ONCE(*ptep);
+			if (WARN_ON(!pte))
+				break;
+
+			__arm_lpae_clear_pte(ptep, &iop->cfg);
+
+			if (!iopte_leaf(pte, lvl, iop->fmt)) {
+				/* Also flush any partial walks */
+				io_pgtable_tlb_flush_walk(iop, iova + i * size, size,
+							  ARM_LPAE_GRANULE(data));
+				__arm_lpae_free_pgtable(data, lvl + 1, iopte_deref(pte, data));
+			} else if (!iommu_iotlb_gather_queued(gather)) {
+				io_pgtable_tlb_add_page(iop, gather, iova + i * size, size);
+			}
+
+			ptep++;
+			i++;
+		}
+
+		return i * size;
+	} else if (iopte_leaf(pte, lvl, iop->fmt)) {
+		/*
+		 * Insert a table at the next level to map the old region,
+		 * minus the part we want to unmap
+		 */
+		return arm_lpae_split_blk_unmap(data, gather, iova, size, pte,
+						lvl + 1, ptep, pgcount);
+	}
+
+	/* Keep on walkin' */
+	ptep = iopte_deref(pte, data);
+	return __arm_lpae_unmap(data, gather, iova, size, pgcount, lvl + 1, ptep);
+}
+
+size_t arm_lpae_unmap_pages(struct io_pgtable_ops *ops, unsigned long iova,
+			    size_t pgsize, size_t pgcount,
+			    struct iommu_iotlb_gather *gather)
+{
+	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
+	struct io_pgtable_cfg *cfg = &data->iop.cfg;
+	arm_lpae_iopte *ptep = data->pgd;
+	long iaext = (s64)iova >> cfg->ias;
+
+	if (WARN_ON(!pgsize || (pgsize & cfg->pgsize_bitmap) != pgsize || !pgcount))
+		return 0;
+
+	if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
+		iaext = ~iaext;
+	if (WARN_ON(iaext))
+		return 0;
+
+	return __arm_lpae_unmap(data, gather, iova, pgsize, pgcount,
+				data->start_level, ptep);
+}
+
+phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
+				  unsigned long iova)
+{
+	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
+	arm_lpae_iopte pte, *ptep = data->pgd;
+	int lvl = data->start_level;
+
+	do {
+		/* Valid IOPTE pointer? */
+		if (!ptep)
+			return 0;
+
+		/* Grab the IOPTE we're interested in */
+		ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
+		pte = READ_ONCE(*ptep);
+
+		/* Valid entry? */
+		if (!pte)
+			return 0;
+
+		/* Leaf entry? */
+		if (iopte_leaf(pte, lvl, data->iop.fmt))
+			goto found_translation;
+
+		/* Take it to the next level */
+		ptep = iopte_deref(pte, data);
+	} while (++lvl < ARM_LPAE_MAX_LEVELS);
+
+	/* Ran out of page tables to walk */
+	return 0;
+
+found_translation:
+	iova &= (ARM_LPAE_BLOCK_SIZE(lvl, data) - 1);
+	return iopte_to_paddr(pte, data) | iova;
+}
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 72dcdd468cf3..db42aed6ad7b 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0-only
 /*
  * CPU-agnostic ARM page table allocator.
+ * Host-specific functions. The rest is in io-pgtable-arm-common.c.
  *
  * Copyright (C) 2014 ARM Limited
  *
@@ -11,7 +12,7 @@
 
 #include <linux/atomic.h>
 #include <linux/bitops.h>
-#include <linux/io-pgtable.h>
+#include <linux/io-pgtable-arm.h>
 #include <linux/kernel.h>
 #include <linux/sizes.h>
 #include <linux/slab.h>
@@ -20,175 +21,17 @@
 
 #include <asm/barrier.h>
 
-#include "io-pgtable-arm.h"
-
 #define ARM_LPAE_MAX_ADDR_BITS		52
 #define ARM_LPAE_S2_MAX_CONCAT_PAGES	16
-#define ARM_LPAE_MAX_LEVELS		4
-
-/* Struct accessors */
-#define io_pgtable_to_data(x)						\
-	container_of((x), struct arm_lpae_io_pgtable, iop)
-
-#define io_pgtable_ops_to_data(x)					\
-	io_pgtable_to_data(io_pgtable_ops_to_pgtable(x))
-
-/*
- * Calculate the right shift amount to get to the portion describing level l
- * in a virtual address mapped by the pagetable in d.
- */
-#define ARM_LPAE_LVL_SHIFT(l,d)						\
-	(((ARM_LPAE_MAX_LEVELS - (l)) * (d)->bits_per_level) +		\
-	ilog2(sizeof(arm_lpae_iopte)))
 
-#define ARM_LPAE_GRANULE(d)						\
-	(sizeof(arm_lpae_iopte) << (d)->bits_per_level)
-#define ARM_LPAE_PGD_SIZE(d)						\
-	(sizeof(arm_lpae_iopte) << (d)->pgd_bits)
-
-#define ARM_LPAE_PTES_PER_TABLE(d)					\
-	(ARM_LPAE_GRANULE(d) >> ilog2(sizeof(arm_lpae_iopte)))
-
-/*
- * Calculate the index at level l used to map virtual address a using the
- * pagetable in d.
- */
-#define ARM_LPAE_PGD_IDX(l,d)						\
-	((l) == (d)->start_level ? (d)->pgd_bits - (d)->bits_per_level : 0)
-
-#define ARM_LPAE_LVL_IDX(a,l,d)						\
-	(((u64)(a) >> ARM_LPAE_LVL_SHIFT(l,d)) &			\
-	 ((1 << ((d)->bits_per_level + ARM_LPAE_PGD_IDX(l,d))) - 1))
-
-/* Calculate the block/page mapping size at level l for pagetable in d. */
-#define ARM_LPAE_BLOCK_SIZE(l,d)	(1ULL << ARM_LPAE_LVL_SHIFT(l,d))
-
-/* Page table bits */
-#define ARM_LPAE_PTE_TYPE_SHIFT		0
-#define ARM_LPAE_PTE_TYPE_MASK		0x3
-
-#define ARM_LPAE_PTE_TYPE_BLOCK		1
-#define ARM_LPAE_PTE_TYPE_TABLE		3
-#define ARM_LPAE_PTE_TYPE_PAGE		3
-
-#define ARM_LPAE_PTE_ADDR_MASK		GENMASK_ULL(47,12)
-
-#define ARM_LPAE_PTE_NSTABLE		(((arm_lpae_iopte)1) << 63)
-#define ARM_LPAE_PTE_XN			(((arm_lpae_iopte)3) << 53)
-#define ARM_LPAE_PTE_AF			(((arm_lpae_iopte)1) << 10)
-#define ARM_LPAE_PTE_SH_NS		(((arm_lpae_iopte)0) << 8)
-#define ARM_LPAE_PTE_SH_OS		(((arm_lpae_iopte)2) << 8)
-#define ARM_LPAE_PTE_SH_IS		(((arm_lpae_iopte)3) << 8)
-#define ARM_LPAE_PTE_NS			(((arm_lpae_iopte)1) << 5)
-#define ARM_LPAE_PTE_VALID		(((arm_lpae_iopte)1) << 0)
-
-#define ARM_LPAE_PTE_ATTR_LO_MASK	(((arm_lpae_iopte)0x3ff) << 2)
-/* Ignore the contiguous bit for block splitting */
-#define ARM_LPAE_PTE_ATTR_HI_MASK	(((arm_lpae_iopte)6) << 52)
-#define ARM_LPAE_PTE_ATTR_MASK		(ARM_LPAE_PTE_ATTR_LO_MASK |	\
-					 ARM_LPAE_PTE_ATTR_HI_MASK)
-/* Software bit for solving coherency races */
-#define ARM_LPAE_PTE_SW_SYNC		(((arm_lpae_iopte)1) << 55)
-
-/* Stage-1 PTE */
-#define ARM_LPAE_PTE_AP_UNPRIV		(((arm_lpae_iopte)1) << 6)
-#define ARM_LPAE_PTE_AP_RDONLY		(((arm_lpae_iopte)2) << 6)
-#define ARM_LPAE_PTE_ATTRINDX_SHIFT	2
-#define ARM_LPAE_PTE_nG			(((arm_lpae_iopte)1) << 11)
-
-/* Stage-2 PTE */
-#define ARM_LPAE_PTE_HAP_FAULT		(((arm_lpae_iopte)0) << 6)
-#define ARM_LPAE_PTE_HAP_READ		(((arm_lpae_iopte)1) << 6)
-#define ARM_LPAE_PTE_HAP_WRITE		(((arm_lpae_iopte)2) << 6)
-#define ARM_LPAE_PTE_MEMATTR_OIWB	(((arm_lpae_iopte)0xf) << 2)
-#define ARM_LPAE_PTE_MEMATTR_NC		(((arm_lpae_iopte)0x5) << 2)
-#define ARM_LPAE_PTE_MEMATTR_DEV	(((arm_lpae_iopte)0x1) << 2)
-
-/* Register bits */
-#define ARM_LPAE_VTCR_SL0_MASK		0x3
-
-#define ARM_LPAE_TCR_T0SZ_SHIFT		0
-
-#define ARM_LPAE_VTCR_PS_SHIFT		16
-#define ARM_LPAE_VTCR_PS_MASK		0x7
-
-#define ARM_LPAE_MAIR_ATTR_SHIFT(n)	((n) << 3)
-#define ARM_LPAE_MAIR_ATTR_MASK		0xff
-#define ARM_LPAE_MAIR_ATTR_DEVICE	0x04
-#define ARM_LPAE_MAIR_ATTR_NC		0x44
-#define ARM_LPAE_MAIR_ATTR_INC_OWBRWA	0xf4
-#define ARM_LPAE_MAIR_ATTR_WBRWA	0xff
-#define ARM_LPAE_MAIR_ATTR_IDX_NC	0
-#define ARM_LPAE_MAIR_ATTR_IDX_CACHE	1
-#define ARM_LPAE_MAIR_ATTR_IDX_DEV	2
-#define ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE	3
-
-#define ARM_MALI_LPAE_TTBR_ADRMODE_TABLE (3u << 0)
-#define ARM_MALI_LPAE_TTBR_READ_INNER	BIT(2)
-#define ARM_MALI_LPAE_TTBR_SHARE_OUTER	BIT(4)
-
-#define ARM_MALI_LPAE_MEMATTR_IMP_DEF	0x88ULL
-#define ARM_MALI_LPAE_MEMATTR_WRITE_ALLOC 0x8DULL
-
-/* IOPTE accessors */
-#define iopte_deref(pte,d) __va(iopte_to_paddr(pte, d))
-
-#define iopte_type(pte)					\
-	(((pte) >> ARM_LPAE_PTE_TYPE_SHIFT) & ARM_LPAE_PTE_TYPE_MASK)
-
-#define iopte_prot(pte)	((pte) & ARM_LPAE_PTE_ATTR_MASK)
-
-struct arm_lpae_io_pgtable {
-	struct io_pgtable	iop;
-
-	int			pgd_bits;
-	int			start_level;
-	int			bits_per_level;
-
-	void			*pgd;
-};
-
-typedef u64 arm_lpae_iopte;
-
-static inline bool iopte_leaf(arm_lpae_iopte pte, int lvl,
-			      enum io_pgtable_fmt fmt)
-{
-	if (lvl == (ARM_LPAE_MAX_LEVELS - 1) && fmt != ARM_MALI_LPAE)
-		return iopte_type(pte) == ARM_LPAE_PTE_TYPE_PAGE;
-
-	return iopte_type(pte) == ARM_LPAE_PTE_TYPE_BLOCK;
-}
-
-static arm_lpae_iopte paddr_to_iopte(phys_addr_t paddr,
-				     struct arm_lpae_io_pgtable *data)
-{
-	arm_lpae_iopte pte = paddr;
-
-	/* Of the bits which overlap, either 51:48 or 15:12 are always RES0 */
-	return (pte | (pte >> (48 - 12))) & ARM_LPAE_PTE_ADDR_MASK;
-}
-
-static phys_addr_t iopte_to_paddr(arm_lpae_iopte pte,
-				  struct arm_lpae_io_pgtable *data)
-{
-	u64 paddr = pte & ARM_LPAE_PTE_ADDR_MASK;
-
-	if (ARM_LPAE_GRANULE(data) < SZ_64K)
-		return paddr;
-
-	/* Rotate the packed high-order bits back to the top */
-	return (paddr | (paddr << (48 - 12))) & (ARM_LPAE_PTE_ADDR_MASK << 4);
-}
-
-static bool selftest_running = false;
+bool selftest_running = false;
 
 static dma_addr_t __arm_lpae_dma_addr(void *pages)
 {
 	return (dma_addr_t)virt_to_phys(pages);
 }
 
-static void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp,
-				    struct io_pgtable_cfg *cfg)
+void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp, struct io_pgtable_cfg *cfg)
 {
 	struct device *dev = cfg->iommu_dev;
 	int order = get_order(size);
@@ -225,8 +68,7 @@ static void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp,
 	return NULL;
 }
 
-static void __arm_lpae_free_pages(void *pages, size_t size,
-				  struct io_pgtable_cfg *cfg)
+void __arm_lpae_free_pages(void *pages, size_t size, struct io_pgtable_cfg *cfg)
 {
 	if (!cfg->coherent_walk)
 		dma_unmap_single(cfg->iommu_dev, __arm_lpae_dma_addr(pages),
@@ -234,299 +76,13 @@ static void __arm_lpae_free_pages(void *pages, size_t size,
 	free_pages((unsigned long)pages, get_order(size));
 }
 
-static void __arm_lpae_sync_pte(arm_lpae_iopte *ptep, int num_entries,
-				struct io_pgtable_cfg *cfg)
+void __arm_lpae_sync_pte(arm_lpae_iopte *ptep, int num_entries,
+			 struct io_pgtable_cfg *cfg)
 {
 	dma_sync_single_for_device(cfg->iommu_dev, __arm_lpae_dma_addr(ptep),
 				   sizeof(*ptep) * num_entries, DMA_TO_DEVICE);
 }
 
-static void __arm_lpae_clear_pte(arm_lpae_iopte *ptep, struct io_pgtable_cfg *cfg)
-{
-
-	*ptep = 0;
-
-	if (!cfg->coherent_walk)
-		__arm_lpae_sync_pte(ptep, 1, cfg);
-}
-
-static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
-			       struct iommu_iotlb_gather *gather,
-			       unsigned long iova, size_t size, size_t pgcount,
-			       int lvl, arm_lpae_iopte *ptep);
-
-static void __arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
-				phys_addr_t paddr, arm_lpae_iopte prot,
-				int lvl, int num_entries, arm_lpae_iopte *ptep)
-{
-	arm_lpae_iopte pte = prot;
-	struct io_pgtable_cfg *cfg = &data->iop.cfg;
-	size_t sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
-	int i;
-
-	if (data->iop.fmt != ARM_MALI_LPAE && lvl == ARM_LPAE_MAX_LEVELS - 1)
-		pte |= ARM_LPAE_PTE_TYPE_PAGE;
-	else
-		pte |= ARM_LPAE_PTE_TYPE_BLOCK;
-
-	for (i = 0; i < num_entries; i++)
-		ptep[i] = pte | paddr_to_iopte(paddr + i * sz, data);
-
-	if (!cfg->coherent_walk)
-		__arm_lpae_sync_pte(ptep, num_entries, cfg);
-}
-
-static int arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
-			     unsigned long iova, phys_addr_t paddr,
-			     arm_lpae_iopte prot, int lvl, int num_entries,
-			     arm_lpae_iopte *ptep)
-{
-	int i;
-
-	for (i = 0; i < num_entries; i++)
-		if (iopte_leaf(ptep[i], lvl, data->iop.fmt)) {
-			/* We require an unmap first */
-			WARN_ON(!selftest_running);
-			return -EEXIST;
-		} else if (iopte_type(ptep[i]) == ARM_LPAE_PTE_TYPE_TABLE) {
-			/*
-			 * We need to unmap and free the old table before
-			 * overwriting it with a block entry.
-			 */
-			arm_lpae_iopte *tblp;
-			size_t sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
-
-			tblp = ptep - ARM_LPAE_LVL_IDX(iova, lvl, data);
-			if (__arm_lpae_unmap(data, NULL, iova + i * sz, sz, 1,
-					     lvl, tblp) != sz) {
-				WARN_ON(1);
-				return -EINVAL;
-			}
-		}
-
-	__arm_lpae_init_pte(data, paddr, prot, lvl, num_entries, ptep);
-	return 0;
-}
-
-static arm_lpae_iopte arm_lpae_install_table(arm_lpae_iopte *table,
-					     arm_lpae_iopte *ptep,
-					     arm_lpae_iopte curr,
-					     struct arm_lpae_io_pgtable *data)
-{
-	arm_lpae_iopte old, new;
-	struct io_pgtable_cfg *cfg = &data->iop.cfg;
-
-	new = paddr_to_iopte(__pa(table), data) | ARM_LPAE_PTE_TYPE_TABLE;
-	if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_NS)
-		new |= ARM_LPAE_PTE_NSTABLE;
-
-	/*
-	 * Ensure the table itself is visible before its PTE can be.
-	 * Whilst we could get away with cmpxchg64_release below, this
-	 * doesn't have any ordering semantics when !CONFIG_SMP.
-	 */
-	dma_wmb();
-
-	old = cmpxchg64_relaxed(ptep, curr, new);
-
-	if (cfg->coherent_walk || (old & ARM_LPAE_PTE_SW_SYNC))
-		return old;
-
-	/* Even if it's not ours, there's no point waiting; just kick it */
-	__arm_lpae_sync_pte(ptep, 1, cfg);
-	if (old == curr)
-		WRITE_ONCE(*ptep, new | ARM_LPAE_PTE_SW_SYNC);
-
-	return old;
-}
-
-static int __arm_lpae_map(struct arm_lpae_io_pgtable *data, unsigned long iova,
-			  phys_addr_t paddr, size_t size, size_t pgcount,
-			  arm_lpae_iopte prot, int lvl, arm_lpae_iopte *ptep,
-			  gfp_t gfp, size_t *mapped)
-{
-	arm_lpae_iopte *cptep, pte;
-	size_t block_size = ARM_LPAE_BLOCK_SIZE(lvl, data);
-	size_t tblsz = ARM_LPAE_GRANULE(data);
-	struct io_pgtable_cfg *cfg = &data->iop.cfg;
-	int ret = 0, num_entries, max_entries, map_idx_start;
-
-	/* Find our entry at the current level */
-	map_idx_start = ARM_LPAE_LVL_IDX(iova, lvl, data);
-	ptep += map_idx_start;
-
-	/* If we can install a leaf entry at this level, then do so */
-	if (size == block_size) {
-		max_entries = ARM_LPAE_PTES_PER_TABLE(data) - map_idx_start;
-		num_entries = min_t(int, pgcount, max_entries);
-		ret = arm_lpae_init_pte(data, iova, paddr, prot, lvl, num_entries, ptep);
-		if (!ret)
-			*mapped += num_entries * size;
-
-		return ret;
-	}
-
-	/* We can't allocate tables at the final level */
-	if (WARN_ON(lvl >= ARM_LPAE_MAX_LEVELS - 1))
-		return -EINVAL;
-
-	/* Grab a pointer to the next level */
-	pte = READ_ONCE(*ptep);
-	if (!pte) {
-		cptep = __arm_lpae_alloc_pages(tblsz, gfp, cfg);
-		if (!cptep)
-			return -ENOMEM;
-
-		pte = arm_lpae_install_table(cptep, ptep, 0, data);
-		if (pte)
-			__arm_lpae_free_pages(cptep, tblsz, cfg);
-	} else if (!cfg->coherent_walk && !(pte & ARM_LPAE_PTE_SW_SYNC)) {
-		__arm_lpae_sync_pte(ptep, 1, cfg);
-	}
-
-	if (pte && !iopte_leaf(pte, lvl, data->iop.fmt)) {
-		cptep = iopte_deref(pte, data);
-	} else if (pte) {
-		/* We require an unmap first */
-		WARN_ON(!selftest_running);
-		return -EEXIST;
-	}
-
-	/* Rinse, repeat */
-	return __arm_lpae_map(data, iova, paddr, size, pgcount, prot, lvl + 1,
-			      cptep, gfp, mapped);
-}
-
-static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
-					   int prot)
-{
-	arm_lpae_iopte pte;
-
-	if (data->iop.fmt == ARM_64_LPAE_S1 ||
-	    data->iop.fmt == ARM_32_LPAE_S1) {
-		pte = ARM_LPAE_PTE_nG;
-		if (!(prot & IOMMU_WRITE) && (prot & IOMMU_READ))
-			pte |= ARM_LPAE_PTE_AP_RDONLY;
-		if (!(prot & IOMMU_PRIV))
-			pte |= ARM_LPAE_PTE_AP_UNPRIV;
-	} else {
-		pte = ARM_LPAE_PTE_HAP_FAULT;
-		if (prot & IOMMU_READ)
-			pte |= ARM_LPAE_PTE_HAP_READ;
-		if (prot & IOMMU_WRITE)
-			pte |= ARM_LPAE_PTE_HAP_WRITE;
-	}
-
-	/*
-	 * Note that this logic is structured to accommodate Mali LPAE
-	 * having stage-1-like attributes but stage-2-like permissions.
-	 */
-	if (data->iop.fmt == ARM_64_LPAE_S2 ||
-	    data->iop.fmt == ARM_32_LPAE_S2) {
-		if (prot & IOMMU_MMIO)
-			pte |= ARM_LPAE_PTE_MEMATTR_DEV;
-		else if (prot & IOMMU_CACHE)
-			pte |= ARM_LPAE_PTE_MEMATTR_OIWB;
-		else
-			pte |= ARM_LPAE_PTE_MEMATTR_NC;
-	} else {
-		if (prot & IOMMU_MMIO)
-			pte |= (ARM_LPAE_MAIR_ATTR_IDX_DEV
-				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
-		else if (prot & IOMMU_CACHE)
-			pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE
-				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
-	}
-
-	/*
-	 * Also Mali has its own notions of shareability wherein its Inner
-	 * domain covers the cores within the GPU, and its Outer domain is
-	 * "outside the GPU" (i.e. either the Inner or System domain in CPU
-	 * terms, depending on coherency).
-	 */
-	if (prot & IOMMU_CACHE && data->iop.fmt != ARM_MALI_LPAE)
-		pte |= ARM_LPAE_PTE_SH_IS;
-	else
-		pte |= ARM_LPAE_PTE_SH_OS;
-
-	if (prot & IOMMU_NOEXEC)
-		pte |= ARM_LPAE_PTE_XN;
-
-	if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_NS)
-		pte |= ARM_LPAE_PTE_NS;
-
-	if (data->iop.fmt != ARM_MALI_LPAE)
-		pte |= ARM_LPAE_PTE_AF;
-
-	return pte;
-}
-
-static int arm_lpae_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
-			      phys_addr_t paddr, size_t pgsize, size_t pgcount,
-			      int iommu_prot, gfp_t gfp, size_t *mapped)
-{
-	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
-	struct io_pgtable_cfg *cfg = &data->iop.cfg;
-	arm_lpae_iopte *ptep = data->pgd;
-	int ret, lvl = data->start_level;
-	arm_lpae_iopte prot;
-	long iaext = (s64)iova >> cfg->ias;
-
-	if (WARN_ON(!pgsize || (pgsize & cfg->pgsize_bitmap) != pgsize))
-		return -EINVAL;
-
-	if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
-		iaext = ~iaext;
-	if (WARN_ON(iaext || paddr >> cfg->oas))
-		return -ERANGE;
-
-	/* If no access, then nothing to do */
-	if (!(iommu_prot & (IOMMU_READ | IOMMU_WRITE)))
-		return 0;
-
-	prot = arm_lpae_prot_to_pte(data, iommu_prot);
-	ret = __arm_lpae_map(data, iova, paddr, pgsize, pgcount, prot, lvl,
-			     ptep, gfp, mapped);
-	/*
-	 * Synchronise all PTE updates for the new mapping before there's
-	 * a chance for anything to kick off a table walk for the new iova.
-	 */
-	wmb();
-
-	return ret;
-}
-
-static void __arm_lpae_free_pgtable(struct arm_lpae_io_pgtable *data, int lvl,
-				    arm_lpae_iopte *ptep)
-{
-	arm_lpae_iopte *start, *end;
-	unsigned long table_size;
-
-	if (lvl == data->start_level)
-		table_size = ARM_LPAE_PGD_SIZE(data);
-	else
-		table_size = ARM_LPAE_GRANULE(data);
-
-	start = ptep;
-
-	/* Only leaf entries at the last level */
-	if (lvl == ARM_LPAE_MAX_LEVELS - 1)
-		end = ptep;
-	else
-		end = (void *)ptep + table_size;
-
-	while (ptep != end) {
-		arm_lpae_iopte pte = *ptep++;
-
-		if (!pte || iopte_leaf(pte, lvl, data->iop.fmt))
-			continue;
-
-		__arm_lpae_free_pgtable(data, lvl + 1, iopte_deref(pte, data));
-	}
-
-	__arm_lpae_free_pages(start, table_size, &data->iop.cfg);
-}
-
 static void arm_lpae_free_pgtable(struct io_pgtable *iop)
 {
 	struct arm_lpae_io_pgtable *data = io_pgtable_to_data(iop);
@@ -535,182 +91,6 @@ static void arm_lpae_free_pgtable(struct io_pgtable *iop)
 	kfree(data);
 }
 
-static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
-				       struct iommu_iotlb_gather *gather,
-				       unsigned long iova, size_t size,
-				       arm_lpae_iopte blk_pte, int lvl,
-				       arm_lpae_iopte *ptep, size_t pgcount)
-{
-	struct io_pgtable_cfg *cfg = &data->iop.cfg;
-	arm_lpae_iopte pte, *tablep;
-	phys_addr_t blk_paddr;
-	size_t tablesz = ARM_LPAE_GRANULE(data);
-	size_t split_sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
-	int ptes_per_table = ARM_LPAE_PTES_PER_TABLE(data);
-	int i, unmap_idx_start = -1, num_entries = 0, max_entries;
-
-	if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
-		return 0;
-
-	tablep = __arm_lpae_alloc_pages(tablesz, GFP_ATOMIC, cfg);
-	if (!tablep)
-		return 0; /* Bytes unmapped */
-
-	if (size == split_sz) {
-		unmap_idx_start = ARM_LPAE_LVL_IDX(iova, lvl, data);
-		max_entries = ptes_per_table - unmap_idx_start;
-		num_entries = min_t(int, pgcount, max_entries);
-	}
-
-	blk_paddr = iopte_to_paddr(blk_pte, data);
-	pte = iopte_prot(blk_pte);
-
-	for (i = 0; i < ptes_per_table; i++, blk_paddr += split_sz) {
-		/* Unmap! */
-		if (i >= unmap_idx_start && i < (unmap_idx_start + num_entries))
-			continue;
-
-		__arm_lpae_init_pte(data, blk_paddr, pte, lvl, 1, &tablep[i]);
-	}
-
-	pte = arm_lpae_install_table(tablep, ptep, blk_pte, data);
-	if (pte != blk_pte) {
-		__arm_lpae_free_pages(tablep, tablesz, cfg);
-		/*
-		 * We may race against someone unmapping another part of this
-		 * block, but anything else is invalid. We can't misinterpret
-		 * a page entry here since we're never at the last level.
-		 */
-		if (iopte_type(pte) != ARM_LPAE_PTE_TYPE_TABLE)
-			return 0;
-
-		tablep = iopte_deref(pte, data);
-	} else if (unmap_idx_start >= 0) {
-		for (i = 0; i < num_entries; i++)
-			io_pgtable_tlb_add_page(&data->iop, gather, iova + i * size, size);
-
-		return num_entries * size;
-	}
-
-	return __arm_lpae_unmap(data, gather, iova, size, pgcount, lvl, tablep);
-}
-
-static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
-			       struct iommu_iotlb_gather *gather,
-			       unsigned long iova, size_t size, size_t pgcount,
-			       int lvl, arm_lpae_iopte *ptep)
-{
-	arm_lpae_iopte pte;
-	struct io_pgtable *iop = &data->iop;
-	int i = 0, num_entries, max_entries, unmap_idx_start;
-
-	/* Something went horribly wrong and we ran out of page table */
-	if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
-		return 0;
-
-	unmap_idx_start = ARM_LPAE_LVL_IDX(iova, lvl, data);
-	ptep += unmap_idx_start;
-	pte = READ_ONCE(*ptep);
-	if (WARN_ON(!pte))
-		return 0;
-
-	/* If the size matches this level, we're in the right place */
-	if (size == ARM_LPAE_BLOCK_SIZE(lvl, data)) {
-		max_entries = ARM_LPAE_PTES_PER_TABLE(data) - unmap_idx_start;
-		num_entries = min_t(int, pgcount, max_entries);
-
-		while (i < num_entries) {
-			pte = READ_ONCE(*ptep);
-			if (WARN_ON(!pte))
-				break;
-
-			__arm_lpae_clear_pte(ptep, &iop->cfg);
-
-			if (!iopte_leaf(pte, lvl, iop->fmt)) {
-				/* Also flush any partial walks */
-				io_pgtable_tlb_flush_walk(iop, iova + i * size, size,
-							  ARM_LPAE_GRANULE(data));
-				__arm_lpae_free_pgtable(data, lvl + 1, iopte_deref(pte, data));
-			} else if (!iommu_iotlb_gather_queued(gather)) {
-				io_pgtable_tlb_add_page(iop, gather, iova + i * size, size);
-			}
-
-			ptep++;
-			i++;
-		}
-
-		return i * size;
-	} else if (iopte_leaf(pte, lvl, iop->fmt)) {
-		/*
-		 * Insert a table at the next level to map the old region,
-		 * minus the part we want to unmap
-		 */
-		return arm_lpae_split_blk_unmap(data, gather, iova, size, pte,
-						lvl + 1, ptep, pgcount);
-	}
-
-	/* Keep on walkin' */
-	ptep = iopte_deref(pte, data);
-	return __arm_lpae_unmap(data, gather, iova, size, pgcount, lvl + 1, ptep);
-}
-
-static size_t arm_lpae_unmap_pages(struct io_pgtable_ops *ops, unsigned long iova,
-				   size_t pgsize, size_t pgcount,
-				   struct iommu_iotlb_gather *gather)
-{
-	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
-	struct io_pgtable_cfg *cfg = &data->iop.cfg;
-	arm_lpae_iopte *ptep = data->pgd;
-	long iaext = (s64)iova >> cfg->ias;
-
-	if (WARN_ON(!pgsize || (pgsize & cfg->pgsize_bitmap) != pgsize || !pgcount))
-		return 0;
-
-	if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
-		iaext = ~iaext;
-	if (WARN_ON(iaext))
-		return 0;
-
-	return __arm_lpae_unmap(data, gather, iova, pgsize, pgcount,
-				data->start_level, ptep);
-}
-
-static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
-					 unsigned long iova)
-{
-	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
-	arm_lpae_iopte pte, *ptep = data->pgd;
-	int lvl = data->start_level;
-
-	do {
-		/* Valid IOPTE pointer? */
-		if (!ptep)
-			return 0;
-
-		/* Grab the IOPTE we're interested in */
-		ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
-		pte = READ_ONCE(*ptep);
-
-		/* Valid entry? */
-		if (!pte)
-			return 0;
-
-		/* Leaf entry? */
-		if (iopte_leaf(pte, lvl, data->iop.fmt))
-			goto found_translation;
-
-		/* Take it to the next level */
-		ptep = iopte_deref(pte, data);
-	} while (++lvl < ARM_LPAE_MAX_LEVELS);
-
-	/* Ran out of page tables to walk */
-	return 0;
-
-found_translation:
-	iova &= (ARM_LPAE_BLOCK_SIZE(lvl, data) - 1);
-	return iopte_to_paddr(pte, data) | iova;
-}
-
 static void arm_lpae_restrict_pgsizes(struct io_pgtable_cfg *cfg)
 {
 	unsigned long granule, page_sizes;
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 201+ messages in thread

* [RFC PATCH 01/45] iommu/io-pgtable-arm: Split the page table driver
@ 2023-02-01 12:52   ` Jean-Philippe Brucker
  0 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:52 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

To allow the KVM IOMMU driver to populate page tables using the
io-pgtable-arm code, move the shared bits into io-pgtable-arm-common.c.

Here we move the bulk of the common code, and a subsequent patch handles
the bits that require more care. phys_to_virt() and virt_to_phys() do
need special handling here, because the hypervisor will have its own
versions. It will also implement its own versions of
__arm_lpae_alloc_pages(), __arm_lpae_free_pages() and
__arm_lpae_sync_pte(), since the hypervisor needs some assistance to
allocate pages.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 drivers/iommu/Makefile                        |   2 +-
 drivers/iommu/io-pgtable-arm.h                |  30 -
 include/linux/io-pgtable-arm.h                | 187 ++++++
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |   2 +-
 drivers/iommu/io-pgtable-arm-common.c         | 500 ++++++++++++++
 drivers/iommu/io-pgtable-arm.c                | 634 +-----------------
 6 files changed, 696 insertions(+), 659 deletions(-)
 delete mode 100644 drivers/iommu/io-pgtable-arm.h
 create mode 100644 include/linux/io-pgtable-arm.h
 create mode 100644 drivers/iommu/io-pgtable-arm-common.c
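
As a rough illustration (not part of this patch), the hypervisor build
of the common code could satisfy the "host/hyp-specific" hooks declared
in the new header along the following lines. hyp_phys_to_virt(),
hyp_virt_to_phys(), hyp_pool_alloc(), hyp_pool_free() and
hyp_clean_dcache_to_poc() are placeholder names used for the sketch,
not definitions introduced by this series:

	/* Illustration only: a possible nVHE-side implementation of the hooks */
	#define __arm_lpae_phys_to_virt	hyp_phys_to_virt
	#define __arm_lpae_virt_to_phys	hyp_virt_to_phys

	void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp,
				     struct io_pgtable_cfg *cfg)
	{
		/* gfp is ignored: pages come from memory set aside for the hyp */
		return hyp_pool_alloc(get_order(size));
	}

	void __arm_lpae_free_pages(void *pages, size_t size,
				   struct io_pgtable_cfg *cfg)
	{
		hyp_pool_free(pages, get_order(size));
	}

	void __arm_lpae_sync_pte(arm_lpae_iopte *ptep, int num_entries,
				 struct io_pgtable_cfg *cfg)
	{
		/* Clean the updated PTEs to the PoC for non-coherent walkers */
		hyp_clean_dcache_to_poc(ptep, num_entries * sizeof(*ptep));
	}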

diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index f461d0651385..c616acf534f8 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -7,7 +7,7 @@ obj-$(CONFIG_IOMMU_DEBUGFS) += iommu-debugfs.o
 obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
-obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
+obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o io-pgtable-arm-common.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_DART) += io-pgtable-dart.o
 obj-$(CONFIG_IOASID) += ioasid.o
 obj-$(CONFIG_IOMMU_IOVA) += iova.o
diff --git a/drivers/iommu/io-pgtable-arm.h b/drivers/iommu/io-pgtable-arm.h
deleted file mode 100644
index ba7cfdf7afa0..000000000000
--- a/drivers/iommu/io-pgtable-arm.h
+++ /dev/null
@@ -1,30 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-#ifndef IO_PGTABLE_ARM_H_
-#define IO_PGTABLE_ARM_H_
-
-#define ARM_LPAE_TCR_TG0_4K		0
-#define ARM_LPAE_TCR_TG0_64K		1
-#define ARM_LPAE_TCR_TG0_16K		2
-
-#define ARM_LPAE_TCR_TG1_16K		1
-#define ARM_LPAE_TCR_TG1_4K		2
-#define ARM_LPAE_TCR_TG1_64K		3
-
-#define ARM_LPAE_TCR_SH_NS		0
-#define ARM_LPAE_TCR_SH_OS		2
-#define ARM_LPAE_TCR_SH_IS		3
-
-#define ARM_LPAE_TCR_RGN_NC		0
-#define ARM_LPAE_TCR_RGN_WBWA		1
-#define ARM_LPAE_TCR_RGN_WT		2
-#define ARM_LPAE_TCR_RGN_WB		3
-
-#define ARM_LPAE_TCR_PS_32_BIT		0x0ULL
-#define ARM_LPAE_TCR_PS_36_BIT		0x1ULL
-#define ARM_LPAE_TCR_PS_40_BIT		0x2ULL
-#define ARM_LPAE_TCR_PS_42_BIT		0x3ULL
-#define ARM_LPAE_TCR_PS_44_BIT		0x4ULL
-#define ARM_LPAE_TCR_PS_48_BIT		0x5ULL
-#define ARM_LPAE_TCR_PS_52_BIT		0x6ULL
-
-#endif /* IO_PGTABLE_ARM_H_ */
diff --git a/include/linux/io-pgtable-arm.h b/include/linux/io-pgtable-arm.h
new file mode 100644
index 000000000000..594b5030b450
--- /dev/null
+++ b/include/linux/io-pgtable-arm.h
@@ -0,0 +1,187 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef IO_PGTABLE_H_
+#define IO_PGTABLE_H_
+
+#include <linux/io-pgtable.h>
+
+extern bool selftest_running;
+
+typedef u64 arm_lpae_iopte;
+
+struct arm_lpae_io_pgtable {
+	struct io_pgtable	iop;
+
+	int			pgd_bits;
+	int			start_level;
+	int			bits_per_level;
+
+	void			*pgd;
+};
+
+/* Struct accessors */
+#define io_pgtable_to_data(x)						\
+	container_of((x), struct arm_lpae_io_pgtable, iop)
+
+#define io_pgtable_ops_to_data(x)					\
+	io_pgtable_to_data(io_pgtable_ops_to_pgtable(x))
+
+/*
+ * Calculate the right shift amount to get to the portion describing level l
+ * in a virtual address mapped by the pagetable in d.
+ */
+#define ARM_LPAE_LVL_SHIFT(l,d)						\
+	(((ARM_LPAE_MAX_LEVELS - (l)) * (d)->bits_per_level) +		\
+	ilog2(sizeof(arm_lpae_iopte)))
+
+#define ARM_LPAE_GRANULE(d)						\
+	(sizeof(arm_lpae_iopte) << (d)->bits_per_level)
+#define ARM_LPAE_PGD_SIZE(d)						\
+	(sizeof(arm_lpae_iopte) << (d)->pgd_bits)
+
+#define ARM_LPAE_PTES_PER_TABLE(d)					\
+	(ARM_LPAE_GRANULE(d) >> ilog2(sizeof(arm_lpae_iopte)))
+
+/*
+ * Calculate the index at level l used to map virtual address a using the
+ * pagetable in d.
+ */
+#define ARM_LPAE_PGD_IDX(l,d)						\
+	((l) == (d)->start_level ? (d)->pgd_bits - (d)->bits_per_level : 0)
+
+#define ARM_LPAE_LVL_IDX(a,l,d)						\
+	(((u64)(a) >> ARM_LPAE_LVL_SHIFT(l,d)) &			\
+	 ((1 << ((d)->bits_per_level + ARM_LPAE_PGD_IDX(l,d))) - 1))
+
+/* Calculate the block/page mapping size at level l for pagetable in d. */
+#define ARM_LPAE_BLOCK_SIZE(l,d)	(1ULL << ARM_LPAE_LVL_SHIFT(l,d))
+
+/* Page table bits */
+#define ARM_LPAE_PTE_TYPE_SHIFT		0
+#define ARM_LPAE_PTE_TYPE_MASK		0x3
+
+#define ARM_LPAE_PTE_TYPE_BLOCK		1
+#define ARM_LPAE_PTE_TYPE_TABLE		3
+#define ARM_LPAE_PTE_TYPE_PAGE		3
+
+#define ARM_LPAE_PTE_ADDR_MASK		GENMASK_ULL(47,12)
+
+#define ARM_LPAE_PTE_NSTABLE		(((arm_lpae_iopte)1) << 63)
+#define ARM_LPAE_PTE_XN			(((arm_lpae_iopte)3) << 53)
+#define ARM_LPAE_PTE_AF			(((arm_lpae_iopte)1) << 10)
+#define ARM_LPAE_PTE_SH_NS		(((arm_lpae_iopte)0) << 8)
+#define ARM_LPAE_PTE_SH_OS		(((arm_lpae_iopte)2) << 8)
+#define ARM_LPAE_PTE_SH_IS		(((arm_lpae_iopte)3) << 8)
+#define ARM_LPAE_PTE_NS			(((arm_lpae_iopte)1) << 5)
+#define ARM_LPAE_PTE_VALID		(((arm_lpae_iopte)1) << 0)
+
+#define ARM_LPAE_PTE_ATTR_LO_MASK	(((arm_lpae_iopte)0x3ff) << 2)
+/* Ignore the contiguous bit for block splitting */
+#define ARM_LPAE_PTE_ATTR_HI_MASK	(((arm_lpae_iopte)6) << 52)
+#define ARM_LPAE_PTE_ATTR_MASK		(ARM_LPAE_PTE_ATTR_LO_MASK |	\
+					 ARM_LPAE_PTE_ATTR_HI_MASK)
+/* Software bit for solving coherency races */
+#define ARM_LPAE_PTE_SW_SYNC		(((arm_lpae_iopte)1) << 55)
+
+/* Stage-1 PTE */
+#define ARM_LPAE_PTE_AP_UNPRIV		(((arm_lpae_iopte)1) << 6)
+#define ARM_LPAE_PTE_AP_RDONLY		(((arm_lpae_iopte)2) << 6)
+#define ARM_LPAE_PTE_ATTRINDX_SHIFT	2
+#define ARM_LPAE_PTE_nG			(((arm_lpae_iopte)1) << 11)
+
+/* Stage-2 PTE */
+#define ARM_LPAE_PTE_HAP_FAULT		(((arm_lpae_iopte)0) << 6)
+#define ARM_LPAE_PTE_HAP_READ		(((arm_lpae_iopte)1) << 6)
+#define ARM_LPAE_PTE_HAP_WRITE		(((arm_lpae_iopte)2) << 6)
+#define ARM_LPAE_PTE_MEMATTR_OIWB	(((arm_lpae_iopte)0xf) << 2)
+#define ARM_LPAE_PTE_MEMATTR_NC		(((arm_lpae_iopte)0x5) << 2)
+#define ARM_LPAE_PTE_MEMATTR_DEV	(((arm_lpae_iopte)0x1) << 2)
+
+/* Register bits */
+#define ARM_LPAE_VTCR_SL0_MASK		0x3
+
+#define ARM_LPAE_TCR_T0SZ_SHIFT		0
+
+#define ARM_LPAE_TCR_TG0_4K		0
+#define ARM_LPAE_TCR_TG0_64K		1
+#define ARM_LPAE_TCR_TG0_16K		2
+
+#define ARM_LPAE_TCR_TG1_16K		1
+#define ARM_LPAE_TCR_TG1_4K		2
+#define ARM_LPAE_TCR_TG1_64K		3
+
+#define ARM_LPAE_TCR_SH_NS		0
+#define ARM_LPAE_TCR_SH_OS		2
+#define ARM_LPAE_TCR_SH_IS		3
+
+#define ARM_LPAE_TCR_RGN_NC		0
+#define ARM_LPAE_TCR_RGN_WBWA		1
+#define ARM_LPAE_TCR_RGN_WT		2
+#define ARM_LPAE_TCR_RGN_WB		3
+
+#define ARM_LPAE_TCR_PS_32_BIT		0x0ULL
+#define ARM_LPAE_TCR_PS_36_BIT		0x1ULL
+#define ARM_LPAE_TCR_PS_40_BIT		0x2ULL
+#define ARM_LPAE_TCR_PS_42_BIT		0x3ULL
+#define ARM_LPAE_TCR_PS_44_BIT		0x4ULL
+#define ARM_LPAE_TCR_PS_48_BIT		0x5ULL
+#define ARM_LPAE_TCR_PS_52_BIT		0x6ULL
+
+#define ARM_LPAE_VTCR_PS_SHIFT		16
+#define ARM_LPAE_VTCR_PS_MASK		0x7
+
+#define ARM_LPAE_MAIR_ATTR_SHIFT(n)	((n) << 3)
+#define ARM_LPAE_MAIR_ATTR_MASK		0xff
+#define ARM_LPAE_MAIR_ATTR_DEVICE	0x04
+#define ARM_LPAE_MAIR_ATTR_NC		0x44
+#define ARM_LPAE_MAIR_ATTR_INC_OWBRWA	0xf4
+#define ARM_LPAE_MAIR_ATTR_WBRWA	0xff
+#define ARM_LPAE_MAIR_ATTR_IDX_NC	0
+#define ARM_LPAE_MAIR_ATTR_IDX_CACHE	1
+#define ARM_LPAE_MAIR_ATTR_IDX_DEV	2
+#define ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE	3
+
+#define ARM_MALI_LPAE_TTBR_ADRMODE_TABLE (3u << 0)
+#define ARM_MALI_LPAE_TTBR_READ_INNER	BIT(2)
+#define ARM_MALI_LPAE_TTBR_SHARE_OUTER	BIT(4)
+
+#define ARM_MALI_LPAE_MEMATTR_IMP_DEF	0x88ULL
+#define ARM_MALI_LPAE_MEMATTR_WRITE_ALLOC 0x8DULL
+
+#define ARM_LPAE_MAX_LEVELS		4
+
+#define iopte_type(pte)					\
+	(((pte) >> ARM_LPAE_PTE_TYPE_SHIFT) & ARM_LPAE_PTE_TYPE_MASK)
+
+#define iopte_prot(pte)	((pte) & ARM_LPAE_PTE_ATTR_MASK)
+
+static inline bool iopte_leaf(arm_lpae_iopte pte, int lvl,
+			      enum io_pgtable_fmt fmt)
+{
+	if (lvl == (ARM_LPAE_MAX_LEVELS - 1) && fmt != ARM_MALI_LPAE)
+		return iopte_type(pte) == ARM_LPAE_PTE_TYPE_PAGE;
+
+	return iopte_type(pte) == ARM_LPAE_PTE_TYPE_BLOCK;
+}
+
+#define __arm_lpae_virt_to_phys	__pa
+#define __arm_lpae_phys_to_virt	__va
+
+/* Generic functions */
+int arm_lpae_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
+		       phys_addr_t paddr, size_t pgsize, size_t pgcount,
+		       int iommu_prot, gfp_t gfp, size_t *mapped);
+size_t arm_lpae_unmap_pages(struct io_pgtable_ops *ops, unsigned long iova,
+			    size_t pgsize, size_t pgcount,
+			    struct iommu_iotlb_gather *gather);
+phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
+				  unsigned long iova);
+void __arm_lpae_free_pgtable(struct arm_lpae_io_pgtable *data, int lvl,
+			     arm_lpae_iopte *ptep);
+
+/* Host/hyp-specific functions */
+void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp, struct io_pgtable_cfg *cfg);
+void __arm_lpae_free_pages(void *pages, size_t size, struct io_pgtable_cfg *cfg);
+void __arm_lpae_sync_pte(arm_lpae_iopte *ptep, int num_entries,
+			 struct io_pgtable_cfg *cfg);
+
+#endif /* IO_PGTABLE_H_ */
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index a5a63b1c947e..df288f29a5c1 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -3,6 +3,7 @@
  * Implementation of the IOMMU SVA API for the ARM SMMUv3
  */
 
+#include <linux/io-pgtable-arm.h>
 #include <linux/mm.h>
 #include <linux/mmu_context.h>
 #include <linux/mmu_notifier.h>
@@ -11,7 +12,6 @@
 
 #include "arm-smmu-v3.h"
 #include "../../iommu-sva.h"
-#include "../../io-pgtable-arm.h"
 
 struct arm_smmu_mmu_notifier {
 	struct mmu_notifier		mn;
diff --git a/drivers/iommu/io-pgtable-arm-common.c b/drivers/iommu/io-pgtable-arm-common.c
new file mode 100644
index 000000000000..74d962712d15
--- /dev/null
+++ b/drivers/iommu/io-pgtable-arm-common.c
@@ -0,0 +1,500 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * CPU-agnostic ARM page table allocator.
+ * A copy of this library is embedded in the KVM nVHE image.
+ *
+ * Copyright (C) 2022 Arm Limited
+ *
+ * Author: Will Deacon <will.deacon@arm.com>
+ */
+
+#include <linux/io-pgtable-arm.h>
+
+#include <linux/sizes.h>
+#include <linux/types.h>
+
+#define iopte_deref(pte, d) __arm_lpae_phys_to_virt(iopte_to_paddr(pte, d))
+
+static arm_lpae_iopte paddr_to_iopte(phys_addr_t paddr,
+				     struct arm_lpae_io_pgtable *data)
+{
+	arm_lpae_iopte pte = paddr;
+
+	/* Of the bits which overlap, either 51:48 or 15:12 are always RES0 */
+	return (pte | (pte >> (48 - 12))) & ARM_LPAE_PTE_ADDR_MASK;
+}
+
+static phys_addr_t iopte_to_paddr(arm_lpae_iopte pte,
+				  struct arm_lpae_io_pgtable *data)
+{
+	u64 paddr = pte & ARM_LPAE_PTE_ADDR_MASK;
+
+	if (ARM_LPAE_GRANULE(data) < SZ_64K)
+		return paddr;
+
+	/* Rotate the packed high-order bits back to the top */
+	return (paddr | (paddr << (48 - 12))) & (ARM_LPAE_PTE_ADDR_MASK << 4);
+}
+
+static void __arm_lpae_clear_pte(arm_lpae_iopte *ptep, struct io_pgtable_cfg *cfg)
+{
+
+	*ptep = 0;
+
+	if (!cfg->coherent_walk)
+		__arm_lpae_sync_pte(ptep, 1, cfg);
+}
+
+static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
+			       struct iommu_iotlb_gather *gather,
+			       unsigned long iova, size_t size, size_t pgcount,
+			       int lvl, arm_lpae_iopte *ptep);
+
+static void __arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
+				phys_addr_t paddr, arm_lpae_iopte prot,
+				int lvl, int num_entries, arm_lpae_iopte *ptep)
+{
+	arm_lpae_iopte pte = prot;
+	struct io_pgtable_cfg *cfg = &data->iop.cfg;
+	size_t sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
+	int i;
+
+	if (data->iop.fmt != ARM_MALI_LPAE && lvl == ARM_LPAE_MAX_LEVELS - 1)
+		pte |= ARM_LPAE_PTE_TYPE_PAGE;
+	else
+		pte |= ARM_LPAE_PTE_TYPE_BLOCK;
+
+	for (i = 0; i < num_entries; i++)
+		ptep[i] = pte | paddr_to_iopte(paddr + i * sz, data);
+
+	if (!cfg->coherent_walk)
+		__arm_lpae_sync_pte(ptep, num_entries, cfg);
+}
+
+static int arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
+			     unsigned long iova, phys_addr_t paddr,
+			     arm_lpae_iopte prot, int lvl, int num_entries,
+			     arm_lpae_iopte *ptep)
+{
+	int i;
+
+	for (i = 0; i < num_entries; i++)
+		if (iopte_leaf(ptep[i], lvl, data->iop.fmt)) {
+			/* We require an unmap first */
+			WARN_ON(!selftest_running);
+			return -EEXIST;
+		} else if (iopte_type(ptep[i]) == ARM_LPAE_PTE_TYPE_TABLE) {
+			/*
+			 * We need to unmap and free the old table before
+			 * overwriting it with a block entry.
+			 */
+			arm_lpae_iopte *tblp;
+			size_t sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
+
+			tblp = ptep - ARM_LPAE_LVL_IDX(iova, lvl, data);
+			if (__arm_lpae_unmap(data, NULL, iova + i * sz, sz, 1,
+					     lvl, tblp) != sz) {
+				WARN_ON(1);
+				return -EINVAL;
+			}
+		}
+
+	__arm_lpae_init_pte(data, paddr, prot, lvl, num_entries, ptep);
+	return 0;
+}
+
+static arm_lpae_iopte arm_lpae_install_table(arm_lpae_iopte *table,
+					     arm_lpae_iopte *ptep,
+					     arm_lpae_iopte curr,
+					     struct arm_lpae_io_pgtable *data)
+{
+	arm_lpae_iopte old, new;
+	struct io_pgtable_cfg *cfg = &data->iop.cfg;
+
+	new = paddr_to_iopte(__arm_lpae_virt_to_phys(table), data) |
+		ARM_LPAE_PTE_TYPE_TABLE;
+	if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_NS)
+		new |= ARM_LPAE_PTE_NSTABLE;
+
+	/*
+	 * Ensure the table itself is visible before its PTE can be.
+	 * Whilst we could get away with cmpxchg64_release below, this
+	 * doesn't have any ordering semantics when !CONFIG_SMP.
+	 */
+	dma_wmb();
+
+	old = cmpxchg64_relaxed(ptep, curr, new);
+
+	if (cfg->coherent_walk || (old & ARM_LPAE_PTE_SW_SYNC))
+		return old;
+
+	/* Even if it's not ours, there's no point waiting; just kick it */
+	__arm_lpae_sync_pte(ptep, 1, cfg);
+	if (old == curr)
+		WRITE_ONCE(*ptep, new | ARM_LPAE_PTE_SW_SYNC);
+
+	return old;
+}
+
+int __arm_lpae_map(struct arm_lpae_io_pgtable *data, unsigned long iova,
+		   phys_addr_t paddr, size_t size, size_t pgcount,
+		   arm_lpae_iopte prot, int lvl, arm_lpae_iopte *ptep,
+		   gfp_t gfp, size_t *mapped)
+{
+	arm_lpae_iopte *cptep, pte;
+	size_t block_size = ARM_LPAE_BLOCK_SIZE(lvl, data);
+	size_t tblsz = ARM_LPAE_GRANULE(data);
+	struct io_pgtable_cfg *cfg = &data->iop.cfg;
+	int ret = 0, num_entries, max_entries, map_idx_start;
+
+	/* Find our entry at the current level */
+	map_idx_start = ARM_LPAE_LVL_IDX(iova, lvl, data);
+	ptep += map_idx_start;
+
+	/* If we can install a leaf entry at this level, then do so */
+	if (size == block_size) {
+		max_entries = ARM_LPAE_PTES_PER_TABLE(data) - map_idx_start;
+		num_entries = min_t(int, pgcount, max_entries);
+		ret = arm_lpae_init_pte(data, iova, paddr, prot, lvl, num_entries, ptep);
+		if (!ret)
+			*mapped += num_entries * size;
+
+		return ret;
+	}
+
+	/* We can't allocate tables at the final level */
+	if (WARN_ON(lvl >= ARM_LPAE_MAX_LEVELS - 1))
+		return -EINVAL;
+
+	/* Grab a pointer to the next level */
+	pte = READ_ONCE(*ptep);
+	if (!pte) {
+		cptep = __arm_lpae_alloc_pages(tblsz, gfp, cfg);
+		if (!cptep)
+			return -ENOMEM;
+
+		pte = arm_lpae_install_table(cptep, ptep, 0, data);
+		if (pte)
+			__arm_lpae_free_pages(cptep, tblsz, cfg);
+	} else if (!cfg->coherent_walk && !(pte & ARM_LPAE_PTE_SW_SYNC)) {
+		__arm_lpae_sync_pte(ptep, 1, cfg);
+	}
+
+	if (pte && !iopte_leaf(pte, lvl, data->iop.fmt)) {
+		cptep = iopte_deref(pte, data);
+	} else if (pte) {
+		/* We require an unmap first */
+		WARN_ON(!selftest_running);
+		return -EEXIST;
+	}
+
+	/* Rinse, repeat */
+	return __arm_lpae_map(data, iova, paddr, size, pgcount, prot, lvl + 1,
+			      cptep, gfp, mapped);
+}
+
+static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
+					   int prot)
+{
+	arm_lpae_iopte pte;
+
+	if (data->iop.fmt == ARM_64_LPAE_S1 ||
+	    data->iop.fmt == ARM_32_LPAE_S1) {
+		pte = ARM_LPAE_PTE_nG;
+		if (!(prot & IOMMU_WRITE) && (prot & IOMMU_READ))
+			pte |= ARM_LPAE_PTE_AP_RDONLY;
+		if (!(prot & IOMMU_PRIV))
+			pte |= ARM_LPAE_PTE_AP_UNPRIV;
+	} else {
+		pte = ARM_LPAE_PTE_HAP_FAULT;
+		if (prot & IOMMU_READ)
+			pte |= ARM_LPAE_PTE_HAP_READ;
+		if (prot & IOMMU_WRITE)
+			pte |= ARM_LPAE_PTE_HAP_WRITE;
+	}
+
+	/*
+	 * Note that this logic is structured to accommodate Mali LPAE
+	 * having stage-1-like attributes but stage-2-like permissions.
+	 */
+	if (data->iop.fmt == ARM_64_LPAE_S2 ||
+	    data->iop.fmt == ARM_32_LPAE_S2) {
+		if (prot & IOMMU_MMIO)
+			pte |= ARM_LPAE_PTE_MEMATTR_DEV;
+		else if (prot & IOMMU_CACHE)
+			pte |= ARM_LPAE_PTE_MEMATTR_OIWB;
+		else
+			pte |= ARM_LPAE_PTE_MEMATTR_NC;
+	} else {
+		if (prot & IOMMU_MMIO)
+			pte |= (ARM_LPAE_MAIR_ATTR_IDX_DEV
+				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
+		else if (prot & IOMMU_CACHE)
+			pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE
+				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
+	}
+
+	/*
+	 * Also Mali has its own notions of shareability wherein its Inner
+	 * domain covers the cores within the GPU, and its Outer domain is
+	 * "outside the GPU" (i.e. either the Inner or System domain in CPU
+	 * terms, depending on coherency).
+	 */
+	if (prot & IOMMU_CACHE && data->iop.fmt != ARM_MALI_LPAE)
+		pte |= ARM_LPAE_PTE_SH_IS;
+	else
+		pte |= ARM_LPAE_PTE_SH_OS;
+
+	if (prot & IOMMU_NOEXEC)
+		pte |= ARM_LPAE_PTE_XN;
+
+	if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_NS)
+		pte |= ARM_LPAE_PTE_NS;
+
+	if (data->iop.fmt != ARM_MALI_LPAE)
+		pte |= ARM_LPAE_PTE_AF;
+
+	return pte;
+}
+
+int arm_lpae_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
+		       phys_addr_t paddr, size_t pgsize, size_t pgcount,
+		       int iommu_prot, gfp_t gfp, size_t *mapped)
+{
+	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
+	struct io_pgtable_cfg *cfg = &data->iop.cfg;
+	arm_lpae_iopte *ptep = data->pgd;
+	int ret, lvl = data->start_level;
+	arm_lpae_iopte prot;
+	long iaext = (s64)iova >> cfg->ias;
+
+	if (WARN_ON(!pgsize || (pgsize & cfg->pgsize_bitmap) != pgsize))
+		return -EINVAL;
+
+	if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
+		iaext = ~iaext;
+	if (WARN_ON(iaext || paddr >> cfg->oas))
+		return -ERANGE;
+
+	/* If no access, then nothing to do */
+	if (!(iommu_prot & (IOMMU_READ | IOMMU_WRITE)))
+		return 0;
+
+	prot = arm_lpae_prot_to_pte(data, iommu_prot);
+	ret = __arm_lpae_map(data, iova, paddr, pgsize, pgcount, prot, lvl,
+			     ptep, gfp, mapped);
+	/*
+	 * Synchronise all PTE updates for the new mapping before there's
+	 * a chance for anything to kick off a table walk for the new iova.
+	 */
+	wmb();
+
+	return ret;
+}
+
+void __arm_lpae_free_pgtable(struct arm_lpae_io_pgtable *data, int lvl,
+			     arm_lpae_iopte *ptep)
+{
+	arm_lpae_iopte *start, *end;
+	unsigned long table_size;
+
+	if (lvl == data->start_level)
+		table_size = ARM_LPAE_PGD_SIZE(data);
+	else
+		table_size = ARM_LPAE_GRANULE(data);
+
+	start = ptep;
+
+	/* Only leaf entries at the last level */
+	if (lvl == ARM_LPAE_MAX_LEVELS - 1)
+		end = ptep;
+	else
+		end = (void *)ptep + table_size;
+
+	while (ptep != end) {
+		arm_lpae_iopte pte = *ptep++;
+
+		if (!pte || iopte_leaf(pte, lvl, data->iop.fmt))
+			continue;
+
+		__arm_lpae_free_pgtable(data, lvl + 1, iopte_deref(pte, data));
+	}
+
+	__arm_lpae_free_pages(start, table_size, &data->iop.cfg);
+}
+
+static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
+				       struct iommu_iotlb_gather *gather,
+				       unsigned long iova, size_t size,
+				       arm_lpae_iopte blk_pte, int lvl,
+				       arm_lpae_iopte *ptep, size_t pgcount)
+{
+	struct io_pgtable_cfg *cfg = &data->iop.cfg;
+	arm_lpae_iopte pte, *tablep;
+	phys_addr_t blk_paddr;
+	size_t tablesz = ARM_LPAE_GRANULE(data);
+	size_t split_sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
+	int ptes_per_table = ARM_LPAE_PTES_PER_TABLE(data);
+	int i, unmap_idx_start = -1, num_entries = 0, max_entries;
+
+	if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
+		return 0;
+
+	tablep = __arm_lpae_alloc_pages(tablesz, GFP_ATOMIC, cfg);
+	if (!tablep)
+		return 0; /* Bytes unmapped */
+
+	if (size == split_sz) {
+		unmap_idx_start = ARM_LPAE_LVL_IDX(iova, lvl, data);
+		max_entries = ptes_per_table - unmap_idx_start;
+		num_entries = min_t(int, pgcount, max_entries);
+	}
+
+	blk_paddr = iopte_to_paddr(blk_pte, data);
+	pte = iopte_prot(blk_pte);
+
+	for (i = 0; i < ptes_per_table; i++, blk_paddr += split_sz) {
+		/* Unmap! */
+		if (i >= unmap_idx_start && i < (unmap_idx_start + num_entries))
+			continue;
+
+		__arm_lpae_init_pte(data, blk_paddr, pte, lvl, 1, &tablep[i]);
+	}
+
+	pte = arm_lpae_install_table(tablep, ptep, blk_pte, data);
+	if (pte != blk_pte) {
+		__arm_lpae_free_pages(tablep, tablesz, cfg);
+		/*
+		 * We may race against someone unmapping another part of this
+		 * block, but anything else is invalid. We can't misinterpret
+		 * a page entry here since we're never at the last level.
+		 */
+		if (iopte_type(pte) != ARM_LPAE_PTE_TYPE_TABLE)
+			return 0;
+
+		tablep = iopte_deref(pte, data);
+	} else if (unmap_idx_start >= 0) {
+		for (i = 0; i < num_entries; i++)
+			io_pgtable_tlb_add_page(&data->iop, gather, iova + i * size, size);
+
+		return num_entries * size;
+	}
+
+	return __arm_lpae_unmap(data, gather, iova, size, pgcount, lvl, tablep);
+}
+
+static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
+			       struct iommu_iotlb_gather *gather,
+			       unsigned long iova, size_t size, size_t pgcount,
+			       int lvl, arm_lpae_iopte *ptep)
+{
+	arm_lpae_iopte pte;
+	struct io_pgtable *iop = &data->iop;
+	int i = 0, num_entries, max_entries, unmap_idx_start;
+
+	/* Something went horribly wrong and we ran out of page table */
+	if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
+		return 0;
+
+	unmap_idx_start = ARM_LPAE_LVL_IDX(iova, lvl, data);
+	ptep += unmap_idx_start;
+	pte = READ_ONCE(*ptep);
+	if (WARN_ON(!pte))
+		return 0;
+
+	/* If the size matches this level, we're in the right place */
+	if (size == ARM_LPAE_BLOCK_SIZE(lvl, data)) {
+		max_entries = ARM_LPAE_PTES_PER_TABLE(data) - unmap_idx_start;
+		num_entries = min_t(int, pgcount, max_entries);
+
+		while (i < num_entries) {
+			pte = READ_ONCE(*ptep);
+			if (WARN_ON(!pte))
+				break;
+
+			__arm_lpae_clear_pte(ptep, &iop->cfg);
+
+			if (!iopte_leaf(pte, lvl, iop->fmt)) {
+				/* Also flush any partial walks */
+				io_pgtable_tlb_flush_walk(iop, iova + i * size, size,
+							  ARM_LPAE_GRANULE(data));
+				__arm_lpae_free_pgtable(data, lvl + 1, iopte_deref(pte, data));
+			} else if (!iommu_iotlb_gather_queued(gather)) {
+				io_pgtable_tlb_add_page(iop, gather, iova + i * size, size);
+			}
+
+			ptep++;
+			i++;
+		}
+
+		return i * size;
+	} else if (iopte_leaf(pte, lvl, iop->fmt)) {
+		/*
+		 * Insert a table at the next level to map the old region,
+		 * minus the part we want to unmap
+		 */
+		return arm_lpae_split_blk_unmap(data, gather, iova, size, pte,
+						lvl + 1, ptep, pgcount);
+	}
+
+	/* Keep on walkin' */
+	ptep = iopte_deref(pte, data);
+	return __arm_lpae_unmap(data, gather, iova, size, pgcount, lvl + 1, ptep);
+}
+
+size_t arm_lpae_unmap_pages(struct io_pgtable_ops *ops, unsigned long iova,
+			    size_t pgsize, size_t pgcount,
+			    struct iommu_iotlb_gather *gather)
+{
+	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
+	struct io_pgtable_cfg *cfg = &data->iop.cfg;
+	arm_lpae_iopte *ptep = data->pgd;
+	long iaext = (s64)iova >> cfg->ias;
+
+	if (WARN_ON(!pgsize || (pgsize & cfg->pgsize_bitmap) != pgsize || !pgcount))
+		return 0;
+
+	if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
+		iaext = ~iaext;
+	if (WARN_ON(iaext))
+		return 0;
+
+	return __arm_lpae_unmap(data, gather, iova, pgsize, pgcount,
+				data->start_level, ptep);
+}
+
+phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
+				  unsigned long iova)
+{
+	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
+	arm_lpae_iopte pte, *ptep = data->pgd;
+	int lvl = data->start_level;
+
+	do {
+		/* Valid IOPTE pointer? */
+		if (!ptep)
+			return 0;
+
+		/* Grab the IOPTE we're interested in */
+		ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
+		pte = READ_ONCE(*ptep);
+
+		/* Valid entry? */
+		if (!pte)
+			return 0;
+
+		/* Leaf entry? */
+		if (iopte_leaf(pte, lvl, data->iop.fmt))
+			goto found_translation;
+
+		/* Take it to the next level */
+		ptep = iopte_deref(pte, data);
+	} while (++lvl < ARM_LPAE_MAX_LEVELS);
+
+	/* Ran out of page tables to walk */
+	return 0;
+
+found_translation:
+	iova &= (ARM_LPAE_BLOCK_SIZE(lvl, data) - 1);
+	return iopte_to_paddr(pte, data) | iova;
+}
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 72dcdd468cf3..db42aed6ad7b 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0-only
 /*
  * CPU-agnostic ARM page table allocator.
+ * Host-specific functions. The rest is in io-pgtable-arm-common.c.
  *
  * Copyright (C) 2014 ARM Limited
  *
@@ -11,7 +12,7 @@
 
 #include <linux/atomic.h>
 #include <linux/bitops.h>
-#include <linux/io-pgtable.h>
+#include <linux/io-pgtable-arm.h>
 #include <linux/kernel.h>
 #include <linux/sizes.h>
 #include <linux/slab.h>
@@ -20,175 +21,17 @@
 
 #include <asm/barrier.h>
 
-#include "io-pgtable-arm.h"
-
 #define ARM_LPAE_MAX_ADDR_BITS		52
 #define ARM_LPAE_S2_MAX_CONCAT_PAGES	16
-#define ARM_LPAE_MAX_LEVELS		4
-
-/* Struct accessors */
-#define io_pgtable_to_data(x)						\
-	container_of((x), struct arm_lpae_io_pgtable, iop)
-
-#define io_pgtable_ops_to_data(x)					\
-	io_pgtable_to_data(io_pgtable_ops_to_pgtable(x))
-
-/*
- * Calculate the right shift amount to get to the portion describing level l
- * in a virtual address mapped by the pagetable in d.
- */
-#define ARM_LPAE_LVL_SHIFT(l,d)						\
-	(((ARM_LPAE_MAX_LEVELS - (l)) * (d)->bits_per_level) +		\
-	ilog2(sizeof(arm_lpae_iopte)))
 
-#define ARM_LPAE_GRANULE(d)						\
-	(sizeof(arm_lpae_iopte) << (d)->bits_per_level)
-#define ARM_LPAE_PGD_SIZE(d)						\
-	(sizeof(arm_lpae_iopte) << (d)->pgd_bits)
-
-#define ARM_LPAE_PTES_PER_TABLE(d)					\
-	(ARM_LPAE_GRANULE(d) >> ilog2(sizeof(arm_lpae_iopte)))
-
-/*
- * Calculate the index at level l used to map virtual address a using the
- * pagetable in d.
- */
-#define ARM_LPAE_PGD_IDX(l,d)						\
-	((l) == (d)->start_level ? (d)->pgd_bits - (d)->bits_per_level : 0)
-
-#define ARM_LPAE_LVL_IDX(a,l,d)						\
-	(((u64)(a) >> ARM_LPAE_LVL_SHIFT(l,d)) &			\
-	 ((1 << ((d)->bits_per_level + ARM_LPAE_PGD_IDX(l,d))) - 1))
-
-/* Calculate the block/page mapping size at level l for pagetable in d. */
-#define ARM_LPAE_BLOCK_SIZE(l,d)	(1ULL << ARM_LPAE_LVL_SHIFT(l,d))
-
-/* Page table bits */
-#define ARM_LPAE_PTE_TYPE_SHIFT		0
-#define ARM_LPAE_PTE_TYPE_MASK		0x3
-
-#define ARM_LPAE_PTE_TYPE_BLOCK		1
-#define ARM_LPAE_PTE_TYPE_TABLE		3
-#define ARM_LPAE_PTE_TYPE_PAGE		3
-
-#define ARM_LPAE_PTE_ADDR_MASK		GENMASK_ULL(47,12)
-
-#define ARM_LPAE_PTE_NSTABLE		(((arm_lpae_iopte)1) << 63)
-#define ARM_LPAE_PTE_XN			(((arm_lpae_iopte)3) << 53)
-#define ARM_LPAE_PTE_AF			(((arm_lpae_iopte)1) << 10)
-#define ARM_LPAE_PTE_SH_NS		(((arm_lpae_iopte)0) << 8)
-#define ARM_LPAE_PTE_SH_OS		(((arm_lpae_iopte)2) << 8)
-#define ARM_LPAE_PTE_SH_IS		(((arm_lpae_iopte)3) << 8)
-#define ARM_LPAE_PTE_NS			(((arm_lpae_iopte)1) << 5)
-#define ARM_LPAE_PTE_VALID		(((arm_lpae_iopte)1) << 0)
-
-#define ARM_LPAE_PTE_ATTR_LO_MASK	(((arm_lpae_iopte)0x3ff) << 2)
-/* Ignore the contiguous bit for block splitting */
-#define ARM_LPAE_PTE_ATTR_HI_MASK	(((arm_lpae_iopte)6) << 52)
-#define ARM_LPAE_PTE_ATTR_MASK		(ARM_LPAE_PTE_ATTR_LO_MASK |	\
-					 ARM_LPAE_PTE_ATTR_HI_MASK)
-/* Software bit for solving coherency races */
-#define ARM_LPAE_PTE_SW_SYNC		(((arm_lpae_iopte)1) << 55)
-
-/* Stage-1 PTE */
-#define ARM_LPAE_PTE_AP_UNPRIV		(((arm_lpae_iopte)1) << 6)
-#define ARM_LPAE_PTE_AP_RDONLY		(((arm_lpae_iopte)2) << 6)
-#define ARM_LPAE_PTE_ATTRINDX_SHIFT	2
-#define ARM_LPAE_PTE_nG			(((arm_lpae_iopte)1) << 11)
-
-/* Stage-2 PTE */
-#define ARM_LPAE_PTE_HAP_FAULT		(((arm_lpae_iopte)0) << 6)
-#define ARM_LPAE_PTE_HAP_READ		(((arm_lpae_iopte)1) << 6)
-#define ARM_LPAE_PTE_HAP_WRITE		(((arm_lpae_iopte)2) << 6)
-#define ARM_LPAE_PTE_MEMATTR_OIWB	(((arm_lpae_iopte)0xf) << 2)
-#define ARM_LPAE_PTE_MEMATTR_NC		(((arm_lpae_iopte)0x5) << 2)
-#define ARM_LPAE_PTE_MEMATTR_DEV	(((arm_lpae_iopte)0x1) << 2)
-
-/* Register bits */
-#define ARM_LPAE_VTCR_SL0_MASK		0x3
-
-#define ARM_LPAE_TCR_T0SZ_SHIFT		0
-
-#define ARM_LPAE_VTCR_PS_SHIFT		16
-#define ARM_LPAE_VTCR_PS_MASK		0x7
-
-#define ARM_LPAE_MAIR_ATTR_SHIFT(n)	((n) << 3)
-#define ARM_LPAE_MAIR_ATTR_MASK		0xff
-#define ARM_LPAE_MAIR_ATTR_DEVICE	0x04
-#define ARM_LPAE_MAIR_ATTR_NC		0x44
-#define ARM_LPAE_MAIR_ATTR_INC_OWBRWA	0xf4
-#define ARM_LPAE_MAIR_ATTR_WBRWA	0xff
-#define ARM_LPAE_MAIR_ATTR_IDX_NC	0
-#define ARM_LPAE_MAIR_ATTR_IDX_CACHE	1
-#define ARM_LPAE_MAIR_ATTR_IDX_DEV	2
-#define ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE	3
-
-#define ARM_MALI_LPAE_TTBR_ADRMODE_TABLE (3u << 0)
-#define ARM_MALI_LPAE_TTBR_READ_INNER	BIT(2)
-#define ARM_MALI_LPAE_TTBR_SHARE_OUTER	BIT(4)
-
-#define ARM_MALI_LPAE_MEMATTR_IMP_DEF	0x88ULL
-#define ARM_MALI_LPAE_MEMATTR_WRITE_ALLOC 0x8DULL
-
-/* IOPTE accessors */
-#define iopte_deref(pte,d) __va(iopte_to_paddr(pte, d))
-
-#define iopte_type(pte)					\
-	(((pte) >> ARM_LPAE_PTE_TYPE_SHIFT) & ARM_LPAE_PTE_TYPE_MASK)
-
-#define iopte_prot(pte)	((pte) & ARM_LPAE_PTE_ATTR_MASK)
-
-struct arm_lpae_io_pgtable {
-	struct io_pgtable	iop;
-
-	int			pgd_bits;
-	int			start_level;
-	int			bits_per_level;
-
-	void			*pgd;
-};
-
-typedef u64 arm_lpae_iopte;
-
-static inline bool iopte_leaf(arm_lpae_iopte pte, int lvl,
-			      enum io_pgtable_fmt fmt)
-{
-	if (lvl == (ARM_LPAE_MAX_LEVELS - 1) && fmt != ARM_MALI_LPAE)
-		return iopte_type(pte) == ARM_LPAE_PTE_TYPE_PAGE;
-
-	return iopte_type(pte) == ARM_LPAE_PTE_TYPE_BLOCK;
-}
-
-static arm_lpae_iopte paddr_to_iopte(phys_addr_t paddr,
-				     struct arm_lpae_io_pgtable *data)
-{
-	arm_lpae_iopte pte = paddr;
-
-	/* Of the bits which overlap, either 51:48 or 15:12 are always RES0 */
-	return (pte | (pte >> (48 - 12))) & ARM_LPAE_PTE_ADDR_MASK;
-}
-
-static phys_addr_t iopte_to_paddr(arm_lpae_iopte pte,
-				  struct arm_lpae_io_pgtable *data)
-{
-	u64 paddr = pte & ARM_LPAE_PTE_ADDR_MASK;
-
-	if (ARM_LPAE_GRANULE(data) < SZ_64K)
-		return paddr;
-
-	/* Rotate the packed high-order bits back to the top */
-	return (paddr | (paddr << (48 - 12))) & (ARM_LPAE_PTE_ADDR_MASK << 4);
-}
-
-static bool selftest_running = false;
+bool selftest_running = false;
 
 static dma_addr_t __arm_lpae_dma_addr(void *pages)
 {
 	return (dma_addr_t)virt_to_phys(pages);
 }
 
-static void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp,
-				    struct io_pgtable_cfg *cfg)
+void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp, struct io_pgtable_cfg *cfg)
 {
 	struct device *dev = cfg->iommu_dev;
 	int order = get_order(size);
@@ -225,8 +68,7 @@ static void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp,
 	return NULL;
 }
 
-static void __arm_lpae_free_pages(void *pages, size_t size,
-				  struct io_pgtable_cfg *cfg)
+void __arm_lpae_free_pages(void *pages, size_t size, struct io_pgtable_cfg *cfg)
 {
 	if (!cfg->coherent_walk)
 		dma_unmap_single(cfg->iommu_dev, __arm_lpae_dma_addr(pages),
@@ -234,299 +76,13 @@ static void __arm_lpae_free_pages(void *pages, size_t size,
 	free_pages((unsigned long)pages, get_order(size));
 }
 
-static void __arm_lpae_sync_pte(arm_lpae_iopte *ptep, int num_entries,
-				struct io_pgtable_cfg *cfg)
+void __arm_lpae_sync_pte(arm_lpae_iopte *ptep, int num_entries,
+			 struct io_pgtable_cfg *cfg)
 {
 	dma_sync_single_for_device(cfg->iommu_dev, __arm_lpae_dma_addr(ptep),
 				   sizeof(*ptep) * num_entries, DMA_TO_DEVICE);
 }
 
-static void __arm_lpae_clear_pte(arm_lpae_iopte *ptep, struct io_pgtable_cfg *cfg)
-{
-
-	*ptep = 0;
-
-	if (!cfg->coherent_walk)
-		__arm_lpae_sync_pte(ptep, 1, cfg);
-}
-
-static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
-			       struct iommu_iotlb_gather *gather,
-			       unsigned long iova, size_t size, size_t pgcount,
-			       int lvl, arm_lpae_iopte *ptep);
-
-static void __arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
-				phys_addr_t paddr, arm_lpae_iopte prot,
-				int lvl, int num_entries, arm_lpae_iopte *ptep)
-{
-	arm_lpae_iopte pte = prot;
-	struct io_pgtable_cfg *cfg = &data->iop.cfg;
-	size_t sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
-	int i;
-
-	if (data->iop.fmt != ARM_MALI_LPAE && lvl == ARM_LPAE_MAX_LEVELS - 1)
-		pte |= ARM_LPAE_PTE_TYPE_PAGE;
-	else
-		pte |= ARM_LPAE_PTE_TYPE_BLOCK;
-
-	for (i = 0; i < num_entries; i++)
-		ptep[i] = pte | paddr_to_iopte(paddr + i * sz, data);
-
-	if (!cfg->coherent_walk)
-		__arm_lpae_sync_pte(ptep, num_entries, cfg);
-}
-
-static int arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
-			     unsigned long iova, phys_addr_t paddr,
-			     arm_lpae_iopte prot, int lvl, int num_entries,
-			     arm_lpae_iopte *ptep)
-{
-	int i;
-
-	for (i = 0; i < num_entries; i++)
-		if (iopte_leaf(ptep[i], lvl, data->iop.fmt)) {
-			/* We require an unmap first */
-			WARN_ON(!selftest_running);
-			return -EEXIST;
-		} else if (iopte_type(ptep[i]) == ARM_LPAE_PTE_TYPE_TABLE) {
-			/*
-			 * We need to unmap and free the old table before
-			 * overwriting it with a block entry.
-			 */
-			arm_lpae_iopte *tblp;
-			size_t sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
-
-			tblp = ptep - ARM_LPAE_LVL_IDX(iova, lvl, data);
-			if (__arm_lpae_unmap(data, NULL, iova + i * sz, sz, 1,
-					     lvl, tblp) != sz) {
-				WARN_ON(1);
-				return -EINVAL;
-			}
-		}
-
-	__arm_lpae_init_pte(data, paddr, prot, lvl, num_entries, ptep);
-	return 0;
-}
-
-static arm_lpae_iopte arm_lpae_install_table(arm_lpae_iopte *table,
-					     arm_lpae_iopte *ptep,
-					     arm_lpae_iopte curr,
-					     struct arm_lpae_io_pgtable *data)
-{
-	arm_lpae_iopte old, new;
-	struct io_pgtable_cfg *cfg = &data->iop.cfg;
-
-	new = paddr_to_iopte(__pa(table), data) | ARM_LPAE_PTE_TYPE_TABLE;
-	if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_NS)
-		new |= ARM_LPAE_PTE_NSTABLE;
-
-	/*
-	 * Ensure the table itself is visible before its PTE can be.
-	 * Whilst we could get away with cmpxchg64_release below, this
-	 * doesn't have any ordering semantics when !CONFIG_SMP.
-	 */
-	dma_wmb();
-
-	old = cmpxchg64_relaxed(ptep, curr, new);
-
-	if (cfg->coherent_walk || (old & ARM_LPAE_PTE_SW_SYNC))
-		return old;
-
-	/* Even if it's not ours, there's no point waiting; just kick it */
-	__arm_lpae_sync_pte(ptep, 1, cfg);
-	if (old == curr)
-		WRITE_ONCE(*ptep, new | ARM_LPAE_PTE_SW_SYNC);
-
-	return old;
-}
-
-static int __arm_lpae_map(struct arm_lpae_io_pgtable *data, unsigned long iova,
-			  phys_addr_t paddr, size_t size, size_t pgcount,
-			  arm_lpae_iopte prot, int lvl, arm_lpae_iopte *ptep,
-			  gfp_t gfp, size_t *mapped)
-{
-	arm_lpae_iopte *cptep, pte;
-	size_t block_size = ARM_LPAE_BLOCK_SIZE(lvl, data);
-	size_t tblsz = ARM_LPAE_GRANULE(data);
-	struct io_pgtable_cfg *cfg = &data->iop.cfg;
-	int ret = 0, num_entries, max_entries, map_idx_start;
-
-	/* Find our entry at the current level */
-	map_idx_start = ARM_LPAE_LVL_IDX(iova, lvl, data);
-	ptep += map_idx_start;
-
-	/* If we can install a leaf entry at this level, then do so */
-	if (size == block_size) {
-		max_entries = ARM_LPAE_PTES_PER_TABLE(data) - map_idx_start;
-		num_entries = min_t(int, pgcount, max_entries);
-		ret = arm_lpae_init_pte(data, iova, paddr, prot, lvl, num_entries, ptep);
-		if (!ret)
-			*mapped += num_entries * size;
-
-		return ret;
-	}
-
-	/* We can't allocate tables at the final level */
-	if (WARN_ON(lvl >= ARM_LPAE_MAX_LEVELS - 1))
-		return -EINVAL;
-
-	/* Grab a pointer to the next level */
-	pte = READ_ONCE(*ptep);
-	if (!pte) {
-		cptep = __arm_lpae_alloc_pages(tblsz, gfp, cfg);
-		if (!cptep)
-			return -ENOMEM;
-
-		pte = arm_lpae_install_table(cptep, ptep, 0, data);
-		if (pte)
-			__arm_lpae_free_pages(cptep, tblsz, cfg);
-	} else if (!cfg->coherent_walk && !(pte & ARM_LPAE_PTE_SW_SYNC)) {
-		__arm_lpae_sync_pte(ptep, 1, cfg);
-	}
-
-	if (pte && !iopte_leaf(pte, lvl, data->iop.fmt)) {
-		cptep = iopte_deref(pte, data);
-	} else if (pte) {
-		/* We require an unmap first */
-		WARN_ON(!selftest_running);
-		return -EEXIST;
-	}
-
-	/* Rinse, repeat */
-	return __arm_lpae_map(data, iova, paddr, size, pgcount, prot, lvl + 1,
-			      cptep, gfp, mapped);
-}
-
-static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
-					   int prot)
-{
-	arm_lpae_iopte pte;
-
-	if (data->iop.fmt == ARM_64_LPAE_S1 ||
-	    data->iop.fmt == ARM_32_LPAE_S1) {
-		pte = ARM_LPAE_PTE_nG;
-		if (!(prot & IOMMU_WRITE) && (prot & IOMMU_READ))
-			pte |= ARM_LPAE_PTE_AP_RDONLY;
-		if (!(prot & IOMMU_PRIV))
-			pte |= ARM_LPAE_PTE_AP_UNPRIV;
-	} else {
-		pte = ARM_LPAE_PTE_HAP_FAULT;
-		if (prot & IOMMU_READ)
-			pte |= ARM_LPAE_PTE_HAP_READ;
-		if (prot & IOMMU_WRITE)
-			pte |= ARM_LPAE_PTE_HAP_WRITE;
-	}
-
-	/*
-	 * Note that this logic is structured to accommodate Mali LPAE
-	 * having stage-1-like attributes but stage-2-like permissions.
-	 */
-	if (data->iop.fmt == ARM_64_LPAE_S2 ||
-	    data->iop.fmt == ARM_32_LPAE_S2) {
-		if (prot & IOMMU_MMIO)
-			pte |= ARM_LPAE_PTE_MEMATTR_DEV;
-		else if (prot & IOMMU_CACHE)
-			pte |= ARM_LPAE_PTE_MEMATTR_OIWB;
-		else
-			pte |= ARM_LPAE_PTE_MEMATTR_NC;
-	} else {
-		if (prot & IOMMU_MMIO)
-			pte |= (ARM_LPAE_MAIR_ATTR_IDX_DEV
-				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
-		else if (prot & IOMMU_CACHE)
-			pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE
-				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
-	}
-
-	/*
-	 * Also Mali has its own notions of shareability wherein its Inner
-	 * domain covers the cores within the GPU, and its Outer domain is
-	 * "outside the GPU" (i.e. either the Inner or System domain in CPU
-	 * terms, depending on coherency).
-	 */
-	if (prot & IOMMU_CACHE && data->iop.fmt != ARM_MALI_LPAE)
-		pte |= ARM_LPAE_PTE_SH_IS;
-	else
-		pte |= ARM_LPAE_PTE_SH_OS;
-
-	if (prot & IOMMU_NOEXEC)
-		pte |= ARM_LPAE_PTE_XN;
-
-	if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_NS)
-		pte |= ARM_LPAE_PTE_NS;
-
-	if (data->iop.fmt != ARM_MALI_LPAE)
-		pte |= ARM_LPAE_PTE_AF;
-
-	return pte;
-}
-
-static int arm_lpae_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
-			      phys_addr_t paddr, size_t pgsize, size_t pgcount,
-			      int iommu_prot, gfp_t gfp, size_t *mapped)
-{
-	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
-	struct io_pgtable_cfg *cfg = &data->iop.cfg;
-	arm_lpae_iopte *ptep = data->pgd;
-	int ret, lvl = data->start_level;
-	arm_lpae_iopte prot;
-	long iaext = (s64)iova >> cfg->ias;
-
-	if (WARN_ON(!pgsize || (pgsize & cfg->pgsize_bitmap) != pgsize))
-		return -EINVAL;
-
-	if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
-		iaext = ~iaext;
-	if (WARN_ON(iaext || paddr >> cfg->oas))
-		return -ERANGE;
-
-	/* If no access, then nothing to do */
-	if (!(iommu_prot & (IOMMU_READ | IOMMU_WRITE)))
-		return 0;
-
-	prot = arm_lpae_prot_to_pte(data, iommu_prot);
-	ret = __arm_lpae_map(data, iova, paddr, pgsize, pgcount, prot, lvl,
-			     ptep, gfp, mapped);
-	/*
-	 * Synchronise all PTE updates for the new mapping before there's
-	 * a chance for anything to kick off a table walk for the new iova.
-	 */
-	wmb();
-
-	return ret;
-}
-
-static void __arm_lpae_free_pgtable(struct arm_lpae_io_pgtable *data, int lvl,
-				    arm_lpae_iopte *ptep)
-{
-	arm_lpae_iopte *start, *end;
-	unsigned long table_size;
-
-	if (lvl == data->start_level)
-		table_size = ARM_LPAE_PGD_SIZE(data);
-	else
-		table_size = ARM_LPAE_GRANULE(data);
-
-	start = ptep;
-
-	/* Only leaf entries at the last level */
-	if (lvl == ARM_LPAE_MAX_LEVELS - 1)
-		end = ptep;
-	else
-		end = (void *)ptep + table_size;
-
-	while (ptep != end) {
-		arm_lpae_iopte pte = *ptep++;
-
-		if (!pte || iopte_leaf(pte, lvl, data->iop.fmt))
-			continue;
-
-		__arm_lpae_free_pgtable(data, lvl + 1, iopte_deref(pte, data));
-	}
-
-	__arm_lpae_free_pages(start, table_size, &data->iop.cfg);
-}
-
 static void arm_lpae_free_pgtable(struct io_pgtable *iop)
 {
 	struct arm_lpae_io_pgtable *data = io_pgtable_to_data(iop);
@@ -535,182 +91,6 @@ static void arm_lpae_free_pgtable(struct io_pgtable *iop)
 	kfree(data);
 }
 
-static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
-				       struct iommu_iotlb_gather *gather,
-				       unsigned long iova, size_t size,
-				       arm_lpae_iopte blk_pte, int lvl,
-				       arm_lpae_iopte *ptep, size_t pgcount)
-{
-	struct io_pgtable_cfg *cfg = &data->iop.cfg;
-	arm_lpae_iopte pte, *tablep;
-	phys_addr_t blk_paddr;
-	size_t tablesz = ARM_LPAE_GRANULE(data);
-	size_t split_sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
-	int ptes_per_table = ARM_LPAE_PTES_PER_TABLE(data);
-	int i, unmap_idx_start = -1, num_entries = 0, max_entries;
-
-	if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
-		return 0;
-
-	tablep = __arm_lpae_alloc_pages(tablesz, GFP_ATOMIC, cfg);
-	if (!tablep)
-		return 0; /* Bytes unmapped */
-
-	if (size == split_sz) {
-		unmap_idx_start = ARM_LPAE_LVL_IDX(iova, lvl, data);
-		max_entries = ptes_per_table - unmap_idx_start;
-		num_entries = min_t(int, pgcount, max_entries);
-	}
-
-	blk_paddr = iopte_to_paddr(blk_pte, data);
-	pte = iopte_prot(blk_pte);
-
-	for (i = 0; i < ptes_per_table; i++, blk_paddr += split_sz) {
-		/* Unmap! */
-		if (i >= unmap_idx_start && i < (unmap_idx_start + num_entries))
-			continue;
-
-		__arm_lpae_init_pte(data, blk_paddr, pte, lvl, 1, &tablep[i]);
-	}
-
-	pte = arm_lpae_install_table(tablep, ptep, blk_pte, data);
-	if (pte != blk_pte) {
-		__arm_lpae_free_pages(tablep, tablesz, cfg);
-		/*
-		 * We may race against someone unmapping another part of this
-		 * block, but anything else is invalid. We can't misinterpret
-		 * a page entry here since we're never at the last level.
-		 */
-		if (iopte_type(pte) != ARM_LPAE_PTE_TYPE_TABLE)
-			return 0;
-
-		tablep = iopte_deref(pte, data);
-	} else if (unmap_idx_start >= 0) {
-		for (i = 0; i < num_entries; i++)
-			io_pgtable_tlb_add_page(&data->iop, gather, iova + i * size, size);
-
-		return num_entries * size;
-	}
-
-	return __arm_lpae_unmap(data, gather, iova, size, pgcount, lvl, tablep);
-}
-
-static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
-			       struct iommu_iotlb_gather *gather,
-			       unsigned long iova, size_t size, size_t pgcount,
-			       int lvl, arm_lpae_iopte *ptep)
-{
-	arm_lpae_iopte pte;
-	struct io_pgtable *iop = &data->iop;
-	int i = 0, num_entries, max_entries, unmap_idx_start;
-
-	/* Something went horribly wrong and we ran out of page table */
-	if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
-		return 0;
-
-	unmap_idx_start = ARM_LPAE_LVL_IDX(iova, lvl, data);
-	ptep += unmap_idx_start;
-	pte = READ_ONCE(*ptep);
-	if (WARN_ON(!pte))
-		return 0;
-
-	/* If the size matches this level, we're in the right place */
-	if (size == ARM_LPAE_BLOCK_SIZE(lvl, data)) {
-		max_entries = ARM_LPAE_PTES_PER_TABLE(data) - unmap_idx_start;
-		num_entries = min_t(int, pgcount, max_entries);
-
-		while (i < num_entries) {
-			pte = READ_ONCE(*ptep);
-			if (WARN_ON(!pte))
-				break;
-
-			__arm_lpae_clear_pte(ptep, &iop->cfg);
-
-			if (!iopte_leaf(pte, lvl, iop->fmt)) {
-				/* Also flush any partial walks */
-				io_pgtable_tlb_flush_walk(iop, iova + i * size, size,
-							  ARM_LPAE_GRANULE(data));
-				__arm_lpae_free_pgtable(data, lvl + 1, iopte_deref(pte, data));
-			} else if (!iommu_iotlb_gather_queued(gather)) {
-				io_pgtable_tlb_add_page(iop, gather, iova + i * size, size);
-			}
-
-			ptep++;
-			i++;
-		}
-
-		return i * size;
-	} else if (iopte_leaf(pte, lvl, iop->fmt)) {
-		/*
-		 * Insert a table at the next level to map the old region,
-		 * minus the part we want to unmap
-		 */
-		return arm_lpae_split_blk_unmap(data, gather, iova, size, pte,
-						lvl + 1, ptep, pgcount);
-	}
-
-	/* Keep on walkin' */
-	ptep = iopte_deref(pte, data);
-	return __arm_lpae_unmap(data, gather, iova, size, pgcount, lvl + 1, ptep);
-}
-
-static size_t arm_lpae_unmap_pages(struct io_pgtable_ops *ops, unsigned long iova,
-				   size_t pgsize, size_t pgcount,
-				   struct iommu_iotlb_gather *gather)
-{
-	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
-	struct io_pgtable_cfg *cfg = &data->iop.cfg;
-	arm_lpae_iopte *ptep = data->pgd;
-	long iaext = (s64)iova >> cfg->ias;
-
-	if (WARN_ON(!pgsize || (pgsize & cfg->pgsize_bitmap) != pgsize || !pgcount))
-		return 0;
-
-	if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
-		iaext = ~iaext;
-	if (WARN_ON(iaext))
-		return 0;
-
-	return __arm_lpae_unmap(data, gather, iova, pgsize, pgcount,
-				data->start_level, ptep);
-}
-
-static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
-					 unsigned long iova)
-{
-	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
-	arm_lpae_iopte pte, *ptep = data->pgd;
-	int lvl = data->start_level;
-
-	do {
-		/* Valid IOPTE pointer? */
-		if (!ptep)
-			return 0;
-
-		/* Grab the IOPTE we're interested in */
-		ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
-		pte = READ_ONCE(*ptep);
-
-		/* Valid entry? */
-		if (!pte)
-			return 0;
-
-		/* Leaf entry? */
-		if (iopte_leaf(pte, lvl, data->iop.fmt))
-			goto found_translation;
-
-		/* Take it to the next level */
-		ptep = iopte_deref(pte, data);
-	} while (++lvl < ARM_LPAE_MAX_LEVELS);
-
-	/* Ran out of page tables to walk */
-	return 0;
-
-found_translation:
-	iova &= (ARM_LPAE_BLOCK_SIZE(lvl, data) - 1);
-	return iopte_to_paddr(pte, data) | iova;
-}
-
 static void arm_lpae_restrict_pgsizes(struct io_pgtable_cfg *cfg)
 {
 	unsigned long granule, page_sizes;
-- 
2.39.0




* [RFC PATCH 02/45] iommu/io-pgtable-arm: Split initialization
  2023-02-01 12:52 ` Jean-Philippe Brucker
@ 2023-02-01 12:52   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:52 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

Extract the configuration part from io-pgtable-arm.c and move it to
io-pgtable-arm-common.c.
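
As a rough sketch, for illustration only (not part of this patch): a
separate backend could reuse the split initialization as below. The
arm_lpae_init_pgtable_s2() and __arm_lpae_alloc_pages() calls are the
interfaces exposed by this series, while the hyp_setup_s2_pgtable()
wrapper, its static pagetable instance and the GFP flag are made up
for the example:

	#include <linux/io-pgtable-arm.h>

	/* Hypothetical embedded instance instead of a kmalloc()'d one */
	static struct arm_lpae_io_pgtable hyp_pgtable;

	static int hyp_setup_s2_pgtable(struct io_pgtable_cfg *cfg)
	{
		int ret;

		/* Fills VTCR fields, start level, bits per level and ops */
		ret = arm_lpae_init_pgtable_s2(cfg, &hyp_pgtable);
		if (ret)
			return ret;

		/* pgd allocation stays with the caller */
		hyp_pgtable.pgd = __arm_lpae_alloc_pages(ARM_LPAE_PGD_SIZE(&hyp_pgtable),
							 GFP_KERNEL, cfg);
		if (!hyp_pgtable.pgd)
			return -ENOMEM;

		return 0;
	}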

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 include/linux/io-pgtable-arm.h        |  15 +-
 drivers/iommu/io-pgtable-arm-common.c | 255 ++++++++++++++++++++++++++
 drivers/iommu/io-pgtable-arm.c        | 245 +------------------------
 3 files changed, 270 insertions(+), 245 deletions(-)

diff --git a/include/linux/io-pgtable-arm.h b/include/linux/io-pgtable-arm.h
index 594b5030b450..42202bc0ffa2 100644
--- a/include/linux/io-pgtable-arm.h
+++ b/include/linux/io-pgtable-arm.h
@@ -167,17 +167,16 @@ static inline bool iopte_leaf(arm_lpae_iopte pte, int lvl,
 #define __arm_lpae_phys_to_virt	__va
 
 /* Generic functions */
-int arm_lpae_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
-		       phys_addr_t paddr, size_t pgsize, size_t pgcount,
-		       int iommu_prot, gfp_t gfp, size_t *mapped);
-size_t arm_lpae_unmap_pages(struct io_pgtable_ops *ops, unsigned long iova,
-			    size_t pgsize, size_t pgcount,
-			    struct iommu_iotlb_gather *gather);
-phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
-				  unsigned long iova);
 void __arm_lpae_free_pgtable(struct arm_lpae_io_pgtable *data, int lvl,
 			     arm_lpae_iopte *ptep);
 
+int arm_lpae_init_pgtable(struct io_pgtable_cfg *cfg,
+			  struct arm_lpae_io_pgtable *data);
+int arm_lpae_init_pgtable_s1(struct io_pgtable_cfg *cfg,
+			     struct arm_lpae_io_pgtable *data);
+int arm_lpae_init_pgtable_s2(struct io_pgtable_cfg *cfg,
+			     struct arm_lpae_io_pgtable *data);
+
 /* Host/hyp-specific functions */
 void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp, struct io_pgtable_cfg *cfg);
 void __arm_lpae_free_pages(void *pages, size_t size, struct io_pgtable_cfg *cfg);
diff --git a/drivers/iommu/io-pgtable-arm-common.c b/drivers/iommu/io-pgtable-arm-common.c
index 74d962712d15..7340b5096499 100644
--- a/drivers/iommu/io-pgtable-arm-common.c
+++ b/drivers/iommu/io-pgtable-arm-common.c
@@ -15,6 +15,9 @@
 
 #define iopte_deref(pte, d) __arm_lpae_phys_to_virt(iopte_to_paddr(pte, d))
 
+#define ARM_LPAE_MAX_ADDR_BITS		52
+#define ARM_LPAE_S2_MAX_CONCAT_PAGES	16
+
 static arm_lpae_iopte paddr_to_iopte(phys_addr_t paddr,
 				     struct arm_lpae_io_pgtable *data)
 {
@@ -498,3 +501,255 @@ phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
 	iova &= (ARM_LPAE_BLOCK_SIZE(lvl, data) - 1);
 	return iopte_to_paddr(pte, data) | iova;
 }
+
+static void arm_lpae_restrict_pgsizes(struct io_pgtable_cfg *cfg)
+{
+	unsigned long granule, page_sizes;
+	unsigned int max_addr_bits = 48;
+
+	/*
+	 * We need to restrict the supported page sizes to match the
+	 * translation regime for a particular granule. Aim to match
+	 * the CPU page size if possible, otherwise prefer smaller sizes.
+	 * While we're at it, restrict the block sizes to match the
+	 * chosen granule.
+	 */
+	if (cfg->pgsize_bitmap & PAGE_SIZE)
+		granule = PAGE_SIZE;
+	else if (cfg->pgsize_bitmap & ~PAGE_MASK)
+		granule = 1UL << __fls(cfg->pgsize_bitmap & ~PAGE_MASK);
+	else if (cfg->pgsize_bitmap & PAGE_MASK)
+		granule = 1UL << __ffs(cfg->pgsize_bitmap & PAGE_MASK);
+	else
+		granule = 0;
+
+	switch (granule) {
+	case SZ_4K:
+		page_sizes = (SZ_4K | SZ_2M | SZ_1G);
+		break;
+	case SZ_16K:
+		page_sizes = (SZ_16K | SZ_32M);
+		break;
+	case SZ_64K:
+		max_addr_bits = 52;
+		page_sizes = (SZ_64K | SZ_512M);
+		if (cfg->oas > 48)
+			page_sizes |= 1ULL << 42; /* 4TB */
+		break;
+	default:
+		page_sizes = 0;
+	}
+
+	cfg->pgsize_bitmap &= page_sizes;
+	cfg->ias = min(cfg->ias, max_addr_bits);
+	cfg->oas = min(cfg->oas, max_addr_bits);
+}
+
+int arm_lpae_init_pgtable(struct io_pgtable_cfg *cfg,
+			  struct arm_lpae_io_pgtable *data)
+{
+	int levels, va_bits, pg_shift;
+
+	arm_lpae_restrict_pgsizes(cfg);
+
+	if (!(cfg->pgsize_bitmap & (SZ_4K | SZ_16K | SZ_64K)))
+		return -EINVAL;
+
+	if (cfg->ias > ARM_LPAE_MAX_ADDR_BITS)
+		return -E2BIG;
+
+	if (cfg->oas > ARM_LPAE_MAX_ADDR_BITS)
+		return -E2BIG;
+
+	pg_shift = __ffs(cfg->pgsize_bitmap);
+	data->bits_per_level = pg_shift - ilog2(sizeof(arm_lpae_iopte));
+
+	va_bits = cfg->ias - pg_shift;
+	levels = DIV_ROUND_UP(va_bits, data->bits_per_level);
+	data->start_level = ARM_LPAE_MAX_LEVELS - levels;
+
+	/* Calculate the actual size of our pgd (without concatenation) */
+	data->pgd_bits = va_bits - (data->bits_per_level * (levels - 1));
+
+	data->iop.ops = (struct io_pgtable_ops) {
+		.map_pages	= arm_lpae_map_pages,
+		.unmap_pages	= arm_lpae_unmap_pages,
+		.iova_to_phys	= arm_lpae_iova_to_phys,
+	};
+
+	return 0;
+}
+
+int arm_lpae_init_pgtable_s1(struct io_pgtable_cfg *cfg,
+			     struct arm_lpae_io_pgtable *data)
+{
+	u64 reg;
+	int ret;
+	typeof(&cfg->arm_lpae_s1_cfg.tcr) tcr = &cfg->arm_lpae_s1_cfg.tcr;
+	bool tg1;
+
+	if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS |
+			    IO_PGTABLE_QUIRK_ARM_TTBR1 |
+			    IO_PGTABLE_QUIRK_ARM_OUTER_WBWA))
+		return -EINVAL;
+
+	ret = arm_lpae_init_pgtable(cfg, data);
+	if (ret)
+		return ret;
+
+	/* TCR */
+	if (cfg->coherent_walk) {
+		tcr->sh = ARM_LPAE_TCR_SH_IS;
+		tcr->irgn = ARM_LPAE_TCR_RGN_WBWA;
+		tcr->orgn = ARM_LPAE_TCR_RGN_WBWA;
+		if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA)
+			return -EINVAL;
+	} else {
+		tcr->sh = ARM_LPAE_TCR_SH_OS;
+		tcr->irgn = ARM_LPAE_TCR_RGN_NC;
+		if (!(cfg->quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA))
+			tcr->orgn = ARM_LPAE_TCR_RGN_NC;
+		else
+			tcr->orgn = ARM_LPAE_TCR_RGN_WBWA;
+	}
+
+	tg1 = cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1;
+	switch (ARM_LPAE_GRANULE(data)) {
+	case SZ_4K:
+		tcr->tg = tg1 ? ARM_LPAE_TCR_TG1_4K : ARM_LPAE_TCR_TG0_4K;
+		break;
+	case SZ_16K:
+		tcr->tg = tg1 ? ARM_LPAE_TCR_TG1_16K : ARM_LPAE_TCR_TG0_16K;
+		break;
+	case SZ_64K:
+		tcr->tg = tg1 ? ARM_LPAE_TCR_TG1_64K : ARM_LPAE_TCR_TG0_64K;
+		break;
+	}
+
+	switch (cfg->oas) {
+	case 32:
+		tcr->ips = ARM_LPAE_TCR_PS_32_BIT;
+		break;
+	case 36:
+		tcr->ips = ARM_LPAE_TCR_PS_36_BIT;
+		break;
+	case 40:
+		tcr->ips = ARM_LPAE_TCR_PS_40_BIT;
+		break;
+	case 42:
+		tcr->ips = ARM_LPAE_TCR_PS_42_BIT;
+		break;
+	case 44:
+		tcr->ips = ARM_LPAE_TCR_PS_44_BIT;
+		break;
+	case 48:
+		tcr->ips = ARM_LPAE_TCR_PS_48_BIT;
+		break;
+	case 52:
+		tcr->ips = ARM_LPAE_TCR_PS_52_BIT;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	tcr->tsz = 64ULL - cfg->ias;
+
+	/* MAIRs */
+	reg = (ARM_LPAE_MAIR_ATTR_NC
+	       << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_NC)) |
+	      (ARM_LPAE_MAIR_ATTR_WBRWA
+	       << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_CACHE)) |
+	      (ARM_LPAE_MAIR_ATTR_DEVICE
+	       << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_DEV)) |
+	      (ARM_LPAE_MAIR_ATTR_INC_OWBRWA
+	       << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE));
+
+	cfg->arm_lpae_s1_cfg.mair = reg;
+	return 0;
+}
+
+int arm_lpae_init_pgtable_s2(struct io_pgtable_cfg *cfg,
+			     struct arm_lpae_io_pgtable *data)
+{
+	u64 sl;
+	int ret;
+	typeof(&cfg->arm_lpae_s2_cfg.vtcr) vtcr = &cfg->arm_lpae_s2_cfg.vtcr;
+
+	/* The NS quirk doesn't apply at stage 2 */
+	if (cfg->quirks)
+		return -EINVAL;
+
+	ret = arm_lpae_init_pgtable(cfg, data);
+	if (ret)
+		return ret;
+
+	/*
+	 * Concatenate PGDs at level 1 if possible in order to reduce
+	 * the depth of the stage-2 walk.
+	 */
+	if (data->start_level == 0) {
+		unsigned long pgd_pages;
+
+		pgd_pages = ARM_LPAE_PGD_SIZE(data) / sizeof(arm_lpae_iopte);
+		if (pgd_pages <= ARM_LPAE_S2_MAX_CONCAT_PAGES) {
+			data->pgd_bits += data->bits_per_level;
+			data->start_level++;
+		}
+	}
+
+	/* VTCR */
+	if (cfg->coherent_walk) {
+		vtcr->sh = ARM_LPAE_TCR_SH_IS;
+		vtcr->irgn = ARM_LPAE_TCR_RGN_WBWA;
+		vtcr->orgn = ARM_LPAE_TCR_RGN_WBWA;
+	} else {
+		vtcr->sh = ARM_LPAE_TCR_SH_OS;
+		vtcr->irgn = ARM_LPAE_TCR_RGN_NC;
+		vtcr->orgn = ARM_LPAE_TCR_RGN_NC;
+	}
+
+	sl = data->start_level;
+
+	switch (ARM_LPAE_GRANULE(data)) {
+	case SZ_4K:
+		vtcr->tg = ARM_LPAE_TCR_TG0_4K;
+		sl++; /* SL0 format is different for 4K granule size */
+		break;
+	case SZ_16K:
+		vtcr->tg = ARM_LPAE_TCR_TG0_16K;
+		break;
+	case SZ_64K:
+		vtcr->tg = ARM_LPAE_TCR_TG0_64K;
+		break;
+	}
+
+	switch (cfg->oas) {
+	case 32:
+		vtcr->ps = ARM_LPAE_TCR_PS_32_BIT;
+		break;
+	case 36:
+		vtcr->ps = ARM_LPAE_TCR_PS_36_BIT;
+		break;
+	case 40:
+		vtcr->ps = ARM_LPAE_TCR_PS_40_BIT;
+		break;
+	case 42:
+		vtcr->ps = ARM_LPAE_TCR_PS_42_BIT;
+		break;
+	case 44:
+		vtcr->ps = ARM_LPAE_TCR_PS_44_BIT;
+		break;
+	case 48:
+		vtcr->ps = ARM_LPAE_TCR_PS_48_BIT;
+		break;
+	case 52:
+		vtcr->ps = ARM_LPAE_TCR_PS_52_BIT;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	vtcr->tsz = 64ULL - cfg->ias;
+	vtcr->sl = ~sl & ARM_LPAE_VTCR_SL0_MASK;
+	return 0;
+}
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index db42aed6ad7b..b2b188bb86b3 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -21,9 +21,6 @@
 
 #include <asm/barrier.h>
 
-#define ARM_LPAE_MAX_ADDR_BITS		52
-#define ARM_LPAE_S2_MAX_CONCAT_PAGES	16
-
 bool selftest_running = false;
 
 static dma_addr_t __arm_lpae_dma_addr(void *pages)
@@ -91,174 +88,17 @@ static void arm_lpae_free_pgtable(struct io_pgtable *iop)
 	kfree(data);
 }
 
-static void arm_lpae_restrict_pgsizes(struct io_pgtable_cfg *cfg)
-{
-	unsigned long granule, page_sizes;
-	unsigned int max_addr_bits = 48;
-
-	/*
-	 * We need to restrict the supported page sizes to match the
-	 * translation regime for a particular granule. Aim to match
-	 * the CPU page size if possible, otherwise prefer smaller sizes.
-	 * While we're at it, restrict the block sizes to match the
-	 * chosen granule.
-	 */
-	if (cfg->pgsize_bitmap & PAGE_SIZE)
-		granule = PAGE_SIZE;
-	else if (cfg->pgsize_bitmap & ~PAGE_MASK)
-		granule = 1UL << __fls(cfg->pgsize_bitmap & ~PAGE_MASK);
-	else if (cfg->pgsize_bitmap & PAGE_MASK)
-		granule = 1UL << __ffs(cfg->pgsize_bitmap & PAGE_MASK);
-	else
-		granule = 0;
-
-	switch (granule) {
-	case SZ_4K:
-		page_sizes = (SZ_4K | SZ_2M | SZ_1G);
-		break;
-	case SZ_16K:
-		page_sizes = (SZ_16K | SZ_32M);
-		break;
-	case SZ_64K:
-		max_addr_bits = 52;
-		page_sizes = (SZ_64K | SZ_512M);
-		if (cfg->oas > 48)
-			page_sizes |= 1ULL << 42; /* 4TB */
-		break;
-	default:
-		page_sizes = 0;
-	}
-
-	cfg->pgsize_bitmap &= page_sizes;
-	cfg->ias = min(cfg->ias, max_addr_bits);
-	cfg->oas = min(cfg->oas, max_addr_bits);
-}
-
-static struct arm_lpae_io_pgtable *
-arm_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg)
-{
-	struct arm_lpae_io_pgtable *data;
-	int levels, va_bits, pg_shift;
-
-	arm_lpae_restrict_pgsizes(cfg);
-
-	if (!(cfg->pgsize_bitmap & (SZ_4K | SZ_16K | SZ_64K)))
-		return NULL;
-
-	if (cfg->ias > ARM_LPAE_MAX_ADDR_BITS)
-		return NULL;
-
-	if (cfg->oas > ARM_LPAE_MAX_ADDR_BITS)
-		return NULL;
-
-	data = kmalloc(sizeof(*data), GFP_KERNEL);
-	if (!data)
-		return NULL;
-
-	pg_shift = __ffs(cfg->pgsize_bitmap);
-	data->bits_per_level = pg_shift - ilog2(sizeof(arm_lpae_iopte));
-
-	va_bits = cfg->ias - pg_shift;
-	levels = DIV_ROUND_UP(va_bits, data->bits_per_level);
-	data->start_level = ARM_LPAE_MAX_LEVELS - levels;
-
-	/* Calculate the actual size of our pgd (without concatenation) */
-	data->pgd_bits = va_bits - (data->bits_per_level * (levels - 1));
-
-	data->iop.ops = (struct io_pgtable_ops) {
-		.map_pages	= arm_lpae_map_pages,
-		.unmap_pages	= arm_lpae_unmap_pages,
-		.iova_to_phys	= arm_lpae_iova_to_phys,
-	};
-
-	return data;
-}
-
 static struct io_pgtable *
 arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
 {
-	u64 reg;
 	struct arm_lpae_io_pgtable *data;
-	typeof(&cfg->arm_lpae_s1_cfg.tcr) tcr = &cfg->arm_lpae_s1_cfg.tcr;
-	bool tg1;
-
-	if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS |
-			    IO_PGTABLE_QUIRK_ARM_TTBR1 |
-			    IO_PGTABLE_QUIRK_ARM_OUTER_WBWA))
-		return NULL;
 
-	data = arm_lpae_alloc_pgtable(cfg);
+	data = kzalloc(sizeof(*data), GFP_KERNEL);
 	if (!data)
 		return NULL;
 
-	/* TCR */
-	if (cfg->coherent_walk) {
-		tcr->sh = ARM_LPAE_TCR_SH_IS;
-		tcr->irgn = ARM_LPAE_TCR_RGN_WBWA;
-		tcr->orgn = ARM_LPAE_TCR_RGN_WBWA;
-		if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA)
-			goto out_free_data;
-	} else {
-		tcr->sh = ARM_LPAE_TCR_SH_OS;
-		tcr->irgn = ARM_LPAE_TCR_RGN_NC;
-		if (!(cfg->quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA))
-			tcr->orgn = ARM_LPAE_TCR_RGN_NC;
-		else
-			tcr->orgn = ARM_LPAE_TCR_RGN_WBWA;
-	}
-
-	tg1 = cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1;
-	switch (ARM_LPAE_GRANULE(data)) {
-	case SZ_4K:
-		tcr->tg = tg1 ? ARM_LPAE_TCR_TG1_4K : ARM_LPAE_TCR_TG0_4K;
-		break;
-	case SZ_16K:
-		tcr->tg = tg1 ? ARM_LPAE_TCR_TG1_16K : ARM_LPAE_TCR_TG0_16K;
-		break;
-	case SZ_64K:
-		tcr->tg = tg1 ? ARM_LPAE_TCR_TG1_64K : ARM_LPAE_TCR_TG0_64K;
-		break;
-	}
-
-	switch (cfg->oas) {
-	case 32:
-		tcr->ips = ARM_LPAE_TCR_PS_32_BIT;
-		break;
-	case 36:
-		tcr->ips = ARM_LPAE_TCR_PS_36_BIT;
-		break;
-	case 40:
-		tcr->ips = ARM_LPAE_TCR_PS_40_BIT;
-		break;
-	case 42:
-		tcr->ips = ARM_LPAE_TCR_PS_42_BIT;
-		break;
-	case 44:
-		tcr->ips = ARM_LPAE_TCR_PS_44_BIT;
-		break;
-	case 48:
-		tcr->ips = ARM_LPAE_TCR_PS_48_BIT;
-		break;
-	case 52:
-		tcr->ips = ARM_LPAE_TCR_PS_52_BIT;
-		break;
-	default:
+	if (arm_lpae_init_pgtable_s1(cfg, data))
 		goto out_free_data;
-	}
-
-	tcr->tsz = 64ULL - cfg->ias;
-
-	/* MAIRs */
-	reg = (ARM_LPAE_MAIR_ATTR_NC
-	       << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_NC)) |
-	      (ARM_LPAE_MAIR_ATTR_WBRWA
-	       << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_CACHE)) |
-	      (ARM_LPAE_MAIR_ATTR_DEVICE
-	       << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_DEV)) |
-	      (ARM_LPAE_MAIR_ATTR_INC_OWBRWA
-	       << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE));
-
-	cfg->arm_lpae_s1_cfg.mair = reg;
 
 	/* Looking good; allocate a pgd */
 	data->pgd = __arm_lpae_alloc_pages(ARM_LPAE_PGD_SIZE(data),
@@ -281,86 +121,14 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
 static struct io_pgtable *
 arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
 {
-	u64 sl;
 	struct arm_lpae_io_pgtable *data;
-	typeof(&cfg->arm_lpae_s2_cfg.vtcr) vtcr = &cfg->arm_lpae_s2_cfg.vtcr;
-
-	/* The NS quirk doesn't apply at stage 2 */
-	if (cfg->quirks)
-		return NULL;
 
-	data = arm_lpae_alloc_pgtable(cfg);
+	data = kzalloc(sizeof(*data), GFP_KERNEL);
 	if (!data)
 		return NULL;
 
-	/*
-	 * Concatenate PGDs at level 1 if possible in order to reduce
-	 * the depth of the stage-2 walk.
-	 */
-	if (data->start_level == 0) {
-		unsigned long pgd_pages;
-
-		pgd_pages = ARM_LPAE_PGD_SIZE(data) / sizeof(arm_lpae_iopte);
-		if (pgd_pages <= ARM_LPAE_S2_MAX_CONCAT_PAGES) {
-			data->pgd_bits += data->bits_per_level;
-			data->start_level++;
-		}
-	}
-
-	/* VTCR */
-	if (cfg->coherent_walk) {
-		vtcr->sh = ARM_LPAE_TCR_SH_IS;
-		vtcr->irgn = ARM_LPAE_TCR_RGN_WBWA;
-		vtcr->orgn = ARM_LPAE_TCR_RGN_WBWA;
-	} else {
-		vtcr->sh = ARM_LPAE_TCR_SH_OS;
-		vtcr->irgn = ARM_LPAE_TCR_RGN_NC;
-		vtcr->orgn = ARM_LPAE_TCR_RGN_NC;
-	}
-
-	sl = data->start_level;
-
-	switch (ARM_LPAE_GRANULE(data)) {
-	case SZ_4K:
-		vtcr->tg = ARM_LPAE_TCR_TG0_4K;
-		sl++; /* SL0 format is different for 4K granule size */
-		break;
-	case SZ_16K:
-		vtcr->tg = ARM_LPAE_TCR_TG0_16K;
-		break;
-	case SZ_64K:
-		vtcr->tg = ARM_LPAE_TCR_TG0_64K;
-		break;
-	}
-
-	switch (cfg->oas) {
-	case 32:
-		vtcr->ps = ARM_LPAE_TCR_PS_32_BIT;
-		break;
-	case 36:
-		vtcr->ps = ARM_LPAE_TCR_PS_36_BIT;
-		break;
-	case 40:
-		vtcr->ps = ARM_LPAE_TCR_PS_40_BIT;
-		break;
-	case 42:
-		vtcr->ps = ARM_LPAE_TCR_PS_42_BIT;
-		break;
-	case 44:
-		vtcr->ps = ARM_LPAE_TCR_PS_44_BIT;
-		break;
-	case 48:
-		vtcr->ps = ARM_LPAE_TCR_PS_48_BIT;
-		break;
-	case 52:
-		vtcr->ps = ARM_LPAE_TCR_PS_52_BIT;
-		break;
-	default:
+	if (arm_lpae_init_pgtable_s2(cfg, data))
 		goto out_free_data;
-	}
-
-	vtcr->tsz = 64ULL - cfg->ias;
-	vtcr->sl = ~sl & ARM_LPAE_VTCR_SL0_MASK;
 
 	/* Allocate pgd pages */
 	data->pgd = __arm_lpae_alloc_pages(ARM_LPAE_PGD_SIZE(data),
@@ -414,10 +182,13 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
 
 	cfg->pgsize_bitmap &= (SZ_4K | SZ_2M | SZ_1G);
 
-	data = arm_lpae_alloc_pgtable(cfg);
+	data = kzalloc(sizeof(*data), GFP_KERNEL);
 	if (!data)
 		return NULL;
 
+	if (arm_lpae_init_pgtable(cfg, data))
+		return NULL;
+
 	/* Mali seems to need a full 4-level table regardless of IAS */
 	if (data->start_level > 0) {
 		data->start_level = 0;
-- 
2.39.0



* [RFC PATCH 03/45] iommu/io-pgtable: Move fmt into io_pgtable_cfg
  2023-02-01 12:52 ` Jean-Philippe Brucker
@ 2023-02-01 12:52   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:52 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

When passing the I/O pagetable configuration around and adding new
operations, it will be slightly more convenient to have fmt be part of
the config structure rather than a separate parameter.
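
For illustration only (not part of the patch), the change to the calling
convention looks like this for a stage-1 user; example_alloc() and its
arguments are hypothetical, and the alloc_io_pgtable_ops() call mirrors
the arm-smmu-v3 hunk below:

	#include <linux/io-pgtable.h>

	static struct io_pgtable_ops *example_alloc(struct io_pgtable_cfg *cfg,
						    void *cookie)
	{
		/*
		 * Before: alloc_io_pgtable_ops(ARM_64_LPAE_S1, cfg, cookie);
		 * After: the format travels inside the config.
		 */
		cfg->fmt = ARM_64_LPAE_S1;
		return alloc_io_pgtable_ops(cfg, cookie);
	}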

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 include/linux/io-pgtable.h                  |  8 +++----
 drivers/gpu/drm/msm/msm_iommu.c             |  3 +--
 drivers/gpu/drm/panfrost/panfrost_mmu.c     |  4 ++--
 drivers/iommu/amd/iommu.c                   |  3 ++-
 drivers/iommu/apple-dart.c                  |  4 ++--
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  3 ++-
 drivers/iommu/arm/arm-smmu/arm-smmu.c       |  3 ++-
 drivers/iommu/arm/arm-smmu/qcom_iommu.c     |  3 ++-
 drivers/iommu/io-pgtable-arm-common.c       | 26 ++++++++++-----------
 drivers/iommu/io-pgtable-arm-v7s.c          |  3 ++-
 drivers/iommu/io-pgtable-arm.c              |  3 ++-
 drivers/iommu/io-pgtable-dart.c             |  8 +++----
 drivers/iommu/io-pgtable.c                  | 10 ++++----
 drivers/iommu/ipmmu-vmsa.c                  |  4 ++--
 drivers/iommu/msm_iommu.c                   |  3 ++-
 drivers/iommu/mtk_iommu.c                   |  3 ++-
 16 files changed, 47 insertions(+), 44 deletions(-)

diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index 1b7a44b35616..1b0c26241a78 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -49,6 +49,7 @@ struct iommu_flush_ops {
 /**
  * struct io_pgtable_cfg - Configuration data for a set of page tables.
  *
+ * @fmt:           Format used for these page tables
  * @quirks:        A bitmap of hardware quirks that require some special
  *                 action by the low-level page table allocator.
  * @pgsize_bitmap: A bitmap of page sizes supported by this set of page
@@ -62,6 +63,7 @@ struct iommu_flush_ops {
  *                 page table walker.
  */
 struct io_pgtable_cfg {
+	enum io_pgtable_fmt		fmt;
 	/*
 	 * IO_PGTABLE_QUIRK_ARM_NS: (ARM formats) Set NS and NSTABLE bits in
 	 *	stage 1 PTEs, for hardware which insists on validating them
@@ -171,15 +173,13 @@ struct io_pgtable_ops {
 /**
  * alloc_io_pgtable_ops() - Allocate a page table allocator for use by an IOMMU.
  *
- * @fmt:    The page table format.
  * @cfg:    The page table configuration. This will be modified to represent
  *          the configuration actually provided by the allocator (e.g. the
  *          pgsize_bitmap may be restricted).
  * @cookie: An opaque token provided by the IOMMU driver and passed back to
  *          the callback routines in cfg->tlb.
  */
-struct io_pgtable_ops *alloc_io_pgtable_ops(enum io_pgtable_fmt fmt,
-					    struct io_pgtable_cfg *cfg,
+struct io_pgtable_ops *alloc_io_pgtable_ops(struct io_pgtable_cfg *cfg,
 					    void *cookie);
 
 /**
@@ -199,14 +199,12 @@ void free_io_pgtable_ops(struct io_pgtable_ops *ops);
 /**
  * struct io_pgtable - Internal structure describing a set of page tables.
  *
- * @fmt:    The page table format.
  * @cookie: An opaque token provided by the IOMMU driver and passed back to
  *          any callback routines.
  * @cfg:    A copy of the page table configuration.
  * @ops:    The page table operations in use for this set of page tables.
  */
 struct io_pgtable {
-	enum io_pgtable_fmt	fmt;
 	void			*cookie;
 	struct io_pgtable_cfg	cfg;
 	struct io_pgtable_ops	ops;
diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index c2507582ecf3..e9c6f281e3dd 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -258,8 +258,7 @@ struct msm_mmu *msm_iommu_pagetable_create(struct msm_mmu *parent)
 	ttbr0_cfg.quirks &= ~IO_PGTABLE_QUIRK_ARM_TTBR1;
 	ttbr0_cfg.tlb = &null_tlb_ops;
 
-	pagetable->pgtbl_ops = alloc_io_pgtable_ops(ARM_64_LPAE_S1,
-		&ttbr0_cfg, iommu->domain);
+	pagetable->pgtbl_ops = alloc_io_pgtable_ops(&ttbr0_cfg, iommu->domain);
 
 	if (!pagetable->pgtbl_ops) {
 		kfree(pagetable);
diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 4e83a1891f3e..31bdb5d46244 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -622,6 +622,7 @@ struct panfrost_mmu *panfrost_mmu_ctx_create(struct panfrost_device *pfdev)
 	mmu->as = -1;
 
 	mmu->pgtbl_cfg = (struct io_pgtable_cfg) {
+		.fmt		= ARM_MALI_LPAE,
 		.pgsize_bitmap	= SZ_4K | SZ_2M,
 		.ias		= FIELD_GET(0xff, pfdev->features.mmu_features),
 		.oas		= FIELD_GET(0xff00, pfdev->features.mmu_features),
@@ -630,8 +631,7 @@ struct panfrost_mmu *panfrost_mmu_ctx_create(struct panfrost_device *pfdev)
 		.iommu_dev	= pfdev->dev,
 	};
 
-	mmu->pgtbl_ops = alloc_io_pgtable_ops(ARM_MALI_LPAE, &mmu->pgtbl_cfg,
-					      mmu);
+	mmu->pgtbl_ops = alloc_io_pgtable_ops(&mmu->pgtbl_cfg, mmu);
 	if (!mmu->pgtbl_ops) {
 		kfree(mmu);
 		return ERR_PTR(-EINVAL);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index cbeaab55c0db..7efb6b467041 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2072,7 +2072,8 @@ static struct protection_domain *protection_domain_alloc(unsigned int type)
 	if (ret)
 		goto out_err;
 
-	pgtbl_ops = alloc_io_pgtable_ops(pgtable, &domain->iop.pgtbl_cfg, domain);
+	domain->iop.pgtbl_cfg.fmt = pgtable;
+	pgtbl_ops = alloc_io_pgtable_ops(&domain->iop.pgtbl_cfg, domain);
 	if (!pgtbl_ops) {
 		domain_id_free(domain->id);
 		goto out_err;
diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c
index 4f4a323be0d0..571f948add7c 100644
--- a/drivers/iommu/apple-dart.c
+++ b/drivers/iommu/apple-dart.c
@@ -427,6 +427,7 @@ static int apple_dart_finalize_domain(struct iommu_domain *domain,
 	}
 
 	pgtbl_cfg = (struct io_pgtable_cfg){
+		.fmt = dart->hw->fmt,
 		.pgsize_bitmap = dart->pgsize,
 		.ias = 32,
 		.oas = dart->hw->oas,
@@ -434,8 +435,7 @@ static int apple_dart_finalize_domain(struct iommu_domain *domain,
 		.iommu_dev = dart->dev,
 	};
 
-	dart_domain->pgtbl_ops =
-		alloc_io_pgtable_ops(dart->hw->fmt, &pgtbl_cfg, domain);
+	dart_domain->pgtbl_ops = alloc_io_pgtable_ops(&pgtbl_cfg, domain);
 	if (!dart_domain->pgtbl_ops) {
 		ret = -ENOMEM;
 		goto done;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index ab160198edd6..c033b23ca4b2 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2209,6 +2209,7 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain,
 	}
 
 	pgtbl_cfg = (struct io_pgtable_cfg) {
+		.fmt		= fmt,
 		.pgsize_bitmap	= smmu->pgsize_bitmap,
 		.ias		= ias,
 		.oas		= oas,
@@ -2217,7 +2218,7 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain,
 		.iommu_dev	= smmu->dev,
 	};
 
-	pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain);
+	pgtbl_ops = alloc_io_pgtable_ops(&pgtbl_cfg, smmu_domain);
 	if (!pgtbl_ops)
 		return -ENOMEM;
 
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 719fbca1fe52..f230d2ce977a 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -747,6 +747,7 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 		cfg->asid = cfg->cbndx;
 
 	pgtbl_cfg = (struct io_pgtable_cfg) {
+		.fmt		= fmt,
 		.pgsize_bitmap	= smmu->pgsize_bitmap,
 		.ias		= ias,
 		.oas		= oas,
@@ -764,7 +765,7 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 	if (smmu_domain->pgtbl_quirks)
 		pgtbl_cfg.quirks |= smmu_domain->pgtbl_quirks;
 
-	pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain);
+	pgtbl_ops = alloc_io_pgtable_ops(&pgtbl_cfg, smmu_domain);
 	if (!pgtbl_ops) {
 		ret = -ENOMEM;
 		goto out_clear_smmu;
diff --git a/drivers/iommu/arm/arm-smmu/qcom_iommu.c b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
index 270c3d9128ba..65eb8bdcbe50 100644
--- a/drivers/iommu/arm/arm-smmu/qcom_iommu.c
+++ b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
@@ -239,6 +239,7 @@ static int qcom_iommu_init_domain(struct iommu_domain *domain,
 		goto out_unlock;
 
 	pgtbl_cfg = (struct io_pgtable_cfg) {
+		.fmt		= ARM_32_LPAE_S1,
 		.pgsize_bitmap	= qcom_iommu_ops.pgsize_bitmap,
 		.ias		= 32,
 		.oas		= 40,
@@ -249,7 +250,7 @@ static int qcom_iommu_init_domain(struct iommu_domain *domain,
 	qcom_domain->iommu = qcom_iommu;
 	qcom_domain->fwspec = fwspec;
 
-	pgtbl_ops = alloc_io_pgtable_ops(ARM_32_LPAE_S1, &pgtbl_cfg, qcom_domain);
+	pgtbl_ops = alloc_io_pgtable_ops(&pgtbl_cfg, qcom_domain);
 	if (!pgtbl_ops) {
 		dev_err(qcom_iommu->dev, "failed to allocate pagetable ops\n");
 		ret = -ENOMEM;
diff --git a/drivers/iommu/io-pgtable-arm-common.c b/drivers/iommu/io-pgtable-arm-common.c
index 7340b5096499..4b3a9ce806ea 100644
--- a/drivers/iommu/io-pgtable-arm-common.c
+++ b/drivers/iommu/io-pgtable-arm-common.c
@@ -62,7 +62,7 @@ static void __arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
 	size_t sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
 	int i;
 
-	if (data->iop.fmt != ARM_MALI_LPAE && lvl == ARM_LPAE_MAX_LEVELS - 1)
+	if (data->iop.cfg.fmt != ARM_MALI_LPAE && lvl == ARM_LPAE_MAX_LEVELS - 1)
 		pte |= ARM_LPAE_PTE_TYPE_PAGE;
 	else
 		pte |= ARM_LPAE_PTE_TYPE_BLOCK;
@@ -82,7 +82,7 @@ static int arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
 	int i;
 
 	for (i = 0; i < num_entries; i++)
-		if (iopte_leaf(ptep[i], lvl, data->iop.fmt)) {
+		if (iopte_leaf(ptep[i], lvl, data->iop.cfg.fmt)) {
 			/* We require an unmap first */
 			WARN_ON(!selftest_running);
 			return -EEXIST;
@@ -183,7 +183,7 @@ int __arm_lpae_map(struct arm_lpae_io_pgtable *data, unsigned long iova,
 		__arm_lpae_sync_pte(ptep, 1, cfg);
 	}
 
-	if (pte && !iopte_leaf(pte, lvl, data->iop.fmt)) {
+	if (pte && !iopte_leaf(pte, lvl, data->iop.cfg.fmt)) {
 		cptep = iopte_deref(pte, data);
 	} else if (pte) {
 		/* We require an unmap first */
@@ -201,8 +201,8 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
 {
 	arm_lpae_iopte pte;
 
-	if (data->iop.fmt == ARM_64_LPAE_S1 ||
-	    data->iop.fmt == ARM_32_LPAE_S1) {
+	if (data->iop.cfg.fmt == ARM_64_LPAE_S1 ||
+	    data->iop.cfg.fmt == ARM_32_LPAE_S1) {
 		pte = ARM_LPAE_PTE_nG;
 		if (!(prot & IOMMU_WRITE) && (prot & IOMMU_READ))
 			pte |= ARM_LPAE_PTE_AP_RDONLY;
@@ -220,8 +220,8 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
 	 * Note that this logic is structured to accommodate Mali LPAE
 	 * having stage-1-like attributes but stage-2-like permissions.
 	 */
-	if (data->iop.fmt == ARM_64_LPAE_S2 ||
-	    data->iop.fmt == ARM_32_LPAE_S2) {
+	if (data->iop.cfg.fmt == ARM_64_LPAE_S2 ||
+	    data->iop.cfg.fmt == ARM_32_LPAE_S2) {
 		if (prot & IOMMU_MMIO)
 			pte |= ARM_LPAE_PTE_MEMATTR_DEV;
 		else if (prot & IOMMU_CACHE)
@@ -243,7 +243,7 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
 	 * "outside the GPU" (i.e. either the Inner or System domain in CPU
 	 * terms, depending on coherency).
 	 */
-	if (prot & IOMMU_CACHE && data->iop.fmt != ARM_MALI_LPAE)
+	if (prot & IOMMU_CACHE && data->iop.cfg.fmt != ARM_MALI_LPAE)
 		pte |= ARM_LPAE_PTE_SH_IS;
 	else
 		pte |= ARM_LPAE_PTE_SH_OS;
@@ -254,7 +254,7 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
 	if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_NS)
 		pte |= ARM_LPAE_PTE_NS;
 
-	if (data->iop.fmt != ARM_MALI_LPAE)
+	if (data->iop.cfg.fmt != ARM_MALI_LPAE)
 		pte |= ARM_LPAE_PTE_AF;
 
 	return pte;
@@ -317,7 +317,7 @@ void __arm_lpae_free_pgtable(struct arm_lpae_io_pgtable *data, int lvl,
 	while (ptep != end) {
 		arm_lpae_iopte pte = *ptep++;
 
-		if (!pte || iopte_leaf(pte, lvl, data->iop.fmt))
+		if (!pte || iopte_leaf(pte, lvl, data->iop.cfg.fmt))
 			continue;
 
 		__arm_lpae_free_pgtable(data, lvl + 1, iopte_deref(pte, data));
@@ -417,7 +417,7 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
 
 			__arm_lpae_clear_pte(ptep, &iop->cfg);
 
-			if (!iopte_leaf(pte, lvl, iop->fmt)) {
+			if (!iopte_leaf(pte, lvl, iop->cfg.fmt)) {
 				/* Also flush any partial walks */
 				io_pgtable_tlb_flush_walk(iop, iova + i * size, size,
 							  ARM_LPAE_GRANULE(data));
@@ -431,7 +431,7 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
 		}
 
 		return i * size;
-	} else if (iopte_leaf(pte, lvl, iop->fmt)) {
+	} else if (iopte_leaf(pte, lvl, iop->cfg.fmt)) {
 		/*
 		 * Insert a table at the next level to map the old region,
 		 * minus the part we want to unmap
@@ -487,7 +487,7 @@ phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
 			return 0;
 
 		/* Leaf entry? */
-		if (iopte_leaf(pte, lvl, data->iop.fmt))
+		if (iopte_leaf(pte, lvl, data->iop.cfg.fmt))
 			goto found_translation;
 
 		/* Take it to the next level */
diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
index 75f244a3e12d..278b4299d757 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -930,6 +930,7 @@ static int __init arm_v7s_do_selftests(void)
 {
 	struct io_pgtable_ops *ops;
 	struct io_pgtable_cfg cfg = {
+		.fmt = ARM_V7S,
 		.tlb = &dummy_tlb_ops,
 		.oas = 32,
 		.ias = 32,
@@ -945,7 +946,7 @@ static int __init arm_v7s_do_selftests(void)
 
 	cfg_cookie = &cfg;
 
-	ops = alloc_io_pgtable_ops(ARM_V7S, &cfg, &cfg);
+	ops = alloc_io_pgtable_ops(&cfg, &cfg);
 	if (!ops) {
 		pr_err("selftest: failed to allocate io pgtable ops\n");
 		return -EINVAL;
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index b2b188bb86b3..b76b903400de 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -319,7 +319,8 @@ static int __init arm_lpae_run_tests(struct io_pgtable_cfg *cfg)
 
 	for (i = 0; i < ARRAY_SIZE(fmts); ++i) {
 		cfg_cookie = cfg;
-		ops = alloc_io_pgtable_ops(fmts[i], cfg, cfg);
+		cfg->fmt = fmts[i];
+		ops = alloc_io_pgtable_ops(cfg, cfg);
 		if (!ops) {
 			pr_err("selftest: failed to allocate io pgtable ops\n");
 			return -ENOMEM;
diff --git a/drivers/iommu/io-pgtable-dart.c b/drivers/iommu/io-pgtable-dart.c
index 74b1ef2b96be..f981b25d8c98 100644
--- a/drivers/iommu/io-pgtable-dart.c
+++ b/drivers/iommu/io-pgtable-dart.c
@@ -81,7 +81,7 @@ static dart_iopte paddr_to_iopte(phys_addr_t paddr,
 {
 	dart_iopte pte;
 
-	if (data->iop.fmt == APPLE_DART)
+	if (data->iop.cfg.fmt == APPLE_DART)
 		return paddr & APPLE_DART1_PADDR_MASK;
 
 	/* format is APPLE_DART2 */
@@ -96,7 +96,7 @@ static phys_addr_t iopte_to_paddr(dart_iopte pte,
 {
 	u64 paddr;
 
-	if (data->iop.fmt == APPLE_DART)
+	if (data->iop.cfg.fmt == APPLE_DART)
 		return pte & APPLE_DART1_PADDR_MASK;
 
 	/* format is APPLE_DART2 */
@@ -215,13 +215,13 @@ static dart_iopte dart_prot_to_pte(struct dart_io_pgtable *data,
 {
 	dart_iopte pte = 0;
 
-	if (data->iop.fmt == APPLE_DART) {
+	if (data->iop.cfg.fmt == APPLE_DART) {
 		if (!(prot & IOMMU_WRITE))
 			pte |= APPLE_DART1_PTE_PROT_NO_WRITE;
 		if (!(prot & IOMMU_READ))
 			pte |= APPLE_DART1_PTE_PROT_NO_READ;
 	}
-	if (data->iop.fmt == APPLE_DART2) {
+	if (data->iop.cfg.fmt == APPLE_DART2) {
 		if (!(prot & IOMMU_WRITE))
 			pte |= APPLE_DART2_PTE_PROT_NO_WRITE;
 		if (!(prot & IOMMU_READ))
diff --git a/drivers/iommu/io-pgtable.c b/drivers/iommu/io-pgtable.c
index b843fcd365d2..79e459f95012 100644
--- a/drivers/iommu/io-pgtable.c
+++ b/drivers/iommu/io-pgtable.c
@@ -34,17 +34,16 @@ io_pgtable_init_table[IO_PGTABLE_NUM_FMTS] = {
 #endif
 };
 
-struct io_pgtable_ops *alloc_io_pgtable_ops(enum io_pgtable_fmt fmt,
-					    struct io_pgtable_cfg *cfg,
+struct io_pgtable_ops *alloc_io_pgtable_ops(struct io_pgtable_cfg *cfg,
 					    void *cookie)
 {
 	struct io_pgtable *iop;
 	const struct io_pgtable_init_fns *fns;
 
-	if (fmt >= IO_PGTABLE_NUM_FMTS)
+	if (cfg->fmt >= IO_PGTABLE_NUM_FMTS)
 		return NULL;
 
-	fns = io_pgtable_init_table[fmt];
+	fns = io_pgtable_init_table[cfg->fmt];
 	if (!fns)
 		return NULL;
 
@@ -52,7 +51,6 @@ struct io_pgtable_ops *alloc_io_pgtable_ops(enum io_pgtable_fmt fmt,
 	if (!iop)
 		return NULL;
 
-	iop->fmt	= fmt;
 	iop->cookie	= cookie;
 	iop->cfg	= *cfg;
 
@@ -73,6 +71,6 @@ void free_io_pgtable_ops(struct io_pgtable_ops *ops)
 
 	iop = io_pgtable_ops_to_pgtable(ops);
 	io_pgtable_tlb_flush_all(iop);
-	io_pgtable_init_table[iop->fmt]->free(iop);
+	io_pgtable_init_table[iop->cfg.fmt]->free(iop);
 }
 EXPORT_SYMBOL_GPL(free_io_pgtable_ops);
diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index a003bd5fc65c..4a1927489635 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -447,6 +447,7 @@ static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
 	 */
 	domain->cfg.coherent_walk = false;
 	domain->cfg.iommu_dev = domain->mmu->root->dev;
+	domain->cfg.fmt = ARM_32_LPAE_S1;
 
 	/*
 	 * Find an unused context.
@@ -457,8 +458,7 @@ static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
 
 	domain->context_id = ret;
 
-	domain->iop = alloc_io_pgtable_ops(ARM_32_LPAE_S1, &domain->cfg,
-					   domain);
+	domain->iop = alloc_io_pgtable_ops(&domain->cfg, domain);
 	if (!domain->iop) {
 		ipmmu_domain_free_context(domain->mmu->root,
 					  domain->context_id);
diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
index c60624910872..2c05a84ec1bf 100644
--- a/drivers/iommu/msm_iommu.c
+++ b/drivers/iommu/msm_iommu.c
@@ -342,6 +342,7 @@ static int msm_iommu_domain_config(struct msm_priv *priv)
 	spin_lock_init(&priv->pgtlock);
 
 	priv->cfg = (struct io_pgtable_cfg) {
+		.fmt = ARM_V7S,
 		.pgsize_bitmap = msm_iommu_ops.pgsize_bitmap,
 		.ias = 32,
 		.oas = 32,
@@ -349,7 +350,7 @@ static int msm_iommu_domain_config(struct msm_priv *priv)
 		.iommu_dev = priv->dev,
 	};
 
-	priv->iop = alloc_io_pgtable_ops(ARM_V7S, &priv->cfg, priv);
+	priv->iop = alloc_io_pgtable_ops(&priv->cfg, priv);
 	if (!priv->iop) {
 		dev_err(priv->dev, "Failed to allocate pgtable\n");
 		return -EINVAL;
diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 2badd6acfb23..0d754d94ae52 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -598,6 +598,7 @@ static int mtk_iommu_domain_finalise(struct mtk_iommu_domain *dom,
 	}
 
 	dom->cfg = (struct io_pgtable_cfg) {
+		.fmt = ARM_V7S,
 		.quirks = IO_PGTABLE_QUIRK_ARM_NS |
 			IO_PGTABLE_QUIRK_NO_PERMS |
 			IO_PGTABLE_QUIRK_ARM_MTK_EXT,
@@ -614,7 +615,7 @@ static int mtk_iommu_domain_finalise(struct mtk_iommu_domain *dom,
 	else
 		dom->cfg.oas = 35;
 
-	dom->iop = alloc_io_pgtable_ops(ARM_V7S, &dom->cfg, data);
+	dom->iop = alloc_io_pgtable_ops(&dom->cfg, data);
 	if (!dom->iop) {
 		dev_err(data->dev, "Failed to alloc io pgtable\n");
 		return -ENOMEM;
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 201+ messages in thread

* [RFC PATCH 04/45] iommu/io-pgtable: Add configure() operation
  2023-02-01 12:52 ` Jean-Philippe Brucker
@ 2023-02-01 12:52   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:52 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

Allow IOMMU drivers to create the io-pgtable configuration without
allocating any tables. This will be used by the SMMUv3-KVM driver to
initialize a config and pass it to KVM.
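
As a rough sketch (not part of this patch; the function name and config
values below are made up for illustration), a host driver could use the
new operation like this:

	static int example_configure(struct device *dev)
	{
		struct io_pgtable_cfg cfg = {
			.fmt		= ARM_64_LPAE_S2,
			.pgsize_bitmap	= SZ_4K | SZ_2M,
			.ias		= 48,
			.oas		= 48,
			.coherent_walk	= true,
			.iommu_dev	= dev,
		};
		size_t pgd_size;
		int ret;

		/* Fill cfg without allocating any tables */
		ret = io_pgtable_configure(&cfg, &pgd_size);
		if (ret)
			return ret;

		/* cfg and pgd_size can now be handed to the hypervisor */
		return 0;
	}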

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 include/linux/io-pgtable.h     | 14 +++++++++++
 drivers/iommu/io-pgtable-arm.c | 46 ++++++++++++++++++++++++++--------
 drivers/iommu/io-pgtable.c     | 15 +++++++++++
 3 files changed, 65 insertions(+), 10 deletions(-)

diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index 1b0c26241a78..ee6484d7a5e0 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -191,6 +191,18 @@ struct io_pgtable_ops *alloc_io_pgtable_ops(struct io_pgtable_cfg *cfg,
  */
 void free_io_pgtable_ops(struct io_pgtable_ops *ops);
 
+/**
+ * io_pgtable_configure - Create page table config
+ *
+ * @cfg:	The page table configuration.
+ * @pgd_size:	On success, size of the top-level table in bytes.
+ *
+ * Initialize @cfg in the same way as alloc_io_pgtable_ops(), without allocating
+ * anything.
+ *
+ * Not all io_pgtable drivers implement this operation.
+ */
+int io_pgtable_configure(struct io_pgtable_cfg *cfg, size_t *pgd_size);
 
 /*
  * Internal structures for page table allocator implementations.
@@ -241,10 +253,12 @@ io_pgtable_tlb_add_page(struct io_pgtable *iop,
  *
  * @alloc: Allocate a set of page tables described by cfg.
  * @free:  Free the page tables associated with iop.
+ * @configure: Create the configuration without allocating anything. Optional.
  */
 struct io_pgtable_init_fns {
 	struct io_pgtable *(*alloc)(struct io_pgtable_cfg *cfg, void *cookie);
 	void (*free)(struct io_pgtable *iop);
+	int (*configure)(struct io_pgtable_cfg *cfg, size_t *pgd_size);
 };
 
 extern struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s1_init_fns;
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index b76b903400de..c412500efadf 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -118,6 +118,18 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
 	return NULL;
 }
 
+static int arm_64_lpae_configure_s1(struct io_pgtable_cfg *cfg, size_t *pgd_size)
+{
+	int ret;
+	struct arm_lpae_io_pgtable data = {};
+
+	ret = arm_lpae_init_pgtable_s1(cfg, &data);
+	if (ret)
+		return ret;
+	*pgd_size = sizeof(arm_lpae_iopte) << data.pgd_bits;
+	return 0;
+}
+
 static struct io_pgtable *
 arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
 {
@@ -148,6 +160,18 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
 	return NULL;
 }
 
+static int arm_64_lpae_configure_s2(struct io_pgtable_cfg *cfg, size_t *pgd_size)
+{
+	int ret;
+	struct arm_lpae_io_pgtable data = {};
+
+	ret = arm_lpae_init_pgtable_s2(cfg, &data);
+	if (ret)
+		return ret;
+	*pgd_size = sizeof(arm_lpae_iopte) << data.pgd_bits;
+	return 0;
+}
+
 static struct io_pgtable *
 arm_32_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
 {
@@ -231,28 +255,30 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
 }
 
 struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s1_init_fns = {
-	.alloc	= arm_64_lpae_alloc_pgtable_s1,
-	.free	= arm_lpae_free_pgtable,
+	.alloc		= arm_64_lpae_alloc_pgtable_s1,
+	.free		= arm_lpae_free_pgtable,
+	.configure	= arm_64_lpae_configure_s1,
 };
 
 struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s2_init_fns = {
-	.alloc	= arm_64_lpae_alloc_pgtable_s2,
-	.free	= arm_lpae_free_pgtable,
+	.alloc		= arm_64_lpae_alloc_pgtable_s2,
+	.free		= arm_lpae_free_pgtable,
+	.configure	= arm_64_lpae_configure_s2,
 };
 
 struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s1_init_fns = {
-	.alloc	= arm_32_lpae_alloc_pgtable_s1,
-	.free	= arm_lpae_free_pgtable,
+	.alloc		= arm_32_lpae_alloc_pgtable_s1,
+	.free		= arm_lpae_free_pgtable,
 };
 
 struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s2_init_fns = {
-	.alloc	= arm_32_lpae_alloc_pgtable_s2,
-	.free	= arm_lpae_free_pgtable,
+	.alloc		= arm_32_lpae_alloc_pgtable_s2,
+	.free		= arm_lpae_free_pgtable,
 };
 
 struct io_pgtable_init_fns io_pgtable_arm_mali_lpae_init_fns = {
-	.alloc	= arm_mali_lpae_alloc_pgtable,
-	.free	= arm_lpae_free_pgtable,
+	.alloc		= arm_mali_lpae_alloc_pgtable,
+	.free		= arm_lpae_free_pgtable,
 };
 
 #ifdef CONFIG_IOMMU_IO_PGTABLE_LPAE_SELFTEST
diff --git a/drivers/iommu/io-pgtable.c b/drivers/iommu/io-pgtable.c
index 79e459f95012..2aba691db1da 100644
--- a/drivers/iommu/io-pgtable.c
+++ b/drivers/iommu/io-pgtable.c
@@ -74,3 +74,18 @@ void free_io_pgtable_ops(struct io_pgtable_ops *ops)
 	io_pgtable_init_table[iop->cfg.fmt]->free(iop);
 }
 EXPORT_SYMBOL_GPL(free_io_pgtable_ops);
+
+int io_pgtable_configure(struct io_pgtable_cfg *cfg, size_t *pgd_size)
+{
+	const struct io_pgtable_init_fns *fns;
+
+	if (cfg->fmt >= IO_PGTABLE_NUM_FMTS)
+		return -EINVAL;
+
+	fns = io_pgtable_init_table[cfg->fmt];
+	if (!fns || !fns->configure)
+		return -EOPNOTSUPP;
+
+	return fns->configure(cfg, pgd_size);
+}
+EXPORT_SYMBOL_GPL(io_pgtable_configure);
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 201+ messages in thread

* [RFC PATCH 05/45] iommu/io-pgtable: Split io_pgtable structure
  2023-02-01 12:52 ` Jean-Philippe Brucker
@ 2023-02-01 12:52   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:52 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker, Abhinav Kumar,
	Alyssa Rosenzweig, Andy Gross, Bjorn Andersson, Daniel Vetter,
	David Airlie, Dmitry Baryshkov, Hector Martin, Konrad Dybcio,
	Matthias Brugger, Rob Clark, Rob Herring, Sean Paul,
	Steven Price, Suravee Suthikulpanit, Sven Peter, Tomeu Vizoso,
	Yong Wu

The io_pgtable structure contains all information needed for io-pgtable
ops map() and unmap(), including a static configuration, driver-facing
ops, TLB callbacks and the PGD pointer. Most of these are common to all
sets of page tables for a given configuration, and really only need one
instance.

Split the structure in two:

* io_pgtable_params contains information that is common to all sets of
  page tables for a given io_pgtable_cfg.
* io_pgtable contains information that is different for each set of page
  tables, namely the PGD and the IOMMU driver cookie passed to TLB
  callbacks.

Keep essentially the same interface for IOMMU drivers, but move it
behind a set of helpers.
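
For reference, the resulting structures and a typical call site, condensed
from the hunks below (the smmu_domain call site is only illustrative):

	/* Per-domain state: only the ops pointer, cookie and PGD remain. */
	struct io_pgtable {
		struct io_pgtable_ops	*ops;
		void			*cookie;
		void			*pgd;
	};

	/* Shared by all page tables created with the same configuration. */
	struct io_pgtable_params {
		struct io_pgtable_cfg	cfg;
		struct io_pgtable_ops	ops;
	};

	/* IOMMU drivers go through the new helpers instead of the ops: */
	iopt_map_pages(&smmu_domain->pgtbl, iova, paddr, pgsize, pgcount,
		       prot, GFP_KERNEL, &mapped);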

The goal is to optimize for space, so that the KVM SMMU driver allocates
less memory. Storing 64k io-pgtables with identical configuration
previously required 10MB; it now takes 512kB, because the driver only
needs to store the pgd for each domain.
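(For 64k domains that is 64k * 8-byte PGD pointers = 512kB, versus roughly
160 bytes of io_pgtable state per domain, i.e. about 10MB, before the split.)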

Note that the io_pgtable_cfg still contains the TTBRs, which are
specific to a set of page tables. Most of them can be removed, since
IOMMU drivers can trivially obtain them with virt_to_phys(iop->pgd).
Some architectures do have static configuration bits in the TTBR that
need to be kept.

Unfortunately the split does add an extra pointer dereference, which
degrades performance slightly. Running a single-threaded dma-map
benchmark on a server with SMMUv3, I measured a regression of 7-9ns for
map() and 32-78ns for unmap(), which is a slowdown of about 4% and 8%
respectively.

Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Cc: Andy Gross <agross@kernel.org>
Cc: Bjorn Andersson <andersson@kernel.org>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: David Airlie <airlied@gmail.com>
Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Cc: Hector Martin <marcan@marcan.st>
Cc: Konrad Dybcio <konrad.dybcio@linaro.org>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Sean Paul <sean@poorly.run>
Cc: Steven Price <steven.price@arm.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Sven Peter <sven@svenpeter.dev>
Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Cc: Yong Wu <yong.wu@mediatek.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 drivers/gpu/drm/panfrost/panfrost_device.h  |   2 +-
 drivers/iommu/amd/amd_iommu_types.h         |  17 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |   3 +-
 drivers/iommu/arm/arm-smmu/arm-smmu.h       |   2 +-
 include/linux/io-pgtable-arm.h              |  12 +-
 include/linux/io-pgtable.h                  |  94 +++++++---
 drivers/gpu/drm/msm/msm_iommu.c             |  21 ++-
 drivers/gpu/drm/panfrost/panfrost_mmu.c     |  20 +--
 drivers/iommu/amd/io_pgtable.c              |  26 +--
 drivers/iommu/amd/io_pgtable_v2.c           |  43 ++---
 drivers/iommu/amd/iommu.c                   |  28 ++-
 drivers/iommu/apple-dart.c                  |  36 ++--
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  34 ++--
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c  |   7 +-
 drivers/iommu/arm/arm-smmu/arm-smmu.c       |  40 ++---
 drivers/iommu/arm/arm-smmu/qcom_iommu.c     |  40 ++---
 drivers/iommu/io-pgtable-arm-common.c       |  80 +++++----
 drivers/iommu/io-pgtable-arm-v7s.c          | 189 ++++++++++----------
 drivers/iommu/io-pgtable-arm.c              | 158 ++++++++--------
 drivers/iommu/io-pgtable-dart.c             |  97 +++++-----
 drivers/iommu/io-pgtable.c                  |  36 ++--
 drivers/iommu/ipmmu-vmsa.c                  |  18 +-
 drivers/iommu/msm_iommu.c                   |  17 +-
 drivers/iommu/mtk_iommu.c                   |  13 +-
 24 files changed, 519 insertions(+), 514 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h
index 8b25278f34c8..8a610c4b8f03 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.h
+++ b/drivers/gpu/drm/panfrost/panfrost_device.h
@@ -126,7 +126,7 @@ struct panfrost_mmu {
 	struct panfrost_device *pfdev;
 	struct kref refcount;
 	struct io_pgtable_cfg pgtbl_cfg;
-	struct io_pgtable_ops *pgtbl_ops;
+	struct io_pgtable pgtbl;
 	struct drm_mm mm;
 	spinlock_t mm_lock;
 	int as;
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 3d684190b4d5..5920a556f7ec 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -516,10 +516,10 @@ struct amd_irte_ops;
 #define AMD_IOMMU_FLAG_TRANS_PRE_ENABLED      (1 << 0)
 
 #define io_pgtable_to_data(x) \
-	container_of((x), struct amd_io_pgtable, iop)
+	container_of((x), struct amd_io_pgtable, iop_params)
 
 #define io_pgtable_ops_to_data(x) \
-	io_pgtable_to_data(io_pgtable_ops_to_pgtable(x))
+	io_pgtable_to_data(io_pgtable_ops_to_params(x))
 
 #define io_pgtable_ops_to_domain(x) \
 	container_of(io_pgtable_ops_to_data(x), \
@@ -529,12 +529,13 @@ struct amd_irte_ops;
 	container_of((x), struct amd_io_pgtable, pgtbl_cfg)
 
 struct amd_io_pgtable {
-	struct io_pgtable_cfg	pgtbl_cfg;
-	struct io_pgtable	iop;
-	int			mode;
-	u64			*root;
-	atomic64_t		pt_root;	/* pgtable root and pgtable mode */
-	u64			*pgd;		/* v2 pgtable pgd pointer */
+	struct io_pgtable_cfg		pgtbl_cfg;
+	struct io_pgtable		iop;
+	struct io_pgtable_params	iop_params;
+	int				mode;
+	u64				*root;
+	atomic64_t			pt_root;	/* pgtable root and pgtable mode */
+	u64				*pgd;		/* v2 pgtable pgd pointer */
 };
 
 /*
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 8d772ea8a583..cec3c8103404 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -10,6 +10,7 @@
 
 #include <linux/bitfield.h>
 #include <linux/iommu.h>
+#include <linux/io-pgtable.h>
 #include <linux/kernel.h>
 #include <linux/mmzone.h>
 #include <linux/sizes.h>
@@ -710,7 +711,7 @@ struct arm_smmu_domain {
 	struct arm_smmu_device		*smmu;
 	struct mutex			init_mutex; /* Protects smmu pointer */
 
-	struct io_pgtable_ops		*pgtbl_ops;
+	struct io_pgtable		pgtbl;
 	bool				stall_enabled;
 	atomic_t			nr_ats_masters;
 
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
index 703fd5817ec1..249825fc71ac 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
@@ -366,7 +366,7 @@ enum arm_smmu_domain_stage {
 
 struct arm_smmu_domain {
 	struct arm_smmu_device		*smmu;
-	struct io_pgtable_ops		*pgtbl_ops;
+	struct io_pgtable		pgtbl;
 	unsigned long			pgtbl_quirks;
 	const struct iommu_flush_ops	*flush_ops;
 	struct arm_smmu_cfg		cfg;
diff --git a/include/linux/io-pgtable-arm.h b/include/linux/io-pgtable-arm.h
index 42202bc0ffa2..5199bd9851b6 100644
--- a/include/linux/io-pgtable-arm.h
+++ b/include/linux/io-pgtable-arm.h
@@ -9,13 +9,11 @@ extern bool selftest_running;
 typedef u64 arm_lpae_iopte;
 
 struct arm_lpae_io_pgtable {
-	struct io_pgtable	iop;
+	struct io_pgtable_params	iop;
 
-	int			pgd_bits;
-	int			start_level;
-	int			bits_per_level;
-
-	void			*pgd;
+	int				pgd_bits;
+	int				start_level;
+	int				bits_per_level;
 };
 
 /* Struct accessors */
@@ -23,7 +21,7 @@ struct arm_lpae_io_pgtable {
 	container_of((x), struct arm_lpae_io_pgtable, iop)
 
 #define io_pgtable_ops_to_data(x)					\
-	io_pgtable_to_data(io_pgtable_ops_to_pgtable(x))
+	io_pgtable_to_data(io_pgtable_ops_to_params(x))
 
 /*
  * Calculate the right shift amount to get to the portion describing level l
diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index ee6484d7a5e0..cce5ddbf71c7 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -149,6 +149,20 @@ struct io_pgtable_cfg {
 	};
 };
 
+/**
+ * struct io_pgtable - Structure describing a set of page tables.
+ *
+ * @ops:	The page table operations in use for this set of page tables.
+ * @cookie:	An opaque token provided by the IOMMU driver and passed back to
+ *		any callback routines.
+ * @pgd:	Virtual address of the page directory.
+ */
+struct io_pgtable {
+	struct io_pgtable_ops	*ops;
+	void			*cookie;
+	void			*pgd;
+};
+
 /**
  * struct io_pgtable_ops - Page table manipulation API for IOMMU drivers.
  *
@@ -160,36 +174,64 @@ struct io_pgtable_cfg {
  * the same names.
  */
 struct io_pgtable_ops {
-	int (*map_pages)(struct io_pgtable_ops *ops, unsigned long iova,
+	int (*map_pages)(struct io_pgtable *iop, unsigned long iova,
 			 phys_addr_t paddr, size_t pgsize, size_t pgcount,
 			 int prot, gfp_t gfp, size_t *mapped);
-	size_t (*unmap_pages)(struct io_pgtable_ops *ops, unsigned long iova,
+	size_t (*unmap_pages)(struct io_pgtable *iop, unsigned long iova,
 			      size_t pgsize, size_t pgcount,
 			      struct iommu_iotlb_gather *gather);
-	phys_addr_t (*iova_to_phys)(struct io_pgtable_ops *ops,
-				    unsigned long iova);
+	phys_addr_t (*iova_to_phys)(struct io_pgtable *iop, unsigned long iova);
 };
 
+static inline int
+iopt_map_pages(struct io_pgtable *iop, unsigned long iova, phys_addr_t paddr,
+	       size_t pgsize, size_t pgcount, int prot, gfp_t gfp,
+	       size_t *mapped)
+{
+	if (!iop->ops || !iop->ops->map_pages)
+		return -EINVAL;
+	return iop->ops->map_pages(iop, iova, paddr, pgsize, pgcount, prot, gfp,
+				   mapped);
+}
+
+static inline size_t
+iopt_unmap_pages(struct io_pgtable *iop, unsigned long iova, size_t pgsize,
+		 size_t pgcount, struct iommu_iotlb_gather *gather)
+{
+	if (!iop->ops || !iop->ops->unmap_pages)
+		return 0;
+	return iop->ops->unmap_pages(iop, iova, pgsize, pgcount, gather);
+}
+
+static inline phys_addr_t
+iopt_iova_to_phys(struct io_pgtable *iop, unsigned long iova)
+{
+	if (!iop->ops || !iop->ops->iova_to_phys)
+		return 0;
+	return iop->ops->iova_to_phys(iop, iova);
+}
+
 /**
  * alloc_io_pgtable_ops() - Allocate a page table allocator for use by an IOMMU.
  *
+ * @iop:    The page table object, filled with the allocated ops on success
  * @cfg:    The page table configuration. This will be modified to represent
  *          the configuration actually provided by the allocator (e.g. the
  *          pgsize_bitmap may be restricted).
  * @cookie: An opaque token provided by the IOMMU driver and passed back to
  *          the callback routines in cfg->tlb.
  */
-struct io_pgtable_ops *alloc_io_pgtable_ops(struct io_pgtable_cfg *cfg,
-					    void *cookie);
+int alloc_io_pgtable_ops(struct io_pgtable *iop, struct io_pgtable_cfg *cfg,
+			 void *cookie);
 
 /**
- * free_io_pgtable_ops() - Free an io_pgtable_ops structure. The caller
+ * free_io_pgtable_ops() - Free the page table. The caller
  *                         *must* ensure that the page table is no longer
  *                         live, but the TLB can be dirty.
  *
- * @ops: The ops returned from alloc_io_pgtable_ops.
+ * @iop: The iop object passed to alloc_io_pgtable_ops
  */
-void free_io_pgtable_ops(struct io_pgtable_ops *ops);
+void free_io_pgtable_ops(struct io_pgtable *iop);
 
 /**
  * io_pgtable_configure - Create page table config
@@ -209,42 +251,41 @@ int io_pgtable_configure(struct io_pgtable_cfg *cfg, size_t *pgd_size);
  */
 
 /**
- * struct io_pgtable - Internal structure describing a set of page tables.
+ * struct io_pgtable_params - Internal structure describing parameters for a
+ *			      given page table configuration
  *
- * @cookie: An opaque token provided by the IOMMU driver and passed back to
- *          any callback routines.
  * @cfg:    A copy of the page table configuration.
  * @ops:    The page table operations in use for this set of page tables.
  */
-struct io_pgtable {
-	void			*cookie;
+struct io_pgtable_params {
 	struct io_pgtable_cfg	cfg;
 	struct io_pgtable_ops	ops;
 };
 
-#define io_pgtable_ops_to_pgtable(x) container_of((x), struct io_pgtable, ops)
+#define io_pgtable_ops_to_params(x) container_of((x), struct io_pgtable_params, ops)
 
-static inline void io_pgtable_tlb_flush_all(struct io_pgtable *iop)
+static inline void io_pgtable_tlb_flush_all(struct io_pgtable_cfg *cfg,
+					    struct io_pgtable *iop)
 {
-	if (iop->cfg.tlb && iop->cfg.tlb->tlb_flush_all)
-		iop->cfg.tlb->tlb_flush_all(iop->cookie);
+	if (cfg->tlb && cfg->tlb->tlb_flush_all)
+		cfg->tlb->tlb_flush_all(iop->cookie);
 }
 
 static inline void
-io_pgtable_tlb_flush_walk(struct io_pgtable *iop, unsigned long iova,
-			  size_t size, size_t granule)
+io_pgtable_tlb_flush_walk(struct io_pgtable_cfg *cfg, struct io_pgtable *iop,
+			  unsigned long iova, size_t size, size_t granule)
 {
-	if (iop->cfg.tlb && iop->cfg.tlb->tlb_flush_walk)
-		iop->cfg.tlb->tlb_flush_walk(iova, size, granule, iop->cookie);
+	if (cfg->tlb && cfg->tlb->tlb_flush_walk)
+		cfg->tlb->tlb_flush_walk(iova, size, granule, iop->cookie);
 }
 
 static inline void
-io_pgtable_tlb_add_page(struct io_pgtable *iop,
+io_pgtable_tlb_add_page(struct io_pgtable_cfg *cfg, struct io_pgtable *iop,
 			struct iommu_iotlb_gather * gather, unsigned long iova,
 			size_t granule)
 {
-	if (iop->cfg.tlb && iop->cfg.tlb->tlb_add_page)
-		iop->cfg.tlb->tlb_add_page(gather, iova, granule, iop->cookie);
+	if (cfg->tlb && cfg->tlb->tlb_add_page)
+		cfg->tlb->tlb_add_page(gather, iova, granule, iop->cookie);
 }
 
 /**
@@ -256,7 +297,8 @@ io_pgtable_tlb_add_page(struct io_pgtable *iop,
  * @configure: Create the configuration without allocating anything. Optional.
  */
 struct io_pgtable_init_fns {
-	struct io_pgtable *(*alloc)(struct io_pgtable_cfg *cfg, void *cookie);
+	int (*alloc)(struct io_pgtable *iop, struct io_pgtable_cfg *cfg,
+		     void *cookie);
 	void (*free)(struct io_pgtable *iop);
 	int (*configure)(struct io_pgtable_cfg *cfg, size_t *pgd_size);
 };
diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index e9c6f281e3dd..e372ca6cd79c 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -20,7 +20,7 @@ struct msm_iommu {
 struct msm_iommu_pagetable {
 	struct msm_mmu base;
 	struct msm_mmu *parent;
-	struct io_pgtable_ops *pgtbl_ops;
+	struct io_pgtable pgtbl;
 	unsigned long pgsize_bitmap;	/* Bitmap of page sizes in use */
 	phys_addr_t ttbr;
 	u32 asid;
@@ -90,14 +90,14 @@ static int msm_iommu_pagetable_unmap(struct msm_mmu *mmu, u64 iova,
 		size_t size)
 {
 	struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
-	struct io_pgtable_ops *ops = pagetable->pgtbl_ops;
 
 	while (size) {
 		size_t unmapped, pgsize, count;
 
 		pgsize = calc_pgsize(pagetable, iova, iova, size, &count);
 
-		unmapped = ops->unmap_pages(ops, iova, pgsize, count, NULL);
+		unmapped = iopt_unmap_pages(&pagetable->pgtbl, iova, pgsize,
+					    count, NULL);
 		if (!unmapped)
 			break;
 
@@ -114,7 +114,7 @@ static int msm_iommu_pagetable_map(struct msm_mmu *mmu, u64 iova,
 		struct sg_table *sgt, size_t len, int prot)
 {
 	struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
-	struct io_pgtable_ops *ops = pagetable->pgtbl_ops;
+	struct io_pgtable *iop = &pagetable->pgtbl;
 	struct scatterlist *sg;
 	u64 addr = iova;
 	unsigned int i;
@@ -129,7 +129,7 @@ static int msm_iommu_pagetable_map(struct msm_mmu *mmu, u64 iova,
 
 			pgsize = calc_pgsize(pagetable, addr, phys, size, &count);
 
-			ret = ops->map_pages(ops, addr, phys, pgsize, count,
+			ret = iopt_map_pages(iop, addr, phys, pgsize, count,
 					     prot, GFP_KERNEL, &mapped);
 
 			/* map_pages could fail after mapping some of the pages,
@@ -163,7 +163,7 @@ static void msm_iommu_pagetable_destroy(struct msm_mmu *mmu)
 	if (atomic_dec_return(&iommu->pagetables) == 0)
 		adreno_smmu->set_ttbr0_cfg(adreno_smmu->cookie, NULL);
 
-	free_io_pgtable_ops(pagetable->pgtbl_ops);
+	free_io_pgtable_ops(&pagetable->pgtbl);
 	kfree(pagetable);
 }
 
@@ -258,11 +258,10 @@ struct msm_mmu *msm_iommu_pagetable_create(struct msm_mmu *parent)
 	ttbr0_cfg.quirks &= ~IO_PGTABLE_QUIRK_ARM_TTBR1;
 	ttbr0_cfg.tlb = &null_tlb_ops;
 
-	pagetable->pgtbl_ops = alloc_io_pgtable_ops(&ttbr0_cfg, iommu->domain);
-
-	if (!pagetable->pgtbl_ops) {
+	ret = alloc_io_pgtable_ops(&pagetable->pgtbl, &ttbr0_cfg, iommu->domain);
+	if (ret) {
 		kfree(pagetable);
-		return ERR_PTR(-ENOMEM);
+		return ERR_PTR(ret);
 	}
 
 	/*
@@ -275,7 +274,7 @@ struct msm_mmu *msm_iommu_pagetable_create(struct msm_mmu *parent)
 
 		ret = adreno_smmu->set_ttbr0_cfg(adreno_smmu->cookie, &ttbr0_cfg);
 		if (ret) {
-			free_io_pgtable_ops(pagetable->pgtbl_ops);
+			free_io_pgtable_ops(&pagetable->pgtbl);
 			kfree(pagetable);
 			return ERR_PTR(ret);
 		}
diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 31bdb5d46244..118b49ab120f 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -290,7 +290,6 @@ static int mmu_map_sg(struct panfrost_device *pfdev, struct panfrost_mmu *mmu,
 {
 	unsigned int count;
 	struct scatterlist *sgl;
-	struct io_pgtable_ops *ops = mmu->pgtbl_ops;
 	u64 start_iova = iova;
 
 	for_each_sgtable_dma_sg(sgt, sgl, count) {
@@ -303,8 +302,8 @@ static int mmu_map_sg(struct panfrost_device *pfdev, struct panfrost_mmu *mmu,
 			size_t pgcount, mapped = 0;
 			size_t pgsize = get_pgsize(iova | paddr, len, &pgcount);
 
-			ops->map_pages(ops, iova, paddr, pgsize, pgcount, prot,
-				       GFP_KERNEL, &mapped);
+			iopt_map_pages(&mmu->pgtbl, iova, paddr, pgsize,
+				       pgcount, prot, GFP_KERNEL, &mapped);
 			/* Don't get stuck if things have gone wrong */
 			mapped = max(mapped, pgsize);
 			iova += mapped;
@@ -349,7 +348,7 @@ void panfrost_mmu_unmap(struct panfrost_gem_mapping *mapping)
 	struct panfrost_gem_object *bo = mapping->obj;
 	struct drm_gem_object *obj = &bo->base.base;
 	struct panfrost_device *pfdev = to_panfrost_device(obj->dev);
-	struct io_pgtable_ops *ops = mapping->mmu->pgtbl_ops;
+	struct io_pgtable *iop = &mapping->mmu->pgtbl;
 	u64 iova = mapping->mmnode.start << PAGE_SHIFT;
 	size_t len = mapping->mmnode.size << PAGE_SHIFT;
 	size_t unmapped_len = 0;
@@ -366,8 +365,8 @@ void panfrost_mmu_unmap(struct panfrost_gem_mapping *mapping)
 
 		if (bo->is_heap)
 			pgcount = 1;
-		if (!bo->is_heap || ops->iova_to_phys(ops, iova)) {
-			unmapped_page = ops->unmap_pages(ops, iova, pgsize, pgcount, NULL);
+		if (!bo->is_heap || iopt_iova_to_phys(iop, iova)) {
+			unmapped_page = iopt_unmap_pages(iop, iova, pgsize, pgcount, NULL);
 			WARN_ON(unmapped_page != pgsize * pgcount);
 		}
 		iova += pgsize * pgcount;
@@ -560,7 +559,7 @@ static void panfrost_mmu_release_ctx(struct kref *kref)
 	}
 	spin_unlock(&pfdev->as_lock);
 
-	free_io_pgtable_ops(mmu->pgtbl_ops);
+	free_io_pgtable_ops(&mmu->pgtbl);
 	drm_mm_takedown(&mmu->mm);
 	kfree(mmu);
 }
@@ -605,6 +604,7 @@ static void panfrost_drm_mm_color_adjust(const struct drm_mm_node *node,
 
 struct panfrost_mmu *panfrost_mmu_ctx_create(struct panfrost_device *pfdev)
 {
+	int ret;
 	struct panfrost_mmu *mmu;
 
 	mmu = kzalloc(sizeof(*mmu), GFP_KERNEL);
@@ -631,10 +631,10 @@ struct panfrost_mmu *panfrost_mmu_ctx_create(struct panfrost_device *pfdev)
 		.iommu_dev	= pfdev->dev,
 	};
 
-	mmu->pgtbl_ops = alloc_io_pgtable_ops(&mmu->pgtbl_cfg, mmu);
-	if (!mmu->pgtbl_ops) {
+	ret = alloc_io_pgtable_ops(&mmu->pgtbl, &mmu->pgtbl_cfg, mmu);
+	if (ret) {
 		kfree(mmu);
-		return ERR_PTR(-EINVAL);
+		return ERR_PTR(ret);
 	}
 
 	kref_init(&mmu->refcount);
diff --git a/drivers/iommu/amd/io_pgtable.c b/drivers/iommu/amd/io_pgtable.c
index ace0e9b8b913..f9ea551404ba 100644
--- a/drivers/iommu/amd/io_pgtable.c
+++ b/drivers/iommu/amd/io_pgtable.c
@@ -360,11 +360,11 @@ static void free_clear_pte(u64 *pte, u64 pteval, struct list_head *freelist)
  * supporting all features of AMD IOMMU page tables like level skipping
  * and full 64 bit address spaces.
  */
-static int iommu_v1_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
+static int iommu_v1_map_pages(struct io_pgtable *iop, unsigned long iova,
 			      phys_addr_t paddr, size_t pgsize, size_t pgcount,
 			      int prot, gfp_t gfp, size_t *mapped)
 {
-	struct protection_domain *dom = io_pgtable_ops_to_domain(ops);
+	struct protection_domain *dom = io_pgtable_ops_to_domain(iop->ops);
 	LIST_HEAD(freelist);
 	bool updated = false;
 	u64 __pte, *pte;
@@ -435,12 +435,12 @@ static int iommu_v1_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
 	return ret;
 }
 
-static unsigned long iommu_v1_unmap_pages(struct io_pgtable_ops *ops,
+static unsigned long iommu_v1_unmap_pages(struct io_pgtable *iop,
 					  unsigned long iova,
 					  size_t pgsize, size_t pgcount,
 					  struct iommu_iotlb_gather *gather)
 {
-	struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(ops);
+	struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(iop->ops);
 	unsigned long long unmapped;
 	unsigned long unmap_size;
 	u64 *pte;
@@ -469,9 +469,9 @@ static unsigned long iommu_v1_unmap_pages(struct io_pgtable_ops *ops,
 	return unmapped;
 }
 
-static phys_addr_t iommu_v1_iova_to_phys(struct io_pgtable_ops *ops, unsigned long iova)
+static phys_addr_t iommu_v1_iova_to_phys(struct io_pgtable *iop, unsigned long iova)
 {
-	struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(ops);
+	struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(iop->ops);
 	unsigned long offset_mask, pte_pgsize;
 	u64 *pte, __pte;
 
@@ -491,7 +491,7 @@ static phys_addr_t iommu_v1_iova_to_phys(struct io_pgtable_ops *ops, unsigned lo
  */
 static void v1_free_pgtable(struct io_pgtable *iop)
 {
-	struct amd_io_pgtable *pgtable = container_of(iop, struct amd_io_pgtable, iop);
+	struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(iop->ops);
 	struct protection_domain *dom;
 	LIST_HEAD(freelist);
 
@@ -515,7 +515,8 @@ static void v1_free_pgtable(struct io_pgtable *iop)
 	put_pages_list(&freelist);
 }
 
-static struct io_pgtable *v1_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
+int v1_alloc_pgtable(struct io_pgtable *iop, struct io_pgtable_cfg *cfg,
+		     void *cookie)
 {
 	struct amd_io_pgtable *pgtable = io_pgtable_cfg_to_data(cfg);
 
@@ -524,11 +525,12 @@ static struct io_pgtable *v1_alloc_pgtable(struct io_pgtable_cfg *cfg, void *coo
 	cfg->oas            = IOMMU_OUT_ADDR_BIT_SIZE,
 	cfg->tlb            = &v1_flush_ops;
 
-	pgtable->iop.ops.map_pages    = iommu_v1_map_pages;
-	pgtable->iop.ops.unmap_pages  = iommu_v1_unmap_pages;
-	pgtable->iop.ops.iova_to_phys = iommu_v1_iova_to_phys;
+	pgtable->iop_params.ops.map_pages    = iommu_v1_map_pages;
+	pgtable->iop_params.ops.unmap_pages  = iommu_v1_unmap_pages;
+	pgtable->iop_params.ops.iova_to_phys = iommu_v1_iova_to_phys;
+	iop->ops = &pgtable->iop_params.ops;
 
-	return &pgtable->iop;
+	return 0;
 }
 
 struct io_pgtable_init_fns io_pgtable_amd_iommu_v1_init_fns = {
diff --git a/drivers/iommu/amd/io_pgtable_v2.c b/drivers/iommu/amd/io_pgtable_v2.c
index 8638ddf6fb3b..52acb8f11a27 100644
--- a/drivers/iommu/amd/io_pgtable_v2.c
+++ b/drivers/iommu/amd/io_pgtable_v2.c
@@ -239,12 +239,12 @@ static u64 *fetch_pte(struct amd_io_pgtable *pgtable,
 	return pte;
 }
 
-static int iommu_v2_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
+static int iommu_v2_map_pages(struct io_pgtable *iop, unsigned long iova,
 			      phys_addr_t paddr, size_t pgsize, size_t pgcount,
 			      int prot, gfp_t gfp, size_t *mapped)
 {
-	struct protection_domain *pdom = io_pgtable_ops_to_domain(ops);
-	struct io_pgtable_cfg *cfg = &pdom->iop.iop.cfg;
+	struct protection_domain *pdom = io_pgtable_ops_to_domain(iop->ops);
+	struct io_pgtable_cfg *cfg = &pdom->iop.iop_params.cfg;
 	u64 *pte;
 	unsigned long map_size;
 	unsigned long mapped_size = 0;
@@ -290,13 +290,13 @@ static int iommu_v2_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
 	return ret;
 }
 
-static unsigned long iommu_v2_unmap_pages(struct io_pgtable_ops *ops,
+static unsigned long iommu_v2_unmap_pages(struct io_pgtable *iop,
 					  unsigned long iova,
 					  size_t pgsize, size_t pgcount,
 					  struct iommu_iotlb_gather *gather)
 {
-	struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(ops);
-	struct io_pgtable_cfg *cfg = &pgtable->iop.cfg;
+	struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(iop->ops);
+	struct io_pgtable_cfg *cfg = &pgtable->iop_params.cfg;
 	unsigned long unmap_size;
 	unsigned long unmapped = 0;
 	size_t size = pgcount << __ffs(pgsize);
@@ -319,9 +319,9 @@ static unsigned long iommu_v2_unmap_pages(struct io_pgtable_ops *ops,
 	return unmapped;
 }
 
-static phys_addr_t iommu_v2_iova_to_phys(struct io_pgtable_ops *ops, unsigned long iova)
+static phys_addr_t iommu_v2_iova_to_phys(struct io_pgtable *iop, unsigned long iova)
 {
-	struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(ops);
+	struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(iop->ops);
 	unsigned long offset_mask, pte_pgsize;
 	u64 *pte, __pte;
 
@@ -362,7 +362,7 @@ static const struct iommu_flush_ops v2_flush_ops = {
 static void v2_free_pgtable(struct io_pgtable *iop)
 {
 	struct protection_domain *pdom;
-	struct amd_io_pgtable *pgtable = container_of(iop, struct amd_io_pgtable, iop);
+	struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(iop->ops);
 
 	pdom = container_of(pgtable, struct protection_domain, iop);
 	if (!(pdom->flags & PD_IOMMUV2_MASK))
@@ -375,38 +375,39 @@ static void v2_free_pgtable(struct io_pgtable *iop)
 	amd_iommu_domain_update(pdom);
 
 	/* Free page table */
-	free_pgtable(pgtable->pgd, get_pgtable_level());
+	free_pgtable(iop->pgd, get_pgtable_level());
 }
 
-static struct io_pgtable *v2_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
+int v2_alloc_pgtable(struct io_pgtable *iop, struct io_pgtable_cfg *cfg, void *cookie)
 {
 	struct amd_io_pgtable *pgtable = io_pgtable_cfg_to_data(cfg);
 	struct protection_domain *pdom = (struct protection_domain *)cookie;
 	int ret;
 
-	pgtable->pgd = alloc_pgtable_page();
-	if (!pgtable->pgd)
-		return NULL;
+	iop->pgd = alloc_pgtable_page();
+	if (!iop->pgd)
+		return -ENOMEM;
 
-	ret = amd_iommu_domain_set_gcr3(&pdom->domain, 0, iommu_virt_to_phys(pgtable->pgd));
+	ret = amd_iommu_domain_set_gcr3(&pdom->domain, 0, iommu_virt_to_phys(iop->pgd));
 	if (ret)
 		goto err_free_pgd;
 
-	pgtable->iop.ops.map_pages    = iommu_v2_map_pages;
-	pgtable->iop.ops.unmap_pages  = iommu_v2_unmap_pages;
-	pgtable->iop.ops.iova_to_phys = iommu_v2_iova_to_phys;
+	pgtable->iop_params.ops.map_pages    = iommu_v2_map_pages;
+	pgtable->iop_params.ops.unmap_pages  = iommu_v2_unmap_pages;
+	pgtable->iop_params.ops.iova_to_phys = iommu_v2_iova_to_phys;
+	iop->ops = &pgtable->iop_params.ops;
 
 	cfg->pgsize_bitmap = AMD_IOMMU_PGSIZES_V2,
 	cfg->ias           = IOMMU_IN_ADDR_BIT_SIZE,
 	cfg->oas           = IOMMU_OUT_ADDR_BIT_SIZE,
 	cfg->tlb           = &v2_flush_ops;
 
-	return &pgtable->iop;
+	return 0;
 
 err_free_pgd:
-	free_pgtable_page(pgtable->pgd);
+	free_pgtable_page(iop->pgd);
 
-	return NULL;
+	return ret;
 }
 
 struct io_pgtable_init_fns io_pgtable_amd_iommu_v2_init_fns = {
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 7efb6b467041..51f9cecdcb6b 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1984,7 +1984,7 @@ static void protection_domain_free(struct protection_domain *domain)
 		return;
 
 	if (domain->iop.pgtbl_cfg.tlb)
-		free_io_pgtable_ops(&domain->iop.iop.ops);
+		free_io_pgtable_ops(&domain->iop.iop);
 
 	if (domain->id)
 		domain_id_free(domain->id);
@@ -2037,7 +2037,6 @@ static int protection_domain_init_v2(struct protection_domain *domain)
 
 static struct protection_domain *protection_domain_alloc(unsigned int type)
 {
-	struct io_pgtable_ops *pgtbl_ops;
 	struct protection_domain *domain;
 	int pgtable = amd_iommu_pgtable;
 	int mode = DEFAULT_PGTABLE_LEVEL;
@@ -2073,8 +2072,9 @@ static struct protection_domain *protection_domain_alloc(unsigned int type)
 		goto out_err;
 
 	domain->iop.pgtbl_cfg.fmt = pgtable;
-	pgtbl_ops = alloc_io_pgtable_ops(&domain->iop.pgtbl_cfg, domain);
-	if (!pgtbl_ops) {
+	ret = alloc_io_pgtable_ops(&domain->iop.iop, &domain->iop.pgtbl_cfg,
+				   domain);
+	if (ret) {
 		domain_id_free(domain->id);
 		goto out_err;
 	}
@@ -2185,7 +2185,7 @@ static void amd_iommu_iotlb_sync_map(struct iommu_domain *dom,
 				     unsigned long iova, size_t size)
 {
 	struct protection_domain *domain = to_pdomain(dom);
-	struct io_pgtable_ops *ops = &domain->iop.iop.ops;
+	struct io_pgtable_ops *ops = domain->iop.iop.ops;
 
 	if (ops->map_pages)
 		domain_flush_np_cache(domain, iova, size);
@@ -2196,9 +2196,7 @@ static int amd_iommu_map_pages(struct iommu_domain *dom, unsigned long iova,
 			       int iommu_prot, gfp_t gfp, size_t *mapped)
 {
 	struct protection_domain *domain = to_pdomain(dom);
-	struct io_pgtable_ops *ops = &domain->iop.iop.ops;
 	int prot = 0;
-	int ret = -EINVAL;
 
 	if ((amd_iommu_pgtable == AMD_IOMMU_V1) &&
 	    (domain->iop.mode == PAGE_MODE_NONE))
@@ -2209,12 +2207,8 @@ static int amd_iommu_map_pages(struct iommu_domain *dom, unsigned long iova,
 	if (iommu_prot & IOMMU_WRITE)
 		prot |= IOMMU_PROT_IW;
 
-	if (ops->map_pages) {
-		ret = ops->map_pages(ops, iova, paddr, pgsize,
-				     pgcount, prot, gfp, mapped);
-	}
-
-	return ret;
+	return iopt_map_pages(&domain->iop.iop, iova, paddr, pgsize, pgcount,
+			      prot, gfp, mapped);
 }
 
 static void amd_iommu_iotlb_gather_add_page(struct iommu_domain *domain,
@@ -2243,14 +2237,13 @@ static size_t amd_iommu_unmap_pages(struct iommu_domain *dom, unsigned long iova
 				    struct iommu_iotlb_gather *gather)
 {
 	struct protection_domain *domain = to_pdomain(dom);
-	struct io_pgtable_ops *ops = &domain->iop.iop.ops;
 	size_t r;
 
 	if ((amd_iommu_pgtable == AMD_IOMMU_V1) &&
 	    (domain->iop.mode == PAGE_MODE_NONE))
 		return 0;
 
-	r = (ops->unmap_pages) ? ops->unmap_pages(ops, iova, pgsize, pgcount, NULL) : 0;
+	r = iopt_unmap_pages(&domain->iop.iop, iova, pgsize, pgcount, NULL);
 
 	if (r)
 		amd_iommu_iotlb_gather_add_page(dom, gather, iova, r);
@@ -2262,9 +2255,8 @@ static phys_addr_t amd_iommu_iova_to_phys(struct iommu_domain *dom,
 					  dma_addr_t iova)
 {
 	struct protection_domain *domain = to_pdomain(dom);
-	struct io_pgtable_ops *ops = &domain->iop.iop.ops;
 
-	return ops->iova_to_phys(ops, iova);
+	return iopt_iova_to_phys(&domain->iop.iop, iova);
 }
 
 static bool amd_iommu_capable(struct device *dev, enum iommu_cap cap)
@@ -2460,7 +2452,7 @@ void amd_iommu_domain_direct_map(struct iommu_domain *dom)
 	spin_lock_irqsave(&domain->lock, flags);
 
 	if (domain->iop.pgtbl_cfg.tlb)
-		free_io_pgtable_ops(&domain->iop.iop.ops);
+		free_io_pgtable_ops(&domain->iop.iop);
 
 	spin_unlock_irqrestore(&domain->lock, flags);
 }
diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c
index 571f948add7c..b806019f925b 100644
--- a/drivers/iommu/apple-dart.c
+++ b/drivers/iommu/apple-dart.c
@@ -150,14 +150,14 @@ struct apple_dart_atomic_stream_map {
 /*
  * This structure is attached to each iommu domain handled by a DART.
  *
- * @pgtbl_ops: pagetable ops allocated by io-pgtable
+ * @pgtbl: pagetable allocated by io-pgtable
  * @finalized: true if the domain has been completely initialized
  * @init_lock: protects domain initialization
  * @stream_maps: streams attached to this domain (valid for DMA/UNMANAGED only)
  * @domain: core iommu domain pointer
  */
 struct apple_dart_domain {
-	struct io_pgtable_ops *pgtbl_ops;
+	struct io_pgtable pgtbl;
 
 	bool finalized;
 	struct mutex init_lock;
@@ -354,12 +354,8 @@ static phys_addr_t apple_dart_iova_to_phys(struct iommu_domain *domain,
 					   dma_addr_t iova)
 {
 	struct apple_dart_domain *dart_domain = to_dart_domain(domain);
-	struct io_pgtable_ops *ops = dart_domain->pgtbl_ops;
 
-	if (!ops)
-		return 0;
-
-	return ops->iova_to_phys(ops, iova);
+	return iopt_iova_to_phys(&dart_domain->pgtbl, iova);
 }
 
 static int apple_dart_map_pages(struct iommu_domain *domain, unsigned long iova,
@@ -368,13 +364,9 @@ static int apple_dart_map_pages(struct iommu_domain *domain, unsigned long iova,
 				size_t *mapped)
 {
 	struct apple_dart_domain *dart_domain = to_dart_domain(domain);
-	struct io_pgtable_ops *ops = dart_domain->pgtbl_ops;
-
-	if (!ops)
-		return -ENODEV;
 
-	return ops->map_pages(ops, iova, paddr, pgsize, pgcount, prot, gfp,
-			      mapped);
+	return iopt_map_pages(&dart_domain->pgtbl, iova, paddr, pgsize, pgcount,
+			      prot, gfp, mapped);
 }
 
 static size_t apple_dart_unmap_pages(struct iommu_domain *domain,
@@ -383,9 +375,9 @@ static size_t apple_dart_unmap_pages(struct iommu_domain *domain,
 				     struct iommu_iotlb_gather *gather)
 {
 	struct apple_dart_domain *dart_domain = to_dart_domain(domain);
-	struct io_pgtable_ops *ops = dart_domain->pgtbl_ops;
 
-	return ops->unmap_pages(ops, iova, pgsize, pgcount, gather);
+	return iopt_unmap_pages(&dart_domain->pgtbl, iova, pgsize, pgcount,
+				gather);
 }
 
 static void
@@ -394,7 +386,7 @@ apple_dart_setup_translation(struct apple_dart_domain *domain,
 {
 	int i;
 	struct io_pgtable_cfg *pgtbl_cfg =
-		&io_pgtable_ops_to_pgtable(domain->pgtbl_ops)->cfg;
+		&io_pgtable_ops_to_params(domain->pgtbl.ops)->cfg;
 
 	for (i = 0; i < pgtbl_cfg->apple_dart_cfg.n_ttbrs; ++i)
 		apple_dart_hw_set_ttbr(stream_map, i,
@@ -435,11 +427,9 @@ static int apple_dart_finalize_domain(struct iommu_domain *domain,
 		.iommu_dev = dart->dev,
 	};
 
-	dart_domain->pgtbl_ops = alloc_io_pgtable_ops(&pgtbl_cfg, domain);
-	if (!dart_domain->pgtbl_ops) {
-		ret = -ENOMEM;
+	ret = alloc_io_pgtable_ops(&dart_domain->pgtbl, &pgtbl_cfg, domain);
+	if (ret)
 		goto done;
-	}
 
 	domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap;
 	domain->geometry.aperture_start = 0;
@@ -590,7 +580,7 @@ static struct iommu_domain *apple_dart_domain_alloc(unsigned int type)
 
 	mutex_init(&dart_domain->init_lock);
 
-	/* no need to allocate pgtbl_ops or do any other finalization steps */
+	/* no need to allocate pgtbl or do any other finalization steps */
 	if (type == IOMMU_DOMAIN_IDENTITY || type == IOMMU_DOMAIN_BLOCKED)
 		dart_domain->finalized = true;
 
@@ -601,8 +591,8 @@ static void apple_dart_domain_free(struct iommu_domain *domain)
 {
 	struct apple_dart_domain *dart_domain = to_dart_domain(domain);
 
-	if (dart_domain->pgtbl_ops)
-		free_io_pgtable_ops(dart_domain->pgtbl_ops);
+	if (dart_domain->pgtbl.ops)
+		free_io_pgtable_ops(&dart_domain->pgtbl);
 
 	kfree(dart_domain);
 }
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index c033b23ca4b2..97d24ee5c14d 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2058,7 +2058,7 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
 
-	free_io_pgtable_ops(smmu_domain->pgtbl_ops);
+	free_io_pgtable_ops(&smmu_domain->pgtbl);
 
 	/* Free the CD and ASID, if we allocated them */
 	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
@@ -2171,7 +2171,6 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain,
 	unsigned long ias, oas;
 	enum io_pgtable_fmt fmt;
 	struct io_pgtable_cfg pgtbl_cfg;
-	struct io_pgtable_ops *pgtbl_ops;
 	int (*finalise_stage_fn)(struct arm_smmu_domain *,
 				 struct arm_smmu_master *,
 				 struct io_pgtable_cfg *);
@@ -2218,9 +2217,9 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain,
 		.iommu_dev	= smmu->dev,
 	};
 
-	pgtbl_ops = alloc_io_pgtable_ops(&pgtbl_cfg, smmu_domain);
-	if (!pgtbl_ops)
-		return -ENOMEM;
+	ret = alloc_io_pgtable_ops(&smmu_domain->pgtbl, &pgtbl_cfg, smmu_domain);
+	if (ret)
+		return ret;
 
 	domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap;
 	domain->geometry.aperture_end = (1UL << pgtbl_cfg.ias) - 1;
@@ -2228,11 +2227,10 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain,
 
 	ret = finalise_stage_fn(smmu_domain, master, &pgtbl_cfg);
 	if (ret < 0) {
-		free_io_pgtable_ops(pgtbl_ops);
+		free_io_pgtable_ops(&smmu_domain->pgtbl);
 		return ret;
 	}
 
-	smmu_domain->pgtbl_ops = pgtbl_ops;
 	return 0;
 }
 
@@ -2468,12 +2466,10 @@ static int arm_smmu_map_pages(struct iommu_domain *domain, unsigned long iova,
 			      phys_addr_t paddr, size_t pgsize, size_t pgcount,
 			      int prot, gfp_t gfp, size_t *mapped)
 {
-	struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
-
-	if (!ops)
-		return -ENODEV;
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 
-	return ops->map_pages(ops, iova, paddr, pgsize, pgcount, prot, gfp, mapped);
+	return iopt_map_pages(&smmu_domain->pgtbl, iova, paddr, pgsize, pgcount,
+			      prot, gfp, mapped);
 }
 
 static size_t arm_smmu_unmap_pages(struct iommu_domain *domain, unsigned long iova,
@@ -2481,12 +2477,9 @@ static size_t arm_smmu_unmap_pages(struct iommu_domain *domain, unsigned long io
 				   struct iommu_iotlb_gather *gather)
 {
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
-	struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
 
-	if (!ops)
-		return 0;
-
-	return ops->unmap_pages(ops, iova, pgsize, pgcount, gather);
+	return iopt_unmap_pages(&smmu_domain->pgtbl, iova, pgsize, pgcount,
+				gather);
 }
 
 static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain)
@@ -2513,12 +2506,9 @@ static void arm_smmu_iotlb_sync(struct iommu_domain *domain,
 static phys_addr_t
 arm_smmu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova)
 {
-	struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
-
-	if (!ops)
-		return 0;
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 
-	return ops->iova_to_phys(ops, iova);
+	return iopt_iova_to_phys(&smmu_domain->pgtbl, iova);
 }
 
 static struct platform_driver arm_smmu_driver;
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index 91d404deb115..0673841167be 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -122,8 +122,8 @@ static const struct io_pgtable_cfg *qcom_adreno_smmu_get_ttbr1_cfg(
 		const void *cookie)
 {
 	struct arm_smmu_domain *smmu_domain = (void *)cookie;
-	struct io_pgtable *pgtable =
-		io_pgtable_ops_to_pgtable(smmu_domain->pgtbl_ops);
+	struct io_pgtable_params *pgtable =
+		io_pgtable_ops_to_params(smmu_domain->pgtbl.ops);
 	return &pgtable->cfg;
 }
 
@@ -137,7 +137,8 @@ static int qcom_adreno_smmu_set_ttbr0_cfg(const void *cookie,
 		const struct io_pgtable_cfg *pgtbl_cfg)
 {
 	struct arm_smmu_domain *smmu_domain = (void *)cookie;
-	struct io_pgtable *pgtable = io_pgtable_ops_to_pgtable(smmu_domain->pgtbl_ops);
+	struct io_pgtable_params *pgtable =
+		io_pgtable_ops_to_params(smmu_domain->pgtbl.ops);
 	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
 	struct arm_smmu_cb *cb = &smmu_domain->smmu->cbs[cfg->cbndx];
 
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index f230d2ce977a..201055254d5b 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -614,7 +614,6 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 {
 	int irq, start, ret = 0;
 	unsigned long ias, oas;
-	struct io_pgtable_ops *pgtbl_ops;
 	struct io_pgtable_cfg pgtbl_cfg;
 	enum io_pgtable_fmt fmt;
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
@@ -765,11 +764,9 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 	if (smmu_domain->pgtbl_quirks)
 		pgtbl_cfg.quirks |= smmu_domain->pgtbl_quirks;
 
-	pgtbl_ops = alloc_io_pgtable_ops(&pgtbl_cfg, smmu_domain);
-	if (!pgtbl_ops) {
-		ret = -ENOMEM;
+	ret = alloc_io_pgtable_ops(&smmu_domain->pgtbl, &pgtbl_cfg, smmu_domain);
+	if (ret)
 		goto out_clear_smmu;
-	}
 
 	/* Update the domain's page sizes to reflect the page table format */
 	domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap;
@@ -808,8 +805,6 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 
 	mutex_unlock(&smmu_domain->init_mutex);
 
-	/* Publish page table ops for map/unmap */
-	smmu_domain->pgtbl_ops = pgtbl_ops;
 	return 0;
 
 out_clear_smmu:
@@ -846,7 +841,7 @@ static void arm_smmu_destroy_domain_context(struct iommu_domain *domain)
 		devm_free_irq(smmu->dev, irq, domain);
 	}
 
-	free_io_pgtable_ops(smmu_domain->pgtbl_ops);
+	free_io_pgtable_ops(&smmu_domain->pgtbl);
 	__arm_smmu_free_bitmap(smmu->context_map, cfg->cbndx);
 
 	arm_smmu_rpm_put(smmu);
@@ -1181,15 +1176,13 @@ static int arm_smmu_map_pages(struct iommu_domain *domain, unsigned long iova,
 			      phys_addr_t paddr, size_t pgsize, size_t pgcount,
 			      int prot, gfp_t gfp, size_t *mapped)
 {
-	struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
-	struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu;
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	int ret;
 
-	if (!ops)
-		return -ENODEV;
-
 	arm_smmu_rpm_get(smmu);
-	ret = ops->map_pages(ops, iova, paddr, pgsize, pgcount, prot, gfp, mapped);
+	ret = iopt_map_pages(&smmu_domain->pgtbl, iova, paddr, pgsize, pgcount,
+			     prot, gfp, mapped);
 	arm_smmu_rpm_put(smmu);
 
 	return ret;
@@ -1199,15 +1192,13 @@ static size_t arm_smmu_unmap_pages(struct iommu_domain *domain, unsigned long io
 				   size_t pgsize, size_t pgcount,
 				   struct iommu_iotlb_gather *iotlb_gather)
 {
-	struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
-	struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu;
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	size_t ret;
 
-	if (!ops)
-		return 0;
-
 	arm_smmu_rpm_get(smmu);
-	ret = ops->unmap_pages(ops, iova, pgsize, pgcount, iotlb_gather);
+	ret = iopt_unmap_pages(&smmu_domain->pgtbl, iova, pgsize, pgcount,
+			       iotlb_gather);
 	arm_smmu_rpm_put(smmu);
 
 	return ret;
@@ -1249,7 +1240,6 @@ static phys_addr_t arm_smmu_iova_to_phys_hard(struct iommu_domain *domain,
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
-	struct io_pgtable_ops *ops= smmu_domain->pgtbl_ops;
 	struct device *dev = smmu->dev;
 	void __iomem *reg;
 	u32 tmp;
@@ -1277,7 +1267,7 @@ static phys_addr_t arm_smmu_iova_to_phys_hard(struct iommu_domain *domain,
 			"iova to phys timed out on %pad. Falling back to software table walk.\n",
 			&iova);
 		arm_smmu_rpm_put(smmu);
-		return ops->iova_to_phys(ops, iova);
+		return iopt_iova_to_phys(&smmu_domain->pgtbl, iova);
 	}
 
 	phys = arm_smmu_cb_readq(smmu, idx, ARM_SMMU_CB_PAR);
@@ -1299,16 +1289,12 @@ static phys_addr_t arm_smmu_iova_to_phys(struct iommu_domain *domain,
 					dma_addr_t iova)
 {
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
-	struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
-
-	if (!ops)
-		return 0;
 
 	if (smmu_domain->smmu->features & ARM_SMMU_FEAT_TRANS_OPS &&
 			smmu_domain->stage == ARM_SMMU_DOMAIN_S1)
 		return arm_smmu_iova_to_phys_hard(domain, iova);
 
-	return ops->iova_to_phys(ops, iova);
+	return iopt_iova_to_phys(&smmu_domain->pgtbl, iova);
 }
 
 static bool arm_smmu_capable(struct device *dev, enum iommu_cap cap)
diff --git a/drivers/iommu/arm/arm-smmu/qcom_iommu.c b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
index 65eb8bdcbe50..56676dd84462 100644
--- a/drivers/iommu/arm/arm-smmu/qcom_iommu.c
+++ b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
@@ -64,7 +64,7 @@ struct qcom_iommu_ctx {
 };
 
 struct qcom_iommu_domain {
-	struct io_pgtable_ops	*pgtbl_ops;
+	struct io_pgtable	 pgtbl;
 	spinlock_t		 pgtbl_lock;
 	struct mutex		 init_mutex; /* Protects iommu pointer */
 	struct iommu_domain	 domain;
@@ -229,7 +229,6 @@ static int qcom_iommu_init_domain(struct iommu_domain *domain,
 {
 	struct qcom_iommu_domain *qcom_domain = to_qcom_iommu_domain(domain);
 	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
-	struct io_pgtable_ops *pgtbl_ops;
 	struct io_pgtable_cfg pgtbl_cfg;
 	int i, ret = 0;
 	u32 reg;
@@ -250,10 +249,9 @@ static int qcom_iommu_init_domain(struct iommu_domain *domain,
 	qcom_domain->iommu = qcom_iommu;
 	qcom_domain->fwspec = fwspec;
 
-	pgtbl_ops = alloc_io_pgtable_ops(&pgtbl_cfg, qcom_domain);
-	if (!pgtbl_ops) {
+	ret = alloc_io_pgtable_ops(&qcom_domain->pgtbl, &pgtbl_cfg, qcom_domain);
+	if (ret) {
 		dev_err(qcom_iommu->dev, "failed to allocate pagetable ops\n");
-		ret = -ENOMEM;
 		goto out_clear_iommu;
 	}
 
@@ -308,9 +306,6 @@ static int qcom_iommu_init_domain(struct iommu_domain *domain,
 
 	mutex_unlock(&qcom_domain->init_mutex);
 
-	/* Publish page table ops for map/unmap */
-	qcom_domain->pgtbl_ops = pgtbl_ops;
-
 	return 0;
 
 out_clear_iommu:
@@ -353,7 +348,7 @@ static void qcom_iommu_domain_free(struct iommu_domain *domain)
 		 * is on to avoid unclocked accesses in the TLB inv path:
 		 */
 		pm_runtime_get_sync(qcom_domain->iommu->dev);
-		free_io_pgtable_ops(qcom_domain->pgtbl_ops);
+		free_io_pgtable_ops(&qcom_domain->pgtbl);
 		pm_runtime_put_sync(qcom_domain->iommu->dev);
 	}
 
@@ -417,13 +412,10 @@ static int qcom_iommu_map(struct iommu_domain *domain, unsigned long iova,
 	int ret;
 	unsigned long flags;
 	struct qcom_iommu_domain *qcom_domain = to_qcom_iommu_domain(domain);
-	struct io_pgtable_ops *ops = qcom_domain->pgtbl_ops;
-
-	if (!ops)
-		return -ENODEV;
 
 	spin_lock_irqsave(&qcom_domain->pgtbl_lock, flags);
-	ret = ops->map_pages(ops, iova, paddr, pgsize, pgcount, prot, GFP_ATOMIC, mapped);
+	ret = iopt_map_pages(&qcom_domain->pgtbl, iova, paddr, pgsize, pgcount,
+			     prot, GFP_ATOMIC, mapped);
 	spin_unlock_irqrestore(&qcom_domain->pgtbl_lock, flags);
 	return ret;
 }
@@ -435,10 +427,6 @@ static size_t qcom_iommu_unmap(struct iommu_domain *domain, unsigned long iova,
 	size_t ret;
 	unsigned long flags;
 	struct qcom_iommu_domain *qcom_domain = to_qcom_iommu_domain(domain);
-	struct io_pgtable_ops *ops = qcom_domain->pgtbl_ops;
-
-	if (!ops)
-		return 0;
 
 	/* NOTE: unmap can be called after client device is powered off,
 	 * for example, with GPUs or anything involving dma-buf.  So we
@@ -447,7 +435,8 @@ static size_t qcom_iommu_unmap(struct iommu_domain *domain, unsigned long iova,
 	 */
 	pm_runtime_get_sync(qcom_domain->iommu->dev);
 	spin_lock_irqsave(&qcom_domain->pgtbl_lock, flags);
-	ret = ops->unmap_pages(ops, iova, pgsize, pgcount, gather);
+	ret = iopt_unmap_pages(&qcom_domain->pgtbl, iova, pgsize, pgcount,
+			       gather);
 	spin_unlock_irqrestore(&qcom_domain->pgtbl_lock, flags);
 	pm_runtime_put_sync(qcom_domain->iommu->dev);
 
@@ -457,13 +446,12 @@ static size_t qcom_iommu_unmap(struct iommu_domain *domain, unsigned long iova,
 static void qcom_iommu_flush_iotlb_all(struct iommu_domain *domain)
 {
 	struct qcom_iommu_domain *qcom_domain = to_qcom_iommu_domain(domain);
-	struct io_pgtable *pgtable = container_of(qcom_domain->pgtbl_ops,
-						  struct io_pgtable, ops);
-	if (!qcom_domain->pgtbl_ops)
+
+	if (!qcom_domain->pgtbl.ops)
 		return;
 
 	pm_runtime_get_sync(qcom_domain->iommu->dev);
-	qcom_iommu_tlb_sync(pgtable->cookie);
+	qcom_iommu_tlb_sync(qcom_domain->pgtbl.cookie);
 	pm_runtime_put_sync(qcom_domain->iommu->dev);
 }
 
@@ -479,13 +467,9 @@ static phys_addr_t qcom_iommu_iova_to_phys(struct iommu_domain *domain,
 	phys_addr_t ret;
 	unsigned long flags;
 	struct qcom_iommu_domain *qcom_domain = to_qcom_iommu_domain(domain);
-	struct io_pgtable_ops *ops = qcom_domain->pgtbl_ops;
-
-	if (!ops)
-		return 0;
 
 	spin_lock_irqsave(&qcom_domain->pgtbl_lock, flags);
-	ret = ops->iova_to_phys(ops, iova);
+	ret = iopt_iova_to_phys(&qcom_domain->pgtbl, iova);
 	spin_unlock_irqrestore(&qcom_domain->pgtbl_lock, flags);
 
 	return ret;
diff --git a/drivers/iommu/io-pgtable-arm-common.c b/drivers/iommu/io-pgtable-arm-common.c
index 4b3a9ce806ea..359086cace34 100644
--- a/drivers/iommu/io-pgtable-arm-common.c
+++ b/drivers/iommu/io-pgtable-arm-common.c
@@ -48,7 +48,8 @@ static void __arm_lpae_clear_pte(arm_lpae_iopte *ptep, struct io_pgtable_cfg *cf
 		__arm_lpae_sync_pte(ptep, 1, cfg);
 }
 
-static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
+static size_t __arm_lpae_unmap(struct io_pgtable *iop,
+			       struct arm_lpae_io_pgtable *data,
 			       struct iommu_iotlb_gather *gather,
 			       unsigned long iova, size_t size, size_t pgcount,
 			       int lvl, arm_lpae_iopte *ptep);
@@ -74,7 +75,8 @@ static void __arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
 		__arm_lpae_sync_pte(ptep, num_entries, cfg);
 }
 
-static int arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
+static int arm_lpae_init_pte(struct io_pgtable *iop,
+			     struct arm_lpae_io_pgtable *data,
 			     unsigned long iova, phys_addr_t paddr,
 			     arm_lpae_iopte prot, int lvl, int num_entries,
 			     arm_lpae_iopte *ptep)
@@ -95,8 +97,8 @@ static int arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
 			size_t sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
 
 			tblp = ptep - ARM_LPAE_LVL_IDX(iova, lvl, data);
-			if (__arm_lpae_unmap(data, NULL, iova + i * sz, sz, 1,
-					     lvl, tblp) != sz) {
+			if (__arm_lpae_unmap(iop, data, NULL, iova + i * sz, sz,
+					     1, lvl, tblp) != sz) {
 				WARN_ON(1);
 				return -EINVAL;
 			}
@@ -139,10 +141,10 @@ static arm_lpae_iopte arm_lpae_install_table(arm_lpae_iopte *table,
 	return old;
 }
 
-int __arm_lpae_map(struct arm_lpae_io_pgtable *data, unsigned long iova,
-		   phys_addr_t paddr, size_t size, size_t pgcount,
-		   arm_lpae_iopte prot, int lvl, arm_lpae_iopte *ptep,
-		   gfp_t gfp, size_t *mapped)
+int __arm_lpae_map(struct io_pgtable *iop, struct arm_lpae_io_pgtable *data,
+		   unsigned long iova, phys_addr_t paddr, size_t size,
+		   size_t pgcount, arm_lpae_iopte prot, int lvl,
+		   arm_lpae_iopte *ptep, gfp_t gfp, size_t *mapped)
 {
 	arm_lpae_iopte *cptep, pte;
 	size_t block_size = ARM_LPAE_BLOCK_SIZE(lvl, data);
@@ -158,7 +160,8 @@ int __arm_lpae_map(struct arm_lpae_io_pgtable *data, unsigned long iova,
 	if (size == block_size) {
 		max_entries = ARM_LPAE_PTES_PER_TABLE(data) - map_idx_start;
 		num_entries = min_t(int, pgcount, max_entries);
-		ret = arm_lpae_init_pte(data, iova, paddr, prot, lvl, num_entries, ptep);
+		ret = arm_lpae_init_pte(iop, data, iova, paddr, prot, lvl,
+					num_entries, ptep);
 		if (!ret)
 			*mapped += num_entries * size;
 
@@ -192,7 +195,7 @@ int __arm_lpae_map(struct arm_lpae_io_pgtable *data, unsigned long iova,
 	}
 
 	/* Rinse, repeat */
-	return __arm_lpae_map(data, iova, paddr, size, pgcount, prot, lvl + 1,
+	return __arm_lpae_map(iop, data, iova, paddr, size, pgcount, prot, lvl + 1,
 			      cptep, gfp, mapped);
 }
 
@@ -260,13 +263,13 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
 	return pte;
 }
 
-int arm_lpae_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
+int arm_lpae_map_pages(struct io_pgtable *iop, unsigned long iova,
 		       phys_addr_t paddr, size_t pgsize, size_t pgcount,
 		       int iommu_prot, gfp_t gfp, size_t *mapped)
 {
-	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
+	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(iop->ops);
 	struct io_pgtable_cfg *cfg = &data->iop.cfg;
-	arm_lpae_iopte *ptep = data->pgd;
+	arm_lpae_iopte *ptep = iop->pgd;
 	int ret, lvl = data->start_level;
 	arm_lpae_iopte prot;
 	long iaext = (s64)iova >> cfg->ias;
@@ -284,7 +287,7 @@ int arm_lpae_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
 		return 0;
 
 	prot = arm_lpae_prot_to_pte(data, iommu_prot);
-	ret = __arm_lpae_map(data, iova, paddr, pgsize, pgcount, prot, lvl,
+	ret = __arm_lpae_map(iop, data, iova, paddr, pgsize, pgcount, prot, lvl,
 			     ptep, gfp, mapped);
 	/*
 	 * Synchronise all PTE updates for the new mapping before there's
@@ -326,7 +329,8 @@ void __arm_lpae_free_pgtable(struct arm_lpae_io_pgtable *data, int lvl,
 	__arm_lpae_free_pages(start, table_size, &data->iop.cfg);
 }
 
-static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
+static size_t arm_lpae_split_blk_unmap(struct io_pgtable *iop,
+				       struct arm_lpae_io_pgtable *data,
 				       struct iommu_iotlb_gather *gather,
 				       unsigned long iova, size_t size,
 				       arm_lpae_iopte blk_pte, int lvl,
@@ -378,21 +382,24 @@ static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
 		tablep = iopte_deref(pte, data);
 	} else if (unmap_idx_start >= 0) {
 		for (i = 0; i < num_entries; i++)
-			io_pgtable_tlb_add_page(&data->iop, gather, iova + i * size, size);
+			io_pgtable_tlb_add_page(cfg, iop, gather,
+						iova + i * size, size);
 
 		return num_entries * size;
 	}
 
-	return __arm_lpae_unmap(data, gather, iova, size, pgcount, lvl, tablep);
+	return __arm_lpae_unmap(iop, data, gather, iova, size, pgcount, lvl,
+				tablep);
 }
 
-static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
+static size_t __arm_lpae_unmap(struct io_pgtable *iop,
+			       struct arm_lpae_io_pgtable *data,
 			       struct iommu_iotlb_gather *gather,
 			       unsigned long iova, size_t size, size_t pgcount,
 			       int lvl, arm_lpae_iopte *ptep)
 {
 	arm_lpae_iopte pte;
-	struct io_pgtable *iop = &data->iop;
+	struct io_pgtable_cfg *cfg = &data->iop.cfg;
 	int i = 0, num_entries, max_entries, unmap_idx_start;
 
 	/* Something went horribly wrong and we ran out of page table */
@@ -415,15 +422,16 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
 			if (WARN_ON(!pte))
 				break;
 
-			__arm_lpae_clear_pte(ptep, &iop->cfg);
+			__arm_lpae_clear_pte(ptep, cfg);
 
-			if (!iopte_leaf(pte, lvl, iop->cfg.fmt)) {
+			if (!iopte_leaf(pte, lvl, cfg->fmt)) {
 				/* Also flush any partial walks */
-				io_pgtable_tlb_flush_walk(iop, iova + i * size, size,
+				io_pgtable_tlb_flush_walk(cfg, iop, iova + i * size, size,
 							  ARM_LPAE_GRANULE(data));
 				__arm_lpae_free_pgtable(data, lvl + 1, iopte_deref(pte, data));
 			} else if (!iommu_iotlb_gather_queued(gather)) {
-				io_pgtable_tlb_add_page(iop, gather, iova + i * size, size);
+				io_pgtable_tlb_add_page(cfg, iop, gather,
+							iova + i * size, size);
 			}
 
 			ptep++;
@@ -431,27 +439,28 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
 		}
 
 		return i * size;
-	} else if (iopte_leaf(pte, lvl, iop->cfg.fmt)) {
+	} else if (iopte_leaf(pte, lvl, cfg->fmt)) {
 		/*
 		 * Insert a table at the next level to map the old region,
 		 * minus the part we want to unmap
 		 */
-		return arm_lpae_split_blk_unmap(data, gather, iova, size, pte,
-						lvl + 1, ptep, pgcount);
+		return arm_lpae_split_blk_unmap(iop, data, gather, iova, size,
+						pte, lvl + 1, ptep, pgcount);
 	}
 
 	/* Keep on walkin' */
 	ptep = iopte_deref(pte, data);
-	return __arm_lpae_unmap(data, gather, iova, size, pgcount, lvl + 1, ptep);
+	return __arm_lpae_unmap(iop, data, gather, iova, size,
+				pgcount, lvl + 1, ptep);
 }
 
-size_t arm_lpae_unmap_pages(struct io_pgtable_ops *ops, unsigned long iova,
+size_t arm_lpae_unmap_pages(struct io_pgtable *iop, unsigned long iova,
 			    size_t pgsize, size_t pgcount,
 			    struct iommu_iotlb_gather *gather)
 {
-	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
+	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(iop->ops);
 	struct io_pgtable_cfg *cfg = &data->iop.cfg;
-	arm_lpae_iopte *ptep = data->pgd;
+	arm_lpae_iopte *ptep = iop->pgd;
 	long iaext = (s64)iova >> cfg->ias;
 
 	if (WARN_ON(!pgsize || (pgsize & cfg->pgsize_bitmap) != pgsize || !pgcount))
@@ -462,15 +471,14 @@ size_t arm_lpae_unmap_pages(struct io_pgtable_ops *ops, unsigned long iova,
 	if (WARN_ON(iaext))
 		return 0;
 
-	return __arm_lpae_unmap(data, gather, iova, pgsize, pgcount,
-				data->start_level, ptep);
+	return __arm_lpae_unmap(iop, data, gather, iova, pgsize,
+				pgcount, data->start_level, ptep);
 }
 
-phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
-				  unsigned long iova)
+static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable *iop, unsigned long iova)
 {
-	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
-	arm_lpae_iopte pte, *ptep = data->pgd;
+	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(iop->ops);
+	arm_lpae_iopte pte, *ptep = iop->pgd;
 	int lvl = data->start_level;
 
 	do {
diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
index 278b4299d757..2dd12fabfaee 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -40,7 +40,7 @@
 	container_of((x), struct arm_v7s_io_pgtable, iop)
 
 #define io_pgtable_ops_to_data(x)					\
-	io_pgtable_to_data(io_pgtable_ops_to_pgtable(x))
+	io_pgtable_to_data(io_pgtable_ops_to_params(x))
 
 /*
  * We have 32 bits total; 12 bits resolved at level 1, 8 bits at level 2,
@@ -162,11 +162,10 @@ typedef u32 arm_v7s_iopte;
 static bool selftest_running;
 
 struct arm_v7s_io_pgtable {
-	struct io_pgtable	iop;
+	struct io_pgtable_params	iop;
 
-	arm_v7s_iopte		*pgd;
-	struct kmem_cache	*l2_tables;
-	spinlock_t		split_lock;
+	struct kmem_cache		*l2_tables;
+	spinlock_t			split_lock;
 };
 
 static bool arm_v7s_pte_is_cont(arm_v7s_iopte pte, int lvl);
@@ -424,13 +423,14 @@ static bool arm_v7s_pte_is_cont(arm_v7s_iopte pte, int lvl)
 	return false;
 }
 
-static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable *,
+static size_t __arm_v7s_unmap(struct io_pgtable *, struct arm_v7s_io_pgtable *,
 			      struct iommu_iotlb_gather *, unsigned long,
 			      size_t, int, arm_v7s_iopte *);
 
-static int arm_v7s_init_pte(struct arm_v7s_io_pgtable *data,
-			    unsigned long iova, phys_addr_t paddr, int prot,
-			    int lvl, int num_entries, arm_v7s_iopte *ptep)
+static int arm_v7s_init_pte(struct io_pgtable *iop,
+			    struct arm_v7s_io_pgtable *data, unsigned long iova,
+			    phys_addr_t paddr, int prot, int lvl,
+			    int num_entries, arm_v7s_iopte *ptep)
 {
 	struct io_pgtable_cfg *cfg = &data->iop.cfg;
 	arm_v7s_iopte pte;
@@ -446,7 +446,7 @@ static int arm_v7s_init_pte(struct arm_v7s_io_pgtable *data,
 			size_t sz = ARM_V7S_BLOCK_SIZE(lvl);
 
 			tblp = ptep - ARM_V7S_LVL_IDX(iova, lvl, cfg);
-			if (WARN_ON(__arm_v7s_unmap(data, NULL, iova + i * sz,
+			if (WARN_ON(__arm_v7s_unmap(iop, data, NULL, iova + i * sz,
 						    sz, lvl, tblp) != sz))
 				return -EINVAL;
 		} else if (ptep[i]) {
@@ -494,9 +494,9 @@ static arm_v7s_iopte arm_v7s_install_table(arm_v7s_iopte *table,
 	return old;
 }
 
-static int __arm_v7s_map(struct arm_v7s_io_pgtable *data, unsigned long iova,
-			 phys_addr_t paddr, size_t size, int prot,
-			 int lvl, arm_v7s_iopte *ptep, gfp_t gfp)
+static int __arm_v7s_map(struct io_pgtable *iop, struct arm_v7s_io_pgtable *data,
+			 unsigned long iova, phys_addr_t paddr, size_t size,
+			 int prot, int lvl, arm_v7s_iopte *ptep, gfp_t gfp)
 {
 	struct io_pgtable_cfg *cfg = &data->iop.cfg;
 	arm_v7s_iopte pte, *cptep;
@@ -507,7 +507,7 @@ static int __arm_v7s_map(struct arm_v7s_io_pgtable *data, unsigned long iova,
 
 	/* If we can install a leaf entry at this level, then do so */
 	if (num_entries)
-		return arm_v7s_init_pte(data, iova, paddr, prot,
+		return arm_v7s_init_pte(iop, data, iova, paddr, prot,
 					lvl, num_entries, ptep);
 
 	/* We can't allocate tables at the final level */
@@ -538,14 +538,14 @@ static int __arm_v7s_map(struct arm_v7s_io_pgtable *data, unsigned long iova,
 	}
 
 	/* Rinse, repeat */
-	return __arm_v7s_map(data, iova, paddr, size, prot, lvl + 1, cptep, gfp);
+	return __arm_v7s_map(iop, data, iova, paddr, size, prot, lvl + 1, cptep, gfp);
 }
 
-static int arm_v7s_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
+static int arm_v7s_map_pages(struct io_pgtable *iop, unsigned long iova,
 			     phys_addr_t paddr, size_t pgsize, size_t pgcount,
 			     int prot, gfp_t gfp, size_t *mapped)
 {
-	struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(ops);
+	struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(iop->ops);
 	int ret = -EINVAL;
 
 	if (WARN_ON(iova >= (1ULL << data->iop.cfg.ias) ||
@@ -557,8 +557,8 @@ static int arm_v7s_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
 		return 0;
 
 	while (pgcount--) {
-		ret = __arm_v7s_map(data, iova, paddr, pgsize, prot, 1, data->pgd,
-				    gfp);
+		ret = __arm_v7s_map(iop, data, iova, paddr, pgsize, prot, 1,
+				    iop->pgd, gfp);
 		if (ret)
 			break;
 
@@ -577,26 +577,26 @@ static int arm_v7s_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
 
 static void arm_v7s_free_pgtable(struct io_pgtable *iop)
 {
-	struct arm_v7s_io_pgtable *data = io_pgtable_to_data(iop);
+	struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(iop->ops);
+	arm_v7s_iopte *ptep = iop->pgd;
 	int i;
 
-	for (i = 0; i < ARM_V7S_PTES_PER_LVL(1, &data->iop.cfg); i++) {
-		arm_v7s_iopte pte = data->pgd[i];
-
-		if (ARM_V7S_PTE_IS_TABLE(pte, 1))
-			__arm_v7s_free_table(iopte_deref(pte, 1, data),
+	for (i = 0; i < ARM_V7S_PTES_PER_LVL(1, &data->iop.cfg); i++, ptep++) {
+		if (ARM_V7S_PTE_IS_TABLE(*ptep, 1))
+			__arm_v7s_free_table(iopte_deref(*ptep, 1, data),
 					     2, data);
 	}
-	__arm_v7s_free_table(data->pgd, 1, data);
+	__arm_v7s_free_table(iop->pgd, 1, data);
 	kmem_cache_destroy(data->l2_tables);
 	kfree(data);
 }
 
-static arm_v7s_iopte arm_v7s_split_cont(struct arm_v7s_io_pgtable *data,
+static arm_v7s_iopte arm_v7s_split_cont(struct io_pgtable *iop,
+					struct arm_v7s_io_pgtable *data,
 					unsigned long iova, int idx, int lvl,
 					arm_v7s_iopte *ptep)
 {
-	struct io_pgtable *iop = &data->iop;
+	struct io_pgtable_cfg *cfg = &data->iop.cfg;
 	arm_v7s_iopte pte;
 	size_t size = ARM_V7S_BLOCK_SIZE(lvl);
 	int i;
@@ -611,14 +611,15 @@ static arm_v7s_iopte arm_v7s_split_cont(struct arm_v7s_io_pgtable *data,
 	for (i = 0; i < ARM_V7S_CONT_PAGES; i++)
 		ptep[i] = pte + i * size;
 
-	__arm_v7s_pte_sync(ptep, ARM_V7S_CONT_PAGES, &iop->cfg);
+	__arm_v7s_pte_sync(ptep, ARM_V7S_CONT_PAGES, cfg);
 
 	size *= ARM_V7S_CONT_PAGES;
-	io_pgtable_tlb_flush_walk(iop, iova, size, size);
+	io_pgtable_tlb_flush_walk(cfg, iop, iova, size, size);
 	return pte;
 }
 
-static size_t arm_v7s_split_blk_unmap(struct arm_v7s_io_pgtable *data,
+static size_t arm_v7s_split_blk_unmap(struct io_pgtable *iop,
+				      struct arm_v7s_io_pgtable *data,
 				      struct iommu_iotlb_gather *gather,
 				      unsigned long iova, size_t size,
 				      arm_v7s_iopte blk_pte,
@@ -656,27 +657,28 @@ static size_t arm_v7s_split_blk_unmap(struct arm_v7s_io_pgtable *data,
 			return 0;
 
 		tablep = iopte_deref(pte, 1, data);
-		return __arm_v7s_unmap(data, gather, iova, size, 2, tablep);
+		return __arm_v7s_unmap(iop, data, gather, iova, size, 2, tablep);
 	}
 
-	io_pgtable_tlb_add_page(&data->iop, gather, iova, size);
+	io_pgtable_tlb_add_page(cfg, iop, gather, iova, size);
 	return size;
 }
 
-static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable *data,
+static size_t __arm_v7s_unmap(struct io_pgtable *iop,
+			      struct arm_v7s_io_pgtable *data,
 			      struct iommu_iotlb_gather *gather,
 			      unsigned long iova, size_t size, int lvl,
 			      arm_v7s_iopte *ptep)
 {
 	arm_v7s_iopte pte[ARM_V7S_CONT_PAGES];
-	struct io_pgtable *iop = &data->iop;
+	struct io_pgtable_cfg *cfg = &data->iop.cfg;
 	int idx, i = 0, num_entries = size >> ARM_V7S_LVL_SHIFT(lvl);
 
 	/* Something went horribly wrong and we ran out of page table */
 	if (WARN_ON(lvl > 2))
 		return 0;
 
-	idx = ARM_V7S_LVL_IDX(iova, lvl, &iop->cfg);
+	idx = ARM_V7S_LVL_IDX(iova, lvl, cfg);
 	ptep += idx;
 	do {
 		pte[i] = READ_ONCE(ptep[i]);
@@ -698,7 +700,7 @@ static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable *data,
 		unsigned long flags;
 
 		spin_lock_irqsave(&data->split_lock, flags);
-		pte[0] = arm_v7s_split_cont(data, iova, idx, lvl, ptep);
+		pte[0] = arm_v7s_split_cont(iop, data, iova, idx, lvl, ptep);
 		spin_unlock_irqrestore(&data->split_lock, flags);
 	}
 
@@ -706,17 +708,18 @@ static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable *data,
 	if (num_entries) {
 		size_t blk_size = ARM_V7S_BLOCK_SIZE(lvl);
 
-		__arm_v7s_set_pte(ptep, 0, num_entries, &iop->cfg);
+		__arm_v7s_set_pte(ptep, 0, num_entries, cfg);
 
 		for (i = 0; i < num_entries; i++) {
 			if (ARM_V7S_PTE_IS_TABLE(pte[i], lvl)) {
 				/* Also flush any partial walks */
-				io_pgtable_tlb_flush_walk(iop, iova, blk_size,
+				io_pgtable_tlb_flush_walk(cfg, iop, iova, blk_size,
 						ARM_V7S_BLOCK_SIZE(lvl + 1));
 				ptep = iopte_deref(pte[i], lvl, data);
 				__arm_v7s_free_table(ptep, lvl + 1, data);
 			} else if (!iommu_iotlb_gather_queued(gather)) {
-				io_pgtable_tlb_add_page(iop, gather, iova, blk_size);
+				io_pgtable_tlb_add_page(cfg, iop, gather, iova,
+							blk_size);
 			}
 			iova += blk_size;
 		}
@@ -726,27 +729,27 @@ static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable *data,
 		 * Insert a table at the next level to map the old region,
 		 * minus the part we want to unmap
 		 */
-		return arm_v7s_split_blk_unmap(data, gather, iova, size, pte[0],
-					       ptep);
+		return arm_v7s_split_blk_unmap(iop, data, gather, iova, size,
+					       pte[0], ptep);
 	}
 
 	/* Keep on walkin' */
 	ptep = iopte_deref(pte[0], lvl, data);
-	return __arm_v7s_unmap(data, gather, iova, size, lvl + 1, ptep);
+	return __arm_v7s_unmap(iop, data, gather, iova, size, lvl + 1, ptep);
 }
 
-static size_t arm_v7s_unmap_pages(struct io_pgtable_ops *ops, unsigned long iova,
+static size_t arm_v7s_unmap_pages(struct io_pgtable *iop, unsigned long iova,
 				  size_t pgsize, size_t pgcount,
 				  struct iommu_iotlb_gather *gather)
 {
-	struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(ops);
+	struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(iop->ops);
 	size_t unmapped = 0, ret;
 
 	if (WARN_ON(iova >= (1ULL << data->iop.cfg.ias)))
 		return 0;
 
 	while (pgcount--) {
-		ret = __arm_v7s_unmap(data, gather, iova, pgsize, 1, data->pgd);
+		ret = __arm_v7s_unmap(iop, data, gather, iova, pgsize, 1, iop->pgd);
 		if (!ret)
 			break;
 
@@ -757,11 +760,11 @@ static size_t arm_v7s_unmap_pages(struct io_pgtable_ops *ops, unsigned long iova
 	return unmapped;
 }
 
-static phys_addr_t arm_v7s_iova_to_phys(struct io_pgtable_ops *ops,
+static phys_addr_t arm_v7s_iova_to_phys(struct io_pgtable *iop,
 					unsigned long iova)
 {
-	struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(ops);
-	arm_v7s_iopte *ptep = data->pgd, pte;
+	struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(iop->ops);
+	arm_v7s_iopte *ptep = iop->pgd, pte;
 	int lvl = 0;
 	u32 mask;
 
@@ -780,37 +783,37 @@ static phys_addr_t arm_v7s_iova_to_phys(struct io_pgtable_ops *ops,
 	return iopte_to_paddr(pte, lvl, &data->iop.cfg) | (iova & ~mask);
 }
 
-static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg,
-						void *cookie)
+static int arm_v7s_alloc_pgtable(struct io_pgtable *iop,
+				 struct io_pgtable_cfg *cfg, void *cookie)
 {
 	struct arm_v7s_io_pgtable *data;
 	slab_flags_t slab_flag;
 	phys_addr_t paddr;
 
 	if (cfg->ias > (arm_v7s_is_mtk_enabled(cfg) ? 34 : ARM_V7S_ADDR_BITS))
-		return NULL;
+		return -EINVAL;
 
 	if (cfg->oas > (arm_v7s_is_mtk_enabled(cfg) ? 35 : ARM_V7S_ADDR_BITS))
-		return NULL;
+		return -EINVAL;
 
 	if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS |
 			    IO_PGTABLE_QUIRK_NO_PERMS |
 			    IO_PGTABLE_QUIRK_ARM_MTK_EXT |
 			    IO_PGTABLE_QUIRK_ARM_MTK_TTBR_EXT))
-		return NULL;
+		return -EINVAL;
 
 	/* If ARM_MTK_4GB is enabled, the NO_PERMS is also expected. */
 	if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_MTK_EXT &&
 	    !(cfg->quirks & IO_PGTABLE_QUIRK_NO_PERMS))
-			return NULL;
+		return -EINVAL;
 
 	if ((cfg->quirks & IO_PGTABLE_QUIRK_ARM_MTK_TTBR_EXT) &&
 	    !arm_v7s_is_mtk_enabled(cfg))
-		return NULL;
+		return -EINVAL;
 
 	data = kmalloc(sizeof(*data), GFP_KERNEL);
 	if (!data)
-		return NULL;
+		return -ENOMEM;
 
 	spin_lock_init(&data->split_lock);
 
@@ -860,15 +863,15 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg,
 				ARM_V7S_NMRR_OR(7, ARM_V7S_RGN_WBWA);
 
 	/* Looking good; allocate a pgd */
-	data->pgd = __arm_v7s_alloc_table(1, GFP_KERNEL, data);
-	if (!data->pgd)
+	iop->pgd = __arm_v7s_alloc_table(1, GFP_KERNEL, data);
+	if (!iop->pgd)
 		goto out_free_data;
 
 	/* Ensure the empty pgd is visible before any actual TTBR write */
 	wmb();
 
 	/* TTBR */
-	paddr = virt_to_phys(data->pgd);
+	paddr = virt_to_phys(iop->pgd);
 	if (arm_v7s_is_mtk_enabled(cfg))
 		cfg->arm_v7s_cfg.ttbr = paddr | upper_32_bits(paddr);
 	else
@@ -878,12 +881,13 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg,
 					 ARM_V7S_TTBR_ORGN_ATTR(ARM_V7S_RGN_WBWA)) :
 					(ARM_V7S_TTBR_IRGN_ATTR(ARM_V7S_RGN_NC) |
 					 ARM_V7S_TTBR_ORGN_ATTR(ARM_V7S_RGN_NC)));
-	return &data->iop;
+	iop->ops = &data->iop.ops;
+	return 0;
 
 out_free_data:
 	kmem_cache_destroy(data->l2_tables);
 	kfree(data);
-	return NULL;
+	return -EINVAL;
 }
 
 struct io_pgtable_init_fns io_pgtable_arm_v7s_init_fns = {
@@ -920,7 +924,7 @@ static const struct iommu_flush_ops dummy_tlb_ops __initconst = {
 	.tlb_add_page	= dummy_tlb_add_page,
 };
 
-#define __FAIL(ops)	({				\
+#define __FAIL()	({				\
 		WARN(1, "selftest: test failed\n");	\
 		selftest_running = false;		\
 		-EFAULT;				\
@@ -928,7 +932,7 @@ static const struct iommu_flush_ops dummy_tlb_ops __initconst = {
 
 static int __init arm_v7s_do_selftests(void)
 {
-	struct io_pgtable_ops *ops;
+	struct io_pgtable iop;
 	struct io_pgtable_cfg cfg = {
 		.fmt = ARM_V7S,
 		.tlb = &dummy_tlb_ops,
@@ -946,8 +950,7 @@ static int __init arm_v7s_do_selftests(void)
 
 	cfg_cookie = &cfg;
 
-	ops = alloc_io_pgtable_ops(&cfg, &cfg);
-	if (!ops) {
+	if (alloc_io_pgtable_ops(&iop, &cfg, &cfg)) {
 		pr_err("selftest: failed to allocate io pgtable ops\n");
 		return -EINVAL;
 	}
@@ -956,14 +959,14 @@ static int __init arm_v7s_do_selftests(void)
 	 * Initial sanity checks.
 	 * Empty page tables shouldn't provide any translations.
 	 */
-	if (ops->iova_to_phys(ops, 42))
-		return __FAIL(ops);
+	if (iopt_iova_to_phys(&iop, 42))
+		return __FAIL();
 
-	if (ops->iova_to_phys(ops, SZ_1G + 42))
-		return __FAIL(ops);
+	if (iopt_iova_to_phys(&iop, SZ_1G + 42))
+		return __FAIL();
 
-	if (ops->iova_to_phys(ops, SZ_2G + 42))
-		return __FAIL(ops);
+	if (iopt_iova_to_phys(&iop, SZ_2G + 42))
+		return __FAIL();
 
 	/*
 	 * Distinct mappings of different granule sizes.
@@ -971,20 +974,20 @@ static int __init arm_v7s_do_selftests(void)
 	iova = 0;
 	for_each_set_bit(i, &cfg.pgsize_bitmap, BITS_PER_LONG) {
 		size = 1UL << i;
-		if (ops->map_pages(ops, iova, iova, size, 1,
+		if (iopt_map_pages(&iop, iova, iova, size, 1,
 				   IOMMU_READ | IOMMU_WRITE |
 				   IOMMU_NOEXEC | IOMMU_CACHE,
 				   GFP_KERNEL, &mapped))
-			return __FAIL(ops);
+			return __FAIL();
 
 		/* Overlapping mappings */
-		if (!ops->map_pages(ops, iova, iova + size, size, 1,
+		if (!iopt_map_pages(&iop, iova, iova + size, size, 1,
 				    IOMMU_READ | IOMMU_NOEXEC, GFP_KERNEL,
 				    &mapped))
-			return __FAIL(ops);
+			return __FAIL();
 
-		if (ops->iova_to_phys(ops, iova + 42) != (iova + 42))
-			return __FAIL(ops);
+		if (iopt_iova_to_phys(&iop, iova + 42) != (iova + 42))
+			return __FAIL();
 
 		iova += SZ_16M;
 		loopnr++;
@@ -995,17 +998,17 @@ static int __init arm_v7s_do_selftests(void)
 	size = 1UL << __ffs(cfg.pgsize_bitmap);
 	while (i < loopnr) {
 		iova_start = i * SZ_16M;
-		if (ops->unmap_pages(ops, iova_start + size, size, 1, NULL) != size)
-			return __FAIL(ops);
+		if (iopt_unmap_pages(&iop, iova_start + size, size, 1, NULL) != size)
+			return __FAIL();
 
 		/* Remap of partial unmap */
-		if (ops->map_pages(ops, iova_start + size, size, size, 1,
+		if (iopt_map_pages(&iop, iova_start + size, size, size, 1,
 				   IOMMU_READ, GFP_KERNEL, &mapped))
-			return __FAIL(ops);
+			return __FAIL();
 
-		if (ops->iova_to_phys(ops, iova_start + size + 42)
+		if (iopt_iova_to_phys(&iop, iova_start + size + 42)
 		    != (size + 42))
-			return __FAIL(ops);
+			return __FAIL();
 		i++;
 	}
 
@@ -1014,24 +1017,24 @@ static int __init arm_v7s_do_selftests(void)
 	for_each_set_bit(i, &cfg.pgsize_bitmap, BITS_PER_LONG) {
 		size = 1UL << i;
 
-		if (ops->unmap_pages(ops, iova, size, 1, NULL) != size)
-			return __FAIL(ops);
+		if (iopt_unmap_pages(&iop, iova, size, 1, NULL) != size)
+			return __FAIL();
 
-		if (ops->iova_to_phys(ops, iova + 42))
-			return __FAIL(ops);
+		if (iopt_iova_to_phys(&iop, iova + 42))
+			return __FAIL();
 
 		/* Remap full block */
-		if (ops->map_pages(ops, iova, iova, size, 1, IOMMU_WRITE,
+		if (iopt_map_pages(&iop, iova, iova, size, 1, IOMMU_WRITE,
 				   GFP_KERNEL, &mapped))
-			return __FAIL(ops);
+			return __FAIL();
 
-		if (ops->iova_to_phys(ops, iova + 42) != (iova + 42))
-			return __FAIL(ops);
+		if (iopt_iova_to_phys(&iop, iova + 42) != (iova + 42))
+			return __FAIL();
 
 		iova += SZ_16M;
 	}
 
-	free_io_pgtable_ops(ops);
+	free_io_pgtable_ops(&iop);
 
 	selftest_running = false;
 
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index c412500efadf..bee8980c89eb 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -82,40 +82,40 @@ void __arm_lpae_sync_pte(arm_lpae_iopte *ptep, int num_entries,
 
 static void arm_lpae_free_pgtable(struct io_pgtable *iop)
 {
-	struct arm_lpae_io_pgtable *data = io_pgtable_to_data(iop);
+	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(iop->ops);
 
-	__arm_lpae_free_pgtable(data, data->start_level, data->pgd);
+	__arm_lpae_free_pgtable(data, data->start_level, iop->pgd);
 	kfree(data);
 }
 
-static struct io_pgtable *
-arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
+int arm_64_lpae_alloc_pgtable_s1(struct io_pgtable *iop,
+				 struct io_pgtable_cfg *cfg, void *cookie)
 {
 	struct arm_lpae_io_pgtable *data;
 
 	data = kzalloc(sizeof(*data), GFP_KERNEL);
 	if (!data)
-		return NULL;
+		return -ENOMEM;
 
 	if (arm_lpae_init_pgtable_s1(cfg, data))
 		goto out_free_data;
 
 	/* Looking good; allocate a pgd */
-	data->pgd = __arm_lpae_alloc_pages(ARM_LPAE_PGD_SIZE(data),
-					   GFP_KERNEL, cfg);
-	if (!data->pgd)
+	iop->pgd = __arm_lpae_alloc_pages(ARM_LPAE_PGD_SIZE(data),
+					  GFP_KERNEL, cfg);
+	if (!iop->pgd)
 		goto out_free_data;
 
 	/* Ensure the empty pgd is visible before any actual TTBR write */
 	wmb();
 
-	/* TTBR */
-	cfg->arm_lpae_s1_cfg.ttbr = virt_to_phys(data->pgd);
-	return &data->iop;
+	cfg->arm_lpae_s1_cfg.ttbr = virt_to_phys(iop->pgd);
+	iop->ops = &data->iop.ops;
+	return 0;
 
 out_free_data:
 	kfree(data);
-	return NULL;
+	return -EINVAL;
 }
 
 static int arm_64_lpae_configure_s1(struct io_pgtable_cfg *cfg, size_t *pgd_size)
@@ -130,34 +130,35 @@ static int arm_64_lpae_configure_s1(struct io_pgtable_cfg *cfg, size_t *pgd_size
 	return 0;
 }
 
-static struct io_pgtable *
-arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
+int arm_64_lpae_alloc_pgtable_s2(struct io_pgtable *iop,
+				 struct io_pgtable_cfg *cfg, void *cookie)
 {
 	struct arm_lpae_io_pgtable *data;
 
 	data = kzalloc(sizeof(*data), GFP_KERNEL);
 	if (!data)
-		return NULL;
+		return -ENOMEM;
 
 	if (arm_lpae_init_pgtable_s2(cfg, data))
 		goto out_free_data;
 
 	/* Allocate pgd pages */
-	data->pgd = __arm_lpae_alloc_pages(ARM_LPAE_PGD_SIZE(data),
-					   GFP_KERNEL, cfg);
-	if (!data->pgd)
+	iop->pgd = __arm_lpae_alloc_pages(ARM_LPAE_PGD_SIZE(data),
+					  GFP_KERNEL, cfg);
+	if (!iop->pgd)
 		goto out_free_data;
 
 	/* Ensure the empty pgd is visible before any actual TTBR write */
 	wmb();
 
 	/* VTTBR */
-	cfg->arm_lpae_s2_cfg.vttbr = virt_to_phys(data->pgd);
-	return &data->iop;
+	cfg->arm_lpae_s2_cfg.vttbr = virt_to_phys(iop->pgd);
+	iop->ops = &data->iop.ops;
+	return 0;
 
 out_free_data:
 	kfree(data);
-	return NULL;
+	return -EINVAL;
 }
 
 static int arm_64_lpae_configure_s2(struct io_pgtable_cfg *cfg, size_t *pgd_size)
@@ -172,46 +173,46 @@ static int arm_64_lpae_configure_s2(struct io_pgtable_cfg *cfg, size_t *pgd_size
 	return 0;
 }
 
-static struct io_pgtable *
-arm_32_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
+int arm_32_lpae_alloc_pgtable_s1(struct io_pgtable *iop,
+				 struct io_pgtable_cfg *cfg, void *cookie)
 {
 	if (cfg->ias > 32 || cfg->oas > 40)
-		return NULL;
+		return -EINVAL;
 
 	cfg->pgsize_bitmap &= (SZ_4K | SZ_2M | SZ_1G);
-	return arm_64_lpae_alloc_pgtable_s1(cfg, cookie);
+	return arm_64_lpae_alloc_pgtable_s1(iop, cfg, cookie);
 }
 
-static struct io_pgtable *
-arm_32_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
+int arm_32_lpae_alloc_pgtable_s2(struct io_pgtable *iop,
+				 struct io_pgtable_cfg *cfg, void *cookie)
 {
 	if (cfg->ias > 40 || cfg->oas > 40)
-		return NULL;
+		return -EINVAL;
 
 	cfg->pgsize_bitmap &= (SZ_4K | SZ_2M | SZ_1G);
-	return arm_64_lpae_alloc_pgtable_s2(cfg, cookie);
+	return arm_64_lpae_alloc_pgtable_s2(iop, cfg, cookie);
 }
 
-static struct io_pgtable *
-arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
+int arm_mali_lpae_alloc_pgtable(struct io_pgtable *iop,
+				struct io_pgtable_cfg *cfg, void *cookie)
 {
 	struct arm_lpae_io_pgtable *data;
 
 	/* No quirks for Mali (hopefully) */
 	if (cfg->quirks)
-		return NULL;
+		return -EINVAL;
 
 	if (cfg->ias > 48 || cfg->oas > 40)
-		return NULL;
+		return -EINVAL;
 
 	cfg->pgsize_bitmap &= (SZ_4K | SZ_2M | SZ_1G);
 
 	data = kzalloc(sizeof(*data), GFP_KERNEL);
 	if (!data)
-		return NULL;
+		return -ENOMEM;
 
 	if (arm_lpae_init_pgtable(cfg, data))
-		return NULL;
+		goto out_free_data;
 
 	/* Mali seems to need a full 4-level table regardless of IAS */
 	if (data->start_level > 0) {
@@ -233,25 +234,26 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
 		(ARM_MALI_LPAE_MEMATTR_IMP_DEF
 		 << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_DEV));
 
-	data->pgd = __arm_lpae_alloc_pages(ARM_LPAE_PGD_SIZE(data), GFP_KERNEL,
-					   cfg);
-	if (!data->pgd)
+	iop->pgd = __arm_lpae_alloc_pages(ARM_LPAE_PGD_SIZE(data), GFP_KERNEL,
+					  cfg);
+	if (!iop->pgd)
 		goto out_free_data;
 
 	/* Ensure the empty pgd is visible before TRANSTAB can be written */
 	wmb();
 
-	cfg->arm_mali_lpae_cfg.transtab = virt_to_phys(data->pgd) |
+	cfg->arm_mali_lpae_cfg.transtab = virt_to_phys(iop->pgd) |
 					  ARM_MALI_LPAE_TTBR_READ_INNER |
 					  ARM_MALI_LPAE_TTBR_ADRMODE_TABLE;
 	if (cfg->coherent_walk)
 		cfg->arm_mali_lpae_cfg.transtab |= ARM_MALI_LPAE_TTBR_SHARE_OUTER;
 
-	return &data->iop;
+	iop->ops = &data->iop.ops;
+	return 0;
 
 out_free_data:
 	kfree(data);
-	return NULL;
+	return -EINVAL;
 }
 
 struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s1_init_fns = {
@@ -310,21 +312,21 @@ static const struct iommu_flush_ops dummy_tlb_ops __initconst = {
 	.tlb_add_page	= dummy_tlb_add_page,
 };
 
-static void __init arm_lpae_dump_ops(struct io_pgtable_ops *ops)
+static void __init arm_lpae_dump_ops(struct io_pgtable *iop)
 {
-	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
+	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(iop->ops);
 	struct io_pgtable_cfg *cfg = &data->iop.cfg;
 
 	pr_err("cfg: pgsize_bitmap 0x%lx, ias %u-bit\n",
 		cfg->pgsize_bitmap, cfg->ias);
 	pr_err("data: %d levels, 0x%zx pgd_size, %u pg_shift, %u bits_per_level, pgd @ %p\n",
 		ARM_LPAE_MAX_LEVELS - data->start_level, ARM_LPAE_PGD_SIZE(data),
-		ilog2(ARM_LPAE_GRANULE(data)), data->bits_per_level, data->pgd);
+		ilog2(ARM_LPAE_GRANULE(data)), data->bits_per_level, iop->pgd);
 }
 
-#define __FAIL(ops, i)	({						\
+#define __FAIL(iop, i)	({						\
 		WARN(1, "selftest: test failed for fmt idx %d\n", (i));	\
-		arm_lpae_dump_ops(ops);					\
+		arm_lpae_dump_ops(iop);					\
 		selftest_running = false;				\
 		-EFAULT;						\
 })
@@ -336,34 +338,34 @@ static int __init arm_lpae_run_tests(struct io_pgtable_cfg *cfg)
 		ARM_64_LPAE_S2,
 	};
 
-	int i, j;
+	int i, j, ret;
 	unsigned long iova;
 	size_t size, mapped;
-	struct io_pgtable_ops *ops;
+	struct io_pgtable iop;
 
 	selftest_running = true;
 
 	for (i = 0; i < ARRAY_SIZE(fmts); ++i) {
 		cfg_cookie = cfg;
 		cfg->fmt = fmts[i];
-		ops = alloc_io_pgtable_ops(cfg, cfg);
-		if (!ops) {
+		ret = alloc_io_pgtable_ops(&iop, cfg, cfg);
+		if (ret) {
 			pr_err("selftest: failed to allocate io pgtable ops\n");
-			return -ENOMEM;
+			return ret;
 		}
 
 		/*
 		 * Initial sanity checks.
 		 * Empty page tables shouldn't provide any translations.
 		 */
-		if (ops->iova_to_phys(ops, 42))
-			return __FAIL(ops, i);
+		if (iopt_iova_to_phys(&iop, 42))
+			return __FAIL(&iop, i);
 
-		if (ops->iova_to_phys(ops, SZ_1G + 42))
-			return __FAIL(ops, i);
+		if (iopt_iova_to_phys(&iop, SZ_1G + 42))
+			return __FAIL(&iop, i);
 
-		if (ops->iova_to_phys(ops, SZ_2G + 42))
-			return __FAIL(ops, i);
+		if (iopt_iova_to_phys(&iop, SZ_2G + 42))
+			return __FAIL(&iop, i);
 
 		/*
 		 * Distinct mappings of different granule sizes.
@@ -372,60 +374,60 @@ static int __init arm_lpae_run_tests(struct io_pgtable_cfg *cfg)
 		for_each_set_bit(j, &cfg->pgsize_bitmap, BITS_PER_LONG) {
 			size = 1UL << j;
 
-			if (ops->map_pages(ops, iova, iova, size, 1,
+			if (iopt_map_pages(&iop, iova, iova, size, 1,
 					   IOMMU_READ | IOMMU_WRITE |
 					   IOMMU_NOEXEC | IOMMU_CACHE,
 					   GFP_KERNEL, &mapped))
-				return __FAIL(ops, i);
+				return __FAIL(&iop, i);
 
 			/* Overlapping mappings */
-			if (!ops->map_pages(ops, iova, iova + size, size, 1,
+			if (!iopt_map_pages(&iop, iova, iova + size, size, 1,
 					    IOMMU_READ | IOMMU_NOEXEC,
 					    GFP_KERNEL, &mapped))
-				return __FAIL(ops, i);
+				return __FAIL(&iop, i);
 
-			if (ops->iova_to_phys(ops, iova + 42) != (iova + 42))
-				return __FAIL(ops, i);
+			if (iopt_iova_to_phys(&iop, iova + 42) != (iova + 42))
+				return __FAIL(&iop, i);
 
 			iova += SZ_1G;
 		}
 
 		/* Partial unmap */
 		size = 1UL << __ffs(cfg->pgsize_bitmap);
-		if (ops->unmap_pages(ops, SZ_1G + size, size, 1, NULL) != size)
-			return __FAIL(ops, i);
+		if (iopt_unmap_pages(&iop, SZ_1G + size, size, 1, NULL) != size)
+			return __FAIL(&iop, i);
 
 		/* Remap of partial unmap */
-		if (ops->map_pages(ops, SZ_1G + size, size, size, 1,
+		if (iopt_map_pages(&iop, SZ_1G + size, size, size, 1,
 				   IOMMU_READ, GFP_KERNEL, &mapped))
-			return __FAIL(ops, i);
+			return __FAIL(&iop, i);
 
-		if (ops->iova_to_phys(ops, SZ_1G + size + 42) != (size + 42))
-			return __FAIL(ops, i);
+		if (iopt_iova_to_phys(&iop, SZ_1G + size + 42) != (size + 42))
+			return __FAIL(&iop, i);
 
 		/* Full unmap */
 		iova = 0;
 		for_each_set_bit(j, &cfg->pgsize_bitmap, BITS_PER_LONG) {
 			size = 1UL << j;
 
-			if (ops->unmap_pages(ops, iova, size, 1, NULL) != size)
-				return __FAIL(ops, i);
+			if (iopt_unmap_pages(&iop, iova, size, 1, NULL) != size)
+				return __FAIL(&iop, i);
 
-			if (ops->iova_to_phys(ops, iova + 42))
-				return __FAIL(ops, i);
+			if (iopt_iova_to_phys(&iop, iova + 42))
+				return __FAIL(&iop, i);
 
 			/* Remap full block */
-			if (ops->map_pages(ops, iova, iova, size, 1,
+			if (iopt_map_pages(&iop, iova, iova, size, 1,
 					   IOMMU_WRITE, GFP_KERNEL, &mapped))
-				return __FAIL(ops, i);
+				return __FAIL(&iop, i);
 
-			if (ops->iova_to_phys(ops, iova + 42) != (iova + 42))
-				return __FAIL(ops, i);
+			if (iopt_iova_to_phys(&iop, iova + 42) != (iova + 42))
+				return __FAIL(&iop, i);
 
 			iova += SZ_1G;
 		}
 
-		free_io_pgtable_ops(ops);
+		free_io_pgtable_ops(&iop);
 	}
 
 	selftest_running = false;
diff --git a/drivers/iommu/io-pgtable-dart.c b/drivers/iommu/io-pgtable-dart.c
index f981b25d8c98..1bb2e91ed0a7 100644
--- a/drivers/iommu/io-pgtable-dart.c
+++ b/drivers/iommu/io-pgtable-dart.c
@@ -34,7 +34,7 @@
 	container_of((x), struct dart_io_pgtable, iop)
 
 #define io_pgtable_ops_to_data(x)					\
-	io_pgtable_to_data(io_pgtable_ops_to_pgtable(x))
+	io_pgtable_to_data(io_pgtable_ops_to_params(x))
 
 #define DART_GRANULE(d)						\
 	(sizeof(dart_iopte) << (d)->bits_per_level)
@@ -65,12 +65,10 @@
 #define iopte_deref(pte, d) __va(iopte_to_paddr(pte, d))
 
 struct dart_io_pgtable {
-	struct io_pgtable	iop;
+	struct io_pgtable_params	iop;
 
-	int			tbl_bits;
-	int			bits_per_level;
-
-	void			*pgd[DART_MAX_TABLES];
+	int				tbl_bits;
+	int				bits_per_level;
 };
 
 typedef u64 dart_iopte;
@@ -170,10 +168,14 @@ static dart_iopte dart_install_table(dart_iopte *table,
 	return old;
 }
 
-static int dart_get_table(struct dart_io_pgtable *data, unsigned long iova)
+static dart_iopte *dart_get_table(struct io_pgtable *iop,
+				  struct dart_io_pgtable *data,
+				  unsigned long iova)
 {
-	return (iova >> (3 * data->bits_per_level + ilog2(sizeof(dart_iopte)))) &
+	int tbl = (iova >> (3 * data->bits_per_level + ilog2(sizeof(dart_iopte)))) &
 		((1 << data->tbl_bits) - 1);
+
+	return iop->pgd + DART_GRANULE(data) * tbl;
 }
 
 static int dart_get_l1_index(struct dart_io_pgtable *data, unsigned long iova)
@@ -190,12 +192,12 @@ static int dart_get_l2_index(struct dart_io_pgtable *data, unsigned long iova)
 		 ((1 << data->bits_per_level) - 1);
 }
 
-static  dart_iopte *dart_get_l2(struct dart_io_pgtable *data, unsigned long iova)
+static  dart_iopte *dart_get_l2(struct io_pgtable *iop,
+				struct dart_io_pgtable *data, unsigned long iova)
 {
 	dart_iopte pte, *ptep;
-	int tbl = dart_get_table(data, iova);
 
-	ptep = data->pgd[tbl];
+	ptep = dart_get_table(iop, data, iova);
 	if (!ptep)
 		return NULL;
 
@@ -233,14 +235,14 @@ static dart_iopte dart_prot_to_pte(struct dart_io_pgtable *data,
 	return pte;
 }
 
-static int dart_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
+static int dart_map_pages(struct io_pgtable *iop, unsigned long iova,
 			      phys_addr_t paddr, size_t pgsize, size_t pgcount,
 			      int iommu_prot, gfp_t gfp, size_t *mapped)
 {
-	struct dart_io_pgtable *data = io_pgtable_ops_to_data(ops);
+	struct dart_io_pgtable *data = io_pgtable_ops_to_data(iop->ops);
 	struct io_pgtable_cfg *cfg = &data->iop.cfg;
 	size_t tblsz = DART_GRANULE(data);
-	int ret = 0, tbl, num_entries, max_entries, map_idx_start;
+	int ret = 0, num_entries, max_entries, map_idx_start;
 	dart_iopte pte, *cptep, *ptep;
 	dart_iopte prot;
 
@@ -254,9 +256,7 @@ static int dart_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
 	if (!(iommu_prot & (IOMMU_READ | IOMMU_WRITE)))
 		return 0;
 
-	tbl = dart_get_table(data, iova);
-
-	ptep = data->pgd[tbl];
+	ptep = dart_get_table(iop, data, iova);
 	ptep += dart_get_l1_index(data, iova);
 	pte = READ_ONCE(*ptep);
 
@@ -295,11 +295,11 @@ static int dart_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
 	return ret;
 }
 
-static size_t dart_unmap_pages(struct io_pgtable_ops *ops, unsigned long iova,
+static size_t dart_unmap_pages(struct io_pgtable *iop, unsigned long iova,
 				   size_t pgsize, size_t pgcount,
 				   struct iommu_iotlb_gather *gather)
 {
-	struct dart_io_pgtable *data = io_pgtable_ops_to_data(ops);
+	struct dart_io_pgtable *data = io_pgtable_ops_to_data(iop->ops);
 	struct io_pgtable_cfg *cfg = &data->iop.cfg;
 	int i = 0, num_entries, max_entries, unmap_idx_start;
 	dart_iopte pte, *ptep;
@@ -307,7 +307,7 @@ static size_t dart_unmap_pages(struct io_pgtable_ops *ops, unsigned long iova,
 	if (WARN_ON(pgsize != cfg->pgsize_bitmap || !pgcount))
 		return 0;
 
-	ptep = dart_get_l2(data, iova);
+	ptep = dart_get_l2(iop, data, iova);
 
 	/* Valid L2 IOPTE pointer? */
 	if (WARN_ON(!ptep))
@@ -328,7 +328,7 @@ static size_t dart_unmap_pages(struct io_pgtable_ops *ops, unsigned long iova,
 		*ptep = 0;
 
 		if (!iommu_iotlb_gather_queued(gather))
-			io_pgtable_tlb_add_page(&data->iop, gather,
+			io_pgtable_tlb_add_page(cfg, iop, gather,
 						iova + i * pgsize, pgsize);
 
 		ptep++;
@@ -338,13 +338,13 @@ static size_t dart_unmap_pages(struct io_pgtable_ops *ops, unsigned long iova,
 	return i * pgsize;
 }
 
-static phys_addr_t dart_iova_to_phys(struct io_pgtable_ops *ops,
+static phys_addr_t dart_iova_to_phys(struct io_pgtable *iop,
 					 unsigned long iova)
 {
-	struct dart_io_pgtable *data = io_pgtable_ops_to_data(ops);
+	struct dart_io_pgtable *data = io_pgtable_ops_to_data(iop->ops);
 	dart_iopte pte, *ptep;
 
-	ptep = dart_get_l2(data, iova);
+	ptep = dart_get_l2(iop, data, iova);
 
 	/* Valid L2 IOPTE pointer? */
 	if (!ptep)
@@ -394,56 +394,56 @@ dart_alloc_pgtable(struct io_pgtable_cfg *cfg)
 	return data;
 }
 
-static struct io_pgtable *
-apple_dart_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
+static int apple_dart_alloc_pgtable(struct io_pgtable *iop,
+				    struct io_pgtable_cfg *cfg, void *cookie)
 {
 	struct dart_io_pgtable *data;
 	int i;
 
 	if (!cfg->coherent_walk)
-		return NULL;
+		return -EINVAL;
 
 	if (cfg->oas != 36 && cfg->oas != 42)
-		return NULL;
+		return -EINVAL;
 
 	if (cfg->ias > cfg->oas)
-		return NULL;
+		return -EINVAL;
 
 	if (!(cfg->pgsize_bitmap == SZ_4K || cfg->pgsize_bitmap == SZ_16K))
-		return NULL;
+		return -EINVAL;
 
 	data = dart_alloc_pgtable(cfg);
 	if (!data)
-		return NULL;
+		return -ENOMEM;
 
 	cfg->apple_dart_cfg.n_ttbrs = 1 << data->tbl_bits;
 
-	for (i = 0; i < cfg->apple_dart_cfg.n_ttbrs; ++i) {
-		data->pgd[i] = __dart_alloc_pages(DART_GRANULE(data), GFP_KERNEL,
-					   cfg);
-		if (!data->pgd[i])
-			goto out_free_data;
-		cfg->apple_dart_cfg.ttbr[i] = virt_to_phys(data->pgd[i]);
-	}
+	iop->pgd = __dart_alloc_pages(cfg->apple_dart_cfg.n_ttbrs *
+				      DART_GRANULE(data), GFP_KERNEL, cfg);
+	if (!iop->pgd)
+		goto out_free_data;
+
+	for (i = 0; i < cfg->apple_dart_cfg.n_ttbrs; ++i)
+		cfg->apple_dart_cfg.ttbr[i] = virt_to_phys(iop->pgd) +
+					      i * DART_GRANULE(data);
 
-	return &data->iop;
+	iop->ops = &data->iop.ops;
+	return 0;
 
 out_free_data:
-	while (--i >= 0)
-		free_pages((unsigned long)data->pgd[i],
-			   get_order(DART_GRANULE(data)));
 	kfree(data);
-	return NULL;
+	return -ENOMEM;
 }
 
 static void apple_dart_free_pgtable(struct io_pgtable *iop)
 {
-	struct dart_io_pgtable *data = io_pgtable_to_data(iop);
+	struct dart_io_pgtable *data = io_pgtable_ops_to_data(iop->ops);
+	size_t n_ttbrs = 1 << data->tbl_bits;
 	dart_iopte *ptep, *end;
 	int i;
 
-	for (i = 0; i < (1 << data->tbl_bits) && data->pgd[i]; ++i) {
-		ptep = data->pgd[i];
+	for (i = 0; i < n_ttbrs; ++i) {
+		ptep = iop->pgd + DART_GRANULE(data) * i;
 		end = (void *)ptep + DART_GRANULE(data);
 
 		while (ptep != end) {
@@ -456,10 +456,9 @@ static void apple_dart_free_pgtable(struct io_pgtable *iop)
 				free_pages(page, get_order(DART_GRANULE(data)));
 			}
 		}
-		free_pages((unsigned long)data->pgd[i],
-			   get_order(DART_GRANULE(data)));
 	}
-
+	free_pages((unsigned long)iop->pgd,
+		   get_order(DART_GRANULE(data) * n_ttbrs));
 	kfree(data);
 }
 
diff --git a/drivers/iommu/io-pgtable.c b/drivers/iommu/io-pgtable.c
index 2aba691db1da..acc6802b2f50 100644
--- a/drivers/iommu/io-pgtable.c
+++ b/drivers/iommu/io-pgtable.c
@@ -34,27 +34,30 @@ io_pgtable_init_table[IO_PGTABLE_NUM_FMTS] = {
 #endif
 };
 
-struct io_pgtable_ops *alloc_io_pgtable_ops(struct io_pgtable_cfg *cfg,
-					    void *cookie)
+int alloc_io_pgtable_ops(struct io_pgtable *iop, struct io_pgtable_cfg *cfg,
+			 void *cookie)
 {
-	struct io_pgtable *iop;
+	int ret;
+	struct io_pgtable_params *params;
 	const struct io_pgtable_init_fns *fns;
 
 	if (cfg->fmt >= IO_PGTABLE_NUM_FMTS)
-		return NULL;
+		return -EINVAL;
 
 	fns = io_pgtable_init_table[cfg->fmt];
 	if (!fns)
-		return NULL;
+		return -EINVAL;
 
-	iop = fns->alloc(cfg, cookie);
-	if (!iop)
-		return NULL;
+	ret = fns->alloc(iop, cfg, cookie);
+	if (ret)
+		return ret;
+
+	params = io_pgtable_ops_to_params(iop->ops);
 
 	iop->cookie	= cookie;
-	iop->cfg	= *cfg;
+	params->cfg	= *cfg;
 
-	return &iop->ops;
+	return 0;
 }
 EXPORT_SYMBOL_GPL(alloc_io_pgtable_ops);
 
@@ -62,16 +65,17 @@ EXPORT_SYMBOL_GPL(alloc_io_pgtable_ops);
  * It is the IOMMU driver's responsibility to ensure that the page table
  * is no longer accessible to the walker by this point.
  */
-void free_io_pgtable_ops(struct io_pgtable_ops *ops)
+void free_io_pgtable_ops(struct io_pgtable *iop)
 {
-	struct io_pgtable *iop;
+	struct io_pgtable_params *params;
 
-	if (!ops)
+	if (!iop)
 		return;
 
-	iop = io_pgtable_ops_to_pgtable(ops);
-	io_pgtable_tlb_flush_all(iop);
-	io_pgtable_init_table[iop->cfg.fmt]->free(iop);
+	params = io_pgtable_ops_to_params(iop->ops);
+	io_pgtable_tlb_flush_all(&params->cfg, iop);
+	io_pgtable_init_table[params->cfg.fmt]->free(iop);
+	memset(iop, 0, sizeof(*iop));
 }
 EXPORT_SYMBOL_GPL(free_io_pgtable_ops);
 
diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index 4a1927489635..3ff21e6bf939 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -73,7 +73,7 @@ struct ipmmu_vmsa_domain {
 	struct iommu_domain io_domain;
 
 	struct io_pgtable_cfg cfg;
-	struct io_pgtable_ops *iop;
+	struct io_pgtable iop;
 
 	unsigned int context_id;
 	struct mutex mutex;			/* Protects mappings */
@@ -458,11 +458,11 @@ static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
 
 	domain->context_id = ret;
 
-	domain->iop = alloc_io_pgtable_ops(&domain->cfg, domain);
-	if (!domain->iop) {
+	ret = alloc_io_pgtable_ops(&domain->iop, &domain->cfg, domain);
+	if (ret) {
 		ipmmu_domain_free_context(domain->mmu->root,
 					  domain->context_id);
-		return -EINVAL;
+		return ret;
 	}
 
 	ipmmu_domain_setup_context(domain);
@@ -592,7 +592,7 @@ static void ipmmu_domain_free(struct iommu_domain *io_domain)
 	 * been detached.
 	 */
 	ipmmu_domain_destroy_context(domain);
-	free_io_pgtable_ops(domain->iop);
+	free_io_pgtable_ops(&domain->iop);
 	kfree(domain);
 }
 
@@ -664,8 +664,8 @@ static int ipmmu_map(struct iommu_domain *io_domain, unsigned long iova,
 {
 	struct ipmmu_vmsa_domain *domain = to_vmsa_domain(io_domain);
 
-	return domain->iop->map_pages(domain->iop, iova, paddr, pgsize, pgcount,
-				      prot, gfp, mapped);
+	return iopt_map_pages(&domain->iop, iova, paddr, pgsize, pgcount, prot,
+			      gfp, mapped);
 }
 
 static size_t ipmmu_unmap(struct iommu_domain *io_domain, unsigned long iova,
@@ -674,7 +674,7 @@ static size_t ipmmu_unmap(struct iommu_domain *io_domain, unsigned long iova,
 {
 	struct ipmmu_vmsa_domain *domain = to_vmsa_domain(io_domain);
 
-	return domain->iop->unmap_pages(domain->iop, iova, pgsize, pgcount, gather);
+	return iopt_unmap_pages(&domain->iop, iova, pgsize, pgcount, gather);
 }
 
 static void ipmmu_flush_iotlb_all(struct iommu_domain *io_domain)
@@ -698,7 +698,7 @@ static phys_addr_t ipmmu_iova_to_phys(struct iommu_domain *io_domain,
 
 	/* TODO: Is locking needed ? */
 
-	return domain->iop->iova_to_phys(domain->iop, iova);
+	return iopt_iova_to_phys(&domain->iop, iova);
 }
 
 static int ipmmu_init_platform_device(struct device *dev,
diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
index 2c05a84ec1bf..6dae6743e11b 100644
--- a/drivers/iommu/msm_iommu.c
+++ b/drivers/iommu/msm_iommu.c
@@ -41,7 +41,7 @@ struct msm_priv {
 	struct list_head list_attached;
 	struct iommu_domain domain;
 	struct io_pgtable_cfg	cfg;
-	struct io_pgtable_ops	*iop;
+	struct io_pgtable	iop;
 	struct device		*dev;
 	spinlock_t		pgtlock; /* pagetable lock */
 };
@@ -339,6 +339,7 @@ static void msm_iommu_domain_free(struct iommu_domain *domain)
 
 static int msm_iommu_domain_config(struct msm_priv *priv)
 {
+	int ret;
 	spin_lock_init(&priv->pgtlock);
 
 	priv->cfg = (struct io_pgtable_cfg) {
@@ -350,10 +351,10 @@ static int msm_iommu_domain_config(struct msm_priv *priv)
 		.iommu_dev = priv->dev,
 	};
 
-	priv->iop = alloc_io_pgtable_ops(&priv->cfg, priv);
-	if (!priv->iop) {
+	ret = alloc_io_pgtable_ops(&priv->iop, &priv->cfg, priv);
+	if (ret) {
 		dev_err(priv->dev, "Failed to allocate pgtable\n");
-		return -EINVAL;
+		return ret;
 	}
 
 	msm_iommu_ops.pgsize_bitmap = priv->cfg.pgsize_bitmap;
@@ -453,7 +454,7 @@ static void msm_iommu_detach_dev(struct iommu_domain *domain,
 	struct msm_iommu_ctx_dev *master;
 	int ret;
 
-	free_io_pgtable_ops(priv->iop);
+	free_io_pgtable_ops(&priv->iop);
 
 	spin_lock_irqsave(&msm_iommu_lock, flags);
 	list_for_each_entry(iommu, &priv->list_attached, dom_node) {
@@ -480,8 +481,8 @@ static int msm_iommu_map(struct iommu_domain *domain, unsigned long iova,
 	int ret;
 
 	spin_lock_irqsave(&priv->pgtlock, flags);
-	ret = priv->iop->map_pages(priv->iop, iova, pa, pgsize, pgcount, prot,
-				   GFP_ATOMIC, mapped);
+	ret = iopt_map_pages(&priv->iop, iova, pa, pgsize, pgcount, prot,
+			     GFP_ATOMIC, mapped);
 	spin_unlock_irqrestore(&priv->pgtlock, flags);
 
 	return ret;
@@ -504,7 +505,7 @@ static size_t msm_iommu_unmap(struct iommu_domain *domain, unsigned long iova,
 	size_t ret;
 
 	spin_lock_irqsave(&priv->pgtlock, flags);
-	ret = priv->iop->unmap_pages(priv->iop, iova, pgsize, pgcount, gather);
+	ret = iopt_unmap_pages(&priv->iop, iova, pgsize, pgcount, gather);
 	spin_unlock_irqrestore(&priv->pgtlock, flags);
 
 	return ret;
diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 0d754d94ae52..615d9ade575e 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -244,7 +244,7 @@ struct mtk_iommu_data {
 
 struct mtk_iommu_domain {
 	struct io_pgtable_cfg		cfg;
-	struct io_pgtable_ops		*iop;
+	struct io_pgtable		iop;
 
 	struct mtk_iommu_bank_data	*bank;
 	struct iommu_domain		domain;
@@ -587,6 +587,7 @@ static int mtk_iommu_domain_finalise(struct mtk_iommu_domain *dom,
 {
 	const struct mtk_iommu_iova_region *region;
 	struct mtk_iommu_domain	*m4u_dom;
+	int ret;
 
 	/* Always use bank0 in sharing pgtable case */
 	m4u_dom = data->bank[0].m4u_dom;
@@ -615,8 +616,8 @@ static int mtk_iommu_domain_finalise(struct mtk_iommu_domain *dom,
 	else
 		dom->cfg.oas = 35;
 
-	dom->iop = alloc_io_pgtable_ops(&dom->cfg, data);
-	if (!dom->iop) {
+	ret = alloc_io_pgtable_ops(&dom->iop, &dom->cfg, data);
+	if (ret) {
 		dev_err(data->dev, "Failed to alloc io pgtable\n");
 		return -ENOMEM;
 	}
@@ -730,7 +731,7 @@ static int mtk_iommu_map(struct iommu_domain *domain, unsigned long iova,
 		paddr |= BIT_ULL(32);
 
 	/* Synchronize with the tlb_lock */
-	return dom->iop->map_pages(dom->iop, iova, paddr, pgsize, pgcount, prot, gfp, mapped);
+	return iopt_map_pages(&dom->iop, iova, paddr, pgsize, pgcount, prot, gfp, mapped);
 }
 
 static size_t mtk_iommu_unmap(struct iommu_domain *domain,
@@ -740,7 +741,7 @@ static size_t mtk_iommu_unmap(struct iommu_domain *domain,
 	struct mtk_iommu_domain *dom = to_mtk_domain(domain);
 
 	iommu_iotlb_gather_add_range(gather, iova, pgsize * pgcount);
-	return dom->iop->unmap_pages(dom->iop, iova, pgsize, pgcount, gather);
+	return iopt_unmap_pages(&dom->iop, iova, pgsize, pgcount, gather);
 }
 
 static void mtk_iommu_flush_iotlb_all(struct iommu_domain *domain)
@@ -773,7 +774,7 @@ static phys_addr_t mtk_iommu_iova_to_phys(struct iommu_domain *domain,
 	struct mtk_iommu_domain *dom = to_mtk_domain(domain);
 	phys_addr_t pa;
 
-	pa = dom->iop->iova_to_phys(dom->iop, iova);
+	pa = iopt_iova_to_phys(&dom->iop, iova);
 	if (IS_ENABLED(CONFIG_PHYS_ADDR_T_64BIT) &&
 	    dom->bank->parent_data->enable_4GB &&
 	    pa >= MTK_IOMMU_4GB_MODE_REMAP_BASE)
-- 
2.39.0



* [RFC PATCH 05/45] iommu/io-pgtable: Split io_pgtable structure
@ 2023-02-01 12:52   ` Jean-Philippe Brucker
  0 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:52 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker, Abhinav Kumar,
	Alyssa Rosenzweig, Andy Gross, Bjorn Andersson, Daniel Vetter,
	David Airlie, Dmitry Baryshkov, Hector Martin, Konrad Dybcio,
	Matthias Brugger, Rob Clark, Rob Herring, Sean Paul,
	Steven Price, Suravee Suthikulpanit, Sven Peter, Tomeu Vizoso,
	Yong Wu

The io_pgtable structure contains all information needed for io-pgtable
ops map() and unmap(), including a static configuration, driver-facing
ops, TLB callbacks and the PGD pointer. Most of these are common to all
sets of page tables for a given configuration, and really only need one
instance.

Split the structure in two:

* io_pgtable_params contains information that is common to all sets of
  page tables for a given io_pgtable_cfg.
* io_pgtable contains information that is different for each set of page
  tables, namely the PGD and the IOMMU driver cookie passed to TLB
  callbacks.

Keep essentially the same interface for IOMMU drivers, but move it
behind a set of helpers.
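
For example, a map call site in an IOMMU driver changes roughly like this
(an illustrative sketch based on the arm-smmu-v3 and apple-dart hunks
below; "domain" is a made-up variable standing in for the driver's domain
structure):

	/* Before: dereference the per-pgtable ops pointer directly */
	struct io_pgtable_ops *ops = domain->pgtbl_ops;
	ret = ops->map_pages(ops, iova, paddr, pgsize, pgcount, prot,
			     gfp, &mapped);

	/* After: embed a small struct io_pgtable in the domain and go
	 * through the iopt_*() helpers, which NULL-check iop->ops and
	 * forward the call; the per-format code then finds its shared
	 * cfg via io_pgtable_ops_to_params().
	 */
	ret = iopt_map_pages(&domain->pgtbl, iova, paddr, pgsize, pgcount,
			     prot, gfp, &mapped);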

The goal is to optimize for space, in order to allocate less memory in
the KVM SMMU driver. Whereas storing 64k io-pgtables with identical
configuration previously required 10MB, it now takes 512kB, because the
driver only needs to store the pgd for each domain.
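
(As a rough sanity check on those numbers, assuming approximate sizes:
64k tables x ~160 bytes for a full struct io_pgtable with its embedded
io_pgtable_cfg comes to about 10MB, while 64k x 8 bytes for a bare pgd
pointer is 512kB.)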

Note that the io_pgtable_cfg still contains the TTBRs, which are
specific to a set of page tables. Most of them can be removed, since
IOMMU drivers can trivially obtain them with virt_to_phys(iop->pgd).
Some architectures do have static configuration bits in the TTBR that
need to be kept.
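
As an illustration (hypothetical driver code, not part of this series;
"domain" and "static_ttbr_bits" are placeholders), instead of reading a
TTBR field out of the cfg a driver could do:

	/* The table base is just the physical address of the per-domain
	 * pgd; any static, architecture-specific TTBR bits would still
	 * have to come from the shared configuration.
	 */
	u64 ttbr = virt_to_phys(domain->pgtbl.pgd) | static_ttbr_bits;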

Unfortunately the split does add an additional dereference, which
degrades performance slightly. Running a single-threaded dma-map
benchmark on a server with SMMUv3, I measured a regression of 7-9ns for
map() and 32-78ns for unmap(), which is a slowdown of about 4% and 8%
respectively.

Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Cc: Andy Gross <agross@kernel.org>
Cc: Bjorn Andersson <andersson@kernel.org>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: David Airlie <airlied@gmail.com>
Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Cc: Hector Martin <marcan@marcan.st>
Cc: Konrad Dybcio <konrad.dybcio@linaro.org>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Sean Paul <sean@poorly.run>
Cc: Steven Price <steven.price@arm.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Sven Peter <sven@svenpeter.dev>
Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Cc: Yong Wu <yong.wu@mediatek.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 drivers/gpu/drm/panfrost/panfrost_device.h  |   2 +-
 drivers/iommu/amd/amd_iommu_types.h         |  17 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |   3 +-
 drivers/iommu/arm/arm-smmu/arm-smmu.h       |   2 +-
 include/linux/io-pgtable-arm.h              |  12 +-
 include/linux/io-pgtable.h                  |  94 +++++++---
 drivers/gpu/drm/msm/msm_iommu.c             |  21 ++-
 drivers/gpu/drm/panfrost/panfrost_mmu.c     |  20 +--
 drivers/iommu/amd/io_pgtable.c              |  26 +--
 drivers/iommu/amd/io_pgtable_v2.c           |  43 ++---
 drivers/iommu/amd/iommu.c                   |  28 ++-
 drivers/iommu/apple-dart.c                  |  36 ++--
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  34 ++--
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c  |   7 +-
 drivers/iommu/arm/arm-smmu/arm-smmu.c       |  40 ++---
 drivers/iommu/arm/arm-smmu/qcom_iommu.c     |  40 ++---
 drivers/iommu/io-pgtable-arm-common.c       |  80 +++++----
 drivers/iommu/io-pgtable-arm-v7s.c          | 189 ++++++++++----------
 drivers/iommu/io-pgtable-arm.c              | 158 ++++++++--------
 drivers/iommu/io-pgtable-dart.c             |  97 +++++-----
 drivers/iommu/io-pgtable.c                  |  36 ++--
 drivers/iommu/ipmmu-vmsa.c                  |  18 +-
 drivers/iommu/msm_iommu.c                   |  17 +-
 drivers/iommu/mtk_iommu.c                   |  13 +-
 24 files changed, 519 insertions(+), 514 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h
index 8b25278f34c8..8a610c4b8f03 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.h
+++ b/drivers/gpu/drm/panfrost/panfrost_device.h
@@ -126,7 +126,7 @@ struct panfrost_mmu {
 	struct panfrost_device *pfdev;
 	struct kref refcount;
 	struct io_pgtable_cfg pgtbl_cfg;
-	struct io_pgtable_ops *pgtbl_ops;
+	struct io_pgtable pgtbl;
 	struct drm_mm mm;
 	spinlock_t mm_lock;
 	int as;
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 3d684190b4d5..5920a556f7ec 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -516,10 +516,10 @@ struct amd_irte_ops;
 #define AMD_IOMMU_FLAG_TRANS_PRE_ENABLED      (1 << 0)
 
 #define io_pgtable_to_data(x) \
-	container_of((x), struct amd_io_pgtable, iop)
+	container_of((x), struct amd_io_pgtable, iop_params)
 
 #define io_pgtable_ops_to_data(x) \
-	io_pgtable_to_data(io_pgtable_ops_to_pgtable(x))
+	io_pgtable_to_data(io_pgtable_ops_to_params(x))
 
 #define io_pgtable_ops_to_domain(x) \
 	container_of(io_pgtable_ops_to_data(x), \
@@ -529,12 +529,13 @@ struct amd_irte_ops;
 	container_of((x), struct amd_io_pgtable, pgtbl_cfg)
 
 struct amd_io_pgtable {
-	struct io_pgtable_cfg	pgtbl_cfg;
-	struct io_pgtable	iop;
-	int			mode;
-	u64			*root;
-	atomic64_t		pt_root;	/* pgtable root and pgtable mode */
-	u64			*pgd;		/* v2 pgtable pgd pointer */
+	struct io_pgtable_cfg		pgtbl_cfg;
+	struct io_pgtable		iop;
+	struct io_pgtable_params	iop_params;
+	int				mode;
+	u64				*root;
+	atomic64_t			pt_root;	/* pgtable root and pgtable mode */
+	u64				*pgd;		/* v2 pgtable pgd pointer */
 };
 
 /*
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 8d772ea8a583..cec3c8103404 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -10,6 +10,7 @@
 
 #include <linux/bitfield.h>
 #include <linux/iommu.h>
+#include <linux/io-pgtable.h>
 #include <linux/kernel.h>
 #include <linux/mmzone.h>
 #include <linux/sizes.h>
@@ -710,7 +711,7 @@ struct arm_smmu_domain {
 	struct arm_smmu_device		*smmu;
 	struct mutex			init_mutex; /* Protects smmu pointer */
 
-	struct io_pgtable_ops		*pgtbl_ops;
+	struct io_pgtable		pgtbl;
 	bool				stall_enabled;
 	atomic_t			nr_ats_masters;
 
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
index 703fd5817ec1..249825fc71ac 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
@@ -366,7 +366,7 @@ enum arm_smmu_domain_stage {
 
 struct arm_smmu_domain {
 	struct arm_smmu_device		*smmu;
-	struct io_pgtable_ops		*pgtbl_ops;
+	struct io_pgtable		pgtbl;
 	unsigned long			pgtbl_quirks;
 	const struct iommu_flush_ops	*flush_ops;
 	struct arm_smmu_cfg		cfg;
diff --git a/include/linux/io-pgtable-arm.h b/include/linux/io-pgtable-arm.h
index 42202bc0ffa2..5199bd9851b6 100644
--- a/include/linux/io-pgtable-arm.h
+++ b/include/linux/io-pgtable-arm.h
@@ -9,13 +9,11 @@ extern bool selftest_running;
 typedef u64 arm_lpae_iopte;
 
 struct arm_lpae_io_pgtable {
-	struct io_pgtable	iop;
+	struct io_pgtable_params	iop;
 
-	int			pgd_bits;
-	int			start_level;
-	int			bits_per_level;
-
-	void			*pgd;
+	int				pgd_bits;
+	int				start_level;
+	int				bits_per_level;
 };
 
 /* Struct accessors */
@@ -23,7 +21,7 @@ struct arm_lpae_io_pgtable {
 	container_of((x), struct arm_lpae_io_pgtable, iop)
 
 #define io_pgtable_ops_to_data(x)					\
-	io_pgtable_to_data(io_pgtable_ops_to_pgtable(x))
+	io_pgtable_to_data(io_pgtable_ops_to_params(x))
 
 /*
  * Calculate the right shift amount to get to the portion describing level l
diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index ee6484d7a5e0..cce5ddbf71c7 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -149,6 +149,20 @@ struct io_pgtable_cfg {
 	};
 };
 
+/**
+ * struct io_pgtable - Structure describing a set of page tables.
+ *
+ * @ops:	The page table operations in use for this set of page tables.
+ * @cookie:	An opaque token provided by the IOMMU driver and passed back to
+ *		any callback routines.
+ * @pgd:	Virtual address of the page directory.
+ */
+struct io_pgtable {
+	struct io_pgtable_ops	*ops;
+	void			*cookie;
+	void			*pgd;
+};
+
 /**
  * struct io_pgtable_ops - Page table manipulation API for IOMMU drivers.
  *
@@ -160,36 +174,64 @@ struct io_pgtable_cfg {
  * the same names.
  */
 struct io_pgtable_ops {
-	int (*map_pages)(struct io_pgtable_ops *ops, unsigned long iova,
+	int (*map_pages)(struct io_pgtable *iop, unsigned long iova,
 			 phys_addr_t paddr, size_t pgsize, size_t pgcount,
 			 int prot, gfp_t gfp, size_t *mapped);
-	size_t (*unmap_pages)(struct io_pgtable_ops *ops, unsigned long iova,
+	size_t (*unmap_pages)(struct io_pgtable *iop, unsigned long iova,
 			      size_t pgsize, size_t pgcount,
 			      struct iommu_iotlb_gather *gather);
-	phys_addr_t (*iova_to_phys)(struct io_pgtable_ops *ops,
-				    unsigned long iova);
+	phys_addr_t (*iova_to_phys)(struct io_pgtable *iop, unsigned long iova);
 };
 
+static inline int
+iopt_map_pages(struct io_pgtable *iop, unsigned long iova, phys_addr_t paddr,
+	       size_t pgsize, size_t pgcount, int prot, gfp_t gfp,
+	       size_t *mapped)
+{
+	if (!iop->ops || !iop->ops->map_pages)
+		return -EINVAL;
+	return iop->ops->map_pages(iop, iova, paddr, pgsize, pgcount, prot, gfp,
+				   mapped);
+}
+
+static inline size_t
+iopt_unmap_pages(struct io_pgtable *iop, unsigned long iova, size_t pgsize,
+		 size_t pgcount, struct iommu_iotlb_gather *gather)
+{
+	if (!iop->ops || !iop->ops->unmap_pages)
+		return 0;
+	return iop->ops->unmap_pages(iop, iova, pgsize, pgcount, gather);
+}
+
+static inline phys_addr_t
+iopt_iova_to_phys(struct io_pgtable *iop, unsigned long iova)
+{
+	if (!iop->ops || !iop->ops->iova_to_phys)
+		return 0;
+	return iop->ops->iova_to_phys(iop, iova);
+}
+
 /**
  * alloc_io_pgtable_ops() - Allocate a page table allocator for use by an IOMMU.
  *
+ * @iop:    The page table object, filled with the allocated ops on success
  * @cfg:    The page table configuration. This will be modified to represent
  *          the configuration actually provided by the allocator (e.g. the
  *          pgsize_bitmap may be restricted).
  * @cookie: An opaque token provided by the IOMMU driver and passed back to
  *          the callback routines in cfg->tlb.
  */
-struct io_pgtable_ops *alloc_io_pgtable_ops(struct io_pgtable_cfg *cfg,
-					    void *cookie);
+int alloc_io_pgtable_ops(struct io_pgtable *iop, struct io_pgtable_cfg *cfg,
+			 void *cookie);
 
 /**
- * free_io_pgtable_ops() - Free an io_pgtable_ops structure. The caller
+ * free_io_pgtable_ops() - Free the page table. The caller
  *                         *must* ensure that the page table is no longer
  *                         live, but the TLB can be dirty.
  *
- * @ops: The ops returned from alloc_io_pgtable_ops.
+ * @iop: The iop object passed to alloc_io_pgtable_ops
  */
-void free_io_pgtable_ops(struct io_pgtable_ops *ops);
+void free_io_pgtable_ops(struct io_pgtable *iop);
 
 /**
  * io_pgtable_configure - Create page table config
@@ -209,42 +251,41 @@ int io_pgtable_configure(struct io_pgtable_cfg *cfg, size_t *pgd_size);
  */
 
 /**
- * struct io_pgtable - Internal structure describing a set of page tables.
+ * struct io_pgtable_params - Internal structure describing parameters for a
+ *			      given page table configuration
  *
- * @cookie: An opaque token provided by the IOMMU driver and passed back to
- *          any callback routines.
  * @cfg:    A copy of the page table configuration.
  * @ops:    The page table operations in use for this set of page tables.
  */
-struct io_pgtable {
-	void			*cookie;
+struct io_pgtable_params {
 	struct io_pgtable_cfg	cfg;
 	struct io_pgtable_ops	ops;
 };
 
-#define io_pgtable_ops_to_pgtable(x) container_of((x), struct io_pgtable, ops)
+#define io_pgtable_ops_to_params(x) container_of((x), struct io_pgtable_params, ops)
 
-static inline void io_pgtable_tlb_flush_all(struct io_pgtable *iop)
+static inline void io_pgtable_tlb_flush_all(struct io_pgtable_cfg *cfg,
+					    struct io_pgtable *iop)
 {
-	if (iop->cfg.tlb && iop->cfg.tlb->tlb_flush_all)
-		iop->cfg.tlb->tlb_flush_all(iop->cookie);
+	if (cfg->tlb && cfg->tlb->tlb_flush_all)
+		cfg->tlb->tlb_flush_all(iop->cookie);
 }
 
 static inline void
-io_pgtable_tlb_flush_walk(struct io_pgtable *iop, unsigned long iova,
-			  size_t size, size_t granule)
+io_pgtable_tlb_flush_walk(struct io_pgtable_cfg *cfg, struct io_pgtable *iop,
+			  unsigned long iova, size_t size, size_t granule)
 {
-	if (iop->cfg.tlb && iop->cfg.tlb->tlb_flush_walk)
-		iop->cfg.tlb->tlb_flush_walk(iova, size, granule, iop->cookie);
+	if (cfg->tlb && cfg->tlb->tlb_flush_walk)
+		cfg->tlb->tlb_flush_walk(iova, size, granule, iop->cookie);
 }
 
 static inline void
-io_pgtable_tlb_add_page(struct io_pgtable *iop,
+io_pgtable_tlb_add_page(struct io_pgtable_cfg *cfg, struct io_pgtable *iop,
 			struct iommu_iotlb_gather * gather, unsigned long iova,
 			size_t granule)
 {
-	if (iop->cfg.tlb && iop->cfg.tlb->tlb_add_page)
-		iop->cfg.tlb->tlb_add_page(gather, iova, granule, iop->cookie);
+	if (cfg->tlb && cfg->tlb->tlb_add_page)
+		cfg->tlb->tlb_add_page(gather, iova, granule, iop->cookie);
 }
 
 /**
@@ -256,7 +297,8 @@ io_pgtable_tlb_add_page(struct io_pgtable *iop,
  * @configure: Create the configuration without allocating anything. Optional.
  */
 struct io_pgtable_init_fns {
-	struct io_pgtable *(*alloc)(struct io_pgtable_cfg *cfg, void *cookie);
+	int (*alloc)(struct io_pgtable *iop, struct io_pgtable_cfg *cfg,
+		     void *cookie);
 	void (*free)(struct io_pgtable *iop);
 	int (*configure)(struct io_pgtable_cfg *cfg, size_t *pgd_size);
 };
diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index e9c6f281e3dd..e372ca6cd79c 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -20,7 +20,7 @@ struct msm_iommu {
 struct msm_iommu_pagetable {
 	struct msm_mmu base;
 	struct msm_mmu *parent;
-	struct io_pgtable_ops *pgtbl_ops;
+	struct io_pgtable pgtbl;
 	unsigned long pgsize_bitmap;	/* Bitmap of page sizes in use */
 	phys_addr_t ttbr;
 	u32 asid;
@@ -90,14 +90,14 @@ static int msm_iommu_pagetable_unmap(struct msm_mmu *mmu, u64 iova,
 		size_t size)
 {
 	struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
-	struct io_pgtable_ops *ops = pagetable->pgtbl_ops;
 
 	while (size) {
 		size_t unmapped, pgsize, count;
 
 		pgsize = calc_pgsize(pagetable, iova, iova, size, &count);
 
-		unmapped = ops->unmap_pages(ops, iova, pgsize, count, NULL);
+		unmapped = iopt_unmap_pages(&pagetable->pgtbl, iova, pgsize,
+					    count, NULL);
 		if (!unmapped)
 			break;
 
@@ -114,7 +114,7 @@ static int msm_iommu_pagetable_map(struct msm_mmu *mmu, u64 iova,
 		struct sg_table *sgt, size_t len, int prot)
 {
 	struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
-	struct io_pgtable_ops *ops = pagetable->pgtbl_ops;
+	struct io_pgtable *iop = &pagetable->pgtbl;
 	struct scatterlist *sg;
 	u64 addr = iova;
 	unsigned int i;
@@ -129,7 +129,7 @@ static int msm_iommu_pagetable_map(struct msm_mmu *mmu, u64 iova,
 
 			pgsize = calc_pgsize(pagetable, addr, phys, size, &count);
 
-			ret = ops->map_pages(ops, addr, phys, pgsize, count,
+			ret = iopt_map_pages(iop, addr, phys, pgsize, count,
 					     prot, GFP_KERNEL, &mapped);
 
 			/* map_pages could fail after mapping some of the pages,
@@ -163,7 +163,7 @@ static void msm_iommu_pagetable_destroy(struct msm_mmu *mmu)
 	if (atomic_dec_return(&iommu->pagetables) == 0)
 		adreno_smmu->set_ttbr0_cfg(adreno_smmu->cookie, NULL);
 
-	free_io_pgtable_ops(pagetable->pgtbl_ops);
+	free_io_pgtable_ops(&pagetable->pgtbl);
 	kfree(pagetable);
 }
 
@@ -258,11 +258,10 @@ struct msm_mmu *msm_iommu_pagetable_create(struct msm_mmu *parent)
 	ttbr0_cfg.quirks &= ~IO_PGTABLE_QUIRK_ARM_TTBR1;
 	ttbr0_cfg.tlb = &null_tlb_ops;
 
-	pagetable->pgtbl_ops = alloc_io_pgtable_ops(&ttbr0_cfg, iommu->domain);
-
-	if (!pagetable->pgtbl_ops) {
+	ret = alloc_io_pgtable_ops(&pagetable->pgtbl, &ttbr0_cfg, iommu->domain);
+	if (ret) {
 		kfree(pagetable);
-		return ERR_PTR(-ENOMEM);
+		return ERR_PTR(ret);
 	}
 
 	/*
@@ -275,7 +274,7 @@ struct msm_mmu *msm_iommu_pagetable_create(struct msm_mmu *parent)
 
 		ret = adreno_smmu->set_ttbr0_cfg(adreno_smmu->cookie, &ttbr0_cfg);
 		if (ret) {
-			free_io_pgtable_ops(pagetable->pgtbl_ops);
+			free_io_pgtable_ops(&pagetable->pgtbl);
 			kfree(pagetable);
 			return ERR_PTR(ret);
 		}
diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 31bdb5d46244..118b49ab120f 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -290,7 +290,6 @@ static int mmu_map_sg(struct panfrost_device *pfdev, struct panfrost_mmu *mmu,
 {
 	unsigned int count;
 	struct scatterlist *sgl;
-	struct io_pgtable_ops *ops = mmu->pgtbl_ops;
 	u64 start_iova = iova;
 
 	for_each_sgtable_dma_sg(sgt, sgl, count) {
@@ -303,8 +302,8 @@ static int mmu_map_sg(struct panfrost_device *pfdev, struct panfrost_mmu *mmu,
 			size_t pgcount, mapped = 0;
 			size_t pgsize = get_pgsize(iova | paddr, len, &pgcount);
 
-			ops->map_pages(ops, iova, paddr, pgsize, pgcount, prot,
-				       GFP_KERNEL, &mapped);
+			iopt_map_pages(&mmu->pgtbl, iova, paddr, pgsize,
+				       pgcount, prot, GFP_KERNEL, &mapped);
 			/* Don't get stuck if things have gone wrong */
 			mapped = max(mapped, pgsize);
 			iova += mapped;
@@ -349,7 +348,7 @@ void panfrost_mmu_unmap(struct panfrost_gem_mapping *mapping)
 	struct panfrost_gem_object *bo = mapping->obj;
 	struct drm_gem_object *obj = &bo->base.base;
 	struct panfrost_device *pfdev = to_panfrost_device(obj->dev);
-	struct io_pgtable_ops *ops = mapping->mmu->pgtbl_ops;
+	struct io_pgtable *iop = &mapping->mmu->pgtbl;
 	u64 iova = mapping->mmnode.start << PAGE_SHIFT;
 	size_t len = mapping->mmnode.size << PAGE_SHIFT;
 	size_t unmapped_len = 0;
@@ -366,8 +365,8 @@ void panfrost_mmu_unmap(struct panfrost_gem_mapping *mapping)
 
 		if (bo->is_heap)
 			pgcount = 1;
-		if (!bo->is_heap || ops->iova_to_phys(ops, iova)) {
-			unmapped_page = ops->unmap_pages(ops, iova, pgsize, pgcount, NULL);
+		if (!bo->is_heap || iopt_iova_to_phys(iop, iova)) {
+			unmapped_page = iopt_unmap_pages(iop, iova, pgsize, pgcount, NULL);
 			WARN_ON(unmapped_page != pgsize * pgcount);
 		}
 		iova += pgsize * pgcount;
@@ -560,7 +559,7 @@ static void panfrost_mmu_release_ctx(struct kref *kref)
 	}
 	spin_unlock(&pfdev->as_lock);
 
-	free_io_pgtable_ops(mmu->pgtbl_ops);
+	free_io_pgtable_ops(&mmu->pgtbl);
 	drm_mm_takedown(&mmu->mm);
 	kfree(mmu);
 }
@@ -605,6 +604,7 @@ static void panfrost_drm_mm_color_adjust(const struct drm_mm_node *node,
 
 struct panfrost_mmu *panfrost_mmu_ctx_create(struct panfrost_device *pfdev)
 {
+	int ret;
 	struct panfrost_mmu *mmu;
 
 	mmu = kzalloc(sizeof(*mmu), GFP_KERNEL);
@@ -631,10 +631,10 @@ struct panfrost_mmu *panfrost_mmu_ctx_create(struct panfrost_device *pfdev)
 		.iommu_dev	= pfdev->dev,
 	};
 
-	mmu->pgtbl_ops = alloc_io_pgtable_ops(&mmu->pgtbl_cfg, mmu);
-	if (!mmu->pgtbl_ops) {
+	ret = alloc_io_pgtable_ops(&mmu->pgtbl, &mmu->pgtbl_cfg, mmu);
+	if (ret) {
 		kfree(mmu);
-		return ERR_PTR(-EINVAL);
+		return ERR_PTR(ret);
 	}
 
 	kref_init(&mmu->refcount);
diff --git a/drivers/iommu/amd/io_pgtable.c b/drivers/iommu/amd/io_pgtable.c
index ace0e9b8b913..f9ea551404ba 100644
--- a/drivers/iommu/amd/io_pgtable.c
+++ b/drivers/iommu/amd/io_pgtable.c
@@ -360,11 +360,11 @@ static void free_clear_pte(u64 *pte, u64 pteval, struct list_head *freelist)
  * supporting all features of AMD IOMMU page tables like level skipping
  * and full 64 bit address spaces.
  */
-static int iommu_v1_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
+static int iommu_v1_map_pages(struct io_pgtable *iop, unsigned long iova,
 			      phys_addr_t paddr, size_t pgsize, size_t pgcount,
 			      int prot, gfp_t gfp, size_t *mapped)
 {
-	struct protection_domain *dom = io_pgtable_ops_to_domain(ops);
+	struct protection_domain *dom = io_pgtable_ops_to_domain(iop->ops);
 	LIST_HEAD(freelist);
 	bool updated = false;
 	u64 __pte, *pte;
@@ -435,12 +435,12 @@ static int iommu_v1_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
 	return ret;
 }
 
-static unsigned long iommu_v1_unmap_pages(struct io_pgtable_ops *ops,
+static unsigned long iommu_v1_unmap_pages(struct io_pgtable *iop,
 					  unsigned long iova,
 					  size_t pgsize, size_t pgcount,
 					  struct iommu_iotlb_gather *gather)
 {
-	struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(ops);
+	struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(iop->ops);
 	unsigned long long unmapped;
 	unsigned long unmap_size;
 	u64 *pte;
@@ -469,9 +469,9 @@ static unsigned long iommu_v1_unmap_pages(struct io_pgtable_ops *ops,
 	return unmapped;
 }
 
-static phys_addr_t iommu_v1_iova_to_phys(struct io_pgtable_ops *ops, unsigned long iova)
+static phys_addr_t iommu_v1_iova_to_phys(struct io_pgtable *iop, unsigned long iova)
 {
-	struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(ops);
+	struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(iop->ops);
 	unsigned long offset_mask, pte_pgsize;
 	u64 *pte, __pte;
 
@@ -491,7 +491,7 @@ static phys_addr_t iommu_v1_iova_to_phys(struct io_pgtable_ops *ops, unsigned lo
  */
 static void v1_free_pgtable(struct io_pgtable *iop)
 {
-	struct amd_io_pgtable *pgtable = container_of(iop, struct amd_io_pgtable, iop);
+	struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(iop->ops);
 	struct protection_domain *dom;
 	LIST_HEAD(freelist);
 
@@ -515,7 +515,8 @@ static void v1_free_pgtable(struct io_pgtable *iop)
 	put_pages_list(&freelist);
 }
 
-static struct io_pgtable *v1_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
+int v1_alloc_pgtable(struct io_pgtable *iop, struct io_pgtable_cfg *cfg,
+		     void *cookie)
 {
 	struct amd_io_pgtable *pgtable = io_pgtable_cfg_to_data(cfg);
 
@@ -524,11 +525,12 @@ static struct io_pgtable *v1_alloc_pgtable(struct io_pgtable_cfg *cfg, void *coo
 	cfg->oas            = IOMMU_OUT_ADDR_BIT_SIZE,
 	cfg->tlb            = &v1_flush_ops;
 
-	pgtable->iop.ops.map_pages    = iommu_v1_map_pages;
-	pgtable->iop.ops.unmap_pages  = iommu_v1_unmap_pages;
-	pgtable->iop.ops.iova_to_phys = iommu_v1_iova_to_phys;
+	pgtable->iop_params.ops.map_pages    = iommu_v1_map_pages;
+	pgtable->iop_params.ops.unmap_pages  = iommu_v1_unmap_pages;
+	pgtable->iop_params.ops.iova_to_phys = iommu_v1_iova_to_phys;
+	iop->ops = &pgtable->iop_params.ops;
 
-	return &pgtable->iop;
+	return 0;
 }
 
 struct io_pgtable_init_fns io_pgtable_amd_iommu_v1_init_fns = {
diff --git a/drivers/iommu/amd/io_pgtable_v2.c b/drivers/iommu/amd/io_pgtable_v2.c
index 8638ddf6fb3b..52acb8f11a27 100644
--- a/drivers/iommu/amd/io_pgtable_v2.c
+++ b/drivers/iommu/amd/io_pgtable_v2.c
@@ -239,12 +239,12 @@ static u64 *fetch_pte(struct amd_io_pgtable *pgtable,
 	return pte;
 }
 
-static int iommu_v2_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
+static int iommu_v2_map_pages(struct io_pgtable *iop, unsigned long iova,
 			      phys_addr_t paddr, size_t pgsize, size_t pgcount,
 			      int prot, gfp_t gfp, size_t *mapped)
 {
-	struct protection_domain *pdom = io_pgtable_ops_to_domain(ops);
-	struct io_pgtable_cfg *cfg = &pdom->iop.iop.cfg;
+	struct protection_domain *pdom = io_pgtable_ops_to_domain(iop->ops);
+	struct io_pgtable_cfg *cfg = &pdom->iop.iop_params.cfg;
 	u64 *pte;
 	unsigned long map_size;
 	unsigned long mapped_size = 0;
@@ -290,13 +290,13 @@ static int iommu_v2_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
 	return ret;
 }
 
-static unsigned long iommu_v2_unmap_pages(struct io_pgtable_ops *ops,
+static unsigned long iommu_v2_unmap_pages(struct io_pgtable *iop,
 					  unsigned long iova,
 					  size_t pgsize, size_t pgcount,
 					  struct iommu_iotlb_gather *gather)
 {
-	struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(ops);
-	struct io_pgtable_cfg *cfg = &pgtable->iop.cfg;
+	struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(iop->ops);
+	struct io_pgtable_cfg *cfg = &pgtable->iop_params.cfg;
 	unsigned long unmap_size;
 	unsigned long unmapped = 0;
 	size_t size = pgcount << __ffs(pgsize);
@@ -319,9 +319,9 @@ static unsigned long iommu_v2_unmap_pages(struct io_pgtable_ops *ops,
 	return unmapped;
 }
 
-static phys_addr_t iommu_v2_iova_to_phys(struct io_pgtable_ops *ops, unsigned long iova)
+static phys_addr_t iommu_v2_iova_to_phys(struct io_pgtable *iop, unsigned long iova)
 {
-	struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(ops);
+	struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(iop->ops);
 	unsigned long offset_mask, pte_pgsize;
 	u64 *pte, __pte;
 
@@ -362,7 +362,7 @@ static const struct iommu_flush_ops v2_flush_ops = {
 static void v2_free_pgtable(struct io_pgtable *iop)
 {
 	struct protection_domain *pdom;
-	struct amd_io_pgtable *pgtable = container_of(iop, struct amd_io_pgtable, iop);
+	struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(iop->ops);
 
 	pdom = container_of(pgtable, struct protection_domain, iop);
 	if (!(pdom->flags & PD_IOMMUV2_MASK))
@@ -375,38 +375,39 @@ static void v2_free_pgtable(struct io_pgtable *iop)
 	amd_iommu_domain_update(pdom);
 
 	/* Free page table */
-	free_pgtable(pgtable->pgd, get_pgtable_level());
+	free_pgtable(iop->pgd, get_pgtable_level());
 }
 
-static struct io_pgtable *v2_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
+int v2_alloc_pgtable(struct io_pgtable *iop, struct io_pgtable_cfg *cfg, void *cookie)
 {
 	struct amd_io_pgtable *pgtable = io_pgtable_cfg_to_data(cfg);
 	struct protection_domain *pdom = (struct protection_domain *)cookie;
 	int ret;
 
-	pgtable->pgd = alloc_pgtable_page();
-	if (!pgtable->pgd)
-		return NULL;
+	iop->pgd = alloc_pgtable_page();
+	if (!iop->pgd)
+		return -ENOMEM;
 
-	ret = amd_iommu_domain_set_gcr3(&pdom->domain, 0, iommu_virt_to_phys(pgtable->pgd));
+	ret = amd_iommu_domain_set_gcr3(&pdom->domain, 0, iommu_virt_to_phys(iop->pgd));
 	if (ret)
 		goto err_free_pgd;
 
-	pgtable->iop.ops.map_pages    = iommu_v2_map_pages;
-	pgtable->iop.ops.unmap_pages  = iommu_v2_unmap_pages;
-	pgtable->iop.ops.iova_to_phys = iommu_v2_iova_to_phys;
+	pgtable->iop_params.ops.map_pages    = iommu_v2_map_pages;
+	pgtable->iop_params.ops.unmap_pages  = iommu_v2_unmap_pages;
+	pgtable->iop_params.ops.iova_to_phys = iommu_v2_iova_to_phys;
+	iop->ops = &pgtable->iop_params.ops;
 
 	cfg->pgsize_bitmap = AMD_IOMMU_PGSIZES_V2,
 	cfg->ias           = IOMMU_IN_ADDR_BIT_SIZE,
 	cfg->oas           = IOMMU_OUT_ADDR_BIT_SIZE,
 	cfg->tlb           = &v2_flush_ops;
 
-	return &pgtable->iop;
+	return 0;
 
 err_free_pgd:
-	free_pgtable_page(pgtable->pgd);
+	free_pgtable_page(iop->pgd);
 
-	return NULL;
+	return ret;
 }
 
 struct io_pgtable_init_fns io_pgtable_amd_iommu_v2_init_fns = {
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 7efb6b467041..51f9cecdcb6b 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1984,7 +1984,7 @@ static void protection_domain_free(struct protection_domain *domain)
 		return;
 
 	if (domain->iop.pgtbl_cfg.tlb)
-		free_io_pgtable_ops(&domain->iop.iop.ops);
+		free_io_pgtable_ops(&domain->iop.iop);
 
 	if (domain->id)
 		domain_id_free(domain->id);
@@ -2037,7 +2037,6 @@ static int protection_domain_init_v2(struct protection_domain *domain)
 
 static struct protection_domain *protection_domain_alloc(unsigned int type)
 {
-	struct io_pgtable_ops *pgtbl_ops;
 	struct protection_domain *domain;
 	int pgtable = amd_iommu_pgtable;
 	int mode = DEFAULT_PGTABLE_LEVEL;
@@ -2073,8 +2072,9 @@ static struct protection_domain *protection_domain_alloc(unsigned int type)
 		goto out_err;
 
 	domain->iop.pgtbl_cfg.fmt = pgtable;
-	pgtbl_ops = alloc_io_pgtable_ops(&domain->iop.pgtbl_cfg, domain);
-	if (!pgtbl_ops) {
+	ret = alloc_io_pgtable_ops(&domain->iop.iop, &domain->iop.pgtbl_cfg,
+				   domain);
+	if (ret) {
 		domain_id_free(domain->id);
 		goto out_err;
 	}
@@ -2185,7 +2185,7 @@ static void amd_iommu_iotlb_sync_map(struct iommu_domain *dom,
 				     unsigned long iova, size_t size)
 {
 	struct protection_domain *domain = to_pdomain(dom);
-	struct io_pgtable_ops *ops = &domain->iop.iop.ops;
+	struct io_pgtable_ops *ops = domain->iop.iop.ops;
 
 	if (ops->map_pages)
 		domain_flush_np_cache(domain, iova, size);
@@ -2196,9 +2196,7 @@ static int amd_iommu_map_pages(struct iommu_domain *dom, unsigned long iova,
 			       int iommu_prot, gfp_t gfp, size_t *mapped)
 {
 	struct protection_domain *domain = to_pdomain(dom);
-	struct io_pgtable_ops *ops = &domain->iop.iop.ops;
 	int prot = 0;
-	int ret = -EINVAL;
 
 	if ((amd_iommu_pgtable == AMD_IOMMU_V1) &&
 	    (domain->iop.mode == PAGE_MODE_NONE))
@@ -2209,12 +2207,8 @@ static int amd_iommu_map_pages(struct iommu_domain *dom, unsigned long iova,
 	if (iommu_prot & IOMMU_WRITE)
 		prot |= IOMMU_PROT_IW;
 
-	if (ops->map_pages) {
-		ret = ops->map_pages(ops, iova, paddr, pgsize,
-				     pgcount, prot, gfp, mapped);
-	}
-
-	return ret;
+	return iopt_map_pages(&domain->iop.iop, iova, paddr, pgsize, pgcount,
+			      prot, gfp, mapped);
 }
 
 static void amd_iommu_iotlb_gather_add_page(struct iommu_domain *domain,
@@ -2243,14 +2237,13 @@ static size_t amd_iommu_unmap_pages(struct iommu_domain *dom, unsigned long iova
 				    struct iommu_iotlb_gather *gather)
 {
 	struct protection_domain *domain = to_pdomain(dom);
-	struct io_pgtable_ops *ops = &domain->iop.iop.ops;
 	size_t r;
 
 	if ((amd_iommu_pgtable == AMD_IOMMU_V1) &&
 	    (domain->iop.mode == PAGE_MODE_NONE))
 		return 0;
 
-	r = (ops->unmap_pages) ? ops->unmap_pages(ops, iova, pgsize, pgcount, NULL) : 0;
+	r = iopt_unmap_pages(&domain->iop.iop, iova, pgsize, pgcount, NULL);
 
 	if (r)
 		amd_iommu_iotlb_gather_add_page(dom, gather, iova, r);
@@ -2262,9 +2255,8 @@ static phys_addr_t amd_iommu_iova_to_phys(struct iommu_domain *dom,
 					  dma_addr_t iova)
 {
 	struct protection_domain *domain = to_pdomain(dom);
-	struct io_pgtable_ops *ops = &domain->iop.iop.ops;
 
-	return ops->iova_to_phys(ops, iova);
+	return iopt_iova_to_phys(&domain->iop.iop, iova);
 }
 
 static bool amd_iommu_capable(struct device *dev, enum iommu_cap cap)
@@ -2460,7 +2452,7 @@ void amd_iommu_domain_direct_map(struct iommu_domain *dom)
 	spin_lock_irqsave(&domain->lock, flags);
 
 	if (domain->iop.pgtbl_cfg.tlb)
-		free_io_pgtable_ops(&domain->iop.iop.ops);
+		free_io_pgtable_ops(&domain->iop.iop);
 
 	spin_unlock_irqrestore(&domain->lock, flags);
 }
diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c
index 571f948add7c..b806019f925b 100644
--- a/drivers/iommu/apple-dart.c
+++ b/drivers/iommu/apple-dart.c
@@ -150,14 +150,14 @@ struct apple_dart_atomic_stream_map {
 /*
  * This structure is attached to each iommu domain handled by a DART.
  *
- * @pgtbl_ops: pagetable ops allocated by io-pgtable
+ * @pgtbl: pagetable allocated by io-pgtable
  * @finalized: true if the domain has been completely initialized
  * @init_lock: protects domain initialization
  * @stream_maps: streams attached to this domain (valid for DMA/UNMANAGED only)
  * @domain: core iommu domain pointer
  */
 struct apple_dart_domain {
-	struct io_pgtable_ops *pgtbl_ops;
+	struct io_pgtable pgtbl;
 
 	bool finalized;
 	struct mutex init_lock;
@@ -354,12 +354,8 @@ static phys_addr_t apple_dart_iova_to_phys(struct iommu_domain *domain,
 					   dma_addr_t iova)
 {
 	struct apple_dart_domain *dart_domain = to_dart_domain(domain);
-	struct io_pgtable_ops *ops = dart_domain->pgtbl_ops;
 
-	if (!ops)
-		return 0;
-
-	return ops->iova_to_phys(ops, iova);
+	return iopt_iova_to_phys(&dart_domain->pgtbl, iova);
 }
 
 static int apple_dart_map_pages(struct iommu_domain *domain, unsigned long iova,
@@ -368,13 +364,9 @@ static int apple_dart_map_pages(struct iommu_domain *domain, unsigned long iova,
 				size_t *mapped)
 {
 	struct apple_dart_domain *dart_domain = to_dart_domain(domain);
-	struct io_pgtable_ops *ops = dart_domain->pgtbl_ops;
-
-	if (!ops)
-		return -ENODEV;
 
-	return ops->map_pages(ops, iova, paddr, pgsize, pgcount, prot, gfp,
-			      mapped);
+	return iopt_map_pages(&dart_domain->pgtbl, iova, paddr, pgsize, pgcount,
+			      prot, gfp, mapped);
 }
 
 static size_t apple_dart_unmap_pages(struct iommu_domain *domain,
@@ -383,9 +375,9 @@ static size_t apple_dart_unmap_pages(struct iommu_domain *domain,
 				     struct iommu_iotlb_gather *gather)
 {
 	struct apple_dart_domain *dart_domain = to_dart_domain(domain);
-	struct io_pgtable_ops *ops = dart_domain->pgtbl_ops;
 
-	return ops->unmap_pages(ops, iova, pgsize, pgcount, gather);
+	return iopt_unmap_pages(&dart_domain->pgtbl, iova, pgsize, pgcount,
+				gather);
 }
 
 static void
@@ -394,7 +386,7 @@ apple_dart_setup_translation(struct apple_dart_domain *domain,
 {
 	int i;
 	struct io_pgtable_cfg *pgtbl_cfg =
-		&io_pgtable_ops_to_pgtable(domain->pgtbl_ops)->cfg;
+		&io_pgtable_ops_to_params(domain->pgtbl.ops)->cfg;
 
 	for (i = 0; i < pgtbl_cfg->apple_dart_cfg.n_ttbrs; ++i)
 		apple_dart_hw_set_ttbr(stream_map, i,
@@ -435,11 +427,9 @@ static int apple_dart_finalize_domain(struct iommu_domain *domain,
 		.iommu_dev = dart->dev,
 	};
 
-	dart_domain->pgtbl_ops = alloc_io_pgtable_ops(&pgtbl_cfg, domain);
-	if (!dart_domain->pgtbl_ops) {
-		ret = -ENOMEM;
+	ret = alloc_io_pgtable_ops(&dart_domain->pgtbl, &pgtbl_cfg, domain);
+	if (ret)
 		goto done;
-	}
 
 	domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap;
 	domain->geometry.aperture_start = 0;
@@ -590,7 +580,7 @@ static struct iommu_domain *apple_dart_domain_alloc(unsigned int type)
 
 	mutex_init(&dart_domain->init_lock);
 
-	/* no need to allocate pgtbl_ops or do any other finalization steps */
+	/* no need to allocate pgtbl or do any other finalization steps */
 	if (type == IOMMU_DOMAIN_IDENTITY || type == IOMMU_DOMAIN_BLOCKED)
 		dart_domain->finalized = true;
 
@@ -601,8 +591,8 @@ static void apple_dart_domain_free(struct iommu_domain *domain)
 {
 	struct apple_dart_domain *dart_domain = to_dart_domain(domain);
 
-	if (dart_domain->pgtbl_ops)
-		free_io_pgtable_ops(dart_domain->pgtbl_ops);
+	if (dart_domain->pgtbl.ops)
+		free_io_pgtable_ops(&dart_domain->pgtbl);
 
 	kfree(dart_domain);
 }
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index c033b23ca4b2..97d24ee5c14d 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2058,7 +2058,7 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
 
-	free_io_pgtable_ops(smmu_domain->pgtbl_ops);
+	free_io_pgtable_ops(&smmu_domain->pgtbl);
 
 	/* Free the CD and ASID, if we allocated them */
 	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
@@ -2171,7 +2171,6 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain,
 	unsigned long ias, oas;
 	enum io_pgtable_fmt fmt;
 	struct io_pgtable_cfg pgtbl_cfg;
-	struct io_pgtable_ops *pgtbl_ops;
 	int (*finalise_stage_fn)(struct arm_smmu_domain *,
 				 struct arm_smmu_master *,
 				 struct io_pgtable_cfg *);
@@ -2218,9 +2217,9 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain,
 		.iommu_dev	= smmu->dev,
 	};
 
-	pgtbl_ops = alloc_io_pgtable_ops(&pgtbl_cfg, smmu_domain);
-	if (!pgtbl_ops)
-		return -ENOMEM;
+	ret = alloc_io_pgtable_ops(&smmu_domain->pgtbl, &pgtbl_cfg, smmu_domain);
+	if (ret)
+		return ret;
 
 	domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap;
 	domain->geometry.aperture_end = (1UL << pgtbl_cfg.ias) - 1;
@@ -2228,11 +2227,10 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain,
 
 	ret = finalise_stage_fn(smmu_domain, master, &pgtbl_cfg);
 	if (ret < 0) {
-		free_io_pgtable_ops(pgtbl_ops);
+		free_io_pgtable_ops(&smmu_domain->pgtbl);
 		return ret;
 	}
 
-	smmu_domain->pgtbl_ops = pgtbl_ops;
 	return 0;
 }
 
@@ -2468,12 +2466,10 @@ static int arm_smmu_map_pages(struct iommu_domain *domain, unsigned long iova,
 			      phys_addr_t paddr, size_t pgsize, size_t pgcount,
 			      int prot, gfp_t gfp, size_t *mapped)
 {
-	struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
-
-	if (!ops)
-		return -ENODEV;
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 
-	return ops->map_pages(ops, iova, paddr, pgsize, pgcount, prot, gfp, mapped);
+	return iopt_map_pages(&smmu_domain->pgtbl, iova, paddr, pgsize, pgcount,
+			      prot, gfp, mapped);
 }
 
 static size_t arm_smmu_unmap_pages(struct iommu_domain *domain, unsigned long iova,
@@ -2481,12 +2477,9 @@ static size_t arm_smmu_unmap_pages(struct iommu_domain *domain, unsigned long io
 				   struct iommu_iotlb_gather *gather)
 {
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
-	struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
 
-	if (!ops)
-		return 0;
-
-	return ops->unmap_pages(ops, iova, pgsize, pgcount, gather);
+	return iopt_unmap_pages(&smmu_domain->pgtbl, iova, pgsize, pgcount,
+				gather);
 }
 
 static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain)
@@ -2513,12 +2506,9 @@ static void arm_smmu_iotlb_sync(struct iommu_domain *domain,
 static phys_addr_t
 arm_smmu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova)
 {
-	struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
-
-	if (!ops)
-		return 0;
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 
-	return ops->iova_to_phys(ops, iova);
+	return iopt_iova_to_phys(&smmu_domain->pgtbl, iova);
 }
 
 static struct platform_driver arm_smmu_driver;
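
The iopt_map_pages()/iopt_unmap_pages()/iopt_iova_to_phys() helpers used above
are likewise defined in an earlier patch of the series. The drivers can drop
their "!ops" guards because the helpers are assumed to tolerate a domain whose
table has not been finalised yet. A rough sketch of the assumed wrappers (the
exact error codes are a guess):

static inline int iopt_map_pages(struct io_pgtable *iop, unsigned long iova,
				 phys_addr_t paddr, size_t pgsize,
				 size_t pgcount, int prot, gfp_t gfp,
				 size_t *mapped)
{
	if (!iop->ops || !iop->ops->map_pages)
		return -ENODEV;
	return iop->ops->map_pages(iop, iova, paddr, pgsize, pgcount, prot,
				   gfp, mapped);
}

static inline size_t iopt_unmap_pages(struct io_pgtable *iop,
				      unsigned long iova, size_t pgsize,
				      size_t pgcount,
				      struct iommu_iotlb_gather *gather)
{
	if (!iop->ops || !iop->ops->unmap_pages)
		return 0;
	return iop->ops->unmap_pages(iop, iova, pgsize, pgcount, gather);
}

static inline phys_addr_t iopt_iova_to_phys(struct io_pgtable *iop,
					    unsigned long iova)
{
	if (!iop->ops || !iop->ops->iova_to_phys)
		return 0;
	return iop->ops->iova_to_phys(iop, iova);
}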
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index 91d404deb115..0673841167be 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -122,8 +122,8 @@ static const struct io_pgtable_cfg *qcom_adreno_smmu_get_ttbr1_cfg(
 		const void *cookie)
 {
 	struct arm_smmu_domain *smmu_domain = (void *)cookie;
-	struct io_pgtable *pgtable =
-		io_pgtable_ops_to_pgtable(smmu_domain->pgtbl_ops);
+	struct io_pgtable_params *pgtable =
+		io_pgtable_ops_to_params(smmu_domain->pgtbl.ops);
 	return &pgtable->cfg;
 }
 
@@ -137,7 +137,8 @@ static int qcom_adreno_smmu_set_ttbr0_cfg(const void *cookie,
 		const struct io_pgtable_cfg *pgtbl_cfg)
 {
 	struct arm_smmu_domain *smmu_domain = (void *)cookie;
-	struct io_pgtable *pgtable = io_pgtable_ops_to_pgtable(smmu_domain->pgtbl_ops);
+	struct io_pgtable_params *pgtable =
+		io_pgtable_ops_to_params(smmu_domain->pgtbl.ops);
 	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
 	struct arm_smmu_cb *cb = &smmu_domain->smmu->cbs[cfg->cbndx];
 
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index f230d2ce977a..201055254d5b 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -614,7 +614,6 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 {
 	int irq, start, ret = 0;
 	unsigned long ias, oas;
-	struct io_pgtable_ops *pgtbl_ops;
 	struct io_pgtable_cfg pgtbl_cfg;
 	enum io_pgtable_fmt fmt;
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
@@ -765,11 +764,9 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 	if (smmu_domain->pgtbl_quirks)
 		pgtbl_cfg.quirks |= smmu_domain->pgtbl_quirks;
 
-	pgtbl_ops = alloc_io_pgtable_ops(&pgtbl_cfg, smmu_domain);
-	if (!pgtbl_ops) {
-		ret = -ENOMEM;
+	ret = alloc_io_pgtable_ops(&smmu_domain->pgtbl, &pgtbl_cfg, smmu_domain);
+	if (ret)
 		goto out_clear_smmu;
-	}
 
 	/* Update the domain's page sizes to reflect the page table format */
 	domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap;
@@ -808,8 +805,6 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 
 	mutex_unlock(&smmu_domain->init_mutex);
 
-	/* Publish page table ops for map/unmap */
-	smmu_domain->pgtbl_ops = pgtbl_ops;
 	return 0;
 
 out_clear_smmu:
@@ -846,7 +841,7 @@ static void arm_smmu_destroy_domain_context(struct iommu_domain *domain)
 		devm_free_irq(smmu->dev, irq, domain);
 	}
 
-	free_io_pgtable_ops(smmu_domain->pgtbl_ops);
+	free_io_pgtable_ops(&smmu_domain->pgtbl);
 	__arm_smmu_free_bitmap(smmu->context_map, cfg->cbndx);
 
 	arm_smmu_rpm_put(smmu);
@@ -1181,15 +1176,13 @@ static int arm_smmu_map_pages(struct iommu_domain *domain, unsigned long iova,
 			      phys_addr_t paddr, size_t pgsize, size_t pgcount,
 			      int prot, gfp_t gfp, size_t *mapped)
 {
-	struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
-	struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu;
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	int ret;
 
-	if (!ops)
-		return -ENODEV;
-
 	arm_smmu_rpm_get(smmu);
-	ret = ops->map_pages(ops, iova, paddr, pgsize, pgcount, prot, gfp, mapped);
+	ret = iopt_map_pages(&smmu_domain->pgtbl, iova, paddr, pgsize, pgcount,
+			     prot, gfp, mapped);
 	arm_smmu_rpm_put(smmu);
 
 	return ret;
@@ -1199,15 +1192,13 @@ static size_t arm_smmu_unmap_pages(struct iommu_domain *domain, unsigned long io
 				   size_t pgsize, size_t pgcount,
 				   struct iommu_iotlb_gather *iotlb_gather)
 {
-	struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
-	struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu;
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	size_t ret;
 
-	if (!ops)
-		return 0;
-
 	arm_smmu_rpm_get(smmu);
-	ret = ops->unmap_pages(ops, iova, pgsize, pgcount, iotlb_gather);
+	ret = iopt_unmap_pages(&smmu_domain->pgtbl, iova, pgsize, pgcount,
+			       iotlb_gather);
 	arm_smmu_rpm_put(smmu);
 
 	return ret;
@@ -1249,7 +1240,6 @@ static phys_addr_t arm_smmu_iova_to_phys_hard(struct iommu_domain *domain,
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
-	struct io_pgtable_ops *ops= smmu_domain->pgtbl_ops;
 	struct device *dev = smmu->dev;
 	void __iomem *reg;
 	u32 tmp;
@@ -1277,7 +1267,7 @@ static phys_addr_t arm_smmu_iova_to_phys_hard(struct iommu_domain *domain,
 			"iova to phys timed out on %pad. Falling back to software table walk.\n",
 			&iova);
 		arm_smmu_rpm_put(smmu);
-		return ops->iova_to_phys(ops, iova);
+		return iopt_iova_to_phys(&smmu_domain->pgtbl, iova);
 	}
 
 	phys = arm_smmu_cb_readq(smmu, idx, ARM_SMMU_CB_PAR);
@@ -1299,16 +1289,12 @@ static phys_addr_t arm_smmu_iova_to_phys(struct iommu_domain *domain,
 					dma_addr_t iova)
 {
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
-	struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
-
-	if (!ops)
-		return 0;
 
 	if (smmu_domain->smmu->features & ARM_SMMU_FEAT_TRANS_OPS &&
 			smmu_domain->stage == ARM_SMMU_DOMAIN_S1)
 		return arm_smmu_iova_to_phys_hard(domain, iova);
 
-	return ops->iova_to_phys(ops, iova);
+	return iopt_iova_to_phys(&smmu_domain->pgtbl, iova);
 }
 
 static bool arm_smmu_capable(struct device *dev, enum iommu_cap cap)
diff --git a/drivers/iommu/arm/arm-smmu/qcom_iommu.c b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
index 65eb8bdcbe50..56676dd84462 100644
--- a/drivers/iommu/arm/arm-smmu/qcom_iommu.c
+++ b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
@@ -64,7 +64,7 @@ struct qcom_iommu_ctx {
 };
 
 struct qcom_iommu_domain {
-	struct io_pgtable_ops	*pgtbl_ops;
+	struct io_pgtable	 pgtbl;
 	spinlock_t		 pgtbl_lock;
 	struct mutex		 init_mutex; /* Protects iommu pointer */
 	struct iommu_domain	 domain;
@@ -229,7 +229,6 @@ static int qcom_iommu_init_domain(struct iommu_domain *domain,
 {
 	struct qcom_iommu_domain *qcom_domain = to_qcom_iommu_domain(domain);
 	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
-	struct io_pgtable_ops *pgtbl_ops;
 	struct io_pgtable_cfg pgtbl_cfg;
 	int i, ret = 0;
 	u32 reg;
@@ -250,10 +249,9 @@ static int qcom_iommu_init_domain(struct iommu_domain *domain,
 	qcom_domain->iommu = qcom_iommu;
 	qcom_domain->fwspec = fwspec;
 
-	pgtbl_ops = alloc_io_pgtable_ops(&pgtbl_cfg, qcom_domain);
-	if (!pgtbl_ops) {
+	ret = alloc_io_pgtable_ops(&qcom_domain->pgtbl, &pgtbl_cfg, qcom_domain);
+	if (ret) {
 		dev_err(qcom_iommu->dev, "failed to allocate pagetable ops\n");
-		ret = -ENOMEM;
 		goto out_clear_iommu;
 	}
 
@@ -308,9 +306,6 @@ static int qcom_iommu_init_domain(struct iommu_domain *domain,
 
 	mutex_unlock(&qcom_domain->init_mutex);
 
-	/* Publish page table ops for map/unmap */
-	qcom_domain->pgtbl_ops = pgtbl_ops;
-
 	return 0;
 
 out_clear_iommu:
@@ -353,7 +348,7 @@ static void qcom_iommu_domain_free(struct iommu_domain *domain)
 		 * is on to avoid unclocked accesses in the TLB inv path:
 		 */
 		pm_runtime_get_sync(qcom_domain->iommu->dev);
-		free_io_pgtable_ops(qcom_domain->pgtbl_ops);
+		free_io_pgtable_ops(&qcom_domain->pgtbl);
 		pm_runtime_put_sync(qcom_domain->iommu->dev);
 	}
 
@@ -417,13 +412,10 @@ static int qcom_iommu_map(struct iommu_domain *domain, unsigned long iova,
 	int ret;
 	unsigned long flags;
 	struct qcom_iommu_domain *qcom_domain = to_qcom_iommu_domain(domain);
-	struct io_pgtable_ops *ops = qcom_domain->pgtbl_ops;
-
-	if (!ops)
-		return -ENODEV;
 
 	spin_lock_irqsave(&qcom_domain->pgtbl_lock, flags);
-	ret = ops->map_pages(ops, iova, paddr, pgsize, pgcount, prot, GFP_ATOMIC, mapped);
+	ret = iopt_map_pages(&qcom_domain->pgtbl, iova, paddr, pgsize, pgcount,
+			     prot, GFP_ATOMIC, mapped);
 	spin_unlock_irqrestore(&qcom_domain->pgtbl_lock, flags);
 	return ret;
 }
@@ -435,10 +427,6 @@ static size_t qcom_iommu_unmap(struct iommu_domain *domain, unsigned long iova,
 	size_t ret;
 	unsigned long flags;
 	struct qcom_iommu_domain *qcom_domain = to_qcom_iommu_domain(domain);
-	struct io_pgtable_ops *ops = qcom_domain->pgtbl_ops;
-
-	if (!ops)
-		return 0;
 
 	/* NOTE: unmap can be called after client device is powered off,
 	 * for example, with GPUs or anything involving dma-buf.  So we
@@ -447,7 +435,8 @@ static size_t qcom_iommu_unmap(struct iommu_domain *domain, unsigned long iova,
 	 */
 	pm_runtime_get_sync(qcom_domain->iommu->dev);
 	spin_lock_irqsave(&qcom_domain->pgtbl_lock, flags);
-	ret = ops->unmap_pages(ops, iova, pgsize, pgcount, gather);
+	ret = iopt_unmap_pages(&qcom_domain->pgtbl, iova, pgsize, pgcount,
+			       gather);
 	spin_unlock_irqrestore(&qcom_domain->pgtbl_lock, flags);
 	pm_runtime_put_sync(qcom_domain->iommu->dev);
 
@@ -457,13 +446,12 @@ static size_t qcom_iommu_unmap(struct iommu_domain *domain, unsigned long iova,
 static void qcom_iommu_flush_iotlb_all(struct iommu_domain *domain)
 {
 	struct qcom_iommu_domain *qcom_domain = to_qcom_iommu_domain(domain);
-	struct io_pgtable *pgtable = container_of(qcom_domain->pgtbl_ops,
-						  struct io_pgtable, ops);
-	if (!qcom_domain->pgtbl_ops)
+
+	if (!qcom_domain->pgtbl.ops)
 		return;
 
 	pm_runtime_get_sync(qcom_domain->iommu->dev);
-	qcom_iommu_tlb_sync(pgtable->cookie);
+	qcom_iommu_tlb_sync(qcom_domain->pgtbl.cookie);
 	pm_runtime_put_sync(qcom_domain->iommu->dev);
 }
 
@@ -479,13 +467,9 @@ static phys_addr_t qcom_iommu_iova_to_phys(struct iommu_domain *domain,
 	phys_addr_t ret;
 	unsigned long flags;
 	struct qcom_iommu_domain *qcom_domain = to_qcom_iommu_domain(domain);
-	struct io_pgtable_ops *ops = qcom_domain->pgtbl_ops;
-
-	if (!ops)
-		return 0;
 
 	spin_lock_irqsave(&qcom_domain->pgtbl_lock, flags);
-	ret = ops->iova_to_phys(ops, iova);
+	ret = iopt_iova_to_phys(&qcom_domain->pgtbl, iova);
 	spin_unlock_irqrestore(&qcom_domain->pgtbl_lock, flags);
 
 	return ret;
diff --git a/drivers/iommu/io-pgtable-arm-common.c b/drivers/iommu/io-pgtable-arm-common.c
index 4b3a9ce806ea..359086cace34 100644
--- a/drivers/iommu/io-pgtable-arm-common.c
+++ b/drivers/iommu/io-pgtable-arm-common.c
@@ -48,7 +48,8 @@ static void __arm_lpae_clear_pte(arm_lpae_iopte *ptep, struct io_pgtable_cfg *cf
 		__arm_lpae_sync_pte(ptep, 1, cfg);
 }
 
-static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
+static size_t __arm_lpae_unmap(struct io_pgtable *iop,
+			       struct arm_lpae_io_pgtable *data,
 			       struct iommu_iotlb_gather *gather,
 			       unsigned long iova, size_t size, size_t pgcount,
 			       int lvl, arm_lpae_iopte *ptep);
@@ -74,7 +75,8 @@ static void __arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
 		__arm_lpae_sync_pte(ptep, num_entries, cfg);
 }
 
-static int arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
+static int arm_lpae_init_pte(struct io_pgtable *iop,
+			     struct arm_lpae_io_pgtable *data,
 			     unsigned long iova, phys_addr_t paddr,
 			     arm_lpae_iopte prot, int lvl, int num_entries,
 			     arm_lpae_iopte *ptep)
@@ -95,8 +97,8 @@ static int arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
 			size_t sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
 
 			tblp = ptep - ARM_LPAE_LVL_IDX(iova, lvl, data);
-			if (__arm_lpae_unmap(data, NULL, iova + i * sz, sz, 1,
-					     lvl, tblp) != sz) {
+			if (__arm_lpae_unmap(iop, data, NULL, iova + i * sz, sz,
+					     1, lvl, tblp) != sz) {
 				WARN_ON(1);
 				return -EINVAL;
 			}
@@ -139,10 +141,10 @@ static arm_lpae_iopte arm_lpae_install_table(arm_lpae_iopte *table,
 	return old;
 }
 
-int __arm_lpae_map(struct arm_lpae_io_pgtable *data, unsigned long iova,
-		   phys_addr_t paddr, size_t size, size_t pgcount,
-		   arm_lpae_iopte prot, int lvl, arm_lpae_iopte *ptep,
-		   gfp_t gfp, size_t *mapped)
+int __arm_lpae_map(struct io_pgtable *iop, struct arm_lpae_io_pgtable *data,
+		   unsigned long iova, phys_addr_t paddr, size_t size,
+		   size_t pgcount, arm_lpae_iopte prot, int lvl,
+		   arm_lpae_iopte *ptep, gfp_t gfp, size_t *mapped)
 {
 	arm_lpae_iopte *cptep, pte;
 	size_t block_size = ARM_LPAE_BLOCK_SIZE(lvl, data);
@@ -158,7 +160,8 @@ int __arm_lpae_map(struct arm_lpae_io_pgtable *data, unsigned long iova,
 	if (size == block_size) {
 		max_entries = ARM_LPAE_PTES_PER_TABLE(data) - map_idx_start;
 		num_entries = min_t(int, pgcount, max_entries);
-		ret = arm_lpae_init_pte(data, iova, paddr, prot, lvl, num_entries, ptep);
+		ret = arm_lpae_init_pte(iop, data, iova, paddr, prot, lvl,
+					num_entries, ptep);
 		if (!ret)
 			*mapped += num_entries * size;
 
@@ -192,7 +195,7 @@ int __arm_lpae_map(struct arm_lpae_io_pgtable *data, unsigned long iova,
 	}
 
 	/* Rinse, repeat */
-	return __arm_lpae_map(data, iova, paddr, size, pgcount, prot, lvl + 1,
+	return __arm_lpae_map(iop, data, iova, paddr, size, pgcount, prot, lvl + 1,
 			      cptep, gfp, mapped);
 }
 
@@ -260,13 +263,13 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
 	return pte;
 }
 
-int arm_lpae_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
+int arm_lpae_map_pages(struct io_pgtable *iop, unsigned long iova,
 		       phys_addr_t paddr, size_t pgsize, size_t pgcount,
 		       int iommu_prot, gfp_t gfp, size_t *mapped)
 {
-	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
+	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(iop->ops);
 	struct io_pgtable_cfg *cfg = &data->iop.cfg;
-	arm_lpae_iopte *ptep = data->pgd;
+	arm_lpae_iopte *ptep = iop->pgd;
 	int ret, lvl = data->start_level;
 	arm_lpae_iopte prot;
 	long iaext = (s64)iova >> cfg->ias;
@@ -284,7 +287,7 @@ int arm_lpae_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
 		return 0;
 
 	prot = arm_lpae_prot_to_pte(data, iommu_prot);
-	ret = __arm_lpae_map(data, iova, paddr, pgsize, pgcount, prot, lvl,
+	ret = __arm_lpae_map(iop, data, iova, paddr, pgsize, pgcount, prot, lvl,
 			     ptep, gfp, mapped);
 	/*
 	 * Synchronise all PTE updates for the new mapping before there's
@@ -326,7 +329,8 @@ void __arm_lpae_free_pgtable(struct arm_lpae_io_pgtable *data, int lvl,
 	__arm_lpae_free_pages(start, table_size, &data->iop.cfg);
 }
 
-static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
+static size_t arm_lpae_split_blk_unmap(struct io_pgtable *iop,
+				       struct arm_lpae_io_pgtable *data,
 				       struct iommu_iotlb_gather *gather,
 				       unsigned long iova, size_t size,
 				       arm_lpae_iopte blk_pte, int lvl,
@@ -378,21 +382,24 @@ static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
 		tablep = iopte_deref(pte, data);
 	} else if (unmap_idx_start >= 0) {
 		for (i = 0; i < num_entries; i++)
-			io_pgtable_tlb_add_page(&data->iop, gather, iova + i * size, size);
+			io_pgtable_tlb_add_page(cfg, iop, gather,
+						iova + i * size, size);
 
 		return num_entries * size;
 	}
 
-	return __arm_lpae_unmap(data, gather, iova, size, pgcount, lvl, tablep);
+	return __arm_lpae_unmap(iop, data, gather, iova, size, pgcount, lvl,
+				tablep);
 }
 
-static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
+static size_t __arm_lpae_unmap(struct io_pgtable *iop,
+			       struct arm_lpae_io_pgtable *data,
 			       struct iommu_iotlb_gather *gather,
 			       unsigned long iova, size_t size, size_t pgcount,
 			       int lvl, arm_lpae_iopte *ptep)
 {
 	arm_lpae_iopte pte;
-	struct io_pgtable *iop = &data->iop;
+	struct io_pgtable_cfg *cfg = &data->iop.cfg;
 	int i = 0, num_entries, max_entries, unmap_idx_start;
 
 	/* Something went horribly wrong and we ran out of page table */
@@ -415,15 +422,16 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
 			if (WARN_ON(!pte))
 				break;
 
-			__arm_lpae_clear_pte(ptep, &iop->cfg);
+			__arm_lpae_clear_pte(ptep, cfg);
 
-			if (!iopte_leaf(pte, lvl, iop->cfg.fmt)) {
+			if (!iopte_leaf(pte, lvl, cfg->fmt)) {
 				/* Also flush any partial walks */
-				io_pgtable_tlb_flush_walk(iop, iova + i * size, size,
+				io_pgtable_tlb_flush_walk(cfg, iop, iova + i * size, size,
 							  ARM_LPAE_GRANULE(data));
 				__arm_lpae_free_pgtable(data, lvl + 1, iopte_deref(pte, data));
 			} else if (!iommu_iotlb_gather_queued(gather)) {
-				io_pgtable_tlb_add_page(iop, gather, iova + i * size, size);
+				io_pgtable_tlb_add_page(cfg, iop, gather,
+							iova + i * size, size);
 			}
 
 			ptep++;
@@ -431,27 +439,28 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
 		}
 
 		return i * size;
-	} else if (iopte_leaf(pte, lvl, iop->cfg.fmt)) {
+	} else if (iopte_leaf(pte, lvl, cfg->fmt)) {
 		/*
 		 * Insert a table at the next level to map the old region,
 		 * minus the part we want to unmap
 		 */
-		return arm_lpae_split_blk_unmap(data, gather, iova, size, pte,
-						lvl + 1, ptep, pgcount);
+		return arm_lpae_split_blk_unmap(iop, data, gather, iova, size,
+						pte, lvl + 1, ptep, pgcount);
 	}
 
 	/* Keep on walkin' */
 	ptep = iopte_deref(pte, data);
-	return __arm_lpae_unmap(data, gather, iova, size, pgcount, lvl + 1, ptep);
+	return __arm_lpae_unmap(iop, data, gather, iova, size,
+				pgcount, lvl + 1, ptep);
 }
 
-size_t arm_lpae_unmap_pages(struct io_pgtable_ops *ops, unsigned long iova,
+size_t arm_lpae_unmap_pages(struct io_pgtable *iop, unsigned long iova,
 			    size_t pgsize, size_t pgcount,
 			    struct iommu_iotlb_gather *gather)
 {
-	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
+	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(iop->ops);
 	struct io_pgtable_cfg *cfg = &data->iop.cfg;
-	arm_lpae_iopte *ptep = data->pgd;
+	arm_lpae_iopte *ptep = iop->pgd;
 	long iaext = (s64)iova >> cfg->ias;
 
 	if (WARN_ON(!pgsize || (pgsize & cfg->pgsize_bitmap) != pgsize || !pgcount))
@@ -462,15 +471,14 @@ size_t arm_lpae_unmap_pages(struct io_pgtable_ops *ops, unsigned long iova,
 	if (WARN_ON(iaext))
 		return 0;
 
-	return __arm_lpae_unmap(data, gather, iova, pgsize, pgcount,
-				data->start_level, ptep);
+	return __arm_lpae_unmap(iop, data, gather, iova, pgsize,
+				pgcount, data->start_level, ptep);
 }
 
-phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
-				  unsigned long iova)
+static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable *iop, unsigned long iova)
 {
-	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
-	arm_lpae_iopte pte, *ptep = data->pgd;
+	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(iop->ops);
+	arm_lpae_iopte pte, *ptep = iop->pgd;
 	int lvl = data->start_level;
 
 	do {
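
The io_pgtable_tlb_*() helpers now take the config and the io_pgtable
separately, since the cfg has moved into struct io_pgtable_params while the
cookie stays in struct io_pgtable. Assumed shape of the updated helpers,
inferred from the call sites in this file (the real definitions are in an
earlier patch):

static inline void io_pgtable_tlb_flush_all(struct io_pgtable_cfg *cfg,
					    struct io_pgtable *iop)
{
	if (cfg->tlb && cfg->tlb->tlb_flush_all)
		cfg->tlb->tlb_flush_all(iop->cookie);
}

static inline void io_pgtable_tlb_flush_walk(struct io_pgtable_cfg *cfg,
					     struct io_pgtable *iop,
					     unsigned long iova, size_t size,
					     size_t granule)
{
	if (cfg->tlb && cfg->tlb->tlb_flush_walk)
		cfg->tlb->tlb_flush_walk(iova, size, granule, iop->cookie);
}

static inline void io_pgtable_tlb_add_page(struct io_pgtable_cfg *cfg,
					   struct io_pgtable *iop,
					   struct iommu_iotlb_gather *gather,
					   unsigned long iova, size_t granule)
{
	if (cfg->tlb && cfg->tlb->tlb_add_page)
		cfg->tlb->tlb_add_page(gather, iova, granule, iop->cookie);
}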
diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
index 278b4299d757..2dd12fabfaee 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -40,7 +40,7 @@
 	container_of((x), struct arm_v7s_io_pgtable, iop)
 
 #define io_pgtable_ops_to_data(x)					\
-	io_pgtable_to_data(io_pgtable_ops_to_pgtable(x))
+	io_pgtable_to_data(io_pgtable_ops_to_params(x))
 
 /*
  * We have 32 bits total; 12 bits resolved at level 1, 8 bits at level 2,
@@ -162,11 +162,10 @@ typedef u32 arm_v7s_iopte;
 static bool selftest_running;
 
 struct arm_v7s_io_pgtable {
-	struct io_pgtable	iop;
+	struct io_pgtable_params	iop;
 
-	arm_v7s_iopte		*pgd;
-	struct kmem_cache	*l2_tables;
-	spinlock_t		split_lock;
+	struct kmem_cache		*l2_tables;
+	spinlock_t			split_lock;
 };
 
 static bool arm_v7s_pte_is_cont(arm_v7s_iopte pte, int lvl);
@@ -424,13 +423,14 @@ static bool arm_v7s_pte_is_cont(arm_v7s_iopte pte, int lvl)
 	return false;
 }
 
-static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable *,
+static size_t __arm_v7s_unmap(struct io_pgtable *, struct arm_v7s_io_pgtable *,
 			      struct iommu_iotlb_gather *, unsigned long,
 			      size_t, int, arm_v7s_iopte *);
 
-static int arm_v7s_init_pte(struct arm_v7s_io_pgtable *data,
-			    unsigned long iova, phys_addr_t paddr, int prot,
-			    int lvl, int num_entries, arm_v7s_iopte *ptep)
+static int arm_v7s_init_pte(struct io_pgtable *iop,
+			    struct arm_v7s_io_pgtable *data, unsigned long iova,
+			    phys_addr_t paddr, int prot, int lvl,
+			    int num_entries, arm_v7s_iopte *ptep)
 {
 	struct io_pgtable_cfg *cfg = &data->iop.cfg;
 	arm_v7s_iopte pte;
@@ -446,7 +446,7 @@ static int arm_v7s_init_pte(struct arm_v7s_io_pgtable *data,
 			size_t sz = ARM_V7S_BLOCK_SIZE(lvl);
 
 			tblp = ptep - ARM_V7S_LVL_IDX(iova, lvl, cfg);
-			if (WARN_ON(__arm_v7s_unmap(data, NULL, iova + i * sz,
+			if (WARN_ON(__arm_v7s_unmap(iop, data, NULL, iova + i * sz,
 						    sz, lvl, tblp) != sz))
 				return -EINVAL;
 		} else if (ptep[i]) {
@@ -494,9 +494,9 @@ static arm_v7s_iopte arm_v7s_install_table(arm_v7s_iopte *table,
 	return old;
 }
 
-static int __arm_v7s_map(struct arm_v7s_io_pgtable *data, unsigned long iova,
-			 phys_addr_t paddr, size_t size, int prot,
-			 int lvl, arm_v7s_iopte *ptep, gfp_t gfp)
+static int __arm_v7s_map(struct io_pgtable *iop, struct arm_v7s_io_pgtable *data,
+			 unsigned long iova, phys_addr_t paddr, size_t size,
+			 int prot, int lvl, arm_v7s_iopte *ptep, gfp_t gfp)
 {
 	struct io_pgtable_cfg *cfg = &data->iop.cfg;
 	arm_v7s_iopte pte, *cptep;
@@ -507,7 +507,7 @@ static int __arm_v7s_map(struct arm_v7s_io_pgtable *data, unsigned long iova,
 
 	/* If we can install a leaf entry at this level, then do so */
 	if (num_entries)
-		return arm_v7s_init_pte(data, iova, paddr, prot,
+		return arm_v7s_init_pte(iop, data, iova, paddr, prot,
 					lvl, num_entries, ptep);
 
 	/* We can't allocate tables at the final level */
@@ -538,14 +538,14 @@ static int __arm_v7s_map(struct arm_v7s_io_pgtable *data, unsigned long iova,
 	}
 
 	/* Rinse, repeat */
-	return __arm_v7s_map(data, iova, paddr, size, prot, lvl + 1, cptep, gfp);
+	return __arm_v7s_map(iop, data, iova, paddr, size, prot, lvl + 1, cptep, gfp);
 }
 
-static int arm_v7s_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
+static int arm_v7s_map_pages(struct io_pgtable *iop, unsigned long iova,
 			     phys_addr_t paddr, size_t pgsize, size_t pgcount,
 			     int prot, gfp_t gfp, size_t *mapped)
 {
-	struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(ops);
+	struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(iop->ops);
 	int ret = -EINVAL;
 
 	if (WARN_ON(iova >= (1ULL << data->iop.cfg.ias) ||
@@ -557,8 +557,8 @@ static int arm_v7s_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
 		return 0;
 
 	while (pgcount--) {
-		ret = __arm_v7s_map(data, iova, paddr, pgsize, prot, 1, data->pgd,
-				    gfp);
+		ret = __arm_v7s_map(iop, data, iova, paddr, pgsize, prot, 1,
+				    iop->pgd, gfp);
 		if (ret)
 			break;
 
@@ -577,26 +577,26 @@ static int arm_v7s_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
 
 static void arm_v7s_free_pgtable(struct io_pgtable *iop)
 {
-	struct arm_v7s_io_pgtable *data = io_pgtable_to_data(iop);
+	struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(iop->ops);
+	arm_v7s_iopte *ptep = iop->pgd;
 	int i;
 
-	for (i = 0; i < ARM_V7S_PTES_PER_LVL(1, &data->iop.cfg); i++) {
-		arm_v7s_iopte pte = data->pgd[i];
-
-		if (ARM_V7S_PTE_IS_TABLE(pte, 1))
-			__arm_v7s_free_table(iopte_deref(pte, 1, data),
+	for (i = 0; i < ARM_V7S_PTES_PER_LVL(1, &data->iop.cfg); i++, ptep++) {
+		if (ARM_V7S_PTE_IS_TABLE(*ptep, 1))
+			__arm_v7s_free_table(iopte_deref(*ptep, 1, data),
 					     2, data);
 	}
-	__arm_v7s_free_table(data->pgd, 1, data);
+	__arm_v7s_free_table(iop->pgd, 1, data);
 	kmem_cache_destroy(data->l2_tables);
 	kfree(data);
 }
 
-static arm_v7s_iopte arm_v7s_split_cont(struct arm_v7s_io_pgtable *data,
+static arm_v7s_iopte arm_v7s_split_cont(struct io_pgtable *iop,
+					struct arm_v7s_io_pgtable *data,
 					unsigned long iova, int idx, int lvl,
 					arm_v7s_iopte *ptep)
 {
-	struct io_pgtable *iop = &data->iop;
+	struct io_pgtable_cfg *cfg = &data->iop.cfg;
 	arm_v7s_iopte pte;
 	size_t size = ARM_V7S_BLOCK_SIZE(lvl);
 	int i;
@@ -611,14 +611,15 @@ static arm_v7s_iopte arm_v7s_split_cont(struct arm_v7s_io_pgtable *data,
 	for (i = 0; i < ARM_V7S_CONT_PAGES; i++)
 		ptep[i] = pte + i * size;
 
-	__arm_v7s_pte_sync(ptep, ARM_V7S_CONT_PAGES, &iop->cfg);
+	__arm_v7s_pte_sync(ptep, ARM_V7S_CONT_PAGES, cfg);
 
 	size *= ARM_V7S_CONT_PAGES;
-	io_pgtable_tlb_flush_walk(iop, iova, size, size);
+	io_pgtable_tlb_flush_walk(cfg, iop, iova, size, size);
 	return pte;
 }
 
-static size_t arm_v7s_split_blk_unmap(struct arm_v7s_io_pgtable *data,
+static size_t arm_v7s_split_blk_unmap(struct io_pgtable *iop,
+				      struct arm_v7s_io_pgtable *data,
 				      struct iommu_iotlb_gather *gather,
 				      unsigned long iova, size_t size,
 				      arm_v7s_iopte blk_pte,
@@ -656,27 +657,28 @@ static size_t arm_v7s_split_blk_unmap(struct arm_v7s_io_pgtable *data,
 			return 0;
 
 		tablep = iopte_deref(pte, 1, data);
-		return __arm_v7s_unmap(data, gather, iova, size, 2, tablep);
+		return __arm_v7s_unmap(iop, data, gather, iova, size, 2, tablep);
 	}
 
-	io_pgtable_tlb_add_page(&data->iop, gather, iova, size);
+	io_pgtable_tlb_add_page(cfg, iop, gather, iova, size);
 	return size;
 }
 
-static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable *data,
+static size_t __arm_v7s_unmap(struct io_pgtable *iop,
+			      struct arm_v7s_io_pgtable *data,
 			      struct iommu_iotlb_gather *gather,
 			      unsigned long iova, size_t size, int lvl,
 			      arm_v7s_iopte *ptep)
 {
 	arm_v7s_iopte pte[ARM_V7S_CONT_PAGES];
-	struct io_pgtable *iop = &data->iop;
+	struct io_pgtable_cfg *cfg = &data->iop.cfg;
 	int idx, i = 0, num_entries = size >> ARM_V7S_LVL_SHIFT(lvl);
 
 	/* Something went horribly wrong and we ran out of page table */
 	if (WARN_ON(lvl > 2))
 		return 0;
 
-	idx = ARM_V7S_LVL_IDX(iova, lvl, &iop->cfg);
+	idx = ARM_V7S_LVL_IDX(iova, lvl, cfg);
 	ptep += idx;
 	do {
 		pte[i] = READ_ONCE(ptep[i]);
@@ -698,7 +700,7 @@ static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable *data,
 		unsigned long flags;
 
 		spin_lock_irqsave(&data->split_lock, flags);
-		pte[0] = arm_v7s_split_cont(data, iova, idx, lvl, ptep);
+		pte[0] = arm_v7s_split_cont(iop, data, iova, idx, lvl, ptep);
 		spin_unlock_irqrestore(&data->split_lock, flags);
 	}
 
@@ -706,17 +708,18 @@ static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable *data,
 	if (num_entries) {
 		size_t blk_size = ARM_V7S_BLOCK_SIZE(lvl);
 
-		__arm_v7s_set_pte(ptep, 0, num_entries, &iop->cfg);
+		__arm_v7s_set_pte(ptep, 0, num_entries, cfg);
 
 		for (i = 0; i < num_entries; i++) {
 			if (ARM_V7S_PTE_IS_TABLE(pte[i], lvl)) {
 				/* Also flush any partial walks */
-				io_pgtable_tlb_flush_walk(iop, iova, blk_size,
+				io_pgtable_tlb_flush_walk(cfg, iop, iova, blk_size,
 						ARM_V7S_BLOCK_SIZE(lvl + 1));
 				ptep = iopte_deref(pte[i], lvl, data);
 				__arm_v7s_free_table(ptep, lvl + 1, data);
 			} else if (!iommu_iotlb_gather_queued(gather)) {
-				io_pgtable_tlb_add_page(iop, gather, iova, blk_size);
+				io_pgtable_tlb_add_page(cfg, iop, gather, iova,
+							blk_size);
 			}
 			iova += blk_size;
 		}
@@ -726,27 +729,27 @@ static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable *data,
 		 * Insert a table at the next level to map the old region,
 		 * minus the part we want to unmap
 		 */
-		return arm_v7s_split_blk_unmap(data, gather, iova, size, pte[0],
-					       ptep);
+		return arm_v7s_split_blk_unmap(iop, data, gather, iova, size,
+					       pte[0], ptep);
 	}
 
 	/* Keep on walkin' */
 	ptep = iopte_deref(pte[0], lvl, data);
-	return __arm_v7s_unmap(data, gather, iova, size, lvl + 1, ptep);
+	return __arm_v7s_unmap(iop, data, gather, iova, size, lvl + 1, ptep);
 }
 
-static size_t arm_v7s_unmap_pages(struct io_pgtable_ops *ops, unsigned long iova,
+static size_t arm_v7s_unmap_pages(struct io_pgtable *iop, unsigned long iova,
 				  size_t pgsize, size_t pgcount,
 				  struct iommu_iotlb_gather *gather)
 {
-	struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(ops);
+	struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(iop->ops);
 	size_t unmapped = 0, ret;
 
 	if (WARN_ON(iova >= (1ULL << data->iop.cfg.ias)))
 		return 0;
 
 	while (pgcount--) {
-		ret = __arm_v7s_unmap(data, gather, iova, pgsize, 1, data->pgd);
+		ret = __arm_v7s_unmap(iop, data, gather, iova, pgsize, 1, iop->pgd);
 		if (!ret)
 			break;
 
@@ -757,11 +760,11 @@ static size_t arm_v7s_unmap_pages(struct io_pgtable_ops *ops, unsigned long iova
 	return unmapped;
 }
 
-static phys_addr_t arm_v7s_iova_to_phys(struct io_pgtable_ops *ops,
+static phys_addr_t arm_v7s_iova_to_phys(struct io_pgtable *iop,
 					unsigned long iova)
 {
-	struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(ops);
-	arm_v7s_iopte *ptep = data->pgd, pte;
+	struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(iop->ops);
+	arm_v7s_iopte *ptep = iop->pgd, pte;
 	int lvl = 0;
 	u32 mask;
 
@@ -780,37 +783,37 @@ static phys_addr_t arm_v7s_iova_to_phys(struct io_pgtable_ops *ops,
 	return iopte_to_paddr(pte, lvl, &data->iop.cfg) | (iova & ~mask);
 }
 
-static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg,
-						void *cookie)
+static int arm_v7s_alloc_pgtable(struct io_pgtable *iop,
+				 struct io_pgtable_cfg *cfg, void *cookie)
 {
 	struct arm_v7s_io_pgtable *data;
 	slab_flags_t slab_flag;
 	phys_addr_t paddr;
 
 	if (cfg->ias > (arm_v7s_is_mtk_enabled(cfg) ? 34 : ARM_V7S_ADDR_BITS))
-		return NULL;
+		return -EINVAL;
 
 	if (cfg->oas > (arm_v7s_is_mtk_enabled(cfg) ? 35 : ARM_V7S_ADDR_BITS))
-		return NULL;
+		return -EINVAL;
 
 	if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS |
 			    IO_PGTABLE_QUIRK_NO_PERMS |
 			    IO_PGTABLE_QUIRK_ARM_MTK_EXT |
 			    IO_PGTABLE_QUIRK_ARM_MTK_TTBR_EXT))
-		return NULL;
+		return -EINVAL;
 
 	/* If ARM_MTK_4GB is enabled, the NO_PERMS is also expected. */
 	if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_MTK_EXT &&
 	    !(cfg->quirks & IO_PGTABLE_QUIRK_NO_PERMS))
-			return NULL;
+		return -EINVAL;
 
 	if ((cfg->quirks & IO_PGTABLE_QUIRK_ARM_MTK_TTBR_EXT) &&
 	    !arm_v7s_is_mtk_enabled(cfg))
-		return NULL;
+		return -EINVAL;
 
 	data = kmalloc(sizeof(*data), GFP_KERNEL);
 	if (!data)
-		return NULL;
+		return -ENOMEM;
 
 	spin_lock_init(&data->split_lock);
 
@@ -860,15 +863,15 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg,
 				ARM_V7S_NMRR_OR(7, ARM_V7S_RGN_WBWA);
 
 	/* Looking good; allocate a pgd */
-	data->pgd = __arm_v7s_alloc_table(1, GFP_KERNEL, data);
-	if (!data->pgd)
+	iop->pgd = __arm_v7s_alloc_table(1, GFP_KERNEL, data);
+	if (!iop->pgd)
 		goto out_free_data;
 
 	/* Ensure the empty pgd is visible before any actual TTBR write */
 	wmb();
 
 	/* TTBR */
-	paddr = virt_to_phys(data->pgd);
+	paddr = virt_to_phys(iop->pgd);
 	if (arm_v7s_is_mtk_enabled(cfg))
 		cfg->arm_v7s_cfg.ttbr = paddr | upper_32_bits(paddr);
 	else
@@ -878,12 +881,13 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg,
 					 ARM_V7S_TTBR_ORGN_ATTR(ARM_V7S_RGN_WBWA)) :
 					(ARM_V7S_TTBR_IRGN_ATTR(ARM_V7S_RGN_NC) |
 					 ARM_V7S_TTBR_ORGN_ATTR(ARM_V7S_RGN_NC)));
-	return &data->iop;
+	iop->ops = &data->iop.ops;
+	return 0;
 
 out_free_data:
 	kmem_cache_destroy(data->l2_tables);
 	kfree(data);
-	return NULL;
+	return -EINVAL;
 }
 
 struct io_pgtable_init_fns io_pgtable_arm_v7s_init_fns = {
@@ -920,7 +924,7 @@ static const struct iommu_flush_ops dummy_tlb_ops __initconst = {
 	.tlb_add_page	= dummy_tlb_add_page,
 };
 
-#define __FAIL(ops)	({				\
+#define __FAIL()	({				\
 		WARN(1, "selftest: test failed\n");	\
 		selftest_running = false;		\
 		-EFAULT;				\
@@ -928,7 +932,7 @@ static const struct iommu_flush_ops dummy_tlb_ops __initconst = {
 
 static int __init arm_v7s_do_selftests(void)
 {
-	struct io_pgtable_ops *ops;
+	struct io_pgtable iop;
 	struct io_pgtable_cfg cfg = {
 		.fmt = ARM_V7S,
 		.tlb = &dummy_tlb_ops,
@@ -946,8 +950,7 @@ static int __init arm_v7s_do_selftests(void)
 
 	cfg_cookie = &cfg;
 
-	ops = alloc_io_pgtable_ops(&cfg, &cfg);
-	if (!ops) {
+	if (alloc_io_pgtable_ops(&iop, &cfg, &cfg)) {
 		pr_err("selftest: failed to allocate io pgtable ops\n");
 		return -EINVAL;
 	}
@@ -956,14 +959,14 @@ static int __init arm_v7s_do_selftests(void)
 	 * Initial sanity checks.
 	 * Empty page tables shouldn't provide any translations.
 	 */
-	if (ops->iova_to_phys(ops, 42))
-		return __FAIL(ops);
+	if (iopt_iova_to_phys(&iop, 42))
+		return __FAIL();
 
-	if (ops->iova_to_phys(ops, SZ_1G + 42))
-		return __FAIL(ops);
+	if (iopt_iova_to_phys(&iop, SZ_1G + 42))
+		return __FAIL();
 
-	if (ops->iova_to_phys(ops, SZ_2G + 42))
-		return __FAIL(ops);
+	if (iopt_iova_to_phys(&iop, SZ_2G + 42))
+		return __FAIL();
 
 	/*
 	 * Distinct mappings of different granule sizes.
@@ -971,20 +974,20 @@ static int __init arm_v7s_do_selftests(void)
 	iova = 0;
 	for_each_set_bit(i, &cfg.pgsize_bitmap, BITS_PER_LONG) {
 		size = 1UL << i;
-		if (ops->map_pages(ops, iova, iova, size, 1,
+		if (iopt_map_pages(&iop, iova, iova, size, 1,
 				   IOMMU_READ | IOMMU_WRITE |
 				   IOMMU_NOEXEC | IOMMU_CACHE,
 				   GFP_KERNEL, &mapped))
-			return __FAIL(ops);
+			return __FAIL();
 
 		/* Overlapping mappings */
-		if (!ops->map_pages(ops, iova, iova + size, size, 1,
+		if (!iopt_map_pages(&iop, iova, iova + size, size, 1,
 				    IOMMU_READ | IOMMU_NOEXEC, GFP_KERNEL,
 				    &mapped))
-			return __FAIL(ops);
+			return __FAIL();
 
-		if (ops->iova_to_phys(ops, iova + 42) != (iova + 42))
-			return __FAIL(ops);
+		if (iopt_iova_to_phys(&iop, iova + 42) != (iova + 42))
+			return __FAIL();
 
 		iova += SZ_16M;
 		loopnr++;
@@ -995,17 +998,17 @@ static int __init arm_v7s_do_selftests(void)
 	size = 1UL << __ffs(cfg.pgsize_bitmap);
 	while (i < loopnr) {
 		iova_start = i * SZ_16M;
-		if (ops->unmap_pages(ops, iova_start + size, size, 1, NULL) != size)
-			return __FAIL(ops);
+		if (iopt_unmap_pages(&iop, iova_start + size, size, 1, NULL) != size)
+			return __FAIL();
 
 		/* Remap of partial unmap */
-		if (ops->map_pages(ops, iova_start + size, size, size, 1,
+		if (iopt_map_pages(&iop, iova_start + size, size, size, 1,
 				   IOMMU_READ, GFP_KERNEL, &mapped))
-			return __FAIL(ops);
+			return __FAIL();
 
-		if (ops->iova_to_phys(ops, iova_start + size + 42)
+		if (iopt_iova_to_phys(&iop, iova_start + size + 42)
 		    != (size + 42))
-			return __FAIL(ops);
+			return __FAIL();
 		i++;
 	}
 
@@ -1014,24 +1017,24 @@ static int __init arm_v7s_do_selftests(void)
 	for_each_set_bit(i, &cfg.pgsize_bitmap, BITS_PER_LONG) {
 		size = 1UL << i;
 
-		if (ops->unmap_pages(ops, iova, size, 1, NULL) != size)
-			return __FAIL(ops);
+		if (iopt_unmap_pages(&iop, iova, size, 1, NULL) != size)
+			return __FAIL();
 
-		if (ops->iova_to_phys(ops, iova + 42))
-			return __FAIL(ops);
+		if (iopt_iova_to_phys(&iop, iova + 42))
+			return __FAIL();
 
 		/* Remap full block */
-		if (ops->map_pages(ops, iova, iova, size, 1, IOMMU_WRITE,
+		if (iopt_map_pages(&iop, iova, iova, size, 1, IOMMU_WRITE,
 				   GFP_KERNEL, &mapped))
-			return __FAIL(ops);
+			return __FAIL();
 
-		if (ops->iova_to_phys(ops, iova + 42) != (iova + 42))
-			return __FAIL(ops);
+		if (iopt_iova_to_phys(&iop, iova + 42) != (iova + 42))
+			return __FAIL();
 
 		iova += SZ_16M;
 	}
 
-	free_io_pgtable_ops(ops);
+	free_io_pgtable_ops(&iop);
 
 	selftest_running = false;
 
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index c412500efadf..bee8980c89eb 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -82,40 +82,40 @@ void __arm_lpae_sync_pte(arm_lpae_iopte *ptep, int num_entries,
 
 static void arm_lpae_free_pgtable(struct io_pgtable *iop)
 {
-	struct arm_lpae_io_pgtable *data = io_pgtable_to_data(iop);
+	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(iop->ops);
 
-	__arm_lpae_free_pgtable(data, data->start_level, data->pgd);
+	__arm_lpae_free_pgtable(data, data->start_level, iop->pgd);
 	kfree(data);
 }
 
-static struct io_pgtable *
-arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
+int arm_64_lpae_alloc_pgtable_s1(struct io_pgtable *iop,
+				 struct io_pgtable_cfg *cfg, void *cookie)
 {
 	struct arm_lpae_io_pgtable *data;
 
 	data = kzalloc(sizeof(*data), GFP_KERNEL);
 	if (!data)
-		return NULL;
+		return -ENOMEM;
 
 	if (arm_lpae_init_pgtable_s1(cfg, data))
 		goto out_free_data;
 
 	/* Looking good; allocate a pgd */
-	data->pgd = __arm_lpae_alloc_pages(ARM_LPAE_PGD_SIZE(data),
-					   GFP_KERNEL, cfg);
-	if (!data->pgd)
+	iop->pgd = __arm_lpae_alloc_pages(ARM_LPAE_PGD_SIZE(data),
+					  GFP_KERNEL, cfg);
+	if (!iop->pgd)
 		goto out_free_data;
 
 	/* Ensure the empty pgd is visible before any actual TTBR write */
 	wmb();
 
-	/* TTBR */
-	cfg->arm_lpae_s1_cfg.ttbr = virt_to_phys(data->pgd);
-	return &data->iop;
+	cfg->arm_lpae_s1_cfg.ttbr = virt_to_phys(iop->pgd);
+	iop->ops = &data->iop.ops;
+	return 0;
 
 out_free_data:
 	kfree(data);
-	return NULL;
+	return -EINVAL;
 }
 
 static int arm_64_lpae_configure_s1(struct io_pgtable_cfg *cfg, size_t *pgd_size)
@@ -130,34 +130,35 @@ static int arm_64_lpae_configure_s1(struct io_pgtable_cfg *cfg, size_t *pgd_size
 	return 0;
 }
 
-static struct io_pgtable *
-arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
+int arm_64_lpae_alloc_pgtable_s2(struct io_pgtable *iop,
+				 struct io_pgtable_cfg *cfg, void *cookie)
 {
 	struct arm_lpae_io_pgtable *data;
 
 	data = kzalloc(sizeof(*data), GFP_KERNEL);
 	if (!data)
-		return NULL;
+		return -ENOMEM;
 
 	if (arm_lpae_init_pgtable_s2(cfg, data))
 		goto out_free_data;
 
 	/* Allocate pgd pages */
-	data->pgd = __arm_lpae_alloc_pages(ARM_LPAE_PGD_SIZE(data),
-					   GFP_KERNEL, cfg);
-	if (!data->pgd)
+	iop->pgd = __arm_lpae_alloc_pages(ARM_LPAE_PGD_SIZE(data),
+					  GFP_KERNEL, cfg);
+	if (!iop->pgd)
 		goto out_free_data;
 
 	/* Ensure the empty pgd is visible before any actual TTBR write */
 	wmb();
 
 	/* VTTBR */
-	cfg->arm_lpae_s2_cfg.vttbr = virt_to_phys(data->pgd);
-	return &data->iop;
+	cfg->arm_lpae_s2_cfg.vttbr = virt_to_phys(iop->pgd);
+	iop->ops = &data->iop.ops;
+	return 0;
 
 out_free_data:
 	kfree(data);
-	return NULL;
+	return -EINVAL;
 }
 
 static int arm_64_lpae_configure_s2(struct io_pgtable_cfg *cfg, size_t *pgd_size)
@@ -172,46 +173,46 @@ static int arm_64_lpae_configure_s2(struct io_pgtable_cfg *cfg, size_t *pgd_size
 	return 0;
 }
 
-static struct io_pgtable *
-arm_32_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
+int arm_32_lpae_alloc_pgtable_s1(struct io_pgtable *iop,
+				 struct io_pgtable_cfg *cfg, void *cookie)
 {
 	if (cfg->ias > 32 || cfg->oas > 40)
-		return NULL;
+		return -EINVAL;
 
 	cfg->pgsize_bitmap &= (SZ_4K | SZ_2M | SZ_1G);
-	return arm_64_lpae_alloc_pgtable_s1(cfg, cookie);
+	return arm_64_lpae_alloc_pgtable_s1(iop, cfg, cookie);
 }
 
-static struct io_pgtable *
-arm_32_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
+int arm_32_lpae_alloc_pgtable_s2(struct io_pgtable *iop,
+				 struct io_pgtable_cfg *cfg, void *cookie)
 {
 	if (cfg->ias > 40 || cfg->oas > 40)
-		return NULL;
+		return -EINVAL;
 
 	cfg->pgsize_bitmap &= (SZ_4K | SZ_2M | SZ_1G);
-	return arm_64_lpae_alloc_pgtable_s2(cfg, cookie);
+	return arm_64_lpae_alloc_pgtable_s2(iop, cfg, cookie);
 }
 
-static struct io_pgtable *
-arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
+int arm_mali_lpae_alloc_pgtable(struct io_pgtable *iop,
+				struct io_pgtable_cfg *cfg, void *cookie)
 {
 	struct arm_lpae_io_pgtable *data;
 
 	/* No quirks for Mali (hopefully) */
 	if (cfg->quirks)
-		return NULL;
+		return -EINVAL;
 
 	if (cfg->ias > 48 || cfg->oas > 40)
-		return NULL;
+		return -EINVAL;
 
 	cfg->pgsize_bitmap &= (SZ_4K | SZ_2M | SZ_1G);
 
 	data = kzalloc(sizeof(*data), GFP_KERNEL);
 	if (!data)
-		return NULL;
+		return -ENOMEM;
 
 	if (arm_lpae_init_pgtable(cfg, data))
-		return NULL;
+		goto out_free_data;
 
 	/* Mali seems to need a full 4-level table regardless of IAS */
 	if (data->start_level > 0) {
@@ -233,25 +234,26 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
 		(ARM_MALI_LPAE_MEMATTR_IMP_DEF
 		 << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_DEV));
 
-	data->pgd = __arm_lpae_alloc_pages(ARM_LPAE_PGD_SIZE(data), GFP_KERNEL,
-					   cfg);
-	if (!data->pgd)
+	iop->pgd = __arm_lpae_alloc_pages(ARM_LPAE_PGD_SIZE(data), GFP_KERNEL,
+					  cfg);
+	if (!iop->pgd)
 		goto out_free_data;
 
 	/* Ensure the empty pgd is visible before TRANSTAB can be written */
 	wmb();
 
-	cfg->arm_mali_lpae_cfg.transtab = virt_to_phys(data->pgd) |
+	cfg->arm_mali_lpae_cfg.transtab = virt_to_phys(iop->pgd) |
 					  ARM_MALI_LPAE_TTBR_READ_INNER |
 					  ARM_MALI_LPAE_TTBR_ADRMODE_TABLE;
 	if (cfg->coherent_walk)
 		cfg->arm_mali_lpae_cfg.transtab |= ARM_MALI_LPAE_TTBR_SHARE_OUTER;
 
-	return &data->iop;
+	iop->ops = &data->iop.ops;
+	return 0;
 
 out_free_data:
 	kfree(data);
-	return NULL;
+	return -EINVAL;
 }
 
 struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s1_init_fns = {
@@ -310,21 +312,21 @@ static const struct iommu_flush_ops dummy_tlb_ops __initconst = {
 	.tlb_add_page	= dummy_tlb_add_page,
 };
 
-static void __init arm_lpae_dump_ops(struct io_pgtable_ops *ops)
+static void __init arm_lpae_dump_ops(struct io_pgtable *iop)
 {
-	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
+	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(iop->ops);
 	struct io_pgtable_cfg *cfg = &data->iop.cfg;
 
 	pr_err("cfg: pgsize_bitmap 0x%lx, ias %u-bit\n",
 		cfg->pgsize_bitmap, cfg->ias);
 	pr_err("data: %d levels, 0x%zx pgd_size, %u pg_shift, %u bits_per_level, pgd @ %p\n",
 		ARM_LPAE_MAX_LEVELS - data->start_level, ARM_LPAE_PGD_SIZE(data),
-		ilog2(ARM_LPAE_GRANULE(data)), data->bits_per_level, data->pgd);
+		ilog2(ARM_LPAE_GRANULE(data)), data->bits_per_level, iop->pgd);
 }
 
-#define __FAIL(ops, i)	({						\
+#define __FAIL(iop, i)	({						\
 		WARN(1, "selftest: test failed for fmt idx %d\n", (i));	\
-		arm_lpae_dump_ops(ops);					\
+		arm_lpae_dump_ops(iop);					\
 		selftest_running = false;				\
 		-EFAULT;						\
 })
@@ -336,34 +338,34 @@ static int __init arm_lpae_run_tests(struct io_pgtable_cfg *cfg)
 		ARM_64_LPAE_S2,
 	};
 
-	int i, j;
+	int i, j, ret;
 	unsigned long iova;
 	size_t size, mapped;
-	struct io_pgtable_ops *ops;
+	struct io_pgtable iop;
 
 	selftest_running = true;
 
 	for (i = 0; i < ARRAY_SIZE(fmts); ++i) {
 		cfg_cookie = cfg;
 		cfg->fmt = fmts[i];
-		ops = alloc_io_pgtable_ops(cfg, cfg);
-		if (!ops) {
+		ret = alloc_io_pgtable_ops(&iop, cfg, cfg);
+		if (ret) {
 			pr_err("selftest: failed to allocate io pgtable ops\n");
-			return -ENOMEM;
+			return ret;
 		}
 
 		/*
 		 * Initial sanity checks.
 		 * Empty page tables shouldn't provide any translations.
 		 */
-		if (ops->iova_to_phys(ops, 42))
-			return __FAIL(ops, i);
+		if (iopt_iova_to_phys(&iop, 42))
+			return __FAIL(&iop, i);
 
-		if (ops->iova_to_phys(ops, SZ_1G + 42))
-			return __FAIL(ops, i);
+		if (iopt_iova_to_phys(&iop, SZ_1G + 42))
+			return __FAIL(&iop, i);
 
-		if (ops->iova_to_phys(ops, SZ_2G + 42))
-			return __FAIL(ops, i);
+		if (iopt_iova_to_phys(&iop, SZ_2G + 42))
+			return __FAIL(&iop, i);
 
 		/*
 		 * Distinct mappings of different granule sizes.
@@ -372,60 +374,60 @@ static int __init arm_lpae_run_tests(struct io_pgtable_cfg *cfg)
 		for_each_set_bit(j, &cfg->pgsize_bitmap, BITS_PER_LONG) {
 			size = 1UL << j;
 
-			if (ops->map_pages(ops, iova, iova, size, 1,
+			if (iopt_map_pages(&iop, iova, iova, size, 1,
 					   IOMMU_READ | IOMMU_WRITE |
 					   IOMMU_NOEXEC | IOMMU_CACHE,
 					   GFP_KERNEL, &mapped))
-				return __FAIL(ops, i);
+				return __FAIL(&iop, i);
 
 			/* Overlapping mappings */
-			if (!ops->map_pages(ops, iova, iova + size, size, 1,
+			if (!iopt_map_pages(&iop, iova, iova + size, size, 1,
 					    IOMMU_READ | IOMMU_NOEXEC,
 					    GFP_KERNEL, &mapped))
-				return __FAIL(ops, i);
+				return __FAIL(&iop, i);
 
-			if (ops->iova_to_phys(ops, iova + 42) != (iova + 42))
-				return __FAIL(ops, i);
+			if (iopt_iova_to_phys(&iop, iova + 42) != (iova + 42))
+				return __FAIL(&iop, i);
 
 			iova += SZ_1G;
 		}
 
 		/* Partial unmap */
 		size = 1UL << __ffs(cfg->pgsize_bitmap);
-		if (ops->unmap_pages(ops, SZ_1G + size, size, 1, NULL) != size)
-			return __FAIL(ops, i);
+		if (iopt_unmap_pages(&iop, SZ_1G + size, size, 1, NULL) != size)
+			return __FAIL(&iop, i);
 
 		/* Remap of partial unmap */
-		if (ops->map_pages(ops, SZ_1G + size, size, size, 1,
+		if (iopt_map_pages(&iop, SZ_1G + size, size, size, 1,
 				   IOMMU_READ, GFP_KERNEL, &mapped))
-			return __FAIL(ops, i);
+			return __FAIL(&iop, i);
 
-		if (ops->iova_to_phys(ops, SZ_1G + size + 42) != (size + 42))
-			return __FAIL(ops, i);
+		if (iopt_iova_to_phys(&iop, SZ_1G + size + 42) != (size + 42))
+			return __FAIL(&iop, i);
 
 		/* Full unmap */
 		iova = 0;
 		for_each_set_bit(j, &cfg->pgsize_bitmap, BITS_PER_LONG) {
 			size = 1UL << j;
 
-			if (ops->unmap_pages(ops, iova, size, 1, NULL) != size)
-				return __FAIL(ops, i);
+			if (iopt_unmap_pages(&iop, iova, size, 1, NULL) != size)
+				return __FAIL(&iop, i);
 
-			if (ops->iova_to_phys(ops, iova + 42))
-				return __FAIL(ops, i);
+			if (iopt_iova_to_phys(&iop, iova + 42))
+				return __FAIL(&iop, i);
 
 			/* Remap full block */
-			if (ops->map_pages(ops, iova, iova, size, 1,
+			if (iopt_map_pages(&iop, iova, iova, size, 1,
 					   IOMMU_WRITE, GFP_KERNEL, &mapped))
-				return __FAIL(ops, i);
+				return __FAIL(&iop, i);
 
-			if (ops->iova_to_phys(ops, iova + 42) != (iova + 42))
-				return __FAIL(ops, i);
+			if (iopt_iova_to_phys(&iop, iova + 42) != (iova + 42))
+				return __FAIL(&iop, i);
 
 			iova += SZ_1G;
 		}
 
-		free_io_pgtable_ops(ops);
+		free_io_pgtable_ops(&iop);
 	}
 
 	selftest_running = false;
diff --git a/drivers/iommu/io-pgtable-dart.c b/drivers/iommu/io-pgtable-dart.c
index f981b25d8c98..1bb2e91ed0a7 100644
--- a/drivers/iommu/io-pgtable-dart.c
+++ b/drivers/iommu/io-pgtable-dart.c
@@ -34,7 +34,7 @@
 	container_of((x), struct dart_io_pgtable, iop)
 
 #define io_pgtable_ops_to_data(x)					\
-	io_pgtable_to_data(io_pgtable_ops_to_pgtable(x))
+	io_pgtable_to_data(io_pgtable_ops_to_params(x))
 
 #define DART_GRANULE(d)						\
 	(sizeof(dart_iopte) << (d)->bits_per_level)
@@ -65,12 +65,10 @@
 #define iopte_deref(pte, d) __va(iopte_to_paddr(pte, d))
 
 struct dart_io_pgtable {
-	struct io_pgtable	iop;
+	struct io_pgtable_params	iop;
 
-	int			tbl_bits;
-	int			bits_per_level;
-
-	void			*pgd[DART_MAX_TABLES];
+	int				tbl_bits;
+	int				bits_per_level;
 };
 
 typedef u64 dart_iopte;
@@ -170,10 +168,14 @@ static dart_iopte dart_install_table(dart_iopte *table,
 	return old;
 }
 
-static int dart_get_table(struct dart_io_pgtable *data, unsigned long iova)
+static dart_iopte *dart_get_table(struct io_pgtable *iop,
+				  struct dart_io_pgtable *data,
+				  unsigned long iova)
 {
-	return (iova >> (3 * data->bits_per_level + ilog2(sizeof(dart_iopte)))) &
+	int tbl = (iova >> (3 * data->bits_per_level + ilog2(sizeof(dart_iopte)))) &
 		((1 << data->tbl_bits) - 1);
+
+	return iop->pgd + DART_GRANULE(data) * tbl;
 }
 
 static int dart_get_l1_index(struct dart_io_pgtable *data, unsigned long iova)
@@ -190,12 +192,12 @@ static int dart_get_l2_index(struct dart_io_pgtable *data, unsigned long iova)
 		 ((1 << data->bits_per_level) - 1);
 }
 
-static  dart_iopte *dart_get_l2(struct dart_io_pgtable *data, unsigned long iova)
+static  dart_iopte *dart_get_l2(struct io_pgtable *iop,
+				struct dart_io_pgtable *data, unsigned long iova)
 {
 	dart_iopte pte, *ptep;
-	int tbl = dart_get_table(data, iova);
 
-	ptep = data->pgd[tbl];
+	ptep = dart_get_table(iop, data, iova);
 	if (!ptep)
 		return NULL;
 
@@ -233,14 +235,14 @@ static dart_iopte dart_prot_to_pte(struct dart_io_pgtable *data,
 	return pte;
 }
 
-static int dart_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
+static int dart_map_pages(struct io_pgtable *iop, unsigned long iova,
 			      phys_addr_t paddr, size_t pgsize, size_t pgcount,
 			      int iommu_prot, gfp_t gfp, size_t *mapped)
 {
-	struct dart_io_pgtable *data = io_pgtable_ops_to_data(ops);
+	struct dart_io_pgtable *data = io_pgtable_ops_to_data(iop->ops);
 	struct io_pgtable_cfg *cfg = &data->iop.cfg;
 	size_t tblsz = DART_GRANULE(data);
-	int ret = 0, tbl, num_entries, max_entries, map_idx_start;
+	int ret = 0, num_entries, max_entries, map_idx_start;
 	dart_iopte pte, *cptep, *ptep;
 	dart_iopte prot;
 
@@ -254,9 +256,7 @@ static int dart_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
 	if (!(iommu_prot & (IOMMU_READ | IOMMU_WRITE)))
 		return 0;
 
-	tbl = dart_get_table(data, iova);
-
-	ptep = data->pgd[tbl];
+	ptep = dart_get_table(iop, data, iova);
 	ptep += dart_get_l1_index(data, iova);
 	pte = READ_ONCE(*ptep);
 
@@ -295,11 +295,11 @@ static int dart_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
 	return ret;
 }
 
-static size_t dart_unmap_pages(struct io_pgtable_ops *ops, unsigned long iova,
+static size_t dart_unmap_pages(struct io_pgtable *iop, unsigned long iova,
 				   size_t pgsize, size_t pgcount,
 				   struct iommu_iotlb_gather *gather)
 {
-	struct dart_io_pgtable *data = io_pgtable_ops_to_data(ops);
+	struct dart_io_pgtable *data = io_pgtable_ops_to_data(iop->ops);
 	struct io_pgtable_cfg *cfg = &data->iop.cfg;
 	int i = 0, num_entries, max_entries, unmap_idx_start;
 	dart_iopte pte, *ptep;
@@ -307,7 +307,7 @@ static size_t dart_unmap_pages(struct io_pgtable_ops *ops, unsigned long iova,
 	if (WARN_ON(pgsize != cfg->pgsize_bitmap || !pgcount))
 		return 0;
 
-	ptep = dart_get_l2(data, iova);
+	ptep = dart_get_l2(iop, data, iova);
 
 	/* Valid L2 IOPTE pointer? */
 	if (WARN_ON(!ptep))
@@ -328,7 +328,7 @@ static size_t dart_unmap_pages(struct io_pgtable_ops *ops, unsigned long iova,
 		*ptep = 0;
 
 		if (!iommu_iotlb_gather_queued(gather))
-			io_pgtable_tlb_add_page(&data->iop, gather,
+			io_pgtable_tlb_add_page(cfg, iop, gather,
 						iova + i * pgsize, pgsize);
 
 		ptep++;
@@ -338,13 +338,13 @@ static size_t dart_unmap_pages(struct io_pgtable_ops *ops, unsigned long iova,
 	return i * pgsize;
 }
 
-static phys_addr_t dart_iova_to_phys(struct io_pgtable_ops *ops,
+static phys_addr_t dart_iova_to_phys(struct io_pgtable *iop,
 					 unsigned long iova)
 {
-	struct dart_io_pgtable *data = io_pgtable_ops_to_data(ops);
+	struct dart_io_pgtable *data = io_pgtable_ops_to_data(iop->ops);
 	dart_iopte pte, *ptep;
 
-	ptep = dart_get_l2(data, iova);
+	ptep = dart_get_l2(iop, data, iova);
 
 	/* Valid L2 IOPTE pointer? */
 	if (!ptep)
@@ -394,56 +394,56 @@ dart_alloc_pgtable(struct io_pgtable_cfg *cfg)
 	return data;
 }
 
-static struct io_pgtable *
-apple_dart_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
+static int apple_dart_alloc_pgtable(struct io_pgtable *iop,
+				    struct io_pgtable_cfg *cfg, void *cookie)
 {
 	struct dart_io_pgtable *data;
 	int i;
 
 	if (!cfg->coherent_walk)
-		return NULL;
+		return -EINVAL;
 
 	if (cfg->oas != 36 && cfg->oas != 42)
-		return NULL;
+		return -EINVAL;
 
 	if (cfg->ias > cfg->oas)
-		return NULL;
+		return -EINVAL;
 
 	if (!(cfg->pgsize_bitmap == SZ_4K || cfg->pgsize_bitmap == SZ_16K))
-		return NULL;
+		return -EINVAL;
 
 	data = dart_alloc_pgtable(cfg);
 	if (!data)
-		return NULL;
+		return -ENOMEM;
 
 	cfg->apple_dart_cfg.n_ttbrs = 1 << data->tbl_bits;
 
-	for (i = 0; i < cfg->apple_dart_cfg.n_ttbrs; ++i) {
-		data->pgd[i] = __dart_alloc_pages(DART_GRANULE(data), GFP_KERNEL,
-					   cfg);
-		if (!data->pgd[i])
-			goto out_free_data;
-		cfg->apple_dart_cfg.ttbr[i] = virt_to_phys(data->pgd[i]);
-	}
+	iop->pgd = __dart_alloc_pages(cfg->apple_dart_cfg.n_ttbrs *
+				      DART_GRANULE(data), GFP_KERNEL, cfg);
+	if (!iop->pgd)
+		goto out_free_data;
+
+	for (i = 0; i < cfg->apple_dart_cfg.n_ttbrs; ++i)
+		cfg->apple_dart_cfg.ttbr[i] = virt_to_phys(iop->pgd) +
+					      i * DART_GRANULE(data);
 
-	return &data->iop;
+	iop->ops = &data->iop.ops;
+	return 0;
 
 out_free_data:
-	while (--i >= 0)
-		free_pages((unsigned long)data->pgd[i],
-			   get_order(DART_GRANULE(data)));
 	kfree(data);
-	return NULL;
+	return -ENOMEM;
 }
 
 static void apple_dart_free_pgtable(struct io_pgtable *iop)
 {
-	struct dart_io_pgtable *data = io_pgtable_to_data(iop);
+	struct dart_io_pgtable *data = io_pgtable_ops_to_data(iop->ops);
+	size_t n_ttbrs = 1 << data->tbl_bits;
 	dart_iopte *ptep, *end;
 	int i;
 
-	for (i = 0; i < (1 << data->tbl_bits) && data->pgd[i]; ++i) {
-		ptep = data->pgd[i];
+	for (i = 0; i < n_ttbrs; ++i) {
+		ptep = iop->pgd + DART_GRANULE(data) * i;
 		end = (void *)ptep + DART_GRANULE(data);
 
 		while (ptep != end) {
@@ -456,10 +456,9 @@ static void apple_dart_free_pgtable(struct io_pgtable *iop)
 				free_pages(page, get_order(DART_GRANULE(data)));
 			}
 		}
-		free_pages((unsigned long)data->pgd[i],
-			   get_order(DART_GRANULE(data)));
 	}
-
+	free_pages((unsigned long)iop->pgd,
+		   get_order(DART_GRANULE(data) * n_ttbrs));
 	kfree(data);
 }
 
diff --git a/drivers/iommu/io-pgtable.c b/drivers/iommu/io-pgtable.c
index 2aba691db1da..acc6802b2f50 100644
--- a/drivers/iommu/io-pgtable.c
+++ b/drivers/iommu/io-pgtable.c
@@ -34,27 +34,30 @@ io_pgtable_init_table[IO_PGTABLE_NUM_FMTS] = {
 #endif
 };
 
-struct io_pgtable_ops *alloc_io_pgtable_ops(struct io_pgtable_cfg *cfg,
-					    void *cookie)
+int alloc_io_pgtable_ops(struct io_pgtable *iop, struct io_pgtable_cfg *cfg,
+			 void *cookie)
 {
-	struct io_pgtable *iop;
+	int ret;
+	struct io_pgtable_params *params;
 	const struct io_pgtable_init_fns *fns;
 
 	if (cfg->fmt >= IO_PGTABLE_NUM_FMTS)
-		return NULL;
+		return -EINVAL;
 
 	fns = io_pgtable_init_table[cfg->fmt];
 	if (!fns)
-		return NULL;
+		return -EINVAL;
 
-	iop = fns->alloc(cfg, cookie);
-	if (!iop)
-		return NULL;
+	ret = fns->alloc(iop, cfg, cookie);
+	if (ret)
+		return ret;
+
+	params = io_pgtable_ops_to_params(iop->ops);
 
 	iop->cookie	= cookie;
-	iop->cfg	= *cfg;
+	params->cfg	= *cfg;
 
-	return &iop->ops;
+	return 0;
 }
 EXPORT_SYMBOL_GPL(alloc_io_pgtable_ops);
 
@@ -62,16 +65,17 @@ EXPORT_SYMBOL_GPL(alloc_io_pgtable_ops);
  * It is the IOMMU driver's responsibility to ensure that the page table
  * is no longer accessible to the walker by this point.
  */
-void free_io_pgtable_ops(struct io_pgtable_ops *ops)
+void free_io_pgtable_ops(struct io_pgtable *iop)
 {
-	struct io_pgtable *iop;
+	struct io_pgtable_params *params;
 
-	if (!ops)
+	if (!iop)
 		return;
 
-	iop = io_pgtable_ops_to_pgtable(ops);
-	io_pgtable_tlb_flush_all(iop);
-	io_pgtable_init_table[iop->cfg.fmt]->free(iop);
+	params = io_pgtable_ops_to_params(iop->ops);
+	io_pgtable_tlb_flush_all(&params->cfg, iop);
+	io_pgtable_init_table[params->cfg.fmt]->free(iop);
+	memset(iop, 0, sizeof(*iop));
 }
 EXPORT_SYMBOL_GPL(free_io_pgtable_ops);
 
diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index 4a1927489635..3ff21e6bf939 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -73,7 +73,7 @@ struct ipmmu_vmsa_domain {
 	struct iommu_domain io_domain;
 
 	struct io_pgtable_cfg cfg;
-	struct io_pgtable_ops *iop;
+	struct io_pgtable iop;
 
 	unsigned int context_id;
 	struct mutex mutex;			/* Protects mappings */
@@ -458,11 +458,11 @@ static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
 
 	domain->context_id = ret;
 
-	domain->iop = alloc_io_pgtable_ops(&domain->cfg, domain);
-	if (!domain->iop) {
+	ret = alloc_io_pgtable_ops(&domain->iop, &domain->cfg, domain);
+	if (ret) {
 		ipmmu_domain_free_context(domain->mmu->root,
 					  domain->context_id);
-		return -EINVAL;
+		return ret;
 	}
 
 	ipmmu_domain_setup_context(domain);
@@ -592,7 +592,7 @@ static void ipmmu_domain_free(struct iommu_domain *io_domain)
 	 * been detached.
 	 */
 	ipmmu_domain_destroy_context(domain);
-	free_io_pgtable_ops(domain->iop);
+	free_io_pgtable_ops(&domain->iop);
 	kfree(domain);
 }
 
@@ -664,8 +664,8 @@ static int ipmmu_map(struct iommu_domain *io_domain, unsigned long iova,
 {
 	struct ipmmu_vmsa_domain *domain = to_vmsa_domain(io_domain);
 
-	return domain->iop->map_pages(domain->iop, iova, paddr, pgsize, pgcount,
-				      prot, gfp, mapped);
+	return iopt_map_pages(&domain->iop, iova, paddr, pgsize, pgcount, prot,
+			      gfp, mapped);
 }
 
 static size_t ipmmu_unmap(struct iommu_domain *io_domain, unsigned long iova,
@@ -674,7 +674,7 @@ static size_t ipmmu_unmap(struct iommu_domain *io_domain, unsigned long iova,
 {
 	struct ipmmu_vmsa_domain *domain = to_vmsa_domain(io_domain);
 
-	return domain->iop->unmap_pages(domain->iop, iova, pgsize, pgcount, gather);
+	return iopt_unmap_pages(&domain->iop, iova, pgsize, pgcount, gather);
 }
 
 static void ipmmu_flush_iotlb_all(struct iommu_domain *io_domain)
@@ -698,7 +698,7 @@ static phys_addr_t ipmmu_iova_to_phys(struct iommu_domain *io_domain,
 
 	/* TODO: Is locking needed ? */
 
-	return domain->iop->iova_to_phys(domain->iop, iova);
+	return iopt_iova_to_phys(&domain->iop, iova);
 }
 
 static int ipmmu_init_platform_device(struct device *dev,
diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
index 2c05a84ec1bf..6dae6743e11b 100644
--- a/drivers/iommu/msm_iommu.c
+++ b/drivers/iommu/msm_iommu.c
@@ -41,7 +41,7 @@ struct msm_priv {
 	struct list_head list_attached;
 	struct iommu_domain domain;
 	struct io_pgtable_cfg	cfg;
-	struct io_pgtable_ops	*iop;
+	struct io_pgtable	iop;
 	struct device		*dev;
 	spinlock_t		pgtlock; /* pagetable lock */
 };
@@ -339,6 +339,7 @@ static void msm_iommu_domain_free(struct iommu_domain *domain)
 
 static int msm_iommu_domain_config(struct msm_priv *priv)
 {
+	int ret;
 	spin_lock_init(&priv->pgtlock);
 
 	priv->cfg = (struct io_pgtable_cfg) {
@@ -350,10 +351,10 @@ static int msm_iommu_domain_config(struct msm_priv *priv)
 		.iommu_dev = priv->dev,
 	};
 
-	priv->iop = alloc_io_pgtable_ops(&priv->cfg, priv);
-	if (!priv->iop) {
+	ret = alloc_io_pgtable_ops(&priv->iop, &priv->cfg, priv);
+	if (ret) {
 		dev_err(priv->dev, "Failed to allocate pgtable\n");
-		return -EINVAL;
+		return ret;
 	}
 
 	msm_iommu_ops.pgsize_bitmap = priv->cfg.pgsize_bitmap;
@@ -453,7 +454,7 @@ static void msm_iommu_detach_dev(struct iommu_domain *domain,
 	struct msm_iommu_ctx_dev *master;
 	int ret;
 
-	free_io_pgtable_ops(priv->iop);
+	free_io_pgtable_ops(&priv->iop);
 
 	spin_lock_irqsave(&msm_iommu_lock, flags);
 	list_for_each_entry(iommu, &priv->list_attached, dom_node) {
@@ -480,8 +481,8 @@ static int msm_iommu_map(struct iommu_domain *domain, unsigned long iova,
 	int ret;
 
 	spin_lock_irqsave(&priv->pgtlock, flags);
-	ret = priv->iop->map_pages(priv->iop, iova, pa, pgsize, pgcount, prot,
-				   GFP_ATOMIC, mapped);
+	ret = iopt_map_pages(&priv->iop, iova, pa, pgsize, pgcount, prot,
+			     GFP_ATOMIC, mapped);
 	spin_unlock_irqrestore(&priv->pgtlock, flags);
 
 	return ret;
@@ -504,7 +505,7 @@ static size_t msm_iommu_unmap(struct iommu_domain *domain, unsigned long iova,
 	size_t ret;
 
 	spin_lock_irqsave(&priv->pgtlock, flags);
-	ret = priv->iop->unmap_pages(priv->iop, iova, pgsize, pgcount, gather);
+	ret = iopt_unmap_pages(&priv->iop, iova, pgsize, pgcount, gather);
 	spin_unlock_irqrestore(&priv->pgtlock, flags);
 
 	return ret;
diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 0d754d94ae52..615d9ade575e 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -244,7 +244,7 @@ struct mtk_iommu_data {
 
 struct mtk_iommu_domain {
 	struct io_pgtable_cfg		cfg;
-	struct io_pgtable_ops		*iop;
+	struct io_pgtable		iop;
 
 	struct mtk_iommu_bank_data	*bank;
 	struct iommu_domain		domain;
@@ -587,6 +587,7 @@ static int mtk_iommu_domain_finalise(struct mtk_iommu_domain *dom,
 {
 	const struct mtk_iommu_iova_region *region;
 	struct mtk_iommu_domain	*m4u_dom;
+	int ret;
 
 	/* Always use bank0 in sharing pgtable case */
 	m4u_dom = data->bank[0].m4u_dom;
@@ -615,8 +616,8 @@ static int mtk_iommu_domain_finalise(struct mtk_iommu_domain *dom,
 	else
 		dom->cfg.oas = 35;
 
-	dom->iop = alloc_io_pgtable_ops(&dom->cfg, data);
-	if (!dom->iop) {
+	ret = alloc_io_pgtable_ops(&dom->iop, &dom->cfg, data);
+	if (ret) {
 		dev_err(data->dev, "Failed to alloc io pgtable\n");
 		return -ENOMEM;
 	}
@@ -730,7 +731,7 @@ static int mtk_iommu_map(struct iommu_domain *domain, unsigned long iova,
 		paddr |= BIT_ULL(32);
 
 	/* Synchronize with the tlb_lock */
-	return dom->iop->map_pages(dom->iop, iova, paddr, pgsize, pgcount, prot, gfp, mapped);
+	return iopt_map_pages(&dom->iop, iova, paddr, pgsize, pgcount, prot, gfp, mapped);
 }
 
 static size_t mtk_iommu_unmap(struct iommu_domain *domain,
@@ -740,7 +741,7 @@ static size_t mtk_iommu_unmap(struct iommu_domain *domain,
 	struct mtk_iommu_domain *dom = to_mtk_domain(domain);
 
 	iommu_iotlb_gather_add_range(gather, iova, pgsize * pgcount);
-	return dom->iop->unmap_pages(dom->iop, iova, pgsize, pgcount, gather);
+	return iopt_unmap_pages(&dom->iop, iova, pgsize, pgcount, gather);
 }
 
 static void mtk_iommu_flush_iotlb_all(struct iommu_domain *domain)
@@ -773,7 +774,7 @@ static phys_addr_t mtk_iommu_iova_to_phys(struct iommu_domain *domain,
 	struct mtk_iommu_domain *dom = to_mtk_domain(domain);
 	phys_addr_t pa;
 
-	pa = dom->iop->iova_to_phys(dom->iop, iova);
+	pa = iopt_iova_to_phys(&dom->iop, iova);
 	if (IS_ENABLED(CONFIG_PHYS_ADDR_T_64BIT) &&
 	    dom->bank->parent_data->enable_4GB &&
 	    pa >= MTK_IOMMU_4GB_MODE_REMAP_BASE)
-- 
2.39.0




* [RFC PATCH 06/45] iommu/io-pgtable-arm: Extend __arm_lpae_free_pgtable() to only free child tables
  2023-02-01 12:52 ` Jean-Philippe Brucker
@ 2023-02-01 12:52   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:52 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

The hypervisor side of io-pgtable-arm needs to free the top-level page
table separately from the other tables (which are page-sized and will
use a page queue).
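
A rough sketch of how a caller might use the new argument (illustrative
only; the function below is not part of this patch, and the hypervisor
user that actually needs it lands later in the series):

/*
 * Free everything below the top-level table, but keep the pgd
 * allocation itself so it can be reclaimed through a separate path
 * (for the hypervisor, a page queue).
 */
static void example_free_child_tables(struct arm_lpae_io_pgtable *data,
                                      struct io_pgtable *iop)
{
        /* only_children == true: child tables are freed, the pgd is not */
        __arm_lpae_free_pgtable(data, data->start_level, iop->pgd, true);

        /* iop->pgd is still valid here and can be handed back separately */
}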

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 include/linux/io-pgtable-arm.h        |  2 +-
 drivers/iommu/io-pgtable-arm-common.c | 11 +++++++----
 drivers/iommu/io-pgtable-arm.c        |  2 +-
 3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/include/linux/io-pgtable-arm.h b/include/linux/io-pgtable-arm.h
index 5199bd9851b6..2b3e69386d08 100644
--- a/include/linux/io-pgtable-arm.h
+++ b/include/linux/io-pgtable-arm.h
@@ -166,7 +166,7 @@ static inline bool iopte_leaf(arm_lpae_iopte pte, int lvl,
 
 /* Generic functions */
 void __arm_lpae_free_pgtable(struct arm_lpae_io_pgtable *data, int lvl,
-			     arm_lpae_iopte *ptep);
+			     arm_lpae_iopte *ptep, bool only_children);
 
 int arm_lpae_init_pgtable(struct io_pgtable_cfg *cfg,
 			  struct arm_lpae_io_pgtable *data);
diff --git a/drivers/iommu/io-pgtable-arm-common.c b/drivers/iommu/io-pgtable-arm-common.c
index 359086cace34..009c35d4095f 100644
--- a/drivers/iommu/io-pgtable-arm-common.c
+++ b/drivers/iommu/io-pgtable-arm-common.c
@@ -299,7 +299,7 @@ int arm_lpae_map_pages(struct io_pgtable *iop, unsigned long iova,
 }
 
 void __arm_lpae_free_pgtable(struct arm_lpae_io_pgtable *data, int lvl,
-			     arm_lpae_iopte *ptep)
+			     arm_lpae_iopte *ptep, bool only_children)
 {
 	arm_lpae_iopte *start, *end;
 	unsigned long table_size;
@@ -323,10 +323,12 @@ void __arm_lpae_free_pgtable(struct arm_lpae_io_pgtable *data, int lvl,
 		if (!pte || iopte_leaf(pte, lvl, data->iop.cfg.fmt))
 			continue;
 
-		__arm_lpae_free_pgtable(data, lvl + 1, iopte_deref(pte, data));
+		__arm_lpae_free_pgtable(data, lvl + 1, iopte_deref(pte, data),
+					false);
 	}
 
-	__arm_lpae_free_pages(start, table_size, &data->iop.cfg);
+	if (!only_children)
+		__arm_lpae_free_pages(start, table_size, &data->iop.cfg);
 }
 
 static size_t arm_lpae_split_blk_unmap(struct io_pgtable *iop,
@@ -428,7 +430,8 @@ static size_t __arm_lpae_unmap(struct io_pgtable *iop,
 				/* Also flush any partial walks */
 				io_pgtable_tlb_flush_walk(cfg, iop, iova + i * size, size,
 							  ARM_LPAE_GRANULE(data));
-				__arm_lpae_free_pgtable(data, lvl + 1, iopte_deref(pte, data));
+				__arm_lpae_free_pgtable(data, lvl + 1, iopte_deref(pte, data),
+							false);
 			} else if (!iommu_iotlb_gather_queued(gather)) {
 				io_pgtable_tlb_add_page(cfg, iop, gather,
 							iova + i * size, size);
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index bee8980c89eb..b7920637126c 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -84,7 +84,7 @@ static void arm_lpae_free_pgtable(struct io_pgtable *iop)
 {
 	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(iop->ops);
 
-	__arm_lpae_free_pgtable(data, data->start_level, iop->pgd);
+	__arm_lpae_free_pgtable(data, data->start_level, iop->pgd, false);
 	kfree(data);
 }
 
-- 
2.39.0


* [RFC PATCH 07/45] iommu/arm-smmu-v3: Move some definitions to arm64 include/
  2023-02-01 12:52 ` Jean-Philippe Brucker
@ 2023-02-01 12:52   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:52 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

So that the KVM SMMUv3 driver can re-use architectural definitions,
command structures and feature bits, move them to the arm64 include/
directory.
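
Purely as an illustration of what the shared header provides (the helper
below is made up for this example and is not added by the series), a
KVM-side user could assemble command queue entries with nothing but
these definitions:

#include <linux/types.h>
#include <asm/arm-smmu-v3-regs.h>

/* Sketch: build the two command words of a CFGI_STE invalidation */
static void example_build_cfgi_ste(u64 cmd[CMDQ_ENT_DWORDS], u32 sid,
                                   bool leaf)
{
        cmd[0] = FIELD_PREP(CMDQ_0_OP, CMDQ_OP_CFGI_STE) |
                 FIELD_PREP(CMDQ_CFGI_0_SID, sid);
        cmd[1] = FIELD_PREP(CMDQ_CFGI_1_LEAF, leaf);
}

The host driver keeps its existing arm_smmu_cmdq_build_cmd(); the point
is only that such code no longer needs anything from drivers/iommu/.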

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 arch/arm64/include/asm/arm-smmu-v3-regs.h   | 479 +++++++++++++++++++
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 481 +-------------------
 2 files changed, 485 insertions(+), 475 deletions(-)
 create mode 100644 arch/arm64/include/asm/arm-smmu-v3-regs.h

diff --git a/arch/arm64/include/asm/arm-smmu-v3-regs.h b/arch/arm64/include/asm/arm-smmu-v3-regs.h
new file mode 100644
index 000000000000..646a734f2554
--- /dev/null
+++ b/arch/arm64/include/asm/arm-smmu-v3-regs.h
@@ -0,0 +1,479 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef _ARM_SMMU_V3_REGS_H
+#define _ARM_SMMU_V3_REGS_H
+
+#include <linux/bitfield.h>
+
+/* MMIO registers */
+#define ARM_SMMU_IDR0			0x0
+#define IDR0_ST_LVL			GENMASK(28, 27)
+#define IDR0_ST_LVL_2LVL		1
+#define IDR0_STALL_MODEL		GENMASK(25, 24)
+#define IDR0_STALL_MODEL_STALL		0
+#define IDR0_STALL_MODEL_FORCE		2
+#define IDR0_TTENDIAN			GENMASK(22, 21)
+#define IDR0_TTENDIAN_MIXED		0
+#define IDR0_TTENDIAN_LE		2
+#define IDR0_TTENDIAN_BE		3
+#define IDR0_CD2L			(1 << 19)
+#define IDR0_VMID16			(1 << 18)
+#define IDR0_PRI			(1 << 16)
+#define IDR0_SEV			(1 << 14)
+#define IDR0_MSI			(1 << 13)
+#define IDR0_ASID16			(1 << 12)
+#define IDR0_ATS			(1 << 10)
+#define IDR0_HYP			(1 << 9)
+#define IDR0_COHACC			(1 << 4)
+#define IDR0_TTF			GENMASK(3, 2)
+#define IDR0_TTF_AARCH64		2
+#define IDR0_TTF_AARCH32_64		3
+#define IDR0_S1P			(1 << 1)
+#define IDR0_S2P			(1 << 0)
+
+#define ARM_SMMU_IDR1			0x4
+#define IDR1_TABLES_PRESET		(1 << 30)
+#define IDR1_QUEUES_PRESET		(1 << 29)
+#define IDR1_REL			(1 << 28)
+#define IDR1_CMDQS			GENMASK(25, 21)
+#define IDR1_EVTQS			GENMASK(20, 16)
+#define IDR1_PRIQS			GENMASK(15, 11)
+#define IDR1_SSIDSIZE			GENMASK(10, 6)
+#define IDR1_SIDSIZE			GENMASK(5, 0)
+
+#define ARM_SMMU_IDR3			0xc
+#define IDR3_RIL			(1 << 10)
+
+#define ARM_SMMU_IDR5			0x14
+#define IDR5_STALL_MAX			GENMASK(31, 16)
+#define IDR5_GRAN64K			(1 << 6)
+#define IDR5_GRAN16K			(1 << 5)
+#define IDR5_GRAN4K			(1 << 4)
+#define IDR5_OAS			GENMASK(2, 0)
+#define IDR5_OAS_32_BIT			0
+#define IDR5_OAS_36_BIT			1
+#define IDR5_OAS_40_BIT			2
+#define IDR5_OAS_42_BIT			3
+#define IDR5_OAS_44_BIT			4
+#define IDR5_OAS_48_BIT			5
+#define IDR5_OAS_52_BIT			6
+#define IDR5_VAX			GENMASK(11, 10)
+#define IDR5_VAX_52_BIT			1
+
+#define ARM_SMMU_CR0			0x20
+#define CR0_ATSCHK			(1 << 4)
+#define CR0_CMDQEN			(1 << 3)
+#define CR0_EVTQEN			(1 << 2)
+#define CR0_PRIQEN			(1 << 1)
+#define CR0_SMMUEN			(1 << 0)
+
+#define ARM_SMMU_CR0ACK			0x24
+
+#define ARM_SMMU_CR1			0x28
+#define CR1_TABLE_SH			GENMASK(11, 10)
+#define CR1_TABLE_OC			GENMASK(9, 8)
+#define CR1_TABLE_IC			GENMASK(7, 6)
+#define CR1_QUEUE_SH			GENMASK(5, 4)
+#define CR1_QUEUE_OC			GENMASK(3, 2)
+#define CR1_QUEUE_IC			GENMASK(1, 0)
+/* CR1 cacheability fields don't quite follow the usual TCR-style encoding */
+#define CR1_CACHE_NC			0
+#define CR1_CACHE_WB			1
+#define CR1_CACHE_WT			2
+
+#define ARM_SMMU_CR2			0x2c
+#define CR2_PTM				(1 << 2)
+#define CR2_RECINVSID			(1 << 1)
+#define CR2_E2H				(1 << 0)
+
+#define ARM_SMMU_GBPA			0x44
+#define GBPA_UPDATE			(1 << 31)
+#define GBPA_ABORT			(1 << 20)
+
+#define ARM_SMMU_IRQ_CTRL		0x50
+#define IRQ_CTRL_EVTQ_IRQEN		(1 << 2)
+#define IRQ_CTRL_PRIQ_IRQEN		(1 << 1)
+#define IRQ_CTRL_GERROR_IRQEN		(1 << 0)
+
+#define ARM_SMMU_IRQ_CTRLACK		0x54
+
+#define ARM_SMMU_GERROR			0x60
+#define GERROR_SFM_ERR			(1 << 8)
+#define GERROR_MSI_GERROR_ABT_ERR	(1 << 7)
+#define GERROR_MSI_PRIQ_ABT_ERR		(1 << 6)
+#define GERROR_MSI_EVTQ_ABT_ERR		(1 << 5)
+#define GERROR_MSI_CMDQ_ABT_ERR		(1 << 4)
+#define GERROR_PRIQ_ABT_ERR		(1 << 3)
+#define GERROR_EVTQ_ABT_ERR		(1 << 2)
+#define GERROR_CMDQ_ERR			(1 << 0)
+#define GERROR_ERR_MASK			0x1fd
+
+#define ARM_SMMU_GERRORN		0x64
+
+#define ARM_SMMU_GERROR_IRQ_CFG0	0x68
+#define ARM_SMMU_GERROR_IRQ_CFG1	0x70
+#define ARM_SMMU_GERROR_IRQ_CFG2	0x74
+
+#define ARM_SMMU_STRTAB_BASE		0x80
+#define STRTAB_BASE_RA			(1UL << 62)
+#define STRTAB_BASE_ADDR_MASK		GENMASK_ULL(51, 6)
+
+#define ARM_SMMU_STRTAB_BASE_CFG	0x88
+#define STRTAB_BASE_CFG_FMT		GENMASK(17, 16)
+#define STRTAB_BASE_CFG_FMT_LINEAR	0
+#define STRTAB_BASE_CFG_FMT_2LVL	1
+#define STRTAB_BASE_CFG_SPLIT		GENMASK(10, 6)
+#define STRTAB_BASE_CFG_LOG2SIZE	GENMASK(5, 0)
+
+#define Q_BASE_RWA			(1UL << 62)
+#define Q_BASE_ADDR_MASK		GENMASK_ULL(51, 5)
+#define Q_BASE_LOG2SIZE			GENMASK(4, 0)
+
+#define ARM_SMMU_CMDQ_BASE		0x90
+#define ARM_SMMU_CMDQ_PROD		0x98
+#define ARM_SMMU_CMDQ_CONS		0x9c
+
+#define ARM_SMMU_EVTQ_BASE		0xa0
+#define ARM_SMMU_EVTQ_PROD		0xa8
+#define ARM_SMMU_EVTQ_CONS		0xac
+#define ARM_SMMU_EVTQ_IRQ_CFG0		0xb0
+#define ARM_SMMU_EVTQ_IRQ_CFG1		0xb8
+#define ARM_SMMU_EVTQ_IRQ_CFG2		0xbc
+
+#define ARM_SMMU_PRIQ_BASE		0xc0
+#define ARM_SMMU_PRIQ_PROD		0xc8
+#define ARM_SMMU_PRIQ_CONS		0xcc
+#define ARM_SMMU_PRIQ_IRQ_CFG0		0xd0
+#define ARM_SMMU_PRIQ_IRQ_CFG1		0xd8
+#define ARM_SMMU_PRIQ_IRQ_CFG2		0xdc
+
+#define ARM_SMMU_REG_SZ			0xe00
+
+/* Common MSI config fields */
+#define MSI_CFG0_ADDR_MASK		GENMASK_ULL(51, 2)
+#define MSI_CFG2_SH			GENMASK(5, 4)
+#define MSI_CFG2_MEMATTR		GENMASK(3, 0)
+
+/* Common memory attribute values */
+#define ARM_SMMU_SH_NSH			0
+#define ARM_SMMU_SH_OSH			2
+#define ARM_SMMU_SH_ISH			3
+#define ARM_SMMU_MEMATTR_DEVICE_nGnRE	0x1
+#define ARM_SMMU_MEMATTR_OIWB		0xf
+
+/*
+ * Stream table.
+ *
+ * Linear: Enough to cover 1 << IDR1.SIDSIZE entries
+ * 2lvl: 128k L1 entries,
+ *       256 lazy entries per table (each table covers a PCI bus)
+ */
+#define STRTAB_L1_SZ_SHIFT		20
+#define STRTAB_SPLIT			8
+
+#define STRTAB_L1_DESC_DWORDS		1
+#define STRTAB_L1_DESC_SPAN		GENMASK_ULL(4, 0)
+#define STRTAB_L1_DESC_L2PTR_MASK	GENMASK_ULL(51, 6)
+
+#define STRTAB_STE_DWORDS		8
+#define STRTAB_STE_0_V			(1UL << 0)
+#define STRTAB_STE_0_CFG		GENMASK_ULL(3, 1)
+#define STRTAB_STE_0_CFG_ABORT		0
+#define STRTAB_STE_0_CFG_BYPASS		4
+#define STRTAB_STE_0_CFG_S1_TRANS	5
+#define STRTAB_STE_0_CFG_S2_TRANS	6
+
+#define STRTAB_STE_0_S1FMT		GENMASK_ULL(5, 4)
+#define STRTAB_STE_0_S1FMT_LINEAR	0
+#define STRTAB_STE_0_S1FMT_64K_L2	2
+#define STRTAB_STE_0_S1CTXPTR_MASK	GENMASK_ULL(51, 6)
+#define STRTAB_STE_0_S1CDMAX		GENMASK_ULL(63, 59)
+
+#define STRTAB_STE_1_S1DSS		GENMASK_ULL(1, 0)
+#define STRTAB_STE_1_S1DSS_TERMINATE	0x0
+#define STRTAB_STE_1_S1DSS_BYPASS	0x1
+#define STRTAB_STE_1_S1DSS_SSID0	0x2
+
+#define STRTAB_STE_1_S1C_CACHE_NC	0UL
+#define STRTAB_STE_1_S1C_CACHE_WBRA	1UL
+#define STRTAB_STE_1_S1C_CACHE_WT	2UL
+#define STRTAB_STE_1_S1C_CACHE_WB	3UL
+#define STRTAB_STE_1_S1CIR		GENMASK_ULL(3, 2)
+#define STRTAB_STE_1_S1COR		GENMASK_ULL(5, 4)
+#define STRTAB_STE_1_S1CSH		GENMASK_ULL(7, 6)
+
+#define STRTAB_STE_1_S1STALLD		(1UL << 27)
+
+#define STRTAB_STE_1_EATS		GENMASK_ULL(29, 28)
+#define STRTAB_STE_1_EATS_ABT		0UL
+#define STRTAB_STE_1_EATS_TRANS		1UL
+#define STRTAB_STE_1_EATS_S1CHK		2UL
+
+#define STRTAB_STE_1_STRW		GENMASK_ULL(31, 30)
+#define STRTAB_STE_1_STRW_NSEL1		0UL
+#define STRTAB_STE_1_STRW_EL2		2UL
+
+#define STRTAB_STE_1_SHCFG		GENMASK_ULL(45, 44)
+#define STRTAB_STE_1_SHCFG_INCOMING	1UL
+
+#define STRTAB_STE_2_S2VMID		GENMASK_ULL(15, 0)
+#define STRTAB_STE_2_VTCR		GENMASK_ULL(50, 32)
+#define STRTAB_STE_2_VTCR_S2T0SZ	GENMASK_ULL(5, 0)
+#define STRTAB_STE_2_VTCR_S2SL0		GENMASK_ULL(7, 6)
+#define STRTAB_STE_2_VTCR_S2IR0		GENMASK_ULL(9, 8)
+#define STRTAB_STE_2_VTCR_S2OR0		GENMASK_ULL(11, 10)
+#define STRTAB_STE_2_VTCR_S2SH0		GENMASK_ULL(13, 12)
+#define STRTAB_STE_2_VTCR_S2TG		GENMASK_ULL(15, 14)
+#define STRTAB_STE_2_VTCR_S2PS		GENMASK_ULL(18, 16)
+#define STRTAB_STE_2_S2AA64		(1UL << 51)
+#define STRTAB_STE_2_S2ENDI		(1UL << 52)
+#define STRTAB_STE_2_S2PTW		(1UL << 54)
+#define STRTAB_STE_2_S2R		(1UL << 58)
+
+#define STRTAB_STE_3_S2TTB_MASK		GENMASK_ULL(51, 4)
+
+/*
+ * Context descriptors.
+ *
+ * Linear: when less than 1024 SSIDs are supported
+ * 2lvl: at most 1024 L1 entries,
+ *       1024 lazy entries per table.
+ */
+#define CTXDESC_SPLIT			10
+#define CTXDESC_L2_ENTRIES		(1 << CTXDESC_SPLIT)
+
+#define CTXDESC_L1_DESC_DWORDS		1
+#define CTXDESC_L1_DESC_V		(1UL << 0)
+#define CTXDESC_L1_DESC_L2PTR_MASK	GENMASK_ULL(51, 12)
+
+#define CTXDESC_CD_DWORDS		8
+#define CTXDESC_CD_0_TCR_T0SZ		GENMASK_ULL(5, 0)
+#define CTXDESC_CD_0_TCR_TG0		GENMASK_ULL(7, 6)
+#define CTXDESC_CD_0_TCR_IRGN0		GENMASK_ULL(9, 8)
+#define CTXDESC_CD_0_TCR_ORGN0		GENMASK_ULL(11, 10)
+#define CTXDESC_CD_0_TCR_SH0		GENMASK_ULL(13, 12)
+#define CTXDESC_CD_0_TCR_EPD0		(1ULL << 14)
+#define CTXDESC_CD_0_TCR_EPD1		(1ULL << 30)
+
+#define CTXDESC_CD_0_ENDI		(1UL << 15)
+#define CTXDESC_CD_0_V			(1UL << 31)
+
+#define CTXDESC_CD_0_TCR_IPS		GENMASK_ULL(34, 32)
+#define CTXDESC_CD_0_TCR_TBI0		(1ULL << 38)
+
+#define CTXDESC_CD_0_AA64		(1UL << 41)
+#define CTXDESC_CD_0_S			(1UL << 44)
+#define CTXDESC_CD_0_R			(1UL << 45)
+#define CTXDESC_CD_0_A			(1UL << 46)
+#define CTXDESC_CD_0_ASET		(1UL << 47)
+#define CTXDESC_CD_0_ASID		GENMASK_ULL(63, 48)
+
+#define CTXDESC_CD_1_TTB0_MASK		GENMASK_ULL(51, 4)
+
+/* Command queue */
+#define CMDQ_ENT_SZ_SHIFT		4
+#define CMDQ_ENT_DWORDS			((1 << CMDQ_ENT_SZ_SHIFT) >> 3)
+#define CMDQ_MAX_SZ_SHIFT		(Q_MAX_SZ_SHIFT - CMDQ_ENT_SZ_SHIFT)
+
+#define CMDQ_CONS_ERR			GENMASK(30, 24)
+#define CMDQ_ERR_CERROR_NONE_IDX	0
+#define CMDQ_ERR_CERROR_ILL_IDX		1
+#define CMDQ_ERR_CERROR_ABT_IDX		2
+#define CMDQ_ERR_CERROR_ATC_INV_IDX	3
+
+#define CMDQ_0_OP			GENMASK_ULL(7, 0)
+#define CMDQ_0_SSV			(1UL << 11)
+
+#define CMDQ_PREFETCH_0_SID		GENMASK_ULL(63, 32)
+#define CMDQ_PREFETCH_1_SIZE		GENMASK_ULL(4, 0)
+#define CMDQ_PREFETCH_1_ADDR_MASK	GENMASK_ULL(63, 12)
+
+#define CMDQ_CFGI_0_SSID		GENMASK_ULL(31, 12)
+#define CMDQ_CFGI_0_SID			GENMASK_ULL(63, 32)
+#define CMDQ_CFGI_1_LEAF		(1UL << 0)
+#define CMDQ_CFGI_1_RANGE		GENMASK_ULL(4, 0)
+
+#define CMDQ_TLBI_0_NUM			GENMASK_ULL(16, 12)
+#define CMDQ_TLBI_RANGE_NUM_MAX		31
+#define CMDQ_TLBI_0_SCALE		GENMASK_ULL(24, 20)
+#define CMDQ_TLBI_0_VMID		GENMASK_ULL(47, 32)
+#define CMDQ_TLBI_0_ASID		GENMASK_ULL(63, 48)
+#define CMDQ_TLBI_1_LEAF		(1UL << 0)
+#define CMDQ_TLBI_1_TTL			GENMASK_ULL(9, 8)
+#define CMDQ_TLBI_1_TG			GENMASK_ULL(11, 10)
+#define CMDQ_TLBI_1_VA_MASK		GENMASK_ULL(63, 12)
+#define CMDQ_TLBI_1_IPA_MASK		GENMASK_ULL(51, 12)
+
+#define CMDQ_ATC_0_SSID			GENMASK_ULL(31, 12)
+#define CMDQ_ATC_0_SID			GENMASK_ULL(63, 32)
+#define CMDQ_ATC_0_GLOBAL		(1UL << 9)
+#define CMDQ_ATC_1_SIZE			GENMASK_ULL(5, 0)
+#define CMDQ_ATC_1_ADDR_MASK		GENMASK_ULL(63, 12)
+
+#define CMDQ_PRI_0_SSID			GENMASK_ULL(31, 12)
+#define CMDQ_PRI_0_SID			GENMASK_ULL(63, 32)
+#define CMDQ_PRI_1_GRPID		GENMASK_ULL(8, 0)
+#define CMDQ_PRI_1_RESP			GENMASK_ULL(13, 12)
+
+#define CMDQ_RESUME_0_RESP_TERM		0UL
+#define CMDQ_RESUME_0_RESP_RETRY	1UL
+#define CMDQ_RESUME_0_RESP_ABORT	2UL
+#define CMDQ_RESUME_0_RESP		GENMASK_ULL(13, 12)
+#define CMDQ_RESUME_0_SID		GENMASK_ULL(63, 32)
+#define CMDQ_RESUME_1_STAG		GENMASK_ULL(15, 0)
+
+#define CMDQ_SYNC_0_CS			GENMASK_ULL(13, 12)
+#define CMDQ_SYNC_0_CS_NONE		0
+#define CMDQ_SYNC_0_CS_IRQ		1
+#define CMDQ_SYNC_0_CS_SEV		2
+#define CMDQ_SYNC_0_MSH			GENMASK_ULL(23, 22)
+#define CMDQ_SYNC_0_MSIATTR		GENMASK_ULL(27, 24)
+#define CMDQ_SYNC_0_MSIDATA		GENMASK_ULL(63, 32)
+#define CMDQ_SYNC_1_MSIADDR_MASK	GENMASK_ULL(51, 2)
+
+/* Event queue */
+#define EVTQ_ENT_SZ_SHIFT		5
+#define EVTQ_ENT_DWORDS			((1 << EVTQ_ENT_SZ_SHIFT) >> 3)
+#define EVTQ_MAX_SZ_SHIFT		(Q_MAX_SZ_SHIFT - EVTQ_ENT_SZ_SHIFT)
+
+#define EVTQ_0_ID			GENMASK_ULL(7, 0)
+
+#define EVT_ID_TRANSLATION_FAULT	0x10
+#define EVT_ID_ADDR_SIZE_FAULT		0x11
+#define EVT_ID_ACCESS_FAULT		0x12
+#define EVT_ID_PERMISSION_FAULT		0x13
+
+#define EVTQ_0_SSV			(1UL << 11)
+#define EVTQ_0_SSID			GENMASK_ULL(31, 12)
+#define EVTQ_0_SID			GENMASK_ULL(63, 32)
+#define EVTQ_1_STAG			GENMASK_ULL(15, 0)
+#define EVTQ_1_STALL			(1UL << 31)
+#define EVTQ_1_PnU			(1UL << 33)
+#define EVTQ_1_InD			(1UL << 34)
+#define EVTQ_1_RnW			(1UL << 35)
+#define EVTQ_1_S2			(1UL << 39)
+#define EVTQ_1_CLASS			GENMASK_ULL(41, 40)
+#define EVTQ_1_TT_READ			(1UL << 44)
+#define EVTQ_2_ADDR			GENMASK_ULL(63, 0)
+#define EVTQ_3_IPA			GENMASK_ULL(51, 12)
+
+/* PRI queue */
+#define PRIQ_ENT_SZ_SHIFT		4
+#define PRIQ_ENT_DWORDS			((1 << PRIQ_ENT_SZ_SHIFT) >> 3)
+#define PRIQ_MAX_SZ_SHIFT		(Q_MAX_SZ_SHIFT - PRIQ_ENT_SZ_SHIFT)
+
+#define PRIQ_0_SID			GENMASK_ULL(31, 0)
+#define PRIQ_0_SSID			GENMASK_ULL(51, 32)
+#define PRIQ_0_PERM_PRIV		(1UL << 58)
+#define PRIQ_0_PERM_EXEC		(1UL << 59)
+#define PRIQ_0_PERM_READ		(1UL << 60)
+#define PRIQ_0_PERM_WRITE		(1UL << 61)
+#define PRIQ_0_PRG_LAST			(1UL << 62)
+#define PRIQ_0_SSID_V			(1UL << 63)
+
+#define PRIQ_1_PRG_IDX			GENMASK_ULL(8, 0)
+#define PRIQ_1_ADDR_MASK		GENMASK_ULL(63, 12)
+
+/* Synthesized features */
+#define ARM_SMMU_FEAT_2_LVL_STRTAB	(1 << 0)
+#define ARM_SMMU_FEAT_2_LVL_CDTAB	(1 << 1)
+#define ARM_SMMU_FEAT_TT_LE		(1 << 2)
+#define ARM_SMMU_FEAT_TT_BE		(1 << 3)
+#define ARM_SMMU_FEAT_PRI		(1 << 4)
+#define ARM_SMMU_FEAT_ATS		(1 << 5)
+#define ARM_SMMU_FEAT_SEV		(1 << 6)
+#define ARM_SMMU_FEAT_MSI		(1 << 7)
+#define ARM_SMMU_FEAT_COHERENCY		(1 << 8)
+#define ARM_SMMU_FEAT_TRANS_S1		(1 << 9)
+#define ARM_SMMU_FEAT_TRANS_S2		(1 << 10)
+#define ARM_SMMU_FEAT_STALLS		(1 << 11)
+#define ARM_SMMU_FEAT_HYP		(1 << 12)
+#define ARM_SMMU_FEAT_STALL_FORCE	(1 << 13)
+#define ARM_SMMU_FEAT_VAX		(1 << 14)
+#define ARM_SMMU_FEAT_RANGE_INV		(1 << 15)
+#define ARM_SMMU_FEAT_BTM		(1 << 16)
+#define ARM_SMMU_FEAT_SVA		(1 << 17)
+#define ARM_SMMU_FEAT_E2H		(1 << 18)
+
+enum pri_resp {
+	PRI_RESP_DENY = 0,
+	PRI_RESP_FAIL = 1,
+	PRI_RESP_SUCC = 2,
+};
+
+struct arm_smmu_cmdq_ent {
+	/* Common fields */
+	u8				opcode;
+	bool				substream_valid;
+
+	/* Command-specific fields */
+	union {
+		#define CMDQ_OP_PREFETCH_CFG	0x1
+		struct {
+			u32			sid;
+		} prefetch;
+
+		#define CMDQ_OP_CFGI_STE	0x3
+		#define CMDQ_OP_CFGI_ALL	0x4
+		#define CMDQ_OP_CFGI_CD		0x5
+		#define CMDQ_OP_CFGI_CD_ALL	0x6
+		struct {
+			u32			sid;
+			u32			ssid;
+			union {
+				bool		leaf;
+				u8		span;
+			};
+		} cfgi;
+
+		#define CMDQ_OP_TLBI_NH_ASID	0x11
+		#define CMDQ_OP_TLBI_NH_VA	0x12
+		#define CMDQ_OP_TLBI_EL2_ALL	0x20
+		#define CMDQ_OP_TLBI_EL2_ASID	0x21
+		#define CMDQ_OP_TLBI_EL2_VA	0x22
+		#define CMDQ_OP_TLBI_S12_VMALL	0x28
+		#define CMDQ_OP_TLBI_S2_IPA	0x2a
+		#define CMDQ_OP_TLBI_NSNH_ALL	0x30
+		struct {
+			u8			num;
+			u8			scale;
+			u16			asid;
+			u16			vmid;
+			bool			leaf;
+			u8			ttl;
+			u8			tg;
+			u64			addr;
+		} tlbi;
+
+		#define CMDQ_OP_ATC_INV		0x40
+		#define ATC_INV_SIZE_ALL	52
+		struct {
+			u32			sid;
+			u32			ssid;
+			u64			addr;
+			u8			size;
+			bool			global;
+		} atc;
+
+		#define CMDQ_OP_PRI_RESP	0x41
+		struct {
+			u32			sid;
+			u32			ssid;
+			u16			grpid;
+			enum pri_resp		resp;
+		} pri;
+
+		#define CMDQ_OP_RESUME		0x44
+		struct {
+			u32			sid;
+			u16			stag;
+			u8			resp;
+		} resume;
+
+		#define CMDQ_OP_CMD_SYNC	0x46
+		struct {
+			u64			msiaddr;
+		} sync;
+	};
+};
+
+#endif /* _ARM_SMMU_V3_REGS_H */
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index cec3c8103404..32ce835ab4eb 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -8,164 +8,13 @@
 #ifndef _ARM_SMMU_V3_H
 #define _ARM_SMMU_V3_H
 
-#include <linux/bitfield.h>
 #include <linux/iommu.h>
 #include <linux/io-pgtable.h>
 #include <linux/kernel.h>
 #include <linux/mmzone.h>
 #include <linux/sizes.h>
 
-/* MMIO registers */
-#define ARM_SMMU_IDR0			0x0
-#define IDR0_ST_LVL			GENMASK(28, 27)
-#define IDR0_ST_LVL_2LVL		1
-#define IDR0_STALL_MODEL		GENMASK(25, 24)
-#define IDR0_STALL_MODEL_STALL		0
-#define IDR0_STALL_MODEL_FORCE		2
-#define IDR0_TTENDIAN			GENMASK(22, 21)
-#define IDR0_TTENDIAN_MIXED		0
-#define IDR0_TTENDIAN_LE		2
-#define IDR0_TTENDIAN_BE		3
-#define IDR0_CD2L			(1 << 19)
-#define IDR0_VMID16			(1 << 18)
-#define IDR0_PRI			(1 << 16)
-#define IDR0_SEV			(1 << 14)
-#define IDR0_MSI			(1 << 13)
-#define IDR0_ASID16			(1 << 12)
-#define IDR0_ATS			(1 << 10)
-#define IDR0_HYP			(1 << 9)
-#define IDR0_COHACC			(1 << 4)
-#define IDR0_TTF			GENMASK(3, 2)
-#define IDR0_TTF_AARCH64		2
-#define IDR0_TTF_AARCH32_64		3
-#define IDR0_S1P			(1 << 1)
-#define IDR0_S2P			(1 << 0)
-
-#define ARM_SMMU_IDR1			0x4
-#define IDR1_TABLES_PRESET		(1 << 30)
-#define IDR1_QUEUES_PRESET		(1 << 29)
-#define IDR1_REL			(1 << 28)
-#define IDR1_CMDQS			GENMASK(25, 21)
-#define IDR1_EVTQS			GENMASK(20, 16)
-#define IDR1_PRIQS			GENMASK(15, 11)
-#define IDR1_SSIDSIZE			GENMASK(10, 6)
-#define IDR1_SIDSIZE			GENMASK(5, 0)
-
-#define ARM_SMMU_IDR3			0xc
-#define IDR3_RIL			(1 << 10)
-
-#define ARM_SMMU_IDR5			0x14
-#define IDR5_STALL_MAX			GENMASK(31, 16)
-#define IDR5_GRAN64K			(1 << 6)
-#define IDR5_GRAN16K			(1 << 5)
-#define IDR5_GRAN4K			(1 << 4)
-#define IDR5_OAS			GENMASK(2, 0)
-#define IDR5_OAS_32_BIT			0
-#define IDR5_OAS_36_BIT			1
-#define IDR5_OAS_40_BIT			2
-#define IDR5_OAS_42_BIT			3
-#define IDR5_OAS_44_BIT			4
-#define IDR5_OAS_48_BIT			5
-#define IDR5_OAS_52_BIT			6
-#define IDR5_VAX			GENMASK(11, 10)
-#define IDR5_VAX_52_BIT			1
-
-#define ARM_SMMU_CR0			0x20
-#define CR0_ATSCHK			(1 << 4)
-#define CR0_CMDQEN			(1 << 3)
-#define CR0_EVTQEN			(1 << 2)
-#define CR0_PRIQEN			(1 << 1)
-#define CR0_SMMUEN			(1 << 0)
-
-#define ARM_SMMU_CR0ACK			0x24
-
-#define ARM_SMMU_CR1			0x28
-#define CR1_TABLE_SH			GENMASK(11, 10)
-#define CR1_TABLE_OC			GENMASK(9, 8)
-#define CR1_TABLE_IC			GENMASK(7, 6)
-#define CR1_QUEUE_SH			GENMASK(5, 4)
-#define CR1_QUEUE_OC			GENMASK(3, 2)
-#define CR1_QUEUE_IC			GENMASK(1, 0)
-/* CR1 cacheability fields don't quite follow the usual TCR-style encoding */
-#define CR1_CACHE_NC			0
-#define CR1_CACHE_WB			1
-#define CR1_CACHE_WT			2
-
-#define ARM_SMMU_CR2			0x2c
-#define CR2_PTM				(1 << 2)
-#define CR2_RECINVSID			(1 << 1)
-#define CR2_E2H				(1 << 0)
-
-#define ARM_SMMU_GBPA			0x44
-#define GBPA_UPDATE			(1 << 31)
-#define GBPA_ABORT			(1 << 20)
-
-#define ARM_SMMU_IRQ_CTRL		0x50
-#define IRQ_CTRL_EVTQ_IRQEN		(1 << 2)
-#define IRQ_CTRL_PRIQ_IRQEN		(1 << 1)
-#define IRQ_CTRL_GERROR_IRQEN		(1 << 0)
-
-#define ARM_SMMU_IRQ_CTRLACK		0x54
-
-#define ARM_SMMU_GERROR			0x60
-#define GERROR_SFM_ERR			(1 << 8)
-#define GERROR_MSI_GERROR_ABT_ERR	(1 << 7)
-#define GERROR_MSI_PRIQ_ABT_ERR		(1 << 6)
-#define GERROR_MSI_EVTQ_ABT_ERR		(1 << 5)
-#define GERROR_MSI_CMDQ_ABT_ERR		(1 << 4)
-#define GERROR_PRIQ_ABT_ERR		(1 << 3)
-#define GERROR_EVTQ_ABT_ERR		(1 << 2)
-#define GERROR_CMDQ_ERR			(1 << 0)
-#define GERROR_ERR_MASK			0x1fd
-
-#define ARM_SMMU_GERRORN		0x64
-
-#define ARM_SMMU_GERROR_IRQ_CFG0	0x68
-#define ARM_SMMU_GERROR_IRQ_CFG1	0x70
-#define ARM_SMMU_GERROR_IRQ_CFG2	0x74
-
-#define ARM_SMMU_STRTAB_BASE		0x80
-#define STRTAB_BASE_RA			(1UL << 62)
-#define STRTAB_BASE_ADDR_MASK		GENMASK_ULL(51, 6)
-
-#define ARM_SMMU_STRTAB_BASE_CFG	0x88
-#define STRTAB_BASE_CFG_FMT		GENMASK(17, 16)
-#define STRTAB_BASE_CFG_FMT_LINEAR	0
-#define STRTAB_BASE_CFG_FMT_2LVL	1
-#define STRTAB_BASE_CFG_SPLIT		GENMASK(10, 6)
-#define STRTAB_BASE_CFG_LOG2SIZE	GENMASK(5, 0)
-
-#define ARM_SMMU_CMDQ_BASE		0x90
-#define ARM_SMMU_CMDQ_PROD		0x98
-#define ARM_SMMU_CMDQ_CONS		0x9c
-
-#define ARM_SMMU_EVTQ_BASE		0xa0
-#define ARM_SMMU_EVTQ_PROD		0xa8
-#define ARM_SMMU_EVTQ_CONS		0xac
-#define ARM_SMMU_EVTQ_IRQ_CFG0		0xb0
-#define ARM_SMMU_EVTQ_IRQ_CFG1		0xb8
-#define ARM_SMMU_EVTQ_IRQ_CFG2		0xbc
-
-#define ARM_SMMU_PRIQ_BASE		0xc0
-#define ARM_SMMU_PRIQ_PROD		0xc8
-#define ARM_SMMU_PRIQ_CONS		0xcc
-#define ARM_SMMU_PRIQ_IRQ_CFG0		0xd0
-#define ARM_SMMU_PRIQ_IRQ_CFG1		0xd8
-#define ARM_SMMU_PRIQ_IRQ_CFG2		0xdc
-
-#define ARM_SMMU_REG_SZ			0xe00
-
-/* Common MSI config fields */
-#define MSI_CFG0_ADDR_MASK		GENMASK_ULL(51, 2)
-#define MSI_CFG2_SH			GENMASK(5, 4)
-#define MSI_CFG2_MEMATTR		GENMASK(3, 0)
-
-/* Common memory attribute values */
-#define ARM_SMMU_SH_NSH			0
-#define ARM_SMMU_SH_OSH			2
-#define ARM_SMMU_SH_ISH			3
-#define ARM_SMMU_MEMATTR_DEVICE_nGnRE	0x1
-#define ARM_SMMU_MEMATTR_OIWB		0xf
+#include <asm/arm-smmu-v3-regs.h>
 
 #define Q_IDX(llq, p)			((p) & ((1 << (llq)->max_n_shift) - 1))
 #define Q_WRP(llq, p)			((p) & (1 << (llq)->max_n_shift))
@@ -175,10 +24,6 @@
 					 Q_IDX(&((q)->llq), p) *	\
 					 (q)->ent_dwords)
 
-#define Q_BASE_RWA			(1UL << 62)
-#define Q_BASE_ADDR_MASK		GENMASK_ULL(51, 5)
-#define Q_BASE_LOG2SIZE			GENMASK(4, 0)
-
 /* Ensure DMA allocations are naturally aligned */
 #ifdef CONFIG_CMA_ALIGNMENT
 #define Q_MAX_SZ_SHIFT			(PAGE_SHIFT + CONFIG_CMA_ALIGNMENT)
@@ -186,132 +31,6 @@
 #define Q_MAX_SZ_SHIFT			(PAGE_SHIFT + MAX_ORDER - 1)
 #endif
 
-/*
- * Stream table.
- *
- * Linear: Enough to cover 1 << IDR1.SIDSIZE entries
- * 2lvl: 128k L1 entries,
- *       256 lazy entries per table (each table covers a PCI bus)
- */
-#define STRTAB_L1_SZ_SHIFT		20
-#define STRTAB_SPLIT			8
-
-#define STRTAB_L1_DESC_DWORDS		1
-#define STRTAB_L1_DESC_SPAN		GENMASK_ULL(4, 0)
-#define STRTAB_L1_DESC_L2PTR_MASK	GENMASK_ULL(51, 6)
-
-#define STRTAB_STE_DWORDS		8
-#define STRTAB_STE_0_V			(1UL << 0)
-#define STRTAB_STE_0_CFG		GENMASK_ULL(3, 1)
-#define STRTAB_STE_0_CFG_ABORT		0
-#define STRTAB_STE_0_CFG_BYPASS		4
-#define STRTAB_STE_0_CFG_S1_TRANS	5
-#define STRTAB_STE_0_CFG_S2_TRANS	6
-
-#define STRTAB_STE_0_S1FMT		GENMASK_ULL(5, 4)
-#define STRTAB_STE_0_S1FMT_LINEAR	0
-#define STRTAB_STE_0_S1FMT_64K_L2	2
-#define STRTAB_STE_0_S1CTXPTR_MASK	GENMASK_ULL(51, 6)
-#define STRTAB_STE_0_S1CDMAX		GENMASK_ULL(63, 59)
-
-#define STRTAB_STE_1_S1DSS		GENMASK_ULL(1, 0)
-#define STRTAB_STE_1_S1DSS_TERMINATE	0x0
-#define STRTAB_STE_1_S1DSS_BYPASS	0x1
-#define STRTAB_STE_1_S1DSS_SSID0	0x2
-
-#define STRTAB_STE_1_S1C_CACHE_NC	0UL
-#define STRTAB_STE_1_S1C_CACHE_WBRA	1UL
-#define STRTAB_STE_1_S1C_CACHE_WT	2UL
-#define STRTAB_STE_1_S1C_CACHE_WB	3UL
-#define STRTAB_STE_1_S1CIR		GENMASK_ULL(3, 2)
-#define STRTAB_STE_1_S1COR		GENMASK_ULL(5, 4)
-#define STRTAB_STE_1_S1CSH		GENMASK_ULL(7, 6)
-
-#define STRTAB_STE_1_S1STALLD		(1UL << 27)
-
-#define STRTAB_STE_1_EATS		GENMASK_ULL(29, 28)
-#define STRTAB_STE_1_EATS_ABT		0UL
-#define STRTAB_STE_1_EATS_TRANS		1UL
-#define STRTAB_STE_1_EATS_S1CHK		2UL
-
-#define STRTAB_STE_1_STRW		GENMASK_ULL(31, 30)
-#define STRTAB_STE_1_STRW_NSEL1		0UL
-#define STRTAB_STE_1_STRW_EL2		2UL
-
-#define STRTAB_STE_1_SHCFG		GENMASK_ULL(45, 44)
-#define STRTAB_STE_1_SHCFG_INCOMING	1UL
-
-#define STRTAB_STE_2_S2VMID		GENMASK_ULL(15, 0)
-#define STRTAB_STE_2_VTCR		GENMASK_ULL(50, 32)
-#define STRTAB_STE_2_VTCR_S2T0SZ	GENMASK_ULL(5, 0)
-#define STRTAB_STE_2_VTCR_S2SL0		GENMASK_ULL(7, 6)
-#define STRTAB_STE_2_VTCR_S2IR0		GENMASK_ULL(9, 8)
-#define STRTAB_STE_2_VTCR_S2OR0		GENMASK_ULL(11, 10)
-#define STRTAB_STE_2_VTCR_S2SH0		GENMASK_ULL(13, 12)
-#define STRTAB_STE_2_VTCR_S2TG		GENMASK_ULL(15, 14)
-#define STRTAB_STE_2_VTCR_S2PS		GENMASK_ULL(18, 16)
-#define STRTAB_STE_2_S2AA64		(1UL << 51)
-#define STRTAB_STE_2_S2ENDI		(1UL << 52)
-#define STRTAB_STE_2_S2PTW		(1UL << 54)
-#define STRTAB_STE_2_S2R		(1UL << 58)
-
-#define STRTAB_STE_3_S2TTB_MASK		GENMASK_ULL(51, 4)
-
-/*
- * Context descriptors.
- *
- * Linear: when less than 1024 SSIDs are supported
- * 2lvl: at most 1024 L1 entries,
- *       1024 lazy entries per table.
- */
-#define CTXDESC_SPLIT			10
-#define CTXDESC_L2_ENTRIES		(1 << CTXDESC_SPLIT)
-
-#define CTXDESC_L1_DESC_DWORDS		1
-#define CTXDESC_L1_DESC_V		(1UL << 0)
-#define CTXDESC_L1_DESC_L2PTR_MASK	GENMASK_ULL(51, 12)
-
-#define CTXDESC_CD_DWORDS		8
-#define CTXDESC_CD_0_TCR_T0SZ		GENMASK_ULL(5, 0)
-#define CTXDESC_CD_0_TCR_TG0		GENMASK_ULL(7, 6)
-#define CTXDESC_CD_0_TCR_IRGN0		GENMASK_ULL(9, 8)
-#define CTXDESC_CD_0_TCR_ORGN0		GENMASK_ULL(11, 10)
-#define CTXDESC_CD_0_TCR_SH0		GENMASK_ULL(13, 12)
-#define CTXDESC_CD_0_TCR_EPD0		(1ULL << 14)
-#define CTXDESC_CD_0_TCR_EPD1		(1ULL << 30)
-
-#define CTXDESC_CD_0_ENDI		(1UL << 15)
-#define CTXDESC_CD_0_V			(1UL << 31)
-
-#define CTXDESC_CD_0_TCR_IPS		GENMASK_ULL(34, 32)
-#define CTXDESC_CD_0_TCR_TBI0		(1ULL << 38)
-
-#define CTXDESC_CD_0_AA64		(1UL << 41)
-#define CTXDESC_CD_0_S			(1UL << 44)
-#define CTXDESC_CD_0_R			(1UL << 45)
-#define CTXDESC_CD_0_A			(1UL << 46)
-#define CTXDESC_CD_0_ASET		(1UL << 47)
-#define CTXDESC_CD_0_ASID		GENMASK_ULL(63, 48)
-
-#define CTXDESC_CD_1_TTB0_MASK		GENMASK_ULL(51, 4)
-
-/*
- * When the SMMU only supports linear context descriptor tables, pick a
- * reasonable size limit (64kB).
- */
-#define CTXDESC_LINEAR_CDMAX		ilog2(SZ_64K / (CTXDESC_CD_DWORDS << 3))
-
-/* Command queue */
-#define CMDQ_ENT_SZ_SHIFT		4
-#define CMDQ_ENT_DWORDS			((1 << CMDQ_ENT_SZ_SHIFT) >> 3)
-#define CMDQ_MAX_SZ_SHIFT		(Q_MAX_SZ_SHIFT - CMDQ_ENT_SZ_SHIFT)
-
-#define CMDQ_CONS_ERR			GENMASK(30, 24)
-#define CMDQ_ERR_CERROR_NONE_IDX	0
-#define CMDQ_ERR_CERROR_ILL_IDX		1
-#define CMDQ_ERR_CERROR_ABT_IDX		2
-#define CMDQ_ERR_CERROR_ATC_INV_IDX	3
-
 #define CMDQ_PROD_OWNED_FLAG		Q_OVERFLOW_FLAG
 
 /*
@@ -321,98 +40,11 @@
  */
 #define CMDQ_BATCH_ENTRIES		BITS_PER_LONG
 
-#define CMDQ_0_OP			GENMASK_ULL(7, 0)
-#define CMDQ_0_SSV			(1UL << 11)
-
-#define CMDQ_PREFETCH_0_SID		GENMASK_ULL(63, 32)
-#define CMDQ_PREFETCH_1_SIZE		GENMASK_ULL(4, 0)
-#define CMDQ_PREFETCH_1_ADDR_MASK	GENMASK_ULL(63, 12)
-
-#define CMDQ_CFGI_0_SSID		GENMASK_ULL(31, 12)
-#define CMDQ_CFGI_0_SID			GENMASK_ULL(63, 32)
-#define CMDQ_CFGI_1_LEAF		(1UL << 0)
-#define CMDQ_CFGI_1_RANGE		GENMASK_ULL(4, 0)
-
-#define CMDQ_TLBI_0_NUM			GENMASK_ULL(16, 12)
-#define CMDQ_TLBI_RANGE_NUM_MAX		31
-#define CMDQ_TLBI_0_SCALE		GENMASK_ULL(24, 20)
-#define CMDQ_TLBI_0_VMID		GENMASK_ULL(47, 32)
-#define CMDQ_TLBI_0_ASID		GENMASK_ULL(63, 48)
-#define CMDQ_TLBI_1_LEAF		(1UL << 0)
-#define CMDQ_TLBI_1_TTL			GENMASK_ULL(9, 8)
-#define CMDQ_TLBI_1_TG			GENMASK_ULL(11, 10)
-#define CMDQ_TLBI_1_VA_MASK		GENMASK_ULL(63, 12)
-#define CMDQ_TLBI_1_IPA_MASK		GENMASK_ULL(51, 12)
-
-#define CMDQ_ATC_0_SSID			GENMASK_ULL(31, 12)
-#define CMDQ_ATC_0_SID			GENMASK_ULL(63, 32)
-#define CMDQ_ATC_0_GLOBAL		(1UL << 9)
-#define CMDQ_ATC_1_SIZE			GENMASK_ULL(5, 0)
-#define CMDQ_ATC_1_ADDR_MASK		GENMASK_ULL(63, 12)
-
-#define CMDQ_PRI_0_SSID			GENMASK_ULL(31, 12)
-#define CMDQ_PRI_0_SID			GENMASK_ULL(63, 32)
-#define CMDQ_PRI_1_GRPID		GENMASK_ULL(8, 0)
-#define CMDQ_PRI_1_RESP			GENMASK_ULL(13, 12)
-
-#define CMDQ_RESUME_0_RESP_TERM		0UL
-#define CMDQ_RESUME_0_RESP_RETRY	1UL
-#define CMDQ_RESUME_0_RESP_ABORT	2UL
-#define CMDQ_RESUME_0_RESP		GENMASK_ULL(13, 12)
-#define CMDQ_RESUME_0_SID		GENMASK_ULL(63, 32)
-#define CMDQ_RESUME_1_STAG		GENMASK_ULL(15, 0)
-
-#define CMDQ_SYNC_0_CS			GENMASK_ULL(13, 12)
-#define CMDQ_SYNC_0_CS_NONE		0
-#define CMDQ_SYNC_0_CS_IRQ		1
-#define CMDQ_SYNC_0_CS_SEV		2
-#define CMDQ_SYNC_0_MSH			GENMASK_ULL(23, 22)
-#define CMDQ_SYNC_0_MSIATTR		GENMASK_ULL(27, 24)
-#define CMDQ_SYNC_0_MSIDATA		GENMASK_ULL(63, 32)
-#define CMDQ_SYNC_1_MSIADDR_MASK	GENMASK_ULL(51, 2)
-
-/* Event queue */
-#define EVTQ_ENT_SZ_SHIFT		5
-#define EVTQ_ENT_DWORDS			((1 << EVTQ_ENT_SZ_SHIFT) >> 3)
-#define EVTQ_MAX_SZ_SHIFT		(Q_MAX_SZ_SHIFT - EVTQ_ENT_SZ_SHIFT)
-
-#define EVTQ_0_ID			GENMASK_ULL(7, 0)
-
-#define EVT_ID_TRANSLATION_FAULT	0x10
-#define EVT_ID_ADDR_SIZE_FAULT		0x11
-#define EVT_ID_ACCESS_FAULT		0x12
-#define EVT_ID_PERMISSION_FAULT		0x13
-
-#define EVTQ_0_SSV			(1UL << 11)
-#define EVTQ_0_SSID			GENMASK_ULL(31, 12)
-#define EVTQ_0_SID			GENMASK_ULL(63, 32)
-#define EVTQ_1_STAG			GENMASK_ULL(15, 0)
-#define EVTQ_1_STALL			(1UL << 31)
-#define EVTQ_1_PnU			(1UL << 33)
-#define EVTQ_1_InD			(1UL << 34)
-#define EVTQ_1_RnW			(1UL << 35)
-#define EVTQ_1_S2			(1UL << 39)
-#define EVTQ_1_CLASS			GENMASK_ULL(41, 40)
-#define EVTQ_1_TT_READ			(1UL << 44)
-#define EVTQ_2_ADDR			GENMASK_ULL(63, 0)
-#define EVTQ_3_IPA			GENMASK_ULL(51, 12)
-
-/* PRI queue */
-#define PRIQ_ENT_SZ_SHIFT		4
-#define PRIQ_ENT_DWORDS			((1 << PRIQ_ENT_SZ_SHIFT) >> 3)
-#define PRIQ_MAX_SZ_SHIFT		(Q_MAX_SZ_SHIFT - PRIQ_ENT_SZ_SHIFT)
-
-#define PRIQ_0_SID			GENMASK_ULL(31, 0)
-#define PRIQ_0_SSID			GENMASK_ULL(51, 32)
-#define PRIQ_0_PERM_PRIV		(1UL << 58)
-#define PRIQ_0_PERM_EXEC		(1UL << 59)
-#define PRIQ_0_PERM_READ		(1UL << 60)
-#define PRIQ_0_PERM_WRITE		(1UL << 61)
-#define PRIQ_0_PRG_LAST			(1UL << 62)
-#define PRIQ_0_SSID_V			(1UL << 63)
-
-#define PRIQ_1_PRG_IDX			GENMASK_ULL(8, 0)
-#define PRIQ_1_ADDR_MASK		GENMASK_ULL(63, 12)
+/*
+ * When the SMMU only supports linear context descriptor tables, pick a
+ * reasonable size limit (64kB).
+ */
+#define CTXDESC_LINEAR_CDMAX		ilog2(SZ_64K / (CTXDESC_CD_DWORDS << 3))
 
 /* High-level queue structures */
 #define ARM_SMMU_POLL_TIMEOUT_US	1000000 /* 1s! */
@@ -421,88 +53,6 @@
 #define MSI_IOVA_BASE			0x8000000
 #define MSI_IOVA_LENGTH			0x100000
 
-enum pri_resp {
-	PRI_RESP_DENY = 0,
-	PRI_RESP_FAIL = 1,
-	PRI_RESP_SUCC = 2,
-};
-
-struct arm_smmu_cmdq_ent {
-	/* Common fields */
-	u8				opcode;
-	bool				substream_valid;
-
-	/* Command-specific fields */
-	union {
-		#define CMDQ_OP_PREFETCH_CFG	0x1
-		struct {
-			u32			sid;
-		} prefetch;
-
-		#define CMDQ_OP_CFGI_STE	0x3
-		#define CMDQ_OP_CFGI_ALL	0x4
-		#define CMDQ_OP_CFGI_CD		0x5
-		#define CMDQ_OP_CFGI_CD_ALL	0x6
-		struct {
-			u32			sid;
-			u32			ssid;
-			union {
-				bool		leaf;
-				u8		span;
-			};
-		} cfgi;
-
-		#define CMDQ_OP_TLBI_NH_ASID	0x11
-		#define CMDQ_OP_TLBI_NH_VA	0x12
-		#define CMDQ_OP_TLBI_EL2_ALL	0x20
-		#define CMDQ_OP_TLBI_EL2_ASID	0x21
-		#define CMDQ_OP_TLBI_EL2_VA	0x22
-		#define CMDQ_OP_TLBI_S12_VMALL	0x28
-		#define CMDQ_OP_TLBI_S2_IPA	0x2a
-		#define CMDQ_OP_TLBI_NSNH_ALL	0x30
-		struct {
-			u8			num;
-			u8			scale;
-			u16			asid;
-			u16			vmid;
-			bool			leaf;
-			u8			ttl;
-			u8			tg;
-			u64			addr;
-		} tlbi;
-
-		#define CMDQ_OP_ATC_INV		0x40
-		#define ATC_INV_SIZE_ALL	52
-		struct {
-			u32			sid;
-			u32			ssid;
-			u64			addr;
-			u8			size;
-			bool			global;
-		} atc;
-
-		#define CMDQ_OP_PRI_RESP	0x41
-		struct {
-			u32			sid;
-			u32			ssid;
-			u16			grpid;
-			enum pri_resp		resp;
-		} pri;
-
-		#define CMDQ_OP_RESUME		0x44
-		struct {
-			u32			sid;
-			u16			stag;
-			u8			resp;
-		} resume;
-
-		#define CMDQ_OP_CMD_SYNC	0x46
-		struct {
-			u64			msiaddr;
-		} sync;
-	};
-};
-
 struct arm_smmu_ll_queue {
 	union {
 		u64			val;
@@ -621,25 +171,6 @@ struct arm_smmu_device {
 	void __iomem			*base;
 	void __iomem			*page1;
 
-#define ARM_SMMU_FEAT_2_LVL_STRTAB	(1 << 0)
-#define ARM_SMMU_FEAT_2_LVL_CDTAB	(1 << 1)
-#define ARM_SMMU_FEAT_TT_LE		(1 << 2)
-#define ARM_SMMU_FEAT_TT_BE		(1 << 3)
-#define ARM_SMMU_FEAT_PRI		(1 << 4)
-#define ARM_SMMU_FEAT_ATS		(1 << 5)
-#define ARM_SMMU_FEAT_SEV		(1 << 6)
-#define ARM_SMMU_FEAT_MSI		(1 << 7)
-#define ARM_SMMU_FEAT_COHERENCY		(1 << 8)
-#define ARM_SMMU_FEAT_TRANS_S1		(1 << 9)
-#define ARM_SMMU_FEAT_TRANS_S2		(1 << 10)
-#define ARM_SMMU_FEAT_STALLS		(1 << 11)
-#define ARM_SMMU_FEAT_HYP		(1 << 12)
-#define ARM_SMMU_FEAT_STALL_FORCE	(1 << 13)
-#define ARM_SMMU_FEAT_VAX		(1 << 14)
-#define ARM_SMMU_FEAT_RANGE_INV		(1 << 15)
-#define ARM_SMMU_FEAT_BTM		(1 << 16)
-#define ARM_SMMU_FEAT_SVA		(1 << 17)
-#define ARM_SMMU_FEAT_E2H		(1 << 18)
 	u32				features;
 
 #define ARM_SMMU_OPT_SKIP_PREFETCH	(1 << 0)
-- 
2.39.0



* [RFC PATCH 07/45] iommu/arm-smmu-v3: Move some definitions to arm64 include/
@ 2023-02-01 12:52   ` Jean-Philippe Brucker
  0 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:52 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

So that the KVM SMMUv3 driver can re-use architectural definitions,
command structures and feature bits, move them to the arm64 include/
directory.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 arch/arm64/include/asm/arm-smmu-v3-regs.h   | 479 +++++++++++++++++++
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 481 +-------------------
 2 files changed, 485 insertions(+), 475 deletions(-)
 create mode 100644 arch/arm64/include/asm/arm-smmu-v3-regs.h

diff --git a/arch/arm64/include/asm/arm-smmu-v3-regs.h b/arch/arm64/include/asm/arm-smmu-v3-regs.h
new file mode 100644
index 000000000000..646a734f2554
--- /dev/null
+++ b/arch/arm64/include/asm/arm-smmu-v3-regs.h
@@ -0,0 +1,479 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef _ARM_SMMU_V3_REGS_H
+#define _ARM_SMMU_V3_REGS_H
+
+#include <linux/bitfield.h>
+
+/* MMIO registers */
+#define ARM_SMMU_IDR0			0x0
+#define IDR0_ST_LVL			GENMASK(28, 27)
+#define IDR0_ST_LVL_2LVL		1
+#define IDR0_STALL_MODEL		GENMASK(25, 24)
+#define IDR0_STALL_MODEL_STALL		0
+#define IDR0_STALL_MODEL_FORCE		2
+#define IDR0_TTENDIAN			GENMASK(22, 21)
+#define IDR0_TTENDIAN_MIXED		0
+#define IDR0_TTENDIAN_LE		2
+#define IDR0_TTENDIAN_BE		3
+#define IDR0_CD2L			(1 << 19)
+#define IDR0_VMID16			(1 << 18)
+#define IDR0_PRI			(1 << 16)
+#define IDR0_SEV			(1 << 14)
+#define IDR0_MSI			(1 << 13)
+#define IDR0_ASID16			(1 << 12)
+#define IDR0_ATS			(1 << 10)
+#define IDR0_HYP			(1 << 9)
+#define IDR0_COHACC			(1 << 4)
+#define IDR0_TTF			GENMASK(3, 2)
+#define IDR0_TTF_AARCH64		2
+#define IDR0_TTF_AARCH32_64		3
+#define IDR0_S1P			(1 << 1)
+#define IDR0_S2P			(1 << 0)
+
+#define ARM_SMMU_IDR1			0x4
+#define IDR1_TABLES_PRESET		(1 << 30)
+#define IDR1_QUEUES_PRESET		(1 << 29)
+#define IDR1_REL			(1 << 28)
+#define IDR1_CMDQS			GENMASK(25, 21)
+#define IDR1_EVTQS			GENMASK(20, 16)
+#define IDR1_PRIQS			GENMASK(15, 11)
+#define IDR1_SSIDSIZE			GENMASK(10, 6)
+#define IDR1_SIDSIZE			GENMASK(5, 0)
+
+#define ARM_SMMU_IDR3			0xc
+#define IDR3_RIL			(1 << 10)
+
+#define ARM_SMMU_IDR5			0x14
+#define IDR5_STALL_MAX			GENMASK(31, 16)
+#define IDR5_GRAN64K			(1 << 6)
+#define IDR5_GRAN16K			(1 << 5)
+#define IDR5_GRAN4K			(1 << 4)
+#define IDR5_OAS			GENMASK(2, 0)
+#define IDR5_OAS_32_BIT			0
+#define IDR5_OAS_36_BIT			1
+#define IDR5_OAS_40_BIT			2
+#define IDR5_OAS_42_BIT			3
+#define IDR5_OAS_44_BIT			4
+#define IDR5_OAS_48_BIT			5
+#define IDR5_OAS_52_BIT			6
+#define IDR5_VAX			GENMASK(11, 10)
+#define IDR5_VAX_52_BIT			1
+
+#define ARM_SMMU_CR0			0x20
+#define CR0_ATSCHK			(1 << 4)
+#define CR0_CMDQEN			(1 << 3)
+#define CR0_EVTQEN			(1 << 2)
+#define CR0_PRIQEN			(1 << 1)
+#define CR0_SMMUEN			(1 << 0)
+
+#define ARM_SMMU_CR0ACK			0x24
+
+#define ARM_SMMU_CR1			0x28
+#define CR1_TABLE_SH			GENMASK(11, 10)
+#define CR1_TABLE_OC			GENMASK(9, 8)
+#define CR1_TABLE_IC			GENMASK(7, 6)
+#define CR1_QUEUE_SH			GENMASK(5, 4)
+#define CR1_QUEUE_OC			GENMASK(3, 2)
+#define CR1_QUEUE_IC			GENMASK(1, 0)
+/* CR1 cacheability fields don't quite follow the usual TCR-style encoding */
+#define CR1_CACHE_NC			0
+#define CR1_CACHE_WB			1
+#define CR1_CACHE_WT			2
+
+#define ARM_SMMU_CR2			0x2c
+#define CR2_PTM				(1 << 2)
+#define CR2_RECINVSID			(1 << 1)
+#define CR2_E2H				(1 << 0)
+
+#define ARM_SMMU_GBPA			0x44
+#define GBPA_UPDATE			(1 << 31)
+#define GBPA_ABORT			(1 << 20)
+
+#define ARM_SMMU_IRQ_CTRL		0x50
+#define IRQ_CTRL_EVTQ_IRQEN		(1 << 2)
+#define IRQ_CTRL_PRIQ_IRQEN		(1 << 1)
+#define IRQ_CTRL_GERROR_IRQEN		(1 << 0)
+
+#define ARM_SMMU_IRQ_CTRLACK		0x54
+
+#define ARM_SMMU_GERROR			0x60
+#define GERROR_SFM_ERR			(1 << 8)
+#define GERROR_MSI_GERROR_ABT_ERR	(1 << 7)
+#define GERROR_MSI_PRIQ_ABT_ERR		(1 << 6)
+#define GERROR_MSI_EVTQ_ABT_ERR		(1 << 5)
+#define GERROR_MSI_CMDQ_ABT_ERR		(1 << 4)
+#define GERROR_PRIQ_ABT_ERR		(1 << 3)
+#define GERROR_EVTQ_ABT_ERR		(1 << 2)
+#define GERROR_CMDQ_ERR			(1 << 0)
+#define GERROR_ERR_MASK			0x1fd
+
+#define ARM_SMMU_GERRORN		0x64
+
+#define ARM_SMMU_GERROR_IRQ_CFG0	0x68
+#define ARM_SMMU_GERROR_IRQ_CFG1	0x70
+#define ARM_SMMU_GERROR_IRQ_CFG2	0x74
+
+#define ARM_SMMU_STRTAB_BASE		0x80
+#define STRTAB_BASE_RA			(1UL << 62)
+#define STRTAB_BASE_ADDR_MASK		GENMASK_ULL(51, 6)
+
+#define ARM_SMMU_STRTAB_BASE_CFG	0x88
+#define STRTAB_BASE_CFG_FMT		GENMASK(17, 16)
+#define STRTAB_BASE_CFG_FMT_LINEAR	0
+#define STRTAB_BASE_CFG_FMT_2LVL	1
+#define STRTAB_BASE_CFG_SPLIT		GENMASK(10, 6)
+#define STRTAB_BASE_CFG_LOG2SIZE	GENMASK(5, 0)
+
+#define Q_BASE_RWA			(1UL << 62)
+#define Q_BASE_ADDR_MASK		GENMASK_ULL(51, 5)
+#define Q_BASE_LOG2SIZE			GENMASK(4, 0)
+
+#define ARM_SMMU_CMDQ_BASE		0x90
+#define ARM_SMMU_CMDQ_PROD		0x98
+#define ARM_SMMU_CMDQ_CONS		0x9c
+
+#define ARM_SMMU_EVTQ_BASE		0xa0
+#define ARM_SMMU_EVTQ_PROD		0xa8
+#define ARM_SMMU_EVTQ_CONS		0xac
+#define ARM_SMMU_EVTQ_IRQ_CFG0		0xb0
+#define ARM_SMMU_EVTQ_IRQ_CFG1		0xb8
+#define ARM_SMMU_EVTQ_IRQ_CFG2		0xbc
+
+#define ARM_SMMU_PRIQ_BASE		0xc0
+#define ARM_SMMU_PRIQ_PROD		0xc8
+#define ARM_SMMU_PRIQ_CONS		0xcc
+#define ARM_SMMU_PRIQ_IRQ_CFG0		0xd0
+#define ARM_SMMU_PRIQ_IRQ_CFG1		0xd8
+#define ARM_SMMU_PRIQ_IRQ_CFG2		0xdc
+
+#define ARM_SMMU_REG_SZ			0xe00
+
+/* Common MSI config fields */
+#define MSI_CFG0_ADDR_MASK		GENMASK_ULL(51, 2)
+#define MSI_CFG2_SH			GENMASK(5, 4)
+#define MSI_CFG2_MEMATTR		GENMASK(3, 0)
+
+/* Common memory attribute values */
+#define ARM_SMMU_SH_NSH			0
+#define ARM_SMMU_SH_OSH			2
+#define ARM_SMMU_SH_ISH			3
+#define ARM_SMMU_MEMATTR_DEVICE_nGnRE	0x1
+#define ARM_SMMU_MEMATTR_OIWB		0xf
+
+/*
+ * Stream table.
+ *
+ * Linear: Enough to cover 1 << IDR1.SIDSIZE entries
+ * 2lvl: 128k L1 entries,
+ *       256 lazy entries per table (each table covers a PCI bus)
+ */
+#define STRTAB_L1_SZ_SHIFT		20
+#define STRTAB_SPLIT			8
+
+#define STRTAB_L1_DESC_DWORDS		1
+#define STRTAB_L1_DESC_SPAN		GENMASK_ULL(4, 0)
+#define STRTAB_L1_DESC_L2PTR_MASK	GENMASK_ULL(51, 6)
+
+#define STRTAB_STE_DWORDS		8
+#define STRTAB_STE_0_V			(1UL << 0)
+#define STRTAB_STE_0_CFG		GENMASK_ULL(3, 1)
+#define STRTAB_STE_0_CFG_ABORT		0
+#define STRTAB_STE_0_CFG_BYPASS		4
+#define STRTAB_STE_0_CFG_S1_TRANS	5
+#define STRTAB_STE_0_CFG_S2_TRANS	6
+
+#define STRTAB_STE_0_S1FMT		GENMASK_ULL(5, 4)
+#define STRTAB_STE_0_S1FMT_LINEAR	0
+#define STRTAB_STE_0_S1FMT_64K_L2	2
+#define STRTAB_STE_0_S1CTXPTR_MASK	GENMASK_ULL(51, 6)
+#define STRTAB_STE_0_S1CDMAX		GENMASK_ULL(63, 59)
+
+#define STRTAB_STE_1_S1DSS		GENMASK_ULL(1, 0)
+#define STRTAB_STE_1_S1DSS_TERMINATE	0x0
+#define STRTAB_STE_1_S1DSS_BYPASS	0x1
+#define STRTAB_STE_1_S1DSS_SSID0	0x2
+
+#define STRTAB_STE_1_S1C_CACHE_NC	0UL
+#define STRTAB_STE_1_S1C_CACHE_WBRA	1UL
+#define STRTAB_STE_1_S1C_CACHE_WT	2UL
+#define STRTAB_STE_1_S1C_CACHE_WB	3UL
+#define STRTAB_STE_1_S1CIR		GENMASK_ULL(3, 2)
+#define STRTAB_STE_1_S1COR		GENMASK_ULL(5, 4)
+#define STRTAB_STE_1_S1CSH		GENMASK_ULL(7, 6)
+
+#define STRTAB_STE_1_S1STALLD		(1UL << 27)
+
+#define STRTAB_STE_1_EATS		GENMASK_ULL(29, 28)
+#define STRTAB_STE_1_EATS_ABT		0UL
+#define STRTAB_STE_1_EATS_TRANS		1UL
+#define STRTAB_STE_1_EATS_S1CHK		2UL
+
+#define STRTAB_STE_1_STRW		GENMASK_ULL(31, 30)
+#define STRTAB_STE_1_STRW_NSEL1		0UL
+#define STRTAB_STE_1_STRW_EL2		2UL
+
+#define STRTAB_STE_1_SHCFG		GENMASK_ULL(45, 44)
+#define STRTAB_STE_1_SHCFG_INCOMING	1UL
+
+#define STRTAB_STE_2_S2VMID		GENMASK_ULL(15, 0)
+#define STRTAB_STE_2_VTCR		GENMASK_ULL(50, 32)
+#define STRTAB_STE_2_VTCR_S2T0SZ	GENMASK_ULL(5, 0)
+#define STRTAB_STE_2_VTCR_S2SL0		GENMASK_ULL(7, 6)
+#define STRTAB_STE_2_VTCR_S2IR0		GENMASK_ULL(9, 8)
+#define STRTAB_STE_2_VTCR_S2OR0		GENMASK_ULL(11, 10)
+#define STRTAB_STE_2_VTCR_S2SH0		GENMASK_ULL(13, 12)
+#define STRTAB_STE_2_VTCR_S2TG		GENMASK_ULL(15, 14)
+#define STRTAB_STE_2_VTCR_S2PS		GENMASK_ULL(18, 16)
+#define STRTAB_STE_2_S2AA64		(1UL << 51)
+#define STRTAB_STE_2_S2ENDI		(1UL << 52)
+#define STRTAB_STE_2_S2PTW		(1UL << 54)
+#define STRTAB_STE_2_S2R		(1UL << 58)
+
+#define STRTAB_STE_3_S2TTB_MASK		GENMASK_ULL(51, 4)
+
+/*
+ * Context descriptors.
+ *
+ * Linear: when less than 1024 SSIDs are supported
+ * 2lvl: at most 1024 L1 entries,
+ *       1024 lazy entries per table.
+ */
+#define CTXDESC_SPLIT			10
+#define CTXDESC_L2_ENTRIES		(1 << CTXDESC_SPLIT)
+
+#define CTXDESC_L1_DESC_DWORDS		1
+#define CTXDESC_L1_DESC_V		(1UL << 0)
+#define CTXDESC_L1_DESC_L2PTR_MASK	GENMASK_ULL(51, 12)
+
+#define CTXDESC_CD_DWORDS		8
+#define CTXDESC_CD_0_TCR_T0SZ		GENMASK_ULL(5, 0)
+#define CTXDESC_CD_0_TCR_TG0		GENMASK_ULL(7, 6)
+#define CTXDESC_CD_0_TCR_IRGN0		GENMASK_ULL(9, 8)
+#define CTXDESC_CD_0_TCR_ORGN0		GENMASK_ULL(11, 10)
+#define CTXDESC_CD_0_TCR_SH0		GENMASK_ULL(13, 12)
+#define CTXDESC_CD_0_TCR_EPD0		(1ULL << 14)
+#define CTXDESC_CD_0_TCR_EPD1		(1ULL << 30)
+
+#define CTXDESC_CD_0_ENDI		(1UL << 15)
+#define CTXDESC_CD_0_V			(1UL << 31)
+
+#define CTXDESC_CD_0_TCR_IPS		GENMASK_ULL(34, 32)
+#define CTXDESC_CD_0_TCR_TBI0		(1ULL << 38)
+
+#define CTXDESC_CD_0_AA64		(1UL << 41)
+#define CTXDESC_CD_0_S			(1UL << 44)
+#define CTXDESC_CD_0_R			(1UL << 45)
+#define CTXDESC_CD_0_A			(1UL << 46)
+#define CTXDESC_CD_0_ASET		(1UL << 47)
+#define CTXDESC_CD_0_ASID		GENMASK_ULL(63, 48)
+
+#define CTXDESC_CD_1_TTB0_MASK		GENMASK_ULL(51, 4)
+
+/* Command queue */
+#define CMDQ_ENT_SZ_SHIFT		4
+#define CMDQ_ENT_DWORDS			((1 << CMDQ_ENT_SZ_SHIFT) >> 3)
+#define CMDQ_MAX_SZ_SHIFT		(Q_MAX_SZ_SHIFT - CMDQ_ENT_SZ_SHIFT)
+
+#define CMDQ_CONS_ERR			GENMASK(30, 24)
+#define CMDQ_ERR_CERROR_NONE_IDX	0
+#define CMDQ_ERR_CERROR_ILL_IDX		1
+#define CMDQ_ERR_CERROR_ABT_IDX		2
+#define CMDQ_ERR_CERROR_ATC_INV_IDX	3
+
+#define CMDQ_0_OP			GENMASK_ULL(7, 0)
+#define CMDQ_0_SSV			(1UL << 11)
+
+#define CMDQ_PREFETCH_0_SID		GENMASK_ULL(63, 32)
+#define CMDQ_PREFETCH_1_SIZE		GENMASK_ULL(4, 0)
+#define CMDQ_PREFETCH_1_ADDR_MASK	GENMASK_ULL(63, 12)
+
+#define CMDQ_CFGI_0_SSID		GENMASK_ULL(31, 12)
+#define CMDQ_CFGI_0_SID			GENMASK_ULL(63, 32)
+#define CMDQ_CFGI_1_LEAF		(1UL << 0)
+#define CMDQ_CFGI_1_RANGE		GENMASK_ULL(4, 0)
+
+#define CMDQ_TLBI_0_NUM			GENMASK_ULL(16, 12)
+#define CMDQ_TLBI_RANGE_NUM_MAX		31
+#define CMDQ_TLBI_0_SCALE		GENMASK_ULL(24, 20)
+#define CMDQ_TLBI_0_VMID		GENMASK_ULL(47, 32)
+#define CMDQ_TLBI_0_ASID		GENMASK_ULL(63, 48)
+#define CMDQ_TLBI_1_LEAF		(1UL << 0)
+#define CMDQ_TLBI_1_TTL			GENMASK_ULL(9, 8)
+#define CMDQ_TLBI_1_TG			GENMASK_ULL(11, 10)
+#define CMDQ_TLBI_1_VA_MASK		GENMASK_ULL(63, 12)
+#define CMDQ_TLBI_1_IPA_MASK		GENMASK_ULL(51, 12)
+
+#define CMDQ_ATC_0_SSID			GENMASK_ULL(31, 12)
+#define CMDQ_ATC_0_SID			GENMASK_ULL(63, 32)
+#define CMDQ_ATC_0_GLOBAL		(1UL << 9)
+#define CMDQ_ATC_1_SIZE			GENMASK_ULL(5, 0)
+#define CMDQ_ATC_1_ADDR_MASK		GENMASK_ULL(63, 12)
+
+#define CMDQ_PRI_0_SSID			GENMASK_ULL(31, 12)
+#define CMDQ_PRI_0_SID			GENMASK_ULL(63, 32)
+#define CMDQ_PRI_1_GRPID		GENMASK_ULL(8, 0)
+#define CMDQ_PRI_1_RESP			GENMASK_ULL(13, 12)
+
+#define CMDQ_RESUME_0_RESP_TERM		0UL
+#define CMDQ_RESUME_0_RESP_RETRY	1UL
+#define CMDQ_RESUME_0_RESP_ABORT	2UL
+#define CMDQ_RESUME_0_RESP		GENMASK_ULL(13, 12)
+#define CMDQ_RESUME_0_SID		GENMASK_ULL(63, 32)
+#define CMDQ_RESUME_1_STAG		GENMASK_ULL(15, 0)
+
+#define CMDQ_SYNC_0_CS			GENMASK_ULL(13, 12)
+#define CMDQ_SYNC_0_CS_NONE		0
+#define CMDQ_SYNC_0_CS_IRQ		1
+#define CMDQ_SYNC_0_CS_SEV		2
+#define CMDQ_SYNC_0_MSH			GENMASK_ULL(23, 22)
+#define CMDQ_SYNC_0_MSIATTR		GENMASK_ULL(27, 24)
+#define CMDQ_SYNC_0_MSIDATA		GENMASK_ULL(63, 32)
+#define CMDQ_SYNC_1_MSIADDR_MASK	GENMASK_ULL(51, 2)
+
+/* Event queue */
+#define EVTQ_ENT_SZ_SHIFT		5
+#define EVTQ_ENT_DWORDS			((1 << EVTQ_ENT_SZ_SHIFT) >> 3)
+#define EVTQ_MAX_SZ_SHIFT		(Q_MAX_SZ_SHIFT - EVTQ_ENT_SZ_SHIFT)
+
+#define EVTQ_0_ID			GENMASK_ULL(7, 0)
+
+#define EVT_ID_TRANSLATION_FAULT	0x10
+#define EVT_ID_ADDR_SIZE_FAULT		0x11
+#define EVT_ID_ACCESS_FAULT		0x12
+#define EVT_ID_PERMISSION_FAULT		0x13
+
+#define EVTQ_0_SSV			(1UL << 11)
+#define EVTQ_0_SSID			GENMASK_ULL(31, 12)
+#define EVTQ_0_SID			GENMASK_ULL(63, 32)
+#define EVTQ_1_STAG			GENMASK_ULL(15, 0)
+#define EVTQ_1_STALL			(1UL << 31)
+#define EVTQ_1_PnU			(1UL << 33)
+#define EVTQ_1_InD			(1UL << 34)
+#define EVTQ_1_RnW			(1UL << 35)
+#define EVTQ_1_S2			(1UL << 39)
+#define EVTQ_1_CLASS			GENMASK_ULL(41, 40)
+#define EVTQ_1_TT_READ			(1UL << 44)
+#define EVTQ_2_ADDR			GENMASK_ULL(63, 0)
+#define EVTQ_3_IPA			GENMASK_ULL(51, 12)
+
+/* PRI queue */
+#define PRIQ_ENT_SZ_SHIFT		4
+#define PRIQ_ENT_DWORDS			((1 << PRIQ_ENT_SZ_SHIFT) >> 3)
+#define PRIQ_MAX_SZ_SHIFT		(Q_MAX_SZ_SHIFT - PRIQ_ENT_SZ_SHIFT)
+
+#define PRIQ_0_SID			GENMASK_ULL(31, 0)
+#define PRIQ_0_SSID			GENMASK_ULL(51, 32)
+#define PRIQ_0_PERM_PRIV		(1UL << 58)
+#define PRIQ_0_PERM_EXEC		(1UL << 59)
+#define PRIQ_0_PERM_READ		(1UL << 60)
+#define PRIQ_0_PERM_WRITE		(1UL << 61)
+#define PRIQ_0_PRG_LAST			(1UL << 62)
+#define PRIQ_0_SSID_V			(1UL << 63)
+
+#define PRIQ_1_PRG_IDX			GENMASK_ULL(8, 0)
+#define PRIQ_1_ADDR_MASK		GENMASK_ULL(63, 12)
+
+/* Synthesized features */
+#define ARM_SMMU_FEAT_2_LVL_STRTAB	(1 << 0)
+#define ARM_SMMU_FEAT_2_LVL_CDTAB	(1 << 1)
+#define ARM_SMMU_FEAT_TT_LE		(1 << 2)
+#define ARM_SMMU_FEAT_TT_BE		(1 << 3)
+#define ARM_SMMU_FEAT_PRI		(1 << 4)
+#define ARM_SMMU_FEAT_ATS		(1 << 5)
+#define ARM_SMMU_FEAT_SEV		(1 << 6)
+#define ARM_SMMU_FEAT_MSI		(1 << 7)
+#define ARM_SMMU_FEAT_COHERENCY		(1 << 8)
+#define ARM_SMMU_FEAT_TRANS_S1		(1 << 9)
+#define ARM_SMMU_FEAT_TRANS_S2		(1 << 10)
+#define ARM_SMMU_FEAT_STALLS		(1 << 11)
+#define ARM_SMMU_FEAT_HYP		(1 << 12)
+#define ARM_SMMU_FEAT_STALL_FORCE	(1 << 13)
+#define ARM_SMMU_FEAT_VAX		(1 << 14)
+#define ARM_SMMU_FEAT_RANGE_INV		(1 << 15)
+#define ARM_SMMU_FEAT_BTM		(1 << 16)
+#define ARM_SMMU_FEAT_SVA		(1 << 17)
+#define ARM_SMMU_FEAT_E2H		(1 << 18)
+
+enum pri_resp {
+	PRI_RESP_DENY = 0,
+	PRI_RESP_FAIL = 1,
+	PRI_RESP_SUCC = 2,
+};
+
+struct arm_smmu_cmdq_ent {
+	/* Common fields */
+	u8				opcode;
+	bool				substream_valid;
+
+	/* Command-specific fields */
+	union {
+		#define CMDQ_OP_PREFETCH_CFG	0x1
+		struct {
+			u32			sid;
+		} prefetch;
+
+		#define CMDQ_OP_CFGI_STE	0x3
+		#define CMDQ_OP_CFGI_ALL	0x4
+		#define CMDQ_OP_CFGI_CD		0x5
+		#define CMDQ_OP_CFGI_CD_ALL	0x6
+		struct {
+			u32			sid;
+			u32			ssid;
+			union {
+				bool		leaf;
+				u8		span;
+			};
+		} cfgi;
+
+		#define CMDQ_OP_TLBI_NH_ASID	0x11
+		#define CMDQ_OP_TLBI_NH_VA	0x12
+		#define CMDQ_OP_TLBI_EL2_ALL	0x20
+		#define CMDQ_OP_TLBI_EL2_ASID	0x21
+		#define CMDQ_OP_TLBI_EL2_VA	0x22
+		#define CMDQ_OP_TLBI_S12_VMALL	0x28
+		#define CMDQ_OP_TLBI_S2_IPA	0x2a
+		#define CMDQ_OP_TLBI_NSNH_ALL	0x30
+		struct {
+			u8			num;
+			u8			scale;
+			u16			asid;
+			u16			vmid;
+			bool			leaf;
+			u8			ttl;
+			u8			tg;
+			u64			addr;
+		} tlbi;
+
+		#define CMDQ_OP_ATC_INV		0x40
+		#define ATC_INV_SIZE_ALL	52
+		struct {
+			u32			sid;
+			u32			ssid;
+			u64			addr;
+			u8			size;
+			bool			global;
+		} atc;
+
+		#define CMDQ_OP_PRI_RESP	0x41
+		struct {
+			u32			sid;
+			u32			ssid;
+			u16			grpid;
+			enum pri_resp		resp;
+		} pri;
+
+		#define CMDQ_OP_RESUME		0x44
+		struct {
+			u32			sid;
+			u16			stag;
+			u8			resp;
+		} resume;
+
+		#define CMDQ_OP_CMD_SYNC	0x46
+		struct {
+			u64			msiaddr;
+		} sync;
+	};
+};
+
+#endif /* _ARM_SMMU_V3_REGS_H */
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index cec3c8103404..32ce835ab4eb 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -8,164 +8,13 @@
 #ifndef _ARM_SMMU_V3_H
 #define _ARM_SMMU_V3_H
 
-#include <linux/bitfield.h>
 #include <linux/iommu.h>
 #include <linux/io-pgtable.h>
 #include <linux/kernel.h>
 #include <linux/mmzone.h>
 #include <linux/sizes.h>
 
-/* MMIO registers */
-#define ARM_SMMU_IDR0			0x0
-#define IDR0_ST_LVL			GENMASK(28, 27)
-#define IDR0_ST_LVL_2LVL		1
-#define IDR0_STALL_MODEL		GENMASK(25, 24)
-#define IDR0_STALL_MODEL_STALL		0
-#define IDR0_STALL_MODEL_FORCE		2
-#define IDR0_TTENDIAN			GENMASK(22, 21)
-#define IDR0_TTENDIAN_MIXED		0
-#define IDR0_TTENDIAN_LE		2
-#define IDR0_TTENDIAN_BE		3
-#define IDR0_CD2L			(1 << 19)
-#define IDR0_VMID16			(1 << 18)
-#define IDR0_PRI			(1 << 16)
-#define IDR0_SEV			(1 << 14)
-#define IDR0_MSI			(1 << 13)
-#define IDR0_ASID16			(1 << 12)
-#define IDR0_ATS			(1 << 10)
-#define IDR0_HYP			(1 << 9)
-#define IDR0_COHACC			(1 << 4)
-#define IDR0_TTF			GENMASK(3, 2)
-#define IDR0_TTF_AARCH64		2
-#define IDR0_TTF_AARCH32_64		3
-#define IDR0_S1P			(1 << 1)
-#define IDR0_S2P			(1 << 0)
-
-#define ARM_SMMU_IDR1			0x4
-#define IDR1_TABLES_PRESET		(1 << 30)
-#define IDR1_QUEUES_PRESET		(1 << 29)
-#define IDR1_REL			(1 << 28)
-#define IDR1_CMDQS			GENMASK(25, 21)
-#define IDR1_EVTQS			GENMASK(20, 16)
-#define IDR1_PRIQS			GENMASK(15, 11)
-#define IDR1_SSIDSIZE			GENMASK(10, 6)
-#define IDR1_SIDSIZE			GENMASK(5, 0)
-
-#define ARM_SMMU_IDR3			0xc
-#define IDR3_RIL			(1 << 10)
-
-#define ARM_SMMU_IDR5			0x14
-#define IDR5_STALL_MAX			GENMASK(31, 16)
-#define IDR5_GRAN64K			(1 << 6)
-#define IDR5_GRAN16K			(1 << 5)
-#define IDR5_GRAN4K			(1 << 4)
-#define IDR5_OAS			GENMASK(2, 0)
-#define IDR5_OAS_32_BIT			0
-#define IDR5_OAS_36_BIT			1
-#define IDR5_OAS_40_BIT			2
-#define IDR5_OAS_42_BIT			3
-#define IDR5_OAS_44_BIT			4
-#define IDR5_OAS_48_BIT			5
-#define IDR5_OAS_52_BIT			6
-#define IDR5_VAX			GENMASK(11, 10)
-#define IDR5_VAX_52_BIT			1
-
-#define ARM_SMMU_CR0			0x20
-#define CR0_ATSCHK			(1 << 4)
-#define CR0_CMDQEN			(1 << 3)
-#define CR0_EVTQEN			(1 << 2)
-#define CR0_PRIQEN			(1 << 1)
-#define CR0_SMMUEN			(1 << 0)
-
-#define ARM_SMMU_CR0ACK			0x24
-
-#define ARM_SMMU_CR1			0x28
-#define CR1_TABLE_SH			GENMASK(11, 10)
-#define CR1_TABLE_OC			GENMASK(9, 8)
-#define CR1_TABLE_IC			GENMASK(7, 6)
-#define CR1_QUEUE_SH			GENMASK(5, 4)
-#define CR1_QUEUE_OC			GENMASK(3, 2)
-#define CR1_QUEUE_IC			GENMASK(1, 0)
-/* CR1 cacheability fields don't quite follow the usual TCR-style encoding */
-#define CR1_CACHE_NC			0
-#define CR1_CACHE_WB			1
-#define CR1_CACHE_WT			2
-
-#define ARM_SMMU_CR2			0x2c
-#define CR2_PTM				(1 << 2)
-#define CR2_RECINVSID			(1 << 1)
-#define CR2_E2H				(1 << 0)
-
-#define ARM_SMMU_GBPA			0x44
-#define GBPA_UPDATE			(1 << 31)
-#define GBPA_ABORT			(1 << 20)
-
-#define ARM_SMMU_IRQ_CTRL		0x50
-#define IRQ_CTRL_EVTQ_IRQEN		(1 << 2)
-#define IRQ_CTRL_PRIQ_IRQEN		(1 << 1)
-#define IRQ_CTRL_GERROR_IRQEN		(1 << 0)
-
-#define ARM_SMMU_IRQ_CTRLACK		0x54
-
-#define ARM_SMMU_GERROR			0x60
-#define GERROR_SFM_ERR			(1 << 8)
-#define GERROR_MSI_GERROR_ABT_ERR	(1 << 7)
-#define GERROR_MSI_PRIQ_ABT_ERR		(1 << 6)
-#define GERROR_MSI_EVTQ_ABT_ERR		(1 << 5)
-#define GERROR_MSI_CMDQ_ABT_ERR		(1 << 4)
-#define GERROR_PRIQ_ABT_ERR		(1 << 3)
-#define GERROR_EVTQ_ABT_ERR		(1 << 2)
-#define GERROR_CMDQ_ERR			(1 << 0)
-#define GERROR_ERR_MASK			0x1fd
-
-#define ARM_SMMU_GERRORN		0x64
-
-#define ARM_SMMU_GERROR_IRQ_CFG0	0x68
-#define ARM_SMMU_GERROR_IRQ_CFG1	0x70
-#define ARM_SMMU_GERROR_IRQ_CFG2	0x74
-
-#define ARM_SMMU_STRTAB_BASE		0x80
-#define STRTAB_BASE_RA			(1UL << 62)
-#define STRTAB_BASE_ADDR_MASK		GENMASK_ULL(51, 6)
-
-#define ARM_SMMU_STRTAB_BASE_CFG	0x88
-#define STRTAB_BASE_CFG_FMT		GENMASK(17, 16)
-#define STRTAB_BASE_CFG_FMT_LINEAR	0
-#define STRTAB_BASE_CFG_FMT_2LVL	1
-#define STRTAB_BASE_CFG_SPLIT		GENMASK(10, 6)
-#define STRTAB_BASE_CFG_LOG2SIZE	GENMASK(5, 0)
-
-#define ARM_SMMU_CMDQ_BASE		0x90
-#define ARM_SMMU_CMDQ_PROD		0x98
-#define ARM_SMMU_CMDQ_CONS		0x9c
-
-#define ARM_SMMU_EVTQ_BASE		0xa0
-#define ARM_SMMU_EVTQ_PROD		0xa8
-#define ARM_SMMU_EVTQ_CONS		0xac
-#define ARM_SMMU_EVTQ_IRQ_CFG0		0xb0
-#define ARM_SMMU_EVTQ_IRQ_CFG1		0xb8
-#define ARM_SMMU_EVTQ_IRQ_CFG2		0xbc
-
-#define ARM_SMMU_PRIQ_BASE		0xc0
-#define ARM_SMMU_PRIQ_PROD		0xc8
-#define ARM_SMMU_PRIQ_CONS		0xcc
-#define ARM_SMMU_PRIQ_IRQ_CFG0		0xd0
-#define ARM_SMMU_PRIQ_IRQ_CFG1		0xd8
-#define ARM_SMMU_PRIQ_IRQ_CFG2		0xdc
-
-#define ARM_SMMU_REG_SZ			0xe00
-
-/* Common MSI config fields */
-#define MSI_CFG0_ADDR_MASK		GENMASK_ULL(51, 2)
-#define MSI_CFG2_SH			GENMASK(5, 4)
-#define MSI_CFG2_MEMATTR		GENMASK(3, 0)
-
-/* Common memory attribute values */
-#define ARM_SMMU_SH_NSH			0
-#define ARM_SMMU_SH_OSH			2
-#define ARM_SMMU_SH_ISH			3
-#define ARM_SMMU_MEMATTR_DEVICE_nGnRE	0x1
-#define ARM_SMMU_MEMATTR_OIWB		0xf
+#include <asm/arm-smmu-v3-regs.h>
 
 #define Q_IDX(llq, p)			((p) & ((1 << (llq)->max_n_shift) - 1))
 #define Q_WRP(llq, p)			((p) & (1 << (llq)->max_n_shift))
@@ -175,10 +24,6 @@
 					 Q_IDX(&((q)->llq), p) *	\
 					 (q)->ent_dwords)
 
-#define Q_BASE_RWA			(1UL << 62)
-#define Q_BASE_ADDR_MASK		GENMASK_ULL(51, 5)
-#define Q_BASE_LOG2SIZE			GENMASK(4, 0)
-
 /* Ensure DMA allocations are naturally aligned */
 #ifdef CONFIG_CMA_ALIGNMENT
 #define Q_MAX_SZ_SHIFT			(PAGE_SHIFT + CONFIG_CMA_ALIGNMENT)
@@ -186,132 +31,6 @@
 #define Q_MAX_SZ_SHIFT			(PAGE_SHIFT + MAX_ORDER - 1)
 #endif
 
-/*
- * Stream table.
- *
- * Linear: Enough to cover 1 << IDR1.SIDSIZE entries
- * 2lvl: 128k L1 entries,
- *       256 lazy entries per table (each table covers a PCI bus)
- */
-#define STRTAB_L1_SZ_SHIFT		20
-#define STRTAB_SPLIT			8
-
-#define STRTAB_L1_DESC_DWORDS		1
-#define STRTAB_L1_DESC_SPAN		GENMASK_ULL(4, 0)
-#define STRTAB_L1_DESC_L2PTR_MASK	GENMASK_ULL(51, 6)
-
-#define STRTAB_STE_DWORDS		8
-#define STRTAB_STE_0_V			(1UL << 0)
-#define STRTAB_STE_0_CFG		GENMASK_ULL(3, 1)
-#define STRTAB_STE_0_CFG_ABORT		0
-#define STRTAB_STE_0_CFG_BYPASS		4
-#define STRTAB_STE_0_CFG_S1_TRANS	5
-#define STRTAB_STE_0_CFG_S2_TRANS	6
-
-#define STRTAB_STE_0_S1FMT		GENMASK_ULL(5, 4)
-#define STRTAB_STE_0_S1FMT_LINEAR	0
-#define STRTAB_STE_0_S1FMT_64K_L2	2
-#define STRTAB_STE_0_S1CTXPTR_MASK	GENMASK_ULL(51, 6)
-#define STRTAB_STE_0_S1CDMAX		GENMASK_ULL(63, 59)
-
-#define STRTAB_STE_1_S1DSS		GENMASK_ULL(1, 0)
-#define STRTAB_STE_1_S1DSS_TERMINATE	0x0
-#define STRTAB_STE_1_S1DSS_BYPASS	0x1
-#define STRTAB_STE_1_S1DSS_SSID0	0x2
-
-#define STRTAB_STE_1_S1C_CACHE_NC	0UL
-#define STRTAB_STE_1_S1C_CACHE_WBRA	1UL
-#define STRTAB_STE_1_S1C_CACHE_WT	2UL
-#define STRTAB_STE_1_S1C_CACHE_WB	3UL
-#define STRTAB_STE_1_S1CIR		GENMASK_ULL(3, 2)
-#define STRTAB_STE_1_S1COR		GENMASK_ULL(5, 4)
-#define STRTAB_STE_1_S1CSH		GENMASK_ULL(7, 6)
-
-#define STRTAB_STE_1_S1STALLD		(1UL << 27)
-
-#define STRTAB_STE_1_EATS		GENMASK_ULL(29, 28)
-#define STRTAB_STE_1_EATS_ABT		0UL
-#define STRTAB_STE_1_EATS_TRANS		1UL
-#define STRTAB_STE_1_EATS_S1CHK		2UL
-
-#define STRTAB_STE_1_STRW		GENMASK_ULL(31, 30)
-#define STRTAB_STE_1_STRW_NSEL1		0UL
-#define STRTAB_STE_1_STRW_EL2		2UL
-
-#define STRTAB_STE_1_SHCFG		GENMASK_ULL(45, 44)
-#define STRTAB_STE_1_SHCFG_INCOMING	1UL
-
-#define STRTAB_STE_2_S2VMID		GENMASK_ULL(15, 0)
-#define STRTAB_STE_2_VTCR		GENMASK_ULL(50, 32)
-#define STRTAB_STE_2_VTCR_S2T0SZ	GENMASK_ULL(5, 0)
-#define STRTAB_STE_2_VTCR_S2SL0		GENMASK_ULL(7, 6)
-#define STRTAB_STE_2_VTCR_S2IR0		GENMASK_ULL(9, 8)
-#define STRTAB_STE_2_VTCR_S2OR0		GENMASK_ULL(11, 10)
-#define STRTAB_STE_2_VTCR_S2SH0		GENMASK_ULL(13, 12)
-#define STRTAB_STE_2_VTCR_S2TG		GENMASK_ULL(15, 14)
-#define STRTAB_STE_2_VTCR_S2PS		GENMASK_ULL(18, 16)
-#define STRTAB_STE_2_S2AA64		(1UL << 51)
-#define STRTAB_STE_2_S2ENDI		(1UL << 52)
-#define STRTAB_STE_2_S2PTW		(1UL << 54)
-#define STRTAB_STE_2_S2R		(1UL << 58)
-
-#define STRTAB_STE_3_S2TTB_MASK		GENMASK_ULL(51, 4)
-
-/*
- * Context descriptors.
- *
- * Linear: when less than 1024 SSIDs are supported
- * 2lvl: at most 1024 L1 entries,
- *       1024 lazy entries per table.
- */
-#define CTXDESC_SPLIT			10
-#define CTXDESC_L2_ENTRIES		(1 << CTXDESC_SPLIT)
-
-#define CTXDESC_L1_DESC_DWORDS		1
-#define CTXDESC_L1_DESC_V		(1UL << 0)
-#define CTXDESC_L1_DESC_L2PTR_MASK	GENMASK_ULL(51, 12)
-
-#define CTXDESC_CD_DWORDS		8
-#define CTXDESC_CD_0_TCR_T0SZ		GENMASK_ULL(5, 0)
-#define CTXDESC_CD_0_TCR_TG0		GENMASK_ULL(7, 6)
-#define CTXDESC_CD_0_TCR_IRGN0		GENMASK_ULL(9, 8)
-#define CTXDESC_CD_0_TCR_ORGN0		GENMASK_ULL(11, 10)
-#define CTXDESC_CD_0_TCR_SH0		GENMASK_ULL(13, 12)
-#define CTXDESC_CD_0_TCR_EPD0		(1ULL << 14)
-#define CTXDESC_CD_0_TCR_EPD1		(1ULL << 30)
-
-#define CTXDESC_CD_0_ENDI		(1UL << 15)
-#define CTXDESC_CD_0_V			(1UL << 31)
-
-#define CTXDESC_CD_0_TCR_IPS		GENMASK_ULL(34, 32)
-#define CTXDESC_CD_0_TCR_TBI0		(1ULL << 38)
-
-#define CTXDESC_CD_0_AA64		(1UL << 41)
-#define CTXDESC_CD_0_S			(1UL << 44)
-#define CTXDESC_CD_0_R			(1UL << 45)
-#define CTXDESC_CD_0_A			(1UL << 46)
-#define CTXDESC_CD_0_ASET		(1UL << 47)
-#define CTXDESC_CD_0_ASID		GENMASK_ULL(63, 48)
-
-#define CTXDESC_CD_1_TTB0_MASK		GENMASK_ULL(51, 4)
-
-/*
- * When the SMMU only supports linear context descriptor tables, pick a
- * reasonable size limit (64kB).
- */
-#define CTXDESC_LINEAR_CDMAX		ilog2(SZ_64K / (CTXDESC_CD_DWORDS << 3))
-
-/* Command queue */
-#define CMDQ_ENT_SZ_SHIFT		4
-#define CMDQ_ENT_DWORDS			((1 << CMDQ_ENT_SZ_SHIFT) >> 3)
-#define CMDQ_MAX_SZ_SHIFT		(Q_MAX_SZ_SHIFT - CMDQ_ENT_SZ_SHIFT)
-
-#define CMDQ_CONS_ERR			GENMASK(30, 24)
-#define CMDQ_ERR_CERROR_NONE_IDX	0
-#define CMDQ_ERR_CERROR_ILL_IDX		1
-#define CMDQ_ERR_CERROR_ABT_IDX		2
-#define CMDQ_ERR_CERROR_ATC_INV_IDX	3
-
 #define CMDQ_PROD_OWNED_FLAG		Q_OVERFLOW_FLAG
 
 /*
@@ -321,98 +40,11 @@
  */
 #define CMDQ_BATCH_ENTRIES		BITS_PER_LONG
 
-#define CMDQ_0_OP			GENMASK_ULL(7, 0)
-#define CMDQ_0_SSV			(1UL << 11)
-
-#define CMDQ_PREFETCH_0_SID		GENMASK_ULL(63, 32)
-#define CMDQ_PREFETCH_1_SIZE		GENMASK_ULL(4, 0)
-#define CMDQ_PREFETCH_1_ADDR_MASK	GENMASK_ULL(63, 12)
-
-#define CMDQ_CFGI_0_SSID		GENMASK_ULL(31, 12)
-#define CMDQ_CFGI_0_SID			GENMASK_ULL(63, 32)
-#define CMDQ_CFGI_1_LEAF		(1UL << 0)
-#define CMDQ_CFGI_1_RANGE		GENMASK_ULL(4, 0)
-
-#define CMDQ_TLBI_0_NUM			GENMASK_ULL(16, 12)
-#define CMDQ_TLBI_RANGE_NUM_MAX		31
-#define CMDQ_TLBI_0_SCALE		GENMASK_ULL(24, 20)
-#define CMDQ_TLBI_0_VMID		GENMASK_ULL(47, 32)
-#define CMDQ_TLBI_0_ASID		GENMASK_ULL(63, 48)
-#define CMDQ_TLBI_1_LEAF		(1UL << 0)
-#define CMDQ_TLBI_1_TTL			GENMASK_ULL(9, 8)
-#define CMDQ_TLBI_1_TG			GENMASK_ULL(11, 10)
-#define CMDQ_TLBI_1_VA_MASK		GENMASK_ULL(63, 12)
-#define CMDQ_TLBI_1_IPA_MASK		GENMASK_ULL(51, 12)
-
-#define CMDQ_ATC_0_SSID			GENMASK_ULL(31, 12)
-#define CMDQ_ATC_0_SID			GENMASK_ULL(63, 32)
-#define CMDQ_ATC_0_GLOBAL		(1UL << 9)
-#define CMDQ_ATC_1_SIZE			GENMASK_ULL(5, 0)
-#define CMDQ_ATC_1_ADDR_MASK		GENMASK_ULL(63, 12)
-
-#define CMDQ_PRI_0_SSID			GENMASK_ULL(31, 12)
-#define CMDQ_PRI_0_SID			GENMASK_ULL(63, 32)
-#define CMDQ_PRI_1_GRPID		GENMASK_ULL(8, 0)
-#define CMDQ_PRI_1_RESP			GENMASK_ULL(13, 12)
-
-#define CMDQ_RESUME_0_RESP_TERM		0UL
-#define CMDQ_RESUME_0_RESP_RETRY	1UL
-#define CMDQ_RESUME_0_RESP_ABORT	2UL
-#define CMDQ_RESUME_0_RESP		GENMASK_ULL(13, 12)
-#define CMDQ_RESUME_0_SID		GENMASK_ULL(63, 32)
-#define CMDQ_RESUME_1_STAG		GENMASK_ULL(15, 0)
-
-#define CMDQ_SYNC_0_CS			GENMASK_ULL(13, 12)
-#define CMDQ_SYNC_0_CS_NONE		0
-#define CMDQ_SYNC_0_CS_IRQ		1
-#define CMDQ_SYNC_0_CS_SEV		2
-#define CMDQ_SYNC_0_MSH			GENMASK_ULL(23, 22)
-#define CMDQ_SYNC_0_MSIATTR		GENMASK_ULL(27, 24)
-#define CMDQ_SYNC_0_MSIDATA		GENMASK_ULL(63, 32)
-#define CMDQ_SYNC_1_MSIADDR_MASK	GENMASK_ULL(51, 2)
-
-/* Event queue */
-#define EVTQ_ENT_SZ_SHIFT		5
-#define EVTQ_ENT_DWORDS			((1 << EVTQ_ENT_SZ_SHIFT) >> 3)
-#define EVTQ_MAX_SZ_SHIFT		(Q_MAX_SZ_SHIFT - EVTQ_ENT_SZ_SHIFT)
-
-#define EVTQ_0_ID			GENMASK_ULL(7, 0)
-
-#define EVT_ID_TRANSLATION_FAULT	0x10
-#define EVT_ID_ADDR_SIZE_FAULT		0x11
-#define EVT_ID_ACCESS_FAULT		0x12
-#define EVT_ID_PERMISSION_FAULT		0x13
-
-#define EVTQ_0_SSV			(1UL << 11)
-#define EVTQ_0_SSID			GENMASK_ULL(31, 12)
-#define EVTQ_0_SID			GENMASK_ULL(63, 32)
-#define EVTQ_1_STAG			GENMASK_ULL(15, 0)
-#define EVTQ_1_STALL			(1UL << 31)
-#define EVTQ_1_PnU			(1UL << 33)
-#define EVTQ_1_InD			(1UL << 34)
-#define EVTQ_1_RnW			(1UL << 35)
-#define EVTQ_1_S2			(1UL << 39)
-#define EVTQ_1_CLASS			GENMASK_ULL(41, 40)
-#define EVTQ_1_TT_READ			(1UL << 44)
-#define EVTQ_2_ADDR			GENMASK_ULL(63, 0)
-#define EVTQ_3_IPA			GENMASK_ULL(51, 12)
-
-/* PRI queue */
-#define PRIQ_ENT_SZ_SHIFT		4
-#define PRIQ_ENT_DWORDS			((1 << PRIQ_ENT_SZ_SHIFT) >> 3)
-#define PRIQ_MAX_SZ_SHIFT		(Q_MAX_SZ_SHIFT - PRIQ_ENT_SZ_SHIFT)
-
-#define PRIQ_0_SID			GENMASK_ULL(31, 0)
-#define PRIQ_0_SSID			GENMASK_ULL(51, 32)
-#define PRIQ_0_PERM_PRIV		(1UL << 58)
-#define PRIQ_0_PERM_EXEC		(1UL << 59)
-#define PRIQ_0_PERM_READ		(1UL << 60)
-#define PRIQ_0_PERM_WRITE		(1UL << 61)
-#define PRIQ_0_PRG_LAST			(1UL << 62)
-#define PRIQ_0_SSID_V			(1UL << 63)
-
-#define PRIQ_1_PRG_IDX			GENMASK_ULL(8, 0)
-#define PRIQ_1_ADDR_MASK		GENMASK_ULL(63, 12)
+/*
+ * When the SMMU only supports linear context descriptor tables, pick a
+ * reasonable size limit (64kB).
+ */
+#define CTXDESC_LINEAR_CDMAX		ilog2(SZ_64K / (CTXDESC_CD_DWORDS << 3))
 
 /* High-level queue structures */
 #define ARM_SMMU_POLL_TIMEOUT_US	1000000 /* 1s! */
@@ -421,88 +53,6 @@
 #define MSI_IOVA_BASE			0x8000000
 #define MSI_IOVA_LENGTH			0x100000
 
-enum pri_resp {
-	PRI_RESP_DENY = 0,
-	PRI_RESP_FAIL = 1,
-	PRI_RESP_SUCC = 2,
-};
-
-struct arm_smmu_cmdq_ent {
-	/* Common fields */
-	u8				opcode;
-	bool				substream_valid;
-
-	/* Command-specific fields */
-	union {
-		#define CMDQ_OP_PREFETCH_CFG	0x1
-		struct {
-			u32			sid;
-		} prefetch;
-
-		#define CMDQ_OP_CFGI_STE	0x3
-		#define CMDQ_OP_CFGI_ALL	0x4
-		#define CMDQ_OP_CFGI_CD		0x5
-		#define CMDQ_OP_CFGI_CD_ALL	0x6
-		struct {
-			u32			sid;
-			u32			ssid;
-			union {
-				bool		leaf;
-				u8		span;
-			};
-		} cfgi;
-
-		#define CMDQ_OP_TLBI_NH_ASID	0x11
-		#define CMDQ_OP_TLBI_NH_VA	0x12
-		#define CMDQ_OP_TLBI_EL2_ALL	0x20
-		#define CMDQ_OP_TLBI_EL2_ASID	0x21
-		#define CMDQ_OP_TLBI_EL2_VA	0x22
-		#define CMDQ_OP_TLBI_S12_VMALL	0x28
-		#define CMDQ_OP_TLBI_S2_IPA	0x2a
-		#define CMDQ_OP_TLBI_NSNH_ALL	0x30
-		struct {
-			u8			num;
-			u8			scale;
-			u16			asid;
-			u16			vmid;
-			bool			leaf;
-			u8			ttl;
-			u8			tg;
-			u64			addr;
-		} tlbi;
-
-		#define CMDQ_OP_ATC_INV		0x40
-		#define ATC_INV_SIZE_ALL	52
-		struct {
-			u32			sid;
-			u32			ssid;
-			u64			addr;
-			u8			size;
-			bool			global;
-		} atc;
-
-		#define CMDQ_OP_PRI_RESP	0x41
-		struct {
-			u32			sid;
-			u32			ssid;
-			u16			grpid;
-			enum pri_resp		resp;
-		} pri;
-
-		#define CMDQ_OP_RESUME		0x44
-		struct {
-			u32			sid;
-			u16			stag;
-			u8			resp;
-		} resume;
-
-		#define CMDQ_OP_CMD_SYNC	0x46
-		struct {
-			u64			msiaddr;
-		} sync;
-	};
-};
-
 struct arm_smmu_ll_queue {
 	union {
 		u64			val;
@@ -621,25 +171,6 @@ struct arm_smmu_device {
 	void __iomem			*base;
 	void __iomem			*page1;
 
-#define ARM_SMMU_FEAT_2_LVL_STRTAB	(1 << 0)
-#define ARM_SMMU_FEAT_2_LVL_CDTAB	(1 << 1)
-#define ARM_SMMU_FEAT_TT_LE		(1 << 2)
-#define ARM_SMMU_FEAT_TT_BE		(1 << 3)
-#define ARM_SMMU_FEAT_PRI		(1 << 4)
-#define ARM_SMMU_FEAT_ATS		(1 << 5)
-#define ARM_SMMU_FEAT_SEV		(1 << 6)
-#define ARM_SMMU_FEAT_MSI		(1 << 7)
-#define ARM_SMMU_FEAT_COHERENCY		(1 << 8)
-#define ARM_SMMU_FEAT_TRANS_S1		(1 << 9)
-#define ARM_SMMU_FEAT_TRANS_S2		(1 << 10)
-#define ARM_SMMU_FEAT_STALLS		(1 << 11)
-#define ARM_SMMU_FEAT_HYP		(1 << 12)
-#define ARM_SMMU_FEAT_STALL_FORCE	(1 << 13)
-#define ARM_SMMU_FEAT_VAX		(1 << 14)
-#define ARM_SMMU_FEAT_RANGE_INV		(1 << 15)
-#define ARM_SMMU_FEAT_BTM		(1 << 16)
-#define ARM_SMMU_FEAT_SVA		(1 << 17)
-#define ARM_SMMU_FEAT_E2H		(1 << 18)
 	u32				features;
 
 #define ARM_SMMU_OPT_SKIP_PREFETCH	(1 << 0)
-- 
2.39.0



^ permalink raw reply related	[flat|nested] 201+ messages in thread

* [RFC PATCH 08/45] KVM: arm64: pkvm: Add pkvm_udelay()
  2023-02-01 12:52 ` Jean-Philippe Brucker
@ 2023-02-01 12:52   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:52 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

Add a simple delay loop for drivers.

This could use more work. It should be possible to insert a wfe and save
power, but I haven't studied whether it is safe to do so with the host
in control of the event stream. The SMMU driver will use wfe anyway for
frequent waits (provided the implementation can send command queue
events).
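
As a rough usage sketch (not part of this patch), a hypervisor driver would
typically wrap pkvm_udelay() in a bounded poll loop; the base pointer,
register offset and bit name below are placeholders:

    int retries = 10;

    /* Poll an (assumed) ready bit for up to ~100us, in 10us steps */
    while (!(readl_relaxed(base + DEV_STATUS) & DEV_STATUS_READY)) {
            if (!retries--)
                    return -ETIMEDOUT;
            pkvm_udelay(10);
    }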

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 arch/arm64/kvm/hyp/include/nvhe/pkvm.h |  3 ++
 arch/arm64/kvm/hyp/nvhe/setup.c        |  4 +++
 arch/arm64/kvm/hyp/nvhe/timer-sr.c     | 43 ++++++++++++++++++++++++++
 3 files changed, 50 insertions(+)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
index 6160d1a34fa2..746dc1c05a8e 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -109,4 +109,7 @@ bool kvm_handle_pvm_hvc64(struct kvm_vcpu *vcpu, u64 *exit_code);
 
 struct pkvm_hyp_vcpu *pkvm_mpidr_to_hyp_vcpu(struct pkvm_hyp_vm *vm, u64 mpidr);
 
+int pkvm_timer_init(void);
+void pkvm_udelay(unsigned long usecs);
+
 #endif /* __ARM64_KVM_NVHE_PKVM_H__ */
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 8a357637ce81..629e74c46b35 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -300,6 +300,10 @@ void __noreturn __pkvm_init_finalise(void)
 	};
 	pkvm_pgtable.mm_ops = &pkvm_pgtable_mm_ops;
 
+	ret = pkvm_timer_init();
+	if (ret)
+		goto out;
+
 	ret = fix_host_ownership();
 	if (ret)
 		goto out;
diff --git a/arch/arm64/kvm/hyp/nvhe/timer-sr.c b/arch/arm64/kvm/hyp/nvhe/timer-sr.c
index 9072e71693ba..202df9003a0d 100644
--- a/arch/arm64/kvm/hyp/nvhe/timer-sr.c
+++ b/arch/arm64/kvm/hyp/nvhe/timer-sr.c
@@ -10,6 +10,10 @@
 
 #include <asm/kvm_hyp.h>
 
+#include <nvhe/pkvm.h>
+
+static u32 timer_freq;
+
 void __kvm_timer_set_cntvoff(u64 cntvoff)
 {
 	write_sysreg(cntvoff, cntvoff_el2);
@@ -46,3 +50,42 @@ void __timer_enable_traps(struct kvm_vcpu *vcpu)
 	val |= CNTHCTL_EL1PCTEN;
 	write_sysreg(val, cnthctl_el2);
 }
+
+static u64 pkvm_ticks_get(void)
+{
+	return __arch_counter_get_cntvct();
+}
+
+#define SEC_TO_US 1000000
+
+int pkvm_timer_init(void)
+{
+	timer_freq = read_sysreg(cntfrq_el0);
+	/*
+	 * TODO: The highest privileged level is supposed to initialize this
+	 * register. But on some systems (which?), this information is only
+	 * contained in the device-tree, so we'll need to find it out some other
+	 * way.
+	 */
+	if (!timer_freq || timer_freq < SEC_TO_US)
+		return -ENODEV;
+	return 0;
+}
+
+
+#define pkvm_time_us_to_ticks(us) ((u64)(us) * timer_freq / SEC_TO_US)
+
+void pkvm_udelay(unsigned long usecs)
+{
+	u64 ticks = pkvm_time_us_to_ticks(usecs);
+	u64 start = pkvm_ticks_get();
+
+	while (true) {
+		u64 cur = pkvm_ticks_get();
+
+		if ((cur - start) >= ticks || cur < start)
+			break;
+		/* TODO wfe */
+		cpu_relax();
+	}
+}
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 201+ messages in thread

* [RFC PATCH 09/45] KVM: arm64: pkvm: Add pkvm_create_hyp_device_mapping()
  2023-02-01 12:52 ` Jean-Philippe Brucker
@ 2023-02-01 12:52   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:52 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

Add a function to map an MMIO region into the hypervisor and remove it from
the host. Hypervisor device drivers use this to reserve their regions
during setup.
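
For illustration only, a hypervisor driver could reserve its register window
during setup roughly as below. The names are made up, and the output pointer
is cast to match the prototype added by this patch, which forwards it to
__pkvm_create_private_mapping():

    unsigned long hyp_va;
    int ret;

    /* smmu_mmio_base/smmu_mmio_size are assumed platform parameters */
    ret = pkvm_create_hyp_device_mapping(smmu_mmio_base, smmu_mmio_size,
                                         (void __iomem *)&hyp_va);
    if (ret)
            return ret;
    /* The host can no longer access this MMIO range; hyp uses hyp_va */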

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 arch/arm64/kvm/hyp/include/nvhe/mm.h |  1 +
 arch/arm64/kvm/hyp/nvhe/setup.c      | 17 +++++++++++++++++
 2 files changed, 18 insertions(+)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mm.h b/arch/arm64/kvm/hyp/include/nvhe/mm.h
index d5ec972b5c1e..84db840f2057 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mm.h
@@ -27,5 +27,6 @@ int __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
 				  enum kvm_pgtable_prot prot,
 				  unsigned long *haddr);
 int pkvm_alloc_private_va_range(size_t size, unsigned long *haddr);
+int pkvm_create_hyp_device_mapping(u64 base, u64 size, void __iomem *haddr);
 
 #endif /* __KVM_HYP_MM_H */
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 629e74c46b35..de7d60c3c20b 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -259,6 +259,23 @@ static int fix_host_ownership(void)
 	return 0;
 }
 
+/* Map the MMIO region into the hypervisor and remove it from the host */
+int pkvm_create_hyp_device_mapping(u64 base, u64 size, void __iomem *haddr)
+{
+	int ret;
+
+	ret = __pkvm_create_private_mapping(base, size, PAGE_HYP_DEVICE, haddr);
+	if (ret)
+		return ret;
+
+	/* lock not needed during setup */
+	ret = host_stage2_set_owner_locked(base, size, PKVM_ID_HYP);
+	if (ret)
+		return ret;
+
+	return ret;
+}
+
 static int fix_hyp_pgtable_refcnt(void)
 {
 	struct kvm_pgtable_walker walker = {
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 201+ messages in thread

* [RFC PATCH 10/45] KVM: arm64: pkvm: Expose pkvm_map/unmap_donated_memory()
  2023-02-01 12:52 ` Jean-Philippe Brucker
@ 2023-02-01 12:52   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:52 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

Allow the IOMMU driver to use pkvm_map/unmap_donated_memory().
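
As an illustrative sketch of the intended use (the descriptor type and
variable names are placeholders, not part of this series):

    struct hyp_iommu_desc *desc;	/* hypothetical structure */

    desc = pkvm_map_donated_memory(host_va, sizeof(*desc));
    if (!desc)
            return -ENOMEM;

    /* ... consume the host-donated descriptor ... */

    pkvm_unmap_donated_memory(desc, sizeof(*desc));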

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  3 +++
 arch/arm64/kvm/hyp/nvhe/pkvm.c                | 18 +++++++++---------
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 38e5e9b259fc..40decbe4cc70 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -86,6 +86,9 @@ void reclaim_guest_pages(struct pkvm_hyp_vm *vm, struct kvm_hyp_memcache *mc);
 int refill_memcache(struct kvm_hyp_memcache *mc, unsigned long min_pages,
 		    struct kvm_hyp_memcache *host_mc);
 
+void *pkvm_map_donated_memory(unsigned long host_va, size_t size);
+void pkvm_unmap_donated_memory(void *va, size_t size);
+
 static __always_inline void __load_host_stage2(void)
 {
 	if (static_branch_likely(&kvm_protected_mode_initialized))
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 905c05c7e9bf..a3711979bbd3 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -592,7 +592,7 @@ static void *map_donated_memory_noclear(unsigned long host_va, size_t size)
 	return va;
 }
 
-static void *map_donated_memory(unsigned long host_va, size_t size)
+void *pkvm_map_donated_memory(unsigned long host_va, size_t size)
 {
 	void *va = map_donated_memory_noclear(host_va, size);
 
@@ -608,7 +608,7 @@ static void __unmap_donated_memory(void *va, size_t size)
 				       PAGE_ALIGN(size) >> PAGE_SHIFT));
 }
 
-static void unmap_donated_memory(void *va, size_t size)
+void pkvm_unmap_donated_memory(void *va, size_t size)
 {
 	if (!va)
 		return;
@@ -668,11 +668,11 @@ int __pkvm_init_vm(struct kvm *host_kvm, unsigned long vm_hva,
 
 	ret = -ENOMEM;
 
-	hyp_vm = map_donated_memory(vm_hva, vm_size);
+	hyp_vm = pkvm_map_donated_memory(vm_hva, vm_size);
 	if (!hyp_vm)
 		goto err_remove_mappings;
 
-	last_ran = map_donated_memory(last_ran_hva, last_ran_size);
+	last_ran = pkvm_map_donated_memory(last_ran_hva, last_ran_size);
 	if (!last_ran)
 		goto err_remove_mappings;
 
@@ -699,9 +699,9 @@ int __pkvm_init_vm(struct kvm *host_kvm, unsigned long vm_hva,
 err_unlock:
 	hyp_spin_unlock(&vm_table_lock);
 err_remove_mappings:
-	unmap_donated_memory(hyp_vm, vm_size);
-	unmap_donated_memory(last_ran, last_ran_size);
-	unmap_donated_memory(pgd, pgd_size);
+	pkvm_unmap_donated_memory(hyp_vm, vm_size);
+	pkvm_unmap_donated_memory(last_ran, last_ran_size);
+	pkvm_unmap_donated_memory(pgd, pgd_size);
 err_unpin_kvm:
 	hyp_unpin_shared_mem(host_kvm, host_kvm + 1);
 	return ret;
@@ -726,7 +726,7 @@ int __pkvm_init_vcpu(pkvm_handle_t handle, struct kvm_vcpu *host_vcpu,
 	unsigned int idx;
 	int ret;
 
-	hyp_vcpu = map_donated_memory(vcpu_hva, sizeof(*hyp_vcpu));
+	hyp_vcpu = pkvm_map_donated_memory(vcpu_hva, sizeof(*hyp_vcpu));
 	if (!hyp_vcpu)
 		return -ENOMEM;
 
@@ -754,7 +754,7 @@ int __pkvm_init_vcpu(pkvm_handle_t handle, struct kvm_vcpu *host_vcpu,
 	hyp_spin_unlock(&vm_table_lock);
 
 	if (ret)
-		unmap_donated_memory(hyp_vcpu, sizeof(*hyp_vcpu));
+		pkvm_unmap_donated_memory(hyp_vcpu, sizeof(*hyp_vcpu));
 
 	return ret;
 }
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 201+ messages in thread

* [RFC PATCH 11/45] KVM: arm64: pkvm: Expose pkvm_admit_host_page()
  2023-02-01 12:52 ` Jean-Philippe Brucker
@ 2023-02-01 12:52   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:52 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

Since the IOMMU driver will need admit_host_page(), make it non-static.
As a result we can drop refill_memcache() and call admit_host_page()
directly from pkvm_refill_memcache().
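
For illustration, the IOMMU driver is expected to pull pages out of a
host-provided memcache along these lines (the surrounding names are
placeholders):

    /* Work on a copy so the host can't change mc->head under our feet */
    struct kvm_hyp_memcache mc = *host_mc;
    void *page;

    page = pkvm_admit_host_page(&mc);
    if (!page)
            return -ENOMEM;
    /* 'page' is now hyp-owned and can back e.g. an IOMMU page table */
    *host_mc = mc;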

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  2 --
 arch/arm64/kvm/hyp/include/nvhe/mm.h          |  1 +
 arch/arm64/kvm/hyp/nvhe/hyp-main.c            | 16 ++++++++---
 arch/arm64/kvm/hyp/nvhe/mm.c                  | 27 +++++--------------
 4 files changed, 20 insertions(+), 26 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 40decbe4cc70..d4f4ffbb7dbb 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -83,8 +83,6 @@ void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt);
 int hyp_pin_shared_mem(void *from, void *to);
 void hyp_unpin_shared_mem(void *from, void *to);
 void reclaim_guest_pages(struct pkvm_hyp_vm *vm, struct kvm_hyp_memcache *mc);
-int refill_memcache(struct kvm_hyp_memcache *mc, unsigned long min_pages,
-		    struct kvm_hyp_memcache *host_mc);
 
 void *pkvm_map_donated_memory(unsigned long host_va, size_t size);
 void pkvm_unmap_donated_memory(void *va, size_t size);
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mm.h b/arch/arm64/kvm/hyp/include/nvhe/mm.h
index 84db840f2057..a8c46a0ebc4a 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mm.h
@@ -26,6 +26,7 @@ int pkvm_create_mappings_locked(void *from, void *to, enum kvm_pgtable_prot prot
 int __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
 				  enum kvm_pgtable_prot prot,
 				  unsigned long *haddr);
+void *pkvm_admit_host_page(struct kvm_hyp_memcache *mc);
 int pkvm_alloc_private_va_range(size_t size, unsigned long *haddr);
 int pkvm_create_hyp_device_mapping(u64 base, u64 size, void __iomem *haddr);
 
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index e8328f54200e..29ce7b09edbb 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -766,14 +766,24 @@ static void handle___kvm_vcpu_run(struct kvm_cpu_context *host_ctxt)
 	cpu_reg(host_ctxt, 1) =  ret;
 }
 
+static void *admit_host_page(void *arg)
+{
+	return pkvm_admit_host_page(arg);
+}
+
 static int pkvm_refill_memcache(struct pkvm_hyp_vcpu *hyp_vcpu)
 {
+	int ret;
 	struct pkvm_hyp_vm *hyp_vm = pkvm_hyp_vcpu_to_hyp_vm(hyp_vcpu);
 	u64 nr_pages = VTCR_EL2_LVLS(hyp_vm->kvm.arch.vtcr) - 1;
-	struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
+	struct kvm_hyp_memcache host_mc = hyp_vcpu->host_vcpu->arch.pkvm_memcache;
+
+	ret =  __topup_hyp_memcache(&hyp_vcpu->vcpu.arch.pkvm_memcache,
+				    nr_pages, admit_host_page,
+				    hyp_virt_to_phys, &host_mc);
 
-	return refill_memcache(&hyp_vcpu->vcpu.arch.pkvm_memcache, nr_pages,
-			       &host_vcpu->arch.pkvm_memcache);
+	hyp_vcpu->host_vcpu->arch.pkvm_memcache = host_mc;
+	return ret;
 }
 
 static void handle___pkvm_host_map_guest(struct kvm_cpu_context *host_ctxt)
diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
index 318298eb3d6b..9daaf2b2b191 100644
--- a/arch/arm64/kvm/hyp/nvhe/mm.c
+++ b/arch/arm64/kvm/hyp/nvhe/mm.c
@@ -340,35 +340,20 @@ int hyp_create_idmap(u32 hyp_va_bits)
 	return __pkvm_create_mappings(start, end - start, start, PAGE_HYP_EXEC);
 }
 
-static void *admit_host_page(void *arg)
+void *pkvm_admit_host_page(struct kvm_hyp_memcache *mc)
 {
-	struct kvm_hyp_memcache *host_mc = arg;
-
-	if (!host_mc->nr_pages)
+	if (!mc->nr_pages)
 		return NULL;
 
 	/*
 	 * The host still owns the pages in its memcache, so we need to go
 	 * through a full host-to-hyp donation cycle to change it. Fortunately,
 	 * __pkvm_host_donate_hyp() takes care of races for us, so if it
-	 * succeeds we're good to go.
+	 * succeeds we're good to go. Because mc is a copy of the memcache
+	 * struct, the host cannot modify mc->head between donate and pop.
 	 */
-	if (__pkvm_host_donate_hyp(hyp_phys_to_pfn(host_mc->head), 1))
+	if (__pkvm_host_donate_hyp(hyp_phys_to_pfn(mc->head), 1))
 		return NULL;
 
-	return pop_hyp_memcache(host_mc, hyp_phys_to_virt);
-}
-
-/* Refill our local memcache by poping pages from the one provided by the host. */
-int refill_memcache(struct kvm_hyp_memcache *mc, unsigned long min_pages,
-		    struct kvm_hyp_memcache *host_mc)
-{
-	struct kvm_hyp_memcache tmp = *host_mc;
-	int ret;
-
-	ret =  __topup_hyp_memcache(mc, min_pages, admit_host_page,
-				    hyp_virt_to_phys, &tmp);
-	*host_mc = tmp;
-
-	return ret;
+	return pop_hyp_memcache(mc, hyp_phys_to_virt);
 }
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 201+ messages in thread

* [RFC PATCH 12/45] KVM: arm64: pkvm: Unify pkvm_teardown_donated_memory()
  2023-02-01 12:52 ` Jean-Philippe Brucker
@ 2023-02-01 12:52   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:52 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

Tearing down donated memory requires clearing the memory, pushing the
pages into the reclaim memcache, and moving the mapping into the host
stage-2. Keep these operations in a single function.
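
As a rough sketch of the intended call sites (the object names and sizes are
placeholders):

    /* Wipe the dirty bytes and queue the pages for host reclaim */
    pkvm_teardown_donated_memory(mc, hyp_dev, sizeof(*hyp_dev));

    /* Without a reclaim memcache, the existing helper keeps working */
    pkvm_unmap_donated_memory(hyp_buf, buf_size);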

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  2 +
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         |  3 +-
 arch/arm64/kvm/hyp/nvhe/pkvm.c                | 50 +++++++------------
 3 files changed, 22 insertions(+), 33 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index d4f4ffbb7dbb..021825aee854 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -86,6 +86,8 @@ void reclaim_guest_pages(struct pkvm_hyp_vm *vm, struct kvm_hyp_memcache *mc);
 
 void *pkvm_map_donated_memory(unsigned long host_va, size_t size);
 void pkvm_unmap_donated_memory(void *va, size_t size);
+void pkvm_teardown_donated_memory(struct kvm_hyp_memcache *mc, void *addr,
+				  size_t dirty_size);
 
 static __always_inline void __load_host_stage2(void)
 {
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 410361f41e38..cad5736026d5 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -314,8 +314,7 @@ void reclaim_guest_pages(struct pkvm_hyp_vm *vm, struct kvm_hyp_memcache *mc)
 	addr = hyp_alloc_pages(&vm->pool, 0);
 	while (addr) {
 		memset(hyp_virt_to_page(addr), 0, sizeof(struct hyp_page));
-		push_hyp_memcache(mc, addr, hyp_virt_to_phys);
-		WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(addr), 1));
+		pkvm_teardown_donated_memory(mc, addr, 0);
 		addr = hyp_alloc_pages(&vm->pool, 0);
 	}
 }
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index a3711979bbd3..c51a8a592849 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -602,27 +602,28 @@ void *pkvm_map_donated_memory(unsigned long host_va, size_t size)
 	return va;
 }
 
-static void __unmap_donated_memory(void *va, size_t size)
+void pkvm_teardown_donated_memory(struct kvm_hyp_memcache *mc, void *va,
+				  size_t dirty_size)
 {
-	WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(va),
-				       PAGE_ALIGN(size) >> PAGE_SHIFT));
-}
+	size_t size = max(PAGE_ALIGN(dirty_size), PAGE_SIZE);
 
-void pkvm_unmap_donated_memory(void *va, size_t size)
-{
 	if (!va)
 		return;
 
-	memset(va, 0, size);
-	__unmap_donated_memory(va, size);
+	memset(va, 0, dirty_size);
+
+	if (mc) {
+		for (void *start = va; start < va + size; start += PAGE_SIZE)
+			push_hyp_memcache(mc, start, hyp_virt_to_phys);
+	}
+
+	WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(va),
+				       size >> PAGE_SHIFT));
 }
 
-static void unmap_donated_memory_noclear(void *va, size_t size)
+void pkvm_unmap_donated_memory(void *va, size_t size)
 {
-	if (!va)
-		return;
-
-	__unmap_donated_memory(va, size);
+	pkvm_teardown_donated_memory(NULL, va, size);
 }
 
 /*
@@ -759,18 +760,6 @@ int __pkvm_init_vcpu(pkvm_handle_t handle, struct kvm_vcpu *host_vcpu,
 	return ret;
 }
 
-static void
-teardown_donated_memory(struct kvm_hyp_memcache *mc, void *addr, size_t size)
-{
-	size = PAGE_ALIGN(size);
-	memset(addr, 0, size);
-
-	for (void *start = addr; start < addr + size; start += PAGE_SIZE)
-		push_hyp_memcache(mc, start, hyp_virt_to_phys);
-
-	unmap_donated_memory_noclear(addr, size);
-}
-
 int __pkvm_teardown_vm(pkvm_handle_t handle)
 {
 	size_t vm_size, last_ran_size;
@@ -813,19 +802,18 @@ int __pkvm_teardown_vm(pkvm_handle_t handle)
 		vcpu_mc = &hyp_vcpu->vcpu.arch.pkvm_memcache;
 		while (vcpu_mc->nr_pages) {
 			addr = pop_hyp_memcache(vcpu_mc, hyp_phys_to_virt);
-			push_hyp_memcache(mc, addr, hyp_virt_to_phys);
-			unmap_donated_memory_noclear(addr, PAGE_SIZE);
+			pkvm_teardown_donated_memory(mc, addr, 0);
 		}
 
-		teardown_donated_memory(mc, hyp_vcpu, sizeof(*hyp_vcpu));
+		pkvm_teardown_donated_memory(mc, hyp_vcpu, sizeof(*hyp_vcpu));
 	}
 
 	last_ran_size = pkvm_get_last_ran_size();
-	teardown_donated_memory(mc, hyp_vm->kvm.arch.mmu.last_vcpu_ran,
-				last_ran_size);
+	pkvm_teardown_donated_memory(mc, hyp_vm->kvm.arch.mmu.last_vcpu_ran,
+				     last_ran_size);
 
 	vm_size = pkvm_get_hyp_vm_size(hyp_vm->kvm.created_vcpus);
-	teardown_donated_memory(mc, hyp_vm, vm_size);
+	pkvm_teardown_donated_memory(mc, hyp_vm, vm_size);
 	hyp_unpin_shared_mem(host_kvm, host_kvm + 1);
 	return 0;
 
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 201+ messages in thread

* [RFC PATCH 12/45] KVM: arm64: pkvm: Unify pkvm_teardown_donated_memory()
@ 2023-02-01 12:52   ` Jean-Philippe Brucker
  0 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:52 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

Tearing down donated memory requires clearing the memory, pushing the
pages into the reclaim memcache, and moving the mapping into the host
stage-2. Keep these operations in a single function.
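
For illustration only (this mirrors the hunks below rather than adding
new code), each teardown path now reduces to a single call per donated
object:

  /*
   * Illustrative sketch: clear a donated object, push its pages into
   * the reclaim memcache 'mc' and hand them back to the host stage-2.
   */
  pkvm_teardown_donated_memory(mc, hyp_vcpu, sizeof(*hyp_vcpu));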

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  2 +
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         |  3 +-
 arch/arm64/kvm/hyp/nvhe/pkvm.c                | 50 +++++++------------
 3 files changed, 22 insertions(+), 33 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index d4f4ffbb7dbb..021825aee854 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -86,6 +86,8 @@ void reclaim_guest_pages(struct pkvm_hyp_vm *vm, struct kvm_hyp_memcache *mc);
 
 void *pkvm_map_donated_memory(unsigned long host_va, size_t size);
 void pkvm_unmap_donated_memory(void *va, size_t size);
+void pkvm_teardown_donated_memory(struct kvm_hyp_memcache *mc, void *addr,
+				  size_t dirty_size);
 
 static __always_inline void __load_host_stage2(void)
 {
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 410361f41e38..cad5736026d5 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -314,8 +314,7 @@ void reclaim_guest_pages(struct pkvm_hyp_vm *vm, struct kvm_hyp_memcache *mc)
 	addr = hyp_alloc_pages(&vm->pool, 0);
 	while (addr) {
 		memset(hyp_virt_to_page(addr), 0, sizeof(struct hyp_page));
-		push_hyp_memcache(mc, addr, hyp_virt_to_phys);
-		WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(addr), 1));
+		pkvm_teardown_donated_memory(mc, addr, 0);
 		addr = hyp_alloc_pages(&vm->pool, 0);
 	}
 }
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index a3711979bbd3..c51a8a592849 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -602,27 +602,28 @@ void *pkvm_map_donated_memory(unsigned long host_va, size_t size)
 	return va;
 }
 
-static void __unmap_donated_memory(void *va, size_t size)
+void pkvm_teardown_donated_memory(struct kvm_hyp_memcache *mc, void *va,
+				  size_t dirty_size)
 {
-	WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(va),
-				       PAGE_ALIGN(size) >> PAGE_SHIFT));
-}
+	size_t size = max(PAGE_ALIGN(dirty_size), PAGE_SIZE);
 
-void pkvm_unmap_donated_memory(void *va, size_t size)
-{
 	if (!va)
 		return;
 
-	memset(va, 0, size);
-	__unmap_donated_memory(va, size);
+	memset(va, 0, dirty_size);
+
+	if (mc) {
+		for (void *start = va; start < va + size; start += PAGE_SIZE)
+			push_hyp_memcache(mc, start, hyp_virt_to_phys);
+	}
+
+	WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(va),
+				       size >> PAGE_SHIFT));
 }
 
-static void unmap_donated_memory_noclear(void *va, size_t size)
+void pkvm_unmap_donated_memory(void *va, size_t size)
 {
-	if (!va)
-		return;
-
-	__unmap_donated_memory(va, size);
+	pkvm_teardown_donated_memory(NULL, va, size);
 }
 
 /*
@@ -759,18 +760,6 @@ int __pkvm_init_vcpu(pkvm_handle_t handle, struct kvm_vcpu *host_vcpu,
 	return ret;
 }
 
-static void
-teardown_donated_memory(struct kvm_hyp_memcache *mc, void *addr, size_t size)
-{
-	size = PAGE_ALIGN(size);
-	memset(addr, 0, size);
-
-	for (void *start = addr; start < addr + size; start += PAGE_SIZE)
-		push_hyp_memcache(mc, start, hyp_virt_to_phys);
-
-	unmap_donated_memory_noclear(addr, size);
-}
-
 int __pkvm_teardown_vm(pkvm_handle_t handle)
 {
 	size_t vm_size, last_ran_size;
@@ -813,19 +802,18 @@ int __pkvm_teardown_vm(pkvm_handle_t handle)
 		vcpu_mc = &hyp_vcpu->vcpu.arch.pkvm_memcache;
 		while (vcpu_mc->nr_pages) {
 			addr = pop_hyp_memcache(vcpu_mc, hyp_phys_to_virt);
-			push_hyp_memcache(mc, addr, hyp_virt_to_phys);
-			unmap_donated_memory_noclear(addr, PAGE_SIZE);
+			pkvm_teardown_donated_memory(mc, addr, 0);
 		}
 
-		teardown_donated_memory(mc, hyp_vcpu, sizeof(*hyp_vcpu));
+		pkvm_teardown_donated_memory(mc, hyp_vcpu, sizeof(*hyp_vcpu));
 	}
 
 	last_ran_size = pkvm_get_last_ran_size();
-	teardown_donated_memory(mc, hyp_vm->kvm.arch.mmu.last_vcpu_ran,
-				last_ran_size);
+	pkvm_teardown_donated_memory(mc, hyp_vm->kvm.arch.mmu.last_vcpu_ran,
+				     last_ran_size);
 
 	vm_size = pkvm_get_hyp_vm_size(hyp_vm->kvm.created_vcpus);
-	teardown_donated_memory(mc, hyp_vm, vm_size);
+	pkvm_teardown_donated_memory(mc, hyp_vm, vm_size);
 	hyp_unpin_shared_mem(host_kvm, host_kvm + 1);
 	return 0;
 
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 201+ messages in thread

* [RFC PATCH 13/45] KVM: arm64: pkvm: Add hyp_page_ref_inc_return()
  2023-02-01 12:52 ` Jean-Philippe Brucker
@ 2023-02-01 12:52   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:52 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

Add a page_ref_inc() variant that returns an error on saturation instead
of BUG()ing. The IOMMU API places no limit on the number of times a page
can be mapped, but pKVM's refcount saturates at 2^16, so fail gracefully
when that limit is reached.
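
As a usage sketch (mirroring how a later patch in this series uses the
helper, not new code), a mapping path can now propagate saturation
instead of crashing:

  /* Illustrative: take a page reference, bailing out on saturation. */
  ret = hyp_page_ref_inc_return(hyp_phys_to_page(phys_addr));
  if (ret < 0)
          return ret; /* refcount already saturated at USHRT_MAX */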

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 arch/arm64/kvm/hyp/include/nvhe/memory.h | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/arch/arm64/kvm/hyp/include/nvhe/memory.h
index a8d4a5b919d2..c40fff5d6d22 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
@@ -57,10 +57,21 @@ static inline int hyp_page_count(void *addr)
 	return p->refcount;
 }
 
+/*
+ * Increase the refcount and return its new value.
+ * If the refcount is saturated, return a negative error
+ */
+static inline int hyp_page_ref_inc_return(struct hyp_page *p)
+{
+	if (p->refcount == USHRT_MAX)
+		return -EOVERFLOW;
+
+	return ++p->refcount;
+}
+
 static inline void hyp_page_ref_inc(struct hyp_page *p)
 {
-	BUG_ON(p->refcount == USHRT_MAX);
-	p->refcount++;
+	BUG_ON(hyp_page_ref_inc_return(p) <= 0);
 }
 
 static inline void hyp_page_ref_dec(struct hyp_page *p)
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 201+ messages in thread

* [RFC PATCH 14/45] KVM: arm64: pkvm: Prevent host donation of device memory
  2023-02-01 12:52 ` Jean-Philippe Brucker
@ 2023-02-01 12:52   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:52 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

Donating device memory cannot be supported for the moment. IOMMU support
requires tracking host-owned pages that are mapped in the IOMMU, but the
vmemmap portion of MMIO is not backed by physical pages, and ownership
information in the host stage-2 page tables is not preserved by
host_stage2_try().

__check_page_state_visitor() already ensures that MMIO pages present in
the host stage-2 are not donated, so extend that check to pages that the
host hasn't accessed yet (typical of an MSI doorbell) or that have been
recycled by host_stage2_try().
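
For illustration (a hedged sketch, not part of the patch): with this
check in place, a host attempt to donate a device page, for example an
MSI doorbell frame it has never touched, is refused early. 'mmio_pfn'
below is a hypothetical page frame number for such a doorbell:

  /* Illustrative only: donating non-RAM memory now fails with -EINVAL. */
  ret = __pkvm_host_donate_hyp(mmio_pfn, 1);
  /* host_request_owned_transition() rejects the non-memory range */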

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 arch/arm64/kvm/hyp/nvhe/mem_protect.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index cad5736026d5..856673291d70 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -719,6 +719,10 @@ static int host_request_owned_transition(u64 *completer_addr,
 	u64 size = tx->nr_pages * PAGE_SIZE;
 	u64 addr = tx->initiator.addr;
 
+	/* We don't support donating device memory at the moment */
+	if (!range_is_memory(addr, addr + size))
+		return -EINVAL;
+
 	*completer_addr = tx->initiator.host.completer_addr;
 	return __host_check_page_state_range(addr, size, PKVM_PAGE_OWNED);
 }
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 201+ messages in thread

* [RFC PATCH 15/45] KVM: arm64: pkvm: Add __pkvm_host_share/unshare_dma()
  2023-02-01 12:52 ` Jean-Philippe Brucker
@ 2023-02-01 12:52   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:52 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

Host pages mapped in the SMMU must not be donated to the guest or
hypervisor, since the host could then use DMA to break confidentiality.
Mark them shared in the host stage-2 page tables, and keep a refcount in
the hyp vmemmap.
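
As a rough sketch of the intended calling convention (illustrative
only; the hypervisor IOMMU map/unmap paths that use it come later in
the series):

  /* Illustrative: mark the pages as shared for DMA before mapping. */
  ret = __pkvm_host_share_dma(paddr, size, is_ram);
  if (ret)
          return ret;

  /* ... install the IOMMU mapping (added in later patches) ... */

  /* Once the last IOMMU mapping is gone, release the pages: */
  __pkvm_host_unshare_dma(paddr, size);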

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |   3 +
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 185 ++++++++++++++++++
 2 files changed, 188 insertions(+)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 021825aee854..a363d58a998b 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -58,6 +58,7 @@ enum pkvm_component_id {
 	PKVM_ID_HOST,
 	PKVM_ID_HYP,
 	PKVM_ID_GUEST,
+	PKVM_ID_IOMMU,
 };
 
 extern unsigned long hyp_nr_cpus;
@@ -72,6 +73,8 @@ int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu);
 int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu);
 int __pkvm_guest_share_host(struct pkvm_hyp_vcpu *hyp_vcpu, u64 ipa);
 int __pkvm_guest_unshare_host(struct pkvm_hyp_vcpu *hyp_vcpu, u64 ipa);
+int __pkvm_host_share_dma(u64 phys_addr, size_t size, bool is_ram);
+int __pkvm_host_unshare_dma(u64 phys_addr, size_t size);
 
 bool addr_is_memory(phys_addr_t phys);
 int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot prot);
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 856673291d70..dcf08ce03790 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -1148,6 +1148,9 @@ static int check_share(struct pkvm_mem_share *share)
 	case PKVM_ID_GUEST:
 		ret = guest_ack_share(completer_addr, tx, share->completer_prot);
 		break;
+	case PKVM_ID_IOMMU:
+		ret = 0;
+		break;
 	default:
 		ret = -EINVAL;
 	}
@@ -1185,6 +1188,9 @@ static int __do_share(struct pkvm_mem_share *share)
 	case PKVM_ID_GUEST:
 		ret = guest_complete_share(completer_addr, tx, share->completer_prot);
 		break;
+	case PKVM_ID_IOMMU:
+		ret = 0;
+		break;
 	default:
 		ret = -EINVAL;
 	}
@@ -1239,6 +1245,9 @@ static int check_unshare(struct pkvm_mem_share *share)
 	case PKVM_ID_HYP:
 		ret = hyp_ack_unshare(completer_addr, tx);
 		break;
+	case PKVM_ID_IOMMU:
+		ret = 0;
+		break;
 	default:
 		ret = -EINVAL;
 	}
@@ -1273,6 +1282,9 @@ static int __do_unshare(struct pkvm_mem_share *share)
 	case PKVM_ID_HYP:
 		ret = hyp_complete_unshare(completer_addr, tx);
 		break;
+	case PKVM_ID_IOMMU:
+		ret = 0;
+		break;
 	default:
 		ret = -EINVAL;
 	}
@@ -1633,6 +1645,179 @@ void hyp_unpin_shared_mem(void *from, void *to)
 	host_unlock_component();
 }
 
+static int __host_check_page_dma_shared(phys_addr_t phys_addr)
+{
+	int ret;
+	u64 hyp_addr;
+
+	/*
+	 * The page is already refcounted. Make sure it's owned by the host, and
+	 * not part of the hyp pool.
+	 */
+	ret = __host_check_page_state_range(phys_addr, PAGE_SIZE,
+					    PKVM_PAGE_SHARED_OWNED);
+	if (ret)
+		return ret;
+
+	/*
+	 * Refcounted and owned by host, means it's either mapped in the
+	 * SMMU, or it's some VM/VCPU state shared with the hypervisor.
+	 * The host has no reason to use a page for both.
+	 */
+	hyp_addr = (u64)hyp_phys_to_virt(phys_addr);
+	return __hyp_check_page_state_range(hyp_addr, PAGE_SIZE, PKVM_NOPAGE);
+}
+
+static int __pkvm_host_share_dma_page(phys_addr_t phys_addr, bool is_ram)
+{
+	int ret;
+	struct hyp_page *p = hyp_phys_to_page(phys_addr);
+	struct pkvm_mem_share share = {
+		.tx	= {
+			.nr_pages	= 1,
+			.initiator	= {
+				.id	= PKVM_ID_HOST,
+				.addr	= phys_addr,
+			},
+			.completer	= {
+				.id	= PKVM_ID_IOMMU,
+			},
+		},
+	};
+
+	hyp_assert_lock_held(&host_mmu.lock);
+	hyp_assert_lock_held(&pkvm_pgd_lock);
+
+	/*
+	 * Some differences between handling of RAM and device memory:
+	 * - The hyp vmemmap area for device memory is not backed by physical
+	 *   pages in the hyp page tables.
+	 * - Device memory is unmapped automatically under memory pressure
+	 *   (host_stage2_try()) and the ownership information would be
+	 *   discarded.
+	 * We don't need to deal with that at the moment, because the host
+	 * cannot share or donate device memory, only RAM.
+	 *
+	 * Since 'is_ram' is only a hint provided by the host, we do need to
+	 * make sure of it.
+	 */
+	if (!is_ram)
+		return addr_is_memory(phys_addr) ? -EINVAL : 0;
+
+	ret = hyp_page_ref_inc_return(p);
+	BUG_ON(ret == 0);
+	if (ret < 0)
+		return ret;
+	else if (ret == 1)
+		ret = do_share(&share);
+	else
+		ret = __host_check_page_dma_shared(phys_addr);
+
+	if (ret)
+		hyp_page_ref_dec(p);
+
+	return ret;
+}
+
+static int __pkvm_host_unshare_dma_page(phys_addr_t phys_addr)
+{
+	struct hyp_page *p = hyp_phys_to_page(phys_addr);
+	struct pkvm_mem_share share = {
+		.tx	= {
+			.nr_pages	= 1,
+			.initiator	= {
+				.id	= PKVM_ID_HOST,
+				.addr	= phys_addr,
+			},
+			.completer	= {
+				.id	= PKVM_ID_IOMMU,
+			},
+		},
+	};
+
+	hyp_assert_lock_held(&host_mmu.lock);
+	hyp_assert_lock_held(&pkvm_pgd_lock);
+
+	if (!addr_is_memory(phys_addr))
+		return 0;
+
+	if (!hyp_page_ref_dec_and_test(p))
+		return 0;
+
+	return do_unshare(&share);
+}
+
+/*
+ * __pkvm_host_share_dma - Mark host memory as used for DMA
+ * @phys_addr:	physical address of the DMA region
+ * @size:	size of the DMA region
+ * @is_ram:	whether it is RAM or device memory
+ *
+ * We must not allow the host to donate pages that are mapped in the IOMMU for
+ * DMA. So:
+ * 1. Mark the host S2 entry as being owned by IOMMU
+ * 2. Refcount it, since a page may be mapped in multiple device address spaces.
+ *
+ * At some point we may end up needing more than the current 16 bits for
+ * refcounting, for example if all devices and sub-devices map the same MSI
+ * doorbell page. It will do for now.
+ */
+int __pkvm_host_share_dma(phys_addr_t phys_addr, size_t size, bool is_ram)
+{
+	int i;
+	int ret;
+	size_t nr_pages = size >> PAGE_SHIFT;
+
+	if (WARN_ON(!PAGE_ALIGNED(phys_addr | size)))
+		return -EINVAL;
+
+	host_lock_component();
+	hyp_lock_component();
+
+	for (i = 0; i < nr_pages; i++) {
+		ret = __pkvm_host_share_dma_page(phys_addr + i * PAGE_SIZE,
+						 is_ram);
+		if (ret)
+			break;
+	}
+
+	if (ret) {
+		for (--i; i >= 0; --i)
+			__pkvm_host_unshare_dma_page(phys_addr + i * PAGE_SIZE);
+	}
+
+	hyp_unlock_component();
+	host_unlock_component();
+
+	return ret;
+}
+
+int __pkvm_host_unshare_dma(phys_addr_t phys_addr, size_t size)
+{
+	int i;
+	int ret;
+	size_t nr_pages = size >> PAGE_SHIFT;
+
+	host_lock_component();
+	hyp_lock_component();
+
+	/*
+	 * We end up here after the caller successfully unmapped the page from
+	 * the IOMMU table. Which means that a ref is held, the page is shared
+	 * in the host s2, there can be no failure.
+	 */
+	for (i = 0; i < nr_pages; i++) {
+		ret = __pkvm_host_unshare_dma_page(phys_addr + i * PAGE_SIZE);
+		if (ret)
+			break;
+	}
+
+	hyp_unlock_component();
+	host_unlock_component();
+
+	return ret;
+}
+
 int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu)
 {
 	int ret;
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 201+ messages in thread

* [RFC PATCH 16/45] KVM: arm64: Introduce IOMMU driver infrastructure
  2023-02-01 12:52 ` Jean-Philippe Brucker
@ 2023-02-01 12:53   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:53 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

From: David Brazdil <dbrazdil@google.com>

Bootstrap infrastructure for IOMMU drivers by introducing a kvm_iommu_ops
struct in EL2, populated based on an iommu_driver parameter to the
__pkvm_init hypercall and selected during EL1 early init.

An 'init' operation is called in __pkvm_init_finalise, giving the driver
an opportunity to initialize itself in EL2 and create any EL2 mappings
that it will need. 'init' is specifically called before
'finalize_host_mappings' so that:
  (a) pages mapped by the driver change owner to hyp,
  (b) ownership changes in 'finalize_host_mappings' get reflected in
      IOMMU mappings (added in a future patch).
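
As an illustrative sketch (hypothetical driver name; the real SMMUv3
ops are added later in this series), an EL2 driver only has to provide
its ops, and a new case in select_iommu_ops() assigns them to
kvm_iommu_ops:

  /* Illustrative only: a minimal EL2 driver hooking into the core. */
  static int my_iommu_init(void)
  {
          /* create any EL2 mappings the driver needs */
          return 0;
  }

  struct kvm_iommu_ops my_iommu_ops = {
          .init = my_iommu_init,
  };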

Signed-off-by: David Brazdil <dbrazdil@google.com>
[JPB: add remove(), move to include/nvhe]
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 arch/arm64/include/asm/kvm_host.h       |  4 ++++
 arch/arm64/include/asm/kvm_hyp.h        |  3 ++-
 arch/arm64/kvm/hyp/include/nvhe/iommu.h | 11 +++++++++++
 arch/arm64/kvm/arm.c                    | 25 +++++++++++++++++++++----
 arch/arm64/kvm/hyp/nvhe/hyp-main.c      |  6 +++++-
 arch/arm64/kvm/hyp/nvhe/setup.c         | 24 +++++++++++++++++++++++-
 6 files changed, 66 insertions(+), 7 deletions(-)
 create mode 100644 arch/arm64/kvm/hyp/include/nvhe/iommu.h

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 02850cf3f0de..b8e032bda022 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -377,6 +377,10 @@ extern s64 kvm_nvhe_sym(hyp_physvirt_offset);
 extern u64 kvm_nvhe_sym(hyp_cpu_logical_map)[NR_CPUS];
 #define hyp_cpu_logical_map CHOOSE_NVHE_SYM(hyp_cpu_logical_map)
 
+enum kvm_iommu_driver {
+	KVM_IOMMU_DRIVER_NONE,
+};
+
 struct vcpu_reset_state {
 	unsigned long	pc;
 	unsigned long	r0;
diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 1b597b7db99b..0226a719e28f 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -114,7 +114,8 @@ void __noreturn __hyp_do_panic(struct kvm_cpu_context *host_ctxt, u64 spsr,
 void __pkvm_init_switch_pgd(phys_addr_t phys, unsigned long size,
 			    phys_addr_t pgd, void *sp, void *cont_fn);
 int __pkvm_init(phys_addr_t phys, unsigned long size, unsigned long nr_cpus,
-		unsigned long *per_cpu_base, u32 hyp_va_bits);
+		unsigned long *per_cpu_base, u32 hyp_va_bits,
+		enum kvm_iommu_driver iommu_driver);
 void __noreturn __host_enter(struct kvm_cpu_context *host_ctxt);
 #endif
 
diff --git a/arch/arm64/kvm/hyp/include/nvhe/iommu.h b/arch/arm64/kvm/hyp/include/nvhe/iommu.h
new file mode 100644
index 000000000000..c728c8e913da
--- /dev/null
+++ b/arch/arm64/kvm/hyp/include/nvhe/iommu.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ARM64_KVM_NVHE_IOMMU_H__
+#define __ARM64_KVM_NVHE_IOMMU_H__
+
+struct kvm_iommu_ops {
+	int (*init)(void);
+};
+
+extern struct kvm_iommu_ops kvm_iommu_ops;
+
+#endif /* __ARM64_KVM_NVHE_IOMMU_H__ */
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index c96fd7deea14..31faae76d519 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1899,6 +1899,15 @@ static bool init_psci_relay(void)
 	return true;
 }
 
+static int init_stage2_iommu(void)
+{
+	return KVM_IOMMU_DRIVER_NONE;
+}
+
+static void remove_stage2_iommu(enum kvm_iommu_driver iommu)
+{
+}
+
 static int init_subsystems(void)
 {
 	int err = 0;
@@ -1957,7 +1966,7 @@ static void teardown_hyp_mode(void)
 	}
 }
 
-static int do_pkvm_init(u32 hyp_va_bits)
+static int do_pkvm_init(u32 hyp_va_bits, enum kvm_iommu_driver iommu_driver)
 {
 	void *per_cpu_base = kvm_ksym_ref(kvm_nvhe_sym(kvm_arm_hyp_percpu_base));
 	int ret;
@@ -1966,7 +1975,7 @@ static int do_pkvm_init(u32 hyp_va_bits)
 	cpu_hyp_init_context();
 	ret = kvm_call_hyp_nvhe(__pkvm_init, hyp_mem_base, hyp_mem_size,
 				num_possible_cpus(), kern_hyp_va(per_cpu_base),
-				hyp_va_bits);
+				hyp_va_bits, iommu_driver);
 	cpu_hyp_init_features();
 
 	/*
@@ -1996,15 +2005,23 @@ static void kvm_hyp_init_symbols(void)
 static int kvm_hyp_init_protection(u32 hyp_va_bits)
 {
 	void *addr = phys_to_virt(hyp_mem_base);
+	enum kvm_iommu_driver iommu;
 	int ret;
 
 	ret = create_hyp_mappings(addr, addr + hyp_mem_size, PAGE_HYP);
 	if (ret)
 		return ret;
 
-	ret = do_pkvm_init(hyp_va_bits);
-	if (ret)
+	ret = init_stage2_iommu();
+	if (ret < 0)
 		return ret;
+	iommu = ret;
+
+	ret = do_pkvm_init(hyp_va_bits, iommu);
+	if (ret) {
+		remove_stage2_iommu(iommu);
+		return ret;
+	}
 
 	free_hyp_pgds();
 
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 29ce7b09edbb..37e308337fec 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -14,6 +14,7 @@
 #include <asm/kvm_host.h>
 #include <asm/kvm_hyp.h>
 
+#include <nvhe/iommu.h>
 #include <nvhe/mem_protect.h>
 #include <nvhe/mm.h>
 #include <nvhe/pkvm.h>
@@ -34,6 +35,8 @@ static DEFINE_PER_CPU(struct user_fpsimd_state, loaded_host_fpsimd_state);
 
 DEFINE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
 
+struct kvm_iommu_ops kvm_iommu_ops;
+
 void __kvm_hyp_host_forward_smc(struct kvm_cpu_context *host_ctxt);
 
 typedef void (*hyp_entry_exit_handler_fn)(struct pkvm_hyp_vcpu *);
@@ -958,6 +961,7 @@ static void handle___pkvm_init(struct kvm_cpu_context *host_ctxt)
 	DECLARE_REG(unsigned long, nr_cpus, host_ctxt, 3);
 	DECLARE_REG(unsigned long *, per_cpu_base, host_ctxt, 4);
 	DECLARE_REG(u32, hyp_va_bits, host_ctxt, 5);
+	DECLARE_REG(enum kvm_iommu_driver, iommu_driver, host_ctxt, 6);
 
 	/*
 	 * __pkvm_init() will return only if an error occurred, otherwise it
@@ -965,7 +969,7 @@ static void handle___pkvm_init(struct kvm_cpu_context *host_ctxt)
 	 * with the host context directly.
 	 */
 	cpu_reg(host_ctxt, 1) = __pkvm_init(phys, size, nr_cpus, per_cpu_base,
-					    hyp_va_bits);
+					    hyp_va_bits, iommu_driver);
 }
 
 static void handle___pkvm_cpu_set_vector(struct kvm_cpu_context *host_ctxt)
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index de7d60c3c20b..3e73c066d560 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -11,6 +11,7 @@
 
 #include <nvhe/early_alloc.h>
 #include <nvhe/gfp.h>
+#include <nvhe/iommu.h>
 #include <nvhe/memory.h>
 #include <nvhe/mem_protect.h>
 #include <nvhe/mm.h>
@@ -288,6 +289,16 @@ static int fix_hyp_pgtable_refcnt(void)
 				&walker);
 }
 
+static int select_iommu_ops(enum kvm_iommu_driver driver)
+{
+	switch (driver) {
+	case KVM_IOMMU_DRIVER_NONE:
+		return 0;
+	}
+
+	return -EINVAL;
+}
+
 void __noreturn __pkvm_init_finalise(void)
 {
 	struct kvm_host_data *host_data = this_cpu_ptr(&kvm_host_data);
@@ -321,6 +332,12 @@ void __noreturn __pkvm_init_finalise(void)
 	if (ret)
 		goto out;
 
+	if (kvm_iommu_ops.init) {
+		ret = kvm_iommu_ops.init();
+		if (ret)
+			goto out;
+	}
+
 	ret = fix_host_ownership();
 	if (ret)
 		goto out;
@@ -345,7 +362,8 @@ void __noreturn __pkvm_init_finalise(void)
 }
 
 int __pkvm_init(phys_addr_t phys, unsigned long size, unsigned long nr_cpus,
-		unsigned long *per_cpu_base, u32 hyp_va_bits)
+		unsigned long *per_cpu_base, u32 hyp_va_bits,
+		enum kvm_iommu_driver iommu_driver)
 {
 	struct kvm_nvhe_init_params *params;
 	void *virt = hyp_phys_to_virt(phys);
@@ -368,6 +386,10 @@ int __pkvm_init(phys_addr_t phys, unsigned long size, unsigned long nr_cpus,
 	if (ret)
 		return ret;
 
+	ret = select_iommu_ops(iommu_driver);
+	if (ret)
+		return ret;
+
 	update_nvhe_init_params();
 
 	/* Jump in the idmap page to switch to the new page-tables */
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 201+ messages in thread

* [RFC PATCH 17/45] KVM: arm64: pkvm: Add IOMMU hypercalls
  2023-02-01 12:52 ` Jean-Philippe Brucker
@ 2023-02-01 12:53   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:53 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

The unprivileged host IOMMU driver forwards some of the IOMMU API calls
to the hypervisor, which installs and populates the page tables.

Note that this is not a stable ABI: these hypercalls change along with
the kernel, just like internal function calls.
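
For illustration (the host-side driver that issues these calls comes
later in the series; argument names below are placeholders), forwarding
a map request looks roughly like:

  /* Illustrative only: host EL1 forwarding an IOMMU map to the hyp. */
  ret = kvm_call_hyp_nvhe(__pkvm_host_iommu_map_pages, iommu_id,
                          domain_id, iova, paddr, pgsize, pgcount, prot);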

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 virt/kvm/Kconfig                        |  3 +
 arch/arm64/include/asm/kvm_asm.h        |  7 +++
 arch/arm64/kvm/hyp/include/nvhe/iommu.h | 68 ++++++++++++++++++++++
 arch/arm64/kvm/hyp/nvhe/hyp-main.c      | 77 +++++++++++++++++++++++++
 4 files changed, 155 insertions(+)

diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 9fb1ff6f19e5..99b0ddc50443 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -92,3 +92,6 @@ config KVM_XFER_TO_GUEST_WORK
 
 config HAVE_KVM_PM_NOTIFIER
        bool
+
+config KVM_IOMMU
+       bool
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 12aa0ccc3b3d..e2ced352b49c 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -81,6 +81,13 @@ enum __kvm_host_smccc_func {
 	__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_load,
 	__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_put,
 	__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_sync_state,
+	__KVM_HOST_SMCCC_FUNC___pkvm_host_iommu_alloc_domain,
+	__KVM_HOST_SMCCC_FUNC___pkvm_host_iommu_free_domain,
+	__KVM_HOST_SMCCC_FUNC___pkvm_host_iommu_attach_dev,
+	__KVM_HOST_SMCCC_FUNC___pkvm_host_iommu_detach_dev,
+	__KVM_HOST_SMCCC_FUNC___pkvm_host_iommu_map_pages,
+	__KVM_HOST_SMCCC_FUNC___pkvm_host_iommu_unmap_pages,
+	__KVM_HOST_SMCCC_FUNC___pkvm_host_iommu_iova_to_phys,
 };
 
 #define DECLARE_KVM_VHE_SYM(sym)	extern char sym[]
diff --git a/arch/arm64/kvm/hyp/include/nvhe/iommu.h b/arch/arm64/kvm/hyp/include/nvhe/iommu.h
index c728c8e913da..26a95717b613 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/iommu.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/iommu.h
@@ -2,6 +2,74 @@
 #ifndef __ARM64_KVM_NVHE_IOMMU_H__
 #define __ARM64_KVM_NVHE_IOMMU_H__
 
+#if IS_ENABLED(CONFIG_KVM_IOMMU)
+/* Hypercall handlers */
+int kvm_iommu_alloc_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
+			   unsigned long pgd_hva);
+int kvm_iommu_free_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id);
+int kvm_iommu_attach_dev(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
+			 u32 endpoint_id);
+int kvm_iommu_detach_dev(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
+			 u32 endpoint_id);
+int kvm_iommu_map_pages(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
+			unsigned long iova, phys_addr_t paddr, size_t pgsize,
+			size_t pgcount, int prot);
+int kvm_iommu_unmap_pages(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
+			  unsigned long iova, size_t pgsize, size_t pgcount);
+phys_addr_t kvm_iommu_iova_to_phys(pkvm_handle_t iommu_id,
+				   pkvm_handle_t domain_id, unsigned long iova);
+#else /* !CONFIG_KVM_IOMMU */
+static inline int kvm_iommu_alloc_domain(pkvm_handle_t iommu_id,
+					 pkvm_handle_t domain_id,
+					 unsigned long pgd_hva)
+{
+	return -ENODEV;
+}
+
+static inline int kvm_iommu_free_domain(pkvm_handle_t iommu_id,
+					pkvm_handle_t domain_id)
+{
+	return -ENODEV;
+}
+
+static inline int kvm_iommu_attach_dev(pkvm_handle_t iommu_id,
+				       pkvm_handle_t domain_id,
+				       u32 endpoint_id)
+{
+	return -ENODEV;
+}
+
+static inline int kvm_iommu_detach_dev(pkvm_handle_t iommu_id,
+				       pkvm_handle_t domain_id,
+				       u32 endpoint_id)
+{
+	return -ENODEV;
+}
+
+static inline int kvm_iommu_map_pages(pkvm_handle_t iommu_id,
+				      pkvm_handle_t domain_id,
+				      unsigned long iova, phys_addr_t paddr,
+				      size_t pgsize, size_t pgcount, int prot)
+{
+	return -ENODEV;
+}
+
+static inline int kvm_iommu_unmap_pages(pkvm_handle_t iommu_id,
+					pkvm_handle_t domain_id,
+					unsigned long iova, size_t pgsize,
+					size_t pgcount)
+{
+	return 0;
+}
+
+static inline phys_addr_t kvm_iommu_iova_to_phys(pkvm_handle_t iommu_id,
+						 pkvm_handle_t domain_id,
+						 unsigned long iova)
+{
+	return 0;
+}
+#endif /* CONFIG_KVM_IOMMU */
+
 struct kvm_iommu_ops {
 	int (*init)(void);
 };
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 37e308337fec..34ec46b890f0 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -1059,6 +1059,76 @@ static void handle___pkvm_teardown_vm(struct kvm_cpu_context *host_ctxt)
 	cpu_reg(host_ctxt, 1) = __pkvm_teardown_vm(handle);
 }
 
+static void handle___pkvm_host_iommu_alloc_domain(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(pkvm_handle_t, iommu, host_ctxt, 1);
+	DECLARE_REG(pkvm_handle_t, domain, host_ctxt, 2);
+	DECLARE_REG(unsigned long, pgd_hva, host_ctxt, 3);
+
+	cpu_reg(host_ctxt, 1) = kvm_iommu_alloc_domain(iommu, domain, pgd_hva);
+}
+
+static void handle___pkvm_host_iommu_free_domain(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(pkvm_handle_t, iommu, host_ctxt, 1);
+	DECLARE_REG(pkvm_handle_t, domain, host_ctxt, 2);
+
+	cpu_reg(host_ctxt, 1) = kvm_iommu_free_domain(iommu, domain);
+}
+
+static void handle___pkvm_host_iommu_attach_dev(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(pkvm_handle_t, iommu, host_ctxt, 1);
+	DECLARE_REG(pkvm_handle_t, domain, host_ctxt, 2);
+	DECLARE_REG(unsigned int, endpoint, host_ctxt, 3);
+
+	cpu_reg(host_ctxt, 1) = kvm_iommu_attach_dev(iommu, domain, endpoint);
+}
+
+static void handle___pkvm_host_iommu_detach_dev(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(pkvm_handle_t, iommu, host_ctxt, 1);
+	DECLARE_REG(pkvm_handle_t, domain, host_ctxt, 2);
+	DECLARE_REG(unsigned int, endpoint, host_ctxt, 3);
+
+	cpu_reg(host_ctxt, 1) = kvm_iommu_detach_dev(iommu, domain, endpoint);
+}
+
+static void handle___pkvm_host_iommu_map_pages(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(pkvm_handle_t, iommu, host_ctxt, 1);
+	DECLARE_REG(pkvm_handle_t, domain, host_ctxt, 2);
+	DECLARE_REG(unsigned long, iova, host_ctxt, 3);
+	DECLARE_REG(phys_addr_t, paddr, host_ctxt, 4);
+	DECLARE_REG(size_t, pgsize, host_ctxt, 5);
+	DECLARE_REG(size_t, pgcount, host_ctxt, 6);
+	DECLARE_REG(unsigned int, prot, host_ctxt, 7);
+
+	cpu_reg(host_ctxt, 1) = kvm_iommu_map_pages(iommu, domain, iova, paddr,
+						    pgsize, pgcount, prot);
+}
+
+static void handle___pkvm_host_iommu_unmap_pages(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(pkvm_handle_t, iommu, host_ctxt, 1);
+	DECLARE_REG(pkvm_handle_t, domain, host_ctxt, 2);
+	DECLARE_REG(unsigned long, iova, host_ctxt, 3);
+	DECLARE_REG(size_t, pgsize, host_ctxt, 4);
+	DECLARE_REG(size_t, pgcount, host_ctxt, 5);
+
+	cpu_reg(host_ctxt, 1) = kvm_iommu_unmap_pages(iommu, domain, iova,
+						      pgsize, pgcount);
+}
+
+static void handle___pkvm_host_iommu_iova_to_phys(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(pkvm_handle_t, iommu, host_ctxt, 1);
+	DECLARE_REG(pkvm_handle_t, domain, host_ctxt, 2);
+	DECLARE_REG(unsigned long, iova, host_ctxt, 3);
+
+	cpu_reg(host_ctxt, 1) = kvm_iommu_iova_to_phys(iommu, domain, iova);
+}
+
 typedef void (*hcall_t)(struct kvm_cpu_context *);
 
 #define HANDLE_FUNC(x)	[__KVM_HOST_SMCCC_FUNC_##x] = (hcall_t)handle_##x
@@ -1093,6 +1163,13 @@ static const hcall_t host_hcall[] = {
 	HANDLE_FUNC(__pkvm_vcpu_load),
 	HANDLE_FUNC(__pkvm_vcpu_put),
 	HANDLE_FUNC(__pkvm_vcpu_sync_state),
+	HANDLE_FUNC(__pkvm_host_iommu_alloc_domain),
+	HANDLE_FUNC(__pkvm_host_iommu_free_domain),
+	HANDLE_FUNC(__pkvm_host_iommu_attach_dev),
+	HANDLE_FUNC(__pkvm_host_iommu_detach_dev),
+	HANDLE_FUNC(__pkvm_host_iommu_map_pages),
+	HANDLE_FUNC(__pkvm_host_iommu_unmap_pages),
+	HANDLE_FUNC(__pkvm_host_iommu_iova_to_phys),
 };
 
 static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 201+ messages in thread

* [RFC PATCH 17/45] KVM: arm64: pkvm: Add IOMMU hypercalls
@ 2023-02-01 12:53   ` Jean-Philippe Brucker
  0 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:53 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

The unprivileged host IOMMU driver forwards some of the IOMMU API calls
to the hypervisor, which installs and populates the page tables.

Note that this is not a stable ABI. Those hypercalls change with the
kernel just like internal function calls.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 virt/kvm/Kconfig                        |  3 +
 arch/arm64/include/asm/kvm_asm.h        |  7 +++
 arch/arm64/kvm/hyp/include/nvhe/iommu.h | 68 ++++++++++++++++++++++
 arch/arm64/kvm/hyp/nvhe/hyp-main.c      | 77 +++++++++++++++++++++++++
 4 files changed, 155 insertions(+)

diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 9fb1ff6f19e5..99b0ddc50443 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -92,3 +92,6 @@ config KVM_XFER_TO_GUEST_WORK
 
 config HAVE_KVM_PM_NOTIFIER
        bool
+
+config KVM_IOMMU
+       bool
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 12aa0ccc3b3d..e2ced352b49c 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -81,6 +81,13 @@ enum __kvm_host_smccc_func {
 	__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_load,
 	__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_put,
 	__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_sync_state,
+	__KVM_HOST_SMCCC_FUNC___pkvm_host_iommu_alloc_domain,
+	__KVM_HOST_SMCCC_FUNC___pkvm_host_iommu_free_domain,
+	__KVM_HOST_SMCCC_FUNC___pkvm_host_iommu_attach_dev,
+	__KVM_HOST_SMCCC_FUNC___pkvm_host_iommu_detach_dev,
+	__KVM_HOST_SMCCC_FUNC___pkvm_host_iommu_map_pages,
+	__KVM_HOST_SMCCC_FUNC___pkvm_host_iommu_unmap_pages,
+	__KVM_HOST_SMCCC_FUNC___pkvm_host_iommu_iova_to_phys,
 };
 
 #define DECLARE_KVM_VHE_SYM(sym)	extern char sym[]
diff --git a/arch/arm64/kvm/hyp/include/nvhe/iommu.h b/arch/arm64/kvm/hyp/include/nvhe/iommu.h
index c728c8e913da..26a95717b613 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/iommu.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/iommu.h
@@ -2,6 +2,74 @@
 #ifndef __ARM64_KVM_NVHE_IOMMU_H__
 #define __ARM64_KVM_NVHE_IOMMU_H__
 
+#if IS_ENABLED(CONFIG_KVM_IOMMU)
+/* Hypercall handlers */
+int kvm_iommu_alloc_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
+			   unsigned long pgd_hva);
+int kvm_iommu_free_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id);
+int kvm_iommu_attach_dev(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
+			 u32 endpoint_id);
+int kvm_iommu_detach_dev(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
+			 u32 endpoint_id);
+int kvm_iommu_map_pages(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
+			unsigned long iova, phys_addr_t paddr, size_t pgsize,
+			size_t pgcount, int prot);
+int kvm_iommu_unmap_pages(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
+			  unsigned long iova, size_t pgsize, size_t pgcount);
+phys_addr_t kvm_iommu_iova_to_phys(pkvm_handle_t iommu_id,
+				   pkvm_handle_t domain_id, unsigned long iova);
+#else /* !CONFIG_KVM_IOMMU */
+static inline int kvm_iommu_alloc_domain(pkvm_handle_t iommu_id,
+					 pkvm_handle_t domain_id,
+					 unsigned long pgd_hva)
+{
+	return -ENODEV;
+}
+
+static inline int kvm_iommu_free_domain(pkvm_handle_t iommu_id,
+					pkvm_handle_t domain_id)
+{
+	return -ENODEV;
+}
+
+static inline int kvm_iommu_attach_dev(pkvm_handle_t iommu_id,
+				       pkvm_handle_t domain_id,
+				       u32 endpoint_id)
+{
+	return -ENODEV;
+}
+
+static inline int kvm_iommu_detach_dev(pkvm_handle_t iommu_id,
+				       pkvm_handle_t domain_id,
+				       u32 endpoint_id)
+{
+	return -ENODEV;
+}
+
+static inline int kvm_iommu_map_pages(pkvm_handle_t iommu_id,
+				      pkvm_handle_t domain_id,
+				      unsigned long iova, phys_addr_t paddr,
+				      size_t pgsize, size_t pgcount, int prot)
+{
+	return -ENODEV;
+}
+
+static inline int kvm_iommu_unmap_pages(pkvm_handle_t iommu_id,
+					pkvm_handle_t domain_id,
+					unsigned long iova, size_t pgsize,
+					size_t pgcount)
+{
+	return 0;
+}
+
+static inline phys_addr_t kvm_iommu_iova_to_phys(pkvm_handle_t iommu_id,
+						 pkvm_handle_t domain_id,
+						 unsigned long iova)
+{
+	return 0;
+}
+#endif /* CONFIG_KVM_IOMMU */
+
 struct kvm_iommu_ops {
 	int (*init)(void);
 };
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 37e308337fec..34ec46b890f0 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -1059,6 +1059,76 @@ static void handle___pkvm_teardown_vm(struct kvm_cpu_context *host_ctxt)
 	cpu_reg(host_ctxt, 1) = __pkvm_teardown_vm(handle);
 }
 
+static void handle___pkvm_host_iommu_alloc_domain(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(pkvm_handle_t, iommu, host_ctxt, 1);
+	DECLARE_REG(pkvm_handle_t, domain, host_ctxt, 2);
+	DECLARE_REG(unsigned long, pgd_hva, host_ctxt, 3);
+
+	cpu_reg(host_ctxt, 1) = kvm_iommu_alloc_domain(iommu, domain, pgd_hva);
+}
+
+static void handle___pkvm_host_iommu_free_domain(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(pkvm_handle_t, iommu, host_ctxt, 1);
+	DECLARE_REG(pkvm_handle_t, domain, host_ctxt, 2);
+
+	cpu_reg(host_ctxt, 1) = kvm_iommu_free_domain(iommu, domain);
+}
+
+static void handle___pkvm_host_iommu_attach_dev(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(pkvm_handle_t, iommu, host_ctxt, 1);
+	DECLARE_REG(pkvm_handle_t, domain, host_ctxt, 2);
+	DECLARE_REG(unsigned int, endpoint, host_ctxt, 3);
+
+	cpu_reg(host_ctxt, 1) = kvm_iommu_attach_dev(iommu, domain, endpoint);
+}
+
+static void handle___pkvm_host_iommu_detach_dev(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(pkvm_handle_t, iommu, host_ctxt, 1);
+	DECLARE_REG(pkvm_handle_t, domain, host_ctxt, 2);
+	DECLARE_REG(unsigned int, endpoint, host_ctxt, 3);
+
+	cpu_reg(host_ctxt, 1) = kvm_iommu_detach_dev(iommu, domain, endpoint);
+}
+
+static void handle___pkvm_host_iommu_map_pages(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(pkvm_handle_t, iommu, host_ctxt, 1);
+	DECLARE_REG(pkvm_handle_t, domain, host_ctxt, 2);
+	DECLARE_REG(unsigned long, iova, host_ctxt, 3);
+	DECLARE_REG(phys_addr_t, paddr, host_ctxt, 4);
+	DECLARE_REG(size_t, pgsize, host_ctxt, 5);
+	DECLARE_REG(size_t, pgcount, host_ctxt, 6);
+	DECLARE_REG(unsigned int, prot, host_ctxt, 7);
+
+	cpu_reg(host_ctxt, 1) = kvm_iommu_map_pages(iommu, domain, iova, paddr,
+						    pgsize, pgcount, prot);
+}
+
+static void handle___pkvm_host_iommu_unmap_pages(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(pkvm_handle_t, iommu, host_ctxt, 1);
+	DECLARE_REG(pkvm_handle_t, domain, host_ctxt, 2);
+	DECLARE_REG(unsigned long, iova, host_ctxt, 3);
+	DECLARE_REG(size_t, pgsize, host_ctxt, 4);
+	DECLARE_REG(size_t, pgcount, host_ctxt, 5);
+
+	cpu_reg(host_ctxt, 1) = kvm_iommu_unmap_pages(iommu, domain, iova,
+						      pgsize, pgcount);
+}
+
+static void handle___pkvm_host_iommu_iova_to_phys(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(pkvm_handle_t, iommu, host_ctxt, 1);
+	DECLARE_REG(pkvm_handle_t, domain, host_ctxt, 2);
+	DECLARE_REG(unsigned long, iova, host_ctxt, 3);
+
+	cpu_reg(host_ctxt, 1) = kvm_iommu_iova_to_phys(iommu, domain, iova);
+}
+
 typedef void (*hcall_t)(struct kvm_cpu_context *);
 
 #define HANDLE_FUNC(x)	[__KVM_HOST_SMCCC_FUNC_##x] = (hcall_t)handle_##x
@@ -1093,6 +1163,13 @@ static const hcall_t host_hcall[] = {
 	HANDLE_FUNC(__pkvm_vcpu_load),
 	HANDLE_FUNC(__pkvm_vcpu_put),
 	HANDLE_FUNC(__pkvm_vcpu_sync_state),
+	HANDLE_FUNC(__pkvm_host_iommu_alloc_domain),
+	HANDLE_FUNC(__pkvm_host_iommu_free_domain),
+	HANDLE_FUNC(__pkvm_host_iommu_attach_dev),
+	HANDLE_FUNC(__pkvm_host_iommu_detach_dev),
+	HANDLE_FUNC(__pkvm_host_iommu_map_pages),
+	HANDLE_FUNC(__pkvm_host_iommu_unmap_pages),
+	HANDLE_FUNC(__pkvm_host_iommu_iova_to_phys),
 };
 
 static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 201+ messages in thread

* [RFC PATCH 18/45] KVM: arm64: iommu: Add per-cpu page queue
  2023-02-01 12:52 ` Jean-Philippe Brucker
@ 2023-02-01 12:53   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:53 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

The hyp driver will need to allocate pages when handling some
hypercalls, to populate page, stream and domain tables. Add a per-cpu
page queue that will contain host pages to be donated and reclaimed.
When the driver needs a new page, it sets the needs_page bit and returns
to the host with an error. The host pushes a page and retries the
hypercall.

The queue is per-cpu to ensure that IOMMU map()/unmap() requests from
different CPUs don't step on each other. It is populated on demand
rather than upfront to avoid wasting memory, as these allocations should
be relatively rare.
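
For illustration, the host side of this donate-and-retry protocol (added
later in the series) could look roughly like the sketch below. The wrapper
name is hypothetical and the loop is simplified (the real caller must keep
the task on the same CPU across the retry); topup_hyp_memcache() stands for
the host-side pKVM helper that feeds pages into a kvm_hyp_memcache and is
assumed here rather than taken from this patch:

  static int kvm_iommu_map_retry_example(pkvm_handle_t iommu_id,
                                         pkvm_handle_t domain_id,
                                         unsigned long iova, phys_addr_t paddr,
                                         size_t pgsize, size_t pgcount, int prot)
  {
          int ret;
          struct kvm_hyp_iommu_memcache *mc;

          do {
                  mc = &kvm_hyp_iommu_memcaches[smp_processor_id()];
                  ret = kvm_call_hyp_nvhe(__pkvm_host_iommu_map_pages,
                                          iommu_id, domain_id, iova, paddr,
                                          pgsize, pgcount, prot);
                  if (!ret || !mc->needs_page)
                          break;
                  /* The hypervisor ran out of pages: donate one and retry */
                  mc->needs_page = false;
          } while (!topup_hyp_memcache(&mc->pages, 1));

          return ret;
  }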

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 arch/arm64/kvm/hyp/nvhe/Makefile        |  2 +
 arch/arm64/kvm/hyp/include/nvhe/iommu.h |  4 ++
 include/kvm/iommu.h                     | 15 +++++++
 arch/arm64/kvm/hyp/nvhe/iommu/iommu.c   | 52 +++++++++++++++++++++++++
 4 files changed, 73 insertions(+)
 create mode 100644 include/kvm/iommu.h
 create mode 100644 arch/arm64/kvm/hyp/nvhe/iommu/iommu.c

diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index 530347cdebe3..f7dfc88c9f5b 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -28,6 +28,8 @@ hyp-obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
 hyp-obj-$(CONFIG_DEBUG_LIST) += list_debug.o
 hyp-obj-y += $(lib-objs)
 
+hyp-obj-$(CONFIG_KVM_IOMMU) += iommu/iommu.o
+
 ##
 ## Build rules for compiling nVHE hyp code
 ## Output of this folder is `kvm_nvhe.o`, a partially linked object
diff --git a/arch/arm64/kvm/hyp/include/nvhe/iommu.h b/arch/arm64/kvm/hyp/include/nvhe/iommu.h
index 26a95717b613..4959c30977b8 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/iommu.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/iommu.h
@@ -3,6 +3,10 @@
 #define __ARM64_KVM_NVHE_IOMMU_H__
 
 #if IS_ENABLED(CONFIG_KVM_IOMMU)
+int kvm_iommu_init(void);
+void *kvm_iommu_donate_page(void);
+void kvm_iommu_reclaim_page(void *p);
+
 /* Hypercall handlers */
 int kvm_iommu_alloc_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
 			   unsigned long pgd_hva);
diff --git a/include/kvm/iommu.h b/include/kvm/iommu.h
new file mode 100644
index 000000000000..12b06a5df889
--- /dev/null
+++ b/include/kvm/iommu.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __KVM_IOMMU_H
+#define __KVM_IOMMU_H
+
+#include <asm/kvm_host.h>
+
+struct kvm_hyp_iommu_memcache {
+	struct kvm_hyp_memcache	pages;
+	bool needs_page;
+} ____cacheline_aligned_in_smp;
+
+extern struct kvm_hyp_iommu_memcache *kvm_nvhe_sym(kvm_hyp_iommu_memcaches);
+#define kvm_hyp_iommu_memcaches kvm_nvhe_sym(kvm_hyp_iommu_memcaches)
+
+#endif /* __KVM_IOMMU_H */
diff --git a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
new file mode 100644
index 000000000000..1a9184fbbd27
--- /dev/null
+++ b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
@@ -0,0 +1,52 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * IOMMU operations for pKVM
+ *
+ * Copyright (C) 2022 Linaro Ltd.
+ */
+
+#include <asm/kvm_hyp.h>
+#include <kvm/iommu.h>
+#include <nvhe/iommu.h>
+#include <nvhe/mem_protect.h>
+#include <nvhe/mm.h>
+
+struct kvm_hyp_iommu_memcache __ro_after_init *kvm_hyp_iommu_memcaches;
+
+void *kvm_iommu_donate_page(void)
+{
+	void *p;
+	int cpu = hyp_smp_processor_id();
+	struct kvm_hyp_memcache tmp = kvm_hyp_iommu_memcaches[cpu].pages;
+
+	if (!tmp.nr_pages) {
+		kvm_hyp_iommu_memcaches[cpu].needs_page = true;
+		return NULL;
+	}
+
+	p = pkvm_admit_host_page(&tmp);
+	if (!p)
+		return NULL;
+
+	kvm_hyp_iommu_memcaches[cpu].pages = tmp;
+	memset(p, 0, PAGE_SIZE);
+	return p;
+}
+
+void kvm_iommu_reclaim_page(void *p)
+{
+	int cpu = hyp_smp_processor_id();
+
+	pkvm_teardown_donated_memory(&kvm_hyp_iommu_memcaches[cpu].pages, p,
+				     PAGE_SIZE);
+}
+
+int kvm_iommu_init(void)
+{
+	enum kvm_pgtable_prot prot;
+
+	/* The memcache is shared with the host */
+	prot = pkvm_mkstate(PAGE_HYP, PKVM_PAGE_SHARED_OWNED);
+	return pkvm_create_mappings(kvm_hyp_iommu_memcaches,
+				    kvm_hyp_iommu_memcaches + NR_CPUS, prot);
+}
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 201+ messages in thread

* [RFC PATCH 19/45] KVM: arm64: iommu: Add domains
  2023-02-01 12:52 ` Jean-Philippe Brucker
@ 2023-02-01 12:53   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:53 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

The IOMMU domain abstraction allows multiple devices to share the same
page tables. That may be necessary due to hardware constraints, when
devices cannot be isolated from one another by the IOMMU (on a
conventional PCI bus, for example). It may also help with optimizing
resource or TLB use. For pKVM in particular, it may be useful to reduce
the amount of memory required for page tables. All devices owned by the
host kernel could be attached to the same domain (though that requires
host changes).

Each IOMMU device holds an array of domains, and the host allocates
domain IDs that index this array. The alloc() operation initializes the
domain and prepares the page tables. The attach() operation initializes
the device table that holds the PGD and its configuration.
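
For illustration, the two-level domain table indexing added below works out
as follows with 4kB pages and the 16-byte struct kvm_hyp_iommu_domain:

  KVM_IOMMU_DOMAINS_PER_PAGE     = 4096 / 16   = 256
  KVM_IOMMU_DOMAINS_ROOT_ENTRIES = 65536 / 256 = 256
  KVM_IOMMU_DOMAIN_ID_SPLIT      = ilog2(256)  = 8

so, for example, domain ID 0x1234 selects entry 0x12 in the root table and
entry 0x34 in the leaf table (see handle_to_domain() below).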

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 arch/arm64/kvm/hyp/include/nvhe/iommu.h |  16 +++
 include/kvm/iommu.h                     |  55 ++++++++
 arch/arm64/kvm/hyp/nvhe/iommu/iommu.c   | 161 ++++++++++++++++++++++++
 3 files changed, 232 insertions(+)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/iommu.h b/arch/arm64/kvm/hyp/include/nvhe/iommu.h
index 4959c30977b8..76d3fa6ce331 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/iommu.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/iommu.h
@@ -2,8 +2,12 @@
 #ifndef __ARM64_KVM_NVHE_IOMMU_H__
 #define __ARM64_KVM_NVHE_IOMMU_H__
 
+#include <kvm/iommu.h>
+#include <linux/io-pgtable.h>
+
 #if IS_ENABLED(CONFIG_KVM_IOMMU)
 int kvm_iommu_init(void);
+int kvm_iommu_init_device(struct kvm_hyp_iommu *iommu);
 void *kvm_iommu_donate_page(void);
 void kvm_iommu_reclaim_page(void *p);
 
@@ -74,8 +78,20 @@ static inline phys_addr_t kvm_iommu_iova_to_phys(pkvm_handle_t iommu_id,
 }
 #endif /* CONFIG_KVM_IOMMU */
 
+struct kvm_iommu_tlb_cookie {
+	struct kvm_hyp_iommu	*iommu;
+	pkvm_handle_t		domain_id;
+};
+
 struct kvm_iommu_ops {
 	int (*init)(void);
+	struct kvm_hyp_iommu *(*get_iommu_by_id)(pkvm_handle_t smmu_id);
+	int (*alloc_iopt)(struct io_pgtable *iopt, unsigned long pgd_hva);
+	int (*free_iopt)(struct io_pgtable *iopt);
+	int (*attach_dev)(struct kvm_hyp_iommu *iommu, pkvm_handle_t domain_id,
+			  struct kvm_hyp_iommu_domain *domain, u32 endpoint_id);
+	int (*detach_dev)(struct kvm_hyp_iommu *iommu, pkvm_handle_t domain_id,
+			  struct kvm_hyp_iommu_domain *domain, u32 endpoint_id);
 };
 
 extern struct kvm_iommu_ops kvm_iommu_ops;
diff --git a/include/kvm/iommu.h b/include/kvm/iommu.h
index 12b06a5df889..2bbe5f7bf726 100644
--- a/include/kvm/iommu.h
+++ b/include/kvm/iommu.h
@@ -3,6 +3,23 @@
 #define __KVM_IOMMU_H
 
 #include <asm/kvm_host.h>
+#include <linux/io-pgtable.h>
+
+/*
+ * Parameters from the trusted host:
+ * @pgtable_cfg:	page table configuration
+ * @domains:		root domain table
+ * @nr_domains:		max number of domains (exclusive)
+ *
+ * Other members are filled and used at runtime by the IOMMU driver.
+ */
+struct kvm_hyp_iommu {
+	struct io_pgtable_cfg		pgtable_cfg;
+	void				**domains;
+	size_t				nr_domains;
+
+	struct io_pgtable_params	*pgtable;
+};
 
 struct kvm_hyp_iommu_memcache {
 	struct kvm_hyp_memcache	pages;
@@ -12,4 +29,42 @@ struct kvm_hyp_iommu_memcache {
 extern struct kvm_hyp_iommu_memcache *kvm_nvhe_sym(kvm_hyp_iommu_memcaches);
 #define kvm_hyp_iommu_memcaches kvm_nvhe_sym(kvm_hyp_iommu_memcaches)
 
+struct kvm_hyp_iommu_domain {
+	void			*pgd;
+	u32			refs;
+};
+
+/*
+ * At the moment the number of domains is limited by the ASID and VMID size on
+ * Arm. With single-stage translation, that size is 2^8 or 2^16. On a lot of
+ * platforms the number of devices is actually the limiting factor and we'll
+ * only need a handful of domains, but with PASID or SR-IOV support that limit
+ * can be reached.
+ *
+ * In practice we're rarely going to need a lot of domains. To avoid allocating
+ * a large domain table, we use a two-level table, indexed by domain ID. With
+ * 4kB pages and 16-bytes domains, the leaf table contains 256 domains, and the
+ * root table 256 pointers. With 64kB pages, the leaf table contains 4096
+ * domains and the root table 16 pointers. In this case, or when using 8-bit
+ * VMIDs, it may be more advantageous to use a single level. But using two
+ * levels allows to easily extend the domain size.
+ */
+#define KVM_IOMMU_MAX_DOMAINS	(1 << 16)
+
+/* Number of entries in the level-2 domain table */
+#define KVM_IOMMU_DOMAINS_PER_PAGE \
+	(PAGE_SIZE / sizeof(struct kvm_hyp_iommu_domain))
+
+/* Number of entries in the root domain table */
+#define KVM_IOMMU_DOMAINS_ROOT_ENTRIES \
+	(KVM_IOMMU_MAX_DOMAINS / KVM_IOMMU_DOMAINS_PER_PAGE)
+
+#define KVM_IOMMU_DOMAINS_ROOT_SIZE \
+	(KVM_IOMMU_DOMAINS_ROOT_ENTRIES * sizeof(void *))
+
+/* Bits [16:split] index the root table, bits [split-1:0] index the leaf table */
+#define KVM_IOMMU_DOMAIN_ID_SPLIT	ilog2(KVM_IOMMU_DOMAINS_PER_PAGE)
+
+#define KVM_IOMMU_DOMAIN_ID_LEAF_MASK	((1 << KVM_IOMMU_DOMAIN_ID_SPLIT) - 1)
+
 #endif /* __KVM_IOMMU_H */
diff --git a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
index 1a9184fbbd27..7404ea77ed9f 100644
--- a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
+++ b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
@@ -13,6 +13,22 @@
 
 struct kvm_hyp_iommu_memcache __ro_after_init *kvm_hyp_iommu_memcaches;
 
+/*
+ * Serialize access to domains and IOMMU driver internal structures (command
+ * queue, device tables)
+ */
+static hyp_spinlock_t iommu_lock;
+
+#define domain_to_iopt(_iommu, _domain, _domain_id)		\
+	(struct io_pgtable) {					\
+		.ops = &(_iommu)->pgtable->ops,			\
+		.pgd = (_domain)->pgd,				\
+		.cookie = &(struct kvm_iommu_tlb_cookie) {	\
+			.iommu		= (_iommu),		\
+			.domain_id	= (_domain_id),		\
+		},						\
+	}
+
 void *kvm_iommu_donate_page(void)
 {
 	void *p;
@@ -41,10 +57,155 @@ void kvm_iommu_reclaim_page(void *p)
 				     PAGE_SIZE);
 }
 
+static struct kvm_hyp_iommu_domain *
+handle_to_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
+		 struct kvm_hyp_iommu **out_iommu)
+{
+	int idx;
+	struct kvm_hyp_iommu *iommu;
+	struct kvm_hyp_iommu_domain *domains;
+
+	iommu = kvm_iommu_ops.get_iommu_by_id(iommu_id);
+	if (!iommu)
+		return NULL;
+
+	if (domain_id >= iommu->nr_domains)
+		return NULL;
+	domain_id = array_index_nospec(domain_id, iommu->nr_domains);
+
+	idx = domain_id >> KVM_IOMMU_DOMAIN_ID_SPLIT;
+	domains = iommu->domains[idx];
+	if (!domains) {
+		domains = kvm_iommu_donate_page();
+		if (!domains)
+			return NULL;
+		iommu->domains[idx] = domains;
+	}
+
+	*out_iommu = iommu;
+	return &domains[domain_id & KVM_IOMMU_DOMAIN_ID_LEAF_MASK];
+}
+
+int kvm_iommu_alloc_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
+			   unsigned long pgd_hva)
+{
+	int ret = -EINVAL;
+	struct io_pgtable iopt;
+	struct kvm_hyp_iommu *iommu;
+	struct kvm_hyp_iommu_domain *domain;
+
+	hyp_spin_lock(&iommu_lock);
+	domain = handle_to_domain(iommu_id, domain_id, &iommu);
+	if (!domain)
+		goto out_unlock;
+
+	if (domain->refs)
+		goto out_unlock;
+
+	iopt = domain_to_iopt(iommu, domain, domain_id);
+	ret = kvm_iommu_ops.alloc_iopt(&iopt, pgd_hva);
+	if (ret)
+		goto out_unlock;
+
+	domain->refs = 1;
+	domain->pgd = iopt.pgd;
+out_unlock:
+	hyp_spin_unlock(&iommu_lock);
+	return ret;
+}
+
+int kvm_iommu_free_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id)
+{
+	int ret = -EINVAL;
+	struct io_pgtable iopt;
+	struct kvm_hyp_iommu *iommu;
+	struct kvm_hyp_iommu_domain *domain;
+
+	hyp_spin_lock(&iommu_lock);
+	domain = handle_to_domain(iommu_id, domain_id, &iommu);
+	if (!domain)
+		goto out_unlock;
+
+	if (domain->refs != 1)
+		goto out_unlock;
+
+	iopt = domain_to_iopt(iommu, domain, domain_id);
+	ret = kvm_iommu_ops.free_iopt(&iopt);
+
+	memset(domain, 0, sizeof(*domain));
+
+out_unlock:
+	hyp_spin_unlock(&iommu_lock);
+	return ret;
+}
+
+int kvm_iommu_attach_dev(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
+			 u32 endpoint_id)
+{
+	int ret = -EINVAL;
+	struct kvm_hyp_iommu *iommu;
+	struct kvm_hyp_iommu_domain *domain;
+
+	hyp_spin_lock(&iommu_lock);
+	domain = handle_to_domain(iommu_id, domain_id, &iommu);
+	if (!domain || !domain->refs || domain->refs == UINT_MAX)
+		goto out_unlock;
+
+	ret = kvm_iommu_ops.attach_dev(iommu, domain_id, domain, endpoint_id);
+	if (ret)
+		goto out_unlock;
+
+	domain->refs++;
+out_unlock:
+	hyp_spin_unlock(&iommu_lock);
+	return ret;
+}
+
+int kvm_iommu_detach_dev(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
+			 u32 endpoint_id)
+{
+	int ret = -EINVAL;
+	struct kvm_hyp_iommu *iommu;
+	struct kvm_hyp_iommu_domain *domain;
+
+	hyp_spin_lock(&iommu_lock);
+	domain = handle_to_domain(iommu_id, domain_id, &iommu);
+	if (!domain || domain->refs <= 1)
+		goto out_unlock;
+
+	ret = kvm_iommu_ops.detach_dev(iommu, domain_id, domain, endpoint_id);
+	if (ret)
+		goto out_unlock;
+
+	domain->refs--;
+out_unlock:
+	hyp_spin_unlock(&iommu_lock);
+	return ret;
+}
+
+int kvm_iommu_init_device(struct kvm_hyp_iommu *iommu)
+{
+	void *domains;
+
+	domains = iommu->domains;
+	iommu->domains = kern_hyp_va(domains);
+	return pkvm_create_mappings(iommu->domains, iommu->domains +
+				    KVM_IOMMU_DOMAINS_ROOT_ENTRIES, PAGE_HYP);
+}
+
 int kvm_iommu_init(void)
 {
 	enum kvm_pgtable_prot prot;
 
+	hyp_spin_lock_init(&iommu_lock);
+
+	if (WARN_ON(!kvm_iommu_ops.get_iommu_by_id ||
+		    !kvm_iommu_ops.alloc_iopt ||
+		    !kvm_iommu_ops.free_iopt ||
+		    !kvm_iommu_ops.attach_dev ||
+		    !kvm_iommu_ops.detach_dev))
+		return -ENODEV;
+
 	/* The memcache is shared with the host */
 	prot = pkvm_mkstate(PAGE_HYP, PKVM_PAGE_SHARED_OWNED);
 	return pkvm_create_mappings(kvm_hyp_iommu_memcaches,
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 201+ messages in thread

* [RFC PATCH 20/45] KVM: arm64: iommu: Add map() and unmap() operations
  2023-02-01 12:52 ` Jean-Philippe Brucker
@ 2023-02-01 12:53   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:53 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

Handle map() and unmap() hypercalls by calling the io-pgtable library.
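
Both handlers derive the minimum mapping granule from the io-pgtable
configuration with 1 << __ffs(pgsize_bitmap), i.e. the smallest page size
the table supports, and reject requests that aren't aligned to it. A
standalone sketch of that computation (userspace C, with __builtin_ctzl()
standing in for the kernel's __ffs()):

  #include <assert.h>

  static unsigned long smallest_granule(unsigned long pgsize_bitmap)
  {
          /* Lowest set bit of the bitmap == smallest supported page size */
          return 1UL << __builtin_ctzl(pgsize_bitmap);
  }

  int main(void)
  {
          /* e.g. a table supporting 4kB, 2MB and 1GB mappings */
          unsigned long pgsize_bitmap = (1UL << 12) | (1UL << 21) | (1UL << 30);

          assert(smallest_granule(pgsize_bitmap) == 4096);
          return 0;
  }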

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 arch/arm64/kvm/hyp/nvhe/iommu/iommu.c | 144 ++++++++++++++++++++++++++
 1 file changed, 144 insertions(+)

diff --git a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
index 7404ea77ed9f..0550e7bdf179 100644
--- a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
+++ b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
@@ -183,6 +183,150 @@ int kvm_iommu_detach_dev(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
 	return ret;
 }
 
+static int __kvm_iommu_unmap_pages(struct io_pgtable *iopt, unsigned long iova,
+				   size_t pgsize, size_t pgcount)
+{
+	int ret;
+	size_t unmapped;
+	phys_addr_t paddr;
+	size_t total_unmapped = 0;
+	size_t size = pgsize * pgcount;
+
+	while (total_unmapped < size) {
+		paddr = iopt_iova_to_phys(iopt, iova);
+		if (paddr == 0)
+			return -EINVAL;
+
+		/*
+		 * One page/block at a time, because the range provided may not
+		 * be physically contiguous, and we need to unshare all physical
+		 * pages.
+		 */
+		unmapped = iopt_unmap_pages(iopt, iova, pgsize, 1, NULL);
+		if (!unmapped)
+			return -EINVAL;
+
+		ret = __pkvm_host_unshare_dma(paddr, pgsize);
+		if (ret)
+			return ret;
+
+		iova += unmapped;
+		pgcount -= unmapped / pgsize;
+		total_unmapped += unmapped;
+	}
+
+	return 0;
+}
+
+#define IOMMU_PROT_MASK (IOMMU_READ | IOMMU_WRITE | IOMMU_CACHE |\
+			 IOMMU_NOEXEC | IOMMU_MMIO)
+
+int kvm_iommu_map_pages(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
+			unsigned long iova, phys_addr_t paddr, size_t pgsize,
+			size_t pgcount, int prot)
+{
+	size_t size;
+	size_t granule;
+	int ret = -EINVAL;
+	size_t mapped = 0;
+	struct io_pgtable iopt;
+	struct kvm_hyp_iommu *iommu;
+	size_t pgcount_orig = pgcount;
+	unsigned long iova_orig = iova;
+	struct kvm_hyp_iommu_domain *domain;
+
+	if (prot & ~IOMMU_PROT_MASK)
+		return -EINVAL;
+
+	if (__builtin_mul_overflow(pgsize, pgcount, &size) ||
+	    iova + size < iova || paddr + size < paddr)
+		return -EOVERFLOW;
+
+	hyp_spin_lock(&iommu_lock);
+
+	domain = handle_to_domain(iommu_id, domain_id, &iommu);
+	if (!domain)
+		goto err_unlock;
+
+	granule = 1 << __ffs(iommu->pgtable->cfg.pgsize_bitmap);
+	if (!IS_ALIGNED(iova | paddr | pgsize, granule))
+		goto err_unlock;
+
+	ret = __pkvm_host_share_dma(paddr, size, !(prot & IOMMU_MMIO));
+	if (ret)
+		goto err_unlock;
+
+	iopt = domain_to_iopt(iommu, domain, domain_id);
+	while (pgcount) {
+		ret = iopt_map_pages(&iopt, iova, paddr, pgsize, pgcount, prot,
+				     0, &mapped);
+		WARN_ON(!IS_ALIGNED(mapped, pgsize));
+		pgcount -= mapped / pgsize;
+		if (ret)
+			goto err_unmap;
+		iova += mapped;
+		paddr += mapped;
+	}
+
+	hyp_spin_unlock(&iommu_lock);
+	return 0;
+
+err_unmap:
+	__kvm_iommu_unmap_pages(&iopt, iova_orig, pgsize, pgcount_orig - pgcount);
+err_unlock:
+	hyp_spin_unlock(&iommu_lock);
+	return ret;
+}
+
+int kvm_iommu_unmap_pages(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
+			  unsigned long iova, size_t pgsize, size_t pgcount)
+{
+	size_t size;
+	size_t granule;
+	int ret = -EINVAL;
+	struct io_pgtable iopt;
+	struct kvm_hyp_iommu *iommu;
+	struct kvm_hyp_iommu_domain *domain;
+
+	if (__builtin_mul_overflow(pgsize, pgcount, &size) ||
+	    iova + size < iova)
+		return -EOVERFLOW;
+
+	hyp_spin_lock(&iommu_lock);
+	domain = handle_to_domain(iommu_id, domain_id, &iommu);
+	if (!domain)
+		goto out_unlock;
+
+	granule = 1 << __ffs(iommu->pgtable->cfg.pgsize_bitmap);
+	if (!IS_ALIGNED(iova | pgsize, granule))
+		goto out_unlock;
+
+	iopt = domain_to_iopt(iommu, domain, domain_id);
+	ret = __kvm_iommu_unmap_pages(&iopt, iova, pgsize, pgcount);
+out_unlock:
+	hyp_spin_unlock(&iommu_lock);
+	return ret;
+}
+
+phys_addr_t kvm_iommu_iova_to_phys(pkvm_handle_t iommu_id,
+				   pkvm_handle_t domain_id, unsigned long iova)
+{
+	phys_addr_t phys = 0;
+	struct io_pgtable iopt;
+	struct kvm_hyp_iommu *iommu;
+	struct kvm_hyp_iommu_domain *domain;
+
+	hyp_spin_lock(&iommu_lock);
+	domain = handle_to_domain(iommu_id, domain_id, &iommu);
+	if (domain) {
+		iopt = domain_to_iopt(iommu, domain, domain_id);
+
+		phys = iopt_iova_to_phys(&iopt, iova);
+	}
+	hyp_spin_unlock(&iommu_lock);
+	return phys;
+}
+
 int kvm_iommu_init_device(struct kvm_hyp_iommu *iommu)
 {
 	void *domains;
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 201+ messages in thread

* [RFC PATCH 21/45] KVM: arm64: iommu: Add SMMUv3 driver
  2023-02-01 12:52 ` Jean-Philippe Brucker
@ 2023-02-01 12:53   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:53 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

Add the skeleton for an Arm SMMUv3 driver at EL2.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 drivers/iommu/Kconfig                       | 10 ++++++++
 arch/arm64/kvm/hyp/nvhe/Makefile            |  1 +
 arch/arm64/include/asm/kvm_host.h           |  1 +
 arch/arm64/kvm/hyp/include/nvhe/iommu.h     |  9 +++++++
 include/kvm/arm_smmu_v3.h                   | 22 +++++++++++++++++
 arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c | 27 +++++++++++++++++++++
 arch/arm64/kvm/hyp/nvhe/setup.c             |  2 ++
 7 files changed, 72 insertions(+)
 create mode 100644 include/kvm/arm_smmu_v3.h
 create mode 100644 arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 79707685d54a..1689d416ccd8 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -410,6 +410,16 @@ config ARM_SMMU_V3_SVA
 	  Say Y here if your system supports SVA extensions such as PCIe PASID
 	  and PRI.
 
+config ARM_SMMU_V3_PKVM
+	bool "ARM SMMUv3 support for protected Virtual Machines"
+	depends on KVM && ARM64
+	select KVM_IOMMU
+	help
+	  Enable a SMMUv3 driver in the KVM hypervisor, to protect VMs against
+	  memory accesses from devices owned by the host.
+
+	  Say Y here if you intend to enable KVM in protected mode.
+
 config S390_IOMMU
 	def_bool y if S390 && PCI
 	depends on S390 && PCI
diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index f7dfc88c9f5b..349c874762c8 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -29,6 +29,7 @@ hyp-obj-$(CONFIG_DEBUG_LIST) += list_debug.o
 hyp-obj-y += $(lib-objs)
 
 hyp-obj-$(CONFIG_KVM_IOMMU) += iommu/iommu.o
+hyp-obj-$(CONFIG_ARM_SMMU_V3_PKVM) += iommu/arm-smmu-v3.o
 
 ##
 ## Build rules for compiling nVHE hyp code
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index b8e032bda022..c98ce17f8148 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -379,6 +379,7 @@ extern u64 kvm_nvhe_sym(hyp_cpu_logical_map)[NR_CPUS];
 
 enum kvm_iommu_driver {
 	KVM_IOMMU_DRIVER_NONE,
+	KVM_IOMMU_DRIVER_SMMUV3,
 };
 
 struct vcpu_reset_state {
diff --git a/arch/arm64/kvm/hyp/include/nvhe/iommu.h b/arch/arm64/kvm/hyp/include/nvhe/iommu.h
index 76d3fa6ce331..0ba59d20bef3 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/iommu.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/iommu.h
@@ -5,6 +5,15 @@
 #include <kvm/iommu.h>
 #include <linux/io-pgtable.h>
 
+#if IS_ENABLED(CONFIG_ARM_SMMU_V3_PKVM)
+int kvm_arm_smmu_v3_register(void);
+#else /* CONFIG_ARM_SMMU_V3_PKVM */
+static inline int kvm_arm_smmu_v3_register(void)
+{
+	return -EINVAL;
+}
+#endif /* CONFIG_ARM_SMMU_V3_PKVM */
+
 #if IS_ENABLED(CONFIG_KVM_IOMMU)
 int kvm_iommu_init(void);
 int kvm_iommu_init_device(struct kvm_hyp_iommu *iommu);
diff --git a/include/kvm/arm_smmu_v3.h b/include/kvm/arm_smmu_v3.h
new file mode 100644
index 000000000000..ebe488b2f93c
--- /dev/null
+++ b/include/kvm/arm_smmu_v3.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __KVM_ARM_SMMU_V3_H
+#define __KVM_ARM_SMMU_V3_H
+
+#include <asm/kvm_asm.h>
+#include <kvm/iommu.h>
+
+#if IS_ENABLED(CONFIG_ARM_SMMU_V3_PKVM)
+
+struct hyp_arm_smmu_v3_device {
+	struct kvm_hyp_iommu	iommu;
+};
+
+extern size_t kvm_nvhe_sym(kvm_hyp_arm_smmu_v3_count);
+#define kvm_hyp_arm_smmu_v3_count kvm_nvhe_sym(kvm_hyp_arm_smmu_v3_count)
+
+extern struct hyp_arm_smmu_v3_device *kvm_nvhe_sym(kvm_hyp_arm_smmu_v3_smmus);
+#define kvm_hyp_arm_smmu_v3_smmus kvm_nvhe_sym(kvm_hyp_arm_smmu_v3_smmus)
+
+#endif /* CONFIG_ARM_SMMU_V3_PKVM */
+
+#endif /* __KVM_ARM_SMMU_V3_H */
diff --git a/arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c b/arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c
new file mode 100644
index 000000000000..c167e4dbd28d
--- /dev/null
+++ b/arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c
@@ -0,0 +1,27 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * pKVM hyp driver for the Arm SMMUv3
+ *
+ * Copyright (C) 2022 Linaro Ltd.
+ */
+#include <asm/kvm_hyp.h>
+#include <kvm/arm_smmu_v3.h>
+#include <nvhe/iommu.h>
+
+size_t __ro_after_init kvm_hyp_arm_smmu_v3_count;
+struct hyp_arm_smmu_v3_device __ro_after_init *kvm_hyp_arm_smmu_v3_smmus;
+
+static int smmu_init(void)
+{
+	return -ENOSYS;
+}
+
+static struct kvm_iommu_ops smmu_ops = {
+	.init				= smmu_init,
+};
+
+int kvm_arm_smmu_v3_register(void)
+{
+	kvm_iommu_ops = smmu_ops;
+	return 0;
+}
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 3e73c066d560..a25de8c5d489 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -294,6 +294,8 @@ static int select_iommu_ops(enum kvm_iommu_driver driver)
 	switch (driver) {
 	case KVM_IOMMU_DRIVER_NONE:
 		return 0;
+	case KVM_IOMMU_DRIVER_SMMUV3:
+		return kvm_arm_smmu_v3_register();
 	}
 
 	return -EINVAL;
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 201+ messages in thread

* [RFC PATCH 22/45] KVM: arm64: smmu-v3: Initialize registers
  2023-02-01 12:52 ` Jean-Philippe Brucker
@ 2023-02-01 12:53   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:53 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

Ensure all writable registers are properly initialized. We do not touch
registers that will not be read by the SMMU due to disabled features,
such as event queue registers.
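
Writes to CR0 only take effect once the SMMU acknowledges them in CR0ACK,
which is what smmu_write_cr0() below polls for. For illustration only (this
is not part of the patch, and the real enable sequence lands later in the
series), an enable path could reuse the same helper like this:

  /* Illustrative sketch: enable the command queue, then translation */
  static int smmu_enable_example(struct hyp_arm_smmu_v3_device *smmu)
  {
          int ret;

          ret = smmu_write_cr0(smmu, CR0_CMDQEN);
          if (ret)
                  return ret;
          return smmu_write_cr0(smmu, CR0_CMDQEN | CR0_SMMUEN);
  }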

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 include/kvm/arm_smmu_v3.h                   |  11 +++
 arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c | 103 +++++++++++++++++++-
 2 files changed, 113 insertions(+), 1 deletion(-)

diff --git a/include/kvm/arm_smmu_v3.h b/include/kvm/arm_smmu_v3.h
index ebe488b2f93c..d4b1e487b7d7 100644
--- a/include/kvm/arm_smmu_v3.h
+++ b/include/kvm/arm_smmu_v3.h
@@ -7,8 +7,19 @@
 
 #if IS_ENABLED(CONFIG_ARM_SMMU_V3_PKVM)
 
+/*
+ * Parameters from the trusted host:
+ * @mmio_addr		base address of the SMMU registers
+ * @mmio_size		size of the registers resource
+ *
+ * Other members are filled and used at runtime by the SMMU driver.
+ */
 struct hyp_arm_smmu_v3_device {
 	struct kvm_hyp_iommu	iommu;
+	phys_addr_t		mmio_addr;
+	size_t			mmio_size;
+
+	void __iomem		*base;
 };
 
 extern size_t kvm_nvhe_sym(kvm_hyp_arm_smmu_v3_count);
diff --git a/arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c b/arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c
index c167e4dbd28d..75a6aa01b057 100644
--- a/arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c
+++ b/arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c
@@ -4,16 +4,117 @@
  *
  * Copyright (C) 2022 Linaro Ltd.
  */
+#include <asm/arm-smmu-v3-regs.h>
 #include <asm/kvm_hyp.h>
 #include <kvm/arm_smmu_v3.h>
 #include <nvhe/iommu.h>
+#include <nvhe/mm.h>
+#include <nvhe/pkvm.h>
+
+#define ARM_SMMU_POLL_TIMEOUT_US	1000000 /* 1s! */
 
 size_t __ro_after_init kvm_hyp_arm_smmu_v3_count;
 struct hyp_arm_smmu_v3_device __ro_after_init *kvm_hyp_arm_smmu_v3_smmus;
 
+#define for_each_smmu(smmu) \
+	for ((smmu) = kvm_hyp_arm_smmu_v3_smmus; \
+	     (smmu) != &kvm_hyp_arm_smmu_v3_smmus[kvm_hyp_arm_smmu_v3_count]; \
+	     (smmu)++)
+
+/*
+ * Wait until @cond is true.
+ * Return 0 on success, or -ETIMEDOUT
+ */
+#define smmu_wait(_cond)					\
+({								\
+	int __i = 0;						\
+	int __ret = 0;						\
+								\
+	while (!(_cond)) {					\
+		if (++__i > ARM_SMMU_POLL_TIMEOUT_US) {		\
+			__ret = -ETIMEDOUT;			\
+			break;					\
+		}						\
+		pkvm_udelay(1);					\
+	}							\
+	__ret;							\
+})
+
+static int smmu_write_cr0(struct hyp_arm_smmu_v3_device *smmu, u32 val)
+{
+	writel_relaxed(val, smmu->base + ARM_SMMU_CR0);
+	return smmu_wait(readl_relaxed(smmu->base + ARM_SMMU_CR0ACK) == val);
+}
+
+static int smmu_init_registers(struct hyp_arm_smmu_v3_device *smmu)
+{
+	u64 val, old;
+
+	if (!(readl_relaxed(smmu->base + ARM_SMMU_GBPA) & GBPA_ABORT))
+		return -EINVAL;
+
+	/* Initialize all RW registers that will be read by the SMMU */
+	smmu_write_cr0(smmu, 0);
+
+	val = FIELD_PREP(CR1_TABLE_SH, ARM_SMMU_SH_ISH) |
+	      FIELD_PREP(CR1_TABLE_OC, CR1_CACHE_WB) |
+	      FIELD_PREP(CR1_TABLE_IC, CR1_CACHE_WB) |
+	      FIELD_PREP(CR1_QUEUE_SH, ARM_SMMU_SH_ISH) |
+	      FIELD_PREP(CR1_QUEUE_OC, CR1_CACHE_WB) |
+	      FIELD_PREP(CR1_QUEUE_IC, CR1_CACHE_WB);
+	writel_relaxed(val, smmu->base + ARM_SMMU_CR1);
+	writel_relaxed(CR2_PTM, smmu->base + ARM_SMMU_CR2);
+	writel_relaxed(0, smmu->base + ARM_SMMU_IRQ_CTRL);
+
+	val = readl_relaxed(smmu->base + ARM_SMMU_GERROR);
+	old = readl_relaxed(smmu->base + ARM_SMMU_GERRORN);
+	/* Service Failure Mode is fatal */
+	if ((val ^ old) & GERROR_SFM_ERR)
+		return -EIO;
+	/* Clear pending errors */
+	writel_relaxed(val, smmu->base + ARM_SMMU_GERRORN);
+
+	return 0;
+}
+
+static int smmu_init_device(struct hyp_arm_smmu_v3_device *smmu)
+{
+	int ret;
+
+	if (!PAGE_ALIGNED(smmu->mmio_addr | smmu->mmio_size))
+		return -EINVAL;
+
+	ret = pkvm_create_hyp_device_mapping(smmu->mmio_addr, smmu->mmio_size,
+					     &smmu->base);
+	if (IS_ERR(smmu->base))
+		return PTR_ERR(smmu->base);
+
+	ret = smmu_init_registers(smmu);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
 static int smmu_init(void)
 {
-	return -ENOSYS;
+	int ret;
+	struct hyp_arm_smmu_v3_device *smmu;
+
+	ret = pkvm_create_mappings(kvm_hyp_arm_smmu_v3_smmus,
+				   kvm_hyp_arm_smmu_v3_smmus +
+				   kvm_hyp_arm_smmu_v3_count,
+				   PAGE_HYP);
+	if (ret)
+		return ret;
+
+	for_each_smmu(smmu) {
+		ret = smmu_init_device(smmu);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
 }
 
 static struct kvm_iommu_ops smmu_ops = {
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 201+ messages in thread

* [RFC PATCH 23/45] KVM: arm64: smmu-v3: Setup command queue
  2023-02-01 12:52 ` Jean-Philippe Brucker
@ 2023-02-01 12:53   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:53 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

Map the command queue allocated by the host into the hypervisor address
space. When the host mappings are finalized, the queue is unmapped from
the host.
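
As an illustration of the index arithmetic used below (the numbers are
only an example, not a requirement of this patch): the PROD and CONS
registers carry a queue index plus a wrap bit one position above it, so
with CMDQ_BASE.LOG2SIZE = 8 the queue has 256 entries of
CMDQ_ENT_DWORDS (2) 64-bit words, i.e. 256 * 2 * 8 = 4096 bytes mapped
into the hypervisor, and:

	/*
	 * Example with log2size = 8 (index mask 0xff, wrap bit 0x100):
	 *   prod = 0x105, cons = 0x005 -> same index, different wrap: full
	 *   prod = 0x105, cons = 0x105 -> same index, same wrap:      empty
	 */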

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 include/kvm/arm_smmu_v3.h                   |   4 +
 arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c | 148 ++++++++++++++++++++
 2 files changed, 152 insertions(+)

diff --git a/include/kvm/arm_smmu_v3.h b/include/kvm/arm_smmu_v3.h
index d4b1e487b7d7..da36737bc1e0 100644
--- a/include/kvm/arm_smmu_v3.h
+++ b/include/kvm/arm_smmu_v3.h
@@ -18,8 +18,12 @@ struct hyp_arm_smmu_v3_device {
 	struct kvm_hyp_iommu	iommu;
 	phys_addr_t		mmio_addr;
 	size_t			mmio_size;
+	unsigned long		features;
 
 	void __iomem		*base;
+	u32			cmdq_prod;
+	u64			*cmdq_base;
+	size_t			cmdq_log2size;
 };
 
 extern size_t kvm_nvhe_sym(kvm_hyp_arm_smmu_v3_count);
diff --git a/arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c b/arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c
index 75a6aa01b057..36ee5724f36f 100644
--- a/arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c
+++ b/arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c
@@ -40,12 +40,119 @@ struct hyp_arm_smmu_v3_device __ro_after_init *kvm_hyp_arm_smmu_v3_smmus;
 	__ret;							\
 })
 
+#define smmu_wait_event(_smmu, _cond)				\
+({								\
+	if ((_smmu)->features & ARM_SMMU_FEAT_SEV) {		\
+		while (!(_cond))				\
+			wfe();					\
+	}							\
+	smmu_wait(_cond);					\
+})
+
 static int smmu_write_cr0(struct hyp_arm_smmu_v3_device *smmu, u32 val)
 {
 	writel_relaxed(val, smmu->base + ARM_SMMU_CR0);
 	return smmu_wait(readl_relaxed(smmu->base + ARM_SMMU_CR0ACK) == val);
 }
 
+#define Q_WRAP(smmu, reg)	((reg) & (1 << (smmu)->cmdq_log2size))
+#define Q_IDX(smmu, reg)	((reg) & ((1 << (smmu)->cmdq_log2size) - 1))
+
+static bool smmu_cmdq_full(struct hyp_arm_smmu_v3_device *smmu)
+{
+	u64 cons = readl_relaxed(smmu->base + ARM_SMMU_CMDQ_CONS);
+
+	return Q_IDX(smmu, smmu->cmdq_prod) == Q_IDX(smmu, cons) &&
+	       Q_WRAP(smmu, smmu->cmdq_prod) != Q_WRAP(smmu, cons);
+}
+
+static bool smmu_cmdq_empty(struct hyp_arm_smmu_v3_device *smmu)
+{
+	u64 cons = readl_relaxed(smmu->base + ARM_SMMU_CMDQ_CONS);
+
+	return Q_IDX(smmu, smmu->cmdq_prod) == Q_IDX(smmu, cons) &&
+	       Q_WRAP(smmu, smmu->cmdq_prod) == Q_WRAP(smmu, cons);
+}
+
+static int smmu_add_cmd(struct hyp_arm_smmu_v3_device *smmu,
+			struct arm_smmu_cmdq_ent *ent)
+{
+	int i;
+	int ret;
+	u64 cmd[CMDQ_ENT_DWORDS] = {};
+	int idx = Q_IDX(smmu, smmu->cmdq_prod);
+	u64 *slot = smmu->cmdq_base + idx * CMDQ_ENT_DWORDS;
+
+	ret = smmu_wait_event(smmu, !smmu_cmdq_full(smmu));
+	if (ret)
+		return ret;
+
+	cmd[0] |= FIELD_PREP(CMDQ_0_OP, ent->opcode);
+
+	switch (ent->opcode) {
+	case CMDQ_OP_CFGI_ALL:
+		cmd[1] |= FIELD_PREP(CMDQ_CFGI_1_RANGE, 31);
+		break;
+	case CMDQ_OP_CFGI_STE:
+		cmd[0] |= FIELD_PREP(CMDQ_CFGI_0_SID, ent->cfgi.sid);
+		cmd[1] |= FIELD_PREP(CMDQ_CFGI_1_LEAF, ent->cfgi.leaf);
+		break;
+	case CMDQ_OP_TLBI_NSNH_ALL:
+		break;
+	case CMDQ_OP_TLBI_S12_VMALL:
+		cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, ent->tlbi.vmid);
+		break;
+	case CMDQ_OP_TLBI_S2_IPA:
+		cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_NUM, ent->tlbi.num);
+		cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_SCALE, ent->tlbi.scale);
+		cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, ent->tlbi.vmid);
+		cmd[1] |= FIELD_PREP(CMDQ_TLBI_1_LEAF, ent->tlbi.leaf);
+		cmd[1] |= FIELD_PREP(CMDQ_TLBI_1_TTL, ent->tlbi.ttl);
+		cmd[1] |= FIELD_PREP(CMDQ_TLBI_1_TG, ent->tlbi.tg);
+		cmd[1] |= ent->tlbi.addr & CMDQ_TLBI_1_IPA_MASK;
+		break;
+	case CMDQ_OP_CMD_SYNC:
+		cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_CS, CMDQ_SYNC_0_CS_SEV);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	for (i = 0; i < CMDQ_ENT_DWORDS; i++)
+		slot[i] = cpu_to_le64(cmd[i]);
+
+	smmu->cmdq_prod++;
+	writel(Q_IDX(smmu, smmu->cmdq_prod) | Q_WRAP(smmu, smmu->cmdq_prod),
+	       smmu->base + ARM_SMMU_CMDQ_PROD);
+	return 0;
+}
+
+static int smmu_sync_cmd(struct hyp_arm_smmu_v3_device *smmu)
+{
+	int ret;
+	struct arm_smmu_cmdq_ent cmd = {
+		.opcode = CMDQ_OP_CMD_SYNC,
+	};
+
+	ret = smmu_add_cmd(smmu, &cmd);
+	if (ret)
+		return ret;
+
+	return smmu_wait_event(smmu, smmu_cmdq_empty(smmu));
+}
+
+__maybe_unused
+static int smmu_send_cmd(struct hyp_arm_smmu_v3_device *smmu,
+			 struct arm_smmu_cmdq_ent *cmd)
+{
+	int ret = smmu_add_cmd(smmu, cmd);
+
+	if (ret)
+		return ret;
+
+	return smmu_sync_cmd(smmu);
+}
+
 static int smmu_init_registers(struct hyp_arm_smmu_v3_device *smmu)
 {
 	u64 val, old;
@@ -77,6 +184,43 @@ static int smmu_init_registers(struct hyp_arm_smmu_v3_device *smmu)
 	return 0;
 }
 
+/* Transfer ownership of structures from host to hyp */
+static void *smmu_take_pages(u64 base, size_t size)
+{
+	void *hyp_ptr;
+
+	hyp_ptr = hyp_phys_to_virt(base);
+	if (pkvm_create_mappings(hyp_ptr, hyp_ptr + size, PAGE_HYP))
+		return NULL;
+
+	return hyp_ptr;
+}
+
+static int smmu_init_cmdq(struct hyp_arm_smmu_v3_device *smmu)
+{
+	u64 cmdq_base;
+	size_t cmdq_nr_entries, cmdq_size;
+
+	cmdq_base = readq_relaxed(smmu->base + ARM_SMMU_CMDQ_BASE);
+	if (cmdq_base & ~(Q_BASE_RWA | Q_BASE_ADDR_MASK | Q_BASE_LOG2SIZE))
+		return -EINVAL;
+
+	smmu->cmdq_log2size = cmdq_base & Q_BASE_LOG2SIZE;
+	cmdq_nr_entries = 1 << smmu->cmdq_log2size;
+	cmdq_size = cmdq_nr_entries * CMDQ_ENT_DWORDS * 8;
+
+	cmdq_base &= Q_BASE_ADDR_MASK;
+	smmu->cmdq_base = smmu_take_pages(cmdq_base, cmdq_size);
+	if (!smmu->cmdq_base)
+		return -EINVAL;
+
+	memset(smmu->cmdq_base, 0, cmdq_size);
+	writel_relaxed(0, smmu->base + ARM_SMMU_CMDQ_PROD);
+	writel_relaxed(0, smmu->base + ARM_SMMU_CMDQ_CONS);
+
+	return 0;
+}
+
 static int smmu_init_device(struct hyp_arm_smmu_v3_device *smmu)
 {
 	int ret;
@@ -93,6 +237,10 @@ static int smmu_init_device(struct hyp_arm_smmu_v3_device *smmu)
 	if (ret)
 		return ret;
 
+	ret = smmu_init_cmdq(smmu);
+	if (ret)
+		return ret;
+
 	return 0;
 }
 
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 201+ messages in thread

* [RFC PATCH 24/45] KVM: arm64: smmu-v3: Setup stream table
  2023-02-01 12:52 ` Jean-Philippe Brucker
@ 2023-02-01 12:53   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:53 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

Map the stream table allocated by the host into the hypervisor address
space. When the host mappings are finalized, the table is unmapped from
the host. Depending on the host configuration, the stream table may have
one or two levels; level-2 stream tables are populated lazily.
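
A worked example of the two-level decoding below, assuming 4KiB pages
and the usual 64-byte (STRTAB_STE_DWORDS * 8) STEs: a page-sized leaf
table then holds 64 STEs, so only split = 6 passes the check in
smmu_alloc_l2_strtab(), and a StreamID is decoded as:

	/*
	 * split = 6: L1 index = sid >> 6, STE index = sid & 0x3f,
	 * e.g. sid 0x1a5 -> L1 descriptor 6, STE 0x25 in that leaf,
	 * with the L1 descriptor encoding span = split + 1 = 7.
	 */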

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 include/kvm/arm_smmu_v3.h                   |   4 +
 arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c | 133 +++++++++++++++++++-
 2 files changed, 136 insertions(+), 1 deletion(-)

diff --git a/include/kvm/arm_smmu_v3.h b/include/kvm/arm_smmu_v3.h
index da36737bc1e0..fc67a3bf5709 100644
--- a/include/kvm/arm_smmu_v3.h
+++ b/include/kvm/arm_smmu_v3.h
@@ -24,6 +24,10 @@ struct hyp_arm_smmu_v3_device {
 	u32			cmdq_prod;
 	u64			*cmdq_base;
 	size_t			cmdq_log2size;
+	u64			*strtab_base;
+	size_t			strtab_num_entries;
+	size_t			strtab_num_l1_entries;
+	u8			strtab_split;
 };
 
 extern size_t kvm_nvhe_sym(kvm_hyp_arm_smmu_v3_count);
diff --git a/arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c b/arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c
index 36ee5724f36f..021bebebd40c 100644
--- a/arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c
+++ b/arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c
@@ -141,7 +141,6 @@ static int smmu_sync_cmd(struct hyp_arm_smmu_v3_device *smmu)
 	return smmu_wait_event(smmu, smmu_cmdq_empty(smmu));
 }
 
-__maybe_unused
 static int smmu_send_cmd(struct hyp_arm_smmu_v3_device *smmu,
 			 struct arm_smmu_cmdq_ent *cmd)
 {
@@ -153,6 +152,82 @@ static int smmu_send_cmd(struct hyp_arm_smmu_v3_device *smmu,
 	return smmu_sync_cmd(smmu);
 }
 
+__maybe_unused
+static int smmu_sync_ste(struct hyp_arm_smmu_v3_device *smmu, u32 sid)
+{
+	struct arm_smmu_cmdq_ent cmd = {
+		.opcode = CMDQ_OP_CFGI_STE,
+		.cfgi.sid = sid,
+		.cfgi.leaf = true,
+	};
+
+	return smmu_send_cmd(smmu, &cmd);
+}
+
+static int smmu_alloc_l2_strtab(struct hyp_arm_smmu_v3_device *smmu, u32 idx)
+{
+	void *table;
+	u64 l2ptr, span;
+
+	/* Leaf tables must be page-sized */
+	if (smmu->strtab_split + ilog2(STRTAB_STE_DWORDS) + 3 != PAGE_SHIFT)
+		return -EINVAL;
+
+	span = smmu->strtab_split + 1;
+	if (WARN_ON(span < 1 || span > 11))
+		return -EINVAL;
+
+	table = kvm_iommu_donate_page();
+	if (!table)
+		return -ENOMEM;
+
+	l2ptr = hyp_virt_to_phys(table);
+	if (l2ptr & (~STRTAB_L1_DESC_L2PTR_MASK | ~PAGE_MASK))
+		return -EINVAL;
+
+	/* Ensure the empty stream table is visible before the descriptor write */
+	wmb();
+
+	if ((cmpxchg64_relaxed(&smmu->strtab_base[idx], 0, l2ptr | span) != 0))
+		kvm_iommu_reclaim_page(table);
+
+	return 0;
+}
+
+__maybe_unused
+static u64 *smmu_get_ste_ptr(struct hyp_arm_smmu_v3_device *smmu, u32 sid)
+{
+	u32 idx;
+	int ret;
+	u64 l1std, span, *base;
+
+	if (sid >= smmu->strtab_num_entries)
+		return NULL;
+	sid = array_index_nospec(sid, smmu->strtab_num_entries);
+
+	if (!smmu->strtab_split)
+		return smmu->strtab_base + sid * STRTAB_STE_DWORDS;
+
+	idx = sid >> smmu->strtab_split;
+	l1std = smmu->strtab_base[idx];
+	if (!l1std) {
+		ret = smmu_alloc_l2_strtab(smmu, idx);
+		if (ret)
+			return NULL;
+		l1std = smmu->strtab_base[idx];
+		if (WARN_ON(!l1std))
+			return NULL;
+	}
+
+	span = l1std & STRTAB_L1_DESC_SPAN;
+	idx = sid & ((1 << smmu->strtab_split) - 1);
+	if (!span || idx >= (1 << (span - 1)))
+		return NULL;
+
+	base = hyp_phys_to_virt(l1std & STRTAB_L1_DESC_L2PTR_MASK);
+	return base + idx * STRTAB_STE_DWORDS;
+}
+
 static int smmu_init_registers(struct hyp_arm_smmu_v3_device *smmu)
 {
 	u64 val, old;
@@ -221,6 +296,58 @@ static int smmu_init_cmdq(struct hyp_arm_smmu_v3_device *smmu)
 	return 0;
 }
 
+static int smmu_init_strtab(struct hyp_arm_smmu_v3_device *smmu)
+{
+	u64 strtab_base;
+	size_t strtab_size;
+	u32 strtab_cfg, fmt;
+	int split, log2size;
+
+	strtab_base = readq_relaxed(smmu->base + ARM_SMMU_STRTAB_BASE);
+	if (strtab_base & ~(STRTAB_BASE_ADDR_MASK | STRTAB_BASE_RA))
+		return -EINVAL;
+
+	strtab_cfg = readl_relaxed(smmu->base + ARM_SMMU_STRTAB_BASE_CFG);
+	if (strtab_cfg & ~(STRTAB_BASE_CFG_FMT | STRTAB_BASE_CFG_SPLIT |
+			   STRTAB_BASE_CFG_LOG2SIZE))
+		return -EINVAL;
+
+	fmt = FIELD_GET(STRTAB_BASE_CFG_FMT, strtab_cfg);
+	split = FIELD_GET(STRTAB_BASE_CFG_SPLIT, strtab_cfg);
+	log2size = FIELD_GET(STRTAB_BASE_CFG_LOG2SIZE, strtab_cfg);
+
+	smmu->strtab_split = split;
+	smmu->strtab_num_entries = 1 << log2size;
+
+	switch (fmt) {
+	case STRTAB_BASE_CFG_FMT_LINEAR:
+		if (split)
+			return -EINVAL;
+		smmu->strtab_num_l1_entries = smmu->strtab_num_entries;
+		strtab_size = smmu->strtab_num_l1_entries *
+			      STRTAB_STE_DWORDS * 8;
+		break;
+	case STRTAB_BASE_CFG_FMT_2LVL:
+		if (split != 6 && split != 8 && split != 10)
+			return -EINVAL;
+		smmu->strtab_num_l1_entries = 1 << max(0, log2size - split);
+		strtab_size = smmu->strtab_num_l1_entries *
+			      STRTAB_L1_DESC_DWORDS * 8;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	strtab_base &= STRTAB_BASE_ADDR_MASK;
+	smmu->strtab_base = smmu_take_pages(strtab_base, strtab_size);
+	if (!smmu->strtab_base)
+		return -EINVAL;
+
+	/* Disable all STEs */
+	memset(smmu->strtab_base, 0, strtab_size);
+	return 0;
+}
+
 static int smmu_init_device(struct hyp_arm_smmu_v3_device *smmu)
 {
 	int ret;
@@ -241,6 +368,10 @@ static int smmu_init_device(struct hyp_arm_smmu_v3_device *smmu)
 	if (ret)
 		return ret;
 
+	ret = smmu_init_strtab(smmu);
+	if (ret)
+		return ret;
+
 	return 0;
 }
 
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 201+ messages in thread

* [RFC PATCH 25/45] KVM: arm64: smmu-v3: Reset the device
  2023-02-01 12:52 ` Jean-Philippe Brucker
@ 2023-02-01 12:53   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:53 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

Now that all structures are initialized, send global invalidations and
reset the SMMUv3 device.
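
The sequence implemented below, restated for clarity (it adds no
behaviour beyond what the patch does):

	/*
	 * 1. CR0.CMDQEN = 1              -> command queue becomes usable
	 * 2. CMD_CFGI_ALL                -> invalidate cached configuration
	 * 3. CMD_TLBI_NSNH_ALL           -> invalidate cached translations
	 * 4. CMD_SYNC                    -> wait for 2-3 to complete
	 * 5. CR0 = SMMUEN|CMDQEN|ATSCHK  -> enable translation
	 */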

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c | 36 ++++++++++++++++++++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c b/arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c
index 021bebebd40c..81040339ccfe 100644
--- a/arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c
+++ b/arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c
@@ -348,6 +348,40 @@ static int smmu_init_strtab(struct hyp_arm_smmu_v3_device *smmu)
 	return 0;
 }
 
+static int smmu_reset_device(struct hyp_arm_smmu_v3_device *smmu)
+{
+	int ret;
+	struct arm_smmu_cmdq_ent cfgi_cmd = {
+		.opcode = CMDQ_OP_CFGI_ALL,
+	};
+	struct arm_smmu_cmdq_ent tlbi_cmd = {
+		.opcode = CMDQ_OP_TLBI_NSNH_ALL,
+	};
+
+	/* Invalidate all cached configs and TLBs */
+	ret = smmu_write_cr0(smmu, CR0_CMDQEN);
+	if (ret)
+		return ret;
+
+	ret = smmu_add_cmd(smmu, &cfgi_cmd);
+	if (ret)
+		goto err_disable_cmdq;
+
+	ret = smmu_add_cmd(smmu, &tlbi_cmd);
+	if (ret)
+		goto err_disable_cmdq;
+
+	ret = smmu_sync_cmd(smmu);
+	if (ret)
+		goto err_disable_cmdq;
+
+	/* Enable translation */
+	return smmu_write_cr0(smmu, CR0_SMMUEN | CR0_CMDQEN | CR0_ATSCHK);
+
+err_disable_cmdq:
+	return smmu_write_cr0(smmu, 0);
+}
+
 static int smmu_init_device(struct hyp_arm_smmu_v3_device *smmu)
 {
 	int ret;
@@ -372,7 +406,7 @@ static int smmu_init_device(struct hyp_arm_smmu_v3_device *smmu)
 	if (ret)
 		return ret;
 
-	return 0;
+	return smmu_reset_device(smmu);
 }
 
 static int smmu_init(void)
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 201+ messages in thread

* [RFC PATCH 26/45] KVM: arm64: smmu-v3: Support io-pgtable
  2023-02-01 12:52 ` Jean-Philippe Brucker
@ 2023-02-01 12:53   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:53 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

Implement the hypervisor version of the io-pgtable allocation functions,
mirroring drivers/iommu/io-pgtable-arm.c. Page allocation uses the IOMMU
memcache filled by the host, except for the PGD, which may be larger than
a page.
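
For the PGD alignment rule below, a worked example with illustrative
sizes (the actual ARM_LPAE_PGD_SIZE() depends on the VTCR configuration):

	/*
	 * alignment = max(pgd_size, 8 * sizeof(arm_lpae_iopte)):
	 *   a  4-entry PGD ( 32 bytes) must be  64-byte aligned,
	 *   a 16-entry PGD (128 bytes) must be 128-byte aligned (its size).
	 */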

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 arch/arm64/kvm/hyp/nvhe/Makefile              |  2 +
 arch/arm64/kvm/hyp/include/nvhe/iommu.h       |  7 ++
 include/linux/io-pgtable-arm.h                |  6 ++
 .../arm64/kvm/hyp/nvhe/iommu/io-pgtable-arm.c | 97 +++++++++++++++++++
 4 files changed, 112 insertions(+)
 create mode 100644 arch/arm64/kvm/hyp/nvhe/iommu/io-pgtable-arm.c

diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index 349c874762c8..8359909bd796 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -30,6 +30,8 @@ hyp-obj-y += $(lib-objs)
 
 hyp-obj-$(CONFIG_KVM_IOMMU) += iommu/iommu.o
 hyp-obj-$(CONFIG_ARM_SMMU_V3_PKVM) += iommu/arm-smmu-v3.o
+hyp-obj-$(CONFIG_ARM_SMMU_V3_PKVM) += iommu/io-pgtable-arm.o \
+	../../../../../drivers/iommu/io-pgtable-arm-common.o
 
 ##
 ## Build rules for compiling nVHE hyp code
diff --git a/arch/arm64/kvm/hyp/include/nvhe/iommu.h b/arch/arm64/kvm/hyp/include/nvhe/iommu.h
index 0ba59d20bef3..c7744cca6e13 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/iommu.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/iommu.h
@@ -6,7 +6,14 @@
 #include <linux/io-pgtable.h>
 
 #if IS_ENABLED(CONFIG_ARM_SMMU_V3_PKVM)
+#include <linux/io-pgtable-arm.h>
+
 int kvm_arm_smmu_v3_register(void);
+
+int kvm_arm_io_pgtable_init(struct io_pgtable_cfg *cfg,
+			    struct arm_lpae_io_pgtable *data);
+int kvm_arm_io_pgtable_alloc(struct io_pgtable *iop, unsigned long pgd_hva);
+int kvm_arm_io_pgtable_free(struct io_pgtable *iop);
 #else /* CONFIG_ARM_SMMU_V3_PKVM */
 static inline int kvm_arm_smmu_v3_register(void)
 {
diff --git a/include/linux/io-pgtable-arm.h b/include/linux/io-pgtable-arm.h
index 2b3e69386d08..b89b8ec57721 100644
--- a/include/linux/io-pgtable-arm.h
+++ b/include/linux/io-pgtable-arm.h
@@ -161,8 +161,14 @@ static inline bool iopte_leaf(arm_lpae_iopte pte, int lvl,
 	return iopte_type(pte) == ARM_LPAE_PTE_TYPE_BLOCK;
 }
 
+#ifdef __KVM_NVHE_HYPERVISOR__
+#include <nvhe/memory.h>
+#define __arm_lpae_virt_to_phys	hyp_virt_to_phys
+#define __arm_lpae_phys_to_virt	hyp_phys_to_virt
+#else
 #define __arm_lpae_virt_to_phys	__pa
 #define __arm_lpae_phys_to_virt	__va
+#endif
 
 /* Generic functions */
 void __arm_lpae_free_pgtable(struct arm_lpae_io_pgtable *data, int lvl,
diff --git a/arch/arm64/kvm/hyp/nvhe/iommu/io-pgtable-arm.c b/arch/arm64/kvm/hyp/nvhe/iommu/io-pgtable-arm.c
new file mode 100644
index 000000000000..a46490acb45c
--- /dev/null
+++ b/arch/arm64/kvm/hyp/nvhe/iommu/io-pgtable-arm.c
@@ -0,0 +1,97 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2022 Arm Ltd.
+ */
+#include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
+#include <kvm/arm_smmu_v3.h>
+#include <linux/types.h>
+#include <linux/gfp_types.h>
+#include <linux/io-pgtable-arm.h>
+
+#include <nvhe/iommu.h>
+#include <nvhe/mem_protect.h>
+
+bool __ro_after_init selftest_running;
+
+void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp, struct io_pgtable_cfg *cfg)
+{
+	void *addr = kvm_iommu_donate_page();
+
+	BUG_ON(size != PAGE_SIZE);
+
+	if (addr && !cfg->coherent_walk)
+		kvm_flush_dcache_to_poc(addr, size);
+
+	return addr;
+}
+
+void __arm_lpae_free_pages(void *addr, size_t size, struct io_pgtable_cfg *cfg)
+{
+	BUG_ON(size != PAGE_SIZE);
+
+	if (!cfg->coherent_walk)
+		kvm_flush_dcache_to_poc(addr, size);
+
+	kvm_iommu_reclaim_page(addr);
+}
+
+void __arm_lpae_sync_pte(arm_lpae_iopte *ptep, int num_entries,
+			 struct io_pgtable_cfg *cfg)
+{
+	if (!cfg->coherent_walk)
+		kvm_flush_dcache_to_poc(ptep, sizeof(*ptep) * num_entries);
+}
+
+int kvm_arm_io_pgtable_init(struct io_pgtable_cfg *cfg,
+			    struct arm_lpae_io_pgtable *data)
+{
+	int ret = arm_lpae_init_pgtable_s2(cfg, data);
+
+	if (ret)
+		return ret;
+
+	data->iop.cfg = *cfg;
+	return 0;
+}
+
+int kvm_arm_io_pgtable_alloc(struct io_pgtable *iopt, unsigned long pgd_hva)
+{
+	size_t pgd_size, alignment;
+	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(iopt->ops);
+
+	pgd_size = ARM_LPAE_PGD_SIZE(data);
+	/*
+	 * If it has eight or more entries, the table must be aligned on
+	 * its size. Otherwise 64 bytes.
+	 */
+	alignment = max(pgd_size, 8 * sizeof(arm_lpae_iopte));
+	if (!IS_ALIGNED(pgd_hva, alignment))
+		return -EINVAL;
+
+	iopt->pgd = pkvm_map_donated_memory(pgd_hva, pgd_size);
+	if (!iopt->pgd)
+		return -ENOMEM;
+
+	if (!data->iop.cfg.coherent_walk)
+		kvm_flush_dcache_to_poc(iopt->pgd, pgd_size);
+
+	/* Ensure the empty pgd is visible before any actual TTBR write */
+	wmb();
+
+	return 0;
+}
+
+int kvm_arm_io_pgtable_free(struct io_pgtable *iopt)
+{
+	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(iopt->ops);
+	size_t pgd_size = ARM_LPAE_PGD_SIZE(data);
+
+	if (!data->iop.cfg.coherent_walk)
+		kvm_flush_dcache_to_poc(iopt->pgd, pgd_size);
+
+	/* Free all tables but the pgd */
+	__arm_lpae_free_pgtable(data, data->start_level, iopt->pgd, true);
+	pkvm_unmap_donated_memory(iopt->pgd, pgd_size);
+	return 0;
+}
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 201+ messages in thread

* [RFC PATCH 27/45] KVM: arm64: smmu-v3: Setup domains and page table configuration
  2023-02-01 12:52 ` Jean-Philippe Brucker
@ 2023-02-01 12:53   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:53 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

Set up the stream table entries when the host issues the attach_dev() and
detach_dev() hypercalls. The driver holds a single io-pgtable configuration
shared by all domains.
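
The STE update ordering used by smmu_attach_dev(), restated here because
the SMMU is allowed to cache even an invalid (V=0) entry:

	/*
	 * 1. write STE dwords [1..] (config, VTCR, S2TTB) with dword 0
	 *    still 0, so the SMMU cannot observe a half-written entry;
	 * 2. CMD_CFGI_STE + CMD_SYNC to flush any cached copy;
	 * 3. write dword 0 with STRTAB_STE_0_V to make the entry live;
	 * 4. CMD_CFGI_STE + CMD_SYNC again to publish the new entry.
	 */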

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 include/kvm/arm_smmu_v3.h                   |   2 +
 arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c | 178 +++++++++++++++++++-
 2 files changed, 177 insertions(+), 3 deletions(-)

diff --git a/include/kvm/arm_smmu_v3.h b/include/kvm/arm_smmu_v3.h
index fc67a3bf5709..ed139b0e9612 100644
--- a/include/kvm/arm_smmu_v3.h
+++ b/include/kvm/arm_smmu_v3.h
@@ -3,6 +3,7 @@
 #define __KVM_ARM_SMMU_V3_H
 
 #include <asm/kvm_asm.h>
+#include <linux/io-pgtable-arm.h>
 #include <kvm/iommu.h>
 
 #if IS_ENABLED(CONFIG_ARM_SMMU_V3_PKVM)
@@ -28,6 +29,7 @@ struct hyp_arm_smmu_v3_device {
 	size_t			strtab_num_entries;
 	size_t			strtab_num_l1_entries;
 	u8			strtab_split;
+	struct arm_lpae_io_pgtable pgtable;
 };
 
 extern size_t kvm_nvhe_sym(kvm_hyp_arm_smmu_v3_count);
diff --git a/arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c b/arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c
index 81040339ccfe..56e313203a16 100644
--- a/arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c
+++ b/arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c
@@ -152,7 +152,6 @@ static int smmu_send_cmd(struct hyp_arm_smmu_v3_device *smmu,
 	return smmu_sync_cmd(smmu);
 }
 
-__maybe_unused
 static int smmu_sync_ste(struct hyp_arm_smmu_v3_device *smmu, u32 sid)
 {
 	struct arm_smmu_cmdq_ent cmd = {
@@ -194,7 +193,6 @@ static int smmu_alloc_l2_strtab(struct hyp_arm_smmu_v3_device *smmu, u32 idx)
 	return 0;
 }
 
-__maybe_unused
 static u64 *smmu_get_ste_ptr(struct hyp_arm_smmu_v3_device *smmu, u32 sid)
 {
 	u32 idx;
@@ -382,6 +380,68 @@ static int smmu_reset_device(struct hyp_arm_smmu_v3_device *smmu)
 	return smmu_write_cr0(smmu, 0);
 }
 
+static struct hyp_arm_smmu_v3_device *to_smmu(struct kvm_hyp_iommu *iommu)
+{
+	return container_of(iommu, struct hyp_arm_smmu_v3_device, iommu);
+}
+
+static void smmu_tlb_flush_all(void *cookie)
+{
+	struct kvm_iommu_tlb_cookie *data = cookie;
+	struct hyp_arm_smmu_v3_device *smmu = to_smmu(data->iommu);
+	struct arm_smmu_cmdq_ent cmd = {
+		.opcode = CMDQ_OP_TLBI_S12_VMALL,
+		.tlbi.vmid = data->domain_id,
+	};
+
+	WARN_ON(smmu_send_cmd(smmu, &cmd));
+}
+
+static void smmu_tlb_inv_range(struct kvm_iommu_tlb_cookie *data,
+			       unsigned long iova, size_t size, size_t granule,
+			       bool leaf)
+{
+	struct hyp_arm_smmu_v3_device *smmu = to_smmu(data->iommu);
+	unsigned long end = iova + size;
+	struct arm_smmu_cmdq_ent cmd = {
+		.opcode = CMDQ_OP_TLBI_S2_IPA,
+		.tlbi.vmid = data->domain_id,
+		.tlbi.leaf = leaf,
+	};
+
+	/*
+	 * There are no mappings at high addresses since we don't use TTB1, so
+	 * no overflow possible.
+	 */
+	BUG_ON(end < iova);
+
+	while (iova < end) {
+		cmd.tlbi.addr = iova;
+		WARN_ON(smmu_send_cmd(smmu, &cmd));
+		BUG_ON(iova + granule < iova);
+		iova += granule;
+	}
+}
+
+static void smmu_tlb_flush_walk(unsigned long iova, size_t size,
+				size_t granule, void *cookie)
+{
+	smmu_tlb_inv_range(cookie, iova, size, granule, false);
+}
+
+static void smmu_tlb_add_page(struct iommu_iotlb_gather *gather,
+			      unsigned long iova, size_t granule,
+			      void *cookie)
+{
+	smmu_tlb_inv_range(cookie, iova, granule, granule, true);
+}
+
+static const struct iommu_flush_ops smmu_tlb_ops = {
+	.tlb_flush_all	= smmu_tlb_flush_all,
+	.tlb_flush_walk = smmu_tlb_flush_walk,
+	.tlb_add_page	= smmu_tlb_add_page,
+};
+
 static int smmu_init_device(struct hyp_arm_smmu_v3_device *smmu)
 {
 	int ret;
@@ -394,6 +454,14 @@ static int smmu_init_device(struct hyp_arm_smmu_v3_device *smmu)
 	if (IS_ERR(smmu->base))
 		return PTR_ERR(smmu->base);
 
+	smmu->iommu.pgtable_cfg.tlb = &smmu_tlb_ops;
+
+	ret = kvm_arm_io_pgtable_init(&smmu->iommu.pgtable_cfg, &smmu->pgtable);
+	if (ret)
+		return ret;
+
+	smmu->iommu.pgtable = &smmu->pgtable.iop;
+
 	ret = smmu_init_registers(smmu);
 	if (ret)
 		return ret;
@@ -406,7 +474,11 @@ static int smmu_init_device(struct hyp_arm_smmu_v3_device *smmu)
 	if (ret)
 		return ret;
 
-	return smmu_reset_device(smmu);
+	ret = smmu_reset_device(smmu);
+	if (ret)
+		return ret;
+
+	return kvm_iommu_init_device(&smmu->iommu);
 }
 
 static int smmu_init(void)
@@ -414,6 +486,10 @@ static int smmu_init(void)
 	int ret;
 	struct hyp_arm_smmu_v3_device *smmu;
 
+	ret = kvm_iommu_init();
+	if (ret)
+		return ret;
+
 	ret = pkvm_create_mappings(kvm_hyp_arm_smmu_v3_smmus,
 				   kvm_hyp_arm_smmu_v3_smmus +
 				   kvm_hyp_arm_smmu_v3_count,
@@ -430,8 +506,104 @@ static int smmu_init(void)
 	return 0;
 }
 
+static struct kvm_hyp_iommu *smmu_id_to_iommu(pkvm_handle_t smmu_id)
+{
+	if (smmu_id >= kvm_hyp_arm_smmu_v3_count)
+		return NULL;
+	smmu_id = array_index_nospec(smmu_id, kvm_hyp_arm_smmu_v3_count);
+
+	return &kvm_hyp_arm_smmu_v3_smmus[smmu_id].iommu;
+}
+
+static int smmu_attach_dev(struct kvm_hyp_iommu *iommu, pkvm_handle_t domain_id,
+			   struct kvm_hyp_iommu_domain *domain, u32 sid)
+{
+	int i;
+	int ret;
+	u64 *dst;
+	struct io_pgtable_cfg *cfg;
+	u64 ts, sl, ic, oc, sh, tg, ps;
+	u64 ent[STRTAB_STE_DWORDS] = {};
+	struct hyp_arm_smmu_v3_device *smmu = to_smmu(iommu);
+
+	dst = smmu_get_ste_ptr(smmu, sid);
+	if (!dst || dst[0])
+		return -EINVAL;
+
+	cfg = &smmu->pgtable.iop.cfg;
+	ps = cfg->arm_lpae_s2_cfg.vtcr.ps;
+	tg = cfg->arm_lpae_s2_cfg.vtcr.tg;
+	sh = cfg->arm_lpae_s2_cfg.vtcr.sh;
+	oc = cfg->arm_lpae_s2_cfg.vtcr.orgn;
+	ic = cfg->arm_lpae_s2_cfg.vtcr.irgn;
+	sl = cfg->arm_lpae_s2_cfg.vtcr.sl;
+	ts = cfg->arm_lpae_s2_cfg.vtcr.tsz;
+
+	ent[0] = STRTAB_STE_0_V |
+		 FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S2_TRANS);
+	ent[2] = FIELD_PREP(STRTAB_STE_2_VTCR,
+			FIELD_PREP(STRTAB_STE_2_VTCR_S2PS, ps) |
+			FIELD_PREP(STRTAB_STE_2_VTCR_S2TG, tg) |
+			FIELD_PREP(STRTAB_STE_2_VTCR_S2SH0, sh) |
+			FIELD_PREP(STRTAB_STE_2_VTCR_S2OR0, oc) |
+			FIELD_PREP(STRTAB_STE_2_VTCR_S2IR0, ic) |
+			FIELD_PREP(STRTAB_STE_2_VTCR_S2SL0, sl) |
+			FIELD_PREP(STRTAB_STE_2_VTCR_S2T0SZ, ts)) |
+		 FIELD_PREP(STRTAB_STE_2_S2VMID, domain_id) |
+		 STRTAB_STE_2_S2AA64;
+	ent[3] = hyp_virt_to_phys(domain->pgd) & STRTAB_STE_3_S2TTB_MASK;
+
+	/*
+	 * The SMMU may cache a disabled STE.
+	 * Initialize all fields, sync, then enable it.
+	 */
+	for (i = 1; i < STRTAB_STE_DWORDS; i++)
+		dst[i] = cpu_to_le64(ent[i]);
+
+	ret = smmu_sync_ste(smmu, sid);
+	if (ret)
+		return ret;
+
+	WRITE_ONCE(dst[0], cpu_to_le64(ent[0]));
+	ret = smmu_sync_ste(smmu, sid);
+	if (ret)
+		dst[0] = 0;
+
+	return ret;
+}
+
+static int smmu_detach_dev(struct kvm_hyp_iommu *iommu, pkvm_handle_t domain_id,
+			   struct kvm_hyp_iommu_domain *domain, u32 sid)
+{
+	u64 ttb;
+	u64 *dst;
+	int i, ret;
+	struct hyp_arm_smmu_v3_device *smmu = to_smmu(iommu);
+
+	dst = smmu_get_ste_ptr(smmu, sid);
+	if (!dst)
+		return -ENODEV;
+
+	ttb = dst[3] & STRTAB_STE_3_S2TTB_MASK;
+
+	dst[0] = 0;
+	ret = smmu_sync_ste(smmu, sid);
+	if (ret)
+		return ret;
+
+	for (i = 1; i < STRTAB_STE_DWORDS; i++)
+		dst[i] = 0;
+
+	return smmu_sync_ste(smmu, sid);
+}
+
 static struct kvm_iommu_ops smmu_ops = {
 	.init				= smmu_init,
+	.get_iommu_by_id		= smmu_id_to_iommu,
+	.alloc_iopt			= kvm_arm_io_pgtable_alloc,
+	.free_iopt			= kvm_arm_io_pgtable_free,
+	.attach_dev			= smmu_attach_dev,
+	.detach_dev			= smmu_detach_dev,
 };
 
 int kvm_arm_smmu_v3_register(void)
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 201+ messages in thread
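
For readers tracing the control flow, below is a minimal sketch (not part
of the posted patch) of how the hypervisor's generic IOMMU layer might
route a host attach request through the ops registered above. The wrapper
function and its parameter list are assumptions for illustration; only the
get_iommu_by_id() and attach_dev() callbacks come from this patch, and the
real series resolves the domain object from domain_id in common code that
is not shown here.

	/*
	 * Illustrative sketch only. smmu_id_to_iommu() and smmu_attach_dev()
	 * are the callbacks installed in smmu_ops above; everything else in
	 * this function is an assumption made for illustration.
	 */
	static int example_handle_attach(struct kvm_iommu_ops *ops,
					 struct kvm_hyp_iommu_domain *domain,
					 pkvm_handle_t smmu_id,
					 pkvm_handle_t domain_id, u32 sid)
	{
		struct kvm_hyp_iommu *iommu;

		/* Bounds-checked lookup, implemented by smmu_id_to_iommu() */
		iommu = ops->get_iommu_by_id(smmu_id);
		if (!iommu)
			return -EINVAL;

		/* Lands in smmu_attach_dev(): fill the STE, sync, set valid */
		return ops->attach_dev(iommu, domain_id, domain, sid);
	}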

* [RFC PATCH 28/45] iommu/arm-smmu-v3: Extract driver-specific bits from probe function
  2023-02-01 12:52 ` Jean-Philippe Brucker
@ 2023-02-01 12:53   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:53 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

As we're about to share the arm_smmu_device_hw_probe() function with the
KVM driver, extract the two bits that are specific to the normal driver:
the disable_msipolling check and the arm_smmu_ops.pgsize_bitmap update,
which both move to arm_smmu_device_probe().

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 97d24ee5c14d..bcbd691ca96a 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3454,7 +3454,7 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 
 	if (reg & IDR0_MSI) {
 		smmu->features |= ARM_SMMU_FEAT_MSI;
-		if (coherent && !disable_msipolling)
+		if (coherent)
 			smmu->options |= ARM_SMMU_OPT_MSIPOLL;
 	}
 
@@ -3598,11 +3598,6 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 		smmu->oas = 48;
 	}
 
-	if (arm_smmu_ops.pgsize_bitmap == -1UL)
-		arm_smmu_ops.pgsize_bitmap = smmu->pgsize_bitmap;
-	else
-		arm_smmu_ops.pgsize_bitmap |= smmu->pgsize_bitmap;
-
 	/* Set the DMA mask for our table walker */
 	if (dma_set_mask_and_coherent(smmu->dev, DMA_BIT_MASK(smmu->oas)))
 		dev_warn(smmu->dev,
@@ -3803,6 +3798,14 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
 	if (ret)
 		return ret;
 
+	if (disable_msipolling)
+		smmu->options &= ~ARM_SMMU_OPT_MSIPOLL;
+
+	if (arm_smmu_ops.pgsize_bitmap == -1UL)
+		arm_smmu_ops.pgsize_bitmap = smmu->pgsize_bitmap;
+	else
+		arm_smmu_ops.pgsize_bitmap |= smmu->pgsize_bitmap;
+
 	/* Initialise in-memory data structures */
 	ret = arm_smmu_init_structures(smmu);
 	if (ret)
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 201+ messages in thread
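
To make the intent concrete, here is a rough sketch (an assumption, not
code from this series) of why the move matters: a KVM-side probe path
could now reuse arm_smmu_device_hw_probe() without pulling in the module
parameter or the global arm_smmu_ops.

	/*
	 * Sketch only: a hypothetical KVM-side caller reusing the shared
	 * hardware probe after this change. The function name is made up
	 * for illustration.
	 */
	static int example_kvm_probe_hw(struct arm_smmu_device *smmu)
	{
		int ret;

		/* Shared ID register parsing: features, queue sizes, ias/oas */
		ret = arm_smmu_device_hw_probe(smmu);
		if (ret)
			return ret;

		/*
		 * Nothing to undo here: the disable_msipolling check and the
		 * arm_smmu_ops.pgsize_bitmap update now live in the regular
		 * driver's arm_smmu_device_probe().
		 */
		return 0;
	}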

* [RFC PATCH 29/45] iommu/arm-smmu-v3: Move some functions to arm-smmu-v3-common.c
  2023-02-01 12:52 ` Jean-Philippe Brucker
@ 2023-02-01 12:53   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:53 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

Move functions that can be shared between the normal and KVM drivers to
arm-smmu-v3-common.c.

Only straightforward moves here. More subtle factoring will be done in
the next patches.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 drivers/iommu/arm/arm-smmu-v3/Makefile        |   1 +
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |   9 +
 .../arm/arm-smmu-v3/arm-smmu-v3-common.c      | 296 ++++++++++++++++++
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 293 -----------------
 4 files changed, 306 insertions(+), 293 deletions(-)
 create mode 100644 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-common.c

diff --git a/drivers/iommu/arm/arm-smmu-v3/Makefile b/drivers/iommu/arm/arm-smmu-v3/Makefile
index 54feb1ecccad..c4fcc796213c 100644
--- a/drivers/iommu/arm/arm-smmu-v3/Makefile
+++ b/drivers/iommu/arm/arm-smmu-v3/Makefile
@@ -1,5 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-$(CONFIG_ARM_SMMU_V3) += arm_smmu_v3.o
 arm_smmu_v3-objs-y += arm-smmu-v3.o
+arm_smmu_v3-objs-y += arm-smmu-v3-common.o
 arm_smmu_v3-objs-$(CONFIG_ARM_SMMU_V3_SVA) += arm-smmu-v3-sva.o
 arm_smmu_v3-objs := $(arm_smmu_v3-objs-y)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 32ce835ab4eb..59e8101d4ff5 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -269,6 +269,15 @@ extern struct xarray arm_smmu_asid_xa;
 extern struct mutex arm_smmu_asid_lock;
 extern struct arm_smmu_ctx_desc quiet_cd;
 
+int arm_smmu_write_reg_sync(struct arm_smmu_device *smmu, u32 val,
+			    unsigned int reg_off, unsigned int ack_off);
+int arm_smmu_update_gbpa(struct arm_smmu_device *smmu, u32 set, u32 clr);
+int arm_smmu_device_disable(struct arm_smmu_device *smmu);
+bool arm_smmu_capable(struct device *dev, enum iommu_cap cap);
+struct iommu_group *arm_smmu_device_group(struct device *dev);
+int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args);
+int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu);
+
 int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain, int ssid,
 			    struct arm_smmu_ctx_desc *cd);
 void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-common.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-common.c
new file mode 100644
index 000000000000..5e43329c0826
--- /dev/null
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-common.c
@@ -0,0 +1,296 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/dma-mapping.h>
+#include <linux/iopoll.h>
+#include <linux/pci.h>
+
+#include "arm-smmu-v3.h"
+
+int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
+{
+	u32 reg;
+	bool coherent = smmu->features & ARM_SMMU_FEAT_COHERENCY;
+
+	/* IDR0 */
+	reg = readl_relaxed(smmu->base + ARM_SMMU_IDR0);
+
+	/* 2-level structures */
+	if (FIELD_GET(IDR0_ST_LVL, reg) == IDR0_ST_LVL_2LVL)
+		smmu->features |= ARM_SMMU_FEAT_2_LVL_STRTAB;
+
+	if (reg & IDR0_CD2L)
+		smmu->features |= ARM_SMMU_FEAT_2_LVL_CDTAB;
+
+	/*
+	 * Translation table endianness.
+	 * We currently require the same endianness as the CPU, but this
+	 * could be changed later by adding a new IO_PGTABLE_QUIRK.
+	 */
+	switch (FIELD_GET(IDR0_TTENDIAN, reg)) {
+	case IDR0_TTENDIAN_MIXED:
+		smmu->features |= ARM_SMMU_FEAT_TT_LE | ARM_SMMU_FEAT_TT_BE;
+		break;
+#ifdef __BIG_ENDIAN
+	case IDR0_TTENDIAN_BE:
+		smmu->features |= ARM_SMMU_FEAT_TT_BE;
+		break;
+#else
+	case IDR0_TTENDIAN_LE:
+		smmu->features |= ARM_SMMU_FEAT_TT_LE;
+		break;
+#endif
+	default:
+		dev_err(smmu->dev, "unknown/unsupported TT endianness!\n");
+		return -ENXIO;
+	}
+
+	/* Boolean feature flags */
+	if (IS_ENABLED(CONFIG_PCI_PRI) && reg & IDR0_PRI)
+		smmu->features |= ARM_SMMU_FEAT_PRI;
+
+	if (IS_ENABLED(CONFIG_PCI_ATS) && reg & IDR0_ATS)
+		smmu->features |= ARM_SMMU_FEAT_ATS;
+
+	if (reg & IDR0_SEV)
+		smmu->features |= ARM_SMMU_FEAT_SEV;
+
+	if (reg & IDR0_MSI) {
+		smmu->features |= ARM_SMMU_FEAT_MSI;
+		if (coherent)
+			smmu->options |= ARM_SMMU_OPT_MSIPOLL;
+	}
+
+	if (reg & IDR0_HYP) {
+		smmu->features |= ARM_SMMU_FEAT_HYP;
+		if (cpus_have_cap(ARM64_HAS_VIRT_HOST_EXTN))
+			smmu->features |= ARM_SMMU_FEAT_E2H;
+	}
+
+	/*
+	 * The coherency feature as set by FW is used in preference to the ID
+	 * register, but warn on mismatch.
+	 */
+	if (!!(reg & IDR0_COHACC) != coherent)
+		dev_warn(smmu->dev, "IDR0.COHACC overridden by FW configuration (%s)\n",
+			 coherent ? "true" : "false");
+
+	switch (FIELD_GET(IDR0_STALL_MODEL, reg)) {
+	case IDR0_STALL_MODEL_FORCE:
+		smmu->features |= ARM_SMMU_FEAT_STALL_FORCE;
+		fallthrough;
+	case IDR0_STALL_MODEL_STALL:
+		smmu->features |= ARM_SMMU_FEAT_STALLS;
+	}
+
+	if (reg & IDR0_S1P)
+		smmu->features |= ARM_SMMU_FEAT_TRANS_S1;
+
+	if (reg & IDR0_S2P)
+		smmu->features |= ARM_SMMU_FEAT_TRANS_S2;
+
+	if (!(reg & (IDR0_S1P | IDR0_S2P))) {
+		dev_err(smmu->dev, "no translation support!\n");
+		return -ENXIO;
+	}
+
+	/* We only support the AArch64 table format at present */
+	switch (FIELD_GET(IDR0_TTF, reg)) {
+	case IDR0_TTF_AARCH32_64:
+		smmu->ias = 40;
+		fallthrough;
+	case IDR0_TTF_AARCH64:
+		break;
+	default:
+		dev_err(smmu->dev, "AArch64 table format not supported!\n");
+		return -ENXIO;
+	}
+
+	/* ASID/VMID sizes */
+	smmu->asid_bits = reg & IDR0_ASID16 ? 16 : 8;
+	smmu->vmid_bits = reg & IDR0_VMID16 ? 16 : 8;
+
+	/* IDR1 */
+	reg = readl_relaxed(smmu->base + ARM_SMMU_IDR1);
+	if (reg & (IDR1_TABLES_PRESET | IDR1_QUEUES_PRESET | IDR1_REL)) {
+		dev_err(smmu->dev, "embedded implementation not supported\n");
+		return -ENXIO;
+	}
+
+	/* Queue sizes, capped to ensure natural alignment */
+	smmu->cmdq.q.llq.max_n_shift = min_t(u32, CMDQ_MAX_SZ_SHIFT,
+					     FIELD_GET(IDR1_CMDQS, reg));
+	if (smmu->cmdq.q.llq.max_n_shift <= ilog2(CMDQ_BATCH_ENTRIES)) {
+		/*
+		 * We don't support splitting up batches, so one batch of
+		 * commands plus an extra sync needs to fit inside the command
+		 * queue. There's also no way we can handle the weird alignment
+		 * restrictions on the base pointer for a unit-length queue.
+		 */
+		dev_err(smmu->dev, "command queue size <= %d entries not supported\n",
+			CMDQ_BATCH_ENTRIES);
+		return -ENXIO;
+	}
+
+	smmu->evtq.q.llq.max_n_shift = min_t(u32, EVTQ_MAX_SZ_SHIFT,
+					     FIELD_GET(IDR1_EVTQS, reg));
+	smmu->priq.q.llq.max_n_shift = min_t(u32, PRIQ_MAX_SZ_SHIFT,
+					     FIELD_GET(IDR1_PRIQS, reg));
+
+	/* SID/SSID sizes */
+	smmu->ssid_bits = FIELD_GET(IDR1_SSIDSIZE, reg);
+	smmu->sid_bits = FIELD_GET(IDR1_SIDSIZE, reg);
+	smmu->iommu.max_pasids = 1UL << smmu->ssid_bits;
+
+	/*
+	 * If the SMMU supports fewer bits than would fill a single L2 stream
+	 * table, use a linear table instead.
+	 */
+	if (smmu->sid_bits <= STRTAB_SPLIT)
+		smmu->features &= ~ARM_SMMU_FEAT_2_LVL_STRTAB;
+
+	/* IDR3 */
+	reg = readl_relaxed(smmu->base + ARM_SMMU_IDR3);
+	if (FIELD_GET(IDR3_RIL, reg))
+		smmu->features |= ARM_SMMU_FEAT_RANGE_INV;
+
+	/* IDR5 */
+	reg = readl_relaxed(smmu->base + ARM_SMMU_IDR5);
+
+	/* Maximum number of outstanding stalls */
+	smmu->evtq.max_stalls = FIELD_GET(IDR5_STALL_MAX, reg);
+
+	/* Page sizes */
+	if (reg & IDR5_GRAN64K)
+		smmu->pgsize_bitmap |= SZ_64K | SZ_512M;
+	if (reg & IDR5_GRAN16K)
+		smmu->pgsize_bitmap |= SZ_16K | SZ_32M;
+	if (reg & IDR5_GRAN4K)
+		smmu->pgsize_bitmap |= SZ_4K | SZ_2M | SZ_1G;
+
+	/* Input address size */
+	if (FIELD_GET(IDR5_VAX, reg) == IDR5_VAX_52_BIT)
+		smmu->features |= ARM_SMMU_FEAT_VAX;
+
+	/* Output address size */
+	switch (FIELD_GET(IDR5_OAS, reg)) {
+	case IDR5_OAS_32_BIT:
+		smmu->oas = 32;
+		break;
+	case IDR5_OAS_36_BIT:
+		smmu->oas = 36;
+		break;
+	case IDR5_OAS_40_BIT:
+		smmu->oas = 40;
+		break;
+	case IDR5_OAS_42_BIT:
+		smmu->oas = 42;
+		break;
+	case IDR5_OAS_44_BIT:
+		smmu->oas = 44;
+		break;
+	case IDR5_OAS_52_BIT:
+		smmu->oas = 52;
+		smmu->pgsize_bitmap |= 1ULL << 42; /* 4TB */
+		break;
+	default:
+		dev_info(smmu->dev,
+			"unknown output address size. Truncating to 48-bit\n");
+		fallthrough;
+	case IDR5_OAS_48_BIT:
+		smmu->oas = 48;
+	}
+
+	/* Set the DMA mask for our table walker */
+	if (dma_set_mask_and_coherent(smmu->dev, DMA_BIT_MASK(smmu->oas)))
+		dev_warn(smmu->dev,
+			 "failed to set DMA mask for table walker\n");
+
+	smmu->ias = max(smmu->ias, smmu->oas);
+
+	if (arm_smmu_sva_supported(smmu))
+		smmu->features |= ARM_SMMU_FEAT_SVA;
+
+	dev_info(smmu->dev, "ias %lu-bit, oas %lu-bit (features 0x%08x)\n",
+		 smmu->ias, smmu->oas, smmu->features);
+	return 0;
+}
+
+int arm_smmu_write_reg_sync(struct arm_smmu_device *smmu, u32 val,
+			    unsigned int reg_off, unsigned int ack_off)
+{
+	u32 reg;
+
+	writel_relaxed(val, smmu->base + reg_off);
+	return readl_relaxed_poll_timeout(smmu->base + ack_off, reg, reg == val,
+					  1, ARM_SMMU_POLL_TIMEOUT_US);
+}
+
+/* GBPA is "special" */
+int arm_smmu_update_gbpa(struct arm_smmu_device *smmu, u32 set, u32 clr)
+{
+	int ret;
+	u32 reg, __iomem *gbpa = smmu->base + ARM_SMMU_GBPA;
+
+	ret = readl_relaxed_poll_timeout(gbpa, reg, !(reg & GBPA_UPDATE),
+					 1, ARM_SMMU_POLL_TIMEOUT_US);
+	if (ret)
+		return ret;
+
+	reg &= ~clr;
+	reg |= set;
+	writel_relaxed(reg | GBPA_UPDATE, gbpa);
+	ret = readl_relaxed_poll_timeout(gbpa, reg, !(reg & GBPA_UPDATE),
+					 1, ARM_SMMU_POLL_TIMEOUT_US);
+
+	if (ret)
+		dev_err(smmu->dev, "GBPA not responding to update\n");
+	return ret;
+}
+
+int arm_smmu_device_disable(struct arm_smmu_device *smmu)
+{
+	int ret;
+
+	ret = arm_smmu_write_reg_sync(smmu, 0, ARM_SMMU_CR0, ARM_SMMU_CR0ACK);
+	if (ret)
+		dev_err(smmu->dev, "failed to clear cr0\n");
+
+	return ret;
+}
+
+bool arm_smmu_capable(struct device *dev, enum iommu_cap cap)
+{
+	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
+
+	switch (cap) {
+	case IOMMU_CAP_CACHE_COHERENCY:
+		/* Assume that a coherent TCU implies coherent TBUs */
+		return master->smmu->features & ARM_SMMU_FEAT_COHERENCY;
+	case IOMMU_CAP_NOEXEC:
+		return true;
+	default:
+		return false;
+	}
+}
+
+
+struct iommu_group *arm_smmu_device_group(struct device *dev)
+{
+	struct iommu_group *group;
+
+	/*
+	 * We don't support devices sharing stream IDs other than PCI RID
+	 * aliases, since the necessary ID-to-device lookup becomes rather
+	 * impractical given a potential sparse 32-bit stream ID space.
+	 */
+	if (dev_is_pci(dev))
+		group = pci_device_group(dev);
+	else
+		group = generic_device_group(dev);
+
+	return group;
+}
+
+int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
+{
+	return iommu_fwspec_add_ids(dev, args->args, 1);
+}
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index bcbd691ca96a..08fd79f66d29 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -17,13 +17,11 @@
 #include <linux/err.h>
 #include <linux/interrupt.h>
 #include <linux/io-pgtable.h>
-#include <linux/iopoll.h>
 #include <linux/module.h>
 #include <linux/msi.h>
 #include <linux/of.h>
 #include <linux/of_address.h>
 #include <linux/of_platform.h>
-#include <linux/pci.h>
 #include <linux/pci-ats.h>
 #include <linux/platform_device.h>
 
@@ -1642,8 +1640,6 @@ static irqreturn_t arm_smmu_priq_thread(int irq, void *dev)
 	return IRQ_HANDLED;
 }
 
-static int arm_smmu_device_disable(struct arm_smmu_device *smmu);
-
 static irqreturn_t arm_smmu_gerror_handler(int irq, void *dev)
 {
 	u32 gerror, gerrorn, active;
@@ -1990,21 +1986,6 @@ static const struct iommu_flush_ops arm_smmu_flush_ops = {
 };
 
 /* IOMMU API */
-static bool arm_smmu_capable(struct device *dev, enum iommu_cap cap)
-{
-	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
-
-	switch (cap) {
-	case IOMMU_CAP_CACHE_COHERENCY:
-		/* Assume that a coherent TCU implies coherent TBUs */
-		return master->smmu->features & ARM_SMMU_FEAT_COHERENCY;
-	case IOMMU_CAP_NOEXEC:
-		return true;
-	default:
-		return false;
-	}
-}
-
 static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
 {
 	struct arm_smmu_domain *smmu_domain;
@@ -2698,23 +2679,6 @@ static void arm_smmu_release_device(struct device *dev)
 	kfree(master);
 }
 
-static struct iommu_group *arm_smmu_device_group(struct device *dev)
-{
-	struct iommu_group *group;
-
-	/*
-	 * We don't support devices sharing stream IDs other than PCI RID
-	 * aliases, since the necessary ID-to-device lookup becomes rather
-	 * impractical given a potential sparse 32-bit stream ID space.
-	 */
-	if (dev_is_pci(dev))
-		group = pci_device_group(dev);
-	else
-		group = generic_device_group(dev);
-
-	return group;
-}
-
 static int arm_smmu_enable_nesting(struct iommu_domain *domain)
 {
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
@@ -2730,11 +2694,6 @@ static int arm_smmu_enable_nesting(struct iommu_domain *domain)
 	return ret;
 }
 
-static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
-{
-	return iommu_fwspec_add_ids(dev, args->args, 1);
-}
-
 static void arm_smmu_get_resv_regions(struct device *dev,
 				      struct list_head *head)
 {
@@ -3081,38 +3040,6 @@ static int arm_smmu_init_structures(struct arm_smmu_device *smmu)
 	return arm_smmu_init_strtab(smmu);
 }
 
-static int arm_smmu_write_reg_sync(struct arm_smmu_device *smmu, u32 val,
-				   unsigned int reg_off, unsigned int ack_off)
-{
-	u32 reg;
-
-	writel_relaxed(val, smmu->base + reg_off);
-	return readl_relaxed_poll_timeout(smmu->base + ack_off, reg, reg == val,
-					  1, ARM_SMMU_POLL_TIMEOUT_US);
-}
-
-/* GBPA is "special" */
-static int arm_smmu_update_gbpa(struct arm_smmu_device *smmu, u32 set, u32 clr)
-{
-	int ret;
-	u32 reg, __iomem *gbpa = smmu->base + ARM_SMMU_GBPA;
-
-	ret = readl_relaxed_poll_timeout(gbpa, reg, !(reg & GBPA_UPDATE),
-					 1, ARM_SMMU_POLL_TIMEOUT_US);
-	if (ret)
-		return ret;
-
-	reg &= ~clr;
-	reg |= set;
-	writel_relaxed(reg | GBPA_UPDATE, gbpa);
-	ret = readl_relaxed_poll_timeout(gbpa, reg, !(reg & GBPA_UPDATE),
-					 1, ARM_SMMU_POLL_TIMEOUT_US);
-
-	if (ret)
-		dev_err(smmu->dev, "GBPA not responding to update\n");
-	return ret;
-}
-
 static void arm_smmu_free_msis(void *data)
 {
 	struct device *dev = data;
@@ -3258,17 +3185,6 @@ static int arm_smmu_setup_irqs(struct arm_smmu_device *smmu)
 	return 0;
 }
 
-static int arm_smmu_device_disable(struct arm_smmu_device *smmu)
-{
-	int ret;
-
-	ret = arm_smmu_write_reg_sync(smmu, 0, ARM_SMMU_CR0, ARM_SMMU_CR0ACK);
-	if (ret)
-		dev_err(smmu->dev, "failed to clear cr0\n");
-
-	return ret;
-}
-
 static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
 {
 	int ret;
@@ -3404,215 +3320,6 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
 	return 0;
 }
 
-static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
-{
-	u32 reg;
-	bool coherent = smmu->features & ARM_SMMU_FEAT_COHERENCY;
-
-	/* IDR0 */
-	reg = readl_relaxed(smmu->base + ARM_SMMU_IDR0);
-
-	/* 2-level structures */
-	if (FIELD_GET(IDR0_ST_LVL, reg) == IDR0_ST_LVL_2LVL)
-		smmu->features |= ARM_SMMU_FEAT_2_LVL_STRTAB;
-
-	if (reg & IDR0_CD2L)
-		smmu->features |= ARM_SMMU_FEAT_2_LVL_CDTAB;
-
-	/*
-	 * Translation table endianness.
-	 * We currently require the same endianness as the CPU, but this
-	 * could be changed later by adding a new IO_PGTABLE_QUIRK.
-	 */
-	switch (FIELD_GET(IDR0_TTENDIAN, reg)) {
-	case IDR0_TTENDIAN_MIXED:
-		smmu->features |= ARM_SMMU_FEAT_TT_LE | ARM_SMMU_FEAT_TT_BE;
-		break;
-#ifdef __BIG_ENDIAN
-	case IDR0_TTENDIAN_BE:
-		smmu->features |= ARM_SMMU_FEAT_TT_BE;
-		break;
-#else
-	case IDR0_TTENDIAN_LE:
-		smmu->features |= ARM_SMMU_FEAT_TT_LE;
-		break;
-#endif
-	default:
-		dev_err(smmu->dev, "unknown/unsupported TT endianness!\n");
-		return -ENXIO;
-	}
-
-	/* Boolean feature flags */
-	if (IS_ENABLED(CONFIG_PCI_PRI) && reg & IDR0_PRI)
-		smmu->features |= ARM_SMMU_FEAT_PRI;
-
-	if (IS_ENABLED(CONFIG_PCI_ATS) && reg & IDR0_ATS)
-		smmu->features |= ARM_SMMU_FEAT_ATS;
-
-	if (reg & IDR0_SEV)
-		smmu->features |= ARM_SMMU_FEAT_SEV;
-
-	if (reg & IDR0_MSI) {
-		smmu->features |= ARM_SMMU_FEAT_MSI;
-		if (coherent)
-			smmu->options |= ARM_SMMU_OPT_MSIPOLL;
-	}
-
-	if (reg & IDR0_HYP) {
-		smmu->features |= ARM_SMMU_FEAT_HYP;
-		if (cpus_have_cap(ARM64_HAS_VIRT_HOST_EXTN))
-			smmu->features |= ARM_SMMU_FEAT_E2H;
-	}
-
-	/*
-	 * The coherency feature as set by FW is used in preference to the ID
-	 * register, but warn on mismatch.
-	 */
-	if (!!(reg & IDR0_COHACC) != coherent)
-		dev_warn(smmu->dev, "IDR0.COHACC overridden by FW configuration (%s)\n",
-			 coherent ? "true" : "false");
-
-	switch (FIELD_GET(IDR0_STALL_MODEL, reg)) {
-	case IDR0_STALL_MODEL_FORCE:
-		smmu->features |= ARM_SMMU_FEAT_STALL_FORCE;
-		fallthrough;
-	case IDR0_STALL_MODEL_STALL:
-		smmu->features |= ARM_SMMU_FEAT_STALLS;
-	}
-
-	if (reg & IDR0_S1P)
-		smmu->features |= ARM_SMMU_FEAT_TRANS_S1;
-
-	if (reg & IDR0_S2P)
-		smmu->features |= ARM_SMMU_FEAT_TRANS_S2;
-
-	if (!(reg & (IDR0_S1P | IDR0_S2P))) {
-		dev_err(smmu->dev, "no translation support!\n");
-		return -ENXIO;
-	}
-
-	/* We only support the AArch64 table format at present */
-	switch (FIELD_GET(IDR0_TTF, reg)) {
-	case IDR0_TTF_AARCH32_64:
-		smmu->ias = 40;
-		fallthrough;
-	case IDR0_TTF_AARCH64:
-		break;
-	default:
-		dev_err(smmu->dev, "AArch64 table format not supported!\n");
-		return -ENXIO;
-	}
-
-	/* ASID/VMID sizes */
-	smmu->asid_bits = reg & IDR0_ASID16 ? 16 : 8;
-	smmu->vmid_bits = reg & IDR0_VMID16 ? 16 : 8;
-
-	/* IDR1 */
-	reg = readl_relaxed(smmu->base + ARM_SMMU_IDR1);
-	if (reg & (IDR1_TABLES_PRESET | IDR1_QUEUES_PRESET | IDR1_REL)) {
-		dev_err(smmu->dev, "embedded implementation not supported\n");
-		return -ENXIO;
-	}
-
-	/* Queue sizes, capped to ensure natural alignment */
-	smmu->cmdq.q.llq.max_n_shift = min_t(u32, CMDQ_MAX_SZ_SHIFT,
-					     FIELD_GET(IDR1_CMDQS, reg));
-	if (smmu->cmdq.q.llq.max_n_shift <= ilog2(CMDQ_BATCH_ENTRIES)) {
-		/*
-		 * We don't support splitting up batches, so one batch of
-		 * commands plus an extra sync needs to fit inside the command
-		 * queue. There's also no way we can handle the weird alignment
-		 * restrictions on the base pointer for a unit-length queue.
-		 */
-		dev_err(smmu->dev, "command queue size <= %d entries not supported\n",
-			CMDQ_BATCH_ENTRIES);
-		return -ENXIO;
-	}
-
-	smmu->evtq.q.llq.max_n_shift = min_t(u32, EVTQ_MAX_SZ_SHIFT,
-					     FIELD_GET(IDR1_EVTQS, reg));
-	smmu->priq.q.llq.max_n_shift = min_t(u32, PRIQ_MAX_SZ_SHIFT,
-					     FIELD_GET(IDR1_PRIQS, reg));
-
-	/* SID/SSID sizes */
-	smmu->ssid_bits = FIELD_GET(IDR1_SSIDSIZE, reg);
-	smmu->sid_bits = FIELD_GET(IDR1_SIDSIZE, reg);
-	smmu->iommu.max_pasids = 1UL << smmu->ssid_bits;
-
-	/*
-	 * If the SMMU supports fewer bits than would fill a single L2 stream
-	 * table, use a linear table instead.
-	 */
-	if (smmu->sid_bits <= STRTAB_SPLIT)
-		smmu->features &= ~ARM_SMMU_FEAT_2_LVL_STRTAB;
-
-	/* IDR3 */
-	reg = readl_relaxed(smmu->base + ARM_SMMU_IDR3);
-	if (FIELD_GET(IDR3_RIL, reg))
-		smmu->features |= ARM_SMMU_FEAT_RANGE_INV;
-
-	/* IDR5 */
-	reg = readl_relaxed(smmu->base + ARM_SMMU_IDR5);
-
-	/* Maximum number of outstanding stalls */
-	smmu->evtq.max_stalls = FIELD_GET(IDR5_STALL_MAX, reg);
-
-	/* Page sizes */
-	if (reg & IDR5_GRAN64K)
-		smmu->pgsize_bitmap |= SZ_64K | SZ_512M;
-	if (reg & IDR5_GRAN16K)
-		smmu->pgsize_bitmap |= SZ_16K | SZ_32M;
-	if (reg & IDR5_GRAN4K)
-		smmu->pgsize_bitmap |= SZ_4K | SZ_2M | SZ_1G;
-
-	/* Input address size */
-	if (FIELD_GET(IDR5_VAX, reg) == IDR5_VAX_52_BIT)
-		smmu->features |= ARM_SMMU_FEAT_VAX;
-
-	/* Output address size */
-	switch (FIELD_GET(IDR5_OAS, reg)) {
-	case IDR5_OAS_32_BIT:
-		smmu->oas = 32;
-		break;
-	case IDR5_OAS_36_BIT:
-		smmu->oas = 36;
-		break;
-	case IDR5_OAS_40_BIT:
-		smmu->oas = 40;
-		break;
-	case IDR5_OAS_42_BIT:
-		smmu->oas = 42;
-		break;
-	case IDR5_OAS_44_BIT:
-		smmu->oas = 44;
-		break;
-	case IDR5_OAS_52_BIT:
-		smmu->oas = 52;
-		smmu->pgsize_bitmap |= 1ULL << 42; /* 4TB */
-		break;
-	default:
-		dev_info(smmu->dev,
-			"unknown output address size. Truncating to 48-bit\n");
-		fallthrough;
-	case IDR5_OAS_48_BIT:
-		smmu->oas = 48;
-	}
-
-	/* Set the DMA mask for our table walker */
-	if (dma_set_mask_and_coherent(smmu->dev, DMA_BIT_MASK(smmu->oas)))
-		dev_warn(smmu->dev,
-			 "failed to set DMA mask for table walker\n");
-
-	smmu->ias = max(smmu->ias, smmu->oas);
-
-	if (arm_smmu_sva_supported(smmu))
-		smmu->features |= ARM_SMMU_FEAT_SVA;
-
-	dev_info(smmu->dev, "ias %lu-bit, oas %lu-bit (features 0x%08x)\n",
-		 smmu->ias, smmu->oas, smmu->features);
-	return 0;
-}
-
 #ifdef CONFIG_ACPI
 static void acpi_smmu_get_options(u32 model, struct arm_smmu_device *smmu)
 {
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 201+ messages in thread
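
As a quick usage reference, the sketch below (not code from the series)
shows how the newly shared helpers combine, mirroring what the existing
driver does when quiescing the SMMU. It assumes the usual ARM_SMMU_CR0,
ARM_SMMU_CR0ACK and GBPA_ABORT definitions from arm-smmu-v3.h.

	/*
	 * Usage sketch: quiesce the SMMU with the helpers made shareable by
	 * this patch.
	 */
	static int example_quiesce_smmu(struct arm_smmu_device *smmu)
	{
		int ret;

		/* Abort incoming transactions while translation is disabled */
		ret = arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0);
		if (ret)
			return ret;

		/* Clear CR0 and wait for CR0ACK, as arm_smmu_device_disable() does */
		return arm_smmu_write_reg_sync(smmu, 0, ARM_SMMU_CR0,
					       ARM_SMMU_CR0ACK);
	}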

* [RFC PATCH 30/45] iommu/arm-smmu-v3: Move queue and table allocation to arm-smmu-v3-common.c
  2023-02-01 12:52 ` Jean-Philippe Brucker
@ 2023-02-01 12:53   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:53 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

Move the queue and stream table allocation code to arm-smmu-v3-common.c
so that the KVM driver can reuse it.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |   8 +
 .../arm/arm-smmu-v3/arm-smmu-v3-common.c      | 190 ++++++++++++++++
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 215 ++----------------
 3 files changed, 219 insertions(+), 194 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 59e8101d4ff5..8ab84282f62a 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -277,6 +277,14 @@ bool arm_smmu_capable(struct device *dev, enum iommu_cap cap);
 struct iommu_group *arm_smmu_device_group(struct device *dev);
 int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args);
 int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu);
+int arm_smmu_init_one_queue(struct arm_smmu_device *smmu,
+			    struct arm_smmu_queue *q,
+			    void __iomem *page,
+			    unsigned long prod_off,
+			    unsigned long cons_off,
+			    size_t dwords, const char *name);
+int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid);
+int arm_smmu_init_strtab(struct arm_smmu_device *smmu);
 
 int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain, int ssid,
 			    struct arm_smmu_ctx_desc *cd);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-common.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-common.c
index 5e43329c0826..9226971b6e53 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-common.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-common.c
@@ -294,3 +294,193 @@ int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
 {
 	return iommu_fwspec_add_ids(dev, args->args, 1);
 }
+
+int arm_smmu_init_one_queue(struct arm_smmu_device *smmu,
+			    struct arm_smmu_queue *q,
+			    void __iomem *page,
+			    unsigned long prod_off,
+			    unsigned long cons_off,
+			    size_t dwords, const char *name)
+{
+	size_t qsz;
+
+	do {
+		qsz = ((1 << q->llq.max_n_shift) * dwords) << 3;
+		q->base = dmam_alloc_coherent(smmu->dev, qsz, &q->base_dma,
+					      GFP_KERNEL);
+		if (q->base || qsz < PAGE_SIZE)
+			break;
+
+		q->llq.max_n_shift--;
+	} while (1);
+
+	if (!q->base) {
+		dev_err(smmu->dev,
+			"failed to allocate queue (0x%zx bytes) for %s\n",
+			qsz, name);
+		return -ENOMEM;
+	}
+
+	if (!WARN_ON(q->base_dma & (qsz - 1))) {
+		dev_info(smmu->dev, "allocated %u entries for %s\n",
+			 1 << q->llq.max_n_shift, name);
+	}
+
+	q->prod_reg	= page + prod_off;
+	q->cons_reg	= page + cons_off;
+	q->ent_dwords	= dwords;
+
+	q->q_base  = Q_BASE_RWA;
+	q->q_base |= q->base_dma & Q_BASE_ADDR_MASK;
+	q->q_base |= FIELD_PREP(Q_BASE_LOG2SIZE, q->llq.max_n_shift);
+
+	q->llq.prod = q->llq.cons = 0;
+	return 0;
+}
+
+/* Stream table initialization functions */
+static void
+arm_smmu_write_strtab_l1_desc(__le64 *dst, struct arm_smmu_strtab_l1_desc *desc)
+{
+	u64 val = 0;
+
+	val |= FIELD_PREP(STRTAB_L1_DESC_SPAN, desc->span);
+	val |= desc->l2ptr_dma & STRTAB_L1_DESC_L2PTR_MASK;
+
+	/* Ensure the SMMU sees a zeroed table after reading this pointer */
+	WRITE_ONCE(*dst, cpu_to_le64(val));
+}
+
+int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
+{
+	size_t size;
+	void *strtab;
+	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
+	struct arm_smmu_strtab_l1_desc *desc = &cfg->l1_desc[sid >> STRTAB_SPLIT];
+
+	if (desc->l2ptr)
+		return 0;
+
+	size = 1 << (STRTAB_SPLIT + ilog2(STRTAB_STE_DWORDS) + 3);
+	strtab = &cfg->strtab[(sid >> STRTAB_SPLIT) * STRTAB_L1_DESC_DWORDS];
+
+	desc->span = STRTAB_SPLIT + 1;
+	desc->l2ptr = dmam_alloc_coherent(smmu->dev, size, &desc->l2ptr_dma,
+					  GFP_KERNEL);
+	if (!desc->l2ptr) {
+		dev_err(smmu->dev,
+			"failed to allocate l2 stream table for SID %u\n",
+			sid);
+		return -ENOMEM;
+	}
+
+	arm_smmu_write_strtab_l1_desc(strtab, desc);
+	return 0;
+}
+
+static int arm_smmu_init_l1_strtab(struct arm_smmu_device *smmu)
+{
+	unsigned int i;
+	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
+	void *strtab = smmu->strtab_cfg.strtab;
+
+	cfg->l1_desc = devm_kcalloc(smmu->dev, cfg->num_l1_ents,
+				    sizeof(*cfg->l1_desc), GFP_KERNEL);
+	if (!cfg->l1_desc)
+		return -ENOMEM;
+
+	for (i = 0; i < cfg->num_l1_ents; ++i) {
+		arm_smmu_write_strtab_l1_desc(strtab, &cfg->l1_desc[i]);
+		strtab += STRTAB_L1_DESC_DWORDS << 3;
+	}
+
+	return 0;
+}
+
+static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
+{
+	void *strtab;
+	u64 reg;
+	u32 size, l1size;
+	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
+
+	/* Calculate the L1 size, capped to the SIDSIZE. */
+	size = STRTAB_L1_SZ_SHIFT - (ilog2(STRTAB_L1_DESC_DWORDS) + 3);
+	size = min(size, smmu->sid_bits - STRTAB_SPLIT);
+	cfg->num_l1_ents = 1 << size;
+
+	size += STRTAB_SPLIT;
+	if (size < smmu->sid_bits)
+		dev_warn(smmu->dev,
+			 "2-level strtab only covers %u/%u bits of SID\n",
+			 size, smmu->sid_bits);
+
+	l1size = cfg->num_l1_ents * (STRTAB_L1_DESC_DWORDS << 3);
+	strtab = dmam_alloc_coherent(smmu->dev, l1size, &cfg->strtab_dma,
+				     GFP_KERNEL);
+	if (!strtab) {
+		dev_err(smmu->dev,
+			"failed to allocate l1 stream table (%u bytes)\n",
+			l1size);
+		return -ENOMEM;
+	}
+	cfg->strtab = strtab;
+
+	/* Configure strtab_base_cfg for 2 levels */
+	reg  = FIELD_PREP(STRTAB_BASE_CFG_FMT, STRTAB_BASE_CFG_FMT_2LVL);
+	reg |= FIELD_PREP(STRTAB_BASE_CFG_LOG2SIZE, size);
+	reg |= FIELD_PREP(STRTAB_BASE_CFG_SPLIT, STRTAB_SPLIT);
+	cfg->strtab_base_cfg = reg;
+
+	return arm_smmu_init_l1_strtab(smmu);
+}
+
+static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu)
+{
+	void *strtab;
+	u64 reg;
+	u32 size;
+	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
+
+	size = (1 << smmu->sid_bits) * (STRTAB_STE_DWORDS << 3);
+	strtab = dmam_alloc_coherent(smmu->dev, size, &cfg->strtab_dma,
+				     GFP_KERNEL);
+	if (!strtab) {
+		dev_err(smmu->dev,
+			"failed to allocate linear stream table (%u bytes)\n",
+			size);
+		return -ENOMEM;
+	}
+	cfg->strtab = strtab;
+	cfg->num_l1_ents = 1 << smmu->sid_bits;
+
+	/* Configure strtab_base_cfg for a linear table covering all SIDs */
+	reg  = FIELD_PREP(STRTAB_BASE_CFG_FMT, STRTAB_BASE_CFG_FMT_LINEAR);
+	reg |= FIELD_PREP(STRTAB_BASE_CFG_LOG2SIZE, smmu->sid_bits);
+	cfg->strtab_base_cfg = reg;
+
+	return 0;
+}
+
+int arm_smmu_init_strtab(struct arm_smmu_device *smmu)
+{
+	u64 reg;
+	int ret;
+
+	if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB)
+		ret = arm_smmu_init_strtab_2lvl(smmu);
+	else
+		ret = arm_smmu_init_strtab_linear(smmu);
+
+	if (ret)
+		return ret;
+
+	/* Set the strtab base address */
+	reg  = smmu->strtab_cfg.strtab_dma & STRTAB_BASE_ADDR_MASK;
+	reg |= STRTAB_BASE_RA;
+	smmu->strtab_cfg.strtab_base = reg;
+
+	/* Allocate the first VMID for stage-2 bypass STEs */
+	set_bit(0, smmu->vmid_map);
+	return 0;
+}
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 08fd79f66d29..2baaf064a324 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1209,18 +1209,6 @@ bool arm_smmu_free_asid(struct arm_smmu_ctx_desc *cd)
 }
 
 /* Stream table manipulation functions */
-static void
-arm_smmu_write_strtab_l1_desc(__le64 *dst, struct arm_smmu_strtab_l1_desc *desc)
-{
-	u64 val = 0;
-
-	val |= FIELD_PREP(STRTAB_L1_DESC_SPAN, desc->span);
-	val |= desc->l2ptr_dma & STRTAB_L1_DESC_L2PTR_MASK;
-
-	/* See comment in arm_smmu_write_ctx_desc() */
-	WRITE_ONCE(*dst, cpu_to_le64(val));
-}
-
 static void arm_smmu_sync_ste_for_sid(struct arm_smmu_device *smmu, u32 sid)
 {
 	struct arm_smmu_cmdq_ent cmd = {
@@ -1395,34 +1383,6 @@ static void arm_smmu_init_bypass_stes(__le64 *strtab, unsigned int nent, bool fo
 	}
 }
 
-static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
-{
-	size_t size;
-	void *strtab;
-	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
-	struct arm_smmu_strtab_l1_desc *desc = &cfg->l1_desc[sid >> STRTAB_SPLIT];
-
-	if (desc->l2ptr)
-		return 0;
-
-	size = 1 << (STRTAB_SPLIT + ilog2(STRTAB_STE_DWORDS) + 3);
-	strtab = &cfg->strtab[(sid >> STRTAB_SPLIT) * STRTAB_L1_DESC_DWORDS];
-
-	desc->span = STRTAB_SPLIT + 1;
-	desc->l2ptr = dmam_alloc_coherent(smmu->dev, size, &desc->l2ptr_dma,
-					  GFP_KERNEL);
-	if (!desc->l2ptr) {
-		dev_err(smmu->dev,
-			"failed to allocate l2 stream table for SID %u\n",
-			sid);
-		return -ENOMEM;
-	}
-
-	arm_smmu_init_bypass_stes(desc->l2ptr, 1 << STRTAB_SPLIT, false);
-	arm_smmu_write_strtab_l1_desc(strtab, desc);
-	return 0;
-}
-
 static struct arm_smmu_master *
 arm_smmu_find_master(struct arm_smmu_device *smmu, u32 sid)
 {
@@ -2515,13 +2475,24 @@ static bool arm_smmu_sid_in_range(struct arm_smmu_device *smmu, u32 sid)
 
 static int arm_smmu_init_sid_strtab(struct arm_smmu_device *smmu, u32 sid)
 {
+	int ret;
+
 	/* Check the SIDs are in range of the SMMU and our stream table */
 	if (!arm_smmu_sid_in_range(smmu, sid))
 		return -ERANGE;
 
 	/* Ensure l2 strtab is initialised */
-	if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB)
-		return arm_smmu_init_l2_strtab(smmu, sid);
+	if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) {
+		struct arm_smmu_strtab_l1_desc *desc;
+
+		ret = arm_smmu_init_l2_strtab(smmu, sid);
+		if (ret)
+			return ret;
+
+		desc = &smmu->strtab_cfg.l1_desc[sid >> STRTAB_SPLIT];
+		arm_smmu_init_bypass_stes(desc->l2ptr, 1 << STRTAB_SPLIT,
+					  false);
+	}
 
 	return 0;
 }
@@ -2821,49 +2792,6 @@ static struct iommu_ops arm_smmu_ops = {
 };
 
 /* Probing and initialisation functions */
-static int arm_smmu_init_one_queue(struct arm_smmu_device *smmu,
-				   struct arm_smmu_queue *q,
-				   void __iomem *page,
-				   unsigned long prod_off,
-				   unsigned long cons_off,
-				   size_t dwords, const char *name)
-{
-	size_t qsz;
-
-	do {
-		qsz = ((1 << q->llq.max_n_shift) * dwords) << 3;
-		q->base = dmam_alloc_coherent(smmu->dev, qsz, &q->base_dma,
-					      GFP_KERNEL);
-		if (q->base || qsz < PAGE_SIZE)
-			break;
-
-		q->llq.max_n_shift--;
-	} while (1);
-
-	if (!q->base) {
-		dev_err(smmu->dev,
-			"failed to allocate queue (0x%zx bytes) for %s\n",
-			qsz, name);
-		return -ENOMEM;
-	}
-
-	if (!WARN_ON(q->base_dma & (qsz - 1))) {
-		dev_info(smmu->dev, "allocated %u entries for %s\n",
-			 1 << q->llq.max_n_shift, name);
-	}
-
-	q->prod_reg	= page + prod_off;
-	q->cons_reg	= page + cons_off;
-	q->ent_dwords	= dwords;
-
-	q->q_base  = Q_BASE_RWA;
-	q->q_base |= q->base_dma & Q_BASE_ADDR_MASK;
-	q->q_base |= FIELD_PREP(Q_BASE_LOG2SIZE, q->llq.max_n_shift);
-
-	q->llq.prod = q->llq.cons = 0;
-	return 0;
-}
-
 static int arm_smmu_cmdq_init(struct arm_smmu_device *smmu)
 {
 	struct arm_smmu_cmdq *cmdq = &smmu->cmdq;
@@ -2918,114 +2846,6 @@ static int arm_smmu_init_queues(struct arm_smmu_device *smmu)
 				       PRIQ_ENT_DWORDS, "priq");
 }
 
-static int arm_smmu_init_l1_strtab(struct arm_smmu_device *smmu)
-{
-	unsigned int i;
-	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
-	void *strtab = smmu->strtab_cfg.strtab;
-
-	cfg->l1_desc = devm_kcalloc(smmu->dev, cfg->num_l1_ents,
-				    sizeof(*cfg->l1_desc), GFP_KERNEL);
-	if (!cfg->l1_desc)
-		return -ENOMEM;
-
-	for (i = 0; i < cfg->num_l1_ents; ++i) {
-		arm_smmu_write_strtab_l1_desc(strtab, &cfg->l1_desc[i]);
-		strtab += STRTAB_L1_DESC_DWORDS << 3;
-	}
-
-	return 0;
-}
-
-static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
-{
-	void *strtab;
-	u64 reg;
-	u32 size, l1size;
-	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
-
-	/* Calculate the L1 size, capped to the SIDSIZE. */
-	size = STRTAB_L1_SZ_SHIFT - (ilog2(STRTAB_L1_DESC_DWORDS) + 3);
-	size = min(size, smmu->sid_bits - STRTAB_SPLIT);
-	cfg->num_l1_ents = 1 << size;
-
-	size += STRTAB_SPLIT;
-	if (size < smmu->sid_bits)
-		dev_warn(smmu->dev,
-			 "2-level strtab only covers %u/%u bits of SID\n",
-			 size, smmu->sid_bits);
-
-	l1size = cfg->num_l1_ents * (STRTAB_L1_DESC_DWORDS << 3);
-	strtab = dmam_alloc_coherent(smmu->dev, l1size, &cfg->strtab_dma,
-				     GFP_KERNEL);
-	if (!strtab) {
-		dev_err(smmu->dev,
-			"failed to allocate l1 stream table (%u bytes)\n",
-			l1size);
-		return -ENOMEM;
-	}
-	cfg->strtab = strtab;
-
-	/* Configure strtab_base_cfg for 2 levels */
-	reg  = FIELD_PREP(STRTAB_BASE_CFG_FMT, STRTAB_BASE_CFG_FMT_2LVL);
-	reg |= FIELD_PREP(STRTAB_BASE_CFG_LOG2SIZE, size);
-	reg |= FIELD_PREP(STRTAB_BASE_CFG_SPLIT, STRTAB_SPLIT);
-	cfg->strtab_base_cfg = reg;
-
-	return arm_smmu_init_l1_strtab(smmu);
-}
-
-static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu)
-{
-	void *strtab;
-	u64 reg;
-	u32 size;
-	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
-
-	size = (1 << smmu->sid_bits) * (STRTAB_STE_DWORDS << 3);
-	strtab = dmam_alloc_coherent(smmu->dev, size, &cfg->strtab_dma,
-				     GFP_KERNEL);
-	if (!strtab) {
-		dev_err(smmu->dev,
-			"failed to allocate linear stream table (%u bytes)\n",
-			size);
-		return -ENOMEM;
-	}
-	cfg->strtab = strtab;
-	cfg->num_l1_ents = 1 << smmu->sid_bits;
-
-	/* Configure strtab_base_cfg for a linear table covering all SIDs */
-	reg  = FIELD_PREP(STRTAB_BASE_CFG_FMT, STRTAB_BASE_CFG_FMT_LINEAR);
-	reg |= FIELD_PREP(STRTAB_BASE_CFG_LOG2SIZE, smmu->sid_bits);
-	cfg->strtab_base_cfg = reg;
-
-	arm_smmu_init_bypass_stes(strtab, cfg->num_l1_ents, false);
-	return 0;
-}
-
-static int arm_smmu_init_strtab(struct arm_smmu_device *smmu)
-{
-	u64 reg;
-	int ret;
-
-	if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB)
-		ret = arm_smmu_init_strtab_2lvl(smmu);
-	else
-		ret = arm_smmu_init_strtab_linear(smmu);
-
-	if (ret)
-		return ret;
-
-	/* Set the strtab base address */
-	reg  = smmu->strtab_cfg.strtab_dma & STRTAB_BASE_ADDR_MASK;
-	reg |= STRTAB_BASE_RA;
-	smmu->strtab_cfg.strtab_base = reg;
-
-	/* Allocate the first VMID for stage-2 bypass STEs */
-	set_bit(0, smmu->vmid_map);
-	return 0;
-}
-
 static int arm_smmu_init_structures(struct arm_smmu_device *smmu)
 {
 	int ret;
@@ -3037,7 +2857,14 @@ static int arm_smmu_init_structures(struct arm_smmu_device *smmu)
 	if (ret)
 		return ret;
 
-	return arm_smmu_init_strtab(smmu);
+	ret = arm_smmu_init_strtab(smmu);
+	if (ret)
+		return ret;
+
+	if (!(smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB))
+		arm_smmu_init_bypass_stes(smmu->strtab_cfg.strtab,
+					  smmu->strtab_cfg.num_l1_ents, false);
+	return 0;
 }
 
 static void arm_smmu_free_msis(void *data)
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 201+ messages in thread

* [RFC PATCH 31/45] iommu/arm-smmu-v3: Move firmware probe to arm-smmu-v3-common
  2023-02-01 12:52 ` Jean-Philippe Brucker
@ 2023-02-01 12:53   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:53 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

Move the firmware probe functions (DT and ACPI) to the common source, and
take the opportunity to clean up the 'bypass' behaviour a bit (see commit
dc87a98db751 ("iommu/arm-smmu: Fall back to global bypass")).

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |   4 +
 .../arm/arm-smmu-v3/arm-smmu-v3-common.c      | 107 ++++++++++++++++++
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 106 +----------------
 3 files changed, 114 insertions(+), 103 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 8ab84282f62a..345aac378712 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -276,6 +276,10 @@ int arm_smmu_device_disable(struct arm_smmu_device *smmu);
 bool arm_smmu_capable(struct device *dev, enum iommu_cap cap);
 struct iommu_group *arm_smmu_device_group(struct device *dev);
 int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args);
+
+struct platform_device;
+int arm_smmu_fw_probe(struct platform_device *pdev,
+		      struct arm_smmu_device *smmu, bool *bypass);
 int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu);
 int arm_smmu_init_one_queue(struct arm_smmu_device *smmu,
 			    struct arm_smmu_queue *q,
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-common.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-common.c
index 9226971b6e53..4e945df5d64f 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-common.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-common.c
@@ -1,10 +1,117 @@
 // SPDX-License-Identifier: GPL-2.0
+#include <linux/acpi.h>
 #include <linux/dma-mapping.h>
 #include <linux/iopoll.h>
+#include <linux/of.h>
+#include <linux/of_address.h>
+#include <linux/of_platform.h>
 #include <linux/pci.h>
 
 #include "arm-smmu-v3.h"
 
+struct arm_smmu_option_prop {
+	u32 opt;
+	const char *prop;
+};
+
+static struct arm_smmu_option_prop arm_smmu_options[] = {
+	{ ARM_SMMU_OPT_SKIP_PREFETCH, "hisilicon,broken-prefetch-cmd" },
+	{ ARM_SMMU_OPT_PAGE0_REGS_ONLY, "cavium,cn9900-broken-page1-regspace"},
+	{ 0, NULL},
+};
+
+static void parse_driver_options(struct arm_smmu_device *smmu)
+{
+	int i = 0;
+
+	do {
+		if (of_property_read_bool(smmu->dev->of_node,
+						arm_smmu_options[i].prop)) {
+			smmu->options |= arm_smmu_options[i].opt;
+			dev_notice(smmu->dev, "option %s\n",
+				arm_smmu_options[i].prop);
+		}
+	} while (arm_smmu_options[++i].opt);
+}
+
+static int arm_smmu_device_dt_probe(struct platform_device *pdev,
+				    struct arm_smmu_device *smmu,
+				    bool *bypass)
+{
+	struct device *dev = &pdev->dev;
+	u32 cells;
+
+	*bypass = true;
+	if (of_property_read_u32(dev->of_node, "#iommu-cells", &cells))
+		dev_err(dev, "missing #iommu-cells property\n");
+	else if (cells != 1)
+		dev_err(dev, "invalid #iommu-cells value (%d)\n", cells);
+	else
+		*bypass = false;
+
+	parse_driver_options(smmu);
+
+	if (of_dma_is_coherent(dev->of_node))
+		smmu->features |= ARM_SMMU_FEAT_COHERENCY;
+
+	return 0;
+}
+
+#ifdef CONFIG_ACPI
+static void acpi_smmu_get_options(u32 model, struct arm_smmu_device *smmu)
+{
+	switch (model) {
+	case ACPI_IORT_SMMU_V3_CAVIUM_CN99XX:
+		smmu->options |= ARM_SMMU_OPT_PAGE0_REGS_ONLY;
+		break;
+	case ACPI_IORT_SMMU_V3_HISILICON_HI161X:
+		smmu->options |= ARM_SMMU_OPT_SKIP_PREFETCH;
+		break;
+	}
+
+	dev_notice(smmu->dev, "option mask 0x%x\n", smmu->options);
+}
+
+static int arm_smmu_device_acpi_probe(struct platform_device *pdev,
+				      struct arm_smmu_device *smmu,
+				      bool *bypass)
+{
+	struct acpi_iort_smmu_v3 *iort_smmu;
+	struct device *dev = smmu->dev;
+	struct acpi_iort_node *node;
+
+	node = *(struct acpi_iort_node **)dev_get_platdata(dev);
+
+	/* Retrieve SMMUv3 specific data */
+	iort_smmu = (struct acpi_iort_smmu_v3 *)node->node_data;
+
+	acpi_smmu_get_options(iort_smmu->model, smmu);
+
+	if (iort_smmu->flags & ACPI_IORT_SMMU_V3_COHACC_OVERRIDE)
+		smmu->features |= ARM_SMMU_FEAT_COHERENCY;
+
+	*bypass = false;
+	return 0;
+}
+
+#else
+static inline int arm_smmu_device_acpi_probe(struct platform_device *pdev,
+					     struct arm_smmu_device *smmu,
+					     bool *bypass)
+{
+	return -ENODEV;
+}
+#endif
+
+int arm_smmu_fw_probe(struct platform_device *pdev,
+		      struct arm_smmu_device *smmu, bool *bypass)
+{
+	if (smmu->dev->of_node)
+		return arm_smmu_device_dt_probe(pdev, smmu, bypass);
+	else
+		return arm_smmu_device_acpi_probe(pdev, smmu, bypass);
+}
+
 int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 {
 	u32 reg;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 2baaf064a324..7cb171304953 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -9,7 +9,6 @@
  * This driver is powered by bad coffee and bombay mix.
  */
 
-#include <linux/acpi.h>
 #include <linux/acpi_iort.h>
 #include <linux/bitops.h>
 #include <linux/crash_dump.h>
@@ -19,9 +18,6 @@
 #include <linux/io-pgtable.h>
 #include <linux/module.h>
 #include <linux/msi.h>
-#include <linux/of.h>
-#include <linux/of_address.h>
-#include <linux/of_platform.h>
 #include <linux/pci-ats.h>
 #include <linux/platform_device.h>
 
@@ -64,11 +60,6 @@ static phys_addr_t arm_smmu_msi_cfg[ARM_SMMU_MAX_MSIS][3] = {
 	},
 };
 
-struct arm_smmu_option_prop {
-	u32 opt;
-	const char *prop;
-};
-
 DEFINE_XARRAY_ALLOC1(arm_smmu_asid_xa);
 DEFINE_MUTEX(arm_smmu_asid_lock);
 
@@ -78,26 +69,6 @@ DEFINE_MUTEX(arm_smmu_asid_lock);
  */
 struct arm_smmu_ctx_desc quiet_cd = { 0 };
 
-static struct arm_smmu_option_prop arm_smmu_options[] = {
-	{ ARM_SMMU_OPT_SKIP_PREFETCH, "hisilicon,broken-prefetch-cmd" },
-	{ ARM_SMMU_OPT_PAGE0_REGS_ONLY, "cavium,cn9900-broken-page1-regspace"},
-	{ 0, NULL},
-};
-
-static void parse_driver_options(struct arm_smmu_device *smmu)
-{
-	int i = 0;
-
-	do {
-		if (of_property_read_bool(smmu->dev->of_node,
-						arm_smmu_options[i].prop)) {
-			smmu->options |= arm_smmu_options[i].opt;
-			dev_notice(smmu->dev, "option %s\n",
-				arm_smmu_options[i].prop);
-		}
-	} while (arm_smmu_options[++i].opt);
-}
-
 /* Low-level queue manipulation functions */
 static bool queue_has_space(struct arm_smmu_ll_queue *q, u32 n)
 {
@@ -3147,70 +3118,6 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
 	return 0;
 }
 
-#ifdef CONFIG_ACPI
-static void acpi_smmu_get_options(u32 model, struct arm_smmu_device *smmu)
-{
-	switch (model) {
-	case ACPI_IORT_SMMU_V3_CAVIUM_CN99XX:
-		smmu->options |= ARM_SMMU_OPT_PAGE0_REGS_ONLY;
-		break;
-	case ACPI_IORT_SMMU_V3_HISILICON_HI161X:
-		smmu->options |= ARM_SMMU_OPT_SKIP_PREFETCH;
-		break;
-	}
-
-	dev_notice(smmu->dev, "option mask 0x%x\n", smmu->options);
-}
-
-static int arm_smmu_device_acpi_probe(struct platform_device *pdev,
-				      struct arm_smmu_device *smmu)
-{
-	struct acpi_iort_smmu_v3 *iort_smmu;
-	struct device *dev = smmu->dev;
-	struct acpi_iort_node *node;
-
-	node = *(struct acpi_iort_node **)dev_get_platdata(dev);
-
-	/* Retrieve SMMUv3 specific data */
-	iort_smmu = (struct acpi_iort_smmu_v3 *)node->node_data;
-
-	acpi_smmu_get_options(iort_smmu->model, smmu);
-
-	if (iort_smmu->flags & ACPI_IORT_SMMU_V3_COHACC_OVERRIDE)
-		smmu->features |= ARM_SMMU_FEAT_COHERENCY;
-
-	return 0;
-}
-#else
-static inline int arm_smmu_device_acpi_probe(struct platform_device *pdev,
-					     struct arm_smmu_device *smmu)
-{
-	return -ENODEV;
-}
-#endif
-
-static int arm_smmu_device_dt_probe(struct platform_device *pdev,
-				    struct arm_smmu_device *smmu)
-{
-	struct device *dev = &pdev->dev;
-	u32 cells;
-	int ret = -EINVAL;
-
-	if (of_property_read_u32(dev->of_node, "#iommu-cells", &cells))
-		dev_err(dev, "missing #iommu-cells property\n");
-	else if (cells != 1)
-		dev_err(dev, "invalid #iommu-cells value (%d)\n", cells);
-	else
-		ret = 0;
-
-	parse_driver_options(smmu);
-
-	if (of_dma_is_coherent(dev->of_node))
-		smmu->features |= ARM_SMMU_FEAT_COHERENCY;
-
-	return ret;
-}
-
 static unsigned long arm_smmu_resource_size(struct arm_smmu_device *smmu)
 {
 	if (smmu->options & ARM_SMMU_OPT_PAGE0_REGS_ONLY)
@@ -3271,16 +3178,9 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
 		return -ENOMEM;
 	smmu->dev = dev;
 
-	if (dev->of_node) {
-		ret = arm_smmu_device_dt_probe(pdev, smmu);
-	} else {
-		ret = arm_smmu_device_acpi_probe(pdev, smmu);
-		if (ret == -ENODEV)
-			return ret;
-	}
-
-	/* Set bypass mode according to firmware probing result */
-	bypass = !!ret;
+	ret = arm_smmu_fw_probe(pdev, smmu, &bypass);
+	if (ret)
+		return ret;
 
 	/* Base address */
 	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 201+ messages in thread

* [RFC PATCH 32/45] iommu/arm-smmu-v3: Move IOMMU registration to arm-smmu-v3-common.c
  2023-02-01 12:52 ` Jean-Philippe Brucker
@ 2023-02-01 12:53   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:53 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

The KVM driver will need to implement a few IOMMU ops, so move the
registration helpers to arm-smmu-v3-common.c.
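
A minimal sketch of the intended KVM-side use (kvm_arm_smmu_ops and
kvm_arm_smmu_register are hypothetical placeholders; the KVM driver's ops
come later in the series):

	static struct iommu_ops kvm_arm_smmu_ops;

	static int kvm_arm_smmu_register(struct arm_smmu_device *smmu,
					 phys_addr_t ioaddr)
	{
		/* Adds the sysfs entry and registers the IOMMU ops */
		return arm_smmu_register_iommu(smmu, &kvm_arm_smmu_ops, ioaddr);
	}

with teardown mirrored by arm_smmu_unregister_iommu() on the remove path.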

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  4 +++
 .../arm/arm-smmu-v3/arm-smmu-v3-common.c      | 34 +++++++++++++++++++
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 17 ++--------
 3 files changed, 40 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 345aac378712..87034da361ca 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -290,6 +290,10 @@ int arm_smmu_init_one_queue(struct arm_smmu_device *smmu,
 int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid);
 int arm_smmu_init_strtab(struct arm_smmu_device *smmu);
 
+int arm_smmu_register_iommu(struct arm_smmu_device *smmu,
+			    struct iommu_ops *ops, phys_addr_t ioaddr);
+void arm_smmu_unregister_iommu(struct arm_smmu_device *smmu);
+
 int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain, int ssid,
 			    struct arm_smmu_ctx_desc *cd);
 void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-common.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-common.c
index 4e945df5d64f..7faf28c5a8b4 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-common.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-common.c
@@ -112,6 +112,13 @@ int arm_smmu_fw_probe(struct platform_device *pdev,
 		return arm_smmu_device_acpi_probe(pdev, smmu, bypass);
 }
 
+#ifdef CONFIG_ARM_SMMU_V3_SVA
+bool __weak arm_smmu_sva_supported(struct arm_smmu_device *smmu)
+{
+	return false;
+}
+#endif
+
 int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 {
 	u32 reg;
@@ -591,3 +598,30 @@ int arm_smmu_init_strtab(struct arm_smmu_device *smmu)
 	set_bit(0, smmu->vmid_map);
 	return 0;
 }
+
+int arm_smmu_register_iommu(struct arm_smmu_device *smmu,
+			    struct iommu_ops *ops, phys_addr_t ioaddr)
+{
+	int ret;
+	struct device *dev = smmu->dev;
+
+	ret = iommu_device_sysfs_add(&smmu->iommu, dev, NULL,
+				     "smmu3.%pa", &ioaddr);
+	if (ret)
+		return ret;
+
+	ret = iommu_device_register(&smmu->iommu, ops, dev);
+	if (ret) {
+		dev_err(dev, "Failed to register iommu\n");
+		iommu_device_sysfs_remove(&smmu->iommu);
+		return ret;
+	}
+
+	return 0;
+}
+
+void arm_smmu_unregister_iommu(struct arm_smmu_device *smmu)
+{
+	iommu_device_unregister(&smmu->iommu);
+	iommu_device_sysfs_remove(&smmu->iommu);
+}
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 7cb171304953..a972c00700cc 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3257,27 +3257,14 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
 		return ret;
 
 	/* And we're up. Go go go! */
-	ret = iommu_device_sysfs_add(&smmu->iommu, dev, NULL,
-				     "smmu3.%pa", &ioaddr);
-	if (ret)
-		return ret;
-
-	ret = iommu_device_register(&smmu->iommu, &arm_smmu_ops, dev);
-	if (ret) {
-		dev_err(dev, "Failed to register iommu\n");
-		iommu_device_sysfs_remove(&smmu->iommu);
-		return ret;
-	}
-
-	return 0;
+	return arm_smmu_register_iommu(smmu, &arm_smmu_ops, ioaddr);
 }
 
 static int arm_smmu_device_remove(struct platform_device *pdev)
 {
 	struct arm_smmu_device *smmu = platform_get_drvdata(pdev);
 
-	iommu_device_unregister(&smmu->iommu);
-	iommu_device_sysfs_remove(&smmu->iommu);
+	arm_smmu_unregister_iommu(smmu);
 	arm_smmu_device_disable(smmu);
 	iopf_queue_free(smmu->evtq.iopf);
 
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 201+ messages in thread

* [RFC PATCH 33/45] iommu/arm-smmu-v3: Use single pages for level-2 stream tables
  2023-02-01 12:52 ` Jean-Philippe Brucker
@ 2023-02-01 12:53   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:53 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

Rather than using a fixed split point for the stream tables, base it on
the page size. It's easier for the KVM driver to pass single pages to
the hypervisor when lazily allocating stream tables.
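
For reference, an STE is 64 bytes (STRTAB_STE_DWORDS == 8), so
ilog2(STRTAB_STE_DWORDS) + 3 == 6 and the new split is simply
PAGE_SHIFT - 6, i.e. one level-2 table per page:

	PAGE_SIZE	split	STEs per L2 table	L2 table size
	4KB		6	64			4KB
	16KB		8	256			16KB
	64KB		10	1024			64KB

With the previous fixed STRTAB_SPLIT of 8, a level-2 table was always
256 * 64 bytes = 16KB regardless of page size.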

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 arch/arm64/include/asm/arm-smmu-v3-regs.h     |  1 -
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  1 +
 .../arm/arm-smmu-v3/arm-smmu-v3-common.c      | 21 ++++++++++++-------
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 10 ++++-----
 4 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/include/asm/arm-smmu-v3-regs.h b/arch/arm64/include/asm/arm-smmu-v3-regs.h
index 646a734f2554..357e52f4038f 100644
--- a/arch/arm64/include/asm/arm-smmu-v3-regs.h
+++ b/arch/arm64/include/asm/arm-smmu-v3-regs.h
@@ -168,7 +168,6 @@
  *       256 lazy entries per table (each table covers a PCI bus)
  */
 #define STRTAB_L1_SZ_SHIFT		20
-#define STRTAB_SPLIT			8
 
 #define STRTAB_L1_DESC_DWORDS		1
 #define STRTAB_L1_DESC_SPAN		GENMASK_ULL(4, 0)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 87034da361ca..3a4649f43839 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -163,6 +163,7 @@ struct arm_smmu_strtab_cfg {
 
 	u64				strtab_base;
 	u32				strtab_base_cfg;
+	u8				split;
 };
 
 /* An SMMUv3 instance */
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-common.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-common.c
index 7faf28c5a8b4..c44075015979 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-common.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-common.c
@@ -254,11 +254,14 @@ int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 	smmu->sid_bits = FIELD_GET(IDR1_SIDSIZE, reg);
 	smmu->iommu.max_pasids = 1UL << smmu->ssid_bits;
 
+	/* Use one page per level-2 table */
+	smmu->strtab_cfg.split = PAGE_SHIFT - (ilog2(STRTAB_STE_DWORDS) + 3);
+
 	/*
 	 * If the SMMU supports fewer bits than would fill a single L2 stream
 	 * table, use a linear table instead.
 	 */
-	if (smmu->sid_bits <= STRTAB_SPLIT)
+	if (smmu->sid_bits <= smmu->strtab_cfg.split)
 		smmu->features &= ~ARM_SMMU_FEAT_2_LVL_STRTAB;
 
 	/* IDR3 */
@@ -470,15 +473,17 @@ int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
 	size_t size;
 	void *strtab;
 	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
-	struct arm_smmu_strtab_l1_desc *desc = &cfg->l1_desc[sid >> STRTAB_SPLIT];
+	struct arm_smmu_strtab_l1_desc *desc =
+		&cfg->l1_desc[sid >> smmu->strtab_cfg.split];
 
 	if (desc->l2ptr)
 		return 0;
 
-	size = 1 << (STRTAB_SPLIT + ilog2(STRTAB_STE_DWORDS) + 3);
-	strtab = &cfg->strtab[(sid >> STRTAB_SPLIT) * STRTAB_L1_DESC_DWORDS];
+	size = 1 << (smmu->strtab_cfg.split + ilog2(STRTAB_STE_DWORDS) + 3);
+	strtab = &cfg->strtab[(sid >> smmu->strtab_cfg.split) *
+		 STRTAB_L1_DESC_DWORDS];
 
-	desc->span = STRTAB_SPLIT + 1;
+	desc->span = smmu->strtab_cfg.split + 1;
 	desc->l2ptr = dmam_alloc_coherent(smmu->dev, size, &desc->l2ptr_dma,
 					  GFP_KERNEL);
 	if (!desc->l2ptr) {
@@ -520,10 +525,10 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
 
 	/* Calculate the L1 size, capped to the SIDSIZE. */
 	size = STRTAB_L1_SZ_SHIFT - (ilog2(STRTAB_L1_DESC_DWORDS) + 3);
-	size = min(size, smmu->sid_bits - STRTAB_SPLIT);
+	size = min(size, smmu->sid_bits - smmu->strtab_cfg.split);
 	cfg->num_l1_ents = 1 << size;
 
-	size += STRTAB_SPLIT;
+	size += smmu->strtab_cfg.split;
 	if (size < smmu->sid_bits)
 		dev_warn(smmu->dev,
 			 "2-level strtab only covers %u/%u bits of SID\n",
@@ -543,7 +548,7 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
 	/* Configure strtab_base_cfg for 2 levels */
 	reg  = FIELD_PREP(STRTAB_BASE_CFG_FMT, STRTAB_BASE_CFG_FMT_2LVL);
 	reg |= FIELD_PREP(STRTAB_BASE_CFG_LOG2SIZE, size);
-	reg |= FIELD_PREP(STRTAB_BASE_CFG_SPLIT, STRTAB_SPLIT);
+	reg |= FIELD_PREP(STRTAB_BASE_CFG_SPLIT, smmu->strtab_cfg.split);
 	cfg->strtab_base_cfg = reg;
 
 	return arm_smmu_init_l1_strtab(smmu);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index a972c00700cc..19f170088268 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2156,9 +2156,9 @@ static __le64 *arm_smmu_get_step_for_sid(struct arm_smmu_device *smmu, u32 sid)
 		int idx;
 
 		/* Two-level walk */
-		idx = (sid >> STRTAB_SPLIT) * STRTAB_L1_DESC_DWORDS;
+		idx = (sid >> smmu->strtab_cfg.split) * STRTAB_L1_DESC_DWORDS;
 		l1_desc = &cfg->l1_desc[idx];
-		idx = (sid & ((1 << STRTAB_SPLIT) - 1)) * STRTAB_STE_DWORDS;
+		idx = (sid & ((1 << smmu->strtab_cfg.split) - 1)) * STRTAB_STE_DWORDS;
 		step = &l1_desc->l2ptr[idx];
 	} else {
 		/* Simple linear lookup */
@@ -2439,7 +2439,7 @@ static bool arm_smmu_sid_in_range(struct arm_smmu_device *smmu, u32 sid)
 	unsigned long limit = smmu->strtab_cfg.num_l1_ents;
 
 	if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB)
-		limit *= 1UL << STRTAB_SPLIT;
+		limit *= 1UL << smmu->strtab_cfg.split;
 
 	return sid < limit;
 }
@@ -2460,8 +2460,8 @@ static int arm_smmu_init_sid_strtab(struct arm_smmu_device *smmu, u32 sid)
 		if (ret)
 			return ret;
 
-		desc = &smmu->strtab_cfg.l1_desc[sid >> STRTAB_SPLIT];
-		arm_smmu_init_bypass_stes(desc->l2ptr, 1 << STRTAB_SPLIT,
+		desc = &smmu->strtab_cfg.l1_desc[sid >> smmu->strtab_cfg.split];
+		arm_smmu_init_bypass_stes(desc->l2ptr, 1 << smmu->strtab_cfg.split,
 					  false);
 	}
 
-- 
2.39.0


* [RFC PATCH 34/45] iommu/arm-smmu-v3: Add host driver for pKVM
  2023-02-01 12:52 ` Jean-Philippe Brucker
@ 2023-02-01 12:53   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:53 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

Under protected KVM (pKVM), the host does not have access to guest or
hypervisor memory. This means that devices owned by the host must be
isolated by the SMMU, and the hypervisor is in charge of the SMMU.

Introduce the host component that replaces the normal SMMUv3 driver when
pKVM is enabled, and sends configuration and requests to the actual
driver running in the hypervisor (EL2).

Rather than relying on regular driver probe, pKVM directly calls
kvm_arm_smmu_v3_init(), which synchronously finds all SMMUs and hands
them to the hypervisor. If the regular SMMUv3 driver is also enabled,
it will not find any free SMMU left to drive by the time it probes.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 drivers/iommu/arm/arm-smmu-v3/Makefile        |  5 ++
 include/kvm/arm_smmu_v3.h                     | 14 +++++
 arch/arm64/kvm/arm.c                          | 18 +++++-
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c   | 58 +++++++++++++++++++
 4 files changed, 94 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c

diff --git a/drivers/iommu/arm/arm-smmu-v3/Makefile b/drivers/iommu/arm/arm-smmu-v3/Makefile
index c4fcc796213c..a90b97d8bae3 100644
--- a/drivers/iommu/arm/arm-smmu-v3/Makefile
+++ b/drivers/iommu/arm/arm-smmu-v3/Makefile
@@ -4,3 +4,8 @@ arm_smmu_v3-objs-y += arm-smmu-v3.o
 arm_smmu_v3-objs-y += arm-smmu-v3-common.o
 arm_smmu_v3-objs-$(CONFIG_ARM_SMMU_V3_SVA) += arm-smmu-v3-sva.o
 arm_smmu_v3-objs := $(arm_smmu_v3-objs-y)
+
+obj-$(CONFIG_ARM_SMMU_V3_PKVM) += arm_smmu_v3_kvm.o
+arm_smmu_v3_kvm-objs-y += arm-smmu-v3-kvm.o
+arm_smmu_v3_kvm-objs-y += arm-smmu-v3-common.o
+arm_smmu_v3_kvm-objs := $(arm_smmu_v3_kvm-objs-y)
diff --git a/include/kvm/arm_smmu_v3.h b/include/kvm/arm_smmu_v3.h
index ed139b0e9612..373b915b6661 100644
--- a/include/kvm/arm_smmu_v3.h
+++ b/include/kvm/arm_smmu_v3.h
@@ -40,4 +40,18 @@ extern struct hyp_arm_smmu_v3_device *kvm_nvhe_sym(kvm_hyp_arm_smmu_v3_smmus);
 
 #endif /* CONFIG_ARM_SMMU_V3_PKVM */
 
+#ifndef __KVM_NVHE_HYPERVISOR__
+# if IS_ENABLED(CONFIG_ARM_SMMU_V3_PKVM)
+int kvm_arm_smmu_v3_init(unsigned int *count);
+void kvm_arm_smmu_v3_remove(void);
+
+# else /* CONFIG_ARM_SMMU_V3_PKVM */
+static inline int kvm_arm_smmu_v3_init(unsigned int *count)
+{
+	return -ENODEV;
+}
+static inline void kvm_arm_smmu_v3_remove(void) {}
+# endif /* CONFIG_ARM_SMMU_V3_PKVM */
+#endif /* __KVM_NVHE_HYPERVISOR__ */
+
 #endif /* __KVM_ARM_SMMU_V3_H */
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 31faae76d519..a4cd09fc4abf 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -44,6 +44,7 @@
 #include <kvm/arm_hypercalls.h>
 #include <kvm/arm_pmu.h>
 #include <kvm/arm_psci.h>
+#include <kvm/arm_smmu_v3.h>
 
 static enum kvm_mode kvm_mode = KVM_MODE_DEFAULT;
 DEFINE_STATIC_KEY_FALSE(kvm_protected_mode_initialized);
@@ -1901,11 +1902,26 @@ static bool init_psci_relay(void)
 
 static int init_stage2_iommu(void)
 {
-	return KVM_IOMMU_DRIVER_NONE;
+	int ret;
+	unsigned int smmu_count;
+
+	ret = kvm_arm_smmu_v3_init(&smmu_count);
+	if (ret)
+		return ret;
+	else if (!smmu_count)
+		return KVM_IOMMU_DRIVER_NONE;
+	return KVM_IOMMU_DRIVER_SMMUV3;
 }
 
 static void remove_stage2_iommu(enum kvm_iommu_driver iommu)
 {
+	switch (iommu) {
+	case KVM_IOMMU_DRIVER_SMMUV3:
+		kvm_arm_smmu_v3_remove();
+		break;
+	default:
+		break;
+	}
 }
 
 static int init_subsystems(void)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
new file mode 100644
index 000000000000..4092da8050ef
--- /dev/null
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
@@ -0,0 +1,58 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * pKVM host driver for the Arm SMMUv3
+ *
+ * Copyright (C) 2022 Linaro Ltd.
+ */
+#include <linux/of_platform.h>
+
+#include <kvm/arm_smmu_v3.h>
+
+#include "arm-smmu-v3.h"
+
+static int kvm_arm_smmu_probe(struct platform_device *pdev)
+{
+	return -ENOSYS;
+}
+
+static int kvm_arm_smmu_remove(struct platform_device *pdev)
+{
+	return 0;
+}
+
+static const struct of_device_id arm_smmu_of_match[] = {
+	{ .compatible = "arm,smmu-v3", },
+	{ },
+};
+
+static struct platform_driver kvm_arm_smmu_driver = {
+	.driver = {
+		.name = "kvm-arm-smmu-v3",
+		.of_match_table = arm_smmu_of_match,
+	},
+	.remove = kvm_arm_smmu_remove,
+};
+
+/**
+ * kvm_arm_smmu_v3_init() - Reserve the SMMUv3 for KVM
+ * @count: on success, number of SMMUs successfully initialized
+ *
+ * Return 0 if all present SMMUv3s were probed successfully, or an error.
+ *   If no SMMU was found, return 0 with a count of 0.
+ */
+int kvm_arm_smmu_v3_init(unsigned int *count)
+{
+	int ret;
+
+	ret = platform_driver_probe(&kvm_arm_smmu_driver, kvm_arm_smmu_probe);
+	if (ret)
+		return ret;
+
+	*count = 0;
+	return 0;
+}
+
+void kvm_arm_smmu_v3_remove(void)
+{
+	platform_driver_unregister(&kvm_arm_smmu_driver);
+}
-- 
2.39.0


* [RFC PATCH 35/45] iommu/arm-smmu-v3-kvm: Pass a list of SMMU devices to the hypervisor
  2023-02-01 12:52 ` Jean-Philippe Brucker
@ 2023-02-01 12:53   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:53 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

Build a list of SMMU devices and donate the pages holding it to the
hypervisor. At this point the host is still trusted, so this would be
a good opportunity to provide more information about the system, for
example which devices are owned by the host (perhaps via the VMID and
SW bits in the stream table, although the stream table is currently
populated lazily).

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c   | 123 +++++++++++++++++-
 1 file changed, 120 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
index 4092da8050ef..1e0daf9ea4ac 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
@@ -4,15 +4,78 @@
  *
  * Copyright (C) 2022 Linaro Ltd.
  */
+#include <asm/kvm_mmu.h>
 #include <linux/of_platform.h>
 
 #include <kvm/arm_smmu_v3.h>
 
 #include "arm-smmu-v3.h"
 
+struct host_arm_smmu_device {
+	struct arm_smmu_device		smmu;
+	pkvm_handle_t			id;
+};
+
+#define smmu_to_host(_smmu) \
+	container_of(_smmu, struct host_arm_smmu_device, smmu);
+
+static size_t				kvm_arm_smmu_cur;
+static size_t				kvm_arm_smmu_count;
+static struct hyp_arm_smmu_v3_device	*kvm_arm_smmu_array;
+
 static int kvm_arm_smmu_probe(struct platform_device *pdev)
 {
-	return -ENOSYS;
+	int ret;
+	bool bypass;
+	size_t size;
+	phys_addr_t ioaddr;
+	struct resource *res;
+	struct arm_smmu_device *smmu;
+	struct device *dev = &pdev->dev;
+	struct host_arm_smmu_device *host_smmu;
+	struct hyp_arm_smmu_v3_device *hyp_smmu;
+
+	if (kvm_arm_smmu_cur >= kvm_arm_smmu_count)
+		return -ENOSPC;
+
+	hyp_smmu = &kvm_arm_smmu_array[kvm_arm_smmu_cur];
+
+	host_smmu = devm_kzalloc(dev, sizeof(*host_smmu), GFP_KERNEL);
+	if (!host_smmu)
+		return -ENOMEM;
+
+	smmu = &host_smmu->smmu;
+	smmu->dev = dev;
+
+	ret = arm_smmu_fw_probe(pdev, smmu, &bypass);
+	if (ret || bypass)
+		return ret ?: -EINVAL;
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	size = resource_size(res);
+	if (size < SZ_128K) {
+		dev_err(dev, "unsupported MMIO region size (%pr)\n", res);
+		return -EINVAL;
+	}
+	ioaddr = res->start;
+	host_smmu->id = kvm_arm_smmu_cur;
+
+	smmu->base = devm_ioremap_resource(dev, res);
+	if (IS_ERR(smmu->base))
+		return PTR_ERR(smmu->base);
+
+	ret = arm_smmu_device_hw_probe(smmu);
+	if (ret)
+		return ret;
+
+	platform_set_drvdata(pdev, host_smmu);
+
+	/* Hypervisor parameters */
+	hyp_smmu->mmio_addr = ioaddr;
+	hyp_smmu->mmio_size = size;
+	kvm_arm_smmu_cur++;
+
+	return 0;
 }
 
 static int kvm_arm_smmu_remove(struct platform_device *pdev)
@@ -33,6 +96,36 @@ static struct platform_driver kvm_arm_smmu_driver = {
 	.remove = kvm_arm_smmu_remove,
 };
 
+static int kvm_arm_smmu_array_alloc(void)
+{
+	int smmu_order;
+	struct device_node *np;
+
+	kvm_arm_smmu_count = 0;
+	for_each_compatible_node(np, NULL, "arm,smmu-v3")
+		kvm_arm_smmu_count++;
+
+	if (!kvm_arm_smmu_count)
+		return 0;
+
+	/* Allocate the parameter list shared with the hypervisor */
+	smmu_order = get_order(kvm_arm_smmu_count * sizeof(*kvm_arm_smmu_array));
+	kvm_arm_smmu_array = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
+						      smmu_order);
+	if (!kvm_arm_smmu_array)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static void kvm_arm_smmu_array_free(void)
+{
+	int order;
+
+	order = get_order(kvm_arm_smmu_count * sizeof(*kvm_arm_smmu_array));
+	free_pages((unsigned long)kvm_arm_smmu_array, order);
+}
+
 /**
  * kvm_arm_smmu_v3_init() - Reserve the SMMUv3 for KVM
  * @count: on success, number of SMMUs successfully initialized
@@ -44,12 +137,36 @@ int kvm_arm_smmu_v3_init(unsigned int *count)
 {
 	int ret;
 
+	/*
+	 * Check whether any device owned by the host is behind an SMMU.
+	 */
+	ret = kvm_arm_smmu_array_alloc();
+	*count = kvm_arm_smmu_count;
+	if (ret || !kvm_arm_smmu_count)
+		return ret;
+
 	ret = platform_driver_probe(&kvm_arm_smmu_driver, kvm_arm_smmu_probe);
 	if (ret)
-		return ret;
+		goto err_free;
 
-	*count = 0;
+	if (kvm_arm_smmu_cur != kvm_arm_smmu_count) {
+		/* A device exists but failed to probe */
+		ret = -EUNATCH;
+		goto err_free;
+	}
+
+	/*
+	 * These variables are stored in the nVHE image, and won't be accessible
+	 * after KVM initialization. Ownership of kvm_arm_smmu_array will be
+	 * transferred to the hypervisor as well.
+	 */
+	kvm_hyp_arm_smmu_v3_smmus = kern_hyp_va(kvm_arm_smmu_array);
+	kvm_hyp_arm_smmu_v3_count = kvm_arm_smmu_count;
 	return 0;
+
+err_free:
+	kvm_arm_smmu_array_free();
+	return ret;
 }
 
 void kvm_arm_smmu_v3_remove(void)
-- 
2.39.0


* [RFC PATCH 36/45] iommu/arm-smmu-v3-kvm: Validate device features
  2023-02-01 12:52 ` Jean-Philippe Brucker
@ 2023-02-01 12:53   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:53 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

The KVM hypervisor driver supports a small subset of features. Ensure
the implementation is compatible, and disable some unused features.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c   | 57 +++++++++++++++++++
 1 file changed, 57 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
index 1e0daf9ea4ac..2cc632f6b256 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
@@ -23,6 +23,59 @@ static size_t				kvm_arm_smmu_cur;
 static size_t				kvm_arm_smmu_count;
 static struct hyp_arm_smmu_v3_device	*kvm_arm_smmu_array;
 
+static bool kvm_arm_smmu_validate_features(struct arm_smmu_device *smmu)
+{
+	unsigned long oas;
+	unsigned int required_features =
+		ARM_SMMU_FEAT_TRANS_S2 |
+		ARM_SMMU_FEAT_TT_LE;
+	unsigned int forbidden_features =
+		ARM_SMMU_FEAT_STALL_FORCE;
+	unsigned int keep_features =
+		ARM_SMMU_FEAT_2_LVL_STRTAB	|
+		ARM_SMMU_FEAT_2_LVL_CDTAB	|
+		ARM_SMMU_FEAT_TT_LE		|
+		ARM_SMMU_FEAT_SEV		|
+		ARM_SMMU_FEAT_COHERENCY		|
+		ARM_SMMU_FEAT_TRANS_S1		|
+		ARM_SMMU_FEAT_TRANS_S2		|
+		ARM_SMMU_FEAT_VAX		|
+		ARM_SMMU_FEAT_RANGE_INV;
+
+	if (smmu->options & ARM_SMMU_OPT_PAGE0_REGS_ONLY) {
+		dev_err(smmu->dev, "unsupported layout\n");
+		return false;
+	}
+
+	if ((smmu->features & required_features) != required_features) {
+		dev_err(smmu->dev, "missing features 0x%x\n",
+			required_features & ~smmu->features);
+		return false;
+	}
+
+	if (smmu->features & forbidden_features) {
+		dev_err(smmu->dev, "features 0x%x forbidden\n",
+			smmu->features & forbidden_features);
+		return false;
+	}
+
+	smmu->features &= keep_features;
+
+	/*
+	 * This can be relaxed (although the spec says that OAS "must match
+	 * the system physical address size."), but requires some changes. All
+	 * table and queue allocations must use GFP_DMA* to ensure the SMMU can
+	 * access them.
+	 */
+	oas = get_kvm_ipa_limit();
+	if (smmu->oas < oas) {
+		dev_err(smmu->dev, "incompatible address size\n");
+		return false;
+	}
+
+	return true;
+}
+
 static int kvm_arm_smmu_probe(struct platform_device *pdev)
 {
 	int ret;
@@ -68,11 +121,15 @@ static int kvm_arm_smmu_probe(struct platform_device *pdev)
 	if (ret)
 		return ret;
 
+	if (!kvm_arm_smmu_validate_features(smmu))
+		return -ENODEV;
+
 	platform_set_drvdata(pdev, host_smmu);
 
 	/* Hypervisor parameters */
 	hyp_smmu->mmio_addr = ioaddr;
 	hyp_smmu->mmio_size = size;
+	hyp_smmu->features = smmu->features;
 	kvm_arm_smmu_cur++;
 
 	return 0;
-- 
2.39.0


* [RFC PATCH 37/45] iommu/arm-smmu-v3-kvm: Allocate structures and reset device
  2023-02-01 12:52 ` Jean-Philippe Brucker
@ 2023-02-01 12:53   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:53 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

Allocate the structures that will be shared between hypervisor and SMMU:
the command queue and the stream table. Install their base addresses in
the MMIO registers, along with some configuration bits. After hyp
initialization, the host won't have access to those pages anymore.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c   | 56 +++++++++++++++++++
 1 file changed, 56 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
index 2cc632f6b256..8808890f4dc0 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
@@ -14,6 +14,7 @@
 struct host_arm_smmu_device {
 	struct arm_smmu_device		smmu;
 	pkvm_handle_t			id;
+	u32				boot_gbpa;
 };
 
 #define smmu_to_host(_smmu) \
@@ -76,6 +77,38 @@ static bool kvm_arm_smmu_validate_features(struct arm_smmu_device *smmu)
 	return true;
 }
 
+static int kvm_arm_smmu_device_reset(struct host_arm_smmu_device *host_smmu)
+{
+	int ret;
+	u32 reg;
+	struct arm_smmu_device *smmu = &host_smmu->smmu;
+
+	reg = readl_relaxed(smmu->base + ARM_SMMU_CR0);
+	if (reg & CR0_SMMUEN)
+		dev_warn(smmu->dev, "SMMU currently enabled! Resetting...\n");
+
+	/* Disable bypass */
+	host_smmu->boot_gbpa = readl_relaxed(smmu->base + ARM_SMMU_GBPA);
+	ret = arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0);
+	if (ret)
+		return ret;
+
+	ret = arm_smmu_device_disable(smmu);
+	if (ret)
+		return ret;
+
+	/* Stream table */
+	writeq_relaxed(smmu->strtab_cfg.strtab_base,
+		       smmu->base + ARM_SMMU_STRTAB_BASE);
+	writel_relaxed(smmu->strtab_cfg.strtab_base_cfg,
+		       smmu->base + ARM_SMMU_STRTAB_BASE_CFG);
+
+	/* Command queue */
+	writeq_relaxed(smmu->cmdq.q.q_base, smmu->base + ARM_SMMU_CMDQ_BASE);
+
+	return 0;
+}
+
 static int kvm_arm_smmu_probe(struct platform_device *pdev)
 {
 	int ret;
@@ -124,6 +157,20 @@ static int kvm_arm_smmu_probe(struct platform_device *pdev)
 	if (!kvm_arm_smmu_validate_features(smmu))
 		return -ENODEV;
 
+	ret = arm_smmu_init_one_queue(smmu, &smmu->cmdq.q, smmu->base,
+				      ARM_SMMU_CMDQ_PROD, ARM_SMMU_CMDQ_CONS,
+				      CMDQ_ENT_DWORDS, "cmdq");
+	if (ret)
+		return ret;
+
+	ret = arm_smmu_init_strtab(smmu);
+	if (ret)
+		return ret;
+
+	ret = kvm_arm_smmu_device_reset(host_smmu);
+	if (ret)
+		return ret;
+
 	platform_set_drvdata(pdev, host_smmu);
 
 	/* Hypervisor parameters */
@@ -137,6 +184,15 @@ static int kvm_arm_smmu_probe(struct platform_device *pdev)
 
 static int kvm_arm_smmu_remove(struct platform_device *pdev)
 {
+	struct host_arm_smmu_device *host_smmu = platform_get_drvdata(pdev);
+	struct arm_smmu_device *smmu = &host_smmu->smmu;
+
+	/*
+	 * There was an error during hypervisor setup. The hyp driver may
+	 * have already enabled the device, so disable it.
+	 */
+	arm_smmu_device_disable(smmu);
+	arm_smmu_update_gbpa(smmu, host_smmu->boot_gbpa, GBPA_ABORT);
 	return 0;
 }
 
-- 
2.39.0


* [RFC PATCH 38/45] iommu/arm-smmu-v3-kvm: Add per-cpu page queue
  2023-02-01 12:52 ` Jean-Philippe Brucker
@ 2023-02-01 12:53   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:53 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

Allocate page queues shared with the hypervisor for page donation and
reclaim. A local_lock ensures that only one thread fills the queue
during a hypercall.
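
As a rough sketch (the hypercall name and arguments below are only
placeholders, not the interface added later in the series), a caller is
expected to wrap the hypercall in the per-cpu lock so that memcache
top-up and reclaim stay on the same CPU:

	static int example_map(struct host_arm_smmu_device *host_smmu,
			       unsigned long iova, phys_addr_t paddr,
			       size_t size, int prot)
	{
		int ret;
		unsigned long flags;

		/* Keep top-up/reclaim on this CPU for the whole call */
		local_lock_irqsave(&memcache_lock, flags);
		ret = kvm_call_hyp_nvhe_mc(&host_smmu->smmu,
					   __pkvm_iommu_example_map, /* placeholder */
					   host_smmu->id, iova, paddr,
					   size, prot);
		local_unlock_irqrestore(&memcache_lock, flags);

		return ret;
	}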

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c   | 93 ++++++++++++++++++-
 1 file changed, 92 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
index 8808890f4dc0..755c77bc0417 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
@@ -5,6 +5,7 @@
  * Copyright (C) 2022 Linaro Ltd.
  */
 #include <asm/kvm_mmu.h>
+#include <linux/local_lock.h>
 #include <linux/of_platform.h>
 
 #include <kvm/arm_smmu_v3.h>
@@ -23,6 +24,81 @@ struct host_arm_smmu_device {
 static size_t				kvm_arm_smmu_cur;
 static size_t				kvm_arm_smmu_count;
 static struct hyp_arm_smmu_v3_device	*kvm_arm_smmu_array;
+static struct kvm_hyp_iommu_memcache	*kvm_arm_smmu_memcache;
+
+static DEFINE_PER_CPU(local_lock_t, memcache_lock) =
+				INIT_LOCAL_LOCK(memcache_lock);
+
+static void *kvm_arm_smmu_alloc_page(void *opaque)
+{
+	struct arm_smmu_device *smmu = opaque;
+	struct page *p;
+
+	p = alloc_pages_node(dev_to_node(smmu->dev), GFP_ATOMIC, 0);
+	if (!p)
+		return NULL;
+
+	return page_address(p);
+}
+
+static void kvm_arm_smmu_free_page(void *va, void *opaque)
+{
+	free_page((unsigned long)va);
+}
+
+static phys_addr_t kvm_arm_smmu_host_pa(void *va)
+{
+	return __pa(va);
+}
+
+static void *kvm_arm_smmu_host_va(phys_addr_t pa)
+{
+	return __va(pa);
+}
+
+__maybe_unused
+static int kvm_arm_smmu_topup_memcache(struct arm_smmu_device *smmu)
+{
+	struct kvm_hyp_memcache *mc;
+	int cpu = raw_smp_processor_id();
+
+	lockdep_assert_held(this_cpu_ptr(&memcache_lock));
+	mc = &kvm_arm_smmu_memcache[cpu].pages;
+
+	if (!kvm_arm_smmu_memcache[cpu].needs_page)
+		return -EBADE;
+
+	kvm_arm_smmu_memcache[cpu].needs_page = false;
+	return  __topup_hyp_memcache(mc, 1, kvm_arm_smmu_alloc_page,
+				     kvm_arm_smmu_host_pa, smmu);
+}
+
+__maybe_unused
+static void kvm_arm_smmu_reclaim_memcache(void)
+{
+	struct kvm_hyp_memcache *mc;
+	int cpu = raw_smp_processor_id();
+
+	lockdep_assert_held(this_cpu_ptr(&memcache_lock));
+	mc = &kvm_arm_smmu_memcache[cpu].pages;
+
+	__free_hyp_memcache(mc, kvm_arm_smmu_free_page,
+			    kvm_arm_smmu_host_va, NULL);
+}
+
+/*
+ * Issue hypercall, and retry after filling the memcache if necessary.
+ * After the call, reclaim pages pushed in the memcache by the hypervisor.
+ */
+#define kvm_call_hyp_nvhe_mc(smmu, ...)				\
+({								\
+	int __ret;						\
+	do {							\
+		__ret = kvm_call_hyp_nvhe(__VA_ARGS__);		\
+	} while (__ret && !kvm_arm_smmu_topup_memcache(smmu));	\
+	kvm_arm_smmu_reclaim_memcache();			\
+	__ret;							\
+})
 
 static bool kvm_arm_smmu_validate_features(struct arm_smmu_device *smmu)
 {
@@ -211,7 +287,7 @@ static struct platform_driver kvm_arm_smmu_driver = {
 
 static int kvm_arm_smmu_array_alloc(void)
 {
-	int smmu_order;
+	int smmu_order, mc_order;
 	struct device_node *np;
 
 	kvm_arm_smmu_count = 0;
@@ -228,7 +304,17 @@ static int kvm_arm_smmu_array_alloc(void)
 	if (!kvm_arm_smmu_array)
 		return -ENOMEM;
 
+	mc_order = get_order(NR_CPUS * sizeof(*kvm_arm_smmu_memcache));
+	kvm_arm_smmu_memcache = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
+							 mc_order);
+	if (!kvm_arm_smmu_memcache)
+		goto err_free_array;
+
 	return 0;
+
+err_free_array:
+	free_pages((unsigned long)kvm_arm_smmu_array, smmu_order);
+	return -ENOMEM;
 }
 
 static void kvm_arm_smmu_array_free(void)
@@ -237,6 +323,8 @@ static void kvm_arm_smmu_array_free(void)
 
 	order = get_order(kvm_arm_smmu_count * sizeof(*kvm_arm_smmu_array));
 	free_pages((unsigned long)kvm_arm_smmu_array, order);
+	order = get_order(NR_CPUS * sizeof(*kvm_arm_smmu_memcache));
+	free_pages((unsigned long)kvm_arm_smmu_memcache, order);
 }
 
 /**
@@ -272,9 +360,12 @@ int kvm_arm_smmu_v3_init(unsigned int *count)
 	 * These variables are stored in the nVHE image, and won't be accessible
 	 * after KVM initialization. Ownership of kvm_arm_smmu_array will be
 	 * transferred to the hypervisor as well.
+	 *
+	 * kvm_arm_smmu_memcache is shared between hypervisor and host.
 	 */
 	kvm_hyp_arm_smmu_v3_smmus = kern_hyp_va(kvm_arm_smmu_array);
 	kvm_hyp_arm_smmu_v3_count = kvm_arm_smmu_count;
+	kvm_hyp_iommu_memcaches = kern_hyp_va(kvm_arm_smmu_memcache);
 	return 0;
 
 err_free:
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 201+ messages in thread

* [RFC PATCH 38/45] iommu/arm-smmu-v3-kvm: Add per-cpu page queue
@ 2023-02-01 12:53   ` Jean-Philippe Brucker
  0 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:53 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

Allocate page queues shared with the hypervisor for page donation and
reclaim. A local_lock ensures that only one thread fills the queue
during a hypercall.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c   | 93 ++++++++++++++++++-
 1 file changed, 92 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
index 8808890f4dc0..755c77bc0417 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
@@ -5,6 +5,7 @@
  * Copyright (C) 2022 Linaro Ltd.
  */
 #include <asm/kvm_mmu.h>
+#include <linux/local_lock.h>
 #include <linux/of_platform.h>
 
 #include <kvm/arm_smmu_v3.h>
@@ -23,6 +24,81 @@ struct host_arm_smmu_device {
 static size_t				kvm_arm_smmu_cur;
 static size_t				kvm_arm_smmu_count;
 static struct hyp_arm_smmu_v3_device	*kvm_arm_smmu_array;
+static struct kvm_hyp_iommu_memcache	*kvm_arm_smmu_memcache;
+
+static DEFINE_PER_CPU(local_lock_t, memcache_lock) =
+				INIT_LOCAL_LOCK(memcache_lock);
+
+static void *kvm_arm_smmu_alloc_page(void *opaque)
+{
+	struct arm_smmu_device *smmu = opaque;
+	struct page *p;
+
+	p = alloc_pages_node(dev_to_node(smmu->dev), GFP_ATOMIC, 0);
+	if (!p)
+		return NULL;
+
+	return page_address(p);
+}
+
+static void kvm_arm_smmu_free_page(void *va, void *opaque)
+{
+	free_page((unsigned long)va);
+}
+
+static phys_addr_t kvm_arm_smmu_host_pa(void *va)
+{
+	return __pa(va);
+}
+
+static void *kvm_arm_smmu_host_va(phys_addr_t pa)
+{
+	return __va(pa);
+}
+
+__maybe_unused
+static int kvm_arm_smmu_topup_memcache(struct arm_smmu_device *smmu)
+{
+	struct kvm_hyp_memcache *mc;
+	int cpu = raw_smp_processor_id();
+
+	lockdep_assert_held(this_cpu_ptr(&memcache_lock));
+	mc = &kvm_arm_smmu_memcache[cpu].pages;
+
+	if (!kvm_arm_smmu_memcache[cpu].needs_page)
+		return -EBADE;
+
+	kvm_arm_smmu_memcache[cpu].needs_page = false;
+	return  __topup_hyp_memcache(mc, 1, kvm_arm_smmu_alloc_page,
+				     kvm_arm_smmu_host_pa, smmu);
+}
+
+__maybe_unused
+static void kvm_arm_smmu_reclaim_memcache(void)
+{
+	struct kvm_hyp_memcache *mc;
+	int cpu = raw_smp_processor_id();
+
+	lockdep_assert_held(this_cpu_ptr(&memcache_lock));
+	mc = &kvm_arm_smmu_memcache[cpu].pages;
+
+	__free_hyp_memcache(mc, kvm_arm_smmu_free_page,
+			    kvm_arm_smmu_host_va, NULL);
+}
+
+/*
+ * Issue hypercall, and retry after filling the memcache if necessary.
+ * After the call, reclaim pages pushed in the memcache by the hypervisor.
+ */
+#define kvm_call_hyp_nvhe_mc(smmu, ...)				\
+({								\
+	int __ret;						\
+	do {							\
+		__ret = kvm_call_hyp_nvhe(__VA_ARGS__);		\
+	} while (__ret && !kvm_arm_smmu_topup_memcache(smmu));	\
+	kvm_arm_smmu_reclaim_memcache();			\
+	__ret;							\
+})
 
 static bool kvm_arm_smmu_validate_features(struct arm_smmu_device *smmu)
 {
@@ -211,7 +287,7 @@ static struct platform_driver kvm_arm_smmu_driver = {
 
 static int kvm_arm_smmu_array_alloc(void)
 {
-	int smmu_order;
+	int smmu_order, mc_order;
 	struct device_node *np;
 
 	kvm_arm_smmu_count = 0;
@@ -228,7 +304,17 @@ static int kvm_arm_smmu_array_alloc(void)
 	if (!kvm_arm_smmu_array)
 		return -ENOMEM;
 
+	mc_order = get_order(NR_CPUS * sizeof(*kvm_arm_smmu_memcache));
+	kvm_arm_smmu_memcache = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
+							 mc_order);
+	if (!kvm_arm_smmu_memcache)
+		goto err_free_array;
+
 	return 0;
+
+err_free_array:
+	free_pages((unsigned long)kvm_arm_smmu_array, smmu_order);
+	return -ENOMEM;
 }
 
 static void kvm_arm_smmu_array_free(void)
@@ -237,6 +323,8 @@ static void kvm_arm_smmu_array_free(void)
 
 	order = get_order(kvm_arm_smmu_count * sizeof(*kvm_arm_smmu_array));
 	free_pages((unsigned long)kvm_arm_smmu_array, order);
+	order = get_order(NR_CPUS * sizeof(*kvm_arm_smmu_memcache));
+	free_pages((unsigned long)kvm_arm_smmu_memcache, order);
 }
 
 /**
@@ -272,9 +360,12 @@ int kvm_arm_smmu_v3_init(unsigned int *count)
 	 * These variables are stored in the nVHE image, and won't be accessible
 	 * after KVM initialization. Ownership of kvm_arm_smmu_array will be
 	 * transferred to the hypervisor as well.
+	 *
+	 * kvm_arm_smmu_memcache is shared between hypervisor and host.
 	 */
 	kvm_hyp_arm_smmu_v3_smmus = kern_hyp_va(kvm_arm_smmu_array);
 	kvm_hyp_arm_smmu_v3_count = kvm_arm_smmu_count;
+	kvm_hyp_iommu_memcaches = kern_hyp_va(kvm_arm_smmu_memcache);
 	return 0;
 
 err_free:
-- 
2.39.0
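
As an illustration of how the memcache helpers and the retry macro above
are meant to be used, here is a sketch of a hypothetical caller. It is not
part of the patch; the hypercall name and argument order are borrowed from
the later "Add IOMMU ops" patch. The per-CPU memcache lock must be held so
that the hypervisor's request for pages and the refill happen on the same
CPU:

/*
 * Illustration only (hypothetical caller): map one page through the
 * hypervisor, topping up the per-CPU memcache and retrying whenever the
 * hypervisor runs out of page-table pages.
 */
static int example_map_one_page(struct arm_smmu_device *smmu,
                                pkvm_handle_t smmu_id, pkvm_handle_t domain_id,
                                unsigned long iova, phys_addr_t paddr, int prot)
{
        int ret;
        unsigned long flags;

        local_lock_irqsave(&memcache_lock, flags);
        ret = kvm_call_hyp_nvhe_mc(smmu, __pkvm_host_iommu_map_pages,
                                   smmu_id, domain_id, iova, paddr,
                                   PAGE_SIZE, 1, prot);
        local_unlock_irqrestore(&memcache_lock, flags);

        return ret;
}

The reclaim step at the end of the macro hands back any pages that the
hypervisor pushed into the memcache, for instance page-table pages freed by
an unmap.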



* [RFC PATCH 39/45] iommu/arm-smmu-v3-kvm: Initialize page table configuration
  2023-02-01 12:52 ` Jean-Philippe Brucker
@ 2023-02-01 12:53   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:53 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

Prepare the stage-2 I/O page table configuration that will be used by
the hypervisor driver.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c   | 29 +++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
index 755c77bc0417..55489d56fb5b 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
@@ -16,6 +16,7 @@ struct host_arm_smmu_device {
 	struct arm_smmu_device		smmu;
 	pkvm_handle_t			id;
 	u32				boot_gbpa;
+	unsigned int			pgd_order;
 };
 
 #define smmu_to_host(_smmu) \
@@ -192,6 +193,7 @@ static int kvm_arm_smmu_probe(struct platform_device *pdev)
 	size_t size;
 	phys_addr_t ioaddr;
 	struct resource *res;
+	struct io_pgtable_cfg cfg;
 	struct arm_smmu_device *smmu;
 	struct device *dev = &pdev->dev;
 	struct host_arm_smmu_device *host_smmu;
@@ -233,6 +235,31 @@ static int kvm_arm_smmu_probe(struct platform_device *pdev)
 	if (!kvm_arm_smmu_validate_features(smmu))
 		return -ENODEV;
 
+	/*
+	 * Stage-1 should be easy to support, though we do need to allocate a
+	 * context descriptor table.
+	 */
+	cfg = (struct io_pgtable_cfg) {
+		.fmt = ARM_64_LPAE_S2,
+		.pgsize_bitmap = smmu->pgsize_bitmap,
+		.ias = smmu->ias,
+		.oas = smmu->oas,
+		.coherent_walk = smmu->features & ARM_SMMU_FEAT_COHERENCY,
+	};
+
+	/*
+	 * Choose the page and address size. Compute the PGD size and number of
+	 * levels as well, so we know how much memory to pre-allocate.
+	 */
+	ret = io_pgtable_configure(&cfg, &size);
+	if (ret)
+		return ret;
+
+	host_smmu->pgd_order = get_order(size);
+	smmu->pgsize_bitmap = cfg.pgsize_bitmap;
+	smmu->ias = cfg.ias;
+	smmu->oas = cfg.oas;
+
 	ret = arm_smmu_init_one_queue(smmu, &smmu->cmdq.q, smmu->base,
 				      ARM_SMMU_CMDQ_PROD, ARM_SMMU_CMDQ_CONS,
 				      CMDQ_ENT_DWORDS, "cmdq");
@@ -253,6 +280,8 @@ static int kvm_arm_smmu_probe(struct platform_device *pdev)
 	hyp_smmu->mmio_addr = ioaddr;
 	hyp_smmu->mmio_size = size;
 	hyp_smmu->features = smmu->features;
+	hyp_smmu->iommu.pgtable_cfg = cfg;
+
 	kvm_arm_smmu_cur++;
 
 	return 0;
-- 
2.39.0
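
io_pgtable_configure() fills in the page size bitmap, the input/output
address sizes and the PGD size, which the driver converts into an
allocation order. Purely for illustration, a helper (hypothetical name, not
part of the patch) that reports what was negotiated, assuming the probe
code above succeeded:

/* Illustration only: print the stage-2 configuration chosen by
 * io_pgtable_configure() and the resulting PGD pre-allocation size. */
static void kvm_arm_smmu_report_cfg(struct arm_smmu_device *smmu,
                                    struct io_pgtable_cfg *cfg,
                                    unsigned int pgd_order)
{
        dev_info(smmu->dev,
                 "stage-2 pgtable: pgsizes 0x%lx, ias %u, oas %u, pgd %lu bytes\n",
                 cfg->pgsize_bitmap, cfg->ias, cfg->oas,
                 PAGE_SIZE << pgd_order);
}

The PGD may span several pages when stage-2 concatenation is used, which is
why the next patch allocates it separately rather than through the
memcache.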




* [RFC PATCH 40/45] iommu/arm-smmu-v3-kvm: Add IOMMU ops
  2023-02-01 12:52 ` Jean-Philippe Brucker
@ 2023-02-01 12:53   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:53 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

Forward alloc_domain(), attach_dev(), map_pages(), etc. to the
hypervisor.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c   | 330 +++++++++++++++++-
 1 file changed, 328 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
index 55489d56fb5b..930d78f6e29f 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
@@ -22,10 +22,28 @@ struct host_arm_smmu_device {
 #define smmu_to_host(_smmu) \
 	container_of(_smmu, struct host_arm_smmu_device, smmu);
 
+struct kvm_arm_smmu_master {
+	struct arm_smmu_device		*smmu;
+	struct device			*dev;
+	struct kvm_arm_smmu_domain	*domain;
+};
+
+struct kvm_arm_smmu_domain {
+	struct iommu_domain		domain;
+	struct arm_smmu_device		*smmu;
+	struct mutex			init_mutex;
+	unsigned long			pgd;
+	pkvm_handle_t			id;
+};
+
+#define to_kvm_smmu_domain(_domain) \
+	container_of(_domain, struct kvm_arm_smmu_domain, domain)
+
 static size_t				kvm_arm_smmu_cur;
 static size_t				kvm_arm_smmu_count;
 static struct hyp_arm_smmu_v3_device	*kvm_arm_smmu_array;
 static struct kvm_hyp_iommu_memcache	*kvm_arm_smmu_memcache;
+static DEFINE_IDA(kvm_arm_smmu_domain_ida);
 
 static DEFINE_PER_CPU(local_lock_t, memcache_lock) =
 				INIT_LOCAL_LOCK(memcache_lock);
@@ -57,7 +75,6 @@ static void *kvm_arm_smmu_host_va(phys_addr_t pa)
 	return __va(pa);
 }
 
-__maybe_unused
 static int kvm_arm_smmu_topup_memcache(struct arm_smmu_device *smmu)
 {
 	struct kvm_hyp_memcache *mc;
@@ -74,7 +91,6 @@ static int kvm_arm_smmu_topup_memcache(struct arm_smmu_device *smmu)
 				     kvm_arm_smmu_host_pa, smmu);
 }
 
-__maybe_unused
 static void kvm_arm_smmu_reclaim_memcache(void)
 {
 	struct kvm_hyp_memcache *mc;
@@ -101,6 +117,299 @@ static void kvm_arm_smmu_reclaim_memcache(void)
 	__ret;							\
 })
 
+static struct platform_driver kvm_arm_smmu_driver;
+
+static struct arm_smmu_device *
+kvm_arm_smmu_get_by_fwnode(struct fwnode_handle *fwnode)
+{
+	struct device *dev;
+
+	dev = driver_find_device_by_fwnode(&kvm_arm_smmu_driver.driver, fwnode);
+	put_device(dev);
+	return dev ? dev_get_drvdata(dev) : NULL;
+}
+
+static struct iommu_ops kvm_arm_smmu_ops;
+
+static struct iommu_device *kvm_arm_smmu_probe_device(struct device *dev)
+{
+	struct arm_smmu_device *smmu;
+	struct kvm_arm_smmu_master *master;
+	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
+
+	if (!fwspec || fwspec->ops != &kvm_arm_smmu_ops)
+		return ERR_PTR(-ENODEV);
+
+	if (WARN_ON_ONCE(dev_iommu_priv_get(dev)))
+		return ERR_PTR(-EBUSY);
+
+	smmu = kvm_arm_smmu_get_by_fwnode(fwspec->iommu_fwnode);
+	if (!smmu)
+		return ERR_PTR(-ENODEV);
+
+	master = kzalloc(sizeof(*master), GFP_KERNEL);
+	if (!master)
+		return ERR_PTR(-ENOMEM);
+
+	master->dev = dev;
+	master->smmu = smmu;
+	dev_iommu_priv_set(dev, master);
+
+	return &smmu->iommu;
+}
+
+static void kvm_arm_smmu_release_device(struct device *dev)
+{
+	struct kvm_arm_smmu_master *master = dev_iommu_priv_get(dev);
+
+	kfree(master);
+	iommu_fwspec_free(dev);
+}
+
+static struct iommu_domain *kvm_arm_smmu_domain_alloc(unsigned type)
+{
+	struct kvm_arm_smmu_domain *kvm_smmu_domain;
+
+	/*
+	 * We don't support
+	 * - IOMMU_DOMAIN_IDENTITY because we rely on the host telling the
+	 *   hypervisor which pages are used for DMA.
+	 * - IOMMU_DOMAIN_DMA_FQ because lazy unmap would clash with memory
+	 *   donation to guests.
+	 */
+	if (type != IOMMU_DOMAIN_DMA &&
+	    type != IOMMU_DOMAIN_UNMANAGED)
+		return NULL;
+
+	kvm_smmu_domain = kzalloc(sizeof(*kvm_smmu_domain), GFP_KERNEL);
+	if (!kvm_smmu_domain)
+		return NULL;
+
+	mutex_init(&kvm_smmu_domain->init_mutex);
+
+	return &kvm_smmu_domain->domain;
+}
+
+static int kvm_arm_smmu_domain_finalize(struct kvm_arm_smmu_domain *kvm_smmu_domain,
+					struct kvm_arm_smmu_master *master)
+{
+	int ret = 0;
+	struct page *p;
+	unsigned long pgd;
+	struct arm_smmu_device *smmu = master->smmu;
+	struct host_arm_smmu_device *host_smmu = smmu_to_host(smmu);
+
+	if (kvm_smmu_domain->smmu) {
+		if (kvm_smmu_domain->smmu != smmu)
+			return -EINVAL;
+		return 0;
+	}
+
+	ret = ida_alloc_range(&kvm_arm_smmu_domain_ida, 0, 1 << smmu->vmid_bits,
+			      GFP_KERNEL);
+	if (ret < 0)
+		return ret;
+	kvm_smmu_domain->id = ret;
+
+	/*
+	 * PGD allocation does not use the memcache because it may be of higher
+	 * order when concatenated.
+	 */
+	p = alloc_pages_node(dev_to_node(smmu->dev), GFP_KERNEL | __GFP_ZERO,
+			     host_smmu->pgd_order);
+	if (!p)
+		return -ENOMEM;
+
+	pgd = (unsigned long)page_to_virt(p);
+
+	local_lock_irq(&memcache_lock);
+	ret = kvm_call_hyp_nvhe_mc(smmu, __pkvm_host_iommu_alloc_domain,
+				   host_smmu->id, kvm_smmu_domain->id, pgd);
+	local_unlock_irq(&memcache_lock);
+	if (ret)
+		goto err_free;
+
+	kvm_smmu_domain->domain.pgsize_bitmap = smmu->pgsize_bitmap;
+	kvm_smmu_domain->domain.geometry.aperture_end = (1UL << smmu->ias) - 1;
+	kvm_smmu_domain->domain.geometry.force_aperture = true;
+	kvm_smmu_domain->smmu = smmu;
+	kvm_smmu_domain->pgd = pgd;
+
+	return 0;
+
+err_free:
+	free_pages(pgd, host_smmu->pgd_order);
+	ida_free(&kvm_arm_smmu_domain_ida, kvm_smmu_domain->id);
+	return ret;
+}
+
+static void kvm_arm_smmu_domain_free(struct iommu_domain *domain)
+{
+	int ret;
+	struct kvm_arm_smmu_domain *kvm_smmu_domain = to_kvm_smmu_domain(domain);
+	struct arm_smmu_device *smmu = kvm_smmu_domain->smmu;
+
+	if (smmu) {
+		struct host_arm_smmu_device *host_smmu = smmu_to_host(smmu);
+
+		ret = kvm_call_hyp_nvhe(__pkvm_host_iommu_free_domain,
+					host_smmu->id, kvm_smmu_domain->id);
+		/*
+		 * On failure, leak the pgd because it probably hasn't been
+		 * reclaimed by the host.
+		 */
+		if (!WARN_ON(ret))
+			free_pages(kvm_smmu_domain->pgd, host_smmu->pgd_order);
+		ida_free(&kvm_arm_smmu_domain_ida, kvm_smmu_domain->id);
+	}
+	kfree(kvm_smmu_domain);
+}
+
+static int kvm_arm_smmu_detach_dev(struct host_arm_smmu_device *host_smmu,
+				   struct kvm_arm_smmu_master *master)
+{
+	int i, ret;
+	struct arm_smmu_device *smmu = &host_smmu->smmu;
+	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev);
+
+	if (!master->domain)
+		return 0;
+
+	for (i = 0; i < fwspec->num_ids; i++) {
+		int sid = fwspec->ids[i];
+
+		ret = kvm_call_hyp_nvhe(__pkvm_host_iommu_detach_dev,
+					host_smmu->id, master->domain->id, sid);
+		if (ret) {
+			dev_err(smmu->dev, "cannot detach device %s (0x%x): %d\n",
+				dev_name(master->dev), sid, ret);
+			break;
+		}
+	}
+
+	master->domain = NULL;
+
+	return ret;
+}
+
+static int kvm_arm_smmu_attach_dev(struct iommu_domain *domain,
+				   struct device *dev)
+{
+	int i, ret;
+	struct arm_smmu_device *smmu;
+	struct host_arm_smmu_device *host_smmu;
+	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
+	struct kvm_arm_smmu_master *master = dev_iommu_priv_get(dev);
+	struct kvm_arm_smmu_domain *kvm_smmu_domain = to_kvm_smmu_domain(domain);
+
+	if (!master)
+		return -ENODEV;
+
+	smmu = master->smmu;
+	host_smmu = smmu_to_host(smmu);
+
+	ret = kvm_arm_smmu_detach_dev(host_smmu, master);
+	if (ret)
+		return ret;
+
+	mutex_lock(&kvm_smmu_domain->init_mutex);
+	ret = kvm_arm_smmu_domain_finalize(kvm_smmu_domain, master);
+	mutex_unlock(&kvm_smmu_domain->init_mutex);
+	if (ret)
+		return ret;
+
+	local_lock_irq(&memcache_lock);
+	for (i = 0; i < fwspec->num_ids; i++) {
+		int sid = fwspec->ids[i];
+
+		ret = kvm_call_hyp_nvhe_mc(smmu, __pkvm_host_iommu_attach_dev,
+					   host_smmu->id, kvm_smmu_domain->id,
+					   sid);
+		if (ret) {
+			dev_err(smmu->dev, "cannot attach device %s (0x%x): %d\n",
+				dev_name(dev), sid, ret);
+			goto out_unlock;
+		}
+	}
+	master->domain = kvm_smmu_domain;
+
+out_unlock:
+	if (ret)
+		kvm_arm_smmu_detach_dev(host_smmu, master);
+	local_unlock_irq(&memcache_lock);
+	return ret;
+}
+
+static int kvm_arm_smmu_map_pages(struct iommu_domain *domain,
+				  unsigned long iova, phys_addr_t paddr,
+				  size_t pgsize, size_t pgcount, int prot,
+				  gfp_t gfp, size_t *mapped)
+{
+	int ret;
+	unsigned long irqflags;
+	struct kvm_arm_smmu_domain *kvm_smmu_domain = to_kvm_smmu_domain(domain);
+	struct arm_smmu_device *smmu = kvm_smmu_domain->smmu;
+	struct host_arm_smmu_device *host_smmu = smmu_to_host(smmu);
+
+	local_lock_irqsave(&memcache_lock, irqflags);
+	ret = kvm_call_hyp_nvhe_mc(smmu, __pkvm_host_iommu_map_pages,
+				   host_smmu->id, kvm_smmu_domain->id, iova,
+				   paddr, pgsize, pgcount, prot);
+	local_unlock_irqrestore(&memcache_lock, irqflags);
+	if (ret)
+		return ret;
+
+	*mapped = pgsize * pgcount;
+	return 0;
+}
+
+static size_t kvm_arm_smmu_unmap_pages(struct iommu_domain *domain,
+				       unsigned long iova, size_t pgsize,
+				       size_t pgcount,
+				       struct iommu_iotlb_gather *iotlb_gather)
+{
+	int ret;
+	unsigned long irqflags;
+	struct kvm_arm_smmu_domain *kvm_smmu_domain = to_kvm_smmu_domain(domain);
+	struct arm_smmu_device *smmu = kvm_smmu_domain->smmu;
+	struct host_arm_smmu_device *host_smmu = smmu_to_host(smmu);
+
+	local_lock_irqsave(&memcache_lock, irqflags);
+	ret = kvm_call_hyp_nvhe_mc(smmu, __pkvm_host_iommu_unmap_pages,
+				   host_smmu->id, kvm_smmu_domain->id, iova,
+				   pgsize, pgcount);
+	local_unlock_irqrestore(&memcache_lock, irqflags);
+
+	return ret ? 0 : pgsize * pgcount;
+}
+
+static phys_addr_t kvm_arm_smmu_iova_to_phys(struct iommu_domain *domain,
+					     dma_addr_t iova)
+{
+	struct kvm_arm_smmu_domain *kvm_smmu_domain = to_kvm_smmu_domain(domain);
+	struct host_arm_smmu_device *host_smmu = smmu_to_host(kvm_smmu_domain->smmu);
+
+	return kvm_call_hyp_nvhe(__pkvm_host_iommu_iova_to_phys, host_smmu->id,
+				 kvm_smmu_domain->id, iova);
+}
+
+static struct iommu_ops kvm_arm_smmu_ops = {
+	.capable		= arm_smmu_capable,
+	.device_group		= arm_smmu_device_group,
+	.of_xlate		= arm_smmu_of_xlate,
+	.probe_device		= kvm_arm_smmu_probe_device,
+	.release_device		= kvm_arm_smmu_release_device,
+	.domain_alloc		= kvm_arm_smmu_domain_alloc,
+	.owner			= THIS_MODULE,
+	.default_domain_ops = &(const struct iommu_domain_ops) {
+		.attach_dev	= kvm_arm_smmu_attach_dev,
+		.free		= kvm_arm_smmu_domain_free,
+		.map_pages	= kvm_arm_smmu_map_pages,
+		.unmap_pages	= kvm_arm_smmu_unmap_pages,
+		.iova_to_phys	= kvm_arm_smmu_iova_to_phys,
+	}
+};
+
 static bool kvm_arm_smmu_validate_features(struct arm_smmu_device *smmu)
 {
 	unsigned long oas;
@@ -186,6 +495,12 @@ static int kvm_arm_smmu_device_reset(struct host_arm_smmu_device *host_smmu)
 	return 0;
 }
 
+static void *kvm_arm_smmu_alloc_domains(struct arm_smmu_device *smmu)
+{
+	return (void *)devm_get_free_pages(smmu->dev, GFP_KERNEL | __GFP_ZERO,
+					   get_order(KVM_IOMMU_DOMAINS_ROOT_SIZE));
+}
+
 static int kvm_arm_smmu_probe(struct platform_device *pdev)
 {
 	int ret;
@@ -274,6 +589,16 @@ static int kvm_arm_smmu_probe(struct platform_device *pdev)
 	if (ret)
 		return ret;
 
+	hyp_smmu->iommu.domains = kvm_arm_smmu_alloc_domains(smmu);
+	if (!hyp_smmu->iommu.domains)
+		return -ENOMEM;
+
+	hyp_smmu->iommu.nr_domains = 1 << smmu->vmid_bits;
+
+	ret = arm_smmu_register_iommu(smmu, &kvm_arm_smmu_ops, ioaddr);
+	if (ret)
+		return ret;
+
 	platform_set_drvdata(pdev, host_smmu);
 
 	/* Hypervisor parameters */
@@ -296,6 +621,7 @@ static int kvm_arm_smmu_remove(struct platform_device *pdev)
 	 * There was an error during hypervisor setup. The hyp driver may
 	 * have already enabled the device, so disable it.
 	 */
+	arm_smmu_unregister_iommu(smmu);
 	arm_smmu_device_disable(smmu);
 	arm_smmu_update_gbpa(smmu, host_smmu->boot_gbpa, GBPA_ABORT);
 	return 0;
-- 
2.39.0
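
To show where these ops are exercised, here is a hedged sketch of a host
driver doing ordinary streaming DMA; the device and buffer are made up.
With kvm_arm_smmu_ops registered, dma_map_single() ends up in
kvm_arm_smmu_map_pages(), which issues __pkvm_host_iommu_map_pages so the
hypervisor updates the stage-2 tables, and dma_unmap_single() goes through
kvm_arm_smmu_unmap_pages():

#include <linux/dma-mapping.h>
#include <linux/sizes.h>
#include <linux/slab.h>

static int example_dma(struct device *dev)
{
        void *buf;
        dma_addr_t iova;

        buf = kzalloc(SZ_4K, GFP_KERNEL);
        if (!buf)
                return -ENOMEM;

        /* Reaches kvm_arm_smmu_map_pages() -> map hypercall */
        iova = dma_map_single(dev, buf, SZ_4K, DMA_TO_DEVICE);
        if (dma_mapping_error(dev, iova)) {
                kfree(buf);
                return -ENOMEM;
        }

        /* ... program the device to DMA to @iova and wait for completion ... */

        /* Reaches kvm_arm_smmu_unmap_pages() -> unmap hypercall */
        dma_unmap_single(dev, iova, SZ_4K, DMA_TO_DEVICE);
        kfree(buf);
        return 0;
}

IOMMU_DOMAIN_DMA_FQ is refused in domain_alloc() precisely so that the
unmap hypercall happens synchronously, before the host can donate the pages
elsewhere.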




* [RFC PATCH 41/45] KVM: arm64: pkvm: Add __pkvm_host_add_remove_page()
  2023-02-01 12:52 ` Jean-Philippe Brucker
@ 2023-02-01 12:53   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:53 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

Add a small helper to remove a page from the host stage-2 and add it
back. This will be used to temporarily unmap a piece of shared SRAM
(device memory) from the host while we handle a SCMI request,
preventing the host from modifying the request after it is verified.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  1 +
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 17 +++++++++++++++++
 2 files changed, 18 insertions(+)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index a363d58a998b..a7b28307604d 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -75,6 +75,7 @@ int __pkvm_guest_share_host(struct pkvm_hyp_vcpu *hyp_vcpu, u64 ipa);
 int __pkvm_guest_unshare_host(struct pkvm_hyp_vcpu *hyp_vcpu, u64 ipa);
 int __pkvm_host_share_dma(u64 phys_addr, size_t size, bool is_ram);
 int __pkvm_host_unshare_dma(u64 phys_addr, size_t size);
+int __pkvm_host_add_remove_page(u64 pfn, bool remove);
 
 bool addr_is_memory(phys_addr_t phys);
 int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot prot);
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index dcf08ce03790..6c3eeea4d4f5 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -1954,3 +1954,20 @@ int __pkvm_host_reclaim_page(u64 pfn)
 
 	return ret;
 }
+
+/*
+ * Temporarily unmap a page from the host stage-2, if @remove is true, or put it
+ * back. After restoring the ownership to host, the page will be lazy-mapped.
+ */
+int __pkvm_host_add_remove_page(u64 pfn, bool remove)
+{
+	int ret;
+	u64 host_addr = hyp_pfn_to_phys(pfn);
+	u8 owner = remove ? PKVM_ID_HYP : PKVM_ID_HOST;
+
+	host_lock_component();
+	ret = host_stage2_set_owner_locked(host_addr, PAGE_SIZE, owner);
+	host_unlock_component();
+
+	return ret;
+}
-- 
2.39.0
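
A minimal sketch of the intended usage pattern, with a hypothetical wrapper
name; the SCMI proxy added in the next patch open-codes the same sequence
around its shared-memory page:

/*
 * Illustration only: pull a page out of the host stage-2, let the
 * hypervisor inspect or forward its contents without the host racing with
 * it, then hand the page back.
 */
static int pkvm_with_host_page_removed(u64 pfn, void (*fn)(void *cookie),
                                       void *cookie)
{
        int ret;

        ret = __pkvm_host_add_remove_page(pfn, true);   /* unmap from host */
        if (ret)
                return ret;

        fn(cookie);

        return __pkvm_host_add_remove_page(pfn, false); /* give it back */
}

When ownership returns to the host, the stage-2 mapping is restored lazily,
on the next host access fault.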




* [RFC PATCH 42/45] KVM: arm64: pkvm: Support SCMI power domain
  2023-02-01 12:52 ` Jean-Philippe Brucker
@ 2023-02-01 12:53   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:53 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

The hypervisor needs to catch power domain changes for devices it owns,
such as the SMMU. Possible reasons:

* Ensure that software and hardware states are consistent. The driver
  does not attempt to modify the state while the device is off.
* Save and restore the device state.
* Enforce dependencies between consumers and suppliers. For example,
  ensure that endpoints are off before turning the SMMU off, in case a
  powered-off SMMU lets DMA through. However, this is normally enforced
  by firmware.

Add a SCMI power domain, as the standard method for device power
management on Arm. Other methods can be added to kvm_power_domain later.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 arch/arm64/kvm/hyp/nvhe/Makefile              |   1 +
 arch/arm64/include/asm/kvm_hyp.h              |   1 +
 arch/arm64/kvm/hyp/include/nvhe/pkvm.h        |  26 ++
 .../arm64/kvm/hyp/include/nvhe/trap_handler.h |   2 +
 include/kvm/power_domain.h                    |  22 ++
 arch/arm64/kvm/hyp/nvhe/hyp-main.c            |   4 +-
 arch/arm64/kvm/hyp/nvhe/power/scmi.c          | 233 ++++++++++++++++++
 7 files changed, 287 insertions(+), 2 deletions(-)
 create mode 100644 include/kvm/power_domain.h
 create mode 100644 arch/arm64/kvm/hyp/nvhe/power/scmi.c

diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index 8359909bd796..583a1f920c81 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -32,6 +32,7 @@ hyp-obj-$(CONFIG_KVM_IOMMU) += iommu/iommu.o
 hyp-obj-$(CONFIG_ARM_SMMU_V3_PKVM) += iommu/arm-smmu-v3.o
 hyp-obj-$(CONFIG_ARM_SMMU_V3_PKVM) += iommu/io-pgtable-arm.o \
 	../../../../../drivers/iommu/io-pgtable-arm-common.o
+hyp-obj-y += power/scmi.o
 
 ##
 ## Build rules for compiling nVHE hyp code
diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 0226a719e28f..91b792d1c074 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -104,6 +104,7 @@ void deactivate_traps_vhe_put(struct kvm_vcpu *vcpu);
 u64 __guest_enter(struct kvm_vcpu *vcpu);
 
 bool kvm_host_psci_handler(struct kvm_cpu_context *host_ctxt);
+bool kvm_host_scmi_handler(struct kvm_cpu_context *host_ctxt);
 
 #ifdef __KVM_NVHE_HYPERVISOR__
 void __noreturn __hyp_do_panic(struct kvm_cpu_context *host_ctxt, u64 spsr,
diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
index 746dc1c05a8e..1025354b4650 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -8,6 +8,7 @@
 #define __ARM64_KVM_NVHE_PKVM_H__
 
 #include <asm/kvm_pkvm.h>
+#include <kvm/power_domain.h>
 
 #include <nvhe/gfp.h>
 #include <nvhe/spinlock.h>
@@ -112,4 +113,29 @@ struct pkvm_hyp_vcpu *pkvm_mpidr_to_hyp_vcpu(struct pkvm_hyp_vm *vm, u64 mpidr);
 int pkvm_timer_init(void);
 void pkvm_udelay(unsigned long usecs);
 
+struct kvm_power_domain_ops {
+	int (*power_on)(struct kvm_power_domain *pd);
+	int (*power_off)(struct kvm_power_domain *pd);
+};
+
+int pkvm_init_scmi_pd(struct kvm_power_domain *pd,
+		      const struct kvm_power_domain_ops *ops);
+
+/*
+ * Register a power domain. When the hypervisor catches power requests from the
+ * host for this power domain, it calls the power ops with @pd as argument.
+ */
+static inline int pkvm_init_power_domain(struct kvm_power_domain *pd,
+					 const struct kvm_power_domain_ops *ops)
+{
+	switch (pd->type) {
+	case KVM_POWER_DOMAIN_NONE:
+		return 0;
+	case KVM_POWER_DOMAIN_ARM_SCMI:
+		return pkvm_init_scmi_pd(pd, ops);
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
 #endif /* __ARM64_KVM_NVHE_PKVM_H__ */
diff --git a/arch/arm64/kvm/hyp/include/nvhe/trap_handler.h b/arch/arm64/kvm/hyp/include/nvhe/trap_handler.h
index 1e6d995968a1..0e6bb92ccdb7 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/trap_handler.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/trap_handler.h
@@ -15,4 +15,6 @@
 #define DECLARE_REG(type, name, ctxt, reg)	\
 				type name = (type)cpu_reg(ctxt, (reg))
 
+void __kvm_hyp_host_forward_smc(struct kvm_cpu_context *host_ctxt);
+
 #endif /* __ARM64_KVM_NVHE_TRAP_HANDLER_H__ */
diff --git a/include/kvm/power_domain.h b/include/kvm/power_domain.h
new file mode 100644
index 000000000000..3dcb40005a04
--- /dev/null
+++ b/include/kvm/power_domain.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __KVM_POWER_DOMAIN_H
+#define __KVM_POWER_DOMAIN_H
+
+enum kvm_power_domain_type {
+	KVM_POWER_DOMAIN_NONE,
+	KVM_POWER_DOMAIN_ARM_SCMI,
+};
+
+struct kvm_power_domain {
+	enum kvm_power_domain_type	type;
+	union {
+		struct {
+			u32		smc_id;
+			u32		domain_id;
+			phys_addr_t	shmem_base;
+			size_t		shmem_size;
+		} arm_scmi;
+	};
+};
+
+#endif /* __KVM_POWER_DOMAIN_H */
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 34ec46b890f0..ad0877e6ea54 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -37,8 +37,6 @@ DEFINE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
 
 struct kvm_iommu_ops kvm_iommu_ops;
 
-void __kvm_hyp_host_forward_smc(struct kvm_cpu_context *host_ctxt);
-
 typedef void (*hyp_entry_exit_handler_fn)(struct pkvm_hyp_vcpu *);
 
 static void handle_pvm_entry_wfx(struct pkvm_hyp_vcpu *hyp_vcpu)
@@ -1217,6 +1215,8 @@ static void handle_host_smc(struct kvm_cpu_context *host_ctxt)
 	bool handled;
 
 	handled = kvm_host_psci_handler(host_ctxt);
+	if (!handled)
+		handled = kvm_host_scmi_handler(host_ctxt);
 	if (!handled)
 		default_host_smc_handler(host_ctxt);
 
diff --git a/arch/arm64/kvm/hyp/nvhe/power/scmi.c b/arch/arm64/kvm/hyp/nvhe/power/scmi.c
new file mode 100644
index 000000000000..e9ac33f3583c
--- /dev/null
+++ b/arch/arm64/kvm/hyp/nvhe/power/scmi.c
@@ -0,0 +1,233 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2022 Linaro Ltd.
+ */
+
+#include <linux/bitfield.h>
+
+#include <nvhe/pkvm.h>
+#include <nvhe/mm.h>
+#include <nvhe/mem_protect.h>
+#include <nvhe/trap_handler.h>
+
+/* SCMI protocol */
+#define SCMI_PROTOCOL_POWER_DOMAIN	0x11
+
+/*  shmem registers */
+#define SCMI_SHM_CHANNEL_STATUS		0x4
+#define SCMI_SHM_CHANNEL_FLAGS		0x10
+#define SCMI_SHM_LENGTH			0x14
+#define SCMI_SHM_MESSAGE_HEADER		0x18
+#define SCMI_SHM_MESSAGE_PAYLOAD	0x1c
+
+/*  channel status */
+#define SCMI_CHN_FREE			(1U << 0)
+#define SCMI_CHN_ERROR			(1U << 1)
+
+/*  channel flags */
+#define SCMI_CHN_IRQ			(1U << 0)
+
+/*  message header */
+#define SCMI_HDR_TOKEN			GENMASK(27, 18)
+#define SCMI_HDR_PROTOCOL_ID		GENMASK(17, 10)
+#define SCMI_HDR_MESSAGE_TYPE		GENMASK(9, 8)
+#define SCMI_HDR_MESSAGE_ID		GENMASK(7, 0)
+
+/*  power domain */
+#define SCMI_PD_STATE_SET		0x4
+#define SCMI_PD_STATE_SET_FLAGS		0x0
+#define SCMI_PD_STATE_SET_DOMAIN_ID	0x4
+#define SCMI_PD_STATE_SET_POWER_STATE	0x8
+
+#define SCMI_PD_STATE_SET_STATUS	0x0
+
+#define SCMI_PD_STATE_SET_FLAGS_ASYNC	(1U << 0)
+
+#define SCMI_PD_POWER_ON		0
+#define SCMI_PD_POWER_OFF		(1U << 30)
+
+#define SCMI_SUCCESS			0
+
+
+static struct {
+	u32				smc_id;
+	phys_addr_t			shmem_pfn;
+	size_t				shmem_size;
+	void __iomem			*shmem;
+} scmi_channel;
+
+#define MAX_POWER_DOMAINS		16
+
+struct scmi_power_domain {
+	struct kvm_power_domain			*pd;
+	const struct kvm_power_domain_ops	*ops;
+};
+
+static struct scmi_power_domain scmi_power_domains[MAX_POWER_DOMAINS];
+static int scmi_power_domain_count;
+
+#define SCMI_POLL_TIMEOUT_US	1000000 /* 1s! */
+
+/* Forward the command to EL3, and wait for completion */
+static int scmi_run_command(struct kvm_cpu_context *host_ctxt)
+{
+	u32 reg;
+	unsigned long i = 0;
+
+	__kvm_hyp_host_forward_smc(host_ctxt);
+
+	do {
+		reg = readl_relaxed(scmi_channel.shmem + SCMI_SHM_CHANNEL_STATUS);
+		if (reg & SCMI_CHN_FREE)
+			break;
+
+		if (WARN_ON(++i > SCMI_POLL_TIMEOUT_US))
+			return -ETIMEDOUT;
+
+		pkvm_udelay(1);
+	} while (!(reg & (SCMI_CHN_FREE | SCMI_CHN_ERROR)));
+
+	if (reg & SCMI_CHN_ERROR)
+		return -EIO;
+
+	reg = readl_relaxed(scmi_channel.shmem + SCMI_SHM_MESSAGE_PAYLOAD +
+			    SCMI_PD_STATE_SET_STATUS);
+	if (reg != SCMI_SUCCESS)
+		return -EIO;
+
+	return 0;
+}
+
+static void __kvm_host_scmi_handler(struct kvm_cpu_context *host_ctxt)
+{
+	int i;
+	u32 reg;
+	struct scmi_power_domain *scmi_pd = NULL;
+
+	/*
+	 * FIXME: the spec does not really allow for an intermediary filtering
+	 * messages on the channel: as soon as the host clears SCMI_CHN_FREE,
+	 * the server may process the message. It doesn't have to wait for a
+	 * doorbell and could just poll on the shared mem. Unlikely in practice,
+	 * but this code is not correct without a spec change requiring the
+	 * server to observe an SMC before processing the message.
+	 */
+	reg = readl_relaxed(scmi_channel.shmem + SCMI_SHM_CHANNEL_STATUS);
+	if (reg & (SCMI_CHN_FREE | SCMI_CHN_ERROR))
+		return;
+
+	reg = readl_relaxed(scmi_channel.shmem + SCMI_SHM_MESSAGE_HEADER);
+	if (FIELD_GET(SCMI_HDR_PROTOCOL_ID, reg) != SCMI_PROTOCOL_POWER_DOMAIN)
+		goto out_forward_smc;
+
+	if (FIELD_GET(SCMI_HDR_MESSAGE_ID, reg) != SCMI_PD_STATE_SET)
+		goto out_forward_smc;
+
+	reg = readl_relaxed(scmi_channel.shmem + SCMI_SHM_MESSAGE_PAYLOAD +
+			    SCMI_PD_STATE_SET_FLAGS);
+	if (WARN_ON(reg & SCMI_PD_STATE_SET_FLAGS_ASYNC))
+		/* We don't support async requests at the moment */
+		return;
+
+	reg = readl_relaxed(scmi_channel.shmem + SCMI_SHM_MESSAGE_PAYLOAD +
+			    SCMI_PD_STATE_SET_DOMAIN_ID);
+
+	for (i = 0; i < MAX_POWER_DOMAINS; i++) {
+		if (!scmi_power_domains[i].pd)
+			break;
+
+		if (reg == scmi_power_domains[i].pd->arm_scmi.domain_id) {
+			scmi_pd = &scmi_power_domains[i];
+			break;
+		}
+	}
+	if (!scmi_pd)
+		goto out_forward_smc;
+
+	reg = readl_relaxed(scmi_channel.shmem + SCMI_SHM_MESSAGE_PAYLOAD +
+			    SCMI_PD_STATE_SET_POWER_STATE);
+	switch (reg) {
+	case SCMI_PD_POWER_ON:
+		if (scmi_run_command(host_ctxt))
+			break;
+
+		scmi_pd->ops->power_on(scmi_pd->pd);
+		break;
+	case SCMI_PD_POWER_OFF:
+		scmi_pd->ops->power_off(scmi_pd->pd);
+
+		if (scmi_run_command(host_ctxt))
+			scmi_pd->ops->power_on(scmi_pd->pd);
+		break;
+	}
+	return;
+
+out_forward_smc:
+	__kvm_hyp_host_forward_smc(host_ctxt);
+}
+
+bool kvm_host_scmi_handler(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(u64, func_id, host_ctxt, 0);
+
+	if (!scmi_channel.shmem || func_id != scmi_channel.smc_id)
+		return false; /* Unhandled */
+
+	/*
+	 * Prevent the host from modifying the request while it is in flight.
+	 * One page is enough, SCMI messages are smaller than that.
+	 *
+	 * FIXME: the host is allowed to poll the shmem while the request is in
+	 * flight, or read shmem when receiving the SCMI interrupt. Although
+	 * it's unlikely with the SMC-based transport, this too requires some
+	 * tightening in the spec.
+	 */
+	if (WARN_ON(__pkvm_host_add_remove_page(scmi_channel.shmem_pfn, true)))
+		return true;
+
+	__kvm_host_scmi_handler(host_ctxt);
+
+	WARN_ON(__pkvm_host_add_remove_page(scmi_channel.shmem_pfn, false));
+	return true; /* Handled */
+}
+
+int pkvm_init_scmi_pd(struct kvm_power_domain *pd,
+		      const struct kvm_power_domain_ops *ops)
+{
+	int ret;
+
+	if (!IS_ALIGNED(pd->arm_scmi.shmem_base, PAGE_SIZE) ||
+	    pd->arm_scmi.shmem_size < PAGE_SIZE) {
+		return -EINVAL;
+	}
+
+	if (!scmi_channel.shmem) {
+		unsigned long shmem;
+
+		/* FIXME: Do we need to mark those pages shared in the host s2? */
+		ret = __pkvm_create_private_mapping(pd->arm_scmi.shmem_base,
+						    pd->arm_scmi.shmem_size,
+						    PAGE_HYP_DEVICE,
+						    &shmem);
+		if (ret)
+			return ret;
+
+		scmi_channel.smc_id = pd->arm_scmi.smc_id;
+		scmi_channel.shmem_pfn = hyp_phys_to_pfn(pd->arm_scmi.shmem_base);
+		scmi_channel.shmem = (void *)shmem;
+
+	} else if (scmi_channel.shmem_pfn !=
+		   hyp_phys_to_pfn(pd->arm_scmi.shmem_base) ||
+		   scmi_channel.smc_id != pd->arm_scmi.smc_id) {
+		/* We support a single channel at the moment */
+		return -ENXIO;
+	}
+
+	if (scmi_power_domain_count == MAX_POWER_DOMAINS)
+		return -ENOSPC;
+
+	scmi_power_domains[scmi_power_domain_count].pd = pd;
+	scmi_power_domains[scmi_power_domain_count].ops = ops;
+	scmi_power_domain_count++;
+	return 0;
+}
-- 
2.39.0
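
To make the expected usage concrete, here is a hedged sketch of a
hypervisor driver registering its power domain; the driver name and
callback bodies are hypothetical (later patches in this series hook the
SMMUv3 driver up in a similar way):

/*
 * Illustration only: register an SCMI power domain with the hypervisor.
 * The callbacks run at EL2 when the host issues a POWER_STATE_SET for
 * this domain.
 */
static int my_dev_power_on(struct kvm_power_domain *pd)
{
        /* Device is powered again: restore state, resume issuing commands */
        return 0;
}

static int my_dev_power_off(struct kvm_power_domain *pd)
{
        /* Device is about to lose power: save state, stop touching MMIO */
        return 0;
}

static const struct kvm_power_domain_ops my_dev_pd_ops = {
        .power_on       = my_dev_power_on,
        .power_off      = my_dev_power_off,
};

static int my_dev_init(struct kvm_power_domain *pd)
{
        /* pd (type, smc_id, domain_id, shmem) is described by the host */
        return pkvm_init_power_domain(pd, &my_dev_pd_ops);
}

For a power-on request the power_on() callback runs after the SCMI server
has completed the command; for power-off, power_off() runs before the
command is forwarded, so the driver stops touching the device while it
still has power.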



* [RFC PATCH 42/45] KVM: arm64: pkvm: Support SCMI power domain
@ 2023-02-01 12:53   ` Jean-Philippe Brucker
  0 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:53 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

The hypervisor needs to catch power domain changes for devices it owns,
such as the SMMU. Possible reasons:

* Ensure that software and hardware states are consistent. The driver
  does not attempt to modify the state while the device is off.
* Save and restore the device state.
* Enforce dependencies between consumers and suppliers. For example,
  ensure that endpoints are off before turning the SMMU off, in case a
  powered-off SMMU lets DMA through. However, this is normally enforced
  by firmware.

Add a SCMI power domain, as the standard method for device power
management on Arm. Other methods can be added to kvm_power_domain later.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 arch/arm64/kvm/hyp/nvhe/Makefile              |   1 +
 arch/arm64/include/asm/kvm_hyp.h              |   1 +
 arch/arm64/kvm/hyp/include/nvhe/pkvm.h        |  26 ++
 .../arm64/kvm/hyp/include/nvhe/trap_handler.h |   2 +
 include/kvm/power_domain.h                    |  22 ++
 arch/arm64/kvm/hyp/nvhe/hyp-main.c            |   4 +-
 arch/arm64/kvm/hyp/nvhe/power/scmi.c          | 233 ++++++++++++++++++
 7 files changed, 287 insertions(+), 2 deletions(-)
 create mode 100644 include/kvm/power_domain.h
 create mode 100644 arch/arm64/kvm/hyp/nvhe/power/scmi.c

diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index 8359909bd796..583a1f920c81 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -32,6 +32,7 @@ hyp-obj-$(CONFIG_KVM_IOMMU) += iommu/iommu.o
 hyp-obj-$(CONFIG_ARM_SMMU_V3_PKVM) += iommu/arm-smmu-v3.o
 hyp-obj-$(CONFIG_ARM_SMMU_V3_PKVM) += iommu/io-pgtable-arm.o \
 	../../../../../drivers/iommu/io-pgtable-arm-common.o
+hyp-obj-y += power/scmi.o
 
 ##
 ## Build rules for compiling nVHE hyp code
diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 0226a719e28f..91b792d1c074 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -104,6 +104,7 @@ void deactivate_traps_vhe_put(struct kvm_vcpu *vcpu);
 u64 __guest_enter(struct kvm_vcpu *vcpu);
 
 bool kvm_host_psci_handler(struct kvm_cpu_context *host_ctxt);
+bool kvm_host_scmi_handler(struct kvm_cpu_context *host_ctxt);
 
 #ifdef __KVM_NVHE_HYPERVISOR__
 void __noreturn __hyp_do_panic(struct kvm_cpu_context *host_ctxt, u64 spsr,
diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
index 746dc1c05a8e..1025354b4650 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -8,6 +8,7 @@
 #define __ARM64_KVM_NVHE_PKVM_H__
 
 #include <asm/kvm_pkvm.h>
+#include <kvm/power_domain.h>
 
 #include <nvhe/gfp.h>
 #include <nvhe/spinlock.h>
@@ -112,4 +113,29 @@ struct pkvm_hyp_vcpu *pkvm_mpidr_to_hyp_vcpu(struct pkvm_hyp_vm *vm, u64 mpidr);
 int pkvm_timer_init(void);
 void pkvm_udelay(unsigned long usecs);
 
+struct kvm_power_domain_ops {
+	int (*power_on)(struct kvm_power_domain *pd);
+	int (*power_off)(struct kvm_power_domain *pd);
+};
+
+int pkvm_init_scmi_pd(struct kvm_power_domain *pd,
+		      const struct kvm_power_domain_ops *ops);
+
+/*
+ * Register a power domain. When the hypervisor catches power requests from the
+ * host for this power domain, it calls the power ops with @pd as argument.
+ */
+static inline int pkvm_init_power_domain(struct kvm_power_domain *pd,
+					 const struct kvm_power_domain_ops *ops)
+{
+	switch (pd->type) {
+	case KVM_POWER_DOMAIN_NONE:
+		return 0;
+	case KVM_POWER_DOMAIN_ARM_SCMI:
+		return pkvm_init_scmi_pd(pd, ops);
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
 #endif /* __ARM64_KVM_NVHE_PKVM_H__ */
diff --git a/arch/arm64/kvm/hyp/include/nvhe/trap_handler.h b/arch/arm64/kvm/hyp/include/nvhe/trap_handler.h
index 1e6d995968a1..0e6bb92ccdb7 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/trap_handler.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/trap_handler.h
@@ -15,4 +15,6 @@
 #define DECLARE_REG(type, name, ctxt, reg)	\
 				type name = (type)cpu_reg(ctxt, (reg))
 
+void __kvm_hyp_host_forward_smc(struct kvm_cpu_context *host_ctxt);
+
 #endif /* __ARM64_KVM_NVHE_TRAP_HANDLER_H__ */
diff --git a/include/kvm/power_domain.h b/include/kvm/power_domain.h
new file mode 100644
index 000000000000..3dcb40005a04
--- /dev/null
+++ b/include/kvm/power_domain.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __KVM_POWER_DOMAIN_H
+#define __KVM_POWER_DOMAIN_H
+
+enum kvm_power_domain_type {
+	KVM_POWER_DOMAIN_NONE,
+	KVM_POWER_DOMAIN_ARM_SCMI,
+};
+
+struct kvm_power_domain {
+	enum kvm_power_domain_type	type;
+	union {
+		struct {
+			u32		smc_id;
+			u32		domain_id;
+			phys_addr_t	shmem_base;
+			size_t		shmem_size;
+		} arm_scmi;
+	};
+};
+
+#endif /* __KVM_POWER_DOMAIN_H */
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 34ec46b890f0..ad0877e6ea54 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -37,8 +37,6 @@ DEFINE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
 
 struct kvm_iommu_ops kvm_iommu_ops;
 
-void __kvm_hyp_host_forward_smc(struct kvm_cpu_context *host_ctxt);
-
 typedef void (*hyp_entry_exit_handler_fn)(struct pkvm_hyp_vcpu *);
 
 static void handle_pvm_entry_wfx(struct pkvm_hyp_vcpu *hyp_vcpu)
@@ -1217,6 +1215,8 @@ static void handle_host_smc(struct kvm_cpu_context *host_ctxt)
 	bool handled;
 
 	handled = kvm_host_psci_handler(host_ctxt);
+	if (!handled)
+		handled = kvm_host_scmi_handler(host_ctxt);
 	if (!handled)
 		default_host_smc_handler(host_ctxt);
 
diff --git a/arch/arm64/kvm/hyp/nvhe/power/scmi.c b/arch/arm64/kvm/hyp/nvhe/power/scmi.c
new file mode 100644
index 000000000000..e9ac33f3583c
--- /dev/null
+++ b/arch/arm64/kvm/hyp/nvhe/power/scmi.c
@@ -0,0 +1,233 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2022 Linaro Ltd.
+ */
+
+#include <linux/bitfield.h>
+
+#include <nvhe/pkvm.h>
+#include <nvhe/mm.h>
+#include <nvhe/mem_protect.h>
+#include <nvhe/trap_handler.h>
+
+/* SCMI protocol */
+#define SCMI_PROTOCOL_POWER_DOMAIN	0x11
+
+/*  shmem registers */
+#define SCMI_SHM_CHANNEL_STATUS		0x4
+#define SCMI_SHM_CHANNEL_FLAGS		0x10
+#define SCMI_SHM_LENGTH			0x14
+#define SCMI_SHM_MESSAGE_HEADER		0x18
+#define SCMI_SHM_MESSAGE_PAYLOAD	0x1c
+
+/*  channel status */
+#define SCMI_CHN_FREE			(1U << 0)
+#define SCMI_CHN_ERROR			(1U << 1)
+
+/*  channel flags */
+#define SCMI_CHN_IRQ			(1U << 0)
+
+/*  message header */
+#define SCMI_HDR_TOKEN			GENMASK(27, 18)
+#define SCMI_HDR_PROTOCOL_ID		GENMASK(17, 10)
+#define SCMI_HDR_MESSAGE_TYPE		GENMASK(9, 8)
+#define SCMI_HDR_MESSAGE_ID		GENMASK(7, 0)
+
+/*  power domain */
+#define SCMI_PD_STATE_SET		0x4
+#define SCMI_PD_STATE_SET_FLAGS		0x0
+#define SCMI_PD_STATE_SET_DOMAIN_ID	0x4
+#define SCMI_PD_STATE_SET_POWER_STATE	0x8
+
+#define SCMI_PD_STATE_SET_STATUS	0x0
+
+#define SCMI_PD_STATE_SET_FLAGS_ASYNC	(1U << 0)
+
+#define SCMI_PD_POWER_ON		0
+#define SCMI_PD_POWER_OFF		(1U << 30)
+
+#define SCMI_SUCCESS			0
+
+
+static struct {
+	u32				smc_id;
+	phys_addr_t			shmem_pfn;
+	size_t				shmem_size;
+	void __iomem			*shmem;
+} scmi_channel;
+
+#define MAX_POWER_DOMAINS		16
+
+struct scmi_power_domain {
+	struct kvm_power_domain			*pd;
+	const struct kvm_power_domain_ops	*ops;
+};
+
+static struct scmi_power_domain scmi_power_domains[MAX_POWER_DOMAINS];
+static int scmi_power_domain_count;
+
+#define SCMI_POLL_TIMEOUT_US	1000000 /* 1s! */
+
+/* Forward the command to EL3, and wait for completion */
+static int scmi_run_command(struct kvm_cpu_context *host_ctxt)
+{
+	u32 reg;
+	unsigned long i = 0;
+
+	__kvm_hyp_host_forward_smc(host_ctxt);
+
+	do {
+		reg = readl_relaxed(scmi_channel.shmem + SCMI_SHM_CHANNEL_STATUS);
+		if (reg & SCMI_CHN_FREE)
+			break;
+
+		if (WARN_ON(++i > SCMI_POLL_TIMEOUT_US))
+			return -ETIMEDOUT;
+
+		pkvm_udelay(1);
+	} while (!(reg & (SCMI_CHN_FREE | SCMI_CHN_ERROR)));
+
+	if (reg & SCMI_CHN_ERROR)
+		return -EIO;
+
+	reg = readl_relaxed(scmi_channel.shmem + SCMI_SHM_MESSAGE_PAYLOAD +
+			    SCMI_PD_STATE_SET_STATUS);
+	if (reg != SCMI_SUCCESS)
+		return -EIO;
+
+	return 0;
+}
+
+static void __kvm_host_scmi_handler(struct kvm_cpu_context *host_ctxt)
+{
+	int i;
+	u32 reg;
+	struct scmi_power_domain *scmi_pd = NULL;
+
+	/*
+	 * FIXME: the spec does not really allow for an intermediary filtering
+	 * messages on the channel: as soon as the host clears SCMI_CHN_FREE,
+	 * the server may process the message. It doesn't have to wait for a
+	 * doorbell and could just poll on the shared mem. Unlikely in practice,
+	 * but this code is not correct without a spec change requiring the
+	 * server to observe an SMC before processing the message.
+	 */
+	reg = readl_relaxed(scmi_channel.shmem + SCMI_SHM_CHANNEL_STATUS);
+	if (reg & (SCMI_CHN_FREE | SCMI_CHN_ERROR))
+		return;
+
+	reg = readl_relaxed(scmi_channel.shmem + SCMI_SHM_MESSAGE_HEADER);
+	if (FIELD_GET(SCMI_HDR_PROTOCOL_ID, reg) != SCMI_PROTOCOL_POWER_DOMAIN)
+		goto out_forward_smc;
+
+	if (FIELD_GET(SCMI_HDR_MESSAGE_ID, reg) != SCMI_PD_STATE_SET)
+		goto out_forward_smc;
+
+	reg = readl_relaxed(scmi_channel.shmem + SCMI_SHM_MESSAGE_PAYLOAD +
+			    SCMI_PD_STATE_SET_FLAGS);
+	if (WARN_ON(reg & SCMI_PD_STATE_SET_FLAGS_ASYNC))
+		/* We don't support async requests at the moment */
+		return;
+
+	reg = readl_relaxed(scmi_channel.shmem + SCMI_SHM_MESSAGE_PAYLOAD +
+			    SCMI_PD_STATE_SET_DOMAIN_ID);
+
+	for (i = 0; i < MAX_POWER_DOMAINS; i++) {
+		if (!scmi_power_domains[i].pd)
+			break;
+
+		if (reg == scmi_power_domains[i].pd->arm_scmi.domain_id) {
+			scmi_pd = &scmi_power_domains[i];
+			break;
+		}
+	}
+	if (!scmi_pd)
+		goto out_forward_smc;
+
+	reg = readl_relaxed(scmi_channel.shmem + SCMI_SHM_MESSAGE_PAYLOAD +
+			    SCMI_PD_STATE_SET_POWER_STATE);
+	switch (reg) {
+	case SCMI_PD_POWER_ON:
+		if (scmi_run_command(host_ctxt))
+			break;
+
+		scmi_pd->ops->power_on(scmi_pd->pd);
+		break;
+	case SCMI_PD_POWER_OFF:
+		scmi_pd->ops->power_off(scmi_pd->pd);
+
+		if (scmi_run_command(host_ctxt))
+			scmi_pd->ops->power_on(scmi_pd->pd);
+		break;
+	}
+	return;
+
+out_forward_smc:
+	__kvm_hyp_host_forward_smc(host_ctxt);
+}
+
+bool kvm_host_scmi_handler(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(u64, func_id, host_ctxt, 0);
+
+	if (!scmi_channel.shmem || func_id != scmi_channel.smc_id)
+		return false; /* Unhandled */
+
+	/*
+	 * Prevent the host from modifying the request while it is in flight.
+	 * One page is enough, SCMI messages are smaller than that.
+	 *
+	 * FIXME: the host is allowed to poll the shmem while the request is in
+	 * flight, or read shmem when receiving the SCMI interrupt. Although
+	 * it's unlikely with the SMC-based transport, this too requires some
+	 * tightening in the spec.
+	 */
+	if (WARN_ON(__pkvm_host_add_remove_page(scmi_channel.shmem_pfn, true)))
+		return true;
+
+	__kvm_host_scmi_handler(host_ctxt);
+
+	WARN_ON(__pkvm_host_add_remove_page(scmi_channel.shmem_pfn, false));
+	return true; /* Handled */
+}
+
+int pkvm_init_scmi_pd(struct kvm_power_domain *pd,
+		      const struct kvm_power_domain_ops *ops)
+{
+	int ret;
+
+	if (!IS_ALIGNED(pd->arm_scmi.shmem_base, PAGE_SIZE) ||
+	    pd->arm_scmi.shmem_size < PAGE_SIZE) {
+		return -EINVAL;
+	}
+
+	if (!scmi_channel.shmem) {
+		unsigned long shmem;
+
+		/* FIXME: Do we need to mark those pages shared in the host s2? */
+		ret = __pkvm_create_private_mapping(pd->arm_scmi.shmem_base,
+						    pd->arm_scmi.shmem_size,
+						    PAGE_HYP_DEVICE,
+						    &shmem);
+		if (ret)
+			return ret;
+
+		scmi_channel.smc_id = pd->arm_scmi.smc_id;
+		scmi_channel.shmem_pfn = hyp_phys_to_pfn(pd->arm_scmi.shmem_base);
+		scmi_channel.shmem = (void *)shmem;
+
+	} else if (scmi_channel.shmem_pfn !=
+		   hyp_phys_to_pfn(pd->arm_scmi.shmem_base) ||
+		   scmi_channel.smc_id != pd->arm_scmi.smc_id) {
+		/* We support a single channel at the moment */
+		return -ENXIO;
+	}
+
+	if (scmi_power_domain_count == MAX_POWER_DOMAINS)
+		return -ENOSPC;
+
+	scmi_power_domains[scmi_power_domain_count].pd = pd;
+	scmi_power_domains[scmi_power_domain_count].ops = ops;
+	scmi_power_domain_count++;
+	return 0;
+}
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 201+ messages in thread

* [RFC PATCH 43/45] KVM: arm64: smmu-v3: Support power management
  2023-02-01 12:52 ` Jean-Philippe Brucker
@ 2023-02-01 12:53   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:53 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

Add power domain ops to the hypervisor IOMMU driver. We currently make
these assumptions:

* The register state is retained across power off.
* The TLBs are clean on power on.
* Another privileged software (EL3 or SCP FW) handles dependencies
  between SMMU and endpoints.

So we just need to make sure that the CPU does not touch the SMMU
registers while it is powered off.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 include/kvm/arm_smmu_v3.h                   |  4 +++
 include/kvm/iommu.h                         |  4 +++
 arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c | 12 +++++++
 arch/arm64/kvm/hyp/nvhe/iommu/iommu.c       | 36 +++++++++++++++++++++
 4 files changed, 56 insertions(+)

diff --git a/include/kvm/arm_smmu_v3.h b/include/kvm/arm_smmu_v3.h
index 373b915b6661..d345cd616407 100644
--- a/include/kvm/arm_smmu_v3.h
+++ b/include/kvm/arm_smmu_v3.h
@@ -12,6 +12,9 @@
  * Parameters from the trusted host:
  * @mmio_addr		base address of the SMMU registers
  * @mmio_size		size of the registers resource
+ * @caches_clean_on_power_on
+ *			is it safe to elide cache and TLB invalidation commands
+ *			while the SMMU is OFF
  *
  * Other members are filled and used at runtime by the SMMU driver.
  */
@@ -20,6 +23,7 @@ struct hyp_arm_smmu_v3_device {
 	phys_addr_t		mmio_addr;
 	size_t			mmio_size;
 	unsigned long		features;
+	bool			caches_clean_on_power_on;
 
 	void __iomem		*base;
 	u32			cmdq_prod;
diff --git a/include/kvm/iommu.h b/include/kvm/iommu.h
index 2bbe5f7bf726..ab888da731bc 100644
--- a/include/kvm/iommu.h
+++ b/include/kvm/iommu.h
@@ -3,6 +3,7 @@
 #define __KVM_IOMMU_H
 
 #include <asm/kvm_host.h>
+#include <kvm/power_domain.h>
 #include <linux/io-pgtable.h>
 
 /*
@@ -10,6 +11,7 @@
  * @pgtable_cfg:	page table configuration
  * @domains:		root domain table
  * @nr_domains:		max number of domains (exclusive)
+ * @power_domain:	power domain information
  *
  * Other members are filled and used at runtime by the IOMMU driver.
  */
@@ -17,8 +19,10 @@ struct kvm_hyp_iommu {
 	struct io_pgtable_cfg		pgtable_cfg;
 	void				**domains;
 	size_t				nr_domains;
+	struct kvm_power_domain		power_domain;
 
 	struct io_pgtable_params	*pgtable;
+	bool				power_is_off;
 };
 
 struct kvm_hyp_iommu_memcache {
diff --git a/arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c b/arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c
index 56e313203a16..20610ebf04c2 100644
--- a/arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c
+++ b/arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c
@@ -83,6 +83,9 @@ static int smmu_add_cmd(struct hyp_arm_smmu_v3_device *smmu,
 	int idx = Q_IDX(smmu, smmu->cmdq_prod);
 	u64 *slot = smmu->cmdq_base + idx * CMDQ_ENT_DWORDS;
 
+	if (smmu->iommu.power_is_off)
+		return -EPIPE;
+
 	ret = smmu_wait_event(smmu, !smmu_cmdq_full(smmu));
 	if (ret)
 		return ret;
@@ -160,6 +163,9 @@ static int smmu_sync_ste(struct hyp_arm_smmu_v3_device *smmu, u32 sid)
 		.cfgi.leaf = true,
 	};
 
+	if (smmu->iommu.power_is_off && smmu->caches_clean_on_power_on)
+		return 0;
+
 	return smmu_send_cmd(smmu, &cmd);
 }
 
@@ -394,6 +400,9 @@ static void smmu_tlb_flush_all(void *cookie)
 		.tlbi.vmid = data->domain_id,
 	};
 
+	if (smmu->iommu.power_is_off && smmu->caches_clean_on_power_on)
+		return;
+
 	WARN_ON(smmu_send_cmd(smmu, &cmd));
 }
 
@@ -409,6 +418,9 @@ static void smmu_tlb_inv_range(struct kvm_iommu_tlb_cookie *data,
 		.tlbi.leaf = leaf,
 	};
 
+	if (smmu->iommu.power_is_off && smmu->caches_clean_on_power_on)
+		return;
+
 	/*
 	 * There are no mappings at high addresses since we don't use TTB1, so
 	 * no overflow possible.
diff --git a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
index 0550e7bdf179..2fb5514ee0ef 100644
--- a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
+++ b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
@@ -327,10 +327,46 @@ phys_addr_t kvm_iommu_iova_to_phys(pkvm_handle_t iommu_id,
 	return phys;
 }
 
+static int iommu_power_on(struct kvm_power_domain *pd)
+{
+	struct kvm_hyp_iommu *iommu = container_of(pd, struct kvm_hyp_iommu,
+						   power_domain);
+
+	/*
+	 * We currently assume that the device retains its architectural state
+	 * across power off, hence no save/restore.
+	 */
+	hyp_spin_lock(&iommu_lock);
+	iommu->power_is_off = false;
+	hyp_spin_unlock(&iommu_lock);
+	return 0;
+}
+
+static int iommu_power_off(struct kvm_power_domain *pd)
+{
+	struct kvm_hyp_iommu *iommu = container_of(pd, struct kvm_hyp_iommu,
+						   power_domain);
+
+	hyp_spin_lock(&iommu_lock);
+	iommu->power_is_off = true;
+	hyp_spin_unlock(&iommu_lock);
+	return 0;
+}
+
+static const struct kvm_power_domain_ops iommu_power_ops = {
+	.power_on	= iommu_power_on,
+	.power_off	= iommu_power_off,
+};
+
 int kvm_iommu_init_device(struct kvm_hyp_iommu *iommu)
 {
+	int ret;
 	void *domains;
 
+	ret = pkvm_init_power_domain(&iommu->power_domain, &iommu_power_ops);
+	if (ret)
+		return ret;
+
 	domains = iommu->domains;
 	iommu->domains = kern_hyp_va(domains);
 	return pkvm_create_mappings(iommu->domains, iommu->domains +
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 201+ messages in thread

* [RFC PATCH 44/45] iommu/arm-smmu-v3-kvm: Support power management with SCMI SMC
  2023-02-01 12:52 ` Jean-Philippe Brucker
@ 2023-02-01 12:53   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:53 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

Discover SCMI parameters for the SMMUv3 power domain, and pass them to
the hypervisor. Power management must use a method based on SMC, so the
hypervisor driver can catch them and keep the software state in sync
with the hardware.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c   | 76 +++++++++++++++++++
 1 file changed, 76 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
index 930d78f6e29f..198e41d808b0 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
@@ -6,6 +6,7 @@
  */
 #include <asm/kvm_mmu.h>
 #include <linux/local_lock.h>
+#include <linux/of_address.h>
 #include <linux/of_platform.h>
 
 #include <kvm/arm_smmu_v3.h>
@@ -495,6 +496,75 @@ static int kvm_arm_smmu_device_reset(struct host_arm_smmu_device *host_smmu)
 	return 0;
 }
 
+static int kvm_arm_probe_scmi_pd(struct device_node *scmi_node,
+				 struct kvm_power_domain *pd)
+{
+	int ret;
+	struct resource res;
+	struct of_phandle_args args;
+
+	pd->type = KVM_POWER_DOMAIN_ARM_SCMI;
+
+	ret = of_parse_phandle_with_args(scmi_node, "shmem", NULL, 0, &args);
+	if (ret)
+		return ret;
+
+	ret = of_address_to_resource(args.np, 0, &res);
+	if (ret)
+		goto out_put_nodes;
+
+	ret = of_property_read_u32(scmi_node, "arm,smc-id",
+				   &pd->arm_scmi.smc_id);
+	if (ret)
+		goto out_put_nodes;
+
+	/*
+	 * The shared buffer is unmapped from the host while a request is in
+	 * flight, so it has to be on its own page.
+	 */
+	if (!IS_ALIGNED(res.start, SZ_64K) || resource_size(&res) < SZ_64K) {
+		ret = -EINVAL;
+		goto out_put_nodes;
+	}
+
+	pd->arm_scmi.shmem_base = res.start;
+	pd->arm_scmi.shmem_size = resource_size(&res);
+
+out_put_nodes:
+	of_node_put(args.np);
+	return ret;
+}
+
+/* TODO: Move this. None of it is specific to SMMU */
+static int kvm_arm_probe_power_domain(struct device *dev,
+				      struct kvm_power_domain *pd)
+{
+	int ret;
+	struct device_node *parent;
+	struct of_phandle_args args;
+
+	if (!of_get_property(dev->of_node, "power-domains", NULL))
+		return 0;
+
+	ret = of_parse_phandle_with_args(dev->of_node, "power-domains",
+					 "#power-domain-cells", 0, &args);
+	if (ret)
+		return ret;
+
+	parent = of_get_parent(args.np);
+	if (parent && of_device_is_compatible(parent, "arm,scmi-smc") &&
+	    args.args_count > 0) {
+		pd->arm_scmi.domain_id = args.args[0];
+		ret = kvm_arm_probe_scmi_pd(parent, pd);
+	} else {
+		dev_err(dev, "Unsupported PM method for %pOF\n", args.np);
+		ret = -EINVAL;
+	}
+	of_node_put(parent);
+	of_node_put(args.np);
+	return ret;
+}
+
 static void *kvm_arm_smmu_alloc_domains(struct arm_smmu_device *smmu)
 {
 	return (void *)devm_get_free_pages(smmu->dev, GFP_KERNEL | __GFP_ZERO,
@@ -513,6 +583,7 @@ static int kvm_arm_smmu_probe(struct platform_device *pdev)
 	struct device *dev = &pdev->dev;
 	struct host_arm_smmu_device *host_smmu;
 	struct hyp_arm_smmu_v3_device *hyp_smmu;
+	struct kvm_power_domain power_domain = {};
 
 	if (kvm_arm_smmu_cur >= kvm_arm_smmu_count)
 		return -ENOSPC;
@@ -530,6 +601,10 @@ static int kvm_arm_smmu_probe(struct platform_device *pdev)
 	if (ret || bypass)
 		return ret ?: -EINVAL;
 
+	ret = kvm_arm_probe_power_domain(dev, &power_domain);
+	if (ret)
+		return ret;
+
 	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
 	size = resource_size(res);
 	if (size < SZ_128K) {
@@ -606,6 +681,7 @@ static int kvm_arm_smmu_probe(struct platform_device *pdev)
 	hyp_smmu->mmio_size = size;
 	hyp_smmu->features = smmu->features;
 	hyp_smmu->iommu.pgtable_cfg = cfg;
+	hyp_smmu->iommu.power_domain = power_domain;
 
 	kvm_arm_smmu_cur++;
 
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 201+ messages in thread

* [RFC PATCH 45/45] iommu/arm-smmu-v3-kvm: Enable runtime PM
  2023-02-01 12:52 ` Jean-Philippe Brucker
@ 2023-02-01 12:53   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-01 12:53 UTC (permalink / raw)
  To: maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Jean-Philippe Brucker

Enable runtime PM for the KVM SMMUv3 driver. The PM link to DMA masters
dictates when the SMMU should be powered on.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c   | 54 +++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
index 198e41d808b0..cd865049f89a 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
@@ -8,6 +8,7 @@
 #include <linux/local_lock.h>
 #include <linux/of_address.h>
 #include <linux/of_platform.h>
+#include <linux/pm_runtime.h>
 
 #include <kvm/arm_smmu_v3.h>
 
@@ -18,6 +19,7 @@ struct host_arm_smmu_device {
 	pkvm_handle_t			id;
 	u32				boot_gbpa;
 	unsigned int			pgd_order;
+	atomic_t			initialized;
 };
 
 #define smmu_to_host(_smmu) \
@@ -134,8 +136,10 @@ static struct iommu_ops kvm_arm_smmu_ops;
 
 static struct iommu_device *kvm_arm_smmu_probe_device(struct device *dev)
 {
+	int ret;
 	struct arm_smmu_device *smmu;
 	struct kvm_arm_smmu_master *master;
+	struct host_arm_smmu_device *host_smmu;
 	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
 
 	if (!fwspec || fwspec->ops != &kvm_arm_smmu_ops)
@@ -156,7 +160,28 @@ static struct iommu_device *kvm_arm_smmu_probe_device(struct device *dev)
 	master->smmu = smmu;
 	dev_iommu_priv_set(dev, master);
 
+	if (!device_link_add(dev, smmu->dev,
+			     DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE |
+			     DL_FLAG_AUTOREMOVE_SUPPLIER)) {
+		ret = -ENOLINK;
+		goto err_free;
+	}
+
+	/*
+	 * If the SMMU has just been initialized by the hypervisor, release the
+	 * extra PM reference taken by kvm_arm_smmu_probe().  Not sure yet how
+	 * to improve this. Maybe have KVM call us back when it finished
+	 * initializing?
+	 */
+	host_smmu = smmu_to_host(smmu);
+	if (atomic_add_unless(&host_smmu->initialized, 1, 1))
+		pm_runtime_put_noidle(smmu->dev);
+
 	return &smmu->iommu;
+
+err_free:
+	kfree(master);
+	return ERR_PTR(ret);
 }
 
 static void kvm_arm_smmu_release_device(struct device *dev)
@@ -685,6 +710,30 @@ static int kvm_arm_smmu_probe(struct platform_device *pdev)
 
 	kvm_arm_smmu_cur++;
 
+	/*
+	 * The state of endpoints dictates when the SMMU is powered off. To turn
+	 * the SMMU on and off, a genpd driver uses SCMI over the SMC transport,
+	 * or some other platform-specific SMC. Those power requests are caught
+	 * by the hypervisor, so that the hyp driver doesn't touch the hardware
+	 * state while it is off.
+	 *
+	 * We are making a big assumption here, that TLBs and caches are invalid
+	 * on power on, and therefore we don't need to wake the SMMU when
+	 * modifying page tables, stream tables and context tables. If this
+	 * assumption does not hold on some systems, then we'll need to grab RPM
+	 * reference in map(), attach(), etc, so the hyp driver can send
+	 * invalidations.
+	 */
+	hyp_smmu->caches_clean_on_power_on = true;
+
+	pm_runtime_set_active(dev);
+	pm_runtime_enable(dev);
+	/*
+	 * Take a reference to keep the SMMU powered on while the hypervisor
+	 * initializes it.
+	 */
+	pm_runtime_resume_and_get(dev);
+
 	return 0;
 }
 
@@ -697,6 +746,11 @@ static int kvm_arm_smmu_remove(struct platform_device *pdev)
 	 * There was an error during hypervisor setup. The hyp driver may
 	 * have already enabled the device, so disable it.
 	 */
+
+	if (!atomic_read(&host_smmu->initialized))
+		pm_runtime_put_noidle(&pdev->dev);
+	pm_runtime_disable(&pdev->dev);
+	pm_runtime_set_suspended(&pdev->dev);
 	arm_smmu_unregister_iommu(smmu);
 	arm_smmu_device_disable(smmu);
 	arm_smmu_update_gbpa(smmu, host_smmu->boot_gbpa, GBPA_ABORT);
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 201+ messages in thread

* RE: [RFC PATCH 00/45] KVM: Arm SMMUv3 driver for pKVM
  2023-02-01 12:52 ` Jean-Philippe Brucker
@ 2023-02-02  7:07   ` Tian, Kevin
  -1 siblings, 0 replies; 201+ messages in thread
From: Tian, Kevin @ 2023-02-02  7:07 UTC (permalink / raw)
  To: Jean-Philippe Brucker, maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu, Chen, Jason CJ, Zhang, Tina

> From: Jean-Philippe Brucker <jean-philippe@linaro.org>
> Sent: Wednesday, February 1, 2023 8:53 PM
> 
> 3. Private I/O page tables
> 
> A flexible alternative uses private page tables in the SMMU, entirely
> disconnected from the CPU page tables. With this the SMMU can implement
> a
> reduced set of features, even shed a stage of translation. This also
> provides a virtual I/O address space to the host, which allows more
> efficient memory allocation for large buffers, and for devices with
> limited addressing abilities.
> 
> This is the solution implemented in this series. The host creates
> IOVA->HPA mappings with two hypercalls map_pages() and unmap_pages(),
> and
> the hypervisor populates the page tables. Page tables are abstracted into
> IOMMU domains, which allow multiple devices to share the same address
> space. Another four hypercalls, alloc_domain(), attach_dev(), detach_dev()
> and free_domain(), manage the domains.
> 

Out of curiosity. Does virtio-iommu fit in this usage? If yes then there is
no need to add specific enlightenment in existing iommu drivers. If no 
probably because as mentioned in the start a full-fledged iommu driver 
doesn't fit nVHE so lots of smmu driver logic has to be kept in the host?
anyway just want to check your thoughts on the possibility.

btw some of my colleagues are porting pKVM to Intel platform. I believe
they will post their work shortly and there might require some common
framework in pKVM hypervisor like iommu domain, hypercalls, etc. like 
what we have in the host iommu subsystem. CC them in case of any early
thought they want to throw in. 😊

Thanks
Kevin

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 00/45] KVM: Arm SMMUv3 driver for pKVM
  2023-02-02  7:07   ` Tian, Kevin
@ 2023-02-02 10:05     ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-02 10:05 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, smostafa, dbrazdil,
	ryan.roberts, linux-arm-kernel, kvmarm, iommu, Chen, Jason CJ,
	Zhang, Tina

On Thu, Feb 02, 2023 at 07:07:55AM +0000, Tian, Kevin wrote:
> > From: Jean-Philippe Brucker <jean-philippe@linaro.org>
> > Sent: Wednesday, February 1, 2023 8:53 PM
> > 
> > 3. Private I/O page tables
> > 
> > A flexible alternative uses private page tables in the SMMU, entirely
> > disconnected from the CPU page tables. With this the SMMU can implement
> > a
> > reduced set of features, even shed a stage of translation. This also
> > provides a virtual I/O address space to the host, which allows more
> > efficient memory allocation for large buffers, and for devices with
> > limited addressing abilities.
> > 
> > This is the solution implemented in this series. The host creates
> > IOVA->HPA mappings with two hypercalls map_pages() and unmap_pages(),
> > and
> > the hypervisor populates the page tables. Page tables are abstracted into
> > IOMMU domains, which allow multiple devices to share the same address
> > space. Another four hypercalls, alloc_domain(), attach_dev(), detach_dev()
> > and free_domain(), manage the domains.
> > 
> 
> Out of curiosity. Does virtio-iommu fit in this usage?

I don't think so, because you still need a driver for the physical IOMMU
in the hypervisor. virtio-iommu would only replace the hypercall interface
with queues, and I don't think that buys us anything.

Maybe virtio on the guest side could be advantageous, because that
interface has to be stable and virtio comes with stable APIs for several
classes of devices. But implementing virtio in pkvm means a lot of extra
code so it needs to be considered carefully.

> If yes then there is
> no need to add specific enlightenment in existing iommu drivers. If no 
> probably because as mentioned in the start a full-fledged iommu driver 
> doesn't fit nVHE so lots of smmu driver logic has to be kept in the host?

To minimize the attack surface of the hypervisor, we don't want to load
any superfluous code, so the hypervisor part of the SMMUv3 driver only
contains code to populate tables and send commands (which is still too
much for my taste but seems unavoidable to isolate host DMA). Left in the
host are things like ACPI/DT parser, interrupts, possibly the event queue
(which informs of DMA errors), extra features and complex optimizations.
The host also has to implement IOMMU ops to liaise between the DMA API and
the hypervisor.
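
As a rough sketch of that liaison (purely illustrative; the hypercall name,
the domain wrapper, its id field and to_kvm_smmu_domain() below are
assumptions, not the actual interface of this series), the host-side map
callback essentially just forwards the request to the hypervisor:

        /* Hypothetical host-side glue, not code from this series */
        static int kvm_arm_smmu_map_pages(struct iommu_domain *domain,
                                          unsigned long iova, phys_addr_t paddr,
                                          size_t pgsize, size_t pgcount,
                                          int prot, gfp_t gfp, size_t *mapped)
        {
                struct kvm_arm_smmu_domain *kvm_smmu_domain = to_kvm_smmu_domain(domain);
                int ret;

                /* The hypervisor checks page ownership and fills its private io-pgtable */
                ret = kvm_call_hyp_nvhe(__pkvm_host_iommu_map_pages,
                                        kvm_smmu_domain->id, iova, paddr,
                                        pgsize, pgcount, prot);
                if (ret)
                        return ret;

                *mapped = pgsize * pgcount;
                return 0;
        }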

> anyway just want to check your thoughts on the possibility.
> 
> btw some of my colleagues are porting pKVM to Intel platform. I believe
> they will post their work shortly and there might require some common
> framework in pKVM hypervisor like iommu domain, hypercalls, etc. like 
> what we have in the host iommu subsystem. CC them in case of any early
> thought they want to throw in. 😊

Cool! The hypervisor part contains iommu/iommu.c which deals with
hypercalls and domains and doesn't contain anything specific to Arm (it's
only in arch/arm64 because that's where pkvm currently sits). It does rely
on io-pgtable at the moment which is not used by VT-d but that can be
abstracted as well. It's possible however that on Intel an entirely
different set of hypercalls will be needed, if a simpler solution such as
sharing page tables fits better because VT-d implementations are more
homogeneous.
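
For instance (purely illustrative; this ops structure does not exist in the
series), the common hypervisor code could go through a small backend
interface instead of calling io-pgtable directly:

        /* Illustrative backend ops for the common hypervisor IOMMU code */
        struct kvm_hyp_iommu_pgtable_ops {
                int     (*alloc_domain)(struct kvm_hyp_iommu *iommu,
                                        pkvm_handle_t domain_id);
                void    (*free_domain)(struct kvm_hyp_iommu *iommu,
                                       pkvm_handle_t domain_id);
                int     (*map_pages)(struct kvm_hyp_iommu *iommu,
                                     pkvm_handle_t domain_id, unsigned long iova,
                                     phys_addr_t paddr, size_t pgsize,
                                     size_t pgcount, int prot);
                size_t  (*unmap_pages)(struct kvm_hyp_iommu *iommu,
                                       pkvm_handle_t domain_id, unsigned long iova,
                                       size_t pgsize, size_t pgcount);
        };

An SMMUv3 backend would implement these with io-pgtable, while a VT-d
backend (or a page-table-sharing approach) could plug in something else
behind the same hypercalls.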

Thanks,
Jean

^ permalink raw reply	[flat|nested] 201+ messages in thread

* RE: [RFC PATCH 00/45] KVM: Arm SMMUv3 driver for pKVM
  2023-02-02 10:05     ` Jean-Philippe Brucker
@ 2023-02-03  2:04       ` Tian, Kevin
  -1 siblings, 0 replies; 201+ messages in thread
From: Tian, Kevin @ 2023-02-03  2:04 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, smostafa, dbrazdil,
	ryan.roberts, linux-arm-kernel, kvmarm, iommu, Chen, Jason CJ,
	Zhang, Tina

> From: Jean-Philippe Brucker <jean-philippe@linaro.org>
> Sent: Thursday, February 2, 2023 6:05 PM
> 
> On Thu, Feb 02, 2023 at 07:07:55AM +0000, Tian, Kevin wrote:
> > > From: Jean-Philippe Brucker <jean-philippe@linaro.org>
> > > Sent: Wednesday, February 1, 2023 8:53 PM
> > >
> > > 3. Private I/O page tables
> > >
> > > A flexible alternative uses private page tables in the SMMU, entirely
> > > disconnected from the CPU page tables. With this the SMMU can
> implement
> > > a
> > > reduced set of features, even shed a stage of translation. This also
> > > provides a virtual I/O address space to the host, which allows more
> > > efficient memory allocation for large buffers, and for devices with
> > > limited addressing abilities.
> > >
> > > This is the solution implemented in this series. The host creates
> > > IOVA->HPA mappings with two hypercalls map_pages() and
> unmap_pages(),
> > > and
> > > the hypervisor populates the page tables. Page tables are abstracted into
> > > IOMMU domains, which allow multiple devices to share the same
> address
> > > space. Another four hypercalls, alloc_domain(), attach_dev(),
> detach_dev()
> > > and free_domain(), manage the domains.
> > >
> >
> > Out of curiosity. Does virtio-iommu fit in this usage?
> 
> I don't think so, because you still need a driver for the physical IOMMU
> in the hypervisor. virtio-iommu would only replace the hypercall interface
> with queues, and I don't think that buys us anything.
> 
> Maybe virtio on the guest side could be advantageous, because that
> interface has to be stable and virtio comes with stable APIs for several
> classes of devices. But implementing virtio in pkvm means a lot of extra
> code so it needs to be considered carefully.
> 

this makes sense.

> > If yes then there is
> > no need to add specific enlightenment in existing iommu drivers. If no
> > probably because as mentioned in the start a full-fledged iommu driver
> > doesn't fit nVHE so lots of smmu driver logic has to be kept in the host?
> 
> To minimize the attack surface of the hypervisor, we don't want to load
> any superfluous code, so the hypervisor part of the SMMUv3 driver only
> contains code to populate tables and send commands (which is still too
> much for my taste but seems unavoidable to isolate host DMA). Left in the
> host are things like ACPI/DT parser, interrupts, possibly the event queue
> (which informs of DMA errors), extra features and complex optimizations.
> The host also has to implement IOMMU ops to liaise between the DMA API
> and
> the hypervisor.
> 
> > anyway just want to check your thoughts on the possibility.
> >
> > btw some of my colleagues are porting pKVM to Intel platform. I believe
> > they will post their work shortly and there might require some common
> > framework in pKVM hypervisor like iommu domain, hypercalls, etc. like
> > what we have in the host iommu subsystem. CC them in case of any early
> > thought they want to throw in. 😊
> 
> Cool! The hypervisor part contains iommu/iommu.c which deals with
> hypercalls and domains and doesn't contain anything specific to Arm (it's
> only in arch/arm64 because that's where pkvm currently sits). It does rely
> on io-pgtable at the moment which is not used by VT-d but that can be
> abstracted as well. It's possible however that on Intel an entirely
> different set of hypercalls will be needed, if a simpler solution such as
> sharing page tables fits better because VT-d implementations are more
> homogeneous.
> 

yes depending on the choice on VT-d there could be different degree
of the sharing possibility. I'll let Jason/Tina comment on their design
choice.

^ permalink raw reply	[flat|nested] 201+ messages in thread

* RE: [RFC PATCH 00/45] KVM: Arm SMMUv3 driver for pKVM
  2023-02-03  2:04       ` Tian, Kevin
@ 2023-02-03  8:39         ` Chen, Jason CJ
  -1 siblings, 0 replies; 201+ messages in thread
From: Chen, Jason CJ @ 2023-02-03  8:39 UTC (permalink / raw)
  To: Tian, Kevin, Jean-Philippe Brucker
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, smostafa, dbrazdil,
	ryan.roberts, linux-arm-kernel, kvmarm, iommu, Zhang, Tina, Chen,
	Jason CJ

> -----Original Message-----
> From: Tian, Kevin <kevin.tian@intel.com>
> Sent: Friday, February 3, 2023 10:05 AM
> To: Jean-Philippe Brucker <jean-philippe@linaro.org>
> Cc: maz@kernel.org; catalin.marinas@arm.com; will@kernel.org;
> joro@8bytes.org; robin.murphy@arm.com; james.morse@arm.com;
> suzuki.poulose@arm.com; oliver.upton@linux.dev; yuzenghui@huawei.com;
> smostafa@google.com; dbrazdil@google.com; ryan.roberts@arm.com; linux-
> arm-kernel@lists.infradead.org; kvmarm@lists.linux.dev;
> iommu@lists.linux.dev; Chen, Jason CJ <jason.cj.chen@intel.com>; Zhang,
> Tina <tina.zhang@intel.com>
> Subject: RE: [RFC PATCH 00/45] KVM: Arm SMMUv3 driver for pKVM
> 
> > From: Jean-Philippe Brucker <jean-philippe@linaro.org>
> > Sent: Thursday, February 2, 2023 6:05 PM
> >
> > On Thu, Feb 02, 2023 at 07:07:55AM +0000, Tian, Kevin wrote:
> > > > From: Jean-Philippe Brucker <jean-philippe@linaro.org>
> > > > Sent: Wednesday, February 1, 2023 8:53 PM
> > > >
> > > > 3. Private I/O page tables
> > > >
> > > > A flexible alternative uses private page tables in the SMMU,
> > > > entirely disconnected from the CPU page tables. With this the SMMU
> > > > can
> > implement
> > > > a
> > > > reduced set of features, even shed a stage of translation. This
> > > > also provides a virtual I/O address space to the host, which
> > > > allows more efficient memory allocation for large buffers, and for
> > > > devices with limited addressing abilities.
> > > >
> > > > This is the solution implemented in this series. The host creates
> > > > IOVA->HPA mappings with two hypercalls map_pages() and
> > unmap_pages(),
> > > > and
> > > > the hypervisor populates the page tables. Page tables are
> > > > abstracted into IOMMU domains, which allow multiple devices to
> > > > share the same
> > address
> > > > space. Another four hypercalls, alloc_domain(), attach_dev(),
> > detach_dev()
> > > > and free_domain(), manage the domains.
> > > >
> > >
> > > Out of curiosity. Does virtio-iommu fit in this usage?
> >
> > I don't think so, because you still need a driver for the physical
> > IOMMU in the hypervisor. virtio-iommu would only replace the hypercall
> > interface with queues, and I don't think that buys us anything.
> >
> > Maybe virtio on the guest side could be advantageous, because that
> > interface has to be stable and virtio comes with stable APIs for
> > several classes of devices. But implementing virtio in pkvm means a
> > lot of extra code so it needs to be considered carefully.
> >
> 
> this makes sense.
> 
> > > If yes then there is
> > > no need to add specific enlightenment in existing iommu drivers. If
> > > no probably because as mentioned in the start a full-fledged iommu
> > > driver doesn't fit nVHE so lots of smmu driver logic has to be kept in the
> host?
> >
> > To minimize the attack surface of the hypervisor, we don't want to
> > load any superfluous code, so the hypervisor part of the SMMUv3 driver
> > only contains code to populate tables and send commands (which is
> > still too much for my taste but seems unavoidable to isolate host
> > DMA). Left in the host are things like ACPI/DT parser, interrupts,
> > possibly the event queue (which informs of DMA errors), extra features
> and complex optimizations.
> > The host also has to implement IOMMU ops to liaise between the DMA API
> > and the hypervisor.
> >
> > > anyway just want to check your thoughts on the possibility.
> > >
> > > btw some of my colleagues are porting pKVM to Intel platform. I
> > > believe they will post their work shortly and there might require
> > > some common framework in pKVM hypervisor like iommu domain,
> > > hypercalls, etc. like what we have in the host iommu subsystem. CC
> > > them in case of any early thought they want to throw in. 😊
> >
> > Cool! The hypervisor part contains iommu/iommu.c which deals with
> > hypercalls and domains and doesn't contain anything specific to Arm
> > (it's only in arch/arm64 because that's where pkvm currently sits). It
> > does rely on io-pgtable at the moment which is not used by VT-d but
> > that can be abstracted as well. It's possible however that on Intel an
> > entirely different set of hypercalls will be needed, if a simpler
> > solution such as sharing page tables fits better because VT-d
> > implementations are more homogeneous.
> >
> 
> yes depending on the choice on VT-d there could be different degree of the
> sharing possibility. I'll let Jason/Tina comment on their design choice.

Thanks, Kevin, for bringing us here. Our current POC solution for VT-d is
based on nested translation: since there are two levels of io-pgtable, we
keep the first-level page table fully controlled by the host VM
(IOVA -> host_GPA) while the second-level page table is managed by pKVM
(host_GPA -> HPA). This solution is simple and straightforward, but pKVM
still needs to provide vIOMMU emulation for the host (e.g. shadowing the
root/context/PASID tables, emulating IOTLB flushes, etc.).
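
A rough sketch of how the two stages compose, with made-up helper names
just for illustration (not our actual code):

 /*
  * Illustration only, with made-up helpers: a DMA address goes through
  * two walks. The first-level walk is under host control, the
  * second-level walk is under pKVM control, so the host can never reach
  * an HPA that pKVM hasn't granted it.
  */
 static u64 nested_translate(u64 iova)
 {
         u64 host_gpa = first_level_walk(iova);      /* IOVA -> host_GPA */
         u64 hpa = second_level_walk(host_gpa);      /* host_GPA -> HPA  */

         return hpa;
 }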

As far as I know, the SMMU also supports a nested translation mode; may I
ask which mode is used for pKVM?

We faced a similar choice about whether to share the second-level
io-pgtable with the CPU page table, and in the end we also decided to
introduce a new page table. This increases the complexity of page state
management, as the io-pgtable and the CPU page table need to agree on
page ownership.

Our current solution is based on vIOMMU emulation in pKVM; an enlightened
method could also be an alternative.

Thanks
Jason CJ Chen

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 00/45] KVM: Arm SMMUv3 driver for pKVM
  2023-02-03  8:39         ` Chen, Jason CJ
@ 2023-02-03 11:23           ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-03 11:23 UTC (permalink / raw)
  To: Chen, Jason CJ
  Cc: Tian, Kevin, maz, catalin.marinas, will, joro, robin.murphy,
	james.morse, suzuki.poulose, oliver.upton, yuzenghui, smostafa,
	dbrazdil, ryan.roberts, linux-arm-kernel, kvmarm, iommu, Zhang,
	Tina

Hi Jason,

On Fri, Feb 03, 2023 at 08:39:41AM +0000, Chen, Jason CJ wrote:
> > > > btw some of my colleagues are porting pKVM to Intel platform. I
> > > > believe they will post their work shortly and there might require
> > > > some common framework in pKVM hypervisor like iommu domain,
> > > > hypercalls, etc. like what we have in the host iommu subsystem. CC
> > > > them in case of any early thought they want to throw in. 😊
> > >
> > > Cool! The hypervisor part contains iommu/iommu.c which deals with
> > > hypercalls and domains and doesn't contain anything specific to Arm
> > > (it's only in arch/arm64 because that's where pkvm currently sits). It
> > > does rely on io-pgtable at the moment which is not used by VT-d but
> > > that can be abstracted as well. It's possible however that on Intel an
> > > entirely different set of hypercalls will be needed, if a simpler
> > > solution such as sharing page tables fits better because VT-d
> > > implementations are more homogeneous.
> > >
> > 
> > yes depending on the choice on VT-d there could be different degree of the
> > sharing possibility. I'll let Jason/Tina comment on their design choice.
> 
> Thanks Kevin bring us here. Current our POC solution for VT-d is based on nested
> translation, as there are two level io-pgtable, we keep first-level page table full 
> controlled by host VM (IOVA -> host_GPA) and second-level page table is managed 
> by pKVM (host_GPA -> HPA). This solution is simple straight-forward, but pKVM 
> still need to provide vIOMMU emulation for host (e.g., shadowing root/context/
> pasid tables,  emulating IOTLB flush etc.). 

I dismissed emulating the SMMU early on because it feels too complex
compared to an abstracted hypercall interface, but again that may be due
to the high variation of configurations of the SMMU. For nesting, you
could use some of the interface that Yi Liu and Jacob Pan have been
working on [1]. It should be possible with a couple of attach-table and
tlb-invalidate hypercalls to avoid emulating the low-level registers and
queues.
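
To give an idea of the shape such an interface could take (made-up names,
not an actual proposal from this series or from [1]):

 /* Illustrative prototypes only, not an actual interface */
 int kvm_iommu_attach_table(u32 device_id, u32 domain_id,
                            phys_addr_t fl_pgd, u32 format);
 int kvm_iommu_detach_table(u32 device_id, u32 domain_id);
 int kvm_iommu_tlb_invalidate(u32 domain_id, u64 iova, u64 size);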
 
> As I know, SMMU also support nested translation mode, may I know what's the 
> mode used for pKVM?

It doesn't use nested translation because it is optional in the SMMU, and
this series tries to support any possible implementation. Since pKVM on
arm64 is being used on mobile platforms I suspect that, to save space,
some SMMUs might not implement first-level or second-level page tables.
Besides, supporting nesting for Arm would still require hypercalls for
pinning DMA pages (solution 2).

This series populates the second-level tables with the complete IOVA -> PA
translation (similarly to how VFIO works at the moment). If an
implementation only supports first-level tables, then the hypervisor would
own it and put the IOVA -> PA translation in there.
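
For reference, the host side of the map path boils down to forwarding the
request to the hypervisor; a simplified sketch (the hypercall symbol below
is a placeholder, not the exact name used in the series):

 static int pkvm_iommu_map_sketch(unsigned long domain_id, unsigned long iova,
                                  phys_addr_t paddr, size_t size, int prot)
 {
         /*
          * The hypervisor checks and refcounts page ownership, then writes
          * the IOVA -> PA entries into its private SMMU page tables; the
          * host never touches those tables directly.
          */
         return kvm_call_hyp_nvhe(__pkvm_host_iommu_map_pages, domain_id,
                                  iova, paddr, size, prot);
 }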

Thanks,
Jean

[1] https://lore.kernel.org/linux-iommu/1570045363-24856-2-git-send-email-jacob.jun.pan@linux.intel.com/
    (It's being reworked but I couldn't find a recent link)

> 
> We met similar solution choices whether to share second-level io-pgtable with CPU
> pgtable,  and finally we also decided to introduce a new pgtable, this increase the
> complexity of page state management - as io-pgtable & cpu-pgtable need to align
> the page ownership.
> 
> Now our solution is based on vIOMMU emulation in pKVM, enlighten method should
> also be an alternative solution.
> 
> Thanks
> Jason CJ Chen


^ permalink raw reply	[flat|nested] 201+ messages in thread

* RE: [RFC PATCH 00/45] KVM: Arm SMMUv3 driver for pKVM
  2023-02-03 11:23           ` Jean-Philippe Brucker
@ 2023-02-04  8:19             ` Chen, Jason CJ
  -1 siblings, 0 replies; 201+ messages in thread
From: Chen, Jason CJ @ 2023-02-04  8:19 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Tian, Kevin, maz, catalin.marinas, will, joro, robin.murphy,
	james.morse, suzuki.poulose, oliver.upton, yuzenghui, smostafa,
	dbrazdil, ryan.roberts, linux-arm-kernel, kvmarm, iommu, Zhang,
	Tina, Chen, Jason CJ

Hi, Jean,

Thanks for the information! Let's do more investigation.

Yes, if we use the enlightened method, we can skip nested translation. In
the meantime we must ensure the host doesn't touch this capability. We may
also need to make a trade-off to support SVM-like features.

Thanks

Jason

> -----Original Message-----
> From: Jean-Philippe Brucker <jean-philippe@linaro.org>
> Sent: Friday, February 3, 2023 7:24 PM
> To: Chen, Jason CJ <jason.cj.chen@intel.com>
> Cc: Tian, Kevin <kevin.tian@intel.com>; maz@kernel.org;
> catalin.marinas@arm.com; will@kernel.org; joro@8bytes.org;
> robin.murphy@arm.com; james.morse@arm.com;
> suzuki.poulose@arm.com; oliver.upton@linux.dev; yuzenghui@huawei.com;
> smostafa@google.com; dbrazdil@google.com; ryan.roberts@arm.com;
> linux-arm-kernel@lists.infradead.org; kvmarm@lists.linux.dev;
> iommu@lists.linux.dev; Zhang, Tina <tina.zhang@intel.com>
> Subject: Re: [RFC PATCH 00/45] KVM: Arm SMMUv3 driver for pKVM
> 
> Hi Jason,
> 
> On Fri, Feb 03, 2023 at 08:39:41AM +0000, Chen, Jason CJ wrote:
> > > > > btw some of my colleagues are porting pKVM to Intel platform. I
> > > > > believe they will post their work shortly and there might
> > > > > require some common framework in pKVM hypervisor like iommu
> > > > > domain, hypercalls, etc. like what we have in the host iommu
> > > > > subsystem. CC them in case of any early thought they want to
> > > > > throw in. 😊
> > > >
> > > > Cool! The hypervisor part contains iommu/iommu.c which deals with
> > > > hypercalls and domains and doesn't contain anything specific to
> > > > Arm (it's only in arch/arm64 because that's where pkvm currently
> > > > sits). It does rely on io-pgtable at the moment which is not used
> > > > by VT-d but that can be abstracted as well. It's possible however
> > > > that on Intel an entirely different set of hypercalls will be
> > > > needed, if a simpler solution such as sharing page tables fits
> > > > better because VT-d implementations are more homogeneous.
> > > >
> > >
> > > yes depending on the choice on VT-d there could be different degree
> > > of the sharing possibility. I'll let Jason/Tina comment on their design
> choice.
> >
> > Thanks Kevin bring us here. Current our POC solution for VT-d is based
> > on nested translation, as there are two level io-pgtable, we keep
> > first-level page table full controlled by host VM (IOVA -> host_GPA)
> > and second-level page table is managed by pKVM (host_GPA -> HPA). This
> > solution is simple straight-forward, but pKVM still need to provide
> > vIOMMU emulation for host (e.g., shadowing root/context/ pasid tables,
> emulating IOTLB flush etc.).
> 
> I dismissed emulating the SMMU early on because it feels too complex
> compared to an abstracted hypercall interface, but again that may be due to
> the high variation of configurations of the SMMU. For nesting, you could use
> some of the interface that Yi Liu and Jacob Pan have been working on [1]. It
> should be possible with a couple of attach-table and tlb-invalidate hypercalls
> to avoid emulating the low-level registers and queues.
> 
> > As I know, SMMU also support nested translation mode, may I know
> > what's the mode used for pKVM?
> 
> It doesn't use nested translation because it is optional in the SMMU, and this
> series tries to support any possible implementation. Since pKVM on
> arm64 is being used on mobile platforms I suspect that, to save space, some
> SMMUs might not implement first-level or second-level page tables.
> Besides, supporting nesting for Arm would still require hypercalls for pinning
> DMA pages (solution 2).
> 
> This series populates the second-level tables with the complete IOVA -> PA
> translation (similarly to how VFIO works at the moment). If an
> implementation only supports first-level tables, then the hypervisor would
> own it and put the IOVA -> PA translation in there.
> 
> Thanks,
> Jean
> 
> [1] https://lore.kernel.org/linux-iommu/1570045363-24856-2-git-send-email-
> jacob.jun.pan@linux.intel.com/
>     (It's being reworked but I couldn't find a recent link)
> 
> >
> > We met similar solution choices whether to share second-level
> > io-pgtable with CPU pgtable,  and finally we also decided to introduce
> > a new pgtable, this increase the complexity of page state management -
> > as io-pgtable & cpu-pgtable need to align the page ownership.
> >
> > Now our solution is based on vIOMMU emulation in pKVM, enlighten
> > method should also be an alternative solution.
> >
> > Thanks
> > Jason CJ Chen


^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 00/45] KVM: Arm SMMUv3 driver for pKVM
  2023-02-04  8:19             ` Chen, Jason CJ
@ 2023-02-04 12:30               ` tina.zhang
  -1 siblings, 0 replies; 201+ messages in thread
From: tina.zhang @ 2023-02-04 12:30 UTC (permalink / raw)
  To: Chen, Jason CJ, Jean-Philippe Brucker
  Cc: Tian, Kevin, maz, catalin.marinas, will, joro, robin.murphy,
	james.morse, suzuki.poulose, oliver.upton, yuzenghui, smostafa,
	dbrazdil, ryan.roberts, linux-arm-kernel, kvmarm, iommu



On 2/4/23 16:19, Chen, Jason CJ wrote:
> Hi, Jean,
> 
> Thanks for the information! Let's do more investigation.
> 
> Yes, if using enlighten method, we may skip nested translation. Meantime we
> shall ensure host not touch this capability. We may also need trade-off to support
> SVM kind features.
Hi Jason,

Nested translation is also optional for VT-d; not all IA platforms have
VT-d with nested translation support. For those legacy platforms (e.g.
ones where VT-d doesn't support scalable mode), providing an enlightened
way for pKVM to isolate DMA seems reasonable. Otherwise, pKVM may need to
shadow the I/O page table, which could introduce performance overhead.


Regards,
-Tina
> 
> Thanks
> 
> Jason
> 
>> -----Original Message-----
>> From: Jean-Philippe Brucker <jean-philippe@linaro.org>
>> Sent: Friday, February 3, 2023 7:24 PM
>> To: Chen, Jason CJ <jason.cj.chen@intel.com>
>> Cc: Tian, Kevin <kevin.tian@intel.com>; maz@kernel.org;
>> catalin.marinas@arm.com; will@kernel.org; joro@8bytes.org;
>> robin.murphy@arm.com; james.morse@arm.com;
>> suzuki.poulose@arm.com; oliver.upton@linux.dev; yuzenghui@huawei.com;
>> smostafa@google.com; dbrazdil@google.com; ryan.roberts@arm.com;
>> linux-arm-kernel@lists.infradead.org; kvmarm@lists.linux.dev;
>> iommu@lists.linux.dev; Zhang, Tina <tina.zhang@intel.com>
>> Subject: Re: [RFC PATCH 00/45] KVM: Arm SMMUv3 driver for pKVM
>>
>> Hi Jason,
>>
>> On Fri, Feb 03, 2023 at 08:39:41AM +0000, Chen, Jason CJ wrote:
>>>>>> btw some of my colleagues are porting pKVM to Intel platform. I
>>>>>> believe they will post their work shortly and there might
>>>>>> require some common framework in pKVM hypervisor like iommu
>>>>>> domain, hypercalls, etc. like what we have in the host iommu
>>>>>> subsystem. CC them in case of any early thought they want to
>>>>>> throw in. 😊
>>>>>
>>>>> Cool! The hypervisor part contains iommu/iommu.c which deals with
>>>>> hypercalls and domains and doesn't contain anything specific to
>>>>> Arm (it's only in arch/arm64 because that's where pkvm currently
>>>>> sits). It does rely on io-pgtable at the moment which is not used
>>>>> by VT-d but that can be abstracted as well. It's possible however
>>>>> that on Intel an entirely different set of hypercalls will be
>>>>> needed, if a simpler solution such as sharing page tables fits
>>>>> better because VT-d implementations are more homogeneous.
>>>>>
>>>>
>>>> yes depending on the choice on VT-d there could be different degree
>>>> of the sharing possibility. I'll let Jason/Tina comment on their design
>> choice.
>>>
>>> Thanks Kevin bring us here. Current our POC solution for VT-d is based
>>> on nested translation, as there are two level io-pgtable, we keep
>>> first-level page table full controlled by host VM (IOVA -> host_GPA)
>>> and second-level page table is managed by pKVM (host_GPA -> HPA). This
>>> solution is simple straight-forward, but pKVM still need to provide
>>> vIOMMU emulation for host (e.g., shadowing root/context/ pasid tables,
>> emulating IOTLB flush etc.).
>>
>> I dismissed emulating the SMMU early on because it feels too complex
>> compared to an abstracted hypercall interface, but again that may be due to
>> the high variation of configurations of the SMMU. For nesting, you could use
>> some of the interface that Yi Liu and Jacob Pan have been working on [1]. It
>> should be possible with a couple of attach-table and tlb-invalidate hypercalls
>> to avoid emulating the low-level registers and queues.
>>
>>> As I know, SMMU also support nested translation mode, may I know
>>> what's the mode used for pKVM?
>>
>> It doesn't use nested translation because it is optional in the SMMU, and this
>> series tries to support any possible implementation. Since pKVM on
>> arm64 is being used on mobile platforms I suspect that, to save space, some
>> SMMUs might not implement first-level or second-level page tables.
>> Besides, supporting nesting for Arm would still require hypercalls for pinning
>> DMA pages (solution 2).
>>
>> This series populates the second-level tables with the complete IOVA -> PA
>> translation (similarly to how VFIO works at the moment). If an
>> implementation only supports first-level tables, then the hypervisor would
>> own it and put the IOVA -> PA translation in there.
>>
>> Thanks,
>> Jean
>>
>> [1] https://lore.kernel.org/linux-iommu/1570045363-24856-2-git-send-email-
>> jacob.jun.pan@linux.intel.com/
>>      (It's being reworked but I couldn't find a recent link)
>>
>>>
>>> We met similar solution choices whether to share second-level
>>> io-pgtable with CPU pgtable,  and finally we also decided to introduce
>>> a new pgtable, this increase the complexity of page state management -
>>> as io-pgtable & cpu-pgtable need to align the page ownership.
>>>
>>> Now our solution is based on vIOMMU emulation in pKVM, enlighten
>>> method should also be an alternative solution.
>>>
>>> Thanks
>>> Jason CJ Chen
> 

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 15/45] KVM: arm64: pkvm: Add __pkvm_host_share/unshare_dma()
  2023-02-01 12:52   ` Jean-Philippe Brucker
@ 2023-02-04 12:51     ` tina.zhang
  -1 siblings, 0 replies; 201+ messages in thread
From: tina.zhang @ 2023-02-04 12:51 UTC (permalink / raw)
  To: Jean-Philippe Brucker, maz, catalin.marinas, will, joro
  Cc: robin.murphy, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, smostafa, dbrazdil, ryan.roberts, linux-arm-kernel,
	kvmarm, iommu



On 2/1/23 20:52, Jean-Philippe Brucker wrote:
> Host pages mapped in the SMMU must not be donated to the guest or
> hypervisor, since the host could then use DMA to break confidentiality.
> Mark them shared in the host stage-2 page tables, and keep a refcount in
> the hyp vmemmap.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
> ---
>   arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |   3 +
>   arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 185 ++++++++++++++++++
>   2 files changed, 188 insertions(+)
> 
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> index 021825aee854..a363d58a998b 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> @@ -58,6 +58,7 @@ enum pkvm_component_id {
>   	PKVM_ID_HOST,
>   	PKVM_ID_HYP,
>   	PKVM_ID_GUEST,
> +	PKVM_ID_IOMMU,
>   };
>   
>   extern unsigned long hyp_nr_cpus;
> @@ -72,6 +73,8 @@ int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu);
>   int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu);
>   int __pkvm_guest_share_host(struct pkvm_hyp_vcpu *hyp_vcpu, u64 ipa);
>   int __pkvm_guest_unshare_host(struct pkvm_hyp_vcpu *hyp_vcpu, u64 ipa);
> +int __pkvm_host_share_dma(u64 phys_addr, size_t size, bool is_ram);
> +int __pkvm_host_unshare_dma(u64 phys_addr, size_t size);
>   
>   bool addr_is_memory(phys_addr_t phys);
>   int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot prot);
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index 856673291d70..dcf08ce03790 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -1148,6 +1148,9 @@ static int check_share(struct pkvm_mem_share *share)
>   	case PKVM_ID_GUEST:
>   		ret = guest_ack_share(completer_addr, tx, share->completer_prot);
>   		break;
> +	case PKVM_ID_IOMMU:
> +		ret = 0;
> +		break;
>   	default:
>   		ret = -EINVAL;
>   	}
> @@ -1185,6 +1188,9 @@ static int __do_share(struct pkvm_mem_share *share)
>   	case PKVM_ID_GUEST:
>   		ret = guest_complete_share(completer_addr, tx, share->completer_prot);
>   		break;
> +	case PKVM_ID_IOMMU:
> +		ret = 0;
> +		break;
>   	default:
>   		ret = -EINVAL;
>   	}
> @@ -1239,6 +1245,9 @@ static int check_unshare(struct pkvm_mem_share *share)
>   	case PKVM_ID_HYP:
>   		ret = hyp_ack_unshare(completer_addr, tx);
>   		break;
> +	case PKVM_ID_IOMMU:
> +		ret = 0;
> +		break;
>   	default:
>   		ret = -EINVAL;
>   	}
> @@ -1273,6 +1282,9 @@ static int __do_unshare(struct pkvm_mem_share *share)
>   	case PKVM_ID_HYP:
>   		ret = hyp_complete_unshare(completer_addr, tx);
>   		break;
> +	case PKVM_ID_IOMMU:
> +		ret = 0;
> +		break;
>   	default:
>   		ret = -EINVAL;
>   	}
> @@ -1633,6 +1645,179 @@ void hyp_unpin_shared_mem(void *from, void *to)
>   	host_unlock_component();
>   }
>   
> +static int __host_check_page_dma_shared(phys_addr_t phys_addr)
> +{
> +	int ret;
> +	u64 hyp_addr;
> +
> +	/*
> +	 * The page is already refcounted. Make sure it's owned by the host, and
> +	 * not part of the hyp pool.
> +	 */
> +	ret = __host_check_page_state_range(phys_addr, PAGE_SIZE,
> +					    PKVM_PAGE_SHARED_OWNED);
> +	if (ret)
> +		return ret;
> +
> +	/*
> +	 * Refcounted and owned by host, means it's either mapped in the
> +	 * SMMU, or it's some VM/VCPU state shared with the hypervisor.
> +	 * The host has no reason to use a page for both.
> +	 */
> +	hyp_addr = (u64)hyp_phys_to_virt(phys_addr);
> +	return __hyp_check_page_state_range(hyp_addr, PAGE_SIZE, PKVM_NOPAGE);
> +}
> +
> +static int __pkvm_host_share_dma_page(phys_addr_t phys_addr, bool is_ram)
> +{
> +	int ret;
> +	struct hyp_page *p = hyp_phys_to_page(phys_addr);
> +	struct pkvm_mem_share share = {
> +		.tx	= {
> +			.nr_pages	= 1,
> +			.initiator	= {
> +				.id	= PKVM_ID_HOST,
> +				.addr	= phys_addr,
> +			},
> +			.completer	= {
> +				.id	= PKVM_ID_IOMMU,
> +			},
> +		},
> +	};
> +
> +	hyp_assert_lock_held(&host_mmu.lock);
> +	hyp_assert_lock_held(&pkvm_pgd_lock);
> +
> +	/*
> +	 * Some differences between handling of RAM and device memory:
> +	 * - The hyp vmemmap area for device memory is not backed by physical
> +	 *   pages in the hyp page tables.
> +	 * - Device memory is unmapped automatically under memory pressure
> +	 *   (host_stage2_try()) and the ownership information would be
> +	 *   discarded.
> +	 * We don't need to deal with that at the moment, because the host
> +	 * cannot share or donate device memory, only RAM.
> +	 *
> +	 * Since 'is_ram' is only a hint provided by the host, we do need to
> +	 * make sure of it.
> +	 */
> +	if (!is_ram)
> +		return addr_is_memory(phys_addr) ? -EINVAL : 0;
> +
> +	ret = hyp_page_ref_inc_return(p);
> +	BUG_ON(ret == 0);
> +	if (ret < 0)
> +		return ret;
> +	else if (ret == 1)
> +		ret = do_share(&share);
> +	else
> +		ret = __host_check_page_dma_shared(phys_addr);
> +
> +	if (ret)
> +		hyp_page_ref_dec(p);
> +
> +	return ret;
> +}
> +
> +static int __pkvm_host_unshare_dma_page(phys_addr_t phys_addr)
> +{
> +	struct hyp_page *p = hyp_phys_to_page(phys_addr);
> +	struct pkvm_mem_share share = {
> +		.tx	= {
> +			.nr_pages	= 1,
> +			.initiator	= {
> +				.id	= PKVM_ID_HOST,
> +				.addr	= phys_addr,
> +			},
> +			.completer	= {
> +				.id	= PKVM_ID_IOMMU,
> +			},
> +		},
> +	};
> +
> +	hyp_assert_lock_held(&host_mmu.lock);
> +	hyp_assert_lock_held(&pkvm_pgd_lock);
> +
> +	if (!addr_is_memory(phys_addr))
> +		return 0;
> +
> +	if (!hyp_page_ref_dec_and_test(p))
> +		return 0;
> +
> +	return do_unshare(&share);
> +}
> +
> +/*
> + * __pkvm_host_share_dma - Mark host memory as used for DMA
> + * @phys_addr:	physical address of the DMA region
> + * @size:	size of the DMA region
> + * @is_ram:	whether it is RAM or device memory
> + *
> + * We must not allow the host to donate pages that are mapped in the IOMMU for
> + * DMA. So:
> + * 1. Mark the host S2 entry as being owned by IOMMU
> + * 2. Refcount it, since a page may be mapped in multiple device address spaces.
> + *
> + * At some point we may end up needing more than the current 16 bits for
> + * refcounting, for example if all devices and sub-devices map the same MSI
> + * doorbell page. It will do for now.
> + */
> +int __pkvm_host_share_dma(phys_addr_t phys_addr, size_t size, bool is_ram)
> +{
> +	int i;
> +	int ret;
> +	size_t nr_pages = size >> PAGE_SHIFT;
> +
> +	if (WARN_ON(!PAGE_ALIGNED(phys_addr | size)))
> +		return -EINVAL;
> +
> +	host_lock_component();
> +	hyp_lock_component();
> +
> +	for (i = 0; i < nr_pages; i++) {
> +		ret = __pkvm_host_share_dma_page(phys_addr + i * PAGE_SIZE,
> +						 is_ram);
Hi Jean,

I'm not familiar with the Arm architecture, so just out of curiosity: if
pKVM-ARM populates the host stage-2 page table lazily, could a device
driver in the host trigger DMA to pages that have not been mapped into the
host stage-2 page table yet? How do we handle this situation?

Regards,
-Tina

> +		if (ret)
> +			break;
> +	}
> +
> +	if (ret) {
> +		for (--i; i >= 0; --i)
> +			__pkvm_host_unshare_dma_page(phys_addr + i * PAGE_SIZE);
> +	}
> +
> +	hyp_unlock_component();
> +	host_unlock_component();
> +
> +	return ret;
> +}
> +
> +int __pkvm_host_unshare_dma(phys_addr_t phys_addr, size_t size)
> +{
> +	int i;
> +	int ret;
> +	size_t nr_pages = size >> PAGE_SHIFT;
> +
> +	host_lock_component();
> +	hyp_lock_component();
> +
> +	/*
> +	 * We end up here after the caller successfully unmapped the page from
> +	 * the IOMMU table. Which means that a ref is held, the page is shared
> +	 * in the host s2, there can be no failure.
> +	 */
> +	for (i = 0; i < nr_pages; i++) {
> +		ret = __pkvm_host_unshare_dma_page(phys_addr + i * PAGE_SIZE);
> +		if (ret)
> +			break;
> +	}
> +
> +	hyp_unlock_component();
> +	host_unlock_component();
> +
> +	return ret;
> +}
> +
>   int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu)
>   {
>   	int ret;

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 15/45] KVM: arm64: pkvm: Add __pkvm_host_share/unshare_dma()
  2023-02-04 12:51     ` tina.zhang
@ 2023-02-06 12:13       ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-06 12:13 UTC (permalink / raw)
  To: tina.zhang
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, smostafa, dbrazdil,
	ryan.roberts, linux-arm-kernel, kvmarm, iommu

Hi Tina,

On Sat, Feb 04, 2023 at 08:51:38PM +0800, tina.zhang wrote:
> > +int __pkvm_host_share_dma(phys_addr_t phys_addr, size_t size, bool is_ram)
> > +{
> > +	int i;
> > +	int ret;
> > +	size_t nr_pages = size >> PAGE_SHIFT;
> > +
> > +	if (WARN_ON(!PAGE_ALIGNED(phys_addr | size)))
> > +		return -EINVAL;
> > +
> > +	host_lock_component();
> > +	hyp_lock_component();
> > +
> > +	for (i = 0; i < nr_pages; i++) {
> > +		ret = __pkvm_host_share_dma_page(phys_addr + i * PAGE_SIZE,
> > +						 is_ram);
> Hi Jean,
> 
> I'm not familiar with the ARM architecture; just out of curiosity: if pKVM-ARM
> populates the host stage-2 page table lazily, would there be a case where a
> device driver in the host triggers DMA to pages which have not yet been mapped
> in the host stage-2 page table? How do we handle this situation?

It's possible that the host asks the hypervisor to map on the IOMMU side
a page that is not yet mapped on the CPU side. In general before calling
map_pages() the host zero-initializes the page, triggering a page fault
which creates the mapping, so this case is rare. But if it happens,
__pkvm_host_share_dma() will create the CPU stage-2 mapping:

 __pkvm_host_share_dma()
  do_share()
   host_initiate_share()
    host_stage2_idmap_locked()

which creates a valid identity mapping, along with the ownership
information PKVM_PAGE_SHARED_OWNED. That ownership info is really all we
need here, to prevent future donations to guests or hyp. Since the SMMU
side uses separate stage-2 page tables, we don't actually need to create a
valid mapping on the CPU side yet, that's just how pKVM's mem_protect
currently works. But I don't think it hurts to create the mapping right
away instead of waiting for the CPU page fault, because the host will
likely access the page soon to read what the device wrote there.
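
For illustration only, here is a minimal userspace model of the scheme
described above: ownership information in the host stage-2 plus a per-page
refcount, which together prevent donating a page while DMA mappings exist.
This is not the actual pKVM mem_protect code; the names, states and the toy
"stage-2" array are made up:

#include <assert.h>
#include <stdio.h>

enum page_state { PAGE_OWNED, PAGE_SHARED_OWNED, PAGE_DONATED };

struct page_info {
	enum page_state state;
	unsigned int dma_refs;		/* IOMMU address spaces mapping the page */
};

static struct page_info host_s2[16];	/* toy model of the host stage-2 */

/* Share a page for DMA: refuse if it was donated, otherwise take a ref. */
static int share_dma_page(unsigned long pfn)
{
	struct page_info *p = &host_s2[pfn];

	if (p->state == PAGE_DONATED)
		return -1;
	p->state = PAGE_SHARED_OWNED;	/* "ownership info" in the stage-2 */
	p->dma_refs++;			/* page may be mapped by several devices */
	return 0;
}

/* Donation must fail while any DMA mapping still references the page. */
static int donate_page(unsigned long pfn)
{
	struct page_info *p = &host_s2[pfn];

	if (p->state == PAGE_SHARED_OWNED)
		return -1;
	p->state = PAGE_DONATED;
	return 0;
}

static void unshare_dma_page(unsigned long pfn)
{
	struct page_info *p = &host_s2[pfn];

	assert(p->state == PAGE_SHARED_OWNED && p->dma_refs);
	if (--p->dma_refs == 0)
		p->state = PAGE_OWNED;	/* donation becomes possible again */
}

int main(void)
{
	assert(share_dma_page(3) == 0);
	assert(share_dma_page(3) == 0);	/* second device maps the same page */
	assert(donate_page(3) == -1);	/* refused: still DMA-mapped */
	unshare_dma_page(3);
	assert(donate_page(3) == -1);	/* refused: one mapping left */
	unshare_dma_page(3);
	assert(donate_page(3) == 0);	/* all DMA mappings gone, donation ok */
	printf("ok\n");
	return 0;
}

Built with a plain C compiler, the asserts show that donation only succeeds
once every DMA mapping of the page has been torn down.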

Thanks,
Jean

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 15/45] KVM: arm64: pkvm: Add __pkvm_host_share/unshare_dma()
  2023-02-06 12:13       ` Jean-Philippe Brucker
@ 2023-02-07  2:37         ` tina.zhang
  -1 siblings, 0 replies; 201+ messages in thread
From: tina.zhang @ 2023-02-07  2:37 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, smostafa, dbrazdil,
	ryan.roberts, linux-arm-kernel, kvmarm, iommu

Hi Jean,

On 2/6/23 20:13, Jean-Philippe Brucker wrote:
> Hi Tina,
> 
> On Sat, Feb 04, 2023 at 08:51:38PM +0800, tina.zhang wrote:
>>> +int __pkvm_host_share_dma(phys_addr_t phys_addr, size_t size, bool is_ram)
>>> +{
>>> +	int i;
>>> +	int ret;
>>> +	size_t nr_pages = size >> PAGE_SHIFT;
>>> +
>>> +	if (WARN_ON(!PAGE_ALIGNED(phys_addr | size)))
>>> +		return -EINVAL;
>>> +
>>> +	host_lock_component();
>>> +	hyp_lock_component();
>>> +
>>> +	for (i = 0; i < nr_pages; i++) {
>>> +		ret = __pkvm_host_share_dma_page(phys_addr + i * PAGE_SIZE,
>>> +						 is_ram);
>> Hi Jean,
>>
>> I'm not familiar with the ARM architecture; just out of curiosity: if pKVM-ARM
>> populates the host stage-2 page table lazily, would there be a case where a
>> device driver in the host triggers DMA to pages which have not yet been mapped
>> in the host stage-2 page table? How do we handle this situation?
> 
> It's possible that the host asks the hypervisor to map on the IOMMU side
> a page that is not yet mapped on the CPU side. In general before calling
> map_pages() the host zero-initializes the page, triggering a page fault
> which creates the mapping, so this case is rare. But if it happens,
> __pkvm_host_share_dma() will create the CPU stage-2 mapping:
> 
>   __pkvm_host_share_dma()
>    do_share()
>     host_initiate_share()
>      host_stage2_idmap_locked()
> 
> which creates a valid identity mapping, along with the ownership
> information PKVM_PAGE_SHARED_OWNED. That ownership info is really all we
> need here, to prevent future donations to guests or hyp. Since the SMMU
> side uses separate stage-2 page tables, we don't actually need to create a
> valid mapping on the CPU side yet, that's just how pKVM's mem_protect
> currently works. But I don't think it hurts to create the mapping right
> away instead of waiting for the CPU page fault, because the host will
> likely access the page soon to read what the device wrote there.
Right. I was wondering whether, with the lazy stage-2 mapping approach,
there could be a case where check_share() doesn't pass because the pte is
not yet valid at that point. After checking the logic, I see we check
kvm_pte_valid(pte) as well as the value of the pte, so check_share() can
return successfully even without a host CPU page fault having been
triggered. Thanks for elaborating.

Regards,
-Tina

> 
> Thanks,
> Jean
> 

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 15/45] KVM: arm64: pkvm: Add __pkvm_host_share/unshare_dma()
  2023-02-07  2:37         ` tina.zhang
@ 2023-02-07 10:39           ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-07 10:39 UTC (permalink / raw)
  To: tina.zhang
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, smostafa, dbrazdil,
	ryan.roberts, linux-arm-kernel, kvmarm, iommu

On Tue, Feb 07, 2023 at 10:37:51AM +0800, tina.zhang wrote:
> Hi Jean,
> 
> On 2/6/23 20:13, Jean-Philippe Brucker wrote:
> > Hi Tina,
> > 
> > On Sat, Feb 04, 2023 at 08:51:38PM +0800, tina.zhang wrote:
> > > > +int __pkvm_host_share_dma(phys_addr_t phys_addr, size_t size, bool is_ram)
> > > > +{
> > > > +	int i;
> > > > +	int ret;
> > > > +	size_t nr_pages = size >> PAGE_SHIFT;
> > > > +
> > > > +	if (WARN_ON(!PAGE_ALIGNED(phys_addr | size)))
> > > > +		return -EINVAL;
> > > > +
> > > > +	host_lock_component();
> > > > +	hyp_lock_component();
> > > > +
> > > > +	for (i = 0; i < nr_pages; i++) {
> > > > +		ret = __pkvm_host_share_dma_page(phys_addr + i * PAGE_SIZE,
> > > > +						 is_ram);
> > > Hi Jean,
> > > 
> > > I'm not familiar with the ARM architecture; just out of curiosity: if pKVM-ARM
> > > populates the host stage-2 page table lazily, would there be a case where a
> > > device driver in the host triggers DMA to pages which have not yet been mapped
> > > in the host stage-2 page table? How do we handle this situation?
> > 
> > It's possible that the host asks the hypervisor to map on the IOMMU side
> > a page that is not yet mapped on the CPU side. In general before calling
> > map_pages() the host zero-initializes the page, triggering a page fault
> > which creates the mapping, so this case is rare. But if it happens,
> > __pkvm_host_share_dma() will create the CPU stage-2 mapping:
> > 
> >   __pkvm_host_share_dma()
> >    do_share()
> >     host_initiate_share()
> >      host_stage2_idmap_locked()
> > 
> > which creates a valid identity mapping, along with the ownership
> > information PKVM_PAGE_SHARED_OWNED. That ownership info is really all we
> > need here, to prevent future donations to guests or hyp. Since the SMMU
> > side uses separate stage-2 page tables, we don't actually need to create a
> > valid mapping on the CPU side yet, that's just how pKVM's mem_protect
> > currently works. But I don't think it hurts to create the mapping right
> > away instead of waiting for the CPU page fault, because the host will
> > likely access the page soon to read what the device wrote there.
> Right. I was wondering whether, with the lazy stage-2 mapping approach,
> there could be a case where check_share() doesn't pass because the pte is
> not yet valid at that point. After checking the logic, I see we check
> kvm_pte_valid(pte) as well as the value of the pte, so check_share() can
> return successfully even without a host CPU page fault having been
> triggered.

Yes, during check_share(), host_get_page_state() allows a disabled PTE
unless the PTE already contains ownership information (in particular, if
the page has been donated to a guest or to the hypervisor, the PTE is
invalid and contains a non-zero owner ID).
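
A rough sketch of that check, with a made-up PTE layout and made-up names
(host_page_state(), check_share_ok()), not the real pKVM encoding: an empty,
invalid PTE is treated as host-owned and shareable, while an invalid PTE
carrying a non-zero owner ID means the page was donated and the share is
refused.

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_VALID		(1ULL << 0)
#define PTE_OWNER_ID(pte)	(((pte) >> 1) & 0xf)	/* assumed SW bits */

enum page_state_model { STATE_OWNED, STATE_SHARED, STATE_NOPAGE };

/* Model of the state lookup: an invalid, all-zero PTE is still host-owned. */
static enum page_state_model host_page_state(uint64_t pte)
{
	if (!(pte & PTE_VALID))
		return PTE_OWNER_ID(pte) ? STATE_NOPAGE : STATE_OWNED;
	/* A valid PTE would encode OWNED vs SHARED in its software bits. */
	return STATE_OWNED;
}

static bool check_share_ok(uint64_t pte)
{
	return host_page_state(pte) == STATE_OWNED;
}

int main(void)
{
	assert(check_share_ok(0));		/* never faulted in: ok to share */
	assert(check_share_ok(PTE_VALID));	/* mapped, host-owned: ok */
	assert(!check_share_ok(2ULL << 1));	/* invalid + owner ID 2: donated */
	return 0;
}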

Thanks,
Jean


^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 05/45] iommu/io-pgtable: Split io_pgtable structure
  2023-02-01 12:52   ` Jean-Philippe Brucker
  (?)
@ 2023-02-07 12:16   ` Mostafa Saleh
  2023-02-08 18:01       ` Jean-Philippe Brucker
  -1 siblings, 1 reply; 201+ messages in thread
From: Mostafa Saleh @ 2023-02-07 12:16 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, dbrazdil, ryan.roberts,
	linux-arm-kernel, kvmarm, iommu, Abhinav Kumar,
	Alyssa Rosenzweig, Andy Gross, Bjorn Andersson, Daniel Vetter,
	David Airlie, Dmitry Baryshkov, Hector Martin, Konrad Dybcio,
	Matthias Brugger, Rob Clark, Rob Herring, Sean Paul,
	Steven Price, Suravee Suthikulpanit, Sven Peter, Tomeu Vizoso,
	Yong Wu

Hi Jean,

On Wed, Feb 01, 2023 at 12:52:49PM +0000, Jean-Philippe Brucker wrote:
> The io_pgtable structure contains all information needed for io-pgtable
> ops map() and unmap(), including a static configuration, driver-facing
> ops, TLB callbacks and the PGD pointer. Most of these are common to all
> sets of page tables for a given configuration, and really only need one
> instance.
> 
> Split the structure in two:
> 
> * io_pgtable_params contains information that is common to all sets of
>   page tables for a given io_pgtable_cfg.
> * io_pgtable contains information that is different for each set of page
>   tables, namely the PGD and the IOMMU driver cookie passed to TLB
>   callbacks.
> 
> Keep essentially the same interface for IOMMU drivers, but move it
> behind a set of helpers.
> 
> The goal is to optimize for space, in order to allocate less memory in
> the KVM SMMU driver. While storing 64k io-pgtables with identical
> configuration would previously require 10MB, it is now 512kB because the
> driver only needs to store the pgd for each domain.
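
As a back-of-the-envelope check of these numbers, assuming roughly 160 bytes
per domain for the old embedded struct io_pgtable (including its io_pgtable_cfg
copy) and 8 bytes for a bare pgd pointer; the exact sizes depend on the
configuration and architecture:

#include <stdio.h>

int main(void)
{
	const unsigned long domains = 64 * 1024;
	/* assumed sizes: old struct io_pgtable with embedded cfg vs bare pgd */
	const unsigned long old_bytes = 160, new_bytes = 8;

	printf("before: %lu MB\n", domains * old_bytes / (1024 * 1024));	/* 10 */
	printf("after:  %lu kB\n", domains * new_bytes / 1024);			/* 512 */
	return 0;
}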
> 
> Note that the io_pgtable_cfg still contains the TTBRs, which are
> specific to a set of page tables. Most of them can be removed, since
> IOMMU drivers can trivially obtain them with virt_to_phys(iop->pgd).
> Some architectures do have static configuration bits in the TTBR that
> need to be kept.
> 
> Unfortunately the split does add an additional dereference which
> degrades performance slightly. Running a single-threaded dma-map
> benchmark on a server with SMMUv3, I measured a regression of 7-9ns for
> map() and 32-78ns for unmap(), which is a slowdown of about 4% and 8%
> respectively.
> 
> Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
> Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
> Cc: Andy Gross <agross@kernel.org>
> Cc: Bjorn Andersson <andersson@kernel.org>
> Cc: Daniel Vetter <daniel@ffwll.ch>
> Cc: David Airlie <airlied@gmail.com>
> Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> Cc: Hector Martin <marcan@marcan.st>
> Cc: Konrad Dybcio <konrad.dybcio@linaro.org>
> Cc: Matthias Brugger <matthias.bgg@gmail.com>
> Cc: Rob Clark <robdclark@gmail.com>
> Cc: Rob Herring <robh@kernel.org>
> Cc: Sean Paul <sean@poorly.run>
> Cc: Steven Price <steven.price@arm.com>
> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> Cc: Sven Peter <sven@svenpeter.dev>
> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
> Cc: Yong Wu <yong.wu@mediatek.com>
> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
> ---
>  drivers/gpu/drm/panfrost/panfrost_device.h  |   2 +-
>  drivers/iommu/amd/amd_iommu_types.h         |  17 +-
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |   3 +-
>  drivers/iommu/arm/arm-smmu/arm-smmu.h       |   2 +-
>  include/linux/io-pgtable-arm.h              |  12 +-
>  include/linux/io-pgtable.h                  |  94 +++++++---
>  drivers/gpu/drm/msm/msm_iommu.c             |  21 ++-
>  drivers/gpu/drm/panfrost/panfrost_mmu.c     |  20 +--
>  drivers/iommu/amd/io_pgtable.c              |  26 +--
>  drivers/iommu/amd/io_pgtable_v2.c           |  43 ++---
>  drivers/iommu/amd/iommu.c                   |  28 ++-
>  drivers/iommu/apple-dart.c                  |  36 ++--
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  34 ++--
>  drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c  |   7 +-
>  drivers/iommu/arm/arm-smmu/arm-smmu.c       |  40 ++---
>  drivers/iommu/arm/arm-smmu/qcom_iommu.c     |  40 ++---
>  drivers/iommu/io-pgtable-arm-common.c       |  80 +++++----
>  drivers/iommu/io-pgtable-arm-v7s.c          | 189 ++++++++++----------
>  drivers/iommu/io-pgtable-arm.c              | 158 ++++++++--------
>  drivers/iommu/io-pgtable-dart.c             |  97 +++++-----
>  drivers/iommu/io-pgtable.c                  |  36 ++--
>  drivers/iommu/ipmmu-vmsa.c                  |  18 +-
>  drivers/iommu/msm_iommu.c                   |  17 +-
>  drivers/iommu/mtk_iommu.c                   |  13 +-
>  24 files changed, 519 insertions(+), 514 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h
> index 8b25278f34c8..8a610c4b8f03 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_device.h
> +++ b/drivers/gpu/drm/panfrost/panfrost_device.h
> @@ -126,7 +126,7 @@ struct panfrost_mmu {
>  	struct panfrost_device *pfdev;
>  	struct kref refcount;
>  	struct io_pgtable_cfg pgtbl_cfg;
> -	struct io_pgtable_ops *pgtbl_ops;
> +	struct io_pgtable pgtbl;
>  	struct drm_mm mm;
>  	spinlock_t mm_lock;
>  	int as;
> diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
> index 3d684190b4d5..5920a556f7ec 100644
> --- a/drivers/iommu/amd/amd_iommu_types.h
> +++ b/drivers/iommu/amd/amd_iommu_types.h
> @@ -516,10 +516,10 @@ struct amd_irte_ops;
>  #define AMD_IOMMU_FLAG_TRANS_PRE_ENABLED      (1 << 0)
>  
>  #define io_pgtable_to_data(x) \
> -	container_of((x), struct amd_io_pgtable, iop)
> +	container_of((x), struct amd_io_pgtable, iop_params)
>  
>  #define io_pgtable_ops_to_data(x) \
> -	io_pgtable_to_data(io_pgtable_ops_to_pgtable(x))
> +	io_pgtable_to_data(io_pgtable_ops_to_params(x))
>  
>  #define io_pgtable_ops_to_domain(x) \
>  	container_of(io_pgtable_ops_to_data(x), \
> @@ -529,12 +529,13 @@ struct amd_irte_ops;
>  	container_of((x), struct amd_io_pgtable, pgtbl_cfg)
>  
>  struct amd_io_pgtable {
> -	struct io_pgtable_cfg	pgtbl_cfg;
> -	struct io_pgtable	iop;
> -	int			mode;
> -	u64			*root;
> -	atomic64_t		pt_root;	/* pgtable root and pgtable mode */
> -	u64			*pgd;		/* v2 pgtable pgd pointer */
> +	struct io_pgtable_cfg		pgtbl_cfg;
> +	struct io_pgtable		iop;
> +	struct io_pgtable_params	iop_params;
> +	int				mode;
> +	u64				*root;
> +	atomic64_t			pt_root;	/* pgtable root and pgtable mode */
> +	u64				*pgd;		/* v2 pgtable pgd pointer */
>  };
>  
>  /*
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index 8d772ea8a583..cec3c8103404 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -10,6 +10,7 @@
>  
>  #include <linux/bitfield.h>
>  #include <linux/iommu.h>
> +#include <linux/io-pgtable.h>
>  #include <linux/kernel.h>
>  #include <linux/mmzone.h>
>  #include <linux/sizes.h>
> @@ -710,7 +711,7 @@ struct arm_smmu_domain {
>  	struct arm_smmu_device		*smmu;
>  	struct mutex			init_mutex; /* Protects smmu pointer */
>  
> -	struct io_pgtable_ops		*pgtbl_ops;
> +	struct io_pgtable		pgtbl;
>  	bool				stall_enabled;
>  	atomic_t			nr_ats_masters;
>  
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> index 703fd5817ec1..249825fc71ac 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> @@ -366,7 +366,7 @@ enum arm_smmu_domain_stage {
>  
>  struct arm_smmu_domain {
>  	struct arm_smmu_device		*smmu;
> -	struct io_pgtable_ops		*pgtbl_ops;
> +	struct io_pgtable		pgtbl;
>  	unsigned long			pgtbl_quirks;
>  	const struct iommu_flush_ops	*flush_ops;
>  	struct arm_smmu_cfg		cfg;
> diff --git a/include/linux/io-pgtable-arm.h b/include/linux/io-pgtable-arm.h
> index 42202bc0ffa2..5199bd9851b6 100644
> --- a/include/linux/io-pgtable-arm.h
> +++ b/include/linux/io-pgtable-arm.h
> @@ -9,13 +9,11 @@ extern bool selftest_running;
>  typedef u64 arm_lpae_iopte;
>  
>  struct arm_lpae_io_pgtable {
> -	struct io_pgtable	iop;
> +	struct io_pgtable_params	iop;
>  
> -	int			pgd_bits;
> -	int			start_level;
> -	int			bits_per_level;
> -
> -	void			*pgd;
> +	int				pgd_bits;
> +	int				start_level;
> +	int				bits_per_level;
>  };
>  
>  /* Struct accessors */
> @@ -23,7 +21,7 @@ struct arm_lpae_io_pgtable {
>  	container_of((x), struct arm_lpae_io_pgtable, iop)
>  
>  #define io_pgtable_ops_to_data(x)					\
> -	io_pgtable_to_data(io_pgtable_ops_to_pgtable(x))
> +	io_pgtable_to_data(io_pgtable_ops_to_params(x))
>  
>  /*
>   * Calculate the right shift amount to get to the portion describing level l
> diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
> index ee6484d7a5e0..cce5ddbf71c7 100644
> --- a/include/linux/io-pgtable.h
> +++ b/include/linux/io-pgtable.h
> @@ -149,6 +149,20 @@ struct io_pgtable_cfg {
>  	};
>  };
>  
> +/**
> + * struct io_pgtable - Structure describing a set of page tables.
> + *
> + * @ops:	The page table operations in use for this set of page tables.
> + * @cookie:	An opaque token provided by the IOMMU driver and passed back to
> + *		any callback routines.
> + * @pgd:	Virtual address of the page directory.
> + */
> +struct io_pgtable {
> +	struct io_pgtable_ops	*ops;
> +	void			*cookie;
> +	void			*pgd;
> +};
> +
>  /**
>   * struct io_pgtable_ops - Page table manipulation API for IOMMU drivers.
>   *
> @@ -160,36 +174,64 @@ struct io_pgtable_cfg {
>   * the same names.
>   */
>  struct io_pgtable_ops {
> -	int (*map_pages)(struct io_pgtable_ops *ops, unsigned long iova,
> +	int (*map_pages)(struct io_pgtable *iop, unsigned long iova,
>  			 phys_addr_t paddr, size_t pgsize, size_t pgcount,
>  			 int prot, gfp_t gfp, size_t *mapped);
> -	size_t (*unmap_pages)(struct io_pgtable_ops *ops, unsigned long iova,
> +	size_t (*unmap_pages)(struct io_pgtable *iop, unsigned long iova,
>  			      size_t pgsize, size_t pgcount,
>  			      struct iommu_iotlb_gather *gather);
> -	phys_addr_t (*iova_to_phys)(struct io_pgtable_ops *ops,
> -				    unsigned long iova);
> +	phys_addr_t (*iova_to_phys)(struct io_pgtable *iop, unsigned long iova);
>  };
>  
> +static inline int
> +iopt_map_pages(struct io_pgtable *iop, unsigned long iova, phys_addr_t paddr,
> +	       size_t pgsize, size_t pgcount, int prot, gfp_t gfp,
> +	       size_t *mapped)
> +{
> +	if (!iop->ops || !iop->ops->map_pages)
> +		return -EINVAL;
> +	return iop->ops->map_pages(iop, iova, paddr, pgsize, pgcount, prot, gfp,
> +				   mapped);
> +}
> +
> +static inline size_t
> +iopt_unmap_pages(struct io_pgtable *iop, unsigned long iova, size_t pgsize,
> +		 size_t pgcount, struct iommu_iotlb_gather *gather)
> +{
> +	if (!iop->ops || !iop->ops->map_pages)
Should this be !iop->ops->unmap_pages?


> +		return 0;
> +	return iop->ops->unmap_pages(iop, iova, pgsize, pgcount, gather);
> +}
> +
> +static inline phys_addr_t
> +iopt_iova_to_phys(struct io_pgtable *iop, unsigned long iova)
> +{
> +	if (!iop->ops || !iop->ops->iova_to_phys)
> +		return 0;
> +	return iop->ops->iova_to_phys(iop, iova);
> +}
> +
>  /**
>   * alloc_io_pgtable_ops() - Allocate a page table allocator for use by an IOMMU.
>   *
> + * @iop:    The page table object, filled with the allocated ops on success
>   * @cfg:    The page table configuration. This will be modified to represent
>   *          the configuration actually provided by the allocator (e.g. the
>   *          pgsize_bitmap may be restricted).
>   * @cookie: An opaque token provided by the IOMMU driver and passed back to
>   *          the callback routines in cfg->tlb.
>   */
> -struct io_pgtable_ops *alloc_io_pgtable_ops(struct io_pgtable_cfg *cfg,
> -					    void *cookie);
> +int alloc_io_pgtable_ops(struct io_pgtable *iop, struct io_pgtable_cfg *cfg,
> +			 void *cookie);
>  
>  /**
> - * free_io_pgtable_ops() - Free an io_pgtable_ops structure. The caller
> + * free_io_pgtable_ops() - Free the page table. The caller
>   *                         *must* ensure that the page table is no longer
>   *                         live, but the TLB can be dirty.
>   *
> - * @ops: The ops returned from alloc_io_pgtable_ops.
> + * @iop: The iop object passed to alloc_io_pgtable_ops
>   */
> -void free_io_pgtable_ops(struct io_pgtable_ops *ops);
> +void free_io_pgtable_ops(struct io_pgtable *iop);
>  
>  /**
>   * io_pgtable_configure - Create page table config
> @@ -209,42 +251,41 @@ int io_pgtable_configure(struct io_pgtable_cfg *cfg, size_t *pgd_size);
>   */
>  
>  /**
> - * struct io_pgtable - Internal structure describing a set of page tables.
> + * struct io_pgtable_params - Internal structure describing parameters for a
> + *			      given page table configuration
>   *
> - * @cookie: An opaque token provided by the IOMMU driver and passed back to
> - *          any callback routines.
>   * @cfg:    A copy of the page table configuration.
>   * @ops:    The page table operations in use for this set of page tables.
>   */
> -struct io_pgtable {
> -	void			*cookie;
> +struct io_pgtable_params {
>  	struct io_pgtable_cfg	cfg;
>  	struct io_pgtable_ops	ops;
>  };
>  
> -#define io_pgtable_ops_to_pgtable(x) container_of((x), struct io_pgtable, ops)
> +#define io_pgtable_ops_to_params(x) container_of((x), struct io_pgtable_params, ops)
>  
> -static inline void io_pgtable_tlb_flush_all(struct io_pgtable *iop)
> +static inline void io_pgtable_tlb_flush_all(struct io_pgtable_cfg *cfg,
> +					    struct io_pgtable *iop)
>  {
> -	if (iop->cfg.tlb && iop->cfg.tlb->tlb_flush_all)
> -		iop->cfg.tlb->tlb_flush_all(iop->cookie);
> +	if (cfg->tlb && cfg->tlb->tlb_flush_all)
> +		cfg->tlb->tlb_flush_all(iop->cookie);
>  }
>  
>  static inline void
> -io_pgtable_tlb_flush_walk(struct io_pgtable *iop, unsigned long iova,
> -			  size_t size, size_t granule)
> +io_pgtable_tlb_flush_walk(struct io_pgtable_cfg *cfg, struct io_pgtable *iop,
> +			  unsigned long iova, size_t size, size_t granule)
>  {
> -	if (iop->cfg.tlb && iop->cfg.tlb->tlb_flush_walk)
> -		iop->cfg.tlb->tlb_flush_walk(iova, size, granule, iop->cookie);
> +	if (cfg->tlb && cfg->tlb->tlb_flush_walk)
> +		cfg->tlb->tlb_flush_walk(iova, size, granule, iop->cookie);
>  }
>  
>  static inline void
> -io_pgtable_tlb_add_page(struct io_pgtable *iop,
> +io_pgtable_tlb_add_page(struct io_pgtable_cfg *cfg, struct io_pgtable *iop,
>  			struct iommu_iotlb_gather * gather, unsigned long iova,
>  			size_t granule)
>  {
> -	if (iop->cfg.tlb && iop->cfg.tlb->tlb_add_page)
> -		iop->cfg.tlb->tlb_add_page(gather, iova, granule, iop->cookie);
> +	if (cfg->tlb && cfg->tlb->tlb_add_page)
> +		cfg->tlb->tlb_add_page(gather, iova, granule, iop->cookie);
>  }
>  
>  /**
> @@ -256,7 +297,8 @@ io_pgtable_tlb_add_page(struct io_pgtable *iop,
>   * @configure: Create the configuration without allocating anything. Optional.
>   */
>  struct io_pgtable_init_fns {
> -	struct io_pgtable *(*alloc)(struct io_pgtable_cfg *cfg, void *cookie);
> +	int (*alloc)(struct io_pgtable *iop, struct io_pgtable_cfg *cfg,
> +		     void *cookie);
>  	void (*free)(struct io_pgtable *iop);
>  	int (*configure)(struct io_pgtable_cfg *cfg, size_t *pgd_size);
>  };
> diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
> index e9c6f281e3dd..e372ca6cd79c 100644
> --- a/drivers/gpu/drm/msm/msm_iommu.c
> +++ b/drivers/gpu/drm/msm/msm_iommu.c
> @@ -20,7 +20,7 @@ struct msm_iommu {
>  struct msm_iommu_pagetable {
>  	struct msm_mmu base;
>  	struct msm_mmu *parent;
> -	struct io_pgtable_ops *pgtbl_ops;
> +	struct io_pgtable pgtbl;
>  	unsigned long pgsize_bitmap;	/* Bitmap of page sizes in use */
>  	phys_addr_t ttbr;
>  	u32 asid;
> @@ -90,14 +90,14 @@ static int msm_iommu_pagetable_unmap(struct msm_mmu *mmu, u64 iova,
>  		size_t size)
>  {
>  	struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
> -	struct io_pgtable_ops *ops = pagetable->pgtbl_ops;
>  
>  	while (size) {
>  		size_t unmapped, pgsize, count;
>  
>  		pgsize = calc_pgsize(pagetable, iova, iova, size, &count);
>  
> -		unmapped = ops->unmap_pages(ops, iova, pgsize, count, NULL);
> +		unmapped = iopt_unmap_pages(&pagetable->pgtbl, iova, pgsize,
> +					    count, NULL);
>  		if (!unmapped)
>  			break;
>  
> @@ -114,7 +114,7 @@ static int msm_iommu_pagetable_map(struct msm_mmu *mmu, u64 iova,
>  		struct sg_table *sgt, size_t len, int prot)
>  {
>  	struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
> -	struct io_pgtable_ops *ops = pagetable->pgtbl_ops;
> +	struct io_pgtable *iop = &pagetable->pgtbl;
>  	struct scatterlist *sg;
>  	u64 addr = iova;
>  	unsigned int i;
> @@ -129,7 +129,7 @@ static int msm_iommu_pagetable_map(struct msm_mmu *mmu, u64 iova,
>  
>  			pgsize = calc_pgsize(pagetable, addr, phys, size, &count);
>  
> -			ret = ops->map_pages(ops, addr, phys, pgsize, count,
> +			ret = iopt_map_pages(iop, addr, phys, pgsize, count,
>  					     prot, GFP_KERNEL, &mapped);
>  
>  			/* map_pages could fail after mapping some of the pages,
> @@ -163,7 +163,7 @@ static void msm_iommu_pagetable_destroy(struct msm_mmu *mmu)
>  	if (atomic_dec_return(&iommu->pagetables) == 0)
>  		adreno_smmu->set_ttbr0_cfg(adreno_smmu->cookie, NULL);
>  
> -	free_io_pgtable_ops(pagetable->pgtbl_ops);
> +	free_io_pgtable_ops(&pagetable->pgtbl);
>  	kfree(pagetable);
>  }
>  
> @@ -258,11 +258,10 @@ struct msm_mmu *msm_iommu_pagetable_create(struct msm_mmu *parent)
>  	ttbr0_cfg.quirks &= ~IO_PGTABLE_QUIRK_ARM_TTBR1;
>  	ttbr0_cfg.tlb = &null_tlb_ops;
>  
> -	pagetable->pgtbl_ops = alloc_io_pgtable_ops(&ttbr0_cfg, iommu->domain);
> -
> -	if (!pagetable->pgtbl_ops) {
> +	ret = alloc_io_pgtable_ops(&pagetable->pgtbl, &ttbr0_cfg, iommu->domain);
> +	if (ret) {
>  		kfree(pagetable);
> -		return ERR_PTR(-ENOMEM);
> +		return ERR_PTR(ret);
>  	}
>  
>  	/*
> @@ -275,7 +274,7 @@ struct msm_mmu *msm_iommu_pagetable_create(struct msm_mmu *parent)
>  
>  		ret = adreno_smmu->set_ttbr0_cfg(adreno_smmu->cookie, &ttbr0_cfg);
>  		if (ret) {
> -			free_io_pgtable_ops(pagetable->pgtbl_ops);
> +			free_io_pgtable_ops(&pagetable->pgtbl);
>  			kfree(pagetable);
>  			return ERR_PTR(ret);
>  		}
> diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> index 31bdb5d46244..118b49ab120f 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> @@ -290,7 +290,6 @@ static int mmu_map_sg(struct panfrost_device *pfdev, struct panfrost_mmu *mmu,
>  {
>  	unsigned int count;
>  	struct scatterlist *sgl;
> -	struct io_pgtable_ops *ops = mmu->pgtbl_ops;
>  	u64 start_iova = iova;
>  
>  	for_each_sgtable_dma_sg(sgt, sgl, count) {
> @@ -303,8 +302,8 @@ static int mmu_map_sg(struct panfrost_device *pfdev, struct panfrost_mmu *mmu,
>  			size_t pgcount, mapped = 0;
>  			size_t pgsize = get_pgsize(iova | paddr, len, &pgcount);
>  
> -			ops->map_pages(ops, iova, paddr, pgsize, pgcount, prot,
> -				       GFP_KERNEL, &mapped);
> +			iopt_map_pages(&mmu->pgtbl, iova, paddr, pgsize,
> +				       pgcount, prot, GFP_KERNEL, &mapped);
>  			/* Don't get stuck if things have gone wrong */
>  			mapped = max(mapped, pgsize);
>  			iova += mapped;
> @@ -349,7 +348,7 @@ void panfrost_mmu_unmap(struct panfrost_gem_mapping *mapping)
>  	struct panfrost_gem_object *bo = mapping->obj;
>  	struct drm_gem_object *obj = &bo->base.base;
>  	struct panfrost_device *pfdev = to_panfrost_device(obj->dev);
> -	struct io_pgtable_ops *ops = mapping->mmu->pgtbl_ops;
> +	struct io_pgtable *iop = &mapping->mmu->pgtbl;
>  	u64 iova = mapping->mmnode.start << PAGE_SHIFT;
>  	size_t len = mapping->mmnode.size << PAGE_SHIFT;
>  	size_t unmapped_len = 0;
> @@ -366,8 +365,8 @@ void panfrost_mmu_unmap(struct panfrost_gem_mapping *mapping)
>  
>  		if (bo->is_heap)
>  			pgcount = 1;
> -		if (!bo->is_heap || ops->iova_to_phys(ops, iova)) {
> -			unmapped_page = ops->unmap_pages(ops, iova, pgsize, pgcount, NULL);
> +		if (!bo->is_heap || iopt_iova_to_phys(iop, iova)) {
> +			unmapped_page = iopt_unmap_pages(iop, iova, pgsize, pgcount, NULL);
>  			WARN_ON(unmapped_page != pgsize * pgcount);
>  		}
>  		iova += pgsize * pgcount;
> @@ -560,7 +559,7 @@ static void panfrost_mmu_release_ctx(struct kref *kref)
>  	}
>  	spin_unlock(&pfdev->as_lock);
>  
> -	free_io_pgtable_ops(mmu->pgtbl_ops);
> +	free_io_pgtable_ops(&mmu->pgtbl);
>  	drm_mm_takedown(&mmu->mm);
>  	kfree(mmu);
>  }
> @@ -605,6 +604,7 @@ static void panfrost_drm_mm_color_adjust(const struct drm_mm_node *node,
>  
>  struct panfrost_mmu *panfrost_mmu_ctx_create(struct panfrost_device *pfdev)
>  {
> +	int ret;
>  	struct panfrost_mmu *mmu;
>  
>  	mmu = kzalloc(sizeof(*mmu), GFP_KERNEL);
> @@ -631,10 +631,10 @@ struct panfrost_mmu *panfrost_mmu_ctx_create(struct panfrost_device *pfdev)
>  		.iommu_dev	= pfdev->dev,
>  	};
>  
> -	mmu->pgtbl_ops = alloc_io_pgtable_ops(&mmu->pgtbl_cfg, mmu);
> -	if (!mmu->pgtbl_ops) {
> +	ret = alloc_io_pgtable_ops(&mmu->pgtbl, &mmu->pgtbl_cfg, mmu);
> +	if (ret) {
>  		kfree(mmu);
> -		return ERR_PTR(-EINVAL);
> +		return ERR_PTR(ret);
>  	}
>  
>  	kref_init(&mmu->refcount);
> diff --git a/drivers/iommu/amd/io_pgtable.c b/drivers/iommu/amd/io_pgtable.c
> index ace0e9b8b913..f9ea551404ba 100644
> --- a/drivers/iommu/amd/io_pgtable.c
> +++ b/drivers/iommu/amd/io_pgtable.c
> @@ -360,11 +360,11 @@ static void free_clear_pte(u64 *pte, u64 pteval, struct list_head *freelist)
>   * supporting all features of AMD IOMMU page tables like level skipping
>   * and full 64 bit address spaces.
>   */
> -static int iommu_v1_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
> +static int iommu_v1_map_pages(struct io_pgtable *iop, unsigned long iova,
>  			      phys_addr_t paddr, size_t pgsize, size_t pgcount,
>  			      int prot, gfp_t gfp, size_t *mapped)
>  {
> -	struct protection_domain *dom = io_pgtable_ops_to_domain(ops);
> +	struct protection_domain *dom = io_pgtable_ops_to_domain(iop->ops);
>  	LIST_HEAD(freelist);
>  	bool updated = false;
>  	u64 __pte, *pte;
> @@ -435,12 +435,12 @@ static int iommu_v1_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
>  	return ret;
>  }
>  
> -static unsigned long iommu_v1_unmap_pages(struct io_pgtable_ops *ops,
> +static unsigned long iommu_v1_unmap_pages(struct io_pgtable *iop,
>  					  unsigned long iova,
>  					  size_t pgsize, size_t pgcount,
>  					  struct iommu_iotlb_gather *gather)
>  {
> -	struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(ops);
> +	struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(iop->ops);
>  	unsigned long long unmapped;
>  	unsigned long unmap_size;
>  	u64 *pte;
> @@ -469,9 +469,9 @@ static unsigned long iommu_v1_unmap_pages(struct io_pgtable_ops *ops,
>  	return unmapped;
>  }
>  
> -static phys_addr_t iommu_v1_iova_to_phys(struct io_pgtable_ops *ops, unsigned long iova)
> +static phys_addr_t iommu_v1_iova_to_phys(struct io_pgtable *iop, unsigned long iova)
>  {
> -	struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(ops);
> +	struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(iop->ops);
>  	unsigned long offset_mask, pte_pgsize;
>  	u64 *pte, __pte;
>  
> @@ -491,7 +491,7 @@ static phys_addr_t iommu_v1_iova_to_phys(struct io_pgtable_ops *ops, unsigned lo
>   */
>  static void v1_free_pgtable(struct io_pgtable *iop)
>  {
> -	struct amd_io_pgtable *pgtable = container_of(iop, struct amd_io_pgtable, iop);
> +	struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(iop->ops);
>  	struct protection_domain *dom;
>  	LIST_HEAD(freelist);
>  
> @@ -515,7 +515,8 @@ static void v1_free_pgtable(struct io_pgtable *iop)
>  	put_pages_list(&freelist);
>  }
>  
> -static struct io_pgtable *v1_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
> +int v1_alloc_pgtable(struct io_pgtable *iop, struct io_pgtable_cfg *cfg,
> +		     void *cookie)
>  {
>  	struct amd_io_pgtable *pgtable = io_pgtable_cfg_to_data(cfg);
>  
> @@ -524,11 +525,12 @@ static struct io_pgtable *v1_alloc_pgtable(struct io_pgtable_cfg *cfg, void *coo
>  	cfg->oas            = IOMMU_OUT_ADDR_BIT_SIZE,
>  	cfg->tlb            = &v1_flush_ops;
>  
> -	pgtable->iop.ops.map_pages    = iommu_v1_map_pages;
> -	pgtable->iop.ops.unmap_pages  = iommu_v1_unmap_pages;
> -	pgtable->iop.ops.iova_to_phys = iommu_v1_iova_to_phys;
> +	pgtable->iop_params.ops.map_pages    = iommu_v1_map_pages;
> +	pgtable->iop_params.ops.unmap_pages  = iommu_v1_unmap_pages;
> +	pgtable->iop_params.ops.iova_to_phys = iommu_v1_iova_to_phys;
> +	iop->ops = &pgtable->iop_params.ops;
>  
> -	return &pgtable->iop;
> +	return 0;
>  }
>  
>  struct io_pgtable_init_fns io_pgtable_amd_iommu_v1_init_fns = {
> diff --git a/drivers/iommu/amd/io_pgtable_v2.c b/drivers/iommu/amd/io_pgtable_v2.c
> index 8638ddf6fb3b..52acb8f11a27 100644
> --- a/drivers/iommu/amd/io_pgtable_v2.c
> +++ b/drivers/iommu/amd/io_pgtable_v2.c
> @@ -239,12 +239,12 @@ static u64 *fetch_pte(struct amd_io_pgtable *pgtable,
>  	return pte;
>  }
>  
> -static int iommu_v2_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
> +static int iommu_v2_map_pages(struct io_pgtable *iop, unsigned long iova,
>  			      phys_addr_t paddr, size_t pgsize, size_t pgcount,
>  			      int prot, gfp_t gfp, size_t *mapped)
>  {
> -	struct protection_domain *pdom = io_pgtable_ops_to_domain(ops);
> -	struct io_pgtable_cfg *cfg = &pdom->iop.iop.cfg;
> +	struct protection_domain *pdom = io_pgtable_ops_to_domain(iop->ops);
> +	struct io_pgtable_cfg *cfg = &pdom->iop.iop_params.cfg;
>  	u64 *pte;
>  	unsigned long map_size;
>  	unsigned long mapped_size = 0;
> @@ -290,13 +290,13 @@ static int iommu_v2_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
>  	return ret;
>  }
>  
> -static unsigned long iommu_v2_unmap_pages(struct io_pgtable_ops *ops,
> +static unsigned long iommu_v2_unmap_pages(struct io_pgtable *iop,
>  					  unsigned long iova,
>  					  size_t pgsize, size_t pgcount,
>  					  struct iommu_iotlb_gather *gather)
>  {
> -	struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(ops);
> -	struct io_pgtable_cfg *cfg = &pgtable->iop.cfg;
> +	struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(iop->ops);
> +	struct io_pgtable_cfg *cfg = &pgtable->iop_params.cfg;
>  	unsigned long unmap_size;
>  	unsigned long unmapped = 0;
>  	size_t size = pgcount << __ffs(pgsize);
> @@ -319,9 +319,9 @@ static unsigned long iommu_v2_unmap_pages(struct io_pgtable_ops *ops,
>  	return unmapped;
>  }
>  
> -static phys_addr_t iommu_v2_iova_to_phys(struct io_pgtable_ops *ops, unsigned long iova)
> +static phys_addr_t iommu_v2_iova_to_phys(struct io_pgtable *iop, unsigned long iova)
>  {
> -	struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(ops);
> +	struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(iop->ops);
>  	unsigned long offset_mask, pte_pgsize;
>  	u64 *pte, __pte;
>  
> @@ -362,7 +362,7 @@ static const struct iommu_flush_ops v2_flush_ops = {
>  static void v2_free_pgtable(struct io_pgtable *iop)
>  {
>  	struct protection_domain *pdom;
> -	struct amd_io_pgtable *pgtable = container_of(iop, struct amd_io_pgtable, iop);
> +	struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(iop->ops);
>  
>  	pdom = container_of(pgtable, struct protection_domain, iop);
>  	if (!(pdom->flags & PD_IOMMUV2_MASK))
> @@ -375,38 +375,39 @@ static void v2_free_pgtable(struct io_pgtable *iop)
>  	amd_iommu_domain_update(pdom);
>  
>  	/* Free page table */
> -	free_pgtable(pgtable->pgd, get_pgtable_level());
> +	free_pgtable(iop->pgd, get_pgtable_level());
>  }
>  
> -static struct io_pgtable *v2_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
> +int v2_alloc_pgtable(struct io_pgtable *iop, struct io_pgtable_cfg *cfg, void *cookie)
>  {
>  	struct amd_io_pgtable *pgtable = io_pgtable_cfg_to_data(cfg);
>  	struct protection_domain *pdom = (struct protection_domain *)cookie;
>  	int ret;
>  
> -	pgtable->pgd = alloc_pgtable_page();
> -	if (!pgtable->pgd)
> -		return NULL;
> +	iop->pgd = alloc_pgtable_page();
> +	if (!iop->pgd)
> +		return -ENOMEM;
>  
> -	ret = amd_iommu_domain_set_gcr3(&pdom->domain, 0, iommu_virt_to_phys(pgtable->pgd));
> +	ret = amd_iommu_domain_set_gcr3(&pdom->domain, 0, iommu_virt_to_phys(iop->pgd));
>  	if (ret)
>  		goto err_free_pgd;
>  
> -	pgtable->iop.ops.map_pages    = iommu_v2_map_pages;
> -	pgtable->iop.ops.unmap_pages  = iommu_v2_unmap_pages;
> -	pgtable->iop.ops.iova_to_phys = iommu_v2_iova_to_phys;
> +	pgtable->iop_params.ops.map_pages    = iommu_v2_map_pages;
> +	pgtable->iop_params.ops.unmap_pages  = iommu_v2_unmap_pages;
> +	pgtable->iop_params.ops.iova_to_phys = iommu_v2_iova_to_phys;
> +	iop->ops = &pgtable->iop_params.ops;
>  
>  	cfg->pgsize_bitmap = AMD_IOMMU_PGSIZES_V2,
>  	cfg->ias           = IOMMU_IN_ADDR_BIT_SIZE,
>  	cfg->oas           = IOMMU_OUT_ADDR_BIT_SIZE,
>  	cfg->tlb           = &v2_flush_ops;
>  
> -	return &pgtable->iop;
> +	return 0;
>  
>  err_free_pgd:
> -	free_pgtable_page(pgtable->pgd);
> +	free_pgtable_page(iop->pgd);
>  
> -	return NULL;
> +	return ret;
>  }
>  
>  struct io_pgtable_init_fns io_pgtable_amd_iommu_v2_init_fns = {
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index 7efb6b467041..51f9cecdcb6b 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -1984,7 +1984,7 @@ static void protection_domain_free(struct protection_domain *domain)
>  		return;
>  
>  	if (domain->iop.pgtbl_cfg.tlb)
> -		free_io_pgtable_ops(&domain->iop.iop.ops);
> +		free_io_pgtable_ops(&domain->iop.iop);
>  
>  	if (domain->id)
>  		domain_id_free(domain->id);
> @@ -2037,7 +2037,6 @@ static int protection_domain_init_v2(struct protection_domain *domain)
>  
>  static struct protection_domain *protection_domain_alloc(unsigned int type)
>  {
> -	struct io_pgtable_ops *pgtbl_ops;
>  	struct protection_domain *domain;
>  	int pgtable = amd_iommu_pgtable;
>  	int mode = DEFAULT_PGTABLE_LEVEL;
> @@ -2073,8 +2072,9 @@ static struct protection_domain *protection_domain_alloc(unsigned int type)
>  		goto out_err;
>  
>  	domain->iop.pgtbl_cfg.fmt = pgtable;
> -	pgtbl_ops = alloc_io_pgtable_ops(&domain->iop.pgtbl_cfg, domain);
> -	if (!pgtbl_ops) {
> +	ret = alloc_io_pgtable_ops(&domain->iop.iop, &domain->iop.pgtbl_cfg,
> +				   domain);
> +	if (ret) {
>  		domain_id_free(domain->id);
>  		goto out_err;
>  	}
> @@ -2185,7 +2185,7 @@ static void amd_iommu_iotlb_sync_map(struct iommu_domain *dom,
>  				     unsigned long iova, size_t size)
>  {
>  	struct protection_domain *domain = to_pdomain(dom);
> -	struct io_pgtable_ops *ops = &domain->iop.iop.ops;
> +	struct io_pgtable_ops *ops = domain->iop.iop.ops;
>  
>  	if (ops->map_pages)
>  		domain_flush_np_cache(domain, iova, size);
> @@ -2196,9 +2196,7 @@ static int amd_iommu_map_pages(struct iommu_domain *dom, unsigned long iova,
>  			       int iommu_prot, gfp_t gfp, size_t *mapped)
>  {
>  	struct protection_domain *domain = to_pdomain(dom);
> -	struct io_pgtable_ops *ops = &domain->iop.iop.ops;
>  	int prot = 0;
> -	int ret = -EINVAL;
>  
>  	if ((amd_iommu_pgtable == AMD_IOMMU_V1) &&
>  	    (domain->iop.mode == PAGE_MODE_NONE))
> @@ -2209,12 +2207,8 @@ static int amd_iommu_map_pages(struct iommu_domain *dom, unsigned long iova,
>  	if (iommu_prot & IOMMU_WRITE)
>  		prot |= IOMMU_PROT_IW;
>  
> -	if (ops->map_pages) {
> -		ret = ops->map_pages(ops, iova, paddr, pgsize,
> -				     pgcount, prot, gfp, mapped);
> -	}
> -
> -	return ret;
> +	return iopt_map_pages(&domain->iop.iop, iova, paddr, pgsize, pgcount,
> +			      prot, gfp, mapped);
>  }
>  
>  static void amd_iommu_iotlb_gather_add_page(struct iommu_domain *domain,
> @@ -2243,14 +2237,13 @@ static size_t amd_iommu_unmap_pages(struct iommu_domain *dom, unsigned long iova
>  				    struct iommu_iotlb_gather *gather)
>  {
>  	struct protection_domain *domain = to_pdomain(dom);
> -	struct io_pgtable_ops *ops = &domain->iop.iop.ops;
>  	size_t r;
>  
>  	if ((amd_iommu_pgtable == AMD_IOMMU_V1) &&
>  	    (domain->iop.mode == PAGE_MODE_NONE))
>  		return 0;
>  
> -	r = (ops->unmap_pages) ? ops->unmap_pages(ops, iova, pgsize, pgcount, NULL) : 0;
> +	r = iopt_unmap_pages(&domain->iop.iop, iova, pgsize, pgcount, NULL);
>  
>  	if (r)
>  		amd_iommu_iotlb_gather_add_page(dom, gather, iova, r);
> @@ -2262,9 +2255,8 @@ static phys_addr_t amd_iommu_iova_to_phys(struct iommu_domain *dom,
>  					  dma_addr_t iova)
>  {
>  	struct protection_domain *domain = to_pdomain(dom);
> -	struct io_pgtable_ops *ops = &domain->iop.iop.ops;
>  
> -	return ops->iova_to_phys(ops, iova);
> +	return iopt_iova_to_phys(&domain->iop.iop, iova);
>  }
>  
>  static bool amd_iommu_capable(struct device *dev, enum iommu_cap cap)
> @@ -2460,7 +2452,7 @@ void amd_iommu_domain_direct_map(struct iommu_domain *dom)
>  	spin_lock_irqsave(&domain->lock, flags);
>  
>  	if (domain->iop.pgtbl_cfg.tlb)
> -		free_io_pgtable_ops(&domain->iop.iop.ops);
> +		free_io_pgtable_ops(&domain->iop.iop);
>  
>  	spin_unlock_irqrestore(&domain->lock, flags);
>  }
> diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c
> index 571f948add7c..b806019f925b 100644
> --- a/drivers/iommu/apple-dart.c
> +++ b/drivers/iommu/apple-dart.c
> @@ -150,14 +150,14 @@ struct apple_dart_atomic_stream_map {
>  /*
>   * This structure is attached to each iommu domain handled by a DART.
>   *
> - * @pgtbl_ops: pagetable ops allocated by io-pgtable
> + * @pgtbl: pagetable allocated by io-pgtable
>   * @finalized: true if the domain has been completely initialized
>   * @init_lock: protects domain initialization
>   * @stream_maps: streams attached to this domain (valid for DMA/UNMANAGED only)
>   * @domain: core iommu domain pointer
>   */
>  struct apple_dart_domain {
> -	struct io_pgtable_ops *pgtbl_ops;
> +	struct io_pgtable pgtbl;
>  
>  	bool finalized;
>  	struct mutex init_lock;
> @@ -354,12 +354,8 @@ static phys_addr_t apple_dart_iova_to_phys(struct iommu_domain *domain,
>  					   dma_addr_t iova)
>  {
>  	struct apple_dart_domain *dart_domain = to_dart_domain(domain);
> -	struct io_pgtable_ops *ops = dart_domain->pgtbl_ops;
>  
> -	if (!ops)
> -		return 0;
> -
> -	return ops->iova_to_phys(ops, iova);
> +	return iopt_iova_to_phys(&dart_domain->pgtbl, iova);
>  }
>  
>  static int apple_dart_map_pages(struct iommu_domain *domain, unsigned long iova,
> @@ -368,13 +364,9 @@ static int apple_dart_map_pages(struct iommu_domain *domain, unsigned long iova,
>  				size_t *mapped)
>  {
>  	struct apple_dart_domain *dart_domain = to_dart_domain(domain);
> -	struct io_pgtable_ops *ops = dart_domain->pgtbl_ops;
> -
> -	if (!ops)
> -		return -ENODEV;
>  
> -	return ops->map_pages(ops, iova, paddr, pgsize, pgcount, prot, gfp,
> -			      mapped);
> +	return iopt_map_pages(&dart_domain->pgtbl, iova, paddr, pgsize, pgcount,
> +			      prot, gfp, mapped);
>  }
>  
>  static size_t apple_dart_unmap_pages(struct iommu_domain *domain,
> @@ -383,9 +375,9 @@ static size_t apple_dart_unmap_pages(struct iommu_domain *domain,
>  				     struct iommu_iotlb_gather *gather)
>  {
>  	struct apple_dart_domain *dart_domain = to_dart_domain(domain);
> -	struct io_pgtable_ops *ops = dart_domain->pgtbl_ops;
>  
> -	return ops->unmap_pages(ops, iova, pgsize, pgcount, gather);
> +	return iopt_unmap_pages(&dart_domain->pgtbl, iova, pgsize, pgcount,
> +				gather);
>  }
>  
>  static void
> @@ -394,7 +386,7 @@ apple_dart_setup_translation(struct apple_dart_domain *domain,
>  {
>  	int i;
>  	struct io_pgtable_cfg *pgtbl_cfg =
> -		&io_pgtable_ops_to_pgtable(domain->pgtbl_ops)->cfg;
> +		&io_pgtable_ops_to_params(domain->pgtbl.ops)->cfg;
>  
>  	for (i = 0; i < pgtbl_cfg->apple_dart_cfg.n_ttbrs; ++i)
>  		apple_dart_hw_set_ttbr(stream_map, i,
> @@ -435,11 +427,9 @@ static int apple_dart_finalize_domain(struct iommu_domain *domain,
>  		.iommu_dev = dart->dev,
>  	};
>  
> -	dart_domain->pgtbl_ops = alloc_io_pgtable_ops(&pgtbl_cfg, domain);
> -	if (!dart_domain->pgtbl_ops) {
> -		ret = -ENOMEM;
> +	ret = alloc_io_pgtable_ops(&dart_domain->pgtbl, &pgtbl_cfg, domain);
> +	if (ret)
>  		goto done;
> -	}
>  
>  	domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap;
>  	domain->geometry.aperture_start = 0;
> @@ -590,7 +580,7 @@ static struct iommu_domain *apple_dart_domain_alloc(unsigned int type)
>  
>  	mutex_init(&dart_domain->init_lock);
>  
> -	/* no need to allocate pgtbl_ops or do any other finalization steps */
> +	/* no need to allocate pgtbl or do any other finalization steps */
>  	if (type == IOMMU_DOMAIN_IDENTITY || type == IOMMU_DOMAIN_BLOCKED)
>  		dart_domain->finalized = true;
>  
> @@ -601,8 +591,8 @@ static void apple_dart_domain_free(struct iommu_domain *domain)
>  {
>  	struct apple_dart_domain *dart_domain = to_dart_domain(domain);
>  
> -	if (dart_domain->pgtbl_ops)
> -		free_io_pgtable_ops(dart_domain->pgtbl_ops);
> +	if (dart_domain->pgtbl.ops)
> +		free_io_pgtable_ops(&dart_domain->pgtbl);
>  
>  	kfree(dart_domain);
>  }
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index c033b23ca4b2..97d24ee5c14d 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -2058,7 +2058,7 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
>  	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>  	struct arm_smmu_device *smmu = smmu_domain->smmu;
>  
> -	free_io_pgtable_ops(smmu_domain->pgtbl_ops);
> +	free_io_pgtable_ops(&smmu_domain->pgtbl);
>  
>  	/* Free the CD and ASID, if we allocated them */
>  	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
> @@ -2171,7 +2171,6 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain,
>  	unsigned long ias, oas;
>  	enum io_pgtable_fmt fmt;
>  	struct io_pgtable_cfg pgtbl_cfg;
> -	struct io_pgtable_ops *pgtbl_ops;
>  	int (*finalise_stage_fn)(struct arm_smmu_domain *,
>  				 struct arm_smmu_master *,
>  				 struct io_pgtable_cfg *);
> @@ -2218,9 +2217,9 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain,
>  		.iommu_dev	= smmu->dev,
>  	};
>  
> -	pgtbl_ops = alloc_io_pgtable_ops(&pgtbl_cfg, smmu_domain);
> -	if (!pgtbl_ops)
> -		return -ENOMEM;
> +	ret = alloc_io_pgtable_ops(&smmu_domain->pgtbl, &pgtbl_cfg, smmu_domain);
> +	if (ret)
> +		return ret;
>  
>  	domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap;
>  	domain->geometry.aperture_end = (1UL << pgtbl_cfg.ias) - 1;
> @@ -2228,11 +2227,10 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain,
>  
>  	ret = finalise_stage_fn(smmu_domain, master, &pgtbl_cfg);
>  	if (ret < 0) {
> -		free_io_pgtable_ops(pgtbl_ops);
> +		free_io_pgtable_ops(&smmu_domain->pgtbl);
>  		return ret;
>  	}
>  
> -	smmu_domain->pgtbl_ops = pgtbl_ops;
>  	return 0;
>  }
>  
> @@ -2468,12 +2466,10 @@ static int arm_smmu_map_pages(struct iommu_domain *domain, unsigned long iova,
>  			      phys_addr_t paddr, size_t pgsize, size_t pgcount,
>  			      int prot, gfp_t gfp, size_t *mapped)
>  {
> -	struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
> -
> -	if (!ops)
> -		return -ENODEV;
> +	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>  
> -	return ops->map_pages(ops, iova, paddr, pgsize, pgcount, prot, gfp, mapped);
> +	return iopt_map_pages(&smmu_domain->pgtbl, iova, paddr, pgsize, pgcount,
> +			      prot, gfp, mapped);
>  }
>  
>  static size_t arm_smmu_unmap_pages(struct iommu_domain *domain, unsigned long iova,
> @@ -2481,12 +2477,9 @@ static size_t arm_smmu_unmap_pages(struct iommu_domain *domain, unsigned long io
>  				   struct iommu_iotlb_gather *gather)
>  {
>  	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> -	struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>  
> -	if (!ops)
> -		return 0;
> -
> -	return ops->unmap_pages(ops, iova, pgsize, pgcount, gather);
> +	return iopt_unmap_pages(&smmu_domain->pgtbl, iova, pgsize, pgcount,
> +				gather);
>  }
>  
>  static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain)
> @@ -2513,12 +2506,9 @@ static void arm_smmu_iotlb_sync(struct iommu_domain *domain,
>  static phys_addr_t
>  arm_smmu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova)
>  {
> -	struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
> -
> -	if (!ops)
> -		return 0;
> +	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>  
> -	return ops->iova_to_phys(ops, iova);
> +	return iopt_iova_to_phys(&smmu_domain->pgtbl, iova);
>  }
>  
>  static struct platform_driver arm_smmu_driver;
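
A note for readers following the call paths above: iopt_map_pages(),
iopt_unmap_pages() and iopt_iova_to_phys() are the thin wrappers this series
adds around iop->ops; their definitions live in the io-pgtable header changes
rather than in these hunks, so the signatures below are inferred from the call
sites. A minimal sketch of the resulting driver-side pattern, with my_domain
and my_domain_map() invented purely for illustration:

#include <linux/io-pgtable.h>

/* Sketch only, not part of the patch: a driver domain now embeds the
 * io_pgtable descriptor by value instead of keeping an ops pointer. */
struct my_domain {
	struct io_pgtable	pgtbl;	/* was: struct io_pgtable_ops *pgtbl_ops */
};

static int my_domain_map(struct my_domain *dom, unsigned long iova,
			 phys_addr_t paddr, size_t pgsize, size_t pgcount,
			 int prot, gfp_t gfp, size_t *mapped)
{
	/* the wrapper dereferences dom->pgtbl.ops internally */
	return iopt_map_pages(&dom->pgtbl, iova, paddr, pgsize, pgcount,
			      prot, gfp, mapped);
}
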
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> index 91d404deb115..0673841167be 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> @@ -122,8 +122,8 @@ static const struct io_pgtable_cfg *qcom_adreno_smmu_get_ttbr1_cfg(
>  		const void *cookie)
>  {
>  	struct arm_smmu_domain *smmu_domain = (void *)cookie;
> -	struct io_pgtable *pgtable =
> -		io_pgtable_ops_to_pgtable(smmu_domain->pgtbl_ops);
> +	struct io_pgtable_params *pgtable =
> +		io_pgtable_ops_to_params(smmu_domain->pgtbl.ops);
>  	return &pgtable->cfg;
>  }
>  
> @@ -137,7 +137,8 @@ static int qcom_adreno_smmu_set_ttbr0_cfg(const void *cookie,
>  		const struct io_pgtable_cfg *pgtbl_cfg)
>  {
>  	struct arm_smmu_domain *smmu_domain = (void *)cookie;
> -	struct io_pgtable *pgtable = io_pgtable_ops_to_pgtable(smmu_domain->pgtbl_ops);
> +	struct io_pgtable_params *pgtable =
> +		io_pgtable_ops_to_params(smmu_domain->pgtbl.ops);
>  	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
>  	struct arm_smmu_cb *cb = &smmu_domain->smmu->cbs[cfg->cbndx];
>  
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> index f230d2ce977a..201055254d5b 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> @@ -614,7 +614,6 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
>  {
>  	int irq, start, ret = 0;
>  	unsigned long ias, oas;
> -	struct io_pgtable_ops *pgtbl_ops;
>  	struct io_pgtable_cfg pgtbl_cfg;
>  	enum io_pgtable_fmt fmt;
>  	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> @@ -765,11 +764,9 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
>  	if (smmu_domain->pgtbl_quirks)
>  		pgtbl_cfg.quirks |= smmu_domain->pgtbl_quirks;
>  
> -	pgtbl_ops = alloc_io_pgtable_ops(&pgtbl_cfg, smmu_domain);
> -	if (!pgtbl_ops) {
> -		ret = -ENOMEM;
> +	ret = alloc_io_pgtable_ops(&smmu_domain->pgtbl, &pgtbl_cfg, smmu_domain);
> +	if (ret)
>  		goto out_clear_smmu;
> -	}
>  
>  	/* Update the domain's page sizes to reflect the page table format */
>  	domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap;
> @@ -808,8 +805,6 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
>  
>  	mutex_unlock(&smmu_domain->init_mutex);
>  
> -	/* Publish page table ops for map/unmap */
> -	smmu_domain->pgtbl_ops = pgtbl_ops;
>  	return 0;
>  
>  out_clear_smmu:
> @@ -846,7 +841,7 @@ static void arm_smmu_destroy_domain_context(struct iommu_domain *domain)
>  		devm_free_irq(smmu->dev, irq, domain);
>  	}
>  
> -	free_io_pgtable_ops(smmu_domain->pgtbl_ops);
> +	free_io_pgtable_ops(&smmu_domain->pgtbl);
>  	__arm_smmu_free_bitmap(smmu->context_map, cfg->cbndx);
>  
>  	arm_smmu_rpm_put(smmu);
> @@ -1181,15 +1176,13 @@ static int arm_smmu_map_pages(struct iommu_domain *domain, unsigned long iova,
>  			      phys_addr_t paddr, size_t pgsize, size_t pgcount,
>  			      int prot, gfp_t gfp, size_t *mapped)
>  {
> -	struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
> -	struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu;
> +	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
>  	int ret;
>  
> -	if (!ops)
> -		return -ENODEV;
> -
>  	arm_smmu_rpm_get(smmu);
> -	ret = ops->map_pages(ops, iova, paddr, pgsize, pgcount, prot, gfp, mapped);
> +	ret = iopt_map_pages(&smmu_domain->pgtbl, iova, paddr, pgsize, pgcount,
> +			     prot, gfp, mapped);
>  	arm_smmu_rpm_put(smmu);
>  
>  	return ret;
> @@ -1199,15 +1192,13 @@ static size_t arm_smmu_unmap_pages(struct iommu_domain *domain, unsigned long io
>  				   size_t pgsize, size_t pgcount,
>  				   struct iommu_iotlb_gather *iotlb_gather)
>  {
> -	struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
> -	struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu;
> +	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
>  	size_t ret;
>  
> -	if (!ops)
> -		return 0;
> -
>  	arm_smmu_rpm_get(smmu);
> -	ret = ops->unmap_pages(ops, iova, pgsize, pgcount, iotlb_gather);
> +	ret = iopt_unmap_pages(&smmu_domain->pgtbl, iova, pgsize, pgcount,
> +			       iotlb_gather);
>  	arm_smmu_rpm_put(smmu);
>  
>  	return ret;
> @@ -1249,7 +1240,6 @@ static phys_addr_t arm_smmu_iova_to_phys_hard(struct iommu_domain *domain,
>  	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>  	struct arm_smmu_device *smmu = smmu_domain->smmu;
>  	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
> -	struct io_pgtable_ops *ops= smmu_domain->pgtbl_ops;
>  	struct device *dev = smmu->dev;
>  	void __iomem *reg;
>  	u32 tmp;
> @@ -1277,7 +1267,7 @@ static phys_addr_t arm_smmu_iova_to_phys_hard(struct iommu_domain *domain,
>  			"iova to phys timed out on %pad. Falling back to software table walk.\n",
>  			&iova);
>  		arm_smmu_rpm_put(smmu);
> -		return ops->iova_to_phys(ops, iova);
> +		return iopt_iova_to_phys(&smmu_domain->pgtbl, iova);
>  	}
>  
>  	phys = arm_smmu_cb_readq(smmu, idx, ARM_SMMU_CB_PAR);
> @@ -1299,16 +1289,12 @@ static phys_addr_t arm_smmu_iova_to_phys(struct iommu_domain *domain,
>  					dma_addr_t iova)
>  {
>  	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> -	struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
> -
> -	if (!ops)
> -		return 0;
>  
>  	if (smmu_domain->smmu->features & ARM_SMMU_FEAT_TRANS_OPS &&
>  			smmu_domain->stage == ARM_SMMU_DOMAIN_S1)
>  		return arm_smmu_iova_to_phys_hard(domain, iova);
>  
> -	return ops->iova_to_phys(ops, iova);
> +	return iopt_iova_to_phys(&smmu_domain->pgtbl, iova);
>  }
>  
>  static bool arm_smmu_capable(struct device *dev, enum iommu_cap cap)
> diff --git a/drivers/iommu/arm/arm-smmu/qcom_iommu.c b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
> index 65eb8bdcbe50..56676dd84462 100644
> --- a/drivers/iommu/arm/arm-smmu/qcom_iommu.c
> +++ b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
> @@ -64,7 +64,7 @@ struct qcom_iommu_ctx {
>  };
>  
>  struct qcom_iommu_domain {
> -	struct io_pgtable_ops	*pgtbl_ops;
> +	struct io_pgtable	 pgtbl;
>  	spinlock_t		 pgtbl_lock;
>  	struct mutex		 init_mutex; /* Protects iommu pointer */
>  	struct iommu_domain	 domain;
> @@ -229,7 +229,6 @@ static int qcom_iommu_init_domain(struct iommu_domain *domain,
>  {
>  	struct qcom_iommu_domain *qcom_domain = to_qcom_iommu_domain(domain);
>  	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
> -	struct io_pgtable_ops *pgtbl_ops;
>  	struct io_pgtable_cfg pgtbl_cfg;
>  	int i, ret = 0;
>  	u32 reg;
> @@ -250,10 +249,9 @@ static int qcom_iommu_init_domain(struct iommu_domain *domain,
>  	qcom_domain->iommu = qcom_iommu;
>  	qcom_domain->fwspec = fwspec;
>  
> -	pgtbl_ops = alloc_io_pgtable_ops(&pgtbl_cfg, qcom_domain);
> -	if (!pgtbl_ops) {
> +	ret = alloc_io_pgtable_ops(&qcom_domain->pgtbl, &pgtbl_cfg, qcom_domain);
> +	if (ret) {
>  		dev_err(qcom_iommu->dev, "failed to allocate pagetable ops\n");
> -		ret = -ENOMEM;
>  		goto out_clear_iommu;
>  	}
>  
> @@ -308,9 +306,6 @@ static int qcom_iommu_init_domain(struct iommu_domain *domain,
>  
>  	mutex_unlock(&qcom_domain->init_mutex);
>  
> -	/* Publish page table ops for map/unmap */
> -	qcom_domain->pgtbl_ops = pgtbl_ops;
> -
>  	return 0;
>  
>  out_clear_iommu:
> @@ -353,7 +348,7 @@ static void qcom_iommu_domain_free(struct iommu_domain *domain)
>  		 * is on to avoid unclocked accesses in the TLB inv path:
>  		 */
>  		pm_runtime_get_sync(qcom_domain->iommu->dev);
> -		free_io_pgtable_ops(qcom_domain->pgtbl_ops);
> +		free_io_pgtable_ops(&qcom_domain->pgtbl);
>  		pm_runtime_put_sync(qcom_domain->iommu->dev);
>  	}
>  
> @@ -417,13 +412,10 @@ static int qcom_iommu_map(struct iommu_domain *domain, unsigned long iova,
>  	int ret;
>  	unsigned long flags;
>  	struct qcom_iommu_domain *qcom_domain = to_qcom_iommu_domain(domain);
> -	struct io_pgtable_ops *ops = qcom_domain->pgtbl_ops;
> -
> -	if (!ops)
> -		return -ENODEV;
>  
>  	spin_lock_irqsave(&qcom_domain->pgtbl_lock, flags);
> -	ret = ops->map_pages(ops, iova, paddr, pgsize, pgcount, prot, GFP_ATOMIC, mapped);
> +	ret = iopt_map_pages(&qcom_domain->pgtbl, iova, paddr, pgsize, pgcount,
> +			     prot, GFP_ATOMIC, mapped);
>  	spin_unlock_irqrestore(&qcom_domain->pgtbl_lock, flags);
>  	return ret;
>  }
> @@ -435,10 +427,6 @@ static size_t qcom_iommu_unmap(struct iommu_domain *domain, unsigned long iova,
>  	size_t ret;
>  	unsigned long flags;
>  	struct qcom_iommu_domain *qcom_domain = to_qcom_iommu_domain(domain);
> -	struct io_pgtable_ops *ops = qcom_domain->pgtbl_ops;
> -
> -	if (!ops)
> -		return 0;
>  
>  	/* NOTE: unmap can be called after client device is powered off,
>  	 * for example, with GPUs or anything involving dma-buf.  So we
> @@ -447,7 +435,8 @@ static size_t qcom_iommu_unmap(struct iommu_domain *domain, unsigned long iova,
>  	 */
>  	pm_runtime_get_sync(qcom_domain->iommu->dev);
>  	spin_lock_irqsave(&qcom_domain->pgtbl_lock, flags);
> -	ret = ops->unmap_pages(ops, iova, pgsize, pgcount, gather);
> +	ret = iopt_unmap_pages(&qcom_domain->pgtbl, iova, pgsize, pgcount,
> +			       gather);
>  	spin_unlock_irqrestore(&qcom_domain->pgtbl_lock, flags);
>  	pm_runtime_put_sync(qcom_domain->iommu->dev);
>  
> @@ -457,13 +446,12 @@ static size_t qcom_iommu_unmap(struct iommu_domain *domain, unsigned long iova,
>  static void qcom_iommu_flush_iotlb_all(struct iommu_domain *domain)
>  {
>  	struct qcom_iommu_domain *qcom_domain = to_qcom_iommu_domain(domain);
> -	struct io_pgtable *pgtable = container_of(qcom_domain->pgtbl_ops,
> -						  struct io_pgtable, ops);
> -	if (!qcom_domain->pgtbl_ops)
> +
> +	if (!qcom_domain->pgtbl.ops)
>  		return;
>  
>  	pm_runtime_get_sync(qcom_domain->iommu->dev);
> -	qcom_iommu_tlb_sync(pgtable->cookie);
> +	qcom_iommu_tlb_sync(qcom_domain->pgtbl.cookie);
>  	pm_runtime_put_sync(qcom_domain->iommu->dev);
>  }
>  
> @@ -479,13 +467,9 @@ static phys_addr_t qcom_iommu_iova_to_phys(struct iommu_domain *domain,
>  	phys_addr_t ret;
>  	unsigned long flags;
>  	struct qcom_iommu_domain *qcom_domain = to_qcom_iommu_domain(domain);
> -	struct io_pgtable_ops *ops = qcom_domain->pgtbl_ops;
> -
> -	if (!ops)
> -		return 0;
>  
>  	spin_lock_irqsave(&qcom_domain->pgtbl_lock, flags);
> -	ret = ops->iova_to_phys(ops, iova);
> +	ret = iopt_iova_to_phys(&qcom_domain->pgtbl, iova);
>  	spin_unlock_irqrestore(&qcom_domain->pgtbl_lock, flags);
>  
>  	return ret;
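
One consequence of embedding struct io_pgtable by value, visible in the qcom
hunks above: the old "pgtbl_ops == NULL" guards either disappear from the
map/unmap paths or turn into a check on pgtbl.ops, as in
qcom_iommu_flush_iotlb_all(). Reusing the hypothetical my_domain from the
earlier sketch:

/* Sketch only: "is the page table set up?" becomes a test on pgtbl.ops. */
static void my_domain_flush(struct my_domain *dom)
{
	/*
	 * Not finalised yet, or already torn down: free_io_pgtable_ops()
	 * now zeroes the descriptor (see the memset added further down),
	 * so this test also holds after the table has been freed.
	 */
	if (!dom->pgtbl.ops)
		return;

	/* ... issue TLB maintenance for dom->pgtbl here ... */
}
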
> diff --git a/drivers/iommu/io-pgtable-arm-common.c b/drivers/iommu/io-pgtable-arm-common.c
> index 4b3a9ce806ea..359086cace34 100644
> --- a/drivers/iommu/io-pgtable-arm-common.c
> +++ b/drivers/iommu/io-pgtable-arm-common.c
> @@ -48,7 +48,8 @@ static void __arm_lpae_clear_pte(arm_lpae_iopte *ptep, struct io_pgtable_cfg *cf
>  		__arm_lpae_sync_pte(ptep, 1, cfg);
>  }
>  
> -static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
> +static size_t __arm_lpae_unmap(struct io_pgtable *iop,
> +			       struct arm_lpae_io_pgtable *data,
>  			       struct iommu_iotlb_gather *gather,
>  			       unsigned long iova, size_t size, size_t pgcount,
>  			       int lvl, arm_lpae_iopte *ptep);
> @@ -74,7 +75,8 @@ static void __arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
>  		__arm_lpae_sync_pte(ptep, num_entries, cfg);
>  }
>  
> -static int arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
> +static int arm_lpae_init_pte(struct io_pgtable *iop,
> +			     struct arm_lpae_io_pgtable *data,
>  			     unsigned long iova, phys_addr_t paddr,
>  			     arm_lpae_iopte prot, int lvl, int num_entries,
>  			     arm_lpae_iopte *ptep)
> @@ -95,8 +97,8 @@ static int arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
>  			size_t sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
>  
>  			tblp = ptep - ARM_LPAE_LVL_IDX(iova, lvl, data);
> -			if (__arm_lpae_unmap(data, NULL, iova + i * sz, sz, 1,
> -					     lvl, tblp) != sz) {
> +			if (__arm_lpae_unmap(iop, data, NULL, iova + i * sz, sz,
> +					     1, lvl, tblp) != sz) {
>  				WARN_ON(1);
>  				return -EINVAL;
>  			}
> @@ -139,10 +141,10 @@ static arm_lpae_iopte arm_lpae_install_table(arm_lpae_iopte *table,
>  	return old;
>  }
>  
> -int __arm_lpae_map(struct arm_lpae_io_pgtable *data, unsigned long iova,
> -		   phys_addr_t paddr, size_t size, size_t pgcount,
> -		   arm_lpae_iopte prot, int lvl, arm_lpae_iopte *ptep,
> -		   gfp_t gfp, size_t *mapped)
> +int __arm_lpae_map(struct io_pgtable *iop, struct arm_lpae_io_pgtable *data,
> +		   unsigned long iova, phys_addr_t paddr, size_t size,
> +		   size_t pgcount, arm_lpae_iopte prot, int lvl,
> +		   arm_lpae_iopte *ptep, gfp_t gfp, size_t *mapped)
>  {
>  	arm_lpae_iopte *cptep, pte;
>  	size_t block_size = ARM_LPAE_BLOCK_SIZE(lvl, data);
> @@ -158,7 +160,8 @@ int __arm_lpae_map(struct arm_lpae_io_pgtable *data, unsigned long iova,
>  	if (size == block_size) {
>  		max_entries = ARM_LPAE_PTES_PER_TABLE(data) - map_idx_start;
>  		num_entries = min_t(int, pgcount, max_entries);
> -		ret = arm_lpae_init_pte(data, iova, paddr, prot, lvl, num_entries, ptep);
> +		ret = arm_lpae_init_pte(iop, data, iova, paddr, prot, lvl,
> +					num_entries, ptep);
>  		if (!ret)
>  			*mapped += num_entries * size;
>  
> @@ -192,7 +195,7 @@ int __arm_lpae_map(struct arm_lpae_io_pgtable *data, unsigned long iova,
>  	}
>  
>  	/* Rinse, repeat */
> -	return __arm_lpae_map(data, iova, paddr, size, pgcount, prot, lvl + 1,
> +	return __arm_lpae_map(iop, data, iova, paddr, size, pgcount, prot, lvl + 1,
>  			      cptep, gfp, mapped);
>  }
>  
> @@ -260,13 +263,13 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
>  	return pte;
>  }
>  
> -int arm_lpae_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
> +int arm_lpae_map_pages(struct io_pgtable *iop, unsigned long iova,
>  		       phys_addr_t paddr, size_t pgsize, size_t pgcount,
>  		       int iommu_prot, gfp_t gfp, size_t *mapped)
>  {
> -	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
> +	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(iop->ops);
>  	struct io_pgtable_cfg *cfg = &data->iop.cfg;
> -	arm_lpae_iopte *ptep = data->pgd;
> +	arm_lpae_iopte *ptep = iop->pgd;
>  	int ret, lvl = data->start_level;
>  	arm_lpae_iopte prot;
>  	long iaext = (s64)iova >> cfg->ias;
> @@ -284,7 +287,7 @@ int arm_lpae_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
>  		return 0;
>  
>  	prot = arm_lpae_prot_to_pte(data, iommu_prot);
> -	ret = __arm_lpae_map(data, iova, paddr, pgsize, pgcount, prot, lvl,
> +	ret = __arm_lpae_map(iop, data, iova, paddr, pgsize, pgcount, prot, lvl,
>  			     ptep, gfp, mapped);
>  	/*
>  	 * Synchronise all PTE updates for the new mapping before there's
> @@ -326,7 +329,8 @@ void __arm_lpae_free_pgtable(struct arm_lpae_io_pgtable *data, int lvl,
>  	__arm_lpae_free_pages(start, table_size, &data->iop.cfg);
>  }
>  
> -static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
> +static size_t arm_lpae_split_blk_unmap(struct io_pgtable *iop,
> +				       struct arm_lpae_io_pgtable *data,
>  				       struct iommu_iotlb_gather *gather,
>  				       unsigned long iova, size_t size,
>  				       arm_lpae_iopte blk_pte, int lvl,
> @@ -378,21 +382,24 @@ static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
>  		tablep = iopte_deref(pte, data);
>  	} else if (unmap_idx_start >= 0) {
>  		for (i = 0; i < num_entries; i++)
> -			io_pgtable_tlb_add_page(&data->iop, gather, iova + i * size, size);
> +			io_pgtable_tlb_add_page(cfg, iop, gather,
> +						iova + i * size, size);
>  
>  		return num_entries * size;
>  	}
>  
> -	return __arm_lpae_unmap(data, gather, iova, size, pgcount, lvl, tablep);
> +	return __arm_lpae_unmap(iop, data, gather, iova, size, pgcount, lvl,
> +				tablep);
>  }
>  
> -static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
> +static size_t __arm_lpae_unmap(struct io_pgtable *iop,
> +			       struct arm_lpae_io_pgtable *data,
>  			       struct iommu_iotlb_gather *gather,
>  			       unsigned long iova, size_t size, size_t pgcount,
>  			       int lvl, arm_lpae_iopte *ptep)
>  {
>  	arm_lpae_iopte pte;
> -	struct io_pgtable *iop = &data->iop;
> +	struct io_pgtable_cfg *cfg = &data->iop.cfg;
>  	int i = 0, num_entries, max_entries, unmap_idx_start;
>  
>  	/* Something went horribly wrong and we ran out of page table */
> @@ -415,15 +422,16 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
>  			if (WARN_ON(!pte))
>  				break;
>  
> -			__arm_lpae_clear_pte(ptep, &iop->cfg);
> +			__arm_lpae_clear_pte(ptep, cfg);
>  
> -			if (!iopte_leaf(pte, lvl, iop->cfg.fmt)) {
> +			if (!iopte_leaf(pte, lvl, cfg->fmt)) {
>  				/* Also flush any partial walks */
> -				io_pgtable_tlb_flush_walk(iop, iova + i * size, size,
> +				io_pgtable_tlb_flush_walk(cfg, iop, iova + i * size, size,
>  							  ARM_LPAE_GRANULE(data));
>  				__arm_lpae_free_pgtable(data, lvl + 1, iopte_deref(pte, data));
>  			} else if (!iommu_iotlb_gather_queued(gather)) {
> -				io_pgtable_tlb_add_page(iop, gather, iova + i * size, size);
> +				io_pgtable_tlb_add_page(cfg, iop, gather,
> +							iova + i * size, size);
>  			}
>  
>  			ptep++;
> @@ -431,27 +439,28 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
>  		}
>  
>  		return i * size;
> -	} else if (iopte_leaf(pte, lvl, iop->cfg.fmt)) {
> +	} else if (iopte_leaf(pte, lvl, cfg->fmt)) {
>  		/*
>  		 * Insert a table at the next level to map the old region,
>  		 * minus the part we want to unmap
>  		 */
> -		return arm_lpae_split_blk_unmap(data, gather, iova, size, pte,
> -						lvl + 1, ptep, pgcount);
> +		return arm_lpae_split_blk_unmap(iop, data, gather, iova, size,
> +						pte, lvl + 1, ptep, pgcount);
>  	}
>  
>  	/* Keep on walkin' */
>  	ptep = iopte_deref(pte, data);
> -	return __arm_lpae_unmap(data, gather, iova, size, pgcount, lvl + 1, ptep);
> +	return __arm_lpae_unmap(iop, data, gather, iova, size,
> +				pgcount, lvl + 1, ptep);
>  }
>  
> -size_t arm_lpae_unmap_pages(struct io_pgtable_ops *ops, unsigned long iova,
> +size_t arm_lpae_unmap_pages(struct io_pgtable *iop, unsigned long iova,
>  			    size_t pgsize, size_t pgcount,
>  			    struct iommu_iotlb_gather *gather)
>  {
> -	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
> +	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(iop->ops);
>  	struct io_pgtable_cfg *cfg = &data->iop.cfg;
> -	arm_lpae_iopte *ptep = data->pgd;
> +	arm_lpae_iopte *ptep = iop->pgd;
>  	long iaext = (s64)iova >> cfg->ias;
>  
>  	if (WARN_ON(!pgsize || (pgsize & cfg->pgsize_bitmap) != pgsize || !pgcount))
> @@ -462,15 +471,14 @@ size_t arm_lpae_unmap_pages(struct io_pgtable_ops *ops, unsigned long iova,
>  	if (WARN_ON(iaext))
>  		return 0;
>  
> -	return __arm_lpae_unmap(data, gather, iova, pgsize, pgcount,
> -				data->start_level, ptep);
> +	return __arm_lpae_unmap(iop, data, gather, iova, pgsize,
> +				pgcount, data->start_level, ptep);
>  }
>  
> -phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
> -				  unsigned long iova)
> +static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable *iop, unsigned long iova)
>  {
> -	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
> -	arm_lpae_iopte pte, *ptep = data->pgd;
> +	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(iop->ops);
> +	arm_lpae_iopte pte, *ptep = iop->pgd;
>  	int lvl = data->start_level;
>  
>  	do {
> diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
> index 278b4299d757..2dd12fabfaee 100644
> --- a/drivers/iommu/io-pgtable-arm-v7s.c
> +++ b/drivers/iommu/io-pgtable-arm-v7s.c
> @@ -40,7 +40,7 @@
>  	container_of((x), struct arm_v7s_io_pgtable, iop)
>  
>  #define io_pgtable_ops_to_data(x)					\
> -	io_pgtable_to_data(io_pgtable_ops_to_pgtable(x))
> +	io_pgtable_to_data(io_pgtable_ops_to_params(x))
>  
>  /*
>   * We have 32 bits total; 12 bits resolved at level 1, 8 bits at level 2,
> @@ -162,11 +162,10 @@ typedef u32 arm_v7s_iopte;
>  static bool selftest_running;
>  
>  struct arm_v7s_io_pgtable {
> -	struct io_pgtable	iop;
> +	struct io_pgtable_params	iop;
>  
> -	arm_v7s_iopte		*pgd;
> -	struct kmem_cache	*l2_tables;
> -	spinlock_t		split_lock;
> +	struct kmem_cache		*l2_tables;
> +	spinlock_t			split_lock;
>  };
>  
>  static bool arm_v7s_pte_is_cont(arm_v7s_iopte pte, int lvl);
> @@ -424,13 +423,14 @@ static bool arm_v7s_pte_is_cont(arm_v7s_iopte pte, int lvl)
>  	return false;
>  }
>  
> -static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable *,
> +static size_t __arm_v7s_unmap(struct io_pgtable *, struct arm_v7s_io_pgtable *,
>  			      struct iommu_iotlb_gather *, unsigned long,
>  			      size_t, int, arm_v7s_iopte *);
>  
> -static int arm_v7s_init_pte(struct arm_v7s_io_pgtable *data,
> -			    unsigned long iova, phys_addr_t paddr, int prot,
> -			    int lvl, int num_entries, arm_v7s_iopte *ptep)
> +static int arm_v7s_init_pte(struct io_pgtable *iop,
> +			    struct arm_v7s_io_pgtable *data, unsigned long iova,
> +			    phys_addr_t paddr, int prot, int lvl,
> +			    int num_entries, arm_v7s_iopte *ptep)
>  {
>  	struct io_pgtable_cfg *cfg = &data->iop.cfg;
>  	arm_v7s_iopte pte;
> @@ -446,7 +446,7 @@ static int arm_v7s_init_pte(struct arm_v7s_io_pgtable *data,
>  			size_t sz = ARM_V7S_BLOCK_SIZE(lvl);
>  
>  			tblp = ptep - ARM_V7S_LVL_IDX(iova, lvl, cfg);
> -			if (WARN_ON(__arm_v7s_unmap(data, NULL, iova + i * sz,
> +			if (WARN_ON(__arm_v7s_unmap(iop, data, NULL, iova + i * sz,
>  						    sz, lvl, tblp) != sz))
>  				return -EINVAL;
>  		} else if (ptep[i]) {
> @@ -494,9 +494,9 @@ static arm_v7s_iopte arm_v7s_install_table(arm_v7s_iopte *table,
>  	return old;
>  }
>  
> -static int __arm_v7s_map(struct arm_v7s_io_pgtable *data, unsigned long iova,
> -			 phys_addr_t paddr, size_t size, int prot,
> -			 int lvl, arm_v7s_iopte *ptep, gfp_t gfp)
> +static int __arm_v7s_map(struct io_pgtable *iop, struct arm_v7s_io_pgtable *data,
> +			 unsigned long iova, phys_addr_t paddr, size_t size,
> +			 int prot, int lvl, arm_v7s_iopte *ptep, gfp_t gfp)
>  {
>  	struct io_pgtable_cfg *cfg = &data->iop.cfg;
>  	arm_v7s_iopte pte, *cptep;
> @@ -507,7 +507,7 @@ static int __arm_v7s_map(struct arm_v7s_io_pgtable *data, unsigned long iova,
>  
>  	/* If we can install a leaf entry at this level, then do so */
>  	if (num_entries)
> -		return arm_v7s_init_pte(data, iova, paddr, prot,
> +		return arm_v7s_init_pte(iop, data, iova, paddr, prot,
>  					lvl, num_entries, ptep);
>  
>  	/* We can't allocate tables at the final level */
> @@ -538,14 +538,14 @@ static int __arm_v7s_map(struct arm_v7s_io_pgtable *data, unsigned long iova,
>  	}
>  
>  	/* Rinse, repeat */
> -	return __arm_v7s_map(data, iova, paddr, size, prot, lvl + 1, cptep, gfp);
> +	return __arm_v7s_map(iop, data, iova, paddr, size, prot, lvl + 1, cptep, gfp);
>  }
>  
> -static int arm_v7s_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
> +static int arm_v7s_map_pages(struct io_pgtable *iop, unsigned long iova,
>  			     phys_addr_t paddr, size_t pgsize, size_t pgcount,
>  			     int prot, gfp_t gfp, size_t *mapped)
>  {
> -	struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(ops);
> +	struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(iop->ops);
>  	int ret = -EINVAL;
>  
>  	if (WARN_ON(iova >= (1ULL << data->iop.cfg.ias) ||
> @@ -557,8 +557,8 @@ static int arm_v7s_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
>  		return 0;
>  
>  	while (pgcount--) {
> -		ret = __arm_v7s_map(data, iova, paddr, pgsize, prot, 1, data->pgd,
> -				    gfp);
> +		ret = __arm_v7s_map(iop, data, iova, paddr, pgsize, prot, 1,
> +				    iop->pgd, gfp);
>  		if (ret)
>  			break;
>  
> @@ -577,26 +577,26 @@ static int arm_v7s_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
>  
>  static void arm_v7s_free_pgtable(struct io_pgtable *iop)
>  {
> -	struct arm_v7s_io_pgtable *data = io_pgtable_to_data(iop);
> +	struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(iop->ops);
> +	arm_v7s_iopte *ptep = iop->pgd;
>  	int i;
>  
> -	for (i = 0; i < ARM_V7S_PTES_PER_LVL(1, &data->iop.cfg); i++) {
> -		arm_v7s_iopte pte = data->pgd[i];
> -
> -		if (ARM_V7S_PTE_IS_TABLE(pte, 1))
> -			__arm_v7s_free_table(iopte_deref(pte, 1, data),
> +	for (i = 0; i < ARM_V7S_PTES_PER_LVL(1, &data->iop.cfg); i++, ptep++) {
> +		if (ARM_V7S_PTE_IS_TABLE(*ptep, 1))
> +			__arm_v7s_free_table(iopte_deref(*ptep, 1, data),
>  					     2, data);
>  	}
> -	__arm_v7s_free_table(data->pgd, 1, data);
> +	__arm_v7s_free_table(iop->pgd, 1, data);
>  	kmem_cache_destroy(data->l2_tables);
>  	kfree(data);
>  }
>  
> -static arm_v7s_iopte arm_v7s_split_cont(struct arm_v7s_io_pgtable *data,
> +static arm_v7s_iopte arm_v7s_split_cont(struct io_pgtable *iop,
> +					struct arm_v7s_io_pgtable *data,
>  					unsigned long iova, int idx, int lvl,
>  					arm_v7s_iopte *ptep)
>  {
> -	struct io_pgtable *iop = &data->iop;
> +	struct io_pgtable_cfg *cfg = &data->iop.cfg;
>  	arm_v7s_iopte pte;
>  	size_t size = ARM_V7S_BLOCK_SIZE(lvl);
>  	int i;
> @@ -611,14 +611,15 @@ static arm_v7s_iopte arm_v7s_split_cont(struct arm_v7s_io_pgtable *data,
>  	for (i = 0; i < ARM_V7S_CONT_PAGES; i++)
>  		ptep[i] = pte + i * size;
>  
> -	__arm_v7s_pte_sync(ptep, ARM_V7S_CONT_PAGES, &iop->cfg);
> +	__arm_v7s_pte_sync(ptep, ARM_V7S_CONT_PAGES, cfg);
>  
>  	size *= ARM_V7S_CONT_PAGES;
> -	io_pgtable_tlb_flush_walk(iop, iova, size, size);
> +	io_pgtable_tlb_flush_walk(cfg, iop, iova, size, size);
>  	return pte;
>  }
>  
> -static size_t arm_v7s_split_blk_unmap(struct arm_v7s_io_pgtable *data,
> +static size_t arm_v7s_split_blk_unmap(struct io_pgtable *iop,
> +				      struct arm_v7s_io_pgtable *data,
>  				      struct iommu_iotlb_gather *gather,
>  				      unsigned long iova, size_t size,
>  				      arm_v7s_iopte blk_pte,
> @@ -656,27 +657,28 @@ static size_t arm_v7s_split_blk_unmap(struct arm_v7s_io_pgtable *data,
>  			return 0;
>  
>  		tablep = iopte_deref(pte, 1, data);
> -		return __arm_v7s_unmap(data, gather, iova, size, 2, tablep);
> +		return __arm_v7s_unmap(iop, data, gather, iova, size, 2, tablep);
>  	}
>  
> -	io_pgtable_tlb_add_page(&data->iop, gather, iova, size);
> +	io_pgtable_tlb_add_page(cfg, iop, gather, iova, size);
>  	return size;
>  }
>  
> -static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable *data,
> +static size_t __arm_v7s_unmap(struct io_pgtable *iop,
> +			      struct arm_v7s_io_pgtable *data,
>  			      struct iommu_iotlb_gather *gather,
>  			      unsigned long iova, size_t size, int lvl,
>  			      arm_v7s_iopte *ptep)
>  {
>  	arm_v7s_iopte pte[ARM_V7S_CONT_PAGES];
> -	struct io_pgtable *iop = &data->iop;
> +	struct io_pgtable_cfg *cfg = &data->iop.cfg;
>  	int idx, i = 0, num_entries = size >> ARM_V7S_LVL_SHIFT(lvl);
>  
>  	/* Something went horribly wrong and we ran out of page table */
>  	if (WARN_ON(lvl > 2))
>  		return 0;
>  
> -	idx = ARM_V7S_LVL_IDX(iova, lvl, &iop->cfg);
> +	idx = ARM_V7S_LVL_IDX(iova, lvl, cfg);
>  	ptep += idx;
>  	do {
>  		pte[i] = READ_ONCE(ptep[i]);
> @@ -698,7 +700,7 @@ static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable *data,
>  		unsigned long flags;
>  
>  		spin_lock_irqsave(&data->split_lock, flags);
> -		pte[0] = arm_v7s_split_cont(data, iova, idx, lvl, ptep);
> +		pte[0] = arm_v7s_split_cont(iop, data, iova, idx, lvl, ptep);
>  		spin_unlock_irqrestore(&data->split_lock, flags);
>  	}
>  
> @@ -706,17 +708,18 @@ static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable *data,
>  	if (num_entries) {
>  		size_t blk_size = ARM_V7S_BLOCK_SIZE(lvl);
>  
> -		__arm_v7s_set_pte(ptep, 0, num_entries, &iop->cfg);
> +		__arm_v7s_set_pte(ptep, 0, num_entries, cfg);
>  
>  		for (i = 0; i < num_entries; i++) {
>  			if (ARM_V7S_PTE_IS_TABLE(pte[i], lvl)) {
>  				/* Also flush any partial walks */
> -				io_pgtable_tlb_flush_walk(iop, iova, blk_size,
> +				io_pgtable_tlb_flush_walk(cfg, iop, iova, blk_size,
>  						ARM_V7S_BLOCK_SIZE(lvl + 1));
>  				ptep = iopte_deref(pte[i], lvl, data);
>  				__arm_v7s_free_table(ptep, lvl + 1, data);
>  			} else if (!iommu_iotlb_gather_queued(gather)) {
> -				io_pgtable_tlb_add_page(iop, gather, iova, blk_size);
> +				io_pgtable_tlb_add_page(cfg, iop, gather, iova,
> +							blk_size);
>  			}
>  			iova += blk_size;
>  		}
> @@ -726,27 +729,27 @@ static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable *data,
>  		 * Insert a table at the next level to map the old region,
>  		 * minus the part we want to unmap
>  		 */
> -		return arm_v7s_split_blk_unmap(data, gather, iova, size, pte[0],
> -					       ptep);
> +		return arm_v7s_split_blk_unmap(iop, data, gather, iova, size,
> +					       pte[0], ptep);
>  	}
>  
>  	/* Keep on walkin' */
>  	ptep = iopte_deref(pte[0], lvl, data);
> -	return __arm_v7s_unmap(data, gather, iova, size, lvl + 1, ptep);
> +	return __arm_v7s_unmap(iop, data, gather, iova, size, lvl + 1, ptep);
>  }
>  
> -static size_t arm_v7s_unmap_pages(struct io_pgtable_ops *ops, unsigned long iova,
> +static size_t arm_v7s_unmap_pages(struct io_pgtable *iop, unsigned long iova,
>  				  size_t pgsize, size_t pgcount,
>  				  struct iommu_iotlb_gather *gather)
>  {
> -	struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(ops);
> +	struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(iop->ops);
>  	size_t unmapped = 0, ret;
>  
>  	if (WARN_ON(iova >= (1ULL << data->iop.cfg.ias)))
>  		return 0;
>  
>  	while (pgcount--) {
> -		ret = __arm_v7s_unmap(data, gather, iova, pgsize, 1, data->pgd);
> +		ret = __arm_v7s_unmap(iop, data, gather, iova, pgsize, 1, iop->pgd);
>  		if (!ret)
>  			break;
>  
> @@ -757,11 +760,11 @@ static size_t arm_v7s_unmap_pages(struct io_pgtable_ops *ops, unsigned long iova
>  	return unmapped;
>  }
>  
> -static phys_addr_t arm_v7s_iova_to_phys(struct io_pgtable_ops *ops,
> +static phys_addr_t arm_v7s_iova_to_phys(struct io_pgtable *iop,
>  					unsigned long iova)
>  {
> -	struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(ops);
> -	arm_v7s_iopte *ptep = data->pgd, pte;
> +	struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(iop->ops);
> +	arm_v7s_iopte *ptep = iop->pgd, pte;
>  	int lvl = 0;
>  	u32 mask;
>  
> @@ -780,37 +783,37 @@ static phys_addr_t arm_v7s_iova_to_phys(struct io_pgtable_ops *ops,
>  	return iopte_to_paddr(pte, lvl, &data->iop.cfg) | (iova & ~mask);
>  }
>  
> -static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg,
> -						void *cookie)
> +static int arm_v7s_alloc_pgtable(struct io_pgtable *iop,
> +				 struct io_pgtable_cfg *cfg, void *cookie)
>  {
>  	struct arm_v7s_io_pgtable *data;
>  	slab_flags_t slab_flag;
>  	phys_addr_t paddr;
>  
>  	if (cfg->ias > (arm_v7s_is_mtk_enabled(cfg) ? 34 : ARM_V7S_ADDR_BITS))
> -		return NULL;
> +		return -EINVAL;
>  
>  	if (cfg->oas > (arm_v7s_is_mtk_enabled(cfg) ? 35 : ARM_V7S_ADDR_BITS))
> -		return NULL;
> +		return -EINVAL;
>  
>  	if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS |
>  			    IO_PGTABLE_QUIRK_NO_PERMS |
>  			    IO_PGTABLE_QUIRK_ARM_MTK_EXT |
>  			    IO_PGTABLE_QUIRK_ARM_MTK_TTBR_EXT))
> -		return NULL;
> +		return -EINVAL;
>  
>  	/* If ARM_MTK_4GB is enabled, the NO_PERMS is also expected. */
>  	if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_MTK_EXT &&
>  	    !(cfg->quirks & IO_PGTABLE_QUIRK_NO_PERMS))
> -			return NULL;
> +		return -EINVAL;
>  
>  	if ((cfg->quirks & IO_PGTABLE_QUIRK_ARM_MTK_TTBR_EXT) &&
>  	    !arm_v7s_is_mtk_enabled(cfg))
> -		return NULL;
> +		return -EINVAL;
>  
>  	data = kmalloc(sizeof(*data), GFP_KERNEL);
>  	if (!data)
> -		return NULL;
> +		return -ENOMEM;
>  
>  	spin_lock_init(&data->split_lock);
>  
> @@ -860,15 +863,15 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg,
>  				ARM_V7S_NMRR_OR(7, ARM_V7S_RGN_WBWA);
>  
>  	/* Looking good; allocate a pgd */
> -	data->pgd = __arm_v7s_alloc_table(1, GFP_KERNEL, data);
> -	if (!data->pgd)
> +	iop->pgd = __arm_v7s_alloc_table(1, GFP_KERNEL, data);
> +	if (!iop->pgd)
>  		goto out_free_data;
>  
>  	/* Ensure the empty pgd is visible before any actual TTBR write */
>  	wmb();
>  
>  	/* TTBR */
> -	paddr = virt_to_phys(data->pgd);
> +	paddr = virt_to_phys(iop->pgd);
>  	if (arm_v7s_is_mtk_enabled(cfg))
>  		cfg->arm_v7s_cfg.ttbr = paddr | upper_32_bits(paddr);
>  	else
> @@ -878,12 +881,13 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg,
>  					 ARM_V7S_TTBR_ORGN_ATTR(ARM_V7S_RGN_WBWA)) :
>  					(ARM_V7S_TTBR_IRGN_ATTR(ARM_V7S_RGN_NC) |
>  					 ARM_V7S_TTBR_ORGN_ATTR(ARM_V7S_RGN_NC)));
> -	return &data->iop;
> +	iop->ops = &data->iop.ops;
> +	return 0;
>  
>  out_free_data:
>  	kmem_cache_destroy(data->l2_tables);
>  	kfree(data);
> -	return NULL;
> +	return -EINVAL;
>  }
>  
>  struct io_pgtable_init_fns io_pgtable_arm_v7s_init_fns = {
> @@ -920,7 +924,7 @@ static const struct iommu_flush_ops dummy_tlb_ops __initconst = {
>  	.tlb_add_page	= dummy_tlb_add_page,
>  };
>  
> -#define __FAIL(ops)	({				\
> +#define __FAIL()	({				\
>  		WARN(1, "selftest: test failed\n");	\
>  		selftest_running = false;		\
>  		-EFAULT;				\
> @@ -928,7 +932,7 @@ static const struct iommu_flush_ops dummy_tlb_ops __initconst = {
>  
>  static int __init arm_v7s_do_selftests(void)
>  {
> -	struct io_pgtable_ops *ops;
> +	struct io_pgtable iop;
>  	struct io_pgtable_cfg cfg = {
>  		.fmt = ARM_V7S,
>  		.tlb = &dummy_tlb_ops,
> @@ -946,8 +950,7 @@ static int __init arm_v7s_do_selftests(void)
>  
>  	cfg_cookie = &cfg;
>  
> -	ops = alloc_io_pgtable_ops(&cfg, &cfg);
> -	if (!ops) {
> +	if (alloc_io_pgtable_ops(&iop, &cfg, &cfg)) {
>  		pr_err("selftest: failed to allocate io pgtable ops\n");
>  		return -EINVAL;
>  	}
> @@ -956,14 +959,14 @@ static int __init arm_v7s_do_selftests(void)
>  	 * Initial sanity checks.
>  	 * Empty page tables shouldn't provide any translations.
>  	 */
> -	if (ops->iova_to_phys(ops, 42))
> -		return __FAIL(ops);
> +	if (iopt_iova_to_phys(&iop, 42))
> +		return __FAIL();
>  
> -	if (ops->iova_to_phys(ops, SZ_1G + 42))
> -		return __FAIL(ops);
> +	if (iopt_iova_to_phys(&iop, SZ_1G + 42))
> +		return __FAIL();
>  
> -	if (ops->iova_to_phys(ops, SZ_2G + 42))
> -		return __FAIL(ops);
> +	if (iopt_iova_to_phys(&iop, SZ_2G + 42))
> +		return __FAIL();
>  
>  	/*
>  	 * Distinct mappings of different granule sizes.
> @@ -971,20 +974,20 @@ static int __init arm_v7s_do_selftests(void)
>  	iova = 0;
>  	for_each_set_bit(i, &cfg.pgsize_bitmap, BITS_PER_LONG) {
>  		size = 1UL << i;
> -		if (ops->map_pages(ops, iova, iova, size, 1,
> +		if (iopt_map_pages(&iop, iova, iova, size, 1,
>  				   IOMMU_READ | IOMMU_WRITE |
>  				   IOMMU_NOEXEC | IOMMU_CACHE,
>  				   GFP_KERNEL, &mapped))
> -			return __FAIL(ops);
> +			return __FAIL();
>  
>  		/* Overlapping mappings */
> -		if (!ops->map_pages(ops, iova, iova + size, size, 1,
> +		if (!iopt_map_pages(&iop, iova, iova + size, size, 1,
>  				    IOMMU_READ | IOMMU_NOEXEC, GFP_KERNEL,
>  				    &mapped))
> -			return __FAIL(ops);
> +			return __FAIL();
>  
> -		if (ops->iova_to_phys(ops, iova + 42) != (iova + 42))
> -			return __FAIL(ops);
> +		if (iopt_iova_to_phys(&iop, iova + 42) != (iova + 42))
> +			return __FAIL();
>  
>  		iova += SZ_16M;
>  		loopnr++;
> @@ -995,17 +998,17 @@ static int __init arm_v7s_do_selftests(void)
>  	size = 1UL << __ffs(cfg.pgsize_bitmap);
>  	while (i < loopnr) {
>  		iova_start = i * SZ_16M;
> -		if (ops->unmap_pages(ops, iova_start + size, size, 1, NULL) != size)
> -			return __FAIL(ops);
> +		if (iopt_unmap_pages(&iop, iova_start + size, size, 1, NULL) != size)
> +			return __FAIL();
>  
>  		/* Remap of partial unmap */
> -		if (ops->map_pages(ops, iova_start + size, size, size, 1,
> +		if (iopt_map_pages(&iop, iova_start + size, size, size, 1,
>  				   IOMMU_READ, GFP_KERNEL, &mapped))
> -			return __FAIL(ops);
> +			return __FAIL();
>  
> -		if (ops->iova_to_phys(ops, iova_start + size + 42)
> +		if (iopt_iova_to_phys(&iop, iova_start + size + 42)
>  		    != (size + 42))
> -			return __FAIL(ops);
> +			return __FAIL();
>  		i++;
>  	}
>  
> @@ -1014,24 +1017,24 @@ static int __init arm_v7s_do_selftests(void)
>  	for_each_set_bit(i, &cfg.pgsize_bitmap, BITS_PER_LONG) {
>  		size = 1UL << i;
>  
> -		if (ops->unmap_pages(ops, iova, size, 1, NULL) != size)
> -			return __FAIL(ops);
> +		if (iopt_unmap_pages(&iop, iova, size, 1, NULL) != size)
> +			return __FAIL();
>  
> -		if (ops->iova_to_phys(ops, iova + 42))
> -			return __FAIL(ops);
> +		if (iopt_iova_to_phys(&iop, iova + 42))
> +			return __FAIL();
>  
>  		/* Remap full block */
> -		if (ops->map_pages(ops, iova, iova, size, 1, IOMMU_WRITE,
> +		if (iopt_map_pages(&iop, iova, iova, size, 1, IOMMU_WRITE,
>  				   GFP_KERNEL, &mapped))
> -			return __FAIL(ops);
> +			return __FAIL();
>  
> -		if (ops->iova_to_phys(ops, iova + 42) != (iova + 42))
> -			return __FAIL(ops);
> +		if (iopt_iova_to_phys(&iop, iova + 42) != (iova + 42))
> +			return __FAIL();
>  
>  		iova += SZ_16M;
>  	}
>  
> -	free_io_pgtable_ops(ops);
> +	free_io_pgtable_ops(&iop);
>  
>  	selftest_running = false;
>  
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index c412500efadf..bee8980c89eb 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -82,40 +82,40 @@ void __arm_lpae_sync_pte(arm_lpae_iopte *ptep, int num_entries,
>  
>  static void arm_lpae_free_pgtable(struct io_pgtable *iop)
>  {
> -	struct arm_lpae_io_pgtable *data = io_pgtable_to_data(iop);
> +	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(iop->ops);
>  
> -	__arm_lpae_free_pgtable(data, data->start_level, data->pgd);
> +	__arm_lpae_free_pgtable(data, data->start_level, iop->pgd);
>  	kfree(data);
>  }
>  
> -static struct io_pgtable *
> -arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
> +int arm_64_lpae_alloc_pgtable_s1(struct io_pgtable *iop,
> +				 struct io_pgtable_cfg *cfg, void *cookie)
>  {
>  	struct arm_lpae_io_pgtable *data;
>  
>  	data = kzalloc(sizeof(*data), GFP_KERNEL);
>  	if (!data)
> -		return NULL;
> +		return -ENOMEM;
>  
>  	if (arm_lpae_init_pgtable_s1(cfg, data))
>  		goto out_free_data;
>  
>  	/* Looking good; allocate a pgd */
> -	data->pgd = __arm_lpae_alloc_pages(ARM_LPAE_PGD_SIZE(data),
> -					   GFP_KERNEL, cfg);
> -	if (!data->pgd)
> +	iop->pgd = __arm_lpae_alloc_pages(ARM_LPAE_PGD_SIZE(data),
> +					  GFP_KERNEL, cfg);
> +	if (!iop->pgd)
>  		goto out_free_data;
>  
>  	/* Ensure the empty pgd is visible before any actual TTBR write */
>  	wmb();
>  
> -	/* TTBR */
> -	cfg->arm_lpae_s1_cfg.ttbr = virt_to_phys(data->pgd);
> -	return &data->iop;
> +	cfg->arm_lpae_s1_cfg.ttbr = virt_to_phys(iop->pgd);
> +	iop->ops = &data->iop.ops;
> +	return 0;
>  
>  out_free_data:
>  	kfree(data);
> -	return NULL;
> +	return -EINVAL;
>  }
>  
>  static int arm_64_lpae_configure_s1(struct io_pgtable_cfg *cfg, size_t *pgd_size)
> @@ -130,34 +130,35 @@ static int arm_64_lpae_configure_s1(struct io_pgtable_cfg *cfg, size_t *pgd_size
>  	return 0;
>  }
>  
> -static struct io_pgtable *
> -arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
> +int arm_64_lpae_alloc_pgtable_s2(struct io_pgtable *iop,
> +				 struct io_pgtable_cfg *cfg, void *cookie)
>  {
>  	struct arm_lpae_io_pgtable *data;
>  
>  	data = kzalloc(sizeof(*data), GFP_KERNEL);
>  	if (!data)
> -		return NULL;
> +		return -ENOMEM;
>  
>  	if (arm_lpae_init_pgtable_s2(cfg, data))
>  		goto out_free_data;
>  
>  	/* Allocate pgd pages */
> -	data->pgd = __arm_lpae_alloc_pages(ARM_LPAE_PGD_SIZE(data),
> -					   GFP_KERNEL, cfg);
> -	if (!data->pgd)
> +	iop->pgd = __arm_lpae_alloc_pages(ARM_LPAE_PGD_SIZE(data),
> +					  GFP_KERNEL, cfg);
> +	if (!iop->pgd)
>  		goto out_free_data;
>  
>  	/* Ensure the empty pgd is visible before any actual TTBR write */
>  	wmb();
>  
>  	/* VTTBR */
> -	cfg->arm_lpae_s2_cfg.vttbr = virt_to_phys(data->pgd);
> -	return &data->iop;
> +	cfg->arm_lpae_s2_cfg.vttbr = virt_to_phys(iop->pgd);
> +	iop->ops = &data->iop.ops;
> +	return 0;
>  
>  out_free_data:
>  	kfree(data);
> -	return NULL;
> +	return -EINVAL;
>  }
>  
>  static int arm_64_lpae_configure_s2(struct io_pgtable_cfg *cfg, size_t *pgd_size)
> @@ -172,46 +173,46 @@ static int arm_64_lpae_configure_s2(struct io_pgtable_cfg *cfg, size_t *pgd_size
>  	return 0;
>  }
>  
> -static struct io_pgtable *
> -arm_32_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
> +int arm_32_lpae_alloc_pgtable_s1(struct io_pgtable *iop,
> +				 struct io_pgtable_cfg *cfg, void *cookie)
>  {
>  	if (cfg->ias > 32 || cfg->oas > 40)
> -		return NULL;
> +		return -EINVAL;
>  
>  	cfg->pgsize_bitmap &= (SZ_4K | SZ_2M | SZ_1G);
> -	return arm_64_lpae_alloc_pgtable_s1(cfg, cookie);
> +	return arm_64_lpae_alloc_pgtable_s1(iop, cfg, cookie);
>  }
>  
> -static struct io_pgtable *
> -arm_32_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
> +int arm_32_lpae_alloc_pgtable_s2(struct io_pgtable *iop,
> +				 struct io_pgtable_cfg *cfg, void *cookie)
>  {
>  	if (cfg->ias > 40 || cfg->oas > 40)
> -		return NULL;
> +		return -EINVAL;
>  
>  	cfg->pgsize_bitmap &= (SZ_4K | SZ_2M | SZ_1G);
> -	return arm_64_lpae_alloc_pgtable_s2(cfg, cookie);
> +	return arm_64_lpae_alloc_pgtable_s2(iop, cfg, cookie);
>  }
>  
> -static struct io_pgtable *
> -arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
> +int arm_mali_lpae_alloc_pgtable(struct io_pgtable *iop,
> +				struct io_pgtable_cfg *cfg, void *cookie)
>  {
>  	struct arm_lpae_io_pgtable *data;
>  
>  	/* No quirks for Mali (hopefully) */
>  	if (cfg->quirks)
> -		return NULL;
> +		return -EINVAL;
>  
>  	if (cfg->ias > 48 || cfg->oas > 40)
> -		return NULL;
> +		return -EINVAL;
>  
>  	cfg->pgsize_bitmap &= (SZ_4K | SZ_2M | SZ_1G);
>  
>  	data = kzalloc(sizeof(*data), GFP_KERNEL);
>  	if (!data)
> -		return NULL;
> +		return -ENOMEM;
>  
>  	if (arm_lpae_init_pgtable(cfg, data))
> -		return NULL;
> +		goto out_free_data;
>  
>  	/* Mali seems to need a full 4-level table regardless of IAS */
>  	if (data->start_level > 0) {
> @@ -233,25 +234,26 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
>  		(ARM_MALI_LPAE_MEMATTR_IMP_DEF
>  		 << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_DEV));
>  
> -	data->pgd = __arm_lpae_alloc_pages(ARM_LPAE_PGD_SIZE(data), GFP_KERNEL,
> -					   cfg);
> -	if (!data->pgd)
> +	iop->pgd = __arm_lpae_alloc_pages(ARM_LPAE_PGD_SIZE(data), GFP_KERNEL,
> +					  cfg);
> +	if (!iop->pgd)
>  		goto out_free_data;
>  
>  	/* Ensure the empty pgd is visible before TRANSTAB can be written */
>  	wmb();
>  
> -	cfg->arm_mali_lpae_cfg.transtab = virt_to_phys(data->pgd) |
> +	cfg->arm_mali_lpae_cfg.transtab = virt_to_phys(iop->pgd) |
>  					  ARM_MALI_LPAE_TTBR_READ_INNER |
>  					  ARM_MALI_LPAE_TTBR_ADRMODE_TABLE;
>  	if (cfg->coherent_walk)
>  		cfg->arm_mali_lpae_cfg.transtab |= ARM_MALI_LPAE_TTBR_SHARE_OUTER;
>  
> -	return &data->iop;
> +	iop->ops = &data->iop.ops;
> +	return 0;
>  
>  out_free_data:
>  	kfree(data);
> -	return NULL;
> +	return -EINVAL;
>  }
>  
>  struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s1_init_fns = {
> @@ -310,21 +312,21 @@ static const struct iommu_flush_ops dummy_tlb_ops __initconst = {
>  	.tlb_add_page	= dummy_tlb_add_page,
>  };
>  
> -static void __init arm_lpae_dump_ops(struct io_pgtable_ops *ops)
> +static void __init arm_lpae_dump_ops(struct io_pgtable *iop)
>  {
> -	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
> +	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(iop->ops);
>  	struct io_pgtable_cfg *cfg = &data->iop.cfg;
>  
>  	pr_err("cfg: pgsize_bitmap 0x%lx, ias %u-bit\n",
>  		cfg->pgsize_bitmap, cfg->ias);
>  	pr_err("data: %d levels, 0x%zx pgd_size, %u pg_shift, %u bits_per_level, pgd @ %p\n",
>  		ARM_LPAE_MAX_LEVELS - data->start_level, ARM_LPAE_PGD_SIZE(data),
> -		ilog2(ARM_LPAE_GRANULE(data)), data->bits_per_level, data->pgd);
> +		ilog2(ARM_LPAE_GRANULE(data)), data->bits_per_level, iop->pgd);
>  }
>  
> -#define __FAIL(ops, i)	({						\
> +#define __FAIL(iop, i)	({						\
>  		WARN(1, "selftest: test failed for fmt idx %d\n", (i));	\
> -		arm_lpae_dump_ops(ops);					\
> +		arm_lpae_dump_ops(iop);					\
>  		selftest_running = false;				\
>  		-EFAULT;						\
>  })
> @@ -336,34 +338,34 @@ static int __init arm_lpae_run_tests(struct io_pgtable_cfg *cfg)
>  		ARM_64_LPAE_S2,
>  	};
>  
> -	int i, j;
> +	int i, j, ret;
>  	unsigned long iova;
>  	size_t size, mapped;
> -	struct io_pgtable_ops *ops;
> +	struct io_pgtable iop;
>  
>  	selftest_running = true;
>  
>  	for (i = 0; i < ARRAY_SIZE(fmts); ++i) {
>  		cfg_cookie = cfg;
>  		cfg->fmt = fmts[i];
> -		ops = alloc_io_pgtable_ops(cfg, cfg);
> -		if (!ops) {
> +		ret = alloc_io_pgtable_ops(&iop, cfg, cfg);
> +		if (ret) {
>  			pr_err("selftest: failed to allocate io pgtable ops\n");
> -			return -ENOMEM;
> +			return ret;
>  		}
>  
>  		/*
>  		 * Initial sanity checks.
>  		 * Empty page tables shouldn't provide any translations.
>  		 */
> -		if (ops->iova_to_phys(ops, 42))
> -			return __FAIL(ops, i);
> +		if (iopt_iova_to_phys(&iop, 42))
> +			return __FAIL(&iop, i);
>  
> -		if (ops->iova_to_phys(ops, SZ_1G + 42))
> -			return __FAIL(ops, i);
> +		if (iopt_iova_to_phys(&iop, SZ_1G + 42))
> +			return __FAIL(&iop, i);
>  
> -		if (ops->iova_to_phys(ops, SZ_2G + 42))
> -			return __FAIL(ops, i);
> +		if (iopt_iova_to_phys(&iop, SZ_2G + 42))
> +			return __FAIL(&iop, i);
>  
>  		/*
>  		 * Distinct mappings of different granule sizes.
> @@ -372,60 +374,60 @@ static int __init arm_lpae_run_tests(struct io_pgtable_cfg *cfg)
>  		for_each_set_bit(j, &cfg->pgsize_bitmap, BITS_PER_LONG) {
>  			size = 1UL << j;
>  
> -			if (ops->map_pages(ops, iova, iova, size, 1,
> +			if (iopt_map_pages(&iop, iova, iova, size, 1,
>  					   IOMMU_READ | IOMMU_WRITE |
>  					   IOMMU_NOEXEC | IOMMU_CACHE,
>  					   GFP_KERNEL, &mapped))
> -				return __FAIL(ops, i);
> +				return __FAIL(&iop, i);
>  
>  			/* Overlapping mappings */
> -			if (!ops->map_pages(ops, iova, iova + size, size, 1,
> +			if (!iopt_map_pages(&iop, iova, iova + size, size, 1,
>  					    IOMMU_READ | IOMMU_NOEXEC,
>  					    GFP_KERNEL, &mapped))
> -				return __FAIL(ops, i);
> +				return __FAIL(&iop, i);
>  
> -			if (ops->iova_to_phys(ops, iova + 42) != (iova + 42))
> -				return __FAIL(ops, i);
> +			if (iopt_iova_to_phys(&iop, iova + 42) != (iova + 42))
> +				return __FAIL(&iop, i);
>  
>  			iova += SZ_1G;
>  		}
>  
>  		/* Partial unmap */
>  		size = 1UL << __ffs(cfg->pgsize_bitmap);
> -		if (ops->unmap_pages(ops, SZ_1G + size, size, 1, NULL) != size)
> -			return __FAIL(ops, i);
> +		if (iopt_unmap_pages(&iop, SZ_1G + size, size, 1, NULL) != size)
> +			return __FAIL(&iop, i);
>  
>  		/* Remap of partial unmap */
> -		if (ops->map_pages(ops, SZ_1G + size, size, size, 1,
> +		if (iopt_map_pages(&iop, SZ_1G + size, size, size, 1,
>  				   IOMMU_READ, GFP_KERNEL, &mapped))
> -			return __FAIL(ops, i);
> +			return __FAIL(&iop, i);
>  
> -		if (ops->iova_to_phys(ops, SZ_1G + size + 42) != (size + 42))
> -			return __FAIL(ops, i);
> +		if (iopt_iova_to_phys(&iop, SZ_1G + size + 42) != (size + 42))
> +			return __FAIL(&iop, i);
>  
>  		/* Full unmap */
>  		iova = 0;
>  		for_each_set_bit(j, &cfg->pgsize_bitmap, BITS_PER_LONG) {
>  			size = 1UL << j;
>  
> -			if (ops->unmap_pages(ops, iova, size, 1, NULL) != size)
> -				return __FAIL(ops, i);
> +			if (iopt_unmap_pages(&iop, iova, size, 1, NULL) != size)
> +				return __FAIL(&iop, i);
>  
> -			if (ops->iova_to_phys(ops, iova + 42))
> -				return __FAIL(ops, i);
> +			if (iopt_iova_to_phys(&iop, iova + 42))
> +				return __FAIL(&iop, i);
>  
>  			/* Remap full block */
> -			if (ops->map_pages(ops, iova, iova, size, 1,
> +			if (iopt_map_pages(&iop, iova, iova, size, 1,
>  					   IOMMU_WRITE, GFP_KERNEL, &mapped))
> -				return __FAIL(ops, i);
> +				return __FAIL(&iop, i);
>  
> -			if (ops->iova_to_phys(ops, iova + 42) != (iova + 42))
> -				return __FAIL(ops, i);
> +			if (iopt_iova_to_phys(&iop, iova + 42) != (iova + 42))
> +				return __FAIL(&iop, i);
>  
>  			iova += SZ_1G;
>  		}
>  
> -		free_io_pgtable_ops(ops);
> +		free_io_pgtable_ops(&iop);
>  	}
>  
>  	selftest_running = false;
> diff --git a/drivers/iommu/io-pgtable-dart.c b/drivers/iommu/io-pgtable-dart.c
> index f981b25d8c98..1bb2e91ed0a7 100644
> --- a/drivers/iommu/io-pgtable-dart.c
> +++ b/drivers/iommu/io-pgtable-dart.c
> @@ -34,7 +34,7 @@
>  	container_of((x), struct dart_io_pgtable, iop)
>  
>  #define io_pgtable_ops_to_data(x)					\
> -	io_pgtable_to_data(io_pgtable_ops_to_pgtable(x))
> +	io_pgtable_to_data(io_pgtable_ops_to_params(x))
>  
>  #define DART_GRANULE(d)						\
>  	(sizeof(dart_iopte) << (d)->bits_per_level)
> @@ -65,12 +65,10 @@
>  #define iopte_deref(pte, d) __va(iopte_to_paddr(pte, d))
>  
>  struct dart_io_pgtable {
> -	struct io_pgtable	iop;
> +	struct io_pgtable_params	iop;
>  
> -	int			tbl_bits;
> -	int			bits_per_level;
> -
> -	void			*pgd[DART_MAX_TABLES];
> +	int				tbl_bits;
> +	int				bits_per_level;
>  };
>  
>  typedef u64 dart_iopte;
> @@ -170,10 +168,14 @@ static dart_iopte dart_install_table(dart_iopte *table,
>  	return old;
>  }
>  
> -static int dart_get_table(struct dart_io_pgtable *data, unsigned long iova)
> +static dart_iopte *dart_get_table(struct io_pgtable *iop,
> +				  struct dart_io_pgtable *data,
> +				  unsigned long iova)
>  {
> -	return (iova >> (3 * data->bits_per_level + ilog2(sizeof(dart_iopte)))) &
> +	int tbl = (iova >> (3 * data->bits_per_level + ilog2(sizeof(dart_iopte)))) &
>  		((1 << data->tbl_bits) - 1);
> +
> +	return iop->pgd + DART_GRANULE(data) * tbl;
>  }
>  
>  static int dart_get_l1_index(struct dart_io_pgtable *data, unsigned long iova)
> @@ -190,12 +192,12 @@ static int dart_get_l2_index(struct dart_io_pgtable *data, unsigned long iova)
>  		 ((1 << data->bits_per_level) - 1);
>  }
>  
> -static  dart_iopte *dart_get_l2(struct dart_io_pgtable *data, unsigned long iova)
> +static  dart_iopte *dart_get_l2(struct io_pgtable *iop,
> +				struct dart_io_pgtable *data, unsigned long iova)
>  {
>  	dart_iopte pte, *ptep;
> -	int tbl = dart_get_table(data, iova);
>  
> -	ptep = data->pgd[tbl];
> +	ptep = dart_get_table(iop, data, iova);
>  	if (!ptep)
>  		return NULL;
>  
> @@ -233,14 +235,14 @@ static dart_iopte dart_prot_to_pte(struct dart_io_pgtable *data,
>  	return pte;
>  }
>  
> -static int dart_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
> +static int dart_map_pages(struct io_pgtable *iop, unsigned long iova,
>  			      phys_addr_t paddr, size_t pgsize, size_t pgcount,
>  			      int iommu_prot, gfp_t gfp, size_t *mapped)
>  {
> -	struct dart_io_pgtable *data = io_pgtable_ops_to_data(ops);
> +	struct dart_io_pgtable *data = io_pgtable_ops_to_data(iop->ops);
>  	struct io_pgtable_cfg *cfg = &data->iop.cfg;
>  	size_t tblsz = DART_GRANULE(data);
> -	int ret = 0, tbl, num_entries, max_entries, map_idx_start;
> +	int ret = 0, num_entries, max_entries, map_idx_start;
>  	dart_iopte pte, *cptep, *ptep;
>  	dart_iopte prot;
>  
> @@ -254,9 +256,7 @@ static int dart_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
>  	if (!(iommu_prot & (IOMMU_READ | IOMMU_WRITE)))
>  		return 0;
>  
> -	tbl = dart_get_table(data, iova);
> -
> -	ptep = data->pgd[tbl];
> +	ptep = dart_get_table(iop, data, iova);
>  	ptep += dart_get_l1_index(data, iova);
>  	pte = READ_ONCE(*ptep);
>  
> @@ -295,11 +295,11 @@ static int dart_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
>  	return ret;
>  }
>  
> -static size_t dart_unmap_pages(struct io_pgtable_ops *ops, unsigned long iova,
> +static size_t dart_unmap_pages(struct io_pgtable *iop, unsigned long iova,
>  				   size_t pgsize, size_t pgcount,
>  				   struct iommu_iotlb_gather *gather)
>  {
> -	struct dart_io_pgtable *data = io_pgtable_ops_to_data(ops);
> +	struct dart_io_pgtable *data = io_pgtable_ops_to_data(iop->ops);
>  	struct io_pgtable_cfg *cfg = &data->iop.cfg;
>  	int i = 0, num_entries, max_entries, unmap_idx_start;
>  	dart_iopte pte, *ptep;
> @@ -307,7 +307,7 @@ static size_t dart_unmap_pages(struct io_pgtable_ops *ops, unsigned long iova,
>  	if (WARN_ON(pgsize != cfg->pgsize_bitmap || !pgcount))
>  		return 0;
>  
> -	ptep = dart_get_l2(data, iova);
> +	ptep = dart_get_l2(iop, data, iova);
>  
>  	/* Valid L2 IOPTE pointer? */
>  	if (WARN_ON(!ptep))
> @@ -328,7 +328,7 @@ static size_t dart_unmap_pages(struct io_pgtable_ops *ops, unsigned long iova,
>  		*ptep = 0;
>  
>  		if (!iommu_iotlb_gather_queued(gather))
> -			io_pgtable_tlb_add_page(&data->iop, gather,
> +			io_pgtable_tlb_add_page(cfg, iop, gather,
>  						iova + i * pgsize, pgsize);
>  
>  		ptep++;
> @@ -338,13 +338,13 @@ static size_t dart_unmap_pages(struct io_pgtable_ops *ops, unsigned long iova,
>  	return i * pgsize;
>  }
>  
> -static phys_addr_t dart_iova_to_phys(struct io_pgtable_ops *ops,
> +static phys_addr_t dart_iova_to_phys(struct io_pgtable *iop,
>  					 unsigned long iova)
>  {
> -	struct dart_io_pgtable *data = io_pgtable_ops_to_data(ops);
> +	struct dart_io_pgtable *data = io_pgtable_ops_to_data(iop->ops);
>  	dart_iopte pte, *ptep;
>  
> -	ptep = dart_get_l2(data, iova);
> +	ptep = dart_get_l2(iop, data, iova);
>  
>  	/* Valid L2 IOPTE pointer? */
>  	if (!ptep)
> @@ -394,56 +394,56 @@ dart_alloc_pgtable(struct io_pgtable_cfg *cfg)
>  	return data;
>  }
>  
> -static struct io_pgtable *
> -apple_dart_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
> +static int apple_dart_alloc_pgtable(struct io_pgtable *iop,
> +				    struct io_pgtable_cfg *cfg, void *cookie)
>  {
>  	struct dart_io_pgtable *data;
>  	int i;
>  
>  	if (!cfg->coherent_walk)
> -		return NULL;
> +		return -EINVAL;
>  
>  	if (cfg->oas != 36 && cfg->oas != 42)
> -		return NULL;
> +		return -EINVAL;
>  
>  	if (cfg->ias > cfg->oas)
> -		return NULL;
> +		return -EINVAL;
>  
>  	if (!(cfg->pgsize_bitmap == SZ_4K || cfg->pgsize_bitmap == SZ_16K))
> -		return NULL;
> +		return -EINVAL;
>  
>  	data = dart_alloc_pgtable(cfg);
>  	if (!data)
> -		return NULL;
> +		return -ENOMEM;
>  
>  	cfg->apple_dart_cfg.n_ttbrs = 1 << data->tbl_bits;
>  
> -	for (i = 0; i < cfg->apple_dart_cfg.n_ttbrs; ++i) {
> -		data->pgd[i] = __dart_alloc_pages(DART_GRANULE(data), GFP_KERNEL,
> -					   cfg);
> -		if (!data->pgd[i])
> -			goto out_free_data;
> -		cfg->apple_dart_cfg.ttbr[i] = virt_to_phys(data->pgd[i]);
> -	}
> +	iop->pgd = __dart_alloc_pages(cfg->apple_dart_cfg.n_ttbrs *
> +				      DART_GRANULE(data), GFP_KERNEL, cfg);
> +	if (!iop->pgd)
> +		goto out_free_data;
> +
> +	for (i = 0; i < cfg->apple_dart_cfg.n_ttbrs; ++i)
> +		cfg->apple_dart_cfg.ttbr[i] = virt_to_phys(iop->pgd) +
> +					      i * DART_GRANULE(data);
>  
> -	return &data->iop;
> +	iop->ops = &data->iop.ops;
> +	return 0;
>  
>  out_free_data:
> -	while (--i >= 0)
> -		free_pages((unsigned long)data->pgd[i],
> -			   get_order(DART_GRANULE(data)));
>  	kfree(data);
> -	return NULL;
> +	return -ENOMEM;
>  }
>  
>  static void apple_dart_free_pgtable(struct io_pgtable *iop)
>  {
> -	struct dart_io_pgtable *data = io_pgtable_to_data(iop);
> +	struct dart_io_pgtable *data = io_pgtable_ops_to_data(iop->ops);
> +	size_t n_ttbrs = 1 << data->tbl_bits;
>  	dart_iopte *ptep, *end;
>  	int i;
>  
> -	for (i = 0; i < (1 << data->tbl_bits) && data->pgd[i]; ++i) {
> -		ptep = data->pgd[i];
> +	for (i = 0; i < n_ttbrs; ++i) {
> +		ptep = iop->pgd + DART_GRANULE(data) * i;
>  		end = (void *)ptep + DART_GRANULE(data);
>  
>  		while (ptep != end) {
> @@ -456,10 +456,9 @@ static void apple_dart_free_pgtable(struct io_pgtable *iop)
>  				free_pages(page, get_order(DART_GRANULE(data)));
>  			}
>  		}
> -		free_pages((unsigned long)data->pgd[i],
> -			   get_order(DART_GRANULE(data)));
>  	}
> -
> +	free_pages((unsigned long)iop->pgd,
> +		   get_order(DART_GRANULE(data) * n_ttbrs));
>  	kfree(data);
>  }
>  
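
The DART conversion above also changes the pgd layout: instead of n_ttbrs
separately allocated tables, one physically contiguous allocation holds all of
them and each TTBR is a fixed offset into it. A short sketch of the arithmetic
mirrored by dart_get_table() and the ttbr[] setup, with dart_table_va() being
an invented name for illustration:

/*
 * New layout: n_ttbrs sub-tables of DART_GRANULE(data) bytes each, carved
 * out of the single allocation at iop->pgd:
 *
 *   iop->pgd
 *   |<- table 0 ->|<- table 1 ->| ... |<- table n_ttbrs-1 ->|
 *
 * so cfg->apple_dart_cfg.ttbr[i] = virt_to_phys(iop->pgd) + i * DART_GRANULE(data).
 */
static dart_iopte *dart_table_va(struct io_pgtable *iop,
				 struct dart_io_pgtable *data, int tbl)
{
	return iop->pgd + tbl * DART_GRANULE(data);
}
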
> diff --git a/drivers/iommu/io-pgtable.c b/drivers/iommu/io-pgtable.c
> index 2aba691db1da..acc6802b2f50 100644
> --- a/drivers/iommu/io-pgtable.c
> +++ b/drivers/iommu/io-pgtable.c
> @@ -34,27 +34,30 @@ io_pgtable_init_table[IO_PGTABLE_NUM_FMTS] = {
>  #endif
>  };
>  
> -struct io_pgtable_ops *alloc_io_pgtable_ops(struct io_pgtable_cfg *cfg,
> -					    void *cookie)
> +int alloc_io_pgtable_ops(struct io_pgtable *iop, struct io_pgtable_cfg *cfg,
> +			 void *cookie)
>  {
> -	struct io_pgtable *iop;
> +	int ret;
> +	struct io_pgtable_params *params;
>  	const struct io_pgtable_init_fns *fns;
>  
>  	if (cfg->fmt >= IO_PGTABLE_NUM_FMTS)
> -		return NULL;
> +		return -EINVAL;
>  
>  	fns = io_pgtable_init_table[cfg->fmt];
>  	if (!fns)
> -		return NULL;
> +		return -EINVAL;
>  
> -	iop = fns->alloc(cfg, cookie);
> -	if (!iop)
> -		return NULL;
> +	ret = fns->alloc(iop, cfg, cookie);
> +	if (ret)
> +		return ret;
> +
> +	params = io_pgtable_ops_to_params(iop->ops);
>  
>  	iop->cookie	= cookie;
> -	iop->cfg	= *cfg;
> +	params->cfg	= *cfg;
>  
> -	return &iop->ops;
> +	return 0;
>  }
>  EXPORT_SYMBOL_GPL(alloc_io_pgtable_ops);
>  
> @@ -62,16 +65,17 @@ EXPORT_SYMBOL_GPL(alloc_io_pgtable_ops);
>   * It is the IOMMU driver's responsibility to ensure that the page table
>   * is no longer accessible to the walker by this point.
>   */
> -void free_io_pgtable_ops(struct io_pgtable_ops *ops)
> +void free_io_pgtable_ops(struct io_pgtable *iop)
>  {
> -	struct io_pgtable *iop;
> +	struct io_pgtable_params *params;
>  
> -	if (!ops)
> +	if (!iop)
>  		return;
>  
> -	iop = io_pgtable_ops_to_pgtable(ops);
> -	io_pgtable_tlb_flush_all(iop);
> -	io_pgtable_init_table[iop->cfg.fmt]->free(iop);
> +	params = io_pgtable_ops_to_params(iop->ops);
> +	io_pgtable_tlb_flush_all(&params->cfg, iop);
> +	io_pgtable_init_table[params->cfg.fmt]->free(iop);
> +	memset(iop, 0, sizeof(*iop));
>  }
>  EXPORT_SYMBOL_GPL(free_io_pgtable_ops);
>  
> diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
> index 4a1927489635..3ff21e6bf939 100644
> --- a/drivers/iommu/ipmmu-vmsa.c
> +++ b/drivers/iommu/ipmmu-vmsa.c
> @@ -73,7 +73,7 @@ struct ipmmu_vmsa_domain {
>  	struct iommu_domain io_domain;
>  
>  	struct io_pgtable_cfg cfg;
> -	struct io_pgtable_ops *iop;
> +	struct io_pgtable iop;
>  
>  	unsigned int context_id;
>  	struct mutex mutex;			/* Protects mappings */
> @@ -458,11 +458,11 @@ static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
>  
>  	domain->context_id = ret;
>  
> -	domain->iop = alloc_io_pgtable_ops(&domain->cfg, domain);
> -	if (!domain->iop) {
> +	ret = alloc_io_pgtable_ops(&domain->iop, &domain->cfg, domain);
> +	if (ret) {
>  		ipmmu_domain_free_context(domain->mmu->root,
>  					  domain->context_id);
> -		return -EINVAL;
> +		return ret;
>  	}
>  
>  	ipmmu_domain_setup_context(domain);
> @@ -592,7 +592,7 @@ static void ipmmu_domain_free(struct iommu_domain *io_domain)
>  	 * been detached.
>  	 */
>  	ipmmu_domain_destroy_context(domain);
> -	free_io_pgtable_ops(domain->iop);
> +	free_io_pgtable_ops(&domain->iop);
>  	kfree(domain);
>  }
>  
> @@ -664,8 +664,8 @@ static int ipmmu_map(struct iommu_domain *io_domain, unsigned long iova,
>  {
>  	struct ipmmu_vmsa_domain *domain = to_vmsa_domain(io_domain);
>  
> -	return domain->iop->map_pages(domain->iop, iova, paddr, pgsize, pgcount,
> -				      prot, gfp, mapped);
> +	return iopt_map_pages(&domain->iop, iova, paddr, pgsize, pgcount, prot,
> +			      gfp, mapped);
>  }
>  
>  static size_t ipmmu_unmap(struct iommu_domain *io_domain, unsigned long iova,
> @@ -674,7 +674,7 @@ static size_t ipmmu_unmap(struct iommu_domain *io_domain, unsigned long iova,
>  {
>  	struct ipmmu_vmsa_domain *domain = to_vmsa_domain(io_domain);
>  
> -	return domain->iop->unmap_pages(domain->iop, iova, pgsize, pgcount, gather);
> +	return iopt_unmap_pages(&domain->iop, iova, pgsize, pgcount, gather);
>  }
>  
>  static void ipmmu_flush_iotlb_all(struct iommu_domain *io_domain)
> @@ -698,7 +698,7 @@ static phys_addr_t ipmmu_iova_to_phys(struct iommu_domain *io_domain,
>  
>  	/* TODO: Is locking needed ? */
>  
> -	return domain->iop->iova_to_phys(domain->iop, iova);
> +	return iopt_iova_to_phys(&domain->iop, iova);
>  }
>  
>  static int ipmmu_init_platform_device(struct device *dev,
> diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
> index 2c05a84ec1bf..6dae6743e11b 100644
> --- a/drivers/iommu/msm_iommu.c
> +++ b/drivers/iommu/msm_iommu.c
> @@ -41,7 +41,7 @@ struct msm_priv {
>  	struct list_head list_attached;
>  	struct iommu_domain domain;
>  	struct io_pgtable_cfg	cfg;
> -	struct io_pgtable_ops	*iop;
> +	struct io_pgtable	iop;
>  	struct device		*dev;
>  	spinlock_t		pgtlock; /* pagetable lock */
>  };
> @@ -339,6 +339,7 @@ static void msm_iommu_domain_free(struct iommu_domain *domain)
>  
>  static int msm_iommu_domain_config(struct msm_priv *priv)
>  {
> +	int ret;
>  	spin_lock_init(&priv->pgtlock);
>  
>  	priv->cfg = (struct io_pgtable_cfg) {
> @@ -350,10 +351,10 @@ static int msm_iommu_domain_config(struct msm_priv *priv)
>  		.iommu_dev = priv->dev,
>  	};
>  
> -	priv->iop = alloc_io_pgtable_ops(&priv->cfg, priv);
> -	if (!priv->iop) {
> +	ret = alloc_io_pgtable_ops(&priv->iop, &priv->cfg, priv);
> +	if (ret) {
>  		dev_err(priv->dev, "Failed to allocate pgtable\n");
> -		return -EINVAL;
> +		return ret;
>  	}
>  
>  	msm_iommu_ops.pgsize_bitmap = priv->cfg.pgsize_bitmap;
> @@ -453,7 +454,7 @@ static void msm_iommu_detach_dev(struct iommu_domain *domain,
>  	struct msm_iommu_ctx_dev *master;
>  	int ret;
>  
> -	free_io_pgtable_ops(priv->iop);
> +	free_io_pgtable_ops(&priv->iop);
>  
>  	spin_lock_irqsave(&msm_iommu_lock, flags);
>  	list_for_each_entry(iommu, &priv->list_attached, dom_node) {
> @@ -480,8 +481,8 @@ static int msm_iommu_map(struct iommu_domain *domain, unsigned long iova,
>  	int ret;
>  
>  	spin_lock_irqsave(&priv->pgtlock, flags);
> -	ret = priv->iop->map_pages(priv->iop, iova, pa, pgsize, pgcount, prot,
> -				   GFP_ATOMIC, mapped);
> +	ret = iopt_map_pages(&priv->iop, iova, pa, pgsize, pgcount, prot,
> +			     GFP_ATOMIC, mapped);
>  	spin_unlock_irqrestore(&priv->pgtlock, flags);
>  
>  	return ret;
> @@ -504,7 +505,7 @@ static size_t msm_iommu_unmap(struct iommu_domain *domain, unsigned long iova,
>  	size_t ret;
>  
>  	spin_lock_irqsave(&priv->pgtlock, flags);
> -	ret = priv->iop->unmap_pages(priv->iop, iova, pgsize, pgcount, gather);
> +	ret = iopt_unmap_pages(&priv->iop, iova, pgsize, pgcount, gather);
>  	spin_unlock_irqrestore(&priv->pgtlock, flags);
>  
>  	return ret;
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index 0d754d94ae52..615d9ade575e 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -244,7 +244,7 @@ struct mtk_iommu_data {
>  
>  struct mtk_iommu_domain {
>  	struct io_pgtable_cfg		cfg;
> -	struct io_pgtable_ops		*iop;
> +	struct io_pgtable		iop;
>  
>  	struct mtk_iommu_bank_data	*bank;
>  	struct iommu_domain		domain;
> @@ -587,6 +587,7 @@ static int mtk_iommu_domain_finalise(struct mtk_iommu_domain *dom,
>  {
>  	const struct mtk_iommu_iova_region *region;
>  	struct mtk_iommu_domain	*m4u_dom;
> +	int ret;
>  
>  	/* Always use bank0 in sharing pgtable case */
>  	m4u_dom = data->bank[0].m4u_dom;
> @@ -615,8 +616,8 @@ static int mtk_iommu_domain_finalise(struct mtk_iommu_domain *dom,
>  	else
>  		dom->cfg.oas = 35;
>  
> -	dom->iop = alloc_io_pgtable_ops(&dom->cfg, data);
> -	if (!dom->iop) {
> +	ret = alloc_io_pgtable_ops(&dom->iop, &dom->cfg, data);
> +	if (ret) {
>  		dev_err(data->dev, "Failed to alloc io pgtable\n");
>  		return -ENOMEM;
>  	}
> @@ -730,7 +731,7 @@ static int mtk_iommu_map(struct iommu_domain *domain, unsigned long iova,
>  		paddr |= BIT_ULL(32);
>  
>  	/* Synchronize with the tlb_lock */
> -	return dom->iop->map_pages(dom->iop, iova, paddr, pgsize, pgcount, prot, gfp, mapped);
> +	return iopt_map_pages(&dom->iop, iova, paddr, pgsize, pgcount, prot, gfp, mapped);
>  }
>  
>  static size_t mtk_iommu_unmap(struct iommu_domain *domain,
> @@ -740,7 +741,7 @@ static size_t mtk_iommu_unmap(struct iommu_domain *domain,
>  	struct mtk_iommu_domain *dom = to_mtk_domain(domain);
>  
>  	iommu_iotlb_gather_add_range(gather, iova, pgsize * pgcount);
> -	return dom->iop->unmap_pages(dom->iop, iova, pgsize, pgcount, gather);
> +	return iopt_unmap_pages(&dom->iop, iova, pgsize, pgcount, gather);
>  }
>  
>  static void mtk_iommu_flush_iotlb_all(struct iommu_domain *domain)
> @@ -773,7 +774,7 @@ static phys_addr_t mtk_iommu_iova_to_phys(struct iommu_domain *domain,
>  	struct mtk_iommu_domain *dom = to_mtk_domain(domain);
>  	phys_addr_t pa;
>  
> -	pa = dom->iop->iova_to_phys(dom->iop, iova);
> +	pa = iopt_iova_to_phys(&dom->iop, iova);
>  	if (IS_ENABLED(CONFIG_PHYS_ADDR_T_64BIT) &&
>  	    dom->bank->parent_data->enable_4GB &&
>  	    pa >= MTK_IOMMU_4GB_MODE_REMAP_BASE)
> -- 
> 2.39.0
> 

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 09/45] KVM: arm64: pkvm: Add pkvm_create_hyp_device_mapping()
  2023-02-01 12:52   ` Jean-Philippe Brucker
@ 2023-02-07 12:22     ` Mostafa Saleh
  -1 siblings, 0 replies; 201+ messages in thread
From: Mostafa Saleh @ 2023-02-07 12:22 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, dbrazdil, ryan.roberts,
	linux-arm-kernel, kvmarm, iommu

Hi Jean,

On Wed, Feb 01, 2023 at 12:52:53PM +0000, Jean-Philippe Brucker wrote:
> Add a function to map a MMIO region in the hypervisor and remove it from
> the host. Hypervisor device drivers use this to reserve their regions
> during setup.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
> ---
>  arch/arm64/kvm/hyp/include/nvhe/mm.h |  1 +
>  arch/arm64/kvm/hyp/nvhe/setup.c      | 17 +++++++++++++++++
>  2 files changed, 18 insertions(+)
> 
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/mm.h b/arch/arm64/kvm/hyp/include/nvhe/mm.h
> index d5ec972b5c1e..84db840f2057 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/mm.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/mm.h
> @@ -27,5 +27,6 @@ int __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
>  				  enum kvm_pgtable_prot prot,
>  				  unsigned long *haddr);
>  int pkvm_alloc_private_va_range(size_t size, unsigned long *haddr);
> +int pkvm_create_hyp_device_mapping(u64 base, u64 size, void __iomem *haddr);
>  
>  #endif /* __KVM_HYP_MM_H */
> diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
> index 629e74c46b35..de7d60c3c20b 100644
> --- a/arch/arm64/kvm/hyp/nvhe/setup.c
> +++ b/arch/arm64/kvm/hyp/nvhe/setup.c
> @@ -259,6 +259,23 @@ static int fix_host_ownership(void)
>  	return 0;
>  }
>  
> +/* Map the MMIO region into the hypervisor and remove it from host */
> +int pkvm_create_hyp_device_mapping(u64 base, u64 size, void __iomem *haddr)
> +{
> +	int ret;
> +
> +	ret = __pkvm_create_private_mapping(base, size, PAGE_HYP_DEVICE, haddr);
> +	if (ret)
> +		return ret;
> +
> +	/* lock not needed during setup */
> +	ret = host_stage2_set_owner_locked(base, size, PKVM_ID_HYP);
> +	if (ret)
> +		return ret;
> +

Do we need to unmap in case of errors from host_stage2_set_owner_locked?

> +	return ret;
> +}
> +
>  static int fix_hyp_pgtable_refcnt(void)
>  {
>  	struct kvm_pgtable_walker walker = {
> -- 
> 2.39.0
> 

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 15/45] KVM: arm64: pkvm: Add __pkvm_host_share/unshare_dma()
  2023-02-01 12:52   ` Jean-Philippe Brucker
@ 2023-02-07 12:53     ` Mostafa Saleh
  -1 siblings, 0 replies; 201+ messages in thread
From: Mostafa Saleh @ 2023-02-07 12:53 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, dbrazdil, ryan.roberts,
	linux-arm-kernel, kvmarm, iommu

Hi Jean,

On Wed, Feb 01, 2023 at 12:52:59PM +0000, Jean-Philippe Brucker wrote:
> Host pages mapped in the SMMU must not be donated to the guest or
> hypervisor, since the host could then use DMA to break confidentiality.
> Mark them shared in the host stage-2 page tables, and keep a refcount in
> the hyp vmemmap.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
> ---
>  arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |   3 +
>  arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 185 ++++++++++++++++++
>  2 files changed, 188 insertions(+)
> 
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> index 021825aee854..a363d58a998b 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> @@ -58,6 +58,7 @@ enum pkvm_component_id {
>  	PKVM_ID_HOST,
>  	PKVM_ID_HYP,
>  	PKVM_ID_GUEST,
> +	PKVM_ID_IOMMU,
>  };
>  
>  extern unsigned long hyp_nr_cpus;
> @@ -72,6 +73,8 @@ int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu);
>  int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu);
>  int __pkvm_guest_share_host(struct pkvm_hyp_vcpu *hyp_vcpu, u64 ipa);
>  int __pkvm_guest_unshare_host(struct pkvm_hyp_vcpu *hyp_vcpu, u64 ipa);
> +int __pkvm_host_share_dma(u64 phys_addr, size_t size, bool is_ram);
> +int __pkvm_host_unshare_dma(u64 phys_addr, size_t size);
>  
>  bool addr_is_memory(phys_addr_t phys);
>  int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot prot);
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index 856673291d70..dcf08ce03790 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -1148,6 +1148,9 @@ static int check_share(struct pkvm_mem_share *share)
>  	case PKVM_ID_GUEST:
>  		ret = guest_ack_share(completer_addr, tx, share->completer_prot);
>  		break;
> +	case PKVM_ID_IOMMU:
> +		ret = 0;
> +		break;
>  	default:
>  		ret = -EINVAL;
>  	}
> @@ -1185,6 +1188,9 @@ static int __do_share(struct pkvm_mem_share *share)
>  	case PKVM_ID_GUEST:
>  		ret = guest_complete_share(completer_addr, tx, share->completer_prot);
>  		break;
> +	case PKVM_ID_IOMMU:
> +		ret = 0;
> +		break;
>  	default:
>  		ret = -EINVAL;
>  	}
> @@ -1239,6 +1245,9 @@ static int check_unshare(struct pkvm_mem_share *share)
>  	case PKVM_ID_HYP:
>  		ret = hyp_ack_unshare(completer_addr, tx);
>  		break;
> +	case PKVM_ID_IOMMU:
> +		ret = 0;
> +		break;
>  	default:
>  		ret = -EINVAL;
>  	}
> @@ -1273,6 +1282,9 @@ static int __do_unshare(struct pkvm_mem_share *share)
>  	case PKVM_ID_HYP:
>  		ret = hyp_complete_unshare(completer_addr, tx);
>  		break;
> +	case PKVM_ID_IOMMU:
> +		ret = 0;
> +		break;
>  	default:
>  		ret = -EINVAL;
>  	}
> @@ -1633,6 +1645,179 @@ void hyp_unpin_shared_mem(void *from, void *to)
>  	host_unlock_component();
>  }
>  
> +static int __host_check_page_dma_shared(phys_addr_t phys_addr)
> +{
> +	int ret;
> +	u64 hyp_addr;
> +
> +	/*
> +	 * The page is already refcounted. Make sure it's owned by the host, and
> +	 * not part of the hyp pool.
> +	 */
> +	ret = __host_check_page_state_range(phys_addr, PAGE_SIZE,
> +					    PKVM_PAGE_SHARED_OWNED);
> +	if (ret)
> +		return ret;
> +
> +	/*
> +	 * Refcounted and owned by host, means it's either mapped in the
> +	 * SMMU, or it's some VM/VCPU state shared with the hypervisor.
> +	 * The host has no reason to use a page for both.
> +	 */
> +	hyp_addr = (u64)hyp_phys_to_virt(phys_addr);
> +	return __hyp_check_page_state_range(hyp_addr, PAGE_SIZE, PKVM_NOPAGE);

This works for hyp-host sharing, but I am worried about how well this
scales. For example FF-A (still on the list) adds a new entity that can
share data with the host, and we would need an extra check for it:
https://lore.kernel.org/kvmarm/20221116170335.2341003-1-qperret@google.com/

One way I can think of to handle this is to use the SW bits in the SMMU
page table to represent ownership, for example for PKVM_ID_IOMMU:

do_share:
- completer: set the SW bits in the SMMU page table to
  PKVM_PAGE_SHARED_BORROWED

do_unshare:
- completer: set the SW bits back to PKVM_PAGE_OWNED (I think this is
  good enough for host ownership)

In __pkvm_host_share_dma_page, if the page refcount is > 1, it succeeds
only if the page was borrowed in the SMMU page table and shared in the
host page table.
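
Something along these lines (rough sketch only; the helper names and the
exact bit assignment are made up, not part of this series):

/*
 * Arm VMSAv8-64 leaf descriptors reserve bits [58:55] for software use,
 * so two of them could encode the pkvm ownership state of pages mapped
 * in the SMMU, instead of tracking it in a separate hyp structure.
 */
#define IOPTE_SW_STATE_SHIFT	55
#define IOPTE_SW_STATE_MASK	(0x3ULL << IOPTE_SW_STATE_SHIFT)

/* 2-bit state mirroring PKVM_PAGE_OWNED/SHARED_OWNED/SHARED_BORROWED */
static inline u64 iopte_set_sw_state(u64 pte, u64 state)
{
	return (pte & ~IOPTE_SW_STATE_MASK) | (state << IOPTE_SW_STATE_SHIFT);
}

static inline u64 iopte_get_sw_state(u64 pte)
{
	return (pte & IOPTE_SW_STATE_MASK) >> IOPTE_SW_STATE_SHIFT;
}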


Thanks,
Mostafa


^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 19/45] KVM: arm64: iommu: Add domains
  2023-02-01 12:53   ` Jean-Philippe Brucker
@ 2023-02-07 13:13     ` Mostafa Saleh
  -1 siblings, 0 replies; 201+ messages in thread
From: Mostafa Saleh @ 2023-02-07 13:13 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, dbrazdil, ryan.roberts,
	linux-arm-kernel, kvmarm, iommu

Hi Jean,

On Wed, Feb 01, 2023 at 12:53:03PM +0000, Jean-Philippe Brucker wrote:
> The IOMMU domain abstraction allows to share the same page tables
> between multiple devices. That may be necessary due to hardware
> constraints, if multiple devices cannot be isolated by the IOMMU
> (conventional PCI bus for example). It may also help with optimizing
> resource or TLB use. For pKVM in particular, it may be useful to reduce
> the amount of memory required for page tables. All devices owned by the
> host kernel could be attached to the same domain (though that requires
> host changes).
> 
> Each IOMMU device holds an array of domains, and the host allocates
> domain IDs that index this array. The alloc() operation initializes the
> domain and prepares the page tables. The attach() operation initializes
> the device table that holds the PGD and its configuration.

I was wondering about the need for pre-allocation of the domain array.

An alternative way I see:
- We don’t pre-allocate any domain.

- When the EL1 driver gets a domain_alloc request, it allocates both the
kernel (iommu_domain) and hypervisor (kvm_hyp_iommu_domain) structures.

- In __pkvm_host_iommu_alloc_domain, the hypervisor takes over the hyp
struct from the kernel (via donation).

- In all other hypercalls, the kernel address of the kvm_hyp_iommu_domain
is used as the domain ID, which guarantees uniqueness and O(1) access.

- The hypervisor would just need to transform the address (kern_hyp_va)
to get the domain pointer.

I believe that would save some memory, as domains would only be allocated
when needed.
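
Roughly (illustrative sketch only, the helper name is made up):

/*
 * The host passes the kernel VA of the donated kvm_hyp_iommu_domain as
 * the domain ID; the hypervisor converts it with kern_hyp_va() instead
 * of indexing a pre-allocated array.
 */
static struct kvm_hyp_iommu_domain *domain_id_to_domain(unsigned long domain_id)
{
	return kern_hyp_va((struct kvm_hyp_iommu_domain *)domain_id);
}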

Please let me know what you think about this?

Thanks,
Mostafa

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 40/45] iommu/arm-smmu-v3-kvm: Add IOMMU ops
  2023-02-01 12:53   ` Jean-Philippe Brucker
@ 2023-02-07 13:22     ` Mostafa Saleh
  -1 siblings, 0 replies; 201+ messages in thread
From: Mostafa Saleh @ 2023-02-07 13:22 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, dbrazdil, ryan.roberts,
	linux-arm-kernel, kvmarm, iommu

Hi Jean,

On Wed, Feb 01, 2023 at 12:53:24PM +0000, Jean-Philippe Brucker wrote:
> Forward alloc_domain(), attach_dev(), map_pages(), etc to the
> hypervisor.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
> ---
>  .../iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c   | 330 +++++++++++++++++-
>  1 file changed, 328 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
> index 55489d56fb5b..930d78f6e29f 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
> @@ -22,10 +22,28 @@ struct host_arm_smmu_device {
>  #define smmu_to_host(_smmu) \
>  	container_of(_smmu, struct host_arm_smmu_device, smmu);
>  
> +struct kvm_arm_smmu_master {
> +	struct arm_smmu_device		*smmu;
> +	struct device			*dev;
> +	struct kvm_arm_smmu_domain	*domain;
> +};
> +
> +struct kvm_arm_smmu_domain {
> +	struct iommu_domain		domain;
> +	struct arm_smmu_device		*smmu;
> +	struct mutex			init_mutex;
> +	unsigned long			pgd;
> +	pkvm_handle_t			id;
> +};
> +
> +#define to_kvm_smmu_domain(_domain) \
> +	container_of(_domain, struct kvm_arm_smmu_domain, domain)
> +
>  static size_t				kvm_arm_smmu_cur;
>  static size_t				kvm_arm_smmu_count;
>  static struct hyp_arm_smmu_v3_device	*kvm_arm_smmu_array;
>  static struct kvm_hyp_iommu_memcache	*kvm_arm_smmu_memcache;
> +static DEFINE_IDA(kvm_arm_smmu_domain_ida);
>  
>  static DEFINE_PER_CPU(local_lock_t, memcache_lock) =
>  				INIT_LOCAL_LOCK(memcache_lock);
> @@ -57,7 +75,6 @@ static void *kvm_arm_smmu_host_va(phys_addr_t pa)
>  	return __va(pa);
>  }
>  
> -__maybe_unused
>  static int kvm_arm_smmu_topup_memcache(struct arm_smmu_device *smmu)
>  {
>  	struct kvm_hyp_memcache *mc;
> @@ -74,7 +91,6 @@ static int kvm_arm_smmu_topup_memcache(struct arm_smmu_device *smmu)
>  				     kvm_arm_smmu_host_pa, smmu);
>  }
>  
> -__maybe_unused
>  static void kvm_arm_smmu_reclaim_memcache(void)
>  {
>  	struct kvm_hyp_memcache *mc;
> @@ -101,6 +117,299 @@ static void kvm_arm_smmu_reclaim_memcache(void)
>  	__ret;							\
>  })
>  
> +static struct platform_driver kvm_arm_smmu_driver;
> +
> +static struct arm_smmu_device *
> +kvm_arm_smmu_get_by_fwnode(struct fwnode_handle *fwnode)
> +{
> +	struct device *dev;
> +
> +	dev = driver_find_device_by_fwnode(&kvm_arm_smmu_driver.driver, fwnode);
> +	put_device(dev);
> +	return dev ? dev_get_drvdata(dev) : NULL;
> +}
> +
> +static struct iommu_ops kvm_arm_smmu_ops;
> +
> +static struct iommu_device *kvm_arm_smmu_probe_device(struct device *dev)
> +{
> +	struct arm_smmu_device *smmu;
> +	struct kvm_arm_smmu_master *master;
> +	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
> +
> +	if (!fwspec || fwspec->ops != &kvm_arm_smmu_ops)
> +		return ERR_PTR(-ENODEV);
> +
> +	if (WARN_ON_ONCE(dev_iommu_priv_get(dev)))
> +		return ERR_PTR(-EBUSY);
> +
> +	smmu = kvm_arm_smmu_get_by_fwnode(fwspec->iommu_fwnode);
> +	if (!smmu)
> +		return ERR_PTR(-ENODEV);
> +
> +	master = kzalloc(sizeof(*master), GFP_KERNEL);
> +	if (!master)
> +		return ERR_PTR(-ENOMEM);
> +
> +	master->dev = dev;
> +	master->smmu = smmu;
> +	dev_iommu_priv_set(dev, master);
> +
> +	return &smmu->iommu;
> +}
> +
> +static void kvm_arm_smmu_release_device(struct device *dev)
> +{
> +	struct kvm_arm_smmu_master *master = dev_iommu_priv_get(dev);
> +
> +	kfree(master);
> +	iommu_fwspec_free(dev);
> +}
> +
> +static struct iommu_domain *kvm_arm_smmu_domain_alloc(unsigned type)
> +{
> +	struct kvm_arm_smmu_domain *kvm_smmu_domain;
> +
> +	/*
> +	 * We don't support
> +	 * - IOMMU_DOMAIN_IDENTITY because we rely on the host telling the
> +	 *   hypervisor which pages are used for DMA.
> +	 * - IOMMU_DOMAIN_DMA_FQ because lazy unmap would clash with memory
> +	 *   donation to guests.
> +	 */
> +	if (type != IOMMU_DOMAIN_DMA &&
> +	    type != IOMMU_DOMAIN_UNMANAGED)
> +		return NULL;
> +
> +	kvm_smmu_domain = kzalloc(sizeof(*kvm_smmu_domain), GFP_KERNEL);
> +	if (!kvm_smmu_domain)
> +		return NULL;
> +
> +	mutex_init(&kvm_smmu_domain->init_mutex);
> +
> +	return &kvm_smmu_domain->domain;
> +}
> +
> +static int kvm_arm_smmu_domain_finalize(struct kvm_arm_smmu_domain *kvm_smmu_domain,
> +					struct kvm_arm_smmu_master *master)
> +{
> +	int ret = 0;
> +	struct page *p;
> +	unsigned long pgd;
> +	struct arm_smmu_device *smmu = master->smmu;
> +	struct host_arm_smmu_device *host_smmu = smmu_to_host(smmu);
> +
> +	if (kvm_smmu_domain->smmu) {
> +		if (kvm_smmu_domain->smmu != smmu)
> +			return -EINVAL;
> +		return 0;
> +	}
> +
> +	ret = ida_alloc_range(&kvm_arm_smmu_domain_ida, 0, 1 << smmu->vmid_bits,
> +			      GFP_KERNEL);
> +	if (ret < 0)
> +		return ret;
> +	kvm_smmu_domain->id = ret;
> +
> +	/*
> +	 * PGD allocation does not use the memcache because it may be of higher
> +	 * order when concatenated.
> +	 */
> +	p = alloc_pages_node(dev_to_node(smmu->dev), GFP_KERNEL | __GFP_ZERO,
> +			     host_smmu->pgd_order);
> +	if (!p)
> +		return -ENOMEM;
> +
> +	pgd = (unsigned long)page_to_virt(p);
> +
> +	local_lock_irq(&memcache_lock);
> +	ret = kvm_call_hyp_nvhe_mc(smmu, __pkvm_host_iommu_alloc_domain,
> +				   host_smmu->id, kvm_smmu_domain->id, pgd);

What is the idea behind postponing this HVC to attach instead of calling
it in the alloc_domain HVC?

Thanks,
Mostafa


^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 42/45] KVM: arm64: pkvm: Support SCMI power domain
  2023-02-01 12:53   ` Jean-Philippe Brucker
@ 2023-02-07 13:27     ` Mostafa Saleh
  -1 siblings, 0 replies; 201+ messages in thread
From: Mostafa Saleh @ 2023-02-07 13:27 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, dbrazdil, ryan.roberts,
	linux-arm-kernel, kvmarm, iommu

Hi Jean,

> +bool kvm_host_scmi_handler(struct kvm_cpu_context *host_ctxt)
> +{
> +	DECLARE_REG(u64, func_id, host_ctxt, 0);
> +
> +	if (!scmi_channel.shmem || func_id != scmi_channel.smc_id)
> +		return false; /* Unhandled */
> +
> +	/*
> +	 * Prevent the host from modifying the request while it is in flight.
> +	 * One page is enough, SCMI messages are smaller than that.
> +	 *
> +	 * FIXME: the host is allowed to poll the shmem while the request is in
> +	 * flight, or read shmem when receiving the SCMI interrupt. Although
> +	 * it's unlikely with the SMC-based transport, this too requires some
> +	 * tightening in the spec.
> +	 */
> +	if (WARN_ON(__pkvm_host_add_remove_page(scmi_channel.shmem_pfn, true)))
> +		return true;
> +
> +	__kvm_host_scmi_handler(host_ctxt);
> +
> +	WARN_ON(__pkvm_host_add_remove_page(scmi_channel.shmem_pfn, false));
> +	return true; /* Handled */
> +}

I am not sure what a typical SCMI channel shmem_size would be, but would
map/unmap be more performant than
memcpy(hyp_local_copy, scmi_channel.pfn, scmi_channel.shmem_size)?

Thanks,
Mostafa



^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 19/45] KVM: arm64: iommu: Add domains
  2023-02-07 13:13     ` Mostafa Saleh
@ 2023-02-08 12:31       ` Mostafa Saleh
  -1 siblings, 0 replies; 201+ messages in thread
From: Mostafa Saleh @ 2023-02-08 12:31 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, dbrazdil, ryan.roberts,
	linux-arm-kernel, kvmarm, iommu

On Tue, Feb 7, 2023 at 1:13 PM Mostafa Saleh <smostafa@google.com> wrote:

> I was wondering about the need for pre-allocation of the domain array.
>
> An alternative way I see:
> - We don’t pre-allocate any domain.
>
> - When the EL1 driver has a request to domain_alloc, it will allocate
> both kernel(iommu_domain) and hypervisor domains(kvm_hyp_iommu_domain).
>
> - In __pkvm_host_iommu_alloc_domain, it will take over the hyp struct
> from the kernel (via donation).
>
> - In all other hypercalls, the kernel address of kvm_hyp_iommu_domain will
> be used as domain ID, which guarantees uniqueness and O(1) access.
>
> - The hypervisor would just need to transform the address(kern_hyp_va)
> to get the domain pointer.


This actually will not work with the current sequence: we can't guarantee
that the domain_id sent later from the host can be trusted, and since the
domain points to the page table this could be dangerous. I will have a
closer look to see if we can make this work somehow.

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 05/45] iommu/io-pgtable: Split io_pgtable structure
  2023-02-07 12:16   ` Mostafa Saleh
@ 2023-02-08 18:01       ` Jean-Philippe Brucker
  0 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-08 18:01 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, dbrazdil, ryan.roberts,
	linux-arm-kernel, kvmarm, iommu, Abhinav Kumar,
	Alyssa Rosenzweig, Andy Gross, Bjorn Andersson, Daniel Vetter,
	David Airlie, Dmitry Baryshkov, Hector Martin, Konrad Dybcio,
	Matthias Brugger, Rob Clark, Rob Herring, Sean Paul,
	Steven Price, Suravee Suthikulpanit, Sven Peter, Tomeu Vizoso,
	Yong Wu

Hi Mostafa,

On Tue, Feb 07, 2023 at 12:16:12PM +0000, Mostafa Saleh wrote:
> > +static inline size_t
> > +iopt_unmap_pages(struct io_pgtable *iop, unsigned long iova, size_t pgsize,
> > +		 size_t pgcount, struct iommu_iotlb_gather *gather)
> > +{
> > +	if (!iop->ops || !iop->ops->map_pages)
> Should this be !iop->ops->unmap_pages?

Oh right, good catch!
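
i.e. something like this, assuming the rest of the helper just forwards
to the format ops like the other iopt_* wrappers:

static inline size_t
iopt_unmap_pages(struct io_pgtable *iop, unsigned long iova, size_t pgsize,
		 size_t pgcount, struct iommu_iotlb_gather *gather)
{
	if (!iop->ops || !iop->ops->unmap_pages)
		return 0;

	return iop->ops->unmap_pages(iop, iova, pgsize, pgcount, gather);
}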

Sorry about the size of this patch by the way. If we decide to keep this
change I'll reduce it by first introducing these helpers and then
splitting the structure in another patch.

Thanks,
Jean


^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 09/45] KVM: arm64: pkvm: Add pkvm_create_hyp_device_mapping()
  2023-02-07 12:22     ` Mostafa Saleh
@ 2023-02-08 18:02       ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-08 18:02 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, dbrazdil, ryan.roberts,
	linux-arm-kernel, kvmarm, iommu

On Tue, Feb 07, 2023 at 12:22:13PM +0000, Mostafa Saleh wrote:
> > +/* Map the MMIO region into the hypervisor and remove it from host */
> > +int pkvm_create_hyp_device_mapping(u64 base, u64 size, void __iomem *haddr)
> > +{
> > +	int ret;
> > +
> > +	ret = __pkvm_create_private_mapping(base, size, PAGE_HYP_DEVICE, haddr);
> > +	if (ret)
> > +		return ret;
> > +
> > +	/* lock not needed during setup */
> > +	ret = host_stage2_set_owner_locked(base, size, PKVM_ID_HYP);
> > +	if (ret)
> > +		return ret;
> > +
> 
> Do we need to unmap in case of errors from host_stage2_set_owner_locked?

I don't think so, because this function is meant for hyp setup only; any
error causes a complete teardown of both the hyp and host page tables.

I wondered about adding a BUG_ON() here to catch the function being called
outside of setup, but maybe improving the comment would be good enough.
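
Something like this, for instance (sketch):

/*
 * Map the MMIO region into the hypervisor and remove it from the host.
 * Only for use during hyp setup: any failure aborts the whole
 * deprivilege sequence, so no cleanup is attempted here.
 */
int pkvm_create_hyp_device_mapping(u64 base, u64 size, void __iomem *haddr)
{
	int ret;

	ret = __pkvm_create_private_mapping(base, size, PAGE_HYP_DEVICE, haddr);
	if (ret)
		return ret;

	/* lock not needed during setup */
	return host_stage2_set_owner_locked(base, size, PKVM_ID_HYP);
}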

Thanks,
Jean


^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 19/45] KVM: arm64: iommu: Add domains
  2023-02-08 12:31       ` Mostafa Saleh
@ 2023-02-08 18:05         ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-08 18:05 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, dbrazdil, ryan.roberts,
	linux-arm-kernel, kvmarm, iommu

On Wed, Feb 08, 2023 at 12:31:15PM +0000, Mostafa Saleh wrote:
> On Tue, Feb 7, 2023 at 1:13 PM Mostafa Saleh <smostafa@google.com> wrote:
> 
> > I was wondering about the need for pre-allocation of the domain array.
> >
> > An alternative way I see:
> > - We don’t pre-allocate any domain.
> >
> > - When the EL1 driver has a request to domain_alloc, it will allocate
> > both kernel(iommu_domain) and hypervisor domains(kvm_hyp_iommu_domain).
> >
> > - In __pkvm_host_iommu_alloc_domain, it will take over the hyp struct
> > from the kernel (via donation).

That also requires an entire page for each domain, no?  I guess this
domain table would only be worse in memory use if we have fewer than two
domains, since it costs one page for the root table and then stores 256
domains per leaf page.

What I've been trying to avoid with this table is introducing a malloc in
the hypervisor, but we might have to bite the bullet eventually (although
with a malloc, access will probably be worse than O(1)).
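
For reference, the lookup in the pre-allocated table stays trivial
(rough sketch, field and helper names approximate):

#define HYP_IOMMU_DOMAINS_PER_PAGE \
	(PAGE_SIZE / sizeof(struct kvm_hyp_iommu_domain))

/* O(1): one root page of leaf pointers, leaves populated on demand */
static struct kvm_hyp_iommu_domain *
handle_to_domain(struct kvm_hyp_iommu *iommu, pkvm_handle_t domain_id)
{
	struct kvm_hyp_iommu_domain *leaf =
		iommu->domains[domain_id / HYP_IOMMU_DOMAINS_PER_PAGE];

	return leaf ? &leaf[domain_id % HYP_IOMMU_DOMAINS_PER_PAGE] : NULL;
}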

Thanks,
Jean

> >
> > - In all other hypercalls, the kernel address of kvm_hyp_iommu_domain will
> > be used as domain ID, which guarantees uniqueness and O(1) access.
> >
> > - The hypervisor would just need to transform the address(kern_hyp_va)
> > to get the domain pointer.
> 
> 
> This actually will not work with the current sequence, as we can't
> guarantee that the domain_id sent later from the host is trusted, and as
> the domain points to the page table this can be dangerous, I will have a
> closer look to see if we can make this work somehow.

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 40/45] iommu/arm-smmu-v3-kvm: Add IOMMU ops
  2023-02-07 13:22     ` Mostafa Saleh
@ 2023-02-08 18:13       ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-08 18:13 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, dbrazdil, ryan.roberts,
	linux-arm-kernel, kvmarm, iommu

On Tue, Feb 07, 2023 at 01:22:11PM +0000, Mostafa Saleh wrote:
> > +static struct iommu_domain *kvm_arm_smmu_domain_alloc(unsigned type)
> > +{
> > +	struct kvm_arm_smmu_domain *kvm_smmu_domain;
> > +
> > +	/*
> > +	 * We don't support
> > +	 * - IOMMU_DOMAIN_IDENTITY because we rely on the host telling the
> > +	 *   hypervisor which pages are used for DMA.
> > +	 * - IOMMU_DOMAIN_DMA_FQ because lazy unmap would clash with memory
> > +	 *   donation to guests.
> > +	 */
> > +	if (type != IOMMU_DOMAIN_DMA &&
> > +	    type != IOMMU_DOMAIN_UNMANAGED)
> > +		return NULL;
> > +
> > +	kvm_smmu_domain = kzalloc(sizeof(*kvm_smmu_domain), GFP_KERNEL);
> > +	if (!kvm_smmu_domain)
> > +		return NULL;
> > +
> > +	mutex_init(&kvm_smmu_domain->init_mutex);
> > +
> > +	return &kvm_smmu_domain->domain;
> > +}
> > +
> > +static int kvm_arm_smmu_domain_finalize(struct kvm_arm_smmu_domain *kvm_smmu_domain,
> > +					struct kvm_arm_smmu_master *master)
> > +{
> > +	int ret = 0;
> > +	struct page *p;
> > +	unsigned long pgd;
> > +	struct arm_smmu_device *smmu = master->smmu;
> > +	struct host_arm_smmu_device *host_smmu = smmu_to_host(smmu);
> > +
> > +	if (kvm_smmu_domain->smmu) {
> > +		if (kvm_smmu_domain->smmu != smmu)
> > +			return -EINVAL;
> > +		return 0;
> > +	}
> > +
> > +	ret = ida_alloc_range(&kvm_arm_smmu_domain_ida, 0, 1 << smmu->vmid_bits,
> > +			      GFP_KERNEL);
> > +	if (ret < 0)
> > +		return ret;
> > +	kvm_smmu_domain->id = ret;
> > +
> > +	/*
> > +	 * PGD allocation does not use the memcache because it may be of higher
> > +	 * order when concatenated.
> > +	 */
> > +	p = alloc_pages_node(dev_to_node(smmu->dev), GFP_KERNEL | __GFP_ZERO,
> > +			     host_smmu->pgd_order);
> > +	if (!p)
> > +		return -ENOMEM;
> > +
> > +	pgd = (unsigned long)page_to_virt(p);
> > +
> > +	local_lock_irq(&memcache_lock);
> > +	ret = kvm_call_hyp_nvhe_mc(smmu, __pkvm_host_iommu_alloc_domain,
> > +				   host_smmu->id, kvm_smmu_domain->id, pgd);
> 
> What is the idea of postponing this HVC to attach and don’t call it in
> alloc_domain HVC?

Yes, ideally this HVC would be in kvm_arm_smmu_domain_alloc() above, but
due to the way the IOMMU API works at the moment, that function doesn't
take a context parameter. So we don't know which SMMU this will be for,
which is problematic: we don't know which page table formats are
supported, how many VMIDs there are, or how to allocate memory for the
page tables (which NUMA node is the SMMU on, can it access high physical
addresses or does it need special allocations?).

I think there are plans to add a context to domain_alloc(), but at the
moment IOMMU drivers complete the domain allocation in attach().

Thanks,
Jean

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 15/45] KVM: arm64: pkvm: Add __pkvm_host_share/unshare_dma()
  2023-02-07 12:53     ` Mostafa Saleh
@ 2023-02-10 19:21       ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-10 19:21 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, dbrazdil, ryan.roberts,
	linux-arm-kernel, kvmarm, iommu

On Tue, Feb 07, 2023 at 12:53:56PM +0000, Mostafa Saleh wrote:
> > +static int __host_check_page_dma_shared(phys_addr_t phys_addr)
> > +{
> > +	int ret;
> > +	u64 hyp_addr;
> > +
> > +	/*
> > +	 * The page is already refcounted. Make sure it's owned by the host, and
> > +	 * not part of the hyp pool.
> > +	 */
> > +	ret = __host_check_page_state_range(phys_addr, PAGE_SIZE,
> > +					    PKVM_PAGE_SHARED_OWNED);
> > +	if (ret)
> > +		return ret;
> > +
> > +	/*
> > +	 * Refcounted and owned by host, means it's either mapped in the
> > +	 * SMMU, or it's some VM/VCPU state shared with the hypervisor.
> > +	 * The host has no reason to use a page for both.
> > +	 */
> > +	hyp_addr = (u64)hyp_phys_to_virt(phys_addr);
> > +	return __hyp_check_page_state_range(hyp_addr, PAGE_SIZE, PKVM_NOPAGE);
> 
> This works for hyp-host sharing, but I am worried about the scalability of
> this. For example, FF-A (still on the list) adds a new entity that can share
> data with the host, and we would need an extra check for it.
> https://lore.kernel.org/kvmarm/20221116170335.2341003-1-qperret@google.com/

Right, although it looks like the FF-A support doesn't need a refcount at
the moment, so passing such a page to iommu_map() would fail at
check_share() (but I may be wrong, I'll need another look). Even if it did
use a refcount, I think it may be fine to share it between different
entities: the refcount ensures that every sharing is undone before the
page can be donated to another entity, and separate tracking ensures that
the host undoes the right thing (the io-pgtable mappings for pages shared
with the IOMMU, and the secure world for FF-A).

Regardless, I agree that this doesn't scale: it gets too complex, and I'd
prefer if tracking the page state was less subtle. I'll try to find
something more generic.

> 
> One way I can think about this, is to use the SW bits in SMMU page table to
> represent ownership, for example for PKVM_ID_IOMMU
> do_share:
> -completer: set SW bits in the SMMU page table to
> PKVM_PAGE_SHARED_BORROWED

I'm not sure I understand: would we add this in order to share a page that
is already mapped in the SMMU with another entity (e.g. FF-A)?  Doing it
this way would be difficult because there may be a lot of different SMMU
page tables mapping the page.

The fact that a page is mapped in the SMMU page table already indicates
that it's borrowed from the host so the SW bits seem redundant.

> 
> do_unshare:
> -completer: set SW bit back to PKVM_PAGE_OWNED(I think this is good
> enough for host ownership)
> 
> In __pkvm_host_share_dma_page
> If page refcount > 1, it succeeds if only the page was borrowed in the
> SMMU page table and shared in the host page table.

We could also move more information into the vmemmap, because struct
hyp_page still has some space for page state. It has six free bits at the
moment, which is enough for both owner ID and share state, and maybe we
could steal a couple more from the refcount and order fields if necessary.

With that, the iommu_map() and unmap() paths could also be made a lot more
scalable, because they wouldn't need to take any page table locks when
updating page state; a cmpxchg may be sufficient.
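
To illustrate that last point, the lockless update could look roughly
like this (standalone sketch; the bit layout and names are assumptions,
not the actual hyp_page fields):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed packing: 2 bits of share state, 4 bits of owner ID. */
#define PAGE_STATE_MASK		0x3u
#define PAGE_OWNER_SHIFT	2
#define PAGE_OWNER_MASK		(0xfu << PAGE_OWNER_SHIFT)

/* Move a page from old_state to new_state without taking a lock. */
static bool page_cmpxchg_state(_Atomic uint32_t *flags, uint32_t old_state,
			       uint32_t new_state)
{
	uint32_t old = atomic_load(flags);

	do {
		/* The owner ID in PAGE_OWNER_MASK is left untouched. */
		if ((old & PAGE_STATE_MASK) != old_state)
			return false;	/* concurrent change, caller bails out */
	} while (!atomic_compare_exchange_weak(flags, &old,
					(old & ~PAGE_STATE_MASK) | new_state));
	return true;
}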

Thanks,
Jean

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 42/45] KVM: arm64: pkvm: Support SCMI power domain
  2023-02-07 13:27     ` Mostafa Saleh
@ 2023-02-10 19:23       ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-02-10 19:23 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, dbrazdil, ryan.roberts,
	linux-arm-kernel, kvmarm, iommu

On Tue, Feb 07, 2023 at 01:27:17PM +0000, Mostafa Saleh wrote:
> Hi Jean,
> 
> > +bool kvm_host_scmi_handler(struct kvm_cpu_context *host_ctxt)
> > +{
> > +	DECLARE_REG(u64, func_id, host_ctxt, 0);
> > +
> > +	if (!scmi_channel.shmem || func_id != scmi_channel.smc_id)
> > +		return false; /* Unhandled */
> > +
> > +	/*
> > +	 * Prevent the host from modifying the request while it is in flight.
> > +	 * One page is enough, SCMI messages are smaller than that.
> > +	 *
> > +	 * FIXME: the host is allowed to poll the shmem while the request is in
> > +	 * flight, or read shmem when receiving the SCMI interrupt. Although
> > +	 * it's unlikely with the SMC-based transport, this too requires some
> > +	 * tightening in the spec.
> > +	 */
> > +	if (WARN_ON(__pkvm_host_add_remove_page(scmi_channel.shmem_pfn, true)))
> > +		return true;
> > +
> > +	__kvm_host_scmi_handler(host_ctxt);
> > +
> > +	WARN_ON(__pkvm_host_add_remove_page(scmi_channel.shmem_pfn, false));
> > +	return true; /* Handled */
> > +}
> 
> I am not sure what a typical SCMI channel shmem_size would be.

shmem_size is large but a SCMI message isn't bigger than 128 bytes at the
moment, so copying would certainly be more performant.

> But would map/unmap be more performant than
> memcpy(hyp_local_copy,scmi_channel.pfn,scmi_channel.shmem_size ); ?

The problem is that we're forwarding this to the SCMI server, which
expects to find the command in the shmem.

I took the easy route here, but a more efficient way to support SCMI would
be for the hypervisor to own the channel permanently, and the host<->hyp
communication would use a separate shared page. The hypervisor could then
copy from one channel to the other. It requires more support in the host
driver so I'd like to see if there is any interest in supporting SCMI
before working on this.
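
Roughly, that flow would look like this (all names below are
hypothetical, just to illustrate the idea):

#include <stdint.h>
#include <string.h>

#define SCMI_MSG_MAX	128	/* current SCMI messages are no bigger */

/* Page shared between host and hypervisor (hypothetical layout). */
struct host_scmi_request {
	uint32_t	len;
	uint8_t		msg[SCMI_MSG_MAX];
};

/* Hypothetical hypervisor handler; the hyp owns the real channel. */
static void hyp_forward_scmi(struct host_scmi_request *req,
			     uint8_t *hyp_owned_channel,
			     void (*issue_smc_to_scmi_server)(void))
{
	uint32_t len = req->len;

	if (len > SCMI_MSG_MAX)
		return;		/* malformed request from the host */

	/* Copy the command into the channel only the hypervisor can write. */
	memcpy(hyp_owned_channel, req->msg, len);
	issue_smc_to_scmi_server();
	/* Copy the response back for the host. */
	memcpy(req->msg, hyp_owned_channel, SCMI_MSG_MAX);
}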

Thanks,
Jean

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 19/45] KVM: arm64: iommu: Add domains
  2023-02-08 18:05         ` Jean-Philippe Brucker
@ 2023-02-10 22:03           ` Mostafa Saleh
  -1 siblings, 0 replies; 201+ messages in thread
From: Mostafa Saleh @ 2023-02-10 22:03 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, dbrazdil, ryan.roberts,
	linux-arm-kernel, kvmarm, iommu

On Wed, Feb 8, 2023 at 6:05 PM Jean-Philippe Brucker
<jean-philippe@linaro.org> wrote:
>
> On Wed, Feb 08, 2023 at 12:31:15PM +0000, Mostafa Saleh wrote:
> > On Tue, Feb 7, 2023 at 1:13 PM Mostafa Saleh <smostafa@google.com> wrote:
> >
> > > I was wondering about the need for pre-allocation of the domain array.
> > >
> > > An alternative way I see:
> > > - We don’t pre-allocate any domain.
> > >
> > > - When the EL1 driver has a request to domain_alloc, it will allocate
> > > both kernel(iommu_domain) and hypervisor domains(kvm_hyp_iommu_domain).
> > >
> > > - In __pkvm_host_iommu_alloc_domain, it will take over the hyp struct
> > > from the kernel (via donation).
>
> That also requires an entire page for each domain no?  I guess this domain
> table would only be worse in memory use if we have fewer than 2 domains,
> since it costs one page for the root table, and then stores 256 domains
> per leaf page.

Yes, that would require a page for a domain also, which is inefficient.

> What I've been trying to avoid with this table is introducing a malloc in
> the hypervisor, but we might have to bite the bullet eventually (although
> with a malloc, access will probably be worse than O(1)).

An alternative approach:

1- At SMMU init, allocate a VA range that is not backed by any memory
(via pkvm_alloc_private_va_range), contiguous and large enough for the
maximum number of domains.
2- This acts like a large array indexed by domain ID, filled on demand
from the memcache.
3- alloc_domain makes sure that the new domain_id has a page backing it,
and any other access from map and unmap would just index this memory.

This saves the extra page for the root table, and handle_to_domain would
be slightly more efficient. But it can cause page faults in EL2 if
domain_id is not valid (not previously allocated in EL2), so I am not
sure it is worth it.
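
A standalone user-space analogy of the idea, with mmap()/mprotect()
standing in for pkvm_alloc_private_va_range() and the memcache (sizes
and names are made up):

#include <stdint.h>
#include <sys/mman.h>

#define MAX_DOMAINS	(1u << 16)
#define PAGE_SZ		4096UL

struct hyp_domain {		/* stand-in for kvm_hyp_iommu_domain */
	void		*pgd;
	uint32_t	refs;
	uint32_t	unused;
};

static struct hyp_domain *domains;	/* flat array: VA reserved, unbacked */

static int domains_init(void)
{
	/* Reserve the VA range only; touching it before backing it faults. */
	domains = mmap(NULL, MAX_DOMAINS * sizeof(*domains), PROT_NONE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
	return domains == MAP_FAILED ? -1 : 0;
}

static int domain_alloc(unsigned int domain_id)
{
	/*
	 * Back just the page holding this domain, like taking one page from
	 * the memcache at alloc_domain time.
	 */
	uintptr_t addr = (uintptr_t)&domains[domain_id] & ~(PAGE_SZ - 1);

	return mprotect((void *)addr, PAGE_SZ, PROT_READ | PROT_WRITE);
}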


Thanks,
Mostafa

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 39/45] iommu/arm-smmu-v3-kvm: Initialize page table configuration
  2023-02-01 12:53   ` Jean-Philippe Brucker
@ 2023-03-22 10:23     ` Mostafa Saleh
  -1 siblings, 0 replies; 201+ messages in thread
From: Mostafa Saleh @ 2023-03-22 10:23 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, dbrazdil, ryan.roberts,
	linux-arm-kernel, kvmarm, iommu

Hi Jean,

On Wed, Feb 01, 2023 at 12:53:23PM +0000, Jean-Philippe Brucker wrote:
> Prepare the stage-2 I/O page table configuration that will be used by
> the hypervisor driver.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
> ---
>  .../iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c   | 29 +++++++++++++++++++
>  1 file changed, 29 insertions(+)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
> index 755c77bc0417..55489d56fb5b 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
> @@ -16,6 +16,7 @@ struct host_arm_smmu_device {
>  	struct arm_smmu_device		smmu;
>  	pkvm_handle_t			id;
>  	u32				boot_gbpa;
> +	unsigned int			pgd_order;
>  };
>  
>  #define smmu_to_host(_smmu) \
> @@ -192,6 +193,7 @@ static int kvm_arm_smmu_probe(struct platform_device *pdev)
>  	size_t size;
>  	phys_addr_t ioaddr;
>  	struct resource *res;
> +	struct io_pgtable_cfg cfg;
>  	struct arm_smmu_device *smmu;
>  	struct device *dev = &pdev->dev;
>  	struct host_arm_smmu_device *host_smmu;
> @@ -233,6 +235,31 @@ static int kvm_arm_smmu_probe(struct platform_device *pdev)
>  	if (!kvm_arm_smmu_validate_features(smmu))
>  		return -ENODEV;
>  
> +	/*
> +	 * Stage-1 should be easy to support, though we do need to allocate a
> +	 * context descriptor table.
> +	 */
> +	cfg = (struct io_pgtable_cfg) {
> +		.fmt = ARM_64_LPAE_S2,
> +		.pgsize_bitmap = smmu->pgsize_bitmap,
> +		.ias = smmu->ias,
> +		.oas = smmu->oas,
> +		.coherent_walk = smmu->features & ARM_SMMU_FEAT_COHERENCY,
> +	};
> +
> +	/*
> +	 * Choose the page and address size. Compute the PGD size and number of
> +	 * levels as well, so we know how much memory to pre-allocate.
> +	 */
> +	ret = io_pgtable_configure(&cfg, &size);
The size variable is overwritten here with the PGD size, while it is used
later on the assumption that it still contains the SMMU MMIO size.
This looks unintended?


> +	if (ret)
> +		return ret;
> +
> +	host_smmu->pgd_order = get_order(size);
> +	smmu->pgsize_bitmap = cfg.pgsize_bitmap;
> +	smmu->ias = cfg.ias;
> +	smmu->oas = cfg.oas;
> +
>  	ret = arm_smmu_init_one_queue(smmu, &smmu->cmdq.q, smmu->base,
>  				      ARM_SMMU_CMDQ_PROD, ARM_SMMU_CMDQ_CONS,
>  				      CMDQ_ENT_DWORDS, "cmdq");
> @@ -253,6 +280,8 @@ static int kvm_arm_smmu_probe(struct platform_device *pdev)
>  	hyp_smmu->mmio_addr = ioaddr;
>  	hyp_smmu->mmio_size = size;
>  	hyp_smmu->features = smmu->features;
> +	hyp_smmu->iommu.pgtable_cfg = cfg;
> +
>  	kvm_arm_smmu_cur++;
>  
>  	return 0;
> -- 
> 2.39.0
> 
Thanks,
Mostafa

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 39/45] iommu/arm-smmu-v3-kvm: Initialize page table configuration
  2023-03-22 10:23     ` Mostafa Saleh
@ 2023-03-22 14:42       ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-03-22 14:42 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, dbrazdil, ryan.roberts,
	linux-arm-kernel, kvmarm, iommu

Hi Mostafa,

On Wed, Mar 22, 2023 at 10:23:50AM +0000, Mostafa Saleh wrote:
> > +	/*
> > +	 * Stage-1 should be easy to support, though we do need to allocate a
> > +	 * context descriptor table.
> > +	 */
> > +	cfg = (struct io_pgtable_cfg) {
> > +		.fmt = ARM_64_LPAE_S2,
> > +		.pgsize_bitmap = smmu->pgsize_bitmap,
> > +		.ias = smmu->ias,
> > +		.oas = smmu->oas,
> > +		.coherent_walk = smmu->features & ARM_SMMU_FEAT_COHERENCY,
> > +	};
> > +
> > +	/*
> > +	 * Choose the page and address size. Compute the PGD size and number of
> > +	 * levels as well, so we know how much memory to pre-allocate.

(I also need to fix that comment, we're not getting the number of levels anymore)

> > +	 */
> > +	ret = io_pgtable_configure(&cfg, &size);
> size variable is overwritten here with pgd size, while used later on
> the assumption it still contains the SMMU MMIO size.
> This looks like it is not intended?

No, it's a bug, thanks for spotting it. I'll try to update the pkvm/smmu
branch today with the other issues you reported.
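
Something along these lines should avoid the clobbering (an untested
fragment of the probe path, using a separate local for the PGD size):

	size_t pgd_size;

	...

	/*
	 * Choose the page and address size. Compute the PGD size as well,
	 * so we know how much memory to pre-allocate.
	 */
	ret = io_pgtable_configure(&cfg, &pgd_size);
	if (ret)
		return ret;

	host_smmu->pgd_order = get_order(pgd_size);
	smmu->pgsize_bitmap = cfg.pgsize_bitmap;
	smmu->ias = cfg.ias;
	smmu->oas = cfg.oas;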

Thanks,
Jean


^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 20/45] KVM: arm64: iommu: Add map() and unmap() operations
  2023-02-01 12:53   ` Jean-Philippe Brucker
@ 2023-03-30 18:14     ` Mostafa Saleh
  -1 siblings, 0 replies; 201+ messages in thread
From: Mostafa Saleh @ 2023-03-30 18:14 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, dbrazdil, ryan.roberts,
	linux-arm-kernel, kvmarm, iommu

Hi Jean,

On Wed, Feb 01, 2023 at 12:53:04PM +0000, Jean-Philippe Brucker wrote:
> Handle map() and unmap() hypercalls by calling the io-pgtable library.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
> ---
>  arch/arm64/kvm/hyp/nvhe/iommu/iommu.c | 144 ++++++++++++++++++++++++++
>  1 file changed, 144 insertions(+)
> 
> diff --git a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
> index 7404ea77ed9f..0550e7bdf179 100644
> --- a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
> +++ b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
> @@ -183,6 +183,150 @@ int kvm_iommu_detach_dev(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
>  	return ret;
>  }
>  
> +static int __kvm_iommu_unmap_pages(struct io_pgtable *iopt, unsigned long iova,
> +				   size_t pgsize, size_t pgcount)
> +{
> +	int ret;
> +	size_t unmapped;
> +	phys_addr_t paddr;
> +	size_t total_unmapped = 0;
> +	size_t size = pgsize * pgcount;
> +
> +	while (total_unmapped < size) {
> +		paddr = iopt_iova_to_phys(iopt, iova);
> +		if (paddr == 0)
> +			return -EINVAL;
> +
> +		/*
> +		 * One page/block at a time, because the range provided may not
> +		 * be physically contiguous, and we need to unshare all physical
> +		 * pages.
> +		 */
> +		unmapped = iopt_unmap_pages(iopt, iova, pgsize, 1, NULL);
> +		if (!unmapped)
> +			return -EINVAL;
> +
> +		ret = __pkvm_host_unshare_dma(paddr, pgsize);
> +		if (ret)
> +			return ret;
> +
> +		iova += unmapped;
> +		pgcount -= unmapped / pgsize;
> +		total_unmapped += unmapped;
> +	}
> +
> +	return 0;
> +}
> +
> +#define IOMMU_PROT_MASK (IOMMU_READ | IOMMU_WRITE | IOMMU_CACHE |\
> +			 IOMMU_NOEXEC | IOMMU_MMIO)
> +
> +int kvm_iommu_map_pages(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
> +			unsigned long iova, phys_addr_t paddr, size_t pgsize,
> +			size_t pgcount, int prot)
> +{
> +	size_t size;
> +	size_t granule;
> +	int ret = -EINVAL;
> +	size_t mapped = 0;
> +	struct io_pgtable iopt;
> +	struct kvm_hyp_iommu *iommu;
> +	size_t pgcount_orig = pgcount;
> +	unsigned long iova_orig = iova;
> +	struct kvm_hyp_iommu_domain *domain;
> +
> +	if (prot & ~IOMMU_PROT_MASK)
> +		return -EINVAL;
> +
> +	if (__builtin_mul_overflow(pgsize, pgcount, &size) ||
> +	    iova + size < iova || paddr + size < paddr)
> +		return -EOVERFLOW;
> +
> +	hyp_spin_lock(&iommu_lock);
> +
> +	domain = handle_to_domain(iommu_id, domain_id, &iommu);
> +	if (!domain)
> +		goto err_unlock;
> +
> +	granule = 1 << __ffs(iommu->pgtable->cfg.pgsize_bitmap);
> +	if (!IS_ALIGNED(iova | paddr | pgsize, granule))
> +		goto err_unlock;
> +
> +	ret = __pkvm_host_share_dma(paddr, size, !(prot & IOMMU_MMIO));
> +	if (ret)
> +		goto err_unlock;
> +
> +	iopt = domain_to_iopt(iommu, domain, domain_id);
> +	while (pgcount) {
> +		ret = iopt_map_pages(&iopt, iova, paddr, pgsize, pgcount, prot,
> +				     0, &mapped);
> +		WARN_ON(!IS_ALIGNED(mapped, pgsize));
> +		pgcount -= mapped / pgsize;
> +		if (ret)
> +			goto err_unmap;
> +		iova += mapped;
> +		paddr += mapped;
> +	}
> +
> +	hyp_spin_unlock(&iommu_lock);
> +	return 0;
> +
> +err_unmap:
> +	__kvm_iommu_unmap_pages(&iopt, iova_orig, pgsize, pgcount_orig - pgcount);
On error here, this unmaps (and unshares) only the pages that have been
mapped. But all pages were shared with the IOMMU before (via
__pkvm_host_share_dma), and this corrupts the state of the other pages,
as they are marked as shared while they are not.

I see we can add a "bool unshare" arg to __kvm_iommu_unmap_pages, which
would be called with false on error from here, after calling
__pkvm_host_unshare_dma for the whole range, and with true from
kvm_iommu_unmap_pages.

> +err_unlock:
> +	hyp_spin_unlock(&iommu_lock);
> +	return ret;
> +}
> +
> +int kvm_iommu_unmap_pages(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
> +			  unsigned long iova, size_t pgsize, size_t pgcount)
> +{
> +	size_t size;
> +	size_t granule;
> +	int ret = -EINVAL;
> +	struct io_pgtable iopt;
> +	struct kvm_hyp_iommu *iommu;
> +	struct kvm_hyp_iommu_domain *domain;
> +
> +	if (__builtin_mul_overflow(pgsize, pgcount, &size) ||
> +	    iova + size < iova)
> +		return -EOVERFLOW;
> +
> +	hyp_spin_lock(&iommu_lock);
> +	domain = handle_to_domain(iommu_id, domain_id, &iommu);
> +	if (!domain)
> +		goto out_unlock;
> +
> +	granule = 1 << __ffs(iommu->pgtable->cfg.pgsize_bitmap);
> +	if (!IS_ALIGNED(iova | pgsize, granule))
> +		goto out_unlock;
> +
> +	iopt = domain_to_iopt(iommu, domain, domain_id);
> +	ret = __kvm_iommu_unmap_pages(&iopt, iova, pgsize, pgcount);
> +out_unlock:
> +	hyp_spin_unlock(&iommu_lock);
> +	return ret;
> +}
> +
> +phys_addr_t kvm_iommu_iova_to_phys(pkvm_handle_t iommu_id,
> +				   pkvm_handle_t domain_id, unsigned long iova)
> +{
> +	phys_addr_t phys = 0;
> +	struct io_pgtable iopt;
> +	struct kvm_hyp_iommu *iommu;
> +	struct kvm_hyp_iommu_domain *domain;
> +
> +	hyp_spin_lock(&iommu_lock);
> +	domain = handle_to_domain(iommu_id, domain_id, &iommu);
> +	if (domain) {
> +		iopt = domain_to_iopt(iommu, domain, domain_id);
> +
> +		phys = iopt_iova_to_phys(&iopt, iova);
> +	}
> +	hyp_spin_unlock(&iommu_lock);
> +	return phys;
> +}
> +
>  int kvm_iommu_init_device(struct kvm_hyp_iommu *iommu)
>  {
>  	void *domains;
> -- 
> 2.39.0

Thanks,
Mostafa

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 20/45] KVM: arm64: iommu: Add map() and unmap() operations
  2023-03-30 18:14     ` Mostafa Saleh
@ 2023-04-04 16:00       ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-04-04 16:00 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, dbrazdil, ryan.roberts,
	linux-arm-kernel, kvmarm, iommu

Hi Mostafa,

On Thu, Mar 30, 2023 at 06:14:04PM +0000, Mostafa Saleh wrote:
> > +err_unmap:
> > +	__kvm_iommu_unmap_pages(&iopt, iova_orig, pgsize, pgcount_orig - pgcount);
> On error here, this unmaps (and unshares) only the pages that have been
> mapped.
> But all pages were shared with the IOMMU before (via
> __pkvm_host_share_dma), and this corrupts the state of the other pages,
> as they are marked as shared while they are not.

Right, I'll fix this

> I see we can add a "bool unshare" arg to __kvm_iommu_unmap_pages which
> will be called with false on error from here after calling
> __pkvm_host_unshare_dma for the whole range.

I think it's simpler to call iopt_unmap_pages() directly here, followed by
__pkvm_host_unshare_dma(). It even saves us a few lines.
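
Something like this, roughly (untested sketch; paddr_orig would be a new
local saving the initial paddr, since paddr advances in the mapping
loop):

err_unmap:
	/*
	 * The whole range was shared above, so undo whatever the loop
	 * managed to map, then unshare the full range.
	 */
	if (pgcount != pgcount_orig)
		iopt_unmap_pages(&iopt, iova_orig, pgsize,
				 pgcount_orig - pgcount, NULL);
	__pkvm_host_unshare_dma(paddr_orig, size);
err_unlock:
	hyp_spin_unlock(&iommu_lock);
	return ret;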

Thanks,
Jean


^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 19/45] KVM: arm64: iommu: Add domains
  2023-02-01 12:53   ` Jean-Philippe Brucker
@ 2023-05-19 15:33     ` Mostafa Saleh
  -1 siblings, 0 replies; 201+ messages in thread
From: Mostafa Saleh @ 2023-05-19 15:33 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, dbrazdil, ryan.roberts,
	linux-arm-kernel, kvmarm, iommu

Hi Jean,

On Wed, Feb 01, 2023 at 12:53:03PM +0000, Jean-Philippe Brucker wrote:
> +/*
> + * Serialize access to domains and IOMMU driver internal structures (command
> + * queue, device tables)
> + */
> +static hyp_spinlock_t iommu_lock;
> +
I was looking more into this lock and I think we can make it per-IOMMU
instead of having one big lock, to avoid contention, as I see it is only
used to protect per-IOMMU resources.
Some special handling is needed, as hyp_spinlock_t is not exposed to EL1.
Maybe something like this:

diff --git a/arch/arm64/kvm/hyp/hyp-constants.c b/arch/arm64/kvm/hyp/hyp-constants.c
index b257a3b4bfc5..96d30a37f9e6 100644
--- a/arch/arm64/kvm/hyp/hyp-constants.c
+++ b/arch/arm64/kvm/hyp/hyp-constants.c
@@ -3,11 +3,13 @@
 #include <linux/kbuild.h>
 #include <nvhe/memory.h>
 #include <nvhe/pkvm.h>
+#include <nvhe/spinlock.h>

 int main(void)
 {
 	DEFINE(STRUCT_HYP_PAGE_SIZE,	sizeof(struct hyp_page));
 	DEFINE(PKVM_HYP_VM_SIZE,	sizeof(struct pkvm_hyp_vm));
 	DEFINE(PKVM_HYP_VCPU_SIZE,	sizeof(struct pkvm_hyp_vcpu));
+	DEFINE(HYP_SPINLOCK_SIZE,	sizeof(hyp_spinlock_t));
 	return 0;
 }
diff --git a/drivers/iommu/arm/arm-smmu-v3/Makefile b/drivers/iommu/arm/arm-smmu-v3/Makefile
index a90b97d8bae3..cf9195e24a08 100644
--- a/drivers/iommu/arm/arm-smmu-v3/Makefile
+++ b/drivers/iommu/arm/arm-smmu-v3/Makefile
@@ -6,6 +6,7 @@ arm_smmu_v3-objs-$(CONFIG_ARM_SMMU_V3_SVA) += arm-smmu-v3-sva.o
 arm_smmu_v3-objs := $(arm_smmu_v3-objs-y)
 
 obj-$(CONFIG_ARM_SMMU_V3_PKVM) += arm_smmu_v3_kvm.o
+ccflags-$(CONFIG_ARM_SMMU_V3_PKVM) += -Iarch/arm64/kvm/
 arm_smmu_v3_kvm-objs-y += arm-smmu-v3-kvm.o
 arm_smmu_v3_kvm-objs-y += arm-smmu-v3-common.o
 arm_smmu_v3_kvm-objs := $(arm_smmu_v3_kvm-objs-y)
diff --git a/include/kvm/iommu.h b/include/kvm/iommu.h
index ab888da731bc..82827b99b1ed 100644
--- a/include/kvm/iommu.h
+++ b/include/kvm/iommu.h
@@ -5,6 +5,12 @@
 #include <asm/kvm_host.h>
 #include <kvm/power_domain.h>
 #include <linux/io-pgtable.h>
+#ifdef __KVM_NVHE_HYPERVISOR__
+#include <nvhe/spinlock.h>
+#else
+#include "hyp_constants.h"
+#endif
 
 /*
  * Parameters from the trusted host:
@@ -23,6 +29,11 @@ struct kvm_hyp_iommu {
 
 	struct io_pgtable_params	*pgtable;
 	bool				power_is_off;
+#ifdef __KVM_NVHE_HYPERVISOR__
+	hyp_spinlock_t			iommu_lock;
+#else
+	u8 unused[HYP_SPINLOCK_SIZE];
+#endif
 };
 
 struct kvm_hyp_iommu_memcache {
diff --git a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
index 1f4d5fcc1386..afaf173e65ed 100644
--- a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
+++ b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
@@ -14,12 +14,6 @@
 
 struct kvm_hyp_iommu_memcache __ro_after_init *kvm_hyp_iommu_memcaches;
 
-/*
- * Serialize access to domains and IOMMU driver internal structures (command
- * queue, device tables)
- */
-static hyp_spinlock_t iommu_lock;
-
 #define domain_to_iopt(_iommu, _domain, _domain_id)		\
 	(struct io_pgtable) {					\
 		.ops = &(_iommu)->pgtable->ops,			\
@@ -93,10 +87,10 @@ int kvm_iommu_alloc_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
 {
 	int ret = -EINVAL;
 	struct io_pgtable iopt;
-	struct kvm_hyp_iommu *iommu;
+	struct kvm_hyp_iommu *iommu = kvm_iommu_ops.get_iommu_by_id(iommu_id);
 	struct kvm_hyp_iommu_domain *domain;
 
-	hyp_spin_lock(&iommu_lock);
+	hyp_spin_lock(&iommu->iommu_lock);
 	domain = handle_to_domain(iommu_id, domain_id, &iommu);
 	if (!domain)
 		goto out_unlock;
@@ -112,7 +106,7 @@ int kvm_iommu_alloc_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
 	domain->refs = 1;
 	domain->pgd = iopt.pgd;
 out_unlock:
-	hyp_spin_unlock(&iommu_lock);
+	hyp_spin_unlock(&iommu->iommu_lock);
 	return ret;
 }
 
@@ -120,10 +114,10 @@ int kvm_iommu_free_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id)
 {
 	int ret = -EINVAL;
 	struct io_pgtable iopt;
-	struct kvm_hyp_iommu *iommu;
+	struct kvm_hyp_iommu *iommu = kvm_iommu_ops.get_iommu_by_id(iommu_id);
 	struct kvm_hyp_iommu_domain *domain;
 
-	hyp_spin_lock(&iommu_lock);
+	hyp_spin_lock(&iommu->iommu_lock);
 	domain = handle_to_domain(iommu_id, domain_id, &iommu);
 	if (!domain)
 		goto out_unlock;
@@ -137,7 +131,7 @@ int kvm_iommu_free_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id)
 	memset(domain, 0, sizeof(*domain));
 
 out_unlock:
-	hyp_spin_unlock(&iommu_lock);
+	hyp_spin_unlock(&iommu->iommu_lock);
 	return ret;
 }
-- 

(I didn't include the full patch as it is too long, but the rest is
mainly s/&iommu_lock/&iommu->iommu_lock)

Please let me know what you think.

Thanks,
Mostafa


^ permalink raw reply related	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 19/45] KVM: arm64: iommu: Add domains
@ 2023-05-19 15:33     ` Mostafa Saleh
  0 siblings, 0 replies; 201+ messages in thread
From: Mostafa Saleh @ 2023-05-19 15:33 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, dbrazdil, ryan.roberts,
	linux-arm-kernel, kvmarm, iommu

Hi Jean,

On Wed, Feb 01, 2023 at 12:53:03PM +0000, Jean-Philippe Brucker wrote:
> +/*
> + * Serialize access to domains and IOMMU driver internal structures (command
> + * queue, device tables)
> + */
> +static hyp_spinlock_t iommu_lock;
> +
I was looking more into this lock and I think we can make it per IOMMU instead
of having one big lock to avoid congestion, as I see it is only used to
protect per-IOMMU resources.
Some special handling needed as hyp_spinlock_t is not exposed to EL1.
Maybe something like this:

diff --git a/arch/arm64/kvm/hyp/hyp-constants.c b/arch/arm64/kvm/hyp/hyp-constants.c
index b257a3b4bfc5..96d30a37f9e6 100644
--- a/arch/arm64/kvm/hyp/hyp-constants.c
+++ b/arch/arm64/kvm/hyp/hyp-constants.c
@@ -3,11 +3,13 @@
 #include <linux/kbuild.h>
 #include <nvhe/memory.h>
 #include <nvhe/pkvm.h>
+#include <nvhe/spinlock.h>

 int main(void)
 {
 	DEFINE(STRUCT_HYP_PAGE_SIZE,	sizeof(struct hyp_page));
 	DEFINE(PKVM_HYP_VM_SIZE,	sizeof(struct pkvm_hyp_vm));
 	DEFINE(PKVM_HYP_VCPU_SIZE,	sizeof(struct pkvm_hyp_vcpu));
+	DEFINE(HYP_SPINLOCK_SIZE,	sizeof(hyp_spinlock_t));
 	return 0;
 }
diff --git a/drivers/iommu/arm/arm-smmu-v3/Makefile b/drivers/iommu/arm/arm-smmu-v3/Makefile
index a90b97d8bae3..cf9195e24a08 100644
--- a/drivers/iommu/arm/arm-smmu-v3/Makefile
+++ b/drivers/iommu/arm/arm-smmu-v3/Makefile
@@ -6,6 +6,7 @@ arm_smmu_v3-objs-$(CONFIG_ARM_SMMU_V3_SVA) += arm-smmu-v3-sva.o
 arm_smmu_v3-objs := $(arm_smmu_v3-objs-y)
 
 obj-$(CONFIG_ARM_SMMU_V3_PKVM) += arm_smmu_v3_kvm.o
+ccflags-$(CONFIG_ARM_SMMU_V3_PKVM) += -Iarch/arm64/kvm/
 arm_smmu_v3_kvm-objs-y += arm-smmu-v3-kvm.o
 arm_smmu_v3_kvm-objs-y += arm-smmu-v3-common.o
 arm_smmu_v3_kvm-objs := $(arm_smmu_v3_kvm-objs-y)
diff --git a/include/kvm/iommu.h b/include/kvm/iommu.h
index ab888da731bc..82827b99b1ed 100644
--- a/include/kvm/iommu.h
+++ b/include/kvm/iommu.h
@@ -5,6 +5,12 @@
 #include <asm/kvm_host.h>
 #include <kvm/power_domain.h>
 #include <linux/io-pgtable.h>
+#ifdef __KVM_NVHE_HYPERVISOR__
+#include <nvhe/spinlock.h>
+#else
+#include "hyp_constants.h"
+#endif
 
 /*
  * Parameters from the trusted host:
@@ -23,6 +29,11 @@ struct kvm_hyp_iommu {
 
 	struct io_pgtable_params	*pgtable;
 	bool				power_is_off;
+#ifdef __KVM_NVHE_HYPERVISOR__
+	hyp_spinlock_t			iommu_lock;
+#else
+	u8 unused[HYP_SPINLOCK_SIZE];
+#endif
 };
 
 struct kvm_hyp_iommu_memcache {
diff --git a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
index 1f4d5fcc1386..afaf173e65ed 100644
--- a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
+++ b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
@@ -14,12 +14,6 @@
 
 struct kvm_hyp_iommu_memcache __ro_after_init *kvm_hyp_iommu_memcaches;
 
-/*
- * Serialize access to domains and IOMMU driver internal structures (command
- * queue, device tables)
- */
-static hyp_spinlock_t iommu_lock;
-
 #define domain_to_iopt(_iommu, _domain, _domain_id)		\
 	(struct io_pgtable) {					\
 		.ops = &(_iommu)->pgtable->ops,			\
@@ -93,10 +87,10 @@ int kvm_iommu_alloc_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
 {
 	int ret = -EINVAL;
 	struct io_pgtable iopt;
-	struct kvm_hyp_iommu *iommu;
+	struct kvm_hyp_iommu *iommu = kvm_iommu_ops.get_iommu_by_id(iommu_id);
 	struct kvm_hyp_iommu_domain *domain;
 
-	hyp_spin_lock(&iommu_lock);
+	hyp_spin_lock(&iommu->iommu_lock);
 	domain = handle_to_domain(iommu_id, domain_id, &iommu);
 	if (!domain)
 		goto out_unlock;
@@ -112,7 +106,7 @@ int kvm_iommu_alloc_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
 	domain->refs = 1;
 	domain->pgd = iopt.pgd;
 out_unlock:
-	hyp_spin_unlock(&iommu_lock);
+	hyp_spin_unlock(&iommu->iommu_lock);
 	return ret;
 }
 
@@ -120,10 +114,10 @@ int kvm_iommu_free_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id)
 {
 	int ret = -EINVAL;
 	struct io_pgtable iopt;
-	struct kvm_hyp_iommu *iommu;
+	struct kvm_hyp_iommu *iommu = kvm_iommu_ops.get_iommu_by_id(iommu_id);
 	struct kvm_hyp_iommu_domain *domain;
 
-	hyp_spin_lock(&iommu_lock);
+	hyp_spin_lock(&iommu->iommu_lock);
 	domain = handle_to_domain(iommu_id, domain_id, &iommu);
 	if (!domain)
 		goto out_unlock;
@@ -137,7 +131,7 @@ int kvm_iommu_free_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id)
 	memset(domain, 0, sizeof(*domain));
 
 out_unlock:
-	hyp_spin_unlock(&iommu_lock);
+	hyp_spin_unlock(&iommu->iommu_lock);
 	return ret;
 }
-- 

(I didn't include the full patch as it is too long; the rest is mainly
s/&iommu_lock/&iommu->iommu_lock)

Please let me know what you think.

Thanks,
Mostafa



^ permalink raw reply related	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 19/45] KVM: arm64: iommu: Add domains
  2023-05-19 15:33     ` Mostafa Saleh
@ 2023-06-02 15:29       ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-06-02 15:29 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, dbrazdil, ryan.roberts,
	linux-arm-kernel, kvmarm, iommu

Hi Mostafa,

On Fri, 19 May 2023 at 16:33, Mostafa Saleh <smostafa@google.com> wrote:
>
> Hi Jean,
>
> On Wed, Feb 01, 2023 at 12:53:03PM +0000, Jean-Philippe Brucker wrote:
> > +/*
> > + * Serialize access to domains and IOMMU driver internal structures (command
> > + * queue, device tables)
> > + */
> > +static hyp_spinlock_t iommu_lock;
> > +
> I was looking more into this lock and I think we can make it per IOMMU instead
> of having one big lock to avoid congestion, as I see it is only used to
> protect per-IOMMU resources.

Yes it's a major bottleneck, thanks for looking into this. I think we'll
eventually want to improve scalability within an IOMMU and even a domain:
a multi-queue network device will share a single domain between multiple
CPUs, each issuing lots of map/unmap calls. Or just two device drivers
working on different CPUs and sharing one IOMMU. In those cases the
per-IOMMU lock won't be good enough and we'll be back to this problem, but
a per-IOMMU lock should already improve scalability for some systems.

Currently the global lock protects:

(1) The domain array (per IOMMU)
(2) The io-pgtables (per domain)
(3) The command queue (per SMMU)
(4) Power state (per SMMU)

I ran some experiments with refcounting the domains to avoid the lock for
(1) and (2), which improves map() scalability. See
https://jpbrucker.net/git/linux/commit/?h=pkvm/smmu-wip&id=5ad3bc6fd589b6f11fe31ccee5b8777ba8739167
(and another experiment to optimize the DMA refcount on branch pkvm/smmu-wip)
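
To give a rough idea of the shape of that experiment (this is only an
illustrative sketch, not the actual commit; the 'users' field and the
get/put helpers are made-up names), the map path would take the lock just
long enough to look up and pin the domain, then walk the io-pgtable
outside it:

static struct kvm_hyp_iommu_domain *
domain_get(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
	   struct kvm_hyp_iommu **out_iommu)
{
	struct kvm_hyp_iommu_domain *domain;

	hyp_spin_lock(&iommu_lock);
	domain = handle_to_domain(iommu_id, domain_id, out_iommu);
	/* Only pin live domains; freeing would check users == 0 under the lock */
	if (domain && domain->refs)
		atomic_inc(&domain->users);
	else
		domain = NULL;
	hyp_spin_unlock(&iommu_lock);
	return domain;
}

static void domain_put(struct kvm_hyp_iommu_domain *domain)
{
	atomic_dec(&domain->users);
}

kvm_iommu_map_pages() would then bracket iopt_map_pages() with
domain_get()/domain_put() instead of holding iommu_lock across the whole
operation, and kvm_iommu_free_domain() would refuse to free a domain whose
users count isn't zero.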

For (2), the io-pgtable-arm code already supports concurrent accesses
(2c3d273eabe8 ("iommu/io-pgtable-arm: Support lockless operation")). But
this one needs careful consideration because in the host, the io-pgtable
code trusts the device drivers. For example it expects that only buggy
drivers call map()/unmap() to the same IOVA concurrently. We need to make
sure a compromised host driver can't exploit these assumptions to do
anything nasty.

There are options to improve (3) as well. The host SMMU driver supports
lockless command-queue (587e6c10a7ce ("iommu/arm-smmu-v3: Reduce
contention during command-queue insertion")) but it may be too complex to
put in the hypervisor (or rather, I haven't tried to understand it yet).
We could also wait for systems with ECMDQ, which enables per-CPU queues.

> Some special handling is needed as hyp_spinlock_t is not exposed to EL1.
> Maybe something like this:
>
> diff --git a/arch/arm64/kvm/hyp/hyp-constants.c b/arch/arm64/kvm/hyp/hyp-constants.c
> index b257a3b4bfc5..96d30a37f9e6 100644
> --- a/arch/arm64/kvm/hyp/hyp-constants.c
> +++ b/arch/arm64/kvm/hyp/hyp-constants.c
> @@ -3,11 +3,13 @@
>  #include <linux/kbuild.h>
>  #include <nvhe/memory.h>
>  #include <nvhe/pkvm.h>
> +#include <nvhe/spinlock.h>
>
>  int main(void)
>  {
>         DEFINE(STRUCT_HYP_PAGE_SIZE,    sizeof(struct hyp_page));
>         DEFINE(PKVM_HYP_VM_SIZE,        sizeof(struct pkvm_hyp_vm));
>         DEFINE(PKVM_HYP_VCPU_SIZE,      sizeof(struct pkvm_hyp_vcpu));
> +       DEFINE(HYP_SPINLOCK_SIZE,       sizeof(hyp_spinlock_t));
>         return 0;
>  }
> diff --git a/drivers/iommu/arm/arm-smmu-v3/Makefile b/drivers/iommu/arm/arm-smmu-v3/Makefile
> index a90b97d8bae3..cf9195e24a08 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/Makefile
> +++ b/drivers/iommu/arm/arm-smmu-v3/Makefile
> @@ -6,6 +6,7 @@ arm_smmu_v3-objs-$(CONFIG_ARM_SMMU_V3_SVA) += arm-smmu-v3-sva.o
>  arm_smmu_v3-objs := $(arm_smmu_v3-objs-y)
>
>  obj-$(CONFIG_ARM_SMMU_V3_PKVM) += arm_smmu_v3_kvm.o
> +ccflags-$(CONFIG_ARM_SMMU_V3_PKVM) += -Iarch/arm64/kvm/
>  arm_smmu_v3_kvm-objs-y += arm-smmu-v3-kvm.o
>  arm_smmu_v3_kvm-objs-y += arm-smmu-v3-common.o
>  arm_smmu_v3_kvm-objs := $(arm_smmu_v3_kvm-objs-y)
> diff --git a/include/kvm/iommu.h b/include/kvm/iommu.h
> index ab888da731bc..82827b99b1ed 100644
> --- a/include/kvm/iommu.h
> +++ b/include/kvm/iommu.h
> @@ -5,6 +5,12 @@
>  #include <asm/kvm_host.h>
>  #include <kvm/power_domain.h>
>  #include <linux/io-pgtable.h>
> +#ifdef __KVM_NVHE_HYPERVISOR__
> +#include <nvhe/spinlock.h>
> +#else
> +#include "hyp_constants.h"
> +#endif
>
>  /*
>   * Parameters from the trusted host:
> @@ -23,6 +29,11 @@ struct kvm_hyp_iommu {
>
>         struct io_pgtable_params        *pgtable;
>         bool                            power_is_off;
> +#ifdef __KVM_NVHE_HYPERVISOR__
> +       hyp_spinlock_t                  iommu_lock;
> +#else
> +       u8 unused[HYP_SPINLOCK_SIZE];
> +#endif
>  };
>
>  struct kvm_hyp_iommu_memcache {
> diff --git a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
> index 1f4d5fcc1386..afaf173e65ed 100644
> --- a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
> +++ b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
> @@ -14,12 +14,6 @@
>
>  struct kvm_hyp_iommu_memcache __ro_after_init *kvm_hyp_iommu_memcaches;
>
> -/*
> - * Serialize access to domains and IOMMU driver internal structures (command
> - * queue, device tables)
> - */
> -static hyp_spinlock_t iommu_lock;
> -
>  #define domain_to_iopt(_iommu, _domain, _domain_id)            \
>         (struct io_pgtable) {                                   \
>                 .ops = &(_iommu)->pgtable->ops,                 \
> @@ -93,10 +87,10 @@ int kvm_iommu_alloc_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
>  {
>         int ret = -EINVAL;
>         struct io_pgtable iopt;
> -       struct kvm_hyp_iommu *iommu;
> +       struct kvm_hyp_iommu *iommu = kvm_iommu_ops.get_iommu_by_id(iommu_id);
>         struct kvm_hyp_iommu_domain *domain;
>
> -       hyp_spin_lock(&iommu_lock);
> +       hyp_spin_lock(&iommu->iommu_lock);
>         domain = handle_to_domain(iommu_id, domain_id, &iommu);
>         if (!domain)
>                 goto out_unlock;
> @@ -112,7 +106,7 @@ int kvm_iommu_alloc_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
>         domain->refs = 1;
>         domain->pgd = iopt.pgd;
>  out_unlock:
> -       hyp_spin_unlock(&iommu_lock);
> +       hyp_spin_unlock(&iommu->iommu_lock);
>         return ret;
>  }
>
> @@ -120,10 +114,10 @@ int kvm_iommu_free_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id)
>  {
>         int ret = -EINVAL;
>         struct io_pgtable iopt;
> -       struct kvm_hyp_iommu *iommu;
> +       struct kvm_hyp_iommu *iommu = kvm_iommu_ops.get_iommu_by_id(iommu_id);
>         struct kvm_hyp_iommu_domain *domain;
>
> -       hyp_spin_lock(&iommu_lock);
> +       hyp_spin_lock(&iommu->iommu_lock);
>         domain = handle_to_domain(iommu_id, domain_id, &iommu);
>         if (!domain)
>                 goto out_unlock;
> @@ -137,7 +131,7 @@ int kvm_iommu_free_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id)
>         memset(domain, 0, sizeof(*domain));
>
>  out_unlock:
> -       hyp_spin_unlock(&iommu_lock);
> +       hyp_spin_unlock(&iommu->iommu_lock);
>         return ret;
>  }
> --
>
> (I didn't include the full patch as it is too long; the rest is mainly
> s/&iommu_lock/&iommu->iommu_lock)
>
> Please let me know what you think.

It makes sense. I can append it to my tree, or squash it (with your S-o-B)
if you want to send the full patch.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 19/45] KVM: arm64: iommu: Add domains
  2023-06-02 15:29       ` Jean-Philippe Brucker
@ 2023-06-15 13:32         ` Mostafa Saleh
  -1 siblings, 0 replies; 201+ messages in thread
From: Mostafa Saleh @ 2023-06-15 13:32 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, dbrazdil, ryan.roberts,
	linux-arm-kernel, kvmarm, iommu

Hi Jean,

On Fri, Jun 02, 2023 at 04:29:07PM +0100, Jean-Philippe Brucker wrote:
> Hi Mostafa,
> 
> On Fri, 19 May 2023 at 16:33, Mostafa Saleh <smostafa@google.com> wrote:
> >
> > Hi Jean,
> >
> > On Wed, Feb 01, 2023 at 12:53:03PM +0000, Jean-Philippe Brucker wrote:
> > > +/*
> > > + * Serialize access to domains and IOMMU driver internal structures (command
> > > + * queue, device tables)
> > > + */
> > > +static hyp_spinlock_t iommu_lock;
> > > +
> > I was looking more into this lock and I think we can make it per IOMMU instead
> > of having one big lock to avoid congestion, as I see it is only used to
> > protect per-IOMMU resources.
> 
> Yes it's a major bottleneck, thanks for looking into this. I think we'll
> eventually want to improve scalability within an IOMMU and even a domain:
> a multi-queue network device will share a single domain between multiple
> CPUs, each issuing lots of map/unmap calls. Or just two device drivers
> working on different CPUs and sharing one IOMMU. In those cases the
> per-IOMMU lock won't be good enough and we'll be back to this problem, but
> a per-IOMMU lock should already improve scalability for some systems.
> 
> Currently the global lock protects:
> 
> (1) The domain array (per IOMMU)
> (2) The io-pgtables (per domain)
> (3) The command queue (per SMMU)
> (4) Power state (per SMMU)
> 
> I ran some experiments with refcounting the domains to avoid the lock for
> (1) and (2), which improves map() scalability. See
> https://jpbrucker.net/git/linux/commit/?h=pkvm/smmu-wip&id=5ad3bc6fd589b6f11fe31ccee5b8777ba8739167
> (and another experiment to optimize the DMA refcount on branch pkvm/smmu-wip)

Thanks for the detailed explanation! I will give it a try.

> For (2), the io-pgtable-arm code already supports concurrent accesses
> (2c3d273eabe8 ("iommu/io-pgtable-arm: Support lockless operation")). But
> this one needs careful consideration because in the host, the io-pgtable
> code trusts the device drivers. For example it expects that only buggy
> drivers call map()/unmap() to the same IOVA concurrently. We need to make
> sure a compromised host driver can't exploit these assumptions to do
> anything nasty.
> 
> There are options to improve (3) as well. The host SMMU driver supports
> lockless command-queue (587e6c10a7ce ("iommu/arm-smmu-v3: Reduce
> contention during command-queue insertion")) but it may be too complex to
> put in the hypervisor (or rather, I haven't tried to understand it yet).
> We could also wait for systems with ECMDQ, which enables per-CPU queues.
I was thinking about the same thing: in the future we should aim for a
lockless approach for the cmdq at EL2, similar to the host's. I am not sure
how feasible it is, I will have a closer look. But having a per-IOMMU lock
should be a step in the right direction.

> > Some special handling is needed as hyp_spinlock_t is not exposed to EL1.
> > Maybe something like this:
> >
> > diff --git a/arch/arm64/kvm/hyp/hyp-constants.c b/arch/arm64/kvm/hyp/hyp-constants.c
> > index b257a3b4bfc5..96d30a37f9e6 100644
> > --- a/arch/arm64/kvm/hyp/hyp-constants.c
> > +++ b/arch/arm64/kvm/hyp/hyp-constants.c
> > @@ -3,11 +3,13 @@
> >  #include <linux/kbuild.h>
> >  #include <nvhe/memory.h>
> >  #include <nvhe/pkvm.h>
> > +#include <nvhe/spinlock.h>
> >
> >  int main(void)
> >  {
> >         DEFINE(STRUCT_HYP_PAGE_SIZE,    sizeof(struct hyp_page));
> >         DEFINE(PKVM_HYP_VM_SIZE,        sizeof(struct pkvm_hyp_vm));
> >         DEFINE(PKVM_HYP_VCPU_SIZE,      sizeof(struct pkvm_hyp_vcpu));
> > +       DEFINE(HYP_SPINLOCK_SIZE,       sizeof(hyp_spinlock_t));
> >         return 0;
> >  }
> > diff --git a/drivers/iommu/arm/arm-smmu-v3/Makefile b/drivers/iommu/arm/arm-smmu-v3/Makefile
> > index a90b97d8bae3..cf9195e24a08 100644
> > --- a/drivers/iommu/arm/arm-smmu-v3/Makefile
> > +++ b/drivers/iommu/arm/arm-smmu-v3/Makefile
> > @@ -6,6 +6,7 @@ arm_smmu_v3-objs-$(CONFIG_ARM_SMMU_V3_SVA) += arm-smmu-v3-sva.o
> >  arm_smmu_v3-objs := $(arm_smmu_v3-objs-y)
> >
> >  obj-$(CONFIG_ARM_SMMU_V3_PKVM) += arm_smmu_v3_kvm.o
> > +ccflags-$(CONFIG_ARM_SMMU_V3_PKVM) += -Iarch/arm64/kvm/
> >  arm_smmu_v3_kvm-objs-y += arm-smmu-v3-kvm.o
> >  arm_smmu_v3_kvm-objs-y += arm-smmu-v3-common.o
> >  arm_smmu_v3_kvm-objs := $(arm_smmu_v3_kvm-objs-y)
> > diff --git a/include/kvm/iommu.h b/include/kvm/iommu.h
> > index ab888da731bc..82827b99b1ed 100644
> > --- a/include/kvm/iommu.h
> > +++ b/include/kvm/iommu.h
> > @@ -5,6 +5,12 @@
> >  #include <asm/kvm_host.h>
> >  #include <kvm/power_domain.h>
> >  #include <linux/io-pgtable.h>
> > +#ifdef __KVM_NVHE_HYPERVISOR__
> > +#include <nvhe/spinlock.h>
> > +#else
> > +#include "hyp_constants.h"
> > +#endif
> >
> >  /*
> >   * Parameters from the trusted host:
> > @@ -23,6 +29,11 @@ struct kvm_hyp_iommu {
> >
> >         struct io_pgtable_params        *pgtable;
> >         bool                            power_is_off;
> > +#ifdef __KVM_NVHE_HYPERVISOR__
> > +       hyp_spinlock_t                  iommu_lock;
> > +#else
> > +       u8 unused[HYP_SPINLOCK_SIZE];
> > +#endif
> >  };
> >
> >  struct kvm_hyp_iommu_memcache {
> > diff --git a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
> > index 1f4d5fcc1386..afaf173e65ed 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
> > @@ -14,12 +14,6 @@
> >
> >  struct kvm_hyp_iommu_memcache __ro_after_init *kvm_hyp_iommu_memcaches;
> >
> > -/*
> > - * Serialize access to domains and IOMMU driver internal structures (command
> > - * queue, device tables)
> > - */
> > -static hyp_spinlock_t iommu_lock;
> > -
> >  #define domain_to_iopt(_iommu, _domain, _domain_id)            \
> >         (struct io_pgtable) {                                   \
> >                 .ops = &(_iommu)->pgtable->ops,                 \
> > @@ -93,10 +87,10 @@ int kvm_iommu_alloc_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
> >  {
> >         int ret = -EINVAL;
> >         struct io_pgtable iopt;
> > -       struct kvm_hyp_iommu *iommu;
> > +       struct kvm_hyp_iommu *iommu = kvm_iommu_ops.get_iommu_by_id(iommu_id);
> >         struct kvm_hyp_iommu_domain *domain;
> >
> > -       hyp_spin_lock(&iommu_lock);
> > +       hyp_spin_lock(&iommu->iommu_lock);
> >         domain = handle_to_domain(iommu_id, domain_id, &iommu);
> >         if (!domain)
> >                 goto out_unlock;
> > @@ -112,7 +106,7 @@ int kvm_iommu_alloc_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
> >         domain->refs = 1;
> >         domain->pgd = iopt.pgd;
> >  out_unlock:
> > -       hyp_spin_unlock(&iommu_lock);
> > +       hyp_spin_unlock(&iommu->iommu_lock);
> >         return ret;
> >  }
> >
> > @@ -120,10 +114,10 @@ int kvm_iommu_free_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id)
> >  {
> >         int ret = -EINVAL;
> >         struct io_pgtable iopt;
> > -       struct kvm_hyp_iommu *iommu;
> > +       struct kvm_hyp_iommu *iommu = kvm_iommu_ops.get_iommu_by_id(iommu_id);
> >         struct kvm_hyp_iommu_domain *domain;
> >
> > -       hyp_spin_lock(&iommu_lock);
> > +       hyp_spin_lock(&iommu->iommu_lock);
> >         domain = handle_to_domain(iommu_id, domain_id, &iommu);
> >         if (!domain)
> >                 goto out_unlock;
> > @@ -137,7 +131,7 @@ int kvm_iommu_free_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id)
> >         memset(domain, 0, sizeof(*domain));
> >
> >  out_unlock:
> > -       hyp_spin_unlock(&iommu_lock);
> > +       hyp_spin_unlock(&iommu->iommu_lock);
> >         return ret;
> >  }
> > --
> >
> > (I didn't include the full patch as it is too long; the rest is mainly
> > s/&iommu_lock/&iommu->iommu_lock)
> >
> > Please let me know what you think.
> 
> It makes sense. I can append it to my tree, or squash it (with your S-o-B)
> if you want to send the full patch.
Sure, here is the full patch, based on the latest
https://jpbrucker.net/git/linux/log/?h=pkvm/smmu

Thanks,
Mostafa

--

From 5b43b97ca2486d52aa2ad475e4a34357dea57bca Mon Sep 17 00:00:00 2001
From: Mostafa Saleh <smostafa@google.com>
Date: Tue, 13 Jun 2023 14:59:37 +0000
Subject: [PATCH] KVM: arm: iommu: Use per-iommu lock instead of one global
 lock

Currently, there is one big lock for all of the IOMMU operations,
used to protect per-IOMMU resources and domains (which are also
per-IOMMU at the moment).

This can be improved for setups with many IOMMUs by using a
per-IOMMU lock.

As we now always look up the iommu before calling handle_to_domain(),
we can drop the out_iommu argument and iommu_id, and pass the iommu
directly.

Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
 arch/arm64/kvm/hyp/hyp-constants.c     |  1 +
 arch/arm64/kvm/hyp/nvhe/iommu/iommu.c  | 75 +++++++++++++-------------
 drivers/iommu/arm/arm-smmu-v3/Makefile |  1 +
 include/kvm/iommu.h                    | 10 ++++
 4 files changed, 48 insertions(+), 39 deletions(-)

diff --git a/arch/arm64/kvm/hyp/hyp-constants.c b/arch/arm64/kvm/hyp/hyp-constants.c
index b257a3b4bfc5..d84cc8ec1e46 100644
--- a/arch/arm64/kvm/hyp/hyp-constants.c
+++ b/arch/arm64/kvm/hyp/hyp-constants.c
@@ -9,5 +9,6 @@ int main(void)
 	DEFINE(STRUCT_HYP_PAGE_SIZE,	sizeof(struct hyp_page));
 	DEFINE(PKVM_HYP_VM_SIZE,	sizeof(struct pkvm_hyp_vm));
 	DEFINE(PKVM_HYP_VCPU_SIZE,	sizeof(struct pkvm_hyp_vcpu));
+	DEFINE(HYP_SPINLOCK_SIZE,       sizeof(hyp_spinlock_t));
 	return 0;
 }
diff --git a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
index c70d3b6934eb..1dcba39d1d41 100644
--- a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
+++ b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
@@ -14,12 +14,6 @@

 struct kvm_hyp_iommu_memcache __ro_after_init *kvm_hyp_iommu_memcaches;

-/*
- * Serialize access to domains and IOMMU driver internal structures (command
- * queue, device tables)
- */
-static hyp_spinlock_t iommu_lock;
-
 #define domain_to_iopt(_iommu, _domain, _domain_id)		\
 	(struct io_pgtable) {					\
 		.ops = &(_iommu)->pgtable->ops,			\
@@ -59,14 +53,11 @@ void kvm_iommu_reclaim_page(void *p)
 }

 static struct kvm_hyp_iommu_domain *
-handle_to_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
-		 struct kvm_hyp_iommu **out_iommu)
+handle_to_domain(struct kvm_hyp_iommu *iommu, pkvm_handle_t domain_id)
 {
 	int idx;
-	struct kvm_hyp_iommu *iommu;
 	struct kvm_hyp_iommu_domain *domains;

-	iommu = kvm_iommu_ops.get_iommu_by_id(iommu_id);
 	if (!iommu)
 		return NULL;

@@ -83,7 +74,6 @@ handle_to_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
 		iommu->domains[idx] = domains;
 	}

-	*out_iommu = iommu;
 	return &domains[domain_id & KVM_IOMMU_DOMAIN_ID_LEAF_MASK];
 }

@@ -95,8 +85,9 @@ int kvm_iommu_alloc_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
 	struct kvm_hyp_iommu *iommu;
 	struct kvm_hyp_iommu_domain *domain;

-	hyp_spin_lock(&iommu_lock);
-	domain = handle_to_domain(iommu_id, domain_id, &iommu);
+	iommu = kvm_iommu_ops.get_iommu_by_id(iommu_id);
+	hyp_spin_lock(&iommu->iommu_lock);
+	domain = handle_to_domain(iommu, domain_id);
 	if (!domain)
 		goto out_unlock;

@@ -111,7 +102,7 @@ int kvm_iommu_alloc_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
 	domain->refs = 1;
 	domain->pgd = iopt.pgd;
 out_unlock:
-	hyp_spin_unlock(&iommu_lock);
+	hyp_spin_unlock(&iommu->iommu_lock);
 	return ret;
 }

@@ -122,8 +113,9 @@ int kvm_iommu_free_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id)
 	struct kvm_hyp_iommu *iommu;
 	struct kvm_hyp_iommu_domain *domain;

-	hyp_spin_lock(&iommu_lock);
-	domain = handle_to_domain(iommu_id, domain_id, &iommu);
+	iommu = kvm_iommu_ops.get_iommu_by_id(iommu_id);
+	hyp_spin_lock(&iommu->iommu_lock);
+	domain = handle_to_domain(iommu, domain_id);
 	if (!domain)
 		goto out_unlock;

@@ -136,7 +128,7 @@ int kvm_iommu_free_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id)
 	memset(domain, 0, sizeof(*domain));

 out_unlock:
-	hyp_spin_unlock(&iommu_lock);
+	hyp_spin_unlock(&iommu->iommu_lock);
 	return ret;
 }

@@ -147,8 +139,9 @@ int kvm_iommu_attach_dev(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
 	struct kvm_hyp_iommu *iommu;
 	struct kvm_hyp_iommu_domain *domain;

-	hyp_spin_lock(&iommu_lock);
-	domain = handle_to_domain(iommu_id, domain_id, &iommu);
+	iommu = kvm_iommu_ops.get_iommu_by_id(iommu_id);
+	hyp_spin_lock(&iommu->iommu_lock);
+	domain = handle_to_domain(iommu, domain_id);
 	if (!domain || !domain->refs || domain->refs == UINT_MAX)
 		goto out_unlock;

@@ -158,7 +151,7 @@ int kvm_iommu_attach_dev(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,

 	domain->refs++;
 out_unlock:
-	hyp_spin_unlock(&iommu_lock);
+	hyp_spin_unlock(&iommu->iommu_lock);
 	return ret;
 }

@@ -169,8 +162,9 @@ int kvm_iommu_detach_dev(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
 	struct kvm_hyp_iommu *iommu;
 	struct kvm_hyp_iommu_domain *domain;

-	hyp_spin_lock(&iommu_lock);
-	domain = handle_to_domain(iommu_id, domain_id, &iommu);
+	iommu = kvm_iommu_ops.get_iommu_by_id(iommu_id);
+	hyp_spin_lock(&iommu->iommu_lock);
+	domain = handle_to_domain(iommu, domain_id);
 	if (!domain || domain->refs <= 1)
 		goto out_unlock;

@@ -180,7 +174,7 @@ int kvm_iommu_detach_dev(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,

 	domain->refs--;
 out_unlock:
-	hyp_spin_unlock(&iommu_lock);
+	hyp_spin_unlock(&iommu->iommu_lock);
 	return ret;
 }

@@ -209,9 +203,10 @@ int kvm_iommu_map_pages(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
 	    iova + size < iova || paddr + size < paddr)
 		return -EOVERFLOW;

-	hyp_spin_lock(&iommu_lock);
+	iommu = kvm_iommu_ops.get_iommu_by_id(iommu_id);
+	hyp_spin_lock(&iommu->iommu_lock);

-	domain = handle_to_domain(iommu_id, domain_id, &iommu);
+	domain = handle_to_domain(iommu, domain_id);
 	if (!domain)
 		goto err_unlock;

@@ -236,7 +231,7 @@ int kvm_iommu_map_pages(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
 		paddr += mapped;
 	}

-	hyp_spin_unlock(&iommu_lock);
+	hyp_spin_unlock(&iommu->iommu_lock);
 	return 0;

 err_unmap:
@@ -245,7 +240,7 @@ int kvm_iommu_map_pages(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
 		iopt_unmap_pages(&iopt, iova_orig, pgsize, pgcount, NULL);
 	__pkvm_host_unshare_dma(paddr_orig, size);
 err_unlock:
-	hyp_spin_unlock(&iommu_lock);
+	hyp_spin_unlock(&iommu->iommu_lock);
 	return ret;
 }

@@ -269,8 +264,9 @@ size_t kvm_iommu_unmap_pages(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
 	    iova + size < iova)
 		return 0;

-	hyp_spin_lock(&iommu_lock);
-	domain = handle_to_domain(iommu_id, domain_id, &iommu);
+	iommu = kvm_iommu_ops.get_iommu_by_id(iommu_id);
+	hyp_spin_lock(&iommu->iommu_lock);
+	domain = handle_to_domain(iommu, domain_id);
 	if (!domain)
 		goto out_unlock;

@@ -299,7 +295,7 @@ size_t kvm_iommu_unmap_pages(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
 	}

 out_unlock:
-	hyp_spin_unlock(&iommu_lock);
+	hyp_spin_unlock(&iommu->iommu_lock);
 	return total_unmapped;
 }

@@ -311,14 +307,15 @@ phys_addr_t kvm_iommu_iova_to_phys(pkvm_handle_t iommu_id,
 	struct kvm_hyp_iommu *iommu;
 	struct kvm_hyp_iommu_domain *domain;

-	hyp_spin_lock(&iommu_lock);
-	domain = handle_to_domain(iommu_id, domain_id, &iommu);
+	iommu = kvm_iommu_ops.get_iommu_by_id(iommu_id);
+	hyp_spin_lock(&iommu->iommu_lock);
+	domain = handle_to_domain(iommu, domain_id);
 	if (domain) {
 		iopt = domain_to_iopt(iommu, domain, domain_id);

 		phys = iopt_iova_to_phys(&iopt, iova);
 	}
-	hyp_spin_unlock(&iommu_lock);
+	hyp_spin_unlock(&iommu->iommu_lock);
 	return phys;
 }

@@ -333,9 +330,9 @@ static int iommu_power_on(struct kvm_power_domain *pd)
 	 * We currently assume that the device retains its architectural state
 	 * across power off, hence no save/restore.
 	 */
-	hyp_spin_lock(&iommu_lock);
+	hyp_spin_lock(&iommu->iommu_lock);
 	iommu->power_is_off = false;
-	hyp_spin_unlock(&iommu_lock);
+	hyp_spin_unlock(&iommu->iommu_lock);
 	return 0;
 }

@@ -346,9 +343,9 @@ static int iommu_power_off(struct kvm_power_domain *pd)

 	pkvm_debug("%s\n", __func__);

-	hyp_spin_lock(&iommu_lock);
+	hyp_spin_lock(&iommu->iommu_lock);
 	iommu->power_is_off = true;
-	hyp_spin_unlock(&iommu_lock);
+	hyp_spin_unlock(&iommu->iommu_lock);
 	return 0;
 }

@@ -362,6 +359,8 @@ int kvm_iommu_init_device(struct kvm_hyp_iommu *iommu)
 	int ret;
 	void *domains;

+	hyp_spin_lock_init(&iommu->iommu_lock);
+
 	ret = pkvm_init_power_domain(&iommu->power_domain, &iommu_power_ops);
 	if (ret)
 		return ret;
@@ -376,8 +375,6 @@ int kvm_iommu_init(void)
 {
 	enum kvm_pgtable_prot prot;

-	hyp_spin_lock_init(&iommu_lock);
-
 	if (WARN_ON(!kvm_iommu_ops.get_iommu_by_id ||
 		    !kvm_iommu_ops.alloc_iopt ||
 		    !kvm_iommu_ops.free_iopt ||
diff --git a/drivers/iommu/arm/arm-smmu-v3/Makefile b/drivers/iommu/arm/arm-smmu-v3/Makefile
index a90b97d8bae3..cf9195e24a08 100644
--- a/drivers/iommu/arm/arm-smmu-v3/Makefile
+++ b/drivers/iommu/arm/arm-smmu-v3/Makefile
@@ -6,6 +6,7 @@ arm_smmu_v3-objs-$(CONFIG_ARM_SMMU_V3_SVA) += arm-smmu-v3-sva.o
 arm_smmu_v3-objs := $(arm_smmu_v3-objs-y)

 obj-$(CONFIG_ARM_SMMU_V3_PKVM) += arm_smmu_v3_kvm.o
+ccflags-$(CONFIG_ARM_SMMU_V3_PKVM) += -Iarch/arm64/kvm/
 arm_smmu_v3_kvm-objs-y += arm-smmu-v3-kvm.o
 arm_smmu_v3_kvm-objs-y += arm-smmu-v3-common.o
 arm_smmu_v3_kvm-objs := $(arm_smmu_v3_kvm-objs-y)
diff --git a/include/kvm/iommu.h b/include/kvm/iommu.h
index ab888da731bc..b7cda7540156 100644
--- a/include/kvm/iommu.h
+++ b/include/kvm/iommu.h
@@ -5,6 +5,11 @@
 #include <asm/kvm_host.h>
 #include <kvm/power_domain.h>
 #include <linux/io-pgtable.h>
+#ifdef __KVM_NVHE_HYPERVISOR__
+#include <nvhe/spinlock.h>
+#else
+#include "hyp_constants.h"
+#endif

 /*
  * Parameters from the trusted host:
@@ -23,6 +28,11 @@ struct kvm_hyp_iommu {

 	struct io_pgtable_params	*pgtable;
 	bool				power_is_off;
+#ifdef __KVM_NVHE_HYPERVISOR__
+	hyp_spinlock_t			iommu_lock;
+#else
+	u8 unused[HYP_SPINLOCK_SIZE];
+#endif
 };

 struct kvm_hyp_iommu_memcache {
--
2.41.0.162.gfafddb0af9-goog


^ permalink raw reply related	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 27/45] KVM: arm64: smmu-v3: Setup domains and page table configuration
  2023-02-01 12:53   ` Jean-Philippe Brucker
@ 2023-06-23 19:12     ` Mostafa Saleh
  -1 siblings, 0 replies; 201+ messages in thread
From: Mostafa Saleh @ 2023-06-23 19:12 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, dbrazdil, ryan.roberts,
	linux-arm-kernel, kvmarm, iommu

Hi Jean,

On Wed, Feb 01, 2023 at 12:53:11PM +0000, Jean-Philippe Brucker wrote:
> Set up the stream table entries when the host issues the attach_dev() and
> detach_dev() hypercalls. The driver holds one io-pgtable configuration
> for all domains.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
> ---
>  include/kvm/arm_smmu_v3.h                   |   2 +
>  arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c | 178 +++++++++++++++++++-
>  2 files changed, 177 insertions(+), 3 deletions(-)
> 
> diff --git a/include/kvm/arm_smmu_v3.h b/include/kvm/arm_smmu_v3.h
> index fc67a3bf5709..ed139b0e9612 100644
> --- a/include/kvm/arm_smmu_v3.h
> +++ b/include/kvm/arm_smmu_v3.h
> @@ -3,6 +3,7 @@
>  #define __KVM_ARM_SMMU_V3_H
>  
>  #include <asm/kvm_asm.h>
> +#include <linux/io-pgtable-arm.h>
>  #include <kvm/iommu.h>
>  
>  #if IS_ENABLED(CONFIG_ARM_SMMU_V3_PKVM)
> @@ -28,6 +29,7 @@ struct hyp_arm_smmu_v3_device {
>  	size_t			strtab_num_entries;
>  	size_t			strtab_num_l1_entries;
>  	u8			strtab_split;
> +	struct arm_lpae_io_pgtable pgtable;
>  };
>  
>  extern size_t kvm_nvhe_sym(kvm_hyp_arm_smmu_v3_count);
> diff --git a/arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c b/arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c
> index 81040339ccfe..56e313203a16 100644
> --- a/arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c
> +++ b/arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c
> @@ -152,7 +152,6 @@ static int smmu_send_cmd(struct hyp_arm_smmu_v3_device *smmu,
>  	return smmu_sync_cmd(smmu);
>  }
>  
> -__maybe_unused
>  static int smmu_sync_ste(struct hyp_arm_smmu_v3_device *smmu, u32 sid)
>  {
>  	struct arm_smmu_cmdq_ent cmd = {
> @@ -194,7 +193,6 @@ static int smmu_alloc_l2_strtab(struct hyp_arm_smmu_v3_device *smmu, u32 idx)
>  	return 0;
>  }
>  
> -__maybe_unused
>  static u64 *smmu_get_ste_ptr(struct hyp_arm_smmu_v3_device *smmu, u32 sid)
>  {
>  	u32 idx;
> @@ -382,6 +380,68 @@ static int smmu_reset_device(struct hyp_arm_smmu_v3_device *smmu)
>  	return smmu_write_cr0(smmu, 0);
>  }
>  
> +static struct hyp_arm_smmu_v3_device *to_smmu(struct kvm_hyp_iommu *iommu)
> +{
> +	return container_of(iommu, struct hyp_arm_smmu_v3_device, iommu);
> +}
> +
> +static void smmu_tlb_flush_all(void *cookie)
> +{
> +	struct kvm_iommu_tlb_cookie *data = cookie;
> +	struct hyp_arm_smmu_v3_device *smmu = to_smmu(data->iommu);
> +	struct arm_smmu_cmdq_ent cmd = {
> +		.opcode = CMDQ_OP_TLBI_S12_VMALL,
> +		.tlbi.vmid = data->domain_id,
> +	};
> +
> +	WARN_ON(smmu_send_cmd(smmu, &cmd));
> +}
> +
> +static void smmu_tlb_inv_range(struct kvm_iommu_tlb_cookie *data,
> +			       unsigned long iova, size_t size, size_t granule,
> +			       bool leaf)
> +{
> +	struct hyp_arm_smmu_v3_device *smmu = to_smmu(data->iommu);
> +	unsigned long end = iova + size;
> +	struct arm_smmu_cmdq_ent cmd = {
> +		.opcode = CMDQ_OP_TLBI_S2_IPA,
> +		.tlbi.vmid = data->domain_id,
> +		.tlbi.leaf = leaf,
> +	};
> +
> +	/*
> +	 * There are no mappings at high addresses since we don't use TTB1, so
> +	 * no overflow possible.
> +	 */
> +	BUG_ON(end < iova);
> +
> +	while (iova < end) {
> +		cmd.tlbi.addr = iova;
> +		WARN_ON(smmu_send_cmd(smmu, &cmd));
> +		BUG_ON(iova + granule < iova);
> +		iova += granule;
> +	}
> +}
> +
> +static void smmu_tlb_flush_walk(unsigned long iova, size_t size,
> +				size_t granule, void *cookie)
> +{
> +	smmu_tlb_inv_range(cookie, iova, size, granule, false);
> +}
> +
> +static void smmu_tlb_add_page(struct iommu_iotlb_gather *gather,
> +			      unsigned long iova, size_t granule,
> +			      void *cookie)
> +{
> +	smmu_tlb_inv_range(cookie, iova, granule, granule, true);
> +}
> +
> +static const struct iommu_flush_ops smmu_tlb_ops = {
> +	.tlb_flush_all	= smmu_tlb_flush_all,
> +	.tlb_flush_walk = smmu_tlb_flush_walk,
> +	.tlb_add_page	= smmu_tlb_add_page,
> +};
> +
>  static int smmu_init_device(struct hyp_arm_smmu_v3_device *smmu)
>  {
>  	int ret;
> @@ -394,6 +454,14 @@ static int smmu_init_device(struct hyp_arm_smmu_v3_device *smmu)
>  	if (IS_ERR(smmu->base))
>  		return PTR_ERR(smmu->base);
>  
> +	smmu->iommu.pgtable_cfg.tlb = &smmu_tlb_ops;
> +
> +	ret = kvm_arm_io_pgtable_init(&smmu->iommu.pgtable_cfg, &smmu->pgtable);
> +	if (ret)
> +		return ret;
> +
> +	smmu->iommu.pgtable = &smmu->pgtable.iop;
> +
>  	ret = smmu_init_registers(smmu);
>  	if (ret)
>  		return ret;
> @@ -406,7 +474,11 @@ static int smmu_init_device(struct hyp_arm_smmu_v3_device *smmu)
>  	if (ret)
>  		return ret;
>  
> -	return smmu_reset_device(smmu);
> +	ret = smmu_reset_device(smmu);
> +	if (ret)
> +		return ret;
> +
> +	return kvm_iommu_init_device(&smmu->iommu);
>  }
>  
>  static int smmu_init(void)
> @@ -414,6 +486,10 @@ static int smmu_init(void)
>  	int ret;
>  	struct hyp_arm_smmu_v3_device *smmu;
>  
> +	ret = kvm_iommu_init();
> +	if (ret)
> +		return ret;
> +
>  	ret = pkvm_create_mappings(kvm_hyp_arm_smmu_v3_smmus,
>  				   kvm_hyp_arm_smmu_v3_smmus +
>  				   kvm_hyp_arm_smmu_v3_count,
> @@ -430,8 +506,104 @@ static int smmu_init(void)
>  	return 0;
>  }
>  
> +static struct kvm_hyp_iommu *smmu_id_to_iommu(pkvm_handle_t smmu_id)
> +{
> +	if (smmu_id >= kvm_hyp_arm_smmu_v3_count)
> +		return NULL;
> +	smmu_id = array_index_nospec(smmu_id, kvm_hyp_arm_smmu_v3_count);
> +
> +	return &kvm_hyp_arm_smmu_v3_smmus[smmu_id].iommu;
> +}
> +
> +static int smmu_attach_dev(struct kvm_hyp_iommu *iommu, pkvm_handle_t domain_id,
> +			   struct kvm_hyp_iommu_domain *domain, u32 sid)
> +{
> +	int i;
> +	int ret;
> +	u64 *dst;
> +	struct io_pgtable_cfg *cfg;
> +	u64 ts, sl, ic, oc, sh, tg, ps;
> +	u64 ent[STRTAB_STE_DWORDS] = {};
> +	struct hyp_arm_smmu_v3_device *smmu = to_smmu(iommu);
> +
> +	dst = smmu_get_ste_ptr(smmu, sid);
> +	if (!dst || dst[0])
> +		return -EINVAL;
> +
> +	cfg = &smmu->pgtable.iop.cfg;
> +	ps = cfg->arm_lpae_s2_cfg.vtcr.ps;
> +	tg = cfg->arm_lpae_s2_cfg.vtcr.tg;
> +	sh = cfg->arm_lpae_s2_cfg.vtcr.sh;
> +	oc = cfg->arm_lpae_s2_cfg.vtcr.orgn;
> +	ic = cfg->arm_lpae_s2_cfg.vtcr.irgn;
> +	sl = cfg->arm_lpae_s2_cfg.vtcr.sl;
> +	ts = cfg->arm_lpae_s2_cfg.vtcr.tsz;
> +
> +	ent[0] = STRTAB_STE_0_V |
> +		 FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S2_TRANS);
> +	ent[2] = FIELD_PREP(STRTAB_STE_2_VTCR,
> +			FIELD_PREP(STRTAB_STE_2_VTCR_S2PS, ps) |
> +			FIELD_PREP(STRTAB_STE_2_VTCR_S2TG, tg) |
> +			FIELD_PREP(STRTAB_STE_2_VTCR_S2SH0, sh) |
> +			FIELD_PREP(STRTAB_STE_2_VTCR_S2OR0, oc) |
> +			FIELD_PREP(STRTAB_STE_2_VTCR_S2IR0, ic) |
> +			FIELD_PREP(STRTAB_STE_2_VTCR_S2SL0, sl) |
> +			FIELD_PREP(STRTAB_STE_2_VTCR_S2T0SZ, ts)) |
> +		 FIELD_PREP(STRTAB_STE_2_S2VMID, domain_id) |
> +		 STRTAB_STE_2_S2AA64;
> +	ent[3] = hyp_virt_to_phys(domain->pgd) & STRTAB_STE_3_S2TTB_MASK;
> +
> +	/*
> +	 * The SMMU may cache a disabled STE.
> +	 * Initialize all fields, sync, then enable it.
> +	 */
> +	for (i = 1; i < STRTAB_STE_DWORDS; i++)
> +		dst[i] = cpu_to_le64(ent[i]);
> +
> +	ret = smmu_sync_ste(smmu, sid);
> +	if (ret)
> +		return ret;
> +
> +	WRITE_ONCE(dst[0], cpu_to_le64(ent[0]));
> +	ret = smmu_sync_ste(smmu, sid);
> +	if (ret)
> +		dst[0] = 0;
> +
> +	return ret;
> +}
> +
> +static int smmu_detach_dev(struct kvm_hyp_iommu *iommu, pkvm_handle_t domain_id,
> +			   struct kvm_hyp_iommu_domain *domain, u32 sid)
> +{
> +	u64 ttb;
> +	u64 *dst;
> +	int i, ret;
> +	struct hyp_arm_smmu_v3_device *smmu = to_smmu(iommu);
> +
> +	dst = smmu_get_ste_ptr(smmu, sid);
> +	if (!dst)
> +		return -ENODEV;
> +
> +	ttb = dst[3] & STRTAB_STE_3_S2TTB_MASK;
This is unused; does detach need to do anything with ttb?

> +	dst[0] = 0;
> +	ret = smmu_sync_ste(smmu, sid);
> +	if (ret)
> +		return ret;
> +
> +	for (i = 1; i < STRTAB_STE_DWORDS; i++)
> +		dst[i] = 0;
> +
> +	return smmu_sync_ste(smmu, sid);
> +}
> +
>  static struct kvm_iommu_ops smmu_ops = {
>  	.init				= smmu_init,
> +	.get_iommu_by_id		= smmu_id_to_iommu,
> +	.alloc_iopt			= kvm_arm_io_pgtable_alloc,
> +	.free_iopt			= kvm_arm_io_pgtable_free,
> +	.attach_dev			= smmu_attach_dev,
> +	.detach_dev			= smmu_detach_dev,
>  };
>  
>  int kvm_arm_smmu_v3_register(void)
> -- 
> 2.39.0

Thanks,
Mostafa

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 27/45] KVM: arm64: smmu-v3: Setup domains and page table configuration
  2023-06-23 19:12     ` Mostafa Saleh
@ 2023-07-03 10:41       ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-07-03 10:41 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, dbrazdil, ryan.roberts,
	linux-arm-kernel, kvmarm, iommu

Hi Mostafa,

On Fri, Jun 23, 2023 at 07:12:05PM +0000, Mostafa Saleh wrote:
> > +static int smmu_detach_dev(struct kvm_hyp_iommu *iommu, pkvm_handle_t domain_id,
> > +			   struct kvm_hyp_iommu_domain *domain, u32 sid)
> > +{
> > +	u64 ttb;
> > +	u64 *dst;
> > +	int i, ret;
> > +	struct hyp_arm_smmu_v3_device *smmu = to_smmu(iommu);
> > +
> > +	dst = smmu_get_ste_ptr(smmu, sid);
> > +	if (!dst)
> > +		return -ENODEV;
> > +
> > +	ttb = dst[3] & STRTAB_STE_3_S2TTB_MASK;
> This is unused; does detach need to do anything with ttb?

No, it doesn't look like I've ever used this; I've removed it.
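
(For reference, a rough sketch of what the function looks like with that unused
read dropped -- keeping everything else as in the posted patch; this is only an
illustration, not the actual updated code:)

static int smmu_detach_dev(struct kvm_hyp_iommu *iommu, pkvm_handle_t domain_id,
			   struct kvm_hyp_iommu_domain *domain, u32 sid)
{
	u64 *dst;
	int i, ret;
	struct hyp_arm_smmu_v3_device *smmu = to_smmu(iommu);

	dst = smmu_get_ste_ptr(smmu, sid);
	if (!dst)
		return -ENODEV;

	/* Clear STE dword 0 (which holds the valid bit) and let the SMMU observe it */
	dst[0] = 0;
	ret = smmu_sync_ste(smmu, sid);
	if (ret)
		return ret;

	/* Then clear the remaining STE dwords and sync again */
	for (i = 1; i < STRTAB_STE_DWORDS; i++)
		dst[i] = 0;

	return smmu_sync_ste(smmu, sid);
}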

Thanks,
Jean

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 20/45] KVM: arm64: iommu: Add map() and unmap() operations
  2023-04-04 16:00       ` Jean-Philippe Brucker
@ 2023-09-20 16:23         ` Mostafa Saleh
  -1 siblings, 0 replies; 201+ messages in thread
From: Mostafa Saleh @ 2023-09-20 16:23 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, dbrazdil, ryan.roberts,
	linux-arm-kernel, kvmarm, iommu

Hi Jean,

On Tue, Apr 04, 2023 at 05:00:46PM +0100, Jean-Philippe Brucker wrote:
> Hi Mostafa,
> 
> On Thu, Mar 30, 2023 at 06:14:04PM +0000, Mostafa Saleh wrote:
> > > +err_unmap:
> > > +	__kvm_iommu_unmap_pages(&iopt, iova_orig, pgsize, pgcount_orig - pgcount);
> > On error here, this unmaps (and unshares) only pages that have been
> > mapped.
> > But all pages were shared with the IOMMU before (via
> > __pkvm_host_share_dma), and this corrupts the state of the other pages,
> > as they are marked as shared while they are not.
> 
> Right, I'll fix this
> 
> > I see we can add a "bool unshare" arg to __kvm_iommu_unmap_pages which
> > will be called with false on error from here after calling
> > __pkvm_host_unshare_dma for the whole range.
> 
> I think it's simpler to call iopt_unmap_pages() directly here, followed by
> __pkvm_host_unshare_dma(). It even saves us a few lines
> 

I have been doing some testing based on the latest updates in
https://jpbrucker.net/git/linux/log/?h=pkvm/smmu

I think the unmap here is not enough as it assumes the whole range
can be unmapped in one call, so we would need something like this
instead (the patch might not apply directly, though):
diff --git a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
index 7ebda87a1c61..32e145b9240f 100644
--- a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
+++ b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
@@ -432,6 +432,7 @@ int kvm_iommu_map_pages(pkvm_handle_t domain_id,
 	unsigned long iova_orig = iova;
 	struct kvm_hyp_iommu_domain *domain;
 	struct pkvm_hyp_vcpu *ctxt = pkvm_get_loaded_hyp_vcpu();
+	size_t unmapped;
 
 	if (!kvm_iommu_ops)
 		return -ENODEV;
@@ -489,8 +490,13 @@ int kvm_iommu_map_pages(pkvm_handle_t domain_id,
 
 err_unmap:
 	pgcount = pgcount_orig - pgcount;
-	if (pgcount)
-		iopt_unmap_pages(&iopt, iova_orig, pgsize, pgcount, NULL);
+	while (pgcount) {
+		unmapped = iopt_unmap_pages(&iopt, iova_orig, pgsize, pgcount, NULL);
+		iova_orig += unmapped;
+		pgcount -= unmapped/pgsize;
+		if (!unmapped)
+			break;
+	}
 	__pkvm_unshare_dma(paddr_orig, size);
 err_domain_put:
 	domain_put(domain);
-- 


Thanks,
Mostafa

^ permalink raw reply related	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 40/45] iommu/arm-smmu-v3-kvm: Add IOMMU ops
  2023-02-01 12:53   ` Jean-Philippe Brucker
@ 2023-09-20 16:27     ` Mostafa Saleh
  -1 siblings, 0 replies; 201+ messages in thread
From: Mostafa Saleh @ 2023-09-20 16:27 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, dbrazdil, ryan.roberts,
	linux-arm-kernel, kvmarm, iommu

On Wed, Feb 01, 2023 at 12:53:24PM +0000, Jean-Philippe Brucker wrote:
> Forward alloc_domain(), attach_dev(), map_pages(), etc to the
> hypervisor.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
> ---
>  .../iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c   | 330 +++++++++++++++++-
>  1 file changed, 328 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
> index 55489d56fb5b..930d78f6e29f 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
> @@ -22,10 +22,28 @@ struct host_arm_smmu_device {
>  #define smmu_to_host(_smmu) \
>  	container_of(_smmu, struct host_arm_smmu_device, smmu);
>  
> +struct kvm_arm_smmu_master {
> +	struct arm_smmu_device		*smmu;
> +	struct device			*dev;
> +	struct kvm_arm_smmu_domain	*domain;
> +};
> +
> +struct kvm_arm_smmu_domain {
> +	struct iommu_domain		domain;
> +	struct arm_smmu_device		*smmu;
> +	struct mutex			init_mutex;
> +	unsigned long			pgd;
> +	pkvm_handle_t			id;
> +};
> +
> +#define to_kvm_smmu_domain(_domain) \
> +	container_of(_domain, struct kvm_arm_smmu_domain, domain)
> +
>  static size_t				kvm_arm_smmu_cur;
>  static size_t				kvm_arm_smmu_count;
>  static struct hyp_arm_smmu_v3_device	*kvm_arm_smmu_array;
>  static struct kvm_hyp_iommu_memcache	*kvm_arm_smmu_memcache;
> +static DEFINE_IDA(kvm_arm_smmu_domain_ida);
>  
>  static DEFINE_PER_CPU(local_lock_t, memcache_lock) =
>  				INIT_LOCAL_LOCK(memcache_lock);
> @@ -57,7 +75,6 @@ static void *kvm_arm_smmu_host_va(phys_addr_t pa)
>  	return __va(pa);
>  }
>  
> -__maybe_unused
>  static int kvm_arm_smmu_topup_memcache(struct arm_smmu_device *smmu)
>  {
>  	struct kvm_hyp_memcache *mc;
> @@ -74,7 +91,6 @@ static int kvm_arm_smmu_topup_memcache(struct arm_smmu_device *smmu)
>  				     kvm_arm_smmu_host_pa, smmu);
>  }
>  
> -__maybe_unused
>  static void kvm_arm_smmu_reclaim_memcache(void)
>  {
>  	struct kvm_hyp_memcache *mc;
> @@ -101,6 +117,299 @@ static void kvm_arm_smmu_reclaim_memcache(void)
>  	__ret;							\
>  })
>  
> +static struct platform_driver kvm_arm_smmu_driver;
> +
> +static struct arm_smmu_device *
> +kvm_arm_smmu_get_by_fwnode(struct fwnode_handle *fwnode)
> +{
> +	struct device *dev;
> +
> +	dev = driver_find_device_by_fwnode(&kvm_arm_smmu_driver.driver, fwnode);
> +	put_device(dev);
> +	return dev ? dev_get_drvdata(dev) : NULL;
> +}
> +
> +static struct iommu_ops kvm_arm_smmu_ops;
> +
> +static struct iommu_device *kvm_arm_smmu_probe_device(struct device *dev)
> +{
> +	struct arm_smmu_device *smmu;
> +	struct kvm_arm_smmu_master *master;
> +	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
> +
> +	if (!fwspec || fwspec->ops != &kvm_arm_smmu_ops)
> +		return ERR_PTR(-ENODEV);
> +
> +	if (WARN_ON_ONCE(dev_iommu_priv_get(dev)))
> +		return ERR_PTR(-EBUSY);
> +
> +	smmu = kvm_arm_smmu_get_by_fwnode(fwspec->iommu_fwnode);
> +	if (!smmu)
> +		return ERR_PTR(-ENODEV);
> +
> +	master = kzalloc(sizeof(*master), GFP_KERNEL);
> +	if (!master)
> +		return ERR_PTR(-ENOMEM);
> +
> +	master->dev = dev;
> +	master->smmu = smmu;
> +	dev_iommu_priv_set(dev, master);
> +
> +	return &smmu->iommu;
> +}
> +
> +static void kvm_arm_smmu_release_device(struct device *dev)
> +{
> +	struct kvm_arm_smmu_master *master = dev_iommu_priv_get(dev);
> +
> +	kfree(master);
> +	iommu_fwspec_free(dev);
> +}
> +
> +static struct iommu_domain *kvm_arm_smmu_domain_alloc(unsigned type)
> +{
> +	struct kvm_arm_smmu_domain *kvm_smmu_domain;
> +
> +	/*
> +	 * We don't support
> +	 * - IOMMU_DOMAIN_IDENTITY because we rely on the host telling the
> +	 *   hypervisor which pages are used for DMA.
> +	 * - IOMMU_DOMAIN_DMA_FQ because lazy unmap would clash with memory
> +	 *   donation to guests.
> +	 */
> +	if (type != IOMMU_DOMAIN_DMA &&
> +	    type != IOMMU_DOMAIN_UNMANAGED)
> +		return NULL;
> +
> +	kvm_smmu_domain = kzalloc(sizeof(*kvm_smmu_domain), GFP_KERNEL);
> +	if (!kvm_smmu_domain)
> +		return NULL;
> +
> +	mutex_init(&kvm_smmu_domain->init_mutex);
> +
> +	return &kvm_smmu_domain->domain;
> +}
> +
> +static int kvm_arm_smmu_domain_finalize(struct kvm_arm_smmu_domain *kvm_smmu_domain,
> +					struct kvm_arm_smmu_master *master)
> +{
> +	int ret = 0;
> +	struct page *p;
> +	unsigned long pgd;
> +	struct arm_smmu_device *smmu = master->smmu;
> +	struct host_arm_smmu_device *host_smmu = smmu_to_host(smmu);
> +
> +	if (kvm_smmu_domain->smmu) {
> +		if (kvm_smmu_domain->smmu != smmu)
> +			return -EINVAL;
> +		return 0;
> +	}
> +
> +	ret = ida_alloc_range(&kvm_arm_smmu_domain_ida, 0, 1 << smmu->vmid_bits,
> +			      GFP_KERNEL);
> +	if (ret < 0)
> +		return ret;
> +	kvm_smmu_domain->id = ret;
> +
> +	/*
> +	 * PGD allocation does not use the memcache because it may be of higher
> +	 * order when concatenated.
> +	 */
> +	p = alloc_pages_node(dev_to_node(smmu->dev), GFP_KERNEL | __GFP_ZERO,
> +			     host_smmu->pgd_order);
> +	if (!p)
> +		return -ENOMEM;
> +
> +	pgd = (unsigned long)page_to_virt(p);
> +
> +	local_lock_irq(&memcache_lock);
> +	ret = kvm_call_hyp_nvhe_mc(smmu, __pkvm_host_iommu_alloc_domain,
> +				   host_smmu->id, kvm_smmu_domain->id, pgd);
> +	local_unlock_irq(&memcache_lock);
> +	if (ret)
> +		goto err_free;
> +
> +	kvm_smmu_domain->domain.pgsize_bitmap = smmu->pgsize_bitmap;
> +	kvm_smmu_domain->domain.geometry.aperture_end = (1UL << smmu->ias) - 1;
> +	kvm_smmu_domain->domain.geometry.force_aperture = true;
> +	kvm_smmu_domain->smmu = smmu;
> +	kvm_smmu_domain->pgd = pgd;
> +
> +	return 0;
> +
> +err_free:
> +	free_pages(pgd, host_smmu->pgd_order);
> +	ida_free(&kvm_arm_smmu_domain_ida, kvm_smmu_domain->id);
> +	return ret;
> +}
> +
> +static void kvm_arm_smmu_domain_free(struct iommu_domain *domain)
> +{
> +	int ret;
> +	struct kvm_arm_smmu_domain *kvm_smmu_domain = to_kvm_smmu_domain(domain);
> +	struct arm_smmu_device *smmu = kvm_smmu_domain->smmu;
> +
> +	if (smmu) {
> +		struct host_arm_smmu_device *host_smmu = smmu_to_host(smmu);
> +
> +		ret = kvm_call_hyp_nvhe(__pkvm_host_iommu_free_domain,
> +					host_smmu->id, kvm_smmu_domain->id);
> +		/*
> +		 * On failure, leak the pgd because it probably hasn't been
> +		 * reclaimed by the host.
> +		 */
> +		if (!WARN_ON(ret))
> +			free_pages(kvm_smmu_domain->pgd, host_smmu->pgd_order);
I believe this double-frees the pgd in case attach_dev fails, as it
would try to free it there as well (in kvm_arm_smmu_domain_finalize).

I think this is the right place to free the pgd.

> +		ida_free(&kvm_arm_smmu_domain_ida, kvm_smmu_domain->id);
> +	}
> +	kfree(kvm_smmu_domain);
> +}
> +
> +static int kvm_arm_smmu_detach_dev(struct host_arm_smmu_device *host_smmu,
> +				   struct kvm_arm_smmu_master *master)
> +{
> +	int i, ret;
> +	struct arm_smmu_device *smmu = &host_smmu->smmu;
> +	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev);
> +
> +	if (!master->domain)
> +		return 0;
> +
> +	for (i = 0; i < fwspec->num_ids; i++) {
> +		int sid = fwspec->ids[i];
> +
> +		ret = kvm_call_hyp_nvhe(__pkvm_host_iommu_detach_dev,
> +					host_smmu->id, master->domain->id, sid);
> +		if (ret) {
> +			dev_err(smmu->dev, "cannot detach device %s (0x%x): %d\n",
> +				dev_name(master->dev), sid, ret);
> +			break;
> +		}
> +	}
> +
> +	master->domain = NULL;
> +
> +	return ret;
> +}
> +
> +static int kvm_arm_smmu_attach_dev(struct iommu_domain *domain,
> +				   struct device *dev)
> +{
> +	int i, ret;
> +	struct arm_smmu_device *smmu;
> +	struct host_arm_smmu_device *host_smmu;
> +	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
> +	struct kvm_arm_smmu_master *master = dev_iommu_priv_get(dev);
> +	struct kvm_arm_smmu_domain *kvm_smmu_domain = to_kvm_smmu_domain(domain);
> +
> +	if (!master)
> +		return -ENODEV;
> +
> +	smmu = master->smmu;
> +	host_smmu = smmu_to_host(smmu);
> +
> +	ret = kvm_arm_smmu_detach_dev(host_smmu, master);
> +	if (ret)
> +		return ret;
> +
> +	mutex_lock(&kvm_smmu_domain->init_mutex);
> +	ret = kvm_arm_smmu_domain_finalize(kvm_smmu_domain, master);
> +	mutex_unlock(&kvm_smmu_domain->init_mutex);
> +	if (ret)
> +		return ret;
> +
> +	local_lock_irq(&memcache_lock);
> +	for (i = 0; i < fwspec->num_ids; i++) {
> +		int sid = fwspec->ids[i];
> +
> +		ret = kvm_call_hyp_nvhe_mc(smmu, __pkvm_host_iommu_attach_dev,
> +					   host_smmu->id, kvm_smmu_domain->id,
> +					   sid);
> +		if (ret) {
> +			dev_err(smmu->dev, "cannot attach device %s (0x%x): %d\n",
> +				dev_name(dev), sid, ret);
> +			goto out_unlock;
> +		}
> +	}
> +	master->domain = kvm_smmu_domain;
> +
> +out_unlock:
> +	if (ret)
> +		kvm_arm_smmu_detach_dev(host_smmu, master);
> +	local_unlock_irq(&memcache_lock);
> +	return ret;
> +}
> +
> +static int kvm_arm_smmu_map_pages(struct iommu_domain *domain,
> +				  unsigned long iova, phys_addr_t paddr,
> +				  size_t pgsize, size_t pgcount, int prot,
> +				  gfp_t gfp, size_t *mapped)
> +{
> +	int ret;
> +	unsigned long irqflags;
> +	struct kvm_arm_smmu_domain *kvm_smmu_domain = to_kvm_smmu_domain(domain);
> +	struct arm_smmu_device *smmu = kvm_smmu_domain->smmu;
> +	struct host_arm_smmu_device *host_smmu = smmu_to_host(smmu);
> +
> +	local_lock_irqsave(&memcache_lock, irqflags);
> +	ret = kvm_call_hyp_nvhe_mc(smmu, __pkvm_host_iommu_map_pages,
> +				   host_smmu->id, kvm_smmu_domain->id, iova,
> +				   paddr, pgsize, pgcount, prot);
> +	local_unlock_irqrestore(&memcache_lock, irqflags);
> +	if (ret)
> +		return ret;
> +
> +	*mapped = pgsize * pgcount;
> +	return 0;
> +}
> +
> +static size_t kvm_arm_smmu_unmap_pages(struct iommu_domain *domain,
> +				       unsigned long iova, size_t pgsize,
> +				       size_t pgcount,
> +				       struct iommu_iotlb_gather *iotlb_gather)
> +{
> +	int ret;
> +	unsigned long irqflags;
> +	struct kvm_arm_smmu_domain *kvm_smmu_domain = to_kvm_smmu_domain(domain);
> +	struct arm_smmu_device *smmu = kvm_smmu_domain->smmu;
> +	struct host_arm_smmu_device *host_smmu = smmu_to_host(smmu);
> +
> +	local_lock_irqsave(&memcache_lock, irqflags);
> +	ret = kvm_call_hyp_nvhe_mc(smmu, __pkvm_host_iommu_unmap_pages,
> +				   host_smmu->id, kvm_smmu_domain->id, iova,
> +				   pgsize, pgcount);
> +	local_unlock_irqrestore(&memcache_lock, irqflags);
> +
> +	return ret ? 0 : pgsize * pgcount;
> +}
> +
> +static phys_addr_t kvm_arm_smmu_iova_to_phys(struct iommu_domain *domain,
> +					     dma_addr_t iova)
> +{
> +	struct kvm_arm_smmu_domain *kvm_smmu_domain = to_kvm_smmu_domain(domain);
> +	struct host_arm_smmu_device *host_smmu = smmu_to_host(kvm_smmu_domain->smmu);
> +
> +	return kvm_call_hyp_nvhe(__pkvm_host_iommu_iova_to_phys, host_smmu->id,
> +				 kvm_smmu_domain->id, iova);
> +}
> +
> +static struct iommu_ops kvm_arm_smmu_ops = {
> +	.capable		= arm_smmu_capable,
> +	.device_group		= arm_smmu_device_group,
> +	.of_xlate		= arm_smmu_of_xlate,
> +	.probe_device		= kvm_arm_smmu_probe_device,
> +	.release_device		= kvm_arm_smmu_release_device,
> +	.domain_alloc		= kvm_arm_smmu_domain_alloc,
> +	.owner			= THIS_MODULE,
> +	.default_domain_ops = &(const struct iommu_domain_ops) {
> +		.attach_dev	= kvm_arm_smmu_attach_dev,
> +		.free		= kvm_arm_smmu_domain_free,
> +		.map_pages	= kvm_arm_smmu_map_pages,
> +		.unmap_pages	= kvm_arm_smmu_unmap_pages,
> +		.iova_to_phys	= kvm_arm_smmu_iova_to_phys,
> +	}
> +};
> +
>  static bool kvm_arm_smmu_validate_features(struct arm_smmu_device *smmu)
>  {
>  	unsigned long oas;
> @@ -186,6 +495,12 @@ static int kvm_arm_smmu_device_reset(struct host_arm_smmu_device *host_smmu)
>  	return 0;
>  }
>  
> +static void *kvm_arm_smmu_alloc_domains(struct arm_smmu_device *smmu)
> +{
> +	return (void *)devm_get_free_pages(smmu->dev, GFP_KERNEL | __GFP_ZERO,
> +					   get_order(KVM_IOMMU_DOMAINS_ROOT_SIZE));
> +}
> +
>  static int kvm_arm_smmu_probe(struct platform_device *pdev)
>  {
>  	int ret;
> @@ -274,6 +589,16 @@ static int kvm_arm_smmu_probe(struct platform_device *pdev)
>  	if (ret)
>  		return ret;
>  
> +	hyp_smmu->iommu.domains = kvm_arm_smmu_alloc_domains(smmu);
> +	if (!hyp_smmu->iommu.domains)
> +		return -ENOMEM;
> +
> +	hyp_smmu->iommu.nr_domains = 1 << smmu->vmid_bits;
> +
> +	ret = arm_smmu_register_iommu(smmu, &kvm_arm_smmu_ops, ioaddr);
> +	if (ret)
> +		return ret;
> +
>  	platform_set_drvdata(pdev, host_smmu);
>  
>  	/* Hypervisor parameters */
> @@ -296,6 +621,7 @@ static int kvm_arm_smmu_remove(struct platform_device *pdev)
>  	 * There was an error during hypervisor setup. The hyp driver may
>  	 * have already enabled the device, so disable it.
>  	 */
> +	arm_smmu_unregister_iommu(smmu);
>  	arm_smmu_device_disable(smmu);
>  	arm_smmu_update_gbpa(smmu, host_smmu->boot_gbpa, GBPA_ABORT);
>  	return 0;
> -- 
> 2.39.0
> 
>
Thanks,
Mostafa

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 40/45] iommu/arm-smmu-v3-kvm: Add IOMMU ops
  2023-09-20 16:27     ` Mostafa Saleh
@ 2023-09-25 17:18       ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-09-25 17:18 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, dbrazdil, ryan.roberts,
	linux-arm-kernel, kvmarm, iommu

On Wed, Sep 20, 2023 at 04:27:41PM +0000, Mostafa Saleh wrote:
> > +static void kvm_arm_smmu_domain_free(struct iommu_domain *domain)
> > +{
> > +	int ret;
> > +	struct kvm_arm_smmu_domain *kvm_smmu_domain = to_kvm_smmu_domain(domain);
> > +	struct arm_smmu_device *smmu = kvm_smmu_domain->smmu;
> > +
> > +	if (smmu) {
> > +		struct host_arm_smmu_device *host_smmu = smmu_to_host(smmu);
> > +
> > +		ret = kvm_call_hyp_nvhe(__pkvm_host_iommu_free_domain,
> > +					host_smmu->id, kvm_smmu_domain->id);
> > +		/*
> > +		 * On failure, leak the pgd because it probably hasn't been
> > +		 * reclaimed by the host.
> > +		 */
> > +		if (!WARN_ON(ret))
> > +			free_pages(kvm_smmu_domain->pgd, host_smmu->pgd_order);
> I believe this double-frees the pgd in case attach_dev fails, as it
> would try to free it there as well (in kvm_arm_smmu_domain_finalize).
> 
> I think this is the right place to free the pgd.

Since this depends on kvm_smmu_domain->smmu being non-NULL, which is only
true if finalize() succeeded, we shouldn't get a double-free.

But finalize() does leak kvm_smmu_domain->id if the pgd allocation fails;
I've fixed that.
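
(Roughly, the fix amounts to releasing the ID on the pgd allocation failure
path in kvm_arm_smmu_domain_finalize() -- a minimal sketch of the idea; the
actual change on the pkvm/smmu branch may look slightly different:)

	p = alloc_pages_node(dev_to_node(smmu->dev), GFP_KERNEL | __GFP_ZERO,
			     host_smmu->pgd_order);
	if (!p) {
		/* don't leak the domain ID allocated just above */
		ida_free(&kvm_arm_smmu_domain_ida, kvm_smmu_domain->id);
		return -ENOMEM;
	}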

Thanks,
Jean

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 20/45] KVM: arm64: iommu: Add map() and unmap() operations
  2023-09-20 16:23         ` Mostafa Saleh
@ 2023-09-25 17:21           ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2023-09-25 17:21 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, dbrazdil, ryan.roberts,
	linux-arm-kernel, kvmarm, iommu

On Wed, Sep 20, 2023 at 04:23:49PM +0000, Mostafa Saleh wrote:
> I have been doing some testing based on the latest updates in
> https://jpbrucker.net/git/linux/log/?h=pkvm/smmu
> 
> I think the unmap here is not enough as it assumes the whole range
> can be unmapped in one call, so we would need something like this
> instead (patch might no directly apply though):

Thanks for testing, the unmap is indeed wrong because io-pgtable doesn't
like clearing empty ptes. Another problem with very large mappings is that
we'll return to host several times to allocate more page tables, removing
and recreating everything from scratch every time. So I changed the
map_pages() call to keep track of what's been mapped so far, see below.

I pushed these fixes to the pkvm/smmu branch

Thanks,
Jean

---
diff --git a/arch/arm64/kvm/hyp/include/nvhe/iommu.h b/arch/arm64/kvm/hyp/include/nvhe/iommu.h
index f522ec0002f10..254df53e04783 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/iommu.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/iommu.h
@@ -37,9 +37,9 @@ int kvm_iommu_attach_dev(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
 			 u32 endpoint_id);
 int kvm_iommu_detach_dev(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
 			 u32 endpoint_id);
-int kvm_iommu_map_pages(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
-			unsigned long iova, phys_addr_t paddr, size_t pgsize,
-			size_t pgcount, int prot);
+size_t kvm_iommu_map_pages(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
+			   unsigned long iova, phys_addr_t paddr, size_t pgsize,
+			   size_t pgcount, int prot);
 size_t kvm_iommu_unmap_pages(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
 			     unsigned long iova, size_t pgsize, size_t pgcount);
 phys_addr_t kvm_iommu_iova_to_phys(pkvm_handle_t iommu_id,
@@ -72,12 +72,12 @@ static inline int kvm_iommu_detach_dev(pkvm_handle_t iommu_id,
 	return -ENODEV;
 }

-static inline int kvm_iommu_map_pages(pkvm_handle_t iommu_id,
-				      pkvm_handle_t domain_id,
-				      unsigned long iova, phys_addr_t paddr,
-				      size_t pgsize, size_t pgcount, int prot)
+static inline size_t kvm_iommu_map_pages(pkvm_handle_t iommu_id,
+					 pkvm_handle_t domain_id,
+					 unsigned long iova, phys_addr_t paddr,
+					 size_t pgsize, size_t pgcount, int prot)
 {
-	return -ENODEV;
+	return 0;
 }

 static inline size_t kvm_iommu_unmap_pages(pkvm_handle_t iommu_id,
diff --git a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
index ad8c7b073de46..a69247c861d51 100644
--- a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
+++ b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
@@ -190,31 +190,29 @@ int kvm_iommu_detach_dev(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
 #define IOMMU_PROT_MASK (IOMMU_READ | IOMMU_WRITE | IOMMU_CACHE |\
 			 IOMMU_NOEXEC | IOMMU_MMIO)

-int kvm_iommu_map_pages(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
-			unsigned long iova, phys_addr_t paddr, size_t pgsize,
-			size_t pgcount, int prot)
+size_t kvm_iommu_map_pages(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
+			   unsigned long iova, phys_addr_t paddr, size_t pgsize,
+			   size_t pgcount, int prot)
 {
 	size_t size;
 	size_t mapped;
 	size_t granule;
 	int ret = -EINVAL;
 	struct io_pgtable iopt;
+	size_t total_mapped = 0;
 	struct kvm_hyp_iommu *iommu;
-	size_t pgcount_orig = pgcount;
-	phys_addr_t paddr_orig = paddr;
-	unsigned long iova_orig = iova;
 	struct kvm_hyp_iommu_domain *domain;

 	if (prot & ~IOMMU_PROT_MASK)
-		return -EINVAL;
+		return 0;

 	if (__builtin_mul_overflow(pgsize, pgcount, &size) ||
 	    iova + size < iova || paddr + size < paddr)
-		return -EOVERFLOW;
+		return 0;

 	iommu = kvm_iommu_ops.get_iommu_by_id(iommu_id);
 	if (!iommu)
-		return -EINVAL;
+		return 0;

 	hyp_spin_lock(&iommu->lock);
 	domain = handle_to_domain(iommu, domain_id);
@@ -230,29 +228,29 @@ int kvm_iommu_map_pages(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
 		goto err_unlock;

 	iopt = domain_to_iopt(iommu, domain, domain_id);
-	while (pgcount) {
+	while (pgcount && !ret) {
 		mapped = 0;
 		ret = iopt_map_pages(&iopt, iova, paddr, pgsize, pgcount, prot,
 				     0, &mapped);
 		WARN_ON(!IS_ALIGNED(mapped, pgsize));
+		WARN_ON(mapped > pgcount * pgsize);
+
 		pgcount -= mapped / pgsize;
-		if (ret)
-			goto err_unmap;
+		total_mapped += mapped;
 		iova += mapped;
 		paddr += mapped;
 	}

-	hyp_spin_unlock(&iommu->lock);
-	return 0;
-
-err_unmap:
-	pgcount = pgcount_orig - pgcount;
+	/*
+	 * Unshare the bits that haven't been mapped yet. The host calls back
+	 * either to continue mapping, or to unmap and unshare what's been done
+	 * so far.
+	 */
 	if (pgcount)
-		iopt_unmap_pages(&iopt, iova_orig, pgsize, pgcount, NULL);
-	__pkvm_host_unshare_dma(paddr_orig, size);
+		__pkvm_host_unshare_dma(paddr, pgcount * pgsize);
 err_unlock:
 	hyp_spin_unlock(&iommu->lock);
-	return ret;
+	return total_mapped;
 }

 size_t kvm_iommu_unmap_pages(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
index e67cd7485dfd4..0ffe3e32cf419 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
@@ -369,23 +369,32 @@ static int kvm_arm_smmu_attach_dev(struct iommu_domain *domain,
 static int kvm_arm_smmu_map_pages(struct iommu_domain *domain,
 				  unsigned long iova, phys_addr_t paddr,
 				  size_t pgsize, size_t pgcount, int prot,
-				  gfp_t gfp, size_t *mapped)
+				  gfp_t gfp, size_t *total_mapped)
 {
-	int ret;
+	size_t mapped;
 	unsigned long irqflags;
+	size_t size = pgsize * pgcount;
 	struct kvm_arm_smmu_domain *kvm_smmu_domain = to_kvm_smmu_domain(domain);
 	struct arm_smmu_device *smmu = kvm_smmu_domain->smmu;
 	struct host_arm_smmu_device *host_smmu = smmu_to_host(smmu);

 	local_lock_irqsave(&memcache_lock, irqflags);
-	ret = kvm_call_hyp_nvhe_mc(smmu, __pkvm_host_iommu_map_pages,
-				   host_smmu->id, kvm_smmu_domain->id, iova,
-				   paddr, pgsize, pgcount, prot);
+	do {
+		mapped = kvm_call_hyp_nvhe(__pkvm_host_iommu_map_pages,
+					   host_smmu->id, kvm_smmu_domain->id,
+					   iova, paddr, pgsize, pgcount, prot);
+		iova += mapped;
+		paddr += mapped;
+		WARN_ON(mapped % pgsize);
+		WARN_ON(mapped > pgcount * pgsize);
+		pgcount -= mapped / pgsize;
+		*total_mapped += mapped;
+	} while (*total_mapped < size && !kvm_arm_smmu_topup_memcache(smmu));
+	kvm_arm_smmu_reclaim_memcache();
 	local_unlock_irqrestore(&memcache_lock, irqflags);
-	if (ret)
-		return ret;
+	if (*total_mapped < size)
+		return -EINVAL;

-	*mapped = pgsize * pgcount;
 	return 0;
 }

----

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
index 0ffe3e32cf419..0055ecd75990e 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
@@ -405,6 +405,8 @@ static size_t kvm_arm_smmu_unmap_pages(struct iommu_domain *domain,
 {
 	size_t unmapped;
 	unsigned long irqflags;
+	size_t total_unmapped = 0;
+	size_t size = pgsize * pgcount;
 	struct kvm_arm_smmu_domain *kvm_smmu_domain = to_kvm_smmu_domain(domain);
 	struct arm_smmu_device *smmu = kvm_smmu_domain->smmu;
 	struct host_arm_smmu_device *host_smmu = smmu_to_host(smmu);
@@ -414,11 +416,23 @@ static size_t kvm_arm_smmu_unmap_pages(struct iommu_domain *domain,
 		unmapped = kvm_call_hyp_nvhe(__pkvm_host_iommu_unmap_pages,
 					     host_smmu->id, kvm_smmu_domain->id,
 					     iova, pgsize, pgcount);
-	} while (!unmapped && !kvm_arm_smmu_topup_memcache(smmu));
+		total_unmapped += unmapped;
+		iova += unmapped;
+		WARN_ON(unmapped % pgsize);
+		pgcount -= unmapped / pgsize;
+
+		/*
+		 * The page table driver can unmap less than we asked for. If it
+		 * didn't unmap anything at all, then it either reached the end
+		 * of the range, or it needs a page in the memcache to break a
+		 * block mapping.
+		 */
+	} while (total_unmapped < size &&
+		 (unmapped || !kvm_arm_smmu_topup_memcache(smmu)));
 	kvm_arm_smmu_reclaim_memcache();
 	local_unlock_irqrestore(&memcache_lock, irqflags);
 
-	return unmapped;
+	return total_unmapped;
 }
 
 static phys_addr_t kvm_arm_smmu_iova_to_phys(struct iommu_domain *domain,


^ permalink raw reply related	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 40/45] iommu/arm-smmu-v3-kvm: Add IOMMU ops
  2023-09-25 17:18       ` Jean-Philippe Brucker
@ 2023-09-26  9:54         ` Mostafa Saleh
  -1 siblings, 0 replies; 201+ messages in thread
From: Mostafa Saleh @ 2023-09-26  9:54 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, dbrazdil, ryan.roberts,
	linux-arm-kernel, kvmarm, iommu

Hi Jean,

On Mon, Sep 25, 2023 at 06:18:53PM +0100, Jean-Philippe Brucker wrote:
> On Wed, Sep 20, 2023 at 04:27:41PM +0000, Mostafa Saleh wrote:
> > > +static void kvm_arm_smmu_domain_free(struct iommu_domain *domain)
> > > +{
> > > +	int ret;
> > > +	struct kvm_arm_smmu_domain *kvm_smmu_domain = to_kvm_smmu_domain(domain);
> > > +	struct arm_smmu_device *smmu = kvm_smmu_domain->smmu;
> > > +
> > > +	if (smmu) {
> > > +		struct host_arm_smmu_device *host_smmu = smmu_to_host(smmu);
> > > +
> > > +		ret = kvm_call_hyp_nvhe(__pkvm_host_iommu_free_domain,
> > > +					host_smmu->id, kvm_smmu_domain->id);
> > > +		/*
> > > +		 * On failure, leak the pgd because it probably hasn't been
> > > +		 * reclaimed by the host.
> > > +		 */
> > > +		if (!WARN_ON(ret))
> > > +			free_pages(kvm_smmu_domain->pgd, host_smmu->pgd_order);
> > I believe this double-frees the pgd in case attach_dev fails, as it
> > would try to free it there also (in kvm_arm_smmu_domain_finalize).
> > 
> > I think this is the right place to free the pgd.
> 
> Since this depends on kvm_smmu_domain->smmu being non-NULL, which is only
> true if finalize() succeeded, we shouldn't get a double-free.

Yes, the other free was coming from an experiment I was doing to use
the IOMMU layer with guest VMs, so this is correct. Sorry about that.

> But finalize() does leak kvm_smmu_domain->id if the pgd allocation fails;
> I fixed that.

Thanks!

> Thanks,
> Jean

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 12/45] KVM: arm64: pkvm: Unify pkvm_pkvm_teardown_donated_memory()
  2023-02-01 12:52   ` Jean-Philippe Brucker
@ 2024-01-15 14:33     ` Sebastian Ene
  -1 siblings, 0 replies; 201+ messages in thread
From: Sebastian Ene @ 2024-01-15 14:33 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, smostafa, dbrazdil,
	ryan.roberts, linux-arm-kernel, kvmarm, iommu

On Wed, Feb 01, 2023 at 12:52:56PM +0000, Jean-Philippe Brucker wrote:

Hi Jean,

> Tearing down donated memory requires clearing the memory, pushing the
> pages into the reclaim memcache, and moving the mapping into the host
> stage-2. Keep these operations in a single function.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
> ---
>  arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  2 +
>  arch/arm64/kvm/hyp/nvhe/mem_protect.c         |  3 +-
>  arch/arm64/kvm/hyp/nvhe/pkvm.c                | 50 +++++++------------
>  3 files changed, 22 insertions(+), 33 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> index d4f4ffbb7dbb..021825aee854 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> @@ -86,6 +86,8 @@ void reclaim_guest_pages(struct pkvm_hyp_vm *vm, struct kvm_hyp_memcache *mc);
>  
>  void *pkvm_map_donated_memory(unsigned long host_va, size_t size);
>  void pkvm_unmap_donated_memory(void *va, size_t size);
> +void pkvm_teardown_donated_memory(struct kvm_hyp_memcache *mc, void *addr,
> +				  size_t dirty_size);
>  
>  static __always_inline void __load_host_stage2(void)
>  {
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index 410361f41e38..cad5736026d5 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -314,8 +314,7 @@ void reclaim_guest_pages(struct pkvm_hyp_vm *vm, struct kvm_hyp_memcache *mc)
>  	addr = hyp_alloc_pages(&vm->pool, 0);
>  	while (addr) {
>  		memset(hyp_virt_to_page(addr), 0, sizeof(struct hyp_page));
> -		push_hyp_memcache(mc, addr, hyp_virt_to_phys);
> -		WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(addr), 1));
> +		pkvm_teardown_donated_memory(mc, addr, 0);
>  		addr = hyp_alloc_pages(&vm->pool, 0);
>  	}
>  }
> diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
> index a3711979bbd3..c51a8a592849 100644
> --- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
> +++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
> @@ -602,27 +602,28 @@ void *pkvm_map_donated_memory(unsigned long host_va, size_t size)
>  	return va;
>  }
>  
> -static void __unmap_donated_memory(void *va, size_t size)
> +void pkvm_teardown_donated_memory(struct kvm_hyp_memcache *mc, void *va,
> +				  size_t dirty_size)
>  {
> -	WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(va),
> -				       PAGE_ALIGN(size) >> PAGE_SHIFT));
> -}
> +	size_t size = max(PAGE_ALIGN(dirty_size), PAGE_SIZE);
>  
> -void pkvm_unmap_donated_memory(void *va, size_t size)
> -{
>  	if (!va)
>  		return;
>  
> -	memset(va, 0, size);
> -	__unmap_donated_memory(va, size);
> +	memset(va, 0, dirty_size);
> +
> +	if (mc) {
> +		for (void *start = va; start < va + size; start += PAGE_SIZE)
> +			push_hyp_memcache(mc, start, hyp_virt_to_phys);
> +	}
> +
> +	WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(va),
> +				       size >> PAGE_SHIFT));
>  }
>  
> -static void unmap_donated_memory_noclear(void *va, size_t size)
> +void pkvm_unmap_donated_memory(void *va, size_t size)
>  {
> -	if (!va)
> -		return;
> -
> -	__unmap_donated_memory(va, size);
> +	pkvm_teardown_donated_memory(NULL, va, size);
>  }
>  
>  /*
> @@ -759,18 +760,6 @@ int __pkvm_init_vcpu(pkvm_handle_t handle, struct kvm_vcpu *host_vcpu,
>  	return ret;
>  }
>  
> -static void
> -teardown_donated_memory(struct kvm_hyp_memcache *mc, void *addr, size_t size)
> -{
> -	size = PAGE_ALIGN(size);
> -	memset(addr, 0, size);
> -
> -	for (void *start = addr; start < addr + size; start += PAGE_SIZE)
> -		push_hyp_memcache(mc, start, hyp_virt_to_phys);
> -
> -	unmap_donated_memory_noclear(addr, size);
> -}
> -
>  int __pkvm_teardown_vm(pkvm_handle_t handle)
>  {
>  	size_t vm_size, last_ran_size;
> @@ -813,19 +802,18 @@ int __pkvm_teardown_vm(pkvm_handle_t handle)
>  		vcpu_mc = &hyp_vcpu->vcpu.arch.pkvm_memcache;
>  		while (vcpu_mc->nr_pages) {
>  			addr = pop_hyp_memcache(vcpu_mc, hyp_phys_to_virt);
> -			push_hyp_memcache(mc, addr, hyp_virt_to_phys);
> -			unmap_donated_memory_noclear(addr, PAGE_SIZE);
> +			pkvm_teardown_donated_memory(mc, addr, 0);

Here we probably need to pass PAGE_SIZE as an argument instead of "0"
to make sure that we clear out the content of the page before tearing it
down.
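
i.e. something like (just spelling out the suggestion, keeping the call
otherwise as in the patch):

	pkvm_teardown_donated_memory(mc, addr, PAGE_SIZE);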

>  		}
>  
> -		teardown_donated_memory(mc, hyp_vcpu, sizeof(*hyp_vcpu));
> +		pkvm_teardown_donated_memory(mc, hyp_vcpu, sizeof(*hyp_vcpu));
>  	}
>  
>  	last_ran_size = pkvm_get_last_ran_size();
> -	teardown_donated_memory(mc, hyp_vm->kvm.arch.mmu.last_vcpu_ran,
> -				last_ran_size);
> +	pkvm_teardown_donated_memory(mc, hyp_vm->kvm.arch.mmu.last_vcpu_ran,
> +				     last_ran_size);
>  
>  	vm_size = pkvm_get_hyp_vm_size(hyp_vm->kvm.created_vcpus);
> -	teardown_donated_memory(mc, hyp_vm, vm_size);
> +	pkvm_teardown_donated_memory(mc, hyp_vm, vm_size);
>  	hyp_unpin_shared_mem(host_kvm, host_kvm + 1);
>  	return 0;
>  
> -- 
> 2.39.0
>

Thanks,
Seb

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 27/45] KVM: arm64: smmu-v3: Setup domains and page table configuration
  2023-02-01 12:53   ` Jean-Philippe Brucker
@ 2024-01-15 14:34     ` Mostafa Saleh
  -1 siblings, 0 replies; 201+ messages in thread
From: Mostafa Saleh @ 2024-01-15 14:34 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, dbrazdil, ryan.roberts,
	linux-arm-kernel, kvmarm, iommu, Daniel Mentz

Hi Jean,

On Wed, Feb 1, 2023 at 12:59 PM Jean-Philippe Brucker
<jean-philippe@linaro.org> wrote:
>
> Setup the stream table entries when the host issues the attach_dev() and
> detach_dev() hypercalls. The driver holds one io-pgtable configuration
> for all domains.
>
> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
> ---
>  include/kvm/arm_smmu_v3.h                   |   2 +
>  arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c | 178 +++++++++++++++++++-
>  2 files changed, 177 insertions(+), 3 deletions(-)
>
> diff --git a/include/kvm/arm_smmu_v3.h b/include/kvm/arm_smmu_v3.h
> index fc67a3bf5709..ed139b0e9612 100644
> --- a/include/kvm/arm_smmu_v3.h
> +++ b/include/kvm/arm_smmu_v3.h
> @@ -3,6 +3,7 @@
>  #define __KVM_ARM_SMMU_V3_H
>
>  #include <asm/kvm_asm.h>
> +#include <linux/io-pgtable-arm.h>
>  #include <kvm/iommu.h>
>
>  #if IS_ENABLED(CONFIG_ARM_SMMU_V3_PKVM)
> @@ -28,6 +29,7 @@ struct hyp_arm_smmu_v3_device {
>         size_t                  strtab_num_entries;
>         size_t                  strtab_num_l1_entries;
>         u8                      strtab_split;
> +       struct arm_lpae_io_pgtable pgtable;
>  };
>
>  extern size_t kvm_nvhe_sym(kvm_hyp_arm_smmu_v3_count);
> diff --git a/arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c b/arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c
> index 81040339ccfe..56e313203a16 100644
> --- a/arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c
> +++ b/arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c
> @@ -152,7 +152,6 @@ static int smmu_send_cmd(struct hyp_arm_smmu_v3_device *smmu,
>         return smmu_sync_cmd(smmu);
>  }
>
> -__maybe_unused
>  static int smmu_sync_ste(struct hyp_arm_smmu_v3_device *smmu, u32 sid)
>  {
>         struct arm_smmu_cmdq_ent cmd = {
> @@ -194,7 +193,6 @@ static int smmu_alloc_l2_strtab(struct hyp_arm_smmu_v3_device *smmu, u32 idx)
>         return 0;
>  }
>
> -__maybe_unused
>  static u64 *smmu_get_ste_ptr(struct hyp_arm_smmu_v3_device *smmu, u32 sid)
>  {
>         u32 idx;
> @@ -382,6 +380,68 @@ static int smmu_reset_device(struct hyp_arm_smmu_v3_device *smmu)
>         return smmu_write_cr0(smmu, 0);
>  }
>
> +static struct hyp_arm_smmu_v3_device *to_smmu(struct kvm_hyp_iommu *iommu)
> +{
> +       return container_of(iommu, struct hyp_arm_smmu_v3_device, iommu);
> +}
> +
> +static void smmu_tlb_flush_all(void *cookie)
> +{
> +       struct kvm_iommu_tlb_cookie *data = cookie;
> +       struct hyp_arm_smmu_v3_device *smmu = to_smmu(data->iommu);
> +       struct arm_smmu_cmdq_ent cmd = {
> +               .opcode = CMDQ_OP_TLBI_S12_VMALL,
> +               .tlbi.vmid = data->domain_id,
> +       };
> +
> +       WARN_ON(smmu_send_cmd(smmu, &cmd));
> +}
> +
> +static void smmu_tlb_inv_range(struct kvm_iommu_tlb_cookie *data,
> +                              unsigned long iova, size_t size, size_t granule,
> +                              bool leaf)
> +{
> +       struct hyp_arm_smmu_v3_device *smmu = to_smmu(data->iommu);
> +       unsigned long end = iova + size;
> +       struct arm_smmu_cmdq_ent cmd = {
> +               .opcode = CMDQ_OP_TLBI_S2_IPA,
> +               .tlbi.vmid = data->domain_id,
> +               .tlbi.leaf = leaf,
> +       };
> +
> +       /*
> +        * There are no mappings at high addresses since we don't use TTB1, so
> +        * no overflow possible.
> +        */
> +       BUG_ON(end < iova);
> +
> +       while (iova < end) {
> +               cmd.tlbi.addr = iova;
> +               WARN_ON(smmu_send_cmd(smmu, &cmd));

This would issue a sync command after each TLBI, which is not needed.
Maybe we can build all the commands first and then issue a single sync,
similar to what the upstream driver does; what do you think?

Thanks,
Mostafa

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 24/45] KVM: arm64: smmu-v3: Setup stream table
  2023-02-01 12:53   ` Jean-Philippe Brucker
@ 2024-01-16  8:59     ` Mostafa Saleh
  -1 siblings, 0 replies; 201+ messages in thread
From: Mostafa Saleh @ 2024-01-16  8:59 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, dbrazdil, ryan.roberts,
	linux-arm-kernel, kvmarm, iommu

Hi Jean,

On Wed, Feb 1, 2023 at 12:59 PM Jean-Philippe Brucker
<jean-philippe@linaro.org> wrote:
>
> Map the stream table allocated by the host into the hypervisor address
> space. When the host mappings are finalized, the table is unmapped from
> the host. Depending on the host configuration, the stream table may have
> one or two levels. Populate the level-2 stream table lazily.
>
> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
> ---
>  include/kvm/arm_smmu_v3.h                   |   4 +
>  arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c | 133 +++++++++++++++++++-
>  2 files changed, 136 insertions(+), 1 deletion(-)
>
> diff --git a/include/kvm/arm_smmu_v3.h b/include/kvm/arm_smmu_v3.h
> index da36737bc1e0..fc67a3bf5709 100644
> --- a/include/kvm/arm_smmu_v3.h
> +++ b/include/kvm/arm_smmu_v3.h
> @@ -24,6 +24,10 @@ struct hyp_arm_smmu_v3_device {
>         u32                     cmdq_prod;
>         u64                     *cmdq_base;
>         size_t                  cmdq_log2size;
> +       u64                     *strtab_base;
> +       size_t                  strtab_num_entries;
> +       size_t                  strtab_num_l1_entries;
> +       u8                      strtab_split;
>  };
>
>  extern size_t kvm_nvhe_sym(kvm_hyp_arm_smmu_v3_count);
> diff --git a/arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c b/arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c
> index 36ee5724f36f..021bebebd40c 100644
> --- a/arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c
> +++ b/arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c
> @@ -141,7 +141,6 @@ static int smmu_sync_cmd(struct hyp_arm_smmu_v3_device *smmu)
>         return smmu_wait_event(smmu, smmu_cmdq_empty(smmu));
>  }
>
> -__maybe_unused
>  static int smmu_send_cmd(struct hyp_arm_smmu_v3_device *smmu,
>                          struct arm_smmu_cmdq_ent *cmd)
>  {
> @@ -153,6 +152,82 @@ static int smmu_send_cmd(struct hyp_arm_smmu_v3_device *smmu,
>         return smmu_sync_cmd(smmu);
>  }
>
> +__maybe_unused
> +static int smmu_sync_ste(struct hyp_arm_smmu_v3_device *smmu, u32 sid)
> +{
> +       struct arm_smmu_cmdq_ent cmd = {
> +               .opcode = CMDQ_OP_CFGI_STE,
> +               .cfgi.sid = sid,
> +               .cfgi.leaf = true,
> +       };
> +
> +       return smmu_send_cmd(smmu, &cmd);
> +}
> +
I see the page tables are properly configured for ARM_SMMU_FEAT_COHERENCY, but
there is no such handling for the STE or CMDQ. I believe here we should have
something like:

	if (!(smmu->features & ARM_SMMU_FEAT_COHERENCY))
		kvm_flush_dcache_to_poc(step, STRTAB_STE_DWORDS << 3);

Similarly in smmu_add_cmd() for the command queue. Or use an NC mapping
(which doesn't exist upstream as far as I can see).
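
For the command queue it would be the same kind of thing, roughly (cmd_dst is
a made-up name for wherever smmu_add_cmd() writes the command words):

	/* in smmu_add_cmd(), after writing the command into the queue */
	if (!(smmu->features & ARM_SMMU_FEAT_COHERENCY))
		kvm_flush_dcache_to_poc(cmd_dst, CMDQ_ENT_DWORDS << 3);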

Thanks,
Mostafa

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 24/45] KVM: arm64: smmu-v3: Setup stream table
  2024-01-16  8:59     ` Mostafa Saleh
@ 2024-01-23 19:45       ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2024-01-23 19:45 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, dbrazdil, ryan.roberts,
	linux-arm-kernel, kvmarm, iommu

Hi Mostafa,

On Tue, Jan 16, 2024 at 08:59:41AM +0000, Mostafa Saleh wrote:
> > +__maybe_unused
> > +static int smmu_sync_ste(struct hyp_arm_smmu_v3_device *smmu, u32 sid)
> > +{
> > +       struct arm_smmu_cmdq_ent cmd = {
> > +               .opcode = CMDQ_OP_CFGI_STE,
> > +               .cfgi.sid = sid,
> > +               .cfgi.leaf = true,
> > +       };
> > +
> > +       return smmu_send_cmd(smmu, &cmd);
> > +}
> > +
> I see the page tables are properly configured for ARM_SMMU_FEAT_COHERENCY but no
> handling for the STE or CMDQ, I believe here we should have something as:
> if (!(smmu->features & ARM_SMMU_FEAT_COHERENCY))
>         kvm_flush_dcache_to_poc(step, STRTAB_STE_DWORDS << 3);
> 
> Similarly in "smmu_add_cmd" for the command queue. Or use NC mapping
> (which doesn't exist
> upstream as far as I can see)

Right, the host driver seems to do this. If I'm following correctly, we end
up with dma_direct_alloc() calling pgprot_dmacoherent() and getting
MT_NORMAL_NC when the SMMU is declared non-coherent in DT/IORT.

So we'd get mismatched attributes if hyp is then mapping these structures
cacheable, but I don't remember how that works exactly. Might be fine
since host donates the pages to hyp and we'd have a cache flush in
between. I'll have to read up on that.

Regardless, mapping NC seems cleaner, more readable. I'll see if I can add
that attribute to kvm_pgtable_hyp_map().
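
Roughly, as a sketch only (KVM_PGTABLE_PROT_NORMAL_NC is a made-up name for
the new attribute, and va/size/phys stand for the CMDQ/strtab mapping we
already set up):

	enum kvm_pgtable_prot prot = PAGE_HYP;

	if (!(smmu->features & ARM_SMMU_FEAT_COHERENCY))
		prot |= KVM_PGTABLE_PROT_NORMAL_NC;

	ret = kvm_pgtable_hyp_map(&pkvm_pgtable, va, size, phys, prot);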

Thanks,
Jean

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 12/45] KVM: arm64: pkvm: Unify pkvm_pkvm_teardown_donated_memory()
  2024-01-15 14:33     ` Sebastian Ene
@ 2024-01-23 19:49       ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2024-01-23 19:49 UTC (permalink / raw)
  To: Sebastian Ene
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, smostafa, dbrazdil,
	ryan.roberts, linux-arm-kernel, kvmarm, iommu

Hi Seb,

On Mon, Jan 15, 2024 at 02:33:50PM +0000, Sebastian Ene wrote:
> >  int __pkvm_teardown_vm(pkvm_handle_t handle)
> >  {
> >  	size_t vm_size, last_ran_size;
> > @@ -813,19 +802,18 @@ int __pkvm_teardown_vm(pkvm_handle_t handle)
> >  		vcpu_mc = &hyp_vcpu->vcpu.arch.pkvm_memcache;
> >  		while (vcpu_mc->nr_pages) {
> >  			addr = pop_hyp_memcache(vcpu_mc, hyp_phys_to_virt);
> > -			push_hyp_memcache(mc, addr, hyp_virt_to_phys);
> > -			unmap_donated_memory_noclear(addr, PAGE_SIZE);
> > +			pkvm_teardown_donated_memory(mc, addr, 0);
> 
> Here we probably need to pass PAGE_SIZE as an argument instead of "0"
> to make sure that we clear out the content of the page before tearing it
> down.

But since it's replacing unmap_donated_memory_noclear(), would that be a
change of behavior?  That would be a separate patch because this one is
just trying to refactor things.

Thanks,
Jean

> 
> >  		}
> >  
> > -		teardown_donated_memory(mc, hyp_vcpu, sizeof(*hyp_vcpu));
> > +		pkvm_teardown_donated_memory(mc, hyp_vcpu, sizeof(*hyp_vcpu));
> >  	}
> >  
> >  	last_ran_size = pkvm_get_last_ran_size();
> > -	teardown_donated_memory(mc, hyp_vm->kvm.arch.mmu.last_vcpu_ran,
> > -				last_ran_size);
> > +	pkvm_teardown_donated_memory(mc, hyp_vm->kvm.arch.mmu.last_vcpu_ran,
> > +				     last_ran_size);
> >  
> >  	vm_size = pkvm_get_hyp_vm_size(hyp_vm->kvm.created_vcpus);
> > -	teardown_donated_memory(mc, hyp_vm, vm_size);
> > +	pkvm_teardown_donated_memory(mc, hyp_vm, vm_size);
> >  	hyp_unpin_shared_mem(host_kvm, host_kvm + 1);
> >  	return 0;
> >  
> > -- 
> > 2.39.0
> >
> 
> Thanks,
> Seb

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 27/45] KVM: arm64: smmu-v3: Setup domains and page table configuration
  2024-01-15 14:34     ` Mostafa Saleh
@ 2024-01-23 19:50       ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2024-01-23 19:50 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, dbrazdil, ryan.roberts,
	linux-arm-kernel, kvmarm, iommu, Daniel Mentz

On Mon, Jan 15, 2024 at 02:34:12PM +0000, Mostafa Saleh wrote:
> > +static void smmu_tlb_inv_range(struct kvm_iommu_tlb_cookie *data,
> > +                              unsigned long iova, size_t size, size_t granule,
> > +                              bool leaf)
> > +{
> > +       struct hyp_arm_smmu_v3_device *smmu = to_smmu(data->iommu);
> > +       unsigned long end = iova + size;
> > +       struct arm_smmu_cmdq_ent cmd = {
> > +               .opcode = CMDQ_OP_TLBI_S2_IPA,
> > +               .tlbi.vmid = data->domain_id,
> > +               .tlbi.leaf = leaf,
> > +       };
> > +
> > +       /*
> > +        * There are no mappings at high addresses since we don't use TTB1, so
> > +        * no overflow possible.
> > +        */
> > +       BUG_ON(end < iova);
> > +
> > +       while (iova < end) {
> > +               cmd.tlbi.addr = iova;
> > +               WARN_ON(smmu_send_cmd(smmu, &cmd));
> 
> This would issue a sync command between each range, which is not needed,
> maybe we can build the command first and then issue the sync, similar
> to what the upstream driver does, what do you think?

Yes, moving the sync out of the loop would be better. To keep things
simple I'd just replace this with smmu_add_cmd() and add a smmu_sync_cmd()
at the end, but maybe some implementations won't consume the TLBI itself
fast enough, and we need to build a command list in software. Do you think
smmu_add_cmd() is sufficient here?
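
Something like this rough sketch, assuming smmu_add_cmd() keeps the same
arguments and return convention as smmu_send_cmd(), and that the loop still
steps iova by granule:

	while (iova < end) {
		cmd.tlbi.addr = iova;
		/* queue the TLBI without waiting for it */
		WARN_ON(smmu_add_cmd(smmu, &cmd));
		iova += granule;
	}
	/* a single CMD_SYNC for the whole range */
	WARN_ON(smmu_sync_cmd(smmu));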

Thanks,
Jean


^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 03/45] iommu/io-pgtable: Move fmt into io_pgtable_cfg
  2023-02-01 12:52   ` Jean-Philippe Brucker
@ 2024-02-16 11:55     ` Mostafa Saleh
  -1 siblings, 0 replies; 201+ messages in thread
From: Mostafa Saleh @ 2024-02-16 11:55 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, dbrazdil, ryan.roberts,
	linux-arm-kernel, kvmarm, iommu

Hi Jean,

On Wed, Feb 01, 2023 at 12:52:47PM +0000, Jean-Philippe Brucker wrote:
> When passing the I/O pagetable configuration around and adding new
> operations, it will be slightly more convenient to have fmt be part of
> the config structure rather than a separate parameter.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
> ---
>  include/linux/io-pgtable.h                  |  8 +++----
>  drivers/gpu/drm/msm/msm_iommu.c             |  3 +--
>  drivers/gpu/drm/panfrost/panfrost_mmu.c     |  4 ++--
>  drivers/iommu/amd/iommu.c                   |  3 ++-
>  drivers/iommu/apple-dart.c                  |  4 ++--
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  3 ++-
>  drivers/iommu/arm/arm-smmu/arm-smmu.c       |  3 ++-
>  drivers/iommu/arm/arm-smmu/qcom_iommu.c     |  3 ++-
>  drivers/iommu/io-pgtable-arm-common.c       | 26 ++++++++++-----------
>  drivers/iommu/io-pgtable-arm-v7s.c          |  3 ++-
>  drivers/iommu/io-pgtable-arm.c              |  3 ++-
>  drivers/iommu/io-pgtable-dart.c             |  8 +++----
>  drivers/iommu/io-pgtable.c                  | 10 ++++----
>  drivers/iommu/ipmmu-vmsa.c                  |  4 ++--
>  drivers/iommu/msm_iommu.c                   |  3 ++-
>  drivers/iommu/mtk_iommu.c                   |  3 ++-
>  16 files changed, 47 insertions(+), 44 deletions(-)
> 
> diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
> index 1b7a44b35616..1b0c26241a78 100644
> --- a/include/linux/io-pgtable.h
> +++ b/include/linux/io-pgtable.h
> @@ -49,6 +49,7 @@ struct iommu_flush_ops {
>  /**
>   * struct io_pgtable_cfg - Configuration data for a set of page tables.
>   *
> + * @fmt	           Format used for these page tables
>   * @quirks:        A bitmap of hardware quirks that require some special
>   *                 action by the low-level page table allocator.
>   * @pgsize_bitmap: A bitmap of page sizes supported by this set of page
> @@ -62,6 +63,7 @@ struct iommu_flush_ops {
>   *                 page table walker.
>   */
>  struct io_pgtable_cfg {
> +	enum io_pgtable_fmt		fmt;
>  	/*
>  	 * IO_PGTABLE_QUIRK_ARM_NS: (ARM formats) Set NS and NSTABLE bits in
>  	 *	stage 1 PTEs, for hardware which insists on validating them
> @@ -171,15 +173,13 @@ struct io_pgtable_ops {
>  /**
>   * alloc_io_pgtable_ops() - Allocate a page table allocator for use by an IOMMU.
>   *
> - * @fmt:    The page table format.
>   * @cfg:    The page table configuration. This will be modified to represent
>   *          the configuration actually provided by the allocator (e.g. the
>   *          pgsize_bitmap may be restricted).
>   * @cookie: An opaque token provided by the IOMMU driver and passed back to
>   *          the callback routines in cfg->tlb.
>   */
> -struct io_pgtable_ops *alloc_io_pgtable_ops(enum io_pgtable_fmt fmt,
> -					    struct io_pgtable_cfg *cfg,
> +struct io_pgtable_ops *alloc_io_pgtable_ops(struct io_pgtable_cfg *cfg,
>  					    void *cookie);
>  
>  /**
> @@ -199,14 +199,12 @@ void free_io_pgtable_ops(struct io_pgtable_ops *ops);
>  /**
>   * struct io_pgtable - Internal structure describing a set of page tables.
>   *
> - * @fmt:    The page table format.
>   * @cookie: An opaque token provided by the IOMMU driver and passed back to
>   *          any callback routines.
>   * @cfg:    A copy of the page table configuration.
>   * @ops:    The page table operations in use for this set of page tables.
>   */
>  struct io_pgtable {
> -	enum io_pgtable_fmt	fmt;
>  	void			*cookie;
>  	struct io_pgtable_cfg	cfg;
>  	struct io_pgtable_ops	ops;
> diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
> index c2507582ecf3..e9c6f281e3dd 100644
> --- a/drivers/gpu/drm/msm/msm_iommu.c
> +++ b/drivers/gpu/drm/msm/msm_iommu.c
> @@ -258,8 +258,7 @@ struct msm_mmu *msm_iommu_pagetable_create(struct msm_mmu *parent)
>  	ttbr0_cfg.quirks &= ~IO_PGTABLE_QUIRK_ARM_TTBR1;
>  	ttbr0_cfg.tlb = &null_tlb_ops;
>  
> -	pagetable->pgtbl_ops = alloc_io_pgtable_ops(ARM_64_LPAE_S1,
> -		&ttbr0_cfg, iommu->domain);
This seems to miss:
+	ttbr0_cfg.fmt = ARM_64_LPAE_S1;
> +	pagetable->pgtbl_ops = alloc_io_pgtable_ops(&ttbr0_cfg, iommu->domain);
>  
>  	if (!pagetable->pgtbl_ops) {
>  		kfree(pagetable);
> diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> index 4e83a1891f3e..31bdb5d46244 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> @@ -622,6 +622,7 @@ struct panfrost_mmu *panfrost_mmu_ctx_create(struct panfrost_device *pfdev)
>  	mmu->as = -1;
>  
>  	mmu->pgtbl_cfg = (struct io_pgtable_cfg) {
> +		.fmt		= ARM_MALI_LPAE,
>  		.pgsize_bitmap	= SZ_4K | SZ_2M,
>  		.ias		= FIELD_GET(0xff, pfdev->features.mmu_features),
>  		.oas		= FIELD_GET(0xff00, pfdev->features.mmu_features),
> @@ -630,8 +631,7 @@ struct panfrost_mmu *panfrost_mmu_ctx_create(struct panfrost_device *pfdev)
>  		.iommu_dev	= pfdev->dev,
>  	};
>  
> -	mmu->pgtbl_ops = alloc_io_pgtable_ops(ARM_MALI_LPAE, &mmu->pgtbl_cfg,
> -					      mmu);
> +	mmu->pgtbl_ops = alloc_io_pgtable_ops(&mmu->pgtbl_cfg, mmu);
>  	if (!mmu->pgtbl_ops) {
>  		kfree(mmu);
>  		return ERR_PTR(-EINVAL);
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index cbeaab55c0db..7efb6b467041 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -2072,7 +2072,8 @@ static struct protection_domain *protection_domain_alloc(unsigned int type)
>  	if (ret)
>  		goto out_err;
>  
> -	pgtbl_ops = alloc_io_pgtable_ops(pgtable, &domain->iop.pgtbl_cfg, domain);
> +	domain->iop.pgtbl_cfg.fmt = pgtable;
> +	pgtbl_ops = alloc_io_pgtable_ops(&domain->iop.pgtbl_cfg, domain);
>  	if (!pgtbl_ops) {
>  		domain_id_free(domain->id);
>  		goto out_err;
> diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c
> index 4f4a323be0d0..571f948add7c 100644
> --- a/drivers/iommu/apple-dart.c
> +++ b/drivers/iommu/apple-dart.c
> @@ -427,6 +427,7 @@ static int apple_dart_finalize_domain(struct iommu_domain *domain,
>  	}
>  
>  	pgtbl_cfg = (struct io_pgtable_cfg){
> +		.fmt = dart->hw->fmt,
>  		.pgsize_bitmap = dart->pgsize,
>  		.ias = 32,
>  		.oas = dart->hw->oas,
> @@ -434,8 +435,7 @@ static int apple_dart_finalize_domain(struct iommu_domain *domain,
>  		.iommu_dev = dart->dev,
>  	};
>  
> -	dart_domain->pgtbl_ops =
> -		alloc_io_pgtable_ops(dart->hw->fmt, &pgtbl_cfg, domain);
> +	dart_domain->pgtbl_ops = alloc_io_pgtable_ops(&pgtbl_cfg, domain);
>  	if (!dart_domain->pgtbl_ops) {
>  		ret = -ENOMEM;
>  		goto done;
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index ab160198edd6..c033b23ca4b2 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -2209,6 +2209,7 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain,
>  	}
>  
>  	pgtbl_cfg = (struct io_pgtable_cfg) {
> +		.fmt		= fmt,
>  		.pgsize_bitmap	= smmu->pgsize_bitmap,
>  		.ias		= ias,
>  		.oas		= oas,
> @@ -2217,7 +2218,7 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain,
>  		.iommu_dev	= smmu->dev,
>  	};
>  
> -	pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain);
> +	pgtbl_ops = alloc_io_pgtable_ops(&pgtbl_cfg, smmu_domain);
>  	if (!pgtbl_ops)
>  		return -ENOMEM;
>  
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> index 719fbca1fe52..f230d2ce977a 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> @@ -747,6 +747,7 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
>  		cfg->asid = cfg->cbndx;
>  
>  	pgtbl_cfg = (struct io_pgtable_cfg) {
> +		.fmt		= fmt,
>  		.pgsize_bitmap	= smmu->pgsize_bitmap,
>  		.ias		= ias,
>  		.oas		= oas,
> @@ -764,7 +765,7 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
>  	if (smmu_domain->pgtbl_quirks)
>  		pgtbl_cfg.quirks |= smmu_domain->pgtbl_quirks;
>  
> -	pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain);
> +	pgtbl_ops = alloc_io_pgtable_ops(&pgtbl_cfg, smmu_domain);
>  	if (!pgtbl_ops) {
>  		ret = -ENOMEM;
>  		goto out_clear_smmu;
> diff --git a/drivers/iommu/arm/arm-smmu/qcom_iommu.c b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
> index 270c3d9128ba..65eb8bdcbe50 100644
> --- a/drivers/iommu/arm/arm-smmu/qcom_iommu.c
> +++ b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
> @@ -239,6 +239,7 @@ static int qcom_iommu_init_domain(struct iommu_domain *domain,
>  		goto out_unlock;
>  
>  	pgtbl_cfg = (struct io_pgtable_cfg) {
> +		.fmt		= ARM_32_LPAE_S1,
>  		.pgsize_bitmap	= qcom_iommu_ops.pgsize_bitmap,
>  		.ias		= 32,
>  		.oas		= 40,
> @@ -249,7 +250,7 @@ static int qcom_iommu_init_domain(struct iommu_domain *domain,
>  	qcom_domain->iommu = qcom_iommu;
>  	qcom_domain->fwspec = fwspec;
>  
> -	pgtbl_ops = alloc_io_pgtable_ops(ARM_32_LPAE_S1, &pgtbl_cfg, qcom_domain);
> +	pgtbl_ops = alloc_io_pgtable_ops(&pgtbl_cfg, qcom_domain);
>  	if (!pgtbl_ops) {
>  		dev_err(qcom_iommu->dev, "failed to allocate pagetable ops\n");
>  		ret = -ENOMEM;
> diff --git a/drivers/iommu/io-pgtable-arm-common.c b/drivers/iommu/io-pgtable-arm-common.c
> index 7340b5096499..4b3a9ce806ea 100644
> --- a/drivers/iommu/io-pgtable-arm-common.c
> +++ b/drivers/iommu/io-pgtable-arm-common.c
> @@ -62,7 +62,7 @@ static void __arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
>  	size_t sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
>  	int i;
>  
> -	if (data->iop.fmt != ARM_MALI_LPAE && lvl == ARM_LPAE_MAX_LEVELS - 1)
> +	if (data->iop.cfg.fmt != ARM_MALI_LPAE && lvl == ARM_LPAE_MAX_LEVELS - 1)
>  		pte |= ARM_LPAE_PTE_TYPE_PAGE;
>  	else
>  		pte |= ARM_LPAE_PTE_TYPE_BLOCK;
> @@ -82,7 +82,7 @@ static int arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
>  	int i;
>  
>  	for (i = 0; i < num_entries; i++)
> -		if (iopte_leaf(ptep[i], lvl, data->iop.fmt)) {
> +		if (iopte_leaf(ptep[i], lvl, data->iop.cfg.fmt)) {
>  			/* We require an unmap first */
>  			WARN_ON(!selftest_running);
>  			return -EEXIST;
> @@ -183,7 +183,7 @@ int __arm_lpae_map(struct arm_lpae_io_pgtable *data, unsigned long iova,
>  		__arm_lpae_sync_pte(ptep, 1, cfg);
>  	}
>  
> -	if (pte && !iopte_leaf(pte, lvl, data->iop.fmt)) {
> +	if (pte && !iopte_leaf(pte, lvl, data->iop.cfg.fmt)) {
>  		cptep = iopte_deref(pte, data);
>  	} else if (pte) {
>  		/* We require an unmap first */
> @@ -201,8 +201,8 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
>  {
>  	arm_lpae_iopte pte;
>  
> -	if (data->iop.fmt == ARM_64_LPAE_S1 ||
> -	    data->iop.fmt == ARM_32_LPAE_S1) {
> +	if (data->iop.cfg.fmt == ARM_64_LPAE_S1 ||
> +	    data->iop.cfg.fmt == ARM_32_LPAE_S1) {
>  		pte = ARM_LPAE_PTE_nG;
>  		if (!(prot & IOMMU_WRITE) && (prot & IOMMU_READ))
>  			pte |= ARM_LPAE_PTE_AP_RDONLY;
> @@ -220,8 +220,8 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
>  	 * Note that this logic is structured to accommodate Mali LPAE
>  	 * having stage-1-like attributes but stage-2-like permissions.
>  	 */
> -	if (data->iop.fmt == ARM_64_LPAE_S2 ||
> -	    data->iop.fmt == ARM_32_LPAE_S2) {
> +	if (data->iop.cfg.fmt == ARM_64_LPAE_S2 ||
> +	    data->iop.cfg.fmt == ARM_32_LPAE_S2) {
>  		if (prot & IOMMU_MMIO)
>  			pte |= ARM_LPAE_PTE_MEMATTR_DEV;
>  		else if (prot & IOMMU_CACHE)
> @@ -243,7 +243,7 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
>  	 * "outside the GPU" (i.e. either the Inner or System domain in CPU
>  	 * terms, depending on coherency).
>  	 */
> -	if (prot & IOMMU_CACHE && data->iop.fmt != ARM_MALI_LPAE)
> +	if (prot & IOMMU_CACHE && data->iop.cfg.fmt != ARM_MALI_LPAE)
>  		pte |= ARM_LPAE_PTE_SH_IS;
>  	else
>  		pte |= ARM_LPAE_PTE_SH_OS;
> @@ -254,7 +254,7 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
>  	if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_NS)
>  		pte |= ARM_LPAE_PTE_NS;
>  
> -	if (data->iop.fmt != ARM_MALI_LPAE)
> +	if (data->iop.cfg.fmt != ARM_MALI_LPAE)
>  		pte |= ARM_LPAE_PTE_AF;
>  
>  	return pte;
> @@ -317,7 +317,7 @@ void __arm_lpae_free_pgtable(struct arm_lpae_io_pgtable *data, int lvl,
>  	while (ptep != end) {
>  		arm_lpae_iopte pte = *ptep++;
>  
> -		if (!pte || iopte_leaf(pte, lvl, data->iop.fmt))
> +		if (!pte || iopte_leaf(pte, lvl, data->iop.cfg.fmt))
>  			continue;
>  
>  		__arm_lpae_free_pgtable(data, lvl + 1, iopte_deref(pte, data));
> @@ -417,7 +417,7 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
>  
>  			__arm_lpae_clear_pte(ptep, &iop->cfg);
>  
> -			if (!iopte_leaf(pte, lvl, iop->fmt)) {
> +			if (!iopte_leaf(pte, lvl, iop->cfg.fmt)) {
>  				/* Also flush any partial walks */
>  				io_pgtable_tlb_flush_walk(iop, iova + i * size, size,
>  							  ARM_LPAE_GRANULE(data));
> @@ -431,7 +431,7 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
>  		}
>  
>  		return i * size;
> -	} else if (iopte_leaf(pte, lvl, iop->fmt)) {
> +	} else if (iopte_leaf(pte, lvl, iop->cfg.fmt)) {
>  		/*
>  		 * Insert a table at the next level to map the old region,
>  		 * minus the part we want to unmap
> @@ -487,7 +487,7 @@ phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
>  			return 0;
>  
>  		/* Leaf entry? */
> -		if (iopte_leaf(pte, lvl, data->iop.fmt))
> +		if (iopte_leaf(pte, lvl, data->iop.cfg.fmt))
>  			goto found_translation;
>  
>  		/* Take it to the next level */
> diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
> index 75f244a3e12d..278b4299d757 100644
> --- a/drivers/iommu/io-pgtable-arm-v7s.c
> +++ b/drivers/iommu/io-pgtable-arm-v7s.c
> @@ -930,6 +930,7 @@ static int __init arm_v7s_do_selftests(void)
>  {
>  	struct io_pgtable_ops *ops;
>  	struct io_pgtable_cfg cfg = {
> +		.fmt = ARM_V7S,
>  		.tlb = &dummy_tlb_ops,
>  		.oas = 32,
>  		.ias = 32,
> @@ -945,7 +946,7 @@ static int __init arm_v7s_do_selftests(void)
>  
>  	cfg_cookie = &cfg;
>  
> -	ops = alloc_io_pgtable_ops(ARM_V7S, &cfg, &cfg);
> +	ops = alloc_io_pgtable_ops(&cfg, &cfg);
>  	if (!ops) {
>  		pr_err("selftest: failed to allocate io pgtable ops\n");
>  		return -EINVAL;
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index b2b188bb86b3..b76b903400de 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -319,7 +319,8 @@ static int __init arm_lpae_run_tests(struct io_pgtable_cfg *cfg)
>  
>  	for (i = 0; i < ARRAY_SIZE(fmts); ++i) {
>  		cfg_cookie = cfg;
> -		ops = alloc_io_pgtable_ops(fmts[i], cfg, cfg);
> +		cfg->fmt = fmts[i];
> +		ops = alloc_io_pgtable_ops(cfg, cfg);
>  		if (!ops) {
>  			pr_err("selftest: failed to allocate io pgtable ops\n");
>  			return -ENOMEM;
> diff --git a/drivers/iommu/io-pgtable-dart.c b/drivers/iommu/io-pgtable-dart.c
> index 74b1ef2b96be..f981b25d8c98 100644
> --- a/drivers/iommu/io-pgtable-dart.c
> +++ b/drivers/iommu/io-pgtable-dart.c
> @@ -81,7 +81,7 @@ static dart_iopte paddr_to_iopte(phys_addr_t paddr,
>  {
>  	dart_iopte pte;
>  
> -	if (data->iop.fmt == APPLE_DART)
> +	if (data->iop.cfg.fmt == APPLE_DART)
>  		return paddr & APPLE_DART1_PADDR_MASK;
>  
>  	/* format is APPLE_DART2 */
> @@ -96,7 +96,7 @@ static phys_addr_t iopte_to_paddr(dart_iopte pte,
>  {
>  	u64 paddr;
>  
> -	if (data->iop.fmt == APPLE_DART)
> +	if (data->iop.cfg.fmt == APPLE_DART)
>  		return pte & APPLE_DART1_PADDR_MASK;
>  
>  	/* format is APPLE_DART2 */
> @@ -215,13 +215,13 @@ static dart_iopte dart_prot_to_pte(struct dart_io_pgtable *data,
>  {
>  	dart_iopte pte = 0;
>  
> -	if (data->iop.fmt == APPLE_DART) {
> +	if (data->iop.cfg.fmt == APPLE_DART) {
>  		if (!(prot & IOMMU_WRITE))
>  			pte |= APPLE_DART1_PTE_PROT_NO_WRITE;
>  		if (!(prot & IOMMU_READ))
>  			pte |= APPLE_DART1_PTE_PROT_NO_READ;
>  	}
> -	if (data->iop.fmt == APPLE_DART2) {
> +	if (data->iop.cfg.fmt == APPLE_DART2) {
>  		if (!(prot & IOMMU_WRITE))
>  			pte |= APPLE_DART2_PTE_PROT_NO_WRITE;
>  		if (!(prot & IOMMU_READ))
> diff --git a/drivers/iommu/io-pgtable.c b/drivers/iommu/io-pgtable.c
> index b843fcd365d2..79e459f95012 100644
> --- a/drivers/iommu/io-pgtable.c
> +++ b/drivers/iommu/io-pgtable.c
> @@ -34,17 +34,16 @@ io_pgtable_init_table[IO_PGTABLE_NUM_FMTS] = {
>  #endif
>  };
>  
> -struct io_pgtable_ops *alloc_io_pgtable_ops(enum io_pgtable_fmt fmt,
> -					    struct io_pgtable_cfg *cfg,
> +struct io_pgtable_ops *alloc_io_pgtable_ops(struct io_pgtable_cfg *cfg,
>  					    void *cookie)
>  {
>  	struct io_pgtable *iop;
>  	const struct io_pgtable_init_fns *fns;
>  
> -	if (fmt >= IO_PGTABLE_NUM_FMTS)
> +	if (cfg->fmt >= IO_PGTABLE_NUM_FMTS)
>  		return NULL;
>  
> -	fns = io_pgtable_init_table[fmt];
> +	fns = io_pgtable_init_table[cfg->fmt];
>  	if (!fns)
>  		return NULL;
>  
> @@ -52,7 +51,6 @@ struct io_pgtable_ops *alloc_io_pgtable_ops(enum io_pgtable_fmt fmt,
>  	if (!iop)
>  		return NULL;
>  
> -	iop->fmt	= fmt;
>  	iop->cookie	= cookie;
>  	iop->cfg	= *cfg;
>  
> @@ -73,6 +71,6 @@ void free_io_pgtable_ops(struct io_pgtable_ops *ops)
>  
>  	iop = io_pgtable_ops_to_pgtable(ops);
>  	io_pgtable_tlb_flush_all(iop);
> -	io_pgtable_init_table[iop->fmt]->free(iop);
> +	io_pgtable_init_table[iop->cfg.fmt]->free(iop);
>  }
>  EXPORT_SYMBOL_GPL(free_io_pgtable_ops);
> diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
> index a003bd5fc65c..4a1927489635 100644
> --- a/drivers/iommu/ipmmu-vmsa.c
> +++ b/drivers/iommu/ipmmu-vmsa.c
> @@ -447,6 +447,7 @@ static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
>  	 */
>  	domain->cfg.coherent_walk = false;
>  	domain->cfg.iommu_dev = domain->mmu->root->dev;
> +	domain->cfg.fmt = ARM_32_LPAE_S1;
>  
>  	/*
>  	 * Find an unused context.
> @@ -457,8 +458,7 @@ static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
>  
>  	domain->context_id = ret;
>  
> -	domain->iop = alloc_io_pgtable_ops(ARM_32_LPAE_S1, &domain->cfg,
> -					   domain);
> +	domain->iop = alloc_io_pgtable_ops(&domain->cfg, domain);
>  	if (!domain->iop) {
>  		ipmmu_domain_free_context(domain->mmu->root,
>  					  domain->context_id);
> diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
> index c60624910872..2c05a84ec1bf 100644
> --- a/drivers/iommu/msm_iommu.c
> +++ b/drivers/iommu/msm_iommu.c
> @@ -342,6 +342,7 @@ static int msm_iommu_domain_config(struct msm_priv *priv)
>  	spin_lock_init(&priv->pgtlock);
>  
>  	priv->cfg = (struct io_pgtable_cfg) {
> +		.fmt = ARM_V7S,
>  		.pgsize_bitmap = msm_iommu_ops.pgsize_bitmap,
>  		.ias = 32,
>  		.oas = 32,
> @@ -349,7 +350,7 @@ static int msm_iommu_domain_config(struct msm_priv *priv)
>  		.iommu_dev = priv->dev,
>  	};
>  
> -	priv->iop = alloc_io_pgtable_ops(ARM_V7S, &priv->cfg, priv);
> +	priv->iop = alloc_io_pgtable_ops(&priv->cfg, priv);
>  	if (!priv->iop) {
>  		dev_err(priv->dev, "Failed to allocate pgtable\n");
>  		return -EINVAL;
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index 2badd6acfb23..0d754d94ae52 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -598,6 +598,7 @@ static int mtk_iommu_domain_finalise(struct mtk_iommu_domain *dom,
>  	}
>  
>  	dom->cfg = (struct io_pgtable_cfg) {
> +		.fmt = ARM_V7S,
>  		.quirks = IO_PGTABLE_QUIRK_ARM_NS |
>  			IO_PGTABLE_QUIRK_NO_PERMS |
>  			IO_PGTABLE_QUIRK_ARM_MTK_EXT,
> @@ -614,7 +615,7 @@ static int mtk_iommu_domain_finalise(struct mtk_iommu_domain *dom,
>  	else
>  		dom->cfg.oas = 35;
>  
> -	dom->iop = alloc_io_pgtable_ops(ARM_V7S, &dom->cfg, data);
> +	dom->iop = alloc_io_pgtable_ops(&dom->cfg, data);
>  	if (!dom->iop) {
>  		dev_err(data->dev, "Failed to alloc io pgtable\n");
>  		return -ENOMEM;
> -- 
> 2.39.0
> 
Thanks,
Mostafa

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 20/45] KVM: arm64: iommu: Add map() and unmap() operations
  2023-02-01 12:53   ` Jean-Philippe Brucker
@ 2024-02-16 11:59     ` Mostafa Saleh
  -1 siblings, 0 replies; 201+ messages in thread
From: Mostafa Saleh @ 2024-02-16 11:59 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, dbrazdil, ryan.roberts,
	linux-arm-kernel, kvmarm, iommu

Hi Jean,

On Wed, Feb 01, 2023 at 12:53:04PM +0000, Jean-Philippe Brucker wrote:
> Handle map() and unmap() hypercalls by calling the io-pgtable library.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
> ---
>  arch/arm64/kvm/hyp/nvhe/iommu/iommu.c | 144 ++++++++++++++++++++++++++
>  1 file changed, 144 insertions(+)
> 
> diff --git a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
> index 7404ea77ed9f..0550e7bdf179 100644
> --- a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
> +++ b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
> @@ -183,6 +183,150 @@ int kvm_iommu_detach_dev(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
>  	return ret;
>  }
>  
> +static int __kvm_iommu_unmap_pages(struct io_pgtable *iopt, unsigned long iova,
> +				   size_t pgsize, size_t pgcount)
> +{
> +	int ret;
> +	size_t unmapped;
> +	phys_addr_t paddr;
> +	size_t total_unmapped = 0;
> +	size_t size = pgsize * pgcount;
> +
> +	while (total_unmapped < size) {
> +		paddr = iopt_iova_to_phys(iopt, iova);
> +		if (paddr == 0)
> +			return -EINVAL;
> +
> +		/*
> +		 * One page/block at a time, because the range provided may not
> +		 * be physically contiguous, and we need to unshare all physical
> +		 * pages.
> +		 */
> +		unmapped = iopt_unmap_pages(iopt, iova, pgsize, 1, NULL);
> +		if (!unmapped)
> +			return -EINVAL;
> +
> +		ret = __pkvm_host_unshare_dma(paddr, pgsize);
> +		if (ret)
> +			return ret;
> +
> +		iova += unmapped;
> +		pgcount -= unmapped / pgsize;
> +		total_unmapped += unmapped;
> +	}
> +
> +	return 0;
> +}
> +
> +#define IOMMU_PROT_MASK (IOMMU_READ | IOMMU_WRITE | IOMMU_CACHE |\
> +			 IOMMU_NOEXEC | IOMMU_MMIO)
Is there a reason IOMMU_PRIV is not allowed?
> +
> +int kvm_iommu_map_pages(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
> +			unsigned long iova, phys_addr_t paddr, size_t pgsize,
> +			size_t pgcount, int prot)
> +{
> +	size_t size;
> +	size_t granule;
> +	int ret = -EINVAL;
> +	size_t mapped = 0;
> +	struct io_pgtable iopt;
> +	struct kvm_hyp_iommu *iommu;
> +	size_t pgcount_orig = pgcount;
> +	unsigned long iova_orig = iova;
> +	struct kvm_hyp_iommu_domain *domain;
> +
> +	if (prot & ~IOMMU_PROT_MASK)
> +		return -EINVAL;
> +
> +	if (__builtin_mul_overflow(pgsize, pgcount, &size) ||
> +	    iova + size < iova || paddr + size < paddr)
> +		return -EOVERFLOW;
> +
> +	hyp_spin_lock(&iommu_lock);
> +
> +	domain = handle_to_domain(iommu_id, domain_id, &iommu);
> +	if (!domain)
> +		goto err_unlock;
> +
> +	granule = 1 << __ffs(iommu->pgtable->cfg.pgsize_bitmap);
> +	if (!IS_ALIGNED(iova | paddr | pgsize, granule))
> +		goto err_unlock;
> +
> +	ret = __pkvm_host_share_dma(paddr, size, !(prot & IOMMU_MMIO));
> +	if (ret)
> +		goto err_unlock;
> +
> +	iopt = domain_to_iopt(iommu, domain, domain_id);
> +	while (pgcount) {
> +		ret = iopt_map_pages(&iopt, iova, paddr, pgsize, pgcount, prot,
> +				     0, &mapped);
> +		WARN_ON(!IS_ALIGNED(mapped, pgsize));
> +		pgcount -= mapped / pgsize;
> +		if (ret)
> +			goto err_unmap;
> +		iova += mapped;
> +		paddr += mapped;
> +	}
> +
> +	hyp_spin_unlock(&iommu_lock);
> +	return 0;
> +
> +err_unmap:
> +	__kvm_iommu_unmap_pages(&iopt, iova_orig, pgsize, pgcount_orig - pgcount);
> +err_unlock:
> +	hyp_spin_unlock(&iommu_lock);
> +	return ret;
> +}
> +
> +int kvm_iommu_unmap_pages(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
> +			  unsigned long iova, size_t pgsize, size_t pgcount)
> +{
> +	size_t size;
> +	size_t granule;
> +	int ret = -EINVAL;
> +	struct io_pgtable iopt;
> +	struct kvm_hyp_iommu *iommu;
> +	struct kvm_hyp_iommu_domain *domain;
> +
> +	if (__builtin_mul_overflow(pgsize, pgcount, &size) ||
> +	    iova + size < iova)
> +		return -EOVERFLOW;
> +
> +	hyp_spin_lock(&iommu_lock);
> +	domain = handle_to_domain(iommu_id, domain_id, &iommu);
> +	if (!domain)
> +		goto out_unlock;
> +
> +	granule = 1 << __ffs(iommu->pgtable->cfg.pgsize_bitmap);
> +	if (!IS_ALIGNED(iova | pgsize, granule))
> +		goto out_unlock;
> +
> +	iopt = domain_to_iopt(iommu, domain, domain_id);
> +	ret = __kvm_iommu_unmap_pages(&iopt, iova, pgsize, pgcount);
> +out_unlock:
> +	hyp_spin_unlock(&iommu_lock);
> +	return ret;
> +}
> +
> +phys_addr_t kvm_iommu_iova_to_phys(pkvm_handle_t iommu_id,
> +				   pkvm_handle_t domain_id, unsigned long iova)
> +{
> +	phys_addr_t phys = 0;
> +	struct io_pgtable iopt;
> +	struct kvm_hyp_iommu *iommu;
> +	struct kvm_hyp_iommu_domain *domain;
> +
> +	hyp_spin_lock(&iommu_lock);
> +	domain = handle_to_domain(iommu_id, domain_id, &iommu);
> +	if (domain) {
> +		iopt = domain_to_iopt(iommu, domain, domain_id);
> +
> +		phys = iopt_iova_to_phys(&iopt, iova);
> +	}
> +	hyp_spin_unlock(&iommu_lock);
> +	return phys;
> +}
> +
>  int kvm_iommu_init_device(struct kvm_hyp_iommu *iommu)
>  {
>  	void *domains;
> -- 
> 2.39.0
>
Thanks,
Mostafa

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 30/45] iommu/arm-smmu-v3: Move queue and table allocation to arm-smmu-v3-common.c
  2023-02-01 12:53   ` Jean-Philippe Brucker
@ 2024-02-16 12:03     ` Mostafa Saleh
  -1 siblings, 0 replies; 201+ messages in thread
From: Mostafa Saleh @ 2024-02-16 12:03 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, dbrazdil, ryan.roberts,
	linux-arm-kernel, kvmarm, iommu

Hi Jean,

On Wed, Feb 01, 2023 at 12:53:14PM +0000, Jean-Philippe Brucker wrote:
> Move more code to arm-smmu-v3-common.c, so that the KVM driver can reuse
> it.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |   8 +
>  .../arm/arm-smmu-v3/arm-smmu-v3-common.c      | 190 ++++++++++++++++
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 215 ++----------------
>  3 files changed, 219 insertions(+), 194 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index 59e8101d4ff5..8ab84282f62a 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -277,6 +277,14 @@ bool arm_smmu_capable(struct device *dev, enum iommu_cap cap);
>  struct iommu_group *arm_smmu_device_group(struct device *dev);
>  int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args);
>  int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu);
> +int arm_smmu_init_one_queue(struct arm_smmu_device *smmu,
> +			    struct arm_smmu_queue *q,
> +			    void __iomem *page,
> +			    unsigned long prod_off,
> +			    unsigned long cons_off,
> +			    size_t dwords, const char *name);
> +int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid);
I see this is not used by the KVM driver, so does it need to be in the
common file?

> +int arm_smmu_init_strtab(struct arm_smmu_device *smmu);
>  
>  int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain, int ssid,
>  			    struct arm_smmu_ctx_desc *cd);
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-common.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-common.c
> index 5e43329c0826..9226971b6e53 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-common.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-common.c
> @@ -294,3 +294,193 @@ int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
>  {
>  	return iommu_fwspec_add_ids(dev, args->args, 1);
>  }
> +
> +int arm_smmu_init_one_queue(struct arm_smmu_device *smmu,
> +			    struct arm_smmu_queue *q,
> +			    void __iomem *page,
> +			    unsigned long prod_off,
> +			    unsigned long cons_off,
> +			    size_t dwords, const char *name)
> +{
> +	size_t qsz;
> +
> +	do {
> +		qsz = ((1 << q->llq.max_n_shift) * dwords) << 3;
> +		q->base = dmam_alloc_coherent(smmu->dev, qsz, &q->base_dma,
> +					      GFP_KERNEL);
> +		if (q->base || qsz < PAGE_SIZE)
> +			break;
> +
> +		q->llq.max_n_shift--;
> +	} while (1);
> +
> +	if (!q->base) {
> +		dev_err(smmu->dev,
> +			"failed to allocate queue (0x%zx bytes) for %s\n",
> +			qsz, name);
> +		return -ENOMEM;
> +	}
> +
> +	if (!WARN_ON(q->base_dma & (qsz - 1))) {
> +		dev_info(smmu->dev, "allocated %u entries for %s\n",
> +			 1 << q->llq.max_n_shift, name);
> +	}
> +
> +	q->prod_reg	= page + prod_off;
> +	q->cons_reg	= page + cons_off;
> +	q->ent_dwords	= dwords;
> +
> +	q->q_base  = Q_BASE_RWA;
> +	q->q_base |= q->base_dma & Q_BASE_ADDR_MASK;
> +	q->q_base |= FIELD_PREP(Q_BASE_LOG2SIZE, q->llq.max_n_shift);
> +
> +	q->llq.prod = q->llq.cons = 0;
> +	return 0;
> +}
> +
> +/* Stream table initialization functions */
> +static void
> +arm_smmu_write_strtab_l1_desc(__le64 *dst, struct arm_smmu_strtab_l1_desc *desc)
> +{
> +	u64 val = 0;
> +
> +	val |= FIELD_PREP(STRTAB_L1_DESC_SPAN, desc->span);
> +	val |= desc->l2ptr_dma & STRTAB_L1_DESC_L2PTR_MASK;
> +
> +	/* Ensure the SMMU sees a zeroed table after reading this pointer */
> +	WRITE_ONCE(*dst, cpu_to_le64(val));
> +}
> +
> +int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
> +{
> +	size_t size;
> +	void *strtab;
> +	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
> +	struct arm_smmu_strtab_l1_desc *desc = &cfg->l1_desc[sid >> STRTAB_SPLIT];
> +
> +	if (desc->l2ptr)
> +		return 0;
> +
> +	size = 1 << (STRTAB_SPLIT + ilog2(STRTAB_STE_DWORDS) + 3);
> +	strtab = &cfg->strtab[(sid >> STRTAB_SPLIT) * STRTAB_L1_DESC_DWORDS];
> +
> +	desc->span = STRTAB_SPLIT + 1;
> +	desc->l2ptr = dmam_alloc_coherent(smmu->dev, size, &desc->l2ptr_dma,
> +					  GFP_KERNEL);
> +	if (!desc->l2ptr) {
> +		dev_err(smmu->dev,
> +			"failed to allocate l2 stream table for SID %u\n",
> +			sid);
> +		return -ENOMEM;
> +	}
> +
> +	arm_smmu_write_strtab_l1_desc(strtab, desc);
> +	return 0;
> +}
> +
> +static int arm_smmu_init_l1_strtab(struct arm_smmu_device *smmu)
> +{
> +	unsigned int i;
> +	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
> +	void *strtab = smmu->strtab_cfg.strtab;
> +
> +	cfg->l1_desc = devm_kcalloc(smmu->dev, cfg->num_l1_ents,
> +				    sizeof(*cfg->l1_desc), GFP_KERNEL);
> +	if (!cfg->l1_desc)
> +		return -ENOMEM;
> +
> +	for (i = 0; i < cfg->num_l1_ents; ++i) {
> +		arm_smmu_write_strtab_l1_desc(strtab, &cfg->l1_desc[i]);
> +		strtab += STRTAB_L1_DESC_DWORDS << 3;
> +	}
> +
> +	return 0;
> +}
> +
> +static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
> +{
> +	void *strtab;
> +	u64 reg;
> +	u32 size, l1size;
> +	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
> +
> +	/* Calculate the L1 size, capped to the SIDSIZE. */
> +	size = STRTAB_L1_SZ_SHIFT - (ilog2(STRTAB_L1_DESC_DWORDS) + 3);
> +	size = min(size, smmu->sid_bits - STRTAB_SPLIT);
> +	cfg->num_l1_ents = 1 << size;
> +
> +	size += STRTAB_SPLIT;
> +	if (size < smmu->sid_bits)
> +		dev_warn(smmu->dev,
> +			 "2-level strtab only covers %u/%u bits of SID\n",
> +			 size, smmu->sid_bits);
> +
> +	l1size = cfg->num_l1_ents * (STRTAB_L1_DESC_DWORDS << 3);
> +	strtab = dmam_alloc_coherent(smmu->dev, l1size, &cfg->strtab_dma,
> +				     GFP_KERNEL);
> +	if (!strtab) {
> +		dev_err(smmu->dev,
> +			"failed to allocate l1 stream table (%u bytes)\n",
> +			l1size);
> +		return -ENOMEM;
> +	}
> +	cfg->strtab = strtab;
> +
> +	/* Configure strtab_base_cfg for 2 levels */
> +	reg  = FIELD_PREP(STRTAB_BASE_CFG_FMT, STRTAB_BASE_CFG_FMT_2LVL);
> +	reg |= FIELD_PREP(STRTAB_BASE_CFG_LOG2SIZE, size);
> +	reg |= FIELD_PREP(STRTAB_BASE_CFG_SPLIT, STRTAB_SPLIT);
> +	cfg->strtab_base_cfg = reg;
> +
> +	return arm_smmu_init_l1_strtab(smmu);
> +}
> +
> +static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu)
> +{
> +	void *strtab;
> +	u64 reg;
> +	u32 size;
> +	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
> +
> +	size = (1 << smmu->sid_bits) * (STRTAB_STE_DWORDS << 3);
> +	strtab = dmam_alloc_coherent(smmu->dev, size, &cfg->strtab_dma,
> +				     GFP_KERNEL);
> +	if (!strtab) {
> +		dev_err(smmu->dev,
> +			"failed to allocate linear stream table (%u bytes)\n",
> +			size);
> +		return -ENOMEM;
> +	}
> +	cfg->strtab = strtab;
> +	cfg->num_l1_ents = 1 << smmu->sid_bits;
> +
> +	/* Configure strtab_base_cfg for a linear table covering all SIDs */
> +	reg  = FIELD_PREP(STRTAB_BASE_CFG_FMT, STRTAB_BASE_CFG_FMT_LINEAR);
> +	reg |= FIELD_PREP(STRTAB_BASE_CFG_LOG2SIZE, smmu->sid_bits);
> +	cfg->strtab_base_cfg = reg;
> +
> +	return 0;
> +}
> +
> +int arm_smmu_init_strtab(struct arm_smmu_device *smmu)
> +{
> +	u64 reg;
> +	int ret;
> +
> +	if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB)
> +		ret = arm_smmu_init_strtab_2lvl(smmu);
> +	else
> +		ret = arm_smmu_init_strtab_linear(smmu);
> +
> +	if (ret)
> +		return ret;
> +
> +	/* Set the strtab base address */
> +	reg  = smmu->strtab_cfg.strtab_dma & STRTAB_BASE_ADDR_MASK;
> +	reg |= STRTAB_BASE_RA;
> +	smmu->strtab_cfg.strtab_base = reg;
> +
> +	/* Allocate the first VMID for stage-2 bypass STEs */
> +	set_bit(0, smmu->vmid_map);
> +	return 0;
> +}
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 08fd79f66d29..2baaf064a324 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -1209,18 +1209,6 @@ bool arm_smmu_free_asid(struct arm_smmu_ctx_desc *cd)
>  }
>  
>  /* Stream table manipulation functions */
> -static void
> -arm_smmu_write_strtab_l1_desc(__le64 *dst, struct arm_smmu_strtab_l1_desc *desc)
> -{
> -	u64 val = 0;
> -
> -	val |= FIELD_PREP(STRTAB_L1_DESC_SPAN, desc->span);
> -	val |= desc->l2ptr_dma & STRTAB_L1_DESC_L2PTR_MASK;
> -
> -	/* See comment in arm_smmu_write_ctx_desc() */
> -	WRITE_ONCE(*dst, cpu_to_le64(val));
> -}
> -
>  static void arm_smmu_sync_ste_for_sid(struct arm_smmu_device *smmu, u32 sid)
>  {
>  	struct arm_smmu_cmdq_ent cmd = {
> @@ -1395,34 +1383,6 @@ static void arm_smmu_init_bypass_stes(__le64 *strtab, unsigned int nent, bool fo
>  	}
>  }
>  
> -static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
> -{
> -	size_t size;
> -	void *strtab;
> -	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
> -	struct arm_smmu_strtab_l1_desc *desc = &cfg->l1_desc[sid >> STRTAB_SPLIT];
> -
> -	if (desc->l2ptr)
> -		return 0;
> -
> -	size = 1 << (STRTAB_SPLIT + ilog2(STRTAB_STE_DWORDS) + 3);
> -	strtab = &cfg->strtab[(sid >> STRTAB_SPLIT) * STRTAB_L1_DESC_DWORDS];
> -
> -	desc->span = STRTAB_SPLIT + 1;
> -	desc->l2ptr = dmam_alloc_coherent(smmu->dev, size, &desc->l2ptr_dma,
> -					  GFP_KERNEL);
> -	if (!desc->l2ptr) {
> -		dev_err(smmu->dev,
> -			"failed to allocate l2 stream table for SID %u\n",
> -			sid);
> -		return -ENOMEM;
> -	}
> -
> -	arm_smmu_init_bypass_stes(desc->l2ptr, 1 << STRTAB_SPLIT, false);
> -	arm_smmu_write_strtab_l1_desc(strtab, desc);
> -	return 0;
> -}
> -
>  static struct arm_smmu_master *
>  arm_smmu_find_master(struct arm_smmu_device *smmu, u32 sid)
>  {
> @@ -2515,13 +2475,24 @@ static bool arm_smmu_sid_in_range(struct arm_smmu_device *smmu, u32 sid)
>  
>  static int arm_smmu_init_sid_strtab(struct arm_smmu_device *smmu, u32 sid)
>  {
> +	int ret;
> +
>  	/* Check the SIDs are in range of the SMMU and our stream table */
>  	if (!arm_smmu_sid_in_range(smmu, sid))
>  		return -ERANGE;
>  
>  	/* Ensure l2 strtab is initialised */
> -	if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB)
> -		return arm_smmu_init_l2_strtab(smmu, sid);
> +	if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) {
> +		struct arm_smmu_strtab_l1_desc *desc;
> +
> +		ret = arm_smmu_init_l2_strtab(smmu, sid);
> +		if (ret)
> +			return ret;
> +
> +		desc = &smmu->strtab_cfg.l1_desc[sid >> STRTAB_SPLIT];
> +		arm_smmu_init_bypass_stes(desc->l2ptr, 1 << STRTAB_SPLIT,
> +					  false);
> +	}
>  
>  	return 0;
>  }
> @@ -2821,49 +2792,6 @@ static struct iommu_ops arm_smmu_ops = {
>  };
>  
>  /* Probing and initialisation functions */
> -static int arm_smmu_init_one_queue(struct arm_smmu_device *smmu,
> -				   struct arm_smmu_queue *q,
> -				   void __iomem *page,
> -				   unsigned long prod_off,
> -				   unsigned long cons_off,
> -				   size_t dwords, const char *name)
> -{
> -	size_t qsz;
> -
> -	do {
> -		qsz = ((1 << q->llq.max_n_shift) * dwords) << 3;
> -		q->base = dmam_alloc_coherent(smmu->dev, qsz, &q->base_dma,
> -					      GFP_KERNEL);
> -		if (q->base || qsz < PAGE_SIZE)
> -			break;
> -
> -		q->llq.max_n_shift--;
> -	} while (1);
> -
> -	if (!q->base) {
> -		dev_err(smmu->dev,
> -			"failed to allocate queue (0x%zx bytes) for %s\n",
> -			qsz, name);
> -		return -ENOMEM;
> -	}
> -
> -	if (!WARN_ON(q->base_dma & (qsz - 1))) {
> -		dev_info(smmu->dev, "allocated %u entries for %s\n",
> -			 1 << q->llq.max_n_shift, name);
> -	}
> -
> -	q->prod_reg	= page + prod_off;
> -	q->cons_reg	= page + cons_off;
> -	q->ent_dwords	= dwords;
> -
> -	q->q_base  = Q_BASE_RWA;
> -	q->q_base |= q->base_dma & Q_BASE_ADDR_MASK;
> -	q->q_base |= FIELD_PREP(Q_BASE_LOG2SIZE, q->llq.max_n_shift);
> -
> -	q->llq.prod = q->llq.cons = 0;
> -	return 0;
> -}
> -
>  static int arm_smmu_cmdq_init(struct arm_smmu_device *smmu)
>  {
>  	struct arm_smmu_cmdq *cmdq = &smmu->cmdq;
> @@ -2918,114 +2846,6 @@ static int arm_smmu_init_queues(struct arm_smmu_device *smmu)
>  				       PRIQ_ENT_DWORDS, "priq");
>  }
>  
> -static int arm_smmu_init_l1_strtab(struct arm_smmu_device *smmu)
> -{
> -	unsigned int i;
> -	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
> -	void *strtab = smmu->strtab_cfg.strtab;
> -
> -	cfg->l1_desc = devm_kcalloc(smmu->dev, cfg->num_l1_ents,
> -				    sizeof(*cfg->l1_desc), GFP_KERNEL);
> -	if (!cfg->l1_desc)
> -		return -ENOMEM;
> -
> -	for (i = 0; i < cfg->num_l1_ents; ++i) {
> -		arm_smmu_write_strtab_l1_desc(strtab, &cfg->l1_desc[i]);
> -		strtab += STRTAB_L1_DESC_DWORDS << 3;
> -	}
> -
> -	return 0;
> -}
> -
> -static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
> -{
> -	void *strtab;
> -	u64 reg;
> -	u32 size, l1size;
> -	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
> -
> -	/* Calculate the L1 size, capped to the SIDSIZE. */
> -	size = STRTAB_L1_SZ_SHIFT - (ilog2(STRTAB_L1_DESC_DWORDS) + 3);
> -	size = min(size, smmu->sid_bits - STRTAB_SPLIT);
> -	cfg->num_l1_ents = 1 << size;
> -
> -	size += STRTAB_SPLIT;
> -	if (size < smmu->sid_bits)
> -		dev_warn(smmu->dev,
> -			 "2-level strtab only covers %u/%u bits of SID\n",
> -			 size, smmu->sid_bits);
> -
> -	l1size = cfg->num_l1_ents * (STRTAB_L1_DESC_DWORDS << 3);
> -	strtab = dmam_alloc_coherent(smmu->dev, l1size, &cfg->strtab_dma,
> -				     GFP_KERNEL);
> -	if (!strtab) {
> -		dev_err(smmu->dev,
> -			"failed to allocate l1 stream table (%u bytes)\n",
> -			l1size);
> -		return -ENOMEM;
> -	}
> -	cfg->strtab = strtab;
> -
> -	/* Configure strtab_base_cfg for 2 levels */
> -	reg  = FIELD_PREP(STRTAB_BASE_CFG_FMT, STRTAB_BASE_CFG_FMT_2LVL);
> -	reg |= FIELD_PREP(STRTAB_BASE_CFG_LOG2SIZE, size);
> -	reg |= FIELD_PREP(STRTAB_BASE_CFG_SPLIT, STRTAB_SPLIT);
> -	cfg->strtab_base_cfg = reg;
> -
> -	return arm_smmu_init_l1_strtab(smmu);
> -}
> -
> -static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu)
> -{
> -	void *strtab;
> -	u64 reg;
> -	u32 size;
> -	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
> -
> -	size = (1 << smmu->sid_bits) * (STRTAB_STE_DWORDS << 3);
> -	strtab = dmam_alloc_coherent(smmu->dev, size, &cfg->strtab_dma,
> -				     GFP_KERNEL);
> -	if (!strtab) {
> -		dev_err(smmu->dev,
> -			"failed to allocate linear stream table (%u bytes)\n",
> -			size);
> -		return -ENOMEM;
> -	}
> -	cfg->strtab = strtab;
> -	cfg->num_l1_ents = 1 << smmu->sid_bits;
> -
> -	/* Configure strtab_base_cfg for a linear table covering all SIDs */
> -	reg  = FIELD_PREP(STRTAB_BASE_CFG_FMT, STRTAB_BASE_CFG_FMT_LINEAR);
> -	reg |= FIELD_PREP(STRTAB_BASE_CFG_LOG2SIZE, smmu->sid_bits);
> -	cfg->strtab_base_cfg = reg;
> -
> -	arm_smmu_init_bypass_stes(strtab, cfg->num_l1_ents, false);
> -	return 0;
> -}
> -
> -static int arm_smmu_init_strtab(struct arm_smmu_device *smmu)
> -{
> -	u64 reg;
> -	int ret;
> -
> -	if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB)
> -		ret = arm_smmu_init_strtab_2lvl(smmu);
> -	else
> -		ret = arm_smmu_init_strtab_linear(smmu);
> -
> -	if (ret)
> -		return ret;
> -
> -	/* Set the strtab base address */
> -	reg  = smmu->strtab_cfg.strtab_dma & STRTAB_BASE_ADDR_MASK;
> -	reg |= STRTAB_BASE_RA;
> -	smmu->strtab_cfg.strtab_base = reg;
> -
> -	/* Allocate the first VMID for stage-2 bypass STEs */
> -	set_bit(0, smmu->vmid_map);
> -	return 0;
> -}
> -
>  static int arm_smmu_init_structures(struct arm_smmu_device *smmu)
>  {
>  	int ret;
> @@ -3037,7 +2857,14 @@ static int arm_smmu_init_structures(struct arm_smmu_device *smmu)
>  	if (ret)
>  		return ret;
>  
> -	return arm_smmu_init_strtab(smmu);
> +	ret = arm_smmu_init_strtab(smmu);
> +	if (ret)
> +		return ret;
> +
> +	if (!(smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB))
> +		arm_smmu_init_bypass_stes(smmu->strtab_cfg.strtab,
> +					  smmu->strtab_cfg.num_l1_ents, false);
> +	return 0;
>  }
>  
>  static void arm_smmu_free_msis(void *data)
> -- 
> 2.39.0
>
Thanks,
Mostafa

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 27/45] KVM: arm64: smmu-v3: Setup domains and page table configuration
  2024-01-23 19:50       ` Jean-Philippe Brucker
@ 2024-02-16 12:11         ` Mostafa Saleh
  -1 siblings, 0 replies; 201+ messages in thread
From: Mostafa Saleh @ 2024-02-16 12:11 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, dbrazdil, ryan.roberts,
	linux-arm-kernel, kvmarm, iommu, Daniel Mentz

On Tue, Jan 23, 2024 at 7:50 PM Jean-Philippe Brucker
<jean-philippe@linaro.org> wrote:
>
> On Mon, Jan 15, 2024 at 02:34:12PM +0000, Mostafa Saleh wrote:
> > > +static void smmu_tlb_inv_range(struct kvm_iommu_tlb_cookie *data,
> > > +                              unsigned long iova, size_t size, size_t granule,
> > > +                              bool leaf)
> > > +{
> > > +       struct hyp_arm_smmu_v3_device *smmu = to_smmu(data->iommu);
> > > +       unsigned long end = iova + size;
> > > +       struct arm_smmu_cmdq_ent cmd = {
> > > +               .opcode = CMDQ_OP_TLBI_S2_IPA,
> > > +               .tlbi.vmid = data->domain_id,
> > > +               .tlbi.leaf = leaf,
> > > +       };
> > > +
> > > +       /*
> > > +        * There are no mappings at high addresses since we don't use TTB1, so
> > > +        * no overflow possible.
> > > +        */
> > > +       BUG_ON(end < iova);
> > > +
> > > +       while (iova < end) {
> > > +               cmd.tlbi.addr = iova;
> > > +               WARN_ON(smmu_send_cmd(smmu, &cmd));
> >
> > This would issue a sync command between each range, which is not needed,
> > maybe we can build the command first and then issue the sync, similar
> > to what the upstream driver does, what do you think?
>
> Yes, moving the sync out of the loop would be better. To keep things
> simple I'd just replace this with smmu_add_cmd() and add a smmu_sync_cmd()
> at the end, but maybe some implementations won't consume the TLBI itself
> fast enough, and we need to build a command list in software. Do you think
> smmu_add_cmd() is sufficient here?

Replacing this with smmu_add_cmd() makes sense.
We only poll the queue at the SYNC, which is the last command, so the pace
of TLBI consumption shouldn't matter, I believe?

One advantage of building the command list first is that we also avoid
MMIO accesses to the queue, which can be slow.
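
For illustration, a minimal sketch of what that would look like (assuming
smmu_add_cmd()/smmu_sync_cmd() keep the semantics described above, and that
the loop advances iova by granule as in the original; untested):

static void smmu_tlb_inv_range(struct kvm_iommu_tlb_cookie *data,
			       unsigned long iova, size_t size, size_t granule,
			       bool leaf)
{
	struct hyp_arm_smmu_v3_device *smmu = to_smmu(data->iommu);
	unsigned long end = iova + size;
	struct arm_smmu_cmdq_ent cmd = {
		.opcode = CMDQ_OP_TLBI_S2_IPA,
		.tlbi.vmid = data->domain_id,
		.tlbi.leaf = leaf,
	};

	BUG_ON(end < iova);

	while (iova < end) {
		cmd.tlbi.addr = iova;
		/* Queue the TLBI without waiting for consumption */
		WARN_ON(smmu_add_cmd(smmu, &cmd));
		iova += granule;
	}

	/* A single CMD_SYNC covers the whole range */
	WARN_ON(smmu_sync_cmd(smmu));
}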

Thanks,
Mostafa

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 24/45] KVM: arm64: smmu-v3: Setup stream table
  2024-01-23 19:45       ` Jean-Philippe Brucker
@ 2024-02-16 12:19         ` Mostafa Saleh
  -1 siblings, 0 replies; 201+ messages in thread
From: Mostafa Saleh @ 2024-02-16 12:19 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, dbrazdil, ryan.roberts,
	linux-arm-kernel, kvmarm, iommu

On Tue, Jan 23, 2024 at 7:45 PM Jean-Philippe Brucker
<jean-philippe@linaro.org> wrote:
>
> Hi Mostafa,
>
> On Tue, Jan 16, 2024 at 08:59:41AM +0000, Mostafa Saleh wrote:
> > > +__maybe_unused
> > > +static int smmu_sync_ste(struct hyp_arm_smmu_v3_device *smmu, u32 sid)
> > > +{
> > > +       struct arm_smmu_cmdq_ent cmd = {
> > > +               .opcode = CMDQ_OP_CFGI_STE,
> > > +               .cfgi.sid = sid,
> > > +               .cfgi.leaf = true,
> > > +       };
> > > +
> > > +       return smmu_send_cmd(smmu, &cmd);
> > > +}
> > > +
> > I see the page tables are properly configured for ARM_SMMU_FEAT_COHERENCY but no
> > handling for the STE or CMDQ, I believe here we should have something as:
> > if (!(smmu->features & ARM_SMMU_FEAT_COHERENCY))
> >         kvm_flush_dcache_to_poc(step, STRTAB_STE_DWORDS << 3);
> >
> > Similarly in "smmu_add_cmd" for the command queue. Or use NC mapping
> > (which doesn't exist
> > upstream as far as I can see)
>
> Right, the host driver seems to do this. If I'm following correctly we end
> up with dma_direct_alloc() calling pgprot_dmacoherent() and get
> MT_NORMAL_NC, when the SMMU is declared non-coherent in DT/IORT.
>
> So we'd get mismatched attributes if hyp is then mapping these structures
> cacheable, but I don't remember how that works exactly. Might be fine
> since host donates the pages to hyp and we'd have a cache flush in
> between. I'll have to read up on that.

I guess that is not enough, as the hypervisor can write the STE/CMDQ at any time.

> Regardless, mapping NC seems cleaner, more readable. I'll see if I can add
> that attribute to kvm_pgtable_hyp_map().

There is a patch for that already in Android
https://android.googlesource.com/kernel/common/+/636c912401dec4d178f6cdf6073f546b15828cf7%5E%21/#F0

But I guess as a beginning CMO is enough; I have this POC for it
https://android-kvm.googlesource.com/linux/+/193b027de376317eb8daa4eb207badaa1d6fda4a%5E%21/#F0

Thanks,
Mostafa

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 20/45] KVM: arm64: iommu: Add map() and unmap() operations
  2024-02-16 11:59     ` Mostafa Saleh
@ 2024-02-26 14:12       ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2024-02-26 14:12 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, dbrazdil, ryan.roberts,
	linux-arm-kernel, kvmarm, iommu

On Fri, Feb 16, 2024 at 11:59:26AM +0000, Mostafa Saleh wrote:
> > +#define IOMMU_PROT_MASK (IOMMU_READ | IOMMU_WRITE | IOMMU_CACHE |\
> > +			 IOMMU_NOEXEC | IOMMU_MMIO)
> Is there a reason IOMMU_PRIV is not allowed?

No, I probably just forgot it.
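
Presumably the fix is just extending the mask, i.e. (sketch):

#define IOMMU_PROT_MASK (IOMMU_READ | IOMMU_WRITE | IOMMU_CACHE |\
			 IOMMU_NOEXEC | IOMMU_MMIO | IOMMU_PRIV)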

Thanks,
Jean


^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 24/45] KVM: arm64: smmu-v3: Setup stream table
  2024-02-16 12:19         ` Mostafa Saleh
@ 2024-02-26 14:13           ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2024-02-26 14:13 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, dbrazdil, ryan.roberts,
	linux-arm-kernel, kvmarm, iommu

On Fri, Feb 16, 2024 at 12:19:01PM +0000, Mostafa Saleh wrote:
> On Tue, Jan 23, 2024 at 7:45 PM Jean-Philippe Brucker
> <jean-philippe@linaro.org> wrote:
> >
> > Hi Mostafa,
> >
> > On Tue, Jan 16, 2024 at 08:59:41AM +0000, Mostafa Saleh wrote:
> > > > +__maybe_unused
> > > > +static int smmu_sync_ste(struct hyp_arm_smmu_v3_device *smmu, u32 sid)
> > > > +{
> > > > +       struct arm_smmu_cmdq_ent cmd = {
> > > > +               .opcode = CMDQ_OP_CFGI_STE,
> > > > +               .cfgi.sid = sid,
> > > > +               .cfgi.leaf = true,
> > > > +       };
> > > > +
> > > > +       return smmu_send_cmd(smmu, &cmd);
> > > > +}
> > > > +
> > > I see the page tables are properly configured for ARM_SMMU_FEAT_COHERENCY but no
> > > handling for the STE or CMDQ, I believe here we should have something as:
> > > if (!(smmu->features & ARM_SMMU_FEAT_COHERENCY))
> > >         kvm_flush_dcache_to_poc(step, STRTAB_STE_DWORDS << 3);
> > >
> > > Similarly in "smmu_add_cmd" for the command queue. Or use NC mapping
> > > (which doesn't exist
> > > upstream as far as I can see)
> >
> > Right, the host driver seems to do this. If I'm following correctly we end
> > up with dma_direct_alloc() calling pgprot_dmacoherent() and get
> > MT_NORMAL_NC, when the SMMU is declared non-coherent in DT/IORT.
> >
> > So we'd get mismatched attributes if hyp is then mapping these structures
> > cacheable, but I don't remember how that works exactly. Might be fine
> > since host donates the pages to hyp and we'd have a cache flush in
> > between. I'll have to read up on that.
> 
> I guess that is not enough, as the hypervisor writes the STE/CMDQ at any time.
> 
> > Regardless, mapping NC seems cleaner, more readable. I'll see if I can add
> > that attribute to kvm_pgtable_hyp_map().
> 
> There is a patch for that already in Android
> https://android.googlesource.com/kernel/common/+/636c912401dec4d178f6cdf6073f546b15828cf7%5E%21/#F0

Nice, I've added this (rather than CMO, to avoid mismatched attributes)
but don't have the hardware to test it:

diff --git a/arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c b/arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c
index 4b0b70017f59..e43011b51ef4 100644
--- a/arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c
+++ b/arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c
@@ -268,12 +268,17 @@ static int smmu_init_registers(struct hyp_arm_smmu_v3_device *smmu)
 }
 
 /* Transfer ownership of structures from host to hyp */
-static void *smmu_take_pages(u64 base, size_t size)
+static void *smmu_take_pages(struct hyp_arm_smmu_v3_device *smmu, u64 base,
+			     size_t size)
 {
 	void *hyp_ptr;
+	enum kvm_pgtable_prot prot = PAGE_HYP;
+
+	if (!(smmu->features & ARM_SMMU_FEAT_COHERENCY))
+		prot |= KVM_PGTABLE_PROT_NC;
 
 	hyp_ptr = hyp_phys_to_virt(base);
-	if (pkvm_create_mappings(hyp_ptr, hyp_ptr + size, PAGE_HYP))
+	if (pkvm_create_mappings(hyp_ptr, hyp_ptr + size, prot))
 		return NULL;
 
 	return hyp_ptr;
@@ -293,7 +298,7 @@ static int smmu_init_cmdq(struct hyp_arm_smmu_v3_device *smmu)
 	cmdq_size = cmdq_nr_entries * CMDQ_ENT_DWORDS * 8;
 
 	cmdq_base &= Q_BASE_ADDR_MASK;
-	smmu->cmdq_base = smmu_take_pages(cmdq_base, cmdq_size);
+	smmu->cmdq_base = smmu_take_pages(smmu, cmdq_base, cmdq_size);
 	if (!smmu->cmdq_base)
 		return -EINVAL;
 
@@ -350,7 +355,7 @@ static int smmu_init_strtab(struct hyp_arm_smmu_v3_device *smmu)
 	}
 
 	strtab_base &= STRTAB_BASE_ADDR_MASK;
-	smmu->strtab_base = smmu_take_pages(strtab_base, strtab_size);
+	smmu->strtab_base = smmu_take_pages(smmu, strtab_base, strtab_size);
 	if (!smmu->strtab_base)
 		return -EINVAL;

^ permalink raw reply related	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 27/45] KVM: arm64: smmu-v3: Setup domains and page table configuration
  2024-02-16 12:11         ` Mostafa Saleh
@ 2024-02-26 14:18           ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2024-02-26 14:18 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, dbrazdil, ryan.roberts,
	linux-arm-kernel, kvmarm, iommu, Daniel Mentz

On Fri, Feb 16, 2024 at 12:11:48PM +0000, Mostafa Saleh wrote:
> On Tue, Jan 23, 2024 at 7:50 PM Jean-Philippe Brucker
> <jean-philippe@linaro.org> wrote:
> >
> > On Mon, Jan 15, 2024 at 02:34:12PM +0000, Mostafa Saleh wrote:
> > > > +static void smmu_tlb_inv_range(struct kvm_iommu_tlb_cookie *data,
> > > > +                              unsigned long iova, size_t size, size_t granule,
> > > > +                              bool leaf)
> > > > +{
> > > > +       struct hyp_arm_smmu_v3_device *smmu = to_smmu(data->iommu);
> > > > +       unsigned long end = iova + size;
> > > > +       struct arm_smmu_cmdq_ent cmd = {
> > > > +               .opcode = CMDQ_OP_TLBI_S2_IPA,
> > > > +               .tlbi.vmid = data->domain_id,
> > > > +               .tlbi.leaf = leaf,
> > > > +       };
> > > > +
> > > > +       /*
> > > > +        * There are no mappings at high addresses since we don't use TTB1, so
> > > > +        * no overflow possible.
> > > > +        */
> > > > +       BUG_ON(end < iova);
> > > > +
> > > > +       while (iova < end) {
> > > > +               cmd.tlbi.addr = iova;
> > > > +               WARN_ON(smmu_send_cmd(smmu, &cmd));
> > >
> > > This would issue a sync command between each range, which is not needed,
> > > maybe we can build the command first and then issue the sync, similar
> > > to what the upstream driver does, what do you think?
> >
> > Yes, moving the sync out of the loop would be better. To keep things
> > simple I'd just replace this with smmu_add_cmd() and add a smmu_sync_cmd()
> > at the end, but maybe some implementations won't consume the TLBI itself
> > fast enough, and we need to build a command list in software. Do you think
> > smmu_add_cmd() is sufficient here?
> 
> Replacing this with smmu_add_cmd makes sense.
> We only poll the queue at SYNC, which is the last command, so it
> doesn't matter the pace
> of the TLBI consumption I believe?

Yes, only smmu_sync_cmd() waits for consumption (unless the queue is full
when we attempt to add a cmd). And submitting the TLBIs early could allow
the hardware to do some processing while we prepare the next commands, but
I don't know if it actually works that way.

> 
> One advantage of building the command list first, is that we also
> avoid MMIO access for the queue which can be slow.

Yes, I'm curious about the overhead of MMIO on some of these platforms.
Maybe we should do some software batching if you're able to measure a
performance impact from reading and writing CMDQ indices, but I suspect
the map/unmap context switches completely overshadow it at the moment.
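
For concreteness, software batching could look roughly like the sketch below.
The helpers are hypothetical (smmu_issue_cmds() does not exist in the series),
and it assumes arm_smmu_cmdq_build_cmd() or an equivalent is available to the
hyp driver; it only illustrates accumulating commands before touching the
MMIO queue:

#define SMMU_CMD_BATCH_SIZE	64	/* arbitrary */

struct smmu_cmd_batch {
	u64	cmds[SMMU_CMD_BATCH_SIZE][CMDQ_ENT_DWORDS];
	int	n;
};

static int smmu_batch_add(struct hyp_arm_smmu_v3_device *smmu,
			  struct smmu_cmd_batch *batch,
			  struct arm_smmu_cmdq_ent *ent)
{
	int ret;

	if (batch->n == SMMU_CMD_BATCH_SIZE) {
		/* One prod update and one poll per batch (hypothetical) */
		ret = smmu_issue_cmds(smmu, batch->cmds, batch->n);
		if (ret)
			return ret;
		batch->n = 0;
	}

	ret = arm_smmu_cmdq_build_cmd(batch->cmds[batch->n], ent);
	if (!ret)
		batch->n++;
	return ret;
}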

Thanks,
Jean

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 30/45] iommu/arm-smmu-v3: Move queue and table allocation to arm-smmu-v3-common.c
  2024-02-16 12:03     ` Mostafa Saleh
@ 2024-02-26 14:19       ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 201+ messages in thread
From: Jean-Philippe Brucker @ 2024-02-26 14:19 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, dbrazdil, ryan.roberts,
	linux-arm-kernel, kvmarm, iommu

On Fri, Feb 16, 2024 at 12:03:41PM +0000, Mostafa Saleh wrote:
> > +int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid);
> I see this is not used by the KVM driver, so it is not needed in the
> common file?

Indeed, looks like I've already removed this at some point

Thanks,
Jean

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [RFC PATCH 24/45] KVM: arm64: smmu-v3: Setup stream table
  2024-02-26 14:13           ` Jean-Philippe Brucker
@ 2024-03-06 12:51             ` Mostafa Saleh
  -1 siblings, 0 replies; 201+ messages in thread
From: Mostafa Saleh @ 2024-03-06 12:51 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: maz, catalin.marinas, will, joro, robin.murphy, james.morse,
	suzuki.poulose, oliver.upton, yuzenghui, dbrazdil, ryan.roberts,
	linux-arm-kernel, kvmarm, iommu

On Mon, Feb 26, 2024 at 02:13:52PM +0000, Jean-Philippe Brucker wrote:
> On Fri, Feb 16, 2024 at 12:19:01PM +0000, Mostafa Saleh wrote:
> > On Tue, Jan 23, 2024 at 7:45 PM Jean-Philippe Brucker
> > <jean-philippe@linaro.org> wrote:
> > >
> > > Hi Mostafa,
> > >
> > > On Tue, Jan 16, 2024 at 08:59:41AM +0000, Mostafa Saleh wrote:
> > > > > +__maybe_unused
> > > > > +static int smmu_sync_ste(struct hyp_arm_smmu_v3_device *smmu, u32 sid)
> > > > > +{
> > > > > +       struct arm_smmu_cmdq_ent cmd = {
> > > > > +               .opcode = CMDQ_OP_CFGI_STE,
> > > > > +               .cfgi.sid = sid,
> > > > > +               .cfgi.leaf = true,
> > > > > +       };
> > > > > +
> > > > > +       return smmu_send_cmd(smmu, &cmd);
> > > > > +}
> > > > > +
> > > > I see the page tables are properly configured for ARM_SMMU_FEAT_COHERENCY but no
> > > > handling for the STE or CMDQ, I believe here we should have something as:
> > > > if (!(smmu->features & ARM_SMMU_FEAT_COHERENCY))
> > > >         kvm_flush_dcache_to_poc(step, STRTAB_STE_DWORDS << 3);
> > > >
> > > > Similarly in "smmu_add_cmd" for the command queue. Or use NC mapping
> > > > (which doesn't exist
> > > > upstream as far as I can see)
> > >
> > > Right, the host driver seems to do this. If I'm following correctly we end
> > > up with dma_direct_alloc() calling pgprot_dmacoherent() and get
> > > MT_NORMAL_NC, when the SMMU is declared non-coherent in DT/IORT.
> > >
> > > So we'd get mismatched attributes if hyp is then mapping these structures
> > > cacheable, but I don't remember how that works exactly. Might be fine
> > > since host donates the pages to hyp and we'd have a cache flush in
> > > between. I'll have to read up on that.
> > 
> > I guess that is not enough, as the hypervisor writes the STE/CMDQ at any time.
> > 
> > > Regardless, mapping NC seems cleaner, more readable. I'll see if I can add
> > > that attribute to kvm_pgtable_hyp_map().
> > 
> > There is a patch for that already in Android
> > https://android.googlesource.com/kernel/common/+/636c912401dec4d178f6cdf6073f546b15828cf7%5E%21/#F0
> 
> Nice, I've added this (rather than CMO, to avoid mismatched attributes)
> but don't have the hardware to test it:
> 
> diff --git a/arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c b/arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c
> index 4b0b70017f59..e43011b51ef4 100644
> --- a/arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c
> +++ b/arch/arm64/kvm/hyp/nvhe/iommu/arm-smmu-v3.c
> @@ -268,12 +268,17 @@ static int smmu_init_registers(struct hyp_arm_smmu_v3_device *smmu)
>  }
>  
>  /* Transfer ownership of structures from host to hyp */
> -static void *smmu_take_pages(u64 base, size_t size)
> +static void *smmu_take_pages(struct hyp_arm_smmu_v3_device *smmu, u64 base,
> +			     size_t size)
>  {
>  	void *hyp_ptr;
> +	enum kvm_pgtable_prot prot = PAGE_HYP;
> +
> +	if (!(smmu->features & ARM_SMMU_FEAT_COHERENCY))
> +		prot |= KVM_PGTABLE_PROT_NC;
>  
>  	hyp_ptr = hyp_phys_to_virt(base);
> -	if (pkvm_create_mappings(hyp_ptr, hyp_ptr + size, PAGE_HYP))
> +	if (pkvm_create_mappings(hyp_ptr, hyp_ptr + size, prot))
>  		return NULL;
>  
>  	return hyp_ptr;
> @@ -293,7 +298,7 @@ static int smmu_init_cmdq(struct hyp_arm_smmu_v3_device *smmu)
>  	cmdq_size = cmdq_nr_entries * CMDQ_ENT_DWORDS * 8;
>  
>  	cmdq_base &= Q_BASE_ADDR_MASK;
> -	smmu->cmdq_base = smmu_take_pages(cmdq_base, cmdq_size);
> +	smmu->cmdq_base = smmu_take_pages(smmu, cmdq_base, cmdq_size);
>  	if (!smmu->cmdq_base)
>  		return -EINVAL;
>  
> @@ -350,7 +355,7 @@ static int smmu_init_strtab(struct hyp_arm_smmu_v3_device *smmu)
>  	}
>  
>  	strtab_base &= STRTAB_BASE_ADDR_MASK;
> -	smmu->strtab_base = smmu_take_pages(strtab_base, strtab_size);
> +	smmu->strtab_base = smmu_take_pages(smmu, strtab_base, strtab_size);
>  	if (!smmu->strtab_base)
>  		return -EINVAL;

Thanks, that is missing the L2 tables for the STEs, but I guess for those we
can just do a CMO for now, as the HW doesn't update them, unlike the CMDQ,
which must be mapped as NC since a CMO won't be enough.
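
A minimal sketch of that interim CMO approach for the stream table, reusing
the ARM_SMMU_FEAT_COHERENCY check and kvm_flush_dcache_to_poc() call quoted
earlier in this thread; smmu_write_ste() itself and its dst/src arguments are
hypothetical, not part of the posted series:

/*
 * Sketch only: publish an STE and make it visible to a non-coherent SMMU.
 * A real implementation must also order the update carefully so that the
 * valid bit (in dword 0) is written last.
 */
static int smmu_write_ste(struct hyp_arm_smmu_v3_device *smmu, u32 sid,
			  u64 *dst, const u64 *src)
{
	int i;

	for (i = 0; i < STRTAB_STE_DWORDS; i++)
		WRITE_ONCE(dst[i], src[i]);

	/* The SMMU doesn't snoop the CPU caches, clean to PoC by hand */
	if (!(smmu->features & ARM_SMMU_FEAT_COHERENCY))
		kvm_flush_dcache_to_poc(dst, STRTAB_STE_DWORDS << 3);

	/* Ask the SMMU to re-fetch the entry (CMDQ_OP_CFGI_STE) */
	return smmu_sync_ste(smmu, sid);
}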

I am investigating whether we can map the memory donated from the host on
demand with a different prot; in that case iommu_donate_pages can return
memory with the appropriate attributes.
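
To illustrate that direction, a rough sketch of a donation helper that picks
the hyp mapping attribute per device; the helper names below are made up and
their relationship to the series' iommu_donate_pages is only an assumption:

/*
 * Sketch only: choose the hyp stage-1 attribute for pages donated to the
 * IOMMU driver, so structures used by a non-coherent SMMU end up Normal-NC.
 * hyp_map_donated_pages() is a placeholder for whatever the donation path
 * actually uses to map the pages at hyp.
 */
static void *smmu_map_donated_pages(struct hyp_arm_smmu_v3_device *smmu,
				    phys_addr_t phys, size_t size)
{
	enum kvm_pgtable_prot prot = PAGE_HYP;

	if (!(smmu->features & ARM_SMMU_FEAT_COHERENCY))
		prot |= KVM_PGTABLE_PROT_NC;

	return hyp_map_donated_pages(phys, size, prot); /* placeholder */
}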

Thanks,
Mostafa

^ permalink raw reply	[flat|nested] 201+ messages in thread

end of thread, other threads:[~2024-03-06 12:51 UTC | newest]

Thread overview: 201+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-01 12:52 [RFC PATCH 00/45] KVM: Arm SMMUv3 driver for pKVM Jean-Philippe Brucker
2023-02-01 12:52 ` Jean-Philippe Brucker
2023-02-01 12:52 ` [RFC PATCH 01/45] iommu/io-pgtable-arm: Split the page table driver Jean-Philippe Brucker
2023-02-01 12:52   ` Jean-Philippe Brucker
2023-02-01 12:52 ` [RFC PATCH 02/45] iommu/io-pgtable-arm: Split initialization Jean-Philippe Brucker
2023-02-01 12:52   ` Jean-Philippe Brucker
2023-02-01 12:52 ` [RFC PATCH 03/45] iommu/io-pgtable: Move fmt into io_pgtable_cfg Jean-Philippe Brucker
2023-02-01 12:52   ` Jean-Philippe Brucker
2024-02-16 11:55   ` Mostafa Saleh
2024-02-16 11:55     ` Mostafa Saleh
2023-02-01 12:52 ` [RFC PATCH 04/45] iommu/io-pgtable: Add configure() operation Jean-Philippe Brucker
2023-02-01 12:52   ` Jean-Philippe Brucker
2023-02-01 12:52 ` [RFC PATCH 05/45] iommu/io-pgtable: Split io_pgtable structure Jean-Philippe Brucker
2023-02-01 12:52   ` Jean-Philippe Brucker
2023-02-07 12:16   ` Mostafa Saleh
2023-02-08 18:01     ` Jean-Philippe Brucker
2023-02-08 18:01       ` Jean-Philippe Brucker
2023-02-01 12:52 ` [RFC PATCH 06/45] iommu/io-pgtable-arm: Extend __arm_lpae_free_pgtable() to only free child tables Jean-Philippe Brucker
2023-02-01 12:52   ` Jean-Philippe Brucker
2023-02-01 12:52 ` [RFC PATCH 07/45] iommu/arm-smmu-v3: Move some definitions to arm64 include/ Jean-Philippe Brucker
2023-02-01 12:52   ` Jean-Philippe Brucker
2023-02-01 12:52 ` [RFC PATCH 08/45] KVM: arm64: pkvm: Add pkvm_udelay() Jean-Philippe Brucker
2023-02-01 12:52   ` Jean-Philippe Brucker
2023-02-01 12:52 ` [RFC PATCH 09/45] KVM: arm64: pkvm: Add pkvm_create_hyp_device_mapping() Jean-Philippe Brucker
2023-02-01 12:52   ` Jean-Philippe Brucker
2023-02-07 12:22   ` Mostafa Saleh
2023-02-07 12:22     ` Mostafa Saleh
2023-02-08 18:02     ` Jean-Philippe Brucker
2023-02-08 18:02       ` Jean-Philippe Brucker
2023-02-01 12:52 ` [RFC PATCH 10/45] KVM: arm64: pkvm: Expose pkvm_map/unmap_donated_memory() Jean-Philippe Brucker
2023-02-01 12:52   ` Jean-Philippe Brucker
2023-02-01 12:52 ` [RFC PATCH 11/45] KVM: arm64: pkvm: Expose pkvm_admit_host_page() Jean-Philippe Brucker
2023-02-01 12:52   ` Jean-Philippe Brucker
2023-02-01 12:52 ` [RFC PATCH 12/45] KVM: arm64: pkvm: Unify pkvm_pkvm_teardown_donated_memory() Jean-Philippe Brucker
2023-02-01 12:52   ` Jean-Philippe Brucker
2024-01-15 14:33   ` Sebastian Ene
2024-01-15 14:33     ` Sebastian Ene
2024-01-23 19:49     ` Jean-Philippe Brucker
2024-01-23 19:49       ` Jean-Philippe Brucker
2023-02-01 12:52 ` [RFC PATCH 13/45] KVM: arm64: pkvm: Add hyp_page_ref_inc_return() Jean-Philippe Brucker
2023-02-01 12:52   ` Jean-Philippe Brucker
2023-02-01 12:52 ` [RFC PATCH 14/45] KVM: arm64: pkvm: Prevent host donation of device memory Jean-Philippe Brucker
2023-02-01 12:52   ` Jean-Philippe Brucker
2023-02-01 12:52 ` [RFC PATCH 15/45] KVM: arm64: pkvm: Add __pkvm_host_share/unshare_dma() Jean-Philippe Brucker
2023-02-01 12:52   ` Jean-Philippe Brucker
2023-02-04 12:51   ` tina.zhang
2023-02-04 12:51     ` tina.zhang
2023-02-06 12:13     ` Jean-Philippe Brucker
2023-02-06 12:13       ` Jean-Philippe Brucker
2023-02-07  2:37       ` tina.zhang
2023-02-07  2:37         ` tina.zhang
2023-02-07 10:39         ` Jean-Philippe Brucker
2023-02-07 10:39           ` Jean-Philippe Brucker
2023-02-07 12:53   ` Mostafa Saleh
2023-02-07 12:53     ` Mostafa Saleh
2023-02-10 19:21     ` Jean-Philippe Brucker
2023-02-10 19:21       ` Jean-Philippe Brucker
2023-02-01 12:53 ` [RFC PATCH 16/45] KVM: arm64: Introduce IOMMU driver infrastructure Jean-Philippe Brucker
2023-02-01 12:53   ` Jean-Philippe Brucker
2023-02-01 12:53 ` [RFC PATCH 17/45] KVM: arm64: pkvm: Add IOMMU hypercalls Jean-Philippe Brucker
2023-02-01 12:53   ` Jean-Philippe Brucker
2023-02-01 12:53 ` [RFC PATCH 18/45] KVM: arm64: iommu: Add per-cpu page queue Jean-Philippe Brucker
2023-02-01 12:53   ` Jean-Philippe Brucker
2023-02-01 12:53 ` [RFC PATCH 19/45] KVM: arm64: iommu: Add domains Jean-Philippe Brucker
2023-02-01 12:53   ` Jean-Philippe Brucker
2023-02-07 13:13   ` Mostafa Saleh
2023-02-07 13:13     ` Mostafa Saleh
2023-02-08 12:31     ` Mostafa Saleh
2023-02-08 12:31       ` Mostafa Saleh
2023-02-08 18:05       ` Jean-Philippe Brucker
2023-02-08 18:05         ` Jean-Philippe Brucker
2023-02-10 22:03         ` Mostafa Saleh
2023-02-10 22:03           ` Mostafa Saleh
2023-05-19 15:33   ` Mostafa Saleh
2023-05-19 15:33     ` Mostafa Saleh
2023-06-02 15:29     ` Jean-Philippe Brucker
2023-06-02 15:29       ` Jean-Philippe Brucker
2023-06-15 13:32       ` Mostafa Saleh
2023-06-15 13:32         ` Mostafa Saleh
2023-02-01 12:53 ` [RFC PATCH 20/45] KVM: arm64: iommu: Add map() and unmap() operations Jean-Philippe Brucker
2023-02-01 12:53   ` Jean-Philippe Brucker
2023-03-30 18:14   ` Mostafa Saleh
2023-03-30 18:14     ` Mostafa Saleh
2023-04-04 16:00     ` Jean-Philippe Brucker
2023-04-04 16:00       ` Jean-Philippe Brucker
2023-09-20 16:23       ` Mostafa Saleh
2023-09-20 16:23         ` Mostafa Saleh
2023-09-25 17:21         ` Jean-Philippe Brucker
2023-09-25 17:21           ` Jean-Philippe Brucker
2024-02-16 11:59   ` Mostafa Saleh
2024-02-16 11:59     ` Mostafa Saleh
2024-02-26 14:12     ` Jean-Philippe Brucker
2024-02-26 14:12       ` Jean-Philippe Brucker
2023-02-01 12:53 ` [RFC PATCH 21/45] KVM: arm64: iommu: Add SMMUv3 driver Jean-Philippe Brucker
2023-02-01 12:53   ` Jean-Philippe Brucker
2023-02-01 12:53 ` [RFC PATCH 22/45] KVM: arm64: smmu-v3: Initialize registers Jean-Philippe Brucker
2023-02-01 12:53   ` Jean-Philippe Brucker
2023-02-01 12:53 ` [RFC PATCH 23/45] KVM: arm64: smmu-v3: Setup command queue Jean-Philippe Brucker
2023-02-01 12:53   ` Jean-Philippe Brucker
2023-02-01 12:53 ` [RFC PATCH 24/45] KVM: arm64: smmu-v3: Setup stream table Jean-Philippe Brucker
2023-02-01 12:53   ` Jean-Philippe Brucker
2024-01-16  8:59   ` Mostafa Saleh
2024-01-16  8:59     ` Mostafa Saleh
2024-01-23 19:45     ` Jean-Philippe Brucker
2024-01-23 19:45       ` Jean-Philippe Brucker
2024-02-16 12:19       ` Mostafa Saleh
2024-02-16 12:19         ` Mostafa Saleh
2024-02-26 14:13         ` Jean-Philippe Brucker
2024-02-26 14:13           ` Jean-Philippe Brucker
2024-03-06 12:51           ` Mostafa Saleh
2024-03-06 12:51             ` Mostafa Saleh
2023-02-01 12:53 ` [RFC PATCH 25/45] KVM: arm64: smmu-v3: Reset the device Jean-Philippe Brucker
2023-02-01 12:53   ` Jean-Philippe Brucker
2023-02-01 12:53 ` [RFC PATCH 26/45] KVM: arm64: smmu-v3: Support io-pgtable Jean-Philippe Brucker
2023-02-01 12:53   ` Jean-Philippe Brucker
2023-02-01 12:53 ` [RFC PATCH 27/45] KVM: arm64: smmu-v3: Setup domains and page table configuration Jean-Philippe Brucker
2023-02-01 12:53   ` Jean-Philippe Brucker
2023-06-23 19:12   ` Mostafa Saleh
2023-06-23 19:12     ` Mostafa Saleh
2023-07-03 10:41     ` Jean-Philippe Brucker
2023-07-03 10:41       ` Jean-Philippe Brucker
2024-01-15 14:34   ` Mostafa Saleh
2024-01-15 14:34     ` Mostafa Saleh
2024-01-23 19:50     ` Jean-Philippe Brucker
2024-01-23 19:50       ` Jean-Philippe Brucker
2024-02-16 12:11       ` Mostafa Saleh
2024-02-16 12:11         ` Mostafa Saleh
2024-02-26 14:18         ` Jean-Philippe Brucker
2024-02-26 14:18           ` Jean-Philippe Brucker
2023-02-01 12:53 ` [RFC PATCH 28/45] iommu/arm-smmu-v3: Extract driver-specific bits from probe function Jean-Philippe Brucker
2023-02-01 12:53   ` Jean-Philippe Brucker
2023-02-01 12:53 ` [RFC PATCH 29/45] iommu/arm-smmu-v3: Move some functions to arm-smmu-v3-common.c Jean-Philippe Brucker
2023-02-01 12:53   ` Jean-Philippe Brucker
2023-02-01 12:53 ` [RFC PATCH 30/45] iommu/arm-smmu-v3: Move queue and table allocation " Jean-Philippe Brucker
2023-02-01 12:53   ` Jean-Philippe Brucker
2024-02-16 12:03   ` Mostafa Saleh
2024-02-16 12:03     ` Mostafa Saleh
2024-02-26 14:19     ` Jean-Philippe Brucker
2024-02-26 14:19       ` Jean-Philippe Brucker
2023-02-01 12:53 ` [RFC PATCH 31/45] iommu/arm-smmu-v3: Move firmware probe to arm-smmu-v3-common Jean-Philippe Brucker
2023-02-01 12:53   ` Jean-Philippe Brucker
2023-02-01 12:53 ` [RFC PATCH 32/45] iommu/arm-smmu-v3: Move IOMMU registration to arm-smmu-v3-common.c Jean-Philippe Brucker
2023-02-01 12:53   ` Jean-Philippe Brucker
2023-02-01 12:53 ` [RFC PATCH 33/45] iommu/arm-smmu-v3: Use single pages for level-2 stream tables Jean-Philippe Brucker
2023-02-01 12:53   ` Jean-Philippe Brucker
2023-02-01 12:53 ` [RFC PATCH 34/45] iommu/arm-smmu-v3: Add host driver for pKVM Jean-Philippe Brucker
2023-02-01 12:53   ` Jean-Philippe Brucker
2023-02-01 12:53 ` [RFC PATCH 35/45] iommu/arm-smmu-v3-kvm: Pass a list of SMMU devices to the hypervisor Jean-Philippe Brucker
2023-02-01 12:53   ` Jean-Philippe Brucker
2023-02-01 12:53 ` [RFC PATCH 36/45] iommu/arm-smmu-v3-kvm: Validate device features Jean-Philippe Brucker
2023-02-01 12:53   ` Jean-Philippe Brucker
2023-02-01 12:53 ` [RFC PATCH 37/45] iommu/arm-smmu-v3-kvm: Allocate structures and reset device Jean-Philippe Brucker
2023-02-01 12:53   ` Jean-Philippe Brucker
2023-02-01 12:53 ` [RFC PATCH 38/45] iommu/arm-smmu-v3-kvm: Add per-cpu page queue Jean-Philippe Brucker
2023-02-01 12:53   ` Jean-Philippe Brucker
2023-02-01 12:53 ` [RFC PATCH 39/45] iommu/arm-smmu-v3-kvm: Initialize page table configuration Jean-Philippe Brucker
2023-02-01 12:53   ` Jean-Philippe Brucker
2023-03-22 10:23   ` Mostafa Saleh
2023-03-22 10:23     ` Mostafa Saleh
2023-03-22 14:42     ` Jean-Philippe Brucker
2023-03-22 14:42       ` Jean-Philippe Brucker
2023-02-01 12:53 ` [RFC PATCH 40/45] iommu/arm-smmu-v3-kvm: Add IOMMU ops Jean-Philippe Brucker
2023-02-01 12:53   ` Jean-Philippe Brucker
2023-02-07 13:22   ` Mostafa Saleh
2023-02-07 13:22     ` Mostafa Saleh
2023-02-08 18:13     ` Jean-Philippe Brucker
2023-02-08 18:13       ` Jean-Philippe Brucker
2023-09-20 16:27   ` Mostafa Saleh
2023-09-20 16:27     ` Mostafa Saleh
2023-09-25 17:18     ` Jean-Philippe Brucker
2023-09-25 17:18       ` Jean-Philippe Brucker
2023-09-26  9:54       ` Mostafa Saleh
2023-09-26  9:54         ` Mostafa Saleh
2023-02-01 12:53 ` [RFC PATCH 41/45] KVM: arm64: pkvm: Add __pkvm_host_add_remove_page() Jean-Philippe Brucker
2023-02-01 12:53   ` Jean-Philippe Brucker
2023-02-01 12:53 ` [RFC PATCH 42/45] KVM: arm64: pkvm: Support SCMI power domain Jean-Philippe Brucker
2023-02-01 12:53   ` Jean-Philippe Brucker
2023-02-07 13:27   ` Mostafa Saleh
2023-02-07 13:27     ` Mostafa Saleh
2023-02-10 19:23     ` Jean-Philippe Brucker
2023-02-10 19:23       ` Jean-Philippe Brucker
2023-02-01 12:53 ` [RFC PATCH 43/45] KVM: arm64: smmu-v3: Support power management Jean-Philippe Brucker
2023-02-01 12:53   ` Jean-Philippe Brucker
2023-02-01 12:53 ` [RFC PATCH 44/45] iommu/arm-smmu-v3-kvm: Support power management with SCMI SMC Jean-Philippe Brucker
2023-02-01 12:53   ` Jean-Philippe Brucker
2023-02-01 12:53 ` [RFC PATCH 45/45] iommu/arm-smmu-v3-kvm: Enable runtime PM Jean-Philippe Brucker
2023-02-01 12:53   ` Jean-Philippe Brucker
2023-02-02  7:07 ` [RFC PATCH 00/45] KVM: Arm SMMUv3 driver for pKVM Tian, Kevin
2023-02-02  7:07   ` Tian, Kevin
2023-02-02 10:05   ` Jean-Philippe Brucker
2023-02-02 10:05     ` Jean-Philippe Brucker
2023-02-03  2:04     ` Tian, Kevin
2023-02-03  2:04       ` Tian, Kevin
2023-02-03  8:39       ` Chen, Jason CJ
2023-02-03  8:39         ` Chen, Jason CJ
2023-02-03 11:23         ` Jean-Philippe Brucker
2023-02-03 11:23           ` Jean-Philippe Brucker
2023-02-04  8:19           ` Chen, Jason CJ
2023-02-04  8:19             ` Chen, Jason CJ
2023-02-04 12:30             ` tina.zhang
2023-02-04 12:30               ` tina.zhang
