* [PATCH 00/11] Linux RISC-V IOMMU Support
@ 2023-07-19 19:33 Tomasz Jeznach
  2023-07-19 19:33 ` [PATCH 01/11] RISC-V: drivers/iommu: Add RISC-V IOMMU - Ziommu support Tomasz Jeznach
                   ` (11 more replies)
  0 siblings, 12 replies; 86+ messages in thread
From: Tomasz Jeznach @ 2023-07-19 19:33 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley
  Cc: Palmer Dabbelt, Albert Ou, Anup Patel, Sunil V L,
	Nick Kossifidis, Sebastien Boeuf, iommu, linux-riscv,
	linux-kernel, linux, Tomasz Jeznach

The RISC-V IOMMU specification is now ratified as per the RISC-V International
process [1]. The latest frozen specification can be found at:
https://github.com/riscv-non-isa/riscv-iommu/releases/download/v1.0/riscv-iommu.pdf

At a high-level, the RISC-V IOMMU specification defines:
1) Memory-mapped programming interface
   - Mandatory and optional registers layout and description.
   - Software guidelines for device initialization and capabilities discovery.
2) In-memory queue interface
   - A command-queue used by software to queue commands to the IOMMU.
   - A fault/event queue used to bring faults and events to software’s attention.
   - A page-request queue used to report “Page Request” messages received from
     PCIe devices.
   - Message-signaled and wire-signaled interrupt mechanisms.
3) In-memory data structures
   - Device-context: used to associate a device with an address space and to hold
     other per-device parameters used by the IOMMU to perform address translations
     (a short C sketch follows this list).
   - Process-contexts: used to associate different virtual address spaces based on
     a device-provided process identification number.
   - MSI page table configuration used to direct an MSI to a guest interrupt file
     in an IMSIC. The MSI page table formats are defined by the Advanced Interrupt
     Architecture specification [2].
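
As an illustration of the device-context entry described above, here is a
minimal C sketch of the extended (MSI_FLAT) 64-byte format. The layout follows
section 2.1 of the specification; the struct name is illustrative only, the
driver's actual definition is struct riscv_iommu_dc in iommu-bits.h:

	struct example_device_context {
		u64 tc;			/* translation control */
		u64 iohgatp;		/* second-stage page table pointer + GSCID */
		u64 ta;			/* translation attributes (PSCID) */
		u64 fsc;		/* first-stage page table or PDT pointer */
		u64 msiptp;		/* MSI page table pointer */
		u64 msi_addr_mask;	/* MSI address mask */
		u64 msi_addr_pattern;	/* MSI address pattern */
		u64 _reserved;
	};

Without the MSI_FLAT capability, a device context uses the base 32-byte format,
i.e. only the first four fields above.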

This series introduces complete single-level translation support in the kernel
driver, including shared virtual addressing (SVA) and the ATS/PRI interfaces.
Patches adding MSI identity remapping and G-Stage translation (GPA to SPA) are
included only to exercise the hardware interfaces, and will be complemented with
AIA/KVM bindings in a follow-up series.

This series is a logical regrouping of a series of incremental patches based on
the RISC-V International IOMMU Task Group discussions and the specification
development process. The original series can be found at the maintainer's
repository branch [3].

These patches can also be found in the riscv_iommu_v1 branch at:
https://github.com/tjeznach/linux/tree/riscv_iommu_v1

To test this series, use QEMU/OpenSBI with the RISC-V IOMMU implementation
available in the riscv_iommu_v1 branch at:
https://github.com/tjeznach/qemu/tree/riscv_iommu_v1

References:
[1] - https://wiki.riscv.org/display/HOME/Specification+Status
[2] - https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf
[3] - https://github.com/tjeznach/qemu/tree/tjeznach/riscv-iommu-20230719


Anup Patel (1):
  dt-bindings: Add RISC-V IOMMU bindings

Tomasz Jeznach (10):
  RISC-V: drivers/iommu: Add RISC-V IOMMU - Ziommu support.
  RISC-V: arch/riscv/config: enable RISC-V IOMMU support
  MAINTAINERS: Add myself for RISC-V IOMMU driver
  RISC-V: drivers/iommu/riscv: Add sysfs interface
  RISC-V: drivers/iommu/riscv: Add command, fault, page-req queues
  RISC-V: drivers/iommu/riscv: Add device context support
  RISC-V: drivers/iommu/riscv: Add page table support
  RISC-V: drivers/iommu/riscv: Add SVA with PASID/ATS/PRI support.
  RISC-V: drivers/iommu/riscv: Add MSI identity remapping
  RISC-V: drivers/iommu/riscv: Add G-Stage translation support

 .../bindings/iommu/riscv,iommu.yaml           |  146 ++
 MAINTAINERS                                   |    7 +
 arch/riscv/configs/defconfig                  |    1 +
 drivers/iommu/Kconfig                         |    1 +
 drivers/iommu/Makefile                        |    2 +-
 drivers/iommu/io-pgtable.c                    |    3 +
 drivers/iommu/riscv/Kconfig                   |   22 +
 drivers/iommu/riscv/Makefile                  |    1 +
 drivers/iommu/riscv/io_pgtable.c              |  266 ++
 drivers/iommu/riscv/iommu-bits.h              |  704 ++++++
 drivers/iommu/riscv/iommu-pci.c               |  206 ++
 drivers/iommu/riscv/iommu-platform.c          |  160 ++
 drivers/iommu/riscv/iommu-sysfs.c             |  183 ++
 drivers/iommu/riscv/iommu.c                   | 2130 +++++++++++++++++
 drivers/iommu/riscv/iommu.h                   |  165 ++
 include/linux/io-pgtable.h                    |    2 +
 16 files changed, 3998 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
 create mode 100644 drivers/iommu/riscv/Kconfig
 create mode 100644 drivers/iommu/riscv/Makefile
 create mode 100644 drivers/iommu/riscv/io_pgtable.c
 create mode 100644 drivers/iommu/riscv/iommu-bits.h
 create mode 100644 drivers/iommu/riscv/iommu-pci.c
 create mode 100644 drivers/iommu/riscv/iommu-platform.c
 create mode 100644 drivers/iommu/riscv/iommu-sysfs.c
 create mode 100644 drivers/iommu/riscv/iommu.c
 create mode 100644 drivers/iommu/riscv/iommu.h

-- 
2.34.1



* [PATCH 01/11] RISC-V: drivers/iommu: Add RISC-V IOMMU - Ziommu support.
  2023-07-19 19:33 [PATCH 00/11] Linux RISC-V IOMMU Support Tomasz Jeznach
@ 2023-07-19 19:33 ` Tomasz Jeznach
  2023-07-19 20:49   ` Conor Dooley
                     ` (7 more replies)
  2023-07-19 19:33 ` [PATCH 02/11] RISC-V: arch/riscv/config: enable RISC-V IOMMU support Tomasz Jeznach
                   ` (10 subsequent siblings)
  11 siblings, 8 replies; 86+ messages in thread
From: Tomasz Jeznach @ 2023-07-19 19:33 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley
  Cc: Palmer Dabbelt, Albert Ou, Anup Patel, Sunil V L,
	Nick Kossifidis, Sebastien Boeuf, iommu, linux-riscv,
	linux-kernel, linux, Tomasz Jeznach

This patch introduces a skeleton IOMMU device driver implementation as defined
by the RISC-V IOMMU Architecture Specification, Version 1.0 [1], with minimal
support for pass-through mapping, basic initialization, and bindings for
platform and PCIe hardware implementations.

The series of patches following the specification's evolution has been
reorganized to provide a functional separation of the implemented blocks,
compliant with the ratified specification.

This and the following patch series include code contributed by Nick Kossifidis
<mick@ics.forth.gr> (iommu-platform device, a number of specification
clarifications, bugfixes and readability improvements) and Sebastien Boeuf
<seb@rivosinc.com> (page table creation, ATS/PGR flow).

The complete history can be found at the maintainer's repository branch [2].

The device driver enables RISC-V 32/64-bit support for memory translation for
DMA-capable PCI and platform devices, a multilevel device directory table, a
process directory, shared virtual addressing, wired and message-signaled
interrupts for translation faults, the page request interface, and command
processing.

A matching RISC-V IOMMU device emulation implementation is available for the
QEMU project, along with educational device extensions for PASID ATS/PRI
support [3].

References:
 - [1] https://github.com/riscv-non-isa/riscv-iommu
 - [2] https://github.com/tjeznach/linux/tree/tjeznach/riscv-iommu
 - [3] https://github.com/tjeznach/qemu/tree/tjeznach/riscv-iommu

Co-developed-by: Nick Kossifidis <mick@ics.forth.gr>
Signed-off-by: Nick Kossifidis <mick@ics.forth.gr>
Co-developed-by: Sebastien Boeuf <seb@rivosinc.com>
Signed-off-by: Sebastien Boeuf <seb@rivosinc.com>
Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
---
 drivers/iommu/Kconfig                |   1 +
 drivers/iommu/Makefile               |   2 +-
 drivers/iommu/riscv/Kconfig          |  22 +
 drivers/iommu/riscv/Makefile         |   1 +
 drivers/iommu/riscv/iommu-bits.h     | 704 +++++++++++++++++++++++++++
 drivers/iommu/riscv/iommu-pci.c      | 134 +++++
 drivers/iommu/riscv/iommu-platform.c |  94 ++++
 drivers/iommu/riscv/iommu.c          | 660 +++++++++++++++++++++++++
 drivers/iommu/riscv/iommu.h          | 115 +++++
 9 files changed, 1732 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/riscv/Kconfig
 create mode 100644 drivers/iommu/riscv/Makefile
 create mode 100644 drivers/iommu/riscv/iommu-bits.h
 create mode 100644 drivers/iommu/riscv/iommu-pci.c
 create mode 100644 drivers/iommu/riscv/iommu-platform.c
 create mode 100644 drivers/iommu/riscv/iommu.c
 create mode 100644 drivers/iommu/riscv/iommu.h

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 2b12b583ef4b..36fcc6fd5b4e 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -187,6 +187,7 @@ config MSM_IOMMU
 source "drivers/iommu/amd/Kconfig"
 source "drivers/iommu/intel/Kconfig"
 source "drivers/iommu/iommufd/Kconfig"
+source "drivers/iommu/riscv/Kconfig"
 
 config IRQ_REMAP
 	bool "Support for Interrupt Remapping"
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 769e43d780ce..8f57110a9fb1 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0
-obj-y += amd/ intel/ arm/ iommufd/
+obj-y += amd/ intel/ arm/ iommufd/ riscv/
 obj-$(CONFIG_IOMMU_API) += iommu.o
 obj-$(CONFIG_IOMMU_API) += iommu-traces.o
 obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
diff --git a/drivers/iommu/riscv/Kconfig b/drivers/iommu/riscv/Kconfig
new file mode 100644
index 000000000000..01d4043849d4
--- /dev/null
+++ b/drivers/iommu/riscv/Kconfig
@@ -0,0 +1,22 @@
+# SPDX-License-Identifier: GPL-2.0-only
+# RISC-V IOMMU support
+
+config RISCV_IOMMU
+	bool "RISC-V IOMMU driver"
+	depends on RISCV
+	select IOMMU_API
+	select IOMMU_DMA
+	select IOMMU_SVA
+	select IOMMU_IOVA
+	select IOMMU_IO_PGTABLE
+	select IOASID
+	select PCI_MSI
+	select PCI_ATS
+	select PCI_PRI
+	select PCI_PASID
+	select MMU_NOTIFIER
+	help
+	  Support for devices following RISC-V IOMMU specification.
+
+	  If unsure, say N here.
+
diff --git a/drivers/iommu/riscv/Makefile b/drivers/iommu/riscv/Makefile
new file mode 100644
index 000000000000..38730c11e4a8
--- /dev/null
+++ b/drivers/iommu/riscv/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_RISCV_IOMMU) += iommu.o iommu-pci.o iommu-platform.o
\ No newline at end of file
diff --git a/drivers/iommu/riscv/iommu-bits.h b/drivers/iommu/riscv/iommu-bits.h
new file mode 100644
index 000000000000..b2946793a73d
--- /dev/null
+++ b/drivers/iommu/riscv/iommu-bits.h
@@ -0,0 +1,704 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright © 2022-2023 Rivos Inc.
+ * Copyright © 2023 FORTH-ICS/CARV
+ * Copyright © 2023 RISC-V IOMMU Task Group
+ *
+ * RISC-V Ziommu - Register Layout and Data Structures.
+ *
+ * Based on the 'RISC-V IOMMU Architecture Specification', Version 1.0
+ * Published at  https://github.com/riscv-non-isa/riscv-iommu
+ *
+ */
+
+#ifndef _RISCV_IOMMU_BITS_H_
+#define _RISCV_IOMMU_BITS_H_
+
+#include <linux/types.h>
+#include <linux/bitfield.h>
+#include <linux/bits.h>
+
+/*
+ * Chapter 5: Memory Mapped register interface
+ */
+
+/* Common field positions */
+#define RISCV_IOMMU_PPN_FIELD		GENMASK_ULL(53, 10)
+#define RISCV_IOMMU_QUEUE_LOGSZ_FIELD	GENMASK_ULL(4, 0)
+#define RISCV_IOMMU_QUEUE_INDEX_FIELD	GENMASK_ULL(31, 0)
+#define RISCV_IOMMU_QUEUE_ENABLE	BIT(0)
+#define RISCV_IOMMU_QUEUE_INTR_ENABLE	BIT(1)
+#define RISCV_IOMMU_QUEUE_MEM_FAULT	BIT(8)
+#define RISCV_IOMMU_QUEUE_OVERFLOW	BIT(9)
+#define RISCV_IOMMU_QUEUE_ACTIVE	BIT(16)
+#define RISCV_IOMMU_QUEUE_BUSY		BIT(17)
+
+#define RISCV_IOMMU_ATP_PPN_FIELD	GENMASK_ULL(43, 0)
+#define RISCV_IOMMU_ATP_MODE_FIELD	GENMASK_ULL(63, 60)
+
+/* 5.3 IOMMU Capabilities (64bits) */
+#define RISCV_IOMMU_REG_CAP		0x0000
+#define RISCV_IOMMU_CAP_VERSION		GENMASK_ULL(7, 0)
+#define RISCV_IOMMU_CAP_S_SV32		BIT_ULL(8)
+#define RISCV_IOMMU_CAP_S_SV39		BIT_ULL(9)
+#define RISCV_IOMMU_CAP_S_SV48		BIT_ULL(10)
+#define RISCV_IOMMU_CAP_S_SV57		BIT_ULL(11)
+#define RISCV_IOMMU_CAP_SVPBMT		BIT_ULL(15)
+#define RISCV_IOMMU_CAP_G_SV32		BIT_ULL(16)
+#define RISCV_IOMMU_CAP_G_SV39		BIT_ULL(17)
+#define RISCV_IOMMU_CAP_G_SV48		BIT_ULL(18)
+#define RISCV_IOMMU_CAP_G_SV57		BIT_ULL(19)
+#define RISCV_IOMMU_CAP_MSI_FLAT	BIT_ULL(22)
+#define RISCV_IOMMU_CAP_MSI_MRIF	BIT_ULL(23)
+#define RISCV_IOMMU_CAP_AMO		BIT_ULL(24)
+#define RISCV_IOMMU_CAP_ATS		BIT_ULL(25)
+#define RISCV_IOMMU_CAP_T2GPA		BIT_ULL(26)
+#define RISCV_IOMMU_CAP_END		BIT_ULL(27)
+#define RISCV_IOMMU_CAP_IGS		GENMASK_ULL(29, 28)
+#define RISCV_IOMMU_CAP_HPM		BIT_ULL(30)
+#define RISCV_IOMMU_CAP_DBG		BIT_ULL(31)
+#define RISCV_IOMMU_CAP_PAS		GENMASK_ULL(37, 32)
+#define RISCV_IOMMU_CAP_PD8		BIT_ULL(38)
+#define RISCV_IOMMU_CAP_PD17		BIT_ULL(39)
+#define RISCV_IOMMU_CAP_PD20		BIT_ULL(40)
+
+#define RISCV_IOMMU_CAP_VERSION_VER_MASK	0xF0
+#define RISCV_IOMMU_CAP_VERSION_REV_MASK	0x0F
+
+/**
+ * enum riscv_iommu_igs_settings - Interrupt Generation Support Settings
+ * @RISCV_IOMMU_CAP_IGS_MSI: I/O MMU supports only MSI generation
+ * @RISCV_IOMMU_CAP_IGS_WSI: I/O MMU supports only Wired-Signaled interrupt
+ * @RISCV_IOMMU_CAP_IGS_BOTH: I/O MMU supports both MSI and WSI generation
+ * @RISCV_IOMMU_CAP_IGS_RSRV: Reserved for standard use
+ */
+enum riscv_iommu_igs_settings {
+	RISCV_IOMMU_CAP_IGS_MSI = 0,
+	RISCV_IOMMU_CAP_IGS_WSI = 1,
+	RISCV_IOMMU_CAP_IGS_BOTH = 2,
+	RISCV_IOMMU_CAP_IGS_RSRV = 3
+};
+
+/* 5.4 Features control register (32bits) */
+#define RISCV_IOMMU_REG_FCTL		0x0008
+#define RISCV_IOMMU_FCTL_BE		BIT(0)
+#define RISCV_IOMMU_FCTL_WSI		BIT(1)
+#define RISCV_IOMMU_FCTL_GXL		BIT(2)
+
+/* 5.5 Device-directory-table pointer (64bits) */
+#define RISCV_IOMMU_REG_DDTP		0x0010
+#define RISCV_IOMMU_DDTP_MODE		GENMASK_ULL(3, 0)
+#define RISCV_IOMMU_DDTP_BUSY		BIT_ULL(4)
+#define RISCV_IOMMU_DDTP_PPN		RISCV_IOMMU_PPN_FIELD
+
+/**
+ * enum riscv_iommu_ddtp_modes - I/O MMU translation modes
+ * @RISCV_IOMMU_DDTP_MODE_OFF: No inbound transactions allowed
+ * @RISCV_IOMMU_DDTP_MODE_BARE: Pass-through mode
+ * @RISCV_IOMMU_DDTP_MODE_1LVL: One-level DDT
+ * @RISCV_IOMMU_DDTP_MODE_2LVL: Two-level DDT
+ * @RISCV_IOMMU_DDTP_MODE_3LVL: Three-level DDT
+ * @RISCV_IOMMU_DDTP_MODE_MAX: Largest mode defined by the spec (3LVL)
+ */
+enum riscv_iommu_ddtp_modes {
+	RISCV_IOMMU_DDTP_MODE_OFF = 0,
+	RISCV_IOMMU_DDTP_MODE_BARE = 1,
+	RISCV_IOMMU_DDTP_MODE_1LVL = 2,
+	RISCV_IOMMU_DDTP_MODE_2LVL = 3,
+	RISCV_IOMMU_DDTP_MODE_3LVL = 4,
+	RISCV_IOMMU_DDTP_MODE_MAX = 4
+};
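+
+/*
+ * Illustrative sketch (not normative): a 3LVL ddtp value for a device
+ * directory root at 4K-aligned physical address ddt_pa would be composed
+ * with the fields above as:
+ *
+ *	ddtp = FIELD_PREP(RISCV_IOMMU_DDTP_MODE, RISCV_IOMMU_DDTP_MODE_3LVL) |
+ *	       FIELD_PREP(RISCV_IOMMU_DDTP_PPN, ddt_pa >> PAGE_SHIFT);
+ */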
+
+/* 5.6 Command Queue Base (64bits) */
+#define RISCV_IOMMU_REG_CQB		0x0018
+#define RISCV_IOMMU_CQB_ENTRIES		RISCV_IOMMU_QUEUE_LOGSZ_FIELD
+#define RISCV_IOMMU_CQB_PPN		RISCV_IOMMU_PPN_FIELD
+
+/* 5.7 Command Queue head (32bits) */
+#define RISCV_IOMMU_REG_CQH		0x0020
+#define RISCV_IOMMU_CQH_INDEX		RISCV_IOMMU_QUEUE_INDEX_FIELD
+
+/* 5.8 Command Queue tail (32bits) */
+#define RISCV_IOMMU_REG_CQT		0x0024
+#define RISCV_IOMMU_CQT_INDEX		RISCV_IOMMU_QUEUE_INDEX_FIELD
+
+/* 5.9 Fault Queue Base (64bits) */
+#define RISCV_IOMMU_REG_FQB		0x0028
+#define RISCV_IOMMU_FQB_ENTRIES		RISCV_IOMMU_QUEUE_LOGSZ_FIELD
+#define RISCV_IOMMU_FQB_PPN		RISCV_IOMMU_PPN_FIELD
+
+/* 5.10 Fault Queue Head (32bits) */
+#define RISCV_IOMMU_REG_FQH		0x0030
+#define RISCV_IOMMU_FQH_INDEX		RISCV_IOMMU_QUEUE_INDEX_FIELD
+
+/* 5.11 Fault Queue tail (32bits) */
+#define RISCV_IOMMU_REG_FQT		0x0034
+#define RISCV_IOMMU_FQT_INDEX		RISCV_IOMMU_QUEUE_INDEX_FIELD
+
+/* 5.12 Page Request Queue base (64bits) */
+#define RISCV_IOMMU_REG_PQB		0x0038
+#define RISCV_IOMMU_PQB_ENTRIES		RISCV_IOMMU_QUEUE_LOGSZ_FIELD
+#define RISCV_IOMMU_PQB_PPN		RISCV_IOMMU_PPN_FIELD
+
+/* 5.13 Page Request Queue head (32bits) */
+#define RISCV_IOMMU_REG_PQH		0x0040
+#define RISCV_IOMMU_PQH_INDEX		RISCV_IOMMU_QUEUE_INDEX_FIELD
+
+/* 5.14 Page Request Queue tail (32bits) */
+#define RISCV_IOMMU_REG_PQT		0x0044
+#define RISCV_IOMMU_PQT_INDEX		RISCV_IOMMU_QUEUE_INDEX_FIELD
+
+/* 5.15 Command Queue CSR (32bits) */
+#define RISCV_IOMMU_REG_CQCSR		0x0048
+#define RISCV_IOMMU_CQCSR_CQEN		RISCV_IOMMU_QUEUE_ENABLE
+#define RISCV_IOMMU_CQCSR_CIE		RISCV_IOMMU_QUEUE_INTR_ENABLE
+#define RISCV_IOMMU_CQCSR_CQMF		RISCV_IOMMU_QUEUE_MEM_FAULT
+#define RISCV_IOMMU_CQCSR_CMD_TO	BIT(9)
+#define RISCV_IOMMU_CQCSR_CMD_ILL	BIT(10)
+#define RISCV_IOMMU_CQCSR_FENCE_W_IP	BIT(11)
+#define RISCV_IOMMU_CQCSR_CQON		RISCV_IOMMU_QUEUE_ACTIVE
+#define RISCV_IOMMU_CQCSR_BUSY		RISCV_IOMMU_QUEUE_BUSY
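+
+/*
+ * Sketch of the command-queue enable handshake implied by the bits above
+ * (illustrative only): set CQEN (and CIE for interrupts), then poll CQCSR
+ * until BUSY clears and CQON reads back as set:
+ *
+ *	writel(RISCV_IOMMU_CQCSR_CQEN | RISCV_IOMMU_CQCSR_CIE,
+ *	       base + RISCV_IOMMU_REG_CQCSR);
+ *	do {
+ *		csr = readl(base + RISCV_IOMMU_REG_CQCSR);
+ *	} while ((csr & RISCV_IOMMU_CQCSR_BUSY) ||
+ *		 !(csr & RISCV_IOMMU_CQCSR_CQON));
+ */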
+
+/* 5.16 Fault Queue CSR (32bits) */
+#define RISCV_IOMMU_REG_FQCSR		0x004C
+#define RISCV_IOMMU_FQCSR_FQEN		RISCV_IOMMU_QUEUE_ENABLE
+#define RISCV_IOMMU_FQCSR_FIE		RISCV_IOMMU_QUEUE_INTR_ENABLE
+#define RISCV_IOMMU_FQCSR_FQMF		RISCV_IOMMU_QUEUE_MEM_FAULT
+#define RISCV_IOMMU_FQCSR_FQOF		RISCV_IOMMU_QUEUE_OVERFLOW
+#define RISCV_IOMMU_FQCSR_FQON		RISCV_IOMMU_QUEUE_ACTIVE
+#define RISCV_IOMMU_FQCSR_BUSY		RISCV_IOMMU_QUEUE_BUSY
+
+/* 5.17 Page Request Queue CSR (32bits) */
+#define RISCV_IOMMU_REG_PQCSR		0x0050
+#define RISCV_IOMMU_PQCSR_PQEN		RISCV_IOMMU_QUEUE_ENABLE
+#define RISCV_IOMMU_PQCSR_PIE		RISCV_IOMMU_QUEUE_INTR_ENABLE
+#define RISCV_IOMMU_PQCSR_PQMF		RISCV_IOMMU_QUEUE_MEM_FAULT
+#define RISCV_IOMMU_PQCSR_PQOF		RISCV_IOMMU_QUEUE_OVERFLOW
+#define RISCV_IOMMU_PQCSR_PQON		RISCV_IOMMU_QUEUE_ACTIVE
+#define RISCV_IOMMU_PQCSR_BUSY		RISCV_IOMMU_QUEUE_BUSY
+
+/* 5.18 Interrupt Pending Status (32bits) */
+#define RISCV_IOMMU_REG_IPSR		0x0054
+
+#define RISCV_IOMMU_INTR_CQ		0
+#define RISCV_IOMMU_INTR_FQ		1
+#define RISCV_IOMMU_INTR_PM		2
+#define RISCV_IOMMU_INTR_PQ		3
+#define RISCV_IOMMU_INTR_COUNT		4
+
+#define RISCV_IOMMU_IPSR_CIP		BIT(RISCV_IOMMU_INTR_CQ)
+#define RISCV_IOMMU_IPSR_FIP		BIT(RISCV_IOMMU_INTR_FQ)
+#define RISCV_IOMMU_IPSR_PMIP		BIT(RISCV_IOMMU_INTR_PM)
+#define RISCV_IOMMU_IPSR_PIP		BIT(RISCV_IOMMU_INTR_PQ)
+
+/* 5.19 Performance monitoring counter overflow status (32bits) */
+#define RISCV_IOMMU_REG_IOCOUNTOVF	0x0058
+#define RISCV_IOMMU_IOCOUNTOVF_CY	BIT(0)
+#define RISCV_IOMMU_IOCOUNTOVF_HPM	GENMASK_ULL(31, 1)
+
+/* 5.20 Performance monitoring counter inhibits (32bits) */
+#define RISCV_IOMMU_REG_IOCOUNTINH	0x005C
+#define RISCV_IOMMU_IOCOUNTINH_CY	BIT(0)
+#define RISCV_IOMMU_IOCOUNTINH_HPM	GENMASK(31, 1)
+
+/* 5.21 Performance monitoring cycles counter (64bits) */
+#define RISCV_IOMMU_REG_IOHPMCYCLES     0x0060
+#define RISCV_IOMMU_IOHPMCYCLES_COUNTER	GENMASK_ULL(62, 0)
+#define RISCV_IOMMU_IOHPMCYCLES_OVF	BIT_ULL(63)
+
+/* 5.22 Performance monitoring event counters (31 * 64bits) */
+#define RISCV_IOMMU_REG_IOHPMCTR_BASE	0x0068
+#define RISCV_IOMMU_REG_IOHPMCTR(_n)	(RISCV_IOMMU_REG_IOHPMCTR_BASE + ((_n) * 0x8))
+
+/* 5.23 Performance monitoring event selectors (31 * 64bits) */
+#define RISCV_IOMMU_REG_IOHPMEVT_BASE	0x0160
+#define RISCV_IOMMU_REG_IOHPMEVT(_n)	(RISCV_IOMMU_REG_IOHPMEVT_BASE + ((_n) * 0x8))
+#define RISCV_IOMMU_IOHPMEVT_EVENT_ID	GENMASK_ULL(14, 0)
+#define RISCV_IOMMU_IOHPMEVT_DMASK	BIT_ULL(15)
+#define RISCV_IOMMU_IOHPMEVT_PID_PSCID	GENMASK_ULL(35, 16)
+#define RISCV_IOMMU_IOHPMEVT_DID_GSCID	GENMASK_ULL(59, 36)
+#define RISCV_IOMMU_IOHPMEVT_PV_PSCV	BIT_ULL(60)
+#define RISCV_IOMMU_IOHPMEVT_DV_GSCV	BIT_ULL(61)
+#define RISCV_IOMMU_IOHPMEVT_IDT	BIT_ULL(62)
+#define RISCV_IOMMU_IOHPMEVT_OF		BIT_ULL(63)
+
+/**
+ * enum riscv_iommu_hpmevent_id - Performance-monitoring event identifier
+ *
+ * @RISCV_IOMMU_HPMEVENT_INVALID: Invalid event, do not count
+ * @RISCV_IOMMU_HPMEVENT_URQ: Untranslated requests
+ * @RISCV_IOMMU_HPMEVENT_TRQ: Translated requests
+ * @RISCV_IOMMU_HPMEVENT_ATS_RQ: ATS translation requests
+ * @RISCV_IOMMU_HPMEVENT_TLB_MISS: TLB misses
+ * @RISCV_IOMMU_HPMEVENT_DD_WALK: Device directory walks
+ * @RISCV_IOMMU_HPMEVENT_PD_WALK: Process directory walks
+ * @RISCV_IOMMU_HPMEVENT_S_VS_WALKS: S/VS-Stage page table walks
+ * @RISCV_IOMMU_HPMEVENT_G_WALKS: G-Stage page table walks
+ * @RISCV_IOMMU_HPMEVENT_MAX: Value to denote maximum Event IDs
+ */
+enum riscv_iommu_hpmevent_id {
+	RISCV_IOMMU_HPMEVENT_INVALID    = 0,
+	RISCV_IOMMU_HPMEVENT_URQ        = 1,
+	RISCV_IOMMU_HPMEVENT_TRQ        = 2,
+	RISCV_IOMMU_HPMEVENT_ATS_RQ     = 3,
+	RISCV_IOMMU_HPMEVENT_TLB_MISS   = 4,
+	RISCV_IOMMU_HPMEVENT_DD_WALK    = 5,
+	RISCV_IOMMU_HPMEVENT_PD_WALK    = 6,
+	RISCV_IOMMU_HPMEVENT_S_VS_WALKS = 7,
+	RISCV_IOMMU_HPMEVENT_G_WALKS    = 8,
+	RISCV_IOMMU_HPMEVENT_MAX        = 9
+};
+
+/* 5.24 Translation request IOVA (64bits) */
+#define RISCV_IOMMU_REG_TR_REQ_IOVA     0x0258
+#define RISCV_IOMMU_TR_REQ_IOVA_VPN	GENMASK_ULL(63, 12)
+
+/* 5.25 Translation request control (64bits) */
+#define RISCV_IOMMU_REG_TR_REQ_CTL	0x0260
+#define RISCV_IOMMU_TR_REQ_CTL_GO_BUSY	BIT_ULL(0)
+#define RISCV_IOMMU_TR_REQ_CTL_PRIV	BIT_ULL(1)
+#define RISCV_IOMMU_TR_REQ_CTL_EXE	BIT_ULL(2)
+#define RISCV_IOMMU_TR_REQ_CTL_NW	BIT_ULL(3)
+#define RISCV_IOMMU_TR_REQ_CTL_PID	GENMASK_ULL(31, 12)
+#define RISCV_IOMMU_TR_REQ_CTL_PV	BIT_ULL(32)
+#define RISCV_IOMMU_TR_REQ_CTL_DID	GENMASK_ULL(63, 40)
+
+/* 5.26 Translation request response (64bits) */
+#define RISCV_IOMMU_REG_TR_RESPONSE	0x0268
+#define RISCV_IOMMU_TR_RESPONSE_FAULT	BIT_ULL(0)
+#define RISCV_IOMMU_TR_RESPONSE_PBMT	GENMASK_ULL(8, 7)
+#define RISCV_IOMMU_TR_RESPONSE_SZ	BIT_ULL(9)
+#define RISCV_IOMMU_TR_RESPONSE_PPN	RISCV_IOMMU_PPN_FIELD
+
+/* 5.27 Interrupt cause to vector (64bits) */
+#define RISCV_IOMMU_REG_IVEC		0x02F8
+#define RISCV_IOMMU_IVEC_CIV		GENMASK_ULL(3, 0)
+#define RISCV_IOMMU_IVEC_FIV		GENMASK_ULL(7, 4)
+#define RISCV_IOMMU_IVEC_PMIV		GENMASK_ULL(11, 8)
+#define RISCV_IOMMU_IVEC_PIV		GENMASK_ULL(15, 12)
+
+/* 5.28 MSI Configuration table (32 * 64bits) */
+#define RISCV_IOMMU_REG_MSI_CONFIG	0x0300
+#define RISCV_IOMMU_REG_MSI_ADDR(_n)	(RISCV_IOMMU_REG_MSI_CONFIG + ((_n) * 0x10))
+#define RISCV_IOMMU_MSI_ADDR		GENMASK_ULL(55, 2)
+#define RISCV_IOMMU_REG_MSI_DATA(_n)	(RISCV_IOMMU_REG_MSI_CONFIG + ((_n) * 0x10) + 0x08)
+#define RISCV_IOMMU_MSI_DATA		GENMASK_ULL(31, 0)
+#define RISCV_IOMMU_REG_MSI_VEC_CTL(_n)	(RISCV_IOMMU_REG_MSI_CONFIG + ((_n) * 0x10) + 0x0C)
+#define RISCV_IOMMU_MSI_VEC_CTL_M	BIT_ULL(0)
+
+#define RISCV_IOMMU_REG_SIZE	0x1000
+
+/*
+ * Chapter 2: Data structures
+ */
+
+/*
+ * Device Directory Table macros for non-leaf nodes
+ */
+#define RISCV_IOMMU_DDTE_VALID	BIT_ULL(0)
+#define RISCV_IOMMU_DDTE_PPN	RISCV_IOMMU_PPN_FIELD
+
+/**
+ * struct riscv_iommu_dc - Device Context
+ * @tc: Translation Control
+ * @iohgatp: I/O Hypervisor guest address translation and protection
+ *	     (Second stage context)
+ * @ta: Translation Attributes
+ * @fsc: First stage context
+ * @msiptp: MSI page table pointer
+ * @msi_addr_mask: MSI address mask
+ * @msi_addr_pattern: MSI address pattern
+ * @_reserved: Reserved for standard use
+ *
+ * This structure is used for leaf nodes on the Device Directory Table.
+ * In case RISCV_IOMMU_CAP_MSI_FLAT is not set, the bottom 4 fields are
+ * not present and are skipped with pointer arithmetic to avoid
+ * casting; check out riscv_iommu_get_dc().
+ * See section 2.1 for more details.
+ */
+struct riscv_iommu_dc {
+	u64 tc;
+	u64 iohgatp;
+	u64 ta;
+	u64 fsc;
+	u64 msiptp;
+	u64 msi_addr_mask;
+	u64 msi_addr_pattern;
+	u64 _reserved;
+};
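+
+/*
+ * Illustrative note (sketch, not part of the spec text): when
+ * RISCV_IOMMU_CAP_MSI_FLAT is clear, the device context is the base
+ * 32-byte format, e.g.:
+ *
+ *	dc_size = msi_flat ? sizeof(struct riscv_iommu_dc)
+ *			   : offsetof(struct riscv_iommu_dc, msiptp);
+ */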
+
+/* Translation control fields */
+#define RISCV_IOMMU_DC_TC_V		BIT_ULL(0)
+#define RISCV_IOMMU_DC_TC_EN_ATS	BIT_ULL(1)
+#define RISCV_IOMMU_DC_TC_EN_PRI	BIT_ULL(2)
+#define RISCV_IOMMU_DC_TC_T2GPA		BIT_ULL(3)
+#define RISCV_IOMMU_DC_TC_DTF		BIT_ULL(4)
+#define RISCV_IOMMU_DC_TC_PDTV		BIT_ULL(5)
+#define RISCV_IOMMU_DC_TC_PRPR		BIT_ULL(6)
+#define RISCV_IOMMU_DC_TC_GADE		BIT_ULL(7)
+#define RISCV_IOMMU_DC_TC_SADE		BIT_ULL(8)
+#define RISCV_IOMMU_DC_TC_DPE		BIT_ULL(9)
+#define RISCV_IOMMU_DC_TC_SBE		BIT_ULL(10)
+#define RISCV_IOMMU_DC_TC_SXL		BIT_ULL(11)
+
+/* Second-stage (aka G-stage) context fields */
+#define RISCV_IOMMU_DC_IOHGATP_PPN	RISCV_IOMMU_ATP_PPN_FIELD
+#define RISCV_IOMMU_DC_IOHGATP_GSCID	GENMASK_ULL(59, 44)
+#define RISCV_IOMMU_DC_IOHGATP_MODE	RISCV_IOMMU_ATP_MODE_FIELD
+
+/**
+ * enum riscv_iommu_dc_iohgatp_modes - Guest address translation/protection modes
+ * @RISCV_IOMMU_DC_IOHGATP_MODE_BARE: No translation/protection
+ * @RISCV_IOMMU_DC_IOHGATP_MODE_SV32X4: Sv32x4 (2-bit extension of Sv32), when fctl.GXL == 1
+ * @RISCV_IOMMU_DC_IOHGATP_MODE_SV39X4: Sv39x4 (2-bit extension of Sv39), when fctl.GXL == 0
+ * @RISCV_IOMMU_DC_IOHGATP_MODE_SV48X4: Sv48x4 (2-bit extension of Sv48), when fctl.GXL == 0
+ * @RISCV_IOMMU_DC_IOHGATP_MODE_SV57X4: Sv57x4 (2-bit extension of Sv57), when fctl.GXL == 0
+ */
+enum riscv_iommu_dc_iohgatp_modes {
+	RISCV_IOMMU_DC_IOHGATP_MODE_BARE = 0,
+	RISCV_IOMMU_DC_IOHGATP_MODE_SV32X4 = 8,
+	RISCV_IOMMU_DC_IOHGATP_MODE_SV39X4 = 8,
+	RISCV_IOMMU_DC_IOHGATP_MODE_SV48X4 = 9,
+	RISCV_IOMMU_DC_IOHGATP_MODE_SV57X4 = 10
+};
+
+/* Translation attributes fields */
+#define RISCV_IOMMU_DC_TA_PSCID		GENMASK_ULL(31, 12)
+
+/* First-stage context fields */
+#define RISCV_IOMMU_DC_FSC_PPN		RISCV_IOMMU_ATP_PPN_FIELD
+#define RISCV_IOMMU_DC_FSC_MODE		RISCV_IOMMU_ATP_MODE_FIELD
+
+/**
+ * enum riscv_iommu_dc_fsc_atp_modes - First stage address translation/protection modes
+ * @RISCV_IOMMU_DC_FSC_MODE_BARE: No translation/protection
+ * @RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV32: Sv32, when dc.tc.SXL == 1
+ * @RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39: Sv39, when dc.tc.SXL == 0
+ * @RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV48: Sv48, when dc.tc.SXL == 0
+ * @RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV57: Sv57, when dc.tc.SXL == 0
+ * @RISCV_IOMMU_DC_FSC_PDTP_MODE_PD8: 1lvl PDT, 8bit process ids
+ * @RISCV_IOMMU_DC_FSC_PDTP_MODE_PD17: 2lvl PDT, 17bit process ids
+ * @RISCV_IOMMU_DC_FSC_PDTP_MODE_PD20: 3lvl PDT, 20bit process ids
+ *
+ * FSC holds IOSATP when RISCV_IOMMU_DC_TC_PDTV is 0 and PDTP otherwise.
+ * IOSATP controls the first stage address translation (same as the satp register on
+ * the RISC-V MMU), and PDTP holds the process directory table, used to select a
+ * first stage page table based on a process id (for devices that support multiple
+ * process ids).
+ */
+enum riscv_iommu_dc_fsc_atp_modes {
+	RISCV_IOMMU_DC_FSC_MODE_BARE = 0,
+	RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV32 = 8,
+	RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39 = 8,
+	RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV48 = 9,
+	RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV57 = 10,
+	RISCV_IOMMU_DC_FSC_PDTP_MODE_PD8 = 1,
+	RISCV_IOMMU_DC_FSC_PDTP_MODE_PD17 = 2,
+	RISCV_IOMMU_DC_FSC_PDTP_MODE_PD20 = 3
+};
+
+/* MSI page table pointer */
+#define RISCV_IOMMU_DC_MSIPTP_PPN	RISCV_IOMMU_ATP_PPN_FIELD
+#define RISCV_IOMMU_DC_MSIPTP_MODE	RISCV_IOMMU_ATP_MODE_FIELD
+#define RISCV_IOMMU_DC_MSIPTP_MODE_OFF	0
+#define RISCV_IOMMU_DC_MSIPTP_MODE_FLAT	1
+
+/* MSI address mask */
+#define RISCV_IOMMU_DC_MSI_ADDR_MASK	GENMASK_ULL(51, 0)
+
+/* MSI address pattern */
+#define RISCV_IOMMU_DC_MSI_PATTERN	GENMASK_ULL(51, 0)
+
+/**
+ * struct riscv_iommu_pc - Process Context
+ * @ta: Translation Attributes
+ * @fsc: First stage context
+ *
+ * This structure is used for leaf nodes on the Process Directory Table
+ * See section 2.3 for more details
+ */
+struct riscv_iommu_pc {
+	u64 ta;
+	u64 fsc;
+};
+
+/* Translation attributes fields */
+#define RISCV_IOMMU_PC_TA_V	BIT_ULL(0)
+#define RISCV_IOMMU_PC_TA_ENS	BIT_ULL(1)
+#define RISCV_IOMMU_PC_TA_SUM	BIT_ULL(2)
+#define RISCV_IOMMU_PC_TA_PSCID	GENMASK_ULL(31, 12)
+
+/* First stage context fields */
+#define RISCV_IOMMU_PC_FSC_PPN	RISCV_IOMMU_ATP_PPN_FIELD
+#define RISCV_IOMMU_PC_FSC_MODE	RISCV_IOMMU_ATP_MODE_FIELD
+
+/*
+ * Chapter 3: In-memory queue interface
+ */
+
+/**
+ * struct riscv_iommu_command - Generic I/O MMU command structure
+ * @dword0: Includes the opcode and the function identifier
+ * @dword1: Opcode specific data
+ *
+ * The commands are interpreted as two 64-bit fields, where the first
+ * 7 bits of the first field are the opcode, which also defines the
+ * command's format, followed by a 3-bit field that specifies the
+ * function invoked by that command; the rest is opcode-specific.
+ * This is a generic struct which will be populated differently
+ * according to each command. For more info on the commands and
+ * the command queue, check section 3.1.
+ */
+struct riscv_iommu_command {
+	u64 dword0;
+	u64 dword1;
+};
+
+/* Fields on dword0, common for all commands */
+#define RISCV_IOMMU_CMD_OPCODE	GENMASK_ULL(6, 0)
+#define RISCV_IOMMU_CMD_FUNC	GENMASK_ULL(9, 7)
+
+/* 3.1.1 I/O MMU Page-table cache invalidation */
+/* Fields on dword0 */
+#define RISCV_IOMMU_CMD_IOTINVAL_OPCODE		1
+#define RISCV_IOMMU_CMD_IOTINVAL_FUNC_VMA	0
+#define RISCV_IOMMU_CMD_IOTINVAL_FUNC_GVMA	1
+#define RISCV_IOMMU_CMD_IOTINVAL_AV		BIT_ULL(10)
+#define RISCV_IOMMU_CMD_IOTINVAL_PSCID		GENMASK_ULL(31, 12)
+#define RISCV_IOMMU_CMD_IOTINVAL_PSCV		BIT_ULL(32)
+#define RISCV_IOMMU_CMD_IOTINVAL_GV		BIT_ULL(33)
+#define RISCV_IOMMU_CMD_IOTINVAL_GSCID		GENMASK_ULL(59, 44)
+/*
+ * dword1 is the address, 4K-aligned and shifted to the right by
+ * two bits.
+ */
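+
+/*
+ * Illustrative encoding of an IOTINVAL.VMA command targeting one PSCID
+ * and one 4K-aligned address (sketch; pscid and addr are assumed inputs):
+ *
+ *	cmd.dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE,
+ *				RISCV_IOMMU_CMD_IOTINVAL_OPCODE) |
+ *		     FIELD_PREP(RISCV_IOMMU_CMD_FUNC,
+ *				RISCV_IOMMU_CMD_IOTINVAL_FUNC_VMA) |
+ *		     FIELD_PREP(RISCV_IOMMU_CMD_IOTINVAL_PSCID, pscid) |
+ *		     RISCV_IOMMU_CMD_IOTINVAL_PSCV | RISCV_IOMMU_CMD_IOTINVAL_AV;
+ *	cmd.dword1 = addr >> 2;
+ */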
+
+/* 3.1.2 I/O MMU Command Queue Fences */
+/* Fields on dword0 */
+#define RISCV_IOMMU_CMD_IOFENCE_OPCODE		2
+#define RISCV_IOMMU_CMD_IOFENCE_FUNC_C		0
+#define RISCV_IOMMU_CMD_IOFENCE_AV		BIT_ULL(10)
+#define RISCV_IOMMU_CMD_IOFENCE_WSI		BIT_ULL(11)
+#define RISCV_IOMMU_CMD_IOFENCE_PR		BIT_ULL(12)
+#define RISCV_IOMMU_CMD_IOFENCE_PW		BIT_ULL(13)
+#define RISCV_IOMMU_CMD_IOFENCE_DATA		GENMASK_ULL(63, 32)
+/*
+ * dword1 is the address, word-size aligned and shifted to the
+ * right by two bits.
+ */
+
+/* 3.1.3 I/O MMU Directory cache invalidation */
+/* Fields on dword0 */
+#define RISCV_IOMMU_CMD_IODIR_OPCODE		3
+#define RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_DDT	0
+#define RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_PDT	1
+#define RISCV_IOMMU_CMD_IODIR_PID		GENMASK_ULL(31, 12)
+#define RISCV_IOMMU_CMD_IODIR_DV		BIT_ULL(33)
+#define RISCV_IOMMU_CMD_IODIR_DID		GENMASK_ULL(63, 40)
+/* dword1 is reserved for standard use */
+
+/* 3.1.4 I/O MMU PCIe ATS */
+/* Fields on dword0 */
+#define RISCV_IOMMU_CMD_ATS_OPCODE		4
+#define RISCV_IOMMU_CMD_ATS_FUNC_INVAL		0
+#define RISCV_IOMMU_CMD_ATS_FUNC_PRGR		1
+#define RISCV_IOMMU_CMD_ATS_PID			GENMASK_ULL(31, 12)
+#define RISCV_IOMMU_CMD_ATS_PV			BIT_ULL(32)
+#define RISCV_IOMMU_CMD_ATS_DSV			BIT_ULL(33)
+#define RISCV_IOMMU_CMD_ATS_RID			GENMASK_ULL(55, 40)
+#define RISCV_IOMMU_CMD_ATS_DSEG		GENMASK_ULL(63, 56)
+/* dword1 is the ATS payload, two different payload types for INVAL and PRGR */
+
+/* ATS.INVAL payload*/
+#define RISCV_IOMMU_CMD_ATS_INVAL_G		BIT_ULL(0)
+/* Bits 1 - 10 are zeroed */
+#define RISCV_IOMMU_CMD_ATS_INVAL_S		BIT_ULL(11)
+#define RISCV_IOMMU_CMD_ATS_INVAL_UADDR		GENMASK_ULL(63, 12)
+
+/* ATS.PRGR payload */
+/* Bits 0 - 31 are zeroed */
+#define RISCV_IOMMU_CMD_ATS_PRGR_PRG_INDEX	GENMASK_ULL(40, 32)
+/* Bits 41 - 43 are zeroed */
+#define RISCV_IOMMU_CMD_ATS_PRGR_RESP_CODE	GENMASK_ULL(47, 44)
+#define RISCV_IOMMU_CMD_ATS_PRGR_DST_ID		GENMASK_ULL(63, 48)
+
+/**
+ * struct riscv_iommu_fq_record - Fault/Event Queue Record
+ * @hdr: Header, includes fault/event cause, PID/DID, transaction type, etc.
+ * @_reserved: Low 32bits for custom use, high 32bits for standard use
+ * @iotval: Transaction-type/cause specific format
+ * @iotval2: Cause specific format
+ *
+ * The fault/event queue reports events and failures raised when
+ * processing transactions. Each record is a 32-byte structure where
+ * the first dword has a fixed format providing generic info
+ * regarding the fault/event, and two more dwords carry
+ * fault/event-specific information. For more details see section
+ * 3.2.
+ */
+struct riscv_iommu_fq_record {
+	u64 hdr;
+	u64 _reserved;
+	u64 iotval;
+	u64 iotval2;
+};
+
+/* Fields on header */
+#define RISCV_IOMMU_FQ_HDR_CAUSE	GENMASK_ULL(11, 0)
+#define RISCV_IOMMU_FQ_HDR_PID		GENMASK_ULL(31, 12)
+#define RISCV_IOMMU_FQ_HDR_PV		BIT_ULL(32)
+#define RISCV_IOMMU_FQ_HDR_PRIV		BIT_ULL(33)
+#define RISCV_IOMMU_FQ_HDR_TTYPE	GENMASK_ULL(39, 34)
+#define RISCV_IOMMU_FQ_HDR_DID		GENMASK_ULL(63, 40)
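+
+/*
+ * Illustrative decode of a fault record header (sketch; rec is an
+ * assumed struct riscv_iommu_fq_record pointer):
+ *
+ *	cause = FIELD_GET(RISCV_IOMMU_FQ_HDR_CAUSE, rec->hdr);
+ *	devid = FIELD_GET(RISCV_IOMMU_FQ_HDR_DID, rec->hdr);
+ *	if (rec->hdr & RISCV_IOMMU_FQ_HDR_PV)
+ *		pasid = FIELD_GET(RISCV_IOMMU_FQ_HDR_PID, rec->hdr);
+ */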
+
+/**
+ * enum riscv_iommu_fq_causes - Fault/event cause values
+ * @RISCV_IOMMU_FQ_CAUSE_INST_FAULT: Instruction access fault
+ * @RISCV_IOMMU_FQ_CAUSE_RD_ADDR_MISALIGNED: Read address misaligned
+ * @RISCV_IOMMU_FQ_CAUSE_RD_FAULT: Read load fault
+ * @RISCV_IOMMU_FQ_CAUSE_WR_ADDR_MISALIGNED: Write/AMO address misaligned
+ * @RISCV_IOMMU_FQ_CAUSE_WR_FAULT: Write/AMO access fault
+ * @RISCV_IOMMU_FQ_CAUSE_INST_FAULT_S: Instruction page fault
+ * @RISCV_IOMMU_FQ_CAUSE_RD_FAULT_S: Read page fault
+ * @RISCV_IOMMU_FQ_CAUSE_WR_FAULT_S: Write/AMO page fault
+ * @RISCV_IOMMU_FQ_CAUSE_INST_FAULT_VS: Instruction guest page fault
+ * @RISCV_IOMMU_FQ_CAUSE_RD_FAULT_VS: Read guest page fault
+ * @RISCV_IOMMU_FQ_CAUSE_WR_FAULT_VS: Write/AMO guest page fault
+ * @RISCV_IOMMU_FQ_CAUSE_DMA_DISABLED: All inbound transactions disallowed
+ * @RISCV_IOMMU_FQ_CAUSE_DDT_LOAD_FAULT: DDT entry load access fault
+ * @RISCV_IOMMU_FQ_CAUSE_DDT_INVALID: DDT entry invalid
+ * @RISCV_IOMMU_FQ_CAUSE_DDT_MISCONFIGURED: DDT entry misconfigured
+ * @RISCV_IOMMU_FQ_CAUSE_TTYPE_BLOCKED: Transaction type disallowed
+ * @RISCV_IOMMU_FQ_CAUSE_MSI_LOAD_FAULT: MSI PTE load access fault
+ * @RISCV_IOMMU_FQ_CAUSE_MSI_INVALID: MSI PTE invalid
+ * @RISCV_IOMMU_FQ_CAUSE_MSI_MISCONFIGURED: MSI PTE misconfigured
+ * @RISCV_IOMMU_FQ_CAUSE_MRIF_FAULT: MRIF access fault
+ * @RISCV_IOMMU_FQ_CAUSE_PDT_LOAD_FAULT: PDT entry load access fault
+ * @RISCV_IOMMU_FQ_CAUSE_PDT_INVALID: PDT entry invalid
+ * @RISCV_IOMMU_FQ_CAUSE_PDT_MISCONFIGURED: PDT entry misconfigured
+ * @RISCV_IOMMU_FQ_CAUSE_DDT_CORRUPTED: DDT data corruption
+ * @RISCV_IOMMU_FQ_CAUSE_PDT_CORRUPTED: PDT data corruption
+ * @RISCV_IOMMU_FQ_CAUSE_MSI_PT_CORRUPTED: MSI page table data corruption
+ * @RISCV_IOMMU_FQ_CAUSE_MRIF_CORRUPTED: MRIF data corruption
+ * @RISCV_IOMMU_FQ_CAUSE_INTERNAL_DP_ERROR: Internal data path error
+ * @RISCV_IOMMU_FQ_CAUSE_MSI_WR_FAULT: IOMMU MSI write access fault
+ * @RISCV_IOMMU_FQ_CAUSE_PT_CORRUPTED: First/second stage page table data corruption
+ *
+ * Values are defined in table 11 of the spec; encodings 275 - 2047 are
+ * reserved for standard use, and 2048 - 4095 for custom use.
+ */
+enum riscv_iommu_fq_causes {
+	RISCV_IOMMU_FQ_CAUSE_INST_FAULT = 1,
+	RISCV_IOMMU_FQ_CAUSE_RD_ADDR_MISALIGNED = 4,
+	RISCV_IOMMU_FQ_CAUSE_RD_FAULT = 5,
+	RISCV_IOMMU_FQ_CAUSE_WR_ADDR_MISALIGNED = 6,
+	RISCV_IOMMU_FQ_CAUSE_WR_FAULT = 7,
+	RISCV_IOMMU_FQ_CAUSE_INST_FAULT_S = 12,
+	RISCV_IOMMU_FQ_CAUSE_RD_FAULT_S = 13,
+	RISCV_IOMMU_FQ_CAUSE_WR_FAULT_S = 15,
+	RISCV_IOMMU_FQ_CAUSE_INST_FAULT_VS = 20,
+	RISCV_IOMMU_FQ_CAUSE_RD_FAULT_VS = 21,
+	RISCV_IOMMU_FQ_CAUSE_WR_FAULT_VS = 23,
+	RISCV_IOMMU_FQ_CAUSE_DMA_DISABLED = 256,
+	RISCV_IOMMU_FQ_CAUSE_DDT_LOAD_FAULT = 257,
+	RISCV_IOMMU_FQ_CAUSE_DDT_INVALID = 258,
+	RISCV_IOMMU_FQ_CAUSE_DDT_MISCONFIGURED = 259,
+	RISCV_IOMMU_FQ_CAUSE_TTYPE_BLOCKED = 260,
+	RISCV_IOMMU_FQ_CAUSE_MSI_LOAD_FAULT = 261,
+	RISCV_IOMMU_FQ_CAUSE_MSI_INVALID = 262,
+	RISCV_IOMMU_FQ_CAUSE_MSI_MISCONFIGURED = 263,
+	RISCV_IOMMU_FQ_CAUSE_MRIF_FAULT = 264,
+	RISCV_IOMMU_FQ_CAUSE_PDT_LOAD_FAULT = 265,
+	RISCV_IOMMU_FQ_CAUSE_PDT_INVALID = 266,
+	RISCV_IOMMU_FQ_CAUSE_PDT_MISCONFIGURED = 267,
+	RISCV_IOMMU_FQ_CAUSE_DDT_CORRUPTED = 268,
+	RISCV_IOMMU_FQ_CAUSE_PDT_CORRUPTED = 269,
+	RISCV_IOMMU_FQ_CAUSE_MSI_PT_CORRUPTED = 270,
+	RISCV_IOMMU_FQ_CAUSE_MRIF_CORRUPTED = 271,
+	RISCV_IOMMU_FQ_CAUSE_INTERNAL_DP_ERROR = 272,
+	RISCV_IOMMU_FQ_CAUSE_MSI_WR_FAULT = 273,
+	RISCV_IOMMU_FQ_CAUSE_PT_CORRUPTED = 274
+};
+
+/**
+ * enum riscv_iommu_fq_ttypes - Fault/event transaction types
+ * @RISCV_IOMMU_FQ_TTYPE_NONE: None. Fault not caused by an inbound transaction.
+ * @RISCV_IOMMU_FQ_TTYPE_UADDR_INST_FETCH: Instruction fetch from untranslated address
+ * @RISCV_IOMMU_FQ_TTYPE_UADDR_RD: Read from untranslated address
+ * @RISCV_IOMMU_FQ_TTYPE_UADDR_WR: Write/AMO to untranslated address
+ * @RISCV_IOMMU_FQ_TTYPE_TADDR_INST_FETCH: Instruction fetch from translated address
+ * @RISCV_IOMMU_FQ_TTYPE_TADDR_RD: Read from translated address
+ * @RISCV_IOMMU_FQ_TTYPE_TADDR_WR: Write/AMO to translated address
+ * @RISCV_IOMMU_FQ_TTYPE_PCIE_ATS_REQ: PCIe ATS translation request
+ * @RISCV_IOMMU_FQ_TTYPE_PCIE_MSG_REQ: PCIe message request
+ *
+ * Values are defined in table 12 of the spec; type 4 and types 10 - 31 are
+ * reserved for standard use, and types 32 - 63 for custom use.
+ */
+enum riscv_iommu_fq_ttypes {
+	RISCV_IOMMU_FQ_TTYPE_NONE = 0,
+	RISCV_IOMMU_FQ_TTYPE_UADDR_INST_FETCH = 1,
+	RISCV_IOMMU_FQ_TTYPE_UADDR_RD = 2,
+	RISCV_IOMMU_FQ_TTYPE_UADDR_WR = 3,
+	RISCV_IOMMU_FQ_TTYPE_TADDR_INST_FETCH = 5,
+	RISCV_IOMMU_FQ_TTYPE_TADDR_RD = 6,
+	RISCV_IOMMU_FQ_TTYPE_TADDR_WR = 7,
+	RISCV_IOMMU_FQ_TTYPE_PCIE_ATS_REQ = 8,
+	RISCV_IOMMU_FQ_TTYPE_PCIE_MSG_REQ = 9,
+};
+
+/**
+ * struct riscv_iommu_pq_record - PCIe Page Request record
+ * @hdr: Header, includes PID, DID, etc.
+ * @payload: Holds the page address, request group and permission bits
+ *
+ * For more info on the PCIe Page Request queue see chapter 3.3.
+ */
+struct riscv_iommu_pq_record {
+	u64 hdr;
+	u64 payload;
+};
+
+/* Header fields */
+#define RISCV_IOMMU_PREQ_HDR_PID	GENMASK_ULL(31, 12)
+#define RISCV_IOMMU_PREQ_HDR_PV		BIT_ULL(32)
+#define RISCV_IOMMU_PREQ_HDR_PRIV	BIT_ULL(33)
+#define RISCV_IOMMU_PREQ_HDR_EXEC	BIT_ULL(34)
+#define RISCV_IOMMU_PREQ_HDR_DID	GENMASK_ULL(63, 40)
+
+/* Payload fields */
+#define RISCV_IOMMU_PREQ_PAYLOAD_R	BIT_ULL(0)
+#define RISCV_IOMMU_PREQ_PAYLOAD_W	BIT_ULL(1)
+#define RISCV_IOMMU_PREQ_PAYLOAD_L	BIT_ULL(2)
+#define RISCV_IOMMU_PREQ_PAYLOAD_M	GENMASK_ULL(2, 0)	/* Mask of RWL for convenience */
+#define RISCV_IOMMU_PREQ_PRG_INDEX	GENMASK_ULL(11, 3)
+#define RISCV_IOMMU_PREQ_UADDR		GENMASK_ULL(63, 12)
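+
+/*
+ * Illustrative mapping of a page-request payload to generic PRI semantics
+ * (sketch; req is an assumed struct riscv_iommu_pq_record pointer and 4K
+ * pages are assumed):
+ *
+ *	read  = !!(req->payload & RISCV_IOMMU_PREQ_PAYLOAD_R);
+ *	write = !!(req->payload & RISCV_IOMMU_PREQ_PAYLOAD_W);
+ *	last  = !!(req->payload & RISCV_IOMMU_PREQ_PAYLOAD_L);
+ *	addr  = FIELD_GET(RISCV_IOMMU_PREQ_UADDR, req->payload) << PAGE_SHIFT;
+ */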
+
+/**
+ * struct riscv_iommu_msi_pte - MSI Page Table Entry
+ * @pte: MSI PTE
+ * @mrif_info: Memory-resident interrupt file info
+ *
+ * The MSI Page Table is used for virtualizing MSIs, so that when
+ * a device sends an MSI to a guest, the IOMMU can reroute it
+ * by translating the MSI address, either to a guest interrupt file
+ * or a memory resident interrupt file (MRIF). Note that this page table
+ * is an array of MSI PTEs, not a multi-level page table; each entry
+ * is a leaf entry. For more info check out the AIA spec, chapter 9.5.
+ *
+ * Also, in basic mode the mrif_info field is ignored by the IOMMU and can
+ * be used by software; any other reserved fields on the pte must be
+ * zeroed out by software.
+ */
+struct riscv_iommu_msi_pte {
+	u64 pte;
+	u64 mrif_info;
+};
+
+/* Fields on pte */
+#define RISCV_IOMMU_MSI_PTE_V		BIT_ULL(0)
+#define RISCV_IOMMU_MSI_PTE_M		GENMASK_ULL(2, 1)
+#define RISCV_IOMMU_MSI_PTE_MRIF_ADDR	GENMASK_ULL(53, 7)	/* When M == 1 (MRIF mode) */
+#define RISCV_IOMMU_MSI_PTE_PPN		RISCV_IOMMU_PPN_FIELD	/* When M == 3 (basic mode) */
+#define RISCV_IOMMU_MSI_PTE_C		BIT_ULL(63)
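+
+/*
+ * Illustrative basic-mode (M == 3) MSI PTE redirecting MSI writes to the
+ * guest interrupt file page ppn (sketch, not normative):
+ *
+ *	pte->pte = RISCV_IOMMU_MSI_PTE_V |
+ *		   FIELD_PREP(RISCV_IOMMU_MSI_PTE_M, 3) |
+ *		   FIELD_PREP(RISCV_IOMMU_MSI_PTE_PPN, ppn);
+ */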
+
+/* Fields on mrif_info */
+#define RISCV_IOMMU_MSI_MRIF_NID	GENMASK_ULL(9, 0)
+#define RISCV_IOMMU_MSI_MRIF_NPPN	RISCV_IOMMU_PPN_FIELD
+#define RISCV_IOMMU_MSI_MRIF_NID_MSB	BIT_ULL(60)
+
+#endif /* _RISCV_IOMMU_BITS_H_ */
diff --git a/drivers/iommu/riscv/iommu-pci.c b/drivers/iommu/riscv/iommu-pci.c
new file mode 100644
index 000000000000..c91f963d7a29
--- /dev/null
+++ b/drivers/iommu/riscv/iommu-pci.c
@@ -0,0 +1,134 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/*
+ * Copyright © 2022-2023 Rivos Inc.
+ * Copyright © 2023 FORTH-ICS/CARV
+ *
+ * RISCV IOMMU as a PCIe device
+ *
+ * Authors
+ *	Tomasz Jeznach <tjeznach@rivosinc.com>
+ *	Nick Kossifidis <mick@ics.forth.gr>
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/compiler.h>
+#include <linux/pci.h>
+#include <linux/init.h>
+#include <linux/iommu.h>
+#include <linux/bitfield.h>
+
+#include "iommu.h"
+
+/* Rivos Inc. assigned PCI Vendor and Device IDs */
+#ifndef PCI_VENDOR_ID_RIVOS
+#define PCI_VENDOR_ID_RIVOS             0x1efd
+#endif
+
+#ifndef PCI_DEVICE_ID_RIVOS_IOMMU
+#define PCI_DEVICE_ID_RIVOS_IOMMU       0xedf1
+#endif
+
+static int riscv_iommu_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
+{
+	struct device *dev = &pdev->dev;
+	struct riscv_iommu_device *iommu;
+	int ret;
+
+	ret = pci_enable_device_mem(pdev);
+	if (ret < 0)
+		return ret;
+
+	ret = pci_request_mem_regions(pdev, KBUILD_MODNAME);
+	if (ret < 0)
+		goto fail;
+
+	ret = -ENOMEM;
+
+	iommu = devm_kzalloc(dev, sizeof(*iommu), GFP_KERNEL);
+	if (!iommu)
+		goto fail;
+
+	if (!(pci_resource_flags(pdev, 0) & IORESOURCE_MEM))
+		goto fail;
+
+	if (pci_resource_len(pdev, 0) < RISCV_IOMMU_REG_SIZE)
+		goto fail;
+
+	iommu->reg_phys = pci_resource_start(pdev, 0);
+	if (!iommu->reg_phys)
+		goto fail;
+
+	iommu->reg = devm_ioremap(dev, iommu->reg_phys, RISCV_IOMMU_REG_SIZE);
+	if (!iommu->reg)
+		goto fail;
+
+	iommu->dev = dev;
+	dev_set_drvdata(dev, iommu);
+
+	dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
+	pci_set_master(pdev);
+
+	ret = riscv_iommu_init(iommu);
+	if (!ret)
+		return ret;
+
+ fail:
+	pci_clear_master(pdev);
+	pci_release_regions(pdev);
+	pci_disable_device(pdev);
+	/* Note: devres_release_all() will release iommu and iommu->reg */
+	return ret;
+}
+
+static void riscv_iommu_pci_remove(struct pci_dev *pdev)
+{
+	riscv_iommu_remove(dev_get_drvdata(&pdev->dev));
+	pci_clear_master(pdev);
+	pci_release_regions(pdev);
+	pci_disable_device(pdev);
+}
+
+static int riscv_iommu_suspend(struct device *dev)
+{
+	dev_warn(dev, "RISC-V IOMMU PM not implemented");
+	return -ENODEV;
+}
+
+static int riscv_iommu_resume(struct device *dev)
+{
+	dev_warn(dev, "RISC-V IOMMU PM not implemented");
+	return -ENODEV;
+}
+
+static DEFINE_SIMPLE_DEV_PM_OPS(riscv_iommu_pm_ops, riscv_iommu_suspend,
+				riscv_iommu_resume);
+
+static const struct pci_device_id riscv_iommu_pci_tbl[] = {
+	{PCI_VENDOR_ID_RIVOS, PCI_DEVICE_ID_RIVOS_IOMMU,
+	 PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0},
+	{0,}
+};
+
+MODULE_DEVICE_TABLE(pci, riscv_iommu_pci_tbl);
+
+static const struct of_device_id riscv_iommu_of_match[] = {
+	{.compatible = "riscv,pci-iommu",},
+	{},
+};
+
+MODULE_DEVICE_TABLE(of, riscv_iommu_of_match);
+
+static struct pci_driver riscv_iommu_pci_driver = {
+	.name = KBUILD_MODNAME,
+	.id_table = riscv_iommu_pci_tbl,
+	.probe = riscv_iommu_pci_probe,
+	.remove = riscv_iommu_pci_remove,
+	.driver = {
+		   .pm = pm_sleep_ptr(&riscv_iommu_pm_ops),
+		   .of_match_table = riscv_iommu_of_match,
+		   },
+};
+
+module_pci_driver(riscv_iommu_pci_driver);
diff --git a/drivers/iommu/riscv/iommu-platform.c b/drivers/iommu/riscv/iommu-platform.c
new file mode 100644
index 000000000000..e4e8ca6711e7
--- /dev/null
+++ b/drivers/iommu/riscv/iommu-platform.c
@@ -0,0 +1,94 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * RISC-V IOMMU as a platform device
+ *
+ * Copyright © 2023 FORTH-ICS/CARV
+ *
+ * Author: Nick Kossifidis <mick@ics.forth.gr>
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/of_platform.h>
+#include <linux/bitfield.h>
+
+#include "iommu-bits.h"
+#include "iommu.h"
+
+static int riscv_iommu_platform_probe(struct platform_device *pdev)
+{
+	struct device *dev = &pdev->dev;
+	struct riscv_iommu_device *iommu = NULL;
+	struct resource *res = NULL;
+	int ret = 0;
+
+	iommu = devm_kzalloc(dev, sizeof(*iommu), GFP_KERNEL);
+	if (!iommu)
+		return -ENOMEM;
+
+	iommu->dev = dev;
+	dev_set_drvdata(dev, iommu);
+
+	iommu->reg = devm_platform_get_and_ioremap_resource(pdev, 0, &res);
+	if (IS_ERR(iommu->reg)) {
+		ret = dev_err_probe(dev, PTR_ERR(iommu->reg),
+				    "could not map register region\n");
+		goto fail;
+	}
+
+	iommu->reg_phys = res->start;
+
+	ret = -ENODEV;
+
+	/* Sanity check: Did we get the whole register space ? */
+	if (resource_size(res) < RISCV_IOMMU_REG_SIZE) {
+		dev_err(dev, "device region smaller than register file (0x%llx)\n",
+			(unsigned long long)resource_size(res));
+		goto fail;
+	}
+
+	dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
+
+	return riscv_iommu_init(iommu);
+
+ fail:
+	/* Note: devres_release_all() will release iommu and iommu->reg */
+	return ret;
+}
+
+static void riscv_iommu_platform_remove(struct platform_device *pdev)
+{
+	riscv_iommu_remove(dev_get_drvdata(&pdev->dev));
+}
+
+static void riscv_iommu_platform_shutdown(struct platform_device *pdev)
+{
+	/* Nothing to do yet; translation state is left unchanged. */
+}
+
+static const struct of_device_id riscv_iommu_of_match[] = {
+	{.compatible = "riscv,iommu",},
+	{},
+};
+
+MODULE_DEVICE_TABLE(of, riscv_iommu_of_match);
+
+static struct platform_driver riscv_iommu_platform_driver = {
+	.driver = {
+		   .name = "riscv,iommu",
+		   .of_match_table = riscv_iommu_of_match,
+		   .suppress_bind_attrs = true,
+		   },
+	.probe = riscv_iommu_platform_probe,
+	.remove_new = riscv_iommu_platform_remove,
+	.shutdown = riscv_iommu_platform_shutdown,
+};
+
+module_platform_driver(riscv_iommu_platform_driver);
diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
new file mode 100644
index 000000000000..8c236242e2cc
--- /dev/null
+++ b/drivers/iommu/riscv/iommu.c
@@ -0,0 +1,660 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * IOMMU API for RISC-V architected Ziommu implementations.
+ *
+ * Copyright © 2022-2023 Rivos Inc.
+ * Copyright © 2023 FORTH-ICS/CARV
+ *
+ * Authors
+ *	Tomasz Jeznach <tjeznach@rivosinc.com>
+ *	Nick Kossifidis <mick@ics.forth.gr>
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/bitfield.h>
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/compiler.h>
+#include <linux/pci.h>
+#include <linux/pci-ats.h>
+#include <linux/init.h>
+#include <linux/completion.h>
+#include <linux/uaccess.h>
+#include <linux/iommu.h>
+#include <linux/irqdomain.h>
+#include <linux/platform_device.h>
+#include <linux/dma-map-ops.h>
+#include <asm/page.h>
+
+#include "../dma-iommu.h"
+#include "../iommu-sva.h"
+#include "iommu.h"
+
+#include <asm/csr.h>
+#include <asm/delay.h>
+
+MODULE_DESCRIPTION("IOMMU driver for RISC-V architected Ziommu implementations");
+MODULE_AUTHOR("Tomasz Jeznach <tjeznach@rivosinc.com>");
+MODULE_AUTHOR("Nick Kossifidis <mick@ics.forth.gr>");
+MODULE_ALIAS("riscv-iommu");
+MODULE_LICENSE("GPL v2");
+
+/* Global IOMMU params. */
+static int ddt_mode = RISCV_IOMMU_DDTP_MODE_BARE;
+module_param(ddt_mode, int, 0644);
+MODULE_PARM_DESC(ddt_mode, "Device Directory Table mode.");
+
+/* IOMMU PSCID allocation namespace. */
+#define RISCV_IOMMU_MAX_PSCID	(1U << 20)
+static DEFINE_IDA(riscv_iommu_pscids);
+
+/* 1 second */
+#define RISCV_IOMMU_TIMEOUT	riscv_timebase
+
+/* RISC-V IOMMU PPN <> PHYS address conversions, PHYS <=> PPN[53:10] */
+#define phys_to_ppn(pa)  (((pa) >> 2) & (((1ULL << 44) - 1) << 10))
+#define ppn_to_phys(pn)	 (((pn) << 2) & (((1ULL << 44) - 1) << 12))
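+
+/*
+ * Worked example (illustrative): a 4K-aligned physical address 0x80200000
+ * has PPN 0x80200; phys_to_ppn() yields 0x80200 << 10, i.e. the PPN placed
+ * in register bits [53:10], and ppn_to_phys() reverses the transformation.
+ */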
+
+#define iommu_domain_to_riscv(iommu_domain) \
+	container_of(iommu_domain, struct riscv_iommu_domain, domain)
+
+#define iommu_device_to_riscv(iommu_device) \
+	container_of(iommu_device, struct riscv_iommu, iommu)
+
+static const struct iommu_domain_ops riscv_iommu_domain_ops;
+static const struct iommu_ops riscv_iommu_ops;
+
+/*
+ * Register device for IOMMU tracking.
+ */
+static void riscv_iommu_add_device(struct riscv_iommu_device *iommu, struct device *dev)
+{
+	struct riscv_iommu_endpoint *ep, *rb_ep;
+	struct rb_node **new_node, *parent_node = NULL;
+
+	mutex_lock(&iommu->eps_mutex);
+
+	ep = dev_iommu_priv_get(dev);
+
+	new_node = &(iommu->eps.rb_node);
+	while (*new_node) {
+		rb_ep = rb_entry(*new_node, struct riscv_iommu_endpoint, node);
+		parent_node = *new_node;
+		if (rb_ep->devid > ep->devid) {
+			new_node = &((*new_node)->rb_left);
+		} else if (rb_ep->devid < ep->devid) {
+			new_node = &((*new_node)->rb_right);
+		} else {
+			dev_warn(dev, "device %u already in the tree\n", ep->devid);
+			break;
+		}
+	}
+
+	rb_link_node(&ep->node, parent_node, new_node);
+	rb_insert_color(&ep->node, &iommu->eps);
+
+	mutex_unlock(&iommu->eps_mutex);
+}
+
+/*
+ * Endpoint management
+ */
+
+static int riscv_iommu_of_xlate(struct device *dev, struct of_phandle_args *args)
+{
+	return iommu_fwspec_add_ids(dev, args->args, 1);
+}
+
+static bool riscv_iommu_capable(struct device *dev, enum iommu_cap cap)
+{
+	switch (cap) {
+	case IOMMU_CAP_CACHE_COHERENCY:
+	case IOMMU_CAP_PRE_BOOT_PROTECTION:
+		return true;
+
+	default:
+		break;
+	}
+
+	return false;
+}
+
+static struct iommu_device *riscv_iommu_probe_device(struct device *dev)
+{
+	struct riscv_iommu_device *iommu;
+	struct riscv_iommu_endpoint *ep;
+	struct iommu_fwspec *fwspec;
+
+	fwspec = dev_iommu_fwspec_get(dev);
+	if (!fwspec || fwspec->ops != &riscv_iommu_ops ||
+	    !fwspec->iommu_fwnode || !fwspec->iommu_fwnode->dev)
+		return ERR_PTR(-ENODEV);
+
+	iommu = dev_get_drvdata(fwspec->iommu_fwnode->dev);
+	if (!iommu)
+		return ERR_PTR(-ENODEV);
+
+	if (dev_iommu_priv_get(dev))
+		return &iommu->iommu;
+
+	ep = kzalloc(sizeof(*ep), GFP_KERNEL);
+	if (!ep)
+		return ERR_PTR(-ENOMEM);
+
+	mutex_init(&ep->lock);
+	INIT_LIST_HEAD(&ep->domain);
+
+	if (dev_is_pci(dev)) {
+		ep->devid = pci_dev_id(to_pci_dev(dev));
+		ep->domid = pci_domain_nr(to_pci_dev(dev)->bus);
+	} else {
+		/* TODO: Make this generic, for now hardcode domain id to 0 */
+		ep->devid = fwspec->ids[0];
+		ep->domid = 0;
+	}
+
+	ep->iommu = iommu;
+	ep->dev = dev;
+
+	dev_info(iommu->dev, "adding device to iommu with devid %i in domain %i\n",
+		ep->devid, ep->domid);
+
+	dev_iommu_priv_set(dev, ep);
+	riscv_iommu_add_device(iommu, dev);
+
+	return &iommu->iommu;
+}
+
+static void riscv_iommu_probe_finalize(struct device *dev)
+{
+	set_dma_ops(dev, NULL);
+	iommu_setup_dma_ops(dev, 0, U64_MAX);
+}
+
+static void riscv_iommu_release_device(struct device *dev)
+{
+	struct riscv_iommu_endpoint *ep = dev_iommu_priv_get(dev);
+	struct riscv_iommu_device *iommu = ep->iommu;
+
+	dev_info(dev, "device with devid %i released\n", ep->devid);
+
+	mutex_lock(&ep->lock);
+	list_del(&ep->domain);
+	mutex_unlock(&ep->lock);
+
+	/* Remove endpoint from IOMMU tracking structures */
+	mutex_lock(&iommu->eps_mutex);
+	rb_erase(&ep->node, &iommu->eps);
+	mutex_unlock(&iommu->eps_mutex);
+
+	set_dma_ops(dev, NULL);
+	dev_iommu_priv_set(dev, NULL);
+
+	kfree(ep);
+}
+
+static struct iommu_group *riscv_iommu_device_group(struct device *dev)
+{
+	if (dev_is_pci(dev))
+		return pci_device_group(dev);
+	return generic_device_group(dev);
+}
+
+static void riscv_iommu_get_resv_regions(struct device *dev, struct list_head *head)
+{
+	iommu_dma_get_resv_regions(dev, head);
+}
+
+/*
+ * Domain management
+ */
+
+static struct iommu_domain *riscv_iommu_domain_alloc(unsigned type)
+{
+	struct riscv_iommu_domain *domain;
+
+	if (type != IOMMU_DOMAIN_IDENTITY &&
+	    type != IOMMU_DOMAIN_BLOCKED)
+		return NULL;
+
+	domain = kzalloc(sizeof(*domain), GFP_KERNEL);
+	if (!domain)
+		return NULL;
+
+	mutex_init(&domain->lock);
+	INIT_LIST_HEAD(&domain->endpoints);
+
+	domain->domain.ops = &riscv_iommu_domain_ops;
+	domain->mode = RISCV_IOMMU_DC_FSC_MODE_BARE;
+	domain->pscid = ida_alloc_range(&riscv_iommu_pscids, 1,
+					RISCV_IOMMU_MAX_PSCID, GFP_KERNEL);
+
+	pr_debug("domain type %x alloc %u\n", type, domain->pscid);
+
+	return &domain->domain;
+}
+
+static void riscv_iommu_domain_free(struct iommu_domain *iommu_domain)
+{
+	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
+
+	if (!list_empty(&domain->endpoints))
+		pr_warn("IOMMU domain is not empty!\n");
+
+	if (domain->pgd_root)
+		free_pages((unsigned long)domain->pgd_root, 0);
+
+	if ((int)domain->pscid > 0)
+		ida_free(&riscv_iommu_pscids, domain->pscid);
+
+	pr_debug("domain free %u\n", domain->pscid);
+
+	kfree(domain);
+}
+
+static int riscv_iommu_domain_finalize(struct riscv_iommu_domain *domain,
+				       struct riscv_iommu_device *iommu)
+{
+	struct iommu_domain_geometry *geometry;
+
+	/* Domain assigned to another iommu */
+	if (domain->iommu && domain->iommu != iommu)
+		return -EINVAL;
+	/* Domain already initialized */
+	else if (domain->iommu)
+		return 0;
+
+	/*
+	 * TODO: Before using VA_BITS and satp_mode here, verify they
+	 * are supported by the iommu, through the capabilities register.
+	 */
+
+	geometry = &domain->domain.geometry;
+
+	/*
+	 * Note: RISC-V Privilege spec mandates that virtual addresses
+	 * need to be sign-extended, so if (VA_BITS - 1) is set, all
+	 * bits >= VA_BITS need to also be set or else we'll get a
+	 * page fault. However the code that creates the mappings
+	 * above us (e.g. iommu_dma_alloc_iova()) won't do that for us
+	 * for now, so we'll end up with invalid virtual addresses
+	 * to map. As a workaround until we get this sorted out
+	 * limit the available virtual addresses to VA_BITS - 1.
+	 */
+	geometry->aperture_start = 0;
+	geometry->aperture_end = DMA_BIT_MASK(VA_BITS - 1);
+	geometry->force_aperture = true;
+
+	domain->iommu = iommu;
+
+	if (domain->domain.type == IOMMU_DOMAIN_IDENTITY)
+		return 0;
+
+	/* TODO: Fix this for RV32 */
+	domain->mode = satp_mode >> 60;
+	domain->pgd_root = (pgd_t *) __get_free_pages(GFP_KERNEL | __GFP_ZERO, 0);
+
+	if (!domain->pgd_root)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static int riscv_iommu_attach_dev(struct iommu_domain *iommu_domain, struct device *dev)
+{
+	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
+	struct riscv_iommu_endpoint *ep = dev_iommu_priv_get(dev);
+	int ret;
+
+	/* PSCID not valid */
+	if ((int)domain->pscid < 0)
+		return -ENOMEM;
+
+	mutex_lock(&domain->lock);
+	mutex_lock(&ep->lock);
+
+	if (!list_empty(&ep->domain)) {
+		dev_warn(dev, "endpoint already attached to a domain. dropping\n");
+		list_del_init(&ep->domain);
+	}
+
+	/* allocate root pages, initialize io-pgtable ops, etc. */
+	ret = riscv_iommu_domain_finalize(domain, ep->iommu);
+	if (ret < 0) {
+		dev_err(dev, "can not finalize domain: %d\n", ret);
+		mutex_unlock(&ep->lock);
+		mutex_unlock(&domain->lock);
+		return ret;
+	}
+
+	if (ep->iommu->ddt_mode != RISCV_IOMMU_DDTP_MODE_BARE ||
+	    domain->domain.type != IOMMU_DOMAIN_IDENTITY) {
+		dev_warn(dev, "domain type %d not supported\n",
+		    domain->domain.type);
+		mutex_unlock(&ep->lock);
+		mutex_unlock(&domain->lock);
+		return -ENODEV;
+	}
+
+	list_add_tail(&ep->domain, &domain->endpoints);
+	mutex_unlock(&ep->lock);
+	mutex_unlock(&domain->lock);
+
+	dev_info(dev, "domain type %d attached w/ PSCID %u\n",
+	    domain->domain.type, domain->pscid);
+
+	return 0;
+}
+
+static void riscv_iommu_flush_iotlb_range(struct iommu_domain *iommu_domain,
+					  unsigned long *start, unsigned long *end,
+					  size_t *pgsize)
+{
+	/* Command interface not implemented */
+}
+
+static void riscv_iommu_flush_iotlb_all(struct iommu_domain *iommu_domain)
+{
+	riscv_iommu_flush_iotlb_range(iommu_domain, NULL, NULL, NULL);
+}
+
+static void riscv_iommu_iotlb_sync(struct iommu_domain *iommu_domain,
+				   struct iommu_iotlb_gather *gather)
+{
+	riscv_iommu_flush_iotlb_range(iommu_domain, &gather->start, &gather->end,
+				      &gather->pgsize);
+}
+
+static void riscv_iommu_iotlb_sync_map(struct iommu_domain *iommu_domain,
+				       unsigned long iova, size_t size)
+{
+	unsigned long end = iova + size - 1;
+	/*
+	 * Given we don't know the page size used by this range, we assume the
+	 * smallest page size to ensure all possible entries are flushed from
+	 * the IOATC.
+	 */
+	size_t pgsize = PAGE_SIZE;
+
+	riscv_iommu_flush_iotlb_range(iommu_domain, &iova, &end, &pgsize);
+}
+
+static int riscv_iommu_map_pages(struct iommu_domain *iommu_domain,
+				 unsigned long iova, phys_addr_t phys,
+				 size_t pgsize, size_t pgcount, int prot,
+				 gfp_t gfp, size_t *mapped)
+{
+	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
+
+	if (domain->domain.type == IOMMU_DOMAIN_IDENTITY) {
+		*mapped = pgsize * pgcount;
+		return 0;
+	}
+
+	return -ENODEV;
+}
+
+static size_t riscv_iommu_unmap_pages(struct iommu_domain *iommu_domain,
+				      unsigned long iova, size_t pgsize,
+				      size_t pgcount, struct iommu_iotlb_gather *gather)
+{
+	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
+
+	if (domain->domain.type == IOMMU_DOMAIN_IDENTITY)
+		return pgsize * pgcount;
+
+	return 0;
+}
+
+static phys_addr_t riscv_iommu_iova_to_phys(struct iommu_domain *iommu_domain,
+					    dma_addr_t iova)
+{
+	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
+
+	if (domain->domain.type == IOMMU_DOMAIN_IDENTITY)
+		return (phys_addr_t) iova;
+
+	return 0;
+}
+
+/*
+ * Translation mode setup
+ */
+
+static u64 riscv_iommu_get_ddtp(struct riscv_iommu_device *iommu)
+{
+	u64 ddtp;
+	cycles_t end_cycles = RISCV_IOMMU_TIMEOUT + get_cycles();
+
+	/* Wait for DDTP.BUSY to be cleared and return latest value */
+	do {
+		ddtp = riscv_iommu_readq(iommu, RISCV_IOMMU_REG_DDTP);
+		if (!(ddtp & RISCV_IOMMU_DDTP_BUSY))
+			break;
+		cpu_relax();
+	} while (get_cycles() < end_cycles);
+
+	return ddtp;
+}
+
+static void riscv_iommu_ddt_cleanup(struct riscv_iommu_device *iommu)
+{
+	/* TODO: teardown whole device directory tree. */
+	if (iommu->ddtp) {
+		if (iommu->ddtp_in_iomem)
+			iounmap((void *)iommu->ddtp);
+		else
+			free_page(iommu->ddtp);
+		iommu->ddtp = 0;
+	}
+}
+
+static int riscv_iommu_enable(struct riscv_iommu_device *iommu, unsigned requested_mode)
+{
+	struct device *dev = iommu->dev;
+	u64 ddtp = 0;
+	u64 ddtp_paddr = 0;
+	unsigned mode = requested_mode;
+	unsigned mode_readback = 0;
+
+	ddtp = riscv_iommu_get_ddtp(iommu);
+	if (ddtp & RISCV_IOMMU_DDTP_BUSY)
+		return -EBUSY;
+
+	/* Disallow state transition from xLVL to xLVL. */
+	switch (FIELD_GET(RISCV_IOMMU_DDTP_MODE, ddtp)) {
+	case RISCV_IOMMU_DDTP_MODE_BARE:
+	case RISCV_IOMMU_DDTP_MODE_OFF:
+		break;
+	default:
+		if ((mode != RISCV_IOMMU_DDTP_MODE_BARE)
+		    && (mode != RISCV_IOMMU_DDTP_MODE_OFF))
+			return -EINVAL;
+		break;
+	}
+
+ retry:
+	switch (mode) {
+	case RISCV_IOMMU_DDTP_MODE_BARE:
+	case RISCV_IOMMU_DDTP_MODE_OFF:
+		riscv_iommu_ddt_cleanup(iommu);
+		ddtp = FIELD_PREP(RISCV_IOMMU_DDTP_MODE, mode);
+		break;
+	case RISCV_IOMMU_DDTP_MODE_1LVL:
+	case RISCV_IOMMU_DDTP_MODE_2LVL:
+	case RISCV_IOMMU_DDTP_MODE_3LVL:
+		if (!iommu->ddtp) {
+			/*
+			 * We haven't initialized ddtp yet, since it's WARL make
+			 * sure that we don't have a hardwired PPN field there
+			 * that points to i/o memory instead.
+			 */
+			riscv_iommu_writeq(iommu, RISCV_IOMMU_REG_DDTP, 0);
+			ddtp = riscv_iommu_get_ddtp(iommu);
+			ddtp_paddr = ppn_to_phys(ddtp);
+			if (ddtp_paddr) {
+				dev_warn(dev, "ddtp at 0x%llx\n", ddtp_paddr);
+				iommu->ddtp =
+				    (unsigned long)ioremap(ddtp_paddr, PAGE_SIZE);
+				iommu->ddtp_in_iomem = true;
+			} else {
+				iommu->ddtp = get_zeroed_page(GFP_KERNEL);
+			}
+		}
+		if (!iommu->ddtp)
+			return -ENOMEM;
+
+		ddtp = FIELD_PREP(RISCV_IOMMU_DDTP_MODE, mode) |
+		    phys_to_ppn(__pa(iommu->ddtp));
+
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	riscv_iommu_writeq(iommu, RISCV_IOMMU_REG_DDTP, ddtp);
+	ddtp = riscv_iommu_get_ddtp(iommu);
+	if (ddtp & RISCV_IOMMU_DDTP_BUSY) {
+		dev_warn(dev, "timeout when setting ddtp (ddt mode: %i)\n", mode);
+		return -EBUSY;
+	}
+
+	mode_readback = FIELD_GET(RISCV_IOMMU_DDTP_MODE, ddtp);
+	dev_info(dev, "mode_readback: %i, mode: %i\n", mode_readback, mode);
+	if (mode_readback != mode) {
+		/*
+		 * Mode field is WARL, an I/O MMU may support a subset of
+		 * directory table levels in which case if we tried to set
+		 * an unsupported number of levels we'll readback either
+		 * a valid xLVL or off/bare. If we got off/bare, try again
+		 * with a smaller xLVL.
+		 */
+		if (mode_readback < RISCV_IOMMU_DDTP_MODE_1LVL &&
+		    mode > RISCV_IOMMU_DDTP_MODE_1LVL) {
+			mode--;
+			goto retry;
+		}
+
+		/*
+		 * We tried all supported xLVL modes and still got off/bare instead.
+		 * An I/O MMU must support at least one xLVL mode, so something
+		 * went very wrong.
+		 */
+		if (mode_readback < RISCV_IOMMU_DDTP_MODE_1LVL &&
+		    mode == RISCV_IOMMU_DDTP_MODE_1LVL)
+			goto fail;
+
+		/*
+		 * We tried setting off or bare and got something else back, something
+		 * went very wrong since off/bare is always legal.
+		 */
+		if (mode < RISCV_IOMMU_DDTP_MODE_1LVL)
+			goto fail;
+
+		/*
+		 * We tried setting an xLVL mode but got another xLVL mode that
+		 * we don't support (e.g. a custom one).
+		 */
+		if (mode_readback > RISCV_IOMMU_DDTP_MODE_MAX)
+			goto fail;
+
+		/* We tried setting an xLVL mode but got another supported xLVL mode */
+		mode = mode_readback;
+	}
+
+	if (mode != requested_mode)
+		dev_warn(dev, "unsupported DDT mode requested (%i), using %i instead\n",
+			 requested_mode, mode);
+
+	iommu->ddt_mode = mode;
+	dev_info(dev, "ddt_mode: %i\n", iommu->ddt_mode);
+	return 0;
+
+ fail:
+	dev_err(dev, "failed to set DDT mode, tried: %i and got %i\n", mode,
+		mode_readback);
+	riscv_iommu_ddt_cleanup(iommu);
+	return -EINVAL;
+}
+
+/*
+ * Common I/O MMU driver probe/teardown
+ */
+
+static const struct iommu_domain_ops riscv_iommu_domain_ops = {
+	.free = riscv_iommu_domain_free,
+	.attach_dev = riscv_iommu_attach_dev,
+	.map_pages = riscv_iommu_map_pages,
+	.unmap_pages = riscv_iommu_unmap_pages,
+	.iova_to_phys = riscv_iommu_iova_to_phys,
+	.iotlb_sync = riscv_iommu_iotlb_sync,
+	.iotlb_sync_map = riscv_iommu_iotlb_sync_map,
+	.flush_iotlb_all = riscv_iommu_flush_iotlb_all,
+};
+
+static const struct iommu_ops riscv_iommu_ops = {
+	.owner = THIS_MODULE,
+	.pgsize_bitmap = SZ_4K | SZ_2M | SZ_1G,
+	.capable = riscv_iommu_capable,
+	.domain_alloc = riscv_iommu_domain_alloc,
+	.probe_device = riscv_iommu_probe_device,
+	.probe_finalize = riscv_iommu_probe_finalize,
+	.release_device = riscv_iommu_release_device,
+	.device_group = riscv_iommu_device_group,
+	.get_resv_regions = riscv_iommu_get_resv_regions,
+	.of_xlate = riscv_iommu_of_xlate,
+	.default_domain_ops = &riscv_iommu_domain_ops,
+};
+
+void riscv_iommu_remove(struct riscv_iommu_device *iommu)
+{
+	iommu_device_unregister(&iommu->iommu);
+	riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_OFF);
+}
+
+int riscv_iommu_init(struct riscv_iommu_device *iommu)
+{
+	struct device *dev = iommu->dev;
+	u32 fctl = 0;
+	int ret;
+
+	iommu->eps = RB_ROOT;
+
+	fctl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_FCTL);
+
+#ifdef CONFIG_CPU_BIG_ENDIAN
+	if (!(iommu->cap & RISCV_IOMMU_CAP_END)) {
+		dev_err(dev, "IOMMU doesn't support Big Endian\n");
+		return -EIO;
+	} else if (!(fctl & RISCV_IOMMU_FCTL_BE)) {
+		fctl |= FIELD_PREP(RISCV_IOMMU_FCTL_BE, 1);
+		riscv_iommu_writel(iommu, RISCV_IOMMU_REG_FCTL, fctl);
+	}
+#endif
+
+	/* Clear any pending interrupt flag. */
+	riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IPSR,
+			   RISCV_IOMMU_IPSR_CIP |
+			   RISCV_IOMMU_IPSR_FIP |
+			   RISCV_IOMMU_IPSR_PMIP | RISCV_IOMMU_IPSR_PIP);
+	spin_lock_init(&iommu->cq_lock);
+	mutex_init(&iommu->eps_mutex);
+
+	ret = riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_BARE);
+
+	if (ret) {
+		dev_err(dev, "cannot enable iommu device (%d)\n", ret);
+		goto fail;
+	}
+
+	ret = iommu_device_register(&iommu->iommu, &riscv_iommu_ops, dev);
+	if (ret) {
+		dev_err(dev, "cannot register iommu interface (%d)\n", ret);
+		goto fail;
+	}
+
+	return 0;
+ fail:
+	riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_OFF);
+	return ret;
+}
diff --git a/drivers/iommu/riscv/iommu.h b/drivers/iommu/riscv/iommu.h
new file mode 100644
index 000000000000..7baefd3630b3
--- /dev/null
+++ b/drivers/iommu/riscv/iommu.h
@@ -0,0 +1,115 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright © 2022-2023 Rivos Inc.
+ * Copyright © 2023 FORTH-ICS/CARV
+ *
+ * RISC-V Ziommu - IOMMU Interface Specification.
+ *
+ * Authors
+ *	Tomasz Jeznach <tjeznach@rivosinc.com>
+ *	Nick Kossifidis <mick@ics.forth.gr>
+ */
+
+#ifndef _RISCV_IOMMU_H_
+#define _RISCV_IOMMU_H_
+
+#include <linux/types.h>
+#include <linux/iova.h>
+#include <linux/io.h>
+#include <linux/idr.h>
+#include <linux/list.h>
+#include <linux/iommu.h>
+#include <linux/io-pgtable.h>
+
+#include "iommu-bits.h"
+
+#define IOMMU_PAGE_SIZE_4K	BIT_ULL(12)
+#define IOMMU_PAGE_SIZE_2M	BIT_ULL(21)
+#define IOMMU_PAGE_SIZE_1G	BIT_ULL(30)
+#define IOMMU_PAGE_SIZE_512G	BIT_ULL(39)
+
+struct riscv_iommu_device {
+	struct iommu_device iommu;	/* iommu core interface */
+	struct device *dev;		/* iommu hardware */
+
+	/* hardware control register space */
+	void __iomem *reg;
+	resource_size_t reg_phys;
+
+	/* IRQs for the various queues */
+	int irq_cmdq;
+	int irq_fltq;
+	int irq_pm;
+	int irq_priq;
+
+	/* supported and enabled hardware capabilities */
+	u64 cap;
+
+	/* global lock, to be removed */
+	spinlock_t cq_lock;
+
+	/* device directory table root pointer and mode */
+	unsigned long ddtp;
+	unsigned ddt_mode;
+	bool ddtp_in_iomem;
+
+	/* Connected end-points */
+	struct rb_root eps;
+	struct mutex eps_mutex;
+};
+
+struct riscv_iommu_domain {
+	struct iommu_domain domain;
+
+	struct list_head endpoints;
+	struct mutex lock;
+	struct riscv_iommu_device *iommu;
+
+	unsigned mode;		/* RISCV_IOMMU_DC_FSC_MODE_* enum */
+	unsigned pscid;		/* RISC-V IOMMU PSCID */
+
+	pgd_t *pgd_root;	/* page table root pointer */
+};
+
+/* Private dev_iommu_priv object, device-domain relationship. */
+struct riscv_iommu_endpoint {
+	struct device *dev;			/* platform or PCI endpoint device */
+	unsigned devid;      			/* PCI bus:device:function number */
+	unsigned domid;    			/* PCI domain number, segment */
+	struct rb_node node;    		/* device tracking node (lookup by devid) */
+	struct riscv_iommu_device *iommu;	/* parent iommu device */
+
+	struct mutex lock;
+	struct list_head domain;		/* endpoint attached managed domain */
+};
+
+/* Helper functions and macros */
+
+static inline u32 riscv_iommu_readl(struct riscv_iommu_device *iommu,
+				    unsigned offset)
+{
+	return readl_relaxed(iommu->reg + offset);
+}
+
+static inline void riscv_iommu_writel(struct riscv_iommu_device *iommu,
+				      unsigned offset, u32 val)
+{
+	writel_relaxed(val, iommu->reg + offset);
+}
+
+static inline u64 riscv_iommu_readq(struct riscv_iommu_device *iommu,
+				    unsigned offset)
+{
+	return readq_relaxed(iommu->reg + offset);
+}
+
+static inline void riscv_iommu_writeq(struct riscv_iommu_device *iommu,
+				      unsigned offset, u64 val)
+{
+	writeq_relaxed(val, iommu->reg + offset);
+}
+
+int riscv_iommu_init(struct riscv_iommu_device *iommu);
+void riscv_iommu_remove(struct riscv_iommu_device *iommu);
+
+#endif
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH 02/11] RISC-V: arch/riscv/config: enable RISC-V IOMMU support
  2023-07-19 19:33 [PATCH 00/13] Linux RISC-V IOMMU Support Tomasz Jeznach
  2023-07-19 19:33 ` [PATCH 01/11] RISC-V: drivers/iommu: Add RISC-V IOMMU - Ziommu support Tomasz Jeznach
@ 2023-07-19 19:33 ` Tomasz Jeznach
  2023-07-19 20:22   ` Conor Dooley
  2023-07-19 19:33 ` [PATCH 03/11] dt-bindings: Add RISC-V IOMMU bindings Tomasz Jeznach
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 86+ messages in thread
From: Tomasz Jeznach @ 2023-07-19 19:33 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley
  Cc: Palmer Dabbelt, Albert Ou, Anup Patel, Sunil V L,
	Nick Kossifidis, Sebastien Boeuf, iommu, linux-riscv,
	linux-kernel, linux, Tomasz Jeznach

---
 arch/riscv/configs/defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/riscv/configs/defconfig b/arch/riscv/configs/defconfig
index 0a0107460a5c..1a0c3b24329f 100644
--- a/arch/riscv/configs/defconfig
+++ b/arch/riscv/configs/defconfig
@@ -178,6 +178,7 @@ CONFIG_VIRTIO_PCI=y
 CONFIG_VIRTIO_BALLOON=y
 CONFIG_VIRTIO_INPUT=y
 CONFIG_VIRTIO_MMIO=y
+CONFIG_RISCV_IOMMU=y
 CONFIG_SUN8I_DE2_CCU=m
 CONFIG_SUN50I_IOMMU=y
 CONFIG_RPMSG_CHAR=y
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH 03/11] dt-bindings: Add RISC-V IOMMU bindings
  2023-07-19 19:33 [PATCH 00/13] Linux RISC-V IOMMU Support Tomasz Jeznach
  2023-07-19 19:33 ` [PATCH 01/11] RISC-V: drivers/iommu: Add RISC-V IOMMU - Ziommu support Tomasz Jeznach
  2023-07-19 19:33 ` [PATCH 02/11] RISC-V: arch/riscv/config: enable RISC-V IOMMU support Tomasz Jeznach
@ 2023-07-19 19:33 ` Tomasz Jeznach
  2023-07-19 20:19   ` Conor Dooley
  2023-07-24  8:03   ` Zong Li
  2023-07-19 19:33 ` [PATCH 04/11] MAINTAINERS: Add myself for RISC-V IOMMU driver Tomasz Jeznach
                   ` (8 subsequent siblings)
  11 siblings, 2 replies; 86+ messages in thread
From: Tomasz Jeznach @ 2023-07-19 19:33 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley
  Cc: Palmer Dabbelt, Albert Ou, Anup Patel, Sunil V L,
	Nick Kossifidis, Sebastien Boeuf, iommu, linux-riscv,
	linux-kernel, linux

From: Anup Patel <apatel@ventanamicro.com>

We add a DT bindings document for RISC-V IOMMU platform and PCI devices
as defined by the RISC-V IOMMU specification.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
---
 .../bindings/iommu/riscv,iommu.yaml           | 146 ++++++++++++++++++
 1 file changed, 146 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/iommu/riscv,iommu.yaml

diff --git a/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml b/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
new file mode 100644
index 000000000000..8a9aedb61768
--- /dev/null
+++ b/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
@@ -0,0 +1,146 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/iommu/riscv,iommu.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: RISC-V IOMMU Implementation
+
+maintainers:
+  - Tomasz Jeznach <tjeznach@rivosinc.com>
+
+description:
+  The RISC-V IOMMU specification defines an IOMMU for RISC-V platforms
+  which can be a regular platform device or a PCI device connected to
+  the host root port.
+
+  The RISC-V IOMMU provides two-stage translation, a device directory
+  table, a command queue, and fault reporting as wired interrupts or
+  MSI-X events for both PCI and platform devices.
+
+  Visit https://github.com/riscv-non-isa/riscv-iommu for more details.
+
+properties:
+  compatible:
+    oneOf:
+      - description: RISC-V IOMMU as a platform device
+        items:
+          - enum:
+              - vendor,chip-iommu
+          - const: riscv,iommu
+
+      - description: RISC-V IOMMU as a PCI device connected to root port
+        items:
+          - enum:
+              - vendor,chip-pci-iommu
+          - const: riscv,pci-iommu
+
+  reg:
+    maxItems: 1
+    description:
+      For RISC-V IOMMU as a platform device, this represents the MMIO base
+      address of registers.
+
+      For RISC-V IOMMU as a PCI device, this represents the PCI-PCI bridge
+      details as described in Documentation/devicetree/bindings/pci/pci.txt
+
+  '#iommu-cells':
+    const: 2
+    description: |
+      Each IOMMU specifier represents the base device ID and number of
+      device IDs.
+
+  interrupts:
+    minItems: 1
+    maxItems: 16
+    description:
+      The presence of this property implies that the given RISC-V IOMMU
+      uses wired interrupts to notify the RISC-V HARTs (or CPUs).
+
+  msi-parent:
+    description:
+      The presence of this property implies that the given RISC-V IOMMU
+      uses MSI-X to notify the RISC-V HARTs (or CPUs). This property should be
+      considered only when the interrupts property is absent.
+
+  dma-coherent:
+    description:
+      Present if page table walks and DMA accesses made by the RISC-V IOMMU
+      are cache coherent with the CPU.
+
+  power-domains:
+    maxItems: 1
+
+required:
+  - compatible
+  - reg
+  - '#iommu-cells'
+
+additionalProperties: false
+
+examples:
+  - |
+    /* Example 1 (IOMMU platform device with wired interrupts) */
+    immu1: iommu@1bccd000 {
+        compatible = "vendor,chip-iommu", "riscv,iommu";
+        reg = <0x1bccd000 0x1000>;
+        interrupt-parent = <&aplic_smode>;
+        interrupts = <32 4>, <33 4>, <34 4>, <35 4>;
+        #iommu-cells = <2>;
+    };
+
+    /* Device with two IOMMU device IDs, 0 and 7 */
+    master1 {
+        iommus = <&immu1 0 1>, <&immu1 7 1>;
+    };
+
+  - |
+    /* Example 2 (IOMMU platform device with MSIs) */
+    immu2: iommu@1bcdd000 {
+        compatible = "vendor,chip-iommu", "riscv,iommu";
+        reg = <0x1bcdd000 0x1000>;
+        msi-parent = <&imsics_smode>;
+        #iommu-cells = <2>;
+    };
+
+    bus {
+        #address-cells = <2>;
+        #size-cells = <2>;
+
+        /* Device with IOMMU device IDs ranging from 32 to 63 */
+        master1 {
+                iommus = <&immu2 32 32>;
+        };
+
+        pcie@40000000 {
+            compatible = "pci-host-cam-generic";
+            device_type = "pci";
+            #address-cells = <3>;
+            #size-cells = <2>;
+            bus-range = <0x0 0x1>;
+
+            /* CPU_PHYSICAL(2)  SIZE(2) */
+            reg = <0x0 0x40000000 0x0 0x1000000>;
+
+            /* BUS_ADDRESS(3)  CPU_PHYSICAL(2)  SIZE(2) */
+            ranges = <0x01000000 0x0 0x01000000  0x0 0x01000000  0x0 0x00010000>,
+                     <0x02000000 0x0 0x41000000  0x0 0x41000000  0x0 0x3f000000>;
+
+            #interrupt-cells = <0x1>;
+
+            /* PCI_DEVICE(3)  INT#(1)  CONTROLLER(PHANDLE)  CONTROLLER_DATA(2) */
+            interrupt-map = <   0x0 0x0 0x0  0x1  &aplic_smode  0x4 0x1>,
+                            < 0x800 0x0 0x0  0x1  &aplic_smode  0x5 0x1>,
+                            <0x1000 0x0 0x0  0x1  &aplic_smode  0x6 0x1>,
+                            <0x1800 0x0 0x0  0x1  &aplic_smode  0x7 0x1>;
+
+            /* PCI_DEVICE(3)  INT#(1) */
+            interrupt-map-mask = <0xf800 0x0 0x0  0x7>;
+
+            msi-parent = <&imsics_smode>;
+
+            /* Devices with bus number 0-127 are mastered via immu2 */
+            iommu-map = <0x0000 &immu2 0x0000 0x8000>;
+        };
+    };
+...
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH 04/11] MAINTAINERS: Add myself for RISC-V IOMMU driver
  2023-07-19 19:33 [PATCH 00/13] Linux RISC-V IOMMU Support Tomasz Jeznach
                   ` (2 preceding siblings ...)
  2023-07-19 19:33 ` [PATCH 03/11] dt-bindings: Add RISC-V IOMMU bindings Tomasz Jeznach
@ 2023-07-19 19:33 ` Tomasz Jeznach
  2023-07-20 12:42   ` Baolu Lu
  2023-07-19 19:33 ` [PATCH 05/11] RISC-V: drivers/iommu/riscv: Add sysfs interface Tomasz Jeznach
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 86+ messages in thread
From: Tomasz Jeznach @ 2023-07-19 19:33 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley
  Cc: Palmer Dabbelt, Albert Ou, Anup Patel, Sunil V L,
	Nick Kossifidis, Sebastien Boeuf, iommu, linux-riscv,
	linux-kernel, linux, Tomasz Jeznach

Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
---
 MAINTAINERS | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index aee340630eca..d28b1b99f4c6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -18270,6 +18270,13 @@ F:	arch/riscv/
 N:	riscv
 K:	riscv
 
+RISC-V IOMMU
+M:	Tomasz Jeznach <tjeznach@rivosinc.com>
+L:	linux-riscv@lists.infradead.org
+S:	Maintained
+F:	Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
+F:	drivers/iommu/riscv/
+
 RISC-V MICROCHIP FPGA SUPPORT
 M:	Conor Dooley <conor.dooley@microchip.com>
 M:	Daire McNamara <daire.mcnamara@microchip.com>
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH 05/11] RISC-V: drivers/iommu/riscv: Add sysfs interface
  2023-07-19 19:33 [PATCH 00/13] Linux RISC-V IOMMU Support Tomasz Jeznach
                   ` (3 preceding siblings ...)
  2023-07-19 19:33 ` [PATCH 04/11] MAINTAINERS: Add myself for RISC-V IOMMU driver Tomasz Jeznach
@ 2023-07-19 19:33 ` Tomasz Jeznach
  2023-07-20  6:38   ` Krzysztof Kozlowski
  2023-07-20 12:50   ` Baolu Lu
  2023-07-19 19:33 ` [PATCH 06/11] RISC-V: drivers/iommu/riscv: Add command, fault, page-req queues Tomasz Jeznach
                   ` (6 subsequent siblings)
  11 siblings, 2 replies; 86+ messages in thread
From: Tomasz Jeznach @ 2023-07-19 19:33 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley
  Cc: Palmer Dabbelt, Albert Ou, Anup Patel, Sunil V L,
	Nick Kossifidis, Sebastien Boeuf, iommu, linux-riscv,
	linux-kernel, linux, Tomasz Jeznach

Enable a sysfs debug / visibility interface providing restricted
access to hardware registers.
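
As an illustration only (not part of this patch), reading one of the
exported registers from user space could look like the sketch below.
The device name is derived from the MMIO base address, so the path used
here is hypothetical:

	/* Hypothetical sysfs path; the riscv-iommu@... name depends on reg_phys. */
	#include <stdio.h>

	int main(void)
	{
		char buf[32];
		FILE *f = fopen("/sys/class/iommu/riscv-iommu@3010000/"
				"riscv-iommu/reg_cap", "r");

		if (f && fgets(buf, sizeof(buf), f))
			printf("cap: %s", buf);
		if (f)
			fclose(f);
		return 0;
	}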

Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
---
 drivers/iommu/riscv/Makefile      |   2 +-
 drivers/iommu/riscv/iommu-sysfs.c | 183 ++++++++++++++++++++++++++++++
 drivers/iommu/riscv/iommu.c       |   7 ++
 drivers/iommu/riscv/iommu.h       |   2 +
 4 files changed, 193 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/riscv/iommu-sysfs.c

diff --git a/drivers/iommu/riscv/Makefile b/drivers/iommu/riscv/Makefile
index 38730c11e4a8..9523eb053cfc 100644
--- a/drivers/iommu/riscv/Makefile
+++ b/drivers/iommu/riscv/Makefile
@@ -1 +1 @@
-obj-$(CONFIG_RISCV_IOMMU) += iommu.o iommu-pci.o iommu-platform.o
\ No newline at end of file
+obj-$(CONFIG_RISCV_IOMMU) += iommu.o iommu-pci.o iommu-platform.o iommu-sysfs.o
\ No newline at end of file
diff --git a/drivers/iommu/riscv/iommu-sysfs.c b/drivers/iommu/riscv/iommu-sysfs.c
new file mode 100644
index 000000000000..f038ea8445c5
--- /dev/null
+++ b/drivers/iommu/riscv/iommu-sysfs.c
@@ -0,0 +1,183 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * IOMMU API for RISC-V architected Ziommu implementations.
+ *
+ * Copyright © 2022-2023 Rivos Inc.
+ *
+ * Author: Tomasz Jeznach <tjeznach@rivosinc.com>
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/compiler.h>
+#include <linux/iommu.h>
+#include <linux/platform_device.h>
+#include <asm/page.h>
+
+#include "iommu.h"
+
+#define sysfs_dev_to_iommu(dev) \
+	container_of(dev_get_drvdata(dev), struct riscv_iommu_device, iommu)
+
+static ssize_t address_show(struct device *dev,
+			    struct device_attribute *attr, char *buf)
+{
+	struct riscv_iommu_device *iommu = sysfs_dev_to_iommu(dev);
+	return sprintf(buf, "%llx\n", iommu->reg_phys);
+}
+
+static DEVICE_ATTR_RO(address);
+
+#define ATTR_RD_REG32(name, offset)					\
+	ssize_t reg_ ## name ## _show(struct device *dev,		\
+			struct device_attribute *attr, char *buf)	\
+{									\
+	struct riscv_iommu_device *iommu = sysfs_dev_to_iommu(dev);	\
+	return sprintf(buf, "0x%x\n",					\
+			riscv_iommu_readl(iommu, offset));		\
+}
+
+#define ATTR_RD_REG64(name, offset)					\
+	ssize_t reg_ ## name ## _show(struct device *dev,		\
+			struct device_attribute *attr, char *buf)	\
+{									\
+	struct riscv_iommu_device *iommu = sysfs_dev_to_iommu(dev);	\
+	return sprintf(buf, "0x%llx\n",					\
+			riscv_iommu_readq(iommu, offset));		\
+}
+
+#define ATTR_WR_REG32(name, offset)					\
+	ssize_t reg_ ## name ## _store(struct device *dev,		\
+			struct device_attribute *attr,			\
+			const char *buf, size_t len)			\
+{									\
+	struct riscv_iommu_device *iommu = sysfs_dev_to_iommu(dev);	\
+	unsigned long val;						\
+	int ret;							\
+	ret = kstrtoul(buf, 0, &val);					\
+	if (ret)							\
+		return ret;						\
+	riscv_iommu_writel(iommu, offset, val);				\
+	return len;							\
+}
+
+#define ATTR_WR_REG64(name, offset)					\
+	ssize_t reg_ ## name ## _store(struct device *dev,		\
+			struct device_attribute *attr,			\
+			const char *buf, size_t len)			\
+{									\
+	struct riscv_iommu_device *iommu = sysfs_dev_to_iommu(dev);	\
+	unsigned long long val;						\
+	int ret;							\
+	ret = kstrtoull(buf, 0, &val);					\
+	if (ret)							\
+		return ret;						\
+	riscv_iommu_writeq(iommu, offset, val);				\
+	return len;							\
+}
+
+#define ATTR_RO_REG32(name, offset)					\
+static ATTR_RD_REG32(name, offset);					\
+static DEVICE_ATTR_RO(reg_ ## name)
+
+#define ATTR_RW_REG32(name, offset)					\
+static ATTR_RD_REG32(name, offset);					\
+static ATTR_WR_REG32(name, offset);					\
+static DEVICE_ATTR_RW(reg_ ## name)
+
+#define ATTR_RO_REG64(name, offset)					\
+static ATTR_RD_REG64(name, offset);					\
+static DEVICE_ATTR_RO(reg_ ## name)
+
+#define ATTR_RW_REG64(name, offset)					\
+static ATTR_RD_REG64(name, offset);					\
+static ATTR_WR_REG64(name, offset);					\
+static DEVICE_ATTR_RW(reg_ ## name)
+
+ATTR_RO_REG64(cap, RISCV_IOMMU_REG_CAP);
+ATTR_RO_REG64(fctl, RISCV_IOMMU_REG_FCTL);
+ATTR_RO_REG32(cqh, RISCV_IOMMU_REG_CQH);
+ATTR_RO_REG32(cqt, RISCV_IOMMU_REG_CQT);
+ATTR_RO_REG32(cqcsr, RISCV_IOMMU_REG_CQCSR);
+ATTR_RO_REG32(fqh, RISCV_IOMMU_REG_FQH);
+ATTR_RO_REG32(fqt, RISCV_IOMMU_REG_FQT);
+ATTR_RO_REG32(fqcsr, RISCV_IOMMU_REG_FQCSR);
+ATTR_RO_REG32(pqh, RISCV_IOMMU_REG_PQH);
+ATTR_RO_REG32(pqt, RISCV_IOMMU_REG_PQT);
+ATTR_RO_REG32(pqcsr, RISCV_IOMMU_REG_PQCSR);
+ATTR_RO_REG32(ipsr, RISCV_IOMMU_REG_IPSR);
+ATTR_RO_REG32(ivec, RISCV_IOMMU_REG_IVEC);
+ATTR_RW_REG64(tr_iova, RISCV_IOMMU_REG_TR_REQ_IOVA);
+ATTR_RW_REG64(tr_ctrl, RISCV_IOMMU_REG_TR_REQ_CTL);
+ATTR_RW_REG64(tr_response, RISCV_IOMMU_REG_TR_RESPONSE);
+ATTR_RW_REG32(iocntovf, RISCV_IOMMU_REG_IOCOUNTOVF);
+ATTR_RW_REG32(iocntinh, RISCV_IOMMU_REG_IOCOUNTINH);
+ATTR_RW_REG64(iohpmcycles, RISCV_IOMMU_REG_IOHPMCYCLES);
+ATTR_RW_REG64(iohpmevt_1, RISCV_IOMMU_REG_IOHPMEVT(0));
+ATTR_RW_REG64(iohpmevt_2, RISCV_IOMMU_REG_IOHPMEVT(1));
+ATTR_RW_REG64(iohpmevt_3, RISCV_IOMMU_REG_IOHPMEVT(2));
+ATTR_RW_REG64(iohpmevt_4, RISCV_IOMMU_REG_IOHPMEVT(3));
+ATTR_RW_REG64(iohpmevt_5, RISCV_IOMMU_REG_IOHPMEVT(4));
+ATTR_RW_REG64(iohpmevt_6, RISCV_IOMMU_REG_IOHPMEVT(5));
+ATTR_RW_REG64(iohpmevt_7, RISCV_IOMMU_REG_IOHPMEVT(6));
+ATTR_RW_REG64(iohpmctr_1, RISCV_IOMMU_REG_IOHPMCTR(0));
+ATTR_RW_REG64(iohpmctr_2, RISCV_IOMMU_REG_IOHPMCTR(1));
+ATTR_RW_REG64(iohpmctr_3, RISCV_IOMMU_REG_IOHPMCTR(2));
+ATTR_RW_REG64(iohpmctr_4, RISCV_IOMMU_REG_IOHPMCTR(3));
+ATTR_RW_REG64(iohpmctr_5, RISCV_IOMMU_REG_IOHPMCTR(4));
+ATTR_RW_REG64(iohpmctr_6, RISCV_IOMMU_REG_IOHPMCTR(5));
+ATTR_RW_REG64(iohpmctr_7, RISCV_IOMMU_REG_IOHPMCTR(6));
+
+static struct attribute *riscv_iommu_attrs[] = {
+	&dev_attr_address.attr,
+	&dev_attr_reg_cap.attr,
+	&dev_attr_reg_fctl.attr,
+	&dev_attr_reg_cqh.attr,
+	&dev_attr_reg_cqt.attr,
+	&dev_attr_reg_cqcsr.attr,
+	&dev_attr_reg_fqh.attr,
+	&dev_attr_reg_fqt.attr,
+	&dev_attr_reg_fqcsr.attr,
+	&dev_attr_reg_pqh.attr,
+	&dev_attr_reg_pqt.attr,
+	&dev_attr_reg_pqcsr.attr,
+	&dev_attr_reg_ipsr.attr,
+	&dev_attr_reg_ivec.attr,
+	&dev_attr_reg_tr_iova.attr,
+	&dev_attr_reg_tr_ctrl.attr,
+	&dev_attr_reg_tr_response.attr,
+	&dev_attr_reg_iocntovf.attr,
+	&dev_attr_reg_iocntinh.attr,
+	&dev_attr_reg_iohpmcycles.attr,
+	&dev_attr_reg_iohpmctr_1.attr,
+	&dev_attr_reg_iohpmevt_1.attr,
+	&dev_attr_reg_iohpmctr_2.attr,
+	&dev_attr_reg_iohpmevt_2.attr,
+	&dev_attr_reg_iohpmctr_3.attr,
+	&dev_attr_reg_iohpmevt_3.attr,
+	&dev_attr_reg_iohpmctr_4.attr,
+	&dev_attr_reg_iohpmevt_4.attr,
+	&dev_attr_reg_iohpmctr_5.attr,
+	&dev_attr_reg_iohpmevt_5.attr,
+	&dev_attr_reg_iohpmctr_6.attr,
+	&dev_attr_reg_iohpmevt_6.attr,
+	&dev_attr_reg_iohpmctr_7.attr,
+	&dev_attr_reg_iohpmevt_7.attr,
+	NULL,
+};
+
+static struct attribute_group riscv_iommu_group = {
+	.name = "riscv-iommu",
+	.attrs = riscv_iommu_attrs,
+};
+
+const struct attribute_group *riscv_iommu_groups[] = {
+	&riscv_iommu_group,
+	NULL,
+};
+
+int riscv_iommu_sysfs_add(struct riscv_iommu_device *iommu)
+{
+	return iommu_device_sysfs_add(&iommu->iommu, NULL,
+		riscv_iommu_groups, "riscv-iommu@%llx", iommu->reg_phys);
+}
+
diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index 8c236242e2cc..31dc3c458e13 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -608,6 +608,7 @@ static const struct iommu_ops riscv_iommu_ops = {
 void riscv_iommu_remove(struct riscv_iommu_device *iommu)
 {
 	iommu_device_unregister(&iommu->iommu);
+	iommu_device_sysfs_remove(&iommu->iommu);
 	riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_OFF);
 }
 
@@ -646,6 +647,12 @@ int riscv_iommu_init(struct riscv_iommu_device *iommu)
 		goto fail;
 	}
 
+	ret = riscv_iommu_sysfs_add(iommu);
+	if (ret) {
+		dev_err(dev, "cannot register sysfs interface (%d)\n", ret);
+		goto fail;
+	}
+
 	ret = iommu_device_register(&iommu->iommu, &riscv_iommu_ops, dev);
 	if (ret) {
 		dev_err(dev, "cannot register iommu interface (%d)\n", ret);
diff --git a/drivers/iommu/riscv/iommu.h b/drivers/iommu/riscv/iommu.h
index 7baefd3630b3..7dc9baa59a50 100644
--- a/drivers/iommu/riscv/iommu.h
+++ b/drivers/iommu/riscv/iommu.h
@@ -112,4 +112,6 @@ static inline void riscv_iommu_writeq(struct riscv_iommu_device *iommu,
 int riscv_iommu_init(struct riscv_iommu_device *iommu);
 void riscv_iommu_remove(struct riscv_iommu_device *iommu);
 
+int riscv_iommu_sysfs_add(struct riscv_iommu_device *iommu);
+
 #endif
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH 06/11] RISC-V: drivers/iommu/riscv: Add command, fault, page-req queues
  2023-07-19 19:33 [PATCH 00/13] Linux RISC-V IOMMU Support Tomasz Jeznach
                   ` (4 preceding siblings ...)
  2023-07-19 19:33 ` [PATCH 05/11] RISC-V: drivers/iommu/riscv: Add sysfs interface Tomasz Jeznach
@ 2023-07-19 19:33 ` Tomasz Jeznach
  2023-07-20  3:11   ` Nick Kossifidis
                     ` (3 more replies)
  2023-07-19 19:33 ` [PATCH 07/11] RISC-V: drivers/iommu/riscv: Add device context support Tomasz Jeznach
                   ` (5 subsequent siblings)
  11 siblings, 4 replies; 86+ messages in thread
From: Tomasz Jeznach @ 2023-07-19 19:33 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley
  Cc: Palmer Dabbelt, Albert Ou, Anup Patel, Sunil V L,
	Nick Kossifidis, Sebastien Boeuf, iommu, linux-riscv,
	linux-kernel, linux, Tomasz Jeznach

Enables message-signaled or wire-signaled interrupts for PCIe and platform devices.

Co-developed-by: Nick Kossifidis <mick@ics.forth.gr>
Signed-off-by: Nick Kossifidis <mick@ics.forth.gr>
Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
---
 drivers/iommu/riscv/iommu-pci.c      |  72 ++++
 drivers/iommu/riscv/iommu-platform.c |  66 +++
 drivers/iommu/riscv/iommu.c          | 604 ++++++++++++++++++++++++++-
 drivers/iommu/riscv/iommu.h          |  28 ++
 4 files changed, 769 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/riscv/iommu-pci.c b/drivers/iommu/riscv/iommu-pci.c
index c91f963d7a29..9ea0647f7b92 100644
--- a/drivers/iommu/riscv/iommu-pci.c
+++ b/drivers/iommu/riscv/iommu-pci.c
@@ -34,6 +34,7 @@ static int riscv_iommu_pci_probe(struct pci_dev *pdev, const struct pci_device_i
 {
 	struct device *dev = &pdev->dev;
 	struct riscv_iommu_device *iommu;
+	u64 icvec;
 	int ret;
 
 	ret = pci_enable_device_mem(pdev);
@@ -67,14 +68,84 @@ static int riscv_iommu_pci_probe(struct pci_dev *pdev, const struct pci_device_i
 	iommu->dev = dev;
 	dev_set_drvdata(dev, iommu);
 
+	/* Check device reported capabilities. */
+	iommu->cap = riscv_iommu_readq(iommu, RISCV_IOMMU_REG_CAP);
+
+	/* The PCI driver only uses MSIs; make sure the IOMMU supports this */
+	switch (FIELD_GET(RISCV_IOMMU_CAP_IGS, iommu->cap)) {
+	case RISCV_IOMMU_CAP_IGS_MSI:
+	case RISCV_IOMMU_CAP_IGS_BOTH:
+		break;
+	default:
+		dev_err(dev, "unable to use message-signaled interrupts\n");
+		ret = -ENODEV;
+		goto fail;
+	}
+
 	dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
 	pci_set_master(pdev);
 
+	/* Allocate and assign IRQ vectors for the various events */
+	ret = pci_alloc_irq_vectors(pdev, 1, RISCV_IOMMU_INTR_COUNT, PCI_IRQ_MSIX);
+	if (ret < 0) {
+		dev_err(dev, "unable to allocate irq vectors\n");
+		goto fail;
+	}
+
+	ret = -ENODEV;
+
+	iommu->irq_cmdq = msi_get_virq(dev, RISCV_IOMMU_INTR_CQ);
+	if (!iommu->irq_cmdq) {
+		dev_warn(dev, "no MSI vector %d for the command queue\n",
+			 RISCV_IOMMU_INTR_CQ);
+		goto fail;
+	}
+
+	iommu->irq_fltq = msi_get_virq(dev, RISCV_IOMMU_INTR_FQ);
+	if (!iommu->irq_fltq) {
+		dev_warn(dev, "no MSI vector %d for the fault/event queue\n",
+			 RISCV_IOMMU_INTR_FQ);
+		goto fail;
+	}
+
+	if (iommu->cap & RISCV_IOMMU_CAP_HPM) {
+		iommu->irq_pm = msi_get_virq(dev, RISCV_IOMMU_INTR_PM);
+		if (!iommu->irq_pm) {
+			dev_warn(dev,
+				 "no MSI vector %d for performance monitoring\n",
+				 RISCV_IOMMU_INTR_PM);
+			goto fail;
+		}
+	}
+
+	if (iommu->cap & RISCV_IOMMU_CAP_ATS) {
+		iommu->irq_priq = msi_get_virq(dev, RISCV_IOMMU_INTR_PQ);
+		if (!iommu->irq_priq) {
+			dev_warn(dev,
+				 "no MSI vector %d for page-request queue\n",
+				 RISCV_IOMMU_INTR_PQ);
+			goto fail;
+		}
+	}
+
+	/* Set simple 1:1 mapping for MSI vectors */
+	icvec = FIELD_PREP(RISCV_IOMMU_IVEC_CIV, RISCV_IOMMU_INTR_CQ) |
+	    FIELD_PREP(RISCV_IOMMU_IVEC_FIV, RISCV_IOMMU_INTR_FQ);
+
+	if (iommu->cap & RISCV_IOMMU_CAP_HPM)
+		icvec |= FIELD_PREP(RISCV_IOMMU_IVEC_PMIV, RISCV_IOMMU_INTR_PM);
+
+	if (iommu->cap & RISCV_IOMMU_CAP_ATS)
+		icvec |= FIELD_PREP(RISCV_IOMMU_IVEC_PIV, RISCV_IOMMU_INTR_PQ);
+
+	riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IVEC, icvec);
+
 	ret = riscv_iommu_init(iommu);
 	if (!ret)
 		return ret;
 
  fail:
+	pci_free_irq_vectors(pdev);
 	pci_clear_master(pdev);
 	pci_release_regions(pdev);
 	pci_disable_device(pdev);
@@ -85,6 +156,7 @@ static int riscv_iommu_pci_probe(struct pci_dev *pdev, const struct pci_device_i
 static void riscv_iommu_pci_remove(struct pci_dev *pdev)
 {
 	riscv_iommu_remove(dev_get_drvdata(&pdev->dev));
+	pci_free_irq_vectors(pdev);
 	pci_clear_master(pdev);
 	pci_release_regions(pdev);
 	pci_disable_device(pdev);
diff --git a/drivers/iommu/riscv/iommu-platform.c b/drivers/iommu/riscv/iommu-platform.c
index e4e8ca6711e7..35935d3c7ef4 100644
--- a/drivers/iommu/riscv/iommu-platform.c
+++ b/drivers/iommu/riscv/iommu-platform.c
@@ -20,6 +20,8 @@ static int riscv_iommu_platform_probe(struct platform_device *pdev)
 	struct device *dev = &pdev->dev;
 	struct riscv_iommu_device *iommu = NULL;
 	struct resource *res = NULL;
+	u32 fctl = 0;
+	int irq = 0;
 	int ret = 0;
 
 	iommu = devm_kzalloc(dev, sizeof(*iommu), GFP_KERNEL);
@@ -53,6 +55,70 @@ static int riscv_iommu_platform_probe(struct platform_device *pdev)
 		goto fail;
 	}
 
+	iommu->cap = riscv_iommu_readq(iommu, RISCV_IOMMU_REG_CAP);
+
+	/* For now we only support WSIs until we have AIA support */
+	if (FIELD_GET(RISCV_IOMMU_CAP_IGS, iommu->cap) == RISCV_IOMMU_CAP_IGS_MSI) {
+		dev_err(dev, "IOMMU only supports MSIs\n");
+		ret = -ENODEV;
+		goto fail;
+	}
+
+	/* Parse IRQ assignment */
+	irq = platform_get_irq_byname_optional(pdev, "cmdq");
+	if (irq > 0) {
+		iommu->irq_cmdq = irq;
+	} else {
+		dev_err(dev, "no IRQ provided for the command queue\n");
+		ret = -ENODEV;
+		goto fail;
+	}
+
+	irq = platform_get_irq_byname_optional(pdev, "fltq");
+	if (irq > 0) {
+		iommu->irq_fltq = irq;
+	} else {
+		dev_err(dev, "no IRQ provided for the fault/event queue\n");
+		ret = -ENODEV;
+		goto fail;
+	}
+
+	if (iommu->cap & RISCV_IOMMU_CAP_HPM) {
+		irq = platform_get_irq_byname_optional(pdev, "pm");
+		if (irq > 0) {
+			iommu->irq_pm = irq;
+		} else {
+			dev_err(dev, "no IRQ provided for performance monitoring\n");
+			ret = -ENODEV;
+			goto fail;
+		}
+	}
+
+	if (iommu->cap & RISCV_IOMMU_CAP_ATS) {
+		irq = platform_get_irq_byname_optional(pdev, "priq");
+		if (irq > 0) {
+			iommu->irq_priq = irq;
+		} else {
+			dev_err(dev, "no IRQ provided for the page-request queue\n");
+			ret = -ENODEV;
+			goto fail;
+		}
+	}
+
+	/* Make sure fctl.WSI is set */
+	fctl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_FCTL);
+	fctl |= RISCV_IOMMU_FCTL_WSI;
+	riscv_iommu_writel(iommu, RISCV_IOMMU_REG_FCTL, fctl);
+
+	/* Parse queue lengths */
+	ret = of_property_read_u32(pdev->dev.of_node, "cmdq_len", &iommu->cmdq_len);
+	if (!ret)
+		dev_info(dev, "command queue length set to %i\n", iommu->cmdq_len);
+
+	ret = of_property_read_u32(pdev->dev.of_node, "fltq_len", &iommu->fltq_len);
+	if (!ret)
+		dev_info(dev, "fault/event queue length set to %i\n", iommu->fltq_len);
+
+	ret = of_property_read_u32(pdev->dev.of_node, "priq_len", &iommu->priq_len);
+	if (!ret)
+		dev_info(dev, "page request queue length set to %i\n", iommu->priq_len);
+
 	dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
 
 	return riscv_iommu_init(iommu);
diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index 31dc3c458e13..5c4cf9875302 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -45,6 +45,18 @@ static int ddt_mode = RISCV_IOMMU_DDTP_MODE_BARE;
 module_param(ddt_mode, int, 0644);
 MODULE_PARM_DESC(ddt_mode, "Device Directory Table mode.");
 
+static int cmdq_length = 1024;
+module_param(cmdq_length, int, 0644);
+MODULE_PARM_DESC(cmdq_length, "Command queue length.");
+
+static int fltq_length = 1024;
+module_param(fltq_length, int, 0644);
+MODULE_PARM_DESC(fltq_length, "Fault queue length.");
+
+static int priq_length = 1024;
+module_param(priq_length, int, 0644);
+MODULE_PARM_DESC(priq_length, "Page request interface queue length.");
+
 /* IOMMU PSCID allocation namespace. */
 #define RISCV_IOMMU_MAX_PSCID	(1U << 20)
 static DEFINE_IDA(riscv_iommu_pscids);
@@ -65,6 +77,497 @@ static DEFINE_IDA(riscv_iommu_pscids);
 static const struct iommu_domain_ops riscv_iommu_domain_ops;
 static const struct iommu_ops riscv_iommu_ops;
 
+/*
+ * Common queue management routines
+ */
+
+/* Note: offsets are the same for all queues */
+#define Q_HEAD(q) ((q)->qbr + (RISCV_IOMMU_REG_CQH - RISCV_IOMMU_REG_CQB))
+#define Q_TAIL(q) ((q)->qbr + (RISCV_IOMMU_REG_CQT - RISCV_IOMMU_REG_CQB))
+
+static unsigned riscv_iommu_queue_consume(struct riscv_iommu_device *iommu,
+					  struct riscv_iommu_queue *q, unsigned *ready)
+{
+	u32 tail = riscv_iommu_readl(iommu, Q_TAIL(q));
+	*ready = q->lui;
+
+	BUG_ON(q->cnt <= tail);
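+	/*
+	 * Return the number of contiguous records between the software
+	 * consumer index (q->lui) and the hardware tail, stopping at the
+	 * end of the ring; the caller loops to pick up any wrapped part.
+	 */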
+	if (q->lui <= tail)
+		return tail - q->lui;
+	return q->cnt - q->lui;
+}
+
+static void riscv_iommu_queue_release(struct riscv_iommu_device *iommu,
+				      struct riscv_iommu_queue *q, unsigned count)
+{
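+	/* Advance the consumer index (with wrap) and publish it in the head register. */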
+	q->lui = (q->lui + count) & (q->cnt - 1);
+	riscv_iommu_writel(iommu, Q_HEAD(q), q->lui);
+}
+
+static u32 riscv_iommu_queue_ctrl(struct riscv_iommu_device *iommu,
+				  struct riscv_iommu_queue *q, u32 val)
+{
+	cycles_t end_cycles = RISCV_IOMMU_TIMEOUT + get_cycles();
+
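+	/* Write the control/status register and wait for the BUSY bit to clear. */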
+	riscv_iommu_writel(iommu, q->qcr, val);
+	do {
+		val = riscv_iommu_readl(iommu, q->qcr);
+		if (!(val & RISCV_IOMMU_QUEUE_BUSY))
+			break;
+		cpu_relax();
+	} while (get_cycles() < end_cycles);
+
+	return val;
+}
+
+static void riscv_iommu_queue_free(struct riscv_iommu_device *iommu,
+				   struct riscv_iommu_queue *q)
+{
+	size_t size = q->len * q->cnt;
+
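+	/* Disable the queue before releasing its memory and IRQ. */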
+	riscv_iommu_queue_ctrl(iommu, q, 0);
+
+	if (q->base) {
+		if (q->in_iomem)
+			iounmap(q->base);
+		else
+			dmam_free_coherent(iommu->dev, size, q->base, q->base_dma);
+	}
+	if (q->irq)
+		free_irq(q->irq, q);
+}
+
+static irqreturn_t riscv_iommu_cmdq_irq_check(int irq, void *data);
+static irqreturn_t riscv_iommu_cmdq_process(int irq, void *data);
+static irqreturn_t riscv_iommu_fltq_irq_check(int irq, void *data);
+static irqreturn_t riscv_iommu_fltq_process(int irq, void *data);
+static irqreturn_t riscv_iommu_priq_irq_check(int irq, void *data);
+static irqreturn_t riscv_iommu_priq_process(int irq, void *data);
+
+static int riscv_iommu_queue_init(struct riscv_iommu_device *iommu, int queue_id)
+{
+	struct device *dev = iommu->dev;
+	struct riscv_iommu_queue *q = NULL;
+	size_t queue_size = 0;
+	irq_handler_t irq_check;
+	irq_handler_t irq_process;
+	const char *name;
+	int count = 0;
+	int irq = 0;
+	unsigned order = 0;
+	u64 qbr_val = 0;
+	u64 qbr_readback = 0;
+	u64 qbr_paddr = 0;
+	int ret = 0;
+
+	switch (queue_id) {
+	case RISCV_IOMMU_COMMAND_QUEUE:
+		q = &iommu->cmdq;
+		q->len = sizeof(struct riscv_iommu_command);
+		count = iommu->cmdq_len;
+		irq = iommu->irq_cmdq;
+		irq_check = riscv_iommu_cmdq_irq_check;
+		irq_process = riscv_iommu_cmdq_process;
+		q->qbr = RISCV_IOMMU_REG_CQB;
+		q->qcr = RISCV_IOMMU_REG_CQCSR;
+		name = "cmdq";
+		break;
+	case RISCV_IOMMU_FAULT_QUEUE:
+		q = &iommu->fltq;
+		q->len = sizeof(struct riscv_iommu_fq_record);
+		count = iommu->fltq_len;
+		irq = iommu->irq_fltq;
+		irq_check = riscv_iommu_fltq_irq_check;
+		irq_process = riscv_iommu_fltq_process;
+		q->qbr = RISCV_IOMMU_REG_FQB;
+		q->qcr = RISCV_IOMMU_REG_FQCSR;
+		name = "fltq";
+		break;
+	case RISCV_IOMMU_PAGE_REQUEST_QUEUE:
+		q = &iommu->priq;
+		q->len = sizeof(struct riscv_iommu_pq_record);
+		count = iommu->priq_len;
+		irq = iommu->irq_priq;
+		irq_check = riscv_iommu_priq_irq_check;
+		irq_process = riscv_iommu_priq_process;
+		q->qbr = RISCV_IOMMU_REG_PQB;
+		q->qcr = RISCV_IOMMU_REG_PQCSR;
+		name = "priq";
+		break;
+	default:
+		dev_err(dev, "invalid queue interrupt index in queue_init!\n");
+		return -EINVAL;
+	}
+
+	/* Polling not implemented */
+	if (!irq)
+		return -ENODEV;
+
+	/* Allocate queue in memory and set the base register */
+	order = ilog2(count);
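+	/*
+	 * Retry with progressively smaller power-of-two sizes if the
+	 * allocation fails, but do not go below a single page.
+	 */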
+	do {
+		queue_size = q->len * (1ULL << order);
+		q->base = dmam_alloc_coherent(dev, queue_size, &q->base_dma, GFP_KERNEL);
+		if (q->base || queue_size < PAGE_SIZE)
+			break;
+
+		order--;
+	} while (1);
+
+	if (!q->base) {
+		dev_err(dev, "failed to allocate %s queue (cnt: %u)\n", name, count);
+		return -ENOMEM;
+	}
+
+	q->cnt = 1ULL << order;
+
+	qbr_val = phys_to_ppn(q->base_dma) |
+	    FIELD_PREP(RISCV_IOMMU_QUEUE_LOGSZ_FIELD, order - 1);
+
+	riscv_iommu_writeq(iommu, q->qbr, qbr_val);
+
+	/*
+	 * Queue base registers are WARL, so it's possible that whatever we wrote
+	 * there was illegal/not supported by the hw in which case we need to make
+	 * sure we set a supported PPN and/or queue size.
+	 */
+	qbr_readback = riscv_iommu_readq(iommu, q->qbr);
+	if (qbr_readback == qbr_val)
+		goto irq;
+
+	dmam_free_coherent(dev, queue_size, q->base, q->base_dma);
+
+	/* Get supported queue size */
+	order = FIELD_GET(RISCV_IOMMU_QUEUE_LOGSZ_FIELD, qbr_readback) + 1;
+	q->cnt = 1ULL << order;
+	queue_size = q->len * q->cnt;
+
+	/*
+	 * In case we also failed to set PPN, it means the field is hardcoded and the
+	 * queue resides in I/O memory instead, so get its physical address and
+	 * ioremap it.
+	 */
+	qbr_paddr = ppn_to_phys(qbr_readback);
+	if (qbr_paddr != q->base_dma) {
+		dev_info(dev,
+			 "hardcoded ppn in %s base register, using io memory for the queue\n",
+			 name);
+		dev_info(dev, "queue length for %s set to %i\n", name, q->cnt);
+		q->in_iomem = true;
+		q->base = ioremap(qbr_paddr, queue_size);
+		if (!q->base) {
+			dev_err(dev, "failed to map %s queue (cnt: %u)\n", name, q->cnt);
+			return -ENOMEM;
+		}
+		q->base_dma = qbr_paddr;
+	} else {
+		/*
+		 * We only failed to set the queue size, re-try to allocate memory with
+		 * the queue size supported by the hw.
+		 */
+		dev_info(dev, "hardcoded queue size in %s base register\n", name);
+		dev_info(dev, "retrying with queue length: %i\n", q->cnt);
+		q->base = dmam_alloc_coherent(dev, queue_size, &q->base_dma, GFP_KERNEL);
+		if (!q->base) {
+			dev_err(dev, "failed to allocate %s queue (cnt: %u)\n",
+				name, q->cnt);
+			return -ENOMEM;
+		}
+	}
+
+	qbr_val = phys_to_ppn(q->base_dma) |
+	    FIELD_PREP(RISCV_IOMMU_QUEUE_LOGSZ_FIELD, order - 1);
+	riscv_iommu_writeq(iommu, q->qbr, qbr_val);
+
+	/* Final check to make sure hw accepted our write */
+	qbr_readback = riscv_iommu_readq(iommu, q->qbr);
+	if (qbr_readback != qbr_val) {
+		dev_err(dev, "failed to set base register for %s\n", name);
+		goto fail;
+	}
+
+ irq:
+	if (request_threaded_irq(irq, irq_check, irq_process, IRQF_ONESHOT | IRQF_SHARED,
+				 dev_name(dev), q)) {
+		dev_err(dev, "fail to request irq %d for %s\n", irq, name);
+		goto fail;
+	}
+
+	q->irq = irq;
+
+	/* Note: the ENABLE and INTR_ENABLE bits are at the same offsets in all queue CSRs */
+	ret =
+	    riscv_iommu_queue_ctrl(iommu, q,
+				   RISCV_IOMMU_QUEUE_ENABLE |
+				   RISCV_IOMMU_QUEUE_INTR_ENABLE);
+	if (ret & RISCV_IOMMU_QUEUE_BUSY) {
+		dev_err(dev, "%s init timeout\n", name);
+		ret = -EBUSY;
+		goto fail;
+	}
+
+	return 0;
+
+ fail:
+	riscv_iommu_queue_free(iommu, q);
+	return ret;
+}
+
+/*
+ * I/O MMU Command queue chapter 3.1
+ */
+
+static inline void riscv_iommu_cmd_inval_vma(struct riscv_iommu_command *cmd)
+{
+	cmd->dword0 =
+	    FIELD_PREP(RISCV_IOMMU_CMD_OPCODE,
+		       RISCV_IOMMU_CMD_IOTINVAL_OPCODE) | FIELD_PREP(RISCV_IOMMU_CMD_FUNC,
+								     RISCV_IOMMU_CMD_IOTINVAL_FUNC_VMA);
+	cmd->dword1 = 0;
+}
+
+static inline void riscv_iommu_cmd_inval_set_addr(struct riscv_iommu_command *cmd,
+						  u64 addr)
+{
+	cmd->dword0 |= RISCV_IOMMU_CMD_IOTINVAL_AV;
+	cmd->dword1 = addr;
+}
+
+static inline void riscv_iommu_cmd_inval_set_pscid(struct riscv_iommu_command *cmd,
+						   unsigned pscid)
+{
+	cmd->dword0 |= FIELD_PREP(RISCV_IOMMU_CMD_IOTINVAL_PSCID, pscid) |
+	    RISCV_IOMMU_CMD_IOTINVAL_PSCV;
+}
+
+static inline void riscv_iommu_cmd_inval_set_gscid(struct riscv_iommu_command *cmd,
+						   unsigned gscid)
+{
+	cmd->dword0 |= FIELD_PREP(RISCV_IOMMU_CMD_IOTINVAL_GSCID, gscid) |
+	    RISCV_IOMMU_CMD_IOTINVAL_GV;
+}
+
+static inline void riscv_iommu_cmd_iofence(struct riscv_iommu_command *cmd)
+{
+	cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_IOFENCE_OPCODE) |
+	    FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_IOFENCE_FUNC_C);
+	cmd->dword1 = 0;
+}
+
+static inline void riscv_iommu_cmd_iofence_set_av(struct riscv_iommu_command *cmd,
+						  u64 addr, u32 data)
+{
+	cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_IOFENCE_OPCODE) |
+	    FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_IOFENCE_FUNC_C) |
+	    FIELD_PREP(RISCV_IOMMU_CMD_IOFENCE_DATA, data) | RISCV_IOMMU_CMD_IOFENCE_AV;
+	cmd->dword1 = (addr >> 2);
+}
+
+static inline void riscv_iommu_cmd_iodir_inval_ddt(struct riscv_iommu_command *cmd)
+{
+	cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_IODIR_OPCODE) |
+	    FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_DDT);
+	cmd->dword1 = 0;
+}
+
+static inline void riscv_iommu_cmd_iodir_inval_pdt(struct riscv_iommu_command *cmd)
+{
+	cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_IODIR_OPCODE) |
+	    FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_PDT);
+	cmd->dword1 = 0;
+}
+
+static inline void riscv_iommu_cmd_iodir_set_did(struct riscv_iommu_command *cmd,
+						 unsigned devid)
+{
+	cmd->dword0 |=
+	    FIELD_PREP(RISCV_IOMMU_CMD_IODIR_DID, devid) | RISCV_IOMMU_CMD_IODIR_DV;
+}
+
+/* TODO: Convert into lock-less MPSC implementation. */
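+/*
+ * Typical command flow (for illustration): build a command on the stack
+ * with the helpers above, post it, then fence when completion must be
+ * observed, e.g.:
+ *
+ *	struct riscv_iommu_command cmd;
+ *
+ *	riscv_iommu_cmd_inval_vma(&cmd);
+ *	riscv_iommu_cmd_inval_set_pscid(&cmd, pscid);
+ *	riscv_iommu_post(iommu, &cmd);
+ *	riscv_iommu_iofence_sync(iommu);
+ */
+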
+static bool riscv_iommu_post_sync(struct riscv_iommu_device *iommu,
+				  struct riscv_iommu_command *cmd, bool sync)
+{
+	u32 head, tail, next, last;
+	unsigned long flags;
+
+	spin_lock_irqsave(&iommu->cq_lock, flags);
+	head = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQH) & (iommu->cmdq.cnt - 1);
+	tail = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQT) & (iommu->cmdq.cnt - 1);
+	last = iommu->cmdq.lui;
+	if (tail != last) {
+		spin_unlock_irqrestore(&iommu->cq_lock, flags);
+		/*
+		 * FIXME: This is a workaround for dropped MMIO writes/reads on QEMU platform.
+		 *        While debugging of the problem is still ongoing, this provides
+		 *        a simple implementation of a try-again policy.
+		 *        Will be changed to a lock-less algorithm in the future.
+		 */
+		dev_dbg(iommu->dev, "IOMMU CQT: %x != %x (1st)\n", last, tail);
+		spin_lock_irqsave(&iommu->cq_lock, flags);
+		tail =
+		    riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQT) & (iommu->cmdq.cnt - 1);
+		last = iommu->cmdq.lui;
+		if (tail != last) {
+			spin_unlock_irqrestore(&iommu->cq_lock, flags);
+			dev_dbg(iommu->dev, "IOMMU CQT: %x != %x (2nd)\n", last, tail);
+			spin_lock_irqsave(&iommu->cq_lock, flags);
+		}
+	}
+
+	next = (last + 1) & (iommu->cmdq.cnt - 1);
+	if (next != head) {
+		struct riscv_iommu_command *ptr = iommu->cmdq.base;
+		ptr[last] = *cmd;
+		wmb();
+		riscv_iommu_writel(iommu, RISCV_IOMMU_REG_CQT, next);
+		iommu->cmdq.lui = next;
+	}
+
+	spin_unlock_irqrestore(&iommu->cq_lock, flags);
+
+	if (sync && head != next) {
+		cycles_t start_time = get_cycles();
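+		/*
+		 * Wait for the hardware head to move past the entry we just
+		 * posted; the two checks below cover the unwrapped and
+		 * wrapped positions of head relative to next.
+		 */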
+		while (1) {
+			last = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQH) &
+			    (iommu->cmdq.cnt - 1);
+			if (head < next && last >= next)
+				break;
+			if (head > next && last < head && last >= next)
+				break;
+			if (RISCV_IOMMU_TIMEOUT < (get_cycles() - start_time)) {
+				dev_err(iommu->dev, "IOFENCE TIMEOUT\n");
+				return false;
+			}
+			cpu_relax();
+		}
+	}
+
+	return next != head;
+}
+
+static bool riscv_iommu_post(struct riscv_iommu_device *iommu,
+			     struct riscv_iommu_command *cmd)
+{
+	return riscv_iommu_post_sync(iommu, cmd, false);
+}
+
+static bool riscv_iommu_iofence_sync(struct riscv_iommu_device *iommu)
+{
+	struct riscv_iommu_command cmd;
+	riscv_iommu_cmd_iofence(&cmd);
+	return riscv_iommu_post_sync(iommu, &cmd, true);
+}
+
+/* Command queue primary interrupt handler */
+static irqreturn_t riscv_iommu_cmdq_irq_check(int irq, void *data)
+{
+	struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
+	struct riscv_iommu_device *iommu =
+	    container_of(q, struct riscv_iommu_device, cmdq);
+	u32 ipsr = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_IPSR);
+	if (ipsr & RISCV_IOMMU_IPSR_CIP)
+		return IRQ_WAKE_THREAD;
+	return IRQ_NONE;
+}
+
+/* Command queue interrupt handler thread function */
+static irqreturn_t riscv_iommu_cmdq_process(int irq, void *data)
+{
+	struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
+	struct riscv_iommu_device *iommu;
+	unsigned ctrl;
+
+	iommu = container_of(q, struct riscv_iommu_device, cmdq);
+
+	/* Error reporting, clear error reports if any. */
+	ctrl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQCSR);
+	if (ctrl & (RISCV_IOMMU_CQCSR_CQMF |
+		    RISCV_IOMMU_CQCSR_CMD_TO | RISCV_IOMMU_CQCSR_CMD_ILL)) {
+		riscv_iommu_queue_ctrl(iommu, &iommu->cmdq, ctrl);
+		dev_warn_ratelimited(iommu->dev,
+				     "Command queue error: fault: %d tout: %d err: %d\n",
+				     !!(ctrl & RISCV_IOMMU_CQCSR_CQMF),
+				     !!(ctrl & RISCV_IOMMU_CQCSR_CMD_TO),
+				     !!(ctrl & RISCV_IOMMU_CQCSR_CMD_ILL));
+	}
+
+	/* Clear fault interrupt pending. */
+	riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IPSR, RISCV_IOMMU_IPSR_CIP);
+
+	return IRQ_HANDLED;
+}
+
+/*
+ * Fault/event queue, chapter 3.2
+ */
+
+static void riscv_iommu_fault_report(struct riscv_iommu_device *iommu,
+				     struct riscv_iommu_fq_record *event)
+{
+	unsigned err, devid;
+
+	err = FIELD_GET(RISCV_IOMMU_FQ_HDR_CAUSE, event->hdr);
+	devid = FIELD_GET(RISCV_IOMMU_FQ_HDR_DID, event->hdr);
+
+	dev_warn_ratelimited(iommu->dev,
+			     "Fault %d devid: %d iotval: %llx iotval2: %llx\n",
+			     err, devid, event->iotval, event->iotval2);
+}
+
+/* Fault/event queue primary interrupt handler */
+static irqreturn_t riscv_iommu_fltq_irq_check(int irq, void *data)
+{
+	struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
+	struct riscv_iommu_device *iommu =
+	    container_of(q, struct riscv_iommu_device, fltq);
+	u32 ipsr = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_IPSR);
+	if (ipsr & RISCV_IOMMU_IPSR_FIP)
+		return IRQ_WAKE_THREAD;
+	return IRQ_NONE;
+}
+
+/* Fault queue interrupt handler thread function */
+static irqreturn_t riscv_iommu_fltq_process(int irq, void *data)
+{
+	struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
+	struct riscv_iommu_device *iommu;
+	struct riscv_iommu_fq_record *events;
+	unsigned cnt, len, idx, ctrl;
+
+	iommu = container_of(q, struct riscv_iommu_device, fltq);
+	events = (struct riscv_iommu_fq_record *)q->base;
+
+	/* Error reporting, clear error reports if any. */
+	ctrl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_FQCSR);
+	if (ctrl & (RISCV_IOMMU_FQCSR_FQMF | RISCV_IOMMU_FQCSR_FQOF)) {
+		riscv_iommu_queue_ctrl(iommu, &iommu->fltq, ctrl);
+		dev_warn_ratelimited(iommu->dev,
+				     "Fault queue error: fault: %d full: %d\n",
+				     !!(ctrl & RISCV_IOMMU_FQCSR_FQMF),
+				     !!(ctrl & RISCV_IOMMU_FQCSR_FQOF));
+	}
+
+	/* Clear fault interrupt pending. */
+	riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IPSR, RISCV_IOMMU_IPSR_FIP);
+
+	/* Report fault events. */
+	do {
+		cnt = riscv_iommu_queue_consume(iommu, q, &idx);
+		if (!cnt)
+			break;
+		for (len = 0; len < cnt; idx++, len++)
+			riscv_iommu_fault_report(iommu, &events[idx]);
+		riscv_iommu_queue_release(iommu, q, cnt);
+	} while (1);
+
+	return IRQ_HANDLED;
+}
+
+/*
+ * Page request queue, chapter 3.3
+ */
+
 /*
  * Register device for IOMMU tracking.
  */
@@ -97,6 +600,54 @@ static void riscv_iommu_add_device(struct riscv_iommu_device *iommu, struct devi
 	mutex_unlock(&iommu->eps_mutex);
 }
 
+/* Page request interface queue primary interrupt handler */
+static irqreturn_t riscv_iommu_priq_irq_check(int irq, void *data)
+{
+	struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
+	struct riscv_iommu_device *iommu =
+	    container_of(q, struct riscv_iommu_device, priq);
+	u32 ipsr = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_IPSR);
+	if (ipsr & RISCV_IOMMU_IPSR_PIP)
+		return IRQ_WAKE_THREAD;
+	return IRQ_NONE;
+}
+
+/* Page request interface queue interrupt handler thread function */
+static irqreturn_t riscv_iommu_priq_process(int irq, void *data)
+{
+	struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
+	struct riscv_iommu_device *iommu;
+	struct riscv_iommu_pq_record *requests;
+	unsigned cnt, idx, ctrl;
+
+	iommu = container_of(q, struct riscv_iommu_device, priq);
+	requests = (struct riscv_iommu_pq_record *)q->base;
+
+	/* Error reporting, clear error reports if any. */
+	ctrl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_PQCSR);
+	if (ctrl & (RISCV_IOMMU_PQCSR_PQMF | RISCV_IOMMU_PQCSR_PQOF)) {
+		riscv_iommu_queue_ctrl(iommu, &iommu->priq, ctrl);
+		dev_warn_ratelimited(iommu->dev,
+				     "Page request queue error: fault: %d full: %d\n",
+				     !!(ctrl & RISCV_IOMMU_PQCSR_PQMF),
+				     !!(ctrl & RISCV_IOMMU_PQCSR_PQOF));
+	}
+
+	/* Clear page request interrupt pending. */
+	riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IPSR, RISCV_IOMMU_IPSR_PIP);
+
+	/* Process page requests. */
+	do {
+		cnt = riscv_iommu_queue_consume(iommu, q, &idx);
+		if (!cnt)
+			break;
+		dev_warn(iommu->dev, "unexpected %u page requests\n", cnt);
+		riscv_iommu_queue_release(iommu, q, cnt);
+	} while (1);
+
+	return IRQ_HANDLED;
+}
+
 /*
  * Endpoint management
  */
@@ -350,7 +901,29 @@ static void riscv_iommu_flush_iotlb_range(struct iommu_domain *iommu_domain,
 					  unsigned long *start, unsigned long *end,
 					  size_t *pgsize)
 {
-	/* Command interface not implemented */
+	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
+	struct riscv_iommu_command cmd;
+	unsigned long iova;
+
+	if (domain->mode == RISCV_IOMMU_DC_FSC_MODE_BARE)
+		return;
+
+	/* Domain not attached to an IOMMU! */
+	BUG_ON(!domain->iommu);
+
+	riscv_iommu_cmd_inval_vma(&cmd);
+	riscv_iommu_cmd_inval_set_pscid(&cmd, domain->pscid);
+
+	if (start && end && pgsize) {
+		/* Cover only the range that is needed */
+		for (iova = *start; iova <= *end; iova += *pgsize) {
+			riscv_iommu_cmd_inval_set_addr(&cmd, iova);
+			riscv_iommu_post(domain->iommu, &cmd);
+		}
+	} else {
+		riscv_iommu_post(domain->iommu, &cmd);
+	}
+	riscv_iommu_iofence_sync(domain->iommu);
 }
 
 static void riscv_iommu_flush_iotlb_all(struct iommu_domain *iommu_domain)
@@ -610,6 +1183,9 @@ void riscv_iommu_remove(struct riscv_iommu_device *iommu)
 	iommu_device_unregister(&iommu->iommu);
 	iommu_device_sysfs_remove(&iommu->iommu);
 	riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_OFF);
+	riscv_iommu_queue_free(iommu, &iommu->cmdq);
+	riscv_iommu_queue_free(iommu, &iommu->fltq);
+	riscv_iommu_queue_free(iommu, &iommu->priq);
 }
 
 int riscv_iommu_init(struct riscv_iommu_device *iommu)
@@ -632,6 +1208,16 @@ int riscv_iommu_init(struct riscv_iommu_device *iommu)
 	}
 #endif
 
+	/*
+	 * Assign queue lengths from module parameters if not already
+	 * set on the device tree.
+	 */
+	if (!iommu->cmdq_len)
+		iommu->cmdq_len = cmdq_length;
+	if (!iommu->fltq_len)
+		iommu->fltq_len = fltq_length;
+	if (!iommu->priq_len)
+		iommu->priq_len = priq_length;
 	/* Clear any pending interrupt flag. */
 	riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IPSR,
 			   RISCV_IOMMU_IPSR_CIP |
@@ -639,7 +1225,20 @@ int riscv_iommu_init(struct riscv_iommu_device *iommu)
 			   RISCV_IOMMU_IPSR_PMIP | RISCV_IOMMU_IPSR_PIP);
 	spin_lock_init(&iommu->cq_lock);
 	mutex_init(&iommu->eps_mutex);
+	ret = riscv_iommu_queue_init(iommu, RISCV_IOMMU_COMMAND_QUEUE);
+	if (ret)
+		goto fail;
+	ret = riscv_iommu_queue_init(iommu, RISCV_IOMMU_FAULT_QUEUE);
+	if (ret)
+		goto fail;
+	if (!(iommu->cap & RISCV_IOMMU_CAP_ATS))
+		goto no_ats;
+
+	ret = riscv_iommu_queue_init(iommu, RISCV_IOMMU_PAGE_REQUEST_QUEUE);
+	if (ret)
+		goto fail;
 
+ no_ats:
 	ret = riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_BARE);
 
 	if (ret) {
@@ -663,5 +1262,8 @@ int riscv_iommu_init(struct riscv_iommu_device *iommu)
 	return 0;
  fail:
 	riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_OFF);
+	riscv_iommu_queue_free(iommu, &iommu->priq);
+	riscv_iommu_queue_free(iommu, &iommu->fltq);
+	riscv_iommu_queue_free(iommu, &iommu->cmdq);
 	return ret;
 }
diff --git a/drivers/iommu/riscv/iommu.h b/drivers/iommu/riscv/iommu.h
index 7dc9baa59a50..04148a2a8ffd 100644
--- a/drivers/iommu/riscv/iommu.h
+++ b/drivers/iommu/riscv/iommu.h
@@ -28,6 +28,24 @@
 #define IOMMU_PAGE_SIZE_1G	BIT_ULL(30)
 #define IOMMU_PAGE_SIZE_512G	BIT_ULL(39)
 
+struct riscv_iommu_queue {
+	dma_addr_t base_dma;	/* ring buffer bus address */
+	void *base;		/* ring buffer pointer */
+	size_t len;		/* single item length */
+	u32 cnt;		/* items count */
+	u32 lui;		/* last used index, consumer/producer share */
+	unsigned qbr;		/* queue base register offset */
+	unsigned qcr;		/* queue control and status register offset */
+	int irq;		/* registered interrupt number */
+	bool in_iomem;		/* indicates queue data are in I/O memory  */
+};
+
+enum riscv_queue_ids {
+	RISCV_IOMMU_COMMAND_QUEUE	= 0,
+	RISCV_IOMMU_FAULT_QUEUE		= 1,
+	RISCV_IOMMU_PAGE_REQUEST_QUEUE	= 2
+};
+
 struct riscv_iommu_device {
 	struct iommu_device iommu;	/* iommu core interface */
 	struct device *dev;		/* iommu hardware */
@@ -42,6 +60,11 @@ struct riscv_iommu_device {
 	int irq_pm;
 	int irq_priq;
 
+	/* Queue lengths */
+	int cmdq_len;
+	int fltq_len;
+	int priq_len;
+
 	/* supported and enabled hardware capabilities */
 	u64 cap;
 
@@ -53,6 +76,11 @@ struct riscv_iommu_device {
 	unsigned ddt_mode;
 	bool ddtp_in_iomem;
 
+	/* hardware queues */
+	struct riscv_iommu_queue cmdq;
+	struct riscv_iommu_queue fltq;
+	struct riscv_iommu_queue priq;
+
 	/* Connected end-points */
 	struct rb_root eps;
 	struct mutex eps_mutex;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH 07/11] RISC-V: drivers/iommu/riscv: Add device context support
  2023-07-19 19:33 [PATCH 00/13] Linux RISC-V IOMMU Support Tomasz Jeznach
                   ` (5 preceding siblings ...)
  2023-07-19 19:33 ` [PATCH 06/11] RISC-V: drivers/iommu/riscv: Add command, fault, page-req queues Tomasz Jeznach
@ 2023-07-19 19:33 ` Tomasz Jeznach
  2023-08-16 19:08   ` Robin Murphy
  2023-07-19 19:33 ` [PATCH 08/11] RISC-V: drivers/iommu/riscv: Add page table support Tomasz Jeznach
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 86+ messages in thread
From: Tomasz Jeznach @ 2023-07-19 19:33 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley
  Cc: Palmer Dabbelt, Albert Ou, Anup Patel, Sunil V L,
	Nick Kossifidis, Sebastien Boeuf, iommu, linux-riscv,
	linux-kernel, linux, Tomasz Jeznach

Introduces per-device translation context, backed by 1-, 2- or 3-level
device directory tree structures.
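
For orientation, the device id is sliced into per-level directory
indexes (DDI[0..2]); below is a minimal stand-alone sketch of the
base-format split, assuming the 7/9/8-bit partitioning described in
the patch (the ddi() helper is illustrative, not driver code):

#include <stdint.h>
#include <stdio.h>

/* Base format: DDI[0] = 7 bits, DDI[1] = 9 bits, DDI[2] = 8 bits. */
static unsigned ddi(uint32_t devid, int level)
{
	const unsigned shift[3] = { 0, 7, 7 + 9 };
	const unsigned width[3] = { 7, 9, 8 };

	return (devid >> shift[level]) & ((1u << width[level]) - 1);
}

int main(void)
{
	uint32_t devid = 0x012345;	/* example requester id */

	/* A 3LVL walk resolves DDI[2], then DDI[1], then DDI[0]. */
	printf("DDI[2]=%u DDI[1]=%u DDI[0]=%u\n",
	       ddi(devid, 2), ddi(devid, 1), ddi(devid, 0));
	return 0;
}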

Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
---
 drivers/iommu/riscv/iommu.c | 163 ++++++++++++++++++++++++++++++++++--
 drivers/iommu/riscv/iommu.h |   1 +
 2 files changed, 158 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index 5c4cf9875302..9ee7d2b222b5 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -41,7 +41,7 @@ MODULE_ALIAS("riscv-iommu");
 MODULE_LICENSE("GPL v2");
 
 /* Global IOMMU params. */
-static int ddt_mode = RISCV_IOMMU_DDTP_MODE_BARE;
+static int ddt_mode = RISCV_IOMMU_DDTP_MODE_3LVL;
 module_param(ddt_mode, int, 0644);
 MODULE_PARM_DESC(ddt_mode, "Device Directory Table mode.");
 
@@ -452,6 +452,14 @@ static bool riscv_iommu_post(struct riscv_iommu_device *iommu,
 	return riscv_iommu_post_sync(iommu, cmd, false);
 }
 
+static bool riscv_iommu_iodir_inv_devid(struct riscv_iommu_device *iommu, unsigned devid)
+{
+	struct riscv_iommu_command cmd;
+	riscv_iommu_cmd_iodir_inval_ddt(&cmd);
+	riscv_iommu_cmd_iodir_set_did(&cmd, devid);
+	return riscv_iommu_post(iommu, &cmd);
+}
+
 static bool riscv_iommu_iofence_sync(struct riscv_iommu_device *iommu)
 {
 	struct riscv_iommu_command cmd;
@@ -671,6 +679,94 @@ static bool riscv_iommu_capable(struct device *dev, enum iommu_cap cap)
 	return false;
 }
 
+/* TODO: implement proper device context management, e.g. teardown flow */
+
+/* Lookup or initialize device directory info structure. */
+static struct riscv_iommu_dc *riscv_iommu_get_dc(struct riscv_iommu_device *iommu,
+						 unsigned devid)
+{
+	const bool base_format = !(iommu->cap & RISCV_IOMMU_CAP_MSI_FLAT);
+	unsigned depth = iommu->ddt_mode - RISCV_IOMMU_DDTP_MODE_1LVL;
+	u8 ddi_bits[3] = { 0 };
+	u64 *ddtp = NULL, ddt;
+
+	if (iommu->ddt_mode == RISCV_IOMMU_DDTP_MODE_OFF ||
+	    iommu->ddt_mode == RISCV_IOMMU_DDTP_MODE_BARE)
+		return NULL;
+
+	/* Make sure the mode is valid */
+	if (iommu->ddt_mode > RISCV_IOMMU_DDTP_MODE_MAX)
+		return NULL;
+
+	/*
+	 * Device id partitioning for base format:
+	 * DDI[0]: bits 0 - 6   (1st level) (7 bits)
+	 * DDI[1]: bits 7 - 15  (2nd level) (9 bits)
+	 * DDI[2]: bits 16 - 23 (3rd level) (8 bits)
+	 *
+	 * For extended format:
+	 * DDI[0]: bits 0 - 5   (1st level) (6 bits)
+	 * DDI[1]: bits 6 - 14  (2nd level) (9 bits)
+	 * DDI[2]: bits 15 - 23 (3rd level) (9 bits)
+	 */
+	if (base_format) {
+		ddi_bits[0] = 7;
+		ddi_bits[1] = 7 + 9;
+		ddi_bits[2] = 7 + 9 + 8;
+	} else {
+		ddi_bits[0] = 6;
+		ddi_bits[1] = 6 + 9;
+		ddi_bits[2] = 6 + 9 + 9;
+	}
+
+	/* Make sure device id is within range */
+	if (devid >= (1 << ddi_bits[depth]))
+		return NULL;
+
+	/* Get to the level of the non-leaf node that holds the device context */
+	for (ddtp = (u64 *) iommu->ddtp; depth-- > 0;) {
+		const int split = ddi_bits[depth];
+		/*
+		 * Each non-leaf node is 64bits wide and on each level
+		 * nodes are indexed by DDI[depth].
+		 */
+		ddtp += (devid >> split) & 0x1FF;
+
+ retry:
+		/*
+		 * Check if this node has been populated and if not
+		 * allocate a new level and populate it.
+		 */
+		ddt = READ_ONCE(*ddtp);
+		if (ddt & RISCV_IOMMU_DDTE_VALID) {
+			ddtp = __va(ppn_to_phys(ddt));
+		} else {
+			u64 old, new = get_zeroed_page(GFP_KERNEL);
+			if (!new)
+				return NULL;
+
+			old = cmpxchg64_relaxed(ddtp, ddt,
+						phys_to_ppn(__pa(new)) |
+						RISCV_IOMMU_DDTE_VALID);
+
+			if (old != ddt) {
+				free_page(new);
+				goto retry;
+			}
+
+			ddtp = (u64 *) new;
+		}
+	}
+
+	/*
+	 * Grab the node that matches DDI[depth], note that when using base
+	 * format the device context is 4 * 64bits, and the extended format
+	 * is 8 * 64bits, hence the (3 - base_format) below.
+	 */
+	ddtp += (devid & ((64 << base_format) - 1)) << (3 - base_format);
+	return (struct riscv_iommu_dc *)ddtp;
+}
+
 static struct iommu_device *riscv_iommu_probe_device(struct device *dev)
 {
 	struct riscv_iommu_device *iommu;
@@ -708,6 +804,9 @@ static struct iommu_device *riscv_iommu_probe_device(struct device *dev)
 	ep->iommu = iommu;
 	ep->dev = dev;
 
+	/* Initial DC pointer can be NULL if IOMMU is configured in OFF or BARE mode */
+	ep->dc = riscv_iommu_get_dc(iommu, ep->devid);
+
 	dev_info(iommu->dev, "adding device to iommu with devid %i in domain %i\n",
 		ep->devid, ep->domid);
 
@@ -734,6 +833,16 @@ static void riscv_iommu_release_device(struct device *dev)
 	list_del(&ep->domain);
 	mutex_unlock(&ep->lock);
 
+	if (ep->dc) {
+		/* This should already be done by domain detach. */
+		ep->dc->tc = 0ULL;
+		wmb();
+		ep->dc->fsc = 0ULL;
+		ep->dc->iohgatp = 0ULL;
+		wmb();
+		riscv_iommu_iodir_inv_devid(iommu, ep->devid);
+	}
+
 	/* Remove endpoint from IOMMU tracking structures */
 	mutex_lock(&iommu->eps_mutex);
 	rb_erase(&ep->node, &iommu->eps);
@@ -853,11 +962,21 @@ static int riscv_iommu_domain_finalize(struct riscv_iommu_domain *domain,
 	return 0;
 }
 
+static u64 riscv_iommu_domain_atp(struct riscv_iommu_domain *domain)
+{
+	u64 atp = FIELD_PREP(RISCV_IOMMU_DC_FSC_MODE, domain->mode);
+	if (domain->mode != RISCV_IOMMU_DC_FSC_MODE_BARE)
+		atp |= FIELD_PREP(RISCV_IOMMU_DC_FSC_PPN, virt_to_pfn(domain->pgd_root));
+	return atp;
+}
+
 static int riscv_iommu_attach_dev(struct iommu_domain *iommu_domain, struct device *dev)
 {
 	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
 	struct riscv_iommu_endpoint *ep = dev_iommu_priv_get(dev);
+	struct riscv_iommu_dc *dc = ep->dc;
 	int ret;
+	u64 val;
 
 	/* PSCID not valid */
 	if ((int)domain->pscid < 0)
@@ -880,17 +999,44 @@ static int riscv_iommu_attach_dev(struct iommu_domain *iommu_domain, struct devi
 		return ret;
 	}
 
-	if (ep->iommu->ddt_mode != RISCV_IOMMU_DDTP_MODE_BARE ||
-	    domain->domain.type != IOMMU_DOMAIN_IDENTITY) {
-		dev_warn(dev, "domain type %d not supported\n",
-		    domain->domain.type);
+	if (ep->iommu->ddt_mode == RISCV_IOMMU_DDTP_MODE_BARE &&
+	    domain->domain.type == IOMMU_DOMAIN_IDENTITY) {
+		dev_info(dev, "domain type %d attached w/ PSCID %u\n",
+		    domain->domain.type, domain->pscid);
+		return 0;
+	}
+
+	if (!dc) {
 		return -ENODEV;
 	}
 
+	/*
+	 * S-Stage translation table. G-Stage remains unmodified (BARE).
+	 */
+	val = FIELD_PREP(RISCV_IOMMU_DC_TA_PSCID, domain->pscid);
+
+	dc->ta = cpu_to_le64(val);
+	dc->fsc = cpu_to_le64(riscv_iommu_domain_atp(domain));
+
+	wmb();
+
+	/* Mark device context as valid, synchronise device context cache. */
+	val = RISCV_IOMMU_DC_TC_V;
+
+	if (ep->iommu->cap & RISCV_IOMMU_CAP_AMO) {
+		val |= RISCV_IOMMU_DC_TC_GADE |
+		       RISCV_IOMMU_DC_TC_SADE;
+	}
+
+	dc->tc = cpu_to_le64(val);
+	wmb();
+
 	list_add_tail(&ep->domain, &domain->endpoints);
 	mutex_unlock(&ep->lock);
 	mutex_unlock(&domain->lock);
 
+	riscv_iommu_iodir_inv_devid(ep->iommu, ep->devid);
+
 	dev_info(dev, "domain type %d attached w/ PSCID %u\n",
 	    domain->domain.type, domain->pscid);
 
@@ -1239,7 +1385,12 @@ int riscv_iommu_init(struct riscv_iommu_device *iommu)
 		goto fail;
 
  no_ats:
-	ret = riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_BARE);
+	if (iommu_default_passthrough()) {
+		dev_info(dev, "iommu set to passthrough mode\n");
+		ret = riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_BARE);
+	} else {
+		ret = riscv_iommu_enable(iommu, ddt_mode);
+	}
 
 	if (ret) {
 		dev_err(dev, "cannot enable iommu device (%d)\n", ret);
diff --git a/drivers/iommu/riscv/iommu.h b/drivers/iommu/riscv/iommu.h
index 04148a2a8ffd..9140df71e17b 100644
--- a/drivers/iommu/riscv/iommu.h
+++ b/drivers/iommu/riscv/iommu.h
@@ -105,6 +105,7 @@ struct riscv_iommu_endpoint {
 	unsigned devid;      			/* PCI bus:device:function number */
 	unsigned domid;    			/* PCI domain number, segment */
 	struct rb_node node;    		/* device tracking node (lookup by devid) */
+	struct riscv_iommu_dc *dc;		/* device context pointer */
 	struct riscv_iommu_device *iommu;	/* parent iommu device */
 
 	struct mutex lock;
-- 
2.34.1
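
A side note on riscv_iommu_get_dc() above: non-leaf directory levels
are populated locklessly by allocating a zeroed page and installing it
with a compare-and-swap, retrying (and freeing the candidate) when
another CPU wins the race. A stripped-down user-space sketch of that
pattern, keeping the valid bit in bit 0 of the slot (all names here
are illustrative):

#include <stdint.h>
#include <stdlib.h>

#define VALID 1ULL

/* Return the next-level table for 'slot', allocating it if empty. */
uint64_t *walk_one_level(uint64_t *slot)
{
	uint64_t old, next;

	for (;;) {
		old = __atomic_load_n(slot, __ATOMIC_RELAXED);
		if (old & VALID)
			return (uint64_t *)(uintptr_t)(old & ~VALID);

		next = (uint64_t)(uintptr_t)calloc(512, sizeof(uint64_t));
		if (!next)
			return NULL;

		/* Install; if we lose the race, free and re-read. */
		if (__atomic_compare_exchange_n(slot, &old, next | VALID,
						false, __ATOMIC_RELAXED,
						__ATOMIC_RELAXED))
			return (uint64_t *)(uintptr_t)next;

		free((void *)(uintptr_t)next);
	}
}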


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH 08/11] RISC-V: drivers/iommu/riscv: Add page table support
  2023-07-19 19:33 [PATCH 00/13] Linux RISC-V IOMMU Support Tomasz Jeznach
                   ` (6 preceding siblings ...)
  2023-07-19 19:33 ` [PATCH 07/11] RISC-V: drivers/iommu/riscv: Add device context support Tomasz Jeznach
@ 2023-07-19 19:33 ` Tomasz Jeznach
  2023-07-25 13:13   ` Zong Li
                     ` (2 more replies)
  2023-07-19 19:33 ` [PATCH 09/11] RISC-V: drivers/iommu/riscv: Add SVA with PASID/ATS/PRI support Tomasz Jeznach
                   ` (3 subsequent siblings)
  11 siblings, 3 replies; 86+ messages in thread
From: Tomasz Jeznach @ 2023-07-19 19:33 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley
  Cc: Palmer Dabbelt, Albert Ou, Anup Patel, Sunil V L,
	Nick Kossifidis, Sebastien Boeuf, iommu, linux-riscv,
	linux-kernel, linux, Tomasz Jeznach

Introduces I/O page-level translation services with 4K, 2M and 1G page
size support, and enables page-level iommu_map/unmap domain interfaces.
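
Each table level resolves 9 bits of the IOVA, so leaf entries at
shifts 12/21/30 yield the 4K/2M/1G sizes above. A small stand-alone
sketch of the size rounding and per-level indexing, assuming an
Sv48-style walk (helper names are illustrative):

#include <stdint.h>
#include <stdio.h>

/* Round a requested mapping size down to a supported leaf size. */
static uint64_t leaf_size(uint64_t size)
{
	if (size >= (1ULL << 30))
		return 1ULL << 30;	/* 1G */
	if (size >= (1ULL << 21))
		return 1ULL << 21;	/* 2M */
	return 1ULL << 12;		/* 4K */
}

int main(void)
{
	uint64_t iova = 0x123456789000ULL;

	/* Shifts 39, 30, 21, 12; nine index bits per level. */
	for (unsigned shift = 39; shift >= 12; shift -= 9)
		printf("shift %2u -> index %u\n", shift,
		       (unsigned)((iova >> shift) & 0x1FF));

	printf("leaf for a 3M request: %llu bytes\n",
	       (unsigned long long)leaf_size(3ULL << 20));
	return 0;
}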

Co-developed-by: Sebastien Boeuf <seb@rivosinc.com>
Signed-off-by: Sebastien Boeuf <seb@rivosinc.com>
Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
---
 drivers/iommu/io-pgtable.c       |   3 +
 drivers/iommu/riscv/Makefile     |   2 +-
 drivers/iommu/riscv/io_pgtable.c | 266 +++++++++++++++++++++++++++++++
 drivers/iommu/riscv/iommu.c      |  40 +++--
 drivers/iommu/riscv/iommu.h      |   1 +
 include/linux/io-pgtable.h       |   2 +
 6 files changed, 297 insertions(+), 17 deletions(-)
 create mode 100644 drivers/iommu/riscv/io_pgtable.c

diff --git a/drivers/iommu/io-pgtable.c b/drivers/iommu/io-pgtable.c
index b843fcd365d2..c4807175934f 100644
--- a/drivers/iommu/io-pgtable.c
+++ b/drivers/iommu/io-pgtable.c
@@ -32,6 +32,9 @@ io_pgtable_init_table[IO_PGTABLE_NUM_FMTS] = {
 	[AMD_IOMMU_V1] = &io_pgtable_amd_iommu_v1_init_fns,
 	[AMD_IOMMU_V2] = &io_pgtable_amd_iommu_v2_init_fns,
 #endif
+#ifdef CONFIG_RISCV_IOMMU
+	[RISCV_IOMMU] = &io_pgtable_riscv_init_fns,
+#endif
 };
 
 struct io_pgtable_ops *alloc_io_pgtable_ops(enum io_pgtable_fmt fmt,
diff --git a/drivers/iommu/riscv/Makefile b/drivers/iommu/riscv/Makefile
index 9523eb053cfc..13af452c3052 100644
--- a/drivers/iommu/riscv/Makefile
+++ b/drivers/iommu/riscv/Makefile
@@ -1 +1 @@
-obj-$(CONFIG_RISCV_IOMMU) += iommu.o iommu-pci.o iommu-platform.o iommu-sysfs.o
\ No newline at end of file
+obj-$(CONFIG_RISCV_IOMMU) += iommu.o iommu-pci.o iommu-platform.o iommu-sysfs.o io_pgtable.o
diff --git a/drivers/iommu/riscv/io_pgtable.c b/drivers/iommu/riscv/io_pgtable.c
new file mode 100644
index 000000000000..b6e603e6726e
--- /dev/null
+++ b/drivers/iommu/riscv/io_pgtable.c
@@ -0,0 +1,266 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright © 2022-2023 Rivos Inc.
+ *
+ * RISC-V IOMMU page table allocator.
+ *
+ * Authors:
+ *	Tomasz Jeznach <tjeznach@rivosinc.com>
+ *	Sebastien Boeuf <seb@rivosinc.com>
+ */
+
+#include <linux/atomic.h>
+#include <linux/bitops.h>
+#include <linux/io-pgtable.h>
+#include <linux/kernel.h>
+#include <linux/sizes.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+#include <linux/dma-mapping.h>
+
+#include "iommu.h"
+
+#define io_pgtable_to_domain(x) \
+	container_of((x), struct riscv_iommu_domain, pgtbl)
+
+#define io_pgtable_ops_to_domain(x) \
+	io_pgtable_to_domain(container_of((x), struct io_pgtable, ops))
+
+static inline size_t get_page_size(size_t size)
+{
+	if (size >= IOMMU_PAGE_SIZE_512G)
+		return IOMMU_PAGE_SIZE_512G;
+
+	if (size >= IOMMU_PAGE_SIZE_1G)
+		return IOMMU_PAGE_SIZE_1G;
+
+	if (size >= IOMMU_PAGE_SIZE_2M)
+		return IOMMU_PAGE_SIZE_2M;
+
+	return IOMMU_PAGE_SIZE_4K;
+}
+
+static void riscv_iommu_pt_walk_free(pmd_t * ptp, unsigned shift, bool root)
+{
+	pmd_t *pte, *pt_base;
+	int i;
+
+	if (shift == PAGE_SHIFT)
+		return;
+
+	if (root)
+		pt_base = ptp;
+	else
+		pt_base =
+		    (pmd_t *) pfn_to_virt(__page_val_to_pfn(pmd_val(*ptp)));
+
+	/* Recursively free all sub page table pages */
+	for (i = 0; i < PTRS_PER_PMD; i++) {
+		pte = pt_base + i;
+		if (pmd_present(*pte) && !pmd_leaf(*pte))
+			riscv_iommu_pt_walk_free(pte, shift - 9, false);
+	}
+
+	/* Now free the current page table page */
+	if (!root && pmd_present(*pt_base))
+		free_page((unsigned long)pt_base);
+}
+
+static void riscv_iommu_free_pgtable(struct io_pgtable *iop)
+{
+	struct riscv_iommu_domain *domain = io_pgtable_to_domain(iop);
+	riscv_iommu_pt_walk_free((pmd_t *) domain->pgd_root, PGDIR_SHIFT, true);
+}
+
+static pte_t *riscv_iommu_pt_walk_alloc(pmd_t * ptp, unsigned long iova,
+					unsigned shift, bool root,
+					size_t pgsize,
+					unsigned long (*pd_alloc)(gfp_t),
+					gfp_t gfp)
+{
+	pmd_t *pte;
+	unsigned long pfn;
+
+	if (root)
+		pte = ptp + ((iova >> shift) & (PTRS_PER_PMD - 1));
+	else
+		pte = (pmd_t *) pfn_to_virt(__page_val_to_pfn(pmd_val(*ptp))) +
+		    ((iova >> shift) & (PTRS_PER_PMD - 1));
+
+	if ((1ULL << shift) <= pgsize) {
+		if (pmd_present(*pte) && !pmd_leaf(*pte))
+			riscv_iommu_pt_walk_free(pte, shift - 9, false);
+		return (pte_t *) pte;
+	}
+
+	if (pmd_none(*pte)) {
+		pfn = pd_alloc ? virt_to_pfn(pd_alloc(gfp)) : 0;
+		if (!pfn)
+			return NULL;
+		set_pmd(pte, __pmd((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE));
+	}
+
+	return riscv_iommu_pt_walk_alloc(pte, iova, shift - 9, false,
+					 pgsize, pd_alloc, gfp);
+}
+
+static pte_t *riscv_iommu_pt_walk_fetch(pmd_t * ptp,
+					unsigned long iova, unsigned shift,
+					bool root)
+{
+	pmd_t *pte;
+
+	if (root)
+		pte = ptp + ((iova >> shift) & (PTRS_PER_PMD - 1));
+	else
+		pte = (pmd_t *) pfn_to_virt(__page_val_to_pfn(pmd_val(*ptp))) +
+		    ((iova >> shift) & (PTRS_PER_PMD - 1));
+
+	if (pmd_leaf(*pte))
+		return (pte_t *) pte;
+	else if (pmd_none(*pte))
+		return NULL;
+	else if (shift == PAGE_SHIFT)
+		return NULL;
+
+	return riscv_iommu_pt_walk_fetch(pte, iova, shift - 9, false);
+}
+
+static int riscv_iommu_map_pages(struct io_pgtable_ops *ops,
+				 unsigned long iova, phys_addr_t phys,
+				 size_t pgsize, size_t pgcount, int prot,
+				 gfp_t gfp, size_t *mapped)
+{
+	struct riscv_iommu_domain *domain = io_pgtable_ops_to_domain(ops);
+	size_t size = 0;
+	size_t page_size = get_page_size(pgsize);
+	pte_t *pte;
+	pte_t pte_val;
+	pgprot_t pte_prot;
+
+	if (domain->domain.type == IOMMU_DOMAIN_BLOCKED)
+		return -ENODEV;
+
+	if (domain->domain.type == IOMMU_DOMAIN_IDENTITY) {
+		*mapped = pgsize * pgcount;
+		return 0;
+	}
+
+	pte_prot = (prot & IOMMU_WRITE) ?
+	    __pgprot(_PAGE_BASE | _PAGE_READ | _PAGE_WRITE | _PAGE_DIRTY) :
+	    __pgprot(_PAGE_BASE | _PAGE_READ);
+
+	while (pgcount--) {
+		pte =
+		    riscv_iommu_pt_walk_alloc((pmd_t *) domain->pgd_root, iova,
+					      PGDIR_SHIFT, true, page_size,
+					      get_zeroed_page, gfp);
+		if (!pte) {
+			*mapped = size;
+			return -ENOMEM;
+		}
+
+		pte_val = pfn_pte(phys_to_pfn(phys), pte_prot);
+
+		set_pte(pte, pte_val);
+
+		size += page_size;
+		iova += page_size;
+		phys += page_size;
+	}
+
+	*mapped = size;
+	return 0;
+}
+
+static size_t riscv_iommu_unmap_pages(struct io_pgtable_ops *ops,
+				      unsigned long iova, size_t pgsize,
+				      size_t pgcount,
+				      struct iommu_iotlb_gather *gather)
+{
+	struct riscv_iommu_domain *domain = io_pgtable_ops_to_domain(ops);
+	size_t size = 0;
+	size_t page_size = get_page_size(pgsize);
+	pte_t *pte;
+
+	if (domain->domain.type == IOMMU_DOMAIN_IDENTITY)
+		return pgsize * pgcount;
+
+	while (pgcount--) {
+		pte = riscv_iommu_pt_walk_fetch((pmd_t *) domain->pgd_root,
+						iova, PGDIR_SHIFT, true);
+		if (!pte)
+			return size;
+
+		set_pte(pte, __pte(0));
+
+		iommu_iotlb_gather_add_page(&domain->domain, gather, iova,
+					    pgsize);
+
+		size += page_size;
+		iova += page_size;
+	}
+
+	return size;
+}
+
+static phys_addr_t riscv_iommu_iova_to_phys(struct io_pgtable_ops *ops,
+					    unsigned long iova)
+{
+	struct riscv_iommu_domain *domain = io_pgtable_ops_to_domain(ops);
+	pte_t *pte;
+
+	if (domain->domain.type == IOMMU_DOMAIN_IDENTITY)
+		return (phys_addr_t) iova;
+
+	pte = riscv_iommu_pt_walk_fetch((pmd_t *) domain->pgd_root,
+					iova, PGDIR_SHIFT, true);
+	if (!pte || !pte_present(*pte))
+		return 0;
+
+	return (pfn_to_phys(pte_pfn(*pte)) | (iova & ~PAGE_MASK));
+}
+
+static void riscv_iommu_tlb_inv_all(void *cookie)
+{
+}
+
+static void riscv_iommu_tlb_inv_walk(unsigned long iova, size_t size,
+				     size_t granule, void *cookie)
+{
+}
+
+static void riscv_iommu_tlb_add_page(struct iommu_iotlb_gather *gather,
+				     unsigned long iova, size_t granule,
+				     void *cookie)
+{
+}
+
+static const struct iommu_flush_ops riscv_iommu_flush_ops = {
+	.tlb_flush_all = riscv_iommu_tlb_inv_all,
+	.tlb_flush_walk = riscv_iommu_tlb_inv_walk,
+	.tlb_add_page = riscv_iommu_tlb_add_page,
+};
+
+/* NOTE: cfg should point to riscv_iommu_domain structure member pgtbl.cfg */
+static struct io_pgtable *riscv_iommu_alloc_pgtable(struct io_pgtable_cfg *cfg,
+						    void *cookie)
+{
+	struct io_pgtable *iop = container_of(cfg, struct io_pgtable, cfg);
+
+	cfg->pgsize_bitmap = SZ_4K | SZ_2M | SZ_1G;
+	cfg->ias = 57;		/* va mode, SvXX -> ias */
+	cfg->oas = 57;		/* pa mode, or SvXX+4 -> oas */
+	cfg->tlb = &riscv_iommu_flush_ops;
+
+	iop->ops.map_pages = riscv_iommu_map_pages;
+	iop->ops.unmap_pages = riscv_iommu_unmap_pages;
+	iop->ops.iova_to_phys = riscv_iommu_iova_to_phys;
+
+	return iop;
+}
+
+struct io_pgtable_init_fns io_pgtable_riscv_init_fns = {
+	.alloc = riscv_iommu_alloc_pgtable,
+	.free = riscv_iommu_free_pgtable,
+};
diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index 9ee7d2b222b5..2ef6952a2109 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -807,7 +807,7 @@ static struct iommu_device *riscv_iommu_probe_device(struct device *dev)
 	/* Initial DC pointer can be NULL if IOMMU is configured in OFF or BARE mode */
 	ep->dc = riscv_iommu_get_dc(iommu, ep->devid);
 
-	dev_info(iommu->dev, "adding device to iommu with devid %i in domain %i\n",
+	dev_dbg(iommu->dev, "adding device to iommu with devid %i in domain %i\n",
 		ep->devid, ep->domid);
 
 	dev_iommu_priv_set(dev, ep);
@@ -874,7 +874,10 @@ static struct iommu_domain *riscv_iommu_domain_alloc(unsigned type)
 {
 	struct riscv_iommu_domain *domain;
 
-	if (type != IOMMU_DOMAIN_IDENTITY &&
+	if (type != IOMMU_DOMAIN_DMA &&
+	    type != IOMMU_DOMAIN_DMA_FQ &&
+	    type != IOMMU_DOMAIN_UNMANAGED &&
+	    type != IOMMU_DOMAIN_IDENTITY &&
 	    type != IOMMU_DOMAIN_BLOCKED)
 		return NULL;
 
@@ -890,7 +893,7 @@ static struct iommu_domain *riscv_iommu_domain_alloc(unsigned type)
 	domain->pscid = ida_alloc_range(&riscv_iommu_pscids, 1,
 					RISCV_IOMMU_MAX_PSCID, GFP_KERNEL);
 
-	printk("domain type %x alloc %u\n", type, domain->pscid);
+	printk("domain alloc %u\n", domain->pscid);
 
 	return &domain->domain;
 }
@@ -903,6 +906,9 @@ static void riscv_iommu_domain_free(struct iommu_domain *iommu_domain)
 		pr_warn("IOMMU domain is not empty!\n");
 	}
 
+	if (domain->pgtbl.cookie)
+		free_io_pgtable_ops(&domain->pgtbl.ops);
+
 	if (domain->pgd_root)
 		free_pages((unsigned long)domain->pgd_root, 0);
 
@@ -959,6 +965,9 @@ static int riscv_iommu_domain_finalize(struct riscv_iommu_domain *domain,
 	if (!domain->pgd_root)
 		return -ENOMEM;
 
+	if (!alloc_io_pgtable_ops(RISCV_IOMMU, &domain->pgtbl.cfg, domain))
+		return -ENOMEM;
+
 	return 0;
 }
 
@@ -1006,9 +1015,8 @@ static int riscv_iommu_attach_dev(struct iommu_domain *iommu_domain, struct devi
 		return 0;
 	}
 
-	if (!dc) {
+	if (!dc)
 		return -ENODEV;
-	}
 
 	/*
 	 * S-Stage translation table. G-Stage remains unmodified (BARE).
@@ -1104,12 +1112,11 @@ static int riscv_iommu_map_pages(struct iommu_domain *iommu_domain,
 {
 	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
 
-	if (domain->domain.type == IOMMU_DOMAIN_IDENTITY) {
-		*mapped = pgsize * pgcount;
-		return 0;
-	}
+	if (!domain->pgtbl.ops.map_pages)
+		return -ENODEV;
 
-	return -ENODEV;
+	return domain->pgtbl.ops.map_pages(&domain->pgtbl.ops, iova, phys,
+					   pgsize, pgcount, prot, gfp, mapped);
 }
 
 static size_t riscv_iommu_unmap_pages(struct iommu_domain *iommu_domain,
@@ -1118,10 +1125,11 @@ static size_t riscv_iommu_unmap_pages(struct iommu_domain *iommu_domain,
 {
 	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
 
-	if (domain->domain.type == IOMMU_DOMAIN_IDENTITY)
-		return pgsize * pgcount;
+	if (!domain->pgtbl.ops.unmap_pages)
+		return 0;
 
-	return 0;
+	return domain->pgtbl.ops.unmap_pages(&domain->pgtbl.ops, iova, pgsize,
+					     pgcount, gather);
 }
 
 static phys_addr_t riscv_iommu_iova_to_phys(struct iommu_domain *iommu_domain,
@@ -1129,10 +1137,10 @@ static phys_addr_t riscv_iommu_iova_to_phys(struct iommu_domain *iommu_domain,
 {
 	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
 
-	if (domain->domain.type == IOMMU_DOMAIN_IDENTITY)
-		return (phys_addr_t) iova;
+	if (!domain->pgtbl.ops.iova_to_phys)
+		return 0;
 
-	return 0;
+	return domain->pgtbl.ops.iova_to_phys(&domain->pgtbl.ops, iova);
 }
 
 /*
diff --git a/drivers/iommu/riscv/iommu.h b/drivers/iommu/riscv/iommu.h
index 9140df71e17b..fe32a4eff14e 100644
--- a/drivers/iommu/riscv/iommu.h
+++ b/drivers/iommu/riscv/iommu.h
@@ -88,6 +88,7 @@ struct riscv_iommu_device {
 
 struct riscv_iommu_domain {
 	struct iommu_domain domain;
+	struct io_pgtable pgtbl;
 
 	struct list_head endpoints;
 	struct mutex lock;
diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index 1b7a44b35616..8dd9d3a28e3a 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -19,6 +19,7 @@ enum io_pgtable_fmt {
 	AMD_IOMMU_V2,
 	APPLE_DART,
 	APPLE_DART2,
+	RISCV_IOMMU,
 	IO_PGTABLE_NUM_FMTS,
 };
 
@@ -258,5 +259,6 @@ extern struct io_pgtable_init_fns io_pgtable_arm_mali_lpae_init_fns;
 extern struct io_pgtable_init_fns io_pgtable_amd_iommu_v1_init_fns;
 extern struct io_pgtable_init_fns io_pgtable_amd_iommu_v2_init_fns;
 extern struct io_pgtable_init_fns io_pgtable_apple_dart_init_fns;
+extern struct io_pgtable_init_fns io_pgtable_riscv_init_fns;
 
 #endif /* __IO_PGTABLE_H */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH 09/11] RISC-V: drivers/iommu/riscv: Add SVA with PASID/ATS/PRI support.
  2023-07-19 19:33 [PATCH 00/13] Linux RISC-V IOMMU Support Tomasz Jeznach
                   ` (7 preceding siblings ...)
  2023-07-19 19:33 ` [PATCH 08/11] RISC-V: drivers/iommu/riscv: Add page table support Tomasz Jeznach
@ 2023-07-19 19:33 ` Tomasz Jeznach
  2023-07-31  9:04   ` Zong Li
  2023-07-19 19:33 ` [PATCH 10/11] RISC-V: drivers/iommu/riscv: Add MSI identity remapping Tomasz Jeznach
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 86+ messages in thread
From: Tomasz Jeznach @ 2023-07-19 19:33 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley
  Cc: Palmer Dabbelt, Albert Ou, Anup Patel, Sunil V L,
	Nick Kossifidis, Sebastien Boeuf, iommu, linux-riscv,
	linux-kernel, linux, Tomasz Jeznach

Introduces Shared Virtual Addressing (SVA) for the RISC-V IOMMU, with
ATS/PRI services for capable devices.
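
A detail worth spelling out: ATS invalidations encode the affected
range using the PCIe "Translation Range Size (S)" scheme, where the
address is aligned down to a power-of-two length and the size is
folded into bit 11 and the low address bits. A minimal sketch with a
worked example (the helper mirrors the payload construction in this
patch but is illustrative only):

#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE 4096ULL

/* Encode the inclusive range [start, end] into an ATS payload. */
static uint64_t ats_inval_payload(uint64_t start, uint64_t end)
{
	uint64_t len = end - start + 1;

	if (len < PAGE_SIZE)
		len = PAGE_SIZE;
	else if (len & (len - 1))	/* round up to a power of two */
		len = 1ULL << (64 - __builtin_clzll(len));

	return (start & ~(len - 1)) | (((len - 1) >> 12) << 11);
}

int main(void)
{
	/* A 16K range at 0x101000 aligns down to 0x100000. */
	printf("payload=0x%llx\n",	/* prints 0x101800 */
	       (unsigned long long)ats_inval_payload(0x101000, 0x104fff));
	return 0;
}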

Co-developed-by: Sebastien Boeuf <seb@rivosinc.com>
Signed-off-by: Sebastien Boeuf <seb@rivosinc.com>
Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
---
 drivers/iommu/riscv/iommu.c | 601 +++++++++++++++++++++++++++++++++++-
 drivers/iommu/riscv/iommu.h |  13 +
 2 files changed, 609 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index 2ef6952a2109..6042c35be3ca 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -384,6 +384,89 @@ static inline void riscv_iommu_cmd_iodir_set_did(struct riscv_iommu_command *cmd
 	    FIELD_PREP(RISCV_IOMMU_CMD_IODIR_DID, devid) | RISCV_IOMMU_CMD_IODIR_DV;
 }
 
+static inline void riscv_iommu_cmd_iodir_set_pid(struct riscv_iommu_command *cmd,
+						 unsigned pasid)
+{
+	cmd->dword0 |= FIELD_PREP(RISCV_IOMMU_CMD_IODIR_PID, pasid);
+}
+
+static void riscv_iommu_cmd_ats_inval(struct riscv_iommu_command *cmd)
+{
+	cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_ATS_OPCODE) |
+	    FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_ATS_FUNC_INVAL);
+	cmd->dword1 = 0;
+}
+
+static inline void riscv_iommu_cmd_ats_prgr(struct riscv_iommu_command *cmd)
+{
+	cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_ATS_OPCODE) |
+	    FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_ATS_FUNC_PRGR);
+	cmd->dword1 = 0;
+}
+
+static void riscv_iommu_cmd_ats_set_rid(struct riscv_iommu_command *cmd, u32 rid)
+{
+	cmd->dword0 |= FIELD_PREP(RISCV_IOMMU_CMD_ATS_RID, rid);
+}
+
+static void riscv_iommu_cmd_ats_set_pid(struct riscv_iommu_command *cmd, u32 pid)
+{
+	cmd->dword0 |= FIELD_PREP(RISCV_IOMMU_CMD_ATS_PID, pid) | RISCV_IOMMU_CMD_ATS_PV;
+}
+
+static void riscv_iommu_cmd_ats_set_dseg(struct riscv_iommu_command *cmd, u8 seg)
+{
+	cmd->dword0 |= FIELD_PREP(RISCV_IOMMU_CMD_ATS_DSEG, seg) | RISCV_IOMMU_CMD_ATS_DSV;
+}
+
+static void riscv_iommu_cmd_ats_set_payload(struct riscv_iommu_command *cmd, u64 payload)
+{
+	cmd->dword1 = payload;
+}
+
+/* Prepare the ATS invalidation payload */
+static unsigned long riscv_iommu_ats_inval_payload(unsigned long start,
+						   unsigned long end, bool global_inv)
+{
+	size_t len = end - start + 1;
+	unsigned long payload = 0;
+
+	/*
+	 * PCI Express specification
+	 * Section 10.2.3.2 Translation Range Size (S) Field
+	 */
+	if (len < PAGE_SIZE)
+		len = PAGE_SIZE;
+	else
+		len = __roundup_pow_of_two(len);
+
+	payload = (start & ~(len - 1)) | (((len - 1) >> 12) << 11);
+
+	if (global_inv)
+		payload |= RISCV_IOMMU_CMD_ATS_INVAL_G;
+
+	return payload;
+}
+
+/* Prepare the ATS invalidation payload for all translations to be invalidated. */
+static unsigned long riscv_iommu_ats_inval_all_payload(bool global_inv)
+{
+	unsigned long payload = GENMASK_ULL(62, 11);
+
+	if (global_inv)
+		payload |= RISCV_IOMMU_CMD_ATS_INVAL_G;
+
+	return payload;
+}
+
+/* Prepare the ATS "Page Request Group Response" payload */
+static unsigned long riscv_iommu_ats_prgr_payload(u16 dest_id, u8 resp_code, u16 grp_idx)
+{
+	return FIELD_PREP(RISCV_IOMMU_CMD_ATS_PRGR_DST_ID, dest_id) |
+	    FIELD_PREP(RISCV_IOMMU_CMD_ATS_PRGR_RESP_CODE, resp_code) |
+	    FIELD_PREP(RISCV_IOMMU_CMD_ATS_PRGR_PRG_INDEX, grp_idx);
+}
+
 /* TODO: Convert into lock-less MPSC implementation. */
 static bool riscv_iommu_post_sync(struct riscv_iommu_device *iommu,
 				  struct riscv_iommu_command *cmd, bool sync)
@@ -460,6 +543,16 @@ static bool riscv_iommu_iodir_inv_devid(struct riscv_iommu_device *iommu, unsign
 	return riscv_iommu_post(iommu, &cmd);
 }
 
+static bool riscv_iommu_iodir_inv_pasid(struct riscv_iommu_device *iommu,
+					unsigned devid, unsigned pasid)
+{
+	struct riscv_iommu_command cmd;
+	riscv_iommu_cmd_iodir_inval_pdt(&cmd);
+	riscv_iommu_cmd_iodir_set_did(&cmd, devid);
+	riscv_iommu_cmd_iodir_set_pid(&cmd, pasid);
+	return riscv_iommu_post(iommu, &cmd);
+}
+
 static bool riscv_iommu_iofence_sync(struct riscv_iommu_device *iommu)
 {
 	struct riscv_iommu_command cmd;
@@ -467,6 +560,62 @@ static bool riscv_iommu_iofence_sync(struct riscv_iommu_device *iommu)
 	return riscv_iommu_post_sync(iommu, &cmd, true);
 }
 
+static void riscv_iommu_mm_invalidate(struct mmu_notifier *mn,
+				      struct mm_struct *mm, unsigned long start,
+				      unsigned long end)
+{
+	struct riscv_iommu_command cmd;
+	struct riscv_iommu_endpoint *endpoint;
+	struct riscv_iommu_domain *domain =
+	    container_of(mn, struct riscv_iommu_domain, mn);
+	unsigned long iova;
+	/*
+	 * The mm_types defines vm_end as the first byte after the end address,
+	 * different from IOMMU subsystem using the last address of an address
+	 * range. So do a simple translation here by updating what end means.
+	 */
+	unsigned long payload = riscv_iommu_ats_inval_payload(start, end - 1, true);
+
+	riscv_iommu_cmd_inval_vma(&cmd);
+	riscv_iommu_cmd_inval_set_gscid(&cmd, 0);
+	riscv_iommu_cmd_inval_set_pscid(&cmd, domain->pscid);
+	if (end > start) {
+		/* Cover only the range that is needed */
+		for (iova = start; iova < end; iova += PAGE_SIZE) {
+			riscv_iommu_cmd_inval_set_addr(&cmd, iova);
+			riscv_iommu_post(domain->iommu, &cmd);
+		}
+	} else {
+		riscv_iommu_post(domain->iommu, &cmd);
+	}
+
+	riscv_iommu_iofence_sync(domain->iommu);
+
+	/* ATS invalidation for every device and for specific translation range. */
+	list_for_each_entry(endpoint, &domain->endpoints, domain) {
+		if (!endpoint->pasid_enabled)
+			continue;
+
+		riscv_iommu_cmd_ats_inval(&cmd);
+		riscv_iommu_cmd_ats_set_dseg(&cmd, endpoint->domid);
+		riscv_iommu_cmd_ats_set_rid(&cmd, endpoint->devid);
+		riscv_iommu_cmd_ats_set_pid(&cmd, domain->pasid);
+		riscv_iommu_cmd_ats_set_payload(&cmd, payload);
+		riscv_iommu_post(domain->iommu, &cmd);
+	}
+	riscv_iommu_iofence_sync(domain->iommu);
+}
+
+static void riscv_iommu_mm_release(struct mmu_notifier *mn, struct mm_struct *mm)
+{
+	/* TODO: removed from notifier, cleanup PSCID mapping, flush IOTLB */
+}
+
+static const struct mmu_notifier_ops riscv_iommu_mmuops = {
+	.release = riscv_iommu_mm_release,
+	.invalidate_range = riscv_iommu_mm_invalidate,
+};
+
 /* Command queue primary interrupt handler */
 static irqreturn_t riscv_iommu_cmdq_irq_check(int irq, void *data)
 {
@@ -608,6 +757,128 @@ static void riscv_iommu_add_device(struct riscv_iommu_device *iommu, struct devi
 	mutex_unlock(&iommu->eps_mutex);
 }
 
+/*
+ * Get device reference based on device identifier (requester id).
+ * Decrement reference count with put_device() call.
+ */
+static struct device *riscv_iommu_get_device(struct riscv_iommu_device *iommu,
+					     unsigned devid)
+{
+	struct rb_node *node;
+	struct riscv_iommu_endpoint *ep;
+	struct device *dev = NULL;
+
+	mutex_lock(&iommu->eps_mutex);
+
+	node = iommu->eps.rb_node;
+	while (node && !dev) {
+		ep = rb_entry(node, struct riscv_iommu_endpoint, node);
+		if (ep->devid < devid)
+			node = node->rb_right;
+		else if (ep->devid > devid)
+			node = node->rb_left;
+		else
+			dev = get_device(ep->dev);
+	}
+
+	mutex_unlock(&iommu->eps_mutex);
+
+	return dev;
+}
+
+static int riscv_iommu_ats_prgr(struct device *dev, struct iommu_page_response *msg)
+{
+	struct riscv_iommu_endpoint *ep = dev_iommu_priv_get(dev);
+	struct riscv_iommu_command cmd;
+	u8 resp_code;
+	unsigned long payload;
+
+	switch (msg->code) {
+	case IOMMU_PAGE_RESP_SUCCESS:
+		resp_code = 0b0000;
+		break;
+	case IOMMU_PAGE_RESP_INVALID:
+		resp_code = 0b0001;
+		break;
+	case IOMMU_PAGE_RESP_FAILURE:
+		resp_code = 0b1111;
+		break;
+	}
+	payload = riscv_iommu_ats_prgr_payload(ep->devid, resp_code, msg->grpid);
+
+	/* ATS Page Request Group Response */
+	riscv_iommu_cmd_ats_prgr(&cmd);
+	riscv_iommu_cmd_ats_set_dseg(&cmd, ep->domid);
+	riscv_iommu_cmd_ats_set_rid(&cmd, ep->devid);
+	if (msg->flags & IOMMU_PAGE_RESP_PASID_VALID)
+		riscv_iommu_cmd_ats_set_pid(&cmd, msg->pasid);
+	riscv_iommu_cmd_ats_set_payload(&cmd, payload);
+	riscv_iommu_post(ep->iommu, &cmd);
+
+	return 0;
+}
+
+static void riscv_iommu_page_request(struct riscv_iommu_device *iommu,
+				     struct riscv_iommu_pq_record *req)
+{
+	struct iommu_fault_event event = { 0 };
+	struct iommu_fault_page_request *prm = &event.fault.prm;
+	int ret;
+	struct device *dev;
+	unsigned devid = FIELD_GET(RISCV_IOMMU_PREQ_HDR_DID, req->hdr);
+
+	/* Ignore PGR Stop marker. */
+	if ((req->payload & RISCV_IOMMU_PREQ_PAYLOAD_M) == RISCV_IOMMU_PREQ_PAYLOAD_L)
+		return;
+
+	dev = riscv_iommu_get_device(iommu, devid);
+	if (!dev) {
+		/* TODO: Handle invalid page request */
+		return;
+	}
+
+	event.fault.type = IOMMU_FAULT_PAGE_REQ;
+
+	if (req->payload & RISCV_IOMMU_PREQ_PAYLOAD_L)
+		prm->flags |= IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE;
+	if (req->payload & RISCV_IOMMU_PREQ_PAYLOAD_W)
+		prm->perm |= IOMMU_FAULT_PERM_WRITE;
+	if (req->payload & RISCV_IOMMU_PREQ_PAYLOAD_R)
+		prm->perm |= IOMMU_FAULT_PERM_READ;
+
+	prm->grpid = FIELD_GET(RISCV_IOMMU_PREQ_PRG_INDEX, req->payload);
+	prm->addr = FIELD_GET(RISCV_IOMMU_PREQ_UADDR, req->payload) << PAGE_SHIFT;
+
+	if (req->hdr & RISCV_IOMMU_PREQ_HDR_PV) {
+		prm->flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
+		/* TODO: where to find this bit */
+		prm->flags |= IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID;
+		prm->pasid = FIELD_GET(RISCV_IOMMU_PREQ_HDR_PID, req->hdr);
+	}
+
+	ret = iommu_report_device_fault(dev, &event);
+	if (ret) {
+		struct iommu_page_response resp = {
+			.grpid = prm->grpid,
+			.code = IOMMU_PAGE_RESP_FAILURE,
+		};
+		if (prm->flags & IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID) {
+			resp.flags |= IOMMU_PAGE_RESP_PASID_VALID;
+			resp.pasid = prm->pasid;
+		}
+		riscv_iommu_ats_prgr(dev, &resp);
+	}
+
+	put_device(dev);
+}
+
+static int riscv_iommu_page_response(struct device *dev,
+				     struct iommu_fault_event *evt,
+				     struct iommu_page_response *msg)
+{
+	return riscv_iommu_ats_prgr(dev, msg);
+}
+
 /* Page request interface queue primary interrupt handler */
 static irqreturn_t riscv_iommu_priq_irq_check(int irq, void *data)
 {
@@ -626,7 +897,7 @@ static irqreturn_t riscv_iommu_priq_process(int irq, void *data)
 	struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
 	struct riscv_iommu_device *iommu;
 	struct riscv_iommu_pq_record *requests;
-	unsigned cnt, idx, ctrl;
+	unsigned cnt, len, idx, ctrl;
 
 	iommu = container_of(q, struct riscv_iommu_device, priq);
 	requests = (struct riscv_iommu_pq_record *)q->base;
@@ -649,7 +920,8 @@ static irqreturn_t riscv_iommu_priq_process(int irq, void *data)
 		cnt = riscv_iommu_queue_consume(iommu, q, &idx);
 		if (!cnt)
 			break;
-		dev_warn(iommu->dev, "unexpected %u page requests\n", cnt);
+		for (len = 0; len < cnt; idx++, len++)
+			riscv_iommu_page_request(iommu, &requests[idx]);
 		riscv_iommu_queue_release(iommu, q, cnt);
 	} while (1);
 
@@ -660,6 +932,169 @@ static irqreturn_t riscv_iommu_priq_process(int irq, void *data)
  * Endpoint management
  */
 
+/* Endpoint features/capabilities */
+static void riscv_iommu_disable_ep(struct riscv_iommu_endpoint *ep)
+{
+	struct pci_dev *pdev;
+
+	if (!dev_is_pci(ep->dev))
+		return;
+
+	pdev = to_pci_dev(ep->dev);
+
+	if (ep->pasid_enabled) {
+		pci_disable_ats(pdev);
+		pci_disable_pri(pdev);
+		pci_disable_pasid(pdev);
+		ep->pasid_enabled = false;
+	}
+}
+
+static void riscv_iommu_enable_ep(struct riscv_iommu_endpoint *ep)
+{
+	int rc, feat, num;
+	struct pci_dev *pdev;
+	struct device *dev = ep->dev;
+
+	if (!dev_is_pci(dev))
+		return;
+
+	if (!ep->iommu->iommu.max_pasids)
+		return;
+
+	pdev = to_pci_dev(dev);
+
+	if (!pci_ats_supported(pdev))
+		return;
+
+	if (!pci_pri_supported(pdev))
+		return;
+
+	feat = pci_pasid_features(pdev);
+	if (feat < 0)
+		return;
+
+	num = pci_max_pasids(pdev);
+	if (!num) {
+		dev_warn(dev, "Can't enable PASID (num: %d)\n", num);
+		return;
+	}
+
+	if (num > ep->iommu->iommu.max_pasids)
+		num = ep->iommu->iommu.max_pasids;
+
+	rc = pci_enable_pasid(pdev, feat);
+	if (rc) {
+		dev_warn(dev, "Can't enable PASID (rc: %d)\n", rc);
+		return;
+	}
+
+	rc = pci_reset_pri(pdev);
+	if (rc) {
+		dev_warn(dev, "Can't reset PRI (rc: %d)\n", rc);
+		pci_disable_pasid(pdev);
+		return;
+	}
+
+	/* TODO: Get supported PRI queue length, hard-code to 32 entries */
+	rc = pci_enable_pri(pdev, 32);
+	if (rc) {
+		dev_warn(dev, "Can't enable PRI (rc: %d)\n", rc);
+		pci_disable_pasid(pdev);
+		return;
+	}
+
+	rc = pci_enable_ats(pdev, PAGE_SHIFT);
+	if (rc) {
+		dev_warn(dev, "Can't enable ATS (rc: %d)\n", rc);
+		pci_disable_pri(pdev);
+		pci_disable_pasid(pdev);
+		return;
+	}
+
+	ep->pc = (struct riscv_iommu_pc *)get_zeroed_page(GFP_KERNEL);
+	if (!ep->pc) {
+		pci_disable_ats(pdev);
+		pci_disable_pri(pdev);
+		pci_disable_pasid(pdev);
+		return;
+	}
+
+	ep->pasid_enabled = true;
+	ep->pasid_feat = feat;
+	ep->pasid_bits = ilog2(num);
+
+	dev_dbg(ep->dev, "PASID/ATS support enabled, %d bits\n", ep->pasid_bits);
+}
+
+static int riscv_iommu_enable_sva(struct device *dev)
+{
+	int ret;
+	struct riscv_iommu_endpoint *ep = dev_iommu_priv_get(dev);
+
+	if (!ep || !ep->iommu || !ep->iommu->pq_work)
+		return -EINVAL;
+
+	if (!ep->pasid_enabled)
+		return -ENODEV;
+
+	ret = iopf_queue_add_device(ep->iommu->pq_work, dev);
+	if (ret)
+		return ret;
+
+	return iommu_register_device_fault_handler(dev, iommu_queue_iopf, dev);
+}
+
+static int riscv_iommu_disable_sva(struct device *dev)
+{
+	int ret;
+	struct riscv_iommu_endpoint *ep = dev_iommu_priv_get(dev);
+
+	ret = iommu_unregister_device_fault_handler(dev);
+	if (!ret)
+		ret = iopf_queue_remove_device(ep->iommu->pq_work, dev);
+
+	return ret;
+}
+
+static int riscv_iommu_enable_iopf(struct device *dev)
+{
+	struct riscv_iommu_endpoint *ep = dev_iommu_priv_get(dev);
+
+	if (ep && ep->pasid_enabled)
+		return 0;
+
+	return -EINVAL;
+}
+
+static int riscv_iommu_dev_enable_feat(struct device *dev, enum iommu_dev_features feat)
+{
+	switch (feat) {
+	case IOMMU_DEV_FEAT_IOPF:
+		return riscv_iommu_enable_iopf(dev);
+
+	case IOMMU_DEV_FEAT_SVA:
+		return riscv_iommu_enable_sva(dev);
+
+	default:
+		return -ENODEV;
+	}
+}
+
+static int riscv_iommu_dev_disable_feat(struct device *dev, enum iommu_dev_features feat)
+{
+	switch (feat) {
+	case IOMMU_DEV_FEAT_IOPF:
+		return 0;
+
+	case IOMMU_DEV_FEAT_SVA:
+		return riscv_iommu_disable_sva(dev);
+
+	default:
+		return -ENODEV;
+	}
+}
+
 static int riscv_iommu_of_xlate(struct device *dev, struct of_phandle_args *args)
 {
 	return iommu_fwspec_add_ids(dev, args->args, 1);
@@ -812,6 +1247,7 @@ static struct iommu_device *riscv_iommu_probe_device(struct device *dev)
 
 	dev_iommu_priv_set(dev, ep);
 	riscv_iommu_add_device(iommu, dev);
+	riscv_iommu_enable_ep(ep);
 
 	return &iommu->iommu;
 }
@@ -843,6 +1279,8 @@ static void riscv_iommu_release_device(struct device *dev)
 		riscv_iommu_iodir_inv_devid(iommu, ep->devid);
 	}
 
+	riscv_iommu_disable_ep(ep);
+
 	/* Remove endpoint from IOMMU tracking structures */
 	mutex_lock(&iommu->eps_mutex);
 	rb_erase(&ep->node, &iommu->eps);
@@ -878,7 +1316,8 @@ static struct iommu_domain *riscv_iommu_domain_alloc(unsigned type)
 	    type != IOMMU_DOMAIN_DMA_FQ &&
 	    type != IOMMU_DOMAIN_UNMANAGED &&
 	    type != IOMMU_DOMAIN_IDENTITY &&
-	    type != IOMMU_DOMAIN_BLOCKED)
+	    type != IOMMU_DOMAIN_BLOCKED &&
+	    type != IOMMU_DOMAIN_SVA)
 		return NULL;
 
 	domain = kzalloc(sizeof(*domain), GFP_KERNEL);
@@ -906,6 +1345,9 @@ static void riscv_iommu_domain_free(struct iommu_domain *iommu_domain)
 		pr_warn("IOMMU domain is not empty!\n");
 	}
 
+	if (domain->mn.ops && iommu_domain->mm)
+		mmu_notifier_unregister(&domain->mn, iommu_domain->mm);
+
 	if (domain->pgtbl.cookie)
 		free_io_pgtable_ops(&domain->pgtbl.ops);
 
@@ -1023,14 +1465,29 @@ static int riscv_iommu_attach_dev(struct iommu_domain *iommu_domain, struct devi
 	 */
 	val = FIELD_PREP(RISCV_IOMMU_DC_TA_PSCID, domain->pscid);
 
-	dc->ta = cpu_to_le64(val);
-	dc->fsc = cpu_to_le64(riscv_iommu_domain_atp(domain));
+	if (ep->pasid_enabled) {
+		ep->pc[0].ta = cpu_to_le64(val | RISCV_IOMMU_PC_TA_V);
+		ep->pc[0].fsc = cpu_to_le64(riscv_iommu_domain_atp(domain));
+		dc->ta = 0;
+		dc->fsc = cpu_to_le64(virt_to_pfn(ep->pc) |
+		    FIELD_PREP(RISCV_IOMMU_DC_FSC_MODE, RISCV_IOMMU_DC_FSC_PDTP_MODE_PD8));
+	} else {
+		dc->ta = cpu_to_le64(val);
+		dc->fsc = cpu_to_le64(riscv_iommu_domain_atp(domain));
+	}
 
 	wmb();
 
 	/* Mark device context as valid, synchronise device context cache. */
 	val = RISCV_IOMMU_DC_TC_V;
 
+	if (ep->pasid_enabled) {
+		val |= RISCV_IOMMU_DC_TC_EN_ATS |
+		       RISCV_IOMMU_DC_TC_EN_PRI |
+		       RISCV_IOMMU_DC_TC_DPE |
+		       RISCV_IOMMU_DC_TC_PDTV;
+	}
+
 	if (ep->iommu->cap & RISCV_IOMMU_CAP_AMO) {
 		val |= RISCV_IOMMU_DC_TC_GADE |
 		       RISCV_IOMMU_DC_TC_SADE;
@@ -1051,13 +1508,107 @@ static int riscv_iommu_attach_dev(struct iommu_domain *iommu_domain, struct devi
 	return 0;
 }
 
+static int riscv_iommu_set_dev_pasid(struct iommu_domain *iommu_domain,
+				     struct device *dev, ioasid_t pasid)
+{
+	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
+	struct riscv_iommu_endpoint *ep = dev_iommu_priv_get(dev);
+	u64 ta, fsc;
+
+	if (!iommu_domain || !iommu_domain->mm)
+		return -EINVAL;
+
+	/* Driver uses TC.DPE mode, PASID #0 is incorrect. */
+	if (pasid == 0)
+		return -EINVAL;
+
+	/* Incorrect domain identifier */
+	if ((int)domain->pscid < 0)
+		return -ENOMEM;
+
+	/* Process Context table should be set for pasid enabled endpoints. */
+	if (!ep || !ep->pasid_enabled || !ep->dc || !ep->pc)
+		return -ENODEV;
+
+	domain->pasid = pasid;
+	domain->iommu = ep->iommu;
+	domain->mn.ops = &riscv_iommu_mmuops;
+
+	/* register mm notifier */
+	if (mmu_notifier_register(&domain->mn, iommu_domain->mm))
+		return -ENODEV;
+
+	/* TODO: get SXL value for the process, use 32 bit or SATP mode */
+	fsc = virt_to_pfn(iommu_domain->mm->pgd) | satp_mode;
+	ta = RISCV_IOMMU_PC_TA_V | FIELD_PREP(RISCV_IOMMU_PC_TA_PSCID, domain->pscid);
+
+	fsc = le64_to_cpu(xchg_relaxed(&(ep->pc[pasid].fsc), cpu_to_le64(fsc)));
+	ta = le64_to_cpu(xchg_relaxed(&(ep->pc[pasid].ta), cpu_to_le64(ta)));
+
+	wmb();
+
+	if (ta & RISCV_IOMMU_PC_TA_V) {
+		riscv_iommu_iodir_inv_pasid(ep->iommu, ep->devid, pasid);
+		riscv_iommu_iofence_sync(ep->iommu);
+	}
+
+	dev_info(dev, "domain type %d attached w/ PSCID %u PASID %u\n",
+	    domain->domain.type, domain->pscid, domain->pasid);
+
+	return 0;
+}
+
+static void riscv_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
+{
+	struct riscv_iommu_endpoint *ep = dev_iommu_priv_get(dev);
+	struct riscv_iommu_command cmd;
+	unsigned long payload = riscv_iommu_ats_inval_all_payload(false);
+	u64 ta;
+
+	/* invalidate TA.V */
+	ta = le64_to_cpu(xchg_relaxed(&(ep->pc[pasid].ta), 0));
+
+	wmb();
+
+	dev_info(dev, "domain removed w/ PSCID %u PASID %u\n",
+	    (unsigned)FIELD_GET(RISCV_IOMMU_PC_TA_PSCID, ta), pasid);
+
+	/* 1. invalidate PDT entry */
+	riscv_iommu_iodir_inv_pasid(ep->iommu, ep->devid, pasid);
+
+	/* 2. invalidate all matching IOATC entries (if PASID was valid) */
+	if (ta & RISCV_IOMMU_PC_TA_V) {
+		riscv_iommu_cmd_inval_vma(&cmd);
+		riscv_iommu_cmd_inval_set_gscid(&cmd, 0);
+		riscv_iommu_cmd_inval_set_pscid(&cmd,
+		    FIELD_GET(RISCV_IOMMU_PC_TA_PSCID, ta));
+		riscv_iommu_post(ep->iommu, &cmd);
+	}
+
+	/* 3. Wait IOATC flush to happen */
+	riscv_iommu_iofence_sync(ep->iommu);
+
+	/* 4. ATS invalidation */
+	riscv_iommu_cmd_ats_inval(&cmd);
+	riscv_iommu_cmd_ats_set_dseg(&cmd, ep->domid);
+	riscv_iommu_cmd_ats_set_rid(&cmd, ep->devid);
+	riscv_iommu_cmd_ats_set_pid(&cmd, pasid);
+	riscv_iommu_cmd_ats_set_payload(&cmd, payload);
+	riscv_iommu_post(ep->iommu, &cmd);
+
+	/* 5. Wait DevATC flush to happen */
+	riscv_iommu_iofence_sync(ep->iommu);
+}
+
 static void riscv_iommu_flush_iotlb_range(struct iommu_domain *iommu_domain,
 					  unsigned long *start, unsigned long *end,
 					  size_t *pgsize)
 {
 	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
 	struct riscv_iommu_command cmd;
+	struct riscv_iommu_endpoint *endpoint;
 	unsigned long iova;
+	unsigned long payload;
 
 	if (domain->mode == RISCV_IOMMU_DC_FSC_MODE_BARE)
 		return;
@@ -1065,6 +1616,12 @@ static void riscv_iommu_flush_iotlb_range(struct iommu_domain *iommu_domain,
 	/* Domain not attached to an IOMMU! */
 	BUG_ON(!domain->iommu);
 
+	if (start && end) {
+		payload = riscv_iommu_ats_inval_payload(*start, *end, true);
+	} else {
+		payload = riscv_iommu_ats_inval_all_payload(true);
+	}
+
 	riscv_iommu_cmd_inval_vma(&cmd);
 	riscv_iommu_cmd_inval_set_pscid(&cmd, domain->pscid);
 
@@ -1078,6 +1635,20 @@ static void riscv_iommu_flush_iotlb_range(struct iommu_domain *iommu_domain,
 		riscv_iommu_post(domain->iommu, &cmd);
 	}
 	riscv_iommu_iofence_sync(domain->iommu);
+
+	/* ATS invalidation for every device and for every translation */
+	list_for_each_entry(endpoint, &domain->endpoints, domain) {
+		if (!endpoint->pasid_enabled)
+			continue;
+
+		riscv_iommu_cmd_ats_inval(&cmd);
+		riscv_iommu_cmd_ats_set_dseg(&cmd, endpoint->domid);
+		riscv_iommu_cmd_ats_set_rid(&cmd, endpoint->devid);
+		riscv_iommu_cmd_ats_set_pid(&cmd, domain->pasid);
+		riscv_iommu_cmd_ats_set_payload(&cmd, payload);
+		riscv_iommu_post(domain->iommu, &cmd);
+	}
+	riscv_iommu_iofence_sync(domain->iommu);
 }
 
 static void riscv_iommu_flush_iotlb_all(struct iommu_domain *iommu_domain)
@@ -1310,6 +1881,7 @@ static int riscv_iommu_enable(struct riscv_iommu_device *iommu, unsigned request
 static const struct iommu_domain_ops riscv_iommu_domain_ops = {
 	.free = riscv_iommu_domain_free,
 	.attach_dev = riscv_iommu_attach_dev,
+	.set_dev_pasid = riscv_iommu_set_dev_pasid,
 	.map_pages = riscv_iommu_map_pages,
 	.unmap_pages = riscv_iommu_unmap_pages,
 	.iova_to_phys = riscv_iommu_iova_to_phys,
@@ -1326,9 +1898,13 @@ static const struct iommu_ops riscv_iommu_ops = {
 	.probe_device = riscv_iommu_probe_device,
 	.probe_finalize = riscv_iommu_probe_finalize,
 	.release_device = riscv_iommu_release_device,
+	.remove_dev_pasid = riscv_iommu_remove_dev_pasid,
 	.device_group = riscv_iommu_device_group,
 	.get_resv_regions = riscv_iommu_get_resv_regions,
 	.of_xlate = riscv_iommu_of_xlate,
+	.dev_enable_feat = riscv_iommu_dev_enable_feat,
+	.dev_disable_feat = riscv_iommu_dev_disable_feat,
+	.page_response = riscv_iommu_page_response,
 	.default_domain_ops = &riscv_iommu_domain_ops,
 };
 
@@ -1340,6 +1916,7 @@ void riscv_iommu_remove(struct riscv_iommu_device *iommu)
 	riscv_iommu_queue_free(iommu, &iommu->cmdq);
 	riscv_iommu_queue_free(iommu, &iommu->fltq);
 	riscv_iommu_queue_free(iommu, &iommu->priq);
+	iopf_queue_free(iommu->pq_work);
 }
 
 int riscv_iommu_init(struct riscv_iommu_device *iommu)
@@ -1362,6 +1939,12 @@ int riscv_iommu_init(struct riscv_iommu_device *iommu)
 	}
 #endif
 
+	if (iommu->cap & RISCV_IOMMU_CAP_PD20)
+		iommu->iommu.max_pasids = 1u << 20;
+	else if (iommu->cap & RISCV_IOMMU_CAP_PD17)
+		iommu->iommu.max_pasids = 1u << 17;
+	else if (iommu->cap & RISCV_IOMMU_CAP_PD8)
+		iommu->iommu.max_pasids = 1u << 8;
 	/*
 	 * Assign queue lengths from module parameters if not already
 	 * set on the device tree.
@@ -1387,6 +1970,13 @@ int riscv_iommu_init(struct riscv_iommu_device *iommu)
 		goto fail;
 	if (!(iommu->cap & RISCV_IOMMU_CAP_ATS))
 		goto no_ats;
+	/* PRI functionally depends on ATS's capabilities. */
+	iommu->pq_work = iopf_queue_alloc(dev_name(dev));
+	if (!iommu->pq_work) {
+		dev_err(dev, "failed to allocate iopf queue\n");
+		ret = -ENOMEM;
+		goto fail;
+	}
 
 	ret = riscv_iommu_queue_init(iommu, RISCV_IOMMU_PAGE_REQUEST_QUEUE);
 	if (ret)
@@ -1424,5 +2014,6 @@ int riscv_iommu_init(struct riscv_iommu_device *iommu)
 	riscv_iommu_queue_free(iommu, &iommu->priq);
 	riscv_iommu_queue_free(iommu, &iommu->fltq);
 	riscv_iommu_queue_free(iommu, &iommu->cmdq);
+	iopf_queue_free(iommu->pq_work);
 	return ret;
 }
diff --git a/drivers/iommu/riscv/iommu.h b/drivers/iommu/riscv/iommu.h
index fe32a4eff14e..83e8d00fd0f8 100644
--- a/drivers/iommu/riscv/iommu.h
+++ b/drivers/iommu/riscv/iommu.h
@@ -17,9 +17,10 @@
 #include <linux/iova.h>
 #include <linux/io.h>
 #include <linux/idr.h>
+#include <linux/mmu_notifier.h>
 #include <linux/list.h>
 #include <linux/iommu.h>
 #include <linux/io-pgtable.h>
 
 #include "iommu-bits.h"
 
@@ -76,6 +78,9 @@ struct riscv_iommu_device {
 	unsigned ddt_mode;
 	bool ddtp_in_iomem;
 
+	/* I/O page fault queue */
+	struct iopf_queue *pq_work;
+
 	/* hardware queues */
 	struct riscv_iommu_queue cmdq;
 	struct riscv_iommu_queue fltq;
@@ -91,11 +96,14 @@ struct riscv_iommu_domain {
 	struct io_pgtable pgtbl;
 
 	struct list_head endpoints;
+	struct list_head notifiers;
 	struct mutex lock;
+	struct mmu_notifier mn;
 	struct riscv_iommu_device *iommu;
 
 	unsigned mode;		/* RIO_ATP_MODE_* enum */
 	unsigned pscid;		/* RISC-V IOMMU PSCID */
+	ioasid_t pasid;		/* IOMMU_DOMAIN_SVA: Cached PASID */
 
 	pgd_t *pgd_root;	/* page table root pointer */
 };
@@ -107,10 +115,16 @@ struct riscv_iommu_endpoint {
 	unsigned domid;    			/* PCI domain number, segment */
 	struct rb_node node;    		/* device tracking node (lookup by devid) */
 	struct riscv_iommu_dc *dc;		/* device context pointer */
+	struct riscv_iommu_pc *pc;		/* process context root, valid if pasid_enabled is true */
 	struct riscv_iommu_device *iommu;	/* parent iommu device */
 
 	struct mutex lock;
 	struct list_head domain;		/* endpoint attached managed domain */
+
+	/* end point info bits */
+	unsigned pasid_bits;
+	unsigned pasid_feat;
+	bool pasid_enabled;
 };
 
 /* Helper functions and macros */
-- 
2.34.1
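
One capability-probing detail from this patch: the PDT capability bits
select the process-directory width, and with it the PASID space (8, 17
or 20 bits). A trivial stand-alone sketch of the same selection (the
bit positions below are made up for illustration; the driver uses the
RISCV_IOMMU_CAP_PD* definitions):

#include <stdint.h>
#include <stdio.h>

/* Illustrative capability bits, not the spec encoding. */
#define CAP_PD8		(1ULL << 38)
#define CAP_PD17	(1ULL << 39)
#define CAP_PD20	(1ULL << 40)

static uint32_t max_pasids(uint64_t cap)
{
	if (cap & CAP_PD20)
		return 1u << 20;
	if (cap & CAP_PD17)
		return 1u << 17;
	if (cap & CAP_PD8)
		return 1u << 8;
	return 0;	/* process contexts unsupported */
}

int main(void)
{
	printf("max pasids: %u\n", max_pasids(CAP_PD8 | CAP_PD17));
	return 0;
}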


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH 10/11] RISC-V: drivers/iommu/riscv: Add MSI identity remapping
  2023-07-19 19:33 [PATCH 00/13] Linux RISC-V IOMMU Support Tomasz Jeznach
                   ` (8 preceding siblings ...)
  2023-07-19 19:33 ` [PATCH 09/11] RISC-V: drivers/iommu/riscv: Add SVA with PASID/ATS/PRI support Tomasz Jeznach
@ 2023-07-19 19:33 ` Tomasz Jeznach
  2023-07-31  8:02   ` Zong Li
  2023-08-16 21:43   ` Robin Murphy
  2023-07-19 19:33 ` [PATCH 11/11] RISC-V: drivers/iommu/riscv: Add G-Stage translation support Tomasz Jeznach
       [not found] ` <CAHCEehJKYu3-GSX2L6L4_VVvYt1MagRgPJvYTbqekrjPw3ZSkA@mail.gmail.com>
  11 siblings, 2 replies; 86+ messages in thread
From: Tomasz Jeznach @ 2023-07-19 19:33 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley
  Cc: Palmer Dabbelt, Albert Ou, Anup Patel, Sunil V L,
	Nick Kossifidis, Sebastien Boeuf, iommu, linux-riscv,
	linux-kernel, linux, Tomasz Jeznach

This change provides basic identity mapping support to exercise
the MSI_FLAT hardware capability.
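
With CAP.MSI_FLAT the device context points at a flat, 256-entry MSI
page table; identity remapping simply makes entry i target the i-th
IMSIC interrupt file page. A compact user-space sketch of building
such a table, assuming the example IMSIC base used by this patch (the
PTE field layout below is illustrative):

#include <stdint.h>
#include <stdlib.h>

#define MSI_PTE_V	1ULL		/* valid */
#define MSI_PTE_M_FLAT	(3ULL << 1)	/* basic-translate (flat) mode */
#define IMSIC_BASE	0x28000000ULL	/* example platform address */

/* Build a 256-entry identity MSI table: file i -> IMSIC page i. */
uint64_t *build_msi_identity_table(void)
{
	uint64_t *tbl = calloc(256, sizeof(*tbl));
	int i;

	if (!tbl)
		return NULL;

	for (i = 0; i < 256; i++)
		tbl[i] = MSI_PTE_V | MSI_PTE_M_FLAT |
			 (((IMSIC_BASE >> 12) + i) << 10);

	return tbl;
}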

Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
---
 drivers/iommu/riscv/iommu.c | 81 +++++++++++++++++++++++++++++++++++++
 drivers/iommu/riscv/iommu.h |  3 ++
 2 files changed, 84 insertions(+)

diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index 6042c35be3ca..7b3e3e135cf6 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -61,6 +61,9 @@ MODULE_PARM_DESC(priq_length, "Page request interface queue length.");
 #define RISCV_IOMMU_MAX_PSCID	(1U << 20)
 static DEFINE_IDA(riscv_iommu_pscids);
 
+/* TODO: Enable MSI remapping */
+#define RISCV_IMSIC_BASE	0x28000000
+
 /* 1 second */
 #define RISCV_IOMMU_TIMEOUT	riscv_timebase
 
@@ -932,6 +935,72 @@ static irqreturn_t riscv_iommu_priq_process(int irq, void *data)
  * Endpoint management
  */
 
+static int riscv_iommu_enable_ir(struct riscv_iommu_endpoint *ep)
+{
+	struct riscv_iommu_device *iommu = ep->iommu;
+	struct iommu_resv_region *entry;
+	struct irq_domain *msi_domain;
+	u64 val;
+	int i;
+
+	/* Initialize MSI remapping */
+	if (!ep->dc || !(iommu->cap & RISCV_IOMMU_CAP_MSI_FLAT))
+		return 0;
+
+	ep->msi_root = (struct riscv_iommu_msi_pte *)get_zeroed_page(GFP_KERNEL);
+	if (!ep->msi_root)
+		return -ENOMEM;
+
+	for (i = 0; i < 256; i++) {
+		ep->msi_root[i].pte = RISCV_IOMMU_MSI_PTE_V |
+		    FIELD_PREP(RISCV_IOMMU_MSI_PTE_M, 3) |
+		    phys_to_ppn(RISCV_IMSIC_BASE + i * PAGE_SIZE);
+	}
+
+	entry = iommu_alloc_resv_region(RISCV_IMSIC_BASE, PAGE_SIZE * 256, 0,
+					IOMMU_RESV_SW_MSI, GFP_KERNEL);
+	if (entry)
+		list_add_tail(&entry->list, &ep->regions);
+
+	val = virt_to_pfn(ep->msi_root) |
+	    FIELD_PREP(RISCV_IOMMU_DC_MSIPTP_MODE, RISCV_IOMMU_DC_MSIPTP_MODE_FLAT);
+	ep->dc->msiptp = cpu_to_le64(val);
+
+	/* Single page of MSIPTP, 256 IMSIC files */
+	ep->dc->msi_addr_mask = cpu_to_le64(255);
+	ep->dc->msi_addr_pattern = cpu_to_le64(RISCV_IMSIC_BASE >> 12);
+	wmb();
+
+	/* set msi domain for the device as isolated. hack. */
+	msi_domain = dev_get_msi_domain(ep->dev);
+	if (msi_domain) {
+		msi_domain->flags |= IRQ_DOMAIN_FLAG_ISOLATED_MSI;
+	}
+
+	dev_dbg(ep->dev, "RV-IR enabled\n");
+
+	ep->ir_enabled = true;
+
+	return 0;
+}
+
+static void riscv_iommu_disable_ir(struct riscv_iommu_endpoint *ep)
+{
+	if (!ep->ir_enabled)
+		return;
+
+	ep->dc->msi_addr_pattern = 0ULL;
+	ep->dc->msi_addr_mask = 0ULL;
+	ep->dc->msiptp = 0ULL;
+	wmb();
+
+	dev_dbg(ep->dev, "RV-IR disabled\n");
+
+	free_pages((unsigned long)ep->msi_root, 0);
+	ep->msi_root = NULL;
+	ep->ir_enabled = false;
+}
+
 /* Endpoint features/capabilities */
 static void riscv_iommu_disable_ep(struct riscv_iommu_endpoint *ep)
 {
@@ -1226,6 +1295,7 @@ static struct iommu_device *riscv_iommu_probe_device(struct device *dev)
 
 	mutex_init(&ep->lock);
 	INIT_LIST_HEAD(&ep->domain);
+	INIT_LIST_HEAD(&ep->regions);
 
 	if (dev_is_pci(dev)) {
 		ep->devid = pci_dev_id(to_pci_dev(dev));
@@ -1248,6 +1318,7 @@ static struct iommu_device *riscv_iommu_probe_device(struct device *dev)
 	dev_iommu_priv_set(dev, ep);
 	riscv_iommu_add_device(iommu, dev);
 	riscv_iommu_enable_ep(ep);
+	riscv_iommu_enable_ir(ep);
 
 	return &iommu->iommu;
 }
@@ -1279,6 +1350,7 @@ static void riscv_iommu_release_device(struct device *dev)
 		riscv_iommu_iodir_inv_devid(iommu, ep->devid);
 	}
 
+	riscv_iommu_disable_ir(ep);
 	riscv_iommu_disable_ep(ep);
 
 	/* Remove endpoint from IOMMU tracking structures */
@@ -1301,6 +1373,15 @@ static struct iommu_group *riscv_iommu_device_group(struct device *dev)
 
 static void riscv_iommu_get_resv_regions(struct device *dev, struct list_head *head)
 {
+	struct iommu_resv_region *entry, *new_entry;
+	struct riscv_iommu_endpoint *ep = dev_iommu_priv_get(dev);
+
+	list_for_each_entry(entry, &ep->regions, list) {
+		new_entry = kmemdup(entry, sizeof(*entry), GFP_KERNEL);
+		if (new_entry)
+			list_add_tail(&new_entry->list, head);
+	}
+
 	iommu_dma_get_resv_regions(dev, head);
 }
 
diff --git a/drivers/iommu/riscv/iommu.h b/drivers/iommu/riscv/iommu.h
index 83e8d00fd0f8..55418a1144fb 100644
--- a/drivers/iommu/riscv/iommu.h
+++ b/drivers/iommu/riscv/iommu.h
@@ -117,14 +117,17 @@ struct riscv_iommu_endpoint {
 	struct riscv_iommu_dc *dc;		/* device context pointer */
 	struct riscv_iommu_pc *pc;		/* process context root, valid if pasid_enabled is true */
 	struct riscv_iommu_device *iommu;	/* parent iommu device */
+	struct riscv_iommu_msi_pte *msi_root;	/* interrupt re-mapping */
 
 	struct mutex lock;
 	struct list_head domain;		/* endpoint attached managed domain */
+	struct list_head regions;		/* reserved regions, interrupt remapping window */
 
 	/* end point info bits */
 	unsigned pasid_bits;
 	unsigned pasid_feat;
 	bool pasid_enabled;
+	bool ir_enabled;
 };
 
 /* Helper functions and macros */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH 11/11] RISC-V: drivers/iommu/riscv: Add G-Stage translation support
  2023-07-19 19:33 [PATCH 00/13] Linux RISC-V IOMMU Support Tomasz Jeznach
                   ` (9 preceding siblings ...)
  2023-07-19 19:33 ` [PATCH 10/11] RISC-V: drivers/iommu/riscv: Add MSI identity remapping Tomasz Jeznach
@ 2023-07-19 19:33 ` Tomasz Jeznach
  2023-07-31  8:12   ` Zong Li
  2023-08-16 21:13   ` Robin Murphy
       [not found] ` <CAHCEehJKYu3-GSX2L6L4_VVvYt1MagRgPJvYTbqekrjPw3ZSkA@mail.gmail.com>
  11 siblings, 2 replies; 86+ messages in thread
From: Tomasz Jeznach @ 2023-07-19 19:33 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley
  Cc: Palmer Dabbelt, Albert Ou, Anup Patel, Sunil V L,
	Nick Kossifidis, Sebastien Boeuf, iommu, linux-riscv,
	linux-kernel, linux, Tomasz Jeznach

This change introduces 2nd stage translation configuration
support, enabling nested translation for IOMMU hardware.
Integration with VMM IOMMUFD interfaces to manage 1st stage
translation and IOMMU virtualization interfaces is pending.

Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
---
 drivers/iommu/riscv/iommu.c | 58 ++++++++++++++++++++++++++++---------
 drivers/iommu/riscv/iommu.h |  3 +-
 2 files changed, 46 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index 7b3e3e135cf6..3ca2f0194d3c 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -1418,6 +1418,19 @@ static struct iommu_domain *riscv_iommu_domain_alloc(unsigned type)
 	return &domain->domain;
 }
 
+/* mark domain as second-stage translation */
+static int riscv_iommu_enable_nesting(struct iommu_domain *iommu_domain)
+{
+	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
+
+	mutex_lock(&domain->lock);
+	if (list_empty(&domain->endpoints))
+		domain->g_stage = true;
+	mutex_unlock(&domain->lock);
+
+	return domain->g_stage ? 0 : -EBUSY;
+}
+
 static void riscv_iommu_domain_free(struct iommu_domain *iommu_domain)
 {
 	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
@@ -1433,7 +1446,7 @@ static void riscv_iommu_domain_free(struct iommu_domain *iommu_domain)
 		free_io_pgtable_ops(&domain->pgtbl.ops);
 
 	if (domain->pgd_root)
-		free_pages((unsigned long)domain->pgd_root, 0);
+		free_pages((unsigned long)domain->pgd_root, domain->g_stage ? 2 : 0);
 
 	if ((int)domain->pscid > 0)
 		ida_free(&riscv_iommu_pscids, domain->pscid);
@@ -1483,7 +1496,8 @@ static int riscv_iommu_domain_finalize(struct riscv_iommu_domain *domain,
 
 	/* TODO: Fix this for RV32 */
 	domain->mode = satp_mode >> 60;
-	domain->pgd_root = (pgd_t *) __get_free_pages(GFP_KERNEL | __GFP_ZERO, 0);
+	domain->pgd_root = (pgd_t *) __get_free_pages(GFP_KERNEL | __GFP_ZERO,
+						      domain->g_stage ? 2 : 0);
 
 	if (!domain->pgd_root)
 		return -ENOMEM;
@@ -1499,6 +1513,8 @@ static u64 riscv_iommu_domain_atp(struct riscv_iommu_domain *domain)
 	u64 atp = FIELD_PREP(RISCV_IOMMU_DC_FSC_MODE, domain->mode);
 	if (domain->mode != RISCV_IOMMU_DC_FSC_MODE_BARE)
 		atp |= FIELD_PREP(RISCV_IOMMU_DC_FSC_PPN, virt_to_pfn(domain->pgd_root));
+	if (domain->g_stage)
+		atp |= FIELD_PREP(RISCV_IOMMU_DC_IOHGATP_GSCID, domain->pscid);
 	return atp;
 }
 
@@ -1541,20 +1557,30 @@ static int riscv_iommu_attach_dev(struct iommu_domain *iommu_domain, struct devi
 	if (!dc)
 		return -ENODEV;
 
-	/*
-	 * S-Stage translation table. G-Stage remains unmodified (BARE).
-	 */
-	val = FIELD_PREP(RISCV_IOMMU_DC_TA_PSCID, domain->pscid);
-
-	if (ep->pasid_enabled) {
-		ep->pc[0].ta = cpu_to_le64(val | RISCV_IOMMU_PC_TA_V);
-		ep->pc[0].fsc = cpu_to_le64(riscv_iommu_domain_atp(domain));
+	if (domain->g_stage) {
+		/*
+		 * Enable G-Stage translation with initial pass-through mode
+		 * for S-Stage. VMM is responsible for more restrictive
+		 * guest VA translation scheme configuration.
+		 */
 		dc->ta = 0;
-		dc->fsc = cpu_to_le64(virt_to_pfn(ep->pc) |
-		    FIELD_PREP(RISCV_IOMMU_DC_FSC_MODE, RISCV_IOMMU_DC_FSC_PDTP_MODE_PD8));
+		dc->fsc = 0ULL; /* RISCV_IOMMU_DC_FSC_MODE_BARE */
+		dc->iohgatp = cpu_to_le64(riscv_iommu_domain_atp(domain));
 	} else {
-		dc->ta = cpu_to_le64(val);
-		dc->fsc = cpu_to_le64(riscv_iommu_domain_atp(domain));
+		/* S-Stage translation table. G-Stage remains unmodified. */
+		if (ep->pasid_enabled) {
+			val = FIELD_PREP(RISCV_IOMMU_DC_TA_PSCID, domain->pscid);
+			ep->pc[0].ta = cpu_to_le64(val | RISCV_IOMMU_PC_TA_V);
+			ep->pc[0].fsc = cpu_to_le64(riscv_iommu_domain_atp(domain));
+			dc->ta = 0;
+			val = FIELD_PREP(RISCV_IOMMU_DC_FSC_MODE,
+					  RISCV_IOMMU_DC_FSC_PDTP_MODE_PD8);
+			dc->fsc = cpu_to_le64(val | virt_to_pfn(ep->pc));
+		} else {
+			val = FIELD_PREP(RISCV_IOMMU_DC_TA_PSCID, domain->pscid);
+			dc->ta = cpu_to_le64(val);
+			dc->fsc = cpu_to_le64(riscv_iommu_domain_atp(domain));
+		}
 	}
 
 	wmb();
@@ -1599,6 +1625,9 @@ static int riscv_iommu_set_dev_pasid(struct iommu_domain *iommu_domain,
 	if (!iommu_domain || !iommu_domain->mm)
 		return -EINVAL;
 
+	if (domain->g_stage)
+		return -EINVAL;
+
 	/* Driver uses TC.DPE mode, PASID #0 is incorrect. */
 	if (pasid == 0)
 		return -EINVAL;
@@ -1969,6 +1998,7 @@ static const struct iommu_domain_ops riscv_iommu_domain_ops = {
 	.iotlb_sync = riscv_iommu_iotlb_sync,
 	.iotlb_sync_map = riscv_iommu_iotlb_sync_map,
 	.flush_iotlb_all = riscv_iommu_flush_iotlb_all,
+	.enable_nesting = riscv_iommu_enable_nesting,
 };
 
 static const struct iommu_ops riscv_iommu_ops = {
diff --git a/drivers/iommu/riscv/iommu.h b/drivers/iommu/riscv/iommu.h
index 55418a1144fb..55e5aafea5bc 100644
--- a/drivers/iommu/riscv/iommu.h
+++ b/drivers/iommu/riscv/iommu.h
@@ -102,8 +102,9 @@ struct riscv_iommu_domain {
 	struct riscv_iommu_device *iommu;
 
 	unsigned mode;		/* RIO_ATP_MODE_* enum */
-	unsigned pscid;		/* RISC-V IOMMU PSCID */
+	unsigned pscid;		/* RISC-V IOMMU PSCID / GSCID */
 	ioasid_t pasid;		/* IOMMU_DOMAIN_SVA: Cached PASID */
+	bool g_stage;		/* 2nd stage translation domain */
 
 	pgd_t *pgd_root;	/* page table root pointer */
 };
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* Re: [PATCH 03/11] dt-bindings: Add RISC-V IOMMU bindings
  2023-07-19 19:33 ` [PATCH 03/11] dt-bindings: Add RISC-V IOMMU bindings Tomasz Jeznach
@ 2023-07-19 20:19   ` Conor Dooley
       [not found]     ` <CAH2o1u6CZSb7pXcaXmh7dJQmNZYh3uORk4x7vJPrb+uCwFdU5g@mail.gmail.com>
  2023-07-19 21:37     ` Rob Herring
  2023-07-24  8:03   ` Zong Li
  1 sibling, 2 replies; 86+ messages in thread
From: Conor Dooley @ 2023-07-19 20:19 UTC (permalink / raw)
  To: Tomasz Jeznach
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley,
	Anup Patel, Albert Ou, linux, linux-kernel, Sebastien Boeuf,
	iommu, Palmer Dabbelt, Nick Kossifidis, linux-riscv, robh+dt,
	krzysztof.kozlowski+dt, devicetree

[-- Attachment #1: Type: text/plain, Size: 5717 bytes --]

Hey Tomasz,

On Wed, Jul 19, 2023 at 12:33:47PM -0700, Tomasz Jeznach wrote:
> From: Anup Patel <apatel@ventanamicro.com>
> 
> We add DT bindings document for RISC-V IOMMU platform and PCI devices
> defined by the RISC-V IOMMU specification.
> 
> Signed-off-by: Anup Patel <apatel@ventanamicro.com>

Your signoff is missing from here.

Secondly, as get_maintainer.pl would have told you, dt-bindings patches
need to be sent to the dt-binding maintainers and list.
+CC maintainers & list.

Thirdly, dt-binding patches should come before their users.

> ---
>  .../bindings/iommu/riscv,iommu.yaml           | 146 ++++++++++++++++++
>  1 file changed, 146 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> 
> diff --git a/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml b/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> new file mode 100644
> index 000000000000..8a9aedb61768
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> @@ -0,0 +1,146 @@
> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/iommu/riscv,iommu.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: RISC-V IOMMU Implementation
> +
> +maintainers:
> +  - Tomasz Jeznach <tjeznach@rivosinc.com>

What about Anup, who seems to have written this?
Or your co-authors of the drivers?

> +
> +description:
> +  The RISC-V IOMMU specificaiton defines an IOMMU for RISC-V platforms
> +  which can be a regular platform device or a PCI device connected to
> +  the host root port.
> +
> +  The RISC-V IOMMU provides two stage translation, device directory table,
> +  command queue and fault reporting as wired interrupt or MSIx event for
> +  both PCI and platform devices.
> +
> +  Visit https://github.com/riscv-non-isa/riscv-iommu for more details.
> +
> +properties:
> +  compatible:
> +    oneOf:
> +      - description: RISC-V IOMMU as a platform device
> +        items:
> +          - enum:
> +              - vendor,chip-iommu

These dummy compatibles are not valid, as was pointed out to Anup on
the AIA series. Please go look at what was done there instead:
https://lore.kernel.org/all/20230719113542.2293295-7-apatel@ventanamicro.com/

> +          - const: riscv,iommu
> +
> +      - description: RISC-V IOMMU as a PCI device connected to root port
> +        items:
> +          - enum:
> +              - vendor,chip-pci-iommu
> +          - const: riscv,pci-iommu

I'm not really au fait with the arm smmu stuff, but do any of its
versions support being connected to a root port? 

> +  reg:
> +    maxItems: 1
> +    description:
> +      For RISC-V IOMMU as a platform device, this represents the MMIO base
> +      address of registers.
> +
> +      For RISC-V IOMMU as a PCI device, this represents the PCI-PCI bridge
> +      details as described in Documentation/devicetree/bindings/pci/pci.txt
> +
> +  '#iommu-cells':
> +    const: 2
> +    description: |

|s are only needed where formatting needs to be preserved.

> +      Each IOMMU specifier represents the base device ID and number of
> +      device IDs.
> +
> +  interrupts:
> +    minItems: 1
> +    maxItems: 16

What are any of these interrupts?

> +    description:
> +      The presence of this property implies that given RISC-V IOMMU uses
> +      wired interrupts to notify the RISC-V HARTS (or CPUs).
> +
> +  msi-parent:
> +    description:
> +      The presence of this property implies that given RISC-V IOMMU uses
> +      MSIx to notify the RISC-V HARTs (or CPUs). This property should be
> +      considered only when the interrupts property is absent.
> +
> +  dma-coherent:

RISC-V is dma-coherent by default, should this not be dma-noncoherent
instead?

> +    description:
> +      Present if page table walks and DMA accessed made by the RISC-V IOMMU
> +      are cache coherent with the CPU.
> +
> +  power-domains:
> +    maxItems: 1
> +
> +required:
> +  - compatible
> +  - reg
> +  - '#iommu-cells'
> +
> +additionalProperties: false
> +
> +examples:
> +  - |
> +    /* Example 1 (IOMMU platform device with wired interrupts) */
> +    immu1: iommu@1bccd000 {

Why is this "immu"? typo or intentional?

> +        compatible = "vendor,chip-iommu", "riscv,iommu";
> +        reg = <0x1bccd000 0x1000>;
> +        interrupt-parent = <&aplic_smode>;
> +        interrupts = <32 4>, <33 4>, <34 4>, <35 4>;
> +        #iommu-cells = <2>;
> +    };
> +
> +    /* Device with two IOMMU device IDs, 0 and 7 */
> +    master1 {
> +        iommus = <&immu1 0 1>, <&immu1 7 1>;
> +    };
> +
> +  - |
> +    /* Example 2 (IOMMU platform device with MSIs) */
> +    immu2: iommu@1bcdd000 {
> +        compatible = "vendor,chip-iommu", "riscv,iommu";
> +        reg = <0x1bccd000 0x1000>;
> +        msi-parent = <&imsics_smode>;
> +        #iommu-cells = <2>;
> +    };
> +
> +    bus {
> +        #address-cells = <2>;
> +        #size-cells = <2>;
> +
> +        /* Device with IOMMU device IDs ranging from 32 to 64 */
> +        master1 {
> +                iommus = <&immu2 32 32>;
> +        };
> +
> +        pcie@40000000 {
> +            compatible = "pci-host-cam-generic";
> +            device_type = "pci";
> +            #address-cells = <3>;
> +            #size-cells = <2>;
> +            bus-range = <0x0 0x1>;
> +
> +            /* CPU_PHYSICAL(2)  SIZE(2) */

These sort of comments seem to just repeat what address-cells &
size-cells has already said, no?

Thanks,
Conor.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 02/11] RISC-V: arch/riscv/config: enable RISC-V IOMMU support
  2023-07-19 19:33 ` [PATCH 02/11] RISC-V: arch/riscv/config: enable RISC-V IOMMU support Tomasz Jeznach
@ 2023-07-19 20:22   ` Conor Dooley
  2023-07-19 21:07     ` Tomasz Jeznach
  0 siblings, 1 reply; 86+ messages in thread
From: Conor Dooley @ 2023-07-19 20:22 UTC (permalink / raw)
  To: Tomasz Jeznach
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley,
	Anup Patel, Albert Ou, linux, linux-kernel, Sebastien Boeuf,
	iommu, Palmer Dabbelt, Nick Kossifidis, linux-riscv

[-- Attachment #1: Type: text/plain, Size: 1196 bytes --]

On Wed, Jul 19, 2023 at 12:33:46PM -0700, Tomasz Jeznach wrote:

$subject: RISC-V: arch/riscv/config: enable RISC-V IOMMU support

Please look at any other commits to the files you are touching
and use a subject line that emulates them. In this case, try
git log --oneline --no-merges -- arch/riscv/configs/
Same goes for the odd pattern in your driver patches.

Also, the patch may be trivial, but you still need to sign off on it
and provide a commit message.

Thanks,
Conor.

> ---
>  arch/riscv/configs/defconfig | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/riscv/configs/defconfig b/arch/riscv/configs/defconfig
> index 0a0107460a5c..1a0c3b24329f 100644
> --- a/arch/riscv/configs/defconfig
> +++ b/arch/riscv/configs/defconfig
> @@ -178,6 +178,7 @@ CONFIG_VIRTIO_PCI=y
>  CONFIG_VIRTIO_BALLOON=y
>  CONFIG_VIRTIO_INPUT=y
>  CONFIG_VIRTIO_MMIO=y
> +CONFIG_RISCV_IOMMU=y
>  CONFIG_SUN8I_DE2_CCU=m
>  CONFIG_SUN50I_IOMMU=y
>  CONFIG_RPMSG_CHAR=y
> -- 
> 2.34.1
> 
> 
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 01/11] RISC-V: drivers/iommu: Add RISC-V IOMMU - Ziommu support.
  2023-07-19 19:33 ` [PATCH 01/11] RISC-V: drivers/iommu: Add RISC-V IOMMU - Ziommu support Tomasz Jeznach
@ 2023-07-19 20:49   ` Conor Dooley
  2023-07-19 21:43     ` Tomasz Jeznach
  2023-07-20 10:38   ` Baolu Lu
                     ` (6 subsequent siblings)
  7 siblings, 1 reply; 86+ messages in thread
From: Conor Dooley @ 2023-07-19 20:49 UTC (permalink / raw)
  To: Tomasz Jeznach
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley,
	Anup Patel, Albert Ou, linux, linux-kernel, Sebastien Boeuf,
	iommu, Palmer Dabbelt, Nick Kossifidis, linux-riscv

[-- Attachment #1: Type: text/plain, Size: 2710 bytes --]

Hey Tomasz,

On Wed, Jul 19, 2023 at 12:33:45PM -0700, Tomasz Jeznach wrote:
> The patch introduces skeleton IOMMU device driver implementation as defined
> by RISC-V IOMMU Architecture Specification, Version 1.0 [1], with minimal support
> for pass-through mapping, basic initialization and bindings for platform and PCIe
> hardware implementations.
> 
> Series of patches following specification evolution has been reorganized to provide
> functional separation of implemented blocks, compliant with ratified specification.
> 
> This and following patch series includes code contributed by: Nick Kossifidis
> <mick@ics.forth.gr> (iommu-platform device, number of specification clarification
> and bugfixes and readability improvements), Sebastien Boeuf <seb@rivosinc.com> (page
> table creation, ATS/PGR flow).
> 
> Complete history can be found at the maintainer's repository branch [2].
> 
> Device driver enables RISC-V 32/64 support for memory translation for DMA capable
> PCI and platform devices, multilevel device directory table, process directory,
> shared virtual address support, wired and message signaled interrupt for translation
> I/O fault, page request interface and command processing.
> 
> Matching RISCV-V IOMMU device emulation implementation is available for QEMU project,
> along with educational device extensions for PASID ATS/PRI support [3].

This commit message reads like a cover letter IMO. At whatever point
you send a v2, could you re-write this focusing on what is done in the
patch itself?

Also, since I am not going to reply to any of these iommu driver patches
in a meaningful capacity, please run checkpatch.pl on your work. There
are well over 100 style etc complaints that it has highlighted. Sparse
has also gone a bit nuts, with many warnings along the lines of:
drivers/iommu/riscv/iommu.c:1568:29: warning: incorrect type in assignment (different base types)
drivers/iommu/riscv/iommu.c:1568:29:    expected unsigned long long [usertype] iohgatp
drivers/iommu/riscv/iommu.c:1568:29:    got restricted __le64 [usertype]

I can provide you the full list when the patchwork automation has run
through the series.

Anyway, what I wanted to ask was whether it was valid to use the IOMMU
in a system if Ziommu is not present in whatever the ISA extension
communication mechanism is? Eg, riscv,isa or the ISA string property in
the ACPI tables.

Thanks,
Conor.

> References:
>  - [1] https://github.com/riscv-non-isa/riscv-iommu
>  - [2] https://github.com/tjeznach/linux/tree/tjeznach/riscv-iommu
>  - [3] https://github.com/tjeznach/qemu/tree/tjeznach/riscv-iommu

FYI, we have the Link: tag/trailer for this.
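
For example, as trailers in the commit message itself (placement sketch
only, reusing the URLs above):

  Link: https://github.com/riscv-non-isa/riscv-iommu [1]
  Link: https://github.com/tjeznach/linux/tree/tjeznach/riscv-iommu [2]
  Link: https://github.com/tjeznach/qemu/tree/tjeznach/riscv-iommu [3]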


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 03/11] dt-bindings: Add RISC-V IOMMU bindings
       [not found]     ` <CAH2o1u6CZSb7pXcaXmh7dJQmNZYh3uORk4x7vJPrb+uCwFdU5g@mail.gmail.com>
@ 2023-07-19 20:57       ` Conor Dooley
  0 siblings, 0 replies; 86+ messages in thread
From: Conor Dooley @ 2023-07-19 20:57 UTC (permalink / raw)
  To: Tomasz Jeznach
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley,
	Anup Patel, Albert Ou, linux, linux-kernel, Sebastien Boeuf,
	iommu, Palmer Dabbelt, Nick Kossifidis, linux-riscv, robh+dt,
	krzysztof.kozlowski+dt, devicetree

[-- Attachment #1: Type: text/plain, Size: 8126 bytes --]

On Wed, Jul 19, 2023 at 01:52:28PM -0700, Tomasz Jeznach wrote:
> On Wed, Jul 19, 2023 at 1:19 PM Conor Dooley <conor@kernel.org> wrote:
> 
> > Hey Tomasz,
> >
> > On Wed, Jul 19, 2023 at 12:33:47PM -0700, Tomasz Jeznach wrote:
> > > From: Anup Patel <apatel@ventanamicro.com>
> > >
> > > We add DT bindings document for RISC-V IOMMU platform and PCI devices
> > > defined by the RISC-V IOMMU specification.
> > >
> > > Signed-off-by: Anup Patel <apatel@ventanamicro.com>
> >
> > Your signoff is missing from here.
> >
> > Secondly, as get_maintainer.pl would have told you, dt-bindings patches
> > need to be sent to the dt-binding maintainers and list.
> > +CC maintainers & list.
> >
> > Thirdly, dt-binding patches should come before their users.
> >
> 
> 
> Thank you for pointing out and adding DT maintainers.
> The signoff is definitely missing, and I will amend with other fixes /
> reordering.

Yeah, please wait until you get actual feedback on the drivers etc
though before you do that.

Also, don't send html emails to the mailing lists. They will be rejected
and those outside of direct-cc will not see the emails.

> > > ---
> > >  .../bindings/iommu/riscv,iommu.yaml           | 146 ++++++++++++++++++
> > >  1 file changed, 146 insertions(+)
> > >  create mode 100644
> > Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> > >
> > > diff --git a/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> > b/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> > > new file mode 100644
> > > index 000000000000..8a9aedb61768
> > > --- /dev/null
> > > +++ b/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> > > @@ -0,0 +1,146 @@
> > > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > > +%YAML 1.2
> > > +---
> > > +$id: http://devicetree.org/schemas/iommu/riscv,iommu.yaml#
> > > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > > +
> > > +title: RISC-V IOMMU Implementation
> > > +
> > > +maintainers:
> > > +  - Tomasz Jeznach <tjeznach@rivosinc.com>
> >
> > What about Anup, who seems to have written this?
> > Or your co-authors of the drivers?
> >
> >
> Anup provided only the device tree riscv,iommu bindings proposal, but handed
> over its maintenance.
> 
> > +
> > > +description:
> > > +  The RISC-V IOMMU specificaiton defines an IOMMU for RISC-V platforms
> > > +  which can be a regular platform device or a PCI device connected to
> > > +  the host root port.
> > > +
> > > +  The RISC-V IOMMU provides two stage translation, device directory
> > table,
> > > +  command queue and fault reporting as wired interrupt or MSIx event for
> > > +  both PCI and platform devices.
> > > +
> > > +  Visit https://github.com/riscv-non-isa/riscv-iommu for more details.
> > > +
> > > +properties:
> > > +  compatible:
> > > +    oneOf:
> > > +      - description: RISC-V IOMMU as a platform device
> > > +        items:
> > > +          - enum:
> > > +              - vendor,chip-iommu
> >
> > These dummy compatibles are not valid, as was pointed out to Anup on
> > the AIA series. Please go look at what was done there instead:
> >
> > https://lore.kernel.org/all/20230719113542.2293295-7-apatel@ventanamicro.com/
> >
> >
> Thank you, good pointer, seems like the same comments apply here. Will go
> through the discussion and update.
> 
> 
> > > +          - const: riscv,iommu
> > > +
> > > +      - description: RISC-V IOMMU as a PCI device connected to root port
> > > +        items:
> > > +          - enum:
> > > +              - vendor,chip-pci-iommu
> > > +          - const: riscv,pci-iommu
> >
> > I'm not really au fait with the arm smmu stuff, but do any of its
> > versions support being connected to a root port?
> >
> >
> The RISC-V IOMMU spec allows the IOMMU to be connected to the root port,
> or presented as a platform device.

That is not quite what I asked... What I want to know is why we are
doing something different to Arm's SMMU stuff & whether it is because
RISC-V has extra capabilities, or the binding itself is flawed.

(There's no more comments from me below, just making sure the mail's
contents reaches lore)

Cheers,
Conor.

> > > +  reg:
> > > +    maxItems: 1
> > > +    description:
> > > +      For RISC-V IOMMU as a platform device, this represents the MMIO
> > base
> > > +      address of registers.
> > > +
> > > +      For RISC-V IOMMU as a PCI device, this represents the PCI-PCI
> > bridge
> > > +      details as described in
> > Documentation/devicetree/bindings/pci/pci.txt
> > > +
> > > +  '#iommu-cells':
> > > +    const: 2
> > > +    description: |
> >
> > |s are only needed where formatting needs to be preserved.
> >
> > > +      Each IOMMU specifier represents the base device ID and number of
> > > +      device IDs.
> > > +
> > > +  interrupts:
> > > +    minItems: 1
> > > +    maxItems: 16
> >
> > What are any of these interrupts?
> >
> >
> I'll add a description to the file. In short, they are the queue interface
> interrupts signalling to the driver.
> 
> 
> > +    description:
> > > +      The presence of this property implies that given RISC-V IOMMU uses
> > > +      wired interrupts to notify the RISC-V HARTS (or CPUs).
> > > +
> > > +  msi-parent:
> > > +    description:
> > > +      The presence of this property implies that given RISC-V IOMMU uses
> > > +      MSIx to notify the RISC-V HARTs (or CPUs). This property should be
> > > +      considered only when the interrupts property is absent.
> > > +
> > > +  dma-coherent:
> >
> > RISC-V is dma-coherent by default, should this not be dma-noncoherent
> > instead?
> >
> >
> Very valid comment. I'm OK with reversing the flag unless anyone objects.
> 
> 
> > > +    description:
> > > +      Present if page table walks and DMA accessed made by the RISC-V
> > IOMMU
> > > +      are cache coherent with the CPU.
> > > +
> > > +  power-domains:
> > > +    maxItems: 1
> > > +
> > > +required:
> > > +  - compatible
> > > +  - reg
> > > +  - '#iommu-cells'
> > > +
> > > +additionalProperties: false
> > > +
> > > +examples:
> > > +  - |
> > > +    /* Example 1 (IOMMU platform device with wired interrupts) */
> > > +    immu1: iommu@1bccd000 {
> >
> > Why is this "immu"? typo or intentional?
> >
> 
> I guess there was no particular naming scheme here, but I might defer this
> question to the author.
> 
> 
> >
> > > +        compatible = "vendor,chip-iommu", "riscv,iommu";
> > > +        reg = <0x1bccd000 0x1000>;
> > > +        interrupt-parent = <&aplic_smode>;
> > > +        interrupts = <32 4>, <33 4>, <34 4>, <35 4>;
> > > +        #iommu-cells = <2>;
> > > +    };
> > > +
> > > +    /* Device with two IOMMU device IDs, 0 and 7 */
> > > +    master1 {
> > > +        iommus = <&immu1 0 1>, <&immu1 7 1>;
> > > +    };
> > > +
> > > +  - |
> > > +    /* Example 2 (IOMMU platform device with MSIs) */
> > > +    immu2: iommu@1bcdd000 {
> > > +        compatible = "vendor,chip-iommu", "riscv,iommu";
> > > +        reg = <0x1bccd000 0x1000>;
> > > +        msi-parent = <&imsics_smode>;
> > > +        #iommu-cells = <2>;
> > > +    };
> > > +
> > > +    bus {
> > > +        #address-cells = <2>;
> > > +        #size-cells = <2>;
> > > +
> > > +        /* Device with IOMMU device IDs ranging from 32 to 64 */
> > > +        master1 {
> > > +                iommus = <&immu2 32 32>;
> > > +        };
> > > +
> > > +        pcie@40000000 {
> > > +            compatible = "pci-host-cam-generic";
> > > +            device_type = "pci";
> > > +            #address-cells = <3>;
> > > +            #size-cells = <2>;
> > > +            bus-range = <0x0 0x1>;
> > > +
> > > +            /* CPU_PHYSICAL(2)  SIZE(2) */
> >
> > These sort of comments seem to just repeat what address-cells &
> > size-cells has already said, no?
> >
> >
> Correct.
> 
> 
> 
> > Thanks,
> > Conor.
> >
> 
> 
> Thank you, Conor, for the prompt response and comments.
> I'll address them in the next version.
> 
> - Tomasz

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 02/11] RISC-V: arch/riscv/config: enable RISC-V IOMMU support
  2023-07-19 20:22   ` Conor Dooley
@ 2023-07-19 21:07     ` Tomasz Jeznach
  2023-07-20  6:37       ` Krzysztof Kozlowski
  0 siblings, 1 reply; 86+ messages in thread
From: Tomasz Jeznach @ 2023-07-19 21:07 UTC (permalink / raw)
  To: Conor Dooley
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley,
	Anup Patel, Albert Ou, linux, linux-kernel, Sebastien Boeuf,
	iommu, Palmer Dabbelt, Nick Kossifidis, linux-riscv

On Wed, Jul 19, 2023 at 1:22 PM Conor Dooley <conor@kernel.org> wrote:
>
> On Wed, Jul 19, 2023 at 12:33:46PM -0700, Tomasz Jeznach wrote:
>
> $subject: RISC-V: arch/riscv/config: enable RISC-V IOMMU support
>
> Please look at any other commits to the files you are touching
> and use a subject line that emulates them. In this case, try
> git log --oneline --no-merges -- arch/riscv/configs/
> Same goes for the odd pattern in your driver patches.
>
> Also, the patch may be trivial, but you still need to sign off on it
> and provide a commit message.
>

ack. Added to the to-do list for v2.

Thank you,
- Tomasz

> Thanks,
> Conor.
>
> > ---
> >  arch/riscv/configs/defconfig | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/arch/riscv/configs/defconfig b/arch/riscv/configs/defconfig
> > index 0a0107460a5c..1a0c3b24329f 100644
> > --- a/arch/riscv/configs/defconfig
> > +++ b/arch/riscv/configs/defconfig
> > @@ -178,6 +178,7 @@ CONFIG_VIRTIO_PCI=y
> >  CONFIG_VIRTIO_BALLOON=y
> >  CONFIG_VIRTIO_INPUT=y
> >  CONFIG_VIRTIO_MMIO=y
> > +CONFIG_RISCV_IOMMU=y
> >  CONFIG_SUN8I_DE2_CCU=m
> >  CONFIG_SUN50I_IOMMU=y
> >  CONFIG_RPMSG_CHAR=y
> > --
> > 2.34.1
> >
> >
> > _______________________________________________
> > linux-riscv mailing list
> > linux-riscv@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 03/11] dt-bindings: Add RISC-V IOMMU bindings
  2023-07-19 20:19   ` Conor Dooley
       [not found]     ` <CAH2o1u6CZSb7pXcaXmh7dJQmNZYh3uORk4x7vJPrb+uCwFdU5g@mail.gmail.com>
@ 2023-07-19 21:37     ` Rob Herring
  2023-07-19 23:04       ` Tomasz Jeznach
  1 sibling, 1 reply; 86+ messages in thread
From: Rob Herring @ 2023-07-19 21:37 UTC (permalink / raw)
  To: Conor Dooley, Tomasz Jeznach
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley,
	Anup Patel, Albert Ou, linux, linux-kernel, Sebastien Boeuf,
	iommu, Palmer Dabbelt, Nick Kossifidis, linux-riscv,
	krzysztof.kozlowski+dt, devicetree

On Wed, Jul 19, 2023 at 2:19 PM Conor Dooley <conor@kernel.org> wrote:
>
> Hey Tomasz,
>
> On Wed, Jul 19, 2023 at 12:33:47PM -0700, Tomasz Jeznach wrote:
> > From: Anup Patel <apatel@ventanamicro.com>
> >
> > We add DT bindings document for RISC-V IOMMU platform and PCI devices
> > defined by the RISC-V IOMMU specification.
> >
> > Signed-off-by: Anup Patel <apatel@ventanamicro.com>
>
> Your signoff is missing from here.
>
> Secondly, as get_maintainer.pl would have told you, dt-bindings patches
> need to be sent to the dt-binding maintainers and list.
> +CC maintainers & list.
>
> Thirdly, dt-binding patches should come before their users.
>
> > ---
> >  .../bindings/iommu/riscv,iommu.yaml           | 146 ++++++++++++++++++
> >  1 file changed, 146 insertions(+)
> >  create mode 100644 Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> >
> > diff --git a/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml b/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> > new file mode 100644
> > index 000000000000..8a9aedb61768
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> > @@ -0,0 +1,146 @@
> > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > +%YAML 1.2
> > +---
> > +$id: http://devicetree.org/schemas/iommu/riscv,iommu.yaml#
> > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > +
> > +title: RISC-V IOMMU Implementation
> > +
> > +maintainers:
> > +  - Tomasz Jeznach <tjeznach@rivosinc.com>
>
> What about Anup, who seems to have written this?
> Or your co-authors of the drivers?
>
> > +
> > +description:
> > +  The RISC-V IOMMU specificaiton defines an IOMMU for RISC-V platforms

typo

> > +  which can be a regular platform device or a PCI device connected to
> > +  the host root port.
> > +
> > +  The RISC-V IOMMU provides two stage translation, device directory table,
> > +  command queue and fault reporting as wired interrupt or MSIx event for
> > +  both PCI and platform devices.

TBC, you want a PCI device that's an IOMMU and the IOMMU serves
(provides translation for) PCI devices?

> > +
> > +  Visit https://github.com/riscv-non-isa/riscv-iommu for more details.
> > +
> > +properties:
> > +  compatible:
> > +    oneOf:
> > +      - description: RISC-V IOMMU as a platform device

"platform device" is a Linux term. Don't use Linux terms in bindings.

> > +        items:
> > +          - enum:
> > +              - vendor,chip-iommu
>
> These dummy compatibles are not valid, as was pointed out to Anup on
> the AIA series. Please go look at what was done there instead:
> https://lore.kernel.org/all/20230719113542.2293295-7-apatel@ventanamicro.com/
>
> > +          - const: riscv,iommu
> > +
> > +      - description: RISC-V IOMMU as a PCI device connected to root port
> > +        items:
> > +          - enum:
> > +              - vendor,chip-pci-iommu
> > +          - const: riscv,pci-iommu
>
> I'm not really au fait with the arm smmu stuff, but do any of its
> versions support being connected to a root port?

PCI devices have a defined format for the compatible string based on
VID/PID. PCI devices also usually don't need to be described in DT
because they are discoverable. The exception is when there are parts
which aren't. Which parts aren't?

> > +  reg:
> > +    maxItems: 1
> > +    description:
> > +      For RISC-V IOMMU as a platform device, this represents the MMIO base
> > +      address of registers.
> > +
> > +      For RISC-V IOMMU as a PCI device, this represents the PCI-PCI bridge

Your IOMMU is also a PCI-PCI bridge? Is that a normal PCI thing?


> > +      details as described in Documentation/devicetree/bindings/pci/pci.txt

Don't refer to pci.txt. It is going to be removed.

> > +
> > +  '#iommu-cells':
> > +    const: 2
> > +    description: |
>
> |s are only needed where formatting needs to be preserved.
>
> > +      Each IOMMU specifier represents the base device ID and number of
> > +      device IDs.

Doesn't that assume device IDs are contiguous? Generally not a safe assumption.

> > +
> > +  interrupts:
> > +    minItems: 1
> > +    maxItems: 16
>
> What are any of these interrupts?
>
> > +    description:
> > +      The presence of this property implies that given RISC-V IOMMU uses
> > +      wired interrupts to notify the RISC-V HARTS (or CPUs).
> > +
> > +  msi-parent:
> > +    description:
> > +      The presence of this property implies that given RISC-V IOMMU uses
> > +      MSIx to notify the RISC-V HARTs (or CPUs). This property should be
> > +      considered only when the interrupts property is absent.

This doesn't make sense for a PCI device. PCI defines its own way to
describe MSI support.

> > +
> > +  dma-coherent:
>
> RISC-V is dma-coherent by default, should this not be dma-noncoherent
> instead?
>
> > +    description:
> > +      Present if page table walks and DMA accessed made by the RISC-V IOMMU
> > +      are cache coherent with the CPU.
> > +
> > +  power-domains:
> > +    maxItems: 1
> > +
> > +required:
> > +  - compatible
> > +  - reg
> > +  - '#iommu-cells'
> > +
> > +additionalProperties: false
> > +
> > +examples:
> > +  - |
> > +    /* Example 1 (IOMMU platform device with wired interrupts) */
> > +    immu1: iommu@1bccd000 {
>
> Why is this "immu"? typo or intentional?
>
> > +        compatible = "vendor,chip-iommu", "riscv,iommu";
> > +        reg = <0x1bccd000 0x1000>;
> > +        interrupt-parent = <&aplic_smode>;
> > +        interrupts = <32 4>, <33 4>, <34 4>, <35 4>;
> > +        #iommu-cells = <2>;
> > +    };
> > +
> > +    /* Device with two IOMMU device IDs, 0 and 7 */
> > +    master1 {
> > +        iommus = <&immu1 0 1>, <&immu1 7 1>;
> > +    };
> > +
> > +  - |
> > +    /* Example 2 (IOMMU platform device with MSIs) */
> > +    immu2: iommu@1bcdd000 {
> > +        compatible = "vendor,chip-iommu", "riscv,iommu";
> > +        reg = <0x1bccd000 0x1000>;
> > +        msi-parent = <&imsics_smode>;
> > +        #iommu-cells = <2>;
> > +    };
> > +
> > +    bus {
> > +        #address-cells = <2>;
> > +        #size-cells = <2>;
> > +
> > +        /* Device with IOMMU device IDs ranging from 32 to 64 */
> > +        master1 {
> > +                iommus = <&immu2 32 32>;
> > +        };
> > +
> > +        pcie@40000000 {
> > +            compatible = "pci-host-cam-generic";
> > +            device_type = "pci";
> > +            #address-cells = <3>;
> > +            #size-cells = <2>;
> > +            bus-range = <0x0 0x1>;
> > +
> > +            /* CPU_PHYSICAL(2)  SIZE(2) */

I'm guessing there was more after this, but I don't have it...

Guessing, immu2 is a PCI device, but it translates for master1 which
is not a PCI device? Weird. Why would anyone build such a thing?


Rob

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 01/11] RISC-V: drivers/iommu: Add RISC-V IOMMU - Ziommu support.
  2023-07-19 20:49   ` Conor Dooley
@ 2023-07-19 21:43     ` Tomasz Jeznach
  2023-07-20 19:27       ` Conor Dooley
  2023-07-21  9:44       ` Conor Dooley
  0 siblings, 2 replies; 86+ messages in thread
From: Tomasz Jeznach @ 2023-07-19 21:43 UTC (permalink / raw)
  To: Conor Dooley
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley,
	Anup Patel, Albert Ou, linux, linux-kernel, Sebastien Boeuf,
	iommu, Palmer Dabbelt, Nick Kossifidis, linux-riscv

On Wed, Jul 19, 2023 at 1:50 PM Conor Dooley <conor@kernel.org> wrote:
>
> Hey Tomasz,
>
> On Wed, Jul 19, 2023 at 12:33:45PM -0700, Tomasz Jeznach wrote:
> > The patch introduces skeleton IOMMU device driver implementation as defined
>
> This commit message reads like a cover letter IMO. At whatever point
> you send a v2, could you re-write this focusing on what is done in the
> patch itself?
>

ack. will amend the commit message.


> Also, since I am not going to reply to any of these iommu driver patches
> in a meaningful capacity, please run checkpatch.pl on your work. There
> are well over 100 style etc complaints that it has highlighted. Sparse
> has also gone a bit nuts, with many warnings along the lines of:
> drivers/iommu/riscv/iommu.c:1568:29: warning: incorrect type in assignment (different base types)
> drivers/iommu/riscv/iommu.c:1568:29:    expected unsigned long long [usertype] iohgatp
> drivers/iommu/riscv/iommu.c:1568:29:    got restricted __le64 [usertype]
>
> I can provide you the full list when the patchwork automation has run
> through the series.
>

Thank you, a list of the lint checkers used would definitely help.


> Anyway, what I wanted to ask was whether it was valid to use the IOMMU
> in a system if Ziommu is not present in whatever the ISA extension
> communication mechanism is? Eg, riscv,isa or the ISA string property in
> the ACPI tables.
>

Yes, this has been pointed out to me already. As far as I can recall,
there was a discussion at some point about introducing these as Ziommu
extensions; we later agreed not to name the IOMMU using ISA string
conventions. Will remove the remaining occurrences of Ziommu from the
series.


> Thanks,
> Conor.
>
> > References:
> >  - [1] https://github.com/riscv-non-isa/riscv-iommu
> >  - [2] https://github.com/tjeznach/linux/tree/tjeznach/riscv-iommu
> >  - [3] https://github.com/tjeznach/qemu/tree/tjeznach/riscv-iommu
>
> FYI, we have the Link: tag/trailer for this.
>


Thanks,
- Tomasz

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 03/11] dt-bindings: Add RISC-V IOMMU bindings
  2023-07-19 21:37     ` Rob Herring
@ 2023-07-19 23:04       ` Tomasz Jeznach
  0 siblings, 0 replies; 86+ messages in thread
From: Tomasz Jeznach @ 2023-07-19 23:04 UTC (permalink / raw)
  To: Rob Herring
  Cc: Conor Dooley, Joerg Roedel, Will Deacon, Robin Murphy,
	Paul Walmsley, Anup Patel, Albert Ou, linux, linux-kernel,
	Sebastien Boeuf, iommu, Palmer Dabbelt, Nick Kossifidis,
	linux-riscv, krzysztof.kozlowski+dt, devicetree

On Wed, Jul 19, 2023 at 2:37 PM Rob Herring <robh+dt@kernel.org> wrote:
>
> On Wed, Jul 19, 2023 at 2:19 PM Conor Dooley <conor@kernel.org> wrote:
> >
> > Hey Tomasz,
> >
> > On Wed, Jul 19, 2023 at 12:33:47PM -0700, Tomasz Jeznach wrote:
> > > From: Anup Patel <apatel@ventanamicro.com>
> > >
> > > We add DT bindings document for RISC-V IOMMU platform and PCI devices
> > > defined by the RISC-V IOMMU specification.
> > >
> > > Signed-off-by: Anup Patel <apatel@ventanamicro.com>
> >
> > Your signoff is missing from here.
> >
> > Secondly, as get_maintainer.pl would have told you, dt-bindings patches
> > need to be sent to the dt-binding maintainers and list.
> > +CC maintainers & list.
> >
> > Thirdly, dt-binding patches should come before their users.
> >
> > > ---
> > >  .../bindings/iommu/riscv,iommu.yaml           | 146 ++++++++++++++++++
> > >  1 file changed, 146 insertions(+)
> > >  create mode 100644 Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> > >
> > > diff --git a/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml b/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> > > new file mode 100644
> > > index 000000000000..8a9aedb61768
> > > --- /dev/null
> > > +++ b/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> > > @@ -0,0 +1,146 @@
> > > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > > +%YAML 1.2
> > > +---
> > > +$id: http://devicetree.org/schemas/iommu/riscv,iommu.yaml#
> > > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > > +
> > > +title: RISC-V IOMMU Implementation
> > > +
> > > +maintainers:
> > > +  - Tomasz Jeznach <tjeznach@rivosinc.com>
> >
> > What about Anup, who seems to have written this?
> > Or your co-authors of the drivers?
> >
> > > +
> > > +description:
> > > +  The RISC-V IOMMU specificaiton defines an IOMMU for RISC-V platforms
>
> typo
>

ack

> > > +  which can be a regular platform device or a PCI device connected to
> > > +  the host root port.
> > > +
> > > +  The RISC-V IOMMU provides two stage translation, device directory table,
> > > +  command queue and fault reporting as wired interrupt or MSIx event for
> > > +  both PCI and platform devices.
>
> TBC, you want a PCI device that's an IOMMU and the IOMMU serves
> (provides translation for) PCI devices?
>

Yes, an IOMMU as a PCIe device providing address translation services for
the connected PCIe root complex.

> > > +
> > > +  Visit https://github.com/riscv-non-isa/riscv-iommu for more details.
> > > +
> > > +properties:
> > > +  compatible:
> > > +    oneOf:
> > > +      - description: RISC-V IOMMU as a platform device
>
> "platform device" is a Linux term. Don't use Linux terms in bindings.
>

ack.


> > > +        items:
> > > +          - enum:
> > > +              - vendor,chip-iommu
> >
> > These dummy compatibles are not valid, as was pointed out to Anup on
> > the AIA series. Please go look at what was done there instead:
> > https://lore.kernel.org/all/20230719113542.2293295-7-apatel@ventanamicro.com/
> >
> > > +          - const: riscv,iommu
> > > +
> > > +      - description: RISC-V IOMMU as a PCI device connected to root port
> > > +        items:
> > > +          - enum:
> > > +              - vendor,chip-pci-iommu
> > > +          - const: riscv,pci-iommu
> >
> > I'm not really au fait with the arm smmu stuff, but do any of its
> > versions support being connected to a root port?
>
> PCI devices have a defined format for the compatible string based on
> VID/PID. For PCI, also usually don't need to be described in DT
> because they are discoverable. The exception is when there's parts
> which aren't. Which parts aren't?
>

We've put the 'riscv,pci-iommu' node here to describe the relationship between
PCIe devices and IOMMU(s), needed for the PCIe root complex description
(iommu-map). If there is a better way to reference a PCI IOMMU without adding
the pci-iommu definition, that would solve the problem. Every other property
of the pci-iommu should be discoverable.

> > > +  reg:
> > > +    maxItems: 1
> > > +    description:
> > > +      For RISC-V IOMMU as a platform device, this represents the MMIO base
> > > +      address of registers.
> > > +
> > > +      For RISC-V IOMMU as a PCI device, this represents the PCI-PCI bridge
>
> Your IOMMU is also a PCI-PCI bridge? Is that a normal PCI thing?
>

It's allowed to be integrated with the root complex / IO bridge, but it
appears as a separate PCIe device.
I'll clarify the description.

>
> > > +      details as described in Documentation/devicetree/bindings/pci/pci.txt
>
> Don't refer to pci.txt. It is going to be removed.
>

ack.

> > > +
> > > +  '#iommu-cells':
> > > +    const: 2
> > > +    description: |
> >
> > |s are only needed where formatting needs to be preserved.
> >
> > > +      Each IOMMU specifier represents the base device ID and number of
> > > +      device IDs.
>
> Doesn't that assume device IDs are contiguous? Generally not a safe assumption.
>

ack.

> > > +
> > > +  interrupts:
> > > +    minItems: 1
> > > +    maxItems: 16
> >
> > What are any of these interrupts?
> >
> > > +    description:
> > > +      The presence of this property implies that given RISC-V IOMMU uses
> > > +      wired interrupts to notify the RISC-V HARTS (or CPUs).
> > > +
> > > +  msi-parent:
> > > +    description:
> > > +      The presence of this property implies that given RISC-V IOMMU uses
> > > +      MSIx to notify the RISC-V HARTs (or CPUs). This property should be
> > > +      considered only when the interrupts property is absent.
>
> This doesn't make sense for a PCI device. PCI defines its own way to
> describe MSI support.
>

Agreed, this is for the IOMMU as a non-PCI device, capable of sending MSIs.
It follows the 'MSI clients' notes from
devicetree/bindings/interrupt-controller/msi.txt.
Is this a proper way to describe this relationship?

> > > +
> > > +  dma-coherent:
> >
> > RISC-V is dma-coherent by default, should this not be dma-noncoherent
> > instead?
> >
> > > +    description:
> > > +      Present if page table walks and DMA accessed made by the RISC-V IOMMU
> > > +      are cache coherent with the CPU.
> > > +
> > > +  power-domains:
> > > +    maxItems: 1
> > > +
> > > +required:
> > > +  - compatible
> > > +  - reg
> > > +  - '#iommu-cells'
> > > +
> > > +additionalProperties: false
> > > +
> > > +examples:
> > > +  - |
> > > +    /* Example 1 (IOMMU platform device with wired interrupts) */
> > > +    immu1: iommu@1bccd000 {
> >
> > Why is this "immu"? typo or intentional?
> >
> > > +        compatible = "vendor,chip-iommu", "riscv,iommu";
> > > +        reg = <0x1bccd000 0x1000>;
> > > +        interrupt-parent = <&aplic_smode>;
> > > +        interrupts = <32 4>, <33 4>, <34 4>, <35 4>;
> > > +        #iommu-cells = <2>;
> > > +    };
> > > +
> > > +    /* Device with two IOMMU device IDs, 0 and 7 */
> > > +    master1 {
> > > +        iommus = <&immu1 0 1>, <&immu1 7 1>;
> > > +    };
> > > +
> > > +  - |
> > > +    /* Example 2 (IOMMU platform device with MSIs) */
> > > +    immu2: iommu@1bcdd000 {
> > > +        compatible = "vendor,chip-iommu", "riscv,iommu";
> > > +        reg = <0x1bccd000 0x1000>;
> > > +        msi-parent = <&imsics_smode>;
> > > +        #iommu-cells = <2>;
> > > +    };
> > > +
> > > +    bus {
> > > +        #address-cells = <2>;
> > > +        #size-cells = <2>;
> > > +
> > > +        /* Device with IOMMU device IDs ranging from 32 to 64 */
> > > +        master1 {
> > > +                iommus = <&immu2 32 32>;
> > > +        };
> > > +
> > > +        pcie@40000000 {
> > > +            compatible = "pci-host-cam-generic";
> > > +            device_type = "pci";
> > > +            #address-cells = <3>;
> > > +            #size-cells = <2>;
> > > +            bus-range = <0x0 0x1>;
> > > +
> > > +            /* CPU_PHYSICAL(2)  SIZE(2) */
>
> I'm guessing there was more after this, but I don't have it...

Complete patch 3 is at:
https://lore.kernel.org/linux-iommu/cover.1689792825.git.tjeznach@rivosinc.com/T/#mbf8dc4098fb09b87b2618c5c545ae882f11b114b

>
> Guessing, immu2 is a PCI device, but it translates for master1 which
> is not a PCI device? Weird. Why would anyone build such a thing?
>

In this example immu2 is a non-PCI device. Agreed, otherwise it would be weird.

>
> Rob

Thank you,
- Tomasz

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 06/11] RISC-V: drivers/iommu/riscv: Add command, fault, page-req queues
  2023-07-19 19:33 ` [PATCH 06/11] RISC-V: drivers/iommu/riscv: Add command, fault, page-req queues Tomasz Jeznach
@ 2023-07-20  3:11   ` Nick Kossifidis
  2023-07-20 18:00     ` Tomasz Jeznach
  2023-07-20 13:08   ` Baolu Lu
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 86+ messages in thread
From: Nick Kossifidis @ 2023-07-20  3:11 UTC (permalink / raw)
  To: Tomasz Jeznach, Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley
  Cc: Palmer Dabbelt, Albert Ou, Anup Patel, Sunil V L,
	Sebastien Boeuf, iommu, linux-riscv, linux-kernel, linux

Hello Tomasz,

On 7/19/23 22:33, Tomasz Jeznach wrote:
> Enables message or wire signal interrupts for PCIe and platforms devices.
> 

The description matches neither the subject nor the patch content (we 
don't just enable interrupts, we also init the queues).

> +	/* Parse Queue lengts */
> +	ret = of_property_read_u32(pdev->dev.of_node, "cmdq_len", &iommu->cmdq_len);
> +	if (!ret)
> +		dev_info(dev, "command queue length set to %i\n", iommu->cmdq_len);
> +
> +	ret = of_property_read_u32(pdev->dev.of_node, "fltq_len", &iommu->fltq_len);
> +	if (!ret)
> +		dev_info(dev, "fault/event queue length set to %i\n", iommu->fltq_len);
> +
> +	ret = of_property_read_u32(pdev->dev.of_node, "priq_len", &iommu->priq_len);
> +	if (!ret)
> +		dev_info(dev, "page request queue length set to %i\n", iommu->priq_len);
> +
>   	dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
>   

We need to add those to the device tree binding doc (or throw them away; 
I thought it would be better to have them as part of the device 
description than a module parameter).
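
For comparison, the module-parameter alternative would be roughly the
below (untested sketch; the default value is made up):

	#include <linux/module.h>

	static unsigned int cmdq_len = 1024;	/* hypothetical default */
	module_param(cmdq_len, uint, 0444);
	MODULE_PARM_DESC(cmdq_len, "command queue length");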


> +static irqreturn_t riscv_iommu_priq_irq_check(int irq, void *data);
> +static irqreturn_t riscv_iommu_priq_process(int irq, void *data);
> +

> +	case RISCV_IOMMU_PAGE_REQUEST_QUEUE:
> +		q = &iommu->priq;
> +		q->len = sizeof(struct riscv_iommu_pq_record);
> +		count = iommu->priq_len;
> +		irq = iommu->irq_priq;
> +		irq_check = riscv_iommu_priq_irq_check;
> +		irq_process = riscv_iommu_priq_process;
> +		q->qbr = RISCV_IOMMU_REG_PQB;
> +		q->qcr = RISCV_IOMMU_REG_PQCSR;
> +		name = "priq";
> +		break;


It makes more sense to add the code for the page request queue in the 
patch that adds ATS/PRI support IMHO. This comment also applies to its 
interrupt handlers below.


> +static inline void riscv_iommu_cmd_inval_set_addr(struct riscv_iommu_command *cmd,
> +						  u64 addr)
> +{
> +	cmd->dword0 |= RISCV_IOMMU_CMD_IOTINVAL_AV;
> +	cmd->dword1 = addr;
> +}
> +

This needs to be (addr >> 2) to match the spec, same as in the iofence 
command.
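
I.e., something along these lines (untested, taking the spec reading
above at face value):

	static inline void riscv_iommu_cmd_inval_set_addr(struct riscv_iommu_command *cmd,
							  u64 addr)
	{
		cmd->dword0 |= RISCV_IOMMU_CMD_IOTINVAL_AV;
		/* the spec encodes the address shifted right by 2 */
		cmd->dword1 = addr >> 2;
	}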

Regards,
Nick


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 02/11] RISC-V: arch/riscv/config: enable RISC-V IOMMU support
  2023-07-19 21:07     ` Tomasz Jeznach
@ 2023-07-20  6:37       ` Krzysztof Kozlowski
  0 siblings, 0 replies; 86+ messages in thread
From: Krzysztof Kozlowski @ 2023-07-20  6:37 UTC (permalink / raw)
  To: Tomasz Jeznach, Conor Dooley
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley,
	Anup Patel, Albert Ou, linux, linux-kernel, Sebastien Boeuf,
	iommu, Palmer Dabbelt, Nick Kossifidis, linux-riscv

On 19/07/2023 23:07, Tomasz Jeznach wrote:
> On Wed, Jul 19, 2023 at 1:22 PM Conor Dooley <conor@kernel.org> wrote:
>>
>> On Wed, Jul 19, 2023 at 12:33:46PM -0700, Tomasz Jeznach wrote:
>>
>> $subject: RISC-V: arch/riscv/config: enable RISC-V IOMMU support
>>
>> Please look at any other commits to the files you are touching
>> and use a subject line that emulates them. In this case, try
>> git log --oneline --no-merges -- arch/riscv/configs/
>> Same goes for the odd pattern in your driver patches.
>>
>> Also, the patch may be trivial, but you still need to sign off on it
>> and provide a commit message.
>>
> 
> ack. added to-do the list for v2.

Please run checkpatch before sending.

Best regards,
Krzysztof


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 05/11] RISC-V: drivers/iommu/riscv: Add sysfs interface
  2023-07-19 19:33 ` [PATCH 05/11] RISC-V: drivers/iommu/riscv: Add sysfs interface Tomasz Jeznach
@ 2023-07-20  6:38   ` Krzysztof Kozlowski
  2023-07-20 18:30     ` Tomasz Jeznach
  2023-07-20 12:50   ` Baolu Lu
  1 sibling, 1 reply; 86+ messages in thread
From: Krzysztof Kozlowski @ 2023-07-20  6:38 UTC (permalink / raw)
  To: Tomasz Jeznach, Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley
  Cc: Palmer Dabbelt, Albert Ou, Anup Patel, Sunil V L,
	Nick Kossifidis, Sebastien Boeuf, iommu, linux-riscv,
	linux-kernel, linux

On 19/07/2023 21:33, Tomasz Jeznach wrote:
> Enable sysfs debug / visibility interface providing restricted
> access to hardware registers.

Please use subject prefixes matching the subsystem. You can get them for
example with `git log --oneline -- DIRECTORY_OR_FILE` on the directory
your patch is touching.

> 
> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
> ---
>  drivers/iommu/riscv/Makefile      |   2 +-
>  drivers/iommu/riscv/iommu-sysfs.c | 183 ++++++++++++++++++++++++++++++
>  drivers/iommu/riscv/iommu.c       |   7 ++
>  drivers/iommu/riscv/iommu.h       |   2 +
>  4 files changed, 193 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/iommu/riscv/iommu-sysfs.c
> 
> diff --git a/drivers/iommu/riscv/Makefile b/drivers/iommu/riscv/Makefile
> index 38730c11e4a8..9523eb053cfc 100644
> --- a/drivers/iommu/riscv/Makefile
> +++ b/drivers/iommu/riscv/Makefile
> @@ -1 +1 @@
> -obj-$(CONFIG_RISCV_IOMMU) += iommu.o iommu-pci.o iommu-platform.o
> \ No newline at end of file
> +obj-$(CONFIG_RISCV_IOMMU) += iommu.o iommu-pci.o iommu-platform.o iommu-sysfs.o
> \ No newline at end of file

You have this error in multiple places.

> diff --git a/drivers/iommu/riscv/iommu-sysfs.c b/drivers/iommu/riscv/iommu-sysfs.c
> new file mode 100644
> index 000000000000..f038ea8445c5
> --- /dev/null
> +++ b/drivers/iommu/riscv/iommu-sysfs.c
> @@ -0,0 +1,183 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * IOMMU API for RISC-V architected Ziommu implementations.
> + *
> + * Copyright © 2022-2023 Rivos Inc.
> + *
> + * Author: Tomasz Jeznach <tjeznach@rivosinc.com>
> + */
> +
> +#include <linux/module.h>
> +#include <linux/kernel.h>
> +#include <linux/compiler.h>
> +#include <linux/iommu.h>
> +#include <linux/platform_device.h>
> +#include <asm/page.h>
> +
> +#include "iommu.h"
> +
> +#define sysfs_dev_to_iommu(dev) \
> +	container_of(dev_get_drvdata(dev), struct riscv_iommu_device, iommu)
> +
> +static ssize_t address_show(struct device *dev,
> +			    struct device_attribute *attr, char *buf)


Where is the sysfs ABI documented?
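
For reference, an entry under Documentation/ABI/testing/ would be expected,
roughly along these lines (file path and wording hypothetical):

  What:		/sys/devices/.../riscv-iommu/address
  Date:		July 2023
  Contact:	Tomasz Jeznach <tjeznach@rivosinc.com>
  Description:	(RO) Physical base address of the IOMMU register region.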


Best regards,
Krzysztof


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 01/11] RISC-V: drivers/iommu: Add RISC-V IOMMU - Ziommu support.
  2023-07-19 19:33 ` [PATCH 01/11] RISC-V: drivers/iommu: Add RISC-V IOMMU - Ziommu support Tomasz Jeznach
  2023-07-19 20:49   ` Conor Dooley
@ 2023-07-20 10:38   ` Baolu Lu
  2023-07-20 12:31   ` Baolu Lu
                     ` (5 subsequent siblings)
  7 siblings, 0 replies; 86+ messages in thread
From: Baolu Lu @ 2023-07-20 10:38 UTC (permalink / raw)
  To: Tomasz Jeznach, Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley
  Cc: baolu.lu, Palmer Dabbelt, Albert Ou, Anup Patel, Sunil V L,
	Nick Kossifidis, Sebastien Boeuf, iommu, linux-riscv,
	linux-kernel, linux

On 2023/7/20 3:33, Tomasz Jeznach wrote:
> +struct riscv_iommu_domain {
> +	struct iommu_domain domain;
> +
> +	struct list_head endpoints;
> +	struct mutex lock;
> +	struct riscv_iommu_device *iommu;

How are domains and iommu devices connected? A domain can be attached to
multiple devices, which are possibly behind different iommu devices. So
a domain is possibly connected to multiple iommu devices.

Is it possible?

> +
> +	unsigned mode;		/* RIO_ATP_MODE_* enum */
> +	unsigned pscid;		/* RISC-V IOMMU PSCID */
> +
> +	pgd_t *pgd_root;	/* page table root pointer */
> +};
> +
> +/* Private dev_iommu_priv object, device-domain relationship. */
> +struct riscv_iommu_endpoint {
> +	struct device *dev;			/* platform or PCI endpoint device */
> +	unsigned devid;      			/* PCI bus:device:function number */
> +	unsigned domid;    			/* PCI domain number, segment */
> +	struct rb_node node;    		/* device tracking node (lookup by devid) */
> +	struct riscv_iommu_device *iommu;	/* parent iommu device */
> +
> +	struct mutex lock;
> +	struct list_head domain;		/* endpoint attached managed domain */
> +};

Best regards,
baolu

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 01/11] RISC-V: drivers/iommu: Add RISC-V IOMMU - Ziommu support.
  2023-07-19 19:33 ` [PATCH 01/11] RISC-V: drivers/iommu: Add RISC-V IOMMU - Ziommu support Tomasz Jeznach
  2023-07-19 20:49   ` Conor Dooley
  2023-07-20 10:38   ` Baolu Lu
@ 2023-07-20 12:31   ` Baolu Lu
  2023-07-20 17:30     ` Tomasz Jeznach
  2023-07-28  2:42   ` Zong Li
                     ` (4 subsequent siblings)
  7 siblings, 1 reply; 86+ messages in thread
From: Baolu Lu @ 2023-07-20 12:31 UTC (permalink / raw)
  To: Tomasz Jeznach, Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley
  Cc: baolu.lu, Palmer Dabbelt, Albert Ou, Anup Patel, Sunil V L,
	Nick Kossifidis, Sebastien Boeuf, iommu, linux-riscv,
	linux-kernel, linux

On 2023/7/20 3:33, Tomasz Jeznach wrote:
> +static void riscv_iommu_iotlb_sync_map(struct iommu_domain *iommu_domain,
> +				       unsigned long iova, size_t size)
> +{
> +	unsigned long end = iova + size - 1;
> +	/*
> +	 * Given we don't know the page size used by this range, we assume the
> +	 * smallest page size to ensure all possible entries are flushed from
> +	 * the IOATC.
> +	 */
> +	size_t pgsize = PAGE_SIZE;
> +	riscv_iommu_flush_iotlb_range(iommu_domain, &iova, &end, &pgsize);
> +}

Does the RISC-V IOMMU require TLB invalidation after new mappings
are created?

Best regards,
baolu

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 04/11] MAINTAINERS: Add myself for RISC-V IOMMU driver
  2023-07-19 19:33 ` [PATCH 04/11] MAINTAINERS: Add myself for RISC-V IOMMU driver Tomasz Jeznach
@ 2023-07-20 12:42   ` Baolu Lu
  2023-07-20 17:32     ` Tomasz Jeznach
  0 siblings, 1 reply; 86+ messages in thread
From: Baolu Lu @ 2023-07-20 12:42 UTC (permalink / raw)
  To: Tomasz Jeznach, Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley
  Cc: baolu.lu, Palmer Dabbelt, Albert Ou, Anup Patel, Sunil V L,
	Nick Kossifidis, Sebastien Boeuf, iommu, linux-riscv,
	linux-kernel, linux

On 2023/7/20 3:33, Tomasz Jeznach wrote:
> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
> ---
>   MAINTAINERS | 7 +++++++
>   1 file changed, 7 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index aee340630eca..d28b1b99f4c6 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -18270,6 +18270,13 @@ F:	arch/riscv/
>   N:	riscv
>   K:	riscv
>   
> +RISC-V IOMMU
> +M:	Tomasz Jeznach <tjeznach@rivosinc.com>
> +L:	linux-riscv@lists.infradead.org

Please add the iommu subsystem mailing list.

iommu@lists.linux.dev

It's the right place to discuss iommu drivers.

> +S:	Maintained
> +F:	Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> +F:	drivers/iommu/riscv/
> +
>   RISC-V MICROCHIP FPGA SUPPORT
>   M:	Conor Dooley <conor.dooley@microchip.com>
>   M:	Daire McNamara <daire.mcnamara@microchip.com>

Best regards,
baolu

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 05/11] RISC-V: drivers/iommu/riscv: Add sysfs interface
  2023-07-19 19:33 ` [PATCH 05/11] RISC-V: drivers/iommu/riscv: Add sysfs interface Tomasz Jeznach
  2023-07-20  6:38   ` Krzysztof Kozlowski
@ 2023-07-20 12:50   ` Baolu Lu
  2023-07-20 17:47     ` Tomasz Jeznach
  1 sibling, 1 reply; 86+ messages in thread
From: Baolu Lu @ 2023-07-20 12:50 UTC (permalink / raw)
  To: Tomasz Jeznach, Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley
  Cc: baolu.lu, Palmer Dabbelt, Albert Ou, Anup Patel, Sunil V L,
	Nick Kossifidis, Sebastien Boeuf, iommu, linux-riscv,
	linux-kernel, linux

On 2023/7/20 3:33, Tomasz Jeznach wrote:
> +#define sysfs_dev_to_iommu(dev) \
> +	container_of(dev_get_drvdata(dev), struct riscv_iommu_device, iommu)
> +
> +static ssize_t address_show(struct device *dev,
> +			    struct device_attribute *attr, char *buf)
> +{
> +	struct riscv_iommu_device *iommu = sysfs_dev_to_iommu(dev);
> +	return sprintf(buf, "%llx\n", iommu->reg_phys);

Use sysfs_emit() please.

> +}
> +
> +static DEVICE_ATTR_RO(address);
> +
> +#define ATTR_RD_REG32(name, offset)					\
> +	ssize_t reg_ ## name ## _show(struct device *dev,		\
> +			struct device_attribute *attr, char *buf)	\
> +{									\
> +	struct riscv_iommu_device *iommu = sysfs_dev_to_iommu(dev);	\
> +	return sprintf(buf, "0x%x\n",					\
> +			riscv_iommu_readl(iommu, offset));		\
> +}
> +
> +#define ATTR_RD_REG64(name, offset)					\
> +	ssize_t reg_ ## name ## _show(struct device *dev,		\
> +			struct device_attribute *attr, char *buf)	\
> +{									\
> +	struct riscv_iommu_device *iommu = sysfs_dev_to_iommu(dev);	\
> +	return sprintf(buf, "0x%llx\n",					\
> +			riscv_iommu_readq(iommu, offset));		\
> +}
> +
> +#define ATTR_WR_REG32(name, offset)					\
> +	ssize_t reg_ ## name ## _store(struct device *dev,		\
> +			struct device_attribute *attr,			\
> +			const char *buf, size_t len)			\
> +{									\
> +	struct riscv_iommu_device *iommu = sysfs_dev_to_iommu(dev);	\
> +	unsigned long val;						\
> +	int ret;							\
> +	ret = kstrtoul(buf, 0, &val);					\
> +	if (ret)							\
> +		return ret;						\
> +	riscv_iommu_writel(iommu, offset, val);				\
> +	return len;							\
> +}
> +
> +#define ATTR_WR_REG64(name, offset)					\
> +	ssize_t reg_ ## name ## _store(struct device *dev,		\
> +			struct device_attribute *attr,			\
> +			const char *buf, size_t len)			\
> +{									\
> +	struct riscv_iommu_device *iommu = sysfs_dev_to_iommu(dev);	\
> +	unsigned long long val;						\
> +	int ret;							\
> +	ret = kstrtoull(buf, 0, &val);					\
> +	if (ret)							\
> +		return ret;						\
> +	riscv_iommu_writeq(iommu, offset, val);				\
> +	return len;							\
> +}

So this allows users to change the registers through sysfs? How does
it synchronize with the iommu driver?

Best regards,
baolu

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 06/11] RISC-V: drivers/iommu/riscv: Add command, fault, page-req queues
  2023-07-19 19:33 ` [PATCH 06/11] RISC-V: drivers/iommu/riscv: Add command, fault, page-req queues Tomasz Jeznach
  2023-07-20  3:11   ` Nick Kossifidis
@ 2023-07-20 13:08   ` Baolu Lu
  2023-07-20 17:49     ` Tomasz Jeznach
  2023-07-29 12:58   ` Zong Li
  2023-08-16 18:49   ` Robin Murphy
  3 siblings, 1 reply; 86+ messages in thread
From: Baolu Lu @ 2023-07-20 13:08 UTC (permalink / raw)
  To: Tomasz Jeznach, Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley
  Cc: baolu.lu, Palmer Dabbelt, Albert Ou, Anup Patel, Sunil V L,
	Nick Kossifidis, Sebastien Boeuf, iommu, linux-riscv,
	linux-kernel, linux

On 2023/7/20 3:33, Tomasz Jeznach wrote:
> Enables message or wire signal interrupts for PCIe and platform devices.

If this patch could be divided into multiple small patches, each
logically doing one specific thing, it would help people better review
the code.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 01/11] RISC-V: drivers/iommu: Add RISC-V IOMMU - Ziommu support.
  2023-07-20 12:31   ` Baolu Lu
@ 2023-07-20 17:30     ` Tomasz Jeznach
  0 siblings, 0 replies; 86+ messages in thread
From: Tomasz Jeznach @ 2023-07-20 17:30 UTC (permalink / raw)
  To: Baolu Lu
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Anup Patel, Sunil V L,
	Nick Kossifidis, Sebastien Boeuf, iommu, linux-riscv,
	linux-kernel, linux

On Thu, Jul 20, 2023 at 5:31 AM Baolu Lu <baolu.lu@linux.intel.com> wrote:
>
> On 2023/7/20 3:33, Tomasz Jeznach wrote:
> > +static void riscv_iommu_iotlb_sync_map(struct iommu_domain *iommu_domain,
> > +                                    unsigned long iova, size_t size)
> > +{
> > +     unsigned long end = iova + size - 1;
> > +     /*
> > +      * Given we don't know the page size used by this range, we assume the
> > +      * smallest page size to ensure all possible entries are flushed from
> > +      * the IOATC.
> > +      */
> > +     size_t pgsize = PAGE_SIZE;
> > +     riscv_iommu_flush_iotlb_range(iommu_domain, &iova, &end, &pgsize);
> > +}
>
> > Does the RISC-V IOMMU require TLB invalidation after new mappings
> > are created?
>

No. Only on unmapping / permission change.
Thanks for pointing this out.
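
A minimal sketch of the implied change (assuming the domain-ops wiring
used by this series; the function names are illustrative): with no
IOATC maintenance needed after map, .iotlb_sync_map can simply be left
out and invalidation kept on the unmap path.

	static const struct iommu_domain_ops riscv_iommu_domain_ops = {
		.map_pages   = riscv_iommu_map_pages,
		.unmap_pages = riscv_iommu_unmap_pages,
		.iotlb_sync  = riscv_iommu_iotlb_sync,
		/* no .iotlb_sync_map: nothing to flush after creating mappings */
	};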

> Best regards,
> baolu

regards,
- Tomasz

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 04/11] MAINTAINERS: Add myself for RISC-V IOMMU driver
  2023-07-20 12:42   ` Baolu Lu
@ 2023-07-20 17:32     ` Tomasz Jeznach
  0 siblings, 0 replies; 86+ messages in thread
From: Tomasz Jeznach @ 2023-07-20 17:32 UTC (permalink / raw)
  To: Baolu Lu
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Anup Patel, Sunil V L,
	Nick Kossifidis, Sebastien Boeuf, iommu, linux-riscv,
	linux-kernel, linux

On Thu, Jul 20, 2023 at 5:42 AM Baolu Lu <baolu.lu@linux.intel.com> wrote:
>
> On 2023/7/20 3:33, Tomasz Jeznach wrote:
> > Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
> > ---
> >   MAINTAINERS | 7 +++++++
> >   1 file changed, 7 insertions(+)
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index aee340630eca..d28b1b99f4c6 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -18270,6 +18270,13 @@ F:   arch/riscv/
> >   N:  riscv
> >   K:  riscv
> >
> > +RISC-V IOMMU
> > +M:   Tomasz Jeznach <tjeznach@rivosinc.com>
> > +L:   linux-riscv@lists.infradead.org
>
> Please add the iommu subsystem mailing list.
>
> iommu@lists.linux.dev
>
> It's the right place to discuss iommu drivers.
>

ack. will add in the next version. Thanks

> > +S:   Maintained
> > +F:   Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> > +F:   drivers/iommu/riscv/
> > +
> >   RISC-V MICROCHIP FPGA SUPPORT
> >   M:  Conor Dooley <conor.dooley@microchip.com>
> >   M:  Daire McNamara <daire.mcnamara@microchip.com>
>
> Best regards,
> baolu

regards,
- Tomasz

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 05/11] RISC-V: drivers/iommu/riscv: Add sysfs interface
  2023-07-20 12:50   ` Baolu Lu
@ 2023-07-20 17:47     ` Tomasz Jeznach
  0 siblings, 0 replies; 86+ messages in thread
From: Tomasz Jeznach @ 2023-07-20 17:47 UTC (permalink / raw)
  To: Baolu Lu
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Anup Patel, Sunil V L,
	Nick Kossifidis, Sebastien Boeuf, iommu, linux-riscv,
	linux-kernel, linux

On Thu, Jul 20, 2023 at 5:51 AM Baolu Lu <baolu.lu@linux.intel.com> wrote:
>
> On 2023/7/20 3:33, Tomasz Jeznach wrote:
> > +#define sysfs_dev_to_iommu(dev) \
> > +     container_of(dev_get_drvdata(dev), struct riscv_iommu_device, iommu)
> > +
> > +static ssize_t address_show(struct device *dev,
> > +                         struct device_attribute *attr, char *buf)
> > +{
> > +     struct riscv_iommu_device *iommu = sysfs_dev_to_iommu(dev);
> > +     return sprintf(buf, "%llx\n", iommu->reg_phys);
>
> Use sysfs_emit() please.
>

ack. Thanks, will update.
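
For reference, a minimal sketch of the suggested conversion;
sysfs_emit() takes the same arguments as sprintf() here but is
bounds-checked against PAGE_SIZE:

	static ssize_t address_show(struct device *dev,
				    struct device_attribute *attr, char *buf)
	{
		struct riscv_iommu_device *iommu = sysfs_dev_to_iommu(dev);

		return sysfs_emit(buf, "%llx\n", iommu->reg_phys);
	}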

> > +}
> > +
> > +static DEVICE_ATTR_RO(address);
> > +
> > +#define ATTR_RD_REG32(name, offset)                                  \
> > +     ssize_t reg_ ## name ## _show(struct device *dev,               \
> > +                     struct device_attribute *attr, char *buf)       \
> > +{                                                                    \
> > +     struct riscv_iommu_device *iommu = sysfs_dev_to_iommu(dev);     \
> > +     return sprintf(buf, "0x%x\n",                                   \
> > +                     riscv_iommu_readl(iommu, offset));              \
> > +}
> > +
> > +#define ATTR_RD_REG64(name, offset)                                  \
> > +     ssize_t reg_ ## name ## _show(struct device *dev,               \
> > +                     struct device_attribute *attr, char *buf)       \
> > +{                                                                    \
> > +     struct riscv_iommu_device *iommu = sysfs_dev_to_iommu(dev);     \
> > +     return sprintf(buf, "0x%llx\n",                                 \
> > +                     riscv_iommu_readq(iommu, offset));              \
> > +}
> > +
> > +#define ATTR_WR_REG32(name, offset)                                  \
> > +     ssize_t reg_ ## name ## _store(struct device *dev,              \
> > +                     struct device_attribute *attr,                  \
> > +                     const char *buf, size_t len)                    \
> > +{                                                                    \
> > +     struct riscv_iommu_device *iommu = sysfs_dev_to_iommu(dev);     \
> > +     unsigned long val;                                              \
> > +     int ret;                                                        \
> > +     ret = kstrtoul(buf, 0, &val);                                   \
> > +     if (ret)                                                        \
> > +             return ret;                                             \
> > +     riscv_iommu_writel(iommu, offset, val);                         \
> > +     return len;                                                     \
> > +}
> > +
> > +#define ATTR_WR_REG64(name, offset)                                  \
> > +     ssize_t reg_ ## name ## _store(struct device *dev,              \
> > +                     struct device_attribute *attr,                  \
> > +                     const char *buf, size_t len)                    \
> > +{                                                                    \
> > +     struct riscv_iommu_device *iommu = sysfs_dev_to_iommu(dev);     \
> > +     unsigned long long val;                                         \
> > +     int ret;                                                        \
> > +     ret = kstrtoull(buf, 0, &val);                                  \
> > +     if (ret)                                                        \
> > +             return ret;                                             \
> > +     riscv_iommu_writeq(iommu, offset, val);                         \
> > +     return len;                                                     \
> > +}
>
> So this allows users to change the registers through sysfs? How does
> it synchronize with the iommu driver?
>

The only writable registers are for the debug interface and the
performance monitoring counters, with no synchronization requirements
between user and driver.  In a follow-up patch series the performance
counters will also be removed from sysfs, replaced by integration with
the perfmon subsystem. The only remaining write access will be the
debug interface, which gives the user access to address translation;
in short, it provides an interface to query the SPA for a given
IOVA/RID/PASID. There was a discussion in the RVI IOMMU TG forum about
whether it is acceptable to expose such an interface to a privileged
user, and the conclusion was that it very likely exposes no more
information than privileged users can already acquire by looking at
the in-memory data structures.

The read-only registers provide debug access for tracking queue
head/tail pointers and interrupt state.
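
For context, the translation query flow described above looks roughly
like this (a hedged sketch; the register and field names are
assumptions based on the spec's tr_req_iova / tr_req_ctl / tr_response
debug interface, not code from this series):

	u64 response;

	/* Request a translation of 'iova' on behalf of device 'devid'. */
	riscv_iommu_writeq(iommu, RISCV_IOMMU_REG_TR_REQ_IOVA, iova & PAGE_MASK);
	riscv_iommu_writeq(iommu, RISCV_IOMMU_REG_TR_REQ_CTL,
			   FIELD_PREP(RISCV_IOMMU_TR_REQ_CTL_DID, devid) |
			   RISCV_IOMMU_TR_REQ_CTL_GO_BUSY);

	/* Hardware clears Go/Busy when the lookup completes. */
	while (riscv_iommu_readq(iommu, RISCV_IOMMU_REG_TR_REQ_CTL) &
	       RISCV_IOMMU_TR_REQ_CTL_GO_BUSY)
		cpu_relax();

	response = riscv_iommu_readq(iommu, RISCV_IOMMU_REG_TR_RESPONSE);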

> Best regards,
> baolu

regards,
- Tomasz

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 06/11] RISC-V: drivers/iommu/riscv: Add command, fault, page-req queues
  2023-07-20 13:08   ` Baolu Lu
@ 2023-07-20 17:49     ` Tomasz Jeznach
  0 siblings, 0 replies; 86+ messages in thread
From: Tomasz Jeznach @ 2023-07-20 17:49 UTC (permalink / raw)
  To: Baolu Lu
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Anup Patel, Sunil V L,
	Nick Kossifidis, Sebastien Boeuf, iommu, linux-riscv,
	linux-kernel, linux

On Thu, Jul 20, 2023 at 6:18 AM Baolu Lu <baolu.lu@linux.intel.com> wrote:
>
> On 2023/7/20 3:33, Tomasz Jeznach wrote:
> > Enables message or wire signal interrupts for PCIe and platform devices.
>
> If this patch could be divided into multiple small patches, each
> logically doing one specific thing, it would help people better review
> the code.
>

ack. I've got a similar comment regarding this patch already.
I will split and add more notes to the commit message. Thanks.


> Best regards,
> baolu
>


regards,
- Tomasz

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 06/11] RISC-V: drivers/iommu/riscv: Add command, fault, page-req queues
  2023-07-20  3:11   ` Nick Kossifidis
@ 2023-07-20 18:00     ` Tomasz Jeznach
  2023-07-20 18:43       ` Conor Dooley
  2023-07-24  9:47       ` Zong Li
  0 siblings, 2 replies; 86+ messages in thread
From: Tomasz Jeznach @ 2023-07-20 18:00 UTC (permalink / raw)
  To: Nick Kossifidis
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Anup Patel, Sunil V L,
	Sebastien Boeuf, iommu, linux-riscv, linux-kernel, linux

On Wed, Jul 19, 2023 at 8:12 PM Nick Kossifidis <mick@ics.forth.gr> wrote:
>
> Hello Tomasz,
>
> On 7/19/23 22:33, Tomasz Jeznach wrote:
> > Enables message or wire signal interrupts for PCIe and platform devices.
> >
>
> The description doesn't match the subject nor the patch content (we
> don't just enable interrupts, we also init the queues).
>
> > +     /* Parse Queue lengths */
> > +     ret = of_property_read_u32(pdev->dev.of_node, "cmdq_len", &iommu->cmdq_len);
> > +     if (!ret)
> > +             dev_info(dev, "command queue length set to %i\n", iommu->cmdq_len);
> > +
> > +     ret = of_property_read_u32(pdev->dev.of_node, "fltq_len", &iommu->fltq_len);
> > +     if (!ret)
> > +             dev_info(dev, "fault/event queue length set to %i\n", iommu->fltq_len);
> > +
> > +     ret = of_property_read_u32(pdev->dev.of_node, "priq_len", &iommu->priq_len);
> > +     if (!ret)
> > +             dev_info(dev, "page request queue length set to %i\n", iommu->priq_len);
> > +
> >       dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
> >
>
> We need to add those to the device tree binding doc (or throw them away,
> I thought it would be better to have them as part of the device
> description than a module parameter).
>

We can add them as optional fields to the DT.
Alternatively, I've been looking into an option to auto-scale CQ/PQ
based on number of attached devices, but this gets trickier for
hot-pluggable systems. I've added module parameters as a bare-minimum,
but still looking for better solutions.

>
> > +static irqreturn_t riscv_iommu_priq_irq_check(int irq, void *data);
> > +static irqreturn_t riscv_iommu_priq_process(int irq, void *data);
> > +
>
> > +     case RISCV_IOMMU_PAGE_REQUEST_QUEUE:
> > +             q = &iommu->priq;
> > +             q->len = sizeof(struct riscv_iommu_pq_record);
> > +             count = iommu->priq_len;
> > +             irq = iommu->irq_priq;
> > +             irq_check = riscv_iommu_priq_irq_check;
> > +             irq_process = riscv_iommu_priq_process;
> > +             q->qbr = RISCV_IOMMU_REG_PQB;
> > +             q->qcr = RISCV_IOMMU_REG_PQCSR;
> > +             name = "priq";
> > +             break;
>
>
> It makes more sense to add the code for the page request queue in the
> patch that adds ATS/PRI support IMHO. This comment also applies to its
> interrupt handlers below.
>

ack. will do.

>
> > +static inline void riscv_iommu_cmd_inval_set_addr(struct riscv_iommu_command *cmd,
> > +                                               u64 addr)
> > +{
> > +     cmd->dword0 |= RISCV_IOMMU_CMD_IOTINVAL_AV;
> > +     cmd->dword1 = addr;
> > +}
> > +
>
> This needs to be (addr >> 2) to match the spec, same as in the iofence
> command.
>

oops. Thanks!
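
The corrected helper would then look like this (a sketch following
Nick's note, assuming the v1.0 encoding where dword1 carries
ADDR[63:12] starting at bit 10, i.e. the byte address shifted right
by 2):

	static inline void riscv_iommu_cmd_inval_set_addr(struct riscv_iommu_command *cmd,
							  u64 addr)
	{
		cmd->dword0 |= RISCV_IOMMU_CMD_IOTINVAL_AV;
		cmd->dword1 = addr >> 2;	/* ADDR[63:12] into dword1[61:10] */
	}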

> Regards,
> Nick
>

regards,
- Tomasz

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 05/11] RISC-V: drivers/iommu/riscv: Add sysfs interface
  2023-07-20  6:38   ` Krzysztof Kozlowski
@ 2023-07-20 18:30     ` Tomasz Jeznach
  2023-07-20 21:37       ` Krzysztof Kozlowski
  0 siblings, 1 reply; 86+ messages in thread
From: Tomasz Jeznach @ 2023-07-20 18:30 UTC (permalink / raw)
  To: Krzysztof Kozlowski
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Anup Patel, Sunil V L,
	Nick Kossifidis, Sebastien Boeuf, iommu, linux-riscv,
	linux-kernel, linux

On Wed, Jul 19, 2023 at 11:38 PM Krzysztof Kozlowski <krzk@kernel.org> wrote:
>
> On 19/07/2023 21:33, Tomasz Jeznach wrote:
> > Enable sysfs debug / visibility interface providing restricted
> > access to hardware registers.
>
> Please use subject prefixes matching the subsystem. You can get them for
> example with `git log --oneline -- DIRECTORY_OR_FILE` on the directory
> your patch is touching.
>

ack.

> >
> > Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
> > ---
> >  drivers/iommu/riscv/Makefile      |   2 +-
> >  drivers/iommu/riscv/iommu-sysfs.c | 183 ++++++++++++++++++++++++++++++
> >  drivers/iommu/riscv/iommu.c       |   7 ++
> >  drivers/iommu/riscv/iommu.h       |   2 +
> >  4 files changed, 193 insertions(+), 1 deletion(-)
> >  create mode 100644 drivers/iommu/riscv/iommu-sysfs.c
> >
> > diff --git a/drivers/iommu/riscv/Makefile b/drivers/iommu/riscv/Makefile
> > index 38730c11e4a8..9523eb053cfc 100644
> > --- a/drivers/iommu/riscv/Makefile
> > +++ b/drivers/iommu/riscv/Makefile
> > @@ -1 +1 @@
> > -obj-$(CONFIG_RISCV_IOMMU) += iommu.o iommu-pci.o iommu-platform.o
> > \ No newline at end of file
> > +obj-$(CONFIG_RISCV_IOMMU) += iommu.o iommu-pci.o iommu-platform.o iommu-sysfs.o
> > \ No newline at end of file
>
> You have this error in multiple places.
>

ack. next version will run through checkpatch.pl, should spot such problems.

> > diff --git a/drivers/iommu/riscv/iommu-sysfs.c b/drivers/iommu/riscv/iommu-sysfs.c
> > new file mode 100644
> > index 000000000000..f038ea8445c5
> > --- /dev/null
> > +++ b/drivers/iommu/riscv/iommu-sysfs.c
> > @@ -0,0 +1,183 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * IOMMU API for RISC-V architected Ziommu implementations.
> > + *
> > + * Copyright © 2022-2023 Rivos Inc.
> > + *
> > + * Author: Tomasz Jeznach <tjeznach@rivosinc.com>
> > + */
> > +
> > +#include <linux/module.h>
> > +#include <linux/kernel.h>
> > +#include <linux/compiler.h>
> > +#include <linux/iommu.h>
> > +#include <linux/platform_device.h>
> > +#include <asm/page.h>
> > +
> > +#include "iommu.h"
> > +
> > +#define sysfs_dev_to_iommu(dev) \
> > +     container_of(dev_get_drvdata(dev), struct riscv_iommu_device, iommu)
> > +
> > +static ssize_t address_show(struct device *dev,
> > +                         struct device_attribute *attr, char *buf)
>
>
> Where is the sysfs ABI documented?
>

Sysfs for now is used only to expose selected IOMMU memory mapped
registers, with complete documentation in the RISC-V IOMMU Arch Spec
[1], and some comments in iommu-bits.h file.
LMK if it would be better to put a dedicated file documenting those
with the patch itself.


[1] https://github.com/riscv-non-isa/riscv-iommu/releases/download/v1.0/riscv-iommu.pdf

>
> Best regards,
> Krzysztof
>

regards,
- Tomasz

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 06/11] RISC-V: drivers/iommu/riscv: Add command, fault, page-req queues
  2023-07-20 18:00     ` Tomasz Jeznach
@ 2023-07-20 18:43       ` Conor Dooley
  2023-07-24  9:47       ` Zong Li
  1 sibling, 0 replies; 86+ messages in thread
From: Conor Dooley @ 2023-07-20 18:43 UTC (permalink / raw)
  To: Tomasz Jeznach
  Cc: Nick Kossifidis, Anup Patel, Albert Ou, linux, Will Deacon,
	Joerg Roedel, linux-kernel, Sebastien Boeuf, iommu,
	Palmer Dabbelt, Paul Walmsley, linux-riscv, Robin Murphy

On Thu, Jul 20, 2023 at 11:00:10AM -0700, Tomasz Jeznach wrote:
> On Wed, Jul 19, 2023 at 8:12 PM Nick Kossifidis <mick@ics.forth.gr> wrote:
> > On 7/19/23 22:33, Tomasz Jeznach wrote:
> > The description doesn't match the subject nor the patch content (we
> > don't just enable interrupts, we also init the queues).
> >
> > > +     /* Parse Queue lengths */
> > > +     ret = of_property_read_u32(pdev->dev.of_node, "cmdq_len", &iommu->cmdq_len);
> > > +     if (!ret)
> > > +             dev_info(dev, "command queue length set to %i\n", iommu->cmdq_len);
> > > +
> > > +     ret = of_property_read_u32(pdev->dev.of_node, "fltq_len", &iommu->fltq_len);
> > > +     if (!ret)
> > > +             dev_info(dev, "fault/event queue length set to %i\n", iommu->fltq_len);
> > > +
> > > +     ret = of_property_read_u32(pdev->dev.of_node, "priq_len", &iommu->priq_len);
> > > +     if (!ret)
> > > +             dev_info(dev, "page request queue length set to %i\n", iommu->priq_len);
> > > +
> > >       dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
> > >
> >
> > We need to add those to the device tree binding doc (or throw them away,
> > I thought it would be better to have them as part of the device
> > description than a module parameter).

Aye, I didn't notice these. Any DT properties /must/ be documented.
To save making the same comments on v2, properties should also not
contain underscores.

> We can add them as optional fields to the DT.
> Alternatively, I've been looking into an option to auto-scale CQ/PQ
> based on number of attached devices, but this gets trickier for
> hot-pluggable systems. I've added module parameters as a bare-minimum,
> but still looking for better solutions.

If they're properties of the hardware, they should come from DT/ACPI,
unless they're auto-detectable, in which case that is preferred.
To quote GregKH "please do not add new module parameters for drivers,
this is not the 1990s" :)

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 01/11] RISC-V: drivers/iommu: Add RISC-V IOMMU - Ziommu support.
  2023-07-19 21:43     ` Tomasz Jeznach
@ 2023-07-20 19:27       ` Conor Dooley
  2023-07-21  9:44       ` Conor Dooley
  1 sibling, 0 replies; 86+ messages in thread
From: Conor Dooley @ 2023-07-20 19:27 UTC (permalink / raw)
  To: Tomasz Jeznach
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley,
	Anup Patel, Albert Ou, linux, linux-kernel, Sebastien Boeuf,
	iommu, Palmer Dabbelt, Nick Kossifidis, linux-riscv

On Wed, Jul 19, 2023 at 02:43:51PM -0700, Tomasz Jeznach wrote:
> On Wed, Jul 19, 2023 at 1:50 PM Conor Dooley <conor@kernel.org> wrote:

> > Anyway, what I wanted to ask was whether it was valid to use the IOMMU
> > in a system if Ziommu is not present in whatever the ISA extension
> > communication mechanism is? Eg, riscv,isa or the ISA string property in
> > the ACPI tables.
> >
> 
> Yes, this has been pointed out to me already. As far as I can recall,
> there was a discussion at some point about introducing these as Ziommu
> extensions, later agreeing not to describe the IOMMU using ISA string
> conventions. I will remove the remaining occurrences of Ziommu from
> the series.

Right, thanks for clearing that up. I got a bit confused :)

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 05/11] RISC-V: drivers/iommu/riscv: Add sysfs interface
  2023-07-20 18:30     ` Tomasz Jeznach
@ 2023-07-20 21:37       ` Krzysztof Kozlowski
  2023-07-20 22:08         ` Conor Dooley
  0 siblings, 1 reply; 86+ messages in thread
From: Krzysztof Kozlowski @ 2023-07-20 21:37 UTC (permalink / raw)
  To: Tomasz Jeznach
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Anup Patel, Sunil V L,
	Nick Kossifidis, Sebastien Boeuf, iommu, linux-riscv,
	linux-kernel, linux

On 20/07/2023 20:30, Tomasz Jeznach wrote:
u.h"
>>> +
>>> +#define sysfs_dev_to_iommu(dev) \
>>> +     container_of(dev_get_drvdata(dev), struct riscv_iommu_device, iommu)
>>> +
>>> +static ssize_t address_show(struct device *dev,
>>> +                         struct device_attribute *attr, char *buf)
>>
>>
>> Where is the sysfs ABI documented?
>>
> 
> Sysfs for now is used only to expose selected IOMMU memory mapped
> registers, with complete documentation in the RISC-V IOMMU Arch Spec
> [1], and some comments in iommu-bits.h file.
> LMK if it would be better to put a dedicated file documenting those
> with the patch itself.

I meant, you created a new sysfs interface. Maybe I missed something in
the patchset, but each new sysfs interface requires documenting in
Documentation/ABI/.

Best regards,
Krzysztof


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 05/11] RISC-V: drivers/iommu/riscv: Add sysfs interface
  2023-07-20 21:37       ` Krzysztof Kozlowski
@ 2023-07-20 22:08         ` Conor Dooley
  2023-07-21  3:49           ` Tomasz Jeznach
  0 siblings, 1 reply; 86+ messages in thread
From: Conor Dooley @ 2023-07-20 22:08 UTC (permalink / raw)
  To: Krzysztof Kozlowski
  Cc: Tomasz Jeznach, Joerg Roedel, Will Deacon, Robin Murphy,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Anup Patel, Sunil V L,
	Nick Kossifidis, Sebastien Boeuf, iommu, linux-riscv,
	linux-kernel, linux

On Thu, Jul 20, 2023 at 11:37:50PM +0200, Krzysztof Kozlowski wrote:
> On 20/07/2023 20:30, Tomasz Jeznach wrote:

> >>> +#define sysfs_dev_to_iommu(dev) \
> >>> +     container_of(dev_get_drvdata(dev), struct riscv_iommu_device, iommu)
> >>> +
> >>> +static ssize_t address_show(struct device *dev,
> >>> +                         struct device_attribute *attr, char *buf)
> >>
> >>
> >> Where is the sysfs ABI documented?
> >>
> > 
> > Sysfs for now is used only to expose selected IOMMU memory mapped
> > registers, with complete documentation in the RISC-V IOMMU Arch Spec
> > [1], and some comments in iommu-bits.h file.
> > LMK if it would be better to put a dedicated file documenting those
> > with the patch itself.
> 
> I meant, you created a new sysfs interface. Maybe I missed something in
> the patchset, but each new sysfs interface requires documenting in
> Documentation/ABI/.

| expose selected IOMMU memory mapped registers

| Enable sysfs debug / visibility interface providing restricted
| access to hardware registers.

Documentation requirements of sysfs stuff aside, I'm not sure that we
even want a sysfs interface for this in the first place? Seems like, if
at all, this should be debugfs instead? Seems like the only use case for
it is debugging/development...

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 05/11] RISC-V: drivers/iommu/riscv: Add sysfs interface
  2023-07-20 22:08         ` Conor Dooley
@ 2023-07-21  3:49           ` Tomasz Jeznach
  0 siblings, 0 replies; 86+ messages in thread
From: Tomasz Jeznach @ 2023-07-21  3:49 UTC (permalink / raw)
  To: Conor Dooley
  Cc: Krzysztof Kozlowski, Joerg Roedel, Will Deacon, Robin Murphy,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Anup Patel, Sunil V L,
	Nick Kossifidis, Sebastien Boeuf, iommu, linux-riscv,
	linux-kernel, linux

On Thu, Jul 20, 2023 at 3:08 PM Conor Dooley <conor@kernel.org> wrote:
>
> On Thu, Jul 20, 2023 at 11:37:50PM +0200, Krzysztof Kozlowski wrote:
> > On 20/07/2023 20:30, Tomasz Jeznach wrote:
>
> > >>> +#define sysfs_dev_to_iommu(dev) \
> > >>> +     container_of(dev_get_drvdata(dev), struct riscv_iommu_device, iommu)
> > >>> +
> > >>> +static ssize_t address_show(struct device *dev,
> > >>> +                         struct device_attribute *attr, char *buf)
> > >>
> > >>
> > >> Where is the sysfs ABI documented?
> > >>
> > >
> > > Sysfs for now is used only to expose selected IOMMU memory mapped
> > > registers, with complete documentation in the RISC-V IOMMU Arch Spec
> > > [1], and some comments in iommu-bits.h file.
> > > LMK if it would be better to put a dedicated file documenting those
> > > with the patch itself.
> >
> > I meant, you created a new sysfs interface. Maybe I missed something in
> > the patchset, but each new sysfs interface requires documenting in
> > Documentation/ABI/.
>
> | expose selected IOMMU memory mapped registers
>
> | Enable sysfs debug / visibility interface providing restricted
> | access to hardware registers.
>
> Documentation requirements of sysfs stuff aside, I'm not sure that we
> even want a sysfs interface for this in the first place? Seems like, if
> at all, this should be debugfs instead? Seems like the only use case for
> it is debugging/development...

Thanks Conor, will switch to debugfs. This will be a more suitable interface.
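
A minimal sketch of the debugfs equivalent (hypothetical file name and
directory; riscv_iommu_readl() and the register offsets are the ones
this series already defines):

	static int riscv_iommu_regs_show(struct seq_file *m, void *unused)
	{
		struct riscv_iommu_device *iommu = m->private;

		seq_printf(m, "cqh: 0x%x\n",
			   riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQH));
		return 0;
	}
	DEFINE_SHOW_ATTRIBUTE(riscv_iommu_regs);

	/* at probe time, under a driver-owned debugfs directory: */
	debugfs_create_file("regs", 0400, iommu_debugfs_dir, iommu,
			    &riscv_iommu_regs_fops);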

regards,
- Tomasz

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 01/11] RISC-V: drivers/iommu: Add RISC-V IOMMU - Ziommu support.
  2023-07-19 21:43     ` Tomasz Jeznach
  2023-07-20 19:27       ` Conor Dooley
@ 2023-07-21  9:44       ` Conor Dooley
  1 sibling, 0 replies; 86+ messages in thread
From: Conor Dooley @ 2023-07-21  9:44 UTC (permalink / raw)
  To: Tomasz Jeznach
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley,
	Anup Patel, Albert Ou, linux, linux-kernel, Sebastien Boeuf,
	iommu, Palmer Dabbelt, Nick Kossifidis, linux-riscv

On Wed, Jul 19, 2023 at 02:43:51PM -0700, Tomasz Jeznach wrote:
> On Wed, Jul 19, 2023 at 1:50 PM Conor Dooley <conor@kernel.org> wrote:

> > Also, since I am not going to reply to any of these iommu driver patches
> > in a meaningful capacity, please run checkpatch.pl on your work. There
> > are well over 100 style etc complaints that it has highlighted. Sparse
> > has also gone a bit nuts, with many warnings along the lines of:
> > drivers/iommu/riscv/iommu.c:1568:29: warning: incorrect type in assignment (different base types)
> > drivers/iommu/riscv/iommu.c:1568:29:    expected unsigned long long [usertype] iohgatp
> > drivers/iommu/riscv/iommu.c:1568:29:    got restricted __le64 [usertype]
> >
> > I can provide you the full list when the patchwork automation has run
> > through the series.
> >
> 
> Thank you, a list of the lint checkers used would definitely help.

checkpatch is mentioned in the patch submission documentation ;)

Anyway, here's the series on patchwork:
https://patchwork.kernel.org/project/linux-riscv/list/?series=767543
You can see there are quite a few failures for each patch, so you'll
need to resolve those.

Also, I noticed the 32-bit build is broken in some patches, so please
build this driver for 32-bit before sending a v2.

Thanks,
Conor.

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 03/11] dt-bindings: Add RISC-V IOMMU bindings
  2023-07-19 19:33 ` [PATCH 03/11] dt-bindings: Add RISC-V IOMMU bindings Tomasz Jeznach
  2023-07-19 20:19   ` Conor Dooley
@ 2023-07-24  8:03   ` Zong Li
  2023-07-24 10:02     ` Anup Patel
  1 sibling, 1 reply; 86+ messages in thread
From: Zong Li @ 2023-07-24  8:03 UTC (permalink / raw)
  To: Tomasz Jeznach
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley,
	Anup Patel, Albert Ou, linux, linux-kernel, Sebastien Boeuf,
	iommu, Palmer Dabbelt, Nick Kossifidis, linux-riscv

On Thu, Jul 20, 2023 at 3:35 AM Tomasz Jeznach <tjeznach@rivosinc.com> wrote:
>
> From: Anup Patel <apatel@ventanamicro.com>
>
> We add DT bindings document for RISC-V IOMMU platform and PCI devices
> defined by the RISC-V IOMMU specification.
>
> Signed-off-by: Anup Patel <apatel@ventanamicro.com>
> ---
>  .../bindings/iommu/riscv,iommu.yaml           | 146 ++++++++++++++++++
>  1 file changed, 146 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
>
> diff --git a/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml b/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> new file mode 100644
> index 000000000000..8a9aedb61768
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> @@ -0,0 +1,146 @@
> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/iommu/riscv,iommu.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: RISC-V IOMMU Implementation
> +
> +maintainers:
> +  - Tomasz Jeznach <tjeznach@rivosinc.com>
> +
> +description:
> +  The RISC-V IOMMU specification defines an IOMMU for RISC-V platforms
> +  which can be a regular platform device or a PCI device connected to
> +  the host root port.
> +
> +  The RISC-V IOMMU provides two stage translation, device directory table,
> +  command queue and fault reporting as wired interrupt or MSIx event for
> +  both PCI and platform devices.
> +
> +  Visit https://github.com/riscv-non-isa/riscv-iommu for more details.
> +
> +properties:
> +  compatible:
> +    oneOf:
> +      - description: RISC-V IOMMU as a platform device
> +        items:
> +          - enum:
> +              - vendor,chip-iommu
> +          - const: riscv,iommu
> +
> +      - description: RISC-V IOMMU as a PCI device connected to root port
> +        items:
> +          - enum:
> +              - vendor,chip-pci-iommu
> +          - const: riscv,pci-iommu
> +
> +  reg:
> +    maxItems: 1
> +    description:
> +      For RISC-V IOMMU as a platform device, this represents the MMIO base
> +      address of registers.
> +
> +      For RISC-V IOMMU as a PCI device, this represents the PCI-PCI bridge
> +      details as described in Documentation/devicetree/bindings/pci/pci.txt
> +
> +  '#iommu-cells':
> +    const: 2
> +    description: |
> +      Each IOMMU specifier represents the base device ID and number of
> +      device IDs.
> +
> +  interrupts:
> +    minItems: 1
> +    maxItems: 16
> +    description:
> +      The presence of this property implies that given RISC-V IOMMU uses
> +      wired interrupts to notify the RISC-V HARTS (or CPUs).
> +
> +  msi-parent:
> +    description:
> +      The presence of this property implies that given RISC-V IOMMU uses
> +      MSIx to notify the RISC-V HARTs (or CPUs). This property should be
> +      considered only when the interrupts property is absent.
> +
> +  dma-coherent:
> +    description:
> +      Present if page table walks and DMA accesses made by the RISC-V IOMMU
> +      are cache coherent with the CPU.
> +
> +  power-domains:
> +    maxItems: 1
> +

In RISC-V IOMMU, certain devices can be set to bypass mode when the
IOMMU is in translation mode. To identify the devices that require
bypass mode by default, would it be sensible to add a property to
indicate this behavior?

> +required:
> +  - compatible
> +  - reg
> +  - '#iommu-cells'
> +
> +additionalProperties: false
> +
> +examples:
> +  - |
> +    /* Example 1 (IOMMU platform device with wired interrupts) */
> +    immu1: iommu@1bccd000 {
> +        compatible = "vendor,chip-iommu", "riscv,iommu";
> +        reg = <0x1bccd000 0x1000>;
> +        interrupt-parent = <&aplic_smode>;
> +        interrupts = <32 4>, <33 4>, <34 4>, <35 4>;
> +        #iommu-cells = <2>;
> +    };
> +
> +    /* Device with two IOMMU device IDs, 0 and 7 */
> +    master1 {
> +        iommus = <&immu1 0 1>, <&immu1 7 1>;
> +    };
> +
> +  - |
> +    /* Example 2 (IOMMU platform device with MSIs) */
> +    immu2: iommu@1bcdd000 {
> +        compatible = "vendor,chip-iommu", "riscv,iommu";
> +        reg = <0x1bccd000 0x1000>;
> +        msi-parent = <&imsics_smode>;
> +        #iommu-cells = <2>;
> +    };
> +
> +    bus {
> +        #address-cells = <2>;
> +        #size-cells = <2>;
> +
> +        /* Device with IOMMU device IDs ranging from 32 to 64 */
> +        master1 {
> +                iommus = <&immu2 32 32>;
> +        };
> +
> +        pcie@40000000 {
> +            compatible = "pci-host-cam-generic";
> +            device_type = "pci";
> +            #address-cells = <3>;
> +            #size-cells = <2>;
> +            bus-range = <0x0 0x1>;
> +
> +            /* CPU_PHYSICAL(2)  SIZE(2) */
> +            reg = <0x0 0x40000000 0x0 0x1000000>;
> +
> +            /* BUS_ADDRESS(3)  CPU_PHYSICAL(2)  SIZE(2) */
> +            ranges = <0x01000000 0x0 0x01000000  0x0 0x01000000  0x0 0x00010000>,
> +                     <0x02000000 0x0 0x41000000  0x0 0x41000000  0x0 0x3f000000>;
> +
> +            #interrupt-cells = <0x1>;
> +
> +            /* PCI_DEVICE(3)  INT#(1)  CONTROLLER(PHANDLE)  CONTROLLER_DATA(2) */
> +            interrupt-map = <   0x0 0x0 0x0  0x1  &aplic_smode  0x4 0x1>,
> +                            < 0x800 0x0 0x0  0x1  &aplic_smode  0x5 0x1>,
> +                            <0x1000 0x0 0x0  0x1  &aplic_smode  0x6 0x1>,
> +                            <0x1800 0x0 0x0  0x1  &aplic_smode  0x7 0x1>;
> +
> +            /* PCI_DEVICE(3)  INT#(1) */
> +            interrupt-map-mask = <0xf800 0x0 0x0  0x7>;
> +
> +            msi-parent = <&imsics_smode>;
> +
> +            /* Devices with bus number 0-127 are mastered via immu2 */
> +            iommu-map = <0x0000 &immu2 0x0000 0x8000>;
> +        };
> +    };
> +...
> --
> 2.34.1
>
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 06/11] RISC-V: drivers/iommu/riscv: Add command, fault, page-req queues
  2023-07-20 18:00     ` Tomasz Jeznach
  2023-07-20 18:43       ` Conor Dooley
@ 2023-07-24  9:47       ` Zong Li
  2023-07-28  5:18         ` Tomasz Jeznach
  1 sibling, 1 reply; 86+ messages in thread
From: Zong Li @ 2023-07-24  9:47 UTC (permalink / raw)
  To: Tomasz Jeznach
  Cc: Nick Kossifidis, Anup Patel, Albert Ou, linux, Will Deacon,
	Joerg Roedel, linux-kernel, Sebastien Boeuf, iommu,
	Palmer Dabbelt, Paul Walmsley, linux-riscv, Robin Murphy

On Fri, Jul 21, 2023 at 2:00 AM Tomasz Jeznach <tjeznach@rivosinc.com> wrote:
>
> On Wed, Jul 19, 2023 at 8:12 PM Nick Kossifidis <mick@ics.forth.gr> wrote:
> >
> > Hello Tomasz,
> >
> > On 7/19/23 22:33, Tomasz Jeznach wrote:
> > > Enables message or wire signal interrupts for PCIe and platform devices.
> > >
> >
> > The description doesn't match the subject nor the patch content (we
> > > don't just enable interrupts, we also init the queues).
> >
> > > +     /* Parse Queue lengths */
> > > +     ret = of_property_read_u32(pdev->dev.of_node, "cmdq_len", &iommu->cmdq_len);
> > > +     if (!ret)
> > > +             dev_info(dev, "command queue length set to %i\n", iommu->cmdq_len);
> > > +
> > > +     ret = of_property_read_u32(pdev->dev.of_node, "fltq_len", &iommu->fltq_len);
> > > +     if (!ret)
> > > +             dev_info(dev, "fault/event queue length set to %i\n", iommu->fltq_len);
> > > +
> > > +     ret = of_property_read_u32(pdev->dev.of_node, "priq_len", &iommu->priq_len);
> > > +     if (!ret)
> > > +             dev_info(dev, "page request queue length set to %i\n", iommu->priq_len);
> > > +
> > >       dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
> > >
> >
> > We need to add those to the device tree binding doc (or throw them away,
> > I thought it would be better to have them as part of the device
> > description than a module parameter).
> >
>
> We can add them as optional fields to the DT.
> Alternatively, I've been looking into an option to auto-scale CQ/PQ
> based on number of attached devices, but this gets trickier for
> hot-pluggable systems. I've added module parameters as a bare-minimum,
> but still looking for better solutions.
>
> >
> > > +static irqreturn_t riscv_iommu_priq_irq_check(int irq, void *data);
> > > +static irqreturn_t riscv_iommu_priq_process(int irq, void *data);
> > > +
> >
> > > +     case RISCV_IOMMU_PAGE_REQUEST_QUEUE:
> > > +             q = &iommu->priq;
> > > +             q->len = sizeof(struct riscv_iommu_pq_record);
> > > +             count = iommu->priq_len;
> > > +             irq = iommu->irq_priq;
> > > +             irq_check = riscv_iommu_priq_irq_check;
> > > +             irq_process = riscv_iommu_priq_process;
> > > +             q->qbr = RISCV_IOMMU_REG_PQB;
> > > +             q->qcr = RISCV_IOMMU_REG_PQCSR;
> > > +             name = "priq";
> > > +             break;
> >
> >
> > It makes more sense to add the code for the page request queue in the
> > patch that adds ATS/PRI support IMHO. This comment also applies to its
> > interrupt handlers below.
> >
>
> ack. will do.
>
> >
> > > +static inline void riscv_iommu_cmd_inval_set_addr(struct riscv_iommu_command *cmd,
> > > +                                               u64 addr)
> > > +{
> > > +     cmd->dword0 |= RISCV_IOMMU_CMD_IOTINVAL_AV;
> > > +     cmd->dword1 = addr;
> > > +}
> > > +
> >
> > This needs to be (addr >> 2) to match the spec, same as in the iofence
> > command.
> >
>
> oops. Thanks!
>

I think it should be (addr >> 12) according to the spec.
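
For what it's worth, the two suggestions can be reconciled: assuming
the IOTINVAL ADDR field encodes the 4 KiB page number (ADDR[63:12])
starting at dword1 bit 10, shifting the byte address right by 2 and
shifting the page number left by 10 produce the same encoding:

	/* Two equivalent ways to place ADDR[63:12] at bit 10 of dword1
	 * (assumed v1.0 field layout): */
	cmd->dword1 = addr >> 2;		/* from the byte address */
	cmd->dword1 = (addr >> 12) << 10;	/* from the 4 KiB page number */

Assigning addr >> 12 to dword1 without the field shift would misplace
the value.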

> > Regards,
> > Nick
> >
>
> regards,
> - Tomasz
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 03/11] dt-bindings: Add RISC-V IOMMU bindings
  2023-07-24  8:03   ` Zong Li
@ 2023-07-24 10:02     ` Anup Patel
  2023-07-24 11:31       ` Zong Li
  0 siblings, 1 reply; 86+ messages in thread
From: Anup Patel @ 2023-07-24 10:02 UTC (permalink / raw)
  To: Zong Li
  Cc: Tomasz Jeznach, Joerg Roedel, Will Deacon, Robin Murphy,
	Paul Walmsley, Albert Ou, linux, linux-kernel, Sebastien Boeuf,
	iommu, Palmer Dabbelt, Nick Kossifidis, linux-riscv

On Mon, Jul 24, 2023 at 1:33 PM Zong Li <zong.li@sifive.com> wrote:
>
> On Thu, Jul 20, 2023 at 3:35 AM Tomasz Jeznach <tjeznach@rivosinc.com> wrote:
> >
> > From: Anup Patel <apatel@ventanamicro.com>
> >
> > We add DT bindings document for RISC-V IOMMU platform and PCI devices
> > defined by the RISC-V IOMMU specification.
> >
> > Signed-off-by: Anup Patel <apatel@ventanamicro.com>
> > ---
> >  .../bindings/iommu/riscv,iommu.yaml           | 146 ++++++++++++++++++
> >  1 file changed, 146 insertions(+)
> >  create mode 100644 Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> >
> > diff --git a/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml b/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> > new file mode 100644
> > index 000000000000..8a9aedb61768
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> > @@ -0,0 +1,146 @@
> > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > +%YAML 1.2
> > +---
> > +$id: http://devicetree.org/schemas/iommu/riscv,iommu.yaml#
> > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > +
> > +title: RISC-V IOMMU Implementation
> > +
> > +maintainers:
> > +  - Tomasz Jeznach <tjeznach@rivosinc.com>
> > +
> > +description:
> > +  The RISC-V IOMMU specification defines an IOMMU for RISC-V platforms
> > +  which can be a regular platform device or a PCI device connected to
> > +  the host root port.
> > +
> > +  The RISC-V IOMMU provides two stage translation, device directory table,
> > +  command queue and fault reporting as wired interrupt or MSIx event for
> > +  both PCI and platform devices.
> > +
> > +  Visit https://github.com/riscv-non-isa/riscv-iommu for more details.
> > +
> > +properties:
> > +  compatible:
> > +    oneOf:
> > +      - description: RISC-V IOMMU as a platform device
> > +        items:
> > +          - enum:
> > +              - vendor,chip-iommu
> > +          - const: riscv,iommu
> > +
> > +      - description: RISC-V IOMMU as a PCI device connected to root port
> > +        items:
> > +          - enum:
> > +              - vendor,chip-pci-iommu
> > +          - const: riscv,pci-iommu
> > +
> > +  reg:
> > +    maxItems: 1
> > +    description:
> > +      For RISC-V IOMMU as a platform device, this represents the MMIO base
> > +      address of registers.
> > +
> > +      For RISC-V IOMMU as a PCI device, this represents the PCI-PCI bridge
> > +      details as described in Documentation/devicetree/bindings/pci/pci.txt
> > +
> > +  '#iommu-cells':
> > +    const: 2
> > +    description: |
> > +      Each IOMMU specifier represents the base device ID and number of
> > +      device IDs.
> > +
> > +  interrupts:
> > +    minItems: 1
> > +    maxItems: 16
> > +    description:
> > +      The presence of this property implies that given RISC-V IOMMU uses
> > +      wired interrupts to notify the RISC-V HARTS (or CPUs).
> > +
> > +  msi-parent:
> > +    description:
> > +      The presence of this property implies that given RISC-V IOMMU uses
> > +      MSIx to notify the RISC-V HARTs (or CPUs). This property should be
> > +      considered only when the interrupts property is absent.
> > +
> > +  dma-coherent:
> > +    description:
> > +      Present if page table walks and DMA accesses made by the RISC-V IOMMU
> > +      are cache coherent with the CPU.
> > +
> > +  power-domains:
> > +    maxItems: 1
> > +
>
> In RISC-V IOMMU, certain devices can be set to bypass mode when the
> IOMMU is in translation mode. To identify the devices that require
> bypass mode by default, would it be sensible to add a property to
> indicate this behavior?

Bypass mode for a device is a property of that device (similar to dma-coherent)
and not of the IOMMU. Other architectures (ARM and x86) never added such
a device property for bypass mode so I guess it is NOT ADVISABLE to do it.

If this is REALLY required then we can do something similar to the QCOM
SMMU driver where they have a whitelist of devices which are allowed to
be in bypass mode (i.e. IOMMU_DOMAIN_IDENTITY) based on their device
compatible string, and any device outside this whitelist is blocked by default.
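
A sketch of that whitelist approach (hypothetical compatible entries;
def_domain_type is the standard iommu_ops hook the QCOM SMMU driver
uses for this):

	static const struct of_device_id riscv_iommu_identity_devices[] = {
		{ .compatible = "vendor,bypass-capable-device" },	/* hypothetical */
		{ /* sentinel */ }
	};

	static int riscv_iommu_def_domain_type(struct device *dev)
	{
		if (of_match_device(riscv_iommu_identity_devices, dev))
			return IOMMU_DOMAIN_IDENTITY;

		return 0;	/* no preference; the core picks the default */
	}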

Regards,
Anup

>
> > +required:
> > +  - compatible
> > +  - reg
> > +  - '#iommu-cells'
> > +
> > +additionalProperties: false
> > +
> > +examples:
> > +  - |
> > +    /* Example 1 (IOMMU platform device with wired interrupts) */
> > +    immu1: iommu@1bccd000 {
> > +        compatible = "vendor,chip-iommu", "riscv,iommu";
> > +        reg = <0x1bccd000 0x1000>;
> > +        interrupt-parent = <&aplic_smode>;
> > +        interrupts = <32 4>, <33 4>, <34 4>, <35 4>;
> > +        #iommu-cells = <2>;
> > +    };
> > +
> > +    /* Device with two IOMMU device IDs, 0 and 7 */
> > +    master1 {
> > +        iommus = <&immu1 0 1>, <&immu1 7 1>;
> > +    };
> > +
> > +  - |
> > +    /* Example 2 (IOMMU platform device with MSIs) */
> > +    immu2: iommu@1bcdd000 {
> > +        compatible = "vendor,chip-iommu", "riscv,iommu";
> > +        reg = <0x1bccd000 0x1000>;
> > +        msi-parent = <&imsics_smode>;
> > +        #iommu-cells = <2>;
> > +    };
> > +
> > +    bus {
> > +        #address-cells = <2>;
> > +        #size-cells = <2>;
> > +
> > +        /* Device with IOMMU device IDs ranging from 32 to 64 */
> > +        master1 {
> > +                iommus = <&immu2 32 32>;
> > +        };
> > +
> > +        pcie@40000000 {
> > +            compatible = "pci-host-cam-generic";
> > +            device_type = "pci";
> > +            #address-cells = <3>;
> > +            #size-cells = <2>;
> > +            bus-range = <0x0 0x1>;
> > +
> > +            /* CPU_PHYSICAL(2)  SIZE(2) */
> > +            reg = <0x0 0x40000000 0x0 0x1000000>;
> > +
> > +            /* BUS_ADDRESS(3)  CPU_PHYSICAL(2)  SIZE(2) */
> > +            ranges = <0x01000000 0x0 0x01000000  0x0 0x01000000  0x0 0x00010000>,
> > +                     <0x02000000 0x0 0x41000000  0x0 0x41000000  0x0 0x3f000000>;
> > +
> > +            #interrupt-cells = <0x1>;
> > +
> > +            /* PCI_DEVICE(3)  INT#(1)  CONTROLLER(PHANDLE)  CONTROLLER_DATA(2) */
> > +            interrupt-map = <   0x0 0x0 0x0  0x1  &aplic_smode  0x4 0x1>,
> > +                            < 0x800 0x0 0x0  0x1  &aplic_smode  0x5 0x1>,
> > +                            <0x1000 0x0 0x0  0x1  &aplic_smode  0x6 0x1>,
> > +                            <0x1800 0x0 0x0  0x1  &aplic_smode  0x7 0x1>;
> > +
> > +            /* PCI_DEVICE(3)  INT#(1) */
> > +            interrupt-map-mask = <0xf800 0x0 0x0  0x7>;
> > +
> > +            msi-parent = <&imsics_smode>;
> > +
> > +            /* Devices with bus number 0-127 are mastered via immu2 */
> > +            iommu-map = <0x0000 &immu2 0x0000 0x8000>;
> > +        };
> > +    };
> > +...
> > --
> > 2.34.1
> >
> >
> > _______________________________________________
> > linux-riscv mailing list
> > linux-riscv@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 03/11] dt-bindings: Add RISC-V IOMMU bindings
  2023-07-24 10:02     ` Anup Patel
@ 2023-07-24 11:31       ` Zong Li
  2023-07-24 12:10         ` Anup Patel
  0 siblings, 1 reply; 86+ messages in thread
From: Zong Li @ 2023-07-24 11:31 UTC (permalink / raw)
  To: Anup Patel
  Cc: Tomasz Jeznach, Joerg Roedel, Will Deacon, Robin Murphy,
	Paul Walmsley, Albert Ou, linux, linux-kernel, Sebastien Boeuf,
	iommu, Palmer Dabbelt, Nick Kossifidis, linux-riscv

On Mon, Jul 24, 2023 at 6:02 PM Anup Patel <apatel@ventanamicro.com> wrote:
>
> On Mon, Jul 24, 2023 at 1:33 PM Zong Li <zong.li@sifive.com> wrote:
> >
> > On Thu, Jul 20, 2023 at 3:35 AM Tomasz Jeznach <tjeznach@rivosinc.com> wrote:
> > >
> > > From: Anup Patel <apatel@ventanamicro.com>
> > >
> > > We add DT bindings document for RISC-V IOMMU platform and PCI devices
> > > defined by the RISC-V IOMMU specification.
> > >
> > > Signed-off-by: Anup Patel <apatel@ventanamicro.com>
> > > ---
> > >  .../bindings/iommu/riscv,iommu.yaml           | 146 ++++++++++++++++++
> > >  1 file changed, 146 insertions(+)
> > >  create mode 100644 Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> > >
> > > diff --git a/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml b/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> > > new file mode 100644
> > > index 000000000000..8a9aedb61768
> > > --- /dev/null
> > > +++ b/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> > > @@ -0,0 +1,146 @@
> > > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > > +%YAML 1.2
> > > +---
> > > +$id: http://devicetree.org/schemas/iommu/riscv,iommu.yaml#
> > > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > > +
> > > +title: RISC-V IOMMU Implementation
> > > +
> > > +maintainers:
> > > +  - Tomasz Jeznach <tjeznach@rivosinc.com>
> > > +
> > > +description:
> > > +  The RISC-V IOMMU specification defines an IOMMU for RISC-V platforms
> > > +  which can be a regular platform device or a PCI device connected to
> > > +  the host root port.
> > > +
> > > +  The RISC-V IOMMU provides two stage translation, device directory table,
> > > +  command queue and fault reporting as wired interrupt or MSIx event for
> > > +  both PCI and platform devices.
> > > +
> > > +  Visit https://github.com/riscv-non-isa/riscv-iommu for more details.
> > > +
> > > +properties:
> > > +  compatible:
> > > +    oneOf:
> > > +      - description: RISC-V IOMMU as a platform device
> > > +        items:
> > > +          - enum:
> > > +              - vendor,chip-iommu
> > > +          - const: riscv,iommu
> > > +
> > > +      - description: RISC-V IOMMU as a PCI device connected to root port
> > > +        items:
> > > +          - enum:
> > > +              - vendor,chip-pci-iommu
> > > +          - const: riscv,pci-iommu
> > > +
> > > +  reg:
> > > +    maxItems: 1
> > > +    description:
> > > +      For RISC-V IOMMU as a platform device, this represents the MMIO base
> > > +      address of registers.
> > > +
> > > +      For RISC-V IOMMU as a PCI device, this represents the PCI-PCI bridge
> > > +      details as described in Documentation/devicetree/bindings/pci/pci.txt
> > > +
> > > +  '#iommu-cells':
> > > +    const: 2
> > > +    description: |
> > > +      Each IOMMU specifier represents the base device ID and number of
> > > +      device IDs.
> > > +
> > > +  interrupts:
> > > +    minItems: 1
> > > +    maxItems: 16
> > > +    description:
> > > +      The presence of this property implies that given RISC-V IOMMU uses
> > > +      wired interrupts to notify the RISC-V HARTS (or CPUs).
> > > +
> > > +  msi-parent:
> > > +    description:
> > > +      The presence of this property implies that given RISC-V IOMMU uses
> > > +      MSIx to notify the RISC-V HARTs (or CPUs). This property should be
> > > +      considered only when the interrupts property is absent.
> > > +
> > > +  dma-coherent:
> > > +    description:
> > > +      Present if page table walks and DMA accesses made by the RISC-V IOMMU
> > > +      are cache coherent with the CPU.
> > > +
> > > +  power-domains:
> > > +    maxItems: 1
> > > +
> >
> > In RISC-V IOMMU, certain devices can be set to bypass mode when the
> > IOMMU is in translation mode. To identify the devices that require
> > bypass mode by default, would it be sensible to add a property to
> > indicate this behavior?
>
> Bypass mode for a device is a property of that device (similar to dma-coherent)
> and not of the IOMMU. Other architectures (ARM and x86) never added such
> a device property for bypass mode so I guess it is NOT ADVISABLE to do it.
>
> If this is REALLY required then we can do something similar to the QCOM
> SMMU driver where they have a whitelist of devices which are allowed to
> be in bypass mode (i.e. IOMMU_DOMAIN_IDENTITY) based on their device
> compatible string, and any device outside this whitelist is blocked by default.
>

I have considered that adding a bypass-mode property to the device
itself would be more appropriate. However, if we want to define this
property for the device, it might need to go through the generic IOMMU
dt-bindings, and I'm not sure other IOMMU devices need it. I am
bringing up the topic here because I would like to explore whether
there are any solutions on the IOMMU side, such as a property that
lists the phandles of devices wishing to default to bypass mode,
somewhat similar to the whitelist you mentioned earlier. Do you think
we should address this? After all, it is a case the RISC-V IOMMU
supports.

> Regards,
> Anup
>
> >
> > > +required:
> > > +  - compatible
> > > +  - reg
> > > +  - '#iommu-cells'
> > > +
> > > +additionalProperties: false
> > > +
> > > +examples:
> > > +  - |
> > > +    /* Example 1 (IOMMU platform device with wired interrupts) */
> > > +    immu1: iommu@1bccd000 {
> > > +        compatible = "vendor,chip-iommu", "riscv,iommu";
> > > +        reg = <0x1bccd000 0x1000>;
> > > +        interrupt-parent = <&aplic_smode>;
> > > +        interrupts = <32 4>, <33 4>, <34 4>, <35 4>;
> > > +        #iommu-cells = <2>;
> > > +    };
> > > +
> > > +    /* Device with two IOMMU device IDs, 0 and 7 */
> > > +    master1 {
> > > +        iommus = <&immu1 0 1>, <&immu1 7 1>;
> > > +    };
> > > +
> > > +  - |
> > > +    /* Example 2 (IOMMU platform device with MSIs) */
> > > +    immu2: iommu@1bcdd000 {
> > > +        compatible = "vendor,chip-iommu", "riscv,iommu";
> > > +        reg = <0x1bcdd000 0x1000>;
> > > +        msi-parent = <&imsics_smode>;
> > > +        #iommu-cells = <2>;
> > > +    };
> > > +
> > > +    bus {
> > > +        #address-cells = <2>;
> > > +        #size-cells = <2>;
> > > +
> > > +        /* Device with IOMMU device IDs ranging from 32 to 63 */
> > > +        master1 {
> > > +                iommus = <&immu2 32 32>;
> > > +        };
> > > +
> > > +        pcie@40000000 {
> > > +            compatible = "pci-host-cam-generic";
> > > +            device_type = "pci";
> > > +            #address-cells = <3>;
> > > +            #size-cells = <2>;
> > > +            bus-range = <0x0 0x1>;
> > > +
> > > +            /* CPU_PHYSICAL(2)  SIZE(2) */
> > > +            reg = <0x0 0x40000000 0x0 0x1000000>;
> > > +
> > > +            /* BUS_ADDRESS(3)  CPU_PHYSICAL(2)  SIZE(2) */
> > > +            ranges = <0x01000000 0x0 0x01000000  0x0 0x01000000  0x0 0x00010000>,
> > > +                     <0x02000000 0x0 0x41000000  0x0 0x41000000  0x0 0x3f000000>;
> > > +
> > > +            #interrupt-cells = <0x1>;
> > > +
> > > +            /* PCI_DEVICE(3)  INT#(1)  CONTROLLER(PHANDLE)  CONTROLLER_DATA(2) */
> > > +            interrupt-map = <   0x0 0x0 0x0  0x1  &aplic_smode  0x4 0x1>,
> > > +                            < 0x800 0x0 0x0  0x1  &aplic_smode  0x5 0x1>,
> > > +                            <0x1000 0x0 0x0  0x1  &aplic_smode  0x6 0x1>,
> > > +                            <0x1800 0x0 0x0  0x1  &aplic_smode  0x7 0x1>;
> > > +
> > > +            /* PCI_DEVICE(3)  INT#(1) */
> > > +            interrupt-map-mask = <0xf800 0x0 0x0  0x7>;
> > > +
> > > +            msi-parent = <&imsics_smode>;
> > > +
> > > +            /* Devices with bus number 0-127 are mastered via immu2 */
> > > +            iommu-map = <0x0000 &immu2 0x0000 0x8000>;
> > > +        };
> > > +    };
> > > +...
> > > --
> > > 2.34.1
> > >
> > >
> > > _______________________________________________
> > > linux-riscv mailing list
> > > linux-riscv@lists.infradead.org
> > > http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 03/11] dt-bindings: Add RISC-V IOMMU bindings
  2023-07-24 11:31       ` Zong Li
@ 2023-07-24 12:10         ` Anup Patel
  2023-07-24 13:23           ` Zong Li
  0 siblings, 1 reply; 86+ messages in thread
From: Anup Patel @ 2023-07-24 12:10 UTC (permalink / raw)
  To: Zong Li
  Cc: Tomasz Jeznach, Joerg Roedel, Will Deacon, Robin Murphy,
	Paul Walmsley, Albert Ou, linux, linux-kernel, Sebastien Boeuf,
	iommu, Palmer Dabbelt, Nick Kossifidis, linux-riscv

On Mon, Jul 24, 2023 at 5:01 PM Zong Li <zong.li@sifive.com> wrote:
>
> On Mon, Jul 24, 2023 at 6:02 PM Anup Patel <apatel@ventanamicro.com> wrote:
> >
> > On Mon, Jul 24, 2023 at 1:33 PM Zong Li <zong.li@sifive.com> wrote:
> > >
> > > On Thu, Jul 20, 2023 at 3:35 AM Tomasz Jeznach <tjeznach@rivosinc.com> wrote:
> > > >
> > > > From: Anup Patel <apatel@ventanamicro.com>
> > > >
> > > > We add DT bindings document for RISC-V IOMMU platform and PCI devices
> > > > defined by the RISC-V IOMMU specification.
> > > >
> > > > Signed-off-by: Anup Patel <apatel@ventanamicro.com>
> > > > ---
> > > >  .../bindings/iommu/riscv,iommu.yaml           | 146 ++++++++++++++++++
> > > >  1 file changed, 146 insertions(+)
> > > >  create mode 100644 Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> > > >
> > > > diff --git a/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml b/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> > > > new file mode 100644
> > > > index 000000000000..8a9aedb61768
> > > > --- /dev/null
> > > > +++ b/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> > > > @@ -0,0 +1,146 @@
> > > > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > > > +%YAML 1.2
> > > > +---
> > > > +$id: http://devicetree.org/schemas/iommu/riscv,iommu.yaml#
> > > > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > > > +
> > > > +title: RISC-V IOMMU Implementation
> > > > +
> > > > +maintainers:
> > > > +  - Tomasz Jeznach <tjeznach@rivosinc.com>
> > > > +
> > > > +description:
> > > > +  The RISC-V IOMMU specification defines an IOMMU for RISC-V platforms
> > > > +  which can be a regular platform device or a PCI device connected to
> > > > +  the host root port.
> > > > +
> > > > +  The RISC-V IOMMU provides two-stage translation, device directory table,
> > > > +  command queue and fault reporting as wired interrupt or MSI-X event for
> > > > +  both PCI and platform devices.
> > > > +
> > > > +  Visit https://github.com/riscv-non-isa/riscv-iommu for more details.
> > > > +
> > > > +properties:
> > > > +  compatible:
> > > > +    oneOf:
> > > > +      - description: RISC-V IOMMU as a platform device
> > > > +        items:
> > > > +          - enum:
> > > > +              - vendor,chip-iommu
> > > > +          - const: riscv,iommu
> > > > +
> > > > +      - description: RISC-V IOMMU as a PCI device connected to root port
> > > > +        items:
> > > > +          - enum:
> > > > +              - vendor,chip-pci-iommu
> > > > +          - const: riscv,pci-iommu
> > > > +
> > > > +  reg:
> > > > +    maxItems: 1
> > > > +    description:
> > > > +      For RISC-V IOMMU as a platform device, this represents the MMIO base
> > > > +      address of registers.
> > > > +
> > > > +      For RISC-V IOMMU as a PCI device, this represents the PCI-PCI bridge
> > > > +      details as described in Documentation/devicetree/bindings/pci/pci.txt
> > > > +
> > > > +  '#iommu-cells':
> > > > +    const: 2
> > > > +    description: |
> > > > +      Each IOMMU specifier represents the base device ID and number of
> > > > +      device IDs.
> > > > +
> > > > +  interrupts:
> > > > +    minItems: 1
> > > > +    maxItems: 16
> > > > +    description:
> > > > +      The presence of this property implies that the given RISC-V IOMMU
> > > > +      uses wired interrupts to notify the RISC-V HARTs (or CPUs).
> > > > +
> > > > +  msi-parent:
> > > > +    description:
> > > > +      The presence of this property implies that the given RISC-V IOMMU
> > > > +      uses MSI-X to notify the RISC-V HARTs (or CPUs). This property should be
> > > > +      considered only when the interrupts property is absent.
> > > > +
> > > > +  dma-coherent:
> > > > +    description:
> > > > +      Present if page table walks and DMA accesses made by the RISC-V IOMMU
> > > > +      are cache coherent with the CPU.
> > > > +
> > > > +  power-domains:
> > > > +    maxItems: 1
> > > > +
> > >
> > > In RISC-V IOMMU, certain devices can be set to bypass mode when the
> > > IOMMU is in translation mode. To identify the devices that require
> > > bypass mode by default, would it be sensible to add a property to
> > > indicate this behavior?
> >
> > Bypass mode for a device is a property of that device (similar to dma-coherent)
> > and not of the IOMMU. Other architectures (ARM and x86) never added such
> > a device property for bypass mode so I guess it is NOT ADVISABLE to do it.
> >
> > If this is REALLY required then we can do something similar to the QCOM
> > SMMU driver where they have a whitelist of devices which are allowed to
> > > be in bypass mode (i.e. IOMMU_DOMAIN_IDENTITY) based on their device
> > compatible string and any device outside this whitelist is blocked by default.
> >
>
> I have considered that adding the property of bypass mode to that
> device would be more appropriate. However, if we want to define this
> property for the device, it might need to go through the generic IOMMU
> dt-bindings, but I'm not sure if other IOMMU devices need this. I am
> bringing up this topic here because I would like to explore if there
> are any solutions on the IOMMU side, such as a property that indicates
> the phandle of devices wishing to set bypass mode, somewhat similar to
> the whitelist you mentioned earlier. Do you think we should address
> > this? After all, this is a case the RISC-V IOMMU supports.

Bypass mode is a common feature across IOMMUs. Other IOMMUs don't
have a special property for bypass mode at device-level or at IOMMU level,
which clearly indicates that defining a RISC-V specific property is not the
right way to go.

The real question is how do we set IOMMU_DOMAIN_IDENTITY (i.e.
bypass/identity domain) as the default domain for certain devices?

One possible option is to implement def_domain_type() IOMMU operation
for RISC-V IOMMU which will return IOMMU_DOMAIN_IDENTITY for
certain devices based on compatible string matching (i.e. whitelist of
devices). As an example, refer to qcom_smmu_def_domain_type()
of drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
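
For illustration, a minimal sketch of such a callback for the RISC-V
IOMMU could look like the following. The function name and the match
table entry are hypothetical; only of_match_device() and
IOMMU_DOMAIN_IDENTITY are real kernel interfaces, and the shape is
modelled on the QCOM example above:

/* Needs <linux/of_device.h> and <linux/iommu.h>. */

/* Hypothetical whitelist of devices allowed to run in bypass mode. */
static const struct of_device_id riscv_iommu_identity_of_match[] = {
	{ .compatible = "vendor,chip-bypass-dev" },	/* made-up example */
	{ /* sentinel */ }
};

static int riscv_iommu_def_domain_type(struct device *dev)
{
	const struct of_device_id *match =
		of_match_device(riscv_iommu_identity_of_match, dev);

	/* Whitelisted devices get an identity (bypass) default domain;
	 * returning 0 lets the IOMMU core pick the normal default. */
	return match ? IOMMU_DOMAIN_IDENTITY : 0;
}

The callback would then be wired up as the .def_domain_type member of
the driver's struct iommu_ops, the same way the QCOM SMMU driver does it.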

Regards,
Anup





>
> > Regards,
> > Anup
> >
> > >
> > > > +required:
> > > > +  - compatible
> > > > +  - reg
> > > > +  - '#iommu-cells'
> > > > +
> > > > +additionalProperties: false
> > > > +
> > > > +examples:
> > > > +  - |
> > > > +    /* Example 1 (IOMMU platform device with wired interrupts) */
> > > > +    immu1: iommu@1bccd000 {
> > > > +        compatible = "vendor,chip-iommu", "riscv,iommu";
> > > > +        reg = <0x1bccd000 0x1000>;
> > > > +        interrupt-parent = <&aplic_smode>;
> > > > +        interrupts = <32 4>, <33 4>, <34 4>, <35 4>;
> > > > +        #iommu-cells = <2>;
> > > > +    };
> > > > +
> > > > +    /* Device with two IOMMU device IDs, 0 and 7 */
> > > > +    master1 {
> > > > +        iommus = <&immu1 0 1>, <&immu1 7 1>;
> > > > +    };
> > > > +
> > > > +  - |
> > > > +    /* Example 2 (IOMMU platform device with MSIs) */
> > > > +    immu2: iommu@1bcdd000 {
> > > > +        compatible = "vendor,chip-iommu", "riscv,iommu";
> > > > +        reg = <0x1bcdd000 0x1000>;
> > > > +        msi-parent = <&imsics_smode>;
> > > > +        #iommu-cells = <2>;
> > > > +    };
> > > > +
> > > > +    bus {
> > > > +        #address-cells = <2>;
> > > > +        #size-cells = <2>;
> > > > +
> > > > +        /* Device with IOMMU device IDs ranging from 32 to 63 */
> > > > +        master1 {
> > > > +                iommus = <&immu2 32 32>;
> > > > +        };
> > > > +
> > > > +        pcie@40000000 {
> > > > +            compatible = "pci-host-cam-generic";
> > > > +            device_type = "pci";
> > > > +            #address-cells = <3>;
> > > > +            #size-cells = <2>;
> > > > +            bus-range = <0x0 0x1>;
> > > > +
> > > > +            /* CPU_PHYSICAL(2)  SIZE(2) */
> > > > +            reg = <0x0 0x40000000 0x0 0x1000000>;
> > > > +
> > > > +            /* BUS_ADDRESS(3)  CPU_PHYSICAL(2)  SIZE(2) */
> > > > +            ranges = <0x01000000 0x0 0x01000000  0x0 0x01000000  0x0 0x00010000>,
> > > > +                     <0x02000000 0x0 0x41000000  0x0 0x41000000  0x0 0x3f000000>;
> > > > +
> > > > +            #interrupt-cells = <0x1>;
> > > > +
> > > > +            /* PCI_DEVICE(3)  INT#(1)  CONTROLLER(PHANDLE)  CONTROLLER_DATA(2) */
> > > > +            interrupt-map = <   0x0 0x0 0x0  0x1  &aplic_smode  0x4 0x1>,
> > > > +                            < 0x800 0x0 0x0  0x1  &aplic_smode  0x5 0x1>,
> > > > +                            <0x1000 0x0 0x0  0x1  &aplic_smode  0x6 0x1>,
> > > > +                            <0x1800 0x0 0x0  0x1  &aplic_smode  0x7 0x1>;
> > > > +
> > > > +            /* PCI_DEVICE(3)  INT#(1) */
> > > > +            interrupt-map-mask = <0xf800 0x0 0x0  0x7>;
> > > > +
> > > > +            msi-parent = <&imsics_smode>;
> > > > +
> > > > +            /* Devices with bus number 0-127 are mastered via immu2 */
> > > > +            iommu-map = <0x0000 &immu2 0x0000 0x8000>;
> > > > +        };
> > > > +    };
> > > > +...
> > > > --
> > > > 2.34.1
> > > >
> > > >
> > > > _______________________________________________
> > > > linux-riscv mailing list
> > > > linux-riscv@lists.infradead.org
> > > > http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 03/11] dt-bindings: Add RISC-V IOMMU bindings
  2023-07-24 12:10         ` Anup Patel
@ 2023-07-24 13:23           ` Zong Li
  2023-07-26  3:21             ` Baolu Lu
  0 siblings, 1 reply; 86+ messages in thread
From: Zong Li @ 2023-07-24 13:23 UTC (permalink / raw)
  To: Anup Patel
  Cc: Tomasz Jeznach, Joerg Roedel, Will Deacon, Robin Murphy,
	Paul Walmsley, Albert Ou, linux, linux-kernel, Sebastien Boeuf,
	iommu, Palmer Dabbelt, Nick Kossifidis, linux-riscv

On Mon, Jul 24, 2023 at 8:10 PM Anup Patel <apatel@ventanamicro.com> wrote:
>
> On Mon, Jul 24, 2023 at 5:01 PM Zong Li <zong.li@sifive.com> wrote:
> >
> > On Mon, Jul 24, 2023 at 6:02 PM Anup Patel <apatel@ventanamicro.com> wrote:
> > >
> > > On Mon, Jul 24, 2023 at 1:33 PM Zong Li <zong.li@sifive.com> wrote:
> > > >
> > > > On Thu, Jul 20, 2023 at 3:35 AM Tomasz Jeznach <tjeznach@rivosinc.com> wrote:
> > > > >
> > > > > From: Anup Patel <apatel@ventanamicro.com>
> > > > >
> > > > > We add DT bindings document for RISC-V IOMMU platform and PCI devices
> > > > > defined by the RISC-V IOMMU specification.
> > > > >
> > > > > Signed-off-by: Anup Patel <apatel@ventanamicro.com>
> > > > > ---
> > > > >  .../bindings/iommu/riscv,iommu.yaml           | 146 ++++++++++++++++++
> > > > >  1 file changed, 146 insertions(+)
> > > > >  create mode 100644 Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> > > > >
> > > > > diff --git a/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml b/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> > > > > new file mode 100644
> > > > > index 000000000000..8a9aedb61768
> > > > > --- /dev/null
> > > > > +++ b/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> > > > > @@ -0,0 +1,146 @@
> > > > > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > > > > +%YAML 1.2
> > > > > +---
> > > > > +$id: http://devicetree.org/schemas/iommu/riscv,iommu.yaml#
> > > > > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > > > > +
> > > > > +title: RISC-V IOMMU Implementation
> > > > > +
> > > > > +maintainers:
> > > > > +  - Tomasz Jeznach <tjeznach@rivosinc.com>
> > > > > +
> > > > > +description:
> > > > > +  The RISC-V IOMMU specification defines an IOMMU for RISC-V platforms
> > > > > +  which can be a regular platform device or a PCI device connected to
> > > > > +  the host root port.
> > > > > +
> > > > > +  The RISC-V IOMMU provides two-stage translation, device directory table,
> > > > > +  command queue and fault reporting as wired interrupt or MSI-X event for
> > > > > +  both PCI and platform devices.
> > > > > +
> > > > > +  Visit https://github.com/riscv-non-isa/riscv-iommu for more details.
> > > > > +
> > > > > +properties:
> > > > > +  compatible:
> > > > > +    oneOf:
> > > > > +      - description: RISC-V IOMMU as a platform device
> > > > > +        items:
> > > > > +          - enum:
> > > > > +              - vendor,chip-iommu
> > > > > +          - const: riscv,iommu
> > > > > +
> > > > > +      - description: RISC-V IOMMU as a PCI device connected to root port
> > > > > +        items:
> > > > > +          - enum:
> > > > > +              - vendor,chip-pci-iommu
> > > > > +          - const: riscv,pci-iommu
> > > > > +
> > > > > +  reg:
> > > > > +    maxItems: 1
> > > > > +    description:
> > > > > +      For RISC-V IOMMU as a platform device, this represents the MMIO base
> > > > > +      address of registers.
> > > > > +
> > > > > +      For RISC-V IOMMU as a PCI device, this represents the PCI-PCI bridge
> > > > > +      details as described in Documentation/devicetree/bindings/pci/pci.txt
> > > > > +
> > > > > +  '#iommu-cells':
> > > > > +    const: 2
> > > > > +    description: |
> > > > > +      Each IOMMU specifier represents the base device ID and number of
> > > > > +      device IDs.
> > > > > +
> > > > > +  interrupts:
> > > > > +    minItems: 1
> > > > > +    maxItems: 16
> > > > > +    description:
> > > > > +      The presence of this property implies that the given RISC-V IOMMU
> > > > > +      uses wired interrupts to notify the RISC-V HARTs (or CPUs).
> > > > > +
> > > > > +  msi-parent:
> > > > > +    description:
> > > > > +      The presence of this property implies that the given RISC-V IOMMU
> > > > > +      uses MSI-X to notify the RISC-V HARTs (or CPUs). This property should be
> > > > > +      considered only when the interrupts property is absent.
> > > > > +
> > > > > +  dma-coherent:
> > > > > +    description:
> > > > > +      Present if page table walks and DMA accesses made by the RISC-V IOMMU
> > > > > +      are cache coherent with the CPU.
> > > > > +
> > > > > +  power-domains:
> > > > > +    maxItems: 1
> > > > > +
> > > >
> > > > In RISC-V IOMMU, certain devices can be set to bypass mode when the
> > > > IOMMU is in translation mode. To identify the devices that require
> > > > bypass mode by default, would it be sensible to add a property to
> > > > indicate this behavior?
> > >
> > > Bypass mode for a device is a property of that device (similar to dma-coherent)
> > > and not of the IOMMU. Other architectures (ARM and x86) never added such
> > > a device property for bypass mode so I guess it is NOT ADVISABLE to do it.
> > >
> > > If this is REALLY required then we can do something similar to the QCOM
> > > SMMU driver where they have a whitelist of devices which are allowed to
> > > > be in bypass mode (i.e. IOMMU_DOMAIN_IDENTITY) based on their device
> > > compatible string and any device outside this whitelist is blocked by default.
> > >
> >
> > I have considered that adding the property of bypass mode to that
> > device would be more appropriate. However, if we want to define this
> > property for the device, it might need to go through the generic IOMMU
> > dt-bindings, but I'm not sure if other IOMMU devices need this. I am
> > bringing up this topic here because I would like to explore if there
> > are any solutions on the IOMMU side, such as a property that indicates
> > the phandle of devices wishing to set bypass mode, somewhat similar to
> > the whitelist you mentioned earlier. Do you think we should address
> > this? After all, this is a case the RISC-V IOMMU supports.
>
> Bypass mode is a common feature across IOMMUs. Other IOMMUs don't
> have a special property for bypass mode at device-level or at IOMMU level,
> which clearly indicates that defining a RISC-V specific property is not the
> right way to go.
>
> The real question is how do we set IOMMU_DOMAIN_IDENTITY (i.e.
> bypass/identity domain) as the default domain for certain devices?
>
> One possible option is to implement def_domain_type() IOMMU operation
> for RISC-V IOMMU which will return IOMMU_DOMAIN_IDENTITY for
> certain devices based on compatible string matching (i.e. whitelist of
> devices). As an example, refer to qcom_smmu_def_domain_type()
> of drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>

That is indeed one way to approach it, and we can modify the
compatible string when we want to change the mode. However, it would
be preferable to explore a more flexible approach to achieve this
goal. By doing so, we can avoid hard coding anything in the driver or
having to rebuild the kernel whenever we want to change the mode for
certain devices. While I have considered extending a cell in the
'iommus' property to indicate a device's desire to set bypass mode, it
doesn't comply with the iommu documentation and could lead to
ambiguous definitions.

If, at present, we are unable to find a suitable solution, perhaps
let's keep this topic in mind until we discover a more appropriate
approach. In the meantime, we can continue to explore other
possibilities to implement it. Thanks.

> Regards,
> Anup
>
>
>
>
>
> >
> > > Regards,
> > > Anup
> > >
> > > >
> > > > > +required:
> > > > > +  - compatible
> > > > > +  - reg
> > > > > +  - '#iommu-cells'
> > > > > +
> > > > > +additionalProperties: false
> > > > > +
> > > > > +examples:
> > > > > +  - |
> > > > > +    /* Example 1 (IOMMU platform device with wired interrupts) */
> > > > > +    immu1: iommu@1bccd000 {
> > > > > +        compatible = "vendor,chip-iommu", "riscv,iommu";
> > > > > +        reg = <0x1bccd000 0x1000>;
> > > > > +        interrupt-parent = <&aplic_smode>;
> > > > > +        interrupts = <32 4>, <33 4>, <34 4>, <35 4>;
> > > > > +        #iommu-cells = <2>;
> > > > > +    };
> > > > > +
> > > > > +    /* Device with two IOMMU device IDs, 0 and 7 */
> > > > > +    master1 {
> > > > > +        iommus = <&immu1 0 1>, <&immu1 7 1>;
> > > > > +    };
> > > > > +
> > > > > +  - |
> > > > > +    /* Example 2 (IOMMU platform device with MSIs) */
> > > > > +    immu2: iommu@1bcdd000 {
> > > > > +        compatible = "vendor,chip-iommu", "riscv,iommu";
> > > > > +        reg = <0x1bcdd000 0x1000>;
> > > > > +        msi-parent = <&imsics_smode>;
> > > > > +        #iommu-cells = <2>;
> > > > > +    };
> > > > > +
> > > > > +    bus {
> > > > > +        #address-cells = <2>;
> > > > > +        #size-cells = <2>;
> > > > > +
> > > > > +        /* Device with IOMMU device IDs ranging from 32 to 63 */
> > > > > +        master1 {
> > > > > +                iommus = <&immu2 32 32>;
> > > > > +        };
> > > > > +
> > > > > +        pcie@40000000 {
> > > > > +            compatible = "pci-host-cam-generic";
> > > > > +            device_type = "pci";
> > > > > +            #address-cells = <3>;
> > > > > +            #size-cells = <2>;
> > > > > +            bus-range = <0x0 0x1>;
> > > > > +
> > > > > +            /* CPU_PHYSICAL(2)  SIZE(2) */
> > > > > +            reg = <0x0 0x40000000 0x0 0x1000000>;
> > > > > +
> > > > > +            /* BUS_ADDRESS(3)  CPU_PHYSICAL(2)  SIZE(2) */
> > > > > +            ranges = <0x01000000 0x0 0x01000000  0x0 0x01000000  0x0 0x00010000>,
> > > > > +                     <0x02000000 0x0 0x41000000  0x0 0x41000000  0x0 0x3f000000>;
> > > > > +
> > > > > +            #interrupt-cells = <0x1>;
> > > > > +
> > > > > +            /* PCI_DEVICE(3)  INT#(1)  CONTROLLER(PHANDLE)  CONTROLLER_DATA(2) */
> > > > > +            interrupt-map = <   0x0 0x0 0x0  0x1  &aplic_smode  0x4 0x1>,
> > > > > +                            < 0x800 0x0 0x0  0x1  &aplic_smode  0x5 0x1>,
> > > > > +                            <0x1000 0x0 0x0  0x1  &aplic_smode  0x6 0x1>,
> > > > > +                            <0x1800 0x0 0x0  0x1  &aplic_smode  0x7 0x1>;
> > > > > +
> > > > > +            /* PCI_DEVICE(3)  INT#(1) */
> > > > > +            interrupt-map-mask = <0xf800 0x0 0x0  0x7>;
> > > > > +
> > > > > +            msi-parent = <&imsics_smode>;
> > > > > +
> > > > > +            /* Devices with bus number 0-127 are mastered via immu2 */
> > > > > +            iommu-map = <0x0000 &immu2 0x0000 0x8000>;
> > > > > +        };
> > > > > +    };
> > > > > +...
> > > > > --
> > > > > 2.34.1
> > > > >
> > > > >
> > > > > _______________________________________________
> > > > > linux-riscv mailing list
> > > > > linux-riscv@lists.infradead.org
> > > > > http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 08/11] RISC-V: drivers/iommu/riscv: Add page table support
  2023-07-19 19:33 ` [PATCH 08/11] RISC-V: drivers/iommu/riscv: Add page table support Tomasz Jeznach
@ 2023-07-25 13:13   ` Zong Li
  2023-07-31  7:19   ` Zong Li
  2023-08-16 21:04   ` Robin Murphy
  2 siblings, 0 replies; 86+ messages in thread
From: Zong Li @ 2023-07-25 13:13 UTC (permalink / raw)
  To: Tomasz Jeznach
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley,
	Anup Patel, Albert Ou, linux, linux-kernel, Sebastien Boeuf,
	iommu, Palmer Dabbelt, Nick Kossifidis, linux-riscv

On Thu, Jul 20, 2023 at 3:34 AM Tomasz Jeznach <tjeznach@rivosinc.com> wrote:
>
> Introduce I/O page-level translation services with 4K, 2M and 1G page
> size support, and enable the page-level iommu_map/unmap domain interfaces.
>
> Co-developed-by: Sebastien Boeuf <seb@rivosinc.com>
> Signed-off-by: Sebastien Boeuf <seb@rivosinc.com>
> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
> ---
>  drivers/iommu/io-pgtable.c       |   3 +
>  drivers/iommu/riscv/Makefile     |   2 +-
>  drivers/iommu/riscv/io_pgtable.c | 266 +++++++++++++++++++++++++++++++
>  drivers/iommu/riscv/iommu.c      |  40 +++--
>  drivers/iommu/riscv/iommu.h      |   1 +
>  include/linux/io-pgtable.h       |   2 +
>  6 files changed, 297 insertions(+), 17 deletions(-)
>  create mode 100644 drivers/iommu/riscv/io_pgtable.c
>
> diff --git a/drivers/iommu/io-pgtable.c b/drivers/iommu/io-pgtable.c
> index b843fcd365d2..c4807175934f 100644
> --- a/drivers/iommu/io-pgtable.c
> +++ b/drivers/iommu/io-pgtable.c
> @@ -32,6 +32,9 @@ io_pgtable_init_table[IO_PGTABLE_NUM_FMTS] = {
>         [AMD_IOMMU_V1] = &io_pgtable_amd_iommu_v1_init_fns,
>         [AMD_IOMMU_V2] = &io_pgtable_amd_iommu_v2_init_fns,
>  #endif
> +#ifdef CONFIG_RISCV_IOMMU
> +       [RISCV_IOMMU] = &io_pgtable_riscv_init_fns,
> +#endif
>  };
>
>  struct io_pgtable_ops *alloc_io_pgtable_ops(enum io_pgtable_fmt fmt,
> diff --git a/drivers/iommu/riscv/Makefile b/drivers/iommu/riscv/Makefile
> index 9523eb053cfc..13af452c3052 100644
> --- a/drivers/iommu/riscv/Makefile
> +++ b/drivers/iommu/riscv/Makefile
> @@ -1 +1 @@
> -obj-$(CONFIG_RISCV_IOMMU) += iommu.o iommu-pci.o iommu-platform.o iommu-sysfs.o
> \ No newline at end of file
> +obj-$(CONFIG_RISCV_IOMMU) += iommu.o iommu-pci.o iommu-platform.o iommu-sysfs.o io_pgtable.o
> diff --git a/drivers/iommu/riscv/io_pgtable.c b/drivers/iommu/riscv/io_pgtable.c
> new file mode 100644
> index 000000000000..b6e603e6726e
> --- /dev/null
> +++ b/drivers/iommu/riscv/io_pgtable.c
> @@ -0,0 +1,266 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright © 2022-2023 Rivos Inc.
> + *
> + * RISC-V IOMMU page table allocator.
> + *
> + * Authors:
> + *     Tomasz Jeznach <tjeznach@rivosinc.com>
> + *     Sebastien Boeuf <seb@rivosinc.com>
> + */
> +
> +#include <linux/atomic.h>
> +#include <linux/bitops.h>
> +#include <linux/io-pgtable.h>
> +#include <linux/kernel.h>
> +#include <linux/sizes.h>
> +#include <linux/slab.h>
> +#include <linux/types.h>
> +#include <linux/dma-mapping.h>
> +
> +#include "iommu.h"
> +
> +#define io_pgtable_to_domain(x) \
> +       container_of((x), struct riscv_iommu_domain, pgtbl)
> +
> +#define io_pgtable_ops_to_domain(x) \
> +       io_pgtable_to_domain(container_of((x), struct io_pgtable, ops))
> +
> +static inline size_t get_page_size(size_t size)
> +{
> +       if (size >= IOMMU_PAGE_SIZE_512G)
> +               return IOMMU_PAGE_SIZE_512G;
> +
> +       if (size >= IOMMU_PAGE_SIZE_1G)
> +               return IOMMU_PAGE_SIZE_1G;
> +
> +       if (size >= IOMMU_PAGE_SIZE_2M)
> +               return IOMMU_PAGE_SIZE_2M;
> +
> +       return IOMMU_PAGE_SIZE_4K;
> +}
> +
> +static void riscv_iommu_pt_walk_free(pmd_t * ptp, unsigned shift, bool root)
> +{
> +       pmd_t *pte, *pt_base;
> +       int i;
> +
> +       if (shift == PAGE_SHIFT)
> +               return;
> +
> +       if (root)
> +               pt_base = ptp;
> +       else
> +               pt_base =
> +                   (pmd_t *) pfn_to_virt(__page_val_to_pfn(pmd_val(*ptp)));
> +
> +       /* Recursively free all sub page table pages */
> +       for (i = 0; i < PTRS_PER_PMD; i++) {
> +               pte = pt_base + i;
> +               if (pmd_present(*pte) && !pmd_leaf(*pte))
> +                       riscv_iommu_pt_walk_free(pte, shift - 9, false);
> +       }
> +
> +       /* Now free the current page table page */
> +       if (!root && pmd_present(*pt_base))
> +               free_page((unsigned long)pt_base);
> +}
> +
> +static void riscv_iommu_free_pgtable(struct io_pgtable *iop)
> +{
> +       struct riscv_iommu_domain *domain = io_pgtable_to_domain(iop);
> +       riscv_iommu_pt_walk_free((pmd_t *) domain->pgd_root, PGDIR_SHIFT, true);
> +}
> +
> +static pte_t *riscv_iommu_pt_walk_alloc(pmd_t * ptp, unsigned long iova,
> +                                       unsigned shift, bool root,
> +                                       size_t pgsize,
> +                                       unsigned long (*pd_alloc)(gfp_t),
> +                                       gfp_t gfp)
> +{
> +       pmd_t *pte;
> +       unsigned long pfn;
> +
> +       if (root)
> +               pte = ptp + ((iova >> shift) & (PTRS_PER_PMD - 1));
> +       else
> +               pte = (pmd_t *) pfn_to_virt(__page_val_to_pfn(pmd_val(*ptp))) +
> +                   ((iova >> shift) & (PTRS_PER_PMD - 1));
> +
> +       if ((1ULL << shift) <= pgsize) {
> +               if (pmd_present(*pte) && !pmd_leaf(*pte))
> +                       riscv_iommu_pt_walk_free(pte, shift - 9, false);
> +               return (pte_t *) pte;
> +       }
> +
> +       if (pmd_none(*pte)) {
> +               pfn = pd_alloc ? virt_to_pfn(pd_alloc(gfp)) : 0;
> +               if (!pfn)
> +                       return NULL;
> +               set_pmd(pte, __pmd((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE));
> +       }
> +
> +       return riscv_iommu_pt_walk_alloc(pte, iova, shift - 9, false,
> +                                        pgsize, pd_alloc, gfp);
> +}
> +
> +static pte_t *riscv_iommu_pt_walk_fetch(pmd_t * ptp,
> +                                       unsigned long iova, unsigned shift,
> +                                       bool root)
> +{
> +       pmd_t *pte;
> +
> +       if (root)
> +               pte = ptp + ((iova >> shift) & (PTRS_PER_PMD - 1));
> +       else
> +               pte = (pmd_t *) pfn_to_virt(__page_val_to_pfn(pmd_val(*ptp))) +
> +                   ((iova >> shift) & (PTRS_PER_PMD - 1));
> +
> +       if (pmd_leaf(*pte))
> +               return (pte_t *) pte;
> +       else if (pmd_none(*pte))
> +               return NULL;
> +       else if (shift == PAGE_SHIFT)
> +               return NULL;
> +
> +       return riscv_iommu_pt_walk_fetch(pte, iova, shift - 9, false);
> +}
> +
> +static int riscv_iommu_map_pages(struct io_pgtable_ops *ops,
> +                                unsigned long iova, phys_addr_t phys,
> +                                size_t pgsize, size_t pgcount, int prot,
> +                                gfp_t gfp, size_t *mapped)
> +{
> +       struct riscv_iommu_domain *domain = io_pgtable_ops_to_domain(ops);
> +       size_t size = 0;
> +       size_t page_size = get_page_size(pgsize);
> +       pte_t *pte;
> +       pte_t pte_val;
> +       pgprot_t pte_prot;
> +
> +       if (domain->domain.type == IOMMU_DOMAIN_BLOCKED)
> +               return -ENODEV;
> +
> +       if (domain->domain.type == IOMMU_DOMAIN_IDENTITY) {
> +               *mapped = pgsize * pgcount;
> +               return 0;
> +       }
> +
> +       pte_prot = (prot & IOMMU_WRITE) ?
> +           __pgprot(_PAGE_BASE | _PAGE_READ | _PAGE_WRITE | _PAGE_DIRTY) :
> +           __pgprot(_PAGE_BASE | _PAGE_READ);
> +
> +       while (pgcount--) {
> +               pte =
> +                   riscv_iommu_pt_walk_alloc((pmd_t *) domain->pgd_root, iova,
> +                                             PGDIR_SHIFT, true, page_size,
> +                                             get_zeroed_page, gfp);
> +               if (!pte) {
> +                       *mapped = size;
> +                       return -ENOMEM;
> +               }
> +
> +               pte_val = pfn_pte(phys_to_pfn(phys), pte_prot);
> +
> +               set_pte(pte, pte_val);
> +
> +               size += page_size;
> +               iova += page_size;
> +               phys += page_size;
> +       }
> +
> +       *mapped = size;
> +       return 0;
> +}
> +
> +static size_t riscv_iommu_unmap_pages(struct io_pgtable_ops *ops,
> +                                     unsigned long iova, size_t pgsize,
> +                                     size_t pgcount,
> +                                     struct iommu_iotlb_gather *gather)
> +{
> +       struct riscv_iommu_domain *domain = io_pgtable_ops_to_domain(ops);
> +       size_t size = 0;
> +       size_t page_size = get_page_size(pgsize);
> +       pte_t *pte;
> +
> +       if (domain->domain.type == IOMMU_DOMAIN_IDENTITY)
> +               return pgsize * pgcount;
> +
> +       while (pgcount--) {
> +               pte = riscv_iommu_pt_walk_fetch((pmd_t *) domain->pgd_root,
> +                                               iova, PGDIR_SHIFT, true);
> +               if (!pte)
> +                       return size;
> +
> +               set_pte(pte, __pte(0));
> +
> +               iommu_iotlb_gather_add_page(&domain->domain, gather, iova,
> +                                           pgsize);
> +
> +               size += page_size;
> +               iova += page_size;
> +       }
> +
> +       return size;
> +}
> +
> +static phys_addr_t riscv_iommu_iova_to_phys(struct io_pgtable_ops *ops,
> +                                           unsigned long iova)
> +{
> +       struct riscv_iommu_domain *domain = io_pgtable_ops_to_domain(ops);
> +       pte_t *pte;
> +
> +       if (domain->domain.type == IOMMU_DOMAIN_IDENTITY)
> +               return (phys_addr_t) iova;
> +
> +       pte = riscv_iommu_pt_walk_fetch((pmd_t *) domain->pgd_root,
> +                                       iova, PGDIR_SHIFT, true);
> +       if (!pte || !pte_present(*pte))
> +               return 0;
> +
> +       return (pfn_to_phys(pte_pfn(*pte)) | (iova & PAGE_MASK));

It should be (iova & ~PAGE_MASK) to get the low 12 bits (the page offset).
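
For reference, a minimal stand-alone illustration of the mask semantics
(assuming 4K pages, so PAGE_MASK is ~0xfffUL, as in the kernel):

#include <stdio.h>

#define PAGE_SIZE 4096UL
#define PAGE_MASK (~(PAGE_SIZE - 1))

int main(void)
{
	unsigned long iova = 0x12345678UL;

	/* page-aligned part - what the patch ORs in, losing the offset */
	printf("iova & PAGE_MASK  = 0x%lx\n", iova & PAGE_MASK);	/* 0x12345000 */
	/* page offset (low 12 bits) - what iova_to_phys actually needs */
	printf("iova & ~PAGE_MASK = 0x%lx\n", iova & ~PAGE_MASK);	/* 0x678 */
	return 0;
}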

> +}
> +
> +static void riscv_iommu_tlb_inv_all(void *cookie)
> +{
> +}
> +
> +static void riscv_iommu_tlb_inv_walk(unsigned long iova, size_t size,
> +                                    size_t granule, void *cookie)
> +{
> +}
> +
> +static void riscv_iommu_tlb_add_page(struct iommu_iotlb_gather *gather,
> +                                    unsigned long iova, size_t granule,
> +                                    void *cookie)
> +{
> +}
> +
> +static const struct iommu_flush_ops riscv_iommu_flush_ops = {
> +       .tlb_flush_all = riscv_iommu_tlb_inv_all,
> +       .tlb_flush_walk = riscv_iommu_tlb_inv_walk,
> +       .tlb_add_page = riscv_iommu_tlb_add_page,
> +};
> +
> +/* NOTE: cfg should point to riscv_iommu_domain structure member pgtbl.cfg */
> +static struct io_pgtable *riscv_iommu_alloc_pgtable(struct io_pgtable_cfg *cfg,
> +                                                   void *cookie)
> +{
> +       struct io_pgtable *iop = container_of(cfg, struct io_pgtable, cfg);
> +
> +       cfg->pgsize_bitmap = SZ_4K | SZ_2M | SZ_1G;
> +       cfg->ias = 57;          // va mode, SvXX -> ias
> +       cfg->oas = 57;          // pa mode, or SvXX+4 -> oas
> +       cfg->tlb = &riscv_iommu_flush_ops;
> +
> +       iop->ops.map_pages = riscv_iommu_map_pages;
> +       iop->ops.unmap_pages = riscv_iommu_unmap_pages;
> +       iop->ops.iova_to_phys = riscv_iommu_iova_to_phys;
> +
> +       return iop;
> +}
> +
> +struct io_pgtable_init_fns io_pgtable_riscv_init_fns = {
> +       .alloc = riscv_iommu_alloc_pgtable,
> +       .free = riscv_iommu_free_pgtable,
> +};
> diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
> index 9ee7d2b222b5..2ef6952a2109 100644
> --- a/drivers/iommu/riscv/iommu.c
> +++ b/drivers/iommu/riscv/iommu.c
> @@ -807,7 +807,7 @@ static struct iommu_device *riscv_iommu_probe_device(struct device *dev)
>         /* Initial DC pointer can be NULL if IOMMU is configured in OFF or BARE mode */
>         ep->dc = riscv_iommu_get_dc(iommu, ep->devid);
>
> -       dev_info(iommu->dev, "adding device to iommu with devid %i in domain %i\n",
> +       dev_dbg(iommu->dev, "adding device to iommu with devid %i in domain %i\n",
>                 ep->devid, ep->domid);
>
>         dev_iommu_priv_set(dev, ep);
> @@ -874,7 +874,10 @@ static struct iommu_domain *riscv_iommu_domain_alloc(unsigned type)
>  {
>         struct riscv_iommu_domain *domain;
>
> -       if (type != IOMMU_DOMAIN_IDENTITY &&
> +       if (type != IOMMU_DOMAIN_DMA &&
> +           type != IOMMU_DOMAIN_DMA_FQ &&
> +           type != IOMMU_DOMAIN_UNMANAGED &&
> +           type != IOMMU_DOMAIN_IDENTITY &&
>             type != IOMMU_DOMAIN_BLOCKED)
>                 return NULL;
>
> @@ -890,7 +893,7 @@ static struct iommu_domain *riscv_iommu_domain_alloc(unsigned type)
>         domain->pscid = ida_alloc_range(&riscv_iommu_pscids, 1,
>                                         RISCV_IOMMU_MAX_PSCID, GFP_KERNEL);
>
> -       printk("domain type %x alloc %u\n", type, domain->pscid);
> +       printk("domain alloc %u\n", domain->pscid);
>
>         return &domain->domain;
>  }
> @@ -903,6 +906,9 @@ static void riscv_iommu_domain_free(struct iommu_domain *iommu_domain)
>                 pr_warn("IOMMU domain is not empty!\n");
>         }
>
> +       if (domain->pgtbl.cookie)
> +               free_io_pgtable_ops(&domain->pgtbl.ops);
> +
>         if (domain->pgd_root)
>                 free_pages((unsigned long)domain->pgd_root, 0);
>
> @@ -959,6 +965,9 @@ static int riscv_iommu_domain_finalize(struct riscv_iommu_domain *domain,
>         if (!domain->pgd_root)
>                 return -ENOMEM;
>
> +       if (!alloc_io_pgtable_ops(RISCV_IOMMU, &domain->pgtbl.cfg, domain))
> +               return -ENOMEM;
> +
>         return 0;
>  }
>
> @@ -1006,9 +1015,8 @@ static int riscv_iommu_attach_dev(struct iommu_domain *iommu_domain, struct devi
>                 return 0;
>         }
>
> -       if (!dc) {
> +       if (!dc)
>                 return -ENODEV;
> -       }
>
>         /*
>          * S-Stage translation table. G-Stage remains unmodified (BARE).
> @@ -1104,12 +1112,11 @@ static int riscv_iommu_map_pages(struct iommu_domain *iommu_domain,
>  {
>         struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
>
> -       if (domain->domain.type == IOMMU_DOMAIN_IDENTITY) {
> -               *mapped = pgsize * pgcount;
> -               return 0;
> -       }
> +       if (!domain->pgtbl.ops.map_pages)
> +               return -ENODEV;
>
> -       return -ENODEV;
> +       return domain->pgtbl.ops.map_pages(&domain->pgtbl.ops, iova, phys,
> +                                          pgsize, pgcount, prot, gfp, mapped);
>  }
>
>  static size_t riscv_iommu_unmap_pages(struct iommu_domain *iommu_domain,
> @@ -1118,10 +1125,11 @@ static size_t riscv_iommu_unmap_pages(struct iommu_domain *iommu_domain,
>  {
>         struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
>
> -       if (domain->domain.type == IOMMU_DOMAIN_IDENTITY)
> -               return pgsize * pgcount;
> +       if (!domain->pgtbl.ops.unmap_pages)
> +               return 0;
>
> -       return 0;
> +       return domain->pgtbl.ops.unmap_pages(&domain->pgtbl.ops, iova, pgsize,
> +                                            pgcount, gather);
>  }
>
>  static phys_addr_t riscv_iommu_iova_to_phys(struct iommu_domain *iommu_domain,
> @@ -1129,10 +1137,10 @@ static phys_addr_t riscv_iommu_iova_to_phys(struct iommu_domain *iommu_domain,
>  {
>         struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
>
> -       if (domain->domain.type == IOMMU_DOMAIN_IDENTITY)
> -               return (phys_addr_t) iova;
> +       if (!domain->pgtbl.ops.iova_to_phys)
> +               return 0;
>
> -       return 0;
> +       return domain->pgtbl.ops.iova_to_phys(&domain->pgtbl.ops, iova);
>  }
>
>  /*
> diff --git a/drivers/iommu/riscv/iommu.h b/drivers/iommu/riscv/iommu.h
> index 9140df71e17b..fe32a4eff14e 100644
> --- a/drivers/iommu/riscv/iommu.h
> +++ b/drivers/iommu/riscv/iommu.h
> @@ -88,6 +88,7 @@ struct riscv_iommu_device {
>
>  struct riscv_iommu_domain {
>         struct iommu_domain domain;
> +       struct io_pgtable pgtbl;
>
>         struct list_head endpoints;
>         struct mutex lock;
> diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
> index 1b7a44b35616..8dd9d3a28e3a 100644
> --- a/include/linux/io-pgtable.h
> +++ b/include/linux/io-pgtable.h
> @@ -19,6 +19,7 @@ enum io_pgtable_fmt {
>         AMD_IOMMU_V2,
>         APPLE_DART,
>         APPLE_DART2,
> +       RISCV_IOMMU,
>         IO_PGTABLE_NUM_FMTS,
>  };
>
> @@ -258,5 +259,6 @@ extern struct io_pgtable_init_fns io_pgtable_arm_mali_lpae_init_fns;
>  extern struct io_pgtable_init_fns io_pgtable_amd_iommu_v1_init_fns;
>  extern struct io_pgtable_init_fns io_pgtable_amd_iommu_v2_init_fns;
>  extern struct io_pgtable_init_fns io_pgtable_apple_dart_init_fns;
> +extern struct io_pgtable_init_fns io_pgtable_riscv_init_fns;
>
>  #endif /* __IO_PGTABLE_H */
> --
> 2.34.1
>
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 03/11] dt-bindings: Add RISC-V IOMMU bindings
  2023-07-24 13:23           ` Zong Li
@ 2023-07-26  3:21             ` Baolu Lu
  2023-07-26  4:26               ` Zong Li
  0 siblings, 1 reply; 86+ messages in thread
From: Baolu Lu @ 2023-07-26  3:21 UTC (permalink / raw)
  To: Zong Li, Anup Patel
  Cc: baolu.lu, Tomasz Jeznach, Joerg Roedel, Will Deacon,
	Robin Murphy, Paul Walmsley, Albert Ou, linux, linux-kernel,
	Sebastien Boeuf, iommu, Palmer Dabbelt, Nick Kossifidis,
	linux-riscv

On 2023/7/24 21:23, Zong Li wrote:
>>>>> In RISC-V IOMMU, certain devices can be set to bypass mode when the
>>>>> IOMMU is in translation mode. To identify the devices that require
>>>>> bypass mode by default, would it be sensible to add a property to
>>>>> indicate this behavior?
>>>> Bypass mode for a device is a property of that device (similar to dma-coherent)
>>>> and not of the IOMMU. Other architectures (ARM and x86) never added such
>>>> a device property for bypass mode so I guess it is NOT ADVISABLE to do it.
>>>>
>>>> If this is REALLY required then we can do something similar to the QCOM
>>>> SMMU driver where they have a whitelist of devices which are allowed to
>>>> be in bypass mode (i.e. IOMMU_DOMAIN_IDENTITY) based on their device
>>>> compatible string and any device outside this whitelist is blocked by default.
>>>>
>>> I have considered that adding the property of bypass mode to that
>>> device would be more appropriate. However, if we want to define this
>>> property for the device, it might need to go through the generic IOMMU
>>> dt-bindings, but I'm not sure if other IOMMU devices need this. I am
>>> bringing up this topic here because I would like to explore if there
>>> are any solutions on the IOMMU side, such as a property that indicates
>>> the phandle of devices wishing to set bypass mode, somewhat similar to
>>> the whitelist you mentioned earlier. Do you think we should address
>>> this? After all, this is a case the RISC-V IOMMU supports.
>> Bypass mode is a common feature across IOMMUs. Other IOMMUs don't
>> have a special property for bypass mode at device-level or at IOMMU level,
>> which clearly indicates that defining a RISC-V specific property is not the
>> right way to go.
>>
>> The real question is how do we set IOMMU_DOMAIN_IDENTITY (i.e.
>> bypass/identity domain) as the default domain for certain devices?
>>
>> One possible option is to implement def_domain_type() IOMMU operation
>> for RISC-V IOMMU which will return IOMMU_DOMAIN_IDENTITY for
>> certain devices based on compatible string matching (i.e. whitelist of
>> devices). As an example, refer to qcom_smmu_def_domain_type()
>> of drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>>
> That is indeed one way to approach it, and we can modify the
> compatible string when we want to change the mode. However, it would
> be preferable to explore a more flexible approach to achieve this
> goal. By doing so, we can avoid hard coding anything in the driver or
> having to rebuild the kernel whenever we want to change the mode for
> certain devices. While I have considered extending a cell in the
> 'iommus' property to indicate a device's desire to set bypass mode, it
> doesn't comply with the iommu documentation and could lead to
> ambiguous definitions.

Hard-coding the matching strings in the iommu driver is definitely not
a preferable way. A feasible solution, from the current code's point of
view, is for the platform to opt in the device's special requirements
through DT or ACPI. Then, in the def_domain_type callback, let the
iommu core know, so that it can allocate the right type of domain for
the device.
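
As a purely hypothetical sketch (no such generic property exists today,
and the name "riscv,iommu-bypass" is invented here for illustration;
device_property_read_bool() works for both DT and ACPI), the opt-in
could be consumed in the callback like this:

/* Needs <linux/property.h> and <linux/iommu.h>. */
static int riscv_iommu_def_domain_type(struct device *dev)
{
	/* "riscv,iommu-bypass" is a made-up property name; a real
	 * binding would first have to go through the generic IOMMU
	 * dt-bindings review. */
	if (device_property_read_bool(dev, "riscv,iommu-bypass"))
		return IOMMU_DOMAIN_IDENTITY;

	return 0;	/* no preference; let the IOMMU core decide */
}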

Thoughts?

Best regards,
baolu

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 03/11] dt-bindings: Add RISC-V IOMMU bindings
  2023-07-26  3:21             ` Baolu Lu
@ 2023-07-26  4:26               ` Zong Li
  2023-07-26 12:17                 ` Jason Gunthorpe
  0 siblings, 1 reply; 86+ messages in thread
From: Zong Li @ 2023-07-26  4:26 UTC (permalink / raw)
  To: Baolu Lu
  Cc: Anup Patel, Tomasz Jeznach, Joerg Roedel, Will Deacon,
	Robin Murphy, Paul Walmsley, Albert Ou, linux, linux-kernel,
	Sebastien Boeuf, iommu, Palmer Dabbelt, Nick Kossifidis,
	linux-riscv

On Wed, Jul 26, 2023 at 11:21 AM Baolu Lu <baolu.lu@linux.intel.com> wrote:
>
> On 2023/7/24 21:23, Zong Li wrote:
> >>>>> In RISC-V IOMMU, certain devices can be set to bypass mode when the
> >>>>> IOMMU is in translation mode. To identify the devices that require
> >>>>> bypass mode by default, would it be sensible to add a property to
> >>>>> indicate this behavior?
> >>>> Bypass mode for a device is a property of that device (similar to dma-coherent)
> >>>> and not of the IOMMU. Other architectures (ARM and x86) never added such
> >>>> a device property for bypass mode so I guess it is NOT ADVISABLE to do it.
> >>>>
> >>>> If this is REALLY required then we can do something similar to the QCOM
> >>>> SMMU driver where they have a whitelist of devices which are allowed to
> >>>> be in bypass mode (i.e. IOMMU_DOMAIN_IDENTITY) based on their device
> >>>> compatible string and any device outside this whitelist is blocked by default.
> >>>>
> >>> I have considered that adding the property of bypass mode to that
> >>> device would be more appropriate. However, if we want to define this
> >>> property for the device, it might need to go through the generic IOMMU
> >>> dt-bindings, but I'm not sure if other IOMMU devices need this. I am
> >>> bringing up this topic here because I would like to explore if there
> >>> are any solutions on the IOMMU side, such as a property that indicates
> >>> the phandle of devices wishing to set bypass mode, somewhat similar to
> >>> the whitelist you mentioned earlier. Do you think we should address
> >>> this? After all, this is a case the RISC-V IOMMU supports.
> >> Bypass mode is a common feature across IOMMUs. Other IOMMUs don't
> >> have a special property for bypass mode at device-level or at IOMMU level,
> >> which clearly indicates that defining a RISC-V specific property is not the
> >> right way to go.
> >>
> >> The real question is how do we set IOMMU_DOMAIN_IDENTITY (i.e.
> >> bypass/identity domain) as the default domain for certain devices?
> >>
> >> One possible option is to implement def_domain_type() IOMMU operation
> >> for RISC-V IOMMU which will return IOMMU_DOMAIN_IDENTITY for
> >> certain devices based on compatible string matching (i.e. whitelist of
> >> devices). As an example, refer to qcom_smmu_def_domain_type()
> >> of drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> >>
> > That is indeed one way to approach it, and we can modify the
> > compatible string when we want to change the mode. However, it would
> > be preferable to explore a more flexible approach to achieve this
> > goal. By doing so, we can avoid hard coding anything in the driver or
> > having to rebuild the kernel whenever we want to change the mode for
> > certain devices. While I have considered extending a cell in the
> > 'iommus' property to indicate a device's desire to set bypass mode, it
> > doesn't comply with the iommu documentation and could lead to
> > ambiguous definitions.
>
> Hard-coding the matching strings in the iommu driver is definitely not
> a preferable way. A feasible solution, from the current code's point of
> view, is for the platform to opt in the device's special requirements
> through DT or ACPI. Then, in the def_domain_type callback, let the
> iommu core know, so that it can allocate the right type of domain for
> the device.
>
> Thoughts?
>

It would be nice if we could deal with it at this time. As we discussed
earlier, we might need to consider how to indicate this, such as
putting a property on the device side or the IOMMU side, and whether we
need to define it in the generic IOMMU dt-bindings instead of a
RISC-V-specific dt-binding.

> Best regards,
> baolu

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 03/11] dt-bindings: Add RISC-V IOMMU bindings
  2023-07-26  4:26               ` Zong Li
@ 2023-07-26 12:17                 ` Jason Gunthorpe
  2023-07-27  2:42                   ` Zong Li
  0 siblings, 1 reply; 86+ messages in thread
From: Jason Gunthorpe @ 2023-07-26 12:17 UTC (permalink / raw)
  To: Zong Li
  Cc: Baolu Lu, Anup Patel, Tomasz Jeznach, Joerg Roedel, Will Deacon,
	Robin Murphy, Paul Walmsley, Albert Ou, linux, linux-kernel,
	Sebastien Boeuf, iommu, Palmer Dabbelt, Nick Kossifidis,
	linux-riscv

On Wed, Jul 26, 2023 at 12:26:14PM +0800, Zong Li wrote:
> On Wed, Jul 26, 2023 at 11:21 AM Baolu Lu <baolu.lu@linux.intel.com> wrote:
> >
> > On 2023/7/24 21:23, Zong Li wrote:
> > >>>>> In RISC-V IOMMU, certain devices can be set to bypass mode when the
> > >>>>> IOMMU is in translation mode. To identify the devices that require
> > >>>>> bypass mode by default, would it be sensible to add a property to
> > >>>>> indicate this behavior?
> > >>>> Bypass mode for a device is a property of that device (similar to dma-coherent)
> > >>>> and not of the IOMMU. Other architectures (ARM and x86) never added such
> > >>>> a device property for bypass mode so I guess it is NOT ADVISABLE to do it.
> > >>>>
> > >>>> If this is REALLY required then we can do something similar to the QCOM
> > >>>> SMMU driver where they have a whitelist of devices which are allowed to
> > >>>> be in bypass mode (i.e. IOMMU_DOMAIN_IDENTITY) based on their device
> > >>>> compatible string and any device outside this whitelist is
> > >>>> blocked by default.

I have a draft patch someplace that consolidated all this quirk
checking into the core code. Generally the expectation is that any
device behind an iommu is fully functional in all modes. The existing
quirks are for HW defects that make some devices not work properly. In
this case the right outcome seems to be effectively blocking them from
using the iommu.

So, you should explain a lot more what "require bypass mode" means in
the RISC-V world and why any device would need it.

Jason

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 03/11] dt-bindings: Add RISC-V IOMMU bindings
  2023-07-26 12:17                 ` Jason Gunthorpe
@ 2023-07-27  2:42                   ` Zong Li
  2023-08-09 14:57                     ` Jason Gunthorpe
  0 siblings, 1 reply; 86+ messages in thread
From: Zong Li @ 2023-07-27  2:42 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Baolu Lu, Anup Patel, Tomasz Jeznach, Joerg Roedel, Will Deacon,
	Robin Murphy, Paul Walmsley, Albert Ou, linux, linux-kernel,
	Sebastien Boeuf, iommu, Palmer Dabbelt, Nick Kossifidis,
	linux-riscv

On Wed, Jul 26, 2023 at 8:17 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Wed, Jul 26, 2023 at 12:26:14PM +0800, Zong Li wrote:
> > On Wed, Jul 26, 2023 at 11:21 AM Baolu Lu <baolu.lu@linux.intel.com> wrote:
> > >
> > > On 2023/7/24 21:23, Zong Li wrote:
> > > >>>>> In RISC-V IOMMU, certain devices can be set to bypass mode when the
> > > >>>>> IOMMU is in translation mode. To identify the devices that require
> > > >>>>> bypass mode by default, would it be sensible to add a property to
> > > >>>>> indicate this behavior?
> > > >>>> Bypass mode for a device is a property of that device (similar to dma-coherent)
> > > >>>> and not of the IOMMU. Other architectures (ARM and x86) never added such
> > > >>>> a device property for bypass mode so I guess it is NOT ADVISABLE to do it.
> > > >>>>
> > > >>>> If this is REALLY required then we can do something similar to the QCOM
> > > >>>> SMMU driver where they have a whitelist of devices which are allowed to
> > > >>>> be in bypass mode (i.e. IOMMU_DOMAIN_IDENTITY) based on their device
> > > >>>> compatible string and any device outside this whitelist is
> > > >>>> blocked by default.
>
> I have a draft patch someplace that consolidated all this quirk
> checking into the core code. Generally the expectation is that any
> device behind an iommu is fully functional in all modes. The existing
> quirks are for HW defects that make some devices not work properly. In
> this case the right outcome seems to be effectively blocking them from
> using the iommu.
>
> So, you should explain a lot more what "require bypass mode" means in
> the RISC-V world and why any device would need it.

Perhaps this question relates to the scenarios in which devices wish
to be in bypass mode while the IOMMU is in translation mode, and why
the IOMMU defines/supports this case. Currently, I can envision a
scenario where a device is already connected to the IOMMU in hardware,
but it is not functioning correctly, or there are performance impacts.
If modifying the hardware is not feasible, a default configuration
that allows bypass mode could be provided as a solution. There might
be other scenarios that I have overlooked. It seems to me that since
the IOMMU supports this configuration, it would be advantageous to
have a way to achieve it, and DT might be a flexible one.

>
> Jason

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 01/11] RISC-V: drivers/iommu: Add RISC-V IOMMU - Ziommu support.
  2023-07-19 19:33 ` [PATCH 01/11] RISC-V: drivers/iommu: Add RISC-V IOMMU - Ziommu support Tomasz Jeznach
                     ` (2 preceding siblings ...)
  2023-07-20 12:31   ` Baolu Lu
@ 2023-07-28  2:42   ` Zong Li
  2023-08-02 20:15     ` Tomasz Jeznach
  2023-08-03  0:18   ` Jason Gunthorpe
                     ` (3 subsequent siblings)
  7 siblings, 1 reply; 86+ messages in thread
From: Zong Li @ 2023-07-28  2:42 UTC (permalink / raw)
  To: Tomasz Jeznach
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley,
	Anup Patel, Albert Ou, linux, linux-kernel, Sebastien Boeuf,
	iommu, Palmer Dabbelt, Nick Kossifidis, linux-riscv

On Thu, Jul 20, 2023 at 3:34 AM Tomasz Jeznach <tjeznach@rivosinc.com> wrote:
>
> This patch introduces a skeleton IOMMU device driver implementation as defined
> by the RISC-V IOMMU Architecture Specification, Version 1.0 [1], with minimal
> support for pass-through mapping, basic initialization and bindings for platform
> and PCIe hardware implementations.
>
> The series of patches, which followed the specification's evolution, has been
> reorganized to provide functional separation of the implemented blocks, compliant
> with the ratified specification.
>
> This and the following patch series include code contributed by: Nick Kossifidis
> <mick@ics.forth.gr> (iommu-platform device, a number of specification clarifications,
> bugfixes and readability improvements), Sebastien Boeuf <seb@rivosinc.com> (page
> table creation, ATS/PGR flow).
>
> Complete history can be found at the maintainer's repository branch [2].
>
> The device driver enables RISC-V 32/64 memory translation support for DMA-capable
> PCI and platform devices, a multilevel device directory table, a process directory,
> shared virtual address support, wired and message-signalled interrupts for
> translation and I/O faults, a page request interface and command processing.
>
> A matching RISC-V IOMMU device emulation implementation is available for the QEMU
> project, along with educational device extensions for PASID ATS/PRI support [3].
>
> References:
>  - [1] https://github.com/riscv-non-isa/riscv-iommu
>  - [2] https://github.com/tjeznach/linux/tree/tjeznach/riscv-iommu
>  - [3] https://github.com/tjeznach/qemu/tree/tjeznach/riscv-iommu
>
> Co-developed-by: Nick Kossifidis <mick@ics.forth.gr>
> Signed-off-by: Nick Kossifidis <mick@ics.forth.gr>
> Co-developed-by: Sebastien Boeuf <seb@rivosinc.com>
> Signed-off-by: Sebastien Boeuf <seb@rivosinc.com>
> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
> ---
>  drivers/iommu/Kconfig                |   1 +
>  drivers/iommu/Makefile               |   2 +-
>  drivers/iommu/riscv/Kconfig          |  22 +
>  drivers/iommu/riscv/Makefile         |   1 +
>  drivers/iommu/riscv/iommu-bits.h     | 704 +++++++++++++++++++++++++++
>  drivers/iommu/riscv/iommu-pci.c      | 134 +++++
>  drivers/iommu/riscv/iommu-platform.c |  94 ++++
>  drivers/iommu/riscv/iommu.c          | 660 +++++++++++++++++++++++++
>  drivers/iommu/riscv/iommu.h          | 115 +++++
>  9 files changed, 1732 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/iommu/riscv/Kconfig
>  create mode 100644 drivers/iommu/riscv/Makefile
>  create mode 100644 drivers/iommu/riscv/iommu-bits.h
>  create mode 100644 drivers/iommu/riscv/iommu-pci.c
>  create mode 100644 drivers/iommu/riscv/iommu-platform.c
>  create mode 100644 drivers/iommu/riscv/iommu.c
>  create mode 100644 drivers/iommu/riscv/iommu.h
>
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 2b12b583ef4b..36fcc6fd5b4e 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -187,6 +187,7 @@ config MSM_IOMMU
>  source "drivers/iommu/amd/Kconfig"
>  source "drivers/iommu/intel/Kconfig"
>  source "drivers/iommu/iommufd/Kconfig"
> +source "drivers/iommu/riscv/Kconfig"
>
>  config IRQ_REMAP
>         bool "Support for Interrupt Remapping"
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 769e43d780ce..8f57110a9fb1 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -1,5 +1,5 @@
>  # SPDX-License-Identifier: GPL-2.0
> -obj-y += amd/ intel/ arm/ iommufd/
> +obj-y += amd/ intel/ arm/ iommufd/ riscv/
>  obj-$(CONFIG_IOMMU_API) += iommu.o
>  obj-$(CONFIG_IOMMU_API) += iommu-traces.o
>  obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
> diff --git a/drivers/iommu/riscv/Kconfig b/drivers/iommu/riscv/Kconfig
> new file mode 100644
> index 000000000000..01d4043849d4
> --- /dev/null
> +++ b/drivers/iommu/riscv/Kconfig
> @@ -0,0 +1,22 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +# RISC-V IOMMU support
> +
> +config RISCV_IOMMU
> +       bool "RISC-V IOMMU driver"
> +       depends on RISCV
> +       select IOMMU_API
> +       select IOMMU_DMA
> +       select IOMMU_SVA
> +       select IOMMU_IOVA
> +       select IOMMU_IO_PGTABLE
> +       select IOASID
> +       select PCI_MSI
> +       select PCI_ATS
> +       select PCI_PRI
> +       select PCI_PASID
> +       select MMU_NOTIFIER
> +       help
> +         Support for devices following RISC-V IOMMU specification.
> +
> +         If unsure, say N here.
> +
> diff --git a/drivers/iommu/riscv/Makefile b/drivers/iommu/riscv/Makefile
> new file mode 100644
> index 000000000000..38730c11e4a8
> --- /dev/null
> +++ b/drivers/iommu/riscv/Makefile
> @@ -0,0 +1 @@
> +obj-$(CONFIG_RISCV_IOMMU) += iommu.o iommu-pci.o iommu-platform.o
> \ No newline at end of file
> diff --git a/drivers/iommu/riscv/iommu-bits.h b/drivers/iommu/riscv/iommu-bits.h
> new file mode 100644
> index 000000000000..b2946793a73d
> --- /dev/null
> +++ b/drivers/iommu/riscv/iommu-bits.h
> @@ -0,0 +1,704 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright © 2022-2023 Rivos Inc.
> + * Copyright © 2023 FORTH-ICS/CARV
> + * Copyright © 2023 RISC-V IOMMU Task Group
> + *
> + * RISC-V Ziommu - Register Layout and Data Structures.
> + *
> + * Based on the 'RISC-V IOMMU Architecture Specification', Version 1.0
> + * Published at  https://github.com/riscv-non-isa/riscv-iommu
> + *
> + */
> +
> +#ifndef _RISCV_IOMMU_BITS_H_
> +#define _RISCV_IOMMU_BITS_H_
> +
> +#include <linux/types.h>
> +#include <linux/bitfield.h>
> +#include <linux/bits.h>
> +
> +/*
> + * Chapter 5: Memory Mapped register interface
> + */
> +
> +/* Common field positions */
> +#define RISCV_IOMMU_PPN_FIELD          GENMASK_ULL(53, 10)
> +#define RISCV_IOMMU_QUEUE_LOGSZ_FIELD  GENMASK_ULL(4, 0)
> +#define RISCV_IOMMU_QUEUE_INDEX_FIELD  GENMASK_ULL(31, 0)
> +#define RISCV_IOMMU_QUEUE_ENABLE       BIT(0)
> +#define RISCV_IOMMU_QUEUE_INTR_ENABLE  BIT(1)
> +#define RISCV_IOMMU_QUEUE_MEM_FAULT    BIT(8)
> +#define RISCV_IOMMU_QUEUE_OVERFLOW     BIT(9)
> +#define RISCV_IOMMU_QUEUE_ACTIVE       BIT(16)
> +#define RISCV_IOMMU_QUEUE_BUSY         BIT(17)
> +
> +#define RISCV_IOMMU_ATP_PPN_FIELD      GENMASK_ULL(43, 0)
> +#define RISCV_IOMMU_ATP_MODE_FIELD     GENMASK_ULL(63, 60)
> +
> +/* 5.3 IOMMU Capabilities (64bits) */
> +#define RISCV_IOMMU_REG_CAP            0x0000
> +#define RISCV_IOMMU_CAP_VERSION                GENMASK_ULL(7, 0)
> +#define RISCV_IOMMU_CAP_S_SV32         BIT_ULL(8)
> +#define RISCV_IOMMU_CAP_S_SV39         BIT_ULL(9)
> +#define RISCV_IOMMU_CAP_S_SV48         BIT_ULL(10)
> +#define RISCV_IOMMU_CAP_S_SV57         BIT_ULL(11)
> +#define RISCV_IOMMU_CAP_SVPBMT         BIT_ULL(15)
> +#define RISCV_IOMMU_CAP_G_SV32         BIT_ULL(16)
> +#define RISCV_IOMMU_CAP_G_SV39         BIT_ULL(17)
> +#define RISCV_IOMMU_CAP_G_SV48         BIT_ULL(18)
> +#define RISCV_IOMMU_CAP_G_SV57         BIT_ULL(19)
> +#define RISCV_IOMMU_CAP_MSI_FLAT       BIT_ULL(22)
> +#define RISCV_IOMMU_CAP_MSI_MRIF       BIT_ULL(23)
> +#define RISCV_IOMMU_CAP_AMO            BIT_ULL(24)
> +#define RISCV_IOMMU_CAP_ATS            BIT_ULL(25)
> +#define RISCV_IOMMU_CAP_T2GPA          BIT_ULL(26)
> +#define RISCV_IOMMU_CAP_END            BIT_ULL(27)
> +#define RISCV_IOMMU_CAP_IGS            GENMASK_ULL(29, 28)
> +#define RISCV_IOMMU_CAP_HPM            BIT_ULL(30)
> +#define RISCV_IOMMU_CAP_DBG            BIT_ULL(31)
> +#define RISCV_IOMMU_CAP_PAS            GENMASK_ULL(37, 32)
> +#define RISCV_IOMMU_CAP_PD8            BIT_ULL(38)
> +#define RISCV_IOMMU_CAP_PD17           BIT_ULL(39)
> +#define RISCV_IOMMU_CAP_PD20           BIT_ULL(40)
> +
> +#define RISCV_IOMMU_CAP_VERSION_VER_MASK       0xF0
> +#define RISCV_IOMMU_CAP_VERSION_REV_MASK       0x0F
> +
> +/**
> + * enum riscv_iommu_igs_settings - Interrupt Generation Support Settings
> + * @RISCV_IOMMU_CAP_IGS_MSI: I/O MMU supports only MSI generation
> + * @RISCV_IOMMU_CAP_IGS_WSI: I/O MMU supports only Wired-Signaled interrupt
> + * @RISCV_IOMMU_CAP_IGS_BOTH: I/O MMU supports both MSI and WSI generation
> + * @RISCV_IOMMU_CAP_IGS_RSRV: Reserved for standard use
> + */
> +enum riscv_iommu_igs_settings {
> +       RISCV_IOMMU_CAP_IGS_MSI = 0,
> +       RISCV_IOMMU_CAP_IGS_WSI = 1,
> +       RISCV_IOMMU_CAP_IGS_BOTH = 2,
> +       RISCV_IOMMU_CAP_IGS_RSRV = 3
> +};
> +
> +/* 5.4 Features control register (32bits) */
> +#define RISCV_IOMMU_REG_FCTL           0x0008
> +#define RISCV_IOMMU_FCTL_BE            BIT(0)
> +#define RISCV_IOMMU_FCTL_WSI           BIT(1)
> +#define RISCV_IOMMU_FCTL_GXL           BIT(2)
> +
> +/* 5.5 Device-directory-table pointer (64bits) */
> +#define RISCV_IOMMU_REG_DDTP           0x0010
> +#define RISCV_IOMMU_DDTP_MODE          GENMASK_ULL(3, 0)
> +#define RISCV_IOMMU_DDTP_BUSY          BIT_ULL(4)
> +#define RISCV_IOMMU_DDTP_PPN           RISCV_IOMMU_PPN_FIELD
> +
> +/**
> + * enum riscv_iommu_ddtp_modes - I/O MMU translation modes
> + * @RISCV_IOMMU_DDTP_MODE_OFF: No inbound transactions allowed
> + * @RISCV_IOMMU_DDTP_MODE_BARE: Pass-through mode
> + * @RISCV_IOMMU_DDTP_MODE_1LVL: One-level DDT
> + * @RISCV_IOMMU_DDTP_MODE_2LVL: Two-level DDT
> + * @RISCV_IOMMU_DDTP_MODE_3LVL: Three-level DDT
> + */
> +enum riscv_iommu_ddtp_modes {
> +       RISCV_IOMMU_DDTP_MODE_OFF = 0,
> +       RISCV_IOMMU_DDTP_MODE_BARE = 1,
> +       RISCV_IOMMU_DDTP_MODE_1LVL = 2,
> +       RISCV_IOMMU_DDTP_MODE_2LVL = 3,
> +       RISCV_IOMMU_DDTP_MODE_3LVL = 4,
> +       RISCV_IOMMU_DDTP_MODE_MAX = 4
> +};
> +
> +/* 5.6 Command Queue Base (64bits) */
> +#define RISCV_IOMMU_REG_CQB            0x0018
> +#define RISCV_IOMMU_CQB_ENTRIES                RISCV_IOMMU_QUEUE_LOGSZ_FIELD
> +#define RISCV_IOMMU_CQB_PPN            RISCV_IOMMU_PPN_FIELD
> +
> +/* 5.7 Command Queue head (32bits) */
> +#define RISCV_IOMMU_REG_CQH            0x0020
> +#define RISCV_IOMMU_CQH_INDEX          RISCV_IOMMU_QUEUE_INDEX_FIELD
> +
> +/* 5.8 Command Queue tail (32bits) */
> +#define RISCV_IOMMU_REG_CQT            0x0024
> +#define RISCV_IOMMU_CQT_INDEX          RISCV_IOMMU_QUEUE_INDEX_FIELD
> +
> +/* 5.9 Fault Queue Base (64bits) */
> +#define RISCV_IOMMU_REG_FQB            0x0028
> +#define RISCV_IOMMU_FQB_ENTRIES                RISCV_IOMMU_QUEUE_LOGSZ_FIELD
> +#define RISCV_IOMMU_FQB_PPN            RISCV_IOMMU_PPN_FIELD
> +
> +/* 5.10 Fault Queue Head (32bits) */
> +#define RISCV_IOMMU_REG_FQH            0x0030
> +#define RISCV_IOMMU_FQH_INDEX          RISCV_IOMMU_QUEUE_INDEX_FIELD
> +
> +/* 5.11 Fault Queue tail (32bits) */
> +#define RISCV_IOMMU_REG_FQT            0x0034
> +#define RISCV_IOMMU_FQT_INDEX          RISCV_IOMMU_QUEUE_INDEX_FIELD
> +
> +/* 5.12 Page Request Queue base (64bits) */
> +#define RISCV_IOMMU_REG_PQB            0x0038
> +#define RISCV_IOMMU_PQB_ENTRIES                RISCV_IOMMU_QUEUE_LOGSZ_FIELD
> +#define RISCV_IOMMU_PQB_PPN            RISCV_IOMMU_PPN_FIELD
> +
> +/* 5.13 Page Request Queue head (32bits) */
> +#define RISCV_IOMMU_REG_PQH            0x0040
> +#define RISCV_IOMMU_PQH_INDEX          RISCV_IOMMU_QUEUE_INDEX_FIELD
> +
> +/* 5.14 Page Request Queue tail (32bits) */
> +#define RISCV_IOMMU_REG_PQT            0x0044
> +#define RISCV_IOMMU_PQT_INDEX_MASK     RISCV_IOMMU_QUEUE_INDEX_FIELD
> +
> +/* 5.15 Command Queue CSR (32bits) */
> +#define RISCV_IOMMU_REG_CQCSR          0x0048
> +#define RISCV_IOMMU_CQCSR_CQEN         RISCV_IOMMU_QUEUE_ENABLE
> +#define RISCV_IOMMU_CQCSR_CIE          RISCV_IOMMU_QUEUE_INTR_ENABLE
> +#define RISCV_IOMMU_CQCSR_CQMF         RISCV_IOMMU_QUEUE_MEM_FAULT
> +#define RISCV_IOMMU_CQCSR_CMD_TO       BIT(9)
> +#define RISCV_IOMMU_CQCSR_CMD_ILL      BIT(10)
> +#define RISCV_IOMMU_CQCSR_FENCE_W_IP   BIT(11)
> +#define RISCV_IOMMU_CQCSR_CQON         RISCV_IOMMU_QUEUE_ACTIVE
> +#define RISCV_IOMMU_CQCSR_BUSY         RISCV_IOMMU_QUEUE_BUSY
> +
> +/* 5.16 Fault Queue CSR (32bits) */
> +#define RISCV_IOMMU_REG_FQCSR          0x004C
> +#define RISCV_IOMMU_FQCSR_FQEN         RISCV_IOMMU_QUEUE_ENABLE
> +#define RISCV_IOMMU_FQCSR_FIE          RISCV_IOMMU_QUEUE_INTR_ENABLE
> +#define RISCV_IOMMU_FQCSR_FQMF         RISCV_IOMMU_QUEUE_MEM_FAULT
> +#define RISCV_IOMMU_FQCSR_FQOF         RISCV_IOMMU_QUEUE_OVERFLOW
> +#define RISCV_IOMMU_FQCSR_FQON         RISCV_IOMMU_QUEUE_ACTIVE
> +#define RISCV_IOMMU_FQCSR_BUSY         RISCV_IOMMU_QUEUE_BUSY
> +
> +/* 5.17 Page Request Queue CSR (32bits) */
> +#define RISCV_IOMMU_REG_PQCSR          0x0050
> +#define RISCV_IOMMU_PQCSR_PQEN         RISCV_IOMMU_QUEUE_ENABLE
> +#define RISCV_IOMMU_PQCSR_PIE          RISCV_IOMMU_QUEUE_INTR_ENABLE
> +#define RISCV_IOMMU_PQCSR_PQMF         RISCV_IOMMU_QUEUE_MEM_FAULT
> +#define RISCV_IOMMU_PQCSR_PQOF         RISCV_IOMMU_QUEUE_OVERFLOW
> +#define RISCV_IOMMU_PQCSR_PQON         RISCV_IOMMU_QUEUE_ACTIVE
> +#define RISCV_IOMMU_PQCSR_BUSY         RISCV_IOMMU_QUEUE_BUSY
> +
> +/* 5.18 Interrupt Pending Status (32bits) */
> +#define RISCV_IOMMU_REG_IPSR           0x0054
> +
> +#define RISCV_IOMMU_INTR_CQ            0
> +#define RISCV_IOMMU_INTR_FQ            1
> +#define RISCV_IOMMU_INTR_PM            2
> +#define RISCV_IOMMU_INTR_PQ            3
> +#define RISCV_IOMMU_INTR_COUNT         4
> +
> +#define RISCV_IOMMU_IPSR_CIP           BIT(RISCV_IOMMU_INTR_CQ)
> +#define RISCV_IOMMU_IPSR_FIP           BIT(RISCV_IOMMU_INTR_FQ)
> +#define RISCV_IOMMU_IPSR_PMIP          BIT(RISCV_IOMMU_INTR_PM)
> +#define RISCV_IOMMU_IPSR_PIP           BIT(RISCV_IOMMU_INTR_PQ)
> +
> +/* 5.19 Performance monitoring counter overflow status (32bits) */
> +#define RISCV_IOMMU_REG_IOCOUNTOVF     0x0058
> +#define RISCV_IOMMU_IOCOUNTOVF_CY      BIT(0)
> +#define RISCV_IOMMU_IOCOUNTOVF_HPM     GENMASK_ULL(31, 1)
> +
> +/* 5.20 Performance monitoring counter inhibits (32bits) */
> +#define RISCV_IOMMU_REG_IOCOUNTINH     0x005C
> +#define RISCV_IOMMU_IOCOUNTINH_CY      BIT(0)
> +#define RISCV_IOMMU_IOCOUNTINH_HPM     GENMASK(31, 1)
> +
> +/* 5.21 Performance monitoring cycles counter (64bits) */
> +#define RISCV_IOMMU_REG_IOHPMCYCLES     0x0060
> +#define RISCV_IOMMU_IOHPMCYCLES_COUNTER        GENMASK_ULL(62, 0)
> +#define RISCV_IOMMU_IOHPMCYCLES_OVF    BIT_ULL(63)
> +
> +/* 5.22 Performance monitoring event counters (31 * 64bits) */
> +#define RISCV_IOMMU_REG_IOHPMCTR_BASE  0x0068
> +#define RISCV_IOMMU_REG_IOHPMCTR(_n)   (RISCV_IOMMU_REG_IOHPMCTR_BASE + (_n * 0x8))
> +
> +/* 5.23 Performance monitoring event selectors (31 * 64bits) */
> +#define RISCV_IOMMU_REG_IOHPMEVT_BASE  0x0160
> +#define RISCV_IOMMU_REG_IOHPMEVT(_n)   (RISCV_IOMMU_REG_IOHPMEVT_BASE + (_n * 0x8))
> +#define RISCV_IOMMU_IOHPMEVT_EVENT_ID  GENMASK_ULL(14, 0)
> +#define RISCV_IOMMU_IOHPMEVT_DMASK     BIT_ULL(15)
> +#define RISCV_IOMMU_IOHPMEVT_PID_PSCID GENMASK_ULL(35, 16)
> +#define RISCV_IOMMU_IOHPMEVT_DID_GSCID GENMASK_ULL(59, 36)
> +#define RISCV_IOMMU_IOHPMEVT_PV_PSCV   BIT_ULL(60)
> +#define RISCV_IOMMU_IOHPMEVT_DV_GSCV   BIT_ULL(61)
> +#define RISCV_IOMMU_IOHPMEVT_IDT       BIT_ULL(62)
> +#define RISCV_IOMMU_IOHPMEVT_OF                BIT_ULL(63)
> +
> +/**
> + * enum riscv_iommu_hpmevent_id - Performance-monitoring event identifier
> + *
> + * @RISCV_IOMMU_HPMEVENT_INVALID: Invalid event, do not count
> + * @RISCV_IOMMU_HPMEVENT_URQ: Untranslated requests
> + * @RISCV_IOMMU_HPMEVENT_TRQ: Translated requests
> + * @RISCV_IOMMU_HPMEVENT_ATS_RQ: ATS translation requests
> + * @RISCV_IOMMU_HPMEVENT_TLB_MISS: TLB misses
> + * @RISCV_IOMMU_HPMEVENT_DD_WALK: Device directory walks
> + * @RISCV_IOMMU_HPMEVENT_PD_WALK: Process directory walks
> + * @RISCV_IOMMU_HPMEVENT_S_VS_WALKS: S/VS-Stage page table walks
> + * @RISCV_IOMMU_HPMEVENT_G_WALKS: G-Stage page table walks
> + * @RISCV_IOMMU_HPMEVENT_MAX: Value to denote maximum Event IDs
> + */
> +enum riscv_iommu_hpmevent_id {
> +       RISCV_IOMMU_HPMEVENT_INVALID    = 0,
> +       RISCV_IOMMU_HPMEVENT_URQ        = 1,
> +       RISCV_IOMMU_HPMEVENT_TRQ        = 2,
> +       RISCV_IOMMU_HPMEVENT_ATS_RQ     = 3,
> +       RISCV_IOMMU_HPMEVENT_TLB_MISS   = 4,
> +       RISCV_IOMMU_HPMEVENT_DD_WALK    = 5,
> +       RISCV_IOMMU_HPMEVENT_PD_WALK    = 6,
> +       RISCV_IOMMU_HPMEVENT_S_VS_WALKS = 7,
> +       RISCV_IOMMU_HPMEVENT_G_WALKS    = 8,
> +       RISCV_IOMMU_HPMEVENT_MAX        = 9
> +};
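
As an aside, programming one of the event counters with these fields is
a matter of composing the selector; e.g. counting TLB misses filtered by
a device ID might look like the sketch below (illustrative only; "devid"
is assumed, and riscv_iommu_writeq() is the driver's own MMIO helper):

	u64 evt = FIELD_PREP(RISCV_IOMMU_IOHPMEVT_EVENT_ID,
			     RISCV_IOMMU_HPMEVENT_TLB_MISS) |
		  FIELD_PREP(RISCV_IOMMU_IOHPMEVT_DID_GSCID, devid) |
		  RISCV_IOMMU_IOHPMEVT_DV_GSCV;
	riscv_iommu_writeq(iommu, RISCV_IOMMU_REG_IOHPMEVT(0), evt);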
> +
> +/* 5.24 Translation request IOVA (64bits) */
> +#define RISCV_IOMMU_REG_TR_REQ_IOVA     0x0258
> +#define RISCV_IOMMU_TR_REQ_IOVA_VPN    GENMASK_ULL(63, 12)
> +
> +/* 5.25 Translation request control (64bits) */
> +#define RISCV_IOMMU_REG_TR_REQ_CTL     0x0260
> +#define RISCV_IOMMU_TR_REQ_CTL_GO_BUSY BIT_ULL(0)
> +#define RISCV_IOMMU_TR_REQ_CTL_PRIV    BIT_ULL(1)
> +#define RISCV_IOMMU_TR_REQ_CTL_EXE     BIT_ULL(2)
> +#define RISCV_IOMMU_TR_REQ_CTL_NW      BIT_ULL(3)
> +#define RISCV_IOMMU_TR_REQ_CTL_PID     GENMASK_ULL(31, 12)
> +#define RISCV_IOMMU_TR_REQ_CTL_PV      BIT_ULL(32)
> +#define RISCV_IOMMU_TR_REQ_CTL_DID     GENMASK_ULL(63, 40)
> +
> +/* 5.26 Translation request response (64bits) */
> +#define RISCV_IOMMU_REG_TR_RESPONSE    0x0268
> +#define RISCV_IOMMU_TR_RESPONSE_FAULT  BIT_ULL(0)
> +#define RISCV_IOMMU_TR_RESPONSE_PBMT   GENMASK_ULL(8, 7)
> +#define RISCV_IOMMU_TR_RESPONSE_SZ     BIT_ULL(9)
> +#define RISCV_IOMMU_TR_RESPONSE_PPN    RISCV_IOMMU_PPN_FIELD
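
Taken together, registers 5.24 - 5.26 form the debug translation
interface; a rough polling sequence could look like this (a sketch,
assuming the driver's riscv_iommu_readq()/riscv_iommu_writeq() helpers
and a "devid" variable, not code from the patch):

	riscv_iommu_writeq(iommu, RISCV_IOMMU_REG_TR_REQ_IOVA,
			   iova & RISCV_IOMMU_TR_REQ_IOVA_VPN);
	riscv_iommu_writeq(iommu, RISCV_IOMMU_REG_TR_REQ_CTL,
			   FIELD_PREP(RISCV_IOMMU_TR_REQ_CTL_DID, devid) |
			   RISCV_IOMMU_TR_REQ_CTL_GO_BUSY);
	while (riscv_iommu_readq(iommu, RISCV_IOMMU_REG_TR_REQ_CTL) &
	       RISCV_IOMMU_TR_REQ_CTL_GO_BUSY)
		cpu_relax();
	resp = riscv_iommu_readq(iommu, RISCV_IOMMU_REG_TR_RESPONSE);
	if (!(resp & RISCV_IOMMU_TR_RESPONSE_FAULT))
		pa = FIELD_GET(RISCV_IOMMU_TR_RESPONSE_PPN, resp) << PAGE_SHIFT;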
> +
> +/* 5.27 Interrupt cause to vector (64bits) */
> +#define RISCV_IOMMU_REG_IVEC           0x02F8
> +#define RISCV_IOMMU_IVEC_CIV           GENMASK_ULL(3, 0)
> +#define RISCV_IOMMU_IVEC_FIV           GENMASK_ULL(7, 4)
> +#define RISCV_IOMMU_IVEC_PMIV          GENMASK_ULL(11, 8)
> +#define RISCV_IOMMU_IVEC_PIV           GENMASK_ULL(15, 12)
> +
> +/* 5.28 MSI Configuration table (32 * 64bits) */
> +#define RISCV_IOMMU_REG_MSI_CONFIG     0x0300
> +#define RISCV_IOMMU_REG_MSI_ADDR(_n)   (RISCV_IOMMU_REG_MSI_CONFIG + (_n * 0x10))
> +#define RISCV_IOMMU_MSI_ADDR           GENMASK_ULL(55, 2)
> +#define RISCV_IOMMU_REG_MSI_DATA(_n)   (RISCV_IOMMU_REG_MSI_CONFIG + (_n * 0x10) + 0x08)
> +#define RISCV_IOMMU_MSI_DATA           GENMASK_ULL(31, 0)
> +#define RISCV_IOMMU_REG_MSI_VEC_CTL(_n)        (RISCV_IOMMU_REG_MSI_CONFIG + (_n * 0x10) + 0x0C)
> +#define RISCV_IOMMU_MSI_VEC_CTL_M      BIT_ULL(0)
> +
> +#define RISCV_IOMMU_REG_SIZE   0x1000
> +
> +/*
> + * Chapter 2: Data structures
> + */
> +
> +/*
> + * Device Directory Table macros for non-leaf nodes
> + */
> +#define RISCV_IOMMU_DDTE_VALID BIT_ULL(0)
> +#define RISCV_IOMMU_DDTE_PPN   RISCV_IOMMU_PPN_FIELD
> +
> +/**
> + * struct riscv_iommu_dc - Device Context
> + * @tc: Translation Control
> + * @iohgatp: I/O Hypervisor guest address translation and protection
> + *          (Second stage context)
> + * @ta: Translation Attributes
> + * @fsc: First stage context
> + * @msiptp: MSI page table pointer
> + * @msi_addr_mask: MSI address mask
> + * @msi_addr_pattern: MSI address pattern
> + *
> + * This structure is used for leaf nodes on the Device Directory Table.
> + * In case RISCV_IOMMU_CAP_MSI_FLAT is not set, the bottom 4 fields are
> + * not present and are skipped with pointer arithmetic to avoid
> + * casting; check out riscv_iommu_get_dc().
> + * See section 2.1 for more details.
> + */
> +struct riscv_iommu_dc {
> +       u64 tc;
> +       u64 iohgatp;
> +       u64 ta;
> +       u64 fsc;
> +       u64 msiptp;
> +       u64 msi_addr_mask;
> +       u64 msi_addr_pattern;
> +       u64 _reserved;
> +};
> +
> +/* Translation control fields */
> +#define RISCV_IOMMU_DC_TC_V            BIT_ULL(0)
> +#define RISCV_IOMMU_DC_TC_EN_ATS       BIT_ULL(1)
> +#define RISCV_IOMMU_DC_TC_EN_PRI       BIT_ULL(2)
> +#define RISCV_IOMMU_DC_TC_T2GPA                BIT_ULL(3)
> +#define RISCV_IOMMU_DC_TC_DTF          BIT_ULL(4)
> +#define RISCV_IOMMU_DC_TC_PDTV         BIT_ULL(5)
> +#define RISCV_IOMMU_DC_TC_PRPR         BIT_ULL(6)
> +#define RISCV_IOMMU_DC_TC_GADE         BIT_ULL(7)
> +#define RISCV_IOMMU_DC_TC_SADE         BIT_ULL(8)
> +#define RISCV_IOMMU_DC_TC_DPE          BIT_ULL(9)
> +#define RISCV_IOMMU_DC_TC_SBE          BIT_ULL(10)
> +#define RISCV_IOMMU_DC_TC_SXL          BIT_ULL(11)
> +
> +/* Second-stage (aka G-stage) context fields */
> +#define RISCV_IOMMU_DC_IOHGATP_PPN     RISCV_IOMMU_ATP_PPN_FIELD
> +#define RISCV_IOMMU_DC_IOHGATP_GSCID   GENMASK_ULL(59, 44)
> +#define RISCV_IOMMU_DC_IOHGATP_MODE    RISCV_IOMMU_ATP_MODE_FIELD
> +
> +/**
> + * enum riscv_iommu_dc_iohgatp_modes - Guest address translation/protection modes
> + * @RISCV_IOMMU_DC_IOHGATP_MODE_BARE: No translation/protection
> + * @RISCV_IOMMU_DC_IOHGATP_MODE_SV32X4: Sv32x4 (2-bit extension of Sv32), when fctl.GXL == 1
> + * @RISCV_IOMMU_DC_IOHGATP_MODE_SV39X4: Sv39x4 (2-bit extension of Sv39), when fctl.GXL == 0
> + * @RISCV_IOMMU_DC_IOHGATP_MODE_SV48X4: Sv48x4 (2-bit extension of Sv48), when fctl.GXL == 0
> + * @RISCV_IOMMU_DC_IOHGATP_MODE_SV57X4: Sv57x4 (2-bit extension of Sv57), when fctl.GXL == 0
> + */
> +enum riscv_iommu_dc_iohgatp_modes {
> +       RISCV_IOMMU_DC_IOHGATP_MODE_BARE = 0,
> +       RISCV_IOMMU_DC_IOHGATP_MODE_SV32X4 = 8,
> +       RISCV_IOMMU_DC_IOHGATP_MODE_SV39X4 = 8,
> +       RISCV_IOMMU_DC_IOHGATP_MODE_SV48X4 = 9,
> +       RISCV_IOMMU_DC_IOHGATP_MODE_SV57X4 = 10
> +};
> +
> +/* Translation attributes fields */
> +#define RISCV_IOMMU_DC_TA_PSCID                GENMASK_ULL(31, 12)
> +
> +/* First-stage context fields */
> +#define RISCV_IOMMU_DC_FSC_PPN         RISCV_IOMMU_ATP_PPN_FIELD
> +#define RISCV_IOMMU_DC_FSC_MODE                RISCV_IOMMU_ATP_MODE_FIELD
> +
> +/**
> + * enum riscv_iommu_dc_fsc_atp_modes - First stage address translation/protection modes
> + * @RISCV_IOMMU_DC_FSC_MODE_BARE: No translation/protection
> + * @RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV32: Sv32, when dc.tc.SXL == 1
> + * @RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39: Sv39, when dc.tc.SXL == 0
> + * @RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV48: Sv48, when dc.tc.SXL == 0
> + * @RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV57: Sv57, when dc.tc.SXL == 0
> + * @RISCV_IOMMU_DC_FSC_PDTP_MODE_PD8: 1lvl PDT, 8bit process ids
> + * @RISCV_IOMMU_DC_FSC_PDTP_MODE_PD17: 2lvl PDT, 17bit process ids
> + * @RISCV_IOMMU_DC_FSC_PDTP_MODE_PD20: 3lvl PDT, 20bit process ids
> + *
> + * FSC holds IOSATP when RISCV_IOMMU_DC_TC_PDTV is 0 and PDTP otherwise.
> + * IOSATP controls the first stage address translation (same as the satp register on
> + * the RISC-V MMU), and PDTP holds the process directory table, used to select a
> + * first stage page table based on a process id (for devices that support multiple
> + * process ids).
> + */
> +enum riscv_iommu_dc_fsc_atp_modes {
> +       RISCV_IOMMU_DC_FSC_MODE_BARE = 0,
> +       RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV32 = 8,
> +       RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39 = 8,
> +       RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV48 = 9,
> +       RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV57 = 10,
> +       RISCV_IOMMU_DC_FSC_PDTP_MODE_PD8 = 1,
> +       RISCV_IOMMU_DC_FSC_PDTP_MODE_PD17 = 2,
> +       RISCV_IOMMU_DC_FSC_PDTP_MODE_PD20 = 3
> +};
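
As an illustration of how these mode values combine with the PPN field
(a sketch, not code from the patch; "pgd_pa" is an assumed page table
root physical address), an Sv39 first-stage fsc could be composed as:

	u64 fsc = FIELD_PREP(RISCV_IOMMU_DC_FSC_MODE,
			     RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39) |
		  FIELD_PREP(RISCV_IOMMU_DC_FSC_PPN, pgd_pa >> PAGE_SHIFT);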
> +
> +/* MSI page table pointer */
> +#define RISCV_IOMMU_DC_MSIPTP_PPN      RISCV_IOMMU_ATP_PPN_FIELD
> +#define RISCV_IOMMU_DC_MSIPTP_MODE     RISCV_IOMMU_ATP_MODE_FIELD
> +#define RISCV_IOMMU_DC_MSIPTP_MODE_OFF 0
> +#define RISCV_IOMMU_DC_MSIPTP_MODE_FLAT        1
> +
> +/* MSI address mask */
> +#define RISCV_IOMMU_DC_MSI_ADDR_MASK   GENMASK_ULL(51, 0)
> +
> +/* MSI address pattern */
> +#define RISCV_IOMMU_DC_MSI_PATTERN     GENMASK_ULL(51, 0)
> +
> +/**
> + * struct riscv_iommu_pc - Process Context
> + * @ta: Translation Attributes
> + * @fsc: First stage context
> + *
> + * This structure is used for leaf nodes on the Process Directory Table.
> + * See section 2.3 for more details.
> + */
> +struct riscv_iommu_pc {
> +       u64 ta;
> +       u64 fsc;
> +};
> +
> +/* Translation attributes fields */
> +#define RISCV_IOMMU_PC_TA_V    BIT_ULL(0)
> +#define RISCV_IOMMU_PC_TA_ENS  BIT_ULL(1)
> +#define RISCV_IOMMU_PC_TA_SUM  BIT_ULL(2)
> +#define RISCV_IOMMU_PC_TA_PSCID        GENMASK_ULL(31, 12)
> +
> +/* First stage context fields */
> +#define RISCV_IOMMU_PC_FSC_PPN RISCV_IOMMU_ATP_PPN_FIELD
> +#define RISCV_IOMMU_PC_FSC_MODE        RISCV_IOMMU_ATP_MODE_FIELD
> +
> +/*
> + * Chapter 3: In-memory queue interface
> + */
> +
> +/**
> + * struct riscv_iommu_command - Generic I/O MMU command structure
> + * @dword0: Includes the opcode and the function identifier
> + * @dword1: Opcode specific data
> + *
> + * The commands are interpreted as two 64-bit fields, where the first
> + * 7 bits of the first field are the opcode, which also defines the
> + * command's format, followed by a 3-bit field that specifies the
> + * function invoked by that command; the rest is opcode-specific.
> + * This is a generic struct which will be populated differently
> + * according to each command. For more info on the commands and
> + * the command queue, check section 3.1.
> + */
> +struct riscv_iommu_command {
> +       u64 dword0;
> +       u64 dword1;
> +};
> +
> +/* Fields on dword0, common for all commands */
> +#define RISCV_IOMMU_CMD_OPCODE GENMASK_ULL(6, 0)
> +#define        RISCV_IOMMU_CMD_FUNC    GENMASK_ULL(9, 7)
> +
> +/* 3.1.1 I/O MMU Page-table cache invalidation */
> +/* Fields on dword0 */
> +#define RISCV_IOMMU_CMD_IOTINVAL_OPCODE                1
> +#define RISCV_IOMMU_CMD_IOTINVAL_FUNC_VMA      0
> +#define RISCV_IOMMU_CMD_IOTINVAL_FUNC_GVMA     1
> +#define RISCV_IOMMU_CMD_IOTINVAL_AV            BIT_ULL(10)
> +#define RISCV_IOMMU_CMD_IOTINVAL_PSCID         GENMASK_ULL(31, 12)
> +#define RISCV_IOMMU_CMD_IOTINVAL_PSCV          BIT_ULL(32)
> +#define RISCV_IOMMU_CMD_IOTINVAL_GV            BIT_ULL(33)
> +#define RISCV_IOMMU_CMD_IOTINVAL_GSCID         GENMASK_ULL(59, 44)
> +/*
> + * dword1 is the address, 4K-aligned and shifted to the right by
> + * two bits.
> + */
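
For example, composing an IOTINVAL.VMA for a given PSCID and IOVA with
the fields above might look like this (an illustrative sketch; "pscid"
and "iova" are assumed variables):

	struct riscv_iommu_command cmd = {
		.dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE,
				     RISCV_IOMMU_CMD_IOTINVAL_OPCODE) |
			  FIELD_PREP(RISCV_IOMMU_CMD_FUNC,
				     RISCV_IOMMU_CMD_IOTINVAL_FUNC_VMA) |
			  RISCV_IOMMU_CMD_IOTINVAL_AV |
			  RISCV_IOMMU_CMD_IOTINVAL_PSCV |
			  FIELD_PREP(RISCV_IOMMU_CMD_IOTINVAL_PSCID, pscid),
		.dword1 = iova >> 2,	/* 4K-aligned address, shifted right by two */
	};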
> +
> +/* 3.1.2 I/O MMU Command Queue Fences */
> +/* Fields on dword0 */
> +#define RISCV_IOMMU_CMD_IOFENCE_OPCODE         2
> +#define RISCV_IOMMU_CMD_IOFENCE_FUNC_C         0
> +#define RISCV_IOMMU_CMD_IOFENCE_AV             BIT_ULL(10)
> +#define RISCV_IOMMU_CMD_IOFENCE_WSI            BIT_ULL(11)
> +#define RISCV_IOMMU_CMD_IOFENCE_PR             BIT_ULL(12)
> +#define RISCV_IOMMU_CMD_IOFENCE_PW             BIT_ULL(13)
> +#define RISCV_IOMMU_CMD_IOFENCE_DATA           GENMASK_ULL(63, 32)
> +/*
> + * dword1 is the address, word-size aligned and shifted to the
> + * right by two bits.
> + */
> +
> +/* 3.1.3 I/O MMU Directory cache invalidation */
> +/* Fields on dword0 */
> +#define RISCV_IOMMU_CMD_IODIR_OPCODE           3
> +#define RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_DDT   0
> +#define RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_PDT   1
> +#define RISCV_IOMMU_CMD_IODIR_PID              GENMASK_ULL(31, 12)
> +#define RISCV_IOMMU_CMD_IODIR_DV               BIT_ULL(33)
> +#define RISCV_IOMMU_CMD_IODIR_DID              GENMASK_ULL(63, 40)
> +/* dword1 is reserved for standard use */
> +
> +/* 3.1.4 I/O MMU PCIe ATS */
> +/* Fields on dword0 */
> +#define RISCV_IOMMU_CMD_ATS_OPCODE             4
> +#define RISCV_IOMMU_CMD_ATS_FUNC_INVAL         0
> +#define RISCV_IOMMU_CMD_ATS_FUNC_PRGR          1
> +#define RISCV_IOMMU_CMD_ATS_PID                        GENMASK_ULL(31, 12)
> +#define RISCV_IOMMU_CMD_ATS_PV                 BIT_ULL(32)
> +#define RISCV_IOMMU_CMD_ATS_DSV                        BIT_ULL(33)
> +#define RISCV_IOMMU_CMD_ATS_RID                        GENMASK_ULL(55, 40)
> +#define RISCV_IOMMU_CMD_ATS_DSEG               GENMASK_ULL(63, 56)
> +/* dword1 is the ATS payload, two different payload types for INVAL and PRGR */
> +
> +/* ATS.INVAL payload*/
> +#define RISCV_IOMMU_CMD_ATS_INVAL_G            BIT_ULL(0)
> +/* Bits 1 - 10 are zeroed */
> +#define RISCV_IOMMU_CMD_ATS_INVAL_S            BIT_ULL(11)
> +#define RISCV_IOMMU_CMD_ATS_INVAL_UADDR                GENMASK_ULL(63, 12)
> +
> +/* ATS.PRGR payload */
> +/* Bits 0 - 31 are zeroed */
> +#define RISCV_IOMMU_CMD_ATS_PRGR_PRG_INDEX     GENMASK_ULL(40, 32)
> +/* Bits 41 - 43 are zeroed */
> +#define RISCV_IOMMU_CMD_ATS_PRGR_RESP_CODE     GENMASK_ULL(47, 44)
> +#define RISCV_IOMMU_CMD_ATS_PRGR_DST_ID                GENMASK_ULL(63, 48)
> +
> +/**
> + * struct riscv_iommu_fq_record - Fault/Event Queue Record
> + * @hdr: Header, includes fault/event cause, PID/DID, transaction type, etc.
> + * @_reserved: Low 32bits for custom use, high 32bits for standard use
> + * @iotval: Transaction-type/cause specific format
> + * @iotval2: Cause specific format
> + *
> + * The fault/event queue reports events and failures raised when
> + * processing transactions. Each record is a 32-byte structure where
> + * the first dword has a fixed format providing generic information
> + * regarding the fault/event, and two more dwords are there for
> + * fault/event-specific information. For more details see section
> + * 3.2.
> + */
> +struct riscv_iommu_fq_record {
> +       u64 hdr;
> +       u64 _reserved;
> +       u64 iotval;
> +       u64 iotval2;
> +};
> +
> +/* Fields on header */
> +#define RISCV_IOMMU_FQ_HDR_CAUSE       GENMASK_ULL(11, 0)
> +#define RISCV_IOMMU_FQ_HDR_PID         GENMASK_ULL(31, 12)
> +#define RISCV_IOMMU_FQ_HDR_PV          BIT_ULL(32)
> +#define RISCV_IOMMU_FQ_HDR_PRIV                BIT_ULL(33)
> +#define RISCV_IOMMU_FQ_HDR_TTYPE       GENMASK_ULL(39, 34)
> +#define RISCV_IOMMU_FQ_HDR_DID         GENMASK_ULL(63, 40)
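
Decoding a dequeued record with these fields is then just FIELD_GET(),
e.g. (a sketch; "rec" is an assumed pointer to a fetched record):

	u64 cause = FIELD_GET(RISCV_IOMMU_FQ_HDR_CAUSE, rec->hdr);
	u64 devid = FIELD_GET(RISCV_IOMMU_FQ_HDR_DID, rec->hdr);
	u64 ttype = FIELD_GET(RISCV_IOMMU_FQ_HDR_TTYPE, rec->hdr);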
> +
> +/**
> + * enum riscv_iommu_fq_causes - Fault/event cause values
> + * @RISCV_IOMMU_FQ_CAUSE_INST_FAULT: Instruction access fault
> + * @RISCV_IOMMU_FQ_CAUSE_RD_ADDR_MISALIGNED: Read address misaligned
> + * @RISCV_IOMMU_FQ_CAUSE_RD_FAULT: Read load fault
> + * @RISCV_IOMMU_FQ_CAUSE_WR_ADDR_MISALIGNED: Write/AMO address misaligned
> + * @RISCV_IOMMU_FQ_CAUSE_WR_FAULT: Write/AMO access fault
> + * @RISCV_IOMMU_FQ_CAUSE_INST_FAULT_S: Instruction page fault
> + * @RISCV_IOMMU_FQ_CAUSE_RD_FAULT_S: Read page fault
> + * @RISCV_IOMMU_FQ_CAUSE_WR_FAULT_S: Write/AMO page fault
> + * @RISCV_IOMMU_FQ_CAUSE_INST_FAULT_VS: Instruction guest page fault
> + * @RISCV_IOMMU_FQ_CAUSE_RD_FAULT_VS: Read guest page fault
> + * @RISCV_IOMMU_FQ_CAUSE_WR_FAULT_VS: Write/AMO guest page fault
> + * @RISCV_IOMMU_FQ_CAUSE_DMA_DISABLED: All inbound transactions disallowed
> + * @RISCV_IOMMU_FQ_CAUSE_DDT_LOAD_FAULT: DDT entry load access fault
> + * @RISCV_IOMMU_FQ_CAUSE_DDT_INVALID: DDT entry invalid
> + * @RISCV_IOMMU_FQ_CAUSE_DDT_MISCONFIGURED: DDT entry misconfigured
> + * @RISCV_IOMMU_FQ_CAUSE_TTYPE_BLOCKED: Transaction type disallowed
> + * @RISCV_IOMMU_FQ_CAUSE_MSI_LOAD_FAULT: MSI PTE load access fault
> + * @RISCV_IOMMU_FQ_CAUSE_MSI_INVALID: MSI PTE invalid
> + * @RISCV_IOMMU_FQ_CAUSE_MSI_MISCONFIGURED: MSI PTE misconfigured
> + * @RISCV_IOMMU_FQ_CAUSE_MRIF_FAULT: MRIF access fault
> + * @RISCV_IOMMU_FQ_CAUSE_PDT_LOAD_FAULT: PDT entry load access fault
> + * @RISCV_IOMMU_FQ_CAUSE_PDT_INVALID: PDT entry invalid
> + * @RISCV_IOMMU_FQ_CAUSE_PDT_MISCONFIGURED: PDT entry misconfigured
> + * @RISCV_IOMMU_FQ_CAUSE_DDT_CORRUPTED: DDT data corruption
> + * @RISCV_IOMMU_FQ_CAUSE_PDT_CORRUPTED: PDT data corruption
> + * @RISCV_IOMMU_FQ_CAUSE_MSI_PT_CORRUPTED: MSI page table data corruption
> + * @RISCV_IOMMU_FQ_CAUSE_MRIF_CORRUPTED: MRIF data corruption
> + * @RISCV_IOMMU_FQ_CAUSE_INTERNAL_DP_ERROR: Internal data path error
> + * @RISCV_IOMMU_FQ_CAUSE_MSI_WR_FAULT: IOMMU MSI write access fault
> + * @RISCV_IOMMU_FQ_CAUSE_PT_CORRUPTED: First/second stage page table data corruption
> + *
> + * Values are on table 11 of the spec; encodings 275 - 2047 are reserved for standard
> + * use, and 2048 - 4095 for custom use.
> + */
> +enum riscv_iommu_fq_causes {
> +       RISCV_IOMMU_FQ_CAUSE_INST_FAULT = 1,
> +       RISCV_IOMMU_FQ_CAUSE_RD_ADDR_MISALIGNED = 4,
> +       RISCV_IOMMU_FQ_CAUSE_RD_FAULT = 5,
> +       RISCV_IOMMU_FQ_CAUSE_WR_ADDR_MISALIGNED = 6,
> +       RISCV_IOMMU_FQ_CAUSE_WR_FAULT = 7,
> +       RISCV_IOMMU_FQ_CAUSE_INST_FAULT_S = 12,
> +       RISCV_IOMMU_FQ_CAUSE_RD_FAULT_S = 13,
> +       RISCV_IOMMU_FQ_CAUSE_WR_FAULT_S = 15,
> +       RISCV_IOMMU_FQ_CAUSE_INST_FAULT_VS = 20,
> +       RISCV_IOMMU_FQ_CAUSE_RD_FAULT_VS = 21,
> +       RISCV_IOMMU_FQ_CAUSE_WR_FAULT_VS = 23,
> +       RISCV_IOMMU_FQ_CAUSE_DMA_DISABLED = 256,
> +       RISCV_IOMMU_FQ_CAUSE_DDT_LOAD_FAULT = 257,
> +       RISCV_IOMMU_FQ_CAUSE_DDT_INVALID = 258,
> +       RISCV_IOMMU_FQ_CAUSE_DDT_MISCONFIGURED = 259,
> +       RISCV_IOMMU_FQ_CAUSE_TTYPE_BLOCKED = 260,
> +       RISCV_IOMMU_FQ_CAUSE_MSI_LOAD_FAULT = 261,
> +       RISCV_IOMMU_FQ_CAUSE_MSI_INVALID = 262,
> +       RISCV_IOMMU_FQ_CAUSE_MSI_MISCONFIGURED = 263,
> +       RISCV_IOMMU_FQ_CAUSE_MRIF_FAULT = 264,
> +       RISCV_IOMMU_FQ_CAUSE_PDT_LOAD_FAULT = 265,
> +       RISCV_IOMMU_FQ_CAUSE_PDT_INVALID = 266,
> +       RISCV_IOMMU_FQ_CAUSE_PDT_MISCONFIGURED = 267,
> +       RISCV_IOMMU_FQ_CAUSE_DDT_CORRUPTED = 268,
> +       RISCV_IOMMU_FQ_CAUSE_PDT_CORRUPTED = 269,
> +       RISCV_IOMMU_FQ_CAUSE_MSI_PT_CORRUPTED = 270,
> +       RISCV_IOMMU_FQ_CAUSE_MRIF_CORRUPTED = 271,
> +       RISCV_IOMMU_FQ_CAUSE_INTERNAL_DP_ERROR = 272,
> +       RISCV_IOMMU_FQ_CAUSE_MSI_WR_FAULT = 273,
> +       RISCV_IOMMU_FQ_CAUSE_PT_CORRUPTED = 274
> +};
> +
> +/**
> + * enum riscv_iommu_fq_ttypes - Fault/event transaction types
> + * @RISCV_IOMMU_FQ_TTYPE_NONE: None. Fault not caused by an inbound transaction.
> + * @RISCV_IOMMU_FQ_TTYPE_UADDR_INST_FETCH: Instruction fetch from untranslated address
> + * @RISCV_IOMMU_FQ_TTYPE_UADDR_RD: Read from untranslated address
> + * @RISCV_IOMMU_FQ_TTYPE_UADDR_WR: Write/AMO to untranslated address
> + * @RISCV_IOMMU_FQ_TTYPE_TADDR_INST_FETCH: Instruction fetch from translated address
> + * @RISCV_IOMMU_FQ_TTYPE_TADDR_RD: Read from translated address
> + * @RISCV_IOMMU_FQ_TTYPE_TADDR_WR: Write/AMO to translated address
> + * @RISCV_IOMMU_FQ_TTYPE_PCIE_ATS_REQ: PCIe ATS translation request
> + * @RISCV_IOMMU_FQ_TTYPE_PCIE_MSG_REQ: PCIe message request
> + *
> + * Values are on table 12 of the spec; types 4 and 10 - 31 are reserved for standard
> + * use and 32 - 63 for custom use.
> + */
> +enum riscv_iommu_fq_ttypes {
> +       RISCV_IOMMU_FQ_TTYPE_NONE = 0,
> +       RISCV_IOMMU_FQ_TTYPE_UADDR_INST_FETCH = 1,
> +       RISCV_IOMMU_FQ_TTYPE_UADDR_RD = 2,
> +       RISCV_IOMMU_FQ_TTYPE_UADDR_WR = 3,
> +       RISCV_IOMMU_FQ_TTYPE_TADDR_INST_FETCH = 5,
> +       RISCV_IOMMU_FQ_TTYPE_TADDR_RD = 6,
> +       RISCV_IOMMU_FQ_TTYPE_TADDR_WR = 7,
> +       RISCV_IOMMU_FQ_TTYPE_PCIE_ATS_REQ = 8,
> +       RISCV_IOMMU_FQ_TTYPE_PCIE_MSG_REQ = 9,
> +};
> +
> +/**
> + * struct riscv_iommu_pq_record - PCIe Page Request record
> + * @hdr: Header, includes PID, DID etc
> + * @payload: Holds the page address, request group and permission bits
> + *
> + * For more info on the PCIe Page Request queue, see section 3.3.
> + */
> +struct riscv_iommu_pq_record {
> +       u64 hdr;
> +       u64 payload;
> +};
> +
> +/* Header fields */
> +#define RISCV_IOMMU_PREQ_HDR_PID       GENMASK_ULL(31, 12)
> +#define RISCV_IOMMU_PREQ_HDR_PV                BIT_ULL(32)
> +#define RISCV_IOMMU_PREQ_HDR_PRIV      BIT_ULL(33)
> +#define RISCV_IOMMU_PREQ_HDR_EXEC      BIT_ULL(34)
> +#define RISCV_IOMMU_PREQ_HDR_DID       GENMASK_ULL(63, 40)
> +
> +/* Payload fields */
> +#define RISCV_IOMMU_PREQ_PAYLOAD_R     BIT_ULL(0)
> +#define RISCV_IOMMU_PREQ_PAYLOAD_W     BIT_ULL(1)
> +#define RISCV_IOMMU_PREQ_PAYLOAD_L     BIT_ULL(2)
> +#define RISCV_IOMMU_PREQ_PAYLOAD_M     GENMASK_ULL(2, 0)       /* Mask of RWL for convenience */
> +#define RISCV_IOMMU_PREQ_PRG_INDEX     GENMASK_ULL(11, 3)
> +#define RISCV_IOMMU_PREQ_UADDR         GENMASK_ULL(63, 12)
> +
> +/**
> + * struct riscv_iommu_msi_pte - MSI Page Table Entry
> + * @pte: MSI PTE
> + * @mrif_info: Memory-resident interrupt file info
> + *
> + * The MSI Page Table is used for virtualizing MSIs, so that when
> + * a device sends an MSI to a guest, the IOMMU can reroute it
> + * by translating the MSI address, either to a guest interrupt file
> + * or a memory-resident interrupt file (MRIF). Note that this page table
> + * is an array of MSI PTEs, not a multi-level page table; each entry
> + * is a leaf entry. For more info check out the AIA spec, chapter 9.5.
> + *
> + * Also, in basic mode the mrif_info field is ignored by the IOMMU and
> + * can be used by software; any other reserved fields in the pte must be
> + * zeroed out by software.
> + */
> +struct riscv_iommu_msi_pte {
> +       u64 pte;
> +       u64 mrif_info;
> +};
> +
> +/* Fields on pte */
> +#define RISCV_IOMMU_MSI_PTE_V          BIT_ULL(0)
> +#define RISCV_IOMMU_MSI_PTE_M          GENMASK_ULL(2, 1)
> +#define RISCV_IOMMU_MSI_PTE_MRIF_ADDR  GENMASK_ULL(53, 7)      /* When M == 1 (MRIF mode) */
> +#define RISCV_IOMMU_MSI_PTE_PPN                RISCV_IOMMU_PPN_FIELD   /* When M == 3 (basic mode) */
> +#define RISCV_IOMMU_MSI_PTE_C          BIT_ULL(63)
> +
> +/* Fields on mrif_info */
> +#define RISCV_IOMMU_MSI_MRIF_NID       GENMASK_ULL(9, 0)
> +#define RISCV_IOMMU_MSI_MRIF_NPPN      RISCV_IOMMU_PPN_FIELD
> +#define RISCV_IOMMU_MSI_MRIF_NID_MSB   BIT_ULL(60)
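
For what it's worth, a valid basic-mode (M == 3) entry pointing an MSI
at a guest interrupt file page could be built like this (a sketch;
"imsic_pa" is an assumed physical address, not from the patch):

	struct riscv_iommu_msi_pte mpte = {
		.pte = RISCV_IOMMU_MSI_PTE_V |
		       FIELD_PREP(RISCV_IOMMU_MSI_PTE_M, 3) |
		       FIELD_PREP(RISCV_IOMMU_MSI_PTE_PPN, imsic_pa >> PAGE_SHIFT),
	};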
> +
> +#endif /* _RISCV_IOMMU_BITS_H_ */
> diff --git a/drivers/iommu/riscv/iommu-pci.c b/drivers/iommu/riscv/iommu-pci.c
> new file mode 100644
> index 000000000000..c91f963d7a29
> --- /dev/null
> +++ b/drivers/iommu/riscv/iommu-pci.c
> @@ -0,0 +1,134 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +
> +/*
> + * Copyright © 2022-2023 Rivos Inc.
> + * Copyright © 2023 FORTH-ICS/CARV
> + *
> + * RISCV IOMMU as a PCIe device
> + *
> + * Authors
> + *     Tomasz Jeznach <tjeznach@rivosinc.com>
> + *     Nick Kossifidis <mick@ics.forth.gr>
> + */
> +
> +#include <linux/module.h>
> +#include <linux/kernel.h>
> +#include <linux/compiler.h>
> +#include <linux/pci.h>
> +#include <linux/init.h>
> +#include <linux/iommu.h>
> +#include <linux/bitfield.h>
> +
> +#include "iommu.h"
> +
> +/* Rivos Inc. assigned PCI Vendor and Device IDs */
> +#ifndef PCI_VENDOR_ID_RIVOS
> +#define PCI_VENDOR_ID_RIVOS             0x1efd
> +#endif
> +
> +#ifndef PCI_DEVICE_ID_RIVOS_IOMMU
> +#define PCI_DEVICE_ID_RIVOS_IOMMU       0xedf1
> +#endif
> +
> +static int riscv_iommu_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> +{
> +       struct device *dev = &pdev->dev;
> +       struct riscv_iommu_device *iommu;
> +       int ret;
> +
> +       ret = pci_enable_device_mem(pdev);
> +       if (ret < 0)
> +               return ret;
> +
> +       ret = pci_request_mem_regions(pdev, KBUILD_MODNAME);
> +       if (ret < 0)
> +               goto fail;
> +
> +       ret = -ENOMEM;
> +
> +       iommu = devm_kzalloc(dev, sizeof(*iommu), GFP_KERNEL);
> +       if (!iommu)
> +               goto fail;
> +
> +       if (!(pci_resource_flags(pdev, 0) & IORESOURCE_MEM))
> +               goto fail;
> +
> +       if (pci_resource_len(pdev, 0) < RISCV_IOMMU_REG_SIZE)
> +               goto fail;
> +
> +       iommu->reg_phys = pci_resource_start(pdev, 0);
> +       if (!iommu->reg_phys)
> +               goto fail;
> +
> +       iommu->reg = devm_ioremap(dev, iommu->reg_phys, RISCV_IOMMU_REG_SIZE);
> +       if (!iommu->reg)
> +               goto fail;
> +
> +       iommu->dev = dev;
> +       dev_set_drvdata(dev, iommu);
> +
> +       dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
> +       pci_set_master(pdev);
> +
> +       ret = riscv_iommu_init(iommu);
> +       if (!ret)
> +               return ret;
> +
> + fail:
> +       pci_clear_master(pdev);
> +       pci_release_regions(pdev);
> +       pci_disable_device(pdev);
> +       /* Note: devres_release_all() will release iommu and iommu->reg */
> +       return ret;
> +}
> +
> +static void riscv_iommu_pci_remove(struct pci_dev *pdev)
> +{
> +       riscv_iommu_remove(dev_get_drvdata(&pdev->dev));
> +       pci_clear_master(pdev);
> +       pci_release_regions(pdev);
> +       pci_disable_device(pdev);
> +}
> +
> +static int riscv_iommu_suspend(struct device *dev)
> +{
> +       dev_warn(dev, "RISC-V IOMMU PM not implemented");
> +       return -ENODEV;
> +}
> +
> +static int riscv_iommu_resume(struct device *dev)
> +{
> +       dev_warn(dev, "RISC-V IOMMU PM not implemented");
> +       return -ENODEV;
> +}
> +
> +static DEFINE_SIMPLE_DEV_PM_OPS(riscv_iommu_pm_ops, riscv_iommu_suspend,
> +                               riscv_iommu_resume);
> +
> +static const struct pci_device_id riscv_iommu_pci_tbl[] = {
> +       {PCI_VENDOR_ID_RIVOS, PCI_DEVICE_ID_RIVOS_IOMMU,
> +        PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0},
> +       {0,}
> +};
> +
> +MODULE_DEVICE_TABLE(pci, riscv_iommu_pci_tbl);
> +
> +static const struct of_device_id riscv_iommu_of_match[] = {
> +       {.compatible = "riscv,pci-iommu",},
> +       {},
> +};
> +
> +MODULE_DEVICE_TABLE(of, riscv_iommu_of_match);
> +
> +static struct pci_driver riscv_iommu_pci_driver = {
> +       .name = KBUILD_MODNAME,
> +       .id_table = riscv_iommu_pci_tbl,
> +       .probe = riscv_iommu_pci_probe,
> +       .remove = riscv_iommu_pci_remove,
> +       .driver = {
> +                  .pm = pm_sleep_ptr(&riscv_iommu_pm_ops),
> +                  .of_match_table = riscv_iommu_of_match,
> +                  },
> +};
> +
> +module_driver(riscv_iommu_pci_driver, pci_register_driver, pci_unregister_driver);
> diff --git a/drivers/iommu/riscv/iommu-platform.c b/drivers/iommu/riscv/iommu-platform.c
> new file mode 100644
> index 000000000000..e4e8ca6711e7
> --- /dev/null
> +++ b/drivers/iommu/riscv/iommu-platform.c
> @@ -0,0 +1,94 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * RISC-V IOMMU as a platform device
> + *
> + * Copyright © 2023 FORTH-ICS/CARV
> + *
> + * Author: Nick Kossifidis <mick@ics.forth.gr>
> + */
> +
> +#include <linux/module.h>
> +#include <linux/kernel.h>
> +#include <linux/of_platform.h>
> +#include <linux/bitfield.h>
> +
> +#include "iommu-bits.h"
> +#include "iommu.h"
> +
> +static int riscv_iommu_platform_probe(struct platform_device *pdev)
> +{
> +       struct device *dev = &pdev->dev;
> +       struct riscv_iommu_device *iommu = NULL;
> +       struct resource *res = NULL;
> +       int ret = 0;
> +
> +       iommu = devm_kzalloc(dev, sizeof(*iommu), GFP_KERNEL);
> +       if (!iommu)
> +               return -ENOMEM;
> +
> +       iommu->dev = dev;
> +       dev_set_drvdata(dev, iommu);
> +
> +       res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> +       if (!res) {
> +               dev_err(dev, "could not find resource for register region\n");
> +               return -EINVAL;
> +       }
> +
> +       iommu->reg = devm_platform_get_and_ioremap_resource(pdev, 0, &res);
> +       if (IS_ERR(iommu->reg)) {
> +               ret = dev_err_probe(dev, PTR_ERR(iommu->reg),
> +                                   "could not map register region\n");
> +               goto fail;
> +       }
> +
> +       iommu->reg_phys = res->start;
> +
> +       ret = -ENODEV;
> +
> +       /* Sanity check: did we get the whole register space? */
> +       if (resource_size(res) < RISCV_IOMMU_REG_SIZE) {
> +               dev_err(dev, "device region smaller than register file (0x%llx)\n",
> +                       (u64)resource_size(res));
> +               goto fail;
> +       }

Could we assume that DT should be responsible for specifying the right size?

> +
> +       dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
> +
> +       return riscv_iommu_init(iommu);
> +
> + fail:
> +       /* Note: devres_release_all() will release iommu and iommu->reg */
> +       return ret;
> +}
> +
> +static void riscv_iommu_platform_remove(struct platform_device *pdev)
> +{
> +       riscv_iommu_remove(dev_get_drvdata(&pdev->dev));
> +}
> +
> +static void riscv_iommu_platform_shutdown(struct platform_device *pdev)
> +{
> +       return;
> +}
> +
> +static const struct of_device_id riscv_iommu_of_match[] = {
> +       {.compatible = "riscv,iommu",},
> +       {},
> +};
> +
> +MODULE_DEVICE_TABLE(of, riscv_iommu_of_match);
> +
> +static struct platform_driver riscv_iommu_platform_driver = {
> +       .driver = {
> +                  .name = "riscv,iommu",
> +                  .of_match_table = riscv_iommu_of_match,
> +                  .suppress_bind_attrs = true,
> +                  },
> +       .probe = riscv_iommu_platform_probe,
> +       .remove_new = riscv_iommu_platform_remove,
> +       .shutdown = riscv_iommu_platform_shutdown,
> +};
> +
> +module_driver(riscv_iommu_platform_driver, platform_driver_register,
> +             platform_driver_unregister);
> diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
> new file mode 100644
> index 000000000000..8c236242e2cc
> --- /dev/null
> +++ b/drivers/iommu/riscv/iommu.c
> @@ -0,0 +1,660 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * IOMMU API for RISC-V architected Ziommu implementations.
> + *
> + * Copyright © 2022-2023 Rivos Inc.
> + * Copyright © 2023 FORTH-ICS/CARV
> + *
> + * Authors
> + *     Tomasz Jeznach <tjeznach@rivosinc.com>
> + *     Nick Kossifidis <mick@ics.forth.gr>
> + */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include <linux/bitfield.h>
> +#include <linux/module.h>
> +#include <linux/kernel.h>
> +#include <linux/compiler.h>
> +#include <linux/pci.h>
> +#include <linux/pci-ats.h>
> +#include <linux/init.h>
> +#include <linux/completion.h>
> +#include <linux/uaccess.h>
> +#include <linux/iommu.h>
> +#include <linux/irqdomain.h>
> +#include <linux/platform_device.h>
> +#include <linux/dma-map-ops.h>
> +#include <asm/page.h>
> +
> +#include "../dma-iommu.h"
> +#include "../iommu-sva.h"
> +#include "iommu.h"
> +
> +#include <asm/csr.h>
> +#include <asm/delay.h>
> +
> +MODULE_DESCRIPTION("IOMMU driver for RISC-V architected Ziommu implementations");
> +MODULE_AUTHOR("Tomasz Jeznach <tjeznach@rivosinc.com>");
> +MODULE_AUTHOR("Nick Kossifidis <mick@ics.forth.gr>");
> +MODULE_ALIAS("riscv-iommu");
> +MODULE_LICENSE("GPL v2");
> +
> +/* Global IOMMU params. */
> +static int ddt_mode = RISCV_IOMMU_DDTP_MODE_BARE;
> +module_param(ddt_mode, int, 0644);
> +MODULE_PARM_DESC(ddt_mode, "Device Directory Table mode.");
> +
> +/* IOMMU PSCID allocation namespace. */
> +#define RISCV_IOMMU_MAX_PSCID  (1U << 20)
> +static DEFINE_IDA(riscv_iommu_pscids);
> +
> +/* 1 second */
> +#define RISCV_IOMMU_TIMEOUT    riscv_timebase
> +
> +/* RISC-V IOMMU PPN <> PHYS address conversions, PHYS <=> PPN[53:10] */
> +#define phys_to_ppn(va)  (((va) >> 2) & (((1ULL << 44) - 1) << 10))
> +#define ppn_to_phys(pn)         (((pn) << 2) & (((1ULL << 44) - 1) << 12))
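
As a quick hand-worked check of these macros (illustrative numbers
only): for pa = 0x80200000, phys_to_ppn(pa) = (pa >> 2) masked to bits
53:10 = 0x20080000, i.e. PPN 0x80200 placed at bit position 10;
ppn_to_phys() shifts it back, (0x20080000 << 2) = 0x80200000.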
> +
> +#define iommu_domain_to_riscv(iommu_domain) \
> +    container_of(iommu_domain, struct riscv_iommu_domain, domain)
> +
> +#define iommu_device_to_riscv(iommu_device) \
> +    container_of(iommu_device, struct riscv_iommu, iommu)
> +
> +static const struct iommu_domain_ops riscv_iommu_domain_ops;
> +static const struct iommu_ops riscv_iommu_ops;
> +
> +/*
> + * Register device for IOMMU tracking.
> + */
> +static void riscv_iommu_add_device(struct riscv_iommu_device *iommu, struct device *dev)
> +{
> +       struct riscv_iommu_endpoint *ep, *rb_ep;
> +       struct rb_node **new_node, *parent_node = NULL;
> +
> +       mutex_lock(&iommu->eps_mutex);
> +
> +       ep = dev_iommu_priv_get(dev);
> +
> +       new_node = &(iommu->eps.rb_node);
> +       while (*new_node) {
> +               rb_ep = rb_entry(*new_node, struct riscv_iommu_endpoint, node);
> +               parent_node = *new_node;
> +               if (rb_ep->devid > ep->devid) {
> +                       new_node = &((*new_node)->rb_left);
> +               } else if (rb_ep->devid < ep->devid) {
> +                       new_node = &((*new_node)->rb_right);
> +               } else {
> +                       dev_warn(dev, "device %u already in the tree\n", ep->devid);
> +                       break;
> +               }
> +       }
> +
> +       rb_link_node(&ep->node, parent_node, new_node);
> +       rb_insert_color(&ep->node, &iommu->eps);
> +
> +       mutex_unlock(&iommu->eps_mutex);
> +}
> +
> +/*
> + * Endpoint management
> + */
> +
> +static int riscv_iommu_of_xlate(struct device *dev, struct of_phandle_args *args)
> +{
> +       return iommu_fwspec_add_ids(dev, args->args, 1);
> +}
> +
> +static bool riscv_iommu_capable(struct device *dev, enum iommu_cap cap)
> +{
> +       switch (cap) {
> +       case IOMMU_CAP_CACHE_COHERENCY:
> +       case IOMMU_CAP_PRE_BOOT_PROTECTION:
> +               return true;
> +
> +       default:
> +               break;
> +       }
> +
> +       return false;
> +}
> +
> +static struct iommu_device *riscv_iommu_probe_device(struct device *dev)
> +{
> +       struct riscv_iommu_device *iommu;
> +       struct riscv_iommu_endpoint *ep;
> +       struct iommu_fwspec *fwspec;
> +
> +       fwspec = dev_iommu_fwspec_get(dev);
> +       if (!fwspec || fwspec->ops != &riscv_iommu_ops ||
> +           !fwspec->iommu_fwnode || !fwspec->iommu_fwnode->dev)
> +               return ERR_PTR(-ENODEV);
> +
> +       iommu = dev_get_drvdata(fwspec->iommu_fwnode->dev);
> +       if (!iommu)
> +               return ERR_PTR(-ENODEV);
> +
> +       if (dev_iommu_priv_get(dev))
> +               return &iommu->iommu;
> +
> +       ep = kzalloc(sizeof(*ep), GFP_KERNEL);
> +       if (!ep)
> +               return ERR_PTR(-ENOMEM);
> +
> +       mutex_init(&ep->lock);
> +       INIT_LIST_HEAD(&ep->domain);
> +
> +       if (dev_is_pci(dev)) {
> +               ep->devid = pci_dev_id(to_pci_dev(dev));
> +               ep->domid = pci_domain_nr(to_pci_dev(dev)->bus);
> +       } else {
> +               /* TODO: Make this generic, for now hardcode domain id to 0 */
> +               ep->devid = fwspec->ids[0];
> +               ep->domid = 0;
> +       }
> +
> +       ep->iommu = iommu;
> +       ep->dev = dev;
> +
> +       dev_info(iommu->dev, "adding device to iommu with devid %i in domain %i\n",
> +               ep->devid, ep->domid);
> +
> +       dev_iommu_priv_set(dev, ep);
> +       riscv_iommu_add_device(iommu, dev);
> +
> +       return &iommu->iommu;
> +}
> +
> +static void riscv_iommu_probe_finalize(struct device *dev)
> +{
> +       set_dma_ops(dev, NULL);
> +       iommu_setup_dma_ops(dev, 0, U64_MAX);
> +}
> +
> +static void riscv_iommu_release_device(struct device *dev)
> +{
> +       struct riscv_iommu_endpoint *ep = dev_iommu_priv_get(dev);
> +       struct riscv_iommu_device *iommu = ep->iommu;
> +
> +       dev_info(dev, "device with devid %i released\n", ep->devid);
> +
> +       mutex_lock(&ep->lock);
> +       list_del(&ep->domain);
> +       mutex_unlock(&ep->lock);
> +
> +       /* Remove endpoint from IOMMU tracking structures */
> +       mutex_lock(&iommu->eps_mutex);
> +       rb_erase(&ep->node, &iommu->eps);
> +       mutex_unlock(&iommu->eps_mutex);
> +
> +       set_dma_ops(dev, NULL);
> +       dev_iommu_priv_set(dev, NULL);
> +
> +       kfree(ep);
> +}
> +
> +static struct iommu_group *riscv_iommu_device_group(struct device *dev)
> +{
> +       if (dev_is_pci(dev))
> +               return pci_device_group(dev);
> +       return generic_device_group(dev);
> +}
> +
> +static void riscv_iommu_get_resv_regions(struct device *dev, struct list_head *head)
> +{
> +       iommu_dma_get_resv_regions(dev, head);
> +}
> +
> +/*
> + * Domain management
> + */
> +
> +static struct iommu_domain *riscv_iommu_domain_alloc(unsigned type)
> +{
> +       struct riscv_iommu_domain *domain;
> +
> +       if (type != IOMMU_DOMAIN_IDENTITY &&
> +           type != IOMMU_DOMAIN_BLOCKED)
> +               return NULL;
> +
> +       domain = kzalloc(sizeof(*domain), GFP_KERNEL);
> +       if (!domain)
> +               return NULL;
> +
> +       mutex_init(&domain->lock);
> +       INIT_LIST_HEAD(&domain->endpoints);
> +
> +       domain->domain.ops = &riscv_iommu_domain_ops;
> +       domain->mode = RISCV_IOMMU_DC_FSC_MODE_BARE;
> +       domain->pscid = ida_alloc_range(&riscv_iommu_pscids, 1,
> +                                       RISCV_IOMMU_MAX_PSCID, GFP_KERNEL);
> +
> +       printk("domain type %x alloc %u\n", type, domain->pscid);
> +

Could it use pr_xxx() instead of printk()?

> +       return &domain->domain;
> +}
> +
> +static void riscv_iommu_domain_free(struct iommu_domain *iommu_domain)
> +{
> +       struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
> +
> +       if (!list_empty(&domain->endpoints)) {
> +               pr_warn("IOMMU domain is not empty!\n");
> +       }
> +
> +       if (domain->pgd_root)
> +               free_pages((unsigned long)domain->pgd_root, 0);
> +
> +       if ((int)domain->pscid > 0)
> +               ida_free(&riscv_iommu_pscids, domain->pscid);
> +
> +       printk("domain free %u\n", domain->pscid);
> +

Could it use pr_xxx() instead of printk()?

> +       kfree(domain);
> +}
> +
> +static int riscv_iommu_domain_finalize(struct riscv_iommu_domain *domain,
> +                                      struct riscv_iommu_device *iommu)
> +{
> +       struct iommu_domain_geometry *geometry;
> +
> +       /* Domain assigned to another iommu */
> +       if (domain->iommu && domain->iommu != iommu)
> +               return -EINVAL;
> +       /* Domain already initialized */
> +       else if (domain->iommu)
> +               return 0;
> +
> +       /*
> +        * TODO: Before using VA_BITS and satp_mode here, verify they
> +        * are supported by the iommu, through the capabilities register.
> +        */
> +
> +       geometry = &domain->domain.geometry;
> +
> +       /*
> +        * Note: the RISC-V Privileged spec mandates that virtual addresses
> +        * need to be sign-extended, so if (VA_BITS - 1) is set, all
> +        * bits >= VA_BITS need to also be set or else we'll get a
> +        * page fault. However the code that creates the mappings
> +        * above us (e.g. iommu_dma_alloc_iova()) won't do that for us
> +        * for now, so we'll end up with invalid virtual addresses
> +        * to map. As a workaround until we get this sorted out
> +        * limit the available virtual addresses to VA_BITS - 1.
> +        */
> +       geometry->aperture_start = 0;
> +       geometry->aperture_end = DMA_BIT_MASK(VA_BITS - 1);
> +       geometry->force_aperture = true;
> +
> +       domain->iommu = iommu;
> +
> +       if (domain->domain.type == IOMMU_DOMAIN_IDENTITY)
> +               return 0;
> +
> +       /* TODO: Fix this for RV32 */
> +       domain->mode = satp_mode >> 60;
> +       domain->pgd_root = (pgd_t *) __get_free_pages(GFP_KERNEL | __GFP_ZERO, 0);
> +
> +       if (!domain->pgd_root)
> +               return -ENOMEM;
> +
> +       return 0;
> +}
> +
> +static int riscv_iommu_attach_dev(struct iommu_domain *iommu_domain, struct device *dev)
> +{
> +       struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
> +       struct riscv_iommu_endpoint *ep = dev_iommu_priv_get(dev);
> +       int ret;
> +
> +       /* PSCID not valid */
> +       if ((int)domain->pscid < 0)
> +               return -ENOMEM;
> +
> +       mutex_lock(&domain->lock);
> +       mutex_lock(&ep->lock);
> +
> +       if (!list_empty(&ep->domain)) {
> +               dev_warn(dev, "endpoint already attached to a domain. dropping\n");
> +               list_del_init(&ep->domain);
> +       }
> +
> +       /* allocate root pages, initialize io-pgtable ops, etc. */
> +       ret = riscv_iommu_domain_finalize(domain, ep->iommu);
> +       if (ret < 0) {
> +               dev_err(dev, "can not finalize domain: %d\n", ret);
> +               mutex_unlock(&ep->lock);
> +               mutex_unlock(&domain->lock);
> +               return ret;
> +       }
> +
> +       if (ep->iommu->ddt_mode != RISCV_IOMMU_DDTP_MODE_BARE ||
> +           domain->domain.type != IOMMU_DOMAIN_IDENTITY) {
> +               dev_warn(dev, "domain type %d not supported\n",
> +                   domain->domain.type);
> +               /* drop the locks taken above before bailing out */
> +               mutex_unlock(&ep->lock);
> +               mutex_unlock(&domain->lock);
> +               return -ENODEV;
> +       }
> +
> +       list_add_tail(&ep->domain, &domain->endpoints);
> +       mutex_unlock(&ep->lock);
> +       mutex_unlock(&domain->lock);
> +
> +       dev_info(dev, "domain type %d attached w/ PSCID %u\n",
> +           domain->domain.type, domain->pscid);
> +
> +       return 0;
> +}
> +
> +static void riscv_iommu_flush_iotlb_range(struct iommu_domain *iommu_domain,
> +                                         unsigned long *start, unsigned long *end,
> +                                         size_t *pgsize)
> +{
> +       /* Command interface not implemented */
> +}
> +
> +static void riscv_iommu_flush_iotlb_all(struct iommu_domain *iommu_domain)
> +{
> +       riscv_iommu_flush_iotlb_range(iommu_domain, NULL, NULL, NULL);
> +}
> +
> +static void riscv_iommu_iotlb_sync(struct iommu_domain *iommu_domain,
> +                                  struct iommu_iotlb_gather *gather)
> +{
> +       riscv_iommu_flush_iotlb_range(iommu_domain, &gather->start, &gather->end,
> +                                     &gather->pgsize);
> +}
> +
> +static void riscv_iommu_iotlb_sync_map(struct iommu_domain *iommu_domain,
> +                                      unsigned long iova, size_t size)
> +{
> +       unsigned long end = iova + size - 1;
> +       /*
> +        * Given we don't know the page size used by this range, we assume the
> +        * smallest page size to ensure all possible entries are flushed from
> +        * the IOATC.
> +        */
> +       size_t pgsize = PAGE_SIZE;
> +       riscv_iommu_flush_iotlb_range(iommu_domain, &iova, &end, &pgsize);
> +}
> +
> +static int riscv_iommu_map_pages(struct iommu_domain *iommu_domain,
> +                                unsigned long iova, phys_addr_t phys,
> +                                size_t pgsize, size_t pgcount, int prot,
> +                                gfp_t gfp, size_t *mapped)
> +{
> +       struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
> +
> +       if (domain->domain.type == IOMMU_DOMAIN_IDENTITY) {
> +               *mapped = pgsize * pgcount;
> +               return 0;
> +       }
> +
> +       return -ENODEV;
> +}
> +
> +static size_t riscv_iommu_unmap_pages(struct iommu_domain *iommu_domain,
> +                                     unsigned long iova, size_t pgsize,
> +                                     size_t pgcount, struct iommu_iotlb_gather *gather)
> +{
> +       struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
> +
> +       if (domain->domain.type == IOMMU_DOMAIN_IDENTITY)
> +               return pgsize * pgcount;
> +
> +       return 0;
> +}
> +
> +static phys_addr_t riscv_iommu_iova_to_phys(struct iommu_domain *iommu_domain,
> +                                           dma_addr_t iova)
> +{
> +       struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
> +
> +       if (domain->domain.type == IOMMU_DOMAIN_IDENTITY)
> +               return (phys_addr_t) iova;
> +
> +       return 0;
> +}
> +
> +/*
> + * Translation mode setup
> + */
> +
> +static u64 riscv_iommu_get_ddtp(struct riscv_iommu_device *iommu)
> +{
> +       u64 ddtp;
> +       cycles_t end_cycles = RISCV_IOMMU_TIMEOUT + get_cycles();
> +
> +       /* Wait for DDTP.BUSY to be cleared and return latest value */
> +       do {
> +               ddtp = riscv_iommu_readq(iommu, RISCV_IOMMU_REG_DDTP);
> +               if (!(ddtp & RISCV_IOMMU_DDTP_BUSY))
> +                       break;
> +               cpu_relax();
> +       } while (get_cycles() < end_cycles);
> +
> +       return ddtp;
> +}
> +
> +static void riscv_iommu_ddt_cleanup(struct riscv_iommu_device *iommu)
> +{
> +       /* TODO: teardown whole device directory tree. */
> +       if (iommu->ddtp) {
> +               if (iommu->ddtp_in_iomem)
> +                       iounmap((void __iomem *)iommu->ddtp);
> +               else
> +                       free_page(iommu->ddtp);
> +               iommu->ddtp = 0;
> +       }
> +}
> +
> +static int riscv_iommu_enable(struct riscv_iommu_device *iommu, unsigned requested_mode)
> +{
> +       struct device *dev = iommu->dev;
> +       u64 ddtp = 0;
> +       u64 ddtp_paddr = 0;
> +       unsigned mode = requested_mode;
> +       unsigned mode_readback = 0;
> +
> +       ddtp = riscv_iommu_get_ddtp(iommu);
> +       if (ddtp & RISCV_IOMMU_DDTP_BUSY)
> +               return -EBUSY;
> +
> +       /* Disallow state transition from xLVL to xLVL. */
> +       switch (FIELD_GET(RISCV_IOMMU_DDTP_MODE, ddtp)) {
> +       case RISCV_IOMMU_DDTP_MODE_BARE:
> +       case RISCV_IOMMU_DDTP_MODE_OFF:
> +               break;
> +       default:
> +               if (mode != RISCV_IOMMU_DDTP_MODE_BARE &&
> +                   mode != RISCV_IOMMU_DDTP_MODE_OFF)
> +                       return -EINVAL;
> +               break;
> +       }
> +
> + retry:

We need to take `iommu.passthrough` into account before we set up the
mode in the switch statement, something like

if (iommu_default_passthrough()) {
        /* set ddtp to bare mode */
}
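
Fleshed out a little, that could be as small as this near the top of
riscv_iommu_enable(), before the mode switch (a sketch only;
iommu_default_passthrough() is the existing core helper and `mode` is
the local variable already used there):

        /* Honor the global iommu.passthrough= policy by forcing the
         * device directory into BARE mode so DMA bypasses translation. */
        if (iommu_default_passthrough())
                mode = RISCV_IOMMU_DDTP_MODE_BARE;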

> +       switch (mode) {
> +       case RISCV_IOMMU_DDTP_MODE_BARE:
> +       case RISCV_IOMMU_DDTP_MODE_OFF:
> +               riscv_iommu_ddt_cleanup(iommu);
> +               ddtp = FIELD_PREP(RISCV_IOMMU_DDTP_MODE, mode);
> +               break;
> +       case RISCV_IOMMU_DDTP_MODE_1LVL:
> +       case RISCV_IOMMU_DDTP_MODE_2LVL:
> +       case RISCV_IOMMU_DDTP_MODE_3LVL:
> +               if (!iommu->ddtp) {
> +                       /*
> +                        * We haven't initialized ddtp yet. Since it's WARL,
> +                        * make sure there isn't a hardwired PPN field in there
> +                        * that points to I/O memory instead.
> +                        */
> +                       riscv_iommu_writeq(iommu, RISCV_IOMMU_REG_DDTP, 0);
> +                       ddtp = riscv_iommu_get_ddtp(iommu);
> +                       ddtp_paddr = ppn_to_phys(ddtp);
> +                       if (ddtp_paddr) {
> +                               dev_warn(dev, "ddtp at 0x%llx\n", ddtp_paddr);
> +                               iommu->ddtp =
> +                                   (unsigned long)ioremap(ddtp_paddr, PAGE_SIZE);
> +                               iommu->ddtp_in_iomem = true;
> +                       } else {
> +                               iommu->ddtp = get_zeroed_page(GFP_KERNEL);
> +                       }
> +               }
> +               if (!iommu->ddtp)
> +                       return -ENOMEM;
> +
> +               ddtp = FIELD_PREP(RISCV_IOMMU_DDTP_MODE, mode) |
> +                   phys_to_ppn(__pa(iommu->ddtp));
> +
> +               break;
> +       default:
> +               return -EINVAL;
> +       }
> +
> +       riscv_iommu_writeq(iommu, RISCV_IOMMU_REG_DDTP, ddtp);
> +       ddtp = riscv_iommu_get_ddtp(iommu);
> +       if (ddtp & RISCV_IOMMU_DDTP_BUSY) {
> +               dev_warn(dev, "timeout when setting ddtp (ddt mode: %i)\n", mode);
> +               return -EBUSY;
> +       }
> +
> +       mode_readback = FIELD_GET(RISCV_IOMMU_DDTP_MODE, ddtp);
> +       dev_info(dev, "mode_readback: %i, mode: %i\n", mode_readback, mode);
> +       if (mode_readback != mode) {
> +               /*
> +                * Mode field is WARL, an I/O MMU may support a subset of
> +                * directory table levels in which case if we tried to set
> +                * an unsupported number of levels we'll readback either
> +                * a valid xLVL or off/bare. If we got off/bare, try again
> +                * with a smaller xLVL.
> +                */
> +               if (mode_readback < RISCV_IOMMU_DDTP_MODE_1LVL &&
> +                   mode > RISCV_IOMMU_DDTP_MODE_1LVL) {
> +                       mode--;
> +                       goto retry;
> +               }
> +
> +               /*
> +                * We tried all supported xLVL modes and still got off/bare back;
> +                * an I/O MMU must support at least one xLVL mode, so something
> +                * went very wrong.
> +                */
> +               if (mode_readback < RISCV_IOMMU_DDTP_MODE_1LVL &&
> +                   mode == RISCV_IOMMU_DDTP_MODE_1LVL)
> +                       goto fail;
> +
> +               /*
> +                * We tried setting off or bare and got something else back, something
> +                * went very wrong since off/bare is always legal.
> +                */
> +               if (mode < RISCV_IOMMU_DDTP_MODE_1LVL)
> +                       goto fail;
> +
> +               /*
> +                * We tried setting an xLVL mode but got another xLVL mode that
> +                * we don't support (e.g. a custom one).
> +                */
> +               if (mode_readback > RISCV_IOMMU_DDTP_MODE_MAX)
> +                       goto fail;
> +
> +               /* We tried setting an xLVL mode but got another supported xLVL mode */
> +               mode = mode_readback;
> +       }
> +
> +       if (mode != requested_mode)
> +               dev_warn(dev, "unsupported DDT mode requested (%i), using %i instead\n",
> +                        requested_mode, mode);
> +
> +       iommu->ddt_mode = mode;
> +       dev_info(dev, "ddt_mode: %i\n", iommu->ddt_mode);
> +       return 0;
> +
> + fail:
> +       dev_err(dev, "failed to set DDT mode, tried: %i and got %i\n", mode,
> +               mode_readback);
> +       riscv_iommu_ddt_cleanup(iommu);
> +       return -EINVAL;
> +}
> +
> +/*
> + * Common I/O MMU driver probe/teardown
> + */
> +
> +static const struct iommu_domain_ops riscv_iommu_domain_ops = {
> +       .free = riscv_iommu_domain_free,
> +       .attach_dev = riscv_iommu_attach_dev,
> +       .map_pages = riscv_iommu_map_pages,
> +       .unmap_pages = riscv_iommu_unmap_pages,
> +       .iova_to_phys = riscv_iommu_iova_to_phys,
> +       .iotlb_sync = riscv_iommu_iotlb_sync,
> +       .iotlb_sync_map = riscv_iommu_iotlb_sync_map,
> +       .flush_iotlb_all = riscv_iommu_flush_iotlb_all,
> +};
> +
> +static const struct iommu_ops riscv_iommu_ops = {
> +       .owner = THIS_MODULE,
> +       .pgsize_bitmap = SZ_4K | SZ_2M | SZ_1G,
> +       .capable = riscv_iommu_capable,
> +       .domain_alloc = riscv_iommu_domain_alloc,
> +       .probe_device = riscv_iommu_probe_device,
> +       .probe_finalize = riscv_iommu_probe_finalize,
> +       .release_device = riscv_iommu_release_device,
> +       .device_group = riscv_iommu_device_group,
> +       .get_resv_regions = riscv_iommu_get_resv_regions,
> +       .of_xlate = riscv_iommu_of_xlate,
> +       .default_domain_ops = &riscv_iommu_domain_ops,
> +};
> +
> +void riscv_iommu_remove(struct riscv_iommu_device *iommu)
> +{
> +       iommu_device_unregister(&iommu->iommu);
> +       riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_OFF);
> +}
> +
> +int riscv_iommu_init(struct riscv_iommu_device *iommu)
> +{
> +       struct device *dev = iommu->dev;
> +       u32 fctl = 0;
> +       int ret;
> +
> +       iommu->eps = RB_ROOT;
> +
> +       fctl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_FCTL);
> +
> +#ifdef CONFIG_CPU_BIG_ENDIAN
> +       if (!(iommu->cap & RISCV_IOMMU_CAP_END)) {
> +               dev_err(dev, "IOMMU doesn't support Big Endian\n");
> +               return -EIO;
> +       } else if (!(fctl & RISCV_IOMMU_FCTL_BE)) {
> +               fctl |= FIELD_PREP(RISCV_IOMMU_FCTL_BE, 1);
> +               riscv_iommu_writel(iommu, RISCV_IOMMU_REG_FCTL, fctl);
> +       }
> +#endif
> +
> +       /* Clear any pending interrupt flag. */
> +       riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IPSR,
> +                          RISCV_IOMMU_IPSR_CIP |
> +                          RISCV_IOMMU_IPSR_FIP |
> +                          RISCV_IOMMU_IPSR_PMIP | RISCV_IOMMU_IPSR_PIP);
> +       spin_lock_init(&iommu->cq_lock);
> +       mutex_init(&iommu->eps_mutex);
> +
> +       ret = riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_BARE);
> +
> +       if (ret) {
> +               dev_err(dev, "cannot enable iommu device (%d)\n", ret);
> +               goto fail;
> +       }
> +
> +       ret = iommu_device_register(&iommu->iommu, &riscv_iommu_ops, dev);
> +       if (ret) {
> +               dev_err(dev, "cannot register iommu interface (%d)\n", ret);
> +               iommu_device_sysfs_remove(&iommu->iommu);
> +               goto fail;
> +       }
> +
> +       return 0;
> + fail:
> +       riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_OFF);
> +       return ret;
> +}
> diff --git a/drivers/iommu/riscv/iommu.h b/drivers/iommu/riscv/iommu.h
> new file mode 100644
> index 000000000000..7baefd3630b3
> --- /dev/null
> +++ b/drivers/iommu/riscv/iommu.h
> @@ -0,0 +1,115 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright © 2022-2023 Rivos Inc.
> + * Copyright © 2023 FORTH-ICS/CARV
> + *
> + * RISC-V Ziommu - IOMMU Interface Specification.
> + *
> + * Authors
> + *     Tomasz Jeznach <tjeznach@rivosinc.com>
> + *     Nick Kossifidis <mick@ics.forth.gr>
> + */
> +
> +#ifndef _RISCV_IOMMU_H_
> +#define _RISCV_IOMMU_H_
> +
> +#include <linux/types.h>
> +#include <linux/iova.h>
> +#include <linux/io.h>
> +#include <linux/idr.h>
> +#include <linux/list.h>
> +#include <linux/iommu.h>
> +#include <linux/io-pgtable.h>
> +
> +#include "iommu-bits.h"
> +
> +#define IOMMU_PAGE_SIZE_4K     BIT_ULL(12)
> +#define IOMMU_PAGE_SIZE_2M     BIT_ULL(21)
> +#define IOMMU_PAGE_SIZE_1G     BIT_ULL(30)
> +#define IOMMU_PAGE_SIZE_512G   BIT_ULL(39)
> +
> +struct riscv_iommu_device {
> +       struct iommu_device iommu;      /* iommu core interface */
> +       struct device *dev;             /* iommu hardware */
> +
> +       /* hardware control register space */
> +       void __iomem *reg;
> +       resource_size_t reg_phys;
> +
> +       /* IRQs for the various queues */
> +       int irq_cmdq;
> +       int irq_fltq;
> +       int irq_pm;
> +       int irq_priq;
> +
> +       /* supported and enabled hardware capabilities */
> +       u64 cap;
> +
> +       /* global lock, to be removed */
> +       spinlock_t cq_lock;
> +
> +       /* device directory table root pointer and mode */
> +       unsigned long ddtp;
> +       unsigned ddt_mode;
> +       bool ddtp_in_iomem;
> +
> +       /* Connected end-points */
> +       struct rb_root eps;
> +       struct mutex eps_mutex;
> +};
> +
> +struct riscv_iommu_domain {
> +       struct iommu_domain domain;
> +
> +       struct list_head endpoints;
> +       struct mutex lock;
> +       struct riscv_iommu_device *iommu;
> +
> +       unsigned mode;          /* RISCV_IOMMU_DC_FSC_MODE_* enum */
> +       unsigned pscid;         /* RISC-V IOMMU PSCID */
> +
> +       pgd_t *pgd_root;        /* page table root pointer */
> +};
> +
> +/* Private dev_iommu_priv object, device-domain relationship. */
> +struct riscv_iommu_endpoint {
> +       struct device *dev;                     /* platform or PCI endpoint device */
> +       unsigned devid;                         /* PCI bus:device:function number */
> +       unsigned domid;                         /* PCI domain number, segment */
> +       struct rb_node node;                    /* device tracking node (lookup by devid) */
> +       struct riscv_iommu_device *iommu;       /* parent iommu device */
> +
> +       struct mutex lock;
> +       struct list_head domain;                /* endpoint attached managed domain */
> +};
> +
> +/* Helper functions and macros */
> +
> +static inline u32 riscv_iommu_readl(struct riscv_iommu_device *iommu,
> +                                   unsigned offset)
> +{
> +       return readl_relaxed(iommu->reg + offset);
> +}
> +
> +static inline void riscv_iommu_writel(struct riscv_iommu_device *iommu,
> +                                     unsigned offset, u32 val)
> +{
> +       writel_relaxed(val, iommu->reg + offset);
> +}
> +
> +static inline u64 riscv_iommu_readq(struct riscv_iommu_device *iommu,
> +                                   unsigned offset)
> +{
> +       return readq_relaxed(iommu->reg + offset);
> +}
> +
> +static inline void riscv_iommu_writeq(struct riscv_iommu_device *iommu,
> +                                     unsigned offset, u64 val)
> +{
> +       writeq_relaxed(val, iommu->reg + offset);
> +}
> +
> +int riscv_iommu_init(struct riscv_iommu_device *iommu);
> +void riscv_iommu_remove(struct riscv_iommu_device *iommu);
> +
> +#endif
> --
> 2.34.1
>
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 06/11] RISC-V: drivers/iommu/riscv: Add command, fault, page-req queues
  2023-07-24  9:47       ` Zong Li
@ 2023-07-28  5:18         ` Tomasz Jeznach
  2023-07-28  8:48           ` Zong Li
  0 siblings, 1 reply; 86+ messages in thread
From: Tomasz Jeznach @ 2023-07-28  5:18 UTC (permalink / raw)
  To: Zong Li
  Cc: Nick Kossifidis, Anup Patel, Albert Ou, linux, Will Deacon,
	Joerg Roedel, linux-kernel, Sebastien Boeuf, iommu,
	Palmer Dabbelt, Paul Walmsley, linux-riscv, Robin Murphy

On Mon, Jul 24, 2023 at 11:47 AM Zong Li <zong.li@sifive.com> wrote:
>
> On Fri, Jul 21, 2023 at 2:00 AM Tomasz Jeznach <tjeznach@rivosinc.com> wrote:
> >
> > On Wed, Jul 19, 2023 at 8:12 PM Nick Kossifidis <mick@ics.forth.gr> wrote:
> > >
> > > Hello Tomasz,
> > >
> > > On 7/19/23 22:33, Tomasz Jeznach wrote:
> > > > Enables message or wire signal interrupts for PCIe and platform devices.
> > > >
> > >
> > > The description doesn't match the subject or the patch content (we
> > > don't just enable interrupts, we also initialize the queues).
> > >
> > > > +     /* Parse queue lengths */
> > > > +     ret = of_property_read_u32(pdev->dev.of_node, "cmdq_len", &iommu->cmdq_len);
> > > > +     if (!ret)
> > > > +             dev_info(dev, "command queue length set to %i\n", iommu->cmdq_len);
> > > > +
> > > > +     ret = of_property_read_u32(pdev->dev.of_node, "fltq_len", &iommu->fltq_len);
> > > > +     if (!ret)
> > > > +             dev_info(dev, "fault/event queue length set to %i\n", iommu->fltq_len);
> > > > +
> > > > +     ret = of_property_read_u32(pdev->dev.of_node, "priq_len", &iommu->priq_len);
> > > > +     if (!ret)
> > > > +             dev_info(dev, "page request queue length set to %i\n", iommu->priq_len);
> > > > +
> > > >       dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
> > > >
> > >
> > > We need to add those to the device tree binding doc (or throw them away,
> > > I thought it would be better to have them as part of the device
> > > description than as a module parameter).
> > >
> >
> > We can add them as optional fields to the DT.
> > Alternatively, I've been looking into an option to auto-scale CQ/PQ
> > based on the number of attached devices, but this gets trickier for
> > hot-pluggable systems. I've added module parameters as a bare minimum,
> > but am still looking for better solutions.
> >
> > >
> > > > +static irqreturn_t riscv_iommu_priq_irq_check(int irq, void *data);
> > > > +static irqreturn_t riscv_iommu_priq_process(int irq, void *data);
> > > > +
> > >
> > > > +     case RISCV_IOMMU_PAGE_REQUEST_QUEUE:
> > > > +             q = &iommu->priq;
> > > > +             q->len = sizeof(struct riscv_iommu_pq_record);
> > > > +             count = iommu->priq_len;
> > > > +             irq = iommu->irq_priq;
> > > > +             irq_check = riscv_iommu_priq_irq_check;
> > > > +             irq_process = riscv_iommu_priq_process;
> > > > +             q->qbr = RISCV_IOMMU_REG_PQB;
> > > > +             q->qcr = RISCV_IOMMU_REG_PQCSR;
> > > > +             name = "priq";
> > > > +             break;
> > >
> > >
> > > It makes more sense to add the code for the page request queue in the
> > > patch that adds ATS/PRI support IMHO. This comment also applies to its
> > > interrupt handlers below.
> > >
> >
> > ack. will do.
> >
> > >
> > > > +static inline void riscv_iommu_cmd_inval_set_addr(struct riscv_iommu_command *cmd,
> > > > +                                               u64 addr)
> > > > +{
> > > > +     cmd->dword0 |= RISCV_IOMMU_CMD_IOTINVAL_AV;
> > > > +     cmd->dword1 = addr;
> > > > +}
> > > > +
> > >
> > > This needs to be (addr >> 2) to match the spec, same as in the iofence
> > > command.
> > >
> >
> > oops. Thanks!
> >
>
> I think it should be (addr >> 12) according to the spec.
>

My reading of the spec '3.1.1. IOMMU Page-Table cache invalidation commands'
is that it is a 4K page-aligned address packed at dword1[61:10], so
effectively shifted right by 2 bits.
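
For a 4K-aligned address the packing works out to exactly that (a
worked one-liner, not code from the patch):

        dword1 = (addr >> 12) << 10;    /* PPN at bits [61:10] == addr >> 2 */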

regards,
- Tomasz

> > > Regards,
> > > Nick
> > >
> >
> > regards,
> > - Tomasz
> >
> > _______________________________________________
> > linux-riscv mailing list
> > linux-riscv@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 06/11] RISC-V: drivers/iommu/riscv: Add command, fault, page-req queues
  2023-07-28  5:18         ` Tomasz Jeznach
@ 2023-07-28  8:48           ` Zong Li
  0 siblings, 0 replies; 86+ messages in thread
From: Zong Li @ 2023-07-28  8:48 UTC (permalink / raw)
  To: Tomasz Jeznach
  Cc: Nick Kossifidis, Anup Patel, Albert Ou, linux, Will Deacon,
	Joerg Roedel, linux-kernel, Sebastien Boeuf, iommu,
	Palmer Dabbelt, Paul Walmsley, linux-riscv, Robin Murphy

On Fri, Jul 28, 2023 at 1:19 PM Tomasz Jeznach <tjeznach@rivosinc.com> wrote:
>
> On Mon, Jul 24, 2023 at 11:47 AM Zong Li <zong.li@sifive.com> wrote:
> >
> > On Fri, Jul 21, 2023 at 2:00 AM Tomasz Jeznach <tjeznach@rivosinc.com> wrote:
> > >
> > > On Wed, Jul 19, 2023 at 8:12 PM Nick Kossifidis <mick@ics.forth.gr> wrote:
> > > >
> > > > Hello Tomasz,
> > > >
> > > > On 7/19/23 22:33, Tomasz Jeznach wrote:
> > > > > Enables message or wire signal interrupts for PCIe and platform devices.
> > > > >
> > > >
> > > > The description doesn't match the subject or the patch content (we
> > > > don't just enable interrupts, we also initialize the queues).
> > > >
> > > > > +     /* Parse queue lengths */
> > > > > +     ret = of_property_read_u32(pdev->dev.of_node, "cmdq_len", &iommu->cmdq_len);
> > > > > +     if (!ret)
> > > > > +             dev_info(dev, "command queue length set to %i\n", iommu->cmdq_len);
> > > > > +
> > > > > +     ret = of_property_read_u32(pdev->dev.of_node, "fltq_len", &iommu->fltq_len);
> > > > > +     if (!ret)
> > > > > +             dev_info(dev, "fault/event queue length set to %i\n", iommu->fltq_len);
> > > > > +
> > > > > +     ret = of_property_read_u32(pdev->dev.of_node, "priq_len", &iommu->priq_len);
> > > > > +     if (!ret)
> > > > > +             dev_info(dev, "page request queue length set to %i\n", iommu->priq_len);
> > > > > +
> > > > >       dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
> > > > >
> > > >
> > > > We need to add those to the device tree binding doc (or throw them away,
> > > > I thought it would be better to have them as part of the device
> > > > description than as a module parameter).
> > > >
> > >
> > > We can add them as optional fields to the DT.
> > > Alternatively, I've been looking into an option to auto-scale CQ/PQ
> > > based on the number of attached devices, but this gets trickier for
> > > hot-pluggable systems. I've added module parameters as a bare minimum,
> > > but am still looking for better solutions.
> > >
> > > >
> > > > > +static irqreturn_t riscv_iommu_priq_irq_check(int irq, void *data);
> > > > > +static irqreturn_t riscv_iommu_priq_process(int irq, void *data);
> > > > > +
> > > >
> > > > > +     case RISCV_IOMMU_PAGE_REQUEST_QUEUE:
> > > > > +             q = &iommu->priq;
> > > > > +             q->len = sizeof(struct riscv_iommu_pq_record);
> > > > > +             count = iommu->priq_len;
> > > > > +             irq = iommu->irq_priq;
> > > > > +             irq_check = riscv_iommu_priq_irq_check;
> > > > > +             irq_process = riscv_iommu_priq_process;
> > > > > +             q->qbr = RISCV_IOMMU_REG_PQB;
> > > > > +             q->qcr = RISCV_IOMMU_REG_PQCSR;
> > > > > +             name = "priq";
> > > > > +             break;
> > > >
> > > >
> > > > It makes more sense to add the code for the page request queue in the
> > > > patch that adds ATS/PRI support IMHO. This comment also applies to its
> > > > interrupt handlers below.
> > > >
> > >
> > > ack. will do.
> > >
> > > >
> > > > > +static inline void riscv_iommu_cmd_inval_set_addr(struct riscv_iommu_command *cmd,
> > > > > +                                               u64 addr)
> > > > > +{
> > > > > +     cmd->dword0 |= RISCV_IOMMU_CMD_IOTINVAL_AV;
> > > > > +     cmd->dword1 = addr;
> > > > > +}
> > > > > +
> > > >
> > > > This needs to be (addr >> 2) to match the spec, same as in the iofence
> > > > command.
> > > >
> > >
> > > oops. Thanks!
> > >
> >
> > I think it should be (addr >> 12) according to the spec.
> >
>
> My reading of the spec '3.1.1. IOMMU Page-Table cache invalidation commands'
> is that it is a 4K page-aligned address packed at dword1[61:10], so
> effectively shifted right by 2 bits.

Thanks for the clarification. Just an opinion: perhaps you could use
FIELD_PREP() here as well; it might be clearer.
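
Something along these lines, perhaps (a sketch only; the
RISCV_IOMMU_CMD_IOTINVAL_ADDR field mask covering dword1[61:10] is a
hypothetical name, not one defined in the posted headers):

#define RISCV_IOMMU_CMD_IOTINVAL_ADDR  GENMASK_ULL(61, 10)

static inline void riscv_iommu_cmd_inval_set_addr(struct riscv_iommu_command *cmd,
                                                  u64 addr)
{
        cmd->dword0 |= RISCV_IOMMU_CMD_IOTINVAL_AV;
        /* Pack the PPN of the 4K-aligned address into dword1[61:10]. */
        cmd->dword1 = FIELD_PREP(RISCV_IOMMU_CMD_IOTINVAL_ADDR, addr >> 12);
}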

>
> regards,
> - Tomasz
>
> > > > Regards,
> > > > Nick
> > > >
> > >
> > > regards,
> > > - Tomasz
> > >
> > > _______________________________________________
> > > linux-riscv mailing list
> > > linux-riscv@lists.infradead.org
> > > http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 06/11] RISC-V: drivers/iommu/riscv: Add command, fault, page-req queues
  2023-07-19 19:33 ` [PATCH 06/11] RISC-V: drivers/iommu/riscv: Add command, fault, page-req queues Tomasz Jeznach
  2023-07-20  3:11   ` Nick Kossifidis
  2023-07-20 13:08   ` Baolu Lu
@ 2023-07-29 12:58   ` Zong Li
  2023-07-31  9:32     ` Nick Kossifidis
  2023-08-02 20:50     ` Tomasz Jeznach
  2023-08-16 18:49   ` Robin Murphy
  3 siblings, 2 replies; 86+ messages in thread
From: Zong Li @ 2023-07-29 12:58 UTC (permalink / raw)
  To: Tomasz Jeznach
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley,
	Anup Patel, Albert Ou, linux, linux-kernel, Sebastien Boeuf,
	iommu, Palmer Dabbelt, Nick Kossifidis, linux-riscv

On Thu, Jul 20, 2023 at 3:34 AM Tomasz Jeznach <tjeznach@rivosinc.com> wrote:
>
> Enables message or wire signal interrupts for PCIe and platform devices.
>
> Co-developed-by: Nick Kossifidis <mick@ics.forth.gr>
> Signed-off-by: Nick Kossifidis <mick@ics.forth.gr>
> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
> ---
>  drivers/iommu/riscv/iommu-pci.c      |  72 ++++
>  drivers/iommu/riscv/iommu-platform.c |  66 +++
>  drivers/iommu/riscv/iommu.c          | 604 ++++++++++++++++++++++++++-
>  drivers/iommu/riscv/iommu.h          |  28 ++
>  4 files changed, 769 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/riscv/iommu-pci.c b/drivers/iommu/riscv/iommu-pci.c
> index c91f963d7a29..9ea0647f7b92 100644
> --- a/drivers/iommu/riscv/iommu-pci.c
> +++ b/drivers/iommu/riscv/iommu-pci.c
> @@ -34,6 +34,7 @@ static int riscv_iommu_pci_probe(struct pci_dev *pdev, const struct pci_device_i
>  {
>         struct device *dev = &pdev->dev;
>         struct riscv_iommu_device *iommu;
> +       u64 icvec;
>         int ret;
>
>         ret = pci_enable_device_mem(pdev);
> @@ -67,14 +68,84 @@ static int riscv_iommu_pci_probe(struct pci_dev *pdev, const struct pci_device_i
>         iommu->dev = dev;
>         dev_set_drvdata(dev, iommu);
>
> +       /* Check device reported capabilities. */
> +       iommu->cap = riscv_iommu_readq(iommu, RISCV_IOMMU_REG_CAP);
> +
> +       /* The PCI driver only uses MSIs, make sure the IOMMU supports this */
> +       switch (FIELD_GET(RISCV_IOMMU_CAP_IGS, iommu->cap)) {
> +       case RISCV_IOMMU_CAP_IGS_MSI:
> +       case RISCV_IOMMU_CAP_IGS_BOTH:
> +               break;
> +       default:
> +               dev_err(dev, "unable to use message-signaled interrupts\n");
> +               ret = -ENODEV;
> +               goto fail;
> +       }
> +
>         dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
>         pci_set_master(pdev);
>
> +       /* Allocate and assign IRQ vectors for the various events */
> +       ret = pci_alloc_irq_vectors(pdev, 1, RISCV_IOMMU_INTR_COUNT, PCI_IRQ_MSIX);
> +       if (ret < 0) {
> +               dev_err(dev, "unable to allocate irq vectors\n");
> +               goto fail;
> +       }
> +
> +       ret = -ENODEV;
> +
> +       iommu->irq_cmdq = msi_get_virq(dev, RISCV_IOMMU_INTR_CQ);
> +       if (!iommu->irq_cmdq) {
> +               dev_warn(dev, "no MSI vector %d for the command queue\n",
> +                        RISCV_IOMMU_INTR_CQ);
> +               goto fail;
> +       }
> +
> +       iommu->irq_fltq = msi_get_virq(dev, RISCV_IOMMU_INTR_FQ);
> +       if (!iommu->irq_fltq) {
> +               dev_warn(dev, "no MSI vector %d for the fault/event queue\n",
> +                        RISCV_IOMMU_INTR_FQ);
> +               goto fail;
> +       }
> +
> +       if (iommu->cap & RISCV_IOMMU_CAP_HPM) {
> +               iommu->irq_pm = msi_get_virq(dev, RISCV_IOMMU_INTR_PM);
> +               if (!iommu->irq_pm) {
> +                       dev_warn(dev,
> +                                "no MSI vector %d for performance monitoring\n",
> +                                RISCV_IOMMU_INTR_PM);
> +                       goto fail;
> +               }
> +       }
> +
> +       if (iommu->cap & RISCV_IOMMU_CAP_ATS) {
> +               iommu->irq_priq = msi_get_virq(dev, RISCV_IOMMU_INTR_PQ);
> +               if (!iommu->irq_priq) {
> +                       dev_warn(dev,
> +                                "no MSI vector %d for page-request queue\n",
> +                                RISCV_IOMMU_INTR_PQ);
> +                       goto fail;
> +               }
> +       }
> +
> +       /* Set simple 1:1 mapping for MSI vectors */
> +       icvec = FIELD_PREP(RISCV_IOMMU_IVEC_CIV, RISCV_IOMMU_INTR_CQ) |
> +           FIELD_PREP(RISCV_IOMMU_IVEC_FIV, RISCV_IOMMU_INTR_FQ);
> +
> +       if (iommu->cap & RISCV_IOMMU_CAP_HPM)
> +               icvec |= FIELD_PREP(RISCV_IOMMU_IVEC_PMIV, RISCV_IOMMU_INTR_PM);
> +
> +       if (iommu->cap & RISCV_IOMMU_CAP_ATS)
> +               icvec |= FIELD_PREP(RISCV_IOMMU_IVEC_PIV, RISCV_IOMMU_INTR_PQ);
> +
> +       riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IVEC, icvec);
> +
>         ret = riscv_iommu_init(iommu);
>         if (!ret)
>                 return ret;
>
>   fail:
> +       pci_free_irq_vectors(pdev);
>         pci_clear_master(pdev);
>         pci_release_regions(pdev);
>         pci_disable_device(pdev);
> @@ -85,6 +156,7 @@ static int riscv_iommu_pci_probe(struct pci_dev *pdev, const struct pci_device_i
>  static void riscv_iommu_pci_remove(struct pci_dev *pdev)
>  {
>         riscv_iommu_remove(dev_get_drvdata(&pdev->dev));
> +       pci_free_irq_vectors(pdev);
>         pci_clear_master(pdev);
>         pci_release_regions(pdev);
>         pci_disable_device(pdev);
> diff --git a/drivers/iommu/riscv/iommu-platform.c b/drivers/iommu/riscv/iommu-platform.c
> index e4e8ca6711e7..35935d3c7ef4 100644
> --- a/drivers/iommu/riscv/iommu-platform.c
> +++ b/drivers/iommu/riscv/iommu-platform.c
> @@ -20,6 +20,8 @@ static int riscv_iommu_platform_probe(struct platform_device *pdev)
>         struct device *dev = &pdev->dev;
>         struct riscv_iommu_device *iommu = NULL;
>         struct resource *res = NULL;
> +       u32 fctl = 0;
> +       int irq = 0;
>         int ret = 0;
>
>         iommu = devm_kzalloc(dev, sizeof(*iommu), GFP_KERNEL);
> @@ -53,6 +55,70 @@ static int riscv_iommu_platform_probe(struct platform_device *pdev)
>                 goto fail;
>         }
>
> +       iommu->cap = riscv_iommu_readq(iommu, RISCV_IOMMU_REG_CAP);
> +
> +       /* For now we only support WSIs until we have AIA support */

I don't completely understand the AIA support comment here: the PCI
case already uses MSIs, and the kernel seems to have an AIA
implementation. Could you please elaborate?

> +       if (FIELD_GET(RISCV_IOMMU_CAP_IGS, iommu->cap) == RISCV_IOMMU_CAP_IGS_MSI) {
> +               dev_err(dev, "IOMMU only supports MSIs\n");
> +               ret = -ENODEV;
> +               goto fail;
> +       }
> +
> +       /* Parse IRQ assignment */
> +       irq = platform_get_irq_byname_optional(pdev, "cmdq");
> +       if (irq > 0) {
> +               iommu->irq_cmdq = irq;
> +       } else {
> +               dev_err(dev, "no IRQ provided for the command queue\n");
> +               ret = -ENODEV;
> +               goto fail;
> +       }
> +
> +       irq = platform_get_irq_byname_optional(pdev, "fltq");
> +       if (irq > 0) {
> +               iommu->irq_fltq = irq;
> +       } else {
> +               dev_err(dev, "no IRQ provided for the fault/event queue\n");
> +               ret = -ENODEV;
> +               goto fail;
> +       }
> +
> +       if (iommu->cap & RISCV_IOMMU_CAP_HPM) {
> +               irq = platform_get_irq_byname_optional(pdev, "pm");
> +               if (irq > 0) {
> +                       iommu->irq_pm = irq;
> +               } else {
> +                       dev_err(dev, "no IRQ provided for performance monitoring\n");
> +                       ret = -ENODEV;
> +                       goto fail;
> +               }
> +       }
> +
> +       if (iommu->cap & RISCV_IOMMU_CAP_ATS) {
> +               irq = platform_get_irq_byname_optional(pdev, "priq");
> +               if (irq > 0) {
> +                       iommu->irq_priq = irq;
> +               } else {
> +                       dev_err(dev, "no IRQ provided for the page-request queue\n");
> +                       ret = -ENODEV;
> +                       goto fail;
> +               }
> +       }

Should we define the "interrupt-names" in dt-bindings?

> +
> +       /* Make sure fctl.WSI is set */
> +       fctl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_FCTL);
> +       fctl |= RISCV_IOMMU_FCTL_WSI;
> +       riscv_iommu_writel(iommu, RISCV_IOMMU_REG_FCTL, fctl);
> +
> +       /* Parse queue lengths */
> +       ret = of_property_read_u32(pdev->dev.of_node, "cmdq_len", &iommu->cmdq_len);
> +       if (!ret)
> +               dev_info(dev, "command queue length set to %i\n", iommu->cmdq_len);
> +
> +       ret = of_property_read_u32(pdev->dev.of_node, "fltq_len", &iommu->fltq_len);
> +       if (!ret)
> +               dev_info(dev, "fault/event queue length set to %i\n", iommu->fltq_len);
> +
> +       ret = of_property_read_u32(pdev->dev.of_node, "priq_len", &iommu->priq_len);
> +       if (!ret)
> +               dev_info(dev, "page request queue length set to %i\n", iommu->priq_len);
> +
>         dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
>
>         return riscv_iommu_init(iommu);
> diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
> index 31dc3c458e13..5c4cf9875302 100644
> --- a/drivers/iommu/riscv/iommu.c
> +++ b/drivers/iommu/riscv/iommu.c
> @@ -45,6 +45,18 @@ static int ddt_mode = RISCV_IOMMU_DDTP_MODE_BARE;
>  module_param(ddt_mode, int, 0644);
>  MODULE_PARM_DESC(ddt_mode, "Device Directory Table mode.");
>
> +static int cmdq_length = 1024;
> +module_param(cmdq_length, int, 0644);
> +MODULE_PARM_DESC(cmdq_length, "Command queue length.");
> +
> +static int fltq_length = 1024;
> +module_param(fltq_length, int, 0644);
> +MODULE_PARM_DESC(fltq_length, "Fault queue length.");
> +
> +static int priq_length = 1024;
> +module_param(priq_length, int, 0644);
> +MODULE_PARM_DESC(priq_length, "Page request interface queue length.");
> +
>  /* IOMMU PSCID allocation namespace. */
>  #define RISCV_IOMMU_MAX_PSCID  (1U << 20)
>  static DEFINE_IDA(riscv_iommu_pscids);
> @@ -65,6 +77,497 @@ static DEFINE_IDA(riscv_iommu_pscids);
>  static const struct iommu_domain_ops riscv_iommu_domain_ops;
>  static const struct iommu_ops riscv_iommu_ops;
>
> +/*
> + * Common queue management routines
> + */
> +
> +/* Note: offsets are the same for all queues */
> +#define Q_HEAD(q) ((q)->qbr + (RISCV_IOMMU_REG_CQH - RISCV_IOMMU_REG_CQB))
> +#define Q_TAIL(q) ((q)->qbr + (RISCV_IOMMU_REG_CQT - RISCV_IOMMU_REG_CQB))
> +
> +static unsigned riscv_iommu_queue_consume(struct riscv_iommu_device *iommu,
> +                                         struct riscv_iommu_queue *q, unsigned *ready)
> +{
> +       u32 tail = riscv_iommu_readl(iommu, Q_TAIL(q));
> +       *ready = q->lui;
> +
> +       BUG_ON(q->cnt <= tail);
> +       if (q->lui <= tail)
> +               return tail - q->lui;
> +       return q->cnt - q->lui;
> +}
> +
> +static void riscv_iommu_queue_release(struct riscv_iommu_device *iommu,
> +                                     struct riscv_iommu_queue *q, unsigned count)
> +{
> +       q->lui = (q->lui + count) & (q->cnt - 1);
> +       riscv_iommu_writel(iommu, Q_HEAD(q), q->lui);
> +}
> +
> +static u32 riscv_iommu_queue_ctrl(struct riscv_iommu_device *iommu,
> +                                 struct riscv_iommu_queue *q, u32 val)
> +{
> +       cycles_t end_cycles = RISCV_IOMMU_TIMEOUT + get_cycles();
> +
> +       riscv_iommu_writel(iommu, q->qcr, val);
> +       do {
> +               val = riscv_iommu_readl(iommu, q->qcr);
> +               if (!(val & RISCV_IOMMU_QUEUE_BUSY))
> +                       break;
> +               cpu_relax();
> +       } while (get_cycles() < end_cycles);
> +
> +       return val;
> +}
> +
> +static void riscv_iommu_queue_free(struct riscv_iommu_device *iommu,
> +                                  struct riscv_iommu_queue *q)
> +{
> +       size_t size = q->len * q->cnt;
> +
> +       riscv_iommu_queue_ctrl(iommu, q, 0);
> +
> +       if (q->base) {
> +               if (q->in_iomem)
> +                       iounmap(q->base);
> +               else
> +                       dmam_free_coherent(iommu->dev, size, q->base, q->base_dma);
> +       }
> +       if (q->irq)
> +               free_irq(q->irq, q);
> +}
> +
> +static irqreturn_t riscv_iommu_cmdq_irq_check(int irq, void *data);
> +static irqreturn_t riscv_iommu_cmdq_process(int irq, void *data);
> +static irqreturn_t riscv_iommu_fltq_irq_check(int irq, void *data);
> +static irqreturn_t riscv_iommu_fltq_process(int irq, void *data);
> +static irqreturn_t riscv_iommu_priq_irq_check(int irq, void *data);
> +static irqreturn_t riscv_iommu_priq_process(int irq, void *data);
> +
> +static int riscv_iommu_queue_init(struct riscv_iommu_device *iommu, int queue_id)
> +{
> +       struct device *dev = iommu->dev;
> +       struct riscv_iommu_queue *q = NULL;
> +       size_t queue_size = 0;
> +       irq_handler_t irq_check;
> +       irq_handler_t irq_process;
> +       const char *name;
> +       int count = 0;
> +       int irq = 0;
> +       unsigned order = 0;
> +       u64 qbr_val = 0;
> +       u64 qbr_readback = 0;
> +       u64 qbr_paddr = 0;
> +       int ret = 0;
> +
> +       switch (queue_id) {
> +       case RISCV_IOMMU_COMMAND_QUEUE:
> +               q = &iommu->cmdq;
> +               q->len = sizeof(struct riscv_iommu_command);
> +               count = iommu->cmdq_len;
> +               irq = iommu->irq_cmdq;
> +               irq_check = riscv_iommu_cmdq_irq_check;
> +               irq_process = riscv_iommu_cmdq_process;
> +               q->qbr = RISCV_IOMMU_REG_CQB;
> +               q->qcr = RISCV_IOMMU_REG_CQCSR;
> +               name = "cmdq";
> +               break;
> +       case RISCV_IOMMU_FAULT_QUEUE:
> +               q = &iommu->fltq;
> +               q->len = sizeof(struct riscv_iommu_fq_record);
> +               count = iommu->fltq_len;
> +               irq = iommu->irq_fltq;
> +               irq_check = riscv_iommu_fltq_irq_check;
> +               irq_process = riscv_iommu_fltq_process;
> +               q->qbr = RISCV_IOMMU_REG_FQB;
> +               q->qcr = RISCV_IOMMU_REG_FQCSR;
> +               name = "fltq";
> +               break;
> +       case RISCV_IOMMU_PAGE_REQUEST_QUEUE:
> +               q = &iommu->priq;
> +               q->len = sizeof(struct riscv_iommu_pq_record);
> +               count = iommu->priq_len;
> +               irq = iommu->irq_priq;
> +               irq_check = riscv_iommu_priq_irq_check;
> +               irq_process = riscv_iommu_priq_process;
> +               q->qbr = RISCV_IOMMU_REG_PQB;
> +               q->qcr = RISCV_IOMMU_REG_PQCSR;
> +               name = "priq";
> +               break;
> +       default:
> +               dev_err(dev, "invalid queue interrupt index in queue_init!\n");
> +               return -EINVAL;
> +       }
> +
> +       /* Polling not implemented */
> +       if (!irq)
> +               return -ENODEV;
> +
> +       /* Allocate queue in memory and set the base register */
> +       order = ilog2(count);
> +       do {
> +               queue_size = q->len * (1ULL << order);
> +               q->base = dmam_alloc_coherent(dev, queue_size, &q->base_dma, GFP_KERNEL);
> +               if (q->base || queue_size < PAGE_SIZE)
> +                       break;
> +
> +               order--;
> +       } while (1);
> +
> +       if (!q->base) {
> +               dev_err(dev, "failed to allocate %s queue (cnt: %u)\n", name, count);
> +               return -ENOMEM;
> +       }
> +
> +       q->cnt = 1ULL << order;
> +
> +       qbr_val = phys_to_ppn(q->base_dma) |
> +           FIELD_PREP(RISCV_IOMMU_QUEUE_LOGSZ_FIELD, order - 1);
> +
> +       riscv_iommu_writeq(iommu, q->qbr, qbr_val);
> +
> +       /*
> +        * Queue base registers are WARL, so it's possible that whatever we wrote
> +        * there was illegal/not supported by the hw in which case we need to make
> +        * sure we set a supported PPN and/or queue size.
> +        */
> +       qbr_readback = riscv_iommu_readq(iommu, q->qbr);
> +       if (qbr_readback == qbr_val)
> +               goto irq;
> +
> +       dmam_free_coherent(dev, queue_size, q->base, q->base_dma);
> +
> +       /* Get supported queue size */
> +       order = FIELD_GET(RISCV_IOMMU_QUEUE_LOGSZ_FIELD, qbr_readback) + 1;
> +       q->cnt = 1ULL << order;
> +       queue_size = q->len * q->cnt;
> +
> +       /*
> +        * In case we also failed to set PPN, it means the field is hardcoded and the
> +        * queue resides in I/O memory instead, so get its physical address and
> +        * ioremap it.
> +        */
> +       qbr_paddr = ppn_to_phys(qbr_readback);
> +       if (qbr_paddr != q->base_dma) {
> +               dev_info(dev,
> +                        "hardcoded ppn in %s base register, using io memory for the queue\n",
> +                        name);
> +               dev_info(dev, "queue length for %s set to %i\n", name, q->cnt);
> +               q->in_iomem = true;
> +               q->base = ioremap(qbr_paddr, queue_size);
> +               if (!q->base) {
> +                       dev_err(dev, "failed to map %s queue (cnt: %u)\n", name, q->cnt);
> +                       return -ENOMEM;
> +               }
> +               q->base_dma = qbr_paddr;
> +       } else {
> +               /*
> +                * We only failed to set the queue size, re-try to allocate memory with
> +                * the queue size supported by the hw.
> +                */
> +               dev_info(dev, "hardcoded queue size in %s base register\n", name);
> +               dev_info(dev, "retrying with queue length: %i\n", q->cnt);
> +               q->base = dmam_alloc_coherent(dev, queue_size, &q->base_dma, GFP_KERNEL);
> +               if (!q->base) {
> +                       dev_err(dev, "failed to allocate %s queue (cnt: %u)\n",
> +                               name, q->cnt);
> +                       return -ENOMEM;
> +               }
> +       }
> +
> +       qbr_val = phys_to_ppn(q->base_dma) |
> +           FIELD_PREP(RISCV_IOMMU_QUEUE_LOGSZ_FIELD, order - 1);
> +       riscv_iommu_writeq(iommu, q->qbr, qbr_val);
> +
> +       /* Final check to make sure hw accepted our write */
> +       qbr_readback = riscv_iommu_readq(iommu, q->qbr);
> +       if (qbr_readback != qbr_val) {
> +               dev_err(dev, "failed to set base register for %s\n", name);
> +               goto fail;
> +       }
> +
> + irq:
> +       if (request_threaded_irq(irq, irq_check, irq_process, IRQF_ONESHOT | IRQF_SHARED,
> +                                dev_name(dev), q)) {
> +               dev_err(dev, "fail to request irq %d for %s\n", irq, name);
> +               goto fail;
> +       }
> +
> +       q->irq = irq;
> +
> +       /* Note: all RIO_xQ_EN/IE fields are at the same offsets */
> +       ret = riscv_iommu_queue_ctrl(iommu, q, RISCV_IOMMU_QUEUE_ENABLE |
> +                                              RISCV_IOMMU_QUEUE_INTR_ENABLE);
> +       if (ret & RISCV_IOMMU_QUEUE_BUSY) {
> +               dev_err(dev, "%s init timeout\n", name);
> +               ret = -EBUSY;
> +               goto fail;
> +       }
> +
> +       return 0;
> +
> + fail:
> +       riscv_iommu_queue_free(iommu, q);
> +       return ret;
> +}
> +
> +/*
> + * I/O MMU Command queue chapter 3.1
> + */
> +
> +static inline void riscv_iommu_cmd_inval_vma(struct riscv_iommu_command *cmd)
> +{
> +       cmd->dword0 =
> +           FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_IOTINVAL_OPCODE) |
> +           FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_IOTINVAL_FUNC_VMA);
> +       cmd->dword1 = 0;
> +}
> +
> +static inline void riscv_iommu_cmd_inval_set_addr(struct riscv_iommu_command *cmd,
> +                                                 u64 addr)
> +{
> +       cmd->dword0 |= RISCV_IOMMU_CMD_IOTINVAL_AV;
> +       cmd->dword1 = addr;
> +}
> +
> +static inline void riscv_iommu_cmd_inval_set_pscid(struct riscv_iommu_command *cmd,
> +                                                  unsigned pscid)
> +{
> +       cmd->dword0 |= FIELD_PREP(RISCV_IOMMU_CMD_IOTINVAL_PSCID, pscid) |
> +           RISCV_IOMMU_CMD_IOTINVAL_PSCV;
> +}
> +
> +static inline void riscv_iommu_cmd_inval_set_gscid(struct riscv_iommu_command *cmd,
> +                                                  unsigned gscid)
> +{
> +       cmd->dword0 |= FIELD_PREP(RISCV_IOMMU_CMD_IOTINVAL_GSCID, gscid) |
> +           RISCV_IOMMU_CMD_IOTINVAL_GV;
> +}
> +
> +static inline void riscv_iommu_cmd_iofence(struct riscv_iommu_command *cmd)
> +{
> +       cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_IOFENCE_OPCODE) |
> +           FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_IOFENCE_FUNC_C);
> +       cmd->dword1 = 0;
> +}
> +
> +static inline void riscv_iommu_cmd_iofence_set_av(struct riscv_iommu_command *cmd,
> +                                                 u64 addr, u32 data)
> +{
> +       cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_IOFENCE_OPCODE) |
> +           FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_IOFENCE_FUNC_C) |
> +           FIELD_PREP(RISCV_IOMMU_CMD_IOFENCE_DATA, data) | RISCV_IOMMU_CMD_IOFENCE_AV;
> +       cmd->dword1 = (addr >> 2);
> +}
> +
> +static inline void riscv_iommu_cmd_iodir_inval_ddt(struct riscv_iommu_command *cmd)
> +{
> +       cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_IODIR_OPCODE) |
> +           FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_DDT);
> +       cmd->dword1 = 0;
> +}
> +
> +static inline void riscv_iommu_cmd_iodir_inval_pdt(struct riscv_iommu_command *cmd)
> +{
> +       cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_IODIR_OPCODE) |
> +           FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_PDT);
> +       cmd->dword1 = 0;
> +}
> +
> +static inline void riscv_iommu_cmd_iodir_set_did(struct riscv_iommu_command *cmd,
> +                                                unsigned devid)
> +{
> +       cmd->dword0 |=
> +           FIELD_PREP(RISCV_IOMMU_CMD_IODIR_DID, devid) | RISCV_IOMMU_CMD_IODIR_DV;
> +}
> +
> +/* TODO: Convert into lock-less MPSC implementation. */
> +static bool riscv_iommu_post_sync(struct riscv_iommu_device *iommu,
> +                                 struct riscv_iommu_command *cmd, bool sync)
> +{
> +       u32 head, tail, next, last;
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&iommu->cq_lock, flags);
> +       head = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQH) & (iommu->cmdq.cnt - 1);
> +       tail = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQT) & (iommu->cmdq.cnt - 1);
> +       last = iommu->cmdq.lui;
> +       if (tail != last) {
> +               spin_unlock_irqrestore(&iommu->cq_lock, flags);
> +               /*
> +                * FIXME: This is a workaround for dropped MMIO writes/reads on the QEMU platform.
> +                *        While debugging of the problem is still ongoing, this provides
> +                *        a simple implementation of a try-again policy.
> +                *        Will be changed to a lock-less algorithm in the future.
> +                */
> +               dev_dbg(iommu->dev, "IOMMU CQT: %x != %x (1st)\n", last, tail);
> +               spin_lock_irqsave(&iommu->cq_lock, flags);
> +               tail =
> +                   riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQT) & (iommu->cmdq.cnt - 1);
> +               last = iommu->cmdq.lui;
> +               if (tail != last) {
> +                       spin_unlock_irqrestore(&iommu->cq_lock, flags);
> +                       dev_dbg(iommu->dev, "IOMMU CQT: %x != %x (2nd)\n", last, tail);
> +                       spin_lock_irqsave(&iommu->cq_lock, flags);
> +               }
> +       }
> +
> +       next = (last + 1) & (iommu->cmdq.cnt - 1);
> +       if (next != head) {
> +               struct riscv_iommu_command *ptr = iommu->cmdq.base;
> +               ptr[last] = *cmd;
> +               wmb();
> +               riscv_iommu_writel(iommu, RISCV_IOMMU_REG_CQT, next);
> +               iommu->cmdq.lui = next;
> +       }
> +
> +       spin_unlock_irqrestore(&iommu->cq_lock, flags);
> +
> +       if (sync && head != next) {
> +               cycles_t start_time = get_cycles();
> +               while (1) {
> +                       last = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQH) &
> +                           (iommu->cmdq.cnt - 1);
> +                       if (head < next && last >= next)
> +                               break;
> +                       if (head > next && last < head && last >= next)
> +                               break;
> +                       if (RISCV_IOMMU_TIMEOUT < (get_cycles() - start_time)) {

This timeout check will be imprecise: we are not in an IRQ-disabled
context here, so this thread can be scheduled out or preempted. By the
time we get back it may be well over 1 second, even though the IOFENCE
has actually completed.
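
One way to make it tolerant of preemption (a sketch against the loop
above, untested) is to latch the deadline in a flag and re-read CQH one
more time before declaring failure:

                bool timedout = false;

                while (1) {
                        last = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQH) &
                               (iommu->cmdq.cnt - 1);
                        if (head < next && last >= next)
                                break;
                        if (head > next && last < head && last >= next)
                                break;
                        /* Give up only if the fence was still incomplete on
                         * a read made *after* the deadline passed, so being
                         * scheduled out is not misreported as a timeout. */
                        if (timedout) {
                                dev_err(iommu->dev, "IOFENCE TIMEOUT\n");
                                return false;
                        }
                        if (RISCV_IOMMU_TIMEOUT < (get_cycles() - start_time))
                                timedout = true;
                        cpu_relax();
                }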

> +                               dev_err(iommu->dev, "IOFENCE TIMEOUT\n");
> +                               return false;
> +                       }
> +                       cpu_relax();
> +               }
> +       }
> +
> +       return next != head;
> +}
> +
> +static bool riscv_iommu_post(struct riscv_iommu_device *iommu,
> +                            struct riscv_iommu_command *cmd)
> +{
> +       return riscv_iommu_post_sync(iommu, cmd, false);
> +}
> +
> +static bool riscv_iommu_iofence_sync(struct riscv_iommu_device *iommu)
> +{
> +       struct riscv_iommu_command cmd;
> +       riscv_iommu_cmd_iofence(&cmd);
> +       return riscv_iommu_post_sync(iommu, &cmd, true);
> +}
> +
> +/* Command queue primary interrupt handler */
> +static irqreturn_t riscv_iommu_cmdq_irq_check(int irq, void *data)
> +{
> +       struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
> +       struct riscv_iommu_device *iommu =
> +           container_of(q, struct riscv_iommu_device, cmdq);
> +       u32 ipsr = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_IPSR);
> +       if (ipsr & RISCV_IOMMU_IPSR_CIP)
> +               return IRQ_WAKE_THREAD;
> +       return IRQ_NONE;
> +}
> +
> +/* Command queue interrupt handler thread function */
> +static irqreturn_t riscv_iommu_cmdq_process(int irq, void *data)
> +{
> +       struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
> +       struct riscv_iommu_device *iommu;
> +       unsigned ctrl;
> +
> +       iommu = container_of(q, struct riscv_iommu_device, cmdq);
> +
> +       /* Error reporting, clear error reports if any. */
> +       ctrl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQCSR);
> +       if (ctrl & (RISCV_IOMMU_CQCSR_CQMF |
> +                   RISCV_IOMMU_CQCSR_CMD_TO | RISCV_IOMMU_CQCSR_CMD_ILL)) {
> +               riscv_iommu_queue_ctrl(iommu, &iommu->cmdq, ctrl);
> +               dev_warn_ratelimited(iommu->dev,
> +                                    "Command queue error: fault: %d tout: %d err: %d\n",
> +                                    !!(ctrl & RISCV_IOMMU_CQCSR_CQMF),
> +                                    !!(ctrl & RISCV_IOMMU_CQCSR_CMD_TO),
> +                                    !!(ctrl & RISCV_IOMMU_CQCSR_CMD_ILL));

We need to handle the error by either adjusting the tail to remove the
failed command or fixing the failed command itself. Otherwise, the
failed command will stay in the queue and the IOMMU will keep trying to
execute it. I guess the first option might be easier to implement.
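
For reference, a rough sketch of that first option (illustrative only;
it assumes, per my reading of the spec, that cqh points at the
offending command when cmd_ill/cmd_to is reported):

        if (ctrl & (RISCV_IOMMU_CQCSR_CMD_TO | RISCV_IOMMU_CQCSR_CMD_ILL)) {
                struct riscv_iommu_command *ptr = iommu->cmdq.base;
                u32 bad = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQH) &
                          (iommu->cmdq.cnt - 1);

                /* Overwrite the offending entry with a harmless IOFENCE.C,
                 * then clear the error bits and re-enable the queue. */
                riscv_iommu_cmd_iofence(&ptr[bad]);
                wmb();
                riscv_iommu_queue_ctrl(iommu, &iommu->cmdq, ctrl);
        }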

> +       }
> +
> +       /* Clear fault interrupt pending. */
> +       riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IPSR, RISCV_IOMMU_IPSR_CIP);
> +
> +       return IRQ_HANDLED;
> +}
> +
> +/*
> + * Fault/event queue, chapter 3.2
> + */
> +
> +static void riscv_iommu_fault_report(struct riscv_iommu_device *iommu,
> +                                    struct riscv_iommu_fq_record *event)
> +{
> +       unsigned err, devid;
> +
> +       err = FIELD_GET(RISCV_IOMMU_FQ_HDR_CAUSE, event->hdr);
> +       devid = FIELD_GET(RISCV_IOMMU_FQ_HDR_DID, event->hdr);
> +
> +       dev_warn_ratelimited(iommu->dev,
> +                            "Fault %d devid: %d" " iotval: %llx iotval2: %llx\n", err,
> +                            devid, event->iotval, event->iotval2);
> +}
> +
> +/* Fault/event queue primary interrupt handler */
> +static irqreturn_t riscv_iommu_fltq_irq_check(int irq, void *data)
> +{
> +       struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
> +       struct riscv_iommu_device *iommu =
> +           container_of(q, struct riscv_iommu_device, fltq);
> +       u32 ipsr = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_IPSR);
> +       if (ipsr & RISCV_IOMMU_IPSR_FIP)
> +               return IRQ_WAKE_THREAD;
> +       return IRQ_NONE;
> +}
> +
> +/* Fault queue interrupt handler thread function */
> +static irqreturn_t riscv_iommu_fltq_process(int irq, void *data)
> +{
> +       struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
> +       struct riscv_iommu_device *iommu;
> +       struct riscv_iommu_fq_record *events;
> +       unsigned cnt, len, idx, ctrl;
> +
> +       iommu = container_of(q, struct riscv_iommu_device, fltq);
> +       events = (struct riscv_iommu_fq_record *)q->base;
> +
> +       /* Error reporting, clear error reports if any. */
> +       ctrl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_FQCSR);
> +       if (ctrl & (RISCV_IOMMU_FQCSR_FQMF | RISCV_IOMMU_FQCSR_FQOF)) {
> +               riscv_iommu_queue_ctrl(iommu, &iommu->fltq, ctrl);
> +               dev_warn_ratelimited(iommu->dev,
> +                                    "Fault queue error: fault: %d full: %d\n",
> +                                    !!(ctrl & RISCV_IOMMU_FQCSR_FQMF),
> +                                    !!(ctrl & RISCV_IOMMU_FQCSR_FQOF));
> +       }
> +
> +       /* Clear fault interrupt pending. */
> +       riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IPSR, RISCV_IOMMU_IPSR_FIP);
> +
> +       /* Report fault events. */
> +       do {
> +               cnt = riscv_iommu_queue_consume(iommu, q, &idx);
> +               if (!cnt)
> +                       break;
> +               for (len = 0; len < cnt; idx++, len++)
> +                       riscv_iommu_fault_report(iommu, &events[idx]);
> +               riscv_iommu_queue_release(iommu, q, cnt);
> +       } while (1);
> +
> +       return IRQ_HANDLED;
> +}
> +
> +/*
> + * Page request queue, chapter 3.3
> + */
> +
>  /*
>   * Register device for IOMMU tracking.
>   */
> @@ -97,6 +600,54 @@ static void riscv_iommu_add_device(struct riscv_iommu_device *iommu, struct devi
>         mutex_unlock(&iommu->eps_mutex);
>  }
>
> +/* Page request interface queue primary interrupt handler */
> +static irqreturn_t riscv_iommu_priq_irq_check(int irq, void *data)
> +{
> +       struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
> +       struct riscv_iommu_device *iommu =
> +           container_of(q, struct riscv_iommu_device, priq);
> +       u32 ipsr = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_IPSR);
> +       if (ipsr & RISCV_IOMMU_IPSR_PIP)
> +               return IRQ_WAKE_THREAD;
> +       return IRQ_NONE;
> +}
> +
> +/* Page request interface queue interrupt handler thread function */
> +static irqreturn_t riscv_iommu_priq_process(int irq, void *data)
> +{
> +       struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
> +       struct riscv_iommu_device *iommu;
> +       struct riscv_iommu_pq_record *requests;
> +       unsigned cnt, idx, ctrl;
> +
> +       iommu = container_of(q, struct riscv_iommu_device, priq);
> +       requests = (struct riscv_iommu_pq_record *)q->base;
> +
> +       /* Error reporting, clear error reports if any. */
> +       ctrl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_PQCSR);
> +       if (ctrl & (RISCV_IOMMU_PQCSR_PQMF | RISCV_IOMMU_PQCSR_PQOF)) {
> +               riscv_iommu_queue_ctrl(iommu, &iommu->priq, ctrl);
> +               dev_warn_ratelimited(iommu->dev,
> +                                    "Page request queue error: fault: %d full: %d\n",
> +                                    !!(ctrl & RISCV_IOMMU_PQCSR_PQMF),
> +                                    !!(ctrl & RISCV_IOMMU_PQCSR_PQOF));
> +       }
> +
> +       /* Clear page request interrupt pending. */
> +       riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IPSR, RISCV_IOMMU_IPSR_PIP);
> +
> +       /* Process page requests. */
> +       do {
> +               cnt = riscv_iommu_queue_consume(iommu, q, &idx);
> +               if (!cnt)
> +                       break;
> +               dev_warn(iommu->dev, "unexpected %u page requests\n", cnt);
> +               riscv_iommu_queue_release(iommu, q, cnt);
> +       } while (1);
> +
> +       return IRQ_HANDLED;
> +}
> +
>  /*
>   * Endpoint management
>   */
> @@ -350,7 +901,29 @@ static void riscv_iommu_flush_iotlb_range(struct iommu_domain *iommu_domain,
>                                           unsigned long *start, unsigned long *end,
>                                           size_t *pgsize)
>  {
> -       /* Command interface not implemented */
> +       struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
> +       struct riscv_iommu_command cmd;
> +       unsigned long iova;
> +
> +       if (domain->mode == RISCV_IOMMU_DC_FSC_MODE_BARE)
> +               return;
> +
> +       /* Domain not attached to an IOMMU! */
> +       BUG_ON(!domain->iommu);
> +
> +       riscv_iommu_cmd_inval_vma(&cmd);
> +       riscv_iommu_cmd_inval_set_pscid(&cmd, domain->pscid);
> +
> +       if (start && end && pgsize) {
> +               /* Cover only the range that is needed */
> +               for (iova = *start; iova <= *end; iova += *pgsize) {
> +                       riscv_iommu_cmd_inval_set_addr(&cmd, iova);
> +                       riscv_iommu_post(domain->iommu, &cmd);
> +               }
> +       } else {
> +               riscv_iommu_post(domain->iommu, &cmd);
> +       }
> +       riscv_iommu_iofence_sync(domain->iommu);
>  }
>
>  static void riscv_iommu_flush_iotlb_all(struct iommu_domain *iommu_domain)
> @@ -610,6 +1183,9 @@ void riscv_iommu_remove(struct riscv_iommu_device *iommu)
>         iommu_device_unregister(&iommu->iommu);
>         iommu_device_sysfs_remove(&iommu->iommu);
>         riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_OFF);
> +       riscv_iommu_queue_free(iommu, &iommu->cmdq);
> +       riscv_iommu_queue_free(iommu, &iommu->fltq);
> +       riscv_iommu_queue_free(iommu, &iommu->priq);
>  }
>
>  int riscv_iommu_init(struct riscv_iommu_device *iommu)
> @@ -632,6 +1208,16 @@ int riscv_iommu_init(struct riscv_iommu_device *iommu)
>         }
>  #endif
>
> +       /*
> +        * Assign queue lengths from module parameters if not already
> +        * set on the device tree.
> +        */
> +       if (!iommu->cmdq_len)
> +               iommu->cmdq_len = cmdq_length;
> +       if (!iommu->fltq_len)
> +               iommu->fltq_len = fltq_length;
> +       if (!iommu->priq_len)
> +               iommu->priq_len = priq_length;
>         /* Clear any pending interrupt flag. */
>         riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IPSR,
>                            RISCV_IOMMU_IPSR_CIP |
> @@ -639,7 +1225,20 @@ int riscv_iommu_init(struct riscv_iommu_device *iommu)
>                            RISCV_IOMMU_IPSR_PMIP | RISCV_IOMMU_IPSR_PIP);
>         spin_lock_init(&iommu->cq_lock);
>         mutex_init(&iommu->eps_mutex);
> +       ret = riscv_iommu_queue_init(iommu, RISCV_IOMMU_COMMAND_QUEUE);
> +       if (ret)
> +               goto fail;
> +       ret = riscv_iommu_queue_init(iommu, RISCV_IOMMU_FAULT_QUEUE);
> +       if (ret)
> +               goto fail;
> +       if (!(iommu->cap & RISCV_IOMMU_CAP_ATS))
> +               goto no_ats;
> +
> +       ret = riscv_iommu_queue_init(iommu, RISCV_IOMMU_PAGE_REQUEST_QUEUE);
> +       if (ret)
> +               goto fail;
>
> + no_ats:
>         ret = riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_BARE);
>
>         if (ret) {
> @@ -663,5 +1262,8 @@ int riscv_iommu_init(struct riscv_iommu_device *iommu)
>         return 0;
>   fail:
>         riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_OFF);
> +       riscv_iommu_queue_free(iommu, &iommu->priq);
> +       riscv_iommu_queue_free(iommu, &iommu->fltq);
> +       riscv_iommu_queue_free(iommu, &iommu->cmdq);
>         return ret;
>  }
> diff --git a/drivers/iommu/riscv/iommu.h b/drivers/iommu/riscv/iommu.h
> index 7dc9baa59a50..04148a2a8ffd 100644
> --- a/drivers/iommu/riscv/iommu.h
> +++ b/drivers/iommu/riscv/iommu.h
> @@ -28,6 +28,24 @@
>  #define IOMMU_PAGE_SIZE_1G     BIT_ULL(30)
>  #define IOMMU_PAGE_SIZE_512G   BIT_ULL(39)
>
> +struct riscv_iommu_queue {
> +       dma_addr_t base_dma;    /* ring buffer bus address */
> +       void *base;             /* ring buffer pointer */
> +       size_t len;             /* single item length */
> +       u32 cnt;                /* items count */
> +       u32 lui;                /* last used index, consumer/producer share */
> +       unsigned qbr;           /* queue base register offset */
> +       unsigned qcr;           /* queue control and status register offset */
> +       int irq;                /* registered interrupt number */
> +       bool in_iomem;          /* indicates queue data are in I/O memory  */
> +};
> +
> +enum riscv_queue_ids {
> +       RISCV_IOMMU_COMMAND_QUEUE       = 0,
> +       RISCV_IOMMU_FAULT_QUEUE         = 1,
> +       RISCV_IOMMU_PAGE_REQUEST_QUEUE  = 2
> +};
> +
>  struct riscv_iommu_device {
>         struct iommu_device iommu;      /* iommu core interface */
>         struct device *dev;             /* iommu hardware */
> @@ -42,6 +60,11 @@ struct riscv_iommu_device {
>         int irq_pm;
>         int irq_priq;
>
> +       /* Queue lengths */
> +       int cmdq_len;
> +       int fltq_len;
> +       int priq_len;
> +
>         /* supported and enabled hardware capabilities */
>         u64 cap;
>
> @@ -53,6 +76,11 @@ struct riscv_iommu_device {
>         unsigned ddt_mode;
>         bool ddtp_in_iomem;
>
> +       /* hardware queues */
> +       struct riscv_iommu_queue cmdq;
> +       struct riscv_iommu_queue fltq;
> +       struct riscv_iommu_queue priq;
> +
>         /* Connected end-points */
>         struct rb_root eps;
>         struct mutex eps_mutex;
> --
> 2.34.1
>

* Re: [PATCH 08/11] RISC-V: drivers/iommu/riscv: Add page table support
  2023-07-19 19:33 ` [PATCH 08/11] RISC-V: drivers/iommu/riscv: Add page table support Tomasz Jeznach
  2023-07-25 13:13   ` Zong Li
@ 2023-07-31  7:19   ` Zong Li
  2023-08-16 21:04   ` Robin Murphy
  2 siblings, 0 replies; 86+ messages in thread
From: Zong Li @ 2023-07-31  7:19 UTC (permalink / raw)
  To: Tomasz Jeznach
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley,
	Anup Patel, Albert Ou, linux, linux-kernel, Sebastien Boeuf,
	iommu, Palmer Dabbelt, Nick Kossifidis, linux-riscv

On Thu, Jul 20, 2023 at 3:34 AM Tomasz Jeznach <tjeznach@rivosinc.com> wrote:
>
> Introduces I/O page level translation services, with 4K, 2M, 1G page
> size support and enables page level iommu_map/unmap domain interfaces.
>
> Co-developed-by: Sebastien Boeuf <seb@rivosinc.com>
> Signed-off-by: Sebastien Boeuf <seb@rivosinc.com>
> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
> ---
>  drivers/iommu/io-pgtable.c       |   3 +
>  drivers/iommu/riscv/Makefile     |   2 +-
>  drivers/iommu/riscv/io_pgtable.c | 266 +++++++++++++++++++++++++++++++
>  drivers/iommu/riscv/iommu.c      |  40 +++--
>  drivers/iommu/riscv/iommu.h      |   1 +
>  include/linux/io-pgtable.h       |   2 +
>  6 files changed, 297 insertions(+), 17 deletions(-)
>  create mode 100644 drivers/iommu/riscv/io_pgtable.c
>
> diff --git a/drivers/iommu/io-pgtable.c b/drivers/iommu/io-pgtable.c
> index b843fcd365d2..c4807175934f 100644
> --- a/drivers/iommu/io-pgtable.c
> +++ b/drivers/iommu/io-pgtable.c
> @@ -32,6 +32,9 @@ io_pgtable_init_table[IO_PGTABLE_NUM_FMTS] = {
>         [AMD_IOMMU_V1] = &io_pgtable_amd_iommu_v1_init_fns,
>         [AMD_IOMMU_V2] = &io_pgtable_amd_iommu_v2_init_fns,
>  #endif
> +#ifdef CONFIG_RISCV_IOMMU
> +       [RISCV_IOMMU] = &io_pgtable_riscv_init_fns,
> +#endif
>  };
>
>  struct io_pgtable_ops *alloc_io_pgtable_ops(enum io_pgtable_fmt fmt,
> diff --git a/drivers/iommu/riscv/Makefile b/drivers/iommu/riscv/Makefile
> index 9523eb053cfc..13af452c3052 100644
> --- a/drivers/iommu/riscv/Makefile
> +++ b/drivers/iommu/riscv/Makefile
> @@ -1 +1 @@
> -obj-$(CONFIG_RISCV_IOMMU) += iommu.o iommu-pci.o iommu-platform.o iommu-sysfs.o
> \ No newline at end of file
> +obj-$(CONFIG_RISCV_IOMMU) += iommu.o iommu-pci.o iommu-platform.o iommu-sysfs.o io_pgtable.o
> diff --git a/drivers/iommu/riscv/io_pgtable.c b/drivers/iommu/riscv/io_pgtable.c
> new file mode 100644
> index 000000000000..b6e603e6726e
> --- /dev/null
> +++ b/drivers/iommu/riscv/io_pgtable.c
> @@ -0,0 +1,266 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright © 2022-2023 Rivos Inc.
> + *
> + * RISC-V IOMMU page table allocator.
> + *
> + * Authors:
> + *     Tomasz Jeznach <tjeznach@rivosinc.com>
> + *     Sebastien Boeuf <seb@rivosinc.com>
> + */
> +
> +#include <linux/atomic.h>
> +#include <linux/bitops.h>
> +#include <linux/io-pgtable.h>
> +#include <linux/kernel.h>
> +#include <linux/sizes.h>
> +#include <linux/slab.h>
> +#include <linux/types.h>
> +#include <linux/dma-mapping.h>
> +
> +#include "iommu.h"
> +
> +#define io_pgtable_to_domain(x) \
> +       container_of((x), struct riscv_iommu_domain, pgtbl)
> +
> +#define io_pgtable_ops_to_domain(x) \
> +       io_pgtable_to_domain(container_of((x), struct io_pgtable, ops))
> +
> +static inline size_t get_page_size(size_t size)
> +{
> +       if (size >= IOMMU_PAGE_SIZE_512G)
> +               return IOMMU_PAGE_SIZE_512G;
> +
> +       if (size >= IOMMU_PAGE_SIZE_1G)
> +               return IOMMU_PAGE_SIZE_1G;
> +
> +       if (size >= IOMMU_PAGE_SIZE_2M)
> +               return IOMMU_PAGE_SIZE_2M;
> +
> +       return IOMMU_PAGE_SIZE_4K;
> +}
> +
> +static void riscv_iommu_pt_walk_free(pmd_t * ptp, unsigned shift, bool root)
> +{
> +       pmd_t *pte, *pt_base;
> +       int i;
> +
> +       if (shift == PAGE_SHIFT)
> +               return;
> +
> +       if (root)
> +               pt_base = ptp;
> +       else
> +               pt_base =
> +                   (pmd_t *) pfn_to_virt(__page_val_to_pfn(pmd_val(*ptp)));
> +
> +       /* Recursively free all sub page table pages */
> +       for (i = 0; i < PTRS_PER_PMD; i++) {
> +               pte = pt_base + i;
> +               if (pmd_present(*pte) && !pmd_leaf(*pte))
> +                       riscv_iommu_pt_walk_free(pte, shift - 9, false);
> +       }
> +
> +       /* Now free the current page table page */
> +       if (!root && pmd_present(*pt_base))
> +               free_page((unsigned long)pt_base);
> +}
> +
> +static void riscv_iommu_free_pgtable(struct io_pgtable *iop)
> +{
> +       struct riscv_iommu_domain *domain = io_pgtable_to_domain(iop);
> +       riscv_iommu_pt_walk_free((pmd_t *) domain->pgd_root, PGDIR_SHIFT, true);
> +}
> +
> +static pte_t *riscv_iommu_pt_walk_alloc(pmd_t * ptp, unsigned long iova,
> +                                       unsigned shift, bool root,
> +                                       size_t pgsize,
> +                                       unsigned long (*pd_alloc)(gfp_t),
> +                                       gfp_t gfp)
> +{
> +       pmd_t *pte;
> +       unsigned long pfn;
> +
> +       if (root)
> +               pte = ptp + ((iova >> shift) & (PTRS_PER_PMD - 1));
> +       else
> +               pte = (pmd_t *) pfn_to_virt(__page_val_to_pfn(pmd_val(*ptp))) +
> +                   ((iova >> shift) & (PTRS_PER_PMD - 1));
> +
> +       if ((1ULL << shift) <= pgsize) {
> +               if (pmd_present(*pte) && !pmd_leaf(*pte))
> +                       riscv_iommu_pt_walk_free(pte, shift - 9, false);
> +               return (pte_t *) pte;
> +       }
> +
> +       if (pmd_none(*pte)) {
> +               pfn = pd_alloc ? virt_to_pfn(pd_alloc(gfp)) : 0;
> +               if (!pfn)
> +                       return NULL;
> +               set_pmd(pte, __pmd((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE));
> +       }
> +
> +       return riscv_iommu_pt_walk_alloc(pte, iova, shift - 9, false,
> +                                        pgsize, pd_alloc, gfp);
> +}
> +
> +static pte_t *riscv_iommu_pt_walk_fetch(pmd_t * ptp,
> +                                       unsigned long iova, unsigned shift,
> +                                       bool root)
> +{
> +       pmd_t *pte;
> +
> +       if (root)
> +               pte = ptp + ((iova >> shift) & (PTRS_PER_PMD - 1));
> +       else
> +               pte = (pmd_t *) pfn_to_virt(__page_val_to_pfn(pmd_val(*ptp))) +
> +                   ((iova >> shift) & (PTRS_PER_PMD - 1));
> +
> +       if (pmd_leaf(*pte))
> +               return (pte_t *) pte;
> +       else if (pmd_none(*pte))
> +               return NULL;
> +       else if (shift == PAGE_SHIFT)
> +               return NULL;
> +
> +       return riscv_iommu_pt_walk_fetch(pte, iova, shift - 9, false);
> +}
> +
> +static int riscv_iommu_map_pages(struct io_pgtable_ops *ops,
> +                                unsigned long iova, phys_addr_t phys,
> +                                size_t pgsize, size_t pgcount, int prot,
> +                                gfp_t gfp, size_t *mapped)
> +{
> +       struct riscv_iommu_domain *domain = io_pgtable_ops_to_domain(ops);
> +       size_t size = 0;
> +       size_t page_size = get_page_size(pgsize);
> +       pte_t *pte;
> +       pte_t pte_val;
> +       pgprot_t pte_prot;
> +
> +       if (domain->domain.type == IOMMU_DOMAIN_BLOCKED)
> +               return -ENODEV;
> +
> +       if (domain->domain.type == IOMMU_DOMAIN_IDENTITY) {
> +               *mapped = pgsize * pgcount;
> +               return 0;
> +       }
> +
> +       pte_prot = (prot & IOMMU_WRITE) ?
> +           __pgprot(_PAGE_BASE | _PAGE_READ | _PAGE_WRITE | _PAGE_DIRTY) :
> +           __pgprot(_PAGE_BASE | _PAGE_READ);
> +
> +       while (pgcount--) {
> +               pte =
> +                   riscv_iommu_pt_walk_alloc((pmd_t *) domain->pgd_root, iova,
> +                                             PGDIR_SHIFT, true, page_size,
> +                                             get_zeroed_page, gfp);
> +               if (!pte) {
> +                       *mapped = size;
> +                       return -ENOMEM;
> +               }
> +
> +               pte_val = pfn_pte(phys_to_pfn(phys), pte_prot);
> +
> +               set_pte(pte, pte_val);
> +
> +               size += page_size;
> +               iova += page_size;
> +               phys += page_size;
> +       }
> +
> +       *mapped = size;
> +       return 0;
> +}
> +
> +static size_t riscv_iommu_unmap_pages(struct io_pgtable_ops *ops,
> +                                     unsigned long iova, size_t pgsize,
> +                                     size_t pgcount,
> +                                     struct iommu_iotlb_gather *gather)
> +{
> +       struct riscv_iommu_domain *domain = io_pgtable_ops_to_domain(ops);
> +       size_t size = 0;
> +       size_t page_size = get_page_size(pgsize);
> +       pte_t *pte;
> +
> +       if (domain->domain.type == IOMMU_DOMAIN_IDENTITY)
> +               return pgsize * pgcount;
> +
> +       while (pgcount--) {
> +               pte = riscv_iommu_pt_walk_fetch((pmd_t *) domain->pgd_root,
> +                                               iova, PGDIR_SHIFT, true);
> +               if (!pte)
> +                       return size;
> +
> +               set_pte(pte, __pte(0));
> +
> +               iommu_iotlb_gather_add_page(&domain->domain, gather, iova,
> +                                           pgsize);
> +
> +               size += page_size;
> +               iova += page_size;
> +       }
> +
> +       return size;
> +}
> +
> +static phys_addr_t riscv_iommu_iova_to_phys(struct io_pgtable_ops *ops,
> +                                           unsigned long iova)
> +{
> +       struct riscv_iommu_domain *domain = io_pgtable_ops_to_domain(ops);
> +       pte_t *pte;
> +
> +       if (domain->domain.type == IOMMU_DOMAIN_IDENTITY)
> +               return (phys_addr_t) iova;
> +
> +       pte = riscv_iommu_pt_walk_fetch((pmd_t *) domain->pgd_root,
> +                                       iova, PGDIR_SHIFT, true);
> +       if (!pte || !pte_present(*pte))
> +               return 0;
> +
> +       return (pfn_to_phys(pte_pfn(*pte)) | (iova & PAGE_MASK));

As I mentioned in the last mail, this should be (iova & ~PAGE_MASK) to
keep the low 12 bits; (iova & PAGE_MASK) drops the page offset instead.
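
Untested, but I would expect:

	return pfn_to_phys(pte_pfn(*pte)) | (iova & ~PAGE_MASK);

so the page-offset bits of the IOVA survive in the returned physical
address.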

> +}
> +
> +static void riscv_iommu_tlb_inv_all(void *cookie)
> +{
> +}
> +
> +static void riscv_iommu_tlb_inv_walk(unsigned long iova, size_t size,
> +                                    size_t granule, void *cookie)
> +{
> +}
> +

What should we do in these callbacks? Perhaps post an IOTINVAL command
to the IOMMU?
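
A rough, untested sketch of what I have in mind, reusing the helpers
this series already adds (the io-pgtable cookie here is the
riscv_iommu_domain, and domain->iommu would need a NULL check before
the domain is attached to an IOMMU):

static void riscv_iommu_tlb_inv_all(void *cookie)
{
	struct riscv_iommu_domain *domain = cookie;
	struct riscv_iommu_command cmd;

	/* IOTINVAL.VMA scoped to this domain's PSCID */
	riscv_iommu_cmd_inval_vma(&cmd);
	riscv_iommu_cmd_inval_set_pscid(&cmd, domain->pscid);
	riscv_iommu_post(domain->iommu, &cmd);
	riscv_iommu_iofence_sync(domain->iommu);
}

The ranged callbacks could do the same with
riscv_iommu_cmd_inval_set_addr() per page, as
riscv_iommu_flush_iotlb_range() already does.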

> +static void riscv_iommu_tlb_add_page(struct iommu_iotlb_gather *gather,
> +                                    unsigned long iova, size_t granule,
> +                                    void *cookie)
> +{
> +}
> +
> +static const struct iommu_flush_ops riscv_iommu_flush_ops = {
> +       .tlb_flush_all = riscv_iommu_tlb_inv_all,
> +       .tlb_flush_walk = riscv_iommu_tlb_inv_walk,
> +       .tlb_add_page = riscv_iommu_tlb_add_page,
> +};
> +
> +/* NOTE: cfg should point to riscv_iommu_domain structure member pgtbl.cfg */
> +static struct io_pgtable *riscv_iommu_alloc_pgtable(struct io_pgtable_cfg *cfg,
> +                                                   void *cookie)
> +{
> +       struct io_pgtable *iop = container_of(cfg, struct io_pgtable, cfg);
> +
> +       cfg->pgsize_bitmap = SZ_4K | SZ_2M | SZ_1G;
> +       cfg->ias = 57;          // va mode, SvXX -> ias
> +       cfg->oas = 57;          // pa mode, or SvXX+4 -> oas

Is it possible to use VA_BITS instead of this magic number?
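
For example (untested; assumes the IOMMU implements the same SvXX mode
the kernel runs with, which matches how riscv_iommu_domain_finalize()
derives domain->mode from satp_mode):

	cfg->ias = VA_BITS;	/* 39/48/57 for Sv39/Sv48/Sv57 */
	cfg->oas = VA_BITS;	/* may still need the SvXX+4 adjustment */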

> +       cfg->tlb = &riscv_iommu_flush_ops;
> +
> +       iop->ops.map_pages = riscv_iommu_map_pages;
> +       iop->ops.unmap_pages = riscv_iommu_unmap_pages;
> +       iop->ops.iova_to_phys = riscv_iommu_iova_to_phys;
> +
> +       return iop;
> +}
> +
> +struct io_pgtable_init_fns io_pgtable_riscv_init_fns = {
> +       .alloc = riscv_iommu_alloc_pgtable,
> +       .free = riscv_iommu_free_pgtable,
> +};
> diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
> index 9ee7d2b222b5..2ef6952a2109 100644
> --- a/drivers/iommu/riscv/iommu.c
> +++ b/drivers/iommu/riscv/iommu.c
> @@ -807,7 +807,7 @@ static struct iommu_device *riscv_iommu_probe_device(struct device *dev)
>         /* Initial DC pointer can be NULL if IOMMU is configured in OFF or BARE mode */
>         ep->dc = riscv_iommu_get_dc(iommu, ep->devid);
>
> -       dev_info(iommu->dev, "adding device to iommu with devid %i in domain %i\n",
> +       dev_dbg(iommu->dev, "adding device to iommu with devid %i in domain %i\n",
>                 ep->devid, ep->domid);
>
>         dev_iommu_priv_set(dev, ep);
> @@ -874,7 +874,10 @@ static struct iommu_domain *riscv_iommu_domain_alloc(unsigned type)
>  {
>         struct riscv_iommu_domain *domain;
>
> -       if (type != IOMMU_DOMAIN_IDENTITY &&
> +       if (type != IOMMU_DOMAIN_DMA &&
> +           type != IOMMU_DOMAIN_DMA_FQ &&
> +           type != IOMMU_DOMAIN_UNMANAGED &&
> +           type != IOMMU_DOMAIN_IDENTITY &&
>             type != IOMMU_DOMAIN_BLOCKED)
>                 return NULL;
>
> @@ -890,7 +893,7 @@ static struct iommu_domain *riscv_iommu_domain_alloc(unsigned type)
>         domain->pscid = ida_alloc_range(&riscv_iommu_pscids, 1,
>                                         RISCV_IOMMU_MAX_PSCID, GFP_KERNEL);
>
> -       printk("domain type %x alloc %u\n", type, domain->pscid);
> +       printk("domain alloc %u\n", domain->pscid);
>
>         return &domain->domain;
>  }
> @@ -903,6 +906,9 @@ static void riscv_iommu_domain_free(struct iommu_domain *iommu_domain)
>                 pr_warn("IOMMU domain is not empty!\n");
>         }
>
> +       if (domain->pgtbl.cookie)
> +               free_io_pgtable_ops(&domain->pgtbl.ops);
> +
>         if (domain->pgd_root)
>                 free_pages((unsigned long)domain->pgd_root, 0);
>
> @@ -959,6 +965,9 @@ static int riscv_iommu_domain_finalize(struct riscv_iommu_domain *domain,
>         if (!domain->pgd_root)
>                 return -ENOMEM;
>
> +       if (!alloc_io_pgtable_ops(RISCV_IOMMU, &domain->pgtbl.cfg, domain))
> +               return -ENOMEM;
> +
>         return 0;
>  }
>
> @@ -1006,9 +1015,8 @@ static int riscv_iommu_attach_dev(struct iommu_domain *iommu_domain, struct devi
>                 return 0;
>         }
>
> -       if (!dc) {
> +       if (!dc)
>                 return -ENODEV;
> -       }
>
>         /*
>          * S-Stage translation table. G-Stage remains unmodified (BARE).
> @@ -1104,12 +1112,11 @@ static int riscv_iommu_map_pages(struct iommu_domain *iommu_domain,
>  {
>         struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
>
> -       if (domain->domain.type == IOMMU_DOMAIN_IDENTITY) {
> -               *mapped = pgsize * pgcount;
> -               return 0;
> -       }
> +       if (!domain->pgtbl.ops.map_pages)
> +               return -ENODEV;
>
> -       return -ENODEV;
> +       return domain->pgtbl.ops.map_pages(&domain->pgtbl.ops, iova, phys,
> +                                          pgsize, pgcount, prot, gfp, mapped);
>  }
>
>  static size_t riscv_iommu_unmap_pages(struct iommu_domain *iommu_domain,
> @@ -1118,10 +1125,11 @@ static size_t riscv_iommu_unmap_pages(struct iommu_domain *iommu_domain,
>  {
>         struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
>
> -       if (domain->domain.type == IOMMU_DOMAIN_IDENTITY)
> -               return pgsize * pgcount;
> +       if (!domain->pgtbl.ops.unmap_pages)
> +               return 0;
>
> -       return 0;
> +       return domain->pgtbl.ops.unmap_pages(&domain->pgtbl.ops, iova, pgsize,
> +                                            pgcount, gather);
>  }
>
>  static phys_addr_t riscv_iommu_iova_to_phys(struct iommu_domain *iommu_domain,
> @@ -1129,10 +1137,10 @@ static phys_addr_t riscv_iommu_iova_to_phys(struct iommu_domain *iommu_domain,
>  {
>         struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
>
> -       if (domain->domain.type == IOMMU_DOMAIN_IDENTITY)
> -               return (phys_addr_t) iova;
> +       if (!domain->pgtbl.ops.iova_to_phys)
> +               return 0;
>
> -       return 0;
> +       return domain->pgtbl.ops.iova_to_phys(&domain->pgtbl.ops, iova);
>  }
>
>  /*
> diff --git a/drivers/iommu/riscv/iommu.h b/drivers/iommu/riscv/iommu.h
> index 9140df71e17b..fe32a4eff14e 100644
> --- a/drivers/iommu/riscv/iommu.h
> +++ b/drivers/iommu/riscv/iommu.h
> @@ -88,6 +88,7 @@ struct riscv_iommu_device {
>
>  struct riscv_iommu_domain {
>         struct iommu_domain domain;
> +       struct io_pgtable pgtbl;
>
>         struct list_head endpoints;
>         struct mutex lock;
> diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
> index 1b7a44b35616..8dd9d3a28e3a 100644
> --- a/include/linux/io-pgtable.h
> +++ b/include/linux/io-pgtable.h
> @@ -19,6 +19,7 @@ enum io_pgtable_fmt {
>         AMD_IOMMU_V2,
>         APPLE_DART,
>         APPLE_DART2,
> +       RISCV_IOMMU,
>         IO_PGTABLE_NUM_FMTS,
>  };
>
> @@ -258,5 +259,6 @@ extern struct io_pgtable_init_fns io_pgtable_arm_mali_lpae_init_fns;
>  extern struct io_pgtable_init_fns io_pgtable_amd_iommu_v1_init_fns;
>  extern struct io_pgtable_init_fns io_pgtable_amd_iommu_v2_init_fns;
>  extern struct io_pgtable_init_fns io_pgtable_apple_dart_init_fns;
> +extern struct io_pgtable_init_fns io_pgtable_riscv_init_fns;
>
>  #endif /* __IO_PGTABLE_H */
> --
> 2.34.1
>

* Re: [PATCH 10/11] RISC-V: drivers/iommu/riscv: Add MSI identity remapping
  2023-07-19 19:33 ` [PATCH 10/11] RISC-V: drivers/iommu/riscv: Add MSI identity remapping Tomasz Jeznach
@ 2023-07-31  8:02   ` Zong Li
  2023-08-16 21:43   ` Robin Murphy
  1 sibling, 0 replies; 86+ messages in thread
From: Zong Li @ 2023-07-31  8:02 UTC (permalink / raw)
  To: Tomasz Jeznach
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley,
	Anup Patel, Albert Ou, linux, linux-kernel, Sebastien Boeuf,
	iommu, Palmer Dabbelt, Nick Kossifidis, linux-riscv

On Thu, Jul 20, 2023 at 3:34 AM Tomasz Jeznach <tjeznach@rivosinc.com> wrote:
>
> This change provides basic identity mapping support to
> exercise the MSI_FLAT hardware capability.
>
> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
> ---
>  drivers/iommu/riscv/iommu.c | 81 +++++++++++++++++++++++++++++++++++++
>  drivers/iommu/riscv/iommu.h |  3 ++
>  2 files changed, 84 insertions(+)
>
> diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
> index 6042c35be3ca..7b3e3e135cf6 100644
> --- a/drivers/iommu/riscv/iommu.c
> +++ b/drivers/iommu/riscv/iommu.c
> @@ -61,6 +61,9 @@ MODULE_PARM_DESC(priq_length, "Page request interface queue length.");
>  #define RISCV_IOMMU_MAX_PSCID  (1U << 20)
>  static DEFINE_IDA(riscv_iommu_pscids);
>
> +/* TODO: Enable MSI remapping */
> +#define RISCV_IMSIC_BASE       0x28000000

I'm not sure it is appropriate to hard-code the base address of a
peripheral in source code; it depends on the memory layout of each
target.
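
A rough sketch of discovering the window at runtime instead (error
handling omitted; needs <linux/of.h> and <linux/of_address.h>, and
"riscv,imsics" is the compatible string from the AIA binding):

	struct device_node *np;
	struct resource res;
	phys_addr_t imsic_base = 0;

	np = of_find_compatible_node(NULL, NULL, "riscv,imsics");
	if (np && !of_address_to_resource(np, 0, &res))
		imsic_base = res.start;
	of_node_put(np);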

> +
>  /* 1 second */
>  #define RISCV_IOMMU_TIMEOUT    riscv_timebase
>
> @@ -932,6 +935,72 @@ static irqreturn_t riscv_iommu_priq_process(int irq, void *data)
>   * Endpoint management
>   */
>
> +static int riscv_iommu_enable_ir(struct riscv_iommu_endpoint *ep)
> +{
> +       struct riscv_iommu_device *iommu = ep->iommu;
> +       struct iommu_resv_region *entry;
> +       struct irq_domain *msi_domain;
> +       u64 val;
> +       int i;
> +
> +       /* Initialize MSI remapping */
> +       if (!ep->dc || !(iommu->cap & RISCV_IOMMU_CAP_MSI_FLAT))
> +               return 0;
> +
> +       ep->msi_root = (struct riscv_iommu_msi_pte *)get_zeroed_page(GFP_KERNEL);
> +       if (!ep->msi_root)
> +               return -ENOMEM;
> +
> +       for (i = 0; i < 256; i++) {
> +               ep->msi_root[i].pte = RISCV_IOMMU_MSI_PTE_V |
> +                   FIELD_PREP(RISCV_IOMMU_MSI_PTE_M, 3) |
> +                   phys_to_ppn(RISCV_IMSIC_BASE + i * PAGE_SIZE);
> +       }
> +
> +       entry = iommu_alloc_resv_region(RISCV_IMSIC_BASE, PAGE_SIZE * 256, 0,
> +                                       IOMMU_RESV_SW_MSI, GFP_KERNEL);
> +       if (entry)
> +               list_add_tail(&entry->list, &ep->regions);
> +
> +       val = virt_to_pfn(ep->msi_root) |
> +           FIELD_PREP(RISCV_IOMMU_DC_MSIPTP_MODE, RISCV_IOMMU_DC_MSIPTP_MODE_FLAT);
> +       ep->dc->msiptp = cpu_to_le64(val);
> +
> +       /* Single page of MSIPTP, 256 IMSIC files */
> +       ep->dc->msi_addr_mask = cpu_to_le64(255);
> +       ep->dc->msi_addr_pattern = cpu_to_le64(RISCV_IMSIC_BASE >> 12);
> +       wmb();
> +
> +       /* set msi domain for the device as isolated. hack. */
> +       msi_domain = dev_get_msi_domain(ep->dev);
> +       if (msi_domain) {
> +               msi_domain->flags |= IRQ_DOMAIN_FLAG_ISOLATED_MSI;
> +       }
> +
> +       dev_dbg(ep->dev, "RV-IR enabled\n");
> +
> +       ep->ir_enabled = true;
> +
> +       return 0;
> +}
> +
> +static void riscv_iommu_disable_ir(struct riscv_iommu_endpoint *ep)
> +{
> +       if (!ep->ir_enabled)
> +               return;
> +
> +       ep->dc->msi_addr_pattern = 0ULL;
> +       ep->dc->msi_addr_mask = 0ULL;
> +       ep->dc->msiptp = 0ULL;
> +       wmb();
> +
> +       dev_dbg(ep->dev, "RV-IR disabled\n");
> +
> +       free_pages((unsigned long)ep->msi_root, 0);
> +       ep->msi_root = NULL;
> +       ep->ir_enabled = false;
> +}
> +
>  /* Endpoint features/capabilities */
>  static void riscv_iommu_disable_ep(struct riscv_iommu_endpoint *ep)
>  {
> @@ -1226,6 +1295,7 @@ static struct iommu_device *riscv_iommu_probe_device(struct device *dev)
>
>         mutex_init(&ep->lock);
>         INIT_LIST_HEAD(&ep->domain);
> +       INIT_LIST_HEAD(&ep->regions);
>
>         if (dev_is_pci(dev)) {
>                 ep->devid = pci_dev_id(to_pci_dev(dev));
> @@ -1248,6 +1318,7 @@ static struct iommu_device *riscv_iommu_probe_device(struct device *dev)
>         dev_iommu_priv_set(dev, ep);
>         riscv_iommu_add_device(iommu, dev);
>         riscv_iommu_enable_ep(ep);
> +       riscv_iommu_enable_ir(ep);
>
>         return &iommu->iommu;
>  }
> @@ -1279,6 +1350,7 @@ static void riscv_iommu_release_device(struct device *dev)
>                 riscv_iommu_iodir_inv_devid(iommu, ep->devid);
>         }
>
> +       riscv_iommu_disable_ir(ep);
>         riscv_iommu_disable_ep(ep);
>
>         /* Remove endpoint from IOMMU tracking structures */
> @@ -1301,6 +1373,15 @@ static struct iommu_group *riscv_iommu_device_group(struct device *dev)
>
>  static void riscv_iommu_get_resv_regions(struct device *dev, struct list_head *head)
>  {
> +       struct iommu_resv_region *entry, *new_entry;
> +       struct riscv_iommu_endpoint *ep = dev_iommu_priv_get(dev);
> +
> +       list_for_each_entry(entry, &ep->regions, list) {
> +               new_entry = kmemdup(entry, sizeof(*entry), GFP_KERNEL);
> +               if (new_entry)
> +                       list_add_tail(&new_entry->list, head);
> +       }
> +
>         iommu_dma_get_resv_regions(dev, head);
>  }
>
> diff --git a/drivers/iommu/riscv/iommu.h b/drivers/iommu/riscv/iommu.h
> index 83e8d00fd0f8..55418a1144fb 100644
> --- a/drivers/iommu/riscv/iommu.h
> +++ b/drivers/iommu/riscv/iommu.h
> @@ -117,14 +117,17 @@ struct riscv_iommu_endpoint {
>         struct riscv_iommu_dc *dc;              /* device context pointer */
>         struct riscv_iommu_pc *pc;              /* process context root, valid if pasid_enabled is true */
>         struct riscv_iommu_device *iommu;       /* parent iommu device */
> +       struct riscv_iommu_msi_pte *msi_root;   /* interrupt re-mapping */
>
>         struct mutex lock;
>         struct list_head domain;                /* endpoint attached managed domain */
> +       struct list_head regions;               /* reserved regions, interrupt remapping window */
>
>         /* end point info bits */
>         unsigned pasid_bits;
>         unsigned pasid_feat;
>         bool pasid_enabled;
> +       bool ir_enabled;
>  };
>
>  /* Helper functions and macros */
> --
> 2.34.1
>

* Re: [PATCH 11/11] RISC-V: drivers/iommu/riscv: Add G-Stage translation support
  2023-07-19 19:33 ` [PATCH 11/11] RISC-V: drivers/iommu/riscv: Add G-Stage translation support Tomasz Jeznach
@ 2023-07-31  8:12   ` Zong Li
  2023-08-16 21:13   ` Robin Murphy
  1 sibling, 0 replies; 86+ messages in thread
From: Zong Li @ 2023-07-31  8:12 UTC (permalink / raw)
  To: Tomasz Jeznach
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley,
	Anup Patel, Albert Ou, linux, linux-kernel, Sebastien Boeuf,
	iommu, Palmer Dabbelt, Nick Kossifidis, linux-riscv

On Thu, Jul 20, 2023 at 3:34 AM Tomasz Jeznach <tjeznach@rivosinc.com> wrote:
>
> This change introduces 2nd stage translation configuration
> support, enabling nested translation for IOMMU hardware.
> Pending integration with VMM IOMMUFD interfaces to manage
> 1st stage translation and IOMMU virtualization interfaces.
>
> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
> ---
>  drivers/iommu/riscv/iommu.c | 58 ++++++++++++++++++++++++++++---------
>  drivers/iommu/riscv/iommu.h |  3 +-
>  2 files changed, 46 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
> index 7b3e3e135cf6..3ca2f0194d3c 100644
> --- a/drivers/iommu/riscv/iommu.c
> +++ b/drivers/iommu/riscv/iommu.c
> @@ -1418,6 +1418,19 @@ static struct iommu_domain *riscv_iommu_domain_alloc(unsigned type)
>         return &domain->domain;
>  }
>
> +/* mark domain as second-stage translation */
> +static int riscv_iommu_enable_nesting(struct iommu_domain *iommu_domain)
> +{
> +       struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
> +
> +       mutex_lock(&domain->lock);
> +       if (list_empty(&domain->endpoints))
> +               domain->g_stage = true;
> +       mutex_unlock(&domain->lock);
> +
> +       return domain->g_stage ? 0 : -EBUSY;
> +}
> +
>  static void riscv_iommu_domain_free(struct iommu_domain *iommu_domain)
>  {
>         struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
> @@ -1433,7 +1446,7 @@ static void riscv_iommu_domain_free(struct iommu_domain *iommu_domain)
>                 free_io_pgtable_ops(&domain->pgtbl.ops);
>
>         if (domain->pgd_root)
> -               free_pages((unsigned long)domain->pgd_root, 0);
> +               free_pages((unsigned long)domain->pgd_root, domain->g_stage ? 2 : 0);
>
>         if ((int)domain->pscid > 0)
>                 ida_free(&riscv_iommu_pscids, domain->pscid);
> @@ -1483,7 +1496,8 @@ static int riscv_iommu_domain_finalize(struct riscv_iommu_domain *domain,
>
>         /* TODO: Fix this for RV32 */
>         domain->mode = satp_mode >> 60;
> -       domain->pgd_root = (pgd_t *) __get_free_pages(GFP_KERNEL | __GFP_ZERO, 0);
> +       domain->pgd_root = (pgd_t *) __get_free_pages(GFP_KERNEL | __GFP_ZERO,
> +                                                     domain->g_stage ? 2 : 0);
>
>         if (!domain->pgd_root)
>                 return -ENOMEM;
> @@ -1499,6 +1513,8 @@ static u64 riscv_iommu_domain_atp(struct riscv_iommu_domain *domain)
>         u64 atp = FIELD_PREP(RISCV_IOMMU_DC_FSC_MODE, domain->mode);
>         if (domain->mode != RISCV_IOMMU_DC_FSC_MODE_BARE)
>                 atp |= FIELD_PREP(RISCV_IOMMU_DC_FSC_PPN, virt_to_pfn(domain->pgd_root));
> +       if (domain->g_stage)
> +               atp |= FIELD_PREP(RISCV_IOMMU_DC_IOHGATP_GSCID, domain->pscid);
>         return atp;
>  }
>
> @@ -1541,20 +1557,30 @@ static int riscv_iommu_attach_dev(struct iommu_domain *iommu_domain, struct devi
>         if (!dc)
>                 return -ENODEV;
>
> -       /*
> -        * S-Stage translation table. G-Stage remains unmodified (BARE).
> -        */
> -       val = FIELD_PREP(RISCV_IOMMU_DC_TA_PSCID, domain->pscid);
> -
> -       if (ep->pasid_enabled) {
> -               ep->pc[0].ta = cpu_to_le64(val | RISCV_IOMMU_PC_TA_V);
> -               ep->pc[0].fsc = cpu_to_le64(riscv_iommu_domain_atp(domain));
> +       if (domain->g_stage) {
> +               /*
> +                * Enable G-Stage translation with initial pass-through mode
> +                * for S-Stage. VMM is responsible for more restrictive
> +                * guest VA translation scheme configuration.
> +                */
>                 dc->ta = 0;
> -               dc->fsc = cpu_to_le64(virt_to_pfn(ep->pc) |
> -                   FIELD_PREP(RISCV_IOMMU_DC_FSC_MODE, RISCV_IOMMU_DC_FSC_PDTP_MODE_PD8));
> +               dc->fsc = 0ULL; /* RISCV_IOMMU_DC_FSC_MODE_BARE */ ;
> +               dc->iohgatp = cpu_to_le64(riscv_iommu_domain_atp(domain));
>         } else {
> -               dc->ta = cpu_to_le64(val);
> -               dc->fsc = cpu_to_le64(riscv_iommu_domain_atp(domain));
> +               /* S-Stage translation table. G-Stage remains unmodified. */
> +               if (ep->pasid_enabled) {
> +                       val = FIELD_PREP(RISCV_IOMMU_DC_TA_PSCID, domain->pscid);
> +                       ep->pc[0].ta = cpu_to_le64(val | RISCV_IOMMU_PC_TA_V);
> +                       ep->pc[0].fsc = cpu_to_le64(riscv_iommu_domain_atp(domain));
> +                       dc->ta = 0;
> +                       val = FIELD_PREP(RISCV_IOMMU_DC_FSC_MODE,
> +                                         RISCV_IOMMU_DC_FSC_PDTP_MODE_PD8);
> +                       dc->fsc = cpu_to_le64(val | virt_to_pfn(ep->pc));
> +               } else {
> +                       val = FIELD_PREP(RISCV_IOMMU_DC_TA_PSCID, domain->pscid);
> +                       dc->ta = cpu_to_le64(val);
> +                       dc->fsc = cpu_to_le64(riscv_iommu_domain_atp(domain));
> +               }
>         }
>
>         wmb();
> @@ -1599,6 +1625,9 @@ static int riscv_iommu_set_dev_pasid(struct iommu_domain *iommu_domain,
>         if (!iommu_domain || !iommu_domain->mm)
>                 return -EINVAL;
>
> +       if (domain->g_stage)
> +               return -EINVAL;
> +
>         /* Driver uses TC.DPE mode, PASID #0 is incorrect. */
>         if (pasid == 0)
>                 return -EINVAL;
> @@ -1969,6 +1998,7 @@ static const struct iommu_domain_ops riscv_iommu_domain_ops = {
>         .iotlb_sync = riscv_iommu_iotlb_sync,
>         .iotlb_sync_map = riscv_iommu_iotlb_sync_map,
>         .flush_iotlb_all = riscv_iommu_flush_iotlb_all,
> +       .enable_nesting = riscv_iommu_enable_nesting,
>  };
>

I don't see the GVMA invalidate command; I guess we need to do
something like that in 'riscv_iommu_mm_invalidate'.
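
A rough sketch, assuming a riscv_iommu_cmd_inval_gvma() helper were
added to mirror riscv_iommu_cmd_inval_vma() (no such helper exists in
this series yet):

	struct riscv_iommu_command cmd;

	/* hypothetical: IOTINVAL.GVMA scoped to this domain's GSCID */
	riscv_iommu_cmd_inval_gvma(&cmd);
	riscv_iommu_cmd_inval_set_gscid(&cmd, domain->pscid);
	riscv_iommu_post(domain->iommu, &cmd);
	riscv_iommu_iofence_sync(domain->iommu);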

>  static const struct iommu_ops riscv_iommu_ops = {
> diff --git a/drivers/iommu/riscv/iommu.h b/drivers/iommu/riscv/iommu.h
> index 55418a1144fb..55e5aafea5bc 100644
> --- a/drivers/iommu/riscv/iommu.h
> +++ b/drivers/iommu/riscv/iommu.h
> @@ -102,8 +102,9 @@ struct riscv_iommu_domain {
>         struct riscv_iommu_device *iommu;
>
>         unsigned mode;          /* RIO_ATP_MODE_* enum */
> -       unsigned pscid;         /* RISC-V IOMMU PSCID */
> +       unsigned pscid;         /* RISC-V IOMMU PSCID / GSCID */
>         ioasid_t pasid;         /* IOMMU_DOMAIN_SVA: Cached PASID */
> +       bool g_stage;           /* 2nd stage translation domain */
>
>         pgd_t *pgd_root;        /* page table root pointer */
>  };
> --
> 2.34.1
>

* Re: [PATCH 09/11] RISC-V: drivers/iommu/riscv: Add SVA with PASID/ATS/PRI support.
  2023-07-19 19:33 ` [PATCH 09/11] RISC-V: drivers/iommu/riscv: Add SVA with PASID/ATS/PRI support Tomasz Jeznach
@ 2023-07-31  9:04   ` Zong Li
  0 siblings, 0 replies; 86+ messages in thread
From: Zong Li @ 2023-07-31  9:04 UTC (permalink / raw)
  To: Tomasz Jeznach
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley,
	Anup Patel, Albert Ou, linux, linux-kernel, Sebastien Boeuf,
	iommu, Palmer Dabbelt, Nick Kossifidis, linux-riscv

On Thu, Jul 20, 2023 at 3:35 AM Tomasz Jeznach <tjeznach@rivosinc.com> wrote:
>
> Introduces SVA (Shared Virtual Address) for RISC-V IOMMU, with
> ATS/PRI services for capable devices.
>
> Co-developed-by: Sebastien Boeuf <seb@rivosinc.com>
> Signed-off-by: Sebastien Boeuf <seb@rivosinc.com>
> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
> ---
>  drivers/iommu/riscv/iommu.c | 601 +++++++++++++++++++++++++++++++++++-
>  drivers/iommu/riscv/iommu.h |  14 +
>  2 files changed, 610 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
> index 2ef6952a2109..6042c35be3ca 100644
> --- a/drivers/iommu/riscv/iommu.c
> +++ b/drivers/iommu/riscv/iommu.c
> @@ -384,6 +384,89 @@ static inline void riscv_iommu_cmd_iodir_set_did(struct riscv_iommu_command *cmd
>             FIELD_PREP(RISCV_IOMMU_CMD_IODIR_DID, devid) | RISCV_IOMMU_CMD_IODIR_DV;
>  }
>
> +static inline void riscv_iommu_cmd_iodir_set_pid(struct riscv_iommu_command *cmd,
> +                                                unsigned pasid)
> +{
> +       cmd->dword0 |= FIELD_PREP(RISCV_IOMMU_CMD_IODIR_PID, pasid);
> +}
> +
> +static void riscv_iommu_cmd_ats_inval(struct riscv_iommu_command *cmd)
> +{
> +       cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_ATS_OPCODE) |
> +           FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_ATS_FUNC_INVAL);
> +       cmd->dword1 = 0;
> +}
> +
> +static inline void riscv_iommu_cmd_ats_prgr(struct riscv_iommu_command *cmd)
> +{
> +       cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_ATS_OPCODE) |
> +           FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_ATS_FUNC_PRGR);
> +       cmd->dword1 = 0;
> +}
> +
> +static void riscv_iommu_cmd_ats_set_rid(struct riscv_iommu_command *cmd, u32 rid)
> +{
> +       cmd->dword0 |= FIELD_PREP(RISCV_IOMMU_CMD_ATS_RID, rid);
> +}
> +
> +static void riscv_iommu_cmd_ats_set_pid(struct riscv_iommu_command *cmd, u32 pid)
> +{
> +       cmd->dword0 |= FIELD_PREP(RISCV_IOMMU_CMD_ATS_PID, pid) | RISCV_IOMMU_CMD_ATS_PV;
> +}
> +
> +static void riscv_iommu_cmd_ats_set_dseg(struct riscv_iommu_command *cmd, u8 seg)
> +{
> +       cmd->dword0 |= FIELD_PREP(RISCV_IOMMU_CMD_ATS_DSEG, seg) | RISCV_IOMMU_CMD_ATS_DSV;
> +}
> +
> +static void riscv_iommu_cmd_ats_set_payload(struct riscv_iommu_command *cmd, u64 payload)
> +{
> +       cmd->dword1 = payload;
> +}
> +
> +/* Prepare the ATS invalidation payload */
> +static unsigned long riscv_iommu_ats_inval_payload(unsigned long start,
> +                                                  unsigned long end, bool global_inv)
> +{
> +       size_t len = end - start + 1;
> +       unsigned long payload = 0;
> +
> +       /*
> +        * PCI Express specification
> +        * Section 10.2.3.2 Translation Range Size (S) Field
> +        */
> +       if (len < PAGE_SIZE)
> +               len = PAGE_SIZE;
> +       else
> +               len = __roundup_pow_of_two(len);
> +
> +       payload = (start & ~(len - 1)) | (((len - 1) >> 12) << 11);
> +
> +       if (global_inv)
> +               payload |= RISCV_IOMMU_CMD_ATS_INVAL_G;
> +
> +       return payload;
> +}
> +
> +/* Prepare the ATS invalidation payload for all translations to be invalidated. */
> +static unsigned long riscv_iommu_ats_inval_all_payload(bool global_inv)
> +{
> +       unsigned long payload = GENMASK_ULL(62, 11);
> +
> +       if (global_inv)
> +               payload |= RISCV_IOMMU_CMD_ATS_INVAL_G;
> +
> +       return payload;
> +}
> +
> +/* Prepare the ATS "Page Request Group Response" payload */
> +static unsigned long riscv_iommu_ats_prgr_payload(u16 dest_id, u8 resp_code, u16 grp_idx)
> +{
> +       return FIELD_PREP(RISCV_IOMMU_CMD_ATS_PRGR_DST_ID, dest_id) |
> +           FIELD_PREP(RISCV_IOMMU_CMD_ATS_PRGR_RESP_CODE, resp_code) |
> +           FIELD_PREP(RISCV_IOMMU_CMD_ATS_PRGR_PRG_INDEX, grp_idx);
> +}
> +
>  /* TODO: Convert into lock-less MPSC implementation. */
>  static bool riscv_iommu_post_sync(struct riscv_iommu_device *iommu,
>                                   struct riscv_iommu_command *cmd, bool sync)
> @@ -460,6 +543,16 @@ static bool riscv_iommu_iodir_inv_devid(struct riscv_iommu_device *iommu, unsign
>         return riscv_iommu_post(iommu, &cmd);
>  }
>
> +static bool riscv_iommu_iodir_inv_pasid(struct riscv_iommu_device *iommu,
> +                                       unsigned devid, unsigned pasid)
> +{
> +       struct riscv_iommu_command cmd;
> +       riscv_iommu_cmd_iodir_inval_pdt(&cmd);
> +       riscv_iommu_cmd_iodir_set_did(&cmd, devid);
> +       riscv_iommu_cmd_iodir_set_pid(&cmd, pasid);
> +       return riscv_iommu_post(iommu, &cmd);
> +}
> +
>  static bool riscv_iommu_iofence_sync(struct riscv_iommu_device *iommu)
>  {
>         struct riscv_iommu_command cmd;
> @@ -467,6 +560,62 @@ static bool riscv_iommu_iofence_sync(struct riscv_iommu_device *iommu)
>         return riscv_iommu_post_sync(iommu, &cmd, true);
>  }
>
> +static void riscv_iommu_mm_invalidate(struct mmu_notifier *mn,
> +                                     struct mm_struct *mm, unsigned long start,
> +                                     unsigned long end)
> +{
> +       struct riscv_iommu_command cmd;
> +       struct riscv_iommu_endpoint *endpoint;
> +       struct riscv_iommu_domain *domain =
> +           container_of(mn, struct riscv_iommu_domain, mn);
> +       unsigned long iova;
> +       /*
> +        * The mm_types defines vm_end as the first byte after the end address,
> +        * different from IOMMU subsystem using the last address of an address
> +        * range. So do a simple translation here by updating what end means.
> +        */
> +       unsigned long payload = riscv_iommu_ats_inval_payload(start, end - 1, true);
> +
> +       riscv_iommu_cmd_inval_vma(&cmd);
> +       riscv_iommu_cmd_inval_set_gscid(&cmd, 0);
> +       riscv_iommu_cmd_inval_set_pscid(&cmd, domain->pscid);
> +       if (end > start) {
> +               /* Cover only the range that is needed */
> +               for (iova = start; iova < end; iova += PAGE_SIZE) {
> +                       riscv_iommu_cmd_inval_set_addr(&cmd, iova);
> +                       riscv_iommu_post(domain->iommu, &cmd);
> +               }
> +       } else {
> +               riscv_iommu_post(domain->iommu, &cmd);
> +       }
> +
> +       riscv_iommu_iofence_sync(domain->iommu);
> +
> +       /* ATS invalidation for every device and for specific translation range. */
> +       list_for_each_entry(endpoint, &domain->endpoints, domain) {
> +               if (!endpoint->pasid_enabled)
> +                       continue;
> +
> +               riscv_iommu_cmd_ats_inval(&cmd);
> +               riscv_iommu_cmd_ats_set_dseg(&cmd, endpoint->domid);
> +               riscv_iommu_cmd_ats_set_rid(&cmd, endpoint->devid);
> +               riscv_iommu_cmd_ats_set_pid(&cmd, domain->pasid);
> +               riscv_iommu_cmd_ats_set_payload(&cmd, payload);
> +               riscv_iommu_post(domain->iommu, &cmd);
> +       }
> +       riscv_iommu_iofence_sync(domain->iommu);
> +}
> +
> +static void riscv_iommu_mm_release(struct mmu_notifier *mn, struct mm_struct *mm)
> +{
> +       /* TODO: removed from notifier, cleanup PSCID mapping, flush IOTLB */
> +}
> +
> +static const struct mmu_notifier_ops riscv_iommu_mmuops = {
> +       .release = riscv_iommu_mm_release,
> +       .invalidate_range = riscv_iommu_mm_invalidate,
> +};
> +
>  /* Command queue primary interrupt handler */
>  static irqreturn_t riscv_iommu_cmdq_irq_check(int irq, void *data)
>  {
> @@ -608,6 +757,128 @@ static void riscv_iommu_add_device(struct riscv_iommu_device *iommu, struct devi
>         mutex_unlock(&iommu->eps_mutex);
>  }
>
> +/*
> + * Get device reference based on device identifier (requester id).
> + * Decrement reference count with put_device() call.
> + */
> +static struct device *riscv_iommu_get_device(struct riscv_iommu_device *iommu,
> +                                            unsigned devid)
> +{
> +       struct rb_node *node;
> +       struct riscv_iommu_endpoint *ep;
> +       struct device *dev = NULL;
> +
> +       mutex_lock(&iommu->eps_mutex);
> +
> +       node = iommu->eps.rb_node;
> +       while (node && !dev) {
> +               ep = rb_entry(node, struct riscv_iommu_endpoint, node);
> +               if (ep->devid < devid)
> +                       node = node->rb_right;
> +               else if (ep->devid > devid)
> +                       node = node->rb_left;
> +               else
> +                       dev = get_device(ep->dev);
> +       }
> +
> +       mutex_unlock(&iommu->eps_mutex);
> +
> +       return dev;
> +}
> +
> +static int riscv_iommu_ats_prgr(struct device *dev, struct iommu_page_response *msg)
> +{
> +       struct riscv_iommu_endpoint *ep = dev_iommu_priv_get(dev);
> +       struct riscv_iommu_command cmd;
> +       u8 resp_code;
> +       unsigned long payload;
> +
> +       switch (msg->code) {
> +       case IOMMU_PAGE_RESP_SUCCESS:
> +               resp_code = 0b0000;
> +               break;
> +       case IOMMU_PAGE_RESP_INVALID:
> +               resp_code = 0b0001;
> +               break;
> +       case IOMMU_PAGE_RESP_FAILURE:
> +               resp_code = 0b1111;
> +               break;
> +       }
> +       payload = riscv_iommu_ats_prgr_payload(ep->devid, resp_code, msg->grpid);
> +
> +       /* ATS Page Request Group Response */
> +       riscv_iommu_cmd_ats_prgr(&cmd);
> +       riscv_iommu_cmd_ats_set_dseg(&cmd, ep->domid);
> +       riscv_iommu_cmd_ats_set_rid(&cmd, ep->devid);
> +       if (msg->flags & IOMMU_PAGE_RESP_PASID_VALID)
> +               riscv_iommu_cmd_ats_set_pid(&cmd, msg->pasid);
> +       riscv_iommu_cmd_ats_set_payload(&cmd, payload);
> +       riscv_iommu_post(ep->iommu, &cmd);
> +
> +       return 0;
> +}
> +
> +static void riscv_iommu_page_request(struct riscv_iommu_device *iommu,
> +                                    struct riscv_iommu_pq_record *req)
> +{
> +       struct iommu_fault_event event = { 0 };
> +       struct iommu_fault_page_request *prm = &event.fault.prm;
> +       int ret;
> +       struct device *dev;
> +       unsigned devid = FIELD_GET(RISCV_IOMMU_PREQ_HDR_DID, req->hdr);
> +
> +       /* Ignore PGR Stop marker. */
> +       if ((req->payload & RISCV_IOMMU_PREQ_PAYLOAD_M) == RISCV_IOMMU_PREQ_PAYLOAD_L)
> +               return;
> +
> +       dev = riscv_iommu_get_device(iommu, devid);
> +       if (!dev) {
> +               /* TODO: Handle invalid page request */
> +               return;
> +       }
> +
> +       event.fault.type = IOMMU_FAULT_PAGE_REQ;
> +
> +       if (req->payload & RISCV_IOMMU_PREQ_PAYLOAD_L)
> +               prm->flags |= IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE;
> +       if (req->payload & RISCV_IOMMU_PREQ_PAYLOAD_W)
> +               prm->perm |= IOMMU_FAULT_PERM_WRITE;
> +       if (req->payload & RISCV_IOMMU_PREQ_PAYLOAD_R)
> +               prm->perm |= IOMMU_FAULT_PERM_READ;
> +
> +       prm->grpid = FIELD_GET(RISCV_IOMMU_PREQ_PRG_INDEX, req->payload);
> +       prm->addr = FIELD_GET(RISCV_IOMMU_PREQ_UADDR, req->payload) << PAGE_SHIFT;
> +
> +       if (req->hdr & RISCV_IOMMU_PREQ_HDR_PV) {
> +               prm->flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
> +               /* TODO: where to find this bit */
> +               prm->flags |= IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID;
> +               prm->pasid = FIELD_GET(RISCV_IOMMU_PREQ_HDR_PID, req->hdr);
> +       }
> +
> +       ret = iommu_report_device_fault(dev, &event);
> +       if (ret) {
> +               struct iommu_page_response resp = {
> +                       .grpid = prm->grpid,
> +                       .code = IOMMU_PAGE_RESP_FAILURE,
> +               };
> +               if (prm->flags & IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID) {
> +                       resp.flags |= IOMMU_PAGE_RESP_PASID_VALID;
> +                       resp.pasid = prm->pasid;
> +               }
> +               riscv_iommu_ats_prgr(dev, &resp);
> +       }
> +
> +       put_device(dev);
> +}
> +
> +static int riscv_iommu_page_response(struct device *dev,
> +                                    struct iommu_fault_event *evt,
> +                                    struct iommu_page_response *msg)
> +{
> +       return riscv_iommu_ats_prgr(dev, msg);
> +}
> +
>  /* Page request interface queue primary interrupt handler */
>  static irqreturn_t riscv_iommu_priq_irq_check(int irq, void *data)
>  {
> @@ -626,7 +897,7 @@ static irqreturn_t riscv_iommu_priq_process(int irq, void *data)
>         struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
>         struct riscv_iommu_device *iommu;
>         struct riscv_iommu_pq_record *requests;
> -       unsigned cnt, idx, ctrl;
> +       unsigned cnt, len, idx, ctrl;
>
>         iommu = container_of(q, struct riscv_iommu_device, priq);
>         requests = (struct riscv_iommu_pq_record *)q->base;
> @@ -649,7 +920,8 @@ static irqreturn_t riscv_iommu_priq_process(int irq, void *data)
>                 cnt = riscv_iommu_queue_consume(iommu, q, &idx);
>                 if (!cnt)
>                         break;
> -               dev_warn(iommu->dev, "unexpected %u page requests\n", cnt);
> +               for (len = 0; len < cnt; idx++, len++)
> +                       riscv_iommu_page_request(iommu, &requests[idx]);
>                 riscv_iommu_queue_release(iommu, q, cnt);
>         } while (1);
>
> @@ -660,6 +932,169 @@ static irqreturn_t riscv_iommu_priq_process(int irq, void *data)
>   * Endpoint management
>   */
>
> +/* Endpoint features/capabilities */
> +static void riscv_iommu_disable_ep(struct riscv_iommu_endpoint *ep)
> +{
> +       struct pci_dev *pdev;
> +
> +       if (!dev_is_pci(ep->dev))
> +               return;
> +
> +       pdev = to_pci_dev(ep->dev);
> +
> +       if (ep->pasid_enabled) {
> +               pci_disable_ats(pdev);
> +               pci_disable_pri(pdev);
> +               pci_disable_pasid(pdev);
> +               ep->pasid_enabled = false;
> +       }
> +}
> +
> +static void riscv_iommu_enable_ep(struct riscv_iommu_endpoint *ep)
> +{
> +       int rc, feat, num;
> +       struct pci_dev *pdev;
> +       struct device *dev = ep->dev;
> +
> +       if (!dev_is_pci(dev))
> +               return;
> +
> +       if (!ep->iommu->iommu.max_pasids)
> +               return;
> +
> +       pdev = to_pci_dev(dev);
> +
> +       if (!pci_ats_supported(pdev))
> +               return;
> +
> +       if (!pci_pri_supported(pdev))
> +               return;
> +
> +       feat = pci_pasid_features(pdev);
> +       if (feat < 0)
> +               return;
> +
> +       num = pci_max_pasids(pdev);
> +       if (!num) {
> +               dev_warn(dev, "Can't enable PASID (num: %d)\n", num);
> +               return;
> +       }
> +
> +       if (num > ep->iommu->iommu.max_pasids)
> +               num = ep->iommu->iommu.max_pasids;
> +
> +       rc = pci_enable_pasid(pdev, feat);
> +       if (rc) {
> +               dev_warn(dev, "Can't enable PASID (rc: %d)\n", rc);
> +               return;
> +       }
> +
> +       rc = pci_reset_pri(pdev);
> +       if (rc) {
> +               dev_warn(dev, "Can't reset PRI (rc: %d)\n", rc);
> +               pci_disable_pasid(pdev);
> +               return;
> +       }
> +
> +       /* TODO: Get supported PRI queue length, hard-code to 32 entries */
> +       rc = pci_enable_pri(pdev, 32);
> +       if (rc) {
> +               dev_warn(dev, "Can't enable PRI (rc: %d)\n", rc);
> +               pci_disable_pasid(pdev);
> +               return;
> +       }
> +
> +       rc = pci_enable_ats(pdev, PAGE_SHIFT);
> +       if (rc) {
> +               dev_warn(dev, "Can't enable ATS (rc: %d)\n", rc);
> +               pci_disable_pri(pdev);
> +               pci_disable_pasid(pdev);
> +               return;
> +       }
> +
> +       ep->pc = (struct riscv_iommu_pc *)get_zeroed_page(GFP_KERNEL);
> +       if (!ep->pc) {
> +               pci_disable_ats(pdev);
> +               pci_disable_pri(pdev);
> +               pci_disable_pasid(pdev);
> +               return;
> +       }
> +
> +       ep->pasid_enabled = true;
> +       ep->pasid_feat = feat;
> +       ep->pasid_bits = ilog2(num);
> +
> +       dev_dbg(ep->dev, "PASID/ATS support enabled, %d bits\n", ep->pasid_bits);
> +}
> +
> +static int riscv_iommu_enable_sva(struct device *dev)
> +{
> +       int ret;
> +       struct riscv_iommu_endpoint *ep = dev_iommu_priv_get(dev);
> +
> +       if (!ep || !ep->iommu || !ep->iommu->pq_work)
> +               return -EINVAL;
> +
> +       if (!ep->pasid_enabled)
> +               return -ENODEV;
> +
> +       ret = iopf_queue_add_device(ep->iommu->pq_work, dev);
> +       if (ret)
> +               return ret;
> +
> +       return iommu_register_device_fault_handler(dev, iommu_queue_iopf, dev);
> +}
> +
> +static int riscv_iommu_disable_sva(struct device *dev)
> +{
> +       int ret;
> +       struct riscv_iommu_endpoint *ep = dev_iommu_priv_get(dev);
> +
> +       ret = iommu_unregister_device_fault_handler(dev);
> +       if (!ret)
> +               ret = iopf_queue_remove_device(ep->iommu->pq_work, dev);
> +
> +       return ret;
> +}
> +
> +static int riscv_iommu_enable_iopf(struct device *dev)
> +{
> +       struct riscv_iommu_endpoint *ep = dev_iommu_priv_get(dev);
> +
> +       if (ep && ep->pasid_enabled)
> +               return 0;
> +
> +       return -EINVAL;
> +}
> +
> +static int riscv_iommu_dev_enable_feat(struct device *dev, enum iommu_dev_features feat)
> +{
> +       switch (feat) {
> +       case IOMMU_DEV_FEAT_IOPF:
> +               return riscv_iommu_enable_iopf(dev);
> +
> +       case IOMMU_DEV_FEAT_SVA:
> +               return riscv_iommu_enable_sva(dev);
> +
> +       default:
> +               return -ENODEV;
> +       }
> +}
> +
> +static int riscv_iommu_dev_disable_feat(struct device *dev, enum iommu_dev_features feat)
> +{
> +       switch (feat) {
> +       case IOMMU_DEV_FEAT_IOPF:
> +               return 0;
> +
> +       case IOMMU_DEV_FEAT_SVA:
> +               return riscv_iommu_disable_sva(dev);
> +
> +       default:
> +               return -ENODEV;
> +       }
> +}
> +
>  static int riscv_iommu_of_xlate(struct device *dev, struct of_phandle_args *args)
>  {
>         return iommu_fwspec_add_ids(dev, args->args, 1);
> @@ -812,6 +1247,7 @@ static struct iommu_device *riscv_iommu_probe_device(struct device *dev)
>
>         dev_iommu_priv_set(dev, ep);
>         riscv_iommu_add_device(iommu, dev);
> +       riscv_iommu_enable_ep(ep);
>
>         return &iommu->iommu;
>  }
> @@ -843,6 +1279,8 @@ static void riscv_iommu_release_device(struct device *dev)
>                 riscv_iommu_iodir_inv_devid(iommu, ep->devid);
>         }
>
> +       riscv_iommu_disable_ep(ep);
> +
>         /* Remove endpoint from IOMMU tracking structures */
>         mutex_lock(&iommu->eps_mutex);
>         rb_erase(&ep->node, &iommu->eps);
> @@ -878,7 +1316,8 @@ static struct iommu_domain *riscv_iommu_domain_alloc(unsigned type)
>             type != IOMMU_DOMAIN_DMA_FQ &&
>             type != IOMMU_DOMAIN_UNMANAGED &&
>             type != IOMMU_DOMAIN_IDENTITY &&
> -           type != IOMMU_DOMAIN_BLOCKED)
> +           type != IOMMU_DOMAIN_BLOCKED &&
> +           type != IOMMU_DOMAIN_SVA)
>                 return NULL;
>
>         domain = kzalloc(sizeof(*domain), GFP_KERNEL);
> @@ -906,6 +1345,9 @@ static void riscv_iommu_domain_free(struct iommu_domain *iommu_domain)
>                 pr_warn("IOMMU domain is not empty!\n");
>         }
>
> +       if (domain->mn.ops && iommu_domain->mm)
> +               mmu_notifier_unregister(&domain->mn, iommu_domain->mm);
> +
>         if (domain->pgtbl.cookie)
>                 free_io_pgtable_ops(&domain->pgtbl.ops);
>
> @@ -1023,14 +1465,29 @@ static int riscv_iommu_attach_dev(struct iommu_domain *iommu_domain, struct devi
>          */
>         val = FIELD_PREP(RISCV_IOMMU_DC_TA_PSCID, domain->pscid);
>
> -       dc->ta = cpu_to_le64(val);
> -       dc->fsc = cpu_to_le64(riscv_iommu_domain_atp(domain));
> +       if (ep->pasid_enabled) {
> +               ep->pc[0].ta = cpu_to_le64(val | RISCV_IOMMU_PC_TA_V);
> +               ep->pc[0].fsc = cpu_to_le64(riscv_iommu_domain_atp(domain));
> +               dc->ta = 0;
> +               dc->fsc = cpu_to_le64(virt_to_pfn(ep->pc) |
> +                   FIELD_PREP(RISCV_IOMMU_DC_FSC_MODE, RISCV_IOMMU_DC_FSC_PDTP_MODE_PD8));

Could I know why we decided to use PD8 directly, rather than PD17 or
PD20?
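
If the intent is to size the PDT from the endpoint's PASID width,
something like this sketch could select the mode (assuming PD17/PD20
macros exist alongside RISCV_IOMMU_DC_FSC_PDTP_MODE_PD8, and noting
that PD17/PD20 require a multi-level PDT rather than the single page
allocated for ep->pc here):

	unsigned mode;

	if (ep->pasid_bits > 17)
		mode = RISCV_IOMMU_DC_FSC_PDTP_MODE_PD20;
	else if (ep->pasid_bits > 8)
		mode = RISCV_IOMMU_DC_FSC_PDTP_MODE_PD17;
	else
		mode = RISCV_IOMMU_DC_FSC_PDTP_MODE_PD8;

	dc->fsc = cpu_to_le64(virt_to_pfn(ep->pc) |
			      FIELD_PREP(RISCV_IOMMU_DC_FSC_MODE, mode));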

> +       } else {
> +               dc->ta = cpu_to_le64(val);
> +               dc->fsc = cpu_to_le64(riscv_iommu_domain_atp(domain));
> +       }
>
>         wmb();
>
>         /* Mark device context as valid, synchronise device context cache. */
>         val = RISCV_IOMMU_DC_TC_V;
>
> +       if (ep->pasid_enabled) {
> +               val |= RISCV_IOMMU_DC_TC_EN_ATS |
> +                      RISCV_IOMMU_DC_TC_EN_PRI |
> +                      RISCV_IOMMU_DC_TC_DPE |
> +                      RISCV_IOMMU_DC_TC_PDTV;
> +       }
> +
>         if (ep->iommu->cap & RISCV_IOMMU_CAP_AMO) {
>                 val |= RISCV_IOMMU_DC_TC_GADE |
>                        RISCV_IOMMU_DC_TC_SADE;
> @@ -1051,13 +1508,107 @@ static int riscv_iommu_attach_dev(struct iommu_domain *iommu_domain, struct devi
>         return 0;
>  }
>
> +static int riscv_iommu_set_dev_pasid(struct iommu_domain *iommu_domain,
> +                                    struct device *dev, ioasid_t pasid)
> +{
> +       struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
> +       struct riscv_iommu_endpoint *ep = dev_iommu_priv_get(dev);
> +       u64 ta, fsc;
> +
> +       if (!iommu_domain || !iommu_domain->mm)
> +               return -EINVAL;
> +
> +       /* Driver uses TC.DPE mode; PASID #0 names the default (non-PASID) context and cannot be bound. */
> +       if (pasid == 0)
> +               return -EINVAL;
> +
> +       /* Incorrect domain identifier */
> +       if ((int)domain->pscid < 0)
> +               return -ENOMEM;
> +
> +       /* Process Context table should be set for pasid enabled endpoints. */
> +       if (!ep || !ep->pasid_enabled || !ep->dc || !ep->pc)
> +               return -ENODEV;
> +
> +       domain->pasid = pasid;
> +       domain->iommu = ep->iommu;
> +       domain->mn.ops = &riscv_iommu_mmuops;
> +
> +       /* register mm notifier */
> +       if (mmu_notifier_register(&domain->mn, iommu_domain->mm))
> +               return -ENODEV;
> +
> +       /* TODO: get SXL value for the process, use 32 bit or SATP mode */
> +       fsc = virt_to_pfn(iommu_domain->mm->pgd) | satp_mode;
> +       ta = RISCV_IOMMU_PC_TA_V | FIELD_PREP(RISCV_IOMMU_PC_TA_PSCID, domain->pscid);
> +
> +       fsc = le64_to_cpu(xchg_relaxed(&(ep->pc[pasid].fsc), cpu_to_le64(fsc)));
> +       ta = le64_to_cpu(xchg_relaxed(&(ep->pc[pasid].ta), cpu_to_le64(ta)));
> +
> +       wmb();
> +
> +       if (ta & RISCV_IOMMU_PC_TA_V) {
> +               riscv_iommu_iodir_inv_pasid(ep->iommu, ep->devid, pasid);
> +               riscv_iommu_iofence_sync(ep->iommu);
> +       }
> +
> +       dev_info(dev, "domain type %d attached w/ PSCID %u PASID %u\n",
> +           domain->domain.type, domain->pscid, domain->pasid);
> +
> +       return 0;
> +}
> +
> +static void riscv_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
> +{
> +       struct riscv_iommu_endpoint *ep = dev_iommu_priv_get(dev);
> +       struct riscv_iommu_command cmd;
> +       unsigned long payload = riscv_iommu_ats_inval_all_payload(false);
> +       u64 ta;
> +
> +       /* invalidate TA.V */
> +       ta = le64_to_cpu(xchg_relaxed(&(ep->pc[pasid].ta), 0));
> +
> +       wmb();
> +
> +       dev_info(dev, "domain removed w/ PSCID %u PASID %u\n",
> +           (unsigned)FIELD_GET(RISCV_IOMMU_PC_TA_PSCID, ta), pasid);
> +
> +       /* 1. invalidate PDT entry */
> +       riscv_iommu_iodir_inv_pasid(ep->iommu, ep->devid, pasid);
> +
> +       /* 2. invalidate all matching IOATC entries (if PASID was valid) */
> +       if (ta & RISCV_IOMMU_PC_TA_V) {
> +               riscv_iommu_cmd_inval_vma(&cmd);
> +               riscv_iommu_cmd_inval_set_gscid(&cmd, 0);
> +               riscv_iommu_cmd_inval_set_pscid(&cmd,
> +                   FIELD_GET(RISCV_IOMMU_PC_TA_PSCID, ta));
> +               riscv_iommu_post(ep->iommu, &cmd);
> +       }
> +
> +       /* 3. Wait IOATC flush to happen */
> +       riscv_iommu_iofence_sync(ep->iommu);
> +
> +       /* 4. ATS invalidation */
> +       riscv_iommu_cmd_ats_inval(&cmd);
> +       riscv_iommu_cmd_ats_set_dseg(&cmd, ep->domid);
> +       riscv_iommu_cmd_ats_set_rid(&cmd, ep->devid);
> +       riscv_iommu_cmd_ats_set_pid(&cmd, pasid);
> +       riscv_iommu_cmd_ats_set_payload(&cmd, payload);
> +       riscv_iommu_post(ep->iommu, &cmd);
> +
> +       /* 5. Wait DevATC flush to happen */
> +       riscv_iommu_iofence_sync(ep->iommu);
> +}
> +
>  static void riscv_iommu_flush_iotlb_range(struct iommu_domain *iommu_domain,
>                                           unsigned long *start, unsigned long *end,
>                                           size_t *pgsize)
>  {
>         struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
>         struct riscv_iommu_command cmd;
> +       struct riscv_iommu_endpoint *endpoint;
>         unsigned long iova;
> +       unsigned long payload;
>
>         if (domain->mode == RISCV_IOMMU_DC_FSC_MODE_BARE)
>                 return;
> @@ -1065,6 +1616,12 @@ static void riscv_iommu_flush_iotlb_range(struct iommu_domain *iommu_domain,
>         /* Domain not attached to an IOMMU! */
>         BUG_ON(!domain->iommu);
>
> +       if (start && end) {
> +               payload = riscv_iommu_ats_inval_payload(*start, *end, true);
> +       } else {
> +               payload = riscv_iommu_ats_inval_all_payload(true);
> +       }
> +
>         riscv_iommu_cmd_inval_vma(&cmd);
>         riscv_iommu_cmd_inval_set_pscid(&cmd, domain->pscid);
>
> @@ -1078,6 +1635,20 @@ static void riscv_iommu_flush_iotlb_range(struct iommu_domain *iommu_domain,
>                 riscv_iommu_post(domain->iommu, &cmd);
>         }
>         riscv_iommu_iofence_sync(domain->iommu);
> +
> +       /* ATS invalidation for every device and for every translation */
> +       list_for_each_entry(endpoint, &domain->endpoints, domain) {
> +               if (!endpoint->pasid_enabled)
> +                       continue;
> +
> +               riscv_iommu_cmd_ats_inval(&cmd);
> +               riscv_iommu_cmd_ats_set_dseg(&cmd, endpoint->domid);
> +               riscv_iommu_cmd_ats_set_rid(&cmd, endpoint->devid);
> +               riscv_iommu_cmd_ats_set_pid(&cmd, domain->pasid);
> +               riscv_iommu_cmd_ats_set_payload(&cmd, payload);
> +               riscv_iommu_post(domain->iommu, &cmd);
> +       }
> +       riscv_iommu_iofence_sync(domain->iommu);
>  }
>
>  static void riscv_iommu_flush_iotlb_all(struct iommu_domain *iommu_domain)
> @@ -1310,6 +1881,7 @@ static int riscv_iommu_enable(struct riscv_iommu_device *iommu, unsigned request
>  static const struct iommu_domain_ops riscv_iommu_domain_ops = {
>         .free = riscv_iommu_domain_free,
>         .attach_dev = riscv_iommu_attach_dev,
> +       .set_dev_pasid = riscv_iommu_set_dev_pasid,
>         .map_pages = riscv_iommu_map_pages,
>         .unmap_pages = riscv_iommu_unmap_pages,
>         .iova_to_phys = riscv_iommu_iova_to_phys,
> @@ -1326,9 +1898,13 @@ static const struct iommu_ops riscv_iommu_ops = {
>         .probe_device = riscv_iommu_probe_device,
>         .probe_finalize = riscv_iommu_probe_finalize,
>         .release_device = riscv_iommu_release_device,
> +       .remove_dev_pasid = riscv_iommu_remove_dev_pasid,
>         .device_group = riscv_iommu_device_group,
>         .get_resv_regions = riscv_iommu_get_resv_regions,
>         .of_xlate = riscv_iommu_of_xlate,
> +       .dev_enable_feat = riscv_iommu_dev_enable_feat,
> +       .dev_disable_feat = riscv_iommu_dev_disable_feat,
> +       .page_response = riscv_iommu_page_response,
>         .default_domain_ops = &riscv_iommu_domain_ops,
>  };
>
> @@ -1340,6 +1916,7 @@ void riscv_iommu_remove(struct riscv_iommu_device *iommu)
>         riscv_iommu_queue_free(iommu, &iommu->cmdq);
>         riscv_iommu_queue_free(iommu, &iommu->fltq);
>         riscv_iommu_queue_free(iommu, &iommu->priq);
> +       iopf_queue_free(iommu->pq_work);
>  }
>
>  int riscv_iommu_init(struct riscv_iommu_device *iommu)
> @@ -1362,6 +1939,12 @@ int riscv_iommu_init(struct riscv_iommu_device *iommu)
>         }
>  #endif
>
> +       if (iommu->cap & RISCV_IOMMU_CAP_PD20)
> +               iommu->iommu.max_pasids = 1u << 20;
> +       else if (iommu->cap & RISCV_IOMMU_CAP_PD17)
> +               iommu->iommu.max_pasids = 1u << 17;
> +       else if (iommu->cap & RISCV_IOMMU_CAP_PD8)
> +               iommu->iommu.max_pasids = 1u << 8;
>         /*
>          * Assign queue lengths from module parameters if not already
>          * set on the device tree.
> @@ -1387,6 +1970,13 @@ int riscv_iommu_init(struct riscv_iommu_device *iommu)
>                 goto fail;
>         if (!(iommu->cap & RISCV_IOMMU_CAP_ATS))
>                 goto no_ats;
> +       /* PRI functionally depends on the ATS capability. */
> +       iommu->pq_work = iopf_queue_alloc(dev_name(dev));
> +       if (!iommu->pq_work) {
> +               dev_err(dev, "failed to allocate iopf queue\n");
> +               ret = -ENOMEM;
> +               goto fail;
> +       }
>
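
For context, this queue is what the per-device IOPF enable path would hook
into; a sketch of the expected pairing, mirroring what other IOPF-capable
drivers (e.g. SMMUv3) do (riscv_iommu_enable_iopf() itself is not shown in
this hunk, so its exact shape here is an assumption):

static int riscv_iommu_enable_iopf(struct device *dev)
{
	struct riscv_iommu_endpoint *ep = dev_iommu_priv_get(dev);
	int ret;

	/* Route this device's page requests to the shared IOPF workqueue. */
	ret = iopf_queue_add_device(ep->iommu->pq_work, dev);
	if (ret)
		return ret;

	/* iommu_queue_iopf() is the generic io-pgfault handler. */
	return iommu_register_device_fault_handler(dev, iommu_queue_iopf, dev);
}
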
>         ret = riscv_iommu_queue_init(iommu, RISCV_IOMMU_PAGE_REQUEST_QUEUE);
>         if (ret)
> @@ -1424,5 +2014,6 @@ int riscv_iommu_init(struct riscv_iommu_device *iommu)
>         riscv_iommu_queue_free(iommu, &iommu->priq);
>         riscv_iommu_queue_free(iommu, &iommu->fltq);
>         riscv_iommu_queue_free(iommu, &iommu->cmdq);
> +       iopf_queue_free(iommu->pq_work);
>         return ret;
>  }
> diff --git a/drivers/iommu/riscv/iommu.h b/drivers/iommu/riscv/iommu.h
> index fe32a4eff14e..83e8d00fd0f8 100644
> --- a/drivers/iommu/riscv/iommu.h
> +++ b/drivers/iommu/riscv/iommu.h
> @@ -17,9 +17,11 @@
>  #include <linux/iova.h>
>  #include <linux/io.h>
>  #include <linux/idr.h>
> +#include <linux/mmu_notifier.h>
>  #include <linux/list.h>
>  #include <linux/iommu.h>
>  #include <linux/io-pgtable.h>
> +#include <linux/mmu_notifier.h>

You include mmu_notifier.h twice in this header.

>
>  #include "iommu-bits.h"
>
> @@ -76,6 +78,9 @@ struct riscv_iommu_device {
>         unsigned ddt_mode;
>         bool ddtp_in_iomem;
>
> +       /* I/O page fault queue */
> +       struct iopf_queue *pq_work;
> +
>         /* hardware queues */
>         struct riscv_iommu_queue cmdq;
>         struct riscv_iommu_queue fltq;
> @@ -91,11 +96,14 @@ struct riscv_iommu_domain {
>         struct io_pgtable pgtbl;
>
>         struct list_head endpoints;
> +       struct list_head notifiers;
>         struct mutex lock;
> +       struct mmu_notifier mn;
>         struct riscv_iommu_device *iommu;
>
>         unsigned mode;          /* RIO_ATP_MODE_* enum */
>         unsigned pscid;         /* RISC-V IOMMU PSCID */
> +       ioasid_t pasid;         /* IOMMU_DOMAIN_SVA: Cached PASID */
>
>         pgd_t *pgd_root;        /* page table root pointer */
>  };
> @@ -107,10 +115,16 @@ struct riscv_iommu_endpoint {
>         unsigned domid;                         /* PCI domain number, segment */
>         struct rb_node node;                    /* device tracking node (lookup by devid) */
>         struct riscv_iommu_dc *dc;              /* device context pointer */
> +       struct riscv_iommu_pc *pc;              /* process context root, valid if pasid_enabled is true */
>         struct riscv_iommu_device *iommu;       /* parent iommu device */
>
>         struct mutex lock;
>         struct list_head domain;                /* endpoint attached managed domain */
> +
> +       /* end point info bits */
> +       unsigned pasid_bits;
> +       unsigned pasid_feat;
> +       bool pasid_enabled;
>  };
>
>  /* Helper functions and macros */
> --
> 2.34.1
>
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 06/11] RISC-V: drivers/iommu/riscv: Add command, fault, page-req queues
  2023-07-29 12:58   ` Zong Li
@ 2023-07-31  9:32     ` Nick Kossifidis
  2023-07-31 13:15       ` Zong Li
  2023-08-02 20:50     ` Tomasz Jeznach
  1 sibling, 1 reply; 86+ messages in thread
From: Nick Kossifidis @ 2023-07-31  9:32 UTC (permalink / raw)
  To: Zong Li, Tomasz Jeznach
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley,
	Anup Patel, Albert Ou, linux, linux-kernel, Sebastien Boeuf,
	iommu, Palmer Dabbelt, linux-riscv

On 7/29/23 15:58, Zong Li wrote:
> On Thu, Jul 20, 2023 at 3:34 AM Tomasz Jeznach <tjeznach@rivosinc.com> wrote:
>> +       iommu->cap = riscv_iommu_readq(iommu, RISCV_IOMMU_REG_CAP);
>> +
>> +       /* For now we only support WSIs until we have AIA support */
> 
> I don't completely understand the AIA support here, because I saw the PCI
> case uses MSIs, and the kernel seems to have an AIA implementation.
> Could you please elaborate?
> 

When I wrote this we didn't have AIA in the kernel, and without IMSIC we 
can't have MSIs in the hart (we can still have MSIs in the PCIe controller).

> 
> Should we define the "interrupt-names" in dt-bindings?
> 

Yes we should, along with queue lengths below.
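
Something along these lines in the binding example, presumably (the
interrupt-names match the driver's get-byname lookups; the compatible
string, unit address and interrupt numbers are placeholders):

    iommu@26000000 {
            compatible = "riscv,iommu";
            reg = <0x0 0x26000000 0x0 0x1000>;
            interrupt-parent = <&plic>;
            interrupts = <32>, <33>, <34>, <35>;
            interrupt-names = "cmdq", "fltq", "pm", "priq";
    };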

>> +
>> +       /* Make sure fctl.WSI is set */
>> +       fctl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_FCTL);
>> +       fctl |= RISCV_IOMMU_FCTL_WSI;
>> +       riscv_iommu_writel(iommu, RISCV_IOMMU_REG_FCTL, fctl);
>> +
>> +       /* Parse queue lengths */
>> +       ret = of_property_read_u32(pdev->dev.of_node, "cmdq_len", &iommu->cmdq_len);
>> +       if (!ret)
>> +               dev_info(dev, "command queue length set to %i\n", iommu->cmdq_len);
>> +
>> +       ret = of_property_read_u32(pdev->dev.of_node, "fltq_len", &iommu->fltq_len);
>> +       if (!ret)
>> +               dev_info(dev, "fault/event queue length set to %i\n", iommu->fltq_len);
>> +
>> +       ret = of_property_read_u32(pdev->dev.of_node, "priq_len", &iommu->priq_len);
>> +       if (!ret)
>> +               dev_info(dev, "page request queue length set to %i\n", iommu->priq_len);
>> +
>>          dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
>>

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 06/11] RISC-V: drivers/iommu/riscv: Add command, fault, page-req queues
  2023-07-31  9:32     ` Nick Kossifidis
@ 2023-07-31 13:15       ` Zong Li
  2023-07-31 23:35         ` Nick Kossifidis
  0 siblings, 1 reply; 86+ messages in thread
From: Zong Li @ 2023-07-31 13:15 UTC (permalink / raw)
  To: Nick Kossifidis
  Cc: Tomasz Jeznach, Joerg Roedel, Will Deacon, Robin Murphy,
	Paul Walmsley, Anup Patel, Albert Ou, linux, linux-kernel,
	Sebastien Boeuf, iommu, Palmer Dabbelt, linux-riscv

On Mon, Jul 31, 2023 at 5:32 PM Nick Kossifidis <mick@ics.forth.gr> wrote:
>
> On 7/29/23 15:58, Zong Li wrote:
> > On Thu, Jul 20, 2023 at 3:34 AM Tomasz Jeznach <tjeznach@rivosinc.com> wrote:
> >> +       iommu->cap = riscv_iommu_readq(iommu, RISCV_IOMMU_REG_CAP);
> >> +
> >> +       /* For now we only support WSIs until we have AIA support */
> >
> > I don't completely understand the AIA support here, because I saw the PCI
> > case uses MSIs, and the kernel seems to have an AIA implementation.
> > Could you please elaborate?
> >
>
> When I wrote this we didn't have AIA in the kernel, and without IMSIC we
> can't have MSIs in the hart (we can still have MSIs in the PCIe controller).

Thanks for your clarification. Will MSI be supported in the next version?

>
> >
> > Should we define the "interrupt-names" in dt-bindings?
> >
>
> Yes we should, along with queue lengths below.
>
> >> +
> >> +       /* Make sure fctl.WSI is set */
> >> +       fctl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_FCTL);
> >> +       fctl |= RISCV_IOMMU_FCTL_WSI;
> >> +       riscv_iommu_writel(iommu, RISCV_IOMMU_REG_FCTL, fctl);
> >> +
> >> +       /* Parse queue lengths */
> >> +       ret = of_property_read_u32(pdev->dev.of_node, "cmdq_len", &iommu->cmdq_len);
> >> +       if (!ret)
> >> +               dev_info(dev, "command queue length set to %i\n", iommu->cmdq_len);
> >> +
> >> +       ret = of_property_read_u32(pdev->dev.of_node, "fltq_len", &iommu->fltq_len);
> >> +       if (!ret)
> >> +               dev_info(dev, "fault/event queue length set to %i\n", iommu->fltq_len);
> >> +
> >> +       ret = of_property_read_u32(pdev->dev.of_node, "priq_len", &iommu->priq_len);
> >> +       if (!ret)
> >> +               dev_info(dev, "page request queue length set to %i\n", iommu->priq_len);
> >> +
> >>          dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
> >>

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 06/11] RISC-V: drivers/iommu/riscv: Add command, fault, page-req queues
  2023-07-31 13:15       ` Zong Li
@ 2023-07-31 23:35         ` Nick Kossifidis
  2023-08-01  0:37           ` Zong Li
  0 siblings, 1 reply; 86+ messages in thread
From: Nick Kossifidis @ 2023-07-31 23:35 UTC (permalink / raw)
  To: Zong Li
  Cc: Tomasz Jeznach, Joerg Roedel, Will Deacon, Robin Murphy,
	Paul Walmsley, Anup Patel, Albert Ou, linux, linux-kernel,
	Sebastien Boeuf, iommu, Palmer Dabbelt, linux-riscv

On 7/31/23 16:15, Zong Li wrote:
> On Mon, Jul 31, 2023 at 5:32 PM Nick Kossifidis <mick@ics.forth.gr> wrote:
>>
>> On 7/29/23 15:58, Zong Li wrote:
>>> On Thu, Jul 20, 2023 at 3:34 AM Tomasz Jeznach <tjeznach@rivosinc.com> wrote:
>>>> +       iommu->cap = riscv_iommu_readq(iommu, RISCV_IOMMU_REG_CAP);
>>>> +
>>>> +       /* For now we only support WSIs until we have AIA support */
>>>
> >>> I don't completely understand the AIA support here, because I saw the PCI
> >>> case uses MSIs, and the kernel seems to have an AIA implementation.
> >>> Could you please elaborate?
>>>
>>
>> When I wrote this we didn't have AIA in the kernel, and without IMSIC we
>> can't have MSIs in the hart (we can still have MSIs in the PCIe controller).
> 
> Thanks for your clarification. Will MSI be supported in the next version?
> 

I don't think there is an IOMMU implementation out there (emulated or in
hw) that can do MSIs and is not a PCIe device (the QEMU implementation
is a PCIe device). If we have something to test this against, and we
also have an IMSIC etc., we can work on that.

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 06/11] RISC-V: drivers/iommu/riscv: Add command, fault, page-req queues
  2023-07-31 23:35         ` Nick Kossifidis
@ 2023-08-01  0:37           ` Zong Li
  2023-08-02 20:28             ` Tomasz Jeznach
  0 siblings, 1 reply; 86+ messages in thread
From: Zong Li @ 2023-08-01  0:37 UTC (permalink / raw)
  To: Nick Kossifidis
  Cc: Tomasz Jeznach, Joerg Roedel, Will Deacon, Robin Murphy,
	Paul Walmsley, Anup Patel, Albert Ou, linux, linux-kernel,
	Sebastien Boeuf, iommu, Palmer Dabbelt, linux-riscv

On Tue, Aug 1, 2023 at 7:35 AM Nick Kossifidis <mick@ics.forth.gr> wrote:
>
> On 7/31/23 16:15, Zong Li wrote:
> > On Mon, Jul 31, 2023 at 5:32 PM Nick Kossifidis <mick@ics.forth.gr> wrote:
> >>
> >> On 7/29/23 15:58, Zong Li wrote:
> >>> On Thu, Jul 20, 2023 at 3:34 AM Tomasz Jeznach <tjeznach@rivosinc.com> wrote:
> >>>> +       iommu->cap = riscv_iommu_readq(iommu, RISCV_IOMMU_REG_CAP);
> >>>> +
> >>>> +       /* For now we only support WSIs until we have AIA support */
> >>>
> >>> I'm not completely understand AIA support here, because I saw the pci
> >>> case uses the MSI, and kernel seems to have the AIA implementation.
> >>> Could you please elaborate it?
> >>>
> >>
> >> When I wrote this we didn't have AIA in the kernel, and without IMSIC we
> >> can't have MSIs in the hart (we can still have MSIs in the PCIe controller).
> >
> > Thanks for your clarification. Will MSI be supported in the next version?
> >
>
> I don't think there is an IOMMU implementation out there (emulated or in
> hw) that can do MSIs and is not a pcie device (the QEMU implementation
> is a pcie device). If we have something to test this against, and we
> also have an IMSIC etc, we can work on that.

I guess I can assist with that. We have IOMMU hardware (a non-PCIe
device) that has already implemented the MSI functionality, and I have
tested it. Perhaps I can add the related implementation here after this
series is merged.

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 01/11] RISC-V: drivers/iommu: Add RISC-V IOMMU - Ziommu support.
  2023-07-28  2:42   ` Zong Li
@ 2023-08-02 20:15     ` Tomasz Jeznach
  2023-08-02 20:25       ` Conor Dooley
  2023-08-03  3:37       ` Zong Li
  0 siblings, 2 replies; 86+ messages in thread
From: Tomasz Jeznach @ 2023-08-02 20:15 UTC (permalink / raw)
  To: Zong Li
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley,
	Anup Patel, Albert Ou, linux, linux-kernel, Sebastien Boeuf,
	iommu, Palmer Dabbelt, Nick Kossifidis, linux-riscv

On Thu, Jul 27, 2023 at 7:42 PM Zong Li <zong.li@sifive.com> wrote:
>
> On Thu, Jul 20, 2023 at 3:34 AM Tomasz Jeznach <tjeznach@rivosinc.com> wrote:
> >
> > +static int riscv_iommu_platform_probe(struct platform_device *pdev)
> > +{
> > +       struct device *dev = &pdev->dev;
> > +       struct riscv_iommu_device *iommu = NULL;
> > +       struct resource *res = NULL;
> > +       int ret = 0;
> > +
> > +       iommu = devm_kzalloc(dev, sizeof(*iommu), GFP_KERNEL);
> > +       if (!iommu)
> > +               return -ENOMEM;
> > +
> > +       iommu->dev = dev;
> > +       dev_set_drvdata(dev, iommu);
> > +
> > +       res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> > +       if (!res) {
> > +               dev_err(dev, "could not find resource for register region\n");
> > +               return -EINVAL;
> > +       }
> > +
> > +       iommu->reg = devm_platform_get_and_ioremap_resource(pdev, 0, &res);
> > +       if (IS_ERR(iommu->reg)) {
> > +               ret = dev_err_probe(dev, PTR_ERR(iommu->reg),
> > +                                   "could not map register region\n");
> > +               goto fail;
> > +       }
> > +
> > +       iommu->reg_phys = res->start;
> > +
> > +       ret = -ENODEV;
> > +
> > +       /* Sanity check: Did we get the whole register space ? */
> > +       if ((res->end - res->start + 1) < RISCV_IOMMU_REG_SIZE) {
> > +               dev_err(dev, "device region smaller than register file (0x%llx)\n",
> > +                       res->end - res->start);
> > +               goto fail;
> > +       }
>
> Could we assume that DT should be responsible for specifying the right size?
>

This is only to validate the DT-provided info against the register file
size the driver expects. The expectation is that DT will provide the
right size.


> > +static struct iommu_domain *riscv_iommu_domain_alloc(unsigned type)
> > +{
> > +       struct riscv_iommu_domain *domain;
> > +
> > +       if (type != IOMMU_DOMAIN_IDENTITY &&
> > +           type != IOMMU_DOMAIN_BLOCKED)
> > +               return NULL;
> > +
> > +       domain = kzalloc(sizeof(*domain), GFP_KERNEL);
> > +       if (!domain)
> > +               return NULL;
> > +
> > +       mutex_init(&domain->lock);
> > +       INIT_LIST_HEAD(&domain->endpoints);
> > +
> > +       domain->domain.ops = &riscv_iommu_domain_ops;
> > +       domain->mode = RISCV_IOMMU_DC_FSC_MODE_BARE;
> > +       domain->pscid = ida_alloc_range(&riscv_iommu_pscids, 1,
> > +                                       RISCV_IOMMU_MAX_PSCID, GFP_KERNEL);
> > +
> > +       printk("domain type %x alloc %u\n", type, domain->pscid);
> > +
>
> Could it use pr_xxx instead of printk?
>

Absolutely, fixed here and elsewhere. Also, used dev_dbg wherever applicable.

> > +
> > +static int riscv_iommu_enable(struct riscv_iommu_device *iommu, unsigned requested_mode)
> > +{
> > +       struct device *dev = iommu->dev;
> > +       u64 ddtp = 0;
> > +       u64 ddtp_paddr = 0;
> > +       unsigned mode = requested_mode;
> > +       unsigned mode_readback = 0;
> > +
> > +       ddtp = riscv_iommu_get_ddtp(iommu);
> > +       if (ddtp & RISCV_IOMMU_DDTP_BUSY)
> > +               return -EBUSY;
> > +
> > +       /* Disallow state transition from xLVL to xLVL. */
> > +       switch (FIELD_GET(RISCV_IOMMU_DDTP_MODE, ddtp)) {
> > +       case RISCV_IOMMU_DDTP_MODE_BARE:
> > +       case RISCV_IOMMU_DDTP_MODE_OFF:
> > +               break;
> > +       default:
> > +               if ((mode != RISCV_IOMMU_DDTP_MODE_BARE)
> > +                   && (mode != RISCV_IOMMU_DDTP_MODE_OFF))
> > +                       return -EINVAL;
> > +               break;
> > +       }
> > +
> > + retry:
>
> We need to consider `iommu.passthrough` before we set up the mode in
> the switch case, something like
>

This function is only to execute configuration and set device directory mode.
Handling global iommu.passthrough policy is implemented in
riscv_iommu_init() call (patch #7).

Best,
- Tomasz

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 01/11] RISC-V: drivers/iommu: Add RISC-V IOMMU - Ziommu support.
  2023-08-02 20:15     ` Tomasz Jeznach
@ 2023-08-02 20:25       ` Conor Dooley
  2023-08-03  3:37       ` Zong Li
  1 sibling, 0 replies; 86+ messages in thread
From: Conor Dooley @ 2023-08-02 20:25 UTC (permalink / raw)
  To: Tomasz Jeznach
  Cc: Zong Li, Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley,
	Anup Patel, Albert Ou, linux, linux-kernel, Sebastien Boeuf,
	iommu, Palmer Dabbelt, Nick Kossifidis, linux-riscv

On Wed, Aug 02, 2023 at 01:15:22PM -0700, Tomasz Jeznach wrote:
> On Thu, Jul 27, 2023 at 7:42 PM Zong Li <zong.li@sifive.com> wrote:
> >
> > On Thu, Jul 20, 2023 at 3:34 AM Tomasz Jeznach <tjeznach@rivosinc.com> wrote:
> > >
> > > +static int riscv_iommu_platform_probe(struct platform_device *pdev)
> > > +{
> > > +       struct device *dev = &pdev->dev;
> > > +       struct riscv_iommu_device *iommu = NULL;
> > > +       struct resource *res = NULL;
> > > +       int ret = 0;
> > > +
> > > +       iommu = devm_kzalloc(dev, sizeof(*iommu), GFP_KERNEL);
> > > +       if (!iommu)
> > > +               return -ENOMEM;
> > > +
> > > +       iommu->dev = dev;
> > > +       dev_set_drvdata(dev, iommu);
> > > +
> > > +       res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> > > +       if (!res) {
> > > +               dev_err(dev, "could not find resource for register region\n");
> > > +               return -EINVAL;
> > > +       }
> > > +
> > > +       iommu->reg = devm_platform_get_and_ioremap_resource(pdev, 0, &res);
> > > +       if (IS_ERR(iommu->reg)) {
> > > +               ret = dev_err_probe(dev, PTR_ERR(iommu->reg),
> > > +                                   "could not map register region\n");
> > > +               goto fail;
> > > +       }
> > > +
> > > +       iommu->reg_phys = res->start;
> > > +
> > > +       ret = -ENODEV;
> > > +
> > > +       /* Sanity check: Did we get the whole register space ? */
> > > +       if ((res->end - res->start + 1) < RISCV_IOMMU_REG_SIZE) {
> > > +               dev_err(dev, "device region smaller than register file (0x%llx)\n",
> > > +                       res->end - res->start);
> > > +               goto fail;
> > > +       }
> >
> > Could we assume that DT should be responsible for specifying the right size?
> >
> 
> This is only to validate the DT-provided info against the register file
> size the driver expects. The expectation is that DT will provide the
> right size.

FWIW this check seems needless to me; it's not the kernel's job to
validate the devicetree.



^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 06/11] RISC-V: drivers/iommu/riscv: Add command, fault, page-req queues
  2023-08-01  0:37           ` Zong Li
@ 2023-08-02 20:28             ` Tomasz Jeznach
  0 siblings, 0 replies; 86+ messages in thread
From: Tomasz Jeznach @ 2023-08-02 20:28 UTC (permalink / raw)
  To: Zong Li
  Cc: Nick Kossifidis, Joerg Roedel, Will Deacon, Robin Murphy,
	Paul Walmsley, Anup Patel, Albert Ou, linux, linux-kernel,
	Sebastien Boeuf, iommu, Palmer Dabbelt, linux-riscv

On Mon, Jul 31, 2023 at 5:38 PM Zong Li <zong.li@sifive.com> wrote:
>
> On Tue, Aug 1, 2023 at 7:35 AM Nick Kossifidis <mick@ics.forth.gr> wrote:
> >
> > On 7/31/23 16:15, Zong Li wrote:
> > > On Mon, Jul 31, 2023 at 5:32 PM Nick Kossifidis <mick@ics.forth.gr> wrote:
> > >>
> > >> On 7/29/23 15:58, Zong Li wrote:
> > >>> On Thu, Jul 20, 2023 at 3:34 AM Tomasz Jeznach <tjeznach@rivosinc.com> wrote:
> > >>>> +       iommu->cap = riscv_iommu_readq(iommu, RISCV_IOMMU_REG_CAP);
> > >>>> +
> > >>>> +       /* For now we only support WSIs until we have AIA support */
> > >>>
> > >>> I don't completely understand the AIA support here, because I saw the PCI
> > >>> case uses MSIs, and the kernel seems to have an AIA implementation.
> > >>> Could you please elaborate?
> > >>>
> > >>
> > >> When I wrote this we didn't have AIA in the kernel, and without IMSIC we
> > >> can't have MSIs in the hart (we can still have MSIs in the PCIe controller).
> > >
> > > Thanks for your clarification. Will MSI be supported in the next version?
> > >
> >
> > I don't think there is an IOMMU implementation out there (emulated or in
> > hw) that can do MSIs and is not a PCIe device (the QEMU implementation
> > is a PCIe device). If we have something to test this against, and we
> > also have an IMSIC etc., we can work on that.
>
> I guess I can assist with that. We have IOMMU hardware (a non-PCIe
> device) that has already implemented the MSI functionality, and I have
> tested it. Perhaps I can add the related implementation here after this
> series is merged.

Thanks, getting MSI support for non-PCIe IOMMU hardware would be great!

best,
- Tomasz

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 06/11] RISC-V: drivers/iommu/riscv: Add command, fault, page-req queues
  2023-07-29 12:58   ` Zong Li
  2023-07-31  9:32     ` Nick Kossifidis
@ 2023-08-02 20:50     ` Tomasz Jeznach
  2023-08-03  8:24       ` Zong Li
  1 sibling, 1 reply; 86+ messages in thread
From: Tomasz Jeznach @ 2023-08-02 20:50 UTC (permalink / raw)
  To: Zong Li
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley,
	Anup Patel, Albert Ou, linux, linux-kernel, Sebastien Boeuf,
	iommu, Palmer Dabbelt, Nick Kossifidis, linux-riscv

On Sat, Jul 29, 2023 at 5:58 AM Zong Li <zong.li@sifive.com> wrote:
>
> On Thu, Jul 20, 2023 at 3:34 AM Tomasz Jeznach <tjeznach@rivosinc.com> wrote:
> >
> > Enables message or wire signal interrupts for PCIe and platforms devices.
> >
> > Co-developed-by: Nick Kossifidis <mick@ics.forth.gr>
> > Signed-off-by: Nick Kossifidis <mick@ics.forth.gr>
> > Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
> > ---
> >  drivers/iommu/riscv/iommu-pci.c      |  72 ++++
> >  drivers/iommu/riscv/iommu-platform.c |  66 +++
> >  drivers/iommu/riscv/iommu.c          | 604 ++++++++++++++++++++++++++-
> >  drivers/iommu/riscv/iommu.h          |  28 ++
> >  4 files changed, 769 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/riscv/iommu-pci.c b/drivers/iommu/riscv/iommu-pci.c
> > index c91f963d7a29..9ea0647f7b92 100644
> > --- a/drivers/iommu/riscv/iommu-pci.c
> > +++ b/drivers/iommu/riscv/iommu-pci.c
> > @@ -34,6 +34,7 @@ static int riscv_iommu_pci_probe(struct pci_dev *pdev, const struct pci_device_i
> >  {
> >         struct device *dev = &pdev->dev;
> >         struct riscv_iommu_device *iommu;
> > +       u64 icvec;
> >         int ret;
> >
> >         ret = pci_enable_device_mem(pdev);
> > @@ -67,14 +68,84 @@ static int riscv_iommu_pci_probe(struct pci_dev *pdev, const struct pci_device_i
> >         iommu->dev = dev;
> >         dev_set_drvdata(dev, iommu);
> >
> > +       /* Check device reported capabilities. */
> > +       iommu->cap = riscv_iommu_readq(iommu, RISCV_IOMMU_REG_CAP);
> > +
> > +       /* The PCI driver only uses MSIs, make sure the IOMMU supports this */
> > +       switch (FIELD_GET(RISCV_IOMMU_CAP_IGS, iommu->cap)) {
> > +       case RISCV_IOMMU_CAP_IGS_MSI:
> > +       case RISCV_IOMMU_CAP_IGS_BOTH:
> > +               break;
> > +       default:
> > +               dev_err(dev, "unable to use message-signaled interrupts\n");
> > +               ret = -ENODEV;
> > +               goto fail;
> > +       }
> > +
> >         dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
> >         pci_set_master(pdev);
> >
> > +       /* Allocate and assign IRQ vectors for the various events */
> > +       ret = pci_alloc_irq_vectors(pdev, 1, RISCV_IOMMU_INTR_COUNT, PCI_IRQ_MSIX);
> > +       if (ret < 0) {
> > +               dev_err(dev, "unable to allocate irq vectors\n");
> > +               goto fail;
> > +       }
> > +
> > +       ret = -ENODEV;
> > +
> > +       iommu->irq_cmdq = msi_get_virq(dev, RISCV_IOMMU_INTR_CQ);
> > +       if (!iommu->irq_cmdq) {
> > +               dev_warn(dev, "no MSI vector %d for the command queue\n",
> > +                        RISCV_IOMMU_INTR_CQ);
> > +               goto fail;
> > +       }
> > +
> > +       iommu->irq_fltq = msi_get_virq(dev, RISCV_IOMMU_INTR_FQ);
> > +       if (!iommu->irq_fltq) {
> > +               dev_warn(dev, "no MSI vector %d for the fault/event queue\n",
> > +                        RISCV_IOMMU_INTR_FQ);
> > +               goto fail;
> > +       }
> > +
> > +       if (iommu->cap & RISCV_IOMMU_CAP_HPM) {
> > +               iommu->irq_pm = msi_get_virq(dev, RISCV_IOMMU_INTR_PM);
> > +               if (!iommu->irq_pm) {
> > +                       dev_warn(dev,
> > +                                "no MSI vector %d for performance monitoring\n",
> > +                                RISCV_IOMMU_INTR_PM);
> > +                       goto fail;
> > +               }
> > +       }
> > +
> > +       if (iommu->cap & RISCV_IOMMU_CAP_ATS) {
> > +               iommu->irq_priq = msi_get_virq(dev, RISCV_IOMMU_INTR_PQ);
> > +               if (!iommu->irq_priq) {
> > +                       dev_warn(dev,
> > +                                "no MSI vector %d for page-request queue\n",
> > +                                RISCV_IOMMU_INTR_PQ);
> > +                       goto fail;
> > +               }
> > +       }
> > +
> > +       /* Set simple 1:1 mapping for MSI vectors */
> > +       icvec = FIELD_PREP(RISCV_IOMMU_IVEC_CIV, RISCV_IOMMU_INTR_CQ) |
> > +           FIELD_PREP(RISCV_IOMMU_IVEC_FIV, RISCV_IOMMU_INTR_FQ);
> > +
> > +       if (iommu->cap & RISCV_IOMMU_CAP_HPM)
> > +               icvec |= FIELD_PREP(RISCV_IOMMU_IVEC_PMIV, RISCV_IOMMU_INTR_PM);
> > +
> > +       if (iommu->cap & RISCV_IOMMU_CAP_ATS)
> > +               icvec |= FIELD_PREP(RISCV_IOMMU_IVEC_PIV, RISCV_IOMMU_INTR_PQ);
> > +
> > +       riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IVEC, icvec);
> > +
> >         ret = riscv_iommu_init(iommu);
> >         if (!ret)
> >                 return ret;
> >
> >   fail:
> > +       pci_free_irq_vectors(pdev);
> >         pci_clear_master(pdev);
> >         pci_release_regions(pdev);
> >         pci_disable_device(pdev);
> > @@ -85,6 +156,7 @@ static int riscv_iommu_pci_probe(struct pci_dev *pdev, const struct pci_device_i
> >  static void riscv_iommu_pci_remove(struct pci_dev *pdev)
> >  {
> >         riscv_iommu_remove(dev_get_drvdata(&pdev->dev));
> > +       pci_free_irq_vectors(pdev);
> >         pci_clear_master(pdev);
> >         pci_release_regions(pdev);
> >         pci_disable_device(pdev);
> > diff --git a/drivers/iommu/riscv/iommu-platform.c b/drivers/iommu/riscv/iommu-platform.c
> > index e4e8ca6711e7..35935d3c7ef4 100644
> > --- a/drivers/iommu/riscv/iommu-platform.c
> > +++ b/drivers/iommu/riscv/iommu-platform.c
> > @@ -20,6 +20,8 @@ static int riscv_iommu_platform_probe(struct platform_device *pdev)
> >         struct device *dev = &pdev->dev;
> >         struct riscv_iommu_device *iommu = NULL;
> >         struct resource *res = NULL;
> > +       u32 fctl = 0;
> > +       int irq = 0;
> >         int ret = 0;
> >
> >         iommu = devm_kzalloc(dev, sizeof(*iommu), GFP_KERNEL);
> > @@ -53,6 +55,70 @@ static int riscv_iommu_platform_probe(struct platform_device *pdev)
> >                 goto fail;
> >         }
> >
> > +       iommu->cap = riscv_iommu_readq(iommu, RISCV_IOMMU_REG_CAP);
> > +
> > +       /* For now we only support WSIs until we have AIA support */
>
> I don't completely understand the AIA support here, because I saw the PCI
> case uses MSIs, and the kernel seems to have an AIA implementation.
> Could you please elaborate?
>
> > +       ret = FIELD_GET(RISCV_IOMMU_CAP_IGS, iommu->cap);
> > +       if (ret == RISCV_IOMMU_CAP_IGS_MSI) {
> > +               dev_err(dev, "IOMMU only supports MSIs\n");
> > +               goto fail;
> > +       }
> > +
> > +       /* Parse IRQ assignment */
> > +       irq = platform_get_irq_byname_optional(pdev, "cmdq");
> > +       if (irq > 0)
> > +               iommu->irq_cmdq = irq;
> > +       else {
> > +               dev_err(dev, "no IRQ provided for the command queue\n");
> > +               goto fail;
> > +       }
> > +
> > +       irq = platform_get_irq_byname_optional(pdev, "fltq");
> > +       if (irq > 0)
> > +               iommu->irq_fltq = irq;
> > +       else {
> > +               dev_err(dev, "no IRQ provided for the fault/event queue\n");
> > +               goto fail;
> > +       }
> > +
> > +       if (iommu->cap & RISCV_IOMMU_CAP_HPM) {
> > +               irq = platform_get_irq_byname_optional(pdev, "pm");
> > +               if (irq > 0)
> > +                       iommu->irq_pm = irq;
> > +               else {
> > +                       dev_err(dev, "no IRQ provided for performance monitoring\n");
> > +                       goto fail;
> > +               }
> > +       }
> > +
> > +       if (iommu->cap & RISCV_IOMMU_CAP_ATS) {
> > +               irq = platform_get_irq_byname_optional(pdev, "priq");
> > +               if (irq > 0)
> > +                       iommu->irq_priq = irq;
> > +               else {
> > +                       dev_err(dev, "no IRQ provided for the page-request queue\n");
> > +                       goto fail;
> > +               }
> > +       }
>
> Should we define the "interrupt-names" in dt-bindings?
>

Yes, this was brought up earlier wrt dt-bindings.

I'm considering removing the interrupt names from DT (and the get-byname
lookup), as the IOMMU's cause-to-vector remapping register (`icvec`)
should be used to map each interrupt source to an actual interrupt
vector. If possible, the device driver should map causes to interrupts
(based on the number of vectors available) or rely on the WARL
properties of ICVEC to discover a fixed cause-to-vector mapping in the
hardware.

Please let me know if this is a reasonable change.
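
For the discovery part, the same write-then-readback approach the driver
already uses for the WARL queue base registers should work, roughly (the
vectors[] table mapping a vector index to a Linux IRQ is hypothetical):

	/* Request a 1:1 cause-to-vector mapping, then read back: WARL
	 * fields hardwired by the hardware will not take the written value. */
	riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IVEC, icvec);
	icvec = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_IVEC);

	/* Use whatever assignment the hardware actually accepted. */
	iommu->irq_cmdq = vectors[FIELD_GET(RISCV_IOMMU_IVEC_CIV, icvec)];
	iommu->irq_fltq = vectors[FIELD_GET(RISCV_IOMMU_IVEC_FIV, icvec)];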

> > +
> > +       /* Make sure fctl.WSI is set */
> > +       fctl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_FCTL);
> > +       fctl |= RISCV_IOMMU_FCTL_WSI;
> > +       riscv_iommu_writel(iommu, RISCV_IOMMU_REG_FCTL, fctl);
> > +
> > +       /* Parse queue lengths */
> > +       ret = of_property_read_u32(pdev->dev.of_node, "cmdq_len", &iommu->cmdq_len);
> > +       if (!ret)
> > +               dev_info(dev, "command queue length set to %i\n", iommu->cmdq_len);
> > +
> > +       ret = of_property_read_u32(pdev->dev.of_node, "fltq_len", &iommu->fltq_len);
> > +       if (!ret)
> > +               dev_info(dev, "fault/event queue length set to %i\n", iommu->fltq_len);
> > +
> > +       ret = of_property_read_u32(pdev->dev.of_node, "priq_len", &iommu->priq_len);
> > +       if (!ret)
> > +               dev_info(dev, "page request queue length set to %i\n", iommu->priq_len);
> > +
> >         dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
> >
> >         return riscv_iommu_init(iommu);
> > diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
> > index 31dc3c458e13..5c4cf9875302 100644
> > --- a/drivers/iommu/riscv/iommu.c
> > +++ b/drivers/iommu/riscv/iommu.c
> > @@ -45,6 +45,18 @@ static int ddt_mode = RISCV_IOMMU_DDTP_MODE_BARE;
> >  module_param(ddt_mode, int, 0644);
> >  MODULE_PARM_DESC(ddt_mode, "Device Directory Table mode.");
> >
> > +static int cmdq_length = 1024;
> > +module_param(cmdq_length, int, 0644);
> > +MODULE_PARM_DESC(cmdq_length, "Command queue length.");
> > +
> > +static int fltq_length = 1024;
> > +module_param(fltq_length, int, 0644);
> > +MODULE_PARM_DESC(fltq_length, "Fault queue length.");
> > +
> > +static int priq_length = 1024;
> > +module_param(priq_length, int, 0644);
> > +MODULE_PARM_DESC(priq_length, "Page request interface queue length.");
> > +
> >  /* IOMMU PSCID allocation namespace. */
> >  #define RISCV_IOMMU_MAX_PSCID  (1U << 20)
> >  static DEFINE_IDA(riscv_iommu_pscids);
> > @@ -65,6 +77,497 @@ static DEFINE_IDA(riscv_iommu_pscids);
> >  static const struct iommu_domain_ops riscv_iommu_domain_ops;
> >  static const struct iommu_ops riscv_iommu_ops;
> >
> > +/*
> > + * Common queue management routines
> > + */
> > +
> > +/* Note: offsets are the same for all queues */
> > +#define Q_HEAD(q) ((q)->qbr + (RISCV_IOMMU_REG_CQH - RISCV_IOMMU_REG_CQB))
> > +#define Q_TAIL(q) ((q)->qbr + (RISCV_IOMMU_REG_CQT - RISCV_IOMMU_REG_CQB))
> > +
> > +static unsigned riscv_iommu_queue_consume(struct riscv_iommu_device *iommu,
> > +                                         struct riscv_iommu_queue *q, unsigned *ready)
> > +{
> > +       u32 tail = riscv_iommu_readl(iommu, Q_TAIL(q));
> > +       *ready = q->lui;
> > +
> > +       BUG_ON(q->cnt <= tail);
> > +       if (q->lui <= tail)
> > +               return tail - q->lui;
> > +       return q->cnt - q->lui;
> > +}
> > +
> > +static void riscv_iommu_queue_release(struct riscv_iommu_device *iommu,
> > +                                     struct riscv_iommu_queue *q, unsigned count)
> > +{
> > +       q->lui = (q->lui + count) & (q->cnt - 1);
> > +       riscv_iommu_writel(iommu, Q_HEAD(q), q->lui);
> > +}
> > +
> > +static u32 riscv_iommu_queue_ctrl(struct riscv_iommu_device *iommu,
> > +                                 struct riscv_iommu_queue *q, u32 val)
> > +{
> > +       cycles_t end_cycles = RISCV_IOMMU_TIMEOUT + get_cycles();
> > +
> > +       riscv_iommu_writel(iommu, q->qcr, val);
> > +       do {
> > +               val = riscv_iommu_readl(iommu, q->qcr);
> > +               if (!(val & RISCV_IOMMU_QUEUE_BUSY))
> > +                       break;
> > +               cpu_relax();
> > +       } while (get_cycles() < end_cycles);
> > +
> > +       return val;
> > +}
> > +
> > +static void riscv_iommu_queue_free(struct riscv_iommu_device *iommu,
> > +                                  struct riscv_iommu_queue *q)
> > +{
> > +       size_t size = q->len * q->cnt;
> > +
> > +       riscv_iommu_queue_ctrl(iommu, q, 0);
> > +
> > +       if (q->base) {
> > +               if (q->in_iomem)
> > +                       iounmap(q->base);
> > +               else
> > +                       dmam_free_coherent(iommu->dev, size, q->base, q->base_dma);
> > +       }
> > +       if (q->irq)
> > +               free_irq(q->irq, q);
> > +}
> > +
> > +static irqreturn_t riscv_iommu_cmdq_irq_check(int irq, void *data);
> > +static irqreturn_t riscv_iommu_cmdq_process(int irq, void *data);
> > +static irqreturn_t riscv_iommu_fltq_irq_check(int irq, void *data);
> > +static irqreturn_t riscv_iommu_fltq_process(int irq, void *data);
> > +static irqreturn_t riscv_iommu_priq_irq_check(int irq, void *data);
> > +static irqreturn_t riscv_iommu_priq_process(int irq, void *data);
> > +
> > +static int riscv_iommu_queue_init(struct riscv_iommu_device *iommu, int queue_id)
> > +{
> > +       struct device *dev = iommu->dev;
> > +       struct riscv_iommu_queue *q = NULL;
> > +       size_t queue_size = 0;
> > +       irq_handler_t irq_check;
> > +       irq_handler_t irq_process;
> > +       const char *name;
> > +       int count = 0;
> > +       int irq = 0;
> > +       unsigned order = 0;
> > +       u64 qbr_val = 0;
> > +       u64 qbr_readback = 0;
> > +       u64 qbr_paddr = 0;
> > +       int ret = 0;
> > +
> > +       switch (queue_id) {
> > +       case RISCV_IOMMU_COMMAND_QUEUE:
> > +               q = &iommu->cmdq;
> > +               q->len = sizeof(struct riscv_iommu_command);
> > +               count = iommu->cmdq_len;
> > +               irq = iommu->irq_cmdq;
> > +               irq_check = riscv_iommu_cmdq_irq_check;
> > +               irq_process = riscv_iommu_cmdq_process;
> > +               q->qbr = RISCV_IOMMU_REG_CQB;
> > +               q->qcr = RISCV_IOMMU_REG_CQCSR;
> > +               name = "cmdq";
> > +               break;
> > +       case RISCV_IOMMU_FAULT_QUEUE:
> > +               q = &iommu->fltq;
> > +               q->len = sizeof(struct riscv_iommu_fq_record);
> > +               count = iommu->fltq_len;
> > +               irq = iommu->irq_fltq;
> > +               irq_check = riscv_iommu_fltq_irq_check;
> > +               irq_process = riscv_iommu_fltq_process;
> > +               q->qbr = RISCV_IOMMU_REG_FQB;
> > +               q->qcr = RISCV_IOMMU_REG_FQCSR;
> > +               name = "fltq";
> > +               break;
> > +       case RISCV_IOMMU_PAGE_REQUEST_QUEUE:
> > +               q = &iommu->priq;
> > +               q->len = sizeof(struct riscv_iommu_pq_record);
> > +               count = iommu->priq_len;
> > +               irq = iommu->irq_priq;
> > +               irq_check = riscv_iommu_priq_irq_check;
> > +               irq_process = riscv_iommu_priq_process;
> > +               q->qbr = RISCV_IOMMU_REG_PQB;
> > +               q->qcr = RISCV_IOMMU_REG_PQCSR;
> > +               name = "priq";
> > +               break;
> > +       default:
> > +               dev_err(dev, "invalid queue interrupt index in queue_init!\n");
> > +               return -EINVAL;
> > +       }
> > +
> > +       /* Polling not implemented */
> > +       if (!irq)
> > +               return -ENODEV;
> > +
> > +       /* Allocate queue in memory and set the base register */
> > +       order = ilog2(count);
> > +       do {
> > +               queue_size = q->len * (1ULL << order);
> > +               q->base = dmam_alloc_coherent(dev, queue_size, &q->base_dma, GFP_KERNEL);
> > +               if (q->base || queue_size < PAGE_SIZE)
> > +                       break;
> > +
> > +               order--;
> > +       } while (1);
> > +
> > +       if (!q->base) {
> > +               dev_err(dev, "failed to allocate %s queue (cnt: %u)\n", name, count);
> > +               return -ENOMEM;
> > +       }
> > +
> > +       q->cnt = 1ULL << order;
> > +
> > +       qbr_val = phys_to_ppn(q->base_dma) |
> > +           FIELD_PREP(RISCV_IOMMU_QUEUE_LOGSZ_FIELD, order - 1);
> > +
> > +       riscv_iommu_writeq(iommu, q->qbr, qbr_val);
> > +
> > +       /*
> > +        * Queue base registers are WARL, so it's possible that whatever we wrote
> > +        * there was illegal/not supported by the hw in which case we need to make
> > +        * sure we set a supported PPN and/or queue size.
> > +        */
> > +       qbr_readback = riscv_iommu_readq(iommu, q->qbr);
> > +       if (qbr_readback == qbr_val)
> > +               goto irq;
> > +
> > +       dmam_free_coherent(dev, queue_size, q->base, q->base_dma);
> > +
> > +       /* Get supported queue size */
> > +       order = FIELD_GET(RISCV_IOMMU_QUEUE_LOGSZ_FIELD, qbr_readback) + 1;
> > +       q->cnt = 1ULL << order;
> > +       queue_size = q->len * q->cnt;
> > +
> > +       /*
> > +        * In case we also failed to set PPN, it means the field is hardcoded and the
> > +        * queue resides in I/O memory instead, so get its physical address and
> > +        * ioremap it.
> > +        */
> > +       qbr_paddr = ppn_to_phys(qbr_readback);
> > +       if (qbr_paddr != q->base_dma) {
> > +               dev_info(dev,
> > +                        "hardcoded ppn in %s base register, using io memory for the queue\n",
> > +                        name);
> > +               dev_info(dev, "queue length for %s set to %i\n", name, q->cnt);
> > +               q->in_iomem = true;
> > +               q->base = ioremap(qbr_paddr, queue_size);
> > +               if (!q->base) {
> > +                       dev_err(dev, "failed to map %s queue (cnt: %u)\n", name, q->cnt);
> > +                       return -ENOMEM;
> > +               }
> > +               q->base_dma = qbr_paddr;
> > +       } else {
> > +               /*
> > +                * We only failed to set the queue size, re-try to allocate memory with
> > +                * the queue size supported by the hw.
> > +                */
> > +               dev_info(dev, "hardcoded queue size in %s base register\n", name);
> > +               dev_info(dev, "retrying with queue length: %i\n", q->cnt);
> > +               q->base = dmam_alloc_coherent(dev, queue_size, &q->base_dma, GFP_KERNEL);
> > +               if (!q->base) {
> > +                       dev_err(dev, "failed to allocate %s queue (cnt: %u)\n",
> > +                               name, q->cnt);
> > +                       return -ENOMEM;
> > +               }
> > +       }
> > +
> > +       qbr_val = phys_to_ppn(q->base_dma) |
> > +           FIELD_PREP(RISCV_IOMMU_QUEUE_LOGSZ_FIELD, order - 1);
> > +       riscv_iommu_writeq(iommu, q->qbr, qbr_val);
> > +
> > +       /* Final check to make sure hw accepted our write */
> > +       qbr_readback = riscv_iommu_readq(iommu, q->qbr);
> > +       if (qbr_readback != qbr_val) {
> > +               dev_err(dev, "failed to set base register for %s\n", name);
> > +               ret = -ENODEV;
> > +               goto fail;
> > +       }
> > +
> > + irq:
> > +       if (request_threaded_irq(irq, irq_check, irq_process, IRQF_ONESHOT | IRQF_SHARED,
> > +                                dev_name(dev), q)) {
> > +               dev_err(dev, "fail to request irq %d for %s\n", irq, name);
> > +               goto fail;
> > +       }
> > +
> > +       q->irq = irq;
> > +
> > +       /* Note: All RIO_xQ_EN/IE fields are in the same offsets */
> > +       ret =
> > +           riscv_iommu_queue_ctrl(iommu, q,
> > +                                  RISCV_IOMMU_QUEUE_ENABLE |
> > +                                  RISCV_IOMMU_QUEUE_INTR_ENABLE);
> > +       if (ret & RISCV_IOMMU_QUEUE_BUSY) {
> > +               dev_err(dev, "%s init timeout\n", name);
> > +               ret = -EBUSY;
> > +               goto fail;
> > +       }
> > +
> > +       return 0;
> > +
> > + fail:
> > +       riscv_iommu_queue_free(iommu, q);
> > +       return ret;
> > +}
> > +
> > +/*
> > + * I/O MMU Command queue chapter 3.1
> > + */
> > +
> > +static inline void riscv_iommu_cmd_inval_vma(struct riscv_iommu_command *cmd)
> > +{
> > +       cmd->dword0 =
> > +           FIELD_PREP(RISCV_IOMMU_CMD_OPCODE,
> > +                      RISCV_IOMMU_CMD_IOTINVAL_OPCODE) | FIELD_PREP(RISCV_IOMMU_CMD_FUNC,
> > +                                                                    RISCV_IOMMU_CMD_IOTINVAL_FUNC_VMA);
> > +       cmd->dword1 = 0;
> > +}
> > +
> > +static inline void riscv_iommu_cmd_inval_set_addr(struct riscv_iommu_command *cmd,
> > +                                                 u64 addr)
> > +{
> > +       cmd->dword0 |= RISCV_IOMMU_CMD_IOTINVAL_AV;
> > +       cmd->dword1 = addr;
> > +}
> > +
> > +static inline void riscv_iommu_cmd_inval_set_pscid(struct riscv_iommu_command *cmd,
> > +                                                  unsigned pscid)
> > +{
> > +       cmd->dword0 |= FIELD_PREP(RISCV_IOMMU_CMD_IOTINVAL_PSCID, pscid) |
> > +           RISCV_IOMMU_CMD_IOTINVAL_PSCV;
> > +}
> > +
> > +static inline void riscv_iommu_cmd_inval_set_gscid(struct riscv_iommu_command *cmd,
> > +                                                  unsigned gscid)
> > +{
> > +       cmd->dword0 |= FIELD_PREP(RISCV_IOMMU_CMD_IOTINVAL_GSCID, gscid) |
> > +           RISCV_IOMMU_CMD_IOTINVAL_GV;
> > +}
> > +
> > +static inline void riscv_iommu_cmd_iofence(struct riscv_iommu_command *cmd)
> > +{
> > +       cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_IOFENCE_OPCODE) |
> > +           FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_IOFENCE_FUNC_C);
> > +       cmd->dword1 = 0;
> > +}
> > +
> > +static inline void riscv_iommu_cmd_iofence_set_av(struct riscv_iommu_command *cmd,
> > +                                                 u64 addr, u32 data)
> > +{
> > +       cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_IOFENCE_OPCODE) |
> > +           FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_IOFENCE_FUNC_C) |
> > +           FIELD_PREP(RISCV_IOMMU_CMD_IOFENCE_DATA, data) | RISCV_IOMMU_CMD_IOFENCE_AV;
> > +       cmd->dword1 = (addr >> 2);
> > +}
> > +
> > +static inline void riscv_iommu_cmd_iodir_inval_ddt(struct riscv_iommu_command *cmd)
> > +{
> > +       cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_IODIR_OPCODE) |
> > +           FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_DDT);
> > +       cmd->dword1 = 0;
> > +}
> > +
> > +static inline void riscv_iommu_cmd_iodir_inval_pdt(struct riscv_iommu_command *cmd)
> > +{
> > +       cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_IODIR_OPCODE) |
> > +           FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_PDT);
> > +       cmd->dword1 = 0;
> > +}
> > +
> > +static inline void riscv_iommu_cmd_iodir_set_did(struct riscv_iommu_command *cmd,
> > +                                                unsigned devid)
> > +{
> > +       cmd->dword0 |=
> > +           FIELD_PREP(RISCV_IOMMU_CMD_IODIR_DID, devid) | RISCV_IOMMU_CMD_IODIR_DV;
> > +}
> > +
> > +/* TODO: Convert into lock-less MPSC implementation. */
> > +static bool riscv_iommu_post_sync(struct riscv_iommu_device *iommu,
> > +                                 struct riscv_iommu_command *cmd, bool sync)
> > +{
> > +       u32 head, tail, next, last;
> > +       unsigned long flags;
> > +
> > +       spin_lock_irqsave(&iommu->cq_lock, flags);
> > +       head = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQH) & (iommu->cmdq.cnt - 1);
> > +       tail = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQT) & (iommu->cmdq.cnt - 1);
> > +       last = iommu->cmdq.lui;
> > +       if (tail != last) {
> > +               spin_unlock_irqrestore(&iommu->cq_lock, flags);
> > +               /*
> > +                * FIXME: This is a workaround for dropped MMIO writes/reads on QEMU platform.
> > +                *        While debugging of the problem is still ongoing, this provides
> > +                *        a simple implementation of a try-again policy.
> > +                *        Will be changed to a lock-less algorithm in the future.
> > +                */
> > +               dev_dbg(iommu->dev, "IOMMU CQT: %x != %x (1st)\n", last, tail);
> > +               spin_lock_irqsave(&iommu->cq_lock, flags);
> > +               tail =
> > +                   riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQT) & (iommu->cmdq.cnt - 1);
> > +               last = iommu->cmdq.lui;
> > +               if (tail != last) {
> > +                       spin_unlock_irqrestore(&iommu->cq_lock, flags);
> > +                       dev_dbg(iommu->dev, "IOMMU CQT: %x != %x (2nd)\n", last, tail);
> > +                       spin_lock_irqsave(&iommu->cq_lock, flags);
> > +               }
> > +       }
> > +
> > +       next = (last + 1) & (iommu->cmdq.cnt - 1);
> > +       if (next != head) {
> > +               struct riscv_iommu_command *ptr = iommu->cmdq.base;
> > +               ptr[last] = *cmd;
> > +               wmb();
> > +               riscv_iommu_writel(iommu, RISCV_IOMMU_REG_CQT, next);
> > +               iommu->cmdq.lui = next;
> > +       }
> > +
> > +       spin_unlock_irqrestore(&iommu->cq_lock, flags);
> > +
> > +       if (sync && head != next) {
> > +               cycles_t start_time = get_cycles();
> > +               while (1) {
> > +                       last = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQH) &
> > +                           (iommu->cmdq.cnt - 1);
> > +                       if (head < next && last >= next)
> > +                               break;
> > +                       if (head > next && last < head && last >= next)
> > +                               break;
> > +                       if (RISCV_IOMMU_TIMEOUT < (get_cycles() - start_time)) {
>
> This condition will be imprecise, because we are not in an irq-disabled
> context here, so the thread can be scheduled out or preempted. When we
> come back, more than 1 second may have passed even though the IOFENCE
> actually completed.
>

Good point. Thanks.
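
One way to make the wait preemption-tolerant is a wall-clock deadline that
re-checks completion once more after expiring, e.g. (a sketch; the
RISCV_IOMMU_TIMEOUT_MS constant is hypothetical):

	ktime_t deadline = ktime_add_ms(ktime_get(), RISCV_IOMMU_TIMEOUT_MS);
	bool timed_out = false;

	for (;;) {
		last = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQH) &
		    (iommu->cmdq.cnt - 1);
		if (head < next && last >= next)
			break;
		if (head > next && last < head && last >= next)
			break;
		if (timed_out) {
			dev_err(iommu->dev, "IOFENCE TIMEOUT\n");
			return false;
		}
		/* Check the deadline only after one more CQH read, so a long
		 * preemption between reads cannot report a false timeout. */
		timed_out = ktime_after(ktime_get(), deadline);
		cpu_relax();
	}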


> > +                               dev_err(iommu->dev, "IOFENCE TIMEOUT\n");
> > +                               return false;
> > +                       }
> > +                       cpu_relax();
> > +               }
> > +       }
> > +
> > +       return next != head;
> > +}
> > +
> > +static bool riscv_iommu_post(struct riscv_iommu_device *iommu,
> > +                            struct riscv_iommu_command *cmd)
> > +{
> > +       return riscv_iommu_post_sync(iommu, cmd, false);
> > +}
> > +
> > +static bool riscv_iommu_iofence_sync(struct riscv_iommu_device *iommu)
> > +{
> > +       struct riscv_iommu_command cmd;
> > +       riscv_iommu_cmd_iofence(&cmd);
> > +       return riscv_iommu_post_sync(iommu, &cmd, true);
> > +}
> > +
> > +/* Command queue primary interrupt handler */
> > +static irqreturn_t riscv_iommu_cmdq_irq_check(int irq, void *data)
> > +{
> > +       struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
> > +       struct riscv_iommu_device *iommu =
> > +           container_of(q, struct riscv_iommu_device, cmdq);
> > +       u32 ipsr = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_IPSR);
> > +       if (ipsr & RISCV_IOMMU_IPSR_CIP)
> > +               return IRQ_WAKE_THREAD;
> > +       return IRQ_NONE;
> > +}
> > +
> > +/* Command queue interrupt handler thread function */
> > +static irqreturn_t riscv_iommu_cmdq_process(int irq, void *data)
> > +{
> > +       struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
> > +       struct riscv_iommu_device *iommu;
> > +       unsigned ctrl;
> > +
> > +       iommu = container_of(q, struct riscv_iommu_device, cmdq);
> > +
> > +       /* Error reporting, clear error reports if any. */
> > +       ctrl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQCSR);
> > +       if (ctrl & (RISCV_IOMMU_CQCSR_CQMF |
> > +                   RISCV_IOMMU_CQCSR_CMD_TO | RISCV_IOMMU_CQCSR_CMD_ILL)) {
> > +               riscv_iommu_queue_ctrl(iommu, &iommu->cmdq, ctrl);
> > +               dev_warn_ratelimited(iommu->dev,
> > +                                    "Command queue error: fault: %d tout: %d err: %d\n",
> > +                                    !!(ctrl & RISCV_IOMMU_CQCSR_CQMF),
> > +                                    !!(ctrl & RISCV_IOMMU_CQCSR_CMD_TO),
> > +                                    !!(ctrl & RISCV_IOMMU_CQCSR_CMD_ILL));
>
> We need to handle the error by either adjusting the tail to remove the
> failed command or fixing the failed command itself. Otherwise, the
> failed command will remain in the queue and the IOMMU will keep trying
> to execute it. I guess the first option might be easier to implement.
>

Correct. Thanks for pointing this out.
Error handling / recovery was not included in this series. There is a
work-in-progress series to handle various types of failures, including
command processing errors, DDT misconfiguration, queue overflows,
handling of device-reported faults, etc. I can bring some of the error
handling here if needed; otherwise I'd prefer to keep it as a separate
series, sent out once this one is merged.
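
For reference, the first option could look roughly like this in the
command queue handler. This is only a sketch; it assumes the faulting
command is the one at CQH and that the CQCSR error bits are
write-1-to-clear, per the specification:

	if (ctrl & RISCV_IOMMU_CQCSR_CMD_ILL) {
		struct riscv_iommu_command *ptr = iommu->cmdq.base;
		u32 head = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQH) &
			   (iommu->cmdq.cnt - 1);

		/* Replace the offending entry with a harmless IOFENCE.C
		 * so the queue can make progress once re-enabled. */
		riscv_iommu_cmd_iofence(&ptr[head]);
		wmb();
	}
	/* Writing the error bits back clears them and lets the queue
	 * resume from CQH. */
	riscv_iommu_queue_ctrl(iommu, &iommu->cmdq, ctrl);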

> > +       }
> > +
> > +       /* Clear fault interrupt pending. */
> > +       riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IPSR, RISCV_IOMMU_IPSR_CIP);
> > +
> > +       return IRQ_HANDLED;
> > +}
> > +
> > +/*
> > + * Fault/event queue, chapter 3.2
> > + */
> > +
> > +static void riscv_iommu_fault_report(struct riscv_iommu_device *iommu,
> > +                                    struct riscv_iommu_fq_record *event)
> > +{
> > +       unsigned err, devid;
> > +
> > +       err = FIELD_GET(RISCV_IOMMU_FQ_HDR_CAUSE, event->hdr);
> > +       devid = FIELD_GET(RISCV_IOMMU_FQ_HDR_DID, event->hdr);
> > +
> > +       dev_warn_ratelimited(iommu->dev,
> > +                            "Fault %d devid: %d" " iotval: %llx iotval2: %llx\n", err,
> > +                            devid, event->iotval, event->iotval2);
> > +}
> > +
> > +/* Fault/event queue primary interrupt handler */
> > +static irqreturn_t riscv_iommu_fltq_irq_check(int irq, void *data)
> > +{
> > +       struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
> > +       struct riscv_iommu_device *iommu =
> > +           container_of(q, struct riscv_iommu_device, fltq);
> > +       u32 ipsr = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_IPSR);
> > +       if (ipsr & RISCV_IOMMU_IPSR_FIP)
> > +               return IRQ_WAKE_THREAD;
> > +       return IRQ_NONE;
> > +}
> > +
> > +/* Fault queue interrupt handler thread function */
> > +static irqreturn_t riscv_iommu_fltq_process(int irq, void *data)
> > +{
> > +       struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
> > +       struct riscv_iommu_device *iommu;
> > +       struct riscv_iommu_fq_record *events;
> > +       unsigned cnt, len, idx, ctrl;
> > +
> > +       iommu = container_of(q, struct riscv_iommu_device, fltq);
> > +       events = (struct riscv_iommu_fq_record *)q->base;
> > +
> > +       /* Error reporting, clear error reports if any. */
> > +       ctrl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_FQCSR);
> > +       if (ctrl & (RISCV_IOMMU_FQCSR_FQMF | RISCV_IOMMU_FQCSR_FQOF)) {
> > +               riscv_iommu_queue_ctrl(iommu, &iommu->fltq, ctrl);
> > +               dev_warn_ratelimited(iommu->dev,
> > +                                    "Fault queue error: fault: %d full: %d\n",
> > +                                    !!(ctrl & RISCV_IOMMU_FQCSR_FQMF),
> > +                                    !!(ctrl & RISCV_IOMMU_FQCSR_FQOF));
> > +       }
> > +
> > +       /* Clear fault interrupt pending. */
> > +       riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IPSR, RISCV_IOMMU_IPSR_FIP);
> > +
> > +       /* Report fault events. */
> > +       do {
> > +               cnt = riscv_iommu_queue_consume(iommu, q, &idx);
> > +               if (!cnt)
> > +                       break;
> > +               for (len = 0; len < cnt; idx++, len++)
> > +                       riscv_iommu_fault_report(iommu, &events[idx]);
> > +               riscv_iommu_queue_release(iommu, q, cnt);
> > +       } while (1);
> > +
> > +       return IRQ_HANDLED;
> > +}
> > +
> > +/*
> > + * Page request queue, chapter 3.3
> > + */
> > +
> >  /*
> >   * Register device for IOMMU tracking.
> >   */
> > @@ -97,6 +600,54 @@ static void riscv_iommu_add_device(struct riscv_iommu_device *iommu, struct devi
> >         mutex_unlock(&iommu->eps_mutex);
> >  }
> >
> > +/* Page request interface queue primary interrupt handler */
> > +static irqreturn_t riscv_iommu_priq_irq_check(int irq, void *data)
> > +{
> > +       struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
> > +       struct riscv_iommu_device *iommu =
> > +           container_of(q, struct riscv_iommu_device, priq);
> > +       u32 ipsr = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_IPSR);
> > +       if (ipsr & RISCV_IOMMU_IPSR_PIP)
> > +               return IRQ_WAKE_THREAD;
> > +       return IRQ_NONE;
> > +}
> > +
> > +/* Page request interface queue interrupt handler thread function */
> > +static irqreturn_t riscv_iommu_priq_process(int irq, void *data)
> > +{
> > +       struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
> > +       struct riscv_iommu_device *iommu;
> > +       struct riscv_iommu_pq_record *requests;
> > +       unsigned cnt, idx, ctrl;
> > +
> > +       iommu = container_of(q, struct riscv_iommu_device, priq);
> > +       requests = (struct riscv_iommu_pq_record *)q->base;
> > +
> > +       /* Error reporting, clear error reports if any. */
> > +       ctrl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_PQCSR);
> > +       if (ctrl & (RISCV_IOMMU_PQCSR_PQMF | RISCV_IOMMU_PQCSR_PQOF)) {
> > +               riscv_iommu_queue_ctrl(iommu, &iommu->priq, ctrl);
> > +               dev_warn_ratelimited(iommu->dev,
> > +                                    "Page request queue error: fault: %d full: %d\n",
> > +                                    !!(ctrl & RISCV_IOMMU_PQCSR_PQMF),
> > +                                    !!(ctrl & RISCV_IOMMU_PQCSR_PQOF));
> > +       }
> > +
> > +       /* Clear page request interrupt pending. */
> > +       riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IPSR, RISCV_IOMMU_IPSR_PIP);
> > +
> > +       /* Process page requests. */
> > +       do {
> > +               cnt = riscv_iommu_queue_consume(iommu, q, &idx);
> > +               if (!cnt)
> > +                       break;
> > +               dev_warn(iommu->dev, "unexpected %u page requests\n", cnt);
> > +               riscv_iommu_queue_release(iommu, q, cnt);
> > +       } while (1);
> > +
> > +       return IRQ_HANDLED;
> > +}
> > +
> >  /*
> >   * Endpoint management
> >   */
> > @@ -350,7 +901,29 @@ static void riscv_iommu_flush_iotlb_range(struct iommu_domain *iommu_domain,
> >                                           unsigned long *start, unsigned long *end,
> >                                           size_t *pgsize)
> >  {
> > -       /* Command interface not implemented */
> > +       struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
> > +       struct riscv_iommu_command cmd;
> > +       unsigned long iova;
> > +
> > +       if (domain->mode == RISCV_IOMMU_DC_FSC_MODE_BARE)
> > +               return;
> > +
> > +       /* Domain not attached to an IOMMU! */
> > +       BUG_ON(!domain->iommu);
> > +
> > +       riscv_iommu_cmd_inval_vma(&cmd);
> > +       riscv_iommu_cmd_inval_set_pscid(&cmd, domain->pscid);
> > +
> > +       if (start && end && pgsize) {
> > +               /* Cover only the range that is needed */
> > +               for (iova = *start; iova <= *end; iova += *pgsize) {
> > +                       riscv_iommu_cmd_inval_set_addr(&cmd, iova);
> > +                       riscv_iommu_post(domain->iommu, &cmd);
> > +               }
> > +       } else {
> > +               riscv_iommu_post(domain->iommu, &cmd);
> > +       }
> > +       riscv_iommu_iofence_sync(domain->iommu);
> >  }
> >
> >  static void riscv_iommu_flush_iotlb_all(struct iommu_domain *iommu_domain)
> > @@ -610,6 +1183,9 @@ void riscv_iommu_remove(struct riscv_iommu_device *iommu)
> >         iommu_device_unregister(&iommu->iommu);
> >         iommu_device_sysfs_remove(&iommu->iommu);
> >         riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_OFF);
> > +       riscv_iommu_queue_free(iommu, &iommu->cmdq);
> > +       riscv_iommu_queue_free(iommu, &iommu->fltq);
> > +       riscv_iommu_queue_free(iommu, &iommu->priq);
> >  }
> >
> >  int riscv_iommu_init(struct riscv_iommu_device *iommu)
> > @@ -632,6 +1208,16 @@ int riscv_iommu_init(struct riscv_iommu_device *iommu)
> >         }
> >  #endif
> >
> > +       /*
> > +        * Assign queue lengths from module parameters if not already
> > +        * set in the device tree.
> > +        */
> > +       if (!iommu->cmdq_len)
> > +               iommu->cmdq_len = cmdq_length;
> > +       if (!iommu->fltq_len)
> > +               iommu->fltq_len = fltq_length;
> > +       if (!iommu->priq_len)
> > +               iommu->priq_len = priq_length;
> >         /* Clear any pending interrupt flag. */
> >         riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IPSR,
> >                            RISCV_IOMMU_IPSR_CIP |
> > @@ -639,7 +1225,20 @@ int riscv_iommu_init(struct riscv_iommu_device *iommu)
> >                            RISCV_IOMMU_IPSR_PMIP | RISCV_IOMMU_IPSR_PIP);
> >         spin_lock_init(&iommu->cq_lock);
> >         mutex_init(&iommu->eps_mutex);
> > +       ret = riscv_iommu_queue_init(iommu, RISCV_IOMMU_COMMAND_QUEUE);
> > +       if (ret)
> > +               goto fail;
> > +       ret = riscv_iommu_queue_init(iommu, RISCV_IOMMU_FAULT_QUEUE);
> > +       if (ret)
> > +               goto fail;
> > +       if (!(iommu->cap & RISCV_IOMMU_CAP_ATS))
> > +               goto no_ats;
> > +
> > +       ret = riscv_iommu_queue_init(iommu, RISCV_IOMMU_PAGE_REQUEST_QUEUE);
> > +       if (ret)
> > +               goto fail;
> >
> > + no_ats:
> >         ret = riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_BARE);
> >
> >         if (ret) {
> > @@ -663,5 +1262,8 @@ int riscv_iommu_init(struct riscv_iommu_device *iommu)
> >         return 0;
> >   fail:
> >         riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_OFF);
> > +       riscv_iommu_queue_free(iommu, &iommu->priq);
> > +       riscv_iommu_queue_free(iommu, &iommu->fltq);
> > +       riscv_iommu_queue_free(iommu, &iommu->cmdq);
> >         return ret;
> >  }
> > diff --git a/drivers/iommu/riscv/iommu.h b/drivers/iommu/riscv/iommu.h
> > index 7dc9baa59a50..04148a2a8ffd 100644
> > --- a/drivers/iommu/riscv/iommu.h
> > +++ b/drivers/iommu/riscv/iommu.h
> > @@ -28,6 +28,24 @@
> >  #define IOMMU_PAGE_SIZE_1G     BIT_ULL(30)
> >  #define IOMMU_PAGE_SIZE_512G   BIT_ULL(39)
> >
> > +struct riscv_iommu_queue {
> > +       dma_addr_t base_dma;    /* ring buffer bus address */
> > +       void *base;             /* ring buffer pointer */
> > +       size_t len;             /* single item length */
> > +       u32 cnt;                /* items count */
> > +       u32 lui;                /* last used index, consumer/producer share */
> > +       unsigned qbr;           /* queue base register offset */
> > +       unsigned qcr;           /* queue control and status register offset */
> > +       int irq;                /* registered interrupt number */
> > +       bool in_iomem;          /* indicates queue data are in I/O memory  */
> > +};
> > +
> > +enum riscv_queue_ids {
> > +       RISCV_IOMMU_COMMAND_QUEUE       = 0,
> > +       RISCV_IOMMU_FAULT_QUEUE         = 1,
> > +       RISCV_IOMMU_PAGE_REQUEST_QUEUE  = 2
> > +};
> > +
> >  struct riscv_iommu_device {
> >         struct iommu_device iommu;      /* iommu core interface */
> >         struct device *dev;             /* iommu hardware */
> > @@ -42,6 +60,11 @@ struct riscv_iommu_device {
> >         int irq_pm;
> >         int irq_priq;
> >
> > +       /* Queue lengths */
> > +       int cmdq_len;
> > +       int fltq_len;
> > +       int priq_len;
> > +
> >         /* supported and enabled hardware capabilities */
> >         u64 cap;
> >
> > @@ -53,6 +76,11 @@ struct riscv_iommu_device {
> >         unsigned ddt_mode;
> >         bool ddtp_in_iomem;
> >
> > +       /* hardware queues */
> > +       struct riscv_iommu_queue cmdq;
> > +       struct riscv_iommu_queue fltq;
> > +       struct riscv_iommu_queue priq;
> > +
> >         /* Connected end-points */
> >         struct rb_root eps;
> >         struct mutex eps_mutex;
> > --
> > 2.34.1
> >
> >
> > _______________________________________________
> > linux-riscv mailing list
> > linux-riscv@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-riscv

best,
- Tomasz

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 01/11] RISC-V: drivers/iommu: Add RISC-V IOMMU - Ziommu support.
  2023-07-19 19:33 ` [PATCH 01/11] RISC-V: drivers/iommu: Add RISC-V IOMMU - Ziommu support Tomasz Jeznach
                     ` (3 preceding siblings ...)
  2023-07-28  2:42   ` Zong Li
@ 2023-08-03  0:18   ` Jason Gunthorpe
  2023-08-03  8:27   ` Zong Li
                     ` (2 subsequent siblings)
  7 siblings, 0 replies; 86+ messages in thread
From: Jason Gunthorpe @ 2023-08-03  0:18 UTC (permalink / raw)
  To: Tomasz Jeznach
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Anup Patel, Sunil V L,
	Nick Kossifidis, Sebastien Boeuf, iommu, linux-riscv,
	linux-kernel, linux

On Wed, Jul 19, 2023 at 12:33:45PM -0700, Tomasz Jeznach wrote:

> +static int riscv_iommu_domain_finalize(struct riscv_iommu_domain *domain,
> +				       struct riscv_iommu_device *iommu)
> +{

Do not introduce this finalize pattern into new drivers. We are trying
to get rid of it. I don't see anything here that suggests you need it.

Do all of this when you allocate the domain.

> +	struct iommu_domain_geometry *geometry;
> +
> +	/* Domain assigned to another iommu */
> +	if (domain->iommu && domain->iommu != iommu)
> +		return -EINVAL;
> +	/* Domain already initialized */
> +	else if (domain->iommu)
> +		return 0;

These tests are not good; the domain should be able to be associated
with as many iommu instances as it likes.
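
Instead of the single domain->iommu pointer, invalidations can walk the
endpoint list the domain already keeps, assuming each endpoint records
the iommu it is behind (sketch):

	struct riscv_iommu_endpoint *ep;

	list_for_each_entry(ep, &domain->endpoints, domain)
		riscv_iommu_post(ep->iommu, &cmd);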

> +static int riscv_iommu_attach_dev(struct iommu_domain *iommu_domain, struct device *dev)
> +{
> +	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
> +	struct riscv_iommu_endpoint *ep = dev_iommu_priv_get(dev);
> +	int ret;
> +
> +	/* PSCID not valid */
> +	if ((int)domain->pscid < 0)
> +		return -ENOMEM;
> +
> +	mutex_lock(&domain->lock);
> +	mutex_lock(&ep->lock);
> +
> +	if (!list_empty(&ep->domain)) {
> +		dev_warn(dev, "endpoint already attached to a domain. dropping\n");

This is legitimate; it means the driver has to replace the domain, and
drivers have to implement this.

> +/*
> + * Common I/O MMU driver probe/teardown
> + */
> +
> +static const struct iommu_domain_ops riscv_iommu_domain_ops = {
> +	.free = riscv_iommu_domain_free,
> +	.attach_dev = riscv_iommu_attach_dev,
> +	.map_pages = riscv_iommu_map_pages,
> +	.unmap_pages = riscv_iommu_unmap_pages,
> +	.iova_to_phys = riscv_iommu_iova_to_phys,
> +	.iotlb_sync = riscv_iommu_iotlb_sync,
> +	.iotlb_sync_map = riscv_iommu_iotlb_sync_map,
> +	.flush_iotlb_all = riscv_iommu_flush_iotlb_all,
> +};

Please split the ops by domain type, e.g. identity, paging, SVA, etc.
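
Something like the following, with made-up handler names:

	static const struct iommu_domain_ops riscv_iommu_identity_ops = {
		.attach_dev = riscv_iommu_attach_identity,
	};

	static const struct iommu_domain_ops riscv_iommu_paging_ops = {
		.attach_dev = riscv_iommu_attach_paging,
		.free = riscv_iommu_domain_free,
		.map_pages = riscv_iommu_map_pages,
		.unmap_pages = riscv_iommu_unmap_pages,
		.iova_to_phys = riscv_iommu_iova_to_phys,
		.flush_iotlb_all = riscv_iommu_flush_iotlb_all,
	};

with domain_alloc() picking the ops based on the requested type.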

> +int riscv_iommu_init(struct riscv_iommu_device *iommu)
> +{
> +	struct device *dev = iommu->dev;
> +	u32 fctl = 0;
> +	int ret;
> +
> +	iommu->eps = RB_ROOT;
> +
> +	fctl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_FCTL);
> +
> +#ifdef CONFIG_CPU_BIG_ENDIAN
> +	if (!(cap & RISCV_IOMMU_CAP_END)) {
> +		dev_err(dev, "IOMMU doesn't support Big Endian\n");

Why not?

> +		return -EIO;
> +	} else if (!(fctl & RISCV_IOMMU_FCTL_BE)) {
> +		fctl |= FIELD_PREP(RISCV_IOMMU_FCTL_BE, 1);
> +		riscv_iommu_writel(iommu, RISCV_IOMMU_REG_FCTL, fctl);
> +	}
> +#endif
> +
> +	/* Clear any pending interrupt flag. */
> +	riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IPSR,
> +			   RISCV_IOMMU_IPSR_CIP |
> +			   RISCV_IOMMU_IPSR_FIP |
> +			   RISCV_IOMMU_IPSR_PMIP | RISCV_IOMMU_IPSR_PIP);
> +	spin_lock_init(&iommu->cq_lock);
> +	mutex_init(&iommu->eps_mutex);
> +
> +	ret = riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_BARE);
> +
> +	if (ret) {
> +		dev_err(dev, "cannot enable iommu device (%d)\n", ret);
> +		goto fail;
> +	}
> +
> +	ret = iommu_device_register(&iommu->iommu, &riscv_iommu_ops, dev);
> +	if (ret) {
> +		dev_err(dev, "cannot register iommu interface (%d)\n", ret);
> +		iommu_device_sysfs_remove(&iommu->iommu);
> +		goto fail;
> +	}

The calls to iommu_device_sysfs_add() are missing; this is mandatory.
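
The usual ordering is to add the sysfs node first and unwind it on a
registration failure, e.g. (sketch; the sysfs name format here is just
an example):

	ret = iommu_device_sysfs_add(&iommu->iommu, dev, NULL,
				     "riscv-iommu@%s", dev_name(dev));
	if (ret)
		goto fail;

	ret = iommu_device_register(&iommu->iommu, &riscv_iommu_ops, dev);
	if (ret) {
		iommu_device_sysfs_remove(&iommu->iommu);
		goto fail;
	}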

Jason

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 01/11] RISC-V: drivers/iommu: Add RISC-V IOMMU - Ziommu support.
  2023-08-02 20:15     ` Tomasz Jeznach
  2023-08-02 20:25       ` Conor Dooley
@ 2023-08-03  3:37       ` Zong Li
  1 sibling, 0 replies; 86+ messages in thread
From: Zong Li @ 2023-08-03  3:37 UTC (permalink / raw)
  To: Tomasz Jeznach
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley,
	Anup Patel, Albert Ou, linux, linux-kernel, Sebastien Boeuf,
	iommu, Palmer Dabbelt, Nick Kossifidis, linux-riscv

On Thu, Aug 3, 2023 at 4:15 AM Tomasz Jeznach <tjeznach@rivosinc.com> wrote:
>
> On Thu, Jul 27, 2023 at 7:42 PM Zong Li <zong.li@sifive.com> wrote:
> >
> > On Thu, Jul 20, 2023 at 3:34 AM Tomasz Jeznach <tjeznach@rivosinc.com> wrote:
> > >
> > > +static int riscv_iommu_platform_probe(struct platform_device *pdev)
> > > +{
> > > +       struct device *dev = &pdev->dev;
> > > +       struct riscv_iommu_device *iommu = NULL;
> > > +       struct resource *res = NULL;
> > > +       int ret = 0;
> > > +
> > > +       iommu = devm_kzalloc(dev, sizeof(*iommu), GFP_KERNEL);
> > > +       if (!iommu)
> > > +               return -ENOMEM;
> > > +
> > > +       iommu->dev = dev;
> > > +       dev_set_drvdata(dev, iommu);
> > > +
> > > +       res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> > > +       if (!res) {
> > > +               dev_err(dev, "could not find resource for register region\n");
> > > +               return -EINVAL;
> > > +       }
> > > +
> > > +       iommu->reg = devm_platform_get_and_ioremap_resource(pdev, 0, &res);
> > > +       if (IS_ERR(iommu->reg)) {
> > > +               ret = dev_err_probe(dev, PTR_ERR(iommu->reg),
> > > +                                   "could not map register region\n");
> > > +               goto fail;
> > > +       };
> > > +
> > > +       iommu->reg_phys = res->start;
> > > +
> > > +       ret = -ENODEV;
> > > +
> > > +       /* Sanity check: Did we get the whole register space? */
> > > +       if ((res->end - res->start + 1) < RISCV_IOMMU_REG_SIZE) {
> > > +               dev_err(dev, "device region smaller than register file (0x%llx)\n",
> > > +                       res->end - res->start);
> > > +               goto fail;
> > > +       }
> >
> > Could we assume that DT should be responsible for specifying the right size?
> >
>
> This is only to validate the DT-provided info against the register
> file size the driver expects. The expectation is that DT will provide
> the right size.
>
>
> > > +static struct iommu_domain *riscv_iommu_domain_alloc(unsigned type)
> > > +{
> > > +       struct riscv_iommu_domain *domain;
> > > +
> > > +       if (type != IOMMU_DOMAIN_IDENTITY &&
> > > +           type != IOMMU_DOMAIN_BLOCKED)
> > > +               return NULL;
> > > +
> > > +       domain = kzalloc(sizeof(*domain), GFP_KERNEL);
> > > +       if (!domain)
> > > +               return NULL;
> > > +
> > > +       mutex_init(&domain->lock);
> > > +       INIT_LIST_HEAD(&domain->endpoints);
> > > +
> > > +       domain->domain.ops = &riscv_iommu_domain_ops;
> > > +       domain->mode = RISCV_IOMMU_DC_FSC_MODE_BARE;
> > > +       domain->pscid = ida_alloc_range(&riscv_iommu_pscids, 1,
> > > +                                       RISCV_IOMMU_MAX_PSCID, GFP_KERNEL);
> > > +
> > > +       printk("domain type %x alloc %u\n", type, domain->pscid);
> > > +
> >
> > Could it use pr_xxx instead of printk?
> >
>
> Absolutely, fixed here and elsewhere. Also, used dev_dbg wherever applicable.
>
> > > +
> > > +static int riscv_iommu_enable(struct riscv_iommu_device *iommu, unsigned requested_mode)
> > > +{
> > > +       struct device *dev = iommu->dev;
> > > +       u64 ddtp = 0;
> > > +       u64 ddtp_paddr = 0;
> > > +       unsigned mode = requested_mode;
> > > +       unsigned mode_readback = 0;
> > > +
> > > +       ddtp = riscv_iommu_get_ddtp(iommu);
> > > +       if (ddtp & RISCV_IOMMU_DDTP_BUSY)
> > > +               return -EBUSY;
> > > +
> > > +       /* Disallow state transition from xLVL to xLVL. */
> > > +       switch (FIELD_GET(RISCV_IOMMU_DDTP_MODE, ddtp)) {
> > > +       case RISCV_IOMMU_DDTP_MODE_BARE:
> > > +       case RISCV_IOMMU_DDTP_MODE_OFF:
> > > +               break;
> > > +       default:
> > > +               if ((mode != RISCV_IOMMU_DDTP_MODE_BARE)
> > > +                   && (mode != RISCV_IOMMU_DDTP_MODE_OFF))
> > > +                       return -EINVAL;
> > > +               break;
> > > +       }
> > > +
> > > + retry:
> >
> > We need to consider the `iommu.passthrough` before we set up the mode
> > in the switch case, something like
> >
>
> This function only executes the configuration and sets the device
> directory mode. Handling of the global iommu.passthrough policy is
> implemented in the riscv_iommu_init() call (patch #7).

Thanks. I saw that in patch #7.

>
> Best,
> - Tomasz

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 06/11] RISC-V: drivers/iommu/riscv: Add command, fault, page-req queues
  2023-08-02 20:50     ` Tomasz Jeznach
@ 2023-08-03  8:24       ` Zong Li
  0 siblings, 0 replies; 86+ messages in thread
From: Zong Li @ 2023-08-03  8:24 UTC (permalink / raw)
  To: Tomasz Jeznach
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley,
	Anup Patel, Albert Ou, linux, linux-kernel, Sebastien Boeuf,
	iommu, Palmer Dabbelt, Nick Kossifidis, linux-riscv

On Thu, Aug 3, 2023 at 4:50 AM Tomasz Jeznach <tjeznach@rivosinc.com> wrote:
>
> On Sat, Jul 29, 2023 at 5:58 AM Zong Li <zong.li@sifive.com> wrote:
> >
> > On Thu, Jul 20, 2023 at 3:34 AM Tomasz Jeznach <tjeznach@rivosinc.com> wrote:
> > >
> > > Enables message or wire-signaled interrupts for PCIe and platform devices.
> > >
> > > Co-developed-by: Nick Kossifidis <mick@ics.forth.gr>
> > > Signed-off-by: Nick Kossifidis <mick@ics.forth.gr>
> > > Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
> > > ---
> > >  drivers/iommu/riscv/iommu-pci.c      |  72 ++++
> > >  drivers/iommu/riscv/iommu-platform.c |  66 +++
> > >  drivers/iommu/riscv/iommu.c          | 604 ++++++++++++++++++++++++++-
> > >  drivers/iommu/riscv/iommu.h          |  28 ++
> > >  4 files changed, 769 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/iommu/riscv/iommu-pci.c b/drivers/iommu/riscv/iommu-pci.c
> > > index c91f963d7a29..9ea0647f7b92 100644
> > > --- a/drivers/iommu/riscv/iommu-pci.c
> > > +++ b/drivers/iommu/riscv/iommu-pci.c
> > > @@ -34,6 +34,7 @@ static int riscv_iommu_pci_probe(struct pci_dev *pdev, const struct pci_device_i
> > >  {
> > >         struct device *dev = &pdev->dev;
> > >         struct riscv_iommu_device *iommu;
> > > +       u64 icvec;
> > >         int ret;
> > >
> > >         ret = pci_enable_device_mem(pdev);
> > > @@ -67,14 +68,84 @@ static int riscv_iommu_pci_probe(struct pci_dev *pdev, const struct pci_device_i
> > >         iommu->dev = dev;
> > >         dev_set_drvdata(dev, iommu);
> > >
> > > +       /* Check device reported capabilities. */
> > > +       iommu->cap = riscv_iommu_readq(iommu, RISCV_IOMMU_REG_CAP);
> > > +
> > > +       /* The PCI driver only uses MSIs, make sure the IOMMU supports this */
> > > +       switch (FIELD_GET(RISCV_IOMMU_CAP_IGS, iommu->cap)) {
> > > +       case RISCV_IOMMU_CAP_IGS_MSI:
> > > +       case RISCV_IOMMU_CAP_IGS_BOTH:
> > > +               break;
> > > +       default:
> > > +               dev_err(dev, "unable to use message-signaled interrupts\n");
> > > +               ret = -ENODEV;
> > > +               goto fail;
> > > +       }
> > > +
> > >         dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
> > >         pci_set_master(pdev);
> > >
> > > +       /* Allocate and assign IRQ vectors for the various events */
> > > +       ret = pci_alloc_irq_vectors(pdev, 1, RISCV_IOMMU_INTR_COUNT, PCI_IRQ_MSIX);
> > > +       if (ret < 0) {
> > > +               dev_err(dev, "unable to allocate irq vectors\n");
> > > +               goto fail;
> > > +       }
> > > +
> > > +       ret = -ENODEV;
> > > +
> > > +       iommu->irq_cmdq = msi_get_virq(dev, RISCV_IOMMU_INTR_CQ);
> > > +       if (!iommu->irq_cmdq) {
> > > +               dev_warn(dev, "no MSI vector %d for the command queue\n",
> > > +                        RISCV_IOMMU_INTR_CQ);
> > > +               goto fail;
> > > +       }
> > > +
> > > +       iommu->irq_fltq = msi_get_virq(dev, RISCV_IOMMU_INTR_FQ);
> > > +       if (!iommu->irq_fltq) {
> > > +               dev_warn(dev, "no MSI vector %d for the fault/event queue\n",
> > > +                        RISCV_IOMMU_INTR_FQ);
> > > +               goto fail;
> > > +       }
> > > +
> > > +       if (iommu->cap & RISCV_IOMMU_CAP_HPM) {
> > > +               iommu->irq_pm = msi_get_virq(dev, RISCV_IOMMU_INTR_PM);
> > > +               if (!iommu->irq_pm) {
> > > +                       dev_warn(dev,
> > > +                                "no MSI vector %d for performance monitoring\n",
> > > +                                RISCV_IOMMU_INTR_PM);
> > > +                       goto fail;
> > > +               }
> > > +       }
> > > +
> > > +       if (iommu->cap & RISCV_IOMMU_CAP_ATS) {
> > > +               iommu->irq_priq = msi_get_virq(dev, RISCV_IOMMU_INTR_PQ);
> > > +               if (!iommu->irq_priq) {
> > > +                       dev_warn(dev,
> > > +                                "no MSI vector %d for page-request queue\n",
> > > +                                RISCV_IOMMU_INTR_PQ);
> > > +                       goto fail;
> > > +               }
> > > +       }
> > > +
> > > +       /* Set simple 1:1 mapping for MSI vectors */
> > > +       icvec = FIELD_PREP(RISCV_IOMMU_IVEC_CIV, RISCV_IOMMU_INTR_CQ) |
> > > +           FIELD_PREP(RISCV_IOMMU_IVEC_FIV, RISCV_IOMMU_INTR_FQ);
> > > +
> > > +       if (iommu->cap & RISCV_IOMMU_CAP_HPM)
> > > +               icvec |= FIELD_PREP(RISCV_IOMMU_IVEC_PMIV, RISCV_IOMMU_INTR_PM);
> > > +
> > > +       if (iommu->cap & RISCV_IOMMU_CAP_ATS)
> > > +               icvec |= FIELD_PREP(RISCV_IOMMU_IVEC_PIV, RISCV_IOMMU_INTR_PQ);
> > > +
> > > +       riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IVEC, icvec);
> > > +
> > >         ret = riscv_iommu_init(iommu);
> > >         if (!ret)
> > >                 return ret;
> > >
> > >   fail:
> > > +       pci_free_irq_vectors(pdev);
> > >         pci_clear_master(pdev);
> > >         pci_release_regions(pdev);
> > >         pci_disable_device(pdev);
> > > @@ -85,6 +156,7 @@ static int riscv_iommu_pci_probe(struct pci_dev *pdev, const struct pci_device_i
> > >  static void riscv_iommu_pci_remove(struct pci_dev *pdev)
> > >  {
> > >         riscv_iommu_remove(dev_get_drvdata(&pdev->dev));
> > > +       pci_free_irq_vectors(pdev);
> > >         pci_clear_master(pdev);
> > >         pci_release_regions(pdev);
> > >         pci_disable_device(pdev);
> > > diff --git a/drivers/iommu/riscv/iommu-platform.c b/drivers/iommu/riscv/iommu-platform.c
> > > index e4e8ca6711e7..35935d3c7ef4 100644
> > > --- a/drivers/iommu/riscv/iommu-platform.c
> > > +++ b/drivers/iommu/riscv/iommu-platform.c
> > > @@ -20,6 +20,8 @@ static int riscv_iommu_platform_probe(struct platform_device *pdev)
> > >         struct device *dev = &pdev->dev;
> > >         struct riscv_iommu_device *iommu = NULL;
> > >         struct resource *res = NULL;
> > > +       u32 fctl = 0;
> > > +       int irq = 0;
> > >         int ret = 0;
> > >
> > >         iommu = devm_kzalloc(dev, sizeof(*iommu), GFP_KERNEL);
> > > @@ -53,6 +55,70 @@ static int riscv_iommu_platform_probe(struct platform_device *pdev)
> > >                 goto fail;
> > >         }
> > >
> > > +       iommu->cap = riscv_iommu_readq(iommu, RISCV_IOMMU_REG_CAP);
> > > +
> > > +       /* For now we only support WSIs until we have AIA support */
> >
> > I don't completely understand the AIA support here, because I saw the
> > PCI case uses MSIs, and the kernel seems to have an AIA implementation.
> > Could you please elaborate?
> >
> > > +       ret = FIELD_GET(RISCV_IOMMU_CAP_IGS, iommu->cap);
> > > +       if (ret == RISCV_IOMMU_CAP_IGS_MSI) {
> > > +               dev_err(dev, "IOMMU only supports MSIs\n");
> > > +               goto fail;
> > > +       }
> > > +
> > > +       /* Parse IRQ assignment */
> > > +       irq = platform_get_irq_byname_optional(pdev, "cmdq");
> > > +       if (irq > 0)
> > > +               iommu->irq_cmdq = irq;
> > > +       else {
> > > +               dev_err(dev, "no IRQ provided for the command queue\n");
> > > +               goto fail;
> > > +       }
> > > +
> > > +       irq = platform_get_irq_byname_optional(pdev, "fltq");
> > > +       if (irq > 0)
> > > +               iommu->irq_fltq = irq;
> > > +       else {
> > > +               dev_err(dev, "no IRQ provided for the fault/event queue\n");
> > > +               goto fail;
> > > +       }
> > > +
> > > +       if (iommu->cap & RISCV_IOMMU_CAP_HPM) {
> > > +               irq = platform_get_irq_byname_optional(pdev, "pm");
> > > +               if (irq > 0)
> > > +                       iommu->irq_pm = irq;
> > > +               else {
> > > +                       dev_err(dev, "no IRQ provided for performance monitoring\n");
> > > +                       goto fail;
> > > +               }
> > > +       }
> > > +
> > > +       if (iommu->cap & RISCV_IOMMU_CAP_ATS) {
> > > +               irq = platform_get_irq_byname_optional(pdev, "priq");
> > > +               if (irq > 0)
> > > +                       iommu->irq_priq = irq;
> > > +               else {
> > > +                       dev_err(dev, "no IRQ provided for the page-request queue\n");
> > > +                       goto fail;
> > > +               }
> > > +       }
> >
> > Should we define the "interrupt-names" in dt-bindings?
> >
>
> Yes, this was brought up earlier wrt dt-bindings.
>
> I'm considering removing the interrupt names from DT (and the
> get-byname option), as the IOMMU hardware cause-to-vector remapping
> `icvec` should be used to map an interrupt source to the actual
> interrupt vector. If possible, the device driver should map cause to
> interrupt (based on the number of vectors available) or rely on the
> ICVEC WARL properties to discover a fixed cause-to-vector mapping in
> the hardware.
>

I'm not sure if I understand correctly, but one thing we might need to
consider is when there are fewer vectors than interrupt sources. For
example, if the IOMMU only supports a single vector (i.e. a single
interrupt wire), `request_threaded_irq` will be called three times with
the same irq number for the command, fault and page-request queues, and
requesting the IRQ for the other two queues will fail. It seems that we
still need to consider this situation regardless of how we determine
the IRQs for each interrupt source.
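
One way to cover that case would be to request the shared vector once
and demultiplex on IPSR in a single primary handler, roughly:

	static irqreturn_t riscv_iommu_irq_check(int irq, void *data)
	{
		struct riscv_iommu_device *iommu = data;
		u32 ipsr = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_IPSR);

		if (ipsr & (RISCV_IOMMU_IPSR_CIP | RISCV_IOMMU_IPSR_FIP |
			    RISCV_IOMMU_IPSR_PIP))
			return IRQ_WAKE_THREAD;
		return IRQ_NONE;
	}

with the thread function checking each IPSR bit and running the
corresponding per-queue processing in turn.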

> Please let me know if this is reasonable change.
>
> > > +
> > > +       /* Make sure fctl.WSI is set */
> > > +       fctl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_FCTL);
> > > +       fctl |= RISCV_IOMMU_FCTL_WSI;
> > > +       riscv_iommu_writel(iommu, RISCV_IOMMU_REG_FCTL, fctl);
> > > +
> > > +       /* Parse queue lengths */
> > > +       ret = of_property_read_u32(pdev->dev.of_node, "cmdq_len", &iommu->cmdq_len);
> > > +       if (!ret)
> > > +               dev_info(dev, "command queue length set to %i\n", iommu->cmdq_len);
> > > +
> > > +       ret = of_property_read_u32(pdev->dev.of_node, "fltq_len", &iommu->fltq_len);
> > > +       if (!ret)
> > > +               dev_info(dev, "fault/event queue length set to %i\n", iommu->fltq_len);
> > > +
> > > +       ret = of_property_read_u32(pdev->dev.of_node, "priq_len", &iommu->priq_len);
> > > +       if (!ret)
> > > +               dev_info(dev, "page request queue length set to %i\n", iommu->priq_len);
> > > +
> > >         dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
> > >
> > >         return riscv_iommu_init(iommu);
> > > diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
> > > index 31dc3c458e13..5c4cf9875302 100644
> > > --- a/drivers/iommu/riscv/iommu.c
> > > +++ b/drivers/iommu/riscv/iommu.c
> > > @@ -45,6 +45,18 @@ static int ddt_mode = RISCV_IOMMU_DDTP_MODE_BARE;
> > >  module_param(ddt_mode, int, 0644);
> > >  MODULE_PARM_DESC(ddt_mode, "Device Directory Table mode.");
> > >
> > > +static int cmdq_length = 1024;
> > > +module_param(cmdq_length, int, 0644);
> > > +MODULE_PARM_DESC(cmdq_length, "Command queue length.");
> > > +
> > > +static int fltq_length = 1024;
> > > +module_param(fltq_length, int, 0644);
> > > +MODULE_PARM_DESC(fltq_length, "Fault queue length.");
> > > +
> > > +static int priq_length = 1024;
> > > +module_param(priq_length, int, 0644);
> > > +MODULE_PARM_DESC(priq_length, "Page request interface queue length.");
> > > +
> > >  /* IOMMU PSCID allocation namespace. */
> > >  #define RISCV_IOMMU_MAX_PSCID  (1U << 20)
> > >  static DEFINE_IDA(riscv_iommu_pscids);
> > > @@ -65,6 +77,497 @@ static DEFINE_IDA(riscv_iommu_pscids);
> > >  static const struct iommu_domain_ops riscv_iommu_domain_ops;
> > >  static const struct iommu_ops riscv_iommu_ops;
> > >
> > > +/*
> > > + * Common queue management routines
> > > + */
> > > +
> > > +/* Note: offsets are the same for all queues */
> > > +#define Q_HEAD(q) ((q)->qbr + (RISCV_IOMMU_REG_CQH - RISCV_IOMMU_REG_CQB))
> > > +#define Q_TAIL(q) ((q)->qbr + (RISCV_IOMMU_REG_CQT - RISCV_IOMMU_REG_CQB))
> > > +
> > > +static unsigned riscv_iommu_queue_consume(struct riscv_iommu_device *iommu,
> > > +                                         struct riscv_iommu_queue *q, unsigned *ready)
> > > +{
> > > +       u32 tail = riscv_iommu_readl(iommu, Q_TAIL(q));
> > > +       *ready = q->lui;
> > > +
> > > +       BUG_ON(q->cnt <= tail);
> > > +       if (q->lui <= tail)
> > > +               return tail - q->lui;
> > > +       return q->cnt - q->lui;
> > > +}
> > > +
> > > +static void riscv_iommu_queue_release(struct riscv_iommu_device *iommu,
> > > +                                     struct riscv_iommu_queue *q, unsigned count)
> > > +{
> > > +       q->lui = (q->lui + count) & (q->cnt - 1);
> > > +       riscv_iommu_writel(iommu, Q_HEAD(q), q->lui);
> > > +}
> > > +
> > > +static u32 riscv_iommu_queue_ctrl(struct riscv_iommu_device *iommu,
> > > +                                 struct riscv_iommu_queue *q, u32 val)
> > > +{
> > > +       cycles_t end_cycles = RISCV_IOMMU_TIMEOUT + get_cycles();
> > > +
> > > +       riscv_iommu_writel(iommu, q->qcr, val);
> > > +       do {
> > > +               val = riscv_iommu_readl(iommu, q->qcr);
> > > +               if (!(val & RISCV_IOMMU_QUEUE_BUSY))
> > > +                       break;
> > > +               cpu_relax();
> > > +       } while (get_cycles() < end_cycles);
> > > +
> > > +       return val;
> > > +}
> > > +
> > > +static void riscv_iommu_queue_free(struct riscv_iommu_device *iommu,
> > > +                                  struct riscv_iommu_queue *q)
> > > +{
> > > +       size_t size = q->len * q->cnt;
> > > +
> > > +       riscv_iommu_queue_ctrl(iommu, q, 0);
> > > +
> > > +       if (q->base) {
> > > +               if (q->in_iomem)
> > > +                       iounmap(q->base);
> > > +               else
> > > +                       dmam_free_coherent(iommu->dev, size, q->base, q->base_dma);
> > > +       }
> > > +       if (q->irq)
> > > +               free_irq(q->irq, q);
> > > +}
> > > +
> > > +static irqreturn_t riscv_iommu_cmdq_irq_check(int irq, void *data);
> > > +static irqreturn_t riscv_iommu_cmdq_process(int irq, void *data);
> > > +static irqreturn_t riscv_iommu_fltq_irq_check(int irq, void *data);
> > > +static irqreturn_t riscv_iommu_fltq_process(int irq, void *data);
> > > +static irqreturn_t riscv_iommu_priq_irq_check(int irq, void *data);
> > > +static irqreturn_t riscv_iommu_priq_process(int irq, void *data);
> > > +
> > > +static int riscv_iommu_queue_init(struct riscv_iommu_device *iommu, int queue_id)
> > > +{
> > > +       struct device *dev = iommu->dev;
> > > +       struct riscv_iommu_queue *q = NULL;
> > > +       size_t queue_size = 0;
> > > +       irq_handler_t irq_check;
> > > +       irq_handler_t irq_process;
> > > +       const char *name;
> > > +       int count = 0;
> > > +       int irq = 0;
> > > +       unsigned order = 0;
> > > +       u64 qbr_val = 0;
> > > +       u64 qbr_readback = 0;
> > > +       u64 qbr_paddr = 0;
> > > +       int ret = 0;
> > > +
> > > +       switch (queue_id) {
> > > +       case RISCV_IOMMU_COMMAND_QUEUE:
> > > +               q = &iommu->cmdq;
> > > +               q->len = sizeof(struct riscv_iommu_command);
> > > +               count = iommu->cmdq_len;
> > > +               irq = iommu->irq_cmdq;
> > > +               irq_check = riscv_iommu_cmdq_irq_check;
> > > +               irq_process = riscv_iommu_cmdq_process;
> > > +               q->qbr = RISCV_IOMMU_REG_CQB;
> > > +               q->qcr = RISCV_IOMMU_REG_CQCSR;
> > > +               name = "cmdq";
> > > +               break;
> > > +       case RISCV_IOMMU_FAULT_QUEUE:
> > > +               q = &iommu->fltq;
> > > +               q->len = sizeof(struct riscv_iommu_fq_record);
> > > +               count = iommu->fltq_len;
> > > +               irq = iommu->irq_fltq;
> > > +               irq_check = riscv_iommu_fltq_irq_check;
> > > +               irq_process = riscv_iommu_fltq_process;
> > > +               q->qbr = RISCV_IOMMU_REG_FQB;
> > > +               q->qcr = RISCV_IOMMU_REG_FQCSR;
> > > +               name = "fltq";
> > > +               break;
> > > +       case RISCV_IOMMU_PAGE_REQUEST_QUEUE:
> > > +               q = &iommu->priq;
> > > +               q->len = sizeof(struct riscv_iommu_pq_record);
> > > +               count = iommu->priq_len;
> > > +               irq = iommu->irq_priq;
> > > +               irq_check = riscv_iommu_priq_irq_check;
> > > +               irq_process = riscv_iommu_priq_process;
> > > +               q->qbr = RISCV_IOMMU_REG_PQB;
> > > +               q->qcr = RISCV_IOMMU_REG_PQCSR;
> > > +               name = "priq";
> > > +               break;
> > > +       default:
> > > +               dev_err(dev, "invalid queue interrupt index in queue_init!\n");
> > > +               return -EINVAL;
> > > +       }
> > > +
> > > +       /* Polling not implemented */
> > > +       if (!irq)
> > > +               return -ENODEV;
> > > +
> > > +       /* Allocate queue in memory and set the base register */
> > > +       order = ilog2(count);
> > > +       do {
> > > +               queue_size = q->len * (1ULL << order);
> > > +               q->base = dmam_alloc_coherent(dev, queue_size, &q->base_dma, GFP_KERNEL);
> > > +               if (q->base || queue_size < PAGE_SIZE)
> > > +                       break;
> > > +
> > > +               order--;
> > > +       } while (1);
> > > +
> > > +       if (!q->base) {
> > > +               dev_err(dev, "failed to allocate %s queue (cnt: %u)\n", name, count);
> > > +               return -ENOMEM;
> > > +       }
> > > +
> > > +       q->cnt = 1ULL << order;
> > > +
> > > +       qbr_val = phys_to_ppn(q->base_dma) |
> > > +           FIELD_PREP(RISCV_IOMMU_QUEUE_LOGSZ_FIELD, order - 1);
> > > +
> > > +       riscv_iommu_writeq(iommu, q->qbr, qbr_val);
> > > +
> > > +       /*
> > > +        * Queue base registers are WARL, so it's possible that whatever we wrote
> > > +        * there was illegal/not supported by the hw in which case we need to make
> > > +        * sure we set a supported PPN and/or queue size.
> > > +        */
> > > +       qbr_readback = riscv_iommu_readq(iommu, q->qbr);
> > > +       if (qbr_readback == qbr_val)
> > > +               goto irq;
> > > +
> > > +       dmam_free_coherent(dev, queue_size, q->base, q->base_dma);
> > > +
> > > +       /* Get supported queue size */
> > > +       order = FIELD_GET(RISCV_IOMMU_QUEUE_LOGSZ_FIELD, qbr_readback) + 1;
> > > +       q->cnt = 1ULL << order;
> > > +       queue_size = q->len * q->cnt;
> > > +
> > > +       /*
> > > +        * In case we also failed to set PPN, it means the field is hardcoded and the
> > > +        * queue resides in I/O memory instead, so get its physical address and
> > > +        * ioremap it.
> > > +        */
> > > +       qbr_paddr = ppn_to_phys(qbr_readback);
> > > +       if (qbr_paddr != q->base_dma) {
> > > +               dev_info(dev,
> > > +                        "hardcoded ppn in %s base register, using io memory for the queue\n",
> > > +                        name);
> > > +               dev_info(dev, "queue length for %s set to %i\n", name, q->cnt);
> > > +               q->in_iomem = true;
> > > +               q->base = ioremap(qbr_paddr, queue_size);
> > > +               if (!q->base) {
> > > +                       dev_err(dev, "failed to map %s queue (cnt: %u)\n", name, q->cnt);
> > > +                       return -ENOMEM;
> > > +               }
> > > +               q->base_dma = qbr_paddr;
> > > +       } else {
> > > +               /*
> > > +                * We only failed to set the queue size, re-try to allocate memory with
> > > +                * the queue size supported by the hw.
> > > +                */
> > > +               dev_info(dev, "hardcoded queue size in %s base register\n", name);
> > > +               dev_info(dev, "retrying with queue length: %i\n", q->cnt);
> > > +               q->base = dmam_alloc_coherent(dev, queue_size, &q->base_dma, GFP_KERNEL);
> > > +               if (!q->base) {
> > > +                       dev_err(dev, "failed to allocate %s queue (cnt: %u)\n",
> > > +                               name, q->cnt);
> > > +                       return -ENOMEM;
> > > +               }
> > > +       }
> > > +
> > > +       qbr_val = phys_to_ppn(q->base_dma) |
> > > +           FIELD_PREP(RISCV_IOMMU_QUEUE_LOGSZ_FIELD, order - 1);
> > > +       riscv_iommu_writeq(iommu, q->qbr, qbr_val);
> > > +
> > > +       /* Final check to make sure hw accepted our write */
> > > +       qbr_readback = riscv_iommu_readq(iommu, q->qbr);
> > > +       if (qbr_readback != qbr_val) {
> > > +               dev_err(dev, "failed to set base register for %s\n", name);
> > > +               goto fail;
> > > +       }
> > > +
> > > + irq:
> > > +       if (request_threaded_irq(irq, irq_check, irq_process, IRQF_ONESHOT | IRQF_SHARED,
> > > +                                dev_name(dev), q)) {
> > > +               dev_err(dev, "fail to request irq %d for %s\n", irq, name);
> > > +               goto fail;
> > > +       }
> > > +
> > > +       q->irq = irq;
> > > +
> > > +       /* Note: All RIO_xQ_EN/IE fields are in the same offsets */
> > > +       ret =
> > > +           riscv_iommu_queue_ctrl(iommu, q,
> > > +                                  RISCV_IOMMU_QUEUE_ENABLE |
> > > +                                  RISCV_IOMMU_QUEUE_INTR_ENABLE);
> > > +       if (ret & RISCV_IOMMU_QUEUE_BUSY) {
> > > +               dev_err(dev, "%s init timeout\n", name);
> > > +               ret = -EBUSY;
> > > +               goto fail;
> > > +       }
> > > +
> > > +       return 0;
> > > +
> > > + fail:
> > > +       riscv_iommu_queue_free(iommu, q);
> > > +       return 0;
> > > +}
> > > +
> > > +/*
> > > + * I/O MMU Command queue chapter 3.1
> > > + */
> > > +
> > > +static inline void riscv_iommu_cmd_inval_vma(struct riscv_iommu_command *cmd)
> > > +{
> > > +       cmd->dword0 =
> > > +           FIELD_PREP(RISCV_IOMMU_CMD_OPCODE,
> > > +                      RISCV_IOMMU_CMD_IOTINVAL_OPCODE) | FIELD_PREP(RISCV_IOMMU_CMD_FUNC,
> > > +                                                                    RISCV_IOMMU_CMD_IOTINVAL_FUNC_VMA);
> > > +       cmd->dword1 = 0;
> > > +}
> > > +
> > > +static inline void riscv_iommu_cmd_inval_set_addr(struct riscv_iommu_command *cmd,
> > > +                                                 u64 addr)
> > > +{
> > > +       cmd->dword0 |= RISCV_IOMMU_CMD_IOTINVAL_AV;
> > > +       cmd->dword1 = addr;
> > > +}
> > > +
> > > +static inline void riscv_iommu_cmd_inval_set_pscid(struct riscv_iommu_command *cmd,
> > > +                                                  unsigned pscid)
> > > +{
> > > +       cmd->dword0 |= FIELD_PREP(RISCV_IOMMU_CMD_IOTINVAL_PSCID, pscid) |
> > > +           RISCV_IOMMU_CMD_IOTINVAL_PSCV;
> > > +}
> > > +
> > > +static inline void riscv_iommu_cmd_inval_set_gscid(struct riscv_iommu_command *cmd,
> > > +                                                  unsigned gscid)
> > > +{
> > > +       cmd->dword0 |= FIELD_PREP(RISCV_IOMMU_CMD_IOTINVAL_GSCID, gscid) |
> > > +           RISCV_IOMMU_CMD_IOTINVAL_GV;
> > > +}
> > > +
> > > +static inline void riscv_iommu_cmd_iofence(struct riscv_iommu_command *cmd)
> > > +{
> > > +       cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_IOFENCE_OPCODE) |
> > > +           FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_IOFENCE_FUNC_C);
> > > +       cmd->dword1 = 0;
> > > +}
> > > +
> > > +static inline void riscv_iommu_cmd_iofence_set_av(struct riscv_iommu_command *cmd,
> > > +                                                 u64 addr, u32 data)
> > > +{
> > > +       cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_IOFENCE_OPCODE) |
> > > +           FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_IOFENCE_FUNC_C) |
> > > +           FIELD_PREP(RISCV_IOMMU_CMD_IOFENCE_DATA, data) | RISCV_IOMMU_CMD_IOFENCE_AV;
> > > +       cmd->dword1 = (addr >> 2);
> > > +}
> > > +
> > > +static inline void riscv_iommu_cmd_iodir_inval_ddt(struct riscv_iommu_command *cmd)
> > > +{
> > > +       cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_IODIR_OPCODE) |
> > > +           FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_DDT);
> > > +       cmd->dword1 = 0;
> > > +}
> > > +
> > > +static inline void riscv_iommu_cmd_iodir_inval_pdt(struct riscv_iommu_command *cmd)
> > > +{
> > > +       cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_IODIR_OPCODE) |
> > > +           FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_PDT);
> > > +       cmd->dword1 = 0;
> > > +}
> > > +
> > > +static inline void riscv_iommu_cmd_iodir_set_did(struct riscv_iommu_command *cmd,
> > > +                                                unsigned devid)
> > > +{
> > > +       cmd->dword0 |=
> > > +           FIELD_PREP(RISCV_IOMMU_CMD_IODIR_DID, devid) | RISCV_IOMMU_CMD_IODIR_DV;
> > > +}
> > > +
> > > +/* TODO: Convert into lock-less MPSC implementation. */
> > > +static bool riscv_iommu_post_sync(struct riscv_iommu_device *iommu,
> > > +                                 struct riscv_iommu_command *cmd, bool sync)
> > > +{
> > > +       u32 head, tail, next, last;
> > > +       unsigned long flags;
> > > +
> > > +       spin_lock_irqsave(&iommu->cq_lock, flags);
> > > +       head = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQH) & (iommu->cmdq.cnt - 1);
> > > +       tail = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQT) & (iommu->cmdq.cnt - 1);
> > > +       last = iommu->cmdq.lui;
> > > +       if (tail != last) {
> > > +               spin_unlock_irqrestore(&iommu->cq_lock, flags);
> > > +               /*
> > > +                * FIXME: This is a workaround for dropped MMIO writes/reads on QEMU platform.
> > > +                *        While debugging of the problem is still ongoing, this provides
> > > +                *        a simple implementation of a try-again policy.
> > > +                *        Will be changed to a lock-less algorithm in the future.
> > > +                */
> > > +               dev_dbg(iommu->dev, "IOMMU CQT: %x != %x (1st)\n", last, tail);
> > > +               spin_lock_irqsave(&iommu->cq_lock, flags);
> > > +               tail =
> > > +                   riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQT) & (iommu->cmdq.cnt - 1);
> > > +               last = iommu->cmdq.lui;
> > > +               if (tail != last) {
> > > +                       spin_unlock_irqrestore(&iommu->cq_lock, flags);
> > > +                       dev_dbg(iommu->dev, "IOMMU CQT: %x != %x (2nd)\n", last, tail);
> > > +                       spin_lock_irqsave(&iommu->cq_lock, flags);
> > > +               }
> > > +       }
> > > +
> > > +       next = (last + 1) & (iommu->cmdq.cnt - 1);
> > > +       if (next != head) {
> > > +               struct riscv_iommu_command *ptr = iommu->cmdq.base;
> > > +               ptr[last] = *cmd;
> > > +               wmb();
> > > +               riscv_iommu_writel(iommu, RISCV_IOMMU_REG_CQT, next);
> > > +               iommu->cmdq.lui = next;
> > > +       }
> > > +
> > > +       spin_unlock_irqrestore(&iommu->cq_lock, flags);
> > > +
> > > +       if (sync && head != next) {
> > > +               cycles_t start_time = get_cycles();
> > > +               while (1) {
> > > +                       last = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQH) &
> > > +                           (iommu->cmdq.cnt - 1);
> > > +                       if (head < next && last >= next)
> > > +                               break;
> > > +                       if (head > next && last < head && last >= next)
> > > +                               break;
> > > +                       if (RISCV_IOMMU_TIMEOUT < (get_cycles() - start_time)) {
> >
> > This timeout check will be imprecise, because this code is not in an
> > irq-disabled context; the thread can be scheduled out or preempted.
> > By the time it runs here again, more than 1 second may have passed
> > even though the IOFENCE has actually completed.
> >
>
> Good point. Thanks.
>
>
> > > +                               dev_err(iommu->dev, "IOFENCE TIMEOUT\n");
> > > +                               return false;
> > > +                       }
> > > +                       cpu_relax();
> > > +               }
> > > +       }
> > > +
> > > +       return next != head;
> > > +}
> > > +
> > > +static bool riscv_iommu_post(struct riscv_iommu_device *iommu,
> > > +                            struct riscv_iommu_command *cmd)
> > > +{
> > > +       return riscv_iommu_post_sync(iommu, cmd, false);
> > > +}
> > > +
> > > +static bool riscv_iommu_iofence_sync(struct riscv_iommu_device *iommu)
> > > +{
> > > +       struct riscv_iommu_command cmd;
> > > +       riscv_iommu_cmd_iofence(&cmd);
> > > +       return riscv_iommu_post_sync(iommu, &cmd, true);
> > > +}
> > > +
> > > +/* Command queue primary interrupt handler */
> > > +static irqreturn_t riscv_iommu_cmdq_irq_check(int irq, void *data)
> > > +{
> > > +       struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
> > > +       struct riscv_iommu_device *iommu =
> > > +           container_of(q, struct riscv_iommu_device, cmdq);
> > > +       u32 ipsr = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_IPSR);
> > > +       if (ipsr & RISCV_IOMMU_IPSR_CIP)
> > > +               return IRQ_WAKE_THREAD;
> > > +       return IRQ_NONE;
> > > +}
> > > +
> > > +/* Command queue interrupt handler thread function */
> > > +static irqreturn_t riscv_iommu_cmdq_process(int irq, void *data)
> > > +{
> > > +       struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
> > > +       struct riscv_iommu_device *iommu;
> > > +       unsigned ctrl;
> > > +
> > > +       iommu = container_of(q, struct riscv_iommu_device, cmdq);
> > > +
> > > +       /* Error reporting, clear error reports if any. */
> > > +       ctrl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQCSR);
> > > +       if (ctrl & (RISCV_IOMMU_CQCSR_CQMF |
> > > +                   RISCV_IOMMU_CQCSR_CMD_TO | RISCV_IOMMU_CQCSR_CMD_ILL)) {
> > > +               riscv_iommu_queue_ctrl(iommu, &iommu->cmdq, ctrl);
> > > +               dev_warn_ratelimited(iommu->dev,
> > > +                                    "Command queue error: fault: %d tout: %d err: %d\n",
> > > +                                    !!(ctrl & RISCV_IOMMU_CQCSR_CQMF),
> > > +                                    !!(ctrl & RISCV_IOMMU_CQCSR_CMD_TO),
> > > +                                    !!(ctrl & RISCV_IOMMU_CQCSR_CMD_ILL));
> >
> > We need to handle the error by either adjusting the tail to remove the
> > failed command or fixing the failed command itself. Otherwise, the
> > failed command will remain in the queue and the IOMMU will keep trying
> > to execute it. I guess the first option might be easier to implement.
> >
>
> Correct. Thanks for pointing this out.
> Error handling / recovery was not included in this series. There is a
> work-in-progress series to handle various types of failures, including
> command processing errors, DDT misconfiguration, queue overflows,
> handling of device-reported faults, etc. I can bring some of the error
> handling here if needed; otherwise I'd prefer to keep it as a separate
> series, sent out once this one is merged.

It sounds good to me, thanks.

>
> > > +       }
> > > +
> > > +       /* Clear fault interrupt pending. */
> > > +       riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IPSR, RISCV_IOMMU_IPSR_CIP);
> > > +
> > > +       return IRQ_HANDLED;
> > > +}
> > > +
> > > +/*
> > > + * Fault/event queue, chapter 3.2
> > > + */
> > > +
> > > +static void riscv_iommu_fault_report(struct riscv_iommu_device *iommu,
> > > +                                    struct riscv_iommu_fq_record *event)
> > > +{
> > > +       unsigned err, devid;
> > > +
> > > +       err = FIELD_GET(RISCV_IOMMU_FQ_HDR_CAUSE, event->hdr);
> > > +       devid = FIELD_GET(RISCV_IOMMU_FQ_HDR_DID, event->hdr);
> > > +
> > > +       dev_warn_ratelimited(iommu->dev,
> > > +                            "Fault %d devid: %d" " iotval: %llx iotval2: %llx\n", err,
> > > +                            devid, event->iotval, event->iotval2);
> > > +}
> > > +
> > > +/* Fault/event queue primary interrupt handler */
> > > +static irqreturn_t riscv_iommu_fltq_irq_check(int irq, void *data)
> > > +{
> > > +       struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
> > > +       struct riscv_iommu_device *iommu =
> > > +           container_of(q, struct riscv_iommu_device, fltq);
> > > +       u32 ipsr = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_IPSR);
> > > +       if (ipsr & RISCV_IOMMU_IPSR_FIP)
> > > +               return IRQ_WAKE_THREAD;
> > > +       return IRQ_NONE;
> > > +}
> > > +
> > > +/* Fault queue interrupt handler thread function */
> > > +static irqreturn_t riscv_iommu_fltq_process(int irq, void *data)
> > > +{
> > > +       struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
> > > +       struct riscv_iommu_device *iommu;
> > > +       struct riscv_iommu_fq_record *events;
> > > +       unsigned cnt, len, idx, ctrl;
> > > +
> > > +       iommu = container_of(q, struct riscv_iommu_device, fltq);
> > > +       events = (struct riscv_iommu_fq_record *)q->base;
> > > +
> > > +       /* Error reporting, clear error reports if any. */
> > > +       ctrl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_FQCSR);
> > > +       if (ctrl & (RISCV_IOMMU_FQCSR_FQMF | RISCV_IOMMU_FQCSR_FQOF)) {
> > > +               riscv_iommu_queue_ctrl(iommu, &iommu->fltq, ctrl);
> > > +               dev_warn_ratelimited(iommu->dev,
> > > +                                    "Fault queue error: fault: %d full: %d\n",
> > > +                                    !!(ctrl & RISCV_IOMMU_FQCSR_FQMF),
> > > +                                    !!(ctrl & RISCV_IOMMU_FQCSR_FQOF));
> > > +       }
> > > +
> > > +       /* Clear fault interrupt pending. */
> > > +       riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IPSR, RISCV_IOMMU_IPSR_FIP);
> > > +
> > > +       /* Report fault events. */
> > > +       do {
> > > +               cnt = riscv_iommu_queue_consume(iommu, q, &idx);
> > > +               if (!cnt)
> > > +                       break;
> > > +               for (len = 0; len < cnt; idx++, len++)
> > > +                       riscv_iommu_fault_report(iommu, &events[idx]);
> > > +               riscv_iommu_queue_release(iommu, q, cnt);
> > > +       } while (1);
> > > +
> > > +       return IRQ_HANDLED;
> > > +}
> > > +
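
For readers following the flow: riscv_iommu_queue_consume()/riscv_iommu_queue_release()
implement a simple head/tail protocol over the queue's MMIO index registers. A minimal
sketch of what that looks like for the fault queue, using the MMIO accessors from this
patch (illustrative only, not the actual helper code; 'ready' and 'head'/'tail' are
local names for this sketch):

        unsigned head, tail, ready;

        head = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_FQH) & (q->cnt - 1);
        tail = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_FQT) & (q->cnt - 1);
        /* Records are contiguous up to the tail, or up to the wrap point. */
        ready = (tail < head) ? q->cnt - head : tail - head;

        /* ... process 'ready' records starting at events[head] ... */

        /* Release: advance the head register past the consumed records. */
        riscv_iommu_writel(iommu, RISCV_IOMMU_REG_FQH, (head + ready) & (q->cnt - 1));
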
> > > +/*
> > > + * Page request queue, chapter 3.3
> > > + */
> > > +
> > >  /*
> > >   * Register device for IOMMU tracking.
> > >   */
> > > @@ -97,6 +600,54 @@ static void riscv_iommu_add_device(struct riscv_iommu_device *iommu, struct devi
> > >         mutex_unlock(&iommu->eps_mutex);
> > >  }
> > >
> > > +/* Page request interface queue primary interrupt handler */
> > > +static irqreturn_t riscv_iommu_priq_irq_check(int irq, void *data)
> > > +{
> > > +       struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
> > > +       struct riscv_iommu_device *iommu =
> > > +           container_of(q, struct riscv_iommu_device, priq);
> > > +       u32 ipsr = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_IPSR);
> > > +       if (ipsr & RISCV_IOMMU_IPSR_PIP)
> > > +               return IRQ_WAKE_THREAD;
> > > +       return IRQ_NONE;
> > > +}
> > > +
> > > +/* Page request interface queue interrupt handler thread function */
> > > +static irqreturn_t riscv_iommu_priq_process(int irq, void *data)
> > > +{
> > > +       struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
> > > +       struct riscv_iommu_device *iommu;
> > > +       struct riscv_iommu_pq_record *requests;
> > > +       unsigned cnt, idx, ctrl;
> > > +
> > > +       iommu = container_of(q, struct riscv_iommu_device, priq);
> > > +       requests = (struct riscv_iommu_pq_record *)q->base;
> > > +
> > > +       /* Error reporting, clear error reports if any. */
> > > +       ctrl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_PQCSR);
> > > +       if (ctrl & (RISCV_IOMMU_PQCSR_PQMF | RISCV_IOMMU_PQCSR_PQOF)) {
> > > +               riscv_iommu_queue_ctrl(iommu, &iommu->priq, ctrl);
> > > +               dev_warn_ratelimited(iommu->dev,
> > > +                                    "Page request queue error: fault: %d full: %d\n",
> > > +                                    !!(ctrl & RISCV_IOMMU_PQCSR_PQMF),
> > > +                                    !!(ctrl & RISCV_IOMMU_PQCSR_PQOF));
> > > +       }
> > > +
> > > +       /* Clear page request interrupt pending. */
> > > +       riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IPSR, RISCV_IOMMU_IPSR_PIP);
> > > +
> > > +       /* Process page requests. */
> > > +       do {
> > > +               cnt = riscv_iommu_queue_consume(iommu, q, &idx);
> > > +               if (!cnt)
> > > +                       break;
> > > +               dev_warn(iommu->dev, "unexpected %u page requests\n", cnt);
> > > +               riscv_iommu_queue_release(iommu, q, cnt);
> > > +       } while (1);
> > > +
> > > +       return IRQ_HANDLED;
> > > +}
> > > +
> > >  /*
> > >   * Endpoint management
> > >   */
> > > @@ -350,7 +901,29 @@ static void riscv_iommu_flush_iotlb_range(struct iommu_domain *iommu_domain,
> > >                                           unsigned long *start, unsigned long *end,
> > >                                           size_t *pgsize)
> > >  {
> > > -       /* Command interface not implemented */
> > > +       struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
> > > +       struct riscv_iommu_command cmd;
> > > +       unsigned long iova;
> > > +
> > > +       if (domain->mode == RISCV_IOMMU_DC_FSC_MODE_BARE)
> > > +               return;
> > > +
> > > +       /* Domain not attached to an IOMMU! */
> > > +       BUG_ON(!domain->iommu);
> > > +
> > > +       riscv_iommu_cmd_inval_vma(&cmd);
> > > +       riscv_iommu_cmd_inval_set_pscid(&cmd, domain->pscid);
> > > +
> > > +       if (start && end && pgsize) {
> > > +               /* Cover only the range that is needed */
> > > +               for (iova = *start; iova <= *end; iova += *pgsize) {
> > > +                       riscv_iommu_cmd_inval_set_addr(&cmd, iova);
> > > +                       riscv_iommu_post(domain->iommu, &cmd);
> > > +               }
> > > +       } else {
> > > +               riscv_iommu_post(domain->iommu, &cmd);
> > > +       }
> > > +       riscv_iommu_iofence_sync(domain->iommu);
> > >  }
> > >
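
For context, the riscv_iommu_cmd_inval_vma()/..._set_pscid()/..._set_addr() helpers
used above are defined elsewhere in this patch; a sketch of the IOTINVAL.VMA encoding
they produce, using the field definitions from iommu-bits.h (illustrative; 'pscid'
and 'addr' stand in for the domain's values):

        cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_IOTINVAL_OPCODE) |
                      FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_IOTINVAL_FUNC_VMA);
        cmd->dword1 = 0;

        /* Scope the invalidation to one address space ... */
        cmd->dword0 |= FIELD_PREP(RISCV_IOMMU_CMD_IOTINVAL_PSCID, pscid) |
                       RISCV_IOMMU_CMD_IOTINVAL_PSCV;

        /* ... and optionally to one 4K-aligned address (dword1 holds addr >> 2). */
        cmd->dword0 |= RISCV_IOMMU_CMD_IOTINVAL_AV;
        cmd->dword1 = addr >> 2;
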
> > >  static void riscv_iommu_flush_iotlb_all(struct iommu_domain *iommu_domain)
> > > @@ -610,6 +1183,9 @@ void riscv_iommu_remove(struct riscv_iommu_device *iommu)
> > >         iommu_device_unregister(&iommu->iommu);
> > >         iommu_device_sysfs_remove(&iommu->iommu);
> > >         riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_OFF);
> > > +       riscv_iommu_queue_free(iommu, &iommu->cmdq);
> > > +       riscv_iommu_queue_free(iommu, &iommu->fltq);
> > > +       riscv_iommu_queue_free(iommu, &iommu->priq);
> > >  }
> > >
> > >  int riscv_iommu_init(struct riscv_iommu_device *iommu)
> > > @@ -632,6 +1208,16 @@ int riscv_iommu_init(struct riscv_iommu_device *iommu)
> > >         }
> > >  #endif
> > >
> > > +       /*
> > > +        * Assign queue lengths from module parameters if not already
> > > +        * set on the device tree.
> > > +        */
> > > +       if (!iommu->cmdq_len)
> > > +               iommu->cmdq_len = cmdq_length;
> > > +       if (!iommu->fltq_len)
> > > +               iommu->fltq_len = fltq_length;
> > > +       if (!iommu->priq_len)
> > > +               iommu->priq_len = priq_length;
> > >         /* Clear any pending interrupt flag. */
> > >         riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IPSR,
> > >                            RISCV_IOMMU_IPSR_CIP |
> > > @@ -639,7 +1225,20 @@ int riscv_iommu_init(struct riscv_iommu_device *iommu)
> > >                            RISCV_IOMMU_IPSR_PMIP | RISCV_IOMMU_IPSR_PIP);
> > >         spin_lock_init(&iommu->cq_lock);
> > >         mutex_init(&iommu->eps_mutex);
> > > +       ret = riscv_iommu_queue_init(iommu, RISCV_IOMMU_COMMAND_QUEUE);
> > > +       if (ret)
> > > +               goto fail;
> > > +       ret = riscv_iommu_queue_init(iommu, RISCV_IOMMU_FAULT_QUEUE);
> > > +       if (ret)
> > > +               goto fail;
> > > +       if (!(iommu->cap & RISCV_IOMMU_CAP_ATS))
> > > +               goto no_ats;
> > > +
> > > +       ret = riscv_iommu_queue_init(iommu, RISCV_IOMMU_PAGE_REQUEST_QUEUE);
> > > +       if (ret)
> > > +               goto fail;
> > >
> > > + no_ats:
> > >         ret = riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_BARE);
> > >
> > >         if (ret) {
> > > @@ -663,5 +1262,8 @@ int riscv_iommu_init(struct riscv_iommu_device *iommu)
> > >         return 0;
> > >   fail:
> > >         riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_OFF);
> > > +       riscv_iommu_queue_free(iommu, &iommu->priq);
> > > +       riscv_iommu_queue_free(iommu, &iommu->fltq);
> > > +       riscv_iommu_queue_free(iommu, &iommu->cmdq);
> > >         return ret;
> > >  }
> > > diff --git a/drivers/iommu/riscv/iommu.h b/drivers/iommu/riscv/iommu.h
> > > index 7dc9baa59a50..04148a2a8ffd 100644
> > > --- a/drivers/iommu/riscv/iommu.h
> > > +++ b/drivers/iommu/riscv/iommu.h
> > > @@ -28,6 +28,24 @@
> > >  #define IOMMU_PAGE_SIZE_1G     BIT_ULL(30)
> > >  #define IOMMU_PAGE_SIZE_512G   BIT_ULL(39)
> > >
> > > +struct riscv_iommu_queue {
> > > +       dma_addr_t base_dma;    /* ring buffer bus address */
> > > +       void *base;             /* ring buffer pointer */
> > > +       size_t len;             /* single item length */
> > > +       u32 cnt;                /* items count */
> > > +       u32 lui;                /* last used index, consumer/producer share */
> > > +       unsigned qbr;           /* queue base register offset */
> > > +       unsigned qcr;           /* queue control and status register offset */
> > > +       int irq;                /* registered interrupt number */
> > > +       bool in_iomem;          /* indicates queue data are in I/O memory */
> > > +};
> > > +
> > > +enum riscv_queue_ids {
> > > +       RISCV_IOMMU_COMMAND_QUEUE       = 0,
> > > +       RISCV_IOMMU_FAULT_QUEUE         = 1,
> > > +       RISCV_IOMMU_PAGE_REQUEST_QUEUE  = 2
> > > +};
> > > +
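
The queue id selects which register pair a riscv_iommu_queue is bound to. A sketch of
the mapping a queue-init routine might perform (illustrative; the real mapping lives
inside riscv_iommu_queue_init() in this patch, and the register offsets come from
iommu-bits.h):

        switch (queue_id) {
        case RISCV_IOMMU_COMMAND_QUEUE:
                q->qbr = RISCV_IOMMU_REG_CQB;   /* base register */
                q->qcr = RISCV_IOMMU_REG_CQCSR; /* control/status register */
                break;
        case RISCV_IOMMU_FAULT_QUEUE:
                q->qbr = RISCV_IOMMU_REG_FQB;
                q->qcr = RISCV_IOMMU_REG_FQCSR;
                break;
        case RISCV_IOMMU_PAGE_REQUEST_QUEUE:
                q->qbr = RISCV_IOMMU_REG_PQB;
                q->qcr = RISCV_IOMMU_REG_PQCSR;
                break;
        }
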
> > >  struct riscv_iommu_device {
> > >         struct iommu_device iommu;      /* iommu core interface */
> > >         struct device *dev;             /* iommu hardware */
> > > @@ -42,6 +60,11 @@ struct riscv_iommu_device {
> > >         int irq_pm;
> > >         int irq_priq;
> > >
> > > +       /* Queue lengths */
> > > +       int cmdq_len;
> > > +       int fltq_len;
> > > +       int priq_len;
> > > +
> > >         /* supported and enabled hardware capabilities */
> > >         u64 cap;
> > >
> > > @@ -53,6 +76,11 @@ struct riscv_iommu_device {
> > >         unsigned ddt_mode;
> > >         bool ddtp_in_iomem;
> > >
> > > +       /* hardware queues */
> > > +       struct riscv_iommu_queue cmdq;
> > > +       struct riscv_iommu_queue fltq;
> > > +       struct riscv_iommu_queue priq;
> > > +
> > >         /* Connected end-points */
> > >         struct rb_root eps;
> > >         struct mutex eps_mutex;
> > > --
> > > 2.34.1
> > >
> > >
> > > _______________________________________________
> > > linux-riscv mailing list
> > > linux-riscv@lists.infradead.org
> > > http://lists.infradead.org/mailman/listinfo/linux-riscv
>
> best,
> - Tomasz

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 01/11] RISC-V: drivers/iommu: Add RISC-V IOMMU - Ziommu support.
  2023-07-19 19:33 ` [PATCH 01/11] RISC-V: drivers/iommu: Add RISC-V IOMMU - Ziommu support Tomasz Jeznach
                     ` (4 preceding siblings ...)
  2023-08-03  0:18   ` Jason Gunthorpe
@ 2023-08-03  8:27   ` Zong Li
  2023-08-16 18:05   ` Robin Murphy
  2024-04-13 10:15   ` Xingyou Chen
  7 siblings, 0 replies; 86+ messages in thread
From: Zong Li @ 2023-08-03  8:27 UTC (permalink / raw)
  To: Tomasz Jeznach
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley,
	Anup Patel, Albert Ou, linux, linux-kernel, Sebastien Boeuf,
	iommu, Palmer Dabbelt, Nick Kossifidis, linux-riscv

On Thu, Jul 20, 2023 at 3:34 AM Tomasz Jeznach <tjeznach@rivosinc.com> wrote:
>
> This patch introduces a skeleton IOMMU device driver implementation as defined
> by the RISC-V IOMMU Architecture Specification, Version 1.0 [1], with minimal
> support for pass-through mapping, basic initialization and bindings for platform
> and PCIe hardware implementations.
>
> The series of patches that followed the specification's evolution has been
> reorganized to provide functional separation of the implemented blocks,
> compliant with the ratified specification.
>
> This and the following patch series include code contributed by: Nick Kossifidis
> <mick@ics.forth.gr> (iommu-platform device, a number of specification
> clarifications, bugfixes and readability improvements), Sebastien Boeuf
> <seb@rivosinc.com> (page table creation, ATS/PGR flow).
>
> Complete history can be found at the maintainer's repository branch [2].
>
> The device driver enables RISC-V 32/64 support for memory translation for
> DMA-capable PCI and platform devices, a multilevel device directory table,
> process directories, shared virtual address support, wired and message-signaled
> interrupts for translation I/O faults, the page request interface and command
> processing.
>
> A matching RISC-V IOMMU device emulation implementation is available for the QEMU
> project, along with educational device extensions for PASID ATS/PRI support [3].
>
> References:
>  - [1] https://github.com/riscv-non-isa/riscv-iommu
>  - [2] https://github.com/tjeznach/linux/tree/tjeznach/riscv-iommu
>  - [3] https://github.com/tjeznach/qemu/tree/tjeznach/riscv-iommu
>
> Co-developed-by: Nick Kossifidis <mick@ics.forth.gr>
> Signed-off-by: Nick Kossifidis <mick@ics.forth.gr>
> Co-developed-by: Sebastien Boeuf <seb@rivosinc.com>
> Signed-off-by: Sebastien Boeuf <seb@rivosinc.com>
> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
> ---
>  drivers/iommu/Kconfig                |   1 +
>  drivers/iommu/Makefile               |   2 +-
>  drivers/iommu/riscv/Kconfig          |  22 +
>  drivers/iommu/riscv/Makefile         |   1 +
>  drivers/iommu/riscv/iommu-bits.h     | 704 +++++++++++++++++++++++++++
>  drivers/iommu/riscv/iommu-pci.c      | 134 +++++
>  drivers/iommu/riscv/iommu-platform.c |  94 ++++
>  drivers/iommu/riscv/iommu.c          | 660 +++++++++++++++++++++++++
>  drivers/iommu/riscv/iommu.h          | 115 +++++
>  9 files changed, 1732 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/iommu/riscv/Kconfig
>  create mode 100644 drivers/iommu/riscv/Makefile
>  create mode 100644 drivers/iommu/riscv/iommu-bits.h
>  create mode 100644 drivers/iommu/riscv/iommu-pci.c
>  create mode 100644 drivers/iommu/riscv/iommu-platform.c
>  create mode 100644 drivers/iommu/riscv/iommu.c
>  create mode 100644 drivers/iommu/riscv/iommu.h
>
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 2b12b583ef4b..36fcc6fd5b4e 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -187,6 +187,7 @@ config MSM_IOMMU
>  source "drivers/iommu/amd/Kconfig"
>  source "drivers/iommu/intel/Kconfig"
>  source "drivers/iommu/iommufd/Kconfig"
> +source "drivers/iommu/riscv/Kconfig"
>
>  config IRQ_REMAP
>         bool "Support for Interrupt Remapping"
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 769e43d780ce..8f57110a9fb1 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -1,5 +1,5 @@
>  # SPDX-License-Identifier: GPL-2.0
> -obj-y += amd/ intel/ arm/ iommufd/
> +obj-y += amd/ intel/ arm/ iommufd/ riscv/
>  obj-$(CONFIG_IOMMU_API) += iommu.o
>  obj-$(CONFIG_IOMMU_API) += iommu-traces.o
>  obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
> diff --git a/drivers/iommu/riscv/Kconfig b/drivers/iommu/riscv/Kconfig
> new file mode 100644
> index 000000000000..01d4043849d4
> --- /dev/null
> +++ b/drivers/iommu/riscv/Kconfig
> @@ -0,0 +1,22 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +# RISC-V IOMMU support
> +
> +config RISCV_IOMMU
> +       bool "RISC-V IOMMU driver"
> +       depends on RISCV
> +       select IOMMU_API
> +       select IOMMU_DMA
> +       select IOMMU_SVA
> +       select IOMMU_IOVA
> +       select IOMMU_IO_PGTABLE
> +       select IOASID
> +       select PCI_MSI
> +       select PCI_ATS
> +       select PCI_PRI
> +       select PCI_PASID
> +       select MMU_NOTIFIER
> +       help
> +         Support for devices following the RISC-V IOMMU specification.
> +
> +         If unsure, say N here.
> +
> diff --git a/drivers/iommu/riscv/Makefile b/drivers/iommu/riscv/Makefile
> new file mode 100644
> index 000000000000..38730c11e4a8
> --- /dev/null
> +++ b/drivers/iommu/riscv/Makefile
> @@ -0,0 +1 @@
> +obj-$(CONFIG_RISCV_IOMMU) += iommu.o iommu-pci.o iommu-platform.o
> \ No newline at end of file
> diff --git a/drivers/iommu/riscv/iommu-bits.h b/drivers/iommu/riscv/iommu-bits.h
> new file mode 100644
> index 000000000000..b2946793a73d
> --- /dev/null
> +++ b/drivers/iommu/riscv/iommu-bits.h
> @@ -0,0 +1,704 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright © 2022-2023 Rivos Inc.
> + * Copyright © 2023 FORTH-ICS/CARV
> + * Copyright © 2023 RISC-V IOMMU Task Group
> + *
> + * RISC-V Ziommu - Register Layout and Data Structures.
> + *
> + * Based on the 'RISC-V IOMMU Architecture Specification', Version 1.0
> + * Published at  https://github.com/riscv-non-isa/riscv-iommu
> + *
> + */
> +
> +#ifndef _RISCV_IOMMU_BITS_H_
> +#define _RISCV_IOMMU_BITS_H_
> +
> +#include <linux/types.h>
> +#include <linux/bitfield.h>
> +#include <linux/bits.h>
> +
> +/*
> + * Chapter 5: Memory Mapped register interface
> + */
> +
> +/* Common field positions */
> +#define RISCV_IOMMU_PPN_FIELD          GENMASK_ULL(53, 10)
> +#define RISCV_IOMMU_QUEUE_LOGSZ_FIELD  GENMASK_ULL(4, 0)
> +#define RISCV_IOMMU_QUEUE_INDEX_FIELD  GENMASK_ULL(31, 0)
> +#define RISCV_IOMMU_QUEUE_ENABLE       BIT(0)
> +#define RISCV_IOMMU_QUEUE_INTR_ENABLE  BIT(1)
> +#define RISCV_IOMMU_QUEUE_MEM_FAULT    BIT(8)
> +#define RISCV_IOMMU_QUEUE_OVERFLOW     BIT(9)
> +#define RISCV_IOMMU_QUEUE_ACTIVE       BIT(16)
> +#define RISCV_IOMMU_QUEUE_BUSY         BIT(17)
> +
> +#define RISCV_IOMMU_ATP_PPN_FIELD      GENMASK_ULL(43, 0)
> +#define RISCV_IOMMU_ATP_MODE_FIELD     GENMASK_ULL(63, 60)
> +
> +/* 5.3 IOMMU Capabilities (64bits) */
> +#define RISCV_IOMMU_REG_CAP            0x0000
> +#define RISCV_IOMMU_CAP_VERSION                GENMASK_ULL(7, 0)
> +#define RISCV_IOMMU_CAP_S_SV32         BIT_ULL(8)
> +#define RISCV_IOMMU_CAP_S_SV39         BIT_ULL(9)
> +#define RISCV_IOMMU_CAP_S_SV48         BIT_ULL(10)
> +#define RISCV_IOMMU_CAP_S_SV57         BIT_ULL(11)
> +#define RISCV_IOMMU_CAP_SVPBMT         BIT_ULL(15)
> +#define RISCV_IOMMU_CAP_G_SV32         BIT_ULL(16)
> +#define RISCV_IOMMU_CAP_G_SV39         BIT_ULL(17)
> +#define RISCV_IOMMU_CAP_G_SV48         BIT_ULL(18)
> +#define RISCV_IOMMU_CAP_G_SV57         BIT_ULL(19)
> +#define RISCV_IOMMU_CAP_MSI_FLAT       BIT_ULL(22)
> +#define RISCV_IOMMU_CAP_MSI_MRIF       BIT_ULL(23)
> +#define RISCV_IOMMU_CAP_AMO            BIT_ULL(24)
> +#define RISCV_IOMMU_CAP_ATS            BIT_ULL(25)
> +#define RISCV_IOMMU_CAP_T2GPA          BIT_ULL(26)
> +#define RISCV_IOMMU_CAP_END            BIT_ULL(27)
> +#define RISCV_IOMMU_CAP_IGS            GENMASK_ULL(29, 28)
> +#define RISCV_IOMMU_CAP_HPM            BIT_ULL(30)
> +#define RISCV_IOMMU_CAP_DBG            BIT_ULL(31)
> +#define RISCV_IOMMU_CAP_PAS            GENMASK_ULL(37, 32)
> +#define RISCV_IOMMU_CAP_PD8            BIT_ULL(38)
> +#define RISCV_IOMMU_CAP_PD17           BIT_ULL(39)
> +#define RISCV_IOMMU_CAP_PD20           BIT_ULL(40)
> +
> +#define RISCV_IOMMU_CAP_VERSION_VER_MASK       0xF0
> +#define RISCV_IOMMU_CAP_VERSION_REV_MASK       0x0F
> +
> +/**
> + * enum riscv_iommu_igs_settings - Interrupt Generation Support Settings
> + * @RISCV_IOMMU_CAP_IGS_MSI: I/O MMU supports only MSI generation
> + * @RISCV_IOMMU_CAP_IGS_WSI: I/O MMU supports only Wired-Signaled interrupt
> + * @RISCV_IOMMU_CAP_IGS_BOTH: I/O MMU supports both MSI and WSI generation
> + * @RISCV_IOMMU_CAP_IGS_RSRV: Reserved for standard use
> + */
> +enum riscv_iommu_igs_settings {
> +       RISCV_IOMMU_CAP_IGS_MSI = 0,
> +       RISCV_IOMMU_CAP_IGS_WSI = 1,
> +       RISCV_IOMMU_CAP_IGS_BOTH = 2,
> +       RISCV_IOMMU_CAP_IGS_RSRV = 3
> +};
> +
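
A sketch of how a driver might act on this field at probe time (illustrative;
riscv_iommu_readq() is assumed to be a 64-bit MMIO read helper alongside the 32-bit
accessors used in this patch):

        u64 cap = riscv_iommu_readq(iommu, RISCV_IOMMU_REG_CAP);

        switch (FIELD_GET(RISCV_IOMMU_CAP_IGS, cap)) {
        case RISCV_IOMMU_CAP_IGS_WSI:
        case RISCV_IOMMU_CAP_IGS_BOTH:
                /* Wire-signaled interrupts available; may set fctl.WSI. */
                break;
        case RISCV_IOMMU_CAP_IGS_MSI:
        default:
                /* MSI generation only. */
                break;
        }
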
> +/* 5.4 Features control register (32bits) */
> +#define RISCV_IOMMU_REG_FCTL           0x0008
> +#define RISCV_IOMMU_FCTL_BE            BIT(0)
> +#define RISCV_IOMMU_FCTL_WSI           BIT(1)
> +#define RISCV_IOMMU_FCTL_GXL           BIT(2)
> +
> +/* 5.5 Device-directory-table pointer (64bits) */
> +#define RISCV_IOMMU_REG_DDTP           0x0010
> +#define RISCV_IOMMU_DDTP_MODE          GENMASK_ULL(3, 0)
> +#define RISCV_IOMMU_DDTP_BUSY          BIT_ULL(4)
> +#define RISCV_IOMMU_DDTP_PPN           RISCV_IOMMU_PPN_FIELD
> +
> +/**
> + * enum riscv_iommu_ddtp_modes - I/O MMU translation modes
> + * @RISCV_IOMMU_DDTP_MODE_OFF: No inbound transactions allowed
> + * @RISCV_IOMMU_DDTP_MODE_BARE: Pass-through mode
> + * @RISCV_IOMMU_DDTP_MODE_1LVL: One-level DDT
> + * @RISCV_IOMMU_DDTP_MODE_2LVL: Two-level DDT
> + * @RISCV_IOMMU_DDTP_MODE_3LVL: Three-level DDT
> + */
> +enum riscv_iommu_ddtp_modes {
> +       RISCV_IOMMU_DDTP_MODE_OFF = 0,
> +       RISCV_IOMMU_DDTP_MODE_BARE = 1,
> +       RISCV_IOMMU_DDTP_MODE_1LVL = 2,
> +       RISCV_IOMMU_DDTP_MODE_2LVL = 3,
> +       RISCV_IOMMU_DDTP_MODE_3LVL = 4,
> +       RISCV_IOMMU_DDTP_MODE_MAX = 4
> +};
> +
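
A sketch of programming this register, e.g. switching to a one-level DDT rooted at
a (hypothetical) physical address ddt_phys; riscv_iommu_enable() in this patch does
the real work, including a bounded busy-wait (illustrative):

        u64 ddtp = FIELD_PREP(RISCV_IOMMU_DDTP_MODE, RISCV_IOMMU_DDTP_MODE_1LVL) |
                   phys_to_ppn(ddt_phys);

        riscv_iommu_writeq(iommu, RISCV_IOMMU_REG_DDTP, ddtp);
        do {
                ddtp = riscv_iommu_readq(iommu, RISCV_IOMMU_REG_DDTP);
        } while (ddtp & RISCV_IOMMU_DDTP_BUSY); /* hardware latches the new mode */
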
> +/* 5.6 Command Queue Base (64bits) */
> +#define RISCV_IOMMU_REG_CQB            0x0018
> +#define RISCV_IOMMU_CQB_ENTRIES                RISCV_IOMMU_QUEUE_LOGSZ_FIELD
> +#define RISCV_IOMMU_CQB_PPN            RISCV_IOMMU_PPN_FIELD
> +
> +/* 5.7 Command Queue head (32bits) */
> +#define RISCV_IOMMU_REG_CQH            0x0020
> +#define RISCV_IOMMU_CQH_INDEX          RISCV_IOMMU_QUEUE_INDEX_FIELD
> +
> +/* 5.8 Command Queue tail (32bits) */
> +#define RISCV_IOMMU_REG_CQT            0x0024
> +#define RISCV_IOMMU_CQT_INDEX          RISCV_IOMMU_QUEUE_INDEX_FIELD
> +
> +/* 5.9 Fault Queue Base (64bits) */
> +#define RISCV_IOMMU_REG_FQB            0x0028
> +#define RISCV_IOMMU_FQB_ENTRIES                RISCV_IOMMU_QUEUE_LOGSZ_FIELD
> +#define RISCV_IOMMU_FQB_PPN            RISCV_IOMMU_PPN_FIELD
> +
> +/* 5.10 Fault Queue Head (32bits) */
> +#define RISCV_IOMMU_REG_FQH            0x0030
> +#define RISCV_IOMMU_FQH_INDEX          RISCV_IOMMU_QUEUE_INDEX_FIELD
> +
> +/* 5.11 Fault Queue tail (32bits) */
> +#define RISCV_IOMMU_REG_FQT            0x0034
> +#define RISCV_IOMMU_FQT_INDEX          RISCV_IOMMU_QUEUE_INDEX_FIELD
> +
> +/* 5.12 Page Request Queue base (64bits) */
> +#define RISCV_IOMMU_REG_PQB            0x0038
> +#define RISCV_IOMMU_PQB_ENTRIES                RISCV_IOMMU_QUEUE_LOGSZ_FIELD
> +#define RISCV_IOMMU_PQB_PPN            RISCV_IOMMU_PPN_FIELD
> +
> +/* 5.13 Page Request Queue head (32bits) */
> +#define RISCV_IOMMU_REG_PQH            0x0040
> +#define RISCV_IOMMU_PQH_INDEX          RISCV_IOMMU_QUEUE_INDEX_FIELD
> +
> +/* 5.14 Page Request Queue tail (32bits) */
> +#define RISCV_IOMMU_REG_PQT            0x0044
> +#define RISCV_IOMMU_PQT_INDEX_MASK     RISCV_IOMMU_QUEUE_INDEX_FIELD
> +
> +/* 5.15 Command Queue CSR (32bits) */
> +#define RISCV_IOMMU_REG_CQCSR          0x0048
> +#define RISCV_IOMMU_CQCSR_CQEN         RISCV_IOMMU_QUEUE_ENABLE
> +#define RISCV_IOMMU_CQCSR_CIE          RISCV_IOMMU_QUEUE_INTR_ENABLE
> +#define RISCV_IOMMU_CQCSR_CQMF         RISCV_IOMMU_QUEUE_MEM_FAULT
> +#define RISCV_IOMMU_CQCSR_CMD_TO       BIT(9)
> +#define RISCV_IOMMU_CQCSR_CMD_ILL      BIT(10)
> +#define RISCV_IOMMU_CQCSR_FENCE_W_IP   BIT(11)
> +#define RISCV_IOMMU_CQCSR_CQON         RISCV_IOMMU_QUEUE_ACTIVE
> +#define RISCV_IOMMU_CQCSR_BUSY         RISCV_IOMMU_QUEUE_BUSY
> +
> +/* 5.16 Fault Queue CSR (32bits) */
> +#define RISCV_IOMMU_REG_FQCSR          0x004C
> +#define RISCV_IOMMU_FQCSR_FQEN         RISCV_IOMMU_QUEUE_ENABLE
> +#define RISCV_IOMMU_FQCSR_FIE          RISCV_IOMMU_QUEUE_INTR_ENABLE
> +#define RISCV_IOMMU_FQCSR_FQMF         RISCV_IOMMU_QUEUE_MEM_FAULT
> +#define RISCV_IOMMU_FQCSR_FQOF         RISCV_IOMMU_QUEUE_OVERFLOW
> +#define RISCV_IOMMU_FQCSR_FQON         RISCV_IOMMU_QUEUE_ACTIVE
> +#define RISCV_IOMMU_FQCSR_BUSY         RISCV_IOMMU_QUEUE_BUSY
> +
> +/* 5.17 Page Request Queue CSR (32bits) */
> +#define RISCV_IOMMU_REG_PQCSR          0x0050
> +#define RISCV_IOMMU_PQCSR_PQEN         RISCV_IOMMU_QUEUE_ENABLE
> +#define RISCV_IOMMU_PQCSR_PIE          RISCV_IOMMU_QUEUE_INTR_ENABLE
> +#define RISCV_IOMMU_PQCSR_PQMF         RISCV_IOMMU_QUEUE_MEM_FAULT
> +#define RISCV_IOMMU_PQCSR_PQOF         RISCV_IOMMU_QUEUE_OVERFLOW
> +#define RISCV_IOMMU_PQCSR_PQON         RISCV_IOMMU_QUEUE_ACTIVE
> +#define RISCV_IOMMU_PQCSR_BUSY         RISCV_IOMMU_QUEUE_BUSY
> +
> +/* 5.18 Interrupt Pending Status (32bits) */
> +#define RISCV_IOMMU_REG_IPSR           0x0054
> +
> +#define RISCV_IOMMU_INTR_CQ            0
> +#define RISCV_IOMMU_INTR_FQ            1
> +#define RISCV_IOMMU_INTR_PM            2
> +#define RISCV_IOMMU_INTR_PQ            3
> +#define RISCV_IOMMU_INTR_COUNT         4
> +
> +#define RISCV_IOMMU_IPSR_CIP           BIT(RISCV_IOMMU_INTR_CQ)
> +#define RISCV_IOMMU_IPSR_FIP           BIT(RISCV_IOMMU_INTR_FQ)
> +#define RISCV_IOMMU_IPSR_PMIP          BIT(RISCV_IOMMU_INTR_PM)
> +#define RISCV_IOMMU_IPSR_PIP           BIT(RISCV_IOMMU_INTR_PQ)
> +
> +/* 5.19 Performance monitoring counter overflow status (32bits) */
> +#define RISCV_IOMMU_REG_IOCOUNTOVF     0x0058
> +#define RISCV_IOMMU_IOCOUNTOVF_CY      BIT(0)
> +#define RISCV_IOMMU_IOCOUNTOVF_HPM     GENMASK_ULL(31, 1)
> +
> +/* 5.20 Performance monitoring counter inhibits (32bits) */
> +#define RISCV_IOMMU_REG_IOCOUNTINH     0x005C
> +#define RISCV_IOMMU_IOCOUNTINH_CY      BIT(0)
> +#define RISCV_IOMMU_IOCOUNTINH_HPM     GENMASK(31, 1)
> +
> +/* 5.21 Performance monitoring cycles counter (64bits) */
> +#define RISCV_IOMMU_REG_IOHPMCYCLES     0x0060
> +#define RISCV_IOMMU_IOHPMCYCLES_COUNTER        GENMASK_ULL(62, 0)
> +#define RISCV_IOMMU_IOHPMCYCLES_OVF    BIT_ULL(63)
> +
> +/* 5.22 Performance monitoring event counters (31 * 64bits) */
> +#define RISCV_IOMMU_REG_IOHPMCTR_BASE  0x0068
> +#define RISCV_IOMMU_REG_IOHPMCTR(_n)   (RISCV_IOMMU_REG_IOHPMCTR_BASE + ((_n) * 0x8))
> +
> +/* 5.23 Performance monitoring event selectors (31 * 64bits) */
> +#define RISCV_IOMMU_REG_IOHPMEVT_BASE  0x0160
> +#define RISCV_IOMMU_REG_IOHPMEVT(_n)   (RISCV_IOMMU_REG_IOHPMEVT_BASE + ((_n) * 0x8))
> +#define RISCV_IOMMU_IOHPMEVT_EVENT_ID  GENMASK_ULL(14, 0)
> +#define RISCV_IOMMU_IOHPMEVT_DMASK     BIT_ULL(15)
> +#define RISCV_IOMMU_IOHPMEVT_PID_PSCID GENMASK_ULL(35, 16)
> +#define RISCV_IOMMU_IOHPMEVT_DID_GSCID GENMASK_ULL(59, 36)
> +#define RISCV_IOMMU_IOHPMEVT_PV_PSCV   BIT_ULL(60)
> +#define RISCV_IOMMU_IOHPMEVT_DV_GSCV   BIT_ULL(61)
> +#define RISCV_IOMMU_IOHPMEVT_IDT       BIT_ULL(62)
> +#define RISCV_IOMMU_IOHPMEVT_OF                BIT_ULL(63)
> +
> +/**
> + * enum riscv_iommu_hpmevent_id - Performance-monitoring event identifier
> + *
> + * @RISCV_IOMMU_HPMEVENT_INVALID: Invalid event, do not count
> + * @RISCV_IOMMU_HPMEVENT_URQ: Untranslated requests
> + * @RISCV_IOMMU_HPMEVENT_TRQ: Translated requests
> + * @RISCV_IOMMU_HPMEVENT_ATS_RQ: ATS translation requests
> + * @RISCV_IOMMU_HPMEVENT_TLB_MISS: TLB misses
> + * @RISCV_IOMMU_HPMEVENT_DD_WALK: Device directory walks
> + * @RISCV_IOMMU_HPMEVENT_PD_WALK: Process directory walks
> + * @RISCV_IOMMU_HPMEVENT_S_VS_WALKS: S/VS-Stage page table walks
> + * @RISCV_IOMMU_HPMEVENT_G_WALKS: G-Stage page table walks
> + * @RISCV_IOMMU_HPMEVENT_MAX: Value to denote maximum Event IDs
> + */
> +enum riscv_iommu_hpmevent_id {
> +       RISCV_IOMMU_HPMEVENT_INVALID    = 0,
> +       RISCV_IOMMU_HPMEVENT_URQ        = 1,
> +       RISCV_IOMMU_HPMEVENT_TRQ        = 2,
> +       RISCV_IOMMU_HPMEVENT_ATS_RQ     = 3,
> +       RISCV_IOMMU_HPMEVENT_TLB_MISS   = 4,
> +       RISCV_IOMMU_HPMEVENT_DD_WALK    = 5,
> +       RISCV_IOMMU_HPMEVENT_PD_WALK    = 6,
> +       RISCV_IOMMU_HPMEVENT_S_VS_WALKS = 7,
> +       RISCV_IOMMU_HPMEVENT_G_WALKS    = 8,
> +       RISCV_IOMMU_HPMEVENT_MAX        = 9
> +};
> +
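
A sketch of what programming one of these counters could look like, e.g. counting
TLB misses for a single device id (illustrative; no HPM support is wired up in this
patch, and 'devid' plus the riscv_iommu_writeq() helper are assumptions):

        u64 evt = FIELD_PREP(RISCV_IOMMU_IOHPMEVT_EVENT_ID, RISCV_IOMMU_HPMEVENT_TLB_MISS) |
                  FIELD_PREP(RISCV_IOMMU_IOHPMEVT_DID_GSCID, devid) |
                  RISCV_IOMMU_IOHPMEVT_DV_GSCV; /* match on device id */

        riscv_iommu_writeq(iommu, RISCV_IOMMU_REG_IOHPMEVT(0), evt);
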
> +/* 5.24 Translation request IOVA (64bits) */
> +#define RISCV_IOMMU_REG_TR_REQ_IOVA     0x0258
> +#define RISCV_IOMMU_TR_REQ_IOVA_VPN    GENMASK_ULL(63, 12)
> +
> +/* 5.25 Translation request control (64bits) */
> +#define RISCV_IOMMU_REG_TR_REQ_CTL     0x0260
> +#define RISCV_IOMMU_TR_REQ_CTL_GO_BUSY BIT_ULL(0)
> +#define RISCV_IOMMU_TR_REQ_CTL_PRIV    BIT_ULL(1)
> +#define RISCV_IOMMU_TR_REQ_CTL_EXE     BIT_ULL(2)
> +#define RISCV_IOMMU_TR_REQ_CTL_NW      BIT_ULL(3)
> +#define RISCV_IOMMU_TR_REQ_CTL_PID     GENMASK_ULL(31, 12)
> +#define RISCV_IOMMU_TR_REQ_CTL_PV      BIT_ULL(32)
> +#define RISCV_IOMMU_TR_REQ_CTL_DID     GENMASK_ULL(63, 40)
> +
> +/* 5.26 Translation request response (64bits) */
> +#define RISCV_IOMMU_REG_TR_RESPONSE    0x0268
> +#define RISCV_IOMMU_TR_RESPONSE_FAULT  BIT_ULL(0)
> +#define RISCV_IOMMU_TR_RESPONSE_PBMT   GENMASK_ULL(8, 7)
> +#define RISCV_IOMMU_TR_RESPONSE_SZ     BIT_ULL(9)
> +#define RISCV_IOMMU_TR_RESPONSE_PPN    RISCV_IOMMU_PPN_FIELD
> +
> +/* 5.27 Interrupt cause to vector (64bits) */
> +#define RISCV_IOMMU_REG_IVEC           0x02F8
> +#define RISCV_IOMMU_IVEC_CIV           GENMASK_ULL(3, 0)
> +#define RISCV_IOMMU_IVEC_FIV           GENMASK_ULL(7, 4)
> +#define RISCV_IOMMU_IVEC_PMIV          GENMASK_ULL(11, 8)
> +#define RISCV_IOMMU_IVEC_PIV           GENMASK_ULL(15, 12)
> +
> +/* 5.28 MSI Configuration table (32 * 64bits) */
> +#define RISCV_IOMMU_REG_MSI_CONFIG     0x0300
> +#define RISCV_IOMMU_REG_MSI_ADDR(_n)   (RISCV_IOMMU_REG_MSI_CONFIG + ((_n) * 0x10))
> +#define RISCV_IOMMU_MSI_ADDR           GENMASK_ULL(55, 2)
> +#define RISCV_IOMMU_REG_MSI_DATA(_n)   (RISCV_IOMMU_REG_MSI_CONFIG + ((_n) * 0x10) + 0x08)
> +#define RISCV_IOMMU_MSI_DATA           GENMASK_ULL(31, 0)
> +#define RISCV_IOMMU_REG_MSI_VEC_CTL(_n)        (RISCV_IOMMU_REG_MSI_CONFIG + ((_n) * 0x10) + 0x0C)
> +#define RISCV_IOMMU_MSI_VEC_CTL_M      BIT_ULL(0)
> +
> +#define RISCV_IOMMU_REG_SIZE   0x1000
> +
> +/*
> + * Chapter 2: Data structures
> + */
> +
> +/*
> + * Device Directory Table macros for non-leaf nodes
> + */
> +#define RISCV_IOMMU_DDTE_VALID BIT_ULL(0)
> +#define RISCV_IOMMU_DDTE_PPN   RISCV_IOMMU_PPN_FIELD
> +
> +/**
> + * struct riscv_iommu_dc - Device Context
> + * @tc: Translation Control
> + * @iohgatp: I/O Hypervisor guest address translation and protection
> + *          (Second stage context)
> + * @ta: Translation Attributes
> + * @fsc: First stage context
> + * @msiptp: MSI page table pointer
> + * @msi_addr_mask: MSI address mask
> + * @msi_addr_pattern: MSI address pattern
> + *
> + * This structure is used for leaf nodes on the Device Directory Table.
> + * If RISCV_IOMMU_CAP_MSI_FLAT is not set, the bottom 4 fields are not
> + * present and are skipped with pointer arithmetic to avoid casting;
> + * see riscv_iommu_get_dc().
> + * See section 2.1 for more details.
> + */
> +struct riscv_iommu_dc {
> +       u64 tc;
> +       u64 iohgatp;
> +       u64 ta;
> +       u64 fsc;
> +       u64 msiptp;
> +       u64 msi_addr_mask;
> +       u64 msi_addr_pattern;
> +       u64 _reserved;
> +};
> +
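
A sketch of the size arithmetic the kerneldoc above refers to (illustrative; the
actual riscv_iommu_get_dc() arrives with the device-context patch, and the
'ddt_leaf_page'/'devid_low' names below are hypothetical):

        /* Without MSI_FLAT only the first four dwords of the DC exist. */
        size_t dc_size = (iommu->cap & RISCV_IOMMU_CAP_MSI_FLAT) ?
                         sizeof(struct riscv_iommu_dc) : 4 * sizeof(u64);
        struct riscv_iommu_dc *dc = (void *)(ddt_leaf_page + devid_low * dc_size);
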
> +/* Translation control fields */
> +#define RISCV_IOMMU_DC_TC_V            BIT_ULL(0)
> +#define RISCV_IOMMU_DC_TC_EN_ATS       BIT_ULL(1)
> +#define RISCV_IOMMU_DC_TC_EN_PRI       BIT_ULL(2)
> +#define RISCV_IOMMU_DC_TC_T2GPA                BIT_ULL(3)
> +#define RISCV_IOMMU_DC_TC_DTF          BIT_ULL(4)
> +#define RISCV_IOMMU_DC_TC_PDTV         BIT_ULL(5)
> +#define RISCV_IOMMU_DC_TC_PRPR         BIT_ULL(6)
> +#define RISCV_IOMMU_DC_TC_GADE         BIT_ULL(7)
> +#define RISCV_IOMMU_DC_TC_SADE         BIT_ULL(8)
> +#define RISCV_IOMMU_DC_TC_DPE          BIT_ULL(9)
> +#define RISCV_IOMMU_DC_TC_SBE          BIT_ULL(10)
> +#define RISCV_IOMMU_DC_TC_SXL          BIT_ULL(11)
> +
> +/* Second-stage (aka G-stage) context fields */
> +#define RISCV_IOMMU_DC_IOHGATP_PPN     RISCV_IOMMU_ATP_PPN_FIELD
> +#define RISCV_IOMMU_DC_IOHGATP_GSCID   GENMASK_ULL(59, 44)
> +#define RISCV_IOMMU_DC_IOHGATP_MODE    RISCV_IOMMU_ATP_MODE_FIELD
> +
> +/**
> + * enum riscv_iommu_dc_iohgatp_modes - Guest address translation/protection modes
> + * @RISCV_IOMMU_DC_IOHGATP_MODE_BARE: No translation/protection
> + * @RISCV_IOMMU_DC_IOHGATP_MODE_SV32X4: Sv32x4 (2-bit extension of Sv32), when fctl.GXL == 1
> + * @RISCV_IOMMU_DC_IOHGATP_MODE_SV39X4: Sv39x4 (2-bit extension of Sv39), when fctl.GXL == 0
> + * @RISCV_IOMMU_DC_IOHGATP_MODE_SV48X4: Sv48x4 (2-bit extension of Sv48), when fctl.GXL == 0
> + * @RISCV_IOMMU_DC_IOHGATP_MODE_SV57X4: Sv57x4 (2-bit extension of Sv57), when fctl.GXL == 0
> + */
> +enum riscv_iommu_dc_iohgatp_modes {
> +       RISCV_IOMMU_DC_IOHGATP_MODE_BARE = 0,
> +       RISCV_IOMMU_DC_IOHGATP_MODE_SV32X4 = 8,
> +       RISCV_IOMMU_DC_IOHGATP_MODE_SV39X4 = 8,
> +       RISCV_IOMMU_DC_IOHGATP_MODE_SV48X4 = 9,
> +       RISCV_IOMMU_DC_IOHGATP_MODE_SV57X4 = 10
> +};
> +
> +/* Translation attributes fields */
> +#define RISCV_IOMMU_DC_TA_PSCID                GENMASK_ULL(31, 12)
> +
> +/* First-stage context fields */
> +#define RISCV_IOMMU_DC_FSC_PPN         RISCV_IOMMU_ATP_PPN_FIELD
> +#define RISCV_IOMMU_DC_FSC_MODE                RISCV_IOMMU_ATP_MODE_FIELD
> +
> +/**
> + * enum riscv_iommu_dc_fsc_atp_modes - First stage address translation/protection modes
> + * @RISCV_IOMMU_DC_FSC_MODE_BARE: No translation/protection
> + * @RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV32: Sv32, when dc.tc.SXL == 1
> + * @RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39: Sv39, when dc.tc.SXL == 0
> + * @RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV48: Sv48, when dc.tc.SXL == 0
> + * @RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV57: Sv57, when dc.tc.SXL == 0
> + * @RISCV_IOMMU_DC_FSC_PDTP_MODE_PD8: 1lvl PDT, 8bit process ids
> + * @RISCV_IOMMU_DC_FSC_PDTP_MODE_PD17: 2lvl PDT, 17bit process ids
> + * @RISCV_IOMMU_DC_FSC_PDTP_MODE_PD20: 3lvl PDT, 20bit process ids
> + *
> + * FSC holds IOSATP when RISCV_IOMMU_DC_TC_PDTV is 0 and PDTP otherwise.
> + * IOSATP controls the first stage address translation (same as the satp register on
> + * the RISC-V MMU), and PDTP holds the process directory table, used to select a
> + * first stage page table based on a process id (for devices that support multiple
> + * process ids).
> + */
> +enum riscv_iommu_dc_fsc_atp_modes {
> +       RISCV_IOMMU_DC_FSC_MODE_BARE = 0,
> +       RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV32 = 8,
> +       RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39 = 8,
> +       RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV48 = 9,
> +       RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV57 = 10,
> +       RISCV_IOMMU_DC_FSC_PDTP_MODE_PD8 = 1,
> +       RISCV_IOMMU_DC_FSC_PDTP_MODE_PD17 = 2,
> +       RISCV_IOMMU_DC_FSC_PDTP_MODE_PD20 = 3
> +};
> +
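
For example, pointing the first stage at an Sv39 page table whose root sits at a
(hypothetical) physical address pgd_phys would compose fsc as (illustrative):

        u64 fsc = FIELD_PREP(RISCV_IOMMU_DC_FSC_MODE, RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39) |
                  FIELD_PREP(RISCV_IOMMU_DC_FSC_PPN, pgd_phys >> PAGE_SHIFT);
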
> +/* MSI page table pointer */
> +#define RISCV_IOMMU_DC_MSIPTP_PPN      RISCV_IOMMU_ATP_PPN_FIELD
> +#define RISCV_IOMMU_DC_MSIPTP_MODE     RISCV_IOMMU_ATP_MODE_FIELD
> +#define RISCV_IOMMU_DC_MSIPTP_MODE_OFF 0
> +#define RISCV_IOMMU_DC_MSIPTP_MODE_FLAT        1
> +
> +/* MSI address mask */
> +#define RISCV_IOMMU_DC_MSI_ADDR_MASK   GENMASK_ULL(51, 0)
> +
> +/* MSI address pattern */
> +#define RISCV_IOMMU_DC_MSI_PATTERN     GENMASK_ULL(51, 0)
> +
> +/**
> + * struct riscv_iommu_pc - Process Context
> + * @ta: Translation Attributes
> + * @fsc: First stage context
> + *
> + * This structure is used for leaf nodes on the Process Directory Table
> + * See section 2.3 for more details
> + */
> +struct riscv_iommu_pc {
> +       u64 ta;
> +       u64 fsc;
> +};
> +
> +/* Translation attributes fields */
> +#define RISCV_IOMMU_PC_TA_V    BIT_ULL(0)
> +#define RISCV_IOMMU_PC_TA_ENS  BIT_ULL(1)
> +#define RISCV_IOMMU_PC_TA_SUM  BIT_ULL(2)
> +#define RISCV_IOMMU_PC_TA_PSCID        GENMASK_ULL(31, 12)
> +
> +/* First stage context fields */
> +#define RISCV_IOMMU_PC_FSC_PPN RISCV_IOMMU_ATP_PPN_FIELD
> +#define RISCV_IOMMU_PC_FSC_MODE        RISCV_IOMMU_ATP_MODE_FIELD
> +
> +/*
> + * Chapter 3: In-memory queue interface
> + */
> +
> +/**
> + * struct riscv_iommu_cmd - Generic I/O MMU command structure
> + * @dword0: Includes the opcode and the function identifier
> + * @dword1: Opcode specific data
> + *
> + * The commands are interpreted as two 64bit fields, where the first
> + * 7bits of the first field are the opcode, which also defines the
> + * command's format, followed by a 3bit field that specifies the
> + * function invoked by that command; the rest is opcode-specific.
> + * This is a generic struct which will be populated differently
> + * according to each command. For more details on the commands and
> + * the command queue, see section 3.1.
> + */
> +struct riscv_iommu_command {
> +       u64 dword0;
> +       u64 dword1;
> +};
> +
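
Every command therefore starts by packing the opcode/function pair into dword0;
e.g. a minimal IOFENCE.C, using the definitions just below (illustrative):

        struct riscv_iommu_command cmd = {
                .dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_IOFENCE_OPCODE) |
                          FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_IOFENCE_FUNC_C),
                .dword1 = 0,
        };
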
> +/* Fields on dword0, common for all commands */
> +#define RISCV_IOMMU_CMD_OPCODE GENMASK_ULL(6, 0)
> +#define        RISCV_IOMMU_CMD_FUNC    GENMASK_ULL(9, 7)
> +
> +/* 3.1.1 I/O MMU Page-table cache invalidation */
> +/* Fields on dword0 */
> +#define RISCV_IOMMU_CMD_IOTINVAL_OPCODE                1
> +#define RISCV_IOMMU_CMD_IOTINVAL_FUNC_VMA      0
> +#define RISCV_IOMMU_CMD_IOTINVAL_FUNC_GVMA     1
> +#define RISCV_IOMMU_CMD_IOTINVAL_AV            BIT_ULL(10)
> +#define RISCV_IOMMU_CMD_IOTINVAL_PSCID         GENMASK_ULL(31, 12)
> +#define RISCV_IOMMU_CMD_IOTINVAL_PSCV          BIT_ULL(32)
> +#define RISCV_IOMMU_CMD_IOTINVAL_GV            BIT_ULL(33)
> +#define RISCV_IOMMU_CMD_IOTINVAL_GSCID         GENMASK_ULL(59, 44)
> +/* dword1 is the address, 4K-aligned and shifted to the right by
> + * two bits. */
> +
> +/* 3.1.2 I/O MMU Command Queue Fences */
> +/* Fields on dword0 */
> +#define RISCV_IOMMU_CMD_IOFENCE_OPCODE         2
> +#define RISCV_IOMMU_CMD_IOFENCE_FUNC_C         0
> +#define RISCV_IOMMU_CMD_IOFENCE_AV             BIT_ULL(10)
> +#define RISCV_IOMMU_CMD_IOFENCE_WSI            BIT_ULL(11)
> +#define RISCV_IOMMU_CMD_IOFENCE_PR             BIT_ULL(12)
> +#define RISCV_IOMMU_CMD_IOFENCE_PW             BIT_ULL(13)
> +#define RISCV_IOMMU_CMD_IOFENCE_DATA           GENMASK_ULL(63, 32)
> +/* dword1 is the address, word-size aligned and shifted to the
> + * right by two bits. */
> +
> +/* 3.1.3 I/O MMU Directory cache invalidation */
> +/* Fields on dword0 */
> +#define RISCV_IOMMU_CMD_IODIR_OPCODE           3
> +#define RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_DDT   0
> +#define RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_PDT   1
> +#define RISCV_IOMMU_CMD_IODIR_PID              GENMASK_ULL(31, 12)
> +#define RISCV_IOMMU_CMD_IODIR_DV               BIT_ULL(33)
> +#define RISCV_IOMMU_CMD_IODIR_DID              GENMASK_ULL(63, 40)
> +/* dword1 is reserved for standard use */
> +
> +/* 3.1.4 I/O MMU PCIe ATS */
> +/* Fields on dword0 */
> +#define RISCV_IOMMU_CMD_ATS_OPCODE             4
> +#define RISCV_IOMMU_CMD_ATS_FUNC_INVAL         0
> +#define RISCV_IOMMU_CMD_ATS_FUNC_PRGR          1
> +#define RISCV_IOMMU_CMD_ATS_PID                        GENMASK_ULL(31, 12)
> +#define RISCV_IOMMU_CMD_ATS_PV                 BIT_ULL(32)
> +#define RISCV_IOMMU_CMD_ATS_DSV                        BIT_ULL(33)
> +#define RISCV_IOMMU_CMD_ATS_RID                        GENMASK_ULL(55, 40)
> +#define RISCV_IOMMU_CMD_ATS_DSEG               GENMASK_ULL(63, 56)
> +/* dword1 is the ATS payload, two different payload types for INVAL and PRGR */
> +
> +/* ATS.INVAL payload*/
> +#define RISCV_IOMMU_CMD_ATS_INVAL_G            BIT_ULL(0)
> +/* Bits 1 - 10 are zeroed */
> +#define RISCV_IOMMU_CMD_ATS_INVAL_S            BIT_ULL(11)
> +#define RISCV_IOMMU_CMD_ATS_INVAL_UADDR                GENMASK_ULL(63, 12)
> +
> +/* ATS.PRGR payload */
> +/* Bits 0 - 31 are zeroed */
> +#define RISCV_IOMMU_CMD_ATS_PRGR_PRG_INDEX     GENMASK_ULL(40, 32)
> +/* Bits 41 - 43 are zeroed */
> +#define RISCV_IOMMU_CMD_ATS_PRGR_RESP_CODE     GENMASK_ULL(47, 44)
> +#define RISCV_IOMMU_CMD_ATS_PRGR_DST_ID                GENMASK_ULL(63, 48)
> +
> +/**
> + * struct riscv_iommu_fq_record - Fault/Event Queue Record
> + * @hdr: Header, includes fault/event cause, PID/DID, transaction type etc
> + * @_reserved: Low 32bits for custom use, high 32bits for standard use
> + * @iotval: Transaction-type/cause specific format
> + * @iotval2: Cause specific format
> + *
> + * The fault/event queue reports events and failures raised when
> + * processing transactions. Each record is a 32byte structure where
> + * the first dword has a fixed format providing generic information
> + * regarding the fault/event, and two more dwords carry
> + * fault/event-specific information. For more details see section
> + * 3.2.
> + */
> +struct riscv_iommu_fq_record {
> +       u64 hdr;
> +       u64 _reserved;
> +       u64 iotval;
> +       u64 iotval2;
> +};
> +
> +/* Fields on header */
> +#define RISCV_IOMMU_FQ_HDR_CAUSE       GENMASK_ULL(11, 0)
> +#define RISCV_IOMMU_FQ_HDR_PID         GENMASK_ULL(31, 12)
> +#define RISCV_IOMMU_FQ_HDR_PV          BIT_ULL(32)
> +#define RISCV_IOMMU_FQ_HDR_PRIV                BIT_ULL(33)
> +#define RISCV_IOMMU_FQ_HDR_TTYPE       GENMASK_ULL(39, 34)
> +#define RISCV_IOMMU_FQ_HDR_DID         GENMASK_ULL(63, 40)
> +
> +/**
> + * enum riscv_iommu_fq_causes - Fault/event cause values
> + * @RISCV_IOMMU_FQ_CAUSE_INST_FAULT: Instruction access fault
> + * @RISCV_IOMMU_FQ_CAUSE_RD_ADDR_MISALIGNED: Read address misaligned
> + * @RISCV_IOMMU_FQ_CAUSE_RD_FAULT: Read load fault
> + * @RISCV_IOMMU_FQ_CAUSE_WR_ADDR_MISALIGNED: Write/AMO address misaligned
> + * @RISCV_IOMMU_FQ_CAUSE_WR_FAULT: Write/AMO access fault
> + * @RISCV_IOMMU_FQ_CAUSE_INST_FAULT_S: Instruction page fault
> + * @RISCV_IOMMU_FQ_CAUSE_RD_FAULT_S: Read page fault
> + * @RISCV_IOMMU_FQ_CAUSE_WR_FAULT_S: Write/AMO page fault
> + * @RISCV_IOMMU_FQ_CAUSE_INST_FAULT_VS: Instruction guest page fault
> + * @RISCV_IOMMU_FQ_CAUSE_RD_FAULT_VS: Read guest page fault
> + * @RISCV_IOMMU_FQ_CAUSE_WR_FAULT_VS: Write/AMO guest page fault
> + * @RISCV_IOMMU_FQ_CAUSE_DMA_DISABLED: All inbound transactions disallowed
> + * @RISCV_IOMMU_FQ_CAUSE_DDT_LOAD_FAULT: DDT entry load access fault
> + * @RISCV_IOMMU_FQ_CAUSE_DDT_INVALID: DDT entry invalid
> + * @RISCV_IOMMU_FQ_CAUSE_DDT_MISCONFIGURED: DDT entry misconfigured
> + * @RISCV_IOMMU_FQ_CAUSE_TTYPE_BLOCKED: Transaction type disallowed
> + * @RISCV_IOMMU_FQ_CAUSE_MSI_LOAD_FAULT: MSI PTE load access fault
> + * @RISCV_IOMMU_FQ_CAUSE_MSI_INVALID: MSI PTE invalid
> + * @RISCV_IOMMU_FQ_CAUSE_MSI_MISCONFIGURED: MSI PTE misconfigured
> + * @RISCV_IOMMU_FQ_CAUSE_MRIF_FAULT: MRIF access fault
> + * @RISCV_IOMMU_FQ_CAUSE_PDT_LOAD_FAULT: PDT entry load access fault
> + * @RISCV_IOMMU_FQ_CAUSE_PDT_INVALID: PDT entry invalid
> + * @RISCV_IOMMU_FQ_CAUSE_PDT_MISCONFIGURED: PDT entry misconfigured
> + * @RISCV_IOMMU_FQ_CAUSE_DDT_CORRUPTED: DDT data corruption
> + * @RISCV_IOMMU_FQ_CAUSE_PDT_CORRUPTED: PDT data corruption
> + * @RISCV_IOMMU_FQ_CAUSE_MSI_PT_CORRUPTED: MSI page table data corruption
> + * @RISCV_IOMMU_FQ_CAUSE_MRIF_CORRUPTED: MRIF data corruption
> + * @RISCV_IOMMU_FQ_CAUSE_INTERNAL_DP_ERROR: Internal data path error
> + * @RISCV_IOMMU_FQ_CAUSE_MSI_WR_FAULT: IOMMU MSI write access fault
> + * @RISCV_IOMMU_FQ_CAUSE_PT_CORRUPTED: First/second stage page table data corruption
> + *
> + * Values are on table 11 of the spec, encodings 275 - 2047 are reserved for standard
> + * use, and 2048 - 4095 for custom use.
> + */
> +enum riscv_iommu_fq_causes {
> +       RISCV_IOMMU_FQ_CAUSE_INST_FAULT = 1,
> +       RISCV_IOMMU_FQ_CAUSE_RD_ADDR_MISALIGNED = 4,
> +       RISCV_IOMMU_FQ_CAUSE_RD_FAULT = 5,
> +       RISCV_IOMMU_FQ_CAUSE_WR_ADDR_MISALIGNED = 6,
> +       RISCV_IOMMU_FQ_CAUSE_WR_FAULT = 7,
> +       RISCV_IOMMU_FQ_CAUSE_INST_FAULT_S = 12,
> +       RISCV_IOMMU_FQ_CAUSE_RD_FAULT_S = 13,
> +       RISCV_IOMMU_FQ_CAUSE_WR_FAULT_S = 15,
> +       RISCV_IOMMU_FQ_CAUSE_INST_FAULT_VS = 20,
> +       RISCV_IOMMU_FQ_CAUSE_RD_FAULT_VS = 21,
> +       RISCV_IOMMU_FQ_CAUSE_WR_FAULT_VS = 23,
> +       RISCV_IOMMU_FQ_CAUSE_DMA_DISABLED = 256,
> +       RISCV_IOMMU_FQ_CAUSE_DDT_LOAD_FAULT = 257,
> +       RISCV_IOMMU_FQ_CAUSE_DDT_INVALID = 258,
> +       RISCV_IOMMU_FQ_CAUSE_DDT_MISCONFIGURED = 259,
> +       RISCV_IOMMU_FQ_CAUSE_TTYPE_BLOCKED = 260,
> +       RISCV_IOMMU_FQ_CAUSE_MSI_LOAD_FAULT = 261,
> +       RISCV_IOMMU_FQ_CAUSE_MSI_INVALID = 262,
> +       RISCV_IOMMU_FQ_CAUSE_MSI_MISCONFIGURED = 263,
> +       RISCV_IOMMU_FQ_CAUSE_MRIF_FAULT = 264,
> +       RISCV_IOMMU_FQ_CAUSE_PDT_LOAD_FAULT = 265,
> +       RISCV_IOMMU_FQ_CAUSE_PDT_INVALID = 266,
> +       RISCV_IOMMU_FQ_CAUSE_PDT_MISCONFIGURED = 267,
> +       RISCV_IOMMU_FQ_CAUSE_DDT_CORRUPTED = 268,
> +       RISCV_IOMMU_FQ_CAUSE_PDT_CORRUPTED = 269,
> +       RISCV_IOMMU_FQ_CAUSE_MSI_PT_CORRUPTED = 270,
> +       RISCV_IOMMU_FQ_CAUSE_MRIF_CORRUPTED = 271,
> +       RISCV_IOMMU_FQ_CAUSE_INTERNAL_DP_ERROR = 272,
> +       RISCV_IOMMU_FQ_CAUSE_MSI_WR_FAULT = 273,
> +       RISCV_IOMMU_FQ_CAUSE_PT_CORRUPTED = 274
> +};
> +
> +/**
> + * enum riscv_iommu_fq_ttypes: Fault/event transaction types
> + * @RISCV_IOMMU_FQ_TTYPE_NONE: None. Fault not caused by an inbound transaction.
> + * @RISCV_IOMMU_FQ_TTYPE_UADDR_INST_FETCH: Instruction fetch from untranslated address
> + * @RISCV_IOMMU_FQ_TTYPE_UADDR_RD: Read from untranslated address
> + * @RISCV_IOMMU_FQ_TTYPE_UADDR_WR: Write/AMO to untranslated address
> + * @RISCV_IOMMU_FQ_TTYPE_TADDR_INST_FETCH: Instruction fetch from translated address
> + * @RISCV_IOMMU_FQ_TTYPE_TADDR_RD: Read from translated address
> + * @RISCV_IOMMU_FQ_TTYPE_TADDR_WR: Write/AMO to translated address
> + * @RISCV_IOMMU_FQ_TTYPE_PCIE_ATS_REQ: PCIe ATS translation request
> + * @RISCV_IOMMU_FQ_TTYPE_PCIE_MSG_REQ: PCIe message request
> + *
> + * Values are from table 12 of the spec; type 4 and types 10 - 31 are reserved
> + * for standard use, and 31 - 63 for custom use.
> + */
> +enum riscv_iommu_fq_ttypes {
> +       RISCV_IOMMU_FQ_TTYPE_NONE = 0,
> +       RISCV_IOMMU_FQ_TTYPE_UADDR_INST_FETCH = 1,
> +       RISCV_IOMMU_FQ_TTYPE_UADDR_RD = 2,
> +       RISCV_IOMMU_FQ_TTYPE_UADDR_WR = 3,
> +       RISCV_IOMMU_FQ_TTYPE_TADDR_INST_FETCH = 5,
> +       RISCV_IOMMU_FQ_TTYPE_TADDR_RD = 6,
> +       RISCV_IOMMU_FQ_TTYPE_TADDR_WR = 7,
> +       RISCV_IOMMU_FQ_TTYPE_PCIE_ATS_REQ = 8,
> +       RISCV_IOMMU_FQ_TTYPE_PCIE_MSG_REQ = 9,
> +};
> +
> +/**
> + * struct riscv_iommu_pq_record - PCIe Page Request record
> + * @hdr: Header, includes PID, DID etc
> + * @payload: Holds the page address, request group and permission bits
> + *
> + * For more details on the PCIe Page Request queue see chapter 3.3.
> + */
> +struct riscv_iommu_pq_record {
> +       u64 hdr;
> +       u64 payload;
> +};
> +
> +/* Header fields */
> +#define RISCV_IOMMU_PREQ_HDR_PID       GENMASK_ULL(31, 12)
> +#define RISCV_IOMMU_PREQ_HDR_PV                BIT_ULL(32)
> +#define RISCV_IOMMU_PREQ_HDR_PRIV      BIT_ULL(33)
> +#define RISCV_IOMMU_PREQ_HDR_EXEC      BIT_ULL(34)
> +#define RISCV_IOMMU_PREQ_HDR_DID       GENMASK_ULL(63, 40)
> +
> +/* Payload fields */
> +#define RISCV_IOMMU_PREQ_PAYLOAD_R     BIT_ULL(0)
> +#define RISCV_IOMMU_PREQ_PAYLOAD_W     BIT_ULL(1)
> +#define RISCV_IOMMU_PREQ_PAYLOAD_L     BIT_ULL(2)
> +#define RISCV_IOMMU_PREQ_PAYLOAD_M     GENMASK_ULL(2, 0)       /* Mask of RWL for convenience */
> +#define RISCV_IOMMU_PREQ_PRG_INDEX     GENMASK_ULL(11, 3)
> +#define RISCV_IOMMU_PREQ_UADDR         GENMASK_ULL(63, 12)
> +
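
Unpacking one record then reduces to FIELD_GET() calls, e.g. (illustrative; 'req'
stands in for a pointer to a received riscv_iommu_pq_record):

        unsigned devid = FIELD_GET(RISCV_IOMMU_PREQ_HDR_DID, req->hdr);
        unsigned pasid = FIELD_GET(RISCV_IOMMU_PREQ_HDR_PID, req->hdr);
        u64 iova       = FIELD_GET(RISCV_IOMMU_PREQ_UADDR, req->payload) << PAGE_SHIFT;
        bool last      = req->payload & RISCV_IOMMU_PREQ_PAYLOAD_L; /* last in group */
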
> +/**
> + * struct riscv_iommu_msi_pte - MSI Page Table Entry
> + * @pte: MSI PTE
> + * @mrif_info: Memory-resident interrupt file info
> + *
> + * The MSI Page Table is used for virtualizing MSIs, so that when
> + * a device sends an MSI to a guest, the IOMMU can reroute it
> + * by translating the MSI address, either to a guest interrupt file
> + * or a memory resident interrupt file (MRIF). Note that this page table
> + * is an array of MSI PTEs, not a multi-level page table; each entry
> + * is a leaf entry. For more details check out the AIA spec, chapter 9.5.
> + *
> + * Also, in basic mode the mrif_info field is ignored by the IOMMU and can
> + * be used by software; any other reserved fields on pte must be zeroed out
> + * by software.
> + */
> +struct riscv_iommu_msi_pte {
> +       u64 pte;
> +       u64 mrif_info;
> +};
> +
> +/* Fields on pte */
> +#define RISCV_IOMMU_MSI_PTE_V          BIT_ULL(0)
> +#define RISCV_IOMMU_MSI_PTE_M          GENMASK_ULL(2, 1)
> +#define RISCV_IOMMU_MSI_PTE_MRIF_ADDR  GENMASK_ULL(53, 7)      /* When M == 1 (MRIF mode) */
> +#define RISCV_IOMMU_MSI_PTE_PPN                RISCV_IOMMU_PPN_FIELD   /* When M == 3 (basic mode) */
> +#define RISCV_IOMMU_MSI_PTE_C          BIT_ULL(63)
> +
> +/* Fields on mrif_info */
> +#define RISCV_IOMMU_MSI_MRIF_NID       GENMASK_ULL(9, 0)
> +#define RISCV_IOMMU_MSI_MRIF_NPPN      RISCV_IOMMU_PPN_FIELD
> +#define RISCV_IOMMU_MSI_MRIF_NID_MSB   BIT_ULL(60)
> +
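
For completeness, a valid basic-mode (M == 3) entry redirecting MSI writes to the
page at a (hypothetical) physical address msi_phys would be composed as
(illustrative):

        struct riscv_iommu_msi_pte msi_pte = {
                .pte = RISCV_IOMMU_MSI_PTE_V |
                       FIELD_PREP(RISCV_IOMMU_MSI_PTE_M, 3) | /* basic translate mode */
                       phys_to_ppn(msi_phys),
                .mrif_info = 0, /* ignored by hardware in basic mode */
        };
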
> +#endif /* _RISCV_IOMMU_BITS_H_ */
> diff --git a/drivers/iommu/riscv/iommu-pci.c b/drivers/iommu/riscv/iommu-pci.c
> new file mode 100644
> index 000000000000..c91f963d7a29
> --- /dev/null
> +++ b/drivers/iommu/riscv/iommu-pci.c
> @@ -0,0 +1,134 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +
> +/*
> + * Copyright © 2022-2023 Rivos Inc.
> + * Copyright © 2023 FORTH-ICS/CARV
> + *
> + * RISCV IOMMU as a PCIe device
> + *
> + * Authors
> + *     Tomasz Jeznach <tjeznach@rivosinc.com>
> + *     Nick Kossifidis <mick@ics.forth.gr>
> + */
> +
> +#include <linux/module.h>
> +#include <linux/kernel.h>
> +#include <linux/compiler.h>
> +#include <linux/pci.h>
> +#include <linux/init.h>
> +#include <linux/iommu.h>
> +#include <linux/bitfield.h>
> +
> +#include "iommu.h"
> +
> +/* Rivos Inc. assigned PCI Vendor and Device IDs */
> +#ifndef PCI_VENDOR_ID_RIVOS
> +#define PCI_VENDOR_ID_RIVOS             0x1efd
> +#endif
> +
> +#ifndef PCI_DEVICE_ID_RIVOS_IOMMU
> +#define PCI_DEVICE_ID_RIVOS_IOMMU       0xedf1
> +#endif
> +
> +static int riscv_iommu_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> +{
> +       struct device *dev = &pdev->dev;
> +       struct riscv_iommu_device *iommu;
> +       int ret;
> +
> +       ret = pci_enable_device_mem(pdev);
> +       if (ret < 0)
> +               return ret;
> +
> +       ret = pci_request_mem_regions(pdev, KBUILD_MODNAME);
> +       if (ret < 0)
> +               goto fail;
> +
> +       ret = -ENOMEM;
> +
> +       iommu = devm_kzalloc(dev, sizeof(*iommu), GFP_KERNEL);
> +       if (!iommu)
> +               goto fail;
> +
> +       if (!(pci_resource_flags(pdev, 0) & IORESOURCE_MEM))
> +               goto fail;
> +
> +       if (pci_resource_len(pdev, 0) < RISCV_IOMMU_REG_SIZE)
> +               goto fail;
> +
> +       iommu->reg_phys = pci_resource_start(pdev, 0);
> +       if (!iommu->reg_phys)
> +               goto fail;
> +
> +       iommu->reg = devm_ioremap(dev, iommu->reg_phys, RISCV_IOMMU_REG_SIZE);
> +       if (!iommu->reg)
> +               goto fail;
> +
> +       iommu->dev = dev;
> +       dev_set_drvdata(dev, iommu);
> +
> +       dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
> +       pci_set_master(pdev);
> +
> +       ret = riscv_iommu_init(iommu);
> +       if (!ret)
> +               return ret;
> +
> + fail:
> +       pci_clear_master(pdev);
> +       pci_release_regions(pdev);
> +       pci_disable_device(pdev);
> +       /* Note: devres_release_all() will release iommu and iommu->reg */
> +       return ret;
> +}
> +
> +static void riscv_iommu_pci_remove(struct pci_dev *pdev)
> +{
> +       riscv_iommu_remove(dev_get_drvdata(&pdev->dev));
> +       pci_clear_master(pdev);
> +       pci_release_regions(pdev);
> +       pci_disable_device(pdev);
> +}
> +
> +static int riscv_iommu_suspend(struct device *dev)
> +{
> +       dev_warn(dev, "RISC-V IOMMU PM not implemented");
> +       return -ENODEV;
> +}
> +
> +static int riscv_iommu_resume(struct device *dev)
> +{
> +       dev_warn(dev, "RISC-V IOMMU PM not implemented");
> +       return -ENODEV;
> +}
> +
> +static DEFINE_SIMPLE_DEV_PM_OPS(riscv_iommu_pm_ops, riscv_iommu_suspend,
> +                               riscv_iommu_resume);
> +
> +static const struct pci_device_id riscv_iommu_pci_tbl[] = {
> +       {PCI_VENDOR_ID_RIVOS, PCI_DEVICE_ID_RIVOS_IOMMU,
> +        PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0},
> +       {0,}
> +};
> +
> +MODULE_DEVICE_TABLE(pci, riscv_iommu_pci_tbl);
> +
> +static const struct of_device_id riscv_iommu_of_match[] = {
> +       {.compatible = "riscv,pci-iommu",},
> +       {},
> +};
> +
> +MODULE_DEVICE_TABLE(of, riscv_iommu_of_match);
> +
> +static struct pci_driver riscv_iommu_pci_driver = {
> +       .name = KBUILD_MODNAME,
> +       .id_table = riscv_iommu_pci_tbl,
> +       .probe = riscv_iommu_pci_probe,
> +       .remove = riscv_iommu_pci_remove,
> +       .driver = {
> +                  .pm = pm_sleep_ptr(&riscv_iommu_pm_ops),
> +                  .of_match_table = riscv_iommu_of_match,
> +                  },
> +};
> +
> +module_driver(riscv_iommu_pci_driver, pci_register_driver, pci_unregister_driver);
> diff --git a/drivers/iommu/riscv/iommu-platform.c b/drivers/iommu/riscv/iommu-platform.c
> new file mode 100644
> index 000000000000..e4e8ca6711e7
> --- /dev/null
> +++ b/drivers/iommu/riscv/iommu-platform.c
> @@ -0,0 +1,94 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * RISC-V IOMMU as a platform device
> + *
> + * Copyright © 2023 FORTH-ICS/CARV
> + *
> + * Author: Nick Kossifidis <mick@ics.forth.gr>
> + */
> +
> +#include <linux/module.h>
> +#include <linux/kernel.h>
> +#include <linux/of_platform.h>
> +#include <linux/bitfield.h>
> +
> +#include "iommu-bits.h"
> +#include "iommu.h"
> +
> +static int riscv_iommu_platform_probe(struct platform_device *pdev)
> +{
> +       struct device *dev = &pdev->dev;
> +       struct riscv_iommu_device *iommu = NULL;
> +       struct resource *res = NULL;
> +       int ret = 0;
> +
> +       iommu = devm_kzalloc(dev, sizeof(*iommu), GFP_KERNEL);
> +       if (!iommu)
> +               return -ENOMEM;
> +
> +       iommu->dev = dev;
> +       dev_set_drvdata(dev, iommu);
> +
> +       res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> +       if (!res) {
> +               dev_err(dev, "could not find resource for register region\n");
> +               return -EINVAL;
> +       }
> +
> +       iommu->reg = devm_platform_get_and_ioremap_resource(pdev, 0, &res);
> +       if (IS_ERR(iommu->reg)) {
> +               ret = dev_err_probe(dev, PTR_ERR(iommu->reg),
> +                                   "could not map register region\n");
> +               goto fail;
> +       }
> +
> +       iommu->reg_phys = res->start;
> +
> +       ret = -ENODEV;
> +
> +       /* Sanity check: did we get the whole register space? */
> +       if ((res->end - res->start + 1) < RISCV_IOMMU_REG_SIZE) {
> +               dev_err(dev, "device region smaller than register file (0x%llx)\n",
> +                       res->end - res->start);
> +               goto fail;
> +       }
> +
> +       dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
> +
> +       return riscv_iommu_init(iommu);
> +
> + fail:
> +       /* Note: devres_release_all() will release iommu and iommu->reg */
> +       return ret;
> +}
> +
> +static void riscv_iommu_platform_remove(struct platform_device *pdev)
> +{
> +       riscv_iommu_remove(dev_get_drvdata(&pdev->dev));
> +}
> +
> +static void riscv_iommu_platform_shutdown(struct platform_device *pdev)
> +{
> +       return;
> +}
> +
> +static const struct of_device_id riscv_iommu_of_match[] = {
> +       {.compatible = "riscv,iommu",},
> +       {},
> +};
> +
> +MODULE_DEVICE_TABLE(of, riscv_iommu_of_match);
> +
> +static struct platform_driver riscv_iommu_platform_driver = {
> +       .driver = {
> +                  .name = "riscv,iommu",
> +                  .of_match_table = riscv_iommu_of_match,
> +                  .suppress_bind_attrs = true,
> +                  },
> +       .probe = riscv_iommu_platform_probe,
> +       .remove_new = riscv_iommu_platform_remove,
> +       .shutdown = riscv_iommu_platform_shutdown,
> +};
> +
> +module_driver(riscv_iommu_platform_driver, platform_driver_register,
> +             platform_driver_unregister);
> diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
> new file mode 100644
> index 000000000000..8c236242e2cc
> --- /dev/null
> +++ b/drivers/iommu/riscv/iommu.c
> @@ -0,0 +1,660 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * IOMMU API for RISC-V architected Ziommu implementations.
> + *
> + * Copyright © 2022-2023 Rivos Inc.
> + * Copyright © 2023 FORTH-ICS/CARV
> + *
> + * Authors
> + *     Tomasz Jeznach <tjeznach@rivosinc.com>
> + *     Nick Kossifidis <mick@ics.forth.gr>
> + */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include <linux/bitfield.h>
> +#include <linux/module.h>
> +#include <linux/kernel.h>
> +#include <linux/compiler.h>
> +#include <linux/pci.h>
> +#include <linux/pci-ats.h>
> +#include <linux/init.h>
> +#include <linux/completion.h>
> +#include <linux/uaccess.h>
> +#include <linux/iommu.h>
> +#include <linux/irqdomain.h>
> +#include <linux/platform_device.h>
> +#include <linux/dma-map-ops.h>
> +#include <asm/page.h>
> +
> +#include "../dma-iommu.h"
> +#include "../iommu-sva.h"
> +#include "iommu.h"
> +
> +#include <asm/csr.h>
> +#include <asm/delay.h>
> +
> +MODULE_DESCRIPTION("IOMMU driver for RISC-V architected Ziommu implementations");
> +MODULE_AUTHOR("Tomasz Jeznach <tjeznach@rivosinc.com>");
> +MODULE_AUTHOR("Nick Kossifidis <mick@ics.forth.gr>");
> +MODULE_ALIAS("riscv-iommu");
> +MODULE_LICENSE("GPL v2");
> +
> +/* Global IOMMU params. */
> +static int ddt_mode = RISCV_IOMMU_DDTP_MODE_BARE;
> +module_param(ddt_mode, int, 0644);
> +MODULE_PARM_DESC(ddt_mode, "Device Directory Table mode.");
> +
> +/* IOMMU PSCID allocation namespace. */
> +#define RISCV_IOMMU_MAX_PSCID  (1U << 20)
> +static DEFINE_IDA(riscv_iommu_pscids);
> +
> +/* 1 second */
> +#define RISCV_IOMMU_TIMEOUT    riscv_timebase
> +
> +/* RISC-V IOMMU PPN <> PHYS address conversions, PHYS <=> PPN[53:10] */
> +#define phys_to_ppn(va)  (((va) >> 2) & (((1ULL << 44) - 1) << 10))
> +#define ppn_to_phys(pn)         (((pn) << 2) & (((1ULL << 44) - 1) << 12))
> +
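
A quick worked example of the round trip these macros perform (illustrative):

        /*
         * For the 4K-aligned PA 0x80200000:
         *   phys_to_ppn(0x80200000) == 0x80200 << 10 == 0x20080000
         *   ppn_to_phys(0x20080000) == 0x80200000
         * i.e. PA[55:12] is carried in register bits [53:10].
         */
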
> +#define iommu_domain_to_riscv(iommu_domain) \
> +    container_of(iommu_domain, struct riscv_iommu_domain, domain)
> +
> +#define iommu_device_to_riscv(iommu_device) \
> +    container_of(iommu_device, struct riscv_iommu, iommu)
> +
> +static const struct iommu_domain_ops riscv_iommu_domain_ops;
> +static const struct iommu_ops riscv_iommu_ops;
> +
> +/*
> + * Register device for IOMMU tracking.
> + */
> +static void riscv_iommu_add_device(struct riscv_iommu_device *iommu, struct device *dev)
> +{
> +       struct riscv_iommu_endpoint *ep, *rb_ep;
> +       struct rb_node **new_node, *parent_node = NULL;
> +
> +       mutex_lock(&iommu->eps_mutex);
> +
> +       ep = dev_iommu_priv_get(dev);
> +
> +       new_node = &(iommu->eps.rb_node);
> +       while (*new_node) {
> +               rb_ep = rb_entry(*new_node, struct riscv_iommu_endpoint, node);
> +               parent_node = *new_node;
> +               if (rb_ep->devid > ep->devid) {
> +                       new_node = &((*new_node)->rb_left);
> +               } else if (rb_ep->devid < ep->devid) {
> +                       new_node = &((*new_node)->rb_right);
> +               } else {
> +                       dev_warn(dev, "device %u already in the tree\n", ep->devid);
> +                       /* Bail out instead of corrupting the existing node. */
> +                       mutex_unlock(&iommu->eps_mutex);
> +                       return;
> +               }
> +       }
> +
> +       rb_link_node(&ep->node, parent_node, new_node);
> +       rb_insert_color(&ep->node, &iommu->eps);
> +
> +       mutex_unlock(&iommu->eps_mutex);
> +}
> +
> +/*
> + * Endpoint management
> + */
> +
> +static int riscv_iommu_of_xlate(struct device *dev, struct of_phandle_args *args)
> +{
> +       return iommu_fwspec_add_ids(dev, args->args, 1);
> +}
> +
> +static bool riscv_iommu_capable(struct device *dev, enum iommu_cap cap)
> +{
> +       switch (cap) {
> +       case IOMMU_CAP_CACHE_COHERENCY:
> +       case IOMMU_CAP_PRE_BOOT_PROTECTION:
> +               return true;
> +
> +       default:
> +               break;
> +       }
> +
> +       return false;
> +}
> +
> +static struct iommu_device *riscv_iommu_probe_device(struct device *dev)
> +{
> +       struct riscv_iommu_device *iommu;
> +       struct riscv_iommu_endpoint *ep;
> +       struct iommu_fwspec *fwspec;
> +
> +       fwspec = dev_iommu_fwspec_get(dev);
> +       if (!fwspec || fwspec->ops != &riscv_iommu_ops ||
> +           !fwspec->iommu_fwnode || !fwspec->iommu_fwnode->dev)
> +               return ERR_PTR(-ENODEV);
> +
> +       iommu = dev_get_drvdata(fwspec->iommu_fwnode->dev);
> +       if (!iommu)
> +               return ERR_PTR(-ENODEV);
> +
> +       if (dev_iommu_priv_get(dev))
> +               return &iommu->iommu;
> +
> +       ep = kzalloc(sizeof(*ep), GFP_KERNEL);
> +       if (!ep)
> +               return ERR_PTR(-ENOMEM);
> +
> +       mutex_init(&ep->lock);
> +       INIT_LIST_HEAD(&ep->domain);
> +
> +       if (dev_is_pci(dev)) {
> +               ep->devid = pci_dev_id(to_pci_dev(dev));
> +               ep->domid = pci_domain_nr(to_pci_dev(dev)->bus);
> +       } else {
> +               /* TODO: Make this generic, for now hardcode domain id to 0 */
> +               ep->devid = fwspec->ids[0];
> +               ep->domid = 0;
> +       }
> +
> +       ep->iommu = iommu;
> +       ep->dev = dev;
> +
> +       dev_info(iommu->dev, "adding device to iommu with devid %i in domain %i\n",
> +               ep->devid, ep->domid);
> +
> +       dev_iommu_priv_set(dev, ep);
> +       riscv_iommu_add_device(iommu, dev);
> +
> +       return &iommu->iommu;
> +}
> +
> +static void riscv_iommu_probe_finalize(struct device *dev)
> +{
> +       set_dma_ops(dev, NULL);
> +       iommu_setup_dma_ops(dev, 0, U64_MAX);
> +}
> +
> +static void riscv_iommu_release_device(struct device *dev)
> +{
> +       struct riscv_iommu_endpoint *ep = dev_iommu_priv_get(dev);
> +       struct riscv_iommu_device *iommu = ep->iommu;
> +
> +       dev_info(dev, "device with devid %i released\n", ep->devid);
> +
> +       mutex_lock(&ep->lock);
> +       list_del(&ep->domain);
> +       mutex_unlock(&ep->lock);
> +
> +       /* Remove endpoint from IOMMU tracking structures */
> +       mutex_lock(&iommu->eps_mutex);
> +       rb_erase(&ep->node, &iommu->eps);
> +       mutex_unlock(&iommu->eps_mutex);
> +
> +       set_dma_ops(dev, NULL);
> +       dev_iommu_priv_set(dev, NULL);
> +
> +       kfree(ep);
> +}
> +
> +static struct iommu_group *riscv_iommu_device_group(struct device *dev)
> +{
> +       if (dev_is_pci(dev))
> +               return pci_device_group(dev);
> +       return generic_device_group(dev);
> +}
> +
> +static void riscv_iommu_get_resv_regions(struct device *dev, struct list_head *head)
> +{
> +       iommu_dma_get_resv_regions(dev, head);
> +}
> +
> +/*
> + * Domain management
> + */
> +
> +static struct iommu_domain *riscv_iommu_domain_alloc(unsigned type)
> +{
> +       struct riscv_iommu_domain *domain;
> +
> +       if (type != IOMMU_DOMAIN_IDENTITY &&
> +           type != IOMMU_DOMAIN_BLOCKED)
> +               return NULL;
> +
> +       domain = kzalloc(sizeof(*domain), GFP_KERNEL);
> +       if (!domain)
> +               return NULL;
> +
> +       mutex_init(&domain->lock);
> +       INIT_LIST_HEAD(&domain->endpoints);
> +
> +       domain->domain.ops = &riscv_iommu_domain_ops;
> +       domain->mode = RISCV_IOMMU_DC_FSC_MODE_BARE;
> +       domain->pscid = ida_alloc_range(&riscv_iommu_pscids, 1,
> +                                       RISCV_IOMMU_MAX_PSCID, GFP_KERNEL);
> +
> +       printk("domain type %x alloc %u\n", type, domain->pscid);
> +
> +       return &domain->domain;
> +}
> +
> +static void riscv_iommu_domain_free(struct iommu_domain *iommu_domain)
> +{
> +       struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
> +
> +       if (!list_empty(&domain->endpoints)) {
> +               pr_warn("IOMMU domain is not empty!\n");
> +       }
> +
> +       if (domain->pgd_root)
> +               free_pages((unsigned long)domain->pgd_root, 0);
> +
> +       if ((int)domain->pscid > 0)
> +               ida_free(&riscv_iommu_pscids, domain->pscid);
> +
> +       printk("domain free %u\n", domain->pscid);
> +
> +       kfree(domain);
> +}
> +
> +static int riscv_iommu_domain_finalize(struct riscv_iommu_domain *domain,
> +                                      struct riscv_iommu_device *iommu)
> +{
> +       struct iommu_domain_geometry *geometry;
> +
> +       /* Domain assigned to another iommu */
> +       if (domain->iommu && domain->iommu != iommu)
> +               return -EINVAL;
> +       /* Domain already initialized */
> +       else if (domain->iommu)
> +               return 0;
> +
> +       /*
> +        * TODO: Before using VA_BITS and satp_mode here, verify they
> +        * are supported by the iommu, through the capabilities register.
> +        */
> +
> +       geometry = &domain->domain.geometry;
> +
> +       /*
> +        * Note: RISC-V Privileged spec mandates that virtual addresses
> +        * need to be sign-extended, so if (VA_BITS - 1) is set, all
> +        * bits >= VA_BITS need to also be set or else we'll get a
> +        * page fault. However the code that creates the mappings
> +        * above us (e.g. iommu_dma_alloc_iova()) won't do that for us
> +        * for now, so we'll end up with invalid virtual addresses
> +        * to map. As a workaround until we get this sorted out
> +        * limit the available virtual addresses to VA_BITS - 1.
> +        */
> +       geometry->aperture_start = 0;
> +       geometry->aperture_end = DMA_BIT_MASK(VA_BITS - 1);
> +       geometry->force_aperture = true;
> +
> +       domain->iommu = iommu;
> +
> +       if (domain->domain.type == IOMMU_DOMAIN_IDENTITY)
> +               return 0;
> +
> +       /* TODO: Fix this for RV32 */
> +       domain->mode = satp_mode >> 60;
> +       domain->pgd_root = (pgd_t *) __get_free_pages(GFP_KERNEL | __GFP_ZERO, 0);
> +
> +       if (!domain->pgd_root)
> +               return -ENOMEM;
> +
> +       return 0;
> +}
> +
> +static int riscv_iommu_attach_dev(struct iommu_domain *iommu_domain, struct device *dev)
> +{
> +       struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
> +       struct riscv_iommu_endpoint *ep = dev_iommu_priv_get(dev);
> +       int ret;
> +
> +       /* PSCID not valid */
> +       if ((int)domain->pscid < 0)
> +               return -ENOMEM;
> +
> +       mutex_lock(&domain->lock);
> +       mutex_lock(&ep->lock);
> +
> +       if (!list_empty(&ep->domain)) {
> +               dev_warn(dev, "endpoint already attached to a domain. dropping\n");
> +               list_del_init(&ep->domain);
> +       }
> +
> +       /* allocate root pages, initialize io-pgtable ops, etc. */
> +       ret = riscv_iommu_domain_finalize(domain, ep->iommu);
> +       if (ret < 0) {
> +               dev_err(dev, "can not finalize domain: %d\n", ret);
> +               mutex_unlock(&ep->lock);
> +               mutex_unlock(&domain->lock);
> +               return ret;
> +       }
> +
> +       if (ep->iommu->ddt_mode != RISCV_IOMMU_DDTP_MODE_BARE ||
> +           domain->domain.type != IOMMU_DOMAIN_IDENTITY) {
> +               dev_warn(dev, "domain type %d not supported\n",
> +                   domain->domain.type);
> +               mutex_unlock(&ep->lock);
> +               mutex_unlock(&domain->lock);
> +               return -ENODEV;
> +       }
> +
> +       list_add_tail(&ep->domain, &domain->endpoints);
> +       mutex_unlock(&ep->lock);
> +       mutex_unlock(&domain->lock);
> +
> +       dev_info(dev, "domain type %d attached w/ PSCID %u\n",
> +           domain->domain.type, domain->pscid);
> +
> +       return 0;
> +}
> +
> +static void riscv_iommu_flush_iotlb_range(struct iommu_domain *iommu_domain,
> +                                         unsigned long *start, unsigned long *end,
> +                                         size_t *pgsize)
> +{
> +       /* Command interface not implemented */
> +}
> +
> +static void riscv_iommu_flush_iotlb_all(struct iommu_domain *iommu_domain)
> +{
> +       riscv_iommu_flush_iotlb_range(iommu_domain, NULL, NULL, NULL);
> +}
> +
> +static void riscv_iommu_iotlb_sync(struct iommu_domain *iommu_domain,
> +                                  struct iommu_iotlb_gather *gather)
> +{
> +       riscv_iommu_flush_iotlb_range(iommu_domain, &gather->start, &gather->end,
> +                                     &gather->pgsize);
> +}
> +
> +static void riscv_iommu_iotlb_sync_map(struct iommu_domain *iommu_domain,
> +                                      unsigned long iova, size_t size)
> +{
> +       unsigned long end = iova + size - 1;
> +       /*
> +        * Given we don't know the page size used by this range, we assume the
> +        * smallest page size to ensure all possible entries are flushed from
> +        * the IOATC.
> +        */
> +       size_t pgsize = PAGE_SIZE;
> +       riscv_iommu_flush_iotlb_range(iommu_domain, &iova, &end, &pgsize);
> +}
> +
> +static int riscv_iommu_map_pages(struct iommu_domain *iommu_domain,
> +                                unsigned long iova, phys_addr_t phys,
> +                                size_t pgsize, size_t pgcount, int prot,
> +                                gfp_t gfp, size_t *mapped)
> +{
> +       struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
> +
> +       if (domain->domain.type == IOMMU_DOMAIN_IDENTITY) {
> +               *mapped = pgsize * pgcount;
> +               return 0;
> +       }
> +
> +       return -ENODEV;
> +}
> +
> +static size_t riscv_iommu_unmap_pages(struct iommu_domain *iommu_domain,
> +                                     unsigned long iova, size_t pgsize,
> +                                     size_t pgcount, struct iommu_iotlb_gather *gather)
> +{
> +       struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
> +
> +       if (domain->domain.type == IOMMU_DOMAIN_IDENTITY)
> +               return pgsize * pgcount;
> +
> +       return 0;
> +}
> +
> +static phys_addr_t riscv_iommu_iova_to_phys(struct iommu_domain *iommu_domain,
> +                                           dma_addr_t iova)
> +{
> +       struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
> +
> +       if (domain->domain.type == IOMMU_DOMAIN_IDENTITY)
> +               return (phys_addr_t) iova;
> +
> +       return 0;
> +}
> +
> +/*
> + * Translation mode setup
> + */
> +
> +static u64 riscv_iommu_get_ddtp(struct riscv_iommu_device *iommu)
> +{
> +       u64 ddtp;
> +       cycles_t end_cycles = RISCV_IOMMU_TIMEOUT + get_cycles();
> +
> +       /* Wait for DDTP.BUSY to be cleared and return latest value */
> +       do {
> +               ddtp = riscv_iommu_readq(iommu, RISCV_IOMMU_REG_DDTP);
> +               if (!(ddtp & RISCV_IOMMU_DDTP_BUSY))
> +                       break;
> +               cpu_relax();
> +       } while (get_cycles() < end_cycles);
> +
> +       return ddtp;
> +}
> +
> +static void riscv_iommu_ddt_cleanup(struct riscv_iommu_device *iommu)
> +{
> +       /* TODO: teardown whole device directory tree. */
> +       if (iommu->ddtp) {
> +               if (iommu->ddtp_in_iomem)
> +                       iounmap((void __iomem *)iommu->ddtp);
> +               else
> +                       free_page(iommu->ddtp);
> +               iommu->ddtp = 0;
> +       }
> +}
> +
> +static int riscv_iommu_enable(struct riscv_iommu_device *iommu, unsigned requested_mode)
> +{
> +       struct device *dev = iommu->dev;
> +       u64 ddtp = 0;
> +       u64 ddtp_paddr = 0;
> +       unsigned mode = requested_mode;
> +       unsigned mode_readback = 0;
> +
> +       ddtp = riscv_iommu_get_ddtp(iommu);
> +       if (ddtp & RISCV_IOMMU_DDTP_BUSY)
> +               return -EBUSY;
> +
> +       /* Disallow state transition from xLVL to xLVL. */
> +       switch (FIELD_GET(RISCV_IOMMU_DDTP_MODE, ddtp)) {
> +       case RISCV_IOMMU_DDTP_MODE_BARE:
> +       case RISCV_IOMMU_DDTP_MODE_OFF:
> +               break;
> +       default:
> +               if ((mode != RISCV_IOMMU_DDTP_MODE_BARE)
> +                   && (mode != RISCV_IOMMU_DDTP_MODE_OFF))
> +                       return -EINVAL;
> +               break;
> +       }
> +
> + retry:
> +       switch (mode) {
> +       case RISCV_IOMMU_DDTP_MODE_BARE:
> +       case RISCV_IOMMU_DDTP_MODE_OFF:
> +               riscv_iommu_ddt_cleanup(iommu);
> +               ddtp = FIELD_PREP(RISCV_IOMMU_DDTP_MODE, mode);
> +               break;
> +       case RISCV_IOMMU_DDTP_MODE_1LVL:
> +       case RISCV_IOMMU_DDTP_MODE_2LVL:
> +       case RISCV_IOMMU_DDTP_MODE_3LVL:
> +               if (!iommu->ddtp) {
> +                       /*
> +                        * We haven't initialized ddtp yet, since it's WARL make
> +                        * sure that we don't have a hardwired PPN field there
> +                        * that points to i/o memory instead.
> +                        */
> +                       riscv_iommu_writeq(iommu, RISCV_IOMMU_REG_DDTP, 0);
> +                       ddtp = riscv_iommu_get_ddtp(iommu);
> +                       ddtp_paddr = ppn_to_phys(ddtp);
> +                       if (ddtp_paddr) {
> +                               dev_warn(dev, "ddtp at 0x%llx\n", ddtp_paddr);
> +                               iommu->ddtp =
> +                                   (unsigned long)ioremap(ddtp_paddr, PAGE_SIZE);
> +                               iommu->ddtp_in_iomem = true;
> +                       } else {
> +                               iommu->ddtp = get_zeroed_page(GFP_KERNEL);
> +                       }
> +               }
> +               if (!iommu->ddtp)
> +                       return -ENOMEM;
> +
> +               ddtp = FIELD_PREP(RISCV_IOMMU_DDTP_MODE, mode) |
> +                   phys_to_ppn(__pa(iommu->ddtp));
> +
> +               break;
> +       default:
> +               return -EINVAL;
> +       }
> +
> +       riscv_iommu_writeq(iommu, RISCV_IOMMU_REG_DDTP, ddtp);
> +       ddtp = riscv_iommu_get_ddtp(iommu);
> +       if (ddtp & RISCV_IOMMU_DDTP_BUSY) {
> +               dev_warn(dev, "timeout when setting ddtp (ddt mode: %i)\n", mode);
> +               return -EBUSY;
> +       }
> +
> +       mode_readback = FIELD_GET(RISCV_IOMMU_DDTP_MODE, ddtp);
> +       dev_info(dev, "mode_readback: %i, mode: %i\n", mode_readback, mode);
> +       if (mode_readback != mode) {
> +               /*
> +                * Mode field is WARL, an I/O MMU may support a subset of
> +                * directory table levels in which case if we tried to set
> +                * an unsupported number of levels we'll readback either
> +                * a valid xLVL or off/bare. If we got off/bare, try again
> +                * with a smaller xLVL.
> +                */
> +               if (mode_readback < RISCV_IOMMU_DDTP_MODE_1LVL &&
> +                   mode > RISCV_IOMMU_DDTP_MODE_1LVL) {
> +                       mode--;
> +                       goto retry;
> +               }
> +
> +               /*
> +                * We tried all supported xLVL modes and still got off/bare instead,
> +                * an I/O MMU must support at least one supported xLVL mode so something
> +                * went very wrong.
> +                */
> +               if (mode_readback < RISCV_IOMMU_DDTP_MODE_1LVL &&
> +                   mode == RISCV_IOMMU_DDTP_MODE_1LVL)
> +                       goto fail;
> +
> +               /*
> +                * We tried setting off or bare and got something else back, something
> +                * went very wrong since off/bare is always legal.
> +                */
> +               if (mode < RISCV_IOMMU_DDTP_MODE_1LVL)
> +                       goto fail;
> +
> +               /*
> +                * We tried setting an xLVL mode but got another xLVL mode that
> +                * we don't support (e.g. a custom one).
> +                */
> +               if (mode_readback > RISCV_IOMMU_DDTP_MODE_MAX)
> +                       goto fail;
> +
> +               /* We tried setting an xLVL mode but got another supported xLVL mode */
> +               mode = mode_readback;
> +       }
> +
> +       if (mode != requested_mode)
> +               dev_warn(dev, "unsupported DDT mode requested (%i), using %i instead\n",
> +                        requested_mode, mode);
> +
> +       iommu->ddt_mode = mode;
> +       dev_info(dev, "ddt_mode: %i\n", iommu->ddt_mode);
> +       return 0;
> +
> + fail:
> +       dev_err(dev, "failed to set DDT mode, tried: %i and got %i\n", mode,
> +               mode_readback);
> +       riscv_iommu_ddt_cleanup(iommu);
> +       return -EINVAL;
> +}
> +
> +/*
> + * Common I/O MMU driver probe/teardown
> + */
> +
> +static const struct iommu_domain_ops riscv_iommu_domain_ops = {
> +       .free = riscv_iommu_domain_free,
> +       .attach_dev = riscv_iommu_attach_dev,

Could you explain why there is no implementation of '.detach_dev'? It
seems that we need to clear some state when a device is detached, such
as the device context.

> +       .map_pages = riscv_iommu_map_pages,
> +       .unmap_pages = riscv_iommu_unmap_pages,
> +       .iova_to_phys = riscv_iommu_iova_to_phys,
> +       .iotlb_sync = riscv_iommu_iotlb_sync,
> +       .iotlb_sync_map = riscv_iommu_iotlb_sync_map,
> +       .flush_iotlb_all = riscv_iommu_flush_iotlb_all,
> +};
> +
> +static const struct iommu_ops riscv_iommu_ops = {
> +       .owner = THIS_MODULE,
> +       .pgsize_bitmap = SZ_4K | SZ_2M | SZ_1G,
> +       .capable = riscv_iommu_capable,
> +       .domain_alloc = riscv_iommu_domain_alloc,
> +       .probe_device = riscv_iommu_probe_device,
> +       .probe_finalize = riscv_iommu_probe_finalize,
> +       .release_device = riscv_iommu_release_device,
> +       .device_group = riscv_iommu_device_group,
> +       .get_resv_regions = riscv_iommu_get_resv_regions,
> +       .of_xlate = riscv_iommu_of_xlate,
> +       .default_domain_ops = &riscv_iommu_domain_ops,
> +};
> +
> +void riscv_iommu_remove(struct riscv_iommu_device *iommu)
> +{
> +       iommu_device_unregister(&iommu->iommu);
> +       riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_OFF);
> +}
> +
> +int riscv_iommu_init(struct riscv_iommu_device *iommu)
> +{
> +       struct device *dev = iommu->dev;
> +       u32 fctl = 0;
> +       int ret;
> +
> +       iommu->eps = RB_ROOT;
> +
> +       fctl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_FCTL);
> +
> +#ifdef CONFIG_CPU_BIG_ENDIAN
> +       if (!(iommu->cap & RISCV_IOMMU_CAP_END)) {
> +               dev_err(dev, "IOMMU doesn't support Big Endian\n");
> +               return -EIO;
> +       } else if (!(fctl & RISCV_IOMMU_FCTL_BE)) {
> +               fctl |= FIELD_PREP(RISCV_IOMMU_FCTL_BE, 1);
> +               riscv_iommu_writel(iommu, RISCV_IOMMU_REG_FCTL, fctl);
> +       }
> +#endif
> +
> +       /* Clear any pending interrupt flag. */
> +       riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IPSR,
> +                          RISCV_IOMMU_IPSR_CIP |
> +                          RISCV_IOMMU_IPSR_FIP |
> +                          RISCV_IOMMU_IPSR_PMIP | RISCV_IOMMU_IPSR_PIP);
> +       spin_lock_init(&iommu->cq_lock);
> +       mutex_init(&iommu->eps_mutex);
> +
> +       ret = riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_BARE);
> +
> +       if (ret) {
> +               dev_err(dev, "cannot enable iommu device (%d)\n", ret);
> +               goto fail;
> +       }
> +
> +       ret = iommu_device_register(&iommu->iommu, &riscv_iommu_ops, dev);
> +       if (ret) {
> +               dev_err(dev, "cannot register iommu interface (%d)\n", ret);
> +               iommu_device_sysfs_remove(&iommu->iommu);
> +               goto fail;
> +       }
> +
> +       return 0;
> + fail:
> +       riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_OFF);
> +       return ret;
> +}
> diff --git a/drivers/iommu/riscv/iommu.h b/drivers/iommu/riscv/iommu.h
> new file mode 100644
> index 000000000000..7baefd3630b3
> --- /dev/null
> +++ b/drivers/iommu/riscv/iommu.h
> @@ -0,0 +1,115 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright © 2022-2023 Rivos Inc.
> + * Copyright © 2023 FORTH-ICS/CARV
> + *
> + * RISC-V Ziommu - IOMMU Interface Specification.
> + *
> + * Authors
> + *     Tomasz Jeznach <tjeznach@rivosinc.com>
> + *     Nick Kossifidis <mick@ics.forth.gr>
> + */
> +
> +#ifndef _RISCV_IOMMU_H_
> +#define _RISCV_IOMMU_H_
> +
> +#include <linux/types.h>
> +#include <linux/iova.h>
> +#include <linux/io.h>
> +#include <linux/idr.h>
> +#include <linux/list.h>
> +#include <linux/iommu.h>
> +#include <linux/io-pgtable.h>
> +
> +#include "iommu-bits.h"
> +
> +#define IOMMU_PAGE_SIZE_4K     BIT_ULL(12)
> +#define IOMMU_PAGE_SIZE_2M     BIT_ULL(21)
> +#define IOMMU_PAGE_SIZE_1G     BIT_ULL(30)
> +#define IOMMU_PAGE_SIZE_512G   BIT_ULL(39)
> +
> +struct riscv_iommu_device {
> +       struct iommu_device iommu;      /* iommu core interface */
> +       struct device *dev;             /* iommu hardware */
> +
> +       /* hardware control register space */
> +       void __iomem *reg;
> +       resource_size_t reg_phys;
> +
> +       /* IRQs for the various queues */
> +       int irq_cmdq;
> +       int irq_fltq;
> +       int irq_pm;
> +       int irq_priq;
> +
> +       /* supported and enabled hardware capabilities */
> +       u64 cap;
> +
> +       /* global lock, to be removed */
> +       spinlock_t cq_lock;
> +
> +       /* device directory table root pointer and mode */
> +       unsigned long ddtp;
> +       unsigned ddt_mode;
> +       bool ddtp_in_iomem;
> +
> +       /* Connected end-points */
> +       struct rb_root eps;
> +       struct mutex eps_mutex;
> +};
> +
> +struct riscv_iommu_domain {
> +       struct iommu_domain domain;
> +
> +       struct list_head endpoints;
> +       struct mutex lock;
> +       struct riscv_iommu_device *iommu;
> +
> +       unsigned mode;          /* RISCV_IOMMU_DC_FSC_MODE_* */
> +       unsigned pscid;         /* RISC-V IOMMU PSCID */
> +
> +       pgd_t *pgd_root;        /* page table root pointer */
> +};
> +
> +/* Private dev_iommu_priv object, device-domain relationship. */
> +struct riscv_iommu_endpoint {
> +       struct device *dev;                     /* platform or PCI endpoint device */
> +       unsigned devid;                         /* PCI bus:device:function number */
> +       unsigned domid;                         /* PCI domain number, segment */
> +       struct rb_node node;                    /* device tracking node (lookup by devid) */
> +       struct riscv_iommu_device *iommu;       /* parent iommu device */
> +
> +       struct mutex lock;
> +       struct list_head domain;                /* endpoint attached managed domain */
> +};
> +
> +/* Helper functions and macros */
> +
> +static inline u32 riscv_iommu_readl(struct riscv_iommu_device *iommu,
> +                                   unsigned offset)
> +{
> +       return readl_relaxed(iommu->reg + offset);
> +}
> +
> +static inline void riscv_iommu_writel(struct riscv_iommu_device *iommu,
> +                                     unsigned offset, u32 val)
> +{
> +       writel_relaxed(val, iommu->reg + offset);
> +}
> +
> +static inline u64 riscv_iommu_readq(struct riscv_iommu_device *iommu,
> +                                   unsigned offset)
> +{
> +       return readq_relaxed(iommu->reg + offset);
> +}
> +
> +static inline void riscv_iommu_writeq(struct riscv_iommu_device *iommu,
> +                                     unsigned offset, u64 val)
> +{
> +       writeq_relaxed(val, iommu->reg + offset);
> +}
> +
> +int riscv_iommu_init(struct riscv_iommu_device *iommu);
> +void riscv_iommu_remove(struct riscv_iommu_device *iommu);
> +
> +#endif
> --
> 2.34.1
>
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 03/11] dt-bindings: Add RISC-V IOMMU bindings
  2023-07-27  2:42                   ` Zong Li
@ 2023-08-09 14:57                     ` Jason Gunthorpe
  2023-08-15  1:28                       ` Zong Li
  0 siblings, 1 reply; 86+ messages in thread
From: Jason Gunthorpe @ 2023-08-09 14:57 UTC (permalink / raw)
  To: Zong Li
  Cc: Baolu Lu, Anup Patel, Tomasz Jeznach, Joerg Roedel, Will Deacon,
	Robin Murphy, Paul Walmsley, Albert Ou, linux, linux-kernel,
	Sebastien Boeuf, iommu, Palmer Dabbelt, Nick Kossifidis,
	linux-riscv

On Thu, Jul 27, 2023 at 10:42:47AM +0800, Zong Li wrote:

> Perhaps this question could be related to the scenarios in which
> devices wish to be in bypass mode when the IOMMU is in translation
> mode, and why IOMMU defines/supports this case. Currently, I could
> envision a scenario where a device is already connected to the IOMMU
> in hardware, but it is not functioning correctly, or there are
> performance impacts. If modifying the hardware is not feasible, a
> default configuration that allows bypass mode could be provided as a
> solution. There might be other scenarios that I might have overlooked.
> It seems to me since IOMMU supports this configuration, it would be
> advantageous to have an approach to achieve it, and DT might be a
> flexible way.

So far we've taken the approach that broken hardware is quirked in the
kernel by matching OF compatible string patterns. This is HW that is
completely broken and the IOMMU doesn't work at all for it.

HW that is slow or whatever is not quirked and this is an admin policy
choice where the system should land on the security/performance
spectrum.

So I'm not sure adding DT makes sense here.

Jason

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 03/11] dt-bindings: Add RISC-V IOMMU bindings
  2023-08-09 14:57                     ` Jason Gunthorpe
@ 2023-08-15  1:28                       ` Zong Li
  2023-08-15 18:38                         ` Jason Gunthorpe
  0 siblings, 1 reply; 86+ messages in thread
From: Zong Li @ 2023-08-15  1:28 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Baolu Lu, Anup Patel, Tomasz Jeznach, Joerg Roedel, Will Deacon,
	Robin Murphy, Paul Walmsley, Albert Ou, linux, linux-kernel,
	Sebastien Boeuf, iommu, Palmer Dabbelt, Nick Kossifidis,
	linux-riscv

On Wed, Aug 9, 2023 at 10:57 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Thu, Jul 27, 2023 at 10:42:47AM +0800, Zong Li wrote:
>
> > Perhaps this question could be related to the scenarios in which
> > devices wish to be in bypass mode when the IOMMU is in translation
> > mode, and why IOMMU defines/supports this case. Currently, I could
> > envision a scenario where a device is already connected to the IOMMU
> > in hardware, but it is not functioning correctly, or there are
> > performance impacts. If modifying the hardware is not feasible, a
> > default configuration that allows bypass mode could be provided as a
> > solution. There might be other scenarios that I might have overlooked.
> > It seems to me since IOMMU supports this configuration, it would be
> > advantageous to have an approach to achieve it, and DT might be a
> > flexible way.
>
> So far we've taken the approach that broken hardware is quirked in the
> kernel by matching OF compatible string patterns. This is HW that is
> completely broken and the IOMMU doesn't work at all for it.
>
> HW that is slow or whatever is not quirked and this is an admin policy
> choice where the system should land on the security/performance
> spectrum.
>
> So I'm not sure adding DT makes sense here.
>

Hi Jason,
Sorry for being late here, I hadn't noticed this reply earlier. The
approach seems to address the situation. Could you kindly provide
information about the location of the patches? I was wondering about
further details regarding this particular implementation. Thanks

> Jason

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 03/11] dt-bindings: Add RISC-V IOMMU bindings
  2023-08-15  1:28                       ` Zong Li
@ 2023-08-15 18:38                         ` Jason Gunthorpe
  2023-08-16  2:16                           ` Zong Li
  0 siblings, 1 reply; 86+ messages in thread
From: Jason Gunthorpe @ 2023-08-15 18:38 UTC (permalink / raw)
  To: Zong Li
  Cc: Baolu Lu, Anup Patel, Tomasz Jeznach, Joerg Roedel, Will Deacon,
	Robin Murphy, Paul Walmsley, Albert Ou, linux, linux-kernel,
	Sebastien Boeuf, iommu, Palmer Dabbelt, Nick Kossifidis,
	linux-riscv

On Tue, Aug 15, 2023 at 09:28:54AM +0800, Zong Li wrote:
> On Wed, Aug 9, 2023 at 10:57 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Thu, Jul 27, 2023 at 10:42:47AM +0800, Zong Li wrote:
> >
> > > Perhaps this question could be related to the scenarios in which
> > > devices wish to be in bypass mode when the IOMMU is in translation
> > > mode, and why IOMMU defines/supports this case. Currently, I could
> > > envision a scenario where a device is already connected to the IOMMU
> > > in hardware, but it is not functioning correctly, or there are
> > > performance impacts. If modifying the hardware is not feasible, a
> > > default configuration that allows bypass mode could be provided as a
> > > solution. There might be other scenarios that I might have overlooked.
> > > It seems to me since IOMMU supports this configuration, it would be
> > > advantageous to have an approach to achieve it, and DT might be a
> > > flexible way.
> >
> > So far we've taken the approach that broken hardware is quirked in the
> > kernel by matching OF compatible string patterns. This is HW that is
> > completely broken and the IOMMU doesn't work at all for it.
> >
> > HW that is slow or whatever is not quirked and this is an admin policy
> > choice where the system should land on the security/performance
> > spectrum.
> >
> > So I'm not sure adding DT makes sense here.
> >
> 
> Hi Jason,
> Sorry for being late here, I hadn't noticed this reply earlier. The
> approach seems to address the situation. Could you kindly provide
> information about the location of the patches? I was wondering about
> further details regarding this particular implementation. Thanks

There are a couple of versions, e.g.
 arm_smmu_def_domain_type()
 qcom_smmu_def_domain_type()
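
Both are small .def_domain_type hooks. From memory (the client match
list is trimmed here, so treat this as a sketch rather than the exact
upstream code), the qcom variant is roughly:

	static const struct of_device_id qcom_smmu_client_of_match[] = {
		{ .compatible = "qcom,adreno" },
		/* ... more known client devices ... */
		{ }
	};

	static int qcom_smmu_def_domain_type(struct device *dev)
	{
		const struct of_device_id *match =
			of_match_device(qcom_smmu_client_of_match, dev);

		/* Known client device: force an identity default domain. */
		return match ? IOMMU_DOMAIN_IDENTITY : 0;
	}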

Jason

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 03/11] dt-bindings: Add RISC-V IOMMU bindings
  2023-08-15 18:38                         ` Jason Gunthorpe
@ 2023-08-16  2:16                           ` Zong Li
  2023-08-16  4:10                             ` Baolu Lu
  0 siblings, 1 reply; 86+ messages in thread
From: Zong Li @ 2023-08-16  2:16 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Baolu Lu, Anup Patel, Tomasz Jeznach, Joerg Roedel, Will Deacon,
	Robin Murphy, Paul Walmsley, Albert Ou, linux, linux-kernel,
	Sebastien Boeuf, iommu, Palmer Dabbelt, Nick Kossifidis,
	linux-riscv

On Wed, Aug 16, 2023 at 2:38 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Tue, Aug 15, 2023 at 09:28:54AM +0800, Zong Li wrote:
> > On Wed, Aug 9, 2023 at 10:57 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Thu, Jul 27, 2023 at 10:42:47AM +0800, Zong Li wrote:
> > >
> > > > Perhaps this question could be related to the scenarios in which
> > > > devices wish to be in bypass mode when the IOMMU is in translation
> > > > mode, and why IOMMU defines/supports this case. Currently, I could
> > > > envision a scenario where a device is already connected to the IOMMU
> > > > in hardware, but it is not functioning correctly, or there are
> > > > performance impacts. If modifying the hardware is not feasible, a
> > > > default configuration that allows bypass mode could be provided as a
> > > > solution. There might be other scenarios that I might have overlooked.
> > > > It seems to me since IOMMU supports this configuration, it would be
> > > > advantageous to have an approach to achieve it, and DT might be a
> > > > flexible way.
> > >
> > > So far we've taken the approach that broken hardware is quirked in the
> > > kernel by matching OF compatible string patterns. This is HW that is
> > > completely broken and the IOMMU doesn't work at all for it.
> > >
> > > HW that is slow or whatever is not quirked and this is an admin policy
> > > choice where the system should land on the security/performance
> > > spectrum.
> > >
> > > So I'm not sure adding DT makes sense here.
> > >
> >
> > Hi Jason,
> > Sorry for being late here, I hadn't noticed this reply earlier. The
> > approach seems to address the situation. Could you kindly provide
> > information about the location of the patches? I was wondering about
> > further details regarding this particular implementation. Thanks
>
> There are a couple of versions, e.g.
>  arm_smmu_def_domain_type()
>  qcom_smmu_def_domain_type()
>

I thought what you mentioned earlier was that a new approach is being
considered for this. I think what you point out is the same as what
Anup mentioned. However, as I said earlier, I am exploring a more
flexible approach to achieve this objective, so that we can avoid
hard-coding anything (i.e. listing compatible strings) in the driver,
or requiring a kernel rebuild every time we need to change the mode
for specific devices. For example, the driver could parse the device
node to determine and record whether a device should be set to
bypass, and .def_domain_type could then return IOMMU_DOMAIN_IDENTITY
based on that record. I'm not sure if this makes sense for everyone,
but it seems to me it would be great if there were a way to do
this. :)
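
A rough, untested sketch of what I mean, assuming a hypothetical
"riscv,iommu-bypass" property (the name is made up here and is not
part of any binding):

	/* Return IOMMU_DOMAIN_IDENTITY when firmware requests bypass. */
	static int riscv_iommu_def_domain_type(struct device *dev)
	{
		if (dev->of_node &&
		    of_property_read_bool(dev->of_node, "riscv,iommu-bypass"))
			return IOMMU_DOMAIN_IDENTITY;

		return 0;	/* no preference, let the core decide */
	}

and then wire it up as .def_domain_type = riscv_iommu_def_domain_type
in riscv_iommu_ops.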

> Jason

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 03/11] dt-bindings: Add RISC-V IOMMU bindings
  2023-08-16  2:16                           ` Zong Li
@ 2023-08-16  4:10                             ` Baolu Lu
  0 siblings, 0 replies; 86+ messages in thread
From: Baolu Lu @ 2023-08-16  4:10 UTC (permalink / raw)
  To: Zong Li, Jason Gunthorpe
  Cc: baolu.lu, Anup Patel, Tomasz Jeznach, Joerg Roedel, Will Deacon,
	Robin Murphy, Paul Walmsley, Albert Ou, linux, linux-kernel,
	Sebastien Boeuf, iommu, Palmer Dabbelt, Nick Kossifidis,
	linux-riscv

On 2023/8/16 10:16, Zong Li wrote:
> On Wed, Aug 16, 2023 at 2:38 AM Jason Gunthorpe<jgg@ziepe.ca>  wrote:
>> On Tue, Aug 15, 2023 at 09:28:54AM +0800, Zong Li wrote:
>>> On Wed, Aug 9, 2023 at 10:57 PM Jason Gunthorpe<jgg@ziepe.ca>  wrote:
>>>> On Thu, Jul 27, 2023 at 10:42:47AM +0800, Zong Li wrote:
>>>>
>>>>> Perhaps this question could be related to the scenarios in which
>>>>> devices wish to be in bypass mode when the IOMMU is in translation
>>>>> mode, and why IOMMU defines/supports this case. Currently, I could
>>>>> envision a scenario where a device is already connected to the IOMMU
>>>>> in hardware, but it is not functioning correctly, or there are
>>>>> performance impacts. If modifying the hardware is not feasible, a
>>>>> default configuration that allows bypass mode could be provided as a
>>>>> solution. There might be other scenarios that I might have overlooked.
>>>>> It seems to me since IOMMU supports this configuration, it would be
>>>>> advantageous to have an approach to achieve it, and DT might be a
>>>>> flexible way.
>>>> So far we've taken the approach that broken hardware is quirked in the
>>>> kernel by matching OF compatible string patterns. This is HW that is
>>>> completely broken and the IOMMU doesn't work at all for it.
>>>>
>>>> HW that is slow or whatever is not quirked and this is an admin policy
>>>> choice where the system should land on the security/performance
>>>> spectrum.
>>>>
>>>> So I'm not sure adding DT makes sense here.
>>>>
>>> Hi Jason,
>>> Sorry for being late here, I hadn't noticed this reply earlier. The
>>> approach seems to address the situation. Could you kindly provide
>>> information about the location of the patches? I was wondering about
>>> further details regarding this particular implementation. Thanks
>> There are a couple of versions, e.g.
>>   arm_smmu_def_domain_type()
>>   qcom_smmu_def_domain_type()
>>
> I thought what you mentioned earlier was that a new approach is being
> considered for this. I think what you point out is the same as what
> Anup mentioned. However, as I said earlier, I am exploring a more
> flexible approach to achieve this objective, so that we can avoid
> hard-coding anything (i.e. listing compatible strings) in the driver,
> or requiring a kernel rebuild every time we need to change the mode
> for specific devices. For example, the driver could parse the device
> node to determine and record whether a device should be set to
> bypass, and .def_domain_type could then return IOMMU_DOMAIN_IDENTITY
> based on that record. I'm not sure if this makes sense for everyone,
> but it seems to me it would be great if there were a way to do
> this. 😄

What you described applies to the case where the device is *quirky*:
it "is not functioning correctly" when the IOMMU is configured in DMA
translation mode.

But it could not be used in the other case described above, where
IOMMU translation has a performance impact on the device's DMA
efficiency. That is a kind of user policy and should not be achieved
through the "DT/ACPI + def_domain_type" mechanism.

The iommu subsystem provides a sysfs interface that users can use to
change the domain type for devices. This means that users can change
the domain type as they wish, without having to modify the kernel
configuration.
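
For example (see Documentation/ABI/testing/sysfs-kernel-iommu_groups;
the device address and group number below are only illustrative):

	# all devices in the group must be unbound from their drivers first
	echo 0000:01:00.0 > /sys/bus/pci/devices/0000:01:00.0/driver/unbind
	# request an identity (passthrough) default domain for the group
	echo identity > /sys/kernel/iommu_groups/7/type
	# re-probe the device
	echo 0000:01:00.0 > /sys/bus/pci/drivers_probe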

Best regards,
baolu


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 01/11] RISC-V: drivers/iommu: Add RISC-V IOMMU - Ziommu support.
  2023-07-19 19:33 ` [PATCH 01/11] RISC-V: drivers/iommu: Add RISC-V IOMMU - Ziommu support Tomasz Jeznach
                     ` (5 preceding siblings ...)
  2023-08-03  8:27   ` Zong Li
@ 2023-08-16 18:05   ` Robin Murphy
  2024-04-13 10:15   ` Xingyou Chen
  7 siblings, 0 replies; 86+ messages in thread
From: Robin Murphy @ 2023-08-16 18:05 UTC (permalink / raw)
  To: Tomasz Jeznach, Joerg Roedel, Will Deacon, Paul Walmsley
  Cc: Palmer Dabbelt, Albert Ou, Anup Patel, Sunil V L,
	Nick Kossifidis, Sebastien Boeuf, iommu, linux-riscv,
	linux-kernel, linux

On 2023-07-19 20:33, Tomasz Jeznach wrote:
> This patch introduces a skeleton IOMMU device driver implementation as
> defined by the RISC-V IOMMU Architecture Specification, Version 1.0 [1],
> with minimal support for pass-through mapping, basic initialization, and
> bindings for platform and PCIe hardware implementations.
> 
> The series of patches following the specification's evolution has been
> reorganized to provide functional separation of the implemented blocks,
> compliant with the ratified specification.
> 
> This and the following patch series include code contributed by: Nick
> Kossifidis <mick@ics.forth.gr> (iommu-platform device, a number of
> specification clarifications, bugfixes, and readability improvements) and
> Sebastien Boeuf <seb@rivosinc.com> (page table creation, ATS/PGR flow).
> 
> The complete history can be found in the maintainer's repository branch [2].
> 
> The device driver enables RISC-V 32/64 memory translation support for
> DMA-capable PCI and platform devices, including a multilevel device
> directory table, process directory, shared virtual address support, wired
> and message-signalled interrupts for translation and I/O faults, a page
> request interface, and command processing.
> 
> A matching RISC-V IOMMU device emulation implementation is available for
> the QEMU project, along with educational device extensions for PASID
> ATS/PRI support [3].
> 
> References:
>   - [1] https://github.com/riscv-non-isa/riscv-iommu
>   - [2] https://github.com/tjeznach/linux/tree/tjeznach/riscv-iommu
>   - [3] https://github.com/tjeznach/qemu/tree/tjeznach/riscv-iommu
> 
> Co-developed-by: Nick Kossifidis <mick@ics.forth.gr>
> Signed-off-by: Nick Kossifidis <mick@ics.forth.gr>
> Co-developed-by: Sebastien Boeuf <seb@rivosinc.com>
> Signed-off-by: Sebastien Boeuf <seb@rivosinc.com>
> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
> ---
>   drivers/iommu/Kconfig                |   1 +
>   drivers/iommu/Makefile               |   2 +-
>   drivers/iommu/riscv/Kconfig          |  22 +
>   drivers/iommu/riscv/Makefile         |   1 +
>   drivers/iommu/riscv/iommu-bits.h     | 704 +++++++++++++++++++++++++++
>   drivers/iommu/riscv/iommu-pci.c      | 134 +++++
>   drivers/iommu/riscv/iommu-platform.c |  94 ++++
>   drivers/iommu/riscv/iommu.c          | 660 +++++++++++++++++++++++++
>   drivers/iommu/riscv/iommu.h          | 115 +++++
>   9 files changed, 1732 insertions(+), 1 deletion(-)
>   create mode 100644 drivers/iommu/riscv/Kconfig
>   create mode 100644 drivers/iommu/riscv/Makefile
>   create mode 100644 drivers/iommu/riscv/iommu-bits.h
>   create mode 100644 drivers/iommu/riscv/iommu-pci.c
>   create mode 100644 drivers/iommu/riscv/iommu-platform.c
>   create mode 100644 drivers/iommu/riscv/iommu.c
>   create mode 100644 drivers/iommu/riscv/iommu.h
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 2b12b583ef4b..36fcc6fd5b4e 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -187,6 +187,7 @@ config MSM_IOMMU
>   source "drivers/iommu/amd/Kconfig"
>   source "drivers/iommu/intel/Kconfig"
>   source "drivers/iommu/iommufd/Kconfig"
> +source "drivers/iommu/riscv/Kconfig"
>   
>   config IRQ_REMAP
>   	bool "Support for Interrupt Remapping"
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 769e43d780ce..8f57110a9fb1 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -1,5 +1,5 @@
>   # SPDX-License-Identifier: GPL-2.0
> -obj-y += amd/ intel/ arm/ iommufd/
> +obj-y += amd/ intel/ arm/ iommufd/ riscv/
>   obj-$(CONFIG_IOMMU_API) += iommu.o
>   obj-$(CONFIG_IOMMU_API) += iommu-traces.o
>   obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
> diff --git a/drivers/iommu/riscv/Kconfig b/drivers/iommu/riscv/Kconfig
> new file mode 100644
> index 000000000000..01d4043849d4
> --- /dev/null
> +++ b/drivers/iommu/riscv/Kconfig
> @@ -0,0 +1,22 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +# RISC-V IOMMU support
> +
> +config RISCV_IOMMU
> +	bool "RISC-V IOMMU driver"
> +	depends on RISCV
> +	select IOMMU_API
> +	select IOMMU_DMA

No. See commit de9f8a91eb32.

> +	select IOMMU_SVA
> +	select IOMMU_IOVA

No.

> +	select IOMMU_IO_PGTABLE

Do you anticipate needing to support multiple pagetable formats, or 
sharing this format with other drivers? If not, I'd usually suggest 
avoiding the overhead of io-pgtable.

> +	select IOASID

Doesn't exist.

> +	select PCI_MSI

Already selected at the arch level, does it really need reselecting here?

> +	select PCI_ATS
> +	select PCI_PRI
> +	select PCI_PASID
> +	select MMU_NOTIFIER
> +	help
> +	  Support for devices following RISC-V IOMMU specification.
> +
> +	  If unsure, say N here.
> +
[...]
> diff --git a/drivers/iommu/riscv/iommu-pci.c b/drivers/iommu/riscv/iommu-pci.c
> new file mode 100644
> index 000000000000..c91f963d7a29
> --- /dev/null
> +++ b/drivers/iommu/riscv/iommu-pci.c
> @@ -0,0 +1,134 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +
> +/*
> + * Copyright © 2022-2023 Rivos Inc.
> + * Copyright © 2023 FORTH-ICS/CARV
> + *
> + * RISCV IOMMU as a PCIe device
> + *
> + * Authors
> + *	Tomasz Jeznach <tjeznach@rivosinc.com>
> + *	Nick Kossifidis <mick@ics.forth.gr>
> + */
> +
> +#include <linux/module.h>
> +#include <linux/kernel.h>
> +#include <linux/compiler.h>
> +#include <linux/pci.h>
> +#include <linux/init.h>
> +#include <linux/iommu.h>
> +#include <linux/bitfield.h>
> +
> +#include "iommu.h"
> +
> +/* Rivos Inc. assigned PCI Vendor and Device IDs */
> +#ifndef PCI_VENDOR_ID_RIVOS
> +#define PCI_VENDOR_ID_RIVOS             0x1efd
> +#endif
> +
> +#ifndef PCI_DEVICE_ID_RIVOS_IOMMU
> +#define PCI_DEVICE_ID_RIVOS_IOMMU       0xedf1
> +#endif
> +
> +static int riscv_iommu_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> +{
> +	struct device *dev = &pdev->dev;
> +	struct riscv_iommu_device *iommu;
> +	int ret;
> +
> +	ret = pci_enable_device_mem(pdev);
> +	if (ret < 0)
> +		return ret;
> +
> +	ret = pci_request_mem_regions(pdev, KBUILD_MODNAME);
> +	if (ret < 0)
> +		goto fail;
> +
> +	ret = -ENOMEM;
> +
> +	iommu = devm_kzalloc(dev, sizeof(*iommu), GFP_KERNEL);
> +	if (!iommu)
> +		goto fail;
> +
> +	if (!(pci_resource_flags(pdev, 0) & IORESOURCE_MEM))
> +		goto fail;
> +
> +	if (pci_resource_len(pdev, 0) < RISCV_IOMMU_REG_SIZE)
> +		goto fail;
> +
> +	iommu->reg_phys = pci_resource_start(pdev, 0);
> +	if (!iommu->reg_phys)
> +		goto fail;
> +
> +	iommu->reg = devm_ioremap(dev, iommu->reg_phys, RISCV_IOMMU_REG_SIZE);
> +	if (!iommu->reg)
> +		goto fail;
> +
> +	iommu->dev = dev;
> +	dev_set_drvdata(dev, iommu);
> +
> +	dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
> +	pci_set_master(pdev);
> +
> +	ret = riscv_iommu_init(iommu);
> +	if (!ret)
> +		return ret;
> +
> + fail:
> +	pci_clear_master(pdev);
> +	pci_release_regions(pdev);
> +	pci_disable_device(pdev);
> +	/* Note: devres_release_all() will release iommu and iommu->reg */
> +	return ret;
> +}
> +
> +static void riscv_iommu_pci_remove(struct pci_dev *pdev)
> +{
> +	riscv_iommu_remove(dev_get_drvdata(&pdev->dev));
> +	pci_clear_master(pdev);
> +	pci_release_regions(pdev);
> +	pci_disable_device(pdev);
> +}
> +
> +static int riscv_iommu_suspend(struct device *dev)
> +{
> +	dev_warn(dev, "RISC-V IOMMU PM not implemented");
> +	return -ENODEV;
> +}
> +
> +static int riscv_iommu_resume(struct device *dev)
> +{
> +	dev_warn(dev, "RISC-V IOMMU PM not implemented");
> +	return -ENODEV;
> +}
> +
> +static DEFINE_SIMPLE_DEV_PM_OPS(riscv_iommu_pm_ops, riscv_iommu_suspend,
> +				riscv_iommu_resume);
> +
> +static const struct pci_device_id riscv_iommu_pci_tbl[] = {
> +	{PCI_VENDOR_ID_RIVOS, PCI_DEVICE_ID_RIVOS_IOMMU,
> +	 PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0},
> +	{0,}
> +};
> +
> +MODULE_DEVICE_TABLE(pci, riscv_iommu_pci_tbl);
> +
> +static const struct of_device_id riscv_iommu_of_match[] = {
> +	{.compatible = "riscv,pci-iommu",},
> +	{},
> +};
> +
> +MODULE_DEVICE_TABLE(of, riscv_iommu_of_match);
> +
> +static struct pci_driver riscv_iommu_pci_driver = {
> +	.name = KBUILD_MODNAME,
> +	.id_table = riscv_iommu_pci_tbl,
> +	.probe = riscv_iommu_pci_probe,
> +	.remove = riscv_iommu_pci_remove,
> +	.driver = {
> +		   .pm = pm_sleep_ptr(&riscv_iommu_pm_ops),
> +		   .of_match_table = riscv_iommu_of_match,

Does that even do anything for a PCI driver?

Also you're missing suppress_bind_attrs here.

> +		   },
> +};
> +
> +module_driver(riscv_iommu_pci_driver, pci_register_driver, pci_unregister_driver);
> diff --git a/drivers/iommu/riscv/iommu-platform.c b/drivers/iommu/riscv/iommu-platform.c
> new file mode 100644
> index 000000000000..e4e8ca6711e7
> --- /dev/null
> +++ b/drivers/iommu/riscv/iommu-platform.c
> @@ -0,0 +1,94 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * RISC-V IOMMU as a platform device
> + *
> + * Copyright © 2023 FORTH-ICS/CARV
> + *
> + * Author: Nick Kossifidis <mick@ics.forth.gr>
> + */
> +
> +#include <linux/module.h>
> +#include <linux/kernel.h>
> +#include <linux/of_platform.h>
> +#include <linux/bitfield.h>
> +
> +#include "iommu-bits.h"
> +#include "iommu.h"
> +
> +static int riscv_iommu_platform_probe(struct platform_device *pdev)
> +{
> +	struct device *dev = &pdev->dev;
> +	struct riscv_iommu_device *iommu = NULL;
> +	struct resource *res = NULL;
> +	int ret = 0;
> +
> +	iommu = devm_kzalloc(dev, sizeof(*iommu), GFP_KERNEL);
> +	if (!iommu)
> +		return -ENOMEM;
> +
> +	iommu->dev = dev;
> +	dev_set_drvdata(dev, iommu);
> +
> +	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> +	if (!res) {
> +		dev_err(dev, "could not find resource for register region\n");
> +		return -EINVAL;
> +	}
> +
> +	iommu->reg = devm_platform_get_and_ioremap_resource(pdev, 0, &res);
> +	if (IS_ERR(iommu->reg)) {
> +		ret = dev_err_probe(dev, PTR_ERR(iommu->reg),
> +				    "could not map register region\n");
> +		goto fail;
> +	};
> +
> +	iommu->reg_phys = res->start;
> +
> +	ret = -ENODEV;
> +
> +	/* Sanity check: Did we get the whole register space ? */
> +	if ((res->end - res->start + 1) < RISCV_IOMMU_REG_SIZE) {
> +		dev_err(dev, "device region smaller than register file (0x%llx)\n",
> +			res->end - res->start);
> +		goto fail;
> +	}
> +
> +	dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
> +
> +	return riscv_iommu_init(iommu);
> +
> + fail:
> +	/* Note: devres_release_all() will release iommu and iommu->reg */
> +	return ret;
> +};
> +
> +static void riscv_iommu_platform_remove(struct platform_device *pdev)
> +{
> +	riscv_iommu_remove(dev_get_drvdata(&pdev->dev));
> +};
> +
> +static void riscv_iommu_platform_shutdown(struct platform_device *pdev)
> +{
> +	return;

Surely just don't implement it at all?

> +};
> +
> +static const struct of_device_id riscv_iommu_of_match[] = {
> +	{.compatible = "riscv,iommu",},
> +	{},
> +};
> +
> +MODULE_DEVICE_TABLE(of, riscv_iommu_of_match);
> +
> +static struct platform_driver riscv_iommu_platform_driver = {
> +	.driver = {
> +		   .name = "riscv,iommu",
> +		   .of_match_table = riscv_iommu_of_match,
> +		   .suppress_bind_attrs = true,
> +		   },
> +	.probe = riscv_iommu_platform_probe,
> +	.remove_new = riscv_iommu_platform_remove,
> +	.shutdown = riscv_iommu_platform_shutdown,
> +};
> +
> +module_driver(riscv_iommu_platform_driver, platform_driver_register,
> +	      platform_driver_unregister);
> diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
> new file mode 100644
> index 000000000000..8c236242e2cc
> --- /dev/null
> +++ b/drivers/iommu/riscv/iommu.c
> @@ -0,0 +1,660 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * IOMMU API for RISC-V architected Ziommu implementations.
> + *
> + * Copyright © 2022-2023 Rivos Inc.
> + * Copyright © 2023 FORTH-ICS/CARV
> + *
> + * Authors
> + *	Tomasz Jeznach <tjeznach@rivosinc.com>
> + *	Nick Kossifidis <mick@ics.forth.gr>
> + */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include <linux/bitfield.h>
> +#include <linux/module.h>
> +#include <linux/kernel.h>
> +#include <linux/compiler.h>
> +#include <linux/pci.h>
> +#include <linux/pci-ats.h>
> +#include <linux/init.h>
> +#include <linux/completion.h>
> +#include <linux/uaccess.h>
> +#include <linux/iommu.h>
> +#include <linux/irqdomain.h>
> +#include <linux/platform_device.h>
> +#include <linux/dma-map-ops.h>

No. This is a device driver, not a DMA ops implementation.

> +#include <asm/page.h>
> +
> +#include "../dma-iommu.h"
> +#include "../iommu-sva.h"
> +#include "iommu.h"
> +
> +#include <asm/csr.h>
> +#include <asm/delay.h>
> +
> +MODULE_DESCRIPTION("IOMMU driver for RISC-V architected Ziommu implementations");
> +MODULE_AUTHOR("Tomasz Jeznach <tjeznach@rivosinc.com>");
> +MODULE_AUTHOR("Nick Kossifidis <mick@ics.forth.gr>");
> +MODULE_ALIAS("riscv-iommu");
> +MODULE_LICENSE("GPL v2");
> +
> +/* Global IOMMU params. */
> +static int ddt_mode = RISCV_IOMMU_DDTP_MODE_BARE;
> +module_param(ddt_mode, int, 0644);
> +MODULE_PARM_DESC(ddt_mode, "Device Directory Table mode.");
> +
> +/* IOMMU PSCID allocation namespace. */
> +#define RISCV_IOMMU_MAX_PSCID	(1U << 20)
> +static DEFINE_IDA(riscv_iommu_pscids);
> +
> +/* 1 second */
> +#define RISCV_IOMMU_TIMEOUT	riscv_timebase
> +
> +/* RISC-V IOMMU PPN <> PHYS address conversions, PHYS <=> PPN[53:10] */
> +#define phys_to_ppn(va)  (((va) >> 2) & (((1ULL << 44) - 1) << 10))
> +#define ppn_to_phys(pn)	 (((pn) << 2) & (((1ULL << 44) - 1) << 12))
> +
> +#define iommu_domain_to_riscv(iommu_domain) \
> +    container_of(iommu_domain, struct riscv_iommu_domain, domain)
> +
> +#define iommu_device_to_riscv(iommu_device) \
> +    container_of(iommu_device, struct riscv_iommu, iommu)
> +
> +static const struct iommu_domain_ops riscv_iommu_domain_ops;
> +static const struct iommu_ops riscv_iommu_ops;
> +
> +/*
> + * Register device for IOMMU tracking.
> + */
> +static void riscv_iommu_add_device(struct riscv_iommu_device *iommu, struct device *dev)
> +{
> +	struct riscv_iommu_endpoint *ep, *rb_ep;
> +	struct rb_node **new_node, *parent_node = NULL;
> +
> +	mutex_lock(&iommu->eps_mutex);
> +
> +	ep = dev_iommu_priv_get(dev);
> +
> +	new_node = &(iommu->eps.rb_node);
> +	while (*new_node) {
> +		rb_ep = rb_entry(*new_node, struct riscv_iommu_endpoint, node);
> +		parent_node = *new_node;
> +		if (rb_ep->devid > ep->devid) {
> +			new_node = &((*new_node)->rb_left);
> +		} else if (rb_ep->devid < ep->devid) {
> +			new_node = &((*new_node)->rb_right);
> +		} else {
> +			dev_warn(dev, "device %u already in the tree\n", ep->devid);
> +			/* Bail out instead of corrupting the existing node. */
> +			mutex_unlock(&iommu->eps_mutex);
> +			return;
> +		}
> +	}
> +
> +	rb_link_node(&ep->node, parent_node, new_node);
> +	rb_insert_color(&ep->node, &iommu->eps);
> +
> +	mutex_unlock(&iommu->eps_mutex);
> +}
> +
> +/*
> + * Endpoint management
> + */
> +
> +static int riscv_iommu_of_xlate(struct device *dev, struct of_phandle_args *args)
> +{
> +	return iommu_fwspec_add_ids(dev, args->args, 1);
> +}
> +
> +static bool riscv_iommu_capable(struct device *dev, enum iommu_cap cap)
> +{
> +	switch (cap) {
> +	case IOMMU_CAP_CACHE_COHERENCY:
> +	case IOMMU_CAP_PRE_BOOT_PROTECTION:

I don't think you can ever unconditionally claim pre-boot protection. 
Even if an IOMMU implementation does come out of reset in "Off" state - 
which I see the spec only recommends, not requires - something that ran 
before Linux could have done something and left it in "Bare" state.

> +		return true;
> +
> +	default:
> +		break;
> +	}
> +
> +	return false;
> +}
> +
> +static struct iommu_device *riscv_iommu_probe_device(struct device *dev)
> +{
> +	struct riscv_iommu_device *iommu;
> +	struct riscv_iommu_endpoint *ep;
> +	struct iommu_fwspec *fwspec;
> +
> +	fwspec = dev_iommu_fwspec_get(dev);
> +	if (!fwspec || fwspec->ops != &riscv_iommu_ops ||
> +	    !fwspec->iommu_fwnode || !fwspec->iommu_fwnode->dev)

I'm pretty sure it shouldn't be possible for a fwspec to have ops set 
without a valid fwnode, given that the fwnode is used to find the ops :/

> +		return ERR_PTR(-ENODEV);
> +
> +	iommu = dev_get_drvdata(fwspec->iommu_fwnode->dev);
> +	if (!iommu)
> +		return ERR_PTR(-ENODEV);
> +
> +	if (dev_iommu_priv_get(dev))

That should never be true at this point.

> +		return &iommu->iommu;
> +
> +	ep = kzalloc(sizeof(*ep), GFP_KERNEL);
> +	if (!ep)
> +		return ERR_PTR(-ENOMEM);
> +
> +	mutex_init(&ep->lock);
> +	INIT_LIST_HEAD(&ep->domain);
> +
> +	if (dev_is_pci(dev)) {
> +		ep->devid = pci_dev_id(to_pci_dev(dev));
> +		ep->domid = pci_domain_nr(to_pci_dev(dev)->bus);
> +	} else {
> +		/* TODO: Make this generic, for now hardcode domain id to 0 */
> +		ep->devid = fwspec->ids[0];
> +		ep->domid = 0;
> +	}
> +
> +	ep->iommu = iommu;
> +	ep->dev = dev;
> +
> +	dev_info(iommu->dev, "adding device to iommu with devid %i in domain %i\n",
> +		ep->devid, ep->domid);

Please clean up all these debugging prints before submitting patches 
upstream. I do start to wonder how worthwhile it is to review code that 
doesn't even look finished...

> +
> +	dev_iommu_priv_set(dev, ep);
> +	riscv_iommu_add_device(iommu, dev);
> +
> +	return &iommu->iommu;
> +}
> +
> +static void riscv_iommu_probe_finalize(struct device *dev)
> +{
> +	set_dma_ops(dev, NULL);
> +	iommu_setup_dma_ops(dev, 0, U64_MAX);
> +}

riscv already implements arch_setup_dma_ops(), so please make use of 
that flow; this probe_finalize bodge is an x86 thing, mostly for legacy 
reasons.
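
For reference, a simplified sketch of that flow as arm64 wires it up
(from memory, details elided):

	void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
				const struct iommu_ops *iommu, bool coherent)
	{
		dev->dma_coherent = coherent;
		if (iommu)
			iommu_setup_dma_ops(dev, dma_base, dma_base + size - 1);
	}

so the driver itself never has to touch the device's DMA ops.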

> +
> +static void riscv_iommu_release_device(struct device *dev)
> +{
> +	struct riscv_iommu_endpoint *ep = dev_iommu_priv_get(dev);
> +	struct riscv_iommu_device *iommu = ep->iommu;
> +
> +	dev_info(dev, "device with devid %i released\n", ep->devid);
> +
> +	mutex_lock(&ep->lock);
> +	list_del(&ep->domain);
> +	mutex_unlock(&ep->lock);
> +
> +	/* Remove endpoint from IOMMU tracking structures */
> +	mutex_lock(&iommu->eps_mutex);
> +	rb_erase(&ep->node, &iommu->eps);
> +	mutex_unlock(&iommu->eps_mutex);
> +
> +	set_dma_ops(dev, NULL);
> +	dev_iommu_priv_set(dev, NULL);
> +
> +	kfree(ep);
> +}
> +
> +static struct iommu_group *riscv_iommu_device_group(struct device *dev)
> +{
> +	if (dev_is_pci(dev))
> +		return pci_device_group(dev);
> +	return generic_device_group(dev);
> +}
> +
> +static void riscv_iommu_get_resv_regions(struct device *dev, struct list_head *head)
> +{
> +	iommu_dma_get_resv_regions(dev, head);

Just assign it as the callback directly - the wrapper function serves no 
purpose.

> +}
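
i.e. in riscv_iommu_ops, simply:

	.get_resv_regions = iommu_dma_get_resv_regions,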
> +
> +/*
> + * Domain management
> + */
> +
> +static struct iommu_domain *riscv_iommu_domain_alloc(unsigned type)
> +{
> +	struct riscv_iommu_domain *domain;
> +
> +	if (type != IOMMU_DOMAIN_IDENTITY &&
> +	    type != IOMMU_DOMAIN_BLOCKED)

Whatever's going on here I don't think really fits the meaning of 
IOMMU_DOMAIN_BLOCKED.

> +		return NULL;
> +
> +	domain = kzalloc(sizeof(*domain), GFP_KERNEL);
> +	if (!domain)
> +		return NULL;
> +
> +	mutex_init(&domain->lock);
> +	INIT_LIST_HEAD(&domain->endpoints);
> +
> +	domain->domain.ops = &riscv_iommu_domain_ops;
> +	domain->mode = RISCV_IOMMU_DC_FSC_MODE_BARE;
> +	domain->pscid = ida_alloc_range(&riscv_iommu_pscids, 1,
> +					RISCV_IOMMU_MAX_PSCID, GFP_KERNEL);

If this fails, it seems you "successfully" return a completely useless 
domain to which nothing can attach. Why not just fail the allocation 
right here?
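
e.g. (a sketch reusing the names from this patch):

	domain->pscid = ida_alloc_range(&riscv_iommu_pscids, 1,
					RISCV_IOMMU_MAX_PSCID, GFP_KERNEL);
	if (domain->pscid < 0) {
		kfree(domain);
		return NULL;
	}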

> +
> +	printk("domain type %x alloc %u\n", type, domain->pscid);
> +
> +	return &domain->domain;
> +}
> +
> +static void riscv_iommu_domain_free(struct iommu_domain *iommu_domain)
> +{
> +	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
> +
> +	if (!list_empty(&domain->endpoints)) {
> +		pr_warn("IOMMU domain is not empty!\n");
> +	}
> +
> +	if (domain->pgd_root)
> +		free_pages((unsigned long)domain->pgd_root, 0);
> +
> +	if ((int)domain->pscid > 0)
> +		ida_free(&riscv_iommu_pscids, domain->pscid);
> +
> +	printk("domain free %u\n", domain->pscid);
> +
> +	kfree(domain);
> +}
> +
> +static int riscv_iommu_domain_finalize(struct riscv_iommu_domain *domain,
> +				       struct riscv_iommu_device *iommu)
> +{
> +	struct iommu_domain_geometry *geometry;
> +
> +	/* Domain assigned to another iommu */
> +	if (domain->iommu && domain->iommu != iommu)
> +		return -EINVAL;
> +	/* Domain already initialized */
> +	else if (domain->iommu)
> +		return 0;
> +
> +	/*
> +	 * TODO: Before using VA_BITS and satp_mode here, verify they
> +	 * are supported by the iommu, through the capabilities register.
> +	 */

Yes, doing that sounds like a very good idea.

> +
> +	geometry = &domain->domain.geometry;
> +
> +	/*
> +	 * Note: the RISC-V Privileged spec mandates that virtual addresses
> +	 * need to be sign-extended, so if (VA_BITS - 1) is set, all
> +	 * bits >= VA_BITS need to also be set or else we'll get a
> +	 * page fault. However the code that creates the mappings
> +	 * above us (e.g. iommu_dma_alloc_iova()) won't do that for us
> +	 * for now, so we'll end up with invalid virtual addresses
> +	 * to map. As a workaround until we get this sorted out
> +	 * limit the available virtual addresses to VA_BITS - 1.
> +	 */

Would you have a practical use for a single 64-bit VA space with a 
massive hole in the middle anyway?

> +	geometry->aperture_start = 0;
> +	geometry->aperture_end = DMA_BIT_MASK(VA_BITS - 1);
> +	geometry->force_aperture = true;
> +
> +	domain->iommu = iommu;
> +
> +	if (domain->domain.type == IOMMU_DOMAIN_IDENTITY)
> +		return 0;
> +
> +	/* TODO: Fix this for RV32 */
> +	domain->mode = satp_mode >> 60;
> +	domain->pgd_root = (pgd_t *) __get_free_pages(GFP_KERNEL | __GFP_ZERO, 0);
> +
> +	if (!domain->pgd_root)
> +		return -ENOMEM;
> +
> +	return 0;
> +}
> +
> +static int riscv_iommu_attach_dev(struct iommu_domain *iommu_domain, struct device *dev)
> +{
> +	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
> +	struct riscv_iommu_endpoint *ep = dev_iommu_priv_get(dev);
> +	int ret;
> +
> +	/* PSCID not valid */
> +	if ((int)domain->pscid < 0)
> +		return -ENOMEM;
> +
> +	mutex_lock(&domain->lock);
> +	mutex_lock(&ep->lock);
> +
> +	if (!list_empty(&ep->domain)) {
> +		dev_warn(dev, "endpoint already attached to a domain. dropping\n");

This should not be a warning condition. Other than the very first attach 
at iommu_probe_device() time, .attach_dev will always be moving devices 
directly from one domain to another.

> +		list_del_init(&ep->domain);
> +	}
> +
> +	/* allocate root pages, initialize io-pgtable ops, etc. */
> +	ret = riscv_iommu_domain_finalize(domain, ep->iommu);
> +	if (ret < 0) {
> +		dev_err(dev, "can not finalize domain: %d\n", ret);
> +		mutex_unlock(&ep->lock);
> +		mutex_unlock(&domain->lock);
> +		return ret;
> +	}
> +
> +	if (ep->iommu->ddt_mode != RISCV_IOMMU_DDTP_MODE_BARE ||
> +	    domain->domain.type != IOMMU_DOMAIN_IDENTITY) {
> +		dev_warn(dev, "domain type %d not supported\n",
> +		    domain->domain.type);
> +		return -ENODEV;

OK, so you don't actually support blocking domains anyway? In that case, 
don't accept the allocation request in the first place.

> +	}
> +
> +	list_add_tail(&ep->domain, &domain->endpoints);
> +	mutex_unlock(&ep->lock);
> +	mutex_unlock(&domain->lock);
> +
> +	dev_info(dev, "domain type %d attached w/ PSCID %u\n",
> +	    domain->domain.type, domain->pscid);
> +
> +	return 0;
> +}
> +
> +static void riscv_iommu_flush_iotlb_range(struct iommu_domain *iommu_domain,
> +					  unsigned long *start, unsigned long *end,
> +					  size_t *pgsize)
> +{
> +	/* Command interface not implemented */
> +}
> +
> +static void riscv_iommu_flush_iotlb_all(struct iommu_domain *iommu_domain)
> +{
> +	riscv_iommu_flush_iotlb_range(iommu_domain, NULL, NULL, NULL);
> +}
> +
> +static void riscv_iommu_iotlb_sync(struct iommu_domain *iommu_domain,
> +				   struct iommu_iotlb_gather *gather)
> +{
> +	riscv_iommu_flush_iotlb_range(iommu_domain, &gather->start, &gather->end,
> +				      &gather->pgsize);
> +}
> +
> +static void riscv_iommu_iotlb_sync_map(struct iommu_domain *iommu_domain,
> +				       unsigned long iova, size_t size)
> +{
> +	unsigned long end = iova + size - 1;
> +	/*
> +	 * Given we don't know the page size used by this range, we assume the
> +	 * smallest page size to ensure all possible entries are flushed from
> +	 * the IOATC.
> +	 */
> +	size_t pgsize = PAGE_SIZE;
> +	riscv_iommu_flush_iotlb_range(iommu_domain, &iova, &end, &pgsize);

The spec says the IOMMU is not permitted to cache invalid PTEs, so why 
have this? (I mean, it's clearly a completely useless no-op anyway, but 
hey...)

> +}
> +
> +static int riscv_iommu_map_pages(struct iommu_domain *iommu_domain,
> +				 unsigned long iova, phys_addr_t phys,
> +				 size_t pgsize, size_t pgcount, int prot,
> +				 gfp_t gfp, size_t *mapped)
> +{
> +	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
> +
> +	if (domain->domain.type == IOMMU_DOMAIN_IDENTITY) {

That can't happen. Not to mention that pretending to successfully map 
any IOVA to any PA in an identity domain, which by definition doesn't do 
that, would be fundamentally nonsensical anyway.

> +		*mapped = pgsize * pgcount;
> +		return 0;
> +	}
> +
> +	return -ENODEV;
> +}
> +
> +static size_t riscv_iommu_unmap_pages(struct iommu_domain *iommu_domain,
> +				      unsigned long iova, size_t pgsize,
> +				      size_t pgcount, struct iommu_iotlb_gather *gather)
> +{
> +	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
> +
> +	if (domain->domain.type == IOMMU_DOMAIN_IDENTITY)

Ditto.

> +		return pgsize * pgcount;
> +
> +	return 0;
> +}
> +
> +static phys_addr_t riscv_iommu_iova_to_phys(struct iommu_domain *iommu_domain,
> +					    dma_addr_t iova)
> +{
> +	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
> +
> +	if (domain->domain.type == IOMMU_DOMAIN_IDENTITY)

Ditto. Have you seen iommu_iova_to_phys()?

> +		return (phys_addr_t) iova;
> +
> +	return 0;
> +}
> +
> +/*
> + * Translation mode setup
> + */
> +
> +static u64 riscv_iommu_get_ddtp(struct riscv_iommu_device *iommu)
> +{
> +	u64 ddtp;
> +	cycles_t end_cycles = RISCV_IOMMU_TIMEOUT + get_cycles();
> +
> +	/* Wait for DDTP.BUSY to be cleared and return latest value */
> +	do {
> +		ddtp = riscv_iommu_readq(iommu, RISCV_IOMMU_REG_DDTP);
> +		if (!(ddtp & RISCV_IOMMU_DDTP_BUSY))
> +			break;
> +		cpu_relax();
> +	} while (get_cycles() < end_cycles);

Smells like readq_poll_timeout().
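
Assuming the MMIO base pointer is iommu->reg and <linux/iopoll.h> is
included, something like this (untested, timeout values picked
arbitrarily):

	u64 ddtp;

	/* Poll every 10us, give up after 100ms */
	if (readq_poll_timeout(iommu->reg + RISCV_IOMMU_REG_DDTP, ddtp,
			       !(ddtp & RISCV_IOMMU_DDTP_BUSY), 10, 100000))
		dev_warn(iommu->dev, "DDTP busy timeout\n");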

> +
> +	return ddtp;
> +}
> +
> +static void riscv_iommu_ddt_cleanup(struct riscv_iommu_device *iommu)
> +{
> +	/* TODO: teardown whole device directory tree. */
> +	if (iommu->ddtp) {
> +		if (iommu->ddtp_in_iomem) {
> +			iounmap((void *)iommu->ddtp);
> +		} else
> +			free_page(iommu->ddtp);
> +		iommu->ddtp = 0;
> +	}
> +}
> +
> +static int riscv_iommu_enable(struct riscv_iommu_device *iommu, unsigned requested_mode)
> +{
> +	struct device *dev = iommu->dev;
> +	u64 ddtp = 0;
> +	u64 ddtp_paddr = 0;
> +	unsigned mode = requested_mode;
> +	unsigned mode_readback = 0;
> +
> +	ddtp = riscv_iommu_get_ddtp(iommu);
> +	if (ddtp & RISCV_IOMMU_DDTP_BUSY)
> +		return -EBUSY;
> +
> +	/* Disallow state transition from xLVL to xLVL. */
> +	switch (FIELD_GET(RISCV_IOMMU_DDTP_MODE, ddtp)) {
> +	case RISCV_IOMMU_DDTP_MODE_BARE:
> +	case RISCV_IOMMU_DDTP_MODE_OFF:
> +		break;
> +	default:
> +		if ((mode != RISCV_IOMMU_DDTP_MODE_BARE)
> +		    && (mode != RISCV_IOMMU_DDTP_MODE_OFF))
> +			return -EINVAL;
> +		break;
> +	}
> +
> + retry:
> +	switch (mode) {
> +	case RISCV_IOMMU_DDTP_MODE_BARE:
> +	case RISCV_IOMMU_DDTP_MODE_OFF:
> +		riscv_iommu_ddt_cleanup(iommu);
> +		ddtp = FIELD_PREP(RISCV_IOMMU_DDTP_MODE, mode);
> +		break;
> +	case RISCV_IOMMU_DDTP_MODE_1LVL:
> +	case RISCV_IOMMU_DDTP_MODE_2LVL:
> +	case RISCV_IOMMU_DDTP_MODE_3LVL:
> +		if (!iommu->ddtp) {
> +			/*
> +			 * We haven't initialized ddtp yet; since it's WARL, make
> +			 * sure that we don't have a hardwired PPN field there
> +			 * that points to I/O memory instead.
> +			 */
> +			riscv_iommu_writeq(iommu, RISCV_IOMMU_REG_DDTP, 0);
> +			ddtp = riscv_iommu_get_ddtp(iommu);
> +			ddtp_paddr = ppn_to_phys(ddtp);
> +			if (ddtp_paddr) {
> +				dev_warn(dev, "ddtp at 0x%llx\n", ddtp_paddr);
> +				iommu->ddtp =
> +				    (unsigned long)ioremap(ddtp_paddr, PAGE_SIZE);
> +				iommu->ddtp_in_iomem = true;
> +			} else {
> +				iommu->ddtp = get_zeroed_page(GFP_KERNEL);
> +			}
> +		}
> +		if (!iommu->ddtp)
> +			return -ENOMEM;
> +
> +		ddtp = FIELD_PREP(RISCV_IOMMU_DDTP_MODE, mode) |
> +		    phys_to_ppn(__pa(iommu->ddtp));
> +
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	riscv_iommu_writeq(iommu, RISCV_IOMMU_REG_DDTP, ddtp);
> +	ddtp = riscv_iommu_get_ddtp(iommu);
> +	if (ddtp & RISCV_IOMMU_DDTP_BUSY) {
> +		dev_warn(dev, "timeout when setting ddtp (ddt mode: %i)\n", mode);
> +		return -EBUSY;
> +	}
> +
> +	mode_readback = FIELD_GET(RISCV_IOMMU_DDTP_MODE, ddtp);
> +	dev_info(dev, "mode_readback: %i, mode: %i\n", mode_readback, mode);
> +	if (mode_readback != mode) {
> +		/*
> +		 * Mode field is WARL, an I/O MMU may support a subset of
> +		 * directory table levels in which case if we tried to set
> +		 * an unsupported number of levels we'll readback either
> +		 * a valid xLVL or off/bare. If we got off/bare, try again
> +		 * with a smaller xLVL.
> +		 */
> +		if (mode_readback < RISCV_IOMMU_DDTP_MODE_1LVL &&
> +		    mode > RISCV_IOMMU_DDTP_MODE_1LVL) {
> +			mode--;
> +			goto retry;
> +		}
> +
> +		/*
> +		 * We tried all supported xLVL modes and still got off/bare instead.
> +		 * An I/O MMU must support at least one xLVL mode, so something
> +		 * went very wrong.
> +		 */
> +		if (mode_readback < RISCV_IOMMU_DDTP_MODE_1LVL &&
> +		    mode == RISCV_IOMMU_DDTP_MODE_1LVL)
> +			goto fail;
> +
> +		/*
> +		 * We tried setting off or bare and got something else back, something
> +		 * went very wrong since off/bare is always legal.
> +		 */
> +		if (mode < RISCV_IOMMU_DDTP_MODE_1LVL)
> +			goto fail;
> +
> +		/*
> +		 * We tried setting an xLVL mode but got another xLVL mode that
> +		 * we don't support (e.g. a custom one).
> +		 */
> +		if (mode_readback > RISCV_IOMMU_DDTP_MODE_MAX)
> +			goto fail;
> +
> +		/* We tried setting an xLVL mode but got another supported xLVL mode */
> +		mode = mode_readback;
> +	}
> +
> +	if (mode != requested_mode)
> +		dev_warn(dev, "unsupported DDT mode requested (%i), using %i instead\n",
> +			 requested_mode, mode);
> +
> +	iommu->ddt_mode = mode;
> +	dev_info(dev, "ddt_mode: %i\n", iommu->ddt_mode);
> +	return 0;
> +
> + fail:
> +	dev_err(dev, "failed to set DDT mode, tried: %i and got %i\n", mode,
> +		mode_readback);
> +	riscv_iommu_ddt_cleanup(iommu);
> +	return -EINVAL;
> +}
> +
> +/*
> + * Common I/O MMU driver probe/teardown
> + */
> +
> +static const struct iommu_domain_ops riscv_iommu_domain_ops = {
> +	.free = riscv_iommu_domain_free,
> +	.attach_dev = riscv_iommu_attach_dev,
> +	.map_pages = riscv_iommu_map_pages,
> +	.unmap_pages = riscv_iommu_unmap_pages,
> +	.iova_to_phys = riscv_iommu_iova_to_phys,
> +	.iotlb_sync = riscv_iommu_iotlb_sync,
> +	.iotlb_sync_map = riscv_iommu_iotlb_sync_map,
> +	.flush_iotlb_all = riscv_iommu_flush_iotlb_all,
> +};
> +
> +static const struct iommu_ops riscv_iommu_ops = {
> +	.owner = THIS_MODULE,
> +	.pgsize_bitmap = SZ_4K | SZ_2M | SZ_512M,
> +	.capable = riscv_iommu_capable,
> +	.domain_alloc = riscv_iommu_domain_alloc,
> +	.probe_device = riscv_iommu_probe_device,
> +	.probe_finalize = riscv_iommu_probe_finalize,
> +	.release_device = riscv_iommu_release_device,
> +	.device_group = riscv_iommu_device_group,
> +	.get_resv_regions = riscv_iommu_get_resv_regions,
> +	.of_xlate = riscv_iommu_of_xlate,
> +	.default_domain_ops = &riscv_iommu_domain_ops,
> +};
> +
> +void riscv_iommu_remove(struct riscv_iommu_device *iommu)
> +{
> +	iommu_device_unregister(&iommu->iommu);
> +	riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_OFF);
> +}
> +
> +int riscv_iommu_init(struct riscv_iommu_device *iommu)
> +{
> +	struct device *dev = iommu->dev;
> +	u32 fctl = 0;
> +	int ret;
> +
> +	iommu->eps = RB_ROOT;
> +
> +	fctl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_FCTL);
> +
> +#ifdef CONFIG_CPU_BIG_ENDIAN
> +	if (!(cap & RISCV_IOMMU_CAP_END)) {
> +		dev_err(dev, "IOMMU doesn't support Big Endian\n");
> +		return -EIO;
> +	} else if (!(fctl & RISCV_IOMMU_FCTL_BE)) {
> +		fctl |= FIELD_PREP(RISCV_IOMMU_FCTL_BE, 1);
> +		riscv_iommu_writel(iommu, RISCV_IOMMU_REG_FCTL, fctl);
> +	}
> +#endif
> +
> +	/* Clear any pending interrupt flag. */
> +	riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IPSR,
> +			   RISCV_IOMMU_IPSR_CIP |
> +			   RISCV_IOMMU_IPSR_FIP |
> +			   RISCV_IOMMU_IPSR_PMIP | RISCV_IOMMU_IPSR_PIP);
> +	spin_lock_init(&iommu->cq_lock);
> +	mutex_init(&iommu->eps_mutex);
> +
> +	ret = riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_BARE);

Um, yeah, you definitely don't support blocking domains if you just put 
the whole thing in bypass :/

> +
> +	if (ret) {
> +		dev_err(dev, "cannot enable iommu device (%d)\n", ret);
> +		goto fail;
> +	}
> +
> +	ret = iommu_device_register(&iommu->iommu, &riscv_iommu_ops, dev);
> +	if (ret) {
> +		dev_err(dev, "cannot register iommu interface (%d)\n", ret);
> +		iommu_device_sysfs_remove(&iommu->iommu);

But it was never added?
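
i.e. the error path here should presumably just be:

	ret = iommu_device_register(&iommu->iommu, &riscv_iommu_ops, dev);
	if (ret) {
		dev_err(dev, "cannot register iommu interface (%d)\n", ret);
		goto fail;
	}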

> +		goto fail;
> +	}
> +
> +	return 0;
> + fail:
> +	riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_OFF);
> +	return ret;
> +}
[...]
I appreciate the attempt to split the series up and not have one giant 
4000-line patch, but I have to say it's kind of hard to usefully review 
a patch like this where I'm struggling to tell the "real" code from the 
unfinished dead ends and plain nonsense. I'd suggest splitting it up 
slightly differently: strip out all the IOMMU API stuff here and just 
have this patch add the basic probing and hardware initialisation, then 
add the queues, device tables and pagetables, *then* wire it all up to 
the IOMMU API once it can be meaningfully functional. It ought to be 
possible to have each patch be purely additions of "final" code, no 
temporary placeholders and weird bodges just to let a bisection compile 
and/or pretend to work. Then "extra" features like SVA can be built on 
top as you have already. And of course we like the dt binding to be 
patch #1 :)

Thanks,
Robin.


* Re: [PATCH 06/11] RISC-V: drivers/iommu/riscv: Add command, fault, page-req queues
  2023-07-19 19:33 ` [PATCH 06/11] RISC-V: drivers/iommu/riscv: Add command, fault, page-req queues Tomasz Jeznach
                     ` (2 preceding siblings ...)
  2023-07-29 12:58   ` Zong Li
@ 2023-08-16 18:49   ` Robin Murphy
  3 siblings, 0 replies; 86+ messages in thread
From: Robin Murphy @ 2023-08-16 18:49 UTC (permalink / raw)
  To: Tomasz Jeznach, Joerg Roedel, Will Deacon, Paul Walmsley
  Cc: Palmer Dabbelt, Albert Ou, Anup Patel, Sunil V L,
	Nick Kossifidis, Sebastien Boeuf, iommu, linux-riscv,
	linux-kernel, linux

On 2023-07-19 20:33, Tomasz Jeznach wrote:
> Enables message-signaled or wire-signaled interrupts for PCIe and platform devices.
> 
> Co-developed-by: Nick Kossifidis <mick@ics.forth.gr>
> Signed-off-by: Nick Kossifidis <mick@ics.forth.gr>
> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
> ---
>   drivers/iommu/riscv/iommu-pci.c      |  72 ++++
>   drivers/iommu/riscv/iommu-platform.c |  66 +++
>   drivers/iommu/riscv/iommu.c          | 604 ++++++++++++++++++++++++++-
>   drivers/iommu/riscv/iommu.h          |  28 ++
>   4 files changed, 769 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/riscv/iommu-pci.c b/drivers/iommu/riscv/iommu-pci.c
> index c91f963d7a29..9ea0647f7b92 100644
> --- a/drivers/iommu/riscv/iommu-pci.c
> +++ b/drivers/iommu/riscv/iommu-pci.c
> @@ -34,6 +34,7 @@ static int riscv_iommu_pci_probe(struct pci_dev *pdev, const struct pci_device_i
>   {
>   	struct device *dev = &pdev->dev;
>   	struct riscv_iommu_device *iommu;
> +	u64 icvec;
>   	int ret;
>   
>   	ret = pci_enable_device_mem(pdev);
> @@ -67,14 +68,84 @@ static int riscv_iommu_pci_probe(struct pci_dev *pdev, const struct pci_device_i
>   	iommu->dev = dev;
>   	dev_set_drvdata(dev, iommu);
>   
> +	/* Check device reported capabilities. */
> +	iommu->cap = riscv_iommu_readq(iommu, RISCV_IOMMU_REG_CAP);
> +
> +	/* The PCI driver only uses MSIs, make sure the IOMMU supports this */
> +	switch (FIELD_GET(RISCV_IOMMU_CAP_IGS, iommu->cap)) {
> +	case RISCV_IOMMU_CAP_IGS_MSI:
> +	case RISCV_IOMMU_CAP_IGS_BOTH:
> +		break;
> +	default:
> +		dev_err(dev, "unable to use message-signaled interrupts\n");
> +		ret = -ENODEV;
> +		goto fail;
> +	}
> +
>   	dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
>   	pci_set_master(pdev);
>   
> +	/* Allocate and assign IRQ vectors for the various events */
> +	ret = pci_alloc_irq_vectors(pdev, 1, RISCV_IOMMU_INTR_COUNT, PCI_IRQ_MSIX);
> +	if (ret < 0) {
> +		dev_err(dev, "unable to allocate irq vectors\n");
> +		goto fail;
> +	}
> +
> +	ret = -ENODEV;
> +
> +	iommu->irq_cmdq = msi_get_virq(dev, RISCV_IOMMU_INTR_CQ);
> +	if (!iommu->irq_cmdq) {
> +		dev_warn(dev, "no MSI vector %d for the command queue\n",
> +			 RISCV_IOMMU_INTR_CQ);
> +		goto fail;
> +	}
> +
> +	iommu->irq_fltq = msi_get_virq(dev, RISCV_IOMMU_INTR_FQ);
> +	if (!iommu->irq_fltq) {
> +		dev_warn(dev, "no MSI vector %d for the fault/event queue\n",
> +			 RISCV_IOMMU_INTR_FQ);
> +		goto fail;
> +	}
> +
> +	if (iommu->cap & RISCV_IOMMU_CAP_HPM) {
> +		iommu->irq_pm = msi_get_virq(dev, RISCV_IOMMU_INTR_PM);
> +		if (!iommu->irq_pm) {
> +			dev_warn(dev,
> +				 "no MSI vector %d for performance monitoring\n",
> +				 RISCV_IOMMU_INTR_PM);
> +			goto fail;
> +		}
> +	}
> +
> +	if (iommu->cap & RISCV_IOMMU_CAP_ATS) {
> +		iommu->irq_priq = msi_get_virq(dev, RISCV_IOMMU_INTR_PQ);
> +		if (!iommu->irq_priq) {
> +			dev_warn(dev,
> +				 "no MSI vector %d for page-request queue\n",
> +				 RISCV_IOMMU_INTR_PQ);
> +			goto fail;
> +		}
> +	}
> +
> +	/* Set simple 1:1 mapping for MSI vectors */
> +	icvec = FIELD_PREP(RISCV_IOMMU_IVEC_CIV, RISCV_IOMMU_INTR_CQ) |
> +	    FIELD_PREP(RISCV_IOMMU_IVEC_FIV, RISCV_IOMMU_INTR_FQ);
> +
> +	if (iommu->cap & RISCV_IOMMU_CAP_HPM)
> +		icvec |= FIELD_PREP(RISCV_IOMMU_IVEC_PMIV, RISCV_IOMMU_INTR_PM);
> +
> +	if (iommu->cap & RISCV_IOMMU_CAP_ATS)
> +		icvec |= FIELD_PREP(RISCV_IOMMU_IVEC_PIV, RISCV_IOMMU_INTR_PQ);
> +
> +	riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IVEC, icvec);
> +
>   	ret = riscv_iommu_init(iommu);
>   	if (!ret)
>   		return ret;
>   
>    fail:
> +	pci_free_irq_vectors(pdev);
>   	pci_clear_master(pdev);
>   	pci_release_regions(pdev);
>   	pci_disable_device(pdev);
> @@ -85,6 +156,7 @@ static int riscv_iommu_pci_probe(struct pci_dev *pdev, const struct pci_device_i
>   static void riscv_iommu_pci_remove(struct pci_dev *pdev)
>   {
>   	riscv_iommu_remove(dev_get_drvdata(&pdev->dev));
> +	pci_free_irq_vectors(pdev);
>   	pci_clear_master(pdev);
>   	pci_release_regions(pdev);
>   	pci_disable_device(pdev);
> diff --git a/drivers/iommu/riscv/iommu-platform.c b/drivers/iommu/riscv/iommu-platform.c
> index e4e8ca6711e7..35935d3c7ef4 100644
> --- a/drivers/iommu/riscv/iommu-platform.c
> +++ b/drivers/iommu/riscv/iommu-platform.c
> @@ -20,6 +20,8 @@ static int riscv_iommu_platform_probe(struct platform_device *pdev)
>   	struct device *dev = &pdev->dev;
>   	struct riscv_iommu_device *iommu = NULL;
>   	struct resource *res = NULL;
> +	u32 fctl = 0;
> +	int irq = 0;
>   	int ret = 0;
>   
>   	iommu = devm_kzalloc(dev, sizeof(*iommu), GFP_KERNEL);
> @@ -53,6 +55,70 @@ static int riscv_iommu_platform_probe(struct platform_device *pdev)
>   		goto fail;
>   	}
>   
> +	iommu->cap = riscv_iommu_readq(iommu, RISCV_IOMMU_REG_CAP);
> +
> +	/* For now we only support WSIs until we have AIA support */
> +	ret = FIELD_GET(RISCV_IOMMU_CAP_IGS, iommu->cap);
> +	if (ret == RISCV_IOMMU_CAP_IGS_MSI) {
> +		dev_err(dev, "IOMMU only supports MSIs\n");
> +		goto fail;
> +	}
> +
> +	/* Parse IRQ assignment */
> +	irq = platform_get_irq_byname_optional(pdev, "cmdq");
> +	if (irq > 0)
> +		iommu->irq_cmdq = irq;
> +	else {
> +		dev_err(dev, "no IRQ provided for the command queue\n");
> +		goto fail;
> +	}
> +
> +	irq = platform_get_irq_byname_optional(pdev, "fltq");
> +	if (irq > 0)
> +		iommu->irq_fltq = irq;
> +	else {
> +		dev_err(dev, "no IRQ provided for the fault/event queue\n");
> +		goto fail;
> +	}
> +
> +	if (iommu->cap & RISCV_IOMMU_CAP_HPM) {
> +		irq = platform_get_irq_byname_optional(pdev, "pm");
> +		if (irq > 0)
> +			iommu->irq_pm = irq;
> +		else {
> +			dev_err(dev, "no IRQ provided for performance monitoring\n");
> +			goto fail;
> +		}
> +	}
> +
> +	if (iommu->cap & RISCV_IOMMU_CAP_ATS) {
> +		irq = platform_get_irq_byname_optional(pdev, "priq");
> +		if (irq > 0)
> +			iommu->irq_priq = irq;
> +		else {
> +			dev_err(dev, "no IRQ provided for the page-request queue\n");
> +			goto fail;
> +		}
> +	}
> +
> +	/* Make sure fctl.WSI is set */
> +	fctl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_FCTL);
> +	fctl |= RISCV_IOMMU_FCTL_WSI;
> +	riscv_iommu_writel(iommu, RISCV_IOMMU_REG_FCTL, fctl);
> +
> +	/* Parse queue lengths */
> +	ret = of_property_read_u32(pdev->dev.of_node, "cmdq_len", &iommu->cmdq_len);
> +	if (!ret)
> +		dev_info(dev, "command queue length set to %i\n", iommu->cmdq_len);
> +
> +	ret = of_property_read_u32(pdev->dev.of_node, "fltq_len", &iommu->fltq_len);
> +	if (!ret)
> +		dev_info(dev, "fault/event queue length set to %i\n", iommu->fltq_len);
> +
> +	ret = of_property_read_u32(pdev->dev.of_node, "priq_len", &iommu->priq_len);
> +	if (!ret)
> +		dev_info(dev, "page request queue length set to %i\n", iommu->priq_len);

These properties are not documented in the binding, but are clearly 
Linux-specific driver policy which does not belong in DT anyway.
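
If the lengths do want to be tunable, the module parameters added below
already cover that, e.g. (sketch):

	/* Driver policy, not hardware description - nothing to parse from DT */
	iommu->cmdq_len = cmdq_length;
	iommu->fltq_len = fltq_length;
	iommu->priq_len = priq_length;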

> +
>   	dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
>   
>   	return riscv_iommu_init(iommu);
> diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
> index 31dc3c458e13..5c4cf9875302 100644
> --- a/drivers/iommu/riscv/iommu.c
> +++ b/drivers/iommu/riscv/iommu.c
> @@ -45,6 +45,18 @@ static int ddt_mode = RISCV_IOMMU_DDTP_MODE_BARE;
>   module_param(ddt_mode, int, 0644);
>   MODULE_PARM_DESC(ddt_mode, "Device Directory Table mode.");
>   
> +static int cmdq_length = 1024;
> +module_param(cmdq_length, int, 0644);
> +MODULE_PARM_DESC(cmdq_length, "Command queue length.");
> +
> +static int fltq_length = 1024;
> +module_param(fltq_length, int, 0644);
> +MODULE_PARM_DESC(fltq_length, "Fault queue length.");
> +
> +static int priq_length = 1024;
> +module_param(priq_length, int, 0644);
> +MODULE_PARM_DESC(priq_length, "Page request interface queue length.");
> +
>   /* IOMMU PSCID allocation namespace. */
>   #define RISCV_IOMMU_MAX_PSCID	(1U << 20)
>   static DEFINE_IDA(riscv_iommu_pscids);
> @@ -65,6 +77,497 @@ static DEFINE_IDA(riscv_iommu_pscids);
>   static const struct iommu_domain_ops riscv_iommu_domain_ops;
>   static const struct iommu_ops riscv_iommu_ops;
>   
> +/*
> + * Common queue management routines
> + */
> +
> +/* Note: offsets are the same for all queues */
> +#define Q_HEAD(q) ((q)->qbr + (RISCV_IOMMU_REG_CQH - RISCV_IOMMU_REG_CQB))
> +#define Q_TAIL(q) ((q)->qbr + (RISCV_IOMMU_REG_CQT - RISCV_IOMMU_REG_CQB))
> +
> +static unsigned riscv_iommu_queue_consume(struct riscv_iommu_device *iommu,
> +					  struct riscv_iommu_queue *q, unsigned *ready)
> +{
> +	u32 tail = riscv_iommu_readl(iommu, Q_TAIL(q));
> +	*ready = q->lui;
> +
> +	BUG_ON(q->cnt <= tail);
> +	if (q->lui <= tail)
> +		return tail - q->lui;
> +	return q->cnt - q->lui;
> +}
> +
> +static void riscv_iommu_queue_release(struct riscv_iommu_device *iommu,
> +				      struct riscv_iommu_queue *q, unsigned count)
> +{
> +	q->lui = (q->lui + count) & (q->cnt - 1);
> +	riscv_iommu_writel(iommu, Q_HEAD(q), q->lui);
> +}
> +
> +static u32 riscv_iommu_queue_ctrl(struct riscv_iommu_device *iommu,
> +				  struct riscv_iommu_queue *q, u32 val)
> +{
> +	cycles_t end_cycles = RISCV_IOMMU_TIMEOUT + get_cycles();
> +
> +	riscv_iommu_writel(iommu, q->qcr, val);
> +	do {
> +		val = riscv_iommu_readl(iommu, q->qcr);
> +		if (!(val & RISCV_IOMMU_QUEUE_BUSY))
> +			break;
> +		cpu_relax();
> +	} while (get_cycles() < end_cycles);
> +
> +	return val;
> +}
> +
> +static void riscv_iommu_queue_free(struct riscv_iommu_device *iommu,
> +				   struct riscv_iommu_queue *q)
> +{
> +	size_t size = q->len * q->cnt;
> +
> +	riscv_iommu_queue_ctrl(iommu, q, 0);
> +
> +	if (q->base) {
> +		if (q->in_iomem)
> +			iounmap(q->base);
> +		else
> +			dmam_free_coherent(iommu->dev, size, q->base, q->base_dma);
> +	}
> +	if (q->irq)
> +		free_irq(q->irq, q);
> +}
> +
> +static irqreturn_t riscv_iommu_cmdq_irq_check(int irq, void *data);
> +static irqreturn_t riscv_iommu_cmdq_process(int irq, void *data);
> +static irqreturn_t riscv_iommu_fltq_irq_check(int irq, void *data);
> +static irqreturn_t riscv_iommu_fltq_process(int irq, void *data);
> +static irqreturn_t riscv_iommu_priq_irq_check(int irq, void *data);
> +static irqreturn_t riscv_iommu_priq_process(int irq, void *data);
> +
> +static int riscv_iommu_queue_init(struct riscv_iommu_device *iommu, int queue_id)
> +{
> +	struct device *dev = iommu->dev;
> +	struct riscv_iommu_queue *q = NULL;
> +	size_t queue_size = 0;
> +	irq_handler_t irq_check;
> +	irq_handler_t irq_process;
> +	const char *name;
> +	int count = 0;
> +	int irq = 0;
> +	unsigned order = 0;
> +	u64 qbr_val = 0;
> +	u64 qbr_readback = 0;
> +	u64 qbr_paddr = 0;
> +	int ret = 0;
> +
> +	switch (queue_id) {
> +	case RISCV_IOMMU_COMMAND_QUEUE:
> +		q = &iommu->cmdq;
> +		q->len = sizeof(struct riscv_iommu_command);
> +		count = iommu->cmdq_len;
> +		irq = iommu->irq_cmdq;
> +		irq_check = riscv_iommu_cmdq_irq_check;
> +		irq_process = riscv_iommu_cmdq_process;
> +		q->qbr = RISCV_IOMMU_REG_CQB;
> +		q->qcr = RISCV_IOMMU_REG_CQCSR;
> +		name = "cmdq";
> +		break;
> +	case RISCV_IOMMU_FAULT_QUEUE:
> +		q = &iommu->fltq;
> +		q->len = sizeof(struct riscv_iommu_fq_record);
> +		count = iommu->fltq_len;
> +		irq = iommu->irq_fltq;
> +		irq_check = riscv_iommu_fltq_irq_check;
> +		irq_process = riscv_iommu_fltq_process;
> +		q->qbr = RISCV_IOMMU_REG_FQB;
> +		q->qcr = RISCV_IOMMU_REG_FQCSR;
> +		name = "fltq";
> +		break;
> +	case RISCV_IOMMU_PAGE_REQUEST_QUEUE:
> +		q = &iommu->priq;
> +		q->len = sizeof(struct riscv_iommu_pq_record);
> +		count = iommu->priq_len;
> +		irq = iommu->irq_priq;
> +		irq_check = riscv_iommu_priq_irq_check;
> +		irq_process = riscv_iommu_priq_process;
> +		q->qbr = RISCV_IOMMU_REG_PQB;
> +		q->qcr = RISCV_IOMMU_REG_PQCSR;
> +		name = "priq";
> +		break;
> +	default:
> +		dev_err(dev, "invalid queue interrupt index in queue_init!\n");
> +		return -EINVAL;
> +	}
> +
> +	/* Polling not implemented */
> +	if (!irq)
> +		return -ENODEV;
> +
> +	/* Allocate queue in memory and set the base register */
> +	order = ilog2(count);
> +	do {
> +		queue_size = q->len * (1ULL << order);
> +		q->base = dmam_alloc_coherent(dev, queue_size, &q->base_dma, GFP_KERNEL);
> +		if (q->base || queue_size < PAGE_SIZE)
> +			break;
> +
> +		order--;
> +	} while (1);
> +
> +	if (!q->base) {
> +		dev_err(dev, "failed to allocate %s queue (cnt: %u)\n", name, count);
> +		return -ENOMEM;
> +	}
> +
> +	q->cnt = 1ULL << order;
> +
> +	qbr_val = phys_to_ppn(q->base_dma) |
> +	    FIELD_PREP(RISCV_IOMMU_QUEUE_LOGSZ_FIELD, order - 1);
> +
> +	riscv_iommu_writeq(iommu, q->qbr, qbr_val);
> +
> +	/*
> +	 * Queue base registers are WARL, so it's possible that whatever we wrote
> +	 * there was illegal/not supported by the hw in which case we need to make
> +	 * sure we set a supported PPN and/or queue size.
> +	 */
> +	qbr_readback = riscv_iommu_readq(iommu, q->qbr);
> +	if (qbr_readback == qbr_val)
> +		goto irq;
> +
> +	dmam_free_coherent(dev, queue_size, q->base, q->base_dma);
> +
> +	/* Get supported queue size */
> +	order = FIELD_GET(RISCV_IOMMU_QUEUE_LOGSZ_FIELD, qbr_readback) + 1;
> +	q->cnt = 1ULL << order;
> +	queue_size = q->len * q->cnt;

Um... What? We allocate an arbitrarily-sized queue, free it again, 
*then* check what the hardware actually supports, and maybe allocate 
another queue? I can't help thinking there's a much better way...
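
A sketch of what I have in mind - probe the WARL LOGSZ field first, then
allocate exactly once (reusing the names from this patch; the
hardwired-PPN case would still need its ioremap path):

	/* Write the desired size and read back what the hardware accepts */
	riscv_iommu_writeq(iommu, q->qbr,
			   FIELD_PREP(RISCV_IOMMU_QUEUE_LOGSZ_FIELD, order - 1));
	order = FIELD_GET(RISCV_IOMMU_QUEUE_LOGSZ_FIELD,
			  riscv_iommu_readq(iommu, q->qbr)) + 1;
	q->cnt = 1ULL << order;
	queue_size = q->len * q->cnt;
	q->base = dmam_alloc_coherent(dev, queue_size, &q->base_dma, GFP_KERNEL);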

> +
> +	/*
> +	 * In case we also failed to set PPN, it means the field is hardcoded and the
> +	 * queue resides in I/O memory instead, so get its physical address and
> +	 * ioremap it.
> +	 */
> +	qbr_paddr = ppn_to_phys(qbr_readback);
> +	if (qbr_paddr != q->base_dma) {
> +		dev_info(dev,
> +			 "hardcoded ppn in %s base register, using io memory for the queue\n",
> +			 name);
> +		dev_info(dev, "queue length for %s set to %i\n", name, q->cnt);
> +		q->in_iomem = true;
> +		q->base = ioremap(qbr_paddr, queue_size);
> +		if (!q->base) {
> +			dev_err(dev, "failed to map %s queue (cnt: %u)\n", name, q->cnt);
> +			return -ENOMEM;
> +		}
> +		q->base_dma = qbr_paddr;
> +	} else {
> +		/*
> +		 * We only failed to set the queue size, re-try to allocate memory with
> +		 * the queue size supported by the hw.
> +		 */
> +		dev_info(dev, "hardcoded queue size in %s base register\n", name);
> +		dev_info(dev, "retrying with queue length: %i\n", q->cnt);
> +		q->base = dmam_alloc_coherent(dev, queue_size, &q->base_dma, GFP_KERNEL);

Note that dma_alloc_coherent only guarantees natural alignment here, so 
if you need a minimum alignment of 4KB as the spec claims, you should 
clamp your minimum allocation size to that.
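
i.e. something like:

	queue_size = max_t(size_t, q->len * q->cnt, PAGE_SIZE);
	q->base = dmam_alloc_coherent(dev, queue_size, &q->base_dma, GFP_KERNEL);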

> +		if (!q->base) {
> +			dev_err(dev, "failed to allocate %s queue (cnt: %u)\n",
> +				name, q->cnt);
> +			return -ENOMEM;
> +		}
> +	}
> +
> +	qbr_val = phys_to_ppn(q->base_dma) |
> +	    FIELD_PREP(RISCV_IOMMU_QUEUE_LOGSZ_FIELD, order - 1);
> +	riscv_iommu_writeq(iommu, q->qbr, qbr_val);
> +
> +	/* Final check to make sure hw accepted our write */
> +	qbr_readback = riscv_iommu_readq(iommu, q->qbr);
> +	if (qbr_readback != qbr_val) {
> +		dev_err(dev, "failed to set base register for %s\n", name);
> +		goto fail;
> +	}
> +
> + irq:
> +	if (request_threaded_irq(irq, irq_check, irq_process, IRQF_ONESHOT | IRQF_SHARED,
> +				 dev_name(dev), q)) {
> +		dev_err(dev, "fail to request irq %d for %s\n", irq, name);
> +		goto fail;
> +	}
> +
> +	q->irq = irq;
> +
> +	/* Note: all RIO_xQ_EN/IE fields are at the same offsets */
> +	ret =
> +	    riscv_iommu_queue_ctrl(iommu, q,
> +				   RISCV_IOMMU_QUEUE_ENABLE |
> +				   RISCV_IOMMU_QUEUE_INTR_ENABLE);
> +	if (ret & RISCV_IOMMU_QUEUE_BUSY) {
> +		dev_err(dev, "%s init timeout\n", name);
> +		ret = -EBUSY;
> +		goto fail;
> +	}
> +
> +	return 0;
> +
> + fail:
> +	riscv_iommu_queue_free(iommu, q);
> +	return 0;
> +}
> +
> +/*
> + * I/O MMU Command queue chapter 3.1
> + */
> +
> +static inline void riscv_iommu_cmd_inval_vma(struct riscv_iommu_command *cmd)
> +{
> +	cmd->dword0 =
> +	    FIELD_PREP(RISCV_IOMMU_CMD_OPCODE,
> +		       RISCV_IOMMU_CMD_IOTINVAL_OPCODE) | FIELD_PREP(RISCV_IOMMU_CMD_FUNC,
> +								     RISCV_IOMMU_CMD_IOTINVAL_FUNC_VMA);

Interesting indentation... :/
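
i.e. something more like:

	cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE,
				 RISCV_IOMMU_CMD_IOTINVAL_OPCODE) |
		      FIELD_PREP(RISCV_IOMMU_CMD_FUNC,
				 RISCV_IOMMU_CMD_IOTINVAL_FUNC_VMA);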

> +	cmd->dword1 = 0;
> +}
> +
> +static inline void riscv_iommu_cmd_inval_set_addr(struct riscv_iommu_command *cmd,
> +						  u64 addr)
> +{
> +	cmd->dword0 |= RISCV_IOMMU_CMD_IOTINVAL_AV;
> +	cmd->dword1 = addr;
> +}
> +
> +static inline void riscv_iommu_cmd_inval_set_pscid(struct riscv_iommu_command *cmd,
> +						   unsigned pscid)
> +{
> +	cmd->dword0 |= FIELD_PREP(RISCV_IOMMU_CMD_IOTINVAL_PSCID, pscid) |
> +	    RISCV_IOMMU_CMD_IOTINVAL_PSCV;
> +}
> +
> +static inline void riscv_iommu_cmd_inval_set_gscid(struct riscv_iommu_command *cmd,
> +						   unsigned gscid)
> +{
> +	cmd->dword0 |= FIELD_PREP(RISCV_IOMMU_CMD_IOTINVAL_GSCID, gscid) |
> +	    RISCV_IOMMU_CMD_IOTINVAL_GV;
> +}
> +
> +static inline void riscv_iommu_cmd_iofence(struct riscv_iommu_command *cmd)
> +{
> +	cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_IOFENCE_OPCODE) |
> +	    FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_IOFENCE_FUNC_C);
> +	cmd->dword1 = 0;
> +}
> +
> +static inline void riscv_iommu_cmd_iofence_set_av(struct riscv_iommu_command *cmd,
> +						  u64 addr, u32 data)
> +{
> +	cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_IOFENCE_OPCODE) |
> +	    FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_IOFENCE_FUNC_C) |
> +	    FIELD_PREP(RISCV_IOMMU_CMD_IOFENCE_DATA, data) | RISCV_IOMMU_CMD_IOFENCE_AV;
> +	cmd->dword1 = (addr >> 2);
> +}
> +
> +static inline void riscv_iommu_cmd_iodir_inval_ddt(struct riscv_iommu_command *cmd)
> +{
> +	cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_IODIR_OPCODE) |
> +	    FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_DDT);
> +	cmd->dword1 = 0;
> +}
> +
> +static inline void riscv_iommu_cmd_iodir_inval_pdt(struct riscv_iommu_command *cmd)
> +{
> +	cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_IODIR_OPCODE) |
> +	    FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_PDT);
> +	cmd->dword1 = 0;
> +}
> +
> +static inline void riscv_iommu_cmd_iodir_set_did(struct riscv_iommu_command *cmd,
> +						 unsigned devid)
> +{
> +	cmd->dword0 |=
> +	    FIELD_PREP(RISCV_IOMMU_CMD_IODIR_DID, devid) | RISCV_IOMMU_CMD_IODIR_DV;
> +}
> +
> +/* TODO: Convert into lock-less MPSC implementation. */
> +static bool riscv_iommu_post_sync(struct riscv_iommu_device *iommu,
> +				  struct riscv_iommu_command *cmd, bool sync)
> +{
> +	u32 head, tail, next, last;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&iommu->cq_lock, flags);
> +	head = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQH) & (iommu->cmdq.cnt - 1);
> +	tail = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQT) & (iommu->cmdq.cnt - 1);
> +	last = iommu->cmdq.lui;
> +	if (tail != last) {
> +		spin_unlock_irqrestore(&iommu->cq_lock, flags);
> +		/*
> +		 * FIXME: This is a workaround for dropped MMIO writes/reads on QEMU platform.
> +		 *        While debugging of the problem is still ongoing, this provides
> +		 *        a simple implementation of a try-again policy.
> +		 *        Will be changed to a lock-less algorithm in the future.
> +		 */
> +		dev_dbg(iommu->dev, "IOMMU CQT: %x != %x (1st)\n", last, tail);
> +		spin_lock_irqsave(&iommu->cq_lock, flags);
> +		tail =
> +		    riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQT) & (iommu->cmdq.cnt - 1);
> +		last = iommu->cmdq.lui;
> +		if (tail != last) {
> +			spin_unlock_irqrestore(&iommu->cq_lock, flags);
> +			dev_dbg(iommu->dev, "IOMMU CQT: %x != %x (2nd)\n", last, tail);
> +			spin_lock_irqsave(&iommu->cq_lock, flags);
> +		}
> +	}
> +
> +	next = (last + 1) & (iommu->cmdq.cnt - 1);
> +	if (next != head) {
> +		struct riscv_iommu_command *ptr = iommu->cmdq.base;
> +		ptr[last] = *cmd;
> +		wmb();
> +		riscv_iommu_writel(iommu, RISCV_IOMMU_REG_CQT, next);
> +		iommu->cmdq.lui = next;
> +	}
> +
> +	spin_unlock_irqrestore(&iommu->cq_lock, flags);
> +
> +	if (sync && head != next) {
> +		cycles_t start_time = get_cycles();
> +		while (1) {
> +			last = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQH) &
> +			    (iommu->cmdq.cnt - 1);
> +			if (head < next && last >= next)
> +				break;
> +			if (head > next && last < head && last >= next)
> +				break;
> +			if (RISCV_IOMMU_TIMEOUT < (get_cycles() - start_time)) {
> +				dev_err(iommu->dev, "IOFENCE TIMEOUT\n");
> +				return false;
> +			}
> +			cpu_relax();
> +		}
> +	}
> +
> +	return next != head;
> +}
> +
> +static bool riscv_iommu_post(struct riscv_iommu_device *iommu,
> +			     struct riscv_iommu_command *cmd)
> +{
> +	return riscv_iommu_post_sync(iommu, cmd, false);
> +}
> +
> +static bool riscv_iommu_iofence_sync(struct riscv_iommu_device *iommu)
> +{
> +	struct riscv_iommu_command cmd;
> +	riscv_iommu_cmd_iofence(&cmd);
> +	return riscv_iommu_post_sync(iommu, &cmd, true);
> +}
> +
> +/* Command queue primary interrupt handler */
> +static irqreturn_t riscv_iommu_cmdq_irq_check(int irq, void *data)
> +{
> +	struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
> +	struct riscv_iommu_device *iommu =
> +	    container_of(q, struct riscv_iommu_device, cmdq);
> +	u32 ipsr = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_IPSR);
> +	if (ipsr & RISCV_IOMMU_IPSR_CIP)
> +		return IRQ_WAKE_THREAD;
> +	return IRQ_NONE;
> +}
> +
> +/* Command queue interrupt handler thread function */
> +static irqreturn_t riscv_iommu_cmdq_process(int irq, void *data)
> +{
> +	struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
> +	struct riscv_iommu_device *iommu;
> +	unsigned ctrl;
> +
> +	iommu = container_of(q, struct riscv_iommu_device, cmdq);
> +
> +	/* Error reporting, clear error reports if any. */
> +	ctrl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_CQCSR);
> +	if (ctrl & (RISCV_IOMMU_CQCSR_CQMF |
> +		    RISCV_IOMMU_CQCSR_CMD_TO | RISCV_IOMMU_CQCSR_CMD_ILL)) {
> +		riscv_iommu_queue_ctrl(iommu, &iommu->cmdq, ctrl);
> +		dev_warn_ratelimited(iommu->dev,
> +				     "Command queue error: fault: %d tout: %d err: %d\n",
> +				     !!(ctrl & RISCV_IOMMU_CQCSR_CQMF),
> +				     !!(ctrl & RISCV_IOMMU_CQCSR_CMD_TO),
> +				     !!(ctrl & RISCV_IOMMU_CQCSR_CMD_ILL));
> +	}
> +
> +	/* Clear fault interrupt pending. */
> +	riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IPSR, RISCV_IOMMU_IPSR_CIP);
> +
> +	return IRQ_HANDLED;
> +}
> +
> +/*
> + * Fault/event queue, chapter 3.2
> + */
> +
> +static void riscv_iommu_fault_report(struct riscv_iommu_device *iommu,
> +				     struct riscv_iommu_fq_record *event)
> +{
> +	unsigned err, devid;
> +
> +	err = FIELD_GET(RISCV_IOMMU_FQ_HDR_CAUSE, event->hdr);
> +	devid = FIELD_GET(RISCV_IOMMU_FQ_HDR_DID, event->hdr);
> +
> +	dev_warn_ratelimited(iommu->dev,
> +			     "Fault %d devid: %d iotval: %llx iotval2: %llx\n",
> +			     err, devid, event->iotval, event->iotval2);
> +}
> +
> +/* Fault/event queue primary interrupt handler */
> +static irqreturn_t riscv_iommu_fltq_irq_check(int irq, void *data)
> +{
> +	struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
> +	struct riscv_iommu_device *iommu =
> +	    container_of(q, struct riscv_iommu_device, fltq);
> +	u32 ipsr = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_IPSR);
> +	if (ipsr & RISCV_IOMMU_IPSR_FIP)
> +		return IRQ_WAKE_THREAD;
> +	return IRQ_NONE;
> +}
> +
> +/* Fault queue interrupt handler thread function */
> +static irqreturn_t riscv_iommu_fltq_process(int irq, void *data)
> +{
> +	struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
> +	struct riscv_iommu_device *iommu;
> +	struct riscv_iommu_fq_record *events;
> +	unsigned cnt, len, idx, ctrl;
> +
> +	iommu = container_of(q, struct riscv_iommu_device, fltq);
> +	events = (struct riscv_iommu_fq_record *)q->base;
> +
> +	/* Error reporting, clear error reports if any. */
> +	ctrl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_FQCSR);
> +	if (ctrl & (RISCV_IOMMU_FQCSR_FQMF | RISCV_IOMMU_FQCSR_FQOF)) {
> +		riscv_iommu_queue_ctrl(iommu, &iommu->fltq, ctrl);
> +		dev_warn_ratelimited(iommu->dev,
> +				     "Fault queue error: fault: %d full: %d\n",
> +				     !!(ctrl & RISCV_IOMMU_FQCSR_FQMF),
> +				     !!(ctrl & RISCV_IOMMU_FQCSR_FQOF));
> +	}
> +
> +	/* Clear fault interrupt pending. */
> +	riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IPSR, RISCV_IOMMU_IPSR_FIP);
> +
> +	/* Report fault events. */
> +	do {
> +		cnt = riscv_iommu_queue_consume(iommu, q, &idx);
> +		if (!cnt)
> +			break;
> +		for (len = 0; len < cnt; idx++, len++)
> +			riscv_iommu_fault_report(iommu, &events[idx]);
> +		riscv_iommu_queue_release(iommu, q, cnt);
> +	} while (1);
> +
> +	return IRQ_HANDLED;
> +}
> +
> +/*
> + * Page request queue, chapter 3.3
> + */
> +
>   /*
>    * Register device for IOMMU tracking.
>    */
> @@ -97,6 +600,54 @@ static void riscv_iommu_add_device(struct riscv_iommu_device *iommu, struct devi
>   	mutex_unlock(&iommu->eps_mutex);
>   }
>   
> +/* Page request interface queue primary interrupt handler */
> +static irqreturn_t riscv_iommu_priq_irq_check(int irq, void *data)
> +{
> +	struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
> +	struct riscv_iommu_device *iommu =
> +	    container_of(q, struct riscv_iommu_device, priq);
> +	u32 ipsr = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_IPSR);
> +	if (ipsr & RISCV_IOMMU_IPSR_PIP)
> +		return IRQ_WAKE_THREAD;
> +	return IRQ_NONE;
> +}
> +
> +/* Page request interface queue interrupt handler thread function */
> +static irqreturn_t riscv_iommu_priq_process(int irq, void *data)
> +{
> +	struct riscv_iommu_queue *q = (struct riscv_iommu_queue *)data;
> +	struct riscv_iommu_device *iommu;
> +	struct riscv_iommu_pq_record *requests;
> +	unsigned cnt, idx, ctrl;
> +
> +	iommu = container_of(q, struct riscv_iommu_device, priq);
> +	requests = (struct riscv_iommu_pq_record *)q->base;
> +
> +	/* Error reporting, clear error reports if any. */
> +	ctrl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_PQCSR);
> +	if (ctrl & (RISCV_IOMMU_PQCSR_PQMF | RISCV_IOMMU_PQCSR_PQOF)) {
> +		riscv_iommu_queue_ctrl(iommu, &iommu->priq, ctrl);
> +		dev_warn_ratelimited(iommu->dev,
> +				     "Page request queue error: fault: %d full: %d\n",
> +				     !!(ctrl & RISCV_IOMMU_PQCSR_PQMF),
> +				     !!(ctrl & RISCV_IOMMU_PQCSR_PQOF));
> +	}
> +
> +	/* Clear page request interrupt pending. */
> +	riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IPSR, RISCV_IOMMU_IPSR_PIP);
> +
> +	/* Process page requests. */
> +	do {
> +		cnt = riscv_iommu_queue_consume(iommu, q, &idx);
> +		if (!cnt)
> +			break;
> +		dev_warn(iommu->dev, "unexpected %u page requests\n", cnt);
> +		riscv_iommu_queue_release(iommu, q, cnt);
> +	} while (1);
> +
> +	return IRQ_HANDLED;
> +}
> +
>   /*
>    * Endpoint management
>    */
> @@ -350,7 +901,29 @@ static void riscv_iommu_flush_iotlb_range(struct iommu_domain *iommu_domain,
>   					  unsigned long *start, unsigned long *end,
>   					  size_t *pgsize)
>   {
> -	/* Command interface not implemented */
> +	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
> +	struct riscv_iommu_command cmd;
> +	unsigned long iova;
> +
> +	if (domain->mode == RISCV_IOMMU_DC_FSC_MODE_BARE)

That should probably not happen - things shouldn't be calling TLB ops on 
identity domains (and ideally your identity domains wouldn't even *have* 
iotlb callbacks...)
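
e.g. give identity domains their own minimal ops so the core never
invokes TLB maintenance on them (sketch):

	static const struct iommu_domain_ops riscv_iommu_identity_ops = {
		.free		= riscv_iommu_domain_free,
		.attach_dev	= riscv_iommu_attach_dev,
	};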

> +		return;
> +
> +	/* Domain not attached to an IOMMU! */
> +	BUG_ON(!domain->iommu);

However I'm not sure how iommu_create_device_direct_mappings() isn't 
hitting that?

Thanks,
Robin.

> +
> +	riscv_iommu_cmd_inval_vma(&cmd);
> +	riscv_iommu_cmd_inval_set_pscid(&cmd, domain->pscid);
> +
> +	if (start && end && pgsize) {
> +		/* Cover only the range that is needed */
> +		for (iova = *start; iova <= *end; iova += *pgsize) {
> +			riscv_iommu_cmd_inval_set_addr(&cmd, iova);
> +			riscv_iommu_post(domain->iommu, &cmd);
> +		}
> +	} else {
> +		riscv_iommu_post(domain->iommu, &cmd);
> +	}
> +	riscv_iommu_iofence_sync(domain->iommu);
>   }
>   
>   static void riscv_iommu_flush_iotlb_all(struct iommu_domain *iommu_domain)
> @@ -610,6 +1183,9 @@ void riscv_iommu_remove(struct riscv_iommu_device *iommu)
>   	iommu_device_unregister(&iommu->iommu);
>   	iommu_device_sysfs_remove(&iommu->iommu);
>   	riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_OFF);
> +	riscv_iommu_queue_free(iommu, &iommu->cmdq);
> +	riscv_iommu_queue_free(iommu, &iommu->fltq);
> +	riscv_iommu_queue_free(iommu, &iommu->priq);
>   }
>   
>   int riscv_iommu_init(struct riscv_iommu_device *iommu)
> @@ -632,6 +1208,16 @@ int riscv_iommu_init(struct riscv_iommu_device *iommu)
>   	}
>   #endif
>   
> +	/*
> +	 * Assign queue lengths from module parameters if not already
> +	 * set in the device tree.
> +	 */
> +	if (!iommu->cmdq_len)
> +		iommu->cmdq_len = cmdq_length;
> +	if (!iommu->fltq_len)
> +		iommu->fltq_len = fltq_length;
> +	if (!iommu->priq_len)
> +		iommu->priq_len = priq_length;
>   	/* Clear any pending interrupt flag. */
>   	riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IPSR,
>   			   RISCV_IOMMU_IPSR_CIP |
> @@ -639,7 +1225,20 @@ int riscv_iommu_init(struct riscv_iommu_device *iommu)
>   			   RISCV_IOMMU_IPSR_PMIP | RISCV_IOMMU_IPSR_PIP);
>   	spin_lock_init(&iommu->cq_lock);
>   	mutex_init(&iommu->eps_mutex);
> +	ret = riscv_iommu_queue_init(iommu, RISCV_IOMMU_COMMAND_QUEUE);
> +	if (ret)
> +		goto fail;
> +	ret = riscv_iommu_queue_init(iommu, RISCV_IOMMU_FAULT_QUEUE);
> +	if (ret)
> +		goto fail;
> +	if (!(iommu->cap & RISCV_IOMMU_CAP_ATS))
> +		goto no_ats;
> +
> +	ret = riscv_iommu_queue_init(iommu, RISCV_IOMMU_PAGE_REQUEST_QUEUE);
> +	if (ret)
> +		goto fail;
>   
> + no_ats:
>   	ret = riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_BARE);
>   
>   	if (ret) {
> @@ -663,5 +1262,8 @@ int riscv_iommu_init(struct riscv_iommu_device *iommu)
>   	return 0;
>    fail:
>   	riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_OFF);
> +	riscv_iommu_queue_free(iommu, &iommu->priq);
> +	riscv_iommu_queue_free(iommu, &iommu->fltq);
> +	riscv_iommu_queue_free(iommu, &iommu->cmdq);
>   	return ret;
>   }
> diff --git a/drivers/iommu/riscv/iommu.h b/drivers/iommu/riscv/iommu.h
> index 7dc9baa59a50..04148a2a8ffd 100644
> --- a/drivers/iommu/riscv/iommu.h
> +++ b/drivers/iommu/riscv/iommu.h
> @@ -28,6 +28,24 @@
>   #define IOMMU_PAGE_SIZE_1G	BIT_ULL(30)
>   #define IOMMU_PAGE_SIZE_512G	BIT_ULL(39)
>   
> +struct riscv_iommu_queue {
> +	dma_addr_t base_dma;	/* ring buffer bus address */
> +	void *base;		/* ring buffer pointer */
> +	size_t len;		/* single item length */
> +	u32 cnt;		/* items count */
> +	u32 lui;		/* last used index, consumer/producer share */
> +	unsigned qbr;		/* queue base register offset */
> +	unsigned qcr;		/* queue control and status register offset */
> +	int irq;		/* registered interrupt number */
> +	bool in_iomem;		/* indicates queue data are in I/O memory  */
> +};
> +
> +enum riscv_queue_ids {
> +	RISCV_IOMMU_COMMAND_QUEUE	= 0,
> +	RISCV_IOMMU_FAULT_QUEUE		= 1,
> +	RISCV_IOMMU_PAGE_REQUEST_QUEUE	= 2
> +};
> +
>   struct riscv_iommu_device {
>   	struct iommu_device iommu;	/* iommu core interface */
>   	struct device *dev;		/* iommu hardware */
> @@ -42,6 +60,11 @@ struct riscv_iommu_device {
>   	int irq_pm;
>   	int irq_priq;
>   
> +	/* Queue lengths */
> +	int cmdq_len;
> +	int fltq_len;
> +	int priq_len;
> +
>   	/* supported and enabled hardware capabilities */
>   	u64 cap;
>   
> @@ -53,6 +76,11 @@ struct riscv_iommu_device {
>   	unsigned ddt_mode;
>   	bool ddtp_in_iomem;
>   
> +	/* hardware queues */
> +	struct riscv_iommu_queue cmdq;
> +	struct riscv_iommu_queue fltq;
> +	struct riscv_iommu_queue priq;
> +
>   	/* Connected end-points */
>   	struct rb_root eps;
>   	struct mutex eps_mutex;


* Re: [PATCH 07/11] RISC-V: drivers/iommu/riscv: Add device context support
  2023-07-19 19:33 ` [PATCH 07/11] RISC-V: drivers/iommu/riscv: Add device context support Tomasz Jeznach
@ 2023-08-16 19:08   ` Robin Murphy
  0 siblings, 0 replies; 86+ messages in thread
From: Robin Murphy @ 2023-08-16 19:08 UTC (permalink / raw)
  To: Tomasz Jeznach, Joerg Roedel, Will Deacon, Paul Walmsley
  Cc: Palmer Dabbelt, Albert Ou, Anup Patel, Sunil V L,
	Nick Kossifidis, Sebastien Boeuf, iommu, linux-riscv,
	linux-kernel, linux

On 2023-07-19 20:33, Tomasz Jeznach wrote:
> Introduces per-device translation context, with 1-, 2- or 3-level
> device directory tree structures.
> 
> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
> ---
>   drivers/iommu/riscv/iommu.c | 163 ++++++++++++++++++++++++++++++++++--
>   drivers/iommu/riscv/iommu.h |   1 +
>   2 files changed, 158 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
> index 5c4cf9875302..9ee7d2b222b5 100644
> --- a/drivers/iommu/riscv/iommu.c
> +++ b/drivers/iommu/riscv/iommu.c
> @@ -41,7 +41,7 @@ MODULE_ALIAS("riscv-iommu");
>   MODULE_LICENSE("GPL v2");
>   
>   /* Global IOMMU params. */
> -static int ddt_mode = RISCV_IOMMU_DDTP_MODE_BARE;
> +static int ddt_mode = RISCV_IOMMU_DDTP_MODE_3LVL;
>   module_param(ddt_mode, int, 0644);
>   MODULE_PARM_DESC(ddt_mode, "Device Directory Table mode.");
>   
> @@ -452,6 +452,14 @@ static bool riscv_iommu_post(struct riscv_iommu_device *iommu,
>   	return riscv_iommu_post_sync(iommu, cmd, false);
>   }
>   
> +static bool riscv_iommu_iodir_inv_devid(struct riscv_iommu_device *iommu, unsigned devid)
> +{
> +	struct riscv_iommu_command cmd;
> +	riscv_iommu_cmd_iodir_inval_ddt(&cmd);
> +	riscv_iommu_cmd_iodir_set_did(&cmd, devid);
> +	return riscv_iommu_post(iommu, &cmd);
> +}
> +
>   static bool riscv_iommu_iofence_sync(struct riscv_iommu_device *iommu)
>   {
>   	struct riscv_iommu_command cmd;
> @@ -671,6 +679,94 @@ static bool riscv_iommu_capable(struct device *dev, enum iommu_cap cap)
>   	return false;
>   }
>   
> +/* TODO: implement proper device context management, e.g. teardown flow */
> +
> +/* Lookup or initialize device directory info structure. */
> +static struct riscv_iommu_dc *riscv_iommu_get_dc(struct riscv_iommu_device *iommu,
> +						 unsigned devid)
> +{
> +	const bool base_format = !(iommu->cap & RISCV_IOMMU_CAP_MSI_FLAT);
> +	unsigned depth = iommu->ddt_mode - RISCV_IOMMU_DDTP_MODE_1LVL;
> +	u8 ddi_bits[3] = { 0 };
> +	u64 *ddtp = NULL, ddt;
> +
> +	if (iommu->ddt_mode == RISCV_IOMMU_DDTP_MODE_OFF ||
> +	    iommu->ddt_mode == RISCV_IOMMU_DDTP_MODE_BARE)
> +		return NULL;

I don't see how the driver can ever be useful without a DDT - I'd have 
thought that you only ever want to use one of those modes on probe 
failure or remove.

> +
> +	/* Make sure the mode is valid */
> +	if (iommu->ddt_mode > RISCV_IOMMU_DDTP_MODE_MAX)
> +		return NULL;
> +
> +	/*
> +	 * Device id partitioning for base format:
> +	 * DDI[0]: bits 0 - 6   (1st level) (7 bits)
> +	 * DDI[1]: bits 7 - 15  (2nd level) (9 bits)
> +	 * DDI[2]: bits 16 - 23 (3rd level) (8 bits)
> +	 *
> +	 * For extended format:
> +	 * DDI[0]: bits 0 - 5   (1st level) (6 bits)
> +	 * DDI[1]: bits 6 - 14  (2nd level) (9 bits)
> +	 * DDI[2]: bits 15 - 23 (3rd level) (9 bits)
> +	 */
> +	if (base_format) {
> +		ddi_bits[0] = 7;
> +		ddi_bits[1] = 7 + 9;
> +		ddi_bits[2] = 7 + 9 + 8;
> +	} else {
> +		ddi_bits[0] = 6;
> +		ddi_bits[1] = 6 + 9;
> +		ddi_bits[2] = 6 + 9 + 9;
> +	}
> +
> +	/* Make sure device id is within range */
> +	if (devid >= (1 << ddi_bits[depth]))
> +		return NULL;
> +
> +	/* Get to the level of the non-leaf node that holds the device context */
> +	for (ddtp = (u64 *) iommu->ddtp; depth-- > 0;) {
> +		const int split = ddi_bits[depth];
> +		/*
> +		 * Each non-leaf node is 64bits wide and on each level
> +		 * nodes are indexed by DDI[depth].
> +		 */
> +		ddtp += (devid >> split) & 0x1FF;
> +
> + retry:
> +		/*
> +		 * Check if this node has been populated and if not
> +		 * allocate a new level and populate it.
> +		 */
> +		ddt = READ_ONCE(*ddtp);
> +		if (ddt & RISCV_IOMMU_DDTE_VALID) {
> +			ddtp = __va(ppn_to_phys(ddt));
> +		} else {
> +			u64 old, new = get_zeroed_page(GFP_KERNEL);
> +			if (!new)
> +				return NULL;
> +
> +			old = cmpxchg64_relaxed(ddtp, ddt,
> +						phys_to_ppn(__pa(new)) |
> +						RISCV_IOMMU_DDTE_VALID);
> +
> +			if (old != ddt) {
> +				free_page(new);
> +				goto retry;
> +			}
> +
> +			ddtp = (u64 *) new;
> +		}
> +	}
> +
> +	/*
> +	 * Grab the node that matches DDI[depth], note that when using base
> +	 * format the device context is 4 * 64bits, and the extended format
> +	 * is 8 * 64bits, hence the (3 - base_format) below.
> +	 */
> +	ddtp += (devid & ((64 << base_format) - 1)) << (3 - base_format);
> +	return (struct riscv_iommu_dc *)ddtp;
> +}
> +
>   static struct iommu_device *riscv_iommu_probe_device(struct device *dev)
>   {
>   	struct riscv_iommu_device *iommu;
> @@ -708,6 +804,9 @@ static struct iommu_device *riscv_iommu_probe_device(struct device *dev)
>   	ep->iommu = iommu;
>   	ep->dev = dev;
>   
> +	/* Initial DC pointer can be NULL if IOMMU is configured in OFF or BARE mode */
> +	ep->dc = riscv_iommu_get_dc(iommu, ep->devid);
> +
>   	dev_info(iommu->dev, "adding device to iommu with devid %i in domain %i\n",
>   		ep->devid, ep->domid);
>   
> @@ -734,6 +833,16 @@ static void riscv_iommu_release_device(struct device *dev)
>   	list_del(&ep->domain);
>   	mutex_unlock(&ep->lock);
>   
> +	if (ep->dc) {
> +		// this should be already done by domain detach.

What's domain detach? ;)

> +		ep->dc->tc = 0ULL;
> +		wmb();
> +		ep->dc->fsc = 0ULL;
> +		ep->dc->iohgatp = 0ULL;
> +		wmb();
> +		riscv_iommu_iodir_inv_devid(iommu, ep->devid);
> +	}
> +
>   	/* Remove endpoint from IOMMU tracking structures */
>   	mutex_lock(&iommu->eps_mutex);
>   	rb_erase(&ep->node, &iommu->eps);
> @@ -853,11 +962,21 @@ static int riscv_iommu_domain_finalize(struct riscv_iommu_domain *domain,
>   	return 0;
>   }
>   
> +static u64 riscv_iommu_domain_atp(struct riscv_iommu_domain *domain)
> +{
> +	u64 atp = FIELD_PREP(RISCV_IOMMU_DC_FSC_MODE, domain->mode);
> +	if (domain->mode != RISCV_IOMMU_DC_FSC_MODE_BARE)
> +		atp |= FIELD_PREP(RISCV_IOMMU_DC_FSC_PPN, virt_to_pfn(domain->pgd_root));
> +	return atp;
> +}
> +
>   static int riscv_iommu_attach_dev(struct iommu_domain *iommu_domain, struct device *dev)
>   {
>   	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
>   	struct riscv_iommu_endpoint *ep = dev_iommu_priv_get(dev);
> +	struct riscv_iommu_dc *dc = ep->dc;
>   	int ret;
> +	u64 val;
>   
>   	/* PSCID not valid */
>   	if ((int)domain->pscid < 0)
> @@ -880,17 +999,44 @@ static int riscv_iommu_attach_dev(struct iommu_domain *iommu_domain, struct devi
>   		return ret;
>   	}
>   
> -	if (ep->iommu->ddt_mode != RISCV_IOMMU_DDTP_MODE_BARE ||
> -	    domain->domain.type != IOMMU_DOMAIN_IDENTITY) {
> -		dev_warn(dev, "domain type %d not supported\n",
> -		    domain->domain.type);
> +	if (ep->iommu->ddt_mode == RISCV_IOMMU_DDTP_MODE_BARE &&
> +	    domain->domain.type == IOMMU_DOMAIN_IDENTITY) {
> +		dev_info(dev, "domain type %d attached w/ PSCID %u\n",
> +		    domain->domain.type, domain->pscid);
> +		return 0;
> +	}
> +
> +	if (!dc) {
>   		return -ENODEV;
>   	}
>   
> +	/*
> +	 * S-Stage translation table. G-Stage remains unmodified (BARE).
> +	 */
> +	val = FIELD_PREP(RISCV_IOMMU_DC_TA_PSCID, domain->pscid);
> +
> +	dc->ta = cpu_to_le64(val);
> +	dc->fsc = cpu_to_le64(riscv_iommu_domain_atp(domain));
> +
> +	wmb();
> +
> +	/* Mark device context as valid, synchronise device context cache. */
> +	val = RISCV_IOMMU_DC_TC_V;
> +
> +	if (ep->iommu->cap & RISCV_IOMMU_CAP_AMO) {
> +		val |= RISCV_IOMMU_DC_TC_GADE |
> +		       RISCV_IOMMU_DC_TC_SADE;
> +	}
> +
> +	dc->tc = cpu_to_le64(val);
> +	wmb();
> +
>   	list_add_tail(&ep->domain, &domain->endpoints);
>   	mutex_unlock(&ep->lock);
>   	mutex_unlock(&domain->lock);
>   
> +	riscv_iommu_iodir_inv_devid(ep->iommu, ep->devid);
> +
>   	dev_info(dev, "domain type %d attached w/ PSCID %u\n",
>   	    domain->domain.type, domain->pscid);
>   
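
A note on the barrier pairing above, spelled out (reasoning only, the code is
unchanged):

	/*
	 * Update protocol implied by the hunk:
	 *
	 *   dc->ta = ...; dc->fsc = ...;      translation fields first
	 *   wmb();                            ...visible before V is set
	 *   dc->tc = V [| GADE | SADE];       only now mark the DC valid
	 *   wmb();                            ...visible before the inval
	 *   riscv_iommu_iodir_inv_devid();    force the IOMMU to re-fetch
	 *
	 * so the IOMMU can never observe a valid but half-initialised
	 * device context.
	 */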
> @@ -1239,7 +1385,12 @@ int riscv_iommu_init(struct riscv_iommu_device *iommu)
>   		goto fail;
>   
>    no_ats:
> -	ret = riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_BARE);
> +	if (iommu_default_passthrough()) {
> +		dev_info(dev, "iommu set to passthrough mode\n");
> +		ret = riscv_iommu_enable(iommu, RISCV_IOMMU_DDTP_MODE_BARE);

Yeah, disabling the whole IOMMU is not what default passthrough means... 
drivers should not care about that at all; it only affects the core 
code's choice of default domain type. Even if that is identity, 
translation absolutely still needs to be available on a per-device 
basis, for unmanaged domains or default domain changes via sysfs.

Thanks,
Robin.

> +	} else {
> +		ret = riscv_iommu_enable(iommu, ddt_mode);
> +	}
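
Restated as a sketch of the direction (illustrative, not Robin's code): the
driver would bring up the DDT unconditionally and let the core's
default-domain policy decide what passthrough means:

	/* Sketch: no special-casing of iommu_default_passthrough() here;
	 * an identity default domain is the core's decision, not ours. */
	ret = riscv_iommu_enable(iommu, ddt_mode);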
>   
>   	if (ret) {
>   		dev_err(dev, "cannot enable iommu device (%d)\n", ret);
> diff --git a/drivers/iommu/riscv/iommu.h b/drivers/iommu/riscv/iommu.h
> index 04148a2a8ffd..9140df71e17b 100644
> --- a/drivers/iommu/riscv/iommu.h
> +++ b/drivers/iommu/riscv/iommu.h
> @@ -105,6 +105,7 @@ struct riscv_iommu_endpoint {
>   	unsigned devid;      			/* PCI bus:device:function number */
>   	unsigned domid;    			/* PCI domain number, segment */
>   	struct rb_node node;    		/* device tracking node (lookup by devid) */
> +	struct riscv_iommu_dc *dc;		/* device context pointer */
>   	struct riscv_iommu_device *iommu;	/* parent iommu device */
>   
>   	struct mutex lock;


* Re: [PATCH 08/11] RISC-V: drivers/iommu/riscv: Add page table support
  2023-07-19 19:33 ` [PATCH 08/11] RISC-V: drivers/iommu/riscv: Add page table support Tomasz Jeznach
  2023-07-25 13:13   ` Zong Li
  2023-07-31  7:19   ` Zong Li
@ 2023-08-16 21:04   ` Robin Murphy
  2 siblings, 0 replies; 86+ messages in thread
From: Robin Murphy @ 2023-08-16 21:04 UTC (permalink / raw)
  To: Tomasz Jeznach, Joerg Roedel, Will Deacon, Paul Walmsley
  Cc: Palmer Dabbelt, Albert Ou, Anup Patel, Sunil V L,
	Nick Kossifidis, Sebastien Boeuf, iommu, linux-riscv,
	linux-kernel, linux

On 2023-07-19 20:33, Tomasz Jeznach wrote:
> Introduce I/O page-level translation services with 4K, 2M, and 1G page
> size support, and enable the page-level iommu_map/unmap domain interfaces.
> 
> Co-developed-by: Sebastien Boeuf <seb@rivosinc.com>
> Signed-off-by: Sebastien Boeuf <seb@rivosinc.com>
> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
> ---
>   drivers/iommu/io-pgtable.c       |   3 +
>   drivers/iommu/riscv/Makefile     |   2 +-
>   drivers/iommu/riscv/io_pgtable.c | 266 +++++++++++++++++++++++++++++++
>   drivers/iommu/riscv/iommu.c      |  40 +++--
>   drivers/iommu/riscv/iommu.h      |   1 +
>   include/linux/io-pgtable.h       |   2 +
>   6 files changed, 297 insertions(+), 17 deletions(-)
>   create mode 100644 drivers/iommu/riscv/io_pgtable.c
> 
> diff --git a/drivers/iommu/io-pgtable.c b/drivers/iommu/io-pgtable.c
> index b843fcd365d2..c4807175934f 100644
> --- a/drivers/iommu/io-pgtable.c
> +++ b/drivers/iommu/io-pgtable.c
> @@ -32,6 +32,9 @@ io_pgtable_init_table[IO_PGTABLE_NUM_FMTS] = {
>   	[AMD_IOMMU_V1] = &io_pgtable_amd_iommu_v1_init_fns,
>   	[AMD_IOMMU_V2] = &io_pgtable_amd_iommu_v2_init_fns,
>   #endif
> +#ifdef CONFIG_RISCV_IOMMU
> +	[RISCV_IOMMU] = &io_pgtable_riscv_init_fns,
> +#endif
>   };
>   
>   struct io_pgtable_ops *alloc_io_pgtable_ops(enum io_pgtable_fmt fmt,
> diff --git a/drivers/iommu/riscv/Makefile b/drivers/iommu/riscv/Makefile
> index 9523eb053cfc..13af452c3052 100644
> --- a/drivers/iommu/riscv/Makefile
> +++ b/drivers/iommu/riscv/Makefile
> @@ -1 +1 @@
> -obj-$(CONFIG_RISCV_IOMMU) += iommu.o iommu-pci.o iommu-platform.o iommu-sysfs.o
> \ No newline at end of file
> +obj-$(CONFIG_RISCV_IOMMU) += iommu.o iommu-pci.o iommu-platform.o iommu-sysfs.o io_pgtable.o
> diff --git a/drivers/iommu/riscv/io_pgtable.c b/drivers/iommu/riscv/io_pgtable.c
> new file mode 100644
> index 000000000000..b6e603e6726e
> --- /dev/null
> +++ b/drivers/iommu/riscv/io_pgtable.c
> @@ -0,0 +1,266 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright © 2022-2023 Rivos Inc.
> + *
> + * RISC-V IOMMU page table allocator.
> + *
> + * Authors:
> + *	Tomasz Jeznach <tjeznach@rivosinc.com>
> + *	Sebastien Boeuf <seb@rivosinc.com>
> + */
> +
> +#include <linux/atomic.h>
> +#include <linux/bitops.h>
> +#include <linux/io-pgtable.h>
> +#include <linux/kernel.h>
> +#include <linux/sizes.h>
> +#include <linux/slab.h>
> +#include <linux/types.h>
> +#include <linux/dma-mapping.h>

There's no DMA API usage here. Should there be?

> +
> +#include "iommu.h"
> +
> +#define io_pgtable_to_domain(x) \
> +	container_of((x), struct riscv_iommu_domain, pgtbl)
> +
> +#define io_pgtable_ops_to_domain(x) \
> +	io_pgtable_to_domain(container_of((x), struct io_pgtable, ops))
> +
> +static inline size_t get_page_size(size_t size)
> +{
> +	if (size >= IOMMU_PAGE_SIZE_512G)
> +		return IOMMU_PAGE_SIZE_512G;
> +
> +	if (size >= IOMMU_PAGE_SIZE_1G)
> +		return IOMMU_PAGE_SIZE_1G;
> +
> +	if (size >= IOMMU_PAGE_SIZE_2M)
> +		return IOMMU_PAGE_SIZE_2M;
> +
> +	return IOMMU_PAGE_SIZE_4K;
> +}
> +
> +static void riscv_iommu_pt_walk_free(pmd_t * ptp, unsigned shift, bool root)
> +{
> +	pmd_t *pte, *pt_base;
> +	int i;
> +
> +	if (shift == PAGE_SHIFT)
> +		return;
> +
> +	if (root)
> +		pt_base = ptp;
> +	else
> +		pt_base =
> +		    (pmd_t *) pfn_to_virt(__page_val_to_pfn(pmd_val(*ptp)));
> +
> +	/* Recursively free all sub page table pages */
> +	for (i = 0; i < PTRS_PER_PMD; i++) {
> +		pte = pt_base + i;
> +		if (pmd_present(*pte) && !pmd_leaf(*pte))
> +			riscv_iommu_pt_walk_free(pte, shift - 9, false);
> +	}
> +
> +	/* Now free the current page table page */

Without any TLB maintenance, even if it was live? Maybe walk caches and 
speculative prefetching are a long way from anyone's mind if this is 
still only running under Qemu, but it still makes me uncomfortable to 
see a complete lack of appropriate-looking maintenance in the places I 
would usually expect to. Especially if there's the prospect of the IOMMU 
doing hardware pagetable updates itself (which I see is a thing, even if 
it's not enabled here yet).

> +	if (!root && pmd_present(*pt_base))
> +		free_page((unsigned long)pt_base);
> +}
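
For what it's worth, the maintenance being asked about could slot in roughly
like this (sketch only; riscv_iommu_iotlb_inval_pscid is a hypothetical
helper standing in for an IOTINVAL.VMA submitted through the patch's command
queue):

	/* Hypothetical: flush walk/translation caches for this address
	 * space before any page of a possibly-live table is freed. */
	riscv_iommu_iotlb_inval_pscid(domain->iommu, domain->pscid);
	riscv_iommu_pt_walk_free((pmd_t *)domain->pgd_root, PGDIR_SHIFT, true);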
> +
> +static void riscv_iommu_free_pgtable(struct io_pgtable *iop)
> +{
> +	struct riscv_iommu_domain *domain = io_pgtable_to_domain(iop);
> +	riscv_iommu_pt_walk_free((pmd_t *) domain->pgd_root, PGDIR_SHIFT, true);
> +}
> +
> +static pte_t *riscv_iommu_pt_walk_alloc(pmd_t * ptp, unsigned long iova,
> +					unsigned shift, bool root,
> +					size_t pgsize,
> +					unsigned long (*pd_alloc)(gfp_t),
> +					gfp_t gfp)
> +{
> +	pmd_t *pte;
> +	unsigned long pfn;
> +
> +	if (root)
> +		pte = ptp + ((iova >> shift) & (PTRS_PER_PMD - 1));
> +	else
> +		pte = (pmd_t *) pfn_to_virt(__page_val_to_pfn(pmd_val(*ptp))) +
> +		    ((iova >> shift) & (PTRS_PER_PMD - 1));
> +
> +	if ((1ULL << shift) <= pgsize) {
> +		if (pmd_present(*pte) && !pmd_leaf(*pte))
> +			riscv_iommu_pt_walk_free(pte, shift - 9, false);
> +		return (pte_t *) pte;
> +	}
> +
> +	if (pmd_none(*pte)) {
> +		pfn = pd_alloc ? virt_to_pfn(pd_alloc(gfp)) : 0;
> +		if (!pfn)
> +			return NULL;
> +		set_pmd(pte, __pmd((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE));
> +	}
> +
> +	return riscv_iommu_pt_walk_alloc(pte, iova, shift - 9, false,
> +					 pgsize, pd_alloc, gfp);
> +}
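
For orientation: each recursion step resolves 9 bits (512 entries), so
assuming Sv39, where PGDIR_SHIFT is 30, the walk decomposes an IOVA as
(illustrative, not patch code):

	unsigned int idx2 = (iova >> 30) & 511;	/* root level; 1G leaf possible */
	unsigned int idx1 = (iova >> 21) & 511;	/* 2M leaf */
	unsigned int idx0 = (iova >> 12) & 511;	/* PAGE_SHIFT; 4K leaf */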
> +
> +static pte_t *riscv_iommu_pt_walk_fetch(pmd_t * ptp,
> +					unsigned long iova, unsigned shift,
> +					bool root)
> +{
> +	pmd_t *pte;
> +
> +	if (root)
> +		pte = ptp + ((iova >> shift) & (PTRS_PER_PMD - 1));
> +	else
> +		pte = (pmd_t *) pfn_to_virt(__page_val_to_pfn(pmd_val(*ptp))) +
> +		    ((iova >> shift) & (PTRS_PER_PMD - 1));
> +
> +	if (pmd_leaf(*pte))
> +		return (pte_t *) pte;
> +	else if (pmd_none(*pte))
> +		return NULL;
> +	else if (shift == PAGE_SHIFT)
> +		return NULL;
> +
> +	return riscv_iommu_pt_walk_fetch(pte, iova, shift - 9, false);
> +}
> +
> +static int riscv_iommu_map_pages(struct io_pgtable_ops *ops,
> +				 unsigned long iova, phys_addr_t phys,
> +				 size_t pgsize, size_t pgcount, int prot,
> +				 gfp_t gfp, size_t *mapped)
> +{
> +	struct riscv_iommu_domain *domain = io_pgtable_ops_to_domain(ops);
> +	size_t size = 0;
> +	size_t page_size = get_page_size(pgsize);
> +	pte_t *pte;
> +	pte_t pte_val;
> +	pgprot_t pte_prot;
> +
> +	if (domain->domain.type == IOMMU_DOMAIN_BLOCKED)
> +		return -ENODEV;
> +
> +	if (domain->domain.type == IOMMU_DOMAIN_IDENTITY) {
> +		*mapped = pgsize * pgcount;
> +		return 0;
> +	}

As before, these are utter nonsense, but cannot happen anyway.

> +
> +	pte_prot = (prot & IOMMU_WRITE) ?
> +	    __pgprot(_PAGE_BASE | _PAGE_READ | _PAGE_WRITE | _PAGE_DIRTY) :
> +	    __pgprot(_PAGE_BASE | _PAGE_READ);
> +
> +	while (pgcount--) {
> +		pte =
> +		    riscv_iommu_pt_walk_alloc((pmd_t *) domain->pgd_root, iova,
> +					      PGDIR_SHIFT, true, page_size,
> +					      get_zeroed_page, gfp);
> +		if (!pte) {
> +			*mapped = size;
> +			return -ENOMEM;
> +		}
> +
> +		pte_val = pfn_pte(phys_to_pfn(phys), pte_prot);
> +
> +		set_pte(pte, pte_val);
> +
> +		size += page_size;
> +		iova += page_size;
> +		phys += page_size;
> +	}
> +
> +	*mapped = size;
> +	return 0;
> +}
> +
> +static size_t riscv_iommu_unmap_pages(struct io_pgtable_ops *ops,
> +				      unsigned long iova, size_t pgsize,
> +				      size_t pgcount,
> +				      struct iommu_iotlb_gather *gather)
> +{
> +	struct riscv_iommu_domain *domain = io_pgtable_ops_to_domain(ops);
> +	size_t size = 0;
> +	size_t page_size = get_page_size(pgsize);
> +	pte_t *pte;
> +
> +	if (domain->domain.type == IOMMU_DOMAIN_IDENTITY)
> +		return pgsize * pgcount;

"Yes, non-existent caller, those pages are definitely unmapped and 
inaccessible now. Totally secure. Device couldn't possibly touch them if 
it tried. Would I lie to you?"

> +
> +	while (pgcount--) {
> +		pte = riscv_iommu_pt_walk_fetch((pmd_t *) domain->pgd_root,
> +						iova, PGDIR_SHIFT, true);
> +		if (!pte)
> +			return size;
> +
> +		set_pte(pte, __pte(0));
> +
> +		iommu_iotlb_gather_add_page(&domain->domain, gather, iova,
> +					    pgsize);
> +
> +		size += page_size;
> +		iova += page_size;
> +	}
> +
> +	return size;
> +}
> +
> +static phys_addr_t riscv_iommu_iova_to_phys(struct io_pgtable_ops *ops,
> +					    unsigned long iova)
> +{
> +	struct riscv_iommu_domain *domain = io_pgtable_ops_to_domain(ops);
> +	pte_t *pte;
> +
> +	if (domain->domain.type == IOMMU_DOMAIN_IDENTITY)
> +		return (phys_addr_t) iova;

I mean, even if it was still 2 years ago before the core code handled 
this anyway (and it's only for a couple of broken network drivers doing 
dumb things they shouldn't), why would a sane IOMMU driver even bother 
going to the lengths of allocating io-pgtable ops for an identity domain 
that by definition doesn't use a pagetable!?

> +
> +	pte = riscv_iommu_pt_walk_fetch((pmd_t *) domain->pgd_root,
> +					iova, PGDIR_SHIFT, true);
> +	if (!pte || !pte_present(*pte))
> +		return 0;
> +
> +	return (pfn_to_phys(pte_pfn(*pte)) | (iova & ~PAGE_MASK));
> +}
> +
> +static void riscv_iommu_tlb_inv_all(void *cookie)
> +{
> +}
> +
> +static void riscv_iommu_tlb_inv_walk(unsigned long iova, size_t size,
> +				     size_t granule, void *cookie)
> +{
> +}
> +
> +static void riscv_iommu_tlb_add_page(struct iommu_iotlb_gather *gather,
> +				     unsigned long iova, size_t granule,
> +				     void *cookie)
> +{
> +}
> +
> +static const struct iommu_flush_ops riscv_iommu_flush_ops = {
> +	.tlb_flush_all = riscv_iommu_tlb_inv_all,
> +	.tlb_flush_walk = riscv_iommu_tlb_inv_walk,
> +	.tlb_add_page = riscv_iommu_tlb_add_page,
> +};

...Why? Either implement them properly, or don't implement them at all. 
And if they are implemented it needs to be by the driver, so either way 
they shouldn't be *here*.
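
If they were implemented, the rough shape would be something like this
(sketch only; riscv_iommu_iotlb_inval_range is a hypothetical wrapper around
the driver's command queue, and the cookie is the domain passed to
alloc_io_pgtable_ops):

	static void riscv_iommu_tlb_inv_walk(unsigned long iova, size_t size,
					     size_t granule, void *cookie)
	{
		struct riscv_iommu_domain *domain = cookie;

		/* Hypothetical: IOTINVAL.VMA over [iova, iova + size). */
		riscv_iommu_iotlb_inval_range(domain->iommu, domain->pscid,
					      iova, size);
	}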

> +
> +/* NOTE: cfg should point to riscv_iommu_domain structure member pgtbl.cfg */
> +static struct io_pgtable *riscv_iommu_alloc_pgtable(struct io_pgtable_cfg *cfg,
> +						    void *cookie)
> +{
> +	struct io_pgtable *iop = container_of(cfg, struct io_pgtable, cfg);
> +
> +	cfg->pgsize_bitmap = SZ_4K | SZ_2M | SZ_1G;
> +	cfg->ias = 57;		// va mode, SvXX -> ias
> +	cfg->oas = 57;		// pa mode, or SvXX+4 -> oas

At least IAS should be passed by the driver based on what the IOMMU 
actually supports (and isn't OAS 56?)
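
(For reference: the RV64 PTE carries a 44-bit PPN, so the largest physical
address is 44 + 12 = 56 bits, which is presumably where OAS 56 comes from;
Sv57 is what gives the 57-bit IAS.)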

> +	cfg->tlb = &riscv_iommu_flush_ops;
> +
> +	iop->ops.map_pages = riscv_iommu_map_pages;
> +	iop->ops.unmap_pages = riscv_iommu_unmap_pages;
> +	iop->ops.iova_to_phys = riscv_iommu_iova_to_phys;
> +
> +	return iop;
> +}
> +
> +struct io_pgtable_init_fns io_pgtable_riscv_init_fns = {
> +	.alloc = riscv_iommu_alloc_pgtable,
> +	.free = riscv_iommu_free_pgtable,
> +};
> diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
> index 9ee7d2b222b5..2ef6952a2109 100644
> --- a/drivers/iommu/riscv/iommu.c
> +++ b/drivers/iommu/riscv/iommu.c
> @@ -807,7 +807,7 @@ static struct iommu_device *riscv_iommu_probe_device(struct device *dev)
>   	/* Initial DC pointer can be NULL if IOMMU is configured in OFF or BARE mode */
>   	ep->dc = riscv_iommu_get_dc(iommu, ep->devid);
>   
> -	dev_info(iommu->dev, "adding device to iommu with devid %i in domain %i\n",
> +	dev_dbg(iommu->dev, "adding device to iommu with devid %i in domain %i\n",
>   		ep->devid, ep->domid);
>   
>   	dev_iommu_priv_set(dev, ep);
> @@ -874,7 +874,10 @@ static struct iommu_domain *riscv_iommu_domain_alloc(unsigned type)
>   {
>   	struct riscv_iommu_domain *domain;
>   
> -	if (type != IOMMU_DOMAIN_IDENTITY &&
> +	if (type != IOMMU_DOMAIN_DMA &&
> +	    type != IOMMU_DOMAIN_DMA_FQ &&

IOMMU_DOMAIN_DMA_FQ isn't exposed to drivers any more.
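
(The flush-queue decision now lives in core code: a driver only needs to
accept IOMMU_DOMAIN_DMA, and the core converts a default domain to FQ mode
itself where supported.)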

> +	    type != IOMMU_DOMAIN_UNMANAGED &&
> +	    type != IOMMU_DOMAIN_IDENTITY &&
>   	    type != IOMMU_DOMAIN_BLOCKED)

I might start believing you could support blocking domains if I saw some 
kind of handling of them since patch #7, but no, still nothing.

AFAICS from the spec there's no super-convenient way if you did want to 
do it, since you apparently can't suppress the faults from simply making 
the DC invalid. I guess it might be a case of keeping a special 
always-empty context so you can point any device's FSC at that while 
setting DTF. But it's hardly critical, so for now I'd just remove the 
broken non-support and leave the idea as something to revisit later.
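
Spelled out, the "always-empty context" idea might look like this (sketch
only; empty_pgd_root is a hypothetical permanently-empty table, and the DTF
bit name is assumed to follow the patch's RISCV_IOMMU_DC_TC_* convention for
the spec's DC.tc.DTF field):

	/* Hypothetical blocking setup: S-stage points at an always-empty
	 * root table, DTF suppresses the resulting fault reports. */
	dc->fsc = cpu_to_le64(FIELD_PREP(RISCV_IOMMU_DC_FSC_MODE, mode) |
			      FIELD_PREP(RISCV_IOMMU_DC_FSC_PPN,
					 virt_to_pfn(empty_pgd_root)));
	dc->tc = cpu_to_le64(RISCV_IOMMU_DC_TC_V | RISCV_IOMMU_DC_TC_DTF);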

>   		return NULL;
>   
> @@ -890,7 +893,7 @@ static struct iommu_domain *riscv_iommu_domain_alloc(unsigned type)
>   	domain->pscid = ida_alloc_range(&riscv_iommu_pscids, 1,
>   					RISCV_IOMMU_MAX_PSCID, GFP_KERNEL);
>   
> -	printk("domain type %x alloc %u\n", type, domain->pscid);
> +	printk("domain alloc %u\n", domain->pscid);
>   
>   	return &domain->domain;
>   }
> @@ -903,6 +906,9 @@ static void riscv_iommu_domain_free(struct iommu_domain *iommu_domain)
>   		pr_warn("IOMMU domain is not empty!\n");
>   	}
>   
> +	if (domain->pgtbl.cookie)
> +		free_io_pgtable_ops(&domain->pgtbl.ops);
> +
>   	if (domain->pgd_root)
>   		free_pages((unsigned long)domain->pgd_root, 0);

Is there a reason for this weird pgd_root setup where the io-pgtable 
implementation doesn't simply allocate and own the full pagetable itself?

>   
> @@ -959,6 +965,9 @@ static int riscv_iommu_domain_finalize(struct riscv_iommu_domain *domain,
>   	if (!domain->pgd_root)
>   		return -ENOMEM;
>   
> +	if (!alloc_io_pgtable_ops(RISCV_IOMMU, &domain->pgtbl.cfg, domain))
> +		return -ENOMEM;
> +
>   	return 0;
>   }
>   
> @@ -1006,9 +1015,8 @@ static int riscv_iommu_attach_dev(struct iommu_domain *iommu_domain, struct devi
>   		return 0;
>   	}
>   
> -	if (!dc) {
> +	if (!dc)
>   		return -ENODEV;
> -	}

This is a great example of more of the kind of stuff I was getting at on 
patch #1 - obviously unnecessary churn *within* a series is the ideal 
way to overload and annoy reviewers... and then I look at the rest of my 
screen below and see loads of code from an earlier patch being deleted 
already, so apparently it was a waste of time reviewing it at all :(

Thanks,
Robin.

>   	/*
>   	 * S-Stage translation table. G-Stage remains unmodified (BARE).
> @@ -1104,12 +1112,11 @@ static int riscv_iommu_map_pages(struct iommu_domain *iommu_domain,
>   {
>   	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
>   
> -	if (domain->domain.type == IOMMU_DOMAIN_IDENTITY) {
> -		*mapped = pgsize * pgcount;
> -		return 0;
> -	}
> +	if (!domain->pgtbl.ops.map_pages)
> +		return -ENODEV;
>   
> -	return -ENODEV;
> +	return domain->pgtbl.ops.map_pages(&domain->pgtbl.ops, iova, phys,
> +					   pgsize, pgcount, prot, gfp, mapped);
>   }
>   
>   static size_t riscv_iommu_unmap_pages(struct iommu_domain *iommu_domain,
> @@ -1118,10 +1125,11 @@ static size_t riscv_iommu_unmap_pages(struct iommu_domain *iommu_domain,
>   {
>   	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
>   
> -	if (domain->domain.type == IOMMU_DOMAIN_IDENTITY)
> -		return pgsize * pgcount;
> +	if (!domain->pgtbl.ops.unmap_pages)
> +		return 0;
>   
> -	return 0;
> +	return domain->pgtbl.ops.unmap_pages(&domain->pgtbl.ops, iova, pgsize,
> +					     pgcount, gather);
>   }
>   
>   static phys_addr_t riscv_iommu_iova_to_phys(struct iommu_domain *iommu_domain,
> @@ -1129,10 +1137,10 @@ static phys_addr_t riscv_iommu_iova_to_phys(struct iommu_domain *iommu_domain,
>   {
>   	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
>   
> -	if (domain->domain.type == IOMMU_DOMAIN_IDENTITY)
> -		return (phys_addr_t) iova;
> +	if (!domain->pgtbl.ops.iova_to_phys)
> +		return 0;
>   
> -	return 0;
> +	return domain->pgtbl.ops.iova_to_phys(&domain->pgtbl.ops, iova);
>   }
>   
>   /*
> diff --git a/drivers/iommu/riscv/iommu.h b/drivers/iommu/riscv/iommu.h
> index 9140df71e17b..fe32a4eff14e 100644
> --- a/drivers/iommu/riscv/iommu.h
> +++ b/drivers/iommu/riscv/iommu.h
> @@ -88,6 +88,7 @@ struct riscv_iommu_device {
>   
>   struct riscv_iommu_domain {
>   	struct iommu_domain domain;
> +	struct io_pgtable pgtbl;
>   
>   	struct list_head endpoints;
>   	struct mutex lock;
> diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
> index 1b7a44b35616..8dd9d3a28e3a 100644
> --- a/include/linux/io-pgtable.h
> +++ b/include/linux/io-pgtable.h
> @@ -19,6 +19,7 @@ enum io_pgtable_fmt {
>   	AMD_IOMMU_V2,
>   	APPLE_DART,
>   	APPLE_DART2,
> +	RISCV_IOMMU,
>   	IO_PGTABLE_NUM_FMTS,
>   };
>   
> @@ -258,5 +259,6 @@ extern struct io_pgtable_init_fns io_pgtable_arm_mali_lpae_init_fns;
>   extern struct io_pgtable_init_fns io_pgtable_amd_iommu_v1_init_fns;
>   extern struct io_pgtable_init_fns io_pgtable_amd_iommu_v2_init_fns;
>   extern struct io_pgtable_init_fns io_pgtable_apple_dart_init_fns;
> +extern struct io_pgtable_init_fns io_pgtable_riscv_init_fns;
>   
>   #endif /* __IO_PGTABLE_H */


* Re: [PATCH 11/11] RISC-V: drivers/iommu/riscv: Add G-Stage translation support
  2023-07-19 19:33 ` [PATCH 11/11] RISC-V: drivers/iommu/riscv: Add G-Stage translation support Tomasz Jeznach
  2023-07-31  8:12   ` Zong Li
@ 2023-08-16 21:13   ` Robin Murphy
  1 sibling, 0 replies; 86+ messages in thread
From: Robin Murphy @ 2023-08-16 21:13 UTC (permalink / raw)
  To: Tomasz Jeznach, Joerg Roedel, Will Deacon, Paul Walmsley
  Cc: Palmer Dabbelt, Albert Ou, Anup Patel, Sunil V L,
	Nick Kossifidis, Sebastien Boeuf, iommu, linux-riscv,
	linux-kernel, linux

On 2023-07-19 20:33, Tomasz Jeznach wrote:
> This change introduces 2nd stage translation configuration
> support, enabling nested translation for IOMMU hardware.
> Pending integration with VMM IOMMUFD interfaces to manage
> 1st stage translation and IOMMU virtualization interfaces.
> 
> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
> ---
>   drivers/iommu/riscv/iommu.c | 58 ++++++++++++++++++++++++++++---------
>   drivers/iommu/riscv/iommu.h |  3 +-
>   2 files changed, 46 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
> index 7b3e3e135cf6..3ca2f0194d3c 100644
> --- a/drivers/iommu/riscv/iommu.c
> +++ b/drivers/iommu/riscv/iommu.c
> @@ -1418,6 +1418,19 @@ static struct iommu_domain *riscv_iommu_domain_alloc(unsigned type)
>   	return &domain->domain;
>   }
>   
> +/* mark domain as second-stage translation */
> +static int riscv_iommu_enable_nesting(struct iommu_domain *iommu_domain)

Please don't add more instances of enable_nesting. It's a dead end that 
has never actually been used and should be removed fairly soon. The new 
nesting infrastructure is all still in flight, but the current patchsets 
should give a good idea of what you'd want to work towards:

https://lore.kernel.org/linux-iommu/20230724110406.107212-1-yi.l.liu@intel.com/
https://lore.kernel.org/linux-iommu/20230724111335.107427-1-yi.l.liu@intel.com/
https://lore.kernel.org/linux-iommu/cover.1683688960.git.nicolinc@nvidia.com/

Thanks,
Robin.

> +{
> +	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
> +
> +	mutex_lock(&domain->lock);
> +	if (list_empty(&domain->endpoints))
> +		domain->g_stage = true;
> +	mutex_unlock(&domain->lock);
> +
> +	return domain->g_stage ? 0 : -EBUSY;
> +}
> +
>   static void riscv_iommu_domain_free(struct iommu_domain *iommu_domain)
>   {
>   	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
> @@ -1433,7 +1446,7 @@ static void riscv_iommu_domain_free(struct iommu_domain *iommu_domain)
>   		free_io_pgtable_ops(&domain->pgtbl.ops);
>   
>   	if (domain->pgd_root)
> -		free_pages((unsigned long)domain->pgd_root, 0);
> +		free_pages((unsigned long)domain->pgd_root, domain->g_stage ? 2 : 0);
>   
>   	if ((int)domain->pscid > 0)
>   		ida_free(&riscv_iommu_pscids, domain->pscid);
> @@ -1483,7 +1496,8 @@ static int riscv_iommu_domain_finalize(struct riscv_iommu_domain *domain,
>   
>   	/* TODO: Fix this for RV32 */
>   	domain->mode = satp_mode >> 60;
> -	domain->pgd_root = (pgd_t *) __get_free_pages(GFP_KERNEL | __GFP_ZERO, 0);
> +	domain->pgd_root = (pgd_t *) __get_free_pages(GFP_KERNEL | __GFP_ZERO,
> +						      domain->g_stage ? 2 : 0);
>   
>   	if (!domain->pgd_root)
>   		return -ENOMEM;
> @@ -1499,6 +1513,8 @@ static u64 riscv_iommu_domain_atp(struct riscv_iommu_domain *domain)
>   	u64 atp = FIELD_PREP(RISCV_IOMMU_DC_FSC_MODE, domain->mode);
>   	if (domain->mode != RISCV_IOMMU_DC_FSC_MODE_BARE)
>   		atp |= FIELD_PREP(RISCV_IOMMU_DC_FSC_PPN, virt_to_pfn(domain->pgd_root));
> +	if (domain->g_stage)
> +		atp |= FIELD_PREP(RISCV_IOMMU_DC_IOHGATP_GSCID, domain->pscid);
>   	return atp;
>   }
>   
> @@ -1541,20 +1557,30 @@ static int riscv_iommu_attach_dev(struct iommu_domain *iommu_domain, struct devi
>   	if (!dc)
>   		return -ENODEV;
>   
> -	/*
> -	 * S-Stage translation table. G-Stage remains unmodified (BARE).
> -	 */
> -	val = FIELD_PREP(RISCV_IOMMU_DC_TA_PSCID, domain->pscid);
> -
> -	if (ep->pasid_enabled) {
> -		ep->pc[0].ta = cpu_to_le64(val | RISCV_IOMMU_PC_TA_V);
> -		ep->pc[0].fsc = cpu_to_le64(riscv_iommu_domain_atp(domain));
> +	if (domain->g_stage) {
> +		/*
> +		 * Enable G-Stage translation with initial pass-through mode
> +		 * for S-Stage. VMM is responsible for more restrictive
> +		 * guest VA translation scheme configuration.
> +		 */
>   		dc->ta = 0;
> -		dc->fsc = cpu_to_le64(virt_to_pfn(ep->pc) |
> -		    FIELD_PREP(RISCV_IOMMU_DC_FSC_MODE, RISCV_IOMMU_DC_FSC_PDTP_MODE_PD8));
> +		dc->fsc = 0ULL; /* RISCV_IOMMU_DC_FSC_MODE_BARE */
> +		dc->iohgatp = cpu_to_le64(riscv_iommu_domain_atp(domain));
>   	} else {
> -		dc->ta = cpu_to_le64(val);
> -		dc->fsc = cpu_to_le64(riscv_iommu_domain_atp(domain));
> +		/* S-Stage translation table. G-Stage remains unmodified. */
> +		if (ep->pasid_enabled) {
> +			val = FIELD_PREP(RISCV_IOMMU_DC_TA_PSCID, domain->pscid);
> +			ep->pc[0].ta = cpu_to_le64(val | RISCV_IOMMU_PC_TA_V);
> +			ep->pc[0].fsc = cpu_to_le64(riscv_iommu_domain_atp(domain));
> +			dc->ta = 0;
> +			val = FIELD_PREP(RISCV_IOMMU_DC_FSC_MODE,
> +					  RISCV_IOMMU_DC_FSC_PDTP_MODE_PD8);
> +			dc->fsc = cpu_to_le64(val | virt_to_pfn(ep->pc));
> +		} else {
> +			val = FIELD_PREP(RISCV_IOMMU_DC_TA_PSCID, domain->pscid);
> +			dc->ta = cpu_to_le64(val);
> +			dc->fsc = cpu_to_le64(riscv_iommu_domain_atp(domain));
> +		}
>   	}
>   
>   	wmb();
> @@ -1599,6 +1625,9 @@ static int riscv_iommu_set_dev_pasid(struct iommu_domain *iommu_domain,
>   	if (!iommu_domain || !iommu_domain->mm)
>   		return -EINVAL;
>   
> +	if (domain->g_stage)
> +		return -EINVAL;
> +
>   	/* Driver uses TC.DPE mode, PASID #0 is incorrect. */
>   	if (pasid == 0)
>   		return -EINVAL;
> @@ -1969,6 +1998,7 @@ static const struct iommu_domain_ops riscv_iommu_domain_ops = {
>   	.iotlb_sync = riscv_iommu_iotlb_sync,
>   	.iotlb_sync_map = riscv_iommu_iotlb_sync_map,
>   	.flush_iotlb_all = riscv_iommu_flush_iotlb_all,
> +	.enable_nesting = riscv_iommu_enable_nesting,
>   };
>   
>   static const struct iommu_ops riscv_iommu_ops = {
> diff --git a/drivers/iommu/riscv/iommu.h b/drivers/iommu/riscv/iommu.h
> index 55418a1144fb..55e5aafea5bc 100644
> --- a/drivers/iommu/riscv/iommu.h
> +++ b/drivers/iommu/riscv/iommu.h
> @@ -102,8 +102,9 @@ struct riscv_iommu_domain {
>   	struct riscv_iommu_device *iommu;
>   
>   	unsigned mode;		/* RIO_ATP_MODE_* enum */
> -	unsigned pscid;		/* RISC-V IOMMU PSCID */
> +	unsigned pscid;		/* RISC-V IOMMU PSCID / GSCID */
>   	ioasid_t pasid;		/* IOMMU_DOMAIN_SVA: Cached PASID */
> +	bool g_stage;		/* 2nd stage translation domain */
>   
>   	pgd_t *pgd_root;	/* page table root pointer */
>   };


* Re: [PATCH 10/11] RISC-V: drivers/iommu/riscv: Add MSI identity remapping
  2023-07-19 19:33 ` [PATCH 10/11] RISC-V: drivers/iommu/riscv: Add MSI identity remapping Tomasz Jeznach
  2023-07-31  8:02   ` Zong Li
@ 2023-08-16 21:43   ` Robin Murphy
  1 sibling, 0 replies; 86+ messages in thread
From: Robin Murphy @ 2023-08-16 21:43 UTC (permalink / raw)
  To: Tomasz Jeznach, Joerg Roedel, Will Deacon, Paul Walmsley
  Cc: Palmer Dabbelt, Albert Ou, Anup Patel, Sunil V L,
	Nick Kossifidis, Sebastien Boeuf, iommu, linux-riscv,
	linux-kernel, linux

On 2023-07-19 20:33, Tomasz Jeznach wrote:
> This change provides basic identity mapping support to
> exercise the MSI_FLAT hardware capability.
> 
> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
> ---
>   drivers/iommu/riscv/iommu.c | 81 +++++++++++++++++++++++++++++++++++++
>   drivers/iommu/riscv/iommu.h |  3 ++
>   2 files changed, 84 insertions(+)
> 
> diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
> index 6042c35be3ca..7b3e3e135cf6 100644
> --- a/drivers/iommu/riscv/iommu.c
> +++ b/drivers/iommu/riscv/iommu.c
> @@ -61,6 +61,9 @@ MODULE_PARM_DESC(priq_length, "Page request interface queue length.");
>   #define RISCV_IOMMU_MAX_PSCID	(1U << 20)
>   static DEFINE_IDA(riscv_iommu_pscids);
>   
> +/* TODO: Enable MSI remapping */
> +#define RISCV_IMSIC_BASE	0x28000000
> +
>   /* 1 second */
>   #define RISCV_IOMMU_TIMEOUT	riscv_timebase
>   
> @@ -932,6 +935,72 @@ static irqreturn_t riscv_iommu_priq_process(int irq, void *data)
>    * Endpoint management
>    */
>   
> +static int riscv_iommu_enable_ir(struct riscv_iommu_endpoint *ep)
> +{
> +	struct riscv_iommu_device *iommu = ep->iommu;
> +	struct iommu_resv_region *entry;
> +	struct irq_domain *msi_domain;
> +	u64 val;
> +	int i;
> +
> +	/* Initialize MSI remapping */
> +	if (!ep->dc || !(iommu->cap & RISCV_IOMMU_CAP_MSI_FLAT))
> +		return 0;
> +
> +	ep->msi_root = (struct riscv_iommu_msi_pte *)get_zeroed_page(GFP_KERNEL);
> +	if (!ep->msi_root)
> +		return -ENOMEM;
> +
> +	for (i = 0; i < 256; i++) {
> +		ep->msi_root[i].pte = RISCV_IOMMU_MSI_PTE_V |
> +		    FIELD_PREP(RISCV_IOMMU_MSI_PTE_M, 3) |
> +		    phys_to_ppn(RISCV_IMSIC_BASE + i * PAGE_SIZE);
> +	}
> +
> +	entry = iommu_alloc_resv_region(RISCV_IMSIC_BASE, PAGE_SIZE * 256, 0,
> +					IOMMU_RESV_SW_MSI, GFP_KERNEL);
> +	if (entry)
> +		list_add_tail(&entry->list, &ep->regions);
> +
> +	val = virt_to_pfn(ep->msi_root) |
> +	    FIELD_PREP(RISCV_IOMMU_DC_MSIPTP_MODE, RISCV_IOMMU_DC_MSIPTP_MODE_FLAT);
> +	ep->dc->msiptp = cpu_to_le64(val);
> +
> +	/* Single page of MSIPTP, 256 IMSIC files */
> +	ep->dc->msi_addr_mask = cpu_to_le64(255);
> +	ep->dc->msi_addr_pattern = cpu_to_le64(RISCV_IMSIC_BASE >> 12);
> +	wmb();
> +
> +	/* set msi domain for the device as isolated. hack. */

Hack because this should be implemented as a proper hierarchical MSI 
domain, or hack because it doesn't actually represent isolation? Nothing 
really jumps out at me from the IOMMU and IMSIC specs, so I'm leaning 
towards the hunch that there's no real isolation, it's more just 
implicit in the assumption that each distinct VM/process with devices 
assigned should get its own interrupt file. I can't easily see how that 
would be achieved for things like VFIO :/

Thanks,
Robin.

> +	msi_domain = dev_get_msi_domain(ep->dev);
> +	if (msi_domain) {
> +		msi_domain->flags |= IRQ_DOMAIN_FLAG_ISOLATED_MSI;
> +	}
> +
> +	dev_dbg(ep->dev, "RV-IR enabled\n");
> +
> +	ep->ir_enabled = true;
> +
> +	return 0;
> +}
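
For orientation, the msiptp/mask/pattern programming above implements the
spec's MSI address-match rule; roughly (illustrative check, not patch code;
iova stands for a hypothetical inbound write address):

	/* An inbound write is treated as an MSI when the unmasked PPN
	 * bits equal the pattern; the masked bits select the file. */
	u64 ppn = iova >> 12;
	bool is_msi = (ppn & ~255ULL) ==
		      ((u64)(RISCV_IMSIC_BASE >> 12) & ~255ULL);
	unsigned int file = ppn & 255;	/* indexes ep->msi_root[file] */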
> +
> +static void riscv_iommu_disable_ir(struct riscv_iommu_endpoint *ep)
> +{
> +	if (!ep->ir_enabled)
> +		return;
> +
> +	ep->dc->msi_addr_pattern = 0ULL;
> +	ep->dc->msi_addr_mask = 0ULL;
> +	ep->dc->msiptp = 0ULL;
> +	wmb();
> +
> +	dev_dbg(ep->dev, "RV-IR disabled\n");
> +
> +	free_pages((unsigned long)ep->msi_root, 0);
> +	ep->msi_root = NULL;
> +	ep->ir_enabled = false;
> +}
> +
>   /* Endpoint features/capabilities */
>   static void riscv_iommu_disable_ep(struct riscv_iommu_endpoint *ep)
>   {
> @@ -1226,6 +1295,7 @@ static struct iommu_device *riscv_iommu_probe_device(struct device *dev)
>   
>   	mutex_init(&ep->lock);
>   	INIT_LIST_HEAD(&ep->domain);
> +	INIT_LIST_HEAD(&ep->regions);
>   
>   	if (dev_is_pci(dev)) {
>   		ep->devid = pci_dev_id(to_pci_dev(dev));
> @@ -1248,6 +1318,7 @@ static struct iommu_device *riscv_iommu_probe_device(struct device *dev)
>   	dev_iommu_priv_set(dev, ep);
>   	riscv_iommu_add_device(iommu, dev);
>   	riscv_iommu_enable_ep(ep);
> +	riscv_iommu_enable_ir(ep);
>   
>   	return &iommu->iommu;
>   }
> @@ -1279,6 +1350,7 @@ static void riscv_iommu_release_device(struct device *dev)
>   		riscv_iommu_iodir_inv_devid(iommu, ep->devid);
>   	}
>   
> +	riscv_iommu_disable_ir(ep);
>   	riscv_iommu_disable_ep(ep);
>   
>   	/* Remove endpoint from IOMMU tracking structures */
> @@ -1301,6 +1373,15 @@ static struct iommu_group *riscv_iommu_device_group(struct device *dev)
>   
>   static void riscv_iommu_get_resv_regions(struct device *dev, struct list_head *head)
>   {
> +	struct iommu_resv_region *entry, *new_entry;
> +	struct riscv_iommu_endpoint *ep = dev_iommu_priv_get(dev);
> +
> +	list_for_each_entry(entry, &ep->regions, list) {
> +		new_entry = kmemdup(entry, sizeof(*entry), GFP_KERNEL);
> +		if (new_entry)
> +			list_add_tail(&new_entry->list, head);
> +	}
> +
>   	iommu_dma_get_resv_regions(dev, head);
>   }
>   
> diff --git a/drivers/iommu/riscv/iommu.h b/drivers/iommu/riscv/iommu.h
> index 83e8d00fd0f8..55418a1144fb 100644
> --- a/drivers/iommu/riscv/iommu.h
> +++ b/drivers/iommu/riscv/iommu.h
> @@ -117,14 +117,17 @@ struct riscv_iommu_endpoint {
>   	struct riscv_iommu_dc *dc;		/* device context pointer */
>   	struct riscv_iommu_pc *pc;		/* process context root, valid if pasid_enabled is true */
>   	struct riscv_iommu_device *iommu;	/* parent iommu device */
> +	struct riscv_iommu_msi_pte *msi_root;	/* interrupt re-mapping */
>   
>   	struct mutex lock;
>   	struct list_head domain;		/* endpoint attached managed domain */
> +	struct list_head regions;		/* reserved regions, interrupt remapping window */
>   
>   	/* end point info bits */
>   	unsigned pasid_bits;
>   	unsigned pasid_feat;
>   	bool pasid_enabled;
> +	bool ir_enabled;
>   };
>   
>   /* Helper functions and macros */


* Re: [PATCH 00/13] Linux RISC-V IOMMU Support
       [not found] ` <CAHCEehJKYu3-GSX2L6L4_VVvYt1MagRgPJvYTbqekrjPw3ZSkA@mail.gmail.com>
@ 2024-02-23 14:04   ` Zong Li
  2024-04-04 17:37     ` Tomasz Jeznach
  0 siblings, 1 reply; 86+ messages in thread
From: Zong Li @ 2024-02-23 14:04 UTC (permalink / raw)
  To: Tomasz Jeznach
  Cc: Paul Walmsley, Palmer Dabbelt, Robin Murphy, Will Deacon,
	Joerg Roedel, Anup Patel, Albert Ou, Greentime Hu, linux,
	linux-kernel@vger.kernel.org List, Sebastien Boeuf, iommu,
	Nick Kossifidis, linux-riscv

>
> [cover letter, diffstat and list footer quoted in full; trimmed -- see the
> start of this thread]

Hi Tomasz,
Could I know if you have a plan for the next version, and do you have any
estimate of when the v2 patches will be ready? We have some patches based on
top of your old implementation, and it would be great if we could rebase
them onto your next version. Thanks.


* Re: [PATCH 00/13] Linux RISC-V IOMMU Support
  2024-02-23 14:04   ` [PATCH 00/13] Linux RISC-V IOMMU Support Zong Li
@ 2024-04-04 17:37     ` Tomasz Jeznach
  2024-04-10  5:38       ` Zong Li
  0 siblings, 1 reply; 86+ messages in thread
From: Tomasz Jeznach @ 2024-04-04 17:37 UTC (permalink / raw)
  To: Zong Li
  Cc: Paul Walmsley, Palmer Dabbelt, Robin Murphy, Will Deacon,
	Joerg Roedel, Anup Patel, Albert Ou, Greentime Hu, linux,
	linux-kernel@vger.kernel.org List, Sebastien Boeuf, iommu,
	Nick Kossifidis, linux-riscv

On Fri, Feb 23, 2024 at 6:04 AM Zong Li <zong.li@sifive.com> wrote:
>
> >
> > [cover letter quoted in full; trimmed]
>
> Hi Tomasz,
> Could I know if you have a plan for the next version, and do you have any
> estimate of when the v2 patches will be ready? We have some patches based on
> top of your old implementation, and it would be great if we could rebase
> them onto your next version. Thanks.

Hi Zong,

Thank you for your interest. The next version of the iommu/riscv series is
almost ready and should be sent in the next few days.
There are a number of bug fixes and design changes based on the testing and
great feedback after v1 was published.
The upcoming patch set will be smaller, with core functionality only, hopefully
making the review easier. Functionality related to MSI remapping, shared
virtual addressing, and nested translation will be moved to separate patch sets.

Complete, up to date revision is always available at
https://github.com/tjeznach/linux/

regards,
- Tomasz


* Re: [PATCH 00/13] Linux RISC-V IOMMU Support
  2024-04-04 17:37     ` Tomasz Jeznach
@ 2024-04-10  5:38       ` Zong Li
  0 siblings, 0 replies; 86+ messages in thread
From: Zong Li @ 2024-04-10  5:38 UTC (permalink / raw)
  To: Tomasz Jeznach
  Cc: Paul Walmsley, Palmer Dabbelt, Robin Murphy, Will Deacon,
	Joerg Roedel, Anup Patel, Albert Ou, Greentime Hu, linux,
	linux-kernel@vger.kernel.org List, Sebastien Boeuf, iommu,
	Nick Kossifidis, linux-riscv

On Fri, Apr 5, 2024 at 1:37 AM Tomasz Jeznach <tjeznach@rivosinc.com> wrote:
>
> On Fri, Feb 23, 2024 at 6:04 AM Zong Li <zong.li@sifive.com> wrote:
> >
> > >
> > > [cover letter quoted in full; trimmed]
> >
> > Hi Tomasz,
> > Could I know if you have a plan for the next version, and do you have any
> > estimate of when the v2 patches will be ready? We have some patches based on
> > top of your old implementation, and it would be great if we could rebase
> > them onto your next version. Thanks.
>
> Hi Zong,
>
> Thank you for your interest. The next version of the iommu/riscv series is
> almost ready and should be sent in the next few days.

Hi Tomasz,
Thank you for the update; I will help review the v2 series as well.

> There are a number of bug fixes and design changes based on the testing and
> great feedback after v1 was published.
> The upcoming patch set will be smaller, with core functionality only, hopefully
> making the review easier. Functionality related to MSI remapping, shared
> virtual addressing, and nested translation will be moved to separate patch sets.
>
> Complete, up to date revision is always available at
> https://github.com/tjeznach/linux/
>
> regards,
> - Tomasz


* Re: [PATCH 01/11] RISC-V: drivers/iommu: Add RISC-V IOMMU - Ziommu support.
  2023-07-19 19:33 ` [PATCH 01/11] RISC-V: drivers/iommu: Add RISC-V IOMMU - Ziommu support Tomasz Jeznach
                     ` (6 preceding siblings ...)
  2023-08-16 18:05   ` Robin Murphy
@ 2024-04-13 10:15   ` Xingyou Chen
  7 siblings, 0 replies; 86+ messages in thread
From: Xingyou Chen @ 2024-04-13 10:15 UTC (permalink / raw)
  To: Tomasz Jeznach, Joerg Roedel, Will Deacon, Robin Murphy, Paul Walmsley
  Cc: Anup Patel, Albert Ou, linux, linux-kernel, Sebastien Boeuf,
	iommu, Palmer Dabbelt, Nick Kossifidis, linux-riscv


On 7/20/23 03:33, Tomasz Jeznach wrote:
> ... > +#endif /* _RISCV_IOMMU_BITS_H_ */
> diff --git a/drivers/iommu/riscv/iommu-pci.c b/drivers/iommu/riscv/iommu-pci.c
> new file mode 100644
> index 000000000000..c91f963d7a29
> --- /dev/null
> +++ b/drivers/iommu/riscv/iommu-pci.c
> @@ -0,0 +1,134 @@
> ...
> +
> +static struct pci_driver riscv_iommu_pci_driver = {
> +	.name = KBUILD_MODNAME,
> +	.id_table = riscv_iommu_pci_tbl,
> +	.probe = riscv_iommu_pci_probe,
> +	.remove = riscv_iommu_pci_remove,
> +	.driver = {
> +		   .pm = pm_sleep_ptr(&riscv_iommu_pm_ops),
> +		   .of_match_table = riscv_iommu_of_match,
> +		   },
> +};
> +
> +module_driver(riscv_iommu_pci_driver, pci_register_driver, pci_unregister_driver);

There's a helper macro that could be used here (though it's not required):
   module_pci_driver(riscv_iommu_pci_driver);

> diff --git a/drivers/iommu/riscv/iommu-platform.c b/drivers/iommu/riscv/iommu-platform.c
> new file mode 100644
> index 000000000000..e4e8ca6711e7
> --- /dev/null
> +++ b/drivers/iommu/riscv/iommu-platform.c
> @@ -0,0 +1,94 @@
> ...
> +
> +static struct platform_driver riscv_iommu_platform_driver = {
> +	.driver = {
> +		   .name = "riscv,iommu",
> +		   .of_match_table = riscv_iommu_of_match,
> +		   .suppress_bind_attrs = true,
> +		   },
> +	.probe = riscv_iommu_platform_probe,
> +	.remove_new = riscv_iommu_platform_remove,
> +	.shutdown = riscv_iommu_platform_shutdown,
> +};
> +
> +module_driver(riscv_iommu_platform_driver, platform_driver_register,
> +	      platform_driver_unregister);

And also:
   module_platform_driver(riscv_iommu_platform_driver);


end of thread, other threads:[~2024-04-13 10:22 UTC | newest]

Thread overview: 86+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-19 19:33 [PATCH 00/13] Linux RISC-V IOMMU Support Tomasz Jeznach
2023-07-19 19:33 ` [PATCH 01/11] RISC-V: drivers/iommu: Add RISC-V IOMMU - Ziommu support Tomasz Jeznach
2023-07-19 20:49   ` Conor Dooley
2023-07-19 21:43     ` Tomasz Jeznach
2023-07-20 19:27       ` Conor Dooley
2023-07-21  9:44       ` Conor Dooley
2023-07-20 10:38   ` Baolu Lu
2023-07-20 12:31   ` Baolu Lu
2023-07-20 17:30     ` Tomasz Jeznach
2023-07-28  2:42   ` Zong Li
2023-08-02 20:15     ` Tomasz Jeznach
2023-08-02 20:25       ` Conor Dooley
2023-08-03  3:37       ` Zong Li
2023-08-03  0:18   ` Jason Gunthorpe
2023-08-03  8:27   ` Zong Li
2023-08-16 18:05   ` Robin Murphy
2024-04-13 10:15   ` Xingyou Chen
2023-07-19 19:33 ` [PATCH 02/11] RISC-V: arch/riscv/config: enable RISC-V IOMMU support Tomasz Jeznach
2023-07-19 20:22   ` Conor Dooley
2023-07-19 21:07     ` Tomasz Jeznach
2023-07-20  6:37       ` Krzysztof Kozlowski
2023-07-19 19:33 ` [PATCH 03/11] dt-bindings: Add RISC-V IOMMU bindings Tomasz Jeznach
2023-07-19 20:19   ` Conor Dooley
     [not found]     ` <CAH2o1u6CZSb7pXcaXmh7dJQmNZYh3uORk4x7vJPrb+uCwFdU5g@mail.gmail.com>
2023-07-19 20:57       ` Conor Dooley
2023-07-19 21:37     ` Rob Herring
2023-07-19 23:04       ` Tomasz Jeznach
2023-07-24  8:03   ` Zong Li
2023-07-24 10:02     ` Anup Patel
2023-07-24 11:31       ` Zong Li
2023-07-24 12:10         ` Anup Patel
2023-07-24 13:23           ` Zong Li
2023-07-26  3:21             ` Baolu Lu
2023-07-26  4:26               ` Zong Li
2023-07-26 12:17                 ` Jason Gunthorpe
2023-07-27  2:42                   ` Zong Li
2023-08-09 14:57                     ` Jason Gunthorpe
2023-08-15  1:28                       ` Zong Li
2023-08-15 18:38                         ` Jason Gunthorpe
2023-08-16  2:16                           ` Zong Li
2023-08-16  4:10                             ` Baolu Lu
2023-07-19 19:33 ` [PATCH 04/11] MAINTAINERS: Add myself for RISC-V IOMMU driver Tomasz Jeznach
2023-07-20 12:42   ` Baolu Lu
2023-07-20 17:32     ` Tomasz Jeznach
2023-07-19 19:33 ` [PATCH 05/11] RISC-V: drivers/iommu/riscv: Add sysfs interface Tomasz Jeznach
2023-07-20  6:38   ` Krzysztof Kozlowski
2023-07-20 18:30     ` Tomasz Jeznach
2023-07-20 21:37       ` Krzysztof Kozlowski
2023-07-20 22:08         ` Conor Dooley
2023-07-21  3:49           ` Tomasz Jeznach
2023-07-20 12:50   ` Baolu Lu
2023-07-20 17:47     ` Tomasz Jeznach
2023-07-19 19:33 ` [PATCH 06/11] RISC-V: drivers/iommu/riscv: Add command, fault, page-req queues Tomasz Jeznach
2023-07-20  3:11   ` Nick Kossifidis
2023-07-20 18:00     ` Tomasz Jeznach
2023-07-20 18:43       ` Conor Dooley
2023-07-24  9:47       ` Zong Li
2023-07-28  5:18         ` Tomasz Jeznach
2023-07-28  8:48           ` Zong Li
2023-07-20 13:08   ` Baolu Lu
2023-07-20 17:49     ` Tomasz Jeznach
2023-07-29 12:58   ` Zong Li
2023-07-31  9:32     ` Nick Kossifidis
2023-07-31 13:15       ` Zong Li
2023-07-31 23:35         ` Nick Kossifidis
2023-08-01  0:37           ` Zong Li
2023-08-02 20:28             ` Tomasz Jeznach
2023-08-02 20:50     ` Tomasz Jeznach
2023-08-03  8:24       ` Zong Li
2023-08-16 18:49   ` Robin Murphy
2023-07-19 19:33 ` [PATCH 07/11] RISC-V: drivers/iommu/riscv: Add device context support Tomasz Jeznach
2023-08-16 19:08   ` Robin Murphy
2023-07-19 19:33 ` [PATCH 08/11] RISC-V: drivers/iommu/riscv: Add page table support Tomasz Jeznach
2023-07-25 13:13   ` Zong Li
2023-07-31  7:19   ` Zong Li
2023-08-16 21:04   ` Robin Murphy
2023-07-19 19:33 ` [PATCH 09/11] RISC-V: drivers/iommu/riscv: Add SVA with PASID/ATS/PRI support Tomasz Jeznach
2023-07-31  9:04   ` Zong Li
2023-07-19 19:33 ` [PATCH 10/11] RISC-V: drivers/iommu/riscv: Add MSI identity remapping Tomasz Jeznach
2023-07-31  8:02   ` Zong Li
2023-08-16 21:43   ` Robin Murphy
2023-07-19 19:33 ` [PATCH 11/11] RISC-V: drivers/iommu/riscv: Add G-Stage translation support Tomasz Jeznach
2023-07-31  8:12   ` Zong Li
2023-08-16 21:13   ` Robin Murphy
     [not found] ` <CAHCEehJKYu3-GSX2L6L4_VVvYt1MagRgPJvYTbqekrjPw3ZSkA@mail.gmail.com>
2024-02-23 14:04   ` [PATCH 00/13] Linux RISC-V IOMMU Support Zong Li
2024-04-04 17:37     ` Tomasz Jeznach
2024-04-10  5:38       ` Zong Li
