qemu-riscv.nongnu.org archive mirror
* [PATCH v2 00/15] riscv: QEMU RISC-V IOMMU Support
@ 2024-03-07 16:03 Daniel Henrique Barboza
  2024-03-07 16:03 ` [PATCH v2 01/15] exec/memtxattr: add process identifier to the transaction attributes Daniel Henrique Barboza
                   ` (15 more replies)
  0 siblings, 16 replies; 55+ messages in thread
From: Daniel Henrique Barboza @ 2024-03-07 16:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-riscv, alistair.francis, bmeng, liwei1518, zhiwei_liu,
	palmer, ajones, tjeznach, Daniel Henrique Barboza

Hi,

This is the second version of the work Tomasz sent in July 2023 [1].
I'll be helping Tomasz upstream it.

The core emulation code is left unchanged but a few tweaks were made in
v2:

- The most notable difference in this version is that the code was split
  into smaller chunks. Patch 03 is still a 1700-line patch, which is an
  improvement over the 3800-line patch from v1, but we can only go so
  far when splitting the core components of the emulation. The reality
  is that the IOMMU emulation is a rather complex piece of software and
  there's not much we can do to reduce that complexity;

- I'm not contributing the HPM support that was present in v1. Dropping
  it shaved 600 lines of code off the series, which is already large
  enough as is. We'll introduce HPM in later versions or as a follow-up;

- The riscv-iommu-bits.h header was also trimmed. I shaved 300 or so
  lines from it, all of them definitions that the emulation isn't
  using. The header will eventually be imported from the Linux driver
  (not upstream yet), so for now we can live with a trimmed header for
  the emulation's use alone;

- I added libqos tests for the riscv-iommu-pci device. The idea of these
  tests is to give us more confidence in the emulation code;

- 'edu' device support. The support was taken from Tomasz's EDU branch
  [2]. This device can then be used to test PCI passthrough and exercise
  the IOMMU.


Patches based on alistair/riscv-to-apply.next.

v1 link: https://lore.kernel.org/qemu-riscv/cover.1689819031.git.tjeznach@rivosinc.com/

[1] https://lore.kernel.org/qemu-riscv/cover.1689819031.git.tjeznach@rivosinc.com/
[2] https://github.com/tjeznach/qemu.git, branch 'riscv_iommu_edu_impl'

Andrew Jones (1):
  hw/riscv/riscv-iommu: Add another irq for mrif notifications

Daniel Henrique Barboza (2):
  test/qtest: add riscv-iommu-pci tests
  qtest/riscv-iommu-test: add init queues test

Tomasz Jeznach (12):
  exec/memtxattr: add process identifier to the transaction attributes
  hw/riscv: add riscv-iommu-bits.h
  hw/riscv: add RISC-V IOMMU base emulation
  hw/riscv: add riscv-iommu-pci device
  hw/riscv: add riscv-iommu-sys platform device
  hw/riscv/virt.c: support for RISC-V IOMMU PCIDevice hotplug
  hw/riscv/riscv-iommu: add Address Translation Cache (IOATC)
  hw/riscv/riscv-iommu: add s-stage and g-stage support
  hw/riscv/riscv-iommu: add ATS support
  hw/riscv/riscv-iommu: add DBG support
  hw/misc: EDU: added PASID support
  hw/misc: EDU: add ATS/PRI capability

 hw/misc/edu.c                    |  297 ++++-
 hw/riscv/Kconfig                 |    4 +
 hw/riscv/meson.build             |    1 +
 hw/riscv/riscv-iommu-bits.h      |  407 ++++++
 hw/riscv/riscv-iommu-pci.c       |  173 +++
 hw/riscv/riscv-iommu-sys.c       |   93 ++
 hw/riscv/riscv-iommu.c           | 2085 ++++++++++++++++++++++++++++++
 hw/riscv/riscv-iommu.h           |  146 +++
 hw/riscv/trace-events            |   15 +
 hw/riscv/trace.h                 |    2 +
 hw/riscv/virt.c                  |   33 +-
 include/exec/memattrs.h          |    5 +
 include/hw/riscv/iommu.h         |   40 +
 meson.build                      |    1 +
 tests/qtest/libqos/meson.build   |    4 +
 tests/qtest/libqos/riscv-iommu.c |   79 ++
 tests/qtest/libqos/riscv-iommu.h |   96 ++
 tests/qtest/meson.build          |    1 +
 tests/qtest/riscv-iommu-test.c   |  234 ++++
 19 files changed, 3704 insertions(+), 12 deletions(-)
 create mode 100644 hw/riscv/riscv-iommu-bits.h
 create mode 100644 hw/riscv/riscv-iommu-pci.c
 create mode 100644 hw/riscv/riscv-iommu-sys.c
 create mode 100644 hw/riscv/riscv-iommu.c
 create mode 100644 hw/riscv/riscv-iommu.h
 create mode 100644 hw/riscv/trace-events
 create mode 100644 hw/riscv/trace.h
 create mode 100644 include/hw/riscv/iommu.h
 create mode 100644 tests/qtest/libqos/riscv-iommu.c
 create mode 100644 tests/qtest/libqos/riscv-iommu.h
 create mode 100644 tests/qtest/riscv-iommu-test.c

-- 
2.43.2




* [PATCH v2 01/15] exec/memtxattr: add process identifier to the transaction attributes
  2024-03-07 16:03 [PATCH v2 00/15] riscv: QEMU RISC-V IOMMU Support Daniel Henrique Barboza
@ 2024-03-07 16:03 ` Daniel Henrique Barboza
  2024-04-23 16:33   ` Frank Chang
  2024-03-07 16:03 ` [PATCH v2 02/15] hw/riscv: add riscv-iommu-bits.h Daniel Henrique Barboza
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 55+ messages in thread
From: Daniel Henrique Barboza @ 2024-03-07 16:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-riscv, alistair.francis, bmeng, liwei1518, zhiwei_liu,
	palmer, ajones, tjeznach

From: Tomasz Jeznach <tjeznach@rivosinc.com>

Extend memory transaction attributes with process identifier to allow
per-request address translation logic to use requester_id / process_id
to identify memory mapping (e.g. enabling IOMMU w/ PASID translations).

Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
---
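A note for reviewers, not part of the commit: the sketch below shows how
a requester could tag a transaction with a PASID and how a translation
layer might combine it with requester_id. DemoTxAttrs, demo_attrs() and
demo_ctx_key() are hypothetical names made up for this illustration; only
the two bitfield widths are taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for MemTxAttrs, reduced to the two fields used here. */
typedef struct DemoTxAttrs {
    unsigned int requester_id:16;  /* same width as in MemTxAttrs */
    unsigned int pasid:8;          /* same width as the field added here */
} DemoTxAttrs;

/* Hypothetical helper: build attributes for a PASID-tagged request. */
static inline DemoTxAttrs demo_attrs(uint16_t requester_id, uint32_t pasid)
{
    DemoTxAttrs attrs = { 0 };

    attrs.requester_id = requester_id;
    /* Mask explicitly: only the low 8 bits fit in the bitfield. */
    attrs.pasid = pasid & 0xff;
    return attrs;
}

/* Hypothetical lookup key a translation cache could derive from both IDs. */
static inline uint32_t demo_ctx_key(DemoTxAttrs attrs)
{
    return ((uint32_t)attrs.requester_id << 8) | attrs.pasid;
}
```

The explicit mask makes the 8-bit limit visible at the call site; a wider
PASID would be silently truncated otherwise.
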
 include/exec/memattrs.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/exec/memattrs.h b/include/exec/memattrs.h
index 14cdd8d582..46d0725416 100644
--- a/include/exec/memattrs.h
+++ b/include/exec/memattrs.h
@@ -52,6 +52,11 @@ typedef struct MemTxAttrs {
     unsigned int memory:1;
     /* Requester ID (for MSI for example) */
     unsigned int requester_id:16;
+
+    /*
+     * PCI PASID support: Limited to 8 bits process identifier.
+     */
+    unsigned int pasid:8;
 } MemTxAttrs;
 
 /* Bus masters which don't specify any attributes will get this,
-- 
2.43.2




* [PATCH v2 02/15] hw/riscv: add riscv-iommu-bits.h
  2024-03-07 16:03 [PATCH v2 00/15] riscv: QEMU RISC-V IOMMU Support Daniel Henrique Barboza
  2024-03-07 16:03 ` [PATCH v2 01/15] exec/memtxattr: add process identifier to the transaction attributes Daniel Henrique Barboza
@ 2024-03-07 16:03 ` Daniel Henrique Barboza
  2024-05-10 11:01   ` Frank Chang
  2024-05-15 10:02   ` Eric Cheng
  2024-03-07 16:03 ` [PATCH v2 03/15] hw/riscv: add RISC-V IOMMU base emulation Daniel Henrique Barboza
                   ` (13 subsequent siblings)
  15 siblings, 2 replies; 55+ messages in thread
From: Daniel Henrique Barboza @ 2024-03-07 16:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-riscv, alistair.francis, bmeng, liwei1518, zhiwei_liu,
	palmer, ajones, tjeznach, Daniel Henrique Barboza

From: Tomasz Jeznach <tjeznach@rivosinc.com>

This header will be used by the RISC-V IOMMU emulation to be added
in the next patch. Due to its size, it's being sent separately for
easier review.

One thing to note is that this header can be replaced by the future
Linux RISC-V IOMMU driver header, which would become a linux-header we
would import instead of keeping our own. The Linux implementation isn't
upstream yet, so for now we'll have to maintain riscv-iommu-bits.h.

Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
---
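A note for reviewers, not part of the commit: the header is mostly field
masks, so here is a minimal sketch of how such masks are built and used.
The DEMO_* macros copy the GENMASK_ULL expression and two fault-record
masks from the header. demo_get_field() and demo_set_field() are
hypothetical helpers modelled on the usual "divide by the lowest mask
bit" idiom; they are an assumption of this sketch, not QEMU's own
definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Same expression as GENMASK_ULL in the header: bits h..l set, inclusive. */
#define DEMO_GENMASK_ULL(h, l) (((~0ULL) >> (63 - (h) + (l))) << (l))

/* Copies of two fault-queue header masks from the header. */
#define DEMO_FQ_HDR_CAUSE   DEMO_GENMASK_ULL(11, 0)
#define DEMO_FQ_HDR_DID     DEMO_GENMASK_ULL(63, 40)

/* Hypothetical helper: extract a field and shift it down to bit 0. */
static inline uint64_t demo_get_field(uint64_t reg, uint64_t mask)
{
    /* (mask & ~(mask << 1)) isolates the lowest set bit of the mask. */
    return (reg & mask) / (mask & ~(mask << 1));
}

/* Hypothetical helper: replace a field, leaving the other bits alone. */
static inline uint64_t demo_set_field(uint64_t reg, uint64_t mask,
                                      uint64_t val)
{
    return (reg & ~mask) | ((val * (mask & ~(mask << 1))) & mask);
}

/* Build a fault record header from a device ID and a cause code. */
static inline uint64_t demo_make_fq_hdr(uint32_t devid, uint32_t cause)
{
    uint64_t hdr = 0;

    hdr = demo_set_field(hdr, DEMO_FQ_HDR_DID, devid);
    hdr = demo_set_field(hdr, DEMO_FQ_HDR_CAUSE, cause);
    return hdr;
}
```

For example, device ID 0x108 with cause 260 (TTYPE_BLOCKED in the enum
below) packs the ID into bits 63:40 and the cause into bits 11:0.
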
 hw/riscv/riscv-iommu-bits.h | 335 ++++++++++++++++++++++++++++++++++++
 1 file changed, 335 insertions(+)
 create mode 100644 hw/riscv/riscv-iommu-bits.h

diff --git a/hw/riscv/riscv-iommu-bits.h b/hw/riscv/riscv-iommu-bits.h
new file mode 100644
index 0000000000..8e80b1e52a
--- /dev/null
+++ b/hw/riscv/riscv-iommu-bits.h
@@ -0,0 +1,335 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright © 2022-2023 Rivos Inc.
+ * Copyright © 2023 FORTH-ICS/CARV
+ * Copyright © 2023 RISC-V IOMMU Task Group
+ *
+ * RISC-V Ziommu - Register Layout and Data Structures.
+ *
+ * Based on the IOMMU spec version 1.0, 3/2023
+ * https://github.com/riscv-non-isa/riscv-iommu
+ */
+
+#ifndef HW_RISCV_IOMMU_BITS_H
+#define HW_RISCV_IOMMU_BITS_H
+
+#include "qemu/osdep.h"
+
+#define RISCV_IOMMU_SPEC_DOT_VER 0x010
+
+#ifndef GENMASK_ULL
+#define GENMASK_ULL(h, l) (((~0ULL) >> (63 - (h) + (l))) << (l))
+#endif
+
+/*
+ * struct riscv_iommu_fq_record - Fault/Event Queue Record
+ * See section 3.2 for more info.
+ */
+struct riscv_iommu_fq_record {
+    uint64_t hdr;
+    uint64_t _reserved;
+    uint64_t iotval;
+    uint64_t iotval2;
+};
+/* Header fields */
+#define RISCV_IOMMU_FQ_HDR_CAUSE        GENMASK_ULL(11, 0)
+#define RISCV_IOMMU_FQ_HDR_PID          GENMASK_ULL(31, 12)
+#define RISCV_IOMMU_FQ_HDR_PV           BIT_ULL(32)
+#define RISCV_IOMMU_FQ_HDR_TTYPE        GENMASK_ULL(39, 34)
+#define RISCV_IOMMU_FQ_HDR_DID          GENMASK_ULL(63, 40)
+
+/*
+ * struct riscv_iommu_pq_record - PCIe Page Request record
+ * For more information on the PCIe Page Request queue, see chapter 3.3.
+ */
+struct riscv_iommu_pq_record {
+      uint64_t hdr;
+      uint64_t payload;
+};
+/* Header fields */
+#define RISCV_IOMMU_PREQ_HDR_PID        GENMASK_ULL(31, 12)
+#define RISCV_IOMMU_PREQ_HDR_PV         BIT_ULL(32)
+#define RISCV_IOMMU_PREQ_HDR_PRIV       BIT_ULL(33)
+#define RISCV_IOMMU_PREQ_HDR_EXEC       BIT_ULL(34)
+#define RISCV_IOMMU_PREQ_HDR_DID        GENMASK_ULL(63, 40)
+/* Payload fields */
+#define RISCV_IOMMU_PREQ_PAYLOAD_M      GENMASK_ULL(2, 0)
+
+/* Common field positions */
+#define RISCV_IOMMU_PPN_FIELD           GENMASK_ULL(53, 10)
+#define RISCV_IOMMU_QUEUE_LOGSZ_FIELD   GENMASK_ULL(4, 0)
+#define RISCV_IOMMU_QUEUE_INDEX_FIELD   GENMASK_ULL(31, 0)
+#define RISCV_IOMMU_QUEUE_ENABLE        BIT(0)
+#define RISCV_IOMMU_QUEUE_INTR_ENABLE   BIT(1)
+#define RISCV_IOMMU_QUEUE_MEM_FAULT     BIT(8)
+#define RISCV_IOMMU_QUEUE_OVERFLOW      BIT(9)
+#define RISCV_IOMMU_QUEUE_ACTIVE        BIT(16)
+#define RISCV_IOMMU_QUEUE_BUSY          BIT(17)
+#define RISCV_IOMMU_ATP_PPN_FIELD       GENMASK_ULL(43, 0)
+#define RISCV_IOMMU_ATP_MODE_FIELD      GENMASK_ULL(63, 60)
+
+/* 5.3 IOMMU Capabilities (64bits) */
+#define RISCV_IOMMU_REG_CAP             0x0000
+#define RISCV_IOMMU_CAP_VERSION         GENMASK_ULL(7, 0)
+#define RISCV_IOMMU_CAP_MSI_FLAT        BIT_ULL(22)
+#define RISCV_IOMMU_CAP_MSI_MRIF        BIT_ULL(23)
+#define RISCV_IOMMU_CAP_IGS             GENMASK_ULL(29, 28)
+#define RISCV_IOMMU_CAP_PAS             GENMASK_ULL(37, 32)
+#define RISCV_IOMMU_CAP_PD8             BIT_ULL(38)
+
+/* 5.4 Features control register (32bits) */
+#define RISCV_IOMMU_REG_FCTL            0x0008
+
+/* 5.5 Device-directory-table pointer (64bits) */
+#define RISCV_IOMMU_REG_DDTP            0x0010
+#define RISCV_IOMMU_DDTP_MODE           GENMASK_ULL(3, 0)
+#define RISCV_IOMMU_DDTP_BUSY           BIT_ULL(4)
+#define RISCV_IOMMU_DDTP_PPN            RISCV_IOMMU_PPN_FIELD
+
+enum riscv_iommu_ddtp_modes {
+    RISCV_IOMMU_DDTP_MODE_OFF = 0,
+    RISCV_IOMMU_DDTP_MODE_BARE = 1,
+    RISCV_IOMMU_DDTP_MODE_1LVL = 2,
+    RISCV_IOMMU_DDTP_MODE_2LVL = 3,
+    RISCV_IOMMU_DDTP_MODE_3LVL = 4,
+    RISCV_IOMMU_DDTP_MODE_MAX = 4
+};
+
+/* 5.6 Command Queue Base (64bits) */
+#define RISCV_IOMMU_REG_CQB             0x0018
+#define RISCV_IOMMU_CQB_LOG2SZ          RISCV_IOMMU_QUEUE_LOGSZ_FIELD
+#define RISCV_IOMMU_CQB_PPN             RISCV_IOMMU_PPN_FIELD
+
+/* 5.7 Command Queue head (32bits) */
+#define RISCV_IOMMU_REG_CQH             0x0020
+
+/* 5.8 Command Queue tail (32bits) */
+#define RISCV_IOMMU_REG_CQT             0x0024
+
+/* 5.9 Fault Queue Base (64bits) */
+#define RISCV_IOMMU_REG_FQB             0x0028
+#define RISCV_IOMMU_FQB_LOG2SZ          RISCV_IOMMU_QUEUE_LOGSZ_FIELD
+#define RISCV_IOMMU_FQB_PPN             RISCV_IOMMU_PPN_FIELD
+
+/* 5.10 Fault Queue Head (32bits) */
+#define RISCV_IOMMU_REG_FQH             0x0030
+
+/* 5.11 Fault Queue tail (32bits) */
+#define RISCV_IOMMU_REG_FQT             0x0034
+
+/* 5.12 Page Request Queue base (64bits) */
+#define RISCV_IOMMU_REG_PQB             0x0038
+#define RISCV_IOMMU_PQB_LOG2SZ          RISCV_IOMMU_QUEUE_LOGSZ_FIELD
+#define RISCV_IOMMU_PQB_PPN             RISCV_IOMMU_PPN_FIELD
+
+/* 5.13 Page Request Queue head (32bits) */
+#define RISCV_IOMMU_REG_PQH             0x0040
+
+/* 5.14 Page Request Queue tail (32bits) */
+#define RISCV_IOMMU_REG_PQT             0x0044
+
+/* 5.15 Command Queue CSR (32bits) */
+#define RISCV_IOMMU_REG_CQCSR           0x0048
+#define RISCV_IOMMU_CQCSR_CQEN          RISCV_IOMMU_QUEUE_ENABLE
+#define RISCV_IOMMU_CQCSR_CIE           RISCV_IOMMU_QUEUE_INTR_ENABLE
+#define RISCV_IOMMU_CQCSR_CQMF          RISCV_IOMMU_QUEUE_MEM_FAULT
+#define RISCV_IOMMU_CQCSR_CMD_TO        BIT(9)
+#define RISCV_IOMMU_CQCSR_CMD_ILL       BIT(10)
+#define RISCV_IOMMU_CQCSR_CQON          RISCV_IOMMU_QUEUE_ACTIVE
+#define RISCV_IOMMU_CQCSR_BUSY          RISCV_IOMMU_QUEUE_BUSY
+
+/* 5.16 Fault Queue CSR (32bits) */
+#define RISCV_IOMMU_REG_FQCSR           0x004C
+#define RISCV_IOMMU_FQCSR_FQEN          RISCV_IOMMU_QUEUE_ENABLE
+#define RISCV_IOMMU_FQCSR_FIE           RISCV_IOMMU_QUEUE_INTR_ENABLE
+#define RISCV_IOMMU_FQCSR_FQMF          RISCV_IOMMU_QUEUE_MEM_FAULT
+#define RISCV_IOMMU_FQCSR_FQOF          RISCV_IOMMU_QUEUE_OVERFLOW
+#define RISCV_IOMMU_FQCSR_FQON          RISCV_IOMMU_QUEUE_ACTIVE
+#define RISCV_IOMMU_FQCSR_BUSY          RISCV_IOMMU_QUEUE_BUSY
+
+/* 5.17 Page Request Queue CSR (32bits) */
+#define RISCV_IOMMU_REG_PQCSR           0x0050
+#define RISCV_IOMMU_PQCSR_PQEN          RISCV_IOMMU_QUEUE_ENABLE
+#define RISCV_IOMMU_PQCSR_PIE           RISCV_IOMMU_QUEUE_INTR_ENABLE
+#define RISCV_IOMMU_PQCSR_PQMF          RISCV_IOMMU_QUEUE_MEM_FAULT
+#define RISCV_IOMMU_PQCSR_PQOF          RISCV_IOMMU_QUEUE_OVERFLOW
+#define RISCV_IOMMU_PQCSR_PQON          RISCV_IOMMU_QUEUE_ACTIVE
+#define RISCV_IOMMU_PQCSR_BUSY          RISCV_IOMMU_QUEUE_BUSY
+
+/* 5.18 Interrupt Pending Status (32bits) */
+#define RISCV_IOMMU_REG_IPSR            0x0054
+
+enum {
+    RISCV_IOMMU_INTR_CQ,
+    RISCV_IOMMU_INTR_FQ,
+    RISCV_IOMMU_INTR_PM,
+    RISCV_IOMMU_INTR_PQ,
+    RISCV_IOMMU_INTR_COUNT
+};
+
+/* 5.27 Interrupt cause to vector (64bits) */
+#define RISCV_IOMMU_REG_IVEC            0x02F8
+
+/* 5.28 MSI Configuration table (32 * 64bits) */
+#define RISCV_IOMMU_REG_MSI_CONFIG      0x0300
+
+#define RISCV_IOMMU_REG_SIZE           0x1000
+
+#define RISCV_IOMMU_DDTE_VALID          BIT_ULL(0)
+#define RISCV_IOMMU_DDTE_PPN            RISCV_IOMMU_PPN_FIELD
+
+/* Struct riscv_iommu_dc - Device Context - section 2.1 */
+struct riscv_iommu_dc {
+      uint64_t tc;
+      uint64_t iohgatp;
+      uint64_t ta;
+      uint64_t fsc;
+      uint64_t msiptp;
+      uint64_t msi_addr_mask;
+      uint64_t msi_addr_pattern;
+      uint64_t _reserved;
+};
+
+/* Translation control fields */
+#define RISCV_IOMMU_DC_TC_V             BIT_ULL(0)
+#define RISCV_IOMMU_DC_TC_DTF           BIT_ULL(4)
+#define RISCV_IOMMU_DC_TC_PDTV          BIT_ULL(5)
+#define RISCV_IOMMU_DC_TC_PRPR          BIT_ULL(6)
+#define RISCV_IOMMU_DC_TC_DPE           BIT_ULL(9)
+#define RISCV_IOMMU_DC_TC_SXL           BIT_ULL(11)
+
+/* Second-stage (aka G-stage) context fields */
+#define RISCV_IOMMU_DC_IOHGATP_PPN      RISCV_IOMMU_ATP_PPN_FIELD
+#define RISCV_IOMMU_DC_IOHGATP_GSCID    GENMASK_ULL(59, 44)
+#define RISCV_IOMMU_DC_IOHGATP_MODE     RISCV_IOMMU_ATP_MODE_FIELD
+
+enum riscv_iommu_dc_iohgatp_modes {
+    RISCV_IOMMU_DC_IOHGATP_MODE_BARE = 0,
+    RISCV_IOMMU_DC_IOHGATP_MODE_SV32X4 = 8,
+    RISCV_IOMMU_DC_IOHGATP_MODE_SV39X4 = 8,
+    RISCV_IOMMU_DC_IOHGATP_MODE_SV48X4 = 9,
+    RISCV_IOMMU_DC_IOHGATP_MODE_SV57X4 = 10
+};
+
+/* Translation attributes fields */
+#define RISCV_IOMMU_DC_TA_PSCID         GENMASK_ULL(31, 12)
+
+/* First-stage context fields */
+#define RISCV_IOMMU_DC_FSC_PPN          RISCV_IOMMU_ATP_PPN_FIELD
+#define RISCV_IOMMU_DC_FSC_MODE         RISCV_IOMMU_ATP_MODE_FIELD
+
+/* Generic I/O MMU command structure - check section 3.1 */
+struct riscv_iommu_command {
+    uint64_t dword0;
+    uint64_t dword1;
+};
+
+#define RISCV_IOMMU_CMD_OPCODE          GENMASK_ULL(6, 0)
+#define RISCV_IOMMU_CMD_FUNC            GENMASK_ULL(9, 7)
+
+#define RISCV_IOMMU_CMD_IOTINVAL_OPCODE         1
+#define RISCV_IOMMU_CMD_IOTINVAL_FUNC_VMA       0
+#define RISCV_IOMMU_CMD_IOTINVAL_FUNC_GVMA      1
+#define RISCV_IOMMU_CMD_IOTINVAL_AV     BIT_ULL(10)
+#define RISCV_IOMMU_CMD_IOTINVAL_PSCID  GENMASK_ULL(31, 12)
+#define RISCV_IOMMU_CMD_IOTINVAL_PSCV   BIT_ULL(32)
+#define RISCV_IOMMU_CMD_IOTINVAL_GV     BIT_ULL(33)
+#define RISCV_IOMMU_CMD_IOTINVAL_GSCID  GENMASK_ULL(59, 44)
+
+#define RISCV_IOMMU_CMD_IOFENCE_OPCODE          2
+#define RISCV_IOMMU_CMD_IOFENCE_FUNC_C          0
+#define RISCV_IOMMU_CMD_IOFENCE_AV      BIT_ULL(10)
+#define RISCV_IOMMU_CMD_IOFENCE_DATA    GENMASK_ULL(63, 32)
+
+#define RISCV_IOMMU_CMD_IODIR_OPCODE            3
+#define RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_DDT    0
+#define RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_PDT    1
+#define RISCV_IOMMU_CMD_IODIR_PID       GENMASK_ULL(31, 12)
+#define RISCV_IOMMU_CMD_IODIR_DV        BIT_ULL(33)
+#define RISCV_IOMMU_CMD_IODIR_DID       GENMASK_ULL(63, 40)
+
+enum riscv_iommu_dc_fsc_atp_modes {
+    RISCV_IOMMU_DC_FSC_MODE_BARE = 0,
+    RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV32 = 8,
+    RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39 = 8,
+    RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV48 = 9,
+    RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV57 = 10,
+    RISCV_IOMMU_DC_FSC_PDTP_MODE_PD8 = 1,
+    RISCV_IOMMU_DC_FSC_PDTP_MODE_PD17 = 2,
+    RISCV_IOMMU_DC_FSC_PDTP_MODE_PD20 = 3
+};
+
+enum riscv_iommu_fq_causes {
+    RISCV_IOMMU_FQ_CAUSE_INST_FAULT           = 1,
+    RISCV_IOMMU_FQ_CAUSE_RD_ADDR_MISALIGNED   = 4,
+    RISCV_IOMMU_FQ_CAUSE_RD_FAULT             = 5,
+    RISCV_IOMMU_FQ_CAUSE_WR_ADDR_MISALIGNED   = 6,
+    RISCV_IOMMU_FQ_CAUSE_WR_FAULT             = 7,
+    RISCV_IOMMU_FQ_CAUSE_INST_FAULT_S         = 12,
+    RISCV_IOMMU_FQ_CAUSE_RD_FAULT_S           = 13,
+    RISCV_IOMMU_FQ_CAUSE_WR_FAULT_S           = 15,
+    RISCV_IOMMU_FQ_CAUSE_INST_FAULT_VS        = 20,
+    RISCV_IOMMU_FQ_CAUSE_RD_FAULT_VS          = 21,
+    RISCV_IOMMU_FQ_CAUSE_WR_FAULT_VS          = 23,
+    RISCV_IOMMU_FQ_CAUSE_DMA_DISABLED         = 256,
+    RISCV_IOMMU_FQ_CAUSE_DDT_LOAD_FAULT       = 257,
+    RISCV_IOMMU_FQ_CAUSE_DDT_INVALID          = 258,
+    RISCV_IOMMU_FQ_CAUSE_DDT_MISCONFIGURED    = 259,
+    RISCV_IOMMU_FQ_CAUSE_TTYPE_BLOCKED        = 260,
+    RISCV_IOMMU_FQ_CAUSE_MSI_LOAD_FAULT       = 261,
+    RISCV_IOMMU_FQ_CAUSE_MSI_INVALID          = 262,
+    RISCV_IOMMU_FQ_CAUSE_MSI_MISCONFIGURED    = 263,
+    RISCV_IOMMU_FQ_CAUSE_MRIF_FAULT           = 264,
+    RISCV_IOMMU_FQ_CAUSE_PDT_LOAD_FAULT       = 265,
+    RISCV_IOMMU_FQ_CAUSE_PDT_INVALID          = 266,
+    RISCV_IOMMU_FQ_CAUSE_PDT_MISCONFIGURED    = 267,
+    RISCV_IOMMU_FQ_CAUSE_DDT_CORRUPTED        = 268,
+    RISCV_IOMMU_FQ_CAUSE_PDT_CORRUPTED        = 269,
+    RISCV_IOMMU_FQ_CAUSE_MSI_PT_CORRUPTED     = 270,
+    RISCV_IOMMU_FQ_CAUSE_MRIF_CORRUIPTED      = 271,
+    RISCV_IOMMU_FQ_CAUSE_INTERNAL_DP_ERROR    = 272,
+    RISCV_IOMMU_FQ_CAUSE_MSI_WR_FAULT         = 273,
+    RISCV_IOMMU_FQ_CAUSE_PT_CORRUPTED         = 274
+};
+
+/* MSI page table pointer */
+#define RISCV_IOMMU_DC_MSIPTP_PPN       RISCV_IOMMU_ATP_PPN_FIELD
+#define RISCV_IOMMU_DC_MSIPTP_MODE      RISCV_IOMMU_ATP_MODE_FIELD
+#define RISCV_IOMMU_DC_MSIPTP_MODE_FLAT 1
+
+/* Translation attributes fields */
+#define RISCV_IOMMU_PC_TA_V             BIT_ULL(0)
+
+/* First stage context fields */
+#define RISCV_IOMMU_PC_FSC_PPN          GENMASK_ULL(43, 0)
+
+enum riscv_iommu_fq_ttypes {
+    RISCV_IOMMU_FQ_TTYPE_NONE = 0,
+    RISCV_IOMMU_FQ_TTYPE_UADDR_INST_FETCH = 1,
+    RISCV_IOMMU_FQ_TTYPE_UADDR_RD = 2,
+    RISCV_IOMMU_FQ_TTYPE_UADDR_WR = 3,
+    RISCV_IOMMU_FQ_TTYPE_TADDR_INST_FETCH = 5,
+    RISCV_IOMMU_FQ_TTYPE_TADDR_RD = 6,
+    RISCV_IOMMU_FQ_TTYPE_TADDR_WR = 7,
+    RISCV_IOMMU_FW_TTYPE_PCIE_MSG_REQ = 8,
+};
+
+/* Fields on pte */
+#define RISCV_IOMMU_MSI_PTE_V           BIT_ULL(0)
+#define RISCV_IOMMU_MSI_PTE_M           GENMASK_ULL(2, 1)
+
+#define RISCV_IOMMU_MSI_PTE_M_MRIF      1
+#define RISCV_IOMMU_MSI_PTE_M_BASIC     3
+
+/* When M == 1 (MRIF mode) */
+#define RISCV_IOMMU_MSI_PTE_MRIF_ADDR   GENMASK_ULL(53, 7)
+/* When M == 3 (basic mode) */
+#define RISCV_IOMMU_MSI_PTE_PPN         RISCV_IOMMU_PPN_FIELD
+#define RISCV_IOMMU_MSI_PTE_C           BIT_ULL(63)
+
+/* Fields on mrif_info */
+#define RISCV_IOMMU_MSI_MRIF_NID        GENMASK_ULL(9, 0)
+#define RISCV_IOMMU_MSI_MRIF_NPPN       RISCV_IOMMU_PPN_FIELD
+#define RISCV_IOMMU_MSI_MRIF_NID_MSB    BIT_ULL(60)
+
+#endif /* HW_RISCV_IOMMU_BITS_H */
-- 
2.43.2




* [PATCH v2 03/15] hw/riscv: add RISC-V IOMMU base emulation
  2024-03-07 16:03 [PATCH v2 00/15] riscv: QEMU RISC-V IOMMU Support Daniel Henrique Barboza
  2024-03-07 16:03 ` [PATCH v2 01/15] exec/memtxattr: add process identifier to the transaction attributes Daniel Henrique Barboza
  2024-03-07 16:03 ` [PATCH v2 02/15] hw/riscv: add riscv-iommu-bits.h Daniel Henrique Barboza
@ 2024-03-07 16:03 ` Daniel Henrique Barboza
  2024-05-01 11:57   ` Jason Chien
  2024-05-02 11:37   ` Frank Chang
  2024-03-07 16:03 ` [PATCH v2 04/15] hw/riscv: add riscv-iommu-pci device Daniel Henrique Barboza
                   ` (12 subsequent siblings)
  15 siblings, 2 replies; 55+ messages in thread
From: Daniel Henrique Barboza @ 2024-03-07 16:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-riscv, alistair.francis, bmeng, liwei1518, zhiwei_liu,
	palmer, ajones, tjeznach, Sebastien Boeuf,
	Daniel Henrique Barboza

From: Tomasz Jeznach <tjeznach@rivosinc.com>

The RISC-V IOMMU specification is now ratified as per the RISC-V
International process. The latest frozen specification can be found
at:

https://github.com/riscv-non-isa/riscv-iommu/releases/download/v1.0/riscv-iommu.pdf

Add the foundation of the device emulation for RISC-V IOMMU, which
includes an IOMMU that has no capabilities but MSI interrupt support and
fault queue interfaces. We'll add more features incrementally in the
next patches.

Co-developed-by: Sebastien Boeuf <seb@rivosinc.com>
Signed-off-by: Sebastien Boeuf <seb@rivosinc.com>
Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
---
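A note for reviewers, not part of the commit: the MSI path relies on two
small bit tricks, the pattern/mask match in riscv_iommu_msi_check() and
the bit extraction in _pext_u64(). The sketch below reproduces both in
isolation. The demo_* names and the fixed 4 KiB page shift are
assumptions of this illustration, and the mask and pattern values are
made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_BITS 12   /* assumption: 4 KiB pages */

/* Same loop as _pext_u64(): gather the bits of val selected by ext. */
static inline uint64_t demo_pext_u64(uint64_t val, uint64_t ext)
{
    uint64_t ret = 0;
    uint64_t rot = 1;

    while (ext) {
        if (ext & 1) {
            if (val & 1) {
                ret |= rot;
            }
            rot <<= 1;
        }
        val >>= 1;
        ext >>= 1;
    }
    return ret;
}

/* A GPA is an MSI target if its PPN equals the pattern outside the mask. */
static inline bool demo_msi_match(uint64_t gpa, uint64_t pattern,
                                  uint64_t mask)
{
    return (((gpa >> DEMO_PAGE_BITS) ^ pattern) & ~mask) == 0;
}

/* The masked PPN bits, packed together, give the interrupt file number. */
static inline uint64_t demo_msi_file_number(uint64_t gpa, uint64_t mask)
{
    return demo_pext_u64(gpa >> DEMO_PAGE_BITS, mask);
}
```

With mask 0x33 (PPN bits 0, 1, 4 and 5) and pattern 0x28000, the GPA
0x28021000 matches and selects interrupt file 9, while 0x28025000 does
not match because PPN bit 2 lies outside the mask.
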
 hw/riscv/Kconfig         |    4 +
 hw/riscv/meson.build     |    1 +
 hw/riscv/riscv-iommu.c   | 1492 ++++++++++++++++++++++++++++++++++++++
 hw/riscv/riscv-iommu.h   |  141 ++++
 hw/riscv/trace-events    |   11 +
 hw/riscv/trace.h         |    2 +
 include/hw/riscv/iommu.h |   36 +
 meson.build              |    1 +
 8 files changed, 1688 insertions(+)
 create mode 100644 hw/riscv/riscv-iommu.c
 create mode 100644 hw/riscv/riscv-iommu.h
 create mode 100644 hw/riscv/trace-events
 create mode 100644 hw/riscv/trace.h
 create mode 100644 include/hw/riscv/iommu.h
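
A note for reviewers, not part of the commit: riscv_iommu_fault() and
riscv_iommu_pri() share the same producer-side ring logic. The sketch
below models it with a plain array in place of guest memory. DemoQueue
and the demo_* functions are hypothetical; only the overflow bit
position mirrors RISCV_IOMMU_QUEUE_OVERFLOW.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_Q_OVERFLOW (1u << 9)   /* mirrors RISCV_IOMMU_QUEUE_OVERFLOW */

/* Hypothetical model of one queue; at most 8 entries in this sketch. */
typedef struct DemoQueue {
    uint32_t head;      /* consumer index, advanced by software */
    uint32_t tail;      /* producer index, advanced by the device model */
    uint32_t mask;      /* entry count minus one (power of two, <= 7) */
    uint32_t csr;       /* status bits */
    uint64_t slots[8];  /* stands in for the queue memory */
} DemoQueue;

/* Append one record; returns false if the record was dropped. */
static inline bool demo_queue_push(DemoQueue *q, uint64_t rec)
{
    uint32_t tail = q->tail & q->mask;
    uint32_t next = (tail + 1) & q->mask;

    if (q->csr & DEMO_Q_OVERFLOW) {
        return false;               /* stay stopped until software clears it */
    }
    if ((q->head & q->mask) == next) {
        q->csr |= DEMO_Q_OVERFLOW;  /* full: one slot is always left empty */
        return false;
    }
    q->slots[tail] = rec;
    q->tail = next;
    return true;
}

/* Push count records into an empty queue and report how many were stored. */
static inline unsigned demo_queue_fill(uint32_t mask, unsigned count)
{
    DemoQueue q = { 0 };
    unsigned stored = 0;

    q.mask = mask;
    for (unsigned i = 0; i < count; i++) {
        if (demo_queue_push(&q, i)) {
            stored++;
        }
    }
    return stored;
}
```

Because "full" is detected as head == next, a queue with N entries holds
at most N - 1 records before the overflow bit is raised.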

diff --git a/hw/riscv/Kconfig b/hw/riscv/Kconfig
index 5d644eb7b1..faf6a10029 100644
--- a/hw/riscv/Kconfig
+++ b/hw/riscv/Kconfig
@@ -1,3 +1,6 @@
+config RISCV_IOMMU
+    bool
+
 config RISCV_NUMA
     bool
 
@@ -38,6 +41,7 @@ config RISCV_VIRT
     select SERIAL
     select RISCV_ACLINT
     select RISCV_APLIC
+    select RISCV_IOMMU
     select RISCV_IMSIC
     select SIFIVE_PLIC
     select SIFIVE_TEST
diff --git a/hw/riscv/meson.build b/hw/riscv/meson.build
index 2f7ee81be3..ba9eebd605 100644
--- a/hw/riscv/meson.build
+++ b/hw/riscv/meson.build
@@ -10,5 +10,6 @@ riscv_ss.add(when: 'CONFIG_SIFIVE_U', if_true: files('sifive_u.c'))
 riscv_ss.add(when: 'CONFIG_SPIKE', if_true: files('spike.c'))
 riscv_ss.add(when: 'CONFIG_MICROCHIP_PFSOC', if_true: files('microchip_pfsoc.c'))
 riscv_ss.add(when: 'CONFIG_ACPI', if_true: files('virt-acpi-build.c'))
+riscv_ss.add(when: 'CONFIG_RISCV_IOMMU', if_true: files('riscv-iommu.c'))
 
 hw_arch += {'riscv': riscv_ss}
diff --git a/hw/riscv/riscv-iommu.c b/hw/riscv/riscv-iommu.c
new file mode 100644
index 0000000000..df534b99b0
--- /dev/null
+++ b/hw/riscv/riscv-iommu.c
@@ -0,0 +1,1492 @@
+/*
+ * QEMU emulation of a RISC-V IOMMU (Ziommu)
+ *
+ * Copyright (C) 2021-2023, Rivos Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "qom/object.h"
+#include "hw/pci/pci_bus.h"
+#include "hw/pci/pci_device.h"
+#include "hw/qdev-properties.h"
+#include "hw/riscv/riscv_hart.h"
+#include "migration/vmstate.h"
+#include "qapi/error.h"
+#include "qemu/timer.h"
+
+#include "cpu_bits.h"
+#include "riscv-iommu.h"
+#include "riscv-iommu-bits.h"
+#include "trace.h"
+
+#define LIMIT_CACHE_CTX               (1U << 7)
+#define LIMIT_CACHE_IOT               (1U << 20)
+
+/* Physical page number conversions */
+#define PPN_PHYS(ppn)                 ((ppn) << TARGET_PAGE_BITS)
+#define PPN_DOWN(phy)                 ((phy) >> TARGET_PAGE_BITS)
+
+typedef struct RISCVIOMMUContext RISCVIOMMUContext;
+typedef struct RISCVIOMMUEntry RISCVIOMMUEntry;
+
+/* Device assigned I/O address space */
+struct RISCVIOMMUSpace {
+    IOMMUMemoryRegion iova_mr;  /* IOVA memory region for attached device */
+    AddressSpace iova_as;       /* IOVA address space for attached device */
+    RISCVIOMMUState *iommu;     /* Managing IOMMU device state */
+    uint32_t devid;             /* Requester identifier, AKA device_id */
+    bool notifier;              /* IOMMU unmap notifier enabled */
+    QLIST_ENTRY(RISCVIOMMUSpace) list;
+};
+
+/* Device translation context state. */
+struct RISCVIOMMUContext {
+    uint64_t devid:24;          /* Requester Id, AKA device_id */
+    uint64_t pasid:20;          /* Process Address Space ID */
+    uint64_t __rfu:20;          /* reserved */
+    uint64_t tc;                /* Translation Control */
+    uint64_t ta;                /* Translation Attributes */
+    uint64_t msi_addr_mask;     /* MSI filtering - address mask */
+    uint64_t msi_addr_pattern;  /* MSI filtering - address pattern */
+    uint64_t msiptp;            /* MSI redirection page table pointer */
+};
+
+/* IOMMU index for transactions without PASID specified. */
+#define RISCV_IOMMU_NOPASID 0
+
+static void riscv_iommu_notify(RISCVIOMMUState *s, int vec)
+{
+    const uint32_t ipsr =
+        riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_IPSR, (1 << vec), 0);
+    const uint32_t ivec = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_IVEC);
+    if (s->notify && !(ipsr & (1 << vec))) {
+        s->notify(s, (ivec >> (vec * 4)) & 0x0F);
+    }
+}
+
+static void riscv_iommu_fault(RISCVIOMMUState *s,
+                              struct riscv_iommu_fq_record *ev)
+{
+    uint32_t ctrl = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_FQCSR);
+    uint32_t head = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_FQH) & s->fq_mask;
+    uint32_t tail = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_FQT) & s->fq_mask;
+    uint32_t next = (tail + 1) & s->fq_mask;
+    uint32_t devid = get_field(ev->hdr, RISCV_IOMMU_FQ_HDR_DID);
+
+    trace_riscv_iommu_flt(s->parent_obj.id, PCI_BUS_NUM(devid), PCI_SLOT(devid),
+                          PCI_FUNC(devid), ev->hdr, ev->iotval);
+
+    if (!(ctrl & RISCV_IOMMU_FQCSR_FQON) ||
+        !!(ctrl & (RISCV_IOMMU_FQCSR_FQOF | RISCV_IOMMU_FQCSR_FQMF))) {
+        return;
+    }
+
+    if (head == next) {
+        riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_FQCSR,
+                              RISCV_IOMMU_FQCSR_FQOF, 0);
+    } else {
+        dma_addr_t addr = s->fq_addr + tail * sizeof(*ev);
+        if (dma_memory_write(s->target_as, addr, ev, sizeof(*ev),
+                             MEMTXATTRS_UNSPECIFIED) != MEMTX_OK) {
+            riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_FQCSR,
+                                  RISCV_IOMMU_FQCSR_FQMF, 0);
+        } else {
+            riscv_iommu_reg_set32(s, RISCV_IOMMU_REG_FQT, next);
+        }
+    }
+
+    if (ctrl & RISCV_IOMMU_FQCSR_FIE) {
+        riscv_iommu_notify(s, RISCV_IOMMU_INTR_FQ);
+    }
+}
+
+static void riscv_iommu_pri(RISCVIOMMUState *s,
+    struct riscv_iommu_pq_record *pr)
+{
+    uint32_t ctrl = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_PQCSR);
+    uint32_t head = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_PQH) & s->pq_mask;
+    uint32_t tail = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_PQT) & s->pq_mask;
+    uint32_t next = (tail + 1) & s->pq_mask;
+    uint32_t devid = get_field(pr->hdr, RISCV_IOMMU_PREQ_HDR_DID);
+
+    trace_riscv_iommu_pri(s->parent_obj.id, PCI_BUS_NUM(devid), PCI_SLOT(devid),
+                          PCI_FUNC(devid), pr->payload);
+
+    if (!(ctrl & RISCV_IOMMU_PQCSR_PQON) ||
+        !!(ctrl & (RISCV_IOMMU_PQCSR_PQOF | RISCV_IOMMU_PQCSR_PQMF))) {
+        return;
+    }
+
+    if (head == next) {
+        riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_PQCSR,
+                              RISCV_IOMMU_PQCSR_PQOF, 0);
+    } else {
+        dma_addr_t addr = s->pq_addr + tail * sizeof(*pr);
+        if (dma_memory_write(s->target_as, addr, pr, sizeof(*pr),
+                             MEMTXATTRS_UNSPECIFIED) != MEMTX_OK) {
+            riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_PQCSR,
+                                  RISCV_IOMMU_PQCSR_PQMF, 0);
+        } else {
+            riscv_iommu_reg_set32(s, RISCV_IOMMU_REG_PQT, next);
+        }
+    }
+
+    if (ctrl & RISCV_IOMMU_PQCSR_PIE) {
+        riscv_iommu_notify(s, RISCV_IOMMU_INTR_PQ);
+    }
+}
+
+/* Portable implementation of pext_u64, bit-mask extraction. */
+static uint64_t _pext_u64(uint64_t val, uint64_t ext)
+{
+    uint64_t ret = 0;
+    uint64_t rot = 1;
+
+    while (ext) {
+        if (ext & 1) {
+            if (val & 1) {
+                ret |= rot;
+            }
+            rot <<= 1;
+        }
+        val >>= 1;
+        ext >>= 1;
+    }
+
+    return ret;
+}
+
+/* Check if GPA matches MSI/MRIF pattern. */
+static bool riscv_iommu_msi_check(RISCVIOMMUState *s, RISCVIOMMUContext *ctx,
+    dma_addr_t gpa)
+{
+    if (get_field(ctx->msiptp, RISCV_IOMMU_DC_MSIPTP_MODE) !=
+        RISCV_IOMMU_DC_MSIPTP_MODE_FLAT) {
+        return false; /* Invalid MSI/MRIF mode */
+    }
+
+    if ((PPN_DOWN(gpa) ^ ctx->msi_addr_pattern) & ~ctx->msi_addr_mask) {
+        return false; /* GPA not in MSI range defined by AIA IMSIC rules. */
+    }
+
+    return true;
+}
+
+/* RISCV IOMMU Address Translation Lookup - Page Table Walk */
+static int riscv_iommu_spa_fetch(RISCVIOMMUState *s, RISCVIOMMUContext *ctx,
+    IOMMUTLBEntry *iotlb)
+{
+    /* Early check for MSI address match when IOVA == GPA */
+    if (iotlb->perm & IOMMU_WO &&
+        riscv_iommu_msi_check(s, ctx, iotlb->iova)) {
+        iotlb->target_as = &s->trap_as;
+        iotlb->translated_addr = iotlb->iova;
+        iotlb->addr_mask = ~TARGET_PAGE_MASK;
+        return 0;
+    }
+
+    /* Exit early for pass-through mode. */
+    iotlb->translated_addr = iotlb->iova;
+    iotlb->addr_mask = ~TARGET_PAGE_MASK;
+    /* Allow R/W in pass-through mode */
+    iotlb->perm = IOMMU_RW;
+    return 0;
+}
+
+/* Redirect MSI write for given GPA. */
+static MemTxResult riscv_iommu_msi_write(RISCVIOMMUState *s,
+    RISCVIOMMUContext *ctx, uint64_t gpa, uint64_t data,
+    unsigned size, MemTxAttrs attrs)
+{
+    MemTxResult res;
+    dma_addr_t addr;
+    uint64_t intn;
+    uint32_t n190;
+    uint64_t pte[2];
+
+    if (!riscv_iommu_msi_check(s, ctx, gpa)) {
+        return MEMTX_ACCESS_ERROR;
+    }
+
+    /* Interrupt File Number */
+    intn = _pext_u64(PPN_DOWN(gpa), ctx->msi_addr_mask);
+    if (intn >= 256) {
+        /* Interrupt file number out of range */
+        return MEMTX_ACCESS_ERROR;
+    }
+
+    /* fetch MSI PTE */
+    addr = PPN_PHYS(get_field(ctx->msiptp, RISCV_IOMMU_DC_MSIPTP_PPN));
+    addr = addr | (intn * sizeof(pte));
+    res = dma_memory_read(s->target_as, addr, &pte, sizeof(pte),
+            MEMTXATTRS_UNSPECIFIED);
+    if (res != MEMTX_OK) {
+        return res;
+    }
+
+    le64_to_cpus(&pte[0]);
+    le64_to_cpus(&pte[1]);
+
+    if (!(pte[0] & RISCV_IOMMU_MSI_PTE_V) || (pte[0] & RISCV_IOMMU_MSI_PTE_C)) {
+        return MEMTX_ACCESS_ERROR;
+    }
+
+    switch (get_field(pte[0], RISCV_IOMMU_MSI_PTE_M)) {
+    case RISCV_IOMMU_MSI_PTE_M_BASIC:
+        /* MSI Pass-through mode */
+        addr = PPN_PHYS(get_field(pte[0], RISCV_IOMMU_MSI_PTE_PPN));
+        addr = addr | (gpa & ~TARGET_PAGE_MASK);
+
+        trace_riscv_iommu_msi(s->parent_obj.id, PCI_BUS_NUM(ctx->devid),
+                              PCI_SLOT(ctx->devid), PCI_FUNC(ctx->devid),
+                              gpa, addr);
+
+        return dma_memory_write(s->target_as, addr, &data, size, attrs);
+    case RISCV_IOMMU_MSI_PTE_M_MRIF:
+        /* MRIF mode, continue. */
+        break;
+    default:
+        return MEMTX_ACCESS_ERROR;
+    }
+
+    /*
+     * Report an error for interrupt identities exceeding the maximum allowed
+     * for an IMSIC interrupt file (2047) or a destination address that is not
+     * 32-bit aligned. See IOMMU Specification, Chapter 2.3. MSI page tables.
+     */
+    if ((data > 2047) || (gpa & 3)) {
+        return MEMTX_ACCESS_ERROR;
+    }
+
+    /* MSI MRIF mode, non-atomic pending bit update */
+
+    /* MRIF pending bit address */
+    addr = get_field(pte[0], RISCV_IOMMU_MSI_PTE_MRIF_ADDR) << 9;
+    addr = addr | ((data & 0x7c0) >> 3);
+
+    trace_riscv_iommu_msi(s->parent_obj.id, PCI_BUS_NUM(ctx->devid),
+                          PCI_SLOT(ctx->devid), PCI_FUNC(ctx->devid),
+                          gpa, addr);
+
+    /* MRIF pending bit mask */
+    data = 1ULL << (data & 0x03f);
+    res = dma_memory_read(s->target_as, addr, &intn, sizeof(intn), attrs);
+    if (res != MEMTX_OK) {
+        return res;
+    }
+    intn = intn | data;
+    res = dma_memory_write(s->target_as, addr, &intn, sizeof(intn), attrs);
+    if (res != MEMTX_OK) {
+        return res;
+    }
+
+    /* Get MRIF enable bits */
+    addr = addr + sizeof(intn);
+    res = dma_memory_read(s->target_as, addr, &intn, sizeof(intn), attrs);
+    if (res != MEMTX_OK) {
+        return res;
+    }
+    if (!(intn & data)) {
+        /* notification disabled, MRIF update completed. */
+        return MEMTX_OK;
+    }
+
+    /* Send notification message */
+    addr = PPN_PHYS(get_field(pte[1], RISCV_IOMMU_MSI_MRIF_NPPN));
+    n190 = get_field(pte[1], RISCV_IOMMU_MSI_MRIF_NID) |
+          (get_field(pte[1], RISCV_IOMMU_MSI_MRIF_NID_MSB) << 10);
+
+    res = dma_memory_write(s->target_as, addr, &n190, sizeof(n190), attrs);
+    if (res != MEMTX_OK) {
+        return res;
+    }
+
+    return MEMTX_OK;
+}
+
+/*
+ * RISC-V IOMMU Device Context Lookup - Device Directory Tree Walk
+ *
+ * @s         : IOMMU Device State
+ * @ctx       : Device Translation Context with devid and pasid set.
+ * @return    : success or fault code.
+ */
+static int riscv_iommu_ctx_fetch(RISCVIOMMUState *s, RISCVIOMMUContext *ctx)
+{
+    const uint64_t ddtp = s->ddtp;
+    unsigned mode = get_field(ddtp, RISCV_IOMMU_DDTP_MODE);
+    dma_addr_t addr = PPN_PHYS(get_field(ddtp, RISCV_IOMMU_DDTP_PPN));
+    struct riscv_iommu_dc dc;
+    /* Device Context format: 0: extended (64 bytes) | 1: base (32 bytes) */
+    const int dc_fmt = !s->enable_msi;
+    const size_t dc_len = sizeof(dc) >> dc_fmt;
+    unsigned depth;
+    uint64_t de;
+
+    switch (mode) {
+    case RISCV_IOMMU_DDTP_MODE_OFF:
+        return RISCV_IOMMU_FQ_CAUSE_DMA_DISABLED;
+
+    case RISCV_IOMMU_DDTP_MODE_BARE:
+        /* mock up pass-through translation context */
+        ctx->tc = RISCV_IOMMU_DC_TC_V;
+        ctx->ta = 0;
+        ctx->msiptp = 0;
+        return 0;
+
+    case RISCV_IOMMU_DDTP_MODE_1LVL:
+        depth = 0;
+        break;
+
+    case RISCV_IOMMU_DDTP_MODE_2LVL:
+        depth = 1;
+        break;
+
+    case RISCV_IOMMU_DDTP_MODE_3LVL:
+        depth = 2;
+        break;
+
+    default:
+        return RISCV_IOMMU_FQ_CAUSE_DDT_MISCONFIGURED;
+    }
+
+    /*
+     * Check supported device id width (in bits).
+     * See IOMMU Specification, Chapter 6. Software guidelines.
+     * - if extended device-context format is used:
+     *   1LVL: 6, 2LVL: 15, 3LVL: 24
+     * - if base device-context format is used:
+     *   1LVL: 7, 2LVL: 16, 3LVL: 24
+     */
+    if (ctx->devid >= (1 << (depth * 9 + 6 + (dc_fmt && depth != 2)))) {
+        return RISCV_IOMMU_FQ_CAUSE_DDT_INVALID;
+    }
+
+    /* Device directory tree walk */
+    for (; depth-- > 0; ) {
+        /*
+         * Select device id index bits based on device directory tree level
+         * and device context format.
+         * See IOMMU Specification, Chapter 2. Data Structures.
+         * - if extended device-context format is used:
+         *   device index: [23:15][14:6][5:0]
+         * - if base device-context format is used:
+         *   device index: [23:16][15:7][6:0]
+         */
+        const int split = depth * 9 + 6 + dc_fmt;
+        addr |= ((ctx->devid >> split) << 3) & ~TARGET_PAGE_MASK;
+        if (dma_memory_read(s->target_as, addr, &de, sizeof(de),
+                            MEMTXATTRS_UNSPECIFIED) != MEMTX_OK) {
+            return RISCV_IOMMU_FQ_CAUSE_DDT_LOAD_FAULT;
+        }
+        le64_to_cpus(&de);
+        if (!(de & RISCV_IOMMU_DDTE_VALID)) {
+            /* invalid directory entry */
+            return RISCV_IOMMU_FQ_CAUSE_DDT_INVALID;
+        }
+        if (de & ~(RISCV_IOMMU_DDTE_PPN | RISCV_IOMMU_DDTE_VALID)) {
+            /* reserved bits set */
+            return RISCV_IOMMU_FQ_CAUSE_DDT_INVALID;
+        }
+        addr = PPN_PHYS(get_field(de, RISCV_IOMMU_DDTE_PPN));
+    }
+
+    /* index into device context entry page */
+    addr |= (ctx->devid * dc_len) & ~TARGET_PAGE_MASK;
+
+    memset(&dc, 0, sizeof(dc));
+    if (dma_memory_read(s->target_as, addr, &dc, dc_len,
+                        MEMTXATTRS_UNSPECIFIED) != MEMTX_OK) {
+        return RISCV_IOMMU_FQ_CAUSE_DDT_LOAD_FAULT;
+    }
+
+    /* Set translation context. */
+    ctx->tc = le64_to_cpu(dc.tc);
+    ctx->ta = le64_to_cpu(dc.ta);
+    ctx->msiptp = le64_to_cpu(dc.msiptp);
+    ctx->msi_addr_mask = le64_to_cpu(dc.msi_addr_mask);
+    ctx->msi_addr_pattern = le64_to_cpu(dc.msi_addr_pattern);
+
+    if (!(ctx->tc & RISCV_IOMMU_DC_TC_V)) {
+        return RISCV_IOMMU_FQ_CAUSE_DDT_INVALID;
+    }
+
+    if (!(ctx->tc & RISCV_IOMMU_DC_TC_PDTV)) {
+        if (ctx->pasid != RISCV_IOMMU_NOPASID) {
+            /* PASID is disabled */
+            return RISCV_IOMMU_FQ_CAUSE_TTYPE_BLOCKED;
+        }
+        return 0;
+    }
+
+    /* FSC.TC.PDTV enabled */
+    if (mode > RISCV_IOMMU_DC_FSC_PDTP_MODE_PD20) {
+        /* Invalid PDTP.MODE */
+        return RISCV_IOMMU_FQ_CAUSE_PDT_MISCONFIGURED;
+    }
+
+    for (depth = mode - RISCV_IOMMU_DC_FSC_PDTP_MODE_PD8; depth-- > 0; ) {
+        /*
+         * Select process id index bits based on process directory tree
+         * level. See IOMMU Specification, 2.2. Process-Directory-Table.
+         */
+        const int split = depth * 9 + 8;
+        addr |= ((ctx->pasid >> split) << 3) & ~TARGET_PAGE_MASK;
+        if (dma_memory_read(s->target_as, addr, &de, sizeof(de),
+                            MEMTXATTRS_UNSPECIFIED) != MEMTX_OK) {
+            return RISCV_IOMMU_FQ_CAUSE_PDT_LOAD_FAULT;
+        }
+        le64_to_cpus(&de);
+        if (!(de & RISCV_IOMMU_PC_TA_V)) {
+            return RISCV_IOMMU_FQ_CAUSE_PDT_INVALID;
+        }
+        addr = PPN_PHYS(get_field(de, RISCV_IOMMU_PC_FSC_PPN));
+    }
+
+    /* Leaf entry in PDT */
+    addr |= (ctx->pasid << 4) & ~TARGET_PAGE_MASK;
+    if (dma_memory_read(s->target_as, addr, &dc.ta, sizeof(uint64_t) * 2,
+                        MEMTXATTRS_UNSPECIFIED) != MEMTX_OK) {
+        return RISCV_IOMMU_FQ_CAUSE_PDT_LOAD_FAULT;
+    }
+
+    /* Use FSC and TA from process directory entry. */
+    ctx->ta = le64_to_cpu(dc.ta);
+
+    return 0;
+}
+
+/* Translation Context cache support */
+static gboolean __ctx_equal(gconstpointer v1, gconstpointer v2)
+{
+    RISCVIOMMUContext *c1 = (RISCVIOMMUContext *) v1;
+    RISCVIOMMUContext *c2 = (RISCVIOMMUContext *) v2;
+    return c1->devid == c2->devid && c1->pasid == c2->pasid;
+}
+
+static guint __ctx_hash(gconstpointer v)
+{
+    RISCVIOMMUContext *ctx = (RISCVIOMMUContext *) v;
+    /* Generate simple hash of (pasid, devid), assuming 24-bit wide devid */
+    return (guint)(ctx->devid) + ((guint)(ctx->pasid) << 24);
+}
+
+static void __ctx_inval_devid_pasid(gpointer key, gpointer value, gpointer data)
+{
+    RISCVIOMMUContext *ctx = (RISCVIOMMUContext *) value;
+    RISCVIOMMUContext *arg = (RISCVIOMMUContext *) data;
+    if (ctx->tc & RISCV_IOMMU_DC_TC_V &&
+        ctx->devid == arg->devid &&
+        ctx->pasid == arg->pasid) {
+        ctx->tc &= ~RISCV_IOMMU_DC_TC_V;
+    }
+}
+
+static void __ctx_inval_devid(gpointer key, gpointer value, gpointer data)
+{
+    RISCVIOMMUContext *ctx = (RISCVIOMMUContext *) value;
+    RISCVIOMMUContext *arg = (RISCVIOMMUContext *) data;
+    if (ctx->tc & RISCV_IOMMU_DC_TC_V &&
+        ctx->devid == arg->devid) {
+        ctx->tc &= ~RISCV_IOMMU_DC_TC_V;
+    }
+}
+
+static void __ctx_inval_all(gpointer key, gpointer value, gpointer data)
+{
+    RISCVIOMMUContext *ctx = (RISCVIOMMUContext *) value;
+    if (ctx->tc & RISCV_IOMMU_DC_TC_V) {
+        ctx->tc &= ~RISCV_IOMMU_DC_TC_V;
+    }
+}
+
+static void riscv_iommu_ctx_inval(RISCVIOMMUState *s, GHFunc func,
+    uint32_t devid, uint32_t pasid)
+{
+    GHashTable *ctx_cache;
+    RISCVIOMMUContext key = {
+        .devid = devid,
+        .pasid = pasid,
+    };
+    ctx_cache = g_hash_table_ref(s->ctx_cache);
+    g_hash_table_foreach(ctx_cache, func, &key);
+    g_hash_table_unref(ctx_cache);
+}
+
+/* Find or allocate translation context for a given {device_id, process_id} */
+static RISCVIOMMUContext *riscv_iommu_ctx(RISCVIOMMUState *s,
+    unsigned devid, unsigned pasid, void **ref)
+{
+    GHashTable *ctx_cache;
+    RISCVIOMMUContext *ctx;
+    RISCVIOMMUContext key = {
+        .devid = devid,
+        .pasid = pasid,
+    };
+
+    ctx_cache = g_hash_table_ref(s->ctx_cache);
+    ctx = g_hash_table_lookup(ctx_cache, &key);
+
+    if (ctx && (ctx->tc & RISCV_IOMMU_DC_TC_V)) {
+        *ref = ctx_cache;
+        return ctx;
+    }
+
+    if (g_hash_table_size(s->ctx_cache) >= LIMIT_CACHE_CTX) {
+        ctx_cache = g_hash_table_new_full(__ctx_hash, __ctx_equal,
+                                          g_free, NULL);
+        g_hash_table_unref(qatomic_xchg(&s->ctx_cache, ctx_cache));
+    }
+
+    ctx = g_new0(RISCVIOMMUContext, 1);
+    ctx->devid = devid;
+    ctx->pasid = pasid;
+
+    int fault = riscv_iommu_ctx_fetch(s, ctx);
+    if (!fault) {
+        g_hash_table_add(ctx_cache, ctx);
+        *ref = ctx_cache;
+        return ctx;
+    }
+
+    g_hash_table_unref(ctx_cache);
+    *ref = NULL;
+
+    if (!(ctx->tc & RISCV_IOMMU_DC_TC_DTF)) {
+        struct riscv_iommu_fq_record ev = { 0 };
+        ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_CAUSE, fault);
+        ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_TTYPE,
+            RISCV_IOMMU_FQ_TTYPE_UADDR_RD);
+        ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_DID, devid);
+        ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_PID, pasid);
+        ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_PV, !!pasid);
+        riscv_iommu_fault(s, &ev);
+    }
+
+    g_free(ctx);
+    return NULL;
+}
+
+static void riscv_iommu_ctx_put(RISCVIOMMUState *s, void *ref)
+{
+    if (ref) {
+        g_hash_table_unref((GHashTable *)ref);
+    }
+}
+
+/* Find or allocate address space for a given device */
+static AddressSpace *riscv_iommu_space(RISCVIOMMUState *s, uint32_t devid)
+{
+    RISCVIOMMUSpace *as;
+
+    /* FIXME: PCIe bus remapping for attached endpoints. */
+    devid |= s->bus << 8;
+
+    qemu_mutex_lock(&s->core_lock);
+    QLIST_FOREACH(as, &s->spaces, list) {
+        if (as->devid == devid) {
+            break;
+        }
+    }
+    qemu_mutex_unlock(&s->core_lock);
+
+    if (as == NULL) {
+        char name[64];
+        as = g_new0(RISCVIOMMUSpace, 1);
+
+        as->iommu = s;
+        as->devid = devid;
+
+        snprintf(name, sizeof(name), "riscv-iommu-%04x:%02x.%d-iova",
+            PCI_BUS_NUM(as->devid), PCI_SLOT(as->devid), PCI_FUNC(as->devid));
+
+        /* IOVA address space, untranslated addresses */
+        memory_region_init_iommu(&as->iova_mr, sizeof(as->iova_mr),
+            TYPE_RISCV_IOMMU_MEMORY_REGION,
+            OBJECT(as), name, UINT64_MAX);
+        address_space_init(&as->iova_as, MEMORY_REGION(&as->iova_mr),
+            TYPE_RISCV_IOMMU_PCI);
+
+        qemu_mutex_lock(&s->core_lock);
+        QLIST_INSERT_HEAD(&s->spaces, as, list);
+        qemu_mutex_unlock(&s->core_lock);
+
+        trace_riscv_iommu_new(s->parent_obj.id, PCI_BUS_NUM(as->devid),
+                PCI_SLOT(as->devid), PCI_FUNC(as->devid));
+    }
+    return &as->iova_as;
+}
+
+static int riscv_iommu_translate(RISCVIOMMUState *s, RISCVIOMMUContext *ctx,
+    IOMMUTLBEntry *iotlb)
+{
+    bool enable_faults;
+    bool enable_pasid;
+    bool enable_pri;
+    int fault;
+
+    enable_faults = !(ctx->tc & RISCV_IOMMU_DC_TC_DTF);
+    /*
+     * TC[32] is reserved for custom extensions, used here to temporarily
+     * enable automatic page-request generation for ATS queries.
+     */
+    enable_pri = (iotlb->perm == IOMMU_NONE) && (ctx->tc & BIT_ULL(32));
+    enable_pasid = (ctx->tc & RISCV_IOMMU_DC_TC_PDTV);
+
+    /* Translate using device directory / page table information. */
+    fault = riscv_iommu_spa_fetch(s, ctx, iotlb);
+
+    if (enable_pri && fault) {
+        struct riscv_iommu_pq_record pr = {0};
+        if (enable_pasid) {
+            pr.hdr = set_field(RISCV_IOMMU_PREQ_HDR_PV,
+                RISCV_IOMMU_PREQ_HDR_PID, ctx->pasid);
+        }
+        pr.hdr = set_field(pr.hdr, RISCV_IOMMU_PREQ_HDR_DID, ctx->devid);
+        pr.payload = (iotlb->iova & TARGET_PAGE_MASK) |
+                     RISCV_IOMMU_PREQ_PAYLOAD_M;
+        riscv_iommu_pri(s, &pr);
+        return fault;
+    }
+
+    if (enable_faults && fault) {
+        struct riscv_iommu_fq_record ev;
+        unsigned ttype;
+
+        if (iotlb->perm & IOMMU_RW) {
+            ttype = RISCV_IOMMU_FQ_TTYPE_UADDR_WR;
+        } else {
+            ttype = RISCV_IOMMU_FQ_TTYPE_UADDR_RD;
+        }
+        ev.hdr = set_field(0, RISCV_IOMMU_FQ_HDR_CAUSE, fault);
+        ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_TTYPE, ttype);
+        ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_PV, enable_pasid);
+        ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_PID, ctx->pasid);
+        ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_DID, ctx->devid);
+        ev.iotval    = iotlb->iova;
+        ev.iotval2   = iotlb->translated_addr;
+        ev._reserved = 0;
+        riscv_iommu_fault(s, &ev);
+        return fault;
+    }
+
+    return 0;
+}
+
+/* IOMMU Command Interface */
+static MemTxResult riscv_iommu_iofence(RISCVIOMMUState *s, bool notify,
+    uint64_t addr, uint32_t data)
+{
+    /*
+     * ATS processing in this implementation of the IOMMU is synchronous,
+     * no need to wait for completions here.
+     */
+    if (!notify) {
+        return MEMTX_OK;
+    }
+
+    return dma_memory_write(s->target_as, addr, &data, sizeof(data),
+        MEMTXATTRS_UNSPECIFIED);
+}
+
+static void riscv_iommu_process_ddtp(RISCVIOMMUState *s)
+{
+    uint64_t old_ddtp = s->ddtp;
+    uint64_t new_ddtp = riscv_iommu_reg_get64(s, RISCV_IOMMU_REG_DDTP);
+    unsigned new_mode = get_field(new_ddtp, RISCV_IOMMU_DDTP_MODE);
+    unsigned old_mode = get_field(old_ddtp, RISCV_IOMMU_DDTP_MODE);
+    bool ok = false;
+
+    /*
+     * Check for allowed DDTP.MODE transitions:
+     * {OFF, BARE}        -> {OFF, BARE, 1LVL, 2LVL, 3LVL}
+     * {1LVL, 2LVL, 3LVL} -> {OFF, BARE}
+     */
+    if (new_mode == old_mode ||
+        new_mode == RISCV_IOMMU_DDTP_MODE_OFF ||
+        new_mode == RISCV_IOMMU_DDTP_MODE_BARE) {
+        ok = true;
+    } else if (new_mode == RISCV_IOMMU_DDTP_MODE_1LVL ||
+               new_mode == RISCV_IOMMU_DDTP_MODE_2LVL ||
+               new_mode == RISCV_IOMMU_DDTP_MODE_3LVL) {
+        ok = old_mode == RISCV_IOMMU_DDTP_MODE_OFF ||
+             old_mode == RISCV_IOMMU_DDTP_MODE_BARE;
+    }
+
+    if (ok) {
+        /* clear reserved and busy bits, report back sanitized version */
+        new_ddtp = set_field(new_ddtp & RISCV_IOMMU_DDTP_PPN,
+                             RISCV_IOMMU_DDTP_MODE, new_mode);
+    } else {
+        new_ddtp = old_ddtp;
+    }
+    s->ddtp = new_ddtp;
+
+    riscv_iommu_reg_set64(s, RISCV_IOMMU_REG_DDTP, new_ddtp);
+}
+
+/* Command function and opcode field. */
+#define RISCV_IOMMU_CMD(func, op) (((func) << 7) | (op))
+
+static void riscv_iommu_process_cq_tail(RISCVIOMMUState *s)
+{
+    struct riscv_iommu_command cmd;
+    MemTxResult res;
+    dma_addr_t addr;
+    uint32_t tail, head, ctrl;
+    uint64_t cmd_opcode;
+    GHFunc func;
+
+    ctrl = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_CQCSR);
+    tail = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_CQT) & s->cq_mask;
+    head = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_CQH) & s->cq_mask;
+
+    /* Check for pending error or queue processing disabled */
+    if (!(ctrl & RISCV_IOMMU_CQCSR_CQON) ||
+        !!(ctrl & (RISCV_IOMMU_CQCSR_CMD_ILL | RISCV_IOMMU_CQCSR_CQMF))) {
+        return;
+    }
+
+    while (tail != head) {
+        addr = s->cq_addr + head * sizeof(cmd);
+        res = dma_memory_read(s->target_as, addr, &cmd, sizeof(cmd),
+                              MEMTXATTRS_UNSPECIFIED);
+
+        if (res != MEMTX_OK) {
+            riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_CQCSR,
+                                  RISCV_IOMMU_CQCSR_CQMF, 0);
+            goto fault;
+        }
+
+        trace_riscv_iommu_cmd(s->parent_obj.id, cmd.dword0, cmd.dword1);
+
+        cmd_opcode = get_field(cmd.dword0,
+                               RISCV_IOMMU_CMD_OPCODE | RISCV_IOMMU_CMD_FUNC);
+
+        switch (cmd_opcode) {
+        case RISCV_IOMMU_CMD(RISCV_IOMMU_CMD_IOFENCE_FUNC_C,
+                             RISCV_IOMMU_CMD_IOFENCE_OPCODE):
+            res = riscv_iommu_iofence(s,
+                cmd.dword0 & RISCV_IOMMU_CMD_IOFENCE_AV, cmd.dword1,
+                get_field(cmd.dword0, RISCV_IOMMU_CMD_IOFENCE_DATA));
+
+            if (res != MEMTX_OK) {
+                riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_CQCSR,
+                                      RISCV_IOMMU_CQCSR_CQMF, 0);
+                goto fault;
+            }
+            break;
+
+        case RISCV_IOMMU_CMD(RISCV_IOMMU_CMD_IOTINVAL_FUNC_GVMA,
+                             RISCV_IOMMU_CMD_IOTINVAL_OPCODE):
+            if (cmd.dword0 & RISCV_IOMMU_CMD_IOTINVAL_PSCV) {
+                /* illegal command arguments IOTINVAL.GVMA & PSCV == 1 */
+                goto cmd_ill;
+            }
+            /* translation cache not implemented yet */
+            break;
+
+        case RISCV_IOMMU_CMD(RISCV_IOMMU_CMD_IOTINVAL_FUNC_VMA,
+                             RISCV_IOMMU_CMD_IOTINVAL_OPCODE):
+            /* translation cache not implemented yet */
+            break;
+
+        case RISCV_IOMMU_CMD(RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_DDT,
+                             RISCV_IOMMU_CMD_IODIR_OPCODE):
+            if (!(cmd.dword0 & RISCV_IOMMU_CMD_IODIR_DV)) {
+                /* invalidate all device context cache mappings */
+                func = __ctx_inval_all;
+            } else {
+                /* invalidate all device context matching DID */
+                func = __ctx_inval_devid;
+            }
+            riscv_iommu_ctx_inval(s, func,
+                get_field(cmd.dword0, RISCV_IOMMU_CMD_IODIR_DID), 0);
+            break;
+
+        case RISCV_IOMMU_CMD(RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_PDT,
+                             RISCV_IOMMU_CMD_IODIR_OPCODE):
+            if (!(cmd.dword0 & RISCV_IOMMU_CMD_IODIR_DV)) {
+                /* illegal command arguments IODIR_PDT & DV == 0 */
+                goto cmd_ill;
+            } else {
+                func = __ctx_inval_devid_pasid;
+            }
+            riscv_iommu_ctx_inval(s, func,
+                get_field(cmd.dword0, RISCV_IOMMU_CMD_IODIR_DID),
+                get_field(cmd.dword0, RISCV_IOMMU_CMD_IODIR_PID));
+            break;
+
+        default:
+        cmd_ill:
+            /* Illegal command, do not advance the head index. */
+            riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_CQCSR,
+                RISCV_IOMMU_CQCSR_CMD_ILL, 0);
+            goto fault;
+        }
+
+        /* Advance and update head pointer after command completes. */
+        head = (head + 1) & s->cq_mask;
+        riscv_iommu_reg_set32(s, RISCV_IOMMU_REG_CQH, head);
+    }
+    return;
+
+fault:
+    if (ctrl & RISCV_IOMMU_CQCSR_CIE) {
+        riscv_iommu_notify(s, RISCV_IOMMU_INTR_CQ);
+    }
+}
+
+static void riscv_iommu_process_cq_control(RISCVIOMMUState *s)
+{
+    uint64_t base;
+    uint32_t ctrl_set = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_CQCSR);
+    uint32_t ctrl_clr;
+    bool enable = !!(ctrl_set & RISCV_IOMMU_CQCSR_CQEN);
+    bool active = !!(ctrl_set & RISCV_IOMMU_CQCSR_CQON);
+
+    if (enable && !active) {
+        base = riscv_iommu_reg_get64(s, RISCV_IOMMU_REG_CQB);
+        s->cq_mask = (2ULL << get_field(base, RISCV_IOMMU_CQB_LOG2SZ)) - 1;
+        s->cq_addr = PPN_PHYS(get_field(base, RISCV_IOMMU_CQB_PPN));
+        stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_CQT], ~s->cq_mask);
+        stl_le_p(&s->regs_rw[RISCV_IOMMU_REG_CQH], 0);
+        stl_le_p(&s->regs_rw[RISCV_IOMMU_REG_CQT], 0);
+        ctrl_set = RISCV_IOMMU_CQCSR_CQON;
+        ctrl_clr = RISCV_IOMMU_CQCSR_BUSY | RISCV_IOMMU_CQCSR_CQMF |
+            RISCV_IOMMU_CQCSR_CMD_ILL | RISCV_IOMMU_CQCSR_CMD_TO;
+    } else if (!enable && active) {
+        stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_CQT], ~0);
+        ctrl_set = 0;
+        ctrl_clr = RISCV_IOMMU_CQCSR_BUSY | RISCV_IOMMU_CQCSR_CQON;
+    } else {
+        ctrl_set = 0;
+        ctrl_clr = RISCV_IOMMU_CQCSR_BUSY;
+    }
+
+    riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_CQCSR, ctrl_set, ctrl_clr);
+}
+
+static void riscv_iommu_process_fq_control(RISCVIOMMUState *s)
+{
+    uint64_t base;
+    uint32_t ctrl_set = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_FQCSR);
+    uint32_t ctrl_clr;
+    bool enable = !!(ctrl_set & RISCV_IOMMU_FQCSR_FQEN);
+    bool active = !!(ctrl_set & RISCV_IOMMU_FQCSR_FQON);
+
+    if (enable && !active) {
+        base = riscv_iommu_reg_get64(s, RISCV_IOMMU_REG_FQB);
+        s->fq_mask = (2ULL << get_field(base, RISCV_IOMMU_FQB_LOG2SZ)) - 1;
+        s->fq_addr = PPN_PHYS(get_field(base, RISCV_IOMMU_FQB_PPN));
+        stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_FQH], ~s->fq_mask);
+        stl_le_p(&s->regs_rw[RISCV_IOMMU_REG_FQH], 0);
+        stl_le_p(&s->regs_rw[RISCV_IOMMU_REG_FQT], 0);
+        ctrl_set = RISCV_IOMMU_FQCSR_FQON;
+        ctrl_clr = RISCV_IOMMU_FQCSR_BUSY | RISCV_IOMMU_FQCSR_FQMF |
+            RISCV_IOMMU_FQCSR_FQOF;
+    } else if (!enable && active) {
+        stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_FQH], ~0);
+        ctrl_set = 0;
+        ctrl_clr = RISCV_IOMMU_FQCSR_BUSY | RISCV_IOMMU_FQCSR_FQON;
+    } else {
+        ctrl_set = 0;
+        ctrl_clr = RISCV_IOMMU_FQCSR_BUSY;
+    }
+
+    riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_FQCSR, ctrl_set, ctrl_clr);
+}
+
+static void riscv_iommu_process_pq_control(RISCVIOMMUState *s)
+{
+    uint64_t base;
+    uint32_t ctrl_set = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_PQCSR);
+    uint32_t ctrl_clr;
+    bool enable = !!(ctrl_set & RISCV_IOMMU_PQCSR_PQEN);
+    bool active = !!(ctrl_set & RISCV_IOMMU_PQCSR_PQON);
+
+    if (enable && !active) {
+        base = riscv_iommu_reg_get64(s, RISCV_IOMMU_REG_PQB);
+        s->pq_mask = (2ULL << get_field(base, RISCV_IOMMU_PQB_LOG2SZ)) - 1;
+        s->pq_addr = PPN_PHYS(get_field(base, RISCV_IOMMU_PQB_PPN));
+        stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_PQH], ~s->pq_mask);
+        stl_le_p(&s->regs_rw[RISCV_IOMMU_REG_PQH], 0);
+        stl_le_p(&s->regs_rw[RISCV_IOMMU_REG_PQT], 0);
+        ctrl_set = RISCV_IOMMU_PQCSR_PQON;
+        ctrl_clr = RISCV_IOMMU_PQCSR_BUSY | RISCV_IOMMU_PQCSR_PQMF |
+            RISCV_IOMMU_PQCSR_PQOF;
+    } else if (!enable && active) {
+        stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_PQH], ~0);
+        ctrl_set = 0;
+        ctrl_clr = RISCV_IOMMU_PQCSR_BUSY | RISCV_IOMMU_PQCSR_PQON;
+    } else {
+        ctrl_set = 0;
+        ctrl_clr = RISCV_IOMMU_PQCSR_BUSY;
+    }
+
+    riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_PQCSR, ctrl_set, ctrl_clr);
+}
+
+/* Core IOMMU execution activation */
+enum {
+    RISCV_IOMMU_EXEC_DDTP,
+    RISCV_IOMMU_EXEC_CQCSR,
+    RISCV_IOMMU_EXEC_CQT,
+    RISCV_IOMMU_EXEC_FQCSR,
+    RISCV_IOMMU_EXEC_FQH,
+    RISCV_IOMMU_EXEC_PQCSR,
+    RISCV_IOMMU_EXEC_PQH,
+    RISCV_IOMMU_EXEC_TR_REQUEST,
+    /* RISCV_IOMMU_EXEC_EXIT must be the last enum value */
+    RISCV_IOMMU_EXEC_EXIT,
+};
+
+static void *riscv_iommu_core_proc(void *arg)
+{
+    RISCVIOMMUState *s = arg;
+    unsigned exec = 0;
+    unsigned mask = 0;
+
+    while (!(exec & BIT(RISCV_IOMMU_EXEC_EXIT))) {
+        mask = (mask ? mask : BIT(RISCV_IOMMU_EXEC_EXIT)) >> 1;
+        switch (exec & mask) {
+        case BIT(RISCV_IOMMU_EXEC_DDTP):
+            riscv_iommu_process_ddtp(s);
+            break;
+        case BIT(RISCV_IOMMU_EXEC_CQCSR):
+            riscv_iommu_process_cq_control(s);
+            break;
+        case BIT(RISCV_IOMMU_EXEC_CQT):
+            riscv_iommu_process_cq_tail(s);
+            break;
+        case BIT(RISCV_IOMMU_EXEC_FQCSR):
+            riscv_iommu_process_fq_control(s);
+            break;
+        case BIT(RISCV_IOMMU_EXEC_FQH):
+            /* NOP */
+            break;
+        case BIT(RISCV_IOMMU_EXEC_PQCSR):
+            riscv_iommu_process_pq_control(s);
+            break;
+        case BIT(RISCV_IOMMU_EXEC_PQH):
+            /* NOP */
+            break;
+        case BIT(RISCV_IOMMU_EXEC_TR_REQUEST):
+            /* DBG support not implemented yet */
+            break;
+        }
+        exec &= ~mask;
+        if (!exec) {
+            qemu_mutex_lock(&s->core_lock);
+            exec = s->core_exec;
+            while (!exec) {
+                qemu_cond_wait(&s->core_cond, &s->core_lock);
+                exec = s->core_exec;
+            }
+            s->core_exec = 0;
+            qemu_mutex_unlock(&s->core_lock);
+        }
+    }
+
+    return NULL;
+}
+
+static MemTxResult riscv_iommu_mmio_write(void *opaque, hwaddr addr,
+    uint64_t data, unsigned size, MemTxAttrs attrs)
+{
+    RISCVIOMMUState *s = opaque;
+    uint32_t regb = addr & ~3;
+    uint32_t busy = 0;
+    uint32_t exec = 0;
+
+    if (size == 0 || size > 8 || (addr & (size - 1)) != 0) {
+        /* Unsupported MMIO alignment or access size */
+        return MEMTX_ERROR;
+    }
+
+    if (addr + size > RISCV_IOMMU_REG_MSI_CONFIG) {
+        /* Unsupported MMIO access location. */
+        return MEMTX_ACCESS_ERROR;
+    }
+
+    /* Track actionable MMIO write. */
+    switch (regb) {
+    case RISCV_IOMMU_REG_DDTP:
+    case RISCV_IOMMU_REG_DDTP + 4:
+        exec = BIT(RISCV_IOMMU_EXEC_DDTP);
+        regb = RISCV_IOMMU_REG_DDTP;
+        busy = RISCV_IOMMU_DDTP_BUSY;
+        break;
+
+    case RISCV_IOMMU_REG_CQT:
+        exec = BIT(RISCV_IOMMU_EXEC_CQT);
+        break;
+
+    case RISCV_IOMMU_REG_CQCSR:
+        exec = BIT(RISCV_IOMMU_EXEC_CQCSR);
+        busy = RISCV_IOMMU_CQCSR_BUSY;
+        break;
+
+    case RISCV_IOMMU_REG_FQH:
+        exec = BIT(RISCV_IOMMU_EXEC_FQH);
+        break;
+
+    case RISCV_IOMMU_REG_FQCSR:
+        exec = BIT(RISCV_IOMMU_EXEC_FQCSR);
+        busy = RISCV_IOMMU_FQCSR_BUSY;
+        break;
+
+    case RISCV_IOMMU_REG_PQH:
+        exec = BIT(RISCV_IOMMU_EXEC_PQH);
+        break;
+
+    case RISCV_IOMMU_REG_PQCSR:
+        exec = BIT(RISCV_IOMMU_EXEC_PQCSR);
+        busy = RISCV_IOMMU_PQCSR_BUSY;
+        break;
+    }
+
+    /*
+     * Register updates might not be synchronized with the core logic.
+     * If system software writes a register while its BUSY bit is set,
+     * the IOMMU behavior for the additional writes is UNSPECIFIED.
+     */
+
+    qemu_spin_lock(&s->regs_lock);
+    if (size == 1) {
+        uint8_t ro = s->regs_ro[addr];
+        uint8_t wc = s->regs_wc[addr];
+        uint8_t rw = s->regs_rw[addr];
+        s->regs_rw[addr] = ((rw & ro) | (data & ~ro)) & ~(data & wc);
+    } else if (size == 2) {
+        uint16_t ro = lduw_le_p(&s->regs_ro[addr]);
+        uint16_t wc = lduw_le_p(&s->regs_wc[addr]);
+        uint16_t rw = lduw_le_p(&s->regs_rw[addr]);
+        stw_le_p(&s->regs_rw[addr], ((rw & ro) | (data & ~ro)) & ~(data & wc));
+    } else if (size == 4) {
+        uint32_t ro = ldl_le_p(&s->regs_ro[addr]);
+        uint32_t wc = ldl_le_p(&s->regs_wc[addr]);
+        uint32_t rw = ldl_le_p(&s->regs_rw[addr]);
+        stl_le_p(&s->regs_rw[addr], ((rw & ro) | (data & ~ro)) & ~(data & wc));
+    } else if (size == 8) {
+        uint64_t ro = ldq_le_p(&s->regs_ro[addr]);
+        uint64_t wc = ldq_le_p(&s->regs_wc[addr]);
+        uint64_t rw = ldq_le_p(&s->regs_rw[addr]);
+        stq_le_p(&s->regs_rw[addr], ((rw & ro) | (data & ~ro)) & ~(data & wc));
+    }
+
+    /* Busy flag update, MSB 4-byte register. */
+    if (busy) {
+        uint32_t rw = ldl_le_p(&s->regs_rw[regb]);
+        stl_le_p(&s->regs_rw[regb], rw | busy);
+    }
+    qemu_spin_unlock(&s->regs_lock);
+
+    /* Wake up core processing thread. */
+    if (exec) {
+        qemu_mutex_lock(&s->core_lock);
+        s->core_exec |= exec;
+        qemu_cond_signal(&s->core_cond);
+        qemu_mutex_unlock(&s->core_lock);
+    }
+
+    return MEMTX_OK;
+}
+
+static MemTxResult riscv_iommu_mmio_read(void *opaque, hwaddr addr,
+    uint64_t *data, unsigned size, MemTxAttrs attrs)
+{
+    RISCVIOMMUState *s = opaque;
+    uint64_t val = -1;
+    uint8_t *ptr;
+
+    if ((addr & (size - 1)) != 0) {
+        /* Unsupported MMIO alignment. */
+        return MEMTX_ERROR;
+    }
+
+    if (addr + size > RISCV_IOMMU_REG_MSI_CONFIG) {
+        return MEMTX_ACCESS_ERROR;
+    }
+
+    ptr = &s->regs_rw[addr];
+
+    if (size == 1) {
+        val = (uint64_t)*ptr;
+    } else if (size == 2) {
+        val = lduw_le_p(ptr);
+    } else if (size == 4) {
+        val = ldl_le_p(ptr);
+    } else if (size == 8) {
+        val = ldq_le_p(ptr);
+    } else {
+        return MEMTX_ERROR;
+    }
+
+    *data = val;
+
+    return MEMTX_OK;
+}
+
+static const MemoryRegionOps riscv_iommu_mmio_ops = {
+    .read_with_attrs = riscv_iommu_mmio_read,
+    .write_with_attrs = riscv_iommu_mmio_write,
+    .endianness = DEVICE_NATIVE_ENDIAN,
+    .impl = {
+        .min_access_size = 1,
+        .max_access_size = 8,
+        .unaligned = false,
+    },
+    .valid = {
+        .min_access_size = 1,
+        .max_access_size = 8,
+    }
+};
+
+/*
+ * Translations matching MSI pattern check are redirected to "riscv-iommu-trap"
+ * memory region as untranslated address, for additional MSI/MRIF interception
+ * by IOMMU interrupt remapping implementation.
+ * Note: Device emulation code generating an MSI is expected to provide valid
+ * memory transaction attributes with requester_id set.
+ */
+static MemTxResult riscv_iommu_trap_write(void *opaque, hwaddr addr,
+    uint64_t data, unsigned size, MemTxAttrs attrs)
+{
+    RISCVIOMMUState *s = (RISCVIOMMUState *)opaque;
+    RISCVIOMMUContext *ctx;
+    MemTxResult res;
+    void *ref;
+    uint32_t devid = attrs.requester_id;
+
+    if (attrs.unspecified) {
+        return MEMTX_ACCESS_ERROR;
+    }
+
+    /* FIXME: PCIe bus remapping for attached endpoints. */
+    devid |= s->bus << 8;
+
+    ctx = riscv_iommu_ctx(s, devid, 0, &ref);
+    if (ctx == NULL) {
+        res = MEMTX_ACCESS_ERROR;
+    } else {
+        res = riscv_iommu_msi_write(s, ctx, addr, data, size, attrs);
+    }
+    riscv_iommu_ctx_put(s, ref);
+    return res;
+}
+
+static MemTxResult riscv_iommu_trap_read(void *opaque, hwaddr addr,
+    uint64_t *data, unsigned size, MemTxAttrs attrs)
+{
+    return MEMTX_ACCESS_ERROR;
+}
+
+static const MemoryRegionOps riscv_iommu_trap_ops = {
+    .read_with_attrs = riscv_iommu_trap_read,
+    .write_with_attrs = riscv_iommu_trap_write,
+    .endianness = DEVICE_LITTLE_ENDIAN,
+    .impl = {
+        .min_access_size = 1,
+        .max_access_size = 8,
+        .unaligned = true,
+    },
+    .valid = {
+        .min_access_size = 1,
+        .max_access_size = 8,
+    }
+};
+
+static void riscv_iommu_realize(DeviceState *dev, Error **errp)
+{
+    RISCVIOMMUState *s = RISCV_IOMMU(dev);
+
+    s->cap = s->version & RISCV_IOMMU_CAP_VERSION;
+    if (s->enable_msi) {
+        s->cap |= RISCV_IOMMU_CAP_MSI_FLAT | RISCV_IOMMU_CAP_MSI_MRIF;
+    }
+    /* Report QEMU target physical address space limits */
+    s->cap = set_field(s->cap, RISCV_IOMMU_CAP_PAS,
+                       TARGET_PHYS_ADDR_SPACE_BITS);
+
+    /* TODO: method to report supported PASID bits */
+    s->pasid_bits = 8; /* restricted to size of MemTxAttrs.pasid */
+    s->cap |= RISCV_IOMMU_CAP_PD8;
+
+    /* Out-of-reset mode: OFF (DMA disabled) or BARE (passthrough) */
+    s->ddtp = set_field(0, RISCV_IOMMU_DDTP_MODE, s->enable_off ?
+                        RISCV_IOMMU_DDTP_MODE_OFF : RISCV_IOMMU_DDTP_MODE_BARE);
+
+    /* register storage */
+    s->regs_rw = g_new0(uint8_t, RISCV_IOMMU_REG_SIZE);
+    s->regs_ro = g_new0(uint8_t, RISCV_IOMMU_REG_SIZE);
+    s->regs_wc = g_new0(uint8_t, RISCV_IOMMU_REG_SIZE);
+
+    /* Mark all registers read-only */
+    memset(s->regs_ro, 0xff, RISCV_IOMMU_REG_SIZE);
+
+    /*
+     * Register complete MMIO space, including MSI/PBA registers.
+     * Note, PCIDevice implementation will add overlapping MR for MSI/PBA,
+     * managed directly by the PCIDevice implementation.
+     */
+    memory_region_init_io(&s->regs_mr, OBJECT(dev), &riscv_iommu_mmio_ops, s,
+        "riscv-iommu-regs", RISCV_IOMMU_REG_SIZE);
+
+    /* Set power-on register state */
+    stq_le_p(&s->regs_rw[RISCV_IOMMU_REG_CAP], s->cap);
+    stq_le_p(&s->regs_rw[RISCV_IOMMU_REG_FCTL], s->fctl);
+    stq_le_p(&s->regs_ro[RISCV_IOMMU_REG_DDTP],
+        ~(RISCV_IOMMU_DDTP_PPN | RISCV_IOMMU_DDTP_MODE));
+    stq_le_p(&s->regs_ro[RISCV_IOMMU_REG_CQB],
+        ~(RISCV_IOMMU_CQB_LOG2SZ | RISCV_IOMMU_CQB_PPN));
+    stq_le_p(&s->regs_ro[RISCV_IOMMU_REG_FQB],
+        ~(RISCV_IOMMU_FQB_LOG2SZ | RISCV_IOMMU_FQB_PPN));
+    stq_le_p(&s->regs_ro[RISCV_IOMMU_REG_PQB],
+        ~(RISCV_IOMMU_PQB_LOG2SZ | RISCV_IOMMU_PQB_PPN));
+    stl_le_p(&s->regs_wc[RISCV_IOMMU_REG_CQCSR], RISCV_IOMMU_CQCSR_CQMF |
+        RISCV_IOMMU_CQCSR_CMD_TO | RISCV_IOMMU_CQCSR_CMD_ILL);
+    stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_CQCSR], RISCV_IOMMU_CQCSR_CQON |
+        RISCV_IOMMU_CQCSR_BUSY);
+    stl_le_p(&s->regs_wc[RISCV_IOMMU_REG_FQCSR], RISCV_IOMMU_FQCSR_FQMF |
+        RISCV_IOMMU_FQCSR_FQOF);
+    stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_FQCSR], RISCV_IOMMU_FQCSR_FQON |
+        RISCV_IOMMU_FQCSR_BUSY);
+    stl_le_p(&s->regs_wc[RISCV_IOMMU_REG_PQCSR], RISCV_IOMMU_PQCSR_PQMF |
+        RISCV_IOMMU_PQCSR_PQOF);
+    stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_PQCSR], RISCV_IOMMU_PQCSR_PQON |
+        RISCV_IOMMU_PQCSR_BUSY);
+    stl_le_p(&s->regs_wc[RISCV_IOMMU_REG_IPSR], ~0);
+    stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_IVEC], 0);
+    stq_le_p(&s->regs_rw[RISCV_IOMMU_REG_DDTP], s->ddtp);
+
+    /* Memory region for downstream access, if specified. */
+    if (s->target_mr) {
+        s->target_as = g_new0(AddressSpace, 1);
+        address_space_init(s->target_as, s->target_mr,
+            "riscv-iommu-downstream");
+    } else {
+        /* Fallback to global system memory. */
+        s->target_as = &address_space_memory;
+    }
+
+    /* Memory region for untranslated MRIF/MSI writes */
+    memory_region_init_io(&s->trap_mr, OBJECT(dev), &riscv_iommu_trap_ops, s,
+            "riscv-iommu-trap", ~0ULL);
+    address_space_init(&s->trap_as, &s->trap_mr, "riscv-iommu-trap-as");
+
+    /* Device translation context cache */
+    s->ctx_cache = g_hash_table_new_full(__ctx_hash, __ctx_equal,
+                                         g_free, NULL);
+
+    s->iommus.le_next = NULL;
+    s->iommus.le_prev = NULL;
+    QLIST_INIT(&s->spaces);
+    qemu_cond_init(&s->core_cond);
+    qemu_mutex_init(&s->core_lock);
+    qemu_spin_init(&s->regs_lock);
+    qemu_thread_create(&s->core_proc, "riscv-iommu-core",
+        riscv_iommu_core_proc, s, QEMU_THREAD_JOINABLE);
+}
+
+static void riscv_iommu_unrealize(DeviceState *dev)
+{
+    RISCVIOMMUState *s = RISCV_IOMMU(dev);
+
+    qemu_mutex_lock(&s->core_lock);
+    /* cancel pending operations and stop */
+    s->core_exec = BIT(RISCV_IOMMU_EXEC_EXIT);
+    qemu_cond_signal(&s->core_cond);
+    qemu_mutex_unlock(&s->core_lock);
+    qemu_thread_join(&s->core_proc);
+    qemu_cond_destroy(&s->core_cond);
+    qemu_mutex_destroy(&s->core_lock);
+    g_hash_table_unref(s->ctx_cache);
+}
+
+static Property riscv_iommu_properties[] = {
+    DEFINE_PROP_UINT32("version", RISCVIOMMUState, version,
+        RISCV_IOMMU_SPEC_DOT_VER),
+    DEFINE_PROP_UINT32("bus", RISCVIOMMUState, bus, 0x0),
+    DEFINE_PROP_BOOL("intremap", RISCVIOMMUState, enable_msi, TRUE),
+    DEFINE_PROP_BOOL("off", RISCVIOMMUState, enable_off, TRUE),
+    DEFINE_PROP_LINK("downstream-mr", RISCVIOMMUState, target_mr,
+        TYPE_MEMORY_REGION, MemoryRegion *),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void riscv_iommu_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    /* internal device for riscv-iommu-{pci/sys}, not user-creatable */
+    dc->user_creatable = false;
+    dc->realize = riscv_iommu_realize;
+    dc->unrealize = riscv_iommu_unrealize;
+    device_class_set_props(dc, riscv_iommu_properties);
+}
+
+static const TypeInfo riscv_iommu_info = {
+    .name = TYPE_RISCV_IOMMU,
+    .parent = TYPE_DEVICE,
+    .instance_size = sizeof(RISCVIOMMUState),
+    .class_init = riscv_iommu_class_init,
+};
+
+static const char *IOMMU_FLAG_STR[] = {
+    "NA",
+    "RO",
+    "WR",
+    "RW",
+};
+
+/* RISC-V IOMMU Memory Region - Address Translation Space */
+static IOMMUTLBEntry riscv_iommu_memory_region_translate(
+    IOMMUMemoryRegion *iommu_mr, hwaddr addr,
+    IOMMUAccessFlags flag, int iommu_idx)
+{
+    RISCVIOMMUSpace *as = container_of(iommu_mr, RISCVIOMMUSpace, iova_mr);
+    RISCVIOMMUContext *ctx;
+    void *ref;
+    IOMMUTLBEntry iotlb = {
+        .iova = addr,
+        .target_as = as->iommu->target_as,
+        .addr_mask = ~0ULL,
+        .perm = flag,
+    };
+
+    ctx = riscv_iommu_ctx(as->iommu, as->devid, iommu_idx, &ref);
+    if (ctx == NULL) {
+        /* Translation disabled or invalid. */
+        iotlb.addr_mask = 0;
+        iotlb.perm = IOMMU_NONE;
+    } else if (riscv_iommu_translate(as->iommu, ctx, &iotlb)) {
+        /* Translation disabled or fault reported. */
+        iotlb.addr_mask = 0;
+        iotlb.perm = IOMMU_NONE;
+    }
+
+    /* Trace all DMA translations with original access flags. */
+    trace_riscv_iommu_dma(as->iommu->parent_obj.id, PCI_BUS_NUM(as->devid),
+                          PCI_SLOT(as->devid), PCI_FUNC(as->devid), iommu_idx,
+                          IOMMU_FLAG_STR[flag & IOMMU_RW], iotlb.iova,
+                          iotlb.translated_addr);
+
+    riscv_iommu_ctx_put(as->iommu, ref);
+
+    return iotlb;
+}
+
+static int riscv_iommu_memory_region_notify(
+    IOMMUMemoryRegion *iommu_mr, IOMMUNotifierFlag old,
+    IOMMUNotifierFlag new, Error **errp)
+{
+    RISCVIOMMUSpace *as = container_of(iommu_mr, RISCVIOMMUSpace, iova_mr);
+
+    if (old == IOMMU_NOTIFIER_NONE) {
+        as->notifier = true;
+        trace_riscv_iommu_notifier_add(iommu_mr->parent_obj.name);
+    } else if (new == IOMMU_NOTIFIER_NONE) {
+        as->notifier = false;
+        trace_riscv_iommu_notifier_del(iommu_mr->parent_obj.name);
+    }
+
+    return 0;
+}
+
+static inline bool pci_is_iommu(PCIDevice *pdev)
+{
+    return pci_get_word(pdev->config + PCI_CLASS_DEVICE) == 0x0806;
+}
+
+static AddressSpace *riscv_iommu_find_as(PCIBus *bus, void *opaque, int devfn)
+{
+    RISCVIOMMUState *s = (RISCVIOMMUState *) opaque;
+    PCIDevice *pdev = pci_find_device(bus, pci_bus_num(bus), devfn);
+    AddressSpace *as = NULL;
+
+    if (pdev && pci_is_iommu(pdev)) {
+        return s->target_as;
+    }
+
+    /* Find first registered IOMMU device */
+    while (s->iommus.le_prev) {
+        s = *(s->iommus.le_prev);
+    }
+
+    /* Find first matching IOMMU */
+    while (s != NULL && as == NULL) {
+        as = riscv_iommu_space(s, PCI_BUILD_BDF(pci_bus_num(bus), devfn));
+        s = s->iommus.le_next;
+    }
+
+    return as ? as : &address_space_memory;
+}
+
+static const PCIIOMMUOps riscv_iommu_ops = {
+    .get_address_space = riscv_iommu_find_as,
+};
+
+void riscv_iommu_pci_setup_iommu(RISCVIOMMUState *iommu, PCIBus *bus,
+        Error **errp)
+{
+    if (bus->iommu_ops &&
+        bus->iommu_ops->get_address_space == riscv_iommu_find_as) {
+        /* Allow multiple IOMMUs on the same PCIe bus, link known devices */
+        RISCVIOMMUState *last = (RISCVIOMMUState *)bus->iommu_opaque;
+        QLIST_INSERT_AFTER(last, iommu, iommus);
+    } else if (bus->iommu_ops == NULL) {
+        pci_setup_iommu(bus, &riscv_iommu_ops, iommu);
+    } else {
+        error_setg(errp, "can't register secondary IOMMU for PCI bus #%d",
+            pci_bus_num(bus));
+    }
+}
+
+static int riscv_iommu_memory_region_index(IOMMUMemoryRegion *iommu_mr,
+    MemTxAttrs attrs)
+{
+    return attrs.unspecified ? RISCV_IOMMU_NOPASID : (int)attrs.pasid;
+}
+
+static int riscv_iommu_memory_region_index_len(IOMMUMemoryRegion *iommu_mr)
+{
+    RISCVIOMMUSpace *as = container_of(iommu_mr, RISCVIOMMUSpace, iova_mr);
+    return 1 << as->iommu->pasid_bits;
+}
+
+static void riscv_iommu_memory_region_init(ObjectClass *klass, void *data)
+{
+    IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_CLASS(klass);
+
+    imrc->translate = riscv_iommu_memory_region_translate;
+    imrc->notify_flag_changed = riscv_iommu_memory_region_notify;
+    imrc->attrs_to_index = riscv_iommu_memory_region_index;
+    imrc->num_indexes = riscv_iommu_memory_region_index_len;
+}
+
+static const TypeInfo riscv_iommu_memory_region_info = {
+    .parent = TYPE_IOMMU_MEMORY_REGION,
+    .name = TYPE_RISCV_IOMMU_MEMORY_REGION,
+    .class_init = riscv_iommu_memory_region_init,
+};
+
+static void riscv_iommu_register_mr_types(void)
+{
+    type_register_static(&riscv_iommu_memory_region_info);
+    type_register_static(&riscv_iommu_info);
+}
+
+type_init(riscv_iommu_register_mr_types);
diff --git a/hw/riscv/riscv-iommu.h b/hw/riscv/riscv-iommu.h
new file mode 100644
index 0000000000..6f740de690
--- /dev/null
+++ b/hw/riscv/riscv-iommu.h
@@ -0,0 +1,141 @@
+/*
+ * QEMU emulation of an RISC-V IOMMU (Ziommu)
+ *
+ * Copyright (C) 2022-2023 Rivos Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HW_RISCV_IOMMU_STATE_H
+#define HW_RISCV_IOMMU_STATE_H
+
+#include "qemu/osdep.h"
+#include "qom/object.h"
+
+#include "hw/riscv/iommu.h"
+
+struct RISCVIOMMUState {
+    /*< private >*/
+    DeviceState parent_obj;
+
+    /*< public >*/
+    uint32_t version;     /* Reported interface version number */
+    uint32_t pasid_bits;  /* process identifier width */
+    uint32_t bus;         /* PCI bus mapping for non-root endpoints */
+
+    uint64_t cap;         /* IOMMU supported capabilities */
+    uint64_t fctl;        /* IOMMU enabled features */
+
+    bool enable_off;      /* Enable out-of-reset OFF mode (DMA disabled) */
+    bool enable_msi;      /* Enable MSI remapping */
+
+    /* IOMMU Internal State */
+    uint64_t ddtp;        /* Validated Device Directory Tree Root Pointer */
+
+    dma_addr_t cq_addr;   /* Command queue base physical address */
+    dma_addr_t fq_addr;   /* Fault/event queue base physical address */
+    dma_addr_t pq_addr;   /* Page request queue base physical address */
+
+    uint32_t cq_mask;     /* Command queue index bit mask */
+    uint32_t fq_mask;     /* Fault/event queue index bit mask */
+    uint32_t pq_mask;     /* Page request queue index bit mask */
+
+    /* interrupt notifier */
+    void (*notify)(RISCVIOMMUState *iommu, unsigned vector);
+
+    /* IOMMU State Machine */
+    QemuThread core_proc; /* Background processing thread */
+    QemuMutex core_lock;  /* Global IOMMU lock, used for cache/regs updates */
+    QemuCond core_cond;   /* Background processing wake up signal */
+    unsigned core_exec;   /* Processing thread execution actions */
+
+    /* IOMMU target address space */
+    AddressSpace *target_as;
+    MemoryRegion *target_mr;
+
+    /* MSI / MRIF access trap */
+    AddressSpace trap_as;
+    MemoryRegion trap_mr;
+
+    GHashTable *ctx_cache;          /* Device translation Context Cache */
+
+    /* MMIO Hardware Interface */
+    MemoryRegion regs_mr;
+    QemuSpin regs_lock;
+    uint8_t *regs_rw;  /* register state (user write) */
+    uint8_t *regs_wc;  /* write-1-to-clear mask */
+    uint8_t *regs_ro;  /* read-only mask */
+
+    QLIST_ENTRY(RISCVIOMMUState) iommus;
+    QLIST_HEAD(, RISCVIOMMUSpace) spaces;
+};
+
+void riscv_iommu_pci_setup_iommu(RISCVIOMMUState *iommu, PCIBus *bus,
+         Error **errp);
+
+/* private helpers */
+
+/* Register helper functions */
+static inline uint32_t riscv_iommu_reg_mod32(RISCVIOMMUState *s,
+    unsigned idx, uint32_t set, uint32_t clr)
+{
+    uint32_t val;
+    qemu_spin_lock(&s->regs_lock);
+    val = ldl_le_p(s->regs_rw + idx);
+    stl_le_p(s->regs_rw + idx, (val & ~clr) | set);
+    qemu_spin_unlock(&s->regs_lock);
+    return val;
+}
+
+static inline void riscv_iommu_reg_set32(RISCVIOMMUState *s,
+    unsigned idx, uint32_t set)
+{
+    qemu_spin_lock(&s->regs_lock);
+    stl_le_p(s->regs_rw + idx, set);
+    qemu_spin_unlock(&s->regs_lock);
+}
+
+static inline uint32_t riscv_iommu_reg_get32(RISCVIOMMUState *s,
+    unsigned idx)
+{
+    return ldl_le_p(s->regs_rw + idx);
+}
+
+static inline uint64_t riscv_iommu_reg_mod64(RISCVIOMMUState *s,
+    unsigned idx, uint64_t set, uint64_t clr)
+{
+    uint64_t val;
+    qemu_spin_lock(&s->regs_lock);
+    val = ldq_le_p(s->regs_rw + idx);
+    stq_le_p(s->regs_rw + idx, (val & ~clr) | set);
+    qemu_spin_unlock(&s->regs_lock);
+    return val;
+}
+
+static inline void riscv_iommu_reg_set64(RISCVIOMMUState *s,
+    unsigned idx, uint64_t set)
+{
+    qemu_spin_lock(&s->regs_lock);
+    stq_le_p(s->regs_rw + idx, set);
+    qemu_spin_unlock(&s->regs_lock);
+}
+
+static inline uint64_t riscv_iommu_reg_get64(RISCVIOMMUState *s,
+    unsigned idx)
+{
+    return ldq_le_p(s->regs_rw + idx);
+}
+
+
+
+#endif
diff --git a/hw/riscv/trace-events b/hw/riscv/trace-events
new file mode 100644
index 0000000000..42a97caffa
--- /dev/null
+++ b/hw/riscv/trace-events
@@ -0,0 +1,11 @@
+# See documentation at docs/devel/tracing.rst
+
+# riscv-iommu.c
+riscv_iommu_new(const char *id, unsigned b, unsigned d, unsigned f) "%s: device attached %04x:%02x.%d"
+riscv_iommu_flt(const char *id, unsigned b, unsigned d, unsigned f, uint64_t reason, uint64_t iova) "%s: fault %04x:%02x.%u reason: 0x%"PRIx64" iova: 0x%"PRIx64
+riscv_iommu_pri(const char *id, unsigned b, unsigned d, unsigned f, uint64_t iova) "%s: page request %04x:%02x.%u iova: 0x%"PRIx64
+riscv_iommu_dma(const char *id, unsigned b, unsigned d, unsigned f, unsigned pasid, const char *dir, uint64_t iova, uint64_t phys) "%s: translate %04x:%02x.%u #%u %s 0x%"PRIx64" -> 0x%"PRIx64
+riscv_iommu_msi(const char *id, unsigned b, unsigned d, unsigned f, uint64_t iova, uint64_t phys) "%s: translate %04x:%02x.%u MSI 0x%"PRIx64" -> 0x%"PRIx64
+riscv_iommu_cmd(const char *id, uint64_t l, uint64_t u) "%s: command 0x%"PRIx64" 0x%"PRIx64
+riscv_iommu_notifier_add(const char *id) "%s: dev-iotlb notifier added"
+riscv_iommu_notifier_del(const char *id) "%s: dev-iotlb notifier removed"
diff --git a/hw/riscv/trace.h b/hw/riscv/trace.h
new file mode 100644
index 0000000000..b88504b750
--- /dev/null
+++ b/hw/riscv/trace.h
@@ -0,0 +1,2 @@
+#include "trace/trace-hw_riscv.h"
+
diff --git a/include/hw/riscv/iommu.h b/include/hw/riscv/iommu.h
new file mode 100644
index 0000000000..403b365893
--- /dev/null
+++ b/include/hw/riscv/iommu.h
@@ -0,0 +1,36 @@
+/*
+ * QEMU emulation of an RISC-V IOMMU (Ziommu)
+ *
+ * Copyright (C) 2022-2023 Rivos Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HW_RISCV_IOMMU_H
+#define HW_RISCV_IOMMU_H
+
+#include "qemu/osdep.h"
+#include "qom/object.h"
+
+#define TYPE_RISCV_IOMMU "riscv-iommu"
+OBJECT_DECLARE_SIMPLE_TYPE(RISCVIOMMUState, RISCV_IOMMU)
+typedef struct RISCVIOMMUState RISCVIOMMUState;
+
+#define TYPE_RISCV_IOMMU_MEMORY_REGION "riscv-iommu-mr"
+typedef struct RISCVIOMMUSpace RISCVIOMMUSpace;
+
+#define TYPE_RISCV_IOMMU_PCI "riscv-iommu-pci"
+OBJECT_DECLARE_SIMPLE_TYPE(RISCVIOMMUStatePci, RISCV_IOMMU_PCI)
+typedef struct RISCVIOMMUStatePci RISCVIOMMUStatePci;
+
+#endif
diff --git a/meson.build b/meson.build
index c59ca496f2..75e56f3282 100644
--- a/meson.build
+++ b/meson.build
@@ -3361,6 +3361,7 @@ if have_system
     'hw/rdma',
     'hw/rdma/vmw',
     'hw/rtc',
+    'hw/riscv',
     'hw/s390x',
     'hw/scsi',
     'hw/sd',
-- 
2.43.2
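
The register file in the patch above keeps three byte arrays per device
(regs_rw, regs_ro and regs_wc), and realize fills the read-only and
write-1-to-clear masks per register. A minimal standalone sketch of how
such masks are conventionally applied to a guest write follows. The helper
name reg_apply_write is hypothetical, and the expression is an assumption
about conventional RO/W1C semantics; it may differ from what
riscv_iommu_mmio_write actually computes.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the patch.
 *   cur: current register value
 *   val: value written by the guest
 *   ro:  mask of bits the guest cannot change
 *   wc:  mask of bits the guest clears by writing 1 to them
 */
static inline uint32_t reg_apply_write(uint32_t cur, uint32_t val,
                                       uint32_t ro, uint32_t wc)
{
    uint32_t keep = ro | wc;    /* bits never copied straight from val */
    uint32_t next = (cur & keep) | (val & ~keep);

    /* A 1 written to a write-1-to-clear bit clears it; a 0 leaves it. */
    return next & ~(val & wc);
}

/* Worked cases: ro = bits 0-1, wc = bits 2-3, plain read/write = bits 4-7. */
static inline void reg_apply_write_example(void)
{
    /* Writing only read/write bits leaves RO and W1C bits untouched. */
    assert(reg_apply_write(0x0F, 0xF0, 0x03, 0x0C) == 0xFF);
    /* Writing 1 to bit 2 clears that W1C bit and nothing else. */
    assert(reg_apply_write(0x0F, 0x04, 0x03, 0x0C) == 0x0B);
}
```

Keeping the masks as data next to the register state lets one generic
write path serve every register, at the cost of three buffers of
RISCV_IOMMU_REG_SIZE bytes each.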




* [PATCH v2 04/15] hw/riscv: add riscv-iommu-pci device
  2024-03-07 16:03 [PATCH v2 00/15] riscv: QEMU RISC-V IOMMU Support Daniel Henrique Barboza
                   ` (2 preceding siblings ...)
  2024-03-07 16:03 ` [PATCH v2 03/15] hw/riscv: add RISC-V IOMMU base emulation Daniel Henrique Barboza
@ 2024-03-07 16:03 ` Daniel Henrique Barboza
  2024-04-29  7:21   ` Frank Chang
  2024-03-07 16:03 ` [PATCH v2 05/15] hw/riscv: add riscv-iommu-sys platform device Daniel Henrique Barboza
                   ` (11 subsequent siblings)
  15 siblings, 1 reply; 55+ messages in thread
From: Daniel Henrique Barboza @ 2024-03-07 16:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-riscv, alistair.francis, bmeng, liwei1518, zhiwei_liu,
	palmer, ajones, tjeznach, Daniel Henrique Barboza

From: Tomasz Jeznach <tjeznach@rivosinc.com>

The RISC-V IOMMU can be modelled as a PCIe device following the
guidelines of the RISC-V IOMMU spec, chapter 7.1, "Integrating an IOMMU
as a PCIe device".

Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
---
 hw/riscv/meson.build       |   2 +-
 hw/riscv/riscv-iommu-pci.c | 173 +++++++++++++++++++++++++++++++++++++
 2 files changed, 174 insertions(+), 1 deletion(-)
 create mode 100644 hw/riscv/riscv-iommu-pci.c

diff --git a/hw/riscv/meson.build b/hw/riscv/meson.build
index ba9eebd605..4674cec6c4 100644
--- a/hw/riscv/meson.build
+++ b/hw/riscv/meson.build
@@ -10,6 +10,6 @@ riscv_ss.add(when: 'CONFIG_SIFIVE_U', if_true: files('sifive_u.c'))
 riscv_ss.add(when: 'CONFIG_SPIKE', if_true: files('spike.c'))
 riscv_ss.add(when: 'CONFIG_MICROCHIP_PFSOC', if_true: files('microchip_pfsoc.c'))
 riscv_ss.add(when: 'CONFIG_ACPI', if_true: files('virt-acpi-build.c'))
-riscv_ss.add(when: 'CONFIG_RISCV_IOMMU', if_true: files('riscv-iommu.c'))
+riscv_ss.add(when: 'CONFIG_RISCV_IOMMU', if_true: files('riscv-iommu.c', 'riscv-iommu-pci.c'))
 
 hw_arch += {'riscv': riscv_ss}
diff --git a/hw/riscv/riscv-iommu-pci.c b/hw/riscv/riscv-iommu-pci.c
new file mode 100644
index 0000000000..4eb1057210
--- /dev/null
+++ b/hw/riscv/riscv-iommu-pci.c
@@ -0,0 +1,173 @@
+/*
+ * QEMU emulation of an RISC-V IOMMU (Ziommu)
+ *
+ * Copyright (C) 2022-2023 Rivos Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/pci/msi.h"
+#include "hw/pci/msix.h"
+#include "hw/pci/pci_bus.h"
+#include "hw/qdev-properties.h"
+#include "hw/riscv/riscv_hart.h"
+#include "migration/vmstate.h"
+#include "qapi/error.h"
+#include "qemu/error-report.h"
+#include "qemu/host-utils.h"
+#include "qom/object.h"
+
+#include "cpu_bits.h"
+#include "riscv-iommu.h"
+#include "riscv-iommu-bits.h"
+
+#ifndef PCI_VENDOR_ID_RIVOS
+#define PCI_VENDOR_ID_RIVOS           0x1efd
+#endif
+
+#ifndef PCI_DEVICE_ID_RIVOS_IOMMU
+#define PCI_DEVICE_ID_RIVOS_IOMMU     0xedf1
+#endif
+
+/* RISC-V IOMMU PCI Device Emulation */
+
+typedef struct RISCVIOMMUStatePci {
+    PCIDevice        pci;     /* Parent PCIe device state */
+    MemoryRegion     bar0;    /* PCI BAR (including MSI-X config) */
+    RISCVIOMMUState  iommu;   /* common IOMMU state */
+} RISCVIOMMUStatePci;
+
+/* interrupt delivery callback */
+static void riscv_iommu_pci_notify(RISCVIOMMUState *iommu, unsigned vector)
+{
+    RISCVIOMMUStatePci *s = container_of(iommu, RISCVIOMMUStatePci, iommu);
+
+    if (msix_enabled(&(s->pci))) {
+        msix_notify(&(s->pci), vector);
+    }
+}
+
+static void riscv_iommu_pci_realize(PCIDevice *dev, Error **errp)
+{
+    RISCVIOMMUStatePci *s = DO_UPCAST(RISCVIOMMUStatePci, pci, dev);
+    RISCVIOMMUState *iommu = &s->iommu;
+    Error *err = NULL;
+
+    /* Set device id for trace / debug */
+    DEVICE(iommu)->id = g_strdup_printf("%02x:%02x.%01x",
+        pci_dev_bus_num(dev), PCI_SLOT(dev->devfn), PCI_FUNC(dev->devfn));
+    qdev_realize(DEVICE(iommu), NULL, errp);
+
+    memory_region_init(&s->bar0, OBJECT(s), "riscv-iommu-bar0",
+        QEMU_ALIGN_UP(memory_region_size(&iommu->regs_mr), TARGET_PAGE_SIZE));
+    memory_region_add_subregion(&s->bar0, 0, &iommu->regs_mr);
+
+    pcie_endpoint_cap_init(dev, 0);
+
+    pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY |
+                     PCI_BASE_ADDRESS_MEM_TYPE_64, &s->bar0);
+
+    int ret = msix_init(dev, RISCV_IOMMU_INTR_COUNT,
+                        &s->bar0, 0, RISCV_IOMMU_REG_MSI_CONFIG,
+                        &s->bar0, 0, RISCV_IOMMU_REG_MSI_CONFIG + 256, 0, &err);
+
+    if (ret == -ENOTSUP) {
+        /*
+         * MSI-X is not supported by the platform.
+         * Driver should use timer/polling based notification handlers.
+         */
+        warn_report_err(err);
+    } else if (ret < 0) {
+        error_propagate(errp, err);
+        return;
+    } else {
+        /* Mark all allocated MSI-X vectors as used. */
+        msix_vector_use(dev, RISCV_IOMMU_INTR_CQ);
+        msix_vector_use(dev, RISCV_IOMMU_INTR_FQ);
+        msix_vector_use(dev, RISCV_IOMMU_INTR_PM);
+        msix_vector_use(dev, RISCV_IOMMU_INTR_PQ);
+        iommu->notify = riscv_iommu_pci_notify;
+    }
+
+    PCIBus *bus = pci_device_root_bus(dev);
+    if (!bus) {
+        error_setg(errp, "can't find PCIe root port for %02x:%02x.%x",
+            pci_bus_num(pci_get_bus(dev)), PCI_SLOT(dev->devfn),
+            PCI_FUNC(dev->devfn));
+        return;
+    }
+
+    riscv_iommu_pci_setup_iommu(iommu, bus, errp);
+}
+
+static void riscv_iommu_pci_exit(PCIDevice *pci_dev)
+{
+    pci_setup_iommu(pci_device_root_bus(pci_dev), NULL, NULL);
+}
+
+static const VMStateDescription riscv_iommu_vmstate = {
+    .name = "riscv-iommu",
+    .unmigratable = 1
+};
+
+static void riscv_iommu_pci_init(Object *obj)
+{
+    RISCVIOMMUStatePci *s = RISCV_IOMMU_PCI(obj);
+    RISCVIOMMUState *iommu = &s->iommu;
+
+    object_initialize_child(obj, "iommu", iommu, TYPE_RISCV_IOMMU);
+    qdev_alias_all_properties(DEVICE(iommu), obj);
+}
+
+static Property riscv_iommu_pci_properties[] = {
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void riscv_iommu_pci_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
+
+    k->realize = riscv_iommu_pci_realize;
+    k->exit = riscv_iommu_pci_exit;
+    k->vendor_id = PCI_VENDOR_ID_RIVOS;
+    k->device_id = PCI_DEVICE_ID_RIVOS_IOMMU;
+    k->revision = 0;
+    k->class_id = 0x0806;
+    dc->desc = "RISCV-IOMMU DMA Remapping device";
+    dc->vmsd = &riscv_iommu_vmstate;
+    dc->hotpluggable = false;
+    dc->user_creatable = true;
+    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+    device_class_set_props(dc, riscv_iommu_pci_properties);
+}
+
+static const TypeInfo riscv_iommu_pci = {
+    .name = TYPE_RISCV_IOMMU_PCI,
+    .parent = TYPE_PCI_DEVICE,
+    .class_init = riscv_iommu_pci_class_init,
+    .instance_init = riscv_iommu_pci_init,
+    .instance_size = sizeof(RISCVIOMMUStatePci),
+    .interfaces = (InterfaceInfo[]) {
+        { INTERFACE_PCIE_DEVICE },
+        { },
+    },
+};
+
+static void riscv_iommu_register_pci_types(void)
+{
+    type_register_static(&riscv_iommu_pci);
+}
+
+type_init(riscv_iommu_register_pci_types);
-- 
2.43.2
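
The realize function above formats the device id as "%02x:%02x.%01x" from
the bus number, PCI_SLOT() and PCI_FUNC(), and the base emulation builds a
16-bit device id by OR-ing the bus number shifted left by 8 into the devfn.
A small standalone sketch of that bus/device/function packing follows. The
bdf_* helper names are hypothetical; the bit layout (8-bit bus, 5-bit slot,
3-bit function) is the standard PCI one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helpers mirroring the standard PCI BDF layout. */
static inline uint16_t bdf_build(uint8_t bus, uint8_t devfn)
{
    return (uint16_t)(((unsigned)bus << 8) | devfn);
}

static inline unsigned bdf_bus(uint16_t bdf)  { return (bdf >> 8) & 0xff; }
static inline unsigned bdf_slot(uint16_t bdf) { return (bdf >> 3) & 0x1f; }
static inline unsigned bdf_func(uint16_t bdf) { return bdf & 0x07; }

/* Format as bus:slot.func, the same pattern the realize function uses. */
static inline int bdf_format(char *buf, size_t len, uint16_t bdf)
{
    return snprintf(buf, len, "%02x:%02x.%01x",
                    bdf_bus(bdf), bdf_slot(bdf), bdf_func(bdf));
}

static inline void bdf_example(void)
{
    uint16_t bdf = bdf_build(1, 0x0a);  /* bus 1, slot 1, function 2 */

    assert(bdf == 0x010a);
    assert(bdf_slot(bdf) == 1);
    assert(bdf_func(bdf) == 2);
}
```

With these helpers, bus 1 and devfn 0x0a format as "01:01.2".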




* [PATCH v2 05/15] hw/riscv: add riscv-iommu-sys platform device
  2024-03-07 16:03 [PATCH v2 00/15] riscv: QEMU RISC-V IOMMU Support Daniel Henrique Barboza
                   ` (3 preceding siblings ...)
  2024-03-07 16:03 ` [PATCH v2 04/15] hw/riscv: add riscv-iommu-pci device Daniel Henrique Barboza
@ 2024-03-07 16:03 ` Daniel Henrique Barboza
  2024-04-30  1:35   ` Frank Chang
  2024-03-07 16:03 ` [PATCH v2 06/15] hw/riscv/virt.c: support for RISC-V IOMMU PCIDevice hotplug Daniel Henrique Barboza
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 55+ messages in thread
From: Daniel Henrique Barboza @ 2024-03-07 16:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-riscv, alistair.francis, bmeng, liwei1518, zhiwei_liu,
	palmer, ajones, tjeznach, Daniel Henrique Barboza

From: Tomasz Jeznach <tjeznach@rivosinc.com>

This device models the RISC-V IOMMU as a sysbus device.

Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
---
 hw/riscv/meson.build       |  2 +-
 hw/riscv/riscv-iommu-sys.c | 93 ++++++++++++++++++++++++++++++++++++++
 include/hw/riscv/iommu.h   |  4 ++
 3 files changed, 98 insertions(+), 1 deletion(-)
 create mode 100644 hw/riscv/riscv-iommu-sys.c

diff --git a/hw/riscv/meson.build b/hw/riscv/meson.build
index 4674cec6c4..e37c5d78e2 100644
--- a/hw/riscv/meson.build
+++ b/hw/riscv/meson.build
@@ -10,6 +10,6 @@ riscv_ss.add(when: 'CONFIG_SIFIVE_U', if_true: files('sifive_u.c'))
 riscv_ss.add(when: 'CONFIG_SPIKE', if_true: files('spike.c'))
 riscv_ss.add(when: 'CONFIG_MICROCHIP_PFSOC', if_true: files('microchip_pfsoc.c'))
 riscv_ss.add(when: 'CONFIG_ACPI', if_true: files('virt-acpi-build.c'))
-riscv_ss.add(when: 'CONFIG_RISCV_IOMMU', if_true: files('riscv-iommu.c', 'riscv-iommu-pci.c'))
+riscv_ss.add(when: 'CONFIG_RISCV_IOMMU', if_true: files('riscv-iommu.c', 'riscv-iommu-pci.c', 'riscv-iommu-sys.c'))
 
 hw_arch += {'riscv': riscv_ss}
diff --git a/hw/riscv/riscv-iommu-sys.c b/hw/riscv/riscv-iommu-sys.c
new file mode 100644
index 0000000000..4305cf8d79
--- /dev/null
+++ b/hw/riscv/riscv-iommu-sys.c
@@ -0,0 +1,93 @@
+/*
+ * QEMU emulation of an RISC-V IOMMU (Ziommu) - Platform Device
+ *
+ * Copyright (C) 2022-2023 Rivos Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/pci/pci_bus.h"
+#include "hw/qdev-properties.h"
+#include "hw/sysbus.h"
+#include "qapi/error.h"
+#include "qapi/error.h"
+#include "qemu/error-report.h"
+#include "qemu/host-utils.h"
+#include "qemu/module.h"
+#include "qemu/osdep.h"
+#include "qom/object.h"
+
+#include "riscv-iommu.h"
+
+/* RISC-V IOMMU System Platform Device Emulation */
+
+struct RISCVIOMMUStateSys {
+    SysBusDevice     parent;
+    uint64_t         addr;
+    RISCVIOMMUState  iommu;
+};
+
+static void riscv_iommu_sys_realize(DeviceState *dev, Error **errp)
+{
+    RISCVIOMMUStateSys *s = RISCV_IOMMU_SYS(dev);
+    PCIBus *pci_bus;
+
+    qdev_realize(DEVICE(&s->iommu), NULL, errp);
+    sysbus_init_mmio(SYS_BUS_DEVICE(dev), &s->iommu.regs_mr);
+    if (s->addr) {
+        sysbus_mmio_map(SYS_BUS_DEVICE(s), 0, s->addr);
+    }
+
+    pci_bus = (PCIBus *) object_resolve_path_type("", TYPE_PCI_BUS, NULL);
+    if (pci_bus) {
+        riscv_iommu_pci_setup_iommu(&s->iommu, pci_bus, errp);
+    }
+}
+
+static void riscv_iommu_sys_init(Object *obj)
+{
+    RISCVIOMMUStateSys *s = RISCV_IOMMU_SYS(obj);
+    RISCVIOMMUState *iommu = &s->iommu;
+
+    object_initialize_child(obj, "iommu", iommu, TYPE_RISCV_IOMMU);
+    qdev_alias_all_properties(DEVICE(iommu), obj);
+}
+
+static Property riscv_iommu_sys_properties[] = {
+    DEFINE_PROP_UINT64("addr", RISCVIOMMUStateSys, addr, 0),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void riscv_iommu_sys_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    dc->realize = riscv_iommu_sys_realize;
+    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+    device_class_set_props(dc, riscv_iommu_sys_properties);
+}
+
+static const TypeInfo riscv_iommu_sys = {
+    .name          = TYPE_RISCV_IOMMU_SYS,
+    .parent        = TYPE_SYS_BUS_DEVICE,
+    .class_init    = riscv_iommu_sys_class_init,
+    .instance_init = riscv_iommu_sys_init,
+    .instance_size = sizeof(RISCVIOMMUStateSys),
+};
+
+static void riscv_iommu_register_sys(void)
+{
+    type_register_static(&riscv_iommu_sys);
+}
+
+type_init(riscv_iommu_register_sys)
diff --git a/include/hw/riscv/iommu.h b/include/hw/riscv/iommu.h
index 403b365893..c8d28a79a1 100644
--- a/include/hw/riscv/iommu.h
+++ b/include/hw/riscv/iommu.h
@@ -33,4 +33,8 @@ typedef struct RISCVIOMMUSpace RISCVIOMMUSpace;
 OBJECT_DECLARE_SIMPLE_TYPE(RISCVIOMMUStatePci, RISCV_IOMMU_PCI)
 typedef struct RISCVIOMMUStatePci RISCVIOMMUStatePci;
 
+#define TYPE_RISCV_IOMMU_SYS "riscv-iommu-device"
+OBJECT_DECLARE_SIMPLE_TYPE(RISCVIOMMUStateSys, RISCV_IOMMU_SYS)
+typedef struct RISCVIOMMUStateSys RISCVIOMMUStateSys;
+
 #endif
-- 
2.43.2




* [PATCH v2 06/15] hw/riscv/virt.c: support for RISC-V IOMMU PCIDevice hotplug
  2024-03-07 16:03 [PATCH v2 00/15] riscv: QEMU RISC-V IOMMU Support Daniel Henrique Barboza
                   ` (4 preceding siblings ...)
  2024-03-07 16:03 ` [PATCH v2 05/15] hw/riscv: add riscv-iommu-sys platform device Daniel Henrique Barboza
@ 2024-03-07 16:03 ` Daniel Henrique Barboza
  2024-04-30  2:17   ` Frank Chang
  2024-05-15  6:25   ` Eric Cheng
  2024-03-07 16:03 ` [PATCH v2 07/15] test/qtest: add riscv-iommu-pci tests Daniel Henrique Barboza
                   ` (9 subsequent siblings)
  15 siblings, 2 replies; 55+ messages in thread
From: Daniel Henrique Barboza @ 2024-03-07 16:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-riscv, alistair.francis, bmeng, liwei1518, zhiwei_liu,
	palmer, ajones, tjeznach, Daniel Henrique Barboza

From: Tomasz Jeznach <tjeznach@rivosinc.com>

Generate a device tree entry for the riscv-iommu PCI device, and map
all PCI device identifiers to the single IOMMU device instance.

Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
---
 hw/riscv/virt.c | 33 ++++++++++++++++++++++++++++++++-
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index a094af97c3..67a8267747 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -32,6 +32,7 @@
 #include "hw/core/sysbus-fdt.h"
 #include "target/riscv/pmu.h"
 #include "hw/riscv/riscv_hart.h"
+#include "hw/riscv/iommu.h"
 #include "hw/riscv/virt.h"
 #include "hw/riscv/boot.h"
 #include "hw/riscv/numa.h"
@@ -1004,6 +1005,30 @@ static void create_fdt_virtio_iommu(RISCVVirtState *s, uint16_t bdf)
                            bdf + 1, iommu_phandle, bdf + 1, 0xffff - bdf);
 }
 
+static void create_fdt_iommu(RISCVVirtState *s, uint16_t bdf)
+{
+    const char comp[] = "riscv,pci-iommu";
+    void *fdt = MACHINE(s)->fdt;
+    uint32_t iommu_phandle;
+    g_autofree char *iommu_node = NULL;
+    g_autofree char *pci_node = NULL;
+
+    pci_node = g_strdup_printf("/soc/pci@%lx",
+                               (long) virt_memmap[VIRT_PCIE_ECAM].base);
+    iommu_node = g_strdup_printf("%s/iommu@%x", pci_node, bdf);
+    iommu_phandle = qemu_fdt_alloc_phandle(fdt);
+    qemu_fdt_add_subnode(fdt, iommu_node);
+
+    qemu_fdt_setprop(fdt, iommu_node, "compatible", comp, sizeof(comp));
+    qemu_fdt_setprop_cell(fdt, iommu_node, "#iommu-cells", 1);
+    qemu_fdt_setprop_cell(fdt, iommu_node, "phandle", iommu_phandle);
+    qemu_fdt_setprop_cells(fdt, iommu_node, "reg",
+                           bdf << 8, 0, 0, 0, 0);
+    qemu_fdt_setprop_cells(fdt, pci_node, "iommu-map",
+                           0, iommu_phandle, 0, bdf,
+                           bdf + 1, iommu_phandle, bdf + 1, 0xffff - bdf);
+}
+
 static void finalize_fdt(RISCVVirtState *s)
 {
     uint32_t phandle = 1, irq_mmio_phandle = 1, msi_pcie_phandle = 1;
@@ -1712,9 +1737,11 @@ static HotplugHandler *virt_machine_get_hotplug_handler(MachineState *machine,
     MachineClass *mc = MACHINE_GET_CLASS(machine);
 
     if (device_is_dynamic_sysbus(mc, dev) ||
-        object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_IOMMU_PCI)) {
+        object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_IOMMU_PCI) ||
+        object_dynamic_cast(OBJECT(dev), TYPE_RISCV_IOMMU_PCI)) {
         return HOTPLUG_HANDLER(machine);
     }
+
     return NULL;
 }
 
@@ -1735,6 +1762,10 @@ static void virt_machine_device_plug_cb(HotplugHandler *hotplug_dev,
     if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_IOMMU_PCI)) {
         create_fdt_virtio_iommu(s, pci_get_bdf(PCI_DEVICE(dev)));
     }
+
+    if (object_dynamic_cast(OBJECT(dev), TYPE_RISCV_IOMMU_PCI)) {
+        create_fdt_iommu(s, pci_get_bdf(PCI_DEVICE(dev)));
+    }
 }
 
 static void virt_machine_class_init(ObjectClass *oc, void *data)
-- 
2.43.2
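
For readers following the "iommu-map" setup in create_fdt_iommu(), here
is a minimal standalone sketch of the two ranges it emits. Each entry is
(rid-base, iommu phandle, iommu-base, length). The helper names
build_iommu_map() and iommu_map_covers() are hypothetical and are not
part of the patch; the sketch only assumes the cell layout shown in the
diff above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Two iommu-map entries of four cells each. */
#define IOMMU_MAP_CELLS 8

/*
 * Fill 'cells' with the two ranges used in the patch: requester IDs
 * below the IOMMU's own BDF, and requester IDs above it. The BDF of
 * the IOMMU itself is left unmapped.
 */
static void build_iommu_map(uint16_t bdf, uint32_t phandle,
                            uint32_t cells[IOMMU_MAP_CELLS])
{
    assert(cells != NULL);

    cells[0] = 0;         /* rid-base */
    cells[1] = phandle;   /* iommu phandle */
    cells[2] = 0;         /* iommu-base */
    cells[3] = bdf;       /* length: RIDs 0 .. bdf - 1 */

    cells[4] = bdf + 1u;
    cells[5] = phandle;
    cells[6] = bdf + 1u;
    cells[7] = 0xffffu - bdf;   /* length: RIDs bdf + 1 .. 0xffff */
}

/* Return true if 'rid' falls inside one of the ranges in 'cells'. */
static bool iommu_map_covers(const uint32_t cells[IOMMU_MAP_CELLS],
                             uint32_t rid)
{
    for (size_t i = 0; i < IOMMU_MAP_CELLS; i += 4) {
        uint32_t base = cells[i];
        uint32_t len = cells[i + 3];

        if (rid >= base && rid - base < len) {
            return true;
        }
    }
    return false;
}
```

With the IOMMU at 00:01.0 (bdf 0x08), the first range covers RIDs 0 to 7
and the second covers 9 to 0xffff, so every other function on the bus is
routed through the single IOMMU instance.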




* [PATCH v2 07/15] test/qtest: add riscv-iommu-pci tests
  2024-03-07 16:03 [PATCH v2 00/15] riscv: QEMU RISC-V IOMMU Support Daniel Henrique Barboza
                   ` (5 preceding siblings ...)
  2024-03-07 16:03 ` [PATCH v2 06/15] hw/riscv/virt.c: support for RISC-V IOMMU PCIDevice hotplug Daniel Henrique Barboza
@ 2024-03-07 16:03 ` Daniel Henrique Barboza
  2024-04-30  3:33   ` Frank Chang
  2024-03-07 16:03 ` [PATCH v2 08/15] hw/riscv/riscv-iommu: add Address Translation Cache (IOATC) Daniel Henrique Barboza
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 55+ messages in thread
From: Daniel Henrique Barboza @ 2024-03-07 16:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-riscv, alistair.francis, bmeng, liwei1518, zhiwei_liu,
	palmer, ajones, tjeznach, Daniel Henrique Barboza

To test the RISC-V IOMMU emulation we'll use its PCI representation.
Create a new 'riscv-iommu-pci' libqos device that will be present with
CONFIG_RISCV_IOMMU.  This config is only available for RISC-V, so this
device will only be consumed by the RISC-V libqos machine.

Start with basic tests: a PCI sanity check and a reset state register
test. The reset test was taken from the RISC-V IOMMU spec chapter 5.2,
"Reset behavior".

More tests will be added later.

Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
---
 tests/qtest/libqos/meson.build   |  4 ++
 tests/qtest/libqos/riscv-iommu.c | 79 +++++++++++++++++++++++++++
 tests/qtest/libqos/riscv-iommu.h | 67 +++++++++++++++++++++++
 tests/qtest/meson.build          |  1 +
 tests/qtest/riscv-iommu-test.c   | 93 ++++++++++++++++++++++++++++++++
 5 files changed, 244 insertions(+)
 create mode 100644 tests/qtest/libqos/riscv-iommu.c
 create mode 100644 tests/qtest/libqos/riscv-iommu.h
 create mode 100644 tests/qtest/riscv-iommu-test.c

diff --git a/tests/qtest/libqos/meson.build b/tests/qtest/libqos/meson.build
index 3aed6efcb8..07fe20eacb 100644
--- a/tests/qtest/libqos/meson.build
+++ b/tests/qtest/libqos/meson.build
@@ -67,6 +67,10 @@ if have_virtfs
   libqos_srcs += files('virtio-9p.c', 'virtio-9p-client.c')
 endif
 
+if config_all_devices.has_key('CONFIG_RISCV_IOMMU')
+  libqos_srcs += files('riscv-iommu.c')
+endif
+
 libqos = static_library('qos', libqos_srcs + genh,
                         name_suffix: 'fa',
                         build_by_default: false)
diff --git a/tests/qtest/libqos/riscv-iommu.c b/tests/qtest/libqos/riscv-iommu.c
new file mode 100644
index 0000000000..8ae7d4888c
--- /dev/null
+++ b/tests/qtest/libqos/riscv-iommu.c
@@ -0,0 +1,79 @@
+/*
+ * libqos driver riscv-iommu-pci framework
+ *
+ * Copyright (c) 2024 Ventana Micro Systems Inc.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at your
+ * option) any later version.  See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "../libqtest.h"
+#include "qemu/module.h"
+#include "qgraph.h"
+#include "pci.h"
+#include "riscv-iommu.h"
+
+#define PCI_VENDOR_ID_RIVOS           0x1efd
+#define PCI_DEVICE_ID_RIVOS_IOMMU     0xedf1
+
+static void *riscv_iommu_pci_get_driver(void *obj, const char *interface)
+{
+    QRISCVIOMMU *r_iommu_pci = obj;
+
+    if (!g_strcmp0(interface, "pci-device")) {
+        return &r_iommu_pci->dev;
+    }
+
+    fprintf(stderr, "%s not present in riscv_iommu_pci\n", interface);
+    g_assert_not_reached();
+}
+
+static void riscv_iommu_pci_start_hw(QOSGraphObject *obj)
+{
+    QRISCVIOMMU *pci = (QRISCVIOMMU *)obj;
+    qpci_device_enable(&pci->dev);
+}
+
+static void riscv_iommu_pci_destructor(QOSGraphObject *obj)
+{
+    QRISCVIOMMU *pci = (QRISCVIOMMU *)obj;
+    qpci_iounmap(&pci->dev, pci->reg_bar);
+}
+
+static void *riscv_iommu_pci_create(void *pci_bus, QGuestAllocator *alloc,
+                                    void *addr)
+{
+    QRISCVIOMMU *r_iommu_pci = g_new0(QRISCVIOMMU, 1);
+    QPCIBus *bus = pci_bus;
+
+    qpci_device_init(&r_iommu_pci->dev, bus, addr);
+    r_iommu_pci->reg_bar = qpci_iomap(&r_iommu_pci->dev, 0, NULL);
+
+    r_iommu_pci->obj.get_driver = riscv_iommu_pci_get_driver;
+    r_iommu_pci->obj.start_hw = riscv_iommu_pci_start_hw;
+    r_iommu_pci->obj.destructor = riscv_iommu_pci_destructor;
+    return &r_iommu_pci->obj;
+}
+
+static void riscv_iommu_pci_register_nodes(void)
+{
+    QPCIAddress addr = {
+        .vendor_id = PCI_VENDOR_ID_RIVOS,
+        .device_id = PCI_DEVICE_ID_RIVOS_IOMMU,
+        .devfn = QPCI_DEVFN(1, 0),
+    };
+
+    QOSGraphEdgeOptions opts = {
+        .extra_device_opts = "addr=01.0",
+    };
+
+    add_qpci_address(&opts, &addr);
+
+    qos_node_create_driver("riscv-iommu-pci", riscv_iommu_pci_create);
+    qos_node_produces("riscv-iommu-pci", "pci-device");
+    qos_node_consumes("riscv-iommu-pci", "pci-bus", &opts);
+}
+
+libqos_init(riscv_iommu_pci_register_nodes);
diff --git a/tests/qtest/libqos/riscv-iommu.h b/tests/qtest/libqos/riscv-iommu.h
new file mode 100644
index 0000000000..8c056caa7b
--- /dev/null
+++ b/tests/qtest/libqos/riscv-iommu.h
@@ -0,0 +1,67 @@
+/*
+ * libqos driver riscv-iommu-pci framework
+ *
+ * Copyright (c) 2024 Ventana Micro Systems Inc.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at your
+ * option) any later version.  See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef TESTS_LIBQOS_RISCV_IOMMU_H
+#define TESTS_LIBQOS_RISCV_IOMMU_H
+
+#include "qgraph.h"
+#include "pci.h"
+#include "qemu/bitops.h"
+
+#ifndef GENMASK_ULL
+#define GENMASK_ULL(h, l) (((~0ULL) >> (63 - (h) + (l))) << (l))
+#endif
+
+#define RISCV_IOMMU_PCI_VENDOR_ID_RIVOS  0x1efd
+#define RISCV_IOMMU_PCI_DEVICE_ID_RIVOS  0xedf1
+#define RISCV_IOMMU_PCI_DEVICE_CLASS     0x0806
+
+/* Common field positions */
+#define RISCV_IOMMU_QUEUE_ENABLE        BIT(0)
+#define RISCV_IOMMU_QUEUE_INTR_ENABLE   BIT(1)
+#define RISCV_IOMMU_QUEUE_MEM_FAULT     BIT(8)
+#define RISCV_IOMMU_QUEUE_ACTIVE        BIT(16)
+#define RISCV_IOMMU_QUEUE_BUSY          BIT(17)
+
+#define RISCV_IOMMU_REG_CAP             0x0000
+#define RISCV_IOMMU_CAP_VERSION         GENMASK_ULL(7, 0)
+
+#define RISCV_IOMMU_REG_DDTP            0x0010
+#define RISCV_IOMMU_DDTP_BUSY           BIT_ULL(4)
+#define RISCV_IOMMU_DDTP_MODE           GENMASK_ULL(3, 0)
+#define RISCV_IOMMU_DDTP_MODE_OFF       0
+
+#define RISCV_IOMMU_REG_CQCSR           0x0048
+#define RISCV_IOMMU_CQCSR_CQEN          RISCV_IOMMU_QUEUE_ENABLE
+#define RISCV_IOMMU_CQCSR_CIE           RISCV_IOMMU_QUEUE_INTR_ENABLE
+#define RISCV_IOMMU_CQCSR_CQON          RISCV_IOMMU_QUEUE_ACTIVE
+#define RISCV_IOMMU_CQCSR_BUSY          RISCV_IOMMU_QUEUE_BUSY
+
+#define RISCV_IOMMU_REG_FQCSR           0x004C
+#define RISCV_IOMMU_FQCSR_FQEN          RISCV_IOMMU_QUEUE_ENABLE
+#define RISCV_IOMMU_FQCSR_FIE           RISCV_IOMMU_QUEUE_INTR_ENABLE
+#define RISCV_IOMMU_FQCSR_FQON          RISCV_IOMMU_QUEUE_ACTIVE
+#define RISCV_IOMMU_FQCSR_BUSY          RISCV_IOMMU_QUEUE_BUSY
+
+#define RISCV_IOMMU_REG_PQCSR           0x0050
+#define RISCV_IOMMU_PQCSR_PQEN          RISCV_IOMMU_QUEUE_ENABLE
+#define RISCV_IOMMU_PQCSR_PIE           RISCV_IOMMU_QUEUE_INTR_ENABLE
+#define RISCV_IOMMU_PQCSR_PQON          RISCV_IOMMU_QUEUE_ACTIVE
+#define RISCV_IOMMU_PQCSR_BUSY          RISCV_IOMMU_QUEUE_BUSY
+
+#define RISCV_IOMMU_REG_IPSR            0x0054
+
+typedef struct QRISCVIOMMU {
+    QOSGraphObject obj;
+    QPCIDevice dev;
+    QPCIBar reg_bar;
+} QRISCVIOMMU;
+
+#endif
diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index 31b9f4ede4..aeb7346840 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -285,6 +285,7 @@ qos_test_ss.add(
   'vmxnet3-test.c',
   'igb-test.c',
   'ufs-test.c',
+  'riscv-iommu-test.c',
 )
 
 if config_all_devices.has_key('CONFIG_VIRTIO_SERIAL')
diff --git a/tests/qtest/riscv-iommu-test.c b/tests/qtest/riscv-iommu-test.c
new file mode 100644
index 0000000000..13b887d15e
--- /dev/null
+++ b/tests/qtest/riscv-iommu-test.c
@@ -0,0 +1,93 @@
+/*
+ * QTest testcase for RISC-V IOMMU
+ *
+ * Copyright (c) 2024 Ventana Micro Systems Inc.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at your
+ * option) any later version.  See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "libqtest-single.h"
+#include "qemu/module.h"
+#include "libqos/qgraph.h"
+#include "libqos/riscv-iommu.h"
+#include "hw/pci/pci_regs.h"
+
+static uint32_t riscv_iommu_read_reg32(QRISCVIOMMU *r_iommu, int reg_offset)
+{
+    uint32_t reg;
+
+    qpci_memread(&r_iommu->dev, r_iommu->reg_bar, reg_offset,
+                 &reg, sizeof(reg));
+    return reg;
+}
+
+static uint64_t riscv_iommu_read_reg64(QRISCVIOMMU *r_iommu, int reg_offset)
+{
+    uint64_t reg;
+
+    qpci_memread(&r_iommu->dev, r_iommu->reg_bar, reg_offset,
+                 &reg, sizeof(reg));
+    return reg;
+}
+
+static void test_pci_config(void *obj, void *data, QGuestAllocator *t_alloc)
+{
+    QRISCVIOMMU *r_iommu = obj;
+    QPCIDevice *dev = &r_iommu->dev;
+    uint16_t vendorid, deviceid, classid;
+
+    vendorid = qpci_config_readw(dev, PCI_VENDOR_ID);
+    deviceid = qpci_config_readw(dev, PCI_DEVICE_ID);
+    classid = qpci_config_readw(dev, PCI_CLASS_DEVICE);
+
+    g_assert_cmpuint(vendorid, ==, RISCV_IOMMU_PCI_VENDOR_ID_RIVOS);
+    g_assert_cmpuint(deviceid, ==, RISCV_IOMMU_PCI_DEVICE_ID_RIVOS);
+    g_assert_cmpuint(classid, ==, RISCV_IOMMU_PCI_DEVICE_CLASS);
+}
+
+static void test_reg_reset(void *obj, void *data, QGuestAllocator *t_alloc)
+{
+    QRISCVIOMMU *r_iommu = obj;
+    uint64_t cap;
+    uint32_t reg;
+
+    cap = riscv_iommu_read_reg64(r_iommu, RISCV_IOMMU_REG_CAP);
+    g_assert_cmpuint(cap & RISCV_IOMMU_CAP_VERSION, ==, 0x10);
+
+    reg = riscv_iommu_read_reg32(r_iommu, RISCV_IOMMU_REG_CQCSR);
+    g_assert_cmpuint(reg & RISCV_IOMMU_CQCSR_CQEN, ==, 0);
+    g_assert_cmpuint(reg & RISCV_IOMMU_CQCSR_CIE, ==, 0);
+    g_assert_cmpuint(reg & RISCV_IOMMU_CQCSR_CQON, ==, 0);
+    g_assert_cmpuint(reg & RISCV_IOMMU_CQCSR_BUSY, ==, 0);
+
+    reg = riscv_iommu_read_reg32(r_iommu, RISCV_IOMMU_REG_FQCSR);
+    g_assert_cmpuint(reg & RISCV_IOMMU_FQCSR_FQEN, ==, 0);
+    g_assert_cmpuint(reg & RISCV_IOMMU_FQCSR_FIE, ==, 0);
+    g_assert_cmpuint(reg & RISCV_IOMMU_FQCSR_FQON, ==, 0);
+    g_assert_cmpuint(reg & RISCV_IOMMU_FQCSR_BUSY, ==, 0);
+
+    reg = riscv_iommu_read_reg32(r_iommu, RISCV_IOMMU_REG_PQCSR);
+    g_assert_cmpuint(reg & RISCV_IOMMU_PQCSR_PQEN, ==, 0);
+    g_assert_cmpuint(reg & RISCV_IOMMU_PQCSR_PIE, ==, 0);
+    g_assert_cmpuint(reg & RISCV_IOMMU_PQCSR_PQON, ==, 0);
+    g_assert_cmpuint(reg & RISCV_IOMMU_PQCSR_BUSY, ==, 0);
+
+    reg = riscv_iommu_read_reg32(r_iommu, RISCV_IOMMU_REG_DDTP);
+    g_assert_cmpuint(reg & RISCV_IOMMU_DDTP_BUSY, ==, 0);
+    g_assert_cmpuint(reg & RISCV_IOMMU_DDTP_MODE, ==,
+                     RISCV_IOMMU_DDTP_MODE_OFF);
+
+    reg = riscv_iommu_read_reg32(r_iommu, RISCV_IOMMU_REG_IPSR);
+    g_assert_cmpuint(reg, ==, 0);
+}
+
+static void register_riscv_iommu_test(void)
+{
+    qos_add_test("pci_config", "riscv-iommu-pci", test_pci_config, NULL);
+    qos_add_test("reg_reset", "riscv-iommu-pci", test_reg_reset, NULL);
+}
+
+libqos_init(register_riscv_iommu_test);
-- 
2.43.2
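
As a side note on the register definitions used by the reset test, the
sketch below shows how the GENMASK_ULL() fallback expands and how a
field can be pulled out of a register value. field_get() and
queue_csr_is_reset() are hypothetical helpers written for illustration
only; the bit positions are the ones from riscv-iommu.h in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same shape as the fallback macro in the libqos header above. */
#define GENMASK_ULL(h, l) (((~0ULL) >> (63 - (h) + (l))) << (l))
#define BIT_ULL(n)        (1ULL << (n))

/* Hypothetical helper: extract the field selected by a contiguous mask. */
static uint64_t field_get(uint64_t reg, uint64_t mask)
{
    assert(mask != 0);
    /* (~mask + 1) is -mask; mask & -mask isolates the field's lowest bit. */
    return (reg & mask) / (mask & (~mask + 1));
}

/*
 * Hypothetical helper: the four bits the reset test checks for each
 * queue CSR (enable, interrupt enable, active, busy) must all be clear.
 */
static bool queue_csr_is_reset(uint32_t csr)
{
    const uint32_t bits = BIT_ULL(0) | BIT_ULL(1) | BIT_ULL(16) | BIT_ULL(17);

    return (csr & bits) == 0;
}
```

For example, GENMASK_ULL(7, 0) is 0xff, so a capabilities value whose
low byte is 0x10 yields version 0x10, which is what test_reg_reset()
asserts.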




* [PATCH v2 08/15] hw/riscv/riscv-iommu: add Address Translation Cache (IOATC)
  2024-03-07 16:03 [PATCH v2 00/15] riscv: QEMU RISC-V IOMMU Support Daniel Henrique Barboza
                   ` (6 preceding siblings ...)
  2024-03-07 16:03 ` [PATCH v2 07/15] test/qtest: add riscv-iommu-pci tests Daniel Henrique Barboza
@ 2024-03-07 16:03 ` Daniel Henrique Barboza
  2024-05-08  7:26   ` Frank Chang
  2024-03-07 16:03 ` [PATCH v2 09/15] hw/riscv/riscv-iommu: add s-stage and g-stage support Daniel Henrique Barboza
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 55+ messages in thread
From: Daniel Henrique Barboza @ 2024-03-07 16:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-riscv, alistair.francis, bmeng, liwei1518, zhiwei_liu,
	palmer, ajones, tjeznach, Daniel Henrique Barboza

From: Tomasz Jeznach <tjeznach@rivosinc.com>

The RISC-V IOMMU spec allows the IOMMU to use translation caches to
hold entries from the DDT. This patch implements all the cache commands
that were previously marked as 'not implemented'.

The cache already includes some artifacts that anticipate s-stage and
g-stage elements, although we don't support them yet. We'll introduce
them next.

Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
---
 hw/riscv/riscv-iommu.c | 190 ++++++++++++++++++++++++++++++++++++++++-
 hw/riscv/riscv-iommu.h |   2 +
 2 files changed, 188 insertions(+), 4 deletions(-)

diff --git a/hw/riscv/riscv-iommu.c b/hw/riscv/riscv-iommu.c
index df534b99b0..0b93146327 100644
--- a/hw/riscv/riscv-iommu.c
+++ b/hw/riscv/riscv-iommu.c
@@ -63,6 +63,16 @@ struct RISCVIOMMUContext {
     uint64_t msiptp;            /* MSI redirection page table pointer */
 };
 
+/* Address translation cache entry */
+struct RISCVIOMMUEntry {
+    uint64_t iova:44;           /* IOVA Page Number */
+    uint64_t pscid:20;          /* Process Soft-Context identifier */
+    uint64_t phys:44;           /* Physical Page Number */
+    uint64_t gscid:16;          /* Guest Soft-Context identifier */
+    uint64_t perm:2;            /* IOMMU_RW flags */
+    uint64_t __rfu:2;
+};
+
 /* IOMMU index for transactions without PASID specified. */
 #define RISCV_IOMMU_NOPASID 0
 
@@ -629,14 +639,127 @@ static AddressSpace *riscv_iommu_space(RISCVIOMMUState *s, uint32_t devid)
     return &as->iova_as;
 }
 
+/* Translation Object cache support */
+static gboolean __iot_equal(gconstpointer v1, gconstpointer v2)
+{
+    RISCVIOMMUEntry *t1 = (RISCVIOMMUEntry *) v1;
+    RISCVIOMMUEntry *t2 = (RISCVIOMMUEntry *) v2;
+    return t1->gscid == t2->gscid && t1->pscid == t2->pscid &&
+           t1->iova == t2->iova;
+}
+
+static guint __iot_hash(gconstpointer v)
+{
+    RISCVIOMMUEntry *t = (RISCVIOMMUEntry *) v;
+    return (guint)t->iova;
+}
+
+/* GV: 1 PSCV: 1 AV: 1 */
+static void __iot_inval_pscid_iova(gpointer key, gpointer value, gpointer data)
+{
+    RISCVIOMMUEntry *iot = (RISCVIOMMUEntry *) value;
+    RISCVIOMMUEntry *arg = (RISCVIOMMUEntry *) data;
+    if (iot->gscid == arg->gscid &&
+        iot->pscid == arg->pscid &&
+        iot->iova == arg->iova) {
+        iot->perm = 0;
+    }
+}
+
+/* GV: 1 PSCV: 1 AV: 0 */
+static void __iot_inval_pscid(gpointer key, gpointer value, gpointer data)
+{
+    RISCVIOMMUEntry *iot = (RISCVIOMMUEntry *) value;
+    RISCVIOMMUEntry *arg = (RISCVIOMMUEntry *) data;
+    if (iot->gscid == arg->gscid &&
+        iot->pscid == arg->pscid) {
+        iot->perm = 0;
+    }
+}
+
+/* GV: 1 GVMA: 1 */
+static void __iot_inval_gscid_gpa(gpointer key, gpointer value, gpointer data)
+{
+    RISCVIOMMUEntry *iot = (RISCVIOMMUEntry *) value;
+    RISCVIOMMUEntry *arg = (RISCVIOMMUEntry *) data;
+    if (iot->gscid == arg->gscid) {
+        /* simplified cache, no GPA matching */
+        iot->perm = 0;
+    }
+}
+
+/* GV: 1 GVMA: 0 */
+static void __iot_inval_gscid(gpointer key, gpointer value, gpointer data)
+{
+    RISCVIOMMUEntry *iot = (RISCVIOMMUEntry *) value;
+    RISCVIOMMUEntry *arg = (RISCVIOMMUEntry *) data;
+    if (iot->gscid == arg->gscid) {
+        iot->perm = 0;
+    }
+}
+
+/* GV: 0 */
+static void __iot_inval_all(gpointer key, gpointer value, gpointer data)
+{
+    RISCVIOMMUEntry *iot = (RISCVIOMMUEntry *) value;
+    iot->perm = 0;
+}
+
+/* caller should keep ref-count for iot_cache object */
+static RISCVIOMMUEntry *riscv_iommu_iot_lookup(RISCVIOMMUContext *ctx,
+    GHashTable *iot_cache, hwaddr iova)
+{
+    RISCVIOMMUEntry key = {
+        .pscid = get_field(ctx->ta, RISCV_IOMMU_DC_TA_PSCID),
+        .iova  = PPN_DOWN(iova),
+    };
+    return g_hash_table_lookup(iot_cache, &key);
+}
+
+/* caller should keep ref-count for iot_cache object */
+static void riscv_iommu_iot_update(RISCVIOMMUState *s,
+    GHashTable *iot_cache, RISCVIOMMUEntry *iot)
+{
+    if (!s->iot_limit) {
+        return;
+    }
+
+    if (g_hash_table_size(s->iot_cache) >= s->iot_limit) {
+        iot_cache = g_hash_table_new_full(__iot_hash, __iot_equal,
+                                          g_free, NULL);
+        g_hash_table_unref(qatomic_xchg(&s->iot_cache, iot_cache));
+    }
+    g_hash_table_add(iot_cache, iot);
+}
+
+static void riscv_iommu_iot_inval(RISCVIOMMUState *s, GHFunc func,
+    uint32_t gscid, uint32_t pscid, hwaddr iova)
+{
+    GHashTable *iot_cache;
+    RISCVIOMMUEntry key = {
+        .gscid = gscid,
+        .pscid = pscid,
+        .iova  = PPN_DOWN(iova),
+    };
+
+    iot_cache = g_hash_table_ref(s->iot_cache);
+    g_hash_table_foreach(iot_cache, func, &key);
+    g_hash_table_unref(iot_cache);
+}
+
 static int riscv_iommu_translate(RISCVIOMMUState *s, RISCVIOMMUContext *ctx,
-    IOMMUTLBEntry *iotlb)
+    IOMMUTLBEntry *iotlb, bool enable_cache)
 {
+    RISCVIOMMUEntry *iot;
+    IOMMUAccessFlags perm;
     bool enable_faults;
     bool enable_pasid;
     bool enable_pri;
+    GHashTable *iot_cache;
     int fault;
 
+    iot_cache = g_hash_table_ref(s->iot_cache);
+
     enable_faults = !(ctx->tc & RISCV_IOMMU_DC_TC_DTF);
     /*
      * TC[32] is reserved for custom extensions, used here to temporarily
@@ -645,9 +768,36 @@ static int riscv_iommu_translate(RISCVIOMMUState *s, RISCVIOMMUContext *ctx,
     enable_pri = (iotlb->perm == IOMMU_NONE) && (ctx->tc & BIT_ULL(32));
     enable_pasid = (ctx->tc & RISCV_IOMMU_DC_TC_PDTV);
 
+    iot = riscv_iommu_iot_lookup(ctx, iot_cache, iotlb->iova);
+    perm = iot ? iot->perm : IOMMU_NONE;
+    if (perm != IOMMU_NONE) {
+        iotlb->translated_addr = PPN_PHYS(iot->phys);
+        iotlb->addr_mask = ~TARGET_PAGE_MASK;
+        iotlb->perm = perm;
+        fault = 0;
+        goto done;
+    }
+
     /* Translate using device directory / page table information. */
     fault = riscv_iommu_spa_fetch(s, ctx, iotlb);
 
+    if (!fault && iotlb->target_as == &s->trap_as) {
+        /* Do not cache trapped MSI translations */
+        goto done;
+    }
+
+    if (!fault && iotlb->translated_addr != iotlb->iova && enable_cache) {
+        iot = g_new0(RISCVIOMMUEntry, 1);
+        iot->iova = PPN_DOWN(iotlb->iova);
+        iot->phys = PPN_DOWN(iotlb->translated_addr);
+        iot->pscid = get_field(ctx->ta, RISCV_IOMMU_DC_TA_PSCID);
+        iot->perm = iotlb->perm;
+        riscv_iommu_iot_update(s, iot_cache, iot);
+    }
+
+done:
+    g_hash_table_unref(iot_cache);
+
     if (enable_pri && fault) {
         struct riscv_iommu_pq_record pr = {0};
         if (enable_pasid) {
@@ -794,13 +944,40 @@ static void riscv_iommu_process_cq_tail(RISCVIOMMUState *s)
             if (cmd.dword0 & RISCV_IOMMU_CMD_IOTINVAL_PSCV) {
                 /* illegal command arguments IOTINVAL.GVMA & PSCV == 1 */
                 goto cmd_ill;
+            } else if (!(cmd.dword0 & RISCV_IOMMU_CMD_IOTINVAL_GV)) {
+                /* invalidate all cache mappings */
+                func = __iot_inval_all;
+            } else if (!(cmd.dword0 & RISCV_IOMMU_CMD_IOTINVAL_AV)) {
+                /* invalidate cache matching GSCID */
+                func = __iot_inval_gscid;
+            } else {
+                /* invalidate cache matching GSCID and ADDR (GPA) */
+                func = __iot_inval_gscid_gpa;
             }
-            /* translation cache not implemented yet */
+            riscv_iommu_iot_inval(s, func,
+                get_field(cmd.dword0, RISCV_IOMMU_CMD_IOTINVAL_GSCID), 0,
+                cmd.dword1 & TARGET_PAGE_MASK);
             break;
 
         case RISCV_IOMMU_CMD(RISCV_IOMMU_CMD_IOTINVAL_FUNC_VMA,
                              RISCV_IOMMU_CMD_IOTINVAL_OPCODE):
-            /* translation cache not implemented yet */
+            if (!(cmd.dword0 & RISCV_IOMMU_CMD_IOTINVAL_GV)) {
+                /* invalidate all cache mappings, simplified model */
+                func = __iot_inval_all;
+            } else if (!(cmd.dword0 & RISCV_IOMMU_CMD_IOTINVAL_PSCV)) {
+                /* invalidate cache matching GSCID, simplified model */
+                func = __iot_inval_gscid;
+            } else if (!(cmd.dword0 & RISCV_IOMMU_CMD_IOTINVAL_AV)) {
+                /* invalidate cache matching GSCID and PSCID */
+                func = __iot_inval_pscid;
+            } else {
+                /* invalidate cache matching GSCID and PSCID and ADDR (IOVA) */
+                func = __iot_inval_pscid_iova;
+            }
+            riscv_iommu_iot_inval(s, func,
+                get_field(cmd.dword0, RISCV_IOMMU_CMD_IOTINVAL_GSCID),
+                get_field(cmd.dword0, RISCV_IOMMU_CMD_IOTINVAL_PSCID),
+                cmd.dword1 & TARGET_PAGE_MASK);
             break;
 
         case RISCV_IOMMU_CMD(RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_DDT,
@@ -1290,6 +1467,8 @@ static void riscv_iommu_realize(DeviceState *dev, Error **errp)
     /* Device translation context cache */
     s->ctx_cache = g_hash_table_new_full(__ctx_hash, __ctx_equal,
                                          g_free, NULL);
+    s->iot_cache = g_hash_table_new_full(__iot_hash, __iot_equal,
+                                         g_free, NULL);
 
     s->iommus.le_next = NULL;
     s->iommus.le_prev = NULL;
@@ -1313,6 +1492,7 @@ static void riscv_iommu_unrealize(DeviceState *dev)
     qemu_thread_join(&s->core_proc);
     qemu_cond_destroy(&s->core_cond);
     qemu_mutex_destroy(&s->core_lock);
+    g_hash_table_unref(s->iot_cache);
     g_hash_table_unref(s->ctx_cache);
 }
 
@@ -1320,6 +1500,8 @@ static Property riscv_iommu_properties[] = {
     DEFINE_PROP_UINT32("version", RISCVIOMMUState, version,
         RISCV_IOMMU_SPEC_DOT_VER),
     DEFINE_PROP_UINT32("bus", RISCVIOMMUState, bus, 0x0),
+    DEFINE_PROP_UINT32("ioatc-limit", RISCVIOMMUState, iot_limit,
+        LIMIT_CACHE_IOT),
     DEFINE_PROP_BOOL("intremap", RISCVIOMMUState, enable_msi, TRUE),
     DEFINE_PROP_BOOL("off", RISCVIOMMUState, enable_off, TRUE),
     DEFINE_PROP_LINK("downstream-mr", RISCVIOMMUState, target_mr,
@@ -1372,7 +1554,7 @@ static IOMMUTLBEntry riscv_iommu_memory_region_translate(
         /* Translation disabled or invalid. */
         iotlb.addr_mask = 0;
         iotlb.perm = IOMMU_NONE;
-    } else if (riscv_iommu_translate(as->iommu, ctx, &iotlb)) {
+    } else if (riscv_iommu_translate(as->iommu, ctx, &iotlb, true)) {
         /* Translation disabled or fault reported. */
         iotlb.addr_mask = 0;
         iotlb.perm = IOMMU_NONE;
diff --git a/hw/riscv/riscv-iommu.h b/hw/riscv/riscv-iommu.h
index 6f740de690..eea2123686 100644
--- a/hw/riscv/riscv-iommu.h
+++ b/hw/riscv/riscv-iommu.h
@@ -68,6 +68,8 @@ struct RISCVIOMMUState {
     MemoryRegion trap_mr;
 
     GHashTable *ctx_cache;          /* Device translation Context Cache */
+    GHashTable *iot_cache;          /* IO Translated Address Cache */
+    unsigned iot_limit;             /* IO Translation Cache size limit */
 
     /* MMIO Hardware Interface */
     MemoryRegion regs_mr;
-- 
2.43.2
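
To make the IOTINVAL.VMA selection logic in this patch easier to follow,
here is a standalone sketch of the same decision tree over a plain
array instead of a GHashTable. It follows the simplified model used in
the patch (each valid bit narrows the match, and invalidation clears
'perm'). The struct and function names are hypothetical and do not
appear in the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct entry {
    uint64_t iova_ppn;   /* IOVA page number */
    uint32_t gscid;      /* guest soft-context ID */
    uint32_t pscid;      /* process soft-context ID */
    unsigned perm;       /* 0 means invalid, as in the patch */
};

struct inval_cmd {
    bool gv, pscv, av;   /* operand-valid bits of the command */
    uint32_t gscid;
    uint32_t pscid;
    uint64_t iova_ppn;
};

/* Each valid bit narrows the set of entries that match. */
static bool vma_matches(const struct entry *e, const struct inval_cmd *c)
{
    if (!c->gv) {
        return true;                        /* GV=0: all entries */
    }
    if (e->gscid != c->gscid) {
        return false;
    }
    if (!c->pscv) {
        return true;                        /* GSCID only */
    }
    if (e->pscid != c->pscid) {
        return false;
    }
    if (!c->av) {
        return true;                        /* GSCID and PSCID */
    }
    return e->iova_ppn == c->iova_ppn;      /* GSCID, PSCID and IOVA */
}

/* Invalidate by clearing perm; returns the number of entries dropped. */
static size_t cache_inval_vma(struct entry *cache, size_t n,
                              const struct inval_cmd *c)
{
    size_t dropped = 0;

    assert(cache != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (cache[i].perm && vma_matches(&cache[i], c)) {
            cache[i].perm = 0;
            dropped++;
        }
    }
    return dropped;
}
```

Clearing 'perm' rather than removing the entry matches what the
__iot_inval_*() callbacks do: a later lookup sees IOMMU_NONE and falls
back to the page table walk.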




* [PATCH v2 09/15] hw/riscv/riscv-iommu: add s-stage and g-stage support
  2024-03-07 16:03 [PATCH v2 00/15] riscv: QEMU RISC-V IOMMU Support Daniel Henrique Barboza
                   ` (7 preceding siblings ...)
  2024-03-07 16:03 ` [PATCH v2 08/15] hw/riscv/riscv-iommu: add Address Translation Cache (IOATC) Daniel Henrique Barboza
@ 2024-03-07 16:03 ` Daniel Henrique Barboza
  2024-05-10 10:36   ` Frank Chang
  2024-03-07 16:03 ` [PATCH v2 10/15] hw/riscv/riscv-iommu: add ATS support Daniel Henrique Barboza
                   ` (6 subsequent siblings)
  15 siblings, 1 reply; 55+ messages in thread
From: Daniel Henrique Barboza @ 2024-03-07 16:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-riscv, alistair.francis, bmeng, liwei1518, zhiwei_liu,
	palmer, ajones, tjeznach, Daniel Henrique Barboza

From: Tomasz Jeznach <tjeznach@rivosinc.com>

Add support for s-stage (sv32, sv39, sv48, sv57 caps) and g-stage
(sv32x4, sv39x4, sv48x4, sv57x4 caps). Most of the work is done in
riscv_iommu_spa_fetch(), which now has to consider how many translation
stages are needed when walking the page table.
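
As an illustration of the page table formats named above, the sketch
below computes the per-level table index for an address. The field
names levels, ptidxbits and ptesize are borrowed from the 'sc' struct
in riscv_iommu_spa_fetch(); the pt_format type and pt_index() helper
are hypothetical. The values follow the RISC-V privileged spec: sv32
uses 2 levels of 10 index bits with 4-byte PTEs, sv39/sv48/sv57 use
3/4/5 levels of 9 index bits with 8-byte PTEs, and the x4 g-stage
formats widen the root index by 2 bits.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12

struct pt_format {
    unsigned levels;     /* number of page table levels */
    unsigned ptidxbits;  /* index bits per level */
    unsigned ptesize;    /* PTE size in bytes */
};

static const struct pt_format SV32 = { 2, 10, 4 };
static const struct pt_format SV39 = { 3, 9, 8 };
static const struct pt_format SV48 = { 4, 9, 8 };
static const struct pt_format SV57 = { 5, 9, 8 };

/*
 * Index into the table at 'level' (0 is the leaf level). For the x4
 * g-stage formats pass widen_root != 0 to get the 2 extra root bits.
 */
static uint64_t pt_index(const struct pt_format *f, uint64_t addr,
                         unsigned level, int widen_root)
{
    assert(level < f->levels);

    unsigned bits = f->ptidxbits;
    if (widen_root && level == f->levels - 1) {
        bits += 2;
    }
    return (addr >> (PAGE_SHIFT + level * f->ptidxbits)) &
           ((1ULL << bits) - 1);
}
```

A two-stage walk applies this twice: once with the s-stage format on the
IOVA, and once with the g-stage format on each guest physical address
the first stage produces.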

Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
---
 hw/riscv/riscv-iommu-bits.h |  11 ++
 hw/riscv/riscv-iommu.c      | 282 ++++++++++++++++++++++++++++++++++--
 hw/riscv/riscv-iommu.h      |   2 +
 3 files changed, 286 insertions(+), 9 deletions(-)

diff --git a/hw/riscv/riscv-iommu-bits.h b/hw/riscv/riscv-iommu-bits.h
index 8e80b1e52a..9d645d69ea 100644
--- a/hw/riscv/riscv-iommu-bits.h
+++ b/hw/riscv/riscv-iommu-bits.h
@@ -71,6 +71,14 @@ struct riscv_iommu_pq_record {
 /* 5.3 IOMMU Capabilities (64bits) */
 #define RISCV_IOMMU_REG_CAP             0x0000
 #define RISCV_IOMMU_CAP_VERSION         GENMASK_ULL(7, 0)
+#define RISCV_IOMMU_CAP_SV32            BIT_ULL(8)
+#define RISCV_IOMMU_CAP_SV39            BIT_ULL(9)
+#define RISCV_IOMMU_CAP_SV48            BIT_ULL(10)
+#define RISCV_IOMMU_CAP_SV57            BIT_ULL(11)
+#define RISCV_IOMMU_CAP_SV32X4          BIT_ULL(16)
+#define RISCV_IOMMU_CAP_SV39X4          BIT_ULL(17)
+#define RISCV_IOMMU_CAP_SV48X4          BIT_ULL(18)
+#define RISCV_IOMMU_CAP_SV57X4          BIT_ULL(19)
 #define RISCV_IOMMU_CAP_MSI_FLAT        BIT_ULL(22)
 #define RISCV_IOMMU_CAP_MSI_MRIF        BIT_ULL(23)
 #define RISCV_IOMMU_CAP_IGS             GENMASK_ULL(29, 28)
@@ -79,6 +87,7 @@ struct riscv_iommu_pq_record {
 
 /* 5.4 Features control register (32bits) */
 #define RISCV_IOMMU_REG_FCTL            0x0008
+#define RISCV_IOMMU_FCTL_GXL            BIT(2)
 
 /* 5.5 Device-directory-table pointer (64bits) */
 #define RISCV_IOMMU_REG_DDTP            0x0010
@@ -195,6 +204,8 @@ struct riscv_iommu_dc {
 #define RISCV_IOMMU_DC_TC_DTF           BIT_ULL(4)
 #define RISCV_IOMMU_DC_TC_PDTV          BIT_ULL(5)
 #define RISCV_IOMMU_DC_TC_PRPR          BIT_ULL(6)
+#define RISCV_IOMMU_DC_TC_GADE          BIT_ULL(7)
+#define RISCV_IOMMU_DC_TC_SADE          BIT_ULL(8)
 #define RISCV_IOMMU_DC_TC_DPE           BIT_ULL(9)
 #define RISCV_IOMMU_DC_TC_SXL           BIT_ULL(11)
 
diff --git a/hw/riscv/riscv-iommu.c b/hw/riscv/riscv-iommu.c
index 0b93146327..03a610fa75 100644
--- a/hw/riscv/riscv-iommu.c
+++ b/hw/riscv/riscv-iommu.c
@@ -58,6 +58,8 @@ struct RISCVIOMMUContext {
     uint64_t __rfu:20;          /* reserved */
     uint64_t tc;                /* Translation Control */
     uint64_t ta;                /* Translation Attributes */
+    uint64_t satp;              /* S-Stage address translation and protection */
+    uint64_t gatp;              /* G-Stage address translation and protection */
     uint64_t msi_addr_mask;     /* MSI filtering - address mask */
     uint64_t msi_addr_pattern;  /* MSI filtering - address pattern */
     uint64_t msiptp;            /* MSI redirection page table pointer */
@@ -194,12 +196,46 @@ static bool riscv_iommu_msi_check(RISCVIOMMUState *s, RISCVIOMMUContext *ctx,
     return true;
 }
 
-/* RISCV IOMMU Address Translation Lookup - Page Table Walk */
+/*
+ * RISCV IOMMU Address Translation Lookup - Page Table Walk
+ *
+ * Note: Code is based on get_physical_address() from target/riscv/cpu_helper.c
+ * Both implementations can be merged into a single helper function in the
+ * future. They are kept separate for now, as error reporting and flow
+ * specifics differ enough to warrant separate implementations.
+ *
+ * @s        : IOMMU Device State
+ * @ctx      : Translation context for device id and process address space id.
+ * @iotlb    : translation data: physical address and access mode.
+ * @gpa      : provided IOVA is a guest physical address, use G-Stage only.
+ * @return   : success or fault cause code.
+ */
 static int riscv_iommu_spa_fetch(RISCVIOMMUState *s, RISCVIOMMUContext *ctx,
-    IOMMUTLBEntry *iotlb)
+    IOMMUTLBEntry *iotlb, bool gpa)
 {
+    dma_addr_t addr, base;
+    uint64_t satp, gatp, pte;
+    bool en_s, en_g;
+    struct {
+        unsigned char step;
+        unsigned char levels;
+        unsigned char ptidxbits;
+        unsigned char ptesize;
+    } sc[2];
+    /* Translation stage phase */
+    enum {
+        S_STAGE = 0,
+        G_STAGE = 1,
+    } pass;
+
+    satp = get_field(ctx->satp, RISCV_IOMMU_ATP_MODE_FIELD);
+    gatp = get_field(ctx->gatp, RISCV_IOMMU_ATP_MODE_FIELD);
+
+    en_s = satp != RISCV_IOMMU_DC_FSC_MODE_BARE && !gpa;
+    en_g = gatp != RISCV_IOMMU_DC_IOHGATP_MODE_BARE;
+
     /* Early check for MSI address match when IOVA == GPA */
-    if (iotlb->perm & IOMMU_WO &&
+    if (!en_s && (iotlb->perm & IOMMU_WO) &&
         riscv_iommu_msi_check(s, ctx, iotlb->iova)) {
         iotlb->target_as = &s->trap_as;
         iotlb->translated_addr = iotlb->iova;
@@ -208,11 +244,196 @@ static int riscv_iommu_spa_fetch(RISCVIOMMUState *s, RISCVIOMMUContext *ctx,
     }
 
     /* Exit early for pass-through mode. */
-    iotlb->translated_addr = iotlb->iova;
-    iotlb->addr_mask = ~TARGET_PAGE_MASK;
-    /* Allow R/W in pass-through mode */
-    iotlb->perm = IOMMU_RW;
-    return 0;
+    if (!(en_s || en_g)) {
+        iotlb->translated_addr = iotlb->iova;
+        iotlb->addr_mask = ~TARGET_PAGE_MASK;
+        /* Allow R/W in pass-through mode */
+        iotlb->perm = IOMMU_RW;
+        return 0;
+    }
+
+    /* S/G translation parameters. */
+    for (pass = 0; pass < 2; pass++) {
+        uint32_t sv_mode;
+
+        sc[pass].step = 0;
+        if (pass ? (s->fctl & RISCV_IOMMU_FCTL_GXL) :
+            (ctx->tc & RISCV_IOMMU_DC_TC_SXL)) {
+            /* 32bit mode for GXL/SXL == 1 */
+            switch (pass ? gatp : satp) {
+            case RISCV_IOMMU_DC_IOHGATP_MODE_BARE:
+                sc[pass].levels    = 0;
+                sc[pass].ptidxbits = 0;
+                sc[pass].ptesize   = 0;
+                break;
+            case RISCV_IOMMU_DC_IOHGATP_MODE_SV32X4:
+                sv_mode = pass ? RISCV_IOMMU_CAP_SV32X4 : RISCV_IOMMU_CAP_SV32;
+                if (!(s->cap & sv_mode)) {
+                    return RISCV_IOMMU_FQ_CAUSE_DDT_MISCONFIGURED;
+                }
+                sc[pass].levels    = 2;
+                sc[pass].ptidxbits = 10;
+                sc[pass].ptesize   = 4;
+                break;
+            default:
+                return RISCV_IOMMU_FQ_CAUSE_DDT_MISCONFIGURED;
+            }
+        } else {
+            /* 64bit mode for GXL/SXL == 0 */
+            switch (pass ? gatp : satp) {
+            case RISCV_IOMMU_DC_IOHGATP_MODE_BARE:
+                sc[pass].levels    = 0;
+                sc[pass].ptidxbits = 0;
+                sc[pass].ptesize   = 0;
+                break;
+            case RISCV_IOMMU_DC_IOHGATP_MODE_SV39X4:
+                sv_mode = pass ? RISCV_IOMMU_CAP_SV39X4 : RISCV_IOMMU_CAP_SV39;
+                if (!(s->cap & sv_mode)) {
+                    return RISCV_IOMMU_FQ_CAUSE_DDT_MISCONFIGURED;
+                }
+                sc[pass].levels    = 3;
+                sc[pass].ptidxbits = 9;
+                sc[pass].ptesize   = 8;
+                break;
+            case RISCV_IOMMU_DC_IOHGATP_MODE_SV48X4:
+                sv_mode = pass ? RISCV_IOMMU_CAP_SV48X4 : RISCV_IOMMU_CAP_SV48;
+                if (!(s->cap & sv_mode)) {
+                    return RISCV_IOMMU_FQ_CAUSE_DDT_MISCONFIGURED;
+                }
+                sc[pass].levels    = 4;
+                sc[pass].ptidxbits = 9;
+                sc[pass].ptesize   = 8;
+                break;
+            case RISCV_IOMMU_DC_IOHGATP_MODE_SV57X4:
+                sv_mode = pass ? RISCV_IOMMU_CAP_SV57X4 : RISCV_IOMMU_CAP_SV57;
+                if (!(s->cap & sv_mode)) {
+                    return RISCV_IOMMU_FQ_CAUSE_DDT_MISCONFIGURED;
+                }
+                sc[pass].levels    = 5;
+                sc[pass].ptidxbits = 9;
+                sc[pass].ptesize   = 8;
+                break;
+            default:
+                return RISCV_IOMMU_FQ_CAUSE_DDT_MISCONFIGURED;
+            }
+        }
+    }
+
+    /* S/G stages translation tables root pointers */
+    gatp = PPN_PHYS(get_field(ctx->gatp, RISCV_IOMMU_ATP_PPN_FIELD));
+    satp = PPN_PHYS(get_field(ctx->satp, RISCV_IOMMU_ATP_PPN_FIELD));
+    addr = (en_s && en_g) ? satp : iotlb->iova;
+    base = en_g ? gatp : satp;
+    pass = en_g ? G_STAGE : S_STAGE;
+
+    do {
+        const unsigned widened = (pass && !sc[pass].step) ? 2 : 0;
+        const unsigned va_bits = widened + sc[pass].ptidxbits;
+        const unsigned va_skip = TARGET_PAGE_BITS + sc[pass].ptidxbits *
+                                 (sc[pass].levels - 1 - sc[pass].step);
+        const unsigned idx = (addr >> va_skip) & ((1 << va_bits) - 1);
+        const dma_addr_t pte_addr = base + idx * sc[pass].ptesize;
+        const bool ade =
+            ctx->tc & (pass ? RISCV_IOMMU_DC_TC_GADE : RISCV_IOMMU_DC_TC_SADE);
+
+        /* Address range check before first level lookup */
+        if (!sc[pass].step) {
+            const uint64_t va_mask = (1ULL << (va_skip + va_bits)) - 1;
+            if ((addr & va_mask) != addr) {
+                return RISCV_IOMMU_FQ_CAUSE_DMA_DISABLED;
+            }
+        }
+
+        /* Read page table entry */
+        if (dma_memory_read(s->target_as, pte_addr, &pte,
+                sc[pass].ptesize, MEMTXATTRS_UNSPECIFIED) != MEMTX_OK) {
+            return (iotlb->perm & IOMMU_WO) ? RISCV_IOMMU_FQ_CAUSE_WR_FAULT
+                                            : RISCV_IOMMU_FQ_CAUSE_RD_FAULT;
+        }
+
+        if (sc[pass].ptesize == 4) {
+            pte = (uint64_t) le32_to_cpu(*((uint32_t *)&pte));
+        } else {
+            pte = le64_to_cpu(pte);
+        }
+
+        sc[pass].step++;
+        hwaddr ppn = pte >> PTE_PPN_SHIFT;
+
+        if (!(pte & PTE_V)) {
+            break;                /* Invalid PTE */
+        } else if (!(pte & (PTE_R | PTE_W | PTE_X))) {
+            base = PPN_PHYS(ppn); /* Inner PTE, continue walking */
+        } else if ((pte & (PTE_R | PTE_W | PTE_X)) == PTE_W) {
+            break;                /* Reserved leaf PTE flags: PTE_W */
+        } else if ((pte & (PTE_R | PTE_W | PTE_X)) == (PTE_W | PTE_X)) {
+            break;                /* Reserved leaf PTE flags: PTE_W + PTE_X */
+        } else if (ppn & ((1ULL << (va_skip - TARGET_PAGE_BITS)) - 1)) {
+            break;                /* Misaligned PPN */
+        } else if ((iotlb->perm & IOMMU_RO) && !(pte & PTE_R)) {
+            break;                /* Read access check failed */
+        } else if ((iotlb->perm & IOMMU_WO) && !(pte & PTE_W)) {
+            break;                /* Write access check failed */
+        } else if ((iotlb->perm & IOMMU_RO) && !ade && !(pte & PTE_A)) {
+            break;                /* Access bit not set */
+        } else if ((iotlb->perm & IOMMU_WO) && !ade && !(pte & PTE_D)) {
+            break;                /* Dirty bit not set */
+        } else {
+            /* Leaf PTE, translation completed. */
+            sc[pass].step = sc[pass].levels;
+            base = PPN_PHYS(ppn) | (addr & ((1ULL << va_skip) - 1));
+            /* Update address mask based on smallest translation granularity */
+            iotlb->addr_mask &= (1ULL << va_skip) - 1;
+            /* Continue with S-Stage translation? */
+            if (pass && sc[0].step != sc[0].levels) {
+                pass = S_STAGE;
+                addr = iotlb->iova;
+                continue;
+            }
+            /* Translation phase completed (GPA or SPA) */
+            iotlb->translated_addr = base;
+            iotlb->perm = (pte & PTE_W) ? ((pte & PTE_R) ? IOMMU_RW : IOMMU_WO)
+                                                         : IOMMU_RO;
+
+            /* Check MSI GPA address match */
+            if (pass == S_STAGE && (iotlb->perm & IOMMU_WO) &&
+                riscv_iommu_msi_check(s, ctx, base)) {
+                /* Trap MSI writes and return GPA address. */
+                iotlb->target_as = &s->trap_as;
+                iotlb->addr_mask = ~TARGET_PAGE_MASK;
+                return 0;
+            }
+
+            /* Continue with G-Stage translation? */
+            if (!pass && en_g) {
+                pass = G_STAGE;
+                addr = base;
+                base = gatp;
+                sc[pass].step = 0;
+                continue;
+            }
+
+            return 0;
+        }
+
+        if (sc[pass].step == sc[pass].levels) {
+            break; /* Can't find leaf PTE */
+        }
+
+        /* Continue with G-Stage translation? */
+        if (!pass && en_g) {
+            pass = G_STAGE;
+            addr = base;
+            base = gatp;
+            sc[pass].step = 0;
+        }
+    } while (1);
+
+    return (iotlb->perm & IOMMU_WO) ?
+                (pass ? RISCV_IOMMU_FQ_CAUSE_WR_FAULT_VS :
+                        RISCV_IOMMU_FQ_CAUSE_WR_FAULT_S) :
+                (pass ? RISCV_IOMMU_FQ_CAUSE_RD_FAULT_VS :
+                        RISCV_IOMMU_FQ_CAUSE_RD_FAULT_S);
 }
 
 /* Redirect MSI write for given GPA. */
@@ -351,6 +572,10 @@ static int riscv_iommu_ctx_fetch(RISCVIOMMUState *s, RISCVIOMMUContext *ctx)
 
     case RISCV_IOMMU_DDTP_MODE_BARE:
         /* mock up pass-through translation context */
+        ctx->gatp = set_field(0, RISCV_IOMMU_ATP_MODE_FIELD,
+            RISCV_IOMMU_DC_IOHGATP_MODE_BARE);
+        ctx->satp = set_field(0, RISCV_IOMMU_ATP_MODE_FIELD,
+            RISCV_IOMMU_DC_FSC_MODE_BARE);
         ctx->tc = RISCV_IOMMU_DC_TC_V;
         ctx->ta = 0;
         ctx->msiptp = 0;
@@ -424,6 +649,8 @@ static int riscv_iommu_ctx_fetch(RISCVIOMMUState *s, RISCVIOMMUContext *ctx)
 
     /* Set translation context. */
     ctx->tc = le64_to_cpu(dc.tc);
+    ctx->gatp = le64_to_cpu(dc.iohgatp);
+    ctx->satp = le64_to_cpu(dc.fsc);
     ctx->ta = le64_to_cpu(dc.ta);
     ctx->msiptp = le64_to_cpu(dc.msiptp);
     ctx->msi_addr_mask = le64_to_cpu(dc.msi_addr_mask);
@@ -433,14 +660,38 @@ static int riscv_iommu_ctx_fetch(RISCVIOMMUState *s, RISCVIOMMUContext *ctx)
         return RISCV_IOMMU_FQ_CAUSE_DDT_INVALID;
     }
 
+    /* FSC field checks */
+    mode = get_field(ctx->satp, RISCV_IOMMU_DC_FSC_MODE);
+    addr = PPN_PHYS(get_field(ctx->satp, RISCV_IOMMU_DC_FSC_PPN));
+
+    if (mode == RISCV_IOMMU_DC_FSC_MODE_BARE) {
+        /* No S-Stage translation, done. */
+        return 0;
+    }
+
     if (!(ctx->tc & RISCV_IOMMU_DC_TC_PDTV)) {
         if (ctx->pasid != RISCV_IOMMU_NOPASID) {
             /* PASID is disabled */
             return RISCV_IOMMU_FQ_CAUSE_TTYPE_BLOCKED;
         }
+        if (mode > RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV57) {
+            /* Invalid translation mode */
+            return RISCV_IOMMU_FQ_CAUSE_DDT_INVALID;
+        }
         return 0;
     }
 
+    if (ctx->pasid == RISCV_IOMMU_NOPASID) {
+        if (!(ctx->tc & RISCV_IOMMU_DC_TC_DPE)) {
+            /* No default PASID enabled, set BARE mode */
+            ctx->satp = 0ULL;
+            return 0;
+        } else {
+            /* Use default PASID #0 */
+            ctx->pasid = 0;
+        }
+    }
+
     /* FSC.TC.PDTV enabled */
     if (mode > RISCV_IOMMU_DC_FSC_PDTP_MODE_PD20) {
         /* Invalid PDTP.MODE */
@@ -474,6 +725,7 @@ static int riscv_iommu_ctx_fetch(RISCVIOMMUState *s, RISCVIOMMUContext *ctx)
 
     /* Use FSC and TA from process directory entry. */
     ctx->ta = le64_to_cpu(dc.ta);
+    ctx->satp = le64_to_cpu(dc.fsc);
 
     return 0;
 }
@@ -710,6 +962,7 @@ static RISCVIOMMUEntry *riscv_iommu_iot_lookup(RISCVIOMMUContext *ctx,
     GHashTable *iot_cache, hwaddr iova)
 {
     RISCVIOMMUEntry key = {
+        .gscid = get_field(ctx->gatp, RISCV_IOMMU_DC_IOHGATP_GSCID),
         .pscid = get_field(ctx->ta, RISCV_IOMMU_DC_TA_PSCID),
         .iova  = PPN_DOWN(iova),
     };
@@ -779,7 +1032,7 @@ static int riscv_iommu_translate(RISCVIOMMUState *s, RISCVIOMMUContext *ctx,
     }
 
     /* Translate using device directory / page table information. */
-    fault = riscv_iommu_spa_fetch(s, ctx, iotlb);
+    fault = riscv_iommu_spa_fetch(s, ctx, iotlb, false);
 
     if (!fault && iotlb->target_as == &s->trap_as) {
         /* Do not cache trapped MSI translations */
@@ -790,6 +1043,7 @@ static int riscv_iommu_translate(RISCVIOMMUState *s, RISCVIOMMUContext *ctx,
         iot = g_new0(RISCVIOMMUEntry, 1);
         iot->iova = PPN_DOWN(iotlb->iova);
         iot->phys = PPN_DOWN(iotlb->translated_addr);
+        iot->gscid = get_field(ctx->gatp, RISCV_IOMMU_DC_IOHGATP_GSCID);
         iot->pscid = get_field(ctx->ta, RISCV_IOMMU_DC_TA_PSCID);
         iot->perm = iotlb->perm;
         riscv_iommu_iot_update(s, iot_cache, iot);
@@ -1394,6 +1648,14 @@ static void riscv_iommu_realize(DeviceState *dev, Error **errp)
     if (s->enable_msi) {
         s->cap |= RISCV_IOMMU_CAP_MSI_FLAT | RISCV_IOMMU_CAP_MSI_MRIF;
     }
+    if (s->enable_s_stage) {
+        s->cap |= RISCV_IOMMU_CAP_SV32 | RISCV_IOMMU_CAP_SV39 |
+                  RISCV_IOMMU_CAP_SV48 | RISCV_IOMMU_CAP_SV57;
+    }
+    if (s->enable_g_stage) {
+        s->cap |= RISCV_IOMMU_CAP_SV32X4 | RISCV_IOMMU_CAP_SV39X4 |
+                  RISCV_IOMMU_CAP_SV48X4 | RISCV_IOMMU_CAP_SV57X4;
+    }
     /* Report QEMU target physical address space limits */
     s->cap = set_field(s->cap, RISCV_IOMMU_CAP_PAS,
                        TARGET_PHYS_ADDR_SPACE_BITS);
@@ -1504,6 +1766,8 @@ static Property riscv_iommu_properties[] = {
         LIMIT_CACHE_IOT),
     DEFINE_PROP_BOOL("intremap", RISCVIOMMUState, enable_msi, TRUE),
     DEFINE_PROP_BOOL("off", RISCVIOMMUState, enable_off, TRUE),
+    DEFINE_PROP_BOOL("s-stage", RISCVIOMMUState, enable_s_stage, TRUE),
+    DEFINE_PROP_BOOL("g-stage", RISCVIOMMUState, enable_g_stage, TRUE),
     DEFINE_PROP_LINK("downstream-mr", RISCVIOMMUState, target_mr,
         TYPE_MEMORY_REGION, MemoryRegion *),
     DEFINE_PROP_END_OF_LIST(),
diff --git a/hw/riscv/riscv-iommu.h b/hw/riscv/riscv-iommu.h
index eea2123686..9b33fb97ef 100644
--- a/hw/riscv/riscv-iommu.h
+++ b/hw/riscv/riscv-iommu.h
@@ -38,6 +38,8 @@ struct RISCVIOMMUState {
 
     bool enable_off;      /* Enable out-of-reset OFF mode (DMA disabled) */
     bool enable_msi;      /* Enable MSI remapping */
+    bool enable_s_stage;  /* Enable S/VS-Stage translation */
+    bool enable_g_stage;  /* Enable G-Stage translation */
 
     /* IOMMU Internal State */
     uint64_t ddtp;        /* Validated Device Directory Tree Root Pointer */
-- 
2.43.2
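
The index arithmetic at the top of the walk loop in riscv_iommu_spa_fetch()
can be looked at in isolation. The sketch below is a simplified, standalone
restatement and not the patch code: pt_va_skip(), pt_index() and
pt_addr_in_range() are hypothetical helper names, and 4 KiB pages (12 offset
bits) are assumed. It shows how the root level of a G-stage (x4) table gets
two extra index bits and how that widens the accepted input address range.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, i.e. TARGET_PAGE_BITS == 12. */
#define SKETCH_PAGE_BITS 12u

/* Number of address bits below the index field used at walk level 'step'. */
unsigned pt_va_skip(unsigned levels, unsigned ptidxbits, unsigned step)
{
    return SKETCH_PAGE_BITS + ptidxbits * (levels - 1u - step);
}

/* Page table index at level 'step'; the G-stage root is widened by 2 bits. */
uint64_t pt_index(uint64_t addr, unsigned levels, unsigned ptidxbits,
                  unsigned step, int g_stage)
{
    const unsigned widened = (g_stage && step == 0) ? 2u : 0u;
    const unsigned va_bits = ptidxbits + widened;

    return (addr >> pt_va_skip(levels, ptidxbits, step)) &
           ((1ULL << va_bits) - 1);
}

/* Non-zero when 'addr' fits in the range covered by the root table. */
int pt_addr_in_range(uint64_t addr, unsigned levels, unsigned ptidxbits,
                     int g_stage)
{
    const unsigned bits = pt_va_skip(levels, ptidxbits, 0) + ptidxbits +
                          (g_stage ? 2u : 0u);

    return (addr >> bits) == 0;
}
```

With levels = 3 and ptidxbits = 9 (Sv39), the root index is 9 bits wide and
starts at bit 30. For Sv39x4 the same call with g_stage set yields an 11 bit
index, so a 41 bit guest physical address passes the range check.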




* [PATCH v2 10/15] hw/riscv/riscv-iommu: add ATS support
  2024-03-07 16:03 [PATCH v2 00/15] riscv: QEMU RISC-V IOMMU Support Daniel Henrique Barboza
                   ` (8 preceding siblings ...)
  2024-03-07 16:03 ` [PATCH v2 09/15] hw/riscv/riscv-iommu: add s-stage and g-stage support Daniel Henrique Barboza
@ 2024-03-07 16:03 ` Daniel Henrique Barboza
  2024-05-08  2:57   ` Frank Chang
  2024-03-07 16:03 ` [PATCH v2 11/15] hw/riscv/riscv-iommu: add DBG support Daniel Henrique Barboza
                   ` (5 subsequent siblings)
  15 siblings, 1 reply; 55+ messages in thread
From: Daniel Henrique Barboza @ 2024-03-07 16:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-riscv, alistair.francis, bmeng, liwei1518, zhiwei_liu,
	palmer, ajones, tjeznach, Daniel Henrique Barboza

From: Tomasz Jeznach <tjeznach@rivosinc.com>

Add PCIe Address Translation Services (ATS) capabilities to the IOMMU.
This adds support for ATS translation requests in the Fault/Event
queues, the Page-request queue and IOATC invalidations.
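
As an illustration of the ATS command layout, here is a minimal standalone
sketch of how the device id and process id can be pulled out of dword0. The
mask values mirror the RISCV_IOMMU_CMD_ATS_* definitions in this patch;
field_get(), ats_cmd_devid() and ats_cmd_pid() are hypothetical names used
only for this example and are not QEMU helpers.

```c
#include <stdint.h>

/* Field masks mirroring the ATS command definitions in this patch. */
#define ATS_PID   0x00000000FFFFF000ULL  /* GENMASK_ULL(31, 12) */
#define ATS_PV    (1ULL << 32)
#define ATS_DSV   (1ULL << 33)
#define ATS_RID   0x00FFFF0000000000ULL  /* GENMASK_ULL(55, 40) */
#define ATS_DSEG  0xFF00000000000000ULL  /* GENMASK_ULL(63, 56) */

/* Hypothetical helper: shift a field down to the mask's lowest set bit. */
uint64_t field_get(uint64_t value, uint64_t mask)
{
    return (value & mask) / (mask & (~mask + 1));
}

/*
 * With DSV set, the destination segment forms the upper 8 bits of a
 * 24 bit device id. Otherwise only the 16 bit requester id is used.
 */
uint32_t ats_cmd_devid(uint64_t dword0)
{
    if (dword0 & ATS_DSV) {
        return (uint32_t)field_get(dword0, ATS_DSEG | ATS_RID);
    }
    return (uint32_t)field_get(dword0, ATS_RID);
}

/* Process id; only meaningful when the PV bit is set. */
uint32_t ats_cmd_pid(uint64_t dword0)
{
    return (uint32_t)field_get(dword0, ATS_PID);
}
```

Because DSEG and RID are adjacent, OR-ing the two masks gives one contiguous
24 bit field, which is why a single extraction is enough in the DSV case.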

Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
---
 hw/riscv/riscv-iommu-bits.h |  43 ++++++++++++++-
 hw/riscv/riscv-iommu.c      | 107 +++++++++++++++++++++++++++++++++---
 hw/riscv/riscv-iommu.h      |   1 +
 hw/riscv/trace-events       |   3 +
 4 files changed, 145 insertions(+), 9 deletions(-)

diff --git a/hw/riscv/riscv-iommu-bits.h b/hw/riscv/riscv-iommu-bits.h
index 9d645d69ea..0994f5ce48 100644
--- a/hw/riscv/riscv-iommu-bits.h
+++ b/hw/riscv/riscv-iommu-bits.h
@@ -81,6 +81,7 @@ struct riscv_iommu_pq_record {
 #define RISCV_IOMMU_CAP_SV57X4          BIT_ULL(19)
 #define RISCV_IOMMU_CAP_MSI_FLAT        BIT_ULL(22)
 #define RISCV_IOMMU_CAP_MSI_MRIF        BIT_ULL(23)
+#define RISCV_IOMMU_CAP_ATS             BIT_ULL(25)
 #define RISCV_IOMMU_CAP_IGS             GENMASK_ULL(29, 28)
 #define RISCV_IOMMU_CAP_PAS             GENMASK_ULL(37, 32)
 #define RISCV_IOMMU_CAP_PD8             BIT_ULL(38)
@@ -201,6 +202,7 @@ struct riscv_iommu_dc {
 
 /* Translation control fields */
 #define RISCV_IOMMU_DC_TC_V             BIT_ULL(0)
+#define RISCV_IOMMU_DC_TC_EN_ATS        BIT_ULL(1)
 #define RISCV_IOMMU_DC_TC_DTF           BIT_ULL(4)
 #define RISCV_IOMMU_DC_TC_PDTV          BIT_ULL(5)
 #define RISCV_IOMMU_DC_TC_PRPR          BIT_ULL(6)
@@ -259,6 +261,20 @@ struct riscv_iommu_command {
 #define RISCV_IOMMU_CMD_IODIR_DV        BIT_ULL(33)
 #define RISCV_IOMMU_CMD_IODIR_DID       GENMASK_ULL(63, 40)
 
+/* 3.1.4 I/O MMU PCIe ATS */
+#define RISCV_IOMMU_CMD_ATS_OPCODE              4
+#define RISCV_IOMMU_CMD_ATS_FUNC_INVAL          0
+#define RISCV_IOMMU_CMD_ATS_FUNC_PRGR           1
+#define RISCV_IOMMU_CMD_ATS_PID         GENMASK_ULL(31, 12)
+#define RISCV_IOMMU_CMD_ATS_PV          BIT_ULL(32)
+#define RISCV_IOMMU_CMD_ATS_DSV         BIT_ULL(33)
+#define RISCV_IOMMU_CMD_ATS_RID         GENMASK_ULL(55, 40)
+#define RISCV_IOMMU_CMD_ATS_DSEG        GENMASK_ULL(63, 56)
+/* dword1 is the ATS payload, two different payload types for INVAL and PRGR */
+
+/* ATS.PRGR payload */
+#define RISCV_IOMMU_CMD_ATS_PRGR_RESP_CODE      GENMASK_ULL(47, 44)
+
 enum riscv_iommu_dc_fsc_atp_modes {
     RISCV_IOMMU_DC_FSC_MODE_BARE = 0,
     RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV32 = 8,
@@ -322,7 +338,32 @@ enum riscv_iommu_fq_ttypes {
     RISCV_IOMMU_FQ_TTYPE_TADDR_INST_FETCH = 5,
     RISCV_IOMMU_FQ_TTYPE_TADDR_RD = 6,
     RISCV_IOMMU_FQ_TTYPE_TADDR_WR = 7,
-    RISCV_IOMMU_FW_TTYPE_PCIE_MSG_REQ = 8,
+    RISCV_IOMMU_FQ_TTYPE_PCIE_ATS_REQ = 8,
+    RISCV_IOMMU_FW_TTYPE_PCIE_MSG_REQ = 9,
+};
+
+/* Header fields */
+#define RISCV_IOMMU_PREQ_HDR_PID        GENMASK_ULL(31, 12)
+#define RISCV_IOMMU_PREQ_HDR_PV         BIT_ULL(32)
+#define RISCV_IOMMU_PREQ_HDR_PRIV       BIT_ULL(33)
+#define RISCV_IOMMU_PREQ_HDR_EXEC       BIT_ULL(34)
+#define RISCV_IOMMU_PREQ_HDR_DID        GENMASK_ULL(63, 40)
+
+/* Payload fields */
+#define RISCV_IOMMU_PREQ_PAYLOAD_R      BIT_ULL(0)
+#define RISCV_IOMMU_PREQ_PAYLOAD_W      BIT_ULL(1)
+#define RISCV_IOMMU_PREQ_PAYLOAD_L      BIT_ULL(2)
+#define RISCV_IOMMU_PREQ_PAYLOAD_M      GENMASK_ULL(2, 0)
+#define RISCV_IOMMU_PREQ_PRG_INDEX      GENMASK_ULL(11, 3)
+#define RISCV_IOMMU_PREQ_UADDR          GENMASK_ULL(63, 12)
+
+
+/*
+ * struct riscv_iommu_msi_pte - MSI Page Table Entry
+ */
+struct riscv_iommu_msi_pte {
+      uint64_t pte;
+      uint64_t mrif_info;
 };
 
 /* Fields on pte */
diff --git a/hw/riscv/riscv-iommu.c b/hw/riscv/riscv-iommu.c
index 03a610fa75..7af5929b10 100644
--- a/hw/riscv/riscv-iommu.c
+++ b/hw/riscv/riscv-iommu.c
@@ -576,7 +576,7 @@ static int riscv_iommu_ctx_fetch(RISCVIOMMUState *s, RISCVIOMMUContext *ctx)
             RISCV_IOMMU_DC_IOHGATP_MODE_BARE);
         ctx->satp = set_field(0, RISCV_IOMMU_ATP_MODE_FIELD,
             RISCV_IOMMU_DC_FSC_MODE_BARE);
-        ctx->tc = RISCV_IOMMU_DC_TC_V;
+        ctx->tc = RISCV_IOMMU_DC_TC_EN_ATS | RISCV_IOMMU_DC_TC_V;
         ctx->ta = 0;
         ctx->msiptp = 0;
         return 0;
@@ -1021,6 +1021,18 @@ static int riscv_iommu_translate(RISCVIOMMUState *s, RISCVIOMMUContext *ctx,
     enable_pri = (iotlb->perm == IOMMU_NONE) && (ctx->tc & BIT_ULL(32));
     enable_pasid = (ctx->tc & RISCV_IOMMU_DC_TC_PDTV);
 
+    /* Check for ATS request. */
+    if (iotlb->perm == IOMMU_NONE) {
+        /* Check if ATS is disabled. */
+        if (!(ctx->tc & RISCV_IOMMU_DC_TC_EN_ATS)) {
+            enable_pri = false;
+            fault = RISCV_IOMMU_FQ_CAUSE_TTYPE_BLOCKED;
+            goto done;
+        }
+        trace_riscv_iommu_ats(s->parent_obj.id, PCI_BUS_NUM(ctx->devid),
+                PCI_SLOT(ctx->devid), PCI_FUNC(ctx->devid), iotlb->iova);
+    }
+
     iot = riscv_iommu_iot_lookup(ctx, iot_cache, iotlb->iova);
     perm = iot ? iot->perm : IOMMU_NONE;
     if (perm != IOMMU_NONE) {
@@ -1067,13 +1079,10 @@ done:
 
     if (enable_faults && fault) {
         struct riscv_iommu_fq_record ev;
-        unsigned ttype;
-
-        if (iotlb->perm & IOMMU_RW) {
-            ttype = RISCV_IOMMU_FQ_TTYPE_UADDR_WR;
-        } else {
-            ttype = RISCV_IOMMU_FQ_TTYPE_UADDR_RD;
-        }
+        const unsigned ttype =
+            (iotlb->perm & IOMMU_WO) ? RISCV_IOMMU_FQ_TTYPE_UADDR_WR :
+            ((iotlb->perm & IOMMU_RO) ? RISCV_IOMMU_FQ_TTYPE_UADDR_RD :
+            RISCV_IOMMU_FQ_TTYPE_PCIE_ATS_REQ);
         ev.hdr = set_field(0, RISCV_IOMMU_FQ_HDR_CAUSE, fault);
         ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_TTYPE, ttype);
         ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_PV, enable_pasid);
@@ -1105,6 +1114,73 @@ static MemTxResult riscv_iommu_iofence(RISCVIOMMUState *s, bool notify,
         MEMTXATTRS_UNSPECIFIED);
 }
 
+static void riscv_iommu_ats(RISCVIOMMUState *s,
+    struct riscv_iommu_command *cmd, IOMMUNotifierFlag flag,
+    IOMMUAccessFlags perm,
+    void (*trace_fn)(const char *id))
+{
+    RISCVIOMMUSpace *as = NULL;
+    IOMMUNotifier *n;
+    IOMMUTLBEvent event;
+    uint32_t pasid;
+    uint32_t devid;
+    const bool pv = cmd->dword0 & RISCV_IOMMU_CMD_ATS_PV;
+
+    if (cmd->dword0 & RISCV_IOMMU_CMD_ATS_DSV) {
+        /* Use device segment and requester id */
+        devid = get_field(cmd->dword0,
+            RISCV_IOMMU_CMD_ATS_DSEG | RISCV_IOMMU_CMD_ATS_RID);
+    } else {
+        devid = get_field(cmd->dword0, RISCV_IOMMU_CMD_ATS_RID);
+    }
+
+    pasid = get_field(cmd->dword0, RISCV_IOMMU_CMD_ATS_PID);
+
+    qemu_mutex_lock(&s->core_lock);
+    QLIST_FOREACH(as, &s->spaces, list) {
+        if (as->devid == devid) {
+            break;
+        }
+    }
+    qemu_mutex_unlock(&s->core_lock);
+
+    if (!as || !as->notifier) {
+        return;
+    }
+
+    event.type = flag;
+    event.entry.perm = perm;
+    event.entry.target_as = s->target_as;
+
+    IOMMU_NOTIFIER_FOREACH(n, &as->iova_mr) {
+        if (!pv || n->iommu_idx == pasid) {
+            event.entry.iova = n->start;
+            event.entry.addr_mask = n->end - n->start;
+            trace_fn(as->iova_mr.parent_obj.name);
+            memory_region_notify_iommu_one(n, &event);
+        }
+    }
+}
+
+static void riscv_iommu_ats_inval(RISCVIOMMUState *s,
+    struct riscv_iommu_command *cmd)
+{
+    return riscv_iommu_ats(s, cmd, IOMMU_NOTIFIER_DEVIOTLB_UNMAP, IOMMU_NONE,
+                           trace_riscv_iommu_ats_inval);
+}
+
+static void riscv_iommu_ats_prgr(RISCVIOMMUState *s,
+    struct riscv_iommu_command *cmd)
+{
+    unsigned resp_code = get_field(cmd->dword1,
+                                   RISCV_IOMMU_CMD_ATS_PRGR_RESP_CODE);
+
+    /* Using the access flag to carry response code information */
+    IOMMUAccessFlags perm = resp_code ? IOMMU_NONE : IOMMU_RW;
+    return riscv_iommu_ats(s, cmd, IOMMU_NOTIFIER_MAP, perm,
+                           trace_riscv_iommu_ats_prgr);
+}
+
 static void riscv_iommu_process_ddtp(RISCVIOMMUState *s)
 {
     uint64_t old_ddtp = s->ddtp;
@@ -1260,6 +1336,17 @@ static void riscv_iommu_process_cq_tail(RISCVIOMMUState *s)
                 get_field(cmd.dword0, RISCV_IOMMU_CMD_IODIR_PID));
             break;
 
+        /* ATS commands */
+        case RISCV_IOMMU_CMD(RISCV_IOMMU_CMD_ATS_FUNC_INVAL,
+                             RISCV_IOMMU_CMD_ATS_OPCODE):
+            riscv_iommu_ats_inval(s, &cmd);
+            break;
+
+        case RISCV_IOMMU_CMD(RISCV_IOMMU_CMD_ATS_FUNC_PRGR,
+                             RISCV_IOMMU_CMD_ATS_OPCODE):
+            riscv_iommu_ats_prgr(s, &cmd);
+            break;
+
         default:
         cmd_ill:
             /* Invalid instruction, do not advance instruction index. */
@@ -1648,6 +1735,9 @@ static void riscv_iommu_realize(DeviceState *dev, Error **errp)
     if (s->enable_msi) {
         s->cap |= RISCV_IOMMU_CAP_MSI_FLAT | RISCV_IOMMU_CAP_MSI_MRIF;
     }
+    if (s->enable_ats) {
+        s->cap |= RISCV_IOMMU_CAP_ATS;
+    }
     if (s->enable_s_stage) {
         s->cap |= RISCV_IOMMU_CAP_SV32 | RISCV_IOMMU_CAP_SV39 |
                   RISCV_IOMMU_CAP_SV48 | RISCV_IOMMU_CAP_SV57;
@@ -1765,6 +1855,7 @@ static Property riscv_iommu_properties[] = {
     DEFINE_PROP_UINT32("ioatc-limit", RISCVIOMMUState, iot_limit,
         LIMIT_CACHE_IOT),
     DEFINE_PROP_BOOL("intremap", RISCVIOMMUState, enable_msi, TRUE),
+    DEFINE_PROP_BOOL("ats", RISCVIOMMUState, enable_ats, TRUE),
     DEFINE_PROP_BOOL("off", RISCVIOMMUState, enable_off, TRUE),
     DEFINE_PROP_BOOL("s-stage", RISCVIOMMUState, enable_s_stage, TRUE),
     DEFINE_PROP_BOOL("g-stage", RISCVIOMMUState, enable_g_stage, TRUE),
diff --git a/hw/riscv/riscv-iommu.h b/hw/riscv/riscv-iommu.h
index 9b33fb97ef..47f3fdad58 100644
--- a/hw/riscv/riscv-iommu.h
+++ b/hw/riscv/riscv-iommu.h
@@ -38,6 +38,7 @@ struct RISCVIOMMUState {
 
     bool enable_off;      /* Enable out-of-reset OFF mode (DMA disabled) */
     bool enable_msi;      /* Enable MSI remapping */
+    bool enable_ats;      /* Enable ATS support */
     bool enable_s_stage;  /* Enable S/VS-Stage translation */
     bool enable_g_stage;  /* Enable G-Stage translation */
 
diff --git a/hw/riscv/trace-events b/hw/riscv/trace-events
index 42a97caffa..4b486b6420 100644
--- a/hw/riscv/trace-events
+++ b/hw/riscv/trace-events
@@ -9,3 +9,6 @@ riscv_iommu_msi(const char *id, unsigned b, unsigned d, unsigned f, uint64_t iov
 riscv_iommu_cmd(const char *id, uint64_t l, uint64_t u) "%s: command 0x%"PRIx64" 0x%"PRIx64
 riscv_iommu_notifier_add(const char *id) "%s: dev-iotlb notifier added"
 riscv_iommu_notifier_del(const char *id) "%s: dev-iotlb notifier removed"
+riscv_iommu_ats(const char *id, unsigned b, unsigned d, unsigned f, uint64_t iova) "%s: translate request %04x:%02x.%u iova: 0x%"PRIx64
+riscv_iommu_ats_inval(const char *id) "%s: dev-iotlb invalidate"
+riscv_iommu_ats_prgr(const char *id) "%s: dev-iotlb page request group response"
-- 
2.43.2
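
The fault record TTYPE in riscv_iommu_translate() is derived from the
requested access permission. Below is a minimal sketch of such a
classification. It assumes QEMU's IOMMUAccessFlags encoding (IOMMU_RO = 1,
IOMMU_WO = 2, IOMMU_RW = 3) and the TTYPE numbering of the RISC-V IOMMU
specification (untranslated read = 2, untranslated write = 3, PCIe ATS
request = 8). The enum and function names are hypothetical. The write bit is
tested on its own because a mask of IOMMU_RW also matches read-only requests.

```c
#include <stdint.h>

/* Assumed to match QEMU's IOMMUAccessFlags values. */
enum { ACC_NONE = 0, ACC_RO = 1, ACC_WO = 2, ACC_RW = 3 };

/* Assumed to match the fault queue TTYPE encoding. */
enum {
    TTYPE_UADDR_RD     = 2,
    TTYPE_UADDR_WR     = 3,
    TTYPE_PCIE_ATS_REQ = 8,
};

/* Pick the transaction type reported in a fault record. */
unsigned fault_ttype(unsigned perm)
{
    if (perm & ACC_WO) {
        return TTYPE_UADDR_WR;      /* write or read/write access */
    }
    if (perm & ACC_RO) {
        return TTYPE_UADDR_RD;      /* read-only access */
    }
    return TTYPE_PCIE_ATS_REQ;      /* no permission bits: ATS request */
}
```

A request carrying no permission bits (IOMMU_NONE) is what the translate path
treats as an ATS translation request, so it falls through to the last case.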




* [PATCH v2 11/15] hw/riscv/riscv-iommu: add DBG support
  2024-03-07 16:03 [PATCH v2 00/15] riscv: QEMU RISC-V IOMMU Support Daniel Henrique Barboza
                   ` (9 preceding siblings ...)
  2024-03-07 16:03 ` [PATCH v2 10/15] hw/riscv/riscv-iommu: add ATS support Daniel Henrique Barboza
@ 2024-03-07 16:03 ` Daniel Henrique Barboza
  2024-05-06  4:09   ` Frank Chang
  2024-03-07 16:03 ` [PATCH v2 12/15] hw/riscv/riscv-iommu: Add another irq for mrif notifications Daniel Henrique Barboza
                   ` (4 subsequent siblings)
  15 siblings, 1 reply; 55+ messages in thread
From: Daniel Henrique Barboza @ 2024-03-07 16:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-riscv, alistair.francis, bmeng, liwei1518, zhiwei_liu,
	palmer, ajones, tjeznach, Daniel Henrique Barboza

From: Tomasz Jeznach <tjeznach@rivosinc.com>

DBG support adds three additional registers: tr_req_iova, tr_req_ctl and
tr_response.

The DBG cap is always enabled. No on/off toggle is provided for it.
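
For illustration, a guest-side request could be composed roughly as follows.
This is a sketch under the assumption that only the tr_req_ctl fields defined
in this patch (Go/Busy, PID, DID) are used. tr_req_ctl_make() is a
hypothetical name and not part of QEMU or of any driver.

```c
#include <stdint.h>

/* Field positions mirroring RISCV_IOMMU_TR_REQ_CTL_* in this patch. */
#define TR_REQ_CTL_GO_BUSY    (1ULL << 0)
#define TR_REQ_CTL_PID_SHIFT  12          /* bits 31:12 */
#define TR_REQ_CTL_PID_MASK   0xFFFFFULL  /* 20 bit process id */
#define TR_REQ_CTL_DID_SHIFT  40          /* bits 63:40 */
#define TR_REQ_CTL_DID_MASK   0xFFFFFFULL /* 24 bit device id */

/* Build a tr_req_ctl value that starts a debug translation request. */
uint64_t tr_req_ctl_make(uint32_t devid, uint32_t pid)
{
    return (((uint64_t)devid & TR_REQ_CTL_DID_MASK) << TR_REQ_CTL_DID_SHIFT) |
           (((uint64_t)pid & TR_REQ_CTL_PID_MASK) << TR_REQ_CTL_PID_SHIFT) |
           TR_REQ_CTL_GO_BUSY;
}
```

The expected sequence is to write the IOVA to tr_req_iova first, then write
this value to tr_req_ctl, and read tr_response once Go/Busy reads back as
zero.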

Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
---
 hw/riscv/riscv-iommu-bits.h | 20 +++++++++++++
 hw/riscv/riscv-iommu.c      | 57 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 76 insertions(+), 1 deletion(-)

diff --git a/hw/riscv/riscv-iommu-bits.h b/hw/riscv/riscv-iommu-bits.h
index 0994f5ce48..b3f92411bb 100644
--- a/hw/riscv/riscv-iommu-bits.h
+++ b/hw/riscv/riscv-iommu-bits.h
@@ -83,6 +83,7 @@ struct riscv_iommu_pq_record {
 #define RISCV_IOMMU_CAP_MSI_MRIF        BIT_ULL(23)
 #define RISCV_IOMMU_CAP_ATS             BIT_ULL(25)
 #define RISCV_IOMMU_CAP_IGS             GENMASK_ULL(29, 28)
+#define RISCV_IOMMU_CAP_DBG             BIT_ULL(31)
 #define RISCV_IOMMU_CAP_PAS             GENMASK_ULL(37, 32)
 #define RISCV_IOMMU_CAP_PD8             BIT_ULL(38)
 
@@ -177,6 +178,25 @@ enum {
     RISCV_IOMMU_INTR_COUNT
 };
 
+#define RISCV_IOMMU_IPSR_CIP            BIT(RISCV_IOMMU_INTR_CQ)
+#define RISCV_IOMMU_IPSR_FIP            BIT(RISCV_IOMMU_INTR_FQ)
+#define RISCV_IOMMU_IPSR_PMIP           BIT(RISCV_IOMMU_INTR_PM)
+#define RISCV_IOMMU_IPSR_PIP            BIT(RISCV_IOMMU_INTR_PQ)
+
+/* 5.24 Translation request IOVA (64bits) */
+#define RISCV_IOMMU_REG_TR_REQ_IOVA     0x0258
+
+/* 5.25 Translation request control (64bits) */
+#define RISCV_IOMMU_REG_TR_REQ_CTL      0x0260
+#define RISCV_IOMMU_TR_REQ_CTL_GO_BUSY  BIT_ULL(0)
+#define RISCV_IOMMU_TR_REQ_CTL_PID      GENMASK_ULL(31, 12)
+#define RISCV_IOMMU_TR_REQ_CTL_DID      GENMASK_ULL(63, 40)
+
+/* 5.26 Translation request response (64bits) */
+#define RISCV_IOMMU_REG_TR_RESPONSE     0x0268
+#define RISCV_IOMMU_TR_RESPONSE_FAULT   BIT_ULL(0)
+#define RISCV_IOMMU_TR_RESPONSE_PPN     RISCV_IOMMU_PPN_FIELD
+
 /* 5.27 Interrupt cause to vector (64bits) */
 #define RISCV_IOMMU_REG_IVEC            0x02F8
 
diff --git a/hw/riscv/riscv-iommu.c b/hw/riscv/riscv-iommu.c
index 7af5929b10..1fa1286d07 100644
--- a/hw/riscv/riscv-iommu.c
+++ b/hw/riscv/riscv-iommu.c
@@ -1457,6 +1457,46 @@ static void riscv_iommu_process_pq_control(RISCVIOMMUState *s)
     riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_PQCSR, ctrl_set, ctrl_clr);
 }
 
+static void riscv_iommu_process_dbg(RISCVIOMMUState *s)
+{
+    uint64_t iova = riscv_iommu_reg_get64(s, RISCV_IOMMU_REG_TR_REQ_IOVA);
+    uint64_t ctrl = riscv_iommu_reg_get64(s, RISCV_IOMMU_REG_TR_REQ_CTL);
+    unsigned devid = get_field(ctrl, RISCV_IOMMU_TR_REQ_CTL_DID);
+    unsigned pid = get_field(ctrl, RISCV_IOMMU_TR_REQ_CTL_PID);
+    RISCVIOMMUContext *ctx;
+    void *ref;
+
+    if (!(ctrl & RISCV_IOMMU_TR_REQ_CTL_GO_BUSY)) {
+        return;
+    }
+
+    ctx = riscv_iommu_ctx(s, devid, pid, &ref);
+    if (ctx == NULL) {
+        riscv_iommu_reg_set64(s, RISCV_IOMMU_REG_TR_RESPONSE,
+                                 RISCV_IOMMU_TR_RESPONSE_FAULT |
+                                 (RISCV_IOMMU_FQ_CAUSE_DMA_DISABLED << 10));
+    } else {
+        IOMMUTLBEntry iotlb = {
+            .iova = iova,
+            .perm = IOMMU_NONE,
+            .addr_mask = ~0,
+            .target_as = NULL,
+        };
+        int fault = riscv_iommu_translate(s, ctx, &iotlb, false);
+        if (fault) {
+            iova = RISCV_IOMMU_TR_RESPONSE_FAULT | (((uint64_t) fault) << 10);
+        } else {
+            iova = ((iotlb.translated_addr & ~iotlb.addr_mask) >> 2) &
+                RISCV_IOMMU_TR_RESPONSE_PPN;
+        }
+        riscv_iommu_reg_set64(s, RISCV_IOMMU_REG_TR_RESPONSE, iova);
+    }
+
+    riscv_iommu_reg_mod64(s, RISCV_IOMMU_REG_TR_REQ_CTL, 0,
+        RISCV_IOMMU_TR_REQ_CTL_GO_BUSY);
+    riscv_iommu_ctx_put(s, ref);
+}
+
 /* Core IOMMU execution activation */
 enum {
     RISCV_IOMMU_EXEC_DDTP,
@@ -1502,7 +1542,7 @@ static void *riscv_iommu_core_proc(void* arg)
             /* NOP */
             break;
         case BIT(RISCV_IOMMU_EXEC_TR_REQUEST):
-            /* DBG support not implemented yet */
+            riscv_iommu_process_dbg(s);
             break;
         }
         exec &= ~mask;
@@ -1574,6 +1614,12 @@ static MemTxResult riscv_iommu_mmio_write(void *opaque, hwaddr addr,
         exec = BIT(RISCV_IOMMU_EXEC_PQCSR);
         busy = RISCV_IOMMU_PQCSR_BUSY;
         break;
+
+    case RISCV_IOMMU_REG_TR_REQ_CTL:
+        exec = BIT(RISCV_IOMMU_EXEC_TR_REQUEST);
+        regb = RISCV_IOMMU_REG_TR_REQ_CTL;
+        busy = RISCV_IOMMU_TR_REQ_CTL_GO_BUSY;
+        break;
     }
 
     /*
@@ -1746,6 +1792,9 @@ static void riscv_iommu_realize(DeviceState *dev, Error **errp)
         s->cap |= RISCV_IOMMU_CAP_SV32X4 | RISCV_IOMMU_CAP_SV39X4 |
                   RISCV_IOMMU_CAP_SV48X4 | RISCV_IOMMU_CAP_SV57X4;
     }
+    /* Enable translation debug interface */
+    s->cap |= RISCV_IOMMU_CAP_DBG;
+
     /* Report QEMU target physical address space limits */
     s->cap = set_field(s->cap, RISCV_IOMMU_CAP_PAS,
                        TARGET_PHYS_ADDR_SPACE_BITS);
@@ -1800,6 +1849,12 @@ static void riscv_iommu_realize(DeviceState *dev, Error **errp)
     stl_le_p(&s->regs_wc[RISCV_IOMMU_REG_IPSR], ~0);
     stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_IVEC], 0);
     stq_le_p(&s->regs_rw[RISCV_IOMMU_REG_DDTP], s->ddtp);
+    /* If debug registers enabled. */
+    if (s->cap & RISCV_IOMMU_CAP_DBG) {
+        stq_le_p(&s->regs_ro[RISCV_IOMMU_REG_TR_REQ_IOVA], 0);
+        stq_le_p(&s->regs_ro[RISCV_IOMMU_REG_TR_REQ_CTL],
+            RISCV_IOMMU_TR_REQ_CTL_GO_BUSY);
+    }
 
     /* Memory region for downstream access, if specified. */
     if (s->target_mr) {
-- 
2.43.2
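
Reader's note on the tr_response encoding used by riscv_iommu_process_dbg()
above. This is a minimal standalone sketch, not code from the series. The
helper name tr_response_encode() is hypothetical, and the two mask values
are assumptions (fault flag in bit 0, PPN field in bits 53:10); only the
shifts by 10 and by 2 are taken from the patch.

```c
#include <stdint.h>

/* Assumed layout, not defined in this patch: fault flag in bit 0. */
#define TR_RESPONSE_FAULT  (1ULL << 0)
/* Assumed layout: 44-bit PPN field occupying bits 53:10. */
#define TR_RESPONSE_PPN    (((1ULL << 44) - 1) << 10)

/*
 * Hypothetical helper: build the value the patch writes to tr_response.
 * 'fault' is the non-zero cause returned by the translation, or 0.
 */
static uint64_t tr_response_encode(int fault, uint64_t translated_addr,
                                   uint64_t addr_mask)
{
    if (fault) {
        /* Same encoding as the patch: cause value shifted to bit 10. */
        return TR_RESPONSE_FAULT | ((uint64_t)fault << 10);
    }
    /*
     * Drop the in-page offset, then shift right by 2 so that physical
     * address bit 12 lands on bit 10, the start of the PPN field.
     */
    return ((translated_addr & ~addr_mask) >> 2) & TR_RESPONSE_PPN;
}
```

The shift by 2 is simply (12 - 10): a page number is the address shifted
right by 12, and the field starts at bit 10.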




* [PATCH v2 12/15] hw/riscv/riscv-iommu: Add another irq for mrif notifications
  2024-03-07 16:03 [PATCH v2 00/15] riscv: QEMU RISC-V IOMMU Support Daniel Henrique Barboza
                   ` (10 preceding siblings ...)
  2024-03-07 16:03 ` [PATCH v2 11/15] hw/riscv/riscv-iommu: add DBG support Daniel Henrique Barboza
@ 2024-03-07 16:03 ` Daniel Henrique Barboza
  2024-05-06  6:12   ` Frank Chang
  2024-03-07 16:03 ` [PATCH v2 13/15] qtest/riscv-iommu-test: add init queues test Daniel Henrique Barboza
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 55+ messages in thread
From: Daniel Henrique Barboza @ 2024-03-07 16:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-riscv, alistair.francis, bmeng, liwei1518, zhiwei_liu,
	palmer, ajones, tjeznach, Daniel Henrique Barboza

From: Andrew Jones <ajones@ventanamicro.com>

Also add a trace event for MRIF notifications.

Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
---
 hw/riscv/riscv-iommu-pci.c | 2 +-
 hw/riscv/riscv-iommu.c     | 1 +
 hw/riscv/trace-events      | 1 +
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/riscv/riscv-iommu-pci.c b/hw/riscv/riscv-iommu-pci.c
index 4eb1057210..8a7b71166c 100644
--- a/hw/riscv/riscv-iommu-pci.c
+++ b/hw/riscv/riscv-iommu-pci.c
@@ -78,7 +78,7 @@ static void riscv_iommu_pci_realize(PCIDevice *dev, Error **errp)
     pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY |
                      PCI_BASE_ADDRESS_MEM_TYPE_64, &s->bar0);
 
-    int ret = msix_init(dev, RISCV_IOMMU_INTR_COUNT,
+    int ret = msix_init(dev, RISCV_IOMMU_INTR_COUNT + 1,
                         &s->bar0, 0, RISCV_IOMMU_REG_MSI_CONFIG,
                         &s->bar0, 0, RISCV_IOMMU_REG_MSI_CONFIG + 256, 0, &err);
 
diff --git a/hw/riscv/riscv-iommu.c b/hw/riscv/riscv-iommu.c
index 1fa1286d07..954a6892c2 100644
--- a/hw/riscv/riscv-iommu.c
+++ b/hw/riscv/riscv-iommu.c
@@ -543,6 +543,7 @@ static MemTxResult riscv_iommu_msi_write(RISCVIOMMUState *s,
     if (res != MEMTX_OK) {
         return res;
     }
+    trace_riscv_iommu_mrif_notification(s->parent_obj.id, n190, addr);
 
     return MEMTX_OK;
 }
diff --git a/hw/riscv/trace-events b/hw/riscv/trace-events
index 4b486b6420..d69719a27a 100644
--- a/hw/riscv/trace-events
+++ b/hw/riscv/trace-events
@@ -6,6 +6,7 @@ riscv_iommu_flt(const char *id, unsigned b, unsigned d, unsigned f, uint64_t rea
 riscv_iommu_pri(const char *id, unsigned b, unsigned d, unsigned f, uint64_t iova) "%s: page request %04x:%02x.%u iova: 0x%"PRIx64
 riscv_iommu_dma(const char *id, unsigned b, unsigned d, unsigned f, unsigned pasid, const char *dir, uint64_t iova, uint64_t phys) "%s: translate %04x:%02x.%u #%u %s 0x%"PRIx64" -> 0x%"PRIx64
 riscv_iommu_msi(const char *id, unsigned b, unsigned d, unsigned f, uint64_t iova, uint64_t phys) "%s: translate %04x:%02x.%u MSI 0x%"PRIx64" -> 0x%"PRIx64
+riscv_iommu_mrif_notification(const char *id, uint32_t nid, uint64_t phys) "%s: sent MRIF notification 0x%x to 0x%"PRIx64
 riscv_iommu_cmd(const char *id, uint64_t l, uint64_t u) "%s: command 0x%"PRIx64" 0x%"PRIx64
 riscv_iommu_notifier_add(const char *id) "%s: dev-iotlb notifier added"
 riscv_iommu_notifier_del(const char *id) "%s: dev-iotlb notifier removed"
-- 
2.43.2




* [PATCH v2 13/15] qtest/riscv-iommu-test: add init queues test
  2024-03-07 16:03 [PATCH v2 00/15] riscv: QEMU RISC-V IOMMU Support Daniel Henrique Barboza
                   ` (11 preceding siblings ...)
  2024-03-07 16:03 ` [PATCH v2 12/15] hw/riscv/riscv-iommu: Add another irq for mrif notifications Daniel Henrique Barboza
@ 2024-03-07 16:03 ` Daniel Henrique Barboza
  2024-05-07  8:01   ` Frank Chang
  2024-03-07 16:03 ` [PATCH v2 14/15] hw/misc: EDU: added PASID support Daniel Henrique Barboza
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 55+ messages in thread
From: Daniel Henrique Barboza @ 2024-03-07 16:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-riscv, alistair.francis, bmeng, liwei1518, zhiwei_liu,
	palmer, ajones, tjeznach, Daniel Henrique Barboza

Add an additional test to further exercise the IOMMU where we attempt to
initialize the command, fault and page-request queues.

These steps are taken from chapter 6.2 of the RISC-V IOMMU spec,
"Guidelines for initialization". It emulates what we expect from the
software/OS when initializing the IOMMU.

Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
---
 tests/qtest/libqos/riscv-iommu.h |  29 +++++++
 tests/qtest/riscv-iommu-test.c   | 141 +++++++++++++++++++++++++++++++
 2 files changed, 170 insertions(+)

diff --git a/tests/qtest/libqos/riscv-iommu.h b/tests/qtest/libqos/riscv-iommu.h
index 8c056caa7b..aeaa5fb8b8 100644
--- a/tests/qtest/libqos/riscv-iommu.h
+++ b/tests/qtest/libqos/riscv-iommu.h
@@ -58,6 +58,35 @@
 
 #define RISCV_IOMMU_REG_IPSR            0x0054
 
+#define RISCV_IOMMU_REG_IVEC            0x02F8
+#define RISCV_IOMMU_REG_IVEC_CIV        GENMASK_ULL(3, 0)
+#define RISCV_IOMMU_REG_IVEC_FIV        GENMASK_ULL(7, 4)
+#define RISCV_IOMMU_REG_IVEC_PIV        GENMASK_ULL(15, 12)
+
+#define RISCV_IOMMU_REG_CQB             0x0018
+#define RISCV_IOMMU_CQB_PPN_START       10
+#define RISCV_IOMMU_CQB_PPN_LEN         44
+#define RISCV_IOMMU_CQB_LOG2SZ_START    0
+#define RISCV_IOMMU_CQB_LOG2SZ_LEN      5
+
+#define RISCV_IOMMU_REG_CQT             0x0024
+
+#define RISCV_IOMMU_REG_FQB             0x0028
+#define RISCV_IOMMU_FQB_PPN_START       10
+#define RISCV_IOMMU_FQB_PPN_LEN         44
+#define RISCV_IOMMU_FQB_LOG2SZ_START    0
+#define RISCV_IOMMU_FQB_LOG2SZ_LEN      5
+
+#define RISCV_IOMMU_REG_FQT             0x0034
+
+#define RISCV_IOMMU_REG_PQB             0x0038
+#define RISCV_IOMMU_PQB_PPN_START       10
+#define RISCV_IOMMU_PQB_PPN_LEN         44
+#define RISCV_IOMMU_PQB_LOG2SZ_START    0
+#define RISCV_IOMMU_PQB_LOG2SZ_LEN      5
+
+#define RISCV_IOMMU_REG_PQT             0x0044
+
 typedef struct QRISCVIOMMU {
     QOSGraphObject obj;
     QPCIDevice dev;
diff --git a/tests/qtest/riscv-iommu-test.c b/tests/qtest/riscv-iommu-test.c
index 13b887d15e..64f3f092f2 100644
--- a/tests/qtest/riscv-iommu-test.c
+++ b/tests/qtest/riscv-iommu-test.c
@@ -33,6 +33,20 @@ static uint64_t riscv_iommu_read_reg64(QRISCVIOMMU *r_iommu, int reg_offset)
     return reg;
 }
 
+static void riscv_iommu_write_reg32(QRISCVIOMMU *r_iommu, int reg_offset,
+                                    uint32_t val)
+{
+    qpci_memwrite(&r_iommu->dev, r_iommu->reg_bar, reg_offset,
+                  &val, sizeof(val));
+}
+
+static void riscv_iommu_write_reg64(QRISCVIOMMU *r_iommu, int reg_offset,
+                                    uint64_t val)
+{
+    qpci_memwrite(&r_iommu->dev, r_iommu->reg_bar, reg_offset,
+                  &val, sizeof(val));
+}
+
 static void test_pci_config(void *obj, void *data, QGuestAllocator *t_alloc)
 {
     QRISCVIOMMU *r_iommu = obj;
@@ -84,10 +98,137 @@ static void test_reg_reset(void *obj, void *data, QGuestAllocator *t_alloc)
     g_assert_cmpuint(reg, ==, 0);
 }
 
+/*
+ * Common timeout-based poll for CQCSR, FQCSR and PQCSR. All
+ * their ON bits are mapped as RISCV_IOMMU_QUEUE_ACTIVE (16),
+ */
+static void qtest_wait_for_queue_active(QRISCVIOMMU *r_iommu,
+                                        uint32_t queue_csr)
+{
+    QTestState *qts = global_qtest;
+    guint64 timeout_us = 2 * 1000 * 1000;
+    gint64 start_time = g_get_monotonic_time();
+    uint32_t reg;
+
+    for (;;) {
+        qtest_clock_step(qts, 100);
+
+        reg = riscv_iommu_read_reg32(r_iommu, queue_csr);
+        if (reg & RISCV_IOMMU_QUEUE_ACTIVE) {
+            break;
+        }
+        g_assert(g_get_monotonic_time() - start_time <= timeout_us);
+    }
+}
+
+/*
+ * Goes through the queue activation procedures of chapter 6.2,
+ * "Guidelines for initialization", of the RISCV-IOMMU spec.
+ */
+static void test_iommu_init_queues(void *obj, void *data,
+                                   QGuestAllocator *t_alloc)
+{
+    QRISCVIOMMU *r_iommu = obj;
+    uint64_t reg64, q_addr;
+    uint32_t reg;
+    int k;
+
+    reg64 = riscv_iommu_read_reg64(r_iommu, RISCV_IOMMU_REG_CAP);
+    g_assert_cmpuint(reg64 & RISCV_IOMMU_CAP_VERSION, ==, 0x10);
+
+    /*
+     * Program the command queue. Write 0xF to civ, assert that
+     * we have 4 writable bits (k = 4). The number of entries N in the
+     * command queue is 2^4 = 16. We need to allocate an N*16 byte
+     * buffer and use it to set cqb.
+     */
+    riscv_iommu_write_reg32(r_iommu, RISCV_IOMMU_REG_IVEC,
+                            0xFFFF & RISCV_IOMMU_REG_IVEC_CIV);
+    reg = riscv_iommu_read_reg32(r_iommu, RISCV_IOMMU_REG_IVEC);
+    g_assert_cmpuint(reg & RISCV_IOMMU_REG_IVEC_CIV, ==, 0xF);
+
+    q_addr = guest_alloc(t_alloc, 16 * 16);
+    reg64 = 0;
+    k = 4;
+    reg64 = deposit64(reg64, RISCV_IOMMU_CQB_PPN_START,
+                      RISCV_IOMMU_CQB_PPN_LEN, q_addr);
+    reg64 = deposit64(reg64, RISCV_IOMMU_CQB_LOG2SZ_START,
+                      RISCV_IOMMU_CQB_LOG2SZ_LEN, k - 1);
+    riscv_iommu_write_reg64(r_iommu, RISCV_IOMMU_REG_CQB, reg64);
+
+    /* cqt = 0, cqcsr.cqen = 1, poll cqcsr.cqon until it reads 1 */
+    riscv_iommu_write_reg32(r_iommu, RISCV_IOMMU_REG_CQT, 0);
+
+    reg = riscv_iommu_read_reg32(r_iommu, RISCV_IOMMU_REG_CQCSR);
+    reg |= RISCV_IOMMU_CQCSR_CQEN;
+    riscv_iommu_write_reg32(r_iommu, RISCV_IOMMU_REG_CQCSR, reg);
+
+    qtest_wait_for_queue_active(r_iommu, RISCV_IOMMU_REG_CQCSR);
+
+    /*
+     * Program the fault queue. Similar to the above:
+     * - Write 0xF to fiv, assert that we have 4 writable bits (k = 4)
+     * - Alloc a 16*32 bytes (instead of 16*16) buffer and use it to set
+     * fqb
+     */
+    riscv_iommu_write_reg32(r_iommu, RISCV_IOMMU_REG_IVEC,
+                            0xFFFF & RISCV_IOMMU_REG_IVEC_FIV);
+    reg = riscv_iommu_read_reg32(r_iommu, RISCV_IOMMU_REG_IVEC);
+    g_assert_cmpuint(reg & RISCV_IOMMU_REG_IVEC_FIV, ==, 0xF0);
+
+    q_addr = guest_alloc(t_alloc, 16 * 32);
+    reg64 = 0;
+    k = 4;
+    reg64 = deposit64(reg64, RISCV_IOMMU_FQB_PPN_START,
+                      RISCV_IOMMU_FQB_PPN_LEN, q_addr);
+    reg64 = deposit64(reg64, RISCV_IOMMU_FQB_LOG2SZ_START,
+                      RISCV_IOMMU_FQB_LOG2SZ_LEN, k - 1);
+    riscv_iommu_write_reg64(r_iommu, RISCV_IOMMU_REG_FQB, reg64);
+
+    /* fqt = 0, fqcsr.fqen = 1, poll fqcsr.fqon until it reads 1 */
+    riscv_iommu_write_reg32(r_iommu, RISCV_IOMMU_REG_FQT, 0);
+
+    reg = riscv_iommu_read_reg32(r_iommu, RISCV_IOMMU_REG_FQCSR);
+    reg |= RISCV_IOMMU_FQCSR_FQEN;
+    riscv_iommu_write_reg32(r_iommu, RISCV_IOMMU_REG_FQCSR, reg);
+
+    qtest_wait_for_queue_active(r_iommu, RISCV_IOMMU_REG_FQCSR);
+
+    /*
+     * Program the page-request queue:
+     * - Write 0xF to piv, assert that we have 4 writable bits (k = 4)
+     * - Alloc a 16*16 bytes buffer and use it to set pqb.
+     */
+    riscv_iommu_write_reg32(r_iommu, RISCV_IOMMU_REG_IVEC,
+                            0xFFFF & RISCV_IOMMU_REG_IVEC_PIV);
+    reg = riscv_iommu_read_reg32(r_iommu, RISCV_IOMMU_REG_IVEC);
+    g_assert_cmpuint(reg & RISCV_IOMMU_REG_IVEC_PIV, ==, 0xF000);
+
+    q_addr = guest_alloc(t_alloc, 16 * 16);
+    reg64 = 0;
+    k = 4;
+    reg64 = deposit64(reg64, RISCV_IOMMU_PQB_PPN_START,
+                      RISCV_IOMMU_PQB_PPN_LEN, q_addr);
+    reg64 = deposit64(reg64, RISCV_IOMMU_PQB_LOG2SZ_START,
+                      RISCV_IOMMU_PQB_LOG2SZ_LEN, k - 1);
+    riscv_iommu_write_reg64(r_iommu, RISCV_IOMMU_REG_PQB, reg64);
+
+    /* pqt = 0, pqcsr.pqen = 1, poll pqcsr.pqon until it reads 1 */
+    riscv_iommu_write_reg32(r_iommu, RISCV_IOMMU_REG_PQT, 0);
+
+    reg = riscv_iommu_read_reg32(r_iommu, RISCV_IOMMU_REG_PQCSR);
+    reg |= RISCV_IOMMU_PQCSR_PQEN;
+    riscv_iommu_write_reg32(r_iommu, RISCV_IOMMU_REG_PQCSR, reg);
+
+    qtest_wait_for_queue_active(r_iommu, RISCV_IOMMU_REG_PQCSR);
+}
+
 static void register_riscv_iommu_test(void)
 {
     qos_add_test("pci_config", "riscv-iommu-pci", test_pci_config, NULL);
     qos_add_test("reg_reset", "riscv-iommu-pci", test_reg_reset, NULL);
+    qos_add_test("iommu_init_queues", "riscv-iommu-pci",
+                 test_iommu_init_queues, NULL);
 }
 
 libqos_init(register_riscv_iommu_test);
-- 
2.43.2




* [PATCH v2 14/15] hw/misc: EDU: added PASID support
  2024-03-07 16:03 [PATCH v2 00/15] riscv: QEMU RISC-V IOMMU Support Daniel Henrique Barboza
                   ` (12 preceding siblings ...)
  2024-03-07 16:03 ` [PATCH v2 13/15] qtest/riscv-iommu-test: add init queues test Daniel Henrique Barboza
@ 2024-03-07 16:03 ` Daniel Henrique Barboza
  2024-05-07  9:06   ` Frank Chang
  2024-03-07 16:03 ` [PATCH v2 15/15] hw/misc: EDU: add ATS/PRI capability Daniel Henrique Barboza
  2024-05-10 11:14 ` [PATCH v2 00/15] riscv: QEMU RISC-V IOMMU Support Frank Chang
  15 siblings, 1 reply; 55+ messages in thread
From: Daniel Henrique Barboza @ 2024-03-07 16:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-riscv, alistair.francis, bmeng, liwei1518, zhiwei_liu,
	palmer, ajones, tjeznach

From: Tomasz Jeznach <tjeznach@rivosinc.com>

Extend the EDU device to support DMA with a PASID identifier and to
report the PASID extended PCIe capability.

Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
---
 hw/misc/edu.c | 57 +++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 44 insertions(+), 13 deletions(-)

diff --git a/hw/misc/edu.c b/hw/misc/edu.c
index 2a976ca2b1..522cec85b3 100644
--- a/hw/misc/edu.c
+++ b/hw/misc/edu.c
@@ -26,6 +26,7 @@
 #include "qemu/units.h"
 #include "hw/pci/pci.h"
 #include "hw/hw.h"
+#include "hw/qdev-properties.h"
 #include "hw/pci/msi.h"
 #include "qemu/timer.h"
 #include "qom/object.h"
@@ -53,6 +54,8 @@ struct EduState {
     QemuCond thr_cond;
     bool stopping;
 
+    bool enable_pasid;
+
     uint32_t addr4;
     uint32_t fact;
 #define EDU_STATUS_COMPUTING    0x01
@@ -66,6 +69,9 @@ struct EduState {
 # define EDU_DMA_FROM_PCI       0
 # define EDU_DMA_TO_PCI         1
 #define EDU_DMA_IRQ             0x4
+#define EDU_DMA_PV              0x8
+#define EDU_DMA_PASID(cmd)      (((cmd) >> 8) & ((1U << 20) - 1))
+
     struct dma_state {
         dma_addr_t src;
         dma_addr_t dst;
@@ -126,12 +132,7 @@ static void edu_check_range(uint64_t addr, uint64_t size1, uint64_t start,
 
 static dma_addr_t edu_clamp_addr(const EduState *edu, dma_addr_t addr)
 {
-    dma_addr_t res = addr & edu->dma_mask;
-
-    if (addr != res) {
-        printf("EDU: clamping DMA %#.16"PRIx64" to %#.16"PRIx64"!\n", addr, res);
-    }
-
+    dma_addr_t res = addr;
     return res;
 }
 
@@ -139,23 +140,33 @@ static void edu_dma_timer(void *opaque)
 {
     EduState *edu = opaque;
     bool raise_irq = false;
+    MemTxAttrs attrs = MEMTXATTRS_UNSPECIFIED;
 
     if (!(edu->dma.cmd & EDU_DMA_RUN)) {
         return;
     }
 
+    if (edu->enable_pasid && (edu->dma.cmd & EDU_DMA_PV)) {
+        attrs.unspecified = 0;
+        attrs.pasid = EDU_DMA_PASID(edu->dma.cmd);
+        attrs.requester_id = pci_requester_id(&edu->pdev);
+        attrs.secure = 0;
+    }
+
     if (EDU_DMA_DIR(edu->dma.cmd) == EDU_DMA_FROM_PCI) {
         uint64_t dst = edu->dma.dst;
         edu_check_range(dst, edu->dma.cnt, DMA_START, DMA_SIZE);
         dst -= DMA_START;
-        pci_dma_read(&edu->pdev, edu_clamp_addr(edu, edu->dma.src),
-                edu->dma_buf + dst, edu->dma.cnt);
+        pci_dma_rw(&edu->pdev, edu_clamp_addr(edu, edu->dma.src),
+                edu->dma_buf + dst, edu->dma.cnt,
+                DMA_DIRECTION_TO_DEVICE, attrs);
     } else {
         uint64_t src = edu->dma.src;
         edu_check_range(src, edu->dma.cnt, DMA_START, DMA_SIZE);
         src -= DMA_START;
-        pci_dma_write(&edu->pdev, edu_clamp_addr(edu, edu->dma.dst),
-                edu->dma_buf + src, edu->dma.cnt);
+        pci_dma_rw(&edu->pdev, edu_clamp_addr(edu, edu->dma.dst),
+                edu->dma_buf + src, edu->dma.cnt,
+                DMA_DIRECTION_FROM_DEVICE, attrs);
     }
 
     edu->dma.cmd &= ~EDU_DMA_RUN;
@@ -255,7 +266,8 @@ static void edu_mmio_write(void *opaque, hwaddr addr, uint64_t val,
         if (qatomic_read(&edu->status) & EDU_STATUS_COMPUTING) {
             break;
         }
-        /* EDU_STATUS_COMPUTING cannot go 0->1 concurrently, because it is only
+        /*
+         * EDU_STATUS_COMPUTING cannot go 0->1 concurrently, because it is only
          * set in this function and it is under the iothread mutex.
          */
         qemu_mutex_lock(&edu->thr_mutex);
@@ -368,9 +380,21 @@ static void pci_edu_realize(PCIDevice *pdev, Error **errp)
 {
     EduState *edu = EDU(pdev);
     uint8_t *pci_conf = pdev->config;
+    int pos;
 
     pci_config_set_interrupt_pin(pci_conf, 1);
 
+    pcie_endpoint_cap_init(pdev, 0);
+
+    /* PCIe extended capability for PASID */
+    pos = PCI_CONFIG_SPACE_SIZE;
+    if (edu->enable_pasid) {
+        /* PCIe Spec 7.8.9 PASID Extended Capability Structure */
+        pcie_add_capability(pdev, 0x1b, 1, pos, 8);
+        pci_set_long(pdev->config + pos + 4, 0x00001400);
+        pci_set_long(pdev->wmask + pos + 4,  0xfff0ffff);
+    }
+
     if (msi_init(pdev, 0, 1, true, false, errp)) {
         return;
     }
@@ -404,20 +428,27 @@ static void pci_edu_uninit(PCIDevice *pdev)
     msi_uninit(pdev);
 }
 
+
 static void edu_instance_init(Object *obj)
 {
     EduState *edu = EDU(obj);
 
-    edu->dma_mask = (1UL << 28) - 1;
+    edu->dma_mask = ~0ULL;
     object_property_add_uint64_ptr(obj, "dma_mask",
                                    &edu->dma_mask, OBJ_PROP_FLAG_READWRITE);
 }
 
+static Property edu_properties[] = {
+    DEFINE_PROP_BOOL("pasid", EduState, enable_pasid, TRUE),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
 static void edu_class_init(ObjectClass *class, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(class);
     PCIDeviceClass *k = PCI_DEVICE_CLASS(class);
 
+    device_class_set_props(dc, edu_properties);
     k->realize = pci_edu_realize;
     k->exit = pci_edu_uninit;
     k->vendor_id = PCI_VENDOR_ID_QEMU;
@@ -430,7 +461,7 @@ static void edu_class_init(ObjectClass *class, void *data)
 static void pci_edu_register_types(void)
 {
     static InterfaceInfo interfaces[] = {
-        { INTERFACE_CONVENTIONAL_PCI_DEVICE },
+        { INTERFACE_PCIE_DEVICE },
         { },
     };
     static const TypeInfo edu_info = {
-- 
2.43.2




* [PATCH v2 15/15] hw/misc: EDU: add ATS/PRI capability
  2024-03-07 16:03 [PATCH v2 00/15] riscv: QEMU RISC-V IOMMU Support Daniel Henrique Barboza
                   ` (13 preceding siblings ...)
  2024-03-07 16:03 ` [PATCH v2 14/15] hw/misc: EDU: added PASID support Daniel Henrique Barboza
@ 2024-03-07 16:03 ` Daniel Henrique Barboza
  2024-05-07 15:32   ` Frank Chang
  2024-05-10 11:14 ` [PATCH v2 00/15] riscv: QEMU RISC-V IOMMU Support Frank Chang
  15 siblings, 1 reply; 55+ messages in thread
From: Daniel Henrique Barboza @ 2024-03-07 16:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-riscv, alistair.francis, bmeng, liwei1518, zhiwei_liu,
	palmer, ajones, tjeznach

From: Tomasz Jeznach <tjeznach@rivosinc.com>

Mimic the ATS interface by issuing an IOMMU translate request with
IOMMU_NONE. If a mapping exists, the translation service returns the
current permission flags; otherwise it reports no permissions.

Implement and register the IOMMU memory region listener to be notified
whenever an ATS invalidation request is sent from the IOMMU.

Implement and register the IOMMU memory region listener to be notified
whenever an ATS page request group response is triggered from the IOMMU.

Introduce a retry mechanism in the timer design so that a page that is
not yet available is only accessed after the PRGR notification has
been received.

Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
Signed-off-by: Sebastien Boeuf <seb@rivosinc.com>
---
 hw/misc/edu.c | 258 ++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 251 insertions(+), 7 deletions(-)

diff --git a/hw/misc/edu.c b/hw/misc/edu.c
index 522cec85b3..f4f6c15ec6 100644
--- a/hw/misc/edu.c
+++ b/hw/misc/edu.c
@@ -45,6 +45,14 @@ DECLARE_INSTANCE_CHECKER(EduState, EDU,
 #define DMA_START       0x40000
 #define DMA_SIZE        4096
 
+/*
+ * Number of tries before giving up on page request group response.
+ * Given the timer callback is scheduled to be run again after 100ms,
+ * 10 tries give roughly a second for the PRGR notification to be
+ * received.
+ */
+#define NUM_TRIES       10
+
 struct EduState {
     PCIDevice pdev;
     MemoryRegion mmio;
@@ -55,6 +63,7 @@ struct EduState {
     bool stopping;
 
     bool enable_pasid;
+    uint32_t try;
 
     uint32_t addr4;
     uint32_t fact;
@@ -81,6 +90,20 @@ struct EduState {
     QEMUTimer dma_timer;
     char dma_buf[DMA_SIZE];
     uint64_t dma_mask;
+
+    MemoryListener iommu_listener;
+    QLIST_HEAD(, edu_iommu) iommu_list;
+
+    bool prgr_rcvd;
+    bool prgr_success;
+};
+
+struct edu_iommu {
+    EduState *edu;
+    IOMMUMemoryRegion *iommu_mr;
+    hwaddr iommu_offset;
+    IOMMUNotifier n;
+    QLIST_ENTRY(edu_iommu) iommu_next;
 };
 
 static bool edu_msi_enabled(EduState *edu)
@@ -136,11 +159,65 @@ static dma_addr_t edu_clamp_addr(const EduState *edu, dma_addr_t addr)
     return res;
 }
 
+static bool __find_iommu_mr_cb(Int128 start, Int128 len, const MemoryRegion *mr,
+    hwaddr offset_in_region, void *opaque)
+{
+    IOMMUMemoryRegion **iommu_mr = opaque;
+    *iommu_mr = memory_region_get_iommu((MemoryRegion *)mr);
+    return *iommu_mr != NULL;
+}
+
+static int pci_dma_perm(PCIDevice *pdev, dma_addr_t iova, MemTxAttrs attrs)
+{
+    IOMMUMemoryRegion *iommu_mr = NULL;
+    IOMMUMemoryRegionClass *imrc;
+    int iommu_idx;
+    FlatView *fv;
+    EduState *edu = EDU(pdev);
+    struct edu_iommu *iommu;
+
+    RCU_READ_LOCK_GUARD();
+
+    fv = address_space_to_flatview(pci_get_address_space(pdev));
+
+    /* Find first IOMMUMemoryRegion */
+    flatview_for_each_range(fv, __find_iommu_mr_cb, &iommu_mr);
+
+    if (iommu_mr) {
+        imrc = memory_region_get_iommu_class_nocheck(iommu_mr);
+
+        /* IOMMU Index is mapping to memory attributes (PASID, etc) */
+        iommu_idx = imrc->attrs_to_index ?
+                    imrc->attrs_to_index(iommu_mr, attrs) : 0;
+
+        /* Update IOMMU notifiers with proper index */
+        QLIST_FOREACH(iommu, &edu->iommu_list, iommu_next) {
+            if (iommu->iommu_mr == iommu_mr &&
+                iommu->n.iommu_idx != iommu_idx) {
+                memory_region_unregister_iommu_notifier(
+                    MEMORY_REGION(iommu->iommu_mr), &iommu->n);
+                iommu->n.iommu_idx = iommu_idx;
+                memory_region_register_iommu_notifier(
+                    MEMORY_REGION(iommu->iommu_mr), &iommu->n, NULL);
+            }
+        }
+
+        /* Translate request with IOMMU_NONE is an ATS request */
+        IOMMUTLBEntry iotlb = imrc->translate(iommu_mr, iova, IOMMU_NONE,
+                                              iommu_idx);
+
+        return iotlb.perm;
+    }
+
+    return IOMMU_NONE;
+}
+
 static void edu_dma_timer(void *opaque)
 {
     EduState *edu = opaque;
     bool raise_irq = false;
     MemTxAttrs attrs = MEMTXATTRS_UNSPECIFIED;
+    MemTxResult res;
 
     if (!(edu->dma.cmd & EDU_DMA_RUN)) {
         return;
@@ -155,18 +232,70 @@ static void edu_dma_timer(void *opaque)
 
     if (EDU_DMA_DIR(edu->dma.cmd) == EDU_DMA_FROM_PCI) {
         uint64_t dst = edu->dma.dst;
+        uint64_t src = edu_clamp_addr(edu, edu->dma.src);
         edu_check_range(dst, edu->dma.cnt, DMA_START, DMA_SIZE);
         dst -= DMA_START;
-        pci_dma_rw(&edu->pdev, edu_clamp_addr(edu, edu->dma.src),
-                edu->dma_buf + dst, edu->dma.cnt,
-                DMA_DIRECTION_TO_DEVICE, attrs);
+        if (edu->try-- == NUM_TRIES) {
+            edu->prgr_rcvd = false;
+            if (!(pci_dma_perm(&edu->pdev, src, attrs) & IOMMU_RO)) {
+                timer_mod(&edu->dma_timer,
+                          qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 100);
+                return;
+            }
+        } else if (edu->try) {
+            if (!edu->prgr_rcvd) {
+                timer_mod(&edu->dma_timer,
+                          qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 100);
+                return;
+            }
+            if (!edu->prgr_success) {
+                /* PRGR failure, fail DMA. */
+                edu->dma.cmd &= ~EDU_DMA_RUN;
+                return;
+            }
+        } else {
+            /* timeout, fail DMA. */
+            edu->dma.cmd &= ~EDU_DMA_RUN;
+            return;
+        }
+        res = pci_dma_rw(&edu->pdev, src, edu->dma_buf + dst, edu->dma.cnt,
+            DMA_DIRECTION_TO_DEVICE, attrs);
+        if (res != MEMTX_OK) {
+            hw_error("EDU: DMA transfer TO 0x%"PRIx64" failed.\n", dst);
+        }
     } else {
         uint64_t src = edu->dma.src;
+        uint64_t dst = edu_clamp_addr(edu, edu->dma.dst);
         edu_check_range(src, edu->dma.cnt, DMA_START, DMA_SIZE);
         src -= DMA_START;
-        pci_dma_rw(&edu->pdev, edu_clamp_addr(edu, edu->dma.dst),
-                edu->dma_buf + src, edu->dma.cnt,
-                DMA_DIRECTION_FROM_DEVICE, attrs);
+        if (edu->try-- == NUM_TRIES) {
+            edu->prgr_rcvd = false;
+            if (!(pci_dma_perm(&edu->pdev, dst, attrs) & IOMMU_WO)) {
+                timer_mod(&edu->dma_timer,
+                          qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 100);
+                return;
+            }
+        } else if (edu->try) {
+            if (!edu->prgr_rcvd) {
+                timer_mod(&edu->dma_timer,
+                          qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 100);
+                return;
+            }
+            if (!edu->prgr_success) {
+                /* PRGR failure, fail DMA. */
+                edu->dma.cmd &= ~EDU_DMA_RUN;
+                return;
+            }
+        } else {
+            /* timeout, fail DMA. */
+            edu->dma.cmd &= ~EDU_DMA_RUN;
+            return;
+        }
+        res = pci_dma_rw(&edu->pdev, dst, edu->dma_buf + src, edu->dma.cnt,
+            DMA_DIRECTION_FROM_DEVICE, attrs);
+        if (res != MEMTX_OK) {
+            hw_error("EDU: DMA transfer FROM 0x%"PRIx64" failed.\n", src);
+        }
     }
 
     edu->dma.cmd &= ~EDU_DMA_RUN;
@@ -193,6 +322,7 @@ static void dma_rw(EduState *edu, bool write, dma_addr_t *val, dma_addr_t *dma,
     }
 
     if (timer) {
+        edu->try = NUM_TRIES;
         timer_mod(&edu->dma_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 100);
     }
 }
@@ -376,9 +506,92 @@ static void *edu_fact_thread(void *opaque)
     return NULL;
 }
 
+static void edu_iommu_ats_prgr_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
+{
+    struct edu_iommu *iommu = container_of(n, struct edu_iommu, n);
+    EduState *edu = iommu->edu;
+    edu->prgr_success = (iotlb->perm != IOMMU_NONE);
+    barrier();
+    edu->prgr_rcvd = true;
+}
+
+static void edu_iommu_ats_inval_notify(IOMMUNotifier *n,
+                                       IOMMUTLBEntry *iotlb)
+{
+
+}
+
+static void edu_iommu_region_add(MemoryListener *listener,
+                                   MemoryRegionSection *section)
+{
+    EduState *edu = container_of(listener, EduState, iommu_listener);
+    struct edu_iommu *iommu;
+    Int128 end;
+    int iommu_idx;
+    IOMMUMemoryRegion *iommu_mr;
+
+    if (!memory_region_is_iommu(section->mr)) {
+        return;
+    }
+
+    iommu_mr = IOMMU_MEMORY_REGION(section->mr);
+
+    /* Register ATS.INVAL notifier */
+    iommu = g_malloc0(sizeof(*iommu));
+    iommu->iommu_mr = iommu_mr;
+    iommu->iommu_offset = section->offset_within_address_space -
+                          section->offset_within_region;
+    iommu->edu = edu;
+    end = int128_add(int128_make64(section->offset_within_region),
+                     section->size);
+    end = int128_sub(end, int128_one());
+    iommu_idx = memory_region_iommu_attrs_to_index(iommu_mr,
+                                                   MEMTXATTRS_UNSPECIFIED);
+    iommu_notifier_init(&iommu->n, edu_iommu_ats_inval_notify,
+                        IOMMU_NOTIFIER_DEVIOTLB_UNMAP,
+                        section->offset_within_region,
+                        int128_get64(end),
+                        iommu_idx);
+    memory_region_register_iommu_notifier(section->mr, &iommu->n, NULL);
+    QLIST_INSERT_HEAD(&edu->iommu_list, iommu, iommu_next);
+
+    /* Register ATS.PRGR notifier */
+    iommu = g_memdup2(iommu, sizeof(*iommu));
+    iommu_notifier_init(&iommu->n, edu_iommu_ats_prgr_notify,
+                        IOMMU_NOTIFIER_MAP,
+                        section->offset_within_region,
+                        int128_get64(end),
+                        iommu_idx);
+    memory_region_register_iommu_notifier(section->mr, &iommu->n, NULL);
+    QLIST_INSERT_HEAD(&edu->iommu_list, iommu, iommu_next);
+}
+
+static void edu_iommu_region_del(MemoryListener *listener,
+                                   MemoryRegionSection *section)
+{
+    EduState *edu = container_of(listener, EduState, iommu_listener);
+    struct edu_iommu *iommu;
+
+    if (!memory_region_is_iommu(section->mr)) {
+        return;
+    }
+
+    QLIST_FOREACH(iommu, &edu->iommu_list, iommu_next) {
+        if (MEMORY_REGION(iommu->iommu_mr) == section->mr &&
+            iommu->n.start == section->offset_within_region) {
+            memory_region_unregister_iommu_notifier(section->mr,
+                                                    &iommu->n);
+            QLIST_REMOVE(iommu, iommu_next);
+            g_free(iommu);
+            break;
+        }
+    }
+}
+
 static void pci_edu_realize(PCIDevice *pdev, Error **errp)
 {
     EduState *edu = EDU(pdev);
+    AddressSpace *dma_as = NULL;
     uint8_t *pci_conf = pdev->config;
     int pos;
 
@@ -390,9 +603,28 @@ static void pci_edu_realize(PCIDevice *pdev, Error **errp)
     pos = PCI_CONFIG_SPACE_SIZE;
     if (edu->enable_pasid) {
         /* PCIe Spec 7.8.9 PASID Extended Capability Structure */
-        pcie_add_capability(pdev, 0x1b, 1, pos, 8);
+        pcie_add_capability(pdev, PCI_EXT_CAP_ID_PASID, 1, pos, 8);
         pci_set_long(pdev->config + pos + 4, 0x00001400);
         pci_set_long(pdev->wmask + pos + 4,  0xfff0ffff);
+        pos += 8;
+
+        /* ATS Capability */
+        pcie_ats_init(pdev, pos, true);
+        pos += PCI_EXT_CAP_ATS_SIZEOF;
+
+        /* PRI Capability */
+        pcie_add_capability(pdev, PCI_EXT_CAP_ID_PRI, 1, pos, 16);
+        /* PRI STOPPED */
+        pci_set_long(pdev->config + pos +  4, 0x01000000);
+        /* PRI ENABLE bit writable */
+        pci_set_long(pdev->wmask  + pos +  4, 0x00000001);
+        /* PRI Capacity Supported */
+        pci_set_long(pdev->config + pos +  8, 0x00000080);
+        /* PRI Allocations Allowed, 32 */
+        pci_set_long(pdev->config + pos + 12, 0x00000040);
+        pci_set_long(pdev->wmask  + pos + 12, 0x0000007f);
+
+        pos += 8;
     }
 
     if (msi_init(pdev, 0, 1, true, false, errp)) {
@@ -409,12 +641,24 @@ static void pci_edu_realize(PCIDevice *pdev, Error **errp)
     memory_region_init_io(&edu->mmio, OBJECT(edu), &edu_mmio_ops, edu,
                     "edu-mmio", 1 * MiB);
     pci_register_bar(pdev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &edu->mmio);
+
+    /* Register IOMMU listener */
+    edu->iommu_listener = (MemoryListener) {
+        .name = "edu-iommu",
+        .region_add = edu_iommu_region_add,
+        .region_del = edu_iommu_region_del,
+    };
+
+    dma_as = pci_device_iommu_address_space(pdev);
+    memory_listener_register(&edu->iommu_listener, dma_as);
 }
 
 static void pci_edu_uninit(PCIDevice *pdev)
 {
     EduState *edu = EDU(pdev);
 
+    memory_listener_unregister(&edu->iommu_listener);
+
     qemu_mutex_lock(&edu->thr_mutex);
     edu->stopping = true;
     qemu_mutex_unlock(&edu->thr_mutex);
-- 
2.43.2



^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 01/15] exec/memtxattr: add process identifier to the transaction attributes
  2024-03-07 16:03 ` [PATCH v2 01/15] exec/memtxattr: add process identifier to the transaction attributes Daniel Henrique Barboza
@ 2024-04-23 16:33   ` Frank Chang
  0 siblings, 0 replies; 55+ messages in thread
From: Frank Chang @ 2024-04-23 16:33 UTC (permalink / raw)
  To: Daniel Henrique Barboza
  Cc: qemu-devel, qemu-riscv, alistair.francis, bmeng, liwei1518,
	zhiwei_liu, palmer, ajones, tjeznach

Reviewed-by: Frank Chang <frank.chang@sifive.com>

On Fri, Mar 8, 2024 at 12:04 AM, Daniel Henrique Barboza <dbarboza@ventanamicro.com> wrote:
>
> From: Tomasz Jeznach <tjeznach@rivosinc.com>
>
> Extend memory transaction attributes with process identifier to allow
> per-request address translation logic to use requester_id / process_id
> to identify memory mapping (e.g. enabling IOMMU w/ PASID translations).
>
> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
> ---
>  include/exec/memattrs.h | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/include/exec/memattrs.h b/include/exec/memattrs.h
> index 14cdd8d582..46d0725416 100644
> --- a/include/exec/memattrs.h
> +++ b/include/exec/memattrs.h
> @@ -52,6 +52,11 @@ typedef struct MemTxAttrs {
>      unsigned int memory:1;
>      /* Requester ID (for MSI for example) */
>      unsigned int requester_id:16;
> +
> +    /*
> +     * PCI PASID support: Limited to 8 bits process identifier.
> +     */
> +    unsigned int pasid:8;
>  } MemTxAttrs;
>
>  /* Bus masters which don't specify any attributes will get this,
> --
> 2.43.2
>
>
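
A side note on the 8-bit limit in the quoted hunk: PCIe allows PASIDs up
to 20 bits wide, so a wider value stored in the new field is silently
truncated. A minimal sketch, assuming a hypothetical DemoTxAttrs struct
that mirrors only the two quoted bit-fields (it is not QEMU's MemTxAttrs,
and the demo_* helpers are made up for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in mirroring only the two quoted bit-fields. */
typedef struct DemoTxAttrs {
    unsigned int requester_id:16;
    unsigned int pasid:8;
} DemoTxAttrs;

/* Store a PASID; values wider than 8 bits are reduced modulo 256. */
static DemoTxAttrs demo_attrs(uint16_t requester_id, uint32_t pasid)
{
    DemoTxAttrs attrs = { 0 };

    assert(pasid < (1u << 20)); /* PCIe PASIDs are at most 20 bits */
    attrs.requester_id = requester_id;
    attrs.pasid = pasid;        /* keeps only the low 8 bits */
    return attrs;
}

/* True if the PASID survives the round trip through the bit-field. */
static int demo_pasid_fits(uint32_t pasid)
{
    return demo_attrs(0, pasid).pasid == pasid;
}
```

With this layout a PASID of 0x12345 would be stored as 0x45, so callers
that may see wider PASIDs would need to check the range before filling
in the attributes.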


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 04/15] hw/riscv: add riscv-iommu-pci device
  2024-03-07 16:03 ` [PATCH v2 04/15] hw/riscv: add riscv-iommu-pci device Daniel Henrique Barboza
@ 2024-04-29  7:21   ` Frank Chang
  2024-05-02  9:37     ` Daniel Henrique Barboza
  0 siblings, 1 reply; 55+ messages in thread
From: Frank Chang @ 2024-04-29  7:21 UTC (permalink / raw)
  To: Daniel Henrique Barboza
  Cc: qemu-devel, qemu-riscv, alistair.francis, bmeng, liwei1518,
	zhiwei_liu, palmer, ajones, tjeznach

On Fri, Mar 8, 2024 at 12:04 AM, Daniel Henrique Barboza <dbarboza@ventanamicro.com> wrote:
>
> From: Tomasz Jeznach <tjeznach@rivosinc.com>
>
> The RISC-V IOMMU can be modelled as a PCIe device following the
> guidelines of the RISC-V IOMMU spec, chapter 7.1, "Integrating an IOMMU
> as a PCIe device".
>
> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
> ---
>  hw/riscv/meson.build       |   2 +-
>  hw/riscv/riscv-iommu-pci.c | 173 +++++++++++++++++++++++++++++++++++++
>  2 files changed, 174 insertions(+), 1 deletion(-)
>  create mode 100644 hw/riscv/riscv-iommu-pci.c
>
> diff --git a/hw/riscv/meson.build b/hw/riscv/meson.build
> index ba9eebd605..4674cec6c4 100644
> --- a/hw/riscv/meson.build
> +++ b/hw/riscv/meson.build
> @@ -10,6 +10,6 @@ riscv_ss.add(when: 'CONFIG_SIFIVE_U', if_true: files('sifive_u.c'))
>  riscv_ss.add(when: 'CONFIG_SPIKE', if_true: files('spike.c'))
>  riscv_ss.add(when: 'CONFIG_MICROCHIP_PFSOC', if_true: files('microchip_pfsoc.c'))
>  riscv_ss.add(when: 'CONFIG_ACPI', if_true: files('virt-acpi-build.c'))
> -riscv_ss.add(when: 'CONFIG_RISCV_IOMMU', if_true: files('riscv-iommu.c'))
> +riscv_ss.add(when: 'CONFIG_RISCV_IOMMU', if_true: files('riscv-iommu.c', 'riscv-iommu-pci.c'))
>
>  hw_arch += {'riscv': riscv_ss}
> diff --git a/hw/riscv/riscv-iommu-pci.c b/hw/riscv/riscv-iommu-pci.c
> new file mode 100644
> index 0000000000..4eb1057210
> --- /dev/null
> +++ b/hw/riscv/riscv-iommu-pci.c
> @@ -0,0 +1,173 @@
> +/*
> + * QEMU emulation of an RISC-V IOMMU (Ziommu)
> + *
> + * Copyright (C) 2022-2023 Rivos Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "hw/pci/msi.h"
> +#include "hw/pci/msix.h"
> +#include "hw/pci/pci_bus.h"
> +#include "hw/qdev-properties.h"
> +#include "hw/riscv/riscv_hart.h"
> +#include "migration/vmstate.h"
> +#include "qapi/error.h"
> +#include "qemu/error-report.h"
> +#include "qemu/host-utils.h"
> +#include "qom/object.h"
> +
> +#include "cpu_bits.h"
> +#include "riscv-iommu.h"
> +#include "riscv-iommu-bits.h"
> +
> +#ifndef PCI_VENDOR_ID_RIVOS
> +#define PCI_VENDOR_ID_RIVOS           0x1efd
> +#endif
> +
> +#ifndef PCI_DEVICE_ID_RIVOS_IOMMU
> +#define PCI_DEVICE_ID_RIVOS_IOMMU     0xedf1
> +#endif
> +
> +/* RISC-V IOMMU PCI Device Emulation */
> +
> +typedef struct RISCVIOMMUStatePci {
> +    PCIDevice        pci;     /* Parent PCIe device state */
> +    MemoryRegion     bar0;    /* PCI BAR (including MSI-x config) */
> +    RISCVIOMMUState  iommu;   /* common IOMMU state */
> +} RISCVIOMMUStatePci;
> +
> +/* interrupt delivery callback */
> +static void riscv_iommu_pci_notify(RISCVIOMMUState *iommu, unsigned vector)
> +{
> +    RISCVIOMMUStatePci *s = container_of(iommu, RISCVIOMMUStatePci, iommu);
> +
> +    if (msix_enabled(&(s->pci))) {
> +        msix_notify(&(s->pci), vector);
> +    }
> +}
> +
> +static void riscv_iommu_pci_realize(PCIDevice *dev, Error **errp)
> +{
> +    RISCVIOMMUStatePci *s = DO_UPCAST(RISCVIOMMUStatePci, pci, dev);
> +    RISCVIOMMUState *iommu = &s->iommu;
> +    Error *err = NULL;
> +
> +    /* Set device id for trace / debug */
> +    DEVICE(iommu)->id = g_strdup_printf("%02x:%02x.%01x",
> +        pci_dev_bus_num(dev), PCI_SLOT(dev->devfn), PCI_FUNC(dev->devfn));

pci_dev_bus_num() calls pci_bus_num(), which dispatches to pcibus_num()
and returns bus->parent_dev->config[PCI_SECONDARY_BUS].
However, the PCI bus number has not been programmed by software yet when
the IOMMU is realized, so pci_bus_num() will always return 0 here, IIRC.
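
To illustrate the concern, here is a simplified model, not QEMU code. The
names DemoBridge, demo_bus_num and demo_format_bdf are made up for this
sketch; only the secondary bus register offset (0x19) and the devfn
slot/function split come from the PCI spec.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define DEMO_PCI_SECONDARY_BUS 0x19 /* bridge config space offset */

/* Hypothetical bridge model: only the config space matters here. */
typedef struct DemoBridge {
    uint8_t config[256];
} DemoBridge;

/* Bus number of the bus behind 'parent'; a root bus (no parent) is 0. */
static int demo_bus_num(const DemoBridge *parent)
{
    return parent ? parent->config[DEMO_PCI_SECONDARY_BUS] : 0;
}

/* Format "bus:slot.func" with the same format string as the quoted code. */
static void demo_format_bdf(char *buf, size_t len,
                            const DemoBridge *parent, uint8_t devfn)
{
    assert(len >= 8); /* "bb:ss.f" plus the terminating NUL */
    snprintf(buf, len, "%02x:%02x.%01x",
             demo_bus_num(parent), (devfn >> 3) & 0x1f, devfn & 0x07);
}
```

Until guest firmware or the kernel enumerates the bus, the secondary bus
register still holds its reset value of 0, so an ID computed at realize
time would read 00:01.0 even if the device later ends up on bus 2.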

> +    qdev_realize(DEVICE(iommu), NULL, errp);
> +
> +    memory_region_init(&s->bar0, OBJECT(s), "riscv-iommu-bar0",
> +        QEMU_ALIGN_UP(memory_region_size(&iommu->regs_mr), TARGET_PAGE_SIZE));
> +    memory_region_add_subregion(&s->bar0, 0, &iommu->regs_mr);
> +
> +    pcie_endpoint_cap_init(dev, 0);
> +
> +    pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY |
> +                     PCI_BASE_ADDRESS_MEM_TYPE_64, &s->bar0);
> +
> +    int ret = msix_init(dev, RISCV_IOMMU_INTR_COUNT,
> +                        &s->bar0, 0, RISCV_IOMMU_REG_MSI_CONFIG,
> +                        &s->bar0, 0, RISCV_IOMMU_REG_MSI_CONFIG + 256, 0, &err);
> +
> +    if (ret == -ENOTSUP) {
> +        /*
> +         * MSI-x is not supported by the platform.
> +         * Driver should use timer/polling based notification handlers.
> +         */
> +        warn_report_err(err);
> +    } else if (ret < 0) {
> +        error_propagate(errp, err);
> +        return;
> +    } else {
> +        /* mark all allocated MSIx vectors as used. */
> +        msix_vector_use(dev, RISCV_IOMMU_INTR_CQ);
> +        msix_vector_use(dev, RISCV_IOMMU_INTR_FQ);
> +        msix_vector_use(dev, RISCV_IOMMU_INTR_PM);
> +        msix_vector_use(dev, RISCV_IOMMU_INTR_PQ);
> +        iommu->notify = riscv_iommu_pci_notify;
> +    }
> +
> +    PCIBus *bus = pci_device_root_bus(dev);
> +    if (!bus) {
> +        error_setg(errp, "can't find PCIe root port for %02x:%02x.%x",
> +            pci_bus_num(pci_get_bus(dev)), PCI_SLOT(dev->devfn),

Same issue as with pci_dev_bus_num() above.

> +            PCI_FUNC(dev->devfn));
> +        return;
> +    }
> +
> +    riscv_iommu_pci_setup_iommu(iommu, bus, errp);
> +}
> +
> +static void riscv_iommu_pci_exit(PCIDevice *pci_dev)
> +{
> +    pci_setup_iommu(pci_device_root_bus(pci_dev), NULL, NULL);
> +}
> +
> +static const VMStateDescription riscv_iommu_vmstate = {
> +    .name = "riscv-iommu",
> +    .unmigratable = 1
> +};
> +
> +static void riscv_iommu_pci_init(Object *obj)
> +{
> +    RISCVIOMMUStatePci *s = RISCV_IOMMU_PCI(obj);
> +    RISCVIOMMUState *iommu = &s->iommu;
> +
> +    object_initialize_child(obj, "iommu", iommu, TYPE_RISCV_IOMMU);
> +    qdev_alias_all_properties(DEVICE(iommu), obj);
> +}
> +
> +static Property riscv_iommu_pci_properties[] = {
> +    DEFINE_PROP_END_OF_LIST(),
> +};

Do we need to register an empty property list here?

> +
> +static void riscv_iommu_pci_class_init(ObjectClass *klass, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
> +
> +    k->realize = riscv_iommu_pci_realize;
> +    k->exit = riscv_iommu_pci_exit;
> +    k->vendor_id = PCI_VENDOR_ID_RIVOS;
> +    k->device_id = PCI_DEVICE_ID_RIVOS_IOMMU;

I know Rivos originally modeled this IOMMU,
but we (SiFive) also have our own IOMMU based on the RISC-V IOMMU spec:
https://open-src-soc.org/2022-05/media/slides/RISC-V-International-Day-2022-05-05-14h10-Perinne-Peresse.pdf
Are there any guidelines on how vendors should extend this IOMMU?

> +    k->revision = 0;
> +    k->class_id = 0x0806;

We should add
#define PCI_CLASS_SYSTEM_IOMMU 0x0806
and use it instead of the hard-coded value.

P.S. AMD's IOMMU also uses the hard-coded value 0x0806, in hw/i386/amd_iommu.c.
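
Something along these lines, as a sketch only. The macro name is the one
proposed above and is an assumption until it exists in a shared header;
the demo_* helpers are hypothetical and only show how the 16-bit class
value splits into base class 0x08 (system peripheral) and subclass 0x06
(IOMMU).

```c
#include <assert.h>
#include <stdint.h>

/* Proposed name; not claimed to exist in any header yet. */
#define PCI_CLASS_SYSTEM_IOMMU 0x0806

/* Compile-time check that the constant sits in base class 0x08. */
static_assert((PCI_CLASS_SYSTEM_IOMMU >> 8) == 0x08,
              "IOMMU class must be a system peripheral");

/* Upper byte of the class value: PCI base class. */
static uint8_t demo_pci_base_class(uint16_t class_id)
{
    return (uint8_t)(class_id >> 8);
}

/* Lower byte of the class value: PCI subclass. */
static uint8_t demo_pci_sub_class(uint16_t class_id)
{
    return (uint8_t)(class_id & 0xff);
}
```

The class_init hunk would then read k->class_id = PCI_CLASS_SYSTEM_IOMMU,
and the qtest could compare against the same name instead of repeating
the number.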

> +    dc->desc = "RISCV-IOMMU DMA Remapping device";
> +    dc->vmsd = &riscv_iommu_vmstate;
> +    dc->hotpluggable = false;
> +    dc->user_creatable = true;
> +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> +    device_class_set_props(dc, riscv_iommu_pci_properties);
> +}
> +
> +static const TypeInfo riscv_iommu_pci = {
> +    .name = TYPE_RISCV_IOMMU_PCI,
> +    .parent = TYPE_PCI_DEVICE,
> +    .class_init = riscv_iommu_pci_class_init,
> +    .instance_init = riscv_iommu_pci_init,
> +    .instance_size = sizeof(RISCVIOMMUStatePci),
> +    .interfaces = (InterfaceInfo[]) {
> +        { INTERFACE_PCIE_DEVICE },
> +        { },
> +    },
> +};
> +
> +static void riscv_iommu_register_pci_types(void)
> +{
> +    type_register_static(&riscv_iommu_pci);
> +}
> +
> +type_init(riscv_iommu_register_pci_types);
> --
> 2.43.2
>
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 05/15] hw/riscv: add riscv-iommu-sys platform device
  2024-03-07 16:03 ` [PATCH v2 05/15] hw/riscv: add riscv-iommu-sys platform device Daniel Henrique Barboza
@ 2024-04-30  1:35   ` Frank Chang
  0 siblings, 0 replies; 55+ messages in thread
From: Frank Chang @ 2024-04-30  1:35 UTC (permalink / raw)
  To: Daniel Henrique Barboza
  Cc: qemu-devel, qemu-riscv, alistair.francis, bmeng, liwei1518,
	zhiwei_liu, palmer, ajones, tjeznach

Reviewed-by: Frank Chang <frank.chang@sifive.com>

On Fri, Mar 8, 2024 at 12:05 AM, Daniel Henrique Barboza <dbarboza@ventanamicro.com> wrote:
>
> From: Tomasz Jeznach <tjeznach@rivosinc.com>
>
> This device models the RISC-V IOMMU as a sysbus device.
>
> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
> ---
>  hw/riscv/meson.build       |  2 +-
>  hw/riscv/riscv-iommu-sys.c | 93 ++++++++++++++++++++++++++++++++++++++
>  include/hw/riscv/iommu.h   |  4 ++
>  3 files changed, 98 insertions(+), 1 deletion(-)
>  create mode 100644 hw/riscv/riscv-iommu-sys.c
>
> diff --git a/hw/riscv/meson.build b/hw/riscv/meson.build
> index 4674cec6c4..e37c5d78e2 100644
> --- a/hw/riscv/meson.build
> +++ b/hw/riscv/meson.build
> @@ -10,6 +10,6 @@ riscv_ss.add(when: 'CONFIG_SIFIVE_U', if_true: files('sifive_u.c'))
>  riscv_ss.add(when: 'CONFIG_SPIKE', if_true: files('spike.c'))
>  riscv_ss.add(when: 'CONFIG_MICROCHIP_PFSOC', if_true: files('microchip_pfsoc.c'))
>  riscv_ss.add(when: 'CONFIG_ACPI', if_true: files('virt-acpi-build.c'))
> -riscv_ss.add(when: 'CONFIG_RISCV_IOMMU', if_true: files('riscv-iommu.c', 'riscv-iommu-pci.c'))
> +riscv_ss.add(when: 'CONFIG_RISCV_IOMMU', if_true: files('riscv-iommu.c', 'riscv-iommu-pci.c', 'riscv-iommu-sys.c'))
>
>  hw_arch += {'riscv': riscv_ss}
> diff --git a/hw/riscv/riscv-iommu-sys.c b/hw/riscv/riscv-iommu-sys.c
> new file mode 100644
> index 0000000000..4305cf8d79
> --- /dev/null
> +++ b/hw/riscv/riscv-iommu-sys.c
> @@ -0,0 +1,93 @@
> +/*
> + * QEMU emulation of an RISC-V IOMMU (Ziommu) - Platform Device
> + *
> + * Copyright (C) 2022-2023 Rivos Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "hw/pci/pci_bus.h"
> +#include "hw/qdev-properties.h"
> +#include "hw/sysbus.h"
> +#include "qapi/error.h"
> +#include "qapi/error.h"
> +#include "qemu/error-report.h"
> +#include "qemu/host-utils.h"
> +#include "qemu/module.h"
> +#include "qemu/osdep.h"
> +#include "qom/object.h"
> +
> +#include "riscv-iommu.h"
> +
> +/* RISC-V IOMMU System Platform Device Emulation */
> +
> +struct RISCVIOMMUStateSys {
> +    SysBusDevice     parent;
> +    uint64_t         addr;
> +    RISCVIOMMUState  iommu;
> +};
> +
> +static void riscv_iommu_sys_realize(DeviceState *dev, Error **errp)
> +{
> +    RISCVIOMMUStateSys *s = RISCV_IOMMU_SYS(dev);
> +    PCIBus *pci_bus;
> +
> +    qdev_realize(DEVICE(&s->iommu), NULL, errp);
> +    sysbus_init_mmio(SYS_BUS_DEVICE(dev), &s->iommu.regs_mr);
> +    if (s->addr) {
> +        sysbus_mmio_map(SYS_BUS_DEVICE(s), 0, s->addr);
> +    }
> +
> +    pci_bus = (PCIBus *) object_resolve_path_type("", TYPE_PCI_BUS, NULL);
> +    if (pci_bus) {
> +        riscv_iommu_pci_setup_iommu(&s->iommu, pci_bus, errp);
> +    }
> +}
> +
> +static void riscv_iommu_sys_init(Object *obj)
> +{
> +    RISCVIOMMUStateSys *s = RISCV_IOMMU_SYS(obj);
> +    RISCVIOMMUState *iommu = &s->iommu;
> +
> +    object_initialize_child(obj, "iommu", iommu, TYPE_RISCV_IOMMU);
> +    qdev_alias_all_properties(DEVICE(iommu), obj);
> +}
> +
> +static Property riscv_iommu_sys_properties[] = {
> +    DEFINE_PROP_UINT64("addr", RISCVIOMMUStateSys, addr, 0),
> +    DEFINE_PROP_END_OF_LIST(),
> +};
> +
> +static void riscv_iommu_sys_class_init(ObjectClass *klass, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +    dc->realize = riscv_iommu_sys_realize;
> +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> +    device_class_set_props(dc, riscv_iommu_sys_properties);
> +}
> +
> +static const TypeInfo riscv_iommu_sys = {
> +    .name          = TYPE_RISCV_IOMMU_SYS,
> +    .parent        = TYPE_SYS_BUS_DEVICE,
> +    .class_init    = riscv_iommu_sys_class_init,
> +    .instance_init = riscv_iommu_sys_init,
> +    .instance_size = sizeof(RISCVIOMMUStateSys),
> +};
> +
> +static void riscv_iommu_register_sys(void)
> +{
> +    type_register_static(&riscv_iommu_sys);
> +}
> +
> +type_init(riscv_iommu_register_sys)
> diff --git a/include/hw/riscv/iommu.h b/include/hw/riscv/iommu.h
> index 403b365893..c8d28a79a1 100644
> --- a/include/hw/riscv/iommu.h
> +++ b/include/hw/riscv/iommu.h
> @@ -33,4 +33,8 @@ typedef struct RISCVIOMMUSpace RISCVIOMMUSpace;
>  OBJECT_DECLARE_SIMPLE_TYPE(RISCVIOMMUStatePci, RISCV_IOMMU_PCI)
>  typedef struct RISCVIOMMUStatePci RISCVIOMMUStatePci;
>
> +#define TYPE_RISCV_IOMMU_SYS "riscv-iommu-device"
> +OBJECT_DECLARE_SIMPLE_TYPE(RISCVIOMMUStateSys, RISCV_IOMMU_SYS)
> +typedef struct RISCVIOMMUStateSys RISCVIOMMUStateSys;
> +
>  #endif
> --
> 2.43.2
>
>


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 06/15] hw/riscv/virt.c: support for RISC-V IOMMU PCIDevice hotplug
  2024-03-07 16:03 ` [PATCH v2 06/15] hw/riscv/virt.c: support for RISC-V IOMMU PCIDevice hotplug Daniel Henrique Barboza
@ 2024-04-30  2:17   ` Frank Chang
  2024-05-15  6:25   ` Eric Cheng
  1 sibling, 0 replies; 55+ messages in thread
From: Frank Chang @ 2024-04-30  2:17 UTC (permalink / raw)
  To: Daniel Henrique Barboza
  Cc: qemu-devel, qemu-riscv, alistair.francis, bmeng, liwei1518,
	zhiwei_liu, palmer, ajones, tjeznach

Reviewed-by: Frank Chang <frank.chang@sifive.com>

On Fri, Mar 8, 2024 at 12:06 AM, Daniel Henrique Barboza <dbarboza@ventanamicro.com> wrote:
>
> From: Tomasz Jeznach <tjeznach@rivosinc.com>
>
> Generate device tree entry for riscv-iommu PCI device, along with
> mapping all PCI device identifiers to the single IOMMU device instance.
>
> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
> ---
>  hw/riscv/virt.c | 33 ++++++++++++++++++++++++++++++++-
>  1 file changed, 32 insertions(+), 1 deletion(-)
>
> diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
> index a094af97c3..67a8267747 100644
> --- a/hw/riscv/virt.c
> +++ b/hw/riscv/virt.c
> @@ -32,6 +32,7 @@
>  #include "hw/core/sysbus-fdt.h"
>  #include "target/riscv/pmu.h"
>  #include "hw/riscv/riscv_hart.h"
> +#include "hw/riscv/iommu.h"
>  #include "hw/riscv/virt.h"
>  #include "hw/riscv/boot.h"
>  #include "hw/riscv/numa.h"
> @@ -1004,6 +1005,30 @@ static void create_fdt_virtio_iommu(RISCVVirtState *s, uint16_t bdf)
>                             bdf + 1, iommu_phandle, bdf + 1, 0xffff - bdf);
>  }
>
> +static void create_fdt_iommu(RISCVVirtState *s, uint16_t bdf)
> +{
> +    const char comp[] = "riscv,pci-iommu";
> +    void *fdt = MACHINE(s)->fdt;
> +    uint32_t iommu_phandle;
> +    g_autofree char *iommu_node = NULL;
> +    g_autofree char *pci_node = NULL;
> +
> +    pci_node = g_strdup_printf("/soc/pci@%lx",
> +                               (long) virt_memmap[VIRT_PCIE_ECAM].base);
> +    iommu_node = g_strdup_printf("%s/iommu@%x", pci_node, bdf);
> +    iommu_phandle = qemu_fdt_alloc_phandle(fdt);
> +    qemu_fdt_add_subnode(fdt, iommu_node);
> +
> +    qemu_fdt_setprop(fdt, iommu_node, "compatible", comp, sizeof(comp));
> +    qemu_fdt_setprop_cell(fdt, iommu_node, "#iommu-cells", 1);
> +    qemu_fdt_setprop_cell(fdt, iommu_node, "phandle", iommu_phandle);
> +    qemu_fdt_setprop_cells(fdt, iommu_node, "reg",
> +                           bdf << 8, 0, 0, 0, 0);
> +    qemu_fdt_setprop_cells(fdt, pci_node, "iommu-map",
> +                           0, iommu_phandle, 0, bdf,
> +                           bdf + 1, iommu_phandle, bdf + 1, 0xffff - bdf);
> +}
> +
>  static void finalize_fdt(RISCVVirtState *s)
>  {
>      uint32_t phandle = 1, irq_mmio_phandle = 1, msi_pcie_phandle = 1;
> @@ -1712,9 +1737,11 @@ static HotplugHandler *virt_machine_get_hotplug_handler(MachineState *machine,
>      MachineClass *mc = MACHINE_GET_CLASS(machine);
>
>      if (device_is_dynamic_sysbus(mc, dev) ||
> -        object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_IOMMU_PCI)) {
> +        object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_IOMMU_PCI) ||
> +        object_dynamic_cast(OBJECT(dev), TYPE_RISCV_IOMMU_PCI)) {
>          return HOTPLUG_HANDLER(machine);
>      }
> +
>      return NULL;
>  }
>
> @@ -1735,6 +1762,10 @@ static void virt_machine_device_plug_cb(HotplugHandler *hotplug_dev,
>      if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_IOMMU_PCI)) {
>          create_fdt_virtio_iommu(s, pci_get_bdf(PCI_DEVICE(dev)));
>      }
> +
> +    if (object_dynamic_cast(OBJECT(dev), TYPE_RISCV_IOMMU_PCI)) {
> +        create_fdt_iommu(s, pci_get_bdf(PCI_DEVICE(dev)));
> +    }
>  }
>
>  static void virt_machine_class_init(ObjectClass *oc, void *data)
> --
> 2.43.2
>
>
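
For readers following the "iommu-map" cells in the quoted
create_fdt_iommu(): each entry is (rid-base, iommu phandle, iommu-base,
length), and the two entries cover every requester ID except the IOMMU's
own BDF. A minimal sketch under that reading; the DemoIommuMapEntry type
and demo_* helpers are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One iommu-map entry: RIDs [rid_base, rid_base + length) are mapped. */
typedef struct DemoIommuMapEntry {
    uint32_t rid_base;
    uint32_t phandle;
    uint32_t iommu_base;
    uint32_t length;
} DemoIommuMapEntry;

/* Fill the two ranges with the same cell values as the quoted code. */
static void demo_build_iommu_map(DemoIommuMapEntry map[2],
                                 uint32_t phandle, uint16_t bdf)
{
    assert(map != NULL);
    /* RIDs below the IOMMU's own BDF. */
    map[0] = (DemoIommuMapEntry){ 0, phandle, 0, bdf };
    /* RIDs above the IOMMU's own BDF, up to 0xffff. */
    map[1] = (DemoIommuMapEntry){ bdf + 1u, phandle, bdf + 1u, 0xffffu - bdf };
}

/* True if requester ID 'rid' is covered by any of the n entries. */
static int demo_rid_is_mapped(const DemoIommuMapEntry *map, size_t n,
                              uint32_t rid)
{
    for (size_t i = 0; i < n; i++) {
        if (rid >= map[i].rid_base &&
            rid - map[i].rid_base < map[i].length) {
            return 1;
        }
    }
    return 0;
}
```

For an IOMMU at 00:01.0 (bdf 0x0008) this leaves exactly one hole, at
0x0008, which keeps the IOMMU from translating its own transactions.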


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 07/15] test/qtest: add riscv-iommu-pci tests
  2024-03-07 16:03 ` [PATCH v2 07/15] test/qtest: add riscv-iommu-pci tests Daniel Henrique Barboza
@ 2024-04-30  3:33   ` Frank Chang
  0 siblings, 0 replies; 55+ messages in thread
From: Frank Chang @ 2024-04-30  3:33 UTC (permalink / raw)
  To: Daniel Henrique Barboza
  Cc: qemu-devel, qemu-riscv, alistair.francis, bmeng, liwei1518,
	zhiwei_liu, palmer, ajones, tjeznach

Reviewed-by: Frank Chang <frank.chang@sifive.com>

On Fri, Mar 8, 2024 at 12:05 AM, Daniel Henrique Barboza <dbarboza@ventanamicro.com> wrote:
>
> To test the RISC-V IOMMU emulation we'll use its PCI representation.
> Create a new 'riscv-iommu-pci' libqos device that will be present with
> CONFIG_RISCV_IOMMU.  This config is only available for RISC-V, so this
> device will only be consumed by the RISC-V libqos machine.
>
> Start with basic tests: a PCI sanity check and a reset state register
> test. The reset test was taken from the RISC-V IOMMU spec chapter 5.2,
> "Reset behavior".
>
> More tests will be added later.
>
> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
> ---
>  tests/qtest/libqos/meson.build   |  4 ++
>  tests/qtest/libqos/riscv-iommu.c | 79 +++++++++++++++++++++++++++
>  tests/qtest/libqos/riscv-iommu.h | 67 +++++++++++++++++++++++
>  tests/qtest/meson.build          |  1 +
>  tests/qtest/riscv-iommu-test.c   | 93 ++++++++++++++++++++++++++++++++
>  5 files changed, 244 insertions(+)
>  create mode 100644 tests/qtest/libqos/riscv-iommu.c
>  create mode 100644 tests/qtest/libqos/riscv-iommu.h
>  create mode 100644 tests/qtest/riscv-iommu-test.c
>
> diff --git a/tests/qtest/libqos/meson.build b/tests/qtest/libqos/meson.build
> index 3aed6efcb8..07fe20eacb 100644
> --- a/tests/qtest/libqos/meson.build
> +++ b/tests/qtest/libqos/meson.build
> @@ -67,6 +67,10 @@ if have_virtfs
>    libqos_srcs += files('virtio-9p.c', 'virtio-9p-client.c')
>  endif
>
> +if config_all_devices.has_key('CONFIG_RISCV_IOMMU')
> +  libqos_srcs += files('riscv-iommu.c')
> +endif
> +
>  libqos = static_library('qos', libqos_srcs + genh,
>                          name_suffix: 'fa',
>                          build_by_default: false)
> diff --git a/tests/qtest/libqos/riscv-iommu.c b/tests/qtest/libqos/riscv-iommu.c
> new file mode 100644
> index 0000000000..8ae7d4888c
> --- /dev/null
> +++ b/tests/qtest/libqos/riscv-iommu.c
> @@ -0,0 +1,79 @@
> +/*
> + * libqos driver riscv-iommu-pci framework
> + *
> + * Copyright (c) 2024 Ventana Micro Systems Inc.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or (at your
> + * option) any later version.  See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "../libqtest.h"
> +#include "qemu/module.h"
> +#include "qgraph.h"
> +#include "pci.h"
> +#include "riscv-iommu.h"
> +
> +#define PCI_VENDOR_ID_RIVOS           0x1efd
> +#define PCI_DEVICE_ID_RIVOS_IOMMU     0xedf1
> +
> +static void *riscv_iommu_pci_get_driver(void *obj, const char *interface)
> +{
> +    QRISCVIOMMU *r_iommu_pci = obj;
> +
> +    if (!g_strcmp0(interface, "pci-device")) {
> +        return &r_iommu_pci->dev;
> +    }
> +
> +    fprintf(stderr, "%s not present in riscv_iommu_pci\n", interface);
> +    g_assert_not_reached();
> +}
> +
> +static void riscv_iommu_pci_start_hw(QOSGraphObject *obj)
> +{
> +    QRISCVIOMMU *pci = (QRISCVIOMMU *)obj;
> +    qpci_device_enable(&pci->dev);
> +}
> +
> +static void riscv_iommu_pci_destructor(QOSGraphObject *obj)
> +{
> +    QRISCVIOMMU *pci = (QRISCVIOMMU *)obj;
> +    qpci_iounmap(&pci->dev, pci->reg_bar);
> +}
> +
> +static void *riscv_iommu_pci_create(void *pci_bus, QGuestAllocator *alloc,
> +                                    void *addr)
> +{
> +    QRISCVIOMMU *r_iommu_pci = g_new0(QRISCVIOMMU, 1);
> +    QPCIBus *bus = pci_bus;
> +
> +    qpci_device_init(&r_iommu_pci->dev, bus, addr);
> +    r_iommu_pci->reg_bar = qpci_iomap(&r_iommu_pci->dev, 0, NULL);
> +
> +    r_iommu_pci->obj.get_driver = riscv_iommu_pci_get_driver;
> +    r_iommu_pci->obj.start_hw = riscv_iommu_pci_start_hw;
> +    r_iommu_pci->obj.destructor = riscv_iommu_pci_destructor;
> +    return &r_iommu_pci->obj;
> +}
> +
> +static void riscv_iommu_pci_register_nodes(void)
> +{
> +    QPCIAddress addr = {
> +        .vendor_id = PCI_VENDOR_ID_RIVOS,
> +        .device_id = PCI_DEVICE_ID_RIVOS_IOMMU,
> +        .devfn = QPCI_DEVFN(1, 0),
> +    };
> +
> +    QOSGraphEdgeOptions opts = {
> +        .extra_device_opts = "addr=01.0",
> +    };
> +
> +    add_qpci_address(&opts, &addr);
> +
> +    qos_node_create_driver("riscv-iommu-pci", riscv_iommu_pci_create);
> +    qos_node_produces("riscv-iommu-pci", "pci-device");
> +    qos_node_consumes("riscv-iommu-pci", "pci-bus", &opts);
> +}
> +
> +libqos_init(riscv_iommu_pci_register_nodes);
> diff --git a/tests/qtest/libqos/riscv-iommu.h b/tests/qtest/libqos/riscv-iommu.h
> new file mode 100644
> index 0000000000..8c056caa7b
> --- /dev/null
> +++ b/tests/qtest/libqos/riscv-iommu.h
> @@ -0,0 +1,67 @@
> +/*
> + * libqos driver riscv-iommu-pci framework
> + *
> + * Copyright (c) 2024 Ventana Micro Systems Inc.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or (at your
> + * option) any later version.  See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef TESTS_LIBQOS_RISCV_IOMMU_H
> +#define TESTS_LIBQOS_RISCV_IOMMU_H
> +
> +#include "qgraph.h"
> +#include "pci.h"
> +#include "qemu/bitops.h"
> +
> +#ifndef GENMASK_ULL
> +#define GENMASK_ULL(h, l) (((~0ULL) >> (63 - (h) + (l))) << (l))
> +#endif
> +
> +#define RISCV_IOMMU_PCI_VENDOR_ID_RIVOS  0x1efd
> +#define RISCV_IOMMU_PCI_DEVICE_ID_RIVOS  0xedf1
> +#define RISCV_IOMMU_PCI_DEVICE_CLASS     0x0806
> +
> +/* Common field positions */
> +#define RISCV_IOMMU_QUEUE_ENABLE        BIT(0)
> +#define RISCV_IOMMU_QUEUE_INTR_ENABLE   BIT(1)
> +#define RISCV_IOMMU_QUEUE_MEM_FAULT     BIT(8)
> +#define RISCV_IOMMU_QUEUE_ACTIVE        BIT(16)
> +#define RISCV_IOMMU_QUEUE_BUSY          BIT(17)
> +
> +#define RISCV_IOMMU_REG_CAP             0x0000
> +#define RISCV_IOMMU_CAP_VERSION         GENMASK_ULL(7, 0)
> +
> +#define RISCV_IOMMU_REG_DDTP            0x0010
> +#define RISCV_IOMMU_DDTP_BUSY           BIT_ULL(4)
> +#define RISCV_IOMMU_DDTP_MODE           GENMASK_ULL(3, 0)
> +#define RISCV_IOMMU_DDTP_MODE_OFF       0
> +
> +#define RISCV_IOMMU_REG_CQCSR           0x0048
> +#define RISCV_IOMMU_CQCSR_CQEN          RISCV_IOMMU_QUEUE_ENABLE
> +#define RISCV_IOMMU_CQCSR_CIE           RISCV_IOMMU_QUEUE_INTR_ENABLE
> +#define RISCV_IOMMU_CQCSR_CQON          RISCV_IOMMU_QUEUE_ACTIVE
> +#define RISCV_IOMMU_CQCSR_BUSY          RISCV_IOMMU_QUEUE_BUSY
> +
> +#define RISCV_IOMMU_REG_FQCSR           0x004C
> +#define RISCV_IOMMU_FQCSR_FQEN          RISCV_IOMMU_QUEUE_ENABLE
> +#define RISCV_IOMMU_FQCSR_FIE           RISCV_IOMMU_QUEUE_INTR_ENABLE
> +#define RISCV_IOMMU_FQCSR_FQON          RISCV_IOMMU_QUEUE_ACTIVE
> +#define RISCV_IOMMU_FQCSR_BUSY          RISCV_IOMMU_QUEUE_BUSY
> +
> +#define RISCV_IOMMU_REG_PQCSR           0x0050
> +#define RISCV_IOMMU_PQCSR_PQEN          RISCV_IOMMU_QUEUE_ENABLE
> +#define RISCV_IOMMU_PQCSR_PIE           RISCV_IOMMU_QUEUE_INTR_ENABLE
> +#define RISCV_IOMMU_PQCSR_PQON          RISCV_IOMMU_QUEUE_ACTIVE
> +#define RISCV_IOMMU_PQCSR_BUSY          RISCV_IOMMU_QUEUE_BUSY
> +
> +#define RISCV_IOMMU_REG_IPSR            0x0054
> +
> +typedef struct QRISCVIOMMU {
> +    QOSGraphObject obj;
> +    QPCIDevice dev;
> +    QPCIBar reg_bar;
> +} QRISCVIOMMU;
> +
> +#endif
> diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
> index 31b9f4ede4..aeb7346840 100644
> --- a/tests/qtest/meson.build
> +++ b/tests/qtest/meson.build
> @@ -285,6 +285,7 @@ qos_test_ss.add(
>    'vmxnet3-test.c',
>    'igb-test.c',
>    'ufs-test.c',
> +  'riscv-iommu-test.c',
>  )
>
>  if config_all_devices.has_key('CONFIG_VIRTIO_SERIAL')
> diff --git a/tests/qtest/riscv-iommu-test.c b/tests/qtest/riscv-iommu-test.c
> new file mode 100644
> index 0000000000..13b887d15e
> --- /dev/null
> +++ b/tests/qtest/riscv-iommu-test.c
> @@ -0,0 +1,93 @@
> +/*
> + * QTest testcase for RISC-V IOMMU
> + *
> + * Copyright (c) 2024 Ventana Micro Systems Inc.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or (at your
> + * option) any later version.  See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "libqtest-single.h"
> +#include "qemu/module.h"
> +#include "libqos/qgraph.h"
> +#include "libqos/riscv-iommu.h"
> +#include "hw/pci/pci_regs.h"
> +
> +static uint32_t riscv_iommu_read_reg32(QRISCVIOMMU *r_iommu, int reg_offset)
> +{
> +    uint32_t reg;
> +
> +    qpci_memread(&r_iommu->dev, r_iommu->reg_bar, reg_offset,
> +                 &reg, sizeof(reg));
> +    return reg;
> +}
> +
> +static uint64_t riscv_iommu_read_reg64(QRISCVIOMMU *r_iommu, int reg_offset)
> +{
> +    uint64_t reg;
> +
> +    qpci_memread(&r_iommu->dev, r_iommu->reg_bar, reg_offset,
> +                 &reg, sizeof(reg));
> +    return reg;
> +}
> +
> +static void test_pci_config(void *obj, void *data, QGuestAllocator *t_alloc)
> +{
> +    QRISCVIOMMU *r_iommu = obj;
> +    QPCIDevice *dev = &r_iommu->dev;
> +    uint16_t vendorid, deviceid, classid;
> +
> +    vendorid = qpci_config_readw(dev, PCI_VENDOR_ID);
> +    deviceid = qpci_config_readw(dev, PCI_DEVICE_ID);
> +    classid = qpci_config_readw(dev, PCI_CLASS_DEVICE);
> +
> +    g_assert_cmpuint(vendorid, ==, RISCV_IOMMU_PCI_VENDOR_ID_RIVOS);
> +    g_assert_cmpuint(deviceid, ==, RISCV_IOMMU_PCI_DEVICE_ID_RIVOS);
> +    g_assert_cmpuint(classid, ==, RISCV_IOMMU_PCI_DEVICE_CLASS);
> +}
> +
> +static void test_reg_reset(void *obj, void *data, QGuestAllocator *t_alloc)
> +{
> +    QRISCVIOMMU *r_iommu = obj;
> +    uint64_t cap;
> +    uint32_t reg;
> +
> +    cap = riscv_iommu_read_reg64(r_iommu, RISCV_IOMMU_REG_CAP);
> +    g_assert_cmpuint(cap & RISCV_IOMMU_CAP_VERSION, ==, 0x10);
> +
> +    reg = riscv_iommu_read_reg32(r_iommu, RISCV_IOMMU_REG_CQCSR);
> +    g_assert_cmpuint(reg & RISCV_IOMMU_CQCSR_CQEN, ==, 0);
> +    g_assert_cmpuint(reg & RISCV_IOMMU_CQCSR_CIE, ==, 0);
> +    g_assert_cmpuint(reg & RISCV_IOMMU_CQCSR_CQON, ==, 0);
> +    g_assert_cmpuint(reg & RISCV_IOMMU_CQCSR_BUSY, ==, 0);
> +
> +    reg = riscv_iommu_read_reg32(r_iommu, RISCV_IOMMU_REG_FQCSR);
> +    g_assert_cmpuint(reg & RISCV_IOMMU_FQCSR_FQEN, ==, 0);
> +    g_assert_cmpuint(reg & RISCV_IOMMU_FQCSR_FIE, ==, 0);
> +    g_assert_cmpuint(reg & RISCV_IOMMU_FQCSR_FQON, ==, 0);
> +    g_assert_cmpuint(reg & RISCV_IOMMU_FQCSR_BUSY, ==, 0);
> +
> +    reg = riscv_iommu_read_reg32(r_iommu, RISCV_IOMMU_REG_PQCSR);
> +    g_assert_cmpuint(reg & RISCV_IOMMU_PQCSR_PQEN, ==, 0);
> +    g_assert_cmpuint(reg & RISCV_IOMMU_PQCSR_PIE, ==, 0);
> +    g_assert_cmpuint(reg & RISCV_IOMMU_PQCSR_PQON, ==, 0);
> +    g_assert_cmpuint(reg & RISCV_IOMMU_PQCSR_BUSY, ==, 0);
> +
> +    reg = riscv_iommu_read_reg32(r_iommu, RISCV_IOMMU_REG_DDTP);
> +    g_assert_cmpuint(reg & RISCV_IOMMU_DDTP_BUSY, ==, 0);
> +    g_assert_cmpuint(reg & RISCV_IOMMU_DDTP_MODE, ==,
> +                     RISCV_IOMMU_DDTP_MODE_OFF);
> +
> +    reg = riscv_iommu_read_reg32(r_iommu, RISCV_IOMMU_REG_IPSR);
> +    g_assert_cmpuint(reg, ==, 0);
> +}
> +
> +static void register_riscv_iommu_test(void)
> +{
> +    qos_add_test("pci_config", "riscv-iommu-pci", test_pci_config, NULL);
> +    qos_add_test("reg_reset", "riscv-iommu-pci", test_reg_reset, NULL);
> +}
> +
> +libqos_init(register_riscv_iommu_test);
> --
> 2.43.2
>
>



* Re: [PATCH v2 03/15] hw/riscv: add RISC-V IOMMU base emulation
  2024-03-07 16:03 ` [PATCH v2 03/15] hw/riscv: add RISC-V IOMMU base emulation Daniel Henrique Barboza
@ 2024-05-01 11:57   ` Jason Chien
  2024-05-14 20:06     ` Daniel Henrique Barboza
  2024-05-02 11:37   ` Frank Chang
  1 sibling, 1 reply; 55+ messages in thread
From: Jason Chien @ 2024-05-01 11:57 UTC (permalink / raw)
  To: Daniel Henrique Barboza, qemu-devel
  Cc: qemu-riscv, alistair.francis, bmeng, liwei1518, zhiwei_liu,
	palmer, ajones, tjeznach, Sebastien Boeuf


On 2024/3/8 12:03 AM, Daniel Henrique Barboza wrote:
> From: Tomasz Jeznach<tjeznach@rivosinc.com>
>
> The RISC-V IOMMU specification is now ratified as per the RISC-V
> International process. The latest frozen specification can be found
> at:
>
> https://github.com/riscv-non-isa/riscv-iommu/releases/download/v1.0/riscv-iommu.pdf
>
> Add the foundation of the device emulation for RISC-V IOMMU, which
> includes an IOMMU that has no capabilities but MSI interrupt support and
> fault queue interfaces. We'll add more features incrementally in the
> next patches.
>
> Co-developed-by: Sebastien Boeuf<seb@rivosinc.com>
> Signed-off-by: Sebastien Boeuf<seb@rivosinc.com>
> Signed-off-by: Tomasz Jeznach<tjeznach@rivosinc.com>
> Signed-off-by: Daniel Henrique Barboza<dbarboza@ventanamicro.com>
> ---
>   hw/riscv/Kconfig         |    4 +
>   hw/riscv/meson.build     |    1 +
>   hw/riscv/riscv-iommu.c   | 1492 ++++++++++++++++++++++++++++++++++++++
>   hw/riscv/riscv-iommu.h   |  141 ++++
>   hw/riscv/trace-events    |   11 +
>   hw/riscv/trace.h         |    2 +
>   include/hw/riscv/iommu.h |   36 +
>   meson.build              |    1 +
>   8 files changed, 1688 insertions(+)
>   create mode 100644 hw/riscv/riscv-iommu.c
>   create mode 100644 hw/riscv/riscv-iommu.h
>   create mode 100644 hw/riscv/trace-events
>   create mode 100644 hw/riscv/trace.h
>   create mode 100644 include/hw/riscv/iommu.h
>
> diff --git a/hw/riscv/Kconfig b/hw/riscv/Kconfig
> index 5d644eb7b1..faf6a10029 100644
> --- a/hw/riscv/Kconfig
> +++ b/hw/riscv/Kconfig
> @@ -1,3 +1,6 @@
> +config RISCV_IOMMU
> +    bool
> +
>   config RISCV_NUMA
>       bool
>   
> @@ -38,6 +41,7 @@ config RISCV_VIRT
>       select SERIAL
>       select RISCV_ACLINT
>       select RISCV_APLIC
> +    select RISCV_IOMMU
>       select RISCV_IMSIC
>       select SIFIVE_PLIC
>       select SIFIVE_TEST
> diff --git a/hw/riscv/meson.build b/hw/riscv/meson.build
> index 2f7ee81be3..ba9eebd605 100644
> --- a/hw/riscv/meson.build
> +++ b/hw/riscv/meson.build
> @@ -10,5 +10,6 @@ riscv_ss.add(when: 'CONFIG_SIFIVE_U', if_true: files('sifive_u.c'))
>   riscv_ss.add(when: 'CONFIG_SPIKE', if_true: files('spike.c'))
>   riscv_ss.add(when: 'CONFIG_MICROCHIP_PFSOC', if_true: files('microchip_pfsoc.c'))
>   riscv_ss.add(when: 'CONFIG_ACPI', if_true: files('virt-acpi-build.c'))
> +riscv_ss.add(when: 'CONFIG_RISCV_IOMMU', if_true: files('riscv-iommu.c'))
>   
>   hw_arch += {'riscv': riscv_ss}
> diff --git a/hw/riscv/riscv-iommu.c b/hw/riscv/riscv-iommu.c
> new file mode 100644
> index 0000000000..df534b99b0
> --- /dev/null
> +++ b/hw/riscv/riscv-iommu.c
> @@ -0,0 +1,1492 @@
> +/*
> + * QEMU emulation of an RISC-V IOMMU (Ziommu)
> + *
> + * Copyright (C) 2021-2023, Rivos Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see<http://www.gnu.org/licenses/>.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qom/object.h"
> +#include "hw/pci/pci_bus.h"
> +#include "hw/pci/pci_device.h"
> +#include "hw/qdev-properties.h"
> +#include "hw/riscv/riscv_hart.h"
> +#include "migration/vmstate.h"
> +#include "qapi/error.h"
> +#include "qemu/timer.h"
> +
> +#include "cpu_bits.h"
> +#include "riscv-iommu.h"
> +#include "riscv-iommu-bits.h"
> +#include "trace.h"
> +
> +#define LIMIT_CACHE_CTX               (1U << 7)
> +#define LIMIT_CACHE_IOT               (1U << 20)
> +
> +/* Physical page number conversions */
> +#define PPN_PHYS(ppn)                 ((ppn) << TARGET_PAGE_BITS)
> +#define PPN_DOWN(phy)                 ((phy) >> TARGET_PAGE_BITS)
> +
> +typedef struct RISCVIOMMUContext RISCVIOMMUContext;
> +typedef struct RISCVIOMMUEntry RISCVIOMMUEntry;
> +
> +/* Device assigned I/O address space */
> +struct RISCVIOMMUSpace {
> +    IOMMUMemoryRegion iova_mr;  /* IOVA memory region for attached device */
> +    AddressSpace iova_as;       /* IOVA address space for attached device */
> +    RISCVIOMMUState *iommu;     /* Managing IOMMU device state */
> +    uint32_t devid;             /* Requester identifier, AKA device_id */
> +    bool notifier;              /* IOMMU unmap notifier enabled */
> +    QLIST_ENTRY(RISCVIOMMUSpace) list;
> +};
> +
> +/* Device translation context state. */
> +struct RISCVIOMMUContext {
> +    uint64_t devid:24;          /* Requester Id, AKA device_id */
> +    uint64_t pasid:20;          /* Process Address Space ID */
> +    uint64_t __rfu:20;          /* reserved */
> +    uint64_t tc;                /* Translation Control */
> +    uint64_t ta;                /* Translation Attributes */
> +    uint64_t msi_addr_mask;     /* MSI filtering - address mask */
> +    uint64_t msi_addr_pattern;  /* MSI filtering - address pattern */
> +    uint64_t msiptp;            /* MSI redirection page table pointer */
> +};
> +
> +/* IOMMU index for transactions without PASID specified. */
> +#define RISCV_IOMMU_NOPASID 0
> +
> +static void riscv_iommu_notify(RISCVIOMMUState *s, int vec)
> +{
> +    const uint32_t ipsr =
> +        riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_IPSR, (1 << vec), 0);
> +    const uint32_t ivec = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_IVEC);
> +    if (s->notify && !(ipsr & (1 << vec))) {
> +        s->notify(s, (ivec >> (vec * 4)) & 0x0F);
> +    }
> +}
The RISC-V IOMMU also supports wire-signaled interrupts (WSI), so
notification should not be limited to MSI.
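
As a rough illustration only (a standalone model, not QEMU code), the
delivery choice could key off fctl.WSI. All names below are made up for
the sketch, and the fctl.WSI bit position (bit 1) is my reading of the
spec, so please double check it:

```c
#include <stdint.h>

/* Hypothetical standalone model, not QEMU code. */
#define FCTL_WSI (1u << 1)   /* assumed: fctl.WSI, 1 = wire-signaled, 0 = MSI */

enum delivery { DELIVER_NONE, DELIVER_MSI, DELIVER_WSI };

struct iommu_model {
    uint32_t fctl;    /* feature control register */
    uint32_t ipsr;    /* interrupt pending status, one bit per source */
    uint32_t icvec;   /* 4-bit vector field per interrupt source */
};

/*
 * Mark interrupt source 'src' pending and report how it would be delivered.
 * As in the patch, nothing new is signalled if the source was already
 * pending. On delivery, *vector receives the 4-bit vector for the source.
 */
static enum delivery model_notify(struct iommu_model *m, unsigned src,
                                  unsigned *vector)
{
    const uint32_t bit = 1u << src;
    const uint32_t was_pending = m->ipsr & bit;

    m->ipsr |= bit;
    if (was_pending) {
        return DELIVER_NONE;
    }

    *vector = (m->icvec >> (src * 4)) & 0xf;
    return (m->fctl & FCTL_WSI) ? DELIVER_WSI : DELIVER_MSI;
}
```

In the real device the WSI branch would raise a wire instead of calling
the MSI notify hook; how that is plumbed is up to the series.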
> +
> +static void riscv_iommu_fault(RISCVIOMMUState *s,
> +                              struct riscv_iommu_fq_record *ev)
> +{
> +    uint32_t ctrl = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_FQCSR);
> +    uint32_t head = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_FQH) & s->fq_mask;
> +    uint32_t tail = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_FQT) & s->fq_mask;
> +    uint32_t next = (tail + 1) & s->fq_mask;
> +    uint32_t devid = get_field(ev->hdr, RISCV_IOMMU_FQ_HDR_DID);
> +
> +    trace_riscv_iommu_flt(s->parent_obj.id, PCI_BUS_NUM(devid), PCI_SLOT(devid),
> +                          PCI_FUNC(devid), ev->hdr, ev->iotval);
> +
> +    if (!(ctrl & RISCV_IOMMU_FQCSR_FQON) ||
> +        !!(ctrl & (RISCV_IOMMU_FQCSR_FQOF | RISCV_IOMMU_FQCSR_FQMF))) {
> +        return;
> +    }
> +
> +    if (head == next) {
> +        riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_FQCSR,
> +                              RISCV_IOMMU_FQCSR_FQOF, 0);
> +    } else {
> +        dma_addr_t addr = s->fq_addr + tail * sizeof(*ev);
> +        if (dma_memory_write(s->target_as, addr, ev, sizeof(*ev),
> +                             MEMTXATTRS_UNSPECIFIED) != MEMTX_OK) {
> +            riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_FQCSR,
> +                                  RISCV_IOMMU_FQCSR_FQMF, 0);
> +        } else {
> +            riscv_iommu_reg_set32(s, RISCV_IOMMU_REG_FQT, next);
> +        }
> +    }
> +
> +    if (ctrl & RISCV_IOMMU_FQCSR_FIE) {
> +        riscv_iommu_notify(s, RISCV_IOMMU_INTR_FQ);
> +    }
> +}
> +
> +static void riscv_iommu_pri(RISCVIOMMUState *s,
> +    struct riscv_iommu_pq_record *pr)
> +{
> +    uint32_t ctrl = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_PQCSR);
> +    uint32_t head = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_PQH) & s->pq_mask;
> +    uint32_t tail = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_PQT) & s->pq_mask;
> +    uint32_t next = (tail + 1) & s->pq_mask;
> +    uint32_t devid = get_field(pr->hdr, RISCV_IOMMU_PREQ_HDR_DID);
> +
> +    trace_riscv_iommu_pri(s->parent_obj.id, PCI_BUS_NUM(devid), PCI_SLOT(devid),
> +                          PCI_FUNC(devid), pr->payload);
> +
> +    if (!(ctrl & RISCV_IOMMU_PQCSR_PQON) ||
> +        !!(ctrl & (RISCV_IOMMU_PQCSR_PQOF | RISCV_IOMMU_PQCSR_PQMF))) {
> +        return;
> +    }
> +
> +    if (head == next) {
> +        riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_PQCSR,
> +                              RISCV_IOMMU_PQCSR_PQOF, 0);
> +    } else {
> +        dma_addr_t addr = s->pq_addr + tail * sizeof(*pr);
> +        if (dma_memory_write(s->target_as, addr, pr, sizeof(*pr),
> +                             MEMTXATTRS_UNSPECIFIED) != MEMTX_OK) {
> +            riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_PQCSR,
> +                                  RISCV_IOMMU_PQCSR_PQMF, 0);
> +        } else {
> +            riscv_iommu_reg_set32(s, RISCV_IOMMU_REG_PQT, next);
> +        }
> +    }
> +
> +    if (ctrl & RISCV_IOMMU_PQCSR_PIE) {
> +        riscv_iommu_notify(s, RISCV_IOMMU_INTR_PQ);
> +    }
> +}
> +
> +/* Portable implementation of pext_u64, bit-mask extraction. */
> +static uint64_t _pext_u64(uint64_t val, uint64_t ext)
> +{
> +    uint64_t ret = 0;
> +    uint64_t rot = 1;
> +
> +    while (ext) {
> +        if (ext & 1) {
> +            if (val & 1) {
> +                ret |= rot;
> +            }
> +            rot <<= 1;
> +        }
> +        val >>= 1;
> +        ext >>= 1;
> +    }
> +
> +    return ret;
> +}
> +
> +/* Check if GPA matches MSI/MRIF pattern. */
> +static bool riscv_iommu_msi_check(RISCVIOMMUState *s, RISCVIOMMUContext *ctx,
> +    dma_addr_t gpa)
> +{
> +    if (get_field(ctx->msiptp, RISCV_IOMMU_DC_MSIPTP_MODE) !=
> +        RISCV_IOMMU_DC_MSIPTP_MODE_FLAT) {
> +        return false; /* Invalid MSI/MRIF mode */
> +    }
> +
> +    if ((PPN_DOWN(gpa) ^ ctx->msi_addr_pattern) & ~ctx->msi_addr_mask) {
> +        return false; /* GPA not in MSI range defined by AIA IMSIC rules. */
> +    }
> +
> +    return true;
> +}
> +
> +/* RISCV IOMMU Address Translation Lookup - Page Table Walk */
> +static int riscv_iommu_spa_fetch(RISCVIOMMUState *s, RISCVIOMMUContext *ctx,
> +    IOMMUTLBEntry *iotlb)
> +{
> +    /* Early check for MSI address match when IOVA == GPA */
> +    if (iotlb->perm & IOMMU_WO &&
> +        riscv_iommu_msi_check(s, ctx, iotlb->iova)) {
> +        iotlb->target_as = &s->trap_as;
> +        iotlb->translated_addr = iotlb->iova;
> +        iotlb->addr_mask = ~TARGET_PAGE_MASK;
> +        return 0;
> +    }
> +
> +    /* Exit early for pass-through mode. */
> +    iotlb->translated_addr = iotlb->iova;
> +    iotlb->addr_mask = ~TARGET_PAGE_MASK;
> +    /* Allow R/W in pass-through mode */
> +    iotlb->perm = IOMMU_RW;
> +    return 0;
> +}
> +
> +/* Redirect MSI write for given GPA. */
> +static MemTxResult riscv_iommu_msi_write(RISCVIOMMUState *s,
> +    RISCVIOMMUContext *ctx, uint64_t gpa, uint64_t data,
> +    unsigned size, MemTxAttrs attrs)
> +{
> +    MemTxResult res;
> +    dma_addr_t addr;
> +    uint64_t intn;
> +    uint32_t n190;
> +    uint64_t pte[2];
> +
> +    if (!riscv_iommu_msi_check(s, ctx, gpa)) {
> +        return MEMTX_ACCESS_ERROR;
> +    }
> +
> +    /* Interrupt File Number */
> +    intn = _pext_u64(PPN_DOWN(gpa), ctx->msi_addr_mask);
> +    if (intn >= 256) {
> +        /* Interrupt file number out of range */
> +        return MEMTX_ACCESS_ERROR;
> +    }
> +
> +    /* fetch MSI PTE */
> +    addr = PPN_PHYS(get_field(ctx->msiptp, RISCV_IOMMU_DC_MSIPTP_PPN));
> +    addr = addr | (intn * sizeof(pte));
> +    res = dma_memory_read(s->target_as, addr, &pte, sizeof(pte),
> +            MEMTXATTRS_UNSPECIFIED);
> +    if (res != MEMTX_OK) {
> +        return res;
> +    }
> +
> +    le64_to_cpus(&pte[0]);
> +    le64_to_cpus(&pte[1]);
> +
> +    if (!(pte[0] & RISCV_IOMMU_MSI_PTE_V) || (pte[0] & RISCV_IOMMU_MSI_PTE_C)) {
> +        return MEMTX_ACCESS_ERROR;
> +    }
> +
> +    switch (get_field(pte[0], RISCV_IOMMU_MSI_PTE_M)) {
> +    case RISCV_IOMMU_MSI_PTE_M_BASIC:
> +        /* MSI Pass-through mode */
> +        addr = PPN_PHYS(get_field(pte[0], RISCV_IOMMU_MSI_PTE_PPN));
> +        addr = addr | (gpa & TARGET_PAGE_MASK);
> +
> +        trace_riscv_iommu_msi(s->parent_obj.id, PCI_BUS_NUM(ctx->devid),
> +                              PCI_SLOT(ctx->devid), PCI_FUNC(ctx->devid),
> +                              gpa, addr);
> +
> +        return dma_memory_write(s->target_as, addr, &data, size, attrs);
> +    case RISCV_IOMMU_MSI_PTE_M_MRIF:
> +        /* MRIF mode, continue. */
> +        break;
> +    default:
> +        return MEMTX_ACCESS_ERROR;
> +    }
> +
> +    /*
> +     * Report an error for interrupt identities exceeding the maximum allowed
> +     * for an IMSIC interrupt file (2047) or destination address is not 32-bit
> +     * aligned. See IOMMU Specification, Chapter 2.3. MSI page tables.
> +     */
> +    if ((data > 2047) || (gpa & 3)) {
> +        return MEMTX_ACCESS_ERROR;
> +    }
> +
> +    /* MSI MRIF mode, non atomic pending bit update */
> +
> +    /* MRIF pending bit address */
> +    addr = get_field(pte[0], RISCV_IOMMU_MSI_PTE_MRIF_ADDR) << 9;
> +    addr = addr | ((data & 0x7c0) >> 3);
> +
> +    trace_riscv_iommu_msi(s->parent_obj.id, PCI_BUS_NUM(ctx->devid),
> +                          PCI_SLOT(ctx->devid), PCI_FUNC(ctx->devid),
> +                          gpa, addr);
> +
> +    /* MRIF pending bit mask */
> +    data = 1ULL << (data & 0x03f);
> +    res = dma_memory_read(s->target_as, addr, &intn, sizeof(intn), attrs);
> +    if (res != MEMTX_OK) {
> +        return res;
> +    }
> +    intn = intn | data;
> +    res = dma_memory_write(s->target_as, addr, &intn, sizeof(intn), attrs);
> +    if (res != MEMTX_OK) {
> +        return res;
> +    }
> +
> +    /* Get MRIF enable bits */
> +    addr = addr + sizeof(intn);
> +    res = dma_memory_read(s->target_as, addr, &intn, sizeof(intn), attrs);
> +    if (res != MEMTX_OK) {
> +        return res;
> +    }
> +    if (!(intn & data)) {
> +        /* notification disabled, MRIF update completed. */
> +        return MEMTX_OK;
> +    }
> +
> +    /* Send notification message */
> +    addr = PPN_PHYS(get_field(pte[1], RISCV_IOMMU_MSI_MRIF_NPPN));
> +    n190 = get_field(pte[1], RISCV_IOMMU_MSI_MRIF_NID) |
> +          (get_field(pte[1], RISCV_IOMMU_MSI_MRIF_NID_MSB) << 10);
> +
> +    res = dma_memory_write(s->target_as, addr, &n190, sizeof(n190), attrs);
> +    if (res != MEMTX_OK) {
> +        return res;
> +    }
> +
> +    return MEMTX_OK;
> +}
> +
> +/*
> + * RISC-V IOMMU Device Context Lookup - Device Directory Tree Walk
> + *
> + * @s         : IOMMU Device State
> + * @ctx       : Device Translation Context with devid and pasid set.
> + * @return    : success or fault code.
> + */
> +static int riscv_iommu_ctx_fetch(RISCVIOMMUState *s, RISCVIOMMUContext *ctx)
> +{
> +    const uint64_t ddtp = s->ddtp;
> +    unsigned mode = get_field(ddtp, RISCV_IOMMU_DDTP_MODE);
> +    dma_addr_t addr = PPN_PHYS(get_field(ddtp, RISCV_IOMMU_DDTP_PPN));
> +    struct riscv_iommu_dc dc;
> +    /* Device Context format: 0: extended (64 bytes) | 1: base (32 bytes) */
> +    const int dc_fmt = !s->enable_msi;
> +    const size_t dc_len = sizeof(dc) >> dc_fmt;
> +    unsigned depth;
> +    uint64_t de;
> +
> +    switch (mode) {
> +    case RISCV_IOMMU_DDTP_MODE_OFF:
> +        return RISCV_IOMMU_FQ_CAUSE_DMA_DISABLED;
> +
> +    case RISCV_IOMMU_DDTP_MODE_BARE:
> +        /* mock up pass-through translation context */
> +        ctx->tc = RISCV_IOMMU_DC_TC_V;
> +        ctx->ta = 0;
> +        ctx->msiptp = 0;
> +        return 0;
> +
> +    case RISCV_IOMMU_DDTP_MODE_1LVL:
> +        depth = 0;
> +        break;
> +
> +    case RISCV_IOMMU_DDTP_MODE_2LVL:
> +        depth = 1;
> +        break;
> +
> +    case RISCV_IOMMU_DDTP_MODE_3LVL:
> +        depth = 2;
> +        break;
> +
> +    default:
> +        return RISCV_IOMMU_FQ_CAUSE_DDT_MISCONFIGURED;
> +    }
> +
> +    /*
> +     * Check supported device id width (in bits).
> +     * See IOMMU Specification, Chapter 6. Software guidelines.
> +     * - if extended device-context format is used:
> +     *   1LVL: 6, 2LVL: 15, 3LVL: 24
> +     * - if base device-context format is used:
> +     *   1LVL: 7, 2LVL: 16, 3LVL: 24
> +     */
> +    if (ctx->devid >= (1 << (depth * 9 + 6 + (dc_fmt && depth != 2)))) {
> +        return RISCV_IOMMU_FQ_CAUSE_DDT_INVALID;

The cause should be 260, not 258.

From the RISC-V IOMMU Architecture Spec v1.0.0 section 2.3:
If the device_id is wider than that supported by the IOMMU mode, as 
determined by the following checks then stop and report "Transaction 
type disallowed" (cause = 260).
a. ddtp.iommu_mode is 2LVL and DDI[2] is not 0
b. ddtp.iommu_mode is 1LVL and either DDI[2] is not 0 or DDI[1] is not 0
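
To illustrate, here is a minimal standalone sketch of that check (not
QEMU code). The enum, macro and function names are made up for the
example, and the DDI bit layout is taken from the comments in this patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for this sketch. */
enum ddt_mode { DDT_1LVL, DDT_2LVL, DDT_3LVL };
#define FQ_CAUSE_TTYPE_BLOCKED 260  /* "Transaction type disallowed" */

/*
 * Split a 24-bit device_id into DDI[0..2].
 * Extended format: [23:15][14:6][5:0]; base format: [23:16][15:7][6:0].
 */
static void ddi_split(uint32_t devid, bool base_format, uint32_t ddi[3])
{
    const unsigned lo = base_format ? 7 : 6;   /* width of DDI[0] */

    ddi[0] = devid & ((1u << lo) - 1);
    ddi[1] = (devid >> lo) & 0x1ff;            /* DDI[1] is 9 bits wide */
    ddi[2] = (devid >> (lo + 9)) & 0x1ff;      /* remaining upper bits */
}

/* Return 0 if device_id fits the DDT mode, otherwise cause 260. */
static int devid_width_check(enum ddt_mode mode, uint32_t devid,
                             bool base_format)
{
    uint32_t ddi[3];

    assert(devid < (1u << 24));                /* device_id is 24 bits max */
    ddi_split(devid, base_format, ddi);

    if (mode == DDT_2LVL && ddi[2] != 0) {
        return FQ_CAUSE_TTYPE_BLOCKED;
    }
    if (mode == DDT_1LVL && (ddi[2] != 0 || ddi[1] != 0)) {
        return FQ_CAUSE_TTYPE_BLOCKED;
    }
    return 0;
}
```

In 3LVL mode the check never fails, since every 24-bit device_id fits.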

> +    }
> +
> +    /* Device directory tree walk */
> +    for (; depth-- > 0; ) {
> +        /*
> +         * Select device id index bits based on device directory tree level
> +         * and device context format.
> +         * See IOMMU Specification, Chapter 2. Data Structures.
> +         * - if extended device-context format is used:
> +         *   device index: [23:15][14:6][5:0]
> +         * - if base device-context format is used:
> +         *   device index: [23:16][15:7][6:0]
> +         */
> +        const int split = depth * 9 + 6 + dc_fmt;
> +        addr |= ((ctx->devid >> split) << 3) & ~TARGET_PAGE_MASK;
> +        if (dma_memory_read(s->target_as, addr, &de, sizeof(de),
> +                            MEMTXATTRS_UNSPECIFIED) != MEMTX_OK) {
> +            return RISCV_IOMMU_FQ_CAUSE_DDT_LOAD_FAULT;
> +        }
> +        le64_to_cpus(&de);
> +        if (!(de & RISCV_IOMMU_DDTE_VALID)) {
> +            /* invalid directory entry */
> +            return RISCV_IOMMU_FQ_CAUSE_DDT_INVALID;
> +        }
> +        if (de & ~(RISCV_IOMMU_DDTE_PPN | RISCV_IOMMU_DDTE_VALID)) {
> +            /* reserved bits set */
> +            return RISCV_IOMMU_FQ_CAUSE_DDT_INVALID;

The cause should be 259, not 258.

From the RISC-V IOMMU Architecture Spec v1.0.0 section 2.3.1:
If any bits or encoding that are reserved for future standard use are 
set within ddte, stop and report "DDT entry misconfigured" (cause = 259).
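
A small standalone sketch of the two distinct causes (not QEMU code).
The macro and function names are hypothetical, and the non-leaf entry
layout (V in bit 0, PPN in bits 53:10, everything else reserved) is my
reading of the spec:

```c
#include <stdint.h>

/* Hypothetical names; assumed non-leaf DDT entry layout. */
#define DDTE_V    (1ULL << 0)                   /* valid bit */
#define DDTE_PPN  (((1ULL << 44) - 1) << 10)    /* PPN, bits 53:10 */

#define FQ_CAUSE_DDT_INVALID        258  /* "DDT entry not valid" */
#define FQ_CAUSE_DDT_MISCONFIGURED  259  /* "DDT entry misconfigured" */

/* Return 0 if the non-leaf entry is usable, otherwise the fault cause. */
static int ddte_check(uint64_t ddte)
{
    if (!(ddte & DDTE_V)) {
        return FQ_CAUSE_DDT_INVALID;
    }
    if (ddte & ~(DDTE_PPN | DDTE_V)) {
        /* a reserved bit is set */
        return FQ_CAUSE_DDT_MISCONFIGURED;
    }
    return 0;
}
```

The valid bit is tested first, so an entry with V=0 reports 258 even if
reserved bits are also set.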

> +        }
> +        addr = PPN_PHYS(get_field(de, RISCV_IOMMU_DDTE_PPN));
> +    }
> +
> +    /* index into device context entry page */
> +    addr |= (ctx->devid * dc_len) & ~TARGET_PAGE_MASK;
> +
> +    memset(&dc, 0, sizeof(dc));
> +    if (dma_memory_read(s->target_as, addr, &dc, dc_len,
> +                        MEMTXATTRS_UNSPECIFIED) != MEMTX_OK) {
> +        return RISCV_IOMMU_FQ_CAUSE_DDT_LOAD_FAULT;
> +    }
> +
> +    /* Set translation context. */
> +    ctx->tc = le64_to_cpu(dc.tc);
> +    ctx->ta = le64_to_cpu(dc.ta);
> +    ctx->msiptp = le64_to_cpu(dc.msiptp);
> +    ctx->msi_addr_mask = le64_to_cpu(dc.msi_addr_mask);
> +    ctx->msi_addr_pattern = le64_to_cpu(dc.msi_addr_pattern);
> +
According to the RISC-V IOMMU Architecture Spec v1.0.0 section 2.1.4, the
fetched device context should be checked for misconfiguration before it
is used.
> +    if (!(ctx->tc & RISCV_IOMMU_DC_TC_V)) {
> +        return RISCV_IOMMU_FQ_CAUSE_DDT_INVALID;
> +    }
> +
> +    if (!(ctx->tc & RISCV_IOMMU_DC_TC_PDTV)) {
> +        if (ctx->pasid != RISCV_IOMMU_NOPASID) {
> +            /* PASID is disabled */
> +            return RISCV_IOMMU_FQ_CAUSE_TTYPE_BLOCKED;
> +        }
> +        return 0;
> +    }
> +
> +    /* FSC.TC.PDTV enabled */
> +    if (mode > RISCV_IOMMU_DC_FSC_PDTP_MODE_PD20) {
> +        /* Invalid PDTP.MODE */
> +        return RISCV_IOMMU_FQ_CAUSE_PDT_MISCONFIGURED;
> +    }
> +
> +    for (depth = mode - RISCV_IOMMU_DC_FSC_PDTP_MODE_PD8; depth-- > 0; ) {
> +        /*
> +         * Select process id index bits based on process directory tree
> +         * level. See IOMMU Specification, 2.2. Process-Directory-Table.
> +         */
> +        const int split = depth * 9 + 8;
> +        addr |= ((ctx->pasid >> split) << 3) & ~TARGET_PAGE_MASK;
> +        if (dma_memory_read(s->target_as, addr, &de, sizeof(de),
> +                            MEMTXATTRS_UNSPECIFIED) != MEMTX_OK) {
> +            return RISCV_IOMMU_FQ_CAUSE_PDT_LOAD_FAULT;
> +        }
> +        le64_to_cpus(&de);
> +        if (!(de & RISCV_IOMMU_PC_TA_V)) {
> +            return RISCV_IOMMU_FQ_CAUSE_PDT_INVALID;
> +        }
> +        addr = PPN_PHYS(get_field(de, RISCV_IOMMU_PC_FSC_PPN));
> +    }
> +
> +    /* Leaf entry in PDT */
> +    addr |= (ctx->pasid << 4) & ~TARGET_PAGE_MASK;
> +    if (dma_memory_read(s->target_as, addr, &dc.ta, sizeof(uint64_t) * 2,
> +                        MEMTXATTRS_UNSPECIFIED) != MEMTX_OK) {
> +        return RISCV_IOMMU_FQ_CAUSE_PDT_LOAD_FAULT;
> +    }
> +
According to the RISC-V IOMMU Architecture Spec v1.0.0 section 2.2.4, the
fetched process context should be checked for misconfiguration before it
is used.
> +    /* Use FSC and TA from process directory entry. */
> +    ctx->ta = le64_to_cpu(dc.ta);
> +
> +    return 0;
> +}
> +
> +/* Translation Context cache support */
> +static gboolean __ctx_equal(gconstpointer v1, gconstpointer v2)
> +{
> +    RISCVIOMMUContext *c1 = (RISCVIOMMUContext *) v1;
> +    RISCVIOMMUContext *c2 = (RISCVIOMMUContext *) v2;
> +    return c1->devid == c2->devid && c1->pasid == c2->pasid;
> +}
> +
> +static guint __ctx_hash(gconstpointer v)
> +{
> +    RISCVIOMMUContext *ctx = (RISCVIOMMUContext *) v;
> +    /* Generate simple hash of (pasid, devid), assuming 24-bit wide devid */
> +    return (guint)(ctx->devid) + ((guint)(ctx->pasid) << 24);
> +}
> +
> +static void __ctx_inval_devid_pasid(gpointer key, gpointer value, gpointer data)
> +{
> +    RISCVIOMMUContext *ctx = (RISCVIOMMUContext *) value;
> +    RISCVIOMMUContext *arg = (RISCVIOMMUContext *) data;
> +    if (ctx->tc & RISCV_IOMMU_DC_TC_V &&
> +        ctx->devid == arg->devid &&
> +        ctx->pasid == arg->pasid) {
> +        ctx->tc &= ~RISCV_IOMMU_DC_TC_V;
> +    }
> +}
> +
> +static void __ctx_inval_devid(gpointer key, gpointer value, gpointer data)
> +{
> +    RISCVIOMMUContext *ctx = (RISCVIOMMUContext *) value;
> +    RISCVIOMMUContext *arg = (RISCVIOMMUContext *) data;
> +    if (ctx->tc & RISCV_IOMMU_DC_TC_V &&
> +        ctx->devid == arg->devid) {
> +        ctx->tc &= ~RISCV_IOMMU_DC_TC_V;
> +    }
> +}
> +
> +static void __ctx_inval_all(gpointer key, gpointer value, gpointer data)
> +{
> +    RISCVIOMMUContext *ctx = (RISCVIOMMUContext *) value;
> +    if (ctx->tc & RISCV_IOMMU_DC_TC_V) {
> +        ctx->tc &= ~RISCV_IOMMU_DC_TC_V;
> +    }
> +}
> +
> +static void riscv_iommu_ctx_inval(RISCVIOMMUState *s, GHFunc func,
> +    uint32_t devid, uint32_t pasid)
> +{
> +    GHashTable *ctx_cache;
> +    RISCVIOMMUContext key = {
> +        .devid = devid,
> +        .pasid = pasid,
> +    };
> +    ctx_cache = g_hash_table_ref(s->ctx_cache);
> +    g_hash_table_foreach(ctx_cache, func, &key);
> +    g_hash_table_unref(ctx_cache);
> +}
> +
> +/* Find or allocate translation context for a given {device_id, process_id} */
> +static RISCVIOMMUContext *riscv_iommu_ctx(RISCVIOMMUState *s,
> +    unsigned devid, unsigned pasid, void **ref)
> +{
> +    GHashTable *ctx_cache;
> +    RISCVIOMMUContext *ctx;
> +    RISCVIOMMUContext key = {
> +        .devid = devid,
> +        .pasid = pasid,
> +    };
> +
> +    ctx_cache = g_hash_table_ref(s->ctx_cache);
> +    ctx = g_hash_table_lookup(ctx_cache, &key);
> +
> +    if (ctx && (ctx->tc & RISCV_IOMMU_DC_TC_V)) {
> +        *ref = ctx_cache;
> +        return ctx;
> +    }
> +
> +    if (g_hash_table_size(s->ctx_cache) >= LIMIT_CACHE_CTX) {
> +        ctx_cache = g_hash_table_new_full(__ctx_hash, __ctx_equal,
> +                                          g_free, NULL);
> +        g_hash_table_unref(qatomic_xchg(&s->ctx_cache, ctx_cache));
> +    }
> +
> +    ctx = g_new0(RISCVIOMMUContext, 1);
> +    ctx->devid = devid;
> +    ctx->pasid = pasid;
> +
> +    int fault = riscv_iommu_ctx_fetch(s, ctx);
> +    if (!fault) {
> +        g_hash_table_add(ctx_cache, ctx);
> +        *ref = ctx_cache;
> +        return ctx;
> +    }
> +
> +    g_hash_table_unref(ctx_cache);
> +    *ref = NULL;
> +
> +    if (!(ctx->tc & RISCV_IOMMU_DC_TC_DTF)) {
> +        struct riscv_iommu_fq_record ev = { 0 };
> +        ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_CAUSE, fault);
> +        ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_TTYPE,
> +            RISCV_IOMMU_FQ_TTYPE_UADDR_RD);
> +        ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_DID, devid);
> +        ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_PID, pasid);
> +        ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_PV, !!pasid);
> +        riscv_iommu_fault(s, &ev);
> +    }
> +
> +    g_free(ctx);
> +    return NULL;
> +}
> +
> +static void riscv_iommu_ctx_put(RISCVIOMMUState *s, void *ref)
> +{
> +    if (ref) {
> +        g_hash_table_unref((GHashTable *)ref);
> +    }
> +}
> +
> +/* Find or allocate address space for a given device */
> +static AddressSpace *riscv_iommu_space(RISCVIOMMUState *s, uint32_t devid)
> +{
> +    RISCVIOMMUSpace *as;
> +
> +    /* FIXME: PCIe bus remapping for attached endpoints. */
> +    devid |= s->bus << 8;
> +
> +    qemu_mutex_lock(&s->core_lock);
> +    QLIST_FOREACH(as, &s->spaces, list) {
> +        if (as->devid == devid) {
> +            break;
> +        }
> +    }
> +    qemu_mutex_unlock(&s->core_lock);
> +
> +    if (as == NULL) {
> +        char name[64];
> +        as = g_new0(RISCVIOMMUSpace, 1);
> +
> +        as->iommu = s;
> +        as->devid = devid;
> +
> +        snprintf(name, sizeof(name), "riscv-iommu-%04x:%02x.%d-iova",
> +            PCI_BUS_NUM(as->devid), PCI_SLOT(as->devid), PCI_FUNC(as->devid));
> +
> +        /* IOVA address space, untranslated addresses */
> +        memory_region_init_iommu(&as->iova_mr, sizeof(as->iova_mr),
> +            TYPE_RISCV_IOMMU_MEMORY_REGION,
> +            OBJECT(as), name, UINT64_MAX);
> +        address_space_init(&as->iova_as, MEMORY_REGION(&as->iova_mr),
> +            TYPE_RISCV_IOMMU_PCI);
> +
> +        qemu_mutex_lock(&s->core_lock);
> +        QLIST_INSERT_HEAD(&s->spaces, as, list);
> +        qemu_mutex_unlock(&s->core_lock);
> +
> +        trace_riscv_iommu_new(s->parent_obj.id, PCI_BUS_NUM(as->devid),
> +                PCI_SLOT(as->devid), PCI_FUNC(as->devid));
> +    }
> +    return &as->iova_as;
> +}
> +
> +static int riscv_iommu_translate(RISCVIOMMUState *s, RISCVIOMMUContext *ctx,
> +    IOMMUTLBEntry *iotlb)
> +{
> +    bool enable_faults;
> +    bool enable_pasid;
> +    bool enable_pri;
> +    int fault;
> +
> +    enable_faults = !(ctx->tc & RISCV_IOMMU_DC_TC_DTF);
> +    /*
> +     * TC[32] is reserved for custom extensions, used here to temporarily
> +     * enable automatic page-request generation for ATS queries.
> +     */
> +    enable_pri = (iotlb->perm == IOMMU_NONE) && (ctx->tc & BIT_ULL(32));
> +    enable_pasid = (ctx->tc & RISCV_IOMMU_DC_TC_PDTV);
> +
> +    /* Translate using device directory / page table information. */
> +    fault = riscv_iommu_spa_fetch(s, ctx, iotlb);
> +
> +    if (enable_pri && fault) {
> +        struct riscv_iommu_pq_record pr = {0};
> +        if (enable_pasid) {
> +            pr.hdr = set_field(RISCV_IOMMU_PREQ_HDR_PV,
> +                RISCV_IOMMU_PREQ_HDR_PID, ctx->pasid);
> +        }
> +        pr.hdr = set_field(pr.hdr, RISCV_IOMMU_PREQ_HDR_DID, ctx->devid);
> +        pr.payload = (iotlb->iova & TARGET_PAGE_MASK) |
> +                     RISCV_IOMMU_PREQ_PAYLOAD_M;
> +        riscv_iommu_pri(s, &pr);
> +        return fault;
> +    }
> +
> +    if (enable_faults && fault) {
> +        struct riscv_iommu_fq_record ev;
> +        unsigned ttype;
> +
> +        if (iotlb->perm & IOMMU_RW) {
> +            ttype = RISCV_IOMMU_FQ_TTYPE_UADDR_WR;
> +        } else {
> +            ttype = RISCV_IOMMU_FQ_TTYPE_UADDR_RD;
> +        }
> +        ev.hdr = set_field(0, RISCV_IOMMU_FQ_HDR_CAUSE, fault);
> +        ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_TTYPE, ttype);
> +        ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_PV, enable_pasid);
> +        ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_PID, ctx->pasid);
> +        ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_DID, ctx->devid);
> +        ev.iotval    = iotlb->iova;
> +        ev.iotval2   = iotlb->translated_addr;
> +        ev._reserved = 0;
> +        riscv_iommu_fault(s, &ev);
> +        return fault;
> +    }
> +
> +    return 0;
> +}
> +
> +/* IOMMU Command Interface */
> +static MemTxResult riscv_iommu_iofence(RISCVIOMMUState *s, bool notify,
> +    uint64_t addr, uint32_t data)
> +{
> +    /*
> +     * ATS processing in this implementation of the IOMMU is synchronous,
> +     * no need to wait for completions here.
> +     */
> +    if (!notify) {
> +        return MEMTX_OK;
> +    }
> +
> +    return dma_memory_write(s->target_as, addr, &data, sizeof(data),
> +        MEMTXATTRS_UNSPECIFIED);
> +}
> +
> +static void riscv_iommu_process_ddtp(RISCVIOMMUState *s)
> +{
> +    uint64_t old_ddtp = s->ddtp;
> +    uint64_t new_ddtp = riscv_iommu_reg_get64(s, RISCV_IOMMU_REG_DDTP);
> +    unsigned new_mode = get_field(new_ddtp, RISCV_IOMMU_DDTP_MODE);
> +    unsigned old_mode = get_field(old_ddtp, RISCV_IOMMU_DDTP_MODE);
> +    bool ok = false;
> +
> +    /*
> +     * Check for allowed DDTP.MODE transitions:
> +     * {OFF, BARE}        -> {OFF, BARE, 1LVL, 2LVL, 3LVL}
> +     * {1LVL, 2LVL, 3LVL} -> {OFF, BARE}
> +     */
> +    if (new_mode == old_mode ||
> +        new_mode == RISCV_IOMMU_DDTP_MODE_OFF ||
> +        new_mode == RISCV_IOMMU_DDTP_MODE_BARE) {
> +        ok = true;
> +    } else if (new_mode == RISCV_IOMMU_DDTP_MODE_1LVL ||
> +               new_mode == RISCV_IOMMU_DDTP_MODE_2LVL ||
> +               new_mode == RISCV_IOMMU_DDTP_MODE_3LVL) {
> +        ok = old_mode == RISCV_IOMMU_DDTP_MODE_OFF ||
> +             old_mode == RISCV_IOMMU_DDTP_MODE_BARE;
> +    }
> +
> +    if (ok) {
> +        /* clear reserved and busy bits, report back sanitized version */
> +        new_ddtp = set_field(new_ddtp & RISCV_IOMMU_DDTP_PPN,
> +                             RISCV_IOMMU_DDTP_MODE, new_mode);
> +    } else {
> +        new_ddtp = old_ddtp;
> +    }
> +    s->ddtp = new_ddtp;
> +
> +    riscv_iommu_reg_set64(s, RISCV_IOMMU_REG_DDTP, new_ddtp);
> +}
> +
> +/* Command function and opcode field. */
> +#define RISCV_IOMMU_CMD(func, op) (((func) << 7) | (op))
> +
> +static void riscv_iommu_process_cq_tail(RISCVIOMMUState *s)
> +{
> +    struct riscv_iommu_command cmd;
> +    MemTxResult res;
> +    dma_addr_t addr;
> +    uint32_t tail, head, ctrl;
> +    uint64_t cmd_opcode;
> +    GHFunc func;
> +
> +    ctrl = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_CQCSR);
> +    tail = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_CQT) & s->cq_mask;
> +    head = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_CQH) & s->cq_mask;
> +
> +    /* Check for pending error or queue processing disabled */
> +    if (!(ctrl & RISCV_IOMMU_CQCSR_CQON) ||
> +        !!(ctrl & (RISCV_IOMMU_CQCSR_CMD_ILL | RISCV_IOMMU_CQCSR_CQMF))) {
> +        return;
> +    }
> +
> +    while (tail != head) {
> +        addr = s->cq_addr  + head * sizeof(cmd);
> +        res = dma_memory_read(s->target_as, addr, &cmd, sizeof(cmd),
> +                              MEMTXATTRS_UNSPECIFIED);
> +
> +        if (res != MEMTX_OK) {
> +            riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_CQCSR,
> +                                  RISCV_IOMMU_CQCSR_CQMF, 0);
> +            goto fault;
> +        }
> +
> +        trace_riscv_iommu_cmd(s->parent_obj.id, cmd.dword0, cmd.dword1);
> +
> +        cmd_opcode = get_field(cmd.dword0,
> +                               RISCV_IOMMU_CMD_OPCODE | RISCV_IOMMU_CMD_FUNC);
> +
> +        switch (cmd_opcode) {
> +        case RISCV_IOMMU_CMD(RISCV_IOMMU_CMD_IOFENCE_FUNC_C,
> +                             RISCV_IOMMU_CMD_IOFENCE_OPCODE):
> +            res = riscv_iommu_iofence(s,
> +                cmd.dword0 & RISCV_IOMMU_CMD_IOFENCE_AV, cmd.dword1,
> +                get_field(cmd.dword0, RISCV_IOMMU_CMD_IOFENCE_DATA));
> +
> +            if (res != MEMTX_OK) {
> +                riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_CQCSR,
> +                                      RISCV_IOMMU_CQCSR_CQMF, 0);
> +                goto fault;
> +            }
> +            break;
> +
> +        case RISCV_IOMMU_CMD(RISCV_IOMMU_CMD_IOTINVAL_FUNC_GVMA,
> +                             RISCV_IOMMU_CMD_IOTINVAL_OPCODE):
> +            if (cmd.dword0 & RISCV_IOMMU_CMD_IOTINVAL_PSCV) {
> +                /* illegal command arguments IOTINVAL.GVMA & PSCV == 1 */
> +                goto cmd_ill;
> +            }
> +            /* translation cache not implemented yet */
> +            break;
> +
> +        case RISCV_IOMMU_CMD(RISCV_IOMMU_CMD_IOTINVAL_FUNC_VMA,
> +                             RISCV_IOMMU_CMD_IOTINVAL_OPCODE):
> +            /* translation cache not implemented yet */
> +            break;
> +
> +        case RISCV_IOMMU_CMD(RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_DDT,
> +                             RISCV_IOMMU_CMD_IODIR_OPCODE):
> +            if (!(cmd.dword0 & RISCV_IOMMU_CMD_IODIR_DV)) {
> +                /* invalidate all device context cache mappings */
> +                func = __ctx_inval_all;
> +            } else {
> +                /* invalidate all device context matching DID */
> +                func = __ctx_inval_devid;
> +            }
> +            riscv_iommu_ctx_inval(s, func,
> +                get_field(cmd.dword0, RISCV_IOMMU_CMD_IODIR_DID), 0);
> +            break;
> +
> +        case RISCV_IOMMU_CMD(RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_PDT,
> +                             RISCV_IOMMU_CMD_IODIR_OPCODE):
> +            if (!(cmd.dword0 & RISCV_IOMMU_CMD_IODIR_DV)) {
> +                /* illegal command arguments IODIR_PDT & DV == 0 */
> +                goto cmd_ill;
> +            } else {
> +                func = __ctx_inval_devid_pasid;
> +            }
> +            riscv_iommu_ctx_inval(s, func,
> +                get_field(cmd.dword0, RISCV_IOMMU_CMD_IODIR_DID),
> +                get_field(cmd.dword0, RISCV_IOMMU_CMD_IODIR_PID));
> +            break;
> +
> +        default:
> +        cmd_ill:
> +            /* Invalid instruction, do not advance instruction index. */
> +            riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_CQCSR,
> +                RISCV_IOMMU_CQCSR_CMD_ILL, 0);
> +            goto fault;
> +        }
> +
> +        /* Advance and update head pointer after command completes. */
> +        head = (head + 1) & s->cq_mask;
> +        riscv_iommu_reg_set32(s, RISCV_IOMMU_REG_CQH, head);
> +    }
> +    return;
> +
> +fault:
> +    if (ctrl & RISCV_IOMMU_CQCSR_CIE) {
> +        riscv_iommu_notify(s, RISCV_IOMMU_INTR_CQ);
> +    }
> +}
> +
> +static void riscv_iommu_process_cq_control(RISCVIOMMUState *s)
> +{
> +    uint64_t base;
> +    uint32_t ctrl_set = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_CQCSR);
> +    uint32_t ctrl_clr;
> +    bool enable = !!(ctrl_set & RISCV_IOMMU_CQCSR_CQEN);
> +    bool active = !!(ctrl_set & RISCV_IOMMU_CQCSR_CQON);
> +
> +    if (enable && !active) {
> +        base = riscv_iommu_reg_get64(s, RISCV_IOMMU_REG_CQB);
> +        s->cq_mask = (2ULL << get_field(base, RISCV_IOMMU_CQB_LOG2SZ)) - 1;
> +        s->cq_addr = PPN_PHYS(get_field(base, RISCV_IOMMU_CQB_PPN));
> +        stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_CQT], ~s->cq_mask);
> +        stl_le_p(&s->regs_rw[RISCV_IOMMU_REG_CQH], 0);
> +        stl_le_p(&s->regs_rw[RISCV_IOMMU_REG_CQT], 0);
> +        ctrl_set = RISCV_IOMMU_CQCSR_CQON;
> +        ctrl_clr = RISCV_IOMMU_CQCSR_BUSY | RISCV_IOMMU_CQCSR_CQMF |
> +            RISCV_IOMMU_CQCSR_CMD_ILL | RISCV_IOMMU_CQCSR_CMD_TO;
cqcsr.fence_w_ip should be set to 0 as well.
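
To illustrate the point, here is a minimal standalone sketch of the off-to-on transition with fence_w_ip included in the clear mask. It is not QEMU code: the CQCSR_* macros and cqcsr_on_enable() are hypothetical stand-ins, and the bit positions are assumed from the cqcsr layout in the spec, so please check them against the header used by the series.

```c
#include <stdint.h>

/* Assumed cqcsr bit positions; verify against the spec's register layout. */
#define CQCSR_CQMF       (1u << 8)
#define CQCSR_CMD_TO     (1u << 9)
#define CQCSR_CMD_ILL    (1u << 10)
#define CQCSR_FENCE_W_IP (1u << 11)
#define CQCSR_CQON       (1u << 16)
#define CQCSR_BUSY       (1u << 17)

/* Hypothetical helper: cqcsr value after the queue goes from off to on. */
static uint32_t cqcsr_on_enable(uint32_t cqcsr)
{
    uint32_t set = CQCSR_CQON;
    /* Same clear mask as the patch, plus fence_w_ip. */
    uint32_t clr = CQCSR_BUSY | CQCSR_CQMF | CQCSR_CMD_ILL |
                   CQCSR_CMD_TO | CQCSR_FENCE_W_IP;

    /* Same update order as riscv_iommu_reg_mod32(): clear first, then set. */
    return (cqcsr & ~clr) | set;
}
```

In the patch itself this would presumably amount to OR-ing the fence_w_ip bit into ctrl_clr in the branch above.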
> +    } else if (!enable && active) {
> +        stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_CQT], ~0);
> +        ctrl_set = 0;
> +        ctrl_clr = RISCV_IOMMU_CQCSR_BUSY | RISCV_IOMMU_CQCSR_CQON;
> +    } else {
> +        ctrl_set = 0;
> +        ctrl_clr = RISCV_IOMMU_CQCSR_BUSY;
> +    }
> +
> +    riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_CQCSR, ctrl_set, ctrl_clr);
> +}
> +
> +static void riscv_iommu_process_fq_control(RISCVIOMMUState *s)
> +{
> +    uint64_t base;
> +    uint32_t ctrl_set = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_FQCSR);
> +    uint32_t ctrl_clr;
> +    bool enable = !!(ctrl_set & RISCV_IOMMU_FQCSR_FQEN);
> +    bool active = !!(ctrl_set & RISCV_IOMMU_FQCSR_FQON);
> +
> +    if (enable && !active) {
> +        base = riscv_iommu_reg_get64(s, RISCV_IOMMU_REG_FQB);
> +        s->fq_mask = (2ULL << get_field(base, RISCV_IOMMU_FQB_LOG2SZ)) - 1;
> +        s->fq_addr = PPN_PHYS(get_field(base, RISCV_IOMMU_FQB_PPN));
> +        stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_FQH], ~s->fq_mask);
> +        stl_le_p(&s->regs_rw[RISCV_IOMMU_REG_FQH], 0);
> +        stl_le_p(&s->regs_rw[RISCV_IOMMU_REG_FQT], 0);
> +        ctrl_set = RISCV_IOMMU_FQCSR_FQON;
> +        ctrl_clr = RISCV_IOMMU_FQCSR_BUSY | RISCV_IOMMU_FQCSR_FQMF |
> +            RISCV_IOMMU_FQCSR_FQOF;
> +    } else if (!enable && active) {
> +        stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_FQH], ~0);
> +        ctrl_set = 0;
> +        ctrl_clr = RISCV_IOMMU_FQCSR_BUSY | RISCV_IOMMU_FQCSR_FQON;
> +    } else {
> +        ctrl_set = 0;
> +        ctrl_clr = RISCV_IOMMU_FQCSR_BUSY;
> +    }
> +
> +    riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_FQCSR, ctrl_set, ctrl_clr);
> +}
> +
> +static void riscv_iommu_process_pq_control(RISCVIOMMUState *s)
> +{
> +    uint64_t base;
> +    uint32_t ctrl_set = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_PQCSR);
> +    uint32_t ctrl_clr;
> +    bool enable = !!(ctrl_set & RISCV_IOMMU_PQCSR_PQEN);
> +    bool active = !!(ctrl_set & RISCV_IOMMU_PQCSR_PQON);
> +
> +    if (enable && !active) {
> +        base = riscv_iommu_reg_get64(s, RISCV_IOMMU_REG_PQB);
> +        s->pq_mask = (2ULL << get_field(base, RISCV_IOMMU_PQB_LOG2SZ)) - 1;
> +        s->pq_addr = PPN_PHYS(get_field(base, RISCV_IOMMU_PQB_PPN));
> +        stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_PQH], ~s->pq_mask);
> +        stl_le_p(&s->regs_rw[RISCV_IOMMU_REG_PQH], 0);
> +        stl_le_p(&s->regs_rw[RISCV_IOMMU_REG_PQT], 0);
> +        ctrl_set = RISCV_IOMMU_PQCSR_PQON;
> +        ctrl_clr = RISCV_IOMMU_PQCSR_BUSY | RISCV_IOMMU_PQCSR_PQMF |
> +            RISCV_IOMMU_PQCSR_PQOF;
> +    } else if (!enable && active) {
> +        stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_PQH], ~0);
> +        ctrl_set = 0;
> +        ctrl_clr = RISCV_IOMMU_PQCSR_BUSY | RISCV_IOMMU_PQCSR_PQON;
> +    } else {
> +        ctrl_set = 0;
> +        ctrl_clr = RISCV_IOMMU_PQCSR_BUSY;
> +    }
> +
> +    riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_PQCSR, ctrl_set, ctrl_clr);
> +}
> +
> +/* Core IOMMU execution activation */
> +enum {
> +    RISCV_IOMMU_EXEC_DDTP,
> +    RISCV_IOMMU_EXEC_CQCSR,
> +    RISCV_IOMMU_EXEC_CQT,
> +    RISCV_IOMMU_EXEC_FQCSR,
> +    RISCV_IOMMU_EXEC_FQH,
> +    RISCV_IOMMU_EXEC_PQCSR,
> +    RISCV_IOMMU_EXEC_PQH,
> +    RISCV_IOMMU_EXEC_TR_REQUEST,
> +    /* RISCV_IOMMU_EXEC_EXIT must be the last enum value */
> +    RISCV_IOMMU_EXEC_EXIT,
> +};
> +
> +static void *riscv_iommu_core_proc(void* arg)
> +{
> +    RISCVIOMMUState *s = arg;
> +    unsigned exec = 0;
> +    unsigned mask = 0;
> +
> +    while (!(exec & BIT(RISCV_IOMMU_EXEC_EXIT))) {
> +        mask = (mask ? mask : BIT(RISCV_IOMMU_EXEC_EXIT)) >> 1;
> +        switch (exec & mask) {
> +        case BIT(RISCV_IOMMU_EXEC_DDTP):
> +            riscv_iommu_process_ddtp(s);
> +            break;
> +        case BIT(RISCV_IOMMU_EXEC_CQCSR):
> +            riscv_iommu_process_cq_control(s);
> +            break;
> +        case BIT(RISCV_IOMMU_EXEC_CQT):
> +            riscv_iommu_process_cq_tail(s);
> +            break;
> +        case BIT(RISCV_IOMMU_EXEC_FQCSR):
> +            riscv_iommu_process_fq_control(s);
> +            break;
> +        case BIT(RISCV_IOMMU_EXEC_FQH):
> +            /* NOP */
> +            break;
> +        case BIT(RISCV_IOMMU_EXEC_PQCSR):
> +            riscv_iommu_process_pq_control(s);
> +            break;
> +        case BIT(RISCV_IOMMU_EXEC_PQH):
> +            /* NOP */
> +            break;
> +        case BIT(RISCV_IOMMU_EXEC_TR_REQUEST):
> +            /* DBG support not implemented yet */
> +            break;
> +        }
> +        exec &= ~mask;
> +        if (!exec) {
> +            qemu_mutex_lock(&s->core_lock);
> +            exec = s->core_exec;
> +            while (!exec) {
> +                qemu_cond_wait(&s->core_cond, &s->core_lock);
> +                exec = s->core_exec;
> +            }
> +            s->core_exec = 0;
> +            qemu_mutex_unlock(&s->core_lock);
> +        }
> +    };
> +
> +    return NULL;
> +}
> +
> +static MemTxResult riscv_iommu_mmio_write(void *opaque, hwaddr addr,
> +    uint64_t data, unsigned size, MemTxAttrs attrs)
> +{
> +    RISCVIOMMUState *s = opaque;
> +    uint32_t regb = addr & ~3;
> +    uint32_t busy = 0;
> +    uint32_t exec = 0;
> +
> +    if (size == 0 || size > 8 || (addr & (size - 1)) != 0) {
> +        /* Unsupported MMIO alignment or access size */
> +        return MEMTX_ERROR;
> +    }
> +
> +    if (addr + size > RISCV_IOMMU_REG_MSI_CONFIG) {
> +        /* Unsupported MMIO access location. */
> +        return MEMTX_ACCESS_ERROR;
> +    }
> +
> +    /* Track actionable MMIO write. */
> +    switch (regb) {

There should be a case for the IPSR register.

From the RISC-V IOMMU Architecture Spec v1.0.0, section 5.18:
If a bit in ipsr is 1 then a write of 1 to the bit transitions the bit 
from 1→0. If the conditions to set that bit are still present (See 
[IPSR_FIELDS]) or if they occur after the bit is cleared then that bit 
transitions again from 0→1.
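
The quoted behaviour could be modelled roughly as below. This is a standalone sketch under stated assumptions, not a proposed implementation: ipsr_w1c_write() is a hypothetical helper, and 'pending' is an assumed bitmap of the interrupt conditions that are still asserted when the write is handled.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: apply a guest write to ipsr.
 * 'pending' is an assumed bitmap of interrupt conditions that are still
 * present (for example an error bit in a queue CSR that is still set).
 */
static uint32_t ipsr_w1c_write(uint32_t ipsr, uint32_t data, uint32_t pending)
{
    /* Writing 1 to a set bit transitions it 1 -> 0; writing 0 has no effect. */
    ipsr &= ~data;
    /* A condition that is still present sets the bit again (0 -> 1). */
    ipsr |= pending;
    return ipsr;
}
```

With a helper like this, an IPSR case in the switch could recompute the pending conditions from the queue CSRs after the write-1-to-clear update, and raise the interrupt again if any bit ends up set.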
> +    case RISCV_IOMMU_REG_DDTP:
> +    case RISCV_IOMMU_REG_DDTP + 4:
> +        exec = BIT(RISCV_IOMMU_EXEC_DDTP);
> +        regb = RISCV_IOMMU_REG_DDTP;
> +        busy = RISCV_IOMMU_DDTP_BUSY;
> +        break;
> +
> +    case RISCV_IOMMU_REG_CQT:
> +        exec = BIT(RISCV_IOMMU_EXEC_CQT);
> +        break;
> +
> +    case RISCV_IOMMU_REG_CQCSR:
> +        exec = BIT(RISCV_IOMMU_EXEC_CQCSR);
> +        busy = RISCV_IOMMU_CQCSR_BUSY;
> +        break;
> +
> +    case RISCV_IOMMU_REG_FQH:
> +        exec = BIT(RISCV_IOMMU_EXEC_FQH);
> +        break;
> +
> +    case RISCV_IOMMU_REG_FQCSR:
> +        exec = BIT(RISCV_IOMMU_EXEC_FQCSR);
> +        busy = RISCV_IOMMU_FQCSR_BUSY;
> +        break;
> +
> +    case RISCV_IOMMU_REG_PQH:
> +        exec = BIT(RISCV_IOMMU_EXEC_PQH);
> +        break;
> +
> +    case RISCV_IOMMU_REG_PQCSR:
> +        exec = BIT(RISCV_IOMMU_EXEC_PQCSR);
> +        busy = RISCV_IOMMU_PQCSR_BUSY;
> +        break;
> +    }
> +
> +    /*
> +     * Registers update might be not synchronized with core logic.
> +     * If system software updates register when relevant BUSY bit is set
> +     * IOMMU behavior of additional writes to the register is UNSPECIFIED
> +     */
> +
> +    qemu_spin_lock(&s->regs_lock);
> +    if (size == 1) {
> +        uint8_t ro = s->regs_ro[addr];
> +        uint8_t wc = s->regs_wc[addr];
> +        uint8_t rw = s->regs_rw[addr];
> +        s->regs_rw[addr] = ((rw & ro) | (data & ~ro)) & ~(data & wc);
> +    } else if (size == 2) {
> +        uint16_t ro = lduw_le_p(&s->regs_ro[addr]);
> +        uint16_t wc = lduw_le_p(&s->regs_wc[addr]);
> +        uint16_t rw = lduw_le_p(&s->regs_rw[addr]);
> +        stw_le_p(&s->regs_rw[addr], ((rw & ro) | (data & ~ro)) & ~(data & wc));
> +    } else if (size == 4) {
> +        uint32_t ro = ldl_le_p(&s->regs_ro[addr]);
> +        uint32_t wc = ldl_le_p(&s->regs_wc[addr]);
> +        uint32_t rw = ldl_le_p(&s->regs_rw[addr]);
> +        stl_le_p(&s->regs_rw[addr], ((rw & ro) | (data & ~ro)) & ~(data & wc));
> +    } else if (size == 8) {
> +        uint64_t ro = ldq_le_p(&s->regs_ro[addr]);
> +        uint64_t wc = ldq_le_p(&s->regs_wc[addr]);
> +        uint64_t rw = ldq_le_p(&s->regs_rw[addr]);
> +        stq_le_p(&s->regs_rw[addr], ((rw & ro) | (data & ~ro)) & ~(data & wc));
> +    }
> +
> +    /* Busy flag update, MSB 4-byte register. */
> +    if (busy) {
> +        uint32_t rw = ldl_le_p(&s->regs_rw[regb]);
> +        stl_le_p(&s->regs_rw[regb], rw | busy);
> +    }
> +    qemu_spin_unlock(&s->regs_lock);
> +
> +    /* Wake up core processing thread. */
> +    if (exec) {
> +        qemu_mutex_lock(&s->core_lock);
> +        s->core_exec |= exec;
> +        qemu_cond_signal(&s->core_cond);
> +        qemu_mutex_unlock(&s->core_lock);
> +    }
> +
> +    return MEMTX_OK;
> +}
> +
> +static MemTxResult riscv_iommu_mmio_read(void *opaque, hwaddr addr,
> +    uint64_t *data, unsigned size, MemTxAttrs attrs)
> +{
> +    RISCVIOMMUState *s = opaque;
> +    uint64_t val = -1;
> +    uint8_t *ptr;
> +
> +    if ((addr & (size - 1)) != 0) {
> +        /* Unsupported MMIO alignment. */
> +        return MEMTX_ERROR;
> +    }
> +
> +    if (addr + size > RISCV_IOMMU_REG_MSI_CONFIG) {
> +        return MEMTX_ACCESS_ERROR;
> +    }
> +
> +    ptr = &s->regs_rw[addr];
> +
> +    if (size == 1) {
> +        val = (uint64_t)*ptr;
> +    } else if (size == 2) {
> +        val = lduw_le_p(ptr);
> +    } else if (size == 4) {
> +        val = ldl_le_p(ptr);
> +    } else if (size == 8) {
> +        val = ldq_le_p(ptr);
> +    } else {
> +        return MEMTX_ERROR;
> +    }
> +
> +    *data = val;
> +
> +    return MEMTX_OK;
> +}
> +
> +static const MemoryRegionOps riscv_iommu_mmio_ops = {
> +    .read_with_attrs = riscv_iommu_mmio_read,
> +    .write_with_attrs = riscv_iommu_mmio_write,
> +    .endianness = DEVICE_NATIVE_ENDIAN,
> +    .impl = {
> +        .min_access_size = 1,
> +        .max_access_size = 8,
> +        .unaligned = false,
> +    },
> +    .valid = {
> +        .min_access_size = 1,
> +        .max_access_size = 8,
> +    }
> +};
> +
> +/*
> + * Translations matching MSI pattern check are redirected to "riscv-iommu-trap"
> + * memory region as untranslated address, for additional MSI/MRIF interception
> + * by IOMMU interrupt remapping implementation.
> + * Note: Device emulation code generating an MSI is expected to provide a valid
> + * memory transaction attributes with requested_id set.
> + */
> +static MemTxResult riscv_iommu_trap_write(void *opaque, hwaddr addr,
> +    uint64_t data, unsigned size, MemTxAttrs attrs)
> +{
> +    RISCVIOMMUState* s = (RISCVIOMMUState *)opaque;
> +    RISCVIOMMUContext *ctx;
> +    MemTxResult res;
> +    void *ref;
> +    uint32_t devid = attrs.requester_id;
> +
> +    if (attrs.unspecified) {
> +        return MEMTX_ACCESS_ERROR;
> +    }
> +
> +    /* FIXME: PCIe bus remapping for attached endpoints. */
> +    devid |= s->bus << 8;
> +
> +    ctx = riscv_iommu_ctx(s, devid, 0, &ref);
> +    if (ctx == NULL) {
> +        res = MEMTX_ACCESS_ERROR;
> +    } else {
> +        res = riscv_iommu_msi_write(s, ctx, addr, data, size, attrs);
> +    }
> +    riscv_iommu_ctx_put(s, ref);
> +    return res;
> +}
> +
> +static MemTxResult riscv_iommu_trap_read(void *opaque, hwaddr addr,
> +    uint64_t *data, unsigned size, MemTxAttrs attrs)
> +{
> +    return MEMTX_ACCESS_ERROR;
> +}
> +
> +static const MemoryRegionOps riscv_iommu_trap_ops = {
> +    .read_with_attrs = riscv_iommu_trap_read,
> +    .write_with_attrs = riscv_iommu_trap_write,
> +    .endianness = DEVICE_LITTLE_ENDIAN,
> +    .impl = {
> +        .min_access_size = 1,
> +        .max_access_size = 8,
> +        .unaligned = true,
> +    },
> +    .valid = {
> +        .min_access_size = 1,
> +        .max_access_size = 8,
> +    }
> +};
> +
> +static void riscv_iommu_realize(DeviceState *dev, Error **errp)
> +{
> +    RISCVIOMMUState *s = RISCV_IOMMU(dev);
> +
> +    s->cap = s->version & RISCV_IOMMU_CAP_VERSION;
> +    if (s->enable_msi) {
> +        s->cap |= RISCV_IOMMU_CAP_MSI_FLAT | RISCV_IOMMU_CAP_MSI_MRIF;
> +    }
> +    /* Report QEMU target physical address space limits */
> +    s->cap = set_field(s->cap, RISCV_IOMMU_CAP_PAS,
> +                       TARGET_PHYS_ADDR_SPACE_BITS);
> +
> +    /* TODO: method to report supported PASID bits */
> +    s->pasid_bits = 8; /* restricted to size of MemTxAttrs.pasid */
> +    s->cap |= RISCV_IOMMU_CAP_PD8;
> +
> +    /* Out-of-reset translation mode: OFF (DMA disabled) BARE (passthrough) */
> +    s->ddtp = set_field(0, RISCV_IOMMU_DDTP_MODE, s->enable_off ?
> +                        RISCV_IOMMU_DDTP_MODE_OFF : RISCV_IOMMU_DDTP_MODE_BARE);
> +
> +    /* register storage */
> +    s->regs_rw = g_new0(uint8_t, RISCV_IOMMU_REG_SIZE);
> +    s->regs_ro = g_new0(uint8_t, RISCV_IOMMU_REG_SIZE);
> +    s->regs_wc = g_new0(uint8_t, RISCV_IOMMU_REG_SIZE);
> +
> +     /* Mark all registers read-only */
> +    memset(s->regs_ro, 0xff, RISCV_IOMMU_REG_SIZE);
> +
> +    /*
> +     * Register complete MMIO space, including MSI/PBA registers.
> +     * Note, PCIDevice implementation will add overlapping MR for MSI/PBA,
> +     * managed directly by the PCIDevice implementation.
> +     */
> +    memory_region_init_io(&s->regs_mr, OBJECT(dev), &riscv_iommu_mmio_ops, s,
> +        "riscv-iommu-regs", RISCV_IOMMU_REG_SIZE);
> +
> +    /* Set power-on register state */
> +    stq_le_p(&s->regs_rw[RISCV_IOMMU_REG_CAP], s->cap);
> +    stq_le_p(&s->regs_rw[RISCV_IOMMU_REG_FCTL], s->fctl);
s->fctl is not initialized.
> +    stq_le_p(&s->regs_ro[RISCV_IOMMU_REG_DDTP],
> +        ~(RISCV_IOMMU_DDTP_PPN | RISCV_IOMMU_DDTP_MODE));
> +    stq_le_p(&s->regs_ro[RISCV_IOMMU_REG_CQB],
> +        ~(RISCV_IOMMU_CQB_LOG2SZ | RISCV_IOMMU_CQB_PPN));
> +    stq_le_p(&s->regs_ro[RISCV_IOMMU_REG_FQB],
> +        ~(RISCV_IOMMU_FQB_LOG2SZ | RISCV_IOMMU_FQB_PPN));
> +    stq_le_p(&s->regs_ro[RISCV_IOMMU_REG_PQB],
> +        ~(RISCV_IOMMU_PQB_LOG2SZ | RISCV_IOMMU_PQB_PPN));
> +    stl_le_p(&s->regs_wc[RISCV_IOMMU_REG_CQCSR], RISCV_IOMMU_CQCSR_CQMF |
> +        RISCV_IOMMU_CQCSR_CMD_TO | RISCV_IOMMU_CQCSR_CMD_ILL);
> +    stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_CQCSR], RISCV_IOMMU_CQCSR_CQON |
> +        RISCV_IOMMU_CQCSR_BUSY);
> +    stl_le_p(&s->regs_wc[RISCV_IOMMU_REG_FQCSR], RISCV_IOMMU_FQCSR_FQMF |
> +        RISCV_IOMMU_FQCSR_FQOF);
> +    stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_FQCSR], RISCV_IOMMU_FQCSR_FQON |
> +        RISCV_IOMMU_FQCSR_BUSY);
> +    stl_le_p(&s->regs_wc[RISCV_IOMMU_REG_PQCSR], RISCV_IOMMU_PQCSR_PQMF |
> +        RISCV_IOMMU_PQCSR_PQOF);
> +    stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_PQCSR], RISCV_IOMMU_PQCSR_PQON |
> +        RISCV_IOMMU_PQCSR_BUSY);
> +    stl_le_p(&s->regs_wc[RISCV_IOMMU_REG_IPSR], ~0);
> +    stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_IVEC], 0);
> +    stq_le_p(&s->regs_rw[RISCV_IOMMU_REG_DDTP], s->ddtp);
> +
> +    /* Memory region for downstream access, if specified. */
> +    if (s->target_mr) {
> +        s->target_as = g_new0(AddressSpace, 1);
> +        address_space_init(s->target_as, s->target_mr,
> +            "riscv-iommu-downstream");
> +    } else {
> +        /* Fallback to global system memory. */
> +        s->target_as = &address_space_memory;
> +    }
> +
> +    /* Memory region for untranslated MRIF/MSI writes */
> +    memory_region_init_io(&s->trap_mr, OBJECT(dev), &riscv_iommu_trap_ops, s,
> +            "riscv-iommu-trap", ~0ULL);
> +    address_space_init(&s->trap_as, &s->trap_mr, "riscv-iommu-trap-as");
> +
> +    /* Device translation context cache */
> +    s->ctx_cache = g_hash_table_new_full(__ctx_hash, __ctx_equal,
> +                                         g_free, NULL);
> +
> +    s->iommus.le_next = NULL;
> +    s->iommus.le_prev = NULL;
> +    QLIST_INIT(&s->spaces);
> +    qemu_cond_init(&s->core_cond);
> +    qemu_mutex_init(&s->core_lock);
> +    qemu_spin_init(&s->regs_lock);
> +    qemu_thread_create(&s->core_proc, "riscv-iommu-core",
> +        riscv_iommu_core_proc, s, QEMU_THREAD_JOINABLE);
> +}
> +
> +static void riscv_iommu_unrealize(DeviceState *dev)
> +{
> +    RISCVIOMMUState *s = RISCV_IOMMU(dev);
> +
> +    qemu_mutex_lock(&s->core_lock);
> +    /* cancel pending operations and stop */
> +    s->core_exec = BIT(RISCV_IOMMU_EXEC_EXIT);
> +    qemu_cond_signal(&s->core_cond);
> +    qemu_mutex_unlock(&s->core_lock);
> +    qemu_thread_join(&s->core_proc);
> +    qemu_cond_destroy(&s->core_cond);
> +    qemu_mutex_destroy(&s->core_lock);
> +    g_hash_table_unref(s->ctx_cache);
> +}
> +
> +static Property riscv_iommu_properties[] = {
> +    DEFINE_PROP_UINT32("version", RISCVIOMMUState, version,
> +        RISCV_IOMMU_SPEC_DOT_VER),
> +    DEFINE_PROP_UINT32("bus", RISCVIOMMUState, bus, 0x0),
> +    DEFINE_PROP_BOOL("intremap", RISCVIOMMUState, enable_msi, TRUE),
> +    DEFINE_PROP_BOOL("off", RISCVIOMMUState, enable_off, TRUE),
> +    DEFINE_PROP_LINK("downstream-mr", RISCVIOMMUState, target_mr,
> +        TYPE_MEMORY_REGION, MemoryRegion *),
> +    DEFINE_PROP_END_OF_LIST(),
> +};
> +
> +static void riscv_iommu_class_init(ObjectClass *klass, void* data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +
> +    /* internal device for riscv-iommu-{pci/sys}, not user-creatable */
> +    dc->user_creatable = false;
> +    dc->realize = riscv_iommu_realize;
> +    dc->unrealize = riscv_iommu_unrealize;
> +    device_class_set_props(dc, riscv_iommu_properties);
> +}
> +
> +static const TypeInfo riscv_iommu_info = {
> +    .name = TYPE_RISCV_IOMMU,
> +    .parent = TYPE_DEVICE,
> +    .instance_size = sizeof(RISCVIOMMUState),
> +    .class_init = riscv_iommu_class_init,
> +};
> +
> +static const char *IOMMU_FLAG_STR[] = {
> +    "NA",
> +    "RO",
> +    "WR",
> +    "RW",
> +};
> +
> +/* RISC-V IOMMU Memory Region - Address Translation Space */
> +static IOMMUTLBEntry riscv_iommu_memory_region_translate(
> +    IOMMUMemoryRegion *iommu_mr, hwaddr addr,
> +    IOMMUAccessFlags flag, int iommu_idx)
> +{
> +    RISCVIOMMUSpace *as = container_of(iommu_mr, RISCVIOMMUSpace, iova_mr);
> +    RISCVIOMMUContext *ctx;
> +    void *ref;
> +    IOMMUTLBEntry iotlb = {
> +        .iova = addr,
> +        .target_as = as->iommu->target_as,
> +        .addr_mask = ~0ULL,
> +        .perm = flag,
> +    };
> +
> +    ctx = riscv_iommu_ctx(as->iommu, as->devid, iommu_idx, &ref);
> +    if (ctx == NULL) {
> +        /* Translation disabled or invalid. */
> +        iotlb.addr_mask = 0;
> +        iotlb.perm = IOMMU_NONE;
> +    } else if (riscv_iommu_translate(as->iommu, ctx, &iotlb)) {
> +        /* Translation disabled or fault reported. */
> +        iotlb.addr_mask = 0;
> +        iotlb.perm = IOMMU_NONE;
> +    }
> +
> +    /* Trace all dma translations with original access flags. */
> +    trace_riscv_iommu_dma(as->iommu->parent_obj.id, PCI_BUS_NUM(as->devid),
> +                          PCI_SLOT(as->devid), PCI_FUNC(as->devid), iommu_idx,
> +                          IOMMU_FLAG_STR[flag & IOMMU_RW], iotlb.iova,
> +                          iotlb.translated_addr);
> +
> +    riscv_iommu_ctx_put(as->iommu, ref);
> +
> +    return iotlb;
> +}
> +
> +static int riscv_iommu_memory_region_notify(
> +    IOMMUMemoryRegion *iommu_mr, IOMMUNotifierFlag old,
> +    IOMMUNotifierFlag new, Error **errp)
> +{
> +    RISCVIOMMUSpace *as = container_of(iommu_mr, RISCVIOMMUSpace, iova_mr);
> +
> +    if (old == IOMMU_NOTIFIER_NONE) {
> +        as->notifier = true;
> +        trace_riscv_iommu_notifier_add(iommu_mr->parent_obj.name);
> +    } else if (new == IOMMU_NOTIFIER_NONE) {
> +        as->notifier = false;
> +        trace_riscv_iommu_notifier_del(iommu_mr->parent_obj.name);
> +    }
> +
> +    return 0;
> +}
> +
> +static inline bool pci_is_iommu(PCIDevice *pdev)
> +{
> +    return pci_get_word(pdev->config + PCI_CLASS_DEVICE) == 0x0806;
> +}
> +
> +static AddressSpace *riscv_iommu_find_as(PCIBus *bus, void *opaque, int devfn)
> +{
> +    RISCVIOMMUState *s = (RISCVIOMMUState *) opaque;
> +    PCIDevice *pdev = pci_find_device(bus, pci_bus_num(bus), devfn);
> +    AddressSpace *as = NULL;
> +
> +    if (pdev && pci_is_iommu(pdev)) {
> +        return s->target_as;
> +    }
> +
> +    /* Find first registered IOMMU device */
> +    while (s->iommus.le_prev) {
> +        s = *(s->iommus.le_prev);
> +    }
> +
> +    /* Find first matching IOMMU */
> +    while (s != NULL && as == NULL) {
> +        as = riscv_iommu_space(s, PCI_BUILD_BDF(pci_bus_num(bus), devfn));
> +        s = s->iommus.le_next;
> +    }
> +
> +    return as ? as : &address_space_memory;
> +}
> +
> +static const PCIIOMMUOps riscv_iommu_ops = {
> +    .get_address_space = riscv_iommu_find_as,
> +};
> +
> +void riscv_iommu_pci_setup_iommu(RISCVIOMMUState *iommu, PCIBus *bus,
> +        Error **errp)
> +{
> +    if (bus->iommu_ops &&
> +        bus->iommu_ops->get_address_space == riscv_iommu_find_as) {
> +        /* Allow multiple IOMMUs on the same PCIe bus, link known devices */
> +        RISCVIOMMUState *last = (RISCVIOMMUState *)bus->iommu_opaque;
> +        QLIST_INSERT_AFTER(last, iommu, iommus);
> +    } else if (bus->iommu_ops == NULL) {
> +        pci_setup_iommu(bus, &riscv_iommu_ops, iommu);
The original bus->iommu_ops and bus->iommu_opaque will be lost.
> +    } else {
> +        error_setg(errp, "can't register secondary IOMMU for PCI bus #%d",
> +            pci_bus_num(bus));
> +    }
> +}
> +
> +static int riscv_iommu_memory_region_index(IOMMUMemoryRegion *iommu_mr,
> +    MemTxAttrs attrs)
> +{
> +    return attrs.unspecified ? RISCV_IOMMU_NOPASID : (int)attrs.pasid;
> +}
> +
> +static int riscv_iommu_memory_region_index_len(IOMMUMemoryRegion *iommu_mr)
> +{
> +    RISCVIOMMUSpace *as = container_of(iommu_mr, RISCVIOMMUSpace, iova_mr);
> +    return 1 << as->iommu->pasid_bits;
> +}
> +
> +static void riscv_iommu_memory_region_init(ObjectClass *klass, void *data)
> +{
> +    IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_CLASS(klass);
> +
> +    imrc->translate = riscv_iommu_memory_region_translate;
> +    imrc->notify_flag_changed = riscv_iommu_memory_region_notify;
> +    imrc->attrs_to_index = riscv_iommu_memory_region_index;
> +    imrc->num_indexes = riscv_iommu_memory_region_index_len;
> +}
> +
> +static const TypeInfo riscv_iommu_memory_region_info = {
> +    .parent = TYPE_IOMMU_MEMORY_REGION,
> +    .name = TYPE_RISCV_IOMMU_MEMORY_REGION,
> +    .class_init = riscv_iommu_memory_region_init,
> +};
> +
> +static void riscv_iommu_register_mr_types(void)
> +{
> +    type_register_static(&riscv_iommu_memory_region_info);
> +    type_register_static(&riscv_iommu_info);
> +}
> +
> +type_init(riscv_iommu_register_mr_types);
> diff --git a/hw/riscv/riscv-iommu.h b/hw/riscv/riscv-iommu.h
> new file mode 100644
> index 0000000000..6f740de690
> --- /dev/null
> +++ b/hw/riscv/riscv-iommu.h
> @@ -0,0 +1,141 @@
> +/*
> + * QEMU emulation of an RISC-V IOMMU (Ziommu)
> + *
> + * Copyright (C) 2022-2023 Rivos Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef HW_RISCV_IOMMU_STATE_H
> +#define HW_RISCV_IOMMU_STATE_H
> +
> +#include "qemu/osdep.h"
> +#include "qom/object.h"
> +
> +#include "hw/riscv/iommu.h"
> +
> +struct RISCVIOMMUState {
> +    /*< private >*/
> +    DeviceState parent_obj;
> +
> +    /*< public >*/
> +    uint32_t version;     /* Reported interface version number */
> +    uint32_t pasid_bits;  /* process identifier width */
> +    uint32_t bus;         /* PCI bus mapping for non-root endpoints */
> +
> +    uint64_t cap;         /* IOMMU supported capabilities */
> +    uint64_t fctl;        /* IOMMU enabled features */
> +
> +    bool enable_off;      /* Enable out-of-reset OFF mode (DMA disabled) */
> +    bool enable_msi;      /* Enable MSI remapping */
> +
> +    /* IOMMU Internal State */
> +    uint64_t ddtp;        /* Validated Device Directory Tree Root Pointer */
> +
> +    dma_addr_t cq_addr;   /* Command queue base physical address */
> +    dma_addr_t fq_addr;   /* Fault/event queue base physical address */
> +    dma_addr_t pq_addr;   /* Page request queue base physical address */
> +
> +    uint32_t cq_mask;     /* Command queue index bit mask */
> +    uint32_t fq_mask;     /* Fault/event queue index bit mask */
> +    uint32_t pq_mask;     /* Page request queue index bit mask */
> +
> +    /* interrupt notifier */
> +    void (*notify)(RISCVIOMMUState *iommu, unsigned vector);
> +
> +    /* IOMMU State Machine */
> +    QemuThread core_proc; /* Background processing thread */
> +    QemuMutex core_lock;  /* Global IOMMU lock, used for cache/regs updates */
> +    QemuCond core_cond;   /* Background processing wake up signal */
> +    unsigned core_exec;   /* Processing thread execution actions */
> +
> +    /* IOMMU target address space */
> +    AddressSpace *target_as;
> +    MemoryRegion *target_mr;
> +
> +    /* MSI / MRIF access trap */
> +    AddressSpace trap_as;
> +    MemoryRegion trap_mr;
> +
> +    GHashTable *ctx_cache;          /* Device translation Context Cache */
> +
> +    /* MMIO Hardware Interface */
> +    MemoryRegion regs_mr;
> +    QemuSpin regs_lock;
> +    uint8_t *regs_rw;  /* register state (user write) */
> +    uint8_t *regs_wc;  /* write-1-to-clear mask */
> +    uint8_t *regs_ro;  /* read-only mask */
> +
> +    QLIST_ENTRY(RISCVIOMMUState) iommus;
> +    QLIST_HEAD(, RISCVIOMMUSpace) spaces;
> +};
> +
> +void riscv_iommu_pci_setup_iommu(RISCVIOMMUState *iommu, PCIBus *bus,
> +         Error **errp);
> +
> +/* private helpers */
> +
> +/* Register helper functions */
> +static inline uint32_t riscv_iommu_reg_mod32(RISCVIOMMUState *s,
> +    unsigned idx, uint32_t set, uint32_t clr)
> +{
> +    uint32_t val;
> +    qemu_spin_lock(&s->regs_lock);
> +    val = ldl_le_p(s->regs_rw + idx);
> +    stl_le_p(s->regs_rw + idx, (val & ~clr) | set);
> +    qemu_spin_unlock(&s->regs_lock);
> +    return val;
> +}
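
For readers following along outside the QEMU tree, the read-modify-write semantics of riscv_iommu_reg_mod32() can be modelled standalone. This is only a sketch under stated assumptions: reg_mod32(), load_le32() and store_le32() are hypothetical stand-ins for the patch's helper, ldl_le_p() and stl_le_p(), and the spinlock is left out for brevity. It shows that the helper returns the value from before the update and that bits in 'set' win over bits in 'clr'.

```c
#include <stdint.h>

/* Little-endian 32-bit load from a byte buffer (stand-in for ldl_le_p). */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Little-endian 32-bit store to a byte buffer (stand-in for stl_le_p). */
static void store_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)((v >> 8) & 0xff);
    p[2] = (uint8_t)((v >> 16) & 0xff);
    p[3] = (uint8_t)((v >> 24) & 0xff);
}

/*
 * Clear the bits in 'clr', then set the bits in 'set', and return the
 * register value as it was BEFORE the update. No locking in this model.
 */
static uint32_t reg_mod32(uint8_t *regs, unsigned idx,
                          uint32_t set, uint32_t clr)
{
    uint32_t val = load_le32(regs + idx);
    store_le32(regs + idx, (val & ~clr) | set);
    return val;
}
```

Returning the old value is what lets a caller set a pending bit and learn, in the same locked step, whether it was already set.
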
> +
> +static inline void riscv_iommu_reg_set32(RISCVIOMMUState *s,
> +    unsigned idx, uint32_t set)
> +{
> +    qemu_spin_lock(&s->regs_lock);
> +    stl_le_p(s->regs_rw + idx, set);
> +    qemu_spin_unlock(&s->regs_lock);
> +}
> +
> +static inline uint32_t riscv_iommu_reg_get32(RISCVIOMMUState *s,
> +    unsigned idx)
> +{
> +    return ldl_le_p(s->regs_rw + idx);
> +}
> +
> +static inline uint64_t riscv_iommu_reg_mod64(RISCVIOMMUState *s,
> +    unsigned idx, uint64_t set, uint64_t clr)
> +{
> +    uint64_t val;
> +    qemu_spin_lock(&s->regs_lock);
> +    val = ldq_le_p(s->regs_rw + idx);
> +    stq_le_p(s->regs_rw + idx, (val & ~clr) | set);
> +    qemu_spin_unlock(&s->regs_lock);
> +    return val;
> +}
> +
> +static inline void riscv_iommu_reg_set64(RISCVIOMMUState *s,
> +    unsigned idx, uint64_t set)
> +{
> +    qemu_spin_lock(&s->regs_lock);
> +    stq_le_p(s->regs_rw + idx, set);
> +    qemu_spin_unlock(&s->regs_lock);
> +}
> +
> +static inline uint64_t riscv_iommu_reg_get64(RISCVIOMMUState *s,
> +    unsigned idx)
> +{
> +    return ldq_le_p(s->regs_rw + idx);
> +}
> +
> +
> +
> +#endif
> diff --git a/hw/riscv/trace-events b/hw/riscv/trace-events
> new file mode 100644
> index 0000000000..42a97caffa
> --- /dev/null
> +++ b/hw/riscv/trace-events
> @@ -0,0 +1,11 @@
> +# See documentation at docs/devel/tracing.rst
> +
> +# riscv-iommu.c
> +riscv_iommu_new(const char *id, unsigned b, unsigned d, unsigned f) "%s: device attached %04x:%02x.%d"
> +riscv_iommu_flt(const char *id, unsigned b, unsigned d, unsigned f, uint64_t reason, uint64_t iova) "%s: fault %04x:%02x.%u reason: 0x%"PRIx64" iova: 0x%"PRIx64
> +riscv_iommu_pri(const char *id, unsigned b, unsigned d, unsigned f, uint64_t iova) "%s: page request %04x:%02x.%u iova: 0x%"PRIx64
> +riscv_iommu_dma(const char *id, unsigned b, unsigned d, unsigned f, unsigned pasid, const char *dir, uint64_t iova, uint64_t phys) "%s: translate %04x:%02x.%u #%u %s 0x%"PRIx64" -> 0x%"PRIx64
> +riscv_iommu_msi(const char *id, unsigned b, unsigned d, unsigned f, uint64_t iova, uint64_t phys) "%s: translate %04x:%02x.%u MSI 0x%"PRIx64" -> 0x%"PRIx64
> +riscv_iommu_cmd(const char *id, uint64_t l, uint64_t u) "%s: command 0x%"PRIx64" 0x%"PRIx64
> +riscv_iommu_notifier_add(const char *id) "%s: dev-iotlb notifier added"
> +riscv_iommu_notifier_del(const char *id) "%s: dev-iotlb notifier removed"
> diff --git a/hw/riscv/trace.h b/hw/riscv/trace.h
> new file mode 100644
> index 0000000000..b88504b750
> --- /dev/null
> +++ b/hw/riscv/trace.h
> @@ -0,0 +1,2 @@
> +#include "trace/trace-hw_riscv.h"
> +
> diff --git a/include/hw/riscv/iommu.h b/include/hw/riscv/iommu.h
> new file mode 100644
> index 0000000000..403b365893
> --- /dev/null
> +++ b/include/hw/riscv/iommu.h
> @@ -0,0 +1,36 @@
> +/*
> + * QEMU emulation of an RISC-V IOMMU (Ziommu)
> + *
> + * Copyright (C) 2022-2023 Rivos Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef HW_RISCV_IOMMU_H
> +#define HW_RISCV_IOMMU_H
> +
> +#include "qemu/osdep.h"
> +#include "qom/object.h"
> +
> +#define TYPE_RISCV_IOMMU "riscv-iommu"
> +OBJECT_DECLARE_SIMPLE_TYPE(RISCVIOMMUState, RISCV_IOMMU)
> +typedef struct RISCVIOMMUState RISCVIOMMUState;
> +
> +#define TYPE_RISCV_IOMMU_MEMORY_REGION "riscv-iommu-mr"
> +typedef struct RISCVIOMMUSpace RISCVIOMMUSpace;
> +
> +#define TYPE_RISCV_IOMMU_PCI "riscv-iommu-pci"
> +OBJECT_DECLARE_SIMPLE_TYPE(RISCVIOMMUStatePci, RISCV_IOMMU_PCI)
> +typedef struct RISCVIOMMUStatePci RISCVIOMMUStatePci;
> +
> +#endif
> diff --git a/meson.build b/meson.build
> index c59ca496f2..75e56f3282 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -3361,6 +3361,7 @@ if have_system
>       'hw/rdma',
>       'hw/rdma/vmw',
>       'hw/rtc',
> +    'hw/riscv',
>       'hw/s390x',
>       'hw/scsi',
>       'hw/sd',

[-- Attachment #2: Type: text/html, Size: 67766 bytes --]

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 04/15] hw/riscv: add riscv-iommu-pci device
  2024-04-29  7:21   ` Frank Chang
@ 2024-05-02  9:37     ` Daniel Henrique Barboza
  0 siblings, 0 replies; 55+ messages in thread
From: Daniel Henrique Barboza @ 2024-05-02  9:37 UTC (permalink / raw)
  To: Frank Chang
  Cc: qemu-devel, qemu-riscv, alistair.francis, bmeng, liwei1518,
	zhiwei_liu, palmer, ajones, tjeznach



On 4/29/24 04:21, Frank Chang wrote:
> Daniel Henrique Barboza <dbarboza@ventanamicro.com> wrote on Fri, Mar 8, 2024 at 12:04 AM:
>  >
>  > From: Tomasz Jeznach <tjeznach@rivosinc.com>
>  >
>  > The RISC-V IOMMU can be modelled as a PCIe device following the
>  > guidelines of the RISC-V IOMMU spec, chapter 7.1, "Integrating an IOMMU
>  > as a PCIe device".
>  >
>  > Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
>  > Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
>  > ---
>  >  hw/riscv/meson.build       |   2 +-
>  >  hw/riscv/riscv-iommu-pci.c | 173 +++++++++++++++++++++++++++++++++++++
>  >  2 files changed, 174 insertions(+), 1 deletion(-)
>  >  create mode 100644 hw/riscv/riscv-iommu-pci.c
>  >
>  > diff --git a/hw/riscv/meson.build b/hw/riscv/meson.build
>  > index ba9eebd605..4674cec6c4 100644
>  > --- a/hw/riscv/meson.build
>  > +++ b/hw/riscv/meson.build
>  > @@ -10,6 +10,6 @@ riscv_ss.add(when: 'CONFIG_SIFIVE_U', if_true: files('sifive_u.c'))
>  >  riscv_ss.add(when: 'CONFIG_SPIKE', if_true: files('spike.c'))
>  >  riscv_ss.add(when: 'CONFIG_MICROCHIP_PFSOC', if_true: files('microchip_pfsoc.c'))
>  >  riscv_ss.add(when: 'CONFIG_ACPI', if_true: files('virt-acpi-build.c'))
>  > -riscv_ss.add(when: 'CONFIG_RISCV_IOMMU', if_true: files('riscv-iommu.c'))
>  > +riscv_ss.add(when: 'CONFIG_RISCV_IOMMU', if_true: files('riscv-iommu.c', 'riscv-iommu-pci.c'))
>  >
>  >  hw_arch += {'riscv': riscv_ss}
>  > diff --git a/hw/riscv/riscv-iommu-pci.c b/hw/riscv/riscv-iommu-pci.c
>  > new file mode 100644
>  > index 0000000000..4eb1057210
>  > --- /dev/null
>  > +++ b/hw/riscv/riscv-iommu-pci.c
>  > @@ -0,0 +1,173 @@
>  > +/*
>  > + * QEMU emulation of an RISC-V IOMMU (Ziommu)
>  > + *
>  > + * Copyright (C) 2022-2023 Rivos Inc.
>  > + *
>  > + * This program is free software; you can redistribute it and/or modify
>  > + * it under the terms of the GNU General Public License as published by
>  > + * the Free Software Foundation; either version 2 of the License.
>  > + *
>  > + * This program is distributed in the hope that it will be useful,
>  > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>  > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>  > + * GNU General Public License for more details.
>  > + *
>  > + * You should have received a copy of the GNU General Public License along
>  > + * with this program; if not, see <http://www.gnu.org/licenses/>.
>  > + */
>  > +
>  > +#include "qemu/osdep.h"
>  > +#include "hw/pci/msi.h"
>  > +#include "hw/pci/msix.h"
>  > +#include "hw/pci/pci_bus.h"
>  > +#include "hw/qdev-properties.h"
>  > +#include "hw/riscv/riscv_hart.h"
>  > +#include "migration/vmstate.h"
>  > +#include "qapi/error.h"
>  > +#include "qemu/error-report.h"
>  > +#include "qemu/host-utils.h"
>  > +#include "qom/object.h"
>  > +
>  > +#include "cpu_bits.h"
>  > +#include "riscv-iommu.h"
>  > +#include "riscv-iommu-bits.h"
>  > +
>  > +#ifndef PCI_VENDOR_ID_RIVOS
>  > +#define PCI_VENDOR_ID_RIVOS           0x1efd
>  > +#endif
>  > +
>  > +#ifndef PCI_DEVICE_ID_RIVOS_IOMMU
>  > +#define PCI_DEVICE_ID_RIVOS_IOMMU     0xedf1
>  > +#endif
>  > +
>  > +/* RISC-V IOMMU PCI Device Emulation */
>  > +
>  > +typedef struct RISCVIOMMUStatePci {
>  > +    PCIDevice        pci;     /* Parent PCIe device state */
>  > +    MemoryRegion     bar0;    /* PCI BAR (including MSI-x config) */
>  > +    RISCVIOMMUState  iommu;   /* common IOMMU state */
>  > +} RISCVIOMMUStatePci;
>  > +
>  > +/* interrupt delivery callback */
>  > +static void riscv_iommu_pci_notify(RISCVIOMMUState *iommu, unsigned vector)
>  > +{
>  > +    RISCVIOMMUStatePci *s = container_of(iommu, RISCVIOMMUStatePci, iommu);
>  > +
>  > +    if (msix_enabled(&(s->pci))) {
>  > +        msix_notify(&(s->pci), vector);
>  > +    }
>  > +}
>  > +
>  > +static void riscv_iommu_pci_realize(PCIDevice *dev, Error **errp)
>  > +{
>  > +    RISCVIOMMUStatePci *s = DO_UPCAST(RISCVIOMMUStatePci, pci, dev);
>  > +    RISCVIOMMUState *iommu = &s->iommu;
>  > +    Error *err = NULL;
>  > +
>  > +    /* Set device id for trace / debug */
>  > +    DEVICE(iommu)->id = g_strdup_printf("%02x:%02x.%01x",
>  > +        pci_dev_bus_num(dev), PCI_SLOT(dev->devfn), PCI_FUNC(dev->devfn));
> 
> pci_dev_bus_num() calls pci_bus_num(),
> and pci_bus_num() is assigned to pcibus_num(),
> which returns bus->parent_dev->config[PCI_SECONDARY_BUS].
> However, the PCI bus number has not been programmed by software yet
> when the IOMMU is initialized, so pci_bus_num() will always return 0, IIRC.
> Same issue as pci_bus_num() above.
> 
>  > +    qdev_realize(DEVICE(iommu), NULL, errp);
>  > +
>  > +    memory_region_init(&s->bar0, OBJECT(s), "riscv-iommu-bar0",
>  > +        QEMU_ALIGN_UP(memory_region_size(&iommu->regs_mr), TARGET_PAGE_SIZE));
>  > +    memory_region_add_subregion(&s->bar0, 0, &iommu->regs_mr);
>  > +
>  > +    pcie_endpoint_cap_init(dev, 0);
>  > +
>  > +    pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY |
>  > +                     PCI_BASE_ADDRESS_MEM_TYPE_64, &s->bar0);
>  > +
>  > +    int ret = msix_init(dev, RISCV_IOMMU_INTR_COUNT,
>  > +                        &s->bar0, 0, RISCV_IOMMU_REG_MSI_CONFIG,
>  > +                        &s->bar0, 0, RISCV_IOMMU_REG_MSI_CONFIG + 256, 0, &err);
>  > +
>  > +    if (ret == -ENOTSUP) {
>  > +        /*
>  > +         * MSI-x is not supported by the platform.
>  > +         * Driver should use timer/polling based notification handlers.
>  > +         */
>  > +        warn_report_err(err);
>  > +    } else if (ret < 0) {
>  > +        error_propagate(errp, err);
>  > +        return;
>  > +    } else {
>  > +        /* mark all allocated MSIx vectors as used. */
>  > +        msix_vector_use(dev, RISCV_IOMMU_INTR_CQ);
>  > +        msix_vector_use(dev, RISCV_IOMMU_INTR_FQ);
>  > +        msix_vector_use(dev, RISCV_IOMMU_INTR_PM);
>  > +        msix_vector_use(dev, RISCV_IOMMU_INTR_PQ);
>  > +        iommu->notify = riscv_iommu_pci_notify;
>  > +    }
>  > +
>  > +    PCIBus *bus = pci_device_root_bus(dev);
>  > +    if (!bus) {
>  > +        error_setg(errp, "can't find PCIe root port for %02x:%02x.%x",
>  > +            pci_bus_num(pci_get_bus(dev)), PCI_SLOT(dev->devfn),
> 
> Same issue as pci_dev_bus_num() above.
> 
>  > +            PCI_FUNC(dev->devfn));
>  > +        return;
>  > +    }
>  > +
>  > +    riscv_iommu_pci_setup_iommu(iommu, bus, errp);
>  > +}
>  > +
>  > +static void riscv_iommu_pci_exit(PCIDevice *pci_dev)
>  > +{
>  > +    pci_setup_iommu(pci_device_root_bus(pci_dev), NULL, NULL);
>  > +}
>  > +
>  > +static const VMStateDescription riscv_iommu_vmstate = {
>  > +    .name = "riscv-iommu",
>  > +    .unmigratable = 1
>  > +};
>  > +
>  > +static void riscv_iommu_pci_init(Object *obj)
>  > +{
>  > +    RISCVIOMMUStatePci *s = RISCV_IOMMU_PCI(obj);
>  > +    RISCVIOMMUState *iommu = &s->iommu;
>  > +
>  > +    object_initialize_child(obj, "iommu", iommu, TYPE_RISCV_IOMMU);
>  > +    qdev_alias_all_properties(DEVICE(iommu), obj);
>  > +}
>  > +
>  > +static Property riscv_iommu_pci_properties[] = {
>  > +    DEFINE_PROP_END_OF_LIST(),
>  > +};
> 
> Do we need to assign the empty properties?
> 
>  > +
>  > +static void riscv_iommu_pci_class_init(ObjectClass *klass, void *data)
>  > +{
>  > +    DeviceClass *dc = DEVICE_CLASS(klass);
>  > +    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
>  > +
>  > +    k->realize = riscv_iommu_pci_realize;
>  > +    k->exit = riscv_iommu_pci_exit;
>  > +    k->vendor_id = PCI_VENDOR_ID_RIVOS;
>  > +    k->device_id = PCI_DEVICE_ID_RIVOS_IOMMU;
> 
> I know RIVOS originally modeled this IOMMU,
> but we (SiFive) also have our IOMMU based on RISC-V IOMMU:
> https://open-src-soc.org/2022-05/media/slides/RISC-V-International-Day-2022-05-05-14h10-Perinne-Peresse.pdf
> Do we have the guidelines on how to extend the vendor IOMMU?

We'll use a generic PCI ID for the QEMU IOMMU model. Drew is giving a
hand looking into it. We'll either send a separate patch to add
the new PCI ID or fold the patch into this series.

As for extending the generic IOMMU, you can use this base implementation
(it doesn't have anything specific to Rivos; it's a generic spec
implementation) to implement the SiFive device if you want. It
would be a device like:

>  > +static const TypeInfo riscv_sifive_iommu_pci = {
>  > +    .name = TYPE_RISCV_SIFIVE_IOMMU_PCI,
>  > +    .parent = TYPE_RISCV_IOMMU_PCI,

Same thing with the base emulation code. You can use it as a base and then
add the logic that is exclusive to SiFive on top of it.
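
To make the "use it as a base" idea concrete, here is a toy model in plain C. It is a sketch, not QOM and not code from this series. It assumes only that a child type's class init runs after its parent's, so the child starts from the parent's values and overrides what differs. type_model, pci_ids, vendor_class_init and the 0xffff/0x0001 IDs are all hypothetical placeholders; the base IDs are the ones used in the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* The class fields a vendor device would typically want to override. */
typedef struct {
    uint16_t vendor_id;
    uint16_t device_id;
    uint16_t class_id;
} pci_ids;

/* Toy type descriptor: a name, a parent, and a class init hook. */
typedef struct type_model {
    const char *name;
    const struct type_model *parent;
    void (*class_init)(pci_ids *k);
} type_model;

/* Generic riscv-iommu-pci values, as set in the patch under review. */
static void base_class_init(pci_ids *k)
{
    k->vendor_id = 0x1efd;
    k->device_id = 0xedf1;
    k->class_id = 0x0806;
}

/* Hypothetical vendor device: overrides the IDs, inherits the rest. */
static void vendor_class_init(pci_ids *k)
{
    k->vendor_id = 0xffff; /* placeholder, not a real vendor ID */
    k->device_id = 0x0001; /* placeholder */
}

static const type_model base_type = {
    "riscv-iommu-pci", NULL, base_class_init
};
static const type_model vendor_type = {
    "vendor-iommu-pci", &base_type, vendor_class_init
};

/* Run the class init hooks from the root of the hierarchy downwards. */
static void type_init_class(const type_model *t, pci_ids *k)
{
    if (t->parent) {
        type_init_class(t->parent, k);
    }
    if (t->class_init) {
        t->class_init(k);
    }
}
```

In this model the vendor type ends up with its own vendor/device IDs while keeping the 0x0806 class code from the base type.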


Thanks,

Daniel



> 
>  > +    k->revision = 0;
>  > +    k->class_id = 0x0806;
> 
> We should add
> #define PCI_CLASS_SYSTEM_IOMMU 0x0806
> instead of the hard-coded value.
> 
> P.S. AMD's IOMMU also uses hard-coded value 0x0806 in: hw/i386/amd_iommu.c.
> 
>  > +    dc->desc = "RISCV-IOMMU DMA Remapping device";
>  > +    dc->vmsd = &riscv_iommu_vmstate;
>  > +    dc->hotpluggable = false;
>  > +    dc->user_creatable = true;
>  > +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
>  > +    device_class_set_props(dc, riscv_iommu_pci_properties);
>  > +}
>  > +
>  > +static const TypeInfo riscv_iommu_pci = {
>  > +    .name = TYPE_RISCV_IOMMU_PCI,
>  > +    .parent = TYPE_PCI_DEVICE,
>  > +    .class_init = riscv_iommu_pci_class_init,
>  > +    .instance_init = riscv_iommu_pci_init,
>  > +    .instance_size = sizeof(RISCVIOMMUStatePci),
>  > +    .interfaces = (InterfaceInfo[]) {
>  > +        { INTERFACE_PCIE_DEVICE },
>  > +        { },
>  > +    },
>  > +};
>  > +
>  > +static void riscv_iommu_register_pci_types(void)
>  > +{
>  > +    type_register_static(&riscv_iommu_pci);
>  > +}
>  > +
>  > +type_init(riscv_iommu_register_pci_types);
>  > --
>  > 2.43.2
>  >
>  >


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 03/15] hw/riscv: add RISC-V IOMMU base emulation
  2024-03-07 16:03 ` [PATCH v2 03/15] hw/riscv: add RISC-V IOMMU base emulation Daniel Henrique Barboza
  2024-05-01 11:57   ` Jason Chien
@ 2024-05-02 11:37   ` Frank Chang
  2024-05-08 11:15     ` Daniel Henrique Barboza
  1 sibling, 1 reply; 55+ messages in thread
From: Frank Chang @ 2024-05-02 11:37 UTC (permalink / raw)
  To: Daniel Henrique Barboza
  Cc: qemu-devel, qemu-riscv, alistair.francis, bmeng, liwei1518,
	zhiwei_liu, palmer, ajones, tjeznach, Sebastien Boeuf

Hi Daniel,

Daniel Henrique Barboza <dbarboza@ventanamicro.com> wrote on Fri, Mar 8, 2024 at 12:04 AM:
>
> From: Tomasz Jeznach <tjeznach@rivosinc.com>
>
> The RISC-V IOMMU specification is now ratified as per the RISC-V
> international process. The latest frozen specification can be found
> at:
>
> https://github.com/riscv-non-isa/riscv-iommu/releases/download/v1.0/riscv-iommu.pdf
>
> Add the foundation of the device emulation for RISC-V IOMMU, which
> includes an IOMMU that has no capabilities but MSI interrupt support and
> fault queue interfaces. We'll add more features incrementally in the
> next patches.
>
> Co-developed-by: Sebastien Boeuf <seb@rivosinc.com>
> Signed-off-by: Sebastien Boeuf <seb@rivosinc.com>
> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
> ---
>  hw/riscv/Kconfig         |    4 +
>  hw/riscv/meson.build     |    1 +
>  hw/riscv/riscv-iommu.c   | 1492 ++++++++++++++++++++++++++++++++++++++
>  hw/riscv/riscv-iommu.h   |  141 ++++
>  hw/riscv/trace-events    |   11 +
>  hw/riscv/trace.h         |    2 +
>  include/hw/riscv/iommu.h |   36 +
>  meson.build              |    1 +
>  8 files changed, 1688 insertions(+)
>  create mode 100644 hw/riscv/riscv-iommu.c
>  create mode 100644 hw/riscv/riscv-iommu.h
>  create mode 100644 hw/riscv/trace-events
>  create mode 100644 hw/riscv/trace.h
>  create mode 100644 include/hw/riscv/iommu.h
>
> diff --git a/hw/riscv/Kconfig b/hw/riscv/Kconfig
> index 5d644eb7b1..faf6a10029 100644
> --- a/hw/riscv/Kconfig
> +++ b/hw/riscv/Kconfig
> @@ -1,3 +1,6 @@
> +config RISCV_IOMMU
> +    bool
> +
>  config RISCV_NUMA
>      bool
>
> @@ -38,6 +41,7 @@ config RISCV_VIRT
>      select SERIAL
>      select RISCV_ACLINT
>      select RISCV_APLIC
> +    select RISCV_IOMMU
>      select RISCV_IMSIC
>      select SIFIVE_PLIC
>      select SIFIVE_TEST
> diff --git a/hw/riscv/meson.build b/hw/riscv/meson.build
> index 2f7ee81be3..ba9eebd605 100644
> --- a/hw/riscv/meson.build
> +++ b/hw/riscv/meson.build
> @@ -10,5 +10,6 @@ riscv_ss.add(when: 'CONFIG_SIFIVE_U', if_true: files('sifive_u.c'))
>  riscv_ss.add(when: 'CONFIG_SPIKE', if_true: files('spike.c'))
>  riscv_ss.add(when: 'CONFIG_MICROCHIP_PFSOC', if_true: files('microchip_pfsoc.c'))
>  riscv_ss.add(when: 'CONFIG_ACPI', if_true: files('virt-acpi-build.c'))
> +riscv_ss.add(when: 'CONFIG_RISCV_IOMMU', if_true: files('riscv-iommu.c'))
>
>  hw_arch += {'riscv': riscv_ss}
> diff --git a/hw/riscv/riscv-iommu.c b/hw/riscv/riscv-iommu.c
> new file mode 100644
> index 0000000000..df534b99b0
> --- /dev/null
> +++ b/hw/riscv/riscv-iommu.c
> @@ -0,0 +1,1492 @@
> +/*
> + * QEMU emulation of an RISC-V IOMMU (Ziommu)
> + *
> + * Copyright (C) 2021-2023, Rivos Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qom/object.h"
> +#include "hw/pci/pci_bus.h"
> +#include "hw/pci/pci_device.h"
> +#include "hw/qdev-properties.h"
> +#include "hw/riscv/riscv_hart.h"
> +#include "migration/vmstate.h"
> +#include "qapi/error.h"
> +#include "qemu/timer.h"
> +
> +#include "cpu_bits.h"
> +#include "riscv-iommu.h"
> +#include "riscv-iommu-bits.h"
> +#include "trace.h"
> +
> +#define LIMIT_CACHE_CTX               (1U << 7)
> +#define LIMIT_CACHE_IOT               (1U << 20)
> +
> +/* Physical page number conversions */
> +#define PPN_PHYS(ppn)                 ((ppn) << TARGET_PAGE_BITS)
> +#define PPN_DOWN(phy)                 ((phy) >> TARGET_PAGE_BITS)
> +
> +typedef struct RISCVIOMMUContext RISCVIOMMUContext;
> +typedef struct RISCVIOMMUEntry RISCVIOMMUEntry;
> +
> +/* Device assigned I/O address space */
> +struct RISCVIOMMUSpace {
> +    IOMMUMemoryRegion iova_mr;  /* IOVA memory region for attached device */
> +    AddressSpace iova_as;       /* IOVA address space for attached device */
> +    RISCVIOMMUState *iommu;     /* Managing IOMMU device state */
> +    uint32_t devid;             /* Requester identifier, AKA device_id */
> +    bool notifier;              /* IOMMU unmap notifier enabled */
> +    QLIST_ENTRY(RISCVIOMMUSpace) list;
> +};
> +
> +/* Device translation context state. */
> +struct RISCVIOMMUContext {
> +    uint64_t devid:24;          /* Requester Id, AKA device_id */
> +    uint64_t pasid:20;          /* Process Address Space ID */
> +    uint64_t __rfu:20;          /* reserved */
> +    uint64_t tc;                /* Translation Control */
> +    uint64_t ta;                /* Translation Attributes */
> +    uint64_t msi_addr_mask;     /* MSI filtering - address mask */
> +    uint64_t msi_addr_pattern;  /* MSI filtering - address pattern */
> +    uint64_t msiptp;            /* MSI redirection page table pointer */
> +};
> +
> +/* IOMMU index for transactions without PASID specified. */
> +#define RISCV_IOMMU_NOPASID 0
> +
> +static void riscv_iommu_notify(RISCVIOMMUState *s, int vec)
> +{
> +    const uint32_t ipsr =
> +        riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_IPSR, (1 << vec), 0);
> +    const uint32_t ivec = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_IVEC);
> +    if (s->notify && !(ipsr & (1 << vec))) {
> +        s->notify(s, (ivec >> (vec * 4)) & 0x0F);
> +    }

s->notify is assigned to riscv_iommu_pci_notify() only.
There's no way to assert the wire-signaled interrupt.

We should also check fctl.WSI before asserting the interrupt.
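
To illustrate, here is a minimal standalone sketch of the check I have in mind. It is not the patch's code: iommu_regs, iommu_notify() and notify_kind are hypothetical names, and the FCTL_WSI bit position is an assumption that should be verified against the spec. It keeps the IPSR/IVEC handling of riscv_iommu_notify() and only adds the delivery-mode selection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FCTL_WSI (1u << 1) /* assumed bit position of fctl.WSI */

typedef enum {
    NOTIFY_NONE, /* interrupt already pending, nothing to deliver */
    NOTIFY_MSI,  /* deliver as a message-signaled interrupt */
    NOTIFY_WIRE  /* assert a wire-signaled interrupt */
} notify_kind;

typedef struct {
    uint32_t fctl; /* features control */
    uint32_t ipsr; /* interrupt pending status */
    uint32_t ivec; /* 4-bit vector per interrupt source */
} iommu_regs;

/*
 * Mark source 'vec' (0..3) pending and decide how to deliver it.
 * On NOTIFY_MSI or NOTIFY_WIRE, *out_vector holds the mapped vector.
 */
static notify_kind iommu_notify(iommu_regs *r, unsigned vec,
                                unsigned *out_vector)
{
    assert(vec < 4);

    const uint32_t bit = 1u << vec;
    const bool was_pending = (r->ipsr & bit) != 0;

    r->ipsr |= bit;
    if (was_pending) {
        return NOTIFY_NONE;
    }

    *out_vector = (r->ivec >> (vec * 4)) & 0x0F;
    return (r->fctl & FCTL_WSI) ? NOTIFY_WIRE : NOTIFY_MSI;
}
```

The caller would then either invoke the MSI callback or raise an IRQ line, which implies a second hook (or a qemu_irq) next to s->notify.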

> +}
> +
> +static void riscv_iommu_fault(RISCVIOMMUState *s,
> +                              struct riscv_iommu_fq_record *ev)
> +{
> +    uint32_t ctrl = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_FQCSR);
> +    uint32_t head = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_FQH) & s->fq_mask;
> +    uint32_t tail = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_FQT) & s->fq_mask;
> +    uint32_t next = (tail + 1) & s->fq_mask;
> +    uint32_t devid = get_field(ev->hdr, RISCV_IOMMU_FQ_HDR_DID);
> +
> +    trace_riscv_iommu_flt(s->parent_obj.id, PCI_BUS_NUM(devid), PCI_SLOT(devid),
> +                          PCI_FUNC(devid), ev->hdr, ev->iotval);
> +
> +    if (!(ctrl & RISCV_IOMMU_FQCSR_FQON) ||
> +        !!(ctrl & (RISCV_IOMMU_FQCSR_FQOF | RISCV_IOMMU_FQCSR_FQMF))) {
> +        return;
> +    }
> +
> +    if (head == next) {
> +        riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_FQCSR,
> +                              RISCV_IOMMU_FQCSR_FQOF, 0);
> +    } else {
> +        dma_addr_t addr = s->fq_addr + tail * sizeof(*ev);
> +        if (dma_memory_write(s->target_as, addr, ev, sizeof(*ev),
> +                             MEMTXATTRS_UNSPECIFIED) != MEMTX_OK) {
> +            riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_FQCSR,
> +                                  RISCV_IOMMU_FQCSR_FQMF, 0);
> +        } else {
> +            riscv_iommu_reg_set32(s, RISCV_IOMMU_REG_FQT, next);
> +        }
> +    }
> +
> +    if (ctrl & RISCV_IOMMU_FQCSR_FIE) {
> +        riscv_iommu_notify(s, RISCV_IOMMU_INTR_FQ);
> +    }
> +}
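
As an aside, the head/tail arithmetic above is the usual power-of-two ring with one slot kept empty. A minimal standalone model, with hypothetical names (fault_queue, fq_push) and the memory write and the FQMF path left out, could look like this:

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t head; /* consumer index, advanced by software */
    uint32_t tail; /* producer index, advanced by the IOMMU */
    uint32_t mask; /* number of entries minus one (power of two) */
    bool overflow; /* models FQCSR.FQOF */
} fault_queue;

/*
 * Claim the next free slot. Returns true and stores the slot index in
 * *slot on success. Returns false if the queue is full (and latches
 * the overflow flag) or if overflow is already latched.
 */
static bool fq_push(fault_queue *q, uint32_t *slot)
{
    const uint32_t next = (q->tail + 1) & q->mask;

    if (q->overflow) {
        return false;
    }
    if (next == q->head) {
        q->overflow = true;
        return false;
    }
    *slot = q->tail;
    q->tail = next;
    return true;
}
```

With mask = 3 (four entries) only three records fit before the overflow flag latches, since head == next is the "full" condition.
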
> +
> +static void riscv_iommu_pri(RISCVIOMMUState *s,
> +    struct riscv_iommu_pq_record *pr)
> +{
> +    uint32_t ctrl = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_PQCSR);
> +    uint32_t head = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_PQH) & s->pq_mask;
> +    uint32_t tail = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_PQT) & s->pq_mask;
> +    uint32_t next = (tail + 1) & s->pq_mask;
> +    uint32_t devid = get_field(pr->hdr, RISCV_IOMMU_PREQ_HDR_DID);
> +
> +    trace_riscv_iommu_pri(s->parent_obj.id, PCI_BUS_NUM(devid), PCI_SLOT(devid),
> +                          PCI_FUNC(devid), pr->payload);
> +
> +    if (!(ctrl & RISCV_IOMMU_PQCSR_PQON) ||
> +        !!(ctrl & (RISCV_IOMMU_PQCSR_PQOF | RISCV_IOMMU_PQCSR_PQMF))) {
> +        return;
> +    }
> +
> +    if (head == next) {
> +        riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_PQCSR,
> +                              RISCV_IOMMU_PQCSR_PQOF, 0);
> +    } else {
> +        dma_addr_t addr = s->pq_addr + tail * sizeof(*pr);
> +        if (dma_memory_write(s->target_as, addr, pr, sizeof(*pr),
> +                             MEMTXATTRS_UNSPECIFIED) != MEMTX_OK) {
> +            riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_PQCSR,
> +                                  RISCV_IOMMU_PQCSR_PQMF, 0);
> +        } else {
> +            riscv_iommu_reg_set32(s, RISCV_IOMMU_REG_PQT, next);
> +        }
> +    }
> +
> +    if (ctrl & RISCV_IOMMU_PQCSR_PIE) {
> +        riscv_iommu_notify(s, RISCV_IOMMU_INTR_PQ);
> +    }
> +}
> +
> +/* Portable implementation of pext_u64, bit-mask extraction. */
> +static uint64_t _pext_u64(uint64_t val, uint64_t ext)
> +{
> +    uint64_t ret = 0;
> +    uint64_t rot = 1;
> +
> +    while (ext) {
> +        if (ext & 1) {
> +            if (val & 1) {
> +                ret |= rot;
> +            }
> +            rot <<= 1;
> +        }
> +        val >>= 1;
> +        ext >>= 1;
> +    }
> +
> +    return ret;
> +}
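
A small worked example may help when reading the bit extraction above. The sketch below restates the same loop under a hypothetical name (pext_u64) so it can be compiled on its own; the input values are made up for illustration.

```c
#include <stdint.h>

/*
 * Gather the bits of 'val' selected by 'mask' and pack them into the
 * low bits of the result, preserving their relative order.
 */
static uint64_t pext_u64(uint64_t val, uint64_t mask)
{
    uint64_t ret = 0;
    uint64_t out_bit = 1;

    while (mask) {
        if (mask & 1) {
            if (val & 1) {
                ret |= out_bit;
            }
            out_bit <<= 1; /* next output position, only on a mask hit */
        }
        val >>= 1;
        mask >>= 1;
    }

    return ret;
}
```

With mask 0xCA (bits 1, 3, 6 and 7) and value 0xB6, the selected bits are 1, 0, 0, 1 from low to high, so the result is 0x9. This is how the interrupt file number is derived from the page number bits selected by msi_addr_mask.
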
> +
> +/* Check if GPA matches MSI/MRIF pattern. */
> +static bool riscv_iommu_msi_check(RISCVIOMMUState *s, RISCVIOMMUContext *ctx,
> +    dma_addr_t gpa)
> +{
> +    if (get_field(ctx->msiptp, RISCV_IOMMU_DC_MSIPTP_MODE) !=
> +        RISCV_IOMMU_DC_MSIPTP_MODE_FLAT) {
> +        return false; /* Invalid MSI/MRIF mode */
> +    }
> +
> +    if ((PPN_DOWN(gpa) ^ ctx->msi_addr_pattern) & ~ctx->msi_addr_mask) {
> +        return false; /* GPA not in MSI range defined by AIA IMSIC rules. */
> +    }
> +
> +    return true;
> +}
> +
> +/* RISCV IOMMU Address Translation Lookup - Page Table Walk */
> +static int riscv_iommu_spa_fetch(RISCVIOMMUState *s, RISCVIOMMUContext *ctx,
> +    IOMMUTLBEntry *iotlb)
> +{
> +    /* Early check for MSI address match when IOVA == GPA */
> +    if (iotlb->perm & IOMMU_WO &&
> +        riscv_iommu_msi_check(s, ctx, iotlb->iova)) {
> +        iotlb->target_as = &s->trap_as;
> +        iotlb->translated_addr = iotlb->iova;
> +        iotlb->addr_mask = ~TARGET_PAGE_MASK;
> +        return 0;
> +    }
> +
> +    /* Exit early for pass-through mode. */
> +    iotlb->translated_addr = iotlb->iova;
> +    iotlb->addr_mask = ~TARGET_PAGE_MASK;
> +    /* Allow R/W in pass-through mode */
> +    iotlb->perm = IOMMU_RW;
> +    return 0;
> +}
> +
> +/* Redirect MSI write for given GPA. */
> +static MemTxResult riscv_iommu_msi_write(RISCVIOMMUState *s,
> +    RISCVIOMMUContext *ctx, uint64_t gpa, uint64_t data,
> +    unsigned size, MemTxAttrs attrs)
> +{
> +    MemTxResult res;
> +    dma_addr_t addr;
> +    uint64_t intn;
> +    uint32_t n190;
> +    uint64_t pte[2];
> +
> +    if (!riscv_iommu_msi_check(s, ctx, gpa)) {
> +        return MEMTX_ACCESS_ERROR;
> +    }
> +
> +    /* Interrupt File Number */
> +    intn = _pext_u64(PPN_DOWN(gpa), ctx->msi_addr_mask);
> +    if (intn >= 256) {
> +        /* Interrupt file number out of range */
> +        return MEMTX_ACCESS_ERROR;
> +    }
> +
> +    /* fetch MSI PTE */
> +    addr = PPN_PHYS(get_field(ctx->msiptp, RISCV_IOMMU_DC_MSIPTP_PPN));
> +    addr = addr | (intn * sizeof(pte));
> +    res = dma_memory_read(s->target_as, addr, &pte, sizeof(pte),
> +            MEMTXATTRS_UNSPECIFIED);
> +    if (res != MEMTX_OK) {
> +        return res;

The spec says that:
"If msipte access detects a data corruption (a.k.a. poisoned data),
then stop and report "MSI PT data corruption" (cause = 270)."

> +    }
> +
> +    le64_to_cpus(&pte[0]);
> +    le64_to_cpus(&pte[1]);
> +
> +    if (!(pte[0] & RISCV_IOMMU_MSI_PTE_V) || (pte[0] & RISCV_IOMMU_MSI_PTE_C)) {
> +        return MEMTX_ACCESS_ERROR;

The spec says that:
"If msipte.V == 0, then stop and report "MSI PTE not valid" (cause = 262)."

> +    }
> +
> +    switch (get_field(pte[0], RISCV_IOMMU_MSI_PTE_M)) {
> +    case RISCV_IOMMU_MSI_PTE_M_BASIC:
> +        /* MSI Pass-through mode */
> +        addr = PPN_PHYS(get_field(pte[0], RISCV_IOMMU_MSI_PTE_PPN));
> +        addr = addr | (gpa & TARGET_PAGE_MASK);
> +
> +        trace_riscv_iommu_msi(s->parent_obj.id, PCI_BUS_NUM(ctx->devid),
> +                              PCI_SLOT(ctx->devid), PCI_FUNC(ctx->devid),
> +                              gpa, addr);
> +
> +        return dma_memory_write(s->target_as, addr, &data, size, attrs);
> +    case RISCV_IOMMU_MSI_PTE_M_MRIF:
> +        /* MRIF mode, continue. */
> +        break;
> +    default:
> +        return MEMTX_ACCESS_ERROR;

The spec says that:
"If msipte.M == 0 or msipte.M == 2, then stop and report "MSI PTE
misconfigured" (cause = 263)."
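
Putting the three comments above together, the cause selection could be sketched as below. This is only an illustration, not a drop-in change: msi_pte_cause() is a hypothetical helper, the 'poisoned' flag stands for whatever the memory API would report as data corruption, and the V/M bit positions are assumptions to be checked against riscv-iommu-bits.h. The cause numbers are the ones quoted from the spec.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSI_PTE_V       (1ULL << 0) /* assumed: valid bit */
#define MSI_PTE_M_SHIFT 1           /* assumed: mode field at bits 2:1 */
#define MSI_PTE_M_MASK  0x3ULL

#define CAUSE_MSI_PTE_INVALID       262
#define CAUSE_MSI_PTE_MISCONFIGURED 263
#define CAUSE_MSI_PT_CORRUPTED      270

/* Returns 0 if the PTE can be processed, or the fault cause to report. */
static int msi_pte_cause(uint64_t pte0, bool poisoned)
{
    if (poisoned) {
        return CAUSE_MSI_PT_CORRUPTED;
    }
    if (!(pte0 & MSI_PTE_V)) {
        return CAUSE_MSI_PTE_INVALID;
    }

    const unsigned m = (unsigned)((pte0 >> MSI_PTE_M_SHIFT) & MSI_PTE_M_MASK);
    if (m == 0 || m == 2) {
        return CAUSE_MSI_PTE_MISCONFIGURED;
    }

    return 0;
}
```

The caller would then put the returned cause into a fault queue record instead of returning a bare MEMTX_ACCESS_ERROR.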

> +    }
> +
> +    /*
> +     * Report an error for interrupt identities exceeding the maximum allowed
> +     * for an IMSIC interrupt file (2047) or destination address is not 32-bit
> +     * aligned. See IOMMU Specification, Chapter 2.3. MSI page tables.
> +     */
> +    if ((data > 2047) || (gpa & 3)) {
> +        return MEMTX_ACCESS_ERROR;
> +    }
> +
> +    /* MSI MRIF mode, non atomic pending bit update */
> +
> +    /* MRIF pending bit address */
> +    addr = get_field(pte[0], RISCV_IOMMU_MSI_PTE_MRIF_ADDR) << 9;
> +    addr = addr | ((data & 0x7c0) >> 3);
> +
> +    trace_riscv_iommu_msi(s->parent_obj.id, PCI_BUS_NUM(ctx->devid),
> +                          PCI_SLOT(ctx->devid), PCI_FUNC(ctx->devid),
> +                          gpa, addr);
> +
> +    /* MRIF pending bit mask */
> +    data = 1ULL << (data & 0x03f);
> +    res = dma_memory_read(s->target_as, addr, &intn, sizeof(intn), attrs);
> +    if (res != MEMTX_OK) {
> +        return res;
> +    }
> +    intn = intn | data;
> +    res = dma_memory_write(s->target_as, addr, &intn, sizeof(intn), attrs);
> +    if (res != MEMTX_OK) {
> +        return res;
> +    }
> +
> +    /* Get MRIF enable bits */
> +    addr = addr + sizeof(intn);
> +    res = dma_memory_read(s->target_as, addr, &intn, sizeof(intn), attrs);
> +    if (res != MEMTX_OK) {
> +        return res;
> +    }
> +    if (!(intn & data)) {
> +        /* notification disabled, MRIF update completed. */
> +        return MEMTX_OK;
> +    }
> +
> +    /* Send notification message */
> +    addr = PPN_PHYS(get_field(pte[1], RISCV_IOMMU_MSI_MRIF_NPPN));
> +    n190 = get_field(pte[1], RISCV_IOMMU_MSI_MRIF_NID) |
> +          (get_field(pte[1], RISCV_IOMMU_MSI_MRIF_NID_MSB) << 10);
> +
> +    res = dma_memory_write(s->target_as, addr, &n190, sizeof(n190), attrs);
> +    if (res != MEMTX_OK) {
> +        return res;
> +    }
> +
> +    return MEMTX_OK;
> +}
> +
> +/*
> + * RISC-V IOMMU Device Context Lookup - Device Directory Tree Walk
> + *
> + * @s         : IOMMU Device State
> + * @ctx       : Device Translation Context with devid and pasid set.
> + * @return    : success or fault code.
> + */
> +static int riscv_iommu_ctx_fetch(RISCVIOMMUState *s, RISCVIOMMUContext *ctx)
> +{
> +    const uint64_t ddtp = s->ddtp;
> +    unsigned mode = get_field(ddtp, RISCV_IOMMU_DDTP_MODE);
> +    dma_addr_t addr = PPN_PHYS(get_field(ddtp, RISCV_IOMMU_DDTP_PPN));
> +    struct riscv_iommu_dc dc;
> +    /* Device Context format: 0: extended (64 bytes) | 1: base (32 bytes) */
> +    const int dc_fmt = !s->enable_msi;
> +    const size_t dc_len = sizeof(dc) >> dc_fmt;
> +    unsigned depth;
> +    uint64_t de;
> +
> +    switch (mode) {
> +    case RISCV_IOMMU_DDTP_MODE_OFF:
> +        return RISCV_IOMMU_FQ_CAUSE_DMA_DISABLED;
> +
> +    case RISCV_IOMMU_DDTP_MODE_BARE:
> +        /* mock up pass-through translation context */
> +        ctx->tc = RISCV_IOMMU_DC_TC_V;
> +        ctx->ta = 0;
> +        ctx->msiptp = 0;
> +        return 0;
> +
> +    case RISCV_IOMMU_DDTP_MODE_1LVL:
> +        depth = 0;
> +        break;
> +
> +    case RISCV_IOMMU_DDTP_MODE_2LVL:
> +        depth = 1;
> +        break;
> +
> +    case RISCV_IOMMU_DDTP_MODE_3LVL:
> +        depth = 2;
> +        break;
> +
> +    default:
> +        return RISCV_IOMMU_FQ_CAUSE_DDT_MISCONFIGURED;
> +    }
> +
> +    /*
> +     * Check supported device id width (in bits).
> +     * See IOMMU Specification, Chapter 6. Software guidelines.
> +     * - if extended device-context format is used:
> +     *   1LVL: 6, 2LVL: 15, 3LVL: 24
> +     * - if base device-context format is used:
> +     *   1LVL: 7, 2LVL: 16, 3LVL: 24
> +     */
> +    if (ctx->devid >= (1 << (depth * 9 + 6 + (dc_fmt && depth != 2)))) {
> +        return RISCV_IOMMU_FQ_CAUSE_DDT_INVALID;
> +    }
> +
> +    /* Device directory tree walk */
> +    for (; depth-- > 0; ) {
> +        /*
> +         * Select device id index bits based on device directory tree level
> +         * and device context format.
> +         * See IOMMU Specification, Chapter 2. Data Structures.
> +         * - if extended device-context format is used:
> +         *   device index: [23:15][14:6][5:0]
> +         * - if base device-context format is used:
> +         *   device index: [23:16][15:7][6:0]
> +         */
> +        const int split = depth * 9 + 6 + dc_fmt;
> +        addr |= ((ctx->devid >> split) << 3) & ~TARGET_PAGE_MASK;
> +        if (dma_memory_read(s->target_as, addr, &de, sizeof(de),
> +                            MEMTXATTRS_UNSPECIFIED) != MEMTX_OK) {
> +            return RISCV_IOMMU_FQ_CAUSE_DDT_LOAD_FAULT;
> +        }
> +        le64_to_cpus(&de);
> +        if (!(de & RISCV_IOMMU_DDTE_VALID)) {
> +            /* invalid directory entry */
> +            return RISCV_IOMMU_FQ_CAUSE_DDT_INVALID;
> +        }
> +        if (de & ~(RISCV_IOMMU_DDTE_PPN | RISCV_IOMMU_DDTE_VALID)) {
> +            /* reserved bits set */
> +            return RISCV_IOMMU_FQ_CAUSE_DDT_INVALID;
> +        }
> +        addr = PPN_PHYS(get_field(de, RISCV_IOMMU_DDTE_PPN));
> +    }
> +
> +    /* index into device context entry page */
> +    addr |= (ctx->devid * dc_len) & ~TARGET_PAGE_MASK;
> +
> +    memset(&dc, 0, sizeof(dc));
> +    if (dma_memory_read(s->target_as, addr, &dc, dc_len,
> +                        MEMTXATTRS_UNSPECIFIED) != MEMTX_OK) {
> +        return RISCV_IOMMU_FQ_CAUSE_DDT_LOAD_FAULT;
> +    }
> +
> +    /* Set translation context. */
> +    ctx->tc = le64_to_cpu(dc.tc);
> +    ctx->ta = le64_to_cpu(dc.ta);
> +    ctx->msiptp = le64_to_cpu(dc.msiptp);
> +    ctx->msi_addr_mask = le64_to_cpu(dc.msi_addr_mask);
> +    ctx->msi_addr_pattern = le64_to_cpu(dc.msi_addr_pattern);
> +
> +    if (!(ctx->tc & RISCV_IOMMU_DC_TC_V)) {
> +        return RISCV_IOMMU_FQ_CAUSE_DDT_INVALID;
> +    }
> +
> +    if (!(ctx->tc & RISCV_IOMMU_DC_TC_PDTV)) {
> +        if (ctx->pasid != RISCV_IOMMU_NOPASID) {
> +            /* PASID is disabled */
> +            return RISCV_IOMMU_FQ_CAUSE_TTYPE_BLOCKED;
> +        }
> +        return 0;
> +    }
> +
> +    /* FSC.TC.PDTV enabled */
> +    if (mode > RISCV_IOMMU_DC_FSC_PDTP_MODE_PD20) {
> +        /* Invalid PDTP.MODE */
> +        return RISCV_IOMMU_FQ_CAUSE_PDT_MISCONFIGURED;
> +    }
> +
> +    for (depth = mode - RISCV_IOMMU_DC_FSC_PDTP_MODE_PD8; depth-- > 0; ) {
> +        /*
> +         * Select process id index bits based on process directory tree
> +         * level. See IOMMU Specification, 2.2. Process-Directory-Table.
> +         */
> +        const int split = depth * 9 + 8;
> +        addr |= ((ctx->pasid >> split) << 3) & ~TARGET_PAGE_MASK;
> +        if (dma_memory_read(s->target_as, addr, &de, sizeof(de),
> +                            MEMTXATTRS_UNSPECIFIED) != MEMTX_OK) {
> +            return RISCV_IOMMU_FQ_CAUSE_PDT_LOAD_FAULT;
> +        }
> +        le64_to_cpus(&de);
> +        if (!(de & RISCV_IOMMU_PC_TA_V)) {
> +            return RISCV_IOMMU_FQ_CAUSE_PDT_INVALID;
> +        }
> +        addr = PPN_PHYS(get_field(de, RISCV_IOMMU_PC_FSC_PPN));
> +    }
> +
> +    /* Leaf entry in PDT */
> +    addr |= (ctx->pasid << 4) & ~TARGET_PAGE_MASK;
> +    if (dma_memory_read(s->target_as, addr, &dc.ta, sizeof(uint64_t) * 2,
> +                        MEMTXATTRS_UNSPECIFIED) != MEMTX_OK) {
> +        return RISCV_IOMMU_FQ_CAUSE_PDT_LOAD_FAULT;
> +    }
> +
> +    /* Use FSC and TA from process directory entry. */
> +    ctx->ta = le64_to_cpu(dc.ta);
> +
> +    return 0;
> +}
> +
> +/* Translation Context cache support */
> +static gboolean __ctx_equal(gconstpointer v1, gconstpointer v2)
> +{
> +    RISCVIOMMUContext *c1 = (RISCVIOMMUContext *) v1;
> +    RISCVIOMMUContext *c2 = (RISCVIOMMUContext *) v2;
> +    return c1->devid == c2->devid && c1->pasid == c2->pasid;
> +}
> +
> +static guint __ctx_hash(gconstpointer v)
> +{
> +    RISCVIOMMUContext *ctx = (RISCVIOMMUContext *) v;
> +    /* Generate simple hash of (pasid, devid), assuming 24-bit wide devid */
> +    return (guint)(ctx->devid) + ((guint)(ctx->pasid) << 24);
> +}
> +
> +static void __ctx_inval_devid_pasid(gpointer key, gpointer value, gpointer data)
> +{
> +    RISCVIOMMUContext *ctx = (RISCVIOMMUContext *) value;
> +    RISCVIOMMUContext *arg = (RISCVIOMMUContext *) data;
> +    if (ctx->tc & RISCV_IOMMU_DC_TC_V &&
> +        ctx->devid == arg->devid &&
> +        ctx->pasid == arg->pasid) {
> +        ctx->tc &= ~RISCV_IOMMU_DC_TC_V;
> +    }
> +}
> +
> +static void __ctx_inval_devid(gpointer key, gpointer value, gpointer data)
> +{
> +    RISCVIOMMUContext *ctx = (RISCVIOMMUContext *) value;
> +    RISCVIOMMUContext *arg = (RISCVIOMMUContext *) data;
> +    if (ctx->tc & RISCV_IOMMU_DC_TC_V &&
> +        ctx->devid == arg->devid) {
> +        ctx->tc &= ~RISCV_IOMMU_DC_TC_V;
> +    }
> +}
> +
> +static void __ctx_inval_all(gpointer key, gpointer value, gpointer data)
> +{
> +    RISCVIOMMUContext *ctx = (RISCVIOMMUContext *) value;
> +    if (ctx->tc & RISCV_IOMMU_DC_TC_V) {
> +        ctx->tc &= ~RISCV_IOMMU_DC_TC_V;
> +    }
> +}
> +
> +static void riscv_iommu_ctx_inval(RISCVIOMMUState *s, GHFunc func,
> +    uint32_t devid, uint32_t pasid)
> +{
> +    GHashTable *ctx_cache;
> +    RISCVIOMMUContext key = {
> +        .devid = devid,
> +        .pasid = pasid,
> +    };
> +    ctx_cache = g_hash_table_ref(s->ctx_cache);
> +    g_hash_table_foreach(ctx_cache, func, &key);
> +    g_hash_table_unref(ctx_cache);
> +}
> +
> +/* Find or allocate translation context for a given {device_id, process_id} */
> +static RISCVIOMMUContext *riscv_iommu_ctx(RISCVIOMMUState *s,
> +    unsigned devid, unsigned pasid, void **ref)
> +{
> +    GHashTable *ctx_cache;
> +    RISCVIOMMUContext *ctx;
> +    RISCVIOMMUContext key = {
> +        .devid = devid,
> +        .pasid = pasid,
> +    };
> +
> +    ctx_cache = g_hash_table_ref(s->ctx_cache);
> +    ctx = g_hash_table_lookup(ctx_cache, &key);
> +
> +    if (ctx && (ctx->tc & RISCV_IOMMU_DC_TC_V)) {
> +        *ref = ctx_cache;
> +        return ctx;
> +    }
> +
> +    if (g_hash_table_size(s->ctx_cache) >= LIMIT_CACHE_CTX) {
> +        ctx_cache = g_hash_table_new_full(__ctx_hash, __ctx_equal,
> +                                          g_free, NULL);
> +        g_hash_table_unref(qatomic_xchg(&s->ctx_cache, ctx_cache));
> +    }
> +
> +    ctx = g_new0(RISCVIOMMUContext, 1);
> +    ctx->devid = devid;
> +    ctx->pasid = pasid;
> +
> +    int fault = riscv_iommu_ctx_fetch(s, ctx);
> +    if (!fault) {
> +        g_hash_table_add(ctx_cache, ctx);
> +        *ref = ctx_cache;
> +        return ctx;
> +    }
> +
> +    g_hash_table_unref(ctx_cache);
> +    *ref = NULL;
> +
> +    if (!(ctx->tc & RISCV_IOMMU_DC_TC_DTF)) {

riscv_iommu_ctx_fetch() may return:
RISCV_IOMMU_FQ_CAUSE_DMA_DISABLED (256)
RISCV_IOMMU_FQ_CAUSE_DDT_LOAD_FAULT (257)
RISCV_IOMMU_FQ_CAUSE_DDT_INVALID (258)
RISCV_IOMMU_FQ_CAUSE_DDT_MISCONFIGURED (259)

Per the spec, these faults are reported even when DTF is set to 1, so we
should report them regardless of the DTF setting.
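
A minimal sketch of the kind of check I mean, assuming the four cause codes
listed above (256 to 259) are the ones that bypass DTF. The enum names and the
helpers fault_ignores_dtf() and should_report_fault() are hypothetical
stand-ins, not identifiers from the patch:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the RISCV_IOMMU_FQ_CAUSE_* values listed above. */
enum {
    FQ_CAUSE_DMA_DISABLED      = 256,
    FQ_CAUSE_DDT_LOAD_FAULT    = 257,
    FQ_CAUSE_DDT_INVALID       = 258,
    FQ_CAUSE_DDT_MISCONFIGURED = 259,
};

/* True if the fault has to be reported even when DTF is 1. */
static bool fault_ignores_dtf(int cause)
{
    switch (cause) {
    case FQ_CAUSE_DMA_DISABLED:
    case FQ_CAUSE_DDT_LOAD_FAULT:
    case FQ_CAUSE_DDT_INVALID:
    case FQ_CAUSE_DDT_MISCONFIGURED:
        return true;
    default:
        return false;
    }
}

/* Decide whether a fault record should be queued for this cause. */
static bool should_report_fault(int cause, bool dtf)
{
    return !dtf || fault_ignores_dtf(cause);
}
```

The DTF test guarding the fault record in riscv_iommu_ctx() could then be
replaced by something along the lines of should_report_fault(fault, dtf).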

> +        struct riscv_iommu_fq_record ev = { 0 };
> +        ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_CAUSE, fault);
> +        ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_TTYPE,
> +            RISCV_IOMMU_FQ_TTYPE_UADDR_RD);
> +        ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_DID, devid);
> +        ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_PID, pasid);
> +        ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_PV, !!pasid);
> +        riscv_iommu_fault(s, &ev);
> +    }
> +
> +    g_free(ctx);
> +    return NULL;
> +}
> +
> +static void riscv_iommu_ctx_put(RISCVIOMMUState *s, void *ref)
> +{
> +    if (ref) {
> +        g_hash_table_unref((GHashTable *)ref);
> +    }
> +}
> +
> +/* Find or allocate address space for a given device */
> +static AddressSpace *riscv_iommu_space(RISCVIOMMUState *s, uint32_t devid)
> +{
> +    RISCVIOMMUSpace *as;
> +
> +    /* FIXME: PCIe bus remapping for attached endpoints. */
> +    devid |= s->bus << 8;
> +
> +    qemu_mutex_lock(&s->core_lock);
> +    QLIST_FOREACH(as, &s->spaces, list) {
> +        if (as->devid == devid) {
> +            break;
> +        }
> +    }
> +    qemu_mutex_unlock(&s->core_lock);
> +
> +    if (as == NULL) {
> +        char name[64];
> +        as = g_new0(RISCVIOMMUSpace, 1);
> +
> +        as->iommu = s;
> +        as->devid = devid;
> +
> +        snprintf(name, sizeof(name), "riscv-iommu-%04x:%02x.%d-iova",
> +            PCI_BUS_NUM(as->devid), PCI_SLOT(as->devid), PCI_FUNC(as->devid));
> +
> +        /* IOVA address space, untranslated addresses */
> +        memory_region_init_iommu(&as->iova_mr, sizeof(as->iova_mr),
> +            TYPE_RISCV_IOMMU_MEMORY_REGION,
> +            OBJECT(as), name, UINT64_MAX);
> +        address_space_init(&as->iova_as, MEMORY_REGION(&as->iova_mr),
> +            TYPE_RISCV_IOMMU_PCI);

Why do we use TYPE_RISCV_IOMMU_PCI as the address space name here?

> +
> +        qemu_mutex_lock(&s->core_lock);
> +        QLIST_INSERT_HEAD(&s->spaces, as, list);
> +        qemu_mutex_unlock(&s->core_lock);
> +
> +        trace_riscv_iommu_new(s->parent_obj.id, PCI_BUS_NUM(as->devid),
> +                PCI_SLOT(as->devid), PCI_FUNC(as->devid));
> +    }
> +    return &as->iova_as;
> +}
> +
> +static int riscv_iommu_translate(RISCVIOMMUState *s, RISCVIOMMUContext *ctx,
> +    IOMMUTLBEntry *iotlb)
> +{
> +    bool enable_faults;
> +    bool enable_pasid;
> +    bool enable_pri;
> +    int fault;
> +
> +    enable_faults = !(ctx->tc & RISCV_IOMMU_DC_TC_DTF);
> +    /*
> +     * TC[32] is reserved for custom extensions, used here to temporarily
> +     * enable automatic page-request generation for ATS queries.
> +     */
> +    enable_pri = (iotlb->perm == IOMMU_NONE) && (ctx->tc & BIT_ULL(32));
> +    enable_pasid = (ctx->tc & RISCV_IOMMU_DC_TC_PDTV);
> +
> +    /* Translate using device directory / page table information. */
> +    fault = riscv_iommu_spa_fetch(s, ctx, iotlb);
> +
> +    if (enable_pri && fault) {
> +        struct riscv_iommu_pq_record pr = {0};
> +        if (enable_pasid) {
> +            pr.hdr = set_field(RISCV_IOMMU_PREQ_HDR_PV,
> +                RISCV_IOMMU_PREQ_HDR_PID, ctx->pasid);
> +        }
> +        pr.hdr = set_field(pr.hdr, RISCV_IOMMU_PREQ_HDR_DID, ctx->devid);
> +        pr.payload = (iotlb->iova & TARGET_PAGE_MASK) |
> +                     RISCV_IOMMU_PREQ_PAYLOAD_M;
> +        riscv_iommu_pri(s, &pr);
> +        return fault;
> +    }
> +
> +    if (enable_faults && fault) {
> +        struct riscv_iommu_fq_record ev;
> +        unsigned ttype;
> +
> +        if (iotlb->perm & IOMMU_RW) {
> +            ttype = RISCV_IOMMU_FQ_TTYPE_UADDR_WR;
> +        } else {
> +            ttype = RISCV_IOMMU_FQ_TTYPE_UADDR_RD;
> +        }
> +        ev.hdr = set_field(0, RISCV_IOMMU_FQ_HDR_CAUSE, fault);
> +        ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_TTYPE, ttype);
> +        ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_PV, enable_pasid);
> +        ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_PID, ctx->pasid);
> +        ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_DID, ctx->devid);
> +        ev.iotval    = iotlb->iova;
> +        ev.iotval2   = iotlb->translated_addr;
> +        ev._reserved = 0;
> +        riscv_iommu_fault(s, &ev);
> +        return fault;
> +    }
> +
> +    return 0;
> +}
> +
> +/* IOMMU Command Interface */
> +static MemTxResult riscv_iommu_iofence(RISCVIOMMUState *s, bool notify,
> +    uint64_t addr, uint32_t data)
> +{
> +    /*
> +     * ATS processing in this implementation of the IOMMU is synchronous,
> +     * no need to wait for completions here.
> +     */
> +    if (!notify) {
> +        return MEMTX_OK;
> +    }
> +
> +    return dma_memory_write(s->target_as, addr, &data, sizeof(data),
> +        MEMTXATTRS_UNSPECIFIED);

We should also assert the interrupt when IOFENCE.WSI is set and the IOMMU
is configured for wire-signaled interrupts.
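
Roughly what I have in mind, as a sketch only. The bit positions and the names
IOFENCE_WSI, FCTL_WSI and iofence_raises_wsi() are placeholders assumed for
illustration; the real definitions would have to come from the IOMMU header:

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit masks, assumed for illustration only. */
#define IOFENCE_WSI (UINT64_C(1) << 11) /* WSI flag in the command's dword0 */
#define FCTL_WSI    (UINT32_C(1) << 1)  /* fctl: wire-signaled interrupts on */

/*
 * True if completing this IOFENCE command should also raise the
 * command-queue interrupt: the command asks for it and the IOMMU is
 * operating in wire-signaled interrupt mode.
 */
static bool iofence_raises_wsi(uint64_t cmd_dword0, uint32_t fctl)
{
    return (cmd_dword0 & IOFENCE_WSI) && (fctl & FCTL_WSI);
}
```

When such a check is true, riscv_iommu_iofence() could raise the command-queue
interrupt, for example through riscv_iommu_notify(s, RISCV_IOMMU_INTR_CQ) as
the fault path in riscv_iommu_process_cq_tail() already does.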

> +}
> +
> +static void riscv_iommu_process_ddtp(RISCVIOMMUState *s)
> +{
> +    uint64_t old_ddtp = s->ddtp;
> +    uint64_t new_ddtp = riscv_iommu_reg_get64(s, RISCV_IOMMU_REG_DDTP);
> +    unsigned new_mode = get_field(new_ddtp, RISCV_IOMMU_DDTP_MODE);
> +    unsigned old_mode = get_field(old_ddtp, RISCV_IOMMU_DDTP_MODE);
> +    bool ok = false;
> +
> +    /*
> +     * Check for allowed DDTP.MODE transitions:
> +     * {OFF, BARE}        -> {OFF, BARE, 1LVL, 2LVL, 3LVL}
> +     * {1LVL, 2LVL, 3LVL} -> {OFF, BARE}
> +     */
> +    if (new_mode == old_mode ||
> +        new_mode == RISCV_IOMMU_DDTP_MODE_OFF ||
> +        new_mode == RISCV_IOMMU_DDTP_MODE_BARE) {
> +        ok = true;
> +    } else if (new_mode == RISCV_IOMMU_DDTP_MODE_1LVL ||
> +               new_mode == RISCV_IOMMU_DDTP_MODE_2LVL ||
> +               new_mode == RISCV_IOMMU_DDTP_MODE_3LVL) {
> +        ok = old_mode == RISCV_IOMMU_DDTP_MODE_OFF ||
> +             old_mode == RISCV_IOMMU_DDTP_MODE_BARE;
> +    }
> +
> +    if (ok) {
> +        /* clear reserved and busy bits, report back sanitized version */
> +        new_ddtp = set_field(new_ddtp & RISCV_IOMMU_DDTP_PPN,
> +                             RISCV_IOMMU_DDTP_MODE, new_mode);
> +    } else {
> +        new_ddtp = old_ddtp;
> +    }
> +    s->ddtp = new_ddtp;
> +
> +    riscv_iommu_reg_set64(s, RISCV_IOMMU_REG_DDTP, new_ddtp);
> +}
> +
> +/* Command function and opcode field. */
> +#define RISCV_IOMMU_CMD(func, op) (((func) << 7) | (op))
> +
> +static void riscv_iommu_process_cq_tail(RISCVIOMMUState *s)
> +{
> +    struct riscv_iommu_command cmd;
> +    MemTxResult res;
> +    dma_addr_t addr;
> +    uint32_t tail, head, ctrl;
> +    uint64_t cmd_opcode;
> +    GHFunc func;
> +
> +    ctrl = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_CQCSR);
> +    tail = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_CQT) & s->cq_mask;
> +    head = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_CQH) & s->cq_mask;
> +
> +    /* Check for pending error or queue processing disabled */
> +    if (!(ctrl & RISCV_IOMMU_CQCSR_CQON) ||
> +        !!(ctrl & (RISCV_IOMMU_CQCSR_CMD_ILL | RISCV_IOMMU_CQCSR_CQMF))) {
> +        return;
> +    }
> +
> +    while (tail != head) {
> +        addr = s->cq_addr  + head * sizeof(cmd);
> +        res = dma_memory_read(s->target_as, addr, &cmd, sizeof(cmd),
> +                              MEMTXATTRS_UNSPECIFIED);
> +
> +        if (res != MEMTX_OK) {
> +            riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_CQCSR,
> +                                  RISCV_IOMMU_CQCSR_CQMF, 0);
> +            goto fault;
> +        }
> +
> +        trace_riscv_iommu_cmd(s->parent_obj.id, cmd.dword0, cmd.dword1);
> +
> +        cmd_opcode = get_field(cmd.dword0,
> +                               RISCV_IOMMU_CMD_OPCODE | RISCV_IOMMU_CMD_FUNC);
> +
> +        switch (cmd_opcode) {
> +        case RISCV_IOMMU_CMD(RISCV_IOMMU_CMD_IOFENCE_FUNC_C,
> +                             RISCV_IOMMU_CMD_IOFENCE_OPCODE):
> +            res = riscv_iommu_iofence(s,
> +                cmd.dword0 & RISCV_IOMMU_CMD_IOFENCE_AV, cmd.dword1,
> +                get_field(cmd.dword0, RISCV_IOMMU_CMD_IOFENCE_DATA));
> +
> +            if (res != MEMTX_OK) {
> +                riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_CQCSR,
> +                                      RISCV_IOMMU_CQCSR_CQMF, 0);
> +                goto fault;
> +            }
> +            break;
> +
> +        case RISCV_IOMMU_CMD(RISCV_IOMMU_CMD_IOTINVAL_FUNC_GVMA,
> +                             RISCV_IOMMU_CMD_IOTINVAL_OPCODE):
> +            if (cmd.dword0 & RISCV_IOMMU_CMD_IOTINVAL_PSCV) {
> +                /* illegal command arguments IOTINVAL.GVMA & PSCV == 1 */
> +                goto cmd_ill;
> +            }
> +            /* translation cache not implemented yet */
> +            break;
> +
> +        case RISCV_IOMMU_CMD(RISCV_IOMMU_CMD_IOTINVAL_FUNC_VMA,
> +                             RISCV_IOMMU_CMD_IOTINVAL_OPCODE):
> +            /* translation cache not implemented yet */
> +            break;
> +
> +        case RISCV_IOMMU_CMD(RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_DDT,
> +                             RISCV_IOMMU_CMD_IODIR_OPCODE):
> +            if (!(cmd.dword0 & RISCV_IOMMU_CMD_IODIR_DV)) {
> +                /* invalidate all device context cache mappings */
> +                func = __ctx_inval_all;
> +            } else {
> +                /* invalidate all device context matching DID */
> +                func = __ctx_inval_devid;
> +            }
> +            riscv_iommu_ctx_inval(s, func,
> +                get_field(cmd.dword0, RISCV_IOMMU_CMD_IODIR_DID), 0);
> +            break;
> +
> +        case RISCV_IOMMU_CMD(RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_PDT,
> +                             RISCV_IOMMU_CMD_IODIR_OPCODE):
> +            if (!(cmd.dword0 & RISCV_IOMMU_CMD_IODIR_DV)) {
> +                /* illegal command arguments IODIR_PDT & DV == 0 */
> +                goto cmd_ill;
> +            } else {
> +                func = __ctx_inval_devid_pasid;
> +            }
> +            riscv_iommu_ctx_inval(s, func,
> +                get_field(cmd.dword0, RISCV_IOMMU_CMD_IODIR_DID),
> +                get_field(cmd.dword0, RISCV_IOMMU_CMD_IODIR_PID));
> +            break;
> +
> +        default:
> +        cmd_ill:
> +            /* Invalid instruction, do not advance instruction index. */
> +            riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_CQCSR,
> +                RISCV_IOMMU_CQCSR_CMD_ILL, 0);
> +            goto fault;
> +        }
> +
> +        /* Advance and update head pointer after command completes. */
> +        head = (head + 1) & s->cq_mask;
> +        riscv_iommu_reg_set32(s, RISCV_IOMMU_REG_CQH, head);
> +    }
> +    return;
> +
> +fault:
> +    if (ctrl & RISCV_IOMMU_CQCSR_CIE) {
> +        riscv_iommu_notify(s, RISCV_IOMMU_INTR_CQ);
> +    }
> +}
> +
> +static void riscv_iommu_process_cq_control(RISCVIOMMUState *s)
> +{
> +    uint64_t base;
> +    uint32_t ctrl_set = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_CQCSR);
> +    uint32_t ctrl_clr;
> +    bool enable = !!(ctrl_set & RISCV_IOMMU_CQCSR_CQEN);
> +    bool active = !!(ctrl_set & RISCV_IOMMU_CQCSR_CQON);
> +
> +    if (enable && !active) {
> +        base = riscv_iommu_reg_get64(s, RISCV_IOMMU_REG_CQB);
> +        s->cq_mask = (2ULL << get_field(base, RISCV_IOMMU_CQB_LOG2SZ)) - 1;
> +        s->cq_addr = PPN_PHYS(get_field(base, RISCV_IOMMU_CQB_PPN));
> +        stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_CQT], ~s->cq_mask);
> +        stl_le_p(&s->regs_rw[RISCV_IOMMU_REG_CQH], 0);
> +        stl_le_p(&s->regs_rw[RISCV_IOMMU_REG_CQT], 0);
> +        ctrl_set = RISCV_IOMMU_CQCSR_CQON;
> +        ctrl_clr = RISCV_IOMMU_CQCSR_BUSY | RISCV_IOMMU_CQCSR_CQMF |
> +            RISCV_IOMMU_CQCSR_CMD_ILL | RISCV_IOMMU_CQCSR_CMD_TO;
> +    } else if (!enable && active) {
> +        stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_CQT], ~0);
> +        ctrl_set = 0;
> +        ctrl_clr = RISCV_IOMMU_CQCSR_BUSY | RISCV_IOMMU_CQCSR_CQON;
> +    } else {
> +        ctrl_set = 0;
> +        ctrl_clr = RISCV_IOMMU_CQCSR_BUSY;
> +    }
> +
> +    riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_CQCSR, ctrl_set, ctrl_clr);
> +}
> +
> +static void riscv_iommu_process_fq_control(RISCVIOMMUState *s)
> +{
> +    uint64_t base;
> +    uint32_t ctrl_set = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_FQCSR);
> +    uint32_t ctrl_clr;
> +    bool enable = !!(ctrl_set & RISCV_IOMMU_FQCSR_FQEN);
> +    bool active = !!(ctrl_set & RISCV_IOMMU_FQCSR_FQON);
> +
> +    if (enable && !active) {
> +        base = riscv_iommu_reg_get64(s, RISCV_IOMMU_REG_FQB);
> +        s->fq_mask = (2ULL << get_field(base, RISCV_IOMMU_FQB_LOG2SZ)) - 1;
> +        s->fq_addr = PPN_PHYS(get_field(base, RISCV_IOMMU_FQB_PPN));
> +        stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_FQH], ~s->fq_mask);
> +        stl_le_p(&s->regs_rw[RISCV_IOMMU_REG_FQH], 0);
> +        stl_le_p(&s->regs_rw[RISCV_IOMMU_REG_FQT], 0);
> +        ctrl_set = RISCV_IOMMU_FQCSR_FQON;
> +        ctrl_clr = RISCV_IOMMU_FQCSR_BUSY | RISCV_IOMMU_FQCSR_FQMF |
> +            RISCV_IOMMU_FQCSR_FQOF;
> +    } else if (!enable && active) {
> +        stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_FQH], ~0);
> +        ctrl_set = 0;
> +        ctrl_clr = RISCV_IOMMU_FQCSR_BUSY | RISCV_IOMMU_FQCSR_FQON;
> +    } else {
> +        ctrl_set = 0;
> +        ctrl_clr = RISCV_IOMMU_FQCSR_BUSY;
> +    }
> +
> +    riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_FQCSR, ctrl_set, ctrl_clr);
> +}
> +
> +static void riscv_iommu_process_pq_control(RISCVIOMMUState *s)
> +{
> +    uint64_t base;
> +    uint32_t ctrl_set = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_PQCSR);
> +    uint32_t ctrl_clr;
> +    bool enable = !!(ctrl_set & RISCV_IOMMU_PQCSR_PQEN);
> +    bool active = !!(ctrl_set & RISCV_IOMMU_PQCSR_PQON);
> +
> +    if (enable && !active) {
> +        base = riscv_iommu_reg_get64(s, RISCV_IOMMU_REG_PQB);
> +        s->pq_mask = (2ULL << get_field(base, RISCV_IOMMU_PQB_LOG2SZ)) - 1;
> +        s->pq_addr = PPN_PHYS(get_field(base, RISCV_IOMMU_PQB_PPN));
> +        stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_PQH], ~s->pq_mask);
> +        stl_le_p(&s->regs_rw[RISCV_IOMMU_REG_PQH], 0);
> +        stl_le_p(&s->regs_rw[RISCV_IOMMU_REG_PQT], 0);
> +        ctrl_set = RISCV_IOMMU_PQCSR_PQON;
> +        ctrl_clr = RISCV_IOMMU_PQCSR_BUSY | RISCV_IOMMU_PQCSR_PQMF |
> +            RISCV_IOMMU_PQCSR_PQOF;
> +    } else if (!enable && active) {
> +        stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_PQH], ~0);
> +        ctrl_set = 0;
> +        ctrl_clr = RISCV_IOMMU_PQCSR_BUSY | RISCV_IOMMU_PQCSR_PQON;
> +    } else {
> +        ctrl_set = 0;
> +        ctrl_clr = RISCV_IOMMU_PQCSR_BUSY;
> +    }
> +
> +    riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_PQCSR, ctrl_set, ctrl_clr);
> +}
> +
> +/* Core IOMMU execution activation */
> +enum {
> +    RISCV_IOMMU_EXEC_DDTP,
> +    RISCV_IOMMU_EXEC_CQCSR,
> +    RISCV_IOMMU_EXEC_CQT,
> +    RISCV_IOMMU_EXEC_FQCSR,
> +    RISCV_IOMMU_EXEC_FQH,
> +    RISCV_IOMMU_EXEC_PQCSR,
> +    RISCV_IOMMU_EXEC_PQH,
> +    RISCV_IOMMU_EXEC_TR_REQUEST,
> +    /* RISCV_IOMMU_EXEC_EXIT must be the last enum value */
> +    RISCV_IOMMU_EXEC_EXIT,
> +};
> +
> +static void *riscv_iommu_core_proc(void* arg)
> +{
> +    RISCVIOMMUState *s = arg;
> +    unsigned exec = 0;
> +    unsigned mask = 0;
> +
> +    while (!(exec & BIT(RISCV_IOMMU_EXEC_EXIT))) {
> +        mask = (mask ? mask : BIT(RISCV_IOMMU_EXEC_EXIT)) >> 1;
> +        switch (exec & mask) {
> +        case BIT(RISCV_IOMMU_EXEC_DDTP):
> +            riscv_iommu_process_ddtp(s);
> +            break;
> +        case BIT(RISCV_IOMMU_EXEC_CQCSR):
> +            riscv_iommu_process_cq_control(s);
> +            break;
> +        case BIT(RISCV_IOMMU_EXEC_CQT):
> +            riscv_iommu_process_cq_tail(s);
> +            break;
> +        case BIT(RISCV_IOMMU_EXEC_FQCSR):
> +            riscv_iommu_process_fq_control(s);
> +            break;
> +        case BIT(RISCV_IOMMU_EXEC_FQH):
> +            /* NOP */
> +            break;
> +        case BIT(RISCV_IOMMU_EXEC_PQCSR):
> +            riscv_iommu_process_pq_control(s);
> +            break;
> +        case BIT(RISCV_IOMMU_EXEC_PQH):
> +            /* NOP */
> +            break;
> +        case BIT(RISCV_IOMMU_EXEC_TR_REQUEST):
> +            /* DBG support not implemented yet */
> +            break;
> +        }
> +        exec &= ~mask;
> +        if (!exec) {
> +            qemu_mutex_lock(&s->core_lock);
> +            exec = s->core_exec;
> +            while (!exec) {
> +                qemu_cond_wait(&s->core_cond, &s->core_lock);
> +                exec = s->core_exec;
> +            }
> +            s->core_exec = 0;
> +            qemu_mutex_unlock(&s->core_lock);
> +        }
> +    };
> +
> +    return NULL;
> +}
> +
> +static MemTxResult riscv_iommu_mmio_write(void *opaque, hwaddr addr,
> +    uint64_t data, unsigned size, MemTxAttrs attrs)
> +{
> +    RISCVIOMMUState *s = opaque;
> +    uint32_t regb = addr & ~3;
> +    uint32_t busy = 0;
> +    uint32_t exec = 0;
> +
> +    if (size == 0 || size > 8 || (addr & (size - 1)) != 0) {

Is a write access with size == 0 or size > 8 ever possible here?
That should already be guarded by .valid.min_access_size and
.valid.max_access_size.

> +        /* Unsupported MMIO alignment or access size */
> +        return MEMTX_ERROR;
> +    }
> +
> +    if (addr + size > RISCV_IOMMU_REG_MSI_CONFIG) {
> +        /* Unsupported MMIO access location. */
> +        return MEMTX_ACCESS_ERROR;
> +    }
> +
> +    /* Track actionable MMIO write. */
> +    switch (regb) {
> +    case RISCV_IOMMU_REG_DDTP:
> +    case RISCV_IOMMU_REG_DDTP + 4:
> +        exec = BIT(RISCV_IOMMU_EXEC_DDTP);
> +        regb = RISCV_IOMMU_REG_DDTP;
> +        busy = RISCV_IOMMU_DDTP_BUSY;
> +        break;
> +
> +    case RISCV_IOMMU_REG_CQT:
> +        exec = BIT(RISCV_IOMMU_EXEC_CQT);
> +        break;
> +
> +    case RISCV_IOMMU_REG_CQCSR:
> +        exec = BIT(RISCV_IOMMU_EXEC_CQCSR);
> +        busy = RISCV_IOMMU_CQCSR_BUSY;
> +        break;
> +
> +    case RISCV_IOMMU_REG_FQH:
> +        exec = BIT(RISCV_IOMMU_EXEC_FQH);
> +        break;
> +
> +    case RISCV_IOMMU_REG_FQCSR:
> +        exec = BIT(RISCV_IOMMU_EXEC_FQCSR);
> +        busy = RISCV_IOMMU_FQCSR_BUSY;
> +        break;
> +
> +    case RISCV_IOMMU_REG_PQH:
> +        exec = BIT(RISCV_IOMMU_EXEC_PQH);
> +        break;
> +
> +    case RISCV_IOMMU_REG_PQCSR:
> +        exec = BIT(RISCV_IOMMU_EXEC_PQCSR);
> +        busy = RISCV_IOMMU_PQCSR_BUSY;
> +        break;
> +    }
> +
> +    /*
> +     * Registers update might be not synchronized with core logic.
> +     * If system software updates register when relevant BUSY bit is set
> +     * IOMMU behavior of additional writes to the register is UNSPECIFIED
> +     */
> +
> +    qemu_spin_lock(&s->regs_lock);
> +    if (size == 1) {
> +        uint8_t ro = s->regs_ro[addr];
> +        uint8_t wc = s->regs_wc[addr];
> +        uint8_t rw = s->regs_rw[addr];
> +        s->regs_rw[addr] = ((rw & ro) | (data & ~ro)) & ~(data & wc);
> +    } else if (size == 2) {
> +        uint16_t ro = lduw_le_p(&s->regs_ro[addr]);
> +        uint16_t wc = lduw_le_p(&s->regs_wc[addr]);
> +        uint16_t rw = lduw_le_p(&s->regs_rw[addr]);
> +        stw_le_p(&s->regs_rw[addr], ((rw & ro) | (data & ~ro)) & ~(data & wc));
> +    } else if (size == 4) {
> +        uint32_t ro = ldl_le_p(&s->regs_ro[addr]);
> +        uint32_t wc = ldl_le_p(&s->regs_wc[addr]);
> +        uint32_t rw = ldl_le_p(&s->regs_rw[addr]);
> +        stl_le_p(&s->regs_rw[addr], ((rw & ro) | (data & ~ro)) & ~(data & wc));
> +    } else if (size == 8) {
> +        uint64_t ro = ldq_le_p(&s->regs_ro[addr]);
> +        uint64_t wc = ldq_le_p(&s->regs_wc[addr]);
> +        uint64_t rw = ldq_le_p(&s->regs_rw[addr]);
> +        stq_le_p(&s->regs_rw[addr], ((rw & ro) | (data & ~ro)) & ~(data & wc));
> +    }
> +
> +    /* Busy flag update, MSB 4-byte register. */
> +    if (busy) {
> +        uint32_t rw = ldl_le_p(&s->regs_rw[regb]);
> +        stl_le_p(&s->regs_rw[regb], rw | busy);
> +    }
> +    qemu_spin_unlock(&s->regs_lock);
> +
> +    /* Wake up core processing thread. */
> +    if (exec) {
> +        qemu_mutex_lock(&s->core_lock);
> +        s->core_exec |= exec;
> +        qemu_cond_signal(&s->core_cond);
> +        qemu_mutex_unlock(&s->core_lock);
> +    }
> +
> +    return MEMTX_OK;
> +}
> +
> +static MemTxResult riscv_iommu_mmio_read(void *opaque, hwaddr addr,
> +    uint64_t *data, unsigned size, MemTxAttrs attrs)
> +{
> +    RISCVIOMMUState *s = opaque;
> +    uint64_t val = -1;
> +    uint8_t *ptr;
> +
> +    if ((addr & (size - 1)) != 0) {
> +        /* Unsupported MMIO alignment. */
> +        return MEMTX_ERROR;
> +    }
> +
> +    if (addr + size > RISCV_IOMMU_REG_MSI_CONFIG) {
> +        return MEMTX_ACCESS_ERROR;
> +    }
> +
> +    ptr = &s->regs_rw[addr];
> +
> +    if (size == 1) {
> +        val = (uint64_t)*ptr;
> +    } else if (size == 2) {
> +        val = lduw_le_p(ptr);
> +    } else if (size == 4) {
> +        val = ldl_le_p(ptr);
> +    } else if (size == 8) {
> +        val = ldq_le_p(ptr);
> +    } else {
> +        return MEMTX_ERROR;
> +    }
> +
> +    *data = val;
> +
> +    return MEMTX_OK;
> +}
> +
> +static const MemoryRegionOps riscv_iommu_mmio_ops = {
> +    .read_with_attrs = riscv_iommu_mmio_read,
> +    .write_with_attrs = riscv_iommu_mmio_write,
> +    .endianness = DEVICE_NATIVE_ENDIAN,
> +    .impl = {
> +        .min_access_size = 1,
> +        .max_access_size = 8,
> +        .unaligned = false,
> +    },
> +    .valid = {
> +        .min_access_size = 1,
> +        .max_access_size = 8,
> +    }

Spec says:
"The IOMMU behavior for register accesses where the address is not aligned
to the size of the access, or if the access spans multiple registers, or if
the size of the access is not 4 bytes or 8 bytes, is UNSPECIFIED."

Section 6.1, "Reading and writing IOMMU registers", also says:
"Registers that are 64-bit wide may be accessed using either a 32-bit or a
64-bit access. Registers that are 32-bit wide must only be accessed using a
32-bit access."

Should we limit the access sizes to only 4 and 8 bytes?
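
For illustration, the restriction boils down to a check like the one below.
This is a standalone sketch, not code from the patch, and reg_access_ok() is
a hypothetical name. It only covers the size and alignment rules, not the
"32-bit registers must only be accessed using a 32-bit access" rule:

```c
#include <stdbool.h>
#include <stdint.h>

/* Accept only naturally aligned 4-byte or 8-byte register accesses. */
static bool reg_access_ok(uint64_t addr, unsigned size)
{
    if (size != 4 && size != 8) {
        return false;
    }
    /* size is a power of two here, so (size - 1) is the alignment mask. */
    return (addr & (size - 1)) == 0;
}
```

In MemoryRegionOps terms this would presumably mean .valid.min_access_size = 4
and .valid.max_access_size = 8, which would also make the size checks in the
read and write handlers redundant.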

> +};
> +
> +/*
> + * Translations matching MSI pattern check are redirected to "riscv-iommu-trap"
> + * memory region as untranslated address, for additional MSI/MRIF interception
> + * by IOMMU interrupt remapping implementation.
> + * Note: Device emulation code generating an MSI is expected to provide a valid
> + * memory transaction attributes with requested_id set.
> + */
> +static MemTxResult riscv_iommu_trap_write(void *opaque, hwaddr addr,
> +    uint64_t data, unsigned size, MemTxAttrs attrs)
> +{
> +    RISCVIOMMUState* s = (RISCVIOMMUState *)opaque;
> +    RISCVIOMMUContext *ctx;
> +    MemTxResult res;
> +    void *ref;
> +    uint32_t devid = attrs.requester_id;
> +
> +    if (attrs.unspecified) {
> +        return MEMTX_ACCESS_ERROR;
> +    }
> +
> +    /* FIXME: PCIe bus remapping for attached endpoints. */
> +    devid |= s->bus << 8;
> +
> +    ctx = riscv_iommu_ctx(s, devid, 0, &ref);
> +    if (ctx == NULL) {
> +        res = MEMTX_ACCESS_ERROR;
> +    } else {
> +        res = riscv_iommu_msi_write(s, ctx, addr, data, size, attrs);
> +    }
> +    riscv_iommu_ctx_put(s, ref);
> +    return res;
> +}
> +
> +static MemTxResult riscv_iommu_trap_read(void *opaque, hwaddr addr,
> +    uint64_t *data, unsigned size, MemTxAttrs attrs)
> +{
> +    return MEMTX_ACCESS_ERROR;
> +}
> +
> +static const MemoryRegionOps riscv_iommu_trap_ops = {
> +    .read_with_attrs = riscv_iommu_trap_read,
> +    .write_with_attrs = riscv_iommu_trap_write,
> +    .endianness = DEVICE_LITTLE_ENDIAN,
> +    .impl = {
> +        .min_access_size = 1,
> +        .max_access_size = 8,
> +        .unaligned = true,
> +    },
> +    .valid = {
> +        .min_access_size = 1,
> +        .max_access_size = 8,
> +    }
> +};
> +
> +static void riscv_iommu_realize(DeviceState *dev, Error **errp)
> +{
> +    RISCVIOMMUState *s = RISCV_IOMMU(dev);
> +
> +    s->cap = s->version & RISCV_IOMMU_CAP_VERSION;
> +    if (s->enable_msi) {
> +        s->cap |= RISCV_IOMMU_CAP_MSI_FLAT | RISCV_IOMMU_CAP_MSI_MRIF;
> +    }
> +    /* Report QEMU target physical address space limits */
> +    s->cap = set_field(s->cap, RISCV_IOMMU_CAP_PAS,
> +                       TARGET_PHYS_ADDR_SPACE_BITS);
> +
> +    /* TODO: method to report supported PASID bits */
> +    s->pasid_bits = 8; /* restricted to size of MemTxAttrs.pasid */
> +    s->cap |= RISCV_IOMMU_CAP_PD8;
> +
> +    /* Out-of-reset translation mode: OFF (DMA disabled) BARE (passthrough) */
> +    s->ddtp = set_field(0, RISCV_IOMMU_DDTP_MODE, s->enable_off ?
> +                        RISCV_IOMMU_DDTP_MODE_OFF : RISCV_IOMMU_DDTP_MODE_BARE);
> +
> +    /* register storage */
> +    s->regs_rw = g_new0(uint8_t, RISCV_IOMMU_REG_SIZE);
> +    s->regs_ro = g_new0(uint8_t, RISCV_IOMMU_REG_SIZE);
> +    s->regs_wc = g_new0(uint8_t, RISCV_IOMMU_REG_SIZE);
> +
> +     /* Mark all registers read-only */
> +    memset(s->regs_ro, 0xff, RISCV_IOMMU_REG_SIZE);
> +
> +    /*
> +     * Register complete MMIO space, including MSI/PBA registers.
> +     * Note, PCIDevice implementation will add overlapping MR for MSI/PBA,
> +     * managed directly by the PCIDevice implementation.
> +     */
> +    memory_region_init_io(&s->regs_mr, OBJECT(dev), &riscv_iommu_mmio_ops, s,
> +        "riscv-iommu-regs", RISCV_IOMMU_REG_SIZE);
> +
> +    /* Set power-on register state */
> +    stq_le_p(&s->regs_rw[RISCV_IOMMU_REG_CAP], s->cap);
> +    stq_le_p(&s->regs_rw[RISCV_IOMMU_REG_FCTL], s->fctl);
> +    stq_le_p(&s->regs_ro[RISCV_IOMMU_REG_DDTP],
> +        ~(RISCV_IOMMU_DDTP_PPN | RISCV_IOMMU_DDTP_MODE));
> +    stq_le_p(&s->regs_ro[RISCV_IOMMU_REG_CQB],
> +        ~(RISCV_IOMMU_CQB_LOG2SZ | RISCV_IOMMU_CQB_PPN));
> +    stq_le_p(&s->regs_ro[RISCV_IOMMU_REG_FQB],
> +        ~(RISCV_IOMMU_FQB_LOG2SZ | RISCV_IOMMU_FQB_PPN));
> +    stq_le_p(&s->regs_ro[RISCV_IOMMU_REG_PQB],
> +        ~(RISCV_IOMMU_PQB_LOG2SZ | RISCV_IOMMU_PQB_PPN));
> +    stl_le_p(&s->regs_wc[RISCV_IOMMU_REG_CQCSR], RISCV_IOMMU_CQCSR_CQMF |
> +        RISCV_IOMMU_CQCSR_CMD_TO | RISCV_IOMMU_CQCSR_CMD_ILL);
> +    stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_CQCSR], RISCV_IOMMU_CQCSR_CQON |
> +        RISCV_IOMMU_CQCSR_BUSY);
> +    stl_le_p(&s->regs_wc[RISCV_IOMMU_REG_FQCSR], RISCV_IOMMU_FQCSR_FQMF |
> +        RISCV_IOMMU_FQCSR_FQOF);
> +    stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_FQCSR], RISCV_IOMMU_FQCSR_FQON |
> +        RISCV_IOMMU_FQCSR_BUSY);
> +    stl_le_p(&s->regs_wc[RISCV_IOMMU_REG_PQCSR], RISCV_IOMMU_PQCSR_PQMF |
> +        RISCV_IOMMU_PQCSR_PQOF);
> +    stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_PQCSR], RISCV_IOMMU_PQCSR_PQON |
> +        RISCV_IOMMU_PQCSR_BUSY);
> +    stl_le_p(&s->regs_wc[RISCV_IOMMU_REG_IPSR], ~0);
> +    stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_IVEC], 0);
> +    stq_le_p(&s->regs_rw[RISCV_IOMMU_REG_DDTP], s->ddtp);
> +
> +    /* Memory region for downstream access, if specified. */
> +    if (s->target_mr) {
> +        s->target_as = g_new0(AddressSpace, 1);
> +        address_space_init(s->target_as, s->target_mr,
> +            "riscv-iommu-downstream");
> +    } else {
> +        /* Fallback to global system memory. */
> +        s->target_as = &address_space_memory;
> +    }
> +
> +    /* Memory region for untranslated MRIF/MSI writes */
> +    memory_region_init_io(&s->trap_mr, OBJECT(dev), &riscv_iommu_trap_ops, s,
> +            "riscv-iommu-trap", ~0ULL);
> +    address_space_init(&s->trap_as, &s->trap_mr, "riscv-iommu-trap-as");
> +
> +    /* Device translation context cache */
> +    s->ctx_cache = g_hash_table_new_full(__ctx_hash, __ctx_equal,
> +                                         g_free, NULL);
> +
> +    s->iommus.le_next = NULL;
> +    s->iommus.le_prev = NULL;
> +    QLIST_INIT(&s->spaces);
> +    qemu_cond_init(&s->core_cond);
> +    qemu_mutex_init(&s->core_lock);
> +    qemu_spin_init(&s->regs_lock);
> +    qemu_thread_create(&s->core_proc, "riscv-iommu-core",
> +        riscv_iommu_core_proc, s, QEMU_THREAD_JOINABLE);

In our experience, running the core in a separate QEMU thread increases
the latency of command queue processing, which can lead to IOMMU fence
timeouts in the Linux driver when the IOMMU is used with KVM, e.g. while
booting a Linux guest.

Is it possible to remove the thread from the IOMMU, just like the ARM,
AMD, and Intel IOMMU models?

> +}
> +
> +static void riscv_iommu_unrealize(DeviceState *dev)
> +{
> +    RISCVIOMMUState *s = RISCV_IOMMU(dev);
> +
> +    qemu_mutex_lock(&s->core_lock);
> +    /* cancel pending operations and stop */
> +    s->core_exec = BIT(RISCV_IOMMU_EXEC_EXIT);
> +    qemu_cond_signal(&s->core_cond);
> +    qemu_mutex_unlock(&s->core_lock);
> +    qemu_thread_join(&s->core_proc);
> +    qemu_cond_destroy(&s->core_cond);
> +    qemu_mutex_destroy(&s->core_lock);
> +    g_hash_table_unref(s->ctx_cache);
> +}
> +
> +static Property riscv_iommu_properties[] = {
> +    DEFINE_PROP_UINT32("version", RISCVIOMMUState, version,
> +        RISCV_IOMMU_SPEC_DOT_VER),
> +    DEFINE_PROP_UINT32("bus", RISCVIOMMUState, bus, 0x0),
> +    DEFINE_PROP_BOOL("intremap", RISCVIOMMUState, enable_msi, TRUE),
> +    DEFINE_PROP_BOOL("off", RISCVIOMMUState, enable_off, TRUE),
> +    DEFINE_PROP_LINK("downstream-mr", RISCVIOMMUState, target_mr,
> +        TYPE_MEMORY_REGION, MemoryRegion *),
> +    DEFINE_PROP_END_OF_LIST(),
> +};
> +
> +static void riscv_iommu_class_init(ObjectClass *klass, void* data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +
> +    /* internal device for riscv-iommu-{pci/sys}, not user-creatable */
> +    dc->user_creatable = false;
> +    dc->realize = riscv_iommu_realize;
> +    dc->unrealize = riscv_iommu_unrealize;
> +    device_class_set_props(dc, riscv_iommu_properties);
> +}
> +
> +static const TypeInfo riscv_iommu_info = {
> +    .name = TYPE_RISCV_IOMMU,
> +    .parent = TYPE_DEVICE,
> +    .instance_size = sizeof(RISCVIOMMUState),
> +    .class_init = riscv_iommu_class_init,
> +};
> +
> +static const char *IOMMU_FLAG_STR[] = {
> +    "NA",
> +    "RO",
> +    "WR",
> +    "RW",
> +};
> +
> +/* RISC-V IOMMU Memory Region - Address Translation Space */
> +static IOMMUTLBEntry riscv_iommu_memory_region_translate(
> +    IOMMUMemoryRegion *iommu_mr, hwaddr addr,
> +    IOMMUAccessFlags flag, int iommu_idx)
> +{
> +    RISCVIOMMUSpace *as = container_of(iommu_mr, RISCVIOMMUSpace, iova_mr);
> +    RISCVIOMMUContext *ctx;
> +    void *ref;
> +    IOMMUTLBEntry iotlb = {
> +        .iova = addr,
> +        .target_as = as->iommu->target_as,
> +        .addr_mask = ~0ULL,
> +        .perm = flag,
> +    };
> +
> +    ctx = riscv_iommu_ctx(as->iommu, as->devid, iommu_idx, &ref);
> +    if (ctx == NULL) {
> +        /* Translation disabled or invalid. */
> +        iotlb.addr_mask = 0;
> +        iotlb.perm = IOMMU_NONE;
> +    } else if (riscv_iommu_translate(as->iommu, ctx, &iotlb)) {
> +        /* Translation disabled or fault reported. */
> +        iotlb.addr_mask = 0;
> +        iotlb.perm = IOMMU_NONE;
> +    }
> +
> +    /* Trace all dma translations with original access flags. */
> +    trace_riscv_iommu_dma(as->iommu->parent_obj.id, PCI_BUS_NUM(as->devid),
> +                          PCI_SLOT(as->devid), PCI_FUNC(as->devid), iommu_idx,
> +                          IOMMU_FLAG_STR[flag & IOMMU_RW], iotlb.iova,
> +                          iotlb.translated_addr);
> +
> +    riscv_iommu_ctx_put(as->iommu, ref);
> +
> +    return iotlb;
> +}
> +
> +static int riscv_iommu_memory_region_notify(
> +    IOMMUMemoryRegion *iommu_mr, IOMMUNotifierFlag old,
> +    IOMMUNotifierFlag new, Error **errp)
> +{
> +    RISCVIOMMUSpace *as = container_of(iommu_mr, RISCVIOMMUSpace, iova_mr);
> +
> +    if (old == IOMMU_NOTIFIER_NONE) {
> +        as->notifier = true;
> +        trace_riscv_iommu_notifier_add(iommu_mr->parent_obj.name);
> +    } else if (new == IOMMU_NOTIFIER_NONE) {
> +        as->notifier = false;
> +        trace_riscv_iommu_notifier_del(iommu_mr->parent_obj.name);
> +    }
> +
> +    return 0;
> +}
> +
> +static inline bool pci_is_iommu(PCIDevice *pdev)
> +{
> +    return pci_get_word(pdev->config + PCI_CLASS_DEVICE) == 0x0806;
> +}
> +
> +static AddressSpace *riscv_iommu_find_as(PCIBus *bus, void *opaque, int devfn)
> +{
> +    RISCVIOMMUState *s = (RISCVIOMMUState *) opaque;
> +    PCIDevice *pdev = pci_find_device(bus, pci_bus_num(bus), devfn);
> +    AddressSpace *as = NULL;
> +
> +    if (pdev && pci_is_iommu(pdev)) {
> +        return s->target_as;
> +    }
> +
> +    /* Find first registered IOMMU device */
> +    while (s->iommus.le_prev) {
> +        s = *(s->iommus.le_prev);
> +    }
> +
> +    /* Find first matching IOMMU */
> +    while (s != NULL && as == NULL) {
> +        as = riscv_iommu_space(s, PCI_BUILD_BDF(pci_bus_num(bus), devfn));

Regarding pci_bus_num(): riscv_iommu_find_as() can be called at a very
early stage, before software has had a chance to enumerate the bus numbers.
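
To illustrate the concern, here is a minimal standalone sketch. It assumes
that a bus behind a bridge reports bus number 0 until the guest programs
the bridge, and that the device ID is built as (bus << 8) | devfn. The
macro and function names below are stand-ins for illustration, not the
QEMU definitions.

```c
#include <stdint.h>

/* Stand-ins for the usual PCI devfn/BDF composition (assumed layout). */
#define SKETCH_DEVFN(slot, func)  ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define SKETCH_BDF(bus, devfn)    ((uint16_t)((((bus) & 0xff) << 8) | (devfn)))

/*
 * Device ID as seen by a lookup keyed on the current bus number.
 * 'bus' is whatever the bus reports at the time of the call.
 */
static uint16_t sketch_devid(unsigned bus, unsigned slot, unsigned func)
{
    return SKETCH_BDF(bus, SKETCH_DEVFN(slot, func));
}
```

With these assumptions, sketch_devid(0, 1, 0) and sketch_devid(2, 1, 0)
differ, so an address space looked up before enumeration would be keyed
on a device ID that the endpoint no longer has afterwards.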




> +        s = s->iommus.le_next;
> +    }
> +
> +    return as ? as : &address_space_memory;
> +}
> +
> +static const PCIIOMMUOps riscv_iommu_ops = {
> +    .get_address_space = riscv_iommu_find_as,
> +};
> +
> +void riscv_iommu_pci_setup_iommu(RISCVIOMMUState *iommu, PCIBus *bus,
> +        Error **errp)
> +{
> +    if (bus->iommu_ops &&
> +        bus->iommu_ops->get_address_space == riscv_iommu_find_as) {
> +        /* Allow multiple IOMMUs on the same PCIe bus, link known devices */
> +        RISCVIOMMUState *last = (RISCVIOMMUState *)bus->iommu_opaque;
> +        QLIST_INSERT_AFTER(last, iommu, iommus);
> +    } else if (bus->iommu_ops == NULL) {
> +        pci_setup_iommu(bus, &riscv_iommu_ops, iommu);
> +    } else {
> +        error_setg(errp, "can't register secondary IOMMU for PCI bus #%d",
> +            pci_bus_num(bus));
> +    }
> +}
> +
> +static int riscv_iommu_memory_region_index(IOMMUMemoryRegion *iommu_mr,
> +    MemTxAttrs attrs)
> +{
> +    return attrs.unspecified ? RISCV_IOMMU_NOPASID : (int)attrs.pasid;
> +}
> +
> +static int riscv_iommu_memory_region_index_len(IOMMUMemoryRegion *iommu_mr)
> +{
> +    RISCVIOMMUSpace *as = container_of(iommu_mr, RISCVIOMMUSpace, iova_mr);
> +    return 1 << as->iommu->pasid_bits;
> +}
> +
> +static void riscv_iommu_memory_region_init(ObjectClass *klass, void *data)
> +{
> +    IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_CLASS(klass);
> +
> +    imrc->translate = riscv_iommu_memory_region_translate;
> +    imrc->notify_flag_changed = riscv_iommu_memory_region_notify;
> +    imrc->attrs_to_index = riscv_iommu_memory_region_index;
> +    imrc->num_indexes = riscv_iommu_memory_region_index_len;
> +}
> +
> +static const TypeInfo riscv_iommu_memory_region_info = {
> +    .parent = TYPE_IOMMU_MEMORY_REGION,
> +    .name = TYPE_RISCV_IOMMU_MEMORY_REGION,
> +    .class_init = riscv_iommu_memory_region_init,
> +};
> +
> +static void riscv_iommu_register_mr_types(void)
> +{
> +    type_register_static(&riscv_iommu_memory_region_info);
> +    type_register_static(&riscv_iommu_info);
> +}
> +
> +type_init(riscv_iommu_register_mr_types);
> diff --git a/hw/riscv/riscv-iommu.h b/hw/riscv/riscv-iommu.h
> new file mode 100644
> index 0000000000..6f740de690
> --- /dev/null
> +++ b/hw/riscv/riscv-iommu.h
> @@ -0,0 +1,141 @@
> +/*
> + * QEMU emulation of an RISC-V IOMMU (Ziommu)
> + *
> + * Copyright (C) 2022-2023 Rivos Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef HW_RISCV_IOMMU_STATE_H
> +#define HW_RISCV_IOMMU_STATE_H
> +
> +#include "qemu/osdep.h"
> +#include "qom/object.h"
> +
> +#include "hw/riscv/iommu.h"
> +
> +struct RISCVIOMMUState {
> +    /*< private >*/
> +    DeviceState parent_obj;
> +
> +    /*< public >*/
> +    uint32_t version;     /* Reported interface version number */
> +    uint32_t pasid_bits;  /* process identifier width */
> +    uint32_t bus;         /* PCI bus mapping for non-root endpoints */
> +
> +    uint64_t cap;         /* IOMMU supported capabilities */
> +    uint64_t fctl;        /* IOMMU enabled features */
> +
> +    bool enable_off;      /* Enable out-of-reset OFF mode (DMA disabled) */
> +    bool enable_msi;      /* Enable MSI remapping */
> +
> +    /* IOMMU Internal State */
> +    uint64_t ddtp;        /* Validated Device Directory Tree Root Pointer */
> +
> +    dma_addr_t cq_addr;   /* Command queue base physical address */
> +    dma_addr_t fq_addr;   /* Fault/event queue base physical address */
> +    dma_addr_t pq_addr;   /* Page request queue base physical address */
> +
> +    uint32_t cq_mask;     /* Command queue index bit mask */
> +    uint32_t fq_mask;     /* Fault/event queue index bit mask */
> +    uint32_t pq_mask;     /* Page request queue index bit mask */
> +
> +    /* interrupt notifier */
> +    void (*notify)(RISCVIOMMUState *iommu, unsigned vector);
> +
> +    /* IOMMU State Machine */
> +    QemuThread core_proc; /* Background processing thread */
> +    QemuMutex core_lock;  /* Global IOMMU lock, used for cache/regs updates */
> +    QemuCond core_cond;   /* Background processing wake up signal */
> +    unsigned core_exec;   /* Processing thread execution actions */
> +
> +    /* IOMMU target address space */
> +    AddressSpace *target_as;
> +    MemoryRegion *target_mr;
> +
> +    /* MSI / MRIF access trap */
> +    AddressSpace trap_as;
> +    MemoryRegion trap_mr;
> +
> +    GHashTable *ctx_cache;          /* Device translation Context Cache */
> +
> +    /* MMIO Hardware Interface */
> +    MemoryRegion regs_mr;
> +    QemuSpin regs_lock;
> +    uint8_t *regs_rw;  /* register state (user write) */
> +    uint8_t *regs_wc;  /* write-1-to-clear mask */
> +    uint8_t *regs_ro;  /* read-only mask */
> +
> +    QLIST_ENTRY(RISCVIOMMUState) iommus;
> +    QLIST_HEAD(, RISCVIOMMUSpace) spaces;
> +};
> +
> +void riscv_iommu_pci_setup_iommu(RISCVIOMMUState *iommu, PCIBus *bus,
> +         Error **errp);
> +
> +/* private helpers */
> +
> +/* Register helper functions */
> +static inline uint32_t riscv_iommu_reg_mod32(RISCVIOMMUState *s,
> +    unsigned idx, uint32_t set, uint32_t clr)
> +{
> +    uint32_t val;
> +    qemu_spin_lock(&s->regs_lock);
> +    val = ldl_le_p(s->regs_rw + idx);
> +    stl_le_p(s->regs_rw + idx, (val & ~clr) | set);
> +    qemu_spin_unlock(&s->regs_lock);
> +    return val;
> +}
> +
> +static inline void riscv_iommu_reg_set32(RISCVIOMMUState *s,
> +    unsigned idx, uint32_t set)
> +{
> +    qemu_spin_lock(&s->regs_lock);
> +    stl_le_p(s->regs_rw + idx, set);
> +    qemu_spin_unlock(&s->regs_lock);
> +}
> +
> +static inline uint32_t riscv_iommu_reg_get32(RISCVIOMMUState *s,
> +    unsigned idx)
> +{
> +    return ldl_le_p(s->regs_rw + idx);
> +}
> +
> +static inline uint64_t riscv_iommu_reg_mod64(RISCVIOMMUState *s,
> +    unsigned idx, uint64_t set, uint64_t clr)
> +{
> +    uint64_t val;
> +    qemu_spin_lock(&s->regs_lock);
> +    val = ldq_le_p(s->regs_rw + idx);
> +    stq_le_p(s->regs_rw + idx, (val & ~clr) | set);
> +    qemu_spin_unlock(&s->regs_lock);
> +    return val;
> +}
> +
> +static inline void riscv_iommu_reg_set64(RISCVIOMMUState *s,
> +    unsigned idx, uint64_t set)
> +{
> +    qemu_spin_lock(&s->regs_lock);
> +    stq_le_p(s->regs_rw + idx, set);
> +    qemu_spin_unlock(&s->regs_lock);
> +}
> +
> +static inline uint64_t riscv_iommu_reg_get64(RISCVIOMMUState *s,
> +    unsigned idx)
> +{
> +    return ldq_le_p(s->regs_rw + idx);
> +}
> +
> +
> +
> +#endif
> diff --git a/hw/riscv/trace-events b/hw/riscv/trace-events
> new file mode 100644
> index 0000000000..42a97caffa
> --- /dev/null
> +++ b/hw/riscv/trace-events
> @@ -0,0 +1,11 @@
> +# See documentation at docs/devel/tracing.rst
> +
> +# riscv-iommu.c
> +riscv_iommu_new(const char *id, unsigned b, unsigned d, unsigned f) "%s: device attached %04x:%02x.%d"
> +riscv_iommu_flt(const char *id, unsigned b, unsigned d, unsigned f, uint64_t reason, uint64_t iova) "%s: fault %04x:%02x.%u reason: 0x%"PRIx64" iova: 0x%"PRIx64
> +riscv_iommu_pri(const char *id, unsigned b, unsigned d, unsigned f, uint64_t iova) "%s: page request %04x:%02x.%u iova: 0x%"PRIx64
> +riscv_iommu_dma(const char *id, unsigned b, unsigned d, unsigned f, unsigned pasid, const char *dir, uint64_t iova, uint64_t phys) "%s: translate %04x:%02x.%u #%u %s 0x%"PRIx64" -> 0x%"PRIx64
> +riscv_iommu_msi(const char *id, unsigned b, unsigned d, unsigned f, uint64_t iova, uint64_t phys) "%s: translate %04x:%02x.%u MSI 0x%"PRIx64" -> 0x%"PRIx64
> +riscv_iommu_cmd(const char *id, uint64_t l, uint64_t u) "%s: command 0x%"PRIx64" 0x%"PRIx64
> +riscv_iommu_notifier_add(const char *id) "%s: dev-iotlb notifier added"
> +riscv_iommu_notifier_del(const char *id) "%s: dev-iotlb notifier removed"
> diff --git a/hw/riscv/trace.h b/hw/riscv/trace.h
> new file mode 100644
> index 0000000000..b88504b750
> --- /dev/null
> +++ b/hw/riscv/trace.h
> @@ -0,0 +1,2 @@
> +#include "trace/trace-hw_riscv.h"
> +
> diff --git a/include/hw/riscv/iommu.h b/include/hw/riscv/iommu.h
> new file mode 100644
> index 0000000000..403b365893
> --- /dev/null
> +++ b/include/hw/riscv/iommu.h
> @@ -0,0 +1,36 @@
> +/*
> + * QEMU emulation of an RISC-V IOMMU (Ziommu)
> + *
> + * Copyright (C) 2022-2023 Rivos Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef HW_RISCV_IOMMU_H
> +#define HW_RISCV_IOMMU_H
> +
> +#include "qemu/osdep.h"
> +#include "qom/object.h"
> +
> +#define TYPE_RISCV_IOMMU "riscv-iommu"
> +OBJECT_DECLARE_SIMPLE_TYPE(RISCVIOMMUState, RISCV_IOMMU)
> +typedef struct RISCVIOMMUState RISCVIOMMUState;
> +
> +#define TYPE_RISCV_IOMMU_MEMORY_REGION "riscv-iommu-mr"
> +typedef struct RISCVIOMMUSpace RISCVIOMMUSpace;
> +
> +#define TYPE_RISCV_IOMMU_PCI "riscv-iommu-pci"
> +OBJECT_DECLARE_SIMPLE_TYPE(RISCVIOMMUStatePci, RISCV_IOMMU_PCI)
> +typedef struct RISCVIOMMUStatePci RISCVIOMMUStatePci;
> +
> +#endif
> diff --git a/meson.build b/meson.build
> index c59ca496f2..75e56f3282 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -3361,6 +3361,7 @@ if have_system
>      'hw/rdma',
>      'hw/rdma/vmw',
>      'hw/rtc',
> +    'hw/riscv',
>      'hw/s390x',
>      'hw/scsi',
>      'hw/sd',
> --
> 2.43.2
>
>


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 11/15] hw/riscv/riscv-iommu: add DBG support
  2024-03-07 16:03 ` [PATCH v2 11/15] hw/riscv/riscv-iommu: add DBG support Daniel Henrique Barboza
@ 2024-05-06  4:09   ` Frank Chang
  2024-05-06 13:05     ` Daniel Henrique Barboza
  0 siblings, 1 reply; 55+ messages in thread
From: Frank Chang @ 2024-05-06  4:09 UTC (permalink / raw)
  To: Daniel Henrique Barboza
  Cc: qemu-devel, qemu-riscv, alistair.francis, bmeng, liwei1518,
	zhiwei_liu, palmer, ajones, tjeznach

Hi Daniel,

Daniel Henrique Barboza <dbarboza@ventanamicro.com> wrote on Fri, Mar 8, 2024 at 12:05 AM:
>
> From: Tomasz Jeznach <tjeznach@rivosinc.com>
>
> DBG support adds three additional registers: tr_req_iova, tr_req_ctl and
> tr_response.
>
> The DBG cap is always enabled. No on/off toggle is provided for it.
>
> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
> ---
>  hw/riscv/riscv-iommu-bits.h | 20 +++++++++++++
>  hw/riscv/riscv-iommu.c      | 57 ++++++++++++++++++++++++++++++++++++-
>  2 files changed, 76 insertions(+), 1 deletion(-)
>
> diff --git a/hw/riscv/riscv-iommu-bits.h b/hw/riscv/riscv-iommu-bits.h
> index 0994f5ce48..b3f92411bb 100644
> --- a/hw/riscv/riscv-iommu-bits.h
> +++ b/hw/riscv/riscv-iommu-bits.h
> @@ -83,6 +83,7 @@ struct riscv_iommu_pq_record {
>  #define RISCV_IOMMU_CAP_MSI_MRIF        BIT_ULL(23)
>  #define RISCV_IOMMU_CAP_ATS             BIT_ULL(25)
>  #define RISCV_IOMMU_CAP_IGS             GENMASK_ULL(29, 28)
> +#define RISCV_IOMMU_CAP_DBG             BIT_ULL(31)
>  #define RISCV_IOMMU_CAP_PAS             GENMASK_ULL(37, 32)
>  #define RISCV_IOMMU_CAP_PD8             BIT_ULL(38)
>
> @@ -177,6 +178,25 @@ enum {
>      RISCV_IOMMU_INTR_COUNT
>  };
>
> +#define RISCV_IOMMU_IPSR_CIP            BIT(RISCV_IOMMU_INTR_CQ)
> +#define RISCV_IOMMU_IPSR_FIP            BIT(RISCV_IOMMU_INTR_FQ)
> +#define RISCV_IOMMU_IPSR_PMIP           BIT(RISCV_IOMMU_INTR_PM)
> +#define RISCV_IOMMU_IPSR_PIP            BIT(RISCV_IOMMU_INTR_PQ)

These are not related to the DBG.

> +
> +/* 5.24 Translation request IOVA (64bits) */
> +#define RISCV_IOMMU_REG_TR_REQ_IOVA     0x0258
> +
> +/* 5.25 Translation request control (64bits) */
> +#define RISCV_IOMMU_REG_TR_REQ_CTL      0x0260
> +#define RISCV_IOMMU_TR_REQ_CTL_GO_BUSY  BIT_ULL(0)
> +#define RISCV_IOMMU_TR_REQ_CTL_PID      GENMASK_ULL(31, 12)
> +#define RISCV_IOMMU_TR_REQ_CTL_DID      GENMASK_ULL(63, 40)
> +
> +/* 5.26 Translation request response (64bits) */
> +#define RISCV_IOMMU_REG_TR_RESPONSE     0x0268
> +#define RISCV_IOMMU_TR_RESPONSE_FAULT   BIT_ULL(0)
> +#define RISCV_IOMMU_TR_RESPONSE_PPN     RISCV_IOMMU_PPN_FIELD
> +
>  /* 5.27 Interrupt cause to vector (64bits) */
>  #define RISCV_IOMMU_REG_IVEC            0x02F8
>
> diff --git a/hw/riscv/riscv-iommu.c b/hw/riscv/riscv-iommu.c
> index 7af5929b10..1fa1286d07 100644
> --- a/hw/riscv/riscv-iommu.c
> +++ b/hw/riscv/riscv-iommu.c
> @@ -1457,6 +1457,46 @@ static void riscv_iommu_process_pq_control(RISCVIOMMUState *s)
>      riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_PQCSR, ctrl_set, ctrl_clr);
>  }
>
> +static void riscv_iommu_process_dbg(RISCVIOMMUState *s)
> +{
> +    uint64_t iova = riscv_iommu_reg_get64(s, RISCV_IOMMU_REG_TR_REQ_IOVA);
> +    uint64_t ctrl = riscv_iommu_reg_get64(s, RISCV_IOMMU_REG_TR_REQ_CTL);
> +    unsigned devid = get_field(ctrl, RISCV_IOMMU_TR_REQ_CTL_DID);
> +    unsigned pid = get_field(ctrl, RISCV_IOMMU_TR_REQ_CTL_PID);
> +    RISCVIOMMUContext *ctx;
> +    void *ref;
> +
> +    if (!(ctrl & RISCV_IOMMU_TR_REQ_CTL_GO_BUSY)) {
> +        return;
> +    }
> +
> +    ctx = riscv_iommu_ctx(s, devid, pid, &ref);
> +    if (ctx == NULL) {
> +        riscv_iommu_reg_set64(s, RISCV_IOMMU_REG_TR_RESPONSE,
> +                                 RISCV_IOMMU_TR_RESPONSE_FAULT |
> +                                 (RISCV_IOMMU_FQ_CAUSE_DMA_DISABLED << 10));
> +    } else {
> +        IOMMUTLBEntry iotlb = {
> +            .iova = iova,
> +            .perm = IOMMU_NONE,

.perm should honor tr_req_ctl.[Exe|Nw]
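
A minimal standalone sketch of the idea, not the actual patch. The bit
positions for Exe (bit 2) and NW (bit 3) are assumptions, since the hunk
above only defines GO_BUSY, PID and DID. The PERM_* values are stand-ins
that mirror QEMU's IOMMU_RO/IOMMU_RW, and the helper names are
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed tr_req_ctl bit positions; not defined in the patch above. */
#define SKETCH_TR_REQ_CTL_EXE  (UINT64_C(1) << 2)  /* instruction fetch */
#define SKETCH_TR_REQ_CTL_NW   (UINT64_C(1) << 3)  /* no write needed   */

/* Stand-ins mirroring the IOMMUAccessFlags values. */
enum { PERM_NONE = 0, PERM_RO = 1, PERM_WO = 2, PERM_RW = 3 };

/* Requested permission: read-only when NW is set, read-write otherwise. */
static int sketch_tr_req_perm(uint64_t ctrl)
{
    return (ctrl & SKETCH_TR_REQ_CTL_NW) ? PERM_RO : PERM_RW;
}

/*
 * IOMMUAccessFlags has no execute bit, so Exe would have to be carried
 * to the translation code separately.
 */
static bool sketch_tr_req_exec(uint64_t ctrl)
{
    return (ctrl & SKETCH_TR_REQ_CTL_EXE) != 0;
}
```

The result of sketch_tr_req_perm() would then replace IOMMU_NONE in the
iotlb initializer.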

> +            .addr_mask = ~0,
> +            .target_as = NULL,
> +        };
> +        int fault = riscv_iommu_translate(s, ctx, &iotlb, false);
> +        if (fault) {
> +            iova = RISCV_IOMMU_TR_RESPONSE_FAULT | (((uint64_t) fault) << 10);
> +        } else {
> +            iova = ((iotlb.translated_addr & ~iotlb.addr_mask) >> 2) &

For a 4-KiB page, we should right-shift by 12 bits.
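
A small standalone sketch of the two-step form: derive the PPN with a
12-bit shift, then place it in the response field. It assumes the PPN
field occupies bits 53:10 of tr_response (the definition of
RISCV_IOMMU_PPN_FIELD is not part of this hunk). The names are
hypothetical.

```c
#include <stdint.h>

/* Assumed layout: tr_response.PPN in bits 53:10 (44 bits wide). */
#define SKETCH_PPN_SHIFT  10
#define SKETCH_PPN_MASK   (((UINT64_C(1) << 44) - 1) << SKETCH_PPN_SHIFT)
#define SKETCH_PAGE_SHIFT 12   /* 4-KiB base page */

/* Build the PPN part of tr_response from a translated physical address. */
static uint64_t sketch_tr_response_ppn(uint64_t pa)
{
    uint64_t ppn = pa >> SKETCH_PAGE_SHIFT;          /* drop page offset */
    return (ppn << SKETCH_PPN_SHIFT) & SKETCH_PPN_MASK;
}
```

Keeping the page shift and the field position as separate steps makes it
easier to check each one against the spec.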

> +                RISCV_IOMMU_TR_RESPONSE_PPN;

The translation may also not be a 4-KiB page (i.e. it may be a superpage),
in which case we should set tr_response.S and encode the translation range
size in tr_response.PPN.
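
A rough standalone sketch of such an encoding. It assumes that S is
bit 9 and PPN is bits 53:10 of tr_response, and that the range is encoded
in the low PPN bits in an ATS-like way (low bits set to 1 below a single
0 bit). Please check these against the spec; the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed tr_response layout; not taken from the patch. */
#define SKETCH_RESP_S         (UINT64_C(1) << 9)
#define SKETCH_RESP_PPN_SHIFT 10
#define SKETCH_RESP_PPN_MASK  (((UINT64_C(1) << 44) - 1) << SKETCH_RESP_PPN_SHIFT)

/*
 * pa:         translated physical address
 * page_shift: log2 of the translation size (12 = 4 KiB, 21 = 2 MiB, ...)
 */
static uint64_t sketch_tr_response(uint64_t pa, unsigned page_shift)
{
    uint64_t ppn = pa >> 12;
    uint64_t resp = 0;

    assert(page_shift >= 12 && page_shift < 56);

    if (page_shift > 12) {
        unsigned k = page_shift - 12;             /* PPN bits inside range */
        uint64_t range = (UINT64_C(1) << k) - 1;  /* low k bits            */

        ppn &= ~range;       /* align PPN to the superpage                 */
        ppn |= range >> 1;   /* bits k-2..0 = 1, bit k-1 = 0 encodes size  */
        resp |= SKETCH_RESP_S;
    }

    resp |= (ppn << SKETCH_RESP_PPN_SHIFT) & SKETCH_RESP_PPN_MASK;
    return resp;
}
```

For a 2-MiB translation this sets S and leaves PPN[7:0] = 0xff with
PPN[8] = 0. A 4-KiB translation leaves S clear and PPN untouched.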

Regards,
Frank Chang

> +        }
> +        riscv_iommu_reg_set64(s, RISCV_IOMMU_REG_TR_RESPONSE, iova);
> +    }
> +
> +    riscv_iommu_reg_mod64(s, RISCV_IOMMU_REG_TR_REQ_CTL, 0,
> +        RISCV_IOMMU_TR_REQ_CTL_GO_BUSY);
> +    riscv_iommu_ctx_put(s, ref);
> +}
> +
>  /* Core IOMMU execution activation */
>  enum {
>      RISCV_IOMMU_EXEC_DDTP,
> @@ -1502,7 +1542,7 @@ static void *riscv_iommu_core_proc(void* arg)
>              /* NOP */
>              break;
>          case BIT(RISCV_IOMMU_EXEC_TR_REQUEST):
> -            /* DBG support not implemented yet */
> +            riscv_iommu_process_dbg(s);
>              break;
>          }
>          exec &= ~mask;
> @@ -1574,6 +1614,12 @@ static MemTxResult riscv_iommu_mmio_write(void *opaque, hwaddr addr,
>          exec = BIT(RISCV_IOMMU_EXEC_PQCSR);
>          busy = RISCV_IOMMU_PQCSR_BUSY;
>          break;
> +
> +    case RISCV_IOMMU_REG_TR_REQ_CTL:
> +        exec = BIT(RISCV_IOMMU_EXEC_TR_REQUEST);
> +        regb = RISCV_IOMMU_REG_TR_REQ_CTL;
> +        busy = RISCV_IOMMU_TR_REQ_CTL_GO_BUSY;
> +        break;
>      }
>
>      /*
> @@ -1746,6 +1792,9 @@ static void riscv_iommu_realize(DeviceState *dev, Error **errp)
>          s->cap |= RISCV_IOMMU_CAP_SV32X4 | RISCV_IOMMU_CAP_SV39X4 |
>                    RISCV_IOMMU_CAP_SV48X4 | RISCV_IOMMU_CAP_SV57X4;
>      }
> +    /* Enable translation debug interface */
> +    s->cap |= RISCV_IOMMU_CAP_DBG;
> +
>      /* Report QEMU target physical address space limits */
>      s->cap = set_field(s->cap, RISCV_IOMMU_CAP_PAS,
>                         TARGET_PHYS_ADDR_SPACE_BITS);
> @@ -1800,6 +1849,12 @@ static void riscv_iommu_realize(DeviceState *dev, Error **errp)
>      stl_le_p(&s->regs_wc[RISCV_IOMMU_REG_IPSR], ~0);
>      stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_IVEC], 0);
>      stq_le_p(&s->regs_rw[RISCV_IOMMU_REG_DDTP], s->ddtp);
> +    /* If debug registers enabled. */
> +    if (s->cap & RISCV_IOMMU_CAP_DBG) {
> +        stq_le_p(&s->regs_ro[RISCV_IOMMU_REG_TR_REQ_IOVA], 0);
> +        stq_le_p(&s->regs_ro[RISCV_IOMMU_REG_TR_REQ_CTL],
> +            RISCV_IOMMU_TR_REQ_CTL_GO_BUSY);
> +    }
>
>      /* Memory region for downstream access, if specified. */
>      if (s->target_mr) {
> --
> 2.43.2
>
>


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 12/15] hw/riscv/riscv-iommu: Add another irq for mrif notifications
  2024-03-07 16:03 ` [PATCH v2 12/15] hw/riscv/riscv-iommu: Add another irq for mrif notifications Daniel Henrique Barboza
@ 2024-05-06  6:12   ` Frank Chang
  0 siblings, 0 replies; 55+ messages in thread
From: Frank Chang @ 2024-05-06  6:12 UTC (permalink / raw)
  To: Daniel Henrique Barboza
  Cc: qemu-devel, qemu-riscv, alistair.francis, bmeng, liwei1518,
	zhiwei_liu, palmer, ajones, tjeznach

Reviewed-by: Frank Chang <frank.chang@sifive.com>

Daniel Henrique Barboza <dbarboza@ventanamicro.com> wrote on Fri, Mar 8, 2024 at 12:06 AM:
>
> From: Andrew Jones <ajones@ventanamicro.com>
>
> And add mrif notification trace.
>
> Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
> ---
>  hw/riscv/riscv-iommu-pci.c | 2 +-
>  hw/riscv/riscv-iommu.c     | 1 +
>  hw/riscv/trace-events      | 1 +
>  3 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/hw/riscv/riscv-iommu-pci.c b/hw/riscv/riscv-iommu-pci.c
> index 4eb1057210..8a7b71166c 100644
> --- a/hw/riscv/riscv-iommu-pci.c
> +++ b/hw/riscv/riscv-iommu-pci.c
> @@ -78,7 +78,7 @@ static void riscv_iommu_pci_realize(PCIDevice *dev, Error **errp)
>      pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY |
>                       PCI_BASE_ADDRESS_MEM_TYPE_64, &s->bar0);
>
> -    int ret = msix_init(dev, RISCV_IOMMU_INTR_COUNT,
> +    int ret = msix_init(dev, RISCV_IOMMU_INTR_COUNT + 1,
>                          &s->bar0, 0, RISCV_IOMMU_REG_MSI_CONFIG,
>                          &s->bar0, 0, RISCV_IOMMU_REG_MSI_CONFIG + 256, 0, &err);
>
> diff --git a/hw/riscv/riscv-iommu.c b/hw/riscv/riscv-iommu.c
> index 1fa1286d07..954a6892c2 100644
> --- a/hw/riscv/riscv-iommu.c
> +++ b/hw/riscv/riscv-iommu.c
> @@ -543,6 +543,7 @@ static MemTxResult riscv_iommu_msi_write(RISCVIOMMUState *s,
>      if (res != MEMTX_OK) {
>          return res;
>      }
> +    trace_riscv_iommu_mrif_notification(s->parent_obj.id, n190, addr);
>
>      return MEMTX_OK;
>  }
> diff --git a/hw/riscv/trace-events b/hw/riscv/trace-events
> index 4b486b6420..d69719a27a 100644
> --- a/hw/riscv/trace-events
> +++ b/hw/riscv/trace-events
> @@ -6,6 +6,7 @@ riscv_iommu_flt(const char *id, unsigned b, unsigned d, unsigned f, uint64_t rea
>  riscv_iommu_pri(const char *id, unsigned b, unsigned d, unsigned f, uint64_t iova) "%s: page request %04x:%02x.%u iova: 0x%"PRIx64
>  riscv_iommu_dma(const char *id, unsigned b, unsigned d, unsigned f, unsigned pasid, const char *dir, uint64_t iova, uint64_t phys) "%s: translate %04x:%02x.%u #%u %s 0x%"PRIx64" -> 0x%"PRIx64
>  riscv_iommu_msi(const char *id, unsigned b, unsigned d, unsigned f, uint64_t iova, uint64_t phys) "%s: translate %04x:%02x.%u MSI 0x%"PRIx64" -> 0x%"PRIx64
> +riscv_iommu_mrif_notification(const char *id, uint32_t nid, uint64_t phys) "%s: sent MRIF notification 0x%x to 0x%"PRIx64
>  riscv_iommu_cmd(const char *id, uint64_t l, uint64_t u) "%s: command 0x%"PRIx64" 0x%"PRIx64
>  riscv_iommu_notifier_add(const char *id) "%s: dev-iotlb notifier added"
>  riscv_iommu_notifier_del(const char *id) "%s: dev-iotlb notifier removed"
> --
> 2.43.2
>
>


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 11/15] hw/riscv/riscv-iommu: add DBG support
  2024-05-06  4:09   ` Frank Chang
@ 2024-05-06 13:05     ` Daniel Henrique Barboza
  2024-05-10 10:59       ` Frank Chang
  0 siblings, 1 reply; 55+ messages in thread
From: Daniel Henrique Barboza @ 2024-05-06 13:05 UTC (permalink / raw)
  To: Frank Chang
  Cc: qemu-devel, qemu-riscv, alistair.francis, bmeng, liwei1518,
	zhiwei_liu, palmer, ajones, tjeznach

Hi Frank,

On 5/6/24 01:09, Frank Chang wrote:
> Hi Daniel,
> 
> Daniel Henrique Barboza <dbarboza@ventanamicro.com> wrote on Fri, Mar 8, 2024 at 12:05 AM:
>>
>> From: Tomasz Jeznach <tjeznach@rivosinc.com>
>>
>> DBG support adds three additional registers: tr_req_iova, tr_req_ctl and
>> tr_response.
>>
>> The DBG cap is always enabled. No on/off toggle is provided for it.
>>
>> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
>> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
>> ---
>>   hw/riscv/riscv-iommu-bits.h | 20 +++++++++++++
>>   hw/riscv/riscv-iommu.c      | 57 ++++++++++++++++++++++++++++++++++++-
>>   2 files changed, 76 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/riscv/riscv-iommu-bits.h b/hw/riscv/riscv-iommu-bits.h
>> index 0994f5ce48..b3f92411bb 100644
>> --- a/hw/riscv/riscv-iommu-bits.h
>> +++ b/hw/riscv/riscv-iommu-bits.h
>> @@ -83,6 +83,7 @@ struct riscv_iommu_pq_record {
>>   #define RISCV_IOMMU_CAP_MSI_MRIF        BIT_ULL(23)
>>   #define RISCV_IOMMU_CAP_ATS             BIT_ULL(25)
>>   #define RISCV_IOMMU_CAP_IGS             GENMASK_ULL(29, 28)
>> +#define RISCV_IOMMU_CAP_DBG             BIT_ULL(31)
>>   #define RISCV_IOMMU_CAP_PAS             GENMASK_ULL(37, 32)
>>   #define RISCV_IOMMU_CAP_PD8             BIT_ULL(38)
>>
>> @@ -177,6 +178,25 @@ enum {
>>       RISCV_IOMMU_INTR_COUNT
>>   };
>>
>> +#define RISCV_IOMMU_IPSR_CIP            BIT(RISCV_IOMMU_INTR_CQ)
>> +#define RISCV_IOMMU_IPSR_FIP            BIT(RISCV_IOMMU_INTR_FQ)
>> +#define RISCV_IOMMU_IPSR_PMIP           BIT(RISCV_IOMMU_INTR_PM)
>> +#define RISCV_IOMMU_IPSR_PIP            BIT(RISCV_IOMMU_INTR_PQ)
> 
> These are not related to the DBG.
> 
>> +
>> +/* 5.24 Translation request IOVA (64bits) */
>> +#define RISCV_IOMMU_REG_TR_REQ_IOVA     0x0258
>> +
>> +/* 5.25 Translation request control (64bits) */
>> +#define RISCV_IOMMU_REG_TR_REQ_CTL      0x0260
>> +#define RISCV_IOMMU_TR_REQ_CTL_GO_BUSY  BIT_ULL(0)
>> +#define RISCV_IOMMU_TR_REQ_CTL_PID      GENMASK_ULL(31, 12)
>> +#define RISCV_IOMMU_TR_REQ_CTL_DID      GENMASK_ULL(63, 40)
>> +
>> +/* 5.26 Translation request response (64bits) */
>> +#define RISCV_IOMMU_REG_TR_RESPONSE     0x0268
>> +#define RISCV_IOMMU_TR_RESPONSE_FAULT   BIT_ULL(0)
>> +#define RISCV_IOMMU_TR_RESPONSE_PPN     RISCV_IOMMU_PPN_FIELD
>> +
>>   /* 5.27 Interrupt cause to vector (64bits) */
>>   #define RISCV_IOMMU_REG_IVEC            0x02F8
>>
>> diff --git a/hw/riscv/riscv-iommu.c b/hw/riscv/riscv-iommu.c
>> index 7af5929b10..1fa1286d07 100644
>> --- a/hw/riscv/riscv-iommu.c
>> +++ b/hw/riscv/riscv-iommu.c
>> @@ -1457,6 +1457,46 @@ static void riscv_iommu_process_pq_control(RISCVIOMMUState *s)
>>       riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_PQCSR, ctrl_set, ctrl_clr);
>>   }
>>
>> +static void riscv_iommu_process_dbg(RISCVIOMMUState *s)
>> +{
>> +    uint64_t iova = riscv_iommu_reg_get64(s, RISCV_IOMMU_REG_TR_REQ_IOVA);
>> +    uint64_t ctrl = riscv_iommu_reg_get64(s, RISCV_IOMMU_REG_TR_REQ_CTL);
>> +    unsigned devid = get_field(ctrl, RISCV_IOMMU_TR_REQ_CTL_DID);
>> +    unsigned pid = get_field(ctrl, RISCV_IOMMU_TR_REQ_CTL_PID);
>> +    RISCVIOMMUContext *ctx;
>> +    void *ref;
>> +
>> +    if (!(ctrl & RISCV_IOMMU_TR_REQ_CTL_GO_BUSY)) {
>> +        return;
>> +    }
>> +
>> +    ctx = riscv_iommu_ctx(s, devid, pid, &ref);
>> +    if (ctx == NULL) {
>> +        riscv_iommu_reg_set64(s, RISCV_IOMMU_REG_TR_RESPONSE,
>> +                                 RISCV_IOMMU_TR_RESPONSE_FAULT |
>> +                                 (RISCV_IOMMU_FQ_CAUSE_DMA_DISABLED << 10));
>> +    } else {
>> +        IOMMUTLBEntry iotlb = {
>> +            .iova = iova,
>> +            .perm = IOMMU_NONE,
> 
> .perm should honor tr_req_ctl.[Exe|Nw]
> 
>> +            .addr_mask = ~0,
>> +            .target_as = NULL,
>> +        };
>> +        int fault = riscv_iommu_translate(s, ctx, &iotlb, false);
>> +        if (fault) {
>> +            iova = RISCV_IOMMU_TR_RESPONSE_FAULT | (((uint64_t) fault) << 10);
>> +        } else {
>> +            iova = ((iotlb.translated_addr & ~iotlb.addr_mask) >> 2) &
> 
> For a 4-KB page, we should right-shift by 12 bits.
> 
>> +                RISCV_IOMMU_TR_RESPONSE_PPN;
> 
> The translation may not be a 4-KB page (i.e. it may be a superpage), in
> which case we should set tr_response.S and encode the translation range
> size in tr_response.PPN.

At the moment this emulation doesn't support superpages, at least as far as I
understand it. Tomasz is welcome to correct me if I'm wrong. I'll explicitly
set tr_response.S to 0 here to make that clearer.

The idea here, IIUC, is to eventually merge the IOMMU translation lookup code
with the existing lookup code we have (cpu_helper.c, get_physical_address()).
With that, the IOMMU will end up supporting both superpages and svnapot.
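
On the right-shift remark above, a small standalone sketch of the arithmetic
may help. It assumes the tr_response.PPN field occupies bits 53:10 (which is
what RISCV_IOMMU_PPN_FIELD is believed to expand to); the helper names are
made up for illustration and are not part of the patch. Under that assumption,
shifting the address right by 12 and then placing the result at bit 10 gives
the same value as the single ">> 2" plus mask, so the explicit form is mostly
a readability choice:

```c
#include <stdint.h>

#define GENMASK_ULL(h, l) \
    (((~0ULL) >> (63 - (h))) & ((~0ULL) << (l)))

/* Assumption: tr_response.PPN occupies bits 53:10. */
#define TR_RESPONSE_PPN  GENMASK_ULL(53, 10)

/* Form used in the patch: one shift by 2, then mask. */
static inline uint64_t tr_ppn_shift2(uint64_t pa)
{
    return (pa >> 2) & TR_RESPONSE_PPN;
}

/* Explicit form: PPN = pa >> 12, then moved up to bit 10. */
static inline uint64_t tr_ppn_explicit(uint64_t pa)
{
    return ((pa >> 12) << 10) & TR_RESPONSE_PPN;
}
```

Neither form sets tr_response.S, so both only describe 4-KB translations.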



Thanks,

Daniel


> 
> Regards,
> Frank Chang
> 
>> +        }
>> +        riscv_iommu_reg_set64(s, RISCV_IOMMU_REG_TR_RESPONSE, iova);
>> +    }
>> +
>> +    riscv_iommu_reg_mod64(s, RISCV_IOMMU_REG_TR_REQ_CTL, 0,
>> +        RISCV_IOMMU_TR_REQ_CTL_GO_BUSY);
>> +    riscv_iommu_ctx_put(s, ref);
>> +}
>> +
>>   /* Core IOMMU execution activation */
>>   enum {
>>       RISCV_IOMMU_EXEC_DDTP,
>> @@ -1502,7 +1542,7 @@ static void *riscv_iommu_core_proc(void* arg)
>>               /* NOP */
>>               break;
>>           case BIT(RISCV_IOMMU_EXEC_TR_REQUEST):
>> -            /* DBG support not implemented yet */
>> +            riscv_iommu_process_dbg(s);
>>               break;
>>           }
>>           exec &= ~mask;
>> @@ -1574,6 +1614,12 @@ static MemTxResult riscv_iommu_mmio_write(void *opaque, hwaddr addr,
>>           exec = BIT(RISCV_IOMMU_EXEC_PQCSR);
>>           busy = RISCV_IOMMU_PQCSR_BUSY;
>>           break;
>> +
>> +    case RISCV_IOMMU_REG_TR_REQ_CTL:
>> +        exec = BIT(RISCV_IOMMU_EXEC_TR_REQUEST);
>> +        regb = RISCV_IOMMU_REG_TR_REQ_CTL;
>> +        busy = RISCV_IOMMU_TR_REQ_CTL_GO_BUSY;
>> +        break;
>>       }
>>
>>       /*
>> @@ -1746,6 +1792,9 @@ static void riscv_iommu_realize(DeviceState *dev, Error **errp)
>>           s->cap |= RISCV_IOMMU_CAP_SV32X4 | RISCV_IOMMU_CAP_SV39X4 |
>>                     RISCV_IOMMU_CAP_SV48X4 | RISCV_IOMMU_CAP_SV57X4;
>>       }
>> +    /* Enable translation debug interface */
>> +    s->cap |= RISCV_IOMMU_CAP_DBG;
>> +
>>       /* Report QEMU target physical address space limits */
>>       s->cap = set_field(s->cap, RISCV_IOMMU_CAP_PAS,
>>                          TARGET_PHYS_ADDR_SPACE_BITS);
>> @@ -1800,6 +1849,12 @@ static void riscv_iommu_realize(DeviceState *dev, Error **errp)
>>       stl_le_p(&s->regs_wc[RISCV_IOMMU_REG_IPSR], ~0);
>>       stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_IVEC], 0);
>>       stq_le_p(&s->regs_rw[RISCV_IOMMU_REG_DDTP], s->ddtp);
>> +    /* If debug registers enabled. */
>> +    if (s->cap & RISCV_IOMMU_CAP_DBG) {
>> +        stq_le_p(&s->regs_ro[RISCV_IOMMU_REG_TR_REQ_IOVA], 0);
>> +        stq_le_p(&s->regs_ro[RISCV_IOMMU_REG_TR_REQ_CTL],
>> +            RISCV_IOMMU_TR_REQ_CTL_GO_BUSY);
>> +    }
>>
>>       /* Memory region for downstream access, if specified. */
>>       if (s->target_mr) {
>> --
>> 2.43.2
>>
>>



* Re: [PATCH v2 13/15] qtest/riscv-iommu-test: add init queues test
  2024-03-07 16:03 ` [PATCH v2 13/15] qtest/riscv-iommu-test: add init queues test Daniel Henrique Barboza
@ 2024-05-07  8:01   ` Frank Chang
  0 siblings, 0 replies; 55+ messages in thread
From: Frank Chang @ 2024-05-07  8:01 UTC (permalink / raw)
  To: Daniel Henrique Barboza
  Cc: qemu-devel, qemu-riscv, alistair.francis, bmeng, liwei1518,
	zhiwei_liu, palmer, ajones, tjeznach

Reviewed-by: Frank Chang <frank.chang@sifive.com>

On Fri, Mar 8, 2024 at 12:06 AM Daniel Henrique Barboza <dbarboza@ventanamicro.com> wrote:
>
> Add an additional test to further exercise the IOMMU where we attempt to
> initialize the command, fault and page-request queues.
>
> These steps are taken from chapter 6.2 of the RISC-V IOMMU spec,
> "Guidelines for initialization". It emulates what we expect from the
> software/OS when initializing the IOMMU.
>
> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
> ---
>  tests/qtest/libqos/riscv-iommu.h |  29 +++++++
>  tests/qtest/riscv-iommu-test.c   | 141 +++++++++++++++++++++++++++++++
>  2 files changed, 170 insertions(+)
>
> diff --git a/tests/qtest/libqos/riscv-iommu.h b/tests/qtest/libqos/riscv-iommu.h
> index 8c056caa7b..aeaa5fb8b8 100644
> --- a/tests/qtest/libqos/riscv-iommu.h
> +++ b/tests/qtest/libqos/riscv-iommu.h
> @@ -58,6 +58,35 @@
>
>  #define RISCV_IOMMU_REG_IPSR            0x0054
>
> +#define RISCV_IOMMU_REG_IVEC            0x02F8
> +#define RISCV_IOMMU_REG_IVEC_CIV        GENMASK_ULL(3, 0)
> +#define RISCV_IOMMU_REG_IVEC_FIV        GENMASK_ULL(7, 4)
> +#define RISCV_IOMMU_REG_IVEC_PIV        GENMASK_ULL(15, 12)
> +
> +#define RISCV_IOMMU_REG_CQB             0x0018
> +#define RISCV_IOMMU_CQB_PPN_START       10
> +#define RISCV_IOMMU_CQB_PPN_LEN         44
> +#define RISCV_IOMMU_CQB_LOG2SZ_START    0
> +#define RISCV_IOMMU_CQB_LOG2SZ_LEN      5
> +
> +#define RISCV_IOMMU_REG_CQT             0x0024
> +
> +#define RISCV_IOMMU_REG_FQB             0x0028
> +#define RISCV_IOMMU_FQB_PPN_START       10
> +#define RISCV_IOMMU_FQB_PPN_LEN         44
> +#define RISCV_IOMMU_FQB_LOG2SZ_START    0
> +#define RISCV_IOMMU_FQB_LOG2SZ_LEN      5
> +
> +#define RISCV_IOMMU_REG_FQT             0x0034
> +
> +#define RISCV_IOMMU_REG_PQB             0x0038
> +#define RISCV_IOMMU_PQB_PPN_START       10
> +#define RISCV_IOMMU_PQB_PPN_LEN         44
> +#define RISCV_IOMMU_PQB_LOG2SZ_START    0
> +#define RISCV_IOMMU_PQB_LOG2SZ_LEN      5
> +
> +#define RISCV_IOMMU_REG_PQT             0x0044
> +
>  typedef struct QRISCVIOMMU {
>      QOSGraphObject obj;
>      QPCIDevice dev;
> diff --git a/tests/qtest/riscv-iommu-test.c b/tests/qtest/riscv-iommu-test.c
> index 13b887d15e..64f3f092f2 100644
> --- a/tests/qtest/riscv-iommu-test.c
> +++ b/tests/qtest/riscv-iommu-test.c
> @@ -33,6 +33,20 @@ static uint64_t riscv_iommu_read_reg64(QRISCVIOMMU *r_iommu, int reg_offset)
>      return reg;
>  }
>
> +static void riscv_iommu_write_reg32(QRISCVIOMMU *r_iommu, int reg_offset,
> +                                    uint32_t val)
> +{
> +    qpci_memwrite(&r_iommu->dev, r_iommu->reg_bar, reg_offset,
> +                  &val, sizeof(val));
> +}
> +
> +static void riscv_iommu_write_reg64(QRISCVIOMMU *r_iommu, int reg_offset,
> +                                    uint64_t val)
> +{
> +    qpci_memwrite(&r_iommu->dev, r_iommu->reg_bar, reg_offset,
> +                  &val, sizeof(val));
> +}
> +
>  static void test_pci_config(void *obj, void *data, QGuestAllocator *t_alloc)
>  {
>      QRISCVIOMMU *r_iommu = obj;
> @@ -84,10 +98,137 @@ static void test_reg_reset(void *obj, void *data, QGuestAllocator *t_alloc)
>      g_assert_cmpuint(reg, ==, 0);
>  }
>
> +/*
> + * Common timeout-based poll for CQCSR, FQCSR and PQCSR. All
> + * their ON bits are mapped as RISCV_IOMMU_QUEUE_ACTIVE (16),
> + */
> +static void qtest_wait_for_queue_active(QRISCVIOMMU *r_iommu,
> +                                        uint32_t queue_csr)
> +{
> +    QTestState *qts = global_qtest;
> +    guint64 timeout_us = 2 * 1000 * 1000;
> +    gint64 start_time = g_get_monotonic_time();
> +    uint32_t reg;
> +
> +    for (;;) {
> +        qtest_clock_step(qts, 100);
> +
> +        reg = riscv_iommu_read_reg32(r_iommu, queue_csr);
> +        if (reg & RISCV_IOMMU_QUEUE_ACTIVE) {
> +            break;
> +        }
> +        g_assert(g_get_monotonic_time() - start_time <= timeout_us);
> +    }
> +}
> +
> +/*
> + * Goes through the queue activation procedures of chapter 6.2,
> + * "Guidelines for initialization", of the RISCV-IOMMU spec.
> + */
> +static void test_iommu_init_queues(void *obj, void *data,
> +                                   QGuestAllocator *t_alloc)
> +{
> +    QRISCVIOMMU *r_iommu = obj;
> +    uint64_t reg64, q_addr;
> +    uint32_t reg;
> +    int k;
> +
> +    reg64 = riscv_iommu_read_reg64(r_iommu, RISCV_IOMMU_REG_CAP);
> +    g_assert_cmpuint(reg64 & RISCV_IOMMU_CAP_VERSION, ==, 0x10);
> +
> +    /*
> +     * Program the command queue. Write 0xF to civ, assert that
> +     * we have 4 writable bits (k = 4). The amount of entries N in the
> +     * command queue is 2^4 = 16. We need to alloc a N*16 bytes
> +     * buffer and use it to set cqb.
> +     */
> +    riscv_iommu_write_reg32(r_iommu, RISCV_IOMMU_REG_IVEC,
> +                            0xFFFF & RISCV_IOMMU_REG_IVEC_CIV);
> +    reg = riscv_iommu_read_reg32(r_iommu, RISCV_IOMMU_REG_IVEC);
> +    g_assert_cmpuint(reg & RISCV_IOMMU_REG_IVEC_CIV, ==, 0xF);
> +
> +    q_addr = guest_alloc(t_alloc, 16 * 16);
> +    reg64 = 0;
> +    k = 4;
> +    deposit64(reg64, RISCV_IOMMU_CQB_PPN_START,
> +              RISCV_IOMMU_CQB_PPN_LEN, q_addr);
> +    deposit64(reg64, RISCV_IOMMU_CQB_LOG2SZ_START,
> +              RISCV_IOMMU_CQB_LOG2SZ_LEN, k - 1);
> +    riscv_iommu_write_reg64(r_iommu, RISCV_IOMMU_REG_CQB, reg64);
> +
> +    /* cqt = 0, cqcsr.cqen = 1, poll cqcsr.cqon until it reads 1 */
> +    riscv_iommu_write_reg32(r_iommu, RISCV_IOMMU_REG_CQT, 0);
> +
> +    reg = riscv_iommu_read_reg32(r_iommu, RISCV_IOMMU_REG_CQCSR);
> +    reg |= RISCV_IOMMU_CQCSR_CQEN;
> +    riscv_iommu_write_reg32(r_iommu, RISCV_IOMMU_REG_CQCSR, reg);
> +
> +    qtest_wait_for_queue_active(r_iommu, RISCV_IOMMU_REG_CQCSR);
> +
> +    /*
> +     * Program the fault queue. Similar to the above:
> +     * - Write 0xF to fiv, assert that we have 4 writable bits (k = 4)
> +     * - Alloc a 16*32 bytes (instead of 16*16) buffer and use it to set
> +     * fqb
> +     */
> +    riscv_iommu_write_reg32(r_iommu, RISCV_IOMMU_REG_IVEC,
> +                            0xFFFF & RISCV_IOMMU_REG_IVEC_FIV);
> +    reg = riscv_iommu_read_reg32(r_iommu, RISCV_IOMMU_REG_IVEC);
> +    g_assert_cmpuint(reg & RISCV_IOMMU_REG_IVEC_FIV, ==, 0xF0);
> +
> +    q_addr = guest_alloc(t_alloc, 16 * 32);
> +    reg64 = 0;
> +    k = 4;
> +    deposit64(reg64, RISCV_IOMMU_FQB_PPN_START,
> +              RISCV_IOMMU_FQB_PPN_LEN, q_addr);
> +    deposit64(reg64, RISCV_IOMMU_FQB_LOG2SZ_START,
> +              RISCV_IOMMU_FQB_LOG2SZ_LEN, k - 1);
> +    riscv_iommu_write_reg64(r_iommu, RISCV_IOMMU_REG_FQB, reg64);
> +
> +    /* fqt = 0, fqcsr.fqen = 1, poll fqcsr.fqon until it reads 1 */
> +    riscv_iommu_write_reg32(r_iommu, RISCV_IOMMU_REG_FQT, 0);
> +
> +    reg = riscv_iommu_read_reg32(r_iommu, RISCV_IOMMU_REG_FQCSR);
> +    reg |= RISCV_IOMMU_FQCSR_FQEN;
> +    riscv_iommu_write_reg32(r_iommu, RISCV_IOMMU_REG_FQCSR, reg);
> +
> +    qtest_wait_for_queue_active(r_iommu, RISCV_IOMMU_REG_FQCSR);
> +
> +    /*
> +     * Program the page-request queue:
> +     - Write 0xF to piv, assert that we have 4 writable bits (k = 4)
> +     - Alloc a 16*16 bytes buffer and use it to set pqb.
> +     */
> +    riscv_iommu_write_reg32(r_iommu, RISCV_IOMMU_REG_IVEC,
> +                            0xFFFF & RISCV_IOMMU_REG_IVEC_PIV);
> +    reg = riscv_iommu_read_reg32(r_iommu, RISCV_IOMMU_REG_IVEC);
> +    g_assert_cmpuint(reg & RISCV_IOMMU_REG_IVEC_PIV, ==, 0xF000);
> +
> +    q_addr = guest_alloc(t_alloc, 16 * 16);
> +    reg64 = 0;
> +    k = 4;
> +    deposit64(reg64, RISCV_IOMMU_PQB_PPN_START,
> +              RISCV_IOMMU_PQB_PPN_LEN, q_addr);
> +    deposit64(reg64, RISCV_IOMMU_PQB_LOG2SZ_START,
> +              RISCV_IOMMU_PQB_LOG2SZ_LEN, k - 1);
> +    riscv_iommu_write_reg64(r_iommu, RISCV_IOMMU_REG_PQB, reg64);
> +
> +    /* pqt = 0, pqcsr.pqen = 1, poll pqcsr.pqon until it reads 1 */
> +    riscv_iommu_write_reg32(r_iommu, RISCV_IOMMU_REG_PQT, 0);
> +
> +    reg = riscv_iommu_read_reg32(r_iommu, RISCV_IOMMU_REG_PQCSR);
> +    reg |= RISCV_IOMMU_PQCSR_PQEN;
> +    riscv_iommu_write_reg32(r_iommu, RISCV_IOMMU_REG_PQCSR, reg);
> +
> +    qtest_wait_for_queue_active(r_iommu, RISCV_IOMMU_REG_PQCSR);
> +}
> +
>  static void register_riscv_iommu_test(void)
>  {
>      qos_add_test("pci_config", "riscv-iommu-pci", test_pci_config, NULL);
>      qos_add_test("reg_reset", "riscv-iommu-pci", test_reg_reset, NULL);
> +    qos_add_test("iommu_init_queues", "riscv-iommu-pci",
> +                 test_iommu_init_queues, NULL);
>  }
>
>  libqos_init(register_riscv_iommu_test);
> --
> 2.43.2
>
>



* Re: [PATCH v2 14/15] hw/misc: EDU: added PASID support
  2024-03-07 16:03 ` [PATCH v2 14/15] hw/misc: EDU: added PASID support Daniel Henrique Barboza
@ 2024-05-07  9:06   ` Frank Chang
  0 siblings, 0 replies; 55+ messages in thread
From: Frank Chang @ 2024-05-07  9:06 UTC (permalink / raw)
  To: Daniel Henrique Barboza
  Cc: qemu-devel, qemu-riscv, alistair.francis, bmeng, liwei1518,
	zhiwei_liu, palmer, ajones, tjeznach

Hi Daniel,

On Fri, Mar 8, 2024 at 12:05 AM Daniel Henrique Barboza <dbarboza@ventanamicro.com> wrote:

>
> From: Tomasz Jeznach <tjeznach@rivosinc.com>
>
> Extension to support DMA with PASID identifier and reporting PASID
> extended PCIe capabilities.
>
> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
> ---
>  hw/misc/edu.c | 57 +++++++++++++++++++++++++++++++++++++++------------
>  1 file changed, 44 insertions(+), 13 deletions(-)
>
> diff --git a/hw/misc/edu.c b/hw/misc/edu.c
> index 2a976ca2b1..522cec85b3 100644
> --- a/hw/misc/edu.c
> +++ b/hw/misc/edu.c
> @@ -26,6 +26,7 @@
>  #include "qemu/units.h"
>  #include "hw/pci/pci.h"
>  #include "hw/hw.h"
> +#include "hw/qdev-properties.h"
>  #include "hw/pci/msi.h"
>  #include "qemu/timer.h"
>  #include "qom/object.h"
> @@ -53,6 +54,8 @@ struct EduState {
>      QemuCond thr_cond;
>      bool stopping;
>
> +    bool enable_pasid;
> +
>      uint32_t addr4;
>      uint32_t fact;
>  #define EDU_STATUS_COMPUTING    0x01
> @@ -66,6 +69,9 @@ struct EduState {
>  # define EDU_DMA_FROM_PCI       0
>  # define EDU_DMA_TO_PCI         1
>  #define EDU_DMA_IRQ             0x4
> +#define EDU_DMA_PV              0x8
> +#define EDU_DMA_PASID(cmd)      (((cmd) >> 8) & ((1U << 20) - 1))
> +
>      struct dma_state {
>          dma_addr_t src;
>          dma_addr_t dst;
> @@ -126,12 +132,7 @@ static void edu_check_range(uint64_t addr, uint64_t size1, uint64_t start,
>
>  static dma_addr_t edu_clamp_addr(const EduState *edu, dma_addr_t addr)
>  {
> -    dma_addr_t res = addr & edu->dma_mask;
> -
> -    if (addr != res) {
> -        printf("EDU: clamping DMA %#.16"PRIx64" to %#.16"PRIx64"!\n", addr, res);
> -    }
> -
> +    dma_addr_t res = addr;
>      return res;
>  }
>
> @@ -139,23 +140,33 @@ static void edu_dma_timer(void *opaque)
>  {
>      EduState *edu = opaque;
>      bool raise_irq = false;
> +    MemTxAttrs attrs = MEMTXATTRS_UNSPECIFIED;
>
>      if (!(edu->dma.cmd & EDU_DMA_RUN)) {
>          return;
>      }
>
> +    if (edu->enable_pasid && (edu->dma.cmd & EDU_DMA_PV)) {
> +        attrs.unspecified = 0;
> +        attrs.pasid = EDU_DMA_PASID(edu->dma.cmd);
> +        attrs.requester_id = pci_requester_id(&edu->pdev);
> +        attrs.secure = 0;
> +    }
> +
>      if (EDU_DMA_DIR(edu->dma.cmd) == EDU_DMA_FROM_PCI) {
>          uint64_t dst = edu->dma.dst;
>          edu_check_range(dst, edu->dma.cnt, DMA_START, DMA_SIZE);
>          dst -= DMA_START;
> -        pci_dma_read(&edu->pdev, edu_clamp_addr(edu, edu->dma.src),
> -                edu->dma_buf + dst, edu->dma.cnt);
> +        pci_dma_rw(&edu->pdev, edu_clamp_addr(edu, edu->dma.src),
> +                edu->dma_buf + dst, edu->dma.cnt,
> +                DMA_DIRECTION_TO_DEVICE, attrs);
>      } else {
>          uint64_t src = edu->dma.src;
>          edu_check_range(src, edu->dma.cnt, DMA_START, DMA_SIZE);
>          src -= DMA_START;
> -        pci_dma_write(&edu->pdev, edu_clamp_addr(edu, edu->dma.dst),
> -                edu->dma_buf + src, edu->dma.cnt);
> +        pci_dma_rw(&edu->pdev, edu_clamp_addr(edu, edu->dma.dst),
> +                edu->dma_buf + src, edu->dma.cnt,
> +                DMA_DIRECTION_FROM_DEVICE, attrs);
>      }
>
>      edu->dma.cmd &= ~EDU_DMA_RUN;
> @@ -255,7 +266,8 @@ static void edu_mmio_write(void *opaque, hwaddr addr, uint64_t val,
>          if (qatomic_read(&edu->status) & EDU_STATUS_COMPUTING) {
>              break;
>          }
> -        /* EDU_STATUS_COMPUTING cannot go 0->1 concurrently, because it is only
> +        /*
> +         * EDU_STATUS_COMPUTING cannot go 0->1 concurrently, because it is only
>           * set in this function and it is under the iothread mutex.
>           */
>          qemu_mutex_lock(&edu->thr_mutex);
> @@ -368,9 +380,21 @@ static void pci_edu_realize(PCIDevice *pdev, Error **errp)
>  {
>      EduState *edu = EDU(pdev);
>      uint8_t *pci_conf = pdev->config;
> +    int pos;
>
>      pci_config_set_interrupt_pin(pci_conf, 1);
>
> +    pcie_endpoint_cap_init(pdev, 0);
> +
> +    /* PCIe extended capability for PASID */
> +    pos = PCI_CONFIG_SPACE_SIZE;
> +    if (edu->enable_pasid) {
> +        /* PCIe Spec 7.8.9 PASID Extended Capability Structure */
> +        pcie_add_capability(pdev, 0x1b, 1, pos, 8);
> +        pci_set_long(pdev->config + pos + 4, 0x00001400);
> +        pci_set_long(pdev->wmask + pos + 4,  0xfff0ffff);

We should use the defines declared in
include/standard-headers/linux/pci_regs.h for readability.
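
As a rough sketch of what that could look like: the constants below are
believed to match pci_regs.h but are repeated here only so the snippet builds
on its own, and the two helpers are hypothetical names, not existing QEMU
functions. The 0x1400 in the low half of the value written at pos + 4 then
reads as "max PASID width = 20":

```c
#include <stdint.h>

/*
 * Values believed to match include/standard-headers/linux/pci_regs.h;
 * redefined locally so this sketch is standalone.
 */
#define PCI_EXT_CAP_ID_PASID      0x1B    /* Process Address Space ID */
#define PCI_PASID_CAP             0x04    /* PASID capability register offset */
#define PCI_PASID_CAP_WIDTH       0x1f00  /* Max PASID width, bits 12:8 */
#define PCI_PASID_CTRL            0x06    /* PASID control register offset */
#define PCI_EXT_CAP_PASID_SIZEOF  8

/* Hypothetical helper: capability register value for a given PASID width. */
static inline uint16_t edu_pasid_cap(unsigned width_bits)
{
    return (uint16_t)((width_bits << 8) & PCI_PASID_CAP_WIDTH);
}

/* Hypothetical helper: recover the max PASID width from the register value. */
static inline unsigned edu_pasid_width(uint16_t cap)
{
    return (cap & PCI_PASID_CAP_WIDTH) >> 8;
}
```

The capability and control registers could then be written separately at
pos + PCI_PASID_CAP and pos + PCI_PASID_CTRL, rather than as one 32-bit
literal.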

> +    }
> +
>      if (msi_init(pdev, 0, 1, true, false, errp)) {
>          return;
>      }
> @@ -404,20 +428,27 @@ static void pci_edu_uninit(PCIDevice *pdev)
>      msi_uninit(pdev);
>  }
>
> +

This added blank line is unnecessary.

>  static void edu_instance_init(Object *obj)
>  {
>      EduState *edu = EDU(obj);
>
> -    edu->dma_mask = (1UL << 28) - 1;
> +    edu->dma_mask = ~0ULL;

docs/specs/edu.rst says:
"For educational purposes, the device supports only 28 bits (256 MiB)
by default. Students shall set dma_mask for the device in the OS driver
properly."

We should either update the EDU spec or revert the change here.
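
For reference, a minimal sketch of the behavior the spec text describes,
mirroring the "addr & edu->dma_mask" clamp that patch 14 removes. The macro
and function names are made up for illustration:

```c
#include <stdint.h>

/* Default described in the EDU spec: 28 address bits, i.e. 256 MiB. */
#define EDU_DMA_MASK_DEFAULT  ((1ULL << 28) - 1)

/* Hypothetical stand-in for the clamp that edu_clamp_addr() used to do. */
static inline uint64_t edu_clamp(uint64_t addr, uint64_t dma_mask)
{
    return addr & dma_mask;
}
```

With the default mask, an address such as 0x12345678 is truncated to
0x02345678; with a mask of ~0ULL, as in this patch, it passes through
unchanged, which is the behavior change the spec would need to reflect.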

>      object_property_add_uint64_ptr(obj, "dma_mask",
>                                     &edu->dma_mask, OBJ_PROP_FLAG_READWRITE);
>  }
>
> +static Property edu_properties[] = {
> +    DEFINE_PROP_BOOL("pasid", EduState, enable_pasid, TRUE),
> +    DEFINE_PROP_END_OF_LIST(),
> +};
> +
>  static void edu_class_init(ObjectClass *class, void *data)
>  {
>      DeviceClass *dc = DEVICE_CLASS(class);
>      PCIDeviceClass *k = PCI_DEVICE_CLASS(class);
>
> +    device_class_set_props(dc, edu_properties);
>      k->realize = pci_edu_realize;
>      k->exit = pci_edu_uninit;
>      k->vendor_id = PCI_VENDOR_ID_QEMU;
> @@ -430,7 +461,7 @@ static void edu_class_init(ObjectClass *class, void *data)
>  static void pci_edu_register_types(void)
>  {
>      static InterfaceInfo interfaces[] = {
> -        { INTERFACE_CONVENTIONAL_PCI_DEVICE },
> +        { INTERFACE_PCIE_DEVICE },
>          { },
>      };
>      static const TypeInfo edu_info = {
> --
> 2.43.2
>
>

This commit adds PASID fields to the DMA command register: the PV flag
(bitwise OR of 0x08) and the PASID itself in cmd[27:8].
We should also update the EDU spec, docs/specs/edu.rst, to document these changes.
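
To make the layout concrete for the spec update, here is a small sketch of how
a driver might build such a command. The EDU_DMA_* values are taken from
edu.c and this patch; edu_dma_cmd_with_pasid() is a hypothetical helper, not
something the patch provides:

```c
#include <stdint.h>

#define EDU_DMA_RUN         0x1
#define EDU_DMA_IRQ         0x4
#define EDU_DMA_PV          0x8
#define EDU_DMA_PASID(cmd)  (((cmd) >> 8) & ((1U << 20) - 1))

/*
 * Hypothetical driver-side helper: set PV and place a 20-bit PASID in
 * cmd[27:8]. PASID bits above bit 19 are dropped.
 */
static inline uint32_t edu_dma_cmd_with_pasid(uint32_t base_cmd,
                                              uint32_t pasid)
{
    return base_cmd | EDU_DMA_PV | ((pasid & ((1U << 20) - 1)) << 8);
}
```

The device side only honors the PASID when the "pasid" property is enabled
and PV is set, so a command without PV behaves as before.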


Regards,
Frank Chang



* Re: [PATCH v2 15/15] hw/misc: EDU: add ATS/PRI capability
  2024-03-07 16:03 ` [PATCH v2 15/15] hw/misc: EDU: add ATS/PRI capability Daniel Henrique Barboza
@ 2024-05-07 15:32   ` Frank Chang
  2024-05-16 13:59     ` Daniel Henrique Barboza
  0 siblings, 1 reply; 55+ messages in thread
From: Frank Chang @ 2024-05-07 15:32 UTC (permalink / raw)
  To: Daniel Henrique Barboza
  Cc: qemu-devel, qemu-riscv, alistair.francis, bmeng, liwei1518,
	zhiwei_liu, palmer, ajones, tjeznach

Hi Daniel,

On Fri, Mar 8, 2024 at 12:05 AM Daniel Henrique Barboza <dbarboza@ventanamicro.com> wrote:
>
> From: Tomasz Jeznach <tjeznach@rivosinc.com>
>
> Mimic ATS interface with IOMMU translate request with IOMMU_NONE.  If
> mapping exists, translation service will return current permission
> flags, otherwise will report no permissions.
>
> Implement and register the IOMMU memory region listener to be notified
> whenever an ATS invalidation request is sent from the IOMMU.
>
> Implement and register the IOMMU memory region listener to be notified
> whenever an ATS page request group response is triggered from the IOMMU.
>
> Introduces a retry mechanism to the timer design so that any page that's
> not available should be only accessed after the PRGR notification has
> been received.
>
> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
> Signed-off-by: Sebastien Boeuf <seb@rivosinc.com>
> ---
>  hw/misc/edu.c | 258 ++++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 251 insertions(+), 7 deletions(-)
>
> diff --git a/hw/misc/edu.c b/hw/misc/edu.c
> index 522cec85b3..f4f6c15ec6 100644
> --- a/hw/misc/edu.c
> +++ b/hw/misc/edu.c
> @@ -45,6 +45,14 @@ DECLARE_INSTANCE_CHECKER(EduState, EDU,
>  #define DMA_START       0x40000
>  #define DMA_SIZE        4096
>
> +/*
> + * Number of tries before giving up on page request group response.
> + * Given the timer callback is scheduled to be run again after 100ms,
> + * 10 tries give roughly a second for the PRGR notification to be
> + * received.
> + */
> +#define NUM_TRIES       10
> +
>  struct EduState {
>      PCIDevice pdev;
>      MemoryRegion mmio;
> @@ -55,6 +63,7 @@ struct EduState {
>      bool stopping;
>
>      bool enable_pasid;
> +    uint32_t try;
>
>      uint32_t addr4;
>      uint32_t fact;
> @@ -81,6 +90,20 @@ struct EduState {
>      QEMUTimer dma_timer;
>      char dma_buf[DMA_SIZE];
>      uint64_t dma_mask;
> +
> +    MemoryListener iommu_listener;
> +    QLIST_HEAD(, edu_iommu) iommu_list;
> +
> +    bool prgr_rcvd;
> +    bool prgr_success;
> +};
> +
> +struct edu_iommu {
> +    EduState *edu;
> +    IOMMUMemoryRegion *iommu_mr;
> +    hwaddr iommu_offset;
> +    IOMMUNotifier n;
> +    QLIST_ENTRY(edu_iommu) iommu_next;
>  };
>
>  static bool edu_msi_enabled(EduState *edu)
> @@ -136,11 +159,65 @@ static dma_addr_t edu_clamp_addr(const EduState *edu, dma_addr_t addr)
>      return res;
>  }
>
> +static bool __find_iommu_mr_cb(Int128 start, Int128 len, const MemoryRegion *mr,
> +    hwaddr offset_in_region, void *opaque)
> +{
> +    IOMMUMemoryRegion **iommu_mr = opaque;
> +    *iommu_mr = memory_region_get_iommu((MemoryRegion *)mr);
> +    return *iommu_mr != NULL;
> +}
> +
> +static int pci_dma_perm(PCIDevice *pdev, dma_addr_t iova, MemTxAttrs attrs)
> +{
> +    IOMMUMemoryRegion *iommu_mr = NULL;
> +    IOMMUMemoryRegionClass *imrc;
> +    int iommu_idx;
> +    FlatView *fv;
> +    EduState *edu = EDU(pdev);
> +    struct edu_iommu *iommu;
> +
> +    RCU_READ_LOCK_GUARD();
> +
> +    fv = address_space_to_flatview(pci_get_address_space(pdev));
> +
> +    /* Find first IOMMUMemoryRegion */
> +    flatview_for_each_range(fv, __find_iommu_mr_cb, &iommu_mr);
> +
> +    if (iommu_mr) {
> +        imrc = memory_region_get_iommu_class_nocheck(iommu_mr);
> +
> +        /* IOMMU Index is mapping to memory attributes (PASID, etc) */
> +        iommu_idx = imrc->attrs_to_index ?
> +                    imrc->attrs_to_index(iommu_mr, attrs) : 0;
> +
> +        /* Update IOMMU notifiers with proper index */
> +        QLIST_FOREACH(iommu, &edu->iommu_list, iommu_next) {
> +            if (iommu->iommu_mr == iommu_mr &&
> +                iommu->n.iommu_idx != iommu_idx) {
> +                memory_region_unregister_iommu_notifier(
> +                    MEMORY_REGION(iommu->iommu_mr), &iommu->n);
> +                iommu->n.iommu_idx = iommu_idx;
> +                memory_region_register_iommu_notifier(
> +                    MEMORY_REGION(iommu->iommu_mr), &iommu->n, NULL);
> +            }
> +        }
> +
> +        /* Translate request with IOMMU_NONE is an ATS request */
> +        IOMMUTLBEntry iotlb = imrc->translate(iommu_mr, iova, IOMMU_NONE,
> +                                              iommu_idx);
> +
> +        return iotlb.perm;
> +    }
> +
> +    return IOMMU_NONE;
> +}
> +
>  static void edu_dma_timer(void *opaque)
>  {
>      EduState *edu = opaque;
>      bool raise_irq = false;
>      MemTxAttrs attrs = MEMTXATTRS_UNSPECIFIED;
> +    MemTxResult res;
>
>      if (!(edu->dma.cmd & EDU_DMA_RUN)) {
>          return;
> @@ -155,18 +232,70 @@ static void edu_dma_timer(void *opaque)
>
>      if (EDU_DMA_DIR(edu->dma.cmd) == EDU_DMA_FROM_PCI) {
>          uint64_t dst = edu->dma.dst;
> +        uint64_t src = edu_clamp_addr(edu, edu->dma.src);
>          edu_check_range(dst, edu->dma.cnt, DMA_START, DMA_SIZE);
>          dst -= DMA_START;
> -        pci_dma_rw(&edu->pdev, edu_clamp_addr(edu, edu->dma.src),
> -                edu->dma_buf + dst, edu->dma.cnt,
> -                DMA_DIRECTION_TO_DEVICE, attrs);
> +        if (edu->try-- == NUM_TRIES) {
> +            edu->prgr_rcvd = false;
> +            if (!(pci_dma_perm(&edu->pdev, src, attrs) & IOMMU_RO)) {
> +                timer_mod(&edu->dma_timer,
> +                          qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 100);
> +                return;
> +            }
> +        } else if (edu->try) {
> +            if (!edu->prgr_rcvd) {
> +                timer_mod(&edu->dma_timer,
> +                          qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 100);
> +                return;
> +            }
> +            if (!edu->prgr_success) {
> +                /* PRGR failure, fail DMA. */
> +                edu->dma.cmd &= ~EDU_DMA_RUN;
> +                return;
> +            }
> +        } else {
> +            /* timeout, fail DMA. */
> +            edu->dma.cmd &= ~EDU_DMA_RUN;
> +            return;
> +        }
> +        res = pci_dma_rw(&edu->pdev, src, edu->dma_buf + dst, edu->dma.cnt,
> +            DMA_DIRECTION_TO_DEVICE, attrs);
> +        if (res != MEMTX_OK) {
> +            hw_error("EDU: DMA transfer TO 0x%"PRIx64" failed.\n", dst);
> +        }
>      } else {
>          uint64_t src = edu->dma.src;
> +        uint64_t dst = edu_clamp_addr(edu, edu->dma.dst);
>          edu_check_range(src, edu->dma.cnt, DMA_START, DMA_SIZE);
>          src -= DMA_START;
> -        pci_dma_rw(&edu->pdev, edu_clamp_addr(edu, edu->dma.dst),
> -                edu->dma_buf + src, edu->dma.cnt,
> -                DMA_DIRECTION_FROM_DEVICE, attrs);
> +        if (edu->try-- == NUM_TRIES) {
> +            edu->prgr_rcvd = false;
> +            if (!(pci_dma_perm(&edu->pdev, dst, attrs) & IOMMU_WO)) {
> +                timer_mod(&edu->dma_timer,
> +                          qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 100);
> +                return;
> +            }
> +        } else if (edu->try) {
> +            if (!edu->prgr_rcvd) {
> +                timer_mod(&edu->dma_timer,
> +                          qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 100);
> +                return;
> +            }
> +            if (!edu->prgr_success) {
> +                /* PRGR failure, fail DMA. */
> +                edu->dma.cmd &= ~EDU_DMA_RUN;
> +                return;
> +            }
> +        } else {
> +            /* timeout, fail DMA. */
> +            edu->dma.cmd &= ~EDU_DMA_RUN;
> +            return;
> +        }
> +        res = pci_dma_rw(&edu->pdev, dst, edu->dma_buf + src, edu->dma.cnt,
> +            DMA_DIRECTION_FROM_DEVICE, attrs);
> +        if (res != MEMTX_OK) {
> +            hw_error("EDU: DMA transfer FROM 0x%"PRIx64" failed.\n", src);
> +        }
>      }
>
>      edu->dma.cmd &= ~EDU_DMA_RUN;
> @@ -193,6 +322,7 @@ static void dma_rw(EduState *edu, bool write, dma_addr_t *val, dma_addr_t *dma,
>      }
>
>      if (timer) {
> +        edu->try = NUM_TRIES;
>          timer_mod(&edu->dma_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 100);
>      }
>  }
> @@ -376,9 +506,92 @@ static void *edu_fact_thread(void *opaque)
>      return NULL;
>  }
>
> +static void edu_iommu_ats_prgr_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
> +{
> +    struct edu_iommu *iommu = container_of(n, struct edu_iommu, n);
> +    EduState *edu = iommu->edu;
> +    edu->prgr_success = (iotlb->perm != IOMMU_NONE);
> +    barrier();
> +    edu->prgr_rcvd = true;
> +}
> +
> +static void edu_iommu_ats_inval_notify(IOMMUNotifier *n,
> +                                       IOMMUTLBEntry *iotlb)
> +{
> +
> +}
> +
> +static void edu_iommu_region_add(MemoryListener *listener,
> +                                   MemoryRegionSection *section)
> +{
> +    EduState *edu = container_of(listener, EduState, iommu_listener);
> +    struct edu_iommu *iommu;
> +    Int128 end;
> +    int iommu_idx;
> +    IOMMUMemoryRegion *iommu_mr;
> +
> +    if (!memory_region_is_iommu(section->mr)) {
> +        return;
> +    }
> +
> +    iommu_mr = IOMMU_MEMORY_REGION(section->mr);
> +
> +    /* Register ATS.INVAL notifier */
> +    iommu = g_malloc0(sizeof(*iommu));
> +    iommu->iommu_mr = iommu_mr;
> +    iommu->iommu_offset = section->offset_within_address_space -
> +                          section->offset_within_region;
> +    iommu->edu = edu;
> +    end = int128_add(int128_make64(section->offset_within_region),
> +                     section->size);
> +    end = int128_sub(end, int128_one());
> +    iommu_idx = memory_region_iommu_attrs_to_index(iommu_mr,
> +                                                   MEMTXATTRS_UNSPECIFIED);
> +    iommu_notifier_init(&iommu->n, edu_iommu_ats_inval_notify,
> +                        IOMMU_NOTIFIER_DEVIOTLB_UNMAP,
> +                        section->offset_within_region,
> +                        int128_get64(end),
> +                        iommu_idx);
> +    memory_region_register_iommu_notifier(section->mr, &iommu->n, NULL);
> +    QLIST_INSERT_HEAD(&edu->iommu_list, iommu, iommu_next);
> +
> +    /* Register ATS.PRGR notifier */
> +    iommu = g_memdup2(iommu, sizeof(*iommu));
> +    iommu_notifier_init(&iommu->n, edu_iommu_ats_prgr_notify,
> +                        IOMMU_NOTIFIER_MAP,
> +                        section->offset_within_region,
> +                        int128_get64(end),
> +                        iommu_idx);
> +    memory_region_register_iommu_notifier(section->mr, &iommu->n, NULL);
> +    QLIST_INSERT_HEAD(&edu->iommu_list, iommu, iommu_next);
> +}
> +
> +static void edu_iommu_region_del(MemoryListener *listener,
> +                                   MemoryRegionSection *section)
> +{
> +    EduState *edu = container_of(listener, EduState, iommu_listener);
> +    struct edu_iommu *iommu;
> +
> +    if (!memory_region_is_iommu(section->mr)) {
> +        return;
> +    }
> +
> +    QLIST_FOREACH(iommu, &edu->iommu_list, iommu_next) {
> +        if (MEMORY_REGION(iommu->iommu_mr) == section->mr &&
> +            iommu->n.start == section->offset_within_region) {
> +            memory_region_unregister_iommu_notifier(section->mr,
> +                                                    &iommu->n);
> +            QLIST_REMOVE(iommu, iommu_next);
> +            g_free(iommu);
> +            break;
> +        }
> +    }
> +}
> +
>  static void pci_edu_realize(PCIDevice *pdev, Error **errp)
>  {
>      EduState *edu = EDU(pdev);
> +    AddressSpace *dma_as = NULL;
>      uint8_t *pci_conf = pdev->config;
>      int pos;
>
> @@ -390,9 +603,28 @@ static void pci_edu_realize(PCIDevice *pdev, Error **errp)
>      pos = PCI_CONFIG_SPACE_SIZE;
>      if (edu->enable_pasid) {
>          /* PCIe Spec 7.8.9 PASID Extended Capability Structure */
> -        pcie_add_capability(pdev, 0x1b, 1, pos, 8);
> +        pcie_add_capability(pdev, PCI_EXT_CAP_ID_PASID, 1, pos, 8);

This should be included in the 14th commit.

>          pci_set_long(pdev->config + pos + 4, 0x00001400);
>          pci_set_long(pdev->wmask + pos + 4,  0xfff0ffff);
> +        pos += 8;
> +
> +        /* ATS Capability */
> +        pcie_ats_init(pdev, pos, true);
> +        pos += PCI_EXT_CAP_ATS_SIZEOF;
> +
> +        /* PRI Capability */
> +        pcie_add_capability(pdev, PCI_EXT_CAP_ID_PRI, 1, pos, 16);
> +        /* PRI STOPPED */
> +        pci_set_long(pdev->config + pos +  4, 0x01000000);
> +        /* PRI ENABLE bit writable */
> +        pci_set_long(pdev->wmask  + pos +  4, 0x00000001);
> +        /* PRI Capacity Supported */
> +        pci_set_long(pdev->config + pos +  8, 0x00000080);
> +        /* PRI Allocations Allowed, 32 */
> +        pci_set_long(pdev->config + pos + 12, 0x00000040);
> +        pci_set_long(pdev->wmask  + pos + 12, 0x0000007f);

We should use the defines declared in
include/standard-headers/linux/pci_regs.h for readability,
though some of the bitfields are not defined in the header file.
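
As a rough illustration of the suggestion above, the PRI setup could be
written with named offsets and bits. This is a minimal standalone sketch, not
QEMU code: the PCI_PRI_* values are copied locally on the assumption that they
match include/standard-headers/linux/pci_regs.h, set_word()/set_long() are
hypothetical stand-ins for pci_set_word()/pci_set_long(), and
pri_init_config() is a made-up helper name.

```c
#include <stdint.h>

/* Assumed to mirror pci_regs.h; verify against the header before use. */
#define PCI_PRI_CTRL            0x04    /* PRI control register */
#define PCI_PRI_CTRL_ENABLE     0x0001  /* Enable */
#define PCI_PRI_STATUS          0x06    /* PRI status register */
#define PCI_PRI_STATUS_STOPPED  0x0100  /* PRI Stopped */
#define PCI_PRI_MAX_REQ         0x08    /* Outstanding request capacity */
#define PCI_PRI_ALLOC_REQ       0x0c    /* Outstanding request allocation */

/* Hypothetical little-endian stores standing in for pci_set_word/long(). */
static void set_word(uint8_t *p, uint16_t v)
{
    p[0] = v & 0xff;
    p[1] = v >> 8;
}

static void set_long(uint8_t *p, uint32_t v)
{
    set_word(p, v & 0xffff);
    set_word(p + 2, v >> 16);
}

/* Fill the PRI capability body at 'pos' in config space and its write mask. */
static void pri_init_config(uint8_t *config, uint8_t *wmask, unsigned pos)
{
    /* Status: STOPPED; control: only the ENABLE bit is guest-writable. */
    set_word(config + pos + PCI_PRI_STATUS, PCI_PRI_STATUS_STOPPED);
    set_word(wmask + pos + PCI_PRI_CTRL, PCI_PRI_CTRL_ENABLE);
    /* Request counts stay as plain numbers, same values as in the patch. */
    set_long(config + pos + PCI_PRI_MAX_REQ, 0x80);
    set_long(config + pos + PCI_PRI_ALLOC_REQ, 0x40);
    set_long(wmask + pos + PCI_PRI_ALLOC_REQ, 0x7f);
}
```

The bytes written are the same as with the 32-bit stores in the patch: the
0x01000000 written at pos + 4 is the STOPPED bit of the 16-bit status register
at pos + 6.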

Regards,
Frank Chang

> +
> +        pos += 8;
>      }
>
>      if (msi_init(pdev, 0, 1, true, false, errp)) {
> @@ -409,12 +641,24 @@ static void pci_edu_realize(PCIDevice *pdev, Error **errp)
>      memory_region_init_io(&edu->mmio, OBJECT(edu), &edu_mmio_ops, edu,
>                      "edu-mmio", 1 * MiB);
>      pci_register_bar(pdev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &edu->mmio);
> +
> +    /* Register IOMMU listener */
> +    edu->iommu_listener = (MemoryListener) {
> +        .name = "edu-iommu",
> +        .region_add = edu_iommu_region_add,
> +        .region_del = edu_iommu_region_del,
> +    };
> +
> +    dma_as = pci_device_iommu_address_space(pdev);
> +    memory_listener_register(&edu->iommu_listener, dma_as);
>  }
>
>  static void pci_edu_uninit(PCIDevice *pdev)
>  {
>      EduState *edu = EDU(pdev);
>
> +    memory_listener_unregister(&edu->iommu_listener);
> +
>      qemu_mutex_lock(&edu->thr_mutex);
>      edu->stopping = true;
>      qemu_mutex_unlock(&edu->thr_mutex);
> --
> 2.43.2
>
>



* Re: [PATCH v2 10/15] hw/riscv/riscv-iommu: add ATS support
  2024-03-07 16:03 ` [PATCH v2 10/15] hw/riscv/riscv-iommu: add ATS support Daniel Henrique Barboza
@ 2024-05-08  2:57   ` Frank Chang
  2024-05-17  9:29     ` Daniel Henrique Barboza
  0 siblings, 1 reply; 55+ messages in thread
From: Frank Chang @ 2024-05-08  2:57 UTC (permalink / raw)
  To: Daniel Henrique Barboza
  Cc: qemu-devel, qemu-riscv, alistair.francis, bmeng, liwei1518,
	zhiwei_liu, palmer, ajones, tjeznach

Hi Daniel,

Daniel Henrique Barboza <dbarboza@ventanamicro.com> wrote on Fri, Mar 8, 2024 at 12:06 AM:
>
> From: Tomasz Jeznach <tjeznach@rivosinc.com>
>
> Add PCIe Address Translation Services (ATS) capabilities to the IOMMU.
> This will add support for ATS translation requests in Fault/Event
> queues, Page-request queue and IOATC invalidations.
>
> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
> ---
>  hw/riscv/riscv-iommu-bits.h |  43 ++++++++++++++-
>  hw/riscv/riscv-iommu.c      | 107 +++++++++++++++++++++++++++++++++---
>  hw/riscv/riscv-iommu.h      |   1 +
>  hw/riscv/trace-events       |   3 +
>  4 files changed, 145 insertions(+), 9 deletions(-)
>
> diff --git a/hw/riscv/riscv-iommu-bits.h b/hw/riscv/riscv-iommu-bits.h
> index 9d645d69ea..0994f5ce48 100644
> --- a/hw/riscv/riscv-iommu-bits.h
> +++ b/hw/riscv/riscv-iommu-bits.h
> @@ -81,6 +81,7 @@ struct riscv_iommu_pq_record {
>  #define RISCV_IOMMU_CAP_SV57X4          BIT_ULL(19)
>  #define RISCV_IOMMU_CAP_MSI_FLAT        BIT_ULL(22)
>  #define RISCV_IOMMU_CAP_MSI_MRIF        BIT_ULL(23)
> +#define RISCV_IOMMU_CAP_ATS             BIT_ULL(25)
>  #define RISCV_IOMMU_CAP_IGS             GENMASK_ULL(29, 28)
>  #define RISCV_IOMMU_CAP_PAS             GENMASK_ULL(37, 32)
>  #define RISCV_IOMMU_CAP_PD8             BIT_ULL(38)
> @@ -201,6 +202,7 @@ struct riscv_iommu_dc {
>
>  /* Translation control fields */
>  #define RISCV_IOMMU_DC_TC_V             BIT_ULL(0)
> +#define RISCV_IOMMU_DC_TC_EN_ATS        BIT_ULL(1)
>  #define RISCV_IOMMU_DC_TC_DTF           BIT_ULL(4)
>  #define RISCV_IOMMU_DC_TC_PDTV          BIT_ULL(5)
>  #define RISCV_IOMMU_DC_TC_PRPR          BIT_ULL(6)
> @@ -259,6 +261,20 @@ struct riscv_iommu_command {
>  #define RISCV_IOMMU_CMD_IODIR_DV        BIT_ULL(33)
>  #define RISCV_IOMMU_CMD_IODIR_DID       GENMASK_ULL(63, 40)
>
> +/* 3.1.4 I/O MMU PCIe ATS */
> +#define RISCV_IOMMU_CMD_ATS_OPCODE              4
> +#define RISCV_IOMMU_CMD_ATS_FUNC_INVAL          0
> +#define RISCV_IOMMU_CMD_ATS_FUNC_PRGR           1
> +#define RISCV_IOMMU_CMD_ATS_PID         GENMASK_ULL(31, 12)
> +#define RISCV_IOMMU_CMD_ATS_PV          BIT_ULL(32)
> +#define RISCV_IOMMU_CMD_ATS_DSV         BIT_ULL(33)
> +#define RISCV_IOMMU_CMD_ATS_RID         GENMASK_ULL(55, 40)
> +#define RISCV_IOMMU_CMD_ATS_DSEG        GENMASK_ULL(63, 56)
> +/* dword1 is the ATS payload, two different payload types for INVAL and PRGR */
> +
> +/* ATS.PRGR payload */
> +#define RISCV_IOMMU_CMD_ATS_PRGR_RESP_CODE      GENMASK_ULL(47, 44)
> +
>  enum riscv_iommu_dc_fsc_atp_modes {
>      RISCV_IOMMU_DC_FSC_MODE_BARE = 0,
>      RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV32 = 8,
> @@ -322,7 +338,32 @@ enum riscv_iommu_fq_ttypes {
>      RISCV_IOMMU_FQ_TTYPE_TADDR_INST_FETCH = 5,
>      RISCV_IOMMU_FQ_TTYPE_TADDR_RD = 6,
>      RISCV_IOMMU_FQ_TTYPE_TADDR_WR = 7,
> -    RISCV_IOMMU_FW_TTYPE_PCIE_MSG_REQ = 8,
> +    RISCV_IOMMU_FQ_TTYPE_PCIE_ATS_REQ = 8,
> +    RISCV_IOMMU_FW_TTYPE_PCIE_MSG_REQ = 9,
> +};
> +
> +/* Header fields */
> +#define RISCV_IOMMU_PREQ_HDR_PID        GENMASK_ULL(31, 12)
> +#define RISCV_IOMMU_PREQ_HDR_PV         BIT_ULL(32)
> +#define RISCV_IOMMU_PREQ_HDR_PRIV       BIT_ULL(33)
> +#define RISCV_IOMMU_PREQ_HDR_EXEC       BIT_ULL(34)
> +#define RISCV_IOMMU_PREQ_HDR_DID        GENMASK_ULL(63, 40)
> +
> +/* Payload fields */
> +#define RISCV_IOMMU_PREQ_PAYLOAD_R      BIT_ULL(0)
> +#define RISCV_IOMMU_PREQ_PAYLOAD_W      BIT_ULL(1)
> +#define RISCV_IOMMU_PREQ_PAYLOAD_L      BIT_ULL(2)
> +#define RISCV_IOMMU_PREQ_PAYLOAD_M      GENMASK_ULL(2, 0)
> +#define RISCV_IOMMU_PREQ_PRG_INDEX      GENMASK_ULL(11, 3)
> +#define RISCV_IOMMU_PREQ_UADDR          GENMASK_ULL(63, 12)
> +
> +
> +/*
> + * struct riscv_iommu_msi_pte - MSI Page Table Entry
> + */
> +struct riscv_iommu_msi_pte {
> +      uint64_t pte;
> +      uint64_t mrif_info;
>  };
>
>  /* Fields on pte */
> diff --git a/hw/riscv/riscv-iommu.c b/hw/riscv/riscv-iommu.c
> index 03a610fa75..7af5929b10 100644
> --- a/hw/riscv/riscv-iommu.c
> +++ b/hw/riscv/riscv-iommu.c
> @@ -576,7 +576,7 @@ static int riscv_iommu_ctx_fetch(RISCVIOMMUState *s, RISCVIOMMUContext *ctx)
>              RISCV_IOMMU_DC_IOHGATP_MODE_BARE);
>          ctx->satp = set_field(0, RISCV_IOMMU_ATP_MODE_FIELD,
>              RISCV_IOMMU_DC_FSC_MODE_BARE);
> -        ctx->tc = RISCV_IOMMU_DC_TC_V;
> +        ctx->tc = RISCV_IOMMU_DC_TC_EN_ATS | RISCV_IOMMU_DC_TC_V;

We should OR RISCV_IOMMU_DC_TC_EN_ATS only when IOMMU has ATS capability.
(i.e. s->enable_ats == true).
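
A minimal sketch of that gating, assuming a standalone helper rather than the
actual riscv_iommu_ctx_fetch() code. The bit positions are the ones from the
riscv-iommu-bits.h hunk above; bare_ctx_tc() is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

#define RISCV_IOMMU_DC_TC_V       (1ULL << 0)
#define RISCV_IOMMU_DC_TC_EN_ATS  (1ULL << 1)

/* Hypothetical helper: default tc value for the bare (no DDT) context. */
static uint64_t bare_ctx_tc(bool enable_ats)
{
    uint64_t tc = RISCV_IOMMU_DC_TC_V;

    /* Advertise EN_ATS only if the IOMMU was created with ATS support. */
    if (enable_ats) {
        tc |= RISCV_IOMMU_DC_TC_EN_ATS;
    }
    return tc;
}
```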

>          ctx->ta = 0;
>          ctx->msiptp = 0;
>          return 0;
> @@ -1021,6 +1021,18 @@ static int riscv_iommu_translate(RISCVIOMMUState *s, RISCVIOMMUContext *ctx,
>      enable_pri = (iotlb->perm == IOMMU_NONE) && (ctx->tc & BIT_ULL(32));
>      enable_pasid = (ctx->tc & RISCV_IOMMU_DC_TC_PDTV);
>
> +    /* Check for ATS request. */
> +    if (iotlb->perm == IOMMU_NONE) {
> +        /* Check if ATS is disabled. */
> +        if (!(ctx->tc & RISCV_IOMMU_DC_TC_EN_ATS)) {
> +            enable_pri = false;
> +            fault = RISCV_IOMMU_FQ_CAUSE_TTYPE_BLOCKED;
> +            goto done;
> +        }
> +        trace_riscv_iommu_ats(s->parent_obj.id, PCI_BUS_NUM(ctx->devid),
> +                PCI_SLOT(ctx->devid), PCI_FUNC(ctx->devid), iotlb->iova);

It's possible that iotlb->perm == IOMMU_NONE,
but the translation request comes from riscv_iommu_process_dbg().
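
One possible way to avoid treating debug lookups as ATS requests is to carry
the request origin explicitly instead of inferring it from the permission
flags. The sketch below is purely hypothetical: ReqKind and
ats_request_blocked() do not exist in the series, and the actual fix may look
different.

```c
#include <stdbool.h>

/* Hypothetical tag describing where a translation request comes from. */
typedef enum {
    REQ_DMA,    /* regular device read/write */
    REQ_ATS,    /* PCIe ATS translation request */
    REQ_DEBUG,  /* lookup issued through the debug interface */
} ReqKind;

/*
 * Only genuine ATS requests are blocked when EN_ATS is clear; a debug
 * lookup is never classified as ATS, whatever its permission flags are.
 */
static bool ats_request_blocked(ReqKind kind, bool en_ats)
{
    return kind == REQ_ATS && !en_ats;
}
```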

> +    }
> +
>      iot = riscv_iommu_iot_lookup(ctx, iot_cache, iotlb->iova);
>      perm = iot ? iot->perm : IOMMU_NONE;
>      if (perm != IOMMU_NONE) {
> @@ -1067,13 +1079,10 @@ done:
>
>      if (enable_faults && fault) {
>          struct riscv_iommu_fq_record ev;
> -        unsigned ttype;
> -
> -        if (iotlb->perm & IOMMU_RW) {
> -            ttype = RISCV_IOMMU_FQ_TTYPE_UADDR_WR;
> -        } else {
> -            ttype = RISCV_IOMMU_FQ_TTYPE_UADDR_RD;
> -        }
> +        const unsigned ttype =
> +            (iotlb->perm & IOMMU_RW) ? RISCV_IOMMU_FQ_TTYPE_UADDR_WR :
> +            ((iotlb->perm & IOMMU_RO) ? RISCV_IOMMU_FQ_TTYPE_UADDR_RD :
> +            RISCV_IOMMU_FQ_TTYPE_PCIE_ATS_REQ);
>          ev.hdr = set_field(0, RISCV_IOMMU_FQ_HDR_CAUSE, fault);
>          ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_TTYPE, ttype);
>          ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_PV, enable_pasid);
> @@ -1105,6 +1114,73 @@ static MemTxResult riscv_iommu_iofence(RISCVIOMMUState *s, bool notify,
>          MEMTXATTRS_UNSPECIFIED);
>  }
>
> +static void riscv_iommu_ats(RISCVIOMMUState *s,
> +    struct riscv_iommu_command *cmd, IOMMUNotifierFlag flag,
> +    IOMMUAccessFlags perm,
> +    void (*trace_fn)(const char *id))
> +{
> +    RISCVIOMMUSpace *as = NULL;
> +    IOMMUNotifier *n;
> +    IOMMUTLBEvent event;
> +    uint32_t pasid;
> +    uint32_t devid;
> +    const bool pv = cmd->dword0 & RISCV_IOMMU_CMD_ATS_PV;
> +
> +    if (cmd->dword0 & RISCV_IOMMU_CMD_ATS_DSV) {
> +        /* Use device segment and requester id */
> +        devid = get_field(cmd->dword0,
> +            RISCV_IOMMU_CMD_ATS_DSEG | RISCV_IOMMU_CMD_ATS_RID);
> +    } else {
> +        devid = get_field(cmd->dword0, RISCV_IOMMU_CMD_ATS_RID);
> +    }
> +
> +    pasid = get_field(cmd->dword0, RISCV_IOMMU_CMD_ATS_PID);
> +
> +    qemu_mutex_lock(&s->core_lock);
> +    QLIST_FOREACH(as, &s->spaces, list) {
> +        if (as->devid == devid) {
> +            break;
> +        }
> +    }
> +    qemu_mutex_unlock(&s->core_lock);
> +
> +    if (!as || !as->notifier) {
> +        return;
> +    }
> +
> +    event.type = flag;
> +    event.entry.perm = perm;
> +    event.entry.target_as = s->target_as;
> +
> +    IOMMU_NOTIFIER_FOREACH(n, &as->iova_mr) {
> +        if (!pv || n->iommu_idx == pasid) {
> +            event.entry.iova = n->start;
> +            event.entry.addr_mask = n->end - n->start;
> +            trace_fn(as->iova_mr.parent_obj.name);
> +            memory_region_notify_iommu_one(n, &event);
> +        }
> +    }
> +}
> +
> +static void riscv_iommu_ats_inval(RISCVIOMMUState *s,
> +    struct riscv_iommu_command *cmd)
> +{
> +    return riscv_iommu_ats(s, cmd, IOMMU_NOTIFIER_DEVIOTLB_UNMAP, IOMMU_NONE,
> +                           trace_riscv_iommu_ats_inval);
> +}
> +
> +static void riscv_iommu_ats_prgr(RISCVIOMMUState *s,
> +    struct riscv_iommu_command *cmd)
> +{
> +    unsigned resp_code = get_field(cmd->dword1,
> +                                   RISCV_IOMMU_CMD_ATS_PRGR_RESP_CODE);
> +
> +    /* Using the access flag to carry response code information */
> +    IOMMUAccessFlags perm = resp_code ? IOMMU_NONE : IOMMU_RW;
> +    return riscv_iommu_ats(s, cmd, IOMMU_NOTIFIER_MAP, perm,
> +                           trace_riscv_iommu_ats_prgr);
> +}
> +
>  static void riscv_iommu_process_ddtp(RISCVIOMMUState *s)
>  {
>      uint64_t old_ddtp = s->ddtp;
> @@ -1260,6 +1336,17 @@ static void riscv_iommu_process_cq_tail(RISCVIOMMUState *s)
>                  get_field(cmd.dword0, RISCV_IOMMU_CMD_IODIR_PID));
>              break;
>
> +        /* ATS commands */
> +        case RISCV_IOMMU_CMD(RISCV_IOMMU_CMD_ATS_FUNC_INVAL,
> +                             RISCV_IOMMU_CMD_ATS_OPCODE):
> +            riscv_iommu_ats_inval(s, &cmd);
> +            break;
> +
> +        case RISCV_IOMMU_CMD(RISCV_IOMMU_CMD_ATS_FUNC_PRGR,
> +                             RISCV_IOMMU_CMD_ATS_OPCODE):
> +            riscv_iommu_ats_prgr(s, &cmd);
> +            break;
> +

PCIe ATS commands are supported only when capabilities.ATS is set to 1
(i.e. s->enable_ats == true).
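
A minimal sketch of the check, assuming the command queue handler falls
through to the illegal-command path when ATS is not advertised. The opcode and
function values are the ones from the riscv-iommu-bits.h hunk above;
CmdResult and dispatch_ats() are hypothetical names used only for
illustration.

```c
#include <stdbool.h>

#define RISCV_IOMMU_CMD_ATS_OPCODE      4
#define RISCV_IOMMU_CMD_ATS_FUNC_INVAL  0
#define RISCV_IOMMU_CMD_ATS_FUNC_PRGR   1

typedef enum { CMD_OK, CMD_ILLEGAL } CmdResult;

/* Accept ATS.INVAL and ATS.PRGR only when capabilities.ATS would be set. */
static CmdResult dispatch_ats(unsigned opcode, unsigned func, bool enable_ats)
{
    if (opcode != RISCV_IOMMU_CMD_ATS_OPCODE || !enable_ats) {
        return CMD_ILLEGAL;
    }

    switch (func) {
    case RISCV_IOMMU_CMD_ATS_FUNC_INVAL:
    case RISCV_IOMMU_CMD_ATS_FUNC_PRGR:
        return CMD_OK;
    default:
        return CMD_ILLEGAL;
    }
}
```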

Regards,
Frank Chang

>          default:
>          cmd_ill:
>              /* Invalid instruction, do not advance instruction index. */
> @@ -1648,6 +1735,9 @@ static void riscv_iommu_realize(DeviceState *dev, Error **errp)
>      if (s->enable_msi) {
>          s->cap |= RISCV_IOMMU_CAP_MSI_FLAT | RISCV_IOMMU_CAP_MSI_MRIF;
>      }
> +    if (s->enable_ats) {
> +        s->cap |= RISCV_IOMMU_CAP_ATS;
> +    }
>      if (s->enable_s_stage) {
>          s->cap |= RISCV_IOMMU_CAP_SV32 | RISCV_IOMMU_CAP_SV39 |
>                    RISCV_IOMMU_CAP_SV48 | RISCV_IOMMU_CAP_SV57;
> @@ -1765,6 +1855,7 @@ static Property riscv_iommu_properties[] = {
>      DEFINE_PROP_UINT32("ioatc-limit", RISCVIOMMUState, iot_limit,
>          LIMIT_CACHE_IOT),
>      DEFINE_PROP_BOOL("intremap", RISCVIOMMUState, enable_msi, TRUE),
> +    DEFINE_PROP_BOOL("ats", RISCVIOMMUState, enable_ats, TRUE),
>      DEFINE_PROP_BOOL("off", RISCVIOMMUState, enable_off, TRUE),
>      DEFINE_PROP_BOOL("s-stage", RISCVIOMMUState, enable_s_stage, TRUE),
>      DEFINE_PROP_BOOL("g-stage", RISCVIOMMUState, enable_g_stage, TRUE),
> diff --git a/hw/riscv/riscv-iommu.h b/hw/riscv/riscv-iommu.h
> index 9b33fb97ef..47f3fdad58 100644
> --- a/hw/riscv/riscv-iommu.h
> +++ b/hw/riscv/riscv-iommu.h
> @@ -38,6 +38,7 @@ struct RISCVIOMMUState {
>
>      bool enable_off;      /* Enable out-of-reset OFF mode (DMA disabled) */
>      bool enable_msi;      /* Enable MSI remapping */
> +    bool enable_ats;      /* Enable ATS support */
>      bool enable_s_stage;  /* Enable S/VS-Stage translation */
>      bool enable_g_stage;  /* Enable G-Stage translation */
>
> diff --git a/hw/riscv/trace-events b/hw/riscv/trace-events
> index 42a97caffa..4b486b6420 100644
> --- a/hw/riscv/trace-events
> +++ b/hw/riscv/trace-events
> @@ -9,3 +9,6 @@ riscv_iommu_msi(const char *id, unsigned b, unsigned d, unsigned f, uint64_t iov
>  riscv_iommu_cmd(const char *id, uint64_t l, uint64_t u) "%s: command 0x%"PRIx64" 0x%"PRIx64
>  riscv_iommu_notifier_add(const char *id) "%s: dev-iotlb notifier added"
>  riscv_iommu_notifier_del(const char *id) "%s: dev-iotlb notifier removed"
> +riscv_iommu_ats(const char *id, unsigned b, unsigned d, unsigned f, uint64_t iova) "%s: translate request %04x:%02x.%u iova: 0x%"PRIx64
> +riscv_iommu_ats_inval(const char *id) "%s: dev-iotlb invalidate"
> +riscv_iommu_ats_prgr(const char *id) "%s: dev-iotlb page request group response"
> --
> 2.43.2
>
>



* Re: [PATCH v2 08/15] hw/riscv/riscv-iommu: add Address Translation Cache (IOATC)
  2024-03-07 16:03 ` [PATCH v2 08/15] hw/riscv/riscv-iommu: add Address Translation Cache (IOATC) Daniel Henrique Barboza
@ 2024-05-08  7:26   ` Frank Chang
  2024-05-16 21:45     ` Daniel Henrique Barboza
  0 siblings, 1 reply; 55+ messages in thread
From: Frank Chang @ 2024-05-08  7:26 UTC (permalink / raw)
  To: Daniel Henrique Barboza
  Cc: qemu-devel, qemu-riscv, alistair.francis, bmeng, liwei1518,
	zhiwei_liu, palmer, ajones, tjeznach

Hi Daniel,

Daniel Henrique Barboza <dbarboza@ventanamicro.com> wrote on Fri, Mar 8, 2024 at 12:05 AM:
>
> From: Tomasz Jeznach <tjeznach@rivosinc.com>
>
> The RISC-V IOMMU spec anticipates that the IOMMU can use translation caches
> to hold entries from the DDT. This includes implementation for all cache
> commands that are marked as 'not implemented'.
>
> There are some artifacts included in the cache that anticipate s-stage and
> g-stage elements, although we don't support them yet. We'll introduce them
> next.
>
> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
> ---
>  hw/riscv/riscv-iommu.c | 190 ++++++++++++++++++++++++++++++++++++++++-
>  hw/riscv/riscv-iommu.h |   2 +
>  2 files changed, 188 insertions(+), 4 deletions(-)
>
> diff --git a/hw/riscv/riscv-iommu.c b/hw/riscv/riscv-iommu.c
> index df534b99b0..0b93146327 100644
> --- a/hw/riscv/riscv-iommu.c
> +++ b/hw/riscv/riscv-iommu.c
> @@ -63,6 +63,16 @@ struct RISCVIOMMUContext {
>      uint64_t msiptp;            /* MSI redirection page table pointer */
>  };
>
> +/* Address translation cache entry */
> +struct RISCVIOMMUEntry {
> +    uint64_t iova:44;           /* IOVA Page Number */
> +    uint64_t pscid:20;          /* Process Soft-Context identifier */
> +    uint64_t phys:44;           /* Physical Page Number */
> +    uint64_t gscid:16;          /* Guest Soft-Context identifier */
> +    uint64_t perm:2;            /* IOMMU_RW flags */
> +    uint64_t __rfu:2;
> +};
> +
>  /* IOMMU index for transactions without PASID specified. */
>  #define RISCV_IOMMU_NOPASID 0
>
> @@ -629,14 +639,127 @@ static AddressSpace *riscv_iommu_space(RISCVIOMMUState *s, uint32_t devid)
>      return &as->iova_as;
>  }
>
> +/* Translation Object cache support */
> +static gboolean __iot_equal(gconstpointer v1, gconstpointer v2)
> +{
> +    RISCVIOMMUEntry *t1 = (RISCVIOMMUEntry *) v1;
> +    RISCVIOMMUEntry *t2 = (RISCVIOMMUEntry *) v2;
> +    return t1->gscid == t2->gscid && t1->pscid == t2->pscid &&
> +           t1->iova == t2->iova;
> +}
> +
> +static guint __iot_hash(gconstpointer v)
> +{
> +    RISCVIOMMUEntry *t = (RISCVIOMMUEntry *) v;
> +    return (guint)t->iova;
> +}
> +
> +/* GV: 1 PSCV: 1 AV: 1 */
> +static void __iot_inval_pscid_iova(gpointer key, gpointer value, gpointer data)
> +{
> +    RISCVIOMMUEntry *iot = (RISCVIOMMUEntry *) value;
> +    RISCVIOMMUEntry *arg = (RISCVIOMMUEntry *) data;
> +    if (iot->gscid == arg->gscid &&
> +        iot->pscid == arg->pscid &&
> +        iot->iova == arg->iova) {
> +        iot->perm = 0;

Maybe using IOMMU_NONE would be clearer?

Otherwise,
Reviewed-by: Frank Chang <frank.chang@sifive.com>

> +    }
> +}
> +
> +/* GV: 1 PSCV: 1 AV: 0 */
> +static void __iot_inval_pscid(gpointer key, gpointer value, gpointer data)
> +{
> +    RISCVIOMMUEntry *iot = (RISCVIOMMUEntry *) value;
> +    RISCVIOMMUEntry *arg = (RISCVIOMMUEntry *) data;
> +    if (iot->gscid == arg->gscid &&
> +        iot->pscid == arg->pscid) {
> +        iot->perm = 0;
> +    }
> +}
> +
> +/* GV: 1 GVMA: 1 */
> +static void __iot_inval_gscid_gpa(gpointer key, gpointer value, gpointer data)
> +{
> +    RISCVIOMMUEntry *iot = (RISCVIOMMUEntry *) value;
> +    RISCVIOMMUEntry *arg = (RISCVIOMMUEntry *) data;
> +    if (iot->gscid == arg->gscid) {
> +        /* simplified cache, no GPA matching */
> +        iot->perm = 0;
> +    }
> +}
> +
> +/* GV: 1 GVMA: 0 */
> +static void __iot_inval_gscid(gpointer key, gpointer value, gpointer data)
> +{
> +    RISCVIOMMUEntry *iot = (RISCVIOMMUEntry *) value;
> +    RISCVIOMMUEntry *arg = (RISCVIOMMUEntry *) data;
> +    if (iot->gscid == arg->gscid) {
> +        iot->perm = 0;
> +    }
> +}
> +
> +/* GV: 0 */
> +static void __iot_inval_all(gpointer key, gpointer value, gpointer data)
> +{
> +    RISCVIOMMUEntry *iot = (RISCVIOMMUEntry *) value;
> +    iot->perm = 0;
> +}
> +
> +/* caller should keep ref-count for iot_cache object */
> +static RISCVIOMMUEntry *riscv_iommu_iot_lookup(RISCVIOMMUContext *ctx,
> +    GHashTable *iot_cache, hwaddr iova)
> +{
> +    RISCVIOMMUEntry key = {
> +        .pscid = get_field(ctx->ta, RISCV_IOMMU_DC_TA_PSCID),
> +        .iova  = PPN_DOWN(iova),
> +    };
> +    return g_hash_table_lookup(iot_cache, &key);
> +}
> +
> +/* caller should keep ref-count for iot_cache object */
> +static void riscv_iommu_iot_update(RISCVIOMMUState *s,
> +    GHashTable *iot_cache, RISCVIOMMUEntry *iot)
> +{
> +    if (!s->iot_limit) {
> +        return;
> +    }
> +
> +    if (g_hash_table_size(s->iot_cache) >= s->iot_limit) {
> +        iot_cache = g_hash_table_new_full(__iot_hash, __iot_equal,
> +                                          g_free, NULL);
> +        g_hash_table_unref(qatomic_xchg(&s->iot_cache, iot_cache));
> +    }
> +    g_hash_table_add(iot_cache, iot);
> +}
> +
> +static void riscv_iommu_iot_inval(RISCVIOMMUState *s, GHFunc func,
> +    uint32_t gscid, uint32_t pscid, hwaddr iova)
> +{
> +    GHashTable *iot_cache;
> +    RISCVIOMMUEntry key = {
> +        .gscid = gscid,
> +        .pscid = pscid,
> +        .iova  = PPN_DOWN(iova),
> +    };
> +
> +    iot_cache = g_hash_table_ref(s->iot_cache);
> +    g_hash_table_foreach(iot_cache, func, &key);
> +    g_hash_table_unref(iot_cache);
> +}
> +
>  static int riscv_iommu_translate(RISCVIOMMUState *s, RISCVIOMMUContext *ctx,
> -    IOMMUTLBEntry *iotlb)
> +    IOMMUTLBEntry *iotlb, bool enable_cache)
>  {
> +    RISCVIOMMUEntry *iot;
> +    IOMMUAccessFlags perm;
>      bool enable_faults;
>      bool enable_pasid;
>      bool enable_pri;
> +    GHashTable *iot_cache;
>      int fault;
>
> +    iot_cache = g_hash_table_ref(s->iot_cache);
> +
>      enable_faults = !(ctx->tc & RISCV_IOMMU_DC_TC_DTF);
>      /*
>       * TC[32] is reserved for custom extensions, used here to temporarily
> @@ -645,9 +768,36 @@ static int riscv_iommu_translate(RISCVIOMMUState *s, RISCVIOMMUContext *ctx,
>      enable_pri = (iotlb->perm == IOMMU_NONE) && (ctx->tc & BIT_ULL(32));
>      enable_pasid = (ctx->tc & RISCV_IOMMU_DC_TC_PDTV);
>
> +    iot = riscv_iommu_iot_lookup(ctx, iot_cache, iotlb->iova);
> +    perm = iot ? iot->perm : IOMMU_NONE;
> +    if (perm != IOMMU_NONE) {
> +        iotlb->translated_addr = PPN_PHYS(iot->phys);
> +        iotlb->addr_mask = ~TARGET_PAGE_MASK;
> +        iotlb->perm = perm;
> +        fault = 0;
> +        goto done;
> +    }
> +
>      /* Translate using device directory / page table information. */
>      fault = riscv_iommu_spa_fetch(s, ctx, iotlb);
>
> +    if (!fault && iotlb->target_as == &s->trap_as) {
> +        /* Do not cache trapped MSI translations */
> +        goto done;
> +    }
> +
> +    if (!fault && iotlb->translated_addr != iotlb->iova && enable_cache) {
> +        iot = g_new0(RISCVIOMMUEntry, 1);
> +        iot->iova = PPN_DOWN(iotlb->iova);
> +        iot->phys = PPN_DOWN(iotlb->translated_addr);
> +        iot->pscid = get_field(ctx->ta, RISCV_IOMMU_DC_TA_PSCID);
> +        iot->perm = iotlb->perm;
> +        riscv_iommu_iot_update(s, iot_cache, iot);
> +    }
> +
> +done:
> +    g_hash_table_unref(iot_cache);
> +
>      if (enable_pri && fault) {
>          struct riscv_iommu_pq_record pr = {0};
>          if (enable_pasid) {
> @@ -794,13 +944,40 @@ static void riscv_iommu_process_cq_tail(RISCVIOMMUState *s)
>              if (cmd.dword0 & RISCV_IOMMU_CMD_IOTINVAL_PSCV) {
>                  /* illegal command arguments IOTINVAL.GVMA & PSCV == 1 */
>                  goto cmd_ill;
> +            } else if (!(cmd.dword0 & RISCV_IOMMU_CMD_IOTINVAL_GV)) {
> +                /* invalidate all cache mappings */
> +                func = __iot_inval_all;
> +            } else if (!(cmd.dword0 & RISCV_IOMMU_CMD_IOTINVAL_AV)) {
> +                /* invalidate cache matching GSCID */
> +                func = __iot_inval_gscid;
> +            } else {
> +                /* invalidate cache matching GSCID and ADDR (GPA) */
> +                func = __iot_inval_gscid_gpa;
>              }
> -            /* translation cache not implemented yet */
> +            riscv_iommu_iot_inval(s, func,
> +                get_field(cmd.dword0, RISCV_IOMMU_CMD_IOTINVAL_GSCID), 0,
> +                cmd.dword1 & TARGET_PAGE_MASK);
>              break;
>
>          case RISCV_IOMMU_CMD(RISCV_IOMMU_CMD_IOTINVAL_FUNC_VMA,
>                               RISCV_IOMMU_CMD_IOTINVAL_OPCODE):
> -            /* translation cache not implemented yet */
> +            if (!(cmd.dword0 & RISCV_IOMMU_CMD_IOTINVAL_GV)) {
> +                /* invalidate all cache mappings, simplified model */
> +                func = __iot_inval_all;
> +            } else if (!(cmd.dword0 & RISCV_IOMMU_CMD_IOTINVAL_PSCV)) {
> +                /* invalidate cache matching GSCID, simplified model */
> +                func = __iot_inval_gscid;
> +            } else if (!(cmd.dword0 & RISCV_IOMMU_CMD_IOTINVAL_AV)) {
> +                /* invalidate cache matching GSCID and PSCID */
> +                func = __iot_inval_pscid;
> +            } else {
> +                /* invalidate cache matching GSCID and PSCID and ADDR (IOVA) */
> +                func = __iot_inval_pscid_iova;
> +            }
> +            riscv_iommu_iot_inval(s, func,
> +                get_field(cmd.dword0, RISCV_IOMMU_CMD_IOTINVAL_GSCID),
> +                get_field(cmd.dword0, RISCV_IOMMU_CMD_IOTINVAL_PSCID),
> +                cmd.dword1 & TARGET_PAGE_MASK);
>              break;
>
>          case RISCV_IOMMU_CMD(RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_DDT,
> @@ -1290,6 +1467,8 @@ static void riscv_iommu_realize(DeviceState *dev, Error **errp)
>      /* Device translation context cache */
>      s->ctx_cache = g_hash_table_new_full(__ctx_hash, __ctx_equal,
>                                           g_free, NULL);
> +    s->iot_cache = g_hash_table_new_full(__iot_hash, __iot_equal,
> +                                         g_free, NULL);
>
>      s->iommus.le_next = NULL;
>      s->iommus.le_prev = NULL;
> @@ -1313,6 +1492,7 @@ static void riscv_iommu_unrealize(DeviceState *dev)
>      qemu_thread_join(&s->core_proc);
>      qemu_cond_destroy(&s->core_cond);
>      qemu_mutex_destroy(&s->core_lock);
> +    g_hash_table_unref(s->iot_cache);
>      g_hash_table_unref(s->ctx_cache);
>  }
>
> @@ -1320,6 +1500,8 @@ static Property riscv_iommu_properties[] = {
>      DEFINE_PROP_UINT32("version", RISCVIOMMUState, version,
>          RISCV_IOMMU_SPEC_DOT_VER),
>      DEFINE_PROP_UINT32("bus", RISCVIOMMUState, bus, 0x0),
> +    DEFINE_PROP_UINT32("ioatc-limit", RISCVIOMMUState, iot_limit,
> +        LIMIT_CACHE_IOT),
>      DEFINE_PROP_BOOL("intremap", RISCVIOMMUState, enable_msi, TRUE),
>      DEFINE_PROP_BOOL("off", RISCVIOMMUState, enable_off, TRUE),
>      DEFINE_PROP_LINK("downstream-mr", RISCVIOMMUState, target_mr,
> @@ -1372,7 +1554,7 @@ static IOMMUTLBEntry riscv_iommu_memory_region_translate(
>          /* Translation disabled or invalid. */
>          iotlb.addr_mask = 0;
>          iotlb.perm = IOMMU_NONE;
> -    } else if (riscv_iommu_translate(as->iommu, ctx, &iotlb)) {
> +    } else if (riscv_iommu_translate(as->iommu, ctx, &iotlb, true)) {
>          /* Translation disabled or fault reported. */
>          iotlb.addr_mask = 0;
>          iotlb.perm = IOMMU_NONE;
> diff --git a/hw/riscv/riscv-iommu.h b/hw/riscv/riscv-iommu.h
> index 6f740de690..eea2123686 100644
> --- a/hw/riscv/riscv-iommu.h
> +++ b/hw/riscv/riscv-iommu.h
> @@ -68,6 +68,8 @@ struct RISCVIOMMUState {
>      MemoryRegion trap_mr;
>
>      GHashTable *ctx_cache;          /* Device translation Context Cache */
> +    GHashTable *iot_cache;          /* IO Translated Address Cache */
> +    unsigned iot_limit;             /* IO Translation Cache size limit */
>
>      /* MMIO Hardware Interface */
>      MemoryRegion regs_mr;
> --
> 2.43.2
>
>



* Re: [PATCH v2 03/15] hw/riscv: add RISC-V IOMMU base emulation
  2024-05-02 11:37   ` Frank Chang
@ 2024-05-08 11:15     ` Daniel Henrique Barboza
  2024-05-10 10:58       ` Frank Chang
  2024-05-13 12:37       ` Daniel Henrique Barboza
  0 siblings, 2 replies; 55+ messages in thread
From: Daniel Henrique Barboza @ 2024-05-08 11:15 UTC (permalink / raw)
  To: Frank Chang
  Cc: qemu-devel, qemu-riscv, alistair.francis, bmeng, liwei1518,
	zhiwei_liu, palmer, ajones, tjeznach, Sebastien Boeuf

Hi Frank,

I'll reply with what I've done so far. I'm still missing some stuff:

On 5/2/24 08:37, Frank Chang wrote:
> Hi Daniel,
> 
> Daniel Henrique Barboza <dbarboza@ventanamicro.com> wrote on Fri, Mar 8, 2024 at 12:04 AM:
>>
>> From: Tomasz Jeznach <tjeznach@rivosinc.com>
>>
>> The RISC-V IOMMU specification is now ratified as per the RISC-V
>> international process. The latest frozen specification can be found
>> at:
>>
>> https://github.com/riscv-non-isa/riscv-iommu/releases/download/v1.0/riscv-iommu.pdf
>>
>> Add the foundation of the device emulation for RISC-V IOMMU, which
>> includes an IOMMU that has no capabilities but MSI interrupt support and
>> fault queue interfaces. We'll add more features incrementally in the
>> next patches.
>>
>> Co-developed-by: Sebastien Boeuf <seb@rivosinc.com>
>> Signed-off-by: Sebastien Boeuf <seb@rivosinc.com>
>> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
>> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
>> ---
>>   hw/riscv/Kconfig         |    4 +
>>   hw/riscv/meson.build     |    1 +
>>   hw/riscv/riscv-iommu.c   | 1492 ++++++++++++++++++++++++++++++++++++++
>>   hw/riscv/riscv-iommu.h   |  141 ++++
>>   hw/riscv/trace-events    |   11 +
>>   hw/riscv/trace.h         |    2 +
>>   include/hw/riscv/iommu.h |   36 +
>>   meson.build              |    1 +
>>   8 files changed, 1688 insertions(+)
>>   create mode 100644 hw/riscv/riscv-iommu.c
>>   create mode 100644 hw/riscv/riscv-iommu.h
>>   create mode 100644 hw/riscv/trace-events
>>   create mode 100644 hw/riscv/trace.h
>>   create mode 100644 include/hw/riscv/iommu.h
>>

(...)

>> +{
>> +    const uint32_t ipsr =
>> +        riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_IPSR, (1 << vec), 0);
>> +    const uint32_t ivec = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_IVEC);
>> +    if (s->notify && !(ipsr & (1 << vec))) {
>> +        s->notify(s, (ivec >> (vec * 4)) & 0x0F);
>> +    }
> 
> s->notify is assigned to riscv_iommu_pci_notify() only.
> There's no way to assert the wire-signaled interrupt.
> 
> We should also check fctl.WSI before asserting the interrupt.
> 

This implementation does not support wire-signalled interrupts. It supports only
MSI, i.e. capabilities.IGS is always MSI (0). For this reason the code is also
not checking for fctl.WSI.
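
For reference, the MSI-only notification path quoted above can be sketched in
isolation as follows. This is an illustrative standalone version, assuming
that riscv_iommu_reg_mod32() returns the register value from before the
update; iommu_irq_raise() and its parameters are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Mark interrupt source 'src' as pending in IPSR. If it was not pending
 * already, look up its 4-bit vector in IVEC and report that an MSI with
 * that vector should be sent.
 */
static bool iommu_irq_raise(uint32_t *ipsr, uint32_t ivec, unsigned src,
                            unsigned *msi_vec)
{
    const uint32_t old = *ipsr;

    *ipsr = old | (1u << src);
    if (old & (1u << src)) {
        return false;   /* already pending, no new notification */
    }

    /* IVEC packs one 4-bit vector index per interrupt source. */
    *msi_vec = (ivec >> (src * 4)) & 0x0F;
    return true;
}
```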



>> +}
  (...)

>> +    g_hash_table_unref(ctx_cache);
>> +    *ref = NULL;
>> +
>> +    if (!(ctx->tc & RISCV_IOMMU_DC_TC_DTF)) {
> 
> riscv_iommu_ctx_fetch() may return:
> RISCV_IOMMU_FQ_CAUSE_DMA_DISABLED (256)
> RISCV_IOMMU_FQ_CAUSE_DDT_LOAD_FAULT (257)
> RISCV_IOMMU_FQ_CAUSE_DDT_INVALID (258)
> RISCV_IOMMU_FQ_CAUSE_DDT_MISCONFIGURED (259)
> 
> These faults are reported even when DTF is set to 1.
> We should report these faults regardless of DTF setting.


I created a "riscv_iommu_report_fault()" helper to centralize all the report fault
logic. This helper will check for DTF and, if set, we'll check the 'cause' to see if
we still want the fault to be reported or not. This helper is then used in these 2
instances where we're creating a fault by hand. It's also used extensively in
riscv_iommu_msi_write() to handle all the cases you mentioned above where we
weren't issuing faults.
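
The DTF decision inside such a helper could look roughly like the sketch
below. It is a simplified standalone version, not the real
riscv_iommu_report_fault(): it covers only the four causes listed in the
review comment above, and fault_is_reported() is a hypothetical name.

```c
#include <stdbool.h>

/* Cause codes as listed in the review comment. */
#define FQ_CAUSE_DMA_DISABLED       256
#define FQ_CAUSE_DDT_LOAD_FAULT     257
#define FQ_CAUSE_DDT_INVALID        258
#define FQ_CAUSE_DDT_MISCONFIGURED  259

/* Decide whether a fault record should be queued given the DTF bit. */
static bool fault_is_reported(unsigned cause, bool dtf)
{
    if (!dtf) {
        return true;    /* fault reporting is not disabled */
    }

    switch (cause) {
    case FQ_CAUSE_DMA_DISABLED:
    case FQ_CAUSE_DDT_LOAD_FAULT:
    case FQ_CAUSE_DDT_INVALID:
    case FQ_CAUSE_DDT_MISCONFIGURED:
        return true;    /* reported regardless of DTF */
    default:
        return false;   /* suppressed by DTF */
    }
}
```

Keeping the decision in one place means every call site that builds a fault
record by hand gets the same DTF behaviour.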


> 
>> +        struct riscv_iommu_fq_record ev = { 0 };
>> +        ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_CAUSE, fault);
>> +        ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_TTYPE,
>> +            RISCV_IOMMU_FQ_TTYPE_UADDR_RD);
>> +        ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_DID, devid);
>> +        ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_PID, pasid);
>> +        ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_PV, !!pasid);
>> +        riscv_iommu_fault(s, &ev);
>> +    }
>> +
>> +    g_free(ctx);
>> +    return NULL;
>> +}
>> +
>> +static void riscv_iommu_ctx_put(RISCVIOMMUState *s, void *ref)
>> +{
>> +    if (ref) {
>> +        g_hash_table_unref((GHashTable *)ref);
>> +    }
>> +}
>> +
>> +/* Find or allocate address space for a given device */
>> +static AddressSpace *riscv_iommu_space(RISCVIOMMUState *s, uint32_t devid)
>> +{
>> +    RISCVIOMMUSpace *as;
>> +
>> +    /* FIXME: PCIe bus remapping for attached endpoints. */
>> +    devid |= s->bus << 8;
>> +
>> +    qemu_mutex_lock(&s->core_lock);
>> +    QLIST_FOREACH(as, &s->spaces, list) {
>> +        if (as->devid == devid) {
>> +            break;
>> +        }
>> +    }
>> +    qemu_mutex_unlock(&s->core_lock);
>> +
>> +    if (as == NULL) {
>> +        char name[64];
>> +        as = g_new0(RISCVIOMMUSpace, 1);
>> +
>> +        as->iommu = s;
>> +        as->devid = devid;
>> +
>> +        snprintf(name, sizeof(name), "riscv-iommu-%04x:%02x.%d-iova",
>> +            PCI_BUS_NUM(as->devid), PCI_SLOT(as->devid), PCI_FUNC(as->devid));
>> +
>> +        /* IOVA address space, untranslated addresses */
>> +        memory_region_init_iommu(&as->iova_mr, sizeof(as->iova_mr),
>> +            TYPE_RISCV_IOMMU_MEMORY_REGION,
>> +            OBJECT(as), name, UINT64_MAX);
>> +        address_space_init(&as->iova_as, MEMORY_REGION(&as->iova_mr),
>> +            TYPE_RISCV_IOMMU_PCI);
> 
> Why do we use TYPE_RISCV_IOMMU_PCI as the address space name here?
> 

This is an error. TYPE_RISCV_IOMMU_PCI is the name of the PCI IOMMU device.

Looking at other IOMMUs in QEMU, the name of the memory region is a simple
string, e.g. "amd_iommu", and the name of the device's address space is
something that includes the device identification.

I'll change this to something like:

         snprintf(name, sizeof(name), "riscv-iommu-%04x:%02x.%d-iova",
             PCI_BUS_NUM(as->devid), PCI_SLOT(as->devid), PCI_FUNC(as->devid));

         /* IOVA address space, untranslated addresses */
         memory_region_init_iommu(&as->iova_mr, sizeof(as->iova_mr),
             TYPE_RISCV_IOMMU_MEMORY_REGION,
             OBJECT(as), "riscv_iommu", UINT64_MAX);
         address_space_init(&as->iova_as, MEMORY_REGION(&as->iova_mr),
                            name);

>> +
>> +        qemu_mutex_lock(&s->core_lock);

(...)


>> +    }
>> +
>> +    return dma_memory_write(s->target_as, addr, &data, sizeof(data),
>> +        MEMTXATTRS_UNSPECIFIED);
> 
> We should also assert the interrupt when IOFENCE.WSI is true
> and IOMMU is configured with wire-signaled interrupt.


I believe that, for the same reason I pointed out earlier ("this implementation does
not support wire-signalled interrupts"), we're not checking for IOFENCE.WSI here.

> 
>> +}
>> +

(...)

>> +
>> +static MemTxResult riscv_iommu_mmio_write(void *opaque, hwaddr addr,
>> +    uint64_t data, unsigned size, MemTxAttrs attrs)
>> +{
>> +    RISCVIOMMUState *s = opaque;
>> +    uint32_t regb = addr & ~3;
>> +    uint32_t busy = 0;
>> +    uint32_t exec = 0;
>> +
>> +    if (size == 0 || size > 8 || (addr & (size - 1)) != 0) {
> 
> Is it ever possible to have size = 0 or size > 8 write access?
> This should be guarded by .valid.min_access_size and .valid.max_access_size.

Yes. And on this point:


> 

(...)

>> +
>> +static const MemoryRegionOps riscv_iommu_mmio_ops = {
>> +    .read_with_attrs = riscv_iommu_mmio_read,
>> +    .write_with_attrs = riscv_iommu_mmio_write,
>> +    .endianness = DEVICE_NATIVE_ENDIAN,
>> +    .impl = {
>> +        .min_access_size = 1,
>> +        .max_access_size = 8,
>> +        .unaligned = false,
>> +    },
>> +    .valid = {
>> +        .min_access_size = 1,
>> +        .max_access_size = 8,
>> +    }
> 
> Spec says:
> "The IOMMU behavior for register accesses where the address is not aligned
> to the size of the access, or if the access spans multiple registers,
> or if the size
> of the access is not 4 bytes or 8 bytes, is UNSPECIFIED."
> 
> Section 6.1. Reading and writing IOMMU registers also says:
> "Registers that are 64-bit wide may be accessed using either a 32-bit
> or a 64-bit access.
> Registers that are 32-bit wide must only be accessed using a 32-bit access."
> 
> Should we limit the access sizes to only 4 and 8 bytes?

Yes. We should set min = 4, max = 8, and use min and max to validate the
access in riscv_iommu_mmio_write().
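
As a sketch of what that validation could look like once the limits are 4 and 8:
accept only 4-byte or 8-byte accesses that are naturally aligned. This is a
standalone illustration; the macro and function names are hypothetical, not the
ones the patch will use.

```c
#include <stdbool.h>
#include <stdint.h>

#define MMIO_MIN_ACCESS 4u  /* assumed .valid.min_access_size */
#define MMIO_MAX_ACCESS 8u  /* assumed .valid.max_access_size */

/* Return true if a register access of 'size' bytes at 'addr' is acceptable. */
static bool mmio_access_ok(uint64_t addr, unsigned size)
{
    if (size < MMIO_MIN_ACCESS || size > MMIO_MAX_ACCESS) {
        return false;               /* outside the 4..8 byte window */
    }
    if (size & (size - 1)) {
        return false;               /* not a power of two (e.g. 5, 6, 7) */
    }
    return (addr & (size - 1)) == 0; /* must be naturally aligned */
}
```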


> 
>> +};
>> +
>> +/*

(...)

>> +
>> +static const MemoryRegionOps riscv_iommu_trap_ops = {
>> +    .read_with_attrs = riscv_iommu_trap_read,
>> +    .write_with_attrs = riscv_iommu_trap_write,
>> +    .endianness = DEVICE_LITTLE_ENDIAN,
>> +    .impl = {
>> +        .min_access_size = 1,
>> +        .max_access_size = 8,
>> +        .unaligned = true,
>> +    },
>> +    .valid = {
>> +        .min_access_size = 1,
>> +        .max_access_size = 8,
>> +    }
>> +};

We'll also want to set min = 4 and max = 8 in these ops too.

>> +
>> +static void riscv_iommu_realize(DeviceState *dev, Error **errp)

(...)

>> +
>> +    /* Memory region for untranslated MRIF/MSI writes */
>> +    memory_region_init_io(&s->trap_mr, OBJECT(dev), &riscv_iommu_trap_ops, s,
>> +            "riscv-iommu-trap", ~0ULL);
>> +    address_space_init(&s->trap_as, &s->trap_mr, "riscv-iommu-trap-as");
>> +
>> +    /* Device translation context cache */
>> +    s->ctx_cache = g_hash_table_new_full(__ctx_hash, __ctx_equal,
>> +                                         g_free, NULL);
>> +
>> +    s->iommus.le_next = NULL;
>> +    s->iommus.le_prev = NULL;
>> +    QLIST_INIT(&s->spaces);
>> +    qemu_cond_init(&s->core_cond);
>> +    qemu_mutex_init(&s->core_lock);
>> +    qemu_spin_init(&s->regs_lock);
>> +    qemu_thread_create(&s->core_proc, "riscv-iommu-core",
>> +        riscv_iommu_core_proc, s, QEMU_THREAD_JOINABLE);
> 
> In our experience, using QEMU thread increases the latency of command
> queue processing,
> which leads to the potential IOMMU fence timeout in the Linux driver
> when using IOMMU with KVM,
> e.g. booting the guest Linux.
> 
> Is it possible to remove the thread from the IOMMU just like ARM, AMD,
> and Intel IOMMU models?

Interesting. We've been using this emulation internally at Ventana, with
KVM and VFIO, and didn't experience this issue. Drew is on CC and can talk
more about it.

That said, I don't mind this change, assuming it's feasible to make it for this
first version. I'll need to check how other IOMMUs are doing it.



> 
>> +}
>> +

(...)

>> +
>> +static AddressSpace *riscv_iommu_find_as(PCIBus *bus, void *opaque, int devfn)
>> +{
>> +    RISCVIOMMUState *s = (RISCVIOMMUState *) opaque;
>> +    PCIDevice *pdev = pci_find_device(bus, pci_bus_num(bus), devfn);
>> +    AddressSpace *as = NULL;
>> +
>> +    if (pdev && pci_is_iommu(pdev)) {
>> +        return s->target_as;
>> +    }
>> +
>> +    /* Find first registered IOMMU device */
>> +    while (s->iommus.le_prev) {
>> +        s = *(s->iommus.le_prev);
>> +    }
>> +
>> +    /* Find first matching IOMMU */
>> +    while (s != NULL && as == NULL) {
>> +        as = riscv_iommu_space(s, PCI_BUILD_BDF(pci_bus_num(bus), devfn));
> 
> For pci_bus_num(),
> riscv_iommu_find_as() can be called at the very early stage
> where software has no chance to enumerate the bus numbers.

I'll see how other IOMMUs are handling their iommu_find_as().


Thanks,


Daniel


> 
> 
> 
> 
>> +        s = s->iommus.le_next;
>> +    }
>> +
>> +    return as ? as : &address_space_memory;
>> +}
>> +
>> +static const PCIIOMMUOps riscv_iommu_ops = {
>> +    .get_address_space = riscv_iommu_find_as,
>> +};
>> +
>> +void riscv_iommu_pci_setup_iommu(RISCVIOMMUState *iommu, PCIBus *bus,
>> +        Error **errp)
>> +{
>> +    if (bus->iommu_ops &&
>> +        bus->iommu_ops->get_address_space == riscv_iommu_find_as) {
>> +        /* Allow multiple IOMMUs on the same PCIe bus, link known devices */
>> +        RISCVIOMMUState *last = (RISCVIOMMUState *)bus->iommu_opaque;
>> +        QLIST_INSERT_AFTER(last, iommu, iommus);
>> +    } else if (bus->iommu_ops == NULL) {
>> +        pci_setup_iommu(bus, &riscv_iommu_ops, iommu);
>> +    } else {
>> +        error_setg(errp, "can't register secondary IOMMU for PCI bus #%d",
>> +            pci_bus_num(bus));
>> +    }
>> +}
>> +
>> +static int riscv_iommu_memory_region_index(IOMMUMemoryRegion *iommu_mr,
>> +    MemTxAttrs attrs)
>> +{
>> +    return attrs.unspecified ? RISCV_IOMMU_NOPASID : (int)attrs.pasid;
>> +}
>> +
>> +static int riscv_iommu_memory_region_index_len(IOMMUMemoryRegion *iommu_mr)
>> +{
>> +    RISCVIOMMUSpace *as = container_of(iommu_mr, RISCVIOMMUSpace, iova_mr);
>> +    return 1 << as->iommu->pasid_bits;
>> +}
>> +
>> +static void riscv_iommu_memory_region_init(ObjectClass *klass, void *data)
>> +{
>> +    IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_CLASS(klass);
>> +
>> +    imrc->translate = riscv_iommu_memory_region_translate;
>> +    imrc->notify_flag_changed = riscv_iommu_memory_region_notify;
>> +    imrc->attrs_to_index = riscv_iommu_memory_region_index;
>> +    imrc->num_indexes = riscv_iommu_memory_region_index_len;
>> +}
>> +
>> +static const TypeInfo riscv_iommu_memory_region_info = {
>> +    .parent = TYPE_IOMMU_MEMORY_REGION,
>> +    .name = TYPE_RISCV_IOMMU_MEMORY_REGION,
>> +    .class_init = riscv_iommu_memory_region_init,
>> +};
>> +
>> +static void riscv_iommu_register_mr_types(void)
>> +{
>> +    type_register_static(&riscv_iommu_memory_region_info);
>> +    type_register_static(&riscv_iommu_info);
>> +}
>> +
>> +type_init(riscv_iommu_register_mr_types);
>> diff --git a/hw/riscv/riscv-iommu.h b/hw/riscv/riscv-iommu.h
>> new file mode 100644
>> index 0000000000..6f740de690
>> --- /dev/null
>> +++ b/hw/riscv/riscv-iommu.h
>> @@ -0,0 +1,141 @@
>> +/*
>> + * QEMU emulation of an RISC-V IOMMU (Ziommu)
>> + *
>> + * Copyright (C) 2022-2023 Rivos Inc.
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License along
>> + * with this program; if not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#ifndef HW_RISCV_IOMMU_STATE_H
>> +#define HW_RISCV_IOMMU_STATE_H
>> +
>> +#include "qemu/osdep.h"
>> +#include "qom/object.h"
>> +
>> +#include "hw/riscv/iommu.h"
>> +
>> +struct RISCVIOMMUState {
>> +    /*< private >*/
>> +    DeviceState parent_obj;
>> +
>> +    /*< public >*/
>> +    uint32_t version;     /* Reported interface version number */
>> +    uint32_t pasid_bits;  /* process identifier width */
>> +    uint32_t bus;         /* PCI bus mapping for non-root endpoints */
>> +
>> +    uint64_t cap;         /* IOMMU supported capabilities */
>> +    uint64_t fctl;        /* IOMMU enabled features */
>> +
>> +    bool enable_off;      /* Enable out-of-reset OFF mode (DMA disabled) */
>> +    bool enable_msi;      /* Enable MSI remapping */
>> +
>> +    /* IOMMU Internal State */
>> +    uint64_t ddtp;        /* Validated Device Directory Tree Root Pointer */
>> +
>> +    dma_addr_t cq_addr;   /* Command queue base physical address */
>> +    dma_addr_t fq_addr;   /* Fault/event queue base physical address */
>> +    dma_addr_t pq_addr;   /* Page request queue base physical address */
>> +
>> +    uint32_t cq_mask;     /* Command queue index bit mask */
>> +    uint32_t fq_mask;     /* Fault/event queue index bit mask */
>> +    uint32_t pq_mask;     /* Page request queue index bit mask */
>> +
>> +    /* interrupt notifier */
>> +    void (*notify)(RISCVIOMMUState *iommu, unsigned vector);
>> +
>> +    /* IOMMU State Machine */
>> +    QemuThread core_proc; /* Background processing thread */
>> +    QemuMutex core_lock;  /* Global IOMMU lock, used for cache/regs updates */
>> +    QemuCond core_cond;   /* Background processing wake up signal */
>> +    unsigned core_exec;   /* Processing thread execution actions */
>> +
>> +    /* IOMMU target address space */
>> +    AddressSpace *target_as;
>> +    MemoryRegion *target_mr;
>> +
>> +    /* MSI / MRIF access trap */
>> +    AddressSpace trap_as;
>> +    MemoryRegion trap_mr;
>> +
>> +    GHashTable *ctx_cache;          /* Device translation Context Cache */
>> +
>> +    /* MMIO Hardware Interface */
>> +    MemoryRegion regs_mr;
>> +    QemuSpin regs_lock;
>> +    uint8_t *regs_rw;  /* register state (user write) */
>> +    uint8_t *regs_wc;  /* write-1-to-clear mask */
>> +    uint8_t *regs_ro;  /* read-only mask */
>> +
>> +    QLIST_ENTRY(RISCVIOMMUState) iommus;
>> +    QLIST_HEAD(, RISCVIOMMUSpace) spaces;
>> +};
>> +
>> +void riscv_iommu_pci_setup_iommu(RISCVIOMMUState *iommu, PCIBus *bus,
>> +         Error **errp);
>> +
>> +/* private helpers */
>> +
>> +/* Register helper functions */
>> +static inline uint32_t riscv_iommu_reg_mod32(RISCVIOMMUState *s,
>> +    unsigned idx, uint32_t set, uint32_t clr)
>> +{
>> +    uint32_t val;
>> +    qemu_spin_lock(&s->regs_lock);
>> +    val = ldl_le_p(s->regs_rw + idx);
>> +    stl_le_p(s->regs_rw + idx, (val & ~clr) | set);
>> +    qemu_spin_unlock(&s->regs_lock);
>> +    return val;
>> +}
>> +
>> +static inline void riscv_iommu_reg_set32(RISCVIOMMUState *s,
>> +    unsigned idx, uint32_t set)
>> +{
>> +    qemu_spin_lock(&s->regs_lock);
>> +    stl_le_p(s->regs_rw + idx, set);
>> +    qemu_spin_unlock(&s->regs_lock);
>> +}
>> +
>> +static inline uint32_t riscv_iommu_reg_get32(RISCVIOMMUState *s,
>> +    unsigned idx)
>> +{
>> +    return ldl_le_p(s->regs_rw + idx);
>> +}
>> +
>> +static inline uint64_t riscv_iommu_reg_mod64(RISCVIOMMUState *s,
>> +    unsigned idx, uint64_t set, uint64_t clr)
>> +{
>> +    uint64_t val;
>> +    qemu_spin_lock(&s->regs_lock);
>> +    val = ldq_le_p(s->regs_rw + idx);
>> +    stq_le_p(s->regs_rw + idx, (val & ~clr) | set);
>> +    qemu_spin_unlock(&s->regs_lock);
>> +    return val;
>> +}
>> +
>> +static inline void riscv_iommu_reg_set64(RISCVIOMMUState *s,
>> +    unsigned idx, uint64_t set)
>> +{
>> +    qemu_spin_lock(&s->regs_lock);
>> +    stq_le_p(s->regs_rw + idx, set);
>> +    qemu_spin_unlock(&s->regs_lock);
>> +}
>> +
>> +static inline uint64_t riscv_iommu_reg_get64(RISCVIOMMUState *s,
>> +    unsigned idx)
>> +{
>> +    return ldq_le_p(s->regs_rw + idx);
>> +}
>> +
>> +
>> +
>> +#endif
>> diff --git a/hw/riscv/trace-events b/hw/riscv/trace-events
>> new file mode 100644
>> index 0000000000..42a97caffa
>> --- /dev/null
>> +++ b/hw/riscv/trace-events
>> @@ -0,0 +1,11 @@
>> +# See documentation at docs/devel/tracing.rst
>> +
>> +# riscv-iommu.c
>> +riscv_iommu_new(const char *id, unsigned b, unsigned d, unsigned f) "%s: device attached %04x:%02x.%d"
>> +riscv_iommu_flt(const char *id, unsigned b, unsigned d, unsigned f, uint64_t reason, uint64_t iova) "%s: fault %04x:%02x.%u reason: 0x%"PRIx64" iova: 0x%"PRIx64
>> +riscv_iommu_pri(const char *id, unsigned b, unsigned d, unsigned f, uint64_t iova) "%s: page request %04x:%02x.%u iova: 0x%"PRIx64
>> +riscv_iommu_dma(const char *id, unsigned b, unsigned d, unsigned f, unsigned pasid, const char *dir, uint64_t iova, uint64_t phys) "%s: translate %04x:%02x.%u #%u %s 0x%"PRIx64" -> 0x%"PRIx64
>> +riscv_iommu_msi(const char *id, unsigned b, unsigned d, unsigned f, uint64_t iova, uint64_t phys) "%s: translate %04x:%02x.%u MSI 0x%"PRIx64" -> 0x%"PRIx64
>> +riscv_iommu_cmd(const char *id, uint64_t l, uint64_t u) "%s: command 0x%"PRIx64" 0x%"PRIx64
>> +riscv_iommu_notifier_add(const char *id) "%s: dev-iotlb notifier added"
>> +riscv_iommu_notifier_del(const char *id) "%s: dev-iotlb notifier removed"
>> diff --git a/hw/riscv/trace.h b/hw/riscv/trace.h
>> new file mode 100644
>> index 0000000000..b88504b750
>> --- /dev/null
>> +++ b/hw/riscv/trace.h
>> @@ -0,0 +1,2 @@
>> +#include "trace/trace-hw_riscv.h"
>> +
>> diff --git a/include/hw/riscv/iommu.h b/include/hw/riscv/iommu.h
>> new file mode 100644
>> index 0000000000..403b365893
>> --- /dev/null
>> +++ b/include/hw/riscv/iommu.h
>> @@ -0,0 +1,36 @@
>> +/*
>> + * QEMU emulation of an RISC-V IOMMU (Ziommu)
>> + *
>> + * Copyright (C) 2022-2023 Rivos Inc.
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License along
>> + * with this program; if not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#ifndef HW_RISCV_IOMMU_H
>> +#define HW_RISCV_IOMMU_H
>> +
>> +#include "qemu/osdep.h"
>> +#include "qom/object.h"
>> +
>> +#define TYPE_RISCV_IOMMU "riscv-iommu"
>> +OBJECT_DECLARE_SIMPLE_TYPE(RISCVIOMMUState, RISCV_IOMMU)
>> +typedef struct RISCVIOMMUState RISCVIOMMUState;
>> +
>> +#define TYPE_RISCV_IOMMU_MEMORY_REGION "riscv-iommu-mr"
>> +typedef struct RISCVIOMMUSpace RISCVIOMMUSpace;
>> +
>> +#define TYPE_RISCV_IOMMU_PCI "riscv-iommu-pci"
>> +OBJECT_DECLARE_SIMPLE_TYPE(RISCVIOMMUStatePci, RISCV_IOMMU_PCI)
>> +typedef struct RISCVIOMMUStatePci RISCVIOMMUStatePci;
>> +
>> +#endif
>> diff --git a/meson.build b/meson.build
>> index c59ca496f2..75e56f3282 100644
>> --- a/meson.build
>> +++ b/meson.build
>> @@ -3361,6 +3361,7 @@ if have_system
>>       'hw/rdma',
>>       'hw/rdma/vmw',
>>       'hw/rtc',
>> +    'hw/riscv',
>>       'hw/s390x',
>>       'hw/scsi',
>>       'hw/sd',
>> --
>> 2.43.2
>>
>>



* Re: [PATCH v2 09/15] hw/riscv/riscv-iommu: add s-stage and g-stage support
  2024-03-07 16:03 ` [PATCH v2 09/15] hw/riscv/riscv-iommu: add s-stage and g-stage support Daniel Henrique Barboza
@ 2024-05-10 10:36   ` Frank Chang
  2024-05-10 11:14     ` Andrew Jones
  0 siblings, 1 reply; 55+ messages in thread
From: Frank Chang @ 2024-05-10 10:36 UTC (permalink / raw)
  To: Daniel Henrique Barboza
  Cc: qemu-devel, qemu-riscv, alistair.francis, bmeng, liwei1518,
	zhiwei_liu, palmer, ajones, tjeznach

Hi Daniel,

On Fri, Mar 8, 2024 at 12:09 AM Daniel Henrique Barboza <dbarboza@ventanamicro.com> wrote:
>
> From: Tomasz Jeznach <tjeznach@rivosinc.com>
>
> Add support for s-stage (sv32, sv39, sv48, sv57 caps) and g-stage
> (sv32x4, sv39x4, sv48x4, sv57x4 caps). Most of the work is done in the
> riscv_iommu_spa_fetch() function that now has to consider how many
> translation stages we need to walk the page table.
>
> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
> ---
>  hw/riscv/riscv-iommu-bits.h |  11 ++
>  hw/riscv/riscv-iommu.c      | 282 ++++++++++++++++++++++++++++++++++--
>  hw/riscv/riscv-iommu.h      |   2 +
>  3 files changed, 286 insertions(+), 9 deletions(-)
>
> diff --git a/hw/riscv/riscv-iommu-bits.h b/hw/riscv/riscv-iommu-bits.h
> index 8e80b1e52a..9d645d69ea 100644
> --- a/hw/riscv/riscv-iommu-bits.h
> +++ b/hw/riscv/riscv-iommu-bits.h
> @@ -71,6 +71,14 @@ struct riscv_iommu_pq_record {
>  /* 5.3 IOMMU Capabilities (64bits) */
>  #define RISCV_IOMMU_REG_CAP             0x0000
>  #define RISCV_IOMMU_CAP_VERSION         GENMASK_ULL(7, 0)
> +#define RISCV_IOMMU_CAP_SV32            BIT_ULL(8)
> +#define RISCV_IOMMU_CAP_SV39            BIT_ULL(9)
> +#define RISCV_IOMMU_CAP_SV48            BIT_ULL(10)
> +#define RISCV_IOMMU_CAP_SV57            BIT_ULL(11)
> +#define RISCV_IOMMU_CAP_SV32X4          BIT_ULL(16)
> +#define RISCV_IOMMU_CAP_SV39X4          BIT_ULL(17)
> +#define RISCV_IOMMU_CAP_SV48X4          BIT_ULL(18)
> +#define RISCV_IOMMU_CAP_SV57X4          BIT_ULL(19)
>  #define RISCV_IOMMU_CAP_MSI_FLAT        BIT_ULL(22)
>  #define RISCV_IOMMU_CAP_MSI_MRIF        BIT_ULL(23)
>  #define RISCV_IOMMU_CAP_IGS             GENMASK_ULL(29, 28)
> @@ -79,6 +87,7 @@ struct riscv_iommu_pq_record {
>
>  /* 5.4 Features control register (32bits) */
>  #define RISCV_IOMMU_REG_FCTL            0x0008
> +#define RISCV_IOMMU_FCTL_GXL            BIT(2)
>
>  /* 5.5 Device-directory-table pointer (64bits) */
>  #define RISCV_IOMMU_REG_DDTP            0x0010
> @@ -195,6 +204,8 @@ struct riscv_iommu_dc {
>  #define RISCV_IOMMU_DC_TC_DTF           BIT_ULL(4)
>  #define RISCV_IOMMU_DC_TC_PDTV          BIT_ULL(5)
>  #define RISCV_IOMMU_DC_TC_PRPR          BIT_ULL(6)
> +#define RISCV_IOMMU_DC_TC_GADE          BIT_ULL(7)
> +#define RISCV_IOMMU_DC_TC_SADE          BIT_ULL(8)
>  #define RISCV_IOMMU_DC_TC_DPE           BIT_ULL(9)
>  #define RISCV_IOMMU_DC_TC_SXL           BIT_ULL(11)
>
> diff --git a/hw/riscv/riscv-iommu.c b/hw/riscv/riscv-iommu.c
> index 0b93146327..03a610fa75 100644
> --- a/hw/riscv/riscv-iommu.c
> +++ b/hw/riscv/riscv-iommu.c
> @@ -58,6 +58,8 @@ struct RISCVIOMMUContext {
>      uint64_t __rfu:20;          /* reserved */
>      uint64_t tc;                /* Translation Control */
>      uint64_t ta;                /* Translation Attributes */
> +    uint64_t satp;              /* S-Stage address translation and protection */
> +    uint64_t gatp;              /* G-Stage address translation and protection */
>      uint64_t msi_addr_mask;     /* MSI filtering - address mask */
>      uint64_t msi_addr_pattern;  /* MSI filtering - address pattern */
>      uint64_t msiptp;            /* MSI redirection page table pointer */
> @@ -194,12 +196,46 @@ static bool riscv_iommu_msi_check(RISCVIOMMUState *s, RISCVIOMMUContext *ctx,
>      return true;
>  }
>
> -/* RISCV IOMMU Address Translation Lookup - Page Table Walk */
> +/*
> + * RISCV IOMMU Address Translation Lookup - Page Table Walk
> + *
> + * Note: Code is based on get_physical_address() from target/riscv/cpu_helper.c
> + * Both implementation can be merged into single helper function in future.
> + * Keeping them separate for now, as error reporting and flow specifics are
> + * sufficiently different for separate implementation.
> + *
> + * @s        : IOMMU Device State
> + * @ctx      : Translation context for device id and process address space id.
> + * @iotlb    : translation data: physical address and access mode.
> + * @gpa      : provided IOVA is a guest physical address, use G-Stage only.
> + * @return   : success or fault cause code.
> + */
>  static int riscv_iommu_spa_fetch(RISCVIOMMUState *s, RISCVIOMMUContext *ctx,
> -    IOMMUTLBEntry *iotlb)
> +    IOMMUTLBEntry *iotlb, bool gpa)
>  {
> +    dma_addr_t addr, base;
> +    uint64_t satp, gatp, pte;
> +    bool en_s, en_g;
> +    struct {
> +        unsigned char step;
> +        unsigned char levels;
> +        unsigned char ptidxbits;
> +        unsigned char ptesize;
> +    } sc[2];
> +    /* Translation stage phase */
> +    enum {
> +        S_STAGE = 0,
> +        G_STAGE = 1,
> +    } pass;
> +
> +    satp = get_field(ctx->satp, RISCV_IOMMU_ATP_MODE_FIELD);
> +    gatp = get_field(ctx->gatp, RISCV_IOMMU_ATP_MODE_FIELD);
> +
> +    en_s = satp != RISCV_IOMMU_DC_FSC_MODE_BARE && !gpa;
> +    en_g = gatp != RISCV_IOMMU_DC_IOHGATP_MODE_BARE;
> +
>      /* Early check for MSI address match when IOVA == GPA */
> -    if (iotlb->perm & IOMMU_WO &&
> +    if (!en_s && (iotlb->perm & IOMMU_WO) &&

I'm wondering whether we need to check "en_s" for MSI writes.

IOMMU spec Section 2.3.3. Process to translate addresses of MSIs says:
"Determine if the address A is an access to a virtual interrupt file
as specified in Section 2.1.3.6."

and Section 2.1.3.6 says:

"An incoming memory access made by a device is recognized as
an access to a virtual interrupt file if the destination guest physical page
matches the supplied address pattern in all bit positions that are zeros
in the supplied address mask. In detail, a memory access to
guest physical address A is recognized as an access to a virtual
interrupt file’s memory-mapped page if:
(A >> 12) & ~msi_addr_mask = (msi_addr_pattern & ~msi_addr_mask)"

Is checking the address pattern sufficient to determine that
the address is an MSI to a virtual interrupt file?
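
For reference, the match condition quoted from Section 2.1.3.6 can be written as
a small predicate. This is only a sketch of the formula; the function and
parameter names are illustrative and are not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if guest physical address 'gpa' falls in a virtual interrupt
 * file page, i.e. (A >> 12) & ~mask == pattern & ~mask.
 */
static bool msi_addr_match(uint64_t gpa, uint64_t msi_addr_mask,
                           uint64_t msi_addr_pattern)
{
    const uint64_t page = gpa >> 12;    /* guest physical page number */

    return (page & ~msi_addr_mask) == (msi_addr_pattern & ~msi_addr_mask);
}
```

With pattern 0x12340 and mask 0xF, any page number from 0x12340 to 0x1234F
matches, because the low four bits are ignored.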

>          riscv_iommu_msi_check(s, ctx, iotlb->iova)) {
>          iotlb->target_as = &s->trap_as;
>          iotlb->translated_addr = iotlb->iova;
> @@ -208,11 +244,196 @@ static int riscv_iommu_spa_fetch(RISCVIOMMUState *s, RISCVIOMMUContext *ctx,
>      }
>
>      /* Exit early for pass-through mode. */
> -    iotlb->translated_addr = iotlb->iova;
> -    iotlb->addr_mask = ~TARGET_PAGE_MASK;
> -    /* Allow R/W in pass-through mode */
> -    iotlb->perm = IOMMU_RW;
> -    return 0;
> +    if (!(en_s || en_g)) {
> +        iotlb->translated_addr = iotlb->iova;
> +        iotlb->addr_mask = ~TARGET_PAGE_MASK;
> +        /* Allow R/W in pass-through mode */
> +        iotlb->perm = IOMMU_RW;
> +        return 0;
> +    }
> +
> +    /* S/G translation parameters. */
> +    for (pass = 0; pass < 2; pass++) {
> +        uint32_t sv_mode;
> +
> +        sc[pass].step = 0;
> +        if (pass ? (s->fctl & RISCV_IOMMU_FCTL_GXL) :
> +            (ctx->tc & RISCV_IOMMU_DC_TC_SXL)) {
> +            /* 32bit mode for GXL/SXL == 1 */
> +            switch (pass ? gatp : satp) {
> +            case RISCV_IOMMU_DC_IOHGATP_MODE_BARE:
> +                sc[pass].levels    = 0;
> +                sc[pass].ptidxbits = 0;
> +                sc[pass].ptesize   = 0;
> +                break;
> +            case RISCV_IOMMU_DC_IOHGATP_MODE_SV32X4:
> +                sv_mode = pass ? RISCV_IOMMU_CAP_SV32X4 : RISCV_IOMMU_CAP_SV32;
> +                if (!(s->cap & sv_mode)) {
> +                    return RISCV_IOMMU_FQ_CAUSE_DDT_MISCONFIGURED;
> +                }
> +                sc[pass].levels    = 2;
> +                sc[pass].ptidxbits = 10;
> +                sc[pass].ptesize   = 4;
> +                break;
> +            default:
> +                return RISCV_IOMMU_FQ_CAUSE_DDT_MISCONFIGURED;
> +            }
> +        } else {
> +            /* 64bit mode for GXL/SXL == 0 */
> +            switch (pass ? gatp : satp) {
> +            case RISCV_IOMMU_DC_IOHGATP_MODE_BARE:
> +                sc[pass].levels    = 0;
> +                sc[pass].ptidxbits = 0;
> +                sc[pass].ptesize   = 0;
> +                break;
> +            case RISCV_IOMMU_DC_IOHGATP_MODE_SV39X4:
> +                sv_mode = pass ? RISCV_IOMMU_CAP_SV39X4 : RISCV_IOMMU_CAP_SV39;
> +                if (!(s->cap & sv_mode)) {
> +                    return RISCV_IOMMU_FQ_CAUSE_DDT_MISCONFIGURED;
> +                }
> +                sc[pass].levels    = 3;
> +                sc[pass].ptidxbits = 9;
> +                sc[pass].ptesize   = 8;
> +                break;
> +            case RISCV_IOMMU_DC_IOHGATP_MODE_SV48X4:
> +                sv_mode = pass ? RISCV_IOMMU_CAP_SV48X4 : RISCV_IOMMU_CAP_SV48;
> +                if (!(s->cap & sv_mode)) {
> +                    return RISCV_IOMMU_FQ_CAUSE_DDT_MISCONFIGURED;
> +                }
> +                sc[pass].levels    = 4;
> +                sc[pass].ptidxbits = 9;
> +                sc[pass].ptesize   = 8;
> +                break;
> +            case RISCV_IOMMU_DC_IOHGATP_MODE_SV57X4:
> +                sv_mode = pass ? RISCV_IOMMU_CAP_SV57X4 : RISCV_IOMMU_CAP_SV57;
> +                if (!(s->cap & sv_mode)) {
> +                    return RISCV_IOMMU_FQ_CAUSE_DDT_MISCONFIGURED;
> +                }
> +                sc[pass].levels    = 5;
> +                sc[pass].ptidxbits = 9;
> +                sc[pass].ptesize   = 8;
> +                break;
> +            default:
> +                return RISCV_IOMMU_FQ_CAUSE_DDT_MISCONFIGURED;
> +            }
> +        }
> +    };
> +
> +    /* S/G stages translation tables root pointers */
> +    gatp = PPN_PHYS(get_field(ctx->gatp, RISCV_IOMMU_ATP_PPN_FIELD));
> +    satp = PPN_PHYS(get_field(ctx->satp, RISCV_IOMMU_ATP_PPN_FIELD));
> +    addr = (en_s && en_g) ? satp : iotlb->iova;
> +    base = en_g ? gatp : satp;
> +    pass = en_g ? G_STAGE : S_STAGE;
> +
> +    do {
> +        const unsigned widened = (pass && !sc[pass].step) ? 2 : 0;
> +        const unsigned va_bits = widened + sc[pass].ptidxbits;
> +        const unsigned va_skip = TARGET_PAGE_BITS + sc[pass].ptidxbits *
> +                                 (sc[pass].levels - 1 - sc[pass].step);
> +        const unsigned idx = (addr >> va_skip) & ((1 << va_bits) - 1);
> +        const dma_addr_t pte_addr = base + idx * sc[pass].ptesize;
> +        const bool ade =
> +            ctx->tc & (pass ? RISCV_IOMMU_DC_TC_GADE : RISCV_IOMMU_DC_TC_SADE);
> +
> +        /* Address range check before first level lookup */
> +        if (!sc[pass].step) {
> +            const uint64_t va_mask = (1ULL << (va_skip + va_bits)) - 1;
> +            if ((addr & va_mask) != addr) {
> +                return RISCV_IOMMU_FQ_CAUSE_DMA_DISABLED;
> +            }
> +        }
> +
> +        /* Read page table entry */
> +        if (dma_memory_read(s->target_as, pte_addr, &pte,
> +                sc[pass].ptesize, MEMTXATTRS_UNSPECIFIED) != MEMTX_OK) {
> +            return (iotlb->perm & IOMMU_WO) ? RISCV_IOMMU_FQ_CAUSE_WR_FAULT
> +                                            : RISCV_IOMMU_FQ_CAUSE_RD_FAULT;
> +        }
> +
> +        if (sc[pass].ptesize == 4) {
> +            pte = (uint64_t) le32_to_cpu(*((uint32_t *)&pte));
> +        } else {
> +            pte = le64_to_cpu(pte);
> +        }
> +
> +        sc[pass].step++;
> +        hwaddr ppn = pte >> PTE_PPN_SHIFT;
> +
> +        if (!(pte & PTE_V)) {
> +            break;                /* Invalid PTE */
> +        } else if (!(pte & (PTE_R | PTE_W | PTE_X))) {
> +            base = PPN_PHYS(ppn); /* Inner PTE, continue walking */
> +        } else if ((pte & (PTE_R | PTE_W | PTE_X)) == PTE_W) {
> +            break;                /* Reserved leaf PTE flags: PTE_W */
> +        } else if ((pte & (PTE_R | PTE_W | PTE_X)) == (PTE_W | PTE_X)) {
> +            break;                /* Reserved leaf PTE flags: PTE_W + PTE_X */
> +        } else if (ppn & ((1ULL << (va_skip - TARGET_PAGE_BITS)) - 1)) {
> +            break;                /* Misaligned PPN */
> +        } else if ((iotlb->perm & IOMMU_RO) && !(pte & PTE_R)) {
> +            break;                /* Read access check failed */
> +        } else if ((iotlb->perm & IOMMU_WO) && !(pte & PTE_W)) {
> +            break;                /* Write access check failed */
> +        } else if ((iotlb->perm & IOMMU_RO) && !ade && !(pte & PTE_A)) {
> +            break;                /* Access bit not set */
> +        } else if ((iotlb->perm & IOMMU_WO) && !ade && !(pte & PTE_D)) {
> +            break;                /* Dirty bit not set */
> +        } else {
> +            /* Leaf PTE, translation completed. */
> +            sc[pass].step = sc[pass].levels;
> +            base = PPN_PHYS(ppn) | (addr & ((1ULL << va_skip) - 1));
> +            /* Update address mask based on smallest translation granularity */
> +            iotlb->addr_mask &= (1ULL << va_skip) - 1;
> +            /* Continue with S-Stage translation? */
> +            if (pass && sc[0].step != sc[0].levels) {

Replace 0 with S_STAGE?

> +                pass = S_STAGE;
> +                addr = iotlb->iova;
> +                continue;
> +            }

May I ask in which case we continue walking the S-stage translation
after a leaf PTE has been found?

I thought the translation combinations are:
S-stage (i.e. Single-stage translation)
G-stage (i.e. G-stage only translation)
S-stage -> G-stage (i.e. Nested translation)

> +            /* Translation phase completed (GPA or SPA) */
> +            iotlb->translated_addr = base;
> +            iotlb->perm = (pte & PTE_W) ? ((pte & PTE_R) ? IOMMU_RW : IOMMU_WO)
> +                                                         : IOMMU_RO;
> +
> +            /* Check MSI GPA address match */
> +            if (pass == S_STAGE && (iotlb->perm & IOMMU_WO) &&
> +                riscv_iommu_msi_check(s, ctx, base)) {
> +                /* Trap MSI writes and return GPA address. */
> +                iotlb->target_as = &s->trap_as;
> +                iotlb->addr_mask = ~TARGET_PAGE_MASK;
> +                return 0;
> +            }
> +
> +            /* Continue with G-Stage translation? */
> +            if (!pass && en_g) {
> +                pass = G_STAGE;
> +                addr = base;
> +                base = gatp;
> +                sc[pass].step = 0;
> +                continue;
> +            }
> +
> +            return 0;
> +        }
> +
> +        if (sc[pass].step == sc[pass].levels) {
> +            break; /* Can't find leaf PTE */
> +        }
> +
> +        /* Continue with G-Stage translation? */
> +        if (!pass && en_g) {
> +            pass = G_STAGE;
> +            addr = base;
> +            base = gatp;
> +            sc[pass].step = 0;
> +        }

Will this if condition ever be executed?

For S-stage -> G-stage (i.e. Nested translation),
shouldn't G-stage translation already be continued by the if condition
in the S-stage leaf PTE branch above?

> +    } while (1);
> +
> +    return (iotlb->perm & IOMMU_WO) ?
> +                (pass ? RISCV_IOMMU_FQ_CAUSE_WR_FAULT_VS :
> +                        RISCV_IOMMU_FQ_CAUSE_WR_FAULT_S) :
> +                (pass ? RISCV_IOMMU_FQ_CAUSE_RD_FAULT_VS :
> +                        RISCV_IOMMU_FQ_CAUSE_RD_FAULT_S);
>  }
>
>  /* Redirect MSI write for given GPA. */
> @@ -351,6 +572,10 @@ static int riscv_iommu_ctx_fetch(RISCVIOMMUState *s, RISCVIOMMUContext *ctx)
>
>      case RISCV_IOMMU_DDTP_MODE_BARE:
>          /* mock up pass-through translation context */
> +        ctx->gatp = set_field(0, RISCV_IOMMU_ATP_MODE_FIELD,
> +            RISCV_IOMMU_DC_IOHGATP_MODE_BARE);
> +        ctx->satp = set_field(0, RISCV_IOMMU_ATP_MODE_FIELD,
> +            RISCV_IOMMU_DC_FSC_MODE_BARE);
>          ctx->tc = RISCV_IOMMU_DC_TC_V;
>          ctx->ta = 0;
>          ctx->msiptp = 0;
> @@ -424,6 +649,8 @@ static int riscv_iommu_ctx_fetch(RISCVIOMMUState *s, RISCVIOMMUContext *ctx)
>
>      /* Set translation context. */
>      ctx->tc = le64_to_cpu(dc.tc);
> +    ctx->gatp = le64_to_cpu(dc.iohgatp);
> +    ctx->satp = le64_to_cpu(dc.fsc);
>      ctx->ta = le64_to_cpu(dc.ta);
>      ctx->msiptp = le64_to_cpu(dc.msiptp);
>      ctx->msi_addr_mask = le64_to_cpu(dc.msi_addr_mask);
> @@ -433,14 +660,38 @@ static int riscv_iommu_ctx_fetch(RISCVIOMMUState *s, RISCVIOMMUContext *ctx)
>          return RISCV_IOMMU_FQ_CAUSE_DDT_INVALID;
>      }
>
> +    /* FSC field checks */
> +    mode = get_field(ctx->satp, RISCV_IOMMU_DC_FSC_MODE);
> +    addr = PPN_PHYS(get_field(ctx->satp, RISCV_IOMMU_DC_FSC_PPN));
> +
> +    if (mode == RISCV_IOMMU_DC_FSC_MODE_BARE) {
> +        /* No S-Stage translation, done. */
> +        return 0;
> +    }
> +
>      if (!(ctx->tc & RISCV_IOMMU_DC_TC_PDTV)) {
>          if (ctx->pasid != RISCV_IOMMU_NOPASID) {
>              /* PASID is disabled */
>              return RISCV_IOMMU_FQ_CAUSE_TTYPE_BLOCKED;
>          }
> +        if (mode > RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV57) {
> +            /* Invalid translation mode */
> +            return RISCV_IOMMU_FQ_CAUSE_DDT_INVALID;
> +        }
>          return 0;
>      }
>
> +    if (ctx->pasid == RISCV_IOMMU_NOPASID) {
> +        if (!(ctx->tc & RISCV_IOMMU_DC_TC_DPE)) {
> +            /* No default PASID enabled, set BARE mode */
> +            ctx->satp = 0ULL;
> +            return 0;
> +        } else {
> +            /* Use default PASID #0 */
> +            ctx->pasid = 0;

How do we differentiate between the default PASID: 0
and RISCV_IOMMU_NOPASID?

Regards,
Frank Chang

> +        }
> +    }
> +
>      /* FSC.TC.PDTV enabled */
>      if (mode > RISCV_IOMMU_DC_FSC_PDTP_MODE_PD20) {
>          /* Invalid PDTP.MODE */
> @@ -474,6 +725,7 @@ static int riscv_iommu_ctx_fetch(RISCVIOMMUState *s, RISCVIOMMUContext *ctx)
>
>      /* Use FSC and TA from process directory entry. */
>      ctx->ta = le64_to_cpu(dc.ta);
> +    ctx->satp = le64_to_cpu(dc.fsc);
>
>      return 0;
>  }
> @@ -710,6 +962,7 @@ static RISCVIOMMUEntry *riscv_iommu_iot_lookup(RISCVIOMMUContext *ctx,
>      GHashTable *iot_cache, hwaddr iova)
>  {
>      RISCVIOMMUEntry key = {
> +        .gscid = get_field(ctx->gatp, RISCV_IOMMU_DC_IOHGATP_GSCID),
>          .pscid = get_field(ctx->ta, RISCV_IOMMU_DC_TA_PSCID),
>          .iova  = PPN_DOWN(iova),
>      };
> @@ -779,7 +1032,7 @@ static int riscv_iommu_translate(RISCVIOMMUState *s, RISCVIOMMUContext *ctx,
>      }
>
>      /* Translate using device directory / page table information. */
> -    fault = riscv_iommu_spa_fetch(s, ctx, iotlb);
> +    fault = riscv_iommu_spa_fetch(s, ctx, iotlb, false);
>
>      if (!fault && iotlb->target_as == &s->trap_as) {
>          /* Do not cache trapped MSI translations */
> @@ -790,6 +1043,7 @@ static int riscv_iommu_translate(RISCVIOMMUState *s, RISCVIOMMUContext *ctx,
>          iot = g_new0(RISCVIOMMUEntry, 1);
>          iot->iova = PPN_DOWN(iotlb->iova);
>          iot->phys = PPN_DOWN(iotlb->translated_addr);
> +        iot->gscid = get_field(ctx->gatp, RISCV_IOMMU_DC_IOHGATP_GSCID);
>          iot->pscid = get_field(ctx->ta, RISCV_IOMMU_DC_TA_PSCID);
>          iot->perm = iotlb->perm;
>          riscv_iommu_iot_update(s, iot_cache, iot);
> @@ -1394,6 +1648,14 @@ static void riscv_iommu_realize(DeviceState *dev, Error **errp)
>      if (s->enable_msi) {
>          s->cap |= RISCV_IOMMU_CAP_MSI_FLAT | RISCV_IOMMU_CAP_MSI_MRIF;
>      }
> +    if (s->enable_s_stage) {
> +        s->cap |= RISCV_IOMMU_CAP_SV32 | RISCV_IOMMU_CAP_SV39 |
> +                  RISCV_IOMMU_CAP_SV48 | RISCV_IOMMU_CAP_SV57;
> +    }
> +    if (s->enable_g_stage) {
> +        s->cap |= RISCV_IOMMU_CAP_SV32X4 | RISCV_IOMMU_CAP_SV39X4 |
> +                  RISCV_IOMMU_CAP_SV48X4 | RISCV_IOMMU_CAP_SV57X4;
> +    }
>      /* Report QEMU target physical address space limits */
>      s->cap = set_field(s->cap, RISCV_IOMMU_CAP_PAS,
>                         TARGET_PHYS_ADDR_SPACE_BITS);
> @@ -1504,6 +1766,8 @@ static Property riscv_iommu_properties[] = {
>          LIMIT_CACHE_IOT),
>      DEFINE_PROP_BOOL("intremap", RISCVIOMMUState, enable_msi, TRUE),
>      DEFINE_PROP_BOOL("off", RISCVIOMMUState, enable_off, TRUE),
> +    DEFINE_PROP_BOOL("s-stage", RISCVIOMMUState, enable_s_stage, TRUE),
> +    DEFINE_PROP_BOOL("g-stage", RISCVIOMMUState, enable_g_stage, TRUE),
>      DEFINE_PROP_LINK("downstream-mr", RISCVIOMMUState, target_mr,
>          TYPE_MEMORY_REGION, MemoryRegion *),
>      DEFINE_PROP_END_OF_LIST(),
> diff --git a/hw/riscv/riscv-iommu.h b/hw/riscv/riscv-iommu.h
> index eea2123686..9b33fb97ef 100644
> --- a/hw/riscv/riscv-iommu.h
> +++ b/hw/riscv/riscv-iommu.h
> @@ -38,6 +38,8 @@ struct RISCVIOMMUState {
>
>      bool enable_off;      /* Enable out-of-reset OFF mode (DMA disabled) */
>      bool enable_msi;      /* Enable MSI remapping */
> +    bool enable_s_stage;  /* Enable S/VS-Stage translation */
> +    bool enable_g_stage;  /* Enable G-Stage translation */
>
>      /* IOMMU Internal State */
>      uint64_t ddtp;        /* Validated Device Directory Tree Root Pointer */
> --
> 2.43.2
>
>


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 03/15] hw/riscv: add RISC-V IOMMU base emulation
  2024-05-08 11:15     ` Daniel Henrique Barboza
@ 2024-05-10 10:58       ` Frank Chang
  2024-05-13 12:41         ` Daniel Henrique Barboza
  2024-05-13 12:37       ` Daniel Henrique Barboza
  1 sibling, 1 reply; 55+ messages in thread
From: Frank Chang @ 2024-05-10 10:58 UTC (permalink / raw)
  To: Daniel Henrique Barboza
  Cc: Frank Chang, qemu-devel, qemu-riscv, alistair.francis, bmeng,
	liwei1518, zhiwei_liu, palmer, ajones, tjeznach, Sebastien Boeuf

Hi Daniel,

On Wed, May 8, 2024 at 7:16 PM Daniel Henrique Barboza <dbarboza@ventanamicro.com> wrote:
>
> Hi Frank,
>
> I'll reply with what I've done so far. Still missing some stuff:
>
> On 5/2/24 08:37, Frank Chang wrote:
> > Hi Daniel,
> >
> > On Fri, Mar 8, 2024 at 12:04 AM Daniel Henrique Barboza <dbarboza@ventanamicro.com> wrote:
> >>
> >> From: Tomasz Jeznach <tjeznach@rivosinc.com>
> >>
> >> The RISC-V IOMMU specification is now ratified as-per the RISC-V
> >> international process. The latest frozen specification can be found
> >> at:
> >>
> >> https://github.com/riscv-non-isa/riscv-iommu/releases/download/v1.0/riscv-iommu.pdf
> >>
> >> Add the foundation of the device emulation for RISC-V IOMMU, which
> >> includes an IOMMU that has no capabilities but MSI interrupt support and
> >> fault queue interfaces. We'll add more features incrementally in the
> >> next patches.
> >>
> >> Co-developed-by: Sebastien Boeuf <seb@rivosinc.com>
> >> Signed-off-by: Sebastien Boeuf <seb@rivosinc.com>
> >> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
> >> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
> >> ---
> >>   hw/riscv/Kconfig         |    4 +
> >>   hw/riscv/meson.build     |    1 +
> >>   hw/riscv/riscv-iommu.c   | 1492 ++++++++++++++++++++++++++++++++++++++
> >>   hw/riscv/riscv-iommu.h   |  141 ++++
> >>   hw/riscv/trace-events    |   11 +
> >>   hw/riscv/trace.h         |    2 +
> >>   include/hw/riscv/iommu.h |   36 +
> >>   meson.build              |    1 +
> >>   8 files changed, 1688 insertions(+)
> >>   create mode 100644 hw/riscv/riscv-iommu.c
> >>   create mode 100644 hw/riscv/riscv-iommu.h
> >>   create mode 100644 hw/riscv/trace-events
> >>   create mode 100644 hw/riscv/trace.h
> >>   create mode 100644 include/hw/riscv/iommu.h
> >>
>
> (...)
>
> +{
> >> +    const uint32_t ipsr =
> >> +        riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_IPSR, (1 << vec), 0);
> >> +    const uint32_t ivec = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_IVEC);
> >> +    if (s->notify && !(ipsr & (1 << vec))) {
> >> +        s->notify(s, (ivec >> (vec * 4)) & 0x0F);
> >> +    }
> >
> > s->notify is assigned to riscv_iommu_pci_notify() only.
> > There's no way to assert the wire-signaled interrupt.
> >
> > We should also check fctl.WSI before asserting the interrupt.
> >
>
> This implementation does not support wire-signalled interrupts. It supports only
> MSI, i.e. capabilities.IGS is always MSI (0). For this reason the code is also
> not checking for fctl.WSI.
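
For reference, the IVEC lookup in the quoted hunk can be read in
isolation. This is a minimal sketch, not the patch's code: the helper
names iommu_ivec_lookup() and iommu_ipsr_set_pending() are hypothetical.
Only the shift/mask and the "notify only if the pending bit was
previously clear" logic are taken from the quoted lines.

```c
#include <stdint.h>

/*
 * Each interrupt source owns a 4-bit field in IVEC; field N holds the
 * vector assigned to source N (same shift/mask as the quoted hunk).
 * 'source' is assumed to be small enough that the shift stays below 32.
 */
static inline unsigned iommu_ivec_lookup(uint32_t ivec, unsigned source)
{
    return (ivec >> (source * 4)) & 0x0F;
}

/*
 * Set the pending bit for 'source' and report whether it was clear
 * before, i.e. whether a new notification would be due.
 */
static inline int iommu_ipsr_set_pending(uint32_t *ipsr, unsigned source)
{
    const uint32_t bit = UINT32_C(1) << source;
    const int was_clear = !(*ipsr & bit);

    *ipsr |= bit;
    return was_clear;
}
```

With ivec = 0x3210, source 2 maps to vector 2; a second
iommu_ipsr_set_pending() call for the same source returns 0, so no
duplicate notification would be raised while the bit stays pending.
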
>
>
>
> >> +}
>   (...)
>
> >> +    g_hash_table_unref(ctx_cache);
> >> +    *ref = NULL;
> >> +
> >> +    if (!(ctx->tc & RISCV_IOMMU_DC_TC_DTF)) {
> >
> > riscv_iommu_ctx_fetch() may return:
> > RISCV_IOMMU_FQ_CAUSE_DMA_DISABLED (256)
> > RISCV_IOMMU_FQ_CAUSE_DDT_LOAD_FAULT (257)
> > RISCV_IOMMU_FQ_CAUSE_DDT_INVALID (258)
> > RISCV_IOMMU_FQ_CAUSE_DDT_MISCONFIGURED (259)
> >
> > These faults are reported even when DTF is set to 1.
> > We should report these faults regardless of DTF setting.
>
>
> I created a "riscv_iommu_report_fault()" helper to centralize all the report fault
> logic. This helper will check for DTF and, if set, we'll check the 'cause' to see if
> we still want the fault to be reported or not. This helper is then used in these 2
> instances where we're creating a fault by hand. It's also used extensively in
> riscv_iommu_msi_write() to handle all the cases you mentioned above where we
> weren't issuing faults.
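
The DTF filtering described above could look roughly like the sketch
below. It is an assumption about the shape of the helper, not the
actual riscv_iommu_report_fault(): the function name
fault_is_reportable() and the local enum are hypothetical, and the set
of "always reported" causes is limited to the four values listed in
the review (256 to 259). The real helper may need to cover more causes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Cause codes as listed in the review comment above. */
enum {
    FQ_CAUSE_DMA_DISABLED      = 256,
    FQ_CAUSE_DDT_LOAD_FAULT    = 257,
    FQ_CAUSE_DDT_INVALID       = 258,
    FQ_CAUSE_DDT_MISCONFIGURED = 259,
};

/*
 * Decide whether a fault record should be queued.
 * dtf == false: every fault is reported.
 * dtf == true:  only the causes listed above are still reported.
 */
static bool fault_is_reportable(bool dtf, uint32_t cause)
{
    if (!dtf) {
        return true;
    }

    switch (cause) {
    case FQ_CAUSE_DMA_DISABLED:
    case FQ_CAUSE_DDT_LOAD_FAULT:
    case FQ_CAUSE_DDT_INVALID:
    case FQ_CAUSE_DDT_MISCONFIGURED:
        return true;
    default:
        return false;
    }
}
```

Keeping the decision in one predicate means the call sites that build
a fault record by hand do not each have to repeat the DTF check.
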
>
>
> >
> >> +        struct riscv_iommu_fq_record ev = { 0 };
> >> +        ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_CAUSE, fault);
> >> +        ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_TTYPE,
> >> +            RISCV_IOMMU_FQ_TTYPE_UADDR_RD);
> >> +        ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_DID, devid);
> >> +        ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_PID, pasid);
> >> +        ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_PV, !!pasid);
> >> +        riscv_iommu_fault(s, &ev);
> >> +    }
> >> +
> >> +    g_free(ctx);
> >> +    return NULL;
> >> +}
> >> +
> >> +static void riscv_iommu_ctx_put(RISCVIOMMUState *s, void *ref)
> >> +{
> >> +    if (ref) {
> >> +        g_hash_table_unref((GHashTable *)ref);
> >> +    }
> >> +}
> >> +
> >> +/* Find or allocate address space for a given device */
> >> +static AddressSpace *riscv_iommu_space(RISCVIOMMUState *s, uint32_t devid)
> >> +{
> >> +    RISCVIOMMUSpace *as;
> >> +
> >> +    /* FIXME: PCIe bus remapping for attached endpoints. */
> >> +    devid |= s->bus << 8;
> >> +
> >> +    qemu_mutex_lock(&s->core_lock);
> >> +    QLIST_FOREACH(as, &s->spaces, list) {
> >> +        if (as->devid == devid) {
> >> +            break;
> >> +        }
> >> +    }
> >> +    qemu_mutex_unlock(&s->core_lock);
> >> +
> >> +    if (as == NULL) {
> >> +        char name[64];
> >> +        as = g_new0(RISCVIOMMUSpace, 1);
> >> +
> >> +        as->iommu = s;
> >> +        as->devid = devid;
> >> +
> >> +        snprintf(name, sizeof(name), "riscv-iommu-%04x:%02x.%d-iova",
> >> +            PCI_BUS_NUM(as->devid), PCI_SLOT(as->devid), PCI_FUNC(as->devid));
> >> +
> >> +        /* IOVA address space, untranslated addresses */
> >> +        memory_region_init_iommu(&as->iova_mr, sizeof(as->iova_mr),
> >> +            TYPE_RISCV_IOMMU_MEMORY_REGION,
> >> +            OBJECT(as), name, UINT64_MAX);
> >> +        address_space_init(&as->iova_as, MEMORY_REGION(&as->iova_mr),
> >> +            TYPE_RISCV_IOMMU_PCI);
> >
> > Why do we use TYPE_RISCV_IOMMU_PCI as the address space name here?
> >
>
> This is an error. TYPE_RISCV_IOMMU_PCI is the name of the PCI IOMMU device.
>
> Looking at other IOMMUs in QEMU, it seems that the name of the memory region
> is a simple string, e.g. "amd_iommu", and the name of the device's address
> space is something that includes the device identification.
>
> I'll change this to something like:
>
>          snprintf(name, sizeof(name), "riscv-iommu-%04x:%02x.%d-iova",
>              PCI_BUS_NUM(as->devid), PCI_SLOT(as->devid), PCI_FUNC(as->devid));
>
>          /* IOVA address space, untranslated addresses */
>          memory_region_init_iommu(&as->iova_mr, sizeof(as->iova_mr),
>              TYPE_RISCV_IOMMU_MEMORY_REGION,
>              OBJECT(as), "riscv_iommu", UINT64_MAX);
>          address_space_init(&as->iova_as, MEMORY_REGION(&as->iova_mr),
>                             name);
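
To make the resulting address space name concrete, here is a small
standalone sketch of the formatting step. The BDF_* macros are local
stand-ins that mirror QEMU's PCI_BUS_NUM/PCI_SLOT/PCI_FUNC bit layout,
and iova_as_name() is a hypothetical helper, not something the patch
defines.

```c
#include <stdint.h>
#include <stdio.h>

/* Local stand-ins for PCI_BUS_NUM(), PCI_SLOT() and PCI_FUNC(). */
#define BDF_BUS(x)   (((x) >> 8) & 0xff)
#define BDF_SLOT(x)  (((x) >> 3) & 0x1f)
#define BDF_FUNC(x)  ((x) & 0x07)

/*
 * Format the per-device IOVA address space name using the same format
 * string as the proposal above. Returns snprintf()'s result.
 */
static int iova_as_name(char *buf, size_t len, uint32_t devid)
{
    return snprintf(buf, len, "riscv-iommu-%04x:%02x.%d-iova",
                    (unsigned)BDF_BUS(devid),
                    (unsigned)BDF_SLOT(devid),
                    (int)BDF_FUNC(devid));
}
```

For devid 0x0108 (bus 1, slot 1, function 0) this yields
"riscv-iommu-0001:01.0-iova", so each endpoint gets a distinct address
space name while the memory region keeps a fixed one.
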
>
> >> +
> >> +        qemu_mutex_lock(&s->core_lock);
>
> (...)
>
>
> >> +    }
> >> +
> >> +    return dma_memory_write(s->target_as, addr, &data, sizeof(data),
> >> +        MEMTXATTRS_UNSPECIFIED);
> >
> > We should also assert the interrupt when IOFENCE.WSI is true
> > and IOMMU is configured with wire-signaled interrupt.
>
>
> I believe that for the same reason I pointed out earlier ("this implementation does not
> support wire-signalled interrupts") we're not checking for IOFENCE.WSI here.
>
> >
> >> +}
> >> +
>
> (...)
>
> >> +
> >> +static MemTxResult riscv_iommu_mmio_write(void *opaque, hwaddr addr,
> >> +    uint64_t data, unsigned size, MemTxAttrs attrs)
> >> +{
> >> +    RISCVIOMMUState *s = opaque;
> >> +    uint32_t regb = addr & ~3;
> >> +    uint32_t busy = 0;
> >> +    uint32_t exec = 0;
> >> +
> >> +    if (size == 0 || size > 8 || (addr & (size - 1)) != 0) {
> >
> > Is it ever possible to have size = 0 or size > 8 write access?
> > This should be guarded by .valid.min_access_size and .valid.max_access_size.
>
> Yes. And on this point:
>
>
> >
>
> (...)
>
> >> +
> >> +static const MemoryRegionOps riscv_iommu_mmio_ops = {
> >> +    .read_with_attrs = riscv_iommu_mmio_read,
> >> +    .write_with_attrs = riscv_iommu_mmio_write,
> >> +    .endianness = DEVICE_NATIVE_ENDIAN,
> >> +    .impl = {
> >> +        .min_access_size = 1,
> >> +        .max_access_size = 8,
> >> +        .unaligned = false,
> >> +    },
> >> +    .valid = {
> >> +        .min_access_size = 1,
> >> +        .max_access_size = 8,
> >> +    }
> >
> > Spec says:
> > > "The IOMMU behavior for register accesses where the address is not aligned
> > > to the size of the access, or if the access spans multiple registers, or if
> > > the size of the access is not 4 bytes or 8 bytes, is UNSPECIFIED."
> >
> > Section 6.1. Reading and writing IOMMU registers also says:
> > > "Registers that are 64-bit wide may be accessed using either a 32-bit or a
> > > 64-bit access. Registers that are 32-bit wide must only be accessed using a
> > > 32-bit access."
> >
> > Should we limit the access sizes to only 4 and 8 bytes?
>
> Yes. We should set min = 4, max = 8, and use min and max to validate the
> access in riscv_iommu_mmio_write().
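
A sketch of what the tightened check could look like, assuming the
intent is "4 or 8 bytes, naturally aligned" as in the spec text quoted
above. The function name mmio_access_ok() is hypothetical; in the
patch this logic would live at the top of riscv_iommu_mmio_write().

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Accept only 4-byte and 8-byte accesses whose address is aligned to
 * the access size. Everything else is UNSPECIFIED per the spec text
 * quoted in this thread, so this sketch simply rejects it.
 */
static bool mmio_access_ok(uint64_t addr, unsigned size)
{
    if (size != 4 && size != 8) {
        return false;
    }
    return (addr & (size - 1)) == 0;
}
```

Compared with the original "size == 0 || size > 8" test, this also
rejects 1-byte and 2-byte accesses, which matches setting
.valid.min_access_size to 4.
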
>
>
> >
> >> +};
> >> +
> >> +/*
>
> (...)
>
> >> +
> >> +static const MemoryRegionOps riscv_iommu_trap_ops = {
> >> +    .read_with_attrs = riscv_iommu_trap_read,
> >> +    .write_with_attrs = riscv_iommu_trap_write,
> >> +    .endianness = DEVICE_LITTLE_ENDIAN,
> >> +    .impl = {
> >> +        .min_access_size = 1,
> >> +        .max_access_size = 8,
> >> +        .unaligned = true,
> >> +    },
> >> +    .valid = {
> >> +        .min_access_size = 1,
> >> +        .max_access_size = 8,
> >> +    }
> >> +};
>
> We'll also want to set min = 4 and max = 8 in these ops too.
>
> >> +
> >> +static void riscv_iommu_realize(DeviceState *dev, Error **errp)
>
> (...)
>
> >> +
> >> +    /* Memory region for untranslated MRIF/MSI writes */
> >> +    memory_region_init_io(&s->trap_mr, OBJECT(dev), &riscv_iommu_trap_ops, s,
> >> +            "riscv-iommu-trap", ~0ULL);
> >> +    address_space_init(&s->trap_as, &s->trap_mr, "riscv-iommu-trap-as");
> >> +
> >> +    /* Device translation context cache */
> >> +    s->ctx_cache = g_hash_table_new_full(__ctx_hash, __ctx_equal,
> >> +                                         g_free, NULL);
> >> +
> >> +    s->iommus.le_next = NULL;
> >> +    s->iommus.le_prev = NULL;
> >> +    QLIST_INIT(&s->spaces);
> >> +    qemu_cond_init(&s->core_cond);
> >> +    qemu_mutex_init(&s->core_lock);
> >> +    qemu_spin_init(&s->regs_lock);
> >> +    qemu_thread_create(&s->core_proc, "riscv-iommu-core",
> >> +        riscv_iommu_core_proc, s, QEMU_THREAD_JOINABLE);
> >
> > > In our experience, using a QEMU thread increases the latency of command
> > > queue processing, which can lead to IOMMU fence timeouts in the Linux
> > > driver when using the IOMMU with KVM, e.g. when booting the guest Linux.
> > >
> > > Is it possible to remove the thread from the IOMMU, just like the ARM,
> > > AMD, and Intel IOMMU models?
>
> Interesting. We've been using this emulation internally in Ventana, with
> KVM and VFIO, and didn't experience this issue. Drew is on CC and can talk
> more about it.

We've developed an IOFENCE timeout detection mechanism in our internal
Linux driver to detect long-running IOFENCE commands on the hardware.

We hit the timeout assertion when running on QEMU, and the issue was
resolved after we removed the thread from the IOMMU model. The assertion
did not happen on our hardware.

Regards,
Frank Chang

>
> That said, I don't mind this change, assuming it's feasible to make it for this
> first version.  I'll need to check how other IOMMUs are doing it.
>
>
>
> >
> >> +}
> >> +
>
> (...)
>
> >> +
> >> +static AddressSpace *riscv_iommu_find_as(PCIBus *bus, void *opaque, int devfn)
> >> +{
> >> +    RISCVIOMMUState *s = (RISCVIOMMUState *) opaque;
> >> +    PCIDevice *pdev = pci_find_device(bus, pci_bus_num(bus), devfn);
> >> +    AddressSpace *as = NULL;
> >> +
> >> +    if (pdev && pci_is_iommu(pdev)) {
> >> +        return s->target_as;
> >> +    }
> >> +
> >> +    /* Find first registered IOMMU device */
> >> +    while (s->iommus.le_prev) {
> >> +        s = *(s->iommus.le_prev);
> >> +    }
> >> +
> >> +    /* Find first matching IOMMU */
> >> +    while (s != NULL && as == NULL) {
> >> +        as = riscv_iommu_space(s, PCI_BUILD_BDF(pci_bus_num(bus), devfn));
> >
> > For pci_bus_num(),
> > riscv_iommu_find_as() can be called at the very early stage
> > where software has no chance to enumerate the bus numbers.
>
> I'll see how other IOMMUs are handling their iommu_find_as()
>
>
> Thanks,
>
>
> Daniel
>
>
> >
> >
> >
> >
> >> +        s = s->iommus.le_next;
> >> +    }
> >> +
> >> +    return as ? as : &address_space_memory;
> >> +}
> >> +
> >> +static const PCIIOMMUOps riscv_iommu_ops = {
> >> +    .get_address_space = riscv_iommu_find_as,
> >> +};
> >> +
> >> +void riscv_iommu_pci_setup_iommu(RISCVIOMMUState *iommu, PCIBus *bus,
> >> +        Error **errp)
> >> +{
> >> +    if (bus->iommu_ops &&
> >> +        bus->iommu_ops->get_address_space == riscv_iommu_find_as) {
> >> +        /* Allow multiple IOMMUs on the same PCIe bus, link known devices */
> >> +        RISCVIOMMUState *last = (RISCVIOMMUState *)bus->iommu_opaque;
> >> +        QLIST_INSERT_AFTER(last, iommu, iommus);
> >> +    } else if (bus->iommu_ops == NULL) {
> >> +        pci_setup_iommu(bus, &riscv_iommu_ops, iommu);
> >> +    } else {
> >> +        error_setg(errp, "can't register secondary IOMMU for PCI bus #%d",
> >> +            pci_bus_num(bus));
> >> +    }
> >> +}
> >> +
> >> +static int riscv_iommu_memory_region_index(IOMMUMemoryRegion *iommu_mr,
> >> +    MemTxAttrs attrs)
> >> +{
> >> +    return attrs.unspecified ? RISCV_IOMMU_NOPASID : (int)attrs.pasid;
> >> +}
> >> +
> >> +static int riscv_iommu_memory_region_index_len(IOMMUMemoryRegion *iommu_mr)
> >> +{
> >> +    RISCVIOMMUSpace *as = container_of(iommu_mr, RISCVIOMMUSpace, iova_mr);
> >> +    return 1 << as->iommu->pasid_bits;
> >> +}
> >> +
> >> +static void riscv_iommu_memory_region_init(ObjectClass *klass, void *data)
> >> +{
> >> +    IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_CLASS(klass);
> >> +
> >> +    imrc->translate = riscv_iommu_memory_region_translate;
> >> +    imrc->notify_flag_changed = riscv_iommu_memory_region_notify;
> >> +    imrc->attrs_to_index = riscv_iommu_memory_region_index;
> >> +    imrc->num_indexes = riscv_iommu_memory_region_index_len;
> >> +}
> >> +
> >> +static const TypeInfo riscv_iommu_memory_region_info = {
> >> +    .parent = TYPE_IOMMU_MEMORY_REGION,
> >> +    .name = TYPE_RISCV_IOMMU_MEMORY_REGION,
> >> +    .class_init = riscv_iommu_memory_region_init,
> >> +};
> >> +
> >> +static void riscv_iommu_register_mr_types(void)
> >> +{
> >> +    type_register_static(&riscv_iommu_memory_region_info);
> >> +    type_register_static(&riscv_iommu_info);
> >> +}
> >> +
> >> +type_init(riscv_iommu_register_mr_types);
> >> diff --git a/hw/riscv/riscv-iommu.h b/hw/riscv/riscv-iommu.h
> >> new file mode 100644
> >> index 0000000000..6f740de690
> >> --- /dev/null
> >> +++ b/hw/riscv/riscv-iommu.h
> >> @@ -0,0 +1,141 @@
> >> +/*
> >> + * QEMU emulation of an RISC-V IOMMU (Ziommu)
> >> + *
> >> + * Copyright (C) 2022-2023 Rivos Inc.
> >> + *
> >> + * This program is free software; you can redistribute it and/or modify
> >> + * it under the terms of the GNU General Public License as published by
> >> + * the Free Software Foundation; either version 2 of the License.
> >> + *
> >> + * This program is distributed in the hope that it will be useful,
> >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >> + * GNU General Public License for more details.
> >> + *
> >> + * You should have received a copy of the GNU General Public License along
> >> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> >> + */
> >> +
> >> +#ifndef HW_RISCV_IOMMU_STATE_H
> >> +#define HW_RISCV_IOMMU_STATE_H
> >> +
> >> +#include "qemu/osdep.h"
> >> +#include "qom/object.h"
> >> +
> >> +#include "hw/riscv/iommu.h"
> >> +
> >> +struct RISCVIOMMUState {
> >> +    /*< private >*/
> >> +    DeviceState parent_obj;
> >> +
> >> +    /*< public >*/
> >> +    uint32_t version;     /* Reported interface version number */
> >> +    uint32_t pasid_bits;  /* process identifier width */
> >> +    uint32_t bus;         /* PCI bus mapping for non-root endpoints */
> >> +
> >> +    uint64_t cap;         /* IOMMU supported capabilities */
> >> +    uint64_t fctl;        /* IOMMU enabled features */
> >> +
> >> +    bool enable_off;      /* Enable out-of-reset OFF mode (DMA disabled) */
> >> +    bool enable_msi;      /* Enable MSI remapping */
> >> +
> >> +    /* IOMMU Internal State */
> >> +    uint64_t ddtp;        /* Validated Device Directory Tree Root Pointer */
> >> +
> >> +    dma_addr_t cq_addr;   /* Command queue base physical address */
> >> +    dma_addr_t fq_addr;   /* Fault/event queue base physical address */
> >> +    dma_addr_t pq_addr;   /* Page request queue base physical address */
> >> +
> >> +    uint32_t cq_mask;     /* Command queue index bit mask */
> >> +    uint32_t fq_mask;     /* Fault/event queue index bit mask */
> >> +    uint32_t pq_mask;     /* Page request queue index bit mask */
> >> +
> >> +    /* interrupt notifier */
> >> +    void (*notify)(RISCVIOMMUState *iommu, unsigned vector);
> >> +
> >> +    /* IOMMU State Machine */
> >> +    QemuThread core_proc; /* Background processing thread */
> >> +    QemuMutex core_lock;  /* Global IOMMU lock, used for cache/regs updates */
> >> +    QemuCond core_cond;   /* Background processing wake up signal */
> >> +    unsigned core_exec;   /* Processing thread execution actions */
> >> +
> >> +    /* IOMMU target address space */
> >> +    AddressSpace *target_as;
> >> +    MemoryRegion *target_mr;
> >> +
> >> +    /* MSI / MRIF access trap */
> >> +    AddressSpace trap_as;
> >> +    MemoryRegion trap_mr;
> >> +
> >> +    GHashTable *ctx_cache;          /* Device translation Context Cache */
> >> +
> >> +    /* MMIO Hardware Interface */
> >> +    MemoryRegion regs_mr;
> >> +    QemuSpin regs_lock;
> >> +    uint8_t *regs_rw;  /* register state (user write) */
> >> +    uint8_t *regs_wc;  /* write-1-to-clear mask */
> >> +    uint8_t *regs_ro;  /* read-only mask */
> >> +
> >> +    QLIST_ENTRY(RISCVIOMMUState) iommus;
> >> +    QLIST_HEAD(, RISCVIOMMUSpace) spaces;
> >> +};
> >> +
> >> +void riscv_iommu_pci_setup_iommu(RISCVIOMMUState *iommu, PCIBus *bus,
> >> +         Error **errp);
> >> +
> >> +/* private helpers */
> >> +
> >> +/* Register helper functions */
> >> +static inline uint32_t riscv_iommu_reg_mod32(RISCVIOMMUState *s,
> >> +    unsigned idx, uint32_t set, uint32_t clr)
> >> +{
> >> +    uint32_t val;
> >> +    qemu_spin_lock(&s->regs_lock);
> >> +    val = ldl_le_p(s->regs_rw + idx);
> >> +    stl_le_p(s->regs_rw + idx, (val & ~clr) | set);
> >> +    qemu_spin_unlock(&s->regs_lock);
> >> +    return val;
> >> +}
> >> +
> >> +static inline void riscv_iommu_reg_set32(RISCVIOMMUState *s,
> >> +    unsigned idx, uint32_t set)
> >> +{
> >> +    qemu_spin_lock(&s->regs_lock);
> >> +    stl_le_p(s->regs_rw + idx, set);
> >> +    qemu_spin_unlock(&s->regs_lock);
> >> +}
> >> +
> >> +static inline uint32_t riscv_iommu_reg_get32(RISCVIOMMUState *s,
> >> +    unsigned idx)
> >> +{
> >> +    return ldl_le_p(s->regs_rw + idx);
> >> +}
> >> +
> >> +static inline uint64_t riscv_iommu_reg_mod64(RISCVIOMMUState *s,
> >> +    unsigned idx, uint64_t set, uint64_t clr)
> >> +{
> >> +    uint64_t val;
> >> +    qemu_spin_lock(&s->regs_lock);
> >> +    val = ldq_le_p(s->regs_rw + idx);
> >> +    stq_le_p(s->regs_rw + idx, (val & ~clr) | set);
> >> +    qemu_spin_unlock(&s->regs_lock);
> >> +    return val;
> >> +}
> >> +
> >> +static inline void riscv_iommu_reg_set64(RISCVIOMMUState *s,
> >> +    unsigned idx, uint64_t set)
> >> +{
> >> +    qemu_spin_lock(&s->regs_lock);
> >> +    stq_le_p(s->regs_rw + idx, set);
> >> +    qemu_spin_unlock(&s->regs_lock);
> >> +}
> >> +
> >> +static inline uint64_t riscv_iommu_reg_get64(RISCVIOMMUState *s,
> >> +    unsigned idx)
> >> +{
> >> +    return ldq_le_p(s->regs_rw + idx);
> >> +}
> >> +
> >> +
> >> +
> >> +#endif
> >> diff --git a/hw/riscv/trace-events b/hw/riscv/trace-events
> >> new file mode 100644
> >> index 0000000000..42a97caffa
> >> --- /dev/null
> >> +++ b/hw/riscv/trace-events
> >> @@ -0,0 +1,11 @@
> >> +# See documentation at docs/devel/tracing.rst
> >> +
> >> +# riscv-iommu.c
> >> +riscv_iommu_new(const char *id, unsigned b, unsigned d, unsigned f) "%s: device attached %04x:%02x.%d"
> >> +riscv_iommu_flt(const char *id, unsigned b, unsigned d, unsigned f, uint64_t reason, uint64_t iova) "%s: fault %04x:%02x.%u reason: 0x%"PRIx64" iova: 0x%"PRIx64
> >> +riscv_iommu_pri(const char *id, unsigned b, unsigned d, unsigned f, uint64_t iova) "%s: page request %04x:%02x.%u iova: 0x%"PRIx64
> >> +riscv_iommu_dma(const char *id, unsigned b, unsigned d, unsigned f, unsigned pasid, const char *dir, uint64_t iova, uint64_t phys) "%s: translate %04x:%02x.%u #%u %s 0x%"PRIx64" -> 0x%"PRIx64
> >> +riscv_iommu_msi(const char *id, unsigned b, unsigned d, unsigned f, uint64_t iova, uint64_t phys) "%s: translate %04x:%02x.%u MSI 0x%"PRIx64" -> 0x%"PRIx64
> >> +riscv_iommu_cmd(const char *id, uint64_t l, uint64_t u) "%s: command 0x%"PRIx64" 0x%"PRIx64
> >> +riscv_iommu_notifier_add(const char *id) "%s: dev-iotlb notifier added"
> >> +riscv_iommu_notifier_del(const char *id) "%s: dev-iotlb notifier removed"
> >> diff --git a/hw/riscv/trace.h b/hw/riscv/trace.h
> >> new file mode 100644
> >> index 0000000000..b88504b750
> >> --- /dev/null
> >> +++ b/hw/riscv/trace.h
> >> @@ -0,0 +1,2 @@
> >> +#include "trace/trace-hw_riscv.h"
> >> +
> >> diff --git a/include/hw/riscv/iommu.h b/include/hw/riscv/iommu.h
> >> new file mode 100644
> >> index 0000000000..403b365893
> >> --- /dev/null
> >> +++ b/include/hw/riscv/iommu.h
> >> @@ -0,0 +1,36 @@
> >> +/*
> >> + * QEMU emulation of an RISC-V IOMMU (Ziommu)
> >> + *
> >> + * Copyright (C) 2022-2023 Rivos Inc.
> >> + *
> >> + * This program is free software; you can redistribute it and/or modify
> >> + * it under the terms of the GNU General Public License as published by
> >> + * the Free Software Foundation; either version 2 of the License.
> >> + *
> >> + * This program is distributed in the hope that it will be useful,
> >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >> + * GNU General Public License for more details.
> >> + *
> >> + * You should have received a copy of the GNU General Public License along
> >> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> >> + */
> >> +
> >> +#ifndef HW_RISCV_IOMMU_H
> >> +#define HW_RISCV_IOMMU_H
> >> +
> >> +#include "qemu/osdep.h"
> >> +#include "qom/object.h"
> >> +
> >> +#define TYPE_RISCV_IOMMU "riscv-iommu"
> >> +OBJECT_DECLARE_SIMPLE_TYPE(RISCVIOMMUState, RISCV_IOMMU)
> >> +typedef struct RISCVIOMMUState RISCVIOMMUState;
> >> +
> >> +#define TYPE_RISCV_IOMMU_MEMORY_REGION "riscv-iommu-mr"
> >> +typedef struct RISCVIOMMUSpace RISCVIOMMUSpace;
> >> +
> >> +#define TYPE_RISCV_IOMMU_PCI "riscv-iommu-pci"
> >> +OBJECT_DECLARE_SIMPLE_TYPE(RISCVIOMMUStatePci, RISCV_IOMMU_PCI)
> >> +typedef struct RISCVIOMMUStatePci RISCVIOMMUStatePci;
> >> +
> >> +#endif
> >> diff --git a/meson.build b/meson.build
> >> index c59ca496f2..75e56f3282 100644
> >> --- a/meson.build
> >> +++ b/meson.build
> >> @@ -3361,6 +3361,7 @@ if have_system
> >>       'hw/rdma',
> >>       'hw/rdma/vmw',
> >>       'hw/rtc',
> >> +    'hw/riscv',
> >>       'hw/s390x',
> >>       'hw/scsi',
> >>       'hw/sd',
> >> --
> >> 2.43.2
> >>
> >>
>


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 11/15] hw/riscv/riscv-iommu: add DBG support
  2024-05-06 13:05     ` Daniel Henrique Barboza
@ 2024-05-10 10:59       ` Frank Chang
  0 siblings, 0 replies; 55+ messages in thread
From: Frank Chang @ 2024-05-10 10:59 UTC (permalink / raw)
  To: Daniel Henrique Barboza
  Cc: Frank Chang, qemu-devel, qemu-riscv, alistair.francis, bmeng,
	liwei1518, zhiwei_liu, palmer, ajones, tjeznach

Hi Daniel,

Daniel Henrique Barboza <dbarboza@ventanamicro.com> wrote on Mon, May 6, 2024 at 9:06 PM:
>
> Hi Frank,
>
> On 5/6/24 01:09, Frank Chang wrote:
> > Hi Daniel,
> >
> > Daniel Henrique Barboza <dbarboza@ventanamicro.com> wrote on Fri, Mar 8, 2024 at 12:05 AM:
> >>
> >> From: Tomasz Jeznach <tjeznach@rivosinc.com>
> >>
> >> DBG support adds three additional registers: tr_req_iova, tr_req_ctl and
> >> tr_response.
> >>
> >> The DBG cap is always enabled. No on/off toggle is provided for it.
> >>
> >> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
> >> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
> >> ---
> >>   hw/riscv/riscv-iommu-bits.h | 20 +++++++++++++
> >>   hw/riscv/riscv-iommu.c      | 57 ++++++++++++++++++++++++++++++++++++-
> >>   2 files changed, 76 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/hw/riscv/riscv-iommu-bits.h b/hw/riscv/riscv-iommu-bits.h
> >> index 0994f5ce48..b3f92411bb 100644
> >> --- a/hw/riscv/riscv-iommu-bits.h
> >> +++ b/hw/riscv/riscv-iommu-bits.h
> >> @@ -83,6 +83,7 @@ struct riscv_iommu_pq_record {
> >>   #define RISCV_IOMMU_CAP_MSI_MRIF        BIT_ULL(23)
> >>   #define RISCV_IOMMU_CAP_ATS             BIT_ULL(25)
> >>   #define RISCV_IOMMU_CAP_IGS             GENMASK_ULL(29, 28)
> >> +#define RISCV_IOMMU_CAP_DBG             BIT_ULL(31)
> >>   #define RISCV_IOMMU_CAP_PAS             GENMASK_ULL(37, 32)
> >>   #define RISCV_IOMMU_CAP_PD8             BIT_ULL(38)
> >>
> >> @@ -177,6 +178,25 @@ enum {
> >>       RISCV_IOMMU_INTR_COUNT
> >>   };
> >>
> >> +#define RISCV_IOMMU_IPSR_CIP            BIT(RISCV_IOMMU_INTR_CQ)
> >> +#define RISCV_IOMMU_IPSR_FIP            BIT(RISCV_IOMMU_INTR_FQ)
> >> +#define RISCV_IOMMU_IPSR_PMIP           BIT(RISCV_IOMMU_INTR_PM)
> >> +#define RISCV_IOMMU_IPSR_PIP            BIT(RISCV_IOMMU_INTR_PQ)
> >
> > These are not related to the DBG.
> >
> >> +
> >> +/* 5.24 Translation request IOVA (64bits) */
> >> +#define RISCV_IOMMU_REG_TR_REQ_IOVA     0x0258
> >> +
> >> +/* 5.25 Translation request control (64bits) */
> >> +#define RISCV_IOMMU_REG_TR_REQ_CTL      0x0260
> >> +#define RISCV_IOMMU_TR_REQ_CTL_GO_BUSY  BIT_ULL(0)
> >> +#define RISCV_IOMMU_TR_REQ_CTL_PID      GENMASK_ULL(31, 12)
> >> +#define RISCV_IOMMU_TR_REQ_CTL_DID      GENMASK_ULL(63, 40)
> >> +
> >> +/* 5.26 Translation request response (64bits) */
> >> +#define RISCV_IOMMU_REG_TR_RESPONSE     0x0268
> >> +#define RISCV_IOMMU_TR_RESPONSE_FAULT   BIT_ULL(0)
> >> +#define RISCV_IOMMU_TR_RESPONSE_PPN     RISCV_IOMMU_PPN_FIELD
> >> +
> >>   /* 5.27 Interrupt cause to vector (64bits) */
> >>   #define RISCV_IOMMU_REG_IVEC            0x02F8
> >>
> >> diff --git a/hw/riscv/riscv-iommu.c b/hw/riscv/riscv-iommu.c
> >> index 7af5929b10..1fa1286d07 100644
> >> --- a/hw/riscv/riscv-iommu.c
> >> +++ b/hw/riscv/riscv-iommu.c
> >> @@ -1457,6 +1457,46 @@ static void riscv_iommu_process_pq_control(RISCVIOMMUState *s)
> >>       riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_PQCSR, ctrl_set, ctrl_clr);
> >>   }
> >>
> >> +static void riscv_iommu_process_dbg(RISCVIOMMUState *s)
> >> +{
> >> +    uint64_t iova = riscv_iommu_reg_get64(s, RISCV_IOMMU_REG_TR_REQ_IOVA);
> >> +    uint64_t ctrl = riscv_iommu_reg_get64(s, RISCV_IOMMU_REG_TR_REQ_CTL);
> >> +    unsigned devid = get_field(ctrl, RISCV_IOMMU_TR_REQ_CTL_DID);
> >> +    unsigned pid = get_field(ctrl, RISCV_IOMMU_TR_REQ_CTL_PID);
> >> +    RISCVIOMMUContext *ctx;
> >> +    void *ref;
> >> +
> >> +    if (!(ctrl & RISCV_IOMMU_TR_REQ_CTL_GO_BUSY)) {
> >> +        return;
> >> +    }
> >> +
> >> +    ctx = riscv_iommu_ctx(s, devid, pid, &ref);
> >> +    if (ctx == NULL) {
> >> +        riscv_iommu_reg_set64(s, RISCV_IOMMU_REG_TR_RESPONSE,
> >> +                                 RISCV_IOMMU_TR_RESPONSE_FAULT |
> >> +                                 (RISCV_IOMMU_FQ_CAUSE_DMA_DISABLED << 10));
> >> +    } else {
> >> +        IOMMUTLBEntry iotlb = {
> >> +            .iova = iova,
> >> +            .perm = IOMMU_NONE,
> >
> > .perm should honor tr_req_ctl.[Exe|Nw]
> >
> >> +            .addr_mask = ~0,
> >> +            .target_as = NULL,
> >> +        };
> >> +        int fault = riscv_iommu_translate(s, ctx, &iotlb, false);
> >> +        if (fault) {
> >> +            iova = RISCV_IOMMU_TR_RESPONSE_FAULT | (((uint64_t) fault) << 10);
> >> +        } else {
> >> +            iova = ((iotlb.translated_addr & ~iotlb.addr_mask) >> 2) &
> >
> > For 4-KB page, we should right-shift 12 bits.
> >
> >> +                RISCV_IOMMU_TR_RESPONSE_PPN;
> >
> > It's possible that the translation is not 4-KB page (i.e. superpage),
> > which we should set tr_response.S
> > and encode translation range size in tr_response.PPN.
>
> At this moment this emulation doesn't support superpages, at least from my
> understanding. Tomasz is welcome to correct me if I'm wrong. I'll explicitly
> set tr_response.S to 0 here to make it clearer.
>
> The idea here IIUC is to, in the future, merge the IOMMU translation lookup code
> with the existing lookup code we have (cpu_helper.c, get_physical_address()), and
> with that the IOMMU will end up supporting both super-pages and svnapot.

I see, thanks for the explanation.

Regards,
Frank Chang
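
For readers following the shift discussion above, here is a small sketch of
the arithmetic, not taken from the patch. It assumes a 4-KB page (addr_mask
of 0xFFF) and that the PPN field of tr_response occupies bits 53:10, as
RISCV_IOMMU_PPN_FIELD is defined in the series. The two helper names are
hypothetical. Under those assumptions, shifting right by 12 to get the PPN
and then left by 10 to place it yields the same value as the single ">> 2"
in the quoted code. Superpages and tr_response.S are not covered here.

```c
#include <assert.h>
#include <stdint.h>

#define GENMASK_ULL(h, l) (((~0ULL) >> (63 - (h) + (l))) << (l))
/* Assumed to match RISCV_IOMMU_PPN_FIELD from the series: bits 53:10. */
#define TR_RESPONSE_PPN   GENMASK_ULL(53, 10)

/* Hypothetical helper: derive the PPN (>> 12), then place it at bit 10. */
static uint64_t tr_response_explicit(uint64_t pa)
{
    uint64_t ppn = pa >> 12;
    return (ppn << 10) & TR_RESPONSE_PPN;
}

/* Hypothetical helper: drop the page offset, then use a single >> 2. */
static uint64_t tr_response_folded(uint64_t pa)
{
    return ((pa & ~0xFFFULL) >> 2) & TR_RESPONSE_PPN;
}

/* Both forms agree for page-aligned and unaligned physical addresses. */
static void tr_response_selfcheck(void)
{
    assert(tr_response_explicit(0x80201000ULL) ==
           tr_response_folded(0x80201000ULL));
    assert(tr_response_explicit(0x80201ABCULL) ==
           tr_response_folded(0x80201ABCULL));
}
```

The folded form only works because the page offset is masked off first;
without that mask the low address bits would leak into bits 9:0 before the
final AND removes them, so the mask is redundant for the PPN bits but keeps
the intent visible.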

>
>
>
> Thanks,
>
> Daniel
>
>
> >
> > Regards,
> > Frank Chang
> >
> >> +        }
> >> +        riscv_iommu_reg_set64(s, RISCV_IOMMU_REG_TR_RESPONSE, iova);
> >> +    }
> >> +
> >> +    riscv_iommu_reg_mod64(s, RISCV_IOMMU_REG_TR_REQ_CTL, 0,
> >> +        RISCV_IOMMU_TR_REQ_CTL_GO_BUSY);
> >> +    riscv_iommu_ctx_put(s, ref);
> >> +}
> >> +
> >>   /* Core IOMMU execution activation */
> >>   enum {
> >>       RISCV_IOMMU_EXEC_DDTP,
> >> @@ -1502,7 +1542,7 @@ static void *riscv_iommu_core_proc(void* arg)
> >>               /* NOP */
> >>               break;
> >>           case BIT(RISCV_IOMMU_EXEC_TR_REQUEST):
> >> -            /* DBG support not implemented yet */
> >> +            riscv_iommu_process_dbg(s);
> >>               break;
> >>           }
> >>           exec &= ~mask;
> >> @@ -1574,6 +1614,12 @@ static MemTxResult riscv_iommu_mmio_write(void *opaque, hwaddr addr,
> >>           exec = BIT(RISCV_IOMMU_EXEC_PQCSR);
> >>           busy = RISCV_IOMMU_PQCSR_BUSY;
> >>           break;
> >> +
> >> +    case RISCV_IOMMU_REG_TR_REQ_CTL:
> >> +        exec = BIT(RISCV_IOMMU_EXEC_TR_REQUEST);
> >> +        regb = RISCV_IOMMU_REG_TR_REQ_CTL;
> >> +        busy = RISCV_IOMMU_TR_REQ_CTL_GO_BUSY;
> >> +        break;
> >>       }
> >>
> >>       /*
> >> @@ -1746,6 +1792,9 @@ static void riscv_iommu_realize(DeviceState *dev, Error **errp)
> >>           s->cap |= RISCV_IOMMU_CAP_SV32X4 | RISCV_IOMMU_CAP_SV39X4 |
> >>                     RISCV_IOMMU_CAP_SV48X4 | RISCV_IOMMU_CAP_SV57X4;
> >>       }
> >> +    /* Enable translation debug interface */
> >> +    s->cap |= RISCV_IOMMU_CAP_DBG;
> >> +
> >>       /* Report QEMU target physical address space limits */
> >>       s->cap = set_field(s->cap, RISCV_IOMMU_CAP_PAS,
> >>                          TARGET_PHYS_ADDR_SPACE_BITS);
> >> @@ -1800,6 +1849,12 @@ static void riscv_iommu_realize(DeviceState *dev, Error **errp)
> >>       stl_le_p(&s->regs_wc[RISCV_IOMMU_REG_IPSR], ~0);
> >>       stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_IVEC], 0);
> >>       stq_le_p(&s->regs_rw[RISCV_IOMMU_REG_DDTP], s->ddtp);
> >> +    /* If debug registers enabled. */
> >> +    if (s->cap & RISCV_IOMMU_CAP_DBG) {
> >> +        stq_le_p(&s->regs_ro[RISCV_IOMMU_REG_TR_REQ_IOVA], 0);
> >> +        stq_le_p(&s->regs_ro[RISCV_IOMMU_REG_TR_REQ_CTL],
> >> +            RISCV_IOMMU_TR_REQ_CTL_GO_BUSY);
> >> +    }
> >>
> >>       /* Memory region for downstream access, if specified. */
> >>       if (s->target_mr) {
> >> --
> >> 2.43.2
> >>
> >>
>


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 02/15] hw/riscv: add riscv-iommu-bits.h
  2024-03-07 16:03 ` [PATCH v2 02/15] hw/riscv: add riscv-iommu-bits.h Daniel Henrique Barboza
@ 2024-05-10 11:01   ` Frank Chang
  2024-05-15 10:02   ` Eric Cheng
  1 sibling, 0 replies; 55+ messages in thread
From: Frank Chang @ 2024-05-10 11:01 UTC (permalink / raw)
  To: Daniel Henrique Barboza
  Cc: qemu-devel, qemu-riscv, alistair.francis, bmeng, liwei1518,
	zhiwei_liu, palmer, ajones, tjeznach

Reviewed-by: Frank Chang <frank.chang@sifive.com>

Daniel Henrique Barboza <dbarboza@ventanamicro.com> wrote on Fri, Mar 8, 2024 at 12:07 AM:
>
> From: Tomasz Jeznach <tjeznach@rivosinc.com>
>
> This header will be used by the RISC-V IOMMU emulation to be added
> in the next patch. Due to its size it's being sent separately for
> easier review.
>
> One thing to notice is that this header can be replaced by the future
> Linux RISC-V IOMMU driver header, which would become a linux-header we
> would import instead of keeping our own. The Linux implementation isn't
> upstream yet so for now we'll have to manage riscv-iommu-bits.h.
>
> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
> ---
>  hw/riscv/riscv-iommu-bits.h | 335 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 335 insertions(+)
>  create mode 100644 hw/riscv/riscv-iommu-bits.h
>
> diff --git a/hw/riscv/riscv-iommu-bits.h b/hw/riscv/riscv-iommu-bits.h
> new file mode 100644
> index 0000000000..8e80b1e52a
> --- /dev/null
> +++ b/hw/riscv/riscv-iommu-bits.h
> @@ -0,0 +1,335 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright © 2022-2023 Rivos Inc.
> + * Copyright © 2023 FORTH-ICS/CARV
> + * Copyright © 2023 RISC-V IOMMU Task Group
> + *
> + * RISC-V Ziommu - Register Layout and Data Structures.
> + *
> + * Based on the IOMMU spec version 1.0, 3/2023
> + * https://github.com/riscv-non-isa/riscv-iommu
> + */
> +
> +#ifndef HW_RISCV_IOMMU_BITS_H
> +#define HW_RISCV_IOMMU_BITS_H
> +
> +#include "qemu/osdep.h"
> +
> +#define RISCV_IOMMU_SPEC_DOT_VER 0x010
> +
> +#ifndef GENMASK_ULL
> +#define GENMASK_ULL(h, l) (((~0ULL) >> (63 - (h) + (l))) << (l))
> +#endif
> +
> +/*
> + * struct riscv_iommu_fq_record - Fault/Event Queue Record
> + * See section 3.2 for more info.
> + */
> +struct riscv_iommu_fq_record {
> +    uint64_t hdr;
> +    uint64_t _reserved;
> +    uint64_t iotval;
> +    uint64_t iotval2;
> +};
> +/* Header fields */
> +#define RISCV_IOMMU_FQ_HDR_CAUSE        GENMASK_ULL(11, 0)
> +#define RISCV_IOMMU_FQ_HDR_PID          GENMASK_ULL(31, 12)
> +#define RISCV_IOMMU_FQ_HDR_PV           BIT_ULL(32)
> +#define RISCV_IOMMU_FQ_HDR_TTYPE        GENMASK_ULL(39, 34)
> +#define RISCV_IOMMU_FQ_HDR_DID          GENMASK_ULL(63, 40)
> +
> +/*
> + * struct riscv_iommu_pq_record - PCIe Page Request record
> + * For more infos on the PCIe Page Request queue see chapter 3.3.
> + */
> +struct riscv_iommu_pq_record {
> +      uint64_t hdr;
> +      uint64_t payload;
> +};
> +/* Header fields */
> +#define RISCV_IOMMU_PREQ_HDR_PID        GENMASK_ULL(31, 12)
> +#define RISCV_IOMMU_PREQ_HDR_PV         BIT_ULL(32)
> +#define RISCV_IOMMU_PREQ_HDR_PRIV       BIT_ULL(33)
> +#define RISCV_IOMMU_PREQ_HDR_EXEC       BIT_ULL(34)
> +#define RISCV_IOMMU_PREQ_HDR_DID        GENMASK_ULL(63, 40)
> +/* Payload fields */
> +#define RISCV_IOMMU_PREQ_PAYLOAD_M      GENMASK_ULL(2, 0)
> +
> +/* Common field positions */
> +#define RISCV_IOMMU_PPN_FIELD           GENMASK_ULL(53, 10)
> +#define RISCV_IOMMU_QUEUE_LOGSZ_FIELD   GENMASK_ULL(4, 0)
> +#define RISCV_IOMMU_QUEUE_INDEX_FIELD   GENMASK_ULL(31, 0)
> +#define RISCV_IOMMU_QUEUE_ENABLE        BIT(0)
> +#define RISCV_IOMMU_QUEUE_INTR_ENABLE   BIT(1)
> +#define RISCV_IOMMU_QUEUE_MEM_FAULT     BIT(8)
> +#define RISCV_IOMMU_QUEUE_OVERFLOW      BIT(9)
> +#define RISCV_IOMMU_QUEUE_ACTIVE        BIT(16)
> +#define RISCV_IOMMU_QUEUE_BUSY          BIT(17)
> +#define RISCV_IOMMU_ATP_PPN_FIELD       GENMASK_ULL(43, 0)
> +#define RISCV_IOMMU_ATP_MODE_FIELD      GENMASK_ULL(63, 60)
> +
> +/* 5.3 IOMMU Capabilities (64bits) */
> +#define RISCV_IOMMU_REG_CAP             0x0000
> +#define RISCV_IOMMU_CAP_VERSION         GENMASK_ULL(7, 0)
> +#define RISCV_IOMMU_CAP_MSI_FLAT        BIT_ULL(22)
> +#define RISCV_IOMMU_CAP_MSI_MRIF        BIT_ULL(23)
> +#define RISCV_IOMMU_CAP_IGS             GENMASK_ULL(29, 28)
> +#define RISCV_IOMMU_CAP_PAS             GENMASK_ULL(37, 32)
> +#define RISCV_IOMMU_CAP_PD8             BIT_ULL(38)
> +
> +/* 5.4 Features control register (32bits) */
> +#define RISCV_IOMMU_REG_FCTL            0x0008
> +
> +/* 5.5 Device-directory-table pointer (64bits) */
> +#define RISCV_IOMMU_REG_DDTP            0x0010
> +#define RISCV_IOMMU_DDTP_MODE           GENMASK_ULL(3, 0)
> +#define RISCV_IOMMU_DDTP_BUSY           BIT_ULL(4)
> +#define RISCV_IOMMU_DDTP_PPN            RISCV_IOMMU_PPN_FIELD
> +
> +enum riscv_iommu_ddtp_modes {
> +    RISCV_IOMMU_DDTP_MODE_OFF = 0,
> +    RISCV_IOMMU_DDTP_MODE_BARE = 1,
> +    RISCV_IOMMU_DDTP_MODE_1LVL = 2,
> +    RISCV_IOMMU_DDTP_MODE_2LVL = 3,
> +    RISCV_IOMMU_DDTP_MODE_3LVL = 4,
> +    RISCV_IOMMU_DDTP_MODE_MAX = 4
> +};
> +
> +/* 5.6 Command Queue Base (64bits) */
> +#define RISCV_IOMMU_REG_CQB             0x0018
> +#define RISCV_IOMMU_CQB_LOG2SZ          RISCV_IOMMU_QUEUE_LOGSZ_FIELD
> +#define RISCV_IOMMU_CQB_PPN             RISCV_IOMMU_PPN_FIELD
> +
> +/* 5.7 Command Queue head (32bits) */
> +#define RISCV_IOMMU_REG_CQH             0x0020
> +
> +/* 5.8 Command Queue tail (32bits) */
> +#define RISCV_IOMMU_REG_CQT             0x0024
> +
> +/* 5.9 Fault Queue Base (64bits) */
> +#define RISCV_IOMMU_REG_FQB             0x0028
> +#define RISCV_IOMMU_FQB_LOG2SZ          RISCV_IOMMU_QUEUE_LOGSZ_FIELD
> +#define RISCV_IOMMU_FQB_PPN             RISCV_IOMMU_PPN_FIELD
> +
> +/* 5.10 Fault Queue Head (32bits) */
> +#define RISCV_IOMMU_REG_FQH             0x0030
> +
> +/* 5.11 Fault Queue tail (32bits) */
> +#define RISCV_IOMMU_REG_FQT             0x0034
> +
> +/* 5.12 Page Request Queue base (64bits) */
> +#define RISCV_IOMMU_REG_PQB             0x0038
> +#define RISCV_IOMMU_PQB_LOG2SZ          RISCV_IOMMU_QUEUE_LOGSZ_FIELD
> +#define RISCV_IOMMU_PQB_PPN             RISCV_IOMMU_PPN_FIELD
> +
> +/* 5.13 Page Request Queue head (32bits) */
> +#define RISCV_IOMMU_REG_PQH             0x0040
> +
> +/* 5.14 Page Request Queue tail (32bits) */
> +#define RISCV_IOMMU_REG_PQT             0x0044
> +
> +/* 5.15 Command Queue CSR (32bits) */
> +#define RISCV_IOMMU_REG_CQCSR           0x0048
> +#define RISCV_IOMMU_CQCSR_CQEN          RISCV_IOMMU_QUEUE_ENABLE
> +#define RISCV_IOMMU_CQCSR_CIE           RISCV_IOMMU_QUEUE_INTR_ENABLE
> +#define RISCV_IOMMU_CQCSR_CQMF          RISCV_IOMMU_QUEUE_MEM_FAULT
> +#define RISCV_IOMMU_CQCSR_CMD_TO        BIT(9)
> +#define RISCV_IOMMU_CQCSR_CMD_ILL       BIT(10)
> +#define RISCV_IOMMU_CQCSR_CQON          RISCV_IOMMU_QUEUE_ACTIVE
> +#define RISCV_IOMMU_CQCSR_BUSY          RISCV_IOMMU_QUEUE_BUSY
> +
> +/* 5.16 Fault Queue CSR (32bits) */
> +#define RISCV_IOMMU_REG_FQCSR           0x004C
> +#define RISCV_IOMMU_FQCSR_FQEN          RISCV_IOMMU_QUEUE_ENABLE
> +#define RISCV_IOMMU_FQCSR_FIE           RISCV_IOMMU_QUEUE_INTR_ENABLE
> +#define RISCV_IOMMU_FQCSR_FQMF          RISCV_IOMMU_QUEUE_MEM_FAULT
> +#define RISCV_IOMMU_FQCSR_FQOF          RISCV_IOMMU_QUEUE_OVERFLOW
> +#define RISCV_IOMMU_FQCSR_FQON          RISCV_IOMMU_QUEUE_ACTIVE
> +#define RISCV_IOMMU_FQCSR_BUSY          RISCV_IOMMU_QUEUE_BUSY
> +
> +/* 5.17 Page Request Queue CSR (32bits) */
> +#define RISCV_IOMMU_REG_PQCSR           0x0050
> +#define RISCV_IOMMU_PQCSR_PQEN          RISCV_IOMMU_QUEUE_ENABLE
> +#define RISCV_IOMMU_PQCSR_PIE           RISCV_IOMMU_QUEUE_INTR_ENABLE
> +#define RISCV_IOMMU_PQCSR_PQMF          RISCV_IOMMU_QUEUE_MEM_FAULT
> +#define RISCV_IOMMU_PQCSR_PQOF          RISCV_IOMMU_QUEUE_OVERFLOW
> +#define RISCV_IOMMU_PQCSR_PQON          RISCV_IOMMU_QUEUE_ACTIVE
> +#define RISCV_IOMMU_PQCSR_BUSY          RISCV_IOMMU_QUEUE_BUSY
> +
> +/* 5.18 Interrupt Pending Status (32bits) */
> +#define RISCV_IOMMU_REG_IPSR            0x0054
> +
> +enum {
> +    RISCV_IOMMU_INTR_CQ,
> +    RISCV_IOMMU_INTR_FQ,
> +    RISCV_IOMMU_INTR_PM,
> +    RISCV_IOMMU_INTR_PQ,
> +    RISCV_IOMMU_INTR_COUNT
> +};
> +
> +/* 5.27 Interrupt cause to vector (64bits) */
> +#define RISCV_IOMMU_REG_IVEC            0x02F8
> +
> +/* 5.28 MSI Configuration table (32 * 64bits) */
> +#define RISCV_IOMMU_REG_MSI_CONFIG      0x0300
> +
> +#define RISCV_IOMMU_REG_SIZE           0x1000
> +
> +#define RISCV_IOMMU_DDTE_VALID          BIT_ULL(0)
> +#define RISCV_IOMMU_DDTE_PPN            RISCV_IOMMU_PPN_FIELD
> +
> +/* Struct riscv_iommu_dc - Device Context - section 2.1 */
> +struct riscv_iommu_dc {
> +      uint64_t tc;
> +      uint64_t iohgatp;
> +      uint64_t ta;
> +      uint64_t fsc;
> +      uint64_t msiptp;
> +      uint64_t msi_addr_mask;
> +      uint64_t msi_addr_pattern;
> +      uint64_t _reserved;
> +};
> +
> +/* Translation control fields */
> +#define RISCV_IOMMU_DC_TC_V             BIT_ULL(0)
> +#define RISCV_IOMMU_DC_TC_DTF           BIT_ULL(4)
> +#define RISCV_IOMMU_DC_TC_PDTV          BIT_ULL(5)
> +#define RISCV_IOMMU_DC_TC_PRPR          BIT_ULL(6)
> +#define RISCV_IOMMU_DC_TC_DPE           BIT_ULL(9)
> +#define RISCV_IOMMU_DC_TC_SXL           BIT_ULL(11)
> +
> +/* Second-stage (aka G-stage) context fields */
> +#define RISCV_IOMMU_DC_IOHGATP_PPN      RISCV_IOMMU_ATP_PPN_FIELD
> +#define RISCV_IOMMU_DC_IOHGATP_GSCID    GENMASK_ULL(59, 44)
> +#define RISCV_IOMMU_DC_IOHGATP_MODE     RISCV_IOMMU_ATP_MODE_FIELD
> +
> +enum riscv_iommu_dc_iohgatp_modes {
> +    RISCV_IOMMU_DC_IOHGATP_MODE_BARE = 0,
> +    RISCV_IOMMU_DC_IOHGATP_MODE_SV32X4 = 8,
> +    RISCV_IOMMU_DC_IOHGATP_MODE_SV39X4 = 8,
> +    RISCV_IOMMU_DC_IOHGATP_MODE_SV48X4 = 9,
> +    RISCV_IOMMU_DC_IOHGATP_MODE_SV57X4 = 10
> +};
> +
> +/* Translation attributes fields */
> +#define RISCV_IOMMU_DC_TA_PSCID         GENMASK_ULL(31, 12)
> +
> +/* First-stage context fields */
> +#define RISCV_IOMMU_DC_FSC_PPN          RISCV_IOMMU_ATP_PPN_FIELD
> +#define RISCV_IOMMU_DC_FSC_MODE         RISCV_IOMMU_ATP_MODE_FIELD
> +
> +/* Generic I/O MMU command structure - check section 3.1 */
> +struct riscv_iommu_command {
> +    uint64_t dword0;
> +    uint64_t dword1;
> +};
> +
> +#define RISCV_IOMMU_CMD_OPCODE          GENMASK_ULL(6, 0)
> +#define RISCV_IOMMU_CMD_FUNC            GENMASK_ULL(9, 7)
> +
> +#define RISCV_IOMMU_CMD_IOTINVAL_OPCODE         1
> +#define RISCV_IOMMU_CMD_IOTINVAL_FUNC_VMA       0
> +#define RISCV_IOMMU_CMD_IOTINVAL_FUNC_GVMA      1
> +#define RISCV_IOMMU_CMD_IOTINVAL_AV     BIT_ULL(10)
> +#define RISCV_IOMMU_CMD_IOTINVAL_PSCID  GENMASK_ULL(31, 12)
> +#define RISCV_IOMMU_CMD_IOTINVAL_PSCV   BIT_ULL(32)
> +#define RISCV_IOMMU_CMD_IOTINVAL_GV     BIT_ULL(33)
> +#define RISCV_IOMMU_CMD_IOTINVAL_GSCID  GENMASK_ULL(59, 44)
> +
> +#define RISCV_IOMMU_CMD_IOFENCE_OPCODE          2
> +#define RISCV_IOMMU_CMD_IOFENCE_FUNC_C          0
> +#define RISCV_IOMMU_CMD_IOFENCE_AV      BIT_ULL(10)
> +#define RISCV_IOMMU_CMD_IOFENCE_DATA    GENMASK_ULL(63, 32)
> +
> +#define RISCV_IOMMU_CMD_IODIR_OPCODE            3
> +#define RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_DDT    0
> +#define RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_PDT    1
> +#define RISCV_IOMMU_CMD_IODIR_PID       GENMASK_ULL(31, 12)
> +#define RISCV_IOMMU_CMD_IODIR_DV        BIT_ULL(33)
> +#define RISCV_IOMMU_CMD_IODIR_DID       GENMASK_ULL(63, 40)
> +
> +enum riscv_iommu_dc_fsc_atp_modes {
> +    RISCV_IOMMU_DC_FSC_MODE_BARE = 0,
> +    RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV32 = 8,
> +    RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39 = 8,
> +    RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV48 = 9,
> +    RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV57 = 10,
> +    RISCV_IOMMU_DC_FSC_PDTP_MODE_PD8 = 1,
> +    RISCV_IOMMU_DC_FSC_PDTP_MODE_PD17 = 2,
> +    RISCV_IOMMU_DC_FSC_PDTP_MODE_PD20 = 3
> +};
> +
> +enum riscv_iommu_fq_causes {
> +    RISCV_IOMMU_FQ_CAUSE_INST_FAULT           = 1,
> +    RISCV_IOMMU_FQ_CAUSE_RD_ADDR_MISALIGNED   = 4,
> +    RISCV_IOMMU_FQ_CAUSE_RD_FAULT             = 5,
> +    RISCV_IOMMU_FQ_CAUSE_WR_ADDR_MISALIGNED   = 6,
> +    RISCV_IOMMU_FQ_CAUSE_WR_FAULT             = 7,
> +    RISCV_IOMMU_FQ_CAUSE_INST_FAULT_S         = 12,
> +    RISCV_IOMMU_FQ_CAUSE_RD_FAULT_S           = 13,
> +    RISCV_IOMMU_FQ_CAUSE_WR_FAULT_S           = 15,
> +    RISCV_IOMMU_FQ_CAUSE_INST_FAULT_VS        = 20,
> +    RISCV_IOMMU_FQ_CAUSE_RD_FAULT_VS          = 21,
> +    RISCV_IOMMU_FQ_CAUSE_WR_FAULT_VS          = 23,
> +    RISCV_IOMMU_FQ_CAUSE_DMA_DISABLED         = 256,
> +    RISCV_IOMMU_FQ_CAUSE_DDT_LOAD_FAULT       = 257,
> +    RISCV_IOMMU_FQ_CAUSE_DDT_INVALID          = 258,
> +    RISCV_IOMMU_FQ_CAUSE_DDT_MISCONFIGURED    = 259,
> +    RISCV_IOMMU_FQ_CAUSE_TTYPE_BLOCKED        = 260,
> +    RISCV_IOMMU_FQ_CAUSE_MSI_LOAD_FAULT       = 261,
> +    RISCV_IOMMU_FQ_CAUSE_MSI_INVALID          = 262,
> +    RISCV_IOMMU_FQ_CAUSE_MSI_MISCONFIGURED    = 263,
> +    RISCV_IOMMU_FQ_CAUSE_MRIF_FAULT           = 264,
> +    RISCV_IOMMU_FQ_CAUSE_PDT_LOAD_FAULT       = 265,
> +    RISCV_IOMMU_FQ_CAUSE_PDT_INVALID          = 266,
> +    RISCV_IOMMU_FQ_CAUSE_PDT_MISCONFIGURED    = 267,
> +    RISCV_IOMMU_FQ_CAUSE_DDT_CORRUPTED        = 268,
> +    RISCV_IOMMU_FQ_CAUSE_PDT_CORRUPTED        = 269,
> +    RISCV_IOMMU_FQ_CAUSE_MSI_PT_CORRUPTED     = 270,
> +    RISCV_IOMMU_FQ_CAUSE_MRIF_CORRUIPTED      = 271,
> +    RISCV_IOMMU_FQ_CAUSE_INTERNAL_DP_ERROR    = 272,
> +    RISCV_IOMMU_FQ_CAUSE_MSI_WR_FAULT         = 273,
> +    RISCV_IOMMU_FQ_CAUSE_PT_CORRUPTED         = 274
> +};
> +
> +/* MSI page table pointer */
> +#define RISCV_IOMMU_DC_MSIPTP_PPN       RISCV_IOMMU_ATP_PPN_FIELD
> +#define RISCV_IOMMU_DC_MSIPTP_MODE      RISCV_IOMMU_ATP_MODE_FIELD
> +#define RISCV_IOMMU_DC_MSIPTP_MODE_FLAT 1
> +
> +/* Translation attributes fields */
> +#define RISCV_IOMMU_PC_TA_V             BIT_ULL(0)
> +
> +/* First stage context fields */
> +#define RISCV_IOMMU_PC_FSC_PPN          GENMASK_ULL(43, 0)
> +
> +enum riscv_iommu_fq_ttypes {
> +    RISCV_IOMMU_FQ_TTYPE_NONE = 0,
> +    RISCV_IOMMU_FQ_TTYPE_UADDR_INST_FETCH = 1,
> +    RISCV_IOMMU_FQ_TTYPE_UADDR_RD = 2,
> +    RISCV_IOMMU_FQ_TTYPE_UADDR_WR = 3,
> +    RISCV_IOMMU_FQ_TTYPE_TADDR_INST_FETCH = 5,
> +    RISCV_IOMMU_FQ_TTYPE_TADDR_RD = 6,
> +    RISCV_IOMMU_FQ_TTYPE_TADDR_WR = 7,
> +    RISCV_IOMMU_FW_TTYPE_PCIE_MSG_REQ = 8,
> +};
> +
> +/* Fields on pte */
> +#define RISCV_IOMMU_MSI_PTE_V           BIT_ULL(0)
> +#define RISCV_IOMMU_MSI_PTE_M           GENMASK_ULL(2, 1)
> +
> +#define RISCV_IOMMU_MSI_PTE_M_MRIF      1
> +#define RISCV_IOMMU_MSI_PTE_M_BASIC     3
> +
> +/* When M == 1 (MRIF mode) */
> +#define RISCV_IOMMU_MSI_PTE_MRIF_ADDR   GENMASK_ULL(53, 7)
> +/* When M == 3 (basic mode) */
> +#define RISCV_IOMMU_MSI_PTE_PPN         RISCV_IOMMU_PPN_FIELD
> +#define RISCV_IOMMU_MSI_PTE_C           BIT_ULL(63)
> +
> +/* Fields on mrif_info */
> +#define RISCV_IOMMU_MSI_MRIF_NID        GENMASK_ULL(9, 0)
> +#define RISCV_IOMMU_MSI_MRIF_NPPN       RISCV_IOMMU_PPN_FIELD
> +#define RISCV_IOMMU_MSI_MRIF_NID_MSB    BIT_ULL(60)
> +
> +#endif /* _RISCV_IOMMU_BITS_H_ */
> --
> 2.43.2
>
>
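
As a reading aid for the header above, the following sketch shows how the
GENMASK_ULL masks are typically combined with field accessors. The macro is
copied from the quoted header; field_get() and field_put() are hypothetical
names used only in this sketch (QEMU has its own get_field/set_field
helpers, which are not reproduced here). The bit ranges are the
RISCV_IOMMU_FQ_HDR_* ones from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Same definition as in the quoted header: bits h..l set, inclusive. */
#define GENMASK_ULL(h, l) (((~0ULL) >> (63 - (h) + (l))) << (l))

/* Hypothetical accessor: extract the field selected by mask. */
static inline uint64_t field_get(uint64_t reg, uint64_t mask)
{
    /* mask & ~(mask << 1) isolates the lowest set bit of the mask. */
    return (reg & mask) / (mask & ~(mask << 1));
}

/* Hypothetical accessor: replace the field selected by mask with val. */
static inline uint64_t field_put(uint64_t reg, uint64_t mask, uint64_t val)
{
    return (reg & ~mask) | ((val * (mask & ~(mask << 1))) & mask);
}

/* Build a fault-queue record header with a cause, a PID and a DID. */
static uint64_t fq_hdr_example(void)
{
    uint64_t hdr = 0;

    hdr = field_put(hdr, GENMASK_ULL(11, 0), 256);       /* CAUSE */
    hdr = field_put(hdr, GENMASK_ULL(31, 12), 0x12345);  /* PID */
    hdr = field_put(hdr, GENMASK_ULL(63, 40), 0xABCDEF); /* DID */
    assert(field_get(hdr, GENMASK_ULL(31, 12)) == 0x12345);
    return hdr;
}
```

Dividing and multiplying by the lowest set bit of the mask avoids passing a
separate shift amount, so a field is described by one constant.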


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 09/15] hw/riscv/riscv-iommu: add s-stage and g-stage support
  2024-05-10 10:36   ` Frank Chang
@ 2024-05-10 11:14     ` Andrew Jones
  2024-05-16 19:41       ` Daniel Henrique Barboza
  0 siblings, 1 reply; 55+ messages in thread
From: Andrew Jones @ 2024-05-10 11:14 UTC (permalink / raw)
  To: Frank Chang
  Cc: Daniel Henrique Barboza, qemu-devel, qemu-riscv,
	alistair.francis, bmeng, liwei1518, zhiwei_liu, palmer, tjeznach

On Fri, May 10, 2024 at 06:36:51PM GMT, Frank Chang wrote:
...
> >  static int riscv_iommu_spa_fetch(RISCVIOMMUState *s, RISCVIOMMUContext *ctx,
> > -    IOMMUTLBEntry *iotlb)
> > +    IOMMUTLBEntry *iotlb, bool gpa)
> >  {
> > +    dma_addr_t addr, base;
> > +    uint64_t satp, gatp, pte;
> > +    bool en_s, en_g;
> > +    struct {
> > +        unsigned char step;
> > +        unsigned char levels;
> > +        unsigned char ptidxbits;
> > +        unsigned char ptesize;
> > +    } sc[2];
> > +    /* Translation stage phase */
> > +    enum {
> > +        S_STAGE = 0,
> > +        G_STAGE = 1,
> > +    } pass;
> > +
> > +    satp = get_field(ctx->satp, RISCV_IOMMU_ATP_MODE_FIELD);
> > +    gatp = get_field(ctx->gatp, RISCV_IOMMU_ATP_MODE_FIELD);
> > +
> > +    en_s = satp != RISCV_IOMMU_DC_FSC_MODE_BARE && !gpa;
> > +    en_g = gatp != RISCV_IOMMU_DC_IOHGATP_MODE_BARE;
> > +
> >      /* Early check for MSI address match when IOVA == GPA */
> > -    if (iotlb->perm & IOMMU_WO &&
> > +    if (!en_s && (iotlb->perm & IOMMU_WO) &&
> 
> I'm wondering whether we need to check "en_s" for MSI writes.
> 
> IOMMU spec Section 2.3.3. Process to translate addresses of MSIs says:
> "Determine if the address A is an access to a virtual interrupt file
> as specified in Section 2.1.3.6."
> 
> and Section 2.1.3.6 says:
> 
> "An incoming memory access made by a device is recognized as
> an access to a virtual interrupt file if the destination guest physical page
> matches the supplied address pattern in all bit positions that are zeros
> in the supplied address mask. In detail, a memory access to
> guest physical address A is recognized as an access to a virtual
> interrupt file’s
> memory-mapped page if:
> (A >> 12) & ~msi_addr_mask = (msi_addr_pattern & ~msi_addr_mask)"
> 
> Is checking the address pattern sufficient enough to determine
> the address is an MSI to a virtual interrupt file?
>

I think so. In fact, I've removed that en_s check on our internal build in
order to get things working for my irqbypass work, as we can do device
assignment with VFIO with only S-stage enabled.

Thanks,
drew
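
The address-pattern test quoted from Section 2.1.3.6 can be sketched as
follows. This is only an illustration of the formula, not the code in the
series; the function name and the mask/pattern values in the self-check are
made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when guest physical address gpa is
 * recognized as a virtual interrupt file page, i.e. when
 * (A >> 12) & ~msi_addr_mask == (msi_addr_pattern & ~msi_addr_mask).
 */
static bool msi_address_matches(uint64_t gpa, uint64_t msi_addr_mask,
                                uint64_t msi_addr_pattern)
{
    uint64_t page = gpa >> 12;

    return (page & ~msi_addr_mask) == (msi_addr_pattern & ~msi_addr_mask);
}

/* Illustrative values: a mask of 0xFF lets the low 8 page bits vary. */
static void msi_match_selfcheck(void)
{
    assert(msi_address_matches(0x28042000ULL, 0xFFULL, 0x28000ULL));
    assert(!msi_address_matches(0x28142000ULL, 0xFFULL, 0x28000ULL));
}
```

Note that the check depends only on the address, the mask and the pattern,
which is consistent with the point made above that it does not need to
consider whether S-stage translation is enabled.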


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 00/15] riscv: QEMU RISC-V IOMMU Support
  2024-03-07 16:03 [PATCH v2 00/15] riscv: QEMU RISC-V IOMMU Support Daniel Henrique Barboza
                   ` (14 preceding siblings ...)
  2024-03-07 16:03 ` [PATCH v2 15/15] hw/misc: EDU: add ATS/PRI capability Daniel Henrique Barboza
@ 2024-05-10 11:14 ` Frank Chang
  2024-05-20 16:26   ` Daniel Henrique Barboza
  15 siblings, 1 reply; 55+ messages in thread
From: Frank Chang @ 2024-05-10 11:14 UTC (permalink / raw)
  To: Daniel Henrique Barboza
  Cc: qemu-devel, qemu-riscv, alistair.francis, bmeng, liwei1518,
	zhiwei_liu, palmer, ajones, tjeznach

Hi Daniel,

Thanks for the upstream work.
Sorry that it took a while for me to review the patchset.

Please let me know if you need any help from us to update the IOMMU model.
We would like to see it merged for QEMU 9.1.0.

Regards,
Frank Chang

Daniel Henrique Barboza <dbarboza@ventanamicro.com> wrote on Fri, Mar 8, 2024 at 12:04 AM:
>
> Hi,
>
> This is the second version of the work Tomasz sent in July 2023 [1].
> I'll be helping Tomasz upstreaming it.
>
> The core emulation code is left unchanged but a few tweaks were made in
> v2:
>
> - The most notable difference in this version is that the code was split
>   in smaller chunks. Patch 03 is still a 1700 lines patch, which is an
>   improvement from the 3800 lines patch from v1, but we can only go so
>   far when splitting the core components of the emulation. The reality
>   is that the IOMMU emulation is a rather complex piece of software and
>   there's not much we can do to alleviate it;
>
> - I'm not contributing the HPM support that was present in v1. It shaved
>   off 600 lines of code from the series, which is already large enough
>   as is. We'll introduce HPM in later versions or as a follow-up;
>
> - The riscv-iommu-header.h header was also trimmed. I shaved 300 lines
>   or so from it, all of them definitions that the emulation isn't
>   using. The header will eventually be imported from the Linux
>   driver (not upstream yet), so for now we can live with a trimmed
>   header for the emulation usage alone;
>
> - I added libqos tests for the riscv-iommu-pci device. The idea of these
>   tests is to give us more confidence in the emulation code;
>
> - 'edu' device support. The support was retrieved from Tomasz's EDU branch
>   [2]. This device can then be used to test PCI passthrough to exercise
>   the IOMMU.
>
>
> Patches based on alistair/riscv-to-apply.next.
>
> v1 link: https://lore.kernel.org/qemu-riscv/cover.1689819031.git.tjeznach@rivosinc.com/
>
> [1] https://lore.kernel.org/qemu-riscv/cover.1689819031.git.tjeznach@rivosinc.com/
> [2] https://github.com/tjeznach/qemu.git, branch 'riscv_iommu_edu_impl'
>
> Andrew Jones (1):
>   hw/riscv/riscv-iommu: Add another irq for mrif notifications
>
> Daniel Henrique Barboza (2):
>   test/qtest: add riscv-iommu-pci tests
>   qtest/riscv-iommu-test: add init queues test
>
> Tomasz Jeznach (12):
>   exec/memtxattr: add process identifier to the transaction attributes
>   hw/riscv: add riscv-iommu-bits.h
>   hw/riscv: add RISC-V IOMMU base emulation
>   hw/riscv: add riscv-iommu-pci device
>   hw/riscv: add riscv-iommu-sys platform device
>   hw/riscv/virt.c: support for RISC-V IOMMU PCIDevice hotplug
>   hw/riscv/riscv-iommu: add Address Translation Cache (IOATC)
>   hw/riscv/riscv-iommu: add s-stage and g-stage support
>   hw/riscv/riscv-iommu: add ATS support
>   hw/riscv/riscv-iommu: add DBG support
>   hw/misc: EDU: added PASID support
>   hw/misc: EDU: add ATS/PRI capability
>
>  hw/misc/edu.c                    |  297 ++++-
>  hw/riscv/Kconfig                 |    4 +
>  hw/riscv/meson.build             |    1 +
>  hw/riscv/riscv-iommu-bits.h      |  407 ++++++
>  hw/riscv/riscv-iommu-pci.c       |  173 +++
>  hw/riscv/riscv-iommu-sys.c       |   93 ++
>  hw/riscv/riscv-iommu.c           | 2085 ++++++++++++++++++++++++++++++
>  hw/riscv/riscv-iommu.h           |  146 +++
>  hw/riscv/trace-events            |   15 +
>  hw/riscv/trace.h                 |    2 +
>  hw/riscv/virt.c                  |   33 +-
>  include/exec/memattrs.h          |    5 +
>  include/hw/riscv/iommu.h         |   40 +
>  meson.build                      |    1 +
>  tests/qtest/libqos/meson.build   |    4 +
>  tests/qtest/libqos/riscv-iommu.c |   79 ++
>  tests/qtest/libqos/riscv-iommu.h |   96 ++
>  tests/qtest/meson.build          |    1 +
>  tests/qtest/riscv-iommu-test.c   |  234 ++++
>  19 files changed, 3704 insertions(+), 12 deletions(-)
>  create mode 100644 hw/riscv/riscv-iommu-bits.h
>  create mode 100644 hw/riscv/riscv-iommu-pci.c
>  create mode 100644 hw/riscv/riscv-iommu-sys.c
>  create mode 100644 hw/riscv/riscv-iommu.c
>  create mode 100644 hw/riscv/riscv-iommu.h
>  create mode 100644 hw/riscv/trace-events
>  create mode 100644 hw/riscv/trace.h
>  create mode 100644 include/hw/riscv/iommu.h
>  create mode 100644 tests/qtest/libqos/riscv-iommu.c
>  create mode 100644 tests/qtest/libqos/riscv-iommu.h
>  create mode 100644 tests/qtest/riscv-iommu-test.c
>
> --
> 2.43.2
>
>


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 03/15] hw/riscv: add RISC-V IOMMU base emulation
  2024-05-08 11:15     ` Daniel Henrique Barboza
  2024-05-10 10:58       ` Frank Chang
@ 2024-05-13 12:37       ` Daniel Henrique Barboza
  2024-05-16  7:13         ` Frank Chang
  1 sibling, 1 reply; 55+ messages in thread
From: Daniel Henrique Barboza @ 2024-05-13 12:37 UTC (permalink / raw)
  To: Frank Chang
  Cc: qemu-devel, qemu-riscv, alistair.francis, bmeng, liwei1518,
	zhiwei_liu, palmer, ajones, tjeznach, Sebastien Boeuf

Hi Frank,


On 5/8/24 08:15, Daniel Henrique Barboza wrote:
> Hi Frank,
> 
> I'll reply with what I've done so far. Still missing some stuff:
> 
> On 5/2/24 08:37, Frank Chang wrote:
>> Hi Daniel,
>>
>> Daniel Henrique Barboza <dbarboza@ventanamicro.com> wrote on Fri, Mar 8, 2024 at 12:04 AM:
>>>
>>> From: Tomasz Jeznach <tjeznach@rivosinc.com>
>>>
>>> The RISC-V IOMMU specification is now ratified as per the RISC-V
>>> international process. The latest frozen specification can be found
>>> at:
>>>
>>> https://github.com/riscv-non-isa/riscv-iommu/releases/download/v1.0/riscv-iommu.pdf
>>>
>>> Add the foundation of the device emulation for RISC-V IOMMU, which
>>> includes an IOMMU that has no capabilities but MSI interrupt support and
>>> fault queue interfaces. We'll add more features incrementally in the
>>> next patches.
>>>
>>> Co-developed-by: Sebastien Boeuf <seb@rivosinc.com>
>>> Signed-off-by: Sebastien Boeuf <seb@rivosinc.com>
>>> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
>>> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
>>> ---
>>>   hw/riscv/Kconfig         |    4 +

(...)

>>> +
>>> +    s->iommus.le_next = NULL;
>>> +    s->iommus.le_prev = NULL;
>>> +    QLIST_INIT(&s->spaces);
>>> +    qemu_cond_init(&s->core_cond);
>>> +    qemu_mutex_init(&s->core_lock);
>>> +    qemu_spin_init(&s->regs_lock);
>>> +    qemu_thread_create(&s->core_proc, "riscv-iommu-core",
>>> +        riscv_iommu_core_proc, s, QEMU_THREAD_JOINABLE);
>>
>> In our experience, using QEMU thread increases the latency of command
>> queue processing,
>> which leads to the potential IOMMU fence timeout in the Linux driver
>> when using IOMMU with KVM,
>> e.g. booting the guest Linux.
>>
>> Is it possible to remove the thread from the IOMMU just like ARM, AMD,
>> and Intel IOMMU models?
> 
> Interesting. We've been using this emulation internally in Ventana, with
> KVM and VFIO, and didn't experience this issue. Drew is on CC and can talk
> more about it.
> 
> That said, I don't mind this change, assuming it's feasible to make it for this
> first version.  I'll need to check how other IOMMUs are doing it.


I removed the threading and it seems to be working fine without it. I'll commit this
change for v3.
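
For reference, the non-threaded approach boils down to draining the command
queue directly from the register write path, so nothing is handed off to a
worker thread. The following is a minimal standalone sketch of that idea, not
the v3 code: the ToyIOMMU type, toy_execute(), toy_cq_tail_write() and the
ring layout are hypothetical names made up for illustration.

```c
#include <stdint.h>

#define TOY_CQ_ENTRIES 8u              /* hypothetical ring size, power of two */

typedef struct ToyIOMMU {
    uint64_t cq[TOY_CQ_ENTRIES];       /* stand-in for the in-memory command queue */
    uint32_t cq_mask;                  /* index mask, TOY_CQ_ENTRIES - 1 */
    uint32_t cq_head;                  /* next entry the IOMMU will consume */
    uint32_t cq_tail;                  /* next entry software will fill */
    unsigned processed;                /* number of commands executed so far */
} ToyIOMMU;

/* Placeholder for command execution (invalidations, fences, ...). */
static void toy_execute(ToyIOMMU *s, uint64_t cmd)
{
    (void)cmd;
    s->processed++;
}

/*
 * Called from the MMIO write path when the guest updates the tail register.
 * All pending commands are consumed before the write returns, so there is
 * no wake-up latency between the guest write and command completion.
 */
static void toy_cq_tail_write(ToyIOMMU *s, uint32_t new_tail)
{
    s->cq_tail = new_tail & s->cq_mask;
    while (s->cq_head != s->cq_tail) {
        toy_execute(s, s->cq[s->cq_head]);
        s->cq_head = (s->cq_head + 1) & s->cq_mask;
    }
}
```

The trade-off is that a long command batch now runs in the context of the
vCPU that wrote the tail register.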

> 
> 
> 
>>
>>> +}
>>> +
> 
> (...)
> 
>>> +
>>> +static AddressSpace *riscv_iommu_find_as(PCIBus *bus, void *opaque, int devfn)
>>> +{
>>> +    RISCVIOMMUState *s = (RISCVIOMMUState *) opaque;
>>> +    PCIDevice *pdev = pci_find_device(bus, pci_bus_num(bus), devfn);
>>> +    AddressSpace *as = NULL;
>>> +
>>> +    if (pdev && pci_is_iommu(pdev)) {
>>> +        return s->target_as;
>>> +    }
>>> +
>>> +    /* Find first registered IOMMU device */
>>> +    while (s->iommus.le_prev) {
>>> +        s = *(s->iommus.le_prev);
>>> +    }
>>> +
>>> +    /* Find first matching IOMMU */
>>> +    while (s != NULL && as == NULL) {
>>> +        as = riscv_iommu_space(s, PCI_BUILD_BDF(pci_bus_num(bus), devfn));
>>
>> For pci_bus_num(),
>> riscv_iommu_find_as() can be called at the very early stage
>> where software has no chance to enumerate the bus numbers.

I investigated and this doesn't seem to be a problem. This function is first called as
the last step of both riscv_iommu_pci_realize() and riscv_iommu_sys_realize(), and by
that time pci_bus_num() already returns the assigned bus number. Other IOMMUs also use
pci_bus_num() in their own get_address_space() callbacks in the same way.


Thanks,


Daniel


> 
> I'll see how other IOMMUs are handling their iommu_find_as()
> 
> 
> Thanks,
> 
> 
> Daniel
> 
> 
>>
>>
>>
>>
>>> +        s = s->iommus.le_next;
>>> +    }
>>> +
>>> +    return as ? as : &address_space_memory;
>>> +}
>>> +
>>> +static const PCIIOMMUOps riscv_iommu_ops = {
>>> +    .get_address_space = riscv_iommu_find_as,
>>> +};
>>> +
>>> +void riscv_iommu_pci_setup_iommu(RISCVIOMMUState *iommu, PCIBus *bus,
>>> +        Error **errp)
>>> +{
>>> +    if (bus->iommu_ops &&
>>> +        bus->iommu_ops->get_address_space == riscv_iommu_find_as) {
>>> +        /* Allow multiple IOMMUs on the same PCIe bus, link known devices */
>>> +        RISCVIOMMUState *last = (RISCVIOMMUState *)bus->iommu_opaque;
>>> +        QLIST_INSERT_AFTER(last, iommu, iommus);
>>> +    } else if (bus->iommu_ops == NULL) {
>>> +        pci_setup_iommu(bus, &riscv_iommu_ops, iommu);
>>> +    } else {
>>> +        error_setg(errp, "can't register secondary IOMMU for PCI bus #%d",
>>> +            pci_bus_num(bus));
>>> +    }
>>> +}
>>> +
>>> +static int riscv_iommu_memory_region_index(IOMMUMemoryRegion *iommu_mr,
>>> +    MemTxAttrs attrs)
>>> +{
>>> +    return attrs.unspecified ? RISCV_IOMMU_NOPASID : (int)attrs.pasid;
>>> +}
>>> +
>>> +static int riscv_iommu_memory_region_index_len(IOMMUMemoryRegion *iommu_mr)
>>> +{
>>> +    RISCVIOMMUSpace *as = container_of(iommu_mr, RISCVIOMMUSpace, iova_mr);
>>> +    return 1 << as->iommu->pasid_bits;
>>> +}
>>> +
>>> +static void riscv_iommu_memory_region_init(ObjectClass *klass, void *data)
>>> +{
>>> +    IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_CLASS(klass);
>>> +
>>> +    imrc->translate = riscv_iommu_memory_region_translate;
>>> +    imrc->notify_flag_changed = riscv_iommu_memory_region_notify;
>>> +    imrc->attrs_to_index = riscv_iommu_memory_region_index;
>>> +    imrc->num_indexes = riscv_iommu_memory_region_index_len;
>>> +}
>>> +
>>> +static const TypeInfo riscv_iommu_memory_region_info = {
>>> +    .parent = TYPE_IOMMU_MEMORY_REGION,
>>> +    .name = TYPE_RISCV_IOMMU_MEMORY_REGION,
>>> +    .class_init = riscv_iommu_memory_region_init,
>>> +};
>>> +
>>> +static void riscv_iommu_register_mr_types(void)
>>> +{
>>> +    type_register_static(&riscv_iommu_memory_region_info);
>>> +    type_register_static(&riscv_iommu_info);
>>> +}
>>> +
>>> +type_init(riscv_iommu_register_mr_types);
>>> diff --git a/hw/riscv/riscv-iommu.h b/hw/riscv/riscv-iommu.h
>>> new file mode 100644
>>> index 0000000000..6f740de690
>>> --- /dev/null
>>> +++ b/hw/riscv/riscv-iommu.h
>>> @@ -0,0 +1,141 @@
>>> +/*
>>> + * QEMU emulation of an RISC-V IOMMU (Ziommu)
>>> + *
>>> + * Copyright (C) 2022-2023 Rivos Inc.
>>> + *
>>> + * This program is free software; you can redistribute it and/or modify
>>> + * it under the terms of the GNU General Public License as published by
>>> + * the Free Software Foundation; either version 2 of the License.
>>> + *
>>> + * This program is distributed in the hope that it will be useful,
>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>> + * GNU General Public License for more details.
>>> + *
>>> + * You should have received a copy of the GNU General Public License along
>>> + * with this program; if not, see <http://www.gnu.org/licenses/>.
>>> + */
>>> +
>>> +#ifndef HW_RISCV_IOMMU_STATE_H
>>> +#define HW_RISCV_IOMMU_STATE_H
>>> +
>>> +#include "qemu/osdep.h"
>>> +#include "qom/object.h"
>>> +
>>> +#include "hw/riscv/iommu.h"
>>> +
>>> +struct RISCVIOMMUState {
>>> +    /*< private >*/
>>> +    DeviceState parent_obj;
>>> +
>>> +    /*< public >*/
>>> +    uint32_t version;     /* Reported interface version number */
>>> +    uint32_t pasid_bits;  /* process identifier width */
>>> +    uint32_t bus;         /* PCI bus mapping for non-root endpoints */
>>> +
>>> +    uint64_t cap;         /* IOMMU supported capabilities */
>>> +    uint64_t fctl;        /* IOMMU enabled features */
>>> +
>>> +    bool enable_off;      /* Enable out-of-reset OFF mode (DMA disabled) */
>>> +    bool enable_msi;      /* Enable MSI remapping */
>>> +
>>> +    /* IOMMU Internal State */
>>> +    uint64_t ddtp;        /* Validated Device Directory Tree Root Pointer */
>>> +
>>> +    dma_addr_t cq_addr;   /* Command queue base physical address */
>>> +    dma_addr_t fq_addr;   /* Fault/event queue base physical address */
>>> +    dma_addr_t pq_addr;   /* Page request queue base physical address */
>>> +
>>> +    uint32_t cq_mask;     /* Command queue index bit mask */
>>> +    uint32_t fq_mask;     /* Fault/event queue index bit mask */
>>> +    uint32_t pq_mask;     /* Page request queue index bit mask */
>>> +
>>> +    /* interrupt notifier */
>>> +    void (*notify)(RISCVIOMMUState *iommu, unsigned vector);
>>> +
>>> +    /* IOMMU State Machine */
>>> +    QemuThread core_proc; /* Background processing thread */
>>> +    QemuMutex core_lock;  /* Global IOMMU lock, used for cache/regs updates */
>>> +    QemuCond core_cond;   /* Background processing wake up signal */
>>> +    unsigned core_exec;   /* Processing thread execution actions */
>>> +
>>> +    /* IOMMU target address space */
>>> +    AddressSpace *target_as;
>>> +    MemoryRegion *target_mr;
>>> +
>>> +    /* MSI / MRIF access trap */
>>> +    AddressSpace trap_as;
>>> +    MemoryRegion trap_mr;
>>> +
>>> +    GHashTable *ctx_cache;          /* Device translation Context Cache */
>>> +
>>> +    /* MMIO Hardware Interface */
>>> +    MemoryRegion regs_mr;
>>> +    QemuSpin regs_lock;
>>> +    uint8_t *regs_rw;  /* register state (user write) */
>>> +    uint8_t *regs_wc;  /* write-1-to-clear mask */
>>> +    uint8_t *regs_ro;  /* read-only mask */
>>> +
>>> +    QLIST_ENTRY(RISCVIOMMUState) iommus;
>>> +    QLIST_HEAD(, RISCVIOMMUSpace) spaces;
>>> +};
>>> +
>>> +void riscv_iommu_pci_setup_iommu(RISCVIOMMUState *iommu, PCIBus *bus,
>>> +         Error **errp);
>>> +
>>> +/* private helpers */
>>> +
>>> +/* Register helper functions */
>>> +static inline uint32_t riscv_iommu_reg_mod32(RISCVIOMMUState *s,
>>> +    unsigned idx, uint32_t set, uint32_t clr)
>>> +{
>>> +    uint32_t val;
>>> +    qemu_spin_lock(&s->regs_lock);
>>> +    val = ldl_le_p(s->regs_rw + idx);
>>> +    stl_le_p(s->regs_rw + idx, (val & ~clr) | set);
>>> +    qemu_spin_unlock(&s->regs_lock);
>>> +    return val;
>>> +}
>>> +
>>> +static inline void riscv_iommu_reg_set32(RISCVIOMMUState *s,
>>> +    unsigned idx, uint32_t set)
>>> +{
>>> +    qemu_spin_lock(&s->regs_lock);
>>> +    stl_le_p(s->regs_rw + idx, set);
>>> +    qemu_spin_unlock(&s->regs_lock);
>>> +}
>>> +
>>> +static inline uint32_t riscv_iommu_reg_get32(RISCVIOMMUState *s,
>>> +    unsigned idx)
>>> +{
>>> +    return ldl_le_p(s->regs_rw + idx);
>>> +}
>>> +
>>> +static inline uint64_t riscv_iommu_reg_mod64(RISCVIOMMUState *s,
>>> +    unsigned idx, uint64_t set, uint64_t clr)
>>> +{
>>> +    uint64_t val;
>>> +    qemu_spin_lock(&s->regs_lock);
>>> +    val = ldq_le_p(s->regs_rw + idx);
>>> +    stq_le_p(s->regs_rw + idx, (val & ~clr) | set);
>>> +    qemu_spin_unlock(&s->regs_lock);
>>> +    return val;
>>> +}
>>> +
>>> +static inline void riscv_iommu_reg_set64(RISCVIOMMUState *s,
>>> +    unsigned idx, uint64_t set)
>>> +{
>>> +    qemu_spin_lock(&s->regs_lock);
>>> +    stq_le_p(s->regs_rw + idx, set);
>>> +    qemu_spin_unlock(&s->regs_lock);
>>> +}
>>> +
>>> +static inline uint64_t riscv_iommu_reg_get64(RISCVIOMMUState *s,
>>> +    unsigned idx)
>>> +{
>>> +    return ldq_le_p(s->regs_rw + idx);
>>> +}
>>> +
>>> +
>>> +
>>> +#endif
>>> diff --git a/hw/riscv/trace-events b/hw/riscv/trace-events
>>> new file mode 100644
>>> index 0000000000..42a97caffa
>>> --- /dev/null
>>> +++ b/hw/riscv/trace-events
>>> @@ -0,0 +1,11 @@
>>> +# See documentation at docs/devel/tracing.rst
>>> +
>>> +# riscv-iommu.c
>>> +riscv_iommu_new(const char *id, unsigned b, unsigned d, unsigned f) "%s: device attached %04x:%02x.%d"
>>> +riscv_iommu_flt(const char *id, unsigned b, unsigned d, unsigned f, uint64_t reason, uint64_t iova) "%s: fault %04x:%02x.%u reason: 0x%"PRIx64" iova: 0x%"PRIx64
>>> +riscv_iommu_pri(const char *id, unsigned b, unsigned d, unsigned f, uint64_t iova) "%s: page request %04x:%02x.%u iova: 0x%"PRIx64
>>> +riscv_iommu_dma(const char *id, unsigned b, unsigned d, unsigned f, unsigned pasid, const char *dir, uint64_t iova, uint64_t phys) "%s: translate %04x:%02x.%u #%u %s 0x%"PRIx64" -> 0x%"PRIx64
>>> +riscv_iommu_msi(const char *id, unsigned b, unsigned d, unsigned f, uint64_t iova, uint64_t phys) "%s: translate %04x:%02x.%u MSI 0x%"PRIx64" -> 0x%"PRIx64
>>> +riscv_iommu_cmd(const char *id, uint64_t l, uint64_t u) "%s: command 0x%"PRIx64" 0x%"PRIx64
>>> +riscv_iommu_notifier_add(const char *id) "%s: dev-iotlb notifier added"
>>> +riscv_iommu_notifier_del(const char *id) "%s: dev-iotlb notifier removed"
>>> diff --git a/hw/riscv/trace.h b/hw/riscv/trace.h
>>> new file mode 100644
>>> index 0000000000..b88504b750
>>> --- /dev/null
>>> +++ b/hw/riscv/trace.h
>>> @@ -0,0 +1,2 @@
>>> +#include "trace/trace-hw_riscv.h"
>>> +
>>> diff --git a/include/hw/riscv/iommu.h b/include/hw/riscv/iommu.h
>>> new file mode 100644
>>> index 0000000000..403b365893
>>> --- /dev/null
>>> +++ b/include/hw/riscv/iommu.h
>>> @@ -0,0 +1,36 @@
>>> +/*
>>> + * QEMU emulation of an RISC-V IOMMU (Ziommu)
>>> + *
>>> + * Copyright (C) 2022-2023 Rivos Inc.
>>> + *
>>> + * This program is free software; you can redistribute it and/or modify
>>> + * it under the terms of the GNU General Public License as published by
>>> + * the Free Software Foundation; either version 2 of the License.
>>> + *
>>> + * This program is distributed in the hope that it will be useful,
>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>> + * GNU General Public License for more details.
>>> + *
>>> + * You should have received a copy of the GNU General Public License along
>>> + * with this program; if not, see <http://www.gnu.org/licenses/>.
>>> + */
>>> +
>>> +#ifndef HW_RISCV_IOMMU_H
>>> +#define HW_RISCV_IOMMU_H
>>> +
>>> +#include "qemu/osdep.h"
>>> +#include "qom/object.h"
>>> +
>>> +#define TYPE_RISCV_IOMMU "riscv-iommu"
>>> +OBJECT_DECLARE_SIMPLE_TYPE(RISCVIOMMUState, RISCV_IOMMU)
>>> +typedef struct RISCVIOMMUState RISCVIOMMUState;
>>> +
>>> +#define TYPE_RISCV_IOMMU_MEMORY_REGION "riscv-iommu-mr"
>>> +typedef struct RISCVIOMMUSpace RISCVIOMMUSpace;
>>> +
>>> +#define TYPE_RISCV_IOMMU_PCI "riscv-iommu-pci"
>>> +OBJECT_DECLARE_SIMPLE_TYPE(RISCVIOMMUStatePci, RISCV_IOMMU_PCI)
>>> +typedef struct RISCVIOMMUStatePci RISCVIOMMUStatePci;
>>> +
>>> +#endif
>>> diff --git a/meson.build b/meson.build
>>> index c59ca496f2..75e56f3282 100644
>>> --- a/meson.build
>>> +++ b/meson.build
>>> @@ -3361,6 +3361,7 @@ if have_system
>>>       'hw/rdma',
>>>       'hw/rdma/vmw',
>>>       'hw/rtc',
>>> +    'hw/riscv',
>>>       'hw/s390x',
>>>       'hw/scsi',
>>>       'hw/sd',
>>> -- 
>>> 2.43.2
>>>
>>>



* Re: [PATCH v2 03/15] hw/riscv: add RISC-V IOMMU base emulation
  2024-05-10 10:58       ` Frank Chang
@ 2024-05-13 12:41         ` Daniel Henrique Barboza
  0 siblings, 0 replies; 55+ messages in thread
From: Daniel Henrique Barboza @ 2024-05-13 12:41 UTC (permalink / raw)
  To: Frank Chang
  Cc: qemu-devel, qemu-riscv, alistair.francis, bmeng, liwei1518,
	zhiwei_liu, palmer, ajones, tjeznach, Sebastien Boeuf

Hi Frank,

On 5/10/24 07:58, Frank Chang wrote:
> Hi Daniel,
> 
> Daniel Henrique Barboza <dbarboza@ventanamicro.com> wrote on Wed, May 8, 2024 at 7:16 PM:
>>
>> Hi Frank,
>>
>> I'll reply with what I've done so far. Still missing some stuff:
>>
>> On 5/2/24 08:37, Frank Chang wrote:
>>> Hi Daniel,
>>>
>>> Daniel Henrique Barboza <dbarboza@ventanamicro.com> wrote on Fri, Mar 8, 2024 at 12:04 AM:


(...)


>>> In our experience, using QEMU thread increases the latency of command
>>> queue processing,
>>> which leads to the potential IOMMU fence timeout in the Linux driver
>>> when using IOMMU with KVM,
>>> e.g. booting the guest Linux.
>>>
>>> Is it possible to remove the thread from the IOMMU just like ARM, AMD,
>>> and Intel IOMMU models?
>>
>> Interesting. We've been using this emulation internally in Ventana, with
>> KVM and VFIO, and didn't experience this issue. Drew is on CC and can talk
>> more about it.
> 
> We've developed an IOFENCE timeout detection mechanism in our Linux
> driver internally
> to detect long-running IOFENCE commands on the hardware.
> 
> However, we hit the assertion when running on QEMU,
> and the issue was resolved after we removed the thread from the IOMMU model.
> The assertion didn't happen on our hardware.
> 
> Regards,
> Frank Chang


I see. Well, one more reason to remove the threading for v3 then. I removed it and
it seems to be working as usual in my tests, i.e. no perceptible performance or
behavior impacts. Thanks,


Daniel


> 
>>
>> That said, I don't mind this change, assuming it's feasible to make it for this
>> first version.  I'll need to check how other IOMMUs are doing it.
>>
>>
>>
>>>
>>>> +}
>>>> +
>>
>> (...)
>>
>>>> +
>>>> +static AddressSpace *riscv_iommu_find_as(PCIBus *bus, void *opaque, int devfn)
>>>> +{
>>>> +    RISCVIOMMUState *s = (RISCVIOMMUState *) opaque;
>>>> +    PCIDevice *pdev = pci_find_device(bus, pci_bus_num(bus), devfn);
>>>> +    AddressSpace *as = NULL;
>>>> +
>>>> +    if (pdev && pci_is_iommu(pdev)) {
>>>> +        return s->target_as;
>>>> +    }
>>>> +
>>>> +    /* Find first registered IOMMU device */
>>>> +    while (s->iommus.le_prev) {
>>>> +        s = *(s->iommus.le_prev);
>>>> +    }
>>>> +
>>>> +    /* Find first matching IOMMU */
>>>> +    while (s != NULL && as == NULL) {
>>>> +        as = riscv_iommu_space(s, PCI_BUILD_BDF(pci_bus_num(bus), devfn));
>>>
>>> For pci_bus_num(),
>>> riscv_iommu_find_as() can be called at the very early stage
>>> where software has no chance to enumerate the bus numbers.
>>
>> I'll see how other IOMMUs are handling their iommu_find_as()
>>
>>
>> Thanks,
>>
>>
>> Daniel
>>
>>
>>>
>>>
>>>
>>>
>>>> +        s = s->iommus.le_next;
>>>> +    }
>>>> +
>>>> +    return as ? as : &address_space_memory;
>>>> +}
>>>> +
>>>> +static const PCIIOMMUOps riscv_iommu_ops = {
>>>> +    .get_address_space = riscv_iommu_find_as,
>>>> +};
>>>> +
>>>> +void riscv_iommu_pci_setup_iommu(RISCVIOMMUState *iommu, PCIBus *bus,
>>>> +        Error **errp)
>>>> +{
>>>> +    if (bus->iommu_ops &&
>>>> +        bus->iommu_ops->get_address_space == riscv_iommu_find_as) {
>>>> +        /* Allow multiple IOMMUs on the same PCIe bus, link known devices */
>>>> +        RISCVIOMMUState *last = (RISCVIOMMUState *)bus->iommu_opaque;
>>>> +        QLIST_INSERT_AFTER(last, iommu, iommus);
>>>> +    } else if (bus->iommu_ops == NULL) {
>>>> +        pci_setup_iommu(bus, &riscv_iommu_ops, iommu);
>>>> +    } else {
>>>> +        error_setg(errp, "can't register secondary IOMMU for PCI bus #%d",
>>>> +            pci_bus_num(bus));
>>>> +    }
>>>> +}
>>>> +
>>>> +static int riscv_iommu_memory_region_index(IOMMUMemoryRegion *iommu_mr,
>>>> +    MemTxAttrs attrs)
>>>> +{
>>>> +    return attrs.unspecified ? RISCV_IOMMU_NOPASID : (int)attrs.pasid;
>>>> +}
>>>> +
>>>> +static int riscv_iommu_memory_region_index_len(IOMMUMemoryRegion *iommu_mr)
>>>> +{
>>>> +    RISCVIOMMUSpace *as = container_of(iommu_mr, RISCVIOMMUSpace, iova_mr);
>>>> +    return 1 << as->iommu->pasid_bits;
>>>> +}
>>>> +
>>>> +static void riscv_iommu_memory_region_init(ObjectClass *klass, void *data)
>>>> +{
>>>> +    IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_CLASS(klass);
>>>> +
>>>> +    imrc->translate = riscv_iommu_memory_region_translate;
>>>> +    imrc->notify_flag_changed = riscv_iommu_memory_region_notify;
>>>> +    imrc->attrs_to_index = riscv_iommu_memory_region_index;
>>>> +    imrc->num_indexes = riscv_iommu_memory_region_index_len;
>>>> +}
>>>> +
>>>> +static const TypeInfo riscv_iommu_memory_region_info = {
>>>> +    .parent = TYPE_IOMMU_MEMORY_REGION,
>>>> +    .name = TYPE_RISCV_IOMMU_MEMORY_REGION,
>>>> +    .class_init = riscv_iommu_memory_region_init,
>>>> +};
>>>> +
>>>> +static void riscv_iommu_register_mr_types(void)
>>>> +{
>>>> +    type_register_static(&riscv_iommu_memory_region_info);
>>>> +    type_register_static(&riscv_iommu_info);
>>>> +}
>>>> +
>>>> +type_init(riscv_iommu_register_mr_types);
>>>> diff --git a/hw/riscv/riscv-iommu.h b/hw/riscv/riscv-iommu.h
>>>> new file mode 100644
>>>> index 0000000000..6f740de690
>>>> --- /dev/null
>>>> +++ b/hw/riscv/riscv-iommu.h
>>>> @@ -0,0 +1,141 @@
>>>> +/*
>>>> + * QEMU emulation of an RISC-V IOMMU (Ziommu)
>>>> + *
>>>> + * Copyright (C) 2022-2023 Rivos Inc.
>>>> + *
>>>> + * This program is free software; you can redistribute it and/or modify
>>>> + * it under the terms of the GNU General Public License as published by
>>>> + * the Free Software Foundation; either version 2 of the License.
>>>> + *
>>>> + * This program is distributed in the hope that it will be useful,
>>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>>> + * GNU General Public License for more details.
>>>> + *
>>>> + * You should have received a copy of the GNU General Public License along
>>>> + * with this program; if not, see <http://www.gnu.org/licenses/>.
>>>> + */
>>>> +
>>>> +#ifndef HW_RISCV_IOMMU_STATE_H
>>>> +#define HW_RISCV_IOMMU_STATE_H
>>>> +
>>>> +#include "qemu/osdep.h"
>>>> +#include "qom/object.h"
>>>> +
>>>> +#include "hw/riscv/iommu.h"
>>>> +
>>>> +struct RISCVIOMMUState {
>>>> +    /*< private >*/
>>>> +    DeviceState parent_obj;
>>>> +
>>>> +    /*< public >*/
>>>> +    uint32_t version;     /* Reported interface version number */
>>>> +    uint32_t pasid_bits;  /* process identifier width */
>>>> +    uint32_t bus;         /* PCI bus mapping for non-root endpoints */
>>>> +
>>>> +    uint64_t cap;         /* IOMMU supported capabilities */
>>>> +    uint64_t fctl;        /* IOMMU enabled features */
>>>> +
>>>> +    bool enable_off;      /* Enable out-of-reset OFF mode (DMA disabled) */
>>>> +    bool enable_msi;      /* Enable MSI remapping */
>>>> +
>>>> +    /* IOMMU Internal State */
>>>> +    uint64_t ddtp;        /* Validated Device Directory Tree Root Pointer */
>>>> +
>>>> +    dma_addr_t cq_addr;   /* Command queue base physical address */
>>>> +    dma_addr_t fq_addr;   /* Fault/event queue base physical address */
>>>> +    dma_addr_t pq_addr;   /* Page request queue base physical address */
>>>> +
>>>> +    uint32_t cq_mask;     /* Command queue index bit mask */
>>>> +    uint32_t fq_mask;     /* Fault/event queue index bit mask */
>>>> +    uint32_t pq_mask;     /* Page request queue index bit mask */
>>>> +
>>>> +    /* interrupt notifier */
>>>> +    void (*notify)(RISCVIOMMUState *iommu, unsigned vector);
>>>> +
>>>> +    /* IOMMU State Machine */
>>>> +    QemuThread core_proc; /* Background processing thread */
>>>> +    QemuMutex core_lock;  /* Global IOMMU lock, used for cache/regs updates */
>>>> +    QemuCond core_cond;   /* Background processing wake up signal */
>>>> +    unsigned core_exec;   /* Processing thread execution actions */
>>>> +
>>>> +    /* IOMMU target address space */
>>>> +    AddressSpace *target_as;
>>>> +    MemoryRegion *target_mr;
>>>> +
>>>> +    /* MSI / MRIF access trap */
>>>> +    AddressSpace trap_as;
>>>> +    MemoryRegion trap_mr;
>>>> +
>>>> +    GHashTable *ctx_cache;          /* Device translation Context Cache */
>>>> +
>>>> +    /* MMIO Hardware Interface */
>>>> +    MemoryRegion regs_mr;
>>>> +    QemuSpin regs_lock;
>>>> +    uint8_t *regs_rw;  /* register state (user write) */
>>>> +    uint8_t *regs_wc;  /* write-1-to-clear mask */
>>>> +    uint8_t *regs_ro;  /* read-only mask */
>>>> +
>>>> +    QLIST_ENTRY(RISCVIOMMUState) iommus;
>>>> +    QLIST_HEAD(, RISCVIOMMUSpace) spaces;
>>>> +};
>>>> +
>>>> +void riscv_iommu_pci_setup_iommu(RISCVIOMMUState *iommu, PCIBus *bus,
>>>> +         Error **errp);
>>>> +
>>>> +/* private helpers */
>>>> +
>>>> +/* Register helper functions */
>>>> +static inline uint32_t riscv_iommu_reg_mod32(RISCVIOMMUState *s,
>>>> +    unsigned idx, uint32_t set, uint32_t clr)
>>>> +{
>>>> +    uint32_t val;
>>>> +    qemu_spin_lock(&s->regs_lock);
>>>> +    val = ldl_le_p(s->regs_rw + idx);
>>>> +    stl_le_p(s->regs_rw + idx, (val & ~clr) | set);
>>>> +    qemu_spin_unlock(&s->regs_lock);
>>>> +    return val;
>>>> +}
>>>> +
>>>> +static inline void riscv_iommu_reg_set32(RISCVIOMMUState *s,
>>>> +    unsigned idx, uint32_t set)
>>>> +{
>>>> +    qemu_spin_lock(&s->regs_lock);
>>>> +    stl_le_p(s->regs_rw + idx, set);
>>>> +    qemu_spin_unlock(&s->regs_lock);
>>>> +}
>>>> +
>>>> +static inline uint32_t riscv_iommu_reg_get32(RISCVIOMMUState *s,
>>>> +    unsigned idx)
>>>> +{
>>>> +    return ldl_le_p(s->regs_rw + idx);
>>>> +}
>>>> +
>>>> +static inline uint64_t riscv_iommu_reg_mod64(RISCVIOMMUState *s,
>>>> +    unsigned idx, uint64_t set, uint64_t clr)
>>>> +{
>>>> +    uint64_t val;
>>>> +    qemu_spin_lock(&s->regs_lock);
>>>> +    val = ldq_le_p(s->regs_rw + idx);
>>>> +    stq_le_p(s->regs_rw + idx, (val & ~clr) | set);
>>>> +    qemu_spin_unlock(&s->regs_lock);
>>>> +    return val;
>>>> +}
>>>> +
>>>> +static inline void riscv_iommu_reg_set64(RISCVIOMMUState *s,
>>>> +    unsigned idx, uint64_t set)
>>>> +{
>>>> +    qemu_spin_lock(&s->regs_lock);
>>>> +    stq_le_p(s->regs_rw + idx, set);
>>>> +    qemu_spin_unlock(&s->regs_lock);
>>>> +}
>>>> +
>>>> +static inline uint64_t riscv_iommu_reg_get64(RISCVIOMMUState *s,
>>>> +    unsigned idx)
>>>> +{
>>>> +    return ldq_le_p(s->regs_rw + idx);
>>>> +}
>>>> +
>>>> +
>>>> +
>>>> +#endif
>>>> diff --git a/hw/riscv/trace-events b/hw/riscv/trace-events
>>>> new file mode 100644
>>>> index 0000000000..42a97caffa
>>>> --- /dev/null
>>>> +++ b/hw/riscv/trace-events
>>>> @@ -0,0 +1,11 @@
>>>> +# See documentation at docs/devel/tracing.rst
>>>> +
>>>> +# riscv-iommu.c
>>>> +riscv_iommu_new(const char *id, unsigned b, unsigned d, unsigned f) "%s: device attached %04x:%02x.%d"
>>>> +riscv_iommu_flt(const char *id, unsigned b, unsigned d, unsigned f, uint64_t reason, uint64_t iova) "%s: fault %04x:%02x.%u reason: 0x%"PRIx64" iova: 0x%"PRIx64
>>>> +riscv_iommu_pri(const char *id, unsigned b, unsigned d, unsigned f, uint64_t iova) "%s: page request %04x:%02x.%u iova: 0x%"PRIx64
>>>> +riscv_iommu_dma(const char *id, unsigned b, unsigned d, unsigned f, unsigned pasid, const char *dir, uint64_t iova, uint64_t phys) "%s: translate %04x:%02x.%u #%u %s 0x%"PRIx64" -> 0x%"PRIx64
>>>> +riscv_iommu_msi(const char *id, unsigned b, unsigned d, unsigned f, uint64_t iova, uint64_t phys) "%s: translate %04x:%02x.%u MSI 0x%"PRIx64" -> 0x%"PRIx64
>>>> +riscv_iommu_cmd(const char *id, uint64_t l, uint64_t u) "%s: command 0x%"PRIx64" 0x%"PRIx64
>>>> +riscv_iommu_notifier_add(const char *id) "%s: dev-iotlb notifier added"
>>>> +riscv_iommu_notifier_del(const char *id) "%s: dev-iotlb notifier removed"
>>>> diff --git a/hw/riscv/trace.h b/hw/riscv/trace.h
>>>> new file mode 100644
>>>> index 0000000000..b88504b750
>>>> --- /dev/null
>>>> +++ b/hw/riscv/trace.h
>>>> @@ -0,0 +1,2 @@
>>>> +#include "trace/trace-hw_riscv.h"
>>>> +
>>>> diff --git a/include/hw/riscv/iommu.h b/include/hw/riscv/iommu.h
>>>> new file mode 100644
>>>> index 0000000000..403b365893
>>>> --- /dev/null
>>>> +++ b/include/hw/riscv/iommu.h
>>>> @@ -0,0 +1,36 @@
>>>> +/*
>>>> + * QEMU emulation of an RISC-V IOMMU (Ziommu)
>>>> + *
>>>> + * Copyright (C) 2022-2023 Rivos Inc.
>>>> + *
>>>> + * This program is free software; you can redistribute it and/or modify
>>>> + * it under the terms of the GNU General Public License as published by
>>>> + * the Free Software Foundation; either version 2 of the License.
>>>> + *
>>>> + * This program is distributed in the hope that it will be useful,
>>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>>> + * GNU General Public License for more details.
>>>> + *
>>>> + * You should have received a copy of the GNU General Public License along
>>>> + * with this program; if not, see <http://www.gnu.org/licenses/>.
>>>> + */
>>>> +
>>>> +#ifndef HW_RISCV_IOMMU_H
>>>> +#define HW_RISCV_IOMMU_H
>>>> +
>>>> +#include "qemu/osdep.h"
>>>> +#include "qom/object.h"
>>>> +
>>>> +#define TYPE_RISCV_IOMMU "riscv-iommu"
>>>> +OBJECT_DECLARE_SIMPLE_TYPE(RISCVIOMMUState, RISCV_IOMMU)
>>>> +typedef struct RISCVIOMMUState RISCVIOMMUState;
>>>> +
>>>> +#define TYPE_RISCV_IOMMU_MEMORY_REGION "riscv-iommu-mr"
>>>> +typedef struct RISCVIOMMUSpace RISCVIOMMUSpace;
>>>> +
>>>> +#define TYPE_RISCV_IOMMU_PCI "riscv-iommu-pci"
>>>> +OBJECT_DECLARE_SIMPLE_TYPE(RISCVIOMMUStatePci, RISCV_IOMMU_PCI)
>>>> +typedef struct RISCVIOMMUStatePci RISCVIOMMUStatePci;
>>>> +
>>>> +#endif
>>>> diff --git a/meson.build b/meson.build
>>>> index c59ca496f2..75e56f3282 100644
>>>> --- a/meson.build
>>>> +++ b/meson.build
>>>> @@ -3361,6 +3361,7 @@ if have_system
>>>>        'hw/rdma',
>>>>        'hw/rdma/vmw',
>>>>        'hw/rtc',
>>>> +    'hw/riscv',
>>>>        'hw/s390x',
>>>>        'hw/scsi',
>>>>        'hw/sd',
>>>> --
>>>> 2.43.2
>>>>
>>>>
>>



* Re: [PATCH v2 03/15] hw/riscv: add RISC-V IOMMU base emulation
  2024-05-01 11:57   ` Jason Chien
@ 2024-05-14 20:06     ` Daniel Henrique Barboza
  0 siblings, 0 replies; 55+ messages in thread
From: Daniel Henrique Barboza @ 2024-05-14 20:06 UTC (permalink / raw)
  To: Jason Chien, qemu-devel
  Cc: qemu-riscv, alistair.francis, bmeng, liwei1518, zhiwei_liu,
	palmer, ajones, tjeznach, Sebastien Boeuf

Hi Jason,

On 5/1/24 08:57, Jason Chien wrote:
> Daniel Henrique Barboza wrote on 2024/3/8 at 12:03 AM:
>> From: Tomasz Jeznach<tjeznach@rivosinc.com>
>>
>> The RISC-V IOMMU specification is now ratified as per the RISC-V
>> international process. The latest frozen specification can be found
>> at:
>>
>> https://github.com/riscv-non-isa/riscv-iommu/releases/download/v1.0/riscv-iommu.pdf
>>
>> Add the foundation of the device emulation for RISC-V IOMMU, which
>> includes an IOMMU that has no capabilities but MSI interrupt support and
>> fault queue interfaces. We'll add more features incrementally in the
>> next patches.
>>
>> Co-developed-by: Sebastien Boeuf<seb@rivosinc.com>
>> Signed-off-by: Sebastien Boeuf<seb@rivosinc.com>
>> Signed-off-by: Tomasz Jeznach<tjeznach@rivosinc.com>
>> Signed-off-by: Daniel Henrique Barboza<dbarboza@ventanamicro.com>
>> ---
>>   hw/riscv/Kconfig         |    4 +
>>   hw/riscv/meson.build     |    1 +
>>   hw/riscv/riscv-iommu.c   | 1492 ++++++++++++++++++++++++++++++++++++++
>>   hw/riscv/riscv-iommu.h   |  141 ++++
>>   hw/riscv/trace-events    |   11 +
>>   hw/riscv/trace.h         |    2 +
>>   include/hw/riscv/iommu.h |   36 +
>>   meson.build              |    1 +
>>   8 files changed, 1688 insertions(+)
>>   create mode 100644 hw/riscv/riscv-iommu.c
>>   create mode 100644 hw/riscv/riscv-iommu.h
>>   create mode 100644 hw/riscv/trace-events
>>   create mode 100644 hw/riscv/trace.h
>>   create mode 100644 include/hw/riscv/iommu.h
>>
>> diff --git a/hw/riscv/Kconfig b/hw/riscv/Kconfig
>> index 5d644eb7b1..faf6a10029 100644
>> --- a/hw/riscv/Kconfig
>> +++ b/hw/riscv/Kconfig
>> @@ -1,3 +1,6 @@
>> +config RISCV_IOMMU
>> +    bool
>> +

(...)

>> +
>> +/* IOMMU index for transactions without PASID specified. */
>> +#define RISCV_IOMMU_NOPASID 0
>> +
>> +static void riscv_iommu_notify(RISCVIOMMUState *s, int vec)
>> +{
>> +    const uint32_t ipsr =
>> +        riscv_iommu_reg_mod32(s, RISCV_IOMMU_REG_IPSR, (1 << vec), 0);
>> +    const uint32_t ivec = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_IVEC);
>> +    if (s->notify && !(ipsr & (1 << vec))) {
>> +        s->notify(s, (ivec >> (vec * 4)) & 0x0F);
>> +    }
>> +}
> The RISC-V IOMMU also supports WSI.
>> +

I mentioned in the review with Frank that this impl does not support WSI, but
it really seems clearer to do the check here nevertheless. I'll add it.
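
A rough standalone model of that check, to show the intended behavior rather
than the final patch. The ToyRegs type, toy_notify() and the TOY_FCTL_WSI bit
position are assumptions made for this sketch; the real definitions live in
riscv-iommu-bits.h and should be checked there.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of the fctl.WSI bit; verify against riscv-iommu-bits.h. */
#define TOY_FCTL_WSI (1u << 1)

typedef struct ToyRegs {
    uint32_t fctl;   /* feature control */
    uint32_t ipsr;   /* interrupt pending status, one bit per source */
    uint32_t ivec;   /* 4-bit vector number per interrupt source */
} ToyRegs;

/*
 * Mark interrupt source 'src' pending and report which MSI vector to raise.
 * Returns -1 when no MSI should be sent: either wire-signaled interrupts
 * are selected, or the source was already pending.
 */
static int toy_notify(ToyRegs *r, unsigned src)
{
    if (r->fctl & TOY_FCTL_WSI) {
        return -1;                       /* MSI path is not used in WSI mode */
    }
    const bool was_pending = r->ipsr & (1u << src);
    r->ipsr |= 1u << src;
    if (was_pending) {
        return -1;                       /* already signaled, nothing to do */
    }
    return (r->ivec >> (src * 4)) & 0x0F;
}
```

The vector lookup is the same `(ivec >> (vec * 4)) & 0x0F` selection used in
the quoted riscv_iommu_notify().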


>> +static void riscv_iommu_fault(RISCVIOMMUState *s,
>> +                              struct riscv_iommu_fq_record *ev)
>> +{

(...)

>> +
>> +    /*
>> +     * Check supported device id width (in bits).
>> +     * See IOMMU Specification, Chapter 6. Software guidelines.
>> +     * - if extended device-context format is used:
>> +     *   1LVL: 6, 2LVL: 15, 3LVL: 24
>> +     * - if base device-context format is used:
>> +     *   1LVL: 7, 2LVL: 16, 3LVL: 24
>> +     */
>> +    if (ctx->devid >= (1 << (depth * 9 + 6 + (dc_fmt && depth != 2)))) {
>> +        return RISCV_IOMMU_FQ_CAUSE_DDT_INVALID;
> 
> The cause should be 260 not 258.
> 
>  From the RISC-V IOMMU Architecture Spec v1.0.0 section 2.3:
> If the device_id is wider than that supported by the IOMMU mode, as determined by the following checks then stop and report "Transaction type disallowed" (cause = 260).
> a. ddtp.iommu_mode is 2LVL and DDI[2] is not 0
> b. ddtp.iommu_mode is 1LVL and either DDI[2] is not 0 or DDI[1] is not 0
> 

Changed.
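
For reference, the width expression in the quoted check can be worked through in isolation. This is only a sketch under two assumptions inferred from the code comment: depth is 0, 1 or 2 for 1LVL, 2LVL and 3LVL, and dc_fmt is 1 when the base device-context format is used. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of device-id bits accepted for a given DDT depth and format. */
unsigned devid_width_bits(unsigned depth, bool base_format)
{
    assert(depth <= 2);
    /* 6 bits at the leaf, 9 per extra level, +1 for base format
     * except in 3LVL mode, where the id is capped at 24 bits. */
    return depth * 9 + 6 + (base_format && depth != 2);
}

/* True when devid does not fit, i.e. the walk must stop with a fault. */
bool devid_too_wide(uint32_t devid, unsigned depth, bool base_format)
{
    return devid >= (UINT32_C(1) << devid_width_bits(depth, base_format));
}
```

With these assumptions the function reproduces the 6/15/24 and 7/16/24 widths listed in the comment.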

>> +    }
>> +
>> +    /* Device directory tree walk */
>> +    for (; depth-- > 0; ) {
>> +        /*
>> +         * Select device id index bits based on device directory tree level
>> +         * and device context format.
>> +         * See IOMMU Specification, Chapter 2. Data Structures.
>> +         * - if extended device-context format is used:
>> +         *   device index: [23:15][14:6][5:0]
>> +         * - if base device-context format is used:
>> +         *   device index: [23:16][15:7][6:0]
>> +         */
>> +        const int split = depth * 9 + 6 + dc_fmt;
>> +        addr |= ((ctx->devid >> split) << 3) & ~TARGET_PAGE_MASK;
>> +        if (dma_memory_read(s->target_as, addr, &de, sizeof(de),
>> +                            MEMTXATTRS_UNSPECIFIED) != MEMTX_OK) {
>> +            return RISCV_IOMMU_FQ_CAUSE_DDT_LOAD_FAULT;
>> +        }
>> +        le64_to_cpus(&de);
>> +        if (!(de & RISCV_IOMMU_DDTE_VALID)) {
>> +            /* invalid directory entry */
>> +            return RISCV_IOMMU_FQ_CAUSE_DDT_INVALID;
>> +        }
>> +        if (de & ~(RISCV_IOMMU_DDTE_PPN | RISCV_IOMMU_DDTE_VALID)) {
>> +            /* reserved bits set */
>> +            return RISCV_IOMMU_FQ_CAUSE_DDT_INVALID;
> 
> The cause should be 259 not 258.
> 
>  From RISC-V IOMMU Architecture Spec v1.0.0 section 2.3.1:
> If any bits or encoding that are reserved for future standard use are set within ddte, stop and report "DDT entry misconfigured" (cause = 259).

Changed.

> 
>> +        }
>> +        addr = PPN_PHYS(get_field(de, RISCV_IOMMU_DDTE_PPN));
>> +    }
>> +
>> +    /* index into device context entry page */
>> +    addr |= (ctx->devid * dc_len) & ~TARGET_PAGE_MASK;
>> +
>> +    memset(&dc, 0, sizeof(dc));
>> +    if (dma_memory_read(s->target_as, addr, &dc, dc_len,
>> +                        MEMTXATTRS_UNSPECIFIED) != MEMTX_OK) {
>> +        return RISCV_IOMMU_FQ_CAUSE_DDT_LOAD_FAULT;
>> +    }
>> +
>> +    /* Set translation context. */
>> +    ctx->tc = le64_to_cpu(dc.tc);
>> +    ctx->ta = le64_to_cpu(dc.ta);
>> +    ctx->msiptp = le64_to_cpu(dc.msiptp);
>> +    ctx->msi_addr_mask = le64_to_cpu(dc.msi_addr_mask);
>> +    ctx->msi_addr_pattern = le64_to_cpu(dc.msi_addr_pattern);
>> +
> According to RISC-V IOMMU Architecture spec v1.0.0 section 2.1.4, we should do some checks for the found device context.

I added a new helper to validate the device context at this point, following
section 2.1.4 steps.

>> +    if (!(ctx->tc & RISCV_IOMMU_DC_TC_V)) {
>> +        return RISCV_IOMMU_FQ_CAUSE_DDT_INVALID;
>> +    }
>> +
>> +    if (!(ctx->tc & RISCV_IOMMU_DC_TC_PDTV)) {
>> +        if (ctx->pasid != RISCV_IOMMU_NOPASID) {
>> +            /* PASID is disabled */
>> +            return RISCV_IOMMU_FQ_CAUSE_TTYPE_BLOCKED;
>> +        }
>> +        return 0;
>> +    }
>> +

(...)

>> +
>> +static void riscv_iommu_process_cq_control(RISCVIOMMUState *s)
>> +{
>> +    uint64_t base;
>> +    uint32_t ctrl_set = riscv_iommu_reg_get32(s, RISCV_IOMMU_REG_CQCSR);
>> +    uint32_t ctrl_clr;
>> +    bool enable = !!(ctrl_set & RISCV_IOMMU_CQCSR_CQEN);
>> +    bool active = !!(ctrl_set & RISCV_IOMMU_CQCSR_CQON);
>> +
>> +    if (enable && !active) {
>> +        base = riscv_iommu_reg_get64(s, RISCV_IOMMU_REG_CQB);
>> +        s->cq_mask = (2ULL << get_field(base, RISCV_IOMMU_CQB_LOG2SZ)) - 1;
>> +        s->cq_addr = PPN_PHYS(get_field(base, RISCV_IOMMU_CQB_PPN));
>> +        stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_CQT], ~s->cq_mask);
>> +        stl_le_p(&s->regs_rw[RISCV_IOMMU_REG_CQH], 0);
>> +        stl_le_p(&s->regs_rw[RISCV_IOMMU_REG_CQT], 0);
>> +        ctrl_set = RISCV_IOMMU_CQCSR_CQON;
>> +        ctrl_clr = RISCV_IOMMU_CQCSR_BUSY | RISCV_IOMMU_CQCSR_CQMF |
>> +            RISCV_IOMMU_CQCSR_CMD_ILL | RISCV_IOMMU_CQCSR_CMD_TO;
> cqcsr.fence_w_ip should be set to 0 as well.

Done.
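
As a side note on the queue setup quoted above, the index mask derived from the LOG2SZ field can be sketched on its own. This assumes the field stores log2(entries) - 1, which is what the (2ULL << field) - 1 expression implies; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Index mask for a queue whose size field holds log2(entries) - 1. */
uint32_t queue_index_mask(unsigned log2sz_minus1)
{
    assert(log2sz_minus1 < 31);
    return (uint32_t)((2ULL << log2sz_minus1) - 1);
}

/* Advance a head or tail index, wrapping at the end of the queue. */
uint32_t queue_next_index(uint32_t index, uint32_t mask)
{
    return (index + 1) & mask;
}
```

The inverted mask is what the quoted code stores into regs_ro for the tail register, so that only the valid index bits stay writable.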


>> +    } else if (!enable && active) {
>> +        stl_le_p(&s->regs_ro[RISCV_IOMMU_REG_CQT], ~0);
>> +        ctrl_set = 0;
>> +        ctrl_clr = RISCV_IOMMU_CQCSR_BUSY | RISCV_IOMMU_CQCSR_CQON;
>> +    } else {

(...)

>> +}
>> +
>> +static MemTxResult riscv_iommu_mmio_write(void *opaque, hwaddr addr,
>> +    uint64_t data, unsigned size, MemTxAttrs attrs)
>> +{
>> +    RISCVIOMMUState *s = opaque;
>> +    uint32_t regb = addr & ~3;
>> +    uint32_t busy = 0;
>> +    uint32_t exec = 0;
>> +
>> +    if (size == 0 || size > 8 || (addr & (size - 1)) != 0) {
>> +        /* Unsupported MMIO alignment or access size */
>> +        return MEMTX_ERROR;
>> +    }
>> +
>> +    if (addr + size > RISCV_IOMMU_REG_MSI_CONFIG) {
>> +        /* Unsupported MMIO access location. */
>> +        return MEMTX_ACCESS_ERROR;
>> +    }
>> +
>> +    /* Track actionable MMIO write. */
>> +    switch (regb) {
> 
> There should be a case for IPSR register.
> 
>  From RISC-V IOMMU Architecture Spec v1.0.0 section 5.18:
> If a bit in ipsr is 1 then a write of 1 to the bit transitions the bit from 1→0. If the conditions to set that bit are still present (See [IPSR_FIELDS]) or if they occur after the bit is cleared then that bit transitions again from 0→1.


A new helper to handle ipsr updates via mmio_write was created.
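
The write-1-to-clear semantics from the spec text quoted above can be sketched roughly as follows. This is not the actual helper: the function name, the still_pending argument and the four-bit mask are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: ipsr carries four interrupt-pending bits (bits 3:0). */
#define IPSR_W1C_MASK 0x0Fu

/*
 * Compute the new ipsr value after a guest write of 'data'.
 * Bits written as 1 are cleared; bits whose interrupt condition is
 * still present (still_pending) are set again right away.
 */
uint32_t ipsr_after_write(uint32_t ipsr, uint32_t data,
                          uint32_t still_pending)
{
    uint32_t cleared = ipsr & ~(data & IPSR_W1C_MASK);

    return cleared | (still_pending & IPSR_W1C_MASK);
}
```

Writing 0 to a bit, or writing 1 to a bit that is already 0, leaves that bit unchanged.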

>> +    case RISCV_IOMMU_REG_DDTP:
>> +    case RISCV_IOMMU_REG_DDTP + 4:
>> +        exec = BIT(RISCV_IOMMU_EXEC_DDTP);

(...)

>> +static void riscv_iommu_realize(DeviceState *dev, Error **errp)
>> +{
>> +    RISCVIOMMUState *s = RISCV_IOMMU(dev);
>> +
>> +    s->cap = s->version & RISCV_IOMMU_CAP_VERSION;
>> +    if (s->enable_msi) {
>> +        s->cap |= RISCV_IOMMU_CAP_MSI_FLAT | RISCV_IOMMU_CAP_MSI_MRIF;
>> +    }
>> +    /* Report QEMU target physical address space limits */
>> +    s->cap = set_field(s->cap, RISCV_IOMMU_CAP_PAS,
>> +                       TARGET_PHYS_ADDR_SPACE_BITS);
>> +
>> +    /* TODO: method to report supported PASID bits */
>> +    s->pasid_bits = 8; /* restricted to size of MemTxAttrs.pasid */
>> +    s->cap |= RISCV_IOMMU_CAP_PD8;
>> +
>> +    /* Out-of-reset translation mode: OFF (DMA disabled) BARE (passthrough) */
>> +    s->ddtp = set_field(0, RISCV_IOMMU_DDTP_MODE, s->enable_off ?
>> +                        RISCV_IOMMU_DDTP_MODE_OFF : RISCV_IOMMU_DDTP_MODE_BARE);
>> +
>> +    /* register storage */
>> +    s->regs_rw = g_new0(uint8_t, RISCV_IOMMU_REG_SIZE);
>> +    s->regs_ro = g_new0(uint8_t, RISCV_IOMMU_REG_SIZE);
>> +    s->regs_wc = g_new0(uint8_t, RISCV_IOMMU_REG_SIZE);
>> +
>> +     /* Mark all registers read-only */
>> +    memset(s->regs_ro, 0xff, RISCV_IOMMU_REG_SIZE);
>> +
>> +    /*
>> +     * Register complete MMIO space, including MSI/PBA registers.
>> +     * Note, PCIDevice implementation will add overlapping MR for MSI/PBA,
>> +     * managed directly by the PCIDevice implementation.
>> +     */
>> +    memory_region_init_io(&s->regs_mr, OBJECT(dev), &riscv_iommu_mmio_ops, s,
>> +        "riscv-iommu-regs", RISCV_IOMMU_REG_SIZE);
>> +
>> +    /* Set power-on register state */
>> +    stq_le_p(&s->regs_rw[RISCV_IOMMU_REG_CAP], s->cap);
>> +    stq_le_p(&s->regs_rw[RISCV_IOMMU_REG_FCTL], s->fctl);
> s->fctl is not initialized.

I believe the idea is to init it as zero. I'll change it to init as zero
explicitly.

>> +    stq_le_p(&s->regs_ro[RISCV_IOMMU_REG_DDTP],
>> +        ~(RISCV_IOMMU_DDTP_PPN | RISCV_IOMMU_DDTP_MODE));
>> +    stq_le_p(&s->regs_ro[RISCV_IOMMU_REG_CQB],
>> +        ~(RISCV_IOMMU_CQB_LOG2SZ | RISCV_IOMMU_CQB_PPN));
>> +    stq_le_p(&s->regs_ro[RISCV_IOMMU_REG_FQB],

(...)

>> +void riscv_iommu_pci_setup_iommu(RISCVIOMMUState *iommu, PCIBus *bus,
>> +        Error **errp)
>> +{
>> +    if (bus->iommu_ops &&
>> +        bus->iommu_ops->get_address_space == riscv_iommu_find_as) {
>> +        /* Allow multiple IOMMUs on the same PCIe bus, link known devices */
>> +        RISCVIOMMUState *last = (RISCVIOMMUState *)bus->iommu_opaque;
>> +        QLIST_INSERT_AFTER(last, iommu, iommus);
>> +    } else if (bus->iommu_ops == NULL) {
>> +        pci_setup_iommu(bus, &riscv_iommu_ops, iommu);
> The original bus->iommu_op and bus->iommu_opaque will be lost.

Not sure what you meant by 'iommu_op'. We have 'iommu_ops', which is checked
for NULL before calling pci_setup_iommu().

As for overwriting the original bus->iommu_opaque, I added an extra iommu_opaque == NULL
check. The branch becomes:

     } else if (!bus->iommu_ops && !bus->iommu_opaque) {
         pci_setup_iommu(bus, &riscv_iommu_ops, iommu);

This will guarantee that we're not overwriting any existing ops or opaque by accident.



Thanks,


Daniel




* Re: [PATCH v2 06/15] hw/riscv/virt.c: support for RISC-V IOMMU PCIDevice hotplug
  2024-03-07 16:03 ` [PATCH v2 06/15] hw/riscv/virt.c: support for RISC-V IOMMU PCIDevice hotplug Daniel Henrique Barboza
  2024-04-30  2:17   ` Frank Chang
@ 2024-05-15  6:25   ` Eric Cheng
  2024-05-15  7:16     ` Andrew Jones
  1 sibling, 1 reply; 55+ messages in thread
From: Eric Cheng @ 2024-05-15  6:25 UTC (permalink / raw)
  To: Daniel Henrique Barboza, qemu-devel
  Cc: qemu-riscv, alistair.francis, bmeng, liwei1518, zhiwei_liu,
	palmer, ajones, tjeznach

On 3/8/2024 12:03 AM, Daniel Henrique Barboza wrote:
> From: Tomasz Jeznach <tjeznach@rivosinc.com>
> 
> Generate device tree entry for riscv-iommu PCI device, along with
> mapping all PCI device identifiers to the single IOMMU device instance.
> 
> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
> ---
>   hw/riscv/virt.c | 33 ++++++++++++++++++++++++++++++++-
>   1 file changed, 32 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
> index a094af97c3..67a8267747 100644
> --- a/hw/riscv/virt.c
> +++ b/hw/riscv/virt.c
> @@ -32,6 +32,7 @@
>   #include "hw/core/sysbus-fdt.h"
>   #include "target/riscv/pmu.h"
>   #include "hw/riscv/riscv_hart.h"
> +#include "hw/riscv/iommu.h"
>   #include "hw/riscv/virt.h"
>   #include "hw/riscv/boot.h"
>   #include "hw/riscv/numa.h"
> @@ -1004,6 +1005,30 @@ static void create_fdt_virtio_iommu(RISCVVirtState *s, uint16_t bdf)
>                              bdf + 1, iommu_phandle, bdf + 1, 0xffff - bdf);
>   }
>   
> +static void create_fdt_iommu(RISCVVirtState *s, uint16_t bdf)
> +{
> +    const char comp[] = "riscv,pci-iommu";
> +    void *fdt = MACHINE(s)->fdt;
> +    uint32_t iommu_phandle;
> +    g_autofree char *iommu_node = NULL;
> +    g_autofree char *pci_node = NULL;
> +
> +    pci_node = g_strdup_printf("/soc/pci@%lx",
> +                               (long) virt_memmap[VIRT_PCIE_ECAM].base);
> +    iommu_node = g_strdup_printf("%s/iommu@%x", pci_node, bdf);
> +    iommu_phandle = qemu_fdt_alloc_phandle(fdt);
> +    qemu_fdt_add_subnode(fdt, iommu_node);
> +
> +    qemu_fdt_setprop(fdt, iommu_node, "compatible", comp, sizeof(comp));
> +    qemu_fdt_setprop_cell(fdt, iommu_node, "#iommu-cells", 1);
> +    qemu_fdt_setprop_cell(fdt, iommu_node, "phandle", iommu_phandle);
> +    qemu_fdt_setprop_cells(fdt, iommu_node, "reg",
> +                           bdf << 8, 0, 0, 0, 0);
> +    qemu_fdt_setprop_cells(fdt, pci_node, "iommu-map",
> +                           0, iommu_phandle, 0, bdf,
> +                           bdf + 1, iommu_phandle, bdf + 1, 0xffff - bdf);
> +}

Is it really necessary to add this iommu-pci device to the riscv virt machine,
rather than to some other 'physical' machine type? The virt machine already has
its virtio-iommu.
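
For readers following the iommu-map property set up in the function above, here is a small sketch of what the two entries cover. Each entry is (rid-base, iommu phandle, iommu-base, length); the struct and helper names below are hypothetical and only model the cell values, not the FDT API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct iommu_map_entry {
    uint32_t rid_base;   /* first requester id covered */
    uint32_t phandle;    /* IOMMU node phandle */
    uint32_t iommu_base; /* first id as seen by the IOMMU */
    uint32_t length;     /* number of ids covered */
};

/* Same cell values as the quoted qemu_fdt_setprop_cells() call. */
void build_iommu_map(uint16_t bdf, uint32_t phandle,
                     struct iommu_map_entry map[2])
{
    map[0] = (struct iommu_map_entry){ 0, phandle, 0, bdf };
    map[1] = (struct iommu_map_entry){ bdf + 1u, phandle, bdf + 1u,
                                       0xffffu - bdf };
}

/* True when the requester id falls inside one of the map entries. */
bool rid_is_mapped(const struct iommu_map_entry *map, size_t n,
                   uint32_t rid)
{
    for (size_t i = 0; i < n; i++) {
        if (rid >= map[i].rid_base &&
            rid - map[i].rid_base < map[i].length) {
            return true;
        }
    }
    return false;
}
```

Every requester id from 0 to 0xffff is mapped except the IOMMU's own bdf, which is skipped so the IOMMU device does not translate its own accesses.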

> +
>   static void finalize_fdt(RISCVVirtState *s)
>   {
>       uint32_t phandle = 1, irq_mmio_phandle = 1, msi_pcie_phandle = 1;
> @@ -1712,9 +1737,11 @@ static HotplugHandler *virt_machine_get_hotplug_handler(MachineState *machine,
>       MachineClass *mc = MACHINE_GET_CLASS(machine);
>   
>       if (device_is_dynamic_sysbus(mc, dev) ||
> -        object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_IOMMU_PCI)) {
> +        object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_IOMMU_PCI) ||
> +        object_dynamic_cast(OBJECT(dev), TYPE_RISCV_IOMMU_PCI)) {
>           return HOTPLUG_HANDLER(machine);
>       }
> +
>       return NULL;
>   }
>   
> @@ -1735,6 +1762,10 @@ static void virt_machine_device_plug_cb(HotplugHandler *hotplug_dev,
>       if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_IOMMU_PCI)) {
>           create_fdt_virtio_iommu(s, pci_get_bdf(PCI_DEVICE(dev)));
>       }
> +
> +    if (object_dynamic_cast(OBJECT(dev), TYPE_RISCV_IOMMU_PCI)) {
> +        create_fdt_iommu(s, pci_get_bdf(PCI_DEVICE(dev)));
> +    }
>   }
>   
>   static void virt_machine_class_init(ObjectClass *oc, void *data)




* Re: [PATCH v2 06/15] hw/riscv/virt.c: support for RISC-V IOMMU PCIDevice hotplug
  2024-05-15  6:25   ` Eric Cheng
@ 2024-05-15  7:16     ` Andrew Jones
  0 siblings, 0 replies; 55+ messages in thread
From: Andrew Jones @ 2024-05-15  7:16 UTC (permalink / raw)
  To: Eric Cheng
  Cc: Daniel Henrique Barboza, qemu-devel, qemu-riscv,
	alistair.francis, bmeng, liwei1518, zhiwei_liu, palmer, tjeznach

On Wed, May 15, 2024 at 02:25:31PM GMT, Eric Cheng wrote:
> On 3/8/2024 12:03 AM, Daniel Henrique Barboza wrote:
> > From: Tomasz Jeznach <tjeznach@rivosinc.com>
> > 
> > Generate device tree entry for riscv-iommu PCI device, along with
> > mapping all PCI device identifiers to the single IOMMU device instance.
> > 
> > Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
> > Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
> > ---
> >   hw/riscv/virt.c | 33 ++++++++++++++++++++++++++++++++-
> >   1 file changed, 32 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
> > index a094af97c3..67a8267747 100644
> > --- a/hw/riscv/virt.c
> > +++ b/hw/riscv/virt.c
> > @@ -32,6 +32,7 @@
> >   #include "hw/core/sysbus-fdt.h"
> >   #include "target/riscv/pmu.h"
> >   #include "hw/riscv/riscv_hart.h"
> > +#include "hw/riscv/iommu.h"
> >   #include "hw/riscv/virt.h"
> >   #include "hw/riscv/boot.h"
> >   #include "hw/riscv/numa.h"
> > @@ -1004,6 +1005,30 @@ static void create_fdt_virtio_iommu(RISCVVirtState *s, uint16_t bdf)
> >                              bdf + 1, iommu_phandle, bdf + 1, 0xffff - bdf);
> >   }
> > +static void create_fdt_iommu(RISCVVirtState *s, uint16_t bdf)
> > +{
> > +    const char comp[] = "riscv,pci-iommu";
> > +    void *fdt = MACHINE(s)->fdt;
> > +    uint32_t iommu_phandle;
> > +    g_autofree char *iommu_node = NULL;
> > +    g_autofree char *pci_node = NULL;
> > +
> > +    pci_node = g_strdup_printf("/soc/pci@%lx",
> > +                               (long) virt_memmap[VIRT_PCIE_ECAM].base);
> > +    iommu_node = g_strdup_printf("%s/iommu@%x", pci_node, bdf);
> > +    iommu_phandle = qemu_fdt_alloc_phandle(fdt);
> > +    qemu_fdt_add_subnode(fdt, iommu_node);
> > +
> > +    qemu_fdt_setprop(fdt, iommu_node, "compatible", comp, sizeof(comp));
> > +    qemu_fdt_setprop_cell(fdt, iommu_node, "#iommu-cells", 1);
> > +    qemu_fdt_setprop_cell(fdt, iommu_node, "phandle", iommu_phandle);
> > +    qemu_fdt_setprop_cells(fdt, iommu_node, "reg",
> > +                           bdf << 8, 0, 0, 0, 0);
> > +    qemu_fdt_setprop_cells(fdt, pci_node, "iommu-map",
> > +                           0, iommu_phandle, 0, bdf,
> > +                           bdf + 1, iommu_phandle, bdf + 1, 0xffff - bdf);
> > +}
> 
> > Is it really necessary to add this iommu-pci device to the riscv virt machine,
> > rather than to some other 'physical' machine type? The virt machine already has
> > its virtio-iommu.
>

We need both, just as the Arm virt machine has both. virtio-iommu is for
guests, but the Arm and RISCV virt machines are both also used as hosts.

Thanks,
drew



* Re: [PATCH v2 02/15] hw/riscv: add riscv-iommu-bits.h
  2024-03-07 16:03 ` [PATCH v2 02/15] hw/riscv: add riscv-iommu-bits.h Daniel Henrique Barboza
  2024-05-10 11:01   ` Frank Chang
@ 2024-05-15 10:02   ` Eric Cheng
  2024-05-15 14:28     ` Daniel Henrique Barboza
  1 sibling, 1 reply; 55+ messages in thread
From: Eric Cheng @ 2024-05-15 10:02 UTC (permalink / raw)
  To: Daniel Henrique Barboza, qemu-devel
  Cc: qemu-riscv, alistair.francis, bmeng, liwei1518, zhiwei_liu,
	palmer, ajones, tjeznach

On 3/8/2024 12:03 AM, Daniel Henrique Barboza wrote:
> From: Tomasz Jeznach <tjeznach@rivosinc.com>
> 
> This header will be used by the RISC-V IOMMU emulation to be added
> in the next patch. Due to its size it's being sent separately for
> easier review.
> 
> One thing to notice is that this header can be replaced by the future
> Linux RISC-V IOMMU driver header, which would become a linux-header we
> would import instead of keeping our own. The Linux implementation isn't
> upstream yet so for now we'll have to manage riscv-iommu-bits.h.
> 
> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
> ---
>   hw/riscv/riscv-iommu-bits.h | 335 ++++++++++++++++++++++++++++++++++++
>   1 file changed, 335 insertions(+)
>   create mode 100644 hw/riscv/riscv-iommu-bits.h
> 
> diff --git a/hw/riscv/riscv-iommu-bits.h b/hw/riscv/riscv-iommu-bits.h
> new file mode 100644
> index 0000000000..8e80b1e52a
> --- /dev/null
> +++ b/hw/riscv/riscv-iommu-bits.h
> @@ -0,0 +1,335 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright © 2022-2023 Rivos Inc.
> + * Copyright © 2023 FORTH-ICS/CARV
> + * Copyright © 2023 RISC-V IOMMU Task Group
> + *
> + * RISC-V Ziommu - Register Layout and Data Structures.

Is the term Ziommu still in use today? It cannot be found with Google. Maybe it was
just a transient term during spec development? It puzzles newcomers.
> + *
> + * Based on the IOMMU spec version 1.0, 3/2023
> + * https://github.com/riscv-non-isa/riscv-iommu
> + */
> +





* Re: [PATCH v2 02/15] hw/riscv: add riscv-iommu-bits.h
  2024-05-15 10:02   ` Eric Cheng
@ 2024-05-15 14:28     ` Daniel Henrique Barboza
  0 siblings, 0 replies; 55+ messages in thread
From: Daniel Henrique Barboza @ 2024-05-15 14:28 UTC (permalink / raw)
  To: Eric Cheng, qemu-devel
  Cc: qemu-riscv, alistair.francis, bmeng, liwei1518, zhiwei_liu,
	palmer, ajones, tjeznach



On 5/15/24 07:02, Eric Cheng wrote:
> On 3/8/2024 12:03 AM, Daniel Henrique Barboza wrote:
>> From: Tomasz Jeznach <tjeznach@rivosinc.com>
>>
>> This header will be used by the RISC-V IOMMU emulation to be added
>> in the next patch. Due to its size it's being sent separately for
>> easier review.
>>
>> One thing to notice is that this header can be replaced by the future
>> Linux RISC-V IOMMU driver header, which would become a linux-header we
>> would import instead of keeping our own. The Linux implementation isn't
>> upstream yet so for now we'll have to manage riscv-iommu-bits.h.
>>
>> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
>> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
>> ---
>>   hw/riscv/riscv-iommu-bits.h | 335 ++++++++++++++++++++++++++++++++++++
>>   1 file changed, 335 insertions(+)
>>   create mode 100644 hw/riscv/riscv-iommu-bits.h
>>
>> diff --git a/hw/riscv/riscv-iommu-bits.h b/hw/riscv/riscv-iommu-bits.h
>> new file mode 100644
>> index 0000000000..8e80b1e52a
>> --- /dev/null
>> +++ b/hw/riscv/riscv-iommu-bits.h
>> @@ -0,0 +1,335 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +/*
>> + * Copyright © 2022-2023 Rivos Inc.
>> + * Copyright © 2023 FORTH-ICS/CARV
>> + * Copyright © 2023 RISC-V IOMMU Task Group
>> + *
>> + * RISC-V Ziommu - Register Layout and Data Structures.
> 
> Is the term Ziommu still in use today? It cannot be found with Google. Maybe it was just a transient term during spec development? It puzzles newcomers.

Fair point. I'll remove any 'ziommu' references in all patches.


Thanks,

Daniel


>> + *
>> + * Based on the IOMMU spec version 1.0, 3/2023
>> + * https://github.com/riscv-non-isa/riscv-iommu
>> + */
>> +
> 
> 



* Re: [PATCH v2 03/15] hw/riscv: add RISC-V IOMMU base emulation
  2024-05-13 12:37       ` Daniel Henrique Barboza
@ 2024-05-16  7:13         ` Frank Chang
  2024-05-20 16:17           ` Daniel Henrique Barboza
  0 siblings, 1 reply; 55+ messages in thread
From: Frank Chang @ 2024-05-16  7:13 UTC (permalink / raw)
  To: Daniel Henrique Barboza
  Cc: qemu-devel, qemu-riscv, alistair.francis, bmeng, liwei1518,
	zhiwei_liu, palmer, ajones, tjeznach, Sebastien Boeuf


On Mon, May 13, 2024 at 8:37 PM Daniel Henrique Barboza <
dbarboza@ventanamicro.com> wrote:

> Hi Frank,
>
>
> On 5/8/24 08:15, Daniel Henrique Barboza wrote:
> > Hi Frank,
> >
> > I'll reply with what I've done so far. Still missing some stuff:
> >
> > On 5/2/24 08:37, Frank Chang wrote:
> >> Hi Daniel,
> >>
> >> Daniel Henrique Barboza <dbarboza@ventanamicro.com> wrote on Fri, Mar 8, 2024 at 12:04 AM:
> >>>
> >>> From: Tomasz Jeznach <tjeznach@rivosinc.com>
> >>>
> >>> The RISC-V IOMMU specification is now ratified as per the RISC-V
> >>> international process. The latest frozen specification can be found
> >>> at:
> >>>
> >>> https://github.com/riscv-non-isa/riscv-iommu/releases/download/v1.0/riscv-iommu.pdf
> >>>
> >>> Add the foundation of the device emulation for RISC-V IOMMU, which
> >>> includes an IOMMU that has no capabilities but MSI interrupt support and
> >>> fault queue interfaces. We'll add more features incrementally in the
> >>> next patches.
> >>>
> >>> Co-developed-by: Sebastien Boeuf <seb@rivosinc.com>
> >>> Signed-off-by: Sebastien Boeuf <seb@rivosinc.com>
> >>> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
> >>> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
> >>> ---
> >>>   hw/riscv/Kconfig         |    4 +
>
> (...)
>
> >>> +
> >>> +    s->iommus.le_next = NULL;
> >>> +    s->iommus.le_prev = NULL;
> >>> +    QLIST_INIT(&s->spaces);
> >>> +    qemu_cond_init(&s->core_cond);
> >>> +    qemu_mutex_init(&s->core_lock);
> >>> +    qemu_spin_init(&s->regs_lock);
> >>> +    qemu_thread_create(&s->core_proc, "riscv-iommu-core",
> >>> +        riscv_iommu_core_proc, s, QEMU_THREAD_JOINABLE);
> >>
> >> In our experience, using QEMU thread increases the latency of command
> >> queue processing,
> >> which leads to the potential IOMMU fence timeout in the Linux driver
> >> when using IOMMU with KVM,
> >> e.g. booting the guest Linux.
> >>
> >> Is it possible to remove the thread from the IOMMU just like ARM, AMD,
> >> and Intel IOMMU models?
> >
> > Interesting. We've been using this emulation internally in Ventana, with
> > KVM and VFIO, and didn't experience this issue. Drew is on CC and can talk
> > more about it.
> >
> > That said, I don't mind this change, assuming it's feasible to make it for this
> > first version. I'll need to check how other IOMMUs are doing it.
>
>
> I removed the threading and it seems to be working fine without it. I'll
> commit this change for v3.
>
> >
> >
> >
> >>
> >>> +}
> >>> +
> >
> > (...)
> >
> >>> +
> >>> +static AddressSpace *riscv_iommu_find_as(PCIBus *bus, void *opaque, int devfn)
> >>> +{
> >>> +    RISCVIOMMUState *s = (RISCVIOMMUState *) opaque;
> >>> +    PCIDevice *pdev = pci_find_device(bus, pci_bus_num(bus), devfn);
> >>> +    AddressSpace *as = NULL;
> >>> +
> >>> +    if (pdev && pci_is_iommu(pdev)) {
> >>> +        return s->target_as;
> >>> +    }
> >>> +
> >>> +    /* Find first registered IOMMU device */
> >>> +    while (s->iommus.le_prev) {
> >>> +        s = *(s->iommus.le_prev);
> >>> +    }
> >>> +
> >>> +    /* Find first matching IOMMU */
> >>> +    while (s != NULL && as == NULL) {
> >>> +        as = riscv_iommu_space(s, PCI_BUILD_BDF(pci_bus_num(bus), devfn));
> >>
> >> For pci_bus_num(),
> >> riscv_iommu_find_as() can be called at the very early stage
> >> where software has no chance to enumerate the bus numbers.
>
> I investigated and this doesn't seem to be a problem. This function is
> called at the last step of both riscv_iommu_pci_realize() and
> riscv_iommu_sys_realize(), and by that time the pci_bus_num() is already
> assigned. Other IOMMUs use pci_bus_num() in their own get_address_space()
> callbacks like this too.
>

Hi Daniel,

IIUC, pci_bus_num() by default is assigned to pcibus_num():

static int pcibus_num(PCIBus *bus)
{
    if (pci_bus_is_root(bus)) {
        return 0; /* pci host bridge */
    }
    return bus->parent_dev->config[PCI_SECONDARY_BUS];
}

If the bus is not the root bus, it reads the secondary bus number
(PCI_SECONDARY_BUS) field from the PCI configuration space of the bus's
parent device.
This field is programmed by SW during PCIe enumeration.
But I don't think SW has a chance to run before riscv_iommu_sys_realize()
is called, since realize happens well before the CPUs start executing,
unless the RISC-V IOMMU is hot-plugged.
Even if the RISC-V IOMMU is hot-plugged, I think riscv_iommu_sys_realize()
is still called before SW is aware of the existence of the IOMMU in the
PCI topology tree.
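
To illustrate the concern, here is a tiny standalone model of the lookup. The mock_* types and functions are hypothetical stand-ins, not QEMU APIs; only the 0x19 offset of the secondary bus number register and the (bus << 8) | devfn layout are taken from PCI itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_SECONDARY_BUS 0x19 /* offset in a type 1 config header */

struct mock_bridge {
    uint8_t config[256];       /* config space of the parent bridge */
};

struct mock_bus {
    struct mock_bridge *parent_dev; /* NULL for the root bus */
};

/* Mirrors the pcibus_num() logic quoted above. */
int mock_pcibus_num(const struct mock_bus *bus)
{
    if (bus->parent_dev == NULL) {
        return 0;              /* pci host bridge */
    }
    return bus->parent_dev->config[PCI_SECONDARY_BUS];
}

/* Same layout as a bus/devfn requester id. */
uint16_t mock_build_bdf(int bus_num, uint8_t devfn)
{
    return (uint16_t)((bus_num << 8) | devfn);
}
```

Until software programs the bridge, a device behind it is looked up with bus number 0, so a BDF computed that early would not match the one used after enumeration.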

Do you think this matches your observation?

Regards,
Frank Chang


>
>
> Thanks,
>
>
> Daniel
>
>
> >
> > I'll see how other IOMMUs are handling their iommu_find_as()
> >
> >
> > Thanks,
> >
> >
> > Daniel
> >
> >
> >>
> >>
> >>
> >>
> >>> +        s = s->iommus.le_next;
> >>> +    }
> >>> +
> >>> +    return as ? as : &address_space_memory;
> >>> +}
> >>> +
> >>> +static const PCIIOMMUOps riscv_iommu_ops = {
> >>> +    .get_address_space = riscv_iommu_find_as,
> >>> +};
> >>> +
> >>> +void riscv_iommu_pci_setup_iommu(RISCVIOMMUState *iommu, PCIBus *bus,
> >>> +        Error **errp)
> >>> +{
> >>> +    if (bus->iommu_ops &&
> >>> +        bus->iommu_ops->get_address_space == riscv_iommu_find_as) {
> >>> +        /* Allow multiple IOMMUs on the same PCIe bus, link known devices */
> >>> +        RISCVIOMMUState *last = (RISCVIOMMUState *)bus->iommu_opaque;
> >>> +        QLIST_INSERT_AFTER(last, iommu, iommus);
> >>> +    } else if (bus->iommu_ops == NULL) {
> >>> +        pci_setup_iommu(bus, &riscv_iommu_ops, iommu);
> >>> +    } else {
> >>> +        error_setg(errp, "can't register secondary IOMMU for PCI bus #%d",
> >>> +            pci_bus_num(bus));
> >>> +    }
> >>> +}
> >>> +
> >>> +static int riscv_iommu_memory_region_index(IOMMUMemoryRegion *iommu_mr,
> >>> +    MemTxAttrs attrs)
> >>> +{
> >>> +    return attrs.unspecified ? RISCV_IOMMU_NOPASID : (int)attrs.pasid;
> >>> +}
> >>> +
> >>> +static int riscv_iommu_memory_region_index_len(IOMMUMemoryRegion *iommu_mr)
> >>> +{
> >>> +    RISCVIOMMUSpace *as = container_of(iommu_mr, RISCVIOMMUSpace, iova_mr);
> >>> +    return 1 << as->iommu->pasid_bits;
> >>> +}
> >>> +
> >>> +static void riscv_iommu_memory_region_init(ObjectClass *klass, void *data)
> >>> +{
> >>> +    IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_CLASS(klass);
> >>> +
> >>> +    imrc->translate = riscv_iommu_memory_region_translate;
> >>> +    imrc->notify_flag_changed = riscv_iommu_memory_region_notify;
> >>> +    imrc->attrs_to_index = riscv_iommu_memory_region_index;
> >>> +    imrc->num_indexes = riscv_iommu_memory_region_index_len;
> >>> +}
> >>> +
> >>> +static const TypeInfo riscv_iommu_memory_region_info = {
> >>> +    .parent = TYPE_IOMMU_MEMORY_REGION,
> >>> +    .name = TYPE_RISCV_IOMMU_MEMORY_REGION,
> >>> +    .class_init = riscv_iommu_memory_region_init,
> >>> +};
> >>> +
> >>> +static void riscv_iommu_register_mr_types(void)
> >>> +{
> >>> +    type_register_static(&riscv_iommu_memory_region_info);
> >>> +    type_register_static(&riscv_iommu_info);
> >>> +}
> >>> +
> >>> +type_init(riscv_iommu_register_mr_types);
> >>> diff --git a/hw/riscv/riscv-iommu.h b/hw/riscv/riscv-iommu.h
> >>> new file mode 100644
> >>> index 0000000000..6f740de690
> >>> --- /dev/null
> >>> +++ b/hw/riscv/riscv-iommu.h
> >>> @@ -0,0 +1,141 @@
> >>> +/*
> >>> + * QEMU emulation of an RISC-V IOMMU (Ziommu)
> >>> + *
> >>> + * Copyright (C) 2022-2023 Rivos Inc.
> >>> + *
> >>> + * This program is free software; you can redistribute it and/or modify
> >>> + * it under the terms of the GNU General Public License as published by
> >>> + * the Free Software Foundation; either version 2 of the License.
> >>> + *
> >>> + * This program is distributed in the hope that it will be useful,
> >>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> >>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >>> + * GNU General Public License for more details.
> >>> + *
> >>> + * You should have received a copy of the GNU General Public License along
> >>> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> >>> + */
> >>> +
> >>> +#ifndef HW_RISCV_IOMMU_STATE_H
> >>> +#define HW_RISCV_IOMMU_STATE_H
> >>> +
> >>> +#include "qemu/osdep.h"
> >>> +#include "qom/object.h"
> >>> +
> >>> +#include "hw/riscv/iommu.h"
> >>> +
> >>> +struct RISCVIOMMUState {
> >>> +    /*< private >*/
> >>> +    DeviceState parent_obj;
> >>> +
> >>> +    /*< public >*/
> >>> +    uint32_t version;     /* Reported interface version number */
> >>> +    uint32_t pasid_bits;  /* process identifier width */
> >>> +    uint32_t bus;         /* PCI bus mapping for non-root endpoints */
> >>> +
> >>> +    uint64_t cap;         /* IOMMU supported capabilities */
> >>> +    uint64_t fctl;        /* IOMMU enabled features */
> >>> +
> >>> +    bool enable_off;      /* Enable out-of-reset OFF mode (DMA disabled) */
> >>> +    bool enable_msi;      /* Enable MSI remapping */
> >>> +
> >>> +    /* IOMMU Internal State */
> >>> +    uint64_t ddtp;        /* Validated Device Directory Tree Root Pointer */
> >>> +
> >>> +    dma_addr_t cq_addr;   /* Command queue base physical address */
> >>> +    dma_addr_t fq_addr;   /* Fault/event queue base physical address */
> >>> +    dma_addr_t pq_addr;   /* Page request queue base physical address */
> >>> +
> >>> +    uint32_t cq_mask;     /* Command queue index bit mask */
> >>> +    uint32_t fq_mask;     /* Fault/event queue index bit mask */
> >>> +    uint32_t pq_mask;     /* Page request queue index bit mask */
> >>> +
> >>> +    /* interrupt notifier */
> >>> +    void (*notify)(RISCVIOMMUState *iommu, unsigned vector);
> >>> +
> >>> +    /* IOMMU State Machine */
> >>> +    QemuThread core_proc; /* Background processing thread */
> >>> +    QemuMutex core_lock;  /* Global IOMMU lock, used for cache/regs updates */
> >>> +    QemuCond core_cond;   /* Background processing wake up signal */
> >>> +    unsigned core_exec;   /* Processing thread execution actions */
> >>> +
> >>> +    /* IOMMU target address space */
> >>> +    AddressSpace *target_as;
> >>> +    MemoryRegion *target_mr;
> >>> +
> >>> +    /* MSI / MRIF access trap */
> >>> +    AddressSpace trap_as;
> >>> +    MemoryRegion trap_mr;
> >>> +
> >>> +    GHashTable *ctx_cache;          /* Device translation Context Cache */
> >>> +
> >>> +    /* MMIO Hardware Interface */
> >>> +    MemoryRegion regs_mr;
> >>> +    QemuSpin regs_lock;
> >>> +    uint8_t *regs_rw;  /* register state (user write) */
> >>> +    uint8_t *regs_wc;  /* write-1-to-clear mask */
> >>> +    uint8_t *regs_ro;  /* read-only mask */
> >>> +
> >>> +    QLIST_ENTRY(RISCVIOMMUState) iommus;
> >>> +    QLIST_HEAD(, RISCVIOMMUSpace) spaces;
> >>> +};
> >>> +
> >>> +void riscv_iommu_pci_setup_iommu(RISCVIOMMUState *iommu, PCIBus *bus,
> >>> +         Error **errp);
> >>> +
> >>> +/* private helpers */
> >>> +
> >>> +/* Register helper functions */
> >>> +static inline uint32_t riscv_iommu_reg_mod32(RISCVIOMMUState *s,
> >>> +    unsigned idx, uint32_t set, uint32_t clr)
> >>> +{
> >>> +    uint32_t val;
> >>> +    qemu_spin_lock(&s->regs_lock);
> >>> +    val = ldl_le_p(s->regs_rw + idx);
> >>> +    stl_le_p(s->regs_rw + idx, (val & ~clr) | set);
> >>> +    qemu_spin_unlock(&s->regs_lock);
> >>> +    return val;
> >>> +}
> >>> +
> >>> +static inline void riscv_iommu_reg_set32(RISCVIOMMUState *s,
> >>> +    unsigned idx, uint32_t set)
> >>> +{
> >>> +    qemu_spin_lock(&s->regs_lock);
> >>> +    stl_le_p(s->regs_rw + idx, set);
> >>> +    qemu_spin_unlock(&s->regs_lock);
> >>> +}
> >>> +
> >>> +static inline uint32_t riscv_iommu_reg_get32(RISCVIOMMUState *s,
> >>> +    unsigned idx)
> >>> +{
> >>> +    return ldl_le_p(s->regs_rw + idx);
> >>> +}
> >>> +
> >>> +static inline uint64_t riscv_iommu_reg_mod64(RISCVIOMMUState *s,
> >>> +    unsigned idx, uint64_t set, uint64_t clr)
> >>> +{
> >>> +    uint64_t val;
> >>> +    qemu_spin_lock(&s->regs_lock);
> >>> +    val = ldq_le_p(s->regs_rw + idx);
> >>> +    stq_le_p(s->regs_rw + idx, (val & ~clr) | set);
> >>> +    qemu_spin_unlock(&s->regs_lock);
> >>> +    return val;
> >>> +}
> >>> +
> >>> +static inline void riscv_iommu_reg_set64(RISCVIOMMUState *s,
> >>> +    unsigned idx, uint64_t set)
> >>> +{
> >>> +    qemu_spin_lock(&s->regs_lock);
> >>> +    stq_le_p(s->regs_rw + idx, set);
> >>> +    qemu_spin_unlock(&s->regs_lock);
> >>> +}
> >>> +
> >>> +static inline uint64_t riscv_iommu_reg_get64(RISCVIOMMUState *s,
> >>> +    unsigned idx)
> >>> +{
> >>> +    return ldq_le_p(s->regs_rw + idx);
> >>> +}
> >>> +
> >>> +
> >>> +
> >>> +#endif
> >>> diff --git a/hw/riscv/trace-events b/hw/riscv/trace-events
> >>> new file mode 100644
> >>> index 0000000000..42a97caffa
> >>> --- /dev/null
> >>> +++ b/hw/riscv/trace-events
> >>> @@ -0,0 +1,11 @@
> >>> +# See documentation at docs/devel/tracing.rst
> >>> +
> >>> +# riscv-iommu.c
> >>> +riscv_iommu_new(const char *id, unsigned b, unsigned d, unsigned f) "%s: device attached %04x:%02x.%d"
> >>> +riscv_iommu_flt(const char *id, unsigned b, unsigned d, unsigned f, uint64_t reason, uint64_t iova) "%s: fault %04x:%02x.%u reason: 0x%"PRIx64" iova: 0x%"PRIx64
> >>> +riscv_iommu_pri(const char *id, unsigned b, unsigned d, unsigned f, uint64_t iova) "%s: page request %04x:%02x.%u iova: 0x%"PRIx64
> >>> +riscv_iommu_dma(const char *id, unsigned b, unsigned d, unsigned f, unsigned pasid, const char *dir, uint64_t iova, uint64_t phys) "%s: translate %04x:%02x.%u #%u %s 0x%"PRIx64" -> 0x%"PRIx64
> >>> +riscv_iommu_msi(const char *id, unsigned b, unsigned d, unsigned f, uint64_t iova, uint64_t phys) "%s: translate %04x:%02x.%u MSI 0x%"PRIx64" -> 0x%"PRIx64
> >>> +riscv_iommu_cmd(const char *id, uint64_t l, uint64_t u) "%s: command 0x%"PRIx64" 0x%"PRIx64
> >>> +riscv_iommu_notifier_add(const char *id) "%s: dev-iotlb notifier added"
> >>> +riscv_iommu_notifier_del(const char *id) "%s: dev-iotlb notifier removed"
> >>> diff --git a/hw/riscv/trace.h b/hw/riscv/trace.h
> >>> new file mode 100644
> >>> index 0000000000..b88504b750
> >>> --- /dev/null
> >>> +++ b/hw/riscv/trace.h
> >>> @@ -0,0 +1,2 @@
> >>> +#include "trace/trace-hw_riscv.h"
> >>> +
> >>> diff --git a/include/hw/riscv/iommu.h b/include/hw/riscv/iommu.h
> >>> new file mode 100644
> >>> index 0000000000..403b365893
> >>> --- /dev/null
> >>> +++ b/include/hw/riscv/iommu.h
> >>> @@ -0,0 +1,36 @@
> >>> +/*
> >>> + * QEMU emulation of an RISC-V IOMMU (Ziommu)
> >>> + *
> >>> + * Copyright (C) 2022-2023 Rivos Inc.
> >>> + *
> >>> + * This program is free software; you can redistribute it and/or modify
> >>> + * it under the terms of the GNU General Public License as published by
> >>> + * the Free Software Foundation; either version 2 of the License.
> >>> + *
> >>> + * This program is distributed in the hope that it will be useful,
> >>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> >>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >>> + * GNU General Public License for more details.
> >>> + *
> >>> + * You should have received a copy of the GNU General Public License along
> >>> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> >>> + */
> >>> +
> >>> +#ifndef HW_RISCV_IOMMU_H
> >>> +#define HW_RISCV_IOMMU_H
> >>> +
> >>> +#include "qemu/osdep.h"
> >>> +#include "qom/object.h"
> >>> +
> >>> +#define TYPE_RISCV_IOMMU "riscv-iommu"
> >>> +OBJECT_DECLARE_SIMPLE_TYPE(RISCVIOMMUState, RISCV_IOMMU)
> >>> +typedef struct RISCVIOMMUState RISCVIOMMUState;
> >>> +
> >>> +#define TYPE_RISCV_IOMMU_MEMORY_REGION "riscv-iommu-mr"
> >>> +typedef struct RISCVIOMMUSpace RISCVIOMMUSpace;
> >>> +
> >>> +#define TYPE_RISCV_IOMMU_PCI "riscv-iommu-pci"
> >>> +OBJECT_DECLARE_SIMPLE_TYPE(RISCVIOMMUStatePci, RISCV_IOMMU_PCI)
> >>> +typedef struct RISCVIOMMUStatePci RISCVIOMMUStatePci;
> >>> +
> >>> +#endif
> >>> diff --git a/meson.build b/meson.build
> >>> index c59ca496f2..75e56f3282 100644
> >>> --- a/meson.build
> >>> +++ b/meson.build
> >>> @@ -3361,6 +3361,7 @@ if have_system
> >>>       'hw/rdma',
> >>>       'hw/rdma/vmw',
> >>>       'hw/rtc',
> >>> +    'hw/riscv',
> >>>       'hw/s390x',
> >>>       'hw/scsi',
> >>>       'hw/sd',
> >>> --
> >>> 2.43.2
> >>>
> >>>
>


* Re: [PATCH v2 15/15] hw/misc: EDU: add ATS/PRI capability
  2024-05-07 15:32   ` Frank Chang
@ 2024-05-16 13:59     ` Daniel Henrique Barboza
  0 siblings, 0 replies; 55+ messages in thread
From: Daniel Henrique Barboza @ 2024-05-16 13:59 UTC (permalink / raw)
  To: Frank Chang
  Cc: qemu-devel, qemu-riscv, alistair.francis, bmeng, liwei1518,
	zhiwei_liu, palmer, ajones, tjeznach

Hi Frank!

On 5/7/24 12:32, Frank Chang wrote:
> Hi Daniel,
> 
> Daniel Henrique Barboza <dbarboza@ventanamicro.com> wrote on Fri, Mar 8, 2024 at 12:05 AM:
>>
>> From: Tomasz Jeznach <tjeznach@rivosinc.com>
>>
>> Mimic ATS interface with IOMMU translate request with IOMMU_NONE.  If
>> mapping exists, translation service will return current permission
>> flags, otherwise will report no permissions.
>>
>> Implement and register the IOMMU memory region listener to be notified
>> whenever an ATS invalidation request is sent from the IOMMU.
>>
>> Implement and register the IOMMU memory region listener to be notified
>> whenever an ATS page request group response is triggered from the IOMMU.
>>
>> Introduces a retry mechanism to the timer design so that any page that's
>> not available should be only accessed after the PRGR notification has
>> been received.
>>
>> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
>> Signed-off-by: Sebastien Boeuf <seb@rivosinc.com>
>> ---
>>   hw/misc/edu.c | 258 ++++++++++++++++++++++++++++++++++++++++++++++++--
>>   1 file changed, 251 insertions(+), 7 deletions(-)

(...)


>> +
>>   static void pci_edu_realize(PCIDevice *pdev, Error **errp)
>>   {
>>       EduState *edu = EDU(pdev);
>> +    AddressSpace *dma_as = NULL;
>>       uint8_t *pci_conf = pdev->config;
>>       int pos;
>>
>> @@ -390,9 +603,28 @@ static void pci_edu_realize(PCIDevice *pdev, Error **errp)
>>       pos = PCI_CONFIG_SPACE_SIZE;
>>       if (edu->enable_pasid) {
>>           /* PCIe Spec 7.8.9 PASID Extended Capability Structure */
>> -        pcie_add_capability(pdev, 0x1b, 1, pos, 8);
>> +        pcie_add_capability(pdev, PCI_EXT_CAP_ID_PASID, 1, pos, 8);
> 
> This should be included in the 14th commit.
> 
>>           pci_set_long(pdev->config + pos + 4, 0x00001400);
>>           pci_set_long(pdev->wmask + pos + 4,  0xfff0ffff);
>> +        pos += 8;
>> +
>> +        /* ATS Capability */
>> +        pcie_ats_init(pdev, pos, true);
>> +        pos += PCI_EXT_CAP_ATS_SIZEOF;
>> +
>> +        /* PRI Capability */
>> +        pcie_add_capability(pdev, PCI_EXT_CAP_ID_PRI, 1, pos, 16);
>> +        /* PRI STOPPED */
>> +        pci_set_long(pdev->config + pos +  4, 0x01000000);
>> +        /* PRI ENABLE bit writable */
>> +        pci_set_long(pdev->wmask  + pos +  4, 0x00000001);
>> +        /* PRI Capacity Supported */
>> +        pci_set_long(pdev->config + pos +  8, 0x00000080);
>> +        /* PRI Allocations Allowed, 32 */
>> +        pci_set_long(pdev->config + pos + 12, 0x00000040);
>> +        pci_set_long(pdev->wmask  + pos + 12, 0x0000007f);
> 
> We should use the defines declared in
> include/standard-headers/linux/pci_regs.h for readability,
> though some of the bitfields are not defined in the header file.
> 
> Regards,
> Frank Chang
> 
>> +
>> +        pos += 8;
>>       }

I'll reply here for both patches 14 and 15.

I changed the code to use the defines from pci_regs.h wherever the header
provides one. Where it doesn't, I added a comment on the line explaining the
value, like the existing comments above.
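
To illustrate why the defines help, here is a standalone sketch (not the actual
edu.c change). The 0x01000000 written at pos + 4 covers both the 16-bit PRI
control register at offset 0x04 and the 16-bit PRI status register at offset
0x06, so it is the 'stopped' status bit shifted into the upper half of the
dword. The macro values are local copies of what pci_regs.h is expected to
provide and should be checked against the header; pri_ctrl_status_dword() and
pri_selfcheck() are hypothetical helpers made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies for illustration; the real ones live in pci_regs.h. */
#define PCI_PRI_CTRL            0x04    /* PRI control register offset */
#define PCI_PRI_CTRL_ENABLE     0x0001  /* Enable */
#define PCI_PRI_STATUS          0x06    /* PRI status register offset */
#define PCI_PRI_STATUS_STOPPED  0x0100  /* PRI Stopped */

/*
 * Hypothetical helper: build the little-endian dword that spans the
 * control register (low 16 bits) and the status register (high 16 bits).
 */
static inline uint32_t pri_ctrl_status_dword(uint16_t ctrl, uint16_t status)
{
    return ((uint32_t)status << 16) | ctrl;
}

/* The two magic numbers from the patch, expressed with named bits. */
static inline void pri_selfcheck(void)
{
    /* config value: PRI stopped, control cleared */
    assert(pri_ctrl_status_dword(0, PCI_PRI_STATUS_STOPPED) == 0x01000000);
    /* write mask: only the enable bit is writable */
    assert(pri_ctrl_status_dword(PCI_PRI_CTRL_ENABLE, 0) == 0x00000001);
}
```

With a helper like this, the two pci_set_long() calls at pos + 4 could take
named bits instead of raw hex values.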

I'll also add doc changes for each new feature added.

All this said, I'm inclined to remove these 2 patches from the series. They are
a way to experiment with the riscv-iommu impl, but not a crucial part of it.
The changes I made so far, based on your review, were uploaded here:


https://gitlab.com/danielhb/qemu/-/commits/edu_pasid_v3


Thanks,

Daniel


>>
>>       if (msi_init(pdev, 0, 1, true, false, errp)) {
>> @@ -409,12 +641,24 @@ static void pci_edu_realize(PCIDevice *pdev, Error **errp)
>>       memory_region_init_io(&edu->mmio, OBJECT(edu), &edu_mmio_ops, edu,
>>                       "edu-mmio", 1 * MiB);
>>       pci_register_bar(pdev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &edu->mmio);
>> +
>> +    /* Register IOMMU listener */
>> +    edu->iommu_listener = (MemoryListener) {
>> +        .name = "edu-iommu",
>> +        .region_add = edu_iommu_region_add,
>> +        .region_del = edu_iommu_region_del,
>> +    };
>> +
>> +    dma_as = pci_device_iommu_address_space(pdev);
>> +    memory_listener_register(&edu->iommu_listener, dma_as);
>>   }
>>
>>   static void pci_edu_uninit(PCIDevice *pdev)
>>   {
>>       EduState *edu = EDU(pdev);
>>
>> +    memory_listener_unregister(&edu->iommu_listener);
>> +
>>       qemu_mutex_lock(&edu->thr_mutex);
>>       edu->stopping = true;
>>       qemu_mutex_unlock(&edu->thr_mutex);
>> --
>> 2.43.2
>>
>>



* Re: [PATCH v2 09/15] hw/riscv/riscv-iommu: add s-stage and g-stage support
  2024-05-10 11:14     ` Andrew Jones
@ 2024-05-16 19:41       ` Daniel Henrique Barboza
  0 siblings, 0 replies; 55+ messages in thread
From: Daniel Henrique Barboza @ 2024-05-16 19:41 UTC (permalink / raw)
  To: Andrew Jones, Frank Chang
  Cc: qemu-devel, qemu-riscv, alistair.francis, bmeng, liwei1518,
	zhiwei_liu, palmer, tjeznach



On 5/10/24 08:14, Andrew Jones wrote:
> On Fri, May 10, 2024 at 06:36:51PM GMT, Frank Chang wrote:
> ...
>>>   static int riscv_iommu_spa_fetch(RISCVIOMMUState *s, RISCVIOMMUContext *ctx,
>>> -    IOMMUTLBEntry *iotlb)
>>> +    IOMMUTLBEntry *iotlb, bool gpa)
>>>   {
>>> +    dma_addr_t addr, base;
>>> +    uint64_t satp, gatp, pte;
>>> +    bool en_s, en_g;
>>> +    struct {
>>> +        unsigned char step;
>>> +        unsigned char levels;
>>> +        unsigned char ptidxbits;
>>> +        unsigned char ptesize;
>>> +    } sc[2];
>>> +    /* Translation stage phase */
>>> +    enum {
>>> +        S_STAGE = 0,
>>> +        G_STAGE = 1,
>>> +    } pass;
>>> +
>>> +    satp = get_field(ctx->satp, RISCV_IOMMU_ATP_MODE_FIELD);
>>> +    gatp = get_field(ctx->gatp, RISCV_IOMMU_ATP_MODE_FIELD);
>>> +
>>> +    en_s = satp != RISCV_IOMMU_DC_FSC_MODE_BARE && !gpa;
>>> +    en_g = gatp != RISCV_IOMMU_DC_IOHGATP_MODE_BARE;
>>> +
>>>       /* Early check for MSI address match when IOVA == GPA */
>>> -    if (iotlb->perm & IOMMU_WO &&
>>> +    if (!en_s && (iotlb->perm & IOMMU_WO) &&
>>
>> I'm wondering do we need to check "en_s" for MSI writes?
>>
>> IOMMU spec Section 2.3.3. Process to translate addresses of MSIs says:
>> "Determine if the address A is an access to a virtual interrupt file
>> as specified in Section 2.1.3.6."
>>
>> and Section 2.1.3.6 says:
>>
>> "An incoming memory access made by a device is recognized as
>> an access to a virtual interrupt file if the destination guest physical page
>> matches the supplied address pattern in all bit positions that are zeros
>> in the supplied address mask. In detail, a memory access to
>> guest physical address A is recognized as an access to a virtual
>> interrupt file’s
>> memory-mapped page if:
>> (A >> 12) & ~msi_addr_mask = (msi_addr_pattern & ~msi_addr_mask)"
>>
>> Is checking the address pattern sufficient enough to determine
>> the address is an MSI to a virtual interrupt file?
>>
> 
> I think so. In fact, I've removed that en_s check on our internal build in
> order to get things working for my irqbypass work, as we can do device
> assignment with VFIO with only S-stage enabled.

I'll fix this up with the following change:

  static int riscv_iommu_spa_fetch(RISCVIOMMUState *s, RISCVIOMMUContext *ctx,
-    IOMMUTLBEntry *iotlb, bool gpa)
+    IOMMUTLBEntry *iotlb)
  {
      dma_addr_t addr, base;
      uint64_t satp, gatp, pte;
@@ -238,11 +237,11 @@ static int riscv_iommu_spa_fetch(RISCVIOMMUState *s, RISCVIOMMUContext *ctx,
      satp = get_field(ctx->satp, RISCV_IOMMU_ATP_MODE_FIELD);
      gatp = get_field(ctx->gatp, RISCV_IOMMU_ATP_MODE_FIELD);
  
-    en_s = satp != RISCV_IOMMU_DC_FSC_MODE_BARE && !gpa;
+    en_s = satp != RISCV_IOMMU_DC_FSC_MODE_BARE;
      en_g = gatp != RISCV_IOMMU_DC_IOHGATP_MODE_BARE;
  
      /* Early check for MSI address match when IOVA == GPA */
-    if (!en_s && (iotlb->perm & IOMMU_WO) &&
+    if ((iotlb->perm & IOMMU_WO) &&
          riscv_iommu_msi_check(s, ctx, iotlb->iova)) {
          iotlb->target_as = &s->trap_as;
          iotlb->translated_addr = iotlb->iova;
@@ -1203,7 +1202,7 @@ static int riscv_iommu_translate(RISCVIOMMUState *s, RISCVIOMMUContext *ctx,
      }
  
      /* Translate using device directory / page table information. */
-    fault = riscv_iommu_spa_fetch(s, ctx, iotlb, false);
+    fault = riscv_iommu_spa_fetch(s, ctx, iotlb);
  
      if (!fault && iotlb->target_as == &s->trap_as) {
          /* Do not cache trapped MSI translations */

'gpa' is eliminated since the only caller of riscv_iommu_spa_fetch() always
passed it as 'false'. The boolean was used only to calculate en_s as
"&& !gpa", so '!gpa' was always 'true' and had no impact on en_s. My
understanding is that 'gpa' was a leftover from an early prototype of the
implementation that ended up not being used.

As for the MSI check, we won't skip translation if satp is bare (!en_s) because
we might be using just stage2 for a guest, thus en_s is removed from the
conditional. As Frank said, this change also complies with the spec since we don't
need to check satp to determine if the address is an MSI to a virtual interrupt
file.
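
For reference, the address match quoted from Section 2.1.3.6 can be sketched as
below. This is only an illustration of the formula, not the code of
riscv_iommu_msi_check(); msi_addr_match() and msi_selfcheck() are hypothetical
names, and the pattern/mask values in the checks are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: a guest physical address A targets a virtual
 * interrupt file when
 *   (A >> 12) & ~msi_addr_mask == (msi_addr_pattern & ~msi_addr_mask)
 * Note that satp does not take part in the decision.
 */
static inline bool msi_addr_match(uint64_t gpa, uint64_t msi_addr_pattern,
                                  uint64_t msi_addr_mask)
{
    uint64_t ppn = gpa >> 12;   /* guest physical page number */

    return (ppn & ~msi_addr_mask) == (msi_addr_pattern & ~msi_addr_mask);
}

/* Made-up values: the mask ignores the low 8 bits of the page number. */
static inline void msi_selfcheck(void)
{
    assert(msi_addr_match(0x28001000ULL, 0x28000, 0xff));
    assert(!msi_addr_match(0x30001000ULL, 0x28000, 0xff));
}
```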

And, last but not least, this change doesn't break my KVM VFIO passthrough
test case :) I'll document more about the test case I'm using in the v3 cover
letter.


Thanks,

Daniel


> 
> Thanks,
> drew



* Re: [PATCH v2 08/15] hw/riscv/riscv-iommu: add Address Translation Cache (IOATC)
  2024-05-08  7:26   ` Frank Chang
@ 2024-05-16 21:45     ` Daniel Henrique Barboza
  0 siblings, 0 replies; 55+ messages in thread
From: Daniel Henrique Barboza @ 2024-05-16 21:45 UTC (permalink / raw)
  To: Frank Chang
  Cc: qemu-devel, qemu-riscv, alistair.francis, bmeng, liwei1518,
	zhiwei_liu, palmer, ajones, tjeznach



On 5/8/24 04:26, Frank Chang wrote:
> Hi Daniel,
> 
> Daniel Henrique Barboza <dbarboza@ventanamicro.com> wrote on Fri, Mar 8, 2024 at 12:05 AM:
>>
>> From: Tomasz Jeznach <tjeznach@rivosinc.com>
>>
>> The RISC-V IOMMU spec predicts that the IOMMU can use translation caches
>> to hold entries from the DDT. This includes implementation for all cache
>> commands that are marked as 'not implemented'.
>>
>> There are some artifacts included in the cache that predicts s-stage and
>> g-stage elements, although we don't support it yet. We'll introduce them
>> next.
>>
>> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
>> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
>> ---
>>   hw/riscv/riscv-iommu.c | 190 ++++++++++++++++++++++++++++++++++++++++-
>>   hw/riscv/riscv-iommu.h |   2 +
>>   2 files changed, 188 insertions(+), 4 deletions(-)
>>
>> diff --git a/hw/riscv/riscv-iommu.c b/hw/riscv/riscv-iommu.c
>> index df534b99b0..0b93146327 100644
>> --- a/hw/riscv/riscv-iommu.c
>> +++ b/hw/riscv/riscv-iommu.c
>> @@ -63,6 +63,16 @@ struct RISCVIOMMUContext {
>>       uint64_t msiptp;            /* MSI redirection page table pointer */
>>   };
>>
>> +/* Address translation cache entry */
>> +struct RISCVIOMMUEntry {
>> +    uint64_t iova:44;           /* IOVA Page Number */
>> +    uint64_t pscid:20;          /* Process Soft-Context identifier */
>> +    uint64_t phys:44;           /* Physical Page Number */
>> +    uint64_t gscid:16;          /* Guest Soft-Context identifier */
>> +    uint64_t perm:2;            /* IOMMU_RW flags */
>> +    uint64_t __rfu:2;
>> +};
>> +
>>   /* IOMMU index for transactions without PASID specified. */
>>   #define RISCV_IOMMU_NOPASID 0
>>
>> @@ -629,14 +639,127 @@ static AddressSpace *riscv_iommu_space(RISCVIOMMUState *s, uint32_t devid)
>>       return &as->iova_as;
>>   }
>>
>> +/* Translation Object cache support */
>> +static gboolean __iot_equal(gconstpointer v1, gconstpointer v2)
>> +{
>> +    RISCVIOMMUEntry *t1 = (RISCVIOMMUEntry *) v1;
>> +    RISCVIOMMUEntry *t2 = (RISCVIOMMUEntry *) v2;
>> +    return t1->gscid == t2->gscid && t1->pscid == t2->pscid &&
>> +           t1->iova == t2->iova;
>> +}
>> +
>> +static guint __iot_hash(gconstpointer v)
>> +{
>> +    RISCVIOMMUEntry *t = (RISCVIOMMUEntry *) v;
>> +    return (guint)t->iova;
>> +}
>> +
>> +/* GV: 1 PSCV: 1 AV: 1 */
>> +static void __iot_inval_pscid_iova(gpointer key, gpointer value, gpointer data)
>> +{
>> +    RISCVIOMMUEntry *iot = (RISCVIOMMUEntry *) value;
>> +    RISCVIOMMUEntry *arg = (RISCVIOMMUEntry *) data;
>> +    if (iot->gscid == arg->gscid &&
>> +        iot->pscid == arg->pscid &&
>> +        iot->iova == arg->iova) {
>> +        iot->perm = 0;
> 
> Maybe using IOMMU_NONE would be clearer?

Agreed. I changed all relevant "iot->perm = 0" instances to "iot->perm = IOMMU_NONE".
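
A minimal standalone sketch of the idea, assuming the usual QEMU access flag
values (IOMMU_NONE is 0, so behavior is unchanged and only readability
improves). The enum is a local mirror to keep the example self-contained, and
struct cache_entry with its helpers is a hypothetical, simplified stand-in for
RISCVIOMMUEntry:

```c
#include <assert.h>
#include <stdbool.h>

/* Local mirror of the access flags; the real ones come from QEMU headers. */
enum { IOMMU_NONE = 0, IOMMU_RO = 1, IOMMU_WO = 2, IOMMU_RW = 3 };

/* Hypothetical, simplified cache entry: only the permission bits. */
struct cache_entry {
    unsigned perm;
};

/* Invalidation marks the entry as carrying no permissions. */
static inline void cache_entry_invalidate(struct cache_entry *e)
{
    e->perm = IOMMU_NONE;
}

/* A lookup treats an entry without permissions as a miss. */
static inline bool cache_entry_usable(const struct cache_entry *e)
{
    return e->perm != IOMMU_NONE;
}

static inline void cache_selfcheck(void)
{
    struct cache_entry e = { .perm = IOMMU_RW };

    assert(cache_entry_usable(&e));
    cache_entry_invalidate(&e);
    assert(!cache_entry_usable(&e));
}
```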


Thanks,


Daniel

> 
> Otherwise,
> Reviewed-by: Frank Chang <frank.chang@sifive.com>
> 
>> +    }
>> +}
>> +
>> +/* GV: 1 PSCV: 1 AV: 0 */
>> +static void __iot_inval_pscid(gpointer key, gpointer value, gpointer data)
>> +{
>> +    RISCVIOMMUEntry *iot = (RISCVIOMMUEntry *) value;
>> +    RISCVIOMMUEntry *arg = (RISCVIOMMUEntry *) data;
>> +    if (iot->gscid == arg->gscid &&
>> +        iot->pscid == arg->pscid) {
>> +        iot->perm = 0;
>> +    }
>> +}
>> +
>> +/* GV: 1 GVMA: 1 */
>> +static void __iot_inval_gscid_gpa(gpointer key, gpointer value, gpointer data)
>> +{
>> +    RISCVIOMMUEntry *iot = (RISCVIOMMUEntry *) value;
>> +    RISCVIOMMUEntry *arg = (RISCVIOMMUEntry *) data;
>> +    if (iot->gscid == arg->gscid) {
>> +        /* simplified cache, no GPA matching */
>> +        iot->perm = 0;
>> +    }
>> +}
>> +
>> +/* GV: 1 GVMA: 0 */
>> +static void __iot_inval_gscid(gpointer key, gpointer value, gpointer data)
>> +{
>> +    RISCVIOMMUEntry *iot = (RISCVIOMMUEntry *) value;
>> +    RISCVIOMMUEntry *arg = (RISCVIOMMUEntry *) data;
>> +    if (iot->gscid == arg->gscid) {
>> +        iot->perm = 0;
>> +    }
>> +}
>> +
>> +/* GV: 0 */
>> +static void __iot_inval_all(gpointer key, gpointer value, gpointer data)
>> +{
>> +    RISCVIOMMUEntry *iot = (RISCVIOMMUEntry *) value;
>> +    iot->perm = 0;
>> +}
>> +
>> +/* caller should keep ref-count for iot_cache object */
>> +static RISCVIOMMUEntry *riscv_iommu_iot_lookup(RISCVIOMMUContext *ctx,
>> +    GHashTable *iot_cache, hwaddr iova)
>> +{
>> +    RISCVIOMMUEntry key = {
>> +        .pscid = get_field(ctx->ta, RISCV_IOMMU_DC_TA_PSCID),
>> +        .iova  = PPN_DOWN(iova),
>> +    };
>> +    return g_hash_table_lookup(iot_cache, &key);
>> +}
>> +
>> +/* caller should keep ref-count for iot_cache object */
>> +static void riscv_iommu_iot_update(RISCVIOMMUState *s,
>> +    GHashTable *iot_cache, RISCVIOMMUEntry *iot)
>> +{
>> +    if (!s->iot_limit) {
>> +        return;
>> +    }
>> +
>> +    if (g_hash_table_size(s->iot_cache) >= s->iot_limit) {
>> +        iot_cache = g_hash_table_new_full(__iot_hash, __iot_equal,
>> +                                          g_free, NULL);
>> +        g_hash_table_unref(qatomic_xchg(&s->iot_cache, iot_cache));
>> +    }
>> +    g_hash_table_add(iot_cache, iot);
>> +}
>> +
>> +static void riscv_iommu_iot_inval(RISCVIOMMUState *s, GHFunc func,
>> +    uint32_t gscid, uint32_t pscid, hwaddr iova)
>> +{
>> +    GHashTable *iot_cache;
>> +    RISCVIOMMUEntry key = {
>> +        .gscid = gscid,
>> +        .pscid = pscid,
>> +        .iova  = PPN_DOWN(iova),
>> +    };
>> +
>> +    iot_cache = g_hash_table_ref(s->iot_cache);
>> +    g_hash_table_foreach(iot_cache, func, &key);
>> +    g_hash_table_unref(iot_cache);
>> +}
>> +
>>   static int riscv_iommu_translate(RISCVIOMMUState *s, RISCVIOMMUContext *ctx,
>> -    IOMMUTLBEntry *iotlb)
>> +    IOMMUTLBEntry *iotlb, bool enable_cache)
>>   {
>> +    RISCVIOMMUEntry *iot;
>> +    IOMMUAccessFlags perm;
>>       bool enable_faults;
>>       bool enable_pasid;
>>       bool enable_pri;
>> +    GHashTable *iot_cache;
>>       int fault;
>>
>> +    iot_cache = g_hash_table_ref(s->iot_cache);
>> +
>>       enable_faults = !(ctx->tc & RISCV_IOMMU_DC_TC_DTF);
>>       /*
>>        * TC[32] is reserved for custom extensions, used here to temporarily
>> @@ -645,9 +768,36 @@ static int riscv_iommu_translate(RISCVIOMMUState *s, RISCVIOMMUContext *ctx,
>>       enable_pri = (iotlb->perm == IOMMU_NONE) && (ctx->tc & BIT_ULL(32));
>>       enable_pasid = (ctx->tc & RISCV_IOMMU_DC_TC_PDTV);
>>
>> +    iot = riscv_iommu_iot_lookup(ctx, iot_cache, iotlb->iova);
>> +    perm = iot ? iot->perm : IOMMU_NONE;
>> +    if (perm != IOMMU_NONE) {
>> +        iotlb->translated_addr = PPN_PHYS(iot->phys);
>> +        iotlb->addr_mask = ~TARGET_PAGE_MASK;
>> +        iotlb->perm = perm;
>> +        fault = 0;
>> +        goto done;
>> +    }
>> +
>>       /* Translate using device directory / page table information. */
>>       fault = riscv_iommu_spa_fetch(s, ctx, iotlb);
>>
>> +    if (!fault && iotlb->target_as == &s->trap_as) {
>> +        /* Do not cache trapped MSI translations */
>> +        goto done;
>> +    }
>> +
>> +    if (!fault && iotlb->translated_addr != iotlb->iova && enable_cache) {
>> +        iot = g_new0(RISCVIOMMUEntry, 1);
>> +        iot->iova = PPN_DOWN(iotlb->iova);
>> +        iot->phys = PPN_DOWN(iotlb->translated_addr);
>> +        iot->pscid = get_field(ctx->ta, RISCV_IOMMU_DC_TA_PSCID);
>> +        iot->perm = iotlb->perm;
>> +        riscv_iommu_iot_update(s, iot_cache, iot);
>> +    }
>> +
>> +done:
>> +    g_hash_table_unref(iot_cache);
>> +
>>       if (enable_pri && fault) {
>>           struct riscv_iommu_pq_record pr = {0};
>>           if (enable_pasid) {
>> @@ -794,13 +944,40 @@ static void riscv_iommu_process_cq_tail(RISCVIOMMUState *s)
>>               if (cmd.dword0 & RISCV_IOMMU_CMD_IOTINVAL_PSCV) {
>>                   /* illegal command arguments IOTINVAL.GVMA & PSCV == 1 */
>>                   goto cmd_ill;
>> +            } else if (!(cmd.dword0 & RISCV_IOMMU_CMD_IOTINVAL_GV)) {
>> +                /* invalidate all cache mappings */
>> +                func = __iot_inval_all;
>> +            } else if (!(cmd.dword0 & RISCV_IOMMU_CMD_IOTINVAL_AV)) {
>> +                /* invalidate cache matching GSCID */
>> +                func = __iot_inval_gscid;
>> +            } else {
>> +                /* invalidate cache matching GSCID and ADDR (GPA) */
>> +                func = __iot_inval_gscid_gpa;
>>               }
>> -            /* translation cache not implemented yet */
>> +            riscv_iommu_iot_inval(s, func,
>> +                get_field(cmd.dword0, RISCV_IOMMU_CMD_IOTINVAL_GSCID), 0,
>> +                cmd.dword1 & TARGET_PAGE_MASK);
>>               break;
>>
>>           case RISCV_IOMMU_CMD(RISCV_IOMMU_CMD_IOTINVAL_FUNC_VMA,
>>                                RISCV_IOMMU_CMD_IOTINVAL_OPCODE):
>> -            /* translation cache not implemented yet */
>> +            if (!(cmd.dword0 & RISCV_IOMMU_CMD_IOTINVAL_GV)) {
>> +                /* invalidate all cache mappings, simplified model */
>> +                func = __iot_inval_all;
>> +            } else if (!(cmd.dword0 & RISCV_IOMMU_CMD_IOTINVAL_PSCV)) {
>> +                /* invalidate cache matching GSCID, simplified model */
>> +                func = __iot_inval_gscid;
>> +            } else if (!(cmd.dword0 & RISCV_IOMMU_CMD_IOTINVAL_AV)) {
>> +                /* invalidate cache matching GSCID and PSCID */
>> +                func = __iot_inval_pscid;
>> +            } else {
>> +                /* invalidate cache matching GSCID and PSCID and ADDR (IOVA) */
>> +                func = __iot_inval_pscid_iova;
>> +            }
>> +            riscv_iommu_iot_inval(s, func,
>> +                get_field(cmd.dword0, RISCV_IOMMU_CMD_IOTINVAL_GSCID),
>> +                get_field(cmd.dword0, RISCV_IOMMU_CMD_IOTINVAL_PSCID),
>> +                cmd.dword1 & TARGET_PAGE_MASK);
>>               break;
>>
>>           case RISCV_IOMMU_CMD(RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_DDT,
>> @@ -1290,6 +1467,8 @@ static void riscv_iommu_realize(DeviceState *dev, Error **errp)
>>       /* Device translation context cache */
>>       s->ctx_cache = g_hash_table_new_full(__ctx_hash, __ctx_equal,
>>                                            g_free, NULL);
>> +    s->iot_cache = g_hash_table_new_full(__iot_hash, __iot_equal,
>> +                                         g_free, NULL);
>>
>>       s->iommus.le_next = NULL;
>>       s->iommus.le_prev = NULL;
>> @@ -1313,6 +1492,7 @@ static void riscv_iommu_unrealize(DeviceState *dev)
>>       qemu_thread_join(&s->core_proc);
>>       qemu_cond_destroy(&s->core_cond);
>>       qemu_mutex_destroy(&s->core_lock);
>> +    g_hash_table_unref(s->iot_cache);
>>       g_hash_table_unref(s->ctx_cache);
>>   }
>>
>> @@ -1320,6 +1500,8 @@ static Property riscv_iommu_properties[] = {
>>       DEFINE_PROP_UINT32("version", RISCVIOMMUState, version,
>>           RISCV_IOMMU_SPEC_DOT_VER),
>>       DEFINE_PROP_UINT32("bus", RISCVIOMMUState, bus, 0x0),
>> +    DEFINE_PROP_UINT32("ioatc-limit", RISCVIOMMUState, iot_limit,
>> +        LIMIT_CACHE_IOT),
>>       DEFINE_PROP_BOOL("intremap", RISCVIOMMUState, enable_msi, TRUE),
>>       DEFINE_PROP_BOOL("off", RISCVIOMMUState, enable_off, TRUE),
>>       DEFINE_PROP_LINK("downstream-mr", RISCVIOMMUState, target_mr,
>> @@ -1372,7 +1554,7 @@ static IOMMUTLBEntry riscv_iommu_memory_region_translate(
>>           /* Translation disabled or invalid. */
>>           iotlb.addr_mask = 0;
>>           iotlb.perm = IOMMU_NONE;
>> -    } else if (riscv_iommu_translate(as->iommu, ctx, &iotlb)) {
>> +    } else if (riscv_iommu_translate(as->iommu, ctx, &iotlb, true)) {
>>           /* Translation disabled or fault reported. */
>>           iotlb.addr_mask = 0;
>>           iotlb.perm = IOMMU_NONE;
>> diff --git a/hw/riscv/riscv-iommu.h b/hw/riscv/riscv-iommu.h
>> index 6f740de690..eea2123686 100644
>> --- a/hw/riscv/riscv-iommu.h
>> +++ b/hw/riscv/riscv-iommu.h
>> @@ -68,6 +68,8 @@ struct RISCVIOMMUState {
>>       MemoryRegion trap_mr;
>>
>>       GHashTable *ctx_cache;          /* Device translation Context Cache */
>> +    GHashTable *iot_cache;          /* IO Translated Address Cache */
>> +    unsigned iot_limit;             /* IO Translation Cache size limit */
>>
>>       /* MMIO Hardware Interface */
>>       MemoryRegion regs_mr;
>> --
>> 2.43.2
>>
>>



* Re: [PATCH v2 10/15] hw/riscv/riscv-iommu: add ATS support
  2024-05-08  2:57   ` Frank Chang
@ 2024-05-17  9:29     ` Daniel Henrique Barboza
  0 siblings, 0 replies; 55+ messages in thread
From: Daniel Henrique Barboza @ 2024-05-17  9:29 UTC (permalink / raw)
  To: Frank Chang
  Cc: qemu-devel, qemu-riscv, alistair.francis, bmeng, liwei1518,
	zhiwei_liu, palmer, ajones, tjeznach

Hi Frank,


On 5/7/24 23:57, Frank Chang wrote:
> Hi Daniel,
> 
> Daniel Henrique Barboza <dbarboza@ventanamicro.com> wrote on Fri, Mar 8, 2024 at 12:06 AM:
>>
>> From: Tomasz Jeznach <tjeznach@rivosinc.com>
>>
>> Add PCIe Address Translation Services (ATS) capabilities to the IOMMU.
>> This will add support for ATS translation requests in Fault/Event
>> queues, Page-request queue and IOATC invalidations.
>>
>> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
>> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
>> ---
>>   hw/riscv/riscv-iommu-bits.h |  43 ++++++++++++++-
>>   hw/riscv/riscv-iommu.c      | 107 +++++++++++++++++++++++++++++++++---
>>   hw/riscv/riscv-iommu.h      |   1 +
>>   hw/riscv/trace-events       |   3 +
>>   4 files changed, 145 insertions(+), 9 deletions(-)
>>
>> diff --git a/hw/riscv/riscv-iommu-bits.h b/hw/riscv/riscv-iommu-bits.h
>> index 9d645d69ea..0994f5ce48 100644
>> --- a/hw/riscv/riscv-iommu-bits.h
>> +++ b/hw/riscv/riscv-iommu-bits.h
>> @@ -81,6 +81,7 @@ struct riscv_iommu_pq_record {
>>   #define RISCV_IOMMU_CAP_SV57X4          BIT_ULL(19)
>>   #define RISCV_IOMMU_CAP_MSI_FLAT        BIT_ULL(22)
>>   #define RISCV_IOMMU_CAP_MSI_MRIF        BIT_ULL(23)
>> +#define RISCV_IOMMU_CAP_ATS             BIT_ULL(25)
>>   #define RISCV_IOMMU_CAP_IGS             GENMASK_ULL(29, 28)
>>   #define RISCV_IOMMU_CAP_PAS             GENMASK_ULL(37, 32)
>>   #define RISCV_IOMMU_CAP_PD8             BIT_ULL(38)
>> @@ -201,6 +202,7 @@ struct riscv_iommu_dc {
>>
>>   /* Translation control fields */
>>   #define RISCV_IOMMU_DC_TC_V             BIT_ULL(0)
>> +#define RISCV_IOMMU_DC_TC_EN_ATS        BIT_ULL(1)
>>   #define RISCV_IOMMU_DC_TC_DTF           BIT_ULL(4)
>>   #define RISCV_IOMMU_DC_TC_PDTV          BIT_ULL(5)
>>   #define RISCV_IOMMU_DC_TC_PRPR          BIT_ULL(6)
>> @@ -259,6 +261,20 @@ struct riscv_iommu_command {
>>   #define RISCV_IOMMU_CMD_IODIR_DV        BIT_ULL(33)
>>   #define RISCV_IOMMU_CMD_IODIR_DID       GENMASK_ULL(63, 40)
>>
>> +/* 3.1.4 I/O MMU PCIe ATS */
>> +#define RISCV_IOMMU_CMD_ATS_OPCODE              4
>> +#define RISCV_IOMMU_CMD_ATS_FUNC_INVAL          0
>> +#define RISCV_IOMMU_CMD_ATS_FUNC_PRGR           1
>> +#define RISCV_IOMMU_CMD_ATS_PID         GENMASK_ULL(31, 12)
>> +#define RISCV_IOMMU_CMD_ATS_PV          BIT_ULL(32)
>> +#define RISCV_IOMMU_CMD_ATS_DSV         BIT_ULL(33)
>> +#define RISCV_IOMMU_CMD_ATS_RID         GENMASK_ULL(55, 40)
>> +#define RISCV_IOMMU_CMD_ATS_DSEG        GENMASK_ULL(63, 56)
>> +/* dword1 is the ATS payload, two different payload types for INVAL and PRGR */
>> +
>> +/* ATS.PRGR payload */
>> +#define RISCV_IOMMU_CMD_ATS_PRGR_RESP_CODE      GENMASK_ULL(47, 44)
>> +
>>   enum riscv_iommu_dc_fsc_atp_modes {
>>       RISCV_IOMMU_DC_FSC_MODE_BARE = 0,
>>       RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV32 = 8,
>> @@ -322,7 +338,32 @@ enum riscv_iommu_fq_ttypes {
>>       RISCV_IOMMU_FQ_TTYPE_TADDR_INST_FETCH = 5,
>>       RISCV_IOMMU_FQ_TTYPE_TADDR_RD = 6,
>>       RISCV_IOMMU_FQ_TTYPE_TADDR_WR = 7,
>> -    RISCV_IOMMU_FW_TTYPE_PCIE_MSG_REQ = 8,
>> +    RISCV_IOMMU_FQ_TTYPE_PCIE_ATS_REQ = 8,
>> +    RISCV_IOMMU_FW_TTYPE_PCIE_MSG_REQ = 9,
>> +};
>> +
>> +/* Header fields */
>> +#define RISCV_IOMMU_PREQ_HDR_PID        GENMASK_ULL(31, 12)
>> +#define RISCV_IOMMU_PREQ_HDR_PV         BIT_ULL(32)
>> +#define RISCV_IOMMU_PREQ_HDR_PRIV       BIT_ULL(33)
>> +#define RISCV_IOMMU_PREQ_HDR_EXEC       BIT_ULL(34)
>> +#define RISCV_IOMMU_PREQ_HDR_DID        GENMASK_ULL(63, 40)
>> +
>> +/* Payload fields */
>> +#define RISCV_IOMMU_PREQ_PAYLOAD_R      BIT_ULL(0)
>> +#define RISCV_IOMMU_PREQ_PAYLOAD_W      BIT_ULL(1)
>> +#define RISCV_IOMMU_PREQ_PAYLOAD_L      BIT_ULL(2)
>> +#define RISCV_IOMMU_PREQ_PAYLOAD_M      GENMASK_ULL(2, 0)
>> +#define RISCV_IOMMU_PREQ_PRG_INDEX      GENMASK_ULL(11, 3)
>> +#define RISCV_IOMMU_PREQ_UADDR          GENMASK_ULL(63, 12)
>> +
>> +
>> +/*
>> + * struct riscv_iommu_msi_pte - MSI Page Table Entry
>> + */
>> +struct riscv_iommu_msi_pte {
>> +      uint64_t pte;
>> +      uint64_t mrif_info;
>>   };
>>
>>   /* Fields on pte */
>> diff --git a/hw/riscv/riscv-iommu.c b/hw/riscv/riscv-iommu.c
>> index 03a610fa75..7af5929b10 100644
>> --- a/hw/riscv/riscv-iommu.c
>> +++ b/hw/riscv/riscv-iommu.c
>> @@ -576,7 +576,7 @@ static int riscv_iommu_ctx_fetch(RISCVIOMMUState *s, RISCVIOMMUContext *ctx)
>>               RISCV_IOMMU_DC_IOHGATP_MODE_BARE);
>>           ctx->satp = set_field(0, RISCV_IOMMU_ATP_MODE_FIELD,
>>               RISCV_IOMMU_DC_FSC_MODE_BARE);
>> -        ctx->tc = RISCV_IOMMU_DC_TC_V;
>> +        ctx->tc = RISCV_IOMMU_DC_TC_EN_ATS | RISCV_IOMMU_DC_TC_V;
> 
> We should OR RISCV_IOMMU_DC_TC_EN_ATS only when IOMMU has ATS capability.
> (i.e. s->enable_ats == true).
> 
>>           ctx->ta = 0;
>>           ctx->msiptp = 0;
>>           return 0;
>> @@ -1021,6 +1021,18 @@ static int riscv_iommu_translate(RISCVIOMMUState *s, RISCVIOMMUContext *ctx,
>>       enable_pri = (iotlb->perm == IOMMU_NONE) && (ctx->tc & BIT_ULL(32));
>>       enable_pasid = (ctx->tc & RISCV_IOMMU_DC_TC_PDTV);
>>
>> +    /* Check for ATS request. */
>> +    if (iotlb->perm == IOMMU_NONE) {
>> +        /* Check if ATS is disabled. */
>> +        if (!(ctx->tc & RISCV_IOMMU_DC_TC_EN_ATS)) {
>> +            enable_pri = false;
>> +            fault = RISCV_IOMMU_FQ_CAUSE_TTYPE_BLOCKED;
>> +            goto done;
>> +        }
>> +        trace_riscv_iommu_ats(s->parent_obj.id, PCI_BUS_NUM(ctx->devid),
>> +                PCI_SLOT(ctx->devid), PCI_FUNC(ctx->devid), iotlb->iova);
> 
> It's possible that iotlb->perm == IOMMU_NONE,
> but the translation request comes from riscv_iommu_process_dbg().

That's true. I don't see an easy way to distinguish at this point whether
the translation was triggered by an actual ATS request or a DBG request.

I'll remove this trace since it's ambiguous.  There are enough traces in ATS
code in riscv_iommu_ats_inval() and riscv_iommu_ats_prgr(). We also have a trace
for each command being processed in riscv_iommu_process_cq_tail().
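
On the EN_ATS comment above, a minimal sketch of the gating, assuming the
default DC.tc is derived from the "ats" property. The struct, helper names and
bit positions below are stand-ins for illustration only, not the code that
will be in v3:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in bit positions; the real ones come from riscv-iommu-bits.h. */
#define DC_TC_V       (1ULL << 0)
#define DC_TC_EN_ATS  (1ULL << 1)

/* Hypothetical, reduced stand-in for RISCVIOMMUState. */
struct iommu_model {
    bool enable_ats;    /* mirrors the "ats" property */
};

/* Default DC.tc for the bare-mode context: EN_ATS only with ATS capability. */
static uint64_t default_tc(const struct iommu_model *s)
{
    assert(s != NULL);

    uint64_t tc = DC_TC_V;

    if (s->enable_ats) {
        tc |= DC_TC_EN_ATS;
    }
    return tc;
}

/* Same test the translate path applies before serving an ATS request. */
static bool ats_request_allowed(uint64_t tc)
{
    return (tc & DC_TC_EN_ATS) != 0;
}
```

With ats=off the default context no longer carries EN_ATS, so the existing
"ATS is disabled" check in the translate path would block the request.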


Thanks,


Daniel

> 
>> +    }
>> +
>>       iot = riscv_iommu_iot_lookup(ctx, iot_cache, iotlb->iova);
>>       perm = iot ? iot->perm : IOMMU_NONE;
>>       if (perm != IOMMU_NONE) {
>> @@ -1067,13 +1079,10 @@ done:
>>
>>       if (enable_faults && fault) {
>>           struct riscv_iommu_fq_record ev;
>> -        unsigned ttype;
>> -
>> -        if (iotlb->perm & IOMMU_RW) {
>> -            ttype = RISCV_IOMMU_FQ_TTYPE_UADDR_WR;
>> -        } else {
>> -            ttype = RISCV_IOMMU_FQ_TTYPE_UADDR_RD;
>> -        }
>> +        const unsigned ttype =
>> +            (iotlb->perm & IOMMU_RW) ? RISCV_IOMMU_FQ_TTYPE_UADDR_WR :
>> +            ((iotlb->perm & IOMMU_RO) ? RISCV_IOMMU_FQ_TTYPE_UADDR_RD :
>> +            RISCV_IOMMU_FQ_TTYPE_PCIE_ATS_REQ);
>>           ev.hdr = set_field(0, RISCV_IOMMU_FQ_HDR_CAUSE, fault);
>>           ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_TTYPE, ttype);
>>           ev.hdr = set_field(ev.hdr, RISCV_IOMMU_FQ_HDR_PV, enable_pasid);
>> @@ -1105,6 +1114,73 @@ static MemTxResult riscv_iommu_iofence(RISCVIOMMUState *s, bool notify,
>>           MEMTXATTRS_UNSPECIFIED);
>>   }
>>
>> +static void riscv_iommu_ats(RISCVIOMMUState *s,
>> +    struct riscv_iommu_command *cmd, IOMMUNotifierFlag flag,
>> +    IOMMUAccessFlags perm,
>> +    void (*trace_fn)(const char *id))
>> +{
>> +    RISCVIOMMUSpace *as = NULL;
>> +    IOMMUNotifier *n;
>> +    IOMMUTLBEvent event;
>> +    uint32_t pasid;
>> +    uint32_t devid;
>> +    const bool pv = cmd->dword0 & RISCV_IOMMU_CMD_ATS_PV;
>> +
>> +    if (cmd->dword0 & RISCV_IOMMU_CMD_ATS_DSV) {
>> +        /* Use device segment and requester id */
>> +        devid = get_field(cmd->dword0,
>> +            RISCV_IOMMU_CMD_ATS_DSEG | RISCV_IOMMU_CMD_ATS_RID);
>> +    } else {
>> +        devid = get_field(cmd->dword0, RISCV_IOMMU_CMD_ATS_RID);
>> +    }
>> +
>> +    pasid = get_field(cmd->dword0, RISCV_IOMMU_CMD_ATS_PID);
>> +
>> +    qemu_mutex_lock(&s->core_lock);
>> +    QLIST_FOREACH(as, &s->spaces, list) {
>> +        if (as->devid == devid) {
>> +            break;
>> +        }
>> +    }
>> +    qemu_mutex_unlock(&s->core_lock);
>> +
>> +    if (!as || !as->notifier) {
>> +        return;
>> +    }
>> +
>> +    event.type = flag;
>> +    event.entry.perm = perm;
>> +    event.entry.target_as = s->target_as;
>> +
>> +    IOMMU_NOTIFIER_FOREACH(n, &as->iova_mr) {
>> +        if (!pv || n->iommu_idx == pasid) {
>> +            event.entry.iova = n->start;
>> +            event.entry.addr_mask = n->end - n->start;
>> +            trace_fn(as->iova_mr.parent_obj.name);
>> +            memory_region_notify_iommu_one(n, &event);
>> +        }
>> +    }
>> +}
>> +
>> +static void riscv_iommu_ats_inval(RISCVIOMMUState *s,
>> +    struct riscv_iommu_command *cmd)
>> +{
>> +    return riscv_iommu_ats(s, cmd, IOMMU_NOTIFIER_DEVIOTLB_UNMAP, IOMMU_NONE,
>> +                           trace_riscv_iommu_ats_inval);
>> +}
>> +
>> +static void riscv_iommu_ats_prgr(RISCVIOMMUState *s,
>> +    struct riscv_iommu_command *cmd)
>> +{
>> +    unsigned resp_code = get_field(cmd->dword1,
>> +                                   RISCV_IOMMU_CMD_ATS_PRGR_RESP_CODE);
>> +
>> +    /* Using the access flag to carry response code information */
>> +    IOMMUAccessFlags perm = resp_code ? IOMMU_NONE : IOMMU_RW;
>> +    return riscv_iommu_ats(s, cmd, IOMMU_NOTIFIER_MAP, perm,
>> +                           trace_riscv_iommu_ats_prgr);
>> +}
>> +
>>   static void riscv_iommu_process_ddtp(RISCVIOMMUState *s)
>>   {
>>       uint64_t old_ddtp = s->ddtp;
>> @@ -1260,6 +1336,17 @@ static void riscv_iommu_process_cq_tail(RISCVIOMMUState *s)
>>                   get_field(cmd.dword0, RISCV_IOMMU_CMD_IODIR_PID));
>>               break;
>>
>> +        /* ATS commands */
>> +        case RISCV_IOMMU_CMD(RISCV_IOMMU_CMD_ATS_FUNC_INVAL,
>> +                             RISCV_IOMMU_CMD_ATS_OPCODE):
>> +            riscv_iommu_ats_inval(s, &cmd);
>> +            break;
>> +
>> +        case RISCV_IOMMU_CMD(RISCV_IOMMU_CMD_ATS_FUNC_PRGR,
>> +                             RISCV_IOMMU_CMD_ATS_OPCODE):
>> +            riscv_iommu_ats_prgr(s, &cmd);
>> +            break;
>> +
> 
> PCIe ATS commands are supported only when capabilities.ATS is set to 1
> (i.e. s->enable_ats == true).
> 
> Regards,
> Frank Chang
> 
>>           default:
>>           cmd_ill:
>>               /* Invalid instruction, do not advance instruction index. */
>> @@ -1648,6 +1735,9 @@ static void riscv_iommu_realize(DeviceState *dev, Error **errp)
>>       if (s->enable_msi) {
>>           s->cap |= RISCV_IOMMU_CAP_MSI_FLAT | RISCV_IOMMU_CAP_MSI_MRIF;
>>       }
>> +    if (s->enable_ats) {
>> +        s->cap |= RISCV_IOMMU_CAP_ATS;
>> +    }
>>       if (s->enable_s_stage) {
>>           s->cap |= RISCV_IOMMU_CAP_SV32 | RISCV_IOMMU_CAP_SV39 |
>>                     RISCV_IOMMU_CAP_SV48 | RISCV_IOMMU_CAP_SV57;
>> @@ -1765,6 +1855,7 @@ static Property riscv_iommu_properties[] = {
>>       DEFINE_PROP_UINT32("ioatc-limit", RISCVIOMMUState, iot_limit,
>>           LIMIT_CACHE_IOT),
>>       DEFINE_PROP_BOOL("intremap", RISCVIOMMUState, enable_msi, TRUE),
>> +    DEFINE_PROP_BOOL("ats", RISCVIOMMUState, enable_ats, TRUE),
>>       DEFINE_PROP_BOOL("off", RISCVIOMMUState, enable_off, TRUE),
>>       DEFINE_PROP_BOOL("s-stage", RISCVIOMMUState, enable_s_stage, TRUE),
>>       DEFINE_PROP_BOOL("g-stage", RISCVIOMMUState, enable_g_stage, TRUE),
>> diff --git a/hw/riscv/riscv-iommu.h b/hw/riscv/riscv-iommu.h
>> index 9b33fb97ef..47f3fdad58 100644
>> --- a/hw/riscv/riscv-iommu.h
>> +++ b/hw/riscv/riscv-iommu.h
>> @@ -38,6 +38,7 @@ struct RISCVIOMMUState {
>>
>>       bool enable_off;      /* Enable out-of-reset OFF mode (DMA disabled) */
>>       bool enable_msi;      /* Enable MSI remapping */
>> +    bool enable_ats;      /* Enable ATS support */
>>       bool enable_s_stage;  /* Enable S/VS-Stage translation */
>>       bool enable_g_stage;  /* Enable G-Stage translation */
>>
>> diff --git a/hw/riscv/trace-events b/hw/riscv/trace-events
>> index 42a97caffa..4b486b6420 100644
>> --- a/hw/riscv/trace-events
>> +++ b/hw/riscv/trace-events
>> @@ -9,3 +9,6 @@ riscv_iommu_msi(const char *id, unsigned b, unsigned d, unsigned f, uint64_t iov
>>   riscv_iommu_cmd(const char *id, uint64_t l, uint64_t u) "%s: command 0x%"PRIx64" 0x%"PRIx64
>>   riscv_iommu_notifier_add(const char *id) "%s: dev-iotlb notifier added"
>>   riscv_iommu_notifier_del(const char *id) "%s: dev-iotlb notifier removed"
>> +riscv_iommu_ats(const char *id, unsigned b, unsigned d, unsigned f, uint64_t iova) "%s: translate request %04x:%02x.%u iova: 0x%"PRIx64
>> +riscv_iommu_ats_inval(const char *id) "%s: dev-iotlb invalidate"
>> +riscv_iommu_ats_prgr(const char *id) "%s: dev-iotlb page request group response"
>> --
>> 2.43.2
>>
>>



* Re: [PATCH v2 03/15] hw/riscv: add RISC-V IOMMU base emulation
  2024-05-16  7:13         ` Frank Chang
@ 2024-05-20 16:17           ` Daniel Henrique Barboza
  2024-05-21 10:52             ` Frank Chang
  0 siblings, 1 reply; 55+ messages in thread
From: Daniel Henrique Barboza @ 2024-05-20 16:17 UTC (permalink / raw)
  To: Frank Chang
  Cc: qemu-devel, qemu-riscv, alistair.francis, bmeng, liwei1518,
	zhiwei_liu, palmer, ajones, tjeznach, Sebastien Boeuf, Sunil V L

Hi Frank,

On 5/16/24 04:13, Frank Chang wrote:
> On Mon, May 13, 2024 at 8:37 PM Daniel Henrique Barboza <dbarboza@ventanamicro.com> wrote:
> 
>     Hi Frank,
> 
> 
>     On 5/8/24 08:15, Daniel Henrique Barboza wrote:
>      > Hi Frank,
>      >
>      > I'll reply with what I've done so far. Still missing some stuff:
>      >
>      > On 5/2/24 08:37, Frank Chang wrote:
>      >> Hi Daniel,
>      >>
>      >> Daniel Henrique Barboza <dbarboza@ventanamicro.com> wrote on Fri, Mar 8, 2024 at 12:04 AM:
>      >>>
>      >>> From: Tomasz Jeznach <tjeznach@rivosinc.com>
>      >>>
>      >>> The RISC-V IOMMU specification is now ratified as per the RISC-V
>      >>> international process. The latest frozen specification can be found
>      >>> at:
>      >>>
>      >>> https://github.com/riscv-non-isa/riscv-iommu/releases/download/v1.0/riscv-iommu.pdf
>      >>>
>      >>> Add the foundation of the device emulation for RISC-V IOMMU, which
>      >>> includes an IOMMU that has no capabilities but MSI interrupt support and
>      >>> fault queue interfaces. We'll add more features incrementally in the
>      >>> next patches.
>      >>>
>      >>> Co-developed-by: Sebastien Boeuf <seb@rivosinc.com>
>      >>> Signed-off-by: Sebastien Boeuf <seb@rivosinc.com>
>      >>> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
>      >>> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
>      >>> ---
>      >>>   hw/riscv/Kconfig         |    4 +
> 
>     (...)
> 
>      >>> +
>      >>> +    s->iommus.le_next = NULL;
>      >>> +    s->iommus.le_prev = NULL;
>      >>> +    QLIST_INIT(&s->spaces);
>      >>> +    qemu_cond_init(&s->core_cond);
>      >>> +    qemu_mutex_init(&s->core_lock);
>      >>> +    qemu_spin_init(&s->regs_lock);
>      >>> +    qemu_thread_create(&s->core_proc, "riscv-iommu-core",
>      >>> +        riscv_iommu_core_proc, s, QEMU_THREAD_JOINABLE);
>      >>
>      >> In our experience, using QEMU thread increases the latency of command
>      >> queue processing,
>      >> which leads to the potential IOMMU fence timeout in the Linux driver
>      >> when using IOMMU with KVM,
>      >> e.g. booting the guest Linux.
>      >>
>      >> Is it possible to remove the thread from the IOMMU just like ARM, AMD,
>      >> and Intel IOMMU models?
>      >
>      > Interesting. We've been using this emulation internally in Ventana, with
>      > KVM and VFIO, and didn't experience this issue. Drew is on CC and can talk
>      > more about it.
>      >
>      > That said, I don't mind this change, assuming it's feasible to make it for this
>      > first version.  I'll need to check how other IOMMUs are doing it.
> 
> 
>     I removed the threading and it seems to be working fine without it. I'll commit this
>     change for v3.
> 
>      >
>      >
>      >
>      >>
>      >>> +}
>      >>> +
>      >
>      > (...)
>      >
>      >>> +
>      >>> +static AddressSpace *riscv_iommu_find_as(PCIBus *bus, void *opaque, int devfn)
>      >>> +{
>      >>> +    RISCVIOMMUState *s = (RISCVIOMMUState *) opaque;
>      >>> +    PCIDevice *pdev = pci_find_device(bus, pci_bus_num(bus), devfn);
>      >>> +    AddressSpace *as = NULL;
>      >>> +
>      >>> +    if (pdev && pci_is_iommu(pdev)) {
>      >>> +        return s->target_as;
>      >>> +    }
>      >>> +
>      >>> +    /* Find first registered IOMMU device */
>      >>> +    while (s->iommus.le_prev) {
>      >>> +        s = *(s->iommus.le_prev);
>      >>> +    }
>      >>> +
>      >>> +    /* Find first matching IOMMU */
>      >>> +    while (s != NULL && as == NULL) {
>      >>> +        as = riscv_iommu_space(s, PCI_BUILD_BDF(pci_bus_num(bus), devfn));
>      >>
>      >> For pci_bus_num(),
>      >> riscv_iommu_find_as() can be called at the very early stage
>      >> where software has no chance to enumerate the bus numbers.
> 
>     I investigated and this doesn't seem to be a problem. This function is called at the
>     last step of the realize() steps of both riscv_iommu_pci_realize() and
>     riscv_iommu_sys_realize(), and by that time the pci_bus_num() is already assigned.
>     Other iommus use pci_bus_num() into their own get_address_space() callbacks like
>     this too.
> 
> 
> Hi Daniel,
> 
> IIUC, pci_bus_num() by default is assigned to pcibus_num():
> 
> static int pcibus_num(PCIBus *bus)
> {
>      if (pci_bus_is_root(bus)) {
>          return 0; /* pci host bridge */
>      }
>      return bus->parent_dev->config[PCI_SECONDARY_BUS];
> }
> 
> If the bus is not the root bus, it tries to read the bus' parent device's
> secondary bus number (PCI_SECONDARY_BUS) field in the PCI configuration space.
> This field should be programmable by the SW during PCIe enumeration.
> But I don't think SW has a chance to be executed before riscv_iommu_sys_realize() is called,
> since it's pretty early before CPU's execution unless RISC-V IOMMU is hot-plugged.
> Even if RISC-V IOMMU is hot-plugged, I think riscv_iommu_sys_realize() is still called
> before SW is aware of the existence of IOMMU on the PCI topology tree.
> 
> Do you think this matches your observation?

It does. You have a good point on how pcibus_num() can vary if SW changes
PCI_SECONDARY_BUS and the IOMMU isn't on a root bus. Note that this will not
happen with the existing riscv-iommu-pci device as it is now, since it has code
to prevent the device from being attached to non-root PCI buses, but there are
no such restrictions in the riscv-iommu-sys device.
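
To make the dependency concrete, here is a reduced model of the pcibus_num()
logic quoted above. The struct and function names are hypothetical stand-ins
(not QEMU types); only the PCI_SECONDARY_BUS offset and the bus/devfn packing
follow the usual PCI conventions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_SECONDARY_BUS 0x19  /* offset in a bridge (type 1) config header */
#define BUILD_BDF(bus, devfn) ((uint16_t)(((bus) << 8) | (devfn)))

/* Hypothetical stand-ins for a PCI bridge and the bus behind it. */
struct bridge_model {
    uint8_t config[256];
};

struct bus_model {
    const struct bridge_model *parent_dev;  /* NULL for the root bus */
};

/* Mirrors the quoted pcibus_num(): 0 for root, else the bridge's register. */
static int model_bus_num(const struct bus_model *bus)
{
    assert(bus != NULL);

    if (bus->parent_dev == NULL) {
        return 0;
    }
    return bus->parent_dev->config[PCI_SECONDARY_BUS];
}

/* Device ID as it would be computed when the address space is looked up. */
static uint16_t model_devid(const struct bus_model *bus, uint8_t devfn)
{
    return BUILD_BDF(model_bus_num(bus), devfn);
}
```

In this model a device behind a bridge gets the same devid as a root bus
device until the guest programs the secondary bus number, which is the
situation Frank describes.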

And speaking of riscv-iommu-sys, the current device we have in this series is too
bare bones, without an actual use case for it (e.g. code to add it in the 'virt'
machine), but it's getting in the way nevertheless.

I'll remove the riscv-iommu-sys device from v3 and re-introduce it in a later
revision or as a follow up series. Sunil has a handful of patches that add the
riscv-iommu-sys device in the 'virt' machine and the proper ACPI support for it [1],
and I intend to use them as a base. We'll then need some minor adjustments in the
existing code to make it fully functional like we're doing with riscv-iommu-pci.


Thanks,

Daniel


[1] https://github.com/vlsunil/qemu/commits/acpi_rimt_poc_v1/
> 
> Regards,
> Frank Chang
> 
> 
> 
>     Thanks,
> 
> 
>     Daniel
> 
> 
>      >
>      > I'll see how other IOMMUs are handling their iommu_find_as()
>      >
>      >
>      > Thanks,
>      >
>      >
>      > Daniel
>      >
>      >
>      >>
>      >>
>      >>
>      >>
>      >>> +        s = s->iommus.le_next;
>      >>> +    }
>      >>> +
>      >>> +    return as ? as : &address_space_memory;
>      >>> +}
>      >>> +
>      >>> +static const PCIIOMMUOps riscv_iommu_ops = {
>      >>> +    .get_address_space = riscv_iommu_find_as,
>      >>> +};
>      >>> +
>      >>> +void riscv_iommu_pci_setup_iommu(RISCVIOMMUState *iommu, PCIBus *bus,
>      >>> +        Error **errp)
>      >>> +{
>      >>> +    if (bus->iommu_ops &&
>      >>> +        bus->iommu_ops->get_address_space == riscv_iommu_find_as) {
>      >>> +        /* Allow multiple IOMMUs on the same PCIe bus, link known devices */
>      >>> +        RISCVIOMMUState *last = (RISCVIOMMUState *)bus->iommu_opaque;
>      >>> +        QLIST_INSERT_AFTER(last, iommu, iommus);
>      >>> +    } else if (bus->iommu_ops == NULL) {
>      >>> +        pci_setup_iommu(bus, &riscv_iommu_ops, iommu);
>      >>> +    } else {
>      >>> +        error_setg(errp, "can't register secondary IOMMU for PCI bus #%d",
>      >>> +            pci_bus_num(bus));
>      >>> +    }
>      >>> +}
>      >>> +
>      >>> +static int riscv_iommu_memory_region_index(IOMMUMemoryRegion *iommu_mr,
>      >>> +    MemTxAttrs attrs)
>      >>> +{
>      >>> +    return attrs.unspecified ? RISCV_IOMMU_NOPASID : (int)attrs.pasid;
>      >>> +}
>      >>> +
>      >>> +static int riscv_iommu_memory_region_index_len(IOMMUMemoryRegion *iommu_mr)
>      >>> +{
>      >>> +    RISCVIOMMUSpace *as = container_of(iommu_mr, RISCVIOMMUSpace, iova_mr);
>      >>> +    return 1 << as->iommu->pasid_bits;
>      >>> +}
>      >>> +
>      >>> +static void riscv_iommu_memory_region_init(ObjectClass *klass, void *data)
>      >>> +{
>      >>> +    IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_CLASS(klass);
>      >>> +
>      >>> +    imrc->translate = riscv_iommu_memory_region_translate;
>      >>> +    imrc->notify_flag_changed = riscv_iommu_memory_region_notify;
>      >>> +    imrc->attrs_to_index = riscv_iommu_memory_region_index;
>      >>> +    imrc->num_indexes = riscv_iommu_memory_region_index_len;
>      >>> +}
>      >>> +
>      >>> +static const TypeInfo riscv_iommu_memory_region_info = {
>      >>> +    .parent = TYPE_IOMMU_MEMORY_REGION,
>      >>> +    .name = TYPE_RISCV_IOMMU_MEMORY_REGION,
>      >>> +    .class_init = riscv_iommu_memory_region_init,
>      >>> +};
>      >>> +
>      >>> +static void riscv_iommu_register_mr_types(void)
>      >>> +{
>      >>> +    type_register_static(&riscv_iommu_memory_region_info);
>      >>> +    type_register_static(&riscv_iommu_info);
>      >>> +}
>      >>> +
>      >>> +type_init(riscv_iommu_register_mr_types);
>      >>> diff --git a/hw/riscv/riscv-iommu.h b/hw/riscv/riscv-iommu.h
>      >>> new file mode 100644
>      >>> index 0000000000..6f740de690
>      >>> --- /dev/null
>      >>> +++ b/hw/riscv/riscv-iommu.h
>      >>> @@ -0,0 +1,141 @@
>      >>> +/*
>      >>> + * QEMU emulation of an RISC-V IOMMU (Ziommu)
>      >>> + *
>      >>> + * Copyright (C) 2022-2023 Rivos Inc.
>      >>> + *
>      >>> + * This program is free software; you can redistribute it and/or modify
>      >>> + * it under the terms of the GNU General Public License as published by
>      >>> + * the Free Software Foundation; either version 2 of the License.
>      >>> + *
>      >>> + * This program is distributed in the hope that it will be useful,
>      >>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>      >>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>      >>> + * GNU General Public License for more details.
>      >>> + *
>      >>> + * You should have received a copy of the GNU General Public License along
>      >>> + * with this program; if not, see <http://www.gnu.org/licenses/>.
>      >>> + */
>      >>> +
>      >>> +#ifndef HW_RISCV_IOMMU_STATE_H
>      >>> +#define HW_RISCV_IOMMU_STATE_H
>      >>> +
>      >>> +#include "qemu/osdep.h"
>      >>> +#include "qom/object.h"
>      >>> +
>      >>> +#include "hw/riscv/iommu.h"
>      >>> +
>      >>> +struct RISCVIOMMUState {
>      >>> +    /*< private >*/
>      >>> +    DeviceState parent_obj;
>      >>> +
>      >>> +    /*< public >*/
>      >>> +    uint32_t version;     /* Reported interface version number */
>      >>> +    uint32_t pasid_bits;  /* process identifier width */
>      >>> +    uint32_t bus;         /* PCI bus mapping for non-root endpoints */
>      >>> +
>      >>> +    uint64_t cap;         /* IOMMU supported capabilities */
>      >>> +    uint64_t fctl;        /* IOMMU enabled features */
>      >>> +
>      >>> +    bool enable_off;      /* Enable out-of-reset OFF mode (DMA disabled) */
>      >>> +    bool enable_msi;      /* Enable MSI remapping */
>      >>> +
>      >>> +    /* IOMMU Internal State */
>      >>> +    uint64_t ddtp;        /* Validated Device Directory Tree Root Pointer */
>      >>> +
>      >>> +    dma_addr_t cq_addr;   /* Command queue base physical address */
>      >>> +    dma_addr_t fq_addr;   /* Fault/event queue base physical address */
>      >>> +    dma_addr_t pq_addr;   /* Page request queue base physical address */
>      >>> +
>      >>> +    uint32_t cq_mask;     /* Command queue index bit mask */
>      >>> +    uint32_t fq_mask;     /* Fault/event queue index bit mask */
>      >>> +    uint32_t pq_mask;     /* Page request queue index bit mask */
>      >>> +
>      >>> +    /* interrupt notifier */
>      >>> +    void (*notify)(RISCVIOMMUState *iommu, unsigned vector);
>      >>> +
>      >>> +    /* IOMMU State Machine */
>      >>> +    QemuThread core_proc; /* Background processing thread */
>      >>> +    QemuMutex core_lock;  /* Global IOMMU lock, used for cache/regs updates */
>      >>> +    QemuCond core_cond;   /* Background processing wake up signal */
>      >>> +    unsigned core_exec;   /* Processing thread execution actions */
>      >>> +
>      >>> +    /* IOMMU target address space */
>      >>> +    AddressSpace *target_as;
>      >>> +    MemoryRegion *target_mr;
>      >>> +
>      >>> +    /* MSI / MRIF access trap */
>      >>> +    AddressSpace trap_as;
>      >>> +    MemoryRegion trap_mr;
>      >>> +
>      >>> +    GHashTable *ctx_cache;          /* Device translation Context Cache */
>      >>> +
>      >>> +    /* MMIO Hardware Interface */
>      >>> +    MemoryRegion regs_mr;
>      >>> +    QemuSpin regs_lock;
>      >>> +    uint8_t *regs_rw;  /* register state (user write) */
>      >>> +    uint8_t *regs_wc;  /* write-1-to-clear mask */
>      >>> +    uint8_t *regs_ro;  /* read-only mask */
>      >>> +
>      >>> +    QLIST_ENTRY(RISCVIOMMUState) iommus;
>      >>> +    QLIST_HEAD(, RISCVIOMMUSpace) spaces;
>      >>> +};
>      >>> +
>      >>> +void riscv_iommu_pci_setup_iommu(RISCVIOMMUState *iommu, PCIBus *bus,
>      >>> +         Error **errp);
>      >>> +
>      >>> +/* private helpers */
>      >>> +
>      >>> +/* Register helper functions */
>      >>> +static inline uint32_t riscv_iommu_reg_mod32(RISCVIOMMUState *s,
>      >>> +    unsigned idx, uint32_t set, uint32_t clr)
>      >>> +{
>      >>> +    uint32_t val;
>      >>> +    qemu_spin_lock(&s->regs_lock);
>      >>> +    val = ldl_le_p(s->regs_rw + idx);
>      >>> +    stl_le_p(s->regs_rw + idx, (val & ~clr) | set);
>      >>> +    qemu_spin_unlock(&s->regs_lock);
>      >>> +    return val;
>      >>> +}
>      >>> +
>      >>> +static inline void riscv_iommu_reg_set32(RISCVIOMMUState *s,
>      >>> +    unsigned idx, uint32_t set)
>      >>> +{
>      >>> +    qemu_spin_lock(&s->regs_lock);
>      >>> +    stl_le_p(s->regs_rw + idx, set);
>      >>> +    qemu_spin_unlock(&s->regs_lock);
>      >>> +}
>      >>> +
>      >>> +static inline uint32_t riscv_iommu_reg_get32(RISCVIOMMUState *s,
>      >>> +    unsigned idx)
>      >>> +{
>      >>> +    return ldl_le_p(s->regs_rw + idx);
>      >>> +}
>      >>> +
>      >>> +static inline uint64_t riscv_iommu_reg_mod64(RISCVIOMMUState *s,
>      >>> +    unsigned idx, uint64_t set, uint64_t clr)
>      >>> +{
>      >>> +    uint64_t val;
>      >>> +    qemu_spin_lock(&s->regs_lock);
>      >>> +    val = ldq_le_p(s->regs_rw + idx);
>      >>> +    stq_le_p(s->regs_rw + idx, (val & ~clr) | set);
>      >>> +    qemu_spin_unlock(&s->regs_lock);
>      >>> +    return val;
>      >>> +}
>      >>> +
>      >>> +static inline void riscv_iommu_reg_set64(RISCVIOMMUState *s,
>      >>> +    unsigned idx, uint64_t set)
>      >>> +{
>      >>> +    qemu_spin_lock(&s->regs_lock);
>      >>> +    stq_le_p(s->regs_rw + idx, set);
>      >>> +    qemu_spin_unlock(&s->regs_lock);
>      >>> +}
>      >>> +
>      >>> +static inline uint64_t riscv_iommu_reg_get64(RISCVIOMMUState *s,
>      >>> +    unsigned idx)
>      >>> +{
>      >>> +    return ldq_le_p(s->regs_rw + idx);
>      >>> +}
>      >>> +
>      >>> +
>      >>> +
>      >>> +#endif
>      >>> diff --git a/hw/riscv/trace-events b/hw/riscv/trace-events
>      >>> new file mode 100644
>      >>> index 0000000000..42a97caffa
>      >>> --- /dev/null
>      >>> +++ b/hw/riscv/trace-events
>      >>> @@ -0,0 +1,11 @@
>      >>> +# See documentation at docs/devel/tracing.rst
>      >>> +
>      >>> +# riscv-iommu.c
>      >>> +riscv_iommu_new(const char *id, unsigned b, unsigned d, unsigned f) "%s: device attached %04x:%02x.%d"
>      >>> +riscv_iommu_flt(const char *id, unsigned b, unsigned d, unsigned f, uint64_t reason, uint64_t iova) "%s: fault %04x:%02x.%u reason: 0x%"PRIx64" iova: 0x%"PRIx64
>      >>> +riscv_iommu_pri(const char *id, unsigned b, unsigned d, unsigned f, uint64_t iova) "%s: page request %04x:%02x.%u iova: 0x%"PRIx64
>      >>> +riscv_iommu_dma(const char *id, unsigned b, unsigned d, unsigned f, unsigned pasid, const char *dir, uint64_t iova, uint64_t phys) "%s: translate %04x:%02x.%u #%u %s 0x%"PRIx64" -> 0x%"PRIx64
>      >>> +riscv_iommu_msi(const char *id, unsigned b, unsigned d, unsigned f, uint64_t iova, uint64_t phys) "%s: translate %04x:%02x.%u MSI 0x%"PRIx64" -> 0x%"PRIx64
>      >>> +riscv_iommu_cmd(const char *id, uint64_t l, uint64_t u) "%s: command 0x%"PRIx64" 0x%"PRIx64
>      >>> +riscv_iommu_notifier_add(const char *id) "%s: dev-iotlb notifier added"
>      >>> +riscv_iommu_notifier_del(const char *id) "%s: dev-iotlb notifier removed"
>      >>> diff --git a/hw/riscv/trace.h b/hw/riscv/trace.h
>      >>> new file mode 100644
>      >>> index 0000000000..b88504b750
>      >>> --- /dev/null
>      >>> +++ b/hw/riscv/trace.h
>      >>> @@ -0,0 +1,2 @@
>      >>> +#include "trace/trace-hw_riscv.h"
>      >>> +
>      >>> diff --git a/include/hw/riscv/iommu.h b/include/hw/riscv/iommu.h
>      >>> new file mode 100644
>      >>> index 0000000000..403b365893
>      >>> --- /dev/null
>      >>> +++ b/include/hw/riscv/iommu.h
>      >>> @@ -0,0 +1,36 @@
>      >>> +/*
>      >>> + * QEMU emulation of an RISC-V IOMMU (Ziommu)
>      >>> + *
>      >>> + * Copyright (C) 2022-2023 Rivos Inc.
>      >>> + *
>      >>> + * This program is free software; you can redistribute it and/or modify
>      >>> + * it under the terms of the GNU General Public License as published by
>      >>> + * the Free Software Foundation; either version 2 of the License.
>      >>> + *
>      >>> + * This program is distributed in the hope that it will be useful,
>      >>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>      >>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>      >>> + * GNU General Public License for more details.
>      >>> + *
>      >>> + * You should have received a copy of the GNU General Public License along
>      >>> + * with this program; if not, see <http://www.gnu.org/licenses/>.
>      >>> + */
>      >>> +
>      >>> +#ifndef HW_RISCV_IOMMU_H
>      >>> +#define HW_RISCV_IOMMU_H
>      >>> +
>      >>> +#include "qemu/osdep.h"
>      >>> +#include "qom/object.h"
>      >>> +
>      >>> +#define TYPE_RISCV_IOMMU "riscv-iommu"
>      >>> +OBJECT_DECLARE_SIMPLE_TYPE(RISCVIOMMUState, RISCV_IOMMU)
>      >>> +typedef struct RISCVIOMMUState RISCVIOMMUState;
>      >>> +
>      >>> +#define TYPE_RISCV_IOMMU_MEMORY_REGION "riscv-iommu-mr"
>      >>> +typedef struct RISCVIOMMUSpace RISCVIOMMUSpace;
>      >>> +
>      >>> +#define TYPE_RISCV_IOMMU_PCI "riscv-iommu-pci"
>      >>> +OBJECT_DECLARE_SIMPLE_TYPE(RISCVIOMMUStatePci, RISCV_IOMMU_PCI)
>      >>> +typedef struct RISCVIOMMUStatePci RISCVIOMMUStatePci;
>      >>> +
>      >>> +#endif
>      >>> diff --git a/meson.build b/meson.build
>      >>> index c59ca496f2..75e56f3282 100644
>      >>> --- a/meson.build
>      >>> +++ b/meson.build
>      >>> @@ -3361,6 +3361,7 @@ if have_system
>      >>>       'hw/rdma',
>      >>>       'hw/rdma/vmw',
>      >>>       'hw/rtc',
>      >>> +    'hw/riscv',
>      >>>       'hw/s390x',
>      >>>       'hw/scsi',
>      >>>       'hw/sd',
>      >>> --
>      >>> 2.43.2
>      >>>
>      >>>
> 



* Re: [PATCH v2 00/15] riscv: QEMU RISC-V IOMMU Support
  2024-05-10 11:14 ` [PATCH v2 00/15] riscv: QEMU RISC-V IOMMU Support Frank Chang
@ 2024-05-20 16:26   ` Daniel Henrique Barboza
  0 siblings, 0 replies; 55+ messages in thread
From: Daniel Henrique Barboza @ 2024-05-20 16:26 UTC (permalink / raw)
  To: Frank Chang
  Cc: qemu-devel, qemu-riscv, alistair.francis, bmeng, liwei1518,
	zhiwei_liu, palmer, ajones, tjeznach



On 5/10/24 08:14, Frank Chang wrote:
> Hi Daniel,
> 
> Thanks for the upstream work.
> Sorry that it took a while for me to review the patchset.
> 
> Please let me know if you need any help from us to update the IOMMU model.
> We would like to see it merged for QEMU 9.1.0.

Thanks for the help in the reviews!

I'll do some last changes in the riscv-iommu-pci device, and check if we have any
DT changes that happened that we need to sync up.

The plan is to send v3 in the next couple of days. Let's see how it goes.


Thanks,


Daniel


> 
> Regards,
> Frank Chang
> 
> Daniel Henrique Barboza <dbarboza@ventanamicro.com> wrote on Fri, Mar 8, 2024 at 12:04 AM:
>>
>> Hi,
>>
>> This is the second version of the work Tomasz sent in July 2023 [1].
>> I'll be helping Tomasz upstreaming it.
>>
>> The core emulation code is left unchanged but a few tweaks were made in
>> v2:
>>
>> - The most notable difference in this version is that the code was split
>>    in smaller chunks. Patch 03 is still a 1700 lines patch, which is an
>>    improvement from the 3800 lines patch from v1, but we can only go so
>>    far when splitting the core components of the emulation. The reality
>>    is that the IOMMU emulation is a rather complex piece of software and
>>    there's not much we can do to alleviate it;
>>
>> - I'm not contributing the HPM support that was present in v1. It shaved
>>    off 600 lines of code from the series, which is already large enough
>>    as is. We'll introduce HPM in later versions or as a follow-up;
>>
>> - The riscv-iommu-header.h header was also trimmed. I shaved 300 lines
>>    or so from it, all of them definitions that the emulation isn't
>>    using. The header will eventually be imported from the Linux
>>    driver (not upstream yet), so for now we can live with a trimmed
>>    header for the emulation usage alone;
>>
>> - I added libqos tests for the riscv-iommu-pci device. The idea of these
>>    tests is to give us more confidence in the emulation code;
>>
>> - 'edu' device support. The support was retrieved from Tomasz EDU branch
>>    [2]. This device can then be used to test PCI passthrough to exercise
>>    the IOMMU.
>>
>>
>> Patches based on alistair/riscv-to-apply.next.
>>
>> v1 link: https://lore.kernel.org/qemu-riscv/cover.1689819031.git.tjeznach@rivosinc.com/
>>
>> [1] https://lore.kernel.org/qemu-riscv/cover.1689819031.git.tjeznach@rivosinc.com/
>> [2] https://github.com/tjeznach/qemu.git, branch 'riscv_iommu_edu_impl'
>>
>> Andrew Jones (1):
>>    hw/riscv/riscv-iommu: Add another irq for mrif notifications
>>
>> Daniel Henrique Barboza (2):
>>    test/qtest: add riscv-iommu-pci tests
>>    qtest/riscv-iommu-test: add init queues test
>>
>> Tomasz Jeznach (12):
>>    exec/memtxattr: add process identifier to the transaction attributes
>>    hw/riscv: add riscv-iommu-bits.h
>>    hw/riscv: add RISC-V IOMMU base emulation
>>    hw/riscv: add riscv-iommu-pci device
>>    hw/riscv: add riscv-iommu-sys platform device
>>    hw/riscv/virt.c: support for RISC-V IOMMU PCIDevice hotplug
>>    hw/riscv/riscv-iommu: add Address Translation Cache (IOATC)
>>    hw/riscv/riscv-iommu: add s-stage and g-stage support
>>    hw/riscv/riscv-iommu: add ATS support
>>    hw/riscv/riscv-iommu: add DBG support
>>    hw/misc: EDU: added PASID support
>>    hw/misc: EDU: add ATS/PRI capability
>>
>>   hw/misc/edu.c                    |  297 ++++-
>>   hw/riscv/Kconfig                 |    4 +
>>   hw/riscv/meson.build             |    1 +
>>   hw/riscv/riscv-iommu-bits.h      |  407 ++++++
>>   hw/riscv/riscv-iommu-pci.c       |  173 +++
>>   hw/riscv/riscv-iommu-sys.c       |   93 ++
>>   hw/riscv/riscv-iommu.c           | 2085 ++++++++++++++++++++++++++++++
>>   hw/riscv/riscv-iommu.h           |  146 +++
>>   hw/riscv/trace-events            |   15 +
>>   hw/riscv/trace.h                 |    2 +
>>   hw/riscv/virt.c                  |   33 +-
>>   include/exec/memattrs.h          |    5 +
>>   include/hw/riscv/iommu.h         |   40 +
>>   meson.build                      |    1 +
>>   tests/qtest/libqos/meson.build   |    4 +
>>   tests/qtest/libqos/riscv-iommu.c |   79 ++
>>   tests/qtest/libqos/riscv-iommu.h |   96 ++
>>   tests/qtest/meson.build          |    1 +
>>   tests/qtest/riscv-iommu-test.c   |  234 ++++
>>   19 files changed, 3704 insertions(+), 12 deletions(-)
>>   create mode 100644 hw/riscv/riscv-iommu-bits.h
>>   create mode 100644 hw/riscv/riscv-iommu-pci.c
>>   create mode 100644 hw/riscv/riscv-iommu-sys.c
>>   create mode 100644 hw/riscv/riscv-iommu.c
>>   create mode 100644 hw/riscv/riscv-iommu.h
>>   create mode 100644 hw/riscv/trace-events
>>   create mode 100644 hw/riscv/trace.h
>>   create mode 100644 include/hw/riscv/iommu.h
>>   create mode 100644 tests/qtest/libqos/riscv-iommu.c
>>   create mode 100644 tests/qtest/libqos/riscv-iommu.h
>>   create mode 100644 tests/qtest/riscv-iommu-test.c
>>
>> --
>> 2.43.2
>>
>>


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 03/15] hw/riscv: add RISC-V IOMMU base emulation
  2024-05-20 16:17           ` Daniel Henrique Barboza
@ 2024-05-21 10:52             ` Frank Chang
  2024-05-21 12:28               ` Daniel Henrique Barboza
  0 siblings, 1 reply; 55+ messages in thread
From: Frank Chang @ 2024-05-21 10:52 UTC (permalink / raw)
  To: Daniel Henrique Barboza
  Cc: qemu-devel, qemu-riscv, alistair.francis, bmeng, liwei1518,
	zhiwei_liu, palmer, ajones, tjeznach, Sebastien Boeuf, Sunil V L


Hi Daniel,

On Tue, May 21, 2024 at 12:17 AM Daniel Henrique Barboza <dbarboza@ventanamicro.com> wrote:

> Hi Frank,
>
> On 5/16/24 04:13, Frank Chang wrote:
> > On Mon, May 13, 2024 at 8:37 PM Daniel Henrique Barboza <dbarboza@ventanamicro.com> wrote:
> >
> >     Hi Frank,
> >
> >
> >     On 5/8/24 08:15, Daniel Henrique Barboza wrote:
> >      > Hi Frank,
> >      >
> >      > I'll reply with that I've done so far. Still missing some stuff:
> >      >
> >      > On 5/2/24 08:37, Frank Chang wrote:
> >      >> Hi Daniel,
> >      >>
> >      >> On Fri, Mar 8, 2024 at 12:04 AM Daniel Henrique Barboza <dbarboza@ventanamicro.com> wrote:
> >      >>>
> >      >>> From: Tomasz Jeznach <tjeznach@rivosinc.com>
> >      >>>
> >      >>> The RISC-V IOMMU specification is now ratified as-per the RISC-V
> >      >>> international process. The latest frozen specification can be found
> >      >>> at:
> >      >>>
> >      >>> https://github.com/riscv-non-isa/riscv-iommu/releases/download/v1.0/riscv-iommu.pdf
> >      >>>
> >      >>> Add the foundation of the device emulation for RISC-V IOMMU, which
> >      >>> includes an IOMMU that has no capabilities but MSI interrupt support and
> >      >>> fault queue interfaces. We'll add more features incrementally in the
> >      >>> next patches.
> >      >>>
> >      >>> Co-developed-by: Sebastien Boeuf <seb@rivosinc.com>
> >      >>> Signed-off-by: Sebastien Boeuf <seb@rivosinc.com>
> >      >>> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
> >      >>> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
> >      >>> ---
> >      >>>   hw/riscv/Kconfig         |    4 +
> >
> >     (...)
> >
> >      >>> +
> >      >>> +    s->iommus.le_next = NULL;
> >      >>> +    s->iommus.le_prev = NULL;
> >      >>> +    QLIST_INIT(&s->spaces);
> >      >>> +    qemu_cond_init(&s->core_cond);
> >      >>> +    qemu_mutex_init(&s->core_lock);
> >      >>> +    qemu_spin_init(&s->regs_lock);
> >      >>> +    qemu_thread_create(&s->core_proc, "riscv-iommu-core",
> >      >>> +        riscv_iommu_core_proc, s, QEMU_THREAD_JOINABLE);
> >      >>
> >      >> In our experience, using QEMU thread increases the latency of command
> >      >> queue processing,
> >      >> which leads to the potential IOMMU fence timeout in the Linux driver
> >      >> when using IOMMU with KVM,
> >      >> e.g. booting the guest Linux.
> >      >>
> >      >> Is it possible to remove the thread from the IOMMU just like ARM, AMD,
> >      >> and Intel IOMMU models?
> >      >
> >      > Interesting. We've been using this emulation internally in Ventana, with
> >      > KVM and VFIO, and didn't experience this issue. Drew is on CC and can talk
> >      > more about it.
> >      >
> >      > That said, I don't mind this change, assuming it's feasible to make it for this
> >      > first version.  I'll need to check how other IOMMUs are doing it.
> >
> >
> >     I removed the threading and it seems to be working fine without it. I'll commit this
> >     change for v3.
> >
> >      >
> >      >
> >      >
> >      >>
> >      >>> +}
> >      >>> +
> >      >
> >      > (...)
> >      >
> >      >>> +
> >      >>> +static AddressSpace *riscv_iommu_find_as(PCIBus *bus, void
> *opaque, int devfn)
> >      >>> +{
> >      >>> +    RISCVIOMMUState *s = (RISCVIOMMUState *) opaque;
> >      >>> +    PCIDevice *pdev = pci_find_device(bus, pci_bus_num(bus),
> devfn);
> >      >>> +    AddressSpace *as = NULL;
> >      >>> +
> >      >>> +    if (pdev && pci_is_iommu(pdev)) {
> >      >>> +        return s->target_as;
> >      >>> +    }
> >      >>> +
> >      >>> +    /* Find first registered IOMMU device */
> >      >>> +    while (s->iommus.le_prev) {
> >      >>> +        s = *(s->iommus.le_prev);
> >      >>> +    }
> >      >>> +
> >      >>> +    /* Find first matching IOMMU */
> >      >>> +    while (s != NULL && as == NULL) {
> >      >>> +        as = riscv_iommu_space(s,
> PCI_BUILD_BDF(pci_bus_num(bus), devfn));
> >      >>
> >      >> For pci_bus_num(),
> >      >> riscv_iommu_find_as() can be called at the very early stage
> >      >> where software has no chance to enumerate the bus numbers.
> >
> >     I investigated and this doesn't seem to be a problem. This function is called at the
> >     last step of the realize() steps of both riscv_iommu_pci_realize() and
> >     riscv_iommu_sys_realize(), and by that time the pci_bus_num() is already assigned.
> >     Other iommus use pci_bus_num() in their own get_address_space() callbacks like
> >     this too.
> >
> >
> > Hi Daniel,
> >
> > IIUC, pci_bus_num() by default is assigned to pcibus_num():
> >
> > static int pcibus_num(PCIBus *bus)
> > {
> >      if (pci_bus_is_root(bus)) {
> >          return 0; /* pci host bridge */
> >      }
> >      return bus->parent_dev->config[PCI_SECONDARY_BUS];
> > }
> >
> > If the bus is not the root bus, it tries to read the bus' parent device's
> > secondary bus number (PCI_SECONDARY_BUS) field in the PCI configuration space.
> > This field should be programmable by the SW during PCIe enumeration.
> > But I don't think SW has a chance to be executed before riscv_iommu_sys_realize() is called,
> > since it's pretty early before CPU's execution unless RISC-V IOMMU is hot-plugged.
> > Even if RISC-V IOMMU is hot-plugged, I think riscv_iommu_sys_realize() is still called
> > before SW is aware of the existence of the IOMMU on the PCI topology tree.
> >
> > Do you think this matches your observation?
>
> It does. You have a good point on how the pcibus_num() can vary if SW wants to
> change the PCI_SECONDARY_BUS and the IOMMU isn't in a root bus. Note that this
> will not happen with the existing riscv-iommu-pci device as it is now, since it
> has code to prevent the device from being attached to non-root PCI buses, but
> there are no such restrictions in the riscv-iommu-sys device.
>

Thanks for the explanation.

Do you know where this limitation comes from?
Is it in this patchset, or is it somewhere else, in the Linux RISC-V IOMMU driver?

BTW, for the case like DesignWare PCIe host controller [1],
we cannot connect RISC-V IOMMU to the root bus ("pcie") [2]
because it already has a child bus ("dw-pcie") connecting to it [3].

If we try to connect the RISC-V IOMMU to the root bus ("pcie"),
it can't be discovered by the Linux PCIe driver, because a PCIe Downstream Port
normally leads to a Link with only Device 0 on it.

PCIe spec 6.0, section 7.3.1 states:
"Downstream Ports that do not have ARI Forwarding enabled must associate
only Device 0 with the device attached to the Logical Bus representing the
Link from the Port."

The PCI slot scan returns early in the Linux PCIe driver [4][5].

Do you think it's possible to remove this limitation?
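
To make the Device 0 constraint concrete, here is a minimal standalone sketch
of the early-return behaviour described above. It is a simplified model under
stated assumptions, not the kernel code: struct model_bus,
model_only_one_child() and model_slot_is_scanned() are hypothetical names
invented for illustration only.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a PCI bus (not a kernel type). */
struct model_bus {
    /* true when the bus is the Link below a PCIe Downstream Port */
    bool below_pcie_downstream_port;
};

/* Returns true when only Device 0 is expected on this bus. */
bool model_only_one_child(const struct model_bus *bus)
{
    return bus->below_pcie_downstream_port;
}

/*
 * Returns true if the given slot (device number 0..31) would be probed.
 * On a bus where only Device 0 is expected, the scan of every other
 * slot returns early and nothing there is ever discovered.
 */
bool model_slot_is_scanned(const struct model_bus *bus, unsigned slot)
{
    if (model_only_one_child(bus) && slot > 0) {
        return false;
    }
    return true;
}
```

In this model, a device placed at slot 1 of a bus below a Downstream Port is
never probed, while the same slot on an ordinary bus is scanned normally.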

[1] https://github.com/qemu/qemu/blob/master/hw/pci-host/designware.c
[2] https://github.com/qemu/qemu/blob/master/hw/pci-host/designware.c#L695
[3] https://github.com/qemu/qemu/blob/master/hw/pci-host/designware.c#L409
[4] https://github.com/torvalds/linux/blob/master/drivers/pci/probe.c#L2674
[5] https://github.com/torvalds/linux/blob/master/drivers/pci/probe.c#L2652

Regards,
Frank Chang

>
> And speaking of riscv-iommu-bus, the current device we have in this series is too
> bare bones, without an actual use case for it (e.g. code to add it in the 'virt'
> machine), but it's getting in the way nevertheless.
>
> I'll remove the riscv-iommu-sys device from v3 and re-introduce it in a later
> revision or as a follow up series. Sunil has a handful of patches that add the
> riscv-iommu-sys device in the 'virt' machine and the proper ACPI support for it [1],
> and I intend to use them as a base. We'll then need some minor adjustments in the
> existing code to make it fully functional like we're doing with riscv-iommu-pci.
>
>
> Thanks,
>
> Daniel
>
>
> [1] https://github.com/vlsunil/qemu/commits/acpi_rimt_poc_v1/
> >
> > Regards,
> > Frank Chang
> >
> >
> >
> >     Thanks,
> >
> >
> >     Daniel
> >
> >
> >      >
> >      > I'll see how other IOMMUs are handling their iommu_find_as()
> >      >
> >      >
> >      > Thanks,
> >      >
> >      >
> >      > Daniel
> >      >
> >      >
> >      >>
> >      >>
> >      >>
> >      >>
> >      >>> +        s = s->iommus.le_next;
> >      >>> +    }
> >      >>> +
> >      >>> +    return as ? as : &address_space_memory;
> >      >>> +}
> >      >>> +
> >      >>> +static const PCIIOMMUOps riscv_iommu_ops = {
> >      >>> +    .get_address_space = riscv_iommu_find_as,
> >      >>> +};
> >      >>> +
> >      >>> +void riscv_iommu_pci_setup_iommu(RISCVIOMMUState *iommu,
> PCIBus *bus,
> >      >>> +        Error **errp)
> >      >>> +{
> >      >>> +    if (bus->iommu_ops &&
> >      >>> +        bus->iommu_ops->get_address_space ==
> riscv_iommu_find_as) {
> >      >>> +        /* Allow multiple IOMMUs on the same PCIe bus, link
> known devices */
> >      >>> +        RISCVIOMMUState *last = (RISCVIOMMUState
> *)bus->iommu_opaque;
> >      >>> +        QLIST_INSERT_AFTER(last, iommu, iommus);
> >      >>> +    } else if (bus->iommu_ops == NULL) {
> >      >>> +        pci_setup_iommu(bus, &riscv_iommu_ops, iommu);
> >      >>> +    } else {
> >      >>> +        error_setg(errp, "can't register secondary IOMMU for
> PCI bus #%d",
> >      >>> +            pci_bus_num(bus));
> >      >>> +    }
> >      >>> +}
> >      >>> +
> >      >>> +static int riscv_iommu_memory_region_index(IOMMUMemoryRegion
> *iommu_mr,
> >      >>> +    MemTxAttrs attrs)
> >      >>> +{
> >      >>> +    return attrs.unspecified ? RISCV_IOMMU_NOPASID :
> (int)attrs.pasid;
> >      >>> +}
> >      >>> +
> >      >>> +static int
> riscv_iommu_memory_region_index_len(IOMMUMemoryRegion *iommu_mr)
> >      >>> +{
> >      >>> +    RISCVIOMMUSpace *as = container_of(iommu_mr,
> RISCVIOMMUSpace, iova_mr);
> >      >>> +    return 1 << as->iommu->pasid_bits;
> >      >>> +}
> >      >>> +
> >      >>> +static void riscv_iommu_memory_region_init(ObjectClass *klass,
> void *data)
> >      >>> +{
> >      >>> +    IOMMUMemoryRegionClass *imrc =
> IOMMU_MEMORY_REGION_CLASS(klass);
> >      >>> +
> >      >>> +    imrc->translate = riscv_iommu_memory_region_translate;
> >      >>> +    imrc->notify_flag_changed =
> riscv_iommu_memory_region_notify;
> >      >>> +    imrc->attrs_to_index = riscv_iommu_memory_region_index;
> >      >>> +    imrc->num_indexes = riscv_iommu_memory_region_index_len;
> >      >>> +}
> >      >>> +
> >      >>> +static const TypeInfo riscv_iommu_memory_region_info = {
> >      >>> +    .parent = TYPE_IOMMU_MEMORY_REGION,
> >      >>> +    .name = TYPE_RISCV_IOMMU_MEMORY_REGION,
> >      >>> +    .class_init = riscv_iommu_memory_region_init,
> >      >>> +};
> >      >>> +
> >      >>> +static void riscv_iommu_register_mr_types(void)
> >      >>> +{
> >      >>> +    type_register_static(&riscv_iommu_memory_region_info);
> >      >>> +    type_register_static(&riscv_iommu_info);
> >      >>> +}
> >      >>> +
> >      >>> +type_init(riscv_iommu_register_mr_types);
> >      >>> diff --git a/hw/riscv/riscv-iommu.h b/hw/riscv/riscv-iommu.h
> >      >>> new file mode 100644
> >      >>> index 0000000000..6f740de690
> >      >>> --- /dev/null
> >      >>> +++ b/hw/riscv/riscv-iommu.h
> >      >>> @@ -0,0 +1,141 @@
> >      >>> +/*
> >      >>> + * QEMU emulation of an RISC-V IOMMU (Ziommu)
> >      >>> + *
> >      >>> + * Copyright (C) 2022-2023 Rivos Inc.
> >      >>> + *
> >      >>> + * This program is free software; you can redistribute it
> and/or modify
> >      >>> + * it under the terms of the GNU General Public License as
> published by
> >      >>> + * the Free Software Foundation; either version 2 of the
> License.
> >      >>> + *
> >      >>> + * This program is distributed in the hope that it will be
> useful,
> >      >>> + * but WITHOUT ANY WARRANTY; without even the implied warranty
> of
> >      >>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See
> the
> >      >>> + * GNU General Public License for more details.
> >      >>> + *
> >      >>> + * You should have received a copy of the GNU General Public
> License along
> >      >>> + * with this program; if not, see <
> http://www.gnu.org/licenses/ <http://www.gnu.org/licenses/>>.
> >      >>> + */
> >      >>> +
> >      >>> +#ifndef HW_RISCV_IOMMU_STATE_H
> >      >>> +#define HW_RISCV_IOMMU_STATE_H
> >      >>> +
> >      >>> +#include "qemu/osdep.h"
> >      >>> +#include "qom/object.h"
> >      >>> +
> >      >>> +#include "hw/riscv/iommu.h"
> >      >>> +
> >      >>> +struct RISCVIOMMUState {
> >      >>> +    /*< private >*/
> >      >>> +    DeviceState parent_obj;
> >      >>> +
> >      >>> +    /*< public >*/
> >      >>> +    uint32_t version;     /* Reported interface version number
> */
> >      >>> +    uint32_t pasid_bits;  /* process identifier width */
> >      >>> +    uint32_t bus;         /* PCI bus mapping for non-root
> endpoints */
> >      >>> +
> >      >>> +    uint64_t cap;         /* IOMMU supported capabilities */
> >      >>> +    uint64_t fctl;        /* IOMMU enabled features */
> >      >>> +
> >      >>> +    bool enable_off;      /* Enable out-of-reset OFF mode (DMA
> disabled) */
> >      >>> +    bool enable_msi;      /* Enable MSI remapping */
> >      >>> +
> >      >>> +    /* IOMMU Internal State */
> >      >>> +    uint64_t ddtp;        /* Validated Device Directory Tree
> Root Pointer */
> >      >>> +
> >      >>> +    dma_addr_t cq_addr;   /* Command queue base physical
> address */
> >      >>> +    dma_addr_t fq_addr;   /* Fault/event queue base physical
> address */
> >      >>> +    dma_addr_t pq_addr;   /* Page request queue base physical
> address */
> >      >>> +
> >      >>> +    uint32_t cq_mask;     /* Command queue index bit mask */
> >      >>> +    uint32_t fq_mask;     /* Fault/event queue index bit mask
> */
> >      >>> +    uint32_t pq_mask;     /* Page request queue index bit mask
> */
> >      >>> +
> >      >>> +    /* interrupt notifier */
> >      >>> +    void (*notify)(RISCVIOMMUState *iommu, unsigned vector);
> >      >>> +
> >      >>> +    /* IOMMU State Machine */
> >      >>> +    QemuThread core_proc; /* Background processing thread */
> >      >>> +    QemuMutex core_lock;  /* Global IOMMU lock, used for
> cache/regs updates */
> >      >>> +    QemuCond core_cond;   /* Background processing wake up
> signal */
> >      >>> +    unsigned core_exec;   /* Processing thread execution
> actions */
> >      >>> +
> >      >>> +    /* IOMMU target address space */
> >      >>> +    AddressSpace *target_as;
> >      >>> +    MemoryRegion *target_mr;
> >      >>> +
> >      >>> +    /* MSI / MRIF access trap */
> >      >>> +    AddressSpace trap_as;
> >      >>> +    MemoryRegion trap_mr;
> >      >>> +
> >      >>> +    GHashTable *ctx_cache;          /* Device translation
> Context Cache */
> >      >>> +
> >      >>> +    /* MMIO Hardware Interface */
> >      >>> +    MemoryRegion regs_mr;
> >      >>> +    QemuSpin regs_lock;
> >      >>> +    uint8_t *regs_rw;  /* register state (user write) */
> >      >>> +    uint8_t *regs_wc;  /* write-1-to-clear mask */
> >      >>> +    uint8_t *regs_ro;  /* read-only mask */
> >      >>> +
> >      >>> +    QLIST_ENTRY(RISCVIOMMUState) iommus;
> >      >>> +    QLIST_HEAD(, RISCVIOMMUSpace) spaces;
> >      >>> +};
> >      >>> +
> >      >>> +void riscv_iommu_pci_setup_iommu(RISCVIOMMUState *iommu,
> PCIBus *bus,
> >      >>> +         Error **errp);
> >      >>> +
> >      >>> +/* private helpers */
> >      >>> +
> >      >>> +/* Register helper functions */
> >      >>> +static inline uint32_t riscv_iommu_reg_mod32(RISCVIOMMUState
> *s,
> >      >>> +    unsigned idx, uint32_t set, uint32_t clr)
> >      >>> +{
> >      >>> +    uint32_t val;
> >      >>> +    qemu_spin_lock(&s->regs_lock);
> >      >>> +    val = ldl_le_p(s->regs_rw + idx);
> >      >>> +    stl_le_p(s->regs_rw + idx, (val & ~clr) | set);
> >      >>> +    qemu_spin_unlock(&s->regs_lock);
> >      >>> +    return val;
> >      >>> +}
> >      >>> +
> >      >>> +static inline void riscv_iommu_reg_set32(RISCVIOMMUState *s,
> >      >>> +    unsigned idx, uint32_t set)
> >      >>> +{
> >      >>> +    qemu_spin_lock(&s->regs_lock);
> >      >>> +    stl_le_p(s->regs_rw + idx, set);
> >      >>> +    qemu_spin_unlock(&s->regs_lock);
> >      >>> +}
> >      >>> +
> >      >>> +static inline uint32_t riscv_iommu_reg_get32(RISCVIOMMUState
> *s,
> >      >>> +    unsigned idx)
> >      >>> +{
> >      >>> +    return ldl_le_p(s->regs_rw + idx);
> >      >>> +}
> >      >>> +
> >      >>> +static inline uint64_t riscv_iommu_reg_mod64(RISCVIOMMUState
> *s,
> >      >>> +    unsigned idx, uint64_t set, uint64_t clr)
> >      >>> +{
> >      >>> +    uint64_t val;
> >      >>> +    qemu_spin_lock(&s->regs_lock);
> >      >>> +    val = ldq_le_p(s->regs_rw + idx);
> >      >>> +    stq_le_p(s->regs_rw + idx, (val & ~clr) | set);
> >      >>> +    qemu_spin_unlock(&s->regs_lock);
> >      >>> +    return val;
> >      >>> +}
> >      >>> +
> >      >>> +static inline void riscv_iommu_reg_set64(RISCVIOMMUState *s,
> >      >>> +    unsigned idx, uint64_t set)
> >      >>> +{
> >      >>> +    qemu_spin_lock(&s->regs_lock);
> >      >>> +    stq_le_p(s->regs_rw + idx, set);
> >      >>> +    qemu_spin_unlock(&s->regs_lock);
> >      >>> +}
> >      >>> +
> >      >>> +static inline uint64_t riscv_iommu_reg_get64(RISCVIOMMUState
> *s,
> >      >>> +    unsigned idx)
> >      >>> +{
> >      >>> +    return ldq_le_p(s->regs_rw + idx);
> >      >>> +}
> >      >>> +
> >      >>> +
> >      >>> +
> >      >>> +#endif
> >      >>> diff --git a/hw/riscv/trace-events b/hw/riscv/trace-events
> >      >>> new file mode 100644
> >      >>> index 0000000000..42a97caffa
> >      >>> --- /dev/null
> >      >>> +++ b/hw/riscv/trace-events
> >      >>> @@ -0,0 +1,11 @@
> >      >>> +# See documentation at docs/devel/tracing.rst
> >      >>> +
> >      >>> +# riscv-iommu.c
> >      >>> +riscv_iommu_new(const char *id, unsigned b, unsigned d,
> unsigned f) "%s: device attached %04x:%02x.%d"
> >      >>> +riscv_iommu_flt(const char *id, unsigned b, unsigned d,
> unsigned f, uint64_t reason, uint64_t iova) "%s: fault %04x:%02x.%u reason:
> 0x%"PRIx64" iova: 0x%"PRIx64
> >      >>> +riscv_iommu_pri(const char *id, unsigned b, unsigned d,
> unsigned f, uint64_t iova) "%s: page request %04x:%02x.%u iova: 0x%"PRIx64
> >      >>> +riscv_iommu_dma(const char *id, unsigned b, unsigned d,
> unsigned f, unsigned pasid, const char *dir, uint64_t iova, uint64_t phys)
> "%s: translate %04x:%02x.%u #%u %s 0x%"PRIx64" -> 0x%"PRIx64
> >      >>> +riscv_iommu_msi(const char *id, unsigned b, unsigned d,
> unsigned f, uint64_t iova, uint64_t phys) "%s: translate %04x:%02x.%u MSI
> 0x%"PRIx64" -> 0x%"PRIx64
> >      >>> +riscv_iommu_cmd(const char *id, uint64_t l, uint64_t u) "%s:
> command 0x%"PRIx64" 0x%"PRIx64
> >      >>> +riscv_iommu_notifier_add(const char *id) "%s: dev-iotlb
> notifier added"
> >      >>> +riscv_iommu_notifier_del(const char *id) "%s: dev-iotlb
> notifier removed"
> >      >>> diff --git a/hw/riscv/trace.h b/hw/riscv/trace.h
> >      >>> new file mode 100644
> >      >>> index 0000000000..b88504b750
> >      >>> --- /dev/null
> >      >>> +++ b/hw/riscv/trace.h
> >      >>> @@ -0,0 +1,2 @@
> >      >>> +#include "trace/trace-hw_riscv.h"
> >      >>> +
> >      >>> diff --git a/include/hw/riscv/iommu.h b/include/hw/riscv/iommu.h
> >      >>> new file mode 100644
> >      >>> index 0000000000..403b365893
> >      >>> --- /dev/null
> >      >>> +++ b/include/hw/riscv/iommu.h
> >      >>> @@ -0,0 +1,36 @@
> >      >>> +/*
> >      >>> + * QEMU emulation of an RISC-V IOMMU (Ziommu)
> >      >>> + *
> >      >>> + * Copyright (C) 2022-2023 Rivos Inc.
> >      >>> + *
> >      >>> + * This program is free software; you can redistribute it
> and/or modify
> >      >>> + * it under the terms of the GNU General Public License as
> published by
> >      >>> + * the Free Software Foundation; either version 2 of the
> License.
> >      >>> + *
> >      >>> + * This program is distributed in the hope that it will be
> useful,
> >      >>> + * but WITHOUT ANY WARRANTY; without even the implied warranty
> of
> >      >>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See
> the
> >      >>> + * GNU General Public License for more details.
> >      >>> + *
> >      >>> + * You should have received a copy of the GNU General Public
> License along
> >      >>> + * with this program; if not, see <
> http://www.gnu.org/licenses/ <http://www.gnu.org/licenses/>>.
> >      >>> + */
> >      >>> +
> >      >>> +#ifndef HW_RISCV_IOMMU_H
> >      >>> +#define HW_RISCV_IOMMU_H
> >      >>> +
> >      >>> +#include "qemu/osdep.h"
> >      >>> +#include "qom/object.h"
> >      >>> +
> >      >>> +#define TYPE_RISCV_IOMMU "riscv-iommu"
> >      >>> +OBJECT_DECLARE_SIMPLE_TYPE(RISCVIOMMUState, RISCV_IOMMU)
> >      >>> +typedef struct RISCVIOMMUState RISCVIOMMUState;
> >      >>> +
> >      >>> +#define TYPE_RISCV_IOMMU_MEMORY_REGION "riscv-iommu-mr"
> >      >>> +typedef struct RISCVIOMMUSpace RISCVIOMMUSpace;
> >      >>> +
> >      >>> +#define TYPE_RISCV_IOMMU_PCI "riscv-iommu-pci"
> >      >>> +OBJECT_DECLARE_SIMPLE_TYPE(RISCVIOMMUStatePci, RISCV_IOMMU_PCI)
> >      >>> +typedef struct RISCVIOMMUStatePci RISCVIOMMUStatePci;
> >      >>> +
> >      >>> +#endif
> >      >>> diff --git a/meson.build b/meson.build
> >      >>> index c59ca496f2..75e56f3282 100644
> >      >>> --- a/meson.build
> >      >>> +++ b/meson.build
> >      >>> @@ -3361,6 +3361,7 @@ if have_system
> >      >>>       'hw/rdma',
> >      >>>       'hw/rdma/vmw',
> >      >>>       'hw/rtc',
> >      >>> +    'hw/riscv',
> >      >>>       'hw/s390x',
> >      >>>       'hw/scsi',
> >      >>>       'hw/sd',
> >      >>> --
> >      >>> 2.43.2
> >      >>>
> >      >>>
> >
>


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 03/15] hw/riscv: add RISC-V IOMMU base emulation
  2024-05-21 10:52             ` Frank Chang
@ 2024-05-21 12:28               ` Daniel Henrique Barboza
  0 siblings, 0 replies; 55+ messages in thread
From: Daniel Henrique Barboza @ 2024-05-21 12:28 UTC (permalink / raw)
  To: Frank Chang
  Cc: qemu-devel, qemu-riscv, alistair.francis, bmeng, liwei1518,
	zhiwei_liu, palmer, ajones, tjeznach, Sebastien Boeuf, Sunil V L



On 5/21/24 07:52, Frank Chang wrote:
> Hi Daniel,
> 
> On Tue, May 21, 2024 at 12:17 AM Daniel Henrique Barboza <dbarboza@ventanamicro.com> wrote:
> 
>     Hi Frank,
> 
>     On 5/16/24 04:13, Frank Chang wrote:
>      > On Mon, May 13, 2024 at 8:37 PM Daniel Henrique Barboza <dbarboza@ventanamicro.com> wrote:
>      >
>      >     Hi Frank,
>      >
>      >
>      >     On 5/8/24 08:15, Daniel Henrique Barboza wrote:
>      >      > Hi Frank,
>      >      >
>      >      > I'll reply with that I've done so far. Still missing some stuff:
>      >      >
>      >      > On 5/2/24 08:37, Frank Chang wrote:
>      >      >> Hi Daniel,
>      >      >>
>      >      >> On Fri, Mar 8, 2024 at 12:04 AM Daniel Henrique Barboza <dbarboza@ventanamicro.com> wrote:
>      >      >>>
>      >      >>> From: Tomasz Jeznach <tjeznach@rivosinc.com>
>      >      >>>
>      >      >>> The RISC-V IOMMU specification is now ratified as-per the RISC-V
>      >      >>> international process. The latest frozen specification can be found
>      >      >>> at:
>      >      >>>
>      >      >>> https://github.com/riscv-non-isa/riscv-iommu/releases/download/v1.0/riscv-iommu.pdf
>      >      >>>
>      >      >>> Add the foundation of the device emulation for RISC-V IOMMU, which
>      >      >>> includes an IOMMU that has no capabilities but MSI interrupt support and
>      >      >>> fault queue interfaces. We'll add more features incrementally in the
>      >      >>> next patches.
>      >      >>>
>      >      >>> Co-developed-by: Sebastien Boeuf <seb@rivosinc.com>
>      >      >>> Signed-off-by: Sebastien Boeuf <seb@rivosinc.com>
>      >      >>> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
>      >      >>> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
>      >      >>> ---
>      >      >>>   hw/riscv/Kconfig         |    4 +
>      >
>      >     (...)
>      >
>      >      >>> +
>      >      >>> +    s->iommus.le_next = NULL;
>      >      >>> +    s->iommus.le_prev = NULL;
>      >      >>> +    QLIST_INIT(&s->spaces);
>      >      >>> +    qemu_cond_init(&s->core_cond);
>      >      >>> +    qemu_mutex_init(&s->core_lock);
>      >      >>> +    qemu_spin_init(&s->regs_lock);
>      >      >>> +    qemu_thread_create(&s->core_proc, "riscv-iommu-core",
>      >      >>> +        riscv_iommu_core_proc, s, QEMU_THREAD_JOINABLE);
>      >      >>
>      >      >> In our experience, using QEMU thread increases the latency of command
>      >      >> queue processing,
>      >      >> which leads to the potential IOMMU fence timeout in the Linux driver
>      >      >> when using IOMMU with KVM,
>      >      >> e.g. booting the guest Linux.
>      >      >>
>      >      >> Is it possible to remove the thread from the IOMMU just like ARM, AMD,
>      >      >> and Intel IOMMU models?
>      >      >
>      >      > Interesting. We've been using this emulation internally in Ventana, with
>      >      > KVM and VFIO, and didn't experience this issue. Drew is on CC and can talk
>      >      > more about it.
>      >      >
>      >      > That said, I don't mind this change, assuming it's feasible to make it for this
>      >      > first version.  I'll need to check it how other IOMMUs are doing it.
>      >
>      >
>      >     I removed the threading and it seems to be working fine without it. I'll commit this
>      >     change for v3.
>      >
>      >      >
>      >      >
>      >      >
>      >      >>
>      >      >>> +}
>      >      >>> +
>      >      >
>      >      > (...)
>      >      >
>      >      >>> +
>      >      >>> +static AddressSpace *riscv_iommu_find_as(PCIBus *bus, void *opaque, int devfn)
>      >      >>> +{
>      >      >>> +    RISCVIOMMUState *s = (RISCVIOMMUState *) opaque;
>      >      >>> +    PCIDevice *pdev = pci_find_device(bus, pci_bus_num(bus), devfn);
>      >      >>> +    AddressSpace *as = NULL;
>      >      >>> +
>      >      >>> +    if (pdev && pci_is_iommu(pdev)) {
>      >      >>> +        return s->target_as;
>      >      >>> +    }
>      >      >>> +
>      >      >>> +    /* Find first registered IOMMU device */
>      >      >>> +    while (s->iommus.le_prev) {
>      >      >>> +        s = *(s->iommus.le_prev);
>      >      >>> +    }
>      >      >>> +
>      >      >>> +    /* Find first matching IOMMU */
>      >      >>> +    while (s != NULL && as == NULL) {
>      >      >>> +        as = riscv_iommu_space(s, PCI_BUILD_BDF(pci_bus_num(bus), devfn));
>      >      >>
>      >      >> For pci_bus_num(),
>      >      >> riscv_iommu_find_as() can be called at the very early stage
>      >      >> where software has no chance to enumerate the bus numbers.
>      >
>      >     I investigated and this doesn't seem to be a problem. This function is called at the
>      >     last step of the realize() steps of both riscv_iommu_pci_realize() and
>      >     riscv_iommu_sys_realize(), and by that time the pci_bus_num() is already assigned.
>      >     Other IOMMUs use pci_bus_num() in their own get_address_space() callbacks like
>      >     this too.
>      >
>      >
>      > Hi Daniel,
>      >
>      > IIUC, pci_bus_num() by default is assigned to pcibus_num():
>      >
>      > static int pcibus_num(PCIBus *bus)
>      > {
>      >      if (pci_bus_is_root(bus)) {
>      >          return 0; /* pci host bridge */
>      >      }
>      >      return bus->parent_dev->config[PCI_SECONDARY_BUS];
>      > }
>      >
>      > If the bus is not the root bus, it tries to read the bus' parent device's
>      > secondary bus number (PCI_SECONDARY_BUS) field in the PCI configuration space.
>      > This field should be programmable by the SW during PCIe enumeration.
>      > But I don't think SW has a chance to be executed before riscv_iommu_sys_realize() is called,
>      > since it's pretty early before CPU's execution unless RISC-V IOMMU is hot-plugged.
>      > Even if RISC-V IOMMU is hot-plugged, I think riscv_iommu_sys_realize() is still called
>      > before SW is aware of the existence of the IOMMU on the PCI topology tree.
>      >
>      > Do you think this matches your observation?
> 
>     It does. You have a good point on how the pcibus_num() can vary if SW wants to
>     change the PCI_SECONDARY_BUS and the IOMMU isn't in a root bus. Note that this
>     will not happen with the existing riscv-iommu-pci device as it is now, since it
>     has code to prevent the device from being attached to non-root PCI buses, but there
>     are no restrictions in the riscv-iommu-sys device.
> 
> 
> Thanks for the explanation.
> 
> Do you know where this limitation is from?
> Is it in this patchset or it's somewhere else in the Linux RISC-V IOMMU driver?

I don't know. I know that the riscv-iommu spec does not impose these restrictions, so
I assume that this is a design choice from Tomasz when designing both the first
QEMU impl and the Linux driver.


> 
> BTW, for the case like DesignWare PCIe host controller [1],
> we cannot connect RISC-V IOMMU to the root bus ("pcie") [2]
> because it already has a child bus ("dw-pcie") connecting to it [3].
> 
> If we try to connect RISC-V IOMMU to the root bus ("pcie"),
> it can't be discovered by Linux PCIe driver as a PCIe Downstream Port
> normally leads to a Link with only Device 0 on it.
> 
> PCIe spec 6.0, section 7.3.1 states:
> "Downstream Ports that do not have ARI Forwarding enabled must associate
> only Device 0 with the device attached to the Logical Bus representing the Link
> from the Port."
> 
> The PCI slot scan is early returned in the Linux PCIe driver [4][5].
> 
> Do you think it's possible to remove this limitation?

I'm pretty sure it is. It's only a matter of how much code and effort we're
willing to put into it.

For this initial impl I believe we can live with this restriction. We will enable more
use cases as we go along on both Linux and QEMU.


Thanks,

Daniel



> 
> [1] https://github.com/qemu/qemu/blob/master/hw/pci-host/designware.c
> [2] https://github.com/qemu/qemu/blob/master/hw/pci-host/designware.c#L695
> [3] https://github.com/qemu/qemu/blob/master/hw/pci-host/designware.c#L409
> [4] https://github.com/torvalds/linux/blob/master/drivers/pci/probe.c#L2674
> [5] https://github.com/torvalds/linux/blob/master/drivers/pci/probe.c#L2652
> 
> Regards,
> Frank Chang
> 
> 
>     And speaking of riscv-iommu-sys, the current device we have in this series is too
>     bare bones, without an actual use case for it (e.g. code to add it in the 'virt'
>     machine), but it's getting in the way nevertheless.
> 
>     I'll remove the riscv-iommu-sys device from v3 and re-introduce it in a later
>     revision or as a follow up series. Sunil has a handful of patches that add the
>     riscv-iommu-sys device in the 'virt' machine and the proper ACPI support for it [1],
>     and I intend to use them as a base. We'll then need some minor adjustments in the
>     existing code to make it fully functional like we're doing with riscv-iommu-pci.
> 
> 
>     Thanks,
> 
>     Daniel
> 
> 
>     [1] https://github.com/vlsunil/qemu/commits/acpi_rimt_poc_v1/
>      >
>      > Regards,
>      > Frank Chang
>      >
>      >
>      >
>      >     Thanks,
>      >
>      >
>      >     Daniel
>      >
>      >
>      >      >
>      >      > I'll see how other IOMMUs are handling their iommu_find_as()
>      >      >
>      >      >
>      >      > Thanks,
>      >      >
>      >      >
>      >      > Daniel
>      >      >
>      >      >
>      >      >>
>      >      >>
>      >      >>
>      >      >>
>      >      >>> +        s = s->iommus.le_next;
>      >      >>> +    }
>      >      >>> +
>      >      >>> +    return as ? as : &address_space_memory;
>      >      >>> +}
>      >      >>> +
>      >      >>> +static const PCIIOMMUOps riscv_iommu_ops = {
>      >      >>> +    .get_address_space = riscv_iommu_find_as,
>      >      >>> +};
>      >      >>> +
>      >      >>> +void riscv_iommu_pci_setup_iommu(RISCVIOMMUState *iommu, PCIBus *bus,
>      >      >>> +        Error **errp)
>      >      >>> +{
>      >      >>> +    if (bus->iommu_ops &&
>      >      >>> +        bus->iommu_ops->get_address_space == riscv_iommu_find_as) {
>      >      >>> +        /* Allow multiple IOMMUs on the same PCIe bus, link known devices */
>      >      >>> +        RISCVIOMMUState *last = (RISCVIOMMUState *)bus->iommu_opaque;
>      >      >>> +        QLIST_INSERT_AFTER(last, iommu, iommus);
>      >      >>> +    } else if (bus->iommu_ops == NULL) {
>      >      >>> +        pci_setup_iommu(bus, &riscv_iommu_ops, iommu);
>      >      >>> +    } else {
>      >      >>> +        error_setg(errp, "can't register secondary IOMMU for PCI bus #%d",
>      >      >>> +            pci_bus_num(bus));
>      >      >>> +    }
>      >      >>> +}
>      >      >>> +
>      >      >>> +static int riscv_iommu_memory_region_index(IOMMUMemoryRegion *iommu_mr,
>      >      >>> +    MemTxAttrs attrs)
>      >      >>> +{
>      >      >>> +    return attrs.unspecified ? RISCV_IOMMU_NOPASID : (int)attrs.pasid;
>      >      >>> +}
>      >      >>> +
>      >      >>> +static int riscv_iommu_memory_region_index_len(IOMMUMemoryRegion *iommu_mr)
>      >      >>> +{
>      >      >>> +    RISCVIOMMUSpace *as = container_of(iommu_mr, RISCVIOMMUSpace, iova_mr);
>      >      >>> +    return 1 << as->iommu->pasid_bits;
>      >      >>> +}
>      >      >>> +
>      >      >>> +static void riscv_iommu_memory_region_init(ObjectClass *klass, void *data)
>      >      >>> +{
>      >      >>> +    IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_CLASS(klass);
>      >      >>> +
>      >      >>> +    imrc->translate = riscv_iommu_memory_region_translate;
>      >      >>> +    imrc->notify_flag_changed = riscv_iommu_memory_region_notify;
>      >      >>> +    imrc->attrs_to_index = riscv_iommu_memory_region_index;
>      >      >>> +    imrc->num_indexes = riscv_iommu_memory_region_index_len;
>      >      >>> +}
>      >      >>> +
>      >      >>> +static const TypeInfo riscv_iommu_memory_region_info = {
>      >      >>> +    .parent = TYPE_IOMMU_MEMORY_REGION,
>      >      >>> +    .name = TYPE_RISCV_IOMMU_MEMORY_REGION,
>      >      >>> +    .class_init = riscv_iommu_memory_region_init,
>      >      >>> +};
>      >      >>> +
>      >      >>> +static void riscv_iommu_register_mr_types(void)
>      >      >>> +{
>      >      >>> +    type_register_static(&riscv_iommu_memory_region_info);
>      >      >>> +    type_register_static(&riscv_iommu_info);
>      >      >>> +}
>      >      >>> +
>      >      >>> +type_init(riscv_iommu_register_mr_types);
>      >      >>> diff --git a/hw/riscv/riscv-iommu.h b/hw/riscv/riscv-iommu.h
>      >      >>> new file mode 100644
>      >      >>> index 0000000000..6f740de690
>      >      >>> --- /dev/null
>      >      >>> +++ b/hw/riscv/riscv-iommu.h
>      >      >>> @@ -0,0 +1,141 @@
>      >      >>> +/*
>      >      >>> + * QEMU emulation of an RISC-V IOMMU (Ziommu)
>      >      >>> + *
>      >      >>> + * Copyright (C) 2022-2023 Rivos Inc.
>      >      >>> + *
>      >      >>> + * This program is free software; you can redistribute it and/or modify
>      >      >>> + * it under the terms of the GNU General Public License as published by
>      >      >>> + * the Free Software Foundation; either version 2 of the License.
>      >      >>> + *
>      >      >>> + * This program is distributed in the hope that it will be useful,
>      >      >>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>      >      >>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>      >      >>> + * GNU General Public License for more details.
>      >      >>> + *
>      >      >>> + * You should have received a copy of the GNU General Public License along
>      >      >>> + * with this program; if not, see <http://www.gnu.org/licenses/>.
>      >      >>> + */
>      >      >>> +
>      >      >>> +#ifndef HW_RISCV_IOMMU_STATE_H
>      >      >>> +#define HW_RISCV_IOMMU_STATE_H
>      >      >>> +
>      >      >>> +#include "qemu/osdep.h"
>      >      >>> +#include "qom/object.h"
>      >      >>> +
>      >      >>> +#include "hw/riscv/iommu.h"
>      >      >>> +
>      >      >>> +struct RISCVIOMMUState {
>      >      >>> +    /*< private >*/
>      >      >>> +    DeviceState parent_obj;
>      >      >>> +
>      >      >>> +    /*< public >*/
>      >      >>> +    uint32_t version;     /* Reported interface version number */
>      >      >>> +    uint32_t pasid_bits;  /* process identifier width */
>      >      >>> +    uint32_t bus;         /* PCI bus mapping for non-root endpoints */
>      >      >>> +
>      >      >>> +    uint64_t cap;         /* IOMMU supported capabilities */
>      >      >>> +    uint64_t fctl;        /* IOMMU enabled features */
>      >      >>> +
>      >      >>> +    bool enable_off;      /* Enable out-of-reset OFF mode (DMA disabled) */
>      >      >>> +    bool enable_msi;      /* Enable MSI remapping */
>      >      >>> +
>      >      >>> +    /* IOMMU Internal State */
>      >      >>> +    uint64_t ddtp;        /* Validated Device Directory Tree Root Pointer */
>      >      >>> +
>      >      >>> +    dma_addr_t cq_addr;   /* Command queue base physical address */
>      >      >>> +    dma_addr_t fq_addr;   /* Fault/event queue base physical address */
>      >      >>> +    dma_addr_t pq_addr;   /* Page request queue base physical address */
>      >      >>> +
>      >      >>> +    uint32_t cq_mask;     /* Command queue index bit mask */
>      >      >>> +    uint32_t fq_mask;     /* Fault/event queue index bit mask */
>      >      >>> +    uint32_t pq_mask;     /* Page request queue index bit mask */
>      >      >>> +
>      >      >>> +    /* interrupt notifier */
>      >      >>> +    void (*notify)(RISCVIOMMUState *iommu, unsigned vector);
>      >      >>> +
>      >      >>> +    /* IOMMU State Machine */
>      >      >>> +    QemuThread core_proc; /* Background processing thread */
>      >      >>> +    QemuMutex core_lock;  /* Global IOMMU lock, used for cache/regs updates */
>      >      >>> +    QemuCond core_cond;   /* Background processing wake up signal */
>      >      >>> +    unsigned core_exec;   /* Processing thread execution actions */
>      >      >>> +
>      >      >>> +    /* IOMMU target address space */
>      >      >>> +    AddressSpace *target_as;
>      >      >>> +    MemoryRegion *target_mr;
>      >      >>> +
>      >      >>> +    /* MSI / MRIF access trap */
>      >      >>> +    AddressSpace trap_as;
>      >      >>> +    MemoryRegion trap_mr;
>      >      >>> +
>      >      >>> +    GHashTable *ctx_cache;          /* Device translation Context Cache */
>      >      >>> +
>      >      >>> +    /* MMIO Hardware Interface */
>      >      >>> +    MemoryRegion regs_mr;
>      >      >>> +    QemuSpin regs_lock;
>      >      >>> +    uint8_t *regs_rw;  /* register state (user write) */
>      >      >>> +    uint8_t *regs_wc;  /* write-1-to-clear mask */
>      >      >>> +    uint8_t *regs_ro;  /* read-only mask */
>      >      >>> +
>      >      >>> +    QLIST_ENTRY(RISCVIOMMUState) iommus;
>      >      >>> +    QLIST_HEAD(, RISCVIOMMUSpace) spaces;
>      >      >>> +};
>      >      >>> +
>      >      >>> +void riscv_iommu_pci_setup_iommu(RISCVIOMMUState *iommu, PCIBus *bus,
>      >      >>> +         Error **errp);
>      >      >>> +
>      >      >>> +/* private helpers */
>      >      >>> +
>      >      >>> +/* Register helper functions */
>      >      >>> +static inline uint32_t riscv_iommu_reg_mod32(RISCVIOMMUState *s,
>      >      >>> +    unsigned idx, uint32_t set, uint32_t clr)
>      >      >>> +{
>      >      >>> +    uint32_t val;
>      >      >>> +    qemu_spin_lock(&s->regs_lock);
>      >      >>> +    val = ldl_le_p(s->regs_rw + idx);
>      >      >>> +    stl_le_p(s->regs_rw + idx, (val & ~clr) | set);
>      >      >>> +    qemu_spin_unlock(&s->regs_lock);
>      >      >>> +    return val;
>      >      >>> +}
>      >      >>> +
>      >      >>> +static inline void riscv_iommu_reg_set32(RISCVIOMMUState *s,
>      >      >>> +    unsigned idx, uint32_t set)
>      >      >>> +{
>      >      >>> +    qemu_spin_lock(&s->regs_lock);
>      >      >>> +    stl_le_p(s->regs_rw + idx, set);
>      >      >>> +    qemu_spin_unlock(&s->regs_lock);
>      >      >>> +}
>      >      >>> +
>      >      >>> +static inline uint32_t riscv_iommu_reg_get32(RISCVIOMMUState *s,
>      >      >>> +    unsigned idx)
>      >      >>> +{
>      >      >>> +    return ldl_le_p(s->regs_rw + idx);
>      >      >>> +}
>      >      >>> +
>      >      >>> +static inline uint64_t riscv_iommu_reg_mod64(RISCVIOMMUState *s,
>      >      >>> +    unsigned idx, uint64_t set, uint64_t clr)
>      >      >>> +{
>      >      >>> +    uint64_t val;
>      >      >>> +    qemu_spin_lock(&s->regs_lock);
>      >      >>> +    val = ldq_le_p(s->regs_rw + idx);
>      >      >>> +    stq_le_p(s->regs_rw + idx, (val & ~clr) | set);
>      >      >>> +    qemu_spin_unlock(&s->regs_lock);
>      >      >>> +    return val;
>      >      >>> +}
>      >      >>> +
>      >      >>> +static inline void riscv_iommu_reg_set64(RISCVIOMMUState *s,
>      >      >>> +    unsigned idx, uint64_t set)
>      >      >>> +{
>      >      >>> +    qemu_spin_lock(&s->regs_lock);
>      >      >>> +    stq_le_p(s->regs_rw + idx, set);
>      >      >>> +    qemu_spin_unlock(&s->regs_lock);
>      >      >>> +}
>      >      >>> +
>      >      >>> +static inline uint64_t riscv_iommu_reg_get64(RISCVIOMMUState *s,
>      >      >>> +    unsigned idx)
>      >      >>> +{
>      >      >>> +    return ldq_le_p(s->regs_rw + idx);
>      >      >>> +}
>      >      >>> +
>      >      >>> +
>      >      >>> +
>      >      >>> +#endif
>      >      >>> diff --git a/hw/riscv/trace-events b/hw/riscv/trace-events
>      >      >>> new file mode 100644
>      >      >>> index 0000000000..42a97caffa
>      >      >>> --- /dev/null
>      >      >>> +++ b/hw/riscv/trace-events
>      >      >>> @@ -0,0 +1,11 @@
>      >      >>> +# See documentation at docs/devel/tracing.rst
>      >      >>> +
>      >      >>> +# riscv-iommu.c
>      >      >>> +riscv_iommu_new(const char *id, unsigned b, unsigned d, unsigned f) "%s: device attached %04x:%02x.%d"
>      >      >>> +riscv_iommu_flt(const char *id, unsigned b, unsigned d, unsigned f, uint64_t reason, uint64_t iova) "%s: fault %04x:%02x.%u reason: 0x%"PRIx64" iova: 0x%"PRIx64
>      >      >>> +riscv_iommu_pri(const char *id, unsigned b, unsigned d, unsigned f, uint64_t iova) "%s: page request %04x:%02x.%u iova: 0x%"PRIx64
>      >      >>> +riscv_iommu_dma(const char *id, unsigned b, unsigned d, unsigned f, unsigned pasid, const char *dir, uint64_t iova, uint64_t phys) "%s: translate %04x:%02x.%u #%u %s 0x%"PRIx64" -> 0x%"PRIx64
>      >      >>> +riscv_iommu_msi(const char *id, unsigned b, unsigned d, unsigned f, uint64_t iova, uint64_t phys) "%s: translate %04x:%02x.%u MSI 0x%"PRIx64" -> 0x%"PRIx64
>      >      >>> +riscv_iommu_cmd(const char *id, uint64_t l, uint64_t u) "%s: command 0x%"PRIx64" 0x%"PRIx64
>      >      >>> +riscv_iommu_notifier_add(const char *id) "%s: dev-iotlb notifier added"
>      >      >>> +riscv_iommu_notifier_del(const char *id) "%s: dev-iotlb notifier removed"
>      >      >>> diff --git a/hw/riscv/trace.h b/hw/riscv/trace.h
>      >      >>> new file mode 100644
>      >      >>> index 0000000000..b88504b750
>      >      >>> --- /dev/null
>      >      >>> +++ b/hw/riscv/trace.h
>      >      >>> @@ -0,0 +1,2 @@
>      >      >>> +#include "trace/trace-hw_riscv.h"
>      >      >>> +
>      >      >>> diff --git a/include/hw/riscv/iommu.h b/include/hw/riscv/iommu.h
>      >      >>> new file mode 100644
>      >      >>> index 0000000000..403b365893
>      >      >>> --- /dev/null
>      >      >>> +++ b/include/hw/riscv/iommu.h
>      >      >>> @@ -0,0 +1,36 @@
>      >      >>> +/*
>      >      >>> + * QEMU emulation of an RISC-V IOMMU (Ziommu)
>      >      >>> + *
>      >      >>> + * Copyright (C) 2022-2023 Rivos Inc.
>      >      >>> + *
>      >      >>> + * This program is free software; you can redistribute it and/or modify
>      >      >>> + * it under the terms of the GNU General Public License as published by
>      >      >>> + * the Free Software Foundation; either version 2 of the License.
>      >      >>> + *
>      >      >>> + * This program is distributed in the hope that it will be useful,
>      >      >>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>      >      >>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>      >      >>> + * GNU General Public License for more details.
>      >      >>> + *
>      >      >>> + * You should have received a copy of the GNU General Public License along
>      >      >>> + * with this program; if not, see <http://www.gnu.org/licenses/>.
>      >      >>> + */
>      >      >>> +
>      >      >>> +#ifndef HW_RISCV_IOMMU_H
>      >      >>> +#define HW_RISCV_IOMMU_H
>      >      >>> +
>      >      >>> +#include "qemu/osdep.h"
>      >      >>> +#include "qom/object.h"
>      >      >>> +
>      >      >>> +#define TYPE_RISCV_IOMMU "riscv-iommu"
>      >      >>> +OBJECT_DECLARE_SIMPLE_TYPE(RISCVIOMMUState, RISCV_IOMMU)
>      >      >>> +typedef struct RISCVIOMMUState RISCVIOMMUState;
>      >      >>> +
>      >      >>> +#define TYPE_RISCV_IOMMU_MEMORY_REGION "riscv-iommu-mr"
>      >      >>> +typedef struct RISCVIOMMUSpace RISCVIOMMUSpace;
>      >      >>> +
>      >      >>> +#define TYPE_RISCV_IOMMU_PCI "riscv-iommu-pci"
>      >      >>> +OBJECT_DECLARE_SIMPLE_TYPE(RISCVIOMMUStatePci, RISCV_IOMMU_PCI)
>      >      >>> +typedef struct RISCVIOMMUStatePci RISCVIOMMUStatePci;
>      >      >>> +
>      >      >>> +#endif
>      >      >>> diff --git a/meson.build b/meson.build
>      >      >>> index c59ca496f2..75e56f3282 100644
>      >      >>> --- a/meson.build
>      >      >>> +++ b/meson.build
>      >      >>> @@ -3361,6 +3361,7 @@ if have_system
>      >      >>>       'hw/rdma',
>      >      >>>       'hw/rdma/vmw',
>      >      >>>       'hw/rtc',
>      >      >>> +    'hw/riscv',
>      >      >>>       'hw/s390x',
>      >      >>>       'hw/scsi',
>      >      >>>       'hw/sd',
>      >      >>> --
>      >      >>> 2.43.2
>      >      >>>
>      >      >>>
>      >
> 



end of thread, other threads:[~2024-05-21 12:29 UTC | newest]

Thread overview: 55+ messages
2024-03-07 16:03 [PATCH v2 00/15] riscv: QEMU RISC-V IOMMU Support Daniel Henrique Barboza
2024-03-07 16:03 ` [PATCH v2 01/15] exec/memtxattr: add process identifier to the transaction attributes Daniel Henrique Barboza
2024-04-23 16:33   ` Frank Chang
2024-03-07 16:03 ` [PATCH v2 02/15] hw/riscv: add riscv-iommu-bits.h Daniel Henrique Barboza
2024-05-10 11:01   ` Frank Chang
2024-05-15 10:02   ` Eric Cheng
2024-05-15 14:28     ` Daniel Henrique Barboza
2024-03-07 16:03 ` [PATCH v2 03/15] hw/riscv: add RISC-V IOMMU base emulation Daniel Henrique Barboza
2024-05-01 11:57   ` Jason Chien
2024-05-14 20:06     ` Daniel Henrique Barboza
2024-05-02 11:37   ` Frank Chang
2024-05-08 11:15     ` Daniel Henrique Barboza
2024-05-10 10:58       ` Frank Chang
2024-05-13 12:41         ` Daniel Henrique Barboza
2024-05-13 12:37       ` Daniel Henrique Barboza
2024-05-16  7:13         ` Frank Chang
2024-05-20 16:17           ` Daniel Henrique Barboza
2024-05-21 10:52             ` Frank Chang
2024-05-21 12:28               ` Daniel Henrique Barboza
2024-03-07 16:03 ` [PATCH v2 04/15] hw/riscv: add riscv-iommu-pci device Daniel Henrique Barboza
2024-04-29  7:21   ` Frank Chang
2024-05-02  9:37     ` Daniel Henrique Barboza
2024-03-07 16:03 ` [PATCH v2 05/15] hw/riscv: add riscv-iommu-sys platform device Daniel Henrique Barboza
2024-04-30  1:35   ` Frank Chang
2024-03-07 16:03 ` [PATCH v2 06/15] hw/riscv/virt.c: support for RISC-V IOMMU PCIDevice hotplug Daniel Henrique Barboza
2024-04-30  2:17   ` Frank Chang
2024-05-15  6:25   ` Eric Cheng
2024-05-15  7:16     ` Andrew Jones
2024-03-07 16:03 ` [PATCH v2 07/15] test/qtest: add riscv-iommu-pci tests Daniel Henrique Barboza
2024-04-30  3:33   ` Frank Chang
2024-03-07 16:03 ` [PATCH v2 08/15] hw/riscv/riscv-iommu: add Address Translation Cache (IOATC) Daniel Henrique Barboza
2024-05-08  7:26   ` Frank Chang
2024-05-16 21:45     ` Daniel Henrique Barboza
2024-03-07 16:03 ` [PATCH v2 09/15] hw/riscv/riscv-iommu: add s-stage and g-stage support Daniel Henrique Barboza
2024-05-10 10:36   ` Frank Chang
2024-05-10 11:14     ` Andrew Jones
2024-05-16 19:41       ` Daniel Henrique Barboza
2024-03-07 16:03 ` [PATCH v2 10/15] hw/riscv/riscv-iommu: add ATS support Daniel Henrique Barboza
2024-05-08  2:57   ` Frank Chang
2024-05-17  9:29     ` Daniel Henrique Barboza
2024-03-07 16:03 ` [PATCH v2 11/15] hw/riscv/riscv-iommu: add DBG support Daniel Henrique Barboza
2024-05-06  4:09   ` Frank Chang
2024-05-06 13:05     ` Daniel Henrique Barboza
2024-05-10 10:59       ` Frank Chang
2024-03-07 16:03 ` [PATCH v2 12/15] hw/riscv/riscv-iommu: Add another irq for mrif notifications Daniel Henrique Barboza
2024-05-06  6:12   ` Frank Chang
2024-03-07 16:03 ` [PATCH v2 13/15] qtest/riscv-iommu-test: add init queues test Daniel Henrique Barboza
2024-05-07  8:01   ` Frank Chang
2024-03-07 16:03 ` [PATCH v2 14/15] hw/misc: EDU: added PASID support Daniel Henrique Barboza
2024-05-07  9:06   ` Frank Chang
2024-03-07 16:03 ` [PATCH v2 15/15] hw/misc: EDU: add ATS/PRI capability Daniel Henrique Barboza
2024-05-07 15:32   ` Frank Chang
2024-05-16 13:59     ` Daniel Henrique Barboza
2024-05-10 11:14 ` [PATCH v2 00/15] riscv: QEMU RISC-V IOMMU Support Frank Chang
2024-05-20 16:26   ` Daniel Henrique Barboza
