* [RFC PATCH 00/12] SMMUv3 nested translation support
@ 2024-03-25 10:13 Mostafa Saleh
  2024-03-25 10:13 ` [RFC PATCH 01/12] hw/arm/smmu: Use enum for SMMU stage Mostafa Saleh
                   ` (13 more replies)
  0 siblings, 14 replies; 24+ messages in thread
From: Mostafa Saleh @ 2024-03-25 10:13 UTC (permalink / raw)
  To: qemu-arm, eric.auger, peter.maydell, qemu-devel
  Cc: jean-philippe, alex.bennee, maz, nicolinc, julien, Mostafa Saleh

Currently, QEMU supports emulating either stage-1 or stage-2 SMMUs,
but not nested instances.
This patch series adds support for nested translation in SMMUv3.
It is controlled by the property “arm-smmuv3.stage=nested” and
advertised to guests as (IDR0.S1P == 1 && IDR0.S2P == 1).

Main changes (architecture):
============================
1) CDs are considered IPA and translated with stage-2.
2) TTBx and tables for stage-1 are considered IPA and translated
   with stage-2.
3) The output address of stage-1 (an IPA) is translated with stage-2.
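
In a minimal C sketch (toy stubs only; walk_s1()/walk_s2() stand in
for the real page-table walkers and read_ttb0() for the CD field
extraction, none of these are the QEMU functions):

    /* Toy model of the nested flow: all stage-1 inputs are IPAs. */
    typedef unsigned long long addr_t;

    static addr_t walk_s2(addr_t ipa) { return ipa; }     /* IPA -> PA stub */
    static addr_t walk_s1(addr_t va, addr_t ttb_pa)       /* VA -> IPA stub */
    {
        (void)ttb_pa;
        return va;
    }
    static addr_t read_ttb0(addr_t cd_pa) { return cd_pa; } /* CD field stub */

    static addr_t nested_translate(addr_t va, addr_t cd_ipa)
    {
        addr_t cd_pa  = walk_s2(cd_ipa);           /* 1) CD pointer is an IPA */
        addr_t ttb_pa = walk_s2(read_ttb0(cd_pa)); /* 2) TTBx is an IPA too   */
        addr_t ipa    = walk_s1(va, ttb_pa);       /* stage-1: VA -> IPA      */
        return walk_s2(ipa);                       /* 3) stage-2: IPA -> PA   */
    }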

TLBs:
======
TLBs are the most tricky part.

1) General design
   A unified (combined) design is used, where a new tag "stage" is
   added, with 2 valid values:
   - STAGE_1: Meaning this entry translates VA to PADDR; it can be
     cached from a fully nested configuration or from stage-1 only.
     Separate cached entries (VA to IPA) are not supported.

   - STAGE_2: Meaning this translates IPA to PADDR, cached from a
     stage-2 only configuration.

   TLBs are also modified to cache 2 permissions; a new permission,
   "parent_perm", is added.

   For a non-nested configuration, perm == parent_perm and nothing
   changes. This is used to know which stage to report in case there
   is a permission fault from a TLB entry.

2) Caching in TLB
   Stage-1 and stage-2 entries are inserted into the TLB as is.
   For nested translation, both entries are combined into one TLB
   entry (see the sketch after this list). Everything is taken from
   stage-1, except:
   - translated_addr: from stage-2.
   - parent_perm: from stage-2.
   - addr_mask: the minimum of both.

3) TLB Lookup
   For stage-1 and nested translations, it looks for STAGE_1 entries.
   For stage-2, it looks for STAGE_2 TLB entries.

4) TLB invalidation
   - Stage-1 commands (CMD_TLBI_NH_VAA, SMMU_CMD_TLBI_NH_VA,
     SMMU_CMD_TLBI_NH_ALL): Invalidate TLBs tagged with SMMU_STAGE_1.
   - Stage-2 commands (CMD_TLBI_S2_IPA): Invalidate TLBs tagged with
     SMMU_STAGE_2.
   - All (SMMU_CMD_TLBI_S12_VMALL): Invalidates both; this is
     communicated to the TLB as SMMU_NESTED, which is (SMMU_STAGE_1 |
     SMMU_STAGE_2) and is used as a mask.

   As far as I understand, this is compliant with the ARM
   architecture, based on:
   - ARM ARM DDI 0487J.a: RLGSCG, RTVTYQ, RGNJPZ
   - ARM IHI 0070F.b: 16.2 Caching

   An alternative approach would be to instantiate 2 TLBs, one per
   stage. I haven't investigated that.
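
   As a rough C sketch of the combine step in (2); the field names
   mirror the series, but this is a simplified model, not the QEMU
   code:

       #include <stdint.h>

       typedef enum {
           SMMU_STAGE_1 = 1,      /* BIT(0): VA -> PADDR entries  */
           SMMU_STAGE_2,          /* BIT(1): IPA -> PADDR entries */
           SMMU_NESTED,           /* == STAGE_1 | STAGE_2, a mask */
       } SMMUStage;

       typedef struct {
           SMMUStage stage;       /* TLB tag */
           uint64_t  translated_addr;
           uint64_t  addr_mask;
           int       perm;        /* stage-1 permissions */
           int       parent_perm; /* stage-2 permissions; == perm when
                                   * the configuration is not nested  */
       } TLBEntry;

       /* Combine stage-1 and stage-2 results into one STAGE_1 entry. */
       static TLBEntry combine(const TLBEntry *s1, const TLBEntry *s2)
       {
           TLBEntry e = *s1;                        /* all from stage-1,    */
           e.stage = SMMU_STAGE_1;
           e.translated_addr = s2->translated_addr; /* except final PADDR,  */
           e.parent_perm = s2->perm;                /* stage-2 permissions, */
           e.addr_mask = s1->addr_mask < s2->addr_mask
                         ? s1->addr_mask : s2->addr_mask; /* smaller mask   */
           return e;
       }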

Others
=======
- Advertise SMMUv3.2-S2FWB; it is a NOP for QEMU as it doesn't support
  memory attributes.

- OAS: A typical setup with nesting is to share the CPU's stage-2 with
  the SMMU, and according to the user manual, the SMMU OAS must match
  the system physical address size.

  This was discussed before in
  https://lore.kernel.org/all/20230226220650.1480786-11-smostafa@google.com/
  The implementation here follows that discussion: migration support is
  added and the OAS is set up from the board (virt). However, the OAS
  is chosen based on the CPU PARANGE as there is no fixed one.
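
  For reference, the PARange-to-OAS mapping this relies on is fixed by
  the architecture; a minimal sketch (not the virt code):

      /* ID_AA64MMFR0_EL1.PARange encoding -> physical address bits. */
      static int parange_to_oas(unsigned parange)
      {
          static const int oas[] = { 32, 36, 40, 42, 44, 48, 52 };

          return parange < 7 ? oas[parange] : -1; /* -1: reserved */
      }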

- For nested configurations, the IOVA notifier only notifies for
  stage-1 invalidations (as far as I understand, this is the intended
  behaviour, as it notifies by IOVA).

- Stop ignoring VMID for stage-1 if stage-2 is also supported.


Future improvements:
=====================
1) One small improvement, which I don't think is worth the extra
   complexity: in case of a stage-1 TLB miss for a nested translation,
   we could do the stage-1 walk and then look up the stage-2 TLB,
   instead of doing the full two-stage walk.

2) Patch 0006 (hw/arm/smmuv3: Translate CD and TT using stage-2 table)
   introduces a macro to call functions that rely on cfg with stage-2
   values; I don't like it. However, I didn't find a simple way around
   it: either we change many functions to take a separate stage
   argument, or we add another argument to the config, which is
   probably more code.
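
   For context, the macro's rough shape (my reading of the idea; an
   assumption, not the exact definition from patch 0006):

       /*
        * Assumed shape: run a function that consults cfg->stage as if
        * the config were stage-2 only, then restore the real stage.
        */
       #define CALL_FUNC_CFG_S2(cfg, ret, fn, ...)      \
           do {                                         \
               SMMUStage saved_stage_ = (cfg)->stage;   \
               (cfg)->stage = SMMU_STAGE_2;             \
               (ret) = fn(__VA_ARGS__);                 \
               (cfg)->stage = saved_stage_;             \
           } while (0)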

Testing
========
1) IOMMUFD + VFIO
   Kernel: https://lore.kernel.org/all/cover.1683688960.git.nicolinc@nvidia.com/
   VMM: https://qemu-devel.nongnu.narkive.com/o815DqpI/rfc-v5-0-8-arm-smmuv3-emulation-support

   Tested by assigning “virtio-net-pci,netdev=net0,disable-legacy=on,iommu_platform=on,ats=on”
   to a guest VM (on top of a QEMU guest) with VFIO and IOMMUFD.

2) A work-in-progress prototype I am hacking on for nesting on KVM
   (this is nowhere near complete and is missing many things, but it
   doesn't require VMs/VFIO); also tested with virtio-net-pci, by git
   cloning a bunch of repositories and observing traces.
   https://android-kvm.googlesource.com/linux/+log/refs/heads/smostafa/android15-6.6-smmu-nesting-wip

Note: "hw/arm/smmu: Split smmuv3_translate()" is better viewed with
--color-moved.


Mostafa Saleh (12):
  hw/arm/smmu: Use enum for SMMU stage
  hw/arm/smmu: Split smmuv3_translate()
  hw/arm/smmu: Add stage to TLB
  hw/arm/smmu: Support nesting in commands
  hw/arm/smmuv3: Support nested SMMUs in smmuv3_notify_iova()
  hw/arm/smmuv3: Translate CD and TT using stage-2 table
  hw/arm/smmu-common: Support nested translation
  hw/arm/smmuv3: Support and advertise nesting
  hw/arm/smmuv3: Advertise S2FWB
  hw/arm/smmu: Refactor SMMU OAS
  hw/arm/smmuv3: Add property for OAS
  hw/arm/virt: Set SMMU OAS based on CPU PARANGE

 hw/arm/smmu-common.c         | 256 ++++++++++++++++++----
 hw/arm/smmu-internal.h       |   2 +
 hw/arm/smmuv3-internal.h     |  17 +-
 hw/arm/smmuv3.c              | 405 ++++++++++++++++++++++-------------
 hw/arm/trace-events          |  14 +-
 hw/arm/virt.c                |  14 +-
 include/hw/arm/smmu-common.h |  46 +++-
 include/hw/arm/smmuv3.h      |   1 +
 target/arm/cpu.h             |   2 +
 target/arm/cpu64.c           |   5 +
 10 files changed, 533 insertions(+), 229 deletions(-)

-- 
2.44.0.396.g6e790dbe36-goog




* [RFC PATCH 01/12] hw/arm/smmu: Use enum for SMMU stage
  2024-03-25 10:13 [RFC PATCH 00/12] SMMUv3 nested translation support Mostafa Saleh
@ 2024-03-25 10:13 ` Mostafa Saleh
  2024-04-02 16:32   ` Eric Auger
  2024-03-25 10:13 ` [RFC PATCH 02/12] hw/arm/smmu: Split smmuv3_translate() Mostafa Saleh
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 24+ messages in thread
From: Mostafa Saleh @ 2024-03-25 10:13 UTC (permalink / raw)
  To: qemu-arm, eric.auger, peter.maydell, qemu-devel
  Cc: jean-philippe, alex.bennee, maz, nicolinc, julien, Mostafa Saleh

Currently, the translation stage is represented as an int, where 1 is
stage-1 and 2 is stage-2. When nesting is added, using 3 to represent
nesting would be confusing, so use an enum instead.

While keeping the same values, this is useful for:
 - Doing tricks with bit masks, where BIT(0) is stage-1, BIT(1) is
   stage-2, and both together mean nested.
 - Tracing, as the stage is printed as an int.
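
With these values, the mask property holds by construction
(illustration only, not part of the patch):

    /* SMMU_STAGE_1 == BIT(0) and SMMU_STAGE_2 == BIT(1), so: */
    _Static_assert((SMMU_STAGE_1 | SMMU_STAGE_2) == SMMU_NESTED,
                   "nested must be the OR of both stages");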

Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
 hw/arm/smmu-common.c         | 14 +++++++-------
 hw/arm/smmuv3.c              | 15 ++++++++-------
 include/hw/arm/smmu-common.h | 11 +++++++++--
 3 files changed, 24 insertions(+), 16 deletions(-)

diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index 4caedb4998..3a7c350aca 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -304,7 +304,7 @@ static int smmu_ptw_64_s1(SMMUTransCfg *cfg,
                           SMMUTLBEntry *tlbe, SMMUPTWEventInfo *info)
 {
     dma_addr_t baseaddr, indexmask;
-    int stage = cfg->stage;
+    SMMUStage stage = cfg->stage;
     SMMUTransTableInfo *tt = select_tt(cfg, iova);
     uint8_t level, granule_sz, inputsize, stride;
 
@@ -392,7 +392,7 @@ static int smmu_ptw_64_s1(SMMUTransCfg *cfg,
     info->type = SMMU_PTW_ERR_TRANSLATION;
 
 error:
-    info->stage = 1;
+    info->stage = SMMU_STAGE_1;
     tlbe->entry.perm = IOMMU_NONE;
     return -EINVAL;
 }
@@ -415,7 +415,7 @@ static int smmu_ptw_64_s2(SMMUTransCfg *cfg,
                           dma_addr_t ipa, IOMMUAccessFlags perm,
                           SMMUTLBEntry *tlbe, SMMUPTWEventInfo *info)
 {
-    const int stage = 2;
+    const SMMUStage stage = SMMU_STAGE_2;
     int granule_sz = cfg->s2cfg.granule_sz;
     /* ARM DDI0487I.a: Table D8-7. */
     int inputsize = 64 - cfg->s2cfg.tsz;
@@ -513,7 +513,7 @@ static int smmu_ptw_64_s2(SMMUTransCfg *cfg,
     info->type = SMMU_PTW_ERR_TRANSLATION;
 
 error:
-    info->stage = 2;
+    info->stage = SMMU_STAGE_2;
     tlbe->entry.perm = IOMMU_NONE;
     return -EINVAL;
 }
@@ -532,9 +532,9 @@ error:
 int smmu_ptw(SMMUTransCfg *cfg, dma_addr_t iova, IOMMUAccessFlags perm,
              SMMUTLBEntry *tlbe, SMMUPTWEventInfo *info)
 {
-    if (cfg->stage == 1) {
+    if (cfg->stage == SMMU_STAGE_1) {
         return smmu_ptw_64_s1(cfg, iova, perm, tlbe, info);
-    } else if (cfg->stage == 2) {
+    } else if (cfg->stage == SMMU_STAGE_2) {
         /*
          * If bypassing stage 1(or unimplemented), the input address is passed
          * directly to stage 2 as IPA. If the input address of a transaction
@@ -543,7 +543,7 @@ int smmu_ptw(SMMUTransCfg *cfg, dma_addr_t iova, IOMMUAccessFlags perm,
          */
         if (iova >= (1ULL << cfg->oas)) {
             info->type = SMMU_PTW_ERR_ADDR_SIZE;
-            info->stage = 1;
+            info->stage = SMMU_STAGE_1;
             tlbe->entry.perm = IOMMU_NONE;
             return -EINVAL;
         }
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 9eb56a70f3..50e5a72d54 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -34,7 +34,8 @@
 #include "smmuv3-internal.h"
 #include "smmu-internal.h"
 
-#define PTW_RECORD_FAULT(cfg)   (((cfg)->stage == 1) ? (cfg)->record_faults : \
+#define PTW_RECORD_FAULT(cfg)   (((cfg)->stage == SMMU_STAGE_1) ? \
+                                 (cfg)->record_faults : \
                                  (cfg)->s2cfg.record_faults)
 
 /**
@@ -402,7 +403,7 @@ static bool s2_pgtable_config_valid(uint8_t sl0, uint8_t t0sz, uint8_t gran)
 
 static int decode_ste_s2_cfg(SMMUTransCfg *cfg, STE *ste)
 {
-    cfg->stage = 2;
+    cfg->stage = SMMU_STAGE_2;
 
     if (STE_S2AA64(ste) == 0x0) {
         qemu_log_mask(LOG_UNIMP,
@@ -678,7 +679,7 @@ static int decode_cd(SMMUTransCfg *cfg, CD *cd, SMMUEventInfo *event)
 
     /* we support only those at the moment */
     cfg->aa64 = true;
-    cfg->stage = 1;
+    cfg->stage = SMMU_STAGE_1;
 
     cfg->oas = oas2bits(CD_IPS(cd));
     cfg->oas = MIN(oas2bits(SMMU_IDR5_OAS), cfg->oas);
@@ -762,7 +763,7 @@ static int smmuv3_decode_config(IOMMUMemoryRegion *mr, SMMUTransCfg *cfg,
         return ret;
     }
 
-    if (cfg->aborted || cfg->bypassed || (cfg->stage == 2)) {
+    if (cfg->aborted || cfg->bypassed || (cfg->stage == SMMU_STAGE_2)) {
         return 0;
     }
 
@@ -882,7 +883,7 @@ static IOMMUTLBEntry smmuv3_translate(IOMMUMemoryRegion *mr, hwaddr addr,
         goto epilogue;
     }
 
-    if (cfg->stage == 1) {
+    if (cfg->stage == SMMU_STAGE_1) {
         /* Select stage1 translation table. */
         tt = select_tt(cfg, addr);
         if (!tt) {
@@ -919,7 +920,7 @@ static IOMMUTLBEntry smmuv3_translate(IOMMUMemoryRegion *mr, hwaddr addr,
              * nesting is not supported. So it is sufficient to check the
              * translation stage to know the TLB stage for now.
              */
-            event.u.f_walk_eabt.s2 = (cfg->stage == 2);
+            event.u.f_walk_eabt.s2 = (cfg->stage == SMMU_STAGE_2);
             if (PTW_RECORD_FAULT(cfg)) {
                 event.type = SMMU_EVT_F_PERMISSION;
                 event.u.f_permission.addr = addr;
@@ -935,7 +936,7 @@ static IOMMUTLBEntry smmuv3_translate(IOMMUMemoryRegion *mr, hwaddr addr,
 
     if (smmu_ptw(cfg, aligned_addr, flag, cached_entry, &ptw_info)) {
         /* All faults from PTW has S2 field. */
-        event.u.f_walk_eabt.s2 = (ptw_info.stage == 2);
+        event.u.f_walk_eabt.s2 = (ptw_info.stage == SMMU_STAGE_2);
         g_free(cached_entry);
         switch (ptw_info.type) {
         case SMMU_PTW_ERR_WALK_EABT:
diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
index 5ec2e6c1a4..b3c881f0ee 100644
--- a/include/hw/arm/smmu-common.h
+++ b/include/hw/arm/smmu-common.h
@@ -49,8 +49,15 @@ typedef enum {
     SMMU_PTW_ERR_PERMISSION,  /* Permission fault */
 } SMMUPTWEventType;
 
+/* SMMU Stage */
+typedef enum {
+    SMMU_STAGE_1 = 1,
+    SMMU_STAGE_2,
+    SMMU_NESTED,
+} SMMUStage;
+
 typedef struct SMMUPTWEventInfo {
-    int stage;
+    SMMUStage stage;
     SMMUPTWEventType type;
     dma_addr_t addr; /* fetched address that induced an abort, if any */
 } SMMUPTWEventInfo;
@@ -88,7 +95,7 @@ typedef struct SMMUS2Cfg {
  */
 typedef struct SMMUTransCfg {
     /* Shared fields between stage-1 and stage-2. */
-    int stage;                 /* translation stage */
+    SMMUStage stage;           /* translation stage */
     bool disabled;             /* smmu is disabled */
     bool bypassed;             /* translation is bypassed */
     bool aborted;              /* translation is aborted */
-- 
2.44.0.396.g6e790dbe36-goog




* [RFC PATCH 02/12] hw/arm/smmu: Split smmuv3_translate()
  2024-03-25 10:13 [RFC PATCH 00/12] SMMUv3 nested translation support Mostafa Saleh
  2024-03-25 10:13 ` [RFC PATCH 01/12] hw/arm/smmu: Use enum for SMMU stage Mostafa Saleh
@ 2024-03-25 10:13 ` Mostafa Saleh
  2024-04-02 16:32   ` Eric Auger
  2024-03-25 10:13 ` [RFC PATCH 03/12] hw/arm/smmu: Add stage to TLB Mostafa Saleh
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 24+ messages in thread
From: Mostafa Saleh @ 2024-03-25 10:13 UTC (permalink / raw)
  To: qemu-arm, eric.auger, peter.maydell, qemu-devel
  Cc: jean-philippe, alex.bennee, maz, nicolinc, julien, Mostafa Saleh

smmuv3_translate() does everything from STE/CD parsing to TLB lookup
and PTW.

Soon, when nesting is supported, stage-1 data (tt, CD) needs to be
translated using stage-2.

Split smmuv3_translate() into 3 functions:

- smmu_translate(): in smmu-common.c, which does the TLB lookup, PTW
  and TLB insertion; all the functions are already there, this just
  puts them together.
  This also simplifies the code as it consolidates event generation
  in case of TLB lookup permission failure or in TT selection.

- smmuv3_do_translate(): in smmuv3.c, calls smmu_translate() and does
  the event population in case of errors.

- smmuv3_translate(): now calls smmuv3_do_translate() for the
  translation while the rest is the same.

Also, add the stage to trace_smmuv3_translate_success().

Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
 hw/arm/smmu-common.c         |  59 ++++++++++++
 hw/arm/smmuv3.c              | 175 +++++++++++++----------------------
 hw/arm/trace-events          |   2 +-
 include/hw/arm/smmu-common.h |   5 +
 4 files changed, 130 insertions(+), 111 deletions(-)

diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index 3a7c350aca..20630eb670 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -554,6 +554,65 @@ int smmu_ptw(SMMUTransCfg *cfg, dma_addr_t iova, IOMMUAccessFlags perm,
     g_assert_not_reached();
 }
 
+SMMUTLBEntry *smmu_translate(SMMUState *bs, SMMUTransCfg *cfg, dma_addr_t addr,
+                             IOMMUAccessFlags flag, SMMUPTWEventInfo *info)
+{
+    uint64_t page_mask, aligned_addr;
+    SMMUTLBEntry *cached_entry = NULL;
+    SMMUTransTableInfo *tt;
+    int status;
+
+    /*
+     * Combined attributes used for TLB lookup, as only one stage is supported,
+     * it will hold attributes based on the enabled stage.
+     */
+    SMMUTransTableInfo tt_combined;
+
+    if (cfg->stage == SMMU_STAGE_1) {
+        /* Select stage1 translation table. */
+        tt = select_tt(cfg, addr);
+        if (!tt) {
+            info->type = SMMU_PTW_ERR_TRANSLATION;
+            info->stage = SMMU_STAGE_1;
+            return NULL;
+        }
+        tt_combined.granule_sz = tt->granule_sz;
+        tt_combined.tsz = tt->tsz;
+
+    } else {
+        /* Stage2. */
+        tt_combined.granule_sz = cfg->s2cfg.granule_sz;
+        tt_combined.tsz = cfg->s2cfg.tsz;
+    }
+
+    /*
+     * TLB lookup looks for granule and input size for a translation stage,
+     * as only one stage is supported right now, choose the right values
+     * from the configuration.
+     */
+    page_mask = (1ULL << tt_combined.granule_sz) - 1;
+    aligned_addr = addr & ~page_mask;
+
+    cached_entry = smmu_iotlb_lookup(bs, cfg, &tt_combined, aligned_addr);
+    if (cached_entry) {
+        if ((flag & IOMMU_WO) && !(cached_entry->entry.perm & IOMMU_WO)) {
+            info->type = SMMU_PTW_ERR_PERMISSION;
+            info->stage = cfg->stage;
+            return NULL;
+        }
+        return cached_entry;
+    }
+
+    cached_entry = g_new0(SMMUTLBEntry, 1);
+    status = smmu_ptw(cfg, aligned_addr, flag, cached_entry, info);
+    if (status) {
+            g_free(cached_entry);
+            return NULL;
+    }
+    smmu_iotlb_insert(bs, cfg, cached_entry);
+    return cached_entry;
+}
+
 /**
  * The bus number is used for lookup when SID based invalidation occurs.
  * In that case we lazily populate the SMMUPciBus array from the bus hash
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 50e5a72d54..f081ff0cc4 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -827,6 +827,67 @@ static void smmuv3_flush_config(SMMUDevice *sdev)
     g_hash_table_remove(bc->configs, sdev);
 }
 
+/* Do translation with TLB lookup. */
+static SMMUTranslationStatus smmuv3_do_translate(SMMUv3State *s, hwaddr addr,
+                                                 SMMUTransCfg *cfg,
+                                                 SMMUEventInfo *event,
+                                                 IOMMUAccessFlags flag,
+                                                 SMMUTLBEntry **out_entry)
+{
+    SMMUPTWEventInfo ptw_info = {};
+    SMMUState *bs = ARM_SMMU(s);
+    SMMUTLBEntry *cached_entry = NULL;
+
+    cached_entry = smmu_translate(bs, cfg, addr, flag, &ptw_info);
+    if (!cached_entry) {
+        /* All faults from PTW has S2 field. */
+        event->u.f_walk_eabt.s2 = (ptw_info.stage == SMMU_STAGE_2);
+        switch (ptw_info.type) {
+        case SMMU_PTW_ERR_WALK_EABT:
+            event->type = SMMU_EVT_F_WALK_EABT;
+            event->u.f_walk_eabt.addr = addr;
+            event->u.f_walk_eabt.rnw = flag & 0x1;
+            event->u.f_walk_eabt.class = 0x1;
+            event->u.f_walk_eabt.addr2 = ptw_info.addr;
+            break;
+        case SMMU_PTW_ERR_TRANSLATION:
+            if (PTW_RECORD_FAULT(cfg)) {
+                event->type = SMMU_EVT_F_TRANSLATION;
+                event->u.f_translation.addr = addr;
+                event->u.f_translation.rnw = flag & 0x1;
+            }
+            break;
+        case SMMU_PTW_ERR_ADDR_SIZE:
+            if (PTW_RECORD_FAULT(cfg)) {
+                event->type = SMMU_EVT_F_ADDR_SIZE;
+                event->u.f_addr_size.addr = addr;
+                event->u.f_addr_size.rnw = flag & 0x1;
+            }
+            break;
+        case SMMU_PTW_ERR_ACCESS:
+            if (PTW_RECORD_FAULT(cfg)) {
+                event->type = SMMU_EVT_F_ACCESS;
+                event->u.f_access.addr = addr;
+                event->u.f_access.rnw = flag & 0x1;
+            }
+            break;
+        case SMMU_PTW_ERR_PERMISSION:
+            if (PTW_RECORD_FAULT(cfg)) {
+                event->type = SMMU_EVT_F_PERMISSION;
+                event->u.f_permission.addr = addr;
+                event->u.f_permission.rnw = flag & 0x1;
+            }
+            break;
+        default:
+            g_assert_not_reached();
+        }
+        return SMMU_TRANS_ERROR;
+    }
+    *out_entry = cached_entry;
+    return SMMU_TRANS_SUCCESS;
+}
+
+/* Entry point to SMMU, does everything. */
 static IOMMUTLBEntry smmuv3_translate(IOMMUMemoryRegion *mr, hwaddr addr,
                                       IOMMUAccessFlags flag, int iommu_idx)
 {
@@ -836,12 +897,7 @@ static IOMMUTLBEntry smmuv3_translate(IOMMUMemoryRegion *mr, hwaddr addr,
     SMMUEventInfo event = {.type = SMMU_EVT_NONE,
                            .sid = sid,
                            .inval_ste_allowed = false};
-    SMMUPTWEventInfo ptw_info = {};
     SMMUTranslationStatus status;
-    SMMUState *bs = ARM_SMMU(s);
-    uint64_t page_mask, aligned_addr;
-    SMMUTLBEntry *cached_entry = NULL;
-    SMMUTransTableInfo *tt;
     SMMUTransCfg *cfg = NULL;
     IOMMUTLBEntry entry = {
         .target_as = &address_space_memory,
@@ -850,11 +906,7 @@ static IOMMUTLBEntry smmuv3_translate(IOMMUMemoryRegion *mr, hwaddr addr,
         .addr_mask = ~(hwaddr)0,
         .perm = IOMMU_NONE,
     };
-    /*
-     * Combined attributes used for TLB lookup, as only one stage is supported,
-     * it will hold attributes based on the enabled stage.
-     */
-    SMMUTransTableInfo tt_combined;
+    SMMUTLBEntry *cached_entry = NULL;
 
     qemu_mutex_lock(&s->mutex);
 
@@ -883,105 +935,7 @@ static IOMMUTLBEntry smmuv3_translate(IOMMUMemoryRegion *mr, hwaddr addr,
         goto epilogue;
     }
 
-    if (cfg->stage == SMMU_STAGE_1) {
-        /* Select stage1 translation table. */
-        tt = select_tt(cfg, addr);
-        if (!tt) {
-            if (cfg->record_faults) {
-                event.type = SMMU_EVT_F_TRANSLATION;
-                event.u.f_translation.addr = addr;
-                event.u.f_translation.rnw = flag & 0x1;
-            }
-            status = SMMU_TRANS_ERROR;
-            goto epilogue;
-        }
-        tt_combined.granule_sz = tt->granule_sz;
-        tt_combined.tsz = tt->tsz;
-
-    } else {
-        /* Stage2. */
-        tt_combined.granule_sz = cfg->s2cfg.granule_sz;
-        tt_combined.tsz = cfg->s2cfg.tsz;
-    }
-    /*
-     * TLB lookup looks for granule and input size for a translation stage,
-     * as only one stage is supported right now, choose the right values
-     * from the configuration.
-     */
-    page_mask = (1ULL << tt_combined.granule_sz) - 1;
-    aligned_addr = addr & ~page_mask;
-
-    cached_entry = smmu_iotlb_lookup(bs, cfg, &tt_combined, aligned_addr);
-    if (cached_entry) {
-        if ((flag & IOMMU_WO) && !(cached_entry->entry.perm & IOMMU_WO)) {
-            status = SMMU_TRANS_ERROR;
-            /*
-             * We know that the TLB only contains either stage-1 or stage-2 as
-             * nesting is not supported. So it is sufficient to check the
-             * translation stage to know the TLB stage for now.
-             */
-            event.u.f_walk_eabt.s2 = (cfg->stage == SMMU_STAGE_2);
-            if (PTW_RECORD_FAULT(cfg)) {
-                event.type = SMMU_EVT_F_PERMISSION;
-                event.u.f_permission.addr = addr;
-                event.u.f_permission.rnw = flag & 0x1;
-            }
-        } else {
-            status = SMMU_TRANS_SUCCESS;
-        }
-        goto epilogue;
-    }
-
-    cached_entry = g_new0(SMMUTLBEntry, 1);
-
-    if (smmu_ptw(cfg, aligned_addr, flag, cached_entry, &ptw_info)) {
-        /* All faults from PTW has S2 field. */
-        event.u.f_walk_eabt.s2 = (ptw_info.stage == SMMU_STAGE_2);
-        g_free(cached_entry);
-        switch (ptw_info.type) {
-        case SMMU_PTW_ERR_WALK_EABT:
-            event.type = SMMU_EVT_F_WALK_EABT;
-            event.u.f_walk_eabt.addr = addr;
-            event.u.f_walk_eabt.rnw = flag & 0x1;
-            event.u.f_walk_eabt.class = 0x1;
-            event.u.f_walk_eabt.addr2 = ptw_info.addr;
-            break;
-        case SMMU_PTW_ERR_TRANSLATION:
-            if (PTW_RECORD_FAULT(cfg)) {
-                event.type = SMMU_EVT_F_TRANSLATION;
-                event.u.f_translation.addr = addr;
-                event.u.f_translation.rnw = flag & 0x1;
-            }
-            break;
-        case SMMU_PTW_ERR_ADDR_SIZE:
-            if (PTW_RECORD_FAULT(cfg)) {
-                event.type = SMMU_EVT_F_ADDR_SIZE;
-                event.u.f_addr_size.addr = addr;
-                event.u.f_addr_size.rnw = flag & 0x1;
-            }
-            break;
-        case SMMU_PTW_ERR_ACCESS:
-            if (PTW_RECORD_FAULT(cfg)) {
-                event.type = SMMU_EVT_F_ACCESS;
-                event.u.f_access.addr = addr;
-                event.u.f_access.rnw = flag & 0x1;
-            }
-            break;
-        case SMMU_PTW_ERR_PERMISSION:
-            if (PTW_RECORD_FAULT(cfg)) {
-                event.type = SMMU_EVT_F_PERMISSION;
-                event.u.f_permission.addr = addr;
-                event.u.f_permission.rnw = flag & 0x1;
-            }
-            break;
-        default:
-            g_assert_not_reached();
-        }
-        status = SMMU_TRANS_ERROR;
-    } else {
-        smmu_iotlb_insert(bs, cfg, cached_entry);
-        status = SMMU_TRANS_SUCCESS;
-    }
+    status = smmuv3_do_translate(s, addr, cfg, &event, flag, &cached_entry);
 
 epilogue:
     qemu_mutex_unlock(&s->mutex);
@@ -992,7 +946,8 @@ epilogue:
                                     (addr & cached_entry->entry.addr_mask);
         entry.addr_mask = cached_entry->entry.addr_mask;
         trace_smmuv3_translate_success(mr->parent_obj.name, sid, addr,
-                                       entry.translated_addr, entry.perm);
+                                       entry.translated_addr, entry.perm,
+                                       cfg->stage);
         break;
     case SMMU_TRANS_DISABLE:
         entry.perm = flag;
diff --git a/hw/arm/trace-events b/hw/arm/trace-events
index f1a54a02df..cc12924a84 100644
--- a/hw/arm/trace-events
+++ b/hw/arm/trace-events
@@ -37,7 +37,7 @@ smmuv3_get_ste(uint64_t addr) "STE addr: 0x%"PRIx64
 smmuv3_translate_disable(const char *n, uint16_t sid, uint64_t addr, bool is_write) "%s sid=0x%x bypass (smmu disabled) iova:0x%"PRIx64" is_write=%d"
 smmuv3_translate_bypass(const char *n, uint16_t sid, uint64_t addr, bool is_write) "%s sid=0x%x STE bypass iova:0x%"PRIx64" is_write=%d"
 smmuv3_translate_abort(const char *n, uint16_t sid, uint64_t addr, bool is_write) "%s sid=0x%x abort on iova:0x%"PRIx64" is_write=%d"
-smmuv3_translate_success(const char *n, uint16_t sid, uint64_t iova, uint64_t translated, int perm) "%s sid=0x%x iova=0x%"PRIx64" translated=0x%"PRIx64" perm=0x%x"
+smmuv3_translate_success(const char *n, uint16_t sid, uint64_t iova, uint64_t translated, int perm, int stage) "%s sid=0x%x iova=0x%"PRIx64" translated=0x%"PRIx64" perm=0x%x stage=%d"
 smmuv3_get_cd(uint64_t addr) "CD addr: 0x%"PRIx64
 smmuv3_decode_cd(uint32_t oas) "oas=%d"
 smmuv3_decode_cd_tt(int i, uint32_t tsz, uint64_t ttb, uint32_t granule_sz, bool had) "TT[%d]:tsz:%d ttb:0x%"PRIx64" granule_sz:%d had:%d"
diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
index b3c881f0ee..876e78975c 100644
--- a/include/hw/arm/smmu-common.h
+++ b/include/hw/arm/smmu-common.h
@@ -183,6 +183,11 @@ static inline uint16_t smmu_get_sid(SMMUDevice *sdev)
 int smmu_ptw(SMMUTransCfg *cfg, dma_addr_t iova, IOMMUAccessFlags perm,
              SMMUTLBEntry *tlbe, SMMUPTWEventInfo *info);
 
+
+/* smmu_translate - Look for a translation in TLB, if not, do a PTW. */
+SMMUTLBEntry *smmu_translate(SMMUState *bs, SMMUTransCfg *cfg, dma_addr_t addr,
+                             IOMMUAccessFlags flag, SMMUPTWEventInfo *info);
+
 /**
  * select_tt - compute which translation table shall be used according to
  * the input iova and translation config and return the TT specific info
-- 
2.44.0.396.g6e790dbe36-goog




* [RFC PATCH 03/12] hw/arm/smmu: Add stage to TLB
  2024-03-25 10:13 [RFC PATCH 00/12] SMMUv3 nested translation support Mostafa Saleh
  2024-03-25 10:13 ` [RFC PATCH 01/12] hw/arm/smmu: Use enum for SMMU stage Mostafa Saleh
  2024-03-25 10:13 ` [RFC PATCH 02/12] hw/arm/smmu: Split smmuv3_translate() Mostafa Saleh
@ 2024-03-25 10:13 ` Mostafa Saleh
  2024-04-02 17:15   ` Eric Auger
  2024-03-25 10:14 ` [RFC PATCH 04/12] hw/arm/smmu: Support nesting in commands Mostafa Saleh
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 24+ messages in thread
From: Mostafa Saleh @ 2024-03-25 10:13 UTC (permalink / raw)
  To: qemu-arm, eric.auger, peter.maydell, qemu-devel
  Cc: jean-philippe, alex.bennee, maz, nicolinc, julien, Mostafa Saleh

TLBs for nesting will be extended to be combined; a new index, "stage",
is added, with 2 valid values:
 - SMMU_STAGE_1: Meaning this translates VA to PADDR; this entry can
   be cached from a fully nested configuration or from stage-1 only.
   We don't support separate cached entries (VA to IPA).

 - SMMU_STAGE_2: Meaning this translates IPA to PADDR, cached from a
   stage-2 only configuration.

For TLB invalidation:
 - by VA: Invalidate TLBs tagged with SMMU_STAGE_1
 - by IPA: Invalidate TLBs tagged with SMMU_STAGE_2
 - All: Invalidates both; this is communicated to the TLB as
   SMMU_NESTED, which is (SMMU_STAGE_1 | SMMU_STAGE_2) and is used
   as a mask.

This is briefly described in the user manual (ARM IHI 0070 F.b) in
"16.2.1 Caching combined structures".

Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
 hw/arm/smmu-common.c         | 27 +++++++++++++++++----------
 hw/arm/smmu-internal.h       |  2 ++
 hw/arm/smmuv3.c              |  5 +++--
 hw/arm/trace-events          |  3 ++-
 include/hw/arm/smmu-common.h |  8 ++++++--
 5 files changed, 30 insertions(+), 15 deletions(-)

diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index 20630eb670..677dcf9a13 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -38,7 +38,7 @@ static guint smmu_iotlb_key_hash(gconstpointer v)
 
     /* Jenkins hash */
     a = b = c = JHASH_INITVAL + sizeof(*key);
-    a += key->asid + key->vmid + key->level + key->tg;
+    a += key->asid + key->vmid + key->level + key->tg + key->stage;
     b += extract64(key->iova, 0, 32);
     c += extract64(key->iova, 32, 32);
 
@@ -54,14 +54,14 @@ static gboolean smmu_iotlb_key_equal(gconstpointer v1, gconstpointer v2)
 
     return (k1->asid == k2->asid) && (k1->iova == k2->iova) &&
            (k1->level == k2->level) && (k1->tg == k2->tg) &&
-           (k1->vmid == k2->vmid);
+           (k1->vmid == k2->vmid) && (k1->stage == k2->stage);
 }
 
 SMMUIOTLBKey smmu_get_iotlb_key(uint16_t asid, uint16_t vmid, uint64_t iova,
-                                uint8_t tg, uint8_t level)
+                                uint8_t tg, uint8_t level, SMMUStage stage)
 {
     SMMUIOTLBKey key = {.asid = asid, .vmid = vmid, .iova = iova,
-                        .tg = tg, .level = level};
+                        .tg = tg, .level = level, .stage = stage};
 
     return key;
 }
@@ -81,7 +81,8 @@ SMMUTLBEntry *smmu_iotlb_lookup(SMMUState *bs, SMMUTransCfg *cfg,
         SMMUIOTLBKey key;
 
         key = smmu_get_iotlb_key(cfg->asid, cfg->s2cfg.vmid,
-                                 iova & ~mask, tg, level);
+                                 iova & ~mask, tg, level,
+                                 SMMU_STAGE_TO_TLB_TAG(cfg->stage));
         entry = g_hash_table_lookup(bs->iotlb, &key);
         if (entry) {
             break;
@@ -109,15 +110,16 @@ void smmu_iotlb_insert(SMMUState *bs, SMMUTransCfg *cfg, SMMUTLBEntry *new)
 {
     SMMUIOTLBKey *key = g_new0(SMMUIOTLBKey, 1);
     uint8_t tg = (new->granule - 10) / 2;
+    SMMUStage stage_tag = SMMU_STAGE_TO_TLB_TAG(cfg->stage);
 
     if (g_hash_table_size(bs->iotlb) >= SMMU_IOTLB_MAX_SIZE) {
         smmu_iotlb_inv_all(bs);
     }
 
     *key = smmu_get_iotlb_key(cfg->asid, cfg->s2cfg.vmid, new->entry.iova,
-                              tg, new->level);
+                              tg, new->level, stage_tag);
     trace_smmu_iotlb_insert(cfg->asid, cfg->s2cfg.vmid, new->entry.iova,
-                            tg, new->level);
+                            tg, new->level, stage_tag);
     g_hash_table_insert(bs->iotlb, key, new);
 }
 
@@ -159,18 +161,22 @@ static gboolean smmu_hash_remove_by_asid_vmid_iova(gpointer key, gpointer value,
     if (info->vmid >= 0 && info->vmid != SMMU_IOTLB_VMID(iotlb_key)) {
         return false;
     }
+    if (!(info->stage & SMMU_IOTLB_STAGE(iotlb_key))) {
+        return false;
+    }
     return ((info->iova & ~entry->addr_mask) == entry->iova) ||
            ((entry->iova & ~info->mask) == info->iova);
 }
 
 void smmu_iotlb_inv_iova(SMMUState *s, int asid, int vmid, dma_addr_t iova,
-                         uint8_t tg, uint64_t num_pages, uint8_t ttl)
+                         uint8_t tg, uint64_t num_pages, uint8_t ttl,
+                         SMMUStage stage)
 {
     /* if tg is not set we use 4KB range invalidation */
     uint8_t granule = tg ? tg * 2 + 10 : 12;
 
     if (ttl && (num_pages == 1) && (asid >= 0)) {
-        SMMUIOTLBKey key = smmu_get_iotlb_key(asid, vmid, iova, tg, ttl);
+        SMMUIOTLBKey key = smmu_get_iotlb_key(asid, vmid, iova, tg, ttl, stage);
 
         if (g_hash_table_remove(s->iotlb, &key)) {
             return;
@@ -184,6 +190,7 @@ void smmu_iotlb_inv_iova(SMMUState *s, int asid, int vmid, dma_addr_t iova,
     SMMUIOTLBPageInvInfo info = {
         .asid = asid, .iova = iova,
         .vmid = vmid,
+        .stage = stage,
         .mask = (num_pages * 1 << granule) - 1};
 
     g_hash_table_foreach_remove(s->iotlb,
@@ -597,7 +604,7 @@ SMMUTLBEntry *smmu_translate(SMMUState *bs, SMMUTransCfg *cfg, dma_addr_t addr,
     if (cached_entry) {
         if ((flag & IOMMU_WO) && !(cached_entry->entry.perm & IOMMU_WO)) {
             info->type = SMMU_PTW_ERR_PERMISSION;
-            info->stage = cfg->stage;
+            info->stage = SMMU_STAGE_TO_TLB_TAG(cfg->stage);
             return NULL;
         }
         return cached_entry;
diff --git a/hw/arm/smmu-internal.h b/hw/arm/smmu-internal.h
index 843bebb185..6caa0ddf21 100644
--- a/hw/arm/smmu-internal.h
+++ b/hw/arm/smmu-internal.h
@@ -133,12 +133,14 @@ static inline int pgd_concat_idx(int start_level, int granule_sz,
 
 #define SMMU_IOTLB_ASID(key) ((key).asid)
 #define SMMU_IOTLB_VMID(key) ((key).vmid)
+#define SMMU_IOTLB_STAGE(key) ((key).stage)
 
 typedef struct SMMUIOTLBPageInvInfo {
     int asid;
     int vmid;
     uint64_t iova;
     uint64_t mask;
+    SMMUStage stage;
 } SMMUIOTLBPageInvInfo;
 
 typedef struct SMMUSIDRange {
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index f081ff0cc4..b27bf297e1 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -1087,7 +1087,7 @@ static void smmuv3_range_inval(SMMUState *s, Cmd *cmd)
     if (!tg) {
         trace_smmuv3_range_inval(vmid, asid, addr, tg, 1, ttl, leaf);
         smmuv3_inv_notifiers_iova(s, asid, vmid, addr, tg, 1);
-        smmu_iotlb_inv_iova(s, asid, vmid, addr, tg, 1, ttl);
+        smmu_iotlb_inv_iova(s, asid, vmid, addr, tg, 1, ttl, SMMU_NESTED);
         return;
     }
 
@@ -1105,7 +1105,8 @@ static void smmuv3_range_inval(SMMUState *s, Cmd *cmd)
         num_pages = (mask + 1) >> granule;
         trace_smmuv3_range_inval(vmid, asid, addr, tg, num_pages, ttl, leaf);
         smmuv3_inv_notifiers_iova(s, asid, vmid, addr, tg, num_pages);
-        smmu_iotlb_inv_iova(s, asid, vmid, addr, tg, num_pages, ttl);
+        smmu_iotlb_inv_iova(s, asid, vmid, addr, tg,
+                            num_pages, ttl, SMMU_NESTED);
         addr += mask + 1;
     }
 }
diff --git a/hw/arm/trace-events b/hw/arm/trace-events
index cc12924a84..3000c3bf14 100644
--- a/hw/arm/trace-events
+++ b/hw/arm/trace-events
@@ -14,10 +14,11 @@ smmu_iotlb_inv_all(void) "IOTLB invalidate all"
 smmu_iotlb_inv_asid(uint16_t asid) "IOTLB invalidate asid=%d"
 smmu_iotlb_inv_vmid(uint16_t vmid) "IOTLB invalidate vmid=%d"
 smmu_iotlb_inv_iova(uint16_t asid, uint64_t addr) "IOTLB invalidate asid=%d addr=0x%"PRIx64
+smmu_iotlb_inv_stage(int stage) "Stage invalidate stage=%d"
 smmu_inv_notifiers_mr(const char *name) "iommu mr=%s"
 smmu_iotlb_lookup_hit(uint16_t asid, uint16_t vmid, uint64_t addr, uint32_t hit, uint32_t miss, uint32_t p) "IOTLB cache HIT asid=%d vmid=%d addr=0x%"PRIx64" hit=%d miss=%d hit rate=%d"
 smmu_iotlb_lookup_miss(uint16_t asid, uint16_t vmid, uint64_t addr, uint32_t hit, uint32_t miss, uint32_t p) "IOTLB cache MISS asid=%d vmid=%d addr=0x%"PRIx64" hit=%d miss=%d hit rate=%d"
-smmu_iotlb_insert(uint16_t asid, uint16_t vmid, uint64_t addr, uint8_t tg, uint8_t level) "IOTLB ++ asid=%d vmid=%d addr=0x%"PRIx64" tg=%d level=%d"
+smmu_iotlb_insert(uint16_t asid, uint16_t vmid, uint64_t addr, uint8_t tg, uint8_t level, int stage) "IOTLB ++ asid=%d vmid=%d addr=0x%"PRIx64" tg=%d level=%d stage=%d"
 
 # smmuv3.c
 smmuv3_read_mmio(uint64_t addr, uint64_t val, unsigned size, uint32_t r) "addr: 0x%"PRIx64" val:0x%"PRIx64" size: 0x%x(%d)"
diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
index 876e78975c..695d6d10ad 100644
--- a/include/hw/arm/smmu-common.h
+++ b/include/hw/arm/smmu-common.h
@@ -37,6 +37,8 @@
 #define VMSA_IDXMSK(isz, strd, lvl)         ((1ULL << \
                                              VMSA_BIT_LVL(isz, strd, lvl)) - 1)
 
+#define SMMU_STAGE_TO_TLB_TAG(stage)        (((stage) == SMMU_NESTED) ? \
+                                             SMMU_STAGE_1 : (stage))
 /*
  * Page table walk error types
  */
@@ -136,6 +138,7 @@ typedef struct SMMUIOTLBKey {
     uint16_t vmid;
     uint8_t tg;
     uint8_t level;
+    SMMUStage stage;
 } SMMUIOTLBKey;
 
 struct SMMUState {
@@ -203,12 +206,13 @@ SMMUTLBEntry *smmu_iotlb_lookup(SMMUState *bs, SMMUTransCfg *cfg,
                                 SMMUTransTableInfo *tt, hwaddr iova);
 void smmu_iotlb_insert(SMMUState *bs, SMMUTransCfg *cfg, SMMUTLBEntry *entry);
 SMMUIOTLBKey smmu_get_iotlb_key(uint16_t asid, uint16_t vmid, uint64_t iova,
-                                uint8_t tg, uint8_t level);
+                                uint8_t tg, uint8_t level, SMMUStage stage);
 void smmu_iotlb_inv_all(SMMUState *s);
 void smmu_iotlb_inv_asid(SMMUState *s, uint16_t asid);
 void smmu_iotlb_inv_vmid(SMMUState *s, uint16_t vmid);
 void smmu_iotlb_inv_iova(SMMUState *s, int asid, int vmid, dma_addr_t iova,
-                         uint8_t tg, uint64_t num_pages, uint8_t ttl);
+                         uint8_t tg, uint64_t num_pages, uint8_t ttl,
+                         SMMUStage stage);
 
 /* Unmap the range of all the notifiers registered to any IOMMU mr */
 void smmu_inv_notifiers_all(SMMUState *s);
-- 
2.44.0.396.g6e790dbe36-goog




* [RFC PATCH 04/12] hw/arm/smmu: Support nesting in commands
  2024-03-25 10:13 [RFC PATCH 00/12] SMMUv3 nested translation support Mostafa Saleh
                   ` (2 preceding siblings ...)
  2024-03-25 10:13 ` [RFC PATCH 03/12] hw/arm/smmu: Add stage to TLB Mostafa Saleh
@ 2024-03-25 10:14 ` Mostafa Saleh
  2024-03-25 10:14 ` [RFC PATCH 05/12] hw/arm/smmuv3: Support nested SMMUs in smmuv3_notify_iova() Mostafa Saleh
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 24+ messages in thread
From: Mostafa Saleh @ 2024-03-25 10:14 UTC (permalink / raw)
  To: qemu-arm, eric.auger, peter.maydell, qemu-devel
  Cc: jean-philippe, alex.bennee, maz, nicolinc, julien, Mostafa Saleh

Commands had assumptions about VMID and ASID being mutually exclusive,
and similarly for stage-1 and stage-2. As we are going to support
nesting, we need to implement them properly:
- CMD_TLBI_NH_ASID: Used to ignore the VMID, as it was not used in
  stage-1 instances; now we read it from the command and invalidate by
  ASID + VMID if stage-2 exists.

- CMD_TLBI_NH_ALL: Used to invalidate all, as the VMID was not used in
  stage-1 instances; now it invalidates stage-1 by VMID, and this
  command is decoupled from CMD_TLBI_NSNH_ALL, which invalidates all
  stages.

- CMD_TLBI_NH_VAA, SMMU_CMD_TLBI_NH_VA: Also used to ignore the VMID.

- CMD_TLBI_S2_IPA: Now invalidates stage-2 only.

Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
 hw/arm/smmu-common.c         | 34 +++++++++++++++++---------
 hw/arm/smmuv3.c              | 47 +++++++++++++++++++++++++++---------
 hw/arm/trace-events          |  7 +++---
 include/hw/arm/smmu-common.h |  4 +--
 4 files changed, 64 insertions(+), 28 deletions(-)

diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index 677dcf9a13..f0905c28cf 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -129,22 +129,24 @@ void smmu_iotlb_inv_all(SMMUState *s)
     g_hash_table_remove_all(s->iotlb);
 }
 
-static gboolean smmu_hash_remove_by_asid(gpointer key, gpointer value,
-                                         gpointer user_data)
+static gboolean smmu_hash_remove_by_asid_vmid(gpointer key, gpointer value,
+                                              gpointer user_data)
 {
-    uint16_t asid = *(uint16_t *)user_data;
+    SMMUIOTLBPageInvInfo *info = (SMMUIOTLBPageInvInfo *)user_data;
     SMMUIOTLBKey *iotlb_key = (SMMUIOTLBKey *)key;
 
-    return SMMU_IOTLB_ASID(*iotlb_key) == asid;
+    return (SMMU_IOTLB_ASID(*iotlb_key) == info->asid) &&
+            (SMMU_IOTLB_VMID(*iotlb_key) == info->vmid);
 }
 
 static gboolean smmu_hash_remove_by_vmid(gpointer key, gpointer value,
                                          gpointer user_data)
 {
-    uint16_t vmid = *(uint16_t *)user_data;
+    SMMUIOTLBPageInvInfo *info = (SMMUIOTLBPageInvInfo *)user_data;
     SMMUIOTLBKey *iotlb_key = (SMMUIOTLBKey *)key;
 
-    return SMMU_IOTLB_VMID(*iotlb_key) == vmid;
+    return (SMMU_IOTLB_VMID(*iotlb_key) == info->vmid) &&
+            (info->stage & SMMU_IOTLB_STAGE(*iotlb_key));
 }
 
 static gboolean smmu_hash_remove_by_asid_vmid_iova(gpointer key, gpointer value,
@@ -198,16 +200,26 @@ void smmu_iotlb_inv_iova(SMMUState *s, int asid, int vmid, dma_addr_t iova,
                                 &info);
 }
 
-void smmu_iotlb_inv_asid(SMMUState *s, uint16_t asid)
+void smmu_iotlb_inv_asid_vmid(SMMUState *s, uint16_t asid, uint16_t vmid)
 {
+    SMMUIOTLBPageInvInfo info = {
+        .asid = asid,
+        .vmid = vmid,
+    };
+
     trace_smmu_iotlb_inv_asid(asid);
-    g_hash_table_foreach_remove(s->iotlb, smmu_hash_remove_by_asid, &asid);
+    g_hash_table_foreach_remove(s->iotlb, smmu_hash_remove_by_asid_vmid, &info);
 }
 
-inline void smmu_iotlb_inv_vmid(SMMUState *s, uint16_t vmid)
+inline void smmu_iotlb_inv_vmid(SMMUState *s, uint16_t vmid, SMMUStage stage)
 {
-    trace_smmu_iotlb_inv_vmid(vmid);
-    g_hash_table_foreach_remove(s->iotlb, smmu_hash_remove_by_vmid, &vmid);
+    SMMUIOTLBPageInvInfo info = {
+        .vmid = vmid,
+        .stage = stage,
+    };
+
+    trace_smmu_iotlb_inv_vmid(vmid, stage);
+    g_hash_table_foreach_remove(s->iotlb, smmu_hash_remove_by_vmid, &info);
 }
 
 /* VMSAv8-64 Translation */
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index b27bf297e1..9460fff0ed 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -1060,7 +1060,7 @@ static void smmuv3_inv_notifiers_iova(SMMUState *s, int asid, int vmid,
     }
 }
 
-static void smmuv3_range_inval(SMMUState *s, Cmd *cmd)
+static void smmuv3_range_inval(SMMUState *s, Cmd *cmd, SMMUStage stage)
 {
     dma_addr_t end, addr = CMD_ADDR(cmd);
     uint8_t type = CMD_TYPE(cmd);
@@ -1085,9 +1085,9 @@ static void smmuv3_range_inval(SMMUState *s, Cmd *cmd)
     }
 
     if (!tg) {
-        trace_smmuv3_range_inval(vmid, asid, addr, tg, 1, ttl, leaf);
+        trace_smmuv3_range_inval(vmid, asid, addr, tg, 1, ttl, leaf, stage);
         smmuv3_inv_notifiers_iova(s, asid, vmid, addr, tg, 1);
-        smmu_iotlb_inv_iova(s, asid, vmid, addr, tg, 1, ttl, SMMU_NESTED);
+        smmu_iotlb_inv_iova(s, asid, vmid, addr, tg, 1, ttl, stage);
         return;
     }
 
@@ -1103,10 +1103,10 @@ static void smmuv3_range_inval(SMMUState *s, Cmd *cmd)
         uint64_t mask = dma_aligned_pow2_mask(addr, end, 64);
 
         num_pages = (mask + 1) >> granule;
-        trace_smmuv3_range_inval(vmid, asid, addr, tg, num_pages, ttl, leaf);
+        trace_smmuv3_range_inval(vmid, asid, addr, tg,
+                                 num_pages, ttl, leaf, stage);
         smmuv3_inv_notifiers_iova(s, asid, vmid, addr, tg, num_pages);
-        smmu_iotlb_inv_iova(s, asid, vmid, addr, tg,
-                            num_pages, ttl, SMMU_NESTED);
+        smmu_iotlb_inv_iova(s, asid, vmid, addr, tg, num_pages, ttl, stage);
         addr += mask + 1;
     }
 }
@@ -1237,25 +1237,48 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
         case SMMU_CMD_TLBI_NH_ASID:
         {
             uint16_t asid = CMD_ASID(&cmd);
+            uint16_t vmid = CMD_VMID(&cmd);
 
             if (!STAGE1_SUPPORTED(s)) {
                 cmd_error = SMMU_CERROR_ILL;
                 break;
             }
+            /*
+             * VMID is only matched when stage 2 is supported for the Security
+             * state corresponding to the command queue that the command was
+             * issued in.
+             * QEMU ignores the field by setting to -1, similarly to what STE
+             * decoding does. And invalidation commands ignore VMID < 0.
+             */
+            if (!STAGE2_SUPPORTED(s)) {
+                vmid = -1;
+            }
 
             trace_smmuv3_cmdq_tlbi_nh_asid(asid);
             smmu_inv_notifiers_all(&s->smmu_state);
-            smmu_iotlb_inv_asid(bs, asid);
+            smmu_iotlb_inv_asid_vmid(bs, asid, vmid);
             break;
         }
         case SMMU_CMD_TLBI_NH_ALL:
+        {
+            uint16_t vmid = CMD_VMID(&cmd);
+
+            trace_smmuv3_cmdq_tlbi_nh(vmid);
             if (!STAGE1_SUPPORTED(s)) {
                 cmd_error = SMMU_CERROR_ILL;
                 break;
             }
-            QEMU_FALLTHROUGH;
+
+            /* See SMMU_CMD_TLBI_NH_ASID. */
+            if (!STAGE2_SUPPORTED(s)) {
+                vmid = -1;
+            }
+
+            smmu_iotlb_inv_vmid(bs, vmid, SMMU_STAGE_1);
+            break;
+        }
         case SMMU_CMD_TLBI_NSNH_ALL:
-            trace_smmuv3_cmdq_tlbi_nh();
+            trace_smmuv3_cmdq_tlbi_nsnh();
             smmu_inv_notifiers_all(&s->smmu_state);
             smmu_iotlb_inv_all(bs);
             break;
@@ -1265,7 +1288,7 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
                 cmd_error = SMMU_CERROR_ILL;
                 break;
             }
-            smmuv3_range_inval(bs, &cmd);
+            smmuv3_range_inval(bs, &cmd, SMMU_STAGE_1);
             break;
         case SMMU_CMD_TLBI_S12_VMALL:
         {
@@ -1278,7 +1301,7 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
 
             trace_smmuv3_cmdq_tlbi_s12_vmid(vmid);
             smmu_inv_notifiers_all(&s->smmu_state);
-            smmu_iotlb_inv_vmid(bs, vmid);
+            smmu_iotlb_inv_vmid(bs, vmid, SMMU_NESTED);
             break;
         }
         case SMMU_CMD_TLBI_S2_IPA:
@@ -1290,7 +1313,7 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
              * As currently only either s1 or s2 are supported
              * we can reuse same function for s2.
              */
-            smmuv3_range_inval(bs, &cmd);
+            smmuv3_range_inval(bs, &cmd, SMMU_STAGE_2);
             break;
         case SMMU_CMD_TLBI_EL3_ALL:
         case SMMU_CMD_TLBI_EL3_VA:
diff --git a/hw/arm/trace-events b/hw/arm/trace-events
index 3000c3bf14..73cec52d21 100644
--- a/hw/arm/trace-events
+++ b/hw/arm/trace-events
@@ -12,7 +12,7 @@ smmu_ptw_block_pte(int stage, int level, uint64_t baseaddr, uint64_t pteaddr, ui
 smmu_get_pte(uint64_t baseaddr, int index, uint64_t pteaddr, uint64_t pte) "baseaddr=0x%"PRIx64" index=0x%x, pteaddr=0x%"PRIx64", pte=0x%"PRIx64
 smmu_iotlb_inv_all(void) "IOTLB invalidate all"
 smmu_iotlb_inv_asid(uint16_t asid) "IOTLB invalidate asid=%d"
-smmu_iotlb_inv_vmid(uint16_t vmid) "IOTLB invalidate vmid=%d"
+smmu_iotlb_inv_vmid(uint16_t vmid, int stage) "IOTLB invalidate vmid=%d stage=%d"
 smmu_iotlb_inv_iova(uint16_t asid, uint64_t addr) "IOTLB invalidate asid=%d addr=0x%"PRIx64
 smmu_iotlb_inv_stage(int stage) "Stage invalidate stage=%d"
 smmu_inv_notifiers_mr(const char *name) "iommu mr=%s"
@@ -47,8 +47,9 @@ smmuv3_cmdq_cfgi_ste_range(int start, int end) "start=0x%x - end=0x%x"
 smmuv3_cmdq_cfgi_cd(uint32_t sid) "sid=0x%x"
 smmuv3_config_cache_hit(uint32_t sid, uint32_t hits, uint32_t misses, uint32_t perc) "Config cache HIT for sid=0x%x (hits=%d, misses=%d, hit rate=%d)"
 smmuv3_config_cache_miss(uint32_t sid, uint32_t hits, uint32_t misses, uint32_t perc) "Config cache MISS for sid=0x%x (hits=%d, misses=%d, hit rate=%d)"
-smmuv3_range_inval(int vmid, int asid, uint64_t addr, uint8_t tg, uint64_t num_pages, uint8_t ttl, bool leaf) "vmid=%d asid=%d addr=0x%"PRIx64" tg=%d num_pages=0x%"PRIx64" ttl=%d leaf=%d"
-smmuv3_cmdq_tlbi_nh(void) ""
+smmuv3_range_inval(int vmid, int asid, uint64_t addr, uint8_t tg, uint64_t num_pages, uint8_t ttl, bool leaf, int stage) "vmid=%d asid=%d addr=0x%"PRIx64" tg=%d num_pages=0x%"PRIx64" ttl=%d leaf=%d stage=%d"
+smmuv3_cmdq_tlbi_nsnh(void) ""
+smmuv3_cmdq_tlbi_nh(uint16_t vmid) "vmid=%d"
 smmuv3_cmdq_tlbi_nh_asid(uint16_t asid) "asid=%d"
 smmuv3_cmdq_tlbi_s12_vmid(uint16_t vmid) "vmid=%d"
 smmuv3_config_cache_inv(uint32_t sid) "Config cache INV for sid=0x%x"
diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
index 695d6d10ad..6d3bf5316b 100644
--- a/include/hw/arm/smmu-common.h
+++ b/include/hw/arm/smmu-common.h
@@ -208,8 +208,8 @@ void smmu_iotlb_insert(SMMUState *bs, SMMUTransCfg *cfg, SMMUTLBEntry *entry);
 SMMUIOTLBKey smmu_get_iotlb_key(uint16_t asid, uint16_t vmid, uint64_t iova,
                                 uint8_t tg, uint8_t level, SMMUStage stage);
 void smmu_iotlb_inv_all(SMMUState *s);
-void smmu_iotlb_inv_asid(SMMUState *s, uint16_t asid);
-void smmu_iotlb_inv_vmid(SMMUState *s, uint16_t vmid);
+void smmu_iotlb_inv_asid_vmid(SMMUState *s, uint16_t asid, uint16_t vmid);
+void smmu_iotlb_inv_vmid(SMMUState *s, uint16_t vmid, SMMUStage stage);
 void smmu_iotlb_inv_iova(SMMUState *s, int asid, int vmid, dma_addr_t iova,
                          uint8_t tg, uint64_t num_pages, uint8_t ttl,
                          SMMUStage stage);
-- 
2.44.0.396.g6e790dbe36-goog




* [RFC PATCH 05/12] hw/arm/smmuv3: Support nested SMMUs in smmuv3_notify_iova()
  2024-03-25 10:13 [RFC PATCH 00/12] SMMUv3 nested translation support Mostafa Saleh
                   ` (3 preceding siblings ...)
  2024-03-25 10:14 ` [RFC PATCH 04/12] hw/arm/smmu: Support nesting in commands Mostafa Saleh
@ 2024-03-25 10:14 ` Mostafa Saleh
  2024-03-25 10:14 ` [RFC PATCH 06/12] hw/arm/smmuv3: Translate CD and TT using stage-2 table Mostafa Saleh
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 24+ messages in thread
From: Mostafa Saleh @ 2024-03-25 10:14 UTC (permalink / raw)
  To: qemu-arm, eric.auger, peter.maydell, qemu-devel
  Cc: jean-philippe, alex.bennee, maz, nicolinc, julien, Mostafa Saleh

IOMMUTLBEvent only understands IOVA; for stage-2 only SMMUs, keep the
current implementation, and only notify for stage-1 invalidations in
case of nesting.

Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
 hw/arm/smmuv3.c     | 23 +++++++++++++++--------
 hw/arm/trace-events |  2 +-
 2 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 9460fff0ed..d9ee203d09 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -993,7 +993,7 @@ static void smmuv3_notify_iova(IOMMUMemoryRegion *mr,
                                IOMMUNotifier *n,
                                int asid, int vmid,
                                dma_addr_t iova, uint8_t tg,
-                               uint64_t num_pages)
+                               uint64_t num_pages, int stage)
 {
     SMMUDevice *sdev = container_of(mr, SMMUDevice, iommu);
     IOMMUTLBEvent event;
@@ -1017,14 +1017,21 @@ static void smmuv3_notify_iova(IOMMUMemoryRegion *mr,
             return;
         }
 
-        if (STAGE1_SUPPORTED(s)) {
+        /*
+         * IOMMUTLBEvent only understands IOVA, for stage-2 only SMMUs
+         * keep the implementation, while only notify for stage-1
+         * invalidation in case of nesting.
+         */
+        if (stage == SMMU_STAGE_1) {
             tt = select_tt(cfg, iova);
             if (!tt) {
                 return;
             }
             granule = tt->granule_sz;
-        } else {
+        } else if (!STAGE1_SUPPORTED(s)) {
             granule = cfg->s2cfg.granule_sz;
+        } else {
+            return;
         }
 
     } else {
@@ -1043,7 +1050,7 @@ static void smmuv3_notify_iova(IOMMUMemoryRegion *mr,
 /* invalidate an asid/vmid/iova range tuple in all mr's */
 static void smmuv3_inv_notifiers_iova(SMMUState *s, int asid, int vmid,
                                       dma_addr_t iova, uint8_t tg,
-                                      uint64_t num_pages)
+                                      uint64_t num_pages, int stage)
 {
     SMMUDevice *sdev;
 
@@ -1052,10 +1059,10 @@ static void smmuv3_inv_notifiers_iova(SMMUState *s, int asid, int vmid,
         IOMMUNotifier *n;
 
         trace_smmuv3_inv_notifiers_iova(mr->parent_obj.name, asid, vmid,
-                                        iova, tg, num_pages);
+                                        iova, tg, num_pages, stage);
 
         IOMMU_NOTIFIER_FOREACH(n, mr) {
-            smmuv3_notify_iova(mr, n, asid, vmid, iova, tg, num_pages);
+            smmuv3_notify_iova(mr, n, asid, vmid, iova, tg, num_pages, stage);
         }
     }
 }
@@ -1086,7 +1093,7 @@ static void smmuv3_range_inval(SMMUState *s, Cmd *cmd, SMMUStage stage)
 
     if (!tg) {
         trace_smmuv3_range_inval(vmid, asid, addr, tg, 1, ttl, leaf, stage);
-        smmuv3_inv_notifiers_iova(s, asid, vmid, addr, tg, 1);
+        smmuv3_inv_notifiers_iova(s, asid, vmid, addr, tg, 1, stage);
         smmu_iotlb_inv_iova(s, asid, vmid, addr, tg, 1, ttl, stage);
         return;
     }
@@ -1105,7 +1112,7 @@ static void smmuv3_range_inval(SMMUState *s, Cmd *cmd, SMMUStage stage)
         num_pages = (mask + 1) >> granule;
         trace_smmuv3_range_inval(vmid, asid, addr, tg,
                                  num_pages, ttl, leaf, stage);
-        smmuv3_inv_notifiers_iova(s, asid, vmid, addr, tg, num_pages);
+        smmuv3_inv_notifiers_iova(s, asid, vmid, addr, tg, num_pages, stage);
         smmu_iotlb_inv_iova(s, asid, vmid, addr, tg, num_pages, ttl, stage);
         addr += mask + 1;
     }
diff --git a/hw/arm/trace-events b/hw/arm/trace-events
index 73cec52d21..34b10af83f 100644
--- a/hw/arm/trace-events
+++ b/hw/arm/trace-events
@@ -55,7 +55,7 @@ smmuv3_cmdq_tlbi_s12_vmid(uint16_t vmid) "vmid=%d"
 smmuv3_config_cache_inv(uint32_t sid) "Config cache INV for sid=0x%x"
 smmuv3_notify_flag_add(const char *iommu) "ADD SMMUNotifier node for iommu mr=%s"
 smmuv3_notify_flag_del(const char *iommu) "DEL SMMUNotifier node for iommu mr=%s"
-smmuv3_inv_notifiers_iova(const char *name, uint16_t asid, uint16_t vmid, uint64_t iova, uint8_t tg, uint64_t num_pages) "iommu mr=%s asid=%d vmid=%d iova=0x%"PRIx64" tg=%d num_pages=0x%"PRIx64
+smmuv3_inv_notifiers_iova(const char *name, uint16_t asid, uint16_t vmid, uint64_t iova, uint8_t tg, uint64_t num_pages, int stage) "iommu mr=%s asid=%d vmid=%d iova=0x%"PRIx64" tg=%d num_pages=0x%"PRIx64" stage=%d"
 
 # strongarm.c
 strongarm_uart_update_parameters(const char *label, int speed, char parity, int data_bits, int stop_bits) "%s speed=%d parity=%c data=%d stop=%d"
-- 
2.44.0.396.g6e790dbe36-goog




* [RFC PATCH 06/12] hw/arm/smmuv3: Translate CD and TT using stage-2 table
  2024-03-25 10:13 [RFC PATCH 00/12] SMMUv3 nested translation support Mostafa Saleh
                   ` (4 preceding siblings ...)
  2024-03-25 10:14 ` [RFC PATCH 05/12] hw/arm/smmuv3: Support nested SMMUs in smmuv3_notify_iova() Mostafa Saleh
@ 2024-03-25 10:14 ` Mostafa Saleh
  2024-03-25 10:14 ` [RFC PATCH 07/12] hw/arm/smmu-common: Support nested translation Mostafa Saleh
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 24+ messages in thread
From: Mostafa Saleh @ 2024-03-25 10:14 UTC (permalink / raw)
  To: qemu-arm, eric.auger, peter.maydell, qemu-devel
  Cc: jean-philippe, alex.bennee, maz, nicolinc, julien, Mostafa Saleh

According to the user manual (ARM IHI 0070 F.b),
in "5.2 Stream Table Entry":
 [51:6] S1ContextPtr
 If Config[1] == 1 (stage 2 enabled), this pointer is an IPA translated by
 stage 2 and the programmed value must be within the range of the IAS.

In "5.4.1 CD notes":
 The translation table walks performed from TTB0 or TTB1 are always performed
 in IPA space if stage 2 translations are enabled.

So translate both the CD and the TTBx in this patch if nested
translation is requested.
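
The pattern (shown in full in the diff below) can be sketched as follows;
this is an illustration of the stage flip, not the literal patch code:

    /*
     * The structure pointer is an IPA when stage-2 is enabled, so
     * translate it with stage-2 only, then restore the nested stage.
     */
    if (cfg->stage == SMMU_NESTED) {
        cfg->stage = SMMU_STAGE_2;
        status = smmuv3_do_translate(s, addr, cfg, event, IOMMU_RO, &entry);
        cfg->stage = SMMU_NESTED;
        if (status != SMMU_TRANS_SUCCESS) {
            return -EINVAL;   /* reported as a translation error */
        }
        addr = CACHED_ENTRY_TO_ADDR(entry, addr);  /* IPA -> PA */
    }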

Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
 hw/arm/smmuv3.c              | 49 ++++++++++++++++++++++++++++++------
 include/hw/arm/smmu-common.h | 15 +++++++++++
 2 files changed, 57 insertions(+), 7 deletions(-)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index d9ee203d09..32a1838576 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -337,14 +337,36 @@ static int smmu_get_ste(SMMUv3State *s, dma_addr_t addr, STE *buf,
 
 }
 
+static SMMUTranslationStatus smmuv3_do_translate(SMMUv3State *s, hwaddr addr,
+                                                 SMMUTransCfg *cfg,
+                                                 SMMUEventInfo *event,
+                                                 IOMMUAccessFlags flag,
+                                                 SMMUTLBEntry **out_entry);
 /* @ssid > 0 not supported yet */
-static int smmu_get_cd(SMMUv3State *s, STE *ste, uint32_t ssid,
-                       CD *buf, SMMUEventInfo *event)
+static int smmu_get_cd(SMMUv3State *s, STE *ste, SMMUTransCfg *cfg,
+                       uint32_t ssid, CD *buf, SMMUEventInfo *event)
 {
     dma_addr_t addr = STE_CTXPTR(ste);
     int ret, i;
+    SMMUTranslationStatus status;
+    SMMUTLBEntry *entry;
 
     trace_smmuv3_get_cd(addr);
+
+    if (cfg->stage == SMMU_NESTED) {
+        CALL_FUNC_CFG_S2(cfg, status, smmuv3_do_translate, s, addr,
+                         cfg, event, IOMMU_RO, &entry);
+        /*
+         * It is not clear what should happen if this fails, so we return
+         * here, which gets propagated as a translation error.
+         */
+        if (status != SMMU_TRANS_SUCCESS) {
+            return -EINVAL;
+        }
+
+        addr = CACHED_ENTRY_TO_ADDR(entry, addr);
+    }
+
     /* TODO: guarantee 64-bit single-copy atomicity */
     ret = dma_memory_read(&address_space_memory, addr, buf, sizeof(*buf),
                           MEMTXATTRS_UNSPECIFIED);
@@ -659,10 +681,13 @@ static int smmu_find_ste(SMMUv3State *s, uint32_t sid, STE *ste,
     return 0;
 }
 
-static int decode_cd(SMMUTransCfg *cfg, CD *cd, SMMUEventInfo *event)
+static int decode_cd(SMMUv3State *s, SMMUTransCfg *cfg,
+                     CD *cd, SMMUEventInfo *event)
 {
     int ret = -EINVAL;
     int i;
+    SMMUTranslationStatus status;
+    SMMUTLBEntry *entry;
 
     if (!CD_VALID(cd) || !CD_AARCH64(cd)) {
         goto bad_cd;
@@ -713,6 +738,17 @@ static int decode_cd(SMMUTransCfg *cfg, CD *cd, SMMUEventInfo *event)
 
         tt->tsz = tsz;
         tt->ttb = CD_TTB(cd, i);
+
+        /* Translate the TTBx from IPA to PA if nesting is enabled. */
+        if (cfg->stage == SMMU_NESTED) {
+            CALL_FUNC_CFG_S2(cfg, status, smmuv3_do_translate, s,
+                             tt->ttb, cfg, event, IOMMU_RO, &entry);
+            /* See smmu_get_cd(). */
+            if (status != SMMU_TRANS_SUCCESS) {
+                return -EINVAL;
+            }
+            tt->ttb = CACHED_ENTRY_TO_ADDR(entry, tt->ttb);
+        }
         if (tt->ttb & ~(MAKE_64BIT_MASK(0, cfg->oas))) {
             goto bad_cd;
         }
@@ -767,12 +803,12 @@ static int smmuv3_decode_config(IOMMUMemoryRegion *mr, SMMUTransCfg *cfg,
         return 0;
     }
 
-    ret = smmu_get_cd(s, &ste, 0 /* ssid */, &cd, event);
+    ret = smmu_get_cd(s, &ste, cfg, 0 /* ssid */, &cd, event);
     if (ret) {
         return ret;
     }
 
-    return decode_cd(cfg, &cd, event);
+    return decode_cd(s, cfg, &cd, event);
 }
 
 /**
@@ -942,8 +978,7 @@ epilogue:
     switch (status) {
     case SMMU_TRANS_SUCCESS:
         entry.perm = cached_entry->entry.perm;
-        entry.translated_addr = cached_entry->entry.translated_addr +
-                                    (addr & cached_entry->entry.addr_mask);
+        entry.translated_addr = CACHED_ENTRY_TO_ADDR(cached_entry, addr);
         entry.addr_mask = cached_entry->entry.addr_mask;
         trace_smmuv3_translate_success(mr->parent_obj.name, sid, addr,
                                        entry.translated_addr, entry.perm,
diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
index 6d3bf5316b..c0969e461d 100644
--- a/include/hw/arm/smmu-common.h
+++ b/include/hw/arm/smmu-common.h
@@ -39,6 +39,21 @@
 
 #define SMMU_STAGE_TO_TLB_TAG(stage)        (((stage) == SMMU_NESTED) ? \
                                              SMMU_STAGE_1 : (stage))
+
+#define CACHED_ENTRY_TO_ADDR(ent, addr)      ((ent)->entry.translated_addr + \
+                                              ((addr) & (ent)->entry.addr_mask))
+
+/*
+ * In a nested context, some functions need to translate IPA addresses, but
+ * cfg->stage is SMMU_NESTED there. This macro temporarily switches the cfg
+ * to stage-2, calls the function, and then restores SMMU_NESTED.
+ */
+#define CALL_FUNC_CFG_S2(cfg, ret, fn, ...)  ({ \
+                                                   cfg->stage = SMMU_STAGE_2; \
+                                                   ret = fn(__VA_ARGS__); \
+                                                   cfg->stage = SMMU_NESTED; \
+                                              })
+
 /*
  * Page table walk error types
  */
-- 
2.44.0.396.g6e790dbe36-goog




* [RFC PATCH 07/12] hw/arm/smmu-common: Support nested translation
  2024-03-25 10:13 [RFC PATCH 00/12] SMMUv3 nested translation support Mostafa Saleh
                   ` (5 preceding siblings ...)
  2024-03-25 10:14 ` [RFC PATCH 06/12] hw/arm/smmuv3: Translate CD and TT using stage-2 table Mostafa Saleh
@ 2024-03-25 10:14 ` Mostafa Saleh
  2024-03-25 14:20   ` Julien Grall
  2024-03-25 10:14 ` [RFC PATCH 08/12] hw/arm/smmuv3: Support and advertise nesting Mostafa Saleh
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 24+ messages in thread
From: Mostafa Saleh @ 2024-03-25 10:14 UTC (permalink / raw)
  To: qemu-arm, eric.auger, peter.maydell, qemu-devel
  Cc: jean-philippe, alex.bennee, maz, nicolinc, julien, Mostafa Saleh

When nested translation is requested, we need to:

- Translate the stage-1 output IPA to a physical address using stage-2.
- Translate stage-1 PTW walks using stage-2.
- Combine both to create a single TLB entry, as sketched below.
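
In pseudocode, the combined entry is built roughly like this (an
illustration of combine_tlb() introduced below, not literal code):

    combined.iova            = s1.iova;                      /* input VA */
    combined.tg, level       = from stage-1;
    combined.translated_addr = s2_walk(s1.translated_addr);  /* IPA -> PA */
    combined.addr_mask       = MIN(s1.addr_mask, s2.addr_mask);
    combined.perm            = s1.perm;       /* stage-1 permissions */
    combined.parent_perm     = s2.perm;       /* stage-2 permissions */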

Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
 hw/arm/smmu-common.c         | 135 ++++++++++++++++++++++++++++-------
 hw/arm/trace-events          |   2 +-
 include/hw/arm/smmu-common.h |   3 +-
 3 files changed, 113 insertions(+), 27 deletions(-)

diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index f0905c28cf..da8776ecec 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -119,7 +119,7 @@ void smmu_iotlb_insert(SMMUState *bs, SMMUTransCfg *cfg, SMMUTLBEntry *new)
     *key = smmu_get_iotlb_key(cfg->asid, cfg->s2cfg.vmid, new->entry.iova,
                               tg, new->level, stage_tag);
     trace_smmu_iotlb_insert(cfg->asid, cfg->s2cfg.vmid, new->entry.iova,
-                            tg, new->level, stage_tag);
+                            tg, new->level, new->entry.addr_mask, stage_tag);
     g_hash_table_insert(bs->iotlb, key, new);
 }
 
@@ -305,6 +305,27 @@ SMMUTransTableInfo *select_tt(SMMUTransCfg *cfg, dma_addr_t iova)
     return NULL;
 }
 
+/* Translate a stage-1 table address using stage-2, when nesting is enabled. */
+static inline int translate_table_s1(dma_addr_t *table_addr, SMMUTransCfg *cfg,
+                                     SMMUPTWEventInfo *info, SMMUState *bs)
+{
+    dma_addr_t addr = *table_addr;
+    SMMUTLBEntry *cached_entry;
+
+    if (cfg->stage != SMMU_NESTED) {
+        return 0;
+    }
+
+    CALL_FUNC_CFG_S2(cfg, cached_entry, smmu_translate,
+                     bs, cfg, addr, IOMMU_RO, info);
+
+    if (cached_entry) {
+        *table_addr = CACHED_ENTRY_TO_ADDR(cached_entry, addr);
+        return 0;
+    }
+    return -EINVAL;
+}
+
 /**
  * smmu_ptw_64_s1 - VMSAv8-64 Walk of the page tables for a given IOVA
  * @cfg: translation config
@@ -320,7 +341,8 @@ SMMUTransTableInfo *select_tt(SMMUTransCfg *cfg, dma_addr_t iova)
  */
 static int smmu_ptw_64_s1(SMMUTransCfg *cfg,
                           dma_addr_t iova, IOMMUAccessFlags perm,
-                          SMMUTLBEntry *tlbe, SMMUPTWEventInfo *info)
+                          SMMUTLBEntry *tlbe, SMMUPTWEventInfo *info,
+                          SMMUState *bs)
 {
     dma_addr_t baseaddr, indexmask;
     SMMUStage stage = cfg->stage;
@@ -368,6 +390,10 @@ static int smmu_ptw_64_s1(SMMUTransCfg *cfg,
                 goto error;
             }
             baseaddr = get_table_pte_address(pte, granule_sz);
+            /* On failure, keep the stage-2 fault set by the nested walk. */
+            if (translate_table_s1(&baseaddr, cfg, info, bs)) {
+                goto error_no_stage;
+            }
             level++;
             continue;
         } else if (is_page_pte(pte, level)) {
@@ -403,7 +429,7 @@ static int smmu_ptw_64_s1(SMMUTransCfg *cfg,
         tlbe->entry.translated_addr = gpa;
         tlbe->entry.iova = iova & ~mask;
         tlbe->entry.addr_mask = mask;
-        tlbe->entry.perm = PTE_AP_TO_PERM(ap);
+        tlbe->parent_perm = tlbe->entry.perm = PTE_AP_TO_PERM(ap);
         tlbe->level = level;
         tlbe->granule = granule_sz;
         return 0;
@@ -412,6 +438,7 @@ static int smmu_ptw_64_s1(SMMUTransCfg *cfg,
 
 error:
     info->stage = SMMU_STAGE_1;
+error_no_stage:
     tlbe->entry.perm = IOMMU_NONE;
     return -EINVAL;
 }
@@ -524,7 +551,7 @@ static int smmu_ptw_64_s2(SMMUTransCfg *cfg,
         tlbe->entry.translated_addr = gpa;
         tlbe->entry.iova = ipa & ~mask;
         tlbe->entry.addr_mask = mask;
-        tlbe->entry.perm = s2ap;
+        tlbe->parent_perm = tlbe->entry.perm = s2ap;
         tlbe->level = level;
         tlbe->granule = granule_sz;
         return 0;
@@ -537,6 +564,35 @@ error:
     return -EINVAL;
 }
 
+/* Combine 2 TLB entries and return the result in tlbe. */
+static void combine_tlb(SMMUTLBEntry *tlbe, SMMUTLBEntry *tlbe_s2,
+                        dma_addr_t iova, SMMUTransCfg *cfg)
+{
+        if (cfg->stage == SMMU_NESTED) {
+
+            /*
+             * tg and level are used from stage-1, while the addr mask can be
+             * smaller in case stage-2 size (based on granule and level) was
+             * smaller than stage-1.
+             * That should have no impact on:
+             * - lookup: as iova is properly aligned with the stage-1 level and
+             *   granule.
+             * - Invalidation: as it uses the entry mask.
+             */
+            tlbe->entry.addr_mask = MIN(tlbe->entry.addr_mask,
+                                        tlbe_s2->entry.addr_mask);
+            tlbe->entry.translated_addr = CACHED_ENTRY_TO_ADDR(tlbe_s2,
+                                          tlbe->entry.translated_addr);
+
+            /* parent_perm has s2 perm while perm has s1 perm. */
+            tlbe->parent_perm = tlbe_s2->entry.perm;
+            return;
+        }
+
+        /* Not nested: use the stage-2 entry. */
+        memcpy(tlbe, tlbe_s2, sizeof(*tlbe));
+}
+
 /**
  * smmu_ptw - Walk the page tables for an IOVA, according to @cfg
  *
@@ -549,28 +605,59 @@ error:
  * return 0 on success
  */
 int smmu_ptw(SMMUTransCfg *cfg, dma_addr_t iova, IOMMUAccessFlags perm,
-             SMMUTLBEntry *tlbe, SMMUPTWEventInfo *info)
+             SMMUTLBEntry *tlbe, SMMUPTWEventInfo *info, SMMUState *bs)
 {
-    if (cfg->stage == SMMU_STAGE_1) {
-        return smmu_ptw_64_s1(cfg, iova, perm, tlbe, info);
-    } else if (cfg->stage == SMMU_STAGE_2) {
-        /*
-         * If bypassing stage 1(or unimplemented), the input address is passed
-         * directly to stage 2 as IPA. If the input address of a transaction
-         * exceeds the size of the IAS, a stage 1 Address Size fault occurs.
-         * For AA64, IAS = OAS according to (IHI 0070.E.a) "3.4 Address sizes"
-         */
-        if (iova >= (1ULL << cfg->oas)) {
-            info->type = SMMU_PTW_ERR_ADDR_SIZE;
-            info->stage = SMMU_STAGE_1;
-            tlbe->entry.perm = IOMMU_NONE;
-            return -EINVAL;
+    int ret = 0;
+    SMMUTLBEntry tlbe_s2;
+    dma_addr_t ipa = iova;
+
+    if (cfg->stage & SMMU_STAGE_1) {
+        ret = smmu_ptw_64_s1(cfg, iova, perm, tlbe, info, bs);
+        if (ret) {
+            return ret;
         }
+        /* This is the IPA for the next stage. */
+        ipa = CACHED_ENTRY_TO_ADDR(tlbe, iova);
+    }
+
+    /*
+     * The address output from the translation causes a stage 1 Address Size
+     * fault if it exceeds the range of the effective IPA size for the given CD.
+     * If bypassing stage 1 (or unimplemented), the input address is passed
+     * directly to stage 2 as IPA. If the input address of a transaction
+     * exceeds the size of the IAS, a stage 1 Address Size fault occurs.
+     * For AA64, IAS = OAS according to (IHI 0070.E.a) "3.4 Address sizes"
+     */
+    if (ipa >= (1ULL << cfg->oas)) {
+        info->type = SMMU_PTW_ERR_ADDR_SIZE;
+        info->stage = SMMU_STAGE_1;
+        tlbe->entry.perm = IOMMU_NONE;
+        return -EINVAL;
+    }
 
-        return smmu_ptw_64_s2(cfg, iova, perm, tlbe, info);
+    if (cfg->stage & SMMU_STAGE_2) {
+        ret = smmu_ptw_64_s2(cfg, ipa, perm, &tlbe_s2, info);
+        if (ret) {
+            return ret;
+        }
+        combine_tlb(tlbe, &tlbe_s2, iova, cfg);
     }
 
-    g_assert_not_reached();
+    return ret;
+}
+
+static int validate_tlb_entry(SMMUTLBEntry *cached_entry, IOMMUAccessFlags flag,
+                              SMMUPTWEventInfo *info)
+{
+    if ((flag & IOMMU_WO) && !(cached_entry->entry.perm &
+        cached_entry->parent_perm & IOMMU_WO)) {
+        info->type = SMMU_PTW_ERR_PERMISSION;
+        info->stage = !(cached_entry->entry.perm & IOMMU_WO) ?
+                      SMMU_STAGE_1 :
+                      SMMU_STAGE_2;
+        return -EINVAL;
+    }
+    return 0;
 }
 
 SMMUTLBEntry *smmu_translate(SMMUState *bs, SMMUTransCfg *cfg, dma_addr_t addr,
@@ -614,16 +701,14 @@ SMMUTLBEntry *smmu_translate(SMMUState *bs, SMMUTransCfg *cfg, dma_addr_t addr,
 
     cached_entry = smmu_iotlb_lookup(bs, cfg, &tt_combined, aligned_addr);
     if (cached_entry) {
-        if ((flag & IOMMU_WO) && !(cached_entry->entry.perm & IOMMU_WO)) {
-            info->type = SMMU_PTW_ERR_PERMISSION;
-            info->stage = SMMU_STAGE_TO_TLB_TAG(cfg->stage);
+        if (validate_tlb_entry(cached_entry, flag, info)) {
             return NULL;
         }
         return cached_entry;
     }
 
     cached_entry = g_new0(SMMUTLBEntry, 1);
-    status = smmu_ptw(cfg, aligned_addr, flag, cached_entry, info);
+    status = smmu_ptw(cfg, aligned_addr, flag, cached_entry, info, bs);
     if (status) {
             g_free(cached_entry);
             return NULL;
diff --git a/hw/arm/trace-events b/hw/arm/trace-events
index 34b10af83f..215df91ea3 100644
--- a/hw/arm/trace-events
+++ b/hw/arm/trace-events
@@ -18,7 +18,7 @@ smmu_iotlb_inv_stage(int stage) "Stage invalidate stage=%d"
 smmu_inv_notifiers_mr(const char *name) "iommu mr=%s"
 smmu_iotlb_lookup_hit(uint16_t asid, uint16_t vmid, uint64_t addr, uint32_t hit, uint32_t miss, uint32_t p) "IOTLB cache HIT asid=%d vmid=%d addr=0x%"PRIx64" hit=%d miss=%d hit rate=%d"
 smmu_iotlb_lookup_miss(uint16_t asid, uint16_t vmid, uint64_t addr, uint32_t hit, uint32_t miss, uint32_t p) "IOTLB cache MISS asid=%d vmid=%d addr=0x%"PRIx64" hit=%d miss=%d hit rate=%d"
-smmu_iotlb_insert(uint16_t asid, uint16_t vmid, uint64_t addr, uint8_t tg, uint8_t level, int stage) "IOTLB ++ asid=%d vmid=%d addr=0x%"PRIx64" tg=%d level=%d stage=%d"
+smmu_iotlb_insert(uint16_t asid, uint16_t vmid, uint64_t addr, uint8_t tg, uint8_t level, uint64_t mask, int stage) "IOTLB ++ asid=%d vmid=%d addr=0x%"PRIx64" tg=%d level=%d mask=0x%"PRIx64" stage=%d"
 
 # smmuv3.c
 smmuv3_read_mmio(uint64_t addr, uint64_t val, unsigned size, uint32_t r) "addr: 0x%"PRIx64" val:0x%"PRIx64" size: 0x%x(%d)"
diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
index c0969e461d..4f9505d91c 100644
--- a/include/hw/arm/smmu-common.h
+++ b/include/hw/arm/smmu-common.h
@@ -91,6 +91,7 @@ typedef struct SMMUTLBEntry {
     IOMMUTLBEntry entry;
     uint8_t level;
     uint8_t granule;
+    IOMMUAccessFlags parent_perm;
 } SMMUTLBEntry;
 
 /* Stage-2 configuration. */
@@ -199,7 +200,7 @@ static inline uint16_t smmu_get_sid(SMMUDevice *sdev)
  * pair, according to @cfg translation config
  */
 int smmu_ptw(SMMUTransCfg *cfg, dma_addr_t iova, IOMMUAccessFlags perm,
-             SMMUTLBEntry *tlbe, SMMUPTWEventInfo *info);
+             SMMUTLBEntry *tlbe, SMMUPTWEventInfo *info, SMMUState *bs);
 
 
 /* smmu_translate - Look for a translation in TLB, if not, do a PTW. */
-- 
2.44.0.396.g6e790dbe36-goog




* [RFC PATCH 08/12] hw/arm/smmuv3: Support and advertise nesting
  2024-03-25 10:13 [RFC PATCH 00/12] SMMUv3 nested translation support Mostafa Saleh
                   ` (6 preceding siblings ...)
  2024-03-25 10:14 ` [RFC PATCH 07/12] hw/arm/smmu-common: Support nested translation Mostafa Saleh
@ 2024-03-25 10:14 ` Mostafa Saleh
  2024-03-25 10:14 ` [RFC PATCH 09/12] hw/arm/smmuv3: Advertise S2FWB Mostafa Saleh
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 24+ messages in thread
From: Mostafa Saleh @ 2024-03-25 10:14 UTC (permalink / raw)
  To: qemu-arm, eric.auger, peter.maydell, qemu-devel
  Cc: jean-philippe, alex.bennee, maz, nicolinc, julien, Mostafa Saleh

Everything is in place, add the last missing bits:
- Handle fault checking according to the actual PTW event and not the
  translation stage.
- Consolidate parsing of STE cfg and setting translation stage.

Advertise nesting if the requested stage is "nested".
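
For example, in a nested configuration, a fault taken by the stage-2 walk
of a stage-1 table address now reports ptw_info.stage == SMMU_STAGE_2, so
the macro consults cfg->s2cfg.record_faults; the old macro would have
checked cfg->record_faults based on cfg->stage alone. A sketch of the use:

    /* Record the fault according to the stage that actually faulted. */
    if (PTW_RECORD_FAULT(ptw_info, cfg)) {
        event->type = SMMU_EVT_F_TRANSLATION;
        event->u.f_translation.addr = addr;
    }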

Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
 hw/arm/smmuv3.c | 51 ++++++++++++++++++++++++++++++++++---------------
 1 file changed, 36 insertions(+), 15 deletions(-)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 32a1838576..e5373f4cfe 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -34,9 +34,10 @@
 #include "smmuv3-internal.h"
 #include "smmu-internal.h"
 
-#define PTW_RECORD_FAULT(cfg)   (((cfg)->stage == SMMU_STAGE_1) ? \
-                                 (cfg)->record_faults : \
-                                 (cfg)->s2cfg.record_faults)
+#define PTW_RECORD_FAULT(ptw_info, cfg) (((ptw_info).stage == SMMU_STAGE_1 && \
+                                        (cfg)->record_faults) || \
+                                        ((ptw_info).stage == SMMU_STAGE_2 && \
+                                        (cfg)->s2cfg.record_faults))
 
 /**
  * smmuv3_trigger_irq - pulse @irq if enabled and update
@@ -260,6 +261,9 @@ static void smmuv3_init_regs(SMMUv3State *s)
     /* Based on sys property, the stages supported in smmu will be advertised.*/
     if (s->stage && !strcmp("2", s->stage)) {
         s->idr[0] = FIELD_DP32(s->idr[0], IDR0, S2P, 1);
+    } else if (s->stage && !strcmp("nested", s->stage)) {
+        s->idr[0] = FIELD_DP32(s->idr[0], IDR0, S1P, 1);
+        s->idr[0] = FIELD_DP32(s->idr[0], IDR0, S2P, 1);
     } else {
         s->idr[0] = FIELD_DP32(s->idr[0], IDR0, S1P, 1);
     }
@@ -425,8 +429,6 @@ static bool s2_pgtable_config_valid(uint8_t sl0, uint8_t t0sz, uint8_t gran)
 
 static int decode_ste_s2_cfg(SMMUTransCfg *cfg, STE *ste)
 {
-    cfg->stage = SMMU_STAGE_2;
-
     if (STE_S2AA64(ste) == 0x0) {
         qemu_log_mask(LOG_UNIMP,
                       "SMMUv3 AArch32 tables not supported\n");
@@ -509,6 +511,27 @@ bad_ste:
     return -EINVAL;
 }
 
+static void decode_ste_config(SMMUTransCfg *cfg, uint32_t config)
+{
+
+    if (STE_CFG_ABORT(config)) {
+        cfg->aborted = true;
+        return;
+    }
+    if (STE_CFG_BYPASS(config)) {
+        cfg->bypassed = true;
+        return;
+    }
+
+    if (STE_CFG_S1_ENABLED(config)) {
+        cfg->stage |= SMMU_STAGE_1;
+    }
+
+    if (STE_CFG_S2_ENABLED(config)) {
+        cfg->stage |= SMMU_STAGE_2;
+    }
+}
+
 /* Returns < 0 in case of invalid STE, 0 otherwise */
 static int decode_ste(SMMUv3State *s, SMMUTransCfg *cfg,
                       STE *ste, SMMUEventInfo *event)
@@ -525,16 +548,15 @@ static int decode_ste(SMMUv3State *s, SMMUTransCfg *cfg,
 
     config = STE_CONFIG(ste);
 
-    if (STE_CFG_ABORT(config)) {
-        cfg->aborted = true;
+    decode_ste_config(cfg, config);
+
+    if (cfg->aborted) {
         return 0;
     }
 
-    if (STE_CFG_BYPASS(config)) {
-        cfg->bypassed = true;
+    if (cfg->bypassed) {
         return 0;
     }
-
     /*
      * If a stage is enabled in SW while not advertised, throw bad ste
      * according to user manual(IHI0070E) "5.2 Stream Table Entry".
@@ -704,7 +726,6 @@ static int decode_cd(SMMUv3State *s, SMMUTransCfg *cfg,
 
     /* we support only those at the moment */
     cfg->aa64 = true;
-    cfg->stage = SMMU_STAGE_1;
 
     cfg->oas = oas2bits(CD_IPS(cd));
     cfg->oas = MIN(oas2bits(SMMU_IDR5_OAS), cfg->oas);
@@ -887,28 +908,28 @@ static SMMUTranslationStatus smmuv3_do_translate(SMMUv3State *s, hwaddr addr,
             event->u.f_walk_eabt.addr2 = ptw_info.addr;
             break;
         case SMMU_PTW_ERR_TRANSLATION:
-            if (PTW_RECORD_FAULT(cfg)) {
+            if (PTW_RECORD_FAULT(ptw_info, cfg)) {
                 event->type = SMMU_EVT_F_TRANSLATION;
                 event->u.f_translation.addr = addr;
                 event->u.f_translation.rnw = flag & 0x1;
             }
             break;
         case SMMU_PTW_ERR_ADDR_SIZE:
-            if (PTW_RECORD_FAULT(cfg)) {
+            if (PTW_RECORD_FAULT(ptw_info, cfg)) {
                 event->type = SMMU_EVT_F_ADDR_SIZE;
                 event->u.f_addr_size.addr = addr;
                 event->u.f_addr_size.rnw = flag & 0x1;
             }
             break;
         case SMMU_PTW_ERR_ACCESS:
-            if (PTW_RECORD_FAULT(cfg)) {
+            if (PTW_RECORD_FAULT(ptw_info, cfg)) {
                 event->type = SMMU_EVT_F_ACCESS;
                 event->u.f_access.addr = addr;
                 event->u.f_access.rnw = flag & 0x1;
             }
             break;
         case SMMU_PTW_ERR_PERMISSION:
-            if (PTW_RECORD_FAULT(cfg)) {
+            if (PTW_RECORD_FAULT(ptw_info, cfg)) {
                 event->type = SMMU_EVT_F_PERMISSION;
                 event->u.f_permission.addr = addr;
                 event->u.f_permission.rnw = flag & 0x1;
-- 
2.44.0.396.g6e790dbe36-goog




* [RFC PATCH 09/12] hw/arm/smmuv3: Advertise S2FWB
  2024-03-25 10:13 [RFC PATCH 00/12] SMMUv3 nested translation support Mostafa Saleh
                   ` (7 preceding siblings ...)
  2024-03-25 10:14 ` [RFC PATCH 08/12] hw/arm/smmuv3: Support and advertise nesting Mostafa Saleh
@ 2024-03-25 10:14 ` Mostafa Saleh
  2024-03-25 10:14 ` [RFC PATCH 10/12] hw/arm/smmu: Refactor SMMU OAS Mostafa Saleh
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 24+ messages in thread
From: Mostafa Saleh @ 2024-03-25 10:14 UTC (permalink / raw)
  To: qemu-arm, eric.auger, peter.maydell, qemu-devel
  Cc: jean-philippe, alex.bennee, maz, nicolinc, julien, Mostafa Saleh

QEMU doesn't support memory attributes, so FWB is a NOP; this
might change in the future if memory attributes are supported.
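
(For background: FEAT_S2FWB lets stage-2 force the final memory attributes
regardless of what stage-1 assigned; since QEMU ignores attributes
entirely, advertising it has no effect on behavior.)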

Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
 hw/arm/smmuv3.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index e5373f4cfe..288e7cf1ae 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -287,6 +287,14 @@ static void smmuv3_init_regs(SMMUv3State *s)
     if (FIELD_EX32(s->idr[0], IDR0, S2P)) {
         /* XNX is a stage-2-specific feature */
         s->idr[3] = FIELD_DP32(s->idr[3], IDR3, XNX, 1);
+        if (FIELD_EX32(s->idr[0], IDR0, S1P)) {
+            /*
+             * QEMU doesn't support memory attributes, so FWB is a NOP;
+             * this might change in the future if memory attributes are
+             * supported.
+             */
+            s->idr[3] = FIELD_DP32(s->idr[3], IDR3, FWB, 1);
+        }
     }
     s->idr[3] = FIELD_DP32(s->idr[3], IDR3, RIL, 1);
     s->idr[3] = FIELD_DP32(s->idr[3], IDR3, BBML, 2);
-- 
2.44.0.396.g6e790dbe36-goog




* [RFC PATCH 10/12] hw/arm/smmu: Refactor SMMU OAS
  2024-03-25 10:13 [RFC PATCH 00/12] SMMUv3 nested translation support Mostafa Saleh
                   ` (8 preceding siblings ...)
  2024-03-25 10:14 ` [RFC PATCH 09/12] hw/arm/smmuv3: Advertise S2FWB Mostafa Saleh
@ 2024-03-25 10:14 ` Mostafa Saleh
  2024-03-25 10:14 ` [RFC PATCH 11/12] hw/arm/smmuv3: Add property for OAS Mostafa Saleh
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 24+ messages in thread
From: Mostafa Saleh @ 2024-03-25 10:14 UTC (permalink / raw)
  To: qemu-arm, eric.auger, peter.maydell, qemu-devel
  Cc: jean-philippe, alex.bennee, maz, nicolinc, julien, Mostafa Saleh

The SMMUv3 OAS is hardcoded to 44 bits. For nested configurations that
can be a problem, as stage-2 might be shared with the CPU, which might
have a different PARANGE; according to the SMMU manual ARM IHI 0070F.b:
    6.3.6 SMMU_IDR5, OAS must match the system physical address size.

This patch doesn't change the SMMU OAS, but refactors the code to make
that easier:
- Rely everywhere on IDR5 for reading the OAS instead of using the macro,
  so it is enough to just change IDR5 and have it propagate correctly.
- Make the code work if the OAS is 52 bits (see the encoding table below).
- Remove unused functions/macros: pa_range/MAX_PA.
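
For reference, the OAS field encoding that oas2bits() handles after this
patch (the 0-3 cases are unchanged from the existing code):

    0 -> 32 bits    2 -> 40 bits    4 -> 44 bits    6 -> 52 bits (new)
    1 -> 36 bits    3 -> 42 bits    5 -> 48 bits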

Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
 hw/arm/smmu-common.c     |  7 ++++---
 hw/arm/smmuv3-internal.h | 15 ++-------------
 hw/arm/smmuv3.c          | 35 ++++++++++++++++++++++++++++-------
 3 files changed, 34 insertions(+), 23 deletions(-)

diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index da8776ecec..a4196ddd22 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -359,7 +359,8 @@ static int smmu_ptw_64_s1(SMMUTransCfg *cfg,
     inputsize = 64 - tt->tsz;
     level = 4 - (inputsize - 4) / stride;
     indexmask = VMSA_IDXMSK(inputsize, stride, level);
-    baseaddr = extract64(tt->ttb, 0, 48);
+
+    baseaddr = extract64(tt->ttb, 0, cfg->oas);
     baseaddr &= ~indexmask;
 
     while (level < VMSA_LEVELS) {
@@ -472,8 +473,8 @@ static int smmu_ptw_64_s2(SMMUTransCfg *cfg,
      * Get the ttb from concatenated structure.
      * The offset is the idx * size of each ttb(number of ptes * (sizeof(pte))
      */
-    uint64_t baseaddr = extract64(cfg->s2cfg.vttb, 0, 48) + (1 << stride) *
-                                  idx * sizeof(uint64_t);
+    uint64_t baseaddr = extract64(cfg->s2cfg.vttb, 0, cfg->s2cfg.eff_ps) +
+                                  (1 << stride) * idx * sizeof(uint64_t);
     dma_addr_t indexmask = VMSA_IDXMSK(inputsize, stride, level);
 
     baseaddr &= ~indexmask;
diff --git a/hw/arm/smmuv3-internal.h b/hw/arm/smmuv3-internal.h
index e4dd11e1e6..a7d53b3854 100644
--- a/hw/arm/smmuv3-internal.h
+++ b/hw/arm/smmuv3-internal.h
@@ -592,23 +592,12 @@ static inline int oas2bits(int oas_field)
         return 44;
     case 5:
         return 48;
+    case 6:
+        return 52;
     }
     return -1;
 }
 
-static inline int pa_range(STE *ste)
-{
-    int oas_field = MIN(STE_S2PS(ste), SMMU_IDR5_OAS);
-
-    if (!STE_S2AA64(ste)) {
-        return 40;
-    }
-
-    return oas2bits(oas_field);
-}
-
-#define MAX_PA(ste) ((1 << pa_range(ste)) - 1)
-
 /* CD fields */
 
 #define CD_VALID(x)   extract32((x)->word[0], 31, 1)
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 288e7cf1ae..2a29e3bccb 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -411,10 +411,10 @@ static bool s2t0sz_valid(SMMUTransCfg *cfg)
     }
 
     if (cfg->s2cfg.granule_sz == 16) {
-        return (cfg->s2cfg.tsz >= 64 - oas2bits(SMMU_IDR5_OAS));
+        return (cfg->s2cfg.tsz >= 64 - cfg->s2cfg.eff_ps);
     }
 
-    return (cfg->s2cfg.tsz >= MAX(64 - oas2bits(SMMU_IDR5_OAS), 16));
+    return (cfg->s2cfg.tsz >= MAX(64 - cfg->s2cfg.eff_ps, 16));
 }
 
 /*
@@ -435,8 +435,11 @@ static bool s2_pgtable_config_valid(uint8_t sl0, uint8_t t0sz, uint8_t gran)
     return nr_concat <= VMSA_MAX_S2_CONCAT;
 }
 
-static int decode_ste_s2_cfg(SMMUTransCfg *cfg, STE *ste)
+static int decode_ste_s2_cfg(SMMUv3State *s, SMMUTransCfg *cfg,
+                             STE *ste)
 {
+    uint8_t oas = FIELD_EX32(s->idr[5], IDR5, OAS);
+
     if (STE_S2AA64(ste) == 0x0) {
         qemu_log_mask(LOG_UNIMP,
                       "SMMUv3 AArch32 tables not supported\n");
@@ -469,7 +472,15 @@ static int decode_ste_s2_cfg(SMMUTransCfg *cfg, STE *ste)
     }
 
     /* For AA64, The effective S2PS size is capped to the OAS. */
-    cfg->s2cfg.eff_ps = oas2bits(MIN(STE_S2PS(ste), SMMU_IDR5_OAS));
+    cfg->s2cfg.eff_ps = oas2bits(MIN(STE_S2PS(ste), oas));
+    /*
+     * For SMMUv3.1 and later, when OAS == IAS == 52, the stage 2 input
+     * range is further limited to 48 bits unless STE.S2TG indicates a
+     * 64KB granule.
+     */
+    if (cfg->s2cfg.granule_sz != 16) {
+        cfg->s2cfg.eff_ps = MIN(cfg->s2cfg.eff_ps, 48);
+    }
     /*
      * It is ILLEGAL for the address in S2TTB to be outside the range
      * described by the effective S2PS value.
@@ -545,6 +556,7 @@ static int decode_ste(SMMUv3State *s, SMMUTransCfg *cfg,
                       STE *ste, SMMUEventInfo *event)
 {
     uint32_t config;
+    uint8_t oas = FIELD_EX32(s->idr[5], IDR5, OAS);
     int ret;
 
     if (!STE_VALID(ste)) {
@@ -591,8 +603,8 @@ static int decode_ste(SMMUv3State *s, SMMUTransCfg *cfg,
          * Stage-1 OAS defaults to OAS even if not enabled as it would be used
          * in input address check for stage-2.
          */
-        cfg->oas = oas2bits(SMMU_IDR5_OAS);
-        ret = decode_ste_s2_cfg(cfg, ste);
+        cfg->oas = oas2bits(oas);
+        ret = decode_ste_s2_cfg(s, cfg, ste);
         if (ret) {
             goto bad_ste;
         }
@@ -718,6 +730,7 @@ static int decode_cd(SMMUv3State *s, SMMUTransCfg *cfg,
     int i;
     SMMUTranslationStatus status;
     SMMUTLBEntry *entry;
+    uint8_t oas = FIELD_EX32(s->idr[5], IDR5, OAS);
 
     if (!CD_VALID(cd) || !CD_AARCH64(cd)) {
         goto bad_cd;
@@ -736,7 +749,7 @@ static int decode_cd(SMMUv3State *s, SMMUTransCfg *cfg,
     cfg->aa64 = true;
 
     cfg->oas = oas2bits(CD_IPS(cd));
-    cfg->oas = MIN(oas2bits(SMMU_IDR5_OAS), cfg->oas);
+    cfg->oas = MIN(oas2bits(oas), cfg->oas);
     cfg->tbi = CD_TBI(cd);
     cfg->asid = CD_ASID(cd);
     cfg->affd = CD_AFFD(cd);
@@ -765,6 +778,14 @@ static int decode_cd(SMMUv3State *s, SMMUTransCfg *cfg,
             goto bad_cd;
         }
 
+        /*
+         * An address greater than 48 bits in size can only be output from a
+         * TTD when, in SMMUv3.1 and later, the effective IPS is 52 and a 64KB
+         * granule is in use for that translation table.
+         */
+        if (tt->granule_sz != 16) {
+            cfg->oas = MIN(cfg->oas, 48);
+        }
         tt->tsz = tsz;
         tt->ttb = CD_TTB(cd, i);
 
-- 
2.44.0.396.g6e790dbe36-goog




* [RFC PATCH 11/12] hw/arm/smmuv3: Add property for OAS
  2024-03-25 10:13 [RFC PATCH 00/12] SMMUv3 nested translation support Mostafa Saleh
                   ` (9 preceding siblings ...)
  2024-03-25 10:14 ` [RFC PATCH 10/12] hw/arm/smmu: Refactor SMMU OAS Mostafa Saleh
@ 2024-03-25 10:14 ` Mostafa Saleh
  2024-03-25 10:14 ` [RFC PATCH 12/12] hw/arm/virt: Set SMMU OAS based on CPU PARANGE Mostafa Saleh
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 24+ messages in thread
From: Mostafa Saleh @ 2024-03-25 10:14 UTC (permalink / raw)
  To: qemu-arm, eric.auger, peter.maydell, qemu-devel
  Cc: jean-philippe, alex.bennee, maz, nicolinc, julien, Mostafa Saleh

Add a property that sets the OAS of the SMMU; it is not used yet in
this patch.
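
For illustration, once the property is wired up, it takes the IDR5.OAS
field encoding (e.g. 5 for 48 bits); a hypothetical invocation, assuming
the usual -global syntax for SMMU properties:

    -global arm-smmuv3.oas=5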

Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
 hw/arm/smmuv3-internal.h |  2 +-
 hw/arm/smmuv3.c          | 27 ++++++++++++++++++++++++++-
 include/hw/arm/smmuv3.h  |  1 +
 3 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/hw/arm/smmuv3-internal.h b/hw/arm/smmuv3-internal.h
index a7d53b3854..9bb4ec9ec6 100644
--- a/hw/arm/smmuv3-internal.h
+++ b/hw/arm/smmuv3-internal.h
@@ -105,7 +105,7 @@ REG32(IDR5,                0x14)
      FIELD(IDR5, VAX,        10, 2);
      FIELD(IDR5, STALL_MAX,  16, 16);
 
-#define SMMU_IDR5_OAS 4
+#define SMMU_IDR5_OAS_DEF 4
 
 REG32(IIDR,                0x18)
 REG32(AIDR,                0x1c)
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 2a29e3bccb..9d0db25379 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -299,7 +299,7 @@ static void smmuv3_init_regs(SMMUv3State *s)
     s->idr[3] = FIELD_DP32(s->idr[3], IDR3, RIL, 1);
     s->idr[3] = FIELD_DP32(s->idr[3], IDR3, BBML, 2);
 
-    s->idr[5] = FIELD_DP32(s->idr[5], IDR5, OAS, SMMU_IDR5_OAS); /* 44 bits */
+    s->idr[5] = FIELD_DP32(s->idr[5], IDR5, OAS, s->oas);
     /* 4K, 16K and 64K granule support */
     s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN4K, 1);
     s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN16K, 1);
@@ -1869,11 +1869,34 @@ static const VMStateDescription vmstate_gbpa = {
     }
 };
 
+static const VMStateDescription vmstate_oas = {
+    .name = "smmuv3/oas",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (const VMStateField[]) {
+        VMSTATE_INT32(oas, SMMUv3State),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static int smmuv3_preload(void *opaque)
+{
+    SMMUv3State *s = opaque;
+
+    /*
+     * In case it wasn't migrated, use the value used
+     * by older QEMU.
+     */
+    s->oas = SMMU_IDR5_OAS_DEF;
+    return 0;
+}
+
 static const VMStateDescription vmstate_smmuv3 = {
     .name = "smmuv3",
     .version_id = 1,
     .minimum_version_id = 1,
     .priority = MIG_PRI_IOMMU,
+    .pre_load = smmuv3_preload,
     .fields = (const VMStateField[]) {
         VMSTATE_UINT32(features, SMMUv3State),
         VMSTATE_UINT8(sid_size, SMMUv3State),
@@ -1901,6 +1924,7 @@ static const VMStateDescription vmstate_smmuv3 = {
     },
     .subsections = (const VMStateDescription * const []) {
         &vmstate_gbpa,
+        &vmstate_oas,
         NULL
     }
 };
@@ -1913,6 +1937,7 @@ static Property smmuv3_properties[] = {
      * Defaults to stage 1
      */
     DEFINE_PROP_STRING("stage", SMMUv3State, stage),
+    DEFINE_PROP_INT32("oas", SMMUv3State, oas, SMMU_IDR5_OAS_DEF),
     DEFINE_PROP_END_OF_LIST()
 };
 
diff --git a/include/hw/arm/smmuv3.h b/include/hw/arm/smmuv3.h
index d183a62766..00a9eb4467 100644
--- a/include/hw/arm/smmuv3.h
+++ b/include/hw/arm/smmuv3.h
@@ -63,6 +63,7 @@ struct SMMUv3State {
     qemu_irq     irq[4];
     QemuMutex mutex;
     char *stage;
+    int32_t oas;
 };
 
 typedef enum {
-- 
2.44.0.396.g6e790dbe36-goog




* [RFC PATCH 12/12] hw/arm/virt: Set SMMU OAS based on CPU PARANGE
  2024-03-25 10:13 [RFC PATCH 00/12] SMMUv3 nested translation support Mostafa Saleh
                   ` (10 preceding siblings ...)
  2024-03-25 10:14 ` [RFC PATCH 11/12] hw/arm/smmuv3: Add property for OAS Mostafa Saleh
@ 2024-03-25 10:14 ` Mostafa Saleh
  2024-03-25 17:50 ` [RFC PATCH 00/12] SMMUv3 nested translation support Marcin Juszkiewicz
  2024-04-02 22:28 ` Nicolin Chen
  13 siblings, 0 replies; 24+ messages in thread
From: Mostafa Saleh @ 2024-03-25 10:14 UTC (permalink / raw)
  To: qemu-arm, eric.auger, peter.maydell, qemu-devel
  Cc: jean-philippe, alex.bennee, maz, nicolinc, julien, Mostafa Saleh

Use the new SMMU property to make the SMMU OAS match the CPU PARANGE.
That's according to the SMMU manual ARM IHI 0070F.b:
    6.3.6 SMMU_IDR5, OAS must match the system physical address size.
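
This works because ID_AA64MMFR0.PARange uses the same field encoding as
SMMU_IDR5.OAS (0 -> 32 bits up to 6 -> 52 bits), so the raw field value
can be passed through unchanged; the flow added by this patch is roughly:

    oas = FIELD_EX64(cpu->isar.id_aa64mmfr0, ID_AA64MMFR0, PARANGE);
    qdev_prop_set_uint64(dev, "oas", oas);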

Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
 hw/arm/virt.c      | 14 ++++++++++++--
 target/arm/cpu.h   |  2 ++
 target/arm/cpu64.c |  5 +++++
 3 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 0af1943697..599c0f752b 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -235,6 +235,13 @@ static bool ns_el2_virt_timer_present(void)
         arm_feature(env, ARM_FEATURE_EL2) && cpu_isar_feature(aa64_vh, cpu);
 }
 
+/* We rely on the CPU to define the system OAS. */
+static int32_t get_system_oas(void)
+{
+    ARMCPU *cpu = ARM_CPU(qemu_get_cpu(0));
+    return cpu_arm_get_oas(cpu);
+}
+
 static void create_fdt(VirtMachineState *vms)
 {
     MachineState *ms = MACHINE(vms);
@@ -1340,7 +1347,7 @@ static void create_pcie_irq_map(const MachineState *ms,
 }
 
 static void create_smmu(const VirtMachineState *vms,
-                        PCIBus *bus)
+                        PCIBus *bus, int32_t oas)
 {
     char *node;
     const char compat[] = "arm,smmu-v3";
@@ -1360,6 +1367,9 @@ static void create_smmu(const VirtMachineState *vms,
 
     object_property_set_link(OBJECT(dev), "primary-bus", OBJECT(bus),
                              &error_abort);
+
+    qdev_prop_set_uint64(dev, "oas", oas);
+
     sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);
     sysbus_mmio_map(SYS_BUS_DEVICE(dev), 0, base);
     for (i = 0; i < NUM_SMMU_IRQS; i++) {
@@ -1534,7 +1544,7 @@ static void create_pcie(VirtMachineState *vms)
 
         switch (vms->iommu) {
         case VIRT_IOMMU_SMMUV3:
-            create_smmu(vms, vms->bus);
+            create_smmu(vms, vms->bus, get_system_oas());
             qemu_fdt_setprop_cells(ms->fdt, nodename, "iommu-map",
                                    0x0, vms->iommu_phandle, 0x0, 0x10000);
             break;
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index a5b3d8f7da..14ef1a9d37 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -3408,4 +3408,6 @@ static inline target_ulong cpu_untagged_addr(CPUState *cs, target_ulong x)
 }
 #endif
 
+int32_t cpu_arm_get_oas(ARMCPU *cpu);
+
 #endif
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 985b1efe16..08da83c082 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -787,6 +787,11 @@ static const gchar *aarch64_gdb_arch_name(CPUState *cs)
     return "aarch64";
 }
 
+int32_t cpu_arm_get_oas(ARMCPU *cpu)
+{
+    return FIELD_EX64(cpu->isar.id_aa64mmfr0, ID_AA64MMFR0, PARANGE);
+}
+
 static void aarch64_cpu_class_init(ObjectClass *oc, void *data)
 {
     CPUClass *cc = CPU_CLASS(oc);
-- 
2.44.0.396.g6e790dbe36-goog




* Re: [RFC PATCH 07/12] hw/arm/smmu-common: Support nested translation
  2024-03-25 10:14 ` [RFC PATCH 07/12] hw/arm/smmu-common: Support nested translation Mostafa Saleh
@ 2024-03-25 14:20   ` Julien Grall
  2024-03-25 20:47     ` Mostafa Saleh
  0 siblings, 1 reply; 24+ messages in thread
From: Julien Grall @ 2024-03-25 14:20 UTC (permalink / raw)
  To: Mostafa Saleh, qemu-arm, eric.auger, peter.maydell, qemu-devel
  Cc: jean-philippe, alex.bennee, maz, nicolinc

Hi Mostafa,

On 25/03/2024 10:14, Mostafa Saleh wrote:
> @@ -524,7 +551,7 @@ static int smmu_ptw_64_s2(SMMUTransCfg *cfg,
>           tlbe->entry.translated_addr = gpa;
>           tlbe->entry.iova = ipa & ~mask;
>           tlbe->entry.addr_mask = mask;
> -        tlbe->entry.perm = s2ap;
> +        tlbe->parent_perm = tlbe->entry.perm = s2ap;
>           tlbe->level = level;
>           tlbe->granule = granule_sz;
>           return 0;
> @@ -537,6 +564,35 @@ error:
>       return -EINVAL;
>   }
>   
> +/* Combine 2 TLB entries and return the result in tlbe. */
> +static void combine_tlb(SMMUTLBEntry *tlbe, SMMUTLBEntry *tlbe_s2,
> +                        dma_addr_t iova, SMMUTransCfg *cfg)
> +{
> +        if (cfg->stage == SMMU_NESTED) {
> +
> +            /*
> +             * tg and level are used from stage-1, while the addr mask can be
With the current approach, I can't boot a guest if I create a dummy 
stage-1 using 512GB mapping and a stage-2 using 2MB mapping. It looks 
like this is because the level will be used during the TLB lookup.

I managed to solve the issue by using the max level of the two stages. I
think we may also need to take the minimum of the granules.


> +             * smaller in case stage-2 size (based on granule and level) was
> +             * smaller than stage-1.
> +             * That should have no impact on:
> +             * - lookup: as iova is properly aligned with the stage-1 level and
> +             *   granule.
> +             * - Invalidation: as it uses the entry mask.
> +             */
> +            tlbe->entry.addr_mask = MIN(tlbe->entry.addr_mask,
> +                                        tlbe_s2->entry.addr_mask);
> +            tlbe->entry.translated_addr = CACHED_ENTRY_TO_ADDR(tlbe_s2,
> +                                          tlbe->entry.translated_addr);
> +
> +            /* parent_perm has s2 perm while perm has s1 perm. */
> +            tlbe->parent_perm = tlbe_s2->entry.perm;
> +            return;
> +        }
> +
> +        /* That was not nested, use the s2. */
> +        memcpy(tlbe, tlbe_s2, sizeof(*tlbe));
> +}

Cheers,

-- 
Julien Grall



* Re: [RFC PATCH 00/12] SMMUv3 nested translation support
  2024-03-25 10:13 [RFC PATCH 00/12] SMMUv3 nested translation support Mostafa Saleh
                   ` (11 preceding siblings ...)
  2024-03-25 10:14 ` [RFC PATCH 12/12] hw/arm/virt: Set SMMU OAS based on CPU PARANGE Mostafa Saleh
@ 2024-03-25 17:50 ` Marcin Juszkiewicz
  2024-04-02 22:28 ` Nicolin Chen
  13 siblings, 0 replies; 24+ messages in thread
From: Marcin Juszkiewicz @ 2024-03-25 17:50 UTC (permalink / raw)
  To: Mostafa Saleh, qemu-arm, eric.auger, peter.maydell, qemu-devel
  Cc: jean-philippe, alex.bennee, maz, nicolinc, julien, Leif Lindholm

On 25/03/2024 11:13, Mostafa Saleh wrote:
> Currently, QEMU supports emulating either stage-1 or stage-2 SMMUs
> but not nested instances.
> This patch series adds support for nested translation in SMMUv3,
> this is controlled by property “arm-smmuv3.stage=nested”, and
> advertised to guests as (IDR0.S1P == 1 && IDR0.S2P == 2)

Out of pure curiosity I applied the series, enabled the 'nested' mode in
sbsa-ref and ran the (S)BSA ACS tests.

Two more tests passed: the ones which check whether the SMMU supports
both stage 1 and stage 2 at the same time.

The fun part? Those tests only check SMMU registers.



* Re: [RFC PATCH 07/12] hw/arm/smmu-common: Support nested translation
  2024-03-25 14:20   ` Julien Grall
@ 2024-03-25 20:47     ` Mostafa Saleh
  0 siblings, 0 replies; 24+ messages in thread
From: Mostafa Saleh @ 2024-03-25 20:47 UTC (permalink / raw)
  To: Julien Grall
  Cc: qemu-arm, eric.auger, peter.maydell, qemu-devel, jean-philippe,
	alex.bennee, maz, nicolinc

Hi Julien,

On Mon, Mar 25, 2024 at 02:20:07PM +0000, Julien Grall wrote:
> Hi Mostafa,
> 
> On 25/03/2024 10:14, Mostafa Saleh wrote:
> > @@ -524,7 +551,7 @@ static int smmu_ptw_64_s2(SMMUTransCfg *cfg,
> >           tlbe->entry.translated_addr = gpa;
> >           tlbe->entry.iova = ipa & ~mask;
> >           tlbe->entry.addr_mask = mask;
> > -        tlbe->entry.perm = s2ap;
> > +        tlbe->parent_perm = tlbe->entry.perm = s2ap;
> >           tlbe->level = level;
> >           tlbe->granule = granule_sz;
> >           return 0;
> > @@ -537,6 +564,35 @@ error:
> >       return -EINVAL;
> >   }
> > +/* Combine 2 TLB entries and return the result in tlbe. */
> > +static void combine_tlb(SMMUTLBEntry *tlbe, SMMUTLBEntry *tlbe_s2,
> > +                        dma_addr_t iova, SMMUTransCfg *cfg)
> > +{
> > +        if (cfg->stage == SMMU_NESTED) {
> > +
> > +            /*
> > +             * tg and level are used from stage-1, while the addr mask can be
> With the current approach, I can't boot a guest if I create a dummy stage-1
> using 512GB mapping and a stage-2 using 2MB mapping. It looks like this is
> because the level will be used during the TLB lookup.

Agh, I guess that case isn't common with Linux.

I was able to reproduce it with a hacked Linux driver; the issue
happens in smmu_iotlb_lookup() because it assumes the cached entry has
a mask matching its level and granularity, which is not correct with
nesting. I missed that, and fixing the mask alone is not enough here.
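
(Concretely: the combined entry keeps the shallow stage-1 level while its
addr_mask is shrunk to the 2MB stage-2 leaf, so the entry only describes
2MB of the range yet is keyed as if it covered the whole 512GB stage-1
block.)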

Looking at the mask of the found entry is not good either: if there is a
disparity between the stage-1 and stage-2 levels, we always miss in the
TLB, even for the same address.

> 
> I managed to solve the issue by using the max level of the two stages. I
> think we may also need to take the minimum of the granules.
> 

Just fixing the granularity and level will still miss in the TLB if they
are different, as the granularity is used in the lookup. I guess one way
is to fall back to the stage-2 granularity in the lookup if the stage-1
lookup fails; I will have another look and see if there is a better
solution for v2.

But for now, as you mentioned (we also need to update the IOVA to match
the mask), this should at least work:

diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index ef5edfe4dc..ac2dc3efeb 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -572,21 +572,13 @@ static void combine_tlb(SMMUTLBEntry *tlbe, SMMUTLBEntry *tlbe_s2,
                         dma_addr_t iova, SMMUTransCfg *cfg)
 {
         if (cfg->stage == SMMU_NESTED) {
-
-            /*
-             * tg and level are used from stage-1, while the addr mask can be
-             * smaller in case stage-2 size(based on granule and level) was
-             * smaller than stage-1.
-             * That should have no impact on:
-             * - lookup: as iova is properly aligned with the stage-1 level and
-             *   granule.
-             * - Invalidation: as it uses the entry mask.
-             */
             tlbe->entry.addr_mask = MIN(tlbe->entry.addr_mask,
                                         tlbe_s2->entry.addr_mask);
             tlbe->entry.translated_addr = CACHED_ENTRY_TO_ADDR(tlbe_s2,
                                           tlbe->entry.translated_addr);
-
+            tlbe->granule = MIN(tlbe->granule, tlbe_s2->granule);
+            tlbe->level = MAX(tlbe->level, tlbe_s2->level);
+            tlbe->entry.iova = iova & ~tlbe->entry.addr_mask;
             /* parent_perm has s2 perm while perm has s1 perm. */
             tlbe->parent_perm = tlbe_s2->entry.perm;

> 
> > +             * smaller in case stage-2 size(based on granule and level) was
> > +             * smaller than stage-1.
> > +             * That should have no impact on:
> > +             * - lookup: as iova is properly aligned with the stage-1 level and
> > +             *   granule.
> > +             * - Invalidation: as it uses the entry mask.
> > +             */
> > +            tlbe->entry.addr_mask = MIN(tlbe->entry.addr_mask,
> > +                                        tlbe_s2->entry.addr_mask);
> > +            tlbe->entry.translated_addr = CACHED_ENTRY_TO_ADDR(tlbe_s2,
> > +                                          tlbe->entry.translated_addr);
> > +
> > +            /* parent_perm has s2 perm while perm has s1 perm. */
> > +            tlbe->parent_perm = tlbe_s2->entry.perm;
> > +            return;
> > +        }
> > +
> > +        /* That was not nested, use the s2. */
> > +        memcpy(tlbe, tlbe_s2, sizeof(*tlbe));
> > +}
> 
> Cheers,
> 
> -- 
> Julien Grall

Thanks,
Mostafa




* Re: [RFC PATCH 02/12] hw/arm/smmu: Split smmuv3_translate()
  2024-03-25 10:13 ` [RFC PATCH 02/12] hw/arm/smmu: Split smmuv3_translate() Mostafa Saleh
@ 2024-04-02 16:32   ` Eric Auger
  0 siblings, 0 replies; 24+ messages in thread
From: Eric Auger @ 2024-04-02 16:32 UTC (permalink / raw)
  To: Mostafa Saleh, qemu-arm, peter.maydell, qemu-devel
  Cc: jean-philippe, alex.bennee, maz, nicolinc, julien

Hi Mostafa,

On 3/25/24 11:13, Mostafa Saleh wrote:
> smmuv3_translate() does everything from STE/CD parsing to TLB lookup
> and PTW.
>
> Soon, when nesting is supported, stage-1 data (tt, CD) needs to be
> translated using stage-2.
>
> Split smmuv3_translate() to 3 functions:
>
> - smmu_translate(): in smmu-common.c, which does the TLB lookup, PTW,
>   TLB insertion, all the functions are already there, this just puts
>   them together.
>   This also simplifies the code as it consolidates event generation
>   in case of TLB lookup permission failure or in TT selection.
>
> - smmuv3_do_translate(): in smmuv3.c, Calls smmu_translate() and does
>   the event population in case of errors.
>
>  - smmuv3_translate(), now calls smmuv3_do_translate() for
>    translation while the rest is the same.
>
> Also, add stage in trace_smmuv3_translate_success()
>
> Signed-off-by: Mostafa Saleh <smostafa@google.com>
> ---
>  hw/arm/smmu-common.c         |  59 ++++++++++++
>  hw/arm/smmuv3.c              | 175 +++++++++++++----------------------
>  hw/arm/trace-events          |   2 +-
>  include/hw/arm/smmu-common.h |   5 +
>  4 files changed, 130 insertions(+), 111 deletions(-)
>
> diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
> index 3a7c350aca..20630eb670 100644
> --- a/hw/arm/smmu-common.c
> +++ b/hw/arm/smmu-common.c
> @@ -554,6 +554,65 @@ int smmu_ptw(SMMUTransCfg *cfg, dma_addr_t iova, IOMMUAccessFlags perm,
>      g_assert_not_reached();
>  }
>  
> +SMMUTLBEntry *smmu_translate(SMMUState *bs, SMMUTransCfg *cfg, dma_addr_t addr,
> +                             IOMMUAccessFlags flag, SMMUPTWEventInfo *info)
> +{
> +    uint64_t page_mask, aligned_addr;
> +    SMMUTLBEntry *cached_entry = NULL;
> +    SMMUTransTableInfo *tt;
> +    int status;
> +
> +    /*
> +     * Combined attributes used for TLB lookup, as only one stage is supported,
> +     * it will hold attributes based on the enabled stage.
> +     */
> +    SMMUTransTableInfo tt_combined;
> +
> +    if (cfg->stage == SMMU_STAGE_1) {
> +        /* Select stage1 translation table. */
> +        tt = select_tt(cfg, addr);
> +        if (!tt) {
> +            info->type = SMMU_PTW_ERR_TRANSLATION;
> +            info->stage = SMMU_STAGE_1;
> +            return NULL;
> +        }
> +        tt_combined.granule_sz = tt->granule_sz;
> +        tt_combined.tsz = tt->tsz;
> +
> +    } else {
> +        /* Stage2. */
> +        tt_combined.granule_sz = cfg->s2cfg.granule_sz;
> +        tt_combined.tsz = cfg->s2cfg.tsz;
> +    }
> +
> +    /*
> +     * TLB lookup looks for granule and input size for a translation stage,
> +     * as only one stage is supported right now, choose the right values
> +     * from the configuration.
> +     */
> +    page_mask = (1ULL << tt_combined.granule_sz) - 1;
> +    aligned_addr = addr & ~page_mask;
> +
> +    cached_entry = smmu_iotlb_lookup(bs, cfg, &tt_combined, aligned_addr);
> +    if (cached_entry) {
> +        if ((flag & IOMMU_WO) && !(cached_entry->entry.perm & IOMMU_WO)) {
> +            info->type = SMMU_PTW_ERR_PERMISSION;
> +            info->stage = cfg->stage;
> +            return NULL;
> +        }
> +        return cached_entry;
> +    }
> +
> +    cached_entry = g_new0(SMMUTLBEntry, 1);
> +    status = smmu_ptw(cfg, aligned_addr, flag, cached_entry, info);
> +    if (status) {
> +            g_free(cached_entry);
> +            return NULL;
> +    }
> +    smmu_iotlb_insert(bs, cfg, cached_entry);
> +    return cached_entry;
> +}
> +
>  /**
>   * The bus number is used for lookup when SID based invalidation occurs.
>   * In that case we lazily populate the SMMUPciBus array from the bus hash
> diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
> index 50e5a72d54..f081ff0cc4 100644
> --- a/hw/arm/smmuv3.c
> +++ b/hw/arm/smmuv3.c
> @@ -827,6 +827,67 @@ static void smmuv3_flush_config(SMMUDevice *sdev)
>      g_hash_table_remove(bc->configs, sdev);
>  }
>  
> +/* Do translation with TLB lookup. */
> +static SMMUTranslationStatus smmuv3_do_translate(SMMUv3State *s, hwaddr addr,
> +                                                 SMMUTransCfg *cfg,
> +                                                 SMMUEventInfo *event,
> +                                                 IOMMUAccessFlags flag,
> +                                                 SMMUTLBEntry **out_entry)
> +{
> +    SMMUPTWEventInfo ptw_info = {};
> +    SMMUState *bs = ARM_SMMU(s);
> +    SMMUTLBEntry *cached_entry = NULL;
> +
> +    cached_entry = smmu_translate(bs, cfg, addr, flag, &ptw_info);
> +    if (!cached_entry) {
> +        /* All faults from PTW has S2 field. */
> +        event->u.f_walk_eabt.s2 = (ptw_info.stage == SMMU_STAGE_2);
> +        switch (ptw_info.type) {
> +        case SMMU_PTW_ERR_WALK_EABT:
> +            event->type = SMMU_EVT_F_WALK_EABT;
> +            event->u.f_walk_eabt.addr = addr;
> +            event->u.f_walk_eabt.rnw = flag & 0x1;
> +            event->u.f_walk_eabt.class = 0x1;
> +            event->u.f_walk_eabt.addr2 = ptw_info.addr;
> +            break;
> +        case SMMU_PTW_ERR_TRANSLATION:
> +            if (PTW_RECORD_FAULT(cfg)) {
> +                event->type = SMMU_EVT_F_TRANSLATION;
> +                event->u.f_translation.addr = addr;
> +                event->u.f_translation.rnw = flag & 0x1;
> +            }
> +            break;
> +        case SMMU_PTW_ERR_ADDR_SIZE:
> +            if (PTW_RECORD_FAULT(cfg)) {
> +                event->type = SMMU_EVT_F_ADDR_SIZE;
> +                event->u.f_addr_size.addr = addr;
> +                event->u.f_addr_size.rnw = flag & 0x1;
> +            }
> +            break;
> +        case SMMU_PTW_ERR_ACCESS:
> +            if (PTW_RECORD_FAULT(cfg)) {
> +                event->type = SMMU_EVT_F_ACCESS;
> +                event->u.f_access.addr = addr;
> +                event->u.f_access.rnw = flag & 0x1;
> +            }
> +            break;
> +        case SMMU_PTW_ERR_PERMISSION:
> +            if (PTW_RECORD_FAULT(cfg)) {
> +                event->type = SMMU_EVT_F_PERMISSION;
> +                event->u.f_permission.addr = addr;
> +                event->u.f_permission.rnw = flag & 0x1;
> +            }
> +            break;
> +        default:
> +            g_assert_not_reached();
> +        }
> +        return SMMU_TRANS_ERROR;
> +    }
> +    *out_entry = cached_entry;
> +    return SMMU_TRANS_SUCCESS;
> +}
> +
> +/* Entry point to SMMU, does everything. */
>  static IOMMUTLBEntry smmuv3_translate(IOMMUMemoryRegion *mr, hwaddr addr,
>                                        IOMMUAccessFlags flag, int iommu_idx)
>  {
> @@ -836,12 +897,7 @@ static IOMMUTLBEntry smmuv3_translate(IOMMUMemoryRegion *mr, hwaddr addr,
>      SMMUEventInfo event = {.type = SMMU_EVT_NONE,
>                             .sid = sid,
>                             .inval_ste_allowed = false};
> -    SMMUPTWEventInfo ptw_info = {};
>      SMMUTranslationStatus status;
> -    SMMUState *bs = ARM_SMMU(s);
> -    uint64_t page_mask, aligned_addr;
> -    SMMUTLBEntry *cached_entry = NULL;
> -    SMMUTransTableInfo *tt;
>      SMMUTransCfg *cfg = NULL;
>      IOMMUTLBEntry entry = {
>          .target_as = &address_space_memory,
> @@ -850,11 +906,7 @@ static IOMMUTLBEntry smmuv3_translate(IOMMUMemoryRegion *mr, hwaddr addr,
>          .addr_mask = ~(hwaddr)0,
>          .perm = IOMMU_NONE,
>      };
> -    /*
> -     * Combined attributes used for TLB lookup, as only one stage is supported,
> -     * it will hold attributes based on the enabled stage.
> -     */
> -    SMMUTransTableInfo tt_combined;
> +    SMMUTLBEntry *cached_entry = NULL;
>  
>      qemu_mutex_lock(&s->mutex);
>  
> @@ -883,105 +935,7 @@ static IOMMUTLBEntry smmuv3_translate(IOMMUMemoryRegion *mr, hwaddr addr,
>          goto epilogue;
>      }
>  
> -    if (cfg->stage == SMMU_STAGE_1) {
> -        /* Select stage1 translation table. */
> -        tt = select_tt(cfg, addr);
> -        if (!tt) {
> -            if (cfg->record_faults) {
> -                event.type = SMMU_EVT_F_TRANSLATION;
> -                event.u.f_translation.addr = addr;
> -                event.u.f_translation.rnw = flag & 0x1;
> -            }
> -            status = SMMU_TRANS_ERROR;
> -            goto epilogue;
> -        }
> -        tt_combined.granule_sz = tt->granule_sz;
> -        tt_combined.tsz = tt->tsz;
> -
> -    } else {
> -        /* Stage2. */
> -        tt_combined.granule_sz = cfg->s2cfg.granule_sz;
> -        tt_combined.tsz = cfg->s2cfg.tsz;
> -    }
> -    /*
> -     * TLB lookup looks for granule and input size for a translation stage,
> -     * as only one stage is supported right now, choose the right values
> -     * from the configuration.
> -     */
> -    page_mask = (1ULL << tt_combined.granule_sz) - 1;
> -    aligned_addr = addr & ~page_mask;
> -
> -    cached_entry = smmu_iotlb_lookup(bs, cfg, &tt_combined, aligned_addr);
> -    if (cached_entry) {
> -        if ((flag & IOMMU_WO) && !(cached_entry->entry.perm & IOMMU_WO)) {
> -            status = SMMU_TRANS_ERROR;
> -            /*
> -             * We know that the TLB only contains either stage-1 or stage-2 as
> -             * nesting is not supported. So it is sufficient to check the
> -             * translation stage to know the TLB stage for now.
> -             */
> -            event.u.f_walk_eabt.s2 = (cfg->stage == SMMU_STAGE_2);
> -            if (PTW_RECORD_FAULT(cfg)) {
> -                event.type = SMMU_EVT_F_PERMISSION;
> -                event.u.f_permission.addr = addr;
> -                event.u.f_permission.rnw = flag & 0x1;
> -            }
> -        } else {
> -            status = SMMU_TRANS_SUCCESS;
> -        }
> -        goto epilogue;
> -    }
> -
> -    cached_entry = g_new0(SMMUTLBEntry, 1);
> -
> -    if (smmu_ptw(cfg, aligned_addr, flag, cached_entry, &ptw_info)) {
> -        /* All faults from PTW has S2 field. */
> -        event.u.f_walk_eabt.s2 = (ptw_info.stage == SMMU_STAGE_2);
> -        g_free(cached_entry);
> -        switch (ptw_info.type) {
> -        case SMMU_PTW_ERR_WALK_EABT:
> -            event.type = SMMU_EVT_F_WALK_EABT;
> -            event.u.f_walk_eabt.addr = addr;
> -            event.u.f_walk_eabt.rnw = flag & 0x1;
> -            event.u.f_walk_eabt.class = 0x1;
> -            event.u.f_walk_eabt.addr2 = ptw_info.addr;
> -            break;
> -        case SMMU_PTW_ERR_TRANSLATION:
> -            if (PTW_RECORD_FAULT(cfg)) {
> -                event.type = SMMU_EVT_F_TRANSLATION;
> -                event.u.f_translation.addr = addr;
> -                event.u.f_translation.rnw = flag & 0x1;
> -            }
> -            break;
> -        case SMMU_PTW_ERR_ADDR_SIZE:
> -            if (PTW_RECORD_FAULT(cfg)) {
> -                event.type = SMMU_EVT_F_ADDR_SIZE;
> -                event.u.f_addr_size.addr = addr;
> -                event.u.f_addr_size.rnw = flag & 0x1;
> -            }
> -            break;
> -        case SMMU_PTW_ERR_ACCESS:
> -            if (PTW_RECORD_FAULT(cfg)) {
> -                event.type = SMMU_EVT_F_ACCESS;
> -                event.u.f_access.addr = addr;
> -                event.u.f_access.rnw = flag & 0x1;
> -            }
> -            break;
> -        case SMMU_PTW_ERR_PERMISSION:
> -            if (PTW_RECORD_FAULT(cfg)) {
> -                event.type = SMMU_EVT_F_PERMISSION;
> -                event.u.f_permission.addr = addr;
> -                event.u.f_permission.rnw = flag & 0x1;
> -            }
> -            break;
> -        default:
> -            g_assert_not_reached();
> -        }
> -        status = SMMU_TRANS_ERROR;
> -    } else {
> -        smmu_iotlb_insert(bs, cfg, cached_entry);
> -        status = SMMU_TRANS_SUCCESS;
> -    }
> +    status = smmuv3_do_translate(s, addr, cfg, &event, flag, &cached_entry);
>  
>  epilogue:
>      qemu_mutex_unlock(&s->mutex);
> @@ -992,7 +946,8 @@ epilogue:
>                                      (addr & cached_entry->entry.addr_mask);
>          entry.addr_mask = cached_entry->entry.addr_mask;
>          trace_smmuv3_translate_success(mr->parent_obj.name, sid, addr,
> -                                       entry.translated_addr, entry.perm);
> +                                       entry.translated_addr, entry.perm,
> +                                       cfg->stage);
>          break;
>      case SMMU_TRANS_DISABLE:
>          entry.perm = flag;
> diff --git a/hw/arm/trace-events b/hw/arm/trace-events
> index f1a54a02df..cc12924a84 100644
> --- a/hw/arm/trace-events
> +++ b/hw/arm/trace-events
> @@ -37,7 +37,7 @@ smmuv3_get_ste(uint64_t addr) "STE addr: 0x%"PRIx64
>  smmuv3_translate_disable(const char *n, uint16_t sid, uint64_t addr, bool is_write) "%s sid=0x%x bypass (smmu disabled) iova:0x%"PRIx64" is_write=%d"
>  smmuv3_translate_bypass(const char *n, uint16_t sid, uint64_t addr, bool is_write) "%s sid=0x%x STE bypass iova:0x%"PRIx64" is_write=%d"
>  smmuv3_translate_abort(const char *n, uint16_t sid, uint64_t addr, bool is_write) "%s sid=0x%x abort on iova:0x%"PRIx64" is_write=%d"
> -smmuv3_translate_success(const char *n, uint16_t sid, uint64_t iova, uint64_t translated, int perm) "%s sid=0x%x iova=0x%"PRIx64" translated=0x%"PRIx64" perm=0x%x"
> +smmuv3_translate_success(const char *n, uint16_t sid, uint64_t iova, uint64_t translated, int perm, int stage) "%s sid=0x%x iova=0x%"PRIx64" translated=0x%"PRIx64" perm=0x%x stage=%d"
>  smmuv3_get_cd(uint64_t addr) "CD addr: 0x%"PRIx64
>  smmuv3_decode_cd(uint32_t oas) "oas=%d"
>  smmuv3_decode_cd_tt(int i, uint32_t tsz, uint64_t ttb, uint32_t granule_sz, bool had) "TT[%d]:tsz:%d ttb:0x%"PRIx64" granule_sz:%d had:%d"
> diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
> index b3c881f0ee..876e78975c 100644
> --- a/include/hw/arm/smmu-common.h
> +++ b/include/hw/arm/smmu-common.h
> @@ -183,6 +183,11 @@ static inline uint16_t smmu_get_sid(SMMUDevice *sdev)
>  int smmu_ptw(SMMUTransCfg *cfg, dma_addr_t iova, IOMMUAccessFlags perm,
>               SMMUTLBEntry *tlbe, SMMUPTWEventInfo *info);
>  
> +
> +/* smmu_translate - Look for a translation in TLB, if not, do a PTW. */
I would add a comment saying it returns NULL in case of translation
error. Indeed, at first sight I thought that in case of a hit with a
permission error it would still return the TLB entry.
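Something along these lines maybe (just a sketch, not the actual
comment):

    /*
     * smmu_translate - Look for a translation in TLB, if not, do a PTW.
     *
     * Returns NULL on translation error, with *info filled with the
     * fault type and stage; otherwise returns the cached or newly
     * walked SMMUTLBEntry.
     */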

Otherwise it looks good to me.

Reviewed-by: Eric Auger <eric.auger@redhat.com>

Eric
> +SMMUTLBEntry *smmu_translate(SMMUState *bs, SMMUTransCfg *cfg, dma_addr_t addr,
> +                             IOMMUAccessFlags flag, SMMUPTWEventInfo *info);
> +
>  /**
>   * select_tt - compute which translation table shall be used according to
>   * the input iova and translation config and return the TT specific info




* Re: [RFC PATCH 01/12] hw/arm/smmu: Use enum for SMMU stage
  2024-03-25 10:13 ` [RFC PATCH 01/12] hw/arm/smmu: Use enum for SMMU stage Mostafa Saleh
@ 2024-04-02 16:32   ` Eric Auger
  0 siblings, 0 replies; 24+ messages in thread
From: Eric Auger @ 2024-04-02 16:32 UTC (permalink / raw)
  To: Mostafa Saleh, qemu-arm, peter.maydell, qemu-devel
  Cc: jean-philippe, alex.bennee, maz, nicolinc, julien

Hi Mostafa,

On 3/25/24 11:13, Mostafa Saleh wrote:
> Currently, the translation stage is represented as an int, where 1 is stage-1
> and 2 is stage-2. When nesting is added, using 3 to represent it would be
> confusing, so use an enum instead.
>
> While keeping the same values, this is useful for:
>  - Doing tricks with bit masks, where BIT(0) is stage-1, BIT(1) is
>    stage-2, and both bits set means nested.
>  - Tracing, as the stage is printed as an int.
>
> Signed-off-by: Mostafa Saleh <smostafa@google.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>

Eric
> ---
>  hw/arm/smmu-common.c         | 14 +++++++-------
>  hw/arm/smmuv3.c              | 15 ++++++++-------
>  include/hw/arm/smmu-common.h | 11 +++++++++--
>  3 files changed, 24 insertions(+), 16 deletions(-)
>
> diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
> index 4caedb4998..3a7c350aca 100644
> --- a/hw/arm/smmu-common.c
> +++ b/hw/arm/smmu-common.c
> @@ -304,7 +304,7 @@ static int smmu_ptw_64_s1(SMMUTransCfg *cfg,
>                            SMMUTLBEntry *tlbe, SMMUPTWEventInfo *info)
>  {
>      dma_addr_t baseaddr, indexmask;
> -    int stage = cfg->stage;
> +    SMMUStage stage = cfg->stage;
>      SMMUTransTableInfo *tt = select_tt(cfg, iova);
>      uint8_t level, granule_sz, inputsize, stride;
>  
> @@ -392,7 +392,7 @@ static int smmu_ptw_64_s1(SMMUTransCfg *cfg,
>      info->type = SMMU_PTW_ERR_TRANSLATION;
>  
>  error:
> -    info->stage = 1;
> +    info->stage = SMMU_STAGE_1;
>      tlbe->entry.perm = IOMMU_NONE;
>      return -EINVAL;
>  }
> @@ -415,7 +415,7 @@ static int smmu_ptw_64_s2(SMMUTransCfg *cfg,
>                            dma_addr_t ipa, IOMMUAccessFlags perm,
>                            SMMUTLBEntry *tlbe, SMMUPTWEventInfo *info)
>  {
> -    const int stage = 2;
> +    const SMMUStage stage = SMMU_STAGE_2;
>      int granule_sz = cfg->s2cfg.granule_sz;
>      /* ARM DDI0487I.a: Table D8-7. */
>      int inputsize = 64 - cfg->s2cfg.tsz;
> @@ -513,7 +513,7 @@ static int smmu_ptw_64_s2(SMMUTransCfg *cfg,
>      info->type = SMMU_PTW_ERR_TRANSLATION;
>  
>  error:
> -    info->stage = 2;
> +    info->stage = SMMU_STAGE_2;
>      tlbe->entry.perm = IOMMU_NONE;
>      return -EINVAL;
>  }
> @@ -532,9 +532,9 @@ error:
>  int smmu_ptw(SMMUTransCfg *cfg, dma_addr_t iova, IOMMUAccessFlags perm,
>               SMMUTLBEntry *tlbe, SMMUPTWEventInfo *info)
>  {
> -    if (cfg->stage == 1) {
> +    if (cfg->stage == SMMU_STAGE_1) {
>          return smmu_ptw_64_s1(cfg, iova, perm, tlbe, info);
> -    } else if (cfg->stage == 2) {
> +    } else if (cfg->stage == SMMU_STAGE_2) {
>          /*
>           * If bypassing stage 1(or unimplemented), the input address is passed
>           * directly to stage 2 as IPA. If the input address of a transaction
> @@ -543,7 +543,7 @@ int smmu_ptw(SMMUTransCfg *cfg, dma_addr_t iova, IOMMUAccessFlags perm,
>           */
>          if (iova >= (1ULL << cfg->oas)) {
>              info->type = SMMU_PTW_ERR_ADDR_SIZE;
> -            info->stage = 1;
> +            info->stage = SMMU_STAGE_1;
>              tlbe->entry.perm = IOMMU_NONE;
>              return -EINVAL;
>          }
> diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
> index 9eb56a70f3..50e5a72d54 100644
> --- a/hw/arm/smmuv3.c
> +++ b/hw/arm/smmuv3.c
> @@ -34,7 +34,8 @@
>  #include "smmuv3-internal.h"
>  #include "smmu-internal.h"
>  
> -#define PTW_RECORD_FAULT(cfg)   (((cfg)->stage == 1) ? (cfg)->record_faults : \
> +#define PTW_RECORD_FAULT(cfg)   (((cfg)->stage == SMMU_STAGE_1) ? \
> +                                 (cfg)->record_faults : \
>                                   (cfg)->s2cfg.record_faults)
>  
>  /**
> @@ -402,7 +403,7 @@ static bool s2_pgtable_config_valid(uint8_t sl0, uint8_t t0sz, uint8_t gran)
>  
>  static int decode_ste_s2_cfg(SMMUTransCfg *cfg, STE *ste)
>  {
> -    cfg->stage = 2;
> +    cfg->stage = SMMU_STAGE_2;
>  
>      if (STE_S2AA64(ste) == 0x0) {
>          qemu_log_mask(LOG_UNIMP,
> @@ -678,7 +679,7 @@ static int decode_cd(SMMUTransCfg *cfg, CD *cd, SMMUEventInfo *event)
>  
>      /* we support only those at the moment */
>      cfg->aa64 = true;
> -    cfg->stage = 1;
> +    cfg->stage = SMMU_STAGE_1;
>  
>      cfg->oas = oas2bits(CD_IPS(cd));
>      cfg->oas = MIN(oas2bits(SMMU_IDR5_OAS), cfg->oas);
> @@ -762,7 +763,7 @@ static int smmuv3_decode_config(IOMMUMemoryRegion *mr, SMMUTransCfg *cfg,
>          return ret;
>      }
>  
> -    if (cfg->aborted || cfg->bypassed || (cfg->stage == 2)) {
> +    if (cfg->aborted || cfg->bypassed || (cfg->stage == SMMU_STAGE_2)) {
>          return 0;
>      }
>  
> @@ -882,7 +883,7 @@ static IOMMUTLBEntry smmuv3_translate(IOMMUMemoryRegion *mr, hwaddr addr,
>          goto epilogue;
>      }
>  
> -    if (cfg->stage == 1) {
> +    if (cfg->stage == SMMU_STAGE_1) {
>          /* Select stage1 translation table. */
>          tt = select_tt(cfg, addr);
>          if (!tt) {
> @@ -919,7 +920,7 @@ static IOMMUTLBEntry smmuv3_translate(IOMMUMemoryRegion *mr, hwaddr addr,
>               * nesting is not supported. So it is sufficient to check the
>               * translation stage to know the TLB stage for now.
>               */
> -            event.u.f_walk_eabt.s2 = (cfg->stage == 2);
> +            event.u.f_walk_eabt.s2 = (cfg->stage == SMMU_STAGE_2);
>              if (PTW_RECORD_FAULT(cfg)) {
>                  event.type = SMMU_EVT_F_PERMISSION;
>                  event.u.f_permission.addr = addr;
> @@ -935,7 +936,7 @@ static IOMMUTLBEntry smmuv3_translate(IOMMUMemoryRegion *mr, hwaddr addr,
>  
>      if (smmu_ptw(cfg, aligned_addr, flag, cached_entry, &ptw_info)) {
>          /* All faults from PTW has S2 field. */
> -        event.u.f_walk_eabt.s2 = (ptw_info.stage == 2);
> +        event.u.f_walk_eabt.s2 = (ptw_info.stage == SMMU_STAGE_2);
>          g_free(cached_entry);
>          switch (ptw_info.type) {
>          case SMMU_PTW_ERR_WALK_EABT:
> diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
> index 5ec2e6c1a4..b3c881f0ee 100644
> --- a/include/hw/arm/smmu-common.h
> +++ b/include/hw/arm/smmu-common.h
> @@ -49,8 +49,15 @@ typedef enum {
>      SMMU_PTW_ERR_PERMISSION,  /* Permission fault */
>  } SMMUPTWEventType;
>  
> +/* SMMU Stage */
> +typedef enum {
> +    SMMU_STAGE_1 = 1,
> +    SMMU_STAGE_2,
> +    SMMU_NESTED,
> +} SMMUStage;
> +
>  typedef struct SMMUPTWEventInfo {
> -    int stage;
> +    SMMUStage stage;
>      SMMUPTWEventType type;
>      dma_addr_t addr; /* fetched address that induced an abort, if any */
>  } SMMUPTWEventInfo;
> @@ -88,7 +95,7 @@ typedef struct SMMUS2Cfg {
>   */
>  typedef struct SMMUTransCfg {
>      /* Shared fields between stage-1 and stage-2. */
> -    int stage;                 /* translation stage */
> +    SMMUStage stage;           /* translation stage */
>      bool disabled;             /* smmu is disabled */
>      bool bypassed;             /* translation is bypassed */
>      bool aborted;              /* translation is aborted */




* Re: [RFC PATCH 03/12] hw/arm/smmu: Add stage to TLB
  2024-03-25 10:13 ` [RFC PATCH 03/12] hw/arm/smmu: Add stage to TLB Mostafa Saleh
@ 2024-04-02 17:15   ` Eric Auger
  2024-04-02 18:47     ` Mostafa Saleh
  0 siblings, 1 reply; 24+ messages in thread
From: Eric Auger @ 2024-04-02 17:15 UTC (permalink / raw)
  To: Mostafa Saleh, qemu-arm, peter.maydell, qemu-devel
  Cc: jean-philippe, alex.bennee, maz, nicolinc, julien

Hi Mostafa,

On 3/25/24 11:13, Mostafa Saleh wrote:
> TLBs for nesting will be extended to be combined; a new index, "stage",
> is added with 2 valid values:
>  - SMMU_STAGE_1: Meaning this translates VA to PADDR, this entry can
>    be cached from fully nested configuration or from stage-1 only.
>    We don't support separate cached entries (VA to IPA)
>
>  - SMMU_STAGE_2: Meaning this translates IPA to PADDR, cached from
>    stage-2 only configuration.
>
> For TLB invalidation:
>  - by VA: Invalidate TLBs tagged with SMMU_STAGE_1
>  - by IPA: Invalidate TLBs tagged with SMMU_STAGE_2
>  - All: Will invalidate both, this is communicated to the TLB as
>    SMMU_NESTED which is (SMMU_STAGE_1 | SMMU_STAGE_2) which uses
>    it as a mask.

I don't really get why you need this extra stage field in the key. Why
aren't the asid and vmid tags enough?

Eric
>
> This is briefly described in the user manual (ARM IHI 0070 F.b) in
> "16.2.1 Caching combined structures".
>
> Signed-off-by: Mostafa Saleh <smostafa@google.com>
> ---
>  hw/arm/smmu-common.c         | 27 +++++++++++++++++----------
>  hw/arm/smmu-internal.h       |  2 ++
>  hw/arm/smmuv3.c              |  5 +++--
>  hw/arm/trace-events          |  3 ++-
>  include/hw/arm/smmu-common.h |  8 ++++++--
>  5 files changed, 30 insertions(+), 15 deletions(-)
>
> diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
> index 20630eb670..677dcf9a13 100644
> --- a/hw/arm/smmu-common.c
> +++ b/hw/arm/smmu-common.c
> @@ -38,7 +38,7 @@ static guint smmu_iotlb_key_hash(gconstpointer v)
>  
>      /* Jenkins hash */
>      a = b = c = JHASH_INITVAL + sizeof(*key);
> -    a += key->asid + key->vmid + key->level + key->tg;
> +    a += key->asid + key->vmid + key->level + key->tg + key->stage;
>      b += extract64(key->iova, 0, 32);
>      c += extract64(key->iova, 32, 32);
>  
> @@ -54,14 +54,14 @@ static gboolean smmu_iotlb_key_equal(gconstpointer v1, gconstpointer v2)
>  
>      return (k1->asid == k2->asid) && (k1->iova == k2->iova) &&
>             (k1->level == k2->level) && (k1->tg == k2->tg) &&
> -           (k1->vmid == k2->vmid);
> +           (k1->vmid == k2->vmid) && (k1->stage == k2->stage);
>  }
>  
>  SMMUIOTLBKey smmu_get_iotlb_key(uint16_t asid, uint16_t vmid, uint64_t iova,
> -                                uint8_t tg, uint8_t level)
> +                                uint8_t tg, uint8_t level, SMMUStage stage)
>  {
>      SMMUIOTLBKey key = {.asid = asid, .vmid = vmid, .iova = iova,
> -                        .tg = tg, .level = level};
> +                        .tg = tg, .level = level, .stage = stage};
>  
>      return key;
>  }
> @@ -81,7 +81,8 @@ SMMUTLBEntry *smmu_iotlb_lookup(SMMUState *bs, SMMUTransCfg *cfg,
>          SMMUIOTLBKey key;
>  
>          key = smmu_get_iotlb_key(cfg->asid, cfg->s2cfg.vmid,
> -                                 iova & ~mask, tg, level);
> +                                 iova & ~mask, tg, level,
> +                                 SMMU_STAGE_TO_TLB_TAG(cfg->stage));
>          entry = g_hash_table_lookup(bs->iotlb, &key);
>          if (entry) {
>              break;
> @@ -109,15 +110,16 @@ void smmu_iotlb_insert(SMMUState *bs, SMMUTransCfg *cfg, SMMUTLBEntry *new)
>  {
>      SMMUIOTLBKey *key = g_new0(SMMUIOTLBKey, 1);
>      uint8_t tg = (new->granule - 10) / 2;
> +    SMMUStage stage_tag = SMMU_STAGE_TO_TLB_TAG(cfg->stage);
>  
>      if (g_hash_table_size(bs->iotlb) >= SMMU_IOTLB_MAX_SIZE) {
>          smmu_iotlb_inv_all(bs);
>      }
>  
>      *key = smmu_get_iotlb_key(cfg->asid, cfg->s2cfg.vmid, new->entry.iova,
> -                              tg, new->level);
> +                              tg, new->level, stage_tag);
>      trace_smmu_iotlb_insert(cfg->asid, cfg->s2cfg.vmid, new->entry.iova,
> -                            tg, new->level);
> +                            tg, new->level, stage_tag);
>      g_hash_table_insert(bs->iotlb, key, new);
>  }
>  
> @@ -159,18 +161,22 @@ static gboolean smmu_hash_remove_by_asid_vmid_iova(gpointer key, gpointer value,
>      if (info->vmid >= 0 && info->vmid != SMMU_IOTLB_VMID(iotlb_key)) {
>          return false;
>      }
> +    if (!(info->stage & SMMU_IOTLB_STAGE(iotlb_key))) {
> +        return false;
> +    }
>      return ((info->iova & ~entry->addr_mask) == entry->iova) ||
>             ((entry->iova & ~info->mask) == info->iova);
>  }
>  
>  void smmu_iotlb_inv_iova(SMMUState *s, int asid, int vmid, dma_addr_t iova,
> -                         uint8_t tg, uint64_t num_pages, uint8_t ttl)
> +                         uint8_t tg, uint64_t num_pages, uint8_t ttl,
> +                         SMMUStage stage)
>  {
>      /* if tg is not set we use 4KB range invalidation */
>      uint8_t granule = tg ? tg * 2 + 10 : 12;
>  
>      if (ttl && (num_pages == 1) && (asid >= 0)) {
> -        SMMUIOTLBKey key = smmu_get_iotlb_key(asid, vmid, iova, tg, ttl);
> +        SMMUIOTLBKey key = smmu_get_iotlb_key(asid, vmid, iova, tg, ttl, stage);
>  
>          if (g_hash_table_remove(s->iotlb, &key)) {
>              return;
> @@ -184,6 +190,7 @@ void smmu_iotlb_inv_iova(SMMUState *s, int asid, int vmid, dma_addr_t iova,
>      SMMUIOTLBPageInvInfo info = {
>          .asid = asid, .iova = iova,
>          .vmid = vmid,
> +        .stage = stage,
>          .mask = (num_pages * 1 << granule) - 1};
>  
>      g_hash_table_foreach_remove(s->iotlb,
> @@ -597,7 +604,7 @@ SMMUTLBEntry *smmu_translate(SMMUState *bs, SMMUTransCfg *cfg, dma_addr_t addr,
>      if (cached_entry) {
>          if ((flag & IOMMU_WO) && !(cached_entry->entry.perm & IOMMU_WO)) {
>              info->type = SMMU_PTW_ERR_PERMISSION;
> -            info->stage = cfg->stage;
> +            info->stage = SMMU_STAGE_TO_TLB_TAG(cfg->stage);
>              return NULL;
>          }
>          return cached_entry;
> diff --git a/hw/arm/smmu-internal.h b/hw/arm/smmu-internal.h
> index 843bebb185..6caa0ddf21 100644
> --- a/hw/arm/smmu-internal.h
> +++ b/hw/arm/smmu-internal.h
> @@ -133,12 +133,14 @@ static inline int pgd_concat_idx(int start_level, int granule_sz,
>  
>  #define SMMU_IOTLB_ASID(key) ((key).asid)
>  #define SMMU_IOTLB_VMID(key) ((key).vmid)
> +#define SMMU_IOTLB_STAGE(key) ((key).stage)
>  
>  typedef struct SMMUIOTLBPageInvInfo {
>      int asid;
>      int vmid;
>      uint64_t iova;
>      uint64_t mask;
> +    SMMUStage stage;
>  } SMMUIOTLBPageInvInfo;
>  
>  typedef struct SMMUSIDRange {
> diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
> index f081ff0cc4..b27bf297e1 100644
> --- a/hw/arm/smmuv3.c
> +++ b/hw/arm/smmuv3.c
> @@ -1087,7 +1087,7 @@ static void smmuv3_range_inval(SMMUState *s, Cmd *cmd)
>      if (!tg) {
>          trace_smmuv3_range_inval(vmid, asid, addr, tg, 1, ttl, leaf);
>          smmuv3_inv_notifiers_iova(s, asid, vmid, addr, tg, 1);
> -        smmu_iotlb_inv_iova(s, asid, vmid, addr, tg, 1, ttl);
> +        smmu_iotlb_inv_iova(s, asid, vmid, addr, tg, 1, ttl, SMMU_NESTED);
>          return;
>      }
>  
> @@ -1105,7 +1105,8 @@ static void smmuv3_range_inval(SMMUState *s, Cmd *cmd)
>          num_pages = (mask + 1) >> granule;
>          trace_smmuv3_range_inval(vmid, asid, addr, tg, num_pages, ttl, leaf);
>          smmuv3_inv_notifiers_iova(s, asid, vmid, addr, tg, num_pages);
> -        smmu_iotlb_inv_iova(s, asid, vmid, addr, tg, num_pages, ttl);
> +        smmu_iotlb_inv_iova(s, asid, vmid, addr, tg,
> +                            num_pages, ttl, SMMU_NESTED);
>          addr += mask + 1;
>      }
>  }
> diff --git a/hw/arm/trace-events b/hw/arm/trace-events
> index cc12924a84..3000c3bf14 100644
> --- a/hw/arm/trace-events
> +++ b/hw/arm/trace-events
> @@ -14,10 +14,11 @@ smmu_iotlb_inv_all(void) "IOTLB invalidate all"
>  smmu_iotlb_inv_asid(uint16_t asid) "IOTLB invalidate asid=%d"
>  smmu_iotlb_inv_vmid(uint16_t vmid) "IOTLB invalidate vmid=%d"
>  smmu_iotlb_inv_iova(uint16_t asid, uint64_t addr) "IOTLB invalidate asid=%d addr=0x%"PRIx64
> +smmu_iotlb_inv_stage(int stage) "Stage invalidate stage=%d"
>  smmu_inv_notifiers_mr(const char *name) "iommu mr=%s"
>  smmu_iotlb_lookup_hit(uint16_t asid, uint16_t vmid, uint64_t addr, uint32_t hit, uint32_t miss, uint32_t p) "IOTLB cache HIT asid=%d vmid=%d addr=0x%"PRIx64" hit=%d miss=%d hit rate=%d"
>  smmu_iotlb_lookup_miss(uint16_t asid, uint16_t vmid, uint64_t addr, uint32_t hit, uint32_t miss, uint32_t p) "IOTLB cache MISS asid=%d vmid=%d addr=0x%"PRIx64" hit=%d miss=%d hit rate=%d"
> -smmu_iotlb_insert(uint16_t asid, uint16_t vmid, uint64_t addr, uint8_t tg, uint8_t level) "IOTLB ++ asid=%d vmid=%d addr=0x%"PRIx64" tg=%d level=%d"
> +smmu_iotlb_insert(uint16_t asid, uint16_t vmid, uint64_t addr, uint8_t tg, uint8_t level, int stage) "IOTLB ++ asid=%d vmid=%d addr=0x%"PRIx64" tg=%d level=%d stage=%d"
>  
>  # smmuv3.c
>  smmuv3_read_mmio(uint64_t addr, uint64_t val, unsigned size, uint32_t r) "addr: 0x%"PRIx64" val:0x%"PRIx64" size: 0x%x(%d)"
> diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
> index 876e78975c..695d6d10ad 100644
> --- a/include/hw/arm/smmu-common.h
> +++ b/include/hw/arm/smmu-common.h
> @@ -37,6 +37,8 @@
>  #define VMSA_IDXMSK(isz, strd, lvl)         ((1ULL << \
>                                               VMSA_BIT_LVL(isz, strd, lvl)) - 1)
>  
> +#define SMMU_STAGE_TO_TLB_TAG(stage)        (((stage) == SMMU_NESTED) ? \
> +                                             SMMU_STAGE_1 : (stage))
>  /*
>   * Page table walk error types
>   */
> @@ -136,6 +138,7 @@ typedef struct SMMUIOTLBKey {
>      uint16_t vmid;
>      uint8_t tg;
>      uint8_t level;
> +    SMMUStage stage;
>  } SMMUIOTLBKey;
>  
>  struct SMMUState {
> @@ -203,12 +206,13 @@ SMMUTLBEntry *smmu_iotlb_lookup(SMMUState *bs, SMMUTransCfg *cfg,
>                                  SMMUTransTableInfo *tt, hwaddr iova);
>  void smmu_iotlb_insert(SMMUState *bs, SMMUTransCfg *cfg, SMMUTLBEntry *entry);
>  SMMUIOTLBKey smmu_get_iotlb_key(uint16_t asid, uint16_t vmid, uint64_t iova,
> -                                uint8_t tg, uint8_t level);
> +                                uint8_t tg, uint8_t level, SMMUStage stage);
>  void smmu_iotlb_inv_all(SMMUState *s);
>  void smmu_iotlb_inv_asid(SMMUState *s, uint16_t asid);
>  void smmu_iotlb_inv_vmid(SMMUState *s, uint16_t vmid);
>  void smmu_iotlb_inv_iova(SMMUState *s, int asid, int vmid, dma_addr_t iova,
> -                         uint8_t tg, uint64_t num_pages, uint8_t ttl);
> +                         uint8_t tg, uint64_t num_pages, uint8_t ttl,
> +                         SMMUStage stage);
>  
>  /* Unmap the range of all the notifiers registered to any IOMMU mr */
>  void smmu_inv_notifiers_all(SMMUState *s);




* Re: [RFC PATCH 03/12] hw/arm/smmu: Add stage to TLB
  2024-04-02 17:15   ` Eric Auger
@ 2024-04-02 18:47     ` Mostafa Saleh
  2024-04-03  7:22       ` Eric Auger
  0 siblings, 1 reply; 24+ messages in thread
From: Mostafa Saleh @ 2024-04-02 18:47 UTC (permalink / raw)
  To: Eric Auger
  Cc: qemu-arm, peter.maydell, qemu-devel, jean-philippe, alex.bennee,
	maz, nicolinc, julien

Hi Eric,

On Tue, Apr 02, 2024 at 07:15:20PM +0200, Eric Auger wrote:
> Hi Mostafa,
> 
> On 3/25/24 11:13, Mostafa Saleh wrote:
> > TLBs for nesting will be extended to be combined; a new index, "stage",
> > is added with 2 valid values:
> >  - SMMU_STAGE_1: Meaning this translates VA to PADDR, this entry can
> >    be cached from fully nested configuration or from stage-1 only.
> >    We don't support separate cached entries (VA to IPA)
> >
> >  - SMMU_STAGE_2: Meaning this translates IPA to PADDR, cached from
> >    stage-2 only configuration.
> >
> > For TLB invalidation:
> >  - by VA: Invalidate TLBs tagged with SMMU_STAGE_1
> >  - by IPA: Invalidate TLBs tagged with SMMU_STAGE_2
> >  - All: Will invalidate both, this is communicated to the TLB as
> >    SMMU_NESTED which is (SMMU_STAGE_1 | SMMU_STAGE_2) which uses
> >    it as a mask.
> 
> I don't really get why you need this extra stage field in the key. Why
> aren't the asid and vmid tags enough?
> 

Looking again, I think we can do it with ASID and VMID only, but that
requires some rework in the invalidation path.

With nested SMMUs, we can cache entries from:
- Stage-1 (or nested): Tagged with VMID and ASID
- Stage-2: Tagged with VMID only (ASID = -1)
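i.e. roughly, for lookup (just a sketch against the current 5-argument
helper; make_lookup_key is a made-up name, and this assumes the key's
asid is widened to a signed type so -1 can act as the wildcard):

    /* Sketch: key construction for lookup, assuming key.asid becomes int */
    static SMMUIOTLBKey make_lookup_key(SMMUTransCfg *cfg, uint64_t iova,
                                        uint8_t tg, uint8_t level)
    {
        if (cfg->stage == SMMU_STAGE_2) {
            /* stage-2 only: tagged with VMID, ASID wildcarded */
            return smmu_get_iotlb_key(-1, cfg->s2cfg.vmid, iova, tg, level);
        }
        /* stage-1 or nested: tagged with both ASID and VMID */
        return smmu_get_iotlb_key(cfg->asid, cfg->s2cfg.vmid, iova, tg, level);
    }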

That should be enough for caching/lookup, but for invalidation, we
should be able to invalidate IPAs which are cached from stage-2.

At the moment, we use ASID < 0 as a wildcard for invalidation or to
mean stage-2, and the two cases are mutually exclusive.

An example is:
- CMD_TLBI_NH_VAA: Invalidate stage-1 for a VMID, all ASIDs (we use ASID = -1)
- CMD_TLBI_NH_VA: Invalidate stage-1 for a VMID, an ASID (>= 0)
- CMD_TLBI_S2_IPA: Invalidate stage-2 for a VMID (we use ASID = -1)

We need to distinguish between cases 1) and 3), otherwise we over-invalidate.

Similarly for CMD_TLBI_NH_ALL (invalidate all stage-1 by VMID) and
CMD_TLBI_S12_VMALL (invalidate both stages by VMID).

I guess we can add variants of these functions that operate on ASIDs
(>= 0) or (< 0), which is basically stage-1 or stage-2.
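For example (a sketch; the function name is made up, and again this
assumes the key's asid is signed so -1 can tag stage-2 entries):

    /* Hypothetical: remove only stage-1 (or nested) entries for a VMID */
    static gboolean smmu_hash_remove_by_vmid_s1(gpointer key, gpointer value,
                                                gpointer user_data)
    {
        int vmid = *(int *)user_data;
        SMMUIOTLBKey *iotlb_key = (SMMUIOTLBKey *)key;

        /* asid >= 0 tags stage-1 (or nested) entries, asid < 0 stage-2 */
        return SMMU_IOTLB_ASID(*iotlb_key) >= 0 &&
               SMMU_IOTLB_VMID(*iotlb_key) == vmid;
    }

which would then be passed to g_hash_table_foreach_remove() like the
existing smmu_hash_remove_by_* helpers.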

Another case I can think of which is not implemented in QEMU is
global entries, where we would like to look up entries for all ASIDs
(-1), but that’s not a problem for now.

I don’t have a strong opinion, I can try to do it this way.

Thanks,
Mostafa

> [...]



* Re: [RFC PATCH 00/12] SMMUv3 nested translation support
  2024-03-25 10:13 [RFC PATCH 00/12] SMMUv3 nested translation support Mostafa Saleh
                   ` (12 preceding siblings ...)
  2024-03-25 17:50 ` [RFC PATCH 00/12] SMMUv3 nested translation support Marcin Juszkiewicz
@ 2024-04-02 22:28 ` Nicolin Chen
  2024-04-03 10:39   ` Mostafa Saleh
  13 siblings, 1 reply; 24+ messages in thread
From: Nicolin Chen @ 2024-04-02 22:28 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: qemu-arm, eric.auger, peter.maydell, qemu-devel, jean-philippe,
	alex.bennee, maz, julien

Hi Mostafa,

On Mon, Mar 25, 2024 at 10:13:56AM +0000, Mostafa Saleh wrote:
> 
> Currently, QEMU supports emulating either stage-1 or stage-2 SMMUs
> but not nested instances.
> This patch series adds support for nested translation in SMMUv3,
> this is controlled by property “arm-smmuv3.stage=nested”, and
> advertised to guests as (IDR0.S1P == 1 && IDR0.S2P == 1)

IIUIC, with this series, vSMMU will support a virtualized 2-stage
translation in a guest VM, right? I wonder how it would interact
with the ongoing 2-stage nesting support with host and guest. Or
is it supposed to be just a totally orthogonal feature without any
interaction with the host system?

Thanks
Nicolin

> Main changes(architecture):
> ============================
> 1) CDs are considered IPA and translated with stage-2.
> 2) TTBx and tables for stage-1 are considered IPA and translated
>    with stage-2.
> 3) Translate the IPA address with stage-2.
> 
> TLBs:
> ======
> TLBs are the most tricky part.
> 
> 1) General design
>    Unified(Combined) design is used, where a new tag is added "stage"
>    which has 2 valid values:
>    - STAGE_1: Meaning this entry translates VA to PADDR, it can be
>      cached from fully nested configuration or from stage-1 only.
>      It doesn't support separate cached entries (VA to IPA).
> 
>    - STAGE_2: Meaning this translates IPA to PADDR, cached from
>      stage-2  only configuration.
> 
>    TLBs are also modified to cache 2 permissions, a new permission added
>    "parent_perm."
> 
>    For non-nested configuration, perm == parent_perm and nothing
>    changes. This is used to know which stage to use in case there is
>    a permission fault from a TLB entry.
> 
> 2) Caching in TLB
>    Stage-1 and stage-2 are inserted in the TLB as is.
>    For nested translation, both entries are combined into one TLB
>    entry. Everything is used from stage-1, except:
>    - translated_addr from stage-2.
>    - parent_perm is from stage-2.
>    - addr_mask: is the minimum of both.
> 
> 3) TLB Lookup
>    For stage-1 and nested translations, it looks for STAGE_1 entries.
>    For stage-2 it looks for STAGE_2 TLB entries.
> 
> 4) TLB invalidation
>    - Stage-1 commands (CMD_TLBI_NH_VAA, SMMU_CMD_TLBI_NH_VA,
>      SMMU_CMD_TLBI_NH_ALL): Invalidate TLBs tagged with SMMU_STAGE_1.
>    - Stage-2 commands (CMD_TLBI_S2_IPA): Invalidate TLBs tagged with
>      SMMU_STAGE_2.
>    - All (SMMU_CMD_TLBI_S12_VMALL): Will invalidate both, this is
>      communicated to the TLB as SMMU_NESTED which is (SMMU_STAGE_1 |
>      SMMU_STAGE_2) which uses it as a mask.
> 
>    As far as I understand, this is compliant with the ARM
>    architecture, based on:
>    - ARM ARM DDI 0487J.a: RLGSCG, RTVTYQ, RGNJPZ
>    - ARM IHI 0070F.b: 16.2 Caching
> 
>    An alternative approach would be to instantiate 2 TLBs, one per
>    each stage. I haven’t investigated that.
> 
> Others
> =======
> - Advertise SMMUv3.2-S2FWB, it is NOP for QEMU as it doesn’t support
>   attributes.
> 
> - OAS: A typical setup with nesting is to share CPU stage-2 with the
>   SMMU, and according to the user manual, SMMU OAS must match the
>   system physical address.
> 
>   This was discussed before in
>   https://lore.kernel.org/all/20230226220650.1480786-11-smostafa@google.com/
>   The implementation here follows the discussion, where migration is
>   added and oas is set up from the board (virt). However, the OAS is
>   chosen based on the CPU PARANGE as there is no fixed one.
> 
> - For nested configuration, IOVA notifier only notifies for stage-1
>   invalidations (as far as I understand this is the intended
>   behaviour as it notifies for IOVA)
> 
> - Stop ignoring VMID for stage-1 if stage-2 is also supported.
> 
> 
> Future improvements:
> =====================
> 1) One small improvement, that I don’t think it’s worth the extra
>    complexity, is in case of Stage-1 TLB miss for nested translation,
>    we can do stage-1 walk and lookup for stage-2 TLBs, instead of
>    doing the full walk.
> 
> 2) Patch 0006 (hw/arm/smmuv3: Translate CD and TT using stage-2 table)
>    introduces a macro to use functions that rely on cfg for stage-2,
>    I don’t like it. However, I didn’t find a simple way around it,
>    either we change many functions to have a separate stage argument,
>    or add another arg in config, which is probably more code.
> 
> Testing
> ========
> 1) IOMMUFD + VFIO
>    Kernel: https://lore.kernel.org/all/cover.1683688960.git.nicolinc@nvidia.com/
>    VMM: https://qemu-devel.nongnu.narkive.com/o815DqpI/rfc-v5-0-8-arm-smmuv3-emulation-support
> 
>    By assigning “virtio-net-pci,netdev=net0,disable-legacy=on,iommu_platform=on,ats=on”,
>    to a guest VM (on top of QEMU guest) with VIFO and IOMMUFD.
> 
> 2) Work in progress prototype I am hacking on for nesting on KVM
>    (this is nowhere near complete, and misses many stuff but it
>    doesn't require VMs/VFIO) also with virtio-net-pci and git
>    cloning a bunch of stuff and also observing traces.
>    https://android-kvm.googlesource.com/linux/+log/refs/heads/smostafa/android15-6.6-smmu-nesting-wip
> 
> hw/arm/smmuv3: Split smmuv3_translate() better viewed with --color-moved
> 
> 
> Mostafa Saleh (12):
>   hw/arm/smmu: Use enum for SMMU stage
>   hw/arm/smmu: Split smmuv3_translate()
>   hw/arm/smmu: Add stage to TLB
>   hw/arm/smmu: Support nesting in commands
>   hw/arm/smmuv3: Support nested SMMUs in smmuv3_notify_iova()
>   hw/arm/smmuv3: Translate CD and TT using stage-2 table
>   hw/arm/smmu-common: Support nested translation
>   hw/arm/smmuv3: Support and advertise nesting
>   hw/arm/smmuv3: Advertise S2FWB
>   hw/arm/smmu: Refactor SMMU OAS
>   hw/arm/smmuv3: Add property for OAS
>   hw/arm/virt: Set SMMU OAS based on CPU PARANGE
> 
>  hw/arm/smmu-common.c         | 256 ++++++++++++++++++----
>  hw/arm/smmu-internal.h       |   2 +
>  hw/arm/smmuv3-internal.h     |  17 +-
>  hw/arm/smmuv3.c              | 405 ++++++++++++++++++++++-------------
>  hw/arm/trace-events          |  14 +-
>  hw/arm/virt.c                |  14 +-
>  include/hw/arm/smmu-common.h |  46 +++-
>  include/hw/arm/smmuv3.h      |   1 +
>  target/arm/cpu.h             |   2 +
>  target/arm/cpu64.c           |   5 +
>  10 files changed, 533 insertions(+), 229 deletions(-)
> 
> --
> 2.44.0.396.g6e790dbe36-goog
> 



* Re: [RFC PATCH 03/12] hw/arm/smmu: Add stage to TLB
  2024-04-02 18:47     ` Mostafa Saleh
@ 2024-04-03  7:22       ` Eric Auger
  2024-04-03  9:55         ` Mostafa Saleh
  0 siblings, 1 reply; 24+ messages in thread
From: Eric Auger @ 2024-04-03  7:22 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: qemu-arm, peter.maydell, qemu-devel, jean-philippe, alex.bennee,
	maz, nicolinc, julien

Hi Mostafa,

On 4/2/24 20:47, Mostafa Saleh wrote:
> Hi Eric,
>
> On Tue, Apr 02, 2024 at 07:15:20PM +0200, Eric Auger wrote:
>> Hi Mostafa,
>>
>> On 3/25/24 11:13, Mostafa Saleh wrote:
>>> TLBs for nesting will be extended to be combined; a new index, "stage",
>>> is added with 2 valid values:
>>>  - SMMU_STAGE_1: Meaning this translates VA to PADDR, this entry can
>>>    be cached from fully nested configuration or from stage-1 only.
>>>    We don't support separate cached entries (VA to IPA)
>>>
>>>  - SMMU_STAGE_2: Meaning this translates IPA to PADDR, cached from
>>>    stage-2 only configuration.
>>>
>>> For TLB invalidation:
>>>  - by VA: Invalidate TLBs tagged with SMMU_STAGE_1
>>>  - by IPA: Invalidate TLBs tagged with SMMU_STAGE_2
>>>  - All: Will invalidate both, this is communicated to the TLB as
>>>    SMMU_NESTED which is (SMMU_STAGE_1 | SMMU_STAGE_2) which uses
>>>    it as a mask.
>> I don't really get why you need this extra stage field in the key. Why
>> aren't the asid and vmid tags enough?
>>
> Looking again, I think we can do it with ASID and VMID only, but that
> requires some rework in the invalidation path.
>
> With nested SMMUs, we can cache entries from:
> - Stage-1 (or nested): Tagged with VMID and ASID
> - Stage-2: Tagged with VMID only (ASID = -1)
>
> That should be enough for caching/lookup, but for invalidation, we
> should be able to invalidate IPAs which are cached from stage-2.
>
> At the moment, we use ASID < 0 as a wildcard for invalidation or to
> mean stage-2, and the two cases are mutually exclusive.
>
> An example is:
> - CMD_TLBI_NH_VAA: Invalidate stage-1 for a VMID, all ASIDs (we use ASID = -1)
> - CMD_TLBI_NH_VA: Invalidate stage-1 for a VMID, an ASID  ( > 0)
> - CMD_TLBI_S2_IPA: Invalidate stage-2 for a VMID (we use ASID = -1)
>
> We need to distinguish between cases 1) and 3), otherwise we over-invalidate.
OK, I see your point about the asid param passed in smmuv3_range_inval().
Well, if you can have separate functions for handling the S1 and S2 cases
while keeping the current key, that may be interesting. It may be clearer
now that we have extended support. This can also help in debugging/tracing.
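Something like (hypothetical names, just mirroring the current
smmu_iotlb_inv_iova() prototype):

    /* stage-1: invalidate by VA, matching ASID + VMID tagged entries */
    void smmu_iotlb_inv_iova_s1(SMMUState *s, int asid, int vmid,
                                dma_addr_t iova, uint8_t tg,
                                uint64_t num_pages, uint8_t ttl);

    /* stage-2: invalidate by IPA, matching VMID-only tagged entries */
    void smmu_iotlb_inv_ipa_s2(SMMUState *s, int vmid, dma_addr_t ipa,
                               uint8_t tg, uint64_t num_pages, uint8_t ttl);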
>
> Similarly, CMD_TLBI_NH_ALL(invalidate all stage-1 by VMID) and
> CMD_TLBI_S12_VMALL(invalidate both stages by VMID).
>
> I guess we can add variants of these functions that operate on ASIDs
> (>= 0) or (< 0) which is basically stage-1 or stage-2.
Worth a try indeed.

Thanks

Eric
> [...]



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 03/12] hw/arm/smmu: Add stage to TLB
  2024-04-03  7:22       ` Eric Auger
@ 2024-04-03  9:55         ` Mostafa Saleh
  0 siblings, 0 replies; 24+ messages in thread
From: Mostafa Saleh @ 2024-04-03  9:55 UTC (permalink / raw)
  To: Eric Auger
  Cc: qemu-arm, peter.maydell, qemu-devel, jean-philippe, alex.bennee,
	maz, nicolinc, julien

On Wed, Apr 03, 2024 at 09:22:03AM +0200, Eric Auger wrote:
> Hi Mostafa,
> 
> On 4/2/24 20:47, Mostafa Saleh wrote:
> > Hi Eric,
> >
> > On Tue, Apr 02, 2024 at 07:15:20PM +0200, Eric Auger wrote:
> >> Hi Mostafa,
> >>
> >> On 3/25/24 11:13, Mostafa Saleh wrote:
> >>> TLBs for nesting will be extended to be combined; a new index,
> >>> "stage", is added with 2 valid values:
> >>>  - SMMU_STAGE_1: Meaning this translates VA to PADDR, this entry can
> >>>    be cached from fully nested configuration or from stage-1 only.
> >>>    We don't support separate cached entries (VA to IPA)
> >>>
> >>>  - SMMU_STAGE_2: Meaning this translates IPA to PADDR, cached from
> >>>    stage-2 only configuration.
> >>>
> >>> For TLB invalidation:
> >>>  - by VA: Invalidate TLBs tagged with SMMU_STAGE_1
> >>>  - by IPA: Invalidate TLBs tagged with SMMU_STAGE_2
> >>>  - All: Will invalidate both, this is communicated to the TLB as
> >>>    SMMU_NESTED which is (SMMU_STAGE_1 | SMMU_STAGE_2) which uses
> >>>    it as a mask.
> >> I don't really get why you need this extra stage field in the key. Why
> >> aren't the asid and vmid tags enough?
> >>
> > Looking again, I think we can do it with ASID and VMID only, but that
> > requires some rework in the invalidation path.
> >
> > With nested SMMUs, we can cache entries from:
> > - Stage-1 (or nested): Tagged with VMID and ASID
> > - Stage-2: Tagged with VMID only (ASID = -1)
> >
> > That should be enough for caching/lookup, but for invalidation, we
> > should be able to invalidate IPAs which are cached from stage-2.
> >
> > At the moment, we use an ASID < 0 as a wildcard for invalidation or
> > for stage-2 entries, and the two uses were mutually exclusive.
> >
> > An example is:
> > - CMD_TLBI_NH_VAA: Invalidate stage-1 for a VMID, all ASIDs (we use ASID = -1)
> > - CMD_TLBI_NH_VA: Invalidate stage-1 for a VMID, an ASID  ( > 0)
> > - CMD_TLBI_S2_IPA: Invalidate stage-2 for a VMID (we use ASID = -1)
> >
> > We need to distinguish between case 1) and 3) otherwise we over invalidate.
> OK, I see your point about passing the asid param to
> smmuv3_range_inval().
> Well, if you can have separate functions for handling the S1 and S2
> cases while keeping the current key, that may be interesting. It may be
> clearer now that we have extended support. This can also help in
> debugging/tracing.
> >
> > Similarly, CMD_TLBI_NH_ALL(invalidate all stage-1 by VMID) and
> > CMD_TLBI_S12_VMALL(invalidate both stages by VMID).
> >
> > I guess we can add variants of these functions that operate on ASIDs
> > (>= 0) or (< 0) which is basically stage-1 or stage-2.
> Worth trying, indeed.

I will switch to that in V2.
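
Roughly, I imagine something along these lines for the stage-1 side
(just a sketch against the structures in smmu-internal.h, not the
final code, and the helper name is made up; a stage-2 variant would
invert the ASID check and match on IPA instead):

    static gboolean
    smmu_hash_remove_by_vmid_iova_s1(gpointer key, gpointer value,
                                     gpointer user_data)
    {
        SMMUTLBEntry *iter = (SMMUTLBEntry *)value;
        IOMMUTLBEntry *entry = &iter->entry;
        SMMUIOTLBPageInvInfo *info = (SMMUIOTLBPageInvInfo *)user_data;
        SMMUIOTLBKey iotlb_key = *(SMMUIOTLBKey *)key;

        /* Stage-2 entries are tagged with ASID == -1, keep them */
        if (SMMU_IOTLB_ASID(iotlb_key) < 0) {
            return false;
        }
        /* A specific ASID was requested (e.g. CMD_TLBI_NH_VA) */
        if (info->asid >= 0 && info->asid != SMMU_IOTLB_ASID(iotlb_key)) {
            return false;
        }
        if (info->vmid >= 0 && info->vmid != SMMU_IOTLB_VMID(iotlb_key)) {
            return false;
        }
        return ((info->iova & ~entry->addr_mask) == entry->iova) ||
               ((entry->iova & ~info->mask) == info->iova);
    }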

Thanks,
Mostafa

> 
> Thanks
> 
> Eric
> >
> > Another case I can think of which is not implemented in QEMU is
> > global entries, where we would like to look up entries for all ASIDs
> > (-1), but that’s not a problem for now.
> >
> > I don’t have a strong opinion, I can try to do it this way.
> >
> > Thanks,
> > Mostafa
> >
> >> Eric
> >>> This is briefly described in the user manual (ARM IHI 0070 F.b) in
> >>> "16.2.1 Caching combined structures".
> >>>
> >>> Signed-off-by: Mostafa Saleh <smostafa@google.com>
> >>> ---
> >>>  hw/arm/smmu-common.c         | 27 +++++++++++++++++----------
> >>>  hw/arm/smmu-internal.h       |  2 ++
> >>>  hw/arm/smmuv3.c              |  5 +++--
> >>>  hw/arm/trace-events          |  3 ++-
> >>>  include/hw/arm/smmu-common.h |  8 ++++++--
> >>>  5 files changed, 30 insertions(+), 15 deletions(-)
> >>>
> >>> diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
> >>> index 20630eb670..677dcf9a13 100644
> >>> --- a/hw/arm/smmu-common.c
> >>> +++ b/hw/arm/smmu-common.c
> >>> @@ -38,7 +38,7 @@ static guint smmu_iotlb_key_hash(gconstpointer v)
> >>>  
> >>>      /* Jenkins hash */
> >>>      a = b = c = JHASH_INITVAL + sizeof(*key);
> >>> -    a += key->asid + key->vmid + key->level + key->tg;
> >>> +    a += key->asid + key->vmid + key->level + key->tg + key->stage;
> >>>      b += extract64(key->iova, 0, 32);
> >>>      c += extract64(key->iova, 32, 32);
> >>>  
> >>> @@ -54,14 +54,14 @@ static gboolean smmu_iotlb_key_equal(gconstpointer v1, gconstpointer v2)
> >>>  
> >>>      return (k1->asid == k2->asid) && (k1->iova == k2->iova) &&
> >>>             (k1->level == k2->level) && (k1->tg == k2->tg) &&
> >>> -           (k1->vmid == k2->vmid);
> >>> +           (k1->vmid == k2->vmid) && (k1->stage == k2->stage);
> >>>  }
> >>>  
> >>>  SMMUIOTLBKey smmu_get_iotlb_key(uint16_t asid, uint16_t vmid, uint64_t iova,
> >>> -                                uint8_t tg, uint8_t level)
> >>> +                                uint8_t tg, uint8_t level, SMMUStage stage)
> >>>  {
> >>>      SMMUIOTLBKey key = {.asid = asid, .vmid = vmid, .iova = iova,
> >>> -                        .tg = tg, .level = level};
> >>> +                        .tg = tg, .level = level, .stage = stage};
> >>>  
> >>>      return key;
> >>>  }
> >>> @@ -81,7 +81,8 @@ SMMUTLBEntry *smmu_iotlb_lookup(SMMUState *bs, SMMUTransCfg *cfg,
> >>>          SMMUIOTLBKey key;
> >>>  
> >>>          key = smmu_get_iotlb_key(cfg->asid, cfg->s2cfg.vmid,
> >>> -                                 iova & ~mask, tg, level);
> >>> +                                 iova & ~mask, tg, level,
> >>> +                                 SMMU_STAGE_TO_TLB_TAG(cfg->stage));
> >>>          entry = g_hash_table_lookup(bs->iotlb, &key);
> >>>          if (entry) {
> >>>              break;
> >>> @@ -109,15 +110,16 @@ void smmu_iotlb_insert(SMMUState *bs, SMMUTransCfg *cfg, SMMUTLBEntry *new)
> >>>  {
> >>>      SMMUIOTLBKey *key = g_new0(SMMUIOTLBKey, 1);
> >>>      uint8_t tg = (new->granule - 10) / 2;
> >>> +    SMMUStage stage_tag = SMMU_STAGE_TO_TLB_TAG(cfg->stage);
> >>>  
> >>>      if (g_hash_table_size(bs->iotlb) >= SMMU_IOTLB_MAX_SIZE) {
> >>>          smmu_iotlb_inv_all(bs);
> >>>      }
> >>>  
> >>>      *key = smmu_get_iotlb_key(cfg->asid, cfg->s2cfg.vmid, new->entry.iova,
> >>> -                              tg, new->level);
> >>> +                              tg, new->level, stage_tag);
> >>>      trace_smmu_iotlb_insert(cfg->asid, cfg->s2cfg.vmid, new->entry.iova,
> >>> -                            tg, new->level);
> >>> +                            tg, new->level, stage_tag);
> >>>      g_hash_table_insert(bs->iotlb, key, new);
> >>>  }
> >>>  
> >>> @@ -159,18 +161,22 @@ static gboolean smmu_hash_remove_by_asid_vmid_iova(gpointer key, gpointer value,
> >>>      if (info->vmid >= 0 && info->vmid != SMMU_IOTLB_VMID(iotlb_key)) {
> >>>          return false;
> >>>      }
> >>> +    if (!(info->stage & SMMU_IOTLB_STAGE(iotlb_key))) {
> >>> +        return false;
> >>> +    }
> >>>      return ((info->iova & ~entry->addr_mask) == entry->iova) ||
> >>>             ((entry->iova & ~info->mask) == info->iova);
> >>>  }
> >>>  
> >>>  void smmu_iotlb_inv_iova(SMMUState *s, int asid, int vmid, dma_addr_t iova,
> >>> -                         uint8_t tg, uint64_t num_pages, uint8_t ttl)
> >>> +                         uint8_t tg, uint64_t num_pages, uint8_t ttl,
> >>> +                         SMMUStage stage)
> >>>  {
> >>>      /* if tg is not set we use 4KB range invalidation */
> >>>      uint8_t granule = tg ? tg * 2 + 10 : 12;
> >>>  
> >>>      if (ttl && (num_pages == 1) && (asid >= 0)) {
> >>> -        SMMUIOTLBKey key = smmu_get_iotlb_key(asid, vmid, iova, tg, ttl);
> >>> +        SMMUIOTLBKey key = smmu_get_iotlb_key(asid, vmid, iova, tg, ttl, stage);
> >>>  
> >>>          if (g_hash_table_remove(s->iotlb, &key)) {
> >>>              return;
> >>> @@ -184,6 +190,7 @@ void smmu_iotlb_inv_iova(SMMUState *s, int asid, int vmid, dma_addr_t iova,
> >>>      SMMUIOTLBPageInvInfo info = {
> >>>          .asid = asid, .iova = iova,
> >>>          .vmid = vmid,
> >>> +        .stage = stage,
> >>>          .mask = (num_pages * 1 << granule) - 1};
> >>>  
> >>>      g_hash_table_foreach_remove(s->iotlb,
> >>> @@ -597,7 +604,7 @@ SMMUTLBEntry *smmu_translate(SMMUState *bs, SMMUTransCfg *cfg, dma_addr_t addr,
> >>>      if (cached_entry) {
> >>>          if ((flag & IOMMU_WO) && !(cached_entry->entry.perm & IOMMU_WO)) {
> >>>              info->type = SMMU_PTW_ERR_PERMISSION;
> >>> -            info->stage = cfg->stage;
> >>> +            info->stage = SMMU_STAGE_TO_TLB_TAG(cfg->stage);
> >>>              return NULL;
> >>>          }
> >>>          return cached_entry;
> >>> diff --git a/hw/arm/smmu-internal.h b/hw/arm/smmu-internal.h
> >>> index 843bebb185..6caa0ddf21 100644
> >>> --- a/hw/arm/smmu-internal.h
> >>> +++ b/hw/arm/smmu-internal.h
> >>> @@ -133,12 +133,14 @@ static inline int pgd_concat_idx(int start_level, int granule_sz,
> >>>  
> >>>  #define SMMU_IOTLB_ASID(key) ((key).asid)
> >>>  #define SMMU_IOTLB_VMID(key) ((key).vmid)
> >>> +#define SMMU_IOTLB_STAGE(key) ((key).stage)
> >>>  
> >>>  typedef struct SMMUIOTLBPageInvInfo {
> >>>      int asid;
> >>>      int vmid;
> >>>      uint64_t iova;
> >>>      uint64_t mask;
> >>> +    SMMUStage stage;
> >>>  } SMMUIOTLBPageInvInfo;
> >>>  
> >>>  typedef struct SMMUSIDRange {
> >>> diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
> >>> index f081ff0cc4..b27bf297e1 100644
> >>> --- a/hw/arm/smmuv3.c
> >>> +++ b/hw/arm/smmuv3.c
> >>> @@ -1087,7 +1087,7 @@ static void smmuv3_range_inval(SMMUState *s, Cmd *cmd)
> >>>      if (!tg) {
> >>>          trace_smmuv3_range_inval(vmid, asid, addr, tg, 1, ttl, leaf);
> >>>          smmuv3_inv_notifiers_iova(s, asid, vmid, addr, tg, 1);
> >>> -        smmu_iotlb_inv_iova(s, asid, vmid, addr, tg, 1, ttl);
> >>> +        smmu_iotlb_inv_iova(s, asid, vmid, addr, tg, 1, ttl, SMMU_NESTED);
> >>>          return;
> >>>      }
> >>>  
> >>> @@ -1105,7 +1105,8 @@ static void smmuv3_range_inval(SMMUState *s, Cmd *cmd)
> >>>          num_pages = (mask + 1) >> granule;
> >>>          trace_smmuv3_range_inval(vmid, asid, addr, tg, num_pages, ttl, leaf);
> >>>          smmuv3_inv_notifiers_iova(s, asid, vmid, addr, tg, num_pages);
> >>> -        smmu_iotlb_inv_iova(s, asid, vmid, addr, tg, num_pages, ttl);
> >>> +        smmu_iotlb_inv_iova(s, asid, vmid, addr, tg,
> >>> +                            num_pages, ttl, SMMU_NESTED);
> >>>          addr += mask + 1;
> >>>      }
> >>>  }
> >>> diff --git a/hw/arm/trace-events b/hw/arm/trace-events
> >>> index cc12924a84..3000c3bf14 100644
> >>> --- a/hw/arm/trace-events
> >>> +++ b/hw/arm/trace-events
> >>> @@ -14,10 +14,11 @@ smmu_iotlb_inv_all(void) "IOTLB invalidate all"
> >>>  smmu_iotlb_inv_asid(uint16_t asid) "IOTLB invalidate asid=%d"
> >>>  smmu_iotlb_inv_vmid(uint16_t vmid) "IOTLB invalidate vmid=%d"
> >>>  smmu_iotlb_inv_iova(uint16_t asid, uint64_t addr) "IOTLB invalidate asid=%d addr=0x%"PRIx64
> >>> +smmu_iotlb_inv_stage(int stage) "Stage invalidate stage=%d"
> >>>  smmu_inv_notifiers_mr(const char *name) "iommu mr=%s"
> >>>  smmu_iotlb_lookup_hit(uint16_t asid, uint16_t vmid, uint64_t addr, uint32_t hit, uint32_t miss, uint32_t p) "IOTLB cache HIT asid=%d vmid=%d addr=0x%"PRIx64" hit=%d miss=%d hit rate=%d"
> >>>  smmu_iotlb_lookup_miss(uint16_t asid, uint16_t vmid, uint64_t addr, uint32_t hit, uint32_t miss, uint32_t p) "IOTLB cache MISS asid=%d vmid=%d addr=0x%"PRIx64" hit=%d miss=%d hit rate=%d"
> >>> -smmu_iotlb_insert(uint16_t asid, uint16_t vmid, uint64_t addr, uint8_t tg, uint8_t level) "IOTLB ++ asid=%d vmid=%d addr=0x%"PRIx64" tg=%d level=%d"
> >>> +smmu_iotlb_insert(uint16_t asid, uint16_t vmid, uint64_t addr, uint8_t tg, uint8_t level, int stage) "IOTLB ++ asid=%d vmid=%d addr=0x%"PRIx64" tg=%d level=%d stage=%d"
> >>>  
> >>>  # smmuv3.c
> >>>  smmuv3_read_mmio(uint64_t addr, uint64_t val, unsigned size, uint32_t r) "addr: 0x%"PRIx64" val:0x%"PRIx64" size: 0x%x(%d)"
> >>> diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
> >>> index 876e78975c..695d6d10ad 100644
> >>> --- a/include/hw/arm/smmu-common.h
> >>> +++ b/include/hw/arm/smmu-common.h
> >>> @@ -37,6 +37,8 @@
> >>>  #define VMSA_IDXMSK(isz, strd, lvl)         ((1ULL << \
> >>>                                               VMSA_BIT_LVL(isz, strd, lvl)) - 1)
> >>>  
> >>> +#define SMMU_STAGE_TO_TLB_TAG(stage)        (((stage) == SMMU_NESTED) ? \
> >>> +                                             SMMU_STAGE_1 : (stage))
> >>>  /*
> >>>   * Page table walk error types
> >>>   */
> >>> @@ -136,6 +138,7 @@ typedef struct SMMUIOTLBKey {
> >>>      uint16_t vmid;
> >>>      uint8_t tg;
> >>>      uint8_t level;
> >>> +    SMMUStage stage;
> >>>  } SMMUIOTLBKey;
> >>>  
> >>>  struct SMMUState {
> >>> @@ -203,12 +206,13 @@ SMMUTLBEntry *smmu_iotlb_lookup(SMMUState *bs, SMMUTransCfg *cfg,
> >>>                                  SMMUTransTableInfo *tt, hwaddr iova);
> >>>  void smmu_iotlb_insert(SMMUState *bs, SMMUTransCfg *cfg, SMMUTLBEntry *entry);
> >>>  SMMUIOTLBKey smmu_get_iotlb_key(uint16_t asid, uint16_t vmid, uint64_t iova,
> >>> -                                uint8_t tg, uint8_t level);
> >>> +                                uint8_t tg, uint8_t level, SMMUStage stage);
> >>>  void smmu_iotlb_inv_all(SMMUState *s);
> >>>  void smmu_iotlb_inv_asid(SMMUState *s, uint16_t asid);
> >>>  void smmu_iotlb_inv_vmid(SMMUState *s, uint16_t vmid);
> >>>  void smmu_iotlb_inv_iova(SMMUState *s, int asid, int vmid, dma_addr_t iova,
> >>> -                         uint8_t tg, uint64_t num_pages, uint8_t ttl);
> >>> +                         uint8_t tg, uint64_t num_pages, uint8_t ttl,
> >>> +                         SMMUStage stage);
> >>>  
> >>>  /* Unmap the range of all the notifiers registered to any IOMMU mr */
> >>>  void smmu_inv_notifiers_all(SMMUState *s);
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 00/12] SMMUv3 nested translation support
  2024-04-02 22:28 ` Nicolin Chen
@ 2024-04-03 10:39   ` Mostafa Saleh
  0 siblings, 0 replies; 24+ messages in thread
From: Mostafa Saleh @ 2024-04-03 10:39 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: qemu-arm, eric.auger, peter.maydell, qemu-devel, jean-philippe,
	alex.bennee, maz, julien

Hi Nicolin,

On Tue, Apr 02, 2024 at 03:28:12PM -0700, Nicolin Chen wrote:
> Hi Mostafa,
> 
> On Mon, Mar 25, 2024 at 10:13:56AM +0000, Mostafa Saleh wrote:
> > 
> > Currently, QEMU supports emulating either stage-1 or stage-2 SMMUs
> > but not nested instances.
> > This patch series adds support for nested translation in SMMUv3,
> > this is controlled by property “arm-smmuv3.stage=nested”, and
> > advertised to guests as (IDR0.S1P == 1 && IDR0.S2P == 1)
> 
> IIUIC, with this series, vSMMU will support a virtualized 2-stage
> translation in a guest VM, right? I wonder how it would interact

I always get confused with terminology when dealing with QEMU;
the host can mean the actual host (which is x86_64 in my case),
the guest would be aarch64 Linux fully emulated by QEMU, and the
emulated guest can itself be considered a host and launch its own
guests with KVM, for example. This will also be more fun with guests
supporting nested virtualization :)

For simplicity, I will consider:
- HOST: The fully emulated QEMU guest (aarch64) running on my machine.
- GUEST: Any guest launched by the HOST (through KVM, for example).
- QEMU: The instance of QEMU emulating the HOST (built for x86).
- QEMU-VMM: The instance of QEMU running on the HOST (built for
  aarch64), which launches VMs (GUESTs).

With that, AFAIU, vSMMU is the SMMUv3 emulation used for GUESTs with
the QEMU-VMM, where it hooks into the CMDQ and the QEMU-VMM then issues
IOCTLs to the HOST to do the actual SMMU work (through iommufd, or IIRC
there were previous patches from Eric that did that as well). Also, the
vSMMU is out of tree AFAICT.

In that case, this work is orthogonal to that: the nested SMMUv3
emulation in this series mainly targets QEMU, where the nested SMMU is
advertised to the HOST, which then allows it to use iommufd with GUESTs.

In theory, that work can be extended to the QEMU-VMM with vSMMU, but
I guess that would be a lot of work, as the VMM would need to collapse
both stages, since the kernel provides only one address space for the VMM.
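
(That collapsing would be similar in spirit to what this series already
does internally when combining a stage-1 and a stage-2 entry into a
single TLB entry; roughly, as a simplified sketch of what the cover
letter describes, not the exact code in the patches:

    static void combine_tlb(SMMUTLBEntry *tlbe, SMMUTLBEntry *tlbe_s2)
    {
        /* stage-1's output address is the IPA that stage-2 translated */
        dma_addr_t ipa = tlbe->entry.translated_addr;

        /* Everything is kept from stage-1, except: */
        tlbe->entry.translated_addr = tlbe_s2->entry.translated_addr +
                                      (ipa & tlbe_s2->entry.addr_mask);
        tlbe->parent_perm = tlbe_s2->entry.perm;
        tlbe->entry.addr_mask = MIN(tlbe->entry.addr_mask,
                                    tlbe_s2->entry.addr_mask);
    }
)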

Mainly, I use these patches to test the nesting patches I am hacking
on for KVM; they can also be used with your patches to test iommufd
without needing hardware (see the testing section in the cover letter
and the example below).
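
For example, the HOST can be launched with something like this
(illustrative command line; the exact flags depend on the setup):

    qemu-system-aarch64 -M virt,iommu=smmuv3 \
        -global arm-smmuv3.stage=nested \
        -cpu max -smp 2 -m 4G -nographic \
        -kernel Image -append "console=ttyAMA0" \
        -netdev user,id=net0 \
        -device virtio-net-pci,netdev=net0,disable-legacy=on,iommu_platform=on,ats=on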

> with the ongoing 2-stage nesting support with host and guest. Or
> is it supposed to be just a total orthogonal feature without any
> interaction with the host system?

Are you referring to the iommufd work on Linux to support nesting?


Thanks,
Mostafa
> Thanks
> Nicolin
> 
> > Main changes(architecture):
> > ============================
> > 1) CDs are considered IPA and translated with stage-2.
> > 2) TTBx and tables for stage-1 are considered IPA and translated
> >    with stage-2.
> > 3) Translate the IPA address with stage-2.
> > 
> > TLBs:
> > ======
> > TLBs are the most tricky part.
> > 
> > 1) General design
> >    Unified(Combined) design is used, where a new tag is added "stage"
> >    which has 2 valid values:
> >    - STAGE_1: Meaning this entry translates VA to PADDR, it can be
> >      cached from fully nested configuration or from stage-1 only.
> >      It doesn't support separate cached entries (VA to IPA).
> > 
> >    - STAGE_2: Meaning this translates IPA to PADDR, cached from
> >      stage-2  only configuration.
> > 
> >    TLBs are also modified to cache 2 permissions, a new permission added
> >    "parent_perm."
> > 
> >    For non-nested configuration, perm == parent_perm and nothing
> >    changes. This is used to know which stage to use in case there is
> >    a permission fault from a TLB entry.
> > 
> > 2) Caching in TLB
> >    Stage-1 and stage-2 are inserted in the TLB as is.
> >    For nested translation, both entries are combined into one TLB
> >    entry. Everything is used from stage-1, except:
> >    - translated_addr from stage-2.
> >    - parent_perm is from stage-2.
> >    - addr_mask: is the minimum of both.
> > 
> > 3) TLB Lookup
> >    For stage-1 and nested translations, it looks for STAGE_1 entries.
> >    For stage-2, it looks for STAGE_2 TLB entries.
> > 
> > 4) TLB invalidation
> >    - Stage-1 commands (CMD_TLBI_NH_VAA, SMMU_CMD_TLBI_NH_VA,
> >      SMMU_CMD_TLBI_NH_ALL): Invalidate TLBs tagged with SMMU_STAGE_1.
> >    - Stage-2 commands (CMD_TLBI_S2_IPA): Invalidate TLBs tagged with
> >      SMMU_STAGE_2.
> >    - All (SMMU_CMD_TLBI_S12_VMALL): Will invalidate both, this is
> >      communicated to the TLB as SMMU_NESTED which is (SMMU_STAGE_1 |
> >      SMMU_STAGE_2) which uses it as a mask.
> > 
> >    As far as I understand, this is compliant with the ARM
> >    architecture, based on:
> >    - ARM ARM DDI 0487J.a: RLGSCG, RTVTYQ, RGNJPZ
> >    - ARM IHI 0070F.b: 16.2 Caching
> > 
> >    An alternative approach would be to instantiate 2 TLBs, one per
> >    each stage. I haven’t investigated that.
> > 
> > Others
> > =======
> > - Advertise SMMUv3.2-S2FWB, it is NOP for QEMU as it doesn’t support
> >   attributes.
> > 
> > - OAS: A typical setup with nesting is to share CPU stage-2 with the
> >   SMMU, and according to the user manual, SMMU OAS must match the
> >   system physical address.
> > 
> >   This was discussed before in
> >   https://lore.kernel.org/all/20230226220650.1480786-11-smostafa@google.com/
> >   The implementation here, follows the discussion, where migration is
> >   added and oas is set up from the board (virt). However, the OAS is
> >   chosen based on the CPU PARANGE as there is no fixed one.
> > 
> > - For nested configuration, IOVA notifier only notifies for stage-1
> >   invalidations (as far as I understand this is the intended
> >   behaviour as it notifies for IOVA)
> > 
> > - Stop ignoring VMID for stage-1 if stage-2 is also supported.
> > 
> > 
> > Future improvements:
> > =====================
> > 1) One small improvement, which I don’t think is worth the extra
> >    complexity, is in case of Stage-1 TLB miss for nested translation,
> >    we can do stage-1 walk and lookup for stage-2 TLBs, instead of
> >    doing the full walk.
> > 
> > 2) Patch 0006 (hw/arm/smmuv3: Translate CD and TT using stage-2 table)
> >    introduces a macro to use functions that rely on cfg for stage-2,
> >    I don’t like it. However, I didn’t find a simple way around it,
> >    either we change many functions to have a separate stage argument,
> >    or add another arg in config, which is probably more code.
> > 
> > Testing
> > ========
> > 1) IOMMUFD + VFIO
> >    Kernel: https://lore.kernel.org/all/cover.1683688960.git.nicolinc@nvidia.com/
> >    VMM: https://qemu-devel.nongnu.narkive.com/o815DqpI/rfc-v5-0-8-arm-smmuv3-emulation-support
> > 
> >    By assigning “virtio-net-pci,netdev=net0,disable-legacy=on,iommu_platform=on,ats=on”,
> >    to a guest VM (on top of a QEMU guest) with VFIO and IOMMUFD.
> > 
> > 2) Work in progress prototype I am hacking on for nesting on KVM
> >    (this is nowhere near complete and is missing many things, but it
> >    doesn't require VMs/VFIO) also with virtio-net-pci and git
> >    cloning a bunch of stuff and also observing traces.
> >    https://android-kvm.googlesource.com/linux/+log/refs/heads/smostafa/android15-6.6-smmu-nesting-wip
> > 
> > hw/arm/smmuv3: Split smmuv3_translate() better viewed with --color-moved
> > 
> > 
> > Mostafa Saleh (12):
> >   hw/arm/smmu: Use enum for SMMU stage
> >   hw/arm/smmu: Split smmuv3_translate()
> >   hw/arm/smmu: Add stage to TLB
> >   hw/arm/smmu: Support nesting in commands
> >   hw/arm/smmuv3: Support nested SMMUs in smmuv3_notify_iova()
> >   hw/arm/smmuv3: Translate CD and TT using stage-2 table
> >   hw/arm/smmu-common: Support nested translation
> >   hw/arm/smmuv3: Support and advertise nesting
> >   hw/arm/smmuv3: Advertise S2FWB
> >   hw/arm/smmu: Refactor SMMU OAS
> >   hw/arm/smmuv3: Add property for OAS
> >   hw/arm/virt: Set SMMU OAS based on CPU PARANGE
> > 
> >  hw/arm/smmu-common.c         | 256 ++++++++++++++++++----
> >  hw/arm/smmu-internal.h       |   2 +
> >  hw/arm/smmuv3-internal.h     |  17 +-
> >  hw/arm/smmuv3.c              | 405 ++++++++++++++++++++++-------------
> >  hw/arm/trace-events          |  14 +-
> >  hw/arm/virt.c                |  14 +-
> >  include/hw/arm/smmu-common.h |  46 +++-
> >  include/hw/arm/smmuv3.h      |   1 +
> >  target/arm/cpu.h             |   2 +
> >  target/arm/cpu64.c           |   5 +
> >  10 files changed, 533 insertions(+), 229 deletions(-)
> > 
> > --
> > 2.44.0.396.g6e790dbe36-goog
> > 


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2024-04-03 10:40 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-25 10:13 [RFC PATCH 00/12] SMMUv3 nested translation support Mostafa Saleh
2024-03-25 10:13 ` [RFC PATCH 01/12] hw/arm/smmu: Use enum for SMMU stage Mostafa Saleh
2024-04-02 16:32   ` Eric Auger
2024-03-25 10:13 ` [RFC PATCH 02/12] hw/arm/smmu: Split smmuv3_translate() Mostafa Saleh
2024-04-02 16:32   ` Eric Auger
2024-03-25 10:13 ` [RFC PATCH 03/12] hw/arm/smmu: Add stage to TLB Mostafa Saleh
2024-04-02 17:15   ` Eric Auger
2024-04-02 18:47     ` Mostafa Saleh
2024-04-03  7:22       ` Eric Auger
2024-04-03  9:55         ` Mostafa Saleh
2024-03-25 10:14 ` [RFC PATCH 04/12] hw/arm/smmu: Support nesting in commands Mostafa Saleh
2024-03-25 10:14 ` [RFC PATCH 05/12] hw/arm/smmuv3: Support nested SMMUs in smmuv3_notify_iova() Mostafa Saleh
2024-03-25 10:14 ` [RFC PATCH 06/12] hw/arm/smmuv3: Translate CD and TT using stage-2 table Mostafa Saleh
2024-03-25 10:14 ` [RFC PATCH 07/12] hw/arm/smmu-common: Support nested translation Mostafa Saleh
2024-03-25 14:20   ` Julien Grall
2024-03-25 20:47     ` Mostafa Saleh
2024-03-25 10:14 ` [RFC PATCH 08/12] hw/arm/smmuv3: Support and advertise nesting Mostafa Saleh
2024-03-25 10:14 ` [RFC PATCH 09/12] hw/arm/smmuv3: Advertise S2FWB Mostafa Saleh
2024-03-25 10:14 ` [RFC PATCH 10/12] hw/arm/smmu: Refactor SMMU OAS Mostafa Saleh
2024-03-25 10:14 ` [RFC PATCH 11/12] hw/arm/smmuv3: Add property for OAS Mostafa Saleh
2024-03-25 10:14 ` [RFC PATCH 12/12] hw/arm/virt: Set SMMU OAS based on CPU PARANGE Mostafa Saleh
2024-03-25 17:50 ` [RFC PATCH 00/12] SMMUv3 nested translation support Marcin Juszkiewicz
2024-04-02 22:28 ` Nicolin Chen
2024-04-03 10:39   ` Mostafa Saleh
