* [PATCH v3 00/15] target/arm: SME prep patches
@ 2022-05-27 18:06 Richard Henderson
  2022-05-27 18:06 ` [PATCH v3 01/15] target/arm: Rename TBFLAG_A64 ZCR_LEN to SVE_LEN Richard Henderson
                   ` (14 more replies)
  0 siblings, 15 replies; 30+ messages in thread
From: Richard Henderson @ 2022-05-27 18:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Based-on: 20220523204742.740932-1-richard.henderson@linaro.org
("target/arm: tidy exception routing")

Changes for v3:
  * Two patches upstream.
  * Have linux-user use the digested SVE_LEN from hflags (pmm)
  * Use el_is_in_host in sve_vqm1_for_el, mirroring how I intend
    to use it for streaming sve.
  * Export a bunch of functions which will be used by sme_helper.c.


r~
  

Richard Henderson (15):
  target/arm: Rename TBFLAG_A64 ZCR_LEN to SVE_LEN
  linux-user/aarch64: Use SVE_LEN from hflags
  target/arm: Do not use aarch64_sve_zcr_get_valid_len in reset
  target/arm: Merge aarch64_sve_zcr_get_valid_len into caller
  target/arm: Use uint32_t instead of bitmap for sve vq's
  target/arm: Rename sve_zcr_len_for_el to sve_vqm1_for_el
  target/arm: Remove fp checks from sve_exception_el
  target/arm: Add el_is_in_host
  target/arm: Use el_is_in_host for sve_vqm1_for_el
  target/arm: Split out load/store primitives to sve_ldst_internal.h
  target/arm: Export sve contiguous ldst support functions
  target/arm: Move expand_pred_b to vec_internal.h
  target/arm: Use expand_pred_b in mve_helper.c
  target/arm: Move expand_pred_h to vec_internal.h
  target/arm: Export bfdotadd from vec_helper.c

 linux-user/aarch64/target_prctl.h |  19 ++-
 target/arm/cpu.h                  |  11 +-
 target/arm/internals.h            |  18 +--
 target/arm/kvm_arm.h              |   7 +-
 target/arm/sve_ldst_internal.h    | 221 ++++++++++++++++++++++++++++
 target/arm/vec_internal.h         |  17 ++-
 linux-user/aarch64/signal.c       |   4 +-
 target/arm/arch_dump.c            |   2 +-
 target/arm/cpu.c                  |   5 +-
 target/arm/cpu64.c                | 117 ++++++++-------
 target/arm/gdbstub64.c            |   2 +-
 target/arm/helper.c               | 126 ++++++++--------
 target/arm/kvm64.c                |  36 +----
 target/arm/mve_helper.c           |   6 +-
 target/arm/sve_helper.c           | 232 +++---------------------------
 target/arm/translate-a64.c        |   2 +-
 target/arm/vec_helper.c           |  28 +++-
 17 files changed, 444 insertions(+), 409 deletions(-)
 create mode 100644 target/arm/sve_ldst_internal.h

-- 
2.34.1




* [PATCH v3 01/15] target/arm: Rename TBFLAG_A64 ZCR_LEN to SVE_LEN
  2022-05-27 18:06 [PATCH v3 00/15] target/arm: SME prep patches Richard Henderson
@ 2022-05-27 18:06 ` Richard Henderson
  2022-05-31 12:13   ` Peter Maydell
  2022-05-27 18:06 ` [PATCH v3 02/15] linux-user/aarch64: Use SVE_LEN from hflags Richard Henderson
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2022-05-27 18:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

With SME, the vector length does not only come from ZCR_ELx.
Comment that this is either the SVE VL, or the Streaming SVE VL.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu.h           | 3 ++-
 target/arm/helper.c        | 2 +-
 target/arm/translate-a64.c | 2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 5bc6382fce..69e71fdcec 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -3117,7 +3117,8 @@ FIELD(TBFLAG_M32, MVE_NO_PRED, 5, 1)            /* Not cached. */
  */
 FIELD(TBFLAG_A64, TBII, 0, 2)
 FIELD(TBFLAG_A64, SVEEXC_EL, 2, 2)
-FIELD(TBFLAG_A64, ZCR_LEN, 4, 4)
+/* The current vector length, either SVE VL or Streaming SVE VL. */
+FIELD(TBFLAG_A64, SVE_LEN, 4, 4)
 FIELD(TBFLAG_A64, PAUTH_ACTIVE, 8, 1)
 FIELD(TBFLAG_A64, BT, 9, 1)
 FIELD(TBFLAG_A64, BTYPE, 10, 2)         /* Not cached. */
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 5c875927cf..2a0399100e 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -13683,7 +13683,7 @@ static CPUARMTBFlags rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
             zcr_len = sve_zcr_len_for_el(env, el);
         }
         DP_TBFLAG_A64(flags, SVEEXC_EL, sve_el);
-        DP_TBFLAG_A64(flags, ZCR_LEN, zcr_len);
+        DP_TBFLAG_A64(flags, SVE_LEN, zcr_len);
     }
 
     sctlr = regime_sctlr(env, stage1);
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index cc9344b015..09ac344d35 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -14608,7 +14608,7 @@ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase,
     dc->align_mem = EX_TBFLAG_ANY(tb_flags, ALIGN_MEM);
     dc->pstate_il = EX_TBFLAG_ANY(tb_flags, PSTATE__IL);
     dc->sve_excp_el = EX_TBFLAG_A64(tb_flags, SVEEXC_EL);
-    dc->sve_len = (EX_TBFLAG_A64(tb_flags, ZCR_LEN) + 1) * 16;
+    dc->sve_len = (EX_TBFLAG_A64(tb_flags, SVE_LEN) + 1) * 16;
     dc->pauth_active = EX_TBFLAG_A64(tb_flags, PAUTH_ACTIVE);
     dc->bt = EX_TBFLAG_A64(tb_flags, BT);
     dc->btype = EX_TBFLAG_A64(tb_flags, BTYPE);
-- 
2.34.1
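
For readers following the arithmetic in the hunks above: the renamed field
stores the vector length in quadwords minus one (vq - 1), so the length in
bytes is (field + 1) * 16. A minimal standalone sketch, assuming the field
occupies bits [7:4] as in the FIELD() declaration and ignoring which flags
word actually holds it; extract_field and sve_len_bytes are hypothetical
names, not QEMU functions:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for QEMU's field-extraction macros. */
static inline uint32_t extract_field(uint32_t word, unsigned shift,
                                     unsigned width)
{
    assert(width > 0 && width < 32 && shift + width <= 32);
    return (word >> shift) & ((1u << width) - 1);
}

/*
 * SVE_LEN is a 4-bit field at bit 4 holding (vq - 1), where vq counts
 * 16-byte quadwords.  Return the vector length in bytes.
 */
static inline uint32_t sve_len_bytes(uint32_t tb_flags)
{
    return (extract_field(tb_flags, 4, 4) + 1) * 16;
}
```

With a 4-bit field the representable range is 16 to 256 bytes, i.e. 128 to
2048 bits.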




* [PATCH v3 02/15] linux-user/aarch64: Use SVE_LEN from hflags
  2022-05-27 18:06 [PATCH v3 00/15] target/arm: SME prep patches Richard Henderson
  2022-05-27 18:06 ` [PATCH v3 01/15] target/arm: Rename TBFLAG_A64 ZCR_LEN to SVE_LEN Richard Henderson
@ 2022-05-27 18:06 ` Richard Henderson
  2022-05-31 12:15   ` Peter Maydell
  2022-05-27 18:06 ` [PATCH v3 03/15] target/arm: Do not use aarch64_sve_zcr_get_valid_len in reset Richard Henderson
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2022-05-27 18:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Use the digested vector length rather than the raw zcr_el[1] value.

This fixes an incorrect return from do_prctl_set_vl where we didn't
take into account the set of vector lengths supported by the cpu.
It also prepares us for Streaming SVE mode, where the vector length
comes from a different cpreg.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 linux-user/aarch64/target_prctl.h | 19 +++++++++++++------
 linux-user/aarch64/signal.c       |  4 ++--
 2 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/linux-user/aarch64/target_prctl.h b/linux-user/aarch64/target_prctl.h
index 3f5a5d3933..fcbb90e881 100644
--- a/linux-user/aarch64/target_prctl.h
+++ b/linux-user/aarch64/target_prctl.h
@@ -10,7 +10,7 @@ static abi_long do_prctl_get_vl(CPUArchState *env)
 {
     ARMCPU *cpu = env_archcpu(env);
     if (cpu_isar_feature(aa64_sve, cpu)) {
-        return ((cpu->env.vfp.zcr_el[1] & 0xf) + 1) * 16;
+        return (EX_TBFLAG_A64(env->hflags, SVE_LEN) + 1) * 16;
     }
     return -TARGET_EINVAL;
 }
@@ -25,18 +25,25 @@ static abi_long do_prctl_set_vl(CPUArchState *env, abi_long arg2)
      */
     if (cpu_isar_feature(aa64_sve, env_archcpu(env))
         && arg2 >= 0 && arg2 <= 512 * 16 && !(arg2 & 15)) {
-        ARMCPU *cpu = env_archcpu(env);
         uint32_t vq, old_vq;
 
-        old_vq = (env->vfp.zcr_el[1] & 0xf) + 1;
+        old_vq = EX_TBFLAG_A64(env->hflags, SVE_LEN) + 1;
+
+        /*
+         * Bound the value of vq, so that we know that it fits into
+         * the 4-bit field in ZCR_EL1.  Rely on the hflags rebuild
+         * to sort out the length supported by the cpu.
+         */
         vq = MAX(arg2 / 16, 1);
-        vq = MIN(vq, cpu->sve_max_vq);
+        vq = MIN(vq, 16);
+        env->vfp.zcr_el[1] = vq - 1;
+        arm_rebuild_hflags(env);
+
+        vq = EX_TBFLAG_A64(env->hflags, SVE_LEN) + 1;
 
         if (vq < old_vq) {
             aarch64_sve_narrow_vq(env, vq);
         }
-        env->vfp.zcr_el[1] = vq - 1;
-        arm_rebuild_hflags(env);
         return vq * 16;
     }
     return -TARGET_EINVAL;
diff --git a/linux-user/aarch64/signal.c b/linux-user/aarch64/signal.c
index 7de4c96eb9..57e9360743 100644
--- a/linux-user/aarch64/signal.c
+++ b/linux-user/aarch64/signal.c
@@ -315,7 +315,7 @@ static int target_restore_sigframe(CPUARMState *env,
 
         case TARGET_SVE_MAGIC:
             if (cpu_isar_feature(aa64_sve, env_archcpu(env))) {
-                vq = (env->vfp.zcr_el[1] & 0xf) + 1;
+                vq = EX_TBFLAG_A64(env->hflags, SVE_LEN) + 1;
                 sve_size = QEMU_ALIGN_UP(TARGET_SVE_SIG_CONTEXT_SIZE(vq), 16);
                 if (!sve && size == sve_size) {
                     sve = (struct target_sve_context *)ctx;
@@ -434,7 +434,7 @@ static void target_setup_frame(int usig, struct target_sigaction *ka,
 
     /* SVE state needs saving only if it exists.  */
     if (cpu_isar_feature(aa64_sve, env_archcpu(env))) {
-        vq = (env->vfp.zcr_el[1] & 0xf) + 1;
+        vq = EX_TBFLAG_A64(env->hflags, SVE_LEN) + 1;
         sve_size = QEMU_ALIGN_UP(TARGET_SVE_SIG_CONTEXT_SIZE(vq), 16);
         sve_ofs = alloc_sigframe_space(sve_size, &layout);
     }
-- 
2.34.1
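
The reordering in do_prctl_set_vl can be read as: bound the request to the
4-bit field, then let the hflags rebuild pick a length the cpu really has,
and only then report the result. A rough standalone sketch of that flow,
assuming a map with bit (vq - 1) set per enabled length and modelling the
hflags rebuild with a single rounding helper; round_down_to_supported and
set_vl_sketch are hypothetical names, not part of the patch:

```c
#include <stdint.h>

/*
 * Hypothetical helper: largest enabled (vq - 1) that is <= vqm1.
 * Assumes bit 0 of vq_map is set, i.e. 128-bit vectors are enabled.
 */
static inline uint32_t round_down_to_supported(uint32_t vq_map, uint32_t vqm1)
{
    while (vqm1 > 0 && !(vq_map & (1u << vqm1))) {
        vqm1--;
    }
    return vqm1;
}

/* Return the new vector length in bytes, or -1 for an invalid request. */
static inline long set_vl_sketch(uint32_t vq_map, long arg)
{
    uint32_t vq;

    if (arg < 0 || arg > 512 * 16 || (arg & 15)) {
        return -1;
    }
    /* Bound vq so that vq - 1 fits the 4-bit length field. */
    vq = arg / 16 > 1 ? (uint32_t)(arg / 16) : 1;
    if (vq > 16) {
        vq = 16;
    }
    /* Stand-in for the hflags rebuild: settle on a supported length. */
    vq = round_down_to_supported(vq_map, vq - 1) + 1;
    return (long)vq * 16;
}
```

With an A64FX-like map of 0xb (128, 256 and 512 bits), a request for 48
bytes yields 32, which is the kind of case the old code got wrong by
returning the unconstrained value.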




* [PATCH v3 03/15] target/arm: Do not use aarch64_sve_zcr_get_valid_len in reset
  2022-05-27 18:06 [PATCH v3 00/15] target/arm: SME prep patches Richard Henderson
  2022-05-27 18:06 ` [PATCH v3 01/15] target/arm: Rename TBFLAG_A64 ZCR_LEN to SVE_LEN Richard Henderson
  2022-05-27 18:06 ` [PATCH v3 02/15] linux-user/aarch64: Use SVE_LEN from hflags Richard Henderson
@ 2022-05-27 18:06 ` Richard Henderson
  2022-05-31 12:15   ` Peter Maydell
  2022-05-27 18:06 ` [PATCH v3 04/15] target/arm: Merge aarch64_sve_zcr_get_valid_len into caller Richard Henderson
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2022-05-27 18:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

We don't need to constrain the value set in zcr_el[1],
because it will be done by sve_zcr_len_for_el.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index d2bd74c2ed..0621944167 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -208,8 +208,7 @@ static void arm_cpu_reset(DeviceState *dev)
                                          CPACR_EL1, ZEN, 3);
         /* with reasonable vector length */
         if (cpu_isar_feature(aa64_sve, cpu)) {
-            env->vfp.zcr_el[1] =
-                aarch64_sve_zcr_get_valid_len(cpu, cpu->sve_default_vq - 1);
+            env->vfp.zcr_el[1] = cpu->sve_default_vq - 1;
         }
         /*
          * Enable 48-bit address space (TODO: take reserved_va into account).
-- 
2.34.1




* [PATCH v3 04/15] target/arm: Merge aarch64_sve_zcr_get_valid_len into caller
  2022-05-27 18:06 [PATCH v3 00/15] target/arm: SME prep patches Richard Henderson
                   ` (2 preceding siblings ...)
  2022-05-27 18:06 ` [PATCH v3 03/15] target/arm: Do not use aarch64_sve_zcr_get_valid_len in reset Richard Henderson
@ 2022-05-27 18:06 ` Richard Henderson
  2022-05-27 18:06 ` [PATCH v3 05/15] target/arm: Use uint32_t instead of bitmap for sve vq's Richard Henderson
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2022-05-27 18:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

This function is used only once, and will need modification
for Streaming SVE mode.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/internals.h | 11 -----------
 target/arm/helper.c    | 30 +++++++++++-------------------
 2 files changed, 11 insertions(+), 30 deletions(-)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index 09d25612af..199d1bf630 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -198,17 +198,6 @@ void arm_translate_init(void);
 void arm_cpu_synchronize_from_tb(CPUState *cs, const TranslationBlock *tb);
 #endif /* CONFIG_TCG */
 
-/**
- * aarch64_sve_zcr_get_valid_len:
- * @cpu: cpu context
- * @start_len: maximum len to consider
- *
- * Return the maximum supported sve vector length <= @start_len.
- * Note that both @start_len and the return value are in units
- * of ZCR_ELx.LEN, so the vector bit length is (x + 1) * 128.
- */
-uint32_t aarch64_sve_zcr_get_valid_len(ARMCPU *cpu, uint32_t start_len);
-
 enum arm_fprounding {
     FPROUNDING_TIEEVEN,
     FPROUNDING_POSINF,
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 2a0399100e..66036c85d7 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -6216,40 +6216,32 @@ int sve_exception_el(CPUARMState *env, int el)
     return 0;
 }
 
-uint32_t aarch64_sve_zcr_get_valid_len(ARMCPU *cpu, uint32_t start_len)
-{
-    uint32_t end_len;
-
-    start_len = MIN(start_len, ARM_MAX_VQ - 1);
-    end_len = start_len;
-
-    if (!test_bit(start_len, cpu->sve_vq_map)) {
-        end_len = find_last_bit(cpu->sve_vq_map, start_len);
-        assert(end_len < start_len);
-    }
-    return end_len;
-}
-
 /*
  * Given that SVE is enabled, return the vector length for EL.
  */
 uint32_t sve_zcr_len_for_el(CPUARMState *env, int el)
 {
     ARMCPU *cpu = env_archcpu(env);
-    uint32_t zcr_len = cpu->sve_max_vq - 1;
+    uint32_t len = cpu->sve_max_vq - 1;
+    uint32_t end_len;
 
     if (el <= 1 &&
         (arm_hcr_el2_eff(env) & (HCR_E2H | HCR_TGE)) != (HCR_E2H | HCR_TGE)) {
-        zcr_len = MIN(zcr_len, 0xf & (uint32_t)env->vfp.zcr_el[1]);
+        len = MIN(len, 0xf & (uint32_t)env->vfp.zcr_el[1]);
     }
     if (el <= 2 && arm_feature(env, ARM_FEATURE_EL2)) {
-        zcr_len = MIN(zcr_len, 0xf & (uint32_t)env->vfp.zcr_el[2]);
+        len = MIN(len, 0xf & (uint32_t)env->vfp.zcr_el[2]);
     }
     if (arm_feature(env, ARM_FEATURE_EL3)) {
-        zcr_len = MIN(zcr_len, 0xf & (uint32_t)env->vfp.zcr_el[3]);
+        len = MIN(len, 0xf & (uint32_t)env->vfp.zcr_el[3]);
     }
 
-    return aarch64_sve_zcr_get_valid_len(cpu, zcr_len);
+    end_len = len;
+    if (!test_bit(len, cpu->sve_vq_map)) {
+        end_len = find_last_bit(cpu->sve_vq_map, len);
+        assert(end_len < len);
+    }
+    return end_len;
 }
 
 static void zcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
-- 
2.34.1
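
For reference, the capping half of sve_zcr_len_for_el (before the rounding
to a supported length that this patch merges in) amounts to taking the
minimum LEN over every exception level that constrains the current one. A
simplified sketch, assuming plain booleans in place of the arm_feature()
and arm_hcr_el2_eff() checks; capped_vqm1 and its parameters are
hypothetical and only illustrate the shape of the logic:

```c
#include <stdbool.h>
#include <stdint.h>

static inline uint32_t min_u32(uint32_t a, uint32_t b)
{
    return a < b ? a : b;
}

/*
 * zcr[1..3] stand in for ZCR_EL1..ZCR_EL3; zcr[0] is unused.
 * host_mode models HCR_EL2.{E2H,TGE} both being set.
 * Returns the capped length in units of (vq - 1).
 */
static inline uint32_t capped_vqm1(uint32_t max_vq, const uint64_t zcr[4],
                                   int el, bool el2_present,
                                   bool el3_present, bool host_mode)
{
    uint32_t len = max_vq - 1;

    if (el <= 1 && !host_mode) {
        len = min_u32(len, zcr[1] & 0xf);
    }
    if (el <= 2 && el2_present) {
        len = min_u32(len, zcr[2] & 0xf);
    }
    if (el3_present) {
        len = min_u32(len, zcr[3] & 0xf);
    }
    return len;
}
```

The result still has to be rounded down to a length present in sve_vq_map,
which is the part that used to live in aarch64_sve_zcr_get_valid_len.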




* [PATCH v3 05/15] target/arm: Use uint32_t instead of bitmap for sve vq's
  2022-05-27 18:06 [PATCH v3 00/15] target/arm: SME prep patches Richard Henderson
                   ` (3 preceding siblings ...)
  2022-05-27 18:06 ` [PATCH v3 04/15] target/arm: Merge aarch64_sve_zcr_get_valid_len into caller Richard Henderson
@ 2022-05-27 18:06 ` Richard Henderson
  2022-05-27 18:06 ` [PATCH v3 06/15] target/arm: Rename sve_zcr_len_for_el to sve_vqm1_for_el Richard Henderson
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2022-05-27 18:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm, Peter Maydell

The bitmap need only hold 15 bits; bitmap is over-complicated.
We can simplify operations quite a bit with plain logical ops.

The introduction of SVE_VQ_POW2_MAP eliminates the need for
looping in order to search for powers of two.  Simply perform
the logical ops and use count leading or trailing zeros as
required to find the result.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu.h       |   6 +--
 target/arm/internals.h |   5 ++
 target/arm/kvm_arm.h   |   7 ++-
 target/arm/cpu64.c     | 117 ++++++++++++++++++++---------------------
 target/arm/helper.c    |   9 +---
 target/arm/kvm64.c     |  36 +++----------
 6 files changed, 75 insertions(+), 105 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 69e71fdcec..a86e8d6548 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1041,9 +1041,9 @@ struct ArchCPU {
      * Bits set in sve_vq_supported represent valid vector lengths for
      * the CPU type.
      */
-    DECLARE_BITMAP(sve_vq_map, ARM_MAX_VQ);
-    DECLARE_BITMAP(sve_vq_init, ARM_MAX_VQ);
-    DECLARE_BITMAP(sve_vq_supported, ARM_MAX_VQ);
+    uint32_t sve_vq_map;
+    uint32_t sve_vq_init;
+    uint32_t sve_vq_supported;
 
     /* Generic timer counter frequency, in Hz */
     uint64_t gt_cntfrq_hz;
diff --git a/target/arm/internals.h b/target/arm/internals.h
index 199d1bf630..b587901be1 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -1299,4 +1299,9 @@ void aa32_max_features(ARMCPU *cpu);
 bool arm_singlestep_active(CPUARMState *env);
 bool arm_generate_debug_exceptions(CPUARMState *env, int cur_el);
 
+/* Powers of 2 for sve_vq_map et al. */
+#define SVE_VQ_POW2_MAP                                 \
+    ((1 << (1 - 1)) | (1 << (2 - 1)) |                  \
+     (1 << (4 - 1)) | (1 << (8 - 1)) | (1 << (16 - 1)))
+
 #endif
diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h
index b7f78b5215..99017b635c 100644
--- a/target/arm/kvm_arm.h
+++ b/target/arm/kvm_arm.h
@@ -239,13 +239,12 @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf);
 /**
  * kvm_arm_sve_get_vls:
  * @cs: CPUState
- * @map: bitmap to fill in
  *
  * Get all the SVE vector lengths supported by the KVM host, setting
  * the bits corresponding to their length in quadwords minus one
- * (vq - 1) in @map up to ARM_MAX_VQ.
+ * (vq - 1) up to ARM_MAX_VQ.  Return the resulting map.
  */
-void kvm_arm_sve_get_vls(CPUState *cs, unsigned long *map);
+uint32_t kvm_arm_sve_get_vls(CPUState *cs);
 
 /**
  * kvm_arm_set_cpu_features_from_host:
@@ -439,7 +438,7 @@ static inline void kvm_arm_steal_time_finalize(ARMCPU *cpu, Error **errp)
     g_assert_not_reached();
 }
 
-static inline void kvm_arm_sve_get_vls(CPUState *cs, unsigned long *map)
+static inline uint32_t kvm_arm_sve_get_vls(CPUState *cs)
 {
     g_assert_not_reached();
 }
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 3ff9219ca3..51c5d8d4bc 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -355,8 +355,11 @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
      * any of the above.  Finally, if SVE is not disabled, then at least one
      * vector length must be enabled.
      */
-    DECLARE_BITMAP(tmp, ARM_MAX_VQ);
-    uint32_t vq, max_vq = 0;
+    uint32_t vq_map = cpu->sve_vq_map;
+    uint32_t vq_init = cpu->sve_vq_init;
+    uint32_t vq_supported;
+    uint32_t vq_mask = 0;
+    uint32_t tmp, vq, max_vq = 0;
 
     /*
      * CPU models specify a set of supported vector lengths which are
@@ -364,10 +367,16 @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
      * in the supported bitmap results in an error.  When KVM is enabled we
      * fetch the supported bitmap from the host.
      */
-    if (kvm_enabled() && kvm_arm_sve_supported()) {
-        kvm_arm_sve_get_vls(CPU(cpu), cpu->sve_vq_supported);
-    } else if (kvm_enabled()) {
-        assert(!cpu_isar_feature(aa64_sve, cpu));
+    if (kvm_enabled()) {
+        if (kvm_arm_sve_supported()) {
+            cpu->sve_vq_supported = kvm_arm_sve_get_vls(CPU(cpu));
+            vq_supported = cpu->sve_vq_supported;
+        } else {
+            assert(!cpu_isar_feature(aa64_sve, cpu));
+            vq_supported = 0;
+        }
+    } else {
+        vq_supported = cpu->sve_vq_supported;
     }
 
     /*
@@ -375,8 +384,9 @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
      * From the properties, sve_vq_map<N> implies sve_vq_init<N>.
      * Check first for any sve<N> enabled.
      */
-    if (!bitmap_empty(cpu->sve_vq_map, ARM_MAX_VQ)) {
-        max_vq = find_last_bit(cpu->sve_vq_map, ARM_MAX_VQ) + 1;
+    if (vq_map != 0) {
+        max_vq = 32 - clz32(vq_map);
+        vq_mask = MAKE_64BIT_MASK(0, max_vq);
 
         if (cpu->sve_max_vq && max_vq > cpu->sve_max_vq) {
             error_setg(errp, "cannot enable sve%d", max_vq * 128);
@@ -392,15 +402,10 @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
              * For KVM we have to automatically enable all supported unitialized
              * lengths, even when the smaller lengths are not all powers-of-two.
              */
-            bitmap_andnot(tmp, cpu->sve_vq_supported, cpu->sve_vq_init, max_vq);
-            bitmap_or(cpu->sve_vq_map, cpu->sve_vq_map, tmp, max_vq);
+            vq_map |= vq_supported & ~vq_init & vq_mask;
         } else {
             /* Propagate enabled bits down through required powers-of-two. */
-            for (vq = pow2floor(max_vq); vq >= 1; vq >>= 1) {
-                if (!test_bit(vq - 1, cpu->sve_vq_init)) {
-                    set_bit(vq - 1, cpu->sve_vq_map);
-                }
-            }
+            vq_map |= SVE_VQ_POW2_MAP & ~vq_init & vq_mask;
         }
     } else if (cpu->sve_max_vq == 0) {
         /*
@@ -413,25 +418,18 @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
 
         if (kvm_enabled()) {
             /* Disabling a supported length disables all larger lengths. */
-            for (vq = 1; vq <= ARM_MAX_VQ; ++vq) {
-                if (test_bit(vq - 1, cpu->sve_vq_init) &&
-                    test_bit(vq - 1, cpu->sve_vq_supported)) {
-                    break;
-                }
-            }
+            tmp = vq_init & vq_supported;
         } else {
             /* Disabling a power-of-two disables all larger lengths. */
-            for (vq = 1; vq <= ARM_MAX_VQ; vq <<= 1) {
-                if (test_bit(vq - 1, cpu->sve_vq_init)) {
-                    break;
-                }
-            }
+            tmp = vq_init & SVE_VQ_POW2_MAP;
         }
+        vq = ctz32(tmp) + 1;
 
         max_vq = vq <= ARM_MAX_VQ ? vq - 1 : ARM_MAX_VQ;
-        bitmap_andnot(cpu->sve_vq_map, cpu->sve_vq_supported,
-                      cpu->sve_vq_init, max_vq);
-        if (max_vq == 0 || bitmap_empty(cpu->sve_vq_map, max_vq)) {
+        vq_mask = MAKE_64BIT_MASK(0, max_vq);
+        vq_map = vq_supported & ~vq_init & vq_mask;
+
+        if (max_vq == 0 || vq_map == 0) {
             error_setg(errp, "cannot disable sve%d", vq * 128);
             error_append_hint(errp, "Disabling sve%d results in all "
                               "vector lengths being disabled.\n",
@@ -441,7 +439,8 @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
             return;
         }
 
-        max_vq = find_last_bit(cpu->sve_vq_map, max_vq) + 1;
+        max_vq = 32 - clz32(vq_map);
+        vq_mask = MAKE_64BIT_MASK(0, max_vq);
     }
 
     /*
@@ -451,9 +450,9 @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
      */
     if (cpu->sve_max_vq != 0) {
         max_vq = cpu->sve_max_vq;
+        vq_mask = MAKE_64BIT_MASK(0, max_vq);
 
-        if (!test_bit(max_vq - 1, cpu->sve_vq_map) &&
-            test_bit(max_vq - 1, cpu->sve_vq_init)) {
+        if (vq_init & ~vq_map & (1 << (max_vq - 1))) {
             error_setg(errp, "cannot disable sve%d", max_vq * 128);
             error_append_hint(errp, "The maximum vector length must be "
                               "enabled, sve-max-vq=%d (%d bits)\n",
@@ -462,8 +461,7 @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
         }
 
         /* Set all bits not explicitly set within sve-max-vq. */
-        bitmap_complement(tmp, cpu->sve_vq_init, max_vq);
-        bitmap_or(cpu->sve_vq_map, cpu->sve_vq_map, tmp, max_vq);
+        vq_map |= ~vq_init & vq_mask;
     }
 
     /*
@@ -472,13 +470,14 @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
      * are clear, just in case anybody looks.
      */
     assert(max_vq != 0);
-    bitmap_clear(cpu->sve_vq_map, max_vq, ARM_MAX_VQ - max_vq);
+    assert(vq_mask != 0);
+    vq_map &= vq_mask;
 
     /* Ensure the set of lengths matches what is supported. */
-    bitmap_xor(tmp, cpu->sve_vq_map, cpu->sve_vq_supported, max_vq);
-    if (!bitmap_empty(tmp, max_vq)) {
-        vq = find_last_bit(tmp, max_vq) + 1;
-        if (test_bit(vq - 1, cpu->sve_vq_map)) {
+    tmp = vq_map ^ (vq_supported & vq_mask);
+    if (tmp) {
+        vq = 32 - clz32(tmp);
+        if (vq_map & (1 << (vq - 1))) {
             if (cpu->sve_max_vq) {
                 error_setg(errp, "cannot set sve-max-vq=%d", cpu->sve_max_vq);
                 error_append_hint(errp, "This CPU does not support "
@@ -502,15 +501,15 @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
                 return;
             } else {
                 /* Ensure all required powers-of-two are enabled. */
-                for (vq = pow2floor(max_vq); vq >= 1; vq >>= 1) {
-                    if (!test_bit(vq - 1, cpu->sve_vq_map)) {
-                        error_setg(errp, "cannot disable sve%d", vq * 128);
-                        error_append_hint(errp, "sve%d is required as it "
-                                          "is a power-of-two length smaller "
-                                          "than the maximum, sve%d\n",
-                                          vq * 128, max_vq * 128);
-                        return;
-                    }
+                tmp = SVE_VQ_POW2_MAP & vq_mask & ~vq_map;
+                if (tmp) {
+                    vq = 32 - clz32(tmp);
+                    error_setg(errp, "cannot disable sve%d", vq * 128);
+                    error_append_hint(errp, "sve%d is required as it "
+                                      "is a power-of-two length smaller "
+                                      "than the maximum, sve%d\n",
+                                      vq * 128, max_vq * 128);
+                    return;
                 }
             }
         }
@@ -530,6 +529,7 @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
 
     /* From now on sve_max_vq is the actual maximum supported length. */
     cpu->sve_max_vq = max_vq;
+    cpu->sve_vq_map = vq_map;
 }
 
 static void cpu_max_get_sve_max_vq(Object *obj, Visitor *v, const char *name,
@@ -590,7 +590,7 @@ static void cpu_arm_get_sve_vq(Object *obj, Visitor *v, const char *name,
     if (!cpu_isar_feature(aa64_sve, cpu)) {
         value = false;
     } else {
-        value = test_bit(vq - 1, cpu->sve_vq_map);
+        value = extract32(cpu->sve_vq_map, vq - 1, 1);
     }
     visit_type_bool(v, name, &value, errp);
 }
@@ -612,12 +612,8 @@ static void cpu_arm_set_sve_vq(Object *obj, Visitor *v, const char *name,
         return;
     }
 
-    if (value) {
-        set_bit(vq - 1, cpu->sve_vq_map);
-    } else {
-        clear_bit(vq - 1, cpu->sve_vq_map);
-    }
-    set_bit(vq - 1, cpu->sve_vq_init);
+    cpu->sve_vq_map = deposit32(cpu->sve_vq_map, vq - 1, 1, value);
+    cpu->sve_vq_init |= 1 << (vq - 1);
 }
 
 static bool cpu_arm_get_sve(Object *obj, Error **errp)
@@ -978,7 +974,7 @@ static void aarch64_max_initfn(Object *obj)
     cpu->dcz_blocksize = 7; /*  512 bytes */
 #endif
 
-    bitmap_fill(cpu->sve_vq_supported, ARM_MAX_VQ);
+    cpu->sve_vq_supported = MAKE_64BIT_MASK(0, ARM_MAX_VQ);
 
     aarch64_add_pauth_properties(obj);
     aarch64_add_sve_properties(obj);
@@ -1025,12 +1021,11 @@ static void aarch64_a64fx_initfn(Object *obj)
     cpu->gic_vprebits = 5;
     cpu->gic_pribits = 5;
 
-    /* Suppport of A64FX's vector length are 128,256 and 512bit only */
+    /* The A64FX supports only 128, 256 and 512 bit vector lengths */
     aarch64_add_sve_properties(obj);
-    bitmap_zero(cpu->sve_vq_supported, ARM_MAX_VQ);
-    set_bit(0, cpu->sve_vq_supported); /* 128bit */
-    set_bit(1, cpu->sve_vq_supported); /* 256bit */
-    set_bit(3, cpu->sve_vq_supported); /* 512bit */
+    cpu->sve_vq_supported = (1 << 0)  /* 128bit */
+                          | (1 << 1)  /* 256bit */
+                          | (1 << 3); /* 512bit */
 
     cpu->isar.reset_pmcr_el0 = 0x46014040;
 
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 66036c85d7..93784cb073 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -6223,7 +6223,6 @@ uint32_t sve_zcr_len_for_el(CPUARMState *env, int el)
 {
     ARMCPU *cpu = env_archcpu(env);
     uint32_t len = cpu->sve_max_vq - 1;
-    uint32_t end_len;
 
     if (el <= 1 &&
         (arm_hcr_el2_eff(env) & (HCR_E2H | HCR_TGE)) != (HCR_E2H | HCR_TGE)) {
@@ -6236,12 +6235,8 @@ uint32_t sve_zcr_len_for_el(CPUARMState *env, int el)
         len = MIN(len, 0xf & (uint32_t)env->vfp.zcr_el[3]);
     }
 
-    end_len = len;
-    if (!test_bit(len, cpu->sve_vq_map)) {
-        end_len = find_last_bit(cpu->sve_vq_map, len);
-        assert(end_len < len);
-    }
-    return end_len;
+    len = 31 - clz32(cpu->sve_vq_map & MAKE_64BIT_MASK(0, len + 1));
+    return len;
 }
 
 static void zcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index 363032da90..b3f635fc95 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -760,15 +760,13 @@ bool kvm_arm_steal_time_supported(void)
 
 QEMU_BUILD_BUG_ON(KVM_ARM64_SVE_VQ_MIN != 1);
 
-void kvm_arm_sve_get_vls(CPUState *cs, unsigned long *map)
+uint32_t kvm_arm_sve_get_vls(CPUState *cs)
 {
     /* Only call this function if kvm_arm_sve_supported() returns true. */
     static uint64_t vls[KVM_ARM64_SVE_VLS_WORDS];
     static bool probed;
     uint32_t vq = 0;
-    int i, j;
-
-    bitmap_zero(map, ARM_MAX_VQ);
+    int i;
 
     /*
      * KVM ensures all host CPUs support the same set of vector lengths.
@@ -809,46 +807,24 @@ void kvm_arm_sve_get_vls(CPUState *cs, unsigned long *map)
         if (vq > ARM_MAX_VQ) {
             warn_report("KVM supports vector lengths larger than "
                         "QEMU can enable");
+            vls[0] &= MAKE_64BIT_MASK(0, ARM_MAX_VQ);
         }
     }
 
-    for (i = 0; i < KVM_ARM64_SVE_VLS_WORDS; ++i) {
-        if (!vls[i]) {
-            continue;
-        }
-        for (j = 1; j <= 64; ++j) {
-            vq = j + i * 64;
-            if (vq > ARM_MAX_VQ) {
-                return;
-            }
-            if (vls[i] & (1UL << (j - 1))) {
-                set_bit(vq - 1, map);
-            }
-        }
-    }
+    return vls[0];
 }
 
 static int kvm_arm_sve_set_vls(CPUState *cs)
 {
-    uint64_t vls[KVM_ARM64_SVE_VLS_WORDS] = {0};
+    ARMCPU *cpu = ARM_CPU(cs);
+    uint64_t vls[KVM_ARM64_SVE_VLS_WORDS] = { cpu->sve_vq_map };
     struct kvm_one_reg reg = {
         .id = KVM_REG_ARM64_SVE_VLS,
         .addr = (uint64_t)&vls[0],
     };
-    ARMCPU *cpu = ARM_CPU(cs);
-    uint32_t vq;
-    int i, j;
 
     assert(cpu->sve_max_vq <= KVM_ARM64_SVE_VQ_MAX);
 
-    for (vq = 1; vq <= cpu->sve_max_vq; ++vq) {
-        if (test_bit(vq - 1, cpu->sve_vq_map)) {
-            i = (vq - 1) / 64;
-            j = (vq - 1) % 64;
-            vls[i] |= 1UL << j;
-        }
-    }
-
     return kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, &reg);
 }
 
-- 
2.34.1
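
To illustrate the loop-free idiom used throughout this patch: with the map
held in a uint32_t, "largest enabled length" is a count-leading-zeros and
"fill in required powers of two" is a single mask expression. A small
standalone sketch, assuming bit (vq - 1) represents vq quadwords; the
helper names and POW2_MAP_SKETCH are hypothetical, and clz32_sketch is a
portable stand-in that, like QEMU's clz32, returns 32 for zero:

```c
#include <assert.h>
#include <stdint.h>

/* Bit (vq - 1) set for each power-of-two vq: 1, 2, 4, 8, 16. */
#define POW2_MAP_SKETCH \
    ((1u << 0) | (1u << 1) | (1u << 3) | (1u << 7) | (1u << 15))

/* Portable count-leading-zeros; returns 32 when x == 0. */
static inline int clz32_sketch(uint32_t x)
{
    int n = 0;

    if (x == 0) {
        return 32;
    }
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Largest vq present in the map, or 0 if the map is empty. */
static inline uint32_t max_vq_in_map(uint32_t vq_map)
{
    return 32 - clz32_sketch(vq_map);
}

/* Largest enabled (vq - 1) not exceeding vqm1; one must exist. */
static inline uint32_t round_down_vqm1(uint32_t vq_map, uint32_t vqm1)
{
    uint32_t mask = vqm1 >= 31 ? 0xffffffffu : (1u << (vqm1 + 1)) - 1;
    uint32_t hits = vq_map & mask;

    assert(hits != 0);
    return 31 - clz32_sketch(hits);
}

/* Enable each power-of-two length up to the maximum not explicitly set. */
static inline uint32_t propagate_pow2(uint32_t vq_map, uint32_t vq_init)
{
    uint32_t max_vq = max_vq_in_map(vq_map);
    uint32_t mask = max_vq >= 32 ? 0xffffffffu : (1u << max_vq) - 1;

    return vq_map | (POW2_MAP_SKETCH & ~vq_init & mask);
}
```

For example, enabling only sve512 (bit 3) with nothing else initialized
propagates down to 0xb, i.e. 128, 256 and 512 bits.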




* [PATCH v3 06/15] target/arm: Rename sve_zcr_len_for_el to sve_vqm1_for_el
  2022-05-27 18:06 [PATCH v3 00/15] target/arm: SME prep patches Richard Henderson
                   ` (4 preceding siblings ...)
  2022-05-27 18:06 ` [PATCH v3 05/15] target/arm: Use uint32_t instead of bitmap for sve vq's Richard Henderson
@ 2022-05-27 18:06 ` Richard Henderson
  2022-05-31 12:19   ` Peter Maydell
  2022-05-27 18:06 ` [PATCH v3 07/15] target/arm: Remove fp checks from sve_exception_el Richard Henderson
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2022-05-27 18:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

This will be used for both Normal and Streaming SVE, and the value
does not necessarily come from ZCR_ELx.  While we're at it, emphasize
the units in which the value is returned.

Patch produced by
    git grep -l sve_zcr_len_for_el | \
    xargs -n1 sed -i 's/sve_zcr_len_for_el/sve_vqm1_for_el/g'
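
As an illustration of the units (not part of the patch): the value
returned is the vector length in 128-bit quadwords minus one, the same
encoding as the LEN field of ZCR_ELx.  A minimal standalone sketch of
the conversions follows; the helper names are hypothetical and do not
exist in QEMU.

```c
#include <stdint.h>

/* Hypothetical helpers, not QEMU API: interpret a "VQ minus 1" value. */

/* Number of 128-bit quadwords (VQ) in the vector. */
static inline uint32_t vqm1_to_vq(uint32_t vqm1)
{
    return vqm1 + 1;
}

/* Vector length in bytes: one quadword is 16 bytes. */
static inline uint32_t vqm1_to_bytes(uint32_t vqm1)
{
    return (vqm1 + 1) * 16;
}

/* Vector granules (64-bit units), the unit gdbstub64.c reports as VG. */
static inline uint32_t vqm1_to_vg(uint32_t vqm1)
{
    return (vqm1 + 1) * 2;
}
```

Callers such as sve_current_vq() and arm_gdb_get_svereg() perform the
"+ 1" themselves, which is what the new name is meant to make obvious.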

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu.h       |  2 +-
 target/arm/arch_dump.c |  2 +-
 target/arm/cpu.c       |  2 +-
 target/arm/gdbstub64.c |  2 +-
 target/arm/helper.c    | 12 ++++++------
 5 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index a86e8d6548..24cb48eea1 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1132,7 +1132,7 @@ void aarch64_sync_64_to_32(CPUARMState *env);
 
 int fp_exception_el(CPUARMState *env, int cur_el);
 int sve_exception_el(CPUARMState *env, int cur_el);
-uint32_t sve_zcr_len_for_el(CPUARMState *env, int el);
+uint32_t sve_vqm1_for_el(CPUARMState *env, int el);
 
 static inline bool is_a64(CPUARMState *env)
 {
diff --git a/target/arm/arch_dump.c b/target/arm/arch_dump.c
index 0184845310..b1f040e69f 100644
--- a/target/arm/arch_dump.c
+++ b/target/arm/arch_dump.c
@@ -166,7 +166,7 @@ static off_t sve_fpcr_offset(uint32_t vq)
 
 static uint32_t sve_current_vq(CPUARMState *env)
 {
-    return sve_zcr_len_for_el(env, arm_current_el(env)) + 1;
+    return sve_vqm1_for_el(env, arm_current_el(env)) + 1;
 }
 
 static size_t sve_size_vq(uint32_t vq)
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 0621944167..1b5d535788 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -925,7 +925,7 @@ static void aarch64_cpu_dump_state(CPUState *cs, FILE *f, int flags)
                  vfp_get_fpcr(env), vfp_get_fpsr(env));
 
     if (cpu_isar_feature(aa64_sve, cpu) && sve_exception_el(env, el) == 0) {
-        int j, zcr_len = sve_zcr_len_for_el(env, el);
+        int j, zcr_len = sve_vqm1_for_el(env, el);
 
         for (i = 0; i <= FFR_PRED_NUM; i++) {
             bool eol;
diff --git a/target/arm/gdbstub64.c b/target/arm/gdbstub64.c
index 596878666d..07a6746944 100644
--- a/target/arm/gdbstub64.c
+++ b/target/arm/gdbstub64.c
@@ -152,7 +152,7 @@ int arm_gdb_get_svereg(CPUARMState *env, GByteArray *buf, int reg)
          * We report in Vector Granules (VG) which is 64bit in a Z reg
          * while the ZCR works in Vector Quads (VQ) which is 128bit chunks.
          */
-        int vq = sve_zcr_len_for_el(env, arm_current_el(env)) + 1;
+        int vq = sve_vqm1_for_el(env, arm_current_el(env)) + 1;
         return gdb_get_reg64(buf, vq * 2);
     }
     default:
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 93784cb073..84cb78d151 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -6219,7 +6219,7 @@ int sve_exception_el(CPUARMState *env, int el)
 /*
  * Given that SVE is enabled, return the vector length for EL.
  */
-uint32_t sve_zcr_len_for_el(CPUARMState *env, int el)
+uint32_t sve_vqm1_for_el(CPUARMState *env, int el)
 {
     ARMCPU *cpu = env_archcpu(env);
     uint32_t len = cpu->sve_max_vq - 1;
@@ -6243,7 +6243,7 @@ static void zcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
                       uint64_t value)
 {
     int cur_el = arm_current_el(env);
-    int old_len = sve_zcr_len_for_el(env, cur_el);
+    int old_len = sve_vqm1_for_el(env, cur_el);
     int new_len;
 
     /* Bits other than [3:0] are RAZ/WI.  */
@@ -6254,7 +6254,7 @@ static void zcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
      * Because we arrived here, we know both FP and SVE are enabled;
      * otherwise we would have trapped access to the ZCR_ELn register.
      */
-    new_len = sve_zcr_len_for_el(env, cur_el);
+    new_len = sve_vqm1_for_el(env, cur_el);
     if (new_len < old_len) {
         aarch64_sve_narrow_vq(env, new_len + 1);
     }
@@ -13667,7 +13667,7 @@ static CPUARMTBFlags rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
         if (sve_el != 0 && fp_el == 0) {
             zcr_len = 0;
         } else {
-            zcr_len = sve_zcr_len_for_el(env, el);
+            zcr_len = sve_vqm1_for_el(env, el);
         }
         DP_TBFLAG_A64(flags, SVEEXC_EL, sve_el);
         DP_TBFLAG_A64(flags, SVE_LEN, zcr_len);
@@ -14034,10 +14034,10 @@ void aarch64_sve_change_el(CPUARMState *env, int old_el,
      */
     old_a64 = old_el ? arm_el_is_aa64(env, old_el) : el0_a64;
     old_len = (old_a64 && !sve_exception_el(env, old_el)
-               ? sve_zcr_len_for_el(env, old_el) : 0);
+               ? sve_vqm1_for_el(env, old_el) : 0);
     new_a64 = new_el ? arm_el_is_aa64(env, new_el) : el0_a64;
     new_len = (new_a64 && !sve_exception_el(env, new_el)
-               ? sve_zcr_len_for_el(env, new_el) : 0);
+               ? sve_vqm1_for_el(env, new_el) : 0);
 
     /* When changing vector length, clear inaccessible state.  */
     if (new_len < old_len) {
-- 
2.34.1




* [PATCH v3 07/15] target/arm: Remove fp checks from sve_exception_el
  2022-05-27 18:06 [PATCH v3 00/15] target/arm: SME prep patches Richard Henderson
                   ` (5 preceding siblings ...)
  2022-05-27 18:06 ` [PATCH v3 06/15] target/arm: Rename sve_zcr_len_for_el to sve_vqm1_for_el Richard Henderson
@ 2022-05-27 18:06 ` Richard Henderson
  2022-05-27 18:06 ` [PATCH v3 08/15] target/arm: Add el_is_in_host Richard Henderson
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2022-05-27 18:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Instead of checking these bits in fp_exception_el and
also in sve_exception_el, document that we must compare
the results.  The only place where we have not already
checked that FP EL is zero is in rebuild_hflags_a64.
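
As a rough standalone model of the comparison that rebuild_hflags_a64
now performs (the struct and function names are hypothetical, and the
vector length is passed in rather than computed from CPU state):

```c
#include <stdint.h>

/* Hypothetical result type: the two values stored into the TB flags. */
typedef struct {
    int sve_el;     /* value for SVEEXC_EL */
    uint32_t len;   /* value for SVE_LEN, in quadwords minus one */
} DemoSveFlags;

/*
 * fp_el and sve_el are the results of fp_exception_el() and
 * sve_exception_el(); vqm1 is what sve_vqm1_for_el() would return.
 */
static DemoSveFlags demo_combine_fp_sve(int fp_el, int sve_el, uint32_t vqm1)
{
    DemoSveFlags r = { sve_el, 0 };

    if (fp_el != 0) {
        /* FP traps; if it traps to a lower EL, the FP trap wins. */
        if (sve_el > fp_el) {
            r.sve_el = 0;
        }
    } else if (sve_el == 0) {
        /* Neither FP nor SVE traps: the length is actually needed. */
        r.len = vqm1;
    }
    return r;
}
```

Forcing the unneeded field to zero keeps the flags identical across
states that translate to the same code.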

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.c | 56 +++++++++++++++------------------------------
 1 file changed, 19 insertions(+), 37 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 84cb78d151..cd0a8992ba 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -6135,11 +6135,15 @@ static const ARMCPRegInfo minimal_ras_reginfo[] = {
       .access = PL2_RW, .fieldoffset = offsetof(CPUARMState, cp15.vsesr_el2) },
 };
 
-/* Return the exception level to which exceptions should be taken
- * via SVEAccessTrap.  If an exception should be routed through
- * AArch64.AdvSIMDFPAccessTrap, return 0; fp_exception_el should
- * take care of raising that exception.
- * C.f. the ARM pseudocode function CheckSVEEnabled.
+/*
+ * Return the exception level to which exceptions should be taken
+ * via SVEAccessTrap.  This excludes the check for whether the exception
+ * should be routed through AArch64.AdvSIMDFPAccessTrap.  That can easily
+ * be found by testing 0 < fp_exception_el < sve_exception_el.
+ *
+ * C.f. the ARM pseudocode function CheckSVEEnabled.  Note that the
+ * pseudocode does *not* separate out the FP trap checks, but has them
+ * all in one function.
  */
 int sve_exception_el(CPUARMState *env, int el)
 {
@@ -6157,18 +6161,6 @@ int sve_exception_el(CPUARMState *env, int el)
         case 2:
             return 1;
         }
-
-        /* Check CPACR.FPEN.  */
-        switch (FIELD_EX64(env->cp15.cpacr_el1, CPACR_EL1, FPEN)) {
-        case 1:
-            if (el != 0) {
-                break;
-            }
-            /* fall through */
-        case 0:
-        case 2:
-            return 0;
-        }
     }
 
     /*
@@ -6186,24 +6178,10 @@ int sve_exception_el(CPUARMState *env, int el)
             case 2:
                 return 2;
             }
-
-            switch (FIELD_EX32(env->cp15.cptr_el[2], CPTR_EL2, FPEN)) {
-            case 1:
-                if (el == 2 || !(hcr_el2 & HCR_TGE)) {
-                    break;
-                }
-                /* fall through */
-            case 0:
-            case 2:
-                return 0;
-            }
         } else if (arm_is_el2_enabled(env)) {
             if (FIELD_EX64(env->cp15.cptr_el[2], CPTR_EL2, TZ)) {
                 return 2;
             }
-            if (FIELD_EX64(env->cp15.cptr_el[2], CPTR_EL2, TFP)) {
-                return 0;
-            }
         }
     }
 
@@ -13658,15 +13636,19 @@ static CPUARMTBFlags rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
 
     if (cpu_isar_feature(aa64_sve, env_archcpu(env))) {
         int sve_el = sve_exception_el(env, el);
-        uint32_t zcr_len;
+        uint32_t zcr_len = 0;
 
         /*
-         * If SVE is disabled, but FP is enabled,
-         * then the effective len is 0.
+         * If either FP or SVE are disabled, translator does not need len.
+         * If SVE EL > FP EL, FP exception has precedence, and translator
+         * does not need SVE EL.  Save potential re-translations by forcing
+         * the unneeded data to zero.
          */
-        if (sve_el != 0 && fp_el == 0) {
-            zcr_len = 0;
-        } else {
+        if (fp_el != 0) {
+            if (sve_el > fp_el) {
+                sve_el = 0;
+            }
+        } else if (sve_el == 0) {
             zcr_len = sve_vqm1_for_el(env, el);
         }
         DP_TBFLAG_A64(flags, SVEEXC_EL, sve_el);
-- 
2.34.1




* [PATCH v3 08/15] target/arm: Add el_is_in_host
  2022-05-27 18:06 [PATCH v3 00/15] target/arm: SME prep patches Richard Henderson
                   ` (6 preceding siblings ...)
  2022-05-27 18:06 ` [PATCH v3 07/15] target/arm: Remove fp checks from sve_exception_el Richard Henderson
@ 2022-05-27 18:06 ` Richard Henderson
  2022-05-31 12:24   ` Peter Maydell
  2022-05-27 18:06 ` [PATCH v3 09/15] target/arm: Use el_is_in_host for sve_vqm1_for_el Richard Henderson
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2022-05-27 18:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

This (newish) ARM pseudocode function is easier to work with
than open-coded tests for HCR_E2H etc.  Use of the function
will be staged into the code base in parts.
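
For readers who want to experiment with the predicate outside of QEMU,
here is a minimal standalone model.  The DemoHostState struct is a
hypothetical stand-in for the pieces of CPUARMState that the real
function consults; the HCR_EL2 bit positions are the architectural
ones (TGE is bit 27, E2H is bit 34).

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_HCR_TGE (1ULL << 27)
#define DEMO_HCR_E2H (1ULL << 34)

/* Hypothetical stand-in for the CPU state used by el_is_in_host(). */
typedef struct {
    uint64_t hcr_el2;
    bool el2_enabled;   /* models arm_is_el2_enabled(env) */
    bool el2_is_aa64;   /* models arm_el_is_aa64(env, 2) */
} DemoHostState;

static bool demo_el_is_in_host(const DemoHostState *s, int el)
{
    uint64_t mask;

    if (el & 1) {
        return false;   /* EL1 and EL3 are never "in host" */
    }

    /* EL2 needs only E2H; EL0 needs both E2H and TGE. */
    mask = el ? DEMO_HCR_E2H : (DEMO_HCR_E2H | DEMO_HCR_TGE);
    if ((s->hcr_el2 & mask) != mask) {
        return false;
    }

    return s->el2_enabled && s->el2_is_aa64;
}
```

With E2H set but TGE clear, this model returns true for EL2 and false
for EL0, matching the mask selection in the patch.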

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/internals.h |  2 ++
 target/arm/helper.c    | 28 ++++++++++++++++++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index b587901be1..008e377887 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -1295,6 +1295,8 @@ static inline void define_cortex_a72_a57_a53_cp_reginfo(ARMCPU *cpu) { }
 void define_cortex_a72_a57_a53_cp_reginfo(ARMCPU *cpu);
 #endif
 
+bool el_is_in_host(CPUARMState *env, int el);
+
 void aa32_max_features(ARMCPU *cpu);
 bool arm_singlestep_active(CPUARMState *env);
 bool arm_generate_debug_exceptions(CPUARMState *env, int cur_el);
diff --git a/target/arm/helper.c b/target/arm/helper.c
index cd0a8992ba..d1b6c2459b 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -5288,6 +5288,34 @@ uint64_t arm_hcr_el2_eff(CPUARMState *env)
     return ret;
 }
 
+/*
+ * Corresponds to ARM pseudocode function ELIsInHost().
+ */
+bool el_is_in_host(CPUARMState *env, int el)
+{
+    uint64_t mask;
+
+    /*
+     * Since we only care about E2H and TGE, we can skip arm_hcr_el2_eff().
+     * Perform the simplest bit tests first, and validate EL2 afterward.
+     */
+    if (el & 1) {
+        return false; /* EL1 or EL3 */
+    }
+
+    /*
+     * Note that hcr_write() checks isar_feature_aa64_vh(),
+     * aka HaveVirtHostExt(), in allowing HCR_E2H to be set.
+     */
+    mask = el ? HCR_E2H : HCR_E2H | HCR_TGE;
+    if ((env->cp15.hcr_el2 & mask) != mask) {
+        return false;
+    }
+
+    /* TGE and/or E2H set: double check those bits are currently legal. */
+    return arm_is_el2_enabled(env) && arm_el_is_aa64(env, 2);
+}
+
 static void hcrx_write(CPUARMState *env, const ARMCPRegInfo *ri,
                        uint64_t value)
 {
-- 
2.34.1




* [PATCH v3 09/15] target/arm: Use el_is_in_host for sve_vqm1_for_el
  2022-05-27 18:06 [PATCH v3 00/15] target/arm: SME prep patches Richard Henderson
                   ` (7 preceding siblings ...)
  2022-05-27 18:06 ` [PATCH v3 08/15] target/arm: Add el_is_in_host Richard Henderson
@ 2022-05-27 18:06 ` Richard Henderson
  2022-05-31 12:26   ` Peter Maydell
  2022-05-27 18:06 ` [PATCH v3 10/15] target/arm: Split out load/store primitives to sve_ldst_internal.h Richard Henderson
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2022-05-27 18:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

The ARM pseudocode function NVL uses this predicate now,
and I think it's a bit clearer.  Simplify the pseudocode
condition by noting that IsInHost is always false for EL1.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index d1b6c2459b..69b10be480 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -6230,8 +6230,7 @@ uint32_t sve_vqm1_for_el(CPUARMState *env, int el)
     ARMCPU *cpu = env_archcpu(env);
     uint32_t len = cpu->sve_max_vq - 1;
 
-    if (el <= 1 &&
-        (arm_hcr_el2_eff(env) & (HCR_E2H | HCR_TGE)) != (HCR_E2H | HCR_TGE)) {
+    if (el <= 1 && !el_is_in_host(env, el)) {
         len = MIN(len, 0xf & (uint32_t)env->vfp.zcr_el[1]);
     }
     if (el <= 2 && arm_feature(env, ARM_FEATURE_EL2)) {
-- 
2.34.1




* [PATCH v3 10/15] target/arm: Split out load/store primitives to sve_ldst_internal.h
  2022-05-27 18:06 [PATCH v3 00/15] target/arm: SME prep patches Richard Henderson
                   ` (8 preceding siblings ...)
  2022-05-27 18:06 ` [PATCH v3 09/15] target/arm: Use el_is_in_host for sve_vqm1_for_el Richard Henderson
@ 2022-05-27 18:06 ` Richard Henderson
  2022-05-31 12:26   ` Peter Maydell
  2022-05-27 18:06 ` [PATCH v3 11/15] target/arm: Export sve contiguous ldst support functions Richard Henderson
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2022-05-27 18:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Begin creation of sve_ldst_internal.h by moving the primitives
that access host and tlb memory.
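
To show the shape of these primitives in isolation, the sketch below
imitates the DO_LD_HOST pattern outside of QEMU.  It is a simplified
illustration, not the real macro: the DEMO_ names are hypothetical, the
H() offset adjustment is omitted (as if on a little-endian host), and
the host accessor is a hand-written little-endian 16-bit load.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical host accessor: read one little-endian 16-bit value. */
static inline uint16_t demo_lduw_le(const void *host)
{
    const uint8_t *p = host;
    return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * TYPEM is the type in memory, TYPEE the element type in the register.
 * Converting through TYPEM first is what makes the store into TYPEE
 * sign-extend or zero-extend, depending on the signedness of TYPEM.
 */
#define DEMO_LD_HOST(NAME, TYPEE, TYPEM, HOST)                          \
static inline void demo_##NAME##_host(void *vd, intptr_t reg_off,       \
                                      const void *host)                 \
{                                                                       \
    TYPEM val = (TYPEM)HOST(host);                                      \
    TYPEE elt = (TYPEE)val;                                             \
    memcpy((char *)vd + reg_off, &elt, sizeof(elt));                    \
}

/* 16-bit memory value into a 32-bit element: unsigned and signed. */
DEMO_LD_HOST(ld1hsu, uint32_t, uint16_t, demo_lduw_le)
DEMO_LD_HOST(ld1hss, uint32_t, int16_t, demo_lduw_le)
```

Loading the bytes fe ff through demo_ld1hsu_host stores 0x0000fffe,
while demo_ld1hss_host stores 0xfffffffe.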

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/sve_ldst_internal.h | 127 +++++++++++++++++++++++++++++++++
 target/arm/sve_helper.c        | 107 +--------------------------
 2 files changed, 128 insertions(+), 106 deletions(-)
 create mode 100644 target/arm/sve_ldst_internal.h

diff --git a/target/arm/sve_ldst_internal.h b/target/arm/sve_ldst_internal.h
new file mode 100644
index 0000000000..ef9117e84c
--- /dev/null
+++ b/target/arm/sve_ldst_internal.h
@@ -0,0 +1,127 @@
+/*
+ * ARM SVE Load/Store Helpers
+ *
+ * Copyright (c) 2018-2022 Linaro
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef TARGET_ARM_SVE_LDST_INTERNAL_H
+#define TARGET_ARM_SVE_LDST_INTERNAL_H
+
+#include "exec/cpu_ldst.h"
+
+/*
+ * Load one element into @vd + @reg_off from @host.
+ * The controlling predicate is known to be true.
+ */
+typedef void sve_ldst1_host_fn(void *vd, intptr_t reg_off, void *host);
+
+/*
+ * Load one element into @vd + @reg_off from (@env, @vaddr, @ra).
+ * The controlling predicate is known to be true.
+ */
+typedef void sve_ldst1_tlb_fn(CPUARMState *env, void *vd, intptr_t reg_off,
+                              target_ulong vaddr, uintptr_t retaddr);
+
+/*
+ * Generate the above primitives.
+ */
+
+#define DO_LD_HOST(NAME, H, TYPEE, TYPEM, HOST)                              \
+static inline void sve_##NAME##_host(void *vd, intptr_t reg_off, void *host) \
+{ TYPEM val = HOST(host); *(TYPEE *)(vd + H(reg_off)) = val; }
+
+#define DO_ST_HOST(NAME, H, TYPEE, TYPEM, HOST)                              \
+static inline void sve_##NAME##_host(void *vd, intptr_t reg_off, void *host) \
+{ TYPEM val = *(TYPEE *)(vd + H(reg_off)); HOST(host, val); }
+
+#define DO_LD_TLB(NAME, H, TYPEE, TYPEM, TLB)                              \
+static inline void sve_##NAME##_tlb(CPUARMState *env, void *vd,            \
+                        intptr_t reg_off, target_ulong addr, uintptr_t ra) \
+{                                                                          \
+    TYPEM val = TLB(env, useronly_clean_ptr(addr), ra);                    \
+    *(TYPEE *)(vd + H(reg_off)) = val;                                     \
+}
+
+#define DO_ST_TLB(NAME, H, TYPEE, TYPEM, TLB)                              \
+static inline void sve_##NAME##_tlb(CPUARMState *env, void *vd,            \
+                        intptr_t reg_off, target_ulong addr, uintptr_t ra) \
+{                                                                          \
+    TYPEM val = *(TYPEE *)(vd + H(reg_off));                               \
+    TLB(env, useronly_clean_ptr(addr), val, ra);                           \
+}
+
+#define DO_LD_PRIM_1(NAME, H, TE, TM)                   \
+    DO_LD_HOST(NAME, H, TE, TM, ldub_p)                 \
+    DO_LD_TLB(NAME, H, TE, TM, cpu_ldub_data_ra)
+
+DO_LD_PRIM_1(ld1bb,  H1,   uint8_t,  uint8_t)
+DO_LD_PRIM_1(ld1bhu, H1_2, uint16_t, uint8_t)
+DO_LD_PRIM_1(ld1bhs, H1_2, uint16_t,  int8_t)
+DO_LD_PRIM_1(ld1bsu, H1_4, uint32_t, uint8_t)
+DO_LD_PRIM_1(ld1bss, H1_4, uint32_t,  int8_t)
+DO_LD_PRIM_1(ld1bdu, H1_8, uint64_t, uint8_t)
+DO_LD_PRIM_1(ld1bds, H1_8, uint64_t,  int8_t)
+
+#define DO_ST_PRIM_1(NAME, H, TE, TM)                   \
+    DO_ST_HOST(st1##NAME, H, TE, TM, stb_p)             \
+    DO_ST_TLB(st1##NAME, H, TE, TM, cpu_stb_data_ra)
+
+DO_ST_PRIM_1(bb,   H1,  uint8_t, uint8_t)
+DO_ST_PRIM_1(bh, H1_2, uint16_t, uint8_t)
+DO_ST_PRIM_1(bs, H1_4, uint32_t, uint8_t)
+DO_ST_PRIM_1(bd, H1_8, uint64_t, uint8_t)
+
+#define DO_LD_PRIM_2(NAME, H, TE, TM, LD) \
+    DO_LD_HOST(ld1##NAME##_be, H, TE, TM, LD##_be_p)    \
+    DO_LD_HOST(ld1##NAME##_le, H, TE, TM, LD##_le_p)    \
+    DO_LD_TLB(ld1##NAME##_be, H, TE, TM, cpu_##LD##_be_data_ra) \
+    DO_LD_TLB(ld1##NAME##_le, H, TE, TM, cpu_##LD##_le_data_ra)
+
+#define DO_ST_PRIM_2(NAME, H, TE, TM, ST) \
+    DO_ST_HOST(st1##NAME##_be, H, TE, TM, ST##_be_p)    \
+    DO_ST_HOST(st1##NAME##_le, H, TE, TM, ST##_le_p)    \
+    DO_ST_TLB(st1##NAME##_be, H, TE, TM, cpu_##ST##_be_data_ra) \
+    DO_ST_TLB(st1##NAME##_le, H, TE, TM, cpu_##ST##_le_data_ra)
+
+DO_LD_PRIM_2(hh,  H1_2, uint16_t, uint16_t, lduw)
+DO_LD_PRIM_2(hsu, H1_4, uint32_t, uint16_t, lduw)
+DO_LD_PRIM_2(hss, H1_4, uint32_t,  int16_t, lduw)
+DO_LD_PRIM_2(hdu, H1_8, uint64_t, uint16_t, lduw)
+DO_LD_PRIM_2(hds, H1_8, uint64_t,  int16_t, lduw)
+
+DO_ST_PRIM_2(hh, H1_2, uint16_t, uint16_t, stw)
+DO_ST_PRIM_2(hs, H1_4, uint32_t, uint16_t, stw)
+DO_ST_PRIM_2(hd, H1_8, uint64_t, uint16_t, stw)
+
+DO_LD_PRIM_2(ss,  H1_4, uint32_t, uint32_t, ldl)
+DO_LD_PRIM_2(sdu, H1_8, uint64_t, uint32_t, ldl)
+DO_LD_PRIM_2(sds, H1_8, uint64_t,  int32_t, ldl)
+
+DO_ST_PRIM_2(ss, H1_4, uint32_t, uint32_t, stl)
+DO_ST_PRIM_2(sd, H1_8, uint64_t, uint32_t, stl)
+
+DO_LD_PRIM_2(dd, H1_8, uint64_t, uint64_t, ldq)
+DO_ST_PRIM_2(dd, H1_8, uint64_t, uint64_t, stq)
+
+#undef DO_LD_TLB
+#undef DO_ST_TLB
+#undef DO_LD_HOST
+#undef DO_LD_PRIM_1
+#undef DO_ST_PRIM_1
+#undef DO_LD_PRIM_2
+#undef DO_ST_PRIM_2
+
+#endif /* TARGET_ARM_SVE_LDST_INTERNAL_H */
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index e0f9aa9983..ea4c835689 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -21,12 +21,12 @@
 #include "cpu.h"
 #include "internals.h"
 #include "exec/exec-all.h"
-#include "exec/cpu_ldst.h"
 #include "exec/helper-proto.h"
 #include "tcg/tcg-gvec-desc.h"
 #include "fpu/softfloat.h"
 #include "tcg/tcg.h"
 #include "vec_internal.h"
+#include "sve_ldst_internal.h"
 
 
 /* Return a value for NZCV as per the ARM PredTest pseudofunction.
@@ -5299,111 +5299,6 @@ void HELPER(sve_fcmla_zpzzz_d)(void *vd, void *vn, void *vm, void *va,
  * Load contiguous data, protected by a governing predicate.
  */
 
-/*
- * Load one element into @vd + @reg_off from @host.
- * The controlling predicate is known to be true.
- */
-typedef void sve_ldst1_host_fn(void *vd, intptr_t reg_off, void *host);
-
-/*
- * Load one element into @vd + @reg_off from (@env, @vaddr, @ra).
- * The controlling predicate is known to be true.
- */
-typedef void sve_ldst1_tlb_fn(CPUARMState *env, void *vd, intptr_t reg_off,
-                              target_ulong vaddr, uintptr_t retaddr);
-
-/*
- * Generate the above primitives.
- */
-
-#define DO_LD_HOST(NAME, H, TYPEE, TYPEM, HOST) \
-static void sve_##NAME##_host(void *vd, intptr_t reg_off, void *host)  \
-{                                                                      \
-    TYPEM val = HOST(host);                                            \
-    *(TYPEE *)(vd + H(reg_off)) = val;                                 \
-}
-
-#define DO_ST_HOST(NAME, H, TYPEE, TYPEM, HOST) \
-static void sve_##NAME##_host(void *vd, intptr_t reg_off, void *host)  \
-{ HOST(host, (TYPEM)*(TYPEE *)(vd + H(reg_off))); }
-
-#define DO_LD_TLB(NAME, H, TYPEE, TYPEM, TLB) \
-static void sve_##NAME##_tlb(CPUARMState *env, void *vd, intptr_t reg_off,  \
-                             target_ulong addr, uintptr_t ra)               \
-{                                                                           \
-    *(TYPEE *)(vd + H(reg_off)) =                                           \
-        (TYPEM)TLB(env, useronly_clean_ptr(addr), ra);                      \
-}
-
-#define DO_ST_TLB(NAME, H, TYPEE, TYPEM, TLB) \
-static void sve_##NAME##_tlb(CPUARMState *env, void *vd, intptr_t reg_off,  \
-                             target_ulong addr, uintptr_t ra)               \
-{                                                                           \
-    TLB(env, useronly_clean_ptr(addr),                                      \
-        (TYPEM)*(TYPEE *)(vd + H(reg_off)), ra);                            \
-}
-
-#define DO_LD_PRIM_1(NAME, H, TE, TM)                   \
-    DO_LD_HOST(NAME, H, TE, TM, ldub_p)                 \
-    DO_LD_TLB(NAME, H, TE, TM, cpu_ldub_data_ra)
-
-DO_LD_PRIM_1(ld1bb,  H1,   uint8_t,  uint8_t)
-DO_LD_PRIM_1(ld1bhu, H1_2, uint16_t, uint8_t)
-DO_LD_PRIM_1(ld1bhs, H1_2, uint16_t,  int8_t)
-DO_LD_PRIM_1(ld1bsu, H1_4, uint32_t, uint8_t)
-DO_LD_PRIM_1(ld1bss, H1_4, uint32_t,  int8_t)
-DO_LD_PRIM_1(ld1bdu, H1_8, uint64_t, uint8_t)
-DO_LD_PRIM_1(ld1bds, H1_8, uint64_t,  int8_t)
-
-#define DO_ST_PRIM_1(NAME, H, TE, TM)                   \
-    DO_ST_HOST(st1##NAME, H, TE, TM, stb_p)             \
-    DO_ST_TLB(st1##NAME, H, TE, TM, cpu_stb_data_ra)
-
-DO_ST_PRIM_1(bb,   H1,  uint8_t, uint8_t)
-DO_ST_PRIM_1(bh, H1_2, uint16_t, uint8_t)
-DO_ST_PRIM_1(bs, H1_4, uint32_t, uint8_t)
-DO_ST_PRIM_1(bd, H1_8, uint64_t, uint8_t)
-
-#define DO_LD_PRIM_2(NAME, H, TE, TM, LD) \
-    DO_LD_HOST(ld1##NAME##_be, H, TE, TM, LD##_be_p)    \
-    DO_LD_HOST(ld1##NAME##_le, H, TE, TM, LD##_le_p)    \
-    DO_LD_TLB(ld1##NAME##_be, H, TE, TM, cpu_##LD##_be_data_ra) \
-    DO_LD_TLB(ld1##NAME##_le, H, TE, TM, cpu_##LD##_le_data_ra)
-
-#define DO_ST_PRIM_2(NAME, H, TE, TM, ST) \
-    DO_ST_HOST(st1##NAME##_be, H, TE, TM, ST##_be_p)    \
-    DO_ST_HOST(st1##NAME##_le, H, TE, TM, ST##_le_p)    \
-    DO_ST_TLB(st1##NAME##_be, H, TE, TM, cpu_##ST##_be_data_ra) \
-    DO_ST_TLB(st1##NAME##_le, H, TE, TM, cpu_##ST##_le_data_ra)
-
-DO_LD_PRIM_2(hh,  H1_2, uint16_t, uint16_t, lduw)
-DO_LD_PRIM_2(hsu, H1_4, uint32_t, uint16_t, lduw)
-DO_LD_PRIM_2(hss, H1_4, uint32_t,  int16_t, lduw)
-DO_LD_PRIM_2(hdu, H1_8, uint64_t, uint16_t, lduw)
-DO_LD_PRIM_2(hds, H1_8, uint64_t,  int16_t, lduw)
-
-DO_ST_PRIM_2(hh, H1_2, uint16_t, uint16_t, stw)
-DO_ST_PRIM_2(hs, H1_4, uint32_t, uint16_t, stw)
-DO_ST_PRIM_2(hd, H1_8, uint64_t, uint16_t, stw)
-
-DO_LD_PRIM_2(ss,  H1_4, uint32_t, uint32_t, ldl)
-DO_LD_PRIM_2(sdu, H1_8, uint64_t, uint32_t, ldl)
-DO_LD_PRIM_2(sds, H1_8, uint64_t,  int32_t, ldl)
-
-DO_ST_PRIM_2(ss, H1_4, uint32_t, uint32_t, stl)
-DO_ST_PRIM_2(sd, H1_8, uint64_t, uint32_t, stl)
-
-DO_LD_PRIM_2(dd, H1_8, uint64_t, uint64_t, ldq)
-DO_ST_PRIM_2(dd, H1_8, uint64_t, uint64_t, stq)
-
-#undef DO_LD_TLB
-#undef DO_ST_TLB
-#undef DO_LD_HOST
-#undef DO_LD_PRIM_1
-#undef DO_ST_PRIM_1
-#undef DO_LD_PRIM_2
-#undef DO_ST_PRIM_2
-
 /*
  * Skip through a sequence of inactive elements in the guarding predicate @vg,
  * beginning at @reg_off bounded by @reg_max.  Return the offset of the active
-- 
2.34.1




* [PATCH v3 11/15] target/arm: Export sve contiguous ldst support functions
  2022-05-27 18:06 [PATCH v3 00/15] target/arm: SME prep patches Richard Henderson
                   ` (9 preceding siblings ...)
  2022-05-27 18:06 ` [PATCH v3 10/15] target/arm: Split out load/store primitives to sve_ldst_internal.h Richard Henderson
@ 2022-05-27 18:06 ` Richard Henderson
  2022-05-31 12:27   ` Peter Maydell
  2022-05-27 18:06 ` [PATCH v3 12/15] target/arm: Move expand_pred_b to vec_internal.h Richard Henderson
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2022-05-27 18:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Export all of the support functions for performing bulk
fault analysis on a set of elements at contiguous addresses
controlled by a predicate.
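
As a simplified illustration of the page_split field documented in
SVEContLdSt, the sketch below computes where a contiguous access
crosses into the next page.  It ignores the governing predicate and
assumes a fixed 4 KiB page; the names are hypothetical and this is not
the code in sve_cont_ldst_elements.

```c
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u    /* assumed page size for this sketch */

/*
 * Return the byte offset from @addr at which an access of @len bytes
 * crosses into the next page, or -1 if it fits within one page.
 */
static int demo_page_split(uint64_t addr, unsigned len)
{
    unsigned in_page = (unsigned)(addr & (DEMO_PAGE_SIZE - 1));
    unsigned to_end = DEMO_PAGE_SIZE - in_page;

    return len > to_end ? (int)to_end : -1;
}
```

An access that ends exactly at the page boundary does not split, so it
yields -1 just like one that lies well inside the page.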

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/sve_ldst_internal.h | 94 ++++++++++++++++++++++++++++++++++
 target/arm/sve_helper.c        | 87 ++++++-------------------------
 2 files changed, 111 insertions(+), 70 deletions(-)

diff --git a/target/arm/sve_ldst_internal.h b/target/arm/sve_ldst_internal.h
index ef9117e84c..b5c473fc48 100644
--- a/target/arm/sve_ldst_internal.h
+++ b/target/arm/sve_ldst_internal.h
@@ -124,4 +124,98 @@ DO_ST_PRIM_2(dd, H1_8, uint64_t, uint64_t, stq)
 #undef DO_LD_PRIM_2
 #undef DO_ST_PRIM_2
 
+/*
+ * Resolve the guest virtual address to info->host and info->flags.
+ * If @nofault, return false if the page is invalid, otherwise
+ * exit via page fault exception.
+ */
+
+typedef struct {
+    void *host;
+    int flags;
+    MemTxAttrs attrs;
+} SVEHostPage;
+
+bool sve_probe_page(SVEHostPage *info, bool nofault, CPUARMState *env,
+                    target_ulong addr, int mem_off, MMUAccessType access_type,
+                    int mmu_idx, uintptr_t retaddr);
+
+/*
+ * Analyse contiguous data, protected by a governing predicate.
+ */
+
+typedef enum {
+    FAULT_NO,
+    FAULT_FIRST,
+    FAULT_ALL,
+} SVEContFault;
+
+typedef struct {
+    /*
+     * First and last element wholly contained within the two pages.
+     * mem_off_first[0] and reg_off_first[0] are always set >= 0.
+     * reg_off_last[0] may be < 0 if the first element crosses pages.
+     * All of mem_off_first[1], reg_off_first[1] and reg_off_last[1]
+     * are set >= 0 only if there are complete elements on a second page.
+     *
+     * The reg_off_* offsets are relative to the internal vector register.
+     * The mem_off_first offset is relative to the memory address; the
+     * two offsets are different when a load operation extends, a store
+     * operation truncates, or for multi-register operations.
+     */
+    int16_t mem_off_first[2];
+    int16_t reg_off_first[2];
+    int16_t reg_off_last[2];
+
+    /*
+     * One element that is misaligned and spans both pages,
+     * or -1 if there is no such active element.
+     */
+    int16_t mem_off_split;
+    int16_t reg_off_split;
+
+    /*
+     * The byte offset at which the entire operation crosses a page boundary.
+     * Set >= 0 if and only if the entire operation spans two pages.
+     */
+    int16_t page_split;
+
+    /* TLB data for the two pages. */
+    SVEHostPage page[2];
+} SVEContLdSt;
+
+/*
+ * Find first active element on each page, and a loose bound for the
+ * final element on each page.  Identify any single element that spans
+ * the page boundary.  Return true if there are any active elements.
+ */
+bool sve_cont_ldst_elements(SVEContLdSt *info, target_ulong addr, uint64_t *vg,
+                            intptr_t reg_max, int esz, int msize);
+
+/*
+ * Resolve the guest virtual addresses to info->page[].
+ * Control the generation of page faults with @fault.  Return false if
+ * there is no work to do, which can only happen with @fault == FAULT_NO.
+ */
+bool sve_cont_ldst_pages(SVEContLdSt *info, SVEContFault fault,
+                         CPUARMState *env, target_ulong addr,
+                         MMUAccessType access_type, uintptr_t retaddr);
+
+#ifdef CONFIG_USER_ONLY
+static inline void
+sve_cont_ldst_watchpoints(SVEContLdSt *info, CPUARMState *env, uint64_t *vg,
+                          target_ulong addr, int esize, int msize,
+                          int wp_access, uintptr_t retaddr)
+{ }
+#else
+void sve_cont_ldst_watchpoints(SVEContLdSt *info, CPUARMState *env,
+                               uint64_t *vg, target_ulong addr,
+                               int esize, int msize, int wp_access,
+                               uintptr_t retaddr);
+#endif
+
+void sve_cont_ldst_mte_check(SVEContLdSt *info, CPUARMState *env, uint64_t *vg,
+                             target_ulong addr, int esize, int msize,
+                             uint32_t mtedesc, uintptr_t ra);
+
 #endif /* TARGET_ARM_SVE_LDST_INTERNAL_H */
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index ea4c835689..446d7ac5cb 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -5339,16 +5339,9 @@ static intptr_t find_next_active(uint64_t *vg, intptr_t reg_off,
  * exit via page fault exception.
  */
 
-typedef struct {
-    void *host;
-    int flags;
-    MemTxAttrs attrs;
-} SVEHostPage;
-
-static bool sve_probe_page(SVEHostPage *info, bool nofault,
-                           CPUARMState *env, target_ulong addr,
-                           int mem_off, MMUAccessType access_type,
-                           int mmu_idx, uintptr_t retaddr)
+bool sve_probe_page(SVEHostPage *info, bool nofault, CPUARMState *env,
+                    target_ulong addr, int mem_off, MMUAccessType access_type,
+                    int mmu_idx, uintptr_t retaddr)
 {
     int flags;
 
@@ -5404,59 +5397,13 @@ static bool sve_probe_page(SVEHostPage *info, bool nofault,
     return true;
 }
 
-
-/*
- * Analyse contiguous data, protected by a governing predicate.
- */
-
-typedef enum {
-    FAULT_NO,
-    FAULT_FIRST,
-    FAULT_ALL,
-} SVEContFault;
-
-typedef struct {
-    /*
-     * First and last element wholly contained within the two pages.
-     * mem_off_first[0] and reg_off_first[0] are always set >= 0.
-     * reg_off_last[0] may be < 0 if the first element crosses pages.
-     * All of mem_off_first[1], reg_off_first[1] and reg_off_last[1]
-     * are set >= 0 only if there are complete elements on a second page.
-     *
-     * The reg_off_* offsets are relative to the internal vector register.
-     * The mem_off_first offset is relative to the memory address; the
-     * two offsets are different when a load operation extends, a store
-     * operation truncates, or for multi-register operations.
-     */
-    int16_t mem_off_first[2];
-    int16_t reg_off_first[2];
-    int16_t reg_off_last[2];
-
-    /*
-     * One element that is misaligned and spans both pages,
-     * or -1 if there is no such active element.
-     */
-    int16_t mem_off_split;
-    int16_t reg_off_split;
-
-    /*
-     * The byte offset at which the entire operation crosses a page boundary.
-     * Set >= 0 if and only if the entire operation spans two pages.
-     */
-    int16_t page_split;
-
-    /* TLB data for the two pages. */
-    SVEHostPage page[2];
-} SVEContLdSt;
-
 /*
  * Find first active element on each page, and a loose bound for the
  * final element on each page.  Identify any single element that spans
  * the page boundary.  Return true if there are any active elements.
  */
-static bool sve_cont_ldst_elements(SVEContLdSt *info, target_ulong addr,
-                                   uint64_t *vg, intptr_t reg_max,
-                                   int esz, int msize)
+bool sve_cont_ldst_elements(SVEContLdSt *info, target_ulong addr, uint64_t *vg,
+                            intptr_t reg_max, int esz, int msize)
 {
     const int esize = 1 << esz;
     const uint64_t pg_mask = pred_esz_masks[esz];
@@ -5546,9 +5493,9 @@ static bool sve_cont_ldst_elements(SVEContLdSt *info, target_ulong addr,
  * Control the generation of page faults with @fault.  Return false if
  * there is no work to do, which can only happen with @fault == FAULT_NO.
  */
-static bool sve_cont_ldst_pages(SVEContLdSt *info, SVEContFault fault,
-                                CPUARMState *env, target_ulong addr,
-                                MMUAccessType access_type, uintptr_t retaddr)
+bool sve_cont_ldst_pages(SVEContLdSt *info, SVEContFault fault,
+                         CPUARMState *env, target_ulong addr,
+                         MMUAccessType access_type, uintptr_t retaddr)
 {
     int mmu_idx = cpu_mmu_index(env, false);
     int mem_off = info->mem_off_first[0];
@@ -5604,12 +5551,12 @@ static bool sve_cont_ldst_pages(SVEContLdSt *info, SVEContFault fault,
     return have_work;
 }
 
-static void sve_cont_ldst_watchpoints(SVEContLdSt *info, CPUARMState *env,
-                                      uint64_t *vg, target_ulong addr,
-                                      int esize, int msize, int wp_access,
-                                      uintptr_t retaddr)
-{
 #ifndef CONFIG_USER_ONLY
+void sve_cont_ldst_watchpoints(SVEContLdSt *info, CPUARMState *env,
+                               uint64_t *vg, target_ulong addr,
+                               int esize, int msize, int wp_access,
+                               uintptr_t retaddr)
+{
     intptr_t mem_off, reg_off, reg_last;
     int flags0 = info->page[0].flags;
     int flags1 = info->page[1].flags;
@@ -5665,12 +5612,12 @@ static void sve_cont_ldst_watchpoints(SVEContLdSt *info, CPUARMState *env,
             } while (reg_off & 63);
         } while (reg_off <= reg_last);
     }
-#endif
 }
+#endif
 
-static void sve_cont_ldst_mte_check(SVEContLdSt *info, CPUARMState *env,
-                                    uint64_t *vg, target_ulong addr, int esize,
-                                    int msize, uint32_t mtedesc, uintptr_t ra)
+void sve_cont_ldst_mte_check(SVEContLdSt *info, CPUARMState *env,
+                             uint64_t *vg, target_ulong addr, int esize,
+                             int msize, uint32_t mtedesc, uintptr_t ra)
 {
     intptr_t mem_off, reg_off, reg_last;
 
-- 
2.34.1




* [PATCH v3 12/15] target/arm: Move expand_pred_b to vec_internal.h
  2022-05-27 18:06 [PATCH v3 00/15] target/arm: SME prep patches Richard Henderson
                   ` (10 preceding siblings ...)
  2022-05-27 18:06 ` [PATCH v3 11/15] target/arm: Export sve contiguous ldst support functions Richard Henderson
@ 2022-05-27 18:06 ` Richard Henderson
  2022-05-31 12:30   ` Peter Maydell
  2022-05-27 18:06 ` [PATCH v3 13/15] target/arm: Use expand_pred_b in mve_helper.c Richard Henderson
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2022-05-27 18:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Put the inline function near the array declaration.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/vec_internal.h | 8 +++++++-
 target/arm/sve_helper.c   | 9 ---------
 2 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/target/arm/vec_internal.h b/target/arm/vec_internal.h
index 1d63402042..d1a1ea4a66 100644
--- a/target/arm/vec_internal.h
+++ b/target/arm/vec_internal.h
@@ -50,8 +50,14 @@
 #define H8(x)   (x)
 #define H1_8(x) (x)
 
-/* Data for expanding active predicate bits to bytes, for byte elements. */
+/*
+ * Expand active predicate bits to bytes, for byte elements.
+ */
 extern const uint64_t expand_pred_b_data[256];
+static inline uint64_t expand_pred_b(uint8_t byte)
+{
+    return expand_pred_b_data[byte];
+}
 
 static inline void clear_tail(void *vd, uintptr_t opr_sz, uintptr_t max_sz)
 {
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 446d7ac5cb..b8a37dd1eb 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -103,15 +103,6 @@ uint32_t HELPER(sve_predtest)(void *vd, void *vg, uint32_t words)
     return flags;
 }
 
-/*
- * Expand active predicate bits to bytes, for byte elements.
- * (The data table itself is in vec_helper.c as MVE also needs it.)
- */
-static inline uint64_t expand_pred_b(uint8_t byte)
-{
-    return expand_pred_b_data[byte];
-}
-
 /* Similarly for half-word elements.
  *  for (i = 0; i < 256; ++i) {
  *      unsigned long m = 0;
-- 
2.34.1




* [PATCH v3 13/15] target/arm: Use expand_pred_b in mve_helper.c
  2022-05-27 18:06 [PATCH v3 00/15] target/arm: SME prep patches Richard Henderson
                   ` (11 preceding siblings ...)
  2022-05-27 18:06 ` [PATCH v3 12/15] target/arm: Move expand_pred_b to vec_internal.h Richard Henderson
@ 2022-05-27 18:06 ` Richard Henderson
  2022-05-31 12:33   ` Peter Maydell
  2022-05-27 18:06 ` [PATCH v3 14/15] target/arm: Move expand_pred_h to vec_internal.h Richard Henderson
  2022-05-27 18:06 ` [PATCH v3 15/15] target/arm: Export bfdotadd from vec_helper.c Richard Henderson
  14 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2022-05-27 18:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Use the function instead of the array directly.

Because the function performs its own masking, via the uint8_t
parameter, we need to do nothing extra within the users: the bits
above the first 2 (_uh) or 4 (_uw) will be discarded by assignment
to the local bmask variables, and of course _uq uses the entire
uint64_t result.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/mve_helper.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index 846962bf4c..403b345ea3 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -726,7 +726,7 @@ static void mergemask_sb(int8_t *d, int8_t r, uint16_t mask)
 
 static void mergemask_uh(uint16_t *d, uint16_t r, uint16_t mask)
 {
-    uint16_t bmask = expand_pred_b_data[mask & 3];
+    uint16_t bmask = expand_pred_b(mask);
     *d = (*d & ~bmask) | (r & bmask);
 }
 
@@ -737,7 +737,7 @@ static void mergemask_sh(int16_t *d, int16_t r, uint16_t mask)
 
 static void mergemask_uw(uint32_t *d, uint32_t r, uint16_t mask)
 {
-    uint32_t bmask = expand_pred_b_data[mask & 0xf];
+    uint32_t bmask = expand_pred_b(mask);
     *d = (*d & ~bmask) | (r & bmask);
 }
 
@@ -748,7 +748,7 @@ static void mergemask_sw(int32_t *d, int32_t r, uint16_t mask)
 
 static void mergemask_uq(uint64_t *d, uint64_t r, uint16_t mask)
 {
-    uint64_t bmask = expand_pred_b_data[mask & 0xff];
+    uint64_t bmask = expand_pred_b(mask);
     *d = (*d & ~bmask) | (r & bmask);
 }
 
-- 
2.34.1
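
The masking argument in the commit message above can be checked mechanically. The sketch below is not QEMU code: it replaces the expand_pred_b_data table with a computed stand-in (expand_pred_b_ref, a hypothetical name) on the assumption that bit i of the predicate byte selects byte i of the result, then compares the old explicit-mask form with the new form for every 16-bit mask value.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the table lookup (assumption): bit i set -> byte i is 0xff. */
static uint64_t expand_pred_b_ref(uint8_t byte)
{
    uint64_t m = 0;
    for (int i = 0; i < 8; i++) {
        if ((byte >> i) & 1) {
            m |= 0xffull << (i * 8);
        }
    }
    return m;
}

/* Old forms: the caller masks the predicate before the lookup. */
static uint16_t bmask_uh_old(uint16_t mask) { return (uint16_t)expand_pred_b_ref(mask & 3); }
static uint32_t bmask_uw_old(uint16_t mask) { return (uint32_t)expand_pred_b_ref(mask & 0xf); }
static uint64_t bmask_uq_old(uint16_t mask) { return expand_pred_b_ref(mask & 0xff); }

/*
 * New forms: the uint8_t parameter drops predicate bits 8..15, and the
 * narrowing to the result type drops the unwanted high result bytes.
 */
static uint16_t bmask_uh_new(uint16_t mask) { return (uint16_t)expand_pred_b_ref((uint8_t)mask); }
static uint32_t bmask_uw_new(uint16_t mask) { return (uint32_t)expand_pred_b_ref((uint8_t)mask); }
static uint64_t bmask_uq_new(uint16_t mask) { return expand_pred_b_ref((uint8_t)mask); }

/* Returns 1 if old and new forms agree for every possible 16-bit mask. */
static int mergemask_forms_agree(void)
{
    for (uint32_t m = 0; m <= 0xffff; m++) {
        uint16_t mask = (uint16_t)m;
        if (bmask_uh_old(mask) != bmask_uh_new(mask) ||
            bmask_uw_old(mask) != bmask_uw_new(mask) ||
            bmask_uq_old(mask) != bmask_uq_new(mask)) {
            return 0;
        }
    }
    return 1;
}

/* Small self-check that can be called from a test harness. */
static void mergemask_check(void)
{
    assert(mergemask_forms_agree());
}
```

Calling mergemask_check() exercises all 65536 mask values; the equivalence relies only on integer truncation, not on any property of the table beyond the assumed bit-to-byte mapping.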




* [PATCH v3 14/15] target/arm: Move expand_pred_h to vec_internal.h
  2022-05-27 18:06 [PATCH v3 00/15] target/arm: SME prep patches Richard Henderson
                   ` (12 preceding siblings ...)
  2022-05-27 18:06 ` [PATCH v3 13/15] target/arm: Use expand_pred_b in mve_helper.c Richard Henderson
@ 2022-05-27 18:06 ` Richard Henderson
  2022-05-31 12:34   ` Peter Maydell
  2022-05-27 18:06 ` [PATCH v3 15/15] target/arm: Export bfdotadd from vec_helper.c Richard Henderson
  14 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2022-05-27 18:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Move the data to vec_helper.c and the inline to vec_internal.h.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/vec_internal.h |  7 +++++++
 target/arm/sve_helper.c   | 29 -----------------------------
 target/arm/vec_helper.c   | 26 ++++++++++++++++++++++++++
 3 files changed, 33 insertions(+), 29 deletions(-)

diff --git a/target/arm/vec_internal.h b/target/arm/vec_internal.h
index d1a1ea4a66..43cff5ec7c 100644
--- a/target/arm/vec_internal.h
+++ b/target/arm/vec_internal.h
@@ -59,6 +59,13 @@ static inline uint64_t expand_pred_b(uint8_t byte)
     return expand_pred_b_data[byte];
 }
 
+/* Similarly for half-word elements. */
+extern const uint64_t expand_pred_h_data[0x55+1];
+static inline uint64_t expand_pred_h(uint8_t byte)
+{
+    return expand_pred_h_data[byte & 0x55];
+}
+
 static inline void clear_tail(void *vd, uintptr_t opr_sz, uintptr_t max_sz)
 {
     uint64_t *d = vd + opr_sz;
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index b8a37dd1eb..9a2741b20f 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -103,35 +103,6 @@ uint32_t HELPER(sve_predtest)(void *vd, void *vg, uint32_t words)
     return flags;
 }
 
-/* Similarly for half-word elements.
- *  for (i = 0; i < 256; ++i) {
- *      unsigned long m = 0;
- *      if (i & 0xaa) {
- *          continue;
- *      }
- *      for (j = 0; j < 8; j += 2) {
- *          if ((i >> j) & 1) {
- *              m |= 0xfffful << (j << 3);
- *          }
- *      }
- *      printf("[0x%x] = 0x%016lx,\n", i, m);
- *  }
- */
-static inline uint64_t expand_pred_h(uint8_t byte)
-{
-    static const uint64_t word[] = {
-        [0x01] = 0x000000000000ffff, [0x04] = 0x00000000ffff0000,
-        [0x05] = 0x00000000ffffffff, [0x10] = 0x0000ffff00000000,
-        [0x11] = 0x0000ffff0000ffff, [0x14] = 0x0000ffffffff0000,
-        [0x15] = 0x0000ffffffffffff, [0x40] = 0xffff000000000000,
-        [0x41] = 0xffff00000000ffff, [0x44] = 0xffff0000ffff0000,
-        [0x45] = 0xffff0000ffffffff, [0x50] = 0xffffffff00000000,
-        [0x51] = 0xffffffff0000ffff, [0x54] = 0xffffffffffff0000,
-        [0x55] = 0xffffffffffffffff,
-    };
-    return word[byte & 0x55];
-}
-
 /* Similarly for single word elements.  */
 static inline uint64_t expand_pred_s(uint8_t byte)
 {
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index 17fb158362..4db68fbbb3 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -127,6 +127,32 @@ const uint64_t expand_pred_b_data[256] = {
     0xffffffffffffffff,
 };
 
+/*
+ * Similarly for half-word elements.
+ *  for (i = 0; i < 256; ++i) {
+ *      unsigned long m = 0;
+ *      if (i & 0xaa) {
+ *          continue;
+ *      }
+ *      for (j = 0; j < 8; j += 2) {
+ *          if ((i >> j) & 1) {
+ *              m |= 0xfffful << (j << 3);
+ *          }
+ *      }
+ *      printf("[0x%x] = 0x%016lx,\n", i, m);
+ *  }
+ */
+const uint64_t expand_pred_h_data[0x55+1] = {
+    [0x01] = 0x000000000000ffff, [0x04] = 0x00000000ffff0000,
+    [0x05] = 0x00000000ffffffff, [0x10] = 0x0000ffff00000000,
+    [0x11] = 0x0000ffff0000ffff, [0x14] = 0x0000ffffffff0000,
+    [0x15] = 0x0000ffffffffffff, [0x40] = 0xffff000000000000,
+    [0x41] = 0xffff00000000ffff, [0x44] = 0xffff0000ffff0000,
+    [0x45] = 0xffff0000ffffffff, [0x50] = 0xffffffff00000000,
+    [0x51] = 0xffffffff0000ffff, [0x54] = 0xffffffffffff0000,
+    [0x55] = 0xffffffffffffffff,
+};
+
 /* Signed saturating rounding doubling multiply-accumulate high half, 8-bit */
 int8_t do_sqrdmlah_b(int8_t src1, int8_t src2, int8_t src3,
                      bool neg, bool round)
-- 
2.34.1
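
The generator loop quoted in the comment above can be turned into a small checkable function. This is a sketch only; expand_pred_h_ref is a hypothetical name, not something in the patch. It also illustrates why the table is declared with 0x55+1 entries: the index is masked with 0x55, so only the even predicate bits matter and 0x55 is the largest index ever used.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical reference version of the half-word expansion: each even
 * predicate bit j (0, 2, 4, 6) selects one 16-bit lane of the result.
 * Odd bits are ignored, mirroring the "byte & 0x55" in the patch.
 */
static uint64_t expand_pred_h_ref(uint8_t byte)
{
    uint64_t m = 0;

    byte &= 0x55;
    for (int j = 0; j < 8; j += 2) {
        if ((byte >> j) & 1) {
            /* j << 3 converts a bit index to a bit offset: 0, 16, 32, 48. */
            m |= 0xffffull << (j << 3);
        }
    }
    return m;
}

/* Spot-check a few entries against the values listed in the table. */
static void expand_pred_h_check(void)
{
    assert(expand_pred_h_ref(0x01) == 0x000000000000ffffull);
    assert(expand_pred_h_ref(0x14) == 0x0000ffffffff0000ull);
    assert(expand_pred_h_ref(0x55) == 0xffffffffffffffffull);
}
```

The three values checked in expand_pred_h_check() are copied from the [0x01], [0x14] and [0x55] entries of the table in the diff.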




* [PATCH v3 15/15] target/arm: Export bfdotadd from vec_helper.c
  2022-05-27 18:06 [PATCH v3 00/15] target/arm: SME prep patches Richard Henderson
                   ` (13 preceding siblings ...)
  2022-05-27 18:06 ` [PATCH v3 14/15] target/arm: Move expand_pred_h to vec_internal.h Richard Henderson
@ 2022-05-27 18:06 ` Richard Henderson
  2022-05-31 12:35   ` Peter Maydell
  14 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2022-05-27 18:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

We will need this over in sme_helper.c.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/vec_internal.h | 2 ++
 target/arm/vec_helper.c   | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/target/arm/vec_internal.h b/target/arm/vec_internal.h
index 43cff5ec7c..5e50c503aa 100644
--- a/target/arm/vec_internal.h
+++ b/target/arm/vec_internal.h
@@ -230,4 +230,6 @@ uint64_t pmull_h(uint64_t op1, uint64_t op2);
  */
 uint64_t pmull_w(uint64_t op1, uint64_t op2);
 
+float32 bfdotadd(float32 sum, uint32_t e1, uint32_t e2);
+
 #endif /* TARGET_ARM_VEC_INTERNAL_H */
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index 4db68fbbb3..b3e8039cdb 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -2557,7 +2557,7 @@ DO_MMLA_B(gvec_usmmla_b, do_usmmla_b)
  * BFloat16 Dot Product
  */
 
-static float32 bfdotadd(float32 sum, uint32_t e1, uint32_t e2)
+float32 bfdotadd(float32 sum, uint32_t e1, uint32_t e2)
 {
     /* FPCR is ignored for BFDOT and BFMMLA. */
     float_status bf_status = {
-- 
2.34.1
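
The diff shows only the signature of bfdotadd, so the following is a rough sketch of what a function of that shape is understood to compute, not the QEMU implementation. It assumes each uint32_t operand packs two bfloat16 values, and it uses host float arithmetic where the real function uses softfloat with a fixed float_status (the diff notes that FPCR is ignored), so rounding and special-case behaviour will differ. The names bf16_bits_to_float and bfdotadd_sketch are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * A bfloat16 value has the same layout as the top 16 bits of an IEEE
 * binary32, so widening is a 16-bit left shift of the bit pattern.
 */
static float bf16_bits_to_float(uint16_t bits)
{
    uint32_t u = (uint32_t)bits << 16;
    float f;

    memcpy(&f, &u, sizeof(f));
    return f;
}

/*
 * Sketch: multiply the two low halves and the two high halves,
 * then add both products to the running sum.
 */
static float bfdotadd_sketch(float sum, uint32_t e1, uint32_t e2)
{
    float lo = bf16_bits_to_float((uint16_t)(e1 & 0xffff)) *
               bf16_bits_to_float((uint16_t)(e2 & 0xffff));
    float hi = bf16_bits_to_float((uint16_t)(e1 >> 16)) *
               bf16_bits_to_float((uint16_t)(e2 >> 16));

    return sum + (lo + hi);
}

/* 0x3f80 is 1.0 and 0x4000 is 2.0 in bfloat16, so 1*1 + 2*2 = 5. */
static void bfdotadd_sketch_check(void)
{
    assert(bfdotadd_sketch(0.0f, 0x40003f80u, 0x40003f80u) == 5.0f);
}
```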




* Re: [PATCH v3 01/15] target/arm: Rename TBFLAG_A64 ZCR_LEN to SVE_LEN
  2022-05-27 18:06 ` [PATCH v3 01/15] target/arm: Rename TBFLAG_A64 ZCR_LEN to SVE_LEN Richard Henderson
@ 2022-05-31 12:13   ` Peter Maydell
  0 siblings, 0 replies; 30+ messages in thread
From: Peter Maydell @ 2022-05-31 12:13 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, qemu-arm

On Fri, 27 May 2022 at 19:06, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> With SME, the vector length does not only come from ZCR_ELx.
> Comment that this is either the SVE VL, or the Streaming SVE VL.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> --

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM



* Re: [PATCH v3 02/15] linux-user/aarch64: Use SVE_LEN from hflags
  2022-05-27 18:06 ` [PATCH v3 02/15] linux-user/aarch64: Use SVE_LEN from hflags Richard Henderson
@ 2022-05-31 12:15   ` Peter Maydell
  0 siblings, 0 replies; 30+ messages in thread
From: Peter Maydell @ 2022-05-31 12:15 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, qemu-arm

On Fri, 27 May 2022 at 19:07, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Use the digested vector length rather than the raw zcr_el[1] value.
>
> This fixes an incorrect return from do_prctl_set_vl where we didn't
> take into account the set of vector lengths supported by the cpu.
> It also prepares us for Streaming SVE mode, where the vector length
> comes from a different cpreg.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---

> diff --git a/linux-user/aarch64/target_prctl.h b/linux-user/aarch64/target_prctl.h
> index 3f5a5d3933..fcbb90e881 100644
> --- a/linux-user/aarch64/target_prctl.h
> +++ b/linux-user/aarch64/target_prctl.h
> @@ -10,7 +10,7 @@ static abi_long do_prctl_get_vl(CPUArchState *env)
>  {
>      ARMCPU *cpu = env_archcpu(env);
>      if (cpu_isar_feature(aa64_sve, cpu)) {
> -        return ((cpu->env.vfp.zcr_el[1] & 0xf) + 1) * 16;
> +        return (EX_TBFLAG_A64(env->hflags, SVE_LEN) + 1) * 16;

I think env->hflags should be a private implementation detail
to target/arm and it's a bit odd to see linux-user fishing
around in it directly. Can we hide this behind a suitably
named function, please ?

thanks
-- PMM
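
One possible shape for the accessor requested above, as a self-contained mock rather than real QEMU code: MockCPUState, mock_sve_vq and mock_prctl_get_vl are all hypothetical names, and the single struct field stands in for the SVE_LEN field that the patch extracts from env->hflags. The point is only the layering: target/arm owns how the length is stored, and linux-user sees a function returning a length.

```c
#include <assert.h>
#include <stdint.h>

/* Mock of the relevant CPU state; the real CPUARMState is far larger. */
typedef struct {
    uint32_t hflags_sve_len;   /* stand-in for the SVE_LEN field of hflags */
} MockCPUState;

/*
 * Hypothetical accessor that target/arm would own: vector length in
 * 128-bit quadwords.  The stored field is "quadwords minus one".
 */
static uint32_t mock_sve_vq(const MockCPUState *env)
{
    return env->hflags_sve_len + 1;
}

/* linux-user side: converts quadwords to bytes without touching hflags. */
static long mock_prctl_get_vl(const MockCPUState *env)
{
    return (long)mock_sve_vq(env) * 16;
}

/* A stored value of 0 means one quadword, i.e. a 16-byte vector. */
static void mock_prctl_check(void)
{
    MockCPUState env = { .hflags_sve_len = 0 };

    assert(mock_prctl_get_vl(&env) == 16);
}
```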



* Re: [PATCH v3 03/15] target/arm: Do not use aarch64_sve_zcr_get_valid_len in reset
  2022-05-27 18:06 ` [PATCH v3 03/15] target/arm: Do not use aarch64_sve_zcr_get_valid_len in reset Richard Henderson
@ 2022-05-31 12:15   ` Peter Maydell
  2022-05-31 14:28     ` Richard Henderson
  0 siblings, 1 reply; 30+ messages in thread
From: Peter Maydell @ 2022-05-31 12:15 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, qemu-arm

On Fri, 27 May 2022 at 19:07, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> We don't need to constrain the value set in zcr_el[1],
> because it will be done by sve_zcr_len_for_el.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/cpu.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> index d2bd74c2ed..0621944167 100644
> --- a/target/arm/cpu.c
> +++ b/target/arm/cpu.c
> @@ -208,8 +208,7 @@ static void arm_cpu_reset(DeviceState *dev)
>                                           CPACR_EL1, ZEN, 3);
>          /* with reasonable vector length */
>          if (cpu_isar_feature(aa64_sve, cpu)) {
> -            env->vfp.zcr_el[1] =
> -                aarch64_sve_zcr_get_valid_len(cpu, cpu->sve_default_vq - 1);
> +            env->vfp.zcr_el[1] = cpu->sve_default_vq - 1;
>          }

I'm still not a fan of the zcr_el[] value not actually being
a valid one. I'd rather we constrained it when we write the
value into the field.

thanks
-- PMM



* Re: [PATCH v3 06/15] target/arm: Rename sve_zcr_len_for_el to sve_vqm1_for_el
  2022-05-27 18:06 ` [PATCH v3 06/15] target/arm: Rename sve_zcr_len_for_el to sve_vqm1_for_el Richard Henderson
@ 2022-05-31 12:19   ` Peter Maydell
  0 siblings, 0 replies; 30+ messages in thread
From: Peter Maydell @ 2022-05-31 12:19 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, qemu-arm

On Fri, 27 May 2022 at 19:11, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> This will be used for both Normal and Streaming SVE, and the value
> does not necessarily come from ZCR_ELx.  While we're at it, emphasize
> the units in which the value is returned.
>
> Patch produced by
>     git grep -l sve_zcr_len_for_el | \
>     xargs -n1 sed -i 's/sve_zcr_len_for_el/sve_vqm1_for_el/g'
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Can we have a comment that says what a vqm1 is (and/or pick a less
obscure function name)? That string doesn't turn up anywhere in the
Arm ARM...

Otherwise
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM



* Re: [PATCH v3 08/15] target/arm: Add el_is_in_host
  2022-05-27 18:06 ` [PATCH v3 08/15] target/arm: Add el_is_in_host Richard Henderson
@ 2022-05-31 12:24   ` Peter Maydell
  0 siblings, 0 replies; 30+ messages in thread
From: Peter Maydell @ 2022-05-31 12:24 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, qemu-arm

On Fri, 27 May 2022 at 19:18, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> This (newish) ARM pseudocode function is easier to work with
> than open-coded tests for HCR_E2H etc.  Use of the function
> will be staged into the code base in parts.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM



* Re: [PATCH v3 09/15] target/arm: Use el_is_in_host for sve_vqm1_for_el
  2022-05-27 18:06 ` [PATCH v3 09/15] target/arm: Use el_is_in_host for sve_vqm1_for_el Richard Henderson
@ 2022-05-31 12:26   ` Peter Maydell
  0 siblings, 0 replies; 30+ messages in thread
From: Peter Maydell @ 2022-05-31 12:26 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, qemu-arm

On Fri, 27 May 2022 at 19:13, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> The ARM pseudocode function NVL uses this predicate now,
> and I think it's a bit clearer.  Simplify the pseudocode
> condition by noting that IsInHost is always false for EL1.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM



* Re: [PATCH v3 10/15] target/arm: Split out load/store primitives to sve_ldst_internal.h
  2022-05-27 18:06 ` [PATCH v3 10/15] target/arm: Split out load/store primitives to sve_ldst_internal.h Richard Henderson
@ 2022-05-31 12:26   ` Peter Maydell
  0 siblings, 0 replies; 30+ messages in thread
From: Peter Maydell @ 2022-05-31 12:26 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, qemu-arm

On Fri, 27 May 2022 at 19:18, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Begin creation of sve_ldst_internal.h by moving the primitives
> that access host and tlb memory.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM



* Re: [PATCH v3 11/15] target/arm: Export sve contiguous ldst support functions
  2022-05-27 18:06 ` [PATCH v3 11/15] target/arm: Export sve contiguous ldst support functions Richard Henderson
@ 2022-05-31 12:27   ` Peter Maydell
  0 siblings, 0 replies; 30+ messages in thread
From: Peter Maydell @ 2022-05-31 12:27 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, qemu-arm

On Fri, 27 May 2022 at 19:12, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Export all of the support functions for performing bulk
> fault analysis on a set of elements at contiguous addresses
> controlled by a predicate.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM



* Re: [PATCH v3 12/15] target/arm: Move expand_pred_b to vec_internal.h
  2022-05-27 18:06 ` [PATCH v3 12/15] target/arm: Move expand_pred_b to vec_internal.h Richard Henderson
@ 2022-05-31 12:30   ` Peter Maydell
  0 siblings, 0 replies; 30+ messages in thread
From: Peter Maydell @ 2022-05-31 12:30 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, qemu-arm

On Fri, 27 May 2022 at 19:11, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Put the inline function near the array declaration.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/vec_internal.h | 8 +++++++-
>  target/arm/sve_helper.c   | 9 ---------
>  2 files changed, 7 insertions(+), 10 deletions(-)
>
> diff --git a/target/arm/vec_internal.h b/target/arm/vec_internal.h
> index 1d63402042..d1a1ea4a66 100644
> --- a/target/arm/vec_internal.h
> +++ b/target/arm/vec_internal.h
> @@ -50,8 +50,14 @@
>  #define H8(x)   (x)
>  #define H1_8(x) (x)
>
> -/* Data for expanding active predicate bits to bytes, for byte elements. */
> +/*
> + * Expand active predicate bits to bytes, for byte elements.
> + */
>  extern const uint64_t expand_pred_b_data[256];
> +static inline uint64_t expand_pred_b(uint8_t byte)
> +{
> +    return expand_pred_b_data[byte];
> +}

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM



* Re: [PATCH v3 13/15] target/arm: Use expand_pred_b in mve_helper.c
  2022-05-27 18:06 ` [PATCH v3 13/15] target/arm: Use expand_pred_b in mve_helper.c Richard Henderson
@ 2022-05-31 12:33   ` Peter Maydell
  0 siblings, 0 replies; 30+ messages in thread
From: Peter Maydell @ 2022-05-31 12:33 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, qemu-arm

On Fri, 27 May 2022 at 19:14, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Use the function instead of the array directly.
>
> Because the function performs its own masking, via the uint8_t
> parameter, we need to nothing extra within the users: the bits

"to do"

> above the first 2 (_uh) or 4 (_uw) will be discarded by assignment
> to the local bmask variables, and of course _uq uses the entire
> uint64_t result.

Otherwise
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM



* Re: [PATCH v3 14/15] target/arm: Move expand_pred_h to vec_internal.h
  2022-05-27 18:06 ` [PATCH v3 14/15] target/arm: Move expand_pred_h to vec_internal.h Richard Henderson
@ 2022-05-31 12:34   ` Peter Maydell
  0 siblings, 0 replies; 30+ messages in thread
From: Peter Maydell @ 2022-05-31 12:34 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, qemu-arm

On Fri, 27 May 2022 at 19:17, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Move the data to vec_helper.c and the inline to vec_internal.h.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM



* Re: [PATCH v3 15/15] target/arm: Export bfdotadd from vec_helper.c
  2022-05-27 18:06 ` [PATCH v3 15/15] target/arm: Export bfdotadd from vec_helper.c Richard Henderson
@ 2022-05-31 12:35   ` Peter Maydell
  0 siblings, 0 replies; 30+ messages in thread
From: Peter Maydell @ 2022-05-31 12:35 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, qemu-arm

On Fri, 27 May 2022 at 19:19, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> We will need this over in sme_helper.c.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/vec_internal.h | 2 ++
>  target/arm/vec_helper.c   | 2 +-
>  2 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/target/arm/vec_internal.h b/target/arm/vec_internal.h
> index 43cff5ec7c..5e50c503aa 100644
> --- a/target/arm/vec_internal.h
> +++ b/target/arm/vec_internal.h
> @@ -230,4 +230,6 @@ uint64_t pmull_h(uint64_t op1, uint64_t op2);
>   */
>  uint64_t pmull_w(uint64_t op1, uint64_t op2);
>
> +float32 bfdotadd(float32 sum, uint32_t e1, uint32_t e2);
> +

A brief doc comment would be nice.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM



* Re: [PATCH v3 03/15] target/arm: Do not use aarch64_sve_zcr_get_valid_len in reset
  2022-05-31 12:15   ` Peter Maydell
@ 2022-05-31 14:28     ` Richard Henderson
  2022-05-31 14:55       ` Peter Maydell
  0 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2022-05-31 14:28 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-devel, qemu-arm

On 5/31/22 05:15, Peter Maydell wrote:
> On Fri, 27 May 2022 at 19:07, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> We don't need to constrain the value set in zcr_el[1],
>> because it will be done by sve_zcr_len_for_el.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>   target/arm/cpu.c | 3 +--
>>   1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/target/arm/cpu.c b/target/arm/cpu.c
>> index d2bd74c2ed..0621944167 100644
>> --- a/target/arm/cpu.c
>> +++ b/target/arm/cpu.c
>> @@ -208,8 +208,7 @@ static void arm_cpu_reset(DeviceState *dev)
>>                                            CPACR_EL1, ZEN, 3);
>>           /* with reasonable vector length */
>>           if (cpu_isar_feature(aa64_sve, cpu)) {
>> -            env->vfp.zcr_el[1] =
>> -                aarch64_sve_zcr_get_valid_len(cpu, cpu->sve_default_vq - 1);
>> +            env->vfp.zcr_el[1] = cpu->sve_default_vq - 1;
>>           }
> 
> I'm still not a fan of the zcr_el[] value not actually being
> a valid one. I'd rather we constrained it when we write the
> value into the field.

It is an architecturally valid value, exactly like the kernel might write while probing 
for supported vector lengths.  It will result in this, or the next smaller supported 
vector size, being selected.


r~
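
The "this, or the next smaller supported vector size" behaviour described above can be sketched as a search over a bitmap of supported lengths. This is an illustration under stated assumptions, not the QEMU implementation: select_vqm1 is a hypothetical name, and the layout (bit n set means a length of n+1 quadwords is supported) is assumed for the example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Given a bitmap of supported vector lengths and a requested length
 * (both in "quadwords minus one" units), return the largest supported
 * length that does not exceed the request.
 * Assumed layout: bit n of vq_map set => (n + 1) * 16 bytes supported.
 */
static uint32_t select_vqm1(uint32_t vq_map, uint32_t requested_vqm1)
{
    requested_vqm1 &= 0xf;    /* the length field is treated as 4 bits here */

    for (int n = (int)requested_vqm1; n >= 0; n--) {
        if (vq_map & (1u << n)) {
            return (uint32_t)n;
        }
    }
    /* Fallback for the sketch: the minimum length, one quadword. */
    return 0;
}

/* With lengths of 1, 2 and 4 quadwords supported, a request for 3 yields 2. */
static void select_vqm1_check(void)
{
    assert(select_vqm1(0xb, 2) == 1);
}
```

Under this model the reset value written in the patch needs no pre-validation, because the same search is applied whenever the effective length is computed.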



* Re: [PATCH v3 03/15] target/arm: Do not use aarch64_sve_zcr_get_valid_len in reset
  2022-05-31 14:28     ` Richard Henderson
@ 2022-05-31 14:55       ` Peter Maydell
  0 siblings, 0 replies; 30+ messages in thread
From: Peter Maydell @ 2022-05-31 14:55 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, qemu-arm

On Tue, 31 May 2022 at 15:28, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 5/31/22 05:15, Peter Maydell wrote:
> > On Fri, 27 May 2022 at 19:07, Richard Henderson
> > <richard.henderson@linaro.org> wrote:
> >>
> >> We don't need to constrain the value set in zcr_el[1],
> >> because it will be done by sve_zcr_len_for_el.
> >>
> >> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> >> ---
> >>   target/arm/cpu.c | 3 +--
> >>   1 file changed, 1 insertion(+), 2 deletions(-)
> >>
> >> diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> >> index d2bd74c2ed..0621944167 100644
> >> --- a/target/arm/cpu.c
> >> +++ b/target/arm/cpu.c
> >> @@ -208,8 +208,7 @@ static void arm_cpu_reset(DeviceState *dev)
> >>                                            CPACR_EL1, ZEN, 3);
> >>           /* with reasonable vector length */
> >>           if (cpu_isar_feature(aa64_sve, cpu)) {
> >> -            env->vfp.zcr_el[1] =
> >> -                aarch64_sve_zcr_get_valid_len(cpu, cpu->sve_default_vq - 1);
> >> +            env->vfp.zcr_el[1] = cpu->sve_default_vq - 1;
> >>           }
> >
> > I'm still not a fan of the zcr_el[] value not actually being
> > a valid one. I'd rather we constrained it when we write the
> > value into the field.
>
> It is an architecturally valid value, exactly like the kernel might write while probing
> for supported vector lengths.  It will result in this, or the next smaller supported
> vector size, being selected.

Mmm, I guess so (having re-read the ZCR_EL1 docs).

-- PMM



