All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/6] target/arm: assorted mte fixes
@ 2024-02-07  2:52 Richard Henderson
  2024-02-07  2:52 ` [PATCH v3 1/6] linux-user/aarch64: Choose SYNC as the preferred MTE mode Richard Henderson
                   ` (7 more replies)
  0 siblings, 8 replies; 14+ messages in thread
From: Richard Henderson @ 2024-02-07  2:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm, gustavo.romero

Changes for v3:
  - As if /sys/devices/system/cpu/cpu<N>/mte_tcf_preferred is "sync".
  - Fix do_st_zpa as well as do_ld_zpa.  Oops.

Because of the above, I dropped Gustavo's t-b.


r~


Richard Henderson (6):
  linux-user/aarch64: Choose SYNC as the preferred MTE mode
  target/arm: Fix nregs computation in do_{ld,st}_zpa
  target/arm: Adjust and validate mtedesc sizem1
  target/arm: Split out make_svemte_desc
  target/arm: Handle mte in do_ldrq, do_ldro
  target/arm: Fix SVE/SME gross MTE suppression checks

 linux-user/aarch64/target_prctl.h | 29 ++++++-----
 target/arm/internals.h            |  2 +-
 target/arm/tcg/translate-a64.h    |  2 +
 target/arm/tcg/sme_helper.c       |  8 +--
 target/arm/tcg/sve_helper.c       | 12 ++---
 target/arm/tcg/translate-sme.c    | 15 ++----
 target/arm/tcg/translate-sve.c    | 83 ++++++++++++++++++-------------
 7 files changed, 83 insertions(+), 68 deletions(-)

-- 
2.34.1



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v3 1/6] linux-user/aarch64: Choose SYNC as the preferred MTE mode
  2024-02-07  2:52 [PATCH v3 0/6] target/arm: assorted mte fixes Richard Henderson
@ 2024-02-07  2:52 ` Richard Henderson
  2024-02-07 20:03   ` Gustavo Romero
  2024-02-07  2:52 ` [PATCH v3 2/6] target/arm: Fix nregs computation in do_{ld,st}_zpa Richard Henderson
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 14+ messages in thread
From: Richard Henderson @ 2024-02-07  2:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm, gustavo.romero, qemu-stable

The API does not generate an error for setting ASYNC | SYNC; that merely
constrains the selection vs the per-cpu default.  For qemu linux-user,
choose SYNC as the default.

Cc: qemu-stable@nongnu.org
Reported-by: Gustavo Romero <gustavo.romero@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 linux-user/aarch64/target_prctl.h | 29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/linux-user/aarch64/target_prctl.h b/linux-user/aarch64/target_prctl.h
index 5067e7d731..aa8e203c15 100644
--- a/linux-user/aarch64/target_prctl.h
+++ b/linux-user/aarch64/target_prctl.h
@@ -173,21 +173,26 @@ static abi_long do_prctl_set_tagged_addr_ctrl(CPUArchState *env, abi_long arg2)
     env->tagged_addr_enable = arg2 & PR_TAGGED_ADDR_ENABLE;
 
     if (cpu_isar_feature(aa64_mte, cpu)) {
-        switch (arg2 & PR_MTE_TCF_MASK) {
-        case PR_MTE_TCF_NONE:
-        case PR_MTE_TCF_SYNC:
-        case PR_MTE_TCF_ASYNC:
-            break;
-        default:
-            return -EINVAL;
-        }
-
         /*
          * Write PR_MTE_TCF to SCTLR_EL1[TCF0].
-         * Note that the syscall values are consistent with hw.
+         *
+         * The kernel has a per-cpu configuration for the sysadmin,
+         * /sys/devices/system/cpu/cpu<N>/mte_tcf_preferred,
+         * which qemu does not implement.
+         *
+         * Because there is no performance difference between the modes, and
+         * because SYNC is most useful for debugging MTE errors, choose SYNC
+         * as the preferred mode.  With this preference, and the way the API
+         * uses only two bits, there is no way for the program to select
+         * ASYMM mode.
          */
-        env->cp15.sctlr_el[1] =
-            deposit64(env->cp15.sctlr_el[1], 38, 2, arg2 >> PR_MTE_TCF_SHIFT);
+        unsigned tcf = 0;
+        if (arg2 & PR_MTE_TCF_SYNC) {
+            tcf = 1;
+        } else if (arg2 & PR_MTE_TCF_ASYNC) {
+            tcf = 2;
+        }
+        env->cp15.sctlr_el[1] = deposit64(env->cp15.sctlr_el[1], 38, 2, tcf);
 
         /*
          * Write PR_MTE_TAG to GCR_EL1[Exclude].
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 2/6] target/arm: Fix nregs computation in do_{ld,st}_zpa
  2024-02-07  2:52 [PATCH v3 0/6] target/arm: assorted mte fixes Richard Henderson
  2024-02-07  2:52 ` [PATCH v3 1/6] linux-user/aarch64: Choose SYNC as the preferred MTE mode Richard Henderson
@ 2024-02-07  2:52 ` Richard Henderson
  2024-02-08 16:24   ` [PATCH v3 2/6] target/arm: Fix nregs computation in do_{ld, st}_zpa Peter Maydell
  2024-02-07  2:52 ` [PATCH v3 3/6] target/arm: Adjust and validate mtedesc sizem1 Richard Henderson
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 14+ messages in thread
From: Richard Henderson @ 2024-02-07  2:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm, gustavo.romero, qemu-stable

The field is encoded as [0-3], which is convenient for
indexing our array of function pointers, but the true
value is [1-4].  Adjust before calling do_mem_zpa.

Add an assert, and move the comment re passing ZT to
the helper back next to the relevant code.

Cc: qemu-stable@nongnu.org
Fixes: 206adacfb8d ("target/arm: Add mte helpers for sve scalar + int loads")
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/translate-sve.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index 296e7d1ce2..7108938251 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -4445,11 +4445,7 @@ static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
     TCGv_ptr t_pg;
     int desc = 0;
 
-    /*
-     * For e.g. LD4, there are not enough arguments to pass all 4
-     * registers as pointers, so encode the regno into the data field.
-     * For consistency, do this even for LD1.
-     */
+    assert(mte_n >= 1 && mte_n <= 4);
     if (s->mte_active[0]) {
         int msz = dtype_msz(dtype);
 
@@ -4463,6 +4459,11 @@ static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
         addr = clean_data_tbi(s, addr);
     }
 
+    /*
+     * For e.g. LD4, there are not enough arguments to pass all 4
+     * registers as pointers, so encode the regno into the data field.
+     * For consistency, do this even for LD1.
+     */
     desc = simd_desc(vsz, vsz, zt | desc);
     t_pg = tcg_temp_new_ptr();
 
@@ -4600,7 +4601,7 @@ static void do_ld_zpa(DisasContext *s, int zt, int pg,
      * accessible via the instruction encoding.
      */
     assert(fn != NULL);
-    do_mem_zpa(s, zt, pg, addr, dtype, nreg, false, fn);
+    do_mem_zpa(s, zt, pg, addr, dtype, nreg + 1, false, fn);
 }
 
 static bool trans_LD_zprr(DisasContext *s, arg_rprr_load *a)
@@ -5168,14 +5169,13 @@ static void do_st_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
     if (nreg == 0) {
         /* ST1 */
         fn = fn_single[s->mte_active[0]][be][msz][esz];
-        nreg = 1;
     } else {
         /* ST2, ST3, ST4 -- msz == esz, enforced by encoding */
         assert(msz == esz);
         fn = fn_multiple[s->mte_active[0]][be][nreg - 1][msz];
     }
     assert(fn != NULL);
-    do_mem_zpa(s, zt, pg, addr, msz_dtype(s, msz), nreg, true, fn);
+    do_mem_zpa(s, zt, pg, addr, msz_dtype(s, msz), nreg + 1, true, fn);
 }
 
 static bool trans_ST_zprr(DisasContext *s, arg_rprr_store *a)
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 3/6] target/arm: Adjust and validate mtedesc sizem1
  2024-02-07  2:52 [PATCH v3 0/6] target/arm: assorted mte fixes Richard Henderson
  2024-02-07  2:52 ` [PATCH v3 1/6] linux-user/aarch64: Choose SYNC as the preferred MTE mode Richard Henderson
  2024-02-07  2:52 ` [PATCH v3 2/6] target/arm: Fix nregs computation in do_{ld,st}_zpa Richard Henderson
@ 2024-02-07  2:52 ` Richard Henderson
  2024-02-16 15:12   ` Michael Tokarev
  2024-02-07  2:52 ` [PATCH v3 4/6] target/arm: Split out make_svemte_desc Richard Henderson
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 14+ messages in thread
From: Richard Henderson @ 2024-02-07  2:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm, gustavo.romero, qemu-stable, Peter Maydell

When we added SVE_MTEDESC_SHIFT, we effectively limited the
maximum size of MTEDESC.  Adjust SIZEM1 to consume the remaining
bits (32 - 10 - 5 - 12 == 5).  Assert that the data to be stored
fits within the field (expecting 8 * 4 - 1 == 31, exact fit).

Cc: qemu-stable@nongnu.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/internals.h         | 2 +-
 target/arm/tcg/translate-sve.c | 7 ++++---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index fc337fe40e..50bff44549 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -1278,7 +1278,7 @@ FIELD(MTEDESC, TBI,   4, 2)
 FIELD(MTEDESC, TCMA,  6, 2)
 FIELD(MTEDESC, WRITE, 8, 1)
 FIELD(MTEDESC, ALIGN, 9, 3)
-FIELD(MTEDESC, SIZEM1, 12, SIMD_DATA_BITS - 12)  /* size - 1 */
+FIELD(MTEDESC, SIZEM1, 12, SIMD_DATA_BITS - SVE_MTEDESC_SHIFT - 12)  /* size - 1 */
 
 bool mte_probe(CPUARMState *env, uint32_t desc, uint64_t ptr);
 uint64_t mte_check(CPUARMState *env, uint32_t desc, uint64_t ptr, uintptr_t ra);
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index 7108938251..a88e523cba 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -4443,17 +4443,18 @@ static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
 {
     unsigned vsz = vec_full_reg_size(s);
     TCGv_ptr t_pg;
+    uint32_t sizem1;
     int desc = 0;
 
     assert(mte_n >= 1 && mte_n <= 4);
+    sizem1 = (mte_n << dtype_msz(dtype)) - 1;
+    assert(sizem1 <= R_MTEDESC_SIZEM1_MASK >> R_MTEDESC_SIZEM1_SHIFT);
     if (s->mte_active[0]) {
-        int msz = dtype_msz(dtype);
-
         desc = FIELD_DP32(desc, MTEDESC, MIDX, get_mem_index(s));
         desc = FIELD_DP32(desc, MTEDESC, TBI, s->tbid);
         desc = FIELD_DP32(desc, MTEDESC, TCMA, s->tcma);
         desc = FIELD_DP32(desc, MTEDESC, WRITE, is_write);
-        desc = FIELD_DP32(desc, MTEDESC, SIZEM1, (mte_n << msz) - 1);
+        desc = FIELD_DP32(desc, MTEDESC, SIZEM1, sizem1);
         desc <<= SVE_MTEDESC_SHIFT;
     } else {
         addr = clean_data_tbi(s, addr);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 4/6] target/arm: Split out make_svemte_desc
  2024-02-07  2:52 [PATCH v3 0/6] target/arm: assorted mte fixes Richard Henderson
                   ` (2 preceding siblings ...)
  2024-02-07  2:52 ` [PATCH v3 3/6] target/arm: Adjust and validate mtedesc sizem1 Richard Henderson
@ 2024-02-07  2:52 ` Richard Henderson
  2024-02-07  2:52 ` [PATCH v3 5/6] target/arm: Handle mte in do_ldrq, do_ldro Richard Henderson
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Richard Henderson @ 2024-02-07  2:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm, gustavo.romero, qemu-stable, Peter Maydell

Share code that creates mtedesc and embeds within simd_desc.

Cc: qemu-stable@nongnu.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/translate-a64.h |  2 ++
 target/arm/tcg/translate-sme.c | 15 +++--------
 target/arm/tcg/translate-sve.c | 47 ++++++++++++++++++----------------
 3 files changed, 31 insertions(+), 33 deletions(-)

diff --git a/target/arm/tcg/translate-a64.h b/target/arm/tcg/translate-a64.h
index 96ba39b37e..7b811b8ac5 100644
--- a/target/arm/tcg/translate-a64.h
+++ b/target/arm/tcg/translate-a64.h
@@ -28,6 +28,8 @@ bool logic_imm_decode_wmask(uint64_t *result, unsigned int immn,
 bool sve_access_check(DisasContext *s);
 bool sme_enabled_check(DisasContext *s);
 bool sme_enabled_check_with_svcr(DisasContext *s, unsigned);
+uint32_t make_svemte_desc(DisasContext *s, unsigned vsz, uint32_t nregs,
+                          uint32_t msz, bool is_write, uint32_t data);
 
 /* This function corresponds to CheckStreamingSVEEnabled. */
 static inline bool sme_sm_enabled_check(DisasContext *s)
diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c
index 8f0dfc884e..46c7fce8b4 100644
--- a/target/arm/tcg/translate-sme.c
+++ b/target/arm/tcg/translate-sme.c
@@ -206,7 +206,7 @@ static bool trans_LDST1(DisasContext *s, arg_LDST1 *a)
 
     TCGv_ptr t_za, t_pg;
     TCGv_i64 addr;
-    int svl, desc = 0;
+    uint32_t desc;
     bool be = s->be_data == MO_BE;
     bool mte = s->mte_active[0];
 
@@ -224,18 +224,11 @@ static bool trans_LDST1(DisasContext *s, arg_LDST1 *a)
     tcg_gen_shli_i64(addr, cpu_reg(s, a->rm), a->esz);
     tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, a->rn));
 
-    if (mte) {
-        desc = FIELD_DP32(desc, MTEDESC, MIDX, get_mem_index(s));
-        desc = FIELD_DP32(desc, MTEDESC, TBI, s->tbid);
-        desc = FIELD_DP32(desc, MTEDESC, TCMA, s->tcma);
-        desc = FIELD_DP32(desc, MTEDESC, WRITE, a->st);
-        desc = FIELD_DP32(desc, MTEDESC, SIZEM1, (1 << a->esz) - 1);
-        desc <<= SVE_MTEDESC_SHIFT;
-    } else {
+    if (!mte) {
         addr = clean_data_tbi(s, addr);
     }
-    svl = streaming_vec_reg_size(s);
-    desc = simd_desc(svl, svl, desc);
+
+    desc = make_svemte_desc(s, streaming_vec_reg_size(s), 1, a->esz, a->st, 0);
 
     fns[a->esz][be][a->v][mte][a->st](tcg_env, t_za, t_pg, addr,
                                       tcg_constant_i32(desc));
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index a88e523cba..508f7b6bbd 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -4437,18 +4437,18 @@ static const uint8_t dtype_esz[16] = {
     3, 2, 1, 3
 };
 
-static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
-                       int dtype, uint32_t mte_n, bool is_write,
-                       gen_helper_gvec_mem *fn)
+uint32_t make_svemte_desc(DisasContext *s, unsigned vsz, uint32_t nregs,
+                          uint32_t msz, bool is_write, uint32_t data)
 {
-    unsigned vsz = vec_full_reg_size(s);
-    TCGv_ptr t_pg;
     uint32_t sizem1;
-    int desc = 0;
+    uint32_t desc = 0;
 
-    assert(mte_n >= 1 && mte_n <= 4);
-    sizem1 = (mte_n << dtype_msz(dtype)) - 1;
+    /* Assert all of the data fits, with or without MTE enabled. */
+    assert(nregs >= 1 && nregs <= 4);
+    sizem1 = (nregs << msz) - 1;
     assert(sizem1 <= R_MTEDESC_SIZEM1_MASK >> R_MTEDESC_SIZEM1_SHIFT);
+    assert(data < 1u << SVE_MTEDESC_SHIFT);
+
     if (s->mte_active[0]) {
         desc = FIELD_DP32(desc, MTEDESC, MIDX, get_mem_index(s));
         desc = FIELD_DP32(desc, MTEDESC, TBI, s->tbid);
@@ -4456,7 +4456,18 @@ static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
         desc = FIELD_DP32(desc, MTEDESC, WRITE, is_write);
         desc = FIELD_DP32(desc, MTEDESC, SIZEM1, sizem1);
         desc <<= SVE_MTEDESC_SHIFT;
-    } else {
+    }
+    return simd_desc(vsz, vsz, desc | data);
+}
+
+static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
+                       int dtype, uint32_t nregs, bool is_write,
+                       gen_helper_gvec_mem *fn)
+{
+    TCGv_ptr t_pg;
+    uint32_t desc;
+
+    if (!s->mte_active[0]) {
         addr = clean_data_tbi(s, addr);
     }
 
@@ -4465,7 +4476,8 @@ static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
      * registers as pointers, so encode the regno into the data field.
      * For consistency, do this even for LD1.
      */
-    desc = simd_desc(vsz, vsz, zt | desc);
+    desc = make_svemte_desc(s, vec_full_reg_size(s), nregs,
+                            dtype_msz(dtype), is_write, zt);
     t_pg = tcg_temp_new_ptr();
 
     tcg_gen_addi_ptr(t_pg, tcg_env, pred_full_reg_offset(s, pg));
@@ -5224,25 +5236,16 @@ static void do_mem_zpz(DisasContext *s, int zt, int pg, int zm,
                        int scale, TCGv_i64 scalar, int msz, bool is_write,
                        gen_helper_gvec_mem_scatter *fn)
 {
-    unsigned vsz = vec_full_reg_size(s);
     TCGv_ptr t_zm = tcg_temp_new_ptr();
     TCGv_ptr t_pg = tcg_temp_new_ptr();
     TCGv_ptr t_zt = tcg_temp_new_ptr();
-    int desc = 0;
-
-    if (s->mte_active[0]) {
-        desc = FIELD_DP32(desc, MTEDESC, MIDX, get_mem_index(s));
-        desc = FIELD_DP32(desc, MTEDESC, TBI, s->tbid);
-        desc = FIELD_DP32(desc, MTEDESC, TCMA, s->tcma);
-        desc = FIELD_DP32(desc, MTEDESC, WRITE, is_write);
-        desc = FIELD_DP32(desc, MTEDESC, SIZEM1, (1 << msz) - 1);
-        desc <<= SVE_MTEDESC_SHIFT;
-    }
-    desc = simd_desc(vsz, vsz, desc | scale);
+    uint32_t desc;
 
     tcg_gen_addi_ptr(t_pg, tcg_env, pred_full_reg_offset(s, pg));
     tcg_gen_addi_ptr(t_zm, tcg_env, vec_full_reg_offset(s, zm));
     tcg_gen_addi_ptr(t_zt, tcg_env, vec_full_reg_offset(s, zt));
+
+    desc = make_svemte_desc(s, vec_full_reg_size(s), 1, msz, is_write, scale);
     fn(tcg_env, t_zt, t_pg, t_zm, scalar, tcg_constant_i32(desc));
 }
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 5/6] target/arm: Handle mte in do_ldrq, do_ldro
  2024-02-07  2:52 [PATCH v3 0/6] target/arm: assorted mte fixes Richard Henderson
                   ` (3 preceding siblings ...)
  2024-02-07  2:52 ` [PATCH v3 4/6] target/arm: Split out make_svemte_desc Richard Henderson
@ 2024-02-07  2:52 ` Richard Henderson
  2024-02-07  2:52 ` [PATCH v3 6/6] target/arm: Fix SVE/SME gross MTE suppression checks Richard Henderson
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Richard Henderson @ 2024-02-07  2:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm, gustavo.romero, qemu-stable, Peter Maydell

These functions "use the standard load helpers", but
fail to clean_data_tbi or populate mtedesc.

Cc: qemu-stable@nongnu.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/translate-sve.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index 508f7b6bbd..ada05aa530 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -4861,8 +4861,13 @@ static void do_ldrq(DisasContext *s, int zt, int pg, TCGv_i64 addr, int dtype)
     unsigned vsz = vec_full_reg_size(s);
     TCGv_ptr t_pg;
     int poff;
+    uint32_t desc;
 
     /* Load the first quadword using the normal predicated load helpers.  */
+    if (!s->mte_active[0]) {
+        addr = clean_data_tbi(s, addr);
+    }
+
     poff = pred_full_reg_offset(s, pg);
     if (vsz > 16) {
         /*
@@ -4886,7 +4891,8 @@ static void do_ldrq(DisasContext *s, int zt, int pg, TCGv_i64 addr, int dtype)
 
     gen_helper_gvec_mem *fn
         = ldr_fns[s->mte_active[0]][s->be_data == MO_BE][dtype][0];
-    fn(tcg_env, t_pg, addr, tcg_constant_i32(simd_desc(16, 16, zt)));
+    desc = make_svemte_desc(s, 16, 1, dtype_msz(dtype), false, zt);
+    fn(tcg_env, t_pg, addr, tcg_constant_i32(desc));
 
     /* Replicate that first quadword.  */
     if (vsz > 16) {
@@ -4929,6 +4935,7 @@ static void do_ldro(DisasContext *s, int zt, int pg, TCGv_i64 addr, int dtype)
     unsigned vsz_r32;
     TCGv_ptr t_pg;
     int poff, doff;
+    uint32_t desc;
 
     if (vsz < 32) {
         /*
@@ -4941,6 +4948,9 @@ static void do_ldro(DisasContext *s, int zt, int pg, TCGv_i64 addr, int dtype)
     }
 
     /* Load the first octaword using the normal predicated load helpers.  */
+    if (!s->mte_active[0]) {
+        addr = clean_data_tbi(s, addr);
+    }
 
     poff = pred_full_reg_offset(s, pg);
     if (vsz > 32) {
@@ -4965,7 +4975,8 @@ static void do_ldro(DisasContext *s, int zt, int pg, TCGv_i64 addr, int dtype)
 
     gen_helper_gvec_mem *fn
         = ldr_fns[s->mte_active[0]][s->be_data == MO_BE][dtype][0];
-    fn(tcg_env, t_pg, addr, tcg_constant_i32(simd_desc(32, 32, zt)));
+    desc = make_svemte_desc(s, 32, 1, dtype_msz(dtype), false, zt);
+    fn(tcg_env, t_pg, addr, tcg_constant_i32(desc));
 
     /*
      * Replicate that first octaword.
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 6/6] target/arm: Fix SVE/SME gross MTE suppression checks
  2024-02-07  2:52 [PATCH v3 0/6] target/arm: assorted mte fixes Richard Henderson
                   ` (4 preceding siblings ...)
  2024-02-07  2:52 ` [PATCH v3 5/6] target/arm: Handle mte in do_ldrq, do_ldro Richard Henderson
@ 2024-02-07  2:52 ` Richard Henderson
  2024-02-07 20:09 ` [PATCH v3 0/6] target/arm: assorted mte fixes Gustavo Romero
  2024-02-08 16:27 ` Peter Maydell
  7 siblings, 0 replies; 14+ messages in thread
From: Richard Henderson @ 2024-02-07  2:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm, gustavo.romero, qemu-stable, Peter Maydell

The TBI and TCMA bits are located within mtedesc, not desc.

Cc: qemu-stable@nongnu.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/sme_helper.c |  8 ++++----
 target/arm/tcg/sve_helper.c | 12 ++++++------
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c
index 1ee2690ceb..904bfdac43 100644
--- a/target/arm/tcg/sme_helper.c
+++ b/target/arm/tcg/sme_helper.c
@@ -573,8 +573,8 @@ void sme_ld1_mte(CPUARMState *env, void *za, uint64_t *vg,
     desc = extract32(desc, 0, SIMD_DATA_SHIFT + SVE_MTEDESC_SHIFT);
 
     /* Perform gross MTE suppression early. */
-    if (!tbi_check(desc, bit55) ||
-        tcma_check(desc, bit55, allocation_tag_from_addr(addr))) {
+    if (!tbi_check(mtedesc, bit55) ||
+        tcma_check(mtedesc, bit55, allocation_tag_from_addr(addr))) {
         mtedesc = 0;
     }
 
@@ -750,8 +750,8 @@ void sme_st1_mte(CPUARMState *env, void *za, uint64_t *vg, target_ulong addr,
     desc = extract32(desc, 0, SIMD_DATA_SHIFT + SVE_MTEDESC_SHIFT);
 
     /* Perform gross MTE suppression early. */
-    if (!tbi_check(desc, bit55) ||
-        tcma_check(desc, bit55, allocation_tag_from_addr(addr))) {
+    if (!tbi_check(mtedesc, bit55) ||
+        tcma_check(mtedesc, bit55, allocation_tag_from_addr(addr))) {
         mtedesc = 0;
     }
 
diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
index bce4295d28..6853f58c19 100644
--- a/target/arm/tcg/sve_helper.c
+++ b/target/arm/tcg/sve_helper.c
@@ -5800,8 +5800,8 @@ void sve_ldN_r_mte(CPUARMState *env, uint64_t *vg, target_ulong addr,
     desc = extract32(desc, 0, SIMD_DATA_SHIFT + SVE_MTEDESC_SHIFT);
 
     /* Perform gross MTE suppression early. */
-    if (!tbi_check(desc, bit55) ||
-        tcma_check(desc, bit55, allocation_tag_from_addr(addr))) {
+    if (!tbi_check(mtedesc, bit55) ||
+        tcma_check(mtedesc, bit55, allocation_tag_from_addr(addr))) {
         mtedesc = 0;
     }
 
@@ -6156,8 +6156,8 @@ void sve_ldnfff1_r_mte(CPUARMState *env, void *vg, target_ulong addr,
     desc = extract32(desc, 0, SIMD_DATA_SHIFT + SVE_MTEDESC_SHIFT);
 
     /* Perform gross MTE suppression early. */
-    if (!tbi_check(desc, bit55) ||
-        tcma_check(desc, bit55, allocation_tag_from_addr(addr))) {
+    if (!tbi_check(mtedesc, bit55) ||
+        tcma_check(mtedesc, bit55, allocation_tag_from_addr(addr))) {
         mtedesc = 0;
     }
 
@@ -6410,8 +6410,8 @@ void sve_stN_r_mte(CPUARMState *env, uint64_t *vg, target_ulong addr,
     desc = extract32(desc, 0, SIMD_DATA_SHIFT + SVE_MTEDESC_SHIFT);
 
     /* Perform gross MTE suppression early. */
-    if (!tbi_check(desc, bit55) ||
-        tcma_check(desc, bit55, allocation_tag_from_addr(addr))) {
+    if (!tbi_check(mtedesc, bit55) ||
+        tcma_check(mtedesc, bit55, allocation_tag_from_addr(addr))) {
         mtedesc = 0;
     }
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 1/6] linux-user/aarch64: Choose SYNC as the preferred MTE mode
  2024-02-07  2:52 ` [PATCH v3 1/6] linux-user/aarch64: Choose SYNC as the preferred MTE mode Richard Henderson
@ 2024-02-07 20:03   ` Gustavo Romero
  2024-02-08 16:18     ` Peter Maydell
  0 siblings, 1 reply; 14+ messages in thread
From: Gustavo Romero @ 2024-02-07 20:03 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: qemu-arm, qemu-stable


On 2/6/24 11:52 PM, Richard Henderson wrote:
> The API does not generate an error for setting ASYNC | SYNC; that merely
> constrains the selection vs the per-cpu default.  For qemu linux-user,
> choose SYNC as the default.
> 
> Cc: qemu-stable@nongnu.org
> Reported-by: Gustavo Romero <gustavo.romero@linaro.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   linux-user/aarch64/target_prctl.h | 29 +++++++++++++++++------------
>   1 file changed, 17 insertions(+), 12 deletions(-)
> 
> diff --git a/linux-user/aarch64/target_prctl.h b/linux-user/aarch64/target_prctl.h
> index 5067e7d731..aa8e203c15 100644
> --- a/linux-user/aarch64/target_prctl.h
> +++ b/linux-user/aarch64/target_prctl.h
> @@ -173,21 +173,26 @@ static abi_long do_prctl_set_tagged_addr_ctrl(CPUArchState *env, abi_long arg2)
>       env->tagged_addr_enable = arg2 & PR_TAGGED_ADDR_ENABLE;
>   
>       if (cpu_isar_feature(aa64_mte, cpu)) {
> -        switch (arg2 & PR_MTE_TCF_MASK) {
> -        case PR_MTE_TCF_NONE:
> -        case PR_MTE_TCF_SYNC:
> -        case PR_MTE_TCF_ASYNC:
> -            break;
> -        default:
> -            return -EINVAL;
> -        }
> -
>           /*
>            * Write PR_MTE_TCF to SCTLR_EL1[TCF0].
> -         * Note that the syscall values are consistent with hw.
> +         *
> +         * The kernel has a per-cpu configuration for the sysadmin,
> +         * /sys/devices/system/cpu/cpu<N>/mte_tcf_preferred,
> +         * which qemu does not implement.
> +         *
> +         * Because there is no performance difference between the modes, and
> +         * because SYNC is most useful for debugging MTE errors, choose SYNC
> +         * as the preferred mode.  With this preference, and the way the API
> +         * uses only two bits, there is no way for the program to select
> +         * ASYMM mode.
>            */
> -        env->cp15.sctlr_el[1] =
> -            deposit64(env->cp15.sctlr_el[1], 38, 2, arg2 >> PR_MTE_TCF_SHIFT);
> +        unsigned tcf = 0;
> +        if (arg2 & PR_MTE_TCF_SYNC) {
> +            tcf = 1;
> +        } else if (arg2 & PR_MTE_TCF_ASYNC) {
> +            tcf = 2;
> +        }
> +        env->cp15.sctlr_el[1] = deposit64(env->cp15.sctlr_el[1], 38, 2, tcf);
>   
>           /*
>            * Write PR_MTE_TAG to GCR_EL1[Exclude].
> 

ok, so no ASYMM in QEMU user-mode, plus if both SYNC and ASYNC flags are
specified by the user SYNC is selected. Contrary to what happens by default
on Linux, because of the mte_tcf_preferred value, which is ASYNC, and the
final value selected is define by:

resolved_mte_tcf = (mte_ctrl & pref) ? pref : mte_ctrl; [0]

where pref is mte_tcf_preferred (CPU, the value set in sys /mte_tcf_preferred)
and mte_ctr comes from the process, i.e. is the value specified by the user in
the flags -- hence the default on Linux if both flags are specified is ASYNC,
not SYNC.

(just some notes for the records).

Thanks.


[0] https://github.com/torvalds/linux/blob/master/arch/arm64/kernel/mte.c#L180-L186


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 0/6] target/arm: assorted mte fixes
  2024-02-07  2:52 [PATCH v3 0/6] target/arm: assorted mte fixes Richard Henderson
                   ` (5 preceding siblings ...)
  2024-02-07  2:52 ` [PATCH v3 6/6] target/arm: Fix SVE/SME gross MTE suppression checks Richard Henderson
@ 2024-02-07 20:09 ` Gustavo Romero
  2024-02-08 16:27 ` Peter Maydell
  7 siblings, 0 replies; 14+ messages in thread
From: Gustavo Romero @ 2024-02-07 20:09 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: qemu-arm

Hi Richard,

On 2/6/24 11:52 PM, Richard Henderson wrote:
> Changes for v3:
>    - As if /sys/devices/system/cpu/cpu<N>/mte_tcf_preferred is "sync".
>    - Fix do_st_zpa as well as do_ld_zpa.  Oops.
> 
> Because of the above, I dropped Gustavo's t-b.
> 
> 
> r~
> 
> 
> Richard Henderson (6):
>    linux-user/aarch64: Choose SYNC as the preferred MTE mode
>    target/arm: Fix nregs computation in do_{ld,st}_zpa
>    target/arm: Adjust and validate mtedesc sizem1
>    target/arm: Split out make_svemte_desc
>    target/arm: Handle mte in do_ldrq, do_ldro
>    target/arm: Fix SVE/SME gross MTE suppression checks
> 
>   linux-user/aarch64/target_prctl.h | 29 ++++++-----
>   target/arm/internals.h            |  2 +-
>   target/arm/tcg/translate-a64.h    |  2 +
>   target/arm/tcg/sme_helper.c       |  8 +--
>   target/arm/tcg/sve_helper.c       | 12 ++---
>   target/arm/tcg/translate-sme.c    | 15 ++----
>   target/arm/tcg/translate-sve.c    | 83 ++++++++++++++++++-------------
>   7 files changed, 83 insertions(+), 68 deletions(-)
> 
Since this patchset fixes the "prctl() failed: Invalid argument"
on QEMU user-mode when both flags (ASYNC | SYNC) are passed
to prctl(PR_SET_TAGGED_ADDR_CTRL, ...), I tested it again --
expecting no different result -- so:

Tested-by: Gustavo Romero <gustavo.romero@linaro.org>

If that t-b tag doesn't make sense, feel free to drop it :)

Thanks for fixing it!


Cheers,
Gustavo


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 1/6] linux-user/aarch64: Choose SYNC as the preferred MTE mode
  2024-02-07 20:03   ` Gustavo Romero
@ 2024-02-08 16:18     ` Peter Maydell
  0 siblings, 0 replies; 14+ messages in thread
From: Peter Maydell @ 2024-02-08 16:18 UTC (permalink / raw)
  To: Gustavo Romero; +Cc: Richard Henderson, qemu-devel, qemu-arm, qemu-stable

On Wed, 7 Feb 2024 at 20:04, Gustavo Romero <gustavo.romero@linaro.org> wrote:
>
>
> On 2/6/24 11:52 PM, Richard Henderson wrote:
> > The API does not generate an error for setting ASYNC | SYNC; that merely
> > constrains the selection vs the per-cpu default.  For qemu linux-user,
> > choose SYNC as the default.
> >
> > Cc: qemu-stable@nongnu.org
> > Reported-by: Gustavo Romero <gustavo.romero@linaro.org>
> > Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> > ---
> >   linux-user/aarch64/target_prctl.h | 29 +++++++++++++++++------------
> >   1 file changed, 17 insertions(+), 12 deletions(-)
> >
> > diff --git a/linux-user/aarch64/target_prctl.h b/linux-user/aarch64/target_prctl.h
> > index 5067e7d731..aa8e203c15 100644
> > --- a/linux-user/aarch64/target_prctl.h
> > +++ b/linux-user/aarch64/target_prctl.h
> > @@ -173,21 +173,26 @@ static abi_long do_prctl_set_tagged_addr_ctrl(CPUArchState *env, abi_long arg2)
> >       env->tagged_addr_enable = arg2 & PR_TAGGED_ADDR_ENABLE;
> >
> >       if (cpu_isar_feature(aa64_mte, cpu)) {
> > -        switch (arg2 & PR_MTE_TCF_MASK) {
> > -        case PR_MTE_TCF_NONE:
> > -        case PR_MTE_TCF_SYNC:
> > -        case PR_MTE_TCF_ASYNC:
> > -            break;
> > -        default:
> > -            return -EINVAL;
> > -        }
> > -
> >           /*
> >            * Write PR_MTE_TCF to SCTLR_EL1[TCF0].
> > -         * Note that the syscall values are consistent with hw.
> > +         *
> > +         * The kernel has a per-cpu configuration for the sysadmin,
> > +         * /sys/devices/system/cpu/cpu<N>/mte_tcf_preferred,
> > +         * which qemu does not implement.
> > +         *
> > +         * Because there is no performance difference between the modes, and
> > +         * because SYNC is most useful for debugging MTE errors, choose SYNC
> > +         * as the preferred mode.  With this preference, and the way the API
> > +         * uses only two bits, there is no way for the program to select
> > +         * ASYMM mode.
> >            */
> > -        env->cp15.sctlr_el[1] =
> > -            deposit64(env->cp15.sctlr_el[1], 38, 2, arg2 >> PR_MTE_TCF_SHIFT);
> > +        unsigned tcf = 0;
> > +        if (arg2 & PR_MTE_TCF_SYNC) {
> > +            tcf = 1;
> > +        } else if (arg2 & PR_MTE_TCF_ASYNC) {
> > +            tcf = 2;
> > +        }
> > +        env->cp15.sctlr_el[1] = deposit64(env->cp15.sctlr_el[1], 38, 2, tcf);
> >
> >           /*
> >            * Write PR_MTE_TAG to GCR_EL1[Exclude].
> >
>
> ok, so no ASYMM in QEMU user-mode, plus if both SYNC and ASYNC flags are
> specified by the user SYNC is selected. Contrary to what happens by default
> on Linux, because of the mte_tcf_preferred value, which is ASYNC, and the
> final value selected is define by:
>
> resolved_mte_tcf = (mte_ctrl & pref) ? pref : mte_ctrl;

Yes; the kernel default mte_tcf_preferred is 'async', but as the comment
notes we do the equivalent of the sysadmin having set mte_tcf_preferred
to 'sync'. (The idea of the tunable is that you can set it to whatever
is a good balance between performance and debugging precision depending
on what your CPU implementation happens to be; for QEMU since there is
no performance gain from 'async' or 'asymm' we might as well act like
the sysadmin set the tunable to 'sync'.)

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 2/6] target/arm: Fix nregs computation in do_{ld, st}_zpa
  2024-02-07  2:52 ` [PATCH v3 2/6] target/arm: Fix nregs computation in do_{ld,st}_zpa Richard Henderson
@ 2024-02-08 16:24   ` Peter Maydell
  0 siblings, 0 replies; 14+ messages in thread
From: Peter Maydell @ 2024-02-08 16:24 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, qemu-arm, gustavo.romero, qemu-stable

On Wed, 7 Feb 2024 at 02:53, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> The field is encoded as [0-3], which is convenient for
> indexing our array of function pointers, but the true
> value is [1-4].  Adjust before calling do_mem_zpa.
>
> Add an assert, and move the comment re passing ZT to
> the helper back next to the relevant code.
>
> Cc: qemu-stable@nongnu.org
> Fixes: 206adacfb8d ("target/arm: Add mte helpers for sve scalar + int loads")
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 0/6] target/arm: assorted mte fixes
  2024-02-07  2:52 [PATCH v3 0/6] target/arm: assorted mte fixes Richard Henderson
                   ` (6 preceding siblings ...)
  2024-02-07 20:09 ` [PATCH v3 0/6] target/arm: assorted mte fixes Gustavo Romero
@ 2024-02-08 16:27 ` Peter Maydell
  7 siblings, 0 replies; 14+ messages in thread
From: Peter Maydell @ 2024-02-08 16:27 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, qemu-arm, gustavo.romero

On Wed, 7 Feb 2024 at 02:52, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Changes for v3:
>   - As if /sys/devices/system/cpu/cpu<N>/mte_tcf_preferred is "sync".
>   - Fix do_st_zpa as well as do_ld_zpa.  Oops.
>
> Because of the above, I dropped Gustavo's t-b.
>
>
> r~
>
>
> Richard Henderson (6):
>   linux-user/aarch64: Choose SYNC as the preferred MTE mode
>   target/arm: Fix nregs computation in do_{ld,st}_zpa
>   target/arm: Adjust and validate mtedesc sizem1
>   target/arm: Split out make_svemte_desc
>   target/arm: Handle mte in do_ldrq, do_ldro
>   target/arm: Fix SVE/SME gross MTE suppression checks



Applied to target-arm.next, thanks.

-- PMM


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 3/6] target/arm: Adjust and validate mtedesc sizem1
  2024-02-07  2:52 ` [PATCH v3 3/6] target/arm: Adjust and validate mtedesc sizem1 Richard Henderson
@ 2024-02-16 15:12   ` Michael Tokarev
  2024-02-16 19:17     ` Richard Henderson
  0 siblings, 1 reply; 14+ messages in thread
From: Michael Tokarev @ 2024-02-16 15:12 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: qemu-arm, gustavo.romero, qemu-stable, Peter Maydell

07.02.2024 05:52, Richard Henderson :
> When we added SVE_MTEDESC_SHIFT, we effectively limited the
> maximum size of MTEDESC.  Adjust SIZEM1 to consume the remaining
> bits (32 - 10 - 5 - 12 == 5).  Assert that the data to be stored
> fits within the field (expecting 8 * 4 - 1 == 31, exact fit).
> 
> Cc: qemu-stable@nongnu.org
> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   target/arm/internals.h         | 2 +-
>   target/arm/tcg/translate-sve.c | 7 ++++---
>   2 files changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/target/arm/internals.h b/target/arm/internals.h
> index fc337fe40e..50bff44549 100644
> --- a/target/arm/internals.h
> +++ b/target/arm/internals.h
> @@ -1278,7 +1278,7 @@ FIELD(MTEDESC, TBI,   4, 2)
>   FIELD(MTEDESC, TCMA,  6, 2)
>   FIELD(MTEDESC, WRITE, 8, 1)
>   FIELD(MTEDESC, ALIGN, 9, 3)
> -FIELD(MTEDESC, SIZEM1, 12, SIMD_DATA_BITS - 12)  /* size - 1 */
> +FIELD(MTEDESC, SIZEM1, 12, SIMD_DATA_BITS - SVE_MTEDESC_SHIFT - 12)  /* size - 1 */
>   
>   bool mte_probe(CPUARMState *env, uint32_t desc, uint64_t ptr);
>   uint64_t mte_check(CPUARMState *env, uint32_t desc, uint64_t ptr, uintptr_t ra);
> diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
> index 7108938251..a88e523cba 100644
> --- a/target/arm/tcg/translate-sve.c
> +++ b/target/arm/tcg/translate-sve.c
> @@ -4443,17 +4443,18 @@ static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
>   {
>       unsigned vsz = vec_full_reg_size(s);
>       TCGv_ptr t_pg;
> +    uint32_t sizem1;
>       int desc = 0;
>   
>       assert(mte_n >= 1 && mte_n <= 4);
> +    sizem1 = (mte_n << dtype_msz(dtype)) - 1;
> +    assert(sizem1 <= R_MTEDESC_SIZEM1_MASK >> R_MTEDESC_SIZEM1_SHIFT);
>       if (s->mte_active[0]) {
> -        int msz = dtype_msz(dtype);
> -
>           desc = FIELD_DP32(desc, MTEDESC, MIDX, get_mem_index(s));
>           desc = FIELD_DP32(desc, MTEDESC, TBI, s->tbid);
>           desc = FIELD_DP32(desc, MTEDESC, TCMA, s->tcma);
>           desc = FIELD_DP32(desc, MTEDESC, WRITE, is_write);
> -        desc = FIELD_DP32(desc, MTEDESC, SIZEM1, (mte_n << msz) - 1);
> +        desc = FIELD_DP32(desc, MTEDESC, SIZEM1, sizem1);
>           desc <<= SVE_MTEDESC_SHIFT;
>       } else {
>           addr = clean_data_tbi(s, addr);

There's no question about stable-8.2 here, this change needed there.
But I've a question about stable-7.2 - does it make sense to pick this
one up for 7.2?  It quickly goes out of control, because this one
is on top of

  523da6b963455ce0a0e8d572d98d9cd91f952785 target/arm: Check alignment in helper_mte_check
  (this one might be good for 7.2 by its own)
  which needs:
   3b97520c86e704b0533627c26b98173b71834bad target/arm: Pass single_memop to gen_mte_checkN
   which needs:
    6f47e7c18972802c428a5e03eb52a8f0a7bebe5c target/arm: Load/store integer pair with one tcg operation
    which needs:
     needs 128bit ops
     659aed5feda4472d8aed4ccc69e125bba2af8b89 target/arm: Drop tcg_temp_free from translator-a64.c
     ...

So I think it's not a good idea to go down this hole..

Probably ditto for the other two:
   target/arm: Split out make_svemte_desc
   target/arm: Handle mte in do_ldrq, do_ldro

Makes sense?  Or it's better to do a proper backport?

Thanks,

/mjt


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 3/6] target/arm: Adjust and validate mtedesc sizem1
  2024-02-16 15:12   ` Michael Tokarev
@ 2024-02-16 19:17     ` Richard Henderson
  0 siblings, 0 replies; 14+ messages in thread
From: Richard Henderson @ 2024-02-16 19:17 UTC (permalink / raw)
  To: Michael Tokarev, qemu-devel
  Cc: qemu-arm, gustavo.romero, qemu-stable, Peter Maydell

On 2/16/24 05:12, Michael Tokarev wrote:
> 07.02.2024 05:52, Richard Henderson :
>> When we added SVE_MTEDESC_SHIFT, we effectively limited the
>> maximum size of MTEDESC.  Adjust SIZEM1 to consume the remaining
>> bits (32 - 10 - 5 - 12 == 5).  Assert that the data to be stored
>> fits within the field (expecting 8 * 4 - 1 == 31, exact fit).
>>
>> Cc: qemu-stable@nongnu.org
>> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>   target/arm/internals.h         | 2 +-
>>   target/arm/tcg/translate-sve.c | 7 ++++---
>>   2 files changed, 5 insertions(+), 4 deletions(-)
>>
>> diff --git a/target/arm/internals.h b/target/arm/internals.h
>> index fc337fe40e..50bff44549 100644
>> --- a/target/arm/internals.h
>> +++ b/target/arm/internals.h
>> @@ -1278,7 +1278,7 @@ FIELD(MTEDESC, TBI,   4, 2)
>>   FIELD(MTEDESC, TCMA,  6, 2)
>>   FIELD(MTEDESC, WRITE, 8, 1)
>>   FIELD(MTEDESC, ALIGN, 9, 3)
>> -FIELD(MTEDESC, SIZEM1, 12, SIMD_DATA_BITS - 12)  /* size - 1 */
>> +FIELD(MTEDESC, SIZEM1, 12, SIMD_DATA_BITS - SVE_MTEDESC_SHIFT - 12)  /* size - 1 */
>>   bool mte_probe(CPUARMState *env, uint32_t desc, uint64_t ptr);
>>   uint64_t mte_check(CPUARMState *env, uint32_t desc, uint64_t ptr, uintptr_t ra);
>> diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
>> index 7108938251..a88e523cba 100644
>> --- a/target/arm/tcg/translate-sve.c
>> +++ b/target/arm/tcg/translate-sve.c
>> @@ -4443,17 +4443,18 @@ static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 
>> addr,
>>   {
>>       unsigned vsz = vec_full_reg_size(s);
>>       TCGv_ptr t_pg;
>> +    uint32_t sizem1;
>>       int desc = 0;
>>       assert(mte_n >= 1 && mte_n <= 4);
>> +    sizem1 = (mte_n << dtype_msz(dtype)) - 1;
>> +    assert(sizem1 <= R_MTEDESC_SIZEM1_MASK >> R_MTEDESC_SIZEM1_SHIFT);
>>       if (s->mte_active[0]) {
>> -        int msz = dtype_msz(dtype);
>> -
>>           desc = FIELD_DP32(desc, MTEDESC, MIDX, get_mem_index(s));
>>           desc = FIELD_DP32(desc, MTEDESC, TBI, s->tbid);
>>           desc = FIELD_DP32(desc, MTEDESC, TCMA, s->tcma);
>>           desc = FIELD_DP32(desc, MTEDESC, WRITE, is_write);
>> -        desc = FIELD_DP32(desc, MTEDESC, SIZEM1, (mte_n << msz) - 1);
>> +        desc = FIELD_DP32(desc, MTEDESC, SIZEM1, sizem1);
>>           desc <<= SVE_MTEDESC_SHIFT;
>>       } else {
>>           addr = clean_data_tbi(s, addr);
> 
> There's no question about stable-8.2 here, this change needed there.
> But I've a question about stable-7.2 - does it make sense to pick this
> one up for 7.2?  It quickly goes out of control, because this one
> is on top of
> 
>   523da6b963455ce0a0e8d572d98d9cd91f952785 target/arm: Check alignment in helper_mte_check
>   (this one might be good for 7.2 by its own)
>   which needs:
>    3b97520c86e704b0533627c26b98173b71834bad target/arm: Pass single_memop to gen_mte_checkN
>    which needs:
>     6f47e7c18972802c428a5e03eb52a8f0a7bebe5c target/arm: Load/store integer pair with one 
> tcg operation
>     which needs:
>      needs 128bit ops
>      659aed5feda4472d8aed4ccc69e125bba2af8b89 target/arm: Drop tcg_temp_free from 
> translator-a64.c
>      ...
> 
> So I think it's not a good idea to go down this hole..
> 
> Probably ditto for the other two:
>    target/arm: Split out make_svemte_desc
>    target/arm: Handle mte in do_ldrq, do_ldro
> 
> Makes sense?  Or it's better to do a proper backport?

I don't think it makes sense to go back too far.

r~



^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2024-02-16 19:18 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-07  2:52 [PATCH v3 0/6] target/arm: assorted mte fixes Richard Henderson
2024-02-07  2:52 ` [PATCH v3 1/6] linux-user/aarch64: Choose SYNC as the preferred MTE mode Richard Henderson
2024-02-07 20:03   ` Gustavo Romero
2024-02-08 16:18     ` Peter Maydell
2024-02-07  2:52 ` [PATCH v3 2/6] target/arm: Fix nregs computation in do_{ld,st}_zpa Richard Henderson
2024-02-08 16:24   ` [PATCH v3 2/6] target/arm: Fix nregs computation in do_{ld, st}_zpa Peter Maydell
2024-02-07  2:52 ` [PATCH v3 3/6] target/arm: Adjust and validate mtedesc sizem1 Richard Henderson
2024-02-16 15:12   ` Michael Tokarev
2024-02-16 19:17     ` Richard Henderson
2024-02-07  2:52 ` [PATCH v3 4/6] target/arm: Split out make_svemte_desc Richard Henderson
2024-02-07  2:52 ` [PATCH v3 5/6] target/arm: Handle mte in do_ldrq, do_ldro Richard Henderson
2024-02-07  2:52 ` [PATCH v3 6/6] target/arm: Fix SVE/SME gross MTE suppression checks Richard Henderson
2024-02-07 20:09 ` [PATCH v3 0/6] target/arm: assorted mte fixes Gustavo Romero
2024-02-08 16:27 ` Peter Maydell

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.