[Qemu-devel] [PATCH 00/11] target/arm: sve linux-user patches


* [Qemu-devel] [PATCH 00/11] target/arm: sve linux-user patches
@ 2018-08-09  3:40 Richard Henderson
  2018-08-09  3:40 ` [Qemu-devel] [PATCH 01/11] target/arm: Fix sign of sve_cmpeq_ppzw/sve_cmpne_ppzw Richard Henderson
                   ` (11 more replies)
  0 siblings, 12 replies; 24+ messages in thread
From: Richard Henderson @ 2018-08-09  3:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee

I posted a few of these before, and I thought Peter had applied them
to his target-arm.for-3-1 branch, but I don't see them there now.  

I've taken the opportunity to tag all of these for backport into the
next stable release.  I'm intending to do so for all of the correctness
patches affecting sve linux-user so that 3.0.1 will be usable long-term.


r~


Richard Henderson (11):
  target/arm: Fix sign of sve_cmpeq_ppzw/sve_cmpne_ppzw
  target/arm: Fix typo in do_sat_addsub_64
  target/arm: Reorganize SVE WHILE
  target/arm: Fix typo in helper_sve_movz_d
  target/arm: Fix typo in helper_sve_ld1hss_r
  target/arm: Fix sign-extension in sve do_ldr/do_str
  target/arm: Fix offset for LD1R instructions
  target/arm: Fix offset scaling for LD_zprr and ST_zprr
  target/arm: Reformat integer register dump
  target/arm: Dump SVE state if enabled
  target/arm: Add sve-max-vq cpu property to -cpu max

 target/arm/cpu.h           |   3 ++
 linux-user/syscall.c       |  19 ++++---
 target/arm/cpu.c           |   6 +--
 target/arm/cpu64.c         |  29 ++++++++++
 target/arm/helper.c        |   7 ++-
 target/arm/sve_helper.c    |  21 +++-----
 target/arm/translate-a64.c | 108 ++++++++++++++++++++++++++++++-------
 target/arm/translate-sve.c |  77 +++++++++++++++-----------
 8 files changed, 195 insertions(+), 75 deletions(-)

-- 
2.17.1


* [Qemu-devel] [PATCH 01/11] target/arm: Fix sign of sve_cmpeq_ppzw/sve_cmpne_ppzw
  2018-08-09  3:40 [Qemu-devel] [PATCH 00/11] target/arm: sve linux-user patches Richard Henderson
@ 2018-08-09  3:40 ` Richard Henderson
  2018-08-09  3:40 ` [Qemu-devel] [PATCH 02/11] target/arm: Fix typo in do_sat_addsub_64 Richard Henderson
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 24+ messages in thread
From: Richard Henderson @ 2018-08-09  3:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee, qemu-stable

The normal vector element is sign-extended before
comparing with the wide vector element.
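
As a rough illustration of why the element type matters, here is a
one-lane model of the comparison.  The function names are made up for
this sketch and are not the QEMU macros; the only assumption is that
the narrow element is converted to 64 bits before the compare, as the
usual C arithmetic conversions would do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical one-lane models of the comparison; not the QEMU macros. */

/* Before the fix: a uint8_t element is zero-extended to 64 bits. */
static bool cmpeq_wide_zext(uint8_t narrow, uint64_t wide)
{
    return (uint64_t)narrow == wide;
}

/* After the fix: an int8_t element is sign-extended to 64 bits. */
static bool cmpeq_wide_sext(int8_t narrow, uint64_t wide)
{
    return (uint64_t)(int64_t)narrow == wide;
}

/* The byte 0xff is -1 when signed, so it should match a wide -1. */
static void cmpeq_wide_demo(void)
{
    uint64_t wide = UINT64_MAX;            /* the bit pattern of -1 */

    assert(!cmpeq_wide_zext(0xff, wide));  /* 0x00000000000000ff != -1 */
    assert(cmpeq_wide_sext(-1, wide));     /* 0xffffffffffffffff == -1 */
}
```

Non-negative elements compare the same either way, which is probably
why the problem only shows up with negative inputs.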

Cc: qemu-stable@nongnu.org (3.0.1)
Tested-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Reviewed-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reported-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/sve_helper.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 54795c9194..9bd0694d55 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -2436,13 +2436,13 @@ uint32_t HELPER(NAME)(void *vd, void *vn, void *vm, void *vg, uint32_t desc) \
 #define DO_CMP_PPZW_S(NAME, TYPE, TYPEW, OP) \
     DO_CMP_PPZW(NAME, TYPE, TYPEW, OP, H1_4, 0x1111111111111111ull)
 
-DO_CMP_PPZW_B(sve_cmpeq_ppzw_b, uint8_t,  uint64_t, ==)
-DO_CMP_PPZW_H(sve_cmpeq_ppzw_h, uint16_t, uint64_t, ==)
-DO_CMP_PPZW_S(sve_cmpeq_ppzw_s, uint32_t, uint64_t, ==)
+DO_CMP_PPZW_B(sve_cmpeq_ppzw_b, int8_t,  uint64_t, ==)
+DO_CMP_PPZW_H(sve_cmpeq_ppzw_h, int16_t, uint64_t, ==)
+DO_CMP_PPZW_S(sve_cmpeq_ppzw_s, int32_t, uint64_t, ==)
 
-DO_CMP_PPZW_B(sve_cmpne_ppzw_b, uint8_t,  uint64_t, !=)
-DO_CMP_PPZW_H(sve_cmpne_ppzw_h, uint16_t, uint64_t, !=)
-DO_CMP_PPZW_S(sve_cmpne_ppzw_s, uint32_t, uint64_t, !=)
+DO_CMP_PPZW_B(sve_cmpne_ppzw_b, int8_t,  uint64_t, !=)
+DO_CMP_PPZW_H(sve_cmpne_ppzw_h, int16_t, uint64_t, !=)
+DO_CMP_PPZW_S(sve_cmpne_ppzw_s, int32_t, uint64_t, !=)
 
 DO_CMP_PPZW_B(sve_cmpgt_ppzw_b, int8_t,   int64_t, >)
 DO_CMP_PPZW_H(sve_cmpgt_ppzw_h, int16_t,  int64_t, >)
-- 
2.17.1


* [Qemu-devel] [PATCH 02/11] target/arm: Fix typo in do_sat_addsub_64
  2018-08-09  3:40 [Qemu-devel] [PATCH 00/11] target/arm: sve linux-user patches Richard Henderson
  2018-08-09  3:40 ` [Qemu-devel] [PATCH 01/11] target/arm: Fix sign of sve_cmpeq_ppzw/sve_cmpne_ppzw Richard Henderson
@ 2018-08-09  3:40 ` Richard Henderson
  2018-08-09  9:12   ` Alex Bennée
  2018-08-09  3:40 ` [Qemu-devel] [PATCH 03/11] target/arm: Reorganize SVE WHILE Richard Henderson
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 24+ messages in thread
From: Richard Henderson @ 2018-08-09  3:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee, qemu-stable

Used the wrong temporary in the computation of subtractive overflow.
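
For reference, a scalar sketch of the overflow test that the TCG
sequence computes.  The function names are illustrative only; t0 and
t1 mirror the temporaries in the hunk below, and the "typo" variant
models what the old code reduced to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical scalar model of the overflow test; names are illustrative. */
static bool ssub64_overflows(int64_t a, int64_t b)
{
    uint64_t ua = (uint64_t)a, ub = (uint64_t)b;
    uint64_t t0 = ua ^ ub;      /* sign bit set when the operand signs differ */
    uint64_t t1 = ua - ub;      /* wrapping difference */

    /* Overflow: signs differ and the result's sign differs from a's. */
    return ((t0 & (ua ^ t1)) >> 63) != 0;
}

/* The typo xor'ed with t0 instead of t1, which reduces to (a ^ b) & b. */
static bool ssub64_overflows_typo(int64_t a, int64_t b)
{
    uint64_t ua = (uint64_t)a, ub = (uint64_t)b;
    uint64_t t0 = ua ^ ub;

    return ((t0 & (ua ^ t0)) >> 63) != 0;
}

static void ssub64_demo(void)
{
    assert(ssub64_overflows(INT64_MAX, -1));   /* true overflow */
    assert(!ssub64_overflows(1, -1));          /* 1 - (-1) = 2 fits */
    assert(ssub64_overflows_typo(1, -1));      /* false positive */
}
```

In this model the typo flags any subtraction of a negative value from
a non-negative one, whether or not the result fits.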

Cc: qemu-stable@nongnu.org (3.0.1)
Tested-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Reviewed-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Reported-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-sve.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 374051cd20..9dd4c38bab 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -1625,7 +1625,7 @@ static void do_sat_addsub_64(TCGv_i64 reg, TCGv_i64 val, bool u, bool d)
             /* Detect signed overflow for subtraction.  */
             tcg_gen_xor_i64(t0, reg, val);
             tcg_gen_sub_i64(t1, reg, val);
-            tcg_gen_xor_i64(reg, reg, t0);
+            tcg_gen_xor_i64(reg, reg, t1);
             tcg_gen_and_i64(t0, t0, reg);
 
             /* Bound the result.  */
-- 
2.17.1


* [Qemu-devel] [PATCH 03/11] target/arm: Reorganize SVE WHILE
  2018-08-09  3:40 [Qemu-devel] [PATCH 00/11] target/arm: sve linux-user patches Richard Henderson
  2018-08-09  3:40 ` [Qemu-devel] [PATCH 01/11] target/arm: Fix sign of sve_cmpeq_ppzw/sve_cmpne_ppzw Richard Henderson
  2018-08-09  3:40 ` [Qemu-devel] [PATCH 02/11] target/arm: Fix typo in do_sat_addsub_64 Richard Henderson
@ 2018-08-09  3:40 ` Richard Henderson
  2018-08-09  9:48   ` Alex Bennée
  2018-08-09  3:40 ` [Qemu-devel] [PATCH 04/11] target/arm: Fix typo in helper_sve_movz_d Richard Henderson
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 24+ messages in thread
From: Richard Henderson @ 2018-08-09  3:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee, qemu-stable

The pseudocode for this operation is an increment + compare loop,
so comparing <= the maximum integer produces an all-true predicate.

Rather than bound in both the inline code and the helper, pass the
helper the number of predicate bits to set instead of the number
of predicate elements to set.
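
A scalar sketch of the count computation described above, restricted
to the signed 64-bit case for brevity.  The function name and the
plain-C form are assumptions made for illustration; the steps follow
the inline code in the hunk below (subtract, add one for <=, handle
the max-integer case, bound, zero if the condition is false, scale
elements to bits).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model: number of predicate bits to set for a signed
 * 64-bit WHILE.  vsz is the vector size in bytes, esz the log2 of
 * the element size in bytes.
 */
static uint32_t while_pred_bits(int64_t op0, int64_t op1, bool eq,
                                unsigned vsz, unsigned esz)
{
    uint64_t tmax = vsz >> esz;                      /* elements per vector */
    uint64_t count = (uint64_t)op1 - (uint64_t)op0;  /* wrapping subtract */
    bool cond = eq ? op0 <= op1 : op0 < op1;

    if (eq) {
        count += 1;               /* equality means one more iteration */
        if (op1 == INT64_MAX) {
            count = tmax;         /* the loop can never compare false */
        }
    }
    if (count > tmax) {
        count = tmax;             /* bound to the maximum */
    }
    if (!cond) {
        count = 0;                /* condition false at the first element */
    }
    return (uint32_t)count << esz;    /* scale elements to bits */
}

static void while_demo(void)
{
    /* 16-byte vector, 32-bit elements (esz = 2): 4 elements, 16 bits. */
    assert(while_pred_bits(0, 3, false, 16, 2) == 12);
    assert(while_pred_bits(0, 3, true, 16, 2) == 16);
    assert(while_pred_bits(5, 3, true, 16, 2) == 0);
    /* op1 - op0 + 1 wraps to 0 here; the max-integer case rescues it. */
    assert(while_pred_bits(INT64_MIN, INT64_MAX, true, 16, 2) == 16);
}
```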

Cc: qemu-stable@nongnu.org (3.0.1)
Tested-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Reviewed-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Reported-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/sve_helper.c    |  5 ----
 target/arm/translate-sve.c | 49 +++++++++++++++++++++++++-------------
 2 files changed, 32 insertions(+), 22 deletions(-)

diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 9bd0694d55..87594a8adb 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -2846,11 +2846,6 @@ uint32_t HELPER(sve_while)(void *vd, uint32_t count, uint32_t pred_desc)
         return flags;
     }
 
-    /* Scale from predicate element count to bits.  */
-    count <<= esz;
-    /* Bound to the bits in the predicate.  */
-    count = MIN(count, oprsz * 8);
-
     /* Set all of the requested bits.  */
     for (i = 0; i < count / 64; ++i) {
         d->p[i] = esz_mask;
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 9dd4c38bab..89efc80ee7 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -3173,19 +3173,19 @@ static bool trans_CTERM(DisasContext *s, arg_CTERM *a, uint32_t insn)
 
 static bool trans_WHILE(DisasContext *s, arg_WHILE *a, uint32_t insn)
 {
-    if (!sve_access_check(s)) {
-        return true;
-    }
-
-    TCGv_i64 op0 = read_cpu_reg(s, a->rn, 1);
-    TCGv_i64 op1 = read_cpu_reg(s, a->rm, 1);
-    TCGv_i64 t0 = tcg_temp_new_i64();
-    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 op0, op1, t0, t1, tmax;
     TCGv_i32 t2, t3;
     TCGv_ptr ptr;
     unsigned desc, vsz = vec_full_reg_size(s);
     TCGCond cond;
 
+    if (!sve_access_check(s)) {
+        return true;
+    }
+
+    op0 = read_cpu_reg(s, a->rn, 1);
+    op1 = read_cpu_reg(s, a->rm, 1);
+
     if (!a->sf) {
         if (a->u) {
             tcg_gen_ext32u_i64(op0, op0);
@@ -3198,32 +3198,47 @@ static bool trans_WHILE(DisasContext *s, arg_WHILE *a, uint32_t insn)
 
     /* For the helper, compress the different conditions into a computation
      * of how many iterations for which the condition is true.
-     *
-     * This is slightly complicated by 0 <= UINT64_MAX, which is nominally
-     * 2**64 iterations, overflowing to 0.  Of course, predicate registers
-     * aren't that large, so any value >= predicate size is sufficient.
      */
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
     tcg_gen_sub_i64(t0, op1, op0);
 
-    /* t0 = MIN(op1 - op0, vsz).  */
-    tcg_gen_movi_i64(t1, vsz);
-    tcg_gen_umin_i64(t0, t0, t1);
+    tmax = tcg_const_i64(vsz >> a->esz);
     if (a->eq) {
         /* Equality means one more iteration.  */
         tcg_gen_addi_i64(t0, t0, 1);
+
+        /* If op1 is max (un)signed integer (and the only time the addition
+         * above could overflow), then we produce an all-true predicate by
+         * setting the count to the vector length.  This is because the
+         * pseudocode is described as an increment + compare loop, and the
+         * max integer would always compare true.
+         */
+        tcg_gen_movi_i64(t1, (a->sf
+                              ? (a->u ? UINT64_MAX : INT64_MAX)
+                              : (a->u ? UINT32_MAX : INT32_MAX)));
+        tcg_gen_movcond_i64(TCG_COND_EQ, t0, op1, t1, tmax, t0);
     }
 
-    /* t0 = (condition true ? t0 : 0).  */
+    /* Bound to the maximum.  */
+    tcg_gen_umin_i64(t0, t0, tmax);
+    tcg_temp_free_i64(tmax);
+
+    /* Set the count to zero if the condition is false.  */
     cond = (a->u
             ? (a->eq ? TCG_COND_LEU : TCG_COND_LTU)
             : (a->eq ? TCG_COND_LE : TCG_COND_LT));
     tcg_gen_movi_i64(t1, 0);
     tcg_gen_movcond_i64(cond, t0, op0, op1, t0, t1);
+    tcg_temp_free_i64(t1);
 
+    /* Since we're bounded, pass as a 32-bit type.  */
     t2 = tcg_temp_new_i32();
     tcg_gen_extrl_i64_i32(t2, t0);
     tcg_temp_free_i64(t0);
-    tcg_temp_free_i64(t1);
+
+    /* Scale elements to bits.  */
+    tcg_gen_shli_i32(t2, t2, a->esz);
 
     desc = (vsz / 8) - 2;
     desc = deposit32(desc, SIMD_DATA_SHIFT, 2, a->esz);
-- 
2.17.1


* [Qemu-devel] [PATCH 04/11] target/arm: Fix typo in helper_sve_movz_d
  2018-08-09  3:40 [Qemu-devel] [PATCH 00/11] target/arm: sve linux-user patches Richard Henderson
                   ` (2 preceding siblings ...)
  2018-08-09  3:40 ` [Qemu-devel] [PATCH 03/11] target/arm: Reorganize SVE WHILE Richard Henderson
@ 2018-08-09  3:40 ` Richard Henderson
  2018-08-09  3:40 ` [Qemu-devel] [PATCH 05/11] target/arm: Fix typo in helper_sve_ld1hss_r Richard Henderson
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 24+ messages in thread
From: Richard Henderson @ 2018-08-09  3:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee, qemu-stable

Cc: qemu-stable@nongnu.org (3.0.1)
Tested-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Reviewed-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reported-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/sve_helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 87594a8adb..c3cbec9cf5 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -1042,7 +1042,7 @@ void HELPER(sve_movz_d)(void *vd, void *vn, void *vg, uint32_t desc)
     uint64_t *d = vd, *n = vn;
     uint8_t *pg = vg;
     for (i = 0; i < opr_sz; i += 1) {
-        d[i] = n[1] & -(uint64_t)(pg[H1(i)] & 1);
+        d[i] = n[i] & -(uint64_t)(pg[H1(i)] & 1);
     }
 }
 
-- 
2.17.1


* [Qemu-devel] [PATCH 05/11] target/arm: Fix typo in helper_sve_ld1hss_r
  2018-08-09  3:40 [Qemu-devel] [PATCH 00/11] target/arm: sve linux-user patches Richard Henderson
                   ` (3 preceding siblings ...)
  2018-08-09  3:40 ` [Qemu-devel] [PATCH 04/11] target/arm: Fix typo in helper_sve_movz_d Richard Henderson
@ 2018-08-09  3:40 ` Richard Henderson
  2018-08-09 10:09   ` Alex Bennée
  2018-08-09  3:40 ` [Qemu-devel] [PATCH 06/11] target/arm: Fix sign-extension in sve do_ldr/do_str Richard Henderson
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 24+ messages in thread
From: Richard Henderson @ 2018-08-09  3:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee, qemu-stable

Cc: qemu-stable@nongnu.org (3.0.1)
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/sve_helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index c3cbec9cf5..e03f954a26 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -4045,7 +4045,7 @@ DO_LD1(sve_ld1bdu_r, cpu_ldub_data_ra, uint64_t, uint8_t, )
 DO_LD1(sve_ld1bds_r, cpu_ldsb_data_ra, uint64_t, int8_t, )
 
 DO_LD1(sve_ld1hsu_r, cpu_lduw_data_ra, uint32_t, uint16_t, H1_4)
-DO_LD1(sve_ld1hss_r, cpu_ldsw_data_ra, uint32_t, int8_t, H1_4)
+DO_LD1(sve_ld1hss_r, cpu_ldsw_data_ra, uint32_t, int16_t, H1_4)
 DO_LD1(sve_ld1hdu_r, cpu_lduw_data_ra, uint64_t, uint16_t, )
 DO_LD1(sve_ld1hds_r, cpu_ldsw_data_ra, uint64_t, int16_t, )
 
-- 
2.17.1


* [Qemu-devel] [PATCH 06/11] target/arm: Fix sign-extension in sve do_ldr/do_str
  2018-08-09  3:40 [Qemu-devel] [PATCH 00/11] target/arm: sve linux-user patches Richard Henderson
                   ` (4 preceding siblings ...)
  2018-08-09  3:40 ` [Qemu-devel] [PATCH 05/11] target/arm: Fix typo in helper_sve_ld1hss_r Richard Henderson
@ 2018-08-09  3:40 ` Richard Henderson
  2018-08-09  5:28   ` Laurent Desnogues
  2018-08-09 11:00   ` Alex Bennée
  2018-08-09  3:40 ` [Qemu-devel] [PATCH 07/11] target/arm: Fix offset for LD1R instructions Richard Henderson
                   ` (5 subsequent siblings)
  11 siblings, 2 replies; 24+ messages in thread
From: Richard Henderson @ 2018-08-09  3:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee, qemu-stable

The expression (int) imm + (uint32_t) len_align turns into uint32_t
and thus with negative imm produces a memory operation at the wrong
offset.  None of the numbers involved are particularly large, so
change everything to use int.
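
A small standalone sketch of the conversion problem, assuming a
32-bit int and a 64-bit consumer of the sum (the helper names are
invented for this example):

```c
#include <assert.h>
#include <stdint.h>

/* Before: int + uint32_t is computed as uint32_t, then zero-extended. */
static int64_t ldr_offset_unsigned(int imm, uint32_t len_align)
{
    return imm + len_align;
}

/* After: int + int stays signed, so a negative sum sign-extends. */
static int64_t ldr_offset_signed(int imm, int len_align)
{
    return imm + len_align;
}

static void ldr_offset_demo(void)
{
    /* -32 + 16 should be -16, not a large positive offset. */
    assert(ldr_offset_unsigned(-32, 16) == INT64_C(0xfffffff0));
    assert(ldr_offset_signed(-32, 16) == -16);
}
```

With a non-negative sum the two agree, so only negative immediates
are affected.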

Cc: qemu-stable@nongnu.org (3.0.1)
Reported-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-sve.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 89efc80ee7..9e63b5f8e5 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -4372,12 +4372,11 @@ static bool trans_UCVTF_dd(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
  * The load should begin at the address Rn + IMM.
  */
 
-static void do_ldr(DisasContext *s, uint32_t vofs, uint32_t len,
-                   int rn, int imm)
+static void do_ldr(DisasContext *s, uint32_t vofs, int len, int rn, int imm)
 {
-    uint32_t len_align = QEMU_ALIGN_DOWN(len, 8);
-    uint32_t len_remain = len % 8;
-    uint32_t nparts = len / 8 + ctpop8(len_remain);
+    int len_align = QEMU_ALIGN_DOWN(len, 8);
+    int len_remain = len % 8;
+    int nparts = len / 8 + ctpop8(len_remain);
     int midx = get_mem_index(s);
     TCGv_i64 addr, t0, t1;
 
@@ -4458,12 +4457,11 @@ static void do_ldr(DisasContext *s, uint32_t vofs, uint32_t len,
 }
 
 /* Similarly for stores.  */
-static void do_str(DisasContext *s, uint32_t vofs, uint32_t len,
-                   int rn, int imm)
+static void do_str(DisasContext *s, uint32_t vofs, int len, int rn, int imm)
 {
-    uint32_t len_align = QEMU_ALIGN_DOWN(len, 8);
-    uint32_t len_remain = len % 8;
-    uint32_t nparts = len / 8 + ctpop8(len_remain);
+    int len_align = QEMU_ALIGN_DOWN(len, 8);
+    int len_remain = len % 8;
+    int nparts = len / 8 + ctpop8(len_remain);
     int midx = get_mem_index(s);
     TCGv_i64 addr, t0;
 
-- 
2.17.1


* [Qemu-devel] [PATCH 07/11] target/arm: Fix offset for LD1R instructions
  2018-08-09  3:40 [Qemu-devel] [PATCH 00/11] target/arm: sve linux-user patches Richard Henderson
                   ` (5 preceding siblings ...)
  2018-08-09  3:40 ` [Qemu-devel] [PATCH 06/11] target/arm: Fix sign-extension in sve do_ldr/do_str Richard Henderson
@ 2018-08-09  3:40 ` Richard Henderson
  2018-08-09  5:28   ` Laurent Desnogues
  2018-08-09  3:40 ` [Qemu-devel] [PATCH 08/11] target/arm: Fix offset scaling for LD_zprr and ST_zprr Richard Henderson
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 24+ messages in thread
From: Richard Henderson @ 2018-08-09  3:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee, qemu-stable

The immediate should be scaled by the size of the memory reference,
not the size of the elements into which it is loaded.
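
To make the difference concrete, a tiny sketch with a halfword load
replicated into doubleword elements.  The helper name is hypothetical
and the immediate is assumed to be non-negative; msz and esz are the
log2 sizes in bytes, as in the hunk below.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset for an unsigned immediate scaled by a log2 size. */
static uint64_t ld1r_byte_offset(unsigned imm, unsigned log2_size)
{
    return (uint64_t)imm << log2_size;
}

static void ld1r_demo(void)
{
    unsigned msz = 1;   /* halfword in memory */
    unsigned esz = 3;   /* doubleword elements in the register */

    assert(ld1r_byte_offset(3, msz) == 6);    /* after the fix */
    assert(ld1r_byte_offset(3, esz) == 24);   /* before the fix */
}
```

When msz equals esz the two offsets coincide, so only the extending
forms are affected.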

Cc: qemu-stable@nongnu.org (3.0.1)
Reported-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-sve.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 9e63b5f8e5..f635822a61 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -4819,6 +4819,7 @@ static bool trans_LD1R_zpri(DisasContext *s, arg_rpri_load *a, uint32_t insn)
     unsigned vsz = vec_full_reg_size(s);
     unsigned psz = pred_full_reg_size(s);
     unsigned esz = dtype_esz[a->dtype];
+    unsigned msz = dtype_msz(a->dtype);
     TCGLabel *over = gen_new_label();
     TCGv_i64 temp;
 
@@ -4842,7 +4843,7 @@ static bool trans_LD1R_zpri(DisasContext *s, arg_rpri_load *a, uint32_t insn)
 
     /* Load the data.  */
     temp = tcg_temp_new_i64();
-    tcg_gen_addi_i64(temp, cpu_reg_sp(s, a->rn), a->imm << esz);
+    tcg_gen_addi_i64(temp, cpu_reg_sp(s, a->rn), a->imm << msz);
     tcg_gen_qemu_ld_i64(temp, temp, get_mem_index(s),
                         s->be_data | dtype_mop[a->dtype]);
 
-- 
2.17.1


* [Qemu-devel] [PATCH 08/11] target/arm: Fix offset scaling for LD_zprr and ST_zprr
  2018-08-09  3:40 [Qemu-devel] [PATCH 00/11] target/arm: sve linux-user patches Richard Henderson
                   ` (6 preceding siblings ...)
  2018-08-09  3:40 ` [Qemu-devel] [PATCH 07/11] target/arm: Fix offset for LD1R instructions Richard Henderson
@ 2018-08-09  3:40 ` Richard Henderson
  2018-08-09  5:29   ` Laurent Desnogues
  2018-08-09  3:40 ` [Qemu-devel] [PATCH 09/11] target/arm: Reformat integer register dump Richard Henderson
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 24+ messages in thread
From: Richard Henderson @ 2018-08-09  3:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee, qemu-stable

The scaling should be solely on the memory operation size; the number
of registers being loaded does not enter into the initial computation.
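
A sketch of the two address computations in plain C.  The function
names are invented for illustration; rn and rm stand for the register
values, msz for the log2 of the memory operation size, and nreg for
the number of registers minus one, as in the hunk below.

```c
#include <assert.h>
#include <stdint.h>

/* After the fix: the index is scaled by the memory size only. */
static uint64_t zprr_addr(uint64_t rn, uint64_t rm, unsigned msz)
{
    return rn + (rm << msz);
}

/* Before the fix: the index was also multiplied by the register count. */
static uint64_t zprr_addr_old(uint64_t rn, uint64_t rm,
                              unsigned msz, unsigned nreg)
{
    return rn + rm * ((uint64_t)(nreg + 1) << msz);
}

static void zprr_demo(void)
{
    /* Two-register halfword access: nreg = 1, msz = 1, index 4. */
    assert(zprr_addr(0x1000, 4, 1) == 0x1008);
    assert(zprr_addr_old(0x1000, 4, 1, 1) == 0x1010);
    /* Single-register accesses were unaffected. */
    assert(zprr_addr_old(0x1000, 4, 1, 0) == zprr_addr(0x1000, 4, 1));
}
```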

Cc: qemu-stable@nongnu.org (3.0.1)
Reported-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-sve.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index f635822a61..d27bc8c946 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -4665,8 +4665,7 @@ static bool trans_LD_zprr(DisasContext *s, arg_rprr_load *a, uint32_t insn)
     }
     if (sve_access_check(s)) {
         TCGv_i64 addr = new_tmp_a64(s);
-        tcg_gen_muli_i64(addr, cpu_reg(s, a->rm),
-                         (a->nreg + 1) << dtype_msz(a->dtype));
+        tcg_gen_shli_i64(addr, cpu_reg(s, a->rm), dtype_msz(a->dtype));
         tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, a->rn));
         do_ld_zpa(s, a->rd, a->pg, addr, a->dtype, a->nreg);
     }
@@ -4899,7 +4898,7 @@ static bool trans_ST_zprr(DisasContext *s, arg_rprr_store *a, uint32_t insn)
     }
     if (sve_access_check(s)) {
         TCGv_i64 addr = new_tmp_a64(s);
-        tcg_gen_muli_i64(addr, cpu_reg(s, a->rm), (a->nreg + 1) << a->msz);
+        tcg_gen_shli_i64(addr, cpu_reg(s, a->rm), a->msz);
         tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, a->rn));
         do_st_zpa(s, a->rd, a->pg, addr, a->msz, a->esz, a->nreg);
     }
-- 
2.17.1


* [Qemu-devel] [PATCH 09/11] target/arm: Reformat integer register dump
  2018-08-09  3:40 [Qemu-devel] [PATCH 00/11] target/arm: sve linux-user patches Richard Henderson
                   ` (7 preceding siblings ...)
  2018-08-09  3:40 ` [Qemu-devel] [PATCH 08/11] target/arm: Fix offset scaling for LD_zprr and ST_zprr Richard Henderson
@ 2018-08-09  3:40 ` Richard Henderson
  2018-08-09 10:12   ` Alex Bennée
  2018-08-09 10:58   ` Alex Bennée
  2018-08-09  3:40 ` [Qemu-devel] [PATCH 10/11] target/arm: Dump SVE state if enabled Richard Henderson
                   ` (2 subsequent siblings)
  11 siblings, 2 replies; 24+ messages in thread
From: Richard Henderson @ 2018-08-09  3:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee, qemu-stable

With PC, there are 33 registers.  Three per line lines up nicely
without overflowing 80 columns.
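
A standalone sketch of the layout, using plain fprintf in place of
cpu_fprintf.  The function name and the returned line count are
additions for illustration only; the line-break rule is the one from
the hunk below.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Print PC, X00..X30 and SP three per line; return the line count. */
static int dump_int_regs(FILE *f, uint64_t pc, const uint64_t xregs[32])
{
    int lines = 0;

    fprintf(f, " PC=%016" PRIx64 " ", pc);
    for (int i = 0; i < 32; i++) {
        if (i == 31) {
            /* SP is last, so it always ends the line. */
            fprintf(f, " SP=%016" PRIx64 "\n", xregs[i]);
            lines++;
        } else {
            /* PC takes the first slot, so X01, X04, ... end a line. */
            bool eol = (i + 2) % 3 == 0;
            fprintf(f, "X%02d=%016" PRIx64 "%s", i, xregs[i],
                    eol ? "\n" : " ");
            lines += eol;
        }
    }
    return lines;
}

static void dump_demo(void)
{
    uint64_t xregs[32] = { 0 };

    /* 33 registers at three per line gives 11 lines. */
    assert(dump_int_regs(stdout, 0, xregs) == 11);
}
```

Each line is three 20-character fields plus two separators, 62
columns in total.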

Cc: qemu-stable@nongnu.org (3.0.1)
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-a64.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 45a6c2a3aa..358f169c75 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -137,14 +137,13 @@ void aarch64_cpu_dump_state(CPUState *cs, FILE *f,
     int el = arm_current_el(env);
     const char *ns_status;
 
-    cpu_fprintf(f, "PC=%016"PRIx64"  SP=%016"PRIx64"\n",
-            env->pc, env->xregs[31]);
-    for (i = 0; i < 31; i++) {
-        cpu_fprintf(f, "X%02d=%016"PRIx64, i, env->xregs[i]);
-        if ((i % 4) == 3) {
-            cpu_fprintf(f, "\n");
+    cpu_fprintf(f, " PC=%016" PRIx64 " ", env->pc);
+    for (i = 0; i < 32; i++) {
+        if (i == 31) {
+            cpu_fprintf(f, " SP=%016" PRIx64 "\n", env->xregs[i]);
         } else {
-            cpu_fprintf(f, " ");
+            cpu_fprintf(f, "X%02d=%016" PRIx64 "%s", i, env->xregs[i],
+                        (i + 2) % 3 ? " " : "\n");
         }
     }
 
-- 
2.17.1


* [Qemu-devel] [PATCH 10/11] target/arm: Dump SVE state if enabled
  2018-08-09  3:40 [Qemu-devel] [PATCH 00/11] target/arm: sve linux-user patches Richard Henderson
                   ` (8 preceding siblings ...)
  2018-08-09  3:40 ` [Qemu-devel] [PATCH 09/11] target/arm: Reformat integer register dump Richard Henderson
@ 2018-08-09  3:40 ` Richard Henderson
  2018-08-09 10:55   ` Alex Bennée
  2018-08-09  3:40 ` [Qemu-devel] [PATCH 11/11] target/arm: Add sve-max-vq cpu property to -cpu max Richard Henderson
  2018-08-16 12:11 ` [Qemu-devel] [PATCH 00/11] target/arm: sve linux-user patches Peter Maydell
  11 siblings, 1 reply; 24+ messages in thread
From: Richard Henderson @ 2018-08-09  3:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee, qemu-stable

Also fold the FPCR/FPSR state onto the same line as PSTATE,
and mention but do not dump disabled FPU state.

Cc: qemu-stable@nongnu.org (3.0.1)
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-a64.c | 95 +++++++++++++++++++++++++++++++++-----
 1 file changed, 83 insertions(+), 12 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 358f169c75..b29dc49c4f 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -152,8 +152,7 @@ void aarch64_cpu_dump_state(CPUState *cs, FILE *f,
     } else {
         ns_status = "";
     }
-
-    cpu_fprintf(f, "\nPSTATE=%08x %c%c%c%c %sEL%d%c\n",
+    cpu_fprintf(f, "PSTATE=%08x %c%c%c%c %sEL%d%c",
                 psr,
                 psr & PSTATE_N ? 'N' : '-',
                 psr & PSTATE_Z ? 'Z' : '-',
@@ -163,17 +162,89 @@ void aarch64_cpu_dump_state(CPUState *cs, FILE *f,
                 el,
                 psr & PSTATE_SP ? 'h' : 't');
 
-    if (flags & CPU_DUMP_FPU) {
-        int numvfpregs = 32;
-        for (i = 0; i < numvfpregs; i++) {
-            uint64_t *q = aa64_vfp_qreg(env, i);
-            uint64_t vlo = q[0];
-            uint64_t vhi = q[1];
-            cpu_fprintf(f, "q%02d=%016" PRIx64 ":%016" PRIx64 "%c",
-                        i, vhi, vlo, (i & 1 ? '\n' : ' '));
+    if (!(flags & CPU_DUMP_FPU)) {
+        cpu_fprintf(f, "\n");
+        return;
+    }
+    cpu_fprintf(f, "     FPCR=%08x FPSR=%08x\n",
+                vfp_get_fpcr(env), vfp_get_fpsr(env));
+
+    if (arm_feature(env, ARM_FEATURE_SVE)) {
+        int j, zcr_len = env->vfp.zcr_el[1] & 0xf; /* fix for system mode */
+
+        for (i = 0; i <= FFR_PRED_NUM; i++) {
+            bool eol;
+            if (i == FFR_PRED_NUM) {
+                cpu_fprintf(f, "FFR=");
+                /* It's last, so end the line.  */
+                eol = true;
+            } else {
+                cpu_fprintf(f, "P%02d=", i);
+                switch (zcr_len) {
+                case 0:
+                    eol = i % 8 == 7;
+                    break;
+                case 1:
+                    eol = i % 6 == 5;
+                    break;
+                case 2:
+                case 3:
+                    eol = i % 3 == 2;
+                    break;
+                default:
+                    /* More than one quadword per predicate.  */
+                    eol = true;
+                    break;
+                }
+            }
+            for (j = zcr_len / 4; j >= 0; j--) {
+                int digits;
+                if (j * 4 + 4 <= zcr_len + 1) {
+                    digits = 16;
+                } else {
+                    digits = (zcr_len % 4 + 1) * 4;
+                }
+                cpu_fprintf(f, "%0*" PRIx64 "%s", digits,
+                            env->vfp.pregs[i].p[j],
+                            j ? ":" : eol ? "\n" : " ");
+            }
+        }
+
+        for (i = 0; i < 32; i++) {
+            if (zcr_len == 0) {
+                cpu_fprintf(f, "Z%02d=%016" PRIx64 ":%016" PRIx64 "%s",
+                            i, env->vfp.zregs[i].d[1],
+                            env->vfp.zregs[i].d[0], i & 1 ? "\n" : " ");
+            } else if (zcr_len == 1) {
+                cpu_fprintf(f, "Z%02d=%016" PRIx64 ":%016" PRIx64
+                            ":%016" PRIx64 ":%016" PRIx64 "\n",
+                            i, env->vfp.zregs[i].d[3], env->vfp.zregs[i].d[2],
+                            env->vfp.zregs[i].d[1], env->vfp.zregs[i].d[0]);
+            } else {
+                for (j = zcr_len; j >= 0; j--) {
+                    bool odd = (zcr_len - j) % 2 != 0;
+                    if (j == zcr_len) {
+                        cpu_fprintf(f, "Z%02d[%x-%x]=", i, j, j - 1);
+                    } else if (!odd) {
+                        if (j > 0) {
+                            cpu_fprintf(f, "   [%x-%x]=", j, j - 1);
+                        } else {
+                            cpu_fprintf(f, "     [%x]=", j);
+                        }
+                    }
+                    cpu_fprintf(f, "%016" PRIx64 ":%016" PRIx64 "%s",
+                                env->vfp.zregs[i].d[j * 2 + 1],
+                                env->vfp.zregs[i].d[j * 2],
+                                odd || j == 0 ? "\n" : ":");
+                }
+            }
+        }
+    } else {
+        for (i = 0; i < 32; i++) {
+            uint64_t *q = aa64_vfp_qreg(env, i);
+            cpu_fprintf(f, "Q%02d=%016" PRIx64 ":%016" PRIx64 "%s",
+                        i, q[1], q[0], (i & 1 ? "\n" : " "));
         }
-        cpu_fprintf(f, "FPCR: %08x  FPSR: %08x\n",
-                    vfp_get_fpcr(env), vfp_get_fpsr(env));
     }
 }
 
-- 
2.17.1


* [Qemu-devel] [PATCH 11/11] target/arm: Add sve-max-vq cpu property to -cpu max
  2018-08-09  3:40 [Qemu-devel] [PATCH 00/11] target/arm: sve linux-user patches Richard Henderson
                   ` (9 preceding siblings ...)
  2018-08-09  3:40 ` [Qemu-devel] [PATCH 10/11] target/arm: Dump SVE state if enabled Richard Henderson
@ 2018-08-09  3:40 ` Richard Henderson
  2018-08-09 11:00   ` Alex Bennée
  2018-08-16 12:11 ` [Qemu-devel] [PATCH 00/11] target/arm: sve linux-user patches Peter Maydell
  11 siblings, 1 reply; 24+ messages in thread
From: Richard Henderson @ 2018-08-09  3:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee, qemu-stable

This allows the default (and maximum) vector length to be set
from the command line, which is extraordinarily helpful in
debugging problems that depend on vector length without having to
bake knowledge of PR_SVE_SET_VL into every guest binary.
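
With the property in place, something like "-cpu max,sve-max-vq=2"
should cap the vector length at 32 bytes.  Below is a rough model of
how the prctl path then bounds a request.  The function name is
hypothetical, and returning the selected length in bytes is an
assumption of this sketch; the range checks and the clamping follow
the linux-user hunk below.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical model of the vector-length request.  arg2 is the
 * requested length in bytes, sve_max_vq the configured maximum in
 * quadwords.  Stores the selected VQ and returns the length in bytes.
 */
static long sve_set_vl_model(uint32_t *vq_state, uint32_t sve_max_vq,
                             long arg2)
{
    uint32_t vq;

    /* Must be a multiple of 16 within the range the kernel accepts. */
    if (arg2 < 0 || arg2 > 512 * 16 || (arg2 & 15)) {
        return -EINVAL;
    }
    vq = (uint32_t)(arg2 / 16);
    if (vq < 1) {
        vq = 1;                 /* a request of 0 selects the minimum */
    }
    if (vq > sve_max_vq) {
        vq = sve_max_vq;        /* bound by the cpu property */
    }
    *vq_state = vq;
    return (long)vq * 16;
}

static void sve_set_vl_demo(void)
{
    uint32_t vq = 0;

    assert(sve_set_vl_model(&vq, 4, 256) == 64 && vq == 4);  /* clamped */
    assert(sve_set_vl_model(&vq, 4, 32) == 32 && vq == 2);
    assert(sve_set_vl_model(&vq, 4, 24) == -EINVAL);         /* not x16 */
}
```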

Cc: qemu-stable@nongnu.org (3.0.1)
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu.h     |  3 +++
 linux-user/syscall.c | 19 +++++++++++++------
 target/arm/cpu.c     |  6 +++---
 target/arm/cpu64.c   | 29 +++++++++++++++++++++++++++++
 target/arm/helper.c  |  7 +++++--
 5 files changed, 53 insertions(+), 11 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index e310ffc29d..9526ed27cb 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -857,6 +857,9 @@ struct ARMCPU {
 
     /* Used to synchronize KVM and QEMU in-kernel device levels */
     uint8_t device_irq_level;
+
+    /* Used to set the maximum vector length the cpu will support.  */
+    uint32_t sve_max_vq;
 };
 
 static inline ARMCPU *arm_env_get_cpu(CPUARMState *env)
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index dfc851cc35..5a4af76c03 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -10848,15 +10848,22 @@ abi_long do_syscall(void *cpu_env, int num, abi_long arg1,
 #endif
 #ifdef TARGET_AARCH64
         case TARGET_PR_SVE_SET_VL:
-            /* We cannot support either PR_SVE_SET_VL_ONEXEC
-               or PR_SVE_VL_INHERIT.  Therefore, anything above
-               ARM_MAX_VQ results in EINVAL.  */
+            /*
+             * We cannot support either PR_SVE_SET_VL_ONEXEC or
+             * PR_SVE_VL_INHERIT.  Note the kernel definition
+             * of sve_vl_valid allows for VQ=512, i.e. VL=8192,
+             * even though the current architectural maximum is VQ=16.
+             */
             ret = -TARGET_EINVAL;
             if (arm_feature(cpu_env, ARM_FEATURE_SVE)
-                && arg2 >= 0 && arg2 <= ARM_MAX_VQ * 16 && !(arg2 & 15)) {
+                && arg2 >= 0 && arg2 <= 512 * 16 && !(arg2 & 15)) {
                 CPUARMState *env = cpu_env;
-                int old_vq = (env->vfp.zcr_el[1] & 0xf) + 1;
-                int vq = MAX(arg2 / 16, 1);
+                ARMCPU *cpu = arm_env_get_cpu(env);
+                uint32_t vq, old_vq;
+
+                old_vq = (env->vfp.zcr_el[1] & 0xf) + 1;
+                vq = MAX(arg2 / 16, 1);
+                vq = MIN(vq, cpu->sve_max_vq);
 
                 if (vq < old_vq) {
                     aarch64_sve_narrow_vq(env, vq);
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 64a8005a4b..b25898ed4c 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -168,9 +168,9 @@ static void arm_cpu_reset(CPUState *s)
         env->cp15.cpacr_el1 = deposit64(env->cp15.cpacr_el1, 16, 2, 3);
         env->cp15.cptr_el[3] |= CPTR_EZ;
         /* with maximum vector length */
-        env->vfp.zcr_el[1] = ARM_MAX_VQ - 1;
-        env->vfp.zcr_el[2] = ARM_MAX_VQ - 1;
-        env->vfp.zcr_el[3] = ARM_MAX_VQ - 1;
+        env->vfp.zcr_el[1] = cpu->sve_max_vq - 1;
+        env->vfp.zcr_el[2] = env->vfp.zcr_el[1];
+        env->vfp.zcr_el[3] = env->vfp.zcr_el[1];
 #else
         /* Reset into the highest available EL */
         if (arm_feature(env, ARM_FEATURE_EL3)) {
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index d0581d59d8..800bff780e 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -29,6 +29,7 @@
 #include "sysemu/sysemu.h"
 #include "sysemu/kvm.h"
 #include "kvm_arm.h"
+#include "qapi/visitor.h"
 
 static inline void set_feature(CPUARMState *env, int feature)
 {
@@ -217,6 +218,29 @@ static void aarch64_a53_initfn(Object *obj)
     define_arm_cp_regs(cpu, cortex_a57_a53_cp_reginfo);
 }
 
+static void cpu_max_get_sve_vq(Object *obj, Visitor *v, const char *name,
+                               void *opaque, Error **errp)
+{
+    ARMCPU *cpu = ARM_CPU(obj);
+    visit_type_uint32(v, name, &cpu->sve_max_vq, errp);
+}
+
+static void cpu_max_set_sve_vq(Object *obj, Visitor *v, const char *name,
+                               void *opaque, Error **errp)
+{
+    ARMCPU *cpu = ARM_CPU(obj);
+    Error *err = NULL;
+
+    visit_type_uint32(v, name, &cpu->sve_max_vq, &err);
+
+    if (!err && (cpu->sve_max_vq == 0 || cpu->sve_max_vq > ARM_MAX_VQ)) {
+        error_setg(&err, "unsupported SVE vector length");
+        error_append_hint(&err, "Valid sve-max-vq in range [1-%d]\n",
+                          ARM_MAX_VQ);
+    }
+    error_propagate(errp, err);
+}
+
 /* -cpu max: if KVM is enabled, like -cpu host (best possible with this host);
  * otherwise, a CPU with as many features enabled as our emulation supports.
  * The version of '-cpu max' for qemu-system-arm is defined in cpu.c;
@@ -253,6 +277,10 @@ static void aarch64_max_initfn(Object *obj)
         cpu->ctr = 0x80038003; /* 32 byte I and D cacheline size, VIPT icache */
         cpu->dcz_blocksize = 7; /*  512 bytes */
 #endif
+
+        cpu->sve_max_vq = ARM_MAX_VQ;
+        object_property_add(obj, "sve-max-vq", "uint32", cpu_max_get_sve_vq,
+                            cpu_max_set_sve_vq, NULL, NULL, &error_fatal);
     }
 }
 
@@ -405,6 +433,7 @@ void aarch64_sve_narrow_vq(CPUARMState *env, unsigned vq)
     uint64_t pmask;
 
     assert(vq >= 1 && vq <= ARM_MAX_VQ);
+    assert(vq <= arm_env_get_cpu(env)->sve_max_vq);
 
     /* Zap the high bits of the zregs.  */
     for (i = 0; i < 32; i++) {
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 66afb08ee0..c24c66d43e 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -12408,9 +12408,12 @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
             zcr_len = 0;
         } else {
             int current_el = arm_current_el(env);
+            ARMCPU *cpu = arm_env_get_cpu(env);
 
-            zcr_len = env->vfp.zcr_el[current_el <= 1 ? 1 : current_el];
-            zcr_len &= 0xf;
+            zcr_len = cpu->sve_max_vq - 1;
+            if (current_el <= 1) {
+                zcr_len = MIN(zcr_len, 0xf & (uint32_t)env->vfp.zcr_el[1]);
+            }
             if (current_el < 2 && arm_feature(env, ARM_FEATURE_EL2)) {
                 zcr_len = MIN(zcr_len, 0xf & (uint32_t)env->vfp.zcr_el[2]);
             }
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread
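
[Editor's sketch, not part of the posted patch.]  The PR_SVE_SET_VL
handling in the hunk above can be modelled in plain C as below.  The
function name, the MODEL_EINVAL constant and the return convention
(clamped VQ rather than a byte length) are assumptions made for
illustration; the ARM_FEATURE_SVE check and the narrowing call are
left out.  With the new property one would presumably run something
like "-cpu max,sve-max-vq=4" and then observe the clamp.

```c
#include <stdint.h>

#define MODEL_EINVAL (-22)   /* hypothetical stand-in for -TARGET_EINVAL */

/*
 * Model of the validation and clamping applied to the requested
 * vector length (arg2, in bytes).  Returns the resulting VQ
 * (number of 128-bit quadwords) or MODEL_EINVAL.
 */
static int model_sve_set_vl(long arg2, uint32_t sve_max_vq)
{
    uint32_t vq;

    /* Same range check as the patch: 0 <= VL <= 512 * 16, multiple of 16. */
    if (arg2 < 0 || arg2 > 512 * 16 || (arg2 & 15)) {
        return MODEL_EINVAL;
    }

    vq = (uint32_t)(arg2 / 16);
    if (vq < 1) {
        vq = 1;                 /* MAX(arg2 / 16, 1) */
    }
    if (vq > sve_max_vq) {
        vq = sve_max_vq;        /* MIN(vq, cpu->sve_max_vq) */
    }
    return (int)vq;
}
```

A request larger than the configured maximum is therefore clamped
rather than rejected, as long as it passes the range check.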

* Re: [Qemu-devel] [PATCH 06/11] target/arm: Fix sign-extension in sve do_ldr/do_str
  2018-08-09  3:40 ` [Qemu-devel] [PATCH 06/11] target/arm: Fix sign-extension in sve do_ldr/do_str Richard Henderson
@ 2018-08-09  5:28   ` Laurent Desnogues
  2018-08-09 11:00   ` Alex Bennée
  1 sibling, 0 replies; 24+ messages in thread
From: Laurent Desnogues @ 2018-08-09  5:28 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-devel, Peter Maydell, Alex Bennée, qemu-stable

On Thu, Aug 9, 2018 at 5:40 AM, Richard Henderson
<richard.henderson@linaro.org> wrote:
> The expression (int) imm + (uint32_t) len_align turns into uint32_t
> and thus with negative imm produces a memory operation at the wrong
> offset.  None of the numbers involved are particularly large, so
> change everything to use int.
>
> Cc: qemu-stable@nongnu.org (3.0.1)
> Reported-by: Laurent Desnogues <laurent.desnogues@gmail.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Tested-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Reviewed-by: Laurent Desnogues <laurent.desnogues@gmail.com>

Laurent

> ---
>  target/arm/translate-sve.c | 18 ++++++++----------
>  1 file changed, 8 insertions(+), 10 deletions(-)
>
> diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
> index 89efc80ee7..9e63b5f8e5 100644
> --- a/target/arm/translate-sve.c
> +++ b/target/arm/translate-sve.c
> @@ -4372,12 +4372,11 @@ static bool trans_UCVTF_dd(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
>   * The load should begin at the address Rn + IMM.
>   */
>
> -static void do_ldr(DisasContext *s, uint32_t vofs, uint32_t len,
> -                   int rn, int imm)
> +static void do_ldr(DisasContext *s, uint32_t vofs, int len, int rn, int imm)
>  {
> -    uint32_t len_align = QEMU_ALIGN_DOWN(len, 8);
> -    uint32_t len_remain = len % 8;
> -    uint32_t nparts = len / 8 + ctpop8(len_remain);
> +    int len_align = QEMU_ALIGN_DOWN(len, 8);
> +    int len_remain = len % 8;
> +    int nparts = len / 8 + ctpop8(len_remain);
>      int midx = get_mem_index(s);
>      TCGv_i64 addr, t0, t1;
>
> @@ -4458,12 +4457,11 @@ static void do_ldr(DisasContext *s, uint32_t vofs, uint32_t len,
>  }
>
>  /* Similarly for stores.  */
> -static void do_str(DisasContext *s, uint32_t vofs, uint32_t len,
> -                   int rn, int imm)
> +static void do_str(DisasContext *s, uint32_t vofs, int len, int rn, int imm)
>  {
> -    uint32_t len_align = QEMU_ALIGN_DOWN(len, 8);
> -    uint32_t len_remain = len % 8;
> -    uint32_t nparts = len / 8 + ctpop8(len_remain);
> +    int len_align = QEMU_ALIGN_DOWN(len, 8);
> +    int len_remain = len % 8;
> +    int nparts = len / 8 + ctpop8(len_remain);
>      int midx = get_mem_index(s);
>      TCGv_i64 addr, t0;
>
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 24+ messages in thread
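
[Editor's sketch, not part of the thread.]  The conversion problem
described in the commit message can be reproduced outside QEMU.  The
function names below are hypothetical, and the sketch assumes a
32-bit int, as on the usual QEMU hosts.

```c
#include <stdint.h>

/*
 * Mixed signedness: int + uint32_t is evaluated as uint32_t, so a
 * negative imm wraps around before the result is widened to 64 bits.
 */
static int64_t offset_mixed(int imm, uint32_t len_align)
{
    return imm + len_align;     /* uint32_t result, zero-extended */
}

/* All-int arithmetic, as in the fix: the sum stays negative. */
static int64_t offset_int(int imm, int len_align)
{
    return imm + len_align;     /* int result, sign-extended */
}
```

With imm = -16 and len_align = 8, the first form yields 0xfffffff8
as a positive 64-bit offset, while the second yields -8.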

* Re: [Qemu-devel] [PATCH 07/11] target/arm: Fix offset for LD1R instructions
  2018-08-09  3:40 ` [Qemu-devel] [PATCH 07/11] target/arm: Fix offset for LD1R instructions Richard Henderson
@ 2018-08-09  5:28   ` Laurent Desnogues
  0 siblings, 0 replies; 24+ messages in thread
From: Laurent Desnogues @ 2018-08-09  5:28 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-devel, Peter Maydell, Alex Bennée, qemu-stable

On Thu, Aug 9, 2018 at 5:40 AM, Richard Henderson
<richard.henderson@linaro.org> wrote:
> The immediate should be scaled by the size of the memory reference,
> not the size of the elements into which it is loaded.
>
> Cc: qemu-stable@nongnu.org (3.0.1)
> Reported-by: Laurent Desnogues <laurent.desnogues@gmail.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Tested-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Reviewed-by: Laurent Desnogues <laurent.desnogues@gmail.com>

Laurent

> ---
>  target/arm/translate-sve.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
> index 9e63b5f8e5..f635822a61 100644
> --- a/target/arm/translate-sve.c
> +++ b/target/arm/translate-sve.c
> @@ -4819,6 +4819,7 @@ static bool trans_LD1R_zpri(DisasContext *s, arg_rpri_load *a, uint32_t insn)
>      unsigned vsz = vec_full_reg_size(s);
>      unsigned psz = pred_full_reg_size(s);
>      unsigned esz = dtype_esz[a->dtype];
> +    unsigned msz = dtype_msz(a->dtype);
>      TCGLabel *over = gen_new_label();
>      TCGv_i64 temp;
>
> @@ -4842,7 +4843,7 @@ static bool trans_LD1R_zpri(DisasContext *s, arg_rpri_load *a, uint32_t insn)
>
>      /* Load the data.  */
>      temp = tcg_temp_new_i64();
> -    tcg_gen_addi_i64(temp, cpu_reg_sp(s, a->rn), a->imm << esz);
> +    tcg_gen_addi_i64(temp, cpu_reg_sp(s, a->rn), a->imm << msz);
>      tcg_gen_qemu_ld_i64(temp, temp, get_mem_index(s),
>                          s->be_data | dtype_mop[a->dtype]);
>
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 24+ messages in thread
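
[Editor's sketch, not part of the thread.]  A small model of the
address computation, with a hypothetical function name and the
immediate treated as unsigned (an assumption).  Passing the element
size instead of the memory size as the shift reproduces the old
behaviour.

```c
#include <stdint.h>

/*
 * Address of the replicated load: base + (imm << shift).
 * The fix uses shift = msz (log2 of the memory access size);
 * the old code used esz (log2 of the destination element size).
 */
static uint64_t ld1r_addr(uint64_t base, unsigned imm, unsigned shift)
{
    return base + ((uint64_t)imm << shift);
}
```

For a halfword load into doubleword elements (msz = 1, esz = 3) with
imm = 5, the correct address is base + 10; scaling by esz gives
base + 40.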

* Re: [Qemu-devel] [PATCH 08/11] target/arm: Fix offset scaling for LD_zprr and ST_zprr
  2018-08-09  3:40 ` [Qemu-devel] [PATCH 08/11] target/arm: Fix offset scaling for LD_zprr and ST_zprr Richard Henderson
@ 2018-08-09  5:29   ` Laurent Desnogues
  0 siblings, 0 replies; 24+ messages in thread
From: Laurent Desnogues @ 2018-08-09  5:29 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-devel, Peter Maydell, Alex Bennée, qemu-stable

On Thu, Aug 9, 2018 at 5:40 AM, Richard Henderson
<richard.henderson@linaro.org> wrote:
> The scaling should be solely on the memory operation size; the number
> of registers being loaded does not enter into the initial computation.
>
> Cc: qemu-stable@nongnu.org (3.0.1)
> Reported-by: Laurent Desnogues <laurent.desnogues@gmail.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Tested-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Reviewed-by: Laurent Desnogues <laurent.desnogues@gmail.com>

Laurent

> ---
>  target/arm/translate-sve.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
> index f635822a61..d27bc8c946 100644
> --- a/target/arm/translate-sve.c
> +++ b/target/arm/translate-sve.c
> @@ -4665,8 +4665,7 @@ static bool trans_LD_zprr(DisasContext *s, arg_rprr_load *a, uint32_t insn)
>      }
>      if (sve_access_check(s)) {
>          TCGv_i64 addr = new_tmp_a64(s);
> -        tcg_gen_muli_i64(addr, cpu_reg(s, a->rm),
> -                         (a->nreg + 1) << dtype_msz(a->dtype));
> +        tcg_gen_shli_i64(addr, cpu_reg(s, a->rm), dtype_msz(a->dtype));
>          tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, a->rn));
>          do_ld_zpa(s, a->rd, a->pg, addr, a->dtype, a->nreg);
>      }
> @@ -4899,7 +4898,7 @@ static bool trans_ST_zprr(DisasContext *s, arg_rprr_store *a, uint32_t insn)
>      }
>      if (sve_access_check(s)) {
>          TCGv_i64 addr = new_tmp_a64(s);
> -        tcg_gen_muli_i64(addr, cpu_reg(s, a->rm), (a->nreg + 1) << a->msz);
> +        tcg_gen_shli_i64(addr, cpu_reg(s, a->rm), a->msz);
>          tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, a->rn));
>          do_st_zpa(s, a->rd, a->pg, addr, a->msz, a->esz, a->nreg);
>      }
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 24+ messages in thread
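
[Editor's sketch, not part of the thread.]  The two address
computations from the hunks above, written as plain C with
hypothetical function names, so the difference for multi-register
forms is easy to see.

```c
#include <stdint.h>

/* After the fix: the index register is scaled by the memory size only. */
static uint64_t zprr_addr(uint64_t rn, uint64_t rm, unsigned msz)
{
    return rn + (rm << msz);
}

/* Before the fix: the register count was folded into the scale. */
static uint64_t zprr_addr_old(uint64_t rn, uint64_t rm, unsigned msz,
                              unsigned nreg)
{
    return rn + rm * ((uint64_t)(nreg + 1) << msz);
}
```

The two agree when nreg is 0 (single-register forms) and diverge for
the multi-register forms, where the old code moved the base too far.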

* Re: [Qemu-devel] [PATCH 02/11] target/arm: Fix typo in do_sat_addsub_64
  2018-08-09  3:40 ` [Qemu-devel] [PATCH 02/11] target/arm: Fix typo in do_sat_addsub_64 Richard Henderson
@ 2018-08-09  9:12   ` Alex Bennée
  0 siblings, 0 replies; 24+ messages in thread
From: Alex Bennée @ 2018-08-09  9:12 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-devel, laurent.desnogues, peter.maydell, qemu-stable


Richard Henderson <richard.henderson@linaro.org> writes:

> Used the wrong temporary in the computation of subtractive overflow.
>
> Cc: qemu-stable@nongnu.org (3.0.1)
> Tested-by: Laurent Desnogues <laurent.desnogues@gmail.com>
> Reviewed-by: Laurent Desnogues <laurent.desnogues@gmail.com>
> Reported-by: Laurent Desnogues <laurent.desnogues@gmail.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  target/arm/translate-sve.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
> index 374051cd20..9dd4c38bab 100644
> --- a/target/arm/translate-sve.c
> +++ b/target/arm/translate-sve.c
> @@ -1625,7 +1625,7 @@ static void do_sat_addsub_64(TCGv_i64 reg, TCGv_i64 val, bool u, bool d)
>              /* Detect signed overflow for subtraction.  */
>              tcg_gen_xor_i64(t0, reg, val);
>              tcg_gen_sub_i64(t1, reg, val);
> -            tcg_gen_xor_i64(reg, reg, t0);
> +            tcg_gen_xor_i64(reg, reg, t1);
>              tcg_gen_and_i64(t0, t0, reg);
>
>              /* Bound the result.  */


--
Alex Bennée

^ permalink raw reply	[flat|nested] 24+ messages in thread
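
[Editor's sketch, not part of the thread.]  The corrected sequence
computes (reg ^ val) & (reg ^ (reg - val)) and treats a set sign bit
as signed overflow.  The same test in plain C, with a hypothetical
function name and unsigned arithmetic to avoid undefined behaviour:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Signed overflow on a - b occurs when the operands have different
 * signs and the result's sign differs from a's sign.
 */
static bool sub_overflows(int64_t a, int64_t b)
{
    uint64_t ua = (uint64_t)a;
    uint64_t ub = (uint64_t)b;
    uint64_t diff = ua - ub;            /* t1 in the patch */

    return (int64_t)((ua ^ ub) & (ua ^ diff)) < 0;
}
```

XORing with t0 instead of t1, as the old code did, drops the
difference from the test entirely, which is why the typo mattered.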

* Re: [Qemu-devel] [PATCH 03/11] target/arm: Reorganize SVE WHILE
  2018-08-09  3:40 ` [Qemu-devel] [PATCH 03/11] target/arm: Reorganize SVE WHILE Richard Henderson
@ 2018-08-09  9:48   ` Alex Bennée
  0 siblings, 0 replies; 24+ messages in thread
From: Alex Bennée @ 2018-08-09  9:48 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-devel, laurent.desnogues, peter.maydell, qemu-stable


Richard Henderson <richard.henderson@linaro.org> writes:

> The pseudocode for this operation is an increment + compare loop,
> so comparing <= the maximum integer produces an all-true predicate.
>
> Rather than bound in both the inline code and the helper, pass the
> helper the number of predicate bits to set instead of the number
> of predicate elements to set.
>
> Cc: qemu-stable@nongnu.org (3.0.1)
> Tested-by: Laurent Desnogues <laurent.desnogues@gmail.com>
> Reviewed-by: Laurent Desnogues <laurent.desnogues@gmail.com>
> Reported-by: Laurent Desnogues <laurent.desnogues@gmail.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  target/arm/sve_helper.c    |  5 ----
>  target/arm/translate-sve.c | 49 +++++++++++++++++++++++++-------------
>  2 files changed, 32 insertions(+), 22 deletions(-)
>
> diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
> index 9bd0694d55..87594a8adb 100644
> --- a/target/arm/sve_helper.c
> +++ b/target/arm/sve_helper.c
> @@ -2846,11 +2846,6 @@ uint32_t HELPER(sve_while)(void *vd, uint32_t count, uint32_t pred_desc)
>          return flags;
>      }
>
> -    /* Scale from predicate element count to bits.  */
> -    count <<= esz;
> -    /* Bound to the bits in the predicate.  */
> -    count = MIN(count, oprsz * 8);
> -
>      /* Set all of the requested bits.  */
>      for (i = 0; i < count / 64; ++i) {
>          d->p[i] = esz_mask;
> diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
> index 9dd4c38bab..89efc80ee7 100644
> --- a/target/arm/translate-sve.c
> +++ b/target/arm/translate-sve.c
> @@ -3173,19 +3173,19 @@ static bool trans_CTERM(DisasContext *s, arg_CTERM *a, uint32_t insn)
>
>  static bool trans_WHILE(DisasContext *s, arg_WHILE *a, uint32_t insn)
>  {
> -    if (!sve_access_check(s)) {
> -        return true;
> -    }
> -
> -    TCGv_i64 op0 = read_cpu_reg(s, a->rn, 1);
> -    TCGv_i64 op1 = read_cpu_reg(s, a->rm, 1);
> -    TCGv_i64 t0 = tcg_temp_new_i64();
> -    TCGv_i64 t1 = tcg_temp_new_i64();
> +    TCGv_i64 op0, op1, t0, t1, tmax;
>      TCGv_i32 t2, t3;
>      TCGv_ptr ptr;
>      unsigned desc, vsz = vec_full_reg_size(s);
>      TCGCond cond;
>
> +    if (!sve_access_check(s)) {
> +        return true;
> +    }
> +
> +    op0 = read_cpu_reg(s, a->rn, 1);
> +    op1 = read_cpu_reg(s, a->rm, 1);
> +
>      if (!a->sf) {
>          if (a->u) {
>              tcg_gen_ext32u_i64(op0, op0);
> @@ -3198,32 +3198,47 @@ static bool trans_WHILE(DisasContext *s, arg_WHILE *a, uint32_t insn)
>
>      /* For the helper, compress the different conditions into a computation
>       * of how many iterations for which the condition is true.
> -     *
> -     * This is slightly complicated by 0 <= UINT64_MAX, which is nominally
> -     * 2**64 iterations, overflowing to 0.  Of course, predicate registers
> -     * aren't that large, so any value >= predicate size is sufficient.
>       */
> +    t0 = tcg_temp_new_i64();
> +    t1 = tcg_temp_new_i64();
>      tcg_gen_sub_i64(t0, op1, op0);
>
> -    /* t0 = MIN(op1 - op0, vsz).  */
> -    tcg_gen_movi_i64(t1, vsz);
> -    tcg_gen_umin_i64(t0, t0, t1);
> +    tmax = tcg_const_i64(vsz >> a->esz);
>      if (a->eq) {
>          /* Equality means one more iteration.  */
>          tcg_gen_addi_i64(t0, t0, 1);
> +
> +        /* If op1 is max (un)signed integer (and the only time the addition
> +         * above could overflow), then we produce an all-true predicate by
> +         * setting the count to the vector length.  This is because the
> +         * pseudocode is described as an increment + compare loop, and the
> +         * max integer would always compare true.
> +         */
> +        tcg_gen_movi_i64(t1, (a->sf
> +                              ? (a->u ? UINT64_MAX : INT64_MAX)
> +                              : (a->u ? UINT32_MAX : INT32_MAX)));
> +        tcg_gen_movcond_i64(TCG_COND_EQ, t0, op1, t1, tmax, t0);
>      }
>
> -    /* t0 = (condition true ? t0 : 0).  */
> +    /* Bound to the maximum.  */
> +    tcg_gen_umin_i64(t0, t0, tmax);
> +    tcg_temp_free_i64(tmax);
> +
> +    /* Set the count to zero if the condition is false.  */
>      cond = (a->u
>              ? (a->eq ? TCG_COND_LEU : TCG_COND_LTU)
>              : (a->eq ? TCG_COND_LE : TCG_COND_LT));
>      tcg_gen_movi_i64(t1, 0);
>      tcg_gen_movcond_i64(cond, t0, op0, op1, t0, t1);
> +    tcg_temp_free_i64(t1);
>
> +    /* Since we're bounded, pass as a 32-bit type.  */
>      t2 = tcg_temp_new_i32();
>      tcg_gen_extrl_i64_i32(t2, t0);
>      tcg_temp_free_i64(t0);
> -    tcg_temp_free_i64(t1);
> +
> +    /* Scale elements to bits.  */
> +    tcg_gen_shli_i32(t2, t2, a->esz);
>
>      desc = (vsz / 8) - 2;
>      desc = deposit32(desc, SIMD_DATA_SHIFT, 2, a->esz);


--
Alex Bennée

^ permalink raw reply	[flat|nested] 24+ messages in thread
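
[Editor's sketch, not part of the thread.]  A scalar model of the
count that the reorganized trans_WHILE passes to the helper.  It
covers only the 64-bit (sf = 1) case; the function and parameter
names are hypothetical.  vsz is the vector size in bytes, esz the
log2 of the element size.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns the number of predicate bits to set. */
static uint32_t while_count_bits(uint64_t op0, uint64_t op1,
                                 bool is_unsigned, bool eq,
                                 unsigned vsz, unsigned esz)
{
    uint64_t tmax = vsz >> esz;         /* elements in the vector */
    uint64_t t0 = op1 - op0;
    bool cond;

    if (eq) {
        /* Equality means one more iteration. */
        t0 += 1;
        /* op1 == max integer always compares true: all-true predicate. */
        if (op1 == (is_unsigned ? UINT64_MAX : (uint64_t)INT64_MAX)) {
            t0 = tmax;
        }
    }

    /* Bound to the maximum. */
    if (t0 > tmax) {
        t0 = tmax;
    }

    /* Set the count to zero if the condition is false. */
    if (is_unsigned) {
        cond = eq ? op0 <= op1 : op0 < op1;
    } else {
        cond = eq ? (int64_t)op0 <= (int64_t)op1
                  : (int64_t)op0 < (int64_t)op1;
    }
    if (!cond) {
        t0 = 0;
    }

    /* Scale elements to bits. */
    return (uint32_t)t0 << esz;
}
```

Because the bound is applied in element units before scaling, the
helper no longer needs its own MIN against the predicate size.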

* Re: [Qemu-devel] [PATCH 05/11] target/arm: Fix typo in helper_sve_ld1hss_r
  2018-08-09  3:40 ` [Qemu-devel] [PATCH 05/11] target/arm: Fix typo in helper_sve_ld1hss_r Richard Henderson
@ 2018-08-09 10:09   ` Alex Bennée
  0 siblings, 0 replies; 24+ messages in thread
From: Alex Bennée @ 2018-08-09 10:09 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-devel, laurent.desnogues, peter.maydell, qemu-stable


Richard Henderson <richard.henderson@linaro.org> writes:

> Cc: qemu-stable@nongnu.org (3.0.1)
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  target/arm/sve_helper.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
> index c3cbec9cf5..e03f954a26 100644
> --- a/target/arm/sve_helper.c
> +++ b/target/arm/sve_helper.c
> @@ -4045,7 +4045,7 @@ DO_LD1(sve_ld1bdu_r, cpu_ldub_data_ra, uint64_t, uint8_t, )
>  DO_LD1(sve_ld1bds_r, cpu_ldsb_data_ra, uint64_t, int8_t, )
>
>  DO_LD1(sve_ld1hsu_r, cpu_lduw_data_ra, uint32_t, uint16_t, H1_4)
> -DO_LD1(sve_ld1hss_r, cpu_ldsw_data_ra, uint32_t, int8_t, H1_4)
> +DO_LD1(sve_ld1hss_r, cpu_ldsw_data_ra, uint32_t, int16_t, H1_4)
>  DO_LD1(sve_ld1hdu_r, cpu_lduw_data_ra, uint64_t, uint16_t, )
>  DO_LD1(sve_ld1hds_r, cpu_ldsw_data_ra, uint64_t, int16_t, )


--
Alex Bennée

^ permalink raw reply	[flat|nested] 24+ messages in thread
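
[Editor's sketch, not part of the thread.]  Assuming the DO_LD1
macro holds the loaded value in a temporary of the type given as its
fourth argument before storing it to the element (an assumption, the
macro body is not shown here), the typo truncates a halfword to a
byte.  The function names are hypothetical, and the narrowing
conversions rely on the usual two's-complement behaviour of GCC and
Clang.

```c
#include <stdint.h>

/* Old behaviour: the halfword passes through an int8_t temporary. */
static uint32_t widen_via_int8(int loaded)
{
    int8_t m = (int8_t)loaded;
    return (uint32_t)m;
}

/* Fixed behaviour: the halfword passes through an int16_t temporary. */
static uint32_t widen_via_int16(int loaded)
{
    int16_t m = (int16_t)loaded;
    return (uint32_t)m;
}
```

A loaded value of 0x1234 becomes 0x34 through the int8_t path and
stays 0x1234 through the int16_t path.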

* Re: [Qemu-devel] [PATCH 09/11] target/arm: Reformat integer register dump
  2018-08-09  3:40 ` [Qemu-devel] [PATCH 09/11] target/arm: Reformat integer register dump Richard Henderson
@ 2018-08-09 10:12   ` Alex Bennée
  2018-08-09 10:58   ` Alex Bennée
  1 sibling, 0 replies; 24+ messages in thread
From: Alex Bennée @ 2018-08-09 10:12 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-devel, laurent.desnogues, peter.maydell, qemu-stable


Richard Henderson <richard.henderson@linaro.org> writes:

> With PC, there are 33 registers.  Three per line lines up nicely
> without overflowing 80 columns.
>
> Cc: qemu-stable@nongnu.org (3.0.1)
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  target/arm/translate-a64.c | 13 ++++++-------
>  1 file changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index 45a6c2a3aa..358f169c75 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -137,14 +137,13 @@ void aarch64_cpu_dump_state(CPUState *cs, FILE *f,
>      int el = arm_current_el(env);
>      const char *ns_status;
>
> -    cpu_fprintf(f, "PC=%016"PRIx64"  SP=%016"PRIx64"\n",
> -            env->pc, env->xregs[31]);
> -    for (i = 0; i < 31; i++) {
> -        cpu_fprintf(f, "X%02d=%016"PRIx64, i, env->xregs[i]);
> -        if ((i % 4) == 3) {
> -            cpu_fprintf(f, "\n");
> +    cpu_fprintf(f, " PC=%016" PRIx64 " ", env->pc);
> +    for (i = 0; i < 32; i++) {
> +        if (i == 31) {
> +            cpu_fprintf(f, " SP=%016" PRIx64 "\n", env->xregs[i]);
>          } else {
> -            cpu_fprintf(f, " ");
> +            cpu_fprintf(f, "X%02d=%016" PRIx64 "%s", i, env->xregs[i],
> +                        (i + 2) % 3 ? " " : "\n");
>          }
>      }


--
Alex Bennée

^ permalink raw reply	[flat|nested] 24+ messages in thread
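
[Editor's sketch, not part of the thread.]  The separator expression
in the new loop is compact; the stand-alone version below prints
only the register names so the three-per-line layout is visible.
The function names are hypothetical.

```c
#include <stdio.h>

/* Separator printed after Xi (i in 0..30) in the reformatted dump. */
static char xreg_separator(int i)
{
    /* PC occupies the first slot, so X01, X04, ... end a line. */
    return (i + 2) % 3 ? ' ' : '\n';
}

/* Print the names in dump order: PC, X00..X30, SP. */
static void print_layout(FILE *f)
{
    fprintf(f, " PC ");
    for (int i = 0; i < 32; i++) {
        if (i == 31) {
            fprintf(f, " SP\n");
        } else {
            fprintf(f, "X%02d%c", i, xreg_separator(i));
        }
    }
}
```

Calling print_layout(stdout) gives eleven lines of three names each,
the first starting with PC and the last ending with SP.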

* Re: [Qemu-devel] [PATCH 10/11] target/arm: Dump SVE state if enabled
  2018-08-09  3:40 ` [Qemu-devel] [PATCH 10/11] target/arm: Dump SVE state if enabled Richard Henderson
@ 2018-08-09 10:55   ` Alex Bennée
  0 siblings, 0 replies; 24+ messages in thread
From: Alex Bennée @ 2018-08-09 10:55 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-devel, laurent.desnogues, peter.maydell, qemu-stable


Richard Henderson <richard.henderson@linaro.org> writes:

> Also fold the FPCR/FPSR state onto the same line as PSTATE,
> and mention but do not dump disabled FPU state.
>
> Cc: qemu-stable@nongnu.org (3.0.1)
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  target/arm/translate-a64.c | 95 +++++++++++++++++++++++++++++++++-----
>  1 file changed, 83 insertions(+), 12 deletions(-)
>
> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index 358f169c75..b29dc49c4f 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -152,8 +152,7 @@ void aarch64_cpu_dump_state(CPUState *cs, FILE *f,
>      } else {
>          ns_status = "";
>      }
> -
> -    cpu_fprintf(f, "\nPSTATE=%08x %c%c%c%c %sEL%d%c\n",
> +    cpu_fprintf(f, "PSTATE=%08x %c%c%c%c %sEL%d%c",
>                  psr,
>                  psr & PSTATE_N ? 'N' : '-',
>                  psr & PSTATE_Z ? 'Z' : '-',
> @@ -163,17 +162,89 @@ void aarch64_cpu_dump_state(CPUState *cs, FILE *f,
>                  el,
>                  psr & PSTATE_SP ? 'h' : 't');
>
> -    if (flags & CPU_DUMP_FPU) {
> -        int numvfpregs = 32;
> -        for (i = 0; i < numvfpregs; i++) {
> -            uint64_t *q = aa64_vfp_qreg(env, i);
> -            uint64_t vlo = q[0];
> -            uint64_t vhi = q[1];
> -            cpu_fprintf(f, "q%02d=%016" PRIx64 ":%016" PRIx64 "%c",
> -                        i, vhi, vlo, (i & 1 ? '\n' : ' '));
> +    if (!(flags & CPU_DUMP_FPU)) {
> +        cpu_fprintf(f, "\n");
> +        return;
> +    }
> +    cpu_fprintf(f, "     FPCR=%08x FPSR=%08x\n",
> +                vfp_get_fpcr(env), vfp_get_fpsr(env));
> +
> +    if (arm_feature(env, ARM_FEATURE_SVE)) {
> +        int j, zcr_len = env->vfp.zcr_el[1] & 0xf; /* fix for system mode */
> +
> +        for (i = 0; i <= FFR_PRED_NUM; i++) {
> +            bool eol;
> +            if (i == FFR_PRED_NUM) {
> +                cpu_fprintf(f, "FFR=");
> +                /* It's last, so end the line.  */
> +                eol = true;
> +            } else {
> +                cpu_fprintf(f, "P%02d=", i);
> +                switch (zcr_len) {
> +                case 0:
> +                    eol = i % 8 == 7;
> +                    break;
> +                case 1:
> +                    eol = i % 6 == 5;
> +                    break;
> +                case 2:
> +                case 3:
> +                    eol = i % 3 == 2;
> +                    break;
> +                default:
> +                    /* More than one quadword per predicate.  */
> +                    eol = true;
> +                    break;
> +                }
> +            }
> +            for (j = zcr_len / 4; j >= 0; j--) {
> +                int digits;
> +                if (j * 4 + 4 <= zcr_len + 1) {
> +                    digits = 16;
> +                } else {
> +                    digits = (zcr_len % 4 + 1) * 4;
> +                }
> +                cpu_fprintf(f, "%0*" PRIx64 "%s", digits,
> +                            env->vfp.pregs[i].p[j],
> +                            j ? ":" : eol ? "\n" : " ");
> +            }
> +        }
> +
> +        for (i = 0; i < 32; i++) {
> +            if (zcr_len == 0) {
> +                cpu_fprintf(f, "Z%02d=%016" PRIx64 ":%016" PRIx64 "%s",
> +                            i, env->vfp.zregs[i].d[1],
> +                            env->vfp.zregs[i].d[0], i & 1 ? "\n" : " ");
> +            } else if (zcr_len == 1) {
> +                cpu_fprintf(f, "Z%02d=%016" PRIx64 ":%016" PRIx64
> +                            ":%016" PRIx64 ":%016" PRIx64 "\n",
> +                            i, env->vfp.zregs[i].d[3], env->vfp.zregs[i].d[2],
> +                            env->vfp.zregs[i].d[1], env->vfp.zregs[i].d[0]);
> +            } else {
> +                for (j = zcr_len; j >= 0; j--) {
> +                    bool odd = (zcr_len - j) % 2 != 0;
> +                    if (j == zcr_len) {
> +                        cpu_fprintf(f, "Z%02d[%x-%x]=", i, j, j - 1);
> +                    } else if (!odd) {
> +                        if (j > 0) {
> +                            cpu_fprintf(f, "   [%x-%x]=", j, j - 1);
> +                        } else {
> +                            cpu_fprintf(f, "     [%x]=", j);
> +                        }
> +                    }
> +                    cpu_fprintf(f, "%016" PRIx64 ":%016" PRIx64 "%s",
> +                                env->vfp.zregs[i].d[j * 2 + 1],
> +                                env->vfp.zregs[i].d[j * 2],
> +                                odd || j == 0 ? "\n" : ":");
> +                }
> +            }
> +        }
> +    } else {
> +        for (i = 0; i < 32; i++) {
> +            uint64_t *q = aa64_vfp_qreg(env, i);
> +            cpu_fprintf(f, "Q%02d=%016" PRIx64 ":%016" PRIx64 "%s",
> +                        i, q[1], q[0], (i & 1 ? "\n" : " "));
>          }
> -        cpu_fprintf(f, "FPCR: %08x  FPSR: %08x\n",
> -                    vfp_get_fpcr(env), vfp_get_fpsr(env));
>      }
>  }


--
Alex Bennée

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH 09/11] target/arm: Reformat integer register dump
  2018-08-09  3:40 ` [Qemu-devel] [PATCH 09/11] target/arm: Reformat integer register dump Richard Henderson
  2018-08-09 10:12   ` Alex Bennée
@ 2018-08-09 10:58   ` Alex Bennée
  1 sibling, 0 replies; 24+ messages in thread
From: Alex Bennée @ 2018-08-09 10:58 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-devel, laurent.desnogues, peter.maydell, qemu-stable


Richard Henderson <richard.henderson@linaro.org> writes:

> With PC, there are 33 registers.  Three per line lines up nicely
> without overflowing 80 columns.
>
> Cc: qemu-stable@nongnu.org (3.0.1)
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  target/arm/translate-a64.c | 13 ++++++-------
>  1 file changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index 45a6c2a3aa..358f169c75 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -137,14 +137,13 @@ void aarch64_cpu_dump_state(CPUState *cs, FILE *f,
>      int el = arm_current_el(env);
>      const char *ns_status;
>
> -    cpu_fprintf(f, "PC=%016"PRIx64"  SP=%016"PRIx64"\n",
> -            env->pc, env->xregs[31]);
> -    for (i = 0; i < 31; i++) {
> -        cpu_fprintf(f, "X%02d=%016"PRIx64, i, env->xregs[i]);
> -        if ((i % 4) == 3) {
> -            cpu_fprintf(f, "\n");
> +    cpu_fprintf(f, " PC=%016" PRIx64 " ", env->pc);
> +    for (i = 0; i < 32; i++) {
> +        if (i == 31) {
> +            cpu_fprintf(f, " SP=%016" PRIx64 "\n", env->xregs[i]);
>          } else {
> -            cpu_fprintf(f, " ");
> +            cpu_fprintf(f, "X%02d=%016" PRIx64 "%s", i, env->xregs[i],
> +                        (i + 2) % 3 ? " " : "\n");
>          }
>      }


--
Alex Bennée

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH 11/11] target/arm: Add sve-max-vq cpu property to -cpu max
  2018-08-09  3:40 ` [Qemu-devel] [PATCH 11/11] target/arm: Add sve-max-vq cpu property to -cpu max Richard Henderson
@ 2018-08-09 11:00   ` Alex Bennée
  0 siblings, 0 replies; 24+ messages in thread
From: Alex Bennée @ 2018-08-09 11:00 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-devel, laurent.desnogues, peter.maydell, qemu-stable


Richard Henderson <richard.henderson@linaro.org> writes:

> This allows the default (and maximum) vector length to be set
> from the command line, which is extraordinarily helpful in
> debugging problems that depend on vector length without having to
> bake knowledge of PR_SVE_SET_VL into every guest binary.
>
> Cc: qemu-stable@nongnu.org (3.0.1)
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>


> ---
>  target/arm/cpu.h     |  3 +++
>  linux-user/syscall.c | 19 +++++++++++++------
>  target/arm/cpu.c     |  6 +++---
>  target/arm/cpu64.c   | 29 +++++++++++++++++++++++++++++
>  target/arm/helper.c  |  7 +++++--
>  5 files changed, 53 insertions(+), 11 deletions(-)
>
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index e310ffc29d..9526ed27cb 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -857,6 +857,9 @@ struct ARMCPU {
>
>      /* Used to synchronize KVM and QEMU in-kernel device levels */
>      uint8_t device_irq_level;
> +
> +    /* Used to set the maximum vector length the cpu will support.  */
> +    uint32_t sve_max_vq;
>  };
>
>  static inline ARMCPU *arm_env_get_cpu(CPUARMState *env)
> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
> index dfc851cc35..5a4af76c03 100644
> --- a/linux-user/syscall.c
> +++ b/linux-user/syscall.c
> @@ -10848,15 +10848,22 @@ abi_long do_syscall(void *cpu_env, int num, abi_long arg1,
>  #endif
>  #ifdef TARGET_AARCH64
>          case TARGET_PR_SVE_SET_VL:
> -            /* We cannot support either PR_SVE_SET_VL_ONEXEC
> -               or PR_SVE_VL_INHERIT.  Therefore, anything above
> -               ARM_MAX_VQ results in EINVAL.  */
> +            /*
> +             * We cannot support either PR_SVE_SET_VL_ONEXEC or
> +             * PR_SVE_VL_INHERIT.  Note the kernel definition
> +             * of sve_vl_valid allows for VQ=512, i.e. VL=8192,
> +             * even though the current architectural maximum is VQ=16.
> +             */
>              ret = -TARGET_EINVAL;
>              if (arm_feature(cpu_env, ARM_FEATURE_SVE)
> -                && arg2 >= 0 && arg2 <= ARM_MAX_VQ * 16 && !(arg2 & 15)) {
> +                && arg2 >= 0 && arg2 <= 512 * 16 && !(arg2 & 15)) {
>                  CPUARMState *env = cpu_env;
> -                int old_vq = (env->vfp.zcr_el[1] & 0xf) + 1;
> -                int vq = MAX(arg2 / 16, 1);
> +                ARMCPU *cpu = arm_env_get_cpu(env);
> +                uint32_t vq, old_vq;
> +
> +                old_vq = (env->vfp.zcr_el[1] & 0xf) + 1;
> +                vq = MAX(arg2 / 16, 1);
> +                vq = MIN(vq, cpu->sve_max_vq);
>
>                  if (vq < old_vq) {
>                      aarch64_sve_narrow_vq(env, vq);
> diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> index 64a8005a4b..b25898ed4c 100644
> --- a/target/arm/cpu.c
> +++ b/target/arm/cpu.c
> @@ -168,9 +168,9 @@ static void arm_cpu_reset(CPUState *s)
>          env->cp15.cpacr_el1 = deposit64(env->cp15.cpacr_el1, 16, 2, 3);
>          env->cp15.cptr_el[3] |= CPTR_EZ;
>          /* with maximum vector length */
> -        env->vfp.zcr_el[1] = ARM_MAX_VQ - 1;
> -        env->vfp.zcr_el[2] = ARM_MAX_VQ - 1;
> -        env->vfp.zcr_el[3] = ARM_MAX_VQ - 1;
> +        env->vfp.zcr_el[1] = cpu->sve_max_vq - 1;
> +        env->vfp.zcr_el[2] = env->vfp.zcr_el[1];
> +        env->vfp.zcr_el[3] = env->vfp.zcr_el[1];
>  #else
>          /* Reset into the highest available EL */
>          if (arm_feature(env, ARM_FEATURE_EL3)) {
> diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
> index d0581d59d8..800bff780e 100644
> --- a/target/arm/cpu64.c
> +++ b/target/arm/cpu64.c
> @@ -29,6 +29,7 @@
>  #include "sysemu/sysemu.h"
>  #include "sysemu/kvm.h"
>  #include "kvm_arm.h"
> +#include "qapi/visitor.h"
>
>  static inline void set_feature(CPUARMState *env, int feature)
>  {
> @@ -217,6 +218,29 @@ static void aarch64_a53_initfn(Object *obj)
>      define_arm_cp_regs(cpu, cortex_a57_a53_cp_reginfo);
>  }
>
> +static void cpu_max_get_sve_vq(Object *obj, Visitor *v, const char *name,
> +                               void *opaque, Error **errp)
> +{
> +    ARMCPU *cpu = ARM_CPU(obj);
> +    visit_type_uint32(v, name, &cpu->sve_max_vq, errp);
> +}
> +
> +static void cpu_max_set_sve_vq(Object *obj, Visitor *v, const char *name,
> +                               void *opaque, Error **errp)
> +{
> +    ARMCPU *cpu = ARM_CPU(obj);
> +    Error *err = NULL;
> +
> +    visit_type_uint32(v, name, &cpu->sve_max_vq, &err);
> +
> +    if (!err && (cpu->sve_max_vq == 0 || cpu->sve_max_vq > ARM_MAX_VQ)) {
> +        error_setg(&err, "unsupported SVE vector length");
> +        error_append_hint(&err, "Valid sve-max-vq in range [1-%d]\n",
> +                          ARM_MAX_VQ);
> +    }
> +    error_propagate(errp, err);
> +}
> +
>  /* -cpu max: if KVM is enabled, like -cpu host (best possible with this host);
>   * otherwise, a CPU with as many features enabled as our emulation supports.
>   * The version of '-cpu max' for qemu-system-arm is defined in cpu.c;
> @@ -253,6 +277,10 @@ static void aarch64_max_initfn(Object *obj)
>          cpu->ctr = 0x80038003; /* 32 byte I and D cacheline size, VIPT icache */
>          cpu->dcz_blocksize = 7; /*  512 bytes */
>  #endif
> +
> +        cpu->sve_max_vq = ARM_MAX_VQ;
> +        object_property_add(obj, "sve-max-vq", "uint32", cpu_max_get_sve_vq,
> +                            cpu_max_set_sve_vq, NULL, NULL, &error_fatal);
>      }
>  }
>
> @@ -405,6 +433,7 @@ void aarch64_sve_narrow_vq(CPUARMState *env, unsigned vq)
>      uint64_t pmask;
>
>      assert(vq >= 1 && vq <= ARM_MAX_VQ);
> +    assert(vq <= arm_env_get_cpu(env)->sve_max_vq);
>
>      /* Zap the high bits of the zregs.  */
>      for (i = 0; i < 32; i++) {
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index 66afb08ee0..c24c66d43e 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -12408,9 +12408,12 @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
>              zcr_len = 0;
>          } else {
>              int current_el = arm_current_el(env);
> +            ARMCPU *cpu = arm_env_get_cpu(env);
>
> -            zcr_len = env->vfp.zcr_el[current_el <= 1 ? 1 : current_el];
> -            zcr_len &= 0xf;
> +            zcr_len = cpu->sve_max_vq - 1;
> +            if (current_el <= 1) {
> +                zcr_len = MIN(zcr_len, 0xf & (uint32_t)env->vfp.zcr_el[1]);
> +            }
>              if (current_el < 2 && arm_feature(env, ARM_FEATURE_EL2)) {
>                  zcr_len = MIN(zcr_len, 0xf & (uint32_t)env->vfp.zcr_el[2]);
>              }
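
As a rough illustration of the helper.c hunk above, the sketch below
models only the linux-user case (current EL 0, no EL2 or EL3), where
the effective length is the smaller of sve_max_vq - 1 and the low four
bits of ZCR_EL1.  The function names are hypothetical and the EL2/EL3
clamping is deliberately left out, so treat it as a simplification
rather than the real cpu_get_tb_cpu_state logic.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Effective ZCR length field (vq - 1) for EL0 in a linux-user style
 * setup, assuming neither EL2 nor EL3 is present.
 */
static uint32_t sketch_zcr_len_el0(uint32_t sve_max_vq, uint64_t zcr_el1)
{
    uint32_t zcr_len, el1_len;

    assert(sve_max_vq >= 1 && sve_max_vq <= 16);

    zcr_len = sve_max_vq - 1;               /* cap from the cpu property */
    el1_len = 0xf & (uint32_t)zcr_el1;      /* LEN field of ZCR_EL1 */
    return zcr_len < el1_len ? zcr_len : el1_len;
}

/* Convert a length field back to a vector length in bytes. */
static uint32_t sketch_vl_bytes(uint32_t zcr_len)
{
    return (zcr_len + 1) * 16;
}
```

So with sve-max-vq=4 the guest sees at most 64-byte vectors even if
ZCR_EL1 still holds 0xf.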


--
Alex Bennée

* Re: [Qemu-devel] [PATCH 06/11] target/arm: Fix sign-extension in sve do_ldr/do_str
  2018-08-09  3:40 ` [Qemu-devel] [PATCH 06/11] target/arm: Fix sign-extension in sve do_ldr/do_str Richard Henderson
  2018-08-09  5:28   ` Laurent Desnogues
@ 2018-08-09 11:00   ` Alex Bennée
  1 sibling, 0 replies; 24+ messages in thread
From: Alex Bennée @ 2018-08-09 11:00 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-devel, laurent.desnogues, peter.maydell, qemu-stable


Richard Henderson <richard.henderson@linaro.org> writes:

> The expression (int) imm + (uint32_t) len_align turns into uint32_t
> and thus with negative imm produces a memory operation at the wrong
> offset.  None of the numbers involved are particularly large, so
> change everything to use int.
>
> Cc: qemu-stable@nongnu.org (3.0.1)
> Reported-by: Laurent Desnogues <laurent.desnogues@gmail.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
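
The conversion problem described in the commit message can be
reproduced outside QEMU.  The sketch below assumes a 32-bit int and
uses hypothetical function names.  How the sum is later widened to a
64-bit address offset is also an assumption, made here only to show
why the wrapped value lands at the wrong place.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Old typing: len_align is uint32_t, so the usual arithmetic
 * conversions turn imm into uint32_t before the addition.  A negative
 * imm wraps modulo 2^32, and widening the result zero-extends it.
 */
static int64_t sketch_offset_unsigned(int imm, uint32_t len_align)
{
    return (int64_t)(imm + len_align);
}

/*
 * New typing: both operands are int, so the sum stays signed and a
 * negative offset survives the widening to 64 bits.
 */
static int64_t sketch_offset_signed(int imm, int len_align)
{
    return (int64_t)(imm + len_align);
}
```

For imm = -32 and len_align = 16 the first version yields 0xfffffff0
(about 4 GiB forward) while the second yields -16.  For non-negative
imm the two agree, which is presumably why the bug went unnoticed.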

> ---
>  target/arm/translate-sve.c | 18 ++++++++----------
>  1 file changed, 8 insertions(+), 10 deletions(-)
>
> diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
> index 89efc80ee7..9e63b5f8e5 100644
> --- a/target/arm/translate-sve.c
> +++ b/target/arm/translate-sve.c
> @@ -4372,12 +4372,11 @@ static bool trans_UCVTF_dd(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
>   * The load should begin at the address Rn + IMM.
>   */
>
> -static void do_ldr(DisasContext *s, uint32_t vofs, uint32_t len,
> -                   int rn, int imm)
> +static void do_ldr(DisasContext *s, uint32_t vofs, int len, int rn, int imm)
>  {
> -    uint32_t len_align = QEMU_ALIGN_DOWN(len, 8);
> -    uint32_t len_remain = len % 8;
> -    uint32_t nparts = len / 8 + ctpop8(len_remain);
> +    int len_align = QEMU_ALIGN_DOWN(len, 8);
> +    int len_remain = len % 8;
> +    int nparts = len / 8 + ctpop8(len_remain);
>      int midx = get_mem_index(s);
>      TCGv_i64 addr, t0, t1;
>
> @@ -4458,12 +4457,11 @@ static void do_ldr(DisasContext *s, uint32_t vofs, uint32_t len,
>  }
>
>  /* Similarly for stores.  */
> -static void do_str(DisasContext *s, uint32_t vofs, uint32_t len,
> -                   int rn, int imm)
> +static void do_str(DisasContext *s, uint32_t vofs, int len, int rn, int imm)
>  {
> -    uint32_t len_align = QEMU_ALIGN_DOWN(len, 8);
> -    uint32_t len_remain = len % 8;
> -    uint32_t nparts = len / 8 + ctpop8(len_remain);
> +    int len_align = QEMU_ALIGN_DOWN(len, 8);
> +    int len_remain = len % 8;
> +    int nparts = len / 8 + ctpop8(len_remain);
>      int midx = get_mem_index(s);
>      TCGv_i64 addr, t0;


--
Alex Bennée

* Re: [Qemu-devel] [PATCH 00/11] target/arm: sve linux-user patches
  2018-08-09  3:40 [Qemu-devel] [PATCH 00/11] target/arm: sve linux-user patches Richard Henderson
                   ` (10 preceding siblings ...)
  2018-08-09  3:40 ` [Qemu-devel] [PATCH 11/11] target/arm: Add sve-max-vq cpu property to -cpu max Richard Henderson
@ 2018-08-16 12:11 ` Peter Maydell
  11 siblings, 0 replies; 24+ messages in thread
From: Peter Maydell @ 2018-08-16 12:11 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, Laurent Desnogues, Alex Bennée

On 9 August 2018 at 04:40, Richard Henderson
<richard.henderson@linaro.org> wrote:
> I posted a few of these before, and I thought Peter had applied them
> to his target-arm.for-3-1 branch, but I don't see them there now.

I did indeed have patches 1-4 in my for-3.1 branch and they're now
in master. I must have forgotten to push out my local branch to
git.linaro.org after applying them.

> I've taken the opportunity to tag all of these for backport into the
> next stable release.  I'm intending to do so for all of the correctness
> patches affecting sve linux-user so that 3.0.1 will be usable long-term.

Applied 5..11 to target-arm.next.

thanks
-- PMM

