qemu-devel.nongnu.org archive mirror
* [PATCH v2 00/10] target/arm: Various v8.1M minor features
@ 2020-10-19 15:12 Peter Maydell
  2020-10-19 15:12 ` [PATCH v2 01/10] decodetree: Fix codegen for non-overlapping group inside overlapping group Peter Maydell
                   ` (9 more replies)
  0 siblings, 10 replies; 15+ messages in thread
From: Peter Maydell @ 2020-10-19 15:12 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

This patch series implements various minor new v8.1M features,
notably the branch-future and low-overhead-loop extensions.

(None of this will get enabled until we have enough to implement
a CPU model which has v8.1M, which will be the Cortex-M55, but
as usual we can get stuff into the tree gradually.)

Changes v1->v2:
 * added missing check that rm!=13 for CSEL decode
 * folded in gen_jmp_tb() fixup for DLS/WLS/LE patch
 * reversed sense of branch in trans_WLS
 * reworked set_fpscr changes as suggested by RTH
 * provide an env->v7m.ltpsize now (always 4 until
   MVE implemented, but it avoids code changes later)

Unreviewed patches: 2, 7, 9, 10

thanks
-- PMM

Peter Maydell (10):
  decodetree: Fix codegen for non-overlapping group inside overlapping
    group
  target/arm: Implement v8.1M NOCP handling
  target/arm: Implement v8.1M conditional-select insns
  target/arm: Make the t32 insn[25:23]=111 group non-overlapping
  target/arm: Don't allow BLX imm for M-profile
  target/arm: Implement v8.1M branch-future insns (as NOPs)
  target/arm: Implement v8.1M low-overhead-loop instructions
  target/arm: Fix has_vfp/has_neon ID reg squashing for M-profile
  target/arm: Allow M-profile CPUs with FP16 to set FPSCR.FP16
  target/arm: Implement FPSCR.LTPSIZE for M-profile LOB extension

 target/arm/cpu.h               |   8 ++
 target/arm/m-nocp.decode       |  10 +-
 target/arm/t32.decode          |  50 ++++++---
 target/arm/cpu.c               |  38 +++++--
 target/arm/translate.c         | 181 ++++++++++++++++++++++++++++++++-
 target/arm/vfp_helper.c        |  53 ++++++----
 scripts/decodetree.py          |   2 +-
 target/arm/translate-vfp.c.inc |  17 +++-
 8 files changed, 305 insertions(+), 54 deletions(-)

-- 
2.20.1




* [PATCH v2 01/10] decodetree: Fix codegen for non-overlapping group inside overlapping group
  2020-10-19 15:12 [PATCH v2 00/10] target/arm: Various v8.1M minor features Peter Maydell
@ 2020-10-19 15:12 ` Peter Maydell
  2020-10-19 15:12 ` [PATCH v2 02/10] target/arm: Implement v8.1M NOCP handling Peter Maydell
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Peter Maydell @ 2020-10-19 15:12 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

For nested groups like:

  {
    [
      pattern 1
      pattern 2
    ]
    pattern 3
  }

the intended behaviour is that patterns 1 and 2 must not
overlap with each other; if the insn matches neither then
we fall through to pattern 3 as the next thing in the
outer overlapping group.

Currently we generate incorrect code for this situation,
because in the code path for a failed match inside the
inner non-overlapping group we generate a "return" statement,
which causes decode to stop entirely rather than continuing
to the next thing in the outer group.

Generate a "break" instead, so that decode flow behaves
as required for this nested group case.
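
The difference between the two statements can be sketched with a toy
decoder. This is only an illustration under stated assumptions, not the
code decodetree.py actually emits: decode_model(), the pattern IDs and
the bit layout (top two bits select the inner case, bit 0 stands in for
"the translator accepted the insn") are all made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pattern IDs; NO_MATCH means decode gave up. */
enum { NO_MATCH = 0, PATTERN_1, PATTERN_2, PATTERN_3 };

/*
 * Toy model of { [ pattern 1, pattern 2 ] pattern 3 }.
 * use_break = false models the old "return false" codegen,
 * use_break = true models the fixed "break" codegen.
 */
static int decode_model(uint8_t insn, bool use_break)
{
    switch (insn >> 6) {            /* inner non-overlapping group */
    case 0x1:
        if (insn & 1) {
            return PATTERN_1;       /* pattern 1 accepted the insn */
        }
        if (!use_break) {
            return NO_MATCH;        /* old: decode stops entirely */
        }
        break;                      /* fixed: continue in outer group */
    case 0x2:
        if (insn & 1) {
            return PATTERN_2;       /* pattern 2 accepted the insn */
        }
        if (!use_break) {
            return NO_MATCH;
        }
        break;
    }
    /* Outer overlapping group: pattern 3 accepts anything in this model. */
    return PATTERN_3;
}

/* Usage example: the same failed inner match, before and after the fix. */
void decode_model_examples(void)
{
    assert(decode_model(0x41, false) == PATTERN_1);
    assert(decode_model(0x40, false) == NO_MATCH);
    assert(decode_model(0x40, true) == PATTERN_3);
}
```

With the old behaviour pattern 3 is unreachable once an inner case has
been selected, which is the bug described above.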

Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 scripts/decodetree.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/decodetree.py b/scripts/decodetree.py
index 60fd3b5e5f6..c1bf3cfa85f 100644
--- a/scripts/decodetree.py
+++ b/scripts/decodetree.py
@@ -548,7 +548,7 @@ class Tree:
             output(ind, '    /* ',
                    str_match_bits(innerbits, innermask), ' */\n')
             s.output_code(i + 4, extracted, innerbits, innermask)
-            output(ind, '    return false;\n')
+            output(ind, '    break;\n')
         output(ind, '}\n')
 # end Tree
 
-- 
2.20.1




* [PATCH v2 02/10] target/arm: Implement v8.1M NOCP handling
  2020-10-19 15:12 [PATCH v2 00/10] target/arm: Various v8.1M minor features Peter Maydell
  2020-10-19 15:12 ` [PATCH v2 01/10] decodetree: Fix codegen for non-overlapping group inside overlapping group Peter Maydell
@ 2020-10-19 15:12 ` Peter Maydell
  2020-10-19 16:11   ` Richard Henderson
  2020-10-19 15:12 ` [PATCH v2 03/10] target/arm: Implement v8.1M conditional-select insns Peter Maydell
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 15+ messages in thread
From: Peter Maydell @ 2020-10-19 15:12 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

From v8.1M, disabled-coprocessor handling changes slightly:
 * coprocessors 8, 9, 14 and 15 are also governed by the
   cp10 enable bit, like cp11
 * an extra range of instruction patterns is considered
   to be inside the coprocessor space

We previously marked these up with TODO comments; implement the
correct behaviour.

Unfortunately there is no ID register field which indicates this
behaviour.  We could in theory test an unrelated ID register which
indicates guaranteed-to-be-in-v8.1M behaviour like ID_ISAR0.CmpBranch
>= 3 (low-overhead-loops), but it seems better to simply define a new
ARM_FEATURE_V8_1M feature flag and use it for this and other
new-in-v8.1M behaviour that isn't identifiable from the ID registers.
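
As a minimal sketch of the first rule above, the mapping from an insn's
coprocessor number to the coprocessor whose enable bit governs it could
be written as follows. The function name nocp_governing_cp() and the
is_v8_1m parameter are assumptions for illustration (the parameter
stands in for the new feature flag check); the rule itself is the one
described in this commit message.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Return the coprocessor number whose enable bit governs an insn in
 * coprocessor space 'cp'. 'is_v8_1m' is a hypothetical stand-in for
 * testing the ARM_FEATURE_V8_1M flag.
 */
static int nocp_governing_cp(int cp, bool is_v8_1m)
{
    assert(cp >= 0 && cp <= 15);    /* cp is a 4-bit field */

    if (cp == 11) {
        return 10;                  /* cp11 always shares the cp10 enable */
    }
    if (is_v8_1m && (cp == 8 || cp == 9 || cp == 14 || cp == 15)) {
        return 10;                  /* new in v8.1M */
    }
    return cp;
}
```

For example, nocp_governing_cp(14, false) stays 14, while
nocp_governing_cp(14, true) becomes 10.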

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h               |  1 +
 target/arm/m-nocp.decode       | 10 ++++++----
 target/arm/translate-vfp.c.inc | 17 +++++++++++++++--
 3 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index cfff1b5c8fe..74392fa0295 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1985,6 +1985,7 @@ enum arm_features {
     ARM_FEATURE_VBAR, /* has cp15 VBAR */
     ARM_FEATURE_M_SECURITY, /* M profile Security Extension */
     ARM_FEATURE_M_MAIN, /* M profile Main Extension */
+    ARM_FEATURE_V8_1M, /* M profile extras only in v8.1M and later */
 };
 
 static inline int arm_feature(CPUARMState *env, int feature)
diff --git a/target/arm/m-nocp.decode b/target/arm/m-nocp.decode
index 7182d7d1217..28c8ac6b94c 100644
--- a/target/arm/m-nocp.decode
+++ b/target/arm/m-nocp.decode
@@ -29,14 +29,16 @@
 # If the coprocessor is not present or disabled then we will generate
 # the NOCP exception; otherwise we let the insn through to the main decode.
 
+&nocp cp
+
 {
   # Special cases which do not take an early NOCP: VLLDM and VLSTM
   VLLDM_VLSTM  1110 1100 001 l:1 rn:4 0000 1010 0000 0000
   # TODO: VSCCLRM (new in v8.1M) is similar:
   #VSCCLRM      1110 1100 1-01 1111 ---- 1011 ---- ---0
 
-  NOCP         111- 1110 ---- ---- ---- cp:4 ---- ----
-  NOCP         111- 110- ---- ---- ---- cp:4 ---- ----
-  # TODO: From v8.1M onwards we will also want this range to NOCP
-  #NOCP_8_1     111- 1111 ---- ---- ---- ---- ---- ---- cp=10
+  NOCP         111- 1110 ---- ---- ---- cp:4 ---- ---- &nocp
+  NOCP         111- 110- ---- ---- ---- cp:4 ---- ---- &nocp
+  # From v8.1M onwards this range will also NOCP:
+  NOCP_8_1     111- 1111 ---- ---- ---- ---- ---- ---- &nocp cp=10
 }
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
index 28e0dba5f14..cc9ffb95887 100644
--- a/target/arm/translate-vfp.c.inc
+++ b/target/arm/translate-vfp.c.inc
@@ -3459,7 +3459,7 @@ static bool trans_VLLDM_VLSTM(DisasContext *s, arg_VLLDM_VLSTM *a)
     return true;
 }
 
-static bool trans_NOCP(DisasContext *s, arg_NOCP *a)
+static bool trans_NOCP(DisasContext *s, arg_nocp *a)
 {
     /*
      * Handle M-profile early check for disabled coprocessor:
@@ -3472,7 +3472,11 @@ static bool trans_NOCP(DisasContext *s, arg_NOCP *a)
     if (a->cp == 11) {
         a->cp = 10;
     }
-    /* TODO: in v8.1M cp 8, 9, 14, 15 also are governed by the cp10 enable */
+    if (arm_dc_feature(s, ARM_FEATURE_V8_1M) &&
+        (a->cp == 8 || a->cp == 9 || a->cp == 14 || a->cp == 15)) {
+        /* in v8.1M cp 8, 9, 14, 15 also are governed by the cp10 enable */
+        a->cp = 10;
+    }
 
     if (a->cp != 10) {
         gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
@@ -3489,6 +3493,15 @@ static bool trans_NOCP(DisasContext *s, arg_NOCP *a)
     return false;
 }
 
+static bool trans_NOCP_8_1(DisasContext *s, arg_nocp *a)
+{
+    /* This range needs a coprocessor check for v8.1M and later only */
+    if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
+        return false;
+    }
+    return trans_NOCP(s, a);
+}
+
 static bool trans_VINS(DisasContext *s, arg_VINS *a)
 {
     TCGv_i32 rd, rm;
-- 
2.20.1




* [PATCH v2 03/10] target/arm: Implement v8.1M conditional-select insns
  2020-10-19 15:12 [PATCH v2 00/10] target/arm: Various v8.1M minor features Peter Maydell
  2020-10-19 15:12 ` [PATCH v2 01/10] decodetree: Fix codegen for non-overlapping group inside overlapping group Peter Maydell
  2020-10-19 15:12 ` [PATCH v2 02/10] target/arm: Implement v8.1M NOCP handling Peter Maydell
@ 2020-10-19 15:12 ` Peter Maydell
  2020-10-19 15:12 ` [PATCH v2 04/10] target/arm: Make the t32 insn[25:23]=111 group non-overlapping Peter Maydell
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Peter Maydell @ 2020-10-19 15:12 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

v8.1M brings four new insns to M-profile:
 * CSEL  : Rd = cond ? Rn : Rm
 * CSINC : Rd = cond ? Rn : Rm+1
 * CSINV : Rd = cond ? Rn : ~Rm
 * CSNEG : Rd = cond ? Rn : -Rm

Implement these.
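
The four operations listed above can be modelled on plain 32-bit values
as below. This is a sketch of the arithmetic only, assuming the
condition has already been evaluated; csel_result() and enum csel_op are
names invented for the example, with the op numbering taken from the
switch in the patch (0 = CSEL, 1 = CSINC, 2 = CSINV, 3 = CSNEG).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical enum mirroring the 2-bit 'op' field of the encoding. */
enum csel_op { OP_CSEL = 0, OP_CSINC = 1, OP_CSINV = 2, OP_CSNEG = 3 };

/* Compute Rd given the already-evaluated condition and both inputs. */
static uint32_t csel_result(enum csel_op op, bool cond,
                            uint32_t rn, uint32_t rm)
{
    if (cond) {
        return rn;                  /* all four insns pick Rn when true */
    }
    switch (op) {
    case OP_CSEL:
        return rm;
    case OP_CSINC:
        return rm + 1;              /* wraps modulo 2^32 */
    case OP_CSINV:
        return ~rm;
    case OP_CSNEG:
        return 0u - rm;             /* two's complement negation */
    }
    assert(0 && "invalid csel_op");
    return 0;
}
```

Note that only the "condition false" operand is modified; Rn is always
passed through unchanged.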

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/t32.decode  |  3 +++
 target/arm/translate.c | 60 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 63 insertions(+)

diff --git a/target/arm/t32.decode b/target/arm/t32.decode
index 7069d821fde..d8454bd814e 100644
--- a/target/arm/t32.decode
+++ b/target/arm/t32.decode
@@ -90,6 +90,9 @@ SBC_rrri         1110101 1011 . .... 0 ... .... .... ....     @s_rrr_shi
 }
 RSB_rrri         1110101 1110 . .... 0 ... .... .... ....     @s_rrr_shi
 
+# v8.1M CSEL and friends
+CSEL             1110101 0010 1 rn:4 10 op:2 rd:4 fcond:4 rm:4
+
 # Data-processing (register-shifted register)
 
 MOV_rxrr         1111 1010 0 shty:2 s:1 rm:4 1111 rd:4 0000 rs:4 \
diff --git a/target/arm/translate.c b/target/arm/translate.c
index d34c1d351a6..c145775438e 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -8224,6 +8224,66 @@ static bool trans_IT(DisasContext *s, arg_IT *a)
     return true;
 }
 
+/* v8.1M CSEL/CSINC/CSNEG/CSINV */
+static bool trans_CSEL(DisasContext *s, arg_CSEL *a)
+{
+    TCGv_i32 rn, rm, zero;
+    DisasCompare c;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
+        return false;
+    }
+
+    if (a->rm == 13) {
+        /* SEE "Related encodings" (MVE shifts) */
+        return false;
+    }
+
+    if (a->rd == 13 || a->rd == 15 || a->rn == 13 || a->fcond >= 14) {
+        /* CONSTRAINED UNPREDICTABLE: we choose to UNDEF */
+        return false;
+    }
+
+    /* In this insn input reg fields of 0b1111 mean "zero", not "PC" */
+    if (a->rn == 15) {
+        rn = tcg_const_i32(0);
+    } else {
+        rn = load_reg(s, a->rn);
+    }
+    if (a->rm == 15) {
+        rm = tcg_const_i32(0);
+    } else {
+        rm = load_reg(s, a->rm);
+    }
+
+    switch (a->op) {
+    case 0: /* CSEL */
+        break;
+    case 1: /* CSINC */
+        tcg_gen_addi_i32(rm, rm, 1);
+        break;
+    case 2: /* CSINV */
+        tcg_gen_not_i32(rm, rm);
+        break;
+    case 3: /* CSNEG */
+        tcg_gen_neg_i32(rm, rm);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    arm_test_cc(&c, a->fcond);
+    zero = tcg_const_i32(0);
+    tcg_gen_movcond_i32(c.cond, rn, c.value, zero, rn, rm);
+    arm_free_cc(&c);
+    tcg_temp_free_i32(zero);
+
+    store_reg(s, a->rd, rn);
+    tcg_temp_free_i32(rm);
+
+    return true;
+}
+
 /*
  * Legacy decoder.
  */
-- 
2.20.1




* [PATCH v2 04/10] target/arm: Make the t32 insn[25:23]=111 group non-overlapping
  2020-10-19 15:12 [PATCH v2 00/10] target/arm: Various v8.1M minor features Peter Maydell
                   ` (2 preceding siblings ...)
  2020-10-19 15:12 ` [PATCH v2 03/10] target/arm: Implement v8.1M conditional-select insns Peter Maydell
@ 2020-10-19 15:12 ` Peter Maydell
  2020-10-19 15:12 ` [PATCH v2 05/10] target/arm: Don't allow BLX imm for M-profile Peter Maydell
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Peter Maydell @ 2020-10-19 15:12 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

The t32 decode has a group which represents a set of insns
which overlap with B_cond_thumb because they have [25:23]=111
(which is an invalid condition code field for the branch insn).
This group is currently defined using the {} overlap-OK syntax,
but it is almost entirely non-overlapping patterns. Switch
it over to use a non-overlapping group.

For this to be valid syntactically, CPS must move into the same
overlapping-group as the hint insns (CPS vs hints was the
only actual use of the overlap facility for the group).

The non-overlapping subgroup for CLREX/DSB/DMB/ISB/SB is no longer
necessary and so we can remove it (promoting those insns to
be members of the parent group).

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/t32.decode | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/target/arm/t32.decode b/target/arm/t32.decode
index d8454bd814e..7d5e000e82c 100644
--- a/target/arm/t32.decode
+++ b/target/arm/t32.decode
@@ -296,8 +296,8 @@ CLZ              1111 1010 1011 ---- 1111 .... 1000 ....      @rdm
 {
   # Group insn[25:23] = 111, which is cond=111x for the branch below,
   # or unconditional, which would be illegal for the branch.
-  {
-    # Hints
+  [
+    # Hints, and CPS
     {
       YIELD      1111 0011 1010 1111 1000 0000 0000 0001
       WFE        1111 0011 1010 1111 1000 0000 0000 0010
@@ -310,20 +310,18 @@ CLZ              1111 1010 1011 ---- 1111 .... 1000 ....      @rdm
       # The canonical nop ends in 0000 0000, but the whole rest
       # of the space is "reserved hint, behaves as nop".
       NOP        1111 0011 1010 1111 1000 0000 ---- ----
+
+      # If imod == '00' && M == '0' then SEE "Hint instructions", above.
+      CPS        1111 0011 1010 1111 1000 0 imod:2 M:1 A:1 I:1 F:1 mode:5 \
+                 &cps
     }
 
-    # If imod == '00' && M == '0' then SEE "Hint instructions", above.
-    CPS          1111 0011 1010 1111 1000 0 imod:2 M:1 A:1 I:1 F:1 mode:5 \
-                 &cps
-
     # Miscellaneous control
-    [
-      CLREX      1111 0011 1011 1111 1000 1111 0010 1111
-      DSB        1111 0011 1011 1111 1000 1111 0100 ----
-      DMB        1111 0011 1011 1111 1000 1111 0101 ----
-      ISB        1111 0011 1011 1111 1000 1111 0110 ----
-      SB         1111 0011 1011 1111 1000 1111 0111 0000
-    ]
+    CLREX        1111 0011 1011 1111 1000 1111 0010 1111
+    DSB          1111 0011 1011 1111 1000 1111 0100 ----
+    DMB          1111 0011 1011 1111 1000 1111 0101 ----
+    ISB          1111 0011 1011 1111 1000 1111 0110 ----
+    SB           1111 0011 1011 1111 1000 1111 0111 0000
 
     # Note that the v7m insn overlaps both the normal and banked insn.
     {
@@ -351,7 +349,7 @@ CLZ              1111 1010 1011 ---- 1111 .... 1000 ....      @rdm
     HVC          1111 0111 1110 ....  1000 .... .... ....     \
                  &i imm=%imm16_16_0
     UDF          1111 0111 1111 ----  1010 ---- ---- ----
-  }
+  ]
   B_cond_thumb   1111 0. cond:4 ...... 10.0 ............      &ci imm=%imm21
 }
 
-- 
2.20.1




* [PATCH v2 05/10] target/arm: Don't allow BLX imm for M-profile
  2020-10-19 15:12 [PATCH v2 00/10] target/arm: Various v8.1M minor features Peter Maydell
                   ` (3 preceding siblings ...)
  2020-10-19 15:12 ` [PATCH v2 04/10] target/arm: Make the t32 insn[25:23]=111 group non-overlapping Peter Maydell
@ 2020-10-19 15:12 ` Peter Maydell
  2020-10-19 15:12 ` [PATCH v2 06/10] target/arm: Implement v8.1M branch-future insns (as NOPs) Peter Maydell
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Peter Maydell @ 2020-10-19 15:12 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

The BLX immediate insn in the Thumb encoding always performs
a switch from Thumb to Arm state. This would be totally useless
in M-profile, which has no Arm decoder, and so the instruction
does not exist at all there. Make the encoding UNDEF for M-profile.

(This part of the encoding space is used for the branch-future
and low-overhead-loop insns in v8.1M.)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index c145775438e..613bc0b9f1e 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -7880,6 +7880,14 @@ static bool trans_BLX_i(DisasContext *s, arg_BLX_i *a)
 {
     TCGv_i32 tmp;
 
+    /*
+     * BLX <imm> would be useless on M-profile; the encoding space
+     * is used for other insns from v8.1M onward, and UNDEFs before that.
+     */
+    if (arm_dc_feature(s, ARM_FEATURE_M)) {
+        return false;
+    }
+
     /* For A32, ARM_FEATURE_V5 is checked near the start of the uncond block. */
     if (s->thumb && (a->imm & 2)) {
         return false;
-- 
2.20.1




* [PATCH v2 06/10] target/arm: Implement v8.1M branch-future insns (as NOPs)
  2020-10-19 15:12 [PATCH v2 00/10] target/arm: Various v8.1M minor features Peter Maydell
                   ` (4 preceding siblings ...)
  2020-10-19 15:12 ` [PATCH v2 05/10] target/arm: Don't allow BLX imm for M-profile Peter Maydell
@ 2020-10-19 15:12 ` Peter Maydell
  2020-10-19 15:12 ` [PATCH v2 07/10] target/arm: Implement v8.1M low-overhead-loop instructions Peter Maydell
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Peter Maydell @ 2020-10-19 15:12 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

v8.1M implements a new 'branch future' feature, which is a
set of instructions that request the CPU to perform a branch
"in the future", when it reaches a particular execution address.
In hardware, the expected implementation is that the information
about the branch location and destination is cached and then
acted upon when execution reaches the specified address.
However, the architecture permits an implementation to discard
this cached information at any point, and so guest code must
always include a normal branch insn at the branch point as
a fallback. In particular, an implementation is specifically
permitted to treat all BF insns as NOPs (which is equivalent
to discarding the cached information immediately).

For QEMU, implementing this caching of branch information
would be complicated and would not improve the speed of
execution at all, so we make the IMPDEF choice to implement
all BF insns as NOPs.
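
The resulting decision is small enough to sketch in isolation. The
patch gates BF on ID_ISAR0.CmpBranch >= 3 and rejects boff == 0 (which
belongs to the loop insns). In the sketch below, has_lob() and
bf_is_nop() are illustrative names, and the position of the CmpBranch
field at bits [15:12] is an assumption stated here rather than
something this patch spells out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed field position: ID_ISAR0.CmpBranch occupies bits [15:12]. */
#define CMPBRANCH_SHIFT 12
#define CMPBRANCH_MASK  0xfu

/* True if the ID register advertises low-overhead loops / branch future. */
static bool has_lob(uint32_t id_isar0)
{
    return ((id_isar0 >> CMPBRANCH_SHIFT) & CMPBRANCH_MASK) >= 3;
}

/*
 * Return true if the insn is accepted and treated as a NOP, false if
 * this decode path rejects it (feature absent, or boff == 0 which is
 * a different instruction).
 */
static bool bf_is_nop(uint32_t id_isar0, unsigned boff)
{
    assert(boff <= 0xf);            /* boff is a 4-bit field */

    if (!has_lob(id_isar0)) {
        return false;
    }
    if (boff == 0) {
        return false;
    }
    return true;                    /* nothing to emit: behave as NOP */
}
```

Because guest code must always carry a normal branch at the branch
point, accepting the insn and emitting nothing preserves the program's
control flow.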

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h       |  6 ++++++
 target/arm/t32.decode  | 13 ++++++++++++-
 target/arm/translate.c | 20 ++++++++++++++++++++
 3 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 74392fa0295..a432f301f11 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -3473,6 +3473,12 @@ static inline bool isar_feature_aa32_arm_div(const ARMISARegisters *id)
     return FIELD_EX32(id->id_isar0, ID_ISAR0, DIVIDE) > 1;
 }
 
+static inline bool isar_feature_aa32_lob(const ARMISARegisters *id)
+{
+    /* (M-profile) low-overhead loops and branch future */
+    return FIELD_EX32(id->id_isar0, ID_ISAR0, CMPBRANCH) >= 3;
+}
+
 static inline bool isar_feature_aa32_jazelle(const ARMISARegisters *id)
 {
     return FIELD_EX32(id->id_isar1, ID_ISAR1, JAZELLE) != 0;
diff --git a/target/arm/t32.decode b/target/arm/t32.decode
index 7d5e000e82c..3015731a8d0 100644
--- a/target/arm/t32.decode
+++ b/target/arm/t32.decode
@@ -648,4 +648,15 @@ MRC              1110 1110 ... 1 .... .... .... ... 1 .... @mcr
 
 B                1111 0. .......... 10.1 ............         @branch24
 BL               1111 0. .......... 11.1 ............         @branch24
-BLX_i            1111 0. .......... 11.0 ............         @branch24
+{
+  # BLX_i is non-M-profile only
+  BLX_i          1111 0. .......... 11.0 ............         @branch24
+  # M-profile only: loop and branch insns
+  [
+    # All these BF insns have boff != 0b0000; we NOP them all
+    BF           1111 0 boff:4  ------- 1100 - ---------- 1    # BFL
+    BF           1111 0 boff:4 0 ------ 1110 - ---------- 1    # BFCSEL
+    BF           1111 0 boff:4 10 ----- 1110 - ---------- 1    # BF
+    BF           1111 0 boff:4 11 ----- 1110 0 0000000000 1    # BFX, BFLX
+  ]
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 613bc0b9f1e..01b697083a0 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -7933,6 +7933,26 @@ static bool trans_BLX_suffix(DisasContext *s, arg_BLX_suffix *a)
     return true;
 }
 
+static bool trans_BF(DisasContext *s, arg_BF *a)
+{
+    /*
+     * M-profile branch future insns. The architecture permits an
+     * implementation to implement these as NOPs (equivalent to
+     * discarding the LO_BRANCH_INFO cache immediately), and we
+     * take that IMPDEF option because for QEMU a "real" implementation
+     * would be complicated and wouldn't execute any faster.
+     */
+    if (!dc_isar_feature(aa32_lob, s)) {
+        return false;
+    }
+    if (a->boff == 0) {
+        /* SEE "Related encodings" (loop insns) */
+        return false;
+    }
+    /* Handle as NOP */
+    return true;
+}
+
 static bool op_tbranch(DisasContext *s, arg_tbranch *a, bool half)
 {
     TCGv_i32 addr, tmp;
-- 
2.20.1




* [PATCH v2 07/10] target/arm: Implement v8.1M low-overhead-loop instructions
  2020-10-19 15:12 [PATCH v2 00/10] target/arm: Various v8.1M minor features Peter Maydell
                   ` (5 preceding siblings ...)
  2020-10-19 15:12 ` [PATCH v2 06/10] target/arm: Implement v8.1M branch-future insns (as NOPs) Peter Maydell
@ 2020-10-19 15:12 ` Peter Maydell
  2020-10-19 15:39   ` Richard Henderson
  2020-10-19 15:12 ` [PATCH v2 08/10] target/arm: Fix has_vfp/has_neon ID reg squashing for M-profile Peter Maydell
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 15+ messages in thread
From: Peter Maydell @ 2020-10-19 15:12 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

v8.1M's "low-overhead-loop" extension has three instructions
for looping:
 * DLS (start of a do-loop)
 * WLS (start of a while-loop)
 * LE (end of a loop)

The loop-start instructions are both simple operations to start a
loop whose iteration count (if any) is in LR.  The loop-end
instruction handles "decrement iteration count and jump back to loop
start"; it also caches the information about the branch back to the
start of the loop to improve performance of the branch on subsequent
iterations.

As with the branch-future instructions, the architecture permits an
implementation to discard the LO_BRANCH_INFO cache at any time, and
QEMU takes the IMPDEF option to never set it in the first place
(equivalent to discarding it immediately), because for us a "real"
implementation would be unnecessary complexity.

(This implementation only provides the simple looping constructs; the
vector extension MVE (Helium) adds some extra variants to handle
looping across vectors.  We'll add those later when we implement
MVE.)
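
To make the iteration-count behaviour concrete, here is a small model of
how many times a loop body runs for the simple (non-MVE) forms. It is a
sketch that follows the logic of the translation functions in this
patch (WLS skips the loop when the count is zero; LE falls through when
LR <= 1 and otherwise decrements LR and branches back). The function
loop_body_runs() and its parameters are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Count executions of the body in:
 *     DLS/WLS lr, rn ; body ; LE lr, loop_start
 * 'is_wls' selects WLS (while-loop) rather than DLS (do-loop).
 */
static uint32_t loop_body_runs(uint32_t rn, bool is_wls)
{
    uint32_t lr;
    uint32_t runs = 0;

    if (is_wls && rn == 0) {
        return 0;                   /* WLS: branch past the loop end */
    }
    lr = rn;                        /* loop start: LR = iteration count */

    for (;;) {
        runs++;                     /* loop body */
        if (lr <= 1) {
            break;                  /* LE: last iteration, fall through */
        }
        lr--;                       /* LE: decrement LR, branch back */
    }
    return runs;
}

/* Usage example: a do-loop always runs at least once, a while-loop may not. */
void loop_model_examples(void)
{
    assert(loop_body_runs(0, true) == 0);
    assert(loop_body_runs(0, false) == 1);
    assert(loop_body_runs(3, true) == 3);
}
```

The "loop forever" form of LE (the f bit in the decode pattern) is not
modelled here, since it never terminates by itself.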

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/t32.decode  |  8 ++++
 target/arm/translate.c | 93 +++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 99 insertions(+), 2 deletions(-)

diff --git a/target/arm/t32.decode b/target/arm/t32.decode
index 3015731a8d0..8152739b52b 100644
--- a/target/arm/t32.decode
+++ b/target/arm/t32.decode
@@ -659,4 +659,12 @@ BL               1111 0. .......... 11.1 ............         @branch24
     BF           1111 0 boff:4 10 ----- 1110 - ---------- 1    # BF
     BF           1111 0 boff:4 11 ----- 1110 0 0000000000 1    # BFX, BFLX
   ]
+  [
+    # LE and WLS immediate
+    %lob_imm 1:10 11:1 !function=times_2
+
+    DLS          1111 0 0000 100     rn:4 1110 0000 0000 0001
+    WLS          1111 0 0000 100     rn:4 1100 . .......... 1 imm=%lob_imm
+    LE           1111 0 0000 0 f:1 0 1111 1100 . .......... 1 imm=%lob_imm
+  ]
 }
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 01b697083a0..5083f828780 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -2490,17 +2490,23 @@ static void gen_goto_tb(DisasContext *s, int n, target_ulong dest)
     s->base.is_jmp = DISAS_NORETURN;
 }
 
-static inline void gen_jmp (DisasContext *s, uint32_t dest)
+/* Jump, specifying which TB number to use if we gen_goto_tb() */
+static inline void gen_jmp_tb(DisasContext *s, uint32_t dest, int tbno)
 {
     if (unlikely(is_singlestepping(s))) {
         /* An indirect jump so that we still trigger the debug exception.  */
         gen_set_pc_im(s, dest);
         s->base.is_jmp = DISAS_JUMP;
     } else {
-        gen_goto_tb(s, 0, dest);
+        gen_goto_tb(s, tbno, dest);
     }
 }
 
+static inline void gen_jmp(DisasContext *s, uint32_t dest)
+{
+    gen_jmp_tb(s, dest, 0);
+}
+
 static inline void gen_mulxy(TCGv_i32 t0, TCGv_i32 t1, int x, int y)
 {
     if (x)
@@ -7953,6 +7959,89 @@ static bool trans_BF(DisasContext *s, arg_BF *a)
     return true;
 }
 
+static bool trans_DLS(DisasContext *s, arg_DLS *a)
+{
+    /* M-profile low-overhead loop start */
+    TCGv_i32 tmp;
+
+    if (!dc_isar_feature(aa32_lob, s)) {
+        return false;
+    }
+    if (a->rn == 13 || a->rn == 15) {
+        /* CONSTRAINED UNPREDICTABLE: we choose to UNDEF */
+        return false;
+    }
+
+    /* Not a while loop, no tail predication: just set LR to the count */
+    tmp = load_reg(s, a->rn);
+    store_reg(s, 14, tmp);
+    return true;
+}
+
+static bool trans_WLS(DisasContext *s, arg_WLS *a)
+{
+    /* M-profile low-overhead while-loop start */
+    TCGv_i32 tmp;
+    TCGLabel *nextlabel;
+
+    if (!dc_isar_feature(aa32_lob, s)) {
+        return false;
+    }
+    if (a->rn == 13 || a->rn == 15) {
+        /* CONSTRAINED UNPREDICTABLE: we choose to UNDEF */
+        return false;
+    }
+    if (s->condexec_mask) {
+        /*
+         * WLS in an IT block is CONSTRAINED UNPREDICTABLE;
+         * we choose to UNDEF, because otherwise our use of
+         * gen_goto_tb(1) would clash with the use of TB exit 1
+         * in the dc->condjmp condition-failed codepath in
+         * arm_tr_tb_stop() and we'd get an assertion.
+         */
+        return false;
+    }
+    nextlabel = gen_new_label();
+    tcg_gen_brcondi_i32(TCG_COND_EQ, cpu_R[a->rn], 0, nextlabel);
+    tmp = load_reg(s, a->rn);
+    store_reg(s, 14, tmp);
+    gen_jmp_tb(s, s->base.pc_next, 1);
+
+    gen_set_label(nextlabel);
+    gen_jmp(s, read_pc(s) + a->imm);
+    return true;
+}
+
+static bool trans_LE(DisasContext *s, arg_LE *a)
+{
+    /*
+     * M-profile low-overhead loop end. The architecture permits an
+     * implementation to discard the LO_BRANCH_INFO cache at any time,
+     * and we take the IMPDEF option to never set it in the first place
+     * (equivalent to always discarding it immediately), because for QEMU
+     * a "real" implementation would be complicated and wouldn't execute
+     * any faster.
+     */
+    TCGv_i32 tmp;
+
+    if (!dc_isar_feature(aa32_lob, s)) {
+        return false;
+    }
+
+    if (!a->f) {
+        /* Not loop-forever. If LR <= 1 this is the last loop: do nothing. */
+        arm_gen_condlabel(s);
+        tcg_gen_brcondi_i32(TCG_COND_LEU, cpu_R[14], 1, s->condlabel);
+        /* Decrement LR */
+        tmp = load_reg(s, 14);
+        tcg_gen_addi_i32(tmp, tmp, -1);
+        store_reg(s, 14, tmp);
+    }
+    /* Jump back to the loop start */
+    gen_jmp(s, read_pc(s) - a->imm);
+    return true;
+}
+
 static bool op_tbranch(DisasContext *s, arg_tbranch *a, bool half)
 {
     TCGv_i32 addr, tmp;
-- 
2.20.1




* [PATCH v2 08/10] target/arm: Fix has_vfp/has_neon ID reg squashing for M-profile
  2020-10-19 15:12 [PATCH v2 00/10] target/arm: Various v8.1M minor features Peter Maydell
                   ` (6 preceding siblings ...)
  2020-10-19 15:12 ` [PATCH v2 07/10] target/arm: Implement v8.1M low-overhead-loop instructions Peter Maydell
@ 2020-10-19 15:12 ` Peter Maydell
  2020-10-19 15:13 ` [PATCH v2 09/10] target/arm: Allow M-profile CPUs with FP16 to set FPSCR.FP16 Peter Maydell
  2020-10-19 15:13 ` [PATCH v2 10/10] target/arm: Implement FPSCR.LTPSIZE for M-profile LOB extension Peter Maydell
  9 siblings, 0 replies; 15+ messages in thread
From: Peter Maydell @ 2020-10-19 15:12 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

In arm_cpu_realizefn(), if the CPU has VFP or Neon disabled then we
squash the ID register fields so that we don't advertise it to the
guest.  This code was written for A-profile and needs some tweaks to
work correctly on M-profile:

 * A-profile only fields should not be zeroed on M-profile:
   - MVFR0.FPSHVEC,FPTRAP
   - MVFR1.SIMDLS,SIMDINT,SIMDSP,SIMDHP
   - MVFR2.SIMDMISC
 * M-profile only fields should be zeroed on M-profile:
   - MVFR1.FP16

In particular, because MVFR1.SIMDHP on A-profile is the same field as
MVFR1.FP16 on M-profile this code was incorrectly disabling FP16
support on an M-profile CPU (where has_neon is always false).  This
isn't a visible bug yet because we don't have any M-profile CPUs with
FP16 support, but the change is necessary before we introduce any.
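
The aliasing problem can be shown with two 4-bit field helpers. This is
a sketch under assumptions: field4_deposit() and field4_extract() are
simplified stand-ins for QEMU's field macros, and the shared position
of SIMDHP and FP16 at MVFR1 bits [23:20] is taken as given for the
example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: both names refer to MVFR1 bits [23:20]. */
#define MVFR1_SIMDHP_SHIFT 20       /* A-profile name for the field */
#define MVFR1_FP16_SHIFT   20       /* M-profile name for the field */
#define FIELD4_MASK        0xfu

/* Write a 4-bit value into 'reg' at bit position 'shift'. */
static uint32_t field4_deposit(uint32_t reg, unsigned shift, uint32_t val)
{
    assert(shift <= 28 && val <= FIELD4_MASK);
    return (reg & ~(FIELD4_MASK << shift)) | (val << shift);
}

/* Read the 4-bit value at bit position 'shift' in 'reg'. */
static uint32_t field4_extract(uint32_t reg, unsigned shift)
{
    return (reg >> shift) & FIELD4_MASK;
}
```

Starting from a register with FP16 set to 1, zeroing "SIMDHP" with
field4_deposit(reg, MVFR1_SIMDHP_SHIFT, 0) also reads back as FP16 == 0,
which is why the Neon squashing must be skipped on M-profile.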

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.c | 29 ++++++++++++++++++-----------
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 056319859fb..186ee621a65 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -1429,17 +1429,22 @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
         u = cpu->isar.mvfr0;
         u = FIELD_DP32(u, MVFR0, FPSP, 0);
         u = FIELD_DP32(u, MVFR0, FPDP, 0);
-        u = FIELD_DP32(u, MVFR0, FPTRAP, 0);
         u = FIELD_DP32(u, MVFR0, FPDIVIDE, 0);
         u = FIELD_DP32(u, MVFR0, FPSQRT, 0);
-        u = FIELD_DP32(u, MVFR0, FPSHVEC, 0);
         u = FIELD_DP32(u, MVFR0, FPROUND, 0);
+        if (!arm_feature(env, ARM_FEATURE_M)) {
+            u = FIELD_DP32(u, MVFR0, FPTRAP, 0);
+            u = FIELD_DP32(u, MVFR0, FPSHVEC, 0);
+        }
         cpu->isar.mvfr0 = u;
 
         u = cpu->isar.mvfr1;
         u = FIELD_DP32(u, MVFR1, FPFTZ, 0);
         u = FIELD_DP32(u, MVFR1, FPDNAN, 0);
         u = FIELD_DP32(u, MVFR1, FPHP, 0);
+        if (arm_feature(env, ARM_FEATURE_M)) {
+            u = FIELD_DP32(u, MVFR1, FP16, 0);
+        }
         cpu->isar.mvfr1 = u;
 
         u = cpu->isar.mvfr2;
@@ -1475,16 +1480,18 @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
         u = FIELD_DP32(u, ID_ISAR6, FHM, 0);
         cpu->isar.id_isar6 = u;
 
-        u = cpu->isar.mvfr1;
-        u = FIELD_DP32(u, MVFR1, SIMDLS, 0);
-        u = FIELD_DP32(u, MVFR1, SIMDINT, 0);
-        u = FIELD_DP32(u, MVFR1, SIMDSP, 0);
-        u = FIELD_DP32(u, MVFR1, SIMDHP, 0);
-        cpu->isar.mvfr1 = u;
+        if (!arm_feature(env, ARM_FEATURE_M)) {
+            u = cpu->isar.mvfr1;
+            u = FIELD_DP32(u, MVFR1, SIMDLS, 0);
+            u = FIELD_DP32(u, MVFR1, SIMDINT, 0);
+            u = FIELD_DP32(u, MVFR1, SIMDSP, 0);
+            u = FIELD_DP32(u, MVFR1, SIMDHP, 0);
+            cpu->isar.mvfr1 = u;
 
-        u = cpu->isar.mvfr2;
-        u = FIELD_DP32(u, MVFR2, SIMDMISC, 0);
-        cpu->isar.mvfr2 = u;
+            u = cpu->isar.mvfr2;
+            u = FIELD_DP32(u, MVFR2, SIMDMISC, 0);
+            cpu->isar.mvfr2 = u;
+        }
     }
 
     if (!cpu->has_neon && !cpu->has_vfp) {
-- 
2.20.1




* [PATCH v2 09/10] target/arm: Allow M-profile CPUs with FP16 to set FPSCR.FP16
  2020-10-19 15:12 [PATCH v2 00/10] target/arm: Various v8.1M minor features Peter Maydell
                   ` (7 preceding siblings ...)
  2020-10-19 15:12 ` [PATCH v2 08/10] target/arm: Fix has_vfp/has_neon ID reg squashing for M-profile Peter Maydell
@ 2020-10-19 15:13 ` Peter Maydell
  2020-10-19 15:57   ` Richard Henderson
  2020-10-19 15:13 ` [PATCH v2 10/10] target/arm: Implement FPSCR.LTPSIZE for M-profile LOB extension Peter Maydell
  9 siblings, 1 reply; 15+ messages in thread
From: Peter Maydell @ 2020-10-19 15:13 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

M-profile CPUs with half-precision floating point support should
be able to write to FPSCR.FZ16, but an M-profile specific masking
of the value at the top of vfp_set_fpscr() currently prevents that.
This is not yet an active bug because we have no M-profile
FP16 CPUs, but needs to be fixed before we can add any.

The bits that the masking is effectively preventing from being
set are the A-profile only short-vector Len and Stride fields,
plus the Neon QC bit. Rearrange the order of the function so
that those fields are handled earlier and only under a suitable
guard; this allows us to drop the M-profile specific masking,
making FZ16 writeable.

This change also makes the QC bit correctly RAZ/WI for older
no-Neon A-profile cores.

This refactoring also paves the way for the low-overhead-branch
LTPSIZE field, which uses some of the bits that are used for
A-profile Stride and Len.
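
A minimal sketch of the masking arithmetic, for illustration only: the
two mask constants are the ones visible in the diff below, and the bit
positions FZ16 = bit 19 and QC = bit 27 are assumed. The helper names
are hypothetical, and the earlier "clear FZ16 if the CPU has no FP16"
step is left out, so the sketch assumes a CPU that implements FP16.

```c
#include <assert.h>
#include <stdint.h>

#define FPCR_FZ16           (1u << 19)   /* assumed bit position */
#define FPCR_QC             (1u << 27)   /* assumed bit position */
#define OLD_M_PROFILE_MASK  0xf7c0009fu  /* M-profile masking being dropped */
#define STORED_FPSCR_MASK   0xf7c80000u  /* bits kept in vfp.xregs[FPSCR] */

/* Old flow: M-profile values were masked before anything else happened. */
static inline uint32_t stored_fpscr_old(uint32_t val, int is_m_profile)
{
    if (is_m_profile) {
        val &= OLD_M_PROFILE_MASK;
    }
    return val & STORED_FPSCR_MASK;
}

/* New flow: no profile-specific masking, only the final store mask. */
static inline uint32_t stored_fpscr_new(uint32_t val)
{
    return val & STORED_FPSCR_MASK;
}

/* Self-check of the claims made in the commit message. */
static inline void demo_fz16_masks(void)
{
    /* The old M-profile mask has bit 19 clear, so FZ16 was lost. */
    assert((OLD_M_PROFILE_MASK & FPCR_FZ16) == 0);
    /* The final store mask keeps FZ16 but drops QC, Len and Stride. */
    assert((STORED_FPSCR_MASK & FPCR_FZ16) != 0);
    assert((STORED_FPSCR_MASK & (FPCR_QC | (7u << 16) | (3u << 20))) == 0);
}
```

QC, Len and Stride never reach the stored value in either flow, which
is why they have to be captured separately and under a suitable guard.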

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/vfp_helper.c | 47 ++++++++++++++++++++++++-----------------
 1 file changed, 28 insertions(+), 19 deletions(-)

diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index 5666393ef79..c3d01d781b6 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -194,36 +194,45 @@ void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
         val &= ~FPCR_FZ16;
     }
 
-    if (arm_feature(env, ARM_FEATURE_M)) {
+    vfp_set_fpscr_to_host(env, val);
+
+    if (!arm_feature(env, ARM_FEATURE_M)) {
         /*
-         * M profile FPSCR is RES0 for the QC, STRIDE, FZ16, LEN bits
-         * and also for the trapped-exception-handling bits IxE.
+         * Short-vector length and stride; on M-profile these bits
+         * are used for different purposes.
+         * We can't make this conditional be "if MVFR0.FPShVec != 0",
+         * because in v7A no-short-vector-support cores still had to
+         * allow Stride/Len to be written with the only effect that
+         * some insns are required to UNDEF if the guest sets them.
+         *
+         * TODO: if M-profile MVE implemented, set LTPSIZE.
          */
-        val &= 0xf7c0009f;
+        env->vfp.vec_len = extract32(val, 16, 3);
+        env->vfp.vec_stride = extract32(val, 20, 2);
     }
 
-    vfp_set_fpscr_to_host(env, val);
+    if (arm_feature(env, ARM_FEATURE_NEON)) {
+        /*
+         * The bit we set within fpscr_q is arbitrary; the register as a
+         * whole being zero/non-zero is what counts.
+         * TODO: M-profile MVE also has a QC bit.
+         */
+        env->vfp.qc[0] = val & FPCR_QC;
+        env->vfp.qc[1] = 0;
+        env->vfp.qc[2] = 0;
+        env->vfp.qc[3] = 0;
+    }
 
     /*
      * We don't implement trapped exception handling, so the
      * trap enable bits, IDE|IXE|UFE|OFE|DZE|IOE are all RAZ/WI (not RES0!)
      *
-     * If we exclude the exception flags, IOC|DZC|OFC|UFC|IXC|IDC
-     * (which are stored in fp_status), and the other RES0 bits
-     * in between, then we clear all of the low 16 bits.
+     * The exception flags IOC|DZC|OFC|UFC|IXC|IDC are stored in
+     * fp_status; QC, Len and Stride are stored separately earlier.
+     * Clear out all of those and the RES0 bits: only NZCV, AHP, DN,
+     * FZ, RMode and FZ16 are kept in vfp.xregs[FPSCR].
      */
     env->vfp.xregs[ARM_VFP_FPSCR] = val & 0xf7c80000;
-    env->vfp.vec_len = (val >> 16) & 7;
-    env->vfp.vec_stride = (val >> 20) & 3;
-
-    /*
-     * The bit we set within fpscr_q is arbitrary; the register as a
-     * whole being zero/non-zero is what counts.
-     */
-    env->vfp.qc[0] = val & FPCR_QC;
-    env->vfp.qc[1] = 0;
-    env->vfp.qc[2] = 0;
-    env->vfp.qc[3] = 0;
 }
 
 void vfp_set_fpscr(CPUARMState *env, uint32_t val)
-- 
2.20.1




* [PATCH v2 10/10] target/arm: Implement FPSCR.LTPSIZE for M-profile LOB extension
  2020-10-19 15:12 [PATCH v2 00/10] target/arm: Various v8.1M minor features Peter Maydell
                   ` (8 preceding siblings ...)
  2020-10-19 15:13 ` [PATCH v2 09/10] target/arm: Allow M-profile CPUs with FP16 to set FPSCR.FP16 Peter Maydell
@ 2020-10-19 15:13 ` Peter Maydell
  2020-10-19 16:00   ` Richard Henderson
  9 siblings, 1 reply; 15+ messages in thread
From: Peter Maydell @ 2020-10-19 15:13 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

If the M-profile low-overhead-branch extension is implemented, FPSCR
bits [18:16] are a new field LTPSIZE.  If MVE is not implemented
(currently always true for us) then this field always reads as 4 and
ignores writes.

These bits used to be the vector-length field for the old
short-vector extension, so we need to take care that they are not
misinterpreted as setting vec_len. We do this with a rearrangement
of the vfp_set_fpscr() code that deals with vec_len, vec_stride
and also the QC bit; this obviates the need for the M-profile
only masking step that we used to have at the start of the function.

We provide a new field in CPUARMState for LTPSIZE, even though this
will always be 4, in preparation for MVE, so we don't have to
come back later and split it out of the vfp.xregs[FPSCR] value.
(This state struct field will be saved and restored as part of
the FPSCR value via the vmstate_fpscr in machine.c.)
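
A small sketch of the read-back path, for illustration only. The
struct and function names here are hypothetical; the shifts mirror the
ones in the diff below (Len and LTPSIZE at bit 16, Stride at bit 20).
It relies on the invariant that whichever of Len and LTPSIZE does not
apply to the CPU is always zero.

```c
#include <assert.h>
#include <stdint.h>

#define FPSCR_LTPSIZE_SHIFT 16
#define FPSCR_LTPSIZE_MASK  7u

/* Toy model of the separately stored FPSCR fields. */
struct toy_fp_state {
    uint32_t vec_len;     /* A-profile Len, FPSCR bits [18:16] */
    uint32_t vec_stride;  /* A-profile Stride, FPSCR bits [21:20] */
    uint32_t ltpsize;     /* M-profile LOB: 4 without MVE, else 0 here */
};

/* Reassemble the overlapping fields into an FPSCR value. */
static inline uint32_t toy_get_fpscr_fields(const struct toy_fp_state *s)
{
    /* Len and LTPSIZE share bits, so at most one may be non-zero. */
    assert(s->vec_len == 0 || s->ltpsize == 0);

    uint32_t fpscr = (s->vec_len << 16) | (s->vec_stride << 20);
    fpscr |= s->ltpsize << FPSCR_LTPSIZE_SHIFT;
    return fpscr;
}

/* Extract bits [18:16] as a guest reading LTPSIZE would see them. */
static inline uint32_t toy_ltpsize(uint32_t fpscr)
{
    return (fpscr >> FPSCR_LTPSIZE_SHIFT) & FPSCR_LTPSIZE_MASK;
}
```

For an M-profile state with ltpsize == 4 and both A-profile fields
zero, the reassembled value is 0x00040000, so bits [18:16] read as 4.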

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h        | 1 +
 target/arm/cpu.c        | 9 +++++++++
 target/arm/vfp_helper.c | 6 ++++++
 3 files changed, 16 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index a432f301f11..49cd5cabcf2 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -549,6 +549,7 @@ typedef struct CPUARMState {
         uint32_t fpdscr[M_REG_NUM_BANKS];
         uint32_t cpacr[M_REG_NUM_BANKS];
         uint32_t nsacr;
+        int ltpsize;
     } v7m;
 
     /* Information associated with an exception about to be taken:
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 186ee621a65..07492e9f9a4 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -255,6 +255,15 @@ static void arm_cpu_reset(DeviceState *dev)
         uint8_t *rom;
         uint32_t vecbase;
 
+        if (cpu_isar_feature(aa32_lob, cpu)) {
+            /*
+             * LTPSIZE is constant 4 if MVE not implemented, and resets
+             * to an UNKNOWN value if MVE is implemented. We choose to
+             * always reset to 4.
+             */
+            env->v7m.ltpsize = 4;
+        }
+
         if (arm_feature(env, ARM_FEATURE_M_SECURITY)) {
             env->v7m.secure = true;
         } else {
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index c3d01d781b6..bf608d7aef3 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -174,6 +174,12 @@ uint32_t HELPER(vfp_get_fpscr)(CPUARMState *env)
             | (env->vfp.vec_len << 16)
             | (env->vfp.vec_stride << 20);
 
+    /*
+     * M-profile LTPSIZE overlaps A-profile Len; whichever of the
+     * two is not applicable to this CPU will always be zero.
+     */
+    fpscr |= env->v7m.ltpsize << 16;
+
     fpscr |= vfp_get_fpscr_from_host(env);
 
     i = env->vfp.qc[0] | env->vfp.qc[1] | env->vfp.qc[2] | env->vfp.qc[3];
-- 
2.20.1




* Re: [PATCH v2 07/10] target/arm: Implement v8.1M low-overhead-loop instructions
  2020-10-19 15:12 ` [PATCH v2 07/10] target/arm: Implement v8.1M low-overhead-loop instructions Peter Maydell
@ 2020-10-19 15:39   ` Richard Henderson
  0 siblings, 0 replies; 15+ messages in thread
From: Richard Henderson @ 2020-10-19 15:39 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 10/19/20 8:12 AM, Peter Maydell wrote:
> v8.1M's "low-overhead-loop" extension has three instructions
> for looping:
>  * DLS (start of a do-loop)
>  * WLS (start of a while-loop)
>  * LE (end of a loop)
> 
> The loop-start instructions are both simple operations to start a
> loop whose iteration count (if any) is in LR.  The loop-end
> instruction handles "decrement iteration count and jump back to loop
> start"; it also caches the information about the branch back to the
> start of the loop to improve performance of the branch on subsequent
> iterations.
> 
> As with the branch-future instructions, the architecture permits an
> implementation to discard the LO_BRANCH_INFO cache at any time, and
> QEMU takes the IMPDEF option to never set it in the first place
> (equivalent to discarding it immediately), because for us a "real"
> implementation would be unnecessary complexity.
> 
> (This implementation only provides the simple looping constructs; the
> vector extension MVE (Helium) adds some extra variants to handle
> looping across vectors.  We'll add those later when we implement
> MVE.)
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/t32.decode  |  8 ++++
>  target/arm/translate.c | 93 +++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 99 insertions(+), 2 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~




* Re: [PATCH v2 09/10] target/arm: Allow M-profile CPUs with FP16 to set FPSCR.FP16
  2020-10-19 15:13 ` [PATCH v2 09/10] target/arm: Allow M-profile CPUs with FP16 to set FPSCR.FP16 Peter Maydell
@ 2020-10-19 15:57   ` Richard Henderson
  0 siblings, 0 replies; 15+ messages in thread
From: Richard Henderson @ 2020-10-19 15:57 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 10/19/20 8:13 AM, Peter Maydell wrote:
> M-profile CPUs with half-precision floating point support should
> be able to write to FPSCR.FZ16, but an M-profile specific masking
> of the value at the top of vfp_set_fpscr() currently prevents that.
> This is not yet an active bug because we have no M-profile
> FP16 CPUs, but needs to be fixed before we can add any.
> 
> The bits that the masking is effectively preventing from being
> set are the A-profile only short-vector Len and Stride fields,
> plus the Neon QC bit. Rearrange the order of the function so
> that those fields are handled earlier and only under a suitable
> guard; this allows us to drop the M-profile specific masking,
> making FZ16 writeable.
> 
> This change also makes the QC bit correctly RAZ/WI for older
> no-Neon A-profile cores.
> 
> This refactoring also paves the way for the low-overhead-branch
> LTPSIZE field, which uses some of the bits that are used for
> A-profile Stride and Len.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/vfp_helper.c | 47 ++++++++++++++++++++++++-----------------
>  1 file changed, 28 insertions(+), 19 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~




* Re: [PATCH v2 10/10] target/arm: Implement FPSCR.LTPSIZE for M-profile LOB extension
  2020-10-19 15:13 ` [PATCH v2 10/10] target/arm: Implement FPSCR.LTPSIZE for M-profile LOB extension Peter Maydell
@ 2020-10-19 16:00   ` Richard Henderson
  0 siblings, 0 replies; 15+ messages in thread
From: Richard Henderson @ 2020-10-19 16:00 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 10/19/20 8:13 AM, Peter Maydell wrote:
> If the M-profile low-overhead-branch extension is implemented, FPSCR
> bits [18:16] are a new field LTPSIZE.  If MVE is not implemented
> (currently always true for us) then this field always reads as 4 and
> ignores writes.
> 
> These bits used to be the vector-length field for the old
> short-vector extension, so we need to take care that they are not
> misinterpreted as setting vec_len. We do this with a rearrangement
> of the vfp_set_fpscr() code that deals with vec_len, vec_stride
> and also the QC bit; this obviates the need for the M-profile
> only masking step that we used to have at the start of the function.
> 
> We provide a new field in CPUARMState for LTPSIZE, even though this
> will always be 4, in preparation for MVE, so we don't have to
> come back later and split it out of the vfp.xregs[FPSCR] value.
> (This state struct field will be saved and restored as part of
> the FPSCR value via the vmstate_fpscr in machine.c.)
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/cpu.h        | 1 +
>  target/arm/cpu.c        | 9 +++++++++
>  target/arm/vfp_helper.c | 6 ++++++
>  3 files changed, 16 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~




* Re: [PATCH v2 02/10] target/arm: Implement v8.1M NOCP handling
  2020-10-19 15:12 ` [PATCH v2 02/10] target/arm: Implement v8.1M NOCP handling Peter Maydell
@ 2020-10-19 16:11   ` Richard Henderson
  0 siblings, 0 replies; 15+ messages in thread
From: Richard Henderson @ 2020-10-19 16:11 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 10/19/20 8:12 AM, Peter Maydell wrote:
> From v8.1M, disabled-coprocessor handling changes slightly:
>  * coprocessors 8, 9, 14 and 15 are also governed by the
>    cp10 enable bit, like cp11
>  * an extra range of instruction patterns is considered
>    to be inside the coprocessor space
> 
> We previously marked these up with TODO comments; implement the
> correct behaviour.
> 
> Unfortunately there is no ID register field which indicates this
> behaviour.  We could in theory test an unrelated ID register which
> indicates guaranteed-to-be-in-v8.1M behaviour like ID_ISAR0.CmpBranch
> >= 3 (low-overhead-loops), but it seems better to simply define a new
> ARM_FEATURE_V8_1M feature flag and use it for this and other
> new-in-v8.1M behaviour that isn't identifiable from the ID registers.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/cpu.h               |  1 +
>  target/arm/m-nocp.decode       | 10 ++++++----
>  target/arm/translate-vfp.c.inc | 17 +++++++++++++++--
>  3 files changed, 22 insertions(+), 6 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~




Thread overview: 15+ messages
2020-10-19 15:12 [PATCH v2 00/10] target/arm: Various v8.1M minor features Peter Maydell
2020-10-19 15:12 ` [PATCH v2 01/10] decodetree: Fix codegen for non-overlapping group inside overlapping group Peter Maydell
2020-10-19 15:12 ` [PATCH v2 02/10] target/arm: Implement v8.1M NOCP handling Peter Maydell
2020-10-19 16:11   ` Richard Henderson
2020-10-19 15:12 ` [PATCH v2 03/10] target/arm: Implement v8.1M conditional-select insns Peter Maydell
2020-10-19 15:12 ` [PATCH v2 04/10] target/arm: Make the t32 insn[25:23]=111 group non-overlapping Peter Maydell
2020-10-19 15:12 ` [PATCH v2 05/10] target/arm: Don't allow BLX imm for M-profile Peter Maydell
2020-10-19 15:12 ` [PATCH v2 06/10] target/arm: Implement v8.1M branch-future insns (as NOPs) Peter Maydell
2020-10-19 15:12 ` [PATCH v2 07/10] target/arm: Implement v8.1M low-overhead-loop instructions Peter Maydell
2020-10-19 15:39   ` Richard Henderson
2020-10-19 15:12 ` [PATCH v2 08/10] target/arm: Fix has_vfp/has_neon ID reg squashing for M-profile Peter Maydell
2020-10-19 15:13 ` [PATCH v2 09/10] target/arm: Allow M-profile CPUs with FP16 to set FPSCR.FP16 Peter Maydell
2020-10-19 15:57   ` Richard Henderson
2020-10-19 15:13 ` [PATCH v2 10/10] target/arm: Implement FPSCR.LTPSIZE for M-profile LOB extension Peter Maydell
2020-10-19 16:00   ` Richard Henderson
