* [PATCH 00/10] target/arm: Various v8.1M minor features
@ 2020-10-12 15:37 Peter Maydell
  2020-10-12 15:37 ` [PATCH 01/10] decodetree: Fix codegen for non-overlapping group inside overlapping group Peter Maydell
                   ` (9 more replies)
  0 siblings, 10 replies; 27+ messages in thread
From: Peter Maydell @ 2020-10-12 15:37 UTC
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

This patch series implements various minor features that are new
in v8.1M, notably the branch-future and low-overhead-loop extensions.

(None of this will get enabled until we have enough to implement
a CPU model which has v8.1M, which will be the Cortex-M55, but
as usual we can get stuff into the tree gradually.)

Patch 1 is a decodetree fix suggested by Richard that is
necessary to avoid incorrect decoding of the patterns that
later patches add to t32.decode.

(Apologies for the accidental mailbombing of the list with
stale patches due to a mangled command line on my first attempt
at sending this :-(  )

thanks
-- PMM

Peter Maydell (10):
  decodetree: Fix codegen for non-overlapping group inside overlapping
    group
  target/arm: Implement v8.1M NOCP handling
  target/arm: Implement v8.1M conditional-select insns
  target/arm: Make the t32 insn[25:23]=111 group non-overlapping
  target/arm: Don't allow BLX imm for M-profile
  target/arm: Implement v8.1M branch-future insns (as NOPs)
  target/arm: Implement v8.1M low-overhead-loop instructions
  target/arm: Fix has_vfp/has_neon ID reg squashing for M-profile
  target/arm: Implement FPSCR.LTPSIZE for M-profile LOB extension
  target/arm: Fix writing to FPSCR.FZ16 on M-profile

 target/arm/cpu.h               |   7 ++
 target/arm/m-nocp.decode       |  10 ++-
 target/arm/t32.decode          |  50 +++++++----
 target/arm/cpu.c               |  34 ++++---
 target/arm/translate.c         | 157 +++++++++++++++++++++++++++++++++
 target/arm/vfp_helper.c        |  30 +++++--
 scripts/decodetree.py          |   2 +-
 target/arm/translate-vfp.c.inc |  17 +++-
 8 files changed, 268 insertions(+), 39 deletions(-)

-- 
2.20.1




* [PATCH 01/10] decodetree: Fix codegen for non-overlapping group inside overlapping group
  2020-10-12 15:37 [PATCH 00/10] target/arm: Various v8.1M minor features Peter Maydell
@ 2020-10-12 15:37 ` Peter Maydell
  2020-10-13 16:02   ` Richard Henderson
  2020-10-12 15:37 ` [PATCH 02/10] target/arm: Implement v8.1M NOCP handling Peter Maydell
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 27+ messages in thread
From: Peter Maydell @ 2020-10-12 15:37 UTC
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

For nested groups like:

  {
    [
      pattern 1
      pattern 2
    ]
    pattern 3
  }

the intended behaviour is that patterns 1 and 2 must not
overlap with each other; if the insn matches neither then
we fall through to pattern 3 as the next thing in the
outer overlapping group.

Currently we generate incorrect code for this situation,
because in the code path for a failed match inside the
inner non-overlapping group we generate a "return" statement,
which causes decode to stop entirely rather than continuing
to the next thing in the outer group.

Generate a "break" instead, so that decode flow behaves
as required for this nested group case.
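For illustration, here is a hand-written C sketch of the control flow
the fix is after, for a { [ pattern1 pattern2 ] pattern3 } group. It is
not literal decodetree output: the trans_pattern* handlers, the use of
bits [7:4] as the selector and the return codes are all made up for
the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical trans_* handlers; each returns false to decline. */
static bool trans_pattern1(uint32_t insn) { return (insn & 1) == 0; }
static bool trans_pattern2(uint32_t insn) { (void)insn; return true; }
static bool trans_pattern3(uint32_t insn) { (void)insn; return true; }

/* Returns the number of the pattern that accepted insn, or 0. */
static int decode_sketch(uint32_t insn)
{
    /* Inner non-overlapping group: bits [7:4] select at most one pattern. */
    switch ((insn >> 4) & 0xf) {
    case 0x1:
        if (trans_pattern1(insn)) {
            return 1;
        }
        break;  /* a "return false" here would never try pattern 3 */
    case 0x2:
        if (trans_pattern2(insn)) {
            return 2;
        }
        break;
    }
    /* Outer overlapping group: continue with the next alternative. */
    if (trans_pattern3(insn)) {
        return 3;
    }
    return 0;
}
```

With the "break" version, an insn that selects pattern 1 but is
declined by its handler (0x11 here) still reaches pattern 3.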

Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 scripts/decodetree.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/decodetree.py b/scripts/decodetree.py
index 60fd3b5e5f6..c1bf3cfa85f 100644
--- a/scripts/decodetree.py
+++ b/scripts/decodetree.py
@@ -548,7 +548,7 @@ class Tree:
             output(ind, '    /* ',
                    str_match_bits(innerbits, innermask), ' */\n')
             s.output_code(i + 4, extracted, innerbits, innermask)
-            output(ind, '    return false;\n')
+            output(ind, '    break;\n')
         output(ind, '}\n')
 # end Tree
 
-- 
2.20.1




* [PATCH 02/10] target/arm: Implement v8.1M NOCP handling
  2020-10-12 15:37 [PATCH 00/10] target/arm: Various v8.1M minor features Peter Maydell
  2020-10-12 15:37 ` [PATCH 01/10] decodetree: Fix codegen for non-overlapping group inside overlapping group Peter Maydell
@ 2020-10-12 15:37 ` Peter Maydell
  2020-10-12 15:37 ` [PATCH 03/10] target/arm: Implement v8.1M conditional-select insns Peter Maydell
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 27+ messages in thread
From: Peter Maydell @ 2020-10-12 15:37 UTC
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

From v8.1M, disabled-coprocessor handling changes slightly:
 * coprocessors 8, 9, 14 and 15 are also governed by the
   cp10 enable bit, like cp11
 * an extra range of instruction patterns is considered
   to be inside the coprocessor space

We previously marked these up with TODO comments; implement the
correct behaviour.

Unfortunately there is no ID register field which indicates this
behaviour.  We could in theory test an unrelated ID register field
whose value guarantees v8.1M behaviour, such as ID_ISAR0.CmpBranch
>= 3 (low-overhead loops), but it seems better to simply define a new
ARM_FEATURE_V8_1M feature flag and use it for this and other
new-in-v8.1M behaviour that isn't identifiable from the ID registers.
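A minimal C sketch of the coprocessor-number remapping that
trans_NOCP performs. The helper name nocp_governing_cp() is
hypothetical, and its is_v8_1m parameter stands in for the
arm_dc_feature(s, ARM_FEATURE_V8_1M) check.

```c
#include <stdbool.h>

/* Returns the coprocessor whose enable bit governs accesses to cp. */
static int nocp_governing_cp(int cp, bool is_v8_1m)
{
    if (cp == 11) {
        return 10;      /* cp11 has always shared the cp10 enable */
    }
    if (is_v8_1m && (cp == 8 || cp == 9 || cp == 14 || cp == 15)) {
        return 10;      /* new in v8.1M */
    }
    return cp;          /* anything other than 10 will take NOCP */
}
```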

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h               |  1 +
 target/arm/m-nocp.decode       | 10 ++++++----
 target/arm/translate-vfp.c.inc | 17 +++++++++++++++--
 3 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index cfff1b5c8fe..74392fa0295 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1985,6 +1985,7 @@ enum arm_features {
     ARM_FEATURE_VBAR, /* has cp15 VBAR */
     ARM_FEATURE_M_SECURITY, /* M profile Security Extension */
     ARM_FEATURE_M_MAIN, /* M profile Main Extension */
+    ARM_FEATURE_V8_1M, /* M profile extras only in v8.1M and later */
 };
 
 static inline int arm_feature(CPUARMState *env, int feature)
diff --git a/target/arm/m-nocp.decode b/target/arm/m-nocp.decode
index 7182d7d1217..28c8ac6b94c 100644
--- a/target/arm/m-nocp.decode
+++ b/target/arm/m-nocp.decode
@@ -29,14 +29,16 @@
 # If the coprocessor is not present or disabled then we will generate
 # the NOCP exception; otherwise we let the insn through to the main decode.
 
+&nocp cp
+
 {
   # Special cases which do not take an early NOCP: VLLDM and VLSTM
   VLLDM_VLSTM  1110 1100 001 l:1 rn:4 0000 1010 0000 0000
   # TODO: VSCCLRM (new in v8.1M) is similar:
   #VSCCLRM      1110 1100 1-01 1111 ---- 1011 ---- ---0
 
-  NOCP         111- 1110 ---- ---- ---- cp:4 ---- ----
-  NOCP         111- 110- ---- ---- ---- cp:4 ---- ----
-  # TODO: From v8.1M onwards we will also want this range to NOCP
-  #NOCP_8_1     111- 1111 ---- ---- ---- ---- ---- ---- cp=10
+  NOCP         111- 1110 ---- ---- ---- cp:4 ---- ---- &nocp
+  NOCP         111- 110- ---- ---- ---- cp:4 ---- ---- &nocp
+  # From v8.1M onwards this range will also NOCP:
+  NOCP_8_1     111- 1111 ---- ---- ---- ---- ---- ---- &nocp cp=10
 }
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
index 28e0dba5f14..cc9ffb95887 100644
--- a/target/arm/translate-vfp.c.inc
+++ b/target/arm/translate-vfp.c.inc
@@ -3459,7 +3459,7 @@ static bool trans_VLLDM_VLSTM(DisasContext *s, arg_VLLDM_VLSTM *a)
     return true;
 }
 
-static bool trans_NOCP(DisasContext *s, arg_NOCP *a)
+static bool trans_NOCP(DisasContext *s, arg_nocp *a)
 {
     /*
      * Handle M-profile early check for disabled coprocessor:
@@ -3472,7 +3472,11 @@ static bool trans_NOCP(DisasContext *s, arg_NOCP *a)
     if (a->cp == 11) {
         a->cp = 10;
     }
-    /* TODO: in v8.1M cp 8, 9, 14, 15 also are governed by the cp10 enable */
+    if (arm_dc_feature(s, ARM_FEATURE_V8_1M) &&
+        (a->cp == 8 || a->cp == 9 || a->cp == 14 || a->cp == 15)) {
+        /* in v8.1M cp 8, 9, 14, 15 also are governed by the cp10 enable */
+        a->cp = 10;
+    }
 
     if (a->cp != 10) {
         gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
@@ -3489,6 +3493,15 @@ static bool trans_NOCP(DisasContext *s, arg_NOCP *a)
     return false;
 }
 
+static bool trans_NOCP_8_1(DisasContext *s, arg_nocp *a)
+{
+    /* This range needs a coprocessor check for v8.1M and later only */
+    if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
+        return false;
+    }
+    return trans_NOCP(s, a);
+}
+
 static bool trans_VINS(DisasContext *s, arg_VINS *a)
 {
     TCGv_i32 rd, rm;
-- 
2.20.1




* [PATCH 03/10] target/arm: Implement v8.1M conditional-select insns
  2020-10-12 15:37 [PATCH 00/10] target/arm: Various v8.1M minor features Peter Maydell
  2020-10-12 15:37 ` [PATCH 01/10] decodetree: Fix codegen for non-overlapping group inside overlapping group Peter Maydell
  2020-10-12 15:37 ` [PATCH 02/10] target/arm: Implement v8.1M NOCP handling Peter Maydell
@ 2020-10-12 15:37 ` Peter Maydell
  2020-10-13 16:37   ` Richard Henderson
  2020-10-12 15:37 ` [PATCH 04/10] target/arm: Make the t32 insn[25:23]=111 group non-overlapping Peter Maydell
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 27+ messages in thread
From: Peter Maydell @ 2020-10-12 15:37 UTC
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

v8.1M brings four new insns to M-profile:
 * CSEL  : Rd = cond ? Rn : Rm
 * CSINC : Rd = cond ? Rn : Rm+1
 * CSINV : Rd = cond ? Rn : ~Rm
 * CSNEG : Rd = cond ? Rn : -Rm

Implement these.
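As a reference model, the four operations reduce to a single select,
sketched in C below with 32-bit wraparound arithmetic. The function
name is hypothetical; op follows the op:2 field of the CSEL pattern
(0 CSEL, 1 CSINC, 2 CSINV, 3 CSNEG), and cond is the already-evaluated
condition. In these insns a source register field of 15 reads as zero
rather than the PC, which a caller of this model would have to apply
before passing rn and rm in.

```c
#include <stdbool.h>
#include <stdint.h>

/* op is 0..3, matching the switch in trans_CSEL. */
static uint32_t csel_family(int op, bool cond, uint32_t rn, uint32_t rm)
{
    uint32_t alt;

    switch (op) {
    case 0:  alt = rm;     break;   /* CSEL  */
    case 1:  alt = rm + 1; break;   /* CSINC */
    case 2:  alt = ~rm;    break;   /* CSINV */
    default: alt = -rm;    break;   /* CSNEG (op == 3) */
    }
    return cond ? rn : alt;
}
```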

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/t32.decode  |  3 +++
 target/arm/translate.c | 55 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 58 insertions(+)

diff --git a/target/arm/t32.decode b/target/arm/t32.decode
index 7069d821fde..d8454bd814e 100644
--- a/target/arm/t32.decode
+++ b/target/arm/t32.decode
@@ -90,6 +90,9 @@ SBC_rrri         1110101 1011 . .... 0 ... .... .... ....     @s_rrr_shi
 }
 RSB_rrri         1110101 1110 . .... 0 ... .... .... ....     @s_rrr_shi
 
+# v8.1M CSEL and friends
+CSEL             1110101 0010 1 rn:4 10 op:2 rd:4 fcond:4 rm:4
+
 # Data-processing (register-shifted register)
 
 MOV_rxrr         1111 1010 0 shty:2 s:1 rm:4 1111 rd:4 0000 rs:4 \
diff --git a/target/arm/translate.c b/target/arm/translate.c
index d34c1d351a6..a7923a31b56 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -8224,6 +8224,61 @@ static bool trans_IT(DisasContext *s, arg_IT *a)
     return true;
 }
 
+/* v8.1M CSEL/CSINC/CSNEG/CSINV */
+static bool trans_CSEL(DisasContext *s, arg_CSEL *a)
+{
+    TCGv_i32 rn, rm, zero;
+    DisasCompare c;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
+        return false;
+    }
+
+    if (a->rd == 13 || a->rd == 15 || a->rn == 13 || a->fcond >= 14) {
+        /* CONSTRAINED UNPREDICTABLE: we choose to UNDEF */
+        return false;
+    }
+
+    /* In this insn input reg fields of 0b1111 mean "zero", not "PC" */
+    if (a->rn == 15) {
+        rn = tcg_const_i32(0);
+    } else {
+        rn = load_reg(s, a->rn);
+    }
+    if (a->rm == 15) {
+        rm = tcg_const_i32(0);
+    } else {
+        rm = load_reg(s, a->rm);
+    }
+
+    switch (a->op) {
+    case 0: /* CSEL */
+        break;
+    case 1: /* CSINC */
+        tcg_gen_addi_i32(rm, rm, 1);
+        break;
+    case 2: /* CSINV */
+        tcg_gen_not_i32(rm, rm);
+        break;
+    case 3: /* CSNEG */
+        tcg_gen_neg_i32(rm, rm);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    arm_test_cc(&c, a->fcond);
+    zero = tcg_const_i32(0);
+    tcg_gen_movcond_i32(c.cond, rn, c.value, zero, rn, rm);
+    arm_free_cc(&c);
+    tcg_temp_free_i32(zero);
+
+    store_reg(s, a->rd, rn);
+    tcg_temp_free_i32(rm);
+
+    return true;
+}
+
 /*
  * Legacy decoder.
  */
-- 
2.20.1




* [PATCH 04/10] target/arm: Make the t32 insn[25:23]=111 group non-overlapping
  2020-10-12 15:37 [PATCH 00/10] target/arm: Various v8.1M minor features Peter Maydell
                   ` (2 preceding siblings ...)
  2020-10-12 15:37 ` [PATCH 03/10] target/arm: Implement v8.1M conditional-select insns Peter Maydell
@ 2020-10-12 15:37 ` Peter Maydell
  2020-10-13 16:40   ` Richard Henderson
  2020-10-12 15:37 ` [PATCH 05/10] target/arm: Don't allow BLX imm for M-profile Peter Maydell
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 27+ messages in thread
From: Peter Maydell @ 2020-10-12 15:37 UTC
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

The t32 decode has a group which represents a set of insns
which overlap with B_cond_thumb because they have [25:23]=111
(which is an invalid condition code field for the branch insn).
This group is currently defined using the {} overlap-OK syntax,
but almost all of its patterns are non-overlapping. Switch
it over to use a non-overlapping group.

For this to be valid syntactically, CPS must move into the same
overlapping-group as the hint insns (CPS vs hints was the
only actual use of the overlap facility for the group).

The non-overlapping subgroup for CLREX/DSB/DMB/ISB/SB is no longer
necessary and so we can remove it (promoting those insns to
be members of the parent group).
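To see why the inner [ ] subgroup is redundant, note that the
CLREX/DSB/DMB/ISB/SB patterns are pairwise disjoint on insn[7:4].
The hypothetical C classifier below (not decodetree output) makes
that explicit; insn is the 32-bit value with the first halfword in
bits [31:16], as in t32.decode.

```c
#include <stdint.h>

enum misc_ctl { MISC_NONE, MISC_CLREX, MISC_DSB, MISC_DMB, MISC_ISB, MISC_SB };

static enum misc_ctl misc_control_classify(uint32_t insn)
{
    /* Common prefix: 1111 0011 1011 1111 1000 1111 */
    if ((insn & 0xffffff00u) != 0xf3bf8f00u) {
        return MISC_NONE;
    }
    /* Each pattern fixes a different value of bits [7:4]. */
    switch ((insn >> 4) & 0xf) {
    case 0x2: return (insn & 0xf) == 0xf ? MISC_CLREX : MISC_NONE;
    case 0x4: return MISC_DSB;
    case 0x5: return MISC_DMB;
    case 0x6: return MISC_ISB;
    case 0x7: return (insn & 0xf) == 0x0 ? MISC_SB : MISC_NONE;
    default:  return MISC_NONE;
    }
}
```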

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
Just a minor bit of tidying that I did while I was trying to
work out whether the v8.1M loop/branch insns needed to go in
this group. (As it turns out, they don't.)
---
 target/arm/t32.decode | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/target/arm/t32.decode b/target/arm/t32.decode
index d8454bd814e..7d5e000e82c 100644
--- a/target/arm/t32.decode
+++ b/target/arm/t32.decode
@@ -296,8 +296,8 @@ CLZ              1111 1010 1011 ---- 1111 .... 1000 ....      @rdm
 {
   # Group insn[25:23] = 111, which is cond=111x for the branch below,
   # or unconditional, which would be illegal for the branch.
-  {
-    # Hints
+  [
+    # Hints, and CPS
     {
       YIELD      1111 0011 1010 1111 1000 0000 0000 0001
       WFE        1111 0011 1010 1111 1000 0000 0000 0010
@@ -310,20 +310,18 @@ CLZ              1111 1010 1011 ---- 1111 .... 1000 ....      @rdm
       # The canonical nop ends in 0000 0000, but the whole rest
       # of the space is "reserved hint, behaves as nop".
       NOP        1111 0011 1010 1111 1000 0000 ---- ----
+
+      # If imod == '00' && M == '0' then SEE "Hint instructions", above.
+      CPS        1111 0011 1010 1111 1000 0 imod:2 M:1 A:1 I:1 F:1 mode:5 \
+                 &cps
     }
 
-    # If imod == '00' && M == '0' then SEE "Hint instructions", above.
-    CPS          1111 0011 1010 1111 1000 0 imod:2 M:1 A:1 I:1 F:1 mode:5 \
-                 &cps
-
     # Miscellaneous control
-    [
-      CLREX      1111 0011 1011 1111 1000 1111 0010 1111
-      DSB        1111 0011 1011 1111 1000 1111 0100 ----
-      DMB        1111 0011 1011 1111 1000 1111 0101 ----
-      ISB        1111 0011 1011 1111 1000 1111 0110 ----
-      SB         1111 0011 1011 1111 1000 1111 0111 0000
-    ]
+    CLREX        1111 0011 1011 1111 1000 1111 0010 1111
+    DSB          1111 0011 1011 1111 1000 1111 0100 ----
+    DMB          1111 0011 1011 1111 1000 1111 0101 ----
+    ISB          1111 0011 1011 1111 1000 1111 0110 ----
+    SB           1111 0011 1011 1111 1000 1111 0111 0000
 
     # Note that the v7m insn overlaps both the normal and banked insn.
     {
@@ -351,7 +349,7 @@ CLZ              1111 1010 1011 ---- 1111 .... 1000 ....      @rdm
     HVC          1111 0111 1110 ....  1000 .... .... ....     \
                  &i imm=%imm16_16_0
     UDF          1111 0111 1111 ----  1010 ---- ---- ----
-  }
+  ]
   B_cond_thumb   1111 0. cond:4 ...... 10.0 ............      &ci imm=%imm21
 }
 
-- 
2.20.1




* [PATCH 05/10] target/arm: Don't allow BLX imm for M-profile
  2020-10-12 15:37 [PATCH 00/10] target/arm: Various v8.1M minor features Peter Maydell
                   ` (3 preceding siblings ...)
  2020-10-12 15:37 ` [PATCH 04/10] target/arm: Make the t32 insn[25:23]=111 group non-overlapping Peter Maydell
@ 2020-10-12 15:37 ` Peter Maydell
  2020-10-13 16:41   ` Richard Henderson
  2020-10-12 15:37 ` [PATCH 06/10] target/arm: Implement v8.1M branch-future insns (as NOPs) Peter Maydell
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 27+ messages in thread
From: Peter Maydell @ 2020-10-12 15:37 UTC
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

The BLX immediate insn in the Thumb encoding always performs
a switch from Thumb to Arm state. This would be useless on
M-profile, which has no Arm decoder, so the instruction does
not exist there at all. Make the encoding UNDEF for M-profile.

(This part of the encoding space is used for the branch-future
and low-overhead-loop insns in v8.1M.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index a7923a31b56..0c35efb1014 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -7880,6 +7880,14 @@ static bool trans_BLX_i(DisasContext *s, arg_BLX_i *a)
 {
     TCGv_i32 tmp;
 
+    /*
+     * BLX <imm> would be useless on M-profile; the encoding space
+     * is used for other insns from v8.1M onward, and UNDEFs before that.
+     */
+    if (arm_dc_feature(s, ARM_FEATURE_M)) {
+        return false;
+    }
+
     /* For A32, ARM_FEATURE_V5 is checked near the start of the uncond block. */
     if (s->thumb && (a->imm & 2)) {
         return false;
-- 
2.20.1




* [PATCH 06/10] target/arm: Implement v8.1M branch-future insns (as NOPs)
  2020-10-12 15:37 [PATCH 00/10] target/arm: Various v8.1M minor features Peter Maydell
                   ` (4 preceding siblings ...)
  2020-10-12 15:37 ` [PATCH 05/10] target/arm: Don't allow BLX imm for M-profile Peter Maydell
@ 2020-10-12 15:37 ` Peter Maydell
  2020-10-13 16:58   ` Richard Henderson
  2020-10-12 15:37 ` [PATCH 07/10] target/arm: Implement v8.1M low-overhead-loop instructions Peter Maydell
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 27+ messages in thread
From: Peter Maydell @ 2020-10-12 15:37 UTC
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

v8.1M implements a new 'branch future' feature, which is a
set of instructions that request the CPU to perform a branch
"in the future", when it reaches a particular execution address.
In hardware, the expected implementation is that the information
about the branch location and destination is cached and then
acted upon when execution reaches the specified address.
However the architecture permits an implementation to discard
this cached information at any point, and so guest code must
always include a normal branch insn at the branch point as
a fallback. In particular, an implementation is permitted to
treat all BF insns as NOPs (which is equivalent to discarding
the cached information immediately).

For QEMU, implementing this caching of branch information
would be complicated and would not improve the speed of
execution at all, so we make the IMPDEF choice to implement
all BF insns as NOPs.
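The equivalence argument can be sketched with a hypothetical C model
of the branch-future cache. Nothing here is QEMU code: struct
bf_cache and the helpers are invented for illustration, and they
assume the guest has placed an ordinary branch to the same target at
the branch point, as the architecture requires.

```c
#include <stdbool.h>
#include <stdint.h>

struct bf_cache {
    bool valid;
    uint32_t branch_point;  /* address at which to branch */
    uint32_t target;        /* where to branch to */
};

/* A BF insn: either record the request, or behave as a NOP. */
static void exec_bf(struct bf_cache *c, bool as_nop,
                    uint32_t branch_point, uint32_t target)
{
    if (as_nop) {
        c->valid = false;   /* same as discarding the info at once */
        return;
    }
    c->valid = true;
    c->branch_point = branch_point;
    c->target = target;
}

/* PC after reaching pc, where a normal branch to fallback is placed. */
static uint32_t next_pc(struct bf_cache *c, uint32_t pc, uint32_t fallback)
{
    if (c->valid && c->branch_point == pc) {
        c->valid = false;
        return c->target;   /* branch taken from the cached info */
    }
    return fallback;        /* the normal branch insn executes */
}

/* BF requests a branch at 0x1000 to 0x2000; where do we end up? */
static uint32_t bf_scenario(bool as_nop)
{
    struct bf_cache c = { false, 0, 0 };

    exec_bf(&c, as_nop, 0x1000, 0x2000);
    return next_pc(&c, 0x1000, 0x2000);
}
```

Both choices reach 0x2000; the NOP version simply gets there via the
fallback branch.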

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h       |  6 ++++++
 target/arm/t32.decode  | 13 ++++++++++++-
 target/arm/translate.c | 20 ++++++++++++++++++++
 3 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 74392fa0295..a432f301f11 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -3473,6 +3473,12 @@ static inline bool isar_feature_aa32_arm_div(const ARMISARegisters *id)
     return FIELD_EX32(id->id_isar0, ID_ISAR0, DIVIDE) > 1;
 }
 
+static inline bool isar_feature_aa32_lob(const ARMISARegisters *id)
+{
+    /* (M-profile) low-overhead loops and branch future */
+    return FIELD_EX32(id->id_isar0, ID_ISAR0, CMPBRANCH) >= 3;
+}
+
 static inline bool isar_feature_aa32_jazelle(const ARMISARegisters *id)
 {
     return FIELD_EX32(id->id_isar1, ID_ISAR1, JAZELLE) != 0;
diff --git a/target/arm/t32.decode b/target/arm/t32.decode
index 7d5e000e82c..3015731a8d0 100644
--- a/target/arm/t32.decode
+++ b/target/arm/t32.decode
@@ -648,4 +648,15 @@ MRC              1110 1110 ... 1 .... .... .... ... 1 .... @mcr
 
 B                1111 0. .......... 10.1 ............         @branch24
 BL               1111 0. .......... 11.1 ............         @branch24
-BLX_i            1111 0. .......... 11.0 ............         @branch24
+{
+  # BLX_i is non-M-profile only
+  BLX_i          1111 0. .......... 11.0 ............         @branch24
+  # M-profile only: loop and branch insns
+  [
+    # All these BF insns have boff != 0b0000; we NOP them all
+    BF           1111 0 boff:4  ------- 1100 - ---------- 1    # BFL
+    BF           1111 0 boff:4 0 ------ 1110 - ---------- 1    # BFCSEL
+    BF           1111 0 boff:4 10 ----- 1110 - ---------- 1    # BF
+    BF           1111 0 boff:4 11 ----- 1110 0 0000000000 1    # BFX, BFLX
+  ]
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 0c35efb1014..9e72d719c6f 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -7933,6 +7933,26 @@ static bool trans_BLX_suffix(DisasContext *s, arg_BLX_suffix *a)
     return true;
 }
 
+static bool trans_BF(DisasContext *s, arg_BF *a)
+{
+    /*
+     * M-profile branch future insns. The architecture permits an
+     * implementation to implement these as NOPs (equivalent to
+     * discarding the LO_BRANCH_INFO cache immediately), and we
+     * take that IMPDEF option because for QEMU a "real" implementation
+     * would be complicated and wouldn't execute any faster.
+     */
+    if (!dc_isar_feature(aa32_lob, s)) {
+        return false;
+    }
+    if (a->boff == 0) {
+        /* SEE "Related encodings" (loop insns) */
+        return false;
+    }
+    /* Handle as NOP */
+    return true;
+}
+
 static bool op_tbranch(DisasContext *s, arg_tbranch *a, bool half)
 {
     TCGv_i32 addr, tmp;
-- 
2.20.1




* [PATCH 07/10] target/arm: Implement v8.1M low-overhead-loop instructions
  2020-10-12 15:37 [PATCH 00/10] target/arm: Various v8.1M minor features Peter Maydell
                   ` (5 preceding siblings ...)
  2020-10-12 15:37 ` [PATCH 06/10] target/arm: Implement v8.1M branch-future insns (as NOPs) Peter Maydell
@ 2020-10-12 15:37 ` Peter Maydell
  2020-10-12 19:56   ` Peter Maydell
  2020-10-13 22:31   ` Richard Henderson
  2020-10-12 15:37 ` [PATCH 08/10] target/arm: Fix has_vfp/has_neon ID reg squashing for M-profile Peter Maydell
                   ` (2 subsequent siblings)
  9 siblings, 2 replies; 27+ messages in thread
From: Peter Maydell @ 2020-10-12 15:37 UTC
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

v8.1M's "low-overhead-loop" extension has three instructions
for looping:
 * DLS (start of a do-loop)
 * WLS (start of a while-loop)
 * LE (end of a loop)

The loop-start instructions are both simple operations: they start a
loop by copying the iteration count from Rn into LR (WLS first
branches past the loop if the count is zero).  The loop-end
instruction handles "decrement the iteration count and jump back to
the loop start"; it also caches information about the branch back to
the start of the loop, to speed up that branch on subsequent
iterations.
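A small C model of the loop behaviour implemented by trans_DLS,
trans_WLS and trans_LE in this patch, for the non-loop-forever form
of LE. The function names are hypothetical. It also shows how the
%lob_imm field in the t32.decode hunk assembles the branch offset
(insn[10:1] as the high part, insn[11] as the low bit, then doubled);
WLS branches to read_pc(s) + offset and LE to read_pc(s) - offset.

```c
#include <stdbool.h>
#include <stdint.h>

/* How many times does the loop body run for initial count n? */
static unsigned run_loop(uint32_t n, bool use_wls)
{
    uint32_t lr;
    unsigned iterations = 0;

    if (use_wls && n == 0) {
        return 0;               /* WLS: branch past the loop */
    }
    lr = n;                     /* DLS/WLS: LR = Rn */
    for (;;) {
        iterations++;           /* the loop body */
        if (lr <= 1) {          /* LE: last iteration, fall through */
            break;
        }
        lr--;                   /* LE: decrement LR, branch back */
    }
    return iterations;
}

/* Offset per "%lob_imm 1:10 11:1 !function=times_2". */
static uint32_t lob_offset(uint32_t insn)
{
    uint32_t imm = (((insn >> 1) & 0x3ffu) << 1) | ((insn >> 11) & 1u);
    return imm * 2;
}
```

In this model a DLS loop always runs its body at least once, whereas
WLS skips the body when the count is zero.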

As with the branch-future instructions, the architecture permits an
implementation to discard the LO_BRANCH_INFO cache at any time, and
QEMU takes the IMPDEF option to never set it in the first place
(equivalent to discarding it immediately), because for us a "real"
implementation would be unnecessary complexity.

(This implementation only provides the simple looping constructs; the
vector extension MVE (Helium) adds some extra variants to handle
looping across vectors.  We'll add those later when we implement
MVE.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/t32.decode  |  8 +++++
 target/arm/translate.c | 74 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 82 insertions(+)

diff --git a/target/arm/t32.decode b/target/arm/t32.decode
index 3015731a8d0..8152739b52b 100644
--- a/target/arm/t32.decode
+++ b/target/arm/t32.decode
@@ -659,4 +659,12 @@ BL               1111 0. .......... 11.1 ............         @branch24
     BF           1111 0 boff:4 10 ----- 1110 - ---------- 1    # BF
     BF           1111 0 boff:4 11 ----- 1110 0 0000000000 1    # BFX, BFLX
   ]
+  [
+    # LE and WLS immediate
+    %lob_imm 1:10 11:1 !function=times_2
+
+    DLS          1111 0 0000 100     rn:4 1110 0000 0000 0001
+    WLS          1111 0 0000 100     rn:4 1100 . .......... 1 imm=%lob_imm
+    LE           1111 0 0000 0 f:1 0 1111 1100 . .......... 1 imm=%lob_imm
+  ]
 }
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 9e72d719c6f..742c219c071 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -7953,6 +7953,80 @@ static bool trans_BF(DisasContext *s, arg_BF *a)
     return true;
 }
 
+static bool trans_DLS(DisasContext *s, arg_DLS *a)
+{
+    /* M-profile low-overhead loop start */
+    TCGv_i32 tmp;
+
+    if (!dc_isar_feature(aa32_lob, s)) {
+        return false;
+    }
+    if (a->rn == 13 || a->rn == 15) {
+        /* CONSTRAINED UNPREDICTABLE: we choose to UNDEF */
+        return false;
+    }
+
+    /* Not a while loop, no tail predication: just set LR to the count */
+    tmp = load_reg(s, a->rn);
+    store_reg(s, 14, tmp);
+    return true;
+}
+
+static bool trans_WLS(DisasContext *s, arg_WLS *a)
+{
+    /* M-profile low-overhead while-loop start */
+    TCGv_i32 tmp;
+    TCGLabel *nextlabel;
+
+    if (!dc_isar_feature(aa32_lob, s)) {
+        return false;
+    }
+    if (a->rn == 13 || a->rn == 15) {
+        /* CONSTRAINED UNPREDICTABLE: we choose to UNDEF */
+        return false;
+    }
+
+    nextlabel = gen_new_label();
+    tcg_gen_brcondi_i32(TCG_COND_NE, cpu_R[a->rn], 0, nextlabel);
+    gen_jmp(s, read_pc(s) + a->imm);
+
+    gen_set_label(nextlabel);
+    tmp = load_reg(s, a->rn);
+    store_reg(s, 14, tmp);
+    gen_jmp(s, s->base.pc_next);
+    return true;
+}
+
+static bool trans_LE(DisasContext *s, arg_LE *a)
+{
+    /*
+     * M-profile low-overhead loop end. The architecture permits an
+     * implementation to discard the LO_BRANCH_INFO cache at any time,
+     * and we take the IMPDEF option to never set it in the first place
+     * (equivalent to always discarding it immediately), because for QEMU
+     * a "real" implementation would be complicated and wouldn't execute
+     * any faster.
+     */
+    TCGv_i32 tmp;
+
+    if (!dc_isar_feature(aa32_lob, s)) {
+        return false;
+    }
+
+    if (!a->f) {
+        /* Not loop-forever. If LR <= 1 this is the last loop: do nothing. */
+        arm_gen_condlabel(s);
+        tcg_gen_brcondi_i32(TCG_COND_LEU, cpu_R[14], 1, s->condlabel);
+        /* Decrement LR */
+        tmp = load_reg(s, 14);
+        tcg_gen_addi_i32(tmp, tmp, -1);
+        store_reg(s, 14, tmp);
+    }
+    /* Jump back to the loop start */
+    gen_jmp(s, read_pc(s) - a->imm);
+    return true;
+}
+
 static bool op_tbranch(DisasContext *s, arg_tbranch *a, bool half)
 {
     TCGv_i32 addr, tmp;
-- 
2.20.1




* [PATCH 08/10] target/arm: Fix has_vfp/has_neon ID reg squashing for M-profile
  2020-10-12 15:37 [PATCH 00/10] target/arm: Various v8.1M minor features Peter Maydell
                   ` (6 preceding siblings ...)
  2020-10-12 15:37 ` [PATCH 07/10] target/arm: Implement v8.1M low-overhead-loop instructions Peter Maydell
@ 2020-10-12 15:37 ` Peter Maydell
  2020-10-13 19:07   ` Richard Henderson
  2020-10-12 15:37 ` [PATCH 09/10] target/arm: Implement FPSCR.LTPSIZE for M-profile LOB extension Peter Maydell
  2020-10-12 15:37 ` [PATCH 10/10] target/arm: Fix writing to FPSCR.FZ16 on M-profile Peter Maydell
  9 siblings, 1 reply; 27+ messages in thread
From: Peter Maydell @ 2020-10-12 15:37 UTC
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

In arm_cpu_realizefn(), if the CPU has VFP or Neon disabled then we
squash the ID register fields so that we don't advertise it to the
guest.  This code was written for A-profile and needs some tweaks to
work correctly on M-profile:

 * A-profile only fields should not be zeroed on M-profile:
   - MVFR0.FPSHVEC,FPTRAP
   - MVFR1.SIMDLS,SIMDINT,SIMDSP,SIMDHP
   - MVFR2.SIMDMISC
 * M-profile only fields should be zeroed on M-profile:
   - MVFR1.FP16

In particular, because MVFR1.SIMDHP on A-profile is the same field as
MVFR1.FP16 on M-profile this code was incorrectly disabling FP16
support on an M-profile CPU (where has_neon is always false).  This
isn't a visible bug yet because we don't have any M-profile CPUs with
FP16 support, but the change is necessary before we introduce any.
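The aliasing can be seen in a small C sketch. The shifts follow the
MVFR1 layout in the Arm ARM (SIMDLS [11:8], SIMDINT [15:12], SIMDSP
[19:16], SIMDHP [23:20], the last being FP16 on M-profile); treat
them as an assumption. dp_field4() is a hypothetical stand-in for
QEMU's FIELD_DP32, and the register value used below is made up.

```c
#include <stdbool.h>
#include <stdint.h>

#define MVFR1_SIMDLS_SHIFT   8
#define MVFR1_SIMDINT_SHIFT 12
#define MVFR1_SIMDSP_SHIFT  16
#define MVFR1_SIMDHP_SHIFT  20  /* same bits as FP16 on M-profile */

/* Deposit a 4-bit value into reg at the given shift. */
static uint32_t dp_field4(uint32_t reg, unsigned shift, uint32_t val)
{
    return (reg & ~(0xfu << shift)) | ((val & 0xfu) << shift);
}

/* Neon-disabled squashing of MVFR1, without or with the fix. */
static uint32_t squash_mvfr1_no_neon(uint32_t mvfr1, bool is_m_profile,
                                     bool with_fix)
{
    if (with_fix && is_m_profile) {
        return mvfr1;   /* A-profile-only fields: leave M-profile alone */
    }
    mvfr1 = dp_field4(mvfr1, MVFR1_SIMDLS_SHIFT, 0);
    mvfr1 = dp_field4(mvfr1, MVFR1_SIMDINT_SHIFT, 0);
    mvfr1 = dp_field4(mvfr1, MVFR1_SIMDSP_SHIFT, 0);
    mvfr1 = dp_field4(mvfr1, MVFR1_SIMDHP_SHIFT, 0);  /* kills FP16 on M */
    return mvfr1;
}
```

For an M-profile value with only FP16 set (0x00100000), the unfixed
path clears it to zero while the fixed path leaves it intact.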

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.c | 29 ++++++++++++++++++-----------
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 056319859fb..186ee621a65 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -1429,17 +1429,22 @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
         u = cpu->isar.mvfr0;
         u = FIELD_DP32(u, MVFR0, FPSP, 0);
         u = FIELD_DP32(u, MVFR0, FPDP, 0);
-        u = FIELD_DP32(u, MVFR0, FPTRAP, 0);
         u = FIELD_DP32(u, MVFR0, FPDIVIDE, 0);
         u = FIELD_DP32(u, MVFR0, FPSQRT, 0);
-        u = FIELD_DP32(u, MVFR0, FPSHVEC, 0);
         u = FIELD_DP32(u, MVFR0, FPROUND, 0);
+        if (!arm_feature(env, ARM_FEATURE_M)) {
+            u = FIELD_DP32(u, MVFR0, FPTRAP, 0);
+            u = FIELD_DP32(u, MVFR0, FPSHVEC, 0);
+        }
         cpu->isar.mvfr0 = u;
 
         u = cpu->isar.mvfr1;
         u = FIELD_DP32(u, MVFR1, FPFTZ, 0);
         u = FIELD_DP32(u, MVFR1, FPDNAN, 0);
         u = FIELD_DP32(u, MVFR1, FPHP, 0);
+        if (arm_feature(env, ARM_FEATURE_M)) {
+            u = FIELD_DP32(u, MVFR1, FP16, 0);
+        }
         cpu->isar.mvfr1 = u;
 
         u = cpu->isar.mvfr2;
@@ -1475,16 +1480,18 @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
         u = FIELD_DP32(u, ID_ISAR6, FHM, 0);
         cpu->isar.id_isar6 = u;
 
-        u = cpu->isar.mvfr1;
-        u = FIELD_DP32(u, MVFR1, SIMDLS, 0);
-        u = FIELD_DP32(u, MVFR1, SIMDINT, 0);
-        u = FIELD_DP32(u, MVFR1, SIMDSP, 0);
-        u = FIELD_DP32(u, MVFR1, SIMDHP, 0);
-        cpu->isar.mvfr1 = u;
+        if (!arm_feature(env, ARM_FEATURE_M)) {
+            u = cpu->isar.mvfr1;
+            u = FIELD_DP32(u, MVFR1, SIMDLS, 0);
+            u = FIELD_DP32(u, MVFR1, SIMDINT, 0);
+            u = FIELD_DP32(u, MVFR1, SIMDSP, 0);
+            u = FIELD_DP32(u, MVFR1, SIMDHP, 0);
+            cpu->isar.mvfr1 = u;
 
-        u = cpu->isar.mvfr2;
-        u = FIELD_DP32(u, MVFR2, SIMDMISC, 0);
-        cpu->isar.mvfr2 = u;
+            u = cpu->isar.mvfr2;
+            u = FIELD_DP32(u, MVFR2, SIMDMISC, 0);
+            cpu->isar.mvfr2 = u;
+        }
     }
 
     if (!cpu->has_neon && !cpu->has_vfp) {
-- 
2.20.1




* [PATCH 09/10] target/arm: Implement FPSCR.LTPSIZE for M-profile LOB extension
  2020-10-12 15:37 [PATCH 00/10] target/arm: Various v8.1M minor features Peter Maydell
                   ` (7 preceding siblings ...)
  2020-10-12 15:37 ` [PATCH 08/10] target/arm: Fix has_vfp/has_neon ID reg squashing for M-profile Peter Maydell
@ 2020-10-12 15:37 ` Peter Maydell
  2020-10-13 20:06   ` Richard Henderson
  2020-10-12 15:37 ` [PATCH 10/10] target/arm: Fix writing to FPSCR.FZ16 on M-profile Peter Maydell
  9 siblings, 1 reply; 27+ messages in thread
From: Peter Maydell @ 2020-10-12 15:37 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

If the M-profile low-overhead-branch extension is implemented, FPSCR
bits [18:16] are a new field LTPSIZE.  If MVE is not implemented
(currently always true for us) then this field always reads as 4 and
ignores writes.

These bits used to be the vector-length field for the old
short-vector extension, so we need to take care that they
are not misinterpreted as setting vec_len.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.c        |  5 +++++
 target/arm/vfp_helper.c | 25 +++++++++++++++++++++----
 2 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 186ee621a65..baae826f94f 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -255,6 +255,11 @@ static void arm_cpu_reset(DeviceState *dev)
         uint8_t *rom;
         uint32_t vecbase;
 
+        if (cpu_isar_feature(aa32_lob, cpu)) {
+            /* LTPSIZE is constant 4 if MVE not implemented */
+            env->vfp.xregs[ARM_VFP_FPSCR] |= 4 << 16;
+        }
+
         if (arm_feature(env, ARM_FEATURE_M_SECURITY)) {
             env->v7m.secure = true;
         } else {
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index 5666393ef79..350150adbf1 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -189,8 +189,10 @@ uint32_t vfp_get_fpscr(CPUARMState *env)
 
 void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
 {
+    ARMCPU *cpu = env_archcpu(env);
+
     /* When ARMv8.2-FP16 is not supported, FZ16 is RES0.  */
-    if (!cpu_isar_feature(any_fp16, env_archcpu(env))) {
+    if (!cpu_isar_feature(any_fp16, cpu)) {
         val &= ~FPCR_FZ16;
     }
 
@@ -198,8 +200,14 @@ void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
         /*
          * M profile FPSCR is RES0 for the QC, STRIDE, FZ16, LEN bits
          * and also for the trapped-exception-handling bits IxE.
+         * From v8.1M with the low-overhead-loop extension bits
+         * [18:16] are used for LTPSIZE and (since we don't implement
+         * MVE) always read as 4 and ignore writes.
          */
         val &= 0xf7c0009f;
+        if (cpu_isar_feature(aa32_lob, cpu)) {
+            val |= 4 << 16;
+        }
     }
 
     vfp_set_fpscr_to_host(env, val);
@@ -212,9 +220,18 @@ void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
      * (which are stored in fp_status), and the other RES0 bits
      * in between, then we clear all of the low 16 bits.
      */
-    env->vfp.xregs[ARM_VFP_FPSCR] = val & 0xf7c80000;
-    env->vfp.vec_len = (val >> 16) & 7;
-    env->vfp.vec_stride = (val >> 20) & 3;
+    if (cpu_isar_feature(aa32_lob, cpu)) {
+        /*
+         * M-profile low-overhead-loop extension: [18:16] are LTPSIZE
+         * and we keep them in vfp.xregs[].
+         */
+        env->vfp.xregs[ARM_VFP_FPSCR] = val & 0xf7cf0000;
+    } else {
+        /* Those bits might be the old-style short vector length/stride */
+        env->vfp.xregs[ARM_VFP_FPSCR] = val & 0xf7c80000;
+        env->vfp.vec_len = (val >> 16) & 7;
+        env->vfp.vec_stride = (val >> 20) & 3;
+    }
 
     /*
      * The bit we set within fpscr_q is arbitrary; the register as a
-- 
2.20.1
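
Illustrative sketch (editor's addition, not part of the patch): a standalone C model of the FPSCR write path as this patch leaves it. The names `fpscr_model` and `model_set_fpscr()` are hypothetical, not QEMU's. The masks are copied from the diff above. The host float-status update and the low 16 bits are deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the state touched by vfp_set_fpscr(); not QEMU code. */
struct fpscr_model {
    uint32_t fpscr;     /* stands in for env->vfp.xregs[ARM_VFP_FPSCR] */
    int vec_len;        /* old short-vector LEN field, bits [18:16] */
    int vec_stride;     /* old short-vector STRIDE field, bits [21:20] */
};

static void model_set_fpscr(struct fpscr_model *m, uint32_t val,
                            bool m_profile, bool has_lob)
{
    if (m_profile) {
        /* RES0 for QC, STRIDE, FZ16, LEN and IxE (mask as of this patch) */
        val &= 0xf7c0009fu;
        if (has_lob) {
            val |= 4u << 16;    /* LTPSIZE reads as 4 when MVE is absent */
        }
    }
    if (has_lob) {
        /* [18:16] are LTPSIZE and stay in the stored register value */
        m->fpscr = val & 0xf7cf0000u;
    } else {
        /* [18:16] and [21:20] may be the old short-vector LEN/STRIDE */
        m->fpscr = val & 0xf7c80000u;
        m->vec_len = (val >> 16) & 7;
        m->vec_stride = (val >> 20) & 3;
    }
}
```

With LOB present, a guest write of 0 still leaves bits [18:16] reading as 4, and vec_len is never updated from those bits.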




* [PATCH 10/10] target/arm: Fix writing to FPSCR.FZ16 on M-profile
  2020-10-12 15:37 [PATCH 00/10] target/arm: Various v8.1M minor features Peter Maydell
                   ` (8 preceding siblings ...)
  2020-10-12 15:37 ` [PATCH 09/10] target/arm: Implement FPSCR.LTPSIZE for M-profile LOB extension Peter Maydell
@ 2020-10-12 15:37 ` Peter Maydell
  9 siblings, 0 replies; 27+ messages in thread
From: Peter Maydell @ 2020-10-12 15:37 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

The M-profile specific part of the sanitizing of the value to
be written to the FPSCR used a mask which always zeroed bit 19,
which is FZ16. This is incorrect when the CPU supports 16-bit
floating point arithmetic, because the bit should be writeable.

Code earlier in the function already handles making this bit be RES0
if the CPU doesn't implement the FP16 feature, so we can simply stop
masking it out for M-profile.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/vfp_helper.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index 350150adbf1..4b0bb2bacfb 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -198,13 +198,14 @@ void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
 
     if (arm_feature(env, ARM_FEATURE_M)) {
         /*
-         * M profile FPSCR is RES0 for the QC, STRIDE, FZ16, LEN bits
+         * M profile FPSCR is RES0 for the QC, STRIDE, LEN bits
          * and also for the trapped-exception-handling bits IxE.
          * From v8.1M with the low-overhead-loop extension bits
          * [18:16] are used for LTPSIZE and (since we don't implement
          * MVE) always read as 4 and ignore writes.
+         * FZ16 has already been handled as RES0 above if needed.
          */
-        val &= 0xf7c0009f;
+        val &= 0xf7c8009f;
         if (cpu_isar_feature(aa32_lob, cpu)) {
             val |= 4 << 16;
         }
-- 
2.20.1
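
Illustrative sketch (editor's addition): the effect of changing the M-profile mask from 0xf7c0009f to 0xf7c8009f is that bit 19 (FZ16) survives unless the earlier "no FP16" check has already cleared it. `model_m_sanitise()` and `MODEL_FPCR_FZ16` are hypothetical names for this example.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_FPCR_FZ16 (1u << 19)   /* FZ16 is FPSCR bit 19 */

/* Hypothetical helper: FZ16 handling plus the M-profile mask after this patch. */
static uint32_t model_m_sanitise(uint32_t val, bool has_fp16)
{
    if (!has_fp16) {
        val &= ~MODEL_FPCR_FZ16;     /* FZ16 is RES0 without FP16 */
    }
    /* 0xf7c8009f keeps bit 19; the old 0xf7c0009f always cleared it */
    return val & 0xf7c8009fu;
}
```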




* Re: [PATCH 07/10] target/arm: Implement v8.1M low-overhead-loop instructions
  2020-10-12 15:37 ` [PATCH 07/10] target/arm: Implement v8.1M low-overhead-loop instructions Peter Maydell
@ 2020-10-12 19:56   ` Peter Maydell
  2020-10-13 17:10     ` Richard Henderson
  2020-10-13 22:31   ` Richard Henderson
  1 sibling, 1 reply; 27+ messages in thread
From: Peter Maydell @ 2020-10-12 19:56 UTC (permalink / raw)
  To: qemu-arm, QEMU Developers; +Cc: Richard Henderson

On Mon, 12 Oct 2020 at 16:37, Peter Maydell <peter.maydell@linaro.org> wrote:
>
> v8.1M's "low-overhead-loop" extension has three instructions
> for looping:
>  * DLS (start of a do-loop)
>  * WLS (start of a while-loop)
>  * LE (end of a loop)
>
> +static bool trans_WLS(DisasContext *s, arg_WLS *a)
> +{
> +    /* M-profile low-overhead while-loop start */
> +    TCGv_i32 tmp;
> +    TCGLabel *nextlabel;
> +
> +    if (!dc_isar_feature(aa32_lob, s)) {
> +        return false;
> +    }
> +    if (a->rn == 13 || a->rn == 15) {
> +        /* CONSTRAINED UNPREDICTABLE: we choose to UNDEF */
> +        return false;
> +    }
> +
> +    nextlabel = gen_new_label();
> +    tcg_gen_brcondi_i32(TCG_COND_NE, cpu_R[a->rn], 0, nextlabel);
> +    gen_jmp(s, read_pc(s) + a->imm);
> +
> +    gen_set_label(nextlabel);
> +    tmp = load_reg(s, a->rn);
> +    store_reg(s, 14, tmp);
> +    gen_jmp(s, s->base.pc_next);
> +    return true;
> +}

This turns out not to work, because gen_jmp() always generates
a goto-tb for tb exit 0, and we hit the assert() that exit 0
was not used twice. Here's a fixup to fold into this patch:

--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -2490,17 +2490,23 @@ static void gen_goto_tb(DisasContext *s, int n, target_ulong dest)
     s->base.is_jmp = DISAS_NORETURN;
 }

-static inline void gen_jmp (DisasContext *s, uint32_t dest)
+/* Jump, specifying which TB number to use if we gen_goto_tb() */
+static inline void gen_jmp_tb(DisasContext *s, uint32_t dest, int tbno)
 {
     if (unlikely(is_singlestepping(s))) {
         /* An indirect jump so that we still trigger the debug exception.  */
         gen_set_pc_im(s, dest);
         s->base.is_jmp = DISAS_JUMP;
     } else {
-        gen_goto_tb(s, 0, dest);
+        gen_goto_tb(s, tbno, dest);
     }
 }

+static inline void gen_jmp(DisasContext *s, uint32_t dest)
+{
+    gen_jmp_tb(s, dest, 0);
+}
+
 static inline void gen_mulxy(TCGv_i32 t0, TCGv_i32 t1, int x, int y)
 {
     if (x)
@@ -8023,7 +8029,16 @@ static bool trans_WLS(DisasContext *s, arg_WLS *a)
         /* CONSTRAINED UNPREDICTABLE: we choose to UNDEF */
         return false;
     }
-
+    if (s->condexec_mask) {
+        /*
+         * WLS in an IT block is CONSTRAINED UNPREDICTABLE;
+         * we choose to UNDEF, because otherwise our use of
+         * gen_goto_tb(1) would clash with the use of TB exit 1
+         * in the dc->condjmp condition-failed codepath in
+         * arm_tr_tb_stop() and we'd get an assertion.
+         */
+        return false;
+    }
     nextlabel = gen_new_label();
     tcg_gen_brcondi_i32(TCG_COND_NE, cpu_R[a->rn], 0, nextlabel);
     gen_jmp(s, read_pc(s) + a->imm);
@@ -8031,7 +8046,7 @@ static bool trans_WLS(DisasContext *s, arg_WLS *a)
     gen_set_label(nextlabel);
     tmp = load_reg(s, a->rn);
     store_reg(s, 14, tmp);
-    gen_jmp(s, s->base.pc_next);
+    gen_jmp_tb(s, s->base.pc_next, 1);
     return true;
 }

thanks
-- PMM
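
Illustrative sketch (editor's addition): an architectural-level C model of what trans_WLS() generates. It makes visible why the insn needs two distinct TB exits: one for the "Rn == 0, skip the loop" branch and one for the fall-through into the loop body. `wls_state` and `model_wls()` are hypothetical names. The PC value read by the insn is passed in rather than computed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical architectural-level model of WLS; not QEMU code. */
struct wls_state {
    uint32_t r[16];     /* r[14] is LR */
    uint32_t next_pc;   /* address execution continues at */
};

/*
 * Returns false where trans_WLS() (with the fixup) would UNDEF.
 * read_pc is the PC value as read by the insn; insn_end is the
 * address of the following instruction (s->base.pc_next).
 */
static bool model_wls(struct wls_state *st, int rn, uint32_t imm,
                      uint32_t read_pc, uint32_t insn_end, bool in_it_block)
{
    if (rn == 13 || rn == 15 || in_it_block) {
        return false;   /* CONSTRAINED UNPREDICTABLE: we choose UNDEF */
    }
    if (st->r[rn] == 0) {
        st->next_pc = read_pc + imm;   /* zero iterations: branch past loop */
    } else {
        st->r[14] = st->r[rn];         /* LR := loop count */
        st->next_pc = insn_end;        /* fall through into the loop body */
    }
    return true;
}
```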



* Re: [PATCH 01/10] decodetree: Fix codegen for non-overlapping group inside overlapping group
  2020-10-12 15:37 ` [PATCH 01/10] decodetree: Fix codegen for non-overlapping group inside overlapping group Peter Maydell
@ 2020-10-13 16:02   ` Richard Henderson
  0 siblings, 0 replies; 27+ messages in thread
From: Richard Henderson @ 2020-10-13 16:02 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 10/12/20 8:37 AM, Peter Maydell wrote:
> Generate a "break" instead, so that decode flow behaves
> as required for this nested group case.
> 
> Suggested-by: Richard Henderson <richard.henderson@linaro.org>
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  scripts/decodetree.py | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 03/10] target/arm: Implement v8.1M conditional-select insns
  2020-10-12 15:37 ` [PATCH 03/10] target/arm: Implement v8.1M conditional-select insns Peter Maydell
@ 2020-10-13 16:37   ` Richard Henderson
  0 siblings, 0 replies; 27+ messages in thread
From: Richard Henderson @ 2020-10-13 16:37 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 10/12/20 8:37 AM, Peter Maydell wrote:
> v8.1M brings four new insns to M-profile:
>  * CSEL  : Rd = cond ? Rn : Rm
>  * CSINC : Rd = cond ? Rn : Rm+1
>  * CSINV : Rd = cond ? Rn : ~Rm
>  * CSNEG : Rd = cond ? Rn : -Rm
> 
> Implement these.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/t32.decode  |  3 +++
>  target/arm/translate.c | 55 ++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 58 insertions(+)
> 
> diff --git a/target/arm/t32.decode b/target/arm/t32.decode
> index 7069d821fde..d8454bd814e 100644
> --- a/target/arm/t32.decode
> +++ b/target/arm/t32.decode
> @@ -90,6 +90,9 @@ SBC_rrri         1110101 1011 . .... 0 ... .... .... ....     @s_rrr_shi
>  }
>  RSB_rrri         1110101 1110 . .... 0 ... .... .... ....     @s_rrr_shi
>  
> +# v8.1M CSEL and friends
> +CSEL             1110101 0010 1 rn:4 10 op:2 rd:4 fcond:4 rm:4
> +
>  # Data-processing (register-shifted register)
>  
>  MOV_rxrr         1111 1010 0 shty:2 s:1 rm:4 1111 rd:4 0000 rs:4 \
> diff --git a/target/arm/translate.c b/target/arm/translate.c
> index d34c1d351a6..a7923a31b56 100644
> --- a/target/arm/translate.c
> +++ b/target/arm/translate.c
> @@ -8224,6 +8224,61 @@ static bool trans_IT(DisasContext *s, arg_IT *a)
>      return true;
>  }
>  
> +/* v8.1M CSEL/CSINC/CSNEG/CSINV */
> +static bool trans_CSEL(DisasContext *s, arg_CSEL *a)
> +{
> +    TCGv_i32 rn, rm, zero;
> +    DisasCompare c;
> +
> +    if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
> +        return false;
> +    }
> +
> +    if (a->rd == 13 || a->rd == 15 || a->rn == 13 || a->fcond >= 14) {
> +        /* CONSTRAINED UNPREDICTABLE: we choose to UNDEF */
> +        return false;
> +    }

Missing a check for rm == 13, which if I read the table properly would be an MVE
shift instruction.  (Irritatingly, there's a note for "See related encodings",
but there's no related encodings section.)

Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~
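
Illustrative sketch (editor's addition): the value semantics listed in the commit message, together with the UNDEF checks from trans_CSEL() plus the rm == 13 case raised in review. The mapping of the 2-bit op field (0 = CSEL, 1 = CSINC, 2 = CSINV, 3 = CSNEG) is an assumption for this example. Register reads, including any special handling of r15, are not modelled. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical op numbering; assumed order of the 2-bit op field. */
enum csel_op { OP_CSEL = 0, OP_CSINC = 1, OP_CSINV = 2, OP_CSNEG = 3 };

/* Checks from trans_CSEL() plus the rm == 13 case raised in review. */
static bool model_csel_valid(int rd, int rn, int rm, int fcond)
{
    return !(rd == 13 || rd == 15 || rn == 13 || rm == 13 || fcond >= 14);
}

/* Rd = cond ? Rn : f(Rm), with 32-bit wraparound arithmetic. */
static uint32_t model_csel(enum csel_op op, bool cond, uint32_t rn, uint32_t rm)
{
    if (cond) {
        return rn;
    }
    switch (op) {
    case OP_CSINC:
        return rm + 1;
    case OP_CSINV:
        return ~rm;
    case OP_CSNEG:
        return -rm;     /* unsigned negation: 0 - rm modulo 2^32 */
    case OP_CSEL:
    default:
        return rm;
    }
}
```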



* Re: [PATCH 04/10] target/arm: Make the t32 insn[25:23]=111 group non-overlapping
  2020-10-12 15:37 ` [PATCH 04/10] target/arm: Make the t32 insn[25:23]=111 group non-overlapping Peter Maydell
@ 2020-10-13 16:40   ` Richard Henderson
  0 siblings, 0 replies; 27+ messages in thread
From: Richard Henderson @ 2020-10-13 16:40 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 10/12/20 8:37 AM, Peter Maydell wrote:
> The t32 decode has a group which represents a set of insns
> which overlap with B_cond_thumb because they have [25:23]=111
> (which is an invalid condition code field for the branch insn).
> This group is currently defined using the {} overlap-OK syntax,
> but it is almost entirely non-overlapping patterns. Switch
> it over to use a non-overlapping group.
> 
> For this to be valid syntactically, CPS must move into the same
> overlapping-group as the hint insns (CPS vs hints was the
> only actual use of the overlap facility for the group).
> 
> The non-overlapping subgroup for CLREX/DSB/DMB/ISB/SB is no longer
> necessary and so we can remove it (promoting those insns to
> be members of the parent group).
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
> Just a minor bit of tidying that I did while I was trying to
> work out whether the v8.1M loop/branch insns needed to go in
> this group. (As it turns out, they don't.)
> ---
>  target/arm/t32.decode | 26 ++++++++++++--------------
>  1 file changed, 12 insertions(+), 14 deletions(-)

Nice cleanup, thanks.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~



* Re: [PATCH 05/10] target/arm: Don't allow BLX imm for M-profile
  2020-10-12 15:37 ` [PATCH 05/10] target/arm: Don't allow BLX imm for M-profile Peter Maydell
@ 2020-10-13 16:41   ` Richard Henderson
  0 siblings, 0 replies; 27+ messages in thread
From: Richard Henderson @ 2020-10-13 16:41 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 10/12/20 8:37 AM, Peter Maydell wrote:
> The BLX immediate insn in the Thumb encoding always performs
> a switch from Thumb to Arm state. This would be totally useless
> in M-profile which has no Arm decoder, and so the instruction
> does not exist at all there. Make the encoding UNDEF for M-profile.
> 
> (This part of the encoding space is used for the branch-future
> and low-overhead-loop insns in v8.1M.)
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/translate.c | 8 ++++++++
>  1 file changed, 8 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~



* Re: [PATCH 06/10] target/arm: Implement v8.1M branch-future insns (as NOPs)
  2020-10-12 15:37 ` [PATCH 06/10] target/arm: Implement v8.1M branch-future insns (as NOPs) Peter Maydell
@ 2020-10-13 16:58   ` Richard Henderson
  0 siblings, 0 replies; 27+ messages in thread
From: Richard Henderson @ 2020-10-13 16:58 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 10/12/20 8:37 AM, Peter Maydell wrote:
> v8.1M implements a new 'branch future' feature, which is a
> set of instructions that request the CPU to perform a branch
> "in the future", when it reaches a particular execution address.
> In hardware, the expected implementation is that the information
> about the branch location and destination is cached and then
> acted upon when execution reaches the specified address.
> However the architecture permits an implementation to discard
> this cached information at any point, and so guest code must
> always include a normal branch insn at the branch point as
> a fallback. In particular, an implementation is specifically
> permitted to treat all BF insns as NOPs (which is equivalent
> to discarding the cached information immediately).
> 
> For QEMU, implementing this caching of branch information
> would be complicated and would not improve the speed of
> execution at all, so we make the IMPDEF choice to implement
> all BF insns as NOPs.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~



* Re: [PATCH 07/10] target/arm: Implement v8.1M low-overhead-loop instructions
  2020-10-12 19:56   ` Peter Maydell
@ 2020-10-13 17:10     ` Richard Henderson
  2020-10-13 17:12       ` Peter Maydell
  0 siblings, 1 reply; 27+ messages in thread
From: Richard Henderson @ 2020-10-13 17:10 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, QEMU Developers

On 10/12/20 12:56 PM, Peter Maydell wrote:
> On Mon, 12 Oct 2020 at 16:37, Peter Maydell <peter.maydell@linaro.org> wrote:
>>
>> v8.1M's "low-overhead-loop" extension has three instructions
>> for looping:
>>  * DLS (start of a do-loop)
>>  * WLS (start of a while-loop)
>>  * LE (end of a loop)
>>
>> +static bool trans_WLS(DisasContext *s, arg_WLS *a)
>> +{
>> +    /* M-profile low-overhead while-loop start */
>> +    TCGv_i32 tmp;
>> +    TCGLabel *nextlabel;
>> +
>> +    if (!dc_isar_feature(aa32_lob, s)) {
>> +        return false;
>> +    }
>> +    if (a->rn == 13 || a->rn == 15) {
>> +        /* CONSTRAINED UNPREDICTABLE: we choose to UNDEF */
>> +        return false;
>> +    }
>> +
>> +    nextlabel = gen_new_label();
>> +    tcg_gen_brcondi_i32(TCG_COND_NE, cpu_R[a->rn], 0, nextlabel);
>> +    gen_jmp(s, read_pc(s) + a->imm);
>> +
>> +    gen_set_label(nextlabel);
>> +    tmp = load_reg(s, a->rn);
>> +    store_reg(s, 14, tmp);
>> +    gen_jmp(s, s->base.pc_next);
>> +    return true;
>> +}
> 
> This turns out not to work, because gen_jmp() always generates
> a goto-tb for tb exit 0, and we hit the assert() that exit 0
> was not used twice. Here's a fixup to fold into this patch:

Indeed.  I was going to suggest that here you should use arm_gen_condlabel()
like you did for LE.  Which I think would be still cleaner than your fixup patch.


r~



* Re: [PATCH 07/10] target/arm: Implement v8.1M low-overhead-loop instructions
  2020-10-13 17:10     ` Richard Henderson
@ 2020-10-13 17:12       ` Peter Maydell
  2020-10-13 17:30         ` Richard Henderson
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Maydell @ 2020-10-13 17:12 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-arm, QEMU Developers

On Tue, 13 Oct 2020 at 18:10, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 10/12/20 12:56 PM, Peter Maydell wrote:
> > On Mon, 12 Oct 2020 at 16:37, Peter Maydell <peter.maydell@linaro.org> wrote:
> > This turns out not to work, because gen_jmp() always generates
> > a goto-tb for tb exit 0, and we hit the assert() that exit 0
> > was not used twice. Here's a fixup to fold into this patch:
>
> Indeed.  I was going to suggest that here you should use arm_gen_condlabel()
> like you did for LE.  Which I think would be still cleaner than your fixup patch.

I thought about that but it doesn't really fit, because
the condlabel is for "go to the next instruction
without having done anything". Here we need to do something
on that codepath (unlike LE).

thanks
-- PMM



* Re: [PATCH 07/10] target/arm: Implement v8.1M low-overhead-loop instructions
  2020-10-13 17:12       ` Peter Maydell
@ 2020-10-13 17:30         ` Richard Henderson
  2020-10-13 20:24           ` Peter Maydell
  0 siblings, 1 reply; 27+ messages in thread
From: Richard Henderson @ 2020-10-13 17:30 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-arm, QEMU Developers

On 10/13/20 10:12 AM, Peter Maydell wrote:
> On Tue, 13 Oct 2020 at 18:10, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> On 10/12/20 12:56 PM, Peter Maydell wrote:
>>> On Mon, 12 Oct 2020 at 16:37, Peter Maydell <peter.maydell@linaro.org> wrote:
>>> This turns out not to work, because gen_jmp() always generates
>>> a goto-tb for tb exit 0, and we hit the assert() that exit 0
>>> was not used twice. Here's a fixup to fold into this patch:
>>
>> Indeed.  I was going to suggest that here you should use arm_gen_condlabel()
>> like you did for LE.  Which I think would be still cleaner than your fixup patch.
> 
> I thought about that but it doesn't really fit, because
> the condlabel is for "go to the next instruction
> without having done anything". Here we need to do something
> on that codepath (unlike LE).

Ah, right.

Well, the only further comment is that, in the followup, only WLS gains the IT
block check.  While I understand that's required to avoid an abort in QEMU for
this case, all three of the insns have that case as CONSTRAINED UNPREDICTABLE.
It might be worthwhile checking for IT in all of them, just to continue our
normal "unpredictable raises sigill, when easy" choice.


r~



* Re: [PATCH 08/10] target/arm: Fix has_vfp/has_neon ID reg squashing for M-profile
  2020-10-12 15:37 ` [PATCH 08/10] target/arm: Fix has_vfp/has_neon ID reg squashing for M-profile Peter Maydell
@ 2020-10-13 19:07   ` Richard Henderson
  0 siblings, 0 replies; 27+ messages in thread
From: Richard Henderson @ 2020-10-13 19:07 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 10/12/20 8:37 AM, Peter Maydell wrote:
> In arm_cpu_realizefn(), if the CPU has VFP or Neon disabled then we
> squash the ID register fields so that we don't advertise it to the
> guest.  This code was written for A-profile and needs some tweaks to
> work correctly on M-profile:
> 
>  * A-profile only fields should not be zeroed on M-profile:
>    - MVFR0.FPSHVEC,FPTRAP
>    - MVFR1.SIMDLS,SIMDINT,SIMDSP,SIMDHP
>    - MVFR2.SIMDMISC
>  * M-profile only fields should be zeroed on M-profile:
>    - MVFR1.FP16
> 
> In particular, because MVFR1.SIMDHP on A-profile is the same field as
> MVFR1.FP16 on M-profile this code was incorrectly disabling FP16
> support on an M-profile CPU (where has_neon is always false).  This
> isn't a visible bug yet because we don't have any M-profile CPUs with
> FP16 support, but the change is necessary before we introduce any.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/cpu.c | 29 ++++++++++++++++++-----------
>  1 file changed, 18 insertions(+), 11 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~
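
Illustrative sketch (editor's addition): the MVFR1 aliasing problem in a few lines of C. `deposit_field()` is a local stand-in for a field-deposit macro such as FIELD_DP32(), and `model_squash_mvfr1_no_neon()` is hypothetical. The bit positions (SIMDLS [11:8], SIMDINT [15:12], SIMDSP [19:16], SIMDHP/FP16 [23:20]) are assumed from the Arm ARM register layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for a field deposit: replace len bits at shift with val. */
static uint32_t deposit_field(uint32_t reg, int shift, int len, uint32_t val)
{
    uint32_t mask = ((1u << len) - 1) << shift;
    return (reg & ~mask) | ((val << shift) & mask);
}

/* Sketch of the has_neon == false squashing of MVFR1 after patch 08. */
static uint32_t model_squash_mvfr1_no_neon(uint32_t mvfr1, bool m_profile)
{
    if (!m_profile) {
        /* A-profile only fields: SIMDLS, SIMDINT, SIMDSP, SIMDHP */
        mvfr1 = deposit_field(mvfr1, 8, 4, 0);
        mvfr1 = deposit_field(mvfr1, 12, 4, 0);
        mvfr1 = deposit_field(mvfr1, 16, 4, 0);
        mvfr1 = deposit_field(mvfr1, 20, 4, 0);
    }
    /* On M-profile, bits [23:20] are FP16 and must not be zeroed here */
    return mvfr1;
}
```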



* Re: [PATCH 09/10] target/arm: Implement FPSCR.LTPSIZE for M-profile LOB extension
  2020-10-12 15:37 ` [PATCH 09/10] target/arm: Implement FPSCR.LTPSIZE for M-profile LOB extension Peter Maydell
@ 2020-10-13 20:06   ` Richard Henderson
  2020-10-13 20:38     ` Peter Maydell
  0 siblings, 1 reply; 27+ messages in thread
From: Richard Henderson @ 2020-10-13 20:06 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 10/12/20 8:37 AM, Peter Maydell wrote:
> @@ -198,8 +200,14 @@ void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
>          /*
>           * M profile FPSCR is RES0 for the QC, STRIDE, FZ16, LEN bits
>           * and also for the trapped-exception-handling bits IxE.
> +         * From v8.1M with the low-overhead-loop extension bits
> +         * [18:16] are used for LTPSIZE and (since we don't implement
> +         * MVE) always read as 4 and ignore writes.
>           */
>          val &= 0xf7c0009f;
> +        if (cpu_isar_feature(aa32_lob, cpu)) {
> +            val |= 4 << 16;
> +        }
>      }
>  
>      vfp_set_fpscr_to_host(env, val);
> @@ -212,9 +220,18 @@ void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
>       * (which are stored in fp_status), and the other RES0 bits
>       * in between, then we clear all of the low 16 bits.
>       */
> -    env->vfp.xregs[ARM_VFP_FPSCR] = val & 0xf7c80000;
> -    env->vfp.vec_len = (val >> 16) & 7;
> -    env->vfp.vec_stride = (val >> 20) & 3;
> +    if (cpu_isar_feature(aa32_lob, cpu)) {
> +        /*
> +         * M-profile low-overhead-loop extension: [18:16] are LTPSIZE
> +         * and we keep them in vfp.xregs[].
> +         */
> +        env->vfp.xregs[ARM_VFP_FPSCR] = val & 0xf7cf0000;
> +    } else {
> +        /* Those bits might be the old-style short vector length/stride */
> +        env->vfp.xregs[ARM_VFP_FPSCR] = val & 0xf7c80000;
> +        env->vfp.vec_len = (val >> 16) & 7;
> +        env->vfp.vec_stride = (val >> 20) & 3;
> +    }

I think these two sets of masking are confusing.
Perhaps usefully rearranged as

    if (!fp16) {
        val &= ~fz16;
    }
    vfp_set_fpscr_to_host(env, val);

    if (!m-profile) {
        vec_len = extract32(val, 16, 3);
        vec_stride = extract32(val, 20, 2);
    }
    val &= 0xf7c80000;
    if (lob) {
        val |= 4 << 16;
    }
    fpscr = val;

Which then obviates the next patch.


r~
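
Illustrative sketch (editor's addition): one possible C rendering of the rearrangement suggested above, under stated assumptions. The structs, `model_extract32()` (same argument order as QEMU's extract32()), and `model_set_fpscr_v2()` are hypothetical. The host float-status update is only marked by a comment. The single 0xf7c80000 mask also clears QC, STRIDE, LEN and the low bits on M-profile, which is why the separate M-profile mask becomes unnecessary.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bitfield extract; length must be in 1..32. */
static uint32_t model_extract32(uint32_t value, int start, int length)
{
    return (value >> start) & (~0u >> (32 - length));
}

struct model_vfp {
    uint32_t fpscr;
    int vec_len;
    int vec_stride;
};

struct model_features {
    bool m_profile;
    bool fp16;
    bool lob;
};

static void model_set_fpscr_v2(struct model_vfp *v,
                               const struct model_features *f, uint32_t val)
{
    if (!f->fp16) {
        val &= ~(1u << 19);         /* FZ16 is RES0 without FP16 */
    }
    /* (the host float-status update would happen here) */

    if (!f->m_profile) {
        v->vec_len = model_extract32(val, 16, 3);
        v->vec_stride = model_extract32(val, 20, 2);
    }
    val &= 0xf7c80000u;             /* low 16 bits live in float status */
    if (f->lob) {
        val |= 4u << 16;            /* LTPSIZE reads as 4 without MVE */
    }
    v->fpscr = val;
}
```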



* Re: [PATCH 07/10] target/arm: Implement v8.1M low-overhead-loop instructions
  2020-10-13 17:30         ` Richard Henderson
@ 2020-10-13 20:24           ` Peter Maydell
  0 siblings, 0 replies; 27+ messages in thread
From: Peter Maydell @ 2020-10-13 20:24 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-arm, QEMU Developers

On Tue, 13 Oct 2020 at 18:30, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Well, the only further comment is that, in the followup, only WLS gains the IT
> block check.  While I understand that's required to avoid an abort in QEMU for
> this case, all three of the insns have that case as CONSTRAINED UNPREDICTABLE.
> It might be worthwhile checking for IT in all of them, just to continue our
> normal "unpredictable raises sigill, when easy" choice.

Maybe, but there are a lot of instructions that are
unpredictable-in-an-IT-block (CPSID, CRC32B, HVC...)
and our general approach seems to have been "don't check unless
it would cause an actual problem". The only place I can find
where we do check for this case is in trans_B_cond_thumb(),
which we do for the same reason as here.

thanks
-- PMM



* Re: [PATCH 09/10] target/arm: Implement FPSCR.LTPSIZE for M-profile LOB extension
  2020-10-13 20:06   ` Richard Henderson
@ 2020-10-13 20:38     ` Peter Maydell
  2020-10-13 21:01       ` Richard Henderson
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Maydell @ 2020-10-13 20:38 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-arm, QEMU Developers

On Tue, 13 Oct 2020 at 21:06, Richard Henderson
<richard.henderson@linaro.org> wrote:
> I think these two sets of masking are confusing.
> Perhaps usefully rearranged as
>
>     if (!fp16) {
>         val &= ~fz16;
>     }
>     vfp_set_fpscr_to_host(env, val);
>
>     if (!m-profile) {
>         vec_len = extract32(val, 16, 3);
>         vec_stride = extract32(val, 20, 2);
>     }
>     val &= 0xf7c80000;
>     if (lob) {
>         val |= 4 << 16;
>     }
>     fpscr = val;

Yeah, probably cleaner.

The other thing I wondered about is whether we should
be setting vec_len/vec_stride for an A-profile CPU which
doesn't implement the short-vector extension (ie where
MVFR0.FPShVec is zero). But that gets a bit awkward: v8A
allows an implementation to make Stride and Len be RAZ,
but v7A didn't permit that and so I think we would need
to distinguish:
 * has short-vector support (eg Cortex-A9)
 * v8A, can implement FPSCR.{Stride,Len} as RAZ/WI
 * no short-vector support, Stride/Len can be written
   but the only effect is that some insns must UNDEF
   (eg Cortex-A7)

I think at the moment we currently provide short-vector
support for everything, which is wrong but wrong in
the direction that means more guest code runs...

thanks
-- PMM



* Re: [PATCH 09/10] target/arm: Implement FPSCR.LTPSIZE for M-profile LOB extension
  2020-10-13 20:38     ` Peter Maydell
@ 2020-10-13 21:01       ` Richard Henderson
  0 siblings, 0 replies; 27+ messages in thread
From: Richard Henderson @ 2020-10-13 21:01 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-arm, QEMU Developers

On 10/13/20 1:38 PM, Peter Maydell wrote:
>  * has short-vector support (eg Cortex-A9)
>  * v8A, can implement FPSCR.{Stride,Len} as RAZ/WI
>  * no short-vector support, Stride/Len can be written
>    but the only effect is that some insns must UNDEF
>    (eg Cortex-A7)

Yep.

The other thing I wondered is whether it was worthwhile to go ahead and split out
ltpsize now, even with MVE not implemented.

Eventually the conditions here would look like

    if (m-profile) {
        if (mve) {
            ltpsize = [18:16];
        }
    } else {
        if (!v8) {
            vec_len = [18:16];
            vec_stride = [21:20];
        }
    }

but for now you could leave out the assignment to ltpsize and just leave it
initialized to 4 since reset.


r~
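
Illustrative sketch (editor's addition): the "eventual" structure above with LTPSIZE held in its own field, as a small C model. `model_vfp_split` and `model_split_fields()` are hypothetical names. The `v8` flag follows the outline above literally, even though v8A merely permits Stride/Len to be RAZ/WI.

```c
#include <stdbool.h>
#include <stdint.h>

struct model_vfp_split {
    uint32_t ltpsize;   /* M-profile LOB: FPSCR[18:16], reset value 4 */
    int vec_len;        /* A-profile short vectors: FPSCR[18:16] */
    int vec_stride;     /* A-profile short vectors: FPSCR[21:20] */
};

static void model_split_fields(struct model_vfp_split *v, uint32_t val,
                               bool m_profile, bool mve, bool v8)
{
    if (m_profile) {
        if (mve) {
            v->ltpsize = (val >> 16) & 7;
        }
        /* without MVE, ltpsize keeps its reset value of 4 */
    } else if (!v8) {
        v->vec_len = (val >> 16) & 7;
        v->vec_stride = (val >> 20) & 3;
    }
}
```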



* Re: [PATCH 07/10] target/arm: Implement v8.1M low-overhead-loop instructions
  2020-10-12 15:37 ` [PATCH 07/10] target/arm: Implement v8.1M low-overhead-loop instructions Peter Maydell
  2020-10-12 19:56   ` Peter Maydell
@ 2020-10-13 22:31   ` Richard Henderson
  1 sibling, 0 replies; 27+ messages in thread
From: Richard Henderson @ 2020-10-13 22:31 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 10/12/20 8:37 AM, Peter Maydell wrote:
> +    nextlabel = gen_new_label();
> +    tcg_gen_brcondi_i32(TCG_COND_NE, cpu_R[a->rn], 0, nextlabel);
> +    gen_jmp(s, read_pc(s) + a->imm);
> +
> +    gen_set_label(nextlabel);
> +    tmp = load_reg(s, a->rn);
> +    store_reg(s, 14, tmp);
> +    gen_jmp(s, s->base.pc_next);
> +    return true;

Oh, fwiw, with the tcg optimization patches just posted, this branch is better
inverted.  That way the load of rn can be reused on the non-taken branch path.

Maybe sometime I'll try to propagate the data to the taken path, but that
automatically requires extra memory allocation, so it'll be difficult to do
that without a tcg slowdown.


r~



* [PATCH 00/10] target/arm: Various v8.1M minor features
@ 2020-10-12 15:33 Peter Maydell
  0 siblings, 0 replies; 27+ messages in thread
From: Peter Maydell @ 2020-10-12 15:33 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

This patchseries implements various minor v8.1M new features,
notably the branch-future and low-overhead-loop extensions.

(None of this will get enabled until we have enough to implement
a CPU model which has v8.1M, which will be the Cortex-M55, but
as usual we can get stuff into the tree gradually.)

Patch 1 is a decodetree fix suggested by Richard that is
necessary to avoid wrong-decode of the changes to t32.decode
by later patches.

thanks
-- PMM

Peter Maydell (10):
  decodetree: Fix codegen for non-overlapping group inside overlapping
    group
  target/arm: Implement v8.1M NOCP handling
  target/arm: Implement v8.1M conditional-select insns
  target/arm: Make the t32 insn[25:23]=111 group non-overlapping
  target/arm: Don't allow BLX imm for M-profile
  target/arm: Implement v8.1M branch-future insns (as NOPs)
  target/arm: Implement v8.1M low-overhead-loop instructions
  target/arm: Fix has_vfp/has_neon ID reg squashing for M-profile
  target/arm: Implement FPSCR.LTPSIZE for M-profile LOB extension
  target/arm: Fix writing to FPSCR.FZ16 on M-profile

 target/arm/cpu.h               |   7 ++
 target/arm/m-nocp.decode       |  10 ++-
 target/arm/t32.decode          |  50 +++++++----
 target/arm/cpu.c               |  34 ++++---
 target/arm/translate.c         | 157 +++++++++++++++++++++++++++++++++
 target/arm/vfp_helper.c        |  30 +++++--
 scripts/decodetree.py          |   2 +-
 target/arm/translate-vfp.c.inc |  17 +++-
 8 files changed, 268 insertions(+), 39 deletions(-)

-- 
2.20.1





Thread overview: 27+ messages
2020-10-12 15:37 [PATCH 00/10] target/arm: Various v8.1M minor features Peter Maydell
2020-10-12 15:37 ` [PATCH 01/10] decodetree: Fix codegen for non-overlapping group inside overlapping group Peter Maydell
2020-10-13 16:02   ` Richard Henderson
2020-10-12 15:37 ` [PATCH 02/10] target/arm: Implement v8.1M NOCP handling Peter Maydell
2020-10-12 15:37 ` [PATCH 03/10] target/arm: Implement v8.1M conditional-select insns Peter Maydell
2020-10-13 16:37   ` Richard Henderson
2020-10-12 15:37 ` [PATCH 04/10] target/arm: Make the t32 insn[25:23]=111 group non-overlapping Peter Maydell
2020-10-13 16:40   ` Richard Henderson
2020-10-12 15:37 ` [PATCH 05/10] target/arm: Don't allow BLX imm for M-profile Peter Maydell
2020-10-13 16:41   ` Richard Henderson
2020-10-12 15:37 ` [PATCH 06/10] target/arm: Implement v8.1M branch-future insns (as NOPs) Peter Maydell
2020-10-13 16:58   ` Richard Henderson
2020-10-12 15:37 ` [PATCH 07/10] target/arm: Implement v8.1M low-overhead-loop instructions Peter Maydell
2020-10-12 19:56   ` Peter Maydell
2020-10-13 17:10     ` Richard Henderson
2020-10-13 17:12       ` Peter Maydell
2020-10-13 17:30         ` Richard Henderson
2020-10-13 20:24           ` Peter Maydell
2020-10-13 22:31   ` Richard Henderson
2020-10-12 15:37 ` [PATCH 08/10] target/arm: Fix has_vfp/has_neon ID reg squashing for M-profile Peter Maydell
2020-10-13 19:07   ` Richard Henderson
2020-10-12 15:37 ` [PATCH 09/10] target/arm: Implement FPSCR.LTPSIZE for M-profile LOB extension Peter Maydell
2020-10-13 20:06   ` Richard Henderson
2020-10-13 20:38     ` Peter Maydell
2020-10-13 21:01       ` Richard Henderson
2020-10-12 15:37 ` [PATCH 10/10] target/arm: Fix writing to FPSCR.FZ16 on M-profile Peter Maydell
  -- strict thread matches above, loose matches on Subject: below --
2020-10-12 15:33 [PATCH 00/10] target/arm: Various v8.1M minor features Peter Maydell
