All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v5 00/17] target/arm: Reduce overhead of cpu_get_tb_cpu_state
@ 2019-08-20 21:07 Richard Henderson
  2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 01/17] target/arm: Split out rebuild_hflags_common Richard Henderson
                   ` (17 more replies)
  0 siblings, 18 replies; 31+ messages in thread
From: Richard Henderson @ 2019-08-20 21:07 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

Changes since v4:
  * Split patch 1 into 15 smaller patches.
  * Cache the new DEBUG_TARGET_EL field.
  * Split out m-profile hflags separately from a-profile 32-bit.
  * Move around non-cached tb flags as well, avoiding repetitive
    checks for m-profile or other mutually exclusive conditions.

  I haven't officially re-run the performance test quoted in the
  last patch, but I have eyeballed "perf top", and have dug into
  the compiled code a bit, which resulted in a few of the new
  cleanup patches (e.g. cs_base, arm_mmu_idx_el, and
  arm_cpu_data_is_big_endian).

Changes since v3:
  * Rebase.
  * Do not cache XSCALE_CPAR now that it overlaps VECSTRIDE.
  * Leave the new v7m bits as uncached.  I haven't figured
    out all of the ways fpccr is modified.

Changes since v2:
  * Do not cache VECLEN, VECSTRIDE, VFPEN.
    These variables come from VFP_FPSCR and VFP_FPEXC, not from
    system control registers.
  * Move HANDLER and STACKCHECK to rebuild_hflags_a32,
    instead of building them in rebuild_hflags_common.

Changes since v1:
  * Apparently I had started a last-minute API change, and failed to
    covert all of the users, and also failed to re-test afterward.
  * Retain assertions for --enable-debug-tcg.

Richard Henderson (17):
  target/arm: Split out rebuild_hflags_common
  target/arm: Split out rebuild_hflags_a64
  target/arm: Split out rebuild_hflags_common_32
  target/arm: Split arm_cpu_data_is_big_endian
  target/arm: Split out rebuild_hflags_m32
  target/arm: Reduce tests vs M-profile in cpu_get_tb_cpu_state
  target/arm: Split out rebuild_hflags_a32
  target/arm: Split out rebuild_hflags_aprofile
  target/arm: Hoist XSCALE_CPAR, VECLEN, VECSTRIDE in
    cpu_get_tb_cpu_state
  target/arm: Simplify set of PSTATE_SS in cpu_get_tb_cpu_state
  target/arm: Hoist computation of TBFLAG_A32.VFPEN
  target/arm: Add arm_rebuild_hflags
  target/arm: Split out arm_mmu_idx_el
  target/arm: Hoist store to cs_base in cpu_get_tb_cpu_state
  target/arm: Add HELPER(rebuild_hflags_{a32,a64,m32})
  target/arm: Rebuild hflags at EL changes and MSR writes
  target/arm: Rely on hflags correct in cpu_get_tb_cpu_state

 target/arm/cpu.h           |  84 +++++---
 target/arm/helper.h        |   4 +
 target/arm/internals.h     |   9 +
 linux-user/syscall.c       |   1 +
 target/arm/cpu.c           |   1 +
 target/arm/helper-a64.c    |   3 +
 target/arm/helper.c        | 383 ++++++++++++++++++++++++-------------
 target/arm/machine.c       |   1 +
 target/arm/op_helper.c     |   1 +
 target/arm/translate-a64.c |   6 +-
 target/arm/translate.c     |  18 +-
 11 files changed, 341 insertions(+), 170 deletions(-)

-- 
2.17.1



^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH v5 01/17] target/arm: Split out rebuild_hflags_common
  2019-08-20 21:07 [Qemu-devel] [PATCH v5 00/17] target/arm: Reduce overhead of cpu_get_tb_cpu_state Richard Henderson
@ 2019-08-20 21:07 ` Richard Henderson
  2019-09-05 13:58   ` [Qemu-devel] [Qemu-arm] " Alex Bennée
  2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 02/17] target/arm: Split out rebuild_hflags_a64 Richard Henderson
                   ` (16 subsequent siblings)
  17 siblings, 1 reply; 31+ messages in thread
From: Richard Henderson @ 2019-08-20 21:07 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

Create a function to compute the values of the TBFLAG_ANY bits
that will be cached.  For now, the env->hflags variable is not
used, and the results are fed back to cpu_get_tb_cpu_state.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu.h    | 29 ++++++++++++++++++-----------
 target/arm/helper.c | 26 +++++++++++++++++++-------
 2 files changed, 37 insertions(+), 18 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 0981303170..3dc52c032b 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -231,6 +231,9 @@ typedef struct CPUARMState {
     uint32_t pstate;
     uint32_t aarch64; /* 1 if CPU is in aarch64 state; inverse of PSTATE.nRW */
 
+    /* Cached TBFLAGS state.  See below for which bits are included.  */
+    uint32_t hflags;
+
     /* Frequently accessed CPSR bits are stored separately for efficiency.
        This contains all the other bits.  Use cpsr_{read,write} to access
        the whole CPSR.  */
@@ -3136,15 +3139,18 @@ typedef ARMCPU ArchCPU;
 
 #include "exec/cpu-all.h"
 
-/* Bit usage in the TB flags field: bit 31 indicates whether we are
+/*
+ * Bit usage in the TB flags field: bit 31 indicates whether we are
  * in 32 or 64 bit mode. The meaning of the other bits depends on that.
  * We put flags which are shared between 32 and 64 bit mode at the top
  * of the word, and flags which apply to only one mode at the bottom.
+ *
+ * Unless otherwise noted, these bits are cached in env->hflags.
  */
 FIELD(TBFLAG_ANY, AARCH64_STATE, 31, 1)
 FIELD(TBFLAG_ANY, MMUIDX, 28, 3)
 FIELD(TBFLAG_ANY, SS_ACTIVE, 27, 1)
-FIELD(TBFLAG_ANY, PSTATE_SS, 26, 1)
+FIELD(TBFLAG_ANY, PSTATE_SS, 26, 1)     /* Not cached. */
 /* Target EL if we take a floating-point-disabled exception */
 FIELD(TBFLAG_ANY, FPEXC_EL, 24, 2)
 FIELD(TBFLAG_ANY, BE_DATA, 23, 1)
@@ -3155,13 +3161,14 @@ FIELD(TBFLAG_ANY, BE_DATA, 23, 1)
 FIELD(TBFLAG_ANY, DEBUG_TARGET_EL, 21, 2)
 
 /* Bit usage when in AArch32 state: */
-FIELD(TBFLAG_A32, THUMB, 0, 1)
-FIELD(TBFLAG_A32, VECLEN, 1, 3)
-FIELD(TBFLAG_A32, VECSTRIDE, 4, 2)
+FIELD(TBFLAG_A32, THUMB, 0, 1)          /* Not cached. */
+FIELD(TBFLAG_A32, VECLEN, 1, 3)         /* Not cached. */
+FIELD(TBFLAG_A32, VECSTRIDE, 4, 2)      /* Not cached. */
 /*
  * We store the bottom two bits of the CPAR as TB flags and handle
  * checks on the other bits at runtime. This shares the same bits as
  * VECSTRIDE, which is OK as no XScale CPU has VFP.
+ * Not cached, because VECLEN+VECSTRIDE are not cached.
  */
 FIELD(TBFLAG_A32, XSCALE_CPAR, 4, 2)
 /*
@@ -3170,15 +3177,15 @@ FIELD(TBFLAG_A32, XSCALE_CPAR, 4, 2)
  * the same thing as the current security state of the processor!
  */
 FIELD(TBFLAG_A32, NS, 6, 1)
-FIELD(TBFLAG_A32, VFPEN, 7, 1)
-FIELD(TBFLAG_A32, CONDEXEC, 8, 8)
+FIELD(TBFLAG_A32, VFPEN, 7, 1)          /* Not cached. */
+FIELD(TBFLAG_A32, CONDEXEC, 8, 8)       /* Not cached. */
 FIELD(TBFLAG_A32, SCTLR_B, 16, 1)
 /* For M profile only, set if FPCCR.LSPACT is set */
-FIELD(TBFLAG_A32, LSPACT, 18, 1)
+FIELD(TBFLAG_A32, LSPACT, 18, 1)        /* Not cached. */
 /* For M profile only, set if we must create a new FP context */
-FIELD(TBFLAG_A32, NEW_FP_CTXT_NEEDED, 19, 1)
+FIELD(TBFLAG_A32, NEW_FP_CTXT_NEEDED, 19, 1) /* Not cached. */
 /* For M profile only, set if FPCCR.S does not match current security state */
-FIELD(TBFLAG_A32, FPCCR_S_WRONG, 20, 1)
+FIELD(TBFLAG_A32, FPCCR_S_WRONG, 20, 1) /* Not cached. */
 /* For M profile only, Handler (ie not Thread) mode */
 FIELD(TBFLAG_A32, HANDLER, 21, 1)
 /* For M profile only, whether we should generate stack-limit checks */
@@ -3190,7 +3197,7 @@ FIELD(TBFLAG_A64, SVEEXC_EL, 2, 2)
 FIELD(TBFLAG_A64, ZCR_LEN, 4, 4)
 FIELD(TBFLAG_A64, PAUTH_ACTIVE, 8, 1)
 FIELD(TBFLAG_A64, BT, 9, 1)
-FIELD(TBFLAG_A64, BTYPE, 10, 2)
+FIELD(TBFLAG_A64, BTYPE, 10, 2)         /* Not cached. */
 FIELD(TBFLAG_A64, TBID, 12, 2)
 
 static inline bool bswap_code(bool sctlr_b)
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 7e0d5398ab..f2c6419369 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -11016,6 +11016,22 @@ ARMMMUIdx arm_stage1_mmu_idx(CPUARMState *env)
 }
 #endif
 
+static uint32_t rebuild_hflags_common(CPUARMState *env, int fp_el,
+                                      ARMMMUIdx mmu_idx, uint32_t flags)
+{
+    flags = FIELD_DP32(flags, TBFLAG_ANY, FPEXC_EL, fp_el);
+    flags = FIELD_DP32(flags, TBFLAG_ANY, MMUIDX,
+                       arm_to_core_mmu_idx(mmu_idx));
+
+    if (arm_cpu_data_is_big_endian(env)) {
+        flags = FIELD_DP32(flags, TBFLAG_ANY, BE_DATA, 1);
+    }
+    if (arm_singlestep_active(env)) {
+        flags = FIELD_DP32(flags, TBFLAG_ANY, SS_ACTIVE, 1);
+    }
+    return flags;
+}
+
 void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
                           target_ulong *cs_base, uint32_t *pflags)
 {
@@ -11107,7 +11123,7 @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
         }
     }
 
-    flags = FIELD_DP32(flags, TBFLAG_ANY, MMUIDX, arm_to_core_mmu_idx(mmu_idx));
+    flags = rebuild_hflags_common(env, fp_el, mmu_idx, flags);
 
     /* The SS_ACTIVE and PSTATE_SS bits correspond to the state machine
      * states defined in the ARM ARM for software singlestep:
@@ -11115,9 +11131,9 @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
      *     0            x       Inactive (the TB flag for SS is always 0)
      *     1            0       Active-pending
      *     1            1       Active-not-pending
+     * SS_ACTIVE is set in hflags; PSTATE_SS is computed every TB.
      */
-    if (arm_singlestep_active(env)) {
-        flags = FIELD_DP32(flags, TBFLAG_ANY, SS_ACTIVE, 1);
+    if (FIELD_EX32(flags, TBFLAG_ANY, SS_ACTIVE)) {
         if (is_a64(env)) {
             if (env->pstate & PSTATE_SS) {
                 flags = FIELD_DP32(flags, TBFLAG_ANY, PSTATE_SS, 1);
@@ -11128,10 +11144,6 @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
             }
         }
     }
-    if (arm_cpu_data_is_big_endian(env)) {
-        flags = FIELD_DP32(flags, TBFLAG_ANY, BE_DATA, 1);
-    }
-    flags = FIELD_DP32(flags, TBFLAG_ANY, FPEXC_EL, fp_el);
 
     if (arm_v7m_is_handler_mode(env)) {
         flags = FIELD_DP32(flags, TBFLAG_A32, HANDLER, 1);
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH v5 02/17] target/arm: Split out rebuild_hflags_a64
  2019-08-20 21:07 [Qemu-devel] [PATCH v5 00/17] target/arm: Reduce overhead of cpu_get_tb_cpu_state Richard Henderson
  2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 01/17] target/arm: Split out rebuild_hflags_common Richard Henderson
@ 2019-08-20 21:07 ` Richard Henderson
  2019-09-05 15:28   ` [Qemu-devel] [Qemu-arm] " Alex Bennée
  2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 03/17] target/arm: Split out rebuild_hflags_common_32 Richard Henderson
                   ` (15 subsequent siblings)
  17 siblings, 1 reply; 31+ messages in thread
From: Richard Henderson @ 2019-08-20 21:07 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

Create a function to compute the values of the TBFLAG_A64 bits
that will be cached.  For now, the env->hflags variable is not
used, and the results are fed back to cpu_get_tb_cpu_state.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.c | 131 +++++++++++++++++++++++---------------------
 1 file changed, 69 insertions(+), 62 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index f2c6419369..02cb43cf58 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -11032,6 +11032,71 @@ static uint32_t rebuild_hflags_common(CPUARMState *env, int fp_el,
     return flags;
 }
 
+static uint32_t rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
+                                   ARMMMUIdx mmu_idx)
+{
+    ARMMMUIdx stage1 = stage_1_mmu_idx(mmu_idx);
+    ARMVAParameters p0 = aa64_va_parameters_both(env, 0, stage1);
+    uint32_t flags = 0;
+    uint64_t sctlr;
+    int tbii, tbid;
+
+    flags = FIELD_DP32(flags, TBFLAG_ANY, AARCH64_STATE, 1);
+
+    /* FIXME: ARMv8.1-VHE S2 translation regime.  */
+    if (regime_el(env, stage1) < 2) {
+        ARMVAParameters p1 = aa64_va_parameters_both(env, -1, stage1);
+        tbid = (p1.tbi << 1) | p0.tbi;
+        tbii = tbid & ~((p1.tbid << 1) | p0.tbid);
+    } else {
+        tbid = p0.tbi;
+        tbii = tbid & !p0.tbid;
+    }
+
+    flags = FIELD_DP32(flags, TBFLAG_A64, TBII, tbii);
+    flags = FIELD_DP32(flags, TBFLAG_A64, TBID, tbid);
+
+    if (cpu_isar_feature(aa64_sve, env_archcpu(env))) {
+        int sve_el = sve_exception_el(env, el);
+        uint32_t zcr_len;
+
+        /*
+         * If SVE is disabled, but FP is enabled,
+         * then the effective len is 0.
+         */
+        if (sve_el != 0 && fp_el == 0) {
+            zcr_len = 0;
+        } else {
+            zcr_len = sve_zcr_len_for_el(env, el);
+        }
+        flags = FIELD_DP32(flags, TBFLAG_A64, SVEEXC_EL, sve_el);
+        flags = FIELD_DP32(flags, TBFLAG_A64, ZCR_LEN, zcr_len);
+    }
+
+    sctlr = arm_sctlr(env, el);
+
+    if (cpu_isar_feature(aa64_pauth, env_archcpu(env))) {
+        /*
+         * In order to save space in flags, we record only whether
+         * pauth is "inactive", meaning all insns are implemented as
+         * a nop, or "active" when some action must be performed.
+         * The decision of which action to take is left to a helper.
+         */
+        if (sctlr & (SCTLR_EnIA | SCTLR_EnIB | SCTLR_EnDA | SCTLR_EnDB)) {
+            flags = FIELD_DP32(flags, TBFLAG_A64, PAUTH_ACTIVE, 1);
+        }
+    }
+
+    if (cpu_isar_feature(aa64_bti, env_archcpu(env))) {
+        /* Note that SCTLR_EL[23].BT == SCTLR_BT1.  */
+        if (sctlr & (el == 0 ? SCTLR_BT0 : SCTLR_BT1)) {
+            flags = FIELD_DP32(flags, TBFLAG_A64, BT, 1);
+        }
+    }
+
+    return rebuild_hflags_common(env, fp_el, mmu_idx, flags);
+}
+
 void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
                           target_ulong *cs_base, uint32_t *pflags)
 {
@@ -11041,67 +11106,9 @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
     uint32_t flags = 0;
 
     if (is_a64(env)) {
-        ARMCPU *cpu = env_archcpu(env);
-        uint64_t sctlr;
-
         *pc = env->pc;
-        flags = FIELD_DP32(flags, TBFLAG_ANY, AARCH64_STATE, 1);
-
-        /* Get control bits for tagged addresses.  */
-        {
-            ARMMMUIdx stage1 = stage_1_mmu_idx(mmu_idx);
-            ARMVAParameters p0 = aa64_va_parameters_both(env, 0, stage1);
-            int tbii, tbid;
-
-            /* FIXME: ARMv8.1-VHE S2 translation regime.  */
-            if (regime_el(env, stage1) < 2) {
-                ARMVAParameters p1 = aa64_va_parameters_both(env, -1, stage1);
-                tbid = (p1.tbi << 1) | p0.tbi;
-                tbii = tbid & ~((p1.tbid << 1) | p0.tbid);
-            } else {
-                tbid = p0.tbi;
-                tbii = tbid & !p0.tbid;
-            }
-
-            flags = FIELD_DP32(flags, TBFLAG_A64, TBII, tbii);
-            flags = FIELD_DP32(flags, TBFLAG_A64, TBID, tbid);
-        }
-
-        if (cpu_isar_feature(aa64_sve, cpu)) {
-            int sve_el = sve_exception_el(env, current_el);
-            uint32_t zcr_len;
-
-            /* If SVE is disabled, but FP is enabled,
-             * then the effective len is 0.
-             */
-            if (sve_el != 0 && fp_el == 0) {
-                zcr_len = 0;
-            } else {
-                zcr_len = sve_zcr_len_for_el(env, current_el);
-            }
-            flags = FIELD_DP32(flags, TBFLAG_A64, SVEEXC_EL, sve_el);
-            flags = FIELD_DP32(flags, TBFLAG_A64, ZCR_LEN, zcr_len);
-        }
-
-        sctlr = arm_sctlr(env, current_el);
-
-        if (cpu_isar_feature(aa64_pauth, cpu)) {
-            /*
-             * In order to save space in flags, we record only whether
-             * pauth is "inactive", meaning all insns are implemented as
-             * a nop, or "active" when some action must be performed.
-             * The decision of which action to take is left to a helper.
-             */
-            if (sctlr & (SCTLR_EnIA | SCTLR_EnIB | SCTLR_EnDA | SCTLR_EnDB)) {
-                flags = FIELD_DP32(flags, TBFLAG_A64, PAUTH_ACTIVE, 1);
-            }
-        }
-
-        if (cpu_isar_feature(aa64_bti, cpu)) {
-            /* Note that SCTLR_EL[23].BT == SCTLR_BT1.  */
-            if (sctlr & (current_el == 0 ? SCTLR_BT0 : SCTLR_BT1)) {
-                flags = FIELD_DP32(flags, TBFLAG_A64, BT, 1);
-            }
+        flags = rebuild_hflags_a64(env, current_el, fp_el, mmu_idx);
+        if (cpu_isar_feature(aa64_bti, env_archcpu(env))) {
             flags = FIELD_DP32(flags, TBFLAG_A64, BTYPE, env->btype);
         }
     } else {
@@ -11121,9 +11128,9 @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
             flags = FIELD_DP32(flags, TBFLAG_A32,
                                XSCALE_CPAR, env->cp15.c15_cpar);
         }
-    }
 
-    flags = rebuild_hflags_common(env, fp_el, mmu_idx, flags);
+        flags = rebuild_hflags_common(env, fp_el, mmu_idx, flags);
+    }
 
     /* The SS_ACTIVE and PSTATE_SS bits correspond to the state machine
      * states defined in the ARM ARM for software singlestep:
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH v5 03/17] target/arm: Split out rebuild_hflags_common_32
  2019-08-20 21:07 [Qemu-devel] [PATCH v5 00/17] target/arm: Reduce overhead of cpu_get_tb_cpu_state Richard Henderson
  2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 01/17] target/arm: Split out rebuild_hflags_common Richard Henderson
  2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 02/17] target/arm: Split out rebuild_hflags_a64 Richard Henderson
@ 2019-08-20 21:07 ` Richard Henderson
  2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 04/17] target/arm: Split arm_cpu_data_is_big_endian Richard Henderson
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 31+ messages in thread
From: Richard Henderson @ 2019-08-20 21:07 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

Create a function to compute the values of the TBFLAG_A32 bits
that will be cached, and are used by all profiles.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 02cb43cf58..1844c13a19 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -11032,6 +11032,15 @@ static uint32_t rebuild_hflags_common(CPUARMState *env, int fp_el,
     return flags;
 }
 
+static uint32_t rebuild_hflags_common_32(CPUARMState *env, int fp_el,
+                                         ARMMMUIdx mmu_idx, uint32_t flags)
+{
+    flags = FIELD_DP32(flags, TBFLAG_A32, SCTLR_B, arm_sctlr_b(env));
+    flags = FIELD_DP32(flags, TBFLAG_A32, NS, !access_secure_reg(env));
+
+    return rebuild_hflags_common(env, fp_el, mmu_idx, flags);
+}
+
 static uint32_t rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
                                    ARMMMUIdx mmu_idx)
 {
@@ -11103,7 +11112,7 @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
     ARMMMUIdx mmu_idx = arm_mmu_idx(env);
     int current_el = arm_current_el(env);
     int fp_el = fp_exception_el(env, current_el);
-    uint32_t flags = 0;
+    uint32_t flags;
 
     if (is_a64(env)) {
         *pc = env->pc;
@@ -11113,12 +11122,11 @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
         }
     } else {
         *pc = env->regs[15];
+        flags = rebuild_hflags_common_32(env, fp_el, mmu_idx, 0);
         flags = FIELD_DP32(flags, TBFLAG_A32, THUMB, env->thumb);
         flags = FIELD_DP32(flags, TBFLAG_A32, VECLEN, env->vfp.vec_len);
         flags = FIELD_DP32(flags, TBFLAG_A32, VECSTRIDE, env->vfp.vec_stride);
         flags = FIELD_DP32(flags, TBFLAG_A32, CONDEXEC, env->condexec_bits);
-        flags = FIELD_DP32(flags, TBFLAG_A32, SCTLR_B, arm_sctlr_b(env));
-        flags = FIELD_DP32(flags, TBFLAG_A32, NS, !access_secure_reg(env));
         if (env->vfp.xregs[ARM_VFP_FPEXC] & (1 << 30)
             || arm_el_is_aa64(env, 1) || arm_feature(env, ARM_FEATURE_M)) {
             flags = FIELD_DP32(flags, TBFLAG_A32, VFPEN, 1);
@@ -11128,8 +11136,6 @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
             flags = FIELD_DP32(flags, TBFLAG_A32,
                                XSCALE_CPAR, env->cp15.c15_cpar);
         }
-
-        flags = rebuild_hflags_common(env, fp_el, mmu_idx, flags);
     }
 
     /* The SS_ACTIVE and PSTATE_SS bits correspond to the state machine
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH v5 04/17] target/arm: Split arm_cpu_data_is_big_endian
  2019-08-20 21:07 [Qemu-devel] [PATCH v5 00/17] target/arm: Reduce overhead of cpu_get_tb_cpu_state Richard Henderson
                   ` (2 preceding siblings ...)
  2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 03/17] target/arm: Split out rebuild_hflags_common_32 Richard Henderson
@ 2019-08-20 21:07 ` Richard Henderson
  2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 05/17] target/arm: Split out rebuild_hflags_m32 Richard Henderson
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 31+ messages in thread
From: Richard Henderson @ 2019-08-20 21:07 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

Set TBFLAG_ANY.BE_DATA in rebuild_hflags_common_32 and
rebuild_hflags_a64 instead of rebuild_hflags_common, where we do
not need to re-test is_a64() nor re-compute the various inputs.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu.h    | 49 +++++++++++++++++++++++++++------------------
 target/arm/helper.c | 16 +++++++++++----
 2 files changed, 42 insertions(+), 23 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 3dc52c032b..5dec4d3b3a 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -3104,33 +3104,44 @@ static inline uint64_t arm_sctlr(CPUARMState *env, int el)
     }
 }
 
+static inline bool arm_cpu_data_is_big_endian_a32(CPUARMState *env,
+                                                  bool sctlr_b)
+{
+#ifdef CONFIG_USER_ONLY
+    /*
+     * In system mode, BE32 is modelled in line with the
+     * architecture (as word-invariant big-endianness), where loads
+     * and stores are done little endian but from addresses which
+     * are adjusted by XORing with the appropriate constant. So the
+     * endianness to use for the raw data access is not affected by
+     * SCTLR.B.
+     * In user mode, however, we model BE32 as byte-invariant
+     * big-endianness (because user-only code cannot tell the
+     * difference), and so we need to use a data access endianness
+     * that depends on SCTLR.B.
+     */
+    if (sctlr_b) {
+        return true;
+    }
+#endif
+    /* In 32bit endianness is determined by looking at CPSR's E bit */
+    return env->uncached_cpsr & CPSR_E;
+}
+
+static inline bool arm_cpu_data_is_big_endian_a64(int el, uint64_t sctlr)
+{
+    return sctlr & (el ? SCTLR_EE : SCTLR_E0E);
+}
 
 /* Return true if the processor is in big-endian mode. */
 static inline bool arm_cpu_data_is_big_endian(CPUARMState *env)
 {
-    /* In 32bit endianness is determined by looking at CPSR's E bit */
     if (!is_a64(env)) {
-        return
-#ifdef CONFIG_USER_ONLY
-            /* In system mode, BE32 is modelled in line with the
-             * architecture (as word-invariant big-endianness), where loads
-             * and stores are done little endian but from addresses which
-             * are adjusted by XORing with the appropriate constant. So the
-             * endianness to use for the raw data access is not affected by
-             * SCTLR.B.
-             * In user mode, however, we model BE32 as byte-invariant
-             * big-endianness (because user-only code cannot tell the
-             * difference), and so we need to use a data access endianness
-             * that depends on SCTLR.B.
-             */
-            arm_sctlr_b(env) ||
-#endif
-                ((env->uncached_cpsr & CPSR_E) ? 1 : 0);
+        return arm_cpu_data_is_big_endian_a32(env, arm_sctlr_b(env));
     } else {
         int cur_el = arm_current_el(env);
         uint64_t sctlr = arm_sctlr(env, cur_el);
-
-        return (sctlr & (cur_el ? SCTLR_EE : SCTLR_E0E)) != 0;
+        return arm_cpu_data_is_big_endian_a64(cur_el, sctlr);
     }
 }
 
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 1844c13a19..6570d7e195 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -11023,9 +11023,6 @@ static uint32_t rebuild_hflags_common(CPUARMState *env, int fp_el,
     flags = FIELD_DP32(flags, TBFLAG_ANY, MMUIDX,
                        arm_to_core_mmu_idx(mmu_idx));
 
-    if (arm_cpu_data_is_big_endian(env)) {
-        flags = FIELD_DP32(flags, TBFLAG_ANY, BE_DATA, 1);
-    }
     if (arm_singlestep_active(env)) {
         flags = FIELD_DP32(flags, TBFLAG_ANY, SS_ACTIVE, 1);
     }
@@ -11035,7 +11032,14 @@ static uint32_t rebuild_hflags_common(CPUARMState *env, int fp_el,
 static uint32_t rebuild_hflags_common_32(CPUARMState *env, int fp_el,
                                          ARMMMUIdx mmu_idx, uint32_t flags)
 {
-    flags = FIELD_DP32(flags, TBFLAG_A32, SCTLR_B, arm_sctlr_b(env));
+    bool sctlr_b = arm_sctlr_b(env);
+
+    if (sctlr_b) {
+        flags = FIELD_DP32(flags, TBFLAG_A32, SCTLR_B, 1);
+    }
+    if (arm_cpu_data_is_big_endian_a32(env, sctlr_b)) {
+        flags = FIELD_DP32(flags, TBFLAG_ANY, BE_DATA, 1);
+    }
     flags = FIELD_DP32(flags, TBFLAG_A32, NS, !access_secure_reg(env));
 
     return rebuild_hflags_common(env, fp_el, mmu_idx, flags);
@@ -11084,6 +11088,10 @@ static uint32_t rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
 
     sctlr = arm_sctlr(env, el);
 
+    if (arm_cpu_data_is_big_endian_a64(el, sctlr)) {
+        flags = FIELD_DP32(flags, TBFLAG_ANY, BE_DATA, 1);
+    }
+
     if (cpu_isar_feature(aa64_pauth, env_archcpu(env))) {
         /*
          * In order to save space in flags, we record only whether
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH v5 05/17] target/arm: Split out rebuild_hflags_m32
  2019-08-20 21:07 [Qemu-devel] [PATCH v5 00/17] target/arm: Reduce overhead of cpu_get_tb_cpu_state Richard Henderson
                   ` (3 preceding siblings ...)
  2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 04/17] target/arm: Split arm_cpu_data_is_big_endian Richard Henderson
@ 2019-08-20 21:07 ` Richard Henderson
  2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 06/17] target/arm: Reduce tests vs M-profile in cpu_get_tb_cpu_state Richard Henderson
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 31+ messages in thread
From: Richard Henderson @ 2019-08-20 21:07 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

Create a function to compute the values of the TBFLAG_A32 bits
that will be cached, and are used by M-profile.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.c | 45 ++++++++++++++++++++++++++++++---------------
 1 file changed, 30 insertions(+), 15 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 6570d7e195..c36ec6c530 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -11045,6 +11045,29 @@ static uint32_t rebuild_hflags_common_32(CPUARMState *env, int fp_el,
     return rebuild_hflags_common(env, fp_el, mmu_idx, flags);
 }
 
+static uint32_t rebuild_hflags_m32(CPUARMState *env, int fp_el,
+                                   ARMMMUIdx mmu_idx)
+{
+    uint32_t flags = 0;
+
+    if (arm_v7m_is_handler_mode(env)) {
+        flags = FIELD_DP32(flags, TBFLAG_A32, HANDLER, 1);
+    }
+
+    /*
+     * v8M always applies stack limit checks unless CCR.STKOFHFNMIGN
+     * is suppressing them because the requested execution priority
+     * is less than 0.
+     */
+    if (arm_feature(env, ARM_FEATURE_V8) &&
+        !((mmu_idx & ARM_MMU_IDX_M_NEGPRI) &&
+          (env->v7m.ccr[env->v7m.secure] & R_V7M_CCR_STKOFHFNMIGN_MASK))) {
+        flags = FIELD_DP32(flags, TBFLAG_A32, STACKCHECK, 1);
+    }
+
+    return rebuild_hflags_common_32(env, fp_el, mmu_idx, flags);
+}
+
 static uint32_t rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
                                    ARMMMUIdx mmu_idx)
 {
@@ -11130,7 +11153,13 @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
         }
     } else {
         *pc = env->regs[15];
-        flags = rebuild_hflags_common_32(env, fp_el, mmu_idx, 0);
+
+        if (arm_feature(env, ARM_FEATURE_M)) {
+            flags = rebuild_hflags_m32(env, fp_el, mmu_idx);
+        } else {
+            flags = rebuild_hflags_common_32(env, fp_el, mmu_idx, 0);
+        }
+
         flags = FIELD_DP32(flags, TBFLAG_A32, THUMB, env->thumb);
         flags = FIELD_DP32(flags, TBFLAG_A32, VECLEN, env->vfp.vec_len);
         flags = FIELD_DP32(flags, TBFLAG_A32, VECSTRIDE, env->vfp.vec_stride);
@@ -11166,20 +11195,6 @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
         }
     }
 
-    if (arm_v7m_is_handler_mode(env)) {
-        flags = FIELD_DP32(flags, TBFLAG_A32, HANDLER, 1);
-    }
-
-    /* v8M always applies stack limit checks unless CCR.STKOFHFNMIGN is
-     * suppressing them because the requested execution priority is less than 0.
-     */
-    if (arm_feature(env, ARM_FEATURE_V8) &&
-        arm_feature(env, ARM_FEATURE_M) &&
-        !((mmu_idx  & ARM_MMU_IDX_M_NEGPRI) &&
-          (env->v7m.ccr[env->v7m.secure] & R_V7M_CCR_STKOFHFNMIGN_MASK))) {
-        flags = FIELD_DP32(flags, TBFLAG_A32, STACKCHECK, 1);
-    }
-
     if (arm_feature(env, ARM_FEATURE_M_SECURITY) &&
         FIELD_EX32(env->v7m.fpccr[M_REG_S], V7M_FPCCR, S) != env->v7m.secure) {
         flags = FIELD_DP32(flags, TBFLAG_A32, FPCCR_S_WRONG, 1);
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH v5 06/17] target/arm: Reduce tests vs M-profile in cpu_get_tb_cpu_state
  2019-08-20 21:07 [Qemu-devel] [PATCH v5 00/17] target/arm: Reduce overhead of cpu_get_tb_cpu_state Richard Henderson
                   ` (4 preceding siblings ...)
  2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 05/17] target/arm: Split out rebuild_hflags_m32 Richard Henderson
@ 2019-08-20 21:07 ` Richard Henderson
  2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 07/17] target/arm: Split out rebuild_hflags_a32 Richard Henderson
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 31+ messages in thread
From: Richard Henderson @ 2019-08-20 21:07 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

Hoist the computation of some TBFLAG_A32 bits that only apply to
M-profile under a single test for ARM_FEATURE_M.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.c | 49 +++++++++++++++++++++------------------------
 1 file changed, 23 insertions(+), 26 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index c36ec6c530..570f2dcc98 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -11156,6 +11156,29 @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
 
         if (arm_feature(env, ARM_FEATURE_M)) {
             flags = rebuild_hflags_m32(env, fp_el, mmu_idx);
+
+            if (arm_feature(env, ARM_FEATURE_M_SECURITY) &&
+                FIELD_EX32(env->v7m.fpccr[M_REG_S], V7M_FPCCR, S)
+                != env->v7m.secure) {
+                flags = FIELD_DP32(flags, TBFLAG_A32, FPCCR_S_WRONG, 1);
+            }
+
+            if ((env->v7m.fpccr[env->v7m.secure] & R_V7M_FPCCR_ASPEN_MASK) &&
+                (!(env->v7m.control[M_REG_S] & R_V7M_CONTROL_FPCA_MASK) ||
+                 (env->v7m.secure &&
+                  !(env->v7m.control[M_REG_S] & R_V7M_CONTROL_SFPA_MASK)))) {
+                /*
+                 * ASPEN is set, but FPCA/SFPA indicate that there is no
+                 * active FP context; we must create a new FP context before
+                 * executing any FP insn.
+                 */
+                flags = FIELD_DP32(flags, TBFLAG_A32, NEW_FP_CTXT_NEEDED, 1);
+            }
+
+            bool is_secure = env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_S_MASK;
+            if (env->v7m.fpccr[is_secure] & R_V7M_FPCCR_LSPACT_MASK) {
+                flags = FIELD_DP32(flags, TBFLAG_A32, LSPACT, 1);
+            }
         } else {
             flags = rebuild_hflags_common_32(env, fp_el, mmu_idx, 0);
         }
@@ -11195,32 +11218,6 @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
         }
     }
 
-    if (arm_feature(env, ARM_FEATURE_M_SECURITY) &&
-        FIELD_EX32(env->v7m.fpccr[M_REG_S], V7M_FPCCR, S) != env->v7m.secure) {
-        flags = FIELD_DP32(flags, TBFLAG_A32, FPCCR_S_WRONG, 1);
-    }
-
-    if (arm_feature(env, ARM_FEATURE_M) &&
-        (env->v7m.fpccr[env->v7m.secure] & R_V7M_FPCCR_ASPEN_MASK) &&
-        (!(env->v7m.control[M_REG_S] & R_V7M_CONTROL_FPCA_MASK) ||
-         (env->v7m.secure &&
-          !(env->v7m.control[M_REG_S] & R_V7M_CONTROL_SFPA_MASK)))) {
-        /*
-         * ASPEN is set, but FPCA/SFPA indicate that there is no active
-         * FP context; we must create a new FP context before executing
-         * any FP insn.
-         */
-        flags = FIELD_DP32(flags, TBFLAG_A32, NEW_FP_CTXT_NEEDED, 1);
-    }
-
-    if (arm_feature(env, ARM_FEATURE_M)) {
-        bool is_secure = env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_S_MASK;
-
-        if (env->v7m.fpccr[is_secure] & R_V7M_FPCCR_LSPACT_MASK) {
-            flags = FIELD_DP32(flags, TBFLAG_A32, LSPACT, 1);
-        }
-    }
-
     if (!arm_feature(env, ARM_FEATURE_M)) {
         int target_el = arm_debug_target_el(env);
 
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH v5 07/17] target/arm: Split out rebuild_hflags_a32
  2019-08-20 21:07 [Qemu-devel] [PATCH v5 00/17] target/arm: Reduce overhead of cpu_get_tb_cpu_state Richard Henderson
                   ` (5 preceding siblings ...)
  2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 06/17] target/arm: Reduce tests vs M-profile in cpu_get_tb_cpu_state Richard Henderson
@ 2019-08-20 21:07 ` Richard Henderson
  2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 08/17] target/arm: Split out rebuild_hflags_aprofile Richard Henderson
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 31+ messages in thread
From: Richard Henderson @ 2019-08-20 21:07 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

Currently a trivial wrapper for rebuild_hflags_common_32.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 570f2dcc98..0fea62dc12 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -11068,6 +11068,12 @@ static uint32_t rebuild_hflags_m32(CPUARMState *env, int fp_el,
     return rebuild_hflags_common_32(env, fp_el, mmu_idx, flags);
 }
 
+static uint32_t rebuild_hflags_a32(CPUARMState *env, int fp_el,
+                                   ARMMMUIdx mmu_idx)
+{
+    return rebuild_hflags_common_32(env, fp_el, mmu_idx, 0);
+}
+
 static uint32_t rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
                                    ARMMMUIdx mmu_idx)
 {
@@ -11180,7 +11186,7 @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
                 flags = FIELD_DP32(flags, TBFLAG_A32, LSPACT, 1);
             }
         } else {
-            flags = rebuild_hflags_common_32(env, fp_el, mmu_idx, 0);
+            flags = rebuild_hflags_a32(env, fp_el, mmu_idx);
         }
 
         flags = FIELD_DP32(flags, TBFLAG_A32, THUMB, env->thumb);
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH v5 08/17] target/arm: Split out rebuild_hflags_aprofile
  2019-08-20 21:07 [Qemu-devel] [PATCH v5 00/17] target/arm: Reduce overhead of cpu_get_tb_cpu_state Richard Henderson
                   ` (6 preceding siblings ...)
  2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 07/17] target/arm: Split out rebuild_hflags_a32 Richard Henderson
@ 2019-08-20 21:07 ` Richard Henderson
  2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 09/17] target/arm: Hoist XSCALE_CPAR, VECLEN, VECSTRIDE in cpu_get_tb_cpu_state Richard Henderson
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 31+ messages in thread
From: Richard Henderson @ 2019-08-20 21:07 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

Create a function to compute the values of the TBFLAG_ANY bits
that will be cached, and are used by A-profile.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 0fea62dc12..fc071f95db 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -11068,18 +11068,28 @@ static uint32_t rebuild_hflags_m32(CPUARMState *env, int fp_el,
     return rebuild_hflags_common_32(env, fp_el, mmu_idx, flags);
 }
 
+static uint32_t rebuild_hflags_aprofile(CPUARMState *env)
+{
+    int flags = 0;
+
+    flags = FIELD_DP32(flags, TBFLAG_ANY, DEBUG_TARGET_EL,
+                       arm_debug_target_el(env));
+    return flags;
+}
+
 static uint32_t rebuild_hflags_a32(CPUARMState *env, int fp_el,
                                    ARMMMUIdx mmu_idx)
 {
-    return rebuild_hflags_common_32(env, fp_el, mmu_idx, 0);
+    uint32_t flags = rebuild_hflags_aprofile(env);
+    return rebuild_hflags_common_32(env, fp_el, mmu_idx, flags);
 }
 
 static uint32_t rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
                                    ARMMMUIdx mmu_idx)
 {
+    uint32_t flags = rebuild_hflags_aprofile(env);
     ARMMMUIdx stage1 = stage_1_mmu_idx(mmu_idx);
     ARMVAParameters p0 = aa64_va_parameters_both(env, 0, stage1);
-    uint32_t flags = 0;
     uint64_t sctlr;
     int tbii, tbid;
 
@@ -11224,12 +11234,6 @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
         }
     }
 
-    if (!arm_feature(env, ARM_FEATURE_M)) {
-        int target_el = arm_debug_target_el(env);
-
-        flags = FIELD_DP32(flags, TBFLAG_ANY, DEBUG_TARGET_EL, target_el);
-    }
-
     *pflags = flags;
     *cs_base = 0;
 }
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH v5 09/17] target/arm: Hoist XSCALE_CPAR, VECLEN, VECSTRIDE in cpu_get_tb_cpu_state
  2019-08-20 21:07 [Qemu-devel] [PATCH v5 00/17] target/arm: Reduce overhead of cpu_get_tb_cpu_state Richard Henderson
                   ` (7 preceding siblings ...)
  2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 08/17] target/arm: Split out rebuild_hflags_aprofile Richard Henderson
@ 2019-08-20 21:07 ` Richard Henderson
  2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 10/17] target/arm: Simplify set of PSTATE_SS " Richard Henderson
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 31+ messages in thread
From: Richard Henderson @ 2019-08-20 21:07 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

We do not need to compute any of these values for M-profile.
Further, XSCALE_CPAR overlaps VECSTRIDE so obviously the two
sets must be mutually exclusive.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index fc071f95db..3889b9295a 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -11197,21 +11197,28 @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
             }
         } else {
             flags = rebuild_hflags_a32(env, fp_el, mmu_idx);
+
+            /*
+             * Note that XSCALE_CPAR shares bits with VECSTRIDE.
+             * Note that VECLEN+VECSTRIDE are RES0 for M-profile.
+             */
+            if (arm_feature(env, ARM_FEATURE_XSCALE)) {
+                flags = FIELD_DP32(flags, TBFLAG_A32,
+                                   XSCALE_CPAR, env->cp15.c15_cpar);
+            } else {
+                flags = FIELD_DP32(flags, TBFLAG_A32, VECLEN,
+                                   env->vfp.vec_len);
+                flags = FIELD_DP32(flags, TBFLAG_A32, VECSTRIDE,
+                                   env->vfp.vec_stride);
+            }
         }
 
         flags = FIELD_DP32(flags, TBFLAG_A32, THUMB, env->thumb);
-        flags = FIELD_DP32(flags, TBFLAG_A32, VECLEN, env->vfp.vec_len);
-        flags = FIELD_DP32(flags, TBFLAG_A32, VECSTRIDE, env->vfp.vec_stride);
         flags = FIELD_DP32(flags, TBFLAG_A32, CONDEXEC, env->condexec_bits);
         if (env->vfp.xregs[ARM_VFP_FPEXC] & (1 << 30)
             || arm_el_is_aa64(env, 1) || arm_feature(env, ARM_FEATURE_M)) {
             flags = FIELD_DP32(flags, TBFLAG_A32, VFPEN, 1);
         }
-        /* Note that XSCALE_CPAR shares bits with VECSTRIDE */
-        if (arm_feature(env, ARM_FEATURE_XSCALE)) {
-            flags = FIELD_DP32(flags, TBFLAG_A32,
-                               XSCALE_CPAR, env->cp15.c15_cpar);
-        }
     }
 
     /* The SS_ACTIVE and PSTATE_SS bits correspond to the state machine
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH v5 10/17] target/arm: Simplify set of PSTATE_SS in cpu_get_tb_cpu_state
  2019-08-20 21:07 [Qemu-devel] [PATCH v5 00/17] target/arm: Reduce overhead of cpu_get_tb_cpu_state Richard Henderson
                   ` (8 preceding siblings ...)
  2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 09/17] target/arm: Hoist XSCALE_CPAR, VECLEN, VECSTRIDE in cpu_get_tb_cpu_state Richard Henderson
@ 2019-08-20 21:07 ` Richard Henderson
  2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 11/17] target/arm: Hoist computation of TBFLAG_A32.VFPEN Richard Henderson
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 31+ messages in thread
From: Richard Henderson @ 2019-08-20 21:07 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

Hoist the variable load for PSTATE into the existing test vs is_a64.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.c | 20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 3889b9295a..c01b3027e0 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -11159,7 +11159,7 @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
     ARMMMUIdx mmu_idx = arm_mmu_idx(env);
     int current_el = arm_current_el(env);
     int fp_el = fp_exception_el(env, current_el);
-    uint32_t flags;
+    uint32_t flags, pstate_for_ss;
 
     if (is_a64(env)) {
         *pc = env->pc;
@@ -11167,6 +11167,7 @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
         if (cpu_isar_feature(aa64_bti, env_archcpu(env))) {
             flags = FIELD_DP32(flags, TBFLAG_A64, BTYPE, env->btype);
         }
+        pstate_for_ss = env->pstate;
     } else {
         *pc = env->regs[15];
 
@@ -11219,9 +11220,11 @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
             || arm_el_is_aa64(env, 1) || arm_feature(env, ARM_FEATURE_M)) {
             flags = FIELD_DP32(flags, TBFLAG_A32, VFPEN, 1);
         }
+        pstate_for_ss = env->uncached_cpsr;
     }
 
-    /* The SS_ACTIVE and PSTATE_SS bits correspond to the state machine
+    /*
+     * The SS_ACTIVE and PSTATE_SS bits correspond to the state machine
      * states defined in the ARM ARM for software singlestep:
      *  SS_ACTIVE   PSTATE.SS   State
      *     0            x       Inactive (the TB flag for SS is always 0)
@@ -11229,16 +11232,9 @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
      *     1            1       Active-not-pending
      * SS_ACTIVE is set in hflags; PSTATE_SS is computed every TB.
      */
-    if (FIELD_EX32(flags, TBFLAG_ANY, SS_ACTIVE)) {
-        if (is_a64(env)) {
-            if (env->pstate & PSTATE_SS) {
-                flags = FIELD_DP32(flags, TBFLAG_ANY, PSTATE_SS, 1);
-            }
-        } else {
-            if (env->uncached_cpsr & PSTATE_SS) {
-                flags = FIELD_DP32(flags, TBFLAG_ANY, PSTATE_SS, 1);
-            }
-        }
+    if (FIELD_EX32(flags, TBFLAG_ANY, SS_ACTIVE) &&
+        (pstate_for_ss & PSTATE_SS)) {
+        flags = FIELD_DP32(flags, TBFLAG_ANY, PSTATE_SS, 1);
     }
 
     *pflags = flags;
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH v5 11/17] target/arm: Hoist computation of TBFLAG_A32.VFPEN
  2019-08-20 21:07 [Qemu-devel] [PATCH v5 00/17] target/arm: Reduce overhead of cpu_get_tb_cpu_state Richard Henderson
                   ` (9 preceding siblings ...)
  2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 10/17] target/arm: Simplify set of PSTATE_SS " Richard Henderson
@ 2019-08-20 21:07 ` Richard Henderson
  2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 12/17] target/arm: Add arm_rebuild_hflags Richard Henderson
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 31+ messages in thread
From: Richard Henderson @ 2019-08-20 21:07 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

There are 3 conditions that each enable this flag.  M-profile always
enables; A-profile with EL1 as AA64 always enables.  Both of these
conditions can easily be cached.  The final condition relies on the
FPEXC register which we are not prepared to cache.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu.h    |  2 +-
 target/arm/helper.c | 14 ++++++++++----
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 5dec4d3b3a..9606222942 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -3188,7 +3188,7 @@ FIELD(TBFLAG_A32, XSCALE_CPAR, 4, 2)
  * the same thing as the current security state of the processor!
  */
 FIELD(TBFLAG_A32, NS, 6, 1)
-FIELD(TBFLAG_A32, VFPEN, 7, 1)          /* Not cached. */
+FIELD(TBFLAG_A32, VFPEN, 7, 1)          /* Partially cached, minus FPEXC. */
 FIELD(TBFLAG_A32, CONDEXEC, 8, 8)       /* Not cached. */
 FIELD(TBFLAG_A32, SCTLR_B, 16, 1)
 /* For M profile only, set if FPCCR.LSPACT is set */
diff --git a/target/arm/helper.c b/target/arm/helper.c
index c01b3027e0..b905d61898 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -11050,6 +11050,9 @@ static uint32_t rebuild_hflags_m32(CPUARMState *env, int fp_el,
 {
     uint32_t flags = 0;
 
+    /* v8M always enables the fpu.  */
+    flags = FIELD_DP32(flags, TBFLAG_A32, VFPEN, 1);
+
     if (arm_v7m_is_handler_mode(env)) {
         flags = FIELD_DP32(flags, TBFLAG_A32, HANDLER, 1);
     }
@@ -11081,6 +11084,10 @@ static uint32_t rebuild_hflags_a32(CPUARMState *env, int fp_el,
                                    ARMMMUIdx mmu_idx)
 {
     uint32_t flags = rebuild_hflags_aprofile(env);
+
+    if (arm_el_is_aa64(env, 1)) {
+        flags = FIELD_DP32(flags, TBFLAG_A32, VFPEN, 1);
+    }
     return rebuild_hflags_common_32(env, fp_el, mmu_idx, flags);
 }
 
@@ -11212,14 +11219,13 @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
                 flags = FIELD_DP32(flags, TBFLAG_A32, VECSTRIDE,
                                    env->vfp.vec_stride);
             }
+            if (env->vfp.xregs[ARM_VFP_FPEXC] & (1 << 30)) {
+                flags = FIELD_DP32(flags, TBFLAG_A32, VFPEN, 1);
+            }
         }
 
         flags = FIELD_DP32(flags, TBFLAG_A32, THUMB, env->thumb);
         flags = FIELD_DP32(flags, TBFLAG_A32, CONDEXEC, env->condexec_bits);
-        if (env->vfp.xregs[ARM_VFP_FPEXC] & (1 << 30)
-            || arm_el_is_aa64(env, 1) || arm_feature(env, ARM_FEATURE_M)) {
-            flags = FIELD_DP32(flags, TBFLAG_A32, VFPEN, 1);
-        }
         pstate_for_ss = env->uncached_cpsr;
     }
 
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH v5 12/17] target/arm: Add arm_rebuild_hflags
  2019-08-20 21:07 [Qemu-devel] [PATCH v5 00/17] target/arm: Reduce overhead of cpu_get_tb_cpu_state Richard Henderson
                   ` (10 preceding siblings ...)
  2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 11/17] target/arm: Hoist computation of TBFLAG_A32.VFPEN Richard Henderson
@ 2019-08-20 21:07 ` Richard Henderson
  2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 13/17] target/arm: Split out arm_mmu_idx_el Richard Henderson
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 31+ messages in thread
From: Richard Henderson @ 2019-08-20 21:07 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

This function assumes nothing about the current state of the cpu,
and writes the computed value to env->hflags.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu.h    |  6 ++++++
 target/arm/helper.c | 30 ++++++++++++++++++++++--------
 2 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 9606222942..602c879395 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -3293,6 +3293,12 @@ void arm_register_pre_el_change_hook(ARMCPU *cpu, ARMELChangeHookFn *hook,
 void arm_register_el_change_hook(ARMCPU *cpu, ARMELChangeHookFn *hook, void
         *opaque);
 
+/**
+ * arm_rebuild_hflags:
+ * Rebuild the cached TBFLAGS for arbitrary changed processor state.
+ */
+void arm_rebuild_hflags(CPUARMState *env);
+
 /**
  * aa32_vfp_dreg:
  * Return a pointer to the Dn register within env in 32-bit mode.
diff --git a/target/arm/helper.c b/target/arm/helper.c
index b905d61898..83ae33dae5 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -11160,17 +11160,35 @@ static uint32_t rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
     return rebuild_hflags_common(env, fp_el, mmu_idx, flags);
 }
 
+static uint32_t rebuild_hflags_internal(CPUARMState *env)
+{
+    int el = arm_current_el(env);
+    int fp_el = fp_exception_el(env, el);
+    ARMMMUIdx mmu_idx = arm_mmu_idx(env);
+
+    if (is_a64(env)) {
+        return rebuild_hflags_a64(env, el, fp_el, mmu_idx);
+    } else if (arm_feature(env, ARM_FEATURE_M)) {
+        return rebuild_hflags_m32(env, fp_el, mmu_idx);
+    } else {
+        return rebuild_hflags_a32(env, fp_el, mmu_idx);
+    }
+}
+
+void arm_rebuild_hflags(CPUARMState *env)
+{
+    env->hflags = rebuild_hflags_internal(env);
+}
+
 void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
                           target_ulong *cs_base, uint32_t *pflags)
 {
-    ARMMMUIdx mmu_idx = arm_mmu_idx(env);
-    int current_el = arm_current_el(env);
-    int fp_el = fp_exception_el(env, current_el);
     uint32_t flags, pstate_for_ss;
 
+    flags = rebuild_hflags_internal(env);
+
     if (is_a64(env)) {
         *pc = env->pc;
-        flags = rebuild_hflags_a64(env, current_el, fp_el, mmu_idx);
         if (cpu_isar_feature(aa64_bti, env_archcpu(env))) {
             flags = FIELD_DP32(flags, TBFLAG_A64, BTYPE, env->btype);
         }
@@ -11179,8 +11197,6 @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
         *pc = env->regs[15];
 
         if (arm_feature(env, ARM_FEATURE_M)) {
-            flags = rebuild_hflags_m32(env, fp_el, mmu_idx);
-
             if (arm_feature(env, ARM_FEATURE_M_SECURITY) &&
                 FIELD_EX32(env->v7m.fpccr[M_REG_S], V7M_FPCCR, S)
                 != env->v7m.secure) {
@@ -11204,8 +11220,6 @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
                 flags = FIELD_DP32(flags, TBFLAG_A32, LSPACT, 1);
             }
         } else {
-            flags = rebuild_hflags_a32(env, fp_el, mmu_idx);
-
             /*
              * Note that XSCALE_CPAR shares bits with VECSTRIDE.
              * Note that VECLEN+VECSTRIDE are RES0 for M-profile.
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH v5 13/17] target/arm: Split out arm_mmu_idx_el
  2019-08-20 21:07 [Qemu-devel] [PATCH v5 00/17] target/arm: Reduce overhead of cpu_get_tb_cpu_state Richard Henderson
                   ` (11 preceding siblings ...)
  2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 12/17] target/arm: Add arm_rebuild_hflags Richard Henderson
@ 2019-08-20 21:07 ` Richard Henderson
  2019-09-06  7:12   ` Philippe Mathieu-Daudé
  2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 14/17] target/arm: Hoist store to cs_base in cpu_get_tb_cpu_state Richard Henderson
                   ` (4 subsequent siblings)
  17 siblings, 1 reply; 31+ messages in thread
From: Richard Henderson @ 2019-08-20 21:07 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

Avoid calling arm_current_el() twice.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/internals.h |  9 +++++++++
 target/arm/helper.c    | 12 +++++++-----
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index 232d963875..f5313dd3d4 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -949,6 +949,15 @@ void arm_cpu_update_virq(ARMCPU *cpu);
  */
 void arm_cpu_update_vfiq(ARMCPU *cpu);
 
+/**
+ * arm_mmu_idx_el:
+ * @env: The cpu environment
+ * @el: The EL to use.
+ *
+ * Return the full ARMMMUIdx for the translation regime for EL.
+ */
+ARMMMUIdx arm_mmu_idx_el(CPUARMState *env, int el);
+
 /**
  * arm_mmu_idx:
  * @env: The cpu environment
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 83ae33dae5..19bdb9b9d6 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -10988,15 +10988,12 @@ ARMMMUIdx arm_v7m_mmu_idx_for_secstate(CPUARMState *env, bool secstate)
 }
 #endif
 
-ARMMMUIdx arm_mmu_idx(CPUARMState *env)
+ARMMMUIdx arm_mmu_idx_el(CPUARMState *env, int el)
 {
-    int el;
-
     if (arm_feature(env, ARM_FEATURE_M)) {
         return arm_v7m_mmu_idx_for_secstate(env, env->v7m.secure);
     }
 
-    el = arm_current_el(env);
     if (el < 2 && arm_is_secure_below_el3(env)) {
         return ARMMMUIdx_S1SE0 + el;
     } else {
@@ -11004,6 +11001,11 @@ ARMMMUIdx arm_mmu_idx(CPUARMState *env)
     }
 }
 
+ARMMMUIdx arm_mmu_idx(CPUARMState *env)
+{
+    return arm_mmu_idx_el(env, arm_current_el(env));
+}
+
 int cpu_mmu_index(CPUARMState *env, bool ifetch)
 {
     return arm_to_core_mmu_idx(arm_mmu_idx(env));
@@ -11164,7 +11166,7 @@ static uint32_t rebuild_hflags_internal(CPUARMState *env)
 {
     int el = arm_current_el(env);
     int fp_el = fp_exception_el(env, el);
-    ARMMMUIdx mmu_idx = arm_mmu_idx(env);
+    ARMMMUIdx mmu_idx = arm_mmu_idx_el(env, el);
 
     if (is_a64(env)) {
         return rebuild_hflags_a64(env, el, fp_el, mmu_idx);
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH v5 14/17] target/arm: Hoist store to cs_base in cpu_get_tb_cpu_state
  2019-08-20 21:07 [Qemu-devel] [PATCH v5 00/17] target/arm: Reduce overhead of cpu_get_tb_cpu_state Richard Henderson
                   ` (12 preceding siblings ...)
  2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 13/17] target/arm: Split out arm_mmu_idx_el Richard Henderson
@ 2019-08-20 21:07 ` Richard Henderson
  2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 15/17] target/arm: Add HELPER(rebuild_hflags_{a32, a64, m32}) Richard Henderson
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 31+ messages in thread
From: Richard Henderson @ 2019-08-20 21:07 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

By performing this store early, we avoid having to save and restore
the register holding the address around any function calls.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 19bdb9b9d6..7a94495788 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -11187,6 +11187,7 @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
 {
     uint32_t flags, pstate_for_ss;
 
+    *cs_base = 0;
     flags = rebuild_hflags_internal(env);
 
     if (is_a64(env)) {
@@ -11260,7 +11261,6 @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
     }
 
     *pflags = flags;
-    *cs_base = 0;
 }
 
 #ifdef TARGET_AARCH64
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH v5 15/17] target/arm: Add HELPER(rebuild_hflags_{a32, a64, m32})
  2019-08-20 21:07 [Qemu-devel] [PATCH v5 00/17] target/arm: Reduce overhead of cpu_get_tb_cpu_state Richard Henderson
                   ` (13 preceding siblings ...)
  2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 14/17] target/arm: Hoist store to cs_base in cpu_get_tb_cpu_state Richard Henderson
@ 2019-08-20 21:07 ` Richard Henderson
  2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 16/17] target/arm: Rebuild hflags at EL changes and MSR writes Richard Henderson
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 31+ messages in thread
From: Richard Henderson @ 2019-08-20 21:07 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

This functions are given the mode and el state of the cpu
and writes the computed value to env->hflags.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.h |  4 ++++
 target/arm/helper.c | 24 ++++++++++++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index 1fb2cb5a77..3d4ec267a2 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -90,6 +90,10 @@ DEF_HELPER_4(msr_banked, void, env, i32, i32, i32)
 DEF_HELPER_2(get_user_reg, i32, env, i32)
 DEF_HELPER_3(set_user_reg, void, env, i32, i32)
 
+DEF_HELPER_FLAGS_2(rebuild_hflags_m32, TCG_CALL_NO_RWG, void, env, int)
+DEF_HELPER_FLAGS_2(rebuild_hflags_a32, TCG_CALL_NO_RWG, void, env, int)
+DEF_HELPER_FLAGS_2(rebuild_hflags_a64, TCG_CALL_NO_RWG, void, env, int)
+
 DEF_HELPER_1(vfp_get_fpscr, i32, env)
 DEF_HELPER_2(vfp_set_fpscr, void, env, i32)
 
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 7a94495788..53a7bd796b 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -11182,6 +11182,30 @@ void arm_rebuild_hflags(CPUARMState *env)
     env->hflags = rebuild_hflags_internal(env);
 }
 
+void HELPER(rebuild_hflags_m32)(CPUARMState *env, int el)
+{
+    int fp_el = fp_exception_el(env, el);
+    ARMMMUIdx mmu_idx = arm_mmu_idx_el(env, el);
+
+    env->hflags = rebuild_hflags_m32(env, fp_el, mmu_idx);
+}
+
+void HELPER(rebuild_hflags_a32)(CPUARMState *env, int el)
+{
+    int fp_el = fp_exception_el(env, el);
+    ARMMMUIdx mmu_idx = arm_mmu_idx_el(env, el);
+
+    env->hflags = rebuild_hflags_a32(env, fp_el, mmu_idx);
+}
+
+void HELPER(rebuild_hflags_a64)(CPUARMState *env, int el)
+{
+    int fp_el = fp_exception_el(env, el);
+    ARMMMUIdx mmu_idx = arm_mmu_idx_el(env, el);
+
+    env->hflags = rebuild_hflags_a64(env, el, fp_el, mmu_idx);
+}
+
 void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
                           target_ulong *cs_base, uint32_t *pflags)
 {
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH v5 16/17] target/arm: Rebuild hflags at EL changes and MSR writes
  2019-08-20 21:07 [Qemu-devel] [PATCH v5 00/17] target/arm: Reduce overhead of cpu_get_tb_cpu_state Richard Henderson
                   ` (14 preceding siblings ...)
  2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 15/17] target/arm: Add HELPER(rebuild_hflags_{a32, a64, m32}) Richard Henderson
@ 2019-08-20 21:07 ` Richard Henderson
  2019-09-05 13:53   ` Alex Bennée
  2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 17/17] target/arm: Rely on hflags correct in cpu_get_tb_cpu_state Richard Henderson
  2019-08-20 23:54 ` [Qemu-devel] [PATCH v5 00/17] target/arm: Reduce overhead of cpu_get_tb_cpu_state Richard Henderson
  17 siblings, 1 reply; 31+ messages in thread
From: Richard Henderson @ 2019-08-20 21:07 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

Now setting, but not relying upon, env->hflags.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 linux-user/syscall.c       |  1 +
 target/arm/cpu.c           |  1 +
 target/arm/helper-a64.c    |  3 +++
 target/arm/helper.c        |  2 ++
 target/arm/machine.c       |  1 +
 target/arm/op_helper.c     |  1 +
 target/arm/translate-a64.c |  6 +++++-
 target/arm/translate.c     | 18 ++++++++++++++++--
 8 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 8b41a03901..be01c33759 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -9980,6 +9980,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
                     aarch64_sve_narrow_vq(env, vq);
                 }
                 env->vfp.zcr_el[1] = vq - 1;
+                arm_rebuild_hflags(env);
                 ret = vq * 16;
             }
             return ret;
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 2399c14471..d043e75166 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -406,6 +406,7 @@ static void arm_cpu_reset(CPUState *s)
 
     hw_breakpoint_update_all(cpu);
     hw_watchpoint_update_all(cpu);
+    arm_rebuild_hflags(env);
 }
 
 bool arm_cpu_exec_interrupt(CPUState *cs, int interrupt_request)
diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
index bca80bdc38..b4cd680fc4 100644
--- a/target/arm/helper-a64.c
+++ b/target/arm/helper-a64.c
@@ -1025,6 +1025,7 @@ void HELPER(exception_return)(CPUARMState *env, uint64_t new_pc)
         } else {
             env->regs[15] = new_pc & ~0x3;
         }
+        helper_rebuild_hflags_a32(env, new_el);
         qemu_log_mask(CPU_LOG_INT, "Exception return from AArch64 EL%d to "
                       "AArch32 EL%d PC 0x%" PRIx32 "\n",
                       cur_el, new_el, env->regs[15]);
@@ -1036,10 +1037,12 @@ void HELPER(exception_return)(CPUARMState *env, uint64_t new_pc)
         }
         aarch64_restore_sp(env, new_el);
         env->pc = new_pc;
+        helper_rebuild_hflags_a64(env, new_el);
         qemu_log_mask(CPU_LOG_INT, "Exception return from AArch64 EL%d to "
                       "AArch64 EL%d PC 0x%" PRIx64 "\n",
                       cur_el, new_el, env->pc);
     }
+
     /*
      * Note that cur_el can never be 0.  If new_el is 0, then
      * el0_a64 is return_to_aa64, else el0_a64 is ignored.
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 53a7bd796b..d1bf71a260 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -7908,6 +7908,7 @@ static void take_aarch32_exception(CPUARMState *env, int new_mode,
         env->regs[14] = env->regs[15] + offset;
     }
     env->regs[15] = newpc;
+    arm_rebuild_hflags(env);
 }
 
 static void arm_cpu_do_interrupt_aarch32_hyp(CPUState *cs)
@@ -8255,6 +8256,7 @@ static void arm_cpu_do_interrupt_aarch64(CPUState *cs)
     pstate_write(env, PSTATE_DAIF | new_mode);
     env->aarch64 = 1;
     aarch64_restore_sp(env, new_el);
+    helper_rebuild_hflags_a64(env, new_el);
 
     env->pc = addr;
 
diff --git a/target/arm/machine.c b/target/arm/machine.c
index 5c36707a7c..eb28b2381b 100644
--- a/target/arm/machine.c
+++ b/target/arm/machine.c
@@ -756,6 +756,7 @@ static int cpu_post_load(void *opaque, int version_id)
     if (!kvm_enabled()) {
         pmu_op_finish(&cpu->env);
     }
+    arm_rebuild_hflags(&cpu->env);
 
     return 0;
 }
diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
index 0fd4bd0238..ccc2cecb46 100644
--- a/target/arm/op_helper.c
+++ b/target/arm/op_helper.c
@@ -404,6 +404,7 @@ void HELPER(cpsr_write_eret)(CPUARMState *env, uint32_t val)
      * state. Do the masking now.
      */
     env->regs[15] &= (env->thumb ? ~1 : ~3);
+    arm_rebuild_hflags(env);
 
     qemu_mutex_lock_iothread();
     arm_call_el_change_hook(env_archcpu(env));
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index fc3e5f5c38..4412c60383 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -1777,11 +1777,15 @@ static void handle_sys(DisasContext *s, uint32_t insn, bool isread,
         /* I/O operations must end the TB here (whether read or write) */
         gen_io_end();
         s->base.is_jmp = DISAS_UPDATE;
-    } else if (!isread && !(ri->type & ARM_CP_SUPPRESS_TB_END)) {
+    }
+    if (!isread && !(ri->type & ARM_CP_SUPPRESS_TB_END)) {
         /* We default to ending the TB on a coprocessor register write,
          * but allow this to be suppressed by the register definition
          * (usually only necessary to work around guest bugs).
          */
+        TCGv_i32 tcg_el = tcg_const_i32(s->current_el);
+        gen_helper_rebuild_hflags_a64(cpu_env, tcg_el);
+        tcg_temp_free_i32(tcg_el);
         s->base.is_jmp = DISAS_UPDATE;
     }
 }
diff --git a/target/arm/translate.c b/target/arm/translate.c
index d948757131..2f7beca065 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -7130,6 +7130,8 @@ static int disas_coproc_insn(DisasContext *s, uint32_t insn)
     ri = get_arm_cp_reginfo(s->cp_regs,
             ENCODE_CP_REG(cpnum, is64, s->ns, crn, crm, opc1, opc2));
     if (ri) {
+        bool need_exit_tb;
+
         /* Check access permissions */
         if (!cp_access_ok(s->current_el, ri, isread)) {
             return 1;
@@ -7301,15 +7303,27 @@ static int disas_coproc_insn(DisasContext *s, uint32_t insn)
             }
         }
 
+        need_exit_tb = false;
         if ((tb_cflags(s->base.tb) & CF_USE_ICOUNT) && (ri->type & ARM_CP_IO)) {
             /* I/O operations must end the TB here (whether read or write) */
             gen_io_end();
-            gen_lookup_tb(s);
-        } else if (!isread && !(ri->type & ARM_CP_SUPPRESS_TB_END)) {
+            need_exit_tb = true;
+        }
+        if (!isread && !(ri->type & ARM_CP_SUPPRESS_TB_END)) {
             /* We default to ending the TB on a coprocessor register write,
              * but allow this to be suppressed by the register definition
              * (usually only necessary to work around guest bugs).
              */
+            TCGv_i32 tcg_el = tcg_const_i32(s->current_el);
+            if (arm_dc_feature(s, ARM_FEATURE_M)) {
+                gen_helper_rebuild_hflags_m32(cpu_env, tcg_el);
+            } else {
+                gen_helper_rebuild_hflags_a32(cpu_env, tcg_el);
+            }
+            tcg_temp_free_i32(tcg_el);
+            need_exit_tb = true;
+        }
+        if (need_exit_tb) {
             gen_lookup_tb(s);
         }
 
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH v5 17/17] target/arm: Rely on hflags correct in cpu_get_tb_cpu_state
  2019-08-20 21:07 [Qemu-devel] [PATCH v5 00/17] target/arm: Reduce overhead of cpu_get_tb_cpu_state Richard Henderson
                   ` (15 preceding siblings ...)
  2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 16/17] target/arm: Rebuild hflags at EL changes and MSR writes Richard Henderson
@ 2019-08-20 21:07 ` Richard Henderson
  2019-09-05 15:23   ` [Qemu-devel] [Qemu-arm] " Alex Bennée
  2019-08-20 23:54 ` [Qemu-devel] [PATCH v5 00/17] target/arm: Reduce overhead of cpu_get_tb_cpu_state Richard Henderson
  17 siblings, 1 reply; 31+ messages in thread
From: Richard Henderson @ 2019-08-20 21:07 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

This is the payoff.

From perf record -g data of ubuntu 18 boot and shutdown:

BEFORE:

-   23.02%     2.82%  qemu-system-aar  [.] helper_lookup_tb_ptr
   - 20.22% helper_lookup_tb_ptr
      + 10.05% tb_htable_lookup
      - 9.13% cpu_get_tb_cpu_state
           3.20% aa64_va_parameters_both
           0.55% fp_exception_el

-   11.66%     4.74%  qemu-system-aar  [.] cpu_get_tb_cpu_state
   - 6.96% cpu_get_tb_cpu_state
        3.63% aa64_va_parameters_both
        0.60% fp_exception_el
        0.53% sve_exception_el

AFTER:

-   16.40%     3.40%  qemu-system-aar  [.] helper_lookup_tb_ptr
   - 13.03% helper_lookup_tb_ptr
      + 11.19% tb_htable_lookup
        0.55% cpu_get_tb_cpu_state

     0.98%     0.71%  qemu-system-aar  [.] cpu_get_tb_cpu_state

     0.87%     0.24%  qemu-system-aar  [.] rebuild_hflags_a64

Before, helper_lookup_tb_ptr is the second hottest function in the
application, consuming almost a quarter of the runtime.  Within the
entire execution, cpu_get_tb_cpu_state consumes about 12%.

After, helper_lookup_tb_ptr has dropped to the fourth hottest function,
with consumption dropping to a sixth of the runtime.  Within the
entire execution, cpu_get_tb_cpu_state has dropped below 1%, and the
supporting function to rebuild hflags also consumes about 1%.

Assertions are retained for --enable-debug-tcg.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
v2: Retain asserts for future debugging.
---
 target/arm/helper.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index d1bf71a260..5e4f996882 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -11211,12 +11211,15 @@ void HELPER(rebuild_hflags_a64)(CPUARMState *env, int el)
 void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
                           target_ulong *cs_base, uint32_t *pflags)
 {
-    uint32_t flags, pstate_for_ss;
+    uint32_t flags = env->hflags;
+    uint32_t pstate_for_ss;
 
     *cs_base = 0;
-    flags = rebuild_hflags_internal(env);
+#ifdef CONFIG_TCG_DEBUG
+    assert(flags == rebuild_hflags_internal(env));
+#endif
 
-    if (is_a64(env)) {
+    if (FIELD_EX32(flags, TBFLAG_ANY, AARCH64_STATE)) {
         *pc = env->pc;
         if (cpu_isar_feature(aa64_bti, env_archcpu(env))) {
             flags = FIELD_DP32(flags, TBFLAG_A64, BTYPE, env->btype);
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH v5 00/17] target/arm: Reduce overhead of cpu_get_tb_cpu_state
  2019-08-20 21:07 [Qemu-devel] [PATCH v5 00/17] target/arm: Reduce overhead of cpu_get_tb_cpu_state Richard Henderson
                   ` (16 preceding siblings ...)
  2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 17/17] target/arm: Rely on hflags correct in cpu_get_tb_cpu_state Richard Henderson
@ 2019-08-20 23:54 ` Richard Henderson
  2019-09-04 10:48   ` Peter Maydell
  17 siblings, 1 reply; 31+ messages in thread
From: Richard Henderson @ 2019-08-20 23:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

[-- Attachment #1: Type: text/plain, Size: 842 bytes --]

On 8/20/19 2:07 PM, Richard Henderson wrote:
> Changes since v4:
>   * Split patch 1 into 15 smaller patches.
>   * Cache the new DEBUG_TARGET_EL field.
>   * Split out m-profile hflags separately from a-profile 32-bit.
>   * Move around non-cached tb flags as well, avoiding repetitive
>     checks for m-profile or other mutually exclusive conditions.

Just after I posted this, I started rebasing my VHE patch set on top, and I
found that the new DEBUG_TARGET_EL field has used The Last Bit, so that I could
not add the one bit that I need for VHE.

However, while working on this patch set, I noticed that we have a lot of
unnecessary overlap between A- and M- profile in the TBFLAGs.  Thus point 4
above and the completely separate rebuild_hflags_m32().

If we rearrange things like the appended, then we recover 4 bits.

Thoughts?


r~

[-- Attachment #2: m-tbflag.patch --]
[-- Type: text/x-patch, Size: 3083 bytes --]

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 91a54662c3..0c2803baa1 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -3183,38 +3183,50 @@ FIELD(TBFLAG_ANY, BE_DATA, 23, 1)
  */
 FIELD(TBFLAG_ANY, DEBUG_TARGET_EL, 21, 2)
 
-/* Bit usage when in AArch32 state: */
-FIELD(TBFLAG_A32, THUMB, 0, 1)          /* Not cached. */
-FIELD(TBFLAG_A32, VECLEN, 1, 3)         /* Not cached. */
-FIELD(TBFLAG_A32, VECSTRIDE, 4, 2)      /* Not cached. */
+/*
+ * Bit usage when in AArch32 state, both A- and M-profile.
+ */
+FIELD(TBFLAG_AM32, CONDEXEC, 0, 8)      /* Not cached. */
+
+/*
+ * Bit usage when in AArch32 state, for A-profile only.
+ */
+FIELD(TBFLAG_A32, THUMB, 8, 1)          /* Not cached. */
+FIELD(TBFLAG_A32, VECLEN, 9, 3)         /* Not cached. */
+FIELD(TBFLAG_A32, VECSTRIDE, 12, 2)     /* Not cached. */
 /*
  * We store the bottom two bits of the CPAR as TB flags and handle
  * checks on the other bits at runtime. This shares the same bits as
  * VECSTRIDE, which is OK as no XScale CPU has VFP.
  * Not cached, because VECLEN+VECSTRIDE are not cached.
  */
-FIELD(TBFLAG_A32, XSCALE_CPAR, 4, 2)
+FIELD(TBFLAG_A32, XSCALE_CPAR, 12, 2)
+FIELD(TBFLAG_A32, VFPEN, 14, 1)         /* Partially cached, minus FPEXC. */
+FIELD(TBFLAG_A32, SCTLR_B, 15, 1)
 /*
  * Indicates whether cp register reads and writes by guest code should access
  * the secure or nonsecure bank of banked registers; note that this is not
  * the same thing as the current security state of the processor!
  */
-FIELD(TBFLAG_A32, NS, 6, 1)
-FIELD(TBFLAG_A32, VFPEN, 7, 1)          /* Partially cached, minus FPEXC. */
-FIELD(TBFLAG_A32, CONDEXEC, 8, 8)       /* Not cached. */
-FIELD(TBFLAG_A32, SCTLR_B, 16, 1)
-/* For M profile only, set if FPCCR.LSPACT is set */
-FIELD(TBFLAG_A32, LSPACT, 18, 1)        /* Not cached. */
-/* For M profile only, set if we must create a new FP context */
-FIELD(TBFLAG_A32, NEW_FP_CTXT_NEEDED, 19, 1) /* Not cached. */
-/* For M profile only, set if FPCCR.S does not match current security state */
-FIELD(TBFLAG_A32, FPCCR_S_WRONG, 20, 1) /* Not cached. */
-/* For M profile only, Handler (ie not Thread) mode */
-FIELD(TBFLAG_A32, HANDLER, 21, 1)
-/* For M profile only, whether we should generate stack-limit checks */
-FIELD(TBFLAG_A32, STACKCHECK, 22, 1)
+FIELD(TBFLAG_A32, NS, 16, 1)
 
-/* Bit usage when in AArch64 state */
+/*
+ * Bit usage when in AArch32 state, for M-profile only.
+ */
+/* Set if FPCCR.LSPACT is set */
+FIELD(TBFLAG_M32, LSPACT, 8, 1)                  /* Not cached. */
+/* Set if we must create a new FP context */
+FIELD(TBFLAG_M32, NEW_FP_CTXT_NEEDED, 9, 1)      /* Not cached. */
+/* Set if FPCCR.S does not match current security state */
+FIELD(TBFLAG_M32, FPCCR_S_WRONG, 10, 1)          /* Not cached. */
+/* Handler (ie not Thread) mode */
+FIELD(TBFLAG_A32, HANDLER, 11, 1)
+/* Whether we should generate stack-limit checks */
+FIELD(TBFLAG_A32, STACKCHECK, 12, 1)
+
+/*
+ * Bit usage when in AArch64 state
+ */
 FIELD(TBFLAG_A64, TBII, 0, 2)
 FIELD(TBFLAG_A64, SVEEXC_EL, 2, 2)
 FIELD(TBFLAG_A64, ZCR_LEN, 4, 4)

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH v5 00/17] target/arm: Reduce overhead of cpu_get_tb_cpu_state
  2019-08-20 23:54 ` [Qemu-devel] [PATCH v5 00/17] target/arm: Reduce overhead of cpu_get_tb_cpu_state Richard Henderson
@ 2019-09-04 10:48   ` Peter Maydell
  2019-09-04 17:26     ` Richard Henderson
  0 siblings, 1 reply; 31+ messages in thread
From: Peter Maydell @ 2019-09-04 10:48 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-arm, QEMU Developers

On Wed, 21 Aug 2019 at 00:54, Richard Henderson
<richard.henderson@linaro.org> wrote:
> However, while working on this patch set, I noticed that we have a lot of
> unnecessary overlap between A- and M- profile in the TBFLAGs.  Thus point 4
> above and the completely separate rebuild_hflags_m32().
>
> If we rearrange things like the appended, then we recover 4 bits.

You can't make the THUMB bit A-profile only: we need it in
M-profile too, so we can correctly generate code that takes
the InvalidState exception for attempts to execute with the
Thumb bit not set.

If you want to make VFPEN be A-profile only you need to
do something so we don't look at it for M-profile: currently
we set it always-1 for M-profile so we don't trip the code
that causes us to take an exception if it's 0.

Otherwise seems reasonable. My overall question is: how bad
is it if we just start using bits in the cs_base word?
If we try to get too tricky with using the same bits for
different purposes it opens the door for accidentally writing
code where we use a bit that isn't actually set correctly
for all the situations where we're using it.

thanks
-- PMM


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH v5 00/17] target/arm: Reduce overhead of cpu_get_tb_cpu_state
  2019-09-04 10:48   ` Peter Maydell
@ 2019-09-04 17:26     ` Richard Henderson
  0 siblings, 0 replies; 31+ messages in thread
From: Richard Henderson @ 2019-09-04 17:26 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-arm, QEMU Developers

On 9/4/19 3:48 AM, Peter Maydell wrote:
> On Wed, 21 Aug 2019 at 00:54, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>> However, while working on this patch set, I noticed that we have a lot of
>> unnecessary overlap between A- and M- profile in the TBFLAGs.  Thus point 4
>> above and the completely separate rebuild_hflags_m32().
>>
>> If we rearrange things like the appended, then we recover 4 bits.
> 
> You can't make the THUMB bit A-profile only: we need it in
> M-profile too, so we can correctly generate code that takes
> the InvalidState exception for attempts to execute with the
> Thumb bit not set.

Yes, IIRC I found that when I went to make this work, rather than merely a
sketch through the header file.

> If you want to make VFPEN be A-profile only you need to
> do something so we don't look at it for M-profile: currently
> we set it always-1 for M-profile so we don't trip the code
> that causes us to take an exception if it's 0.
> 
> Otherwise seems reasonable. My overall question is: how bad
> is it if we just start using bits in the cs_base word?

*shrug* Even then we don't want to waste bits; there are only 32 of them
remaining, and after that there's no room at all for expansion.

> If we try to get too tricky with using the same bits for
> different purposes it opens the door for accidentally writing
> code where we use a bit that isn't actually set correctly
> for all the situations where we're using it.

Thankfully, we don't touch these bits in many places.  We set them in the
generator function, and we read them at the beginning of the translator.


r~



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH v5 16/17] target/arm: Rebuild hflags at EL changes and MSR writes
  2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 16/17] target/arm: Rebuild hflags at EL changes and MSR writes Richard Henderson
@ 2019-09-05 13:53   ` Alex Bennée
  0 siblings, 0 replies; 31+ messages in thread
From: Alex Bennée @ 2019-09-05 13:53 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm


Richard Henderson <richard.henderson@linaro.org> writes:

> Now setting, but not relying upon, env->hflags.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  linux-user/syscall.c       |  1 +
>  target/arm/cpu.c           |  1 +
>  target/arm/helper-a64.c    |  3 +++
>  target/arm/helper.c        |  2 ++
>  target/arm/machine.c       |  1 +
>  target/arm/op_helper.c     |  1 +
>  target/arm/translate-a64.c |  6 +++++-
>  target/arm/translate.c     | 18 ++++++++++++++++--
>  8 files changed, 30 insertions(+), 3 deletions(-)
>
> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
> index 8b41a03901..be01c33759 100644
<snip>

I had to manually fix these up due to the patch failing to apply. I
think because 9e9b10c64911 removes the gen_io_end() calls.

> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index fc3e5f5c38..4412c60383 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -1777,11 +1777,15 @@ static void handle_sys(DisasContext *s, uint32_t insn, bool isread,
>          /* I/O operations must end the TB here (whether read or write) */
>          gen_io_end();
>          s->base.is_jmp = DISAS_UPDATE;
> -    } else if (!isread && !(ri->type & ARM_CP_SUPPRESS_TB_END)) {
> +    }
> +    if (!isread && !(ri->type & ARM_CP_SUPPRESS_TB_END)) {
>          /* We default to ending the TB on a coprocessor register write,
>           * but allow this to be suppressed by the register definition
>           * (usually only necessary to work around guest bugs).
>           */
> +        TCGv_i32 tcg_el = tcg_const_i32(s->current_el);
> +        gen_helper_rebuild_hflags_a64(cpu_env, tcg_el);
> +        tcg_temp_free_i32(tcg_el);
>          s->base.is_jmp = DISAS_UPDATE;
>      }
>  }
> diff --git a/target/arm/translate.c b/target/arm/translate.c
> index d948757131..2f7beca065 100644
> --- a/target/arm/translate.c
> +++ b/target/arm/translate.c
> @@ -7130,6 +7130,8 @@ static int disas_coproc_insn(DisasContext *s, uint32_t insn)
>      ri = get_arm_cp_reginfo(s->cp_regs,
>              ENCODE_CP_REG(cpnum, is64, s->ns, crn, crm, opc1, opc2));
>      if (ri) {
> +        bool need_exit_tb;
> +
>          /* Check access permissions */
>          if (!cp_access_ok(s->current_el, ri, isread)) {
>              return 1;
> @@ -7301,15 +7303,27 @@ static int disas_coproc_insn(DisasContext *s, uint32_t insn)
>              }
>          }
>
> +        need_exit_tb = false;
>          if ((tb_cflags(s->base.tb) & CF_USE_ICOUNT) && (ri->type & ARM_CP_IO)) {
>              /* I/O operations must end the TB here (whether read or write) */
>              gen_io_end();
> -            gen_lookup_tb(s);
> -        } else if (!isread && !(ri->type & ARM_CP_SUPPRESS_TB_END)) {
> +            need_exit_tb = true;
> +        }
> +        if (!isread && !(ri->type & ARM_CP_SUPPRESS_TB_END)) {
>              /* We default to ending the TB on a coprocessor register write,
>               * but allow this to be suppressed by the register definition
>               * (usually only necessary to work around guest bugs).
>               */
> +            TCGv_i32 tcg_el = tcg_const_i32(s->current_el);
> +            if (arm_dc_feature(s, ARM_FEATURE_M)) {
> +                gen_helper_rebuild_hflags_m32(cpu_env, tcg_el);
> +            } else {
> +                gen_helper_rebuild_hflags_a32(cpu_env, tcg_el);
> +            }
> +            tcg_temp_free_i32(tcg_el);
> +            need_exit_tb = true;
> +        }
> +        if (need_exit_tb) {
>              gen_lookup_tb(s);
>          }


--
Alex Bennée


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [Qemu-arm] [PATCH v5 01/17] target/arm: Split out rebuild_hflags_common
  2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 01/17] target/arm: Split out rebuild_hflags_common Richard Henderson
@ 2019-09-05 13:58   ` Alex Bennée
  0 siblings, 0 replies; 31+ messages in thread
From: Alex Bennée @ 2019-09-05 13:58 UTC (permalink / raw)
  To: qemu-arm; +Cc: peter.maydell, qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> Create a function to compute the values of the TBFLAG_ANY bits
> that will be cached.  For now, the env->hflags variable is not
> used, and the results are fed back to cpu_get_tb_cpu_state.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  target/arm/cpu.h    | 29 ++++++++++++++++++-----------
>  target/arm/helper.c | 26 +++++++++++++++++++-------
>  2 files changed, 37 insertions(+), 18 deletions(-)
>
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index 0981303170..3dc52c032b 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -231,6 +231,9 @@ typedef struct CPUARMState {
>      uint32_t pstate;
>      uint32_t aarch64; /* 1 if CPU is in aarch64 state; inverse of PSTATE.nRW */
>
> +    /* Cached TBFLAGS state.  See below for which bits are included.  */
> +    uint32_t hflags;
> +
>      /* Frequently accessed CPSR bits are stored separately for efficiency.
>         This contains all the other bits.  Use cpsr_{read,write} to access
>         the whole CPSR.  */
> @@ -3136,15 +3139,18 @@ typedef ARMCPU ArchCPU;
>
>  #include "exec/cpu-all.h"
>
> -/* Bit usage in the TB flags field: bit 31 indicates whether we are
> +/*
> + * Bit usage in the TB flags field: bit 31 indicates whether we are
>   * in 32 or 64 bit mode. The meaning of the other bits depends on that.
>   * We put flags which are shared between 32 and 64 bit mode at the top
>   * of the word, and flags which apply to only one mode at the bottom.
> + *
> + * Unless otherwise noted, these bits are cached in env->hflags.
>   */
>  FIELD(TBFLAG_ANY, AARCH64_STATE, 31, 1)
>  FIELD(TBFLAG_ANY, MMUIDX, 28, 3)
>  FIELD(TBFLAG_ANY, SS_ACTIVE, 27, 1)
> -FIELD(TBFLAG_ANY, PSTATE_SS, 26, 1)
> +FIELD(TBFLAG_ANY, PSTATE_SS, 26, 1)     /* Not cached. */
>  /* Target EL if we take a floating-point-disabled exception */
>  FIELD(TBFLAG_ANY, FPEXC_EL, 24, 2)
>  FIELD(TBFLAG_ANY, BE_DATA, 23, 1)
> @@ -3155,13 +3161,14 @@ FIELD(TBFLAG_ANY, BE_DATA, 23, 1)
>  FIELD(TBFLAG_ANY, DEBUG_TARGET_EL, 21, 2)
>
>  /* Bit usage when in AArch32 state: */
> -FIELD(TBFLAG_A32, THUMB, 0, 1)
> -FIELD(TBFLAG_A32, VECLEN, 1, 3)
> -FIELD(TBFLAG_A32, VECSTRIDE, 4, 2)
> +FIELD(TBFLAG_A32, THUMB, 0, 1)          /* Not cached. */
> +FIELD(TBFLAG_A32, VECLEN, 1, 3)         /* Not cached. */
> +FIELD(TBFLAG_A32, VECSTRIDE, 4, 2)      /* Not cached. */
>  /*
>   * We store the bottom two bits of the CPAR as TB flags and handle
>   * checks on the other bits at runtime. This shares the same bits as
>   * VECSTRIDE, which is OK as no XScale CPU has VFP.
> + * Not cached, because VECLEN+VECSTRIDE are not cached.
>   */
>  FIELD(TBFLAG_A32, XSCALE_CPAR, 4, 2)
>  /*
> @@ -3170,15 +3177,15 @@ FIELD(TBFLAG_A32, XSCALE_CPAR, 4, 2)
>   * the same thing as the current security state of the processor!
>   */
>  FIELD(TBFLAG_A32, NS, 6, 1)
> -FIELD(TBFLAG_A32, VFPEN, 7, 1)
> -FIELD(TBFLAG_A32, CONDEXEC, 8, 8)
> +FIELD(TBFLAG_A32, VFPEN, 7, 1)          /* Not cached. */
> +FIELD(TBFLAG_A32, CONDEXEC, 8, 8)       /* Not cached. */
>  FIELD(TBFLAG_A32, SCTLR_B, 16, 1)
>  /* For M profile only, set if FPCCR.LSPACT is set */
> -FIELD(TBFLAG_A32, LSPACT, 18, 1)
> +FIELD(TBFLAG_A32, LSPACT, 18, 1)        /* Not cached. */
>  /* For M profile only, set if we must create a new FP context */
> -FIELD(TBFLAG_A32, NEW_FP_CTXT_NEEDED, 19, 1)
> +FIELD(TBFLAG_A32, NEW_FP_CTXT_NEEDED, 19, 1) /* Not cached. */
>  /* For M profile only, set if FPCCR.S does not match current security state */
> -FIELD(TBFLAG_A32, FPCCR_S_WRONG, 20, 1)
> +FIELD(TBFLAG_A32, FPCCR_S_WRONG, 20, 1) /* Not cached. */
>  /* For M profile only, Handler (ie not Thread) mode */
>  FIELD(TBFLAG_A32, HANDLER, 21, 1)
>  /* For M profile only, whether we should generate stack-limit checks */
> @@ -3190,7 +3197,7 @@ FIELD(TBFLAG_A64, SVEEXC_EL, 2, 2)
>  FIELD(TBFLAG_A64, ZCR_LEN, 4, 4)
>  FIELD(TBFLAG_A64, PAUTH_ACTIVE, 8, 1)
>  FIELD(TBFLAG_A64, BT, 9, 1)
> -FIELD(TBFLAG_A64, BTYPE, 10, 2)
> +FIELD(TBFLAG_A64, BTYPE, 10, 2)         /* Not cached. */
>  FIELD(TBFLAG_A64, TBID, 12, 2)
>
>  static inline bool bswap_code(bool sctlr_b)
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index 7e0d5398ab..f2c6419369 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -11016,6 +11016,22 @@ ARMMMUIdx arm_stage1_mmu_idx(CPUARMState *env)
>  }
>  #endif
>
> +static uint32_t rebuild_hflags_common(CPUARMState *env, int fp_el,
> +                                      ARMMMUIdx mmu_idx, uint32_t flags)
> +{
> +    flags = FIELD_DP32(flags, TBFLAG_ANY, FPEXC_EL, fp_el);
> +    flags = FIELD_DP32(flags, TBFLAG_ANY, MMUIDX,
> +                       arm_to_core_mmu_idx(mmu_idx));
> +
> +    if (arm_cpu_data_is_big_endian(env)) {
> +        flags = FIELD_DP32(flags, TBFLAG_ANY, BE_DATA, 1);
> +    }
> +    if (arm_singlestep_active(env)) {
> +        flags = FIELD_DP32(flags, TBFLAG_ANY, SS_ACTIVE, 1);
> +    }
> +    return flags;
> +}
> +
>  void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
>                            target_ulong *cs_base, uint32_t *pflags)
>  {
> @@ -11107,7 +11123,7 @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
>          }
>      }
>
> -    flags = FIELD_DP32(flags, TBFLAG_ANY, MMUIDX, arm_to_core_mmu_idx(mmu_idx));
> +    flags = rebuild_hflags_common(env, fp_el, mmu_idx, flags);
>
>      /* The SS_ACTIVE and PSTATE_SS bits correspond to the state machine
>       * states defined in the ARM ARM for software singlestep:
> @@ -11115,9 +11131,9 @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
>       *     0            x       Inactive (the TB flag for SS is always 0)
>       *     1            0       Active-pending
>       *     1            1       Active-not-pending
> +     * SS_ACTIVE is set in hflags; PSTATE_SS is computed every TB.
>       */
> -    if (arm_singlestep_active(env)) {
> -        flags = FIELD_DP32(flags, TBFLAG_ANY, SS_ACTIVE, 1);
> +    if (FIELD_EX32(flags, TBFLAG_ANY, SS_ACTIVE)) {
>          if (is_a64(env)) {
>              if (env->pstate & PSTATE_SS) {
>                  flags = FIELD_DP32(flags, TBFLAG_ANY, PSTATE_SS, 1);
> @@ -11128,10 +11144,6 @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
>              }
>          }
>      }
> -    if (arm_cpu_data_is_big_endian(env)) {
> -        flags = FIELD_DP32(flags, TBFLAG_ANY, BE_DATA, 1);
> -    }
> -    flags = FIELD_DP32(flags, TBFLAG_ANY, FPEXC_EL, fp_el);
>
>      if (arm_v7m_is_handler_mode(env)) {
>          flags = FIELD_DP32(flags, TBFLAG_A32, HANDLER, 1);


--
Alex Bennée


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [Qemu-arm] [PATCH v5 17/17] target/arm: Rely on hflags correct in cpu_get_tb_cpu_state
  2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 17/17] target/arm: Rely on hflags correct in cpu_get_tb_cpu_state Richard Henderson
@ 2019-09-05 15:23   ` Alex Bennée
  2019-09-05 15:40     ` Laurent Desnogues
  0 siblings, 1 reply; 31+ messages in thread
From: Alex Bennée @ 2019-09-05 15:23 UTC (permalink / raw)
  To: qemu-arm; +Cc: peter.maydell, qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> This is the payoff.
>
> From perf record -g data of ubuntu 18 boot and shutdown:
>
> BEFORE:
>
> -   23.02%     2.82%  qemu-system-aar  [.] helper_lookup_tb_ptr
>    - 20.22% helper_lookup_tb_ptr
>       + 10.05% tb_htable_lookup
>       - 9.13% cpu_get_tb_cpu_state
>            3.20% aa64_va_parameters_both
>            0.55% fp_exception_el
>
> -   11.66%     4.74%  qemu-system-aar  [.] cpu_get_tb_cpu_state
>    - 6.96% cpu_get_tb_cpu_state
>         3.63% aa64_va_parameters_both
>         0.60% fp_exception_el
>         0.53% sve_exception_el
>
> AFTER:
>
> -   16.40%     3.40%  qemu-system-aar  [.] helper_lookup_tb_ptr
>    - 13.03% helper_lookup_tb_ptr
>       + 11.19% tb_htable_lookup
>         0.55% cpu_get_tb_cpu_state
>
>      0.98%     0.71%  qemu-system-aar  [.] cpu_get_tb_cpu_state
>
>      0.87%     0.24%  qemu-system-aar  [.] rebuild_hflags_a64
>
> Before, helper_lookup_tb_ptr is the second hottest function in the
> application, consuming almost a quarter of the runtime.  Within the
> entire execution, cpu_get_tb_cpu_state consumes about 12%.
>
> After, helper_lookup_tb_ptr has dropped to the fourth hottest function,
> with consumption dropping to a sixth of the runtime.  Within the
> entire execution, cpu_get_tb_cpu_state has dropped below 1%, and the
> supporting function to rebuild hflags also consumes about 1%.
>
> Assertions are retained for --enable-debug-tcg.
>
> Tested-by: Alex Bennée <alex.bennee@linaro.org>

Hmm something must have been missed for M-profile because:

  make run-tcg-tests-arm-softmmu V=1

Leads to:

  timeout 15  /home/alex/lsrc/qemu.git/builds/all.debug/arm-softmmu/qemu-system-arm -monitor none -display none -chardev file,path=test-armv6m-undef.out,id=output -semihosting -M microbit -kernel test-armv6m-undef
  qemu: fatal: Lockup: can't escalate 3 to HardFault (current priority -1)

  R00=00000000 R01=00000000 R02=00000000 R03=00000000
  R04=00000000 R05=00000000 R06=00000000 R07=00000000
  R08=00000000 R09=00000000 R10=00000000 R11=00000000
  R12=00000000 R13=20003fe0 R14=fffffff9 R15=000000c0
  XPSR=41000003 -Z-- T handler
  FPSCR: 00000000
  timeout: the monitored command dumped core

But annoyingly not shown up by the debug-tcg verification. The commit
before works fine.

> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> v2: Retain asserts for future debugging.
> ---
>  target/arm/helper.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index d1bf71a260..5e4f996882 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -11211,12 +11211,15 @@ void HELPER(rebuild_hflags_a64)(CPUARMState *env, int el)
>  void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
>                            target_ulong *cs_base, uint32_t *pflags)
>  {
> -    uint32_t flags, pstate_for_ss;
> +    uint32_t flags = env->hflags;
> +    uint32_t pstate_for_ss;
>
>      *cs_base = 0;
> -    flags = rebuild_hflags_internal(env);
> +#ifdef CONFIG_TCG_DEBUG
> +    assert(flags == rebuild_hflags_internal(env));
> +#endif
>
> -    if (is_a64(env)) {
> +    if (FIELD_EX32(flags, TBFLAG_ANY, AARCH64_STATE)) {
>          *pc = env->pc;
>          if (cpu_isar_feature(aa64_bti, env_archcpu(env))) {
>              flags = FIELD_DP32(flags, TBFLAG_A64, BTYPE, env->btype);


--
Alex Bennée


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [Qemu-arm] [PATCH v5 02/17] target/arm: Split out rebuild_hflags_a64
  2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 02/17] target/arm: Split out rebuild_hflags_a64 Richard Henderson
@ 2019-09-05 15:28   ` Alex Bennée
  2019-09-06  3:26     ` Richard Henderson
  0 siblings, 1 reply; 31+ messages in thread
From: Alex Bennée @ 2019-09-05 15:28 UTC (permalink / raw)
  To: qemu-arm; +Cc: peter.maydell, qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> Create a function to compute the values of the TBFLAG_A64 bits
> that will be cached.  For now, the env->hflags variable is not
> used, and the results are fed back to cpu_get_tb_cpu_state.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper.c | 131 +++++++++++++++++++++++---------------------
>  1 file changed, 69 insertions(+), 62 deletions(-)
>
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index f2c6419369..02cb43cf58 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -11032,6 +11032,71 @@ static uint32_t rebuild_hflags_common(CPUARMState *env, int fp_el,
>      return flags;
>  }
>
> +static uint32_t rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
> +                                   ARMMMUIdx mmu_idx)
> +{
<snip>
> +
> +    if (cpu_isar_feature(aa64_bti, env_archcpu(env))) {
> +        /* Note that SCTLR_EL[23].BT == SCTLR_BT1.  */
> +        if (sctlr & (el == 0 ? SCTLR_BT0 : SCTLR_BT1)) {
> +            flags = FIELD_DP32(flags, TBFLAG_A64, BT, 1);
> +        }
> +    }
> +
> +    return rebuild_hflags_common(env, fp_el, mmu_idx, flags);
> +}
> +
>  void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
>                            target_ulong *cs_base, uint32_t *pflags)
>  {
> @@ -11041,67 +11106,9 @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
>      uint32_t flags = 0;
>
>      if (is_a64(env)) {
<snip>
> -
> -        if (cpu_isar_feature(aa64_bti, cpu)) {
> -            /* Note that SCTLR_EL[23].BT == SCTLR_BT1.  */
> -            if (sctlr & (current_el == 0 ? SCTLR_BT0 : SCTLR_BT1)) {
> -                flags = FIELD_DP32(flags, TBFLAG_A64, BT, 1);
> -            }
> +        flags = rebuild_hflags_a64(env, current_el, fp_el, mmu_idx);
> +        if (cpu_isar_feature(aa64_bti, env_archcpu(env))) {
>              flags = FIELD_DP32(flags, TBFLAG_A64, BTYPE, env->btype);

It seems off to only hoist part of the BTI flag check into the helper,
was it just missed or is there a reason? If so it could probably do with
an additional comment.

>          }
>      } else {
> @@ -11121,9 +11128,9 @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
>              flags = FIELD_DP32(flags, TBFLAG_A32,
>                                 XSCALE_CPAR, env->cp15.c15_cpar);
>          }
> -    }
>
> -    flags = rebuild_hflags_common(env, fp_el, mmu_idx, flags);
> +        flags = rebuild_hflags_common(env, fp_el, mmu_idx, flags);
> +    }
>
>      /* The SS_ACTIVE and PSTATE_SS bits correspond to the state machine
>       * states defined in the ARM ARM for software singlestep:


--
Alex Bennée


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [Qemu-arm] [PATCH v5 17/17] target/arm: Rely on hflags correct in cpu_get_tb_cpu_state
  2019-09-05 15:23   ` [Qemu-devel] [Qemu-arm] " Alex Bennée
@ 2019-09-05 15:40     ` Laurent Desnogues
  2019-09-05 15:50       ` Alex Bennée
  0 siblings, 1 reply; 31+ messages in thread
From: Laurent Desnogues @ 2019-09-05 15:40 UTC (permalink / raw)
  To: Alex Bennée; +Cc: Peter Maydell, qemu-arm, qemu-devel

On Thu, Sep 5, 2019 at 5:24 PM Alex Bennée <alex.bennee@linaro.org> wrote:
>
>
> Richard Henderson <richard.henderson@linaro.org> writes:
>
> > This is the payoff.
> >
> > From perf record -g data of ubuntu 18 boot and shutdown:
> >
> > BEFORE:
> >
> > -   23.02%     2.82%  qemu-system-aar  [.] helper_lookup_tb_ptr
> >    - 20.22% helper_lookup_tb_ptr
> >       + 10.05% tb_htable_lookup
> >       - 9.13% cpu_get_tb_cpu_state
> >            3.20% aa64_va_parameters_both
> >            0.55% fp_exception_el
> >
> > -   11.66%     4.74%  qemu-system-aar  [.] cpu_get_tb_cpu_state
> >    - 6.96% cpu_get_tb_cpu_state
> >         3.63% aa64_va_parameters_both
> >         0.60% fp_exception_el
> >         0.53% sve_exception_el
> >
> > AFTER:
> >
> > -   16.40%     3.40%  qemu-system-aar  [.] helper_lookup_tb_ptr
> >    - 13.03% helper_lookup_tb_ptr
> >       + 11.19% tb_htable_lookup
> >         0.55% cpu_get_tb_cpu_state
> >
> >      0.98%     0.71%  qemu-system-aar  [.] cpu_get_tb_cpu_state
> >
> >      0.87%     0.24%  qemu-system-aar  [.] rebuild_hflags_a64
> >
> > Before, helper_lookup_tb_ptr is the second hottest function in the
> > application, consuming almost a quarter of the runtime.  Within the
> > entire execution, cpu_get_tb_cpu_state consumes about 12%.
> >
> > After, helper_lookup_tb_ptr has dropped to the fourth hottest function,
> > with consumption dropping to a sixth of the runtime.  Within the
> > entire execution, cpu_get_tb_cpu_state has dropped below 1%, and the
> > supporting function to rebuild hflags also consumes about 1%.
> >
> > Assertions are retained for --enable-debug-tcg.
> >
> > Tested-by: Alex Bennée <alex.bennee@linaro.org>
>
> Hmm something must have been missed for M-profile because:
>
>   make run-tcg-tests-arm-softmmu V=1
>
> Leads to:
>
>   timeout 15  /home/alex/lsrc/qemu.git/builds/all.debug/arm-softmmu/qemu-system-arm -monitor none -display none -chardev file,path=test-armv6m-undef.out,id=output -semihosting -M microbit -kernel test-armv6m-undef
>   qemu: fatal: Lockup: can't escalate 3 to HardFault (current priority -1)
>
>   R00=00000000 R01=00000000 R02=00000000 R03=00000000
>   R04=00000000 R05=00000000 R06=00000000 R07=00000000
>   R08=00000000 R09=00000000 R10=00000000 R11=00000000
>   R12=00000000 R13=20003fe0 R14=fffffff9 R15=000000c0
>   XPSR=41000003 -Z-- T handler
>   FPSCR: 00000000
>   timeout: the monitored command dumped core
>
> But annoyingly not shown up by the debug-tcg verification. The commit
> before works fine.

There's a typo in the patch:  that should not be CONFIG_TCG_DEBUG but
CONFIG_DEBUG_TCG.  With this you should see the assert fire.

I let Richard know that there's an issue with the handling of CPSR E
flag (BE_DATA in hflags).  I don't know if that applies to your test.

Thanks,

Laurent

> > Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> > Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> > ---
> > v2: Retain asserts for future debugging.
> > ---
> >  target/arm/helper.c | 9 ++++++---
> >  1 file changed, 6 insertions(+), 3 deletions(-)
> >
> > diff --git a/target/arm/helper.c b/target/arm/helper.c
> > index d1bf71a260..5e4f996882 100644
> > --- a/target/arm/helper.c
> > +++ b/target/arm/helper.c
> > @@ -11211,12 +11211,15 @@ void HELPER(rebuild_hflags_a64)(CPUARMState *env, int el)
> >  void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
> >                            target_ulong *cs_base, uint32_t *pflags)
> >  {
> > -    uint32_t flags, pstate_for_ss;
> > +    uint32_t flags = env->hflags;
> > +    uint32_t pstate_for_ss;
> >
> >      *cs_base = 0;
> > -    flags = rebuild_hflags_internal(env);
> > +#ifdef CONFIG_TCG_DEBUG
> > +    assert(flags == rebuild_hflags_internal(env));
> > +#endif
> >
> > -    if (is_a64(env)) {
> > +    if (FIELD_EX32(flags, TBFLAG_ANY, AARCH64_STATE)) {
> >          *pc = env->pc;
> >          if (cpu_isar_feature(aa64_bti, env_archcpu(env))) {
> >              flags = FIELD_DP32(flags, TBFLAG_A64, BTYPE, env->btype);
>
>
> --
> Alex Bennée
>


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [Qemu-arm] [PATCH v5 17/17] target/arm: Rely on hflags correct in cpu_get_tb_cpu_state
  2019-09-05 15:40     ` Laurent Desnogues
@ 2019-09-05 15:50       ` Alex Bennée
  2019-09-06  3:02         ` Richard Henderson
  0 siblings, 1 reply; 31+ messages in thread
From: Alex Bennée @ 2019-09-05 15:50 UTC (permalink / raw)
  To: Laurent Desnogues; +Cc: Peter Maydell, qemu-arm, qemu-devel


Laurent Desnogues <laurent.desnogues@gmail.com> writes:

> On Thu, Sep 5, 2019 at 5:24 PM Alex Bennée <alex.bennee@linaro.org> wrote:
>>
>>
>> Richard Henderson <richard.henderson@linaro.org> writes:
>>
>> > This is the payoff.
>> >
>> > From perf record -g data of ubuntu 18 boot and shutdown:
>> >
>> > BEFORE:
>> >
>> > -   23.02%     2.82%  qemu-system-aar  [.] helper_lookup_tb_ptr
>> >    - 20.22% helper_lookup_tb_ptr
>> >       + 10.05% tb_htable_lookup
>> >       - 9.13% cpu_get_tb_cpu_state
>> >            3.20% aa64_va_parameters_both
>> >            0.55% fp_exception_el
>> >
>> > -   11.66%     4.74%  qemu-system-aar  [.] cpu_get_tb_cpu_state
>> >    - 6.96% cpu_get_tb_cpu_state
>> >         3.63% aa64_va_parameters_both
>> >         0.60% fp_exception_el
>> >         0.53% sve_exception_el
>> >
>> > AFTER:
>> >
>> > -   16.40%     3.40%  qemu-system-aar  [.] helper_lookup_tb_ptr
>> >    - 13.03% helper_lookup_tb_ptr
>> >       + 11.19% tb_htable_lookup
>> >         0.55% cpu_get_tb_cpu_state
>> >
>> >      0.98%     0.71%  qemu-system-aar  [.] cpu_get_tb_cpu_state
>> >
>> >      0.87%     0.24%  qemu-system-aar  [.] rebuild_hflags_a64
>> >
>> > Before, helper_lookup_tb_ptr is the second hottest function in the
>> > application, consuming almost a quarter of the runtime.  Within the
>> > entire execution, cpu_get_tb_cpu_state consumes about 12%.
>> >
>> > After, helper_lookup_tb_ptr has dropped to the fourth hottest function,
>> > with consumption dropping to a sixth of the runtime.  Within the
>> > entire execution, cpu_get_tb_cpu_state has dropped below 1%, and the
>> > supporting function to rebuild hflags also consumes about 1%.
>> >
>> > Assertions are retained for --enable-debug-tcg.
>> >
>> > Tested-by: Alex Bennée <alex.bennee@linaro.org>
>>
>> Hmm something must have been missed for M-profile because:
>>
>>   make run-tcg-tests-arm-softmmu V=1
>>
>> Leads to:
>>
>>   timeout 15  /home/alex/lsrc/qemu.git/builds/all.debug/arm-softmmu/qemu-system-arm -monitor none -display none -chardev file,path=test-armv6m-undef.out,id=output -semihosting -M microbit -kernel test-armv6m-undef
>>   qemu: fatal: Lockup: can't escalate 3 to HardFault (current priority -1)
>>
>>   R00=00000000 R01=00000000 R02=00000000 R03=00000000
>>   R04=00000000 R05=00000000 R06=00000000 R07=00000000
>>   R08=00000000 R09=00000000 R10=00000000 R11=00000000
>>   R12=00000000 R13=20003fe0 R14=fffffff9 R15=000000c0
>>   XPSR=41000003 -Z-- T handler
>>   FPSCR: 00000000
>>   timeout: the monitored command dumped core
>>
>> But annoyingly not shown up by the debug-tcg verification. The commit
>> before works fine.
>
> There's a typo in the patch:  that should not be CONFIG_TCG_DEBUG but
> CONFIG_DEBUG_TCG.  With this you should see the assert fire.

Indeed:

  cpu_get_tb_cpu_state: cache 110000c0 <> 312000c0

I wish there was an assert form that would handily print out the
difference between the two values. I wonder if glib has one...

>
> I let Richard know that there's an issue with the handling of CPSR E
> flag (BE_DATA in hflags).  I don't know if that applies to your test.
>
> Thanks,
>
> Laurent
>
>> > Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
>> > Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> > ---
>> > v2: Retain asserts for future debugging.
>> > ---
>> >  target/arm/helper.c | 9 ++++++---
>> >  1 file changed, 6 insertions(+), 3 deletions(-)
>> >
>> > diff --git a/target/arm/helper.c b/target/arm/helper.c
>> > index d1bf71a260..5e4f996882 100644
>> > --- a/target/arm/helper.c
>> > +++ b/target/arm/helper.c
>> > @@ -11211,12 +11211,15 @@ void HELPER(rebuild_hflags_a64)(CPUARMState *env, int el)
>> >  void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
>> >                            target_ulong *cs_base, uint32_t *pflags)
>> >  {
>> > -    uint32_t flags, pstate_for_ss;
>> > +    uint32_t flags = env->hflags;
>> > +    uint32_t pstate_for_ss;
>> >
>> >      *cs_base = 0;
>> > -    flags = rebuild_hflags_internal(env);
>> > +#ifdef CONFIG_TCG_DEBUG
>> > +    assert(flags == rebuild_hflags_internal(env));
>> > +#endif
>> >
>> > -    if (is_a64(env)) {
>> > +    if (FIELD_EX32(flags, TBFLAG_ANY, AARCH64_STATE)) {
>> >          *pc = env->pc;
>> >          if (cpu_isar_feature(aa64_bti, env_archcpu(env))) {
>> >              flags = FIELD_DP32(flags, TBFLAG_A64, BTYPE, env->btype);
>>
>>
>> --
>> Alex Bennée
>>


--
Alex Bennée


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [Qemu-arm] [PATCH v5 17/17] target/arm: Rely on hflags correct in cpu_get_tb_cpu_state
  2019-09-05 15:50       ` Alex Bennée
@ 2019-09-06  3:02         ` Richard Henderson
  0 siblings, 0 replies; 31+ messages in thread
From: Richard Henderson @ 2019-09-06  3:02 UTC (permalink / raw)
  To: Alex Bennée, Laurent Desnogues; +Cc: Peter Maydell, qemu-arm, qemu-devel

On 9/5/19 11:50 AM, Alex Bennée wrote:
> I wish there was an assert form that would handily print out the
> difference between the two values. I wonder if glib has one...

g_assert_cmphex(), but checkpatch.pl flags its use because there's a glib
environment variable that disables the assert.


r~


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [Qemu-arm] [PATCH v5 02/17] target/arm: Split out rebuild_hflags_a64
  2019-09-05 15:28   ` [Qemu-devel] [Qemu-arm] " Alex Bennée
@ 2019-09-06  3:26     ` Richard Henderson
  2019-09-06 15:52       ` Alex Bennée
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Henderson @ 2019-09-06  3:26 UTC (permalink / raw)
  To: Alex Bennée, qemu-arm; +Cc: peter.maydell, qemu-devel

On 9/5/19 11:28 AM, Alex Bennée wrote:
>> -
>> -        if (cpu_isar_feature(aa64_bti, cpu)) {
>> -            /* Note that SCTLR_EL[23].BT == SCTLR_BT1.  */
>> -            if (sctlr & (current_el == 0 ? SCTLR_BT0 : SCTLR_BT1)) {
>> -                flags = FIELD_DP32(flags, TBFLAG_A64, BT, 1);
>> -            }
>> +        flags = rebuild_hflags_a64(env, current_el, fp_el, mmu_idx);
>> +        if (cpu_isar_feature(aa64_bti, env_archcpu(env))) {
>>              flags = FIELD_DP32(flags, TBFLAG_A64, BTYPE, env->btype);
> 
> It seems off to only hoist part of the BTI flag check into the helper,
> was it just missed or is there a reason? If so it could probably do with
> an additional comment.

The part of the bti stuff that is hoisted is solely based on system registers.
 The BTYPE field is in PSTATE and is a very different kind of animal -- in
particular, it is not set by MSR.

But also, comments in cpu.h say which fields are (not) cached in hflags, and
BTYPE is so documented.

Is your proposed comment really helpful here going forward, or do you just
think it's weird reviewing this patch, since not all BTI is treated the same
after the patch?


r~


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH v5 13/17] target/arm: Split out arm_mmu_idx_el
  2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 13/17] target/arm: Split out arm_mmu_idx_el Richard Henderson
@ 2019-09-06  7:12   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 31+ messages in thread
From: Philippe Mathieu-Daudé @ 2019-09-06  7:12 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: peter.maydell, qemu-arm

On 8/20/19 11:07 PM, Richard Henderson wrote:
> Avoid calling arm_current_el() twice.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>

> ---
>  target/arm/internals.h |  9 +++++++++
>  target/arm/helper.c    | 12 +++++++-----
>  2 files changed, 16 insertions(+), 5 deletions(-)
> 
> diff --git a/target/arm/internals.h b/target/arm/internals.h
> index 232d963875..f5313dd3d4 100644
> --- a/target/arm/internals.h
> +++ b/target/arm/internals.h
> @@ -949,6 +949,15 @@ void arm_cpu_update_virq(ARMCPU *cpu);
>   */
>  void arm_cpu_update_vfiq(ARMCPU *cpu);
>  
> +/**
> + * arm_mmu_idx_el:
> + * @env: The cpu environment
> + * @el: The EL to use.
> + *
> + * Return the full ARMMMUIdx for the translation regime for EL.
> + */
> +ARMMMUIdx arm_mmu_idx_el(CPUARMState *env, int el);
> +
>  /**
>   * arm_mmu_idx:
>   * @env: The cpu environment
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index 83ae33dae5..19bdb9b9d6 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -10988,15 +10988,12 @@ ARMMMUIdx arm_v7m_mmu_idx_for_secstate(CPUARMState *env, bool secstate)
>  }
>  #endif
>  
> -ARMMMUIdx arm_mmu_idx(CPUARMState *env)
> +ARMMMUIdx arm_mmu_idx_el(CPUARMState *env, int el)
>  {
> -    int el;
> -
>      if (arm_feature(env, ARM_FEATURE_M)) {
>          return arm_v7m_mmu_idx_for_secstate(env, env->v7m.secure);
>      }
>  
> -    el = arm_current_el(env);
>      if (el < 2 && arm_is_secure_below_el3(env)) {
>          return ARMMMUIdx_S1SE0 + el;
>      } else {
> @@ -11004,6 +11001,11 @@ ARMMMUIdx arm_mmu_idx(CPUARMState *env)
>      }
>  }
>  
> +ARMMMUIdx arm_mmu_idx(CPUARMState *env)
> +{
> +    return arm_mmu_idx_el(env, arm_current_el(env));
> +}
> +
>  int cpu_mmu_index(CPUARMState *env, bool ifetch)
>  {
>      return arm_to_core_mmu_idx(arm_mmu_idx(env));
> @@ -11164,7 +11166,7 @@ static uint32_t rebuild_hflags_internal(CPUARMState *env)
>  {
>      int el = arm_current_el(env);
>      int fp_el = fp_exception_el(env, el);
> -    ARMMMUIdx mmu_idx = arm_mmu_idx(env);
> +    ARMMMUIdx mmu_idx = arm_mmu_idx_el(env, el);
>  
>      if (is_a64(env)) {
>          return rebuild_hflags_a64(env, el, fp_el, mmu_idx);
> 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [Qemu-arm] [PATCH v5 02/17] target/arm: Split out rebuild_hflags_a64
  2019-09-06  3:26     ` Richard Henderson
@ 2019-09-06 15:52       ` Alex Bennée
  0 siblings, 0 replies; 31+ messages in thread
From: Alex Bennée @ 2019-09-06 15:52 UTC (permalink / raw)
  To: Richard Henderson; +Cc: peter.maydell, qemu-arm, qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> On 9/5/19 11:28 AM, Alex Bennée wrote:
>>> -
>>> -        if (cpu_isar_feature(aa64_bti, cpu)) {
>>> -            /* Note that SCTLR_EL[23].BT == SCTLR_BT1.  */
>>> -            if (sctlr & (current_el == 0 ? SCTLR_BT0 : SCTLR_BT1)) {
>>> -                flags = FIELD_DP32(flags, TBFLAG_A64, BT, 1);
>>> -            }
>>> +        flags = rebuild_hflags_a64(env, current_el, fp_el, mmu_idx);
>>> +        if (cpu_isar_feature(aa64_bti, env_archcpu(env))) {
>>>              flags = FIELD_DP32(flags, TBFLAG_A64, BTYPE, env->btype);
>>
>> It seems off to only hoist part of the BTI flag check into the helper,
>> was it just missed or is there a reason? If so it could probably do with
>> an additional comment.
>
> The part of the bti stuff that is hoisted is solely based on system registers.
>  The BTYPE field is in PSTATE and is a very different kind of animal -- in
> particular, it is not set by MSR.
>
> But also, comments in cpu.h say which fields are (not) cached in hflags, and
> BTYPE is so documented.
>
> Is your proposed comment really helpful here going forward, or do you just
> think it's weird reviewing this patch, since not all BTI is treated the same
> after the patch?

It was just weird seeing the isar_feature test twice. A mention in the
commit "not all bti related flags will be cached so we have to test the
feature twice" or something like that will suffice.

>
>
> r~


--
Alex Bennée


^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2019-09-06 15:53 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-20 21:07 [Qemu-devel] [PATCH v5 00/17] target/arm: Reduce overhead of cpu_get_tb_cpu_state Richard Henderson
2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 01/17] target/arm: Split out rebuild_hflags_common Richard Henderson
2019-09-05 13:58   ` [Qemu-devel] [Qemu-arm] " Alex Bennée
2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 02/17] target/arm: Split out rebuild_hflags_a64 Richard Henderson
2019-09-05 15:28   ` [Qemu-devel] [Qemu-arm] " Alex Bennée
2019-09-06  3:26     ` Richard Henderson
2019-09-06 15:52       ` Alex Bennée
2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 03/17] target/arm: Split out rebuild_hflags_common_32 Richard Henderson
2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 04/17] target/arm: Split arm_cpu_data_is_big_endian Richard Henderson
2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 05/17] target/arm: Split out rebuild_hflags_m32 Richard Henderson
2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 06/17] target/arm: Reduce tests vs M-profile in cpu_get_tb_cpu_state Richard Henderson
2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 07/17] target/arm: Split out rebuild_hflags_a32 Richard Henderson
2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 08/17] target/arm: Split out rebuild_hflags_aprofile Richard Henderson
2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 09/17] target/arm: Hoist XSCALE_CPAR, VECLEN, VECSTRIDE in cpu_get_tb_cpu_state Richard Henderson
2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 10/17] target/arm: Simplify set of PSTATE_SS " Richard Henderson
2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 11/17] target/arm: Hoist computation of TBFLAG_A32.VFPEN Richard Henderson
2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 12/17] target/arm: Add arm_rebuild_hflags Richard Henderson
2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 13/17] target/arm: Split out arm_mmu_idx_el Richard Henderson
2019-09-06  7:12   ` Philippe Mathieu-Daudé
2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 14/17] target/arm: Hoist store to cs_base in cpu_get_tb_cpu_state Richard Henderson
2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 15/17] target/arm: Add HELPER(rebuild_hflags_{a32, a64, m32}) Richard Henderson
2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 16/17] target/arm: Rebuild hflags at EL changes and MSR writes Richard Henderson
2019-09-05 13:53   ` Alex Bennée
2019-08-20 21:07 ` [Qemu-devel] [PATCH v5 17/17] target/arm: Rely on hflags correct in cpu_get_tb_cpu_state Richard Henderson
2019-09-05 15:23   ` [Qemu-devel] [Qemu-arm] " Alex Bennée
2019-09-05 15:40     ` Laurent Desnogues
2019-09-05 15:50       ` Alex Bennée
2019-09-06  3:02         ` Richard Henderson
2019-08-20 23:54 ` [Qemu-devel] [PATCH v5 00/17] target/arm: Reduce overhead of cpu_get_tb_cpu_state Richard Henderson
2019-09-04 10:48   ` Peter Maydell
2019-09-04 17:26     ` Richard Henderson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.