All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/8] Re-write PPC64 PMU instruction count using TCG Ops
@ 2021-12-22 13:45 Daniel Henrique Barboza
  2021-12-22 13:45 ` [PATCH 1/8] target/ppc: introduce power8-pmu-insn-cnt.c.inc Daniel Henrique Barboza
                   ` (8 more replies)
  0 siblings, 9 replies; 12+ messages in thread
From: Daniel Henrique Barboza @ 2021-12-22 13:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: richard.henderson, Daniel Henrique Barboza, qemu-ppc, clg, david

Hi,

Two days ago Richard Henderson reported test failures with Avocado and
powernv8/9 due to timeouts [1]. The culprit ended up to be commit , a
commit where I introduced PMU instruction counting for TCG PPC64.

For a reason that is still unclear to me these Avocado powernv tests are
suffering a huge performance impact after that patch, something that I
didn't verify in any other scenario I've tested. So one alternative to
fix the situation is to understand this difference and try to solve it,
which can take some time. 
 
Another alternative is to optimize the code introduced by that commit.
Today the instruction count is done by a TCG helper that is called after
each TB exit. I was aware that calling a helper frequently isn't
optimal, but that got the job done and didn't  hindered the use of
pSeries and powernv machines.  Well, until [1] at least.

This series rewrites the PMU instruction counting using TCG Ops instead
of a TCG helper. To do that we needed to write in TCG Ops not only the
logic for increment the counters but also the logic to detect counter
overflows.

A lot of code was added but the performance improvement is noticeable.
Using my local machine I did some test runs with the 2 Avocado powernv
tests that are timing out at this moment:

- failing Avocado powernv tests with current master:

 (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv8: PASS (70.17 s)
 (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv8: PASS (70.90 s)
 (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv8: PASS (70.81 s)
 
 (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv9: PASS (75.62 s)
 (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv9: PASS (69.79 s)
 (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv9: PASS (72.33 s)

- after this series:

 (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv8: PASS (39.90 s)
 (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv8: PASS (38.25 s)
 (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv8: PASS (37.99 s)

 (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv9: PASS (43.17 s)
 (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv9: PASS (43.64 s)
 (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv9: PASS (44.21 s)


I've also tested this code with the EBB exception patch that is pending
re-send [2]. The EBB kernel selftests are working as expected. This
means that we improved the performance and didn't lost any PMU
capability we already have.


[1] https://lists.gnu.org/archive/html/qemu-devel/2021-12/msg03486.html
[2] https://lists.gnu.org/archive/html/qemu-devel/2021-12/msg00082.html


Daniel Henrique Barboza (8):
  target/ppc: introduce power8-pmu-insn-cnt.c.inc
  target/ppc/power8-pmu-insn-cnt: add pmu_inc_pmc5()
  target/ppc/power8-pmu-insn-cnt: add pmu_inc_pmc1()
  target/ppc/power8-pmu-insn-cnt: add pmu_inc_pmc2()
  target/ppc/power8-pmu-insn-cnt: add pmu_inc_pmc3()
  target/ppc/power8-pmu-insn-cnt.c: add pmu_inc_pmc4()
  target/ppc/power8-pmu-insn-cnt: add pmu_check_overflow()
  target/ppc/power8-pmu.c: remove helper_insns_inc()

 target/ppc/helper.h                  |   2 +-
 target/ppc/power8-pmu-insn-cnt.c.inc | 365 +++++++++++++++++++++++++++
 target/ppc/power8-pmu.c              |  60 +----
 target/ppc/translate.c               |  44 +---
 4 files changed, 372 insertions(+), 99 deletions(-)
 create mode 100644 target/ppc/power8-pmu-insn-cnt.c.inc

-- 
2.33.1



^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/8] target/ppc: introduce power8-pmu-insn-cnt.c.inc
  2021-12-22 13:45 [PATCH 0/8] Re-write PPC64 PMU instruction count using TCG Ops Daniel Henrique Barboza
@ 2021-12-22 13:45 ` Daniel Henrique Barboza
  2021-12-22 18:00   ` Cédric Le Goater
  2021-12-22 13:45 ` [PATCH 2/8] target/ppc/power8-pmu-insn-cnt: add pmu_inc_pmc5() Daniel Henrique Barboza
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 12+ messages in thread
From: Daniel Henrique Barboza @ 2021-12-22 13:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: richard.henderson, Daniel Henrique Barboza, qemu-ppc, clg, david

We're going to add a significant amount of TCG ops code for
instruction counting, eventually getting rid of the 'helper_insn_inc'
helper entirely.

Create a new file to avoid putting even more stuff on the already
crowded target/ppc/translate.c.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
 target/ppc/power8-pmu-insn-cnt.c.inc | 54 ++++++++++++++++++++++++++++
 target/ppc/translate.c               | 44 ++---------------------
 2 files changed, 56 insertions(+), 42 deletions(-)
 create mode 100644 target/ppc/power8-pmu-insn-cnt.c.inc

diff --git a/target/ppc/power8-pmu-insn-cnt.c.inc b/target/ppc/power8-pmu-insn-cnt.c.inc
new file mode 100644
index 0000000000..2febbcc27e
--- /dev/null
+++ b/target/ppc/power8-pmu-insn-cnt.c.inc
@@ -0,0 +1,54 @@
+/*
+ * PMU instruction counting for TCG IBM POWER chips
+ *
+ * Copyright IBM Corp. 2021
+ *
+ * Authors:
+ *  Daniel Henrique Barboza      <danielhb413@gmail.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#if defined(TARGET_PPC64)
+static void pmu_count_insns(DisasContext *ctx)
+{
+    /*
+     * Do not bother calling the helper if the PMU isn't counting
+     * instructions.
+     */
+    if (!ctx->pmu_insn_cnt) {
+        return;
+    }
+
+ #if !defined(CONFIG_USER_ONLY)
+    /*
+     * The PMU insns_inc() helper stops the internal PMU timer if a
+     * counter overflows happens. In that case, if the guest is
+     * running with icount and we do not handle it beforehand,
+     * the helper can trigger a 'bad icount read'.
+     */
+    gen_icount_io_start(ctx);
+
+    gen_helper_insns_inc(cpu_env, tcg_constant_i32(ctx->base.num_insns));
+#else
+    /*
+     * User mode can read (but not write) PMC5 and start/stop
+     * the PMU via MMCR0_FC. In this case just increment
+     * PMC5 with base.num_insns.
+     */
+    TCGv t0 = tcg_temp_new();
+
+    gen_load_spr(t0, SPR_POWER_PMC5);
+    tcg_gen_addi_tl(t0, t0, ctx->base.num_insns);
+    gen_store_spr(SPR_POWER_PMC5, t0);
+
+    tcg_temp_free(t0);
+#endif /* #if !defined(CONFIG_USER_ONLY) */
+}
+#else
+static void pmu_count_insns(DisasContext *ctx)
+{
+    return;
+}
+#endif /* #if defined(TARGET_PPC64) */
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 114456148c..44773bc6cd 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -4183,48 +4183,8 @@ static inline void gen_update_cfar(DisasContext *ctx, target_ulong nip)
 #endif
 }
 
-#if defined(TARGET_PPC64)
-static void pmu_count_insns(DisasContext *ctx)
-{
-    /*
-     * Do not bother calling the helper if the PMU isn't counting
-     * instructions.
-     */
-    if (!ctx->pmu_insn_cnt) {
-        return;
-    }
-
- #if !defined(CONFIG_USER_ONLY)
-    /*
-     * The PMU insns_inc() helper stops the internal PMU timer if a
-     * counter overflows happens. In that case, if the guest is
-     * running with icount and we do not handle it beforehand,
-     * the helper can trigger a 'bad icount read'.
-     */
-    gen_icount_io_start(ctx);
-
-    gen_helper_insns_inc(cpu_env, tcg_constant_i32(ctx->base.num_insns));
-#else
-    /*
-     * User mode can read (but not write) PMC5 and start/stop
-     * the PMU via MMCR0_FC. In this case just increment
-     * PMC5 with base.num_insns.
-     */
-    TCGv t0 = tcg_temp_new();
-
-    gen_load_spr(t0, SPR_POWER_PMC5);
-    tcg_gen_addi_tl(t0, t0, ctx->base.num_insns);
-    gen_store_spr(SPR_POWER_PMC5, t0);
-
-    tcg_temp_free(t0);
-#endif /* #if !defined(CONFIG_USER_ONLY) */
-}
-#else
-static void pmu_count_insns(DisasContext *ctx)
-{
-    return;
-}
-#endif /* #if defined(TARGET_PPC64) */
+/* For pmu_count_insns */
+#include "target/ppc/power8-pmu-insn-cnt.c.inc"
 
 static inline bool use_goto_tb(DisasContext *ctx, target_ulong dest)
 {
-- 
2.33.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/8] target/ppc/power8-pmu-insn-cnt: add pmu_inc_pmc5()
  2021-12-22 13:45 [PATCH 0/8] Re-write PPC64 PMU instruction count using TCG Ops Daniel Henrique Barboza
  2021-12-22 13:45 ` [PATCH 1/8] target/ppc: introduce power8-pmu-insn-cnt.c.inc Daniel Henrique Barboza
@ 2021-12-22 13:45 ` Daniel Henrique Barboza
  2021-12-22 13:45 ` [PATCH 3/8] target/ppc/power8-pmu-insn-cnt: add pmu_inc_pmc1() Daniel Henrique Barboza
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Daniel Henrique Barboza @ 2021-12-22 13:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: richard.henderson, Daniel Henrique Barboza, qemu-ppc, clg, david

The first PMC to be counted using exclusively TCG ops will be PMC5.
pmu_inc_pmc5() will always be called inside pmu_count_insns() since it's
able to avoid incrementing PMC5 by checking for MMCR0_FC56 beforehand.

Note that we've already checked that MMCR0_FC is cleared at this point
via ctx->pmu_insn_cnt being set.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
 target/ppc/power8-pmu-insn-cnt.c.inc | 42 ++++++++++++++++++++++------
 1 file changed, 34 insertions(+), 8 deletions(-)

diff --git a/target/ppc/power8-pmu-insn-cnt.c.inc b/target/ppc/power8-pmu-insn-cnt.c.inc
index 2febbcc27e..c683573104 100644
--- a/target/ppc/power8-pmu-insn-cnt.c.inc
+++ b/target/ppc/power8-pmu-insn-cnt.c.inc
@@ -10,6 +10,38 @@
  * See the COPYING file in the top-level directory.
  */
 
+#if defined(TARGET_PPC64) && !defined(CONFIG_USER_ONLY)
+/*
+ * Increments PMC5 if MMCR0_FC is cleared.
+ */
+static void pmu_inc_pmc5(DisasContext *ctx)
+{
+    TCGv t0, t1;
+    TCGLabel *l_skip_pmc;
+
+    /*
+     * If MMCR0_FC56 is set skip PMC5 increment.
+     */
+    l_skip_pmc = gen_new_label();
+
+    t0 = tcg_temp_new();
+    gen_load_spr(t0, SPR_POWER_MMCR0);
+
+    tcg_gen_andi_tl(t0, t0, MMCR0_FC56);
+    tcg_gen_brcondi_tl(TCG_COND_EQ, t0, MMCR0_FC56, l_skip_pmc);
+
+    t1 = tcg_temp_new();
+    gen_load_spr(t1, SPR_POWER_PMC5);
+    tcg_gen_addi_tl(t1, t1, ctx->base.num_insns);
+    gen_store_spr(SPR_POWER_PMC5, t1);
+
+    gen_set_label(l_skip_pmc);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+}
+#endif /* #if defined(TARGET_PPC64) && !defined(CONFIG_USER_ONLY) */
+
 #if defined(TARGET_PPC64)
 static void pmu_count_insns(DisasContext *ctx)
 {
@@ -22,15 +54,9 @@ static void pmu_count_insns(DisasContext *ctx)
     }
 
  #if !defined(CONFIG_USER_ONLY)
-    /*
-     * The PMU insns_inc() helper stops the internal PMU timer if a
-     * counter overflows happens. In that case, if the guest is
-     * running with icount and we do not handle it beforehand,
-     * the helper can trigger a 'bad icount read'.
-     */
-    gen_icount_io_start(ctx);
 
-    gen_helper_insns_inc(cpu_env, tcg_constant_i32(ctx->base.num_insns));
+    pmu_inc_pmc5(ctx);
+
 #else
     /*
      * User mode can read (but not write) PMC5 and start/stop
-- 
2.33.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/8] target/ppc/power8-pmu-insn-cnt: add pmu_inc_pmc1()
  2021-12-22 13:45 [PATCH 0/8] Re-write PPC64 PMU instruction count using TCG Ops Daniel Henrique Barboza
  2021-12-22 13:45 ` [PATCH 1/8] target/ppc: introduce power8-pmu-insn-cnt.c.inc Daniel Henrique Barboza
  2021-12-22 13:45 ` [PATCH 2/8] target/ppc/power8-pmu-insn-cnt: add pmu_inc_pmc5() Daniel Henrique Barboza
@ 2021-12-22 13:45 ` Daniel Henrique Barboza
  2021-12-22 13:45 ` [PATCH 4/8] target/ppc/power8-pmu-insn-cnt: add pmu_inc_pmc2() Daniel Henrique Barboza
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Daniel Henrique Barboza @ 2021-12-22 13:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: richard.henderson, Daniel Henrique Barboza, qemu-ppc, clg, david

pmu_inc_pmc1() will use TCG Ops to increment the PMC1 counter when
it's counting PM_INST_CMPL events. At this moment we're supporting two
values of MMCR1_PMC1SEL for this event: 0x02 and 0xFE.

This function, and all the soon to be added PMC2-4 insn count functions,
does not check if MMCR0_FC14 is set. This check is done inside
pmu_count_insns, which will allow us to skip all PMC1-4 instruction
count functions at once if the proper conditions aren't met.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
 target/ppc/power8-pmu-insn-cnt.c.inc | 76 ++++++++++++++++++++++++++++
 1 file changed, 76 insertions(+)

diff --git a/target/ppc/power8-pmu-insn-cnt.c.inc b/target/ppc/power8-pmu-insn-cnt.c.inc
index c683573104..3661fb0022 100644
--- a/target/ppc/power8-pmu-insn-cnt.c.inc
+++ b/target/ppc/power8-pmu-insn-cnt.c.inc
@@ -11,6 +11,56 @@
  */
 
 #if defined(TARGET_PPC64) && !defined(CONFIG_USER_ONLY)
+
+#define MMCR1_PMC1_INS_CNT        0x02000000
+#define MMCR1_PMC1_INS_CNT2       0xFE000000
+
+/*
+ * Increments PMC1 checking if MMCR1_PMC1SEL has one of the following
+ * events:
+ *
+ * - 0x02: implementation dependent PM_INSN_CMPL
+ * - 0xFE: ISA architected PM_INSN_CMPL
+ *
+ * This function assumes that MMCR0_FC14 is cleared.
+ */
+static void pmu_inc_pmc1(DisasContext *ctx)
+{
+    TCGv t0, t1, t2;
+    TCGLabel *l_inc_pmc;
+    TCGLabel *l_skip_pmc;
+
+    /*
+     * PMC1 will be incremented if MMCR1_PMC1SEL = 0x2
+     * or 0xFE.
+     */
+    l_inc_pmc = gen_new_label();
+    l_skip_pmc = gen_new_label();
+
+    t0 = tcg_temp_new();
+    gen_load_spr(t0, SPR_POWER_MMCR1);
+    tcg_gen_andi_tl(t0, t0, MMCR1_PMC1_INS_CNT);
+    tcg_gen_brcondi_tl(TCG_COND_EQ, t0, MMCR1_PMC1_INS_CNT, l_inc_pmc);
+
+    t1 = tcg_temp_new();
+    gen_load_spr(t1, SPR_POWER_MMCR1);
+    tcg_gen_andi_tl(t1, t1, MMCR1_PMC1_INS_CNT2);
+    tcg_gen_brcondi_tl(TCG_COND_NE, t1, MMCR1_PMC1_INS_CNT2, l_skip_pmc);
+
+    gen_set_label(l_inc_pmc);
+
+    t2 = tcg_temp_new();
+    gen_load_spr(t2, SPR_POWER_PMC1);
+    tcg_gen_addi_tl(t2, t2, ctx->base.num_insns);
+    gen_store_spr(SPR_POWER_PMC1, t2);
+
+    gen_set_label(l_skip_pmc);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+    tcg_temp_free(t2);
+}
+
 /*
  * Increments PMC5 if MMCR0_FC is cleared.
  */
@@ -55,8 +105,34 @@ static void pmu_count_insns(DisasContext *ctx)
 
  #if !defined(CONFIG_USER_ONLY)
 
+    TCGv t_mmcr0, t_mmcr1;
+    TCGLabel *l_skip_pmc14;
+
     pmu_inc_pmc5(ctx);
 
+    /*
+     * Skip PMC1-4 increment if:
+     * - MMCR0_FC14 is set OR
+     * - MMCR1 is cleared
+     */
+    l_skip_pmc14 = gen_new_label();
+
+    t_mmcr0 = tcg_temp_new();
+    gen_load_spr(t_mmcr0, SPR_POWER_MMCR0);
+    tcg_gen_andi_tl(t_mmcr0, t_mmcr0, MMCR0_FC14);
+    tcg_gen_brcondi_tl(TCG_COND_EQ, t_mmcr0, MMCR0_FC14, l_skip_pmc14);
+
+    t_mmcr1 = tcg_temp_new();
+    gen_load_spr(t_mmcr1, SPR_POWER_MMCR1);
+    tcg_gen_brcondi_tl(TCG_COND_EQ, t_mmcr1, 0x0, l_skip_pmc14);
+
+    pmu_inc_pmc1(ctx);
+
+    gen_set_label(l_skip_pmc14);
+
+    tcg_temp_free(t_mmcr0);
+    tcg_temp_free(t_mmcr1);
+
 #else
     /*
      * User mode can read (but not write) PMC5 and start/stop
-- 
2.33.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 4/8] target/ppc/power8-pmu-insn-cnt: add pmu_inc_pmc2()
  2021-12-22 13:45 [PATCH 0/8] Re-write PPC64 PMU instruction count using TCG Ops Daniel Henrique Barboza
                   ` (2 preceding siblings ...)
  2021-12-22 13:45 ` [PATCH 3/8] target/ppc/power8-pmu-insn-cnt: add pmu_inc_pmc1() Daniel Henrique Barboza
@ 2021-12-22 13:45 ` Daniel Henrique Barboza
  2021-12-22 13:45 ` [PATCH 5/8] target/ppc/power8-pmu-insn-cnt: add pmu_inc_pmc3() Daniel Henrique Barboza
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Daniel Henrique Barboza @ 2021-12-22 13:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: richard.henderson, Daniel Henrique Barboza, qemu-ppc, clg, david

Add PMC2 PM_INST_CMPL count with TCG Ops.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
 target/ppc/power8-pmu-insn-cnt.c.inc | 32 ++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/target/ppc/power8-pmu-insn-cnt.c.inc b/target/ppc/power8-pmu-insn-cnt.c.inc
index 3661fb0022..be0e2dc3b5 100644
--- a/target/ppc/power8-pmu-insn-cnt.c.inc
+++ b/target/ppc/power8-pmu-insn-cnt.c.inc
@@ -14,6 +14,7 @@
 
 #define MMCR1_PMC1_INS_CNT        0x02000000
 #define MMCR1_PMC1_INS_CNT2       0xFE000000
+#define MMCR1_PMC2_INS_CNT        0x00020000
 
 /*
  * Increments PMC1 checking if MMCR1_PMC1SEL has one of the following
@@ -61,6 +62,36 @@ static void pmu_inc_pmc1(DisasContext *ctx)
     tcg_temp_free(t2);
 }
 
+/*
+ * Increments PMC2 checking if MMCR1_PMC2SEL = 0x02
+ * (PM_INST_CMPL event).
+ *
+ * This function assumes that MMCR0_FC14 is cleared.
+ */
+static void pmu_inc_pmc2(DisasContext *ctx)
+{
+    TCGv t0, t1;
+    TCGLabel *l_skip_pmc;
+
+    /* PMC2 will be incremented if MMCR1_PMC2SEL is 0x2 */
+    l_skip_pmc = gen_new_label();
+
+    t0 = tcg_temp_new();
+    gen_load_spr(t0, SPR_POWER_MMCR1);
+    tcg_gen_andi_tl(t0, t0, MMCR1_PMC2_INS_CNT);
+    tcg_gen_brcondi_tl(TCG_COND_NE, t0, MMCR1_PMC2_INS_CNT, l_skip_pmc);
+
+    t1 = tcg_temp_new();
+    gen_load_spr(t1, SPR_POWER_PMC2);
+    tcg_gen_addi_tl(t1, t1, ctx->base.num_insns);
+    gen_store_spr(SPR_POWER_PMC2, t1);
+
+    gen_set_label(l_skip_pmc);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+}
+
 /*
  * Increments PMC5 if MMCR0_FC is cleared.
  */
@@ -127,6 +158,7 @@ static void pmu_count_insns(DisasContext *ctx)
     tcg_gen_brcondi_tl(TCG_COND_EQ, t_mmcr1, 0x0, l_skip_pmc14);
 
     pmu_inc_pmc1(ctx);
+    pmu_inc_pmc2(ctx);
 
     gen_set_label(l_skip_pmc14);
 
-- 
2.33.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 5/8] target/ppc/power8-pmu-insn-cnt: add pmu_inc_pmc3()
  2021-12-22 13:45 [PATCH 0/8] Re-write PPC64 PMU instruction count using TCG Ops Daniel Henrique Barboza
                   ` (3 preceding siblings ...)
  2021-12-22 13:45 ` [PATCH 4/8] target/ppc/power8-pmu-insn-cnt: add pmu_inc_pmc2() Daniel Henrique Barboza
@ 2021-12-22 13:45 ` Daniel Henrique Barboza
  2021-12-22 13:45 ` [PATCH 6/8] target/ppc/power8-pmu-insn-cnt.c: add pmu_inc_pmc4() Daniel Henrique Barboza
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Daniel Henrique Barboza @ 2021-12-22 13:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: richard.henderson, Daniel Henrique Barboza, qemu-ppc, clg, david

Add PMC3 PM_INST_CMPL count with TCG Ops.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
 target/ppc/power8-pmu-insn-cnt.c.inc | 32 ++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/target/ppc/power8-pmu-insn-cnt.c.inc b/target/ppc/power8-pmu-insn-cnt.c.inc
index be0e2dc3b5..a84d688503 100644
--- a/target/ppc/power8-pmu-insn-cnt.c.inc
+++ b/target/ppc/power8-pmu-insn-cnt.c.inc
@@ -15,6 +15,7 @@
 #define MMCR1_PMC1_INS_CNT        0x02000000
 #define MMCR1_PMC1_INS_CNT2       0xFE000000
 #define MMCR1_PMC2_INS_CNT        0x00020000
+#define MMCR1_PMC3_INS_CNT        0x00000200
 
 /*
  * Increments PMC1 checking if MMCR1_PMC1SEL has one of the following
@@ -92,6 +93,36 @@ static void pmu_inc_pmc2(DisasContext *ctx)
     tcg_temp_free(t1);
 }
 
+/*
+ * Increments PMC3 checking if MMCR1_PMC3SEL = 0x02
+ * (PM_INST_CMPL event).
+ *
+ * This function assumes that MMCR0_FC14 is cleared.
+ */
+static void pmu_inc_pmc3(DisasContext *ctx)
+{
+    TCGv t0, t1;
+    TCGLabel *l_skip_pmc;
+
+    /* PMC3 will be incremented if MMCR1_PMC3SEL is 0x2 */
+    l_skip_pmc = gen_new_label();
+
+    t0 = tcg_temp_new();
+    gen_load_spr(t0, SPR_POWER_MMCR1);
+    tcg_gen_andi_tl(t0, t0, MMCR1_PMC3_INS_CNT);
+    tcg_gen_brcondi_tl(TCG_COND_NE, t0, MMCR1_PMC3_INS_CNT, l_skip_pmc);
+
+    t1 = tcg_temp_new();
+    gen_load_spr(t1, SPR_POWER_PMC3);
+    tcg_gen_addi_tl(t1, t1, ctx->base.num_insns);
+    gen_store_spr(SPR_POWER_PMC3, t1);
+
+    gen_set_label(l_skip_pmc);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+}
+
 /*
  * Increments PMC5 if MMCR0_FC is cleared.
  */
@@ -159,6 +190,7 @@ static void pmu_count_insns(DisasContext *ctx)
 
     pmu_inc_pmc1(ctx);
     pmu_inc_pmc2(ctx);
+    pmu_inc_pmc3(ctx);
 
     gen_set_label(l_skip_pmc14);
 
-- 
2.33.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 6/8] target/ppc/power8-pmu-insn-cnt.c: add pmu_inc_pmc4()
  2021-12-22 13:45 [PATCH 0/8] Re-write PPC64 PMU instruction count using TCG Ops Daniel Henrique Barboza
                   ` (4 preceding siblings ...)
  2021-12-22 13:45 ` [PATCH 5/8] target/ppc/power8-pmu-insn-cnt: add pmu_inc_pmc3() Daniel Henrique Barboza
@ 2021-12-22 13:45 ` Daniel Henrique Barboza
  2021-12-22 13:45 ` [PATCH 7/8] target/ppc/power8-pmu-insn-cnt: add pmu_check_overflow() Daniel Henrique Barboza
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Daniel Henrique Barboza @ 2021-12-22 13:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: richard.henderson, Daniel Henrique Barboza, qemu-ppc, clg, david

pmu_inc_pmc4() will count both PM_INST_CMPL and PM_RUN_INST_CMPL.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
 target/ppc/power8-pmu-insn-cnt.c.inc | 60 ++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)

diff --git a/target/ppc/power8-pmu-insn-cnt.c.inc b/target/ppc/power8-pmu-insn-cnt.c.inc
index a84d688503..d3dd6d4685 100644
--- a/target/ppc/power8-pmu-insn-cnt.c.inc
+++ b/target/ppc/power8-pmu-insn-cnt.c.inc
@@ -16,6 +16,8 @@
 #define MMCR1_PMC1_INS_CNT2       0xFE000000
 #define MMCR1_PMC2_INS_CNT        0x00020000
 #define MMCR1_PMC3_INS_CNT        0x00000200
+#define MMCR1_PMC4_INS_CNT        0x00000002
+#define MMCR1_PMC4_INS_LATCH_CNT  0x000000FA
 
 /*
  * Increments PMC1 checking if MMCR1_PMC1SEL has one of the following
@@ -123,6 +125,63 @@ static void pmu_inc_pmc3(DisasContext *ctx)
     tcg_temp_free(t1);
 }
 
+/*
+ * Increments PMC4 checking if MMCR1_PMC4SEL has one of the following
+ * events:
+ *
+ * - 0x02: implementation dependent PM_INST_CMPL
+ * - 0xFE: ISA architected PM_RUN_INST_CMPL (run latch)
+ *
+ * This function assumes that MMCR0_FC14 is cleared.
+ */
+static void pmu_inc_pmc4(DisasContext *ctx)
+{
+    TCGv t0, t1, t2, t3;
+    TCGLabel *l_inc_pmc;
+    TCGLabel *l_skip_pmc;
+
+    /*
+     * PMC4 will be incremented if MMCR1_PMC4SEL = 0x2
+     * or 0xFA. For 0xFA (INSN_RUN_LATCH) we need to do an
+     * extra check with SPR_CTRL & CTRL_RUN.
+     */
+    l_inc_pmc = gen_new_label();
+    l_skip_pmc = gen_new_label();
+
+    t0 = tcg_temp_new();
+    gen_load_spr(t0, SPR_POWER_MMCR1);
+    tcg_gen_andi_tl(t0, t0, MMCR1_PMC4_INS_CNT);
+    tcg_gen_brcondi_tl(TCG_COND_EQ, t0, MMCR1_PMC4_INS_CNT, l_inc_pmc);
+
+    t1 = tcg_temp_new();
+    gen_load_spr(t1, SPR_POWER_MMCR1);
+    tcg_gen_andi_tl(t1, t1, MMCR1_PMC4_INS_LATCH_CNT);
+    tcg_gen_brcondi_tl(TCG_COND_NE, t1, MMCR1_PMC4_INS_LATCH_CNT, l_skip_pmc);
+
+    /*
+     * MMCR1_PMC4SEL is 0xFA at this point. Check if we have
+     * the run latch, skip if we don't.
+     */
+    t2 = tcg_temp_new();
+    gen_load_spr(t2, SPR_CTRL);
+    tcg_gen_andi_tl(t2, t2, CTRL_RUN);
+    tcg_gen_brcondi_tl(TCG_COND_NE, t2, CTRL_RUN, l_skip_pmc);
+
+    gen_set_label(l_inc_pmc);
+
+    t3 = tcg_temp_new();
+    gen_load_spr(t3, SPR_POWER_PMC4);
+    tcg_gen_addi_tl(t3, t3, ctx->base.num_insns);
+    gen_store_spr(SPR_POWER_PMC4, t3);
+
+    gen_set_label(l_skip_pmc);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+    tcg_temp_free(t2);
+    tcg_temp_free(t3);
+}
+
 /*
  * Increments PMC5 if MMCR0_FC is cleared.
  */
@@ -191,6 +250,7 @@ static void pmu_count_insns(DisasContext *ctx)
     pmu_inc_pmc1(ctx);
     pmu_inc_pmc2(ctx);
     pmu_inc_pmc3(ctx);
+    pmu_inc_pmc4(ctx);
 
     gen_set_label(l_skip_pmc14);
 
-- 
2.33.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 7/8] target/ppc/power8-pmu-insn-cnt: add pmu_check_overflow()
  2021-12-22 13:45 [PATCH 0/8] Re-write PPC64 PMU instruction count using TCG Ops Daniel Henrique Barboza
                   ` (5 preceding siblings ...)
  2021-12-22 13:45 ` [PATCH 6/8] target/ppc/power8-pmu-insn-cnt.c: add pmu_inc_pmc4() Daniel Henrique Barboza
@ 2021-12-22 13:45 ` Daniel Henrique Barboza
  2021-12-22 13:45 ` [PATCH 8/8] target/ppc/power8-pmu.c: remove helper_insns_inc() Daniel Henrique Barboza
  2021-12-23  2:43 ` [PATCH 0/8] Re-write PPC64 PMU instruction count using TCG Ops Richard Henderson
  8 siblings, 0 replies; 12+ messages in thread
From: Daniel Henrique Barboza @ 2021-12-22 13:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: richard.henderson, Daniel Henrique Barboza, qemu-ppc, clg, david

pmu_check_overflow() will verify for overflow in the PMC1-5 counters,
firing a performance monitor alert if an overflow happened with the
proper MMCR0 bits set.

The alert is fired by using helper_pmu_overflow().

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
 target/ppc/helper.h                  |  1 +
 target/ppc/power8-pmu-insn-cnt.c.inc | 89 ++++++++++++++++++++++++++++
 target/ppc/power8-pmu.c              |  8 +++
 3 files changed, 98 insertions(+)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index fb6cac38b4..4d8193caab 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -26,6 +26,7 @@ DEF_HELPER_2(store_mmcr1, void, env, tl)
 DEF_HELPER_3(store_pmc, void, env, i32, i64)
 DEF_HELPER_2(read_pmc, tl, env, i32)
 DEF_HELPER_2(insns_inc, void, env, i32)
+DEF_HELPER_1(pmu_overflow, void, env)
 #endif
 DEF_HELPER_1(check_tlb_flush_local, void, env)
 DEF_HELPER_1(check_tlb_flush_global, void, env)
diff --git a/target/ppc/power8-pmu-insn-cnt.c.inc b/target/ppc/power8-pmu-insn-cnt.c.inc
index d3dd6d4685..7bd07d8105 100644
--- a/target/ppc/power8-pmu-insn-cnt.c.inc
+++ b/target/ppc/power8-pmu-insn-cnt.c.inc
@@ -19,6 +19,7 @@
 #define MMCR1_PMC4_INS_CNT        0x00000002
 #define MMCR1_PMC4_INS_LATCH_CNT  0x000000FA
 
+#define PMC_COUNTER_NEGATIVE_VAL  0x80000000UL
 /*
  * Increments PMC1 checking if MMCR1_PMC1SEL has one of the following
  * events:
@@ -211,6 +212,92 @@ static void pmu_inc_pmc5(DisasContext *ctx)
     tcg_temp_free(t0);
     tcg_temp_free(t1);
 }
+
+/*
+ * Check for overflow of PMC1-PMC5 counters and call the overflow
+ * helper in case any counter has overflown.
+ */
+static void pmu_check_overflow(DisasContext *ctx)
+{
+    TCGv t_pmc1, t_pmc2, t_pmc3, t_pmc4, t_pmc5;
+    TCGv t0, t1;
+    TCGLabel *l_pmc_overflow;
+    TCGLabel *l_skip_pmc1_overflow;
+    TCGLabel *l_skip_overflow;
+
+    /*
+     * Check if we have overflow bits set and fire an overflow
+     * event if necessary. Skip directly to 'l_pmc_overflow'
+     * right after finding the first overflow.
+     */
+    l_pmc_overflow = gen_new_label();
+    l_skip_pmc1_overflow = gen_new_label();
+
+    t0 = tcg_temp_new();
+    gen_load_spr(t0, SPR_POWER_MMCR0);
+    tcg_gen_andi_tl(t0, t0, MMCR0_PMC1CE);
+    tcg_gen_brcondi_tl(TCG_COND_NE, t0, MMCR0_PMC1CE, l_skip_pmc1_overflow);
+
+    t_pmc1 = tcg_temp_new();
+    gen_load_spr(t_pmc1, SPR_POWER_PMC1);
+    tcg_gen_brcondi_tl(TCG_COND_GE, t_pmc1, PMC_COUNTER_NEGATIVE_VAL,
+                       l_pmc_overflow);
+
+    gen_set_label(l_skip_pmc1_overflow);
+
+    l_skip_overflow = gen_new_label();
+
+    /*
+     * At this point we're sure PMC1 didn't overflow. If MMCR0_PMCjCE
+     * isn't set we can skip everything since PMC2-5 overflow is
+     * disabled.
+     */
+    t1 = tcg_temp_new();
+    gen_load_spr(t1, SPR_POWER_MMCR0);
+    tcg_gen_andi_tl(t1, t1, MMCR0_PMCjCE);
+    tcg_gen_brcondi_tl(TCG_COND_NE, t1, MMCR0_PMCjCE, l_skip_overflow);
+
+    t_pmc2 = tcg_temp_new();
+    gen_load_spr(t_pmc2, SPR_POWER_PMC2);
+    tcg_gen_brcondi_tl(TCG_COND_GE, t_pmc2, PMC_COUNTER_NEGATIVE_VAL,
+                       l_pmc_overflow);
+
+    t_pmc3 = tcg_temp_new();
+    gen_load_spr(t_pmc3, SPR_POWER_PMC3);
+    tcg_gen_brcondi_tl(TCG_COND_GE, t_pmc3, PMC_COUNTER_NEGATIVE_VAL,
+                       l_pmc_overflow);
+
+    t_pmc4 = tcg_temp_new();
+    gen_load_spr(t_pmc4, SPR_POWER_PMC4);
+    tcg_gen_brcondi_tl(TCG_COND_GE, t_pmc4, PMC_COUNTER_NEGATIVE_VAL,
+                       l_pmc_overflow);
+
+    t_pmc5 = tcg_temp_new();
+    gen_load_spr(t_pmc5, SPR_POWER_PMC5);
+    tcg_gen_brcondi_tl(TCG_COND_LE, t_pmc5, PMC_COUNTER_NEGATIVE_VAL,
+                       l_skip_overflow);
+
+    gen_set_label(l_pmc_overflow);
+
+    /*
+     * The PMU overflow helper manipuilates the internal PMU timer.
+     * In that case, if the guest is running with icount and we do not
+     * handle it beforehand, the helper can trigger a 'bad icount
+     * read'.
+     */
+    gen_icount_io_start(ctx);
+    gen_helper_pmu_overflow(cpu_env);
+
+    gen_set_label(l_skip_overflow);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+    tcg_temp_free(t_pmc1);
+    tcg_temp_free(t_pmc2);
+    tcg_temp_free(t_pmc3);
+    tcg_temp_free(t_pmc4);
+    tcg_temp_free(t_pmc5);
+}
 #endif /* #if defined(TARGET_PPC64) && !defined(CONFIG_USER_ONLY) */
 
 #if defined(TARGET_PPC64)
@@ -254,6 +341,8 @@ static void pmu_count_insns(DisasContext *ctx)
 
     gen_set_label(l_skip_pmc14);
 
+    pmu_check_overflow(ctx);
+
     tcg_temp_free(t_mmcr0);
     tcg_temp_free(t_mmcr1);
 
diff --git a/target/ppc/power8-pmu.c b/target/ppc/power8-pmu.c
index 08d1902cd5..6696c9b3ae 100644
--- a/target/ppc/power8-pmu.c
+++ b/target/ppc/power8-pmu.c
@@ -323,6 +323,14 @@ void helper_insns_inc(CPUPPCState *env, uint32_t num_insns)
     }
 }
 
+/* Helper to fire a PMC interrupt from TCG code */
+void helper_pmu_overflow(CPUPPCState *env)
+{
+    PowerPCCPU *cpu = env_archcpu(env);
+
+    fire_PMC_interrupt(cpu);
+}
+
 static void cpu_ppc_pmu_timer_cb(void *opaque)
 {
     PowerPCCPU *cpu = opaque;
-- 
2.33.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 8/8] target/ppc/power8-pmu.c: remove helper_insns_inc()
  2021-12-22 13:45 [PATCH 0/8] Re-write PPC64 PMU instruction count using TCG Ops Daniel Henrique Barboza
                   ` (6 preceding siblings ...)
  2021-12-22 13:45 ` [PATCH 7/8] target/ppc/power8-pmu-insn-cnt: add pmu_check_overflow() Daniel Henrique Barboza
@ 2021-12-22 13:45 ` Daniel Henrique Barboza
  2021-12-23  2:43 ` [PATCH 0/8] Re-write PPC64 PMU instruction count using TCG Ops Richard Henderson
  8 siblings, 0 replies; 12+ messages in thread
From: Daniel Henrique Barboza @ 2021-12-22 13:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: richard.henderson, Daniel Henrique Barboza, qemu-ppc, clg, david

After moving all the instruction counting to TCG Ops code
this helper is not needed anymore.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
 target/ppc/helper.h                  |  1 -
 target/ppc/power8-pmu-insn-cnt.c.inc |  4 --
 target/ppc/power8-pmu.c              | 60 ----------------------------
 3 files changed, 65 deletions(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 4d8193caab..de80e82ebe 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -25,7 +25,6 @@ DEF_HELPER_2(store_mmcr0, void, env, tl)
 DEF_HELPER_2(store_mmcr1, void, env, tl)
 DEF_HELPER_3(store_pmc, void, env, i32, i64)
 DEF_HELPER_2(read_pmc, tl, env, i32)
-DEF_HELPER_2(insns_inc, void, env, i32)
 DEF_HELPER_1(pmu_overflow, void, env)
 #endif
 DEF_HELPER_1(check_tlb_flush_local, void, env)
diff --git a/target/ppc/power8-pmu-insn-cnt.c.inc b/target/ppc/power8-pmu-insn-cnt.c.inc
index 7bd07d8105..8b1604b4c7 100644
--- a/target/ppc/power8-pmu-insn-cnt.c.inc
+++ b/target/ppc/power8-pmu-insn-cnt.c.inc
@@ -303,10 +303,6 @@ static void pmu_check_overflow(DisasContext *ctx)
 #if defined(TARGET_PPC64)
 static void pmu_count_insns(DisasContext *ctx)
 {
-    /*
-     * Do not bother calling the helper if the PMU isn't counting
-     * instructions.
-     */
     if (!ctx->pmu_insn_cnt) {
         return;
     }
diff --git a/target/ppc/power8-pmu.c b/target/ppc/power8-pmu.c
index 6696c9b3ae..bfc052b49e 100644
--- a/target/ppc/power8-pmu.c
+++ b/target/ppc/power8-pmu.c
@@ -135,52 +135,6 @@ bool pmu_insn_cnt_enabled(CPUPPCState *env)
     return false;
 }
 
-static bool pmu_increment_insns(CPUPPCState *env, uint32_t num_insns)
-{
-    bool overflow_triggered = false;
-    int sprn;
-
-    /* PMC6 never counts instructions */
-    for (sprn = SPR_POWER_PMC1; sprn <= SPR_POWER_PMC5; sprn++) {
-        PMUEventType evt_type = pmc_get_event(env, sprn);
-        bool insn_event = evt_type == PMU_EVENT_INSTRUCTIONS ||
-                          evt_type == PMU_EVENT_INSN_RUN_LATCH;
-
-        if (pmc_is_inactive(env, sprn) || !insn_event) {
-            continue;
-        }
-
-        if (evt_type == PMU_EVENT_INSTRUCTIONS) {
-            env->spr[sprn] += num_insns;
-        }
-
-        if (evt_type == PMU_EVENT_INSN_RUN_LATCH &&
-            env->spr[SPR_CTRL] & CTRL_RUN) {
-            env->spr[sprn] += num_insns;
-        }
-
-        if (env->spr[sprn] >= PMC_COUNTER_NEGATIVE_VAL &&
-            pmc_has_overflow_enabled(env, sprn)) {
-
-            overflow_triggered = true;
-
-            /*
-             * The real PMU will always trigger a counter overflow with
-             * PMC_COUNTER_NEGATIVE_VAL. We don't have an easy way to
-             * do that since we're counting block of instructions at
-             * the end of each translation block, and we're probably
-             * passing this value at this point.
-             *
-             * Let's write PMC_COUNTER_NEGATIVE_VAL to the overflowed
-             * counter to simulate what the real hardware would do.
-             */
-            env->spr[sprn] = PMC_COUNTER_NEGATIVE_VAL;
-        }
-    }
-
-    return overflow_triggered;
-}
-
 static void pmu_update_cycles(CPUPPCState *env)
 {
     uint64_t now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
@@ -309,20 +263,6 @@ static void fire_PMC_interrupt(PowerPCCPU *cpu)
     return;
 }
 
-/* This helper assumes that the PMC is running. */
-void helper_insns_inc(CPUPPCState *env, uint32_t num_insns)
-{
-    bool overflow_triggered;
-    PowerPCCPU *cpu;
-
-    overflow_triggered = pmu_increment_insns(env, num_insns);
-
-    if (overflow_triggered) {
-        cpu = env_archcpu(env);
-        fire_PMC_interrupt(cpu);
-    }
-}
-
 /* Helper to fire a PMC interrupt from TCG code */
 void helper_pmu_overflow(CPUPPCState *env)
 {
-- 
2.33.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/8] target/ppc: introduce power8-pmu-insn-cnt.c.inc
  2021-12-22 13:45 ` [PATCH 1/8] target/ppc: introduce power8-pmu-insn-cnt.c.inc Daniel Henrique Barboza
@ 2021-12-22 18:00   ` Cédric Le Goater
  2021-12-22 18:10     ` Daniel Henrique Barboza
  0 siblings, 1 reply; 12+ messages in thread
From: Cédric Le Goater @ 2021-12-22 18:00 UTC (permalink / raw)
  To: Daniel Henrique Barboza, qemu-devel; +Cc: richard.henderson, qemu-ppc, david

On 12/22/21 14:45, Daniel Henrique Barboza wrote:
> We're going to add a significant amount of TCG ops code for
> instruction counting, eventually getting rid of the 'helper_insn_inc'
> helper entirely.
> 
> Create a new file to avoid putting even more stuff on the already
> crowded target/ppc/translate.c.

We already have a power8-pmu-regs.c.inc file. Couldn't we use it or is it
a bad idea ?

Thanks,

C.


> 
> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
> ---
>   target/ppc/power8-pmu-insn-cnt.c.inc | 54 ++++++++++++++++++++++++++++
>   target/ppc/translate.c               | 44 ++---------------------
>   2 files changed, 56 insertions(+), 42 deletions(-)
>   create mode 100644 target/ppc/power8-pmu-insn-cnt.c.inc
> 
> diff --git a/target/ppc/power8-pmu-insn-cnt.c.inc b/target/ppc/power8-pmu-insn-cnt.c.inc
> new file mode 100644
> index 0000000000..2febbcc27e
> --- /dev/null
> +++ b/target/ppc/power8-pmu-insn-cnt.c.inc
> @@ -0,0 +1,54 @@
> +/*
> + * PMU instruction counting for TCG IBM POWER chips
> + *
> + * Copyright IBM Corp. 2021
> + *
> + * Authors:
> + *  Daniel Henrique Barboza      <danielhb413@gmail.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#if defined(TARGET_PPC64)
> +static void pmu_count_insns(DisasContext *ctx)
> +{
> +    /*
> +     * Do not bother calling the helper if the PMU isn't counting
> +     * instructions.
> +     */
> +    if (!ctx->pmu_insn_cnt) {
> +        return;
> +    }
> +
> + #if !defined(CONFIG_USER_ONLY)
> +    /*
> +     * The PMU insns_inc() helper stops the internal PMU timer if a
> +     * counter overflows happens. In that case, if the guest is
> +     * running with icount and we do not handle it beforehand,
> +     * the helper can trigger a 'bad icount read'.
> +     */
> +    gen_icount_io_start(ctx);
> +
> +    gen_helper_insns_inc(cpu_env, tcg_constant_i32(ctx->base.num_insns));
> +#else
> +    /*
> +     * User mode can read (but not write) PMC5 and start/stop
> +     * the PMU via MMCR0_FC. In this case just increment
> +     * PMC5 with base.num_insns.
> +     */
> +    TCGv t0 = tcg_temp_new();
> +
> +    gen_load_spr(t0, SPR_POWER_PMC5);
> +    tcg_gen_addi_tl(t0, t0, ctx->base.num_insns);
> +    gen_store_spr(SPR_POWER_PMC5, t0);
> +
> +    tcg_temp_free(t0);
> +#endif /* #if !defined(CONFIG_USER_ONLY) */
> +}
> +#else
> +static void pmu_count_insns(DisasContext *ctx)
> +{
> +    return;
> +}
> +#endif /* #if defined(TARGET_PPC64) */
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index 114456148c..44773bc6cd 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -4183,48 +4183,8 @@ static inline void gen_update_cfar(DisasContext *ctx, target_ulong nip)
>   #endif
>   }
>   
> -#if defined(TARGET_PPC64)
> -static void pmu_count_insns(DisasContext *ctx)
> -{
> -    /*
> -     * Do not bother calling the helper if the PMU isn't counting
> -     * instructions.
> -     */
> -    if (!ctx->pmu_insn_cnt) {
> -        return;
> -    }
> -
> - #if !defined(CONFIG_USER_ONLY)
> -    /*
> -     * The PMU insns_inc() helper stops the internal PMU timer if a
> -     * counter overflows happens. In that case, if the guest is
> -     * running with icount and we do not handle it beforehand,
> -     * the helper can trigger a 'bad icount read'.
> -     */
> -    gen_icount_io_start(ctx);
> -
> -    gen_helper_insns_inc(cpu_env, tcg_constant_i32(ctx->base.num_insns));
> -#else
> -    /*
> -     * User mode can read (but not write) PMC5 and start/stop
> -     * the PMU via MMCR0_FC. In this case just increment
> -     * PMC5 with base.num_insns.
> -     */
> -    TCGv t0 = tcg_temp_new();
> -
> -    gen_load_spr(t0, SPR_POWER_PMC5);
> -    tcg_gen_addi_tl(t0, t0, ctx->base.num_insns);
> -    gen_store_spr(SPR_POWER_PMC5, t0);
> -
> -    tcg_temp_free(t0);
> -#endif /* #if !defined(CONFIG_USER_ONLY) */
> -}
> -#else
> -static void pmu_count_insns(DisasContext *ctx)
> -{
> -    return;
> -}
> -#endif /* #if defined(TARGET_PPC64) */
> +/* For pmu_count_insns */
> +#include "target/ppc/power8-pmu-insn-cnt.c.inc"
>   
>   static inline bool use_goto_tb(DisasContext *ctx, target_ulong dest)
>   {
> 



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/8] target/ppc: introduce power8-pmu-insn-cnt.c.inc
  2021-12-22 18:00   ` Cédric Le Goater
@ 2021-12-22 18:10     ` Daniel Henrique Barboza
  0 siblings, 0 replies; 12+ messages in thread
From: Daniel Henrique Barboza @ 2021-12-22 18:10 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel; +Cc: richard.henderson, qemu-ppc, david



On 12/22/21 15:00, Cédric Le Goater wrote:
> On 12/22/21 14:45, Daniel Henrique Barboza wrote:
>> We're going to add a significant amount of TCG ops code for
>> instruction counting, eventually getting rid of the 'helper_insn_inc'
>> helper entirely.
>>
>> Create a new file to avoid putting even more stuff on the already
>> crowded target/ppc/translate.c.
> 
> We already have a power8-pmu-regs.c.inc file. Couldn't we use it or is it
> a bad idea ?

That was my first idea. But then I thought that this code would feel out of place
in power8-pmu-regs.c.inc since, at this moment, the file only has SPR read/write
implementations that are being declared by spr_tcg.h to be used by cpu_init.c.


I guess it's a matter of taste. I'm ok putting this code in power8-pmu-regs.c.inc if
you think it's better.



Daniel

> 
> Thanks,
> 
> C.
> 
> 
>>
>> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
>> ---
>>   target/ppc/power8-pmu-insn-cnt.c.inc | 54 ++++++++++++++++++++++++++++
>>   target/ppc/translate.c               | 44 ++---------------------
>>   2 files changed, 56 insertions(+), 42 deletions(-)
>>   create mode 100644 target/ppc/power8-pmu-insn-cnt.c.inc
>>
>> diff --git a/target/ppc/power8-pmu-insn-cnt.c.inc b/target/ppc/power8-pmu-insn-cnt.c.inc
>> new file mode 100644
>> index 0000000000..2febbcc27e
>> --- /dev/null
>> +++ b/target/ppc/power8-pmu-insn-cnt.c.inc
>> @@ -0,0 +1,54 @@
>> +/*
>> + * PMU instruction counting for TCG IBM POWER chips
>> + *
>> + * Copyright IBM Corp. 2021
>> + *
>> + * Authors:
>> + *  Daniel Henrique Barboza      <danielhb413@gmail.com>
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> + * See the COPYING file in the top-level directory.
>> + */
>> +
>> +#if defined(TARGET_PPC64)
>> +static void pmu_count_insns(DisasContext *ctx)
>> +{
>> +    /*
>> +     * Do not bother calling the helper if the PMU isn't counting
>> +     * instructions.
>> +     */
>> +    if (!ctx->pmu_insn_cnt) {
>> +        return;
>> +    }
>> +
>> + #if !defined(CONFIG_USER_ONLY)
>> +    /*
>> +     * The PMU insns_inc() helper stops the internal PMU timer if a
>> +     * counter overflows happens. In that case, if the guest is
>> +     * running with icount and we do not handle it beforehand,
>> +     * the helper can trigger a 'bad icount read'.
>> +     */
>> +    gen_icount_io_start(ctx);
>> +
>> +    gen_helper_insns_inc(cpu_env, tcg_constant_i32(ctx->base.num_insns));
>> +#else
>> +    /*
>> +     * User mode can read (but not write) PMC5 and start/stop
>> +     * the PMU via MMCR0_FC. In this case just increment
>> +     * PMC5 with base.num_insns.
>> +     */
>> +    TCGv t0 = tcg_temp_new();
>> +
>> +    gen_load_spr(t0, SPR_POWER_PMC5);
>> +    tcg_gen_addi_tl(t0, t0, ctx->base.num_insns);
>> +    gen_store_spr(SPR_POWER_PMC5, t0);
>> +
>> +    tcg_temp_free(t0);
>> +#endif /* #if !defined(CONFIG_USER_ONLY) */
>> +}
>> +#else
>> +static void pmu_count_insns(DisasContext *ctx)
>> +{
>> +    return;
>> +}
>> +#endif /* #if defined(TARGET_PPC64) */
>> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
>> index 114456148c..44773bc6cd 100644
>> --- a/target/ppc/translate.c
>> +++ b/target/ppc/translate.c
>> @@ -4183,48 +4183,8 @@ static inline void gen_update_cfar(DisasContext *ctx, target_ulong nip)
>>   #endif
>>   }
>> -#if defined(TARGET_PPC64)
>> -static void pmu_count_insns(DisasContext *ctx)
>> -{
>> -    /*
>> -     * Do not bother calling the helper if the PMU isn't counting
>> -     * instructions.
>> -     */
>> -    if (!ctx->pmu_insn_cnt) {
>> -        return;
>> -    }
>> -
>> - #if !defined(CONFIG_USER_ONLY)
>> -    /*
>> -     * The PMU insns_inc() helper stops the internal PMU timer if a
>> -     * counter overflows happens. In that case, if the guest is
>> -     * running with icount and we do not handle it beforehand,
>> -     * the helper can trigger a 'bad icount read'.
>> -     */
>> -    gen_icount_io_start(ctx);
>> -
>> -    gen_helper_insns_inc(cpu_env, tcg_constant_i32(ctx->base.num_insns));
>> -#else
>> -    /*
>> -     * User mode can read (but not write) PMC5 and start/stop
>> -     * the PMU via MMCR0_FC. In this case just increment
>> -     * PMC5 with base.num_insns.
>> -     */
>> -    TCGv t0 = tcg_temp_new();
>> -
>> -    gen_load_spr(t0, SPR_POWER_PMC5);
>> -    tcg_gen_addi_tl(t0, t0, ctx->base.num_insns);
>> -    gen_store_spr(SPR_POWER_PMC5, t0);
>> -
>> -    tcg_temp_free(t0);
>> -#endif /* #if !defined(CONFIG_USER_ONLY) */
>> -}
>> -#else
>> -static void pmu_count_insns(DisasContext *ctx)
>> -{
>> -    return;
>> -}
>> -#endif /* #if defined(TARGET_PPC64) */
>> +/* For pmu_count_insns */
>> +#include "target/ppc/power8-pmu-insn-cnt.c.inc"
>>   static inline bool use_goto_tb(DisasContext *ctx, target_ulong dest)
>>   {
>>
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/8] Re-write PPC64 PMU instruction count using TCG Ops
  2021-12-22 13:45 [PATCH 0/8] Re-write PPC64 PMU instruction count using TCG Ops Daniel Henrique Barboza
                   ` (7 preceding siblings ...)
  2021-12-22 13:45 ` [PATCH 8/8] target/ppc/power8-pmu.c: remove helper_insns_inc() Daniel Henrique Barboza
@ 2021-12-23  2:43 ` Richard Henderson
  8 siblings, 0 replies; 12+ messages in thread
From: Richard Henderson @ 2021-12-23  2:43 UTC (permalink / raw)
  To: Daniel Henrique Barboza, qemu-devel; +Cc: qemu-ppc, clg, david

On 12/22/21 5:45 AM, Daniel Henrique Barboza wrote:
> Hi,
> 
> Two days ago Richard Henderson reported test failures with Avocado and
> powernv8/9 due to timeouts [1]. The culprit ended up to be commit , a
> commit where I introduced PMU instruction counting for TCG PPC64.
> 
> For a reason that is still unclear to me these Avocado powernv tests are
> suffering a huge performance impact after that patch, something that I
> didn't verify in any other scenario I've tested. So one alternative to
> fix the situation is to understand this difference and try to solve it,
> which can take some time.
>   
> Another alternative is to optimize the code introduced by that commit.
> Today the instruction count is done by a TCG helper that is called after
> each TB exit. I was aware that calling a helper frequently isn't
> optimal, but that got the job done and didn't  hindered the use of
> pSeries and powernv machines.  Well, until [1] at least.
> 
> This series rewrites the PMU instruction counting using TCG Ops instead
> of a TCG helper. To do that we needed to write in TCG Ops not only the
> logic for increment the counters but also the logic to detect counter
> overflows.
> 
> A lot of code was added but the performance improvement is noticeable.
> Using my local machine I did some test runs with the 2 Avocado powernv
> tests that are timing out at this moment:

You generate a *lot* of inline code here.  Way too much, actually.

If you can get this performance improvement with this reorg, it merely means that your 
original C algorithm was poor.  The compiler should have been able to do better.

I've tested this theory here and...

> - failing Avocado powernv tests with current master:
> 
>   (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv8: PASS (70.17 s)
>   (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv8: PASS (70.90 s)
>   (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv8: PASS (70.81 s)
>   
>   (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv9: PASS (75.62 s)
>   (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv9: PASS (69.79 s)
>   (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv9: PASS (72.33 s)

boot_linux_console.py:BootLinuxConsole.test_ppc_powernv8: PASS (75.73 s)
boot_linux_console.py:BootLinuxConsole.test_ppc_powernv9: PASS (80.20 s)

> - after this series:
> 
>   (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv8: PASS (39.90 s)
>   (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv8: PASS (38.25 s)
>   (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv8: PASS (37.99 s)
> 
>   (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv9: PASS (43.17 s)
>   (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv9: PASS (43.64 s)
>   (1/1) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv9: PASS (44.21 s)

boot_linux_console.py:BootLinuxConsole.test_ppc_powernv8: PASS (39.66 s)
boot_linux_console.py:BootLinuxConsole.test_ppc_powernv9: PASS (43.02 s)

BTW, pre-power8-pmu, 29c4a3363b:

boot_linux_console.py:BootLinuxConsole.test_ppc_powernv8: PASS (36.62 s)
boot_linux_console.py:BootLinuxConsole.test_ppc_powernv9: PASS (39.69 s)

I'll post my series shortly.


r~


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2021-12-23  2:44 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-22 13:45 [PATCH 0/8] Re-write PPC64 PMU instruction count using TCG Ops Daniel Henrique Barboza
2021-12-22 13:45 ` [PATCH 1/8] target/ppc: introduce power8-pmu-insn-cnt.c.inc Daniel Henrique Barboza
2021-12-22 18:00   ` Cédric Le Goater
2021-12-22 18:10     ` Daniel Henrique Barboza
2021-12-22 13:45 ` [PATCH 2/8] target/ppc/power8-pmu-insn-cnt: add pmu_inc_pmc5() Daniel Henrique Barboza
2021-12-22 13:45 ` [PATCH 3/8] target/ppc/power8-pmu-insn-cnt: add pmu_inc_pmc1() Daniel Henrique Barboza
2021-12-22 13:45 ` [PATCH 4/8] target/ppc/power8-pmu-insn-cnt: add pmu_inc_pmc2() Daniel Henrique Barboza
2021-12-22 13:45 ` [PATCH 5/8] target/ppc/power8-pmu-insn-cnt: add pmu_inc_pmc3() Daniel Henrique Barboza
2021-12-22 13:45 ` [PATCH 6/8] target/ppc/power8-pmu-insn-cnt.c: add pmu_inc_pmc4() Daniel Henrique Barboza
2021-12-22 13:45 ` [PATCH 7/8] target/ppc/power8-pmu-insn-cnt: add pmu_check_overflow() Daniel Henrique Barboza
2021-12-22 13:45 ` [PATCH 8/8] target/ppc/power8-pmu.c: remove helper_insns_inc() Daniel Henrique Barboza
2021-12-23  2:43 ` [PATCH 0/8] Re-write PPC64 PMU instruction count using TCG Ops Richard Henderson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.