All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH  v9 00/13] TCG code quality tracking and perf integration
@ 2019-10-07 15:28 Alex Bennée
  2019-10-07 15:28 ` [PATCH v9 01/13] accel/tcg: introduce TBStatistics structure Alex Bennée
                   ` (14 more replies)
  0 siblings, 15 replies; 37+ messages in thread
From: Alex Bennée @ 2019-10-07 15:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota, Alex Bennée

Hi,

These are the bits of Vanderson's GSoC project which I think are ready
for merge(^) and would like get up-streamed before the code freeze
kicks in. It effectively removes the CONFIG_PROFILER support in favour
of a more dynamic and configurable TB Statistics subsystem. While not
in use it has no impact on the system apart a small amount of wasted
space in the TCGContext for each thread which is used to collect stats
during translation. When in use a TBStats structure is created for
each set of TBs that have the same address/flags combination. The most
basic tracking involves counting executions of each TB. You can also
enable JIT stats and sort by features such as spill count and
host/guest expansion ration.

For full performance analysis you can use -perf to output a JIT dump
that is compatible with the linux perf tool. This allows you to see
exactly which TBs are responsible for the majority of real execution
time. With the additional TB stats involved it will also include basic
information about the quality of that code.

The main changes from Vanderson's last post apart from general
clean-ups and g_autoptr'ing is I've dropped the "cfg" and "coverset"
commands. I dropped "cfg" because I wasn't happy about it navigated
the next TB in the chain. "coverset" tends to dump quite a lot of TBs
on a full system emulation so I wanted to think about a slightly
smoother UI experience.

^ and finally I think I'll also drop the "tb" command for the next
iteration because although useful it fails when the guest has
un-mapped pages and results in a exception getting delivered to the
guest which is less than ideal. See the comments for get_code_string.
However I've left it in this iteration so you can see a taste of the
future.

There are more notes on the project at the wiki page:

  https://wiki.qemu.org/Features/TCGCodeQuality

Please review.

Alex Bennée (2):
  tb-stats: reset the tracked TBs on a tb_flush
  configure: remove the final bits of --profiler support

Vanderson M. do Rosario (11):
  accel/tcg: introduce TBStatistics structure
  accel: collecting TB execution count
  accel: collecting JIT statistics
  accel: replacing part of CONFIG_PROFILER with TBStats
  accel: adding TB_JIT_TIME and full replacing CONFIG_PROFILER
  debug: add -d tb_stats to control TBStatistics collection:
  monitor: adding tb_stats hmp command
  Adding info [tb-list|tb] commands to HMP (WIP)
  tb-stats: dump hot TBs at the end of the execution
  accel/tcg: adding integration with linux perf
  tb-stats: adding TBStatistics info into perf dump

 accel/tcg/Makefile.objs       |   4 +-
 accel/tcg/cpu-exec.c          |   4 +
 accel/tcg/perf/Makefile.objs  |   1 +
 accel/tcg/perf/jitdump.c      | 206 +++++++++++
 accel/tcg/perf/jitdump.h      |  36 ++
 accel/tcg/tb-stats.c          | 662 ++++++++++++++++++++++++++++++++++
 accel/tcg/tcg-runtime.c       |   7 +
 accel/tcg/tcg-runtime.h       |   2 +
 accel/tcg/translate-all.c     | 173 +++++++--
 accel/tcg/translator.c        |   4 +
 configure                     |   8 -
 cpus.c                        |  14 +-
 disas.c                       |  31 +-
 docs/devel/tcg.rst            |  15 +
 hmp-commands-info.hx          |  16 +
 hmp-commands.hx               |  17 +
 include/exec/exec-all.h       |  15 +-
 include/exec/gen-icount.h     |  10 +
 include/exec/tb-context.h     |  12 +
 include/exec/tb-hash.h        |   7 +
 include/exec/tb-stats-dump.h  |  21 ++
 include/exec/tb-stats-flags.h |  31 ++
 include/exec/tb-stats.h       | 165 +++++++++
 include/qemu-common.h         |   3 +
 include/qemu/log-for-trace.h  |   4 +
 include/qemu/log.h            |   3 +
 include/qemu/timer.h          |   5 +-
 linux-user/exit.c             |   2 +
 linux-user/main.c             |   7 +
 monitor/misc.c                | 127 +++++--
 os-posix.c                    |   5 +
 qemu-options.hx               |  11 +
 stubs/Makefile.objs           |   1 +
 stubs/tb-stats.c              |  32 ++
 tcg/tcg.c                     | 221 +++---------
 tcg/tcg.h                     |  34 +-
 util/log.c                    |  89 ++++-
 vl.c                          |  10 +-
 38 files changed, 1727 insertions(+), 288 deletions(-)
 create mode 100644 accel/tcg/perf/Makefile.objs
 create mode 100644 accel/tcg/perf/jitdump.c
 create mode 100644 accel/tcg/perf/jitdump.h
 create mode 100644 accel/tcg/tb-stats.c
 create mode 100644 include/exec/tb-stats-dump.h
 create mode 100644 include/exec/tb-stats-flags.h
 create mode 100644 include/exec/tb-stats.h
 create mode 100644 stubs/tb-stats.c

-- 
2.20.1



^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH  v9 01/13] accel/tcg: introduce TBStatistics structure
  2019-10-07 15:28 [PATCH v9 00/13] TCG code quality tracking and perf integration Alex Bennée
@ 2019-10-07 15:28 ` Alex Bennée
  2019-10-08 12:35   ` Richard Henderson
  2019-10-07 15:28 ` [PATCH v9 02/13] accel: collecting TB execution count Alex Bennée
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 37+ messages in thread
From: Alex Bennée @ 2019-10-07 15:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, cota, Vanderson M. do Rosario, Paolo Bonzini,
	Richard Henderson

From: "Vanderson M. do Rosario" <vandersonmr2@gmail.com>

To store statistics for each TB, we created a TBStatistics structure
which is linked with the TBs. TBStatistics can stay alive after
tb_flush and be relinked to a regenerated TB. So the statistics can
be accumulated even through flushes.

The goal is to have all present and future qemu/tcg statistics and
meta-data stored in this new structure.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
Message-Id: <20190829173437.5926-2-vandersonmr2@gmail.com>
[AJB: fix git author, review comments]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
AJB
  - move tcg_collect_tb_stats inside tb-stats.c
  - add spdx header
  - drop tb from tbstats and associated functions
---
 accel/tcg/Makefile.objs   |  2 +-
 accel/tcg/tb-stats.c      | 52 +++++++++++++++++++++++++++++++++
 accel/tcg/translate-all.c | 60 ++++++++++++++++++++++++++++++++++++++
 include/exec/exec-all.h   | 15 +++-------
 include/exec/tb-context.h | 12 ++++++++
 include/exec/tb-hash.h    |  7 +++++
 include/exec/tb-stats.h   | 61 +++++++++++++++++++++++++++++++++++++++
 7 files changed, 197 insertions(+), 12 deletions(-)
 create mode 100644 accel/tcg/tb-stats.c
 create mode 100644 include/exec/tb-stats.h

diff --git a/accel/tcg/Makefile.objs b/accel/tcg/Makefile.objs
index d381a02f34..49ffe81b5d 100644
--- a/accel/tcg/Makefile.objs
+++ b/accel/tcg/Makefile.objs
@@ -2,7 +2,7 @@ obj-$(CONFIG_SOFTMMU) += tcg-all.o
 obj-$(CONFIG_SOFTMMU) += cputlb.o
 obj-y += tcg-runtime.o tcg-runtime-gvec.o
 obj-y += cpu-exec.o cpu-exec-common.o translate-all.o
-obj-y += translator.o
+obj-y += translator.o tb-stats.o
 
 obj-$(CONFIG_USER_ONLY) += user-exec.o
 obj-$(call lnot,$(CONFIG_SOFTMMU)) += user-exec-stub.o
diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
new file mode 100644
index 0000000000..8208f4a0ad
--- /dev/null
+++ b/accel/tcg/tb-stats.c
@@ -0,0 +1,52 @@
+/*
+ * QEMU System Emulator, Code Quality Monitor System
+ *
+ * Copyright (c) 2019 Vanderson M. do Rosario <vandersonmr2@gmail.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+
+#include "disas/disas.h"
+
+#include "exec/tb-stats.h"
+
+/* TBStatistic collection controls */
+enum TBStatsStatus { TB_STATS_RUNNING, TB_STATS_PAUSED, TB_STATS_STOPPED };
+
+static enum TBStatsStatus tcg_collect_tb_stats;
+
+void init_tb_stats_htable_if_not(void)
+{
+    if (tb_stats_collection_enabled() && !tb_ctx.tb_stats.map) {
+        qht_init(&tb_ctx.tb_stats, tb_stats_cmp,
+                CODE_GEN_HTABLE_SIZE, QHT_MODE_AUTO_RESIZE);
+    }
+}
+
+void enable_collect_tb_stats(void)
+{
+    init_tb_stats_htable_if_not();
+    tcg_collect_tb_stats = TB_STATS_RUNNING;
+}
+
+void disable_collect_tb_stats(void)
+{
+    tcg_collect_tb_stats = TB_STATS_PAUSED;
+}
+
+void pause_collect_tb_stats(void)
+{
+    tcg_collect_tb_stats = TB_STATS_STOPPED;
+}
+
+bool tb_stats_collection_enabled(void)
+{
+    return tcg_collect_tb_stats == TB_STATS_RUNNING;
+}
+
+bool tb_stats_collection_paused(void)
+{
+    return tcg_collect_tb_stats == TB_STATS_PAUSED;
+}
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 66d4bc4341..114ebe48bf 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1118,6 +1118,23 @@ static inline void code_gen_alloc(size_t tb_size)
     }
 }
 
+/*
+ * This is the more or less the same compare as tb_cmp(), but the
+ * data persists over tb_flush. We also aggregate the various
+ * variations of cflags under one record and ignore the details of
+ * page overlap (although we can count it).
+ */
+bool tb_stats_cmp(const void *ap, const void *bp)
+{
+    const TBStatistics *a = ap;
+    const TBStatistics *b = bp;
+
+    return a->phys_pc == b->phys_pc &&
+        a->pc == b->pc &&
+        a->cs_base == b->cs_base &&
+        a->flags == b->flags;
+}
+
 static bool tb_cmp(const void *ap, const void *bp)
 {
     const TranslationBlock *a = ap;
@@ -1137,6 +1154,7 @@ static void tb_htable_init(void)
     unsigned int mode = QHT_MODE_AUTO_RESIZE;
 
     qht_init(&tb_ctx.htable, tb_cmp, CODE_GEN_HTABLE_SIZE, mode);
+    init_tb_stats_htable_if_not();
 }
 
 /* Must be called before using the QEMU cpus. 'tb_size' is the size
@@ -1666,6 +1684,37 @@ tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
     return tb;
 }
 
+static TBStatistics *tb_get_stats(tb_page_addr_t phys_pc, target_ulong pc,
+                                  target_ulong cs_base, uint32_t flags)
+{
+    TBStatistics *new_stats = g_new0(TBStatistics, 1);
+    uint32_t hash = tb_stats_hash_func(phys_pc, pc, flags);
+    void *existing_stats = NULL;
+    new_stats->phys_pc = phys_pc;
+    new_stats->pc = pc;
+    new_stats->cs_base = cs_base;
+    new_stats->flags = flags;
+
+    /*
+     * All initialisation must be complete before we insert into qht
+     * table otherwise another thread might get a partially created
+     * structure.
+     */
+    qht_insert(&tb_ctx.tb_stats, new_stats, hash, &existing_stats);
+
+    if (unlikely(existing_stats)) {
+        /*
+         * If there is already a TBStatistic for this TB from a previous flush
+         * then just make the new TB point to the older TBStatistic
+         */
+        g_free(new_stats);
+        return existing_stats;
+    } else {
+        return new_stats;
+    }
+}
+
+
 /* Called with mmap_lock held for user mode emulation.  */
 TranslationBlock *tb_gen_code(CPUState *cpu,
                               target_ulong pc, target_ulong cs_base,
@@ -1732,6 +1781,17 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     ti = profile_getclock();
 #endif
 
+    /*
+     * We want to fetch the stats structure before we start code
+     * generation so we can count interesting things about this
+     * generation.
+     */
+    if (tb_stats_collection_enabled()) {
+        tb->tb_stats = tb_get_stats(phys_pc, pc, cs_base, flags);
+    } else {
+        tb->tb_stats = NULL;
+    }
+
     tcg_func_start(tcg_ctx);
 
     tcg_ctx->cpu = env_cpu(env);
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 49db07ba0b..73b15db7a7 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -23,21 +23,11 @@
 #include "cpu.h"
 #include "exec/tb-context.h"
 #include "sysemu/cpus.h"
+#include "exec/tb-stats.h"
 
 /* allow to see translation results - the slowdown should be negligible, so we leave it */
 #define DEBUG_DISAS
 
-/* Page tracking code uses ram addresses in system mode, and virtual
-   addresses in userspace mode.  Define tb_page_addr_t to be an appropriate
-   type.  */
-#if defined(CONFIG_USER_ONLY)
-typedef abi_ulong tb_page_addr_t;
-#define TB_PAGE_ADDR_FMT TARGET_ABI_FMT_lx
-#else
-typedef ram_addr_t tb_page_addr_t;
-#define TB_PAGE_ADDR_FMT RAM_ADDR_FMT
-#endif
-
 #include "qemu/log.h"
 
 void gen_intermediate_code(CPUState *cpu, TranslationBlock *tb, int max_insns);
@@ -410,6 +400,9 @@ struct TranslationBlock {
     uintptr_t jmp_list_head;
     uintptr_t jmp_list_next[2];
     uintptr_t jmp_dest[2];
+
+    /* Pointer to a struct where statistics from the TB is stored */
+    TBStatistics *tb_stats;
 };
 
 extern bool parallel_cpus;
diff --git a/include/exec/tb-context.h b/include/exec/tb-context.h
index feb585e0a7..3cfb62a338 100644
--- a/include/exec/tb-context.h
+++ b/include/exec/tb-context.h
@@ -23,6 +23,17 @@
 #include "qemu/thread.h"
 #include "qemu/qht.h"
 
+/* Page tracking code uses ram addresses in system mode, and virtual
+   addresses in userspace mode.  Define tb_page_addr_t to be an appropriate
+   type.  */
+#if defined(CONFIG_USER_ONLY)
+typedef abi_ulong tb_page_addr_t;
+#define TB_PAGE_ADDR_FMT TARGET_ABI_FMT_lx
+#else
+typedef ram_addr_t tb_page_addr_t;
+#define TB_PAGE_ADDR_FMT RAM_ADDR_FMT
+#endif
+
 #define CODE_GEN_HTABLE_BITS     15
 #define CODE_GEN_HTABLE_SIZE     (1 << CODE_GEN_HTABLE_BITS)
 
@@ -35,6 +46,7 @@ struct TBContext {
 
     /* statistics */
     unsigned tb_flush_count;
+    struct qht tb_stats;
 };
 
 extern TBContext tb_ctx;
diff --git a/include/exec/tb-hash.h b/include/exec/tb-hash.h
index 805235d321..dd8e8f252a 100644
--- a/include/exec/tb-hash.h
+++ b/include/exec/tb-hash.h
@@ -66,4 +66,11 @@ uint32_t tb_hash_func(tb_page_addr_t phys_pc, target_ulong pc, uint32_t flags,
     return qemu_xxhash7(phys_pc, pc, flags, cf_mask, trace_vcpu_dstate);
 }
 
+static inline
+uint32_t tb_stats_hash_func(tb_page_addr_t phys_pc, target_ulong pc,
+                            uint32_t flags)
+{
+    return qemu_xxhash5(phys_pc, pc, flags);
+}
+
 #endif
diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
new file mode 100644
index 0000000000..4be6522da0
--- /dev/null
+++ b/include/exec/tb-stats.h
@@ -0,0 +1,61 @@
+/*
+ * QEMU System Emulator, Code Quality Monitor System
+ *
+ * Copyright (c) 2019 Vanderson M. do Rosario <vandersonmr2@gmail.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef TB_STATS_H
+
+#define TB_STATS_H
+
+#include "exec/cpu-common.h"
+#include "exec/tb-context.h"
+#include "tcg.h"
+
+typedef struct TBStatistics TBStatistics;
+
+/*
+ * This struct stores statistics such as execution count of the
+ * TranslationBlocks. Each sets of TBs for a given phys_pc/pc/flags
+ * has its own TBStatistics which will persist over tb_flush.
+ *
+ * We include additional counters to track number of translations as
+ * well as variants for compile flags.
+ */
+struct TBStatistics {
+    tb_page_addr_t phys_pc;
+    target_ulong pc;
+    uint32_t     flags;
+    /* cs_base isn't included in the hash but we do check for matches */
+    target_ulong cs_base;
+};
+
+bool tb_stats_cmp(const void *ap, const void *bp);
+
+void init_tb_stats_htable_if_not(void);
+
+void enable_collect_tb_stats(void);
+void disable_collect_tb_stats(void);
+void pause_collect_tb_stats(void);
+bool tb_stats_collection_enabled(void);
+bool tb_stats_collection_paused(void);
+
+#endif
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH  v9 02/13] accel: collecting TB execution count
  2019-10-07 15:28 [PATCH v9 00/13] TCG code quality tracking and perf integration Alex Bennée
  2019-10-07 15:28 ` [PATCH v9 01/13] accel/tcg: introduce TBStatistics structure Alex Bennée
@ 2019-10-07 15:28 ` Alex Bennée
  2019-10-08 13:10   ` Richard Henderson
  2019-10-07 15:28 ` [PATCH v9 03/13] accel: collecting JIT statistics Alex Bennée
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 37+ messages in thread
From: Alex Bennée @ 2019-10-07 15:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, cota, Vanderson M. do Rosario, Paolo Bonzini,
	Richard Henderson

From: "Vanderson M. do Rosario" <vandersonmr2@gmail.com>

If a TB has a TBS (TBStatistics) with the TB_EXEC_STATS
enabled, then we instrument the start code of this TB
to atomically count the number of times it is executed.
We count both the number of "normal" executions and atomic
executions of a TB.

The execution count of the TB is stored in its respective
TBS.

All TBStatistics are created by default with the flags from
default_tbstats_flag.

Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
Message-Id: <20190829173437.5926-3-vandersonmr2@gmail.com>
[AJB: Fix author]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
AJB:
  - move default_tbstats_flag to tb-stats
  - hoist dfilter check
---
 accel/tcg/cpu-exec.c      |  4 ++++
 accel/tcg/tb-stats.c      |  6 ++++++
 accel/tcg/tcg-runtime.c   |  7 +++++++
 accel/tcg/tcg-runtime.h   |  2 ++
 accel/tcg/translate-all.c | 11 +++++++++--
 accel/tcg/translator.c    |  1 +
 include/exec/gen-icount.h |  9 +++++++++
 include/exec/tb-stats.h   | 18 ++++++++++++++++++
 8 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 48272c781b..9b2b7bff80 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -251,6 +251,10 @@ void cpu_exec_step_atomic(CPUState *cpu)
 
         start_exclusive();
 
+        if (tb_stats_enabled(tb, TB_EXEC_STATS)) {
+            tb->tb_stats->executions.atomic++;
+        }
+
         /* Since we got here, we know that parallel_cpus must be true.  */
         parallel_cpus = false;
         in_exclusive_region = true;
diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
index 8208f4a0ad..ee0506bff1 100644
--- a/accel/tcg/tb-stats.c
+++ b/accel/tcg/tb-stats.c
@@ -16,6 +16,7 @@
 enum TBStatsStatus { TB_STATS_RUNNING, TB_STATS_PAUSED, TB_STATS_STOPPED };
 
 static enum TBStatsStatus tcg_collect_tb_stats;
+static uint32_t default_tbstats_flag;
 
 void init_tb_stats_htable_if_not(void)
 {
@@ -50,3 +51,8 @@ bool tb_stats_collection_paused(void)
 {
     return tcg_collect_tb_stats == TB_STATS_PAUSED;
 }
+
+uint32_t get_default_tbstats_flag(void)
+{
+    return default_tbstats_flag;
+}
diff --git a/accel/tcg/tcg-runtime.c b/accel/tcg/tcg-runtime.c
index 8a1e408e31..6f4aafba11 100644
--- a/accel/tcg/tcg-runtime.c
+++ b/accel/tcg/tcg-runtime.c
@@ -167,3 +167,10 @@ void HELPER(exit_atomic)(CPUArchState *env)
 {
     cpu_loop_exit_atomic(env_cpu(env), GETPC());
 }
+
+void HELPER(inc_exec_freq)(void *ptr)
+{
+    TBStatistics *stats = (TBStatistics *) ptr;
+    g_assert(stats);
+    atomic_inc(&stats->executions.normal);
+}
diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h
index 4fa61b49b4..bf0b75dbe8 100644
--- a/accel/tcg/tcg-runtime.h
+++ b/accel/tcg/tcg-runtime.h
@@ -28,6 +28,8 @@ DEF_HELPER_FLAGS_1(lookup_tb_ptr, TCG_CALL_NO_WG_SE, ptr, env)
 
 DEF_HELPER_FLAGS_1(exit_atomic, TCG_CALL_NO_WG, noreturn, env)
 
+DEF_HELPER_FLAGS_1(inc_exec_freq, TCG_CALL_NO_RWG, void, ptr)
+
 #ifdef CONFIG_SOFTMMU
 
 DEF_HELPER_FLAGS_5(atomic_cmpxchgb, TCG_CALL_NO_WG,
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 114ebe48bf..b7dd1a78e5 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1784,10 +1784,17 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     /*
      * We want to fetch the stats structure before we start code
      * generation so we can count interesting things about this
-     * generation.
+     * generation. If dfilter is in effect we will only collect stats
+     * for the specified range.
      */
-    if (tb_stats_collection_enabled()) {
+    if (tb_stats_collection_enabled() &&
+        qemu_log_in_addr_range(tb->pc)) {
+        uint32_t flag = get_default_tbstats_flag();
         tb->tb_stats = tb_get_stats(phys_pc, pc, cs_base, flags);
+
+        if (flag & TB_EXEC_STATS) {
+            tb->tb_stats->stats_enabled |= TB_EXEC_STATS;
+        }
     } else {
         tb->tb_stats = NULL;
     }
diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
index 70c66c538c..ec6bd829a0 100644
--- a/accel/tcg/translator.c
+++ b/accel/tcg/translator.c
@@ -46,6 +46,7 @@ void translator_loop(const TranslatorOps *ops, DisasContextBase *db,
 
     ops->init_disas_context(db, cpu);
     tcg_debug_assert(db->is_jmp == DISAS_NEXT);  /* no early exit */
+    gen_tb_exec_count(tb);
 
     /* Reset the temp count so that we can identify leaks */
     tcg_clear_temp_count();
diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h
index 822c43cfd3..be006383b9 100644
--- a/include/exec/gen-icount.h
+++ b/include/exec/gen-icount.h
@@ -32,6 +32,15 @@ static inline void gen_io_end(void)
     tcg_temp_free_i32(tmp);
 }
 
+static inline void gen_tb_exec_count(TranslationBlock *tb)
+{
+    if (tb_stats_enabled(tb, TB_EXEC_STATS)) {
+        TCGv_ptr ptr = tcg_const_ptr(tb->tb_stats);
+        gen_helper_inc_exec_freq(ptr);
+        tcg_temp_free_ptr(ptr);
+    }
+}
+
 static inline void gen_tb_start(TranslationBlock *tb)
 {
     TCGv_i32 count, imm;
diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
index 4be6522da0..51aecf65e2 100644
--- a/include/exec/tb-stats.h
+++ b/include/exec/tb-stats.h
@@ -30,6 +30,9 @@
 #include "exec/tb-context.h"
 #include "tcg.h"
 
+#define tb_stats_enabled(tb, JIT_STATS) \
+    (tb && tb->tb_stats && (tb->tb_stats->stats_enabled & JIT_STATS))
+
 typedef struct TBStatistics TBStatistics;
 
 /*
@@ -46,16 +49,31 @@ struct TBStatistics {
     uint32_t     flags;
     /* cs_base isn't included in the hash but we do check for matches */
     target_ulong cs_base;
+
+    /* which stats are enabled for this TBStats */
+    uint32_t stats_enabled;
+
+    /* Execution stats */
+    struct {
+        unsigned long normal;
+        unsigned long atomic;
+    } executions;
+
 };
 
 bool tb_stats_cmp(const void *ap, const void *bp);
 
 void init_tb_stats_htable_if_not(void);
 
+#define TB_NOTHING    (1 << 0)
+#define TB_EXEC_STATS (1 << 1)
+
 void enable_collect_tb_stats(void);
 void disable_collect_tb_stats(void);
 void pause_collect_tb_stats(void);
 bool tb_stats_collection_enabled(void);
 bool tb_stats_collection_paused(void);
 
+uint32_t get_default_tbstats_flag(void);
+
 #endif
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH  v9 03/13] accel: collecting JIT statistics
  2019-10-07 15:28 [PATCH v9 00/13] TCG code quality tracking and perf integration Alex Bennée
  2019-10-07 15:28 ` [PATCH v9 01/13] accel/tcg: introduce TBStatistics structure Alex Bennée
  2019-10-07 15:28 ` [PATCH v9 02/13] accel: collecting TB execution count Alex Bennée
@ 2019-10-07 15:28 ` Alex Bennée
  2019-10-08 13:38   ` Richard Henderson
  2019-10-07 15:28 ` [PATCH v9 04/13] accel: replacing part of CONFIG_PROFILER with TBStats Alex Bennée
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 37+ messages in thread
From: Alex Bennée @ 2019-10-07 15:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, cota, Vanderson M. do Rosario, Paolo Bonzini,
	Richard Henderson

From: "Vanderson M. do Rosario" <vandersonmr2@gmail.com>

If a TB has a TBS (TBStatistics) with the TB_JIT_STATS enabled then we
collect statistics of its translation processes and code translation.

To help with collection we include the TCGProfile structure
unconditionally. It will have further alterations in future commits.

Collecting the number of host instructions seems to be not simple as
it would imply in having to modify several target source files. So,
for now, we are only collecting the size of the host gen code.

Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
Message-Id: <20190829173437.5926-4-vandersonmr2@gmail.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
AJB:
  - replace tb tracking with proper array.
  - stash tcg counts in tcg_ctx.prof until we can update later
  - update jit stats under a lock instead of lots of atomics
  - don't re-count nb_ops
---
 accel/tcg/translate-all.c | 35 ++++++++++++++++++++++++++++++++++-
 accel/tcg/translator.c    |  3 +++
 include/exec/tb-stats.h   | 23 +++++++++++++++++++++++
 tcg/tcg.c                 |  9 +++++++--
 tcg/tcg.h                 | 23 +++++++++++++++++++++--
 5 files changed, 88 insertions(+), 5 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index b7dd1a78e5..6fa9850a3a 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1690,10 +1690,13 @@ static TBStatistics *tb_get_stats(tb_page_addr_t phys_pc, target_ulong pc,
     TBStatistics *new_stats = g_new0(TBStatistics, 1);
     uint32_t hash = tb_stats_hash_func(phys_pc, pc, flags);
     void *existing_stats = NULL;
+
     new_stats->phys_pc = phys_pc;
     new_stats->pc = pc;
     new_stats->cs_base = cs_base;
     new_stats->flags = flags;
+    new_stats->tbs = g_ptr_array_sized_new(4);
+    qemu_mutex_init(&new_stats->jit_stats_lock);
 
     /*
      * All initialisation must be complete before we insert into qht
@@ -1707,6 +1710,7 @@ static TBStatistics *tb_get_stats(tb_page_addr_t phys_pc, target_ulong pc,
          * If there is already a TBStatistic for this TB from a previous flush
          * then just make the new TB point to the older TBStatistic
          */
+        g_ptr_array_free(new_stats->tbs, true);
         g_free(new_stats);
         return existing_stats;
     } else {
@@ -1726,8 +1730,8 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     target_ulong virt_page2;
     tcg_insn_unit *gen_code_buf;
     int gen_code_size, search_size, max_insns;
-#ifdef CONFIG_PROFILER
     TCGProfile *prof = &tcg_ctx->prof;
+#ifdef CONFIG_PROFILER
     int64_t ti;
 #endif
     assert_memory_lock();
@@ -1795,6 +1799,10 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
         if (flag & TB_EXEC_STATS) {
             tb->tb_stats->stats_enabled |= TB_EXEC_STATS;
         }
+
+        if (flag & TB_JIT_STATS) {
+            tb->tb_stats->stats_enabled |= TB_JIT_STATS;
+        }
     } else {
         tb->tb_stats = NULL;
     }
@@ -1930,6 +1938,31 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     if ((pc & TARGET_PAGE_MASK) != virt_page2) {
         phys_page2 = get_page_addr_code(env, virt_page2);
     }
+
+    /*
+     * Collect JIT stats when enabled. We batch them all up here to
+     * avoid spamming the cache with atomic accesses
+     */
+    if (tb_stats_enabled(tb, TB_JIT_STATS)) {
+        TBStatistics *ts = tb->tb_stats;
+        qemu_mutex_lock(&ts->jit_stats_lock);
+
+        ts->code.num_guest_inst += prof->translation.nb_guest_insns;
+        ts->code.num_tcg_ops += prof->translation.nb_ops_pre_opt;
+        ts->code.num_tcg_ops_opt += tcg_ctx->nb_ops;
+        ts->code.spills += prof->translation.nb_spills;
+        ts->code.out_len += tb->tc.size;
+
+        ts->translations.total++;
+        if (phys_page2 != -1) {
+            ts->translations.spanning++;
+        }
+
+        g_ptr_array_add(ts->tbs, tb);
+
+        qemu_mutex_unlock(&ts->jit_stats_lock);
+    }
+
     /*
      * No explicit memory barrier is required -- tb_link_page() makes the
      * TB visible in a consistent state.
diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
index ec6bd829a0..114e76db27 100644
--- a/accel/tcg/translator.c
+++ b/accel/tcg/translator.c
@@ -116,6 +116,9 @@ void translator_loop(const TranslatorOps *ops, DisasContextBase *db,
     db->tb->size = db->pc_next - db->pc_first;
     db->tb->icount = db->num_insns;
 
+    /* Save number of guest instructions for TB_JIT_STATS */
+    tcg_ctx->prof.translation.nb_guest_insns = db->num_insns;
+
 #ifdef DEBUG_DISAS
     if (qemu_loglevel_mask(CPU_LOG_TB_IN_ASM)
         && qemu_log_in_addr_range(db->pc_first)) {
diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
index 51aecf65e2..566ae0d2be 100644
--- a/include/exec/tb-stats.h
+++ b/include/exec/tb-stats.h
@@ -59,6 +59,28 @@ struct TBStatistics {
         unsigned long atomic;
     } executions;
 
+    /* JIT Stats - protected by lock */
+    QemuMutex jit_stats_lock;
+
+    struct {
+        unsigned num_guest_inst;
+        unsigned num_tcg_ops;
+        unsigned num_tcg_ops_opt;
+        unsigned spills;
+        unsigned out_len;
+    } code;
+
+    struct {
+        unsigned long total;
+        unsigned long uncached;
+        unsigned long spanning;
+    } translations;
+
+    /*
+     * All persistent (cached) TranslationBlocks using
+     * this TBStats structure. Has to be reset on a tb_flush.
+     */
+    GPtrArray *tbs;
 };
 
 bool tb_stats_cmp(const void *ap, const void *bp);
@@ -67,6 +89,7 @@ void init_tb_stats_htable_if_not(void);
 
 #define TB_NOTHING    (1 << 0)
 #define TB_EXEC_STATS (1 << 1)
+#define TB_JIT_STATS  (1 << 2)
 
 void enable_collect_tb_stats(void);
 void disable_collect_tb_stats(void);
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 16b2d0e0ec..fa2fef9f1a 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1117,6 +1117,7 @@ void tcg_func_start(TCGContext *s)
     s->nb_labels = 0;
     s->current_frame_offset = s->frame_start;
 
+    s->prof.translation.nb_spills = 0;
 #ifdef CONFIG_DEBUG_TCG
     s->goto_tb_issue_mask = 0;
 #endif
@@ -3125,6 +3126,7 @@ static void temp_sync(TCGContext *s, TCGTemp *ts, TCGRegSet allocated_regs,
         case TEMP_VAL_REG:
             tcg_out_st(s, ts->type, ts->reg,
                        ts->mem_base->reg, ts->mem_offset);
+            s->prof.translation.nb_spills++;
             break;
 
         case TEMP_VAL_MEM:
@@ -3990,12 +3992,12 @@ int64_t tcg_cpu_exec_time(void)
 
 int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
 {
-#ifdef CONFIG_PROFILER
     TCGProfile *prof = &s->prof;
-#endif
     int i, num_insns;
     TCGOp *op;
 
+    s->current_tb = tb;
+
 #ifdef CONFIG_PROFILER
     {
         int n = 0;
@@ -4027,6 +4029,9 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
     }
 #endif
 
+    /* save pre-optimisation op count */
+    prof->translation.nb_ops_pre_opt = s->nb_ops;
+
 #ifdef CONFIG_DEBUG_TCG
     /* Ensure all labels referenced have been emitted.  */
     {
diff --git a/tcg/tcg.h b/tcg/tcg.h
index a37181c899..31571289cf 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -555,7 +555,26 @@ typedef struct TCGOp {
 /* Make sure operands fit in the bitfields above.  */
 QEMU_BUILD_BUG_ON(NB_OPS > (1 << 8));
 
+/*
+ * The TCGProfile structure holds data for analysing the quality of
+ * the code generation. The data is split between stuff that is valid
+ * for the lifetime of a single translation and things that are valid
+ * for the lifetime of the translator. As the former is reset for each
+ * new translation so it should be copied elsewhere if you want to
+ * keep it.
+ *
+ * The structure is safe to access within the context of translation
+ * but accessing the data from elsewhere should be done with safe
+ * work.
+ */
 typedef struct TCGProfile {
+
+    struct {
+        int nb_guest_insns;
+        int nb_spills;
+        int nb_ops_pre_opt;
+    } translation;
+
     int64_t cpu_exec_time;
     int64_t tb_count1;
     int64_t tb_count;
@@ -600,9 +619,7 @@ struct TCGContext {
 
     tcg_insn_unit *code_ptr;
 
-#ifdef CONFIG_PROFILER
     TCGProfile prof;
-#endif
 
 #ifdef CONFIG_DEBUG_TCG
     int temps_in_use;
@@ -651,6 +668,8 @@ struct TCGContext {
 
     uint16_t gen_insn_end_off[TCG_MAX_INSNS];
     target_ulong gen_insn_data[TCG_MAX_INSNS][TARGET_INSN_START_WORDS];
+
+    TranslationBlock *current_tb;
 };
 
 extern TCGContext tcg_init_ctx;
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v9 04/13] accel: replacing part of CONFIG_PROFILER with TBStats
  2019-10-07 15:28 [PATCH v9 00/13] TCG code quality tracking and perf integration Alex Bennée
                   ` (2 preceding siblings ...)
  2019-10-07 15:28 ` [PATCH v9 03/13] accel: collecting JIT statistics Alex Bennée
@ 2019-10-07 15:28 ` Alex Bennée
  2019-10-08 13:58   ` Richard Henderson
  2019-10-07 15:28 ` [PATCH v9 05/13] accel: adding TB_JIT_TIME and full replacing CONFIG_PROFILER Alex Bennée
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 37+ messages in thread
From: Alex Bennée @ 2019-10-07 15:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, cota, Vanderson M. do Rosario, Paolo Bonzini,
	Richard Henderson

From: "Vanderson M. do Rosario" <vandersonmr2@gmail.com>

We add some of the statistics collected in the TCGProfiler into the
TBStats, having the statistics not only for the whole emulation but
for each TB. Then, we removed these stats from TCGProfiler and
reconstruct the information for the "info jit" using the sum of all
TBStats statistics.

The goal is to have one unique and better way of collecting emulation
statistics. Moreover, checking dynamiclly if the profiling is enabled
showed to have an insignificant impact on the performance:
https://wiki.qemu.org/Internships/ProjectIdeas/TCGCodeQuality#Overheads.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
Message-Id: <20190829173437.5926-5-vandersonmr2@gmail.com>
[AJB: fix authorship, use prof structure]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
AJB:
  - moving profiler stuff under jit_stats lock
  - keep TCGProf as per-ctx staging post for values
  - delete max counters as can be interfered.
---
 accel/tcg/tb-stats.c      |  95 ++++++++++++++++++++++++++++++++++
 accel/tcg/translate-all.c |  11 ++--
 include/exec/tb-stats.h   |  11 ++++
 tcg/tcg.c                 | 105 ++++----------------------------------
 tcg/tcg.h                 |   9 ++--
 5 files changed, 125 insertions(+), 106 deletions(-)

diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
index ee0506bff1..0e64c176b3 100644
--- a/accel/tcg/tb-stats.c
+++ b/accel/tcg/tb-stats.c
@@ -9,6 +9,10 @@
 #include "qemu/osdep.h"
 
 #include "disas/disas.h"
+#include "exec/exec-all.h"
+#include "tcg.h"
+
+#include "qemu/qemu-print.h"
 
 #include "exec/tb-stats.h"
 
@@ -18,6 +22,97 @@ enum TBStatsStatus { TB_STATS_RUNNING, TB_STATS_PAUSED, TB_STATS_STOPPED };
 static enum TBStatsStatus tcg_collect_tb_stats;
 static uint32_t default_tbstats_flag;
 
+struct jit_profile_info {
+    uint64_t translations;
+    uint64_t aborted;
+    uint64_t ops;
+    unsigned ops_max;
+    uint64_t del_ops;
+    uint64_t temps;
+    unsigned temps_max;
+    uint64_t host;
+    uint64_t guest;
+    uint64_t search_data;
+};
+
+/* accumulate the statistics from all TBs */
+static void collect_jit_profile_info(void *p, uint32_t hash, void *userp)
+{
+    struct jit_profile_info *jpi = userp;
+    TBStatistics *tbs = p;
+
+    jpi->translations += tbs->translations.total;
+    jpi->ops += tbs->code.num_tcg_ops;
+    if (stat_per_translation(tbs, code.num_tcg_ops) > jpi->ops_max) {
+        jpi->ops_max = stat_per_translation(tbs, code.num_tcg_ops);
+    }
+    jpi->del_ops += tbs->code.deleted_ops;
+    jpi->temps += tbs->code.temps;
+    if (stat_per_translation(tbs, code.temps) > jpi->temps_max) {
+        jpi->temps_max = stat_per_translation(tbs, code.temps);
+    }
+    jpi->host += tbs->code.out_len;
+    jpi->guest += tbs->code.in_len;
+    jpi->search_data += tbs->code.search_out_len;
+}
+
+/* dump JIT statisticis using TCGProfile and TBStats */
+void dump_jit_profile_info(TCGProfile *s)
+{
+    if (!tb_stats_collection_enabled()) {
+        return;
+    }
+
+    struct jit_profile_info *jpi = g_new0(struct jit_profile_info, 1);
+
+    qht_iter(&tb_ctx.tb_stats, collect_jit_profile_info, jpi);
+
+    if (jpi->translations) {
+        qemu_printf("translated TBs      %" PRId64 "\n", jpi->translations);
+        qemu_printf("avg ops/TB          %0.1f max=%d\n",
+                jpi->ops / (double) jpi->translations, jpi->ops_max);
+        qemu_printf("deleted ops/TB      %0.2f\n",
+                jpi->del_ops / (double) jpi->translations);
+        qemu_printf("avg temps/TB        %0.2f max=%d\n",
+                jpi->temps / (double) jpi->translations, jpi->temps_max);
+        qemu_printf("avg host code/TB    %0.1f\n",
+                jpi->host / (double) jpi->translations);
+        qemu_printf("avg search data/TB  %0.1f\n",
+                jpi->search_data / (double) jpi->translations);
+
+        if (s) {
+            int64_t tot = s->interm_time + s->code_time;
+            qemu_printf("JIT cycles          %" PRId64 " (%0.3f s at 2.4 GHz)\n",
+                            tot, tot / 2.4e9);
+            qemu_printf("cycles/op           %0.1f\n",
+                        jpi->ops ? (double)tot / jpi->ops : 0);
+            qemu_printf("cycles/in byte      %0.1f\n",
+                        jpi->guest ? (double)tot / jpi->guest : 0);
+            qemu_printf("cycles/out byte     %0.1f\n",
+                        jpi->host ? (double)tot / jpi->host : 0);
+            qemu_printf("cycles/search byte     %0.1f\n",
+                        jpi->search_data ? (double)tot / jpi->search_data : 0);
+            if (tot == 0) {
+                tot = 1;
+            }
+            qemu_printf("  gen_interm time   %0.1f%%\n",
+                        (double)s->interm_time / tot * 100.0);
+            qemu_printf("  gen_code time     %0.1f%%\n",
+                        (double)s->code_time / tot * 100.0);
+            qemu_printf("optim./code time    %0.1f%%\n",
+                        (double)s->opt_time / (s->code_time ? s->code_time : 1)
+                        * 100.0);
+            qemu_printf("liveness/code time  %0.1f%%\n",
+                    (double)s->la_time / (s->code_time ? s->code_time : 1) * 100.0);
+            qemu_printf("cpu_restore count   %" PRId64 "\n",
+                    s->restore_count);
+            qemu_printf("  avg cycles        %0.1f\n",
+                    s->restore_count ? (double)s->restore_time / s->restore_count : 0);
+        }
+    }
+    g_free(jpi);
+}
+
 void init_tb_stats_htable_if_not(void)
 {
     if (tb_stats_collection_enabled() && !tb_ctx.tb_stats.map) {
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 6fa9850a3a..84f759a829 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1780,8 +1780,6 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
  tb_overflow:
 
 #ifdef CONFIG_PROFILER
-    /* includes aborted translations because of exceptions */
-    atomic_set(&prof->tb_count1, prof->tb_count1 + 1);
     ti = profile_getclock();
 #endif
 
@@ -1807,6 +1805,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
         tb->tb_stats = NULL;
     }
 
+
     tcg_func_start(tcg_ctx);
 
     tcg_ctx->cpu = env_cpu(env);
@@ -1828,7 +1827,6 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     }
 
 #ifdef CONFIG_PROFILER
-    atomic_set(&prof->tb_count, prof->tb_count + 1);
     atomic_set(&prof->interm_time, prof->interm_time + profile_getclock() - ti);
     ti = profile_getclock();
 #endif
@@ -1875,9 +1873,6 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 
 #ifdef CONFIG_PROFILER
     atomic_set(&prof->code_time, prof->code_time + profile_getclock() - ti);
-    atomic_set(&prof->code_in_len, prof->code_in_len + tb->size);
-    atomic_set(&prof->code_out_len, prof->code_out_len + gen_code_size);
-    atomic_set(&prof->search_out_len, prof->search_out_len + search_size);
 #endif
 
 #ifdef DEBUG_DISAS
@@ -1951,7 +1946,11 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
         ts->code.num_tcg_ops += prof->translation.nb_ops_pre_opt;
         ts->code.num_tcg_ops_opt += tcg_ctx->nb_ops;
         ts->code.spills += prof->translation.nb_spills;
+        ts->code.temps += prof->translation.temp_count;
+        ts->code.deleted_ops += prof->translation.del_op_count;
+        ts->code.in_len += tb->size;
         ts->code.out_len += tb->tc.size;
+        ts->code.search_out_len += search_size;
 
         ts->translations.total++;
         if (phys_page2 != -1) {
diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
index 566ae0d2be..c779d6a183 100644
--- a/include/exec/tb-stats.h
+++ b/include/exec/tb-stats.h
@@ -33,6 +33,9 @@
 #define tb_stats_enabled(tb, JIT_STATS) \
     (tb && tb->tb_stats && (tb->tb_stats->stats_enabled & JIT_STATS))
 
+#define stat_per_translation(stat, name) \
+    (stat->translations.total ? stat->name / stat->translations.total : 0)
+
 typedef struct TBStatistics TBStatistics;
 
 /*
@@ -67,7 +70,13 @@ struct TBStatistics {
         unsigned num_tcg_ops;
         unsigned num_tcg_ops_opt;
         unsigned spills;
+
+        /* CONFIG_PROFILE */
+        unsigned temps;
+        unsigned deleted_ops;
+        unsigned in_len;
         unsigned out_len;
+        unsigned search_out_len;
     } code;
 
     struct {
@@ -87,6 +96,8 @@ bool tb_stats_cmp(const void *ap, const void *bp);
 
 void init_tb_stats_htable_if_not(void);
 
+void dump_jit_profile_info(TCGProfile *s);
+
 #define TB_NOTHING    (1 << 0)
 #define TB_EXEC_STATS (1 << 1)
 #define TB_JIT_STATS  (1 << 2)
diff --git a/tcg/tcg.c b/tcg/tcg.c
index fa2fef9f1a..9c199c8a40 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1118,6 +1118,8 @@ void tcg_func_start(TCGContext *s)
     s->current_frame_offset = s->frame_start;
 
     s->prof.translation.nb_spills = 0;
+    s->prof.translation.del_op_count = 0;
+    s->prof.translation.temp_count = 0;
 #ifdef CONFIG_DEBUG_TCG
     s->goto_tb_issue_mask = 0;
 #endif
@@ -2294,10 +2296,8 @@ void tcg_op_remove(TCGContext *s, TCGOp *op)
     QTAILQ_REMOVE(&s->ops, op, link);
     QTAILQ_INSERT_TAIL(&s->free_ops, op, link);
     s->nb_ops--;
-
-#ifdef CONFIG_PROFILER
-    atomic_set(&s->prof.del_op_count, s->prof.del_op_count + 1);
-#endif
+    /* ? won't this end up op_opt - op = del_op_count ? */
+    s->prof.translation.del_op_count++;
 }
 
 static TCGOp *tcg_op_alloc(TCGOpcode opc)
@@ -3910,16 +3910,6 @@ void tcg_profile_snapshot(TCGProfile *prof, bool counters, bool table)
 
         if (counters) {
             PROF_ADD(prof, orig, cpu_exec_time);
-            PROF_ADD(prof, orig, tb_count1);
-            PROF_ADD(prof, orig, tb_count);
-            PROF_ADD(prof, orig, op_count);
-            PROF_MAX(prof, orig, op_count_max);
-            PROF_ADD(prof, orig, temp_count);
-            PROF_MAX(prof, orig, temp_count_max);
-            PROF_ADD(prof, orig, del_op_count);
-            PROF_ADD(prof, orig, code_in_len);
-            PROF_ADD(prof, orig, code_out_len);
-            PROF_ADD(prof, orig, search_out_len);
             PROF_ADD(prof, orig, interm_time);
             PROF_ADD(prof, orig, code_time);
             PROF_ADD(prof, orig, la_time);
@@ -3997,26 +3987,9 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
     TCGOp *op;
 
     s->current_tb = tb;
-
-#ifdef CONFIG_PROFILER
-    {
-        int n = 0;
-
-        QTAILQ_FOREACH(op, &s->ops, link) {
-            n++;
-        }
-        atomic_set(&prof->op_count, prof->op_count + n);
-        if (n > prof->op_count_max) {
-            atomic_set(&prof->op_count_max, n);
-        }
-
-        n = s->nb_temps;
-        atomic_set(&prof->temp_count, prof->temp_count + n);
-        if (n > prof->temp_count_max) {
-            atomic_set(&prof->temp_count_max, n);
-        }
-    }
-#endif
+    /* save pre-optimisation op count */
+    prof->translation.nb_ops_pre_opt = s->nb_ops;
+    prof->translation.temp_count = s->nb_temps;
 
 #ifdef DEBUG_DISAS
     if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP)
@@ -4029,8 +4002,6 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
     }
 #endif
 
-    /* save pre-optimisation op count */
-    prof->translation.nb_ops_pre_opt = s->nb_ops;
 
 #ifdef CONFIG_DEBUG_TCG
     /* Ensure all labels referenced have been emitted.  */
@@ -4210,70 +4181,16 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
     return tcg_current_code_size(s);
 }
 
-#ifdef CONFIG_PROFILER
 void tcg_dump_info(void)
 {
+    TCGProfile *s = NULL;
+#ifdef CONFIG_PROFILER
     TCGProfile prof = {};
-    const TCGProfile *s;
-    int64_t tb_count;
-    int64_t tb_div_count;
-    int64_t tot;
-
     tcg_profile_snapshot_counters(&prof);
     s = &prof;
-    tb_count = s->tb_count;
-    tb_div_count = tb_count ? tb_count : 1;
-    tot = s->interm_time + s->code_time;
-
-    qemu_printf("JIT cycles          %" PRId64 " (%0.3f s at 2.4 GHz)\n",
-                tot, tot / 2.4e9);
-    qemu_printf("translated TBs      %" PRId64 " (aborted=%" PRId64
-                " %0.1f%%)\n",
-                tb_count, s->tb_count1 - tb_count,
-                (double)(s->tb_count1 - s->tb_count)
-                / (s->tb_count1 ? s->tb_count1 : 1) * 100.0);
-    qemu_printf("avg ops/TB          %0.1f max=%d\n",
-                (double)s->op_count / tb_div_count, s->op_count_max);
-    qemu_printf("deleted ops/TB      %0.2f\n",
-                (double)s->del_op_count / tb_div_count);
-    qemu_printf("avg temps/TB        %0.2f max=%d\n",
-                (double)s->temp_count / tb_div_count, s->temp_count_max);
-    qemu_printf("avg host code/TB    %0.1f\n",
-                (double)s->code_out_len / tb_div_count);
-    qemu_printf("avg search data/TB  %0.1f\n",
-                (double)s->search_out_len / tb_div_count);
-    
-    qemu_printf("cycles/op           %0.1f\n",
-                s->op_count ? (double)tot / s->op_count : 0);
-    qemu_printf("cycles/in byte      %0.1f\n",
-                s->code_in_len ? (double)tot / s->code_in_len : 0);
-    qemu_printf("cycles/out byte     %0.1f\n",
-                s->code_out_len ? (double)tot / s->code_out_len : 0);
-    qemu_printf("cycles/search byte     %0.1f\n",
-                s->search_out_len ? (double)tot / s->search_out_len : 0);
-    if (tot == 0) {
-        tot = 1;
-    }
-    qemu_printf("  gen_interm time   %0.1f%%\n",
-                (double)s->interm_time / tot * 100.0);
-    qemu_printf("  gen_code time     %0.1f%%\n",
-                (double)s->code_time / tot * 100.0);
-    qemu_printf("optim./code time    %0.1f%%\n",
-                (double)s->opt_time / (s->code_time ? s->code_time : 1)
-                * 100.0);
-    qemu_printf("liveness/code time  %0.1f%%\n",
-                (double)s->la_time / (s->code_time ? s->code_time : 1) * 100.0);
-    qemu_printf("cpu_restore count   %" PRId64 "\n",
-                s->restore_count);
-    qemu_printf("  avg cycles        %0.1f\n",
-                s->restore_count ? (double)s->restore_time / s->restore_count : 0);
-}
-#else
-void tcg_dump_info(void)
-{
-    qemu_printf("[TCG profiler not compiled]\n");
-}
 #endif
+    dump_jit_profile_info(s);
+}
 
 #ifdef ELF_HOST_MACHINE
 /* In order to use this feature, the backend needs to do three things:
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 31571289cf..5296f7c075 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -573,16 +573,13 @@ typedef struct TCGProfile {
         int nb_guest_insns;
         int nb_spills;
         int nb_ops_pre_opt;
+
+        int del_op_count;
+        int temp_count;
     } translation;
 
     int64_t cpu_exec_time;
-    int64_t tb_count1;
-    int64_t tb_count;
     int64_t op_count; /* total insn count */
-    int op_count_max; /* max insn per TB */
-    int temp_count_max;
-    int64_t temp_count;
-    int64_t del_op_count;
     int64_t code_in_len;
     int64_t code_out_len;
     int64_t search_out_len;
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v9 05/13] accel: adding TB_JIT_TIME and full replacing CONFIG_PROFILER
  2019-10-07 15:28 [PATCH v9 00/13] TCG code quality tracking and perf integration Alex Bennée
                   ` (3 preceding siblings ...)
  2019-10-07 15:28 ` [PATCH v9 04/13] accel: replacing part of CONFIG_PROFILER with TBStats Alex Bennée
@ 2019-10-07 15:28 ` Alex Bennée
  2019-10-08 15:25   ` Richard Henderson
  2019-10-07 15:28 ` [PATCH v9 06/13] debug: add -d tb_stats to control TBStatistics collection: Alex Bennée
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 37+ messages in thread
From: Alex Bennée @ 2019-10-07 15:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: Vanderson M. do Rosario, Dr. David Alan Gilbert,
	Markus Armbruster, cota, Paolo Bonzini, Alex Bennée,
	Richard Henderson

From: "Vanderson M. do Rosario" <vandersonmr2@gmail.com>

Replace all others CONFIG_PROFILER statistics and migrate it to
TBStatistics system. However, TCGProfiler still exists and can
be use to store global statistics and times. All TB related
statistics goes to TBStatistics.

Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
Message-Id: <20190829173437.5926-6-vandersonmr2@gmail.com>
[AJB: fix authorship]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 accel/tcg/tb-stats.c      |  98 +++++++++++++++++++++---------
 accel/tcg/translate-all.c |  46 ++++++++-------
 configure                 |   3 -
 cpus.c                    |  14 ++---
 include/exec/tb-stats.h   |  15 +++++
 include/qemu/timer.h      |   5 +-
 monitor/misc.c            |  28 ++-------
 tcg/tcg.c                 | 121 ++++++++++++--------------------------
 tcg/tcg.h                 |   2 +-
 vl.c                      |   8 +--
 10 files changed, 159 insertions(+), 181 deletions(-)

diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
index 0e64c176b3..f431159fd2 100644
--- a/accel/tcg/tb-stats.c
+++ b/accel/tcg/tb-stats.c
@@ -17,11 +17,18 @@
 #include "exec/tb-stats.h"
 
 /* TBStatistic collection controls */
-enum TBStatsStatus { TB_STATS_RUNNING, TB_STATS_PAUSED, TB_STATS_STOPPED };
+enum TBStatsStatus {
+    TB_STATS_DISABLED = 0,
+    TB_STATS_RUNNING,
+    TB_STATS_PAUSED,
+    TB_STATS_STOPPED
+};
 
 static enum TBStatsStatus tcg_collect_tb_stats;
 static uint32_t default_tbstats_flag;
 
+uint64_t dev_time;
+
 struct jit_profile_info {
     uint64_t translations;
     uint64_t aborted;
@@ -33,6 +40,13 @@ struct jit_profile_info {
     uint64_t host;
     uint64_t guest;
     uint64_t search_data;
+
+    uint64_t interm_time;
+    uint64_t code_time;
+    uint64_t restore_count;
+    uint64_t restore_time;
+    uint64_t opt_time;
+    uint64_t la_time;
 };
 
 /* accumulate the statistics from all TBs */
@@ -54,6 +68,29 @@ static void collect_jit_profile_info(void *p, uint32_t hash, void *userp)
     jpi->host += tbs->code.out_len;
     jpi->guest += tbs->code.in_len;
     jpi->search_data += tbs->code.search_out_len;
+
+    jpi->interm_time += stat_per_translation(tbs, time.interm);
+    jpi->code_time += stat_per_translation(tbs, time.code);
+    jpi->opt_time += stat_per_translation(tbs, time.opt);
+    jpi->la_time += stat_per_translation(tbs, time.la);
+    jpi->restore_time += tbs->time.restore;
+    jpi->restore_count += tbs->time.restore_count;
+}
+
+void dump_jit_exec_time_info(uint64_t dev_time)
+{
+    static uint64_t last_cpu_exec_time;
+    uint64_t cpu_exec_time;
+    uint64_t delta;
+
+    cpu_exec_time = tcg_cpu_exec_time();
+    delta = cpu_exec_time - last_cpu_exec_time;
+
+    qemu_printf("async time  %" PRId64 " (%0.3f)\n",
+                   dev_time, dev_time / (double) NANOSECONDS_PER_SECOND);
+    qemu_printf("qemu time   %" PRId64 " (%0.3f)\n",
+                   delta, delta / (double) NANOSECONDS_PER_SECOND);
+    last_cpu_exec_time = cpu_exec_time;
 }
 
 /* dump JIT statisticis using TCGProfile and TBStats */
@@ -80,34 +117,39 @@ void dump_jit_profile_info(TCGProfile *s)
         qemu_printf("avg search data/TB  %0.1f\n",
                 jpi->search_data / (double) jpi->translations);
 
+        uint64_t tot = jpi->interm_time + jpi->code_time;
+
+        qemu_printf("JIT cycles          %" PRId64 " (%0.3fs at 2.4 GHz)\n",
+                tot, tot / 2.4e9);
+        qemu_printf("  cycles/op           %0.1f\n",
+                jpi->ops ? (double)tot / jpi->ops : 0);
+        qemu_printf("  cycles/in byte      %0.1f\n",
+                jpi->guest ? (double)tot / jpi->guest : 0);
+        qemu_printf("  cycles/out byte     %0.1f\n",
+                jpi->host ? (double)tot / jpi->host : 0);
+        qemu_printf("  cycles/search byte  %0.1f\n",
+                jpi->search_data ? (double)tot / jpi->search_data : 0);
+        if (tot == 0) {
+            tot = 1;
+        }
+
+        qemu_printf("  gen_interm time     %0.1f%%\n",
+                (double)jpi->interm_time / tot * 100.0);
+        qemu_printf("  gen_code time       %0.1f%%\n",
+                (double)jpi->code_time / tot * 100.0);
+
+        qemu_printf("    optim./code time    %0.1f%%\n",
+                (double)jpi->opt_time / (jpi->code_time ? jpi->code_time : 1) * 100.0);
+        qemu_printf("    liveness/code time  %0.1f%%\n",
+                (double)jpi->la_time / (jpi->code_time ? jpi->code_time : 1) * 100.0);
+
+        qemu_printf("cpu_restore count   %" PRId64 "\n", jpi->restore_count);
+        qemu_printf("  avg cycles        %0.1f\n",
+                jpi->restore_count ? (double)jpi->restore_time / jpi->restore_count : 0);
+
         if (s) {
-            int64_t tot = s->interm_time + s->code_time;
-            qemu_printf("JIT cycles          %" PRId64 " (%0.3f s at 2.4 GHz)\n",
-                            tot, tot / 2.4e9);
-            qemu_printf("cycles/op           %0.1f\n",
-                        jpi->ops ? (double)tot / jpi->ops : 0);
-            qemu_printf("cycles/in byte      %0.1f\n",
-                        jpi->guest ? (double)tot / jpi->guest : 0);
-            qemu_printf("cycles/out byte     %0.1f\n",
-                        jpi->host ? (double)tot / jpi->host : 0);
-            qemu_printf("cycles/search byte     %0.1f\n",
-                        jpi->search_data ? (double)tot / jpi->search_data : 0);
-            if (tot == 0) {
-                tot = 1;
-            }
-            qemu_printf("  gen_interm time   %0.1f%%\n",
-                        (double)s->interm_time / tot * 100.0);
-            qemu_printf("  gen_code time     %0.1f%%\n",
-                        (double)s->code_time / tot * 100.0);
-            qemu_printf("optim./code time    %0.1f%%\n",
-                        (double)s->opt_time / (s->code_time ? s->code_time : 1)
-                        * 100.0);
-            qemu_printf("liveness/code time  %0.1f%%\n",
-                    (double)s->la_time / (s->code_time ? s->code_time : 1) * 100.0);
-            qemu_printf("cpu_restore count   %" PRId64 "\n",
-                    s->restore_count);
-            qemu_printf("  avg cycles        %0.1f\n",
-                    s->restore_count ? (double)s->restore_time / s->restore_count : 0);
+            qemu_printf("cpu exec time  %" PRId64 " (%0.3fs)\n",
+                s->cpu_exec_time, s->cpu_exec_time / (double) NANOSECONDS_PER_SECOND);
         }
     }
     g_free(jpi);
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 84f759a829..396e63c3e7 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -337,10 +337,11 @@ static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
     CPUArchState *env = cpu->env_ptr;
     uint8_t *p = tb->tc.ptr + tb->tc.size;
     int i, j, num_insns = tb->icount;
-#ifdef CONFIG_PROFILER
-    TCGProfile *prof = &tcg_ctx->prof;
-    int64_t ti = profile_getclock();
-#endif
+    uint64_t ti = 0;
+
+    if (tb_stats_enabled(tb, TB_JIT_TIME)) {
+        ti = profile_getclock();
+    }
 
     searched_pc -= GETPC_ADJ;
 
@@ -370,11 +371,11 @@ static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
     }
     restore_state_to_opc(env, tb, data);
 
-#ifdef CONFIG_PROFILER
-    atomic_set(&prof->restore_time,
-                prof->restore_time + profile_getclock() - ti);
-    atomic_set(&prof->restore_count, prof->restore_count + 1);
-#endif
+    if (tb_stats_enabled(tb, TB_JIT_TIME)) {
+        atomic_add(&tb->tb_stats->time.restore, profile_getclock() - ti);
+        atomic_inc(&tb->tb_stats->time.restore_count);
+    }
+
     return 0;
 }
 
@@ -1731,9 +1732,8 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     tcg_insn_unit *gen_code_buf;
     int gen_code_size, search_size, max_insns;
     TCGProfile *prof = &tcg_ctx->prof;
-#ifdef CONFIG_PROFILER
-    int64_t ti;
-#endif
+    uint64_t ti = 0;
+
     assert_memory_lock();
 
     phys_pc = get_page_addr_code(env, pc);
@@ -1779,9 +1779,6 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     tcg_ctx->tb_cflags = cflags;
  tb_overflow:
 
-#ifdef CONFIG_PROFILER
-    ti = profile_getclock();
-#endif
 
     /*
      * We want to fetch the stats structure before we start code
@@ -1801,6 +1798,11 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
         if (flag & TB_JIT_STATS) {
             tb->tb_stats->stats_enabled |= TB_JIT_STATS;
         }
+
+        if (flag & TB_JIT_TIME) {
+            tb->tb_stats->stats_enabled |= TB_JIT_TIME;
+            ti = profile_getclock();
+        }
     } else {
         tb->tb_stats = NULL;
     }
@@ -1826,10 +1828,10 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
         tcg_ctx->tb_jmp_target_addr = tb->jmp_target_arg;
     }
 
-#ifdef CONFIG_PROFILER
-    atomic_set(&prof->interm_time, prof->interm_time + profile_getclock() - ti);
-    ti = profile_getclock();
-#endif
+    if (tb_stats_enabled(tb, TB_JIT_TIME)) {
+        atomic_add(&tb->tb_stats->time.interm, profile_getclock() - ti);
+        ti = profile_getclock();
+    }
 
     gen_code_size = tcg_gen_code(tcg_ctx, tb);
     if (unlikely(gen_code_size < 0)) {
@@ -1871,9 +1873,9 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     }
     tb->tc.size = gen_code_size;
 
-#ifdef CONFIG_PROFILER
-    atomic_set(&prof->code_time, prof->code_time + profile_getclock() - ti);
-#endif
+    if (tb_stats_enabled(tb, TB_JIT_TIME)) {
+        atomic_add(&tb->tb_stats->time.code, profile_getclock() - ti);
+    }
 
 #ifdef DEBUG_DISAS
     if (qemu_loglevel_mask(CPU_LOG_TB_OUT_ASM) &&
diff --git a/configure b/configure
index 8f8446f52b..eedeb9016e 100755
--- a/configure
+++ b/configure
@@ -6566,9 +6566,6 @@ fi
 if test "$static" = "yes" ; then
   echo "CONFIG_STATIC=y" >> $config_host_mak
 fi
-if test "$profiler" = "yes" ; then
-  echo "CONFIG_PROFILER=y" >> $config_host_mak
-fi
 if test "$want_tools" = "yes" ; then
   echo "CONFIG_TOOLS=y" >> $config_host_mak
 fi
diff --git a/cpus.c b/cpus.c
index d2c61ff155..3c4bce0e56 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1442,21 +1442,17 @@ static void process_icount_data(CPUState *cpu)
 static int tcg_cpu_exec(CPUState *cpu)
 {
     int ret;
-#ifdef CONFIG_PROFILER
-    int64_t ti;
-#endif
+    uint64_t ti;
 
     assert(tcg_enabled());
-#ifdef CONFIG_PROFILER
     ti = profile_getclock();
-#endif
+
     cpu_exec_start(cpu);
     ret = cpu_exec(cpu);
     cpu_exec_end(cpu);
-#ifdef CONFIG_PROFILER
-    atomic_set(&tcg_ctx->prof.cpu_exec_time,
-               tcg_ctx->prof.cpu_exec_time + profile_getclock() - ti);
-#endif
+
+    atomic_add(&tcg_ctx->prof.cpu_exec_time, profile_getclock() - ti);
+
     return ret;
 }
 
diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
index c779d6a183..a142972960 100644
--- a/include/exec/tb-stats.h
+++ b/include/exec/tb-stats.h
@@ -90,10 +90,24 @@ struct TBStatistics {
      * this TBStats structure. Has to be reset on a tb_flush.
      */
     GPtrArray *tbs;
+
+    /* end of jit_stats_lock */
+
+    /* These are accessed with atomic operations */
+    struct {
+        int64_t restore;
+        uint64_t restore_count;
+        int64_t interm;
+        int64_t code;
+        int64_t opt;
+        int64_t la;
+    } time;
 };
 
 bool tb_stats_cmp(const void *ap, const void *bp);
 
+void dump_jit_exec_time_info(uint64_t dev_time);
+
 void init_tb_stats_htable_if_not(void);
 
 void dump_jit_profile_info(TCGProfile *s);
@@ -101,6 +115,7 @@ void dump_jit_profile_info(TCGProfile *s);
 #define TB_NOTHING    (1 << 0)
 #define TB_EXEC_STATS (1 << 1)
 #define TB_JIT_STATS  (1 << 2)
+#define TB_JIT_TIME   (1 << 3)
 
 void enable_collect_tb_stats(void);
 void disable_collect_tb_stats(void);
diff --git a/include/qemu/timer.h b/include/qemu/timer.h
index 85bc6eb00b..641ca3ddd0 100644
--- a/include/qemu/timer.h
+++ b/include/qemu/timer.h
@@ -1006,13 +1006,10 @@ static inline int64_t cpu_get_host_ticks(void)
 }
 #endif
 
-#ifdef CONFIG_PROFILER
 static inline int64_t profile_getclock(void)
 {
     return get_clock();
 }
 
-extern int64_t dev_time;
-#endif
-
+extern uint64_t dev_time;
 #endif
diff --git a/monitor/misc.c b/monitor/misc.c
index aef16f6cfb..ac4ff58d96 100644
--- a/monitor/misc.c
+++ b/monitor/misc.c
@@ -472,6 +472,11 @@ static void hmp_info_jit(Monitor *mon, const QDict *qdict)
 
 static void hmp_info_opcount(Monitor *mon, const QDict *qdict)
 {
+    if (!tb_stats_collection_enabled()) {
+        error_report("TB information not being recorded.");
+        return;
+    }
+
     dump_opcount_info();
 }
 #endif
@@ -1082,32 +1087,11 @@ static void hmp_info_mtree(Monitor *mon, const QDict *qdict)
     mtree_info(flatview, dispatch_tree, owner);
 }
 
-#ifdef CONFIG_PROFILER
-
-int64_t dev_time;
-
 static void hmp_info_profile(Monitor *mon, const QDict *qdict)
 {
-    static int64_t last_cpu_exec_time;
-    int64_t cpu_exec_time;
-    int64_t delta;
-
-    cpu_exec_time = tcg_cpu_exec_time();
-    delta = cpu_exec_time - last_cpu_exec_time;
-
-    monitor_printf(mon, "async time  %" PRId64 " (%0.3f)\n",
-                   dev_time, dev_time / (double)NANOSECONDS_PER_SECOND);
-    monitor_printf(mon, "qemu time   %" PRId64 " (%0.3f)\n",
-                   delta, delta / (double)NANOSECONDS_PER_SECOND);
-    last_cpu_exec_time = cpu_exec_time;
+    dump_jit_exec_time_info(dev_time);
     dev_time = 0;
 }
-#else
-static void hmp_info_profile(Monitor *mon, const QDict *qdict)
-{
-    monitor_printf(mon, "Internal profiler not compiled\n");
-}
-#endif
 
 /* Capture support */
 static QLIST_HEAD (capture_list_head, CaptureState) capture_head;
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 9c199c8a40..e1438fa0d9 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -3881,82 +3881,34 @@ static void tcg_reg_alloc_call(TCGContext *s, TCGOp *op)
     }
 }
 
-#ifdef CONFIG_PROFILER
-
 /* avoid copy/paste errors */
 #define PROF_ADD(to, from, field)                       \
     do {                                                \
         (to)->field += atomic_read(&((from)->field));   \
     } while (0)
 
-#define PROF_MAX(to, from, field)                                       \
-    do {                                                                \
-        typeof((from)->field) val__ = atomic_read(&((from)->field));    \
-        if (val__ > (to)->field) {                                      \
-            (to)->field = val__;                                        \
-        }                                                               \
-    } while (0)
-
-/* Pass in a zero'ed @prof */
-static inline
-void tcg_profile_snapshot(TCGProfile *prof, bool counters, bool table)
+static void collect_tcg_profiler(TCGProfile *prof)
 {
+    int i;
     unsigned int n_ctxs = atomic_read(&n_tcg_ctxs);
-    unsigned int i;
 
     for (i = 0; i < n_ctxs; i++) {
         TCGContext *s = atomic_read(&tcg_ctxs[i]);
         const TCGProfile *orig = &s->prof;
 
-        if (counters) {
-            PROF_ADD(prof, orig, cpu_exec_time);
-            PROF_ADD(prof, orig, interm_time);
-            PROF_ADD(prof, orig, code_time);
-            PROF_ADD(prof, orig, la_time);
-            PROF_ADD(prof, orig, opt_time);
-            PROF_ADD(prof, orig, restore_count);
-            PROF_ADD(prof, orig, restore_time);
-        }
-        if (table) {
-            int i;
+        PROF_ADD(prof, orig, cpu_exec_time);
 
-            for (i = 0; i < NB_OPS; i++) {
-                PROF_ADD(prof, orig, table_op_count[i]);
-            }
+        for (i = 0; i < NB_OPS; i++) {
+            PROF_ADD(prof, orig, table_op_count[i]);
         }
     }
 }
 
-#undef PROF_ADD
-#undef PROF_MAX
-
-static void tcg_profile_snapshot_counters(TCGProfile *prof)
-{
-    tcg_profile_snapshot(prof, true, false);
-}
-
-static void tcg_profile_snapshot_table(TCGProfile *prof)
-{
-    tcg_profile_snapshot(prof, false, true);
-}
-
-void tcg_dump_op_count(void)
-{
-    TCGProfile prof = {};
-    int i;
-
-    tcg_profile_snapshot_table(&prof);
-    for (i = 0; i < NB_OPS; i++) {
-        qemu_printf("%s %" PRId64 "\n", tcg_op_defs[i].name,
-                    prof.table_op_count[i]);
-    }
-}
-
-int64_t tcg_cpu_exec_time(void)
+uint64_t tcg_cpu_exec_time(void)
 {
     unsigned int n_ctxs = atomic_read(&n_tcg_ctxs);
     unsigned int i;
-    int64_t ret = 0;
+    uint64_t ret = 0;
 
     for (i = 0; i < n_ctxs; i++) {
         const TCGContext *s = atomic_read(&tcg_ctxs[i]);
@@ -3966,19 +3918,6 @@ int64_t tcg_cpu_exec_time(void)
     }
     return ret;
 }
-#else
-void tcg_dump_op_count(void)
-{
-    qemu_printf("[TCG profiler not compiled]\n");
-}
-
-int64_t tcg_cpu_exec_time(void)
-{
-    error_report("%s: TCG profiler not compiled", __func__);
-    exit(EXIT_FAILURE);
-}
-#endif
-
 
 int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
 {
@@ -4020,18 +3959,18 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
     }
 #endif
 
-#ifdef CONFIG_PROFILER
-    atomic_set(&prof->opt_time, prof->opt_time - profile_getclock());
-#endif
+    if (tb_stats_enabled(tb, TB_JIT_TIME)) {
+        atomic_add(&tb->tb_stats->time.opt, -profile_getclock());
+    }
 
 #ifdef USE_TCG_OPTIMIZATIONS
     tcg_optimize(s);
 #endif
 
-#ifdef CONFIG_PROFILER
-    atomic_set(&prof->opt_time, prof->opt_time + profile_getclock());
-    atomic_set(&prof->la_time, prof->la_time - profile_getclock());
-#endif
+    if (tb_stats_enabled(tb, TB_JIT_TIME)) {
+        atomic_add(&tb->tb_stats->time.opt, profile_getclock());
+        atomic_add(&tb->tb_stats->time.la,  -profile_getclock());
+    }
 
     reachable_code_pass(s);
     liveness_pass_1(s);
@@ -4054,9 +3993,9 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
         }
     }
 
-#ifdef CONFIG_PROFILER
-    atomic_set(&prof->la_time, prof->la_time + profile_getclock());
-#endif
+    if (tb_stats_enabled(tb, TB_JIT_TIME)) {
+        atomic_add(&tb->tb_stats->time.la, profile_getclock());
+    }
 
 #ifdef DEBUG_DISAS
     if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP_OPT)
@@ -4081,14 +4020,17 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
     s->pool_labels = NULL;
 #endif
 
+    if (!tb_stats_collection_enabled()) {
+        QTAILQ_FOREACH(op, &s->ops, link) {
+            TCGOpcode opc = op->opc;
+            atomic_add(&s->prof.table_op_count[opc], 1);
+        }
+    }
+
     num_insns = -1;
     QTAILQ_FOREACH(op, &s->ops, link) {
         TCGOpcode opc = op->opc;
 
-#ifdef CONFIG_PROFILER
-        atomic_set(&prof->table_op_count[opc], prof->table_op_count[opc] + 1);
-#endif
-
         switch (opc) {
         case INDEX_op_mov_i32:
         case INDEX_op_mov_i64:
@@ -4181,14 +4123,23 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
     return tcg_current_code_size(s);
 }
 
+void tcg_dump_op_count(void)
+{
+    TCGProfile prof = {};
+    int i;
+
+    collect_tcg_profiler(&prof);
+    for (i = 0; i < NB_OPS; i++) {
+        qemu_printf("%s %" PRId64 "\n", tcg_op_defs[i].name, prof.table_op_count[i]);
+    }
+}
+
 void tcg_dump_info(void)
 {
     TCGProfile *s = NULL;
-#ifdef CONFIG_PROFILER
     TCGProfile prof = {};
-    tcg_profile_snapshot_counters(&prof);
     s = &prof;
-#endif
+    collect_tcg_profiler(s);
     dump_jit_profile_info(s);
 }
 
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 5296f7c075..6dcbbf79e7 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -948,7 +948,7 @@ int tcg_check_temp_count(void);
 #define tcg_check_temp_count() 0
 #endif
 
-int64_t tcg_cpu_exec_time(void);
+uint64_t tcg_cpu_exec_time(void);
 void tcg_dump_info(void);
 void tcg_dump_op_count(void);
 
diff --git a/vl.c b/vl.c
index 002bf4919e..f91bc6fb4c 100644
--- a/vl.c
+++ b/vl.c
@@ -1796,17 +1796,11 @@ static bool main_loop_should_exit(void)
 
 static void main_loop(void)
 {
-#ifdef CONFIG_PROFILER
-    int64_t ti;
-#endif
+    uint64_t ti;
     while (!main_loop_should_exit()) {
-#ifdef CONFIG_PROFILER
         ti = profile_getclock();
-#endif
         main_loop_wait(false);
-#ifdef CONFIG_PROFILER
         dev_time += profile_getclock() - ti;
-#endif
     }
 }
 
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v9 06/13] debug: add -d tb_stats to control TBStatistics collection:
  2019-10-07 15:28 [PATCH v9 00/13] TCG code quality tracking and perf integration Alex Bennée
                   ` (4 preceding siblings ...)
  2019-10-07 15:28 ` [PATCH v9 05/13] accel: adding TB_JIT_TIME and full replacing CONFIG_PROFILER Alex Bennée
@ 2019-10-07 15:28 ` Alex Bennée
  2019-10-08 15:34   ` Richard Henderson
  2019-10-07 15:28 ` [PATCH v9 07/13] monitor: adding tb_stats hmp command Alex Bennée
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 37+ messages in thread
From: Alex Bennée @ 2019-10-07 15:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, cota, Vanderson M. do Rosario, Paolo Bonzini,
	Richard Henderson

From: "Vanderson M. do Rosario" <vandersonmr2@gmail.com>

 -d tb_stats[[,level=(+all+jit+exec+time)][,dump_limit=<number>]]

"dump_limit" is used to limit the number of dumped TBStats in
linux-user mode.

[all+jit+exec+time] control the profilling level used
by the TBStats. Can be used as follow:

-d tb_stats
-d tb_stats,level=jit+time
-d tb_stats,dump_limit=15
...

Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
Message-Id: <20190829173437.5926-7-vandersonmr2@gmail.com>
[AJB: fix authorship, reword title]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
AJB:
  - reword title
  - add stubs for enabling
  - move things across to tb-stats-flags.h
---
 accel/tcg/tb-stats.c          |  5 +++++
 include/exec/gen-icount.h     |  1 +
 include/exec/tb-stats-flags.h | 29 +++++++++++++++++++++++++++++
 include/exec/tb-stats.h       | 16 +++-------------
 include/qemu/log.h            |  1 +
 stubs/Makefile.objs           |  1 +
 stubs/tb-stats.c              | 27 +++++++++++++++++++++++++++
 util/log.c                    | 35 +++++++++++++++++++++++++++++++++++
 8 files changed, 102 insertions(+), 13 deletions(-)
 create mode 100644 include/exec/tb-stats-flags.h
 create mode 100644 stubs/tb-stats.c

diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
index f431159fd2..1c66e03979 100644
--- a/accel/tcg/tb-stats.c
+++ b/accel/tcg/tb-stats.c
@@ -193,3 +193,8 @@ uint32_t get_default_tbstats_flag(void)
 {
     return default_tbstats_flag;
 }
+
+void set_default_tbstats_flag(uint32_t flags)
+{
+    default_tbstats_flag = flags;
+}
diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h
index be006383b9..3987adfb0e 100644
--- a/include/exec/gen-icount.h
+++ b/include/exec/gen-icount.h
@@ -2,6 +2,7 @@
 #define GEN_ICOUNT_H
 
 #include "qemu/timer.h"
+#include "tb-stats-flags.h"
 
 /* Helpers for instruction counting code generation.  */
 
diff --git a/include/exec/tb-stats-flags.h b/include/exec/tb-stats-flags.h
new file mode 100644
index 0000000000..8455073048
--- /dev/null
+++ b/include/exec/tb-stats-flags.h
@@ -0,0 +1,29 @@
+/*
+ * QEMU System Emulator, Code Quality Monitor System
+ *
+ * We define the flags and control bits here to avoid complications of
+ * including TCG/CPU information in common code.
+ *
+ * Copyright (c) 2019 Vanderson M. do Rosario <vandersonmr2@gmail.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+#ifndef TB_STATS_FLAGS
+#define TB_STATS_FLAGS
+
+#define TB_NOTHING    (1 << 0)
+#define TB_EXEC_STATS (1 << 1)
+#define TB_JIT_STATS  (1 << 2)
+#define TB_JIT_TIME   (1 << 3)
+
+/* TBStatistic collection controls */
+void enable_collect_tb_stats(void);
+void disable_collect_tb_stats(void);
+void pause_collect_tb_stats(void);
+bool tb_stats_collection_enabled(void);
+bool tb_stats_collection_paused(void);
+
+uint32_t get_default_tbstats_flag(void);
+void set_default_tbstats_flag(uint32_t);
+
+#endif
diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
index a142972960..019d316f5c 100644
--- a/include/exec/tb-stats.h
+++ b/include/exec/tb-stats.h
@@ -30,6 +30,8 @@
 #include "exec/tb-context.h"
 #include "tcg.h"
 
+#include "exec/tb-stats-flags.h"
+
 #define tb_stats_enabled(tb, JIT_STATS) \
     (tb && tb->tb_stats && (tb->tb_stats->stats_enabled & JIT_STATS))
 
@@ -108,21 +110,9 @@ bool tb_stats_cmp(const void *ap, const void *bp);
 
 void dump_jit_exec_time_info(uint64_t dev_time);
 
+void set_tbstats_flags(uint32_t flags);
 void init_tb_stats_htable_if_not(void);
 
 void dump_jit_profile_info(TCGProfile *s);
 
-#define TB_NOTHING    (1 << 0)
-#define TB_EXEC_STATS (1 << 1)
-#define TB_JIT_STATS  (1 << 2)
-#define TB_JIT_TIME   (1 << 3)
-
-void enable_collect_tb_stats(void);
-void disable_collect_tb_stats(void);
-void pause_collect_tb_stats(void);
-bool tb_stats_collection_enabled(void);
-bool tb_stats_collection_paused(void);
-
-uint32_t get_default_tbstats_flag(void);
-
 #endif
diff --git a/include/qemu/log.h b/include/qemu/log.h
index b097a6cae1..a8d1997cde 100644
--- a/include/qemu/log.h
+++ b/include/qemu/log.h
@@ -45,6 +45,7 @@ static inline bool qemu_log_separate(void)
 /* LOG_TRACE (1 << 15) is defined in log-for-trace.h */
 #define CPU_LOG_TB_OP_IND  (1 << 16)
 #define CPU_LOG_TB_FPU     (1 << 17)
+#define CPU_LOG_TB_STATS   (1 << 18)
 
 /* Lock output for a series of related logs.  Since this is not needed
  * for a single qemu_log / qemu_log_mask / qemu_log_mask_and_addr, we
diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
index 9c7393b08c..1c5cd05147 100644
--- a/stubs/Makefile.objs
+++ b/stubs/Makefile.objs
@@ -41,3 +41,4 @@ stub-obj-y += ram-block.o
 stub-obj-y += ramfb.o
 stub-obj-y += fw_cfg.o
 stub-obj-$(CONFIG_SOFTMMU) += semihost.o
+stub-obj-$(CONFIG_TCG) += tb-stats.o
diff --git a/stubs/tb-stats.c b/stubs/tb-stats.c
new file mode 100644
index 0000000000..d212c2a1fa
--- /dev/null
+++ b/stubs/tb-stats.c
@@ -0,0 +1,27 @@
+/*
+ * TB Stats Stubs
+ *
+ * Copyright (c) 2019
+ * Written by Alex Bennée <alex.bennee@linaro.org>
+ *
+ * This code is licensed under the GNU GPL v2, or later.
+ */
+
+
+#include "qemu/osdep.h"
+#include "exec/tb-stats-flags.h"
+
+void enable_collect_tb_stats(void)
+{
+    return;
+}
+
+bool tb_stats_collection_enabled(void)
+{
+    return false;
+}
+
+void set_default_tbstats_flag(uint32_t flags)
+{
+    return;
+}
diff --git a/util/log.c b/util/log.c
index 1d1b33f7d9..86bd691967 100644
--- a/util/log.c
+++ b/util/log.c
@@ -19,17 +19,20 @@
 
 #include "qemu/osdep.h"
 #include "qemu/log.h"
+#include "qemu/qemu-print.h"
 #include "qemu/range.h"
 #include "qemu/error-report.h"
 #include "qapi/error.h"
 #include "qemu/cutils.h"
 #include "trace/control.h"
+#include "exec/tb-stats-flags.h"
 
 static char *logfilename;
 FILE *qemu_logfile;
 int qemu_loglevel;
 static int log_append = 0;
 static GArray *debug_regions;
+int32_t max_num_hot_tbs_to_dump;
 
 /* Return the number of characters emitted.  */
 int qemu_log(const char *fmt, ...)
@@ -273,6 +276,9 @@ const QEMULogItem qemu_log_items[] = {
     { CPU_LOG_TB_NOCHAIN, "nochain",
       "do not chain compiled TBs so that \"exec\" and \"cpu\" show\n"
       "complete traces" },
+    { CPU_LOG_TB_STATS, "tb_stats[[,level=(+all+jit+exec+time)][,dump_limit=<number>]]",
+      "enable collection of TBs statistics"
+      "(and dump until given a limit if in user mode).\n" },
     { 0, NULL, NULL },
 };
 
@@ -294,6 +300,35 @@ int qemu_str_to_log_mask(const char *str)
             trace_enable_events((*tmp) + 6);
             mask |= LOG_TRACE;
 #endif
+        } else if (g_str_has_prefix(*tmp, "tb_stats")) {
+            mask |= CPU_LOG_TB_STATS;
+            set_default_tbstats_flag(TB_JIT_STATS | TB_EXEC_STATS | TB_JIT_TIME);
+            enable_collect_tb_stats();
+        } else if (tb_stats_collection_enabled() &&
+                   g_str_has_prefix(*tmp, "dump_limit=")) {
+            max_num_hot_tbs_to_dump = atoi((*tmp) + 11);
+        } else if (tb_stats_collection_enabled() &&
+                   g_str_has_prefix(*tmp, "level=")) {
+            uint32_t flags = 0;
+            char **level_parts = g_strsplit(*tmp + 6, "+", 0);
+            char **level_tmp;
+            for (level_tmp = level_parts; level_tmp && *level_tmp; level_tmp++) {
+                if (g_str_equal(*level_tmp, "jit")) {
+                    flags |= TB_JIT_STATS;
+                } else if (g_str_equal(*level_tmp, "exec")) {
+                    flags |= TB_EXEC_STATS;
+                } else if (g_str_equal(*level_tmp, "time")) {
+                    flags |= TB_JIT_TIME;
+                } else if (g_str_equal(*level_tmp, "all")) {
+                    flags |= TB_JIT_STATS | TB_EXEC_STATS | TB_JIT_TIME;
+                } else {
+                    /* FIXME: set errp */
+                    fprintf(stderr, "no option level=%s, valid options are:"
+                            "all, jit, exec or/and time\n", *level_tmp);
+                    exit(1);
+                }
+                set_default_tbstats_flag(flags);
+            }
         } else {
             for (item = qemu_log_items; item->mask != 0; item++) {
                 if (g_str_equal(*tmp, item->name)) {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH  v9 07/13] monitor: adding tb_stats hmp command
  2019-10-07 15:28 [PATCH v9 00/13] TCG code quality tracking and perf integration Alex Bennée
                   ` (5 preceding siblings ...)
  2019-10-07 15:28 ` [PATCH v9 06/13] debug: add -d tb_stats to control TBStatistics collection: Alex Bennée
@ 2019-10-07 15:28 ` Alex Bennée
  2019-10-08 15:48   ` Richard Henderson
  2019-10-07 15:28 ` [PATCH v9 08/13] tb-stats: reset the tracked TBs on a tb_flush Alex Bennée
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 37+ messages in thread
From: Alex Bennée @ 2019-10-07 15:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: Vanderson M. do Rosario, Dr . David Alan Gilbert,
	Markus Armbruster, cota, Paolo Bonzini, Alex Bennée,
	Richard Henderson

From: "Vanderson M. do Rosario" <vandersonmr2@gmail.com>

Adding tb_stats [start|pause|stop|filter] command to hmp.
This allows controlling the collection of statistics.
It is also possible to set the level of collection:
all, jit, or exec.

tb_stats filter allow to only collect statistics for the TB
in the last_search list.

The goal of this command is to allow the dynamic exploration
of the TCG behavior and quality. Therefore, for now, a
corresponding QMP command is not worthwhile.

Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
Message-Id: <20190829173437.5926-8-vandersonmr2@gmail.com>
[AJB: fix authorship]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

monitor: add tb_stats [start|pause|stop|filter] command to hmp.

This allows controlling the collection of statistics.
It is also possible to set the level of collection:
all, jit, or exec.

tb_stats filter allow to only collect statistics for the TB
in the last_search list.

The goal of this command is to allow the dynamic exploration
of the TCG behavior and quality. Therefore, for now, a
corresponding QMP command is not worthwhile.

Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
Message-Id: <20190829173437.5926-9-vandersonmr2@gmail.com>
[AJB: fix authorship]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 accel/tcg/tb-stats.c          | 105 ++++++++++++++++++++++++++++++++++
 hmp-commands.hx               |  17 ++++++
 include/exec/tb-stats-flags.h |   1 +
 include/exec/tb-stats.h       |  10 ++++
 monitor/misc.c                |  49 ++++++++++++++++
 vl.c                          |   6 ++
 6 files changed, 188 insertions(+)

diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
index 1c66e03979..dabc5150f9 100644
--- a/accel/tcg/tb-stats.c
+++ b/accel/tcg/tb-stats.c
@@ -26,6 +26,8 @@ enum TBStatsStatus {
 
 static enum TBStatsStatus tcg_collect_tb_stats;
 static uint32_t default_tbstats_flag;
+/* only accessed in safe work */
+static GList *last_search;
 
 uint64_t dev_time;
 
@@ -155,6 +157,96 @@ void dump_jit_profile_info(TCGProfile *s)
     g_free(jpi);
 }
 
+static void free_tbstats(void *p, uint32_t hash, void *userp)
+{
+    g_free(p);
+}
+
+static void clean_tbstats(void)
+{
+    /* remove all tb_stats */
+    qht_iter(&tb_ctx.tb_stats, free_tbstats, NULL);
+    qht_destroy(&tb_ctx.tb_stats);
+}
+
+void do_hmp_tbstats_safe(CPUState *cpu, run_on_cpu_data icmd)
+{
+    struct TbstatsCommand *cmdinfo = icmd.host_ptr;
+    int cmd = cmdinfo->cmd;
+    uint32_t level = cmdinfo->level;
+
+    switch (cmd) {
+    case START:
+        if (tb_stats_collection_paused()) {
+            set_tbstats_flags(level);
+        } else {
+            if (tb_stats_collection_enabled()) {
+                qemu_printf("TB information already being recorded");
+                return;
+            }
+            qht_init(&tb_ctx.tb_stats, tb_stats_cmp, CODE_GEN_HTABLE_SIZE,
+                        QHT_MODE_AUTO_RESIZE);
+        }
+
+        set_default_tbstats_flag(level);
+        enable_collect_tb_stats();
+        tb_flush(cpu);
+        break;
+    case PAUSE:
+        if (!tb_stats_collection_enabled()) {
+            qemu_printf("TB information not being recorded");
+            return;
+        }
+
+        /* Continue to create TBStatistic structures but stop collecting statistics */
+        pause_collect_tb_stats();
+        set_default_tbstats_flag(TB_NOTHING);
+        set_tbstats_flags(TB_PAUSED);
+        tb_flush(cpu);
+        break;
+    case STOP:
+        if (!tb_stats_collection_enabled()) {
+            qemu_printf("TB information not being recorded");
+            return;
+        }
+
+        /* Dissalloc all TBStatistics structures and stop creating new ones */
+        disable_collect_tb_stats();
+        clean_tbstats();
+        tb_flush(cpu);
+        break;
+    case FILTER:
+        if (!tb_stats_collection_enabled()) {
+            qemu_printf("TB information not being recorded");
+            return;
+        }
+        if (!last_search) {
+            qemu_printf("no search on record! execute info tbs before filtering!");
+            return;
+        }
+
+        set_default_tbstats_flag(TB_NOTHING);
+
+        /* Set all tbstats as paused, then return only the ones from last_search */
+        pause_collect_tb_stats();
+        set_tbstats_flags(TB_PAUSED);
+
+        for (GList *iter = last_search; iter; iter = g_list_next(iter)) {
+            TBStatistics *tbs = iter->data;
+            tbs->stats_enabled = level;
+        }
+
+        tb_flush(cpu);
+
+        break;
+    default: /* INVALID */
+        g_assert_not_reached();
+        break;
+    }
+
+    g_free(cmdinfo);
+}
+
 void init_tb_stats_htable_if_not(void)
 {
     if (tb_stats_collection_enabled() && !tb_ctx.tb_stats.map) {
@@ -189,6 +281,19 @@ bool tb_stats_collection_paused(void)
     return tcg_collect_tb_stats == TB_STATS_PAUSED;
 }
 
+static void reset_tbstats_flag(void *p, uint32_t hash, void *userp)
+{
+    uint32_t flag = *((int *)userp);
+    TBStatistics *tbs = p;
+    tbs->stats_enabled = flag;
+}
+
+void set_tbstats_flags(uint32_t flag)
+{
+    /* iterate over tbstats setting their flag as TB_NOTHING */
+    qht_iter(&tb_ctx.tb_stats, reset_tbstats_flag, &flag);
+}
+
 uint32_t get_default_tbstats_flag(void)
 {
     return default_tbstats_flag;
diff --git a/hmp-commands.hx b/hmp-commands.hx
index cfcc044ce4..6cd2800378 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1886,6 +1886,23 @@ STEXI
 @findex qemu-io
 Executes a qemu-io command on the given block device.
 
+ETEXI
+#if defined(CONFIG_TCG)
+    {
+        .name       = "tb_stats",
+        .args_type  = "command:s,level:s?",
+        .params     = "command [stats_level]",
+        .help       = "Control tb statistics collection:"
+                        "tb_stats (start|pause|stop|filter) [all|jit_stats|exec_stats]",
+        .cmd        = hmp_tbstats,
+    },
+#endif
+
+STEXI
+@item tb_stats
+@findex
+Control recording tb statistics
+
 ETEXI
 
     {
diff --git a/include/exec/tb-stats-flags.h b/include/exec/tb-stats-flags.h
index 8455073048..252304d685 100644
--- a/include/exec/tb-stats-flags.h
+++ b/include/exec/tb-stats-flags.h
@@ -15,6 +15,7 @@
 #define TB_EXEC_STATS (1 << 1)
 #define TB_JIT_STATS  (1 << 2)
 #define TB_JIT_TIME   (1 << 3)
+#define TB_PAUSED     (1 << 4)
 
 /* TBStatistic collection controls */
 void enable_collect_tb_stats(void);
diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
index 019d316f5c..921da38c97 100644
--- a/include/exec/tb-stats.h
+++ b/include/exec/tb-stats.h
@@ -32,6 +32,9 @@
 
 #include "exec/tb-stats-flags.h"
 
+enum SortBy { SORT_BY_HOTNESS, SORT_BY_HG /* Host/Guest */, SORT_BY_SPILLS };
+enum TbstatsCmd { START, PAUSE, STOP, FILTER };
+
 #define tb_stats_enabled(tb, JIT_STATS) \
     (tb && tb->tb_stats && (tb->tb_stats->stats_enabled & JIT_STATS))
 
@@ -115,4 +118,11 @@ void init_tb_stats_htable_if_not(void);
 
 void dump_jit_profile_info(TCGProfile *s);
 
+struct TbstatsCommand {
+    enum TbstatsCmd cmd;
+    uint32_t level;
+};
+
+void do_hmp_tbstats_safe(CPUState *cpu, run_on_cpu_data icmd);
+
 #endif
diff --git a/monitor/misc.c b/monitor/misc.c
index ac4ff58d96..218263d29a 100644
--- a/monitor/misc.c
+++ b/monitor/misc.c
@@ -74,6 +74,8 @@
 #include "sysemu/cpus.h"
 #include "qemu/cutils.h"
 #include "tcg/tcg.h"
+#include "exec/tb-stats.h"
+#include "qemu-common.h"
 
 #if defined(TARGET_S390X)
 #include "hw/s390x/storage-keys.h"
@@ -459,6 +461,49 @@ static void hmp_info_registers(Monitor *mon, const QDict *qdict)
 }
 
 #ifdef CONFIG_TCG
+static void hmp_tbstats(Monitor *mon, const QDict *qdict)
+{
+    if (!tcg_enabled()) {
+        error_report("TB information is only available with accel=tcg");
+        return;
+    }
+
+    char *cmd = (char *) qdict_get_try_str(qdict, "command");
+    enum TbstatsCmd icmd = -1;
+
+    if (strcmp(cmd, "start") == 0) {
+        icmd = START;
+    } else if (strcmp(cmd, "pause") == 0) {
+        icmd = PAUSE;
+    } else if (strcmp(cmd, "stop") == 0) {
+        icmd = STOP;
+    } else if (strcmp(cmd, "filter") == 0) {
+        icmd = FILTER;
+    } else {
+        error_report("invalid command!");
+        return;
+    }
+
+    char *slevel = (char *) qdict_get_try_str(qdict, "level");
+    uint32_t level = TB_EXEC_STATS | TB_JIT_STATS | TB_JIT_TIME;
+    if (slevel) {
+        if (strcmp(slevel, "jit") == 0) {
+            level = TB_JIT_STATS;
+        } else if (strcmp(slevel, "exec") == 0) {
+            level = TB_EXEC_STATS;
+        } else if (strcmp(slevel, "time") == 0) {
+            level = TB_JIT_TIME;
+        }
+    }
+
+    struct TbstatsCommand *tbscommand = g_new0(struct TbstatsCommand, 1);
+    tbscommand->cmd = icmd;
+    tbscommand->level = level;
+    async_safe_run_on_cpu(first_cpu, do_hmp_tbstats_safe,
+                          RUN_ON_CPU_HOST_PTR(tbscommand));
+
+}
+
 static void hmp_info_jit(Monitor *mon, const QDict *qdict)
 {
     if (!tcg_enabled()) {
@@ -1089,8 +1134,12 @@ static void hmp_info_mtree(Monitor *mon, const QDict *qdict)
 
 static void hmp_info_profile(Monitor *mon, const QDict *qdict)
 {
+#ifdef CONFIG_TCG
     dump_jit_exec_time_info(dev_time);
     dev_time = 0;
+#else
+    error_report("TCG should be enabled!");
+#endif
 }
 
 /* Capture support */
diff --git a/vl.c b/vl.c
index f91bc6fb4c..781fddaf18 100644
--- a/vl.c
+++ b/vl.c
@@ -1796,11 +1796,17 @@ static bool main_loop_should_exit(void)
 
 static void main_loop(void)
 {
+#ifdef CONFIG_TCG
     uint64_t ti;
+#endif
     while (!main_loop_should_exit()) {
+#ifdef CONFIG_TCG
         ti = profile_getclock();
+#endif
         main_loop_wait(false);
+#ifdef CONFIG_TCG
         dev_time += profile_getclock() - ti;
+#endif
     }
 }
 
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH  v9 08/13] tb-stats: reset the tracked TBs on a tb_flush
  2019-10-07 15:28 [PATCH v9 00/13] TCG code quality tracking and perf integration Alex Bennée
                   ` (6 preceding siblings ...)
  2019-10-07 15:28 ` [PATCH v9 07/13] monitor: adding tb_stats hmp command Alex Bennée
@ 2019-10-07 15:28 ` Alex Bennée
  2019-10-08 18:00   ` Richard Henderson
  2019-10-07 15:28 ` [PATCH v9 09/13] Adding info [tb-list|tb] commands to HMP (WIP) Alex Bennée
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 37+ messages in thread
From: Alex Bennée @ 2019-10-07 15:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, cota, Alex Bennée, Richard Henderson

We keep track of translations but can only do so up until the
translation cache is flushed. At that point we really have no idea if
we can re-create a translation because all the active tracking
information has been reset.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 accel/tcg/tb-stats.c      | 19 +++++++++++++++++++
 accel/tcg/translate-all.c |  2 +-
 include/exec/tb-stats.h   |  8 ++++++++
 3 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
index dabc5150f9..f08e5f2540 100644
--- a/accel/tcg/tb-stats.c
+++ b/accel/tcg/tb-stats.c
@@ -247,6 +247,25 @@ void do_hmp_tbstats_safe(CPUState *cpu, run_on_cpu_data icmd)
     g_free(cmdinfo);
 }
 
+/*
+ * We have to reset the tbs array on a tb_flush as those
+ * TranslationBlocks no longer exist and we no loner know if the
+ * current mapping is still valid.
+ */
+
+static void reset_tbs_array(void *p, uint32_t hash, void *userp)
+{
+    TBStatistics *tbs = p;
+    g_ptr_array_set_size(tbs->tbs, 0);
+}
+
+void tbstats_reset_tbs(void)
+{
+    if (tb_ctx.tb_stats.map) {
+        qht_iter(&tb_ctx.tb_stats, reset_tbs_array, NULL);
+    }
+}
+
 void init_tb_stats_htable_if_not(void)
 {
     if (tb_stats_collection_enabled() && !tb_ctx.tb_stats.map) {
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 396e63c3e7..871d91d559 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1273,7 +1273,7 @@ static void do_tb_flush(CPUState *cpu, run_on_cpu_data tb_flush_count)
 
     qht_reset_size(&tb_ctx.htable, CODE_GEN_HTABLE_SIZE);
     page_flush_tb();
-
+    tbstats_reset_tbs();
     tcg_region_reset_all();
     /* XXX: flush processor icache at this point if cache flush is
        expensive */
diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
index 921da38c97..c20a3e6439 100644
--- a/include/exec/tb-stats.h
+++ b/include/exec/tb-stats.h
@@ -125,4 +125,12 @@ struct TbstatsCommand {
 
 void do_hmp_tbstats_safe(CPUState *cpu, run_on_cpu_data icmd);
 
+/**
+ * tbstats_reset_tbs: reset the linked array of TBs
+ *
+ * Reset the list of tbs for a given array. Should be called from
+ * safe work during tb_flush.
+ */
+void tbstats_reset_tbs(void);
+
 #endif
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH  v9 09/13] Adding info [tb-list|tb] commands to HMP (WIP)
  2019-10-07 15:28 [PATCH v9 00/13] TCG code quality tracking and perf integration Alex Bennée
                   ` (7 preceding siblings ...)
  2019-10-07 15:28 ` [PATCH v9 08/13] tb-stats: reset the tracked TBs on a tb_flush Alex Bennée
@ 2019-10-07 15:28 ` Alex Bennée
  2019-10-08 18:50   ` Richard Henderson
  2019-10-09  9:44   ` Dr. David Alan Gilbert
  2019-10-07 15:28 ` [PATCH v9 10/13] tb-stats: dump hot TBs at the end of the execution Alex Bennée
                   ` (5 subsequent siblings)
  14 siblings, 2 replies; 37+ messages in thread
From: Alex Bennée @ 2019-10-07 15:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: Vanderson M. do Rosario, Dr . David Alan Gilbert,
	Markus Armbruster, cota, Paolo Bonzini, Alex Bennée,
	Richard Henderson

From: "Vanderson M. do Rosario" <vandersonmr2@gmail.com>

These commands allow the exploration of TBs generated by the TCG.
Understand which one hotter, with more guest/host instructions... and
examine their guest, host and IR code.

The goal of this command is to allow the dynamic exploration of TCG
behavior and code quality. Therefore, for now, a corresponding QMP
command is not worthwhile.

[AJB: WIP - we still can't be safely sure a translation will succeed]

Example of output:

TB id:1 | phys:0x34d54 virt:0x0000000000034d54 flags:0x0000f0
	| exec:4828932/0 guest inst cov:16.38%
	| trans:1 ints: g:3 op:82 op_opt:34 spills:3
	| h/g (host bytes / guest insts): 90.666664
	| time to gen at 2.4GHz => code:3150.83(ns) IR:712.08(ns)
	| targets: 0x0000000000034d5e (id:11), 0x0000000000034d0d (id:2)

TB id:2 | phys:0x34d0d virt:0x0000000000034d0d flags:0x0000f0
	| exec:4825842/0 guest inst cov:21.82%
	| trans:1 ints: g:4 op:80 op_opt:38 spills:2
	| h/g (host bytes / guest insts): 84.000000
	| time to gen at 2.4GHz => code:3362.92(ns) IR:793.75(ns)
	| targets: 0x0000000000034d19 (id:12), 0x0000000000034d54 (id:1)

TB id:2 | phys:0x34d0d virt:0x0000000000034d0d flags:0x0000f0
	| exec:6956495/0  guest inst cov:21.82%
	| trans:2 ints: g:2 op:40 op_opt:19 spills:1
	| h/g (host bytes / guest insts): 84.000000
	| time to gen at 2.4GHz => code:3130.83(ns) IR:722.50(ns)
	| targets: 0x0000000000034d19 (id:12), 0x0000000000034d54 (id:1)

----------------
IN:
0x00034d0d:  89 de                    movl     %ebx, %esi
0x00034d0f:  26 8b 0e                 movl     %es:(%esi), %ecx
0x00034d12:  26 f6 46 08 80           testb    $0x80, %es:8(%esi)
0x00034d17:  75 3b                    jne      0x34d54

------------------------------

TB id:1 | phys:0x34d54 virt:0x0000000000034d54 flags:0x0000f0
	| exec:5202686/0 guest inst cov:11.28%
	| trans:1 ints: g:3 op:82 op_opt:34 spills:3
	| h/g (host bytes / guest insts): 90.666664
	| time to gen at 2.4GHz => code:2793.75(ns) IR:614.58(ns)
	| targets: 0x0000000000034d5e (id:3), 0x0000000000034d0d (id:2)

TB id:2 | phys:0x34d0d virt:0x0000000000034d0d flags:0x0000f0
	| exec:5199468/0 guest inst cov:15.03%
	| trans:1 ints: g:4 op:80 op_opt:38 spills:2
	| h/g (host bytes / guest insts): 84.000000
	| time to gen at 2.4GHz => code:2958.75(ns) IR:719.58(ns)
	| targets: 0x0000000000034d19 (id:4), 0x0000000000034d54 (id:1)

------------------------------
2 TBs to reach 25% of guest inst exec coverage
Total of guest insts exec: 138346727

------------------------------

Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
Message-Id: <20190829173437.5926-10-vandersonmr2@gmail.com>
[AJB: fix authorship, dropped coverset]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
AJB
  - dropped coverset for now
  - check tbs list for valid translations
  - document the valid but unreachable TLB case
---
 accel/tcg/tb-stats.c         | 319 ++++++++++++++++++++++++++++++++++-
 disas.c                      |  31 +++-
 hmp-commands-info.hx         |  16 ++
 include/exec/tb-stats.h      |  33 +++-
 include/qemu/log-for-trace.h |   4 +
 include/qemu/log.h           |   2 +
 monitor/misc.c               |  52 ++++++
 util/log.c                   |  54 ++++--
 8 files changed, 491 insertions(+), 20 deletions(-)

diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
index f08e5f2540..3c1cc8cd06 100644
--- a/accel/tcg/tb-stats.c
+++ b/accel/tcg/tb-stats.c
@@ -28,9 +28,23 @@ static enum TBStatsStatus tcg_collect_tb_stats;
 static uint32_t default_tbstats_flag;
 /* only accessed in safe work */
 static GList *last_search;
-
+static int id = 1; /* display_id increment counter */
 uint64_t dev_time;
 
+static TBStatistics *get_tbstats_by_id(int id)
+{
+    GList *iter;
+
+    for (iter = last_search; iter; iter = g_list_next(iter)) {
+        TBStatistics *tbs = iter->data;
+        if (tbs && tbs->display_id == id) {
+            return tbs;
+            break;
+        }
+    }
+    return NULL;
+}
+
 struct jit_profile_info {
     uint64_t translations;
     uint64_t aborted;
@@ -274,6 +288,309 @@ void init_tb_stats_htable_if_not(void)
     }
 }
 
+static void collect_tb_stats(void *p, uint32_t hash, void *userp)
+{
+    last_search = g_list_prepend(last_search, p);
+}
+
+static void count_invalid_tbs(gpointer data, gpointer user_data)
+{
+    TranslationBlock *tb = (TranslationBlock *) data;
+    unsigned *counter = (unsigned *) user_data;
+    if (tb->cflags & CF_INVALID) {
+        *counter = *counter + 1;
+    }
+}
+
+static int dump_tb_header(TBStatistics *tbs)
+{
+    unsigned g = stat_per_translation(tbs, code.num_guest_inst);
+    unsigned ops = stat_per_translation(tbs, code.num_tcg_ops);
+    unsigned ops_opt = stat_per_translation(tbs, code.num_tcg_ops_opt);
+    unsigned spills = stat_per_translation(tbs, code.spills);
+    unsigned h = stat_per_translation(tbs, code.out_len);
+    unsigned act = tbs->tbs->len;
+    unsigned invalid = 0;
+
+    float guest_host_prop = g ? ((float) h / g) : 0;
+
+    g_ptr_array_foreach(tbs->tbs, &count_invalid_tbs, &invalid);
+
+    qemu_log("TB id:%d | phys:0x"TB_PAGE_ADDR_FMT" virt:0x"TARGET_FMT_lx
+             " flags:%#08x %d inv/%d\n",
+             tbs->display_id, tbs->phys_pc, tbs->pc, tbs->flags,
+             invalid, act);
+
+    if (tbs_stats_enabled(tbs, TB_EXEC_STATS)) {
+        qemu_log("\t| exec:%lu/%lu guest inst cov:%.2f%%\n", tbs->executions.normal,
+                tbs->executions.atomic, tbs->executions.coverage / 100.0f);
+    }
+
+    if (tbs_stats_enabled(tbs, TB_JIT_STATS)) {
+        qemu_log("\t| trans:%lu ints: g:%u op:%u op_opt:%u spills:%d"
+             "\n\t| h/g (host bytes / guest insts): %f\n",
+             tbs->translations.total, g, ops, ops_opt, spills, guest_host_prop);
+    }
+
+    if (tbs_stats_enabled(tbs, TB_JIT_TIME)) {
+        qemu_log("\t| time to gen at 2.4GHz => code:%0.2lf(ns) IR:%0.2lf(ns)\n",
+             tbs->time.code / 2.4, tbs->time.interm / 2.4);
+    }
+
+    qemu_log("\n");
+
+    return act - invalid;
+}
+
+static gint
+inverse_sort_tbs(gconstpointer p1, gconstpointer p2, gpointer psort_by)
+{
+    const TBStatistics *tbs1 = (TBStatistics *) p1;
+    const TBStatistics *tbs2 = (TBStatistics *) p2;
+    int sort_by = *((int *) psort_by);
+    unsigned long c1 = 0;
+    unsigned long c2 = 0;
+
+    if (likely(sort_by == SORT_BY_SPILLS)) {
+        c1 = stat_per_translation(tbs1, code.spills);
+        c2 = stat_per_translation(tbs2, code.spills);
+    } else if (likely(sort_by == SORT_BY_HOTNESS)) {
+        c1 = stat_per_translation(tbs1, executions.normal);
+        c2 = stat_per_translation(tbs2, executions.normal);
+    } else if (likely(sort_by == SORT_BY_HG)) {
+        if (tbs1->code.num_guest_inst == 0) {
+            return -1;
+        }
+        if (tbs2->code.num_guest_inst == 0) {
+            return 1;
+        }
+
+        float a =
+            (float) stat_per_translation(tbs1, code.out_len) / tbs1->code.num_guest_inst;
+        float b =
+            (float) stat_per_translation(tbs2, code.out_len) / tbs2->code.num_guest_inst;
+        c1 = a <= b ? 0 : 1;
+        c2 = a <= b ? 1 : 0;
+    }
+
+    return c1 < c2 ? 1 : c1 == c2 ? 0 : -1;
+}
+
+static void dump_last_search_headers(int count)
+{
+    if (!last_search) {
+        qemu_log("No data collected yet\n");
+        return;
+    }
+
+    GList *l = last_search;
+    while (l != NULL && count--) {
+        TBStatistics *tbs = (TBStatistics *) l->data;
+        GList *next = l->next;
+        dump_tb_header(tbs);
+        l = next;
+    }
+}
+
+static uint64_t calculate_last_search_coverages(void)
+{
+    uint64_t total_exec_count = 0;
+    GList *i;
+
+    /* Compute total execution count for all tbs */
+    for (i = last_search; i; i = i->next) {
+        TBStatistics *tbs = (TBStatistics *) i->data;
+        total_exec_count +=
+            (tbs->executions.atomic + tbs->executions.normal) * tbs->code.num_guest_inst;
+    }
+
+    for (i = last_search; i; i = i->next) {
+        TBStatistics *tbs = (TBStatistics *) i->data;
+        uint64_t tb_total_execs =
+            (tbs->executions.atomic + tbs->executions.normal) * tbs->code.num_guest_inst;
+        tbs->executions.coverage = (10000 * tb_total_execs) / (total_exec_count + 1);
+    }
+
+    return total_exec_count;
+}
+
+static void do_dump_tbs_info(int total, int sort_by)
+{
+    id = 1;
+    GList *i;
+    int count = total;
+
+    g_list_free(last_search);
+    last_search = NULL;
+
+    qht_iter(&tb_ctx.tb_stats, collect_tb_stats, NULL);
+
+    last_search = g_list_sort_with_data(last_search, inverse_sort_tbs, &sort_by);
+
+    if (!last_search) {
+        qemu_printf("No data collected yet!\n");
+        return;
+    }
+
+    calculate_last_search_coverages();
+
+    for (i = last_search; i && count--; i = i->next) {
+        TBStatistics *tbs = (TBStatistics *) i->data;
+        tbs->display_id = id++;
+    }
+
+    /* free the unused bits */
+    if (i) {
+        if (i->next) {
+            i->next->prev = NULL;
+        }
+        g_list_free(i->next);
+        i->next = NULL;
+    }
+
+    dump_last_search_headers(total);
+}
+
+struct tbs_dump_info {
+    int count;
+    int sort_by;
+};
+
+static void do_dump_tbs_info_safe(CPUState *cpu, run_on_cpu_data tbdi)
+{
+    struct tbs_dump_info *info = tbdi.host_ptr;
+    qemu_log_to_monitor(true);
+    do_dump_tbs_info(info->count, info->sort_by);
+    qemu_log_to_monitor(false);
+    g_free(info);
+}
+
+/*
+ * When we dump_tbs_info on a live system via the HMP we want to
+ * ensure the system is quiessent before we start outputting stuff.
+ * Otherwise we could pollute the output with other logging output.
+ */
+
+void dump_tbs_info(int count, int sort_by, bool use_monitor)
+{
+    if (use_monitor) {
+        struct tbs_dump_info *tbdi = g_new(struct tbs_dump_info, 1);
+        tbdi->count = count;
+        tbdi->sort_by = sort_by;
+        async_safe_run_on_cpu(first_cpu, do_dump_tbs_info_safe,
+                              RUN_ON_CPU_HOST_PTR(tbdi));
+    } else {
+        do_dump_tbs_info(count, sort_by);
+    }
+}
+
+/*
+ * We cannot always re-generate the code even if we know there are
+ * valid translations still in the cache. The reason being the guest
+ * may have un-mapped the page code. In real life this would be
+ * un-reachable as the jump cache is cleared and the following QHT
+ * lookup will do a get_page_addr_code() and fault.
+ *
+ * TODO: can we do this safely? We need to
+ *  a) somehow recover the mmu_idx for this translation
+ *  b) probe MMU_INST_FETCH to know it will succeed
+ */
+static GString *get_code_string(TBStatistics *tbs, int log_flags)
+{
+    int old_log_flags = qemu_loglevel;
+
+    CPUState *cpu = first_cpu;
+    uint32_t cflags = curr_cflags() | CF_NOCACHE;
+    TranslationBlock *tb = NULL;
+
+    GString *code_s = g_string_new(NULL);
+    qemu_log_to_string(true, code_s);
+
+    qemu_set_log(log_flags);
+
+    if (sigsetjmp(cpu->jmp_env, 0) == 0) {
+        mmap_lock();
+        tb = tb_gen_code(cpu, tbs->pc, tbs->cs_base, tbs->flags, cflags);
+        tb_phys_invalidate(tb, -1);
+        mmap_unlock();
+    } else {
+        /*
+         * The mmap_lock is dropped by tb_gen_code if it runs out of
+         * memory.
+         */
+        qemu_log("\ncould not gen code for this TB (no longer mapped?)\n");
+        assert_no_pages_locked();
+    }
+
+    qemu_set_log(old_log_flags);
+    qemu_log_to_string(false, NULL);
+
+    if (tb) {
+        tcg_tb_remove(tb);
+    }
+
+    return code_s;
+}
+
+static void do_tb_dump_with_statistics(TBStatistics *tbs, int log_flags)
+{
+    g_autoptr(GString) code_s = NULL;
+
+    qemu_log("\n------------------------------\n\n");
+
+    if (dump_tb_header(tbs) > 0) {
+        code_s = get_code_string(tbs, log_flags);
+    } else {
+        code_s = g_string_new("cannot re-translate non-active translation");
+    }
+    qemu_log("%s", code_s->str);
+    qemu_log("------------------------------\n");
+}
+
+struct tb_dump_info {
+    int id;
+    int log_flags;
+    bool use_monitor;
+};
+
+static void do_dump_tb_info_safe(CPUState *cpu, run_on_cpu_data info)
+{
+    struct tb_dump_info *tbdi = (struct tb_dump_info *) info.host_ptr;
+
+    if (!last_search) {
+        qemu_log("no search on record\n");
+        return;
+    }
+
+    qemu_log_to_monitor(tbdi->use_monitor);
+
+    TBStatistics *tbs = get_tbstats_by_id(tbdi->id);
+    if (tbs) {
+        do_tb_dump_with_statistics(tbs, tbdi->log_flags);
+    } else {
+        qemu_log("no TB statitics found with id %d\n", tbdi->id);
+    }
+
+    qemu_log_to_monitor(false);
+
+    g_free(tbdi);
+}
+
+void dump_tb_info(int id, int log_mask, bool use_monitor)
+{
+    struct tb_dump_info *tbdi = g_new(struct tb_dump_info, 1);
+
+    tbdi->id = id;
+    tbdi->log_flags = log_mask;
+    tbdi->use_monitor = use_monitor;
+
+    async_safe_run_on_cpu(first_cpu, do_dump_tb_info_safe,
+                          RUN_ON_CPU_HOST_PTR(tbdi));
+
+    /* tbdi free'd by do_dump_tb_info_safe */
+}
+
+
 void enable_collect_tb_stats(void)
 {
     init_tb_stats_htable_if_not();
diff --git a/disas.c b/disas.c
index 3e2bfa572b..d5292d4246 100644
--- a/disas.c
+++ b/disas.c
@@ -8,6 +8,8 @@
 #include "disas/disas.h"
 #include "disas/capstone.h"
 
+#include "qemu/log-for-trace.h"
+
 typedef struct CPUDebug {
     struct disassemble_info info;
     CPUState *cpu;
@@ -420,6 +422,22 @@ static bool cap_disas_monitor(disassemble_info *info, uint64_t pc, int count)
 # define cap_disas_monitor(i, p, c)  false
 #endif /* CONFIG_CAPSTONE */
 
+static int fprintf_log(struct _IO_FILE *a, const char *b, ...)
+{
+    va_list ap;
+    va_start(ap, b);
+
+    if (!to_string) {
+        vfprintf(a, b, ap);
+    } else {
+        qemu_vlog(b, ap);
+    }
+
+    va_end(ap);
+
+    return 1;
+}
+
 /* Disassemble this for me please... (debugging).  */
 void target_disas(FILE *out, CPUState *cpu, target_ulong code,
                   target_ulong size)
@@ -429,7 +447,7 @@ void target_disas(FILE *out, CPUState *cpu, target_ulong code,
     int count;
     CPUDebug s;
 
-    INIT_DISASSEMBLE_INFO(s.info, out, fprintf);
+    INIT_DISASSEMBLE_INFO(s.info, out, fprintf_log);
 
     s.cpu = cpu;
     s.info.read_memory_func = target_read_memory;
@@ -460,11 +478,12 @@ void target_disas(FILE *out, CPUState *cpu, target_ulong code,
     }
 
     for (pc = code; size > 0; pc += count, size -= count) {
-	fprintf(out, "0x" TARGET_FMT_lx ":  ", pc);
-	count = s.info.print_insn(pc, &s.info);
-	fprintf(out, "\n");
-	if (count < 0)
-	    break;
+        fprintf_log(out, "0x" TARGET_FMT_lx ":  ", pc);
+        count = s.info.print_insn(pc, &s.info);
+        fprintf_log(out, "\n");
+        if (count < 0) {
+            break;
+        }
         if (size < count) {
             fprintf(out,
                     "Disassembler disagrees with translator over instruction "
diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index 257ee7d7a3..c8418211d9 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -289,6 +289,22 @@ ETEXI
         .help       = "show dynamic compiler info",
         .cmd        = hmp_info_jit,
     },
+    {
+        .name       = "tb-list",
+        .args_type  = "number:i?,sortedby:s?",
+        .params     = "[number sortedby]",
+        .help       = "show a [number] translated blocks sorted by [sortedby]"
+                      "sortedby opts: hotness hg spills",
+        .cmd        = hmp_info_tblist,
+    },
+    {
+        .name       = "tb",
+        .args_type  = "id:i,flags:s?",
+        .params     = "id [flag1,flag2,...]",
+        .help       = "show information about one translated block by id."
+                      "dump flags can be used to set dump code level: out_asm in_asm op",
+        .cmd        = hmp_info_tb,
+    },
 #endif
 
 STEXI
diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
index c20a3e6439..b5e9c54b33 100644
--- a/include/exec/tb-stats.h
+++ b/include/exec/tb-stats.h
@@ -35,8 +35,11 @@
 enum SortBy { SORT_BY_HOTNESS, SORT_BY_HG /* Host/Guest */, SORT_BY_SPILLS };
 enum TbstatsCmd { START, PAUSE, STOP, FILTER };
 
+#define tbs_stats_enabled(tbs, JIT_STATS) \
+    (tbs && (tbs->stats_enabled & JIT_STATS))
+
 #define tb_stats_enabled(tb, JIT_STATS) \
-    (tb && tb->tb_stats && (tb->tb_stats->stats_enabled & JIT_STATS))
+    (tb && tb->tb_stats && tbs_stats_enabled(tb->tb_stats, JIT_STATS))
 
 #define stat_per_translation(stat, name) \
     (stat->translations.total ? stat->name / stat->translations.total : 0)
@@ -65,6 +68,8 @@ struct TBStatistics {
     struct {
         unsigned long normal;
         unsigned long atomic;
+        /* filled only when dumping x% cover set */
+        uint16_t coverage;
     } executions;
 
     /* JIT Stats - protected by lock */
@@ -86,7 +91,6 @@ struct TBStatistics {
 
     struct {
         unsigned long total;
-        unsigned long uncached;
         unsigned long spanning;
     } translations;
 
@@ -107,6 +111,9 @@ struct TBStatistics {
         int64_t opt;
         int64_t la;
     } time;
+
+    /* HMP information - used for referring to previous search */
+    int display_id;
 };
 
 bool tb_stats_cmp(const void *ap, const void *bp);
@@ -133,4 +140,26 @@ void do_hmp_tbstats_safe(CPUState *cpu, run_on_cpu_data icmd);
  */
 void tbstats_reset_tbs(void);
 
+/**
+ * dump_tbs_info: report the hottest blocks
+ *
+ * @count: the limit of hotblocks
+ * @sort_by: property in which the dump will be sorted
+ * @use_monitor: redirect output to monitor
+ *
+ * Report the hottest blocks to either the log or monitor
+ */
+void dump_tbs_info(int count, int sort_by, bool use_monitor);
+
+/**
+ * dump_tb_info: dump information about one TB
+ *
+ * @id: the display id of the block (from previous search)
+ * @mask: the temporary logging mask
+ * @Use_monitor: redirect output to monitor
+ *
+ * Re-run a translation of a block at addr for the purposes of debug output
+ */
+void dump_tb_info(int id, int log_mask, bool use_monitor);
+
 #endif
diff --git a/include/qemu/log-for-trace.h b/include/qemu/log-for-trace.h
index 2f0a5b080e..3de88484cb 100644
--- a/include/qemu/log-for-trace.h
+++ b/include/qemu/log-for-trace.h
@@ -20,6 +20,9 @@
 
 /* Private global variable, don't use */
 extern int qemu_loglevel;
+extern bool to_string;
+
+extern int32_t max_num_hot_tbs_to_dump;
 
 #define LOG_TRACE          (1 << 15)
 
@@ -31,5 +34,6 @@ static inline bool qemu_loglevel_mask(int mask)
 
 /* main logging function */
 int GCC_FMT_ATTR(1, 2) qemu_log(const char *fmt, ...);
+int qemu_vlog(const char *fmt, va_list va);
 
 #endif
diff --git a/include/qemu/log.h b/include/qemu/log.h
index a8d1997cde..804cf90f0f 100644
--- a/include/qemu/log.h
+++ b/include/qemu/log.h
@@ -114,6 +114,8 @@ typedef struct QEMULogItem {
 extern const QEMULogItem qemu_log_items[];
 
 void qemu_set_log(int log_flags);
+void qemu_log_to_monitor(bool enable);
+void qemu_log_to_string(bool enable, GString *s);
 void qemu_log_needs_buffers(void);
 void qemu_set_log_filename(const char *filename, Error **errp);
 void qemu_set_dfilter_ranges(const char *ranges, Error **errp);
diff --git a/monitor/misc.c b/monitor/misc.c
index 218263d29a..af09a97085 100644
--- a/monitor/misc.c
+++ b/monitor/misc.c
@@ -504,6 +504,58 @@ static void hmp_tbstats(Monitor *mon, const QDict *qdict)
 
 }
 
+static void hmp_info_tblist(Monitor *mon, const QDict *qdict)
+{
+    int number_int;
+    const char *sortedby_str = NULL;
+    if (!tcg_enabled()) {
+        error_report("TB information is only available with accel=tcg");
+        return;
+    }
+    if (!tb_ctx.tb_stats.map) {
+        error_report("no TB information recorded");
+        return;
+    }
+
+    number_int = qdict_get_try_int(qdict, "number", 10);
+    sortedby_str = qdict_get_try_str(qdict, "sortedby");
+
+    int sortedby = SORT_BY_HOTNESS;
+    if (sortedby_str == NULL || strcmp(sortedby_str, "hotness") == 0) {
+        sortedby = SORT_BY_HOTNESS;
+    } else if (strcmp(sortedby_str, "hg") == 0) {
+        sortedby = SORT_BY_HG;
+    } else if (strcmp(sortedby_str, "spills") == 0) {
+        sortedby = SORT_BY_SPILLS;
+    } else {
+        error_report("valid sort options are: hotness hg spills");
+        return;
+    }
+
+    dump_tbs_info(number_int, sortedby, true);
+}
+
+static void hmp_info_tb(Monitor *mon, const QDict *qdict)
+{
+    const int id = qdict_get_int(qdict, "id");
+    const char *flags = qdict_get_try_str(qdict, "flags");
+    int mask;
+
+    if (!tcg_enabled()) {
+        error_report("TB information is only available with accel=tcg");
+        return;
+    }
+
+    mask = flags ? qemu_str_to_log_mask(flags) : CPU_LOG_TB_IN_ASM;
+
+    if (!mask) {
+        error_report("Unable to parse log flags, see 'help log'");
+        return;
+    }
+
+    dump_tb_info(id, mask, true);
+}
+
 static void hmp_info_jit(Monitor *mon, const QDict *qdict)
 {
     if (!tcg_enabled()) {
diff --git a/util/log.c b/util/log.c
index 86bd691967..fa78e2bca9 100644
--- a/util/log.c
+++ b/util/log.c
@@ -33,25 +33,55 @@ int qemu_loglevel;
 static int log_append = 0;
 static GArray *debug_regions;
 int32_t max_num_hot_tbs_to_dump;
+static bool to_monitor;
+bool to_string;
 
-/* Return the number of characters emitted.  */
-int qemu_log(const char *fmt, ...)
+GString *string;
+
+int qemu_vlog(const char *fmt, va_list va)
 {
     int ret = 0;
-    if (qemu_logfile) {
-        va_list ap;
-        va_start(ap, fmt);
-        ret = vfprintf(qemu_logfile, fmt, ap);
-        va_end(ap);
-
-        /* Don't pass back error results.  */
-        if (ret < 0) {
-            ret = 0;
+    if (to_string) {
+        if (string) {
+            g_string_append_vprintf(string, fmt, va);
         }
+    } else if (to_monitor) {
+        ret = qemu_vprintf(fmt, va);
+    } else if (qemu_logfile) {
+        ret = vfprintf(qemu_logfile, fmt, va);
+    }
+
+    /* Don't pass back error results.  */
+    if (ret < 0) {
+        ret = 0;
     }
     return ret;
 }
 
+/* Return the number of characters emitted.  */
+int qemu_log(const char *fmt, ...)
+{
+    int ret = 0;
+    va_list ap;
+    va_start(ap, fmt);
+
+    ret = qemu_vlog(fmt, ap);
+
+    va_end(ap);
+    return ret;
+}
+
+void qemu_log_to_monitor(bool enable)
+{
+    to_monitor = enable;
+}
+
+void qemu_log_to_string(bool enable, GString *s)
+{
+    to_string = enable;
+    string = s;
+}
+
 static bool log_uses_own_buffers;
 
 /* enable or disable low levels log */
@@ -300,6 +330,7 @@ int qemu_str_to_log_mask(const char *str)
             trace_enable_events((*tmp) + 6);
             mask |= LOG_TRACE;
 #endif
+#ifdef CONFIG_TCG
         } else if (g_str_has_prefix(*tmp, "tb_stats")) {
             mask |= CPU_LOG_TB_STATS;
             set_default_tbstats_flag(TB_JIT_STATS | TB_EXEC_STATS | TB_JIT_TIME);
@@ -329,6 +360,7 @@ int qemu_str_to_log_mask(const char *str)
                 }
                 set_default_tbstats_flag(flags);
             }
+#endif
         } else {
             for (item = qemu_log_items; item->mask != 0; item++) {
                 if (g_str_equal(*tmp, item->name)) {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH  v9 10/13] tb-stats: dump hot TBs at the end of the execution
  2019-10-07 15:28 [PATCH v9 00/13] TCG code quality tracking and perf integration Alex Bennée
                   ` (8 preceding siblings ...)
  2019-10-07 15:28 ` [PATCH v9 09/13] Adding info [tb-list|tb] commands to HMP (WIP) Alex Bennée
@ 2019-10-07 15:28 ` Alex Bennée
  2019-10-08 19:05   ` Richard Henderson
  2019-10-07 15:28 ` [PATCH v9 11/13] accel/tcg: adding integration with linux perf Alex Bennée
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 37+ messages in thread
From: Alex Bennée @ 2019-10-07 15:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: Vanderson M. do Rosario, Riku Voipio, Laurent Vivier, cota,
	Paolo Bonzini, Alex Bennée, Richard Henderson

From: "Vanderson M. do Rosario" <vandersonmr2@gmail.com>

Dump the hottest TBs if -d tb_stats,dump_limit=N is used.

Example of output for the 3 hottest TBs:

TB id:1 | phys:0x34d54 virt:0x0000000000034d54 flags:0x0000f0
        | exec:4828932/0 guest inst cov:16.38%
        | trans:1 ints: g:3 op:82 op_opt:34 spills:3
        | h/g (host bytes / guest insts): 90.666664
        | time to gen at 2.4GHz => code:3150.83(ns) IR:712.08(ns)
        | targets: 0x0000000000034d5e (id:11), 0x0000000000034d0d (id:2)

TB id:2 | phys:0x34d0d virt:0x0000000000034d0d flags:0x0000f0
        | exec:4825842/0 guest inst cov:21.82%
        | trans:1 ints: g:4 op:80 op_opt:38 spills:2
        | h/g (host bytes / guest insts): 84.000000
        | time to gen at 2.4GHz => code:3362.92(ns) IR:793.75(ns)
        | targets: 0x0000000000034d19 (id:12), 0x0000000000034d54 (id:1)

TB id:3 | phys:0xec1c1 virt:0x00000000000ec1c1 flags:0x0000b0
        | exec:872032/0 guest inst cov:1.97%
        | trans:1 ints: g:2 op:56 op_opt:26 spills:1
        | h/g (host bytes / guest insts): 68.000000
        | time to gen at 2.4GHz => code:1692.08(ns) IR:473.75(ns)
        | targets: 0x00000000000ec1c5 (id:4), 0x00000000000ec1cb (id:13)

Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
Message-Id: <20190829173437.5926-12-vandersonmr2@gmail.com>
[AJB: fix authorship, ad softmmu support]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
AJB:
  - move vars into tb_stats.c
  - support softmmu as well
---
 accel/tcg/tb-stats.c          | 21 +++++++++++++++++++++
 include/exec/tb-stats-dump.h  | 21 +++++++++++++++++++++
 include/exec/tb-stats-flags.h |  1 +
 linux-user/exit.c             |  2 ++
 stubs/tb-stats.c              |  5 +++++
 util/log.c                    |  4 ++--
 vl.c                          |  2 ++
 7 files changed, 54 insertions(+), 2 deletions(-)
 create mode 100644 include/exec/tb-stats-dump.h

diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
index 3c1cc8cd06..60a9ee4b9c 100644
--- a/accel/tcg/tb-stats.c
+++ b/accel/tcg/tb-stats.c
@@ -15,6 +15,7 @@
 #include "qemu/qemu-print.h"
 
 #include "exec/tb-stats.h"
+#include "exec/tb-stats-dump.h"
 
 /* TBStatistic collection controls */
 enum TBStatsStatus {
@@ -26,6 +27,7 @@ enum TBStatsStatus {
 
 static enum TBStatsStatus tcg_collect_tb_stats;
 static uint32_t default_tbstats_flag;
+static int max_dump_tbs;
 /* only accessed in safe work */
 static GList *last_search;
 static int id = 1; /* display_id increment counter */
@@ -591,6 +593,20 @@ void dump_tb_info(int id, int log_mask, bool use_monitor)
 }
 
 
+/*
+ * Dump the final stats
+ */
+void tb_stats_dump(void)
+{
+    if (!tb_stats_collection_enabled()) {
+        return;
+    }
+
+    dump_tbs_info(max_dump_tbs, SORT_BY_HOTNESS, false);
+}
+
+/* TBStatistic collection controls */
+
 void enable_collect_tb_stats(void)
 {
     init_tb_stats_htable_if_not();
@@ -639,3 +655,8 @@ void set_default_tbstats_flag(uint32_t flags)
 {
     default_tbstats_flag = flags;
 }
+
+void set_tbstats_max_tbs(int max)
+{
+    max_dump_tbs = max;
+}
diff --git a/include/exec/tb-stats-dump.h b/include/exec/tb-stats-dump.h
new file mode 100644
index 0000000000..197c6148e9
--- /dev/null
+++ b/include/exec/tb-stats-dump.h
@@ -0,0 +1,21 @@
+/*
+ * TB Stats common dump functions across sysemu/linux-user
+ *
+ * Copyright (c) 2019 Linaro
+ *
+ * SPDX-License-Identifier: GPL-3.0-or-later
+ */
+
+#ifndef _TB_STATS_DUMP_H_
+#define _TB_STATS_DUMP_H_
+
+/**
+ * tb_stats_dump: dump final tb_stats at end of execution
+ */
+#ifdef CONFIG_TCG
+void tb_stats_dump(void);
+#else
+static inline void tb_stats_dump(void) { /* do nothing */ };
+#endif
+
+#endif /* _TB_STATS_DUMP_H_ */
diff --git a/include/exec/tb-stats-flags.h b/include/exec/tb-stats-flags.h
index 252304d685..a5bf44f824 100644
--- a/include/exec/tb-stats-flags.h
+++ b/include/exec/tb-stats-flags.h
@@ -24,6 +24,7 @@ void pause_collect_tb_stats(void);
 bool tb_stats_collection_enabled(void);
 bool tb_stats_collection_paused(void);
 
+void set_tbstats_max_tbs(int max);
 uint32_t get_default_tbstats_flag(void);
 void set_default_tbstats_flag(uint32_t);
 
diff --git a/linux-user/exit.c b/linux-user/exit.c
index bdda720553..c00700bb96 100644
--- a/linux-user/exit.c
+++ b/linux-user/exit.c
@@ -21,6 +21,7 @@
 #ifdef TARGET_GPROF
 #include <sys/gmon.h>
 #endif
+#include "exec/tb-stats-dump.h"
 
 #ifdef CONFIG_GCOV
 extern void __gcov_dump(void);
@@ -35,4 +36,5 @@ void preexit_cleanup(CPUArchState *env, int code)
         __gcov_dump();
 #endif
         gdb_exit(env, code);
+        tb_stats_dump();
 }
diff --git a/stubs/tb-stats.c b/stubs/tb-stats.c
index d212c2a1fa..a3e1406b88 100644
--- a/stubs/tb-stats.c
+++ b/stubs/tb-stats.c
@@ -21,6 +21,11 @@ bool tb_stats_collection_enabled(void)
     return false;
 }
 
+void set_tbstats_max_tbs(int max)
+{
+    return;
+}
+
 void set_default_tbstats_flag(uint32_t flags)
 {
     return;
diff --git a/util/log.c b/util/log.c
index fa78e2bca9..b6e4f71c1f 100644
--- a/util/log.c
+++ b/util/log.c
@@ -32,7 +32,6 @@ FILE *qemu_logfile;
 int qemu_loglevel;
 static int log_append = 0;
 static GArray *debug_regions;
-int32_t max_num_hot_tbs_to_dump;
 static bool to_monitor;
 bool to_string;
 
@@ -337,7 +336,8 @@ int qemu_str_to_log_mask(const char *str)
             enable_collect_tb_stats();
         } else if (tb_stats_collection_enabled() &&
                    g_str_has_prefix(*tmp, "dump_limit=")) {
-            max_num_hot_tbs_to_dump = atoi((*tmp) + 11);
+            int hot_tbs = atoi((*tmp) + 11);
+            set_tbstats_max_tbs(hot_tbs);
         } else if (tb_stats_collection_enabled() &&
                    g_str_has_prefix(*tmp, "level=")) {
             uint32_t flags = 0;
diff --git a/vl.c b/vl.c
index 781fddaf18..9ceb401007 100644
--- a/vl.c
+++ b/vl.c
@@ -35,6 +35,7 @@
 #include "sysemu/runstate.h"
 #include "sysemu/seccomp.h"
 #include "sysemu/tcg.h"
+#include "exec/tb-stats-dump.h"
 
 #ifdef CONFIG_SDL
 #if defined(__APPLE__) || defined(main)
@@ -4514,6 +4515,7 @@ int main(int argc, char **argv, char **envp)
     /* No more vcpu or device emulation activity beyond this point */
     vm_shutdown();
 
+    tb_stats_dump();
     job_cancel_sync_all();
     bdrv_close_all();
 
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH  v9 11/13] accel/tcg: adding integration with linux perf
  2019-10-07 15:28 [PATCH v9 00/13] TCG code quality tracking and perf integration Alex Bennée
                   ` (9 preceding siblings ...)
  2019-10-07 15:28 ` [PATCH v9 10/13] tb-stats: dump hot TBs at the end of the execution Alex Bennée
@ 2019-10-07 15:28 ` Alex Bennée
  2019-10-08 19:33   ` Richard Henderson
  2019-10-07 15:28 ` [PATCH v9 12/13] tb-stats: adding TBStatistics info into perf dump Alex Bennée
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 37+ messages in thread
From: Alex Bennée @ 2019-10-07 15:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: Vanderson M. do Rosario, Riku Voipio, Laurent Vivier, cota,
	Paolo Bonzini, Alex Bennée, Richard Henderson

From: "Vanderson M. do Rosario" <vandersonmr2@gmail.com>

This commit adds support to Linux Perf in order to be able to analyze
qemu jitted code and also to able to see the TBs PC in it.

When using "-perf" qemu creates a jitdump file in the current working
directory. You then integrate the file using perf inject.

Example of use:
 perf record -k 1 qemu-x86_64 -perf ./a.out
 perf inject -j -i perf.data -o perf.data.jitted
 perf report -i perf.data.jitted

Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
Message-Id: <20190830121903.17585-2-vandersonmr2@gmail.com>
[AJB: rebase and various fixes]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
AJB:
   - use get_clock instead of repeating ourselves
   - add lock for writing to file
   - use canonical kernel.org links
   - g_autoptr/g_autofree strings
---
 accel/tcg/Makefile.objs      |   2 +-
 accel/tcg/perf/Makefile.objs |   1 +
 accel/tcg/perf/jitdump.c     | 194 +++++++++++++++++++++++++++++++++++
 accel/tcg/perf/jitdump.h     |  36 +++++++
 accel/tcg/translate-all.c    |  14 +++
 docs/devel/tcg.rst           |  15 +++
 include/qemu-common.h        |   3 +
 linux-user/main.c            |   7 ++
 os-posix.c                   |   5 +
 qemu-options.hx              |  11 ++
 10 files changed, 287 insertions(+), 1 deletion(-)
 create mode 100644 accel/tcg/perf/Makefile.objs
 create mode 100644 accel/tcg/perf/jitdump.c
 create mode 100644 accel/tcg/perf/jitdump.h

diff --git a/accel/tcg/Makefile.objs b/accel/tcg/Makefile.objs
index 49ffe81b5d..6a1ad59199 100644
--- a/accel/tcg/Makefile.objs
+++ b/accel/tcg/Makefile.objs
@@ -3,6 +3,6 @@ obj-$(CONFIG_SOFTMMU) += cputlb.o
 obj-y += tcg-runtime.o tcg-runtime-gvec.o
 obj-y += cpu-exec.o cpu-exec-common.o translate-all.o
 obj-y += translator.o tb-stats.o
-
+obj-y += perf/
 obj-$(CONFIG_USER_ONLY) += user-exec.o
 obj-$(call lnot,$(CONFIG_SOFTMMU)) += user-exec-stub.o
diff --git a/accel/tcg/perf/Makefile.objs b/accel/tcg/perf/Makefile.objs
new file mode 100644
index 0000000000..ca9abb4f48
--- /dev/null
+++ b/accel/tcg/perf/Makefile.objs
@@ -0,0 +1 @@
+obj-$(CONFIG_LINUX) += jitdump.o
diff --git a/accel/tcg/perf/jitdump.c b/accel/tcg/perf/jitdump.c
new file mode 100644
index 0000000000..e1d6f2214e
--- /dev/null
+++ b/accel/tcg/perf/jitdump.c
@@ -0,0 +1,194 @@
+/*
+ * This code implements an interface to create and fill jitdump files. These files
+ * store information used by Linux Perf to enhance the presentation of jitted
+ * code and to allow the disassembly of jitted code.
+ *
+ * The jitdump file specification can be found in the Linux Kernel Source tree:
+ *    tools/perf/Documentation/jitdump-specification.txt
+ *
+ * https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/tools/perf/Documentation/jitdump-specification.txt
+ *
+ * Copyright (c) 2019 Vanderson M. do Rosario
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+
+#include <sys/syscall.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <time.h>
+#include <elf.h>
+
+#include "disas/disas.h"
+#include "jitdump.h"
+#include "qemu-common.h"
+#include "qemu/timer.h"
+#include "exec/tb-stats.h"
+
+struct jitheader {
+    uint32_t magic;     /* characters "jItD" */
+    uint32_t version;   /* header version */
+    uint32_t total_size;/* total size of header */
+    uint32_t elf_mach;  /* elf mach target */
+    uint32_t pad1;      /* reserved */
+    uint32_t pid;       /* JIT process id */
+    uint64_t timestamp; /* timestamp */
+    uint64_t flags;     /* flags */
+};
+
+enum jit_record_type {
+    JIT_CODE_LOAD       = 0,
+    JIT_CODE_MOVE       = 1,
+    JIT_CODE_DEBUG_INFO = 2,
+    JIT_CODE_CLOSE      = 3,
+
+    JIT_CODE_MAX,
+};
+
+/* record prefix (mandatory in each record) */
+struct jr_prefix {
+    uint32_t id;
+    uint32_t total_size;
+    uint64_t timestamp;
+};
+
+struct jr_code_load {
+    struct jr_prefix p;
+
+    uint32_t pid;
+    uint32_t tid;
+    uint64_t vma;
+    uint64_t code_addr;
+    uint64_t code_size;
+    uint64_t code_index;
+};
+
+struct jr_code_close {
+    struct jr_prefix p;
+};
+
+struct jr_code_move {
+    struct jr_prefix p;
+
+    uint32_t pid;
+    uint32_t tid;
+    uint64_t vma;
+    uint64_t old_code_addr;
+    uint64_t new_code_addr;
+    uint64_t code_size;
+    uint64_t code_index;
+};
+
+FILE *dumpfile;
+QemuMutex dumpfile_lock;
+void *perf_marker;
+
+static uint32_t get_e_machine(void)
+{
+    uint32_t e_machine = EM_NONE;
+    Elf64_Ehdr elf_header;
+    FILE *exe = fopen("/proc/self/exe", "r");
+
+    if (exe == NULL) {
+        return e_machine;
+    }
+
+    if (fread(&elf_header, sizeof(Elf64_Ehdr), 1, exe) != 1) {
+        goto end;
+    }
+
+    e_machine = elf_header.e_machine;
+
+end:
+    fclose(exe);
+    return e_machine;
+}
+
+void start_jitdump_file(void)
+{
+    g_autofree gchar *dumpfile_name = g_strdup_printf("./jit-%d.dump", getpid());
+    dumpfile = fopen(dumpfile_name, "w+");
+
+    /* 'Perf record' saves mmaped files during the execution of a program and
+     * 'perf inject' iterate over them to reconstruct all used/executed binary.
+     * So, we create a mmap with the path of our jitdump that is processed
+     * and used by 'perf inject' to reconstruct jitted binaries.
+     */
+    perf_marker = mmap(NULL, sysconf(_SC_PAGESIZE),
+                          PROT_READ | PROT_EXEC,
+                          MAP_PRIVATE,
+                          fileno(dumpfile), 0);
+
+    if (perf_marker == MAP_FAILED) {
+        printf("Failed to create mmap marker file for perf %d\n", fileno(dumpfile));
+        fclose(dumpfile);
+        return;
+    }
+
+    struct jitheader header;
+    header.magic = 0x4A695444;
+    header.version = 1;
+    header.elf_mach = get_e_machine();
+    header.total_size = sizeof(struct jitheader);
+    header.pid = getpid();
+    header.timestamp = get_clock();
+    header.flags = 0;
+
+    fwrite(&header, header.total_size, 1, dumpfile);
+
+    fflush(dumpfile);
+
+    qemu_mutex_init(&dumpfile_lock);
+}
+
+void append_load_in_jitdump_file(TranslationBlock *tb)
+{
+    gchar *func_name = g_strdup_printf("TB virt:0x"TARGET_FMT_lx, tb->pc);
+
+    /* Serialise the writing of the dump file */
+    qemu_mutex_lock(&dumpfile_lock);
+
+    struct jr_code_load load_event;
+    load_event.p.id = JIT_CODE_LOAD;
+    load_event.p.total_size =
+        sizeof(struct jr_code_load) + func_name->len + 1 + tb->tc.size;
+    load_event.p.timestamp = get_clock();
+    load_event.pid = getpid();
+    load_event.tid = syscall(SYS_gettid);
+    load_event.vma = tb->pc;
+    load_event.code_addr = (uint64_t) tb->tc.ptr;
+    load_event.code_size = tb->tc.size;
+    load_event.code_index = tb->pc;
+
+    fwrite(&load_event, sizeof(struct jr_code_load), 1, dumpfile);
+    fwrite(func_name->str, func_name->len + 1, 1, dumpfile);
+    fwrite(tb->tc.ptr, tb->tc.size, 1, dumpfile);
+
+    g_free(func_name);
+    fflush(dumpfile);
+
+    qemu_mutex_unlock(&dumpfile_lock);
+}
+
+void close_jitdump_file(void)
+{
+    fclose(dumpfile);
+    if (perf_marker != MAP_FAILED) {
+        munmap(perf_marker, sysconf(_SC_PAGESIZE));
+    }
+}
+
+bool is_jitdump_enabled;
+
+void enable_jitdump(void)
+{
+    is_jitdump_enabled = true;
+}
+
+bool jitdump_enabled(void)
+{
+    return is_jitdump_enabled;
+}
diff --git a/accel/tcg/perf/jitdump.h b/accel/tcg/perf/jitdump.h
new file mode 100644
index 0000000000..5d6df3ec91
--- /dev/null
+++ b/accel/tcg/perf/jitdump.h
@@ -0,0 +1,36 @@
+/*
+ * QEMU Linux Perf Support
+ *
+ * Copyright (c) 2019 Vanderson M. do Rosario (vandersonmr2@gmail.com)
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+#ifndef JITDUMP_H
+#define JITDUMP_H
+
+#include "exec/exec-all.h"
+
+void start_jitdump_file(void);
+
+void append_load_in_jitdump_file(TranslationBlock *tb);
+void append_move_in_jitdump_file(TranslationBlock *tb);
+
+void close_jitdump_file(void);
+
+#endif
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 871d91d559..3fafb656e7 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -58,6 +58,10 @@
 #include "sysemu/cpus.h"
 #include "sysemu/tcg.h"
 
+#ifdef __linux__
+#include "perf/jitdump.h"
+#endif
+
 /* #define DEBUG_TB_INVALIDATE */
 /* #define DEBUG_TB_FLUSH */
 /* make various TB consistency checks */
@@ -1167,6 +1171,11 @@ void tcg_exec_init(unsigned long tb_size)
     cpu_gen_init();
     page_init();
     tb_htable_init();
+#ifdef __linux__
+    if (jitdump_enabled()) {
+        start_jitdump_file();
+    }
+#endif
     code_gen_alloc(tb_size);
 #if defined(CONFIG_SOFTMMU)
     /* There's no guest base to take into account, so go ahead and
@@ -1978,6 +1987,11 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
         return existing_tb;
     }
     tcg_tb_insert(tb);
+#ifdef __linux__
+    if (jitdump_enabled()) {
+        append_load_in_jitdump_file(tb);
+    }
+#endif
     return tb;
 }
 
diff --git a/docs/devel/tcg.rst b/docs/devel/tcg.rst
index 4956a30a4e..ab5639cf67 100644
--- a/docs/devel/tcg.rst
+++ b/docs/devel/tcg.rst
@@ -109,3 +109,18 @@ memory areas instead calls out to C code for device emulation.
 Finally, the MMU helps tracking dirty pages and pages pointed to by
 translation blocks.
 
+Profiling JITted code
+---------------------
+
+The Linux `perf` tool will treat all JITed code as a single block as
+unlike the main code it can't use debug information to link individual
+program counter samples with larger functions. To overcome this
+limitation you can use the `--perf` option to generate a map file.
+This needs to be integrated with the `perf.data` file before the final
+report can be viewed.
+
+.. code::
+
+  perf record -k 1 $QEMU --perf $REMAINING_ARGS
+  perf inject -i perf.data -j -o perf.data.jitted
+  perf report -i perf.data.jitted
diff --git a/include/qemu-common.h b/include/qemu-common.h
index 8d84db90b0..a16e0e7eb6 100644
--- a/include/qemu-common.h
+++ b/include/qemu-common.h
@@ -129,4 +129,7 @@ void page_size_init(void);
  * returned. */
 bool dump_in_progress(void);
 
+void enable_jitdump(void);
+bool jitdump_enabled(void);
+
 #endif
diff --git a/linux-user/main.c b/linux-user/main.c
index 560d053f72..18f771c0be 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -377,6 +377,11 @@ static void handle_arg_strace(const char *arg)
     do_strace = 1;
 }
 
+static void handle_arg_perf(const char *arg)
+{
+    enable_jitdump();
+}
+
 static void handle_arg_version(const char *arg)
 {
     printf("qemu-" TARGET_NAME " version " QEMU_FULL_VERSION
@@ -449,6 +454,8 @@ static const struct qemu_argument arg_table[] = {
      "",           "Seed for pseudo-random number generator"},
     {"trace",      "QEMU_TRACE",       true,  handle_arg_trace,
      "",           "[[enable=]<pattern>][,events=<file>][,file=<file>]"},
+    {"perf",      "QEMU_PERF",         false, handle_arg_perf,
+     "",           "dump jitdump files to help linux perf JIT code visualization"},
     {"version",    "QEMU_VERSION",     false, handle_arg_version,
      "",           "display version information and exit"},
 #if defined(TARGET_XTENSA)
diff --git a/os-posix.c b/os-posix.c
index 86cffd2c7d..36ea3a08ca 100644
--- a/os-posix.c
+++ b/os-posix.c
@@ -191,6 +191,11 @@ int os_parse_cmd_args(int index, const char *optarg)
     case QEMU_OPTION_enablefips:
         fips_set_state(true);
         break;
+#if defined(CONFIG_TCG) && defined (CLOCK_MONOTONIC)
+    case QEMU_OPTION_perf:
+        enable_jitdump();
+        break;
+#endif
 #endif
     default:
         return -1;
diff --git a/qemu-options.hx b/qemu-options.hx
index 2a04ca6ac5..2924032c3b 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4160,6 +4160,17 @@ STEXI
 Enable FIPS 140-2 compliance mode.
 ETEXI
 
+#ifdef __linux__
+DEF("perf", 0, QEMU_OPTION_perf,
+    "-perf  dump jitdump files to help linux perf JIT code visualization\n",
+    QEMU_ARCH_ALL)
+#endif
+STEXI
+@item -perf
+@findex -perf
+Dumps jitdump files to help linux perf JIT code visualization
+ETEXI
+
 HXCOMM Deprecated by -accel tcg
 DEF("no-kvm", 0, QEMU_OPTION_no_kvm, "", QEMU_ARCH_I386)
 
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH  v9 12/13] tb-stats: adding TBStatistics info into perf dump
  2019-10-07 15:28 [PATCH v9 00/13] TCG code quality tracking and perf integration Alex Bennée
                   ` (10 preceding siblings ...)
  2019-10-07 15:28 ` [PATCH v9 11/13] accel/tcg: adding integration with linux perf Alex Bennée
@ 2019-10-07 15:28 ` Alex Bennée
  2019-10-08 19:46   ` Richard Henderson
  2019-10-07 15:28 ` [PATCH v9 13/13] configure: remove the final bits of --profiler support Alex Bennée
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 37+ messages in thread
From: Alex Bennée @ 2019-10-07 15:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, cota, Vanderson M. do Rosario, Paolo Bonzini,
	Richard Henderson

From: "Vanderson M. do Rosario" <vandersonmr2@gmail.com>

Adding TBStatistics information to linux perf TB's symbol names.

Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
Message-Id: <20190830121903.17585-3-vandersonmr2@gmail.com>
[AJB: fix authorship, auto-strings]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
AJB:
  - use g_string and auto free
---
 accel/tcg/perf/jitdump.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/accel/tcg/perf/jitdump.c b/accel/tcg/perf/jitdump.c
index e1d6f2214e..e7b86173e0 100644
--- a/accel/tcg/perf/jitdump.c
+++ b/accel/tcg/perf/jitdump.c
@@ -146,7 +146,20 @@ void start_jitdump_file(void)
 
 void append_load_in_jitdump_file(TranslationBlock *tb)
 {
-    gchar *func_name = g_strdup_printf("TB virt:0x"TARGET_FMT_lx, tb->pc);
+    g_autoptr(GString) func_name = g_string_new("TB virt:");
+
+    g_string_append_printf(func_name, "0x"TARGET_FMT_lx, tb->pc);
+
+    if (tb->tb_stats) {
+        TBStatistics *tbs = tb->tb_stats;
+        unsigned g = stat_per_translation(tbs, code.num_guest_inst);
+        unsigned ops = stat_per_translation(tbs, code.num_tcg_ops);
+        unsigned ops_opt = stat_per_translation(tbs, code.num_tcg_ops_opt);
+        unsigned spills = stat_per_translation(tbs, code.spills);
+
+        g_string_append_printf(func_name, " (g:%u op:%u opt:%u spills:%d)",
+                               g, ops, ops_opt, spills);
+    }
 
     /* Serialise the writing of the dump file */
     qemu_mutex_lock(&dumpfile_lock);
@@ -167,7 +180,6 @@ void append_load_in_jitdump_file(TranslationBlock *tb)
     fwrite(func_name->str, func_name->len + 1, 1, dumpfile);
     fwrite(tb->tc.ptr, tb->tc.size, 1, dumpfile);
 
-    g_free(func_name);
     fflush(dumpfile);
 
     qemu_mutex_unlock(&dumpfile_lock);
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v9 13/13] configure: remove the final bits of --profiler support
  2019-10-07 15:28 [PATCH v9 00/13] TCG code quality tracking and perf integration Alex Bennée
                   ` (11 preceding siblings ...)
  2019-10-07 15:28 ` [PATCH v9 12/13] tb-stats: adding TBStatistics info into perf dump Alex Bennée
@ 2019-10-07 15:28 ` Alex Bennée
  2019-10-08 19:39   ` Richard Henderson
  2019-10-07 18:14 ` [PATCH v9 00/13] TCG code quality tracking and perf integration no-reply
  2019-10-07 18:47 ` no-reply
  14 siblings, 1 reply; 37+ messages in thread
From: Alex Bennée @ 2019-10-07 15:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota, Alex Bennée

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 configure | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/configure b/configure
index eedeb9016e..dfd0f67170 100755
--- a/configure
+++ b/configure
@@ -419,7 +419,6 @@ oss_lib=""
 bsd="no"
 linux="no"
 solaris="no"
-profiler="no"
 cocoa="no"
 softmmu="yes"
 linux_user="no"
@@ -1192,8 +1191,6 @@ for opt do
   ;;
   --enable-libnfs) libnfs="yes"
   ;;
-  --enable-profiler) profiler="yes"
-  ;;
   --disable-cocoa) cocoa="no"
   ;;
   --enable-cocoa)
@@ -1702,7 +1699,6 @@ Advanced options (experts only):
   --with-win-sdk=SDK-path  path to Windows Platform SDK (to build VSS .tlb)
   --tls-priority           default TLS protocol/cipher priority string
   --enable-gprof           QEMU profiling with gprof
-  --enable-profiler        profiler support
   --enable-debug-stack-usage
                            track the maximum stack usage of stacks created by qemu_alloc_stack
 
@@ -6307,7 +6303,6 @@ echo "target list       $target_list"
 echo "gprof enabled     $gprof"
 echo "sparse enabled    $sparse"
 echo "strip binaries    $strip_opt"
-echo "profiler          $profiler"
 echo "static build      $static"
 if test "$darwin" = "yes" ; then
     echo "Cocoa support     $cocoa"
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH  v9 00/13] TCG code quality tracking and perf integration
  2019-10-07 15:28 [PATCH v9 00/13] TCG code quality tracking and perf integration Alex Bennée
                   ` (12 preceding siblings ...)
  2019-10-07 15:28 ` [PATCH v9 13/13] configure: remove the final bits of --profiler support Alex Bennée
@ 2019-10-07 18:14 ` no-reply
  2019-10-07 18:47 ` no-reply
  14 siblings, 0 replies; 37+ messages in thread
From: no-reply @ 2019-10-07 18:14 UTC (permalink / raw)
  To: alex.bennee; +Cc: cota, alex.bennee, qemu-devel

Patchew URL: https://patchew.org/QEMU/20191007152839.30804-1-alex.bennee@linaro.org/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [PATCH  v9 00/13] TCG code quality tracking and perf integration
Message-id: 20191007152839.30804-1-alex.bennee@linaro.org
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

From https://github.com/patchew-project/qemu
 * [new tag]         patchew/20191007152839.30804-1-alex.bennee@linaro.org -> patchew/20191007152839.30804-1-alex.bennee@linaro.org
Switched to a new branch 'test'
2344fa6 configure: remove the final bits of --profiler support
f20376c tb-stats: adding TBStatistics info into perf dump
3ce3a87 accel/tcg: adding integration with linux perf
f430d85 tb-stats: dump hot TBs at the end of the execution
acc3b84 Adding info [tb-list|tb] commands to HMP (WIP)
cdf6f72 tb-stats: reset the tracked TBs on a tb_flush
9fe7b93 monitor: adding tb_stats hmp command
1358a45 debug: add -d tb_stats to control TBStatistics collection:
c1bf3a8 accel: adding TB_JIT_TIME and full replacing CONFIG_PROFILER
947c81f accel: replacing part of CONFIG_PROFILER with TBStats
b3bf3ff accel: collecting JIT statistics
c7359e9 accel: collecting TB execution count
ac71d5f accel/tcg: introduce TBStatistics structure

=== OUTPUT BEGIN ===
1/13 Checking commit ac71d5f77f09 (accel/tcg: introduce TBStatistics structure)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#37: 
new file mode 100644

WARNING: Block comments use a leading /* on a separate line
#231: FILE: include/exec/tb-context.h:26:
+/* Page tracking code uses ram addresses in system mode, and virtual

WARNING: Block comments use * on subsequent lines
#232: FILE: include/exec/tb-context.h:27:
+/* Page tracking code uses ram addresses in system mode, and virtual
+   addresses in userspace mode.  Define tb_page_addr_t to be an appropriate

WARNING: Block comments use a trailing */ on a separate line
#233: FILE: include/exec/tb-context.h:28:
+   type.  */

total: 0 errors, 4 warnings, 271 lines checked

Patch 1/13 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
2/13 Checking commit c7359e966d69 (accel: collecting TB execution count)
3/13 Checking commit b3bf3ff3a8e4 (accel: collecting JIT statistics)
4/13 Checking commit 947c81f1a9c9 (accel: replacing part of CONFIG_PROFILER with TBStats)
WARNING: line over 80 characters
#105: FILE: accel/tcg/tb-stats.c:85:
+            qemu_printf("JIT cycles          %" PRId64 " (%0.3f s at 2.4 GHz)\n",

WARNING: line over 80 characters
#126: FILE: accel/tcg/tb-stats.c:106:
+                    (double)s->la_time / (s->code_time ? s->code_time : 1) * 100.0);

WARNING: line over 80 characters
#130: FILE: accel/tcg/tb-stats.c:110:
+                    s->restore_count ? (double)s->restore_time / s->restore_count : 0);

total: 0 errors, 3 warnings, 345 lines checked

Patch 4/13 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
5/13 Checking commit c1bf3a89e351 (accel: adding TB_JIT_TIME and full replacing CONFIG_PROFILER)
WARNING: line over 80 characters
#112: FILE: accel/tcg/tb-stats.c:142:
+                (double)jpi->opt_time / (jpi->code_time ? jpi->code_time : 1) * 100.0);

WARNING: line over 80 characters
#114: FILE: accel/tcg/tb-stats.c:144:
+                (double)jpi->la_time / (jpi->code_time ? jpi->code_time : 1) * 100.0);

WARNING: line over 80 characters
#118: FILE: accel/tcg/tb-stats.c:148:
+                jpi->restore_count ? (double)jpi->restore_time / jpi->restore_count : 0);

WARNING: line over 80 characters
#149: FILE: accel/tcg/tb-stats.c:152:
+                s->cpu_exec_time, s->cpu_exec_time / (double) NANOSECONDS_PER_SECOND);

WARNING: line over 80 characters
#589: FILE: tcg/tcg.c:4133:
+        qemu_printf("%s %" PRId64 "\n", tcg_op_defs[i].name, prof.table_op_count[i]);

total: 0 errors, 5 warnings, 557 lines checked

Patch 5/13 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
6/13 Checking commit 1358a45e85db (debug: add -d tb_stats to control TBStatistics collection:)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#53: 
new file mode 100644

WARNING: line over 80 characters
#206: FILE: util/log.c:279:
+    { CPU_LOG_TB_STATS, "tb_stats[[,level=(+all+jit+exec+time)][,dump_limit=<number>]]",

WARNING: line over 80 characters
#218: FILE: util/log.c:305:
+            set_default_tbstats_flag(TB_JIT_STATS | TB_EXEC_STATS | TB_JIT_TIME);

WARNING: line over 80 characters
#228: FILE: util/log.c:315:
+            for (level_tmp = level_parts; level_tmp && *level_tmp; level_tmp++) {

total: 0 errors, 4 warnings, 176 lines checked

Patch 6/13 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
7/13 Checking commit 9fe7b93cf6d7 (monitor: adding tb_stats hmp command)
WARNING: line over 80 characters
#103: FILE: accel/tcg/tb-stats.c:201:
+        /* Continue to create TBStatistic structures but stop collecting statistics */

WARNING: line over 80 characters
#126: FILE: accel/tcg/tb-stats.c:224:
+            qemu_printf("no search on record! execute info tbs before filtering!");

WARNING: line over 80 characters
#132: FILE: accel/tcg/tb-stats.c:230:
+        /* Set all tbstats as paused, then return only the ones from last_search */

WARNING: Block comments use a leading /* on a separate line
#223: FILE: include/exec/tb-stats.h:35:
+enum SortBy { SORT_BY_HOTNESS, SORT_BY_HG /* Host/Guest */, SORT_BY_SPILLS };

total: 0 errors, 4 warnings, 259 lines checked

Patch 7/13 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
8/13 Checking commit cdf6f7219143 (tb-stats: reset the tracked TBs on a tb_flush)
9/13 Checking commit acc3b8494d38 (Adding info [tb-list|tb] commands to HMP (WIP))
WARNING: line over 80 characters
#144: FILE: accel/tcg/tb-stats.c:325:
+        qemu_log("\t| exec:%lu/%lu guest inst cov:%.2f%%\n", tbs->executions.normal,

WARNING: line over 80 characters
#188: FILE: accel/tcg/tb-stats.c:369:
+            (float) stat_per_translation(tbs1, code.out_len) / tbs1->code.num_guest_inst;

WARNING: line over 80 characters
#190: FILE: accel/tcg/tb-stats.c:371:
+            (float) stat_per_translation(tbs2, code.out_len) / tbs2->code.num_guest_inst;

WARNING: line over 80 characters
#223: FILE: accel/tcg/tb-stats.c:404:
+            (tbs->executions.atomic + tbs->executions.normal) * tbs->code.num_guest_inst;

WARNING: line over 80 characters
#229: FILE: accel/tcg/tb-stats.c:410:
+            (tbs->executions.atomic + tbs->executions.normal) * tbs->code.num_guest_inst;

WARNING: line over 80 characters
#230: FILE: accel/tcg/tb-stats.c:411:
+        tbs->executions.coverage = (10000 * tb_total_execs) / (total_exec_count + 1);

WARNING: line over 80 characters
#247: FILE: accel/tcg/tb-stats.c:428:
+    last_search = g_list_sort_with_data(last_search, inverse_sort_tbs, &sort_by);

total: 0 errors, 7 warnings, 632 lines checked

Patch 9/13 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
10/13 Checking commit f430d85d6d0a (tb-stats: dump hot TBs at the end of the execution)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#90: 
new file mode 100644

WARNING: Block comments use a leading /* on a separate line
#112: FILE: include/exec/tb-stats-dump.h:18:
+static inline void tb_stats_dump(void) { /* do nothing */ };

total: 0 errors, 2 warnings, 123 lines checked

Patch 10/13 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
11/13 Checking commit 3ce3a87a1f50 (accel/tcg: adding integration with linux perf)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#38: 
new file mode 100644

WARNING: line over 80 characters
#51: FILE: accel/tcg/perf/jitdump.c:2:
+ * This code implements an interface to create and fill jitdump files. These files

WARNING: line over 80 characters
#161: FILE: accel/tcg/perf/jitdump.c:112:
+    g_autofree gchar *dumpfile_name = g_strdup_printf("./jit-%d.dump", getpid());

WARNING: Block comments use a leading /* on a separate line
#164: FILE: accel/tcg/perf/jitdump.c:115:
+    /* 'Perf record' saves mmaped files during the execution of a program and

WARNING: line over 80 characters
#175: FILE: accel/tcg/perf/jitdump.c:126:
+        printf("Failed to create mmap marker file for perf %d\n", fileno(dumpfile));

WARNING: architecture specific defines should be avoided
#294: FILE: accel/tcg/translate-all.c:61:
+#ifdef __linux__

WARNING: architecture specific defines should be avoided
#305: FILE: accel/tcg/translate-all.c:1174:
+#ifdef __linux__

WARNING: architecture specific defines should be avoided
#317: FILE: accel/tcg/translate-all.c:1990:
+#ifdef __linux__

WARNING: line over 80 characters
#381: FILE: linux-user/main.c:458:
+     "",           "dump jitdump files to help linux perf JIT code visualization"},

ERROR: space prohibited between function name and open parenthesis '('
#393: FILE: os-posix.c:194:
+#if defined(CONFIG_TCG) && defined (CLOCK_MONOTONIC)

total: 1 errors, 9 warnings, 342 lines checked

Patch 11/13 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

12/13 Checking commit f20376c59d71 (tb-stats: adding TBStatistics info into perf dump)
13/13 Checking commit 2344fa699da4 (configure: remove the final bits of --profiler support)
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20191007152839.30804-1-alex.bennee@linaro.org/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH  v9 00/13] TCG code quality tracking and perf integration
  2019-10-07 15:28 [PATCH v9 00/13] TCG code quality tracking and perf integration Alex Bennée
                   ` (13 preceding siblings ...)
  2019-10-07 18:14 ` [PATCH v9 00/13] TCG code quality tracking and perf integration no-reply
@ 2019-10-07 18:47 ` no-reply
  14 siblings, 0 replies; 37+ messages in thread
From: no-reply @ 2019-10-07 18:47 UTC (permalink / raw)
  To: alex.bennee; +Cc: cota, alex.bennee, qemu-devel

Patchew URL: https://patchew.org/QEMU/20191007152839.30804-1-alex.bennee@linaro.org/



Hi,

This series failed the docker-mingw@fedora build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#! /bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-mingw@fedora J=14 NETWORK=1
=== TEST SCRIPT END ===

  CC      qga/vss-win32.o
  CC      qga/qapi-generated/qga-qapi-types.o
  CC      qga/qapi-generated/qga-qapi-visit.o
./qemu-monitor.texi:585: warning: @findex missing argument
  CC      qga/qapi-generated/qga-qapi-commands.o
./qemu-monitor.texi:585: warning: @findex missing argument
  AR      libqemuutil.a
  LINK    elf2dmp.exe
  CC      qemu-img.o
---
  LINK    qemu-img.exe
  CC      aarch64-softmmu/arch_init.o
  CC      aarch64-softmmu/cpus.o
/tmp/qemu-test/src/disas.c:425:31: error: 'struct _IO_FILE' declared inside parameter list will not be visible outside of this definition or declaration [-Werror]
 static int fprintf_log(struct _IO_FILE *a, const char *b, ...)
                               ^~~~~~~~
/tmp/qemu-test/src/disas.c: In function 'fprintf_log':
/tmp/qemu-test/src/disas.c:431:18: error: passing argument 1 of 'vfprintf' from incompatible pointer type [-Werror=incompatible-pointer-types]
         vfprintf(a, b, ap);
                  ^
In file included from /tmp/qemu-test/src/include/qemu/osdep.h:99,
---
               ~~~~~~^~~~~~~~
In file included from /tmp/qemu-test/src/disas.c:3:
/tmp/qemu-test/src/disas.c: In function 'target_disas':
/tmp/qemu-test/src/include/disas/dis-asm.h:478:23: error: assignment to 'fprintf_function' {aka 'int (*)(struct _iobuf *, const char *, ...)'} from incompatible pointer type 'int (*)(struct _IO_FILE *, const char *, ...)' [-Werror=incompatible-pointer-types]
   (INFO).fprintf_func = (FPRINTF_FUNC), \
                       ^
/tmp/qemu-test/src/include/disas/dis-asm.h:470:3: note: in expansion of macro 'INIT_DISASSEMBLE_INFO_NO_ARCH'
---
/tmp/qemu-test/src/disas.c:450:5: note: in expansion of macro 'INIT_DISASSEMBLE_INFO'
     INIT_DISASSEMBLE_INFO(s.info, out, fprintf_log);
     ^~~~~~~~~~~~~~~~~~~~~
/tmp/qemu-test/src/disas.c:481:21: error: passing argument 1 of 'fprintf_log' from incompatible pointer type [-Werror=incompatible-pointer-types]
         fprintf_log(out, "0x" TARGET_FMT_lx ":  ", pc);
                     ^~~
/tmp/qemu-test/src/disas.c:425:41: note: expected 'struct _IO_FILE *' but argument is of type 'FILE *' {aka 'struct _iobuf *'}
 static int fprintf_log(struct _IO_FILE *a, const char *b, ...)
                        ~~~~~~~~~~~~~~~~~^
/tmp/qemu-test/src/disas.c:483:21: error: passing argument 1 of 'fprintf_log' from incompatible pointer type [-Werror=incompatible-pointer-types]
         fprintf_log(out, "\n");
                     ^~~
/tmp/qemu-test/src/disas.c:425:41: note: expected 'struct _IO_FILE *' but argument is of type 'FILE *' {aka 'struct _iobuf *'}
 static int fprintf_log(struct _IO_FILE *a, const char *b, ...)
                        ~~~~~~~~~~~~~~~~~^
cc1: all warnings being treated as errors
make[1]: *** [/tmp/qemu-test/src/rules.mak:69: disas.o] Error 1
make[1]: *** Waiting for unfinished jobs....
  CC      x86_64-softmmu/tcg/tcg-op-gvec.o
  CC      x86_64-softmmu/tcg/tcg-common.o
---
  CC      x86_64-softmmu/cpus.o
  CC      x86_64-softmmu/gdbstub.o
  CC      x86_64-softmmu/balloon.o
/tmp/qemu-test/src/disas.c:425:31: error: 'struct _IO_FILE' declared inside parameter list will not be visible outside of this definition or declaration [-Werror]
 static int fprintf_log(struct _IO_FILE *a, const char *b, ...)
                               ^~~~~~~~
/tmp/qemu-test/src/disas.c: In function 'fprintf_log':
/tmp/qemu-test/src/disas.c:431:18: error: passing argument 1 of 'vfprintf' from incompatible pointer type [-Werror=incompatible-pointer-types]
         vfprintf(a, b, ap);
                  ^
In file included from /tmp/qemu-test/src/include/qemu/osdep.h:99,
---
               ~~~~~~^~~~~~~~
In file included from /tmp/qemu-test/src/disas.c:3:
/tmp/qemu-test/src/disas.c: In function 'target_disas':
/tmp/qemu-test/src/include/disas/dis-asm.h:478:23: error: assignment to 'fprintf_function' {aka 'int (*)(struct _iobuf *, const char *, ...)'} from incompatible pointer type 'int (*)(struct _IO_FILE *, const char *, ...)' [-Werror=incompatible-pointer-types]
   (INFO).fprintf_func = (FPRINTF_FUNC), \
                       ^
/tmp/qemu-test/src/include/disas/dis-asm.h:470:3: note: in expansion of macro 'INIT_DISASSEMBLE_INFO_NO_ARCH'
---
/tmp/qemu-test/src/disas.c:450:5: note: in expansion of macro 'INIT_DISASSEMBLE_INFO'
     INIT_DISASSEMBLE_INFO(s.info, out, fprintf_log);
     ^~~~~~~~~~~~~~~~~~~~~
/tmp/qemu-test/src/disas.c:481:21: error: passing argument 1 of 'fprintf_log' from incompatible pointer type [-Werror=incompatible-pointer-types]
         fprintf_log(out, "0x" TARGET_FMT_lx ":  ", pc);
                     ^~~
/tmp/qemu-test/src/disas.c:425:41: note: expected 'struct _IO_FILE *' but argument is of type 'FILE *' {aka 'struct _iobuf *'}
 static int fprintf_log(struct _IO_FILE *a, const char *b, ...)
                        ~~~~~~~~~~~~~~~~~^
/tmp/qemu-test/src/disas.c:483:21: error: passing argument 1 of 'fprintf_log' from incompatible pointer type [-Werror=incompatible-pointer-types]
         fprintf_log(out, "\n");
                     ^~~
/tmp/qemu-test/src/disas.c:425:41: note: expected 'struct _IO_FILE *' but argument is of type 'FILE *' {aka 'struct _iobuf *'}
 static int fprintf_log(struct _IO_FILE *a, const char *b, ...)
                        ~~~~~~~~~~~~~~~~~^
cc1: all warnings being treated as errors
make[1]: *** [/tmp/qemu-test/src/rules.mak:69: disas.o] Error 1
make[1]: *** Waiting for unfinished jobs....
make: *** [Makefile:482: aarch64-softmmu/all] Error 2
make: *** Waiting for unfinished jobs....
make: *** [Makefile:482: x86_64-softmmu/all] Error 2
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 662, in <module>
    sys.exit(main())
---
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=659c6edd1ef74ee3b79cf799f0a80d51', '-u', '1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-p5lijq1d/src/docker-src.2019-10-07-14.43.03.6150:/var/tmp/qemu:z,ro', 'qemu:fedora', '/var/tmp/qemu/run', 'test-mingw']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=659c6edd1ef74ee3b79cf799f0a80d51
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-p5lijq1d/src'
make: *** [docker-run-test-mingw@fedora] Error 2

real    4m42.940s
user    0m8.363s


The full log is available at
http://patchew.org/logs/20191007152839.30804-1-alex.bennee@linaro.org/testing.docker-mingw@fedora/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v9 01/13] accel/tcg: introduce TBStatistics structure
  2019-10-07 15:28 ` [PATCH v9 01/13] accel/tcg: introduce TBStatistics structure Alex Bennée
@ 2019-10-08 12:35   ` Richard Henderson
  2019-12-13 11:14     ` Alex Bennée
  0 siblings, 1 reply; 37+ messages in thread
From: Richard Henderson @ 2019-10-08 12:35 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: Paolo Bonzini, cota, Vanderson M. do Rosario, Richard Henderson

On 10/7/19 11:28 AM, Alex Bennée wrote:
> From: "Vanderson M. do Rosario" <vandersonmr2@gmail.com>
> 
> To store statistics for each TB, we created a TBStatistics structure
> which is linked with the TBs. TBStatistics can stay alive after
> tb_flush and be relinked to a regenerated TB. So the statistics can
> be accumulated even through flushes.
> 
> The goal is to have all present and future qemu/tcg statistics and
> meta-data stored in this new structure.
> 
> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
> Message-Id: <20190829173437.5926-2-vandersonmr2@gmail.com>
> [AJB: fix git author, review comments]
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> 
> ---
> AJB
>   - move tcg_collect_tb_stats inside tb-stats.c
>   - add spdx header
>   - drop tb from tbstats and associated functions
> ---

The only quibble I have is with

> +void init_tb_stats_htable_if_not(void);

If not what?

I can only imagine that this phrase is intended to finish "if not initialized".
 But I think it's clearer to just call this "init_tb_stats_htable".

> +void enable_collect_tb_stats(void)
> +{
> +    init_tb_stats_htable_if_not();

Why do we need to do this again, since we did this in tb_htable_init?


r~


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v9 02/13] accel: collecting TB execution count
  2019-10-07 15:28 ` [PATCH v9 02/13] accel: collecting TB execution count Alex Bennée
@ 2019-10-08 13:10   ` Richard Henderson
  0 siblings, 0 replies; 37+ messages in thread
From: Richard Henderson @ 2019-10-08 13:10 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: Paolo Bonzini, cota, Vanderson M. do Rosario, Richard Henderson

On 10/7/19 11:28 AM, Alex Bennée wrote:
> +void HELPER(inc_exec_freq)(void *ptr)
> +{
> +    TBStatistics *stats = (TBStatistics *) ptr;
> +    g_assert(stats);
> +    atomic_inc(&stats->executions.normal);
> +}

tcg_debug_assert.

> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index 114ebe48bf..b7dd1a78e5 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -1784,10 +1784,17 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>      /*
>       * We want to fetch the stats structure before we start code
>       * generation so we can count interesting things about this
> -     * generation.
> +     * generation. If dfilter is in effect we will only collect stats
> +     * for the specified range.
>       */
> -    if (tb_stats_collection_enabled()) {
> +    if (tb_stats_collection_enabled() &&
> +        qemu_log_in_addr_range(tb->pc)) {
> +        uint32_t flag = get_default_tbstats_flag();
>          tb->tb_stats = tb_get_stats(phys_pc, pc, cs_base, flags);
> +
> +        if (flag & TB_EXEC_STATS) {
> +            tb->tb_stats->stats_enabled |= TB_EXEC_STATS;
> +        }

Is this intended to be

  tb->tb_stats->stats_enabled =
      (tb->tb_stats->stats_enabled & ~TB_EXEC_STATS)
    | (flag & TB_EXEC_STATS);

so that the flag eventually gets cleared?  I also don't understand placing
stats_enabled within a structure that is shared between multiple TB.

I can only imagine that TB_EXEC_STATS should really be a bit in tb->cflags.  It
wouldn't need to be in CF_HASH_MASK, but that seems to be the logical place to
put it.

> diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
> index 70c66c538c..ec6bd829a0 100644
> --- a/accel/tcg/translator.c
> +++ b/accel/tcg/translator.c
> @@ -46,6 +46,7 @@ void translator_loop(const TranslatorOps *ops, DisasContextBase *db,
>  
>      ops->init_disas_context(db, cpu);
>      tcg_debug_assert(db->is_jmp == DISAS_NEXT);  /* no early exit */
> +    gen_tb_exec_count(tb);

Too early.  This should go after gen_tb_start().

>      /* Reset the temp count so that we can identify leaks */
>      tcg_clear_temp_count();
> diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h
> index 822c43cfd3..be006383b9 100644
> --- a/include/exec/gen-icount.h
> +++ b/include/exec/gen-icount.h
> @@ -32,6 +32,15 @@ static inline void gen_io_end(void)
>      tcg_temp_free_i32(tmp);
>  }
>  
> +static inline void gen_tb_exec_count(TranslationBlock *tb)
> +{
> +    if (tb_stats_enabled(tb, TB_EXEC_STATS)) {
> +        TCGv_ptr ptr = tcg_const_ptr(tb->tb_stats);
> +        gen_helper_inc_exec_freq(ptr);
> +        tcg_temp_free_ptr(ptr);
> +    }
> +}

I think this could go into translator.c, instead of gen-icount.h; it shouldn't
be used anywhere else.

> +#define TB_NOTHING    (1 << 0)

What's this?


r~


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v9 03/13] accel: collecting JIT statistics
  2019-10-07 15:28 ` [PATCH v9 03/13] accel: collecting JIT statistics Alex Bennée
@ 2019-10-08 13:38   ` Richard Henderson
  2019-12-13 11:51     ` Alex Bennée
  0 siblings, 1 reply; 37+ messages in thread
From: Richard Henderson @ 2019-10-08 13:38 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: Paolo Bonzini, cota, Vanderson M. do Rosario, Richard Henderson

On 10/7/19 11:28 AM, Alex Bennée wrote:
> @@ -1795,6 +1799,10 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>          if (flag & TB_EXEC_STATS) {
>              tb->tb_stats->stats_enabled |= TB_EXEC_STATS;
>          }
> +
> +        if (flag & TB_JIT_STATS) {
> +            tb->tb_stats->stats_enabled |= TB_JIT_STATS;
> +        }

So, assuming that you really meant this, and not replacing as I was guessing vs
patch 2, then this is

    tb->tb_stats->stats_enabled |=
        flag & (TB_EXEC_STATS | TB_JIT_STATS);

But I still think it's weird to be wanting to modify the shared structure.
What does that mean for concurrent code generation?

> +    /*
> +     * Collect JIT stats when enabled. We batch them all up here to
> +     * avoid spamming the cache with atomic accesses
> +     */
> +    if (tb_stats_enabled(tb, TB_JIT_STATS)) {
> +        TBStatistics *ts = tb->tb_stats;
> +        qemu_mutex_lock(&ts->jit_stats_lock);
> +
> +        ts->code.num_guest_inst += prof->translation.nb_guest_insns;
> +        ts->code.num_tcg_ops += prof->translation.nb_ops_pre_opt;
> +        ts->code.num_tcg_ops_opt += tcg_ctx->nb_ops;
> +        ts->code.spills += prof->translation.nb_spills;
> +        ts->code.out_len += tb->tc.size;
> +
> +        ts->translations.total++;
> +        if (phys_page2 != -1) {
> +            ts->translations.spanning++;
> +        }
> +
> +        g_ptr_array_add(ts->tbs, tb);
> +
> +        qemu_mutex_unlock(&ts->jit_stats_lock);
> +    }

Hmm.  So we're to interpret all of code.field as sums, the average of which can
be obtained by dividing by translations.total.  Ok, but that should probably be
mentioned in a comment in/near the structure definition.

What are we planning to do with the set of all tb's collected here?

> @@ -3125,6 +3126,7 @@ static void temp_sync
>          case TEMP_VAL_REG:
>              tcg_out_st(s, ts->type, ts->reg,
>                         ts->mem_base->reg, ts->mem_offset);
> +            s->prof.translation.nb_spills++;
>              break;
>  
>          case TEMP_VAL_MEM:

This is not a spill in the common compiler definition.

This is "write the temp to its backing storage".  While this does happen in the
course of spilling, the vast majority of these are because we've finished
modifying a global temp and must now update memory.  Which is not nearly the
same thing as "spill".

A spill in the compiler definition happens in tcg_reg_alloc, right after the
comment "We must spill something".  ;-)


r~


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v9 04/13] accel: replacing part of CONFIG_PROFILER with TBStats
  2019-10-07 15:28 ` [PATCH v9 04/13] accel: replacing part of CONFIG_PROFILER with TBStats Alex Bennée
@ 2019-10-08 13:58   ` Richard Henderson
  0 siblings, 0 replies; 37+ messages in thread
From: Richard Henderson @ 2019-10-08 13:58 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: Paolo Bonzini, cota, Vanderson M. do Rosario, Richard Henderson

On 10/7/19 11:28 AM, Alex Bennée wrote:
> +struct jit_profile_info {
> +    uint64_t translations;
> +    uint64_t aborted;
> +    uint64_t ops;
> +    unsigned ops_max;
> +    uint64_t del_ops;
> +    uint64_t temps;
> +    unsigned temps_max;
> +    uint64_t host;
> +    uint64_t guest;
> +    uint64_t search_data;
> +};

Hmm.  Maybe better to define a single structure that you can share between the
collection here and the storage in TCGProfile?

Also, "host" and "guest" are not helpful names...

> +    jpi->host += tbs->code.out_len;
> +    jpi->guest += tbs->code.in_len;

... code_int_len and code_out_len are helpful names.

> @@ -1807,6 +1805,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>          tb->tb_stats = NULL;
>      }
>  
> +
>      tcg_func_start(tcg_ctx);
>  
>      tcg_ctx->cpu = env_cpu(env);

Watch the random whitespace.

> +#define stat_per_translation(stat, name) \
> +    (stat->translations.total ? stat->name / stat->translations.total : 0)

Does this really need to go in the header?


> +static void collect_jit_profile_info(void *p, uint32_t hash, void *userp)
> +{
> +    struct jit_profile_info *jpi = userp;
> +    TBStatistics *tbs = p;
> +
> +    jpi->translations += tbs->translations.total;
> +    jpi->ops += tbs->code.num_tcg_ops;
> +    if (stat_per_translation(tbs, code.num_tcg_ops) > jpi->ops_max) {
> +        jpi->ops_max = stat_per_translation(tbs, code.num_tcg_ops);
> +    }
> +    jpi->del_ops += tbs->code.deleted_ops;
> +    jpi->temps += tbs->code.temps;
> +    if (stat_per_translation(tbs, code.temps) > jpi->temps_max) {
> +        jpi->temps_max = stat_per_translation(tbs, code.temps);
> +    }
> +    jpi->host += tbs->code.out_len;
> +    jpi->guest += tbs->code.in_len;
> +    jpi->search_data += tbs->code.search_out_len;
> +}

E.g.

    unsigned long total = tbs->translations.total;

    if (total) {
        unsigned tmp;

        tmp = tbs->code.num_tcg_ops / total;
        jpi->ops_max = MAX(jpi->ops_max, tmp);

        tmp = tbs->code.temps / total;
        jpi->temps_max = MAX(jpi->temps_max, tmp);
    }

It does occur to me to wonder why code.num_guest_inst etc are "unsigned" while
translations.total are "unsigned long", given that they are both accumulations
over many TBs.


r~


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v9 05/13] accel: adding TB_JIT_TIME and full replacing CONFIG_PROFILER
  2019-10-07 15:28 ` [PATCH v9 05/13] accel: adding TB_JIT_TIME and full replacing CONFIG_PROFILER Alex Bennée
@ 2019-10-08 15:25   ` Richard Henderson
  2019-12-13 21:49     ` Alex Bennée
  0 siblings, 1 reply; 37+ messages in thread
From: Richard Henderson @ 2019-10-08 15:25 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: Vanderson M. do Rosario, Dr. David Alan Gilbert,
	Markus Armbruster, cota, Paolo Bonzini, Richard Henderson

On 10/7/19 11:28 AM, Alex Bennée wrote:
> From: "Vanderson M. do Rosario" <vandersonmr2@gmail.com>
> 
> Replace all others CONFIG_PROFILER statistics and migrate it to
> TBStatistics system. However, TCGProfiler still exists and can
> be use to store global statistics and times. All TB related
> statistics goes to TBStatistics.
> 
> Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
> Message-Id: <20190829173437.5926-6-vandersonmr2@gmail.com>
> [AJB: fix authorship]
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  accel/tcg/tb-stats.c      |  98 +++++++++++++++++++++---------
>  accel/tcg/translate-all.c |  46 ++++++++-------
>  configure                 |   3 -
>  cpus.c                    |  14 ++---
>  include/exec/tb-stats.h   |  15 +++++
>  include/qemu/timer.h      |   5 +-
>  monitor/misc.c            |  28 ++-------
>  tcg/tcg.c                 | 121 ++++++++++++--------------------------
>  tcg/tcg.h                 |   2 +-
>  vl.c                      |   8 +--
>  10 files changed, 159 insertions(+), 181 deletions(-)
> 
> diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
> index 0e64c176b3..f431159fd2 100644
> --- a/accel/tcg/tb-stats.c
> +++ b/accel/tcg/tb-stats.c
> @@ -17,11 +17,18 @@
>  #include "exec/tb-stats.h"
>  
>  /* TBStatistic collection controls */
> -enum TBStatsStatus { TB_STATS_RUNNING, TB_STATS_PAUSED, TB_STATS_STOPPED };
> +enum TBStatsStatus {
> +    TB_STATS_DISABLED = 0,
> +    TB_STATS_RUNNING,
> +    TB_STATS_PAUSED,
> +    TB_STATS_STOPPED
> +};

This should be in patch 1, I should think.

> +uint64_t dev_time;

This patch is doing several things at once, and I think it should be split.
All of the changes to dev_time, for instance, are unrelated to TBStatistic.

> +    jpi->interm_time += stat_per_translation(tbs, time.interm);
> +    jpi->code_time += stat_per_translation(tbs, time.code);
> +    jpi->opt_time += stat_per_translation(tbs, time.opt);
> +    jpi->la_time += stat_per_translation(tbs, time.la);
> +    jpi->restore_time += tbs->time.restore;
> +    jpi->restore_count += tbs->time.restore_count;

Why are some of these "per translation" and some not?

> @@ -370,11 +371,11 @@ static int cpu_restore_state_from_tb
>      }
>      restore_state_to_opc(env, tb, data);
>  
> -#ifdef CONFIG_PROFILER
> -    atomic_set(&prof->restore_time,
> -                prof->restore_time + profile_getclock() - ti);
> -    atomic_set(&prof->restore_count, prof->restore_count + 1);
> -#endif
> +    if (tb_stats_enabled(tb, TB_JIT_TIME)) {
> +        atomic_add(&tb->tb_stats->time.restore, profile_getclock() - ti);
> +        atomic_inc(&tb->tb_stats->time.restore_count);
> +    }

Would it be better to use a the TBStatistics lock than two atomic updates?

> @@ -1826,10 +1828,10 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>          tcg_ctx->tb_jmp_target_addr = tb->jmp_target_arg;
>      }
>  
> -#ifdef CONFIG_PROFILER
> -    atomic_set(&prof->interm_time, prof->interm_time + profile_getclock() - ti);
> -    ti = profile_getclock();
> -#endif
> +    if (tb_stats_enabled(tb, TB_JIT_TIME)) {
> +        atomic_add(&tb->tb_stats->time.interm, profile_getclock() - ti);
> +        ti = profile_getclock();
> +    }

You should call profile_getclock() only once here.

Why does this need an atomic_add, while the rest of TB_JIT_STATS within
tb_gen_code do not, and in fact have a comment:

> +    /*
> +     * Collect JIT stats when enabled. We batch them all up here to
> +     * avoid spamming the cache with atomic accesses
> +     */

> @@ -1871,9 +1873,9 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>      }
>      tb->tc.size = gen_code_size;
>  
> -#ifdef CONFIG_PROFILER
> -    atomic_set(&prof->code_time, prof->code_time + profile_getclock() - ti);
> -#endif
> +    if (tb_stats_enabled(tb, TB_JIT_TIME)) {
> +        atomic_add(&tb->tb_stats->time.code, profile_getclock() - ti);
> +    }

Likewise.

> diff --git a/configure b/configure
> index 8f8446f52b..eedeb9016e 100755
> --- a/configure
> +++ b/configure
> @@ -6566,9 +6566,6 @@ fi
>  if test "$static" = "yes" ; then
>    echo "CONFIG_STATIC=y" >> $config_host_mak
>  fi
> -if test "$profiler" = "yes" ; then
> -  echo "CONFIG_PROFILER=y" >> $config_host_mak
> -fi

Removing the define without removing --enable-profiler.

>  static int tcg_cpu_exec(CPUState *cpu)
>  {
>      int ret;
> -#ifdef CONFIG_PROFILER
> -    int64_t ti;
> -#endif
> +    uint64_t ti;
>  
>      assert(tcg_enabled());
> -#ifdef CONFIG_PROFILER
>      ti = profile_getclock();
> -#endif
> +
>      cpu_exec_start(cpu);
>      ret = cpu_exec(cpu);
>      cpu_exec_end(cpu);
> -#ifdef CONFIG_PROFILER
> -    atomic_set(&tcg_ctx->prof.cpu_exec_time,
> -               tcg_ctx->prof.cpu_exec_time + profile_getclock() - ti);
> -#endif
> +
> +    atomic_add(&tcg_ctx->prof.cpu_exec_time, profile_getclock() - ti);

This is also unrelated to TBStatistics.

It's also adding an unconditional atomic_add, of which I am not fond.  Should
this be testing tb_stats_collection_enabled() or something else?

I'll also note that tcg_ctx is per-thread (for mttcg), and so perhaps this does
not require an atomic_add anyway.  Perhaps just an atomic_set to be paired with
atomic_read in the dump function that's iterating over the tcg_ctx.

Even without the atomic_add, we shouldn't make the syscall for getclock in the
normal case.


> +    /* These are accessed with atomic operations */
> +    struct {
> +        int64_t restore;
> +        uint64_t restore_count;
> +        int64_t interm;
> +        int64_t code;
> +        int64_t opt;
> +        int64_t la;
> +    } time;

Why are these signed?  We're always adding positive values.

Why is restore_count here in "time"?  Indeed, why all of these unnamed
sub-structures at all?  I don't see that "." helps organization any more than "_".

> @@ -4020,18 +3959,18 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
>      }
>  #endif
>  
> -#ifdef CONFIG_PROFILER
> -    atomic_set(&prof->opt_time, prof->opt_time - profile_getclock());
> -#endif
> +    if (tb_stats_enabled(tb, TB_JIT_TIME)) {
> +        atomic_add(&tb->tb_stats->time.opt, -profile_getclock());
> +    }

Atomic add of a negative clock value?  That just means the intermediate value
is unusable, so why make any of this atomic?

Also, this is tb_gen_code again, wherein we already talked about not using
atomic_foo.

> @@ -4081,14 +4020,17 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
>      s->pool_labels = NULL;
>  #endif
>  
> +    if (!tb_stats_collection_enabled()) {
> +        QTAILQ_FOREACH(op, &s->ops, link) {
> +            TCGOpcode opc = op->opc;
> +            atomic_add(&s->prof.table_op_count[opc], 1);
> +        }
> +    }

Why would you collect stats when stats collection is disabled?  Is this a
simple typo?


r~


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v9 06/13] debug: add -d tb_stats to control TBStatistics collection:
  2019-10-07 15:28 ` [PATCH v9 06/13] debug: add -d tb_stats to control TBStatistics collection: Alex Bennée
@ 2019-10-08 15:34   ` Richard Henderson
  2019-10-08 15:49     ` Alex Bennée
  0 siblings, 1 reply; 37+ messages in thread
From: Richard Henderson @ 2019-10-08 15:34 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: Paolo Bonzini, cota, Vanderson M. do Rosario, Richard Henderson

On 10/7/19 11:28 AM, Alex Bennée wrote:
> From: "Vanderson M. do Rosario" <vandersonmr2@gmail.com>
> 
>  -d tb_stats[[,level=(+all+jit+exec+time)][,dump_limit=<number>]]
> 
> "dump_limit" is used to limit the number of dumped TBStats in
> linux-user mode.
> 
> [all+jit+exec+time] control the profilling level used
> by the TBStats. Can be used as follow:
> 
> -d tb_stats
> -d tb_stats,level=jit+time
> -d tb_stats,dump_limit=15
> ...
> 
> Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
> Message-Id: <20190829173437.5926-7-vandersonmr2@gmail.com>
> [AJB: fix authorship, reword title]
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> 
> ---
> AJB:
>   - reword title
>   - add stubs for enabling
>   - move things across to tb-stats-flags.h
> ---
>  accel/tcg/tb-stats.c          |  5 +++++
>  include/exec/gen-icount.h     |  1 +
>  include/exec/tb-stats-flags.h | 29 +++++++++++++++++++++++++++++
>  include/exec/tb-stats.h       | 16 +++-------------
>  include/qemu/log.h            |  1 +
>  stubs/Makefile.objs           |  1 +
>  stubs/tb-stats.c              | 27 +++++++++++++++++++++++++++
>  util/log.c                    | 35 +++++++++++++++++++++++++++++++++++
>  8 files changed, 102 insertions(+), 13 deletions(-)
>  create mode 100644 include/exec/tb-stats-flags.h
>  create mode 100644 stubs/tb-stats.c
> 
> diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
> index f431159fd2..1c66e03979 100644
> --- a/accel/tcg/tb-stats.c
> +++ b/accel/tcg/tb-stats.c
> @@ -193,3 +193,8 @@ uint32_t get_default_tbstats_flag(void)
>  {
>      return default_tbstats_flag;
>  }
> +
> +void set_default_tbstats_flag(uint32_t flags)
> +{
> +    default_tbstats_flag = flags;
> +}
> diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h
> index be006383b9..3987adfb0e 100644
> --- a/include/exec/gen-icount.h
> +++ b/include/exec/gen-icount.h
> @@ -2,6 +2,7 @@
>  #define GEN_ICOUNT_H
>  
>  #include "qemu/timer.h"
> +#include "tb-stats-flags.h"
>  
>  /* Helpers for instruction counting code generation.  */
>  
> diff --git a/include/exec/tb-stats-flags.h b/include/exec/tb-stats-flags.h
> new file mode 100644
> index 0000000000..8455073048
> --- /dev/null
> +++ b/include/exec/tb-stats-flags.h
> @@ -0,0 +1,29 @@
> +/*
> + * QEMU System Emulator, Code Quality Monitor System
> + *
> + * We define the flags and control bits here to avoid complications of
> + * including TCG/CPU information in common code.
> + *
> + * Copyright (c) 2019 Vanderson M. do Rosario <vandersonmr2@gmail.com>
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +#ifndef TB_STATS_FLAGS
> +#define TB_STATS_FLAGS
> +
> +#define TB_NOTHING    (1 << 0)

Repeating my question about TB_NOTHING -- what is it?

> +#define TB_EXEC_STATS (1 << 1)
> +#define TB_JIT_STATS  (1 << 2)
> +#define TB_JIT_TIME   (1 << 3)
> +
> +/* TBStatistic collection controls */
> +void enable_collect_tb_stats(void);
> +void disable_collect_tb_stats(void);
> +void pause_collect_tb_stats(void);
> +bool tb_stats_collection_enabled(void);
> +bool tb_stats_collection_paused(void);
> +
> +uint32_t get_default_tbstats_flag(void);
> +void set_default_tbstats_flag(uint32_t);

Is a get/set really better than an exported variable?

Should we have created this header in the first place,
rather than moving stuff here in patch 6?


> +        } else if (g_str_has_prefix(*tmp, "tb_stats")) {
> +            mask |= CPU_LOG_TB_STATS;
> +            set_default_tbstats_flag(TB_JIT_STATS | TB_EXEC_STATS | TB_JIT_TIME);

Surely TB_ALL_STATS?

> +                } else if (g_str_equal(*level_tmp, "all")) {
> +                    flags |= TB_JIT_STATS | TB_EXEC_STATS | TB_JIT_TIME;

Likewise.


r~


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v9 07/13] monitor: adding tb_stats hmp command
  2019-10-07 15:28 ` [PATCH v9 07/13] monitor: adding tb_stats hmp command Alex Bennée
@ 2019-10-08 15:48   ` Richard Henderson
  0 siblings, 0 replies; 37+ messages in thread
From: Richard Henderson @ 2019-10-08 15:48 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: Vanderson M. do Rosario, Dr . David Alan Gilbert,
	Markus Armbruster, cota, Paolo Bonzini, Richard Henderson

On 10/7/19 11:28 AM, Alex Bennée wrote:
> From: "Vanderson M. do Rosario" <vandersonmr2@gmail.com>
> 
> Adding tb_stats [start|pause|stop|filter] command to hmp.
> This allows controlling the collection of statistics.
> It is also possible to set the level of collection:
> all, jit, or exec.
> 
> tb_stats filter allow to only collect statistics for the TB
> in the last_search list.
> 
> The goal of this command is to allow the dynamic exploration
> of the TCG behavior and quality. Therefore, for now, a
> corresponding QMP command is not worthwhile.
> 
> Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
> Message-Id: <20190829173437.5926-8-vandersonmr2@gmail.com>
> [AJB: fix authorship]
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> 
> monitor: add tb_stats [start|pause|stop|filter] command to hmp.
> 
> This allows controlling the collection of statistics.
> It is also possible to set the level of collection:
> all, jit, or exec.
> 
> tb_stats filter allow to only collect statistics for the TB
> in the last_search list.
> 
> The goal of this command is to allow the dynamic exploration
> of the TCG behavior and quality. Therefore, for now, a
> corresponding QMP command is not worthwhile.
> 
> Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
> Message-Id: <20190829173437.5926-9-vandersonmr2@gmail.com>
> [AJB: fix authorship]
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---

Duplicated commit message contents.

> +void do_hmp_tbstats_safe(CPUState *cpu, run_on_cpu_data icmd)

Deserves a comment "Run via async_safe_run_on_cpu", as otherwise the "safe"
part of the function name is fairly obscure.

> +    char *slevel = (char *) qdict_get_try_str(qdict, "level");
> +    uint32_t level = TB_EXEC_STATS | TB_JIT_STATS | TB_JIT_TIME;
> +    if (slevel) {
> +        if (strcmp(slevel, "jit") == 0) {
> +            level = TB_JIT_STATS;
> +        } else if (strcmp(slevel, "exec") == 0) {
> +            level = TB_EXEC_STATS;
> +        } else if (strcmp(slevel, "time") == 0) {
> +            level = TB_JIT_TIME;
> +        }
> +    }

No support for "+" or "all" as we do for the command-line?


> +static void reset_tbstats_flag(void *p, uint32_t hash, void *userp)
> +{
> +    uint32_t flag = *((int *)userp);

Type cast doesn't match...

> +    TBStatistics *tbs = p;
> +    tbs->stats_enabled = flag;
> +}
> +
> +void set_tbstats_flags(uint32_t flag)
> +{
> +    /* iterate over tbstats setting their flag as TB_NOTHING */
> +    qht_iter(&tb_ctx.tb_stats, reset_tbstats_flag, &flag);
> +}

... here.

It does make me wonder if you'd do just as well with (void *)(uintptr_t)flag
and avoid the indirection.  It is an opaque quantity shared only between these
two functions.


>  static void hmp_info_profile(Monitor *mon, const QDict *qdict)
>  {
> +#ifdef CONFIG_TCG
>      dump_jit_exec_time_info(dev_time);
>      dev_time = 0;
> +#else
> +    error_report("TCG should be enabled!");
> +#endif
>  }
>  
>  /* Capture support */
> diff --git a/vl.c b/vl.c
> index f91bc6fb4c..781fddaf18 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -1796,11 +1796,17 @@ static bool main_loop_should_exit(void)
>  
>  static void main_loop(void)
>  {
> +#ifdef CONFIG_TCG
>      uint64_t ti;
> +#endif
>      while (!main_loop_should_exit()) {
> +#ifdef CONFIG_TCG
>          ti = profile_getclock();
> +#endif
>          main_loop_wait(false);
> +#ifdef CONFIG_TCG
>          dev_time += profile_getclock() - ti;
> +#endif
>      }
>  }

All of this was CONFIG_PROFILE just a few patches ago.  If you need this, then
you broke bisection with !defined(CONFIG_TCG) a few patches ago.


r~


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v9 06/13] debug: add -d tb_stats to control TBStatistics collection:
  2019-10-08 15:34   ` Richard Henderson
@ 2019-10-08 15:49     ` Alex Bennée
  0 siblings, 0 replies; 37+ messages in thread
From: Alex Bennée @ 2019-10-08 15:49 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Paolo Bonzini, cota, Vanderson M. do Rosario, qemu-devel,
	Richard Henderson


Richard Henderson <richard.henderson@linaro.org> writes:

> On 10/7/19 11:28 AM, Alex Bennée wrote:
>> From: "Vanderson M. do Rosario" <vandersonmr2@gmail.com>
>>
>>  -d tb_stats[[,level=(+all+jit+exec+time)][,dump_limit=<number>]]
>>
>> "dump_limit" is used to limit the number of dumped TBStats in
>> linux-user mode.
>>
>> [all+jit+exec+time] control the profilling level used
>> by the TBStats. Can be used as follow:
>>
>> -d tb_stats
>> -d tb_stats,level=jit+time
>> -d tb_stats,dump_limit=15
>> ...
>>
>> Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
>> Message-Id: <20190829173437.5926-7-vandersonmr2@gmail.com>
>> [AJB: fix authorship, reword title]
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>>
>> ---
>> AJB:
>>   - reword title
>>   - add stubs for enabling
>>   - move things across to tb-stats-flags.h
>> ---
>>  accel/tcg/tb-stats.c          |  5 +++++
>>  include/exec/gen-icount.h     |  1 +
>>  include/exec/tb-stats-flags.h | 29 +++++++++++++++++++++++++++++
>>  include/exec/tb-stats.h       | 16 +++-------------
>>  include/qemu/log.h            |  1 +
>>  stubs/Makefile.objs           |  1 +
>>  stubs/tb-stats.c              | 27 +++++++++++++++++++++++++++
>>  util/log.c                    | 35 +++++++++++++++++++++++++++++++++++
>>  8 files changed, 102 insertions(+), 13 deletions(-)
>>  create mode 100644 include/exec/tb-stats-flags.h
>>  create mode 100644 stubs/tb-stats.c
>>
>> diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
>> index f431159fd2..1c66e03979 100644
>> --- a/accel/tcg/tb-stats.c
>> +++ b/accel/tcg/tb-stats.c
>> @@ -193,3 +193,8 @@ uint32_t get_default_tbstats_flag(void)
>>  {
>>      return default_tbstats_flag;
>>  }
>> +
>> +void set_default_tbstats_flag(uint32_t flags)
>> +{
>> +    default_tbstats_flag = flags;
>> +}
>> diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h
>> index be006383b9..3987adfb0e 100644
>> --- a/include/exec/gen-icount.h
>> +++ b/include/exec/gen-icount.h
>> @@ -2,6 +2,7 @@
>>  #define GEN_ICOUNT_H
>>
>>  #include "qemu/timer.h"
>> +#include "tb-stats-flags.h"
>>
>>  /* Helpers for instruction counting code generation.  */
>>
>> diff --git a/include/exec/tb-stats-flags.h b/include/exec/tb-stats-flags.h
>> new file mode 100644
>> index 0000000000..8455073048
>> --- /dev/null
>> +++ b/include/exec/tb-stats-flags.h
>> @@ -0,0 +1,29 @@
>> +/*
>> + * QEMU System Emulator, Code Quality Monitor System
>> + *
>> + * We define the flags and control bits here to avoid complications of
>> + * including TCG/CPU information in common code.
>> + *
>> + * Copyright (c) 2019 Vanderson M. do Rosario <vandersonmr2@gmail.com>
>> + *
>> + * SPDX-License-Identifier: GPL-2.0-or-later
>> + */
>> +#ifndef TB_STATS_FLAGS
>> +#define TB_STATS_FLAGS
>> +
>> +#define TB_NOTHING    (1 << 0)
>
> Repeating my question about TB_NOTHING -- what is it?
>
>> +#define TB_EXEC_STATS (1 << 1)
>> +#define TB_JIT_STATS  (1 << 2)
>> +#define TB_JIT_TIME   (1 << 3)
>> +
>> +/* TBStatistic collection controls */
>> +void enable_collect_tb_stats(void);
>> +void disable_collect_tb_stats(void);
>> +void pause_collect_tb_stats(void);
>> +bool tb_stats_collection_enabled(void);
>> +bool tb_stats_collection_paused(void);
>> +
>> +uint32_t get_default_tbstats_flag(void);
>> +void set_default_tbstats_flag(uint32_t);
>
> Is a get/set really better than an exported variable?

It makes things easier for log.c which is used for multiple binaries
although I never actually used empty inlines instead having stubs. I'll
have to check if the tools define CONFIG_TCG anyway.

>
> Should we have created this header in the first place,
> rather than moving stuff here in patch 6?

Yes. I'll move it.

>
> Surely TB_ALL_STATS?
>
>> +                } else if (g_str_equal(*level_tmp, "all")) {
>> +                    flags |= TB_JIT_STATS | TB_EXEC_STATS | TB_JIT_TIME;
>
> Likewise.
>
>
> r~

Thanks,

--
Alex Bennée


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v9 08/13] tb-stats: reset the tracked TBs on a tb_flush
  2019-10-07 15:28 ` [PATCH v9 08/13] tb-stats: reset the tracked TBs on a tb_flush Alex Bennée
@ 2019-10-08 18:00   ` Richard Henderson
  2019-10-08 19:18     ` Alex Bennée
  0 siblings, 1 reply; 37+ messages in thread
From: Richard Henderson @ 2019-10-08 18:00 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel; +Cc: Paolo Bonzini, cota, Richard Henderson

On 10/7/19 11:28 AM, Alex Bennée wrote:
> We keep track of translations but can only do so up until the
> translation cache is flushed. At that point we really have no idea if
> we can re-create a translation because all the active tracking
> information has been reset.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  accel/tcg/tb-stats.c      | 19 +++++++++++++++++++
>  accel/tcg/translate-all.c |  2 +-
>  include/exec/tb-stats.h   |  8 ++++++++
>  3 files changed, 28 insertions(+), 1 deletion(-)

I still don't understand what the tbs array is for,
but resetting it at flush is fine.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v9 09/13] Adding info [tb-list|tb] commands to HMP (WIP)
  2019-10-07 15:28 ` [PATCH v9 09/13] Adding info [tb-list|tb] commands to HMP (WIP) Alex Bennée
@ 2019-10-08 18:50   ` Richard Henderson
  2019-10-08 19:36     ` Alex Bennée
  2019-10-09  9:44   ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 37+ messages in thread
From: Richard Henderson @ 2019-10-08 18:50 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: Vanderson M. do Rosario, Dr . David Alan Gilbert,
	Markus Armbruster, cota, Paolo Bonzini, Richard Henderson

On 10/7/19 11:28 AM, Alex Bennée wrote:
> From: "Vanderson M. do Rosario" <vandersonmr2@gmail.com>
> 
> These commands allow the exploration of TBs generated by the TCG.
> Understand which one hotter, with more guest/host instructions... and
> examine their guest, host and IR code.
> 
> The goal of this command is to allow the dynamic exploration of TCG
> behavior and code quality. Therefore, for now, a corresponding QMP
> command is not worthwhile.
> 
> [AJB: WIP - we still can't be safely sure a translation will succeed]
> 
> Example of output:
> 
> TB id:1 | phys:0x34d54 virt:0x0000000000034d54 flags:0x0000f0
> 	| exec:4828932/0 guest inst cov:16.38%
> 	| trans:1 ints: g:3 op:82 op_opt:34 spills:3
> 	| h/g (host bytes / guest insts): 90.666664
> 	| time to gen at 2.4GHz => code:3150.83(ns) IR:712.08(ns)
> 	| targets: 0x0000000000034d5e (id:11), 0x0000000000034d0d (id:2)
> 
> TB id:2 | phys:0x34d0d virt:0x0000000000034d0d flags:0x0000f0
> 	| exec:4825842/0 guest inst cov:21.82%
> 	| trans:1 ints: g:4 op:80 op_opt:38 spills:2
> 	| h/g (host bytes / guest insts): 84.000000
> 	| time to gen at 2.4GHz => code:3362.92(ns) IR:793.75(ns)
> 	| targets: 0x0000000000034d19 (id:12), 0x0000000000034d54 (id:1)
> 
> TB id:2 | phys:0x34d0d virt:0x0000000000034d0d flags:0x0000f0
> 	| exec:6956495/0  guest inst cov:21.82%
> 	| trans:2 ints: g:2 op:40 op_opt:19 spills:1
> 	| h/g (host bytes / guest insts): 84.000000
> 	| time to gen at 2.4GHz => code:3130.83(ns) IR:722.50(ns)
> 	| targets: 0x0000000000034d19 (id:12), 0x0000000000034d54 (id:1)
> 
> ----------------
> IN:
> 0x00034d0d:  89 de                    movl     %ebx, %esi
> 0x00034d0f:  26 8b 0e                 movl     %es:(%esi), %ecx
> 0x00034d12:  26 f6 46 08 80           testb    $0x80, %es:8(%esi)
> 0x00034d17:  75 3b                    jne      0x34d54
> 
> ------------------------------
> 
> TB id:1 | phys:0x34d54 virt:0x0000000000034d54 flags:0x0000f0
> 	| exec:5202686/0 guest inst cov:11.28%
> 	| trans:1 ints: g:3 op:82 op_opt:34 spills:3
> 	| h/g (host bytes / guest insts): 90.666664
> 	| time to gen at 2.4GHz => code:2793.75(ns) IR:614.58(ns)
> 	| targets: 0x0000000000034d5e (id:3), 0x0000000000034d0d (id:2)
> 
> TB id:2 | phys:0x34d0d virt:0x0000000000034d0d flags:0x0000f0
> 	| exec:5199468/0 guest inst cov:15.03%
> 	| trans:1 ints: g:4 op:80 op_opt:38 spills:2
> 	| h/g (host bytes / guest insts): 84.000000
> 	| time to gen at 2.4GHz => code:2958.75(ns) IR:719.58(ns)
> 	| targets: 0x0000000000034d19 (id:4), 0x0000000000034d54 (id:1)
> 
> ------------------------------
> 2 TBs to reach 25% of guest inst exec coverage
> Total of guest insts exec: 138346727
> 

Is there too much cut-and-paste in this commit message?
I certainly hope that identical information about TB id:2
is not output 3 times within the same report...

Or, alternately, that we are not generating multiple TBs
for the { phys:0x34d0d virt:0x34d0d flags:0xf0 } tuple.

Also, I think you probably need to output cs_base.  Depending
on the target, that might have been different, and so it
might make sense that you have 3 copies of the above.

> +static gint
> +inverse_sort_tbs(gconstpointer p1, gconstpointer p2, gpointer psort_by)

What about this makes the sort "inverse"?

> +    int sort_by = *((int *) psort_by);

Why is this not enum SortBy?

> +    if (likely(sort_by == SORT_BY_SPILLS)) {
...
> +    } else if (likely(sort_by == SORT_BY_HOTNESS)) {
...
> +    } else if (likely(sort_by == SORT_BY_HG)) {

Surely these options are not all "likely".

> +        float a =
> +            (float) stat_per_translation(tbs1, code.out_len) / tbs1->code.num_guest_inst;
> +        float b =
> +            (float) stat_per_translation(tbs2, code.out_len) / tbs2->code.num_guest_inst;

I don't understand why we're suddenly introducing floats, when the division
hidden within stat_per_translation is integer.

Think about what units being compared here, because I don't think that "average
host code length / sum of guest insn count" makes sense.  Certainly as time
progresses, average / sum -> ((sum / sum) / sum) -> (sum / sum**2) -> (1 / sum)
is going to approach 0.

I think you actually want "sum of host code length / sum of guest insn length",
which could be stated as "host/guest code ratio" or "jit code expansion factor".

> +        c1 = a <= b ? 0 : 1;
> +        c2 = a <= b ? 1 : 0;

Please do recall that (x < y ? 1 : 0) => (x < y).

In addition, there's no point in having these comparisons feed...

> +    return c1 < c2 ? 1 : c1 == c2 ? 0 : -1;

... these comparisions.

> +    for (i = last_search; i; i = i->next) {
> +        TBStatistics *tbs = (TBStatistics *) i->data;
> +        uint64_t tb_total_execs =
> +            (tbs->executions.atomic + tbs->executions.normal) * tbs->code.num_guest_inst;
> +        tbs->executions.coverage = (10000 * tb_total_execs) / (total_exec_count + 1);

If ever there was a time to want to use float, this is it.

I assume the total_exec_count + 1 is to avoid divide by zero?
Surely we can do better than this...

Indeed, given that we've already checked...

> +    if (!last_search) {
> +        qemu_printf("No data collected yet!\n");
> +        return;
> +    }

... surely we can either assert total_exec_count != 0, or don't and just let
the divide-by-zero signal do the same thing.  (I don't see the value of
replacing one signal with another in most cases.)

> +/*
> + * We cannot always re-generate the code even if we know there are
> + * valid translations still in the cache. The reason being the guest
> + * may have un-mapped the page code.

Um... unless I mistake what's being described here, that wouldn't be a valid
translation.  Or do you just mean that the page mapping isn't present within
the TLB?  Which is not quite the same thing as "unmapping".

> + * TODO: can we do this safely? We need to
> + *  a) somehow recover the mmu_idx for this translation

We could add an interface for this, yes.  The value *must* be able to be
derived from tb->flags, but of course in a target-dependent way.

> + *  b) probe MMU_INST_FETCH to know it will succeed

We *do* have this now, sort of: tlb_vaddr_to_host.

So far all use of this function originates from target/foo/,
and so some targets have not been updated to work with this.
I've marked these with asserts within foo_cpu_tlb_fill.

Notable targets for which it won't work: i386, sparc.


> +static GString *get_code_string(TBStatistics *tbs, int log_flags)
> +{
> +    int old_log_flags = qemu_loglevel;
> +
> +    CPUState *cpu = first_cpu;
> +    uint32_t cflags = curr_cflags() | CF_NOCACHE;
> +    TranslationBlock *tb = NULL;
> +
> +    GString *code_s = g_string_new(NULL);
> +    qemu_log_to_string(true, code_s);
> +
> +    qemu_set_log(log_flags);
> +
> +    if (sigsetjmp(cpu->jmp_env, 0) == 0) {
> +        mmap_lock();
> +        tb = tb_gen_code(cpu, tbs->pc, tbs->cs_base, tbs->flags, cflags);
> +        tb_phys_invalidate(tb, -1);
> +        mmap_unlock();

Ew.  No.

Let us not go through tb_gen_code, just to get logging output from the
translator.  What are we really after here?  Input assembly?

> @@ -86,7 +91,6 @@ struct TBStatistics {
>  
>      struct {
>          unsigned long total;
> -        unsigned long uncached;
>          unsigned long spanning;
>      } translations;
>  

Vanishing unused variable that maybe shouldn't have existed?



r~


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v9 10/13] tb-stats: dump hot TBs at the end of the execution
  2019-10-07 15:28 ` [PATCH v9 10/13] tb-stats: dump hot TBs at the end of the execution Alex Bennée
@ 2019-10-08 19:05   ` Richard Henderson
  0 siblings, 0 replies; 37+ messages in thread
From: Richard Henderson @ 2019-10-08 19:05 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: Vanderson M. do Rosario, Riku Voipio, Laurent Vivier, cota,
	Paolo Bonzini, Richard Henderson

On 10/7/19 11:28 AM, Alex Bennée wrote:
> @@ -337,7 +336,8 @@ int qemu_str_to_log_mask(const char *str)
>              enable_collect_tb_stats();
>          } else if (tb_stats_collection_enabled() &&
>                     g_str_has_prefix(*tmp, "dump_limit=")) {
> -            max_num_hot_tbs_to_dump = atoi((*tmp) + 11);
> +            int hot_tbs = atoi((*tmp) + 11);
> +            set_tbstats_max_tbs(hot_tbs);

I think qemu_strtoi() is preferred over atoi().

My guess it that it's better to export a variable than a set_tbstats_max_tbs()
function.  I don't see any reason why stubs.c can't define variables -- they're
probably no larger than empty functions in many cases.


r~


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v9 08/13] tb-stats: reset the tracked TBs on a tb_flush
  2019-10-08 18:00   ` Richard Henderson
@ 2019-10-08 19:18     ` Alex Bennée
  0 siblings, 0 replies; 37+ messages in thread
From: Alex Bennée @ 2019-10-08 19:18 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Paolo Bonzini, cota, qemu-devel, Richard Henderson


Richard Henderson <richard.henderson@linaro.org> writes:

> On 10/7/19 11:28 AM, Alex Bennée wrote:
>> We keep track of translations but can only do so up until the
>> translation cache is flushed. At that point we really have no idea if
>> we can re-create a translation because all the active tracking
>> information has been reset.
>>
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>> ---
>>  accel/tcg/tb-stats.c      | 19 +++++++++++++++++++
>>  accel/tcg/translate-all.c |  2 +-
>>  include/exec/tb-stats.h   |  8 ++++++++
>>  3 files changed, 28 insertions(+), 1 deletion(-)
>
> I still don't understand what the tbs array is for,
> but resetting it at flush is fine.

In Vanderson's original patch he kept a reference to the last translated
tb which was incorrect - I changed it to track all the TBs associated
with the stats entry. However the coverset and cfg commands aren't in
this series which needed to iterate down through the TBs to their jump
targets to build the full hot block. I suspect for now we can just drop
the entry.

>
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
>
>
> r~


--
Alex Bennée


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v9 11/13] accel/tcg: adding integration with linux perf
  2019-10-07 15:28 ` [PATCH v9 11/13] accel/tcg: adding integration with linux perf Alex Bennée
@ 2019-10-08 19:33   ` Richard Henderson
  0 siblings, 0 replies; 37+ messages in thread
From: Richard Henderson @ 2019-10-08 19:33 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: Vanderson M. do Rosario, Riku Voipio, Laurent Vivier, cota,
	Paolo Bonzini, Richard Henderson

On 10/7/19 11:28 AM, Alex Bennée wrote:
> +static uint32_t get_e_machine(void)
> +{
> +    uint32_t e_machine = EM_NONE;
> +    Elf64_Ehdr elf_header;

Not ideal, as this appears to not work on 32-bit hosts, but the two structures
do match up within the first 24 bytes, in which this is located.

That said, this value is present within tcg/host/tcg-target.inc.c as
ELF_HOST_MACHINE.  So we really don't have to play /proc/self/exec games.

> +void start_jitdump_file(void)
> +{
> +    g_autofree gchar *dumpfile_name = g_strdup_printf("./jit-%d.dump", getpid());
> +    dumpfile = fopen(dumpfile_name, "w+");
> +
> +    /* 'Perf record' saves mmaped files during the execution of a program and
> +     * 'perf inject' iterate over them to reconstruct all used/executed binary.
> +     * So, we create a mmap with the path of our jitdump that is processed
> +     * and used by 'perf inject' to reconstruct jitted binaries.
> +     */
> +    perf_marker = mmap(NULL, sysconf(_SC_PAGESIZE),
> +                          PROT_READ | PROT_EXEC,
> +                          MAP_PRIVATE,
> +                          fileno(dumpfile), 0);

(1) sysconf(_SC_PAGESIZE) is qemu_real_host_page_size.
(2) This is a page-sized mapping of a new, zero-sized file?
    I assume this mapping event gets logged, and that it its
    only purpose?
(3) I really need to read the kernel docs...

> +void append_load_in_jitdump_file(TranslationBlock *tb)
> +{
> +    gchar *func_name = g_strdup_printf("TB virt:0x"TARGET_FMT_lx, tb->pc);
> +
> +    /* Serialise the writing of the dump file */
> +    qemu_mutex_lock(&dumpfile_lock);
> +
> +    struct jr_code_load load_event;
> +    load_event.p.id = JIT_CODE_LOAD;
> +    load_event.p.total_size =
> +        sizeof(struct jr_code_load) + func_name->len + 1 + tb->tc.size;

How does a "gchar *func_name" have ->len?  Did this used to be GString, but a
last-minute change means it no longer compiles?

> +    fflush(dumpfile);

Why fflushing all of the time?  Surely the file contents doesn't matter until
after the final close.

> +    qemu_mutex_unlock(&dumpfile_lock);

Why a separate qemu locking instead of using stdio's own locking (flockfile).

> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index 871d91d559..3fafb656e7 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -58,6 +58,10 @@
>  #include "sysemu/cpus.h"
>  #include "sysemu/tcg.h"
>  
> +#ifdef __linux__
> +#include "perf/jitdump.h"
> +#endif

Why the ifdefs?  We're not dependent on other headers are we?
Not that there's a "perf" on other hosts, but AFACT it should
at least compile...


r~


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v9 09/13] Adding info [tb-list|tb] commands to HMP (WIP)
  2019-10-08 18:50   ` Richard Henderson
@ 2019-10-08 19:36     ` Alex Bennée
  0 siblings, 0 replies; 37+ messages in thread
From: Alex Bennée @ 2019-10-08 19:36 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Vanderson M. do Rosario, Markus Armbruster, qemu-devel, cota,
	Paolo Bonzini, Dr . David Alan Gilbert, Richard Henderson


Richard Henderson <richard.henderson@linaro.org> writes:

> On 10/7/19 11:28 AM, Alex Bennée wrote:
>> From: "Vanderson M. do Rosario" <vandersonmr2@gmail.com>
>>
>> These commands allow the exploration of TBs generated by the TCG.
>> Understand which one hotter, with more guest/host instructions... and
>> examine their guest, host and IR code.
>>
>> The goal of this command is to allow the dynamic exploration of TCG
>> behavior and code quality. Therefore, for now, a corresponding QMP
>> command is not worthwhile.
>>
>> [AJB: WIP - we still can't be safely sure a translation will succeed]

I'll fix up all the other points...

>
>> +/*
>> + * We cannot always re-generate the code even if we know there are
>> + * valid translations still in the cache. The reason being the guest
>> + * may have un-mapped the page code.
>
> Um... unless I mistake what's being described here, that wouldn't be a valid
> translation.  Or do you just mean that the page mapping isn't present within
> the TLB?  Which is not quite the same thing as "unmapping".

A page entry can get dropped without invalidating the TB (for example
the guest OS sets a page as non-exec). The TLB flush takes care of the
jump cache and there are no inter-page hardwired links so if a tb_find
goes there we would get the fault.

>> + * TODO: can we do this safely? We need to
>> + *  a) somehow recover the mmu_idx for this translation
>
> We could add an interface for this, yes.  The value *must* be able to be
> derived from tb->flags, but of course in a target-dependent way.
>
>> + *  b) probe MMU_INST_FETCH to know it will succeed
>
> We *do* have this now, sort of: tlb_vaddr_to_host.
>
> So far all use of this function originates from target/foo/,
> and so some targets have not been updated to work with this.
> I've marked these with asserts within foo_cpu_tlb_fill.
>
> Notable targets for which it won't work: i386, sparc.
>
>
>> +static GString *get_code_string(TBStatistics *tbs, int log_flags)
>> +{
>> +    int old_log_flags = qemu_loglevel;
>> +
>> +    CPUState *cpu = first_cpu;
>> +    uint32_t cflags = curr_cflags() | CF_NOCACHE;
>> +    TranslationBlock *tb = NULL;
>> +
>> +    GString *code_s = g_string_new(NULL);
>> +    qemu_log_to_string(true, code_s);
>> +
>> +    qemu_set_log(log_flags);
>> +
>> +    if (sigsetjmp(cpu->jmp_env, 0) == 0) {
>> +        mmap_lock();
>> +        tb = tb_gen_code(cpu, tbs->pc, tbs->cs_base, tbs->flags, cflags);
>> +        tb_phys_invalidate(tb, -1);
>> +        mmap_unlock();
>
> Ew.  No.
>
> Let us not go through tb_gen_code, just to get logging output from the
> translator.  What are we really after here?  Input assembly?

All of it - in_asm, op, op_opt, out_asm potentially. But I agree it's
far too hacky - c.f. above because we end up potentially delivering a
fault to the guest when we fail.

It would be nice if we could run the translation phase independently of
the CPU environment. Maybe the monitor could have it's own TCGContext
and call tcg_gen_code directly just for debug output? It would avoid
having to use safe work and all the rest of the stuff we don't actually
care about in tb_gen_code. I guess you could still use the sigsetjmp to
catch the inevitable exceptions from the code load?

--
Alex Bennée


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v9 13/13] configure: remove the final bits of --profiler support
  2019-10-07 15:28 ` [PATCH v9 13/13] configure: remove the final bits of --profiler support Alex Bennée
@ 2019-10-08 19:39   ` Richard Henderson
  0 siblings, 0 replies; 37+ messages in thread
From: Richard Henderson @ 2019-10-08 19:39 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel; +Cc: cota

On 10/7/19 11:28 AM, Alex Bennée wrote:
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  configure | 5 -----
>  1 file changed, 5 deletions(-)

Merge this with the patch that removes CONFIG_PROFILER.


r~


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v9 12/13] tb-stats: adding TBStatistics info into perf dump
  2019-10-07 15:28 ` [PATCH v9 12/13] tb-stats: adding TBStatistics info into perf dump Alex Bennée
@ 2019-10-08 19:46   ` Richard Henderson
  0 siblings, 0 replies; 37+ messages in thread
From: Richard Henderson @ 2019-10-08 19:46 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: Paolo Bonzini, cota, Vanderson M. do Rosario, Richard Henderson

On 10/7/19 11:28 AM, Alex Bennée wrote:
> diff --git a/accel/tcg/perf/jitdump.c b/accel/tcg/perf/jitdump.c
> index e1d6f2214e..e7b86173e0 100644
> --- a/accel/tcg/perf/jitdump.c
> +++ b/accel/tcg/perf/jitdump.c
> @@ -146,7 +146,20 @@ void start_jitdump_file(void)
>  
>  void append_load_in_jitdump_file(TranslationBlock *tb)
>  {
> -    gchar *func_name = g_strdup_printf("TB virt:0x"TARGET_FMT_lx, tb->pc);
> +    g_autoptr(GString) func_name = g_string_new("TB virt:");
> +
> +    g_string_append_printf(func_name, "0x"TARGET_FMT_lx, tb->pc);

I think it was clearer as a single printf.  Use g_string_printf().

But now I see where the missing GString went -- bad patch splitting.  ;-)

> +    if (tb->tb_stats) {
> +        TBStatistics *tbs = tb->tb_stats;
> +        unsigned g = stat_per_translation(tbs, code.num_guest_inst);
> +        unsigned ops = stat_per_translation(tbs, code.num_tcg_ops);
> +        unsigned ops_opt = stat_per_translation(tbs, code.num_tcg_ops_opt);
> +        unsigned spills = stat_per_translation(tbs, code.spills);
> +
> +        g_string_append_printf(func_name, " (g:%u op:%u opt:%u spills:%d)",
> +                               g, ops, ops_opt, spills);
> +    }

Oh, hum.  Does it really make sense to have accumulated averages here?  Why
does it not make more sense to have the statistics about this particular TB?
After all, different TB -- even for the same guest code -- will appear at
different places within the code_gen_buffer, and so have different entries in
this log.


r~


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH  v9 09/13] Adding info [tb-list|tb] commands to HMP (WIP)
  2019-10-07 15:28 ` [PATCH v9 09/13] Adding info [tb-list|tb] commands to HMP (WIP) Alex Bennée
  2019-10-08 18:50   ` Richard Henderson
@ 2019-10-09  9:44   ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 37+ messages in thread
From: Dr. David Alan Gilbert @ 2019-10-09  9:44 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Vanderson M. do Rosario, qemu-devel, Markus Armbruster, cota,
	Paolo Bonzini, Richard Henderson

* Alex Bennée (alex.bennee@linaro.org) wrote:
> From: "Vanderson M. do Rosario" <vandersonmr2@gmail.com>
> 
> These commands allow the exploration of TBs generated by the TCG.
> Understand which one hotter, with more guest/host instructions... and
> examine their guest, host and IR code.
> 
> The goal of this command is to allow the dynamic exploration of TCG
> behavior and code quality. Therefore, for now, a corresponding QMP
> command is not worthwhile.
> 
> [AJB: WIP - we still can't be safely sure a translation will succeed]
> 

<snip>

> diff --git a/util/log.c b/util/log.c
> index 86bd691967..fa78e2bca9 100644
> --- a/util/log.c
> +++ b/util/log.c
> @@ -33,25 +33,55 @@ int qemu_loglevel;
>  static int log_append = 0;
>  static GArray *debug_regions;
>  int32_t max_num_hot_tbs_to_dump;
> +static bool to_monitor;
> +bool to_string;
>  
> -/* Return the number of characters emitted.  */
> -int qemu_log(const char *fmt, ...)
> +GString *string;
> +
> +int qemu_vlog(const char *fmt, va_list va)
>  {
>      int ret = 0;
> -    if (qemu_logfile) {
> -        va_list ap;
> -        va_start(ap, fmt);
> -        ret = vfprintf(qemu_logfile, fmt, ap);
> -        va_end(ap);
> -
> -        /* Don't pass back error results.  */
> -        if (ret < 0) {
> -            ret = 0;
> +    if (to_string) {
> +        if (string) {
> +            g_string_append_vprintf(string, fmt, va);
>          }
> +    } else if (to_monitor) {
> +        ret = qemu_vprintf(fmt, va);
> +    } else if (qemu_logfile) {
> +        ret = vfprintf(qemu_logfile, fmt, va);
> +    }
> +
> +    /* Don't pass back error results.  */
> +    if (ret < 0) {
> +        ret = 0;
>      }
>      return ret;
>  }
>  
> +/* Return the number of characters emitted.  */
> +int qemu_log(const char *fmt, ...)
> +{
> +    int ret = 0;
> +    va_list ap;
> +    va_start(ap, fmt);
> +
> +    ret = qemu_vlog(fmt, ap);
> +
> +    va_end(ap);
> +    return ret;
> +}
> +
> +void qemu_log_to_monitor(bool enable)
> +{
> +    to_monitor = enable;
> +}
> +
> +void qemu_log_to_string(bool enable, GString *s)
> +{
> +    to_string = enable;
> +    string = s;
> +}
> +
>  static bool log_uses_own_buffers;

These feel like they should be in a separate patch.

Dave

>  
>  /* enable or disable low levels log */
> @@ -300,6 +330,7 @@ int qemu_str_to_log_mask(const char *str)
>              trace_enable_events((*tmp) + 6);
>              mask |= LOG_TRACE;
>  #endif
> +#ifdef CONFIG_TCG
>          } else if (g_str_has_prefix(*tmp, "tb_stats")) {
>              mask |= CPU_LOG_TB_STATS;
>              set_default_tbstats_flag(TB_JIT_STATS | TB_EXEC_STATS | TB_JIT_TIME);
> @@ -329,6 +360,7 @@ int qemu_str_to_log_mask(const char *str)
>                  }
>                  set_default_tbstats_flag(flags);
>              }
> +#endif
>          } else {
>              for (item = qemu_log_items; item->mask != 0; item++) {
>                  if (g_str_equal(*tmp, item->name)) {
> -- 
> 2.20.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v9 01/13] accel/tcg: introduce TBStatistics structure
  2019-10-08 12:35   ` Richard Henderson
@ 2019-12-13 11:14     ` Alex Bennée
  0 siblings, 0 replies; 37+ messages in thread
From: Alex Bennée @ 2019-12-13 11:14 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Paolo Bonzini, cota, Vanderson M. do Rosario, qemu-devel,
	Richard Henderson


Richard Henderson <richard.henderson@linaro.org> writes:

> On 10/7/19 11:28 AM, Alex Bennée wrote:
>> From: "Vanderson M. do Rosario" <vandersonmr2@gmail.com>
>> 
>> To store statistics for each TB, we created a TBStatistics structure
>> which is linked with the TBs. TBStatistics can stay alive after
>> tb_flush and be relinked to a regenerated TB. So the statistics can
>> be accumulated even through flushes.
>> 
>> The goal is to have all present and future qemu/tcg statistics and
>> meta-data stored in this new structure.
>> 
>> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
>> Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
>> Message-Id: <20190829173437.5926-2-vandersonmr2@gmail.com>
>> [AJB: fix git author, review comments]
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>> 
>> ---
>> AJB
>>   - move tcg_collect_tb_stats inside tb-stats.c
>>   - add spdx header
>>   - drop tb from tbstats and associated functions
>> ---
>
> The only quibble I have is with
>
>> +void init_tb_stats_htable_if_not(void);
>
> If not what?
>
> I can only imagine that this phrase is intended to finish "if not initialized".
>  But I think it's clearer to just call this "init_tb_stats_htable".

Fixed.

>
>> +void enable_collect_tb_stats(void)
>> +{
>> +    init_tb_stats_htable_if_not();
>
> Why do we need to do this again, since we did this in tb_htable_init?

This is the route if we dynamically enable tb-stats with an already
running system emulation.

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v9 03/13] accel: collecting JIT statistics
  2019-10-08 13:38   ` Richard Henderson
@ 2019-12-13 11:51     ` Alex Bennée
  0 siblings, 0 replies; 37+ messages in thread
From: Alex Bennée @ 2019-12-13 11:51 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Paolo Bonzini, cota, Vanderson M. do Rosario, qemu-devel,
	Richard Henderson


Richard Henderson <richard.henderson@linaro.org> writes:

> On 10/7/19 11:28 AM, Alex Bennée wrote:
>> @@ -1795,6 +1799,10 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>>          if (flag & TB_EXEC_STATS) {
>>              tb->tb_stats->stats_enabled |= TB_EXEC_STATS;
>>          }
>> +
>> +        if (flag & TB_JIT_STATS) {
>> +            tb->tb_stats->stats_enabled |= TB_JIT_STATS;
>> +        }
>
> So, assuming that you really meant this, and not replacing as I was guessing vs
> patch 2, then this is
>
>     tb->tb_stats->stats_enabled |=
>         flag & (TB_EXEC_STATS | TB_JIT_STATS);
>
> But I still think it's weird to be wanting to modify the shared structure.
> What does that mean for concurrent code generation?

The idea was to have per translation area granularity on collecting the
stats so we didn't have to burden all areas with the overhead. Currently
this only takes effect when qemu_log_in_addr_range is in effect. However
as the run goes on we could make decisions to disable some or all stats
for stuff that doesn't come up that frequently.

However the current positioning doesn't work as we keep setting the flag
so I think we need to apply get_default_tbstats_flag() inside
tb_get_stats only when we first create the data block.

>
>> +    /*
>> +     * Collect JIT stats when enabled. We batch them all up here to
>> +     * avoid spamming the cache with atomic accesses
>> +     */
>> +    if (tb_stats_enabled(tb, TB_JIT_STATS)) {
>> +        TBStatistics *ts = tb->tb_stats;
>> +        qemu_mutex_lock(&ts->jit_stats_lock);
>> +
>> +        ts->code.num_guest_inst += prof->translation.nb_guest_insns;
>> +        ts->code.num_tcg_ops += prof->translation.nb_ops_pre_opt;
>> +        ts->code.num_tcg_ops_opt += tcg_ctx->nb_ops;
>> +        ts->code.spills += prof->translation.nb_spills;
>> +        ts->code.out_len += tb->tc.size;
>> +
>> +        ts->translations.total++;
>> +        if (phys_page2 != -1) {
>> +            ts->translations.spanning++;
>> +        }
>> +
>> +        g_ptr_array_add(ts->tbs, tb);
>> +
>> +        qemu_mutex_unlock(&ts->jit_stats_lock);
>> +    }
>
> Hmm.  So we're to interpret all of code.field as sums, the average of which can
> be obtained by dividing by translations.total.  Ok, but that should probably be
> mentioned in a comment in/near the structure definition.

OK

> What are we planning to do with the set of all tb's collected here?

Originally we kept track for the coverset calculation as we need to know
where each individual TB goes next. The code was racy so I dropped it
from the series so tracking this now is possibly redundant although it
might be useful in the future.

>
>> @@ -3125,6 +3126,7 @@ static void temp_sync
>>          case TEMP_VAL_REG:
>>              tcg_out_st(s, ts->type, ts->reg,
>>                         ts->mem_base->reg, ts->mem_offset);
>> +            s->prof.translation.nb_spills++;
>>              break;
>>  
>>          case TEMP_VAL_MEM:
>
> This is not a spill in the common compiler definition.
>
> This is "write the temp to its backing storage".  While this does happen in the
> course of spilling, the vast majority of these are because we've finished
> modifying a global temp and must now update memory.  Which is not nearly the
> same thing as "spill".
>
> A spill in the compiler definition happens in tcg_reg_alloc, right after the
> comment "We must spill something".  ;-)

OK I'll fix that.

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v9 05/13] accel: adding TB_JIT_TIME and full replacing CONFIG_PROFILER
  2019-10-08 15:25   ` Richard Henderson
@ 2019-12-13 21:49     ` Alex Bennée
  2019-12-16 20:34       ` Richard Henderson
  0 siblings, 1 reply; 37+ messages in thread
From: Alex Bennée @ 2019-12-13 21:49 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Vanderson M. do Rosario, Markus Armbruster, qemu-devel, cota,
	Paolo Bonzini, Dr. David Alan Gilbert, Richard Henderson


Richard Henderson <richard.henderson@linaro.org> writes:

> On 10/7/19 11:28 AM, Alex Bennée wrote:
>> From: "Vanderson M. do Rosario" <vandersonmr2@gmail.com>
>> 
>> Replace all others CONFIG_PROFILER statistics and migrate it to
>> TBStatistics system. However, TCGProfiler still exists and can
>> be use to store global statistics and times. All TB related
>> statistics goes to TBStatistics.
>> 
>> Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
>> Message-Id: <20190829173437.5926-6-vandersonmr2@gmail.com>
>> [AJB: fix authorship]
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>> ---
>>  accel/tcg/tb-stats.c      |  98 +++++++++++++++++++++---------
>>  accel/tcg/translate-all.c |  46 ++++++++-------
>>  configure                 |   3 -
>>  cpus.c                    |  14 ++---
>>  include/exec/tb-stats.h   |  15 +++++
>>  include/qemu/timer.h      |   5 +-
>>  monitor/misc.c            |  28 ++-------
>>  tcg/tcg.c                 | 121 ++++++++++++--------------------------
>>  tcg/tcg.h                 |   2 +-
>>  vl.c                      |   8 +--
>>  10 files changed, 159 insertions(+), 181 deletions(-)
>> 
>> diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
>> index 0e64c176b3..f431159fd2 100644
>> --- a/accel/tcg/tb-stats.c
>> +++ b/accel/tcg/tb-stats.c
>> @@ -17,11 +17,18 @@
>>  #include "exec/tb-stats.h"
>>  
>>  /* TBStatistic collection controls */
>> -enum TBStatsStatus { TB_STATS_RUNNING, TB_STATS_PAUSED, TB_STATS_STOPPED };
>> +enum TBStatsStatus {
>> +    TB_STATS_DISABLED = 0,
>> +    TB_STATS_RUNNING,
>> +    TB_STATS_PAUSED,
>> +    TB_STATS_STOPPED
>> +};
>
> This should be in patch 1, I should think.
>
>> +uint64_t dev_time;
>
> This patch is doing several things at once, and I think it should be split.
> All of the changes to dev_time, for instance, are unrelated to TBStatistic.
>
>> +    jpi->interm_time += stat_per_translation(tbs, time.interm);
>> +    jpi->code_time += stat_per_translation(tbs, time.code);
>> +    jpi->opt_time += stat_per_translation(tbs, time.opt);
>> +    jpi->la_time += stat_per_translation(tbs, time.la);
>> +    jpi->restore_time += tbs->time.restore;
>> +    jpi->restore_count += tbs->time.restore_count;
>
> Why are some of these "per translation" and some not?

The restore_time is an amortised time or all restore operations (so
dividable by restore_count rather than the number of translations).

I've added some commentary.

>
>> @@ -370,11 +371,11 @@ static int cpu_restore_state_from_tb
>>      }
>>      restore_state_to_opc(env, tb, data);
>>  
>> -#ifdef CONFIG_PROFILER
>> -    atomic_set(&prof->restore_time,
>> -                prof->restore_time + profile_getclock() - ti);
>> -    atomic_set(&prof->restore_count, prof->restore_count + 1);
>> -#endif
>> +    if (tb_stats_enabled(tb, TB_JIT_TIME)) {
>> +        atomic_add(&tb->tb_stats->time.restore, profile_getclock() - ti);
>> +        atomic_inc(&tb->tb_stats->time.restore_count);
>> +    }
>
> Would it be better to use a the TBStatistics lock than two atomic
> updates?

Yeah I think keeping the locking consistent as a mutex probably pays of
in reducing complexity as well.

>
>> @@ -1826,10 +1828,10 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>>          tcg_ctx->tb_jmp_target_addr = tb->jmp_target_arg;
>>      }
>>  
>> -#ifdef CONFIG_PROFILER
>> -    atomic_set(&prof->interm_time, prof->interm_time + profile_getclock() - ti);
>> -    ti = profile_getclock();
>> -#endif
>> +    if (tb_stats_enabled(tb, TB_JIT_TIME)) {
>> +        atomic_add(&tb->tb_stats->time.interm, profile_getclock() - ti);
>> +        ti = profile_getclock();
>> +    }
>
> You should call profile_getclock() only once here.
>
> Why does this need an atomic_add, while the rest of TB_JIT_STATS within
> tb_gen_code do not, and in fact have a comment:
>
>> +    /*
>> +     * Collect JIT stats when enabled. We batch them all up here to
>> +     * avoid spamming the cache with atomic accesses
>> +     */
>
>> @@ -1871,9 +1873,9 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>>      }
>>      tb->tc.size = gen_code_size;
>>  
>> -#ifdef CONFIG_PROFILER
>> -    atomic_set(&prof->code_time, prof->code_time + profile_getclock() - ti);
>> -#endif
>> +    if (tb_stats_enabled(tb, TB_JIT_TIME)) {
>> +        atomic_add(&tb->tb_stats->time.code, profile_getclock() - ti);
>> +    }
>
> Likewise.
>
>> diff --git a/configure b/configure
>> index 8f8446f52b..eedeb9016e 100755
>> --- a/configure
>> +++ b/configure
>> @@ -6566,9 +6566,6 @@ fi
>>  if test "$static" = "yes" ; then
>>    echo "CONFIG_STATIC=y" >> $config_host_mak
>>  fi
>> -if test "$profiler" = "yes" ; then
>> -  echo "CONFIG_PROFILER=y" >> $config_host_mak
>> -fi
>
> Removing the define without removing --enable-profiler.
>
>>  static int tcg_cpu_exec(CPUState *cpu)
>>  {
>>      int ret;
>> -#ifdef CONFIG_PROFILER
>> -    int64_t ti;
>> -#endif
>> +    uint64_t ti;
>>  
>>      assert(tcg_enabled());
>> -#ifdef CONFIG_PROFILER
>>      ti = profile_getclock();
>> -#endif
>> +
>>      cpu_exec_start(cpu);
>>      ret = cpu_exec(cpu);
>>      cpu_exec_end(cpu);
>> -#ifdef CONFIG_PROFILER
>> -    atomic_set(&tcg_ctx->prof.cpu_exec_time,
>> -               tcg_ctx->prof.cpu_exec_time + profile_getclock() - ti);
>> -#endif
>> +
>> +    atomic_add(&tcg_ctx->prof.cpu_exec_time, profile_getclock() - ti);
>
> This is also unrelated to TBStatistics.

I'm going to split this patch up into more sub-commits to do each piece
one at a time. To make it easier can I drop CONFIG_PROFILE support
before the re-factoring so I don't need to keep both modes compiling
during the transition?

> It's also adding an unconditional atomic_add, of which I am not fond.  Should
> this be testing tb_stats_collection_enabled() or something else?
>
> I'll also note that tcg_ctx is per-thread (for mttcg), and so perhaps this does
> not require an atomic_add anyway.  Perhaps just an atomic_set to be paired with
> atomic_read in the dump function that's iterating over the tcg_ctx.

I'll try an improve the commenting of TCGProf about what is just holding
per-translations values which get copied at the end and what is
persistent and may be read from outside the translation context (I think
just the OPC count array now).

>
> Even without the atomic_add, we shouldn't make the syscall for getclock in the
> normal case.
>
>
>> +    /* These are accessed with atomic operations */
>> +    struct {
>> +        int64_t restore;
>> +        uint64_t restore_count;
>> +        int64_t interm;
>> +        int64_t code;
>> +        int64_t opt;
>> +        int64_t la;
>> +    } time;
>
> Why are these signed?  We're always adding positive values.
>
> Why is restore_count here in "time"?  Indeed, why all of these unnamed
> sub-structures at all?  I don't see that "." helps organization any more than "_".
>
>> @@ -4020,18 +3959,18 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
>>      }
>>  #endif
>>  
>> -#ifdef CONFIG_PROFILER
>> -    atomic_set(&prof->opt_time, prof->opt_time - profile_getclock());
>> -#endif
>> +    if (tb_stats_enabled(tb, TB_JIT_TIME)) {
>> +        atomic_add(&tb->tb_stats->time.opt, -profile_getclock());
>> +    }
>
> Atomic add of a negative clock value?  That just means the intermediate value
> is unusable, so why make any of this atomic?

I've rewritten all this to just grab some timestamps at various key
points and then bring it all together when we update the TBStat in
translate.

>
> Also, this is tb_gen_code again, wherein we already talked about not using
> atomic_foo.
>
>> @@ -4081,14 +4020,17 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
>>      s->pool_labels = NULL;
>>  #endif
>>  
>> +    if (!tb_stats_collection_enabled()) {
>> +        QTAILQ_FOREACH(op, &s->ops, link) {
>> +            TCGOpcode opc = op->opc;
>> +            atomic_add(&s->prof.table_op_count[opc], 1);
>> +        }
>> +    }
>
> Why would you collect stats when stats collection is disabled?  Is this a
> simple typo?

I think if we can get away with not using atomics this count can go back
to being inline with the main opc loop. If we aren't using atomics is
there really such a big cost to just doing it unconditionally?


-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v9 05/13] accel: adding TB_JIT_TIME and full replacing CONFIG_PROFILER
  2019-12-13 21:49     ` Alex Bennée
@ 2019-12-16 20:34       ` Richard Henderson
  0 siblings, 0 replies; 37+ messages in thread
From: Richard Henderson @ 2019-12-16 20:34 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Vanderson M. do Rosario, Markus Armbruster, qemu-devel, cota,
	Paolo Bonzini, Dr. David Alan Gilbert, Richard Henderson

On 12/13/19 11:49 AM, Alex Bennée wrote:
> I'm going to split this patch up into more sub-commits to do each piece
> one at a time. To make it easier can I drop CONFIG_PROFILE support
> before the re-factoring so I don't need to keep both modes compiling
> during the transition?

Certainly.

> I think if we can get away with not using atomics this count can go back
> to being inline with the main opc loop. If we aren't using atomics is
> there really such a big cost to just doing it unconditionally?

I shouldn't think so.


r~



^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2019-12-16 20:35 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-07 15:28 [PATCH v9 00/13] TCG code quality tracking and perf integration Alex Bennée
2019-10-07 15:28 ` [PATCH v9 01/13] accel/tcg: introduce TBStatistics structure Alex Bennée
2019-10-08 12:35   ` Richard Henderson
2019-12-13 11:14     ` Alex Bennée
2019-10-07 15:28 ` [PATCH v9 02/13] accel: collecting TB execution count Alex Bennée
2019-10-08 13:10   ` Richard Henderson
2019-10-07 15:28 ` [PATCH v9 03/13] accel: collecting JIT statistics Alex Bennée
2019-10-08 13:38   ` Richard Henderson
2019-12-13 11:51     ` Alex Bennée
2019-10-07 15:28 ` [PATCH v9 04/13] accel: replacing part of CONFIG_PROFILER with TBStats Alex Bennée
2019-10-08 13:58   ` Richard Henderson
2019-10-07 15:28 ` [PATCH v9 05/13] accel: adding TB_JIT_TIME and full replacing CONFIG_PROFILER Alex Bennée
2019-10-08 15:25   ` Richard Henderson
2019-12-13 21:49     ` Alex Bennée
2019-12-16 20:34       ` Richard Henderson
2019-10-07 15:28 ` [PATCH v9 06/13] debug: add -d tb_stats to control TBStatistics collection: Alex Bennée
2019-10-08 15:34   ` Richard Henderson
2019-10-08 15:49     ` Alex Bennée
2019-10-07 15:28 ` [PATCH v9 07/13] monitor: adding tb_stats hmp command Alex Bennée
2019-10-08 15:48   ` Richard Henderson
2019-10-07 15:28 ` [PATCH v9 08/13] tb-stats: reset the tracked TBs on a tb_flush Alex Bennée
2019-10-08 18:00   ` Richard Henderson
2019-10-08 19:18     ` Alex Bennée
2019-10-07 15:28 ` [PATCH v9 09/13] Adding info [tb-list|tb] commands to HMP (WIP) Alex Bennée
2019-10-08 18:50   ` Richard Henderson
2019-10-08 19:36     ` Alex Bennée
2019-10-09  9:44   ` Dr. David Alan Gilbert
2019-10-07 15:28 ` [PATCH v9 10/13] tb-stats: dump hot TBs at the end of the execution Alex Bennée
2019-10-08 19:05   ` Richard Henderson
2019-10-07 15:28 ` [PATCH v9 11/13] accel/tcg: adding integration with linux perf Alex Bennée
2019-10-08 19:33   ` Richard Henderson
2019-10-07 15:28 ` [PATCH v9 12/13] tb-stats: adding TBStatistics info into perf dump Alex Bennée
2019-10-08 19:46   ` Richard Henderson
2019-10-07 15:28 ` [PATCH v9 13/13] configure: remove the final bits of --profiler support Alex Bennée
2019-10-08 19:39   ` Richard Henderson
2019-10-07 18:14 ` [PATCH v9 00/13] TCG code quality tracking and perf integration no-reply
2019-10-07 18:47 ` no-reply

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.