qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [PATCH v8 00/11] Measure Tiny Code Generation Quality
@ 2019-08-29 17:34 vandersonmr
  2019-08-29 17:34 ` [Qemu-devel] [PATCH v8 01/11] accel: introducing TBStatistics structure vandersonmr
                   ` (11 more replies)
  0 siblings, 12 replies; 24+ messages in thread
From: vandersonmr @ 2019-08-29 17:34 UTC
  To: qemu-devel; +Cc: vandersonmr

This patch series is part of Google Summer of Code (GSoC) 2019.
More about the project can be found at:
https://wiki.qemu.org/Internships/ProjectIdeas/TCGCodeQuality

The goal of this series is to add infrastructure to collect
execution and JIT statistics during emulation with accel/TCG.
The statistics are stored in TBStatistics structures (TBStats),
with each TB having its respective TBStats.

We added the -d tb_stats option and the HMP tb_stats command to
control this statistics collection. The info tb, tb-list, and coverset
commands were also added to allow dumping and exploring this
information while emulating.

Collecting these statistics is useful for understanding QEMU
performance and for helping to add support for traces to QEMU.

v8:
 - fixing missing commit from v7
v7:
 - rebase
 - adding license to new files
 - applying comments from v6
v6:
 - applying comments from V5.
 - change info tbs to info tb-list
 - fix crash when dumping tb's targets
 - fix "liveness/code time" calculation
v5:
 - full replacement of CONFIG_PROFILER
 - several fixes
 - adds "info cfg"
 - adds TB's targets to dump

vandersonmr (11):
  accel: introducing TBStatistics structure
  accel: collecting TB execution count
  accel: collecting JIT statistics
  accel: replacing part of CONFIG_PROFILER with TBStats
  accel: adding TB_JIT_TIME and full replacing CONFIG_PROFILER
  Adding -d tb_stats to control TBStatistics collection:
  monitor: adding tb_stats hmp command
  Adding tb_stats [start|pause|stop|filter] command to hmp.
  Adding info [tb-list|tb|coverset] commands to HMP.
  monitor: adding new info cfg command
  linux-user: dumping hot TBs at the end of the execution

 accel/tcg/Makefile.objs       |   2 +-
 accel/tcg/cpu-exec.c          |   4 +
 accel/tcg/tb-stats.c          | 894 ++++++++++++++++++++++++++++++++++
 accel/tcg/tcg-runtime.c       |   7 +
 accel/tcg/tcg-runtime.h       |   2 +
 accel/tcg/translate-all.c     | 132 +++--
 accel/tcg/translator.c        |   5 +
 configure                     |   3 -
 cpus.c                        |  14 +-
 disas.c                       |  31 +-
 hmp-commands-info.hx          |  31 ++
 hmp-commands.hx               |  17 +
 include/exec/exec-all.h       |  15 +-
 include/exec/gen-icount.h     |  10 +
 include/exec/tb-context.h     |  12 +
 include/exec/tb-hash.h        |   7 +
 include/exec/tb-stats-flags.h |  43 ++
 include/exec/tb-stats.h       | 170 +++++++
 include/qemu/log-for-trace.h  |   4 +
 include/qemu/log.h            |   3 +
 include/qemu/timer.h          |   5 +-
 linux-user/exit.c             |   4 +
 monitor/misc.c                | 171 ++++++-
 tcg/tcg.c                     | 230 +++------
 tcg/tcg.h                     |  22 +-
 util/log.c                    |  90 +++-
 vl.c                          |   8 +-
 27 files changed, 1652 insertions(+), 284 deletions(-)
 create mode 100644 accel/tcg/tb-stats.c
 create mode 100644 include/exec/tb-stats-flags.h
 create mode 100644 include/exec/tb-stats.h

-- 
2.22.0




* [Qemu-devel] [PATCH v8 01/11] accel: introducing TBStatistics structure
  2019-08-29 17:34 [Qemu-devel] [PATCH v8 00/11] Measure Tiny Code Generation Quality vandersonmr
@ 2019-08-29 17:34 ` vandersonmr
  2019-08-30 12:59   ` Alex Bennée
  2019-08-29 17:34 ` [Qemu-devel] [PATCH v8 02/11] accel: collecting TB execution count vandersonmr
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 24+ messages in thread
From: vandersonmr @ 2019-08-29 17:34 UTC
  To: qemu-devel
  Cc: Alex Bennée, Paolo Bonzini, vandersonmr, Richard Henderson

To store statistics for each TB, we created a TBStatistics structure
which is linked with the TBs. A TBStatistics can stay alive after
tb_flush and be relinked to a regenerated TB, so the statistics
accumulate across flushes.

The goal is to have all present and future QEMU/TCG statistics and
metadata stored in this new structure.
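
As a rough illustration of the relinking described above (this is not code
from the patch), the sketch below keeps statistics records in a fixed-size
linear table instead of the qht hash table the patch uses. Every Toy* name
is hypothetical, and the translations counter is only there to make the
accumulation visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; none of these names exist in QEMU. */
typedef struct ToyTB ToyTB;

typedef struct ToyStats {
    uint64_t phys_pc, pc, cs_base;
    uint32_t flags;
    unsigned translations;  /* illustrative counter */
    ToyTB *tb;              /* currently linked TB, NULL after a flush */
    int used;
} ToyStats;

struct ToyTB {
    ToyStats *stats;
};

#define TOY_MAX_STATS 64
static ToyStats toy_table[TOY_MAX_STATS];

/* Find the record for this key, or claim a free slot on first use. */
static ToyStats *toy_get_stats(uint64_t phys_pc, uint64_t pc,
                               uint64_t cs_base, uint32_t flags)
{
    ToyStats *free_slot = NULL;

    for (size_t i = 0; i < TOY_MAX_STATS; i++) {
        ToyStats *s = &toy_table[i];
        if (s->used && s->phys_pc == phys_pc && s->pc == pc &&
            s->cs_base == cs_base && s->flags == flags) {
            return s;
        }
        if (!s->used && !free_slot) {
            free_slot = s;
        }
    }
    assert(free_slot != NULL);  /* the toy table is fixed-size */
    free_slot->used = 1;
    free_slot->phys_pc = phys_pc;
    free_slot->pc = pc;
    free_slot->cs_base = cs_base;
    free_slot->flags = flags;
    return free_slot;
}

/* "Translate" a block: link it to its (possibly pre-existing) record. */
static ToyStats *toy_translate(ToyTB *tb, uint64_t phys_pc, uint64_t pc,
                               uint64_t cs_base, uint32_t flags)
{
    ToyStats *s = toy_get_stats(phys_pc, pc, cs_base, flags);

    s->tb = tb;
    s->translations++;
    tb->stats = s;
    return s;
}

/* "Flush": every TB goes away, but the statistics records are kept. */
static void toy_flush(void)
{
    for (size_t i = 0; i < TOY_MAX_STATS; i++) {
        toy_table[i].tb = NULL;
    }
}
```

Translating the same key again after toy_flush() returns the same record,
so its counters keep growing instead of restarting from zero.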

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
---
 accel/tcg/Makefile.objs   |  2 +-
 accel/tcg/tb-stats.c      | 63 ++++++++++++++++++++++++++++++++++++
 accel/tcg/translate-all.c | 57 +++++++++++++++++++++++++++++++++
 include/exec/exec-all.h   | 15 +++------
 include/exec/tb-context.h | 12 +++++++
 include/exec/tb-hash.h    |  7 ++++
 include/exec/tb-stats.h   | 67 +++++++++++++++++++++++++++++++++++++++
 util/log.c                |  2 ++
 8 files changed, 213 insertions(+), 12 deletions(-)
 create mode 100644 accel/tcg/tb-stats.c
 create mode 100644 include/exec/tb-stats.h

diff --git a/accel/tcg/Makefile.objs b/accel/tcg/Makefile.objs
index d381a02f34..49ffe81b5d 100644
--- a/accel/tcg/Makefile.objs
+++ b/accel/tcg/Makefile.objs
@@ -2,7 +2,7 @@ obj-$(CONFIG_SOFTMMU) += tcg-all.o
 obj-$(CONFIG_SOFTMMU) += cputlb.o
 obj-y += tcg-runtime.o tcg-runtime-gvec.o
 obj-y += cpu-exec.o cpu-exec-common.o translate-all.o
-obj-y += translator.o
+obj-y += translator.o tb-stats.o
 
 obj-$(CONFIG_USER_ONLY) += user-exec.o
 obj-$(call lnot,$(CONFIG_SOFTMMU)) += user-exec-stub.o
diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
new file mode 100644
index 0000000000..948b107e68
--- /dev/null
+++ b/accel/tcg/tb-stats.c
@@ -0,0 +1,63 @@
+/*
+ * QEMU System Emulator, Code Quality Monitor System
+ *
+ * Copyright (c) 2019 Vanderson M. do Rosario <vandersonmr2@gmail.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+
+#include "disas/disas.h"
+
+#include "exec/tb-stats.h"
+
+void init_tb_stats_htable_if_not(void)
+{
+    if (tb_stats_collection_enabled() && !tb_ctx.tb_stats.map) {
+        qht_init(&tb_ctx.tb_stats, tb_stats_cmp,
+                CODE_GEN_HTABLE_SIZE, QHT_MODE_AUTO_RESIZE);
+    }
+}
+
+void enable_collect_tb_stats(void)
+{
+    init_tb_stats_htable_if_not();
+    tcg_collect_tb_stats = TB_STATS_RUNNING;
+}
+
+void disable_collect_tb_stats(void)
+{
+    tcg_collect_tb_stats = TB_STATS_STOPPED;
+}
+
+void pause_collect_tb_stats(void)
+{
+    tcg_collect_tb_stats = TB_STATS_PAUSED;
+}
+
+bool tb_stats_collection_enabled(void)
+{
+    return tcg_collect_tb_stats == TB_STATS_RUNNING;
+}
+
+bool tb_stats_collection_paused(void)
+{
+    return tcg_collect_tb_stats == TB_STATS_PAUSED;
+}
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 5d1e08b169..b7bccacd3b 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1118,6 +1118,23 @@ static inline void code_gen_alloc(size_t tb_size)
     }
 }
 
+/*
+ * This is more or less the same comparison as tb_cmp(), but the
+ * data persists over tb_flush. We also aggregate the various
+ * variations of cflags under one record and ignore the details of
+ * page overlap (although we can count it).
+ */
+bool tb_stats_cmp(const void *ap, const void *bp)
+{
+    const TBStatistics *a = ap;
+    const TBStatistics *b = bp;
+
+    return a->phys_pc == b->phys_pc &&
+        a->pc == b->pc &&
+        a->cs_base == b->cs_base &&
+        a->flags == b->flags;
+}
+
 static bool tb_cmp(const void *ap, const void *bp)
 {
     const TranslationBlock *a = ap;
@@ -1137,6 +1154,7 @@ static void tb_htable_init(void)
     unsigned int mode = QHT_MODE_AUTO_RESIZE;
 
     qht_init(&tb_ctx.htable, tb_cmp, CODE_GEN_HTABLE_SIZE, mode);
+    init_tb_stats_htable_if_not();
 }
 
 /* Must be called before using the QEMU cpus. 'tb_size' is the size
@@ -1666,6 +1684,34 @@ tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
     return tb;
 }
 
+static TBStatistics *tb_get_stats(tb_page_addr_t phys_pc, target_ulong pc,
+                                  target_ulong cs_base, uint32_t flags,
+                                  TranslationBlock *current_tb)
+{
+    TBStatistics *new_stats = g_new0(TBStatistics, 1);
+    uint32_t hash = tb_stats_hash_func(phys_pc, pc, flags);
+    void *existing_stats = NULL;
+    new_stats->phys_pc = phys_pc;
+    new_stats->pc = pc;
+    new_stats->cs_base = cs_base;
+    new_stats->flags = flags;
+    new_stats->tb = current_tb;
+
+    qht_insert(&tb_ctx.tb_stats, new_stats, hash, &existing_stats);
+
+    if (unlikely(existing_stats)) {
+        /*
+         * If there is already a TBStatistic for this TB from a previous flush
+         * then just make the new TB point to the older TBStatistic
+         */
+        g_free(new_stats);
+        return existing_stats;
+    } else {
+        return new_stats;
+    }
+}
+
+
 /* Called with mmap_lock held for user mode emulation.  */
 TranslationBlock *tb_gen_code(CPUState *cpu,
                               target_ulong pc, target_ulong cs_base,
@@ -1732,6 +1778,17 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     ti = profile_getclock();
 #endif
 
+    /*
+     * We want to fetch the stats structure before we start code
+     * generation so we can count interesting things about this
+     * generation.
+     */
+    if (tb_stats_collection_enabled()) {
+        tb->tb_stats = tb_get_stats(phys_pc, pc, cs_base, flags, tb);
+    } else {
+        tb->tb_stats = NULL;
+    }
+
     tcg_func_start(tcg_ctx);
 
     tcg_ctx->cpu = env_cpu(env);
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 135aeaab0d..1e0bd4e147 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -23,21 +23,11 @@
 #include "cpu.h"
 #include "exec/tb-context.h"
 #include "sysemu/cpus.h"
+#include "exec/tb-stats.h"
 
 /* allow to see translation results - the slowdown should be negligible, so we leave it */
 #define DEBUG_DISAS
 
-/* Page tracking code uses ram addresses in system mode, and virtual
-   addresses in userspace mode.  Define tb_page_addr_t to be an appropriate
-   type.  */
-#if defined(CONFIG_USER_ONLY)
-typedef abi_ulong tb_page_addr_t;
-#define TB_PAGE_ADDR_FMT TARGET_ABI_FMT_lx
-#else
-typedef ram_addr_t tb_page_addr_t;
-#define TB_PAGE_ADDR_FMT RAM_ADDR_FMT
-#endif
-
 #include "qemu/log.h"
 
 void gen_intermediate_code(CPUState *cpu, TranslationBlock *tb, int max_insns);
@@ -404,6 +394,9 @@ struct TranslationBlock {
     uintptr_t jmp_list_head;
     uintptr_t jmp_list_next[2];
     uintptr_t jmp_dest[2];
+
+    /* Pointer to a struct where statistics from the TB are stored */
+    TBStatistics *tb_stats;
 };
 
 extern bool parallel_cpus;
diff --git a/include/exec/tb-context.h b/include/exec/tb-context.h
index feb585e0a7..3cfb62a338 100644
--- a/include/exec/tb-context.h
+++ b/include/exec/tb-context.h
@@ -23,6 +23,17 @@
 #include "qemu/thread.h"
 #include "qemu/qht.h"
 
+/* Page tracking code uses ram addresses in system mode, and virtual
+   addresses in userspace mode.  Define tb_page_addr_t to be an appropriate
+   type.  */
+#if defined(CONFIG_USER_ONLY)
+typedef abi_ulong tb_page_addr_t;
+#define TB_PAGE_ADDR_FMT TARGET_ABI_FMT_lx
+#else
+typedef ram_addr_t tb_page_addr_t;
+#define TB_PAGE_ADDR_FMT RAM_ADDR_FMT
+#endif
+
 #define CODE_GEN_HTABLE_BITS     15
 #define CODE_GEN_HTABLE_SIZE     (1 << CODE_GEN_HTABLE_BITS)
 
@@ -35,6 +46,7 @@ struct TBContext {
 
     /* statistics */
     unsigned tb_flush_count;
+    struct qht tb_stats;
 };
 
 extern TBContext tb_ctx;
diff --git a/include/exec/tb-hash.h b/include/exec/tb-hash.h
index 805235d321..dd8e8f252a 100644
--- a/include/exec/tb-hash.h
+++ b/include/exec/tb-hash.h
@@ -66,4 +66,11 @@ uint32_t tb_hash_func(tb_page_addr_t phys_pc, target_ulong pc, uint32_t flags,
     return qemu_xxhash7(phys_pc, pc, flags, cf_mask, trace_vcpu_dstate);
 }
 
+static inline
+uint32_t tb_stats_hash_func(tb_page_addr_t phys_pc, target_ulong pc,
+                            uint32_t flags)
+{
+    return qemu_xxhash5(phys_pc, pc, flags);
+}
+
 #endif
diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
new file mode 100644
index 0000000000..898e05a36f
--- /dev/null
+++ b/include/exec/tb-stats.h
@@ -0,0 +1,67 @@
+/*
+ * QEMU System Emulator, Code Quality Monitor System
+ *
+ * Copyright (c) 2019 Vanderson M. do Rosario <vandersonmr2@gmail.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef TB_STATS_H
+
+#define TB_STATS_H
+
+#include "exec/cpu-common.h"
+#include "exec/tb-context.h"
+#include "tcg.h"
+
+typedef struct TBStatistics TBStatistics;
+
+/*
+ * This struct stores statistics such as execution count of the
+ * TranslationBlocks. Each set of TBs for a given phys_pc/pc/flags
+ * has its own TBStatistics which will persist over tb_flush.
+ *
+ * We include additional counters to track number of translations as
+ * well as variants for compile flags.
+ */
+struct TBStatistics {
+    tb_page_addr_t phys_pc;
+    target_ulong pc;
+    uint32_t     flags;
+    /* cs_base isn't included in the hash but we do check for matches */
+    target_ulong cs_base;
+    /* current TB linked to this TBStatistics */
+    TranslationBlock *tb;
+};
+
+bool tb_stats_cmp(const void *ap, const void *bp);
+
+void init_tb_stats_htable_if_not(void);
+
+/* TBStatistic collection controls */
+enum TBStatsStatus { TB_STATS_RUNNING, TB_STATS_PAUSED, TB_STATS_STOPPED };
+extern int tcg_collect_tb_stats;
+
+void enable_collect_tb_stats(void);
+void disable_collect_tb_stats(void);
+void pause_collect_tb_stats(void);
+bool tb_stats_collection_enabled(void);
+bool tb_stats_collection_paused(void);
+
+#endif
diff --git a/util/log.c b/util/log.c
index 1d1b33f7d9..393a17115b 100644
--- a/util/log.c
+++ b/util/log.c
@@ -31,6 +31,8 @@ int qemu_loglevel;
 static int log_append = 0;
 static GArray *debug_regions;
 
+int tcg_collect_tb_stats;
+
 /* Return the number of characters emitted.  */
 int qemu_log(const char *fmt, ...)
 {
-- 
2.22.0




* [Qemu-devel] [PATCH v8 02/11] accel: collecting TB execution count
  2019-08-29 17:34 [Qemu-devel] [PATCH v8 00/11] Measure Tiny Code Generation Quality vandersonmr
  2019-08-29 17:34 ` [Qemu-devel] [PATCH v8 01/11] accel: introducing TBStatistics structure vandersonmr
@ 2019-08-29 17:34 ` vandersonmr
  2019-08-30 10:21   ` Alex Bennée
  2019-08-29 17:34 ` [Qemu-devel] [PATCH v8 03/11] accel: collecting JIT statistics vandersonmr
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 24+ messages in thread
From: vandersonmr @ 2019-08-29 17:34 UTC
  To: qemu-devel; +Cc: Paolo Bonzini, vandersonmr, Richard Henderson

If a TB has a TBS (TBStatistics) with TB_EXEC_STATS
enabled, then we instrument the start of this TB
to atomically count the number of times it is executed.
We count both the number of "normal" executions and the number of
executions through cpu_exec_step_atomic().

The execution count of the TB is stored in its respective
TBS.

All TBStatistics are created by default with the flags from
default_tbstats_flag.
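
A minimal sketch of the idea, assuming C11 atomics in place of QEMU's
atomic_inc() and a boolean in place of the emitted helper call. The Toy*
names are hypothetical. The point it tries to show is that the decision
to count is taken when the block is translated, and the counter is bumped
each time the block runs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_EXEC_STATS (1u << 1)  /* same bit position as TB_EXEC_STATS */

/* Hypothetical stand-in for the execution part of TBStatistics. */
typedef struct ToyExecStats {
    uint32_t stats_enabled;
    atomic_ulong normal;  /* may be bumped from several vCPU threads */
} ToyExecStats;

typedef struct ToyBlock {
    ToyExecStats *stats;
    bool counts_executions;  /* stands in for the emitted helper call */
} ToyBlock;

/* Same three-part check as the tb_stats_enabled() macro. */
static bool toy_stats_enabled(const ToyBlock *tb, uint32_t flag)
{
    return tb && tb->stats && (tb->stats->stats_enabled & flag);
}

/* Translation time: decide once whether this block gets instrumented. */
static void toy_translate_block(ToyBlock *tb)
{
    assert(tb != NULL);
    tb->counts_executions = toy_stats_enabled(tb, TOY_EXEC_STATS);
}

/* Execution time: an instrumented block bumps its counter on entry. */
static void toy_run_block(ToyBlock *tb)
{
    assert(tb != NULL);
    if (tb->counts_executions) {
        atomic_fetch_add(&tb->stats->normal, 1);
    }
    /* ... the translated guest code would run here ... */
}
```

In this model a block translated without the flag never pays for the
counter, and clearing the flag later does not remove instrumentation from
blocks that were already translated.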

Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
---
 accel/tcg/cpu-exec.c      |  4 ++++
 accel/tcg/tb-stats.c      |  5 +++++
 accel/tcg/tcg-runtime.c   |  7 +++++++
 accel/tcg/tcg-runtime.h   |  2 ++
 accel/tcg/translate-all.c |  7 +++++++
 accel/tcg/translator.c    |  1 +
 include/exec/gen-icount.h |  9 +++++++++
 include/exec/tb-stats.h   | 19 +++++++++++++++++++
 util/log.c                |  1 +
 9 files changed, 55 insertions(+)

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 48272c781b..9b2b7bff80 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -251,6 +251,10 @@ void cpu_exec_step_atomic(CPUState *cpu)
 
         start_exclusive();
 
+        if (tb_stats_enabled(tb, TB_EXEC_STATS)) {
+            tb->tb_stats->executions.atomic++;
+        }
+
         /* Since we got here, we know that parallel_cpus must be true.  */
         parallel_cpus = false;
         in_exclusive_region = true;
diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
index 948b107e68..1db81d83e7 100644
--- a/accel/tcg/tb-stats.c
+++ b/accel/tcg/tb-stats.c
@@ -61,3 +61,8 @@ bool tb_stats_collection_paused(void)
 {
     return tcg_collect_tb_stats == TB_STATS_PAUSED;
 }
+
+uint32_t get_default_tbstats_flag(void)
+{
+    return default_tbstats_flag;
+}
diff --git a/accel/tcg/tcg-runtime.c b/accel/tcg/tcg-runtime.c
index 8a1e408e31..6f4aafba11 100644
--- a/accel/tcg/tcg-runtime.c
+++ b/accel/tcg/tcg-runtime.c
@@ -167,3 +167,10 @@ void HELPER(exit_atomic)(CPUArchState *env)
 {
     cpu_loop_exit_atomic(env_cpu(env), GETPC());
 }
+
+void HELPER(inc_exec_freq)(void *ptr)
+{
+    TBStatistics *stats = (TBStatistics *) ptr;
+    g_assert(stats);
+    atomic_inc(&stats->executions.normal);
+}
diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h
index 4fa61b49b4..bf0b75dbe8 100644
--- a/accel/tcg/tcg-runtime.h
+++ b/accel/tcg/tcg-runtime.h
@@ -28,6 +28,8 @@ DEF_HELPER_FLAGS_1(lookup_tb_ptr, TCG_CALL_NO_WG_SE, ptr, env)
 
 DEF_HELPER_FLAGS_1(exit_atomic, TCG_CALL_NO_WG, noreturn, env)
 
+DEF_HELPER_FLAGS_1(inc_exec_freq, TCG_CALL_NO_RWG, void, ptr)
+
 #ifdef CONFIG_SOFTMMU
 
 DEF_HELPER_FLAGS_5(atomic_cmpxchgb, TCG_CALL_NO_WG,
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index b7bccacd3b..e72aeba682 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1785,6 +1785,13 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
      */
     if (tb_stats_collection_enabled()) {
         tb->tb_stats = tb_get_stats(phys_pc, pc, cs_base, flags, tb);
+
+        if (qemu_log_in_addr_range(tb->pc)) {
+            uint32_t flag = get_default_tbstats_flag();
+            if (flag & TB_EXEC_STATS) {
+                tb->tb_stats->stats_enabled |= TB_EXEC_STATS;
+            }
+        }
     } else {
         tb->tb_stats = NULL;
     }
diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
index 70c66c538c..ec6bd829a0 100644
--- a/accel/tcg/translator.c
+++ b/accel/tcg/translator.c
@@ -46,6 +46,7 @@ void translator_loop(const TranslatorOps *ops, DisasContextBase *db,
 
     ops->init_disas_context(db, cpu);
     tcg_debug_assert(db->is_jmp == DISAS_NEXT);  /* no early exit */
+    gen_tb_exec_count(tb);
 
     /* Reset the temp count so that we can identify leaks */
     tcg_clear_temp_count();
diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h
index 822c43cfd3..be006383b9 100644
--- a/include/exec/gen-icount.h
+++ b/include/exec/gen-icount.h
@@ -32,6 +32,15 @@ static inline void gen_io_end(void)
     tcg_temp_free_i32(tmp);
 }
 
+static inline void gen_tb_exec_count(TranslationBlock *tb)
+{
+    if (tb_stats_enabled(tb, TB_EXEC_STATS)) {
+        TCGv_ptr ptr = tcg_const_ptr(tb->tb_stats);
+        gen_helper_inc_exec_freq(ptr);
+        tcg_temp_free_ptr(ptr);
+    }
+}
+
 static inline void gen_tb_start(TranslationBlock *tb)
 {
     TCGv_i32 count, imm;
diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
index 898e05a36f..c4a8715400 100644
--- a/include/exec/tb-stats.h
+++ b/include/exec/tb-stats.h
@@ -30,6 +30,9 @@
 #include "exec/tb-context.h"
 #include "tcg.h"
 
+#define tb_stats_enabled(tb, JIT_STATS) \
+    ((tb) && (tb)->tb_stats && ((tb)->tb_stats->stats_enabled & (JIT_STATS)))
+
 typedef struct TBStatistics TBStatistics;
 
 /*
@@ -46,6 +49,15 @@ struct TBStatistics {
     uint32_t     flags;
     /* cs_base isn't included in the hash but we do check for matches */
     target_ulong cs_base;
+
+    uint32_t stats_enabled;
+
+    /* Execution stats */
+    struct {
+        unsigned long normal;
+        unsigned long atomic;
+    } executions;
+
     /* current TB linked to this TBStatistics */
     TranslationBlock *tb;
 };
@@ -56,7 +68,12 @@ void init_tb_stats_htable_if_not(void);
 
 /* TBStatistic collection controls */
 enum TBStatsStatus { TB_STATS_RUNNING, TB_STATS_PAUSED, TB_STATS_STOPPED };
+
+#define TB_NOTHING    (1 << 0)
+#define TB_EXEC_STATS (1 << 1)
+
 extern int tcg_collect_tb_stats;
+extern uint32_t default_tbstats_flag;
 
 void enable_collect_tb_stats(void);
 void disable_collect_tb_stats(void);
@@ -64,4 +81,6 @@ void pause_collect_tb_stats(void);
 bool tb_stats_collection_enabled(void);
 bool tb_stats_collection_paused(void);
 
+uint32_t get_default_tbstats_flag(void);
+
 #endif
diff --git a/util/log.c b/util/log.c
index 393a17115b..29021a4584 100644
--- a/util/log.c
+++ b/util/log.c
@@ -32,6 +32,7 @@ static int log_append = 0;
 static GArray *debug_regions;
 
 int tcg_collect_tb_stats;
+uint32_t default_tbstats_flag;
 
 /* Return the number of characters emitted.  */
 int qemu_log(const char *fmt, ...)
-- 
2.22.0




* [Qemu-devel] [PATCH v8 03/11] accel: collecting JIT statistics
  2019-08-29 17:34 [Qemu-devel] [PATCH v8 00/11] Measure Tiny Code Generation Quality vandersonmr
  2019-08-29 17:34 ` [Qemu-devel] [PATCH v8 01/11] accel: introducing TBStatistics structure vandersonmr
  2019-08-29 17:34 ` [Qemu-devel] [PATCH v8 02/11] accel: collecting TB execution count vandersonmr
@ 2019-08-29 17:34 ` vandersonmr
  2019-08-30 13:10   ` Alex Bennée
  2019-08-29 17:34 ` [Qemu-devel] [PATCH v8 04/11] accel: replacing part of CONFIG_PROFILER with TBStats vandersonmr
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 24+ messages in thread
From: vandersonmr @ 2019-08-29 17:34 UTC
  To: qemu-devel
  Cc: Alex Bennée, Paolo Bonzini, vandersonmr, Richard Henderson

If a TB has a TBS (TBStatistics) with TB_JIT_STATS
enabled, then we collect statistics about its translation
process and the code it generates.

Collecting the number of host instructions does not seem to be
simple, as it would require modifying several
target source files. So, for now, we only collect
the size of the generated host code.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
---
 accel/tcg/translate-all.c | 15 ++++++++++++++-
 accel/tcg/translator.c    |  4 ++++
 include/exec/tb-stats.h   | 15 +++++++++++++++
 tcg/tcg.c                 | 23 +++++++++++++++++++++++
 tcg/tcg.h                 |  2 ++
 5 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index e72aeba682..fb2fe0fa1f 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1705,6 +1705,7 @@ static TBStatistics *tb_get_stats(tb_page_addr_t phys_pc, target_ulong pc,
          * then just make the new TB point to the older TBStatistic
          */
         g_free(new_stats);
+        ((TBStatistics *) existing_stats)->tb = current_tb;
         return existing_stats;
     } else {
         return new_stats;
@@ -1785,13 +1786,18 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
      */
     if (tb_stats_collection_enabled()) {
         tb->tb_stats = tb_get_stats(phys_pc, pc, cs_base, flags, tb);
+        uint32_t flag = get_default_tbstats_flag();
 
         if (qemu_log_in_addr_range(tb->pc)) {
-            uint32_t flag = get_default_tbstats_flag();
             if (flag & TB_EXEC_STATS) {
                 tb->tb_stats->stats_enabled |= TB_EXEC_STATS;
             }
         }
+
+        if (flag & TB_JIT_STATS) {
+            tb->tb_stats->stats_enabled |= TB_JIT_STATS;
+            atomic_inc(&tb->tb_stats->translations.total);
+        }
     } else {
         tb->tb_stats = NULL;
     }
@@ -1869,6 +1875,10 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     atomic_set(&prof->search_out_len, prof->search_out_len + search_size);
 #endif
 
+    if (tb_stats_enabled(tb, TB_JIT_STATS)) {
+        atomic_add(&tb->tb_stats->code.out_len, gen_code_size);
+    }
+
 #ifdef DEBUG_DISAS
     if (qemu_loglevel_mask(CPU_LOG_TB_OUT_ASM) &&
         qemu_log_in_addr_range(tb->pc)) {
@@ -1926,6 +1936,9 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     phys_page2 = -1;
     if ((pc & TARGET_PAGE_MASK) != virt_page2) {
         phys_page2 = get_page_addr_code(env, virt_page2);
+        if (tb_stats_enabled(tb, TB_JIT_STATS)) {
+            atomic_inc(&tb->tb_stats->translations.spanning);
+        }
     }
     /*
      * No explicit memory barrier is required -- tb_link_page() makes the
diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
index ec6bd829a0..9b2e248b09 100644
--- a/accel/tcg/translator.c
+++ b/accel/tcg/translator.c
@@ -116,6 +116,10 @@ void translator_loop(const TranslatorOps *ops, DisasContextBase *db,
     db->tb->size = db->pc_next - db->pc_first;
     db->tb->icount = db->num_insns;
 
+    if (tb_stats_enabled(tb, TB_JIT_STATS)) {
+        atomic_add(&db->tb->tb_stats->code.num_guest_inst, db->num_insns);
+    }
+
 #ifdef DEBUG_DISAS
     if (qemu_loglevel_mask(CPU_LOG_TB_IN_ASM)
         && qemu_log_in_addr_range(db->pc_first)) {
diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
index c4a8715400..b68edd5d24 100644
--- a/include/exec/tb-stats.h
+++ b/include/exec/tb-stats.h
@@ -58,6 +58,20 @@ struct TBStatistics {
         unsigned long atomic;
     } executions;
 
+    struct {
+        unsigned num_guest_inst;
+        unsigned num_tcg_ops;
+        unsigned num_tcg_ops_opt;
+        unsigned spills;
+        unsigned out_len;
+    } code;
+
+    struct {
+        unsigned long total;
+        unsigned long uncached;
+        unsigned long spanning;
+    } translations;
+
     /* current TB linked to this TBStatistics */
     TranslationBlock *tb;
 };
@@ -71,6 +85,7 @@ enum TBStatsStatus { TB_STATS_RUNNING, TB_STATS_PAUSED, TB_STATS_STOPPED };
 
 #define TB_NOTHING    (1 << 0)
 #define TB_EXEC_STATS (1 << 1)
+#define TB_JIT_STATS  (1 << 2)
 
 extern int tcg_collect_tb_stats;
 extern uint32_t default_tbstats_flag;
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 0458eaec57..ae3e7a2217 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -3125,6 +3125,11 @@ static void temp_sync(TCGContext *s, TCGTemp *ts, TCGRegSet allocated_regs,
         case TEMP_VAL_REG:
             tcg_out_st(s, ts->type, ts->reg,
                        ts->mem_base->reg, ts->mem_offset);
+
+            /* Count number of spills */
+            if (tb_stats_enabled(s->current_tb, TB_JIT_STATS)) {
+                atomic_inc(&s->current_tb->tb_stats->code.spills);
+            }
             break;
 
         case TEMP_VAL_MEM:
@@ -3996,6 +4001,8 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
     int i, num_insns;
     TCGOp *op;
 
+    s->current_tb = tb;
+
 #ifdef CONFIG_PROFILER
     {
         int n = 0;
@@ -4027,6 +4034,14 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
     }
 #endif
 
+    if (tb_stats_enabled(tb, TB_JIT_STATS)) {
+        int n = 0;
+        QTAILQ_FOREACH(op, &s->ops, link) {
+            n++;
+        }
+        atomic_add(&tb->tb_stats->code.num_tcg_ops, n);
+    }
+
 #ifdef CONFIG_DEBUG_TCG
     /* Ensure all labels referenced have been emitted.  */
     {
@@ -4093,6 +4108,14 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
     }
 #endif
 
+    if (tb_stats_enabled(tb, TB_JIT_STATS)) {
+        int n = 0;
+        QTAILQ_FOREACH(op, &s->ops, link) {
+            n++;
+        }
+        atomic_add(&tb->tb_stats->code.num_tcg_ops_opt, n);
+    }
+
     tcg_reg_alloc_start(s);
 
     s->code_buf = tb->tc.ptr;
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 529acb2ed8..b4601162f8 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -740,6 +740,8 @@ struct TCGContext {
 
     uint16_t gen_insn_end_off[TCG_MAX_INSNS];
     target_ulong gen_insn_data[TCG_MAX_INSNS][TARGET_INSN_START_WORDS];
+
+    TranslationBlock *current_tb;
 };
 
 extern TCGContext tcg_init_ctx;
-- 
2.22.0




* [Qemu-devel] [PATCH v8 04/11] accel: replacing part of CONFIG_PROFILER with TBStats
  2019-08-29 17:34 [Qemu-devel] [PATCH v8 00/11] Measure Tiny Code Generation Quality vandersonmr
                   ` (2 preceding siblings ...)
  2019-08-29 17:34 ` [Qemu-devel] [PATCH v8 03/11] accel: collecting JIT statistics vandersonmr
@ 2019-08-29 17:34 ` vandersonmr
  2019-08-29 17:34 ` [Qemu-devel] [PATCH v8 05/11] accel: adding TB_JIT_TIME and full replacing CONFIG_PROFILER vandersonmr
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 24+ messages in thread
From: vandersonmr @ 2019-08-29 17:34 UTC
  To: qemu-devel
  Cc: Alex Bennée, Paolo Bonzini, vandersonmr, Richard Henderson

We add some of the statistics collected in the TCGProfiler
to the TBStats, so that the statistics are available not only for
the whole emulation but for each TB. Then, we remove these stats
from TCGProfiler and reconstruct the information for
"info jit" using the sum of all TBStats statistics.
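
The reconstruction can be pictured with the following sketch, which folds
an array of per-TB records into one summary. It is not the code from this
patch: the Toy* names are hypothetical, a plain array replaces the qht
iteration, and toy_per_translation() is an assumed reading of what a
per-translation average means.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-TB record; sums cover all translations of the TB. */
typedef struct ToyTBStats {
    unsigned long translations;
    unsigned long tcg_ops;
    unsigned long host_bytes;
} ToyTBStats;

/* Hypothetical global summary rebuilt from the per-TB records. */
typedef struct ToySummary {
    uint64_t translations;
    uint64_t ops;
    uint64_t host;
    unsigned long ops_max;  /* largest per-translation op count seen */
} ToySummary;

/* Assumed per-translation average; returns 0 when never translated. */
static unsigned long toy_per_translation(unsigned long total,
                                         unsigned long translations)
{
    return translations ? total / translations : 0;
}

/* Fold every per-TB record into one whole-emulation summary. */
static ToySummary toy_summarize(const ToyTBStats *all, size_t n)
{
    ToySummary sum = {0};

    assert(all != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        unsigned long per = toy_per_translation(all[i].tcg_ops,
                                                all[i].translations);
        sum.translations += all[i].translations;
        sum.ops += all[i].tcg_ops;
        sum.host += all[i].host_bytes;
        if (per > sum.ops_max) {
            sum.ops_max = per;
        }
    }
    return sum;
}

/* Average ops per translated TB, as a global figure. */
static double toy_avg_ops_per_tb(const ToySummary *sum)
{
    assert(sum != NULL);
    if (sum->translations == 0) {
        return 0.0;
    }
    return (double)sum->ops / (double)sum->translations;
}
```

The global averages fall out of the sums, while the maximum has to be
tracked per record because it cannot be recovered from totals alone.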

The goal is to have one unique and better way of collecting
emulation statistics. Moreover, checking dynamically whether
profiling is enabled proved to have an insignificant impact
on performance:
https://wiki.qemu.org/Internships/ProjectIdeas/TCGCodeQuality#Overheads

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
---
 accel/tcg/tb-stats.c      | 96 +++++++++++++++++++++++++++++++++++++++
 accel/tcg/translate-all.c |  8 +---
 include/exec/tb-stats.h   | 11 +++++
 tcg/tcg.c                 | 93 ++++---------------------------------
 tcg/tcg.h                 | 10 ----
 5 files changed, 119 insertions(+), 99 deletions(-)

diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
index 1db81d83e7..176da60e13 100644
--- a/accel/tcg/tb-stats.c
+++ b/accel/tcg/tb-stats.c
@@ -25,9 +25,105 @@
 #include "qemu/osdep.h"
 
 #include "disas/disas.h"
+#include "exec/exec-all.h"
+#include "tcg.h"
+
+#include "qemu/qemu-print.h"
 
 #include "exec/tb-stats.h"
 
+struct jit_profile_info {
+    uint64_t translations;
+    uint64_t aborted;
+    uint64_t ops;
+    unsigned ops_max;
+    uint64_t del_ops;
+    uint64_t temps;
+    unsigned temps_max;
+    uint64_t host;
+    uint64_t guest;
+    uint64_t search_data;
+};
+
+/* accumulate the statistics from all TBs */
+static void collect_jit_profile_info(void *p, uint32_t hash, void *userp)
+{
+    struct jit_profile_info *jpi = userp;
+    TBStatistics *tbs = p;
+
+    jpi->translations += tbs->translations.total;
+    jpi->ops += tbs->code.num_tcg_ops;
+    if (stat_per_translation(tbs, code.num_tcg_ops) > jpi->ops_max) {
+        jpi->ops_max = stat_per_translation(tbs, code.num_tcg_ops);
+    }
+    jpi->del_ops += tbs->code.deleted_ops;
+    jpi->temps += tbs->code.temps;
+    if (stat_per_translation(tbs, code.temps) > jpi->temps_max) {
+        jpi->temps_max = stat_per_translation(tbs, code.temps);
+    }
+    jpi->host += tbs->code.out_len;
+    jpi->guest += tbs->code.in_len;
+    jpi->search_data += tbs->code.search_out_len;
+}
+
+/* dump JIT statistics using TCGProfile and TBStats */
+void dump_jit_profile_info(TCGProfile *s)
+{
+    if (!tb_stats_collection_enabled()) {
+        return;
+    }
+
+    struct jit_profile_info *jpi = g_new0(struct jit_profile_info, 1);
+
+    qht_iter(&tb_ctx.tb_stats, collect_jit_profile_info, jpi);
+
+    if (jpi->translations) {
+        qemu_printf("translated TBs      %" PRId64 "\n", jpi->translations);
+        qemu_printf("avg ops/TB          %0.1f max=%d\n",
+                jpi->ops / (double) jpi->translations, jpi->ops_max);
+        qemu_printf("deleted ops/TB      %0.2f\n",
+                jpi->del_ops / (double) jpi->translations);
+        qemu_printf("avg temps/TB        %0.2f max=%d\n",
+                jpi->temps / (double) jpi->translations, jpi->temps_max);
+        qemu_printf("avg host code/TB    %0.1f\n",
+                jpi->host / (double) jpi->translations);
+        qemu_printf("avg search data/TB  %0.1f\n",
+                jpi->search_data / (double) jpi->translations);
+
+        if (s) {
+            int64_t tot = s->interm_time + s->code_time;
+            qemu_printf("JIT cycles          %" PRId64 " (%0.3f s at 2.4 GHz)\n",
+                            tot, tot / 2.4e9);
+            qemu_printf("cycles/op           %0.1f\n",
+                        jpi->ops ? (double)tot / jpi->ops : 0);
+            qemu_printf("cycles/in byte      %0.1f\n",
+                        jpi->guest ? (double)tot / jpi->guest : 0);
+            qemu_printf("cycles/out byte     %0.1f\n",
+                        jpi->host ? (double)tot / jpi->host : 0);
+            qemu_printf("cycles/search byte     %0.1f\n",
+                        jpi->search_data ? (double)tot / jpi->search_data : 0);
+            if (tot == 0) {
+                tot = 1;
+            }
+            qemu_printf("  gen_interm time   %0.1f%%\n",
+                        (double)s->interm_time / tot * 100.0);
+            qemu_printf("  gen_code time     %0.1f%%\n",
+                        (double)s->code_time / tot * 100.0);
+            qemu_printf("optim./code time    %0.1f%%\n",
+                        (double)s->opt_time / (s->code_time ? s->code_time : 1)
+                        * 100.0);
+            qemu_printf("liveness/code time  %0.1f%%\n",
+                    (double)s->la_time / (s->code_time ? s->code_time : 1) * 100.0);
+            qemu_printf("cpu_restore count   %" PRId64 "\n",
+                    s->restore_count);
+            qemu_printf("  avg cycles        %0.1f\n",
+                    s->restore_count ? (double)s->restore_time / s->restore_count : 0);
+        }
+    }
+    g_free(jpi);
+}
+
+
 void init_tb_stats_htable_if_not(void)
 {
     if (tb_stats_collection_enabled() && !tb_ctx.tb_stats.map) {
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index fb2fe0fa1f..fe890f0782 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1774,8 +1774,6 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
  tb_overflow:
 
 #ifdef CONFIG_PROFILER
-    /* includes aborted translations because of exceptions */
-    atomic_set(&prof->tb_count1, prof->tb_count1 + 1);
     ti = profile_getclock();
 #endif
 
@@ -1823,7 +1821,6 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     }
 
 #ifdef CONFIG_PROFILER
-    atomic_set(&prof->tb_count, prof->tb_count + 1);
     atomic_set(&prof->interm_time, prof->interm_time + profile_getclock() - ti);
     ti = profile_getclock();
 #endif
@@ -1870,13 +1867,12 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 
 #ifdef CONFIG_PROFILER
     atomic_set(&prof->code_time, prof->code_time + profile_getclock() - ti);
-    atomic_set(&prof->code_in_len, prof->code_in_len + tb->size);
-    atomic_set(&prof->code_out_len, prof->code_out_len + gen_code_size);
-    atomic_set(&prof->search_out_len, prof->search_out_len + search_size);
 #endif
 
     if (tb_stats_enabled(tb, TB_JIT_STATS)) {
+        atomic_add(&tb->tb_stats->code.in_len, tb->size);
         atomic_add(&tb->tb_stats->code.out_len, gen_code_size);
+        atomic_add(&tb->tb_stats->code.search_out_len, search_size);
     }
 
 #ifdef DEBUG_DISAS
diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
index b68edd5d24..b9d5a343be 100644
--- a/include/exec/tb-stats.h
+++ b/include/exec/tb-stats.h
@@ -33,6 +33,9 @@
 #define tb_stats_enabled(tb, JIT_STATS) \
     (tb && tb->tb_stats && (tb->tb_stats->stats_enabled & JIT_STATS))
 
+#define stat_per_translation(stat, name) \
+    (stat->translations.total ? stat->name / stat->translations.total : 0)
+
 typedef struct TBStatistics TBStatistics;
 
 /*
@@ -63,7 +66,13 @@ struct TBStatistics {
         unsigned num_tcg_ops;
         unsigned num_tcg_ops_opt;
         unsigned spills;
+
+        /* CONFIG_PROFILER */
+        unsigned temps;
+        unsigned deleted_ops;
+        unsigned in_len;
         unsigned out_len;
+        unsigned search_out_len;
     } code;
 
     struct {
@@ -80,6 +89,8 @@ bool tb_stats_cmp(const void *ap, const void *bp);
 
 void init_tb_stats_htable_if_not(void);
 
+void dump_jit_profile_info(TCGProfile *s);
+
 /* TBStatistic collection controls */
 enum TBStatsStatus { TB_STATS_RUNNING, TB_STATS_PAUSED, TB_STATS_STOPPED };
 
diff --git a/tcg/tcg.c b/tcg/tcg.c
index ae3e7a2217..03b11495aa 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -2294,9 +2294,9 @@ void tcg_op_remove(TCGContext *s, TCGOp *op)
     QTAILQ_INSERT_TAIL(&s->free_ops, op, link);
     s->nb_ops--;
 
-#ifdef CONFIG_PROFILER
-    atomic_set(&s->prof.del_op_count, s->prof.del_op_count + 1);
-#endif
+    if (tb_stats_enabled(s->current_tb, TB_JIT_STATS)) {
+        atomic_inc(&s->current_tb->tb_stats->code.deleted_ops);
+    }
 }
 
 static TCGOp *tcg_op_alloc(TCGOpcode opc)
@@ -3913,16 +3913,6 @@ void tcg_profile_snapshot(TCGProfile *prof, bool counters, bool table)
 
         if (counters) {
             PROF_ADD(prof, orig, cpu_exec_time);
-            PROF_ADD(prof, orig, tb_count1);
-            PROF_ADD(prof, orig, tb_count);
-            PROF_ADD(prof, orig, op_count);
-            PROF_MAX(prof, orig, op_count_max);
-            PROF_ADD(prof, orig, temp_count);
-            PROF_MAX(prof, orig, temp_count_max);
-            PROF_ADD(prof, orig, del_op_count);
-            PROF_ADD(prof, orig, code_in_len);
-            PROF_ADD(prof, orig, code_out_len);
-            PROF_ADD(prof, orig, search_out_len);
             PROF_ADD(prof, orig, interm_time);
             PROF_ADD(prof, orig, code_time);
             PROF_ADD(prof, orig, la_time);
@@ -4002,26 +3992,17 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
     TCGOp *op;
 
     s->current_tb = tb;
-
-#ifdef CONFIG_PROFILER
-    {
+    if (tb_stats_enabled(tb, TB_JIT_STATS)) {
         int n = 0;
 
         QTAILQ_FOREACH(op, &s->ops, link) {
             n++;
         }
-        atomic_set(&prof->op_count, prof->op_count + n);
-        if (n > prof->op_count_max) {
-            atomic_set(&prof->op_count_max, n);
-        }
+        atomic_add(&tb->tb_stats->code.num_tcg_ops, n);
 
         n = s->nb_temps;
-        atomic_set(&prof->temp_count, prof->temp_count + n);
-        if (n > prof->temp_count_max) {
-            atomic_set(&prof->temp_count_max, n);
-        }
+        atomic_add(&tb->tb_stats->code.temps, n);
     }
-#endif
 
 #ifdef DEBUG_DISAS
     if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP)
@@ -4228,70 +4209,16 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
     return tcg_current_code_size(s);
 }
 
-#ifdef CONFIG_PROFILER
 void tcg_dump_info(void)
 {
+    TCGProfile *s = NULL;
+#ifdef CONFIG_PROFILER
     TCGProfile prof = {};
-    const TCGProfile *s;
-    int64_t tb_count;
-    int64_t tb_div_count;
-    int64_t tot;
-
     tcg_profile_snapshot_counters(&prof);
     s = &prof;
-    tb_count = s->tb_count;
-    tb_div_count = tb_count ? tb_count : 1;
-    tot = s->interm_time + s->code_time;
-
-    qemu_printf("JIT cycles          %" PRId64 " (%0.3f s at 2.4 GHz)\n",
-                tot, tot / 2.4e9);
-    qemu_printf("translated TBs      %" PRId64 " (aborted=%" PRId64
-                " %0.1f%%)\n",
-                tb_count, s->tb_count1 - tb_count,
-                (double)(s->tb_count1 - s->tb_count)
-                / (s->tb_count1 ? s->tb_count1 : 1) * 100.0);
-    qemu_printf("avg ops/TB          %0.1f max=%d\n",
-                (double)s->op_count / tb_div_count, s->op_count_max);
-    qemu_printf("deleted ops/TB      %0.2f\n",
-                (double)s->del_op_count / tb_div_count);
-    qemu_printf("avg temps/TB        %0.2f max=%d\n",
-                (double)s->temp_count / tb_div_count, s->temp_count_max);
-    qemu_printf("avg host code/TB    %0.1f\n",
-                (double)s->code_out_len / tb_div_count);
-    qemu_printf("avg search data/TB  %0.1f\n",
-                (double)s->search_out_len / tb_div_count);
-    
-    qemu_printf("cycles/op           %0.1f\n",
-                s->op_count ? (double)tot / s->op_count : 0);
-    qemu_printf("cycles/in byte      %0.1f\n",
-                s->code_in_len ? (double)tot / s->code_in_len : 0);
-    qemu_printf("cycles/out byte     %0.1f\n",
-                s->code_out_len ? (double)tot / s->code_out_len : 0);
-    qemu_printf("cycles/search byte     %0.1f\n",
-                s->search_out_len ? (double)tot / s->search_out_len : 0);
-    if (tot == 0) {
-        tot = 1;
-    }
-    qemu_printf("  gen_interm time   %0.1f%%\n",
-                (double)s->interm_time / tot * 100.0);
-    qemu_printf("  gen_code time     %0.1f%%\n",
-                (double)s->code_time / tot * 100.0);
-    qemu_printf("optim./code time    %0.1f%%\n",
-                (double)s->opt_time / (s->code_time ? s->code_time : 1)
-                * 100.0);
-    qemu_printf("liveness/code time  %0.1f%%\n",
-                (double)s->la_time / (s->code_time ? s->code_time : 1) * 100.0);
-    qemu_printf("cpu_restore count   %" PRId64 "\n",
-                s->restore_count);
-    qemu_printf("  avg cycles        %0.1f\n",
-                s->restore_count ? (double)s->restore_time / s->restore_count : 0);
-}
-#else
-void tcg_dump_info(void)
-{
-    qemu_printf("[TCG profiler not compiled]\n");
-}
 #endif
+    dump_jit_profile_info(s);
+}
 
 #ifdef ELF_HOST_MACHINE
 /* In order to use this feature, the backend needs to do three things:
diff --git a/tcg/tcg.h b/tcg/tcg.h
index b4601162f8..a8d779f7e6 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -646,16 +646,6 @@ QEMU_BUILD_BUG_ON(NB_OPS > (1 << 8));
 
 typedef struct TCGProfile {
     int64_t cpu_exec_time;
-    int64_t tb_count1;
-    int64_t tb_count;
-    int64_t op_count; /* total insn count */
-    int op_count_max; /* max insn per TB */
-    int temp_count_max;
-    int64_t temp_count;
-    int64_t del_op_count;
-    int64_t code_in_len;
-    int64_t code_out_len;
-    int64_t search_out_len;
     int64_t interm_time;
     int64_t code_time;
     int64_t la_time;
-- 
2.22.0




* [Qemu-devel] [PATCH v8 05/11] accel: adding TB_JIT_TIME and full replacing CONFIG_PROFILER
  2019-08-29 17:34 [Qemu-devel] [PATCH v8 00/11] Measure Tiny Code Generation Quality vandersonmr
                   ` (3 preceding siblings ...)
  2019-08-29 17:34 ` [Qemu-devel] [PATCH v8 04/11] accel: replacing part of CONFIG_PROFILER with TBStats vandersonmr
@ 2019-08-29 17:34 ` vandersonmr
  2019-08-30 13:12   ` Alex Bennée
  2019-08-29 17:34 ` [Qemu-devel] [PATCH v8 06/11] Adding -d tb_stats to control TBStatistics collection: vandersonmr
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 24+ messages in thread
From: vandersonmr @ 2019-08-29 17:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, vandersonmr, Markus Armbruster,
	Dr. David Alan Gilbert, Richard Henderson

Replace all the other CONFIG_PROFILER statistics and migrate them to
the TBStatistics system. However, TCGProfiler still exists and can
be used to store global statistics and times. All TB-related
statistics go to TBStatistics.
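
The per-TB timing can be pictured with the sketch below, which mirrors
the pattern of adding the negated start timestamp before a phase and
the end timestamp after it, so the accumulator grows by the elapsed
time. The names are hypothetical, timestamps are passed in as plain
arguments rather than read from profile_getclock(), and the additions
are not atomic as they are in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the "time" sub-struct of TBStatistics. */
struct tb_time_sketch {
    int64_t opt;   /* accumulated time spent in the optimizer */
};

/* Before the timed phase: subtract the start timestamp. */
void sketch_phase_begin(int64_t *acc, int64_t now)
{
    *acc += -now;
}

/* After the timed phase: add the end timestamp, so acc += end - start. */
void sketch_phase_end(int64_t *acc, int64_t now)
{
    *acc += now;
}
```

Between the two calls the accumulator is transiently negative, which
is one reason a signed type suits these fields.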

Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
---
 accel/tcg/tb-stats.c      |  91 +++++++++++++++++++---------
 accel/tcg/translate-all.c |  47 ++++++++-------
 configure                 |   3 -
 cpus.c                    |  14 ++---
 include/exec/tb-stats.h   |  19 +++++-
 include/qemu/timer.h      |   5 +-
 monitor/misc.c            |  28 ++-------
 tcg/tcg.c                 | 124 +++++++++++---------------------------
 tcg/tcg.h                 |  10 +--
 vl.c                      |   8 +--
 10 files changed, 156 insertions(+), 193 deletions(-)

diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
index 176da60e13..66abc97ad4 100644
--- a/accel/tcg/tb-stats.c
+++ b/accel/tcg/tb-stats.c
@@ -32,6 +32,8 @@
 
 #include "exec/tb-stats.h"
 
+uint64_t dev_time;
+
 struct jit_profile_info {
     uint64_t translations;
     uint64_t aborted;
@@ -43,6 +45,13 @@ struct jit_profile_info {
     uint64_t host;
     uint64_t guest;
     uint64_t search_data;
+
+    uint64_t interm_time;
+    uint64_t code_time;
+    uint64_t restore_count;
+    uint64_t restore_time;
+    uint64_t opt_time;
+    uint64_t la_time;
 };
 
 /* accumulate the statistics from all TBs */
@@ -64,6 +73,29 @@ static void collect_jit_profile_info(void *p, uint32_t hash, void *userp)
     jpi->host += tbs->code.out_len;
     jpi->guest += tbs->code.in_len;
     jpi->search_data += tbs->code.search_out_len;
+
+    jpi->interm_time += stat_per_translation(tbs, time.interm);
+    jpi->code_time += stat_per_translation(tbs, time.code);
+    jpi->opt_time += stat_per_translation(tbs, time.opt);
+    jpi->la_time += stat_per_translation(tbs, time.la);
+    jpi->restore_time += tbs->time.restore;
+    jpi->restore_count += tbs->time.restore_count;
+}
+
+void dump_jit_exec_time_info(uint64_t dev_time)
+{
+    static uint64_t last_cpu_exec_time;
+    uint64_t cpu_exec_time;
+    uint64_t delta;
+
+    cpu_exec_time = tcg_cpu_exec_time();
+    delta = cpu_exec_time - last_cpu_exec_time;
+
+    qemu_printf("async time  %" PRId64 " (%0.3f)\n",
+                   dev_time, dev_time / (double) NANOSECONDS_PER_SECOND);
+    qemu_printf("qemu time   %" PRId64 " (%0.3f)\n",
+                   delta, delta / (double) NANOSECONDS_PER_SECOND);
+    last_cpu_exec_time = cpu_exec_time;
 }
 
 /* dump JIT statistics using TCGProfile and TBStats */
@@ -90,34 +122,39 @@ void dump_jit_profile_info(TCGProfile *s)
         qemu_printf("avg search data/TB  %0.1f\n",
                 jpi->search_data / (double) jpi->translations);
 
+        uint64_t tot = jpi->interm_time + jpi->code_time;
+
+        qemu_printf("JIT cycles          %" PRId64 " (%0.3fs at 2.4 GHz)\n",
+                tot, tot / 2.4e9);
+        qemu_printf("  cycles/op           %0.1f\n",
+                jpi->ops ? (double)tot / jpi->ops : 0);
+        qemu_printf("  cycles/in byte      %0.1f\n",
+                jpi->guest ? (double)tot / jpi->guest : 0);
+        qemu_printf("  cycles/out byte     %0.1f\n",
+                jpi->host ? (double)tot / jpi->host : 0);
+        qemu_printf("  cycles/search byte  %0.1f\n",
+                jpi->search_data ? (double)tot / jpi->search_data : 0);
+        if (tot == 0) {
+            tot = 1;
+        }
+
+        qemu_printf("  gen_interm time     %0.1f%%\n",
+                (double)jpi->interm_time / tot * 100.0);
+        qemu_printf("  gen_code time       %0.1f%%\n",
+                (double)jpi->code_time / tot * 100.0);
+
+        qemu_printf("    optim./code time    %0.1f%%\n",
+                (double)jpi->opt_time / (jpi->code_time ? jpi->code_time : 1) * 100.0);
+        qemu_printf("    liveness/code time  %0.1f%%\n",
+                (double)jpi->la_time / (jpi->code_time ? jpi->code_time : 1) * 100.0);
+
+        qemu_printf("cpu_restore count   %" PRId64 "\n", jpi->restore_count);
+        qemu_printf("  avg cycles        %0.1f\n",
+                jpi->restore_count ? (double)jpi->restore_time / jpi->restore_count : 0);
+
         if (s) {
-            int64_t tot = s->interm_time + s->code_time;
-            qemu_printf("JIT cycles          %" PRId64 " (%0.3f s at 2.4 GHz)\n",
-                            tot, tot / 2.4e9);
-            qemu_printf("cycles/op           %0.1f\n",
-                        jpi->ops ? (double)tot / jpi->ops : 0);
-            qemu_printf("cycles/in byte      %0.1f\n",
-                        jpi->guest ? (double)tot / jpi->guest : 0);
-            qemu_printf("cycles/out byte     %0.1f\n",
-                        jpi->host ? (double)tot / jpi->host : 0);
-            qemu_printf("cycles/search byte     %0.1f\n",
-                        jpi->search_data ? (double)tot / jpi->search_data : 0);
-            if (tot == 0) {
-                tot = 1;
-            }
-            qemu_printf("  gen_interm time   %0.1f%%\n",
-                        (double)s->interm_time / tot * 100.0);
-            qemu_printf("  gen_code time     %0.1f%%\n",
-                        (double)s->code_time / tot * 100.0);
-            qemu_printf("optim./code time    %0.1f%%\n",
-                        (double)s->opt_time / (s->code_time ? s->code_time : 1)
-                        * 100.0);
-            qemu_printf("liveness/code time  %0.1f%%\n",
-                    (double)s->la_time / (s->code_time ? s->code_time : 1) * 100.0);
-            qemu_printf("cpu_restore count   %" PRId64 "\n",
-                    s->restore_count);
-            qemu_printf("  avg cycles        %0.1f\n",
-                    s->restore_count ? (double)s->restore_time / s->restore_count : 0);
+            qemu_printf("cpu exec time  %" PRId64 " (%0.3fs)\n",
+                s->cpu_exec_time, s->cpu_exec_time / (double) NANOSECONDS_PER_SECOND);
         }
     }
     g_free(jpi);
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index fe890f0782..a2e65bb85c 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -337,10 +337,11 @@ static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
     CPUArchState *env = cpu->env_ptr;
     uint8_t *p = tb->tc.ptr + tb->tc.size;
     int i, j, num_insns = tb->icount;
-#ifdef CONFIG_PROFILER
-    TCGProfile *prof = &tcg_ctx->prof;
-    int64_t ti = profile_getclock();
-#endif
+    uint64_t ti = 0;
+
+    if (tb_stats_enabled(tb, TB_JIT_TIME)) {
+        ti = profile_getclock();
+    }
 
     searched_pc -= GETPC_ADJ;
 
@@ -370,11 +371,11 @@ static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
     }
     restore_state_to_opc(env, tb, data);
 
-#ifdef CONFIG_PROFILER
-    atomic_set(&prof->restore_time,
-                prof->restore_time + profile_getclock() - ti);
-    atomic_set(&prof->restore_count, prof->restore_count + 1);
-#endif
+    if (tb_stats_enabled(tb, TB_JIT_TIME)) {
+        atomic_add(&tb->tb_stats->time.restore, profile_getclock() - ti);
+        atomic_inc(&tb->tb_stats->time.restore_count);
+    }
+
     return 0;
 }
 
@@ -1724,10 +1725,8 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     target_ulong virt_page2;
     tcg_insn_unit *gen_code_buf;
     int gen_code_size, search_size, max_insns;
-#ifdef CONFIG_PROFILER
-    TCGProfile *prof = &tcg_ctx->prof;
-    int64_t ti;
-#endif
+    uint64_t ti = 0;
+
     assert_memory_lock();
 
     phys_pc = get_page_addr_code(env, pc);
@@ -1773,9 +1772,6 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     tcg_ctx->tb_cflags = cflags;
  tb_overflow:
 
-#ifdef CONFIG_PROFILER
-    ti = profile_getclock();
-#endif
 
     /*
      * We want to fetch the stats structure before we start code
@@ -1796,6 +1792,11 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
             tb->tb_stats->stats_enabled |= TB_JIT_STATS;
             atomic_inc(&tb->tb_stats->translations.total);
         }
+
+        if (flag & TB_JIT_TIME) {
+            tb->tb_stats->stats_enabled |= TB_JIT_TIME;
+            ti = profile_getclock();
+        }
     } else {
         tb->tb_stats = NULL;
     }
@@ -1820,10 +1821,10 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
         tcg_ctx->tb_jmp_target_addr = tb->jmp_target_arg;
     }
 
-#ifdef CONFIG_PROFILER
-    atomic_set(&prof->interm_time, prof->interm_time + profile_getclock() - ti);
-    ti = profile_getclock();
-#endif
+    if (tb_stats_enabled(tb, TB_JIT_TIME)) {
+        atomic_add(&tb->tb_stats->time.interm, profile_getclock() - ti);
+        ti = profile_getclock();
+    }
 
     gen_code_size = tcg_gen_code(tcg_ctx, tb);
     if (unlikely(gen_code_size < 0)) {
@@ -1865,9 +1866,9 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     }
     tb->tc.size = gen_code_size;
 
-#ifdef CONFIG_PROFILER
-    atomic_set(&prof->code_time, prof->code_time + profile_getclock() - ti);
-#endif
+    if (tb_stats_enabled(tb, TB_JIT_TIME)) {
+        atomic_add(&tb->tb_stats->time.code, profile_getclock() - ti);
+    }
 
     if (tb_stats_enabled(tb, TB_JIT_STATS)) {
         atomic_add(&tb->tb_stats->code.in_len, tb->size);
diff --git a/configure b/configure
index e44e454c43..af8ff2e6dc 100755
--- a/configure
+++ b/configure
@@ -6611,9 +6611,6 @@ fi
 if test "$static" = "yes" ; then
   echo "CONFIG_STATIC=y" >> $config_host_mak
 fi
-if test "$profiler" = "yes" ; then
-  echo "CONFIG_PROFILER=y" >> $config_host_mak
-fi
 if test "$want_tools" = "yes" ; then
   echo "CONFIG_TOOLS=y" >> $config_host_mak
 fi
diff --git a/cpus.c b/cpus.c
index 85cd451a86..c1844a412b 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1433,21 +1433,17 @@ static void process_icount_data(CPUState *cpu)
 static int tcg_cpu_exec(CPUState *cpu)
 {
     int ret;
-#ifdef CONFIG_PROFILER
-    int64_t ti;
-#endif
+    uint64_t ti;
 
     assert(tcg_enabled());
-#ifdef CONFIG_PROFILER
     ti = profile_getclock();
-#endif
+
     cpu_exec_start(cpu);
     ret = cpu_exec(cpu);
     cpu_exec_end(cpu);
-#ifdef CONFIG_PROFILER
-    atomic_set(&tcg_ctx->prof.cpu_exec_time,
-               tcg_ctx->prof.cpu_exec_time + profile_getclock() - ti);
-#endif
+
+    atomic_add(&tcg_ctx->prof.cpu_exec_time, profile_getclock() - ti);
+
     return ret;
 }
 
diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
index b9d5a343be..0b9a6e2f72 100644
--- a/include/exec/tb-stats.h
+++ b/include/exec/tb-stats.h
@@ -81,22 +81,39 @@ struct TBStatistics {
         unsigned long spanning;
     } translations;
 
+    struct {
+        int64_t restore;
+        uint64_t restore_count;
+        int64_t interm;
+        int64_t code;
+        int64_t opt;
+        int64_t la;
+    } time;
+
     /* current TB linked to this TBStatistics */
     TranslationBlock *tb;
 };
 
 bool tb_stats_cmp(const void *ap, const void *bp);
 
+void dump_jit_exec_time_info(uint64_t dev_time);
+
 void init_tb_stats_htable_if_not(void);
 
 void dump_jit_profile_info(TCGProfile *s);
 
 /* TBStatistic collection controls */
-enum TBStatsStatus { TB_STATS_RUNNING, TB_STATS_PAUSED, TB_STATS_STOPPED };
+enum TBStatsStatus {
+    TB_STATS_DISABLED = 0,
+    TB_STATS_RUNNING,
+    TB_STATS_PAUSED,
+    TB_STATS_STOPPED
+};
 
 #define TB_NOTHING    (1 << 0)
 #define TB_EXEC_STATS (1 << 1)
 #define TB_JIT_STATS  (1 << 2)
+#define TB_JIT_TIME   (1 << 3)
 
 extern int tcg_collect_tb_stats;
 extern uint32_t default_tbstats_flag;
diff --git a/include/qemu/timer.h b/include/qemu/timer.h
index 85bc6eb00b..641ca3ddd0 100644
--- a/include/qemu/timer.h
+++ b/include/qemu/timer.h
@@ -1006,13 +1006,10 @@ static inline int64_t cpu_get_host_ticks(void)
 }
 #endif
 
-#ifdef CONFIG_PROFILER
 static inline int64_t profile_getclock(void)
 {
     return get_clock();
 }
 
-extern int64_t dev_time;
-#endif
-
+extern uint64_t dev_time;
 #endif
diff --git a/monitor/misc.c b/monitor/misc.c
index aef16f6cfb..ac4ff58d96 100644
--- a/monitor/misc.c
+++ b/monitor/misc.c
@@ -472,6 +472,11 @@ static void hmp_info_jit(Monitor *mon, const QDict *qdict)
 
 static void hmp_info_opcount(Monitor *mon, const QDict *qdict)
 {
+    if (!tb_stats_collection_enabled()) {
+        error_report("TB information not being recorded.");
+        return;
+    }
+
     dump_opcount_info();
 }
 #endif
@@ -1082,32 +1087,11 @@ static void hmp_info_mtree(Monitor *mon, const QDict *qdict)
     mtree_info(flatview, dispatch_tree, owner);
 }
 
-#ifdef CONFIG_PROFILER
-
-int64_t dev_time;
-
 static void hmp_info_profile(Monitor *mon, const QDict *qdict)
 {
-    static int64_t last_cpu_exec_time;
-    int64_t cpu_exec_time;
-    int64_t delta;
-
-    cpu_exec_time = tcg_cpu_exec_time();
-    delta = cpu_exec_time - last_cpu_exec_time;
-
-    monitor_printf(mon, "async time  %" PRId64 " (%0.3f)\n",
-                   dev_time, dev_time / (double)NANOSECONDS_PER_SECOND);
-    monitor_printf(mon, "qemu time   %" PRId64 " (%0.3f)\n",
-                   delta, delta / (double)NANOSECONDS_PER_SECOND);
-    last_cpu_exec_time = cpu_exec_time;
+    dump_jit_exec_time_info(dev_time);
     dev_time = 0;
 }
-#else
-static void hmp_info_profile(Monitor *mon, const QDict *qdict)
-{
-    monitor_printf(mon, "Internal profiler not compiled\n");
-}
-#endif
 
 /* Capture support */
 static QLIST_HEAD (capture_list_head, CaptureState) capture_head;
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 03b11495aa..4986ff3984 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -3884,82 +3884,34 @@ static void tcg_reg_alloc_call(TCGContext *s, TCGOp *op)
     }
 }
 
-#ifdef CONFIG_PROFILER
-
 /* avoid copy/paste errors */
 #define PROF_ADD(to, from, field)                       \
     do {                                                \
         (to)->field += atomic_read(&((from)->field));   \
     } while (0)
 
-#define PROF_MAX(to, from, field)                                       \
-    do {                                                                \
-        typeof((from)->field) val__ = atomic_read(&((from)->field));    \
-        if (val__ > (to)->field) {                                      \
-            (to)->field = val__;                                        \
-        }                                                               \
-    } while (0)
-
-/* Pass in a zero'ed @prof */
-static inline
-void tcg_profile_snapshot(TCGProfile *prof, bool counters, bool table)
+static void collect_tcg_profiler(TCGProfile *prof)
 {
+    int i;
     unsigned int n_ctxs = atomic_read(&n_tcg_ctxs);
-    unsigned int i;
 
     for (i = 0; i < n_ctxs; i++) {
         TCGContext *s = atomic_read(&tcg_ctxs[i]);
         const TCGProfile *orig = &s->prof;
 
-        if (counters) {
-            PROF_ADD(prof, orig, cpu_exec_time);
-            PROF_ADD(prof, orig, interm_time);
-            PROF_ADD(prof, orig, code_time);
-            PROF_ADD(prof, orig, la_time);
-            PROF_ADD(prof, orig, opt_time);
-            PROF_ADD(prof, orig, restore_count);
-            PROF_ADD(prof, orig, restore_time);
-        }
-        if (table) {
-            int i;
+        PROF_ADD(prof, orig, cpu_exec_time);
 
-            for (i = 0; i < NB_OPS; i++) {
-                PROF_ADD(prof, orig, table_op_count[i]);
-            }
+        for (int j = 0; j < NB_OPS; j++) {
+            PROF_ADD(prof, orig, table_op_count[j]);
         }
     }
 }
 
-#undef PROF_ADD
-#undef PROF_MAX
-
-static void tcg_profile_snapshot_counters(TCGProfile *prof)
-{
-    tcg_profile_snapshot(prof, true, false);
-}
-
-static void tcg_profile_snapshot_table(TCGProfile *prof)
-{
-    tcg_profile_snapshot(prof, false, true);
-}
-
-void tcg_dump_op_count(void)
-{
-    TCGProfile prof = {};
-    int i;
-
-    tcg_profile_snapshot_table(&prof);
-    for (i = 0; i < NB_OPS; i++) {
-        qemu_printf("%s %" PRId64 "\n", tcg_op_defs[i].name,
-                    prof.table_op_count[i]);
-    }
-}
-
-int64_t tcg_cpu_exec_time(void)
+uint64_t tcg_cpu_exec_time(void)
 {
     unsigned int n_ctxs = atomic_read(&n_tcg_ctxs);
     unsigned int i;
-    int64_t ret = 0;
+    uint64_t ret = 0;
 
     for (i = 0; i < n_ctxs; i++) {
         const TCGContext *s = atomic_read(&tcg_ctxs[i]);
@@ -3969,25 +3921,9 @@ int64_t tcg_cpu_exec_time(void)
     }
     return ret;
 }
-#else
-void tcg_dump_op_count(void)
-{
-    qemu_printf("[TCG profiler not compiled]\n");
-}
-
-int64_t tcg_cpu_exec_time(void)
-{
-    error_report("%s: TCG profiler not compiled", __func__);
-    exit(EXIT_FAILURE);
-}
-#endif
-
 
 int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
 {
-#ifdef CONFIG_PROFILER
-    TCGProfile *prof = &s->prof;
-#endif
     int i, num_insns;
     TCGOp *op;
 
@@ -4040,18 +3976,18 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
     }
 #endif
 
-#ifdef CONFIG_PROFILER
-    atomic_set(&prof->opt_time, prof->opt_time - profile_getclock());
-#endif
+    if (tb_stats_enabled(tb, TB_JIT_TIME)) {
+        atomic_add(&tb->tb_stats->time.opt, -profile_getclock());
+    }
 
 #ifdef USE_TCG_OPTIMIZATIONS
     tcg_optimize(s);
 #endif
 
-#ifdef CONFIG_PROFILER
-    atomic_set(&prof->opt_time, prof->opt_time + profile_getclock());
-    atomic_set(&prof->la_time, prof->la_time - profile_getclock());
-#endif
+    if (tb_stats_enabled(tb, TB_JIT_TIME)) {
+        atomic_add(&tb->tb_stats->time.opt, profile_getclock());
+        atomic_add(&tb->tb_stats->time.la,  -profile_getclock());
+    }
 
     reachable_code_pass(s);
     liveness_pass_1(s);
@@ -4074,9 +4010,9 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
         }
     }
 
-#ifdef CONFIG_PROFILER
-    atomic_set(&prof->la_time, prof->la_time + profile_getclock());
-#endif
+    if (tb_stats_enabled(tb, TB_JIT_TIME)) {
+        atomic_add(&tb->tb_stats->time.la, profile_getclock());
+    }
 
 #ifdef DEBUG_DISAS
     if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP_OPT)
@@ -4109,14 +4045,17 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
     s->pool_labels = NULL;
 #endif
 
+    if (!tb_stats_collection_enabled()) {
+        QTAILQ_FOREACH(op, &s->ops, link) {
+            TCGOpcode opc = op->opc;
+            atomic_add(&s->prof.table_op_count[opc], 1);
+        }
+    }
+
     num_insns = -1;
     QTAILQ_FOREACH(op, &s->ops, link) {
         TCGOpcode opc = op->opc;
 
-#ifdef CONFIG_PROFILER
-        atomic_set(&prof->table_op_count[opc], prof->table_op_count[opc] + 1);
-#endif
-
         switch (opc) {
         case INDEX_op_mov_i32:
         case INDEX_op_mov_i64:
@@ -4209,14 +4148,23 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
     return tcg_current_code_size(s);
 }
 
+void tcg_dump_op_count(void)
+{
+    TCGProfile prof = {};
+    int i;
+
+    collect_tcg_profiler(&prof);
+    for (i = 0; i < NB_OPS; i++) {
+        qemu_printf("%s %" PRId64 "\n", tcg_op_defs[i].name, prof.table_op_count[i]);
+    }
+}
+
 void tcg_dump_info(void)
 {
     TCGProfile *s = NULL;
-#ifdef CONFIG_PROFILER
     TCGProfile prof = {};
-    tcg_profile_snapshot_counters(&prof);
     s = &prof;
-#endif
+    collect_tcg_profiler(s);
     dump_jit_profile_info(s);
 }
 
diff --git a/tcg/tcg.h b/tcg/tcg.h
index a8d779f7e6..3550d22873 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -646,12 +646,6 @@ QEMU_BUILD_BUG_ON(NB_OPS > (1 << 8));
 
 typedef struct TCGProfile {
     int64_t cpu_exec_time;
-    int64_t interm_time;
-    int64_t code_time;
-    int64_t la_time;
-    int64_t opt_time;
-    int64_t restore_count;
-    int64_t restore_time;
     int64_t table_op_count[NB_OPS];
 } TCGProfile;
 
@@ -679,9 +673,7 @@ struct TCGContext {
 
     tcg_insn_unit *code_ptr;
 
-#ifdef CONFIG_PROFILER
     TCGProfile prof;
-#endif
 
 #ifdef CONFIG_DEBUG_TCG
     int temps_in_use;
@@ -1013,7 +1005,7 @@ int tcg_check_temp_count(void);
 #define tcg_check_temp_count() 0
 #endif
 
-int64_t tcg_cpu_exec_time(void);
+uint64_t tcg_cpu_exec_time(void);
 void tcg_dump_info(void);
 void tcg_dump_op_count(void);
 
diff --git a/vl.c b/vl.c
index 630f5c5e9c..4203cc728c 100644
--- a/vl.c
+++ b/vl.c
@@ -1796,17 +1796,11 @@ static bool main_loop_should_exit(void)
 
 static void main_loop(void)
 {
-#ifdef CONFIG_PROFILER
-    int64_t ti;
-#endif
+    uint64_t ti;
     while (!main_loop_should_exit()) {
-#ifdef CONFIG_PROFILER
         ti = profile_getclock();
-#endif
         main_loop_wait(false);
-#ifdef CONFIG_PROFILER
         dev_time += profile_getclock() - ti;
-#endif
     }
 }
 
-- 
2.22.0




* [Qemu-devel] [PATCH v8 06/11] Adding -d tb_stats to control TBStatistics collection:
  2019-08-29 17:34 [Qemu-devel] [PATCH v8 00/11] Measure Tiny Code Generation Quality vandersonmr
                   ` (4 preceding siblings ...)
  2019-08-29 17:34 ` [Qemu-devel] [PATCH v8 05/11] accel: adding TB_JIT_TIME and full replacing CONFIG_PROFILER vandersonmr
@ 2019-08-29 17:34 ` vandersonmr
  2019-08-30 14:45   ` Alex Bennée
  2019-08-29 17:34 ` [Qemu-devel] [PATCH v8 07/11] monitor: adding tb_stats hmp command vandersonmr
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 24+ messages in thread
From: vandersonmr @ 2019-08-29 17:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: vandersonmr

 -d tb_stats[[,level=(+all+jit+exec+time)][,dump_limit=<number>]]

"dump_limit" is used to limit the number of dumped TBStats in
linux-user mode.

[all+jit+exec+time] controls the profiling level used
by the TBStats. It can be used as follows:

-d tb_stats
-d tb_stats,level=jit+time
-d tb_stats,dump_limit=15
...
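
For illustration only, the level string can be folded into a flag mask
roughly as below. This is a minimal standalone sketch, not the code in
this patch: parse_tb_stats_level() is a hypothetical helper, it uses plain
libc instead of GLib, and it skips empty tokens (such as the one produced
by a leading '+'), which is a choice of the sketch. The flag values are
the ones defined in tb-stats-flags.h.

```c
#include <stdint.h>
#include <string.h>

/* Flag values as defined in include/exec/tb-stats-flags.h by this patch. */
#define TB_EXEC_STATS (1u << 1)
#define TB_JIT_STATS  (1u << 2)
#define TB_JIT_TIME   (1u << 3)

/*
 * Hypothetical helper (not part of the patch): turn a string such as
 * "jit+time" into a bitmask. Returns 0 on success and -1 on an unknown
 * level name, in which case *flags is left untouched.
 */
static int parse_tb_stats_level(const char *spec, uint32_t *flags)
{
    uint32_t result = 0;
    const char *p = spec;

    while (*p) {
        size_t len = strcspn(p, "+");   /* length of the current token */

        if (len == 3 && strncmp(p, "jit", 3) == 0) {
            result |= TB_JIT_STATS;
        } else if (len == 4 && strncmp(p, "exec", 4) == 0) {
            result |= TB_EXEC_STATS;
        } else if (len == 4 && strncmp(p, "time", 4) == 0) {
            result |= TB_JIT_TIME;
        } else if (len == 3 && strncmp(p, "all", 3) == 0) {
            result |= TB_JIT_STATS | TB_EXEC_STATS | TB_JIT_TIME;
        } else if (len != 0) {
            return -1;                  /* unknown level name */
        }

        p += len;
        if (*p == '+') {
            p++;                        /* step over the separator */
        }
    }

    *flags = result;
    return 0;
}
```

With this sketch, "jit+time" yields TB_JIT_STATS | TB_JIT_TIME and "all"
yields all three flags.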

Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
---
 include/exec/gen-icount.h     |  1 +
 include/exec/tb-stats-flags.h | 42 +++++++++++++++++++++++++++++++++++
 include/exec/tb-stats.h       | 18 +++------------
 include/qemu/log.h            |  1 +
 util/log.c                    | 35 +++++++++++++++++++++++++++++
 5 files changed, 82 insertions(+), 15 deletions(-)
 create mode 100644 include/exec/tb-stats-flags.h

diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h
index be006383b9..3987adfb0e 100644
--- a/include/exec/gen-icount.h
+++ b/include/exec/gen-icount.h
@@ -2,6 +2,7 @@
 #define GEN_ICOUNT_H
 
 #include "qemu/timer.h"
+#include "tb-stats-flags.h"
 
 /* Helpers for instruction counting code generation.  */
 
diff --git a/include/exec/tb-stats-flags.h b/include/exec/tb-stats-flags.h
new file mode 100644
index 0000000000..c936ac1084
--- /dev/null
+++ b/include/exec/tb-stats-flags.h
@@ -0,0 +1,42 @@
+/*
+ * QEMU System Emulator, Code Quality Monitor System
+ *
+ * Copyright (c) 2019 Vanderson M. do Rosario <vandersonmr2@gmail.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+#ifndef TB_STATS_FLAGS
+#define TB_STATS_FLAGS
+
+enum TBStatsStatus {
+    TB_STATS_DISABLED = 0,
+    TB_STATS_RUNNING,
+    TB_STATS_PAUSED,
+    TB_STATS_STOPPED
+};
+
+#define TB_NOTHING    (1 << 0)
+#define TB_EXEC_STATS (1 << 1)
+#define TB_JIT_STATS  (1 << 2)
+#define TB_JIT_TIME   (1 << 3)
+
+extern int tcg_collect_tb_stats;
+extern uint32_t default_tbstats_flag;
+
+#endif
diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
index 0b9a6e2f72..6a53bef31b 100644
--- a/include/exec/tb-stats.h
+++ b/include/exec/tb-stats.h
@@ -30,6 +30,8 @@
 #include "exec/tb-context.h"
 #include "tcg.h"
 
+#include "exec/tb-stats-flags.h"
+
 #define tb_stats_enabled(tb, JIT_STATS) \
     (tb && tb->tb_stats && (tb->tb_stats->stats_enabled & JIT_STATS))
 
@@ -98,26 +100,12 @@ bool tb_stats_cmp(const void *ap, const void *bp);
 
 void dump_jit_exec_time_info(uint64_t dev_time);
 
+void set_tbstats_flags(uint32_t flags);
 void init_tb_stats_htable_if_not(void);
 
 void dump_jit_profile_info(TCGProfile *s);
 
 /* TBStatistic collection controls */
-enum TBStatsStatus {
-    TB_STATS_DISABLED = 0,
-    TB_STATS_RUNNING,
-    TB_STATS_PAUSED,
-    TB_STATS_STOPPED
-};
-
-#define TB_NOTHING    (1 << 0)
-#define TB_EXEC_STATS (1 << 1)
-#define TB_JIT_STATS  (1 << 2)
-#define TB_JIT_TIME   (1 << 3)
-
-extern int tcg_collect_tb_stats;
-extern uint32_t default_tbstats_flag;
-
 void enable_collect_tb_stats(void);
 void disable_collect_tb_stats(void);
 void pause_collect_tb_stats(void);
diff --git a/include/qemu/log.h b/include/qemu/log.h
index b097a6cae1..a8d1997cde 100644
--- a/include/qemu/log.h
+++ b/include/qemu/log.h
@@ -45,6 +45,7 @@ static inline bool qemu_log_separate(void)
 /* LOG_TRACE (1 << 15) is defined in log-for-trace.h */
 #define CPU_LOG_TB_OP_IND  (1 << 16)
 #define CPU_LOG_TB_FPU     (1 << 17)
+#define CPU_LOG_TB_STATS   (1 << 18)
 
 /* Lock output for a series of related logs.  Since this is not needed
  * for a single qemu_log / qemu_log_mask / qemu_log_mask_and_addr, we
diff --git a/util/log.c b/util/log.c
index 29021a4584..c3805b331b 100644
--- a/util/log.c
+++ b/util/log.c
@@ -19,17 +19,20 @@
 
 #include "qemu/osdep.h"
 #include "qemu/log.h"
+#include "qemu/qemu-print.h"
 #include "qemu/range.h"
 #include "qemu/error-report.h"
 #include "qapi/error.h"
 #include "qemu/cutils.h"
 #include "trace/control.h"
+#include "exec/tb-stats-flags.h"
 
 static char *logfilename;
 FILE *qemu_logfile;
 int qemu_loglevel;
 static int log_append = 0;
 static GArray *debug_regions;
+int32_t max_num_hot_tbs_to_dump;
 
 int tcg_collect_tb_stats;
 uint32_t default_tbstats_flag;
@@ -276,6 +279,9 @@ const QEMULogItem qemu_log_items[] = {
     { CPU_LOG_TB_NOCHAIN, "nochain",
       "do not chain compiled TBs so that \"exec\" and \"cpu\" show\n"
       "complete traces" },
+    { CPU_LOG_TB_STATS, "tb_stats[[,level=(+all+jit+exec+time)][,dump_limit=<number>]]",
+      "enable collection of TB statistics "
+      "(and dump until given a limit if in user mode).\n" },
     { 0, NULL, NULL },
 };
 
@@ -297,6 +303,35 @@ int qemu_str_to_log_mask(const char *str)
             trace_enable_events((*tmp) + 6);
             mask |= LOG_TRACE;
 #endif
+        } else if (g_str_has_prefix(*tmp, "tb_stats")) {
+            mask |= CPU_LOG_TB_STATS;
+            default_tbstats_flag = TB_JIT_STATS | TB_EXEC_STATS | TB_JIT_TIME;
+            tcg_collect_tb_stats = TB_STATS_RUNNING;
+        } else if (tcg_collect_tb_stats == TB_STATS_RUNNING &&
+                g_str_has_prefix(*tmp, "dump_limit=")) {
+
+            max_num_hot_tbs_to_dump = atoi((*tmp) + 11);
+        } else if (tcg_collect_tb_stats == TB_STATS_RUNNING &&
+                g_str_has_prefix(*tmp, "level=")) {
+
+            default_tbstats_flag = 0;
+            char **level_parts = g_strsplit(*tmp + 6, "+", 0);
+            char **level_tmp;
+            for (level_tmp = level_parts; level_tmp && *level_tmp; level_tmp++) {
+                if (g_str_equal(*level_tmp, "jit")) {
+                    default_tbstats_flag |= TB_JIT_STATS;
+                } else if (g_str_equal(*level_tmp, "exec")) {
+                    default_tbstats_flag |= TB_EXEC_STATS;
+                } else if (g_str_equal(*level_tmp, "time")) {
+                    default_tbstats_flag |= TB_JIT_TIME;
+                } else if (g_str_equal(*level_tmp, "all")) {
+                    default_tbstats_flag |= TB_JIT_STATS | TB_EXEC_STATS | TB_JIT_TIME;
+                } else {
+                    fprintf(stderr, "no option level=%s, valid options are:"
+                            "all, jit, exec or/and time\n", *level_tmp);
+                    exit(1);
+                }
+            }
         } else {
             for (item = qemu_log_items; item->mask != 0; item++) {
                 if (g_str_equal(*tmp, item->name)) {
-- 
2.22.0




* [Qemu-devel] [PATCH v8 07/11] monitor: adding tb_stats hmp command
  2019-08-29 17:34 [Qemu-devel] [PATCH v8 00/11] Measure Tiny Code Generation Quality vandersonmr
                   ` (5 preceding siblings ...)
  2019-08-29 17:34 ` [Qemu-devel] [PATCH v8 06/11] Adding -d tb_stats to control TBStatistics collection: vandersonmr
@ 2019-08-29 17:34 ` vandersonmr
  2019-08-30 15:11   ` Alex Bennée
  2019-08-29 17:34 ` [Qemu-devel] [PATCH v8 08/11] Adding tb_stats [start|pause|stop|filter] command to hmp vandersonmr
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 24+ messages in thread
From: vandersonmr @ 2019-08-29 17:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, vandersonmr, Dr . David Alan Gilbert,
	Markus Armbruster, Richard Henderson

Adding tb_stats [start|pause|stop|filter] command to hmp.
This allows controlling the collection of statistics.
It is also possible to set the level of collection:
all, jit, or exec.

tb_stats filter restricts collection to the TBs
in the last_search list.

The goal of this command is to allow the dynamic exploration
of the TCG behavior and quality. Therefore, for now, a
corresponding QMP command is not worthwhile.
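
For illustration only, the effect of "filter" on the per-TB enable masks
can be pictured as below. This is a simplified standalone sketch, not the
patch code: TBStatsSketch, sketch_stats_enabled() and sketch_filter() are
hypothetical names, and plain arrays stand in for the qht table and the
last_search GList. Every entry is first marked TB_PAUSED, then only the
selected entries get the requested level back.

```c
#include <stddef.h>
#include <stdint.h>

/* Flag values as defined in include/exec/tb-stats-flags.h by this series. */
#define TB_EXEC_STATS (1u << 1)
#define TB_JIT_STATS  (1u << 2)
#define TB_JIT_TIME   (1u << 3)
#define TB_PAUSED     (1u << 4)

/* Simplified stand-in for TBStatistics: only the per-TB enable mask. */
typedef struct {
    uint32_t stats_enabled;
} TBStatsSketch;

/* Same bit test as the tb_stats_enabled() macro, minus the TB indirection. */
static int sketch_stats_enabled(const TBStatsSketch *tbs, uint32_t flag)
{
    return tbs && (tbs->stats_enabled & flag) != 0;
}

/*
 * Hypothetical filter step: pause collection everywhere, then re-enable
 * it at the given level for the entries of the last search only.
 */
static void sketch_filter(TBStatsSketch *all, size_t n_all,
                          TBStatsSketch **selected, size_t n_selected,
                          uint32_t level)
{
    size_t i;

    for (i = 0; i < n_all; i++) {
        all[i].stats_enabled = TB_PAUSED;
    }
    for (i = 0; i < n_selected; i++) {
        selected[i]->stats_enabled = level;
    }
}
```

Because the collection sites only test individual bits, an entry left at
TB_PAUSED matches none of TB_EXEC_STATS, TB_JIT_STATS or TB_JIT_TIME.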

Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
---
 accel/tcg/tb-stats.c          | 110 ++++++++++++++++++++++++++++++++++
 hmp-commands.hx               |  17 ++++++
 include/exec/tb-stats-flags.h |   1 +
 include/exec/tb-stats.h       |  11 ++++
 monitor/misc.c                |  47 +++++++++++++++
 vl.c                          |   6 ++
 6 files changed, 192 insertions(+)

diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
index 66abc97ad4..9959477fbb 100644
--- a/accel/tcg/tb-stats.c
+++ b/accel/tcg/tb-stats.c
@@ -32,6 +32,9 @@
 
 #include "exec/tb-stats.h"
 
+/* only accessed in safe work */
+static GList *last_search;
+
 uint64_t dev_time;
 
 struct jit_profile_info {
@@ -160,6 +163,95 @@ void dump_jit_profile_info(TCGProfile *s)
     g_free(jpi);
 }
 
+static void free_tbstats(void *p, uint32_t hash, void *userp)
+{
+    g_free(p);
+}
+
+static void clean_tbstats(void)
+{
+    /* remove all tb_stats */
+    qht_iter(&tb_ctx.tb_stats, free_tbstats, NULL);
+    qht_destroy(&tb_ctx.tb_stats);
+}
+
+void do_hmp_tbstats_safe(CPUState *cpu, run_on_cpu_data icmd)
+{
+    struct TbstatsCommand *cmdinfo = icmd.host_ptr;
+    int cmd = cmdinfo->cmd;
+    uint32_t level = cmdinfo->level;
+
+    switch (cmd) {
+    case START:
+        if (tb_stats_collection_paused()) {
+            set_tbstats_flags(level);
+        } else {
+            if (tb_stats_collection_enabled()) {
+                qemu_printf("TB information already being recorded");
+                return;
+            }
+            qht_init(&tb_ctx.tb_stats, tb_stats_cmp, CODE_GEN_HTABLE_SIZE,
+                        QHT_MODE_AUTO_RESIZE);
+        }
+
+        set_default_tbstats_flag(level);
+        enable_collect_tb_stats();
+        tb_flush(cpu);
+        break;
+    case PAUSE:
+        if (!tb_stats_collection_enabled()) {
+            qemu_printf("TB information not being recorded");
+            return;
+        }
+
+        /* Continue to create TBStatistic structures but stop collecting statistics */
+        pause_collect_tb_stats();
+        set_default_tbstats_flag(TB_NOTHING);
+        set_tbstats_flags(TB_PAUSED);
+        tb_flush(cpu);
+        break;
+    case STOP:
+        if (!tb_stats_collection_enabled()) {
+            qemu_printf("TB information not being recorded");
+            return;
+        }
+
+        /* Free all TBStatistics structures and stop creating new ones */
+        disable_collect_tb_stats();
+        clean_tbstats();
+        tb_flush(cpu);
+        break;
+    case FILTER:
+        if (!tb_stats_collection_enabled()) {
+            qemu_printf("TB information not being recorded");
+            return;
+        }
+        if (!last_search) {
+            qemu_printf("no search on record! execute info tbs before filtering!");
+            return;
+        }
+
+        set_default_tbstats_flag(TB_NOTHING);
+
+        /* Set all tbstats as paused, then return only the ones from last_search */
+        pause_collect_tb_stats();
+        set_tbstats_flags(TB_PAUSED);
+
+        for (GList *iter = last_search; iter; iter = g_list_next(iter)) {
+            TBStatistics *tbs = iter->data;
+            tbs->stats_enabled = level;
+        }
+
+        tb_flush(cpu);
+
+        break;
+    default: /* INVALID */
+        g_assert_not_reached();
+        break;
+    }
+
+    g_free(cmdinfo);
+}
 
 void init_tb_stats_htable_if_not(void)
 {
@@ -195,6 +287,24 @@ bool tb_stats_collection_paused(void)
     return tcg_collect_tb_stats == TB_STATS_PAUSED;
 }
 
+static void reset_tbstats_flag(void *p, uint32_t hash, void *userp)
+{
+    uint32_t flag = *((int *)userp);
+    TBStatistics *tbs = p;
+    tbs->stats_enabled = flag;
+}
+
+void set_default_tbstats_flag(uint32_t flag)
+{
+    default_tbstats_flag = flag;
+}
+
+void set_tbstats_flags(uint32_t flag)
+{
+    /* iterate over tbstats setting their flags to the given value */
+    qht_iter(&tb_ctx.tb_stats, reset_tbstats_flag, &flag);
+}
+
 uint32_t get_default_tbstats_flag(void)
 {
     return default_tbstats_flag;
diff --git a/hmp-commands.hx b/hmp-commands.hx
index cfcc044ce4..6cd2800378 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1886,6 +1886,23 @@ STEXI
 @findex qemu-io
 Executes a qemu-io command on the given block device.
 
+ETEXI
+#if defined(CONFIG_TCG)
+    {
+        .name       = "tb_stats",
+        .args_type  = "command:s,level:s?",
+        .params     = "command [stats_level]",
+        .help       = "Control tb statistics collection:"
+                        "tb_stats (start|pause|stop|filter) [all|jit_stats|exec_stats]",
+        .cmd        = hmp_tbstats,
+    },
+#endif
+
+STEXI
+@item tb_stats
+@findex
+Control recording tb statistics
+
 ETEXI
 
     {
diff --git a/include/exec/tb-stats-flags.h b/include/exec/tb-stats-flags.h
index c936ac1084..16b563f858 100644
--- a/include/exec/tb-stats-flags.h
+++ b/include/exec/tb-stats-flags.h
@@ -35,6 +35,7 @@ enum TBStatsStatus {
 #define TB_EXEC_STATS (1 << 1)
 #define TB_JIT_STATS  (1 << 2)
 #define TB_JIT_TIME   (1 << 3)
+#define TB_PAUSED     (1 << 4)
 
 extern int tcg_collect_tb_stats;
 extern uint32_t default_tbstats_flag;
diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
index 6a53bef31b..65063c52d7 100644
--- a/include/exec/tb-stats.h
+++ b/include/exec/tb-stats.h
@@ -32,6 +32,9 @@
 
 #include "exec/tb-stats-flags.h"
 
+enum SortBy { SORT_BY_HOTNESS, SORT_BY_HG /* Host/Guest */, SORT_BY_SPILLS };
+enum TbstatsCmd { START, PAUSE, STOP, FILTER };
+
 #define tb_stats_enabled(tb, JIT_STATS) \
     (tb && tb->tb_stats && (tb->tb_stats->stats_enabled & JIT_STATS))
 
@@ -105,6 +108,13 @@ void init_tb_stats_htable_if_not(void);
 
 void dump_jit_profile_info(TCGProfile *s);
 
+struct TbstatsCommand {
+    enum TbstatsCmd cmd;
+    uint32_t level;
+};
+
+void do_hmp_tbstats_safe(CPUState *cpu, run_on_cpu_data icmd);
+
 /* TBStatistic collection controls */
 void enable_collect_tb_stats(void);
 void disable_collect_tb_stats(void);
@@ -112,6 +122,7 @@ void pause_collect_tb_stats(void);
 bool tb_stats_collection_enabled(void);
 bool tb_stats_collection_paused(void);
 
+void set_default_tbstats_flag(uint32_t flag);
 uint32_t get_default_tbstats_flag(void);
 
 #endif
diff --git a/monitor/misc.c b/monitor/misc.c
index ac4ff58d96..b389ca09a1 100644
--- a/monitor/misc.c
+++ b/monitor/misc.c
@@ -459,6 +459,49 @@ static void hmp_info_registers(Monitor *mon, const QDict *qdict)
 }
 
 #ifdef CONFIG_TCG
+static void hmp_tbstats(Monitor *mon, const QDict *qdict)
+{
+    if (!tcg_enabled()) {
+        error_report("TB information is only available with accel=tcg");
+        return;
+    }
+
+    char *cmd = (char *) qdict_get_try_str(qdict, "command");
+    enum TbstatsCmd icmd = -1;
+
+    if (strcmp(cmd, "start") == 0) {
+        icmd = START;
+    } else if (strcmp(cmd, "pause") == 0) {
+        icmd = PAUSE;
+    } else if (strcmp(cmd, "stop") == 0) {
+        icmd = STOP;
+    } else if (strcmp(cmd, "filter") == 0) {
+        icmd = FILTER;
+    } else {
+        error_report("invalid command!");
+        return;
+    }
+
+    char *slevel = (char *) qdict_get_try_str(qdict, "level");
+    uint32_t level = TB_EXEC_STATS | TB_JIT_STATS | TB_JIT_TIME;
+    if (slevel) {
+        if (strcmp(slevel, "jit") == 0) {
+            level = TB_JIT_STATS;
+        } else if (strcmp(slevel, "exec") == 0) {
+            level = TB_EXEC_STATS;
+        } else if (strcmp(slevel, "time") == 0) {
+            level = TB_JIT_TIME;
+        }
+    }
+
+    struct TbstatsCommand *tbscommand = g_new0(struct TbstatsCommand, 1);
+    tbscommand->cmd = icmd;
+    tbscommand->level = level;
+    async_safe_run_on_cpu(first_cpu, do_hmp_tbstats_safe,
+                          RUN_ON_CPU_HOST_PTR(tbscommand));
+
+}
+
 static void hmp_info_jit(Monitor *mon, const QDict *qdict)
 {
     if (!tcg_enabled()) {
@@ -1089,8 +1132,12 @@ static void hmp_info_mtree(Monitor *mon, const QDict *qdict)
 
 static void hmp_info_profile(Monitor *mon, const QDict *qdict)
 {
+#ifdef CONFIG_TCG
     dump_jit_exec_time_info(dev_time);
     dev_time = 0;
+#else
+    error_report("TCG should be enabled!");
+#endif
 }
 
 /* Capture support */
diff --git a/vl.c b/vl.c
index 4203cc728c..dcb30eea8a 100644
--- a/vl.c
+++ b/vl.c
@@ -1796,11 +1796,17 @@ static bool main_loop_should_exit(void)
 
 static void main_loop(void)
 {
+#ifdef CONFIG_TCG
     uint64_t ti;
+#endif
     while (!main_loop_should_exit()) {
+#ifdef CONFIG_TCG
         ti = profile_getclock();
+#endif
         main_loop_wait(false);
+#ifdef CONFIG_TCG
         dev_time += profile_getclock() - ti;
+#endif
     }
 }
 
-- 
2.22.0




* [Qemu-devel] [PATCH v8 08/11] Adding tb_stats [start|pause|stop|filter] command to hmp.
  2019-08-29 17:34 [Qemu-devel] [PATCH v8 00/11] Measure Tiny Code Generation Quality vandersonmr
                   ` (6 preceding siblings ...)
  2019-08-29 17:34 ` [Qemu-devel] [PATCH v8 07/11] monitor: adding tb_stats hmp command vandersonmr
@ 2019-08-29 17:34 ` vandersonmr
  2019-08-29 17:54   ` Vanderson Martins do Rosario
  2019-08-29 17:34 ` [Qemu-devel] [PATCH v8 09/11] Adding info [tb-list|tb|coverset] commands to HMP vandersonmr
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 24+ messages in thread
From: vandersonmr @ 2019-08-29 17:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: vandersonmr, Dr . David Alan Gilbert, Markus Armbruster

This allows controlling the collection of statistics.
It is also possible to set the level of collection:
all, jit, or exec.
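
For illustration only, the mapping from the monitor arguments to a command
and a level could look roughly like this. It is a standalone sketch that
assumes the enum and flag names introduced by the previous patch in this
series; sketch_parse_cmd() and sketch_parse_level() are hypothetical
helpers, not functions from the series. A missing or unrecognised level
falls back to collecting everything.

```c
#include <stdint.h>
#include <string.h>

/* Names assumed from the previous patch in this series. */
enum TbstatsCmd { START, PAUSE, STOP, FILTER };

#define TB_EXEC_STATS (1u << 1)
#define TB_JIT_STATS  (1u << 2)
#define TB_JIT_TIME   (1u << 3)

/* Hypothetical: returns 0 and fills *cmd, or -1 for an unknown command. */
static int sketch_parse_cmd(const char *s, enum TbstatsCmd *cmd)
{
    if (strcmp(s, "start") == 0) {
        *cmd = START;
    } else if (strcmp(s, "pause") == 0) {
        *cmd = PAUSE;
    } else if (strcmp(s, "stop") == 0) {
        *cmd = STOP;
    } else if (strcmp(s, "filter") == 0) {
        *cmd = FILTER;
    } else {
        return -1;
    }
    return 0;
}

/* Hypothetical: NULL or an unrecognised name selects every level. */
static uint32_t sketch_parse_level(const char *s)
{
    if (s && strcmp(s, "jit") == 0) {
        return TB_JIT_STATS;
    }
    if (s && strcmp(s, "exec") == 0) {
        return TB_EXEC_STATS;
    }
    if (s && strcmp(s, "time") == 0) {
        return TB_JIT_TIME;
    }
    return TB_EXEC_STATS | TB_JIT_STATS | TB_JIT_TIME;
}
```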

tb_stats filter restricts collection to the TBs
in the last_search list.

The goal of this command is to allow the dynamic exploration
of the TCG behavior and quality. Therefore, for now, a
corresponding QMP command is not worthwhile.

Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
---
 monitor/misc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/monitor/misc.c b/monitor/misc.c
index b389ca09a1..218263d29a 100644
--- a/monitor/misc.c
+++ b/monitor/misc.c
@@ -74,6 +74,8 @@
 #include "sysemu/cpus.h"
 #include "qemu/cutils.h"
 #include "tcg/tcg.h"
+#include "exec/tb-stats.h"
+#include "qemu-common.h"
 
 #if defined(TARGET_S390X)
 #include "hw/s390x/storage-keys.h"
-- 
2.22.0




* [Qemu-devel] [PATCH v8 09/11] Adding info [tb-list|tb|coverset] commands to HMP.
  2019-08-29 17:34 [Qemu-devel] [PATCH v8 00/11] Measure Tiny Code Generation Quality vandersonmr
                   ` (7 preceding siblings ...)
  2019-08-29 17:34 ` [Qemu-devel] [PATCH v8 08/11] Adding tb_stats [start|pause|stop|filter] command to hmp vandersonmr
@ 2019-08-29 17:34 ` vandersonmr
  2019-08-30 16:17   ` Alex Bennée
  2019-08-29 17:34 ` [Qemu-devel] [PATCH v8 10/11] monitor: adding new info cfg command vandersonmr
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 24+ messages in thread
From: vandersonmr @ 2019-08-29 17:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, vandersonmr, Dr . David Alan Gilbert,
	Markus Armbruster, Richard Henderson

These commands allow the exploration of TBs
generated by the TCG: finding which ones are
hotter or have more guest/host instructions,
and examining their guest, host and IR code.

The goal of this command is to allow the dynamic exploration
of TCG behavior and code quality. Therefore, for now, a
corresponding QMP command is not worthwhile.

Example of output:

TB id:1 | phys:0x34d54 virt:0x0000000000034d54 flags:0x0000f0
	| exec:4828932/0 guest inst cov:16.38%
	| trans:1 ints: g:3 op:82 op_opt:34 spills:3
	| h/g (host bytes / guest insts): 90.666664
	| time to gen at 2.4GHz => code:3150.83(ns) IR:712.08(ns)
	| targets: 0x0000000000034d5e (id:11), 0x0000000000034d0d (id:2)

TB id:2 | phys:0x34d0d virt:0x0000000000034d0d flags:0x0000f0
	| exec:4825842/0 guest inst cov:21.82%
	| trans:1 ints: g:4 op:80 op_opt:38 spills:2
	| h/g (host bytes / guest insts): 84.000000
	| time to gen at 2.4GHz => code:3362.92(ns) IR:793.75(ns)
	| targets: 0x0000000000034d19 (id:12), 0x0000000000034d54 (id:1)

TB id:2 | phys:0x34d0d virt:0x0000000000034d0d flags:0x0000f0
	| exec:6956495/0  guest inst cov:21.82%
	| trans:2 ints: g:2 op:40 op_opt:19 spills:1
	| h/g (host bytes / guest insts): 84.000000
	| time to gen at 2.4GHz => code:3130.83(ns) IR:722.50(ns)
	| targets: 0x0000000000034d19 (id:12), 0x0000000000034d54 (id:1)

----------------
IN:
0x00034d0d:  89 de                    movl     %ebx, %esi
0x00034d0f:  26 8b 0e                 movl     %es:(%esi), %ecx
0x00034d12:  26 f6 46 08 80           testb    $0x80, %es:8(%esi)
0x00034d17:  75 3b                    jne      0x34d54

------------------------------

TB id:1 | phys:0x34d54 virt:0x0000000000034d54 flags:0x0000f0
	| exec:5202686/0 guest inst cov:11.28%
	| trans:1 ints: g:3 op:82 op_opt:34 spills:3
	| h/g (host bytes / guest insts): 90.666664
	| time to gen at 2.4GHz => code:2793.75(ns) IR:614.58(ns)
	| targets: 0x0000000000034d5e (id:3), 0x0000000000034d0d (id:2)

TB id:2 | phys:0x34d0d virt:0x0000000000034d0d flags:0x0000f0
	| exec:5199468/0 guest inst cov:15.03%
	| trans:1 ints: g:4 op:80 op_opt:38 spills:2
	| h/g (host bytes / guest insts): 84.000000
	| time to gen at 2.4GHz => code:2958.75(ns) IR:719.58(ns)
	| targets: 0x0000000000034d19 (id:4), 0x0000000000034d54 (id:1)

------------------------------
2 TBs to reach 25% of guest inst exec coverage
Total of guest insts exec: 138346727

------------------------------
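
For illustration only, the coverset figure can be derived roughly as
below. This is a standalone sketch with hypothetical names (TBSample,
coverset_size), using a plain array and qsort() where the patch uses a
GList. It keeps the two conventions used in the patch: coverage is stored
in hundredths of a percent, and the divisor is total + 1 so that an empty
run does not divide by zero. A single execs counter stands in for the
normal and atomic execution counts.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for the fields of TBStatistics used here. */
typedef struct {
    uint64_t execs;        /* number of times the TB was executed */
    uint64_t guest_insts;  /* guest instructions in the TB */
    uint64_t coverage;     /* share of executed guest insts, in 1/100 % */
} TBSample;

/* Sort hottest first (descending execution count). */
static int by_execs_desc(const void *a, const void *b)
{
    uint64_t x = ((const TBSample *)a)->execs;
    uint64_t y = ((const TBSample *)b)->execs;

    return (y > x) - (y < x);
}

/*
 * Hypothetical helper: returns how many of the hottest TBs are needed
 * to reach the requested percentage of executed guest instructions.
 */
static size_t coverset_size(TBSample *tbs, size_t n, unsigned percentage)
{
    uint64_t total = 0;
    uint64_t covered = 0;
    uint64_t target = (uint64_t)percentage * 100;
    size_t count = 0;
    size_t i;

    qsort(tbs, n, sizeof(*tbs), by_execs_desc);

    for (i = 0; i < n; i++) {
        total += tbs[i].execs * tbs[i].guest_insts;
    }
    for (i = 0; i < n; i++) {
        uint64_t insts = tbs[i].execs * tbs[i].guest_insts;
        tbs[i].coverage = (10000 * insts) / (total + 1);
    }
    for (i = 0; i < n && covered < target; i++) {
        covered += tbs[i].coverage;
        count++;
    }
    return count;
}
```

Because of the integer division and the + 1 in the divisor, the coverages
sum to slightly less than 10000, so a request for 100% walks the whole
list.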

Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
---
 accel/tcg/tb-stats.c         | 421 ++++++++++++++++++++++++++++++++++-
 accel/tcg/translate-all.c    |   2 +-
 disas.c                      |  31 ++-
 hmp-commands-info.hx         |  24 ++
 include/exec/tb-stats.h      |  45 +++-
 include/qemu/log-for-trace.h |   4 +
 include/qemu/log.h           |   2 +
 monitor/misc.c               |  74 ++++++
 util/log.c                   |  52 ++++-
 9 files changed, 634 insertions(+), 21 deletions(-)

diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
index 9959477fbb..d588c551c9 100644
--- a/accel/tcg/tb-stats.c
+++ b/accel/tcg/tb-stats.c
@@ -34,9 +34,35 @@
 
 /* only accessed in safe work */
 static GList *last_search;
-
+int id = 1; /* display_id increment counter */
 uint64_t dev_time;
 
+static TBStatistics *get_tbstats_by_id(int id)
+{
+    GList *iter;
+
+    for (iter = last_search; iter; iter = g_list_next(iter)) {
+        TBStatistics *tbs = iter->data;
+        if (tbs && tbs->display_id == id) {
+            return tbs;
+            break;
+        }
+    }
+    return NULL;
+}
+
+static TBStatistics *get_tbstats_by_addr(target_ulong pc)
+{
+    GList *iter;
+    for (iter = last_search; iter; iter = g_list_next(iter)) {
+        TBStatistics *tbs = iter->data;
+        if (tbs && tbs->pc == pc) {
+            return tbs;
+        }
+    }
+    return NULL;
+}
+
 struct jit_profile_info {
     uint64_t translations;
     uint64_t aborted;
@@ -175,6 +201,7 @@ static void clean_tbstats(void)
     qht_destroy(&tb_ctx.tb_stats);
 }
 
+
 void do_hmp_tbstats_safe(CPUState *cpu, run_on_cpu_data icmd)
 {
     struct TbstatsCommand *cmdinfo = icmd.host_ptr;
@@ -261,6 +288,398 @@ void init_tb_stats_htable_if_not(void)
     }
 }
 
+static void collect_tb_stats(void *p, uint32_t hash, void *userp)
+{
+    last_search = g_list_prepend(last_search, p);
+}
+
+static void dump_tb_targets(TBStatistics *tbs)
+{
+    if (tbs && tbs->tb) {
+        TBStatistics *valid_target_tbstats[2] = {NULL, NULL};
+
+        /*
+         * Check and ensure that targets' tbstats have a valid display_id and
+         * are in last_search list
+         */
+        for (int jmp_id = 0; jmp_id < 2; jmp_id++) {
+            qemu_spin_lock(&tbs->tb->jmp_lock);
+            TranslationBlock *tb_dst =
+                (TranslationBlock *) (atomic_read(&tbs->tb->jmp_dest[jmp_id]) & ~1);
+            qemu_spin_unlock(&tbs->tb->jmp_lock);
+
+            if (tb_dst) {
+                target_ulong pc = tb_dst ? tb_dst->pc : 0;
+
+                /* Check if tb_dst is on the last_search list */
+                TBStatistics *tbstats_pc = get_tbstats_by_addr(pc);
+
+                /* if not in the last_search list, then insert it */
+                if (!tbstats_pc) {
+                    last_search = g_list_append(last_search, tb_dst->tb_stats);
+                }
+
+                /* if the tb_stats does not have a valid display_id, then set one. */
+                if (tb_dst->tb_stats && tb_dst->tb_stats->display_id == 0) {
+                    tb_dst->tb_stats->display_id = id++;
+                    valid_target_tbstats[jmp_id] = tb_dst->tb_stats;
+                }
+            }
+        }
+
+        if (valid_target_tbstats[0] && !valid_target_tbstats[1]) {
+            qemu_log("\t| targets: 0x"TARGET_FMT_lx" (id:%d)\n",
+                    valid_target_tbstats[0]->pc, valid_target_tbstats[0]->display_id);
+        } else if (!valid_target_tbstats[0] && valid_target_tbstats[1]) {
+            qemu_log("\t| targets: 0x"TARGET_FMT_lx" (id:%d)\n",
+                    valid_target_tbstats[1]->pc, valid_target_tbstats[1]->display_id);
+        } else if (valid_target_tbstats[0] && valid_target_tbstats[1]) {
+            qemu_log("\t| targets: 0x"TARGET_FMT_lx" (id:%d), "
+                     "0x"TARGET_FMT_lx" (id:%d)\n",
+                    valid_target_tbstats[0]->pc, valid_target_tbstats[0]->display_id,
+                    valid_target_tbstats[1]->pc, valid_target_tbstats[1]->display_id);
+        } else {
+            qemu_log("\t| targets: no direct target\n");
+        }
+    }
+}
+
+static void dump_tb_header(TBStatistics *tbs)
+{
+    unsigned g = stat_per_translation(tbs, code.num_guest_inst);
+    unsigned ops = stat_per_translation(tbs, code.num_tcg_ops);
+    unsigned ops_opt = stat_per_translation(tbs, code.num_tcg_ops_opt);
+    unsigned spills = stat_per_translation(tbs, code.spills);
+    unsigned h = stat_per_translation(tbs, code.out_len);
+
+    float guest_host_prop = g ? ((float) h / g) : 0;
+
+    qemu_log("TB id:%d | phys:0x"TB_PAGE_ADDR_FMT" virt:0x"TARGET_FMT_lx
+             " flags:%#08x\n", tbs->display_id, tbs->phys_pc, tbs->pc, tbs->flags);
+
+    if (tbs_stats_enabled(tbs, TB_EXEC_STATS)) {
+        qemu_log("\t| exec:%lu/%lu guest inst cov:%.2f%%\n", tbs->executions.normal,
+                tbs->executions.atomic, tbs->executions.coverage / 100.0f);
+    }
+
+    if (tbs_stats_enabled(tbs, TB_JIT_STATS)) {
+        qemu_log("\t| trans:%lu ints: g:%u op:%u op_opt:%u spills:%d"
+             "\n\t| h/g (host bytes / guest insts): %f\n",
+             tbs->translations.total, g, ops, ops_opt, spills, guest_host_prop);
+    }
+
+    if (tbs_stats_enabled(tbs, TB_JIT_TIME)) {
+        qemu_log("\t| time to gen at 2.4GHz => code:%0.2lf(ns) IR:%0.2lf(ns)\n",
+             tbs->time.code / 2.4, tbs->time.interm / 2.4);
+    }
+
+    dump_tb_targets(tbs);
+    qemu_log("\n");
+}
+
+static gint
+inverse_sort_tbs(gconstpointer p1, gconstpointer p2, gpointer psort_by)
+{
+    const TBStatistics *tbs1 = (TBStatistics *) p1;
+    const TBStatistics *tbs2 = (TBStatistics *) p2;
+    int sort_by = *((int *) psort_by);
+    unsigned long c1 = 0;
+    unsigned long c2 = 0;
+
+    if (likely(sort_by == SORT_BY_SPILLS)) {
+        c1 = stat_per_translation(tbs1, code.spills);
+        c2 = stat_per_translation(tbs2, code.spills);
+    } else if (likely(sort_by == SORT_BY_HOTNESS)) {
+        c1 = stat_per_translation(tbs1, executions.normal);
+        c2 = stat_per_translation(tbs2, executions.normal);
+    } else if (likely(sort_by == SORT_BY_HG)) {
+        if (tbs1->code.num_guest_inst == 0) {
+            return -1;
+        }
+        if (tbs2->code.num_guest_inst == 0) {
+            return 1;
+        }
+
+        float a =
+            (float) stat_per_translation(tbs1, code.out_len) / tbs1->code.num_guest_inst;
+        float b =
+            (float) stat_per_translation(tbs2, code.out_len) / tbs2->code.num_guest_inst;
+        c1 = a <= b ? 0 : 1;
+        c2 = a <= b ? 1 : 0;
+    }
+
+    return c1 < c2 ? 1 : c1 == c2 ? 0 : -1;
+}
+
+static void dump_last_search_headers(int count)
+{
+    if (!last_search) {
+        qemu_log("No data collected yet\n");
+        return;
+    }
+
+    GList *l = last_search;
+    while (l != NULL && count--) {
+        TBStatistics *tbs = (TBStatistics *) l->data;
+        GList *next = l->next;
+        dump_tb_header(tbs);
+        l = next;
+    }
+}
+
+static uint64_t calculate_last_search_coverages(void)
+{
+    uint64_t total_exec_count = 0;
+    GList *i;
+
+    /* Compute total execution count for all tbs */
+    for (i = last_search; i; i = i->next) {
+        TBStatistics *tbs = (TBStatistics *) i->data;
+        total_exec_count +=
+            (tbs->executions.atomic + tbs->executions.normal) * tbs->code.num_guest_inst;
+    }
+
+    for (i = last_search; i; i = i->next) {
+        TBStatistics *tbs = (TBStatistics *) i->data;
+        uint64_t tb_total_execs =
+            (tbs->executions.atomic + tbs->executions.normal) * tbs->code.num_guest_inst;
+        tbs->executions.coverage = (10000 * tb_total_execs) / (total_exec_count + 1);
+    }
+
+    return total_exec_count;
+}
+
+static void do_dump_coverset_info(int percentage)
+{
+    mmap_lock();
+    uint16_t total_coverage = 0;
+    unsigned coverset_size = 0;
+    percentage *= 100;
+    id = 1;
+    GList *i;
+
+    g_list_free(last_search);
+    last_search = NULL;
+
+    qht_iter(&tb_ctx.tb_stats, collect_tb_stats, NULL);
+
+    int sort_by = SORT_BY_HOTNESS;
+    last_search = g_list_sort_with_data(last_search, inverse_sort_tbs, &sort_by);
+
+    if (!last_search) {
+        qemu_log("No data collected yet\n");
+        mmap_unlock();
+        return;
+    }
+    uint64_t total_exec_count = calculate_last_search_coverages();
+
+    /* Assign display ids to TBs until the coverage target is reached */
+    for (i = last_search; i && total_coverage < percentage; i = i->next) {
+        TBStatistics *tbs = (TBStatistics *) i->data;
+        tbs->display_id = id++;
+        coverset_size++;
+        total_coverage += tbs->executions.coverage;
+    }
+
+    /* free the unused bits */
+    if (i) {
+        if (i->next) {
+            i->next->prev = NULL;
+        }
+        g_list_free(i->next);
+        i->next = NULL;
+    }
+
+    dump_last_search_headers(coverset_size);
+    mmap_unlock();
+
+    qemu_log("------------------------------\n");
+    qemu_log("%u TBs to reach %d%% of guest inst exec coverage\n",
+                coverset_size, percentage / 100);
+    qemu_log("Total of guest insts exec: %" PRIu64, total_exec_count);
+    qemu_log("\n------------------------------\n");
+}
+
+static void do_dump_tbs_info(int total, int sort_by)
+{
+    id = 1;
+    GList *i;
+    int count = total;
+
+    g_list_free(last_search);
+    last_search = NULL;
+
+    qht_iter(&tb_ctx.tb_stats, collect_tb_stats, NULL);
+
+    last_search = g_list_sort_with_data(last_search, inverse_sort_tbs, &sort_by);
+
+    if (!last_search) {
+        qemu_printf("No data collected yet!\n");
+        return;
+    }
+
+    calculate_last_search_coverages();
+
+    for (i = last_search; i && count--; i = i->next) {
+        TBStatistics *tbs = (TBStatistics *) i->data;
+        tbs->display_id = id++;
+    }
+
+    /* free the unused bits */
+    if (i) {
+        if (i->next) {
+            i->next->prev = NULL;
+        }
+        g_list_free(i->next);
+        i->next = NULL;
+    }
+
+    dump_last_search_headers(total);
+}
+
+static void
+do_dump_coverset_info_safe(CPUState *cpu, run_on_cpu_data percentage)
+{
+    qemu_log_to_monitor(true);
+    do_dump_coverset_info(percentage.host_int);
+    qemu_log_to_monitor(false);
+}
+
+struct tbs_dump_info {
+    int count;
+    int sort_by;
+};
+
+static void do_dump_tbs_info_safe(CPUState *cpu, run_on_cpu_data tbdi)
+{
+    struct tbs_dump_info *info = tbdi.host_ptr;
+    qemu_log_to_monitor(true);
+    do_dump_tbs_info(info->count, info->sort_by);
+    qemu_log_to_monitor(false);
+    g_free(info);
+}
+
+/*
+ * When we dump_tbs_info on a live system via the HMP we want to
+ * ensure the system is quiescent before we start outputting stuff.
+ * Otherwise we could pollute the output with other logging output.
+ */
+void dump_coverset_info(int percentage, bool use_monitor)
+{
+    if (use_monitor) {
+        async_safe_run_on_cpu(first_cpu, do_dump_coverset_info_safe,
+                              RUN_ON_CPU_HOST_INT(percentage));
+    } else {
+        do_dump_coverset_info(percentage);
+    }
+}
+
+void dump_tbs_info(int count, int sort_by, bool use_monitor)
+{
+    if (use_monitor) {
+        struct tbs_dump_info *tbdi = g_new(struct tbs_dump_info, 1);
+        tbdi->count = count;
+        tbdi->sort_by = sort_by;
+        async_safe_run_on_cpu(first_cpu, do_dump_tbs_info_safe,
+                              RUN_ON_CPU_HOST_PTR(tbdi));
+    } else {
+        do_dump_tbs_info(count, sort_by);
+    }
+}
+
+static GString *get_code_string(TBStatistics *tbs, int log_flags)
+{
+    int old_log_flags = qemu_loglevel;
+
+    CPUState *cpu = first_cpu;
+    uint32_t cflags = curr_cflags() | CF_NOCACHE;
+    TranslationBlock *tb = NULL;
+
+    GString *code_s = g_string_new(NULL);
+    qemu_log_to_string(true, code_s);
+
+    qemu_set_log(log_flags);
+
+    if (sigsetjmp(cpu->jmp_env, 0) == 0) {
+        mmap_lock();
+        tb = tb_gen_code(cpu, tbs->pc, tbs->cs_base, tbs->flags, cflags);
+        tb_phys_invalidate(tb, -1);
+        mmap_unlock();
+    } else {
+        /*
+         * The mmap_lock is dropped by tb_gen_code if it runs out of
+         * memory.
+         */
+        fprintf(stderr, "%s: dbg failed!\n", __func__);
+        qemu_log("\ncould not gen code for this TB\n");
+        assert_no_pages_locked();
+    }
+
+    qemu_set_log(old_log_flags);
+    qemu_log_to_string(false, NULL);
+
+    if (tb) {
+        tcg_tb_remove(tb);
+    }
+
+    return code_s;
+}
+
+static void do_tb_dump_with_statistics(TBStatistics *tbs, int log_flags)
+{
+    qemu_log("\n------------------------------\n\n");
+    dump_tb_header(tbs);
+
+    GString *code_s = get_code_string(tbs, log_flags);
+    qemu_log("%s", code_s->str);
+    g_string_free(code_s, true);
+    qemu_log("------------------------------\n");
+}
+
+struct tb_dump_info {
+    int id;
+    int log_flags;
+    bool use_monitor;
+};
+
+static void do_dump_tb_info_safe(CPUState *cpu, run_on_cpu_data info)
+{
+    struct tb_dump_info *tbdi = (struct tb_dump_info *) info.host_ptr;
+
+    if (!last_search) {
+        qemu_log("no search on record\n");
+        g_free(tbdi);
+        return;
+    }
+    qemu_log_to_monitor(tbdi->use_monitor);
+
+    TBStatistics *tbs = get_tbstats_by_id(tbdi->id);
+    if (tbs) {
+        do_tb_dump_with_statistics(tbs, tbdi->log_flags);
+    } else {
+        qemu_log("no TB statistics found with id %d\n", tbdi->id);
+    }
+
+    qemu_log_to_monitor(false);
+
+    g_free(tbdi);
+}
+
+void dump_tb_info(int id, int log_mask, bool use_monitor)
+{
+    struct tb_dump_info *tbdi = g_new(struct tb_dump_info, 1);
+
+    tbdi->id = id;
+    tbdi->log_flags = log_mask;
+    tbdi->use_monitor = use_monitor;
+
+    async_safe_run_on_cpu(first_cpu, do_dump_tb_info_safe,
+                          RUN_ON_CPU_HOST_PTR(tbdi));
+
+    /* tbdi free'd by do_dump_tb_info_safe */
+}
+
+
 void enable_collect_tb_stats(void)
 {
     init_tb_stats_htable_if_not();
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index a2e65bb85c..fa163440dc 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1778,7 +1778,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
      * generation so we can count interesting things about this
      * generation.
      */
-    if (tb_stats_collection_enabled()) {
+    if (tb_stats_collection_enabled() && !(tb->cflags & CF_NOCACHE)) {
         tb->tb_stats = tb_get_stats(phys_pc, pc, cs_base, flags, tb);
         uint32_t flag = get_default_tbstats_flag();
 
diff --git a/disas.c b/disas.c
index 3e2bfa572b..d5292d4246 100644
--- a/disas.c
+++ b/disas.c
@@ -8,6 +8,8 @@
 #include "disas/disas.h"
 #include "disas/capstone.h"
 
+#include "qemu/log-for-trace.h"
+
 typedef struct CPUDebug {
     struct disassemble_info info;
     CPUState *cpu;
@@ -420,6 +422,22 @@ static bool cap_disas_monitor(disassemble_info *info, uint64_t pc, int count)
 # define cap_disas_monitor(i, p, c)  false
 #endif /* CONFIG_CAPSTONE */
 
+static int fprintf_log(FILE *a, const char *b, ...)
+{
+    va_list ap;
+    va_start(ap, b);
+
+    if (!to_string) {
+        vfprintf(a, b, ap);
+    } else {
+        qemu_vlog(b, ap);
+    }
+
+    va_end(ap);
+
+    return 1;
+}
+
 /* Disassemble this for me please... (debugging).  */
 void target_disas(FILE *out, CPUState *cpu, target_ulong code,
                   target_ulong size)
@@ -429,7 +447,7 @@ void target_disas(FILE *out, CPUState *cpu, target_ulong code,
     int count;
     CPUDebug s;
 
-    INIT_DISASSEMBLE_INFO(s.info, out, fprintf);
+    INIT_DISASSEMBLE_INFO(s.info, out, fprintf_log);
 
     s.cpu = cpu;
     s.info.read_memory_func = target_read_memory;
@@ -460,11 +478,12 @@ void target_disas(FILE *out, CPUState *cpu, target_ulong code,
     }
 
     for (pc = code; size > 0; pc += count, size -= count) {
-	fprintf(out, "0x" TARGET_FMT_lx ":  ", pc);
-	count = s.info.print_insn(pc, &s.info);
-	fprintf(out, "\n");
-	if (count < 0)
-	    break;
+        fprintf_log(out, "0x" TARGET_FMT_lx ":  ", pc);
+        count = s.info.print_insn(pc, &s.info);
+        fprintf_log(out, "\n");
+        if (count < 0) {
+            break;
+        }
         if (size < count) {
             fprintf(out,
                     "Disassembler disagrees with translator over instruction "
diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index c59444c461..f415479011 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -289,6 +289,30 @@ ETEXI
         .help       = "show dynamic compiler info",
         .cmd        = hmp_info_jit,
     },
+    {
+        .name       = "tb-list",
+        .args_type  = "number:i?,sortedby:s?",
+        .params     = "[number sortedby]",
+        .help       = "show [number] translated blocks sorted by [sortedby]; "
+                      "sortedby opts: hotness hg spills",
+        .cmd        = hmp_info_tblist,
+    },
+    {
+        .name       = "tb",
+        .args_type  = "id:i,flags:s?",
+        .params     = "id [flag1,flag2,...]",
+        .help       = "show information about one translated block by id. "
+                      "dump flags can be used to set dump code level: out_asm in_asm op",
+        .cmd        = hmp_info_tb,
+    },
+    {
+        .name       = "coverset",
+        .args_type  = "coverage:i?",
+        .params     = "[coverage]",
+        .help       = "show hottest translated blocks necessary to cover "
+                      "[coverage]% of the execution count",
+        .cmd        = hmp_info_coverset,
+    },
 #endif
 
 STEXI
diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
index 65063c52d7..51d73e1c5f 100644
--- a/include/exec/tb-stats.h
+++ b/include/exec/tb-stats.h
@@ -35,8 +35,11 @@
 enum SortBy { SORT_BY_HOTNESS, SORT_BY_HG /* Host/Guest */, SORT_BY_SPILLS };
 enum TbstatsCmd { START, PAUSE, STOP, FILTER };
 
+#define tbs_stats_enabled(tbs, JIT_STATS) \
+    (tbs && (tbs->stats_enabled & JIT_STATS))
+
 #define tb_stats_enabled(tb, JIT_STATS) \
-    (tb && tb->tb_stats && (tb->tb_stats->stats_enabled & JIT_STATS))
+    (tb && tb->tb_stats && tbs_stats_enabled(tb->tb_stats, JIT_STATS))
 
 #define stat_per_translation(stat, name) \
     (stat->translations.total ? stat->name / stat->translations.total : 0)
@@ -64,6 +67,8 @@ struct TBStatistics {
     struct {
         unsigned long normal;
         unsigned long atomic;
+        /* filled only when dumping x% cover set */
+        uint16_t coverage;
     } executions;
 
     struct {
@@ -82,7 +87,6 @@ struct TBStatistics {
 
     struct {
         unsigned long total;
-        unsigned long uncached;
         unsigned long spanning;
     } translations;
 
@@ -95,6 +99,9 @@ struct TBStatistics {
         int64_t la;
     } time;
 
+    /* HMP information - used for referring to previous search */
+    int display_id;
+
     /* current TB linked to this TBStatistics */
     TranslationBlock *tb;
 };
@@ -115,6 +122,40 @@ struct TbstatsCommand {
 
 void do_hmp_tbstats_safe(CPUState *cpu, run_on_cpu_data icmd);
 
+/**
+ * dump_coverset_info: report the hottest blocks to cover n% of execution
+ *
+ * @percentage: cover set percentage
+ * @use_monitor: redirect output to monitor
+ *
+ * Report the hottest blocks to either the log or monitor
+ */
+void dump_coverset_info(int percentage, bool use_monitor);
+
+
+/**
+ * dump_tbs_info: report the hottest blocks
+ *
+ * @count: the limit of hotblocks
+ * @sort_by: property in which the dump will be sorted
+ * @use_monitor: redirect output to monitor
+ *
+ * Report the hottest blocks to either the log or monitor
+ */
+void dump_tbs_info(int count, int sort_by, bool use_monitor);
+
+/**
+ * dump_tb_info: dump information about one TB
+ *
+ * @id: the display id of the block (from previous search)
+ * @log_mask: the temporary logging mask
+ * @use_monitor: redirect output to monitor
+ *
+ * Re-translate the block with the given id for the purposes of debug output
+ */
+void dump_tb_info(int id, int log_mask, bool use_monitor);
+
+
 /* TBStatistic collection controls */
 void enable_collect_tb_stats(void);
 void disable_collect_tb_stats(void);
diff --git a/include/qemu/log-for-trace.h b/include/qemu/log-for-trace.h
index 2f0a5b080e..3de88484cb 100644
--- a/include/qemu/log-for-trace.h
+++ b/include/qemu/log-for-trace.h
@@ -20,6 +20,9 @@
 
 /* Private global variable, don't use */
 extern int qemu_loglevel;
+extern bool to_string;
+
+extern int32_t max_num_hot_tbs_to_dump;
 
 #define LOG_TRACE          (1 << 15)
 
@@ -31,5 +34,6 @@ static inline bool qemu_loglevel_mask(int mask)
 
 /* main logging function */
 int GCC_FMT_ATTR(1, 2) qemu_log(const char *fmt, ...);
+int qemu_vlog(const char *fmt, va_list va);
 
 #endif
diff --git a/include/qemu/log.h b/include/qemu/log.h
index a8d1997cde..804cf90f0f 100644
--- a/include/qemu/log.h
+++ b/include/qemu/log.h
@@ -114,6 +114,8 @@ typedef struct QEMULogItem {
 extern const QEMULogItem qemu_log_items[];
 
 void qemu_set_log(int log_flags);
+void qemu_log_to_monitor(bool enable);
+void qemu_log_to_string(bool enable, GString *s);
 void qemu_log_needs_buffers(void);
 void qemu_set_log_filename(const char *filename, Error **errp);
 void qemu_set_dfilter_ranges(const char *ranges, Error **errp);
diff --git a/monitor/misc.c b/monitor/misc.c
index 218263d29a..b99c018124 100644
--- a/monitor/misc.c
+++ b/monitor/misc.c
@@ -504,6 +504,80 @@ static void hmp_tbstats(Monitor *mon, const QDict *qdict)
 
 }
 
+static void hmp_info_tblist(Monitor *mon, const QDict *qdict)
+{
+    int number_int;
+    const char *sortedby_str = NULL;
+    if (!tcg_enabled()) {
+        error_report("TB information is only available with accel=tcg");
+        return;
+    }
+    if (!tb_ctx.tb_stats.map) {
+        error_report("no TB information recorded");
+        return;
+    }
+
+    number_int = qdict_get_try_int(qdict, "number", 10);
+    sortedby_str = qdict_get_try_str(qdict, "sortedby");
+
+    int sortedby = SORT_BY_HOTNESS;
+    if (sortedby_str == NULL || strcmp(sortedby_str, "hotness") == 0) {
+        sortedby = SORT_BY_HOTNESS;
+    } else if (strcmp(sortedby_str, "hg") == 0) {
+        sortedby = SORT_BY_HG;
+    } else if (strcmp(sortedby_str, "spills") == 0) {
+        sortedby = SORT_BY_SPILLS;
+    } else {
+        error_report("valid sort options are: hotness hg spills");
+        return;
+    }
+
+    dump_tbs_info(number_int, sortedby, true);
+}
+
+static void hmp_info_tb(Monitor *mon, const QDict *qdict)
+{
+    const int id = qdict_get_int(qdict, "id");
+    const char *flags = qdict_get_try_str(qdict, "flags");
+    int mask;
+
+    if (!tcg_enabled()) {
+        error_report("TB information is only available with accel=tcg");
+        return;
+    }
+
+    mask = flags ? qemu_str_to_log_mask(flags) : CPU_LOG_TB_IN_ASM;
+
+    if (!mask) {
+        error_report("Unable to parse log flags, see 'help log'");
+        return;
+    }
+
+    dump_tb_info(id, mask, true);
+}
+
+static void hmp_info_coverset(Monitor *mon, const QDict *qdict)
+{
+    int coverage;
+    if (!tcg_enabled()) {
+        error_report("TB information is only available with accel=tcg");
+        return;
+    }
+    if (!tb_stats_collection_enabled()) {
+        error_report("TB information not being recorded");
+        return;
+    }
+
+    coverage = qdict_get_try_int(qdict, "coverage", 90);
+
+    if (coverage < 0 || coverage > 100) {
+        error_report("Coverset percentage should be between 0 and 100");
+        return;
+    }
+
+    dump_coverset_info(coverage, true);
+}
+
 static void hmp_info_jit(Monitor *mon, const QDict *qdict)
 {
     if (!tcg_enabled()) {
diff --git a/util/log.c b/util/log.c
index c3805b331b..698b48d083 100644
--- a/util/log.c
+++ b/util/log.c
@@ -33,28 +33,58 @@ int qemu_loglevel;
 static int log_append = 0;
 static GArray *debug_regions;
 int32_t max_num_hot_tbs_to_dump;
+static bool to_monitor;
+bool to_string;
 
 int tcg_collect_tb_stats;
 uint32_t default_tbstats_flag;
 
-/* Return the number of characters emitted.  */
-int qemu_log(const char *fmt, ...)
+GString *string;
+
+int qemu_vlog(const char *fmt, va_list va)
 {
     int ret = 0;
-    if (qemu_logfile) {
-        va_list ap;
-        va_start(ap, fmt);
-        ret = vfprintf(qemu_logfile, fmt, ap);
-        va_end(ap);
-
-        /* Don't pass back error results.  */
-        if (ret < 0) {
-            ret = 0;
+    if (to_string) {
+        if (string) {
+            g_string_append_vprintf(string, fmt, va);
         }
+    } else if (to_monitor) {
+        ret = qemu_vprintf(fmt, va);
+    } else if (qemu_logfile) {
+        ret = vfprintf(qemu_logfile, fmt, va);
+    }
+
+    /* Don't pass back error results.  */
+    if (ret < 0) {
+        ret = 0;
     }
     return ret;
 }
 
+/* Return the number of characters emitted.  */
+int qemu_log(const char *fmt, ...)
+{
+    int ret = 0;
+    va_list ap;
+    va_start(ap, fmt);
+
+    ret = qemu_vlog(fmt, ap);
+
+    va_end(ap);
+    return ret;
+}
+
+void qemu_log_to_monitor(bool enable)
+{
+    to_monitor = enable;
+}
+
+void qemu_log_to_string(bool enable, GString *s)
+{
+    to_string = enable;
+    string = s;
+}
+
 static bool log_uses_own_buffers;
 
 /* enable or disable low levels log */
-- 
2.22.0




* [Qemu-devel] [PATCH v8 10/11] monitor: adding new info cfg command
  2019-08-29 17:34 [Qemu-devel] [PATCH v8 00/11] Measure Tiny Code Generation Quality vandersonmr
                   ` (8 preceding siblings ...)
  2019-08-29 17:34 ` [Qemu-devel] [PATCH v8 09/11] Adding info [tb-list|tb|coverset] commands to HMP vandersonmr
@ 2019-08-29 17:34 ` vandersonmr
  2019-08-30 16:26   ` Alex Bennée
  2019-08-29 17:34 ` [Qemu-devel] [PATCH v8 11/11] linux-user: dumping hot TBs at the end of the execution vandersonmr
  2019-11-21 15:38 ` [Qemu-devel] [PATCH v8 00/11] Measure Tiny Code Generation Quality Markus Armbruster
  11 siblings, 1 reply; 24+ messages in thread
From: vandersonmr @ 2019-08-29 17:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, vandersonmr, Dr. David Alan Gilbert,
	Markus Armbruster, Richard Henderson

Add the "info cfg id depth" command to HMP.
This command allows exploring a TB's
neighbors by dumping [and opening] a .dot
file of the TB's CFG neighborhood, with
nodes colored by their hotness.

The goal of this command is to allow the dynamic exploration
of TCG behavior and code quality. Therefore, for now, a
corresponding QMP command is not worthwhile.
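
For reference, the hotness coloring described above could be sketched
as follows. This is only an illustration under stated assumptions: the
names hotness_color() and format_node() are hypothetical, and the color
constants are written as 24-bit RGB values following the RGB comments
in the patch rather than copied from it. The thresholds (1.6, 1.2, 0.8
and 0.4 times the root TB's execution count) are the ones used by
fputs_tbstats() in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed 24-bit RGB values, taken from the RGB(...) comments. */
enum {
    HOT_RED1  = 0xFF0000, /* RGB(255,0,0)     */
    HOT_RED2  = 0xFF3330, /* RGB(255,51,48)   */
    MILD_RED  = 0xFF6660, /* RGB(255,102,96)  */
    WEAK_RED1 = 0xFF9990, /* RGB(255,153,144) */
    WEAK_RED2 = 0xFFCCC0  /* RGB(255,204,192) */
};

/* Pick a color by comparing a TB's execution count to the root's. */
uint32_t hotness_color(uint64_t count, uint64_t root_count)
{
    if (count > 1.6 * root_count) {
        return HOT_RED1;
    } else if (count > 1.2 * root_count) {
        return HOT_RED2;
    } else if (count < 0.4 * root_count) {
        return WEAK_RED2;
    } else if (count < 0.8 * root_count) {
        return WEAK_RED1;
    }
    return MILD_RED;
}

/* Format one filled Graphviz node statement into buf. */
int format_node(char *buf, size_t len, int id,
                uint64_t count, uint64_t root_count)
{
    assert(buf && len > 0);
    return snprintf(buf, len,
                    "node_%d [fillcolor=\"#%06x\" style=filled];",
                    id, (unsigned) hotness_color(count, root_count));
}
```

A TB executed as often as the root stays MILD_RED; hotter neighbors
move toward HOT_RED1 and colder ones toward WEAK_RED2.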

Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
---
 accel/tcg/tb-stats.c    | 164 ++++++++++++++++++++++++++++++++++++++++
 hmp-commands-info.hx    |   7 ++
 include/exec/tb-stats.h |   1 +
 monitor/misc.c          |  22 ++++++
 4 files changed, 194 insertions(+)

diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
index d588c551c9..fd2344c5d1 100644
--- a/accel/tcg/tb-stats.c
+++ b/accel/tcg/tb-stats.c
@@ -679,6 +679,170 @@ void dump_tb_info(int id, int log_mask, bool use_monitor)
     /* tbdi free'd by do_dump_tb_info_safe */
 }
 
+/* TB CFG xdot/dot dump implementation */
+#define MAX_CFG_NUM_NODES 1000
+static int cfg_tb_id;
+static GHashTable *cfg_nodes;
+static uint64_t root_count;
+
+static void fputs_jump(TBStatistics *from, TBStatistics *to, FILE *dot)
+{
+    if (!from || !to) {
+        return;
+    }
+
+    int *from_id = (int *) g_hash_table_lookup(cfg_nodes, from);
+    int *to_id   = (int *) g_hash_table_lookup(cfg_nodes, to);
+
+    if (!from_id || !to_id) {
+        return;
+    }
+
+    fprintf(dot, "   node_%d -> node_%d;\n", *from_id, *to_id);
+}
+
+/* Hotness colors used in the CFG */
+#define HOT_RED1  0xFF000 /* RGB(255,0,0)     */
+#define HOT_RED2  0xFF333 /* RGB(255,51,48)   */
+#define MILD_RED  0xFF666 /* RGB(255,102,96)  */
+#define WEAK_RED1 0xFF999 /* RGB(255,153,144) */
+#define WEAK_RED2 0xFFCCC /* RGB(255,204,192) */
+
+static void fputs_tbstats(TBStatistics *tbs, FILE *dot, int log_flags)
+{
+    if (!tbs) {
+        return;
+    }
+
+    uint32_t color = MILD_RED;
+    uint64_t count = tbs->executions.normal;
+    if (count > 1.6 * root_count) {
+        color = HOT_RED1;
+    } else if (count > 1.2 * root_count) {
+        color = HOT_RED2;
+    } else if (count < 0.4 * root_count) {
+        color = WEAK_RED2;
+    } else if (count < 0.8 * root_count) {
+        color = WEAK_RED1;
+    }
+
+    GString *code_s = get_code_string(tbs, log_flags);
+
+    for (int i = 0; i < code_s->len; i++) {
+        if (code_s->str[i] == '\n') {
+            code_s->str[i] = ' ';
+            code_s = g_string_insert(code_s, i, "\\l");
+            i += 2;
+        }
+    }
+
+    fprintf(dot,
+            "   node_%d [fillcolor=\"#%xFF0000\" shape=\"record\" "
+            "label=\"TB %d\\l"
+            "-------------\\l"
+            "PC:\t0x"TARGET_FMT_lx"\\l"
+            "exec count:\t%lu\\l"
+            "\\l %s\"];\n",
+            cfg_tb_id, color, cfg_tb_id, tbs->pc,
+            tbs->executions.normal, code_s->str);
+
+    int *id = g_new(int, 1);
+    *id = cfg_tb_id;
+    g_hash_table_insert(cfg_nodes, tbs, id);
+
+    cfg_tb_id++;
+
+    g_string_free(code_s, true);
+}
+
+static void fputs_preorder_walk(TBStatistics *tbs, int depth, FILE *dot, int log_flags)
+{
+    if (tbs && depth > 0
+            && cfg_tb_id < MAX_CFG_NUM_NODES
+            && !g_hash_table_contains(cfg_nodes, tbs)) {
+
+        fputs_tbstats(tbs, dot, log_flags);
+
+        if (tbs->tb) {
+            TranslationBlock *left_tb  = NULL;
+            TranslationBlock *right_tb = NULL;
+            if (tbs->tb->jmp_dest[0]) {
+                left_tb = (TranslationBlock *) atomic_read(tbs->tb->jmp_dest);
+            }
+            if (tbs->tb->jmp_dest[1]) {
+                right_tb = (TranslationBlock *) atomic_read(tbs->tb->jmp_dest + 1);
+            }
+
+            if (left_tb) {
+                fputs_preorder_walk(left_tb->tb_stats, depth - 1, dot, log_flags);
+                fputs_jump(tbs, left_tb->tb_stats, dot);
+            }
+            if (right_tb) {
+                fputs_preorder_walk(right_tb->tb_stats, depth - 1, dot, log_flags);
+                fputs_jump(tbs, right_tb->tb_stats, dot);
+            }
+        }
+    }
+}
+
+struct PreorderInfo {
+    TBStatistics *tbs;
+    int depth;
+    int log_flags;
+};
+
+static void fputs_preorder_walk_safe(CPUState *cpu, run_on_cpu_data icmd)
+{
+    struct PreorderInfo *info = icmd.host_ptr;
+
+    GString *file_name = g_string_new(NULL);
+    g_string_printf(file_name, "/tmp/qemu-cfg-tb-%d-%d.dot", id, info->depth);
+    FILE *dot = fopen(file_name->str, "w+");
+
+    fputs(
+            "digraph G {\n"
+            "   mclimit=1.5;\n"
+            "   rankdir=TD; ordering=out;\n"
+            "   graph[fontsize=10 fontname=\"Verdana\"];\n"
+            "   color=\"#efefef\";\n"
+            "   node[shape=box style=filled fontsize=8 fontname=\"Verdana\" fillcolor=\"#efefef\"];\n"
+            "   edge[fontsize=8 fontname=\"Verdana\"];\n"
+         , dot);
+
+    cfg_nodes = g_hash_table_new(NULL, NULL);
+    fputs_preorder_walk(info->tbs, info->depth, dot, info->log_flags);
+    g_hash_table_destroy(cfg_nodes);
+
+    fputs("}\n\0", dot);
+    fclose(dot);
+
+    qemu_log("CFG dumped: %s\n", file_name->str);
+
+    g_string_free(file_name, true);
+    g_free(info);
+}
+
+void dump_tb_cfg(int id, int depth, int log_flags)
+{
+    cfg_tb_id = 1;
+    root_count = 0;
+
+    /* do a pre-order walk in the CFG with a limited depth */
+    TBStatistics *root = get_tbstats_by_id(id);
+    if (root) {
+        root_count = root->executions.normal;
+    }
+
+    struct PreorderInfo *info = g_new(struct PreorderInfo, 1);
+    info->tbs = root;
+    info->depth = depth + 1;
+    info->log_flags = log_flags;
+    async_safe_run_on_cpu(first_cpu, fputs_preorder_walk_safe,
+            RUN_ON_CPU_HOST_PTR(info));
+}
+
+/* TBStatistic collection controls */
+
 
 void enable_collect_tb_stats(void)
 {
diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index f415479011..8c96924c0b 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -305,6 +305,13 @@ ETEXI
                       "dump flags can be used to set dump code level: out_asm in_asm op",
         .cmd        = hmp_info_tb,
     },
+    {
+        .name       = "cfg",
+        .args_type  = "id:i,depth:i?,flags:s?",
+        .params     = "id [depth flags]",
+        .help       = "plot CFG around TB with the given id",
+        .cmd        = hmp_info_cfg,
+    },
     {
         .name       = "coverset",
         .args_type  = "coverage:i?",
diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
index 51d73e1c5f..4515db106a 100644
--- a/include/exec/tb-stats.h
+++ b/include/exec/tb-stats.h
@@ -155,6 +155,7 @@ void dump_tbs_info(int count, int sort_by, bool use_monitor);
  */
 void dump_tb_info(int id, int log_mask, bool use_monitor);
 
+void dump_tb_cfg(int id, int depth, int log_flags);
 
 /* TBStatistic collection controls */
 void enable_collect_tb_stats(void);
diff --git a/monitor/misc.c b/monitor/misc.c
index b99c018124..b3b31d7035 100644
--- a/monitor/misc.c
+++ b/monitor/misc.c
@@ -556,6 +556,28 @@ static void hmp_info_tb(Monitor *mon, const QDict *qdict)
     dump_tb_info(id, mask, true);
 }
 
+static void hmp_info_cfg(Monitor *mon, const QDict *qdict)
+{
+    const int id = qdict_get_int(qdict, "id");
+    const int depth = qdict_get_try_int(qdict, "depth", 3);
+    const char *flags = qdict_get_try_str(qdict, "flags");
+    int mask;
+
+    if (!tcg_enabled()) {
+        error_report("TB information is only available with accel=tcg");
+        return;
+    }
+
+    mask = flags ? qemu_str_to_log_mask(flags) : CPU_LOG_TB_IN_ASM;
+
+    if (!mask) {
+        error_report("Unable to parse log flags, see 'help log'");
+        return;
+    }
+
+    dump_tb_cfg(id, depth, mask);
+}
+
 static void hmp_info_coverset(Monitor *mon, const QDict *qdict)
 {
     int coverage;
-- 
2.22.0




* [Qemu-devel] [PATCH v8 11/11] linux-user: dumping hot TBs at the end of the execution
  2019-08-29 17:34 [Qemu-devel] [PATCH v8 00/11] Measure Tiny Code Generation Quality vandersonmr
                   ` (9 preceding siblings ...)
  2019-08-29 17:34 ` [Qemu-devel] [PATCH v8 10/11] monitor: adding new info cfg command vandersonmr
@ 2019-08-29 17:34 ` vandersonmr
  2019-11-21 15:38 ` [Qemu-devel] [PATCH v8 00/11] Measure Tiny Code Generation Quality Markus Armbruster
  11 siblings, 0 replies; 24+ messages in thread
From: vandersonmr @ 2019-08-29 17:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: Riku Voipio, vandersonmr, Laurent Vivier

Dump, in linux-user mode, the hottest TBs when -d tb_stats is used.

Example of output for the 3 hottest TBs:

TB id:1 | phys:0x34d54 virt:0x0000000000034d54 flags:0x0000f0
        | exec:4828932/0 guest inst cov:16.38%
        | trans:1 ints: g:3 op:82 op_opt:34 spills:3
        | h/g (host bytes / guest insts): 90.666664
        | time to gen at 2.4GHz => code:3150.83(ns) IR:712.08(ns)
        | targets: 0x0000000000034d5e (id:11), 0x0000000000034d0d (id:2)

TB id:2 | phys:0x34d0d virt:0x0000000000034d0d flags:0x0000f0
        | exec:4825842/0 guest inst cov:21.82%
        | trans:1 ints: g:4 op:80 op_opt:38 spills:2
        | h/g (host bytes / guest insts): 84.000000
        | time to gen at 2.4GHz => code:3362.92(ns) IR:793.75(ns)
        | targets: 0x0000000000034d19 (id:12), 0x0000000000034d54 (id:1)

TB id:3 | phys:0xec1c1 virt:0x00000000000ec1c1 flags:0x0000b0
        | exec:872032/0 guest inst cov:1.97%
        | trans:1 ints: g:2 op:56 op_opt:26 spills:1
        | h/g (host bytes / guest insts): 68.000000
        | time to gen at 2.4GHz => code:1692.08(ns) IR:473.75(ns)
        | targets: 0x00000000000ec1c5 (id:4), 0x00000000000ec1cb (id:13)
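
For readers wondering where the "guest inst cov" figures come from, the
calculation in patch 09 can be sketched roughly as below. This is a
simplified, standalone illustration, not the code from the series: the
struct tb_sample type and the function names are hypothetical, and the
input array is assumed to be already sorted hottest-first. Coverage is
kept in hundredths of a percent (so 1638 would print as 16.38%), and
the "+ 1" in the divisor mirrors the guard against division by zero in
calculate_last_search_coverages().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for the fields used from TBStatistics. */
struct tb_sample {
    uint64_t execs;        /* normal + atomic executions */
    uint64_t guest_insts;  /* guest instructions in the TB */
    uint16_t coverage;     /* filled in below, in 1/100 of a percent */
};

/* Fill in per-TB coverage and return the total guest insts executed. */
uint64_t compute_coverage(struct tb_sample *tbs, size_t n)
{
    uint64_t total = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        total += tbs[i].execs * tbs[i].guest_insts;
    }
    for (i = 0; i < n; i++) {
        uint64_t weight = tbs[i].execs * tbs[i].guest_insts;
        tbs[i].coverage = (uint16_t) ((10000 * weight) / (total + 1));
    }
    return total;
}

/* Count how many of the hottest TBs are needed to reach percentage%. */
size_t coverset_size(const struct tb_sample *tbs, size_t n, int percentage)
{
    unsigned target = (unsigned) percentage * 100;
    unsigned covered = 0;
    size_t k = 0;

    while (k < n && covered < target) {
        covered += tbs[k++].coverage;
    }
    return k;
}
```

Calling compute_coverage() first and then coverset_size() with, say,
90 gives the number of TBs that "info coverset 90" would list.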

Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
---
 linux-user/exit.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/linux-user/exit.c b/linux-user/exit.c
index bdda720553..7226104959 100644
--- a/linux-user/exit.c
+++ b/linux-user/exit.c
@@ -28,6 +28,10 @@ extern void __gcov_dump(void);
 
 void preexit_cleanup(CPUArchState *env, int code)
 {
+    if (tb_stats_collection_enabled()) {
+        dump_tbs_info(max_num_hot_tbs_to_dump, SORT_BY_HOTNESS, false);
+    }
+
 #ifdef TARGET_GPROF
         _mcleanup();
 #endif
-- 
2.22.0




* Re: [Qemu-devel] [PATCH v8 08/11] Adding tb_stats [start|pause|stop|filter] command to hmp.
  2019-08-29 17:34 ` [Qemu-devel] [PATCH v8 08/11] Adding tb_stats [start|pause|stop|filter] command to hmp vandersonmr
@ 2019-08-29 17:54   ` Vanderson Martins do Rosario
  0 siblings, 0 replies; 24+ messages in thread
From: Vanderson Martins do Rosario @ 2019-08-29 17:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: Dr . David Alan Gilbert, Markus Armbruster

Oops, this commit shouldn't exist. My mistake. The series still works, though.
I will remove it in v9.


Vanderson M. Rosario


On Thu, Aug 29, 2019 at 2:35 PM vandersonmr <vandersonmr2@gmail.com> wrote:

> This allows controlling the collection of statistics.
> It is also possible to set the level of collection:
> all, jit, or exec.
>
> tb_stats filter allow to only collect statistics for the TB
> in the last_search list.
>
> The goal of this command is to allow the dynamic exploration
> of the TCG behavior and quality. Therefore, for now, a
> corresponding QMP command is not worthwhile.
>
> Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
> ---
>  monitor/misc.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/monitor/misc.c b/monitor/misc.c
> index b389ca09a1..218263d29a 100644
> --- a/monitor/misc.c
> +++ b/monitor/misc.c
> @@ -74,6 +74,8 @@
>  #include "sysemu/cpus.h"
>  #include "qemu/cutils.h"
>  #include "tcg/tcg.h"
> +#include "exec/tb-stats.h"
> +#include "qemu-common.h"
>
>  #if defined(TARGET_S390X)
>  #include "hw/s390x/storage-keys.h"
> --
> 2.22.0
>
>


* Re: [Qemu-devel] [PATCH v8 02/11] accel: collecting TB execution count
  2019-08-29 17:34 ` [Qemu-devel] [PATCH v8 02/11] accel: collecting TB execution count vandersonmr
@ 2019-08-30 10:21   ` Alex Bennée
  2019-08-30 12:31     ` Vanderson Martins do Rosario
  0 siblings, 1 reply; 24+ messages in thread
From: Alex Bennée @ 2019-08-30 10:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, vandersonmr, Richard Henderson


vandersonmr <vandersonmr2@gmail.com> writes:

> If a TB has a TBS (TBStatistics) with the TB_EXEC_STATS
> enabled, then we instrument the start code of this TB
> to atomically count the number of times it is executed.
> We count both the number of "normal" executions and atomic
> executions of a TB.
>
> The execution count of the TB is stored in its respective
> TBS.
>
> All TBStatistics are created by default with the flags from
> default_tbstats_flag.
>
> Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
> ---
>  accel/tcg/cpu-exec.c      |  4 ++++
>  accel/tcg/tb-stats.c      |  5 +++++
>  accel/tcg/tcg-runtime.c   |  7 +++++++
>  accel/tcg/tcg-runtime.h   |  2 ++
>  accel/tcg/translate-all.c |  7 +++++++
>  accel/tcg/translator.c    |  1 +
>  include/exec/gen-icount.h |  9 +++++++++
>  include/exec/tb-stats.h   | 19 +++++++++++++++++++
>  util/log.c                |  1 +
>  9 files changed, 55 insertions(+)
>
> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> index 48272c781b..9b2b7bff80 100644
> --- a/accel/tcg/cpu-exec.c
> +++ b/accel/tcg/cpu-exec.c
> @@ -251,6 +251,10 @@ void cpu_exec_step_atomic(CPUState *cpu)
>
>          start_exclusive();
>
> +        if (tb_stats_enabled(tb, TB_EXEC_STATS)) {
> +            tb->tb_stats->executions.atomic++;
> +        }
> +
>          /* Since we got here, we know that parallel_cpus must be true.  */
>          parallel_cpus = false;
>          in_exclusive_region = true;
> diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
> index 948b107e68..1db81d83e7 100644
> --- a/accel/tcg/tb-stats.c
> +++ b/accel/tcg/tb-stats.c
> @@ -61,3 +61,8 @@ bool tb_stats_collection_paused(void)
>  {
>      return tcg_collect_tb_stats == TB_STATS_PAUSED;
>  }
> +
> +uint32_t get_default_tbstats_flag(void)
> +{
> +    return default_tbstats_flag;
> +}
> diff --git a/accel/tcg/tcg-runtime.c b/accel/tcg/tcg-runtime.c
> index 8a1e408e31..6f4aafba11 100644
> --- a/accel/tcg/tcg-runtime.c
> +++ b/accel/tcg/tcg-runtime.c
> @@ -167,3 +167,10 @@ void HELPER(exit_atomic)(CPUArchState *env)
>  {
>      cpu_loop_exit_atomic(env_cpu(env), GETPC());
>  }
> +
> +void HELPER(inc_exec_freq)(void *ptr)
> +{
> +    TBStatistics *stats = (TBStatistics *) ptr;
> +    g_assert(stats);
> +    atomic_inc(&stats->executions.normal);
> +}
> diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h
> index 4fa61b49b4..bf0b75dbe8 100644
> --- a/accel/tcg/tcg-runtime.h
> +++ b/accel/tcg/tcg-runtime.h
> @@ -28,6 +28,8 @@ DEF_HELPER_FLAGS_1(lookup_tb_ptr, TCG_CALL_NO_WG_SE, ptr, env)
>
>  DEF_HELPER_FLAGS_1(exit_atomic, TCG_CALL_NO_WG, noreturn, env)
>
> +DEF_HELPER_FLAGS_1(inc_exec_freq, TCG_CALL_NO_RWG, void, ptr)
> +
>  #ifdef CONFIG_SOFTMMU
>
>  DEF_HELPER_FLAGS_5(atomic_cmpxchgb, TCG_CALL_NO_WG,
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index b7bccacd3b..e72aeba682 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -1785,6 +1785,13 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>       */
>      if (tb_stats_collection_enabled()) {
>          tb->tb_stats = tb_get_stats(phys_pc, pc, cs_base, flags, tb);
> +
> +        if (qemu_log_in_addr_range(tb->pc)) {

We can hoist this test out of the inner block: it always passes when no
dfilter has been set, and there is no point creating a tb_stats record
that we won't fill in. So:

  if (qemu_log_in_addr_range(tb->pc)) {
     tb->tb_stats = tb_get_stats(phys_pc, pc, cs_base, flags, tb);
     uint32_t flag = get_default_tbstats_flag();

     if (flag & TB_EXEC_STATS) {
       ...

The additional tests added by later patches go in the same place. This
way we only create and collect stats for the TBs we actually want.
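
As a rough, standalone illustration of that shape (not the real QEMU
code), the sketch below models the range filter and creates a stats
record only for TBs the filter selects. ToyTB, ToyStats, toy_in_addr_range()
and toy_attach_stats() are hypothetical stand-ins; only the TB_EXEC_STATS
bit and the "no filter means everything matches" behaviour are taken from
the discussion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TB_EXEC_STATS (1u << 1)   /* same bit value as in the patch */
#define TOY_MAX_STATS 16

/* Hypothetical stand-ins for TBStatistics and TranslationBlock. */
typedef struct { uint32_t stats_enabled; } ToyStats;
typedef struct { uint64_t pc; ToyStats *tb_stats; } ToyTB;

static bool     filter_set;               /* models "a dfilter was given" */
static uint64_t filter_lo, filter_hi;     /* inclusive filter range */
static uint32_t default_flag = TB_EXEC_STATS;
static ToyStats stats_pool[TOY_MAX_STATS];
static size_t   stats_created;

/* Models qemu_log_in_addr_range(): always true when no filter is set. */
static bool toy_in_addr_range(uint64_t pc)
{
    return !filter_set || (pc >= filter_lo && pc <= filter_hi);
}

/* Create a stats record only for TBs that the filter selects. */
static void toy_attach_stats(ToyTB *tb, bool collection_enabled)
{
    tb->tb_stats = NULL;
    if (collection_enabled && toy_in_addr_range(tb->pc) &&
        stats_created < TOY_MAX_STATS) {
        tb->tb_stats = &stats_pool[stats_created++];
        tb->tb_stats->stats_enabled = 0;
        if (default_flag & TB_EXEC_STATS) {
            tb->tb_stats->stats_enabled |= TB_EXEC_STATS;
        }
    }
}
```

With no filter set every TB gets a record; once a filter is set, TBs
outside the range keep tb_stats == NULL and cost nothing.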

> +            uint32_t flag = get_default_tbstats_flag();
> +            if (flag & TB_EXEC_STATS) {
> +                tb->tb_stats->stats_enabled |= TB_EXEC_STATS;
> +            }
> +        }
>      } else {
>          tb->tb_stats = NULL;
>      }
> diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
> index 70c66c538c..ec6bd829a0 100644
> --- a/accel/tcg/translator.c
> +++ b/accel/tcg/translator.c
> @@ -46,6 +46,7 @@ void translator_loop(const TranslatorOps *ops, DisasContextBase *db,
>
>      ops->init_disas_context(db, cpu);
>      tcg_debug_assert(db->is_jmp == DISAS_NEXT);  /* no early exit */
> +    gen_tb_exec_count(tb);
>
>      /* Reset the temp count so that we can identify leaks */
>      tcg_clear_temp_count();
> diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h
> index 822c43cfd3..be006383b9 100644
> --- a/include/exec/gen-icount.h
> +++ b/include/exec/gen-icount.h
> @@ -32,6 +32,15 @@ static inline void gen_io_end(void)
>      tcg_temp_free_i32(tmp);
>  }
>
> +static inline void gen_tb_exec_count(TranslationBlock *tb)
> +{
> +    if (tb_stats_enabled(tb, TB_EXEC_STATS)) {
> +        TCGv_ptr ptr = tcg_const_ptr(tb->tb_stats);
> +        gen_helper_inc_exec_freq(ptr);
> +        tcg_temp_free_ptr(ptr);
> +    }
> +}
> +
>  static inline void gen_tb_start(TranslationBlock *tb)
>  {
>      TCGv_i32 count, imm;
> diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
> index 898e05a36f..c4a8715400 100644
> --- a/include/exec/tb-stats.h
> +++ b/include/exec/tb-stats.h
> @@ -30,6 +30,9 @@
>  #include "exec/tb-context.h"
>  #include "tcg.h"
>
> +#define tb_stats_enabled(tb, JIT_STATS) \
> +    (tb && tb->tb_stats && (tb->tb_stats->stats_enabled & JIT_STATS))
> +
>  typedef struct TBStatistics TBStatistics;
>
>  /*
> @@ -46,6 +49,15 @@ struct TBStatistics {
>      uint32_t     flags;
>      /* cs_base isn't included in the hash but we do check for matches */
>      target_ulong cs_base;
> +
> +    uint32_t stats_enabled;
> +
> +    /* Execution stats */
> +    struct {
> +        unsigned long normal;
> +        unsigned long atomic;
> +    } executions;
> +
>      /* current TB linked to this TBStatistics */
>      TranslationBlock *tb;
>  };
> @@ -56,7 +68,12 @@ void init_tb_stats_htable_if_not(void);
>
>  /* TBStatistic collection controls */
>  enum TBStatsStatus { TB_STATS_RUNNING, TB_STATS_PAUSED, TB_STATS_STOPPED };
> +
> +#define TB_NOTHING    (1 << 0)
> +#define TB_EXEC_STATS (1 << 1)
> +
>  extern int tcg_collect_tb_stats;
> +extern uint32_t default_tbstats_flag;
>
>  void enable_collect_tb_stats(void);
>  void disable_collect_tb_stats(void);
> @@ -64,4 +81,6 @@ void pause_collect_tb_stats(void);
>  bool tb_stats_collection_enabled(void);
>  bool tb_stats_collection_paused(void);
>
> +uint32_t get_default_tbstats_flag(void);
> +
>  #endif
> diff --git a/util/log.c b/util/log.c
> index 393a17115b..29021a4584 100644
> --- a/util/log.c
> +++ b/util/log.c
> @@ -32,6 +32,7 @@ static int log_append = 0;
>  static GArray *debug_regions;
>
>  int tcg_collect_tb_stats;
> +uint32_t default_tbstats_flag;
>
>  /* Return the number of characters emitted.  */
>  int qemu_log(const char *fmt, ...)


--
Alex Bennée



* Re: [Qemu-devel] [PATCH v8 02/11] accel: collecting TB execution count
  2019-08-30 10:21   ` Alex Bennée
@ 2019-08-30 12:31     ` Vanderson Martins do Rosario
  2019-08-30 13:01       ` Alex Bennée
  0 siblings, 1 reply; 24+ messages in thread
From: Vanderson Martins do Rosario @ 2019-08-30 12:31 UTC (permalink / raw)
  To: Alex Bennée; +Cc: Paolo Bonzini, qemu-devel, Richard Henderson

Ok. I hadn't changed it before because I would like to be able to collect
information for already translated TBs when I, for instance, remove the
filter during execution. Having the TBStats already created guarantees this.
To guarantee this with your approach, we would need to tb_flush when changing
the filter. Does that make sense? Is that OK with you?

Vanderson M. Rosario


On Fri, Aug 30, 2019 at 7:21 AM Alex Bennée <alex.bennee@linaro.org> wrote:

>
> vandersonmr <vandersonmr2@gmail.com> writes:
>
> > If a TB has a TBS (TBStatistics) with the TB_EXEC_STATS
> > enabled, then we instrument the start code of this TB
> > to atomically count the number of times it is executed.
> > We count both the number of "normal" executions and atomic
> > executions of a TB.
> >
> > The execution count of the TB is stored in its respective
> > TBS.
> >
> > All TBStatistics are created by default with the flags from
> > default_tbstats_flag.
> >
> > Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
> > ---
> >  accel/tcg/cpu-exec.c      |  4 ++++
> >  accel/tcg/tb-stats.c      |  5 +++++
> >  accel/tcg/tcg-runtime.c   |  7 +++++++
> >  accel/tcg/tcg-runtime.h   |  2 ++
> >  accel/tcg/translate-all.c |  7 +++++++
> >  accel/tcg/translator.c    |  1 +
> >  include/exec/gen-icount.h |  9 +++++++++
> >  include/exec/tb-stats.h   | 19 +++++++++++++++++++
> >  util/log.c                |  1 +
> >  9 files changed, 55 insertions(+)
> >
> > diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> > index 48272c781b..9b2b7bff80 100644
> > --- a/accel/tcg/cpu-exec.c
> > +++ b/accel/tcg/cpu-exec.c
> > @@ -251,6 +251,10 @@ void cpu_exec_step_atomic(CPUState *cpu)
> >
> >          start_exclusive();
> >
> > +        if (tb_stats_enabled(tb, TB_EXEC_STATS)) {
> > +            tb->tb_stats->executions.atomic++;
> > +        }
> > +
> >          /* Since we got here, we know that parallel_cpus must be true.  */
> >          parallel_cpus = false;
> >          in_exclusive_region = true;
> > diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
> > index 948b107e68..1db81d83e7 100644
> > --- a/accel/tcg/tb-stats.c
> > +++ b/accel/tcg/tb-stats.c
> > @@ -61,3 +61,8 @@ bool tb_stats_collection_paused(void)
> >  {
> >      return tcg_collect_tb_stats == TB_STATS_PAUSED;
> >  }
> > +
> > +uint32_t get_default_tbstats_flag(void)
> > +{
> > +    return default_tbstats_flag;
> > +}
> > diff --git a/accel/tcg/tcg-runtime.c b/accel/tcg/tcg-runtime.c
> > index 8a1e408e31..6f4aafba11 100644
> > --- a/accel/tcg/tcg-runtime.c
> > +++ b/accel/tcg/tcg-runtime.c
> > @@ -167,3 +167,10 @@ void HELPER(exit_atomic)(CPUArchState *env)
> >  {
> >      cpu_loop_exit_atomic(env_cpu(env), GETPC());
> >  }
> > +
> > +void HELPER(inc_exec_freq)(void *ptr)
> > +{
> > +    TBStatistics *stats = (TBStatistics *) ptr;
> > +    g_assert(stats);
> > +    atomic_inc(&stats->executions.normal);
> > +}
> > diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h
> > index 4fa61b49b4..bf0b75dbe8 100644
> > --- a/accel/tcg/tcg-runtime.h
> > +++ b/accel/tcg/tcg-runtime.h
> > @@ -28,6 +28,8 @@ DEF_HELPER_FLAGS_1(lookup_tb_ptr, TCG_CALL_NO_WG_SE, ptr, env)
> >
> >  DEF_HELPER_FLAGS_1(exit_atomic, TCG_CALL_NO_WG, noreturn, env)
> >
> > +DEF_HELPER_FLAGS_1(inc_exec_freq, TCG_CALL_NO_RWG, void, ptr)
> > +
> >  #ifdef CONFIG_SOFTMMU
> >
> >  DEF_HELPER_FLAGS_5(atomic_cmpxchgb, TCG_CALL_NO_WG,
> > diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> > index b7bccacd3b..e72aeba682 100644
> > --- a/accel/tcg/translate-all.c
> > +++ b/accel/tcg/translate-all.c
> > @@ -1785,6 +1785,13 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
> >       */
> >      if (tb_stats_collection_enabled()) {
> >          tb->tb_stats = tb_get_stats(phys_pc, pc, cs_base, flags, tb);
> > +
> > +        if (qemu_log_in_addr_range(tb->pc)) {
>
> We can open this out because this test will always pass if no dfilter
> has been set and there is no point creating a tb_stats record if we
> won't fill it in. So
>
>   if (qemu_log_in_addr_range(tb->pc)) {
>      tb->tb_stats = tb_get_stats(phys_pc, pc, cs_base, flags, tb);
>      uint32_t flag = get_default_tbstats_flag();
>
>      if (flag & TB_EXEC_STATS) {
>        ...
>
> And the additional tests that get added later. This way we'll only
> create and collect stats for what we want.
>
> > +            uint32_t flag = get_default_tbstats_flag();
> > +            if (flag & TB_EXEC_STATS) {
> > +                tb->tb_stats->stats_enabled |= TB_EXEC_STATS;
> > +            }
> > +        }
> >      } else {
> >          tb->tb_stats = NULL;
> >      }
> > diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
> > index 70c66c538c..ec6bd829a0 100644
> > --- a/accel/tcg/translator.c
> > +++ b/accel/tcg/translator.c
> > @@ -46,6 +46,7 @@ void translator_loop(const TranslatorOps *ops, DisasContextBase *db,
> >
> >      ops->init_disas_context(db, cpu);
> >      tcg_debug_assert(db->is_jmp == DISAS_NEXT);  /* no early exit */
> > +    gen_tb_exec_count(tb);
> >
> >      /* Reset the temp count so that we can identify leaks */
> >      tcg_clear_temp_count();
> > diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h
> > index 822c43cfd3..be006383b9 100644
> > --- a/include/exec/gen-icount.h
> > +++ b/include/exec/gen-icount.h
> > @@ -32,6 +32,15 @@ static inline void gen_io_end(void)
> >      tcg_temp_free_i32(tmp);
> >  }
> >
> > +static inline void gen_tb_exec_count(TranslationBlock *tb)
> > +{
> > +    if (tb_stats_enabled(tb, TB_EXEC_STATS)) {
> > +        TCGv_ptr ptr = tcg_const_ptr(tb->tb_stats);
> > +        gen_helper_inc_exec_freq(ptr);
> > +        tcg_temp_free_ptr(ptr);
> > +    }
> > +}
> > +
> >  static inline void gen_tb_start(TranslationBlock *tb)
> >  {
> >      TCGv_i32 count, imm;
> > diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
> > index 898e05a36f..c4a8715400 100644
> > --- a/include/exec/tb-stats.h
> > +++ b/include/exec/tb-stats.h
> > @@ -30,6 +30,9 @@
> >  #include "exec/tb-context.h"
> >  #include "tcg.h"
> >
> > +#define tb_stats_enabled(tb, JIT_STATS) \
> > +    (tb && tb->tb_stats && (tb->tb_stats->stats_enabled & JIT_STATS))
> > +
> >  typedef struct TBStatistics TBStatistics;
> >
> >  /*
> > @@ -46,6 +49,15 @@ struct TBStatistics {
> >      uint32_t     flags;
> >      /* cs_base isn't included in the hash but we do check for matches */
> >      target_ulong cs_base;
> > +
> > +    uint32_t stats_enabled;
> > +
> > +    /* Execution stats */
> > +    struct {
> > +        unsigned long normal;
> > +        unsigned long atomic;
> > +    } executions;
> > +
> >      /* current TB linked to this TBStatistics */
> >      TranslationBlock *tb;
> >  };
> > @@ -56,7 +68,12 @@ void init_tb_stats_htable_if_not(void);
> >
> >  /* TBStatistic collection controls */
> >  enum TBStatsStatus { TB_STATS_RUNNING, TB_STATS_PAUSED, TB_STATS_STOPPED };
> > +
> > +#define TB_NOTHING    (1 << 0)
> > +#define TB_EXEC_STATS (1 << 1)
> > +
> >  extern int tcg_collect_tb_stats;
> > +extern uint32_t default_tbstats_flag;
> >
> >  void enable_collect_tb_stats(void);
> >  void disable_collect_tb_stats(void);
> > @@ -64,4 +81,6 @@ void pause_collect_tb_stats(void);
> >  bool tb_stats_collection_enabled(void);
> >  bool tb_stats_collection_paused(void);
> >
> > +uint32_t get_default_tbstats_flag(void);
> > +
> >  #endif
> > diff --git a/util/log.c b/util/log.c
> > index 393a17115b..29021a4584 100644
> > --- a/util/log.c
> > +++ b/util/log.c
> > @@ -32,6 +32,7 @@ static int log_append = 0;
> >  static GArray *debug_regions;
> >
> >  int tcg_collect_tb_stats;
> > +uint32_t default_tbstats_flag;
> >
> >  /* Return the number of characters emitted.  */
> >  int qemu_log(const char *fmt, ...)
>
>
> --
> Alex Bennée
>


* Re: [Qemu-devel] [PATCH v8 01/11] accel: introducing TBStatistics structure
  2019-08-29 17:34 ` [Qemu-devel] [PATCH v8 01/11] accel: introducing TBStatistics structure vandersonmr
@ 2019-08-30 12:59   ` Alex Bennée
  0 siblings, 0 replies; 24+ messages in thread
From: Alex Bennée @ 2019-08-30 12:59 UTC (permalink / raw)
  To: vandersonmr; +Cc: Paolo Bonzini, qemu-devel, Richard Henderson


vandersonmr <vandersonmr2@gmail.com> writes:

> To store statistics for each TB, we created a TBStatistics structure
> which is linked with the TBs. TBStatistics can stay alive after
> tb_flush and be relinked to a regenerated TB. So the statistics can
> be accumulated even through flushes.
>
> The goal is to have all present and future qemu/tcg statistics and
> meta-data stored in this new structure.
>
> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
> ---
>  accel/tcg/Makefile.objs   |  2 +-
>  accel/tcg/tb-stats.c      | 63 ++++++++++++++++++++++++++++++++++++
>  accel/tcg/translate-all.c | 57 +++++++++++++++++++++++++++++++++
>  include/exec/exec-all.h   | 15 +++------
>  include/exec/tb-context.h | 12 +++++++
>  include/exec/tb-hash.h    |  7 ++++
>  include/exec/tb-stats.h   | 67 +++++++++++++++++++++++++++++++++++++++
>  util/log.c                |  2 ++
>  8 files changed, 213 insertions(+), 12 deletions(-)
>  create mode 100644 accel/tcg/tb-stats.c
>  create mode 100644 include/exec/tb-stats.h
>
> diff --git a/accel/tcg/Makefile.objs b/accel/tcg/Makefile.objs
> index d381a02f34..49ffe81b5d 100644
> --- a/accel/tcg/Makefile.objs
> +++ b/accel/tcg/Makefile.objs
> @@ -2,7 +2,7 @@ obj-$(CONFIG_SOFTMMU) += tcg-all.o
>  obj-$(CONFIG_SOFTMMU) += cputlb.o
>  obj-y += tcg-runtime.o tcg-runtime-gvec.o
>  obj-y += cpu-exec.o cpu-exec-common.o translate-all.o
> -obj-y += translator.o
> +obj-y += translator.o tb-stats.o
>  
>  obj-$(CONFIG_USER_ONLY) += user-exec.o
>  obj-$(call lnot,$(CONFIG_SOFTMMU)) += user-exec-stub.o
> diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
> new file mode 100644
> index 0000000000..948b107e68
> --- /dev/null
> +++ b/accel/tcg/tb-stats.c
> @@ -0,0 +1,63 @@
> +/*
> + * QEMU System Emulator, Code Quality Monitor System
> + *
> + * Copyright (c) 2019 Vanderson M. do Rosario <vandersonmr2@gmail.com>
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "qemu/osdep.h"
> +
> +#include "disas/disas.h"
> +
> +#include "exec/tb-stats.h"
> +
> +void init_tb_stats_htable_if_not(void)
> +{
> +    if (tb_stats_collection_enabled() && !tb_ctx.tb_stats.map) {
> +        qht_init(&tb_ctx.tb_stats, tb_stats_cmp,
> +                CODE_GEN_HTABLE_SIZE, QHT_MODE_AUTO_RESIZE);
> +    }
> +}
> +
> +void enable_collect_tb_stats(void)
> +{
> +    init_tb_stats_htable_if_not();
> +    tcg_collect_tb_stats = TB_STATS_RUNNING;
> +}
> +
> +void disable_collect_tb_stats(void)
> +{
> +    tcg_collect_tb_stats = TB_STATS_STOPPED;
> +}
> +
> +void pause_collect_tb_stats(void)
> +{
> +    tcg_collect_tb_stats = TB_STATS_PAUSED;
> +}
> +
> +bool tb_stats_collection_enabled(void)
> +{
> +    return tcg_collect_tb_stats == TB_STATS_RUNNING;
> +}
> +
> +bool tb_stats_collection_paused(void)
> +{
> +    return tcg_collect_tb_stats == TB_STATS_PAUSED;
> +}
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index 5d1e08b169..b7bccacd3b 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -1118,6 +1118,23 @@ static inline void code_gen_alloc(size_t tb_size)
>      }
>  }
>  
> +/*
> + * This is more or less the same comparison as tb_cmp(), but the
> + * data persists over tb_flush. We also aggregate the various
> + * variations of cflags under one record and ignore the details of
> + * page overlap (although we can count it).
> + */
> +bool tb_stats_cmp(const void *ap, const void *bp)
> +{
> +    const TBStatistics *a = ap;
> +    const TBStatistics *b = bp;
> +
> +    return a->phys_pc == b->phys_pc &&
> +        a->pc == b->pc &&
> +        a->cs_base == b->cs_base &&
> +        a->flags == b->flags;
> +}
> +
>  static bool tb_cmp(const void *ap, const void *bp)
>  {
>      const TranslationBlock *a = ap;
> @@ -1137,6 +1154,7 @@ static void tb_htable_init(void)
>      unsigned int mode = QHT_MODE_AUTO_RESIZE;
>  
>      qht_init(&tb_ctx.htable, tb_cmp, CODE_GEN_HTABLE_SIZE, mode);
> +    init_tb_stats_htable_if_not();
>  }
>  
>  /* Must be called before using the QEMU cpus. 'tb_size' is the size
> @@ -1666,6 +1684,34 @@ tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
>      return tb;
>  }
>  
> +static TBStatistics *tb_get_stats(tb_page_addr_t phys_pc, target_ulong pc,
> +                                  target_ulong cs_base, uint32_t flags,
> +                                  TranslationBlock *current_tb)
> +{
> +    TBStatistics *new_stats = g_new0(TBStatistics, 1);
> +    uint32_t hash = tb_stats_hash_func(phys_pc, pc, flags);
> +    void *existing_stats = NULL;
> +    new_stats->phys_pc = phys_pc;
> +    new_stats->pc = pc;
> +    new_stats->cs_base = cs_base;
> +    new_stats->flags = flags;
> +    new_stats->tb = current_tb;

Drop the current_tb and the field from TBStatistics - it's not used by
anything else in this patch. Also see below:

> +
> +    qht_insert(&tb_ctx.tb_stats, new_stats, hash, &existing_stats);
> +
> +    if (unlikely(existing_stats)) {
> +        /*
> +         * If there is already a TBStatistic for this TB from a previous flush
> +         * then just make the new TB point to the older TBStatistic
> +         */
> +        g_free(new_stats);
> +        return existing_stats;
> +    } else {
> +        return new_stats;
> +    }
> +}
> +
> +
>  /* Called with mmap_lock held for user mode emulation.  */
>  TranslationBlock *tb_gen_code(CPUState *cpu,
>                                target_ulong pc, target_ulong cs_base,
> @@ -1732,6 +1778,17 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>      ti = profile_getclock();
>  #endif
>  
> +    /*
> +     * We want to fetch the stats structure before we start code
> +     * generation so we can count interesting things about this
> +     * generation.
> +     */
> +    if (tb_stats_collection_enabled()) {
> +        tb->tb_stats = tb_get_stats(phys_pc, pc, cs_base, flags, tb);
> +    } else {
> +        tb->tb_stats = NULL;
> +    }
> +
>      tcg_func_start(tcg_ctx);
>  
>      tcg_ctx->cpu = env_cpu(env);
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index 135aeaab0d..1e0bd4e147 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -23,21 +23,11 @@
>  #include "cpu.h"
>  #include "exec/tb-context.h"
>  #include "sysemu/cpus.h"
> +#include "exec/tb-stats.h"
>  
>  /* allow to see translation results - the slowdown should be negligible, so we leave it */
>  #define DEBUG_DISAS
>  
> -/* Page tracking code uses ram addresses in system mode, and virtual
> -   addresses in userspace mode.  Define tb_page_addr_t to be an appropriate
> -   type.  */
> -#if defined(CONFIG_USER_ONLY)
> -typedef abi_ulong tb_page_addr_t;
> -#define TB_PAGE_ADDR_FMT TARGET_ABI_FMT_lx
> -#else
> -typedef ram_addr_t tb_page_addr_t;
> -#define TB_PAGE_ADDR_FMT RAM_ADDR_FMT
> -#endif
> -
>  #include "qemu/log.h"
>  
>  void gen_intermediate_code(CPUState *cpu, TranslationBlock *tb, int max_insns);
> @@ -404,6 +394,9 @@ struct TranslationBlock {
>      uintptr_t jmp_list_head;
>      uintptr_t jmp_list_next[2];
>      uintptr_t jmp_dest[2];
> +
> +    /* Pointer to a struct where statistics from the TB is stored */
> +    TBStatistics *tb_stats;
>  };
>  
>  extern bool parallel_cpus;
> diff --git a/include/exec/tb-context.h b/include/exec/tb-context.h
> index feb585e0a7..3cfb62a338 100644
> --- a/include/exec/tb-context.h
> +++ b/include/exec/tb-context.h
> @@ -23,6 +23,17 @@
>  #include "qemu/thread.h"
>  #include "qemu/qht.h"
>  
> +/* Page tracking code uses ram addresses in system mode, and virtual
> +   addresses in userspace mode.  Define tb_page_addr_t to be an appropriate
> +   type.  */
> +#if defined(CONFIG_USER_ONLY)
> +typedef abi_ulong tb_page_addr_t;
> +#define TB_PAGE_ADDR_FMT TARGET_ABI_FMT_lx
> +#else
> +typedef ram_addr_t tb_page_addr_t;
> +#define TB_PAGE_ADDR_FMT RAM_ADDR_FMT
> +#endif
> +
>  #define CODE_GEN_HTABLE_BITS     15
>  #define CODE_GEN_HTABLE_SIZE     (1 << CODE_GEN_HTABLE_BITS)
>  
> @@ -35,6 +46,7 @@ struct TBContext {
>  
>      /* statistics */
>      unsigned tb_flush_count;
> +    struct qht tb_stats;
>  };
>  
>  extern TBContext tb_ctx;
> diff --git a/include/exec/tb-hash.h b/include/exec/tb-hash.h
> index 805235d321..dd8e8f252a 100644
> --- a/include/exec/tb-hash.h
> +++ b/include/exec/tb-hash.h
> @@ -66,4 +66,11 @@ uint32_t tb_hash_func(tb_page_addr_t phys_pc, target_ulong pc, uint32_t flags,
>      return qemu_xxhash7(phys_pc, pc, flags, cf_mask, trace_vcpu_dstate);
>  }
>  
> +static inline
> +uint32_t tb_stats_hash_func(tb_page_addr_t phys_pc, target_ulong pc,
> +                            uint32_t flags)
> +{
> +    return qemu_xxhash5(phys_pc, pc, flags);
> +}
> +
>  #endif
> diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
> new file mode 100644
> index 0000000000..898e05a36f
> --- /dev/null
> +++ b/include/exec/tb-stats.h
> @@ -0,0 +1,67 @@
> +/*
> + * QEMU System Emulator, Code Quality Monitor System
> + *
> + * Copyright (c) 2019 Vanderson M. do Rosario <vandersonmr2@gmail.com>
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#ifndef TB_STATS_H
> +
> +#define TB_STATS_H
> +
> +#include "exec/cpu-common.h"
> +#include "exec/tb-context.h"
> +#include "tcg.h"
> +
> +typedef struct TBStatistics TBStatistics;
> +
> +/*
> + * This struct stores statistics such as execution count of the
> + * TranslationBlocks. Each set of TBs for a given phys_pc/pc/flags
> + * has its own TBStatistics which will persist over tb_flush.
> + *
> + * We include additional counters to track number of translations as
> + * well as variants for compile flags.
> + */
> +struct TBStatistics {
> +    tb_page_addr_t phys_pc;
> +    target_ulong pc;
> +    uint32_t     flags;
> +    /* cs_base isn't included in the hash but we do check for matches */
> +    target_ulong cs_base;
> +    /* current TB linked to this TBStatistics */
> +    TranslationBlock *tb;

This seems sketchy to me. A TBStats record has a one-to-many relationship
with TBs, so I'm not sure what this single pointer can be used for.
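
To make the one-to-many point concrete, here is a small standalone
sketch (an illustration only, not the QEMU implementation). It keys
records on the same fields tb_stats_cmp() compares and deliberately
ignores cflags, so two TBs that differ only in cflags end up sharing one
record. ToyStats, ToyTB and toy_link_stats() are hypothetical names, and
a linear scan stands in for the qht lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_STATS 8

/* Hypothetical, simplified stand-in for TBStatistics. */
typedef struct {
    uint64_t phys_pc, pc, cs_base;
    uint32_t flags;
    unsigned translations;   /* how many TBs were linked to this record */
} ToyStats;

/* Hypothetical, simplified stand-in for TranslationBlock. */
typedef struct {
    uint64_t phys_pc, pc, cs_base;
    uint32_t flags;
    uint32_t cflags;         /* deliberately not part of the stats key */
    ToyStats *stats;
} ToyTB;

static ToyStats toy_table[TOY_MAX_STATS];
static size_t toy_count;

/* Same fields as tb_stats_cmp() in the patch; cflags is ignored. */
static bool toy_key_matches(const ToyStats *s, const ToyTB *tb)
{
    return s->phys_pc == tb->phys_pc && s->pc == tb->pc &&
           s->cs_base == tb->cs_base && s->flags == tb->flags;
}

/* Link tb to the record for its key, creating the record if needed. */
static ToyStats *toy_link_stats(ToyTB *tb)
{
    ToyStats *s = NULL;

    for (size_t i = 0; i < toy_count; i++) {
        if (toy_key_matches(&toy_table[i], tb)) {
            s = &toy_table[i];
            break;
        }
    }
    if (!s) {
        if (toy_count == TOY_MAX_STATS) {
            tb->stats = NULL;
            return NULL;     /* table full: no stats for this TB */
        }
        s = &toy_table[toy_count++];
        s->phys_pc = tb->phys_pc;
        s->pc = tb->pc;
        s->cs_base = tb->cs_base;
        s->flags = tb->flags;
        s->translations = 0;
    }
    s->translations++;
    tb->stats = s;
    return s;
}
```

In this model a back-pointer from the record to "its" TB could only ever
name one of the TBs that share it, which is the ambiguity being
questioned here.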

> +};
> +
> +bool tb_stats_cmp(const void *ap, const void *bp);
> +
> +void init_tb_stats_htable_if_not(void);
> +
> +/* TBStatistic collection controls */
> +enum TBStatsStatus { TB_STATS_RUNNING, TB_STATS_PAUSED, TB_STATS_STOPPED };
> +extern int tcg_collect_tb_stats;
> +
> +void enable_collect_tb_stats(void);
> +void disable_collect_tb_stats(void);
> +void pause_collect_tb_stats(void);
> +bool tb_stats_collection_enabled(void);
> +bool tb_stats_collection_paused(void);
> +
> +#endif
> diff --git a/util/log.c b/util/log.c
> index 1d1b33f7d9..393a17115b 100644
> --- a/util/log.c
> +++ b/util/log.c
> @@ -31,6 +31,8 @@ int qemu_loglevel;
>  static int log_append = 0;
>  static GArray *debug_regions;
>  
> +int tcg_collect_tb_stats;
> +
>  /* Return the number of characters emitted.  */
>  int qemu_log(const char *fmt, ...)
>  {


-- 
Alex Bennée



* Re: [Qemu-devel] [PATCH v8 02/11] accel: collecting TB execution count
  2019-08-30 12:31     ` Vanderson Martins do Rosario
@ 2019-08-30 13:01       ` Alex Bennée
  0 siblings, 0 replies; 24+ messages in thread
From: Alex Bennée @ 2019-08-30 13:01 UTC (permalink / raw)
  To: Vanderson Martins do Rosario; +Cc: Paolo Bonzini, qemu-devel, Richard Henderson


Vanderson Martins do Rosario <vandersonmr2@gmail.com> writes:

> Ok. I haven't change it before because I would like to be able to collect
> information for already translated TBs when I, for instance, remove the
> filter during execution. Having the TBStats already created guarantee this.
> To guarantee this in your approach, we would need to tb_flush when changing
> the filter. Does it make sense? Is that ok for you?

I think so. While tb_flush is a bit of a hammer, translation is pretty
cheap, so things will be running pretty quickly afterwards. We don't need
to flush the old TB stats entries though - we can keep them for the
lifetime of the run.
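
A toy model of that lifetime split, as a sketch under simplifying
assumptions (fixed-size arrays instead of the real code cache and qht,
and a pc-only key): flushing discards the translated blocks, while the
stats table is never cleared, so counts keep accumulating when a block
is retranslated. All toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_SLOTS 8

typedef struct { uint64_t pc; unsigned long executions; bool used; } ToyStats;
typedef struct { uint64_t pc; ToyStats *stats; bool valid; } ToyTB;

static ToyStats toy_stats[TOY_SLOTS];   /* survives flushes */
static ToyTB    toy_tbs[TOY_SLOTS];     /* discarded on flush */

/* Return the stats record for pc, creating it on first use. */
static ToyStats *toy_stats_get(uint64_t pc)
{
    ToyStats *free_slot = NULL;

    for (size_t i = 0; i < TOY_SLOTS; i++) {
        if (toy_stats[i].used && toy_stats[i].pc == pc) {
            return &toy_stats[i];
        }
        if (!toy_stats[i].used && !free_slot) {
            free_slot = &toy_stats[i];
        }
    }
    if (free_slot) {
        free_slot->used = true;
        free_slot->pc = pc;
        free_slot->executions = 0;
    }
    return free_slot;
}

/* Find the TB for pc, "translating" it (and relinking stats) if missing. */
static ToyTB *toy_tb_find_or_translate(uint64_t pc)
{
    ToyTB *free_slot = NULL;

    for (size_t i = 0; i < TOY_SLOTS; i++) {
        if (toy_tbs[i].valid && toy_tbs[i].pc == pc) {
            return &toy_tbs[i];
        }
        if (!toy_tbs[i].valid && !free_slot) {
            free_slot = &toy_tbs[i];
        }
    }
    if (free_slot) {
        free_slot->valid = true;
        free_slot->pc = pc;
        free_slot->stats = toy_stats_get(pc);
    }
    return free_slot;
}

/* Models tb_flush(): drop every TB, leave the stats table alone. */
static void toy_tb_flush(void)
{
    memset(toy_tbs, 0, sizeof(toy_tbs));
}

static void toy_exec(uint64_t pc)
{
    ToyTB *tb = toy_tb_find_or_translate(pc);

    if (tb && tb->stats) {
        tb->stats->executions++;
    }
}
```

Executing a block twice, flushing, and executing it once more leaves a
single record whose count covers all three runs.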

>
> Vanderson M. Rosario
>
>
> On Fri, Aug 30, 2019 at 7:21 AM Alex Bennée <alex.bennee@linaro.org> wrote:
>
>>
>> vandersonmr <vandersonmr2@gmail.com> writes:
>>
>> > If a TB has a TBS (TBStatistics) with the TB_EXEC_STATS
>> > enabled, then we instrument the start code of this TB
>> > to atomically count the number of times it is executed.
>> > We count both the number of "normal" executions and atomic
>> > executions of a TB.
>> >
>> > The execution count of the TB is stored in its respective
>> > TBS.
>> >
>> > All TBStatistics are created by default with the flags from
>> > default_tbstats_flag.
>> >
>> > Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
>> > ---
>> >  accel/tcg/cpu-exec.c      |  4 ++++
>> >  accel/tcg/tb-stats.c      |  5 +++++
>> >  accel/tcg/tcg-runtime.c   |  7 +++++++
>> >  accel/tcg/tcg-runtime.h   |  2 ++
>> >  accel/tcg/translate-all.c |  7 +++++++
>> >  accel/tcg/translator.c    |  1 +
>> >  include/exec/gen-icount.h |  9 +++++++++
>> >  include/exec/tb-stats.h   | 19 +++++++++++++++++++
>> >  util/log.c                |  1 +
>> >  9 files changed, 55 insertions(+)
>> >
>> > diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
>> > index 48272c781b..9b2b7bff80 100644
>> > --- a/accel/tcg/cpu-exec.c
>> > +++ b/accel/tcg/cpu-exec.c
>> > @@ -251,6 +251,10 @@ void cpu_exec_step_atomic(CPUState *cpu)
>> >
>> >          start_exclusive();
>> >
>> > +        if (tb_stats_enabled(tb, TB_EXEC_STATS)) {
>> > +            tb->tb_stats->executions.atomic++;
>> > +        }
>> > +
>> >          /* Since we got here, we know that parallel_cpus must be true.  */
>> >          parallel_cpus = false;
>> >          in_exclusive_region = true;
>> > diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
>> > index 948b107e68..1db81d83e7 100644
>> > --- a/accel/tcg/tb-stats.c
>> > +++ b/accel/tcg/tb-stats.c
>> > @@ -61,3 +61,8 @@ bool tb_stats_collection_paused(void)
>> >  {
>> >      return tcg_collect_tb_stats == TB_STATS_PAUSED;
>> >  }
>> > +
>> > +uint32_t get_default_tbstats_flag(void)
>> > +{
>> > +    return default_tbstats_flag;
>> > +}
>> > diff --git a/accel/tcg/tcg-runtime.c b/accel/tcg/tcg-runtime.c
>> > index 8a1e408e31..6f4aafba11 100644
>> > --- a/accel/tcg/tcg-runtime.c
>> > +++ b/accel/tcg/tcg-runtime.c
>> > @@ -167,3 +167,10 @@ void HELPER(exit_atomic)(CPUArchState *env)
>> >  {
>> >      cpu_loop_exit_atomic(env_cpu(env), GETPC());
>> >  }
>> > +
>> > +void HELPER(inc_exec_freq)(void *ptr)
>> > +{
>> > +    TBStatistics *stats = (TBStatistics *) ptr;
>> > +    g_assert(stats);
>> > +    atomic_inc(&stats->executions.normal);
>> > +}
>> > diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h
>> > index 4fa61b49b4..bf0b75dbe8 100644
>> > --- a/accel/tcg/tcg-runtime.h
>> > +++ b/accel/tcg/tcg-runtime.h
>> > @@ -28,6 +28,8 @@ DEF_HELPER_FLAGS_1(lookup_tb_ptr, TCG_CALL_NO_WG_SE, ptr, env)
>> >
>> >  DEF_HELPER_FLAGS_1(exit_atomic, TCG_CALL_NO_WG, noreturn, env)
>> >
>> > +DEF_HELPER_FLAGS_1(inc_exec_freq, TCG_CALL_NO_RWG, void, ptr)
>> > +
>> >  #ifdef CONFIG_SOFTMMU
>> >
>> >  DEF_HELPER_FLAGS_5(atomic_cmpxchgb, TCG_CALL_NO_WG,
>> > diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
>> > index b7bccacd3b..e72aeba682 100644
>> > --- a/accel/tcg/translate-all.c
>> > +++ b/accel/tcg/translate-all.c
>> > @@ -1785,6 +1785,13 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>> >       */
>> >      if (tb_stats_collection_enabled()) {
>> >          tb->tb_stats = tb_get_stats(phys_pc, pc, cs_base, flags, tb);
>> > +
>> > +        if (qemu_log_in_addr_range(tb->pc)) {
>>
>> We can open this out because this test will always pass if no dfilter
>> has been set and there is no point creating a tb_stats record if we
>> won't fill it in. So
>>
>>   if (qemu_log_in_addr_range(tb->pc)) {
>>      tb->tb_stats = tb_get_stats(phys_pc, pc, cs_base, flags, tb);
>>      uint32_t flag = get_default_tbstats_flag();
>>
>>      if (flag & TB_EXEC_STATS) {
>>        ...
>>
>> And the additional tests that get added later. This way we'll only
>> create and collect stats for what we want.
>>
>> > +            uint32_t flag = get_default_tbstats_flag();
>> > +            if (flag & TB_EXEC_STATS) {
>> > +                tb->tb_stats->stats_enabled |= TB_EXEC_STATS;
>> > +            }
>> > +        }
>> >      } else {
>> >          tb->tb_stats = NULL;
>> >      }
>> > diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
>> > index 70c66c538c..ec6bd829a0 100644
>> > --- a/accel/tcg/translator.c
>> > +++ b/accel/tcg/translator.c
>> > @@ -46,6 +46,7 @@ void translator_loop(const TranslatorOps *ops,
>> DisasContextBase *db,
>> >
>> >      ops->init_disas_context(db, cpu);
>> >      tcg_debug_assert(db->is_jmp == DISAS_NEXT);  /* no early exit */
>> > +    gen_tb_exec_count(tb);
>> >
>> >      /* Reset the temp count so that we can identify leaks */
>> >      tcg_clear_temp_count();
>> > diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h
>> > index 822c43cfd3..be006383b9 100644
>> > --- a/include/exec/gen-icount.h
>> > +++ b/include/exec/gen-icount.h
>> > @@ -32,6 +32,15 @@ static inline void gen_io_end(void)
>> >      tcg_temp_free_i32(tmp);
>> >  }
>> >
>> > +static inline void gen_tb_exec_count(TranslationBlock *tb)
>> > +{
>> > +    if (tb_stats_enabled(tb, TB_EXEC_STATS)) {
>> > +        TCGv_ptr ptr = tcg_const_ptr(tb->tb_stats);
>> > +        gen_helper_inc_exec_freq(ptr);
>> > +        tcg_temp_free_ptr(ptr);
>> > +    }
>> > +}
>> > +
>> >  static inline void gen_tb_start(TranslationBlock *tb)
>> >  {
>> >      TCGv_i32 count, imm;
>> > diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
>> > index 898e05a36f..c4a8715400 100644
>> > --- a/include/exec/tb-stats.h
>> > +++ b/include/exec/tb-stats.h
>> > @@ -30,6 +30,9 @@
>> >  #include "exec/tb-context.h"
>> >  #include "tcg.h"
>> >
>> > +#define tb_stats_enabled(tb, JIT_STATS) \
>> > +    (tb && tb->tb_stats && (tb->tb_stats->stats_enabled & JIT_STATS))
>> > +
>> >  typedef struct TBStatistics TBStatistics;
>> >
>> >  /*
>> > @@ -46,6 +49,15 @@ struct TBStatistics {
>> >      uint32_t     flags;
>> >      /* cs_base isn't included in the hash but we do check for matches */
>> >      target_ulong cs_base;
>> > +
>> > +    uint32_t stats_enabled;
>> > +
>> > +    /* Execution stats */
>> > +    struct {
>> > +        unsigned long normal;
>> > +        unsigned long atomic;
>> > +    } executions;
>> > +
>> >      /* current TB linked to this TBStatistics */
>> >      TranslationBlock *tb;
>> >  };
>> > @@ -56,7 +68,12 @@ void init_tb_stats_htable_if_not(void);
>> >
>> >  /* TBStatistic collection controls */
>> >  enum TBStatsStatus { TB_STATS_RUNNING, TB_STATS_PAUSED,
>> TB_STATS_STOPPED };
>> > +
>> > +#define TB_NOTHING    (1 << 0)
>> > +#define TB_EXEC_STATS (1 << 1)
>> > +
>> >  extern int tcg_collect_tb_stats;
>> > +extern uint32_t default_tbstats_flag;
>> >
>> >  void enable_collect_tb_stats(void);
>> >  void disable_collect_tb_stats(void);
>> > @@ -64,4 +81,6 @@ void pause_collect_tb_stats(void);
>> >  bool tb_stats_collection_enabled(void);
>> >  bool tb_stats_collection_paused(void);
>> >
>> > +uint32_t get_default_tbstats_flag(void);
>> > +
>> >  #endif
>> > diff --git a/util/log.c b/util/log.c
>> > index 393a17115b..29021a4584 100644
>> > --- a/util/log.c
>> > +++ b/util/log.c
>> > @@ -32,6 +32,7 @@ static int log_append = 0;
>> >  static GArray *debug_regions;
>> >
>> >  int tcg_collect_tb_stats;
>> > +uint32_t default_tbstats_flag;
>> >
>> >  /* Return the number of characters emitted.  */
>> >  int qemu_log(const char *fmt, ...)
>>
>>
>> --
>> Alex Bennée
>>


-- 
Alex Bennée



* Re: [Qemu-devel] [PATCH v8 03/11] accel: collecting JIT statistics
  2019-08-29 17:34 ` [Qemu-devel] [PATCH v8 03/11] accel: collecting JIT statistics vandersonmr
@ 2019-08-30 13:10   ` Alex Bennée
  0 siblings, 0 replies; 24+ messages in thread
From: Alex Bennée @ 2019-08-30 13:10 UTC (permalink / raw)
  To: vandersonmr; +Cc: Paolo Bonzini, qemu-devel, Richard Henderson


vandersonmr <vandersonmr2@gmail.com> writes:

> If a TB has a TBS (TBStatistics) with the TB_JIT_STATS
> enabled then we collect statistics of its translation
> processes and code translation.
>
> Collecting the number of host instructions does not seem
> to be simple, as it would imply modifying several
> target source files. So, for now, we are only collecting
> the size of the generated host code.
>
> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
> ---
>  accel/tcg/translate-all.c | 15 ++++++++++++++-
>  accel/tcg/translator.c    |  4 ++++
>  include/exec/tb-stats.h   | 15 +++++++++++++++
>  tcg/tcg.c                 | 23 +++++++++++++++++++++++
>  tcg/tcg.h                 |  2 ++
>  5 files changed, 58 insertions(+), 1 deletion(-)
>
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index e72aeba682..fb2fe0fa1f 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -1705,6 +1705,7 @@ static TBStatistics *tb_get_stats(tb_page_addr_t phys_pc, target_ulong pc,
>           * then just make the new TB point to the older TBStatistic
>           */
>          g_free(new_stats);
> +        ((TBStatistics *) existing_stats)->tb = current_tb;

This seems out of place and again I can't see what we are doing with
this information yet.

>          return existing_stats;
>      } else {
>          return new_stats;
> @@ -1785,13 +1786,18 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>       */
>      if (tb_stats_collection_enabled()) {
>          tb->tb_stats = tb_get_stats(phys_pc, pc, cs_base, flags, tb);
> +        uint32_t flag = get_default_tbstats_flag();
>
>          if (qemu_log_in_addr_range(tb->pc)) {
> -            uint32_t flag = get_default_tbstats_flag();
>              if (flag & TB_EXEC_STATS) {
>                  tb->tb_stats->stats_enabled |= TB_EXEC_STATS;
>              }
>          }
> +
> +        if (flag & TB_JIT_STATS) {
> +            tb->tb_stats->stats_enabled |= TB_JIT_STATS;
> +            atomic_inc(&tb->tb_stats->translations.total);
> +        }
>      } else {
>          tb->tb_stats = NULL;
>      }
> @@ -1869,6 +1875,10 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>      atomic_set(&prof->search_out_len, prof->search_out_len + search_size);
>  #endif
>
> +    if (tb_stats_enabled(tb, TB_JIT_STATS)) {
> +        atomic_add(&tb->tb_stats->code.out_len, gen_code_size);
> +    }
> +
>  #ifdef DEBUG_DISAS
>      if (qemu_loglevel_mask(CPU_LOG_TB_OUT_ASM) &&
>          qemu_log_in_addr_range(tb->pc)) {
> @@ -1926,6 +1936,9 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>      phys_page2 = -1;
>      if ((pc & TARGET_PAGE_MASK) != virt_page2) {
>          phys_page2 = get_page_addr_code(env, virt_page2);
> +        if (tb_stats_enabled(tb, TB_JIT_STATS)) {
> +            atomic_inc(&tb->tb_stats->translations.spanning);
> +        }
>      }
>      /*
>       * No explicit memory barrier is required -- tb_link_page() makes the
> diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
> index ec6bd829a0..9b2e248b09 100644
> --- a/accel/tcg/translator.c
> +++ b/accel/tcg/translator.c
> @@ -116,6 +116,10 @@ void translator_loop(const TranslatorOps *ops, DisasContextBase *db,
>      db->tb->size = db->pc_next - db->pc_first;
>      db->tb->icount = db->num_insns;
>
> +    if (tb_stats_enabled(tb, TB_JIT_STATS)) {
> +        atomic_add(&db->tb->tb_stats->code.num_guest_inst, db->num_insns);
> +    }
> +
>  #ifdef DEBUG_DISAS
>      if (qemu_loglevel_mask(CPU_LOG_TB_IN_ASM)
>          && qemu_log_in_addr_range(db->pc_first)) {
> diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
> index c4a8715400..b68edd5d24 100644
> --- a/include/exec/tb-stats.h
> +++ b/include/exec/tb-stats.h
> @@ -58,6 +58,20 @@ struct TBStatistics {
>          unsigned long atomic;
>      } executions;
>
> +    struct {
> +        unsigned num_guest_inst;
> +        unsigned num_tcg_ops;
> +        unsigned num_tcg_ops_opt;
> +        unsigned spills;
> +        unsigned out_len;
> +    } code;
> +
> +    struct {
> +        unsigned long total;
> +        unsigned long uncached;
> +        unsigned long spanning;
> +    } translations;
> +
>      /* current TB linked to this TBStatistics */
>      TranslationBlock *tb;
>  };
> @@ -71,6 +85,7 @@ enum TBStatsStatus { TB_STATS_RUNNING, TB_STATS_PAUSED, TB_STATS_STOPPED };
>
>  #define TB_NOTHING    (1 << 0)
>  #define TB_EXEC_STATS (1 << 1)
> +#define TB_JIT_STATS  (1 << 2)
>
>  extern int tcg_collect_tb_stats;
>  extern uint32_t default_tbstats_flag;
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index 0458eaec57..ae3e7a2217 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -3125,6 +3125,11 @@ static void temp_sync(TCGContext *s, TCGTemp *ts, TCGRegSet allocated_regs,
>          case TEMP_VAL_REG:
>              tcg_out_st(s, ts->type, ts->reg,
>                         ts->mem_base->reg, ts->mem_offset);
> +
> +            /* Count number of spills */
> +            if (tb_stats_enabled(s->current_tb, TB_JIT_STATS)) {
> +                atomic_inc(&s->current_tb->tb_stats->code.spills);
> +            }
>              break;
>
>          case TEMP_VAL_MEM:
> @@ -3996,6 +4001,8 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
>      int i, num_insns;
>      TCGOp *op;
>
> +    s->current_tb = tb;
> +
>  #ifdef CONFIG_PROFILER
>      {
>          int n = 0;
> @@ -4027,6 +4034,14 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
>      }
>  #endif
>
> +    if (tb_stats_enabled(tb, TB_JIT_STATS)) {
> +        int n = 0;
> +        QTAILQ_FOREACH(op, &s->ops, link) {
> +            n++;
> +        }
> +        atomic_add(&tb->tb_stats->code.num_tcg_ops, n);
> +    }
> +
>  #ifdef CONFIG_DEBUG_TCG
>      /* Ensure all labels referenced have been emitted.  */
>      {
> @@ -4093,6 +4108,14 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
>      }
>  #endif
>
> +    if (tb_stats_enabled(tb, TB_JIT_STATS)) {
> +        int n = 0;
> +        QTAILQ_FOREACH(op, &s->ops, link) {
> +            n++;
> +        }
> +        atomic_add(&tb->tb_stats->code.num_tcg_ops_opt, n);
> +    }
> +
>      tcg_reg_alloc_start(s);
>
>      s->code_buf = tb->tc.ptr;
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index 529acb2ed8..b4601162f8 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -740,6 +740,8 @@ struct TCGContext {
>
>      uint16_t gen_insn_end_off[TCG_MAX_INSNS];
>      target_ulong gen_insn_data[TCG_MAX_INSNS][TARGET_INSN_START_WORDS];
> +
> +    TranslationBlock *current_tb;

As we are only using this to get to tb_stats why not skip the middle man
and go directly to:

       TBStatistics *current_stats;

?

You already have a tb_stats_enabled() helper.
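
To illustrate the suggestion above, here is a minimal sketch of what
holding the stats pointer directly could look like. The types are
stripped-down stand-ins, not the real QEMU definitions, and
stats_enabled(), ctx_begin_tb() and count_spill() are hypothetical names
made up for this example. The GCC/Clang __atomic builtin stands in for
QEMU's atomic_inc().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TB_JIT_STATS (1u << 2)

/* Stand-ins for the QEMU structures: only the fields used below. */
typedef struct TBStatistics {
    uint32_t stats_enabled;
    struct {
        unsigned spills;
    } code;
} TBStatistics;

typedef struct TranslationBlock {
    TBStatistics *tb_stats;
} TranslationBlock;

typedef struct TCGContext {
    /* Replaces "TranslationBlock *current_tb" from the patch. */
    TBStatistics *current_stats;
} TCGContext;

/* Hypothetical helper: same test as tb_stats_enabled(), minus the TB hop. */
static inline bool stats_enabled(const TBStatistics *tbs, uint32_t flag)
{
    return tbs && (tbs->stats_enabled & flag);
}

/* What the start of tcg_gen_code() would do instead of saving the TB. */
static void ctx_begin_tb(TCGContext *s, TranslationBlock *tb)
{
    s->current_stats = tb ? tb->tb_stats : NULL;
}

/* What the spill accounting in temp_sync() would reduce to. */
static void count_spill(TCGContext *s)
{
    if (stats_enabled(s->current_stats, TB_JIT_STATS)) {
        /* Assumption: compiler builtin used in place of atomic_inc(). */
        __atomic_fetch_add(&s->current_stats->code.spills, 1,
                           __ATOMIC_RELAXED);
    }
}
```

The call sites then only ever test and dereference s->current_stats, so
TCGContext no longer needs to know about TranslationBlock for this.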

>  };
>
>  extern TCGContext tcg_init_ctx;


--
Alex Bennée



* Re: [Qemu-devel] [PATCH v8 05/11] accel: adding TB_JIT_TIME and full replacing CONFIG_PROFILER
  2019-08-29 17:34 ` [Qemu-devel] [PATCH v8 05/11] accel: adding TB_JIT_TIME and full replacing CONFIG_PROFILER vandersonmr
@ 2019-08-30 13:12   ` Alex Bennée
  0 siblings, 0 replies; 24+ messages in thread
From: Alex Bennée @ 2019-08-30 13:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Richard Henderson, vandersonmr, Markus Armbruster,
	Dr. David Alan Gilbert


vandersonmr <vandersonmr2@gmail.com> writes:

> Replace all other CONFIG_PROFILER statistics and migrate them to the
> TBStatistics system. However, TCGProfiler still exists and can
> be used to store global statistics and times. All TB-related
> statistics go to TBStatistics.
>
> Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
> ---
>  accel/tcg/tb-stats.c      |  91 +++++++++++++++++++---------
>  accel/tcg/translate-all.c |  47 ++++++++-------
>  configure                 |   3 -
>  cpus.c                    |  14 ++---
>  include/exec/tb-stats.h   |  19 +++++-
>  include/qemu/timer.h      |   5 +-
>  monitor/misc.c            |  28 ++-------
>  tcg/tcg.c                 | 124 +++++++++++---------------------------
>  tcg/tcg.h                 |  10 +--
>  vl.c                      |   8 +--
>  10 files changed, 156 insertions(+), 193 deletions(-)
>
> diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
> index 176da60e13..66abc97ad4 100644
> --- a/accel/tcg/tb-stats.c
> +++ b/accel/tcg/tb-stats.c
> @@ -32,6 +32,8 @@
>
>  #include "exec/tb-stats.h"
>
> +uint64_t dev_time;
> +
>  struct jit_profile_info {
>      uint64_t translations;
>      uint64_t aborted;
> @@ -43,6 +45,13 @@ struct jit_profile_info {
>      uint64_t host;
>      uint64_t guest;
>      uint64_t search_data;
> +
> +    uint64_t interm_time;
> +    uint64_t code_time;
> +    uint64_t restore_count;
> +    uint64_t restore_time;
> +    uint64_t opt_time;
> +    uint64_t la_time;
>  };
>
>  /* accumulate the statistics from all TBs */
> @@ -64,6 +73,29 @@ static void collect_jit_profile_info(void *p, uint32_t hash, void *userp)
>      jpi->host += tbs->code.out_len;
>      jpi->guest += tbs->code.in_len;
>      jpi->search_data += tbs->code.search_out_len;
> +
> +    jpi->interm_time += stat_per_translation(tbs, time.interm);
> +    jpi->code_time += stat_per_translation(tbs, time.code);
> +    jpi->opt_time += stat_per_translation(tbs, time.opt);
> +    jpi->la_time += stat_per_translation(tbs, time.la);
> +    jpi->restore_time += tbs->time.restore;
> +    jpi->restore_count += tbs->time.restore_count;
> +}
> +
> +void dump_jit_exec_time_info(uint64_t dev_time)
> +{
> +    static uint64_t last_cpu_exec_time;
> +    uint64_t cpu_exec_time;
> +    uint64_t delta;
> +
> +    cpu_exec_time = tcg_cpu_exec_time();
> +    delta = cpu_exec_time - last_cpu_exec_time;
> +
> +    qemu_printf("async time  %" PRId64 " (%0.3f)\n",
> +                   dev_time, dev_time / (double) NANOSECONDS_PER_SECOND);
> +    qemu_printf("qemu time   %" PRId64 " (%0.3f)\n",
> +                   delta, delta / (double) NANOSECONDS_PER_SECOND);
> +    last_cpu_exec_time = cpu_exec_time;
>  }
>
>  /* dump JIT statisticis using TCGProfile and TBStats */
> @@ -90,34 +122,39 @@ void dump_jit_profile_info(TCGProfile *s)
>          qemu_printf("avg search data/TB  %0.1f\n",
>                  jpi->search_data / (double) jpi->translations);
>
> +        uint64_t tot = jpi->interm_time + jpi->code_time;
> +
> +        qemu_printf("JIT cycles          %" PRId64 " (%0.3fs at 2.4 GHz)\n",
> +                tot, tot / 2.4e9);
> +        qemu_printf("  cycles/op           %0.1f\n",
> +                jpi->ops ? (double)tot / jpi->ops : 0);
> +        qemu_printf("  cycles/in byte      %0.1f\n",
> +                jpi->guest ? (double)tot / jpi->guest : 0);
> +        qemu_printf("  cycles/out byte     %0.1f\n",
> +                jpi->host ? (double)tot / jpi->host : 0);
> +        qemu_printf("  cycles/search byte  %0.1f\n",
> +                jpi->search_data ? (double)tot / jpi->search_data : 0);
> +        if (tot == 0) {
> +            tot = 1;
> +        }
> +
> +        qemu_printf("  gen_interm time     %0.1f%%\n",
> +                (double)jpi->interm_time / tot * 100.0);
> +        qemu_printf("  gen_code time       %0.1f%%\n",
> +                (double)jpi->code_time / tot * 100.0);
> +
> +        qemu_printf("    optim./code time    %0.1f%%\n",
> +                (double)jpi->opt_time / (jpi->code_time ? jpi->code_time : 1) * 100.0);
> +        qemu_printf("    liveness/code time  %0.1f%%\n",
> +                (double)jpi->la_time / (jpi->code_time ? jpi->code_time : 1) * 100.0);
> +
> +        qemu_printf("cpu_restore count   %" PRId64 "\n", jpi->restore_count);
> +        qemu_printf("  avg cycles        %0.1f\n",
> +                jpi->restore_count ? (double)jpi->restore_time / jpi->restore_count : 0);
> +
>          if (s) {
> -            int64_t tot = s->interm_time + s->code_time;
> -            qemu_printf("JIT cycles          %" PRId64 " (%0.3f s at 2.4 GHz)\n",
> -                            tot, tot / 2.4e9);
> -            qemu_printf("cycles/op           %0.1f\n",
> -                        jpi->ops ? (double)tot / jpi->ops : 0);
> -            qemu_printf("cycles/in byte      %0.1f\n",
> -                        jpi->guest ? (double)tot / jpi->guest : 0);
> -            qemu_printf("cycles/out byte     %0.1f\n",
> -                        jpi->host ? (double)tot / jpi->host : 0);
> -            qemu_printf("cycles/search byte     %0.1f\n",
> -                        jpi->search_data ? (double)tot / jpi->search_data : 0);
> -            if (tot == 0) {
> -                tot = 1;
> -            }
> -            qemu_printf("  gen_interm time   %0.1f%%\n",
> -                        (double)s->interm_time / tot * 100.0);
> -            qemu_printf("  gen_code time     %0.1f%%\n",
> -                        (double)s->code_time / tot * 100.0);
> -            qemu_printf("optim./code time    %0.1f%%\n",
> -                        (double)s->opt_time / (s->code_time ? s->code_time : 1)
> -                        * 100.0);
> -            qemu_printf("liveness/code time  %0.1f%%\n",
> -                    (double)s->la_time / (s->code_time ? s->code_time : 1) * 100.0);
> -            qemu_printf("cpu_restore count   %" PRId64 "\n",
> -                    s->restore_count);
> -            qemu_printf("  avg cycles        %0.1f\n",
> -                    s->restore_count ? (double)s->restore_time / s->restore_count : 0);
> +            qemu_printf("cpu exec time  %" PRId64 " (%0.3fs)\n",
> +                s->cpu_exec_time, s->cpu_exec_time / (double) NANOSECONDS_PER_SECOND);
>          }
>      }
>      g_free(jpi);
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index fe890f0782..a2e65bb85c 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -337,10 +337,11 @@ static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
>      CPUArchState *env = cpu->env_ptr;
>      uint8_t *p = tb->tc.ptr + tb->tc.size;
>      int i, j, num_insns = tb->icount;
> -#ifdef CONFIG_PROFILER
> -    TCGProfile *prof = &tcg_ctx->prof;
> -    int64_t ti = profile_getclock();
> -#endif
> +    uint64_t ti = 0;
> +
> +    if (tb_stats_enabled(tb, TB_JIT_TIME)) {
> +        ti = profile_getclock();
> +    }
>
>      searched_pc -= GETPC_ADJ;
>
> @@ -370,11 +371,11 @@ static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
>      }
>      restore_state_to_opc(env, tb, data);
>
> -#ifdef CONFIG_PROFILER
> -    atomic_set(&prof->restore_time,
> -                prof->restore_time + profile_getclock() - ti);
> -    atomic_set(&prof->restore_count, prof->restore_count + 1);
> -#endif
> +    if (tb_stats_enabled(tb, TB_JIT_TIME)) {
> +        atomic_add(&tb->tb_stats->time.restore, profile_getclock() - ti);
> +        atomic_inc(&tb->tb_stats->time.restore_count);
> +    }
> +
>      return 0;
>  }
>
> @@ -1724,10 +1725,8 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>      target_ulong virt_page2;
>      tcg_insn_unit *gen_code_buf;
>      int gen_code_size, search_size, max_insns;
> -#ifdef CONFIG_PROFILER
> -    TCGProfile *prof = &tcg_ctx->prof;
> -    int64_t ti;
> -#endif
> +    uint64_t ti = 0;
> +
>      assert_memory_lock();
>
>      phys_pc = get_page_addr_code(env, pc);
> @@ -1773,9 +1772,6 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>      tcg_ctx->tb_cflags = cflags;
>   tb_overflow:
>
> -#ifdef CONFIG_PROFILER
> -    ti = profile_getclock();
> -#endif
>
>      /*
>       * We want to fetch the stats structure before we start code
> @@ -1796,6 +1792,11 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>              tb->tb_stats->stats_enabled |= TB_JIT_STATS;
>              atomic_inc(&tb->tb_stats->translations.total);
>          }
> +
> +        if (flag & TB_JIT_TIME) {
> +            tb->tb_stats->stats_enabled |= TB_JIT_TIME;
> +            ti = profile_getclock();
> +        }
>      } else {
>          tb->tb_stats = NULL;
>      }
> @@ -1820,10 +1821,10 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>          tcg_ctx->tb_jmp_target_addr = tb->jmp_target_arg;
>      }
>
> -#ifdef CONFIG_PROFILER
> -    atomic_set(&prof->interm_time, prof->interm_time + profile_getclock() - ti);
> -    ti = profile_getclock();
> -#endif
> +    if (tb_stats_enabled(tb, TB_JIT_TIME)) {
> +        atomic_add(&tb->tb_stats->time.interm, profile_getclock() - ti);
> +        ti = profile_getclock();
> +    }
>
>      gen_code_size = tcg_gen_code(tcg_ctx, tb);
>      if (unlikely(gen_code_size < 0)) {
> @@ -1865,9 +1866,9 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>      }
>      tb->tc.size = gen_code_size;
>
> -#ifdef CONFIG_PROFILER
> -    atomic_set(&prof->code_time, prof->code_time + profile_getclock() - ti);
> -#endif
> +    if (tb_stats_enabled(tb, TB_JIT_TIME)) {
> +        atomic_add(&tb->tb_stats->time.code, profile_getclock() - ti);
> +    }
>
>      if (tb_stats_enabled(tb, TB_JIT_STATS)) {
>          atomic_add(&tb->tb_stats->code.in_len, tb->size);
> diff --git a/configure b/configure
> index e44e454c43..af8ff2e6dc 100755
> --- a/configure
> +++ b/configure
> @@ -6611,9 +6611,6 @@ fi
>  if test "$static" = "yes" ; then
>    echo "CONFIG_STATIC=y" >> $config_host_mak
>  fi
> -if test "$profiler" = "yes" ; then
> -  echo "CONFIG_PROFILER=y" >> $config_host_mak
> -fi
>  if test "$want_tools" = "yes" ; then
>    echo "CONFIG_TOOLS=y" >> $config_host_mak
>  fi
> diff --git a/cpus.c b/cpus.c
> index 85cd451a86..c1844a412b 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -1433,21 +1433,17 @@ static void process_icount_data(CPUState *cpu)
>  static int tcg_cpu_exec(CPUState *cpu)
>  {
>      int ret;
> -#ifdef CONFIG_PROFILER
> -    int64_t ti;
> -#endif
> +    uint64_t ti;
>
>      assert(tcg_enabled());
> -#ifdef CONFIG_PROFILER
>      ti = profile_getclock();
> -#endif
> +
>      cpu_exec_start(cpu);
>      ret = cpu_exec(cpu);
>      cpu_exec_end(cpu);
> -#ifdef CONFIG_PROFILER
> -    atomic_set(&tcg_ctx->prof.cpu_exec_time,
> -               tcg_ctx->prof.cpu_exec_time + profile_getclock() - ti);
> -#endif
> +
> +    atomic_add(&tcg_ctx->prof.cpu_exec_time, profile_getclock() - ti);
> +
>      return ret;
>  }
>
> diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
> index b9d5a343be..0b9a6e2f72 100644
> --- a/include/exec/tb-stats.h
> +++ b/include/exec/tb-stats.h
> @@ -81,22 +81,39 @@ struct TBStatistics {
>          unsigned long spanning;
>      } translations;
>
> +    struct {
> +        int64_t restore;
> +        uint64_t restore_count;
> +        int64_t interm;
> +        int64_t code;
> +        int64_t opt;
> +        int64_t la;
> +    } time;
> +
>      /* current TB linked to this TBStatistics */
>      TranslationBlock *tb;
>  };
>
>  bool tb_stats_cmp(const void *ap, const void *bp);
>
> +void dump_jit_exec_time_info(uint64_t dev_time);
> +
>  void init_tb_stats_htable_if_not(void);
>
>  void dump_jit_profile_info(TCGProfile *s);
>
>  /* TBStatistic collection controls */
> -enum TBStatsStatus { TB_STATS_RUNNING, TB_STATS_PAUSED, TB_STATS_STOPPED };
> +enum TBStatsStatus {
> +    TB_STATS_DISABLED = 0,
> +    TB_STATS_RUNNING,
> +    TB_STATS_PAUSED,
> +    TB_STATS_STOPPED
> +};
>
>  #define TB_NOTHING    (1 << 0)
>  #define TB_EXEC_STATS (1 << 1)
>  #define TB_JIT_STATS  (1 << 2)
> +#define TB_JIT_TIME   (1 << 3)
>
>  extern int tcg_collect_tb_stats;
>  extern uint32_t default_tbstats_flag;
> diff --git a/include/qemu/timer.h b/include/qemu/timer.h
> index 85bc6eb00b..641ca3ddd0 100644
> --- a/include/qemu/timer.h
> +++ b/include/qemu/timer.h
> @@ -1006,13 +1006,10 @@ static inline int64_t cpu_get_host_ticks(void)
>  }
>  #endif
>
> -#ifdef CONFIG_PROFILER
>  static inline int64_t profile_getclock(void)
>  {
>      return get_clock();
>  }
>
> -extern int64_t dev_time;
> -#endif
> -
> +extern uint64_t dev_time;
>  #endif
> diff --git a/monitor/misc.c b/monitor/misc.c
> index aef16f6cfb..ac4ff58d96 100644
> --- a/monitor/misc.c
> +++ b/monitor/misc.c
> @@ -472,6 +472,11 @@ static void hmp_info_jit(Monitor *mon, const QDict *qdict)
>
>  static void hmp_info_opcount(Monitor *mon, const QDict *qdict)
>  {
> +    if (!tb_stats_collection_enabled()) {
> +        error_report("TB information not being recorded.");
> +        return;
> +    }
> +
>      dump_opcount_info();
>  }
>  #endif
> @@ -1082,32 +1087,11 @@ static void hmp_info_mtree(Monitor *mon, const QDict *qdict)
>      mtree_info(flatview, dispatch_tree, owner);
>  }
>
> -#ifdef CONFIG_PROFILER
> -
> -int64_t dev_time;
> -
>  static void hmp_info_profile(Monitor *mon, const QDict *qdict)
>  {
> -    static int64_t last_cpu_exec_time;
> -    int64_t cpu_exec_time;
> -    int64_t delta;
> -
> -    cpu_exec_time = tcg_cpu_exec_time();
> -    delta = cpu_exec_time - last_cpu_exec_time;
> -
> -    monitor_printf(mon, "async time  %" PRId64 " (%0.3f)\n",
> -                   dev_time, dev_time / (double)NANOSECONDS_PER_SECOND);
> -    monitor_printf(mon, "qemu time   %" PRId64 " (%0.3f)\n",
> -                   delta, delta / (double)NANOSECONDS_PER_SECOND);
> -    last_cpu_exec_time = cpu_exec_time;
> +    dump_jit_exec_time_info(dev_time);
>      dev_time = 0;
>  }
> -#else
> -static void hmp_info_profile(Monitor *mon, const QDict *qdict)
> -{
> -    monitor_printf(mon, "Internal profiler not compiled\n");
> -}
> -#endif
>
>  /* Capture support */
>  static QLIST_HEAD (capture_list_head, CaptureState) capture_head;
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index 03b11495aa..4986ff3984 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -3884,82 +3884,34 @@ static void tcg_reg_alloc_call(TCGContext *s, TCGOp *op)
>      }
>  }
>
> -#ifdef CONFIG_PROFILER
> -
>  /* avoid copy/paste errors */
>  #define PROF_ADD(to, from, field)                       \
>      do {                                                \
>          (to)->field += atomic_read(&((from)->field));   \
>      } while (0)
>
> -#define PROF_MAX(to, from, field)                                       \
> -    do {                                                                \
> -        typeof((from)->field) val__ = atomic_read(&((from)->field));    \
> -        if (val__ > (to)->field) {                                      \
> -            (to)->field = val__;                                        \
> -        }                                                               \
> -    } while (0)
> -
> -/* Pass in a zero'ed @prof */
> -static inline
> -void tcg_profile_snapshot(TCGProfile *prof, bool counters, bool table)
> +static void collect_tcg_profiler(TCGProfile *prof)
>  {
> +    int i;
>      unsigned int n_ctxs = atomic_read(&n_tcg_ctxs);
> -    unsigned int i;
>
>      for (i = 0; i < n_ctxs; i++) {
>          TCGContext *s = atomic_read(&tcg_ctxs[i]);
>          const TCGProfile *orig = &s->prof;
>
> -        if (counters) {
> -            PROF_ADD(prof, orig, cpu_exec_time);
> -            PROF_ADD(prof, orig, interm_time);
> -            PROF_ADD(prof, orig, code_time);
> -            PROF_ADD(prof, orig, la_time);
> -            PROF_ADD(prof, orig, opt_time);
> -            PROF_ADD(prof, orig, restore_count);
> -            PROF_ADD(prof, orig, restore_time);
> -        }
> -        if (table) {
> -            int i;
> +        PROF_ADD(prof, orig, cpu_exec_time);
>
> -            for (i = 0; i < NB_OPS; i++) {
> -                PROF_ADD(prof, orig, table_op_count[i]);
> -            }
> +        for (i = 0; i < NB_OPS; i++) {
> +            PROF_ADD(prof, orig, table_op_count[i]);


I've just noticed we've run into the same problem we had with the
original exec count patch: it breaks the 32-bit builds, e.g.:

  make docker-test-build@debian-mipsel-cross V=1 J=9 TARGET_LIST=mipsel-linux-user,mips-softmmu

Gives:

  In file included from /tmp/qemu-test/src/include/qemu/osdep.h:51:0,
                   from /tmp/qemu-test/src/tcg/tcg.c:28:
  /tmp/qemu-test/src/tcg/tcg.c: In function 'collect_tcg_profiler':
  /tmp/qemu-test/src/include/qemu/compiler.h:86:36: error: static assertion failed: "not expecting: sizeof(*&((orig)->cpu_exec_time)) > ATOMIC_REG_SIZE"
   #define QEMU_BUILD_BUG_MSG(x, msg) _Static_assert(!(x), msg)
                                      ^
  /tmp/qemu-test/src/include/qemu/compiler.h:94:30: note: in expansion of macro 'QEMU_BUILD_BUG_MSG'
   #define QEMU_BUILD_BUG_ON(x) QEMU_BUILD_BUG_MSG(x, "not expecting: " #x)
                                ^~~~~~~~~~~~~~~~~~
  /tmp/qemu-test/src/include/qemu/atomic.h:133:5: note: in expansion of macro 'QEMU_BUILD_BUG_ON'
       QEMU_BUILD_BUG_ON(sizeof(*ptr) > ATOMIC_REG_SIZE); \
       ^~~~~~~~~~~~~~~~~
  /tmp/qemu-test/src/tcg/tcg.c:3890:24: note: in expansion of macro 'atomic_read'
           (to)->field += atomic_read(&((from)->field));   \
                          ^~~~~~~~~~~
  /tmp/qemu-test/src/tcg/tcg.c:3902:9: note: in expansion of macro 'PROF_ADD'
           PROF_ADD(prof, orig, cpu_exec_time);
           ^~~~~~~~

Of course this would have happened if you had enabled CONFIG_PROFILER on
a 32-bit build anyway. So what options do we have?

1. Gate the code on something based on ATOMIC_REG_SIZE

Not desirable: the newer code is a lot nicer than having lots of ugly
#ifdefs.

2. Defer the summing of the work with async work

Much like we do at the moment with the TB stats info.

3. Maybe consider an RCU approach?

This would mean the numbers weren't exact at the time of calculation, but
if you want precise numbers you would have halted the machine somehow
anyway, right?
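
For reference, one way to read a 64-bit counter consistently on a host
that only has 32-bit atomics is to wrap the two halves in a sequence
counter. This is a seqlock rather than RCU proper, and not necessarily
what the series should end up using. It is only a standalone sketch
written against C11 <stdatomic.h> (not QEMU's atomic.h), and it assumes
a single writer per counter, e.g. the thread that owns the TCGContext.
Counter64 and the counter64_*() functions are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Counter64 {
    _Atomic uint32_t seq;  /* odd while an update is in flight */
    _Atomic uint32_t lo;   /* low 32 bits of the value */
    _Atomic uint32_t hi;   /* high 32 bits of the value */
} Counter64;

static void counter64_init(Counter64 *c)
{
    atomic_init(&c->seq, 0);
    atomic_init(&c->lo, 0);
    atomic_init(&c->hi, 0);
}

/* Writer side. Assumption: exactly one thread ever calls this per counter. */
static void counter64_add(Counter64 *c, uint64_t delta)
{
    uint32_t seq = atomic_load_explicit(&c->seq, memory_order_relaxed);
    uint64_t v =
        ((uint64_t)atomic_load_explicit(&c->hi, memory_order_relaxed) << 32)
        | atomic_load_explicit(&c->lo, memory_order_relaxed);

    v += delta;

    /* Mark the update as in progress before touching the halves. */
    atomic_store_explicit(&c->seq, seq + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&c->lo, (uint32_t)v, memory_order_relaxed);
    atomic_store_explicit(&c->hi, (uint32_t)(v >> 32), memory_order_relaxed);
    /* Publish: seq is even again and has advanced by two. */
    atomic_store_explicit(&c->seq, seq + 2, memory_order_release);
}

/* Reader side: retry until both halves come from the same update. */
static uint64_t counter64_read(Counter64 *c)
{
    uint32_t s1, s2, lo, hi;

    do {
        s1 = atomic_load_explicit(&c->seq, memory_order_acquire);
        lo = atomic_load_explicit(&c->lo, memory_order_relaxed);
        hi = atomic_load_explicit(&c->hi, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        s2 = atomic_load_explicit(&c->seq, memory_order_relaxed);
    } while ((s1 & 1) || s1 != s2);

    return ((uint64_t)hi << 32) | lo;
}

/* Sum per-context counters, as collect_tcg_profiler() does for the prof. */
static uint64_t counter64_sum(Counter64 *ctrs, size_t n)
{
    uint64_t total = 0;

    assert(ctrs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        total += counter64_read(&ctrs[i]);
    }
    return total;
}
```

Every access here is 32 bits wide, so nothing in it would trip the
ATOMIC_REG_SIZE build assertion. The cost is that the sum is only a
snapshot per counter, not across all of them, which matches the "not
exact at the time of calculation" caveat above.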

>          }
>      }
>  }
>  
> -#undef PROF_ADD
> -#undef PROF_MAX
> -
> -static void tcg_profile_snapshot_counters(TCGProfile *prof)
> -{
> -    tcg_profile_snapshot(prof, true, false);
> -}
> -
> -static void tcg_profile_snapshot_table(TCGProfile *prof)
> -{
> -    tcg_profile_snapshot(prof, false, true);
> -}
> -
> -void tcg_dump_op_count(void)
> -{
> -    TCGProfile prof = {};
> -    int i;
> -
> -    tcg_profile_snapshot_table(&prof);
> -    for (i = 0; i < NB_OPS; i++) {
> -        qemu_printf("%s %" PRId64 "\n", tcg_op_defs[i].name,
> -                    prof.table_op_count[i]);
> -    }
> -}
> -
> -int64_t tcg_cpu_exec_time(void)
> +uint64_t tcg_cpu_exec_time(void)
>  {
>      unsigned int n_ctxs = atomic_read(&n_tcg_ctxs);
>      unsigned int i;
> -    int64_t ret = 0;
> +    uint64_t ret = 0;
>
>      for (i = 0; i < n_ctxs; i++) {
>          const TCGContext *s = atomic_read(&tcg_ctxs[i]);
> @@ -3969,25 +3921,9 @@ int64_t tcg_cpu_exec_time(void)
>      }
>      return ret;
>  }
> -#else
> -void tcg_dump_op_count(void)
> -{
> -    qemu_printf("[TCG profiler not compiled]\n");
> -}
> -
> -int64_t tcg_cpu_exec_time(void)
> -{
> -    error_report("%s: TCG profiler not compiled", __func__);
> -    exit(EXIT_FAILURE);
> -}
> -#endif
> -
>
>  int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
>  {
> -#ifdef CONFIG_PROFILER
> -    TCGProfile *prof = &s->prof;
> -#endif
>      int i, num_insns;
>      TCGOp *op;
>
> @@ -4040,18 +3976,18 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
>      }
>  #endif
>
> -#ifdef CONFIG_PROFILER
> -    atomic_set(&prof->opt_time, prof->opt_time - profile_getclock());
> -#endif
> +    if (tb_stats_enabled(tb, TB_JIT_TIME)) {
> +        atomic_add(&tb->tb_stats->time.opt, -profile_getclock());
> +    }
>
>  #ifdef USE_TCG_OPTIMIZATIONS
>      tcg_optimize(s);
>  #endif
>
> -#ifdef CONFIG_PROFILER
> -    atomic_set(&prof->opt_time, prof->opt_time + profile_getclock());
> -    atomic_set(&prof->la_time, prof->la_time - profile_getclock());
> -#endif
> +    if (tb_stats_enabled(tb, TB_JIT_TIME)) {
> +        atomic_add(&tb->tb_stats->time.opt, profile_getclock());
> +        atomic_add(&tb->tb_stats->time.la,  -profile_getclock());
> +    }
>
>      reachable_code_pass(s);
>      liveness_pass_1(s);
> @@ -4074,9 +4010,9 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
>          }
>      }
>
> -#ifdef CONFIG_PROFILER
> -    atomic_set(&prof->la_time, prof->la_time + profile_getclock());
> -#endif
> +    if (tb_stats_enabled(tb, TB_JIT_TIME)) {
> +        atomic_add(&tb->tb_stats->time.la, profile_getclock());
> +    }
>
>  #ifdef DEBUG_DISAS
>      if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP_OPT)
> @@ -4109,14 +4045,17 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
>      s->pool_labels = NULL;
>  #endif
>
> +    if (!tb_stats_collection_enabled()) {
> +        QTAILQ_FOREACH(op, &s->ops, link) {
> +            TCGOpcode opc = op->opc;
> +            atomic_add(&s->prof.table_op_count[opc], 1);
> +        }
> +    }
> +
>      num_insns = -1;
>      QTAILQ_FOREACH(op, &s->ops, link) {
>          TCGOpcode opc = op->opc;
>
> -#ifdef CONFIG_PROFILER
> -        atomic_set(&prof->table_op_count[opc], prof->table_op_count[opc] + 1);
> -#endif
> -
>          switch (opc) {
>          case INDEX_op_mov_i32:
>          case INDEX_op_mov_i64:
> @@ -4209,14 +4148,23 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
>      return tcg_current_code_size(s);
>  }
>
> +void tcg_dump_op_count(void)
> +{
> +    TCGProfile prof = {};
> +    int i;
> +
> +    collect_tcg_profiler(&prof);
> +    for (i = 0; i < NB_OPS; i++) {
> +        qemu_printf("%s %" PRId64 "\n", tcg_op_defs[i].name, prof.table_op_count[i]);
> +    }
> +}
> +
>  void tcg_dump_info(void)
>  {
>      TCGProfile *s = NULL;
> -#ifdef CONFIG_PROFILER
>      TCGProfile prof = {};
> -    tcg_profile_snapshot_counters(&prof);
>      s = &prof;
> -#endif
> +    collect_tcg_profiler(s);
>      dump_jit_profile_info(s);
>  }
>
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index a8d779f7e6..3550d22873 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -646,12 +646,6 @@ QEMU_BUILD_BUG_ON(NB_OPS > (1 << 8));
>
>  typedef struct TCGProfile {
>      int64_t cpu_exec_time;
> -    int64_t interm_time;
> -    int64_t code_time;
> -    int64_t la_time;
> -    int64_t opt_time;
> -    int64_t restore_count;
> -    int64_t restore_time;
>      int64_t table_op_count[NB_OPS];
>  } TCGProfile;
>
> @@ -679,9 +673,7 @@ struct TCGContext {
>
>      tcg_insn_unit *code_ptr;
>
> -#ifdef CONFIG_PROFILER
>      TCGProfile prof;
> -#endif
>
>  #ifdef CONFIG_DEBUG_TCG
>      int temps_in_use;
> @@ -1013,7 +1005,7 @@ int tcg_check_temp_count(void);
>  #define tcg_check_temp_count() 0
>  #endif
>
> -int64_t tcg_cpu_exec_time(void);
> +uint64_t tcg_cpu_exec_time(void);
>  void tcg_dump_info(void);
>  void tcg_dump_op_count(void);
>
> diff --git a/vl.c b/vl.c
> index 630f5c5e9c..4203cc728c 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -1796,17 +1796,11 @@ static bool main_loop_should_exit(void)
>
>  static void main_loop(void)
>  {
> -#ifdef CONFIG_PROFILER
> -    int64_t ti;
> -#endif
> +    uint64_t ti;
>      while (!main_loop_should_exit()) {
> -#ifdef CONFIG_PROFILER
>          ti = profile_getclock();
> -#endif
>          main_loop_wait(false);
> -#ifdef CONFIG_PROFILER
>          dev_time += profile_getclock() - ti;
> -#endif
>      }
>  }


--
Alex Bennée



* Re: [Qemu-devel] [PATCH v8 06/11] Adding -d tb_stats to control TBStatistics collection:
  2019-08-29 17:34 ` [Qemu-devel] [PATCH v8 06/11] Adding -d tb_stats to control TBStatistics collection: vandersonmr
@ 2019-08-30 14:45   ` Alex Bennée
  0 siblings, 0 replies; 24+ messages in thread
From: Alex Bennée @ 2019-08-30 14:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: vandersonmr


vandersonmr <vandersonmr2@gmail.com> writes:

>  -d tb_stats[[,level=(+all+jit+exec+time)][,dump_limit=<number>]]
>
> "dump_limit" is used to limit the number of dumped TBStats in
> linux-user mode.
>
> [all+jit+exec+time] control the profilling level used
> by the TBStats. Can be used as follow:
>
> -d tb_stats
> -d tb_stats,level=jit+time
> -d tb_stats,dump_limit=15
> ...
>
> Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
> ---
>  include/exec/gen-icount.h     |  1 +
>  include/exec/tb-stats-flags.h | 42 +++++++++++++++++++++++++++++++++++
>  include/exec/tb-stats.h       | 18 +++------------
>  include/qemu/log.h            |  1 +
>  util/log.c                    | 35 +++++++++++++++++++++++++++++
>  5 files changed, 82 insertions(+), 15 deletions(-)
>  create mode 100644 include/exec/tb-stats-flags.h
>
> diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h
> index be006383b9..3987adfb0e 100644
> --- a/include/exec/gen-icount.h
> +++ b/include/exec/gen-icount.h
> @@ -2,6 +2,7 @@
>  #define GEN_ICOUNT_H
>
>  #include "qemu/timer.h"
> +#include "tb-stats-flags.h"
>
>  /* Helpers for instruction counting code generation.  */
>
> diff --git a/include/exec/tb-stats-flags.h b/include/exec/tb-stats-flags.h
> new file mode 100644
> index 0000000000..c936ac1084
> --- /dev/null
> +++ b/include/exec/tb-stats-flags.h
> @@ -0,0 +1,42 @@
> +/*
> + * QEMU System Emulator, Code Quality Monitor System
> + *
> + * Copyright (c) 2019 Vanderson M. do Rosario <vandersonmr2@gmail.com>
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.

Any reason not to use GPLv2 or later here?

> + */
> +#ifndef TB_STATS_FLAGS
> +#define TB_STATS_FLAGS
> +
> +enum TBStatsStatus {
> +    TB_STATS_DISABLED = 0,
> +    TB_STATS_RUNNING,
> +    TB_STATS_PAUSED,
> +    TB_STATS_STOPPED
> +};
> +
> +#define TB_NOTHING    (1 << 0)
> +#define TB_EXEC_STATS (1 << 1)
> +#define TB_JIT_STATS  (1 << 2)
> +#define TB_JIT_TIME   (1 << 3)
> +
> +extern int tcg_collect_tb_stats;
> +extern uint32_t default_tbstats_flag;
> +
> +#endif
> diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
> index 0b9a6e2f72..6a53bef31b 100644
> --- a/include/exec/tb-stats.h
> +++ b/include/exec/tb-stats.h
> @@ -30,6 +30,8 @@
>  #include "exec/tb-context.h"
>  #include "tcg.h"
>
> +#include "exec/tb-stats-flags.h"
> +
>  #define tb_stats_enabled(tb, JIT_STATS) \
>      (tb && tb->tb_stats && (tb->tb_stats->stats_enabled & JIT_STATS))
>
> @@ -98,26 +100,12 @@ bool tb_stats_cmp(const void *ap, const void *bp);
>
>  void dump_jit_exec_time_info(uint64_t dev_time);
>
> +void set_tbstats_flags(uint32_t flags);
>  void init_tb_stats_htable_if_not(void);
>
>  void dump_jit_profile_info(TCGProfile *s);
>
>  /* TBStatistic collection controls */
> -enum TBStatsStatus {
> -    TB_STATS_DISABLED = 0,
> -    TB_STATS_RUNNING,
> -    TB_STATS_PAUSED,
> -    TB_STATS_STOPPED
> -};
> -
> -#define TB_NOTHING    (1 << 0)
> -#define TB_EXEC_STATS (1 << 1)
> -#define TB_JIT_STATS  (1 << 2)
> -#define TB_JIT_TIME   (1 << 3)
> -
> -extern int tcg_collect_tb_stats;
> -extern uint32_t default_tbstats_flag;
> -
>  void enable_collect_tb_stats(void);
>  void disable_collect_tb_stats(void);
>  void pause_collect_tb_stats(void);
> diff --git a/include/qemu/log.h b/include/qemu/log.h
> index b097a6cae1..a8d1997cde 100644
> --- a/include/qemu/log.h
> +++ b/include/qemu/log.h
> @@ -45,6 +45,7 @@ static inline bool qemu_log_separate(void)
>  /* LOG_TRACE (1 << 15) is defined in log-for-trace.h */
>  #define CPU_LOG_TB_OP_IND  (1 << 16)
>  #define CPU_LOG_TB_FPU     (1 << 17)
> +#define CPU_LOG_TB_STATS   (1 << 18)
>
>  /* Lock output for a series of related logs.  Since this is not needed
>   * for a single qemu_log / qemu_log_mask / qemu_log_mask_and_addr, we
> diff --git a/util/log.c b/util/log.c
> index 29021a4584..c3805b331b 100644
> --- a/util/log.c
> +++ b/util/log.c
> @@ -19,17 +19,20 @@
>
>  #include "qemu/osdep.h"
>  #include "qemu/log.h"
> +#include "qemu/qemu-print.h"
>  #include "qemu/range.h"
>  #include "qemu/error-report.h"
>  #include "qapi/error.h"
>  #include "qemu/cutils.h"
>  #include "trace/control.h"
> +#include "exec/tb-stats-flags.h"
>
>  static char *logfilename;
>  FILE *qemu_logfile;
>  int qemu_loglevel;
>  static int log_append = 0;
>  static GArray *debug_regions;
> +int32_t max_num_hot_tbs_to_dump;
>
>  int tcg_collect_tb_stats;
>  uint32_t default_tbstats_flag;
> @@ -276,6 +279,9 @@ const QEMULogItem qemu_log_items[] = {
>      { CPU_LOG_TB_NOCHAIN, "nochain",
>        "do not chain compiled TBs so that \"exec\" and \"cpu\" show\n"
>        "complete traces" },
> +    { CPU_LOG_TB_STATS, "tb_stats[[,level=(+all+jit+exec+time)][,dump_limit=<number>]]",
> +      "enable collection of TBs statistics"
> +      "(and dump until given a limit if in user mode).\n" },
>      { 0, NULL, NULL },
>  };
>
> @@ -297,6 +303,35 @@ int qemu_str_to_log_mask(const char *str)
>              trace_enable_events((*tmp) + 6);
>              mask |= LOG_TRACE;
>  #endif
> +        } else if (g_str_has_prefix(*tmp, "tb_stats")) {
> +            mask |= CPU_LOG_TB_STATS;
> +            default_tbstats_flag = TB_JIT_STATS | TB_EXEC_STATS | TB_JIT_TIME;
> +            tcg_collect_tb_stats = TB_STATS_RUNNING;
> +        } else if (tcg_collect_tb_stats == TB_STATS_RUNNING &&
> +                g_str_has_prefix(*tmp, "dump_limit=")) {
> +
> +            max_num_hot_tbs_to_dump = atoi((*tmp) + 11);
> +        } else if (tcg_collect_tb_stats == TB_STATS_RUNNING &&
> +                g_str_has_prefix(*tmp, "level=")) {
> +
> +            default_tbstats_flag = 0;
> +            char **level_parts = g_strsplit(*tmp + 6, "+", 0);
> +            char **level_tmp;
> +            for (level_tmp = level_parts; level_tmp && *level_tmp; level_tmp++) {
> +                if (g_str_equal(*level_tmp, "jit")) {
> +                    default_tbstats_flag |= TB_JIT_STATS;
> +                } else if (g_str_equal(*level_tmp, "exec")) {
> +                    default_tbstats_flag |= TB_EXEC_STATS;
> +                } else if (g_str_equal(*level_tmp, "time")) {
> +                    default_tbstats_flag |= TB_JIT_TIME;
> +                } else if (g_str_equal(*level_tmp, "all")) {
> +                    default_tbstats_flag |= TB_JIT_STATS | TB_EXEC_STATS | TB_JIT_TIME;
> +                } else {
> +                    fprintf(stderr, "no option level=%s, valid options are:"
> +                            "all, jit, exec or/and time\n", *level_tmp);
> +                    exit(1);

You can't exit here - for one thing a typo on the HMP will kill your
machine. You should pass Error **err to the helper and use error_setg to
report the failure.

From vl.c/main.c you can pass &error_fatal and it will report and
error out as you expect.

From the HMP hooks:

    Error *err = NULL;

    ...
    qemu_str_to_log_mask(flags, &err);
    ...
    if (err) {
        hmp_handle_error(mon, &err);
        return;
    }

You'll need to fix up the existing callers to do this instead of the
"if (!mask) error_report(...)" check they currently do.
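
As a rough illustration of the shape I mean, here is a self-contained
sketch in plain C rather than QEMU code. ParseError and
parse_tbstats_level are made-up stand-ins for Error and the real helper,
and the strcspn tokenising is my own choice instead of g_strsplit. The
point is only that the parser reports the failure and the caller decides
whether it is fatal:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Flag values copied from the quoted tb-stats-flags.h. */
#define TB_EXEC_STATS (1 << 1)
#define TB_JIT_STATS  (1 << 2)
#define TB_JIT_TIME   (1 << 3)

/* Hypothetical stand-in for QEMU's Error object: just a message buffer. */
typedef struct {
    char msg[128];
} ParseError;

/*
 * Parse a "jit+exec+time" style level string into *flags.
 * Returns 0 on success. On failure returns -1, fills err->msg and
 * leaves *flags untouched; it never calls exit().
 */
static int parse_tbstats_level(const char *level, uint32_t *flags,
                               ParseError *err)
{
    uint32_t result = 0;
    const char *p = level;

    assert(level && flags && err);

    while (*p) {
        size_t len = strcspn(p, "+");   /* length of the current token */

        if (len == 3 && strncmp(p, "jit", len) == 0) {
            result |= TB_JIT_STATS;
        } else if (len == 4 && strncmp(p, "exec", len) == 0) {
            result |= TB_EXEC_STATS;
        } else if (len == 4 && strncmp(p, "time", len) == 0) {
            result |= TB_JIT_TIME;
        } else if (len == 3 && strncmp(p, "all", len) == 0) {
            result |= TB_JIT_STATS | TB_EXEC_STATS | TB_JIT_TIME;
        } else {
            snprintf(err->msg, sizeof(err->msg),
                     "no option level=%.*s, valid options are: "
                     "all, jit, exec and/or time", (int)len, p);
            return -1;
        }

        p += len;
        if (*p == '+') {
            p++;                        /* skip the separator */
        }
    }

    *flags = result;
    return 0;
}
```

A command-line caller can print err.msg and exit on -1, while a monitor
caller prints it and simply returns to the prompt.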


> +                }
> +            }
>          } else {
>              for (item = qemu_log_items; item->mask != 0; item++) {
>                  if (g_str_equal(*tmp, item->name)) {


--
Alex Bennée



* Re: [Qemu-devel] [PATCH v8 07/11] monitor: adding tb_stats hmp command
  2019-08-29 17:34 ` [Qemu-devel] [PATCH v8 07/11] monitor: adding tb_stats hmp command vandersonmr
@ 2019-08-30 15:11   ` Alex Bennée
  0 siblings, 0 replies; 24+ messages in thread
From: Alex Bennée @ 2019-08-30 15:11 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Richard Henderson, vandersonmr,
	Dr . David Alan Gilbert, Markus Armbruster


vandersonmr <vandersonmr2@gmail.com> writes:

> Adding tb_stats [start|pause|stop|filter] command to hmp.
> This allows controlling the collection of statistics.
> It is also possible to set the level of collection:
> all, jit, or exec.
>
> tb_stats filter allow to only collect statistics for the TB
> in the last_search list.
>
> The goal of this command is to allow the dynamic exploration
> of the TCG behavior and quality. Therefore, for now, a
> corresponding QMP command is not worthwhile.
>
> Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
<snip>
> index 4203cc728c..dcb30eea8a 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -1796,11 +1796,17 @@ static bool main_loop_should_exit(void)
>
>  static void main_loop(void)
>  {
> +#ifdef CONFIG_TCG
>      uint64_t ti;
> +#endif
>      while (!main_loop_should_exit()) {
> +#ifdef CONFIG_TCG
>          ti = profile_getclock();
> +#endif
>          main_loop_wait(false);
> +#ifdef CONFIG_TCG
>          dev_time += profile_getclock() - ti;
> +#endif

Shouldn't this have been s/CONFIG_PROFILER/CONFIG_TCG/ in the last
patch? Otherwise we might have a broken build half way through.

TBH I'm not even sure what dev_time is meant to represent. As we spend
all our time in the main_loop until we quit, it seems to be more or less
wall clock time.


>      }
>  }


--
Alex Bennée



* Re: [Qemu-devel] [PATCH v8 09/11] Adding info [tb-list|tb|coverset] commands to HMP.
  2019-08-29 17:34 ` [Qemu-devel] [PATCH v8 09/11] Adding info [tb-list|tb|coverset] commands to HMP vandersonmr
@ 2019-08-30 16:17   ` Alex Bennée
  0 siblings, 0 replies; 24+ messages in thread
From: Alex Bennée @ 2019-08-30 16:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Richard Henderson, vandersonmr,
	Dr . David Alan Gilbert, Markus Armbruster


vandersonmr <vandersonmr2@gmail.com> writes:

> These commands allow the exploration of TBs
> generated by the TCG. Understand which one
> hotter, with more guest/host instructions...
> and examine their guest, host and IR code.
>
> The goal of this command is to allow the dynamic exploration
> of TCG behavior and code quality. Therefore, for now, a
> corresponding QMP command is not worthwhile.
>
> Example of output:
>
> TB id:1 | phys:0x34d54 virt:0x0000000000034d54 flags:0x0000f0
> 	| exec:4828932/0 guest inst cov:16.38%
> 	| trans:1 ints: g:3 op:82 op_opt:34 spills:3
> 	| h/g (host bytes / guest insts): 90.666664
> 	| time to gen at 2.4GHz => code:3150.83(ns) IR:712.08(ns)
> 	| targets: 0x0000000000034d5e (id:11), 0x0000000000034d0d (id:2)
>
> TB id:2 | phys:0x34d0d virt:0x0000000000034d0d flags:0x0000f0
> 	| exec:4825842/0 guest inst cov:21.82%
> 	| trans:1 ints: g:4 op:80 op_opt:38 spills:2
> 	| h/g (host bytes / guest insts): 84.000000
> 	| time to gen at 2.4GHz => code:3362.92(ns) IR:793.75(ns)
> 	| targets: 0x0000000000034d19 (id:12), 0x0000000000034d54 (id:1)
>
> TB id:2 | phys:0x34d0d virt:0x0000000000034d0d flags:0x0000f0
> 	| exec:6956495/0  guest inst cov:21.82%
> 	| trans:2 ints: g:2 op:40 op_opt:19 spills:1
> 	| h/g (host bytes / guest insts): 84.000000
> 	| time to gen at 2.4GHz => code:3130.83(ns) IR:722.50(ns)
> 	| targets: 0x0000000000034d19 (id:12), 0x0000000000034d54 (id:1)
>
> ----------------
> IN:
> 0x00034d0d:  89 de                    movl     %ebx, %esi
> 0x00034d0f:  26 8b 0e                 movl     %es:(%esi), %ecx
> 0x00034d12:  26 f6 46 08 80           testb    $0x80, %es:8(%esi)
> 0x00034d17:  75 3b                    jne      0x34d54
>
> ------------------------------
>
> TB id:1 | phys:0x34d54 virt:0x0000000000034d54 flags:0x0000f0
> 	| exec:5202686/0 guest inst cov:11.28%
> 	| trans:1 ints: g:3 op:82 op_opt:34 spills:3
> 	| h/g (host bytes / guest insts): 90.666664
> 	| time to gen at 2.4GHz => code:2793.75(ns) IR:614.58(ns)
> 	| targets: 0x0000000000034d5e (id:3), 0x0000000000034d0d (id:2)
>
> TB id:2 | phys:0x34d0d virt:0x0000000000034d0d flags:0x0000f0
> 	| exec:5199468/0 guest inst cov:15.03%
> 	| trans:1 ints: g:4 op:80 op_opt:38 spills:2
> 	| h/g (host bytes / guest insts): 84.000000
> 	| time to gen at 2.4GHz => code:2958.75(ns) IR:719.58(ns)
> 	| targets: 0x0000000000034d19 (id:4), 0x0000000000034d54 (id:1)
>
> ------------------------------
> 2 TBs to reach 25% of guest inst exec coverage
> Total of guest insts exec: 138346727
>
> ------------------------------
>
> Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
> ---
>  accel/tcg/tb-stats.c         | 421 ++++++++++++++++++++++++++++++++++-
>  accel/tcg/translate-all.c    |   2 +-
>  disas.c                      |  31 ++-
>  hmp-commands-info.hx         |  24 ++
>  include/exec/tb-stats.h      |  45 +++-
>  include/qemu/log-for-trace.h |   4 +
>  include/qemu/log.h           |   2 +
>  monitor/misc.c               |  74 ++++++
>  util/log.c                   |  52 ++++-
>  9 files changed, 634 insertions(+), 21 deletions(-)
>
> diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
> index 9959477fbb..d588c551c9 100644
> --- a/accel/tcg/tb-stats.c
> +++ b/accel/tcg/tb-stats.c
> @@ -34,9 +34,35 @@
>
>  /* only accessed in safe work */
>  static GList *last_search;
> -
> +int id = 1; /* display_id increment counter */
>  uint64_t dev_time;
>
> +static TBStatistics *get_tbstats_by_id(int id)
> +{
> +    GList *iter;
> +
> +    for (iter = last_search; iter; iter = g_list_next(iter)) {
> +        TBStatistics *tbs = iter->data;
> +        if (tbs && tbs->display_id == id) {
> +            return tbs;
> +            break;
> +        }
> +    }
> +    return NULL;
> +}
> +
> +static TBStatistics *get_tbstats_by_addr(target_ulong pc)
> +{
> +    GList *iter;
> +    for (iter = last_search; iter; iter = g_list_next(iter)) {
> +        TBStatistics *tbs = iter->data;
> +        if (tbs && tbs->pc == pc) {
> +            return tbs;
> +        }
> +    }
> +    return NULL;
> +}
> +
>  struct jit_profile_info {
>      uint64_t translations;
>      uint64_t aborted;
> @@ -175,6 +201,7 @@ static void clean_tbstats(void)
>      qht_destroy(&tb_ctx.tb_stats);
>  }
>
> +
>  void do_hmp_tbstats_safe(CPUState *cpu, run_on_cpu_data icmd)
>  {
>      struct TbstatsCommand *cmdinfo = icmd.host_ptr;
> @@ -261,6 +288,398 @@ void init_tb_stats_htable_if_not(void)
>      }
>  }
>
> +static void collect_tb_stats(void *p, uint32_t hash, void *userp)
> +{
> +    last_search = g_list_prepend(last_search, p);
> +}
> +
> +static void dump_tb_targets(TBStatistics *tbs)
> +{
> +    if (tbs && tbs->tb) {
> +        TBStatistics *valid_target_tbstats[2] = {NULL, NULL};
> +
> +        /*
> +         * Check and ensure that targets' tbstats have a valid display_id and
> +         * are in last_search list
> +         */
> +        for (int jmp_id = 0; jmp_id < 2; jmp_id++) {
> +            qemu_spin_lock(&tbs->tb->jmp_lock);
> +            TranslationBlock *tb_dst =
> +                (TranslationBlock *) (atomic_read(&tbs->tb->jmp_dest[jmp_id]) & ~1);

Ahh I see. But I don't think this can be right because tbs->tb will
point to the last translated TB which may not be the hottest. It could
even be a shortened TB during some sort of IO transaction.

But more fundamentally we are going to miss a bunch of the transitions.
Any block that ends in a computed jump will not be linked here. Any
block that transitions across a page boundary won't get linked (they may
end up using tcg_gen_lookup_and_goto_ptr). It's true these won't be
blocks we might be able to join together in a future code optimisation
but it's missing the full picture.

So rather than trying to reverse infer from the partial data I think we
need to capture the destinations within the tb_stats mechanism itself,
either by growing a list/array of TranslationBlocks (or possibly
TBStatistics) that any given TBStatistics entry may end up in. We may
also have to keep a tally of the hits of each destination as well. I
think that means hooking into:

  tcg_gen_exit_tb
  tcg_gen_goto_tb
  tcg_gen_lookup_and_goto_ptr

(c.f. CPU_LOG_TB_NOCHAIN which prevents chaining and causes a return to
the outer loop after every block).
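
To sketch what I have in mind for the bookkeeping (standalone C, not
wired into TCG; TransitionStats, record_transition and the cap of 4 are
all invented for illustration, and a real version would need atomics or
a lock since several vCPUs can run the same block):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TRACKED_DESTS 4     /* hypothetical cap per source block */

typedef struct {
    uint64_t pc;                /* guest address of the destination */
    uint64_t hits;              /* times we left the source for it */
} DestTally;

typedef struct {
    DestTally dests[MAX_TRACKED_DESTS];
    size_t n_dests;
    uint64_t untracked_hits;    /* transitions that did not fit in dests[] */
} TransitionStats;

/*
 * Record one transition from the owning block to dest_pc.
 * Returns true if the destination is tracked individually, false if
 * the table was full and the hit only went into untracked_hits.
 */
static bool record_transition(TransitionStats *ts, uint64_t dest_pc)
{
    assert(ts != NULL);

    for (size_t i = 0; i < ts->n_dests; i++) {
        if (ts->dests[i].pc == dest_pc) {
            ts->dests[i].hits++;
            return true;
        }
    }

    if (ts->n_dests < MAX_TRACKED_DESTS) {
        ts->dests[ts->n_dests].pc = dest_pc;
        ts->dests[ts->n_dests].hits = 1;
        ts->n_dests++;
        return true;
    }

    ts->untracked_hits++;
    return false;
}
```

A small fixed array keeps the common one-or-two-successor case cheap,
and the overflow counter at least tells you when a block with a computed
jump is fanning out to more places than you are tracking.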

> +            qemu_spin_unlock(&tbs->tb->jmp_lock);
> +
> +            if (tb_dst) {
> +                target_ulong pc = tb_dst ? tb_dst->pc : 0;
> +
> +                /* Check if tb_dst is on the last_search list */
> +                TBStatistics *tbstats_pc = get_tbstats_by_addr(pc);
> +
> +                /* if not in the last_search list, then insert it */
> +                if (!tbstats_pc) {
> +                    last_search = g_list_append(last_search, tb_dst->tb_stats);
> +                }
> +
> +                /* if the tb_stats does not have a valid display_id, then set one. */
> +                if (tb_dst->tb_stats && tb_dst->tb_stats->display_id == 0) {
> +                    tb_dst->tb_stats->display_id = id++;
> +                    valid_target_tbstats[jmp_id] = tb_dst->tb_stats;
> +                }
> +            }
> +        }
> +
> +        if (valid_target_tbstats[0] && !valid_target_tbstats[1]) {
> +            qemu_log("\t| targets: 0x"TARGET_FMT_lx" (id:%d)\n",
> +                    valid_target_tbstats[0]->pc, valid_target_tbstats[0]->display_id);
> +        } else if (!valid_target_tbstats[0] && valid_target_tbstats[1]) {
> +            qemu_log("\t| targets: 0x"TARGET_FMT_lx" (id:%d)\n",
> +                    valid_target_tbstats[1]->pc, valid_target_tbstats[1]->display_id);
> +        } else if (valid_target_tbstats[0] && valid_target_tbstats[1]) {
> +            qemu_log("\t| targets: 0x"TARGET_FMT_lx" (id:%d), "
> +                     "0x"TARGET_FMT_lx" (id:%d)\n",
> +                    valid_target_tbstats[0]->pc, valid_target_tbstats[0]->display_id,
> +                    valid_target_tbstats[1]->pc, valid_target_tbstats[1]->display_id);
> +        } else {
> +            qemu_log("\t| targets: no direct target\n");
> +        }
> +    }
> +}
> +
> +static void dump_tb_header(TBStatistics *tbs)
> +{
> +    unsigned g = stat_per_translation(tbs, code.num_guest_inst);
> +    unsigned ops = stat_per_translation(tbs, code.num_tcg_ops);
> +    unsigned ops_opt = stat_per_translation(tbs, code.num_tcg_ops_opt);
> +    unsigned spills = stat_per_translation(tbs, code.spills);
> +    unsigned h = stat_per_translation(tbs, code.out_len);
> +
> +    float guest_host_prop = g ? ((float) h / g) : 0;
> +
> +    qemu_log("TB id:%d | phys:0x"TB_PAGE_ADDR_FMT" virt:0x"TARGET_FMT_lx
> +             " flags:%#08x\n", tbs->display_id, tbs->phys_pc, tbs->pc, tbs->flags);
> +
> +    if (tbs_stats_enabled(tbs, TB_EXEC_STATS)) {
> +        qemu_log("\t| exec:%lu/%lu guest inst cov:%.2f%%\n", tbs->executions.normal,
> +                tbs->executions.atomic, tbs->executions.coverage / 100.0f);
> +    }
> +
> +    if (tbs_stats_enabled(tbs, TB_JIT_STATS)) {
> +        qemu_log("\t| trans:%lu ints: g:%u op:%u op_opt:%u spills:%d"
> +             "\n\t| h/g (host bytes / guest insts): %f\n",
> +             tbs->translations.total, g, ops, ops_opt, spills, guest_host_prop);
> +    }
> +
> +    if (tbs_stats_enabled(tbs, TB_JIT_TIME)) {
> +        qemu_log("\t| time to gen at 2.4GHz => code:%0.2lf(ns) IR:%0.2lf(ns)\n",
> +             tbs->time.code / 2.4, tbs->time.interm / 2.4);
> +    }
> +
> +    dump_tb_targets(tbs);
> +    qemu_log("\n");
> +}
> +
> +static gint
> +inverse_sort_tbs(gconstpointer p1, gconstpointer p2, gpointer psort_by)
> +{
> +    const TBStatistics *tbs1 = (TBStatistics *) p1;
> +    const TBStatistics *tbs2 = (TBStatistics *) p2;
> +    int sort_by = *((int *) psort_by);
> +    unsigned long c1 = 0;
> +    unsigned long c2 = 0;
> +
> +    if (likely(sort_by == SORT_BY_SPILLS)) {
> +        c1 = stat_per_translation(tbs1, code.spills);
> +        c2 = stat_per_translation(tbs2, code.spills);
> +    } else if (likely(sort_by == SORT_BY_HOTNESS)) {
> +        c1 = stat_per_translation(tbs1, executions.normal);
> +        c2 = stat_per_translation(tbs2, executions.normal);
> +    } else if (likely(sort_by == SORT_BY_HG)) {
> +        if (tbs1->code.num_guest_inst == 0) {
> +            return -1;
> +        }
> +        if (tbs2->code.num_guest_inst == 0) {
> +            return 1;
> +        }
> +
> +        float a =
> +            (float) stat_per_translation(tbs1, code.out_len) / tbs1->code.num_guest_inst;
> +        float b =
> +            (float) stat_per_translation(tbs2, code.out_len) / tbs2->code.num_guest_inst;
> +        c1 = a <= b ? 0 : 1;
> +        c2 = a <= b ? 1 : 0;
> +    }
> +
> +    return c1 < c2 ? 1 : c1 == c2 ? 0 : -1;
> +}
> +
> +static void dump_last_search_headers(int count)
> +{
> +    if (!last_search) {
> +        qemu_log("No data collected yet\n");
> +        return;
> +    }
> +
> +    GList *l = last_search;
> +    while (l != NULL && count--) {
> +        TBStatistics *tbs = (TBStatistics *) l->data;
> +        GList *next = l->next;
> +        dump_tb_header(tbs);
> +        l = next;
> +    }
> +}
> +
> +static uint64_t calculate_last_search_coverages(void)
> +{
> +    uint64_t total_exec_count = 0;
> +    GList *i;
> +
> +    /* Compute total execution count for all tbs */
> +    for (i = last_search; i; i = i->next) {
> +        TBStatistics *tbs = (TBStatistics *) i->data;
> +        total_exec_count +=
> +            (tbs->executions.atomic + tbs->executions.normal) * tbs->code.num_guest_inst;
> +    }
> +
> +    for (i = last_search; i; i = i->next) {
> +        TBStatistics *tbs = (TBStatistics *) i->data;
> +        uint64_t tb_total_execs =
> +            (tbs->executions.atomic + tbs->executions.normal) * tbs->code.num_guest_inst;
> +        tbs->executions.coverage = (10000 * tb_total_execs) / (total_exec_count + 1);
> +    }
> +
> +    return total_exec_count;
> +}
> +
> +static void do_dump_coverset_info(int percentage)
> +{
> +    mmap_lock();
> +    uint16_t total_coverage = 0;
> +    unsigned coverset_size = 0;
> +    percentage *= 100;
> +    id = 1;
> +    GList *i;
> +
> +    g_list_free(last_search);
> +    last_search = NULL;
> +
> +    qht_iter(&tb_ctx.tb_stats, collect_tb_stats, NULL);
> +
> +    int sort_by = SORT_BY_HOTNESS;
> +    last_search = g_list_sort_with_data(last_search, inverse_sort_tbs, &sort_by);
> +
> +    if (!last_search) {
> +        qemu_log("No data collected yet\n");
> +        return;
> +    }
> +
> +    uint64_t total_exec_count = calculate_last_search_coverages();
> +
> +    /* Iterate and tbs display_id until reach the coverage percentage count */

I think we need to finesse this a bit. I ran info coverset without any
params having booted a kernel and my HMP hasn't returned yet; it's
currently on TB id 169000!

Certainly for an interactive user they would have glazed over by now...
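
One possible way to bound it, as a standalone sketch rather than a patch:
stop either when the target coverage is reached or when a maximum number
of entries has been emitted. coverset_size and max_entries are made-up
names; the coverage units (hundredths of a percent, hottest first) are
taken from the quoted code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * coverage[] holds each block's share of executed guest instructions
 * in hundredths of a percent, sorted hottest first.
 * Returns how many leading entries are needed to reach target_percent,
 * but never more than max_entries (hypothetical cap).
 */
static size_t coverset_size(const uint16_t *coverage, size_t n,
                            unsigned target_percent, size_t max_entries)
{
    uint32_t target = target_percent * 100u;
    uint32_t total = 0;     /* wider than uint16_t so the sum cannot wrap */
    size_t count = 0;

    assert(coverage != NULL || n == 0);

    while (count < n && count < max_entries && total < target) {
        total += coverage[count];
        count++;
    }
    return count;
}
```

With a cap in place, a long tail of cold blocks that each contribute
next to nothing can no longer turn the command into a walk over every
TB in the table.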

> +    for (i = last_search; i && total_coverage < percentage; i = i->next) {
> +        TBStatistics *tbs = (TBStatistics *) i->data;
> +        tbs->display_id = id++;
> +        coverset_size++;
> +        total_coverage += tbs->executions.coverage;
> +    }
> +
> +    /* free the unused bits */
> +    if (i) {
> +        if (i->next) {
> +            i->next->prev = NULL;
> +        }
> +        g_list_free(i->next);
> +        i->next = NULL;
> +    }
> +
> +    dump_last_search_headers(coverset_size);
> +    mmap_unlock();
> +
> +    qemu_log("------------------------------\n");
> +    qemu_log("%u TBs to reach %d%% of guest inst exec coverage\n",
> +                coverset_size, percentage / 100);
> +    qemu_log("Total of guest insts exec: %lu", total_exec_count);
> +    qemu_log("\n------------------------------\n");
> +}
> +
> +static void do_dump_tbs_info(int total, int sort_by)
> +{
> +    id = 1;
> +    GList *i;
> +    int count = total;
> +
> +    g_list_free(last_search);
> +    last_search = NULL;
> +
> +    qht_iter(&tb_ctx.tb_stats, collect_tb_stats, NULL);
> +
> +    last_search = g_list_sort_with_data(last_search, inverse_sort_tbs, &sort_by);
> +
> +    if (!last_search) {
> +        qemu_printf("No data collected yet!\n");
> +        return;
> +    }
> +
> +    calculate_last_search_coverages();
> +
> +    for (i = last_search; i && count--; i = i->next) {
> +        TBStatistics *tbs = (TBStatistics *) i->data;
> +        tbs->display_id = id++;
> +    }
> +
> +    /* free the unused bits */
> +    if (i) {
> +        if (i->next) {
> +            i->next->prev = NULL;
> +        }
> +        g_list_free(i->next);
> +        i->next = NULL;
> +    }
> +
> +    dump_last_search_headers(total);
> +}
> +
> +static void
> +do_dump_coverset_info_safe(CPUState *cpu, run_on_cpu_data percentage)
> +{
> +    qemu_log_to_monitor(true);
> +    do_dump_coverset_info(percentage.host_int);
> +    qemu_log_to_monitor(false);
> +}
> +
> +struct tbs_dump_info {
> +    int count;
> +    int sort_by;
> +};
> +
> +static void do_dump_tbs_info_safe(CPUState *cpu, run_on_cpu_data tbdi)
> +{
> +    struct tbs_dump_info *info = tbdi.host_ptr;
> +    qemu_log_to_monitor(true);
> +    do_dump_tbs_info(info->count, info->sort_by);
> +    qemu_log_to_monitor(false);
> +    g_free(info);
> +}
> +
> +/*
> + * When we dump_tbs_info on a live system via the HMP we want to
> + * ensure the system is quiessent before we start outputting stuff.
> + * Otherwise we could pollute the output with other logging output.
> + */
> +void dump_coverset_info(int percentage, bool use_monitor)
> +{
> +    if (use_monitor) {
> +        async_safe_run_on_cpu(first_cpu, do_dump_coverset_info_safe,
> +                              RUN_ON_CPU_HOST_INT(percentage));
> +    } else {
> +        do_dump_coverset_info(percentage);
> +    }
> +}
> +
> +void dump_tbs_info(int count, int sort_by, bool use_monitor)
> +{
> +    if (use_monitor) {
> +        struct tbs_dump_info *tbdi = g_new(struct tbs_dump_info, 1);
> +        tbdi->count = count;
> +        tbdi->sort_by = sort_by;
> +        async_safe_run_on_cpu(first_cpu, do_dump_tbs_info_safe,
> +                              RUN_ON_CPU_HOST_PTR(tbdi));
> +    } else {
> +        do_dump_tbs_info(count, sort_by);
> +    }
> +}
> +
> +static GString *get_code_string(TBStatistics *tbs, int log_flags)
> +{
> +    int old_log_flags = qemu_loglevel;
> +
> +    CPUState *cpu = first_cpu;
> +    uint32_t cflags = curr_cflags() | CF_NOCACHE;
> +    TranslationBlock *tb = NULL;
> +
> +    GString *code_s = g_string_new(NULL);
> +    qemu_log_to_string(true, code_s);
> +
> +    qemu_set_log(log_flags);
> +
> +    if (sigsetjmp(cpu->jmp_env, 0) == 0) {
> +        mmap_lock();
> +        tb = tb_gen_code(cpu, tbs->pc, tbs->cs_base, tbs->flags, cflags);
> +        tb_phys_invalidate(tb, -1);
> +        mmap_unlock();
> +    } else {
> +        /*
> +         * The mmap_lock is dropped by tb_gen_code if it runs out of
> +         * memory.
> +         */
> +        fprintf(stderr, "%s: dbg failed!\n", __func__);
> +        qemu_log("\ncould not gen code for this TB\n");
> +        assert_no_pages_locked();
> +    }
> +
> +    qemu_set_log(old_log_flags);
> +    qemu_log_to_string(false, NULL);
> +
> +    if (tb) {
> +        tcg_tb_remove(tb);
> +    }
> +
> +    return code_s;
> +}
> +
> +static void do_tb_dump_with_statistics(TBStatistics *tbs, int log_flags)
> +{
> +    qemu_log("\n------------------------------\n\n");
> +    dump_tb_header(tbs);
> +
> +    GString *code_s = get_code_string(tbs, log_flags);
> +    qemu_log("%s", code_s->str);
> +    g_string_free(code_s, true);
> +    qemu_log("------------------------------\n");
> +}
> +
> +struct tb_dump_info {
> +    int id;
> +    int log_flags;
> +    bool use_monitor;
> +};
> +
> +static void do_dump_tb_info_safe(CPUState *cpu, run_on_cpu_data info)
> +{
> +    struct tb_dump_info *tbdi = (struct tb_dump_info *) info.host_ptr;
> +
> +    if (!last_search) {
> +        qemu_log("no search on record\n");
> +        return;
> +    }
> +
> +    qemu_log_to_monitor(tbdi->use_monitor);
> +
> +    TBStatistics *tbs = get_tbstats_by_id(tbdi->id);
> +    if (tbs) {
> +        do_tb_dump_with_statistics(tbs, tbdi->log_flags);
> +    } else {
> +        qemu_log("no TB statistics found with id %d\n", tbdi->id);
> +    }
> +
> +    qemu_log_to_monitor(false);
> +
> +    g_free(tbdi);
> +}
> +
> +void dump_tb_info(int id, int log_mask, bool use_monitor)
> +{
> +    struct tb_dump_info *tbdi = g_new(struct tb_dump_info, 1);
> +
> +    tbdi->id = id;
> +    tbdi->log_flags = log_mask;
> +    tbdi->use_monitor = use_monitor;
> +
> +    async_safe_run_on_cpu(first_cpu, do_dump_tb_info_safe,
> +                          RUN_ON_CPU_HOST_PTR(tbdi));
> +
> +    /* tbdi free'd by do_dump_tb_info_safe */
> +}
> +
> +
>  void enable_collect_tb_stats(void)
>  {
>      init_tb_stats_htable_if_not();
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index a2e65bb85c..fa163440dc 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -1778,7 +1778,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>       * generation so we can count interesting things about this
>       * generation.
>       */
> -    if (tb_stats_collection_enabled()) {
> +    if (tb_stats_collection_enabled() && !(tb->cflags & CF_NOCACHE)) {

It feels like this should be part of an earlier patch. It's not like
we've suddenly decided not to count uncached runs?

>          tb->tb_stats = tb_get_stats(phys_pc, pc, cs_base, flags, tb);
>          uint32_t flag = get_default_tbstats_flag();
>
> diff --git a/disas.c b/disas.c
> index 3e2bfa572b..d5292d4246 100644
> --- a/disas.c
> +++ b/disas.c
> @@ -8,6 +8,8 @@
>  #include "disas/disas.h"
>  #include "disas/capstone.h"
>
> +#include "qemu/log-for-trace.h"
> +
>  typedef struct CPUDebug {
>      struct disassemble_info info;
>      CPUState *cpu;
> @@ -420,6 +422,22 @@ static bool cap_disas_monitor(disassemble_info *info, uint64_t pc, int count)
>  # define cap_disas_monitor(i, p, c)  false
>  #endif /* CONFIG_CAPSTONE */
>
> +static int fprintf_log(struct _IO_FILE *a, const char *b, ...)
> +{
> +    va_list ap;
> +    va_start(ap, b);
> +
> +    if (!to_string) {
> +        vfprintf(a, b, ap);
> +    } else {
> +        qemu_vlog(b, ap);
> +    }
> +
> +    va_end(ap);
> +
> +    return 1;
> +}
> +
>  /* Disassemble this for me please... (debugging).  */
>  void target_disas(FILE *out, CPUState *cpu, target_ulong code,
>                    target_ulong size)
> @@ -429,7 +447,7 @@ void target_disas(FILE *out, CPUState *cpu, target_ulong code,
>      int count;
>      CPUDebug s;
>
> -    INIT_DISASSEMBLE_INFO(s.info, out, fprintf);
> +    INIT_DISASSEMBLE_INFO(s.info, out, fprintf_log);
>
>      s.cpu = cpu;
>      s.info.read_memory_func = target_read_memory;
> @@ -460,11 +478,12 @@ void target_disas(FILE *out, CPUState *cpu, target_ulong code,
>      }
>
>      for (pc = code; size > 0; pc += count, size -= count) {
> -	fprintf(out, "0x" TARGET_FMT_lx ":  ", pc);
> -	count = s.info.print_insn(pc, &s.info);
> -	fprintf(out, "\n");
> -	if (count < 0)
> -	    break;
> +        fprintf_log(out, "0x" TARGET_FMT_lx ":  ", pc);
> +        count = s.info.print_insn(pc, &s.info);
> +        fprintf_log(out, "\n");
> +        if (count < 0) {
> +            break;
> +        }
>          if (size < count) {
>              fprintf(out,
>                      "Disassembler disagrees with translator over instruction "
> diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
> index c59444c461..f415479011 100644
> --- a/hmp-commands-info.hx
> +++ b/hmp-commands-info.hx
> @@ -289,6 +289,30 @@ ETEXI
>          .help       = "show dynamic compiler info",
>          .cmd        = hmp_info_jit,
>      },
> +    {
> +        .name       = "tb-list",
> +        .args_type  = "number:i?,sortedby:s?",
> +        .params     = "[number sortedby]",
> +        .help       = "show [number] translated blocks sorted by [sortedby]; "
> +                      "sortedby opts: hotness hg spills",
> +        .cmd        = hmp_info_tblist,
> +    },
> +    {
> +        .name       = "tb",
> +        .args_type  = "id:i,flags:s?",
> +        .params     = "id [flag1,flag2,...]",
> +        .help       = "show information about one translated block by id. "
> +                      "dump flags can be used to set dump code level: out_asm in_asm op",
> +        .cmd        = hmp_info_tb,
> +    },
> +    {
> +        .name       = "coverset",
> +        .args_type  = "coverage:i?",
> +        .params     = "[coverage]",
> +        .help       = "show hottest translated blocks necessary to cover "
> +                      "[coverage]% of the execution count",
> +        .cmd        = hmp_info_coverset,
> +    },
>  #endif
>
>  STEXI
> diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
> index 65063c52d7..51d73e1c5f 100644
> --- a/include/exec/tb-stats.h
> +++ b/include/exec/tb-stats.h
> @@ -35,8 +35,11 @@
>  enum SortBy { SORT_BY_HOTNESS, SORT_BY_HG /* Host/Guest */, SORT_BY_SPILLS };
>  enum TbstatsCmd { START, PAUSE, STOP, FILTER };
>
> +#define tbs_stats_enabled(tbs, JIT_STATS) \
> +    (tbs && (tbs->stats_enabled & JIT_STATS))
> +
>  #define tb_stats_enabled(tb, JIT_STATS) \
> -    (tb && tb->tb_stats && (tb->tb_stats->stats_enabled & JIT_STATS))
> +    (tb && tb->tb_stats && tbs_stats_enabled(tb->tb_stats, JIT_STATS))
>
>  #define stat_per_translation(stat, name) \
>      (stat->translations.total ? stat->name / stat->translations.total : 0)
> @@ -64,6 +67,8 @@ struct TBStatistics {
>      struct {
>          unsigned long normal;
>          unsigned long atomic;
> +        /* filled only when dumping x% cover set */
> +        uint16_t coverage;
>      } executions;
>
>      struct {
> @@ -82,7 +87,6 @@ struct TBStatistics {
>
>      struct {
>          unsigned long total;
> -        unsigned long uncached;
>          unsigned long spanning;
>      } translations;
>
> @@ -95,6 +99,9 @@ struct TBStatistics {
>          int64_t la;
>      } time;
>
> +    /* HMP information - used for referring to previous search */
> +    int display_id;
> +
>      /* current TB linked to this TBStatistics */
>      TranslationBlock *tb;
>  };
> @@ -115,6 +122,40 @@ struct TbstatsCommand {
>
>  void do_hmp_tbstats_safe(CPUState *cpu, run_on_cpu_data icmd);
>
> +/**
> + * dump_coverset_info: report the hottest blocks to cover n% of execution
> + *
> + * @percentage: cover set percentage
> + * @use_monitor: redirect output to monitor
> + *
> + * Report the hottest blocks to either the log or monitor
> + */
> +void dump_coverset_info(int percentage, bool use_monitor);
> +
> +
> +/**
> + * dump_tbs_info: report the hottest blocks
> + *
> + * @count: the limit of hotblocks
> + * @sort_by: property in which the dump will be sorted
> + * @use_monitor: redirect output to monitor
> + *
> + * Report the hottest blocks to either the log or monitor
> + */
> +void dump_tbs_info(int count, int sort_by, bool use_monitor);
> +
> +/**
> + * dump_tb_info: dump information about one TB
> + *
> + * @id: the display id of the block (from previous search)
> + * @log_mask: the temporary logging mask
> + * @use_monitor: redirect output to monitor
> + *
> + * Re-run a translation of a block at addr for the purposes of debug output
> + */
> +void dump_tb_info(int id, int log_mask, bool use_monitor);
> +
> +
>  /* TBStatistic collection controls */
>  void enable_collect_tb_stats(void);
>  void disable_collect_tb_stats(void);
> diff --git a/include/qemu/log-for-trace.h b/include/qemu/log-for-trace.h
> index 2f0a5b080e..3de88484cb 100644
> --- a/include/qemu/log-for-trace.h
> +++ b/include/qemu/log-for-trace.h
> @@ -20,6 +20,9 @@
>
>  /* Private global variable, don't use */

This smells bad.
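
One way to read the concern: the header already marks qemu_loglevel as
private, and the patch adds more externs beside it so that disas.c can
test to_string directly. A minimal standalone sketch of the alternative,
keeping the redirect state file-local and exposing only functions, might
look like the following. Every name here (log_capture_begin() and
friends) is hypothetical and not an existing QEMU API; the fixed-size
buffer stands in for the GString used in the patch.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical sketch: capture state is file-local, never extern. */
static char *capture_buf;
static size_t capture_len;
static size_t capture_used;

/* Start redirecting log output into buf (len bytes, NUL-terminated). */
static void log_capture_begin(char *buf, size_t len)
{
    if (buf && len > 0) {
        buf[0] = '\0';
        capture_buf = buf;
        capture_len = len;
        capture_used = 0;
    }
}

/* Stop redirecting; later output goes to stderr again. */
static void log_capture_end(void)
{
    capture_buf = NULL;
    capture_len = 0;
    capture_used = 0;
}

/* Callers that need to know ask through a function, not a global. */
static bool log_is_capturing(void)
{
    return capture_buf != NULL;
}

/* Single entry point: appends to the buffer or prints to stderr. */
static int log_printf(const char *fmt, ...)
{
    va_list ap;
    int ret;

    va_start(ap, fmt);
    if (capture_buf) {
        size_t room = capture_len - capture_used;

        ret = vsnprintf(capture_buf + capture_used, room, fmt, ap);
        if (ret > 0) {
            /* vsnprintf reports the untruncated length; clamp it. */
            capture_used += (size_t)ret < room ? (size_t)ret : room - 1;
        }
    } else {
        ret = vfprintf(stderr, fmt, ap);
    }
    va_end(ap);

    /* Don't pass back error results. */
    return ret < 0 ? 0 : ret;
}
```

With this shape a disassembler callback would call log_printf()
unconditionally, or check log_is_capturing() if it really has to branch,
and nothing outside the logging file could flip the flag by accident.
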

>  extern int qemu_loglevel;
> +extern bool to_string;
> +
> +extern int32_t max_num_hot_tbs_to_dump;
>
>  #define LOG_TRACE          (1 << 15)
>
> @@ -31,5 +34,6 @@ static inline bool qemu_loglevel_mask(int mask)
>
>  /* main logging function */
>  int GCC_FMT_ATTR(1, 2) qemu_log(const char *fmt, ...);
> +int qemu_vlog(const char *fmt, va_list va);
>
>  #endif
> diff --git a/include/qemu/log.h b/include/qemu/log.h
> index a8d1997cde..804cf90f0f 100644
> --- a/include/qemu/log.h
> +++ b/include/qemu/log.h
> @@ -114,6 +114,8 @@ typedef struct QEMULogItem {
>  extern const QEMULogItem qemu_log_items[];
>
>  void qemu_set_log(int log_flags);
> +void qemu_log_to_monitor(bool enable);
> +void qemu_log_to_string(bool enable, GString *s);
>  void qemu_log_needs_buffers(void);
>  void qemu_set_log_filename(const char *filename, Error **errp);
>  void qemu_set_dfilter_ranges(const char *ranges, Error **errp);
> diff --git a/monitor/misc.c b/monitor/misc.c
> index 218263d29a..b99c018124 100644
> --- a/monitor/misc.c
> +++ b/monitor/misc.c
> @@ -504,6 +504,80 @@ static void hmp_tbstats(Monitor *mon, const QDict *qdict)
>
>  }
>
> +static void hmp_info_tblist(Monitor *mon, const QDict *qdict)
> +{
> +    int number_int;
> +    const char *sortedby_str = NULL;
> +    if (!tcg_enabled()) {
> +        error_report("TB information is only available with accel=tcg");
> +        return;
> +    }
> +    if (!tb_ctx.tb_stats.map) {
> +        error_report("no TB information recorded");
> +        return;
> +    }
> +
> +    number_int = qdict_get_try_int(qdict, "number", 10);
> +    sortedby_str = qdict_get_try_str(qdict, "sortedby");
> +
> +    int sortedby = SORT_BY_HOTNESS;
> +    if (sortedby_str == NULL || strcmp(sortedby_str, "hotness") == 0) {
> +        sortedby = SORT_BY_HOTNESS;
> +    } else if (strcmp(sortedby_str, "hg") == 0) {
> +        sortedby = SORT_BY_HG;
> +    } else if (strcmp(sortedby_str, "spills") == 0) {
> +        sortedby = SORT_BY_SPILLS;
> +    } else {
> +        error_report("valid sort options are: hotness hg spills");
> +        return;
> +    }
> +
> +    dump_tbs_info(number_int, sortedby, true);
> +}
> +
> +static void hmp_info_tb(Monitor *mon, const QDict *qdict)
> +{
> +    const int id = qdict_get_int(qdict, "id");
> +    const char *flags = qdict_get_try_str(qdict, "flags");
> +    int mask;
> +
> +    if (!tcg_enabled()) {
> +        error_report("TB information is only available with accel=tcg");
> +        return;
> +    }
> +
> +    mask = flags ? qemu_str_to_log_mask(flags) : CPU_LOG_TB_IN_ASM;
> +
> +    if (!mask) {
> +        error_report("Unable to parse log flags, see 'help log'");
> +        return;
> +    }
> +
> +    dump_tb_info(id, mask, true);
> +}
> +
> +static void hmp_info_coverset(Monitor *mon, const QDict *qdict)
> +{
> +    int coverage;
> +    if (!tcg_enabled()) {
> +        error_report("TB information is only available with accel=tcg");
> +        return;
> +    }
> +    if (!tb_stats_collection_enabled()) {
> +        error_report("TB information not being recorded");
> +        return;
> +    }
> +
> +    coverage = qdict_get_try_int(qdict, "coverage", 90);
> +
> +    if (coverage < 0 || coverage > 100) {
> +        error_report("Coverset percentage should be between 0 and 100");
> +        return;
> +    }
> +
> +    dump_coverset_info(coverage, true);
> +}
> +
>  static void hmp_info_jit(Monitor *mon, const QDict *qdict)
>  {
>      if (!tcg_enabled()) {
> diff --git a/util/log.c b/util/log.c
> index c3805b331b..698b48d083 100644
> --- a/util/log.c
> +++ b/util/log.c
> @@ -33,28 +33,58 @@ int qemu_loglevel;
>  static int log_append = 0;
>  static GArray *debug_regions;
>  int32_t max_num_hot_tbs_to_dump;
> +static bool to_monitor;
> +bool to_string;
>
>  int tcg_collect_tb_stats;
>  uint32_t default_tbstats_flag;
>
> -/* Return the number of characters emitted.  */
> -int qemu_log(const char *fmt, ...)
> +GString *string;
> +
> +int qemu_vlog(const char *fmt, va_list va)
>  {
>      int ret = 0;
> -    if (qemu_logfile) {
> -        va_list ap;
> -        va_start(ap, fmt);
> -        ret = vfprintf(qemu_logfile, fmt, ap);
> -        va_end(ap);
> -
> -        /* Don't pass back error results.  */
> -        if (ret < 0) {
> -            ret = 0;
> +    if (to_string) {
> +        if (string) {
> +            g_string_append_vprintf(string, fmt, va);
>          }
> +    } else if (to_monitor) {
> +        ret = qemu_vprintf(fmt, va);
> +    } else if (qemu_logfile) {
> +        ret = vfprintf(qemu_logfile, fmt, va);
> +    }
> +
> +    /* Don't pass back error results.  */
> +    if (ret < 0) {
> +        ret = 0;
>      }
>      return ret;
>  }
>
> +/* Return the number of characters emitted.  */
> +int qemu_log(const char *fmt, ...)
> +{
> +    int ret = 0;
> +    va_list ap;
> +    va_start(ap, fmt);
> +
> +    ret = qemu_vlog(fmt, ap);
> +
> +    va_end(ap);
> +    return ret;
> +}
> +
> +void qemu_log_to_monitor(bool enable)
> +{
> +    to_monitor = enable;
> +}
> +
> +void qemu_log_to_string(bool enable, GString *s)
> +{
> +    to_string = enable;
> +    string = s;
> +}
> +
>  static bool log_uses_own_buffers;
>
>  /* enable or disable low levels log */


--
Alex Bennée



* Re: [Qemu-devel] [PATCH v8 10/11] monitor: adding new info cfg command
  2019-08-29 17:34 ` [Qemu-devel] [PATCH v8 10/11] monitor: adding new info cfg command vandersonmr
@ 2019-08-30 16:26   ` Alex Bennée
  0 siblings, 0 replies; 24+ messages in thread
From: Alex Bennée @ 2019-08-30 16:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Richard Henderson, vandersonmr,
	Dr. David Alan Gilbert, Markus Armbruster


vandersonmr <vandersonmr2@gmail.com> writes:

> Adding the "info cfg id depth" command to HMP.
> This command allows exploring a TB's
> neighbors by dumping [and opening] a .dot
> file with the TB's CFG neighbors colorized
> by their hotness.
>
> The goal of this command is to allow the dynamic exploration
> of TCG behavior and code quality. Therefore, for now, a
> corresponding QMP command is not worthwhile.
>
> Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
> ---
>  accel/tcg/tb-stats.c    | 164 ++++++++++++++++++++++++++++++++++++++++
>  hmp-commands-info.hx    |   7 ++
>  include/exec/tb-stats.h |   1 +
>  monitor/misc.c          |  22 ++++++
>  4 files changed, 194 insertions(+)
>
> diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
> index d588c551c9..fd2344c5d1 100644
> --- a/accel/tcg/tb-stats.c
> +++ b/accel/tcg/tb-stats.c
> @@ -679,6 +679,170 @@ void dump_tb_info(int id, int log_mask, bool use_monitor)
>      /* tbdi free'd by do_dump_tb_info_safe */
>  }
>
> +/* TB CFG xdot/dot dump implementation */
> +#define MAX_CFG_NUM_NODES 1000
> +static int cfg_tb_id;
> +static GHashTable *cfg_nodes;
> +static uint64_t root_count;
> +
> +static void fputs_jump(TBStatistics *from, TBStatistics *to, FILE *dot)
> +{
> +    if (!from || !to) {
> +        return;
> +    }
> +
> +    int *from_id = (int *) g_hash_table_lookup(cfg_nodes, from);
> +    int *to_id   = (int *) g_hash_table_lookup(cfg_nodes, to);
> +
> +    if (!from_id || !to_id) {
> +        return;
> +    }
> +
> +    fprintf(dot, "   node_%d -> node_%d;\n", *from_id, *to_id);
> +}
> +
> +/* Hotness colors used in the CFG */
> +#define HOT_RED1  0xFF000 /* RGB(255,0,0)     */
> +#define HOT_RED2  0xFF333 /* RGB(255,51,48)   */
> +#define MILD_RED  0xFF666 /* RGB(255,102,96)  */
> +#define WEAK_RED1 0xFF999 /* RGB(255,153,144) */
> +#define WEAK_RED2 0xFFCCC /* RGB(255,204,192) */
> +
> +static void fputs_tbstats(TBStatistics *tbs, FILE *dot, int log_flags)
> +{
> +    if (!tbs) {
> +        return;
> +    }
> +
> +    uint32_t color = MILD_RED;
> +    uint64_t count = tbs->executions.normal;
> +    if (count > 1.6 * root_count) {
> +        color = HOT_RED1;
> +    } else if (count > 1.2 * root_count) {
> +        color = HOT_RED2;
> +    } else if (count < 0.4 * root_count) {
> +        color = WEAK_RED2;
> +    } else if (count < 0.8 * root_count) {
> +        color = WEAK_RED1;
> +    }
> +
> +    GString *code_s = get_code_string(tbs, log_flags);
> +
> +    for (int i = 0; i < code_s->len; i++) {
> +        if (code_s->str[i] == '\n') {
> +            code_s->str[i] = ' ';
> +            code_s = g_string_insert(code_s, i, "\\l");
> +            i += 2;
> +        }
> +    }
> +
> +    fprintf(dot,
> +            "   node_%d [fillcolor=\"#%xFF0000\" shape=\"record\" "
> +            "label=\"TB %d\\l"
> +            "-------------\\l"
> +            "PC:\t0x"TARGET_FMT_lx"\\l"
> +            "exec count:\t%lu\\l"
> +            "\\l %s\"];\n",
> +            cfg_tb_id, color, cfg_tb_id, tbs->pc,
> +            tbs->executions.normal, code_s->str);
> +
> +    int *id = g_new(int, 1);
> +    *id = cfg_tb_id;
> +    g_hash_table_insert(cfg_nodes, tbs, id);
> +
> +    cfg_tb_id++;
> +
> +    g_string_free(code_s, true);
> +}
> +
> +static void fputs_preorder_walk(TBStatistics *tbs, int depth, FILE *dot, int log_flags)
> +{
> +    if (tbs && depth > 0
> +            && cfg_tb_id < MAX_CFG_NUM_NODES
> +            && !g_hash_table_contains(cfg_nodes, tbs)) {
> +
> +        fputs_tbstats(tbs, dot, log_flags);
> +
> +        if (tbs->tb) {
> +            TranslationBlock *left_tb  = NULL;
> +            TranslationBlock *right_tb = NULL;
> +            if (tbs->tb->jmp_dest[0]) {
> +                left_tb = (TranslationBlock *) atomic_read(tbs->tb->jmp_dest);
> +            }
> +            if (tbs->tb->jmp_dest[1]) {
> +                right_tb = (TranslationBlock *) atomic_read(tbs->tb->jmp_dest + 1);
> +            }

I think my comments about relying on jmp_dest on the previous patch hold
for this as well.

> +
> +            if (left_tb) {
> +                fputs_preorder_walk(left_tb->tb_stats, depth - 1, dot, log_flags);
> +                fputs_jump(tbs, left_tb->tb_stats, dot);
> +            }
> +            if (right_tb) {
> +                fputs_preorder_walk(right_tb->tb_stats, depth - 1, dot, log_flags);
> +                fputs_jump(tbs, right_tb->tb_stats, dot);
> +            }
> +        }
> +    }
> +}
> +
> +struct PreorderInfo {
> +    TBStatistics *tbs;
> +    int depth;
> +    int log_flags;
> +};
> +
> +static void fputs_preorder_walk_safe(CPUState *cpu, run_on_cpu_data icmd)
> +{
> +    struct PreorderInfo *info = icmd.host_ptr;
> +
> +    GString *file_name = g_string_new(NULL);
> +    g_string_printf(file_name, "/tmp/qemu-cfg-tb-%d-%d.dot", id, info->depth);

g_string is great for building things up on the fly but slightly overkill
for this sort of case. Besides, I think using g_file_open_tmp might be a
better choice than hand-rolling our own magic tmp file code (the template
needs six X's):

  int dot_fd = g_file_open_tmp("qemu-cfg-XXXXXX", &name, &error);

> +    FILE *dot = fopen(file_name->str, "w+");
> +
> +    fputs(
> +            "digraph G {\n"
> +            "   mclimit=1.5;\n"
> +            "   rankdir=TD; ordering=out;\n"
> +            "   graph[fontsize=10 fontname=\"Verdana\"];\n"
> +            "   color=\"#efefef\";\n"
> +            "   node[shape=box style=filled fontsize=8 fontname=\"Verdana\" fillcolor=\"#efefef\"];\n"
> +            "   edge[fontsize=8 fontname=\"Verdana\"];\n"
> +         , dot);
> +
> +    cfg_nodes = g_hash_table_new(NULL, NULL);
> +    fputs_preorder_walk(info->tbs, info->depth, dot, info->log_flags);
> +    g_hash_table_destroy(cfg_nodes);
> +
> +    fputs("}\n\0", dot);
> +    fclose(dot);
> +
> +    qemu_log("CFG dumped: %s\n", file_name->str);

  qemu_log("CFG dumped: %s\n", name);
  g_free(name);
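
To make the suggestion concrete, here is a minimal standalone sketch of
the same pattern. It uses plain POSIX mkstemp() and fdopen() instead of
GLib so that it builds without extra dependencies; in the patch itself
g_file_open_tmp() would fill the same role, and it likewise wants six X
characters in the template. The helper name open_cfg_dot_file() and the
/tmp prefix are assumptions for illustration only.

```c
#define _POSIX_C_SOURCE 200809L

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of the patch: create a uniquely named
 * file from a mkstemp() template and wrap the descriptor in a FILE *.
 * On success the chosen name is copied into name_out.
 */
static FILE *open_cfg_dot_file(char *name_out, size_t name_len)
{
    char tmpl[] = "/tmp/qemu-cfg-XXXXXX"; /* template needs six X's */
    int fd = mkstemp(tmpl);
    FILE *f;

    if (fd < 0) {
        return NULL;
    }

    f = fdopen(fd, "w+");
    if (!f) {
        /* Don't leak the descriptor or leave an empty file behind. */
        close(fd);
        unlink(tmpl);
        return NULL;
    }

    snprintf(name_out, name_len, "%s", tmpl);
    return f;
}
```

The caller then writes the graph through the returned FILE * and reports
the generated name, which avoids both the fixed file name and the
unchecked fopen() in the quoted code.
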

> +
> +    g_string_free(file_name, true);
> +    g_free(info);
> +}
> +
> +void dump_tb_cfg(int id, int depth, int log_flags)
> +{
> +    cfg_tb_id = 1;
> +    root_count = 0;
> +
> +    /* do a pre-order walk in the CFG with a limited depth */
> +    TBStatistics *root = get_tbstats_by_id(id);
> +    if (root) {
> +        root_count = root->executions.normal;
> +    }
> +
> +    struct PreorderInfo *info = g_new(struct PreorderInfo, 1);
> +    info->tbs = root;
> +    info->depth = depth + 1;
> +    info->log_flags = log_flags;
> +    async_safe_run_on_cpu(first_cpu, fputs_preorder_walk_safe,
> +            RUN_ON_CPU_HOST_PTR(info));
> +}
> +
> +/* TBStatistic collection controls */
> +
>
>  void enable_collect_tb_stats(void)
>  {
> diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
> index f415479011..8c96924c0b 100644
> --- a/hmp-commands-info.hx
> +++ b/hmp-commands-info.hx
> @@ -305,6 +305,13 @@ ETEXI
>                        "dump flags can be used to set dump code level: out_asm in_asm op",
>          .cmd        = hmp_info_tb,
>      },
> +    {
> +        .name       = "cfg",
> +        .args_type  = "id:i,depth:i?,flags:s?",
> +        .params     = "id [depth flags]",
> +        .help       = "plot CFG around TB with the given id",
> +        .cmd        = hmp_info_cfg,
> +    },
>      {
>          .name       = "coverset",
>          .args_type  = "coverage:i?",
> diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
> index 51d73e1c5f..4515db106a 100644
> --- a/include/exec/tb-stats.h
> +++ b/include/exec/tb-stats.h
> @@ -155,6 +155,7 @@ void dump_tbs_info(int count, int sort_by, bool use_monitor);
>   */
>  void dump_tb_info(int id, int log_mask, bool use_monitor);
>
> +void dump_tb_cfg(int id, int depth, int log_flags);
>
>  /* TBStatistic collection controls */
>  void enable_collect_tb_stats(void);
> diff --git a/monitor/misc.c b/monitor/misc.c
> index b99c018124..b3b31d7035 100644
> --- a/monitor/misc.c
> +++ b/monitor/misc.c
> @@ -556,6 +556,28 @@ static void hmp_info_tb(Monitor *mon, const QDict *qdict)
>      dump_tb_info(id, mask, true);
>  }
>
> +static void hmp_info_cfg(Monitor *mon, const QDict *qdict)
> +{
> +    const int id = qdict_get_int(qdict, "id");
> +    const int depth = qdict_get_try_int(qdict, "depth", 3);
> +    const char *flags = qdict_get_try_str(qdict, "flags");
> +    int mask;
> +
> +    if (!tcg_enabled()) {
> +        error_report("TB information is only available with accel=tcg");
> +        return;
> +    }
> +
> +    mask = flags ? qemu_str_to_log_mask(flags) : CPU_LOG_TB_IN_ASM;
> +
> +    if (!mask) {
> +        error_report("Unable to parse log flags, see 'help log'");
> +        return;
> +    }
> +
> +    dump_tb_cfg(id, depth, mask);
> +}
> +
>  static void hmp_info_coverset(Monitor *mon, const QDict *qdict)
>  {
>      int coverage;


--
Alex Bennée



* Re: [Qemu-devel] [PATCH v8 00/11] Measure Tiny Code Generation Quality
  2019-08-29 17:34 [Qemu-devel] [PATCH v8 00/11] Measure Tiny Code Generation Quality vandersonmr
                   ` (10 preceding siblings ...)
  2019-08-29 17:34 ` [Qemu-devel] [PATCH v8 11/11] linux-user: dumping hot TBs at the end of the execution vandersonmr
@ 2019-11-21 15:38 ` Markus Armbruster
  11 siblings, 0 replies; 24+ messages in thread
From: Markus Armbruster @ 2019-11-21 15:38 UTC (permalink / raw)
  To: vandersonmr; +Cc: qemu-devel

vandersonmr <vandersonmr2@gmail.com> writes:

> This patch is part of Google Summer of Code (GSoC) 2019.
> More about the project can be found in:
> https://wiki.qemu.org/Internships/ProjectIdeas/TCGCodeQuality
>
> The goal of this patch is to add infrastructure to collect
> execution and JIT statistics during the emulation with accel/TCG.
> The statistics are stored in TBStatistic structures (TBStats)
> with each TB having its respective TBStats.
>
> We added -d tb_stats and HMP tb_stats commands to allow the control
> of this statistics collection. And info tb, tbs, and coverset commands
> were also added to allow dumping and exploring all this information
> while emulating.
>
> Collecting these statistics and information is useful to understand
> qemu performance and to help to add the support for traces to QEMU.

Are you still pursuing this?




end of thread, other threads:[~2019-11-21 15:43 UTC | newest]

Thread overview: 24+ messages
2019-08-29 17:34 [Qemu-devel] [PATCH v8 00/11] Measure Tiny Code Generation Quality vandersonmr
2019-08-29 17:34 ` [Qemu-devel] [PATCH v8 01/11] accel: introducing TBStatistics structure vandersonmr
2019-08-30 12:59   ` Alex Bennée
2019-08-29 17:34 ` [Qemu-devel] [PATCH v8 02/11] accel: collecting TB execution count vandersonmr
2019-08-30 10:21   ` Alex Bennée
2019-08-30 12:31     ` Vanderson Martins do Rosario
2019-08-30 13:01       ` Alex Bennée
2019-08-29 17:34 ` [Qemu-devel] [PATCH v8 03/11] accel: collecting JIT statistics vandersonmr
2019-08-30 13:10   ` Alex Bennée
2019-08-29 17:34 ` [Qemu-devel] [PATCH v8 04/11] accel: replacing part of CONFIG_PROFILER with TBStats vandersonmr
2019-08-29 17:34 ` [Qemu-devel] [PATCH v8 05/11] accel: adding TB_JIT_TIME and full replacing CONFIG_PROFILER vandersonmr
2019-08-30 13:12   ` Alex Bennée
2019-08-29 17:34 ` [Qemu-devel] [PATCH v8 06/11] Adding -d tb_stats to control TBStatistics collection: vandersonmr
2019-08-30 14:45   ` Alex Bennée
2019-08-29 17:34 ` [Qemu-devel] [PATCH v8 07/11] monitor: adding tb_stats hmp command vandersonmr
2019-08-30 15:11   ` Alex Bennée
2019-08-29 17:34 ` [Qemu-devel] [PATCH v8 08/11] Adding tb_stats [start|pause|stop|filter] command to hmp vandersonmr
2019-08-29 17:54   ` Vanderson Martins do Rosario
2019-08-29 17:34 ` [Qemu-devel] [PATCH v8 09/11] Adding info [tb-list|tb|coverset] commands to HMP vandersonmr
2019-08-30 16:17   ` Alex Bennée
2019-08-29 17:34 ` [Qemu-devel] [PATCH v8 10/11] monitor: adding new info cfg command vandersonmr
2019-08-30 16:26   ` Alex Bennée
2019-08-29 17:34 ` [Qemu-devel] [PATCH v8 11/11] linux-user: dumping hot TBs at the end of the execution vandersonmr
2019-11-21 15:38 ` [Qemu-devel] [PATCH v8 00/11] Measure Tiny Code Generation Quality Markus Armbruster
