qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v4 0/7] Measure Tiny Code Generation Quality
@ 2019-07-20  1:02 vandersonmr
  2019-07-20  1:02 ` [Qemu-devel] [PATCH v4 1/7] accel: introducing TBStatistics structure vandersonmr
                   ` (7 more replies)
  0 siblings, 8 replies; 18+ messages in thread
From: vandersonmr @ 2019-07-20  1:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: vandersonmr

This patch is part of Google Summer of Code (GSoC) 2019.
More about the project can be found in:
https://wiki.qemu.org/Internships/ProjectIdeas/TCGCodeQuality

The goal of this patch is to add infrastructure to collect
execution and JIT statistics during the emulation with accel/TCG.
The statistics are stored in TBStatistic structures (TBStats)
with each TB having its respective TBStats.

We added -d tb_stats and HMP tb_stats commands to allow the control
of this statistics collection. And info tb, tbs, and coverset commands
were also added to allow dumping and exploring all this information 
while emulating.

Collecting these statistics and information is useful to understand
qemu performance and to help to add the support for traces to QEMU. 

vandersonmr (7):
  accel: introducing TBStatistics structure
  accel: collecting TB execution count
  accel: collecting JIT statistics
  accel: replacing part of CONFIG_PROFILER with TBStats
  log: adding -d tb_stats to control tbstats
  monitor: adding tb_stats hmp command
  monitor: adding info tbs, tb, and coverset

 accel/tcg/Makefile.objs      |   2 +-
 accel/tcg/tb-stats.c         | 489 +++++++++++++++++++++++++++++++++++
 accel/tcg/tcg-runtime.c      |   7 +
 accel/tcg/tcg-runtime.h      |   2 +
 accel/tcg/translate-all.c    |  93 ++++++-
 accel/tcg/translator.c       |   6 +
 disas.c                      | 108 ++++++++
 hmp-commands-info.hx         |  23 ++
 hmp-commands.hx              |  17 ++
 include/disas/disas.h        |   1 +
 include/exec/exec-all.h      |  15 +-
 include/exec/gen-icount.h    |   9 +
 include/exec/tb-context.h    |  12 +
 include/exec/tb-hash.h       |   7 +
 include/exec/tb-stats.h      | 113 ++++++++
 include/qemu/log-for-trace.h |   2 +
 include/qemu/log.h           |  16 ++
 linux-user/exit.c            |   4 +
 monitor/misc.c               | 111 ++++++++
 tcg/tcg.c                    | 114 +++-----
 tcg/tcg.h                    |  12 +-
 util/log.c                   |  99 ++++++-
 22 files changed, 1144 insertions(+), 118 deletions(-)
 create mode 100644 accel/tcg/tb-stats.c
 create mode 100644 include/exec/tb-stats.h

-- 
2.22.0



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH v4 1/7] accel: introducing TBStatistics structure
  2019-07-20  1:02 [Qemu-devel] [PATCH v4 0/7] Measure Tiny Code Generation Quality vandersonmr
@ 2019-07-20  1:02 ` vandersonmr
  2019-07-26 11:56   ` Alex Bennée
  2019-07-20  1:02 ` [Qemu-devel] [PATCH v4 2/7] accel: collecting TB execution count vandersonmr
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: vandersonmr @ 2019-07-20  1:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, vandersonmr, Richard Henderson

To store statistics for each TB we created a TBStatistics structure
which is linked with the TBs. The TBStatistics can stay alive after
tb_flush and be relinked to a regenerated TB. So the statistics can
be accumulated even through flushes.

TBStatistics will be also referred to as TBS or tbstats.

Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
---
 accel/tcg/translate-all.c | 57 +++++++++++++++++++++++++++++++++++++++
 include/exec/exec-all.h   | 15 +++--------
 include/exec/tb-context.h | 12 +++++++++
 include/exec/tb-hash.h    |  7 +++++
 include/exec/tb-stats.h   | 29 ++++++++++++++++++++
 include/qemu/log.h        |  6 +++++
 util/log.c                | 28 +++++++++++++++++++
 7 files changed, 143 insertions(+), 11 deletions(-)
 create mode 100644 include/exec/tb-stats.h

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 5d1e08b169..a574890a80 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1118,6 +1118,23 @@ static inline void code_gen_alloc(size_t tb_size)
     }
 }
 
+/*
+ * This is the more or less the same compare, but the data persists
+ * over tb_flush. We also aggregate the various variations of cflags
+ * under one record and ignore the details of page overlap (although
+ * we can count it).
+ */
+bool tb_stats_cmp(const void *ap, const void *bp)
+{
+    const TBStatistics *a = ap;
+    const TBStatistics *b = bp;
+
+    return a->phys_pc == b->phys_pc &&
+        a->pc == b->pc &&
+        a->cs_base == b->cs_base &&
+        a->flags == b->flags;
+}
+
 static bool tb_cmp(const void *ap, const void *bp)
 {
     const TranslationBlock *a = ap;
@@ -1137,6 +1154,9 @@ static void tb_htable_init(void)
     unsigned int mode = QHT_MODE_AUTO_RESIZE;
 
     qht_init(&tb_ctx.htable, tb_cmp, CODE_GEN_HTABLE_SIZE, mode);
+    if (tb_stats_collection_enabled()) {
+        qht_init(&tb_ctx.tb_stats, tb_stats_cmp, CODE_GEN_HTABLE_SIZE, mode);
+    }
 }
 
 /* Must be called before using the QEMU cpus. 'tb_size' is the size
@@ -1666,6 +1686,32 @@ tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
     return tb;
 }
 
+static TBStatistics *tb_get_stats(tb_page_addr_t phys_pc, target_ulong pc,
+                                  target_ulong cs_base, uint32_t flags)
+{
+    TBStatistics *new_stats = g_new0(TBStatistics, 1);
+    uint32_t hash = tb_stats_hash_func(phys_pc, pc, flags);
+    void *existing_stats = NULL;
+    new_stats->phys_pc = phys_pc;
+    new_stats->pc = pc;
+    new_stats->cs_base = cs_base;
+    new_stats->flags = flags;
+
+    qht_insert(&tb_ctx.tb_stats, new_stats, hash, &existing_stats);
+
+    if (unlikely(existing_stats)) {
+        /*
+         * If there is already a TBStatistic for this TB from a previous flush
+         * then just make the new TB point to the older TBStatistic
+         */
+        g_free(new_stats);
+        return existing_stats;
+    } else {
+        return new_stats;
+    }
+}
+
+
 /* Called with mmap_lock held for user mode emulation.  */
 TranslationBlock *tb_gen_code(CPUState *cpu,
                               target_ulong pc, target_ulong cs_base,
@@ -1732,6 +1778,17 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     ti = profile_getclock();
 #endif
 
+    /*
+     * We want to fetch the stats structure before we start code
+     * generation so we can count interesting things about this
+     * generation.
+     */
+    if (tb_stats_collection_enabled()) {
+        tb->tb_stats = tb_get_stats(phys_pc, pc, cs_base, flags);
+    } else {
+        tb->tb_stats = NULL;
+    }
+
     tcg_func_start(tcg_ctx);
 
     tcg_ctx->cpu = env_cpu(env);
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 16034ee651..24bd6a0a0c 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -22,21 +22,11 @@
 
 #include "exec/tb-context.h"
 #include "sysemu/cpus.h"
+#include "exec/tb-stats.h"
 
 /* allow to see translation results - the slowdown should be negligible, so we leave it */
 #define DEBUG_DISAS
 
-/* Page tracking code uses ram addresses in system mode, and virtual
-   addresses in userspace mode.  Define tb_page_addr_t to be an appropriate
-   type.  */
-#if defined(CONFIG_USER_ONLY)
-typedef abi_ulong tb_page_addr_t;
-#define TB_PAGE_ADDR_FMT TARGET_ABI_FMT_lx
-#else
-typedef ram_addr_t tb_page_addr_t;
-#define TB_PAGE_ADDR_FMT RAM_ADDR_FMT
-#endif
-
 #include "qemu/log.h"
 
 void gen_intermediate_code(CPUState *cpu, TranslationBlock *tb, int max_insns);
@@ -403,6 +393,9 @@ struct TranslationBlock {
     uintptr_t jmp_list_head;
     uintptr_t jmp_list_next[2];
     uintptr_t jmp_dest[2];
+
+    /* Pointer to a struct where statistics from the TB is stored */
+    TBStatistics *tb_stats;
 };
 
 extern bool parallel_cpus;
diff --git a/include/exec/tb-context.h b/include/exec/tb-context.h
index feb585e0a7..3cfb62a338 100644
--- a/include/exec/tb-context.h
+++ b/include/exec/tb-context.h
@@ -23,6 +23,17 @@
 #include "qemu/thread.h"
 #include "qemu/qht.h"
 
+/* Page tracking code uses ram addresses in system mode, and virtual
+   addresses in userspace mode.  Define tb_page_addr_t to be an appropriate
+   type.  */
+#if defined(CONFIG_USER_ONLY)
+typedef abi_ulong tb_page_addr_t;
+#define TB_PAGE_ADDR_FMT TARGET_ABI_FMT_lx
+#else
+typedef ram_addr_t tb_page_addr_t;
+#define TB_PAGE_ADDR_FMT RAM_ADDR_FMT
+#endif
+
 #define CODE_GEN_HTABLE_BITS     15
 #define CODE_GEN_HTABLE_SIZE     (1 << CODE_GEN_HTABLE_BITS)
 
@@ -35,6 +46,7 @@ struct TBContext {
 
     /* statistics */
     unsigned tb_flush_count;
+    struct qht tb_stats;
 };
 
 extern TBContext tb_ctx;
diff --git a/include/exec/tb-hash.h b/include/exec/tb-hash.h
index 4f3a37d927..54c477fe79 100644
--- a/include/exec/tb-hash.h
+++ b/include/exec/tb-hash.h
@@ -64,4 +64,11 @@ uint32_t tb_hash_func(tb_page_addr_t phys_pc, target_ulong pc, uint32_t flags,
     return qemu_xxhash7(phys_pc, pc, flags, cf_mask, trace_vcpu_dstate);
 }
 
+static inline
+uint32_t tb_stats_hash_func(tb_page_addr_t phys_pc, target_ulong pc,
+                            uint32_t flags)
+{
+    return qemu_xxhash5(phys_pc, pc, flags);
+}
+
 #endif
diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
new file mode 100644
index 0000000000..0913155ec3
--- /dev/null
+++ b/include/exec/tb-stats.h
@@ -0,0 +1,29 @@
+#ifndef TB_STATS_H
+
+#define TB_STATS_H
+
+#include "exec/cpu-common.h"
+#include "exec/tb-context.h"
+#include "tcg.h"
+
+typedef struct TBStatistics TBStatistics;
+
+/*
+ * This struct stores statistics such as execution count of the
+ * TranslationBlocks. Each sets of TBs for a given phys_pc/pc/flags
+ * has its own TBStatistics which will persist over tb_flush.
+ *
+ * We include additional counters to track number of translations as
+ * well as variants for compile flags.
+ */
+struct TBStatistics {
+    tb_page_addr_t phys_pc;
+    target_ulong pc;
+    uint32_t     flags;
+    /* cs_base isn't included in the hash but we do check for matches */
+    target_ulong cs_base;
+};
+
+bool tb_stats_cmp(const void *ap, const void *bp);
+
+#endif
diff --git a/include/qemu/log.h b/include/qemu/log.h
index b097a6cae1..e175d4d5d0 100644
--- a/include/qemu/log.h
+++ b/include/qemu/log.h
@@ -129,4 +129,10 @@ void qemu_log_flush(void);
 /* Close the log file */
 void qemu_log_close(void);
 
+void enable_collect_tb_stats(void);
+void disable_collect_tb_stats(void);
+void pause_collect_tb_stats(void);
+bool tb_stats_collection_enabled(void);
+bool tb_stats_collection_paused(void);
+
 #endif
diff --git a/util/log.c b/util/log.c
index 1d1b33f7d9..ab73fdc100 100644
--- a/util/log.c
+++ b/util/log.c
@@ -326,3 +326,31 @@ void qemu_print_log_usage(FILE *f)
     fprintf(f, "\nUse \"-d trace:help\" to get a list of trace events.\n\n");
 #endif
 }
+
+int tcg_collect_tb_stats;
+
+void enable_collect_tb_stats(void)
+{
+    tcg_collect_tb_stats = 1;
+}
+
+void disable_collect_tb_stats(void)
+{
+    tcg_collect_tb_stats = 0;
+}
+
+void pause_collect_tb_stats(void)
+{
+    tcg_collect_tb_stats = 2;
+}
+
+bool tb_stats_collection_enabled(void)
+{
+    return tcg_collect_tb_stats;
+}
+
+bool tb_stats_collection_paused(void)
+{
+    return tcg_collect_tb_stats == 2;
+}
+
-- 
2.22.0



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH v4 2/7] accel: collecting TB execution count
  2019-07-20  1:02 [Qemu-devel] [PATCH v4 0/7] Measure Tiny Code Generation Quality vandersonmr
  2019-07-20  1:02 ` [Qemu-devel] [PATCH v4 1/7] accel: introducing TBStatistics structure vandersonmr
@ 2019-07-20  1:02 ` vandersonmr
  2019-07-26 13:38   ` Alex Bennée
  2019-07-20  1:02 ` [Qemu-devel] [PATCH v4 3/7] accel: collecting JIT statistics vandersonmr
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: vandersonmr @ 2019-07-20  1:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, vandersonmr, Richard Henderson

If a TB has a TBS (TBStatistics) with the TB_EXEC_STATS
enabled, then we instrument the start code of the TB
to atomically count the number of times it is executed.
The execution count of the TB is stored in its respective
TBS.

Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
---
 accel/tcg/tcg-runtime.c   |  7 +++++++
 accel/tcg/tcg-runtime.h   |  2 ++
 accel/tcg/translate-all.c |  8 ++++++++
 accel/tcg/translator.c    |  1 +
 include/exec/gen-icount.h |  9 +++++++++
 include/exec/tb-stats.h   | 11 +++++++++++
 include/qemu/log.h        |  6 ++++++
 util/log.c                | 11 +++++++++++
 8 files changed, 55 insertions(+)

diff --git a/accel/tcg/tcg-runtime.c b/accel/tcg/tcg-runtime.c
index 8a1e408e31..f332eae334 100644
--- a/accel/tcg/tcg-runtime.c
+++ b/accel/tcg/tcg-runtime.c
@@ -167,3 +167,10 @@ void HELPER(exit_atomic)(CPUArchState *env)
 {
     cpu_loop_exit_atomic(env_cpu(env), GETPC());
 }
+
+void HELPER(inc_exec_freq)(void *ptr)
+{
+    TBStatistics *stats = (TBStatistics *) ptr;
+    g_assert(stats);
+    atomic_inc(&stats->executions.total);
+}
diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h
index 4fa61b49b4..bf0b75dbe8 100644
--- a/accel/tcg/tcg-runtime.h
+++ b/accel/tcg/tcg-runtime.h
@@ -28,6 +28,8 @@ DEF_HELPER_FLAGS_1(lookup_tb_ptr, TCG_CALL_NO_WG_SE, ptr, env)
 
 DEF_HELPER_FLAGS_1(exit_atomic, TCG_CALL_NO_WG, noreturn, env)
 
+DEF_HELPER_FLAGS_1(inc_exec_freq, TCG_CALL_NO_RWG, void, ptr)
+
 #ifdef CONFIG_SOFTMMU
 
 DEF_HELPER_FLAGS_5(atomic_cmpxchgb, TCG_CALL_NO_WG,
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index a574890a80..7497dae508 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1785,6 +1785,14 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
      */
     if (tb_stats_collection_enabled()) {
         tb->tb_stats = tb_get_stats(phys_pc, pc, cs_base, flags);
+        uint32_t flag = get_default_tbstats_flag();
+
+        if (qemu_log_in_addr_range(tb->pc)) {
+            if (flag & TB_EXEC_STATS) {
+                tb->tb_stats->stats_enabled |= TB_EXEC_STATS;
+            }
+        }
+
     } else {
         tb->tb_stats = NULL;
     }
diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
index 9226a348a3..396a11e828 100644
--- a/accel/tcg/translator.c
+++ b/accel/tcg/translator.c
@@ -46,6 +46,7 @@ void translator_loop(const TranslatorOps *ops, DisasContextBase *db,
 
     ops->init_disas_context(db, cpu);
     tcg_debug_assert(db->is_jmp == DISAS_NEXT);  /* no early exit */
+    gen_tb_exec_count(tb);
 
     /* Reset the temp count so that we can identify leaks */
     tcg_clear_temp_count();
diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h
index f7669b6841..b3efe41894 100644
--- a/include/exec/gen-icount.h
+++ b/include/exec/gen-icount.h
@@ -7,6 +7,15 @@
 
 static TCGOp *icount_start_insn;
 
+static inline void gen_tb_exec_count(TranslationBlock *tb)
+{
+    if (tb_stats_enabled(tb, TB_EXEC_STATS)) {
+        TCGv_ptr ptr = tcg_const_ptr(tb->tb_stats);
+        gen_helper_inc_exec_freq(ptr);
+        tcg_temp_free_ptr(ptr);
+    }
+}
+
 static inline void gen_tb_start(TranslationBlock *tb)
 {
     TCGv_i32 count, imm;
diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
index 0913155ec3..ee1e8de0d3 100644
--- a/include/exec/tb-stats.h
+++ b/include/exec/tb-stats.h
@@ -6,6 +6,9 @@
 #include "exec/tb-context.h"
 #include "tcg.h"
 
+#define tb_stats_enabled(tb, JIT_STATS) \
+    (tb && tb->tb_stats && (tb->tb_stats->stats_enabled & JIT_STATS))
+
 typedef struct TBStatistics TBStatistics;
 
 /*
@@ -22,6 +25,14 @@ struct TBStatistics {
     uint32_t     flags;
     /* cs_base isn't included in the hash but we do check for matches */
     target_ulong cs_base;
+
+    uint32_t stats_enabled;
+
+    /* Execution stats */
+    struct {
+        unsigned long total;
+        unsigned long atomic;
+    } executions;
 };
 
 bool tb_stats_cmp(const void *ap, const void *bp);
diff --git a/include/qemu/log.h b/include/qemu/log.h
index e175d4d5d0..b213411836 100644
--- a/include/qemu/log.h
+++ b/include/qemu/log.h
@@ -129,10 +129,16 @@ void qemu_log_flush(void);
 /* Close the log file */
 void qemu_log_close(void);
 
+#define TB_NOTHING    0
+#define TB_EXEC_STATS (1 << 1)
+
 void enable_collect_tb_stats(void);
 void disable_collect_tb_stats(void);
 void pause_collect_tb_stats(void);
 bool tb_stats_collection_enabled(void);
 bool tb_stats_collection_paused(void);
 
+void set_default_tbstats_flag(uint32_t flag);
+uint32_t get_default_tbstats_flag(void);
+
 #endif
diff --git a/util/log.c b/util/log.c
index ab73fdc100..f81653d712 100644
--- a/util/log.c
+++ b/util/log.c
@@ -354,3 +354,14 @@ bool tb_stats_collection_paused(void)
     return tcg_collect_tb_stats == 2;
 }
 
+uint32_t default_tbstats_flag;
+
+void set_default_tbstats_flag(uint32_t flag)
+{
+    default_tbstats_flag = flag;
+}
+
+uint32_t get_default_tbstats_flag(void)
+{
+    return default_tbstats_flag;
+}
-- 
2.22.0



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH v4 3/7] accel: collecting JIT statistics
  2019-07-20  1:02 [Qemu-devel] [PATCH v4 0/7] Measure Tiny Code Generation Quality vandersonmr
  2019-07-20  1:02 ` [Qemu-devel] [PATCH v4 1/7] accel: introducing TBStatistics structure vandersonmr
  2019-07-20  1:02 ` [Qemu-devel] [PATCH v4 2/7] accel: collecting TB execution count vandersonmr
@ 2019-07-20  1:02 ` vandersonmr
  2019-07-26 14:46   ` Alex Bennée
  2019-07-20  1:02 ` [Qemu-devel] [PATCH v4 4/7] accel: replacing part of CONFIG_PROFILER with TBStats vandersonmr
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: vandersonmr @ 2019-07-20  1:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, vandersonmr, Richard Henderson

If a TB has a TBS (TBStatistics) with the TB_JIT_STATS
enabled then we collect statistics of its translation
processes and code translation. To collect the number
of host instructions we used a modified version of the
disas function to pass through the whole code without
printing anything (fake_fprintf) but counting the number
of instructions.

Signed-off-by: vandersonmr <vandersonmr2@gmail.com>
---
 accel/tcg/translate-all.c |  18 +++++++
 accel/tcg/translator.c    |   5 ++
 disas.c                   | 108 ++++++++++++++++++++++++++++++++++++++
 include/disas/disas.h     |   1 +
 include/exec/tb-stats.h   |  14 +++++
 include/qemu/log.h        |   1 +
 tcg/tcg.c                 |  23 ++++++++
 tcg/tcg.h                 |   2 +
 8 files changed, 172 insertions(+)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 7497dae508..3a47ac6f2c 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1793,6 +1793,10 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
             }
         }
 
+        if (flag & TB_JIT_STATS) {
+            tb->tb_stats->stats_enabled |= TB_JIT_STATS;
+            atomic_inc(&tb->tb_stats->translations.total);
+        }
     } else {
         tb->tb_stats = NULL;
     }
@@ -1870,6 +1874,17 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     atomic_set(&prof->search_out_len, prof->search_out_len + search_size);
 #endif
 
+    if (tb_stats_enabled(tb, TB_JIT_STATS)) {
+        size_t code_size = gen_code_size;
+        if (tcg_ctx->data_gen_ptr) {
+            code_size = tcg_ctx->data_gen_ptr - tb->tc.ptr;
+        }
+
+        atomic_add(&tb->tb_stats->code.num_host_inst,
+                    get_num_insts(tb->tc.ptr, code_size));
+    }
+
+
 #ifdef DEBUG_DISAS
     if (qemu_loglevel_mask(CPU_LOG_TB_OUT_ASM) &&
         qemu_log_in_addr_range(tb->pc)) {
@@ -1927,6 +1942,9 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     phys_page2 = -1;
     if ((pc & TARGET_PAGE_MASK) != virt_page2) {
         phys_page2 = get_page_addr_code(env, virt_page2);
+        if (tb_stats_enabled(tb, TB_JIT_STATS)) {
+            atomic_inc(&tb->tb_stats->translations.spanning);
+        }
     }
     /*
      * No explicit memory barrier is required -- tb_link_page() makes the
diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
index 396a11e828..03c00bdb1b 100644
--- a/accel/tcg/translator.c
+++ b/accel/tcg/translator.c
@@ -117,6 +117,11 @@ void translator_loop(const TranslatorOps *ops, DisasContextBase *db,
     db->tb->size = db->pc_next - db->pc_first;
     db->tb->icount = db->num_insns;
 
+    if (tb_stats_enabled(tb, TB_JIT_STATS)) {
+        atomic_add(&db->tb->tb_stats->code.num_guest_inst, db->num_insns);
+    }
+
+
 #ifdef DEBUG_DISAS
     if (qemu_loglevel_mask(CPU_LOG_TB_IN_ASM)
         && qemu_log_in_addr_range(db->pc_first)) {
diff --git a/disas.c b/disas.c
index 3e2bfa572b..5dec754992 100644
--- a/disas.c
+++ b/disas.c
@@ -475,6 +475,114 @@ void target_disas(FILE *out, CPUState *cpu, target_ulong code,
     }
 }
 
+static int fprintf_fake(struct _IO_FILE *a, const char *b, ...)
+{
+    return 1;
+}
+
+/*
+ * This is a work around to get the number of host instructions with
+ * a small effort. It reuses the disas function with a fake printf to
+ * print nothing but count the number of instructions.
+ *
+ */
+unsigned get_num_insts(void *code, unsigned long size)
+{
+    uintptr_t pc;
+    int count;
+    CPUDebug s;
+    int (*print_insn)(bfd_vma pc, disassemble_info *info) = NULL;
+
+    INIT_DISASSEMBLE_INFO(s.info, NULL, fprintf_fake);
+    s.info.print_address_func = generic_print_host_address;
+
+    s.info.buffer = code;
+    s.info.buffer_vma = (uintptr_t)code;
+    s.info.buffer_length = size;
+    s.info.cap_arch = -1;
+    s.info.cap_mode = 0;
+    s.info.cap_insn_unit = 4;
+    s.info.cap_insn_split = 4;
+
+#ifdef HOST_WORDS_BIGENDIAN
+    s.info.endian = BFD_ENDIAN_BIG;
+#else
+    s.info.endian = BFD_ENDIAN_LITTLE;
+#endif
+#if defined(CONFIG_TCG_INTERPRETER)
+    print_insn = print_insn_tci;
+#elif defined(__i386__)
+    s.info.mach = bfd_mach_i386_i386;
+    print_insn = print_insn_i386;
+    s.info.cap_arch = CS_ARCH_X86;
+    s.info.cap_mode = CS_MODE_32;
+    s.info.cap_insn_unit = 1;
+    s.info.cap_insn_split = 8;
+#elif defined(__x86_64__)
+    s.info.mach = bfd_mach_x86_64;
+    print_insn = print_insn_i386;
+    s.info.cap_arch = CS_ARCH_X86;
+    s.info.cap_mode = CS_MODE_64;
+    s.info.cap_insn_unit = 1;
+    s.info.cap_insn_split = 8;
+#elif defined(_ARCH_PPC)
+    s.info.disassembler_options = (char *)"any";
+    print_insn = print_insn_ppc;
+    s.info.cap_arch = CS_ARCH_PPC;
+# ifdef _ARCH_PPC64
+    s.info.cap_mode = CS_MODE_64;
+# endif
+#elif defined(__riscv) && defined(CONFIG_RISCV_DIS)
+#if defined(_ILP32) || (__riscv_xlen == 32)
+    print_insn = print_insn_riscv32;
+#elif defined(_LP64)
+    print_insn = print_insn_riscv64;
+#else
+#error unsupported RISC-V ABI
+#endif
+#elif defined(__aarch64__) && defined(CONFIG_ARM_A64_DIS)
+    print_insn = print_insn_arm_a64;
+    s.info.cap_arch = CS_ARCH_ARM64;
+#elif defined(__alpha__)
+    print_insn = print_insn_alpha;
+#elif defined(__sparc__)
+    print_insn = print_insn_sparc;
+    s.info.mach = bfd_mach_sparc_v9b;
+#elif defined(__arm__)
+    print_insn = print_insn_arm;
+    s.info.cap_arch = CS_ARCH_ARM;
+    /* TCG only generates code for arm mode.  */
+#elif defined(__MIPSEB__)
+    print_insn = print_insn_big_mips;
+#elif defined(__MIPSEL__)
+    print_insn = print_insn_little_mips;
+#elif defined(__m68k__)
+    print_insn = print_insn_m68k;
+#elif defined(__s390__)
+    print_insn = print_insn_s390;
+#elif defined(__hppa__)
+    print_insn = print_insn_hppa;
+#endif
+
+    if (print_insn == NULL) {
+        print_insn = print_insn_od_host;
+    }
+
+    s.info.fprintf_func = fprintf_fake;
+    unsigned num_insts = 0;
+    for (pc = (uintptr_t)code; size > 0; pc += count, size -= count) {
+        num_insts++;
+        count = print_insn(pc, &s.info);
+        if (count < 0) {
+            break;
+        }
+    }
+    return num_insts;
+}
+
+
+
+
 /* Disassemble this for me please... (debugging). */
 void disas(FILE *out, void *code, unsigned long size)
 {
diff --git a/include/disas/disas.h b/include/disas/disas.h
index 15da511f49..9797ae7cfa 100644
--- a/include/disas/disas.h
+++ b/include/disas/disas.h
@@ -7,6 +7,7 @@
 
 /* Disassemble this for me please... (debugging). */
 void disas(FILE *out, void *code, unsigned long size);
+unsigned get_num_insts(void *code, unsigned long size);
 void target_disas(FILE *out, CPUState *cpu, target_ulong code,
                   target_ulong size);
 
diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
index ee1e8de0d3..6584b78c4a 100644
--- a/include/exec/tb-stats.h
+++ b/include/exec/tb-stats.h
@@ -33,6 +33,20 @@ struct TBStatistics {
         unsigned long total;
         unsigned long atomic;
     } executions;
+
+    struct {
+        unsigned num_guest_inst;
+        unsigned num_host_inst;
+        unsigned num_tcg_ops;
+        unsigned num_tcg_ops_opt;
+        unsigned spills;
+    } code;
+
+    struct {
+        unsigned long total;
+        unsigned long uncached;
+        unsigned long spanning;
+    } translations;
 };
 
 bool tb_stats_cmp(const void *ap, const void *bp);
diff --git a/include/qemu/log.h b/include/qemu/log.h
index b213411836..8cdfc268ca 100644
--- a/include/qemu/log.h
+++ b/include/qemu/log.h
@@ -131,6 +131,7 @@ void qemu_log_close(void);
 
 #define TB_NOTHING    0
 #define TB_EXEC_STATS (1 << 1)
+#define TB_JIT_STATS  (1 << 2)
 
 void enable_collect_tb_stats(void);
 void disable_collect_tb_stats(void);
diff --git a/tcg/tcg.c b/tcg/tcg.c
index be2c33c400..446e3d1708 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -3126,6 +3126,11 @@ static void temp_sync(TCGContext *s, TCGTemp *ts, TCGRegSet allocated_regs,
         case TEMP_VAL_REG:
             tcg_out_st(s, ts->type, ts->reg,
                        ts->mem_base->reg, ts->mem_offset);
+
+            /* Count number of spills */
+            if (tb_stats_enabled(s->current_tb, TB_JIT_STATS)) {
+                atomic_inc(&s->current_tb->tb_stats->code.spills);
+            }
             break;
 
         case TEMP_VAL_MEM:
@@ -3997,6 +4002,8 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
     int i, num_insns;
     TCGOp *op;
 
+    s->current_tb = tb;
+
 #ifdef CONFIG_PROFILER
     {
         int n = 0;
@@ -4028,6 +4035,14 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
     }
 #endif
 
+    if (tb_stats_enabled(tb, TB_JIT_STATS)) {
+        int n = 0;
+        QTAILQ_FOREACH(op, &s->ops, link) {
+            n++;
+        }
+        atomic_add(&tb->tb_stats->code.num_tcg_ops, n);
+    }
+
 #ifdef CONFIG_DEBUG_TCG
     /* Ensure all labels referenced have been emitted.  */
     {
@@ -4094,6 +4109,14 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
     }
 #endif
 
+    if (tb_stats_enabled(tb, TB_JIT_STATS)) {
+        int n = 0;
+        QTAILQ_FOREACH(op, &s->ops, link) {
+            n++;
+        }
+        atomic_add(&tb->tb_stats->code.num_tcg_ops_opt, n);
+    }
+
     tcg_reg_alloc_start(s);
 
     s->code_buf = tb->tc.ptr;
diff --git a/tcg/tcg.h b/tcg/tcg.h
index b411e17a28..bf6f3bcba3 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -738,6 +738,8 @@ struct TCGContext {
 
     uint16_t gen_insn_end_off[TCG_MAX_INSNS];
     target_ulong gen_insn_data[TCG_MAX_INSNS][TARGET_INSN_START_WORDS];
+
+    TranslationBlock *current_tb;
 };
 
 extern TCGContext tcg_init_ctx;
-- 
2.22.0



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH v4 4/7] accel: replacing part of CONFIG_PROFILER with TBStats
  2019-07-20  1:02 [Qemu-devel] [PATCH v4 0/7] Measure Tiny Code Generation Quality vandersonmr
                   ` (2 preceding siblings ...)
  2019-07-20  1:02 ` [Qemu-devel] [PATCH v4 3/7] accel: collecting JIT statistics vandersonmr
@ 2019-07-20  1:02 ` vandersonmr
  2019-07-26 15:25   ` Alex Bennée
  2019-07-20  1:02 ` [Qemu-devel] [PATCH v4 5/7] log: adding -d tb_stats to control tbstats vandersonmr
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: vandersonmr @ 2019-07-20  1:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, vandersonmr, Richard Henderson

We add some of the statistics collected in the TCGProfiler
into the TBStats, having the statistics not only for the whole
emulation but for each TB. Then, we removed these stats
from TCGProfiler and reconstruct the information for the
"info jit" using the sum of all TBStats statistics.

The goal is to have one unique and better way of collecting
emulation statistics. Moreover, checking dynamiclly if the
profiling is enabled showed to have an insignificant impact
on the performance:
https://wiki.qemu.org/Internships/ProjectIdeas/TCGCodeQuality#Overheads.

Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
---
 accel/tcg/Makefile.objs   |   2 +-
 accel/tcg/tb-stats.c      | 107 ++++++++++++++++++++++++++++++++++++++
 accel/tcg/translate-all.c |  10 ++--
 include/exec/tb-stats.h   |   9 ++++
 tcg/tcg.c                 |  93 ++++-----------------------------
 tcg/tcg.h                 |  10 ----
 6 files changed, 131 insertions(+), 100 deletions(-)
 create mode 100644 accel/tcg/tb-stats.c

diff --git a/accel/tcg/Makefile.objs b/accel/tcg/Makefile.objs
index d381a02f34..49ffe81b5d 100644
--- a/accel/tcg/Makefile.objs
+++ b/accel/tcg/Makefile.objs
@@ -2,7 +2,7 @@ obj-$(CONFIG_SOFTMMU) += tcg-all.o
 obj-$(CONFIG_SOFTMMU) += cputlb.o
 obj-y += tcg-runtime.o tcg-runtime-gvec.o
 obj-y += cpu-exec.o cpu-exec-common.o translate-all.o
-obj-y += translator.o
+obj-y += translator.o tb-stats.o
 
 obj-$(CONFIG_USER_ONLY) += user-exec.o
 obj-$(call lnot,$(CONFIG_SOFTMMU)) += user-exec-stub.o
diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
new file mode 100644
index 0000000000..44497d4f9b
--- /dev/null
+++ b/accel/tcg/tb-stats.c
@@ -0,0 +1,107 @@
+#include "qemu/osdep.h"
+
+#include "disas/disas.h"
+#include "exec/exec-all.h"
+#include "tcg.h"
+
+#include "qemu/qemu-print.h"
+
+struct jit_profile_info {
+    uint64_t translations;
+    uint64_t aborted;
+    uint64_t ops;
+    unsigned ops_max;
+    uint64_t del_ops;
+    uint64_t temps;
+    unsigned temps_max;
+    uint64_t host;
+    uint64_t guest;
+    uint64_t host_ins;
+    uint64_t search_data;
+};
+
+/* accumulate the statistics from all TBs */
+static void collect_jit_profile_info(void *p, uint32_t hash, void *userp)
+{
+    struct jit_profile_info *jpi = userp;
+    TBStatistics *tbs = p;
+
+    jpi->translations += tbs->translations.total;
+    jpi->ops += tbs->code.num_tcg_ops;
+    if (tbs->translations.total && tbs->code.num_tcg_ops / tbs->translations.total
+            > jpi->ops_max) {
+        jpi->ops_max = tbs->code.num_tcg_ops / tbs->translations.total;
+    }
+    jpi->del_ops += tbs->code.deleted_ops;
+    jpi->temps += tbs->code.temps;
+    if (tbs->translations.total && tbs->code.temps / tbs->translations.total >
+            jpi->temps_max) {
+        jpi->temps_max = tbs->code.temps / tbs->translations.total;
+    }
+    jpi->host += tbs->code.out_len;
+    jpi->guest += tbs->code.in_len;
+    jpi->host_ins += tbs->code.num_host_inst;
+    jpi->search_data += tbs->code.search_out_len;
+}
+
+/* dump JIT statisticis using TCGProfile and TBStats */
+void dump_jit_profile_info(TCGProfile *s)
+{
+    if (!tb_stats_collection_enabled()) {
+        return;
+    }
+
+    struct jit_profile_info *jpi = g_new0(struct jit_profile_info, 1);
+
+    qht_iter(&tb_ctx.tb_stats, collect_jit_profile_info, jpi);
+
+    if (jpi->translations) {
+        qemu_printf("translated TBs      %" PRId64 "\n", jpi->translations);
+        qemu_printf("avg ops/TB          %0.1f max=%d\n",
+                jpi->ops / (double) jpi->translations, jpi->ops_max);
+        qemu_printf("deleted ops/TB      %0.2f\n",
+                jpi->del_ops / (double) jpi->translations);
+        qemu_printf("avg temps/TB        %0.2f max=%d\n",
+                jpi->temps / (double) jpi->translations, jpi->temps_max);
+        qemu_printf("avg host code/TB    %0.1f\n",
+                jpi->host / (double) jpi->translations);
+        qemu_printf("avg host ins/TB     %0.1f\n",
+                jpi->host_ins / (double) jpi->translations);
+        qemu_printf("avg search data/TB  %0.1f\n",
+                jpi->search_data / (double) jpi->translations);
+
+        if (s) {
+            int64_t tot = s->interm_time + s->code_time;
+            qemu_printf("JIT cycles          %" PRId64 " (%0.3f s at 2.4 GHz)\n",
+                            tot, tot / 2.4e9);
+            qemu_printf("cycles/op           %0.1f\n",
+                        jpi->ops ? (double)tot / jpi->ops : 0);
+            qemu_printf("cycles/in byte      %0.1f\n",
+                        jpi->guest ? (double)tot / jpi->guest : 0);
+            qemu_printf("cycles/out byte     %0.1f\n",
+                        jpi->host ? (double)tot / jpi->host : 0);
+            qemu_printf("cycles/out inst     %0.1f\n",
+                        jpi->host_ins ? (double)tot / jpi->host_ins : 0);
+            qemu_printf("cycles/search byte     %0.1f\n",
+                        jpi->search_data ? (double)tot / jpi->search_data : 0);
+            if (tot == 0) {
+                tot = 1;
+            }
+            qemu_printf("  gen_interm time   %0.1f%%\n",
+                        (double)s->interm_time / tot * 100.0);
+            qemu_printf("  gen_code time     %0.1f%%\n",
+                        (double)s->code_time / tot * 100.0);
+            qemu_printf("optim./code time    %0.1f%%\n",
+                        (double)s->opt_time / (s->code_time ? s->code_time : 1)
+                        * 100.0);
+            qemu_printf("liveness/code time  %0.1f%%\n",
+                    (double)s->la_time / (s->code_time ? s->code_time : 1) * 100.0);
+            qemu_printf("cpu_restore count   %" PRId64 "\n",
+                    s->restore_count);
+            qemu_printf("  avg cycles        %0.1f\n",
+                    s->restore_count ? (double)s->restore_time / s->restore_count : 0);
+        }
+    }
+}
+
+
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 3a47ac6f2c..bbdb0f23a4 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1773,8 +1773,6 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
  tb_overflow:
 
 #ifdef CONFIG_PROFILER
-    /* includes aborted translations because of exceptions */
-    atomic_set(&prof->tb_count1, prof->tb_count1 + 1);
     ti = profile_getclock();
 #endif
 
@@ -1822,7 +1820,6 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     }
 
 #ifdef CONFIG_PROFILER
-    atomic_set(&prof->tb_count, prof->tb_count + 1);
     atomic_set(&prof->interm_time, prof->interm_time + profile_getclock() - ti);
     ti = profile_getclock();
 #endif
@@ -1869,12 +1866,13 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 
 #ifdef CONFIG_PROFILER
     atomic_set(&prof->code_time, prof->code_time + profile_getclock() - ti);
-    atomic_set(&prof->code_in_len, prof->code_in_len + tb->size);
-    atomic_set(&prof->code_out_len, prof->code_out_len + gen_code_size);
-    atomic_set(&prof->search_out_len, prof->search_out_len + search_size);
 #endif
 
     if (tb_stats_enabled(tb, TB_JIT_STATS)) {
+        atomic_add(&tb->tb_stats->code.in_len, tb->size);
+        atomic_add(&tb->tb_stats->code.out_len, gen_code_size);
+        atomic_add(&tb->tb_stats->code.search_out_len, search_size);
+
         size_t code_size = gen_code_size;
         if (tcg_ctx->data_gen_ptr) {
             code_size = tcg_ctx->data_gen_ptr - tb->tc.ptr;
diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
index 6584b78c4a..7d775f2c0d 100644
--- a/include/exec/tb-stats.h
+++ b/include/exec/tb-stats.h
@@ -40,6 +40,13 @@ struct TBStatistics {
         unsigned num_tcg_ops;
         unsigned num_tcg_ops_opt;
         unsigned spills;
+
+        /* CONFIG_PROFILE */
+        unsigned temps;
+        unsigned deleted_ops;
+        unsigned in_len;
+        unsigned out_len;
+        unsigned search_out_len;
     } code;
 
     struct {
@@ -51,4 +58,6 @@ struct TBStatistics {
 
 bool tb_stats_cmp(const void *ap, const void *bp);
 
+void dump_jit_profile_info(TCGProfile *s);
+
 #endif
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 446e3d1708..46b31a2f68 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -2295,9 +2295,9 @@ void tcg_op_remove(TCGContext *s, TCGOp *op)
     QTAILQ_INSERT_TAIL(&s->free_ops, op, link);
     s->nb_ops--;
 
-#ifdef CONFIG_PROFILER
-    atomic_set(&s->prof.del_op_count, s->prof.del_op_count + 1);
-#endif
+    if (tb_stats_enabled(s->current_tb, TB_JIT_STATS)) {
+        atomic_inc(&s->current_tb->tb_stats->code.deleted_ops);
+    }
 }
 
 static TCGOp *tcg_op_alloc(TCGOpcode opc)
@@ -3914,16 +3914,6 @@ void tcg_profile_snapshot(TCGProfile *prof, bool counters, bool table)
 
         if (counters) {
             PROF_ADD(prof, orig, cpu_exec_time);
-            PROF_ADD(prof, orig, tb_count1);
-            PROF_ADD(prof, orig, tb_count);
-            PROF_ADD(prof, orig, op_count);
-            PROF_MAX(prof, orig, op_count_max);
-            PROF_ADD(prof, orig, temp_count);
-            PROF_MAX(prof, orig, temp_count_max);
-            PROF_ADD(prof, orig, del_op_count);
-            PROF_ADD(prof, orig, code_in_len);
-            PROF_ADD(prof, orig, code_out_len);
-            PROF_ADD(prof, orig, search_out_len);
             PROF_ADD(prof, orig, interm_time);
             PROF_ADD(prof, orig, code_time);
             PROF_ADD(prof, orig, la_time);
@@ -4003,26 +3993,17 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
     TCGOp *op;
 
     s->current_tb = tb;
-
-#ifdef CONFIG_PROFILER
-    {
+    if (tb_stats_enabled(tb, TB_JIT_STATS)) {
         int n = 0;
 
         QTAILQ_FOREACH(op, &s->ops, link) {
             n++;
         }
-        atomic_set(&prof->op_count, prof->op_count + n);
-        if (n > prof->op_count_max) {
-            atomic_set(&prof->op_count_max, n);
-        }
+        atomic_add(&tb->tb_stats->code.num_tcg_ops, n);
 
         n = s->nb_temps;
-        atomic_set(&prof->temp_count, prof->temp_count + n);
-        if (n > prof->temp_count_max) {
-            atomic_set(&prof->temp_count_max, n);
-        }
+        atomic_add(&tb->tb_stats->code.temps, n);
     }
-#endif
 
 #ifdef DEBUG_DISAS
     if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP)
@@ -4229,70 +4210,16 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
     return tcg_current_code_size(s);
 }
 
-#ifdef CONFIG_PROFILER
 void tcg_dump_info(void)
 {
+    TCGProfile *s = NULL;
+#ifdef CONFIG_PROFILER
     TCGProfile prof = {};
-    const TCGProfile *s;
-    int64_t tb_count;
-    int64_t tb_div_count;
-    int64_t tot;
-
     tcg_profile_snapshot_counters(&prof);
     s = &prof;
-    tb_count = s->tb_count;
-    tb_div_count = tb_count ? tb_count : 1;
-    tot = s->interm_time + s->code_time;
-
-    qemu_printf("JIT cycles          %" PRId64 " (%0.3f s at 2.4 GHz)\n",
-                tot, tot / 2.4e9);
-    qemu_printf("translated TBs      %" PRId64 " (aborted=%" PRId64
-                " %0.1f%%)\n",
-                tb_count, s->tb_count1 - tb_count,
-                (double)(s->tb_count1 - s->tb_count)
-                / (s->tb_count1 ? s->tb_count1 : 1) * 100.0);
-    qemu_printf("avg ops/TB          %0.1f max=%d\n",
-                (double)s->op_count / tb_div_count, s->op_count_max);
-    qemu_printf("deleted ops/TB      %0.2f\n",
-                (double)s->del_op_count / tb_div_count);
-    qemu_printf("avg temps/TB        %0.2f max=%d\n",
-                (double)s->temp_count / tb_div_count, s->temp_count_max);
-    qemu_printf("avg host code/TB    %0.1f\n",
-                (double)s->code_out_len / tb_div_count);
-    qemu_printf("avg search data/TB  %0.1f\n",
-                (double)s->search_out_len / tb_div_count);
-    
-    qemu_printf("cycles/op           %0.1f\n",
-                s->op_count ? (double)tot / s->op_count : 0);
-    qemu_printf("cycles/in byte      %0.1f\n",
-                s->code_in_len ? (double)tot / s->code_in_len : 0);
-    qemu_printf("cycles/out byte     %0.1f\n",
-                s->code_out_len ? (double)tot / s->code_out_len : 0);
-    qemu_printf("cycles/search byte     %0.1f\n",
-                s->search_out_len ? (double)tot / s->search_out_len : 0);
-    if (tot == 0) {
-        tot = 1;
-    }
-    qemu_printf("  gen_interm time   %0.1f%%\n",
-                (double)s->interm_time / tot * 100.0);
-    qemu_printf("  gen_code time     %0.1f%%\n",
-                (double)s->code_time / tot * 100.0);
-    qemu_printf("optim./code time    %0.1f%%\n",
-                (double)s->opt_time / (s->code_time ? s->code_time : 1)
-                * 100.0);
-    qemu_printf("liveness/code time  %0.1f%%\n",
-                (double)s->la_time / (s->code_time ? s->code_time : 1) * 100.0);
-    qemu_printf("cpu_restore count   %" PRId64 "\n",
-                s->restore_count);
-    qemu_printf("  avg cycles        %0.1f\n",
-                s->restore_count ? (double)s->restore_time / s->restore_count : 0);
-}
-#else
-void tcg_dump_info(void)
-{
-    qemu_printf("[TCG profiler not compiled]\n");
-}
 #endif
+    dump_jit_profile_info(s);
+}
 
 #ifdef ELF_HOST_MACHINE
 /* In order to use this feature, the backend needs to do three things:
diff --git a/tcg/tcg.h b/tcg/tcg.h
index bf6f3bcba3..026a066b9a 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -644,16 +644,6 @@ QEMU_BUILD_BUG_ON(NB_OPS > (1 << 8));
 
 typedef struct TCGProfile {
     int64_t cpu_exec_time;
-    int64_t tb_count1;
-    int64_t tb_count;
-    int64_t op_count; /* total insn count */
-    int op_count_max; /* max insn per TB */
-    int temp_count_max;
-    int64_t temp_count;
-    int64_t del_op_count;
-    int64_t code_in_len;
-    int64_t code_out_len;
-    int64_t search_out_len;
     int64_t interm_time;
     int64_t code_time;
     int64_t la_time;
-- 
2.22.0



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH v4 5/7] log: adding -d tb_stats to control tbstats
  2019-07-20  1:02 [Qemu-devel] [PATCH v4 0/7] Measure Tiny Code Generation Quality vandersonmr
                   ` (3 preceding siblings ...)
  2019-07-20  1:02 ` [Qemu-devel] [PATCH v4 4/7] accel: replacing part of CONFIG_PROFILER with TBStats vandersonmr
@ 2019-07-20  1:02 ` vandersonmr
  2019-07-26 16:20   ` Alex Bennée
  2019-07-20  1:02 ` [Qemu-devel] [PATCH v4 6/7] monitor: adding tb_stats hmp command vandersonmr
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: vandersonmr @ 2019-07-20  1:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: vandersonmr

Adding -d tb_stats:[limit:[all|jit|exec]] to control TBStatistics
collection. "limit" is used to limit the number of TBStats in the
linux-user dump. [all|jit|exec] control the profilling level used
by the TBStats: all, only jit stats or only execution count stats.

Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
---
 include/qemu/log.h |  1 +
 util/log.c         | 34 ++++++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/include/qemu/log.h b/include/qemu/log.h
index 8cdfc268ca..93642603e5 100644
--- a/include/qemu/log.h
+++ b/include/qemu/log.h
@@ -45,6 +45,7 @@ static inline bool qemu_log_separate(void)
 /* LOG_TRACE (1 << 15) is defined in log-for-trace.h */
 #define CPU_LOG_TB_OP_IND  (1 << 16)
 #define CPU_LOG_TB_FPU     (1 << 17)
+#define CPU_LOG_TB_STATS   (1 << 18)
 
 /* Lock output for a series of related logs.  Since this is not needed
  * for a single qemu_log / qemu_log_mask / qemu_log_mask_and_addr, we
diff --git a/util/log.c b/util/log.c
index f81653d712..8109d19afb 100644
--- a/util/log.c
+++ b/util/log.c
@@ -30,6 +30,7 @@ FILE *qemu_logfile;
 int qemu_loglevel;
 static int log_append = 0;
 static GArray *debug_regions;
+int32_t max_num_hot_tbs_to_dump;
 
 /* Return the number of characters emitted.  */
 int qemu_log(const char *fmt, ...)
@@ -273,6 +274,8 @@ const QEMULogItem qemu_log_items[] = {
     { CPU_LOG_TB_NOCHAIN, "nochain",
       "do not chain compiled TBs so that \"exec\" and \"cpu\" show\n"
       "complete traces" },
+    { CPU_LOG_TB_STATS, "tb_stats[:limit:(all,jit,exec)]",
+      "show TBs statistics (until given a limit) ordered by their hotness.\n" },
     { 0, NULL, NULL },
 };
 
@@ -294,6 +297,37 @@ int qemu_str_to_log_mask(const char *str)
             trace_enable_events((*tmp) + 6);
             mask |= LOG_TRACE;
 #endif
+        } else if (g_str_has_prefix(*tmp, "tb_stats")) {
+            if (g_str_has_prefix(*tmp, "tb_stats:") && (*tmp)[9] != '\0') {
+
+                if (!g_ascii_isdigit(*((*tmp) + 9))) {
+                    fprintf(stderr,
+                            "should be a number follow by [all|jit|exec], as tb_stats:10:all\n");
+                    exit(1);
+                }
+                /* get limit */
+                max_num_hot_tbs_to_dump = atoi((*tmp) + 9);
+
+                /* get profilling level */
+                char *s = (*tmp) + 9;
+                while (*s != '\0') {
+                    if (*s++ == ':') {
+                        break;
+                    }
+                }
+                if (g_str_equal(s, "jit") == 0) {
+                    set_default_tbstats_flag(TB_JIT_STATS);
+                } else if (g_str_equal(s, "exec") == 0) {
+                    set_default_tbstats_flag(TB_EXEC_STATS);
+                } else {
+                    set_default_tbstats_flag(TB_JIT_STATS | TB_EXEC_STATS);
+                }
+            } else {
+                set_default_tbstats_flag(TB_JIT_STATS | TB_EXEC_STATS);
+            }
+
+            mask |= CPU_LOG_TB_STATS;
+            enable_collect_tb_stats();
         } else {
             for (item = qemu_log_items; item->mask != 0; item++) {
                 if (g_str_equal(*tmp, item->name)) {
-- 
2.22.0



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH v4 6/7] monitor: adding tb_stats hmp command
  2019-07-20  1:02 [Qemu-devel] [PATCH v4 0/7] Measure Tiny Code Generation Quality vandersonmr
                   ` (4 preceding siblings ...)
  2019-07-20  1:02 ` [Qemu-devel] [PATCH v4 5/7] log: adding -d tb_stats to control tbstats vandersonmr
@ 2019-07-20  1:02 ` vandersonmr
  2019-07-26 16:57   ` Alex Bennée
  2019-07-20  1:02 ` [Qemu-devel] [PATCH v4 7/7] monitor: adding info tbs, tb, and coverset vandersonmr
  2019-07-26 12:51 ` [Qemu-devel] [PATCH v4 0/7] Measure Tiny Code Generation Quality Alex Bennée
  7 siblings, 1 reply; 18+ messages in thread
From: vandersonmr @ 2019-07-20  1:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, vandersonmr, Dr. David Alan Gilbert,
	Markus Armbruster, Richard Henderson

Adding tb_stats [start|pause|stop|filter] command to hmp.
This allows controlling the collection of statistics.
It is also possible to set the level of collection:
all, jit, or exec.

The goal of this command is to allow the dynamic exploration
of the TCG behavior and quality. Therefore, for now, a
corresponding QMP command is not worthwhile.

Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
---
 accel/tcg/tb-stats.c    | 107 ++++++++++++++++++++++++++++++++++++++++
 hmp-commands.hx         |  17 +++++++
 include/exec/tb-stats.h |  13 +++++
 include/qemu/log.h      |   1 +
 monitor/misc.c          |  40 +++++++++++++++
 5 files changed, 178 insertions(+)

diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
index 44497d4f9b..6c330e1b02 100644
--- a/accel/tcg/tb-stats.c
+++ b/accel/tcg/tb-stats.c
@@ -6,6 +6,9 @@
 
 #include "qemu/qemu-print.h"
 
+/* only accessed in safe work */
+static GList *last_search;
+
 struct jit_profile_info {
     uint64_t translations;
     uint64_t aborted;
@@ -104,4 +107,108 @@ void dump_jit_profile_info(TCGProfile *s)
     }
 }
 
+static void dessaloc_tbstats(void *p, uint32_t hash, void *userp)
+{
+    g_free(p);
+}
+
+void clean_tbstats(void)
+{
+    /* remove all tb_stats */
+    qht_iter(&tb_ctx.tb_stats, dessaloc_tbstats, NULL);
+    qht_destroy(&tb_ctx.tb_stats);
+}
+
+static void reset_tbstats_flag(void *p, uint32_t hash, void *userp)
+{
+    uint32_t flag = *((int *)userp);
+    TBStatistics *tbs = p;
+    tbs->stats_enabled = flag;
+}
+
+void set_tbstats_flags(uint32_t flag)
+{
+    /* iterate over tbstats setting their flag as TB_NOTHING */
+    qht_iter(&tb_ctx.tb_stats, reset_tbstats_flag, &flag);
+}
+
+void do_hmp_tbstats_safe(CPUState *cpu, run_on_cpu_data icmd)
+{
+    struct TbstatsCommand *cmdinfo = icmd.host_ptr;
+    int cmd = cmdinfo->cmd;
+    uint32_t level = cmdinfo->level;
+
+    /* for safe, always pause TB generation while running this commands */
+    mmap_lock();
+
+    switch (cmd) {
+    case START:
+        if (tb_stats_collection_paused()) {
+            set_tbstats_flags(level);
+        } else {
+            if (tb_stats_collection_enabled()) {
+                qemu_printf("TB information already being recorded");
+                return;
+            }
+            qht_init(&tb_ctx.tb_stats, tb_stats_cmp, CODE_GEN_HTABLE_SIZE,
+                        QHT_MODE_AUTO_RESIZE);
+        }
+
+        set_default_tbstats_flag(level);
+        enable_collect_tb_stats();
+        tb_flush(cpu);
+        break;
+    case PAUSE:
+        if (!tb_stats_collection_enabled()) {
+            qemu_printf("TB information not being recorded");
+            return;
+        }
+
+        /* Continue to create TBStatistic structures but stop collecting statistics */
+        pause_collect_tb_stats();
+        tb_flush(cpu);
+        set_default_tbstats_flag(TB_NOTHING);
+        set_tbstats_flags(TB_PAUSED);
+        break;
+    case STOP:
+        if (!tb_stats_collection_enabled()) {
+            qemu_printf("TB information not being recorded");
+            return;
+        }
+
+        /* Dissalloc all TBStatistics structures and stop creating new ones */
+        disable_collect_tb_stats();
+        tb_flush(cpu);
+        clean_tbstats();
+        break;
+    case FILTER:
+        if (!tb_stats_collection_enabled()) {
+            qemu_printf("TB information not being recorded");
+            return;
+        }
+        if (!last_search) {
+            qemu_printf("no search on record! execute info tbs before filtering!");
+            return;
+        }
+
+        tb_flush(cpu);
+        set_default_tbstats_flag(TB_NOTHING);
+
+        /* Set all tbstats as paused, then return only the ones from last_search */
+        pause_collect_tb_stats();
+        set_tbstats_flags(TB_PAUSED);
+
+        for (GList *iter = last_search; iter; iter = g_list_next(iter)) {
+            TBStatistics *tbs = iter->data;
+            tbs->stats_enabled = level;
+        }
+
+        break;
+    default: /* INVALID */
+        break;
+    }
+
+    mmap_unlock();
+    g_free(cmdinfo);
+}
 
diff --git a/hmp-commands.hx b/hmp-commands.hx
index bfa5681dd2..419898751e 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1885,6 +1885,23 @@ STEXI
 @findex qemu-io
 Executes a qemu-io command on the given block device.
 
+ETEXI
+#if defined(CONFIG_TCG)
+    {
+        .name       = "tb_stats",
+        .args_type  = "command:s,level:s?",
+        .params     = "command [stats_level]",
+        .help       = "Control tb statistics collection:"
+                        "tb_stats (start|pause|stop|filter) [all|jit_stats|exec_stats]",
+        .cmd        = hmp_tbstats,
+    },
+#endif
+
+STEXI
+@item tb_stats
+@findex
+Control recording tb statistics
+
 ETEXI
 
     {
diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
index 7d775f2c0d..d1debd3262 100644
--- a/include/exec/tb-stats.h
+++ b/include/exec/tb-stats.h
@@ -6,6 +6,9 @@
 #include "exec/tb-context.h"
 #include "tcg.h"
 
+enum SortBy { SORT_BY_HOTNESS, SORT_BY_HG /* Host/Guest */, SORT_BY_SPILLS };
+enum TbstatsCmd { START, PAUSE, STOP, FILTER };
+
 #define tb_stats_enabled(tb, JIT_STATS) \
     (tb && tb->tb_stats && (tb->tb_stats->stats_enabled & JIT_STATS))
 
@@ -60,4 +63,14 @@ bool tb_stats_cmp(const void *ap, const void *bp);
 
 void dump_jit_profile_info(TCGProfile *s);
 
+void set_tbstats_flags(uint32_t flags);
+void clean_tbstats(void);
+
+struct TbstatsCommand {
+    enum TbstatsCmd cmd;
+    uint32_t level;
+};
+
+void do_hmp_tbstats_safe(CPUState *cpu, run_on_cpu_data icmd);
+
 #endif
diff --git a/include/qemu/log.h b/include/qemu/log.h
index 93642603e5..47071d72c7 100644
--- a/include/qemu/log.h
+++ b/include/qemu/log.h
@@ -133,6 +133,7 @@ void qemu_log_close(void);
 #define TB_NOTHING    0
 #define TB_EXEC_STATS (1 << 1)
 #define TB_JIT_STATS  (1 << 2)
+#define TB_PAUSED     (1 << 3)
 
 void enable_collect_tb_stats(void);
 void disable_collect_tb_stats(void);
diff --git a/monitor/misc.c b/monitor/misc.c
index 00338c002a..aea9b0db4f 100644
--- a/monitor/misc.c
+++ b/monitor/misc.c
@@ -459,6 +459,46 @@ static void hmp_info_registers(Monitor *mon, const QDict *qdict)
 }
 
 #ifdef CONFIG_TCG
+static void hmp_tbstats(Monitor *mon, const QDict *qdict)
+{
+    if (!tcg_enabled()) {
+        error_report("TB information is only available with accel=tcg");
+        return;
+    }
+
+    char *cmd = (char *) qdict_get_try_str(qdict, "command");
+    enum TbstatsCmd icmd = -1;
+
+    if (strcmp(cmd, "start") == 0) {
+        icmd = START;
+    } else if (strcmp(cmd, "pause") == 0) {
+        icmd = PAUSE;
+    } else if (strcmp(cmd, "stop") == 0) {
+        icmd = STOP;
+    } else if (strcmp(cmd, "filter") == 0) {
+        icmd = FILTER;
+    } else {
+        error_report("invalid command!");
+    }
+
+    char *slevel = (char *) qdict_get_try_str(qdict, "level");
+    uint32_t level = TB_EXEC_STATS | TB_JIT_STATS;
+    if (slevel) {
+        if (strcmp(slevel, "jit_stats") == 0) {
+            level = TB_JIT_STATS;
+        } else if (strcmp(slevel, "exec_stats") == 0) {
+            level = TB_EXEC_STATS;
+        }
+    }
+
+    struct TbstatsCommand *tbscommand = g_new0(struct TbstatsCommand, 1);
+    tbscommand->cmd = icmd;
+    tbscommand->level = level;
+    async_safe_run_on_cpu(first_cpu, do_hmp_tbstats_safe,
+                          RUN_ON_CPU_HOST_PTR(tbscommand));
+
+}
+
 static void hmp_info_jit(Monitor *mon, const QDict *qdict)
 {
     if (!tcg_enabled()) {
-- 
2.22.0



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH v4 7/7] monitor: adding info tbs, tb, and coverset
  2019-07-20  1:02 [Qemu-devel] [PATCH v4 0/7] Measure Tiny Code Generation Quality vandersonmr
                   ` (5 preceding siblings ...)
  2019-07-20  1:02 ` [Qemu-devel] [PATCH v4 6/7] monitor: adding tb_stats hmp command vandersonmr
@ 2019-07-20  1:02 ` vandersonmr
  2019-07-26 18:17   ` Alex Bennée
  2019-07-29 11:01   ` Alex Bennée
  2019-07-26 12:51 ` [Qemu-devel] [PATCH v4 0/7] Measure Tiny Code Generation Quality Alex Bennée
  7 siblings, 2 replies; 18+ messages in thread
From: vandersonmr @ 2019-07-20  1:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: vandersonmr, Markus Armbruster, Riku Voipio, Laurent Vivier,
	Dr. David Alan Gilbert, Paolo Bonzini, Richard Henderson

Adding info [tbs|tb|coverset] commands to HMP.
These commands allow the exploration of TBs
generated by the TCG. Understand which one
hotter, with more guest/host instructions...
and examine their guest, host and IR code.

The goal of this command is to allow the dynamic exploration
of TCG behavior and code quality. Therefore, for now, a
corresponding QMP command is not worthwhile.

Signed-off-by: vandersonmr <vandersonmr2@gmail.com>
---
 accel/tcg/tb-stats.c         | 275 +++++++++++++++++++++++++++++++++++
 hmp-commands-info.hx         |  23 +++
 include/exec/tb-stats.h      |  37 +++++
 include/qemu/log-for-trace.h |   2 +
 include/qemu/log.h           |   1 +
 linux-user/exit.c            |   4 +
 monitor/misc.c               |  71 +++++++++
 util/log.c                   |  26 +++-
 8 files changed, 431 insertions(+), 8 deletions(-)

diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
index 6c330e1b02..85a16cd563 100644
--- a/accel/tcg/tb-stats.c
+++ b/accel/tcg/tb-stats.c
@@ -212,3 +212,278 @@ void do_hmp_tbstats_safe(CPUState *cpu, run_on_cpu_data icmd)
     g_free(cmdinfo);
 }
 
+static void collect_tb_stats(void *p, uint32_t hash, void *userp)
+{
+    last_search = g_list_prepend(last_search, p);
+}
+
+static void dump_tb_header(TBStatistics *tbs)
+{
+    unsigned g = tbs->translations.total ?
+        tbs->code.num_guest_inst / tbs->translations.total : 0;
+    unsigned ops = tbs->translations.total ?
+        tbs->code.num_tcg_ops / tbs->translations.total : 0;
+    unsigned ops_opt = tbs->translations.total ?
+        tbs->code.num_tcg_ops_opt / tbs->translations.total : 0;
+    unsigned h = tbs->translations.total ?
+        tbs->code.num_host_inst / tbs->translations.total : 0;
+    unsigned spills = tbs->translations.total ?
+        tbs->code.spills / tbs->translations.total : 0;
+
+    float guest_host_prop = g ? ((float) h / g) : 0;
+
+    qemu_log("TB%d: phys:0x"TB_PAGE_ADDR_FMT" virt:0x"TARGET_FMT_lx
+             " flags:%#08x (trans:%lu uncached:%lu exec:%lu ints: g:%u op:%u op_opt:%u h:%u h/g:%.2f spills:%d)\n",
+             tbs->display_id,
+             tbs->phys_pc, tbs->pc, tbs->flags,
+             tbs->translations.total, tbs->translations.uncached,
+             tbs->executions.total, g, ops, ops_opt, h, guest_host_prop,
+             spills);
+}
+
+static gint
+inverse_sort_tbs(gconstpointer p1, gconstpointer p2, gpointer psort_by)
+{
+    const TBStatistics *tbs1 = (TBStatistics *) p1;
+    const TBStatistics *tbs2 = (TBStatistics *) p2;
+    int sort_by = *((int *) psort_by);
+    unsigned long c1 = 0;
+    unsigned long c2 = 0;
+
+    if (likely(sort_by == SORT_BY_SPILLS)) {
+        c1 = tbs1->code.spills;
+        c2 = tbs2->code.spills;
+    } else if (likely(sort_by == SORT_BY_HOTNESS)) {
+        c1 = tbs1->executions.total;
+        c2 = tbs2->executions.total;
+    } else if (likely(sort_by == SORT_BY_HG)) {
+        if (tbs1->code.num_guest_inst == 0) {
+            return -1;
+        }
+        if (tbs2->code.num_guest_inst == 0) {
+            return 1;
+        }
+
+        float a = (float) tbs1->code.num_host_inst / tbs1->code.num_guest_inst;
+        float b = (float) tbs2->code.num_host_inst / tbs2->code.num_guest_inst;
+        c1 = a <= b ? 0 : 1;
+        c2 = a <= b ? 1 : 0;
+    }
+
+    return c1 < c2 ? 1 : c1 == c2 ? 0 : -1;
+}
+
+static void do_dump_coverset_info(int percentage)
+{
+    uint64_t total_exec_count = 0;
+    uint64_t covered_exec_count = 0;
+    unsigned coverset_size = 0;
+    int id = 1;
+    GList *i;
+
+    g_list_free(last_search);
+    last_search = NULL;
+
+    qht_iter(&tb_ctx.tb_stats, collect_tb_stats, NULL);
+
+    int sort_by = SORT_BY_HOTNESS;
+    last_search = g_list_sort_with_data(last_search, inverse_sort_tbs, &sort_by);
+
+    if (!last_search) {
+        qemu_log("No data collected yet\n");
+        return;
+    }
+
+    /* Compute total execution count for all tbs */
+    for (i = last_search; i; i = i->next) {
+        TBStatistics *tbs = (TBStatistics *) i->data;
+        total_exec_count += tbs->executions.total * tbs->code.num_guest_inst;
+    }
+
+    for (i = last_search; i; i = i->next) {
+        TBStatistics *tbs = (TBStatistics *) i->data;
+        covered_exec_count += tbs->executions.total * tbs->code.num_guest_inst;
+        tbs->display_id = id++;
+        coverset_size++;
+        dump_tb_header(tbs);
+
+        /* Iterate and display tbs until reach the percentage count cover */
+        if (((double) covered_exec_count / total_exec_count) >
+                ((double) percentage / 100)) {
+            break;
+        }
+    }
+
+    qemu_log("\n------------------------------\n");
+    qemu_log("# of TBs to reach %d%% of the total of guest insts exec: %u\t",
+                percentage, coverset_size);
+    qemu_log("Total of guest insts exec: %lu\n", total_exec_count);
+    qemu_log("\n------------------------------\n");
+
+    /* free the unused bits */
+    if (i) {
+        if (i->next) {
+            i->next->prev = NULL;
+        }
+        g_list_free(i->next);
+        i->next = NULL;
+    }
+}
+
+static void do_dump_tbs_info(int count, int sort_by)
+{
+    int id = 1;
+    GList *i;
+
+    g_list_free(last_search);
+    last_search = NULL;
+
+    qht_iter(&tb_ctx.tb_stats, collect_tb_stats, NULL);
+
+    last_search = g_list_sort_with_data(last_search, inverse_sort_tbs,
+                                        &sort_by);
+
+    if (!last_search) {
+        qemu_printf("No data collected yet!\n");
+        return;
+    }
+
+    for (i = last_search; i && count--; i = i->next) {
+        TBStatistics *tbs = (TBStatistics *) i->data;
+        tbs->display_id = id++;
+        dump_tb_header(tbs);
+    }
+
+    /* free the unused bits */
+    if (i) {
+        if (i->next) {
+            i->next->prev = NULL;
+        }
+        g_list_free(i->next);
+        i->next = NULL;
+    }
+}
+
+static void
+do_dump_coverset_info_safe(CPUState *cpu, run_on_cpu_data percentage)
+{
+    qemu_log_to_monitor(true);
+    do_dump_coverset_info(percentage.host_int);
+    qemu_log_to_monitor(false);
+}
+
+struct tbs_dump_info {
+    int count;
+    int sort_by;
+};
+
+static void do_dump_tbs_info_safe(CPUState *cpu, run_on_cpu_data tbdi)
+{
+    struct tbs_dump_info *info = tbdi.host_ptr;
+    qemu_log_to_monitor(true);
+    do_dump_tbs_info(info->count, info->sort_by);
+    qemu_log_to_monitor(false);
+    g_free(info);
+}
+
+/*
+ * When we dump_tbs_info on a live system via the HMP we want to
+ * ensure the system is quiessent before we start outputting stuff.
+ * Otherwise we could pollute the output with other logging output.
+ */
+void dump_coverset_info(int percentage, bool use_monitor)
+{
+    if (use_monitor) {
+        async_safe_run_on_cpu(first_cpu, do_dump_coverset_info_safe,
+                              RUN_ON_CPU_HOST_INT(percentage));
+    } else {
+        do_dump_coverset_info(percentage);
+    }
+}
+
+void dump_tbs_info(int count, int sort_by, bool use_monitor)
+{
+    if (use_monitor) {
+        struct tbs_dump_info *tbdi = g_new(struct tbs_dump_info, 1);
+        tbdi->count = count;
+        tbdi->sort_by = sort_by;
+        async_safe_run_on_cpu(first_cpu, do_dump_tbs_info_safe,
+                              RUN_ON_CPU_HOST_PTR(tbdi));
+    } else {
+        do_dump_tbs_info(count, sort_by);
+    }
+}
+
+static void do_tb_dump_with_statistics(TBStatistics *tbs, int log_flags)
+{
+    CPUState *cpu = current_cpu;
+    uint32_t cflags = curr_cflags() | CF_NOCACHE;
+    int old_log_flags = qemu_loglevel;
+    TranslationBlock *tb = NULL;
+
+    qemu_set_log(log_flags);
+
+    qemu_log("\n------------------------------\n");
+    dump_tb_header(tbs);
+
+    if (sigsetjmp(cpu->jmp_env, 0) == 0) {
+        mmap_lock();
+        tb = tb_gen_code(cpu, tbs->pc, tbs->cs_base, tbs->flags, cflags);
+        tb_phys_invalidate(tb, -1);
+        mmap_unlock();
+    } else {
+        /*
+         * The mmap_lock is dropped by tb_gen_code if it runs out of
+         * memory.
+         */
+        fprintf(stderr, "%s: dbg failed!\n", __func__);
+        assert_no_pages_locked();
+    }
+
+    qemu_set_log(old_log_flags);
+
+    tcg_tb_remove(tb);
+}
+
+struct tb_dump_info {
+    int id;
+    int log_flags;
+    bool use_monitor;
+};
+
+static void do_dump_tb_info_safe(CPUState *cpu, run_on_cpu_data info)
+{
+    struct tb_dump_info *tbdi = (struct tb_dump_info *) info.host_ptr;
+    GList *iter;
+
+    if (!last_search) {
+        qemu_printf("no search on record");
+        return;
+    }
+    qemu_log_to_monitor(tbdi->use_monitor);
+
+    for (iter = last_search; iter; iter = g_list_next(iter)) {
+        TBStatistics *tbs = iter->data;
+        if (tbs->display_id == tbdi->id) {
+            do_tb_dump_with_statistics(tbs, tbdi->log_flags);
+            break;
+        }
+    }
+    qemu_log_to_monitor(false);
+    g_free(tbdi);
+}
+
+void dump_tb_info(int id, int log_mask, bool use_monitor)
+{
+    struct tb_dump_info *tbdi = g_new(struct tb_dump_info, 1);
+
+    tbdi->id = id;
+    tbdi->log_flags = log_mask;
+    tbdi->use_monitor = use_monitor;
+
+    async_safe_run_on_cpu(first_cpu, do_dump_tb_info_safe,
+                          RUN_ON_CPU_HOST_PTR(tbdi));
+
+    /* tbdi free'd by do_dump_tb_info_safe */
+}
+
diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index c59444c461..761c76619b 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -289,6 +289,29 @@ ETEXI
         .help       = "show dynamic compiler info",
         .cmd        = hmp_info_jit,
     },
+    {
+        .name       = "tbs",
+        .args_type  = "number:i?,sortedby:s?",
+        .params     = "[number sortedby]",
+        .help       = "show a [number] translated blocks sorted by [sortedby]"
+                      "sortedby opts: hotness hg",
+        .cmd        = hmp_info_tbs,
+    },
+    {
+        .name       = "tb",
+        .args_type  = "id:i,flags:s?",
+        .params     = "id [log1[,...] flags]",
+        .help       = "show information about one translated block by id",
+        .cmd        = hmp_info_tb,
+    },
+    {
+        .name       = "coverset",
+        .args_type  = "number:i?",
+        .params     = "[number]",
+        .help       = "show hottest translated blocks neccesary to cover"
+                      "[number]% of the execution count",
+        .cmd        = hmp_info_coverset,
+    },
 #endif
 
 STEXI
diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
index d1debd3262..e2ab1068c3 100644
--- a/include/exec/tb-stats.h
+++ b/include/exec/tb-stats.h
@@ -57,6 +57,9 @@ struct TBStatistics {
         unsigned long uncached;
         unsigned long spanning;
     } translations;
+
+    /* HMP information - used for referring to previous search */
+    int display_id;
 };
 
 bool tb_stats_cmp(const void *ap, const void *bp);
@@ -73,4 +76,38 @@ struct TbstatsCommand {
 
 void do_hmp_tbstats_safe(CPUState *cpu, run_on_cpu_data icmd);
 
+/**
+ * dump_coverset_info: report the hottest blocks to cover n% of execution
+ *
+ * @percentage: cover set percentage
+ * @use_monitor: redirect output to monitor
+ *
+ * Report the hottest blocks to either the log or monitor
+ */
+void dump_coverset_info(int percentage, bool use_monitor);
+
+
+/**
+ * dump_tbs_info: report the hottest blocks
+ *
+ * @count: the limit of hotblocks
+ * @sort_by: property in which the dump will be sorted
+ * @use_monitor: redirect output to monitor
+ *
+ * Report the hottest blocks to either the log or monitor
+ */
+void dump_tbs_info(int count, int sort_by, bool use_monitor);
+
+/**
+ * dump_tb_info: dump information about one TB
+ *
+ * @id: the display id of the block (from previous search)
+ * @mask: the temporary logging mask
+ * @Use_monitor: redirect output to monitor
+ *
+ * Re-run a translation of a block at addr for the purposes of debug output
+ */
+void dump_tb_info(int id, int log_mask, bool use_monitor);
+
+
 #endif
diff --git a/include/qemu/log-for-trace.h b/include/qemu/log-for-trace.h
index 2f0a5b080e..d65eb83037 100644
--- a/include/qemu/log-for-trace.h
+++ b/include/qemu/log-for-trace.h
@@ -21,6 +21,8 @@
 /* Private global variable, don't use */
 extern int qemu_loglevel;
 
+extern int32_t max_num_hot_tbs_to_dump;
+
 #define LOG_TRACE          (1 << 15)
 
 /* Returns true if a bit is set in the current loglevel mask */
diff --git a/include/qemu/log.h b/include/qemu/log.h
index 47071d72c7..27ba267faa 100644
--- a/include/qemu/log.h
+++ b/include/qemu/log.h
@@ -114,6 +114,7 @@ typedef struct QEMULogItem {
 extern const QEMULogItem qemu_log_items[];
 
 void qemu_set_log(int log_flags);
+void qemu_log_to_monitor(bool enable);
 void qemu_log_needs_buffers(void);
 void qemu_set_log_filename(const char *filename, Error **errp);
 void qemu_set_dfilter_ranges(const char *ranges, Error **errp);
diff --git a/linux-user/exit.c b/linux-user/exit.c
index bdda720553..7226104959 100644
--- a/linux-user/exit.c
+++ b/linux-user/exit.c
@@ -28,6 +28,10 @@ extern void __gcov_dump(void);
 
 void preexit_cleanup(CPUArchState *env, int code)
 {
+    if (tb_stats_collection_enabled()) {
+        dump_tbs_info(max_num_hot_tbs_to_dump, SORT_BY_HOTNESS, false);
+    }
+
 #ifdef TARGET_GPROF
         _mcleanup();
 #endif
diff --git a/monitor/misc.c b/monitor/misc.c
index aea9b0db4f..78ada8aa0e 100644
--- a/monitor/misc.c
+++ b/monitor/misc.c
@@ -499,6 +499,77 @@ static void hmp_tbstats(Monitor *mon, const QDict *qdict)
 
 }
 
+static void hmp_info_tbs(Monitor *mon, const QDict *qdict)
+{
+    int n;
+    const char *s = NULL;
+    if (!tcg_enabled()) {
+        error_report("TB information is only available with accel=tcg");
+        return;
+    }
+    if (!tb_ctx.tb_stats.map) {
+        error_report("no TB information recorded");
+        return;
+    }
+
+    n = qdict_get_try_int(qdict, "number", 10);
+    s = qdict_get_try_str(qdict, "sortedby");
+
+    int sortedby = 0;
+    if (s == NULL || strcmp(s, "hotness") == 0) {
+        sortedby = SORT_BY_HOTNESS;
+    } else if (strcmp(s, "hg") == 0) {
+        sortedby = SORT_BY_HG;
+    } else if (strcmp(s, "spills") == 0) {
+        sortedby = SORT_BY_SPILLS;
+    }
+
+    dump_tbs_info(n, sortedby, true);
+}
+
+static void hmp_info_tb(Monitor *mon, const QDict *qdict)
+{
+    const int id = qdict_get_int(qdict, "id");
+    const char *flags = qdict_get_try_str(qdict, "flags");
+    int mask;
+
+    if (!tcg_enabled()) {
+        error_report("TB information is only available with accel=tcg");
+        return;
+    }
+
+    mask = flags ? qemu_str_to_log_mask(flags) : CPU_LOG_TB_IN_ASM;
+
+    if (!mask) {
+        help_cmd(mon, "log");
+        return;
+    }
+
+    dump_tb_info(id, mask, true);
+}
+
+static void hmp_info_coverset(Monitor *mon, const QDict *qdict)
+{
+    int n;
+    if (!tcg_enabled()) {
+        error_report("TB information is only available with accel=tcg");
+        return;
+    }
+    if (!tb_stats_collection_enabled()) {
+        error_report("TB information not being recorded");
+        return;
+    }
+
+    n = qdict_get_try_int(qdict, "number", 90);
+
+    if (n < 0 || n > 100) {
+        error_report("Coverset percentage should be between 0 and 100");
+        return;
+    }
+
+    dump_coverset_info(n, true);
+}
+
 static void hmp_info_jit(Monitor *mon, const QDict *qdict)
 {
     if (!tcg_enabled()) {
diff --git a/util/log.c b/util/log.c
index 8109d19afb..7bfcbe3703 100644
--- a/util/log.c
+++ b/util/log.c
@@ -19,6 +19,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu/log.h"
+#include "qemu/qemu-print.h"
 #include "qemu/range.h"
 #include "qemu/error-report.h"
 #include "qapi/error.h"
@@ -31,25 +32,34 @@ int qemu_loglevel;
 static int log_append = 0;
 static GArray *debug_regions;
 int32_t max_num_hot_tbs_to_dump;
+static bool to_monitor;
 
 /* Return the number of characters emitted.  */
 int qemu_log(const char *fmt, ...)
 {
     int ret = 0;
-    if (qemu_logfile) {
-        va_list ap;
-        va_start(ap, fmt);
+    va_list ap;
+    va_start(ap, fmt);
+
+    if (to_monitor) {
+        ret = qemu_vprintf(fmt, ap);
+    } else if (qemu_logfile) {
         ret = vfprintf(qemu_logfile, fmt, ap);
-        va_end(ap);
+    }
+    va_end(ap);
 
-        /* Don't pass back error results.  */
-        if (ret < 0) {
-            ret = 0;
-        }
+    /* Don't pass back error results.  */
+    if (ret < 0) {
+        ret = 0;
     }
     return ret;
 }
 
+void qemu_log_to_monitor(bool enable)
+{
+    to_monitor = enable;
+}
+
 static bool log_uses_own_buffers;
 
 /* enable or disable low levels log */
-- 
2.22.0



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v4 1/7] accel: introducing TBStatistics structure
  2019-07-20  1:02 ` [Qemu-devel] [PATCH v4 1/7] accel: introducing TBStatistics structure vandersonmr
@ 2019-07-26 11:56   ` Alex Bennée
  0 siblings, 0 replies; 18+ messages in thread
From: Alex Bennée @ 2019-07-26 11:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, vandersonmr, Richard Henderson


vandersonmr <vandersonmr2@gmail.com> writes:

> To store statistics for each TB we created a TBStatistics structure
> which is linked with the TBs. The TBStatistics can stay alive after
> tb_flush and be relinked to a regenerated TB. So the statistics can
> be accumulated even through flushes.
>
> TBStatistics will be also referred to as TBS or tbstats.
>
> Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
> ---
>  accel/tcg/translate-all.c | 57 +++++++++++++++++++++++++++++++++++++++
>  include/exec/exec-all.h   | 15 +++--------
>  include/exec/tb-context.h | 12 +++++++++
>  include/exec/tb-hash.h    |  7 +++++
>  include/exec/tb-stats.h   | 29 ++++++++++++++++++++
>  include/qemu/log.h        |  6 +++++
>  util/log.c                | 28 +++++++++++++++++++
>  7 files changed, 143 insertions(+), 11 deletions(-)
>  create mode 100644 include/exec/tb-stats.h
>
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index 5d1e08b169..a574890a80 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -1118,6 +1118,23 @@ static inline void code_gen_alloc(size_t tb_size)
>      }
>  }
>
> +/*
> + * This is the more or less the same compare,

We should reference "same compare as tb_cmp"

> but the data persists
> + * over tb_flush. We also aggregate the various variations of cflags
> + * under one record and ignore the details of page overlap (although
> + * we can count it).
> + */
> +bool tb_stats_cmp(const void *ap, const void *bp)
> +{
> +    const TBStatistics *a = ap;
> +    const TBStatistics *b = bp;
> +
> +    return a->phys_pc == b->phys_pc &&
> +        a->pc == b->pc &&
> +        a->cs_base == b->cs_base &&
> +        a->flags == b->flags;
> +}
> +
>  static bool tb_cmp(const void *ap, const void *bp)
>  {
>      const TranslationBlock *a = ap;
> @@ -1137,6 +1154,9 @@ static void tb_htable_init(void)
>      unsigned int mode = QHT_MODE_AUTO_RESIZE;
>
>      qht_init(&tb_ctx.htable, tb_cmp, CODE_GEN_HTABLE_SIZE, mode);
> +    if (tb_stats_collection_enabled()) {
> +        qht_init(&tb_ctx.tb_stats, tb_stats_cmp, CODE_GEN_HTABLE_SIZE, mode);
> +    }
>  }
>
>  /* Must be called before using the QEMU cpus. 'tb_size' is the size
> @@ -1666,6 +1686,32 @@ tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
>      return tb;
>  }
>
> +static TBStatistics *tb_get_stats(tb_page_addr_t phys_pc, target_ulong pc,
> +                                  target_ulong cs_base, uint32_t flags)
> +{
> +    TBStatistics *new_stats = g_new0(TBStatistics, 1);
> +    uint32_t hash = tb_stats_hash_func(phys_pc, pc, flags);
> +    void *existing_stats = NULL;
> +    new_stats->phys_pc = phys_pc;
> +    new_stats->pc = pc;
> +    new_stats->cs_base = cs_base;
> +    new_stats->flags = flags;
> +
> +    qht_insert(&tb_ctx.tb_stats, new_stats, hash, &existing_stats);
> +
> +    if (unlikely(existing_stats)) {
> +        /*
> +         * If there is already a TBStatistic for this TB from a previous flush
> +         * then just make the new TB point to the older TBStatistic
> +         */
> +        g_free(new_stats);
> +        return existing_stats;
> +    } else {
> +        return new_stats;
> +    }
> +}
> +
> +
>  /* Called with mmap_lock held for user mode emulation.  */
>  TranslationBlock *tb_gen_code(CPUState *cpu,
>                                target_ulong pc, target_ulong cs_base,
> @@ -1732,6 +1778,17 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>      ti = profile_getclock();
>  #endif
>
> +    /*
> +     * We want to fetch the stats structure before we start code
> +     * generation so we can count interesting things about this
> +     * generation.
> +     */
> +    if (tb_stats_collection_enabled()) {
> +        tb->tb_stats = tb_get_stats(phys_pc, pc, cs_base, flags);
> +    } else {
> +        tb->tb_stats = NULL;
> +    }
> +
>      tcg_func_start(tcg_ctx);
>
>      tcg_ctx->cpu = env_cpu(env);
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index 16034ee651..24bd6a0a0c 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -22,21 +22,11 @@
>
>  #include "exec/tb-context.h"
>  #include "sysemu/cpus.h"
> +#include "exec/tb-stats.h"
>
>  /* allow to see translation results - the slowdown should be negligible, so we leave it */
>  #define DEBUG_DISAS
>
> -/* Page tracking code uses ram addresses in system mode, and virtual
> -   addresses in userspace mode.  Define tb_page_addr_t to be an appropriate
> -   type.  */
> -#if defined(CONFIG_USER_ONLY)
> -typedef abi_ulong tb_page_addr_t;
> -#define TB_PAGE_ADDR_FMT TARGET_ABI_FMT_lx
> -#else
> -typedef ram_addr_t tb_page_addr_t;
> -#define TB_PAGE_ADDR_FMT RAM_ADDR_FMT
> -#endif
> -

Why are we moving this? I suspect it would be better to split the move
into a separate patch for the purposes of bisection.

>  #include "qemu/log.h"
>
>  void gen_intermediate_code(CPUState *cpu, TranslationBlock *tb, int max_insns);
> @@ -403,6 +393,9 @@ struct TranslationBlock {
>      uintptr_t jmp_list_head;
>      uintptr_t jmp_list_next[2];
>      uintptr_t jmp_dest[2];
> +
> +    /* Pointer to a struct where statistics from the TB is stored */
> +    TBStatistics *tb_stats;
>  };
>
>  extern bool parallel_cpus;
> diff --git a/include/exec/tb-context.h b/include/exec/tb-context.h
> index feb585e0a7..3cfb62a338 100644
> --- a/include/exec/tb-context.h
> +++ b/include/exec/tb-context.h
> @@ -23,6 +23,17 @@
>  #include "qemu/thread.h"
>  #include "qemu/qht.h"
>
> +/* Page tracking code uses ram addresses in system mode, and virtual
> +   addresses in userspace mode.  Define tb_page_addr_t to be an appropriate
> +   type.  */
> +#if defined(CONFIG_USER_ONLY)
> +typedef abi_ulong tb_page_addr_t;
> +#define TB_PAGE_ADDR_FMT TARGET_ABI_FMT_lx
> +#else
> +typedef ram_addr_t tb_page_addr_t;
> +#define TB_PAGE_ADDR_FMT RAM_ADDR_FMT
> +#endif
> +
>  #define CODE_GEN_HTABLE_BITS     15
>  #define CODE_GEN_HTABLE_SIZE     (1 << CODE_GEN_HTABLE_BITS)
>
> @@ -35,6 +46,7 @@ struct TBContext {
>
>      /* statistics */
>      unsigned tb_flush_count;
> +    struct qht tb_stats;
>  };
>
>  extern TBContext tb_ctx;
> diff --git a/include/exec/tb-hash.h b/include/exec/tb-hash.h
> index 4f3a37d927..54c477fe79 100644
> --- a/include/exec/tb-hash.h
> +++ b/include/exec/tb-hash.h
> @@ -64,4 +64,11 @@ uint32_t tb_hash_func(tb_page_addr_t phys_pc, target_ulong pc, uint32_t flags,
>      return qemu_xxhash7(phys_pc, pc, flags, cf_mask, trace_vcpu_dstate);
>  }
>
> +static inline
> +uint32_t tb_stats_hash_func(tb_page_addr_t phys_pc, target_ulong pc,
> +                            uint32_t flags)
> +{
> +    return qemu_xxhash5(phys_pc, pc, flags);
> +}
> +
>  #endif
> diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
> new file mode 100644
> index 0000000000..0913155ec3
> --- /dev/null
> +++ b/include/exec/tb-stats.h
> @@ -0,0 +1,29 @@
> +#ifndef TB_STATS_H
> +
> +#define TB_STATS_H
> +
> +#include "exec/cpu-common.h"
> +#include "exec/tb-context.h"
> +#include "tcg.h"
> +
> +typedef struct TBStatistics TBStatistics;
> +
> +/*
> + * This struct stores statistics such as execution count of the
> + * TranslationBlocks. Each sets of TBs for a given phys_pc/pc/flags
> + * has its own TBStatistics which will persist over tb_flush.
> + *
> + * We include additional counters to track number of translations as
> + * well as variants for compile flags.
> + */
> +struct TBStatistics {
> +    tb_page_addr_t phys_pc;
> +    target_ulong pc;
> +    uint32_t     flags;
> +    /* cs_base isn't included in the hash but we do check for matches */
> +    target_ulong cs_base;
> +};
> +
> +bool tb_stats_cmp(const void *ap, const void *bp);
> +
> +#endif
> diff --git a/include/qemu/log.h b/include/qemu/log.h
> index b097a6cae1..e175d4d5d0 100644
> --- a/include/qemu/log.h
> +++ b/include/qemu/log.h
> @@ -129,4 +129,10 @@ void qemu_log_flush(void);
>  /* Close the log file */
>  void qemu_log_close(void);
>
> +void enable_collect_tb_stats(void);
> +void disable_collect_tb_stats(void);
> +void pause_collect_tb_stats(void);
> +bool tb_stats_collection_enabled(void);
> +bool tb_stats_collection_paused(void);
> +
>  #endif
> diff --git a/util/log.c b/util/log.c
> index 1d1b33f7d9..ab73fdc100 100644
> --- a/util/log.c
> +++ b/util/log.c
> @@ -326,3 +326,31 @@ void qemu_print_log_usage(FILE *f)
>      fprintf(f, "\nUse \"-d trace:help\" to get a list of trace events.\n\n");
>  #endif
>  }
> +
> +int tcg_collect_tb_stats;
> +
> +void enable_collect_tb_stats(void)
> +{
> +    tcg_collect_tb_stats = 1;
> +}

Do we check we have created the qht table? Perhaps this should be
wrapped into here? I suspect that's why I got:

  (qemu) help tb_stats
  tb_stats command [stats_level] -- Control tb statistics collection:tb_stats (start|pause|stop|filter) [all|jit_stats|exec_stats]
  (qemu) tb_stats start
  (qemu) fish: “./aarch64-softmmu/qemu-system-a…” terminated by signal SIGILL (Illegal instruction)

You'll want a memory barrier before you throw the switch of enabling so
qht_init is fully visible.

> +
> +void disable_collect_tb_stats(void)
> +{
> +    tcg_collect_tb_stats = 0;
> +}
> +
> +void pause_collect_tb_stats(void)
> +{
> +    tcg_collect_tb_stats = 2;
> +}
> +
> +bool tb_stats_collection_enabled(void)
> +{
> +    return tcg_collect_tb_stats;
> +}
> +
> +bool tb_stats_collection_paused(void)
> +{
> +    return tcg_collect_tb_stats == 2;
> +}

While we are at it can we move these helpers into tb_stats as log.c can
be built for all versions of QEMU but tb_stats only exists when TCG is
built in.

--
Alex Bennée


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v4 0/7] Measure Tiny Code Generation Quality
  2019-07-20  1:02 [Qemu-devel] [PATCH v4 0/7] Measure Tiny Code Generation Quality vandersonmr
                   ` (6 preceding siblings ...)
  2019-07-20  1:02 ` [Qemu-devel] [PATCH v4 7/7] monitor: adding info tbs, tb, and coverset vandersonmr
@ 2019-07-26 12:51 ` Alex Bennée
  7 siblings, 0 replies; 18+ messages in thread
From: Alex Bennée @ 2019-07-26 12:51 UTC (permalink / raw)
  To: qemu-devel; +Cc: vandersonmr


vandersonmr <vandersonmr2@gmail.com> writes:

> This patch is part of Google Summer of Code (GSoC) 2019.
> More about the project can be found in:
> https://wiki.qemu.org/Internships/ProjectIdeas/TCGCodeQuality
>
> The goal of this patch is to add infrastructure to collect
> execution and JIT statistics during the emulation with accel/TCG.
> The statistics are stored in TBStatistic structures (TBStats)
> with each TB having its respective TBStats.
>
> We added -d tb_stats and HMP tb_stats commands to allow the control
> of this statistics collection. And info tb, tbs, and coverset commands
> were also added to allow dumping and exploring all this information
> while emulating.

Hmm with:

  -d tb_stats:10:all

I'm seeing the following bt:

  Thread 4 "qemu-system-aar" received signal SIGABRT, Aborted.
  [Switching to Thread 0x7fffe5285700 (LWP 898)]
  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
  50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
  (gdb) bt
  #0  0x00007ffff51787bb in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
  #1  0x00007ffff5163535 in __GI_abort () at abort.c:79
  #2  0x0000555555bab663 in OP_E (bytemode=2, sizeflag=3) at /home/alex/lsrc/qemu.git/disas/i386.c:5094
  #3  0x0000555555ba972f in print_insn (pc=140735911299712, info=0x7fffe5281c00) at /home/alex/lsrc/qemu.git/disas/i386.c:4071
  #4  0x0000555555ba8623 in print_insn_i386 (pc=140735911299712, info=0x7fffe5281c00) at /home/alex/lsrc/qemu.git/disas/i386.c:3705
  #5  0x0000555555881241 in get_num_insts (code=0x7fffa2000a80 <code_gen_buffer+33557075>, size=88) at /home/alex/lsrc/qemu.git/disas.c:575
  #6  0x00005555558d89f1 in tb_gen_code (cpu=0x555556a1f6a0, pc=18446743524230025728, cs_base=0, flags=2415924229, cflags=-16252928) at /home/alex/lsrc/qemu.git/accel/tcg/translate-all.c:1881
  #7  0x00005555558d4984 in tb_find (cpu=0x555556a1f6a0, last_tb=0x0, tb_exit=0, cf_mask=524288) at /home/alex/lsrc/qemu.git/accel/tcg/cpu-exec.c:409
  #8  0x00005555558d524f in cpu_exec (cpu=0x555556a1f6a0) at /home/alex/lsrc/qemu.git/accel/tcg/cpu-exec.c:731
  #9  0x0000555555884806 in tcg_cpu_exec (cpu=0x555556a1f6a0) at /home/alex/lsrc/qemu.git/cpus.c:1435
  #10 0x000055555588505c in qemu_tcg_cpu_thread_fn (arg=0x555556a1f6a0) at /home/alex/lsrc/qemu.git/cpus.c:1743
  #11 0x0000555556010875 in qemu_thread_start (args=0x555556a75450) at /home/alex/lsrc/qemu.git/util/qemu-thread-posix.c:502
  #12 0x00007ffff530bfa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
  #13 0x00007ffff523a4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95


>
> Collecting these statistics and information is useful to understand
> qemu performance and to help to add the support for traces to QEMU.
>
> vandersonmr (7):
>   accel: introducing TBStatistics structure
>   accel: collecting TB execution count
>   accel: collecting JIT statistics
>   accel: replacing part of CONFIG_PROFILER with TBStats
>   log: adding -d tb_stats to control tbstats
>   monitor: adding tb_stats hmp command
>   monitor: adding info tbs, tb, and coverset
>
>  accel/tcg/Makefile.objs      |   2 +-
>  accel/tcg/tb-stats.c         | 489 +++++++++++++++++++++++++++++++++++
>  accel/tcg/tcg-runtime.c      |   7 +
>  accel/tcg/tcg-runtime.h      |   2 +
>  accel/tcg/translate-all.c    |  93 ++++++-
>  accel/tcg/translator.c       |   6 +
>  disas.c                      | 108 ++++++++
>  hmp-commands-info.hx         |  23 ++
>  hmp-commands.hx              |  17 ++
>  include/disas/disas.h        |   1 +
>  include/exec/exec-all.h      |  15 +-
>  include/exec/gen-icount.h    |   9 +
>  include/exec/tb-context.h    |  12 +
>  include/exec/tb-hash.h       |   7 +
>  include/exec/tb-stats.h      | 113 ++++++++
>  include/qemu/log-for-trace.h |   2 +
>  include/qemu/log.h           |  16 ++
>  linux-user/exit.c            |   4 +
>  monitor/misc.c               | 111 ++++++++
>  tcg/tcg.c                    | 114 +++-----
>  tcg/tcg.h                    |  12 +-
>  util/log.c                   |  99 ++++++-
>  22 files changed, 1144 insertions(+), 118 deletions(-)
>  create mode 100644 accel/tcg/tb-stats.c
>  create mode 100644 include/exec/tb-stats.h


--
Alex Bennée


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v4 2/7] accel: collecting TB execution count
  2019-07-20  1:02 ` [Qemu-devel] [PATCH v4 2/7] accel: collecting TB execution count vandersonmr
@ 2019-07-26 13:38   ` Alex Bennée
  0 siblings, 0 replies; 18+ messages in thread
From: Alex Bennée @ 2019-07-26 13:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, vandersonmr, Richard Henderson


vandersonmr <vandersonmr2@gmail.com> writes:

> If a TB has a TBS (TBStatistics) with the TB_EXEC_STATS
> enabled, then we instrument the start code of the TB
> to atomically count the number of times it is executed.
> The execution count of the TB is stored in its respective
> TBS.
>
> Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
> ---
>  accel/tcg/tcg-runtime.c   |  7 +++++++
>  accel/tcg/tcg-runtime.h   |  2 ++
>  accel/tcg/translate-all.c |  8 ++++++++
>  accel/tcg/translator.c    |  1 +
>  include/exec/gen-icount.h |  9 +++++++++
>  include/exec/tb-stats.h   | 11 +++++++++++
>  include/qemu/log.h        |  6 ++++++
>  util/log.c                | 11 +++++++++++
>  8 files changed, 55 insertions(+)
>
> diff --git a/accel/tcg/tcg-runtime.c b/accel/tcg/tcg-runtime.c
> index 8a1e408e31..f332eae334 100644
> --- a/accel/tcg/tcg-runtime.c
> +++ b/accel/tcg/tcg-runtime.c
> @@ -167,3 +167,10 @@ void HELPER(exit_atomic)(CPUArchState *env)
>  {
>      cpu_loop_exit_atomic(env_cpu(env), GETPC());
>  }
> +
> +void HELPER(inc_exec_freq)(void *ptr)
> +{
> +    TBStatistics *stats = (TBStatistics *) ptr;
> +    g_assert(stats);
> +    atomic_inc(&stats->executions.total);
> +}
> diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h
> index 4fa61b49b4..bf0b75dbe8 100644
> --- a/accel/tcg/tcg-runtime.h
> +++ b/accel/tcg/tcg-runtime.h
> @@ -28,6 +28,8 @@ DEF_HELPER_FLAGS_1(lookup_tb_ptr, TCG_CALL_NO_WG_SE, ptr, env)
>
>  DEF_HELPER_FLAGS_1(exit_atomic, TCG_CALL_NO_WG, noreturn, env)
>
> +DEF_HELPER_FLAGS_1(inc_exec_freq, TCG_CALL_NO_RWG, void, ptr)
> +
>  #ifdef CONFIG_SOFTMMU
>
>  DEF_HELPER_FLAGS_5(atomic_cmpxchgb, TCG_CALL_NO_WG,
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index a574890a80..7497dae508 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -1785,6 +1785,14 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>       */
>      if (tb_stats_collection_enabled()) {
>          tb->tb_stats = tb_get_stats(phys_pc, pc, cs_base, flags);
> +        uint32_t flag = get_default_tbstats_flag();
> +
> +        if (qemu_log_in_addr_range(tb->pc)) {
> +            if (flag & TB_EXEC_STATS) {
> +                tb->tb_stats->stats_enabled |= TB_EXEC_STATS;
> +            }
> +        }
> +
>      } else {
>          tb->tb_stats = NULL;
>      }
> diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
> index 9226a348a3..396a11e828 100644
> --- a/accel/tcg/translator.c
> +++ b/accel/tcg/translator.c
> @@ -46,6 +46,7 @@ void translator_loop(const TranslatorOps *ops, DisasContextBase *db,
>
>      ops->init_disas_context(db, cpu);
>      tcg_debug_assert(db->is_jmp == DISAS_NEXT);  /* no early exit */
> +    gen_tb_exec_count(tb);
>
>      /* Reset the temp count so that we can identify leaks */
>      tcg_clear_temp_count();
> diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h
> index f7669b6841..b3efe41894 100644
> --- a/include/exec/gen-icount.h
> +++ b/include/exec/gen-icount.h
> @@ -7,6 +7,15 @@
>
>  static TCGOp *icount_start_insn;
>
> +static inline void gen_tb_exec_count(TranslationBlock *tb)
> +{
> +    if (tb_stats_enabled(tb, TB_EXEC_STATS)) {
> +        TCGv_ptr ptr = tcg_const_ptr(tb->tb_stats);
> +        gen_helper_inc_exec_freq(ptr);
> +        tcg_temp_free_ptr(ptr);
> +    }
> +}
> +
>  static inline void gen_tb_start(TranslationBlock *tb)
>  {
>      TCGv_i32 count, imm;
> diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
> index 0913155ec3..ee1e8de0d3 100644
> --- a/include/exec/tb-stats.h
> +++ b/include/exec/tb-stats.h
> @@ -6,6 +6,9 @@
>  #include "exec/tb-context.h"
>  #include "tcg.h"
>
> +#define tb_stats_enabled(tb, JIT_STATS) \
> +    (tb && tb->tb_stats && (tb->tb_stats->stats_enabled & JIT_STATS))
> +
>  typedef struct TBStatistics TBStatistics;
>
>  /*
> @@ -22,6 +25,14 @@ struct TBStatistics {
>      uint32_t     flags;
>      /* cs_base isn't included in the hash but we do check for matches */
>      target_ulong cs_base;
> +
> +    uint32_t stats_enabled;
> +
> +    /* Execution stats */
> +    struct {
> +        unsigned long total;
> +        unsigned long atomic;

We are not incrementing atomic in this patch. Also it's not total so
maybe "normal" makes more sense. Something like:

fixup! accel: collecting TB execution count

4 files changed, 11 insertions(+), 6 deletions(-)
accel/tcg/cpu-exec.c    | 4 ++++
accel/tcg/tb-stats.c    | 9 +++++----
accel/tcg/tcg-runtime.c | 2 +-
include/exec/tb-stats.h | 2 +-

modified   accel/tcg/cpu-exec.c
@@ -252,6 +252,10 @@ void cpu_exec_step_atomic(CPUState *cpu)

         start_exclusive();

+        if (tb_stats_enabled(tb, TB_EXEC_STATS)) {
+            tb->tb_stats->executions.atomic++;
+        }
+
         /* Since we got here, we know that parallel_cpus must be true.  */
         parallel_cpus = false;
         in_exclusive_region = true;
modified   accel/tcg/tb-stats.c
@@ -233,11 +233,12 @@ static void dump_tb_header(TBStatistics *tbs)
     float guest_host_prop = g ? ((float) h / g) : 0;

     qemu_log("TB%d: phys:0x"TB_PAGE_ADDR_FMT" virt:0x"TARGET_FMT_lx
-             " flags:%#08x (trans:%lu uncached:%lu exec:%lu ints: g:%u op:%u op_opt:%u h:%u h/g:%.2f spills:%d)\n",
+             " flags:%#08x (trans:%lu uncached:%lu exec:%lu/%lu ints: g:%u op:%u op_opt:%u h:%u h/g:%.2f spills:%d)\n",
              tbs->display_id,
              tbs->phys_pc, tbs->pc, tbs->flags,
              tbs->translations.total, tbs->translations.uncached,
-             tbs->executions.total, g, ops, ops_opt, h, guest_host_prop,
+             tbs->executions.normal, tbs->executions.atomic,
+             g, ops, ops_opt, h, guest_host_prop,
              spills);
 }

@@ -254,8 +255,8 @@ inverse_sort_tbs(gconstpointer p1, gconstpointer p2, gpointer psort_by)
         c1 = tbs1->code.spills;
         c2 = tbs2->code.spills;
     } else if (likely(sort_by == SORT_BY_HOTNESS)) {
-        c1 = tbs1->executions.total;
-        c2 = tbs2->executions.total;
+        c1 = tbs1->executions.normal;
+        c2 = tbs2->executions.normal;
     } else if (likely(sort_by == SORT_BY_HG)) {
         if (tbs1->code.num_guest_inst == 0) {
             return -1;
modified   accel/tcg/tcg-runtime.c
@@ -172,5 +172,5 @@ void HELPER(inc_exec_freq)(void *ptr)
 {
     TBStatistics *stats = (TBStatistics *) ptr;
     g_assert(stats);
-    atomic_inc(&stats->executions.total);
+    atomic_inc(&stats->executions.normal);
 }
modified   include/exec/tb-stats.h
@@ -33,7 +33,7 @@ struct TBStatistics {

     /* Execution stats */
     struct {
-        unsigned long total;
+        unsigned long normal;
         unsigned long atomic;
     } executions;

> diff --git a/include/qemu/log.h b/include/qemu/log.h
> index e175d4d5d0..b213411836 100644
> --- a/include/qemu/log.h
> +++ b/include/qemu/log.h
> @@ -129,10 +129,16 @@ void qemu_log_flush(void);
>  /* Close the log file */
>  void qemu_log_close(void);
>
> +#define TB_NOTHING    0
> +#define TB_EXEC_STATS (1 << 1)
> +
>  void enable_collect_tb_stats(void);
>  void disable_collect_tb_stats(void);
>  void pause_collect_tb_stats(void);
>  bool tb_stats_collection_enabled(void);
>  bool tb_stats_collection_paused(void);
>
> +void set_default_tbstats_flag(uint32_t flag);
> +uint32_t get_default_tbstats_flag(void);
> +
>  #endif
> diff --git a/util/log.c b/util/log.c
> index ab73fdc100..f81653d712 100644
> --- a/util/log.c
> +++ b/util/log.c
> @@ -354,3 +354,14 @@ bool tb_stats_collection_paused(void)
>      return tcg_collect_tb_stats == 2;
>  }
>
> +uint32_t default_tbstats_flag;
> +
> +void set_default_tbstats_flag(uint32_t flag)
> +{
> +    default_tbstats_flag = flag;
> +}
> +
> +uint32_t get_default_tbstats_flag(void)
> +{
> +    return default_tbstats_flag;
> +}

Some comment about not overloading log.c as before.

--
Alex Bennée


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v4 3/7] accel: collecting JIT statistics
  2019-07-20  1:02 ` [Qemu-devel] [PATCH v4 3/7] accel: collecting JIT statistics vandersonmr
@ 2019-07-26 14:46   ` Alex Bennée
  0 siblings, 0 replies; 18+ messages in thread
From: Alex Bennée @ 2019-07-26 14:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, vandersonmr, Richard Henderson


vandersonmr <vandersonmr2@gmail.com> writes:

> If a TB has a TBS (TBStatistics) with the TB_JIT_STATS
> enabled then we collect statistics of its translation
> processes and code translation. To collect the number
> of host instructions we used a modified version of the
> disas function to pass through the whole code without
> printing anything (fake_fprintf) but counting the number
> of instructions.
>
> Signed-off-by: vandersonmr <vandersonmr2@gmail.com>
> ---
>  accel/tcg/translate-all.c |  18 +++++++
>  accel/tcg/translator.c    |   5 ++
>  disas.c                   | 108 ++++++++++++++++++++++++++++++++++++++
>  include/disas/disas.h     |   1 +
>  include/exec/tb-stats.h   |  14 +++++
>  include/qemu/log.h        |   1 +
>  tcg/tcg.c                 |  23 ++++++++
>  tcg/tcg.h                 |   2 +
>  8 files changed, 172 insertions(+)
>
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index 7497dae508..3a47ac6f2c 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -1793,6 +1793,10 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>              }
>          }
>
> +        if (flag & TB_JIT_STATS) {
> +            tb->tb_stats->stats_enabled |= TB_JIT_STATS;
> +            atomic_inc(&tb->tb_stats->translations.total);
> +        }
>      } else {
>          tb->tb_stats = NULL;
>      }
> @@ -1870,6 +1874,17 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>      atomic_set(&prof->search_out_len, prof->search_out_len + search_size);
>  #endif
>
> +    if (tb_stats_enabled(tb, TB_JIT_STATS)) {
> +        size_t code_size = gen_code_size;
> +        if (tcg_ctx->data_gen_ptr) {
> +            code_size = tcg_ctx->data_gen_ptr - tb->tc.ptr;
> +        }
> +
> +        atomic_add(&tb->tb_stats->code.num_host_inst,
> +                    get_num_insts(tb->tc.ptr, code_size));

This is what is causing my crashes. I think invoking the disassembler stuff is
too clunky here. Maybe we should just change the counter to
num_host_insn_byte, the ratio will still be interesting but it saves the
complication of re-counting.

Ideally the we'd like the core tcg code to tell us how many host
instructions it emitted for each tcg_op but I suspect that involves
heavier surgery.

> +    }
> +
> +
>  #ifdef DEBUG_DISAS
>      if (qemu_loglevel_mask(CPU_LOG_TB_OUT_ASM) &&
>          qemu_log_in_addr_range(tb->pc)) {
> @@ -1927,6 +1942,9 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>      phys_page2 = -1;
>      if ((pc & TARGET_PAGE_MASK) != virt_page2) {
>          phys_page2 = get_page_addr_code(env, virt_page2);
> +        if (tb_stats_enabled(tb, TB_JIT_STATS)) {
> +            atomic_inc(&tb->tb_stats->translations.spanning);
> +        }
>      }
>      /*
>       * No explicit memory barrier is required -- tb_link_page() makes the
> diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
> index 396a11e828..03c00bdb1b 100644
> --- a/accel/tcg/translator.c
> +++ b/accel/tcg/translator.c
> @@ -117,6 +117,11 @@ void translator_loop(const TranslatorOps *ops, DisasContextBase *db,
>      db->tb->size = db->pc_next - db->pc_first;
>      db->tb->icount = db->num_insns;
>
> +    if (tb_stats_enabled(tb, TB_JIT_STATS)) {
> +        atomic_add(&db->tb->tb_stats->code.num_guest_inst, db->num_insns);
> +    }
> +
> +
>  #ifdef DEBUG_DISAS
>      if (qemu_loglevel_mask(CPU_LOG_TB_IN_ASM)
>          && qemu_log_in_addr_range(db->pc_first)) {
> diff --git a/disas.c b/disas.c
> index 3e2bfa572b..5dec754992 100644
> --- a/disas.c
> +++ b/disas.c
> @@ -475,6 +475,114 @@ void target_disas(FILE *out, CPUState *cpu, target_ulong code,
>      }
>  }
>
> +static int fprintf_fake(struct _IO_FILE *a, const char *b, ...)
> +{
> +    return 1;
> +}
> +
> +/*
> + * This is a work around to get the number of host instructions with
> + * a small effort. It reuses the disas function with a fake printf to
> + * print nothing but count the number of instructions.
> + *
> + */
> +unsigned get_num_insts(void *code, unsigned long size)
> +{
> +    uintptr_t pc;
> +    int count;
> +    CPUDebug s;
> +    int (*print_insn)(bfd_vma pc, disassemble_info *info) = NULL;
> +
> +    INIT_DISASSEMBLE_INFO(s.info, NULL, fprintf_fake);
> +    s.info.print_address_func = generic_print_host_address;
> +
> +    s.info.buffer = code;
> +    s.info.buffer_vma = (uintptr_t)code;
> +    s.info.buffer_length = size;
> +    s.info.cap_arch = -1;
> +    s.info.cap_mode = 0;
> +    s.info.cap_insn_unit = 4;
> +    s.info.cap_insn_split = 4;
> +
> +#ifdef HOST_WORDS_BIGENDIAN
> +    s.info.endian = BFD_ENDIAN_BIG;
> +#else
> +    s.info.endian = BFD_ENDIAN_LITTLE;
> +#endif
> +#if defined(CONFIG_TCG_INTERPRETER)
> +    print_insn = print_insn_tci;
> +#elif defined(__i386__)
> +    s.info.mach = bfd_mach_i386_i386;
> +    print_insn = print_insn_i386;
> +    s.info.cap_arch = CS_ARCH_X86;
> +    s.info.cap_mode = CS_MODE_32;
> +    s.info.cap_insn_unit = 1;
> +    s.info.cap_insn_split = 8;
> +#elif defined(__x86_64__)
> +    s.info.mach = bfd_mach_x86_64;
> +    print_insn = print_insn_i386;
> +    s.info.cap_arch = CS_ARCH_X86;
> +    s.info.cap_mode = CS_MODE_64;
> +    s.info.cap_insn_unit = 1;
> +    s.info.cap_insn_split = 8;
> +#elif defined(_ARCH_PPC)
> +    s.info.disassembler_options = (char *)"any";
> +    print_insn = print_insn_ppc;
> +    s.info.cap_arch = CS_ARCH_PPC;
> +# ifdef _ARCH_PPC64
> +    s.info.cap_mode = CS_MODE_64;
> +# endif
> +#elif defined(__riscv) && defined(CONFIG_RISCV_DIS)
> +#if defined(_ILP32) || (__riscv_xlen == 32)
> +    print_insn = print_insn_riscv32;
> +#elif defined(_LP64)
> +    print_insn = print_insn_riscv64;
> +#else
> +#error unsupported RISC-V ABI
> +#endif
> +#elif defined(__aarch64__) && defined(CONFIG_ARM_A64_DIS)
> +    print_insn = print_insn_arm_a64;
> +    s.info.cap_arch = CS_ARCH_ARM64;
> +#elif defined(__alpha__)
> +    print_insn = print_insn_alpha;
> +#elif defined(__sparc__)
> +    print_insn = print_insn_sparc;
> +    s.info.mach = bfd_mach_sparc_v9b;
> +#elif defined(__arm__)
> +    print_insn = print_insn_arm;
> +    s.info.cap_arch = CS_ARCH_ARM;
> +    /* TCG only generates code for arm mode.  */
> +#elif defined(__MIPSEB__)
> +    print_insn = print_insn_big_mips;
> +#elif defined(__MIPSEL__)
> +    print_insn = print_insn_little_mips;
> +#elif defined(__m68k__)
> +    print_insn = print_insn_m68k;
> +#elif defined(__s390__)
> +    print_insn = print_insn_s390;
> +#elif defined(__hppa__)
> +    print_insn = print_insn_hppa;
> +#endif
> +
> +    if (print_insn == NULL) {
> +        print_insn = print_insn_od_host;
> +    }
> +
> +    s.info.fprintf_func = fprintf_fake;
> +    unsigned num_insts = 0;
> +    for (pc = (uintptr_t)code; size > 0; pc += count, size -= count) {
> +        num_insts++;
> +        count = print_insn(pc, &s.info);
> +        if (count < 0) {
> +            break;
> +        }
> +    }
> +    return num_insts;
> +}
> +
> +
> +
> +
>  /* Disassemble this for me please... (debugging). */
>  void disas(FILE *out, void *code, unsigned long size)
>  {
> diff --git a/include/disas/disas.h b/include/disas/disas.h
> index 15da511f49..9797ae7cfa 100644
> --- a/include/disas/disas.h
> +++ b/include/disas/disas.h
> @@ -7,6 +7,7 @@
>
>  /* Disassemble this for me please... (debugging). */
>  void disas(FILE *out, void *code, unsigned long size);
> +unsigned get_num_insts(void *code, unsigned long size);
>  void target_disas(FILE *out, CPUState *cpu, target_ulong code,
>                    target_ulong size);
>
> diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
> index ee1e8de0d3..6584b78c4a 100644
> --- a/include/exec/tb-stats.h
> +++ b/include/exec/tb-stats.h
> @@ -33,6 +33,20 @@ struct TBStatistics {
>          unsigned long total;
>          unsigned long atomic;
>      } executions;
> +
> +    struct {
> +        unsigned num_guest_inst;
> +        unsigned num_host_inst;
> +        unsigned num_tcg_ops;
> +        unsigned num_tcg_ops_opt;
> +        unsigned spills;
> +    } code;
> +
> +    struct {
> +        unsigned long total;
> +        unsigned long uncached;
> +        unsigned long spanning;
> +    } translations;
>  };
>
>  bool tb_stats_cmp(const void *ap, const void *bp);
> diff --git a/include/qemu/log.h b/include/qemu/log.h
> index b213411836..8cdfc268ca 100644
> --- a/include/qemu/log.h
> +++ b/include/qemu/log.h
> @@ -131,6 +131,7 @@ void qemu_log_close(void);
>
>  #define TB_NOTHING    0
>  #define TB_EXEC_STATS (1 << 1)
> +#define TB_JIT_STATS  (1 << 2)
>
>  void enable_collect_tb_stats(void);
>  void disable_collect_tb_stats(void);
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index be2c33c400..446e3d1708 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -3126,6 +3126,11 @@ static void temp_sync(TCGContext *s, TCGTemp *ts, TCGRegSet allocated_regs,
>          case TEMP_VAL_REG:
>              tcg_out_st(s, ts->type, ts->reg,
>                         ts->mem_base->reg, ts->mem_offset);
> +
> +            /* Count number of spills */
> +            if (tb_stats_enabled(s->current_tb, TB_JIT_STATS)) {
> +                atomic_inc(&s->current_tb->tb_stats->code.spills);
> +            }
>              break;
>
>          case TEMP_VAL_MEM:
> @@ -3997,6 +4002,8 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
>      int i, num_insns;
>      TCGOp *op;
>
> +    s->current_tb = tb;
> +
>  #ifdef CONFIG_PROFILER
>      {
>          int n = 0;
> @@ -4028,6 +4035,14 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
>      }
>  #endif
>
> +    if (tb_stats_enabled(tb, TB_JIT_STATS)) {
> +        int n = 0;
> +        QTAILQ_FOREACH(op, &s->ops, link) {
> +            n++;
> +        }
> +        atomic_add(&tb->tb_stats->code.num_tcg_ops, n);
> +    }
> +
>  #ifdef CONFIG_DEBUG_TCG
>      /* Ensure all labels referenced have been emitted.  */
>      {
> @@ -4094,6 +4109,14 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
>      }
>  #endif
>
> +    if (tb_stats_enabled(tb, TB_JIT_STATS)) {
> +        int n = 0;
> +        QTAILQ_FOREACH(op, &s->ops, link) {
> +            n++;
> +        }
> +        atomic_add(&tb->tb_stats->code.num_tcg_ops_opt, n);
> +    }
> +
>      tcg_reg_alloc_start(s);
>
>      s->code_buf = tb->tc.ptr;
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index b411e17a28..bf6f3bcba3 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -738,6 +738,8 @@ struct TCGContext {
>
>      uint16_t gen_insn_end_off[TCG_MAX_INSNS];
>      target_ulong gen_insn_data[TCG_MAX_INSNS][TARGET_INSN_START_WORDS];
> +
> +    TranslationBlock *current_tb;
>  };
>
>  extern TCGContext tcg_init_ctx;

Without the disas hack:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

--
Alex Bennée


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v4 4/7] accel: replacing part of CONFIG_PROFILER with TBStats
  2019-07-20  1:02 ` [Qemu-devel] [PATCH v4 4/7] accel: replacing part of CONFIG_PROFILER with TBStats vandersonmr
@ 2019-07-26 15:25   ` Alex Bennée
  0 siblings, 0 replies; 18+ messages in thread
From: Alex Bennée @ 2019-07-26 15:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, vandersonmr, Richard Henderson


vandersonmr <vandersonmr2@gmail.com> writes:

> We add some of the statistics collected in the TCGProfiler
> into the TBStats, having the statistics not only for the whole
> emulation but for each TB. Then, we removed these stats
> from TCGProfiler and reconstruct the information for the
> "info jit" using the sum of all TBStats statistics.
>
> The goal is to have one unique and better way of collecting
> emulation statistics. Moreover, checking dynamiclly if the
> profiling is enabled showed to have an insignificant impact
> on the performance:
> https://wiki.qemu.org/Internships/ProjectIdeas/TCGCodeQuality#Overheads.
>
> Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
> ---
>  accel/tcg/Makefile.objs   |   2 +-
>  accel/tcg/tb-stats.c      | 107 ++++++++++++++++++++++++++++++++++++++
>  accel/tcg/translate-all.c |  10 ++--
>  include/exec/tb-stats.h   |   9 ++++
>  tcg/tcg.c                 |  93 ++++-----------------------------
>  tcg/tcg.h                 |  10 ----
>  6 files changed, 131 insertions(+), 100 deletions(-)
>  create mode 100644 accel/tcg/tb-stats.c
>
> diff --git a/accel/tcg/Makefile.objs b/accel/tcg/Makefile.objs
> index d381a02f34..49ffe81b5d 100644
> --- a/accel/tcg/Makefile.objs
> +++ b/accel/tcg/Makefile.objs
> @@ -2,7 +2,7 @@ obj-$(CONFIG_SOFTMMU) += tcg-all.o
>  obj-$(CONFIG_SOFTMMU) += cputlb.o
>  obj-y += tcg-runtime.o tcg-runtime-gvec.o
>  obj-y += cpu-exec.o cpu-exec-common.o translate-all.o
> -obj-y += translator.o
> +obj-y += translator.o tb-stats.o
>
>  obj-$(CONFIG_USER_ONLY) += user-exec.o
>  obj-$(call lnot,$(CONFIG_SOFTMMU)) += user-exec-stub.o
> diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
> new file mode 100644
> index 0000000000..44497d4f9b
> --- /dev/null
> +++ b/accel/tcg/tb-stats.c
> @@ -0,0 +1,107 @@
> +#include "qemu/osdep.h"
> +
> +#include "disas/disas.h"
> +#include "exec/exec-all.h"
> +#include "tcg.h"
> +
> +#include "qemu/qemu-print.h"
> +
> +struct jit_profile_info {
> +    uint64_t translations;
> +    uint64_t aborted;
> +    uint64_t ops;
> +    unsigned ops_max;
> +    uint64_t del_ops;
> +    uint64_t temps;
> +    unsigned temps_max;
> +    uint64_t host;
> +    uint64_t guest;
> +    uint64_t host_ins;
> +    uint64_t search_data;
> +};
> +
> +/* accumulate the statistics from all TBs */
> +static void collect_jit_profile_info(void *p, uint32_t hash, void *userp)
> +{
> +    struct jit_profile_info *jpi = userp;
> +    TBStatistics *tbs = p;
> +
> +    jpi->translations += tbs->translations.total;
> +    jpi->ops += tbs->code.num_tcg_ops;
> +    if (tbs->translations.total && tbs->code.num_tcg_ops / tbs->translations.total
> +            > jpi->ops_max) {
> +        jpi->ops_max = tbs->code.num_tcg_ops / tbs->translations.total;
> +    }
> +    jpi->del_ops += tbs->code.deleted_ops;
> +    jpi->temps += tbs->code.temps;
> +    if (tbs->translations.total && tbs->code.temps / tbs->translations.total >
> +            jpi->temps_max) {
> +        jpi->temps_max = tbs->code.temps / tbs->translations.total;
> +    }
> +    jpi->host += tbs->code.out_len;
> +    jpi->guest += tbs->code.in_len;
> +    jpi->host_ins += tbs->code.num_host_inst;
> +    jpi->search_data += tbs->code.search_out_len;
> +}
> +
> +/* dump JIT statisticis using TCGProfile and TBStats */
> +void dump_jit_profile_info(TCGProfile *s)
> +{
> +    if (!tb_stats_collection_enabled()) {
> +        return;
> +    }
> +
> +    struct jit_profile_info *jpi = g_new0(struct jit_profile_info,
>  1);

This leaks and needs to be freed after we are done.

> +
> +    qht_iter(&tb_ctx.tb_stats, collect_jit_profile_info, jpi);
> +
> +    if (jpi->translations) {
> +        qemu_printf("translated TBs      %" PRId64 "\n", jpi->translations);
> +        qemu_printf("avg ops/TB          %0.1f max=%d\n",
> +                jpi->ops / (double) jpi->translations, jpi->ops_max);
> +        qemu_printf("deleted ops/TB      %0.2f\n",
> +                jpi->del_ops / (double) jpi->translations);
> +        qemu_printf("avg temps/TB        %0.2f max=%d\n",
> +                jpi->temps / (double) jpi->translations, jpi->temps_max);
> +        qemu_printf("avg host code/TB    %0.1f\n",
> +                jpi->host / (double) jpi->translations);
> +        qemu_printf("avg host ins/TB     %0.1f\n",
> +                jpi->host_ins / (double) jpi->translations);
> +        qemu_printf("avg search data/TB  %0.1f\n",
> +                jpi->search_data / (double) jpi->translations);
> +
> +        if (s) {
> +            int64_t tot = s->interm_time + s->code_time;
> +            qemu_printf("JIT cycles          %" PRId64 " (%0.3f s at 2.4 GHz)\n",
> +                            tot, tot / 2.4e9);
> +            qemu_printf("cycles/op           %0.1f\n",
> +                        jpi->ops ? (double)tot / jpi->ops : 0);
> +            qemu_printf("cycles/in byte      %0.1f\n",
> +                        jpi->guest ? (double)tot / jpi->guest : 0);
> +            qemu_printf("cycles/out byte     %0.1f\n",
> +                        jpi->host ? (double)tot / jpi->host : 0);
> +            qemu_printf("cycles/out inst     %0.1f\n",
> +                        jpi->host_ins ? (double)tot / jpi->host_ins : 0);
> +            qemu_printf("cycles/search byte     %0.1f\n",
> +                        jpi->search_data ? (double)tot / jpi->search_data : 0);
> +            if (tot == 0) {
> +                tot = 1;
> +            }
> +            qemu_printf("  gen_interm time   %0.1f%%\n",
> +                        (double)s->interm_time / tot * 100.0);
> +            qemu_printf("  gen_code time     %0.1f%%\n",
> +                        (double)s->code_time / tot * 100.0);
> +            qemu_printf("optim./code time    %0.1f%%\n",
> +                        (double)s->opt_time / (s->code_time ? s->code_time : 1)
> +                        * 100.0);
> +            qemu_printf("liveness/code time  %0.1f%%\n",
> +                    (double)s->la_time / (s->code_time ? s->code_time : 1) * 100.0);
> +            qemu_printf("cpu_restore count   %" PRId64 "\n",
> +                    s->restore_count);
> +            qemu_printf("  avg cycles        %0.1f\n",
> +                    s->restore_count ? (double)s->restore_time / s->restore_count : 0);
> +        }
> +    }
> +}
> +
> +
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index 3a47ac6f2c..bbdb0f23a4 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -1773,8 +1773,6 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>   tb_overflow:
>
>  #ifdef CONFIG_PROFILER
> -    /* includes aborted translations because of exceptions */
> -    atomic_set(&prof->tb_count1, prof->tb_count1 + 1);
>      ti = profile_getclock();
>  #endif

I assume from the conversation on IRC we can add the profile clocks here
as well. I wonder if execution time could also be added (under a new
tbstats flag)? The data might be too noisy given the time to collect
timestamps vs the relatively short block length.

>
> @@ -1822,7 +1820,6 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>      }
>
>  #ifdef CONFIG_PROFILER
> -    atomic_set(&prof->tb_count, prof->tb_count + 1);
>      atomic_set(&prof->interm_time, prof->interm_time + profile_getclock() - ti);
>      ti = profile_getclock();
>  #endif
> @@ -1869,12 +1866,13 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>
>  #ifdef CONFIG_PROFILER
>      atomic_set(&prof->code_time, prof->code_time + profile_getclock() - ti);
> -    atomic_set(&prof->code_in_len, prof->code_in_len + tb->size);
> -    atomic_set(&prof->code_out_len, prof->code_out_len + gen_code_size);
> -    atomic_set(&prof->search_out_len, prof->search_out_len + search_size);
>  #endif
>
>      if (tb_stats_enabled(tb, TB_JIT_STATS)) {
> +        atomic_add(&tb->tb_stats->code.in_len, tb->size);
> +        atomic_add(&tb->tb_stats->code.out_len, gen_code_size);

If we are switching num_host_insns to num_host_bytes we don't need a
duplicate here.

> +        atomic_add(&tb->tb_stats->code.search_out_len, search_size);
> +
>          size_t code_size = gen_code_size;
>          if (tcg_ctx->data_gen_ptr) {
>              code_size = tcg_ctx->data_gen_ptr - tb->tc.ptr;
> diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
> index 6584b78c4a..7d775f2c0d 100644
> --- a/include/exec/tb-stats.h
> +++ b/include/exec/tb-stats.h
> @@ -40,6 +40,13 @@ struct TBStatistics {
>          unsigned num_tcg_ops;
>          unsigned num_tcg_ops_opt;
>          unsigned spills;
> +
> +        /* CONFIG_PROFILE */
> +        unsigned temps;
> +        unsigned deleted_ops;
> +        unsigned in_len;
> +        unsigned out_len;
> +        unsigned search_out_len;
>      } code;
>
>      struct {
> @@ -51,4 +58,6 @@ struct TBStatistics {
>
>  bool tb_stats_cmp(const void *ap, const void *bp);
>
> +void dump_jit_profile_info(TCGProfile *s);
> +
>  #endif
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index 446e3d1708..46b31a2f68 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -2295,9 +2295,9 @@ void tcg_op_remove(TCGContext *s, TCGOp *op)
>      QTAILQ_INSERT_TAIL(&s->free_ops, op, link);
>      s->nb_ops--;
>
> -#ifdef CONFIG_PROFILER
> -    atomic_set(&s->prof.del_op_count, s->prof.del_op_count + 1);
> -#endif
> +    if (tb_stats_enabled(s->current_tb, TB_JIT_STATS)) {
> +        atomic_inc(&s->current_tb->tb_stats->code.deleted_ops);
> +    }
>  }
>
>  static TCGOp *tcg_op_alloc(TCGOpcode opc)
> @@ -3914,16 +3914,6 @@ void tcg_profile_snapshot(TCGProfile *prof, bool counters, bool table)
>
>          if (counters) {
>              PROF_ADD(prof, orig, cpu_exec_time);
> -            PROF_ADD(prof, orig, tb_count1);
> -            PROF_ADD(prof, orig, tb_count);
> -            PROF_ADD(prof, orig, op_count);
> -            PROF_MAX(prof, orig, op_count_max);
> -            PROF_ADD(prof, orig, temp_count);
> -            PROF_MAX(prof, orig, temp_count_max);
> -            PROF_ADD(prof, orig, del_op_count);
> -            PROF_ADD(prof, orig, code_in_len);
> -            PROF_ADD(prof, orig, code_out_len);
> -            PROF_ADD(prof, orig, search_out_len);
>              PROF_ADD(prof, orig, interm_time);
>              PROF_ADD(prof, orig, code_time);
>              PROF_ADD(prof, orig, la_time);
> @@ -4003,26 +3993,17 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
>      TCGOp *op;
>
>      s->current_tb = tb;
> -
> -#ifdef CONFIG_PROFILER
> -    {
> +    if (tb_stats_enabled(tb, TB_JIT_STATS)) {
>          int n = 0;
>
>          QTAILQ_FOREACH(op, &s->ops, link) {
>              n++;
>          }
> -        atomic_set(&prof->op_count, prof->op_count + n);
> -        if (n > prof->op_count_max) {
> -            atomic_set(&prof->op_count_max, n);
> -        }
> +        atomic_add(&tb->tb_stats->code.num_tcg_ops, n);
>
>          n = s->nb_temps;
> -        atomic_set(&prof->temp_count, prof->temp_count + n);
> -        if (n > prof->temp_count_max) {
> -            atomic_set(&prof->temp_count_max, n);
> -        }
> +        atomic_add(&tb->tb_stats->code.temps, n);
>      }
> -#endif
>
>  #ifdef DEBUG_DISAS
>      if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP)
> @@ -4229,70 +4210,16 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
>      return tcg_current_code_size(s);
>  }
>
> -#ifdef CONFIG_PROFILER
>  void tcg_dump_info(void)
>  {
> +    TCGProfile *s = NULL;
> +#ifdef CONFIG_PROFILER
>      TCGProfile prof = {};
> -    const TCGProfile *s;
> -    int64_t tb_count;
> -    int64_t tb_div_count;
> -    int64_t tot;
> -
>      tcg_profile_snapshot_counters(&prof);
>      s = &prof;
> -    tb_count = s->tb_count;
> -    tb_div_count = tb_count ? tb_count : 1;
> -    tot = s->interm_time + s->code_time;
> -
> -    qemu_printf("JIT cycles          %" PRId64 " (%0.3f s at 2.4 GHz)\n",
> -                tot, tot / 2.4e9);
> -    qemu_printf("translated TBs      %" PRId64 " (aborted=%" PRId64
> -                " %0.1f%%)\n",
> -                tb_count, s->tb_count1 - tb_count,
> -                (double)(s->tb_count1 - s->tb_count)
> -                / (s->tb_count1 ? s->tb_count1 : 1) * 100.0);
> -    qemu_printf("avg ops/TB          %0.1f max=%d\n",
> -                (double)s->op_count / tb_div_count, s->op_count_max);
> -    qemu_printf("deleted ops/TB      %0.2f\n",
> -                (double)s->del_op_count / tb_div_count);
> -    qemu_printf("avg temps/TB        %0.2f max=%d\n",
> -                (double)s->temp_count / tb_div_count, s->temp_count_max);
> -    qemu_printf("avg host code/TB    %0.1f\n",
> -                (double)s->code_out_len / tb_div_count);
> -    qemu_printf("avg search data/TB  %0.1f\n",
> -                (double)s->search_out_len / tb_div_count);
> -
> -    qemu_printf("cycles/op           %0.1f\n",
> -                s->op_count ? (double)tot / s->op_count : 0);
> -    qemu_printf("cycles/in byte      %0.1f\n",
> -                s->code_in_len ? (double)tot / s->code_in_len : 0);
> -    qemu_printf("cycles/out byte     %0.1f\n",
> -                s->code_out_len ? (double)tot / s->code_out_len : 0);
> -    qemu_printf("cycles/search byte     %0.1f\n",
> -                s->search_out_len ? (double)tot / s->search_out_len : 0);
> -    if (tot == 0) {
> -        tot = 1;
> -    }
> -    qemu_printf("  gen_interm time   %0.1f%%\n",
> -                (double)s->interm_time / tot * 100.0);
> -    qemu_printf("  gen_code time     %0.1f%%\n",
> -                (double)s->code_time / tot * 100.0);
> -    qemu_printf("optim./code time    %0.1f%%\n",
> -                (double)s->opt_time / (s->code_time ? s->code_time : 1)
> -                * 100.0);
> -    qemu_printf("liveness/code time  %0.1f%%\n",
> -                (double)s->la_time / (s->code_time ? s->code_time : 1) * 100.0);
> -    qemu_printf("cpu_restore count   %" PRId64 "\n",
> -                s->restore_count);
> -    qemu_printf("  avg cycles        %0.1f\n",
> -                s->restore_count ? (double)s->restore_time / s->restore_count : 0);
> -}
> -#else
> -void tcg_dump_info(void)
> -{
> -    qemu_printf("[TCG profiler not compiled]\n");
> -}
>  #endif
> +    dump_jit_profile_info(s);
> +}
>
>  #ifdef ELF_HOST_MACHINE
>  /* In order to use this feature, the backend needs to do three things:
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index bf6f3bcba3..026a066b9a 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -644,16 +644,6 @@ QEMU_BUILD_BUG_ON(NB_OPS > (1 << 8));
>
>  typedef struct TCGProfile {
>      int64_t cpu_exec_time;
> -    int64_t tb_count1;
> -    int64_t tb_count;
> -    int64_t op_count; /* total insn count */
> -    int op_count_max; /* max insn per TB */
> -    int temp_count_max;
> -    int64_t temp_count;
> -    int64_t del_op_count;
> -    int64_t code_in_len;
> -    int64_t code_out_len;
> -    int64_t search_out_len;
>      int64_t interm_time;
>      int64_t code_time;
>      int64_t la_time;


--
Alex Bennée


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v4 5/7] log: adding -d tb_stats to control tbstats
  2019-07-20  1:02 ` [Qemu-devel] [PATCH v4 5/7] log: adding -d tb_stats to control tbstats vandersonmr
@ 2019-07-26 16:20   ` Alex Bennée
  0 siblings, 0 replies; 18+ messages in thread
From: Alex Bennée @ 2019-07-26 16:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: vandersonmr


vandersonmr <vandersonmr2@gmail.com> writes:

> Adding -d tb_stats:[limit:[all|jit|exec]] to control TBStatistics
> collection. "limit" is used to limit the number of TBStats in the
> linux-user dump. [all|jit|exec] control the profilling level used
> by the TBStats: all, only jit stats or only execution count stats.
>
> Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
> ---
>  include/qemu/log.h |  1 +
>  util/log.c         | 34 ++++++++++++++++++++++++++++++++++
>  2 files changed, 35 insertions(+)
>
> diff --git a/include/qemu/log.h b/include/qemu/log.h
> index 8cdfc268ca..93642603e5 100644
> --- a/include/qemu/log.h
> +++ b/include/qemu/log.h
> @@ -45,6 +45,7 @@ static inline bool qemu_log_separate(void)
>  /* LOG_TRACE (1 << 15) is defined in log-for-trace.h */
>  #define CPU_LOG_TB_OP_IND  (1 << 16)
>  #define CPU_LOG_TB_FPU     (1 << 17)
> +#define CPU_LOG_TB_STATS   (1 << 18)
>
>  /* Lock output for a series of related logs.  Since this is not needed
>   * for a single qemu_log / qemu_log_mask / qemu_log_mask_and_addr, we
> diff --git a/util/log.c b/util/log.c
> index f81653d712..8109d19afb 100644
> --- a/util/log.c
> +++ b/util/log.c
> @@ -30,6 +30,7 @@ FILE *qemu_logfile;
>  int qemu_loglevel;
>  static int log_append = 0;
>  static GArray *debug_regions;
> +int32_t max_num_hot_tbs_to_dump;
>
>  /* Return the number of characters emitted.  */
>  int qemu_log(const char *fmt, ...)
> @@ -273,6 +274,8 @@ const QEMULogItem qemu_log_items[] = {
>      { CPU_LOG_TB_NOCHAIN, "nochain",
>        "do not chain compiled TBs so that \"exec\" and \"cpu\" show\n"
>        "complete traces" },
> +    { CPU_LOG_TB_STATS, "tb_stats[:limit:(all,jit,exec)]",
> +      "show TBs statistics (until given a limit) ordered by their hotness.\n" },
>      { 0, NULL, NULL },
>  };
>
> @@ -294,6 +297,37 @@ int qemu_str_to_log_mask(const char *str)
>              trace_enable_events((*tmp) + 6);
>              mask |= LOG_TRACE;
>  #endif
> +        } else if (g_str_has_prefix(*tmp, "tb_stats")) {
> +            if (g_str_has_prefix(*tmp, "tb_stats:") && (*tmp)[9] != '\0') {
> +
> +                if (!g_ascii_isdigit(*((*tmp) + 9))) {
> +                    fprintf(stderr,
> +                            "should be a number follow by [all|jit|exec], as tb_stats:10:all\n");
> +                    exit(1);
> +                }
> +                /* get limit */
> +                max_num_hot_tbs_to_dump = atoi((*tmp) + 9);

We probably want to handle -d tb_stats:[all|jit|exec] as well because we
might be doing interactive exploration for softmmu. I'd suggest
splitting the number processing off to a new patch and merging the exit
processing part of the later patches with it. As you don't have a HMP
for linux-user you can make the number required for linux-user or only
optional for softmmu.

> +
> +                /* get profilling level */
> +                char *s = (*tmp) + 9;
> +                while (*s != '\0') {
> +                    if (*s++ == ':') {
> +                        break;
> +                    }
> +                }
> +                if (g_str_equal(s, "jit") == 0) {
> +                    set_default_tbstats_flag(TB_JIT_STATS);
> +                } else if (g_str_equal(s, "exec") == 0) {
> +                    set_default_tbstats_flag(TB_EXEC_STATS);
> +                } else {
> +                    set_default_tbstats_flag(TB_JIT_STATS | TB_EXEC_STATS);
> +                }

If additional stats types get added we shall want to be additive:

  -d host_tbs:jit,exec

> +            } else {
> +                set_default_tbstats_flag(TB_JIT_STATS | TB_EXEC_STATS);
> +            }
> +
> +            mask |= CPU_LOG_TB_STATS;
> +            enable_collect_tb_stats();
>          } else {
>              for (item = qemu_log_items; item->mask != 0; item++) {
>                  if (g_str_equal(*tmp, item->name)) {


--
Alex Bennée


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v4 6/7] monitor: adding tb_stats hmp command
  2019-07-20  1:02 ` [Qemu-devel] [PATCH v4 6/7] monitor: adding tb_stats hmp command vandersonmr
@ 2019-07-26 16:57   ` Alex Bennée
  0 siblings, 0 replies; 18+ messages in thread
From: Alex Bennée @ 2019-07-26 16:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Richard Henderson, vandersonmr,
	Dr. David Alan Gilbert, Markus Armbruster


vandersonmr <vandersonmr2@gmail.com> writes:

> Adding tb_stats [start|pause|stop|filter] command to hmp.
> This allows controlling the collection of statistics.
> It is also possible to set the level of collection:
> all, jit, or exec.
>
> The goal of this command is to allow the dynamic exploration
> of the TCG behavior and quality. Therefore, for now, a
> corresponding QMP command is not worthwhile.
>
> Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
> ---
>  accel/tcg/tb-stats.c    | 107 ++++++++++++++++++++++++++++++++++++++++
>  hmp-commands.hx         |  17 +++++++
>  include/exec/tb-stats.h |  13 +++++
>  include/qemu/log.h      |   1 +
>  monitor/misc.c          |  40 +++++++++++++++
>  5 files changed, 178 insertions(+)
>
> diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
> index 44497d4f9b..6c330e1b02 100644
> --- a/accel/tcg/tb-stats.c
> +++ b/accel/tcg/tb-stats.c
> @@ -6,6 +6,9 @@
>
>  #include "qemu/qemu-print.h"
>
> +/* only accessed in safe work */
> +static GList *last_search;
> +
>  struct jit_profile_info {
>      uint64_t translations;
>      uint64_t aborted;
> @@ -104,4 +107,108 @@ void dump_jit_profile_info(TCGProfile *s)
>      }
>  }
>
> +static void dessaloc_tbstats(void *p, uint32_t hash, void *userp)
> +{
> +    g_free(p);
> +}

surely free_tbstats would be a better name?

> +
> +void clean_tbstats(void)
> +{
> +    /* remove all tb_stats */
> +    qht_iter(&tb_ctx.tb_stats, dessaloc_tbstats, NULL);
> +    qht_destroy(&tb_ctx.tb_stats);
> +}
> +
> +static void reset_tbstats_flag(void *p, uint32_t hash, void *userp)
> +{
> +    uint32_t flag = *((int *)userp);
> +    TBStatistics *tbs = p;
> +    tbs->stats_enabled = flag;
> +}
> +
> +void set_tbstats_flags(uint32_t flag)
> +{
> +    /* iterate over tbstats setting their flag as TB_NOTHING */
> +    qht_iter(&tb_ctx.tb_stats, reset_tbstats_flag, &flag);
> +}
> +
> +void do_hmp_tbstats_safe(CPUState *cpu, run_on_cpu_data icmd)
> +{
> +    struct TbstatsCommand *cmdinfo = icmd.host_ptr;
> +    int cmd = cmdinfo->cmd;
> +    uint32_t level = cmdinfo->level;
> +
> +    /* for safe, always pause TB generation while running this commands */
> +    mmap_lock();

This doesn't serialise in all cases - in fact it's running as safe work
which is our serial case. The mmap_lock is really only needed for code
generation in linux-user. I think the lock/unlock can be dropped from
these operations.

> +
> +    switch (cmd) {
> +    case START:
> +        if (tb_stats_collection_paused()) {
> +            set_tbstats_flags(level);
> +        } else {
> +            if (tb_stats_collection_enabled()) {
> +                qemu_printf("TB information already being recorded");
> +                return;
> +            }
> +            qht_init(&tb_ctx.tb_stats, tb_stats_cmp, CODE_GEN_HTABLE_SIZE,
> +                        QHT_MODE_AUTO_RESIZE);

I'm wary of initialising this in multiple places. Maybe move this to
enable_collect_tb_stats and drop the initialisation in tb_htable_init.

> +        }
> +
> +        set_default_tbstats_flag(level);
> +        enable_collect_tb_stats();
> +        tb_flush(cpu);
> +        break;
> +    case PAUSE:
> +        if (!tb_stats_collection_enabled()) {
> +            qemu_printf("TB information not being recorded");
> +            return;
> +        }
> +
> +        /* Continue to create TBStatistic structures but stop collecting statistics */
> +        pause_collect_tb_stats();
> +        tb_flush(cpu);
> +        set_default_tbstats_flag(TB_NOTHING);
> +        set_tbstats_flags(TB_PAUSED);

Minor style suggestion - make the tb_flush the last thing you do once
you have moved things around. You are in safe work so nothing should be
happening anyway and it reads better.

> +        break;
> +    case STOP:
> +        if (!tb_stats_collection_enabled()) {
> +            qemu_printf("TB information not being recorded");
> +            return;
> +        }
> +
> +        /* Dissalloc all TBStatistics structures and stop creating new ones */
> +        disable_collect_tb_stats();
> +        tb_flush(cpu);
> +        clean_tbstats();

As above.

> +        break;
> +    case FILTER:
> +        if (!tb_stats_collection_enabled()) {
> +            qemu_printf("TB information not being recorded");
> +            return;
> +        }
> +        if (!last_search) {
> +            qemu_printf("no search on record! execute info tbs before filtering!");
> +            return;
> +        }
> +
> +        tb_flush(cpu);
> +        set_default_tbstats_flag(TB_NOTHING);
> +
> +        /* Set all tbstats as paused, then return only the ones from last_search */
> +        pause_collect_tb_stats();
> +        set_tbstats_flags(TB_PAUSED);
> +
> +        for (GList *iter = last_search; iter; iter = g_list_next(iter)) {
> +            TBStatistics *tbs = iter->data;
> +            tbs->stats_enabled = level;
> +        }

If we are only interested in tracking the N hotestest at this point we
want to prevent allocation of new TBStats and free all the unused ones?

> +
> +        break;
> +    default: /* INVALID */

g_assert_not_reached()? This would be an internal bug if we ever sent
the wrong command?

> +        break;
> +    }
> +
> +    mmap_unlock();
> +    g_free(cmdinfo);
> +}
>
> diff --git a/hmp-commands.hx b/hmp-commands.hx
> index bfa5681dd2..419898751e 100644
> --- a/hmp-commands.hx
> +++ b/hmp-commands.hx
> @@ -1885,6 +1885,23 @@ STEXI
>  @findex qemu-io
>  Executes a qemu-io command on the given block device.
>
> +ETEXI
> +#if defined(CONFIG_TCG)
> +    {
> +        .name       = "tb_stats",
> +        .args_type  = "command:s,level:s?",
> +        .params     = "command [stats_level]",
> +        .help       = "Control tb statistics collection:"
> +                        "tb_stats (start|pause|stop|filter)
> [all|jit_stats|exec_stats]",

What does filter do?

> +        .cmd        = hmp_tbstats,
> +    },
> +#endif
> +
> +STEXI
> +@item tb_stats
> +@findex
> +Control recording tb statistics
> +
>  ETEXI
>
>      {
> diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
> index 7d775f2c0d..d1debd3262 100644
> --- a/include/exec/tb-stats.h
> +++ b/include/exec/tb-stats.h
> @@ -6,6 +6,9 @@
>  #include "exec/tb-context.h"
>  #include "tcg.h"
>
> +enum SortBy { SORT_BY_HOTNESS, SORT_BY_HG /* Host/Guest */, SORT_BY_SPILLS };
> +enum TbstatsCmd { START, PAUSE, STOP, FILTER };
> +
>  #define tb_stats_enabled(tb, JIT_STATS) \
>      (tb && tb->tb_stats && (tb->tb_stats->stats_enabled & JIT_STATS))
>
> @@ -60,4 +63,14 @@ bool tb_stats_cmp(const void *ap, const void *bp);
>
>  void dump_jit_profile_info(TCGProfile *s);
>
> +void set_tbstats_flags(uint32_t flags);
> +void clean_tbstats(void);
> +
> +struct TbstatsCommand {
> +    enum TbstatsCmd cmd;
> +    uint32_t level;
> +};
> +
> +void do_hmp_tbstats_safe(CPUState *cpu, run_on_cpu_data icmd);
> +
>  #endif
> diff --git a/include/qemu/log.h b/include/qemu/log.h
> index 93642603e5..47071d72c7 100644
> --- a/include/qemu/log.h
> +++ b/include/qemu/log.h
> @@ -133,6 +133,7 @@ void qemu_log_close(void);
>  #define TB_NOTHING    0
>  #define TB_EXEC_STATS (1 << 1)
>  #define TB_JIT_STATS  (1 << 2)
> +#define TB_PAUSED     (1 << 3)
>
>  void enable_collect_tb_stats(void);
>  void disable_collect_tb_stats(void);
> diff --git a/monitor/misc.c b/monitor/misc.c
> index 00338c002a..aea9b0db4f 100644
> --- a/monitor/misc.c
> +++ b/monitor/misc.c
> @@ -459,6 +459,46 @@ static void hmp_info_registers(Monitor *mon, const QDict *qdict)
>  }
>
>  #ifdef CONFIG_TCG
> +static void hmp_tbstats(Monitor *mon, const QDict *qdict)
> +{
> +    if (!tcg_enabled()) {
> +        error_report("TB information is only available with accel=tcg");
> +        return;
> +    }
> +
> +    char *cmd = (char *) qdict_get_try_str(qdict, "command");
> +    enum TbstatsCmd icmd = -1;
> +
> +    if (strcmp(cmd, "start") == 0) {
> +        icmd = START;
> +    } else if (strcmp(cmd, "pause") == 0) {
> +        icmd = PAUSE;
> +    } else if (strcmp(cmd, "stop") == 0) {
> +        icmd = STOP;
> +    } else if (strcmp(cmd, "filter") == 0) {
> +        icmd = FILTER;
> +    } else {
> +        error_report("invalid command!");
> +    }
> +
> +    char *slevel = (char *) qdict_get_try_str(qdict, "level");
> +    uint32_t level = TB_EXEC_STATS | TB_JIT_STATS;
> +    if (slevel) {
> +        if (strcmp(slevel, "jit_stats") == 0) {
> +            level = TB_JIT_STATS;
> +        } else if (strcmp(slevel, "exec_stats") == 0) {
> +            level = TB_EXEC_STATS;
> +        }
> +    }
> +
> +    struct TbstatsCommand *tbscommand = g_new0(struct TbstatsCommand, 1);
> +    tbscommand->cmd = icmd;
> +    tbscommand->level = level;

It might be worth considering squashing these into a single pointer to
avoid the allocation and free?

> +    async_safe_run_on_cpu(first_cpu, do_hmp_tbstats_safe,
> +                          RUN_ON_CPU_HOST_PTR(tbscommand));
> +
> +}
> +
>  static void hmp_info_jit(Monitor *mon, const QDict *qdict)
>  {
>      if (!tcg_enabled()) {


--
Alex Bennée


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v4 7/7] monitor: adding info tbs, tb, and coverset
  2019-07-20  1:02 ` [Qemu-devel] [PATCH v4 7/7] monitor: adding info tbs, tb, and coverset vandersonmr
@ 2019-07-26 18:17   ` Alex Bennée
  2019-07-29 11:01   ` Alex Bennée
  1 sibling, 0 replies; 18+ messages in thread
From: Alex Bennée @ 2019-07-26 18:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: vandersonmr, Riku Voipio, Markus Armbruster,
	Dr. David Alan Gilbert, Paolo Bonzini, Richard Henderson,
	Laurent Vivier


vandersonmr <vandersonmr2@gmail.com> writes:

> Adding info [tbs|tb|coverset] commands to HMP.
> These commands allow the exploration of TBs
> generated by the TCG. Understand which one
> hotter, with more guest/host instructions...
> and examine their guest, host and IR code.
>
> The goal of this command is to allow the dynamic exploration
> of TCG behavior and code quality. Therefore, for now, a
> corresponding QMP command is not worthwhile.
>
> Signed-off-by: vandersonmr <vandersonmr2@gmail.com>
> ---
>  accel/tcg/tb-stats.c         | 275 +++++++++++++++++++++++++++++++++++
>  hmp-commands-info.hx         |  23 +++
>  include/exec/tb-stats.h      |  37 +++++
>  include/qemu/log-for-trace.h |   2 +
>  include/qemu/log.h           |   1 +
>  linux-user/exit.c            |   4 +
>  monitor/misc.c               |  71 +++++++++
>  util/log.c                   |  26 +++-
>  8 files changed, 431 insertions(+), 8 deletions(-)
>
> diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
> index 6c330e1b02..85a16cd563 100644
> --- a/accel/tcg/tb-stats.c
> +++ b/accel/tcg/tb-stats.c
> @@ -212,3 +212,278 @@ void do_hmp_tbstats_safe(CPUState *cpu, run_on_cpu_data icmd)
>      g_free(cmdinfo);
>  }
>  
> +static void collect_tb_stats(void *p, uint32_t hash, void *userp)
> +{
> +    last_search = g_list_prepend(last_search, p);
> +}
> +
> +static void dump_tb_header(TBStatistics *tbs)
> +{
> +    unsigned g = tbs->translations.total ?
> +        tbs->code.num_guest_inst / tbs->translations.total : 0;
> +    unsigned ops = tbs->translations.total ?
> +        tbs->code.num_tcg_ops / tbs->translations.total : 0;
> +    unsigned ops_opt = tbs->translations.total ?
> +        tbs->code.num_tcg_ops_opt / tbs->translations.total : 0;
> +    unsigned h = tbs->translations.total ?
> +        tbs->code.num_host_inst / tbs->translations.total : 0;
> +    unsigned spills = tbs->translations.total ?
> +        tbs->code.spills / tbs->translations.total : 0;

A helper like:

  #define stat_per_translation(stat, name) \
      stat->translations.total ? stat->name / stat->translations.total : 0

could be helpful here (and useful for later re-use).

> +
> +    float guest_host_prop = g ? ((float) h / g) : 0;
> +
> +    qemu_log("TB%d: phys:0x"TB_PAGE_ADDR_FMT" virt:0x"TARGET_FMT_lx
> +             " flags:%#08x (trans:%lu uncached:%lu exec:%lu ints: g:%u op:%u op_opt:%u h:%u h/g:%.2f spills:%d)\n",
> +             tbs->display_id,
> +             tbs->phys_pc, tbs->pc, tbs->flags,
> +             tbs->translations.total, tbs->translations.uncached,
> +             tbs->executions.total, g, ops, ops_opt, h, guest_host_prop,
> +             spills);
> +}
> +
> +static gint
> +inverse_sort_tbs(gconstpointer p1, gconstpointer p2, gpointer psort_by)
> +{
> +    const TBStatistics *tbs1 = (TBStatistics *) p1;
> +    const TBStatistics *tbs2 = (TBStatistics *) p2;
> +    int sort_by = *((int *) psort_by);
> +    unsigned long c1 = 0;
> +    unsigned long c2 = 0;

Like here. As we aggregate our spill and instruction stats by the number
of translations on display we will want the same for the sort. Otherwise
heavily translated blocks will distort the figures.

> +
> +    if (likely(sort_by == SORT_BY_SPILLS)) {
> +        c1 = tbs1->code.spills;
> +        c2 = tbs2->code.spills;
> +    } else if (likely(sort_by == SORT_BY_HOTNESS)) {
> +        c1 = tbs1->executions.total;
> +        c2 = tbs2->executions.total;
> +    } else if (likely(sort_by == SORT_BY_HG)) {
> +        if (tbs1->code.num_guest_inst == 0) {
> +            return -1;
> +        }
> +        if (tbs2->code.num_guest_inst == 0) {
> +            return 1;
> +        }
> +
> +        float a = (float) tbs1->code.num_host_inst / tbs1->code.num_guest_inst;
> +        float b = (float) tbs2->code.num_host_inst / tbs2->code.num_guest_inst;
> +        c1 = a <= b ? 0 : 1;
> +        c2 = a <= b ? 1 : 0;
> +    }
> +
> +    return c1 < c2 ? 1 : c1 == c2 ? 0 : -1;
> +}
> +
> +static void do_dump_coverset_info(int percentage)
> +{
> +    uint64_t total_exec_count = 0;
> +    uint64_t covered_exec_count = 0;
> +    unsigned coverset_size = 0;
> +    int id = 1;
> +    GList *i;
> +
> +    g_list_free(last_search);
> +    last_search = NULL;
> +
> +    qht_iter(&tb_ctx.tb_stats, collect_tb_stats, NULL);
> +
> +    int sort_by = SORT_BY_HOTNESS;
> +    last_search = g_list_sort_with_data(last_search, inverse_sort_tbs, &sort_by);
> +
> +    if (!last_search) {
> +        qemu_log("No data collected yet\n");
> +        return;
> +    }
> +
> +    /* Compute total execution count for all tbs */
> +    for (i = last_search; i; i = i->next) {
> +        TBStatistics *tbs = (TBStatistics *) i->data;
> +        total_exec_count += tbs->executions.total * tbs->code.num_guest_inst;
> +    }
> +
> +    for (i = last_search; i; i = i->next) {
> +        TBStatistics *tbs = (TBStatistics *) i->data;
> +        covered_exec_count += tbs->executions.total * tbs->code.num_guest_inst;
> +        tbs->display_id = id++;
> +        coverset_size++;
> +        dump_tb_header(tbs);
> +
> +        /* Iterate and display tbs until reach the percentage count cover */
> +        if (((double) covered_exec_count / total_exec_count) >
> +                ((double) percentage / 100)) {

I'd hoist at least the target percentage calculation to the start of the
function. In fact you can calculate what your target covered exec count
is as an integer once you have computed the total above and avoid doing
an expensive float operation each iteration.

> +            break;
> +        }
> +    }
> +
> +    qemu_log("\n------------------------------\n");
> +    qemu_log("# of TBs to reach %d%% of the total of guest insts exec: %u\t",
> +                percentage, coverset_size);
> +    qemu_log("Total of guest insts exec: %lu\n", total_exec_count);
> +    qemu_log("\n------------------------------\n");
> +
> +    /* free the unused bits */
> +    if (i) {
> +        if (i->next) {
> +            i->next->prev = NULL;
> +        }
> +        g_list_free(i->next);
> +        i->next = NULL;
> +    }
> +}
> +
> +static void do_dump_tbs_info(int count, int sort_by)
> +{
> +    int id = 1;
> +    GList *i;
> +
> +    g_list_free(last_search);
> +    last_search = NULL;
> +
> +    qht_iter(&tb_ctx.tb_stats, collect_tb_stats, NULL);
> +
> +    last_search = g_list_sort_with_data(last_search, inverse_sort_tbs,
> +                                        &sort_by);
> +
> +    if (!last_search) {
> +        qemu_printf("No data collected yet!\n");
> +        return;
> +    }
> +
> +    for (i = last_search; i && count--; i = i->next) {
> +        TBStatistics *tbs = (TBStatistics *) i->data;
> +        tbs->display_id = id++;
> +        dump_tb_header(tbs);
> +    }
> +
> +    /* free the unused bits */
> +    if (i) {
> +        if (i->next) {
> +            i->next->prev = NULL;
> +        }
> +        g_list_free(i->next);
> +        i->next = NULL;
> +    }
> +}
> +
> +static void
> +do_dump_coverset_info_safe(CPUState *cpu, run_on_cpu_data percentage)
> +{
> +    qemu_log_to_monitor(true);
> +    do_dump_coverset_info(percentage.host_int);
> +    qemu_log_to_monitor(false);
> +}
> +
> +struct tbs_dump_info {
> +    int count;
> +    int sort_by;
> +};
> +
> +static void do_dump_tbs_info_safe(CPUState *cpu, run_on_cpu_data tbdi)
> +{
> +    struct tbs_dump_info *info = tbdi.host_ptr;
> +    qemu_log_to_monitor(true);
> +    do_dump_tbs_info(info->count, info->sort_by);
> +    qemu_log_to_monitor(false);
> +    g_free(info);
> +}
> +
> +/*
> + * When we dump_tbs_info on a live system via the HMP we want to
> + * ensure the system is quiessent before we start outputting stuff.
> + * Otherwise we could pollute the output with other logging output.
> + */
> +void dump_coverset_info(int percentage, bool use_monitor)
> +{
> +    if (use_monitor) {
> +        async_safe_run_on_cpu(first_cpu, do_dump_coverset_info_safe,
> +                              RUN_ON_CPU_HOST_INT(percentage));
> +    } else {
> +        do_dump_coverset_info(percentage);
> +    }
> +}
> +
> +void dump_tbs_info(int count, int sort_by, bool use_monitor)
> +{
> +    if (use_monitor) {
> +        struct tbs_dump_info *tbdi = g_new(struct tbs_dump_info, 1);
> +        tbdi->count = count;
> +        tbdi->sort_by = sort_by;
> +        async_safe_run_on_cpu(first_cpu, do_dump_tbs_info_safe,
> +                              RUN_ON_CPU_HOST_PTR(tbdi));
> +    } else {
> +        do_dump_tbs_info(count, sort_by);
> +    }
> +}
> +
> +static void do_tb_dump_with_statistics(TBStatistics *tbs, int log_flags)
> +{
> +    CPUState *cpu = current_cpu;
> +    uint32_t cflags = curr_cflags() | CF_NOCACHE;
> +    int old_log_flags = qemu_loglevel;
> +    TranslationBlock *tb = NULL;
> +
> +    qemu_set_log(log_flags);
> +
> +    qemu_log("\n------------------------------\n");
> +    dump_tb_header(tbs);
> +
> +    if (sigsetjmp(cpu->jmp_env, 0) == 0) {
> +        mmap_lock();
> +        tb = tb_gen_code(cpu, tbs->pc, tbs->cs_base, tbs->flags, cflags);
> +        tb_phys_invalidate(tb, -1);
> +        mmap_unlock();
> +    } else {
> +        /*
> +         * The mmap_lock is dropped by tb_gen_code if it runs out of
> +         * memory.
> +         */
> +        fprintf(stderr, "%s: dbg failed!\n", __func__);
> +        assert_no_pages_locked();
> +    }
> +
> +    qemu_set_log(old_log_flags);
> +
> +    tcg_tb_remove(tb);
> +}
> +
> +struct tb_dump_info {
> +    int id;
> +    int log_flags;
> +    bool use_monitor;
> +};
> +
> +static void do_dump_tb_info_safe(CPUState *cpu, run_on_cpu_data info)
> +{
> +    struct tb_dump_info *tbdi = (struct tb_dump_info *) info.host_ptr;
> +    GList *iter;
> +
> +    if (!last_search) {
> +        qemu_printf("no search on record");
> +        return;
> +    }
> +    qemu_log_to_monitor(tbdi->use_monitor);
> +
> +    for (iter = last_search; iter; iter = g_list_next(iter)) {
> +        TBStatistics *tbs = iter->data;
> +        if (tbs->display_id == tbdi->id) {
> +            do_tb_dump_with_statistics(tbs, tbdi->log_flags);
> +            break;
> +        }
> +    }
> +    qemu_log_to_monitor(false);
> +    g_free(tbdi);
> +}
> +
> +void dump_tb_info(int id, int log_mask, bool use_monitor)
> +{
> +    struct tb_dump_info *tbdi = g_new(struct tb_dump_info, 1);
> +
> +    tbdi->id = id;
> +    tbdi->log_flags = log_mask;
> +    tbdi->use_monitor = use_monitor;
> +
> +    async_safe_run_on_cpu(first_cpu, do_dump_tb_info_safe,
> +                          RUN_ON_CPU_HOST_PTR(tbdi));
> +
> +    /* tbdi free'd by do_dump_tb_info_safe */
> +}
> +
> diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
> index c59444c461..761c76619b 100644
> --- a/hmp-commands-info.hx
> +++ b/hmp-commands-info.hx
> @@ -289,6 +289,29 @@ ETEXI
>          .help       = "show dynamic compiler info",
>          .cmd        = hmp_info_jit,
>      },
> +    {
> +        .name       = "tbs",
> +        .args_type  = "number:i?,sortedby:s?",
> +        .params     = "[number sortedby]",
> +        .help       = "show a [number] translated blocks sorted by [sortedby]"
> +                      "sortedby opts: hotness hg",
> +        .cmd        = hmp_info_tbs,
> +    },
> +    {
> +        .name       = "tb",
> +        .args_type  = "id:i,flags:s?",
> +        .params     = "id [log1[,...] flags]",
> +        .help       = "show information about one translated block by id",
> +        .cmd        = hmp_info_tb,
> +    },
> +    {
> +        .name       = "coverset",
> +        .args_type  = "number:i?",
> +        .params     = "[number]",
> +        .help       = "show hottest translated blocks neccesary to cover"
> +                      "[number]% of the execution count",
> +        .cmd        = hmp_info_coverset,
> +    },
>  #endif
>  
>  STEXI
> diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
> index d1debd3262..e2ab1068c3 100644
> --- a/include/exec/tb-stats.h
> +++ b/include/exec/tb-stats.h
> @@ -57,6 +57,9 @@ struct TBStatistics {
>          unsigned long uncached;
>          unsigned long spanning;
>      } translations;
> +
> +    /* HMP information - used for referring to previous search */
> +    int display_id;
>  };
>  
>  bool tb_stats_cmp(const void *ap, const void *bp);
> @@ -73,4 +76,38 @@ struct TbstatsCommand {
>  
>  void do_hmp_tbstats_safe(CPUState *cpu, run_on_cpu_data icmd);
>  
> +/**
> + * dump_coverset_info: report the hottest blocks to cover n% of execution
> + *
> + * @percentage: cover set percentage
> + * @use_monitor: redirect output to monitor
> + *
> + * Report the hottest blocks to either the log or monitor
> + */
> +void dump_coverset_info(int percentage, bool use_monitor);
> +
> +
> +/**
> + * dump_tbs_info: report the hottest blocks
> + *
> + * @count: the limit of hotblocks
> + * @sort_by: property in which the dump will be sorted
> + * @use_monitor: redirect output to monitor
> + *
> + * Report the hottest blocks to either the log or monitor
> + */
> +void dump_tbs_info(int count, int sort_by, bool use_monitor);
> +
> +/**
> + * dump_tb_info: dump information about one TB
> + *
> + * @id: the display id of the block (from previous search)
> + * @mask: the temporary logging mask
> + * @Use_monitor: redirect output to monitor
> + *
> + * Re-run a translation of a block at addr for the purposes of debug output
> + */
> +void dump_tb_info(int id, int log_mask, bool use_monitor);
> +
> +
>  #endif
> diff --git a/include/qemu/log-for-trace.h b/include/qemu/log-for-trace.h
> index 2f0a5b080e..d65eb83037 100644
> --- a/include/qemu/log-for-trace.h
> +++ b/include/qemu/log-for-trace.h
> @@ -21,6 +21,8 @@
>  /* Private global variable, don't use */
>  extern int qemu_loglevel;
>  
> +extern int32_t max_num_hot_tbs_to_dump;
> +
>  #define LOG_TRACE          (1 << 15)
>  
>  /* Returns true if a bit is set in the current loglevel mask */
> diff --git a/include/qemu/log.h b/include/qemu/log.h
> index 47071d72c7..27ba267faa 100644
> --- a/include/qemu/log.h
> +++ b/include/qemu/log.h
> @@ -114,6 +114,7 @@ typedef struct QEMULogItem {
>  extern const QEMULogItem qemu_log_items[];
>  
>  void qemu_set_log(int log_flags);
> +void qemu_log_to_monitor(bool enable);
>  void qemu_log_needs_buffers(void);
>  void qemu_set_log_filename(const char *filename, Error **errp);
>  void qemu_set_dfilter_ranges(const char *ranges, Error **errp);
> diff --git a/linux-user/exit.c b/linux-user/exit.c
> index bdda720553..7226104959 100644
> --- a/linux-user/exit.c
> +++ b/linux-user/exit.c
> @@ -28,6 +28,10 @@ extern void __gcov_dump(void);
>  
>  void preexit_cleanup(CPUArchState *env, int code)
>  {
> +    if (tb_stats_collection_enabled()) {
> +        dump_tbs_info(max_num_hot_tbs_to_dump, SORT_BY_HOTNESS, false);
> +    }
> +
>  #ifdef TARGET_GPROF
>          _mcleanup();
>  #endif
> diff --git a/monitor/misc.c b/monitor/misc.c
> index aea9b0db4f..78ada8aa0e 100644
> --- a/monitor/misc.c
> +++ b/monitor/misc.c
> @@ -499,6 +499,77 @@ static void hmp_tbstats(Monitor *mon, const QDict *qdict)
>  
>  }
>  
> +static void hmp_info_tbs(Monitor *mon, const QDict *qdict)
> +{
> +    int n;
> +    const char *s = NULL;
> +    if (!tcg_enabled()) {
> +        error_report("TB information is only available with accel=tcg");
> +        return;
> +    }
> +    if (!tb_ctx.tb_stats.map) {
> +        error_report("no TB information recorded");
> +        return;
> +    }
> +
> +    n = qdict_get_try_int(qdict, "number", 10);
> +    s = qdict_get_try_str(qdict, "sortedby");
> +
> +    int sortedby = 0;
> +    if (s == NULL || strcmp(s, "hotness") == 0) {
> +        sortedby = SORT_BY_HOTNESS;
> +    } else if (strcmp(s, "hg") == 0) {
> +        sortedby = SORT_BY_HG;
> +    } else if (strcmp(s, "spills") == 0) {
> +        sortedby = SORT_BY_SPILLS;
> +    }
> +
> +    dump_tbs_info(n, sortedby, true);
> +}
> +
> +static void hmp_info_tb(Monitor *mon, const QDict *qdict)
> +{
> +    const int id = qdict_get_int(qdict, "id");
> +    const char *flags = qdict_get_try_str(qdict, "flags");
> +    int mask;
> +
> +    if (!tcg_enabled()) {
> +        error_report("TB information is only available with accel=tcg");
> +        return;
> +    }
> +
> +    mask = flags ? qemu_str_to_log_mask(flags) : CPU_LOG_TB_IN_ASM;
> +
> +    if (!mask) {
> +        help_cmd(mon, "log");
> +        return;
> +    }
> +
> +    dump_tb_info(id, mask, true);
> +}
> +
> +static void hmp_info_coverset(Monitor *mon, const QDict *qdict)
> +{
> +    int n;
> +    if (!tcg_enabled()) {
> +        error_report("TB information is only available with accel=tcg");
> +        return;
> +    }
> +    if (!tb_stats_collection_enabled()) {
> +        error_report("TB information not being recorded");
> +        return;
> +    }
> +
> +    n = qdict_get_try_int(qdict, "number", 90);
> +
> +    if (n < 0 || n > 100) {
> +        error_report("Coverset percentage should be between 0 and 100");
> +        return;
> +    }
> +
> +    dump_coverset_info(n, true);
> +}
> +
>  static void hmp_info_jit(Monitor *mon, const QDict *qdict)
>  {
>      if (!tcg_enabled()) {
> diff --git a/util/log.c b/util/log.c
> index 8109d19afb..7bfcbe3703 100644
> --- a/util/log.c
> +++ b/util/log.c
> @@ -19,6 +19,7 @@
>  
>  #include "qemu/osdep.h"
>  #include "qemu/log.h"
> +#include "qemu/qemu-print.h"
>  #include "qemu/range.h"
>  #include "qemu/error-report.h"
>  #include "qapi/error.h"
> @@ -31,25 +32,34 @@ int qemu_loglevel;
>  static int log_append = 0;
>  static GArray *debug_regions;
>  int32_t max_num_hot_tbs_to_dump;
> +static bool to_monitor;
>  
>  /* Return the number of characters emitted.  */
>  int qemu_log(const char *fmt, ...)
>  {
>      int ret = 0;
> -    if (qemu_logfile) {
> -        va_list ap;
> -        va_start(ap, fmt);
> +    va_list ap;
> +    va_start(ap, fmt);
> +
> +    if (to_monitor) {
> +        ret = qemu_vprintf(fmt, ap);
> +    } else if (qemu_logfile) {
>          ret = vfprintf(qemu_logfile, fmt, ap);
> -        va_end(ap);
> +    }
> +    va_end(ap);
>  
> -        /* Don't pass back error results.  */
> -        if (ret < 0) {
> -            ret = 0;
> -        }
> +    /* Don't pass back error results.  */
> +    if (ret < 0) {
> +        ret = 0;
>      }
>      return ret;
>  }
>  
> +void qemu_log_to_monitor(bool enable)
> +{
> +    to_monitor = enable;
> +}
> +

This tweaking to qemu_log should be in a separate patch.

>  static bool log_uses_own_buffers;
>  
>  /* enable or disable low levels log */


-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v4 7/7] monitor: adding info tbs, tb, and coverset
  2019-07-20  1:02 ` [Qemu-devel] [PATCH v4 7/7] monitor: adding info tbs, tb, and coverset vandersonmr
  2019-07-26 18:17   ` Alex Bennée
@ 2019-07-29 11:01   ` Alex Bennée
  2019-07-29 15:20     ` Alex Bennée
  1 sibling, 1 reply; 18+ messages in thread
From: Alex Bennée @ 2019-07-29 11:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: vandersonmr, Riku Voipio, Markus Armbruster,
	Dr. David Alan Gilbert, Paolo Bonzini, Richard Henderson,
	Laurent Vivier


vandersonmr <vandersonmr2@gmail.com> writes:

> Adding info [tbs|tb|coverset] commands to HMP.
> These commands allow the exploration of TBs
> generated by the TCG. Understand which one
> hotter, with more guest/host instructions...
> and examine their guest, host and IR code.
>
> The goal of this command is to allow the dynamic exploration
> of TCG behavior and code quality. Therefore, for now, a
> corresponding QMP command is not worthwhile.
>
> Signed-off-by: vandersonmr <vandersonmr2@gmail.com>
> ---
>  accel/tcg/tb-stats.c         | 275 +++++++++++++++++++++++++++++++++++
>  hmp-commands-info.hx         |  23 +++
>  include/exec/tb-stats.h      |  37 +++++
>  include/qemu/log-for-trace.h |   2 +
>  include/qemu/log.h           |   1 +
>  linux-user/exit.c            |   4 +
>  monitor/misc.c               |  71 +++++++++
>  util/log.c                   |  26 +++-
>  8 files changed, 431 insertions(+), 8 deletions(-)
>
> diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
> index 6c330e1b02..85a16cd563 100644
> --- a/accel/tcg/tb-stats.c
> +++ b/accel/tcg/tb-stats.c
> @@ -212,3 +212,278 @@ void do_hmp_tbstats_safe(CPUState *cpu, run_on_cpu_data icmd)
>      g_free(cmdinfo);
>  }
>
> +static void collect_tb_stats(void *p, uint32_t hash, void *userp)
> +{
> +    last_search = g_list_prepend(last_search, p);
> +}
> +
> +static void dump_tb_header(TBStatistics *tbs)
> +{
> +    unsigned g = tbs->translations.total ?
> +        tbs->code.num_guest_inst / tbs->translations.total : 0;
> +    unsigned ops = tbs->translations.total ?
> +        tbs->code.num_tcg_ops / tbs->translations.total : 0;
> +    unsigned ops_opt = tbs->translations.total ?
> +        tbs->code.num_tcg_ops_opt / tbs->translations.total : 0;
> +    unsigned h = tbs->translations.total ?
> +        tbs->code.num_host_inst / tbs->translations.total : 0;
> +    unsigned spills = tbs->translations.total ?
> +        tbs->code.spills / tbs->translations.total : 0;
> +
> +    float guest_host_prop = g ? ((float) h / g) : 0;
> +
> +    qemu_log("TB%d: phys:0x"TB_PAGE_ADDR_FMT" virt:0x"TARGET_FMT_lx
> +             " flags:%#08x (trans:%lu uncached:%lu exec:%lu ints: g:%u op:%u op_opt:%u h:%u h/g:%.2f spills:%d)\n",
> +             tbs->display_id,
> +             tbs->phys_pc, tbs->pc, tbs->flags,
> +             tbs->translations.total, tbs->translations.uncached,
> +             tbs->executions.total, g, ops, ops_opt, h, guest_host_prop,
> +             spills);
> +}
> +
> +static gint
> +inverse_sort_tbs(gconstpointer p1, gconstpointer p2, gpointer psort_by)
> +{
> +    const TBStatistics *tbs1 = (TBStatistics *) p1;
> +    const TBStatistics *tbs2 = (TBStatistics *) p2;
> +    int sort_by = *((int *) psort_by);
> +    unsigned long c1 = 0;
> +    unsigned long c2 = 0;
> +
> +    if (likely(sort_by == SORT_BY_SPILLS)) {
> +        c1 = tbs1->code.spills;
> +        c2 = tbs2->code.spills;
> +    } else if (likely(sort_by == SORT_BY_HOTNESS)) {
> +        c1 = tbs1->executions.total;
> +        c2 = tbs2->executions.total;
> +    } else if (likely(sort_by == SORT_BY_HG)) {
> +        if (tbs1->code.num_guest_inst == 0) {
> +            return -1;
> +        }
> +        if (tbs2->code.num_guest_inst == 0) {
> +            return 1;
> +        }
> +
> +        float a = (float) tbs1->code.num_host_inst / tbs1->code.num_guest_inst;
> +        float b = (float) tbs2->code.num_host_inst / tbs2->code.num_guest_inst;
> +        c1 = a <= b ? 0 : 1;
> +        c2 = a <= b ? 1 : 0;
> +    }
> +
> +    return c1 < c2 ? 1 : c1 == c2 ? 0 : -1;
> +}
> +
> +static void do_dump_coverset_info(int percentage)
> +{
> +    uint64_t total_exec_count = 0;
> +    uint64_t covered_exec_count = 0;
> +    unsigned coverset_size = 0;
> +    int id = 1;
> +    GList *i;
> +
> +    g_list_free(last_search);
> +    last_search = NULL;
> +
> +    qht_iter(&tb_ctx.tb_stats, collect_tb_stats, NULL);
> +
> +    int sort_by = SORT_BY_HOTNESS;
> +    last_search = g_list_sort_with_data(last_search, inverse_sort_tbs, &sort_by);
> +
> +    if (!last_search) {
> +        qemu_log("No data collected yet\n");
> +        return;
> +    }
> +
> +    /* Compute total execution count for all tbs */
> +    for (i = last_search; i; i = i->next) {
> +        TBStatistics *tbs = (TBStatistics *) i->data;
> +        total_exec_count += tbs->executions.total * tbs->code.num_guest_inst;
> +    }
> +
> +    for (i = last_search; i; i = i->next) {
> +        TBStatistics *tbs = (TBStatistics *) i->data;
> +        covered_exec_count += tbs->executions.total * tbs->code.num_guest_inst;
> +        tbs->display_id = id++;
> +        coverset_size++;
> +        dump_tb_header(tbs);
> +
> +        /* Iterate and display tbs until reach the percentage count cover */
> +        if (((double) covered_exec_count / total_exec_count) >
> +                ((double) percentage / 100)) {
> +            break;
> +        }
> +    }
> +
> +    qemu_log("\n------------------------------\n");
> +    qemu_log("# of TBs to reach %d%% of the total of guest insts exec: %u\t",
> +                percentage, coverset_size);
> +    qemu_log("Total of guest insts exec: %lu\n", total_exec_count);
> +    qemu_log("\n------------------------------\n");
> +
> +    /* free the unused bits */
> +    if (i) {
> +        if (i->next) {
> +            i->next->prev = NULL;
> +        }
> +        g_list_free(i->next);
> +        i->next = NULL;
> +    }
> +}
> +
> +static void do_dump_tbs_info(int count, int sort_by)
> +{
> +    int id = 1;
> +    GList *i;
> +
> +    g_list_free(last_search);
> +    last_search = NULL;
> +
> +    qht_iter(&tb_ctx.tb_stats, collect_tb_stats, NULL);
> +
> +    last_search = g_list_sort_with_data(last_search, inverse_sort_tbs,
> +                                        &sort_by);
> +
> +    if (!last_search) {
> +        qemu_printf("No data collected yet!\n");
> +        return;
> +    }
> +
> +    for (i = last_search; i && count--; i = i->next) {
> +        TBStatistics *tbs = (TBStatistics *) i->data;
> +        tbs->display_id = id++;
> +        dump_tb_header(tbs);
> +    }
> +
> +    /* free the unused bits */
> +    if (i) {
> +        if (i->next) {
> +            i->next->prev = NULL;
> +        }
> +        g_list_free(i->next);
> +        i->next = NULL;
> +    }
> +}
> +
> +static void
> +do_dump_coverset_info_safe(CPUState *cpu, run_on_cpu_data percentage)
> +{
> +    qemu_log_to_monitor(true);
> +    do_dump_coverset_info(percentage.host_int);
> +    qemu_log_to_monitor(false);
> +}
> +
> +struct tbs_dump_info {
> +    int count;
> +    int sort_by;
> +};
> +
> +static void do_dump_tbs_info_safe(CPUState *cpu, run_on_cpu_data tbdi)
> +{
> +    struct tbs_dump_info *info = tbdi.host_ptr;
> +    qemu_log_to_monitor(true);
> +    do_dump_tbs_info(info->count, info->sort_by);
> +    qemu_log_to_monitor(false);
> +    g_free(info);
> +}
> +
> +/*
> + * When we dump_tbs_info on a live system via the HMP we want to
> + * ensure the system is quiessent before we start outputting stuff.
> + * Otherwise we could pollute the output with other logging output.
> + */
> +void dump_coverset_info(int percentage, bool use_monitor)
> +{
> +    if (use_monitor) {
> +        async_safe_run_on_cpu(first_cpu, do_dump_coverset_info_safe,
> +                              RUN_ON_CPU_HOST_INT(percentage));
> +    } else {
> +        do_dump_coverset_info(percentage);
> +    }
> +}
> +
> +void dump_tbs_info(int count, int sort_by, bool use_monitor)
> +{
> +    if (use_monitor) {
> +        struct tbs_dump_info *tbdi = g_new(struct tbs_dump_info, 1);
> +        tbdi->count = count;
> +        tbdi->sort_by = sort_by;
> +        async_safe_run_on_cpu(first_cpu, do_dump_tbs_info_safe,
> +                              RUN_ON_CPU_HOST_PTR(tbdi));
> +    } else {
> +        do_dump_tbs_info(count, sort_by);
> +    }
> +}
> +
> +static void do_tb_dump_with_statistics(TBStatistics *tbs, int log_flags)
> +{
> +    CPUState *cpu = current_cpu;
> +    uint32_t cflags = curr_cflags() | CF_NOCACHE;
> +    int old_log_flags = qemu_loglevel;
> +    TranslationBlock *tb = NULL;
> +
> +    qemu_set_log(log_flags);
> +
> +    qemu_log("\n------------------------------\n");
> +    dump_tb_header(tbs);
> +
> +    if (sigsetjmp(cpu->jmp_env, 0) == 0) {
> +        mmap_lock();
> +        tb = tb_gen_code(cpu, tbs->pc, tbs->cs_base, tbs->flags, cflags);
> +        tb_phys_invalidate(tb, -1);
> +        mmap_unlock();
> +    } else {
> +        /*
> +         * The mmap_lock is dropped by tb_gen_code if it runs out of
> +         * memory.
> +         */
> +        fprintf(stderr, "%s: dbg failed!\n", __func__);
> +        assert_no_pages_locked();
> +    }

So this fails if we can't generate a TB as tcg_tb_remove(NULL) will barf
(this was the crash I was seeing). However with two additional patches
from my plugin series:

  translate-all: use cpu_in_exclusive_work_context() in tb_flush
  translate-all: use cpu_in_exclusive_work_context() in tb_flush

We can do a simple repeat operation:

@@ -420,6 +421,7 @@ static void do_tb_dump_with_statistics(TBStatistics *tbs, int log_flags)

     qemu_log("\n------------------------------\n");
     dump_tb_header(tbs);
+ redo:

     if (sigsetjmp(cpu->jmp_env, 0) == 0) {
         mmap_lock();
@@ -433,6 +435,7 @@ static void do_tb_dump_with_statistics(TBStatistics *tbs, int log_flags)
          */
         fprintf(stderr, "%s: dbg failed!\n", __func__);
         assert_no_pages_locked();
+        goto redo;
     }

Because the tb_flush triggered by the generation code will have occured
straight away.

I'd suggest cherry picking those two patches into your next series.

> +
> +    qemu_set_log(old_log_flags);
> +
> +    tcg_tb_remove(tb);
> +}
> +
> +struct tb_dump_info {
> +    int id;
> +    int log_flags;
> +    bool use_monitor;
> +};
> +
> +static void do_dump_tb_info_safe(CPUState *cpu, run_on_cpu_data info)
> +{
> +    struct tb_dump_info *tbdi = (struct tb_dump_info *) info.host_ptr;
> +    GList *iter;
> +
> +    if (!last_search) {
> +        qemu_printf("no search on record");
> +        return;
> +    }
> +    qemu_log_to_monitor(tbdi->use_monitor);
> +
> +    for (iter = last_search; iter; iter = g_list_next(iter)) {
> +        TBStatistics *tbs = iter->data;
> +        if (tbs->display_id == tbdi->id) {
> +            do_tb_dump_with_statistics(tbs, tbdi->log_flags);
> +            break;
> +        }
> +    }
> +    qemu_log_to_monitor(false);
> +    g_free(tbdi);
> +}
> +
> +void dump_tb_info(int id, int log_mask, bool use_monitor)
> +{
> +    struct tb_dump_info *tbdi = g_new(struct tb_dump_info, 1);
> +
> +    tbdi->id = id;
> +    tbdi->log_flags = log_mask;
> +    tbdi->use_monitor = use_monitor;
> +
> +    async_safe_run_on_cpu(first_cpu, do_dump_tb_info_safe,
> +                          RUN_ON_CPU_HOST_PTR(tbdi));
> +
> +    /* tbdi free'd by do_dump_tb_info_safe */
> +}
> +
> diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
> index c59444c461..761c76619b 100644
> --- a/hmp-commands-info.hx
> +++ b/hmp-commands-info.hx
> @@ -289,6 +289,29 @@ ETEXI
>          .help       = "show dynamic compiler info",
>          .cmd        = hmp_info_jit,
>      },
> +    {
> +        .name       = "tbs",
> +        .args_type  = "number:i?,sortedby:s?",
> +        .params     = "[number sortedby]",
> +        .help       = "show a [number] translated blocks sorted by [sortedby]"
> +                      "sortedby opts: hotness hg",
> +        .cmd        = hmp_info_tbs,
> +    },
> +    {
> +        .name       = "tb",
> +        .args_type  = "id:i,flags:s?",
> +        .params     = "id [log1[,...] flags]",
> +        .help       = "show information about one translated block by id",
> +        .cmd        = hmp_info_tb,
> +    },
> +    {
> +        .name       = "coverset",
> +        .args_type  = "number:i?",
> +        .params     = "[number]",
> +        .help       = "show hottest translated blocks neccesary to cover"
> +                      "[number]% of the execution count",
> +        .cmd        = hmp_info_coverset,
> +    },
>  #endif
>
>  STEXI
> diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
> index d1debd3262..e2ab1068c3 100644
> --- a/include/exec/tb-stats.h
> +++ b/include/exec/tb-stats.h
> @@ -57,6 +57,9 @@ struct TBStatistics {
>          unsigned long uncached;
>          unsigned long spanning;
>      } translations;
> +
> +    /* HMP information - used for referring to previous search */
> +    int display_id;
>  };
>
>  bool tb_stats_cmp(const void *ap, const void *bp);
> @@ -73,4 +76,38 @@ struct TbstatsCommand {
>
>  void do_hmp_tbstats_safe(CPUState *cpu, run_on_cpu_data icmd);
>
> +/**
> + * dump_coverset_info: report the hottest blocks to cover n% of execution
> + *
> + * @percentage: cover set percentage
> + * @use_monitor: redirect output to monitor
> + *
> + * Report the hottest blocks to either the log or monitor
> + */
> +void dump_coverset_info(int percentage, bool use_monitor);
> +
> +
> +/**
> + * dump_tbs_info: report the hottest blocks
> + *
> + * @count: the limit of hotblocks
> + * @sort_by: property in which the dump will be sorted
> + * @use_monitor: redirect output to monitor
> + *
> + * Report the hottest blocks to either the log or monitor
> + */
> +void dump_tbs_info(int count, int sort_by, bool use_monitor);
> +
> +/**
> + * dump_tb_info: dump information about one TB
> + *
> + * @id: the display id of the block (from previous search)
> + * @mask: the temporary logging mask
> + * @Use_monitor: redirect output to monitor
> + *
> + * Re-run a translation of a block at addr for the purposes of debug output
> + */
> +void dump_tb_info(int id, int log_mask, bool use_monitor);
> +
> +
>  #endif
> diff --git a/include/qemu/log-for-trace.h b/include/qemu/log-for-trace.h
> index 2f0a5b080e..d65eb83037 100644
> --- a/include/qemu/log-for-trace.h
> +++ b/include/qemu/log-for-trace.h
> @@ -21,6 +21,8 @@
>  /* Private global variable, don't use */
>  extern int qemu_loglevel;
>
> +extern int32_t max_num_hot_tbs_to_dump;
> +
>  #define LOG_TRACE          (1 << 15)
>
>  /* Returns true if a bit is set in the current loglevel mask */
> diff --git a/include/qemu/log.h b/include/qemu/log.h
> index 47071d72c7..27ba267faa 100644
> --- a/include/qemu/log.h
> +++ b/include/qemu/log.h
> @@ -114,6 +114,7 @@ typedef struct QEMULogItem {
>  extern const QEMULogItem qemu_log_items[];
>
>  void qemu_set_log(int log_flags);
> +void qemu_log_to_monitor(bool enable);
>  void qemu_log_needs_buffers(void);
>  void qemu_set_log_filename(const char *filename, Error **errp);
>  void qemu_set_dfilter_ranges(const char *ranges, Error **errp);
> diff --git a/linux-user/exit.c b/linux-user/exit.c
> index bdda720553..7226104959 100644
> --- a/linux-user/exit.c
> +++ b/linux-user/exit.c
> @@ -28,6 +28,10 @@ extern void __gcov_dump(void);
>
>  void preexit_cleanup(CPUArchState *env, int code)
>  {
> +    if (tb_stats_collection_enabled()) {
> +        dump_tbs_info(max_num_hot_tbs_to_dump, SORT_BY_HOTNESS, false);
> +    }
> +
>  #ifdef TARGET_GPROF
>          _mcleanup();
>  #endif
> diff --git a/monitor/misc.c b/monitor/misc.c
> index aea9b0db4f..78ada8aa0e 100644
> --- a/monitor/misc.c
> +++ b/monitor/misc.c
> @@ -499,6 +499,77 @@ static void hmp_tbstats(Monitor *mon, const QDict *qdict)
>
>  }
>
> +static void hmp_info_tbs(Monitor *mon, const QDict *qdict)
> +{
> +    int n;
> +    const char *s = NULL;
> +    if (!tcg_enabled()) {
> +        error_report("TB information is only available with accel=tcg");
> +        return;
> +    }
> +    if (!tb_ctx.tb_stats.map) {
> +        error_report("no TB information recorded");
> +        return;
> +    }
> +
> +    n = qdict_get_try_int(qdict, "number", 10);
> +    s = qdict_get_try_str(qdict, "sortedby");
> +
> +    int sortedby = 0;
> +    if (s == NULL || strcmp(s, "hotness") == 0) {
> +        sortedby = SORT_BY_HOTNESS;
> +    } else if (strcmp(s, "hg") == 0) {
> +        sortedby = SORT_BY_HG;
> +    } else if (strcmp(s, "spills") == 0) {
> +        sortedby = SORT_BY_SPILLS;
> +    }
> +
> +    dump_tbs_info(n, sortedby, true);
> +}
> +
> +static void hmp_info_tb(Monitor *mon, const QDict *qdict)
> +{
> +    const int id = qdict_get_int(qdict, "id");
> +    const char *flags = qdict_get_try_str(qdict, "flags");
> +    int mask;
> +
> +    if (!tcg_enabled()) {
> +        error_report("TB information is only available with accel=tcg");
> +        return;
> +    }
> +
> +    mask = flags ? qemu_str_to_log_mask(flags) : CPU_LOG_TB_IN_ASM;
> +
> +    if (!mask) {
> +        help_cmd(mon, "log");
> +        return;
> +    }
> +
> +    dump_tb_info(id, mask, true);
> +}
> +
> +static void hmp_info_coverset(Monitor *mon, const QDict *qdict)
> +{
> +    int n;
> +    if (!tcg_enabled()) {
> +        error_report("TB information is only available with accel=tcg");
> +        return;
> +    }
> +    if (!tb_stats_collection_enabled()) {
> +        error_report("TB information not being recorded");
> +        return;
> +    }
> +
> +    n = qdict_get_try_int(qdict, "number", 90);
> +
> +    if (n < 0 || n > 100) {
> +        error_report("Coverset percentage should be between 0 and 100");
> +        return;
> +    }
> +
> +    dump_coverset_info(n, true);
> +}
> +
>  static void hmp_info_jit(Monitor *mon, const QDict *qdict)
>  {
>      if (!tcg_enabled()) {
> diff --git a/util/log.c b/util/log.c
> index 8109d19afb..7bfcbe3703 100644
> --- a/util/log.c
> +++ b/util/log.c
> @@ -19,6 +19,7 @@
>
>  #include "qemu/osdep.h"
>  #include "qemu/log.h"
> +#include "qemu/qemu-print.h"
>  #include "qemu/range.h"
>  #include "qemu/error-report.h"
>  #include "qapi/error.h"
> @@ -31,25 +32,34 @@ int qemu_loglevel;
>  static int log_append = 0;
>  static GArray *debug_regions;
>  int32_t max_num_hot_tbs_to_dump;
> +static bool to_monitor;
>
>  /* Return the number of characters emitted.  */
>  int qemu_log(const char *fmt, ...)
>  {
>      int ret = 0;
> -    if (qemu_logfile) {
> -        va_list ap;
> -        va_start(ap, fmt);
> +    va_list ap;
> +    va_start(ap, fmt);
> +
> +    if (to_monitor) {
> +        ret = qemu_vprintf(fmt, ap);
> +    } else if (qemu_logfile) {
>          ret = vfprintf(qemu_logfile, fmt, ap);
> -        va_end(ap);
> +    }
> +    va_end(ap);
>
> -        /* Don't pass back error results.  */
> -        if (ret < 0) {
> -            ret = 0;
> -        }
> +    /* Don't pass back error results.  */
> +    if (ret < 0) {
> +        ret = 0;
>      }
>      return ret;
>  }
>
> +void qemu_log_to_monitor(bool enable)
> +{
> +    to_monitor = enable;
> +}
> +
>  static bool log_uses_own_buffers;
>
>  /* enable or disable low levels log */


--
Alex Bennée


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v4 7/7] monitor: adding info tbs, tb, and coverset
  2019-07-29 11:01   ` Alex Bennée
@ 2019-07-29 15:20     ` Alex Bennée
  0 siblings, 0 replies; 18+ messages in thread
From: Alex Bennée @ 2019-07-29 15:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: vandersonmr, Riku Voipio, Markus Armbruster,
	Dr. David Alan Gilbert, Paolo Bonzini, Richard Henderson,
	Laurent Vivier


Alex Bennée <alex.bennee@linaro.org> writes:

> vandersonmr <vandersonmr2@gmail.com> writes:
>
>> Adding info [tbs|tb|coverset] commands to HMP.
>> These commands allow the exploration of TBs
>> generated by the TCG. Understand which one
>> hotter, with more guest/host instructions...
>> and examine their guest, host and IR code.
>>
>> The goal of this command is to allow the dynamic exploration
>> of TCG behavior and code quality. Therefore, for now, a
>> corresponding QMP command is not worthwhile.
>>
>> Signed-off-by: vandersonmr <vandersonmr2@gmail.com>
>> ---
>>  accel/tcg/tb-stats.c         | 275 +++++++++++++++++++++++++++++++++++
>>  hmp-commands-info.hx         |  23 +++
>>  include/exec/tb-stats.h      |  37 +++++
>>  include/qemu/log-for-trace.h |   2 +
>>  include/qemu/log.h           |   1 +
>>  linux-user/exit.c            |   4 +
>>  monitor/misc.c               |  71 +++++++++
>>  util/log.c                   |  26 +++-
>>  8 files changed, 431 insertions(+), 8 deletions(-)
>>
>> diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
>> index 6c330e1b02..85a16cd563 100644
>> --- a/accel/tcg/tb-stats.c
>> +++ b/accel/tcg/tb-stats.c
>> @@ -212,3 +212,278 @@ void do_hmp_tbstats_safe(CPUState *cpu, run_on_cpu_data icmd)
>>      g_free(cmdinfo);
>>  }
>>
>> +static void collect_tb_stats(void *p, uint32_t hash, void *userp)
>> +{
>> +    last_search = g_list_prepend(last_search, p);
>> +}
>> +
>> +static void dump_tb_header(TBStatistics *tbs)
>> +{
>> +    unsigned g = tbs->translations.total ?
>> +        tbs->code.num_guest_inst / tbs->translations.total : 0;
>> +    unsigned ops = tbs->translations.total ?
>> +        tbs->code.num_tcg_ops / tbs->translations.total : 0;
>> +    unsigned ops_opt = tbs->translations.total ?
>> +        tbs->code.num_tcg_ops_opt / tbs->translations.total : 0;
>> +    unsigned h = tbs->translations.total ?
>> +        tbs->code.num_host_inst / tbs->translations.total : 0;
>> +    unsigned spills = tbs->translations.total ?
>> +        tbs->code.spills / tbs->translations.total : 0;
>> +
>> +    float guest_host_prop = g ? ((float) h / g) : 0;
>> +
>> +    qemu_log("TB%d: phys:0x"TB_PAGE_ADDR_FMT" virt:0x"TARGET_FMT_lx
>> +             " flags:%#08x (trans:%lu uncached:%lu exec:%lu ints: g:%u op:%u op_opt:%u h:%u h/g:%.2f spills:%d)\n",
>> +             tbs->display_id,
>> +             tbs->phys_pc, tbs->pc, tbs->flags,
>> +             tbs->translations.total, tbs->translations.uncached,
>> +             tbs->executions.total, g, ops, ops_opt, h, guest_host_prop,
>> +             spills);
>> +}
>> +
>> +static gint
>> +inverse_sort_tbs(gconstpointer p1, gconstpointer p2, gpointer psort_by)
>> +{
>> +    const TBStatistics *tbs1 = (TBStatistics *) p1;
>> +    const TBStatistics *tbs2 = (TBStatistics *) p2;
>> +    int sort_by = *((int *) psort_by);
>> +    unsigned long c1 = 0;
>> +    unsigned long c2 = 0;
>> +
>> +    if (likely(sort_by == SORT_BY_SPILLS)) {
>> +        c1 = tbs1->code.spills;
>> +        c2 = tbs2->code.spills;
>> +    } else if (likely(sort_by == SORT_BY_HOTNESS)) {
>> +        c1 = tbs1->executions.total;
>> +        c2 = tbs2->executions.total;
>> +    } else if (likely(sort_by == SORT_BY_HG)) {
>> +        if (tbs1->code.num_guest_inst == 0) {
>> +            return -1;
>> +        }
>> +        if (tbs2->code.num_guest_inst == 0) {
>> +            return 1;
>> +        }
>> +
>> +        float a = (float) tbs1->code.num_host_inst / tbs1->code.num_guest_inst;
>> +        float b = (float) tbs2->code.num_host_inst / tbs2->code.num_guest_inst;
>> +        c1 = a <= b ? 0 : 1;
>> +        c2 = a <= b ? 1 : 0;
>> +    }
>> +
>> +    return c1 < c2 ? 1 : c1 == c2 ? 0 : -1;
>> +}
>> +
>> +static void do_dump_coverset_info(int percentage)
>> +{
>> +    uint64_t total_exec_count = 0;
>> +    uint64_t covered_exec_count = 0;
>> +    unsigned coverset_size = 0;
>> +    int id = 1;
>> +    GList *i;
>> +
>> +    g_list_free(last_search);
>> +    last_search = NULL;
>> +
>> +    qht_iter(&tb_ctx.tb_stats, collect_tb_stats, NULL);
>> +
>> +    int sort_by = SORT_BY_HOTNESS;
>> +    last_search = g_list_sort_with_data(last_search, inverse_sort_tbs, &sort_by);
>> +
>> +    if (!last_search) {
>> +        qemu_log("No data collected yet\n");
>> +        return;
>> +    }
>> +
>> +    /* Compute total execution count for all tbs */
>> +    for (i = last_search; i; i = i->next) {
>> +        TBStatistics *tbs = (TBStatistics *) i->data;
>> +        total_exec_count += tbs->executions.total * tbs->code.num_guest_inst;
>> +    }
>> +
>> +    for (i = last_search; i; i = i->next) {
>> +        TBStatistics *tbs = (TBStatistics *) i->data;
>> +        covered_exec_count += tbs->executions.total * tbs->code.num_guest_inst;
>> +        tbs->display_id = id++;
>> +        coverset_size++;
>> +        dump_tb_header(tbs);
>> +
>> +        /* Iterate and display tbs until reach the percentage count cover */
>> +        if (((double) covered_exec_count / total_exec_count) >
>> +                ((double) percentage / 100)) {
>> +            break;
>> +        }
>> +    }
>> +
>> +    qemu_log("\n------------------------------\n");
>> +    qemu_log("# of TBs to reach %d%% of the total of guest insts exec: %u\t",
>> +                percentage, coverset_size);
>> +    qemu_log("Total of guest insts exec: %lu\n", total_exec_count);
>> +    qemu_log("\n------------------------------\n");
>> +
>> +    /* free the unused bits */
>> +    if (i) {
>> +        if (i->next) {
>> +            i->next->prev = NULL;
>> +        }
>> +        g_list_free(i->next);
>> +        i->next = NULL;
>> +    }
>> +}
>> +
>> +static void do_dump_tbs_info(int count, int sort_by)
>> +{
>> +    int id = 1;
>> +    GList *i;
>> +
>> +    g_list_free(last_search);
>> +    last_search = NULL;
>> +
>> +    qht_iter(&tb_ctx.tb_stats, collect_tb_stats, NULL);
>> +
>> +    last_search = g_list_sort_with_data(last_search, inverse_sort_tbs,
>> +                                        &sort_by);
>> +
>> +    if (!last_search) {
>> +        qemu_printf("No data collected yet!\n");
>> +        return;
>> +    }
>> +
>> +    for (i = last_search; i && count--; i = i->next) {
>> +        TBStatistics *tbs = (TBStatistics *) i->data;
>> +        tbs->display_id = id++;
>> +        dump_tb_header(tbs);
>> +    }
>> +
>> +    /* free the unused bits */
>> +    if (i) {
>> +        if (i->next) {
>> +            i->next->prev = NULL;
>> +        }
>> +        g_list_free(i->next);
>> +        i->next = NULL;
>> +    }
>> +}
>> +
>> +static void
>> +do_dump_coverset_info_safe(CPUState *cpu, run_on_cpu_data percentage)
>> +{
>> +    qemu_log_to_monitor(true);
>> +    do_dump_coverset_info(percentage.host_int);
>> +    qemu_log_to_monitor(false);
>> +}
>> +
>> +struct tbs_dump_info {
>> +    int count;
>> +    int sort_by;
>> +};
>> +
>> +static void do_dump_tbs_info_safe(CPUState *cpu, run_on_cpu_data tbdi)
>> +{
>> +    struct tbs_dump_info *info = tbdi.host_ptr;
>> +    qemu_log_to_monitor(true);
>> +    do_dump_tbs_info(info->count, info->sort_by);
>> +    qemu_log_to_monitor(false);
>> +    g_free(info);
>> +}
>> +
>> +/*
>> + * When we dump_tbs_info on a live system via the HMP we want to
>> + * ensure the system is quiessent before we start outputting stuff.
>> + * Otherwise we could pollute the output with other logging output.
>> + */
>> +void dump_coverset_info(int percentage, bool use_monitor)
>> +{
>> +    if (use_monitor) {
>> +        async_safe_run_on_cpu(first_cpu, do_dump_coverset_info_safe,
>> +                              RUN_ON_CPU_HOST_INT(percentage));
>> +    } else {
>> +        do_dump_coverset_info(percentage);
>> +    }
>> +}
>> +
>> +void dump_tbs_info(int count, int sort_by, bool use_monitor)
>> +{
>> +    if (use_monitor) {
>> +        struct tbs_dump_info *tbdi = g_new(struct tbs_dump_info, 1);
>> +        tbdi->count = count;
>> +        tbdi->sort_by = sort_by;
>> +        async_safe_run_on_cpu(first_cpu, do_dump_tbs_info_safe,
>> +                              RUN_ON_CPU_HOST_PTR(tbdi));
>> +    } else {
>> +        do_dump_tbs_info(count, sort_by);
>> +    }
>> +}
>> +
>> +static void do_tb_dump_with_statistics(TBStatistics *tbs, int log_flags)
>> +{
>> +    CPUState *cpu = current_cpu;
>> +    uint32_t cflags = curr_cflags() | CF_NOCACHE;
>> +    int old_log_flags = qemu_loglevel;
>> +    TranslationBlock *tb = NULL;
>> +
>> +    qemu_set_log(log_flags);
>> +
>> +    qemu_log("\n------------------------------\n");
>> +    dump_tb_header(tbs);
>> +
>> +    if (sigsetjmp(cpu->jmp_env, 0) == 0) {
>> +        mmap_lock();
>> +        tb = tb_gen_code(cpu, tbs->pc, tbs->cs_base, tbs->flags, cflags);
>> +        tb_phys_invalidate(tb, -1);
>> +        mmap_unlock();
>> +    } else {
>> +        /*
>> +         * The mmap_lock is dropped by tb_gen_code if it runs out of
>> +         * memory.
>> +         */
>> +        fprintf(stderr, "%s: dbg failed!\n", __func__);
>> +        assert_no_pages_locked();
>> +    }
>
> So this fails if we can't generate a TB as tcg_tb_remove(NULL) will barf
> (this was the crash I was seeing). However with two additional patches
> from my plugin series:
>
>   translate-all: use cpu_in_exclusive_work_context() in tb_flush
>   translate-all: use cpu_in_exclusive_work_context() in tb_flush
>
> We can do a simple repeat operation:
>
> @@ -420,6 +421,7 @@ static void do_tb_dump_with_statistics(TBStatistics *tbs, int log_flags)
>
>      qemu_log("\n------------------------------\n");
>      dump_tb_header(tbs);
> + redo:
>
>      if (sigsetjmp(cpu->jmp_env, 0) == 0) {
>          mmap_lock();
> @@ -433,6 +435,7 @@ static void do_tb_dump_with_statistics(TBStatistics *tbs, int log_flags)
>           */
>          fprintf(stderr, "%s: dbg failed!\n", __func__);
>          assert_no_pages_locked();
> +        goto redo;
>      }
>
> Because the tb_flush triggered by the generation code will have occured
> straight away.
>
> I'd suggest cherry picking those two patches into your next series.

Hmm not quite enough because we need to handle the case that we cannot
re-translate because the page has been mapped out.

>
>> +
>> +    qemu_set_log(old_log_flags);
>> +
>> +    tcg_tb_remove(tb);
>> +}
>> +
>> +struct tb_dump_info {
>> +    int id;
>> +    int log_flags;
>> +    bool use_monitor;
>> +};
>> +
>> +static void do_dump_tb_info_safe(CPUState *cpu, run_on_cpu_data info)
>> +{
>> +    struct tb_dump_info *tbdi = (struct tb_dump_info *) info.host_ptr;
>> +    GList *iter;
>> +
>> +    if (!last_search) {
>> +        qemu_printf("no search on record");
>> +        return;
>> +    }
>> +    qemu_log_to_monitor(tbdi->use_monitor);
>> +
>> +    for (iter = last_search; iter; iter = g_list_next(iter)) {
>> +        TBStatistics *tbs = iter->data;
>> +        if (tbs->display_id == tbdi->id) {
>> +            do_tb_dump_with_statistics(tbs, tbdi->log_flags);
>> +            break;
>> +        }
>> +    }
>> +    qemu_log_to_monitor(false);
>> +    g_free(tbdi);
>> +}
>> +
>> +void dump_tb_info(int id, int log_mask, bool use_monitor)
>> +{
>> +    struct tb_dump_info *tbdi = g_new(struct tb_dump_info, 1);
>> +
>> +    tbdi->id = id;
>> +    tbdi->log_flags = log_mask;
>> +    tbdi->use_monitor = use_monitor;
>> +
>> +    async_safe_run_on_cpu(first_cpu, do_dump_tb_info_safe,
>> +                          RUN_ON_CPU_HOST_PTR(tbdi));
>> +
>> +    /* tbdi free'd by do_dump_tb_info_safe */
>> +}
>> +
>> diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
>> index c59444c461..761c76619b 100644
>> --- a/hmp-commands-info.hx
>> +++ b/hmp-commands-info.hx
>> @@ -289,6 +289,29 @@ ETEXI
>>          .help       = "show dynamic compiler info",
>>          .cmd        = hmp_info_jit,
>>      },
>> +    {
>> +        .name       = "tbs",
>> +        .args_type  = "number:i?,sortedby:s?",
>> +        .params     = "[number sortedby]",
>> +        .help       = "show a [number] translated blocks sorted by [sortedby]"
>> +                      "sortedby opts: hotness hg",
>> +        .cmd        = hmp_info_tbs,
>> +    },
>> +    {
>> +        .name       = "tb",
>> +        .args_type  = "id:i,flags:s?",
>> +        .params     = "id [log1[,...] flags]",
>> +        .help       = "show information about one translated block by id",
>> +        .cmd        = hmp_info_tb,
>> +    },
>> +    {
>> +        .name       = "coverset",
>> +        .args_type  = "number:i?",
>> +        .params     = "[number]",
>> +        .help       = "show hottest translated blocks neccesary to cover"
>> +                      "[number]% of the execution count",
>> +        .cmd        = hmp_info_coverset,
>> +    },
>>  #endif
>>
>>  STEXI
>> diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
>> index d1debd3262..e2ab1068c3 100644
>> --- a/include/exec/tb-stats.h
>> +++ b/include/exec/tb-stats.h
>> @@ -57,6 +57,9 @@ struct TBStatistics {
>>          unsigned long uncached;
>>          unsigned long spanning;
>>      } translations;
>> +
>> +    /* HMP information - used for referring to previous search */
>> +    int display_id;
>>  };
>>
>>  bool tb_stats_cmp(const void *ap, const void *bp);
>> @@ -73,4 +76,38 @@ struct TbstatsCommand {
>>
>>  void do_hmp_tbstats_safe(CPUState *cpu, run_on_cpu_data icmd);
>>
>> +/**
>> + * dump_coverset_info: report the hottest blocks to cover n% of execution
>> + *
>> + * @percentage: cover set percentage
>> + * @use_monitor: redirect output to monitor
>> + *
>> + * Report the hottest blocks to either the log or monitor
>> + */
>> +void dump_coverset_info(int percentage, bool use_monitor);
>> +
>> +
>> +/**
>> + * dump_tbs_info: report the hottest blocks
>> + *
>> + * @count: the limit of hotblocks
>> + * @sort_by: property in which the dump will be sorted
>> + * @use_monitor: redirect output to monitor
>> + *
>> + * Report the hottest blocks to either the log or monitor
>> + */
>> +void dump_tbs_info(int count, int sort_by, bool use_monitor);
>> +
>> +/**
>> + * dump_tb_info: dump information about one TB
>> + *
>> + * @id: the display id of the block (from previous search)
>> + * @mask: the temporary logging mask
>> + * @Use_monitor: redirect output to monitor
>> + *
>> + * Re-run a translation of a block at addr for the purposes of debug output
>> + */
>> +void dump_tb_info(int id, int log_mask, bool use_monitor);
>> +
>> +
>>  #endif
>> diff --git a/include/qemu/log-for-trace.h b/include/qemu/log-for-trace.h
>> index 2f0a5b080e..d65eb83037 100644
>> --- a/include/qemu/log-for-trace.h
>> +++ b/include/qemu/log-for-trace.h
>> @@ -21,6 +21,8 @@
>>  /* Private global variable, don't use */
>>  extern int qemu_loglevel;
>>
>> +extern int32_t max_num_hot_tbs_to_dump;
>> +
>>  #define LOG_TRACE          (1 << 15)
>>
>>  /* Returns true if a bit is set in the current loglevel mask */
>> diff --git a/include/qemu/log.h b/include/qemu/log.h
>> index 47071d72c7..27ba267faa 100644
>> --- a/include/qemu/log.h
>> +++ b/include/qemu/log.h
>> @@ -114,6 +114,7 @@ typedef struct QEMULogItem {
>>  extern const QEMULogItem qemu_log_items[];
>>
>>  void qemu_set_log(int log_flags);
>> +void qemu_log_to_monitor(bool enable);
>>  void qemu_log_needs_buffers(void);
>>  void qemu_set_log_filename(const char *filename, Error **errp);
>>  void qemu_set_dfilter_ranges(const char *ranges, Error **errp);
>> diff --git a/linux-user/exit.c b/linux-user/exit.c
>> index bdda720553..7226104959 100644
>> --- a/linux-user/exit.c
>> +++ b/linux-user/exit.c
>> @@ -28,6 +28,10 @@ extern void __gcov_dump(void);
>>
>>  void preexit_cleanup(CPUArchState *env, int code)
>>  {
>> +    if (tb_stats_collection_enabled()) {
>> +        dump_tbs_info(max_num_hot_tbs_to_dump, SORT_BY_HOTNESS, false);
>> +    }
>> +
>>  #ifdef TARGET_GPROF
>>          _mcleanup();
>>  #endif
>> diff --git a/monitor/misc.c b/monitor/misc.c
>> index aea9b0db4f..78ada8aa0e 100644
>> --- a/monitor/misc.c
>> +++ b/monitor/misc.c
>> @@ -499,6 +499,77 @@ static void hmp_tbstats(Monitor *mon, const QDict *qdict)
>>
>>  }
>>
>> +static void hmp_info_tbs(Monitor *mon, const QDict *qdict)
>> +{
>> +    int n;
>> +    const char *s = NULL;
>> +    if (!tcg_enabled()) {
>> +        error_report("TB information is only available with accel=tcg");
>> +        return;
>> +    }
>> +    if (!tb_ctx.tb_stats.map) {
>> +        error_report("no TB information recorded");
>> +        return;
>> +    }
>> +
>> +    n = qdict_get_try_int(qdict, "number", 10);
>> +    s = qdict_get_try_str(qdict, "sortedby");
>> +
>> +    int sortedby = 0;
>> +    if (s == NULL || strcmp(s, "hotness") == 0) {
>> +        sortedby = SORT_BY_HOTNESS;
>> +    } else if (strcmp(s, "hg") == 0) {
>> +        sortedby = SORT_BY_HG;
>> +    } else if (strcmp(s, "spills") == 0) {
>> +        sortedby = SORT_BY_SPILLS;
>> +    }
>> +
>> +    dump_tbs_info(n, sortedby, true);
>> +}
>> +
>> +static void hmp_info_tb(Monitor *mon, const QDict *qdict)
>> +{
>> +    const int id = qdict_get_int(qdict, "id");
>> +    const char *flags = qdict_get_try_str(qdict, "flags");
>> +    int mask;
>> +
>> +    if (!tcg_enabled()) {
>> +        error_report("TB information is only available with accel=tcg");
>> +        return;
>> +    }
>> +
>> +    mask = flags ? qemu_str_to_log_mask(flags) : CPU_LOG_TB_IN_ASM;
>> +
>> +    if (!mask) {
>> +        help_cmd(mon, "log");
>> +        return;
>> +    }
>> +
>> +    dump_tb_info(id, mask, true);
>> +}
>> +
>> +static void hmp_info_coverset(Monitor *mon, const QDict *qdict)
>> +{
>> +    int n;
>> +    if (!tcg_enabled()) {
>> +        error_report("TB information is only available with accel=tcg");
>> +        return;
>> +    }
>> +    if (!tb_stats_collection_enabled()) {
>> +        error_report("TB information not being recorded");
>> +        return;
>> +    }
>> +
>> +    n = qdict_get_try_int(qdict, "number", 90);
>> +
>> +    if (n < 0 || n > 100) {
>> +        error_report("Coverset percentage should be between 0 and 100");
>> +        return;
>> +    }
>> +
>> +    dump_coverset_info(n, true);
>> +}
>> +
>>  static void hmp_info_jit(Monitor *mon, const QDict *qdict)
>>  {
>>      if (!tcg_enabled()) {
>> diff --git a/util/log.c b/util/log.c
>> index 8109d19afb..7bfcbe3703 100644
>> --- a/util/log.c
>> +++ b/util/log.c
>> @@ -19,6 +19,7 @@
>>
>>  #include "qemu/osdep.h"
>>  #include "qemu/log.h"
>> +#include "qemu/qemu-print.h"
>>  #include "qemu/range.h"
>>  #include "qemu/error-report.h"
>>  #include "qapi/error.h"
>> @@ -31,25 +32,34 @@ int qemu_loglevel;
>>  static int log_append = 0;
>>  static GArray *debug_regions;
>>  int32_t max_num_hot_tbs_to_dump;
>> +static bool to_monitor;
>>
>>  /* Return the number of characters emitted.  */
>>  int qemu_log(const char *fmt, ...)
>>  {
>>      int ret = 0;
>> -    if (qemu_logfile) {
>> -        va_list ap;
>> -        va_start(ap, fmt);
>> +    va_list ap;
>> +    va_start(ap, fmt);
>> +
>> +    if (to_monitor) {
>> +        ret = qemu_vprintf(fmt, ap);
>> +    } else if (qemu_logfile) {
>>          ret = vfprintf(qemu_logfile, fmt, ap);
>> -        va_end(ap);
>> +    }
>> +    va_end(ap);
>>
>> -        /* Don't pass back error results.  */
>> -        if (ret < 0) {
>> -            ret = 0;
>> -        }
>> +    /* Don't pass back error results.  */
>> +    if (ret < 0) {
>> +        ret = 0;
>>      }
>>      return ret;
>>  }
>>
>> +void qemu_log_to_monitor(bool enable)
>> +{
>> +    to_monitor = enable;
>> +}
>> +
>>  static bool log_uses_own_buffers;
>>
>>  /* enable or disable low levels log */


--
Alex Bennée


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2019-07-29 15:27 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-20  1:02 [Qemu-devel] [PATCH v4 0/7] Measure Tiny Code Generation Quality vandersonmr
2019-07-20  1:02 ` [Qemu-devel] [PATCH v4 1/7] accel: introducing TBStatistics structure vandersonmr
2019-07-26 11:56   ` Alex Bennée
2019-07-20  1:02 ` [Qemu-devel] [PATCH v4 2/7] accel: collecting TB execution count vandersonmr
2019-07-26 13:38   ` Alex Bennée
2019-07-20  1:02 ` [Qemu-devel] [PATCH v4 3/7] accel: collecting JIT statistics vandersonmr
2019-07-26 14:46   ` Alex Bennée
2019-07-20  1:02 ` [Qemu-devel] [PATCH v4 4/7] accel: replacing part of CONFIG_PROFILER with TBStats vandersonmr
2019-07-26 15:25   ` Alex Bennée
2019-07-20  1:02 ` [Qemu-devel] [PATCH v4 5/7] log: adding -d tb_stats to control tbstats vandersonmr
2019-07-26 16:20   ` Alex Bennée
2019-07-20  1:02 ` [Qemu-devel] [PATCH v4 6/7] monitor: adding tb_stats hmp command vandersonmr
2019-07-26 16:57   ` Alex Bennée
2019-07-20  1:02 ` [Qemu-devel] [PATCH v4 7/7] monitor: adding info tbs, tb, and coverset vandersonmr
2019-07-26 18:17   ` Alex Bennée
2019-07-29 11:01   ` Alex Bennée
2019-07-29 15:20     ` Alex Bennée
2019-07-26 12:51 ` [Qemu-devel] [PATCH v4 0/7] Measure Tiny Code Generation Quality Alex Bennée

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).