qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v2 0/4] dumping hot TBs
@ 2019-06-24  5:54 vandersonmr
  2019-06-24  5:54 ` [Qemu-devel] [PATCH v2 1/4] add and link a statistic struct to TBs vandersonmr
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: vandersonmr @ 2019-06-24  5:54 UTC (permalink / raw)
  To: qemu-devel

It adds a new structure which is linked with each TBs and stores its statistics.
We collect the execution count of the TBs and store in this new structure.
The information stored in this new struct is then used to support a new
command line -d hot_tbs:N which dumps information of the N most hot TBs.

Different from v1 now the execution count is being updated directly
from the TBStatistics so we do not need to copy the data when flushing.

[PATCH v2 1/4] add and link a statistic struct to TBs
[PATCH v2 2/4] Adding an optional tb execution counter.
[PATCH v2 3/4] Introduce dump of hot TBs
[PATCH v2 4/4] adding -d hot_tbs:limit command line option



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Qemu-devel] [PATCH v2 1/4] add and link a statistic struct to TBs
  2019-06-24  5:54 [Qemu-devel] [PATCH v2 0/4] dumping hot TBs vandersonmr
@ 2019-06-24  5:54 ` vandersonmr
  2019-06-24 15:06   ` Alex Bennée
  2019-06-24  5:54 ` [Qemu-devel] [PATCH v2 2/4] Adding an optional tb execution counter vandersonmr
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: vandersonmr @ 2019-06-24  5:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, vandersonmr, Richard Henderson

We want to store statistics for each TB even after flushes.
We do not want to modify or grow the TB struct.
So we create a new struct to contain this statistics and
link it to each TB while they are created.

Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
---
 accel/tcg/translate-all.c | 40 +++++++++++++++++++++++++++++++++++++++
 include/exec/exec-all.h   | 21 ++++++++++++++++++++
 include/exec/tb-context.h |  1 +
 include/qemu/log.h        |  1 +
 4 files changed, 63 insertions(+)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 5d1e08b169..f7e99f90e0 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1118,6 +1118,18 @@ static inline void code_gen_alloc(size_t tb_size)
     }
 }
 
+static gint statistics_cmp(gconstpointer p1, gconstpointer p2) 
+{
+    const TBStatistics *a = (TBStatistics *) p1;
+    const TBStatistics *b = (TBStatistics *) p2;
+
+    return (a->pc == b->pc && 
+		   a->cs_base == b->cs_base &&
+		   a->flags == b->flags && 
+	       a->page_addr[0] == b->page_addr[0] &&
+    	   a->page_addr[1] == b->page_addr[1]) ? 0 : 1; 
+}
+
 static bool tb_cmp(const void *ap, const void *bp)
 {
     const TranslationBlock *a = ap;
@@ -1586,6 +1598,29 @@ static inline void tb_page_add(PageDesc *p, TranslationBlock *tb,
 #endif
 }
 
+static void tb_insert_statistics_structure(TranslationBlock *tb) {
+    TBStatistics *new_stats = g_new0(TBStatistics, 1);
+    new_stats->pc = tb->pc;
+    new_stats->cs_base = tb->cs_base;
+    new_stats->flags = tb->flags;
+    new_stats->page_addr[0] = tb->page_addr[0];
+    new_stats->page_addr[1] = tb->page_addr[1];
+
+	GList *lookup_result = g_list_find_custom(tb_ctx.tb_statistics, new_stats, statistics_cmp);
+
+	if (lookup_result) {
+		/* If there is already a TBStatistic for this TB from a previous flush
+		* then just make the new TB point to the older TBStatistic
+		*/
+		free(new_stats);
+    	tb->tb_stats = lookup_result->data;
+	} else {
+		/* If not, then points to the new tb_statistics and add it to the hash */
+		tb->tb_stats = new_stats;
+    	tb_ctx.tb_statistics = g_list_prepend(tb_ctx.tb_statistics, new_stats);
+	}
+}
+
 /* add a new TB and link it to the physical page tables. phys_page2 is
  * (-1) to indicate that only one page contains the TB.
  *
@@ -1636,6 +1671,11 @@ tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
         void *existing_tb = NULL;
         uint32_t h;
 
+        if (qemu_loglevel_mask(CPU_LOG_HOT_TBS)) {
+        	/* create and link to its TB a structure to store statistics */
+        	tb_insert_statistics_structure(tb);
+		}
+
         /* add in the hash table */
         h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->cflags & CF_HASH_MASK,
                          tb->trace_vcpu_dstate);
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 16034ee651..359100ef3b 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -324,6 +324,24 @@ static inline void tlb_flush_by_mmuidx_all_cpus_synced(CPUState *cpu,
 #define CODE_GEN_AVG_BLOCK_SIZE 150
 #endif
 
+typedef struct TBStatistics TBStatistics;                                                                                                               
+
+/* 
+ * This struct stores statistics such as execution count of the TranslationBlocks.
+ * Each TB has its own TBStatistics. TBStatistics is suppose to live even after 
+ * flushes.
+ */
+struct TBStatistics {                                                                                                                                   
+    target_ulong pc;                                                                                                                                    
+    target_ulong cs_base;                                                                                                                               
+    uint32_t flags;                                                                                                                                     
+    tb_page_addr_t page_addr[2];                                                                                                                        
+
+    // total number of times that the related TB have being executed                                                                                    
+    uint32_t exec_count;                                                                                                                                
+    uint32_t exec_count_overflows;                                                                                                                      
+};  
+
 /*
  * Translation Cache-related fields of a TB.
  * This struct exists just for convenience; we keep track of TB's in a binary
@@ -403,6 +421,9 @@ struct TranslationBlock {
     uintptr_t jmp_list_head;
     uintptr_t jmp_list_next[2];
     uintptr_t jmp_dest[2];
+
+    // Pointer to a struct where statistics from the TB is stored
+    TBStatistics *tb_stats;
 };
 
 extern bool parallel_cpus;
diff --git a/include/exec/tb-context.h b/include/exec/tb-context.h
index feb585e0a7..a78ce92e60 100644
--- a/include/exec/tb-context.h
+++ b/include/exec/tb-context.h
@@ -35,6 +35,7 @@ struct TBContext {
 
     /* statistics */
     unsigned tb_flush_count;
+    GList *tb_statistics;
 };
 
 extern TBContext tb_ctx;
diff --git a/include/qemu/log.h b/include/qemu/log.h
index b097a6cae1..2fca65dd01 100644
--- a/include/qemu/log.h
+++ b/include/qemu/log.h
@@ -45,6 +45,7 @@ static inline bool qemu_log_separate(void)
 /* LOG_TRACE (1 << 15) is defined in log-for-trace.h */
 #define CPU_LOG_TB_OP_IND  (1 << 16)
 #define CPU_LOG_TB_FPU     (1 << 17)
+#define CPU_LOG_HOT_TBS    (1 << 18)
 
 /* Lock output for a series of related logs.  Since this is not needed
  * for a single qemu_log / qemu_log_mask / qemu_log_mask_and_addr, we
-- 
2.22.0



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [Qemu-devel] [PATCH v2 2/4] Adding an optional tb execution counter.
  2019-06-24  5:54 [Qemu-devel] [PATCH v2 0/4] dumping hot TBs vandersonmr
  2019-06-24  5:54 ` [Qemu-devel] [PATCH v2 1/4] add and link a statistic struct to TBs vandersonmr
@ 2019-06-24  5:54 ` vandersonmr
  2019-06-25  9:51   ` Alex Bennée
  2019-06-24  5:54 ` [Qemu-devel] [PATCH v2 3/4] Introduce dump of hot TBs vandersonmr
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: vandersonmr @ 2019-06-24  5:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, vandersonmr, Richard Henderson

We collect the number of times each TB is executed
and store it in the its TBStatistics.
We also count the number of times the execution counter overflows.

Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
---
 accel/tcg/tcg-runtime.c   | 10 ++++++++++
 accel/tcg/tcg-runtime.h   |  2 ++
 accel/tcg/translator.c    |  1 +
 include/exec/gen-icount.h |  9 +++++++++
 4 files changed, 22 insertions(+)

diff --git a/accel/tcg/tcg-runtime.c b/accel/tcg/tcg-runtime.c
index 8a1e408e31..9888a7fce8 100644
--- a/accel/tcg/tcg-runtime.c
+++ b/accel/tcg/tcg-runtime.c
@@ -167,3 +167,13 @@ void HELPER(exit_atomic)(CPUArchState *env)
 {
     cpu_loop_exit_atomic(env_cpu(env), GETPC());
 }
+
+void HELPER(inc_exec_freq)(void *ptr)
+{
+    TranslationBlock *tb = (TranslationBlock*) ptr;
+    // if overflows, then reset the execution counter and increment the overflow counter
+    if (atomic_cmpxchg(&tb->tb_stats->exec_count, 0xFFFFFFFF, 0) == 0xFFFFFFFF) {
+        atomic_inc(&tb->tb_stats->exec_count_overflows);
+    }
+    atomic_inc(&tb->tb_stats->exec_count);
+}
diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h
index 4fa61b49b4..bf0b75dbe8 100644
--- a/accel/tcg/tcg-runtime.h
+++ b/accel/tcg/tcg-runtime.h
@@ -28,6 +28,8 @@ DEF_HELPER_FLAGS_1(lookup_tb_ptr, TCG_CALL_NO_WG_SE, ptr, env)
 
 DEF_HELPER_FLAGS_1(exit_atomic, TCG_CALL_NO_WG, noreturn, env)
 
+DEF_HELPER_FLAGS_1(inc_exec_freq, TCG_CALL_NO_RWG, void, ptr)
+
 #ifdef CONFIG_SOFTMMU
 
 DEF_HELPER_FLAGS_5(atomic_cmpxchgb, TCG_CALL_NO_WG,
diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
index 9226a348a3..cc06070e7e 100644
--- a/accel/tcg/translator.c
+++ b/accel/tcg/translator.c
@@ -54,6 +54,7 @@ void translator_loop(const TranslatorOps *ops, DisasContextBase *db,
     gen_tb_start(db->tb);
     ops->tb_start(db, cpu);
     tcg_debug_assert(db->is_jmp == DISAS_NEXT);  /* no early exit */
+    gen_tb_exec_count(tb);
 
     while (true) {
         db->num_insns++;
diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h
index f7669b6841..6d38b6e1fb 100644
--- a/include/exec/gen-icount.h
+++ b/include/exec/gen-icount.h
@@ -7,6 +7,15 @@
 
 static TCGOp *icount_start_insn;
 
+static inline void gen_tb_exec_count(TranslationBlock *tb)
+{
+  if (qemu_loglevel_mask(CPU_LOG_HOT_TBS)) {
+    TCGv_ptr tb_ptr = tcg_const_ptr(tb);
+    gen_helper_inc_exec_freq(tb_ptr);
+    tcg_temp_free_ptr(tb_ptr);
+  }
+}
+
 static inline void gen_tb_start(TranslationBlock *tb)
 {
     TCGv_i32 count, imm;
-- 
2.22.0



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [Qemu-devel] [PATCH v2 3/4] Introduce dump of hot TBs
  2019-06-24  5:54 [Qemu-devel] [PATCH v2 0/4] dumping hot TBs vandersonmr
  2019-06-24  5:54 ` [Qemu-devel] [PATCH v2 1/4] add and link a statistic struct to TBs vandersonmr
  2019-06-24  5:54 ` [Qemu-devel] [PATCH v2 2/4] Adding an optional tb execution counter vandersonmr
@ 2019-06-24  5:54 ` vandersonmr
  2019-06-25 10:38   ` Alex Bennée
  2019-06-24  5:54 ` [Qemu-devel] [PATCH v2 4/4] adding -d hot_tbs:limit command line option vandersonmr
  2019-06-24  6:12 ` [Qemu-devel] [PATCH v2 0/4] dumping hot TBs no-reply
  4 siblings, 1 reply; 14+ messages in thread
From: vandersonmr @ 2019-06-24  5:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, vandersonmr, Richard Henderson

Adding a function to dump the Nth hottest TBs.
The block PC, execution count and ops is dump to the log.

Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
---
 accel/tcg/translate-all.c | 45 +++++++++++++++++++++++++++++++++++++++
 include/exec/exec-all.h   |  2 ++
 2 files changed, 47 insertions(+)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index f7e99f90e0..c3d9ecb2c4 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1240,6 +1240,27 @@ static gboolean tb_host_size_iter(gpointer key, gpointer value, gpointer data)
     return false;
 }
 
+static void tb_dump_statistics(TBStatistics *tbs)
+{
+    uint32_t cflags = curr_cflags() | CF_NOCACHE;
+    int old_log_flags = qemu_loglevel;
+
+    qemu_set_log(CPU_LOG_TB_OP_OPT);
+
+    qemu_log("\n------------------------------\n");
+    qemu_log("Translation Block PC: \t0x"TARGET_FMT_lx "\n", tbs->pc);
+    qemu_log("Execution Count: \t%lu\n\n", (uint64_t) (tbs->exec_count + tbs->exec_count_overflows*0xFFFFFFFF));
+
+    mmap_lock();
+    TranslationBlock *tb = tb_gen_code(current_cpu, tbs->pc, tbs->cs_base, tbs->flags, cflags);
+    tb_phys_invalidate(tb, -1);
+    mmap_unlock();
+
+    qemu_set_log(old_log_flags);
+
+    tcg_tb_remove(tb);
+}
+
 /* flush all the translation blocks */
 static void do_tb_flush(CPUState *cpu, run_on_cpu_data tb_flush_count)
 {
@@ -1276,6 +1297,30 @@ done:
     mmap_unlock();
 }
 
+static gint inverse_sort_tbs(gconstpointer p1, gconstpointer p2) 
+{
+    const TBStatistics *tbs1 = (TBStatistics *) p1;
+    const TBStatistics *tbs2 = (TBStatistics *) p2;
+    uint64_t p1_count = (uint64_t) (tbs1->exec_count + tbs1->exec_count_overflows*0xFFFFFFFF);
+    uint64_t p2_count = (uint64_t) (tbs2->exec_count + tbs2->exec_count_overflows*0xFFFFFFFF);
+
+    return p1_count < p2_count ? 1 : p1_count == p2_count ? 0 : -1;
+}
+
+void tb_dump_exec_freq(uint32_t max_tbs_to_print)
+{
+    tb_ctx.tb_statistics = g_list_sort(tb_ctx.tb_statistics, inverse_sort_tbs);
+
+    uint32_t tbs_printed = 0;
+    for (GList *i = tb_ctx.tb_statistics; i != NULL; i = i->next) {
+        tbs_printed++;
+	    tb_dump_statistics((TBStatistics *) i->data);
+        if (max_tbs_to_print != 0 && tbs_printed >= max_tbs_to_print) {
+            break;
+        }
+    }
+}
+
 void tb_flush(CPUState *cpu)
 {
     if (tcg_enabled()) {
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 359100ef3b..0547db0271 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -533,4 +533,6 @@ hwaddr memory_region_section_get_iotlb(CPUState *cpu,
 /* vl.c */
 extern int singlestep;
 
+void tb_dump_exec_freq(uint32_t);
+
 #endif
-- 
2.22.0



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [Qemu-devel] [PATCH v2 4/4] adding -d hot_tbs:limit command line option
  2019-06-24  5:54 [Qemu-devel] [PATCH v2 0/4] dumping hot TBs vandersonmr
                   ` (2 preceding siblings ...)
  2019-06-24  5:54 ` [Qemu-devel] [PATCH v2 3/4] Introduce dump of hot TBs vandersonmr
@ 2019-06-24  5:54 ` vandersonmr
  2019-06-25 12:13   ` Alex Bennée
  2019-06-24  6:12 ` [Qemu-devel] [PATCH v2 0/4] dumping hot TBs no-reply
  4 siblings, 1 reply; 14+ messages in thread
From: vandersonmr @ 2019-06-24  5:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: Riku Voipio, vandersonmr, Laurent Vivier

add option to dump the N most hot TB blocks.
-d hot_tbs:N

Signed-off-by: vandersonmr <vandersonmr2@gmail.com>
---
 include/qemu/log-for-trace.h | 2 ++
 linux-user/exit.c            | 3 +++
 util/log.c                   | 9 +++++++++
 3 files changed, 14 insertions(+)

diff --git a/include/qemu/log-for-trace.h b/include/qemu/log-for-trace.h
index 2f0a5b080e..d65eb83037 100644
--- a/include/qemu/log-for-trace.h
+++ b/include/qemu/log-for-trace.h
@@ -21,6 +21,8 @@
 /* Private global variable, don't use */
 extern int qemu_loglevel;
 
+extern int32_t max_num_hot_tbs_to_dump;
+
 #define LOG_TRACE          (1 << 15)
 
 /* Returns true if a bit is set in the current loglevel mask */
diff --git a/linux-user/exit.c b/linux-user/exit.c
index bdda720553..08b86dfd61 100644
--- a/linux-user/exit.c
+++ b/linux-user/exit.c
@@ -28,6 +28,9 @@ extern void __gcov_dump(void);
 
 void preexit_cleanup(CPUArchState *env, int code)
 {
+    if (qemu_loglevel_mask(CPU_LOG_HOT_TBS)) {
+        tb_dump_exec_freq(max_num_hot_tbs_to_dump);
+    }
 #ifdef TARGET_GPROF
         _mcleanup();
 #endif
diff --git a/util/log.c b/util/log.c
index 1d1b33f7d9..e71c663143 100644
--- a/util/log.c
+++ b/util/log.c
@@ -30,6 +30,7 @@ FILE *qemu_logfile;
 int qemu_loglevel;
 static int log_append = 0;
 static GArray *debug_regions;
+int32_t max_num_hot_tbs_to_dump;
 
 /* Return the number of characters emitted.  */
 int qemu_log(const char *fmt, ...)
@@ -273,6 +274,9 @@ const QEMULogItem qemu_log_items[] = {
     { CPU_LOG_TB_NOCHAIN, "nochain",
       "do not chain compiled TBs so that \"exec\" and \"cpu\" show\n"
       "complete traces" },
+    { CPU_LOG_HOT_TBS, "hot_tbs(:limit)",
+      "show TBs (until given a limit) ordered by their hotness.\n"
+      "(if no limit is given, show all)" },
     { 0, NULL, NULL },
 };
 
@@ -294,6 +298,11 @@ int qemu_str_to_log_mask(const char *str)
             trace_enable_events((*tmp) + 6);
             mask |= LOG_TRACE;
 #endif
+        } else if (g_str_has_prefix(*tmp, "hot_tbs")) {
+            if (g_str_has_prefix(*tmp, "hot_tbs:") && (*tmp)[8] != '\0') {
+                max_num_hot_tbs_to_dump = atoi((*tmp) + 8);
+            }
+            mask |= CPU_LOG_HOT_TBS;
         } else {
             for (item = qemu_log_items; item->mask != 0; item++) {
                 if (g_str_equal(*tmp, item->name)) {
-- 
2.22.0



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/4] dumping hot TBs
  2019-06-24  5:54 [Qemu-devel] [PATCH v2 0/4] dumping hot TBs vandersonmr
                   ` (3 preceding siblings ...)
  2019-06-24  5:54 ` [Qemu-devel] [PATCH v2 4/4] adding -d hot_tbs:limit command line option vandersonmr
@ 2019-06-24  6:12 ` no-reply
  4 siblings, 0 replies; 14+ messages in thread
From: no-reply @ 2019-06-24  6:12 UTC (permalink / raw)
  To: vandersonmr2; +Cc: qemu-devel

Patchew URL: https://patchew.org/QEMU/20190624055442.2973-1-vandersonmr2@gmail.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [Qemu-devel] [PATCH v2 0/4] dumping hot TBs
Type: series
Message-id: 20190624055442.2973-1-vandersonmr2@gmail.com

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

From https://github.com/patchew-project/qemu
 * [new tag]               patchew/20190624055442.2973-1-vandersonmr2@gmail.com -> patchew/20190624055442.2973-1-vandersonmr2@gmail.com
Switched to a new branch 'test'
231c2807dd adding -d hot_tbs:limit command line option
d059715ded Introduce dump of hot TBs
25fe42cfad Adding an optional tb execution counter.
9d3e9b25e0 add and link a statistic struct to TBs

=== OUTPUT BEGIN ===
1/4 Checking commit 9d3e9b25e036 (add and link a statistic struct to TBs)
ERROR: trailing whitespace
#23: FILE: accel/tcg/translate-all.c:1121:
+static gint statistics_cmp(gconstpointer p1, gconstpointer p2) $

ERROR: trailing whitespace
#28: FILE: accel/tcg/translate-all.c:1126:
+    return (a->pc == b->pc && $

ERROR: code indent should never use tabs
#29: FILE: accel/tcg/translate-all.c:1127:
+^I^I   a->cs_base == b->cs_base &&$

ERROR: trailing whitespace
#30: FILE: accel/tcg/translate-all.c:1128:
+^I^I   a->flags == b->flags && $

ERROR: code indent should never use tabs
#30: FILE: accel/tcg/translate-all.c:1128:
+^I^I   a->flags == b->flags && $

ERROR: code indent should never use tabs
#31: FILE: accel/tcg/translate-all.c:1129:
+^I       a->page_addr[0] == b->page_addr[0] &&$

ERROR: trailing whitespace
#32: FILE: accel/tcg/translate-all.c:1130:
+    ^I   a->page_addr[1] == b->page_addr[1]) ? 0 : 1; $

ERROR: code indent should never use tabs
#32: FILE: accel/tcg/translate-all.c:1130:
+    ^I   a->page_addr[1] == b->page_addr[1]) ? 0 : 1; $

ERROR: open brace '{' following function declarations go on the next line
#42: FILE: accel/tcg/translate-all.c:1601:
+static void tb_insert_statistics_structure(TranslationBlock *tb) {

ERROR: line over 90 characters
#50: FILE: accel/tcg/translate-all.c:1609:
+       GList *lookup_result = g_list_find_custom(tb_ctx.tb_statistics, new_stats, statistics_cmp);

ERROR: code indent should never use tabs
#50: FILE: accel/tcg/translate-all.c:1609:
+^IGList *lookup_result = g_list_find_custom(tb_ctx.tb_statistics, new_stats, statistics_cmp);$

ERROR: code indent should never use tabs
#52: FILE: accel/tcg/translate-all.c:1611:
+^Iif (lookup_result) {$

WARNING: line over 80 characters
#53: FILE: accel/tcg/translate-all.c:1612:
+               /* If there is already a TBStatistic for this TB from a previous flush

ERROR: code indent should never use tabs
#53: FILE: accel/tcg/translate-all.c:1612:
+^I^I/* If there is already a TBStatistic for this TB from a previous flush$

WARNING: Block comments use a leading /* on a separate line
#53: FILE: accel/tcg/translate-all.c:1612:
+               /* If there is already a TBStatistic for this TB from a previous flush

ERROR: code indent should never use tabs
#54: FILE: accel/tcg/translate-all.c:1613:
+^I^I* then just make the new TB point to the older TBStatistic$

WARNING: Block comments should align the * on each line
#54: FILE: accel/tcg/translate-all.c:1613:
+               /* If there is already a TBStatistic for this TB from a previous flush
+               * then just make the new TB point to the older TBStatistic

ERROR: code indent should never use tabs
#55: FILE: accel/tcg/translate-all.c:1614:
+^I^I*/$

ERROR: code indent should never use tabs
#56: FILE: accel/tcg/translate-all.c:1615:
+^I^Ifree(new_stats);$

ERROR: code indent should never use tabs
#57: FILE: accel/tcg/translate-all.c:1616:
+    ^Itb->tb_stats = lookup_result->data;$

ERROR: code indent should never use tabs
#58: FILE: accel/tcg/translate-all.c:1617:
+^I} else {$

WARNING: line over 80 characters
#59: FILE: accel/tcg/translate-all.c:1618:
+               /* If not, then points to the new tb_statistics and add it to the hash */

ERROR: code indent should never use tabs
#59: FILE: accel/tcg/translate-all.c:1618:
+^I^I/* If not, then points to the new tb_statistics and add it to the hash */$

ERROR: code indent should never use tabs
#60: FILE: accel/tcg/translate-all.c:1619:
+^I^Itb->tb_stats = new_stats;$

ERROR: code indent should never use tabs
#61: FILE: accel/tcg/translate-all.c:1620:
+    ^Itb_ctx.tb_statistics = g_list_prepend(tb_ctx.tb_statistics, new_stats);$

ERROR: code indent should never use tabs
#62: FILE: accel/tcg/translate-all.c:1621:
+^I}$

ERROR: code indent should never use tabs
#73: FILE: accel/tcg/translate-all.c:1675:
+        ^I/* create and link to its TB a structure to store statistics */$

ERROR: code indent should never use tabs
#74: FILE: accel/tcg/translate-all.c:1676:
+        ^Itb_insert_statistics_structure(tb);$

ERROR: code indent should never use tabs
#75: FILE: accel/tcg/translate-all.c:1677:
+^I^I}$

ERROR: trailing whitespace
#88: FILE: include/exec/exec-all.h:327:
+typedef struct TBStatistics TBStatistics;                                                                                                               $

ERROR: line over 90 characters
#88: FILE: include/exec/exec-all.h:327:
+typedef struct TBStatistics TBStatistics;                                                                                                               

ERROR: trailing whitespace
#90: FILE: include/exec/exec-all.h:329:
+/* $

WARNING: line over 80 characters
#91: FILE: include/exec/exec-all.h:330:
+ * This struct stores statistics such as execution count of the TranslationBlocks.

ERROR: trailing whitespace
#92: FILE: include/exec/exec-all.h:331:
+ * Each TB has its own TBStatistics. TBStatistics is suppose to live even after $

ERROR: trailing whitespace
#95: FILE: include/exec/exec-all.h:334:
+struct TBStatistics {                                                                                                                                   $

ERROR: line over 90 characters
#95: FILE: include/exec/exec-all.h:334:
+struct TBStatistics {                                                                                                                                   

ERROR: trailing whitespace
#96: FILE: include/exec/exec-all.h:335:
+    target_ulong pc;                                                                                                                                    $

ERROR: line over 90 characters
#96: FILE: include/exec/exec-all.h:335:
+    target_ulong pc;                                                                                                                                    

ERROR: trailing whitespace
#97: FILE: include/exec/exec-all.h:336:
+    target_ulong cs_base;                                                                                                                               $

ERROR: line over 90 characters
#97: FILE: include/exec/exec-all.h:336:
+    target_ulong cs_base;                                                                                                                               

ERROR: trailing whitespace
#98: FILE: include/exec/exec-all.h:337:
+    uint32_t flags;                                                                                                                                     $

ERROR: line over 90 characters
#98: FILE: include/exec/exec-all.h:337:
+    uint32_t flags;                                                                                                                                     

ERROR: trailing whitespace
#99: FILE: include/exec/exec-all.h:338:
+    tb_page_addr_t page_addr[2];                                                                                                                        $

ERROR: line over 90 characters
#99: FILE: include/exec/exec-all.h:338:
+    tb_page_addr_t page_addr[2];                                                                                                                        

ERROR: trailing whitespace
#101: FILE: include/exec/exec-all.h:340:
+    // total number of times that the related TB have being executed                                                                                    $

ERROR: line over 90 characters
#101: FILE: include/exec/exec-all.h:340:
+    // total number of times that the related TB have being executed                                                                                    

ERROR: do not use C99 // comments
#101: FILE: include/exec/exec-all.h:340:
+    // total number of times that the related TB have being executed                                                                                    

ERROR: trailing whitespace
#102: FILE: include/exec/exec-all.h:341:
+    uint32_t exec_count;                                                                                                                                $

ERROR: line over 90 characters
#102: FILE: include/exec/exec-all.h:341:
+    uint32_t exec_count;                                                                                                                                

ERROR: trailing whitespace
#103: FILE: include/exec/exec-all.h:342:
+    uint32_t exec_count_overflows;                                                                                                                      $

ERROR: line over 90 characters
#103: FILE: include/exec/exec-all.h:342:
+    uint32_t exec_count_overflows;                                                                                                                      

ERROR: trailing whitespace
#104: FILE: include/exec/exec-all.h:343:
+};  $

ERROR: do not use C99 // comments
#114: FILE: include/exec/exec-all.h:425:
+    // Pointer to a struct where statistics from the TB is stored

total: 48 errors, 5 warnings, 105 lines checked

Patch 1/4 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

2/4 Checking commit 25fe42cfad08 (Adding an optional tb execution counter.)
ERROR: "(foo*)" should be "(foo *)"
#25: FILE: accel/tcg/tcg-runtime.c:173:
+    TranslationBlock *tb = (TranslationBlock*) ptr;

WARNING: line over 80 characters
#26: FILE: accel/tcg/tcg-runtime.c:174:
+    // if overflows, then reset the execution counter and increment the overflow counter

ERROR: do not use C99 // comments
#26: FILE: accel/tcg/tcg-runtime.c:174:
+    // if overflows, then reset the execution counter and increment the overflow counter

WARNING: line over 80 characters
#27: FILE: accel/tcg/tcg-runtime.c:175:
+    if (atomic_cmpxchg(&tb->tb_stats->exec_count, 0xFFFFFFFF, 0) == 0xFFFFFFFF) {

total: 2 errors, 2 warnings, 43 lines checked

Patch 2/4 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

3/4 Checking commit d059715deda0 (Introduce dump of hot TBs)
ERROR: line over 90 characters
#30: FILE: accel/tcg/translate-all.c:1252:
+    qemu_log("Execution Count: \t%lu\n\n", (uint64_t) (tbs->exec_count + tbs->exec_count_overflows*0xFFFFFFFF));

ERROR: spaces required around that '*' (ctx:VxV)
#30: FILE: accel/tcg/translate-all.c:1252:
+    qemu_log("Execution Count: \t%lu\n\n", (uint64_t) (tbs->exec_count + tbs->exec_count_overflows*0xFFFFFFFF));
                                                                                                   ^

ERROR: line over 90 characters
#33: FILE: accel/tcg/translate-all.c:1255:
+    TranslationBlock *tb = tb_gen_code(current_cpu, tbs->pc, tbs->cs_base, tbs->flags, cflags);

ERROR: trailing whitespace
#49: FILE: accel/tcg/translate-all.c:1300:
+static gint inverse_sort_tbs(gconstpointer p1, gconstpointer p2) $

ERROR: line over 90 characters
#53: FILE: accel/tcg/translate-all.c:1304:
+    uint64_t p1_count = (uint64_t) (tbs1->exec_count + tbs1->exec_count_overflows*0xFFFFFFFF);

ERROR: spaces required around that '*' (ctx:VxV)
#53: FILE: accel/tcg/translate-all.c:1304:
+    uint64_t p1_count = (uint64_t) (tbs1->exec_count + tbs1->exec_count_overflows*0xFFFFFFFF);
                                                                                  ^

ERROR: line over 90 characters
#54: FILE: accel/tcg/translate-all.c:1305:
+    uint64_t p2_count = (uint64_t) (tbs2->exec_count + tbs2->exec_count_overflows*0xFFFFFFFF);

ERROR: spaces required around that '*' (ctx:VxV)
#54: FILE: accel/tcg/translate-all.c:1305:
+    uint64_t p2_count = (uint64_t) (tbs2->exec_count + tbs2->exec_count_overflows*0xFFFFFFFF);
                                                                                  ^

ERROR: code indent should never use tabs
#66: FILE: accel/tcg/translate-all.c:1317:
+^I    tb_dump_statistics((TBStatistics *) i->data);$

total: 9 errors, 0 warnings, 63 lines checked

Patch 3/4 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

4/4 Checking commit 231c2807dd78 (adding -d hot_tbs:limit command line option)
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20190624055442.2973-1-vandersonmr2@gmail.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/4] add and link a statistic struct to TBs
  2019-06-24  5:54 ` [Qemu-devel] [PATCH v2 1/4] add and link a statistic struct to TBs vandersonmr
@ 2019-06-24 15:06   ` Alex Bennée
  2019-06-25 16:28     ` Alex Bennée
  0 siblings, 1 reply; 14+ messages in thread
From: Alex Bennée @ 2019-06-24 15:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, vandersonmr, Richard Henderson


vandersonmr <vandersonmr2@gmail.com> writes:

> We want to store statistics for each TB even after flushes.
> We do not want to modify or grow the TB struct.
> So we create a new struct to contain this statistics and
> link it to each TB while they are created.
>
> Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
> ---
>  accel/tcg/translate-all.c | 40 +++++++++++++++++++++++++++++++++++++++
>  include/exec/exec-all.h   | 21 ++++++++++++++++++++
>  include/exec/tb-context.h |  1 +
>  include/qemu/log.h        |  1 +
>  4 files changed, 63 insertions(+)
>
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index 5d1e08b169..f7e99f90e0 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -1118,6 +1118,18 @@ static inline void code_gen_alloc(size_t tb_size)
>      }
>  }
>
> +static gint statistics_cmp(gconstpointer p1, gconstpointer p2)
> +{
> +    const TBStatistics *a = (TBStatistics *) p1;
> +    const TBStatistics *b = (TBStatistics *) p2;
> +
> +    return (a->pc == b->pc &&
> +		   a->cs_base == b->cs_base &&
> +		   a->flags == b->flags &&
> +	       a->page_addr[0] == b->page_addr[0] &&
> +    	   a->page_addr[1] == b->page_addr[1]) ? 0 : 1;
> +}
> +

There are a bunch of white space errors that need fixing up as you
probably already know from patchew ;-)

>  static bool tb_cmp(const void *ap, const void *bp)
>  {
>      const TranslationBlock *a = ap;
> @@ -1586,6 +1598,29 @@ static inline void tb_page_add(PageDesc *p, TranslationBlock *tb,
>  #endif
>  }
>
> +static void tb_insert_statistics_structure(TranslationBlock *tb) {
> +    TBStatistics *new_stats = g_new0(TBStatistics, 1);
> +    new_stats->pc = tb->pc;
> +    new_stats->cs_base = tb->cs_base;
> +    new_stats->flags = tb->flags;
> +    new_stats->page_addr[0] = tb->page_addr[0];
> +    new_stats->page_addr[1] = tb->page_addr[1];
> +
> +	GList *lookup_result = g_list_find_custom(tb_ctx.tb_statistics, new_stats, statistics_cmp);
> +
> +	if (lookup_result) {
> +		/* If there is already a TBStatistic for this TB from a previous flush
> +		* then just make the new TB point to the older TBStatistic
> +		*/
> +		free(new_stats);
> +    	tb->tb_stats = lookup_result->data;
> +	} else {
> +		/* If not, then points to the new tb_statistics and add it to the hash */
> +		tb->tb_stats = new_stats;
> +    	tb_ctx.tb_statistics = g_list_prepend(tb_ctx.tb_statistics,
> new_stats);

This is going to race as nothing prevents two threads attempting to
insert records at the same time.

> +	}
> +}
> +
>  /* add a new TB and link it to the physical page tables. phys_page2 is
>   * (-1) to indicate that only one page contains the TB.
>   *
> @@ -1636,6 +1671,11 @@ tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
>          void *existing_tb = NULL;
>          uint32_t h;
>
> +        if (qemu_loglevel_mask(CPU_LOG_HOT_TBS)) {
> +        	/* create and link to its TB a structure to store statistics */
> +        	tb_insert_statistics_structure(tb);
> +		}
> +
>          /* add in the hash table */
>          h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->cflags & CF_HASH_MASK,
>                           tb->trace_vcpu_dstate);

Perhaps we could just have a second qht array which allows for
"unlocked" insertion of record. You could take advantage of the fact the
hash would be the same:

        h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->cflags & CF_HASH_MASK,
                         tb->trace_vcpu_dstate);
        qht_insert(&tb_ctx.htable, tb, h, &existing_tb);

        /* remove TB from the page(s) if we couldn't insert it */
        if (unlikely(existing_tb)) {
            ...
            tb = existing_tb;
        } else if (qemu_loglevel_mask(CPU_LOG_HOT_TBS)) {
            TBStatistics *new_stats = g_new0(TBStatistics, 1);
            bool ok;
            new_stats->pc = tb->pc;
            new_stats->cs_base = tb->cs_base;
            new_stats->flags = tb->flags;
            new_stats->page_addr[0] = tb->page_addr[0];
            new_stats->page_addr[1] = tb->page_addr[1];
            ok = qht_insert(&tb_ctx.tb_stats, new_stats, h, NULL);
            /*
             * This should never fail as we succeeded in inserting the
             * TB itself which means we haven't seen this combination yet.
             */
            g_assert(ok);
        }

It would also avoid the clunkiness of having to allocate and then
freeing an unused structure.

> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index 16034ee651..359100ef3b 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -324,6 +324,24 @@ static inline void tlb_flush_by_mmuidx_all_cpus_synced(CPUState *cpu,
>  #define CODE_GEN_AVG_BLOCK_SIZE 150
>  #endif
>
> +typedef struct TBStatistics TBStatistics;
> +
> +/*
> + * This struct stores statistics such as execution count of the TranslationBlocks.
> + * Each TB has its own TBStatistics. TBStatistics is suppose to live even after
> + * flushes.
> + */
> +struct TBStatistics {
> +    target_ulong pc;
> +    target_ulong cs_base;
> +    uint32_t flags;
> +    tb_page_addr_t page_addr[2];
> +
> +    // total number of times that the related TB have being executed

No c++ style comments

> +    uint32_t exec_count;
> +    uint32_t exec_count_overflows;
> +};
> +
>  /*
>   * Translation Cache-related fields of a TB.
>   * This struct exists just for convenience; we keep track of TB's in a binary
> @@ -403,6 +421,9 @@ struct TranslationBlock {
>      uintptr_t jmp_list_head;
>      uintptr_t jmp_list_next[2];
>      uintptr_t jmp_dest[2];
> +
> +    // Pointer to a struct where statistics from the TB is stored

No c++ style comments

> +    TBStatistics *tb_stats;
>  };
>
>  extern bool parallel_cpus;
> diff --git a/include/exec/tb-context.h b/include/exec/tb-context.h
> index feb585e0a7..a78ce92e60 100644
> --- a/include/exec/tb-context.h
> +++ b/include/exec/tb-context.h
> @@ -35,6 +35,7 @@ struct TBContext {
>
>      /* statistics */
>      unsigned tb_flush_count;
> +    GList *tb_statistics;
>  };
>
>  extern TBContext tb_ctx;
> diff --git a/include/qemu/log.h b/include/qemu/log.h
> index b097a6cae1..2fca65dd01 100644
> --- a/include/qemu/log.h
> +++ b/include/qemu/log.h
> @@ -45,6 +45,7 @@ static inline bool qemu_log_separate(void)
>  /* LOG_TRACE (1 << 15) is defined in log-for-trace.h */
>  #define CPU_LOG_TB_OP_IND  (1 << 16)
>  #define CPU_LOG_TB_FPU     (1 << 17)
> +#define CPU_LOG_HOT_TBS    (1 << 18)
>
>  /* Lock output for a series of related logs.  Since this is not needed
>   * for a single qemu_log / qemu_log_mask / qemu_log_mask_and_addr, we


--
Alex Bennée


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/4] Adding an optional tb execution counter.
  2019-06-24  5:54 ` [Qemu-devel] [PATCH v2 2/4] Adding an optional tb execution counter vandersonmr
@ 2019-06-25  9:51   ` Alex Bennée
  0 siblings, 0 replies; 14+ messages in thread
From: Alex Bennée @ 2019-06-25  9:51 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, vandersonmr, Richard Henderson


vandersonmr <vandersonmr2@gmail.com> writes:

> We collect the number of times each TB is executed
> and store it in the its TBStatistics.
> We also count the number of times the execution counter overflows.
>
> Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
> ---
>  accel/tcg/tcg-runtime.c   | 10 ++++++++++
>  accel/tcg/tcg-runtime.h   |  2 ++
>  accel/tcg/translator.c    |  1 +
>  include/exec/gen-icount.h |  9 +++++++++
>  4 files changed, 22 insertions(+)
>
> diff --git a/accel/tcg/tcg-runtime.c b/accel/tcg/tcg-runtime.c
> index 8a1e408e31..9888a7fce8 100644
> --- a/accel/tcg/tcg-runtime.c
> +++ b/accel/tcg/tcg-runtime.c
> @@ -167,3 +167,13 @@ void HELPER(exit_atomic)(CPUArchState *env)
>  {
>      cpu_loop_exit_atomic(env_cpu(env), GETPC());
>  }
> +
> +void HELPER(inc_exec_freq)(void *ptr)
> +{
> +    TranslationBlock *tb = (TranslationBlock*) ptr;

To make things clearer:

  TBStatistics *stats = tb->tb_stats;

> +    // if overflows, then reset the execution counter and increment the overflow counter
> +    if (atomic_cmpxchg(&tb->tb_stats->exec_count, 0xFFFFFFFF, 0) == 0xFFFFFFFF) {
> +        atomic_inc(&tb->tb_stats->exec_count_overflows);
> +    }

This is all very nice but how often do you actually see overflows?
You'll see them even less on 32 bit hosts as they have much less memory
for translations so will flush them out more often. I'd suggest making
the counter "unsigned long" and just living with the fact the 32 bit
might get the execution stats wrong if a block executes more than 4
billion times.

> +    atomic_inc(&tb->tb_stats->exec_count);
> +}
> diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h
> index 4fa61b49b4..bf0b75dbe8 100644
> --- a/accel/tcg/tcg-runtime.h
> +++ b/accel/tcg/tcg-runtime.h
> @@ -28,6 +28,8 @@ DEF_HELPER_FLAGS_1(lookup_tb_ptr, TCG_CALL_NO_WG_SE, ptr, env)
>
>  DEF_HELPER_FLAGS_1(exit_atomic, TCG_CALL_NO_WG, noreturn, env)
>
> +DEF_HELPER_FLAGS_1(inc_exec_freq, TCG_CALL_NO_RWG, void, ptr)
> +
>  #ifdef CONFIG_SOFTMMU
>
>  DEF_HELPER_FLAGS_5(atomic_cmpxchgb, TCG_CALL_NO_WG,
> diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
> index 9226a348a3..cc06070e7e 100644
> --- a/accel/tcg/translator.c
> +++ b/accel/tcg/translator.c
> @@ -54,6 +54,7 @@ void translator_loop(const TranslatorOps *ops, DisasContextBase *db,
>      gen_tb_start(db->tb);
>      ops->tb_start(db, cpu);
>      tcg_debug_assert(db->is_jmp == DISAS_NEXT);  /* no early exit */
> +    gen_tb_exec_count(tb);
>
>      while (true) {
>          db->num_insns++;
> diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h
> index f7669b6841..6d38b6e1fb 100644
> --- a/include/exec/gen-icount.h
> +++ b/include/exec/gen-icount.h
> @@ -7,6 +7,15 @@
>
>  static TCGOp *icount_start_insn;
>
> +static inline void gen_tb_exec_count(TranslationBlock *tb)
> +{
> +  if (qemu_loglevel_mask(CPU_LOG_HOT_TBS)) {
> +    TCGv_ptr tb_ptr = tcg_const_ptr(tb);

Why not pass the address of the counter itself to save the cost of the
indirection calculation at runtime?

> +    gen_helper_inc_exec_freq(tb_ptr);
> +    tcg_temp_free_ptr(tb_ptr);
> +  }
> +}
> +
>  static inline void gen_tb_start(TranslationBlock *tb)
>  {
>      TCGv_i32 count, imm;


--
Alex Bennée


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH v2 3/4] Introduce dump of hot TBs
  2019-06-24  5:54 ` [Qemu-devel] [PATCH v2 3/4] Introduce dump of hot TBs vandersonmr
@ 2019-06-25 10:38   ` Alex Bennée
  0 siblings, 0 replies; 14+ messages in thread
From: Alex Bennée @ 2019-06-25 10:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, vandersonmr, Richard Henderson


vandersonmr <vandersonmr2@gmail.com> writes:

> Adding a function to dump the Nth hottest TBs.
> The block PC, execution count and ops is dump to the log.
>
> Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
> ---
>  accel/tcg/translate-all.c | 45 +++++++++++++++++++++++++++++++++++++++
>  include/exec/exec-all.h   |  2 ++
>  2 files changed, 47 insertions(+)
>
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index f7e99f90e0..c3d9ecb2c4 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -1240,6 +1240,27 @@ static gboolean tb_host_size_iter(gpointer key, gpointer value, gpointer data)
>      return false;
>  }
>
> +static void tb_dump_statistics(TBStatistics *tbs)
> +{
> +    uint32_t cflags = curr_cflags() | CF_NOCACHE;
> +    int old_log_flags = qemu_loglevel;
> +
> +    qemu_set_log(CPU_LOG_TB_OP_OPT);

I think you need to split your approach here. Once you are dealing with
interactive exploration you'll want to dump a given block with whatever
flags you want (in_asm,op,op_opt,out_asm are all relevant). So maybe
something like:

 (qemu) info tb 0xffff07ff7ee in_asm,out_asm


> +
> +    qemu_log("\n------------------------------\n");
> +    qemu_log("Translation Block PC: \t0x"TARGET_FMT_lx "\n", tbs->pc);
> +    qemu_log("Execution Count: \t%lu\n\n", (uint64_t)
> (tbs->exec_count + tbs->exec_count_overflows*0xFFFFFFFF));

For the monitor qemu_printf() would be the right output. Given they are
the same prototype you can pass a function pointer to the lowest level
function depending on if you are coming from the logging path or the
HMP. However redirecting the qemu_log output is the tricky bit.

> +
> +    mmap_lock();
> +    TranslationBlock *tb = tb_gen_code(current_cpu, tbs->pc, tbs->cs_base, tbs->flags, cflags);
> +    tb_phys_invalidate(tb, -1);
> +    mmap_unlock();
> +
> +    qemu_set_log(old_log_flags);

As we are manipulating the flags we'll want to make sure the rest of the
system isn't doing anything at this point. Currently that is the case on
exit() from a linux-user program but again for interactive use we'll
need to ensure we are running as safe_work (like tb_flush does).

> +
> +    tcg_tb_remove(tb);
> +}
> +
>  /* flush all the translation blocks */
>  static void do_tb_flush(CPUState *cpu, run_on_cpu_data tb_flush_count)
>  {
> @@ -1276,6 +1297,30 @@ done:
>      mmap_unlock();
>  }
>
> +static gint inverse_sort_tbs(gconstpointer p1, gconstpointer p2)
> +{
> +    const TBStatistics *tbs1 = (TBStatistics *) p1;
> +    const TBStatistics *tbs2 = (TBStatistics *) p2;
> +    uint64_t p1_count = (uint64_t) (tbs1->exec_count + tbs1->exec_count_overflows*0xFFFFFFFF);
> +    uint64_t p2_count = (uint64_t) (tbs2->exec_count + tbs2->exec_count_overflows*0xFFFFFFFF);
> +
> +    return p1_count < p2_count ? 1 : p1_count == p2_count ? 0 : -1;
> +}
> +
> +void tb_dump_exec_freq(uint32_t max_tbs_to_print)
> +{
> +    tb_ctx.tb_statistics = g_list_sort(tb_ctx.tb_statistics, inverse_sort_tbs);
> +
> +    uint32_t tbs_printed = 0;
> +    for (GList *i = tb_ctx.tb_statistics; i != NULL; i = i->next) {
> +        tbs_printed++;
> +	    tb_dump_statistics((TBStatistics *) i->data);
> +        if (max_tbs_to_print != 0 && tbs_printed >= max_tbs_to_print) {
> +            break;
> +        }
> +    }
> +}
> +
>  void tb_flush(CPUState *cpu)
>  {
>      if (tcg_enabled()) {
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index 359100ef3b..0547db0271 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -533,4 +533,6 @@ hwaddr memory_region_section_get_iotlb(CPUState *cpu,
>  /* vl.c */
>  extern int singlestep;
>
> +void tb_dump_exec_freq(uint32_t);
> +
>  #endif


--
Alex Bennée


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH v2 4/4] adding -d hot_tbs:limit command line option
  2019-06-24  5:54 ` [Qemu-devel] [PATCH v2 4/4] adding -d hot_tbs:limit command line option vandersonmr
@ 2019-06-25 12:13   ` Alex Bennée
  0 siblings, 0 replies; 14+ messages in thread
From: Alex Bennée @ 2019-06-25 12:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: Riku Voipio, vandersonmr, Laurent Vivier


vandersonmr <vandersonmr2@gmail.com> writes:

> add option to dump the N most hot TB blocks.
> -d hot_tbs:N
>
> Signed-off-by: vandersonmr <vandersonmr2@gmail.com>
> ---
>  include/qemu/log-for-trace.h | 2 ++
>  linux-user/exit.c            | 3 +++
>  util/log.c                   | 9 +++++++++
>  3 files changed, 14 insertions(+)
>
> diff --git a/include/qemu/log-for-trace.h b/include/qemu/log-for-trace.h
> index 2f0a5b080e..d65eb83037 100644
> --- a/include/qemu/log-for-trace.h
> +++ b/include/qemu/log-for-trace.h
> @@ -21,6 +21,8 @@
>  /* Private global variable, don't use */
>  extern int qemu_loglevel;
>
> +extern int32_t max_num_hot_tbs_to_dump;
> +

This might as well be an int (especially as your using atoi to scan it).

>  #define LOG_TRACE          (1 << 15)
>
>  /* Returns true if a bit is set in the current loglevel mask */
> diff --git a/linux-user/exit.c b/linux-user/exit.c
> index bdda720553..08b86dfd61 100644
> --- a/linux-user/exit.c
> +++ b/linux-user/exit.c
> @@ -28,6 +28,9 @@ extern void __gcov_dump(void);
>
>  void preexit_cleanup(CPUArchState *env, int code)
>  {
> +    if (qemu_loglevel_mask(CPU_LOG_HOT_TBS)) {
> +        tb_dump_exec_freq(max_num_hot_tbs_to_dump);
> +    }

Rather than baking the individual flags here and the fact you'll need to
duplicate the test for system emulation why not have a common helper
which you call unconditionally here and in the tail end of vl.c's main:

  qemu_do_exit_logs()

where:

void qemu_do_exit_logs(void)
{
    if (qemu_loglevel_mask(CPU_LOG_HOT_TBS)) {
        tb_dump_exec_freq(max_num_hot_tbs_to_dump);
    }
}

and we can extend that for other reports later...


>  #ifdef TARGET_GPROF
>          _mcleanup();
>  #endif
> diff --git a/util/log.c b/util/log.c
> index 1d1b33f7d9..e71c663143 100644
> --- a/util/log.c
> +++ b/util/log.c
> @@ -30,6 +30,7 @@ FILE *qemu_logfile;
>  int qemu_loglevel;
>  static int log_append = 0;
>  static GArray *debug_regions;
> +int32_t max_num_hot_tbs_to_dump;
>
>  /* Return the number of characters emitted.  */
>  int qemu_log(const char *fmt, ...)
> @@ -273,6 +274,9 @@ const QEMULogItem qemu_log_items[] = {
>      { CPU_LOG_TB_NOCHAIN, "nochain",
>        "do not chain compiled TBs so that \"exec\" and \"cpu\" show\n"
>        "complete traces" },
> +    { CPU_LOG_HOT_TBS, "hot_tbs(:limit)",
> +      "show TBs (until given a limit) ordered by their hotness.\n"
> +      "(if no limit is given, show all)" },
>      { 0, NULL, NULL },
>  };
>
> @@ -294,6 +298,11 @@ int qemu_str_to_log_mask(const char *str)
>              trace_enable_events((*tmp) + 6);
>              mask |= LOG_TRACE;
>  #endif
> +        } else if (g_str_has_prefix(*tmp, "hot_tbs")) {
> +            if (g_str_has_prefix(*tmp, "hot_tbs:") && (*tmp)[8] != '\0') {
> +                max_num_hot_tbs_to_dump = atoi((*tmp) + 8);
> +            }
> +            mask |= CPU_LOG_HOT_TBS;
>          } else {
>              for (item = qemu_log_items; item->mask != 0; item++) {
>                  if (g_str_equal(*tmp, item->name)) {


--
Alex Bennée


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/4] add and link a statistic struct to TBs
  2019-06-24 15:06   ` Alex Bennée
@ 2019-06-25 16:28     ` Alex Bennée
  2019-06-25 17:37       ` Vanderson Martins do Rosario
  0 siblings, 1 reply; 14+ messages in thread
From: Alex Bennée @ 2019-06-25 16:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, vandersonmr, Richard Henderson


Alex Bennée <alex.bennee@linaro.org> writes:

> vandersonmr <vandersonmr2@gmail.com> writes:
>
>> We want to store statistics for each TB even after flushes.
>> We do not want to modify or grow the TB struct.
>> So we create a new struct to contain this statistics and
>> link it to each TB while they are created.
>>
>> Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
>> ---
>>  accel/tcg/translate-all.c | 40 +++++++++++++++++++++++++++++++++++++++
>>  include/exec/exec-all.h   | 21 ++++++++++++++++++++
>>  include/exec/tb-context.h |  1 +
>>  include/qemu/log.h        |  1 +
>>  4 files changed, 63 insertions(+)
>>
>> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
>> index 5d1e08b169..f7e99f90e0 100644
>> --- a/accel/tcg/translate-all.c
>> +++ b/accel/tcg/translate-all.c
>> @@ -1118,6 +1118,18 @@ static inline void code_gen_alloc(size_t tb_size)
>>      }
>>  }
>>
>> +static gint statistics_cmp(gconstpointer p1, gconstpointer p2)
>> +{
>> +    const TBStatistics *a = (TBStatistics *) p1;
>> +    const TBStatistics *b = (TBStatistics *) p2;
>> +
>> +    return (a->pc == b->pc &&
>> +		   a->cs_base == b->cs_base &&
>> +		   a->flags == b->flags &&
>> +	       a->page_addr[0] == b->page_addr[0] &&
>> +    	   a->page_addr[1] == b->page_addr[1]) ? 0 : 1;
>> +}
>> +
>
> There are a bunch of white space errors that need fixing up as you
> probably already know from patchew ;-)
>
>>  static bool tb_cmp(const void *ap, const void *bp)
>>  {
>>      const TranslationBlock *a = ap;
>> @@ -1586,6 +1598,29 @@ static inline void tb_page_add(PageDesc *p, TranslationBlock *tb,
>>  #endif
>>  }
>>
>> +static void tb_insert_statistics_structure(TranslationBlock *tb) {
>> +    TBStatistics *new_stats = g_new0(TBStatistics, 1);
>> +    new_stats->pc = tb->pc;
>> +    new_stats->cs_base = tb->cs_base;
>> +    new_stats->flags = tb->flags;
>> +    new_stats->page_addr[0] = tb->page_addr[0];
>> +    new_stats->page_addr[1] = tb->page_addr[1];
>> +
>> +	GList *lookup_result = g_list_find_custom(tb_ctx.tb_statistics, new_stats, statistics_cmp);
>> +
>> +	if (lookup_result) {
>> +		/* If there is already a TBStatistic for this TB from a previous flush
>> +		* then just make the new TB point to the older TBStatistic
>> +		*/
>> +		free(new_stats);
>> +    	tb->tb_stats = lookup_result->data;
>> +	} else {
>> +		/* If not, then points to the new tb_statistics and add it to the hash */
>> +		tb->tb_stats = new_stats;
>> +    	tb_ctx.tb_statistics = g_list_prepend(tb_ctx.tb_statistics,
>> new_stats);
>
> This is going to race as nothing prevents two threads attempting to
> insert records at the same time.
>
>> +	}
>> +}
>> +
>>  /* add a new TB and link it to the physical page tables. phys_page2 is
>>   * (-1) to indicate that only one page contains the TB.
>>   *
>> @@ -1636,6 +1671,11 @@ tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
>>          void *existing_tb = NULL;
>>          uint32_t h;
>>
>> +        if (qemu_loglevel_mask(CPU_LOG_HOT_TBS)) {
>> +        	/* create and link to its TB a structure to store statistics */
>> +        	tb_insert_statistics_structure(tb);
>> +		}
>> +
>>          /* add in the hash table */
>>          h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->cflags & CF_HASH_MASK,
>>                           tb->trace_vcpu_dstate);
>
> Perhaps we could just have a second qht array which allows for
> "unlocked" insertion of record. You could take advantage of the fact the
> hash would be the same:
>
>         h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->cflags & CF_HASH_MASK,
>                          tb->trace_vcpu_dstate);
>         qht_insert(&tb_ctx.htable, tb, h, &existing_tb);
>
>         /* remove TB from the page(s) if we couldn't insert it */
>         if (unlikely(existing_tb)) {
>             ...
>             tb = existing_tb;
>         } else if (qemu_loglevel_mask(CPU_LOG_HOT_TBS)) {
>             TBStatistics *new_stats = g_new0(TBStatistics, 1);
>             bool ok;
>             new_stats->pc = tb->pc;
>             new_stats->cs_base = tb->cs_base;
>             new_stats->flags = tb->flags;
>             new_stats->page_addr[0] = tb->page_addr[0];
>             new_stats->page_addr[1] = tb->page_addr[1];
>             ok = qht_insert(&tb_ctx.tb_stats, new_stats, h, NULL);
>             /*
>              * This should never fail as we succeeded in inserting the
>              * TB itself which means we haven't seen this combination yet.
>              */
>             g_assert(ok);
>         }
>
> It would also avoid the clunkiness of having to allocate and then
> freeing an unused structure.


Actually thinking on this we still have to handle it. We may have
tb_flushed away a translation and just be regenerating the same block.
As TBStatistics should transcend tb_flush events we need to re-use it's
structure.

It would be worth counting the regens as well so we can see blocks that
keep getting re-translated after each flush.

>
>> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
>> index 16034ee651..359100ef3b 100644
>> --- a/include/exec/exec-all.h
>> +++ b/include/exec/exec-all.h
>> @@ -324,6 +324,24 @@ static inline void tlb_flush_by_mmuidx_all_cpus_synced(CPUState *cpu,
>>  #define CODE_GEN_AVG_BLOCK_SIZE 150
>>  #endif
>>
>> +typedef struct TBStatistics TBStatistics;
>> +
>> +/*
>> + * This struct stores statistics such as execution count of the TranslationBlocks.
>> + * Each TB has its own TBStatistics. TBStatistics is suppose to live even after
>> + * flushes.
>> + */
>> +struct TBStatistics {
>> +    target_ulong pc;
>> +    target_ulong cs_base;
>> +    uint32_t flags;
>> +    tb_page_addr_t page_addr[2];
>> +
>> +    // total number of times that the related TB have being executed
>
> No c++ style comments
>
>> +    uint32_t exec_count;
>> +    uint32_t exec_count_overflows;
>> +};
>> +
>>  /*
>>   * Translation Cache-related fields of a TB.
>>   * This struct exists just for convenience; we keep track of TB's in a binary
>> @@ -403,6 +421,9 @@ struct TranslationBlock {
>>      uintptr_t jmp_list_head;
>>      uintptr_t jmp_list_next[2];
>>      uintptr_t jmp_dest[2];
>> +
>> +    // Pointer to a struct where statistics from the TB is stored
>
> No c++ style comments
>
>> +    TBStatistics *tb_stats;
>>  };
>>
>>  extern bool parallel_cpus;
>> diff --git a/include/exec/tb-context.h b/include/exec/tb-context.h
>> index feb585e0a7..a78ce92e60 100644
>> --- a/include/exec/tb-context.h
>> +++ b/include/exec/tb-context.h
>> @@ -35,6 +35,7 @@ struct TBContext {
>>
>>      /* statistics */
>>      unsigned tb_flush_count;
>> +    GList *tb_statistics;
>>  };
>>
>>  extern TBContext tb_ctx;
>> diff --git a/include/qemu/log.h b/include/qemu/log.h
>> index b097a6cae1..2fca65dd01 100644
>> --- a/include/qemu/log.h
>> +++ b/include/qemu/log.h
>> @@ -45,6 +45,7 @@ static inline bool qemu_log_separate(void)
>>  /* LOG_TRACE (1 << 15) is defined in log-for-trace.h */
>>  #define CPU_LOG_TB_OP_IND  (1 << 16)
>>  #define CPU_LOG_TB_FPU     (1 << 17)
>> +#define CPU_LOG_HOT_TBS    (1 << 18)
>>
>>  /* Lock output for a series of related logs.  Since this is not needed
>>   * for a single qemu_log / qemu_log_mask / qemu_log_mask_and_addr, we


--
Alex Bennée


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/4] add and link a statistic struct to TBs
  2019-06-25 16:28     ` Alex Bennée
@ 2019-06-25 17:37       ` Vanderson Martins do Rosario
  2019-06-25 18:02         ` Alex Bennée
  0 siblings, 1 reply; 14+ messages in thread
From: Vanderson Martins do Rosario @ 2019-06-25 17:37 UTC (permalink / raw)
  To: Alex Bennée; +Cc: Paolo Bonzini, qemu-devel, Richard Henderson

When the tb_flush removes a block and it's recreated, this shouldn't
be creating a new block but using the one that is found by:

lookup_result = g_list_find_custom(tb_ctx.tb_statistics, new_stats,
statistics_cmp);

So the tb_statisticics will be reused and we could just add this
regen counter in the if statement: if (lookup_result)

isn't that correct?

Vanderson M. Rosario


On Tue, Jun 25, 2019 at 1:28 PM Alex Bennée <alex.bennee@linaro.org> wrote:

>
> Alex Bennée <alex.bennee@linaro.org> writes:
>
> > vandersonmr <vandersonmr2@gmail.com> writes:
> >
> >> We want to store statistics for each TB even after flushes.
> >> We do not want to modify or grow the TB struct.
> >> So we create a new struct to contain this statistics and
> >> link it to each TB while they are created.
> >>
> >> Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
> >> ---
> >>  accel/tcg/translate-all.c | 40 +++++++++++++++++++++++++++++++++++++++
> >>  include/exec/exec-all.h   | 21 ++++++++++++++++++++
> >>  include/exec/tb-context.h |  1 +
> >>  include/qemu/log.h        |  1 +
> >>  4 files changed, 63 insertions(+)
> >>
> >> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> >> index 5d1e08b169..f7e99f90e0 100644
> >> --- a/accel/tcg/translate-all.c
> >> +++ b/accel/tcg/translate-all.c
> >> @@ -1118,6 +1118,18 @@ static inline void code_gen_alloc(size_t tb_size)
> >>      }
> >>  }
> >>
> >> +static gint statistics_cmp(gconstpointer p1, gconstpointer p2)
> >> +{
> >> +    const TBStatistics *a = (TBStatistics *) p1;
> >> +    const TBStatistics *b = (TBStatistics *) p2;
> >> +
> >> +    return (a->pc == b->pc &&
> >> +               a->cs_base == b->cs_base &&
> >> +               a->flags == b->flags &&
> >> +           a->page_addr[0] == b->page_addr[0] &&
> >> +               a->page_addr[1] == b->page_addr[1]) ? 0 : 1;
> >> +}
> >> +
> >
> > There are a bunch of white space errors that need fixing up as you
> > probably already know from patchew ;-)
> >
> >>  static bool tb_cmp(const void *ap, const void *bp)
> >>  {
> >>      const TranslationBlock *a = ap;
> >> @@ -1586,6 +1598,29 @@ static inline void tb_page_add(PageDesc *p,
> TranslationBlock *tb,
> >>  #endif
> >>  }
> >>
> >> +static void tb_insert_statistics_structure(TranslationBlock *tb) {
> >> +    TBStatistics *new_stats = g_new0(TBStatistics, 1);
> >> +    new_stats->pc = tb->pc;
> >> +    new_stats->cs_base = tb->cs_base;
> >> +    new_stats->flags = tb->flags;
> >> +    new_stats->page_addr[0] = tb->page_addr[0];
> >> +    new_stats->page_addr[1] = tb->page_addr[1];
> >> +
> >> +    GList *lookup_result = g_list_find_custom(tb_ctx.tb_statistics,
> new_stats, statistics_cmp);
> >> +
> >> +    if (lookup_result) {
> >> +            /* If there is already a TBStatistic for this TB from a
> previous flush
> >> +            * then just make the new TB point to the older TBStatistic
> >> +            */
> >> +            free(new_stats);
> >> +            tb->tb_stats = lookup_result->data;
> >> +    } else {
> >> +            /* If not, then points to the new tb_statistics and add it
> to the hash */
> >> +            tb->tb_stats = new_stats;
> >> +            tb_ctx.tb_statistics = g_list_prepend(tb_ctx.tb_statistics,
> >> new_stats);
> >
> > This is going to race as nothing prevents two threads attempting to
> > insert records at the same time.
> >
> >> +    }
> >> +}
> >> +
> >>  /* add a new TB and link it to the physical page tables. phys_page2 is
> >>   * (-1) to indicate that only one page contains the TB.
> >>   *
> >> @@ -1636,6 +1671,11 @@ tb_link_page(TranslationBlock *tb,
> tb_page_addr_t phys_pc,
> >>          void *existing_tb = NULL;
> >>          uint32_t h;
> >>
> >> +        if (qemu_loglevel_mask(CPU_LOG_HOT_TBS)) {
> >> +            /* create and link to its TB a structure to store
> statistics */
> >> +            tb_insert_statistics_structure(tb);
> >> +            }
> >> +
> >>          /* add in the hash table */
> >>          h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->cflags &
> CF_HASH_MASK,
> >>                           tb->trace_vcpu_dstate);
> >
> > Perhaps we could just have a second qht array which allows for
> > "unlocked" insertion of record. You could take advantage of the fact the
> > hash would be the same:
> >
> >         h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->cflags &
> CF_HASH_MASK,
> >                          tb->trace_vcpu_dstate);
> >         qht_insert(&tb_ctx.htable, tb, h, &existing_tb);
> >
> >         /* remove TB from the page(s) if we couldn't insert it */
> >         if (unlikely(existing_tb)) {
> >             ...
> >             tb = existing_tb;
> >         } else if (qemu_loglevel_mask(CPU_LOG_HOT_TBS)) {
> >             TBStatistics *new_stats = g_new0(TBStatistics, 1);
> >             bool ok;
> >             new_stats->pc = tb->pc;
> >             new_stats->cs_base = tb->cs_base;
> >             new_stats->flags = tb->flags;
> >             new_stats->page_addr[0] = tb->page_addr[0];
> >             new_stats->page_addr[1] = tb->page_addr[1];
> >             ok = qht_insert(&tb_ctx.tb_stats, new_stats, h, NULL);
> >             /*
> >              * This should never fail as we succeeded in inserting the
> >              * TB itself which means we haven't seen this combination
> yet.
> >              */
> >             g_assert(ok);
> >         }
> >
> > It would also avoid the clunkiness of having to allocate and then
> > freeing an unused structure.
>
>
> Actually thinking on this we still have to handle it. We may have
> tb_flushed away a translation and just be regenerating the same block.
> As TBStatistics should transcend tb_flush events we need to re-use it's
> structure.
>
> It would be worth counting the regens as well so we can see blocks that
> keep getting re-translated after each flush.
>
> >
> >> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> >> index 16034ee651..359100ef3b 100644
> >> --- a/include/exec/exec-all.h
> >> +++ b/include/exec/exec-all.h
> >> @@ -324,6 +324,24 @@ static inline void
> tlb_flush_by_mmuidx_all_cpus_synced(CPUState *cpu,
> >>  #define CODE_GEN_AVG_BLOCK_SIZE 150
> >>  #endif
> >>
> >> +typedef struct TBStatistics TBStatistics;
> >> +
> >> +/*
> >> + * This struct stores statistics such as execution count of the
> TranslationBlocks.
> >> + * Each TB has its own TBStatistics. TBStatistics is suppose to live
> even after
> >> + * flushes.
> >> + */
> >> +struct TBStatistics {
> >> +    target_ulong pc;
> >> +    target_ulong cs_base;
> >> +    uint32_t flags;
> >> +    tb_page_addr_t page_addr[2];
> >> +
> >> +    // total number of times that the related TB have being executed
> >
> > No c++ style comments
> >
> >> +    uint32_t exec_count;
> >> +    uint32_t exec_count_overflows;
> >> +};
> >> +
> >>  /*
> >>   * Translation Cache-related fields of a TB.
> >>   * This struct exists just for convenience; we keep track of TB's in a
> binary
> >> @@ -403,6 +421,9 @@ struct TranslationBlock {
> >>      uintptr_t jmp_list_head;
> >>      uintptr_t jmp_list_next[2];
> >>      uintptr_t jmp_dest[2];
> >> +
> >> +    // Pointer to a struct where statistics from the TB is stored
> >
> > No c++ style comments
> >
> >> +    TBStatistics *tb_stats;
> >>  };
> >>
> >>  extern bool parallel_cpus;
> >> diff --git a/include/exec/tb-context.h b/include/exec/tb-context.h
> >> index feb585e0a7..a78ce92e60 100644
> >> --- a/include/exec/tb-context.h
> >> +++ b/include/exec/tb-context.h
> >> @@ -35,6 +35,7 @@ struct TBContext {
> >>
> >>      /* statistics */
> >>      unsigned tb_flush_count;
> >> +    GList *tb_statistics;
> >>  };
> >>
> >>  extern TBContext tb_ctx;
> >> diff --git a/include/qemu/log.h b/include/qemu/log.h
> >> index b097a6cae1..2fca65dd01 100644
> >> --- a/include/qemu/log.h
> >> +++ b/include/qemu/log.h
> >> @@ -45,6 +45,7 @@ static inline bool qemu_log_separate(void)
> >>  /* LOG_TRACE (1 << 15) is defined in log-for-trace.h */
> >>  #define CPU_LOG_TB_OP_IND  (1 << 16)
> >>  #define CPU_LOG_TB_FPU     (1 << 17)
> >> +#define CPU_LOG_HOT_TBS    (1 << 18)
> >>
> >>  /* Lock output for a series of related logs.  Since this is not needed
> >>   * for a single qemu_log / qemu_log_mask / qemu_log_mask_and_addr, we
>
>
> --
> Alex Bennée
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/4] add and link a statistic struct to TBs
  2019-06-25 17:37       ` Vanderson Martins do Rosario
@ 2019-06-25 18:02         ` Alex Bennée
  2019-06-26 11:05           ` Alex Bennée
  0 siblings, 1 reply; 14+ messages in thread
From: Alex Bennée @ 2019-06-25 18:02 UTC (permalink / raw)
  To: Vanderson Martins do Rosario; +Cc: Paolo Bonzini, qemu-devel, Richard Henderson


Vanderson Martins do Rosario <vandersonmr2@gmail.com> writes:

> When the tb_flush removes a block and it's recreated, this shouldn't
> be creating a new block but using the one that is found by:
>
> lookup_result = g_list_find_custom(tb_ctx.tb_statistics, new_stats,
> statistics_cmp);
>
> So the tb_statisticics will be reused and we could just add this
> regen counter in the if statement: if (lookup_result)
>
> isn't that correct?

Yes, although as I said earlier you want to use a QHT hash table rather
than a g_list to avoid racing with multiple translations at once.

>
> Vanderson M. Rosario
>
>
> On Tue, Jun 25, 2019 at 1:28 PM Alex Bennée <alex.bennee@linaro.org> wrote:
>
>>
>> Alex Bennée <alex.bennee@linaro.org> writes:
>>
>> > vandersonmr <vandersonmr2@gmail.com> writes:
>> >
>> >> We want to store statistics for each TB even after flushes.
>> >> We do not want to modify or grow the TB struct.
>> >> So we create a new struct to contain this statistics and
>> >> link it to each TB while they are created.
>> >>
>> >> Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
>> >> ---
>> >>  accel/tcg/translate-all.c | 40 +++++++++++++++++++++++++++++++++++++++
>> >>  include/exec/exec-all.h   | 21 ++++++++++++++++++++
>> >>  include/exec/tb-context.h |  1 +
>> >>  include/qemu/log.h        |  1 +
>> >>  4 files changed, 63 insertions(+)
>> >>
>> >> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
>> >> index 5d1e08b169..f7e99f90e0 100644
>> >> --- a/accel/tcg/translate-all.c
>> >> +++ b/accel/tcg/translate-all.c
>> >> @@ -1118,6 +1118,18 @@ static inline void code_gen_alloc(size_t tb_size)
>> >>      }
>> >>  }
>> >>
>> >> +static gint statistics_cmp(gconstpointer p1, gconstpointer p2)
>> >> +{
>> >> +    const TBStatistics *a = (TBStatistics *) p1;
>> >> +    const TBStatistics *b = (TBStatistics *) p2;
>> >> +
>> >> +    return (a->pc == b->pc &&
>> >> +               a->cs_base == b->cs_base &&
>> >> +               a->flags == b->flags &&
>> >> +           a->page_addr[0] == b->page_addr[0] &&
>> >> +               a->page_addr[1] == b->page_addr[1]) ? 0 : 1;
>> >> +}
>> >> +
>> >
>> > There are a bunch of white space errors that need fixing up as you
>> > probably already know from patchew ;-)
>> >
>> >>  static bool tb_cmp(const void *ap, const void *bp)
>> >>  {
>> >>      const TranslationBlock *a = ap;
>> >> @@ -1586,6 +1598,29 @@ static inline void tb_page_add(PageDesc *p,
>> TranslationBlock *tb,
>> >>  #endif
>> >>  }
>> >>
>> >> +static void tb_insert_statistics_structure(TranslationBlock *tb) {
>> >> +    TBStatistics *new_stats = g_new0(TBStatistics, 1);
>> >> +    new_stats->pc = tb->pc;
>> >> +    new_stats->cs_base = tb->cs_base;
>> >> +    new_stats->flags = tb->flags;
>> >> +    new_stats->page_addr[0] = tb->page_addr[0];
>> >> +    new_stats->page_addr[1] = tb->page_addr[1];
>> >> +
>> >> +    GList *lookup_result = g_list_find_custom(tb_ctx.tb_statistics,
>> new_stats, statistics_cmp);
>> >> +
>> >> +    if (lookup_result) {
>> >> +            /* If there is already a TBStatistic for this TB from a
>> previous flush
>> >> +            * then just make the new TB point to the older TBStatistic
>> >> +            */
>> >> +            free(new_stats);
>> >> +            tb->tb_stats = lookup_result->data;
>> >> +    } else {
>> >> +            /* If not, then points to the new tb_statistics and add it
>> to the hash */
>> >> +            tb->tb_stats = new_stats;
>> >> +            tb_ctx.tb_statistics = g_list_prepend(tb_ctx.tb_statistics,
>> >> new_stats);
>> >
>> > This is going to race as nothing prevents two threads attempting to
>> > insert records at the same time.
>> >
>> >> +    }
>> >> +}
>> >> +
>> >>  /* add a new TB and link it to the physical page tables. phys_page2 is
>> >>   * (-1) to indicate that only one page contains the TB.
>> >>   *
>> >> @@ -1636,6 +1671,11 @@ tb_link_page(TranslationBlock *tb,
>> tb_page_addr_t phys_pc,
>> >>          void *existing_tb = NULL;
>> >>          uint32_t h;
>> >>
>> >> +        if (qemu_loglevel_mask(CPU_LOG_HOT_TBS)) {
>> >> +            /* create and link to its TB a structure to store
>> statistics */
>> >> +            tb_insert_statistics_structure(tb);
>> >> +            }
>> >> +
>> >>          /* add in the hash table */
>> >>          h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->cflags &
>> CF_HASH_MASK,
>> >>                           tb->trace_vcpu_dstate);
>> >
>> > Perhaps we could just have a second qht array which allows for
>> > "unlocked" insertion of record. You could take advantage of the fact the
>> > hash would be the same:
>> >
>> >         h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->cflags &
>> CF_HASH_MASK,
>> >                          tb->trace_vcpu_dstate);
>> >         qht_insert(&tb_ctx.htable, tb, h, &existing_tb);
>> >
>> >         /* remove TB from the page(s) if we couldn't insert it */
>> >         if (unlikely(existing_tb)) {
>> >             ...
>> >             tb = existing_tb;
>> >         } else if (qemu_loglevel_mask(CPU_LOG_HOT_TBS)) {
>> >             TBStatistics *new_stats = g_new0(TBStatistics, 1);
>> >             bool ok;
>> >             new_stats->pc = tb->pc;
>> >             new_stats->cs_base = tb->cs_base;
>> >             new_stats->flags = tb->flags;
>> >             new_stats->page_addr[0] = tb->page_addr[0];
>> >             new_stats->page_addr[1] = tb->page_addr[1];
>> >             ok = qht_insert(&tb_ctx.tb_stats, new_stats, h, NULL);
>> >             /*
>> >              * This should never fail as we succeeded in inserting the
>> >              * TB itself which means we haven't seen this combination
>> yet.
>> >              */
>> >             g_assert(ok);
>> >         }
>> >
>> > It would also avoid the clunkiness of having to allocate and then
>> > freeing an unused structure.
>>
>>
>> Actually thinking on this we still have to handle it. We may have
>> tb_flushed away a translation and just be regenerating the same block.
>> As TBStatistics should transcend tb_flush events we need to re-use it's
>> structure.
>>
>> It would be worth counting the regens as well so we can see blocks that
>> keep getting re-translated after each flush.
>>
>> >
>> >> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
>> >> index 16034ee651..359100ef3b 100644
>> >> --- a/include/exec/exec-all.h
>> >> +++ b/include/exec/exec-all.h
>> >> @@ -324,6 +324,24 @@ static inline void
>> tlb_flush_by_mmuidx_all_cpus_synced(CPUState *cpu,
>> >>  #define CODE_GEN_AVG_BLOCK_SIZE 150
>> >>  #endif
>> >>
>> >> +typedef struct TBStatistics TBStatistics;
>> >> +
>> >> +/*
>> >> + * This struct stores statistics such as execution count of the
>> TranslationBlocks.
>> >> + * Each TB has its own TBStatistics. TBStatistics is suppose to live
>> even after
>> >> + * flushes.
>> >> + */
>> >> +struct TBStatistics {
>> >> +    target_ulong pc;
>> >> +    target_ulong cs_base;
>> >> +    uint32_t flags;
>> >> +    tb_page_addr_t page_addr[2];
>> >> +
>> >> +    // total number of times that the related TB have being executed
>> >
>> > No c++ style comments
>> >
>> >> +    uint32_t exec_count;
>> >> +    uint32_t exec_count_overflows;
>> >> +};
>> >> +
>> >>  /*
>> >>   * Translation Cache-related fields of a TB.
>> >>   * This struct exists just for convenience; we keep track of TB's in a
>> binary
>> >> @@ -403,6 +421,9 @@ struct TranslationBlock {
>> >>      uintptr_t jmp_list_head;
>> >>      uintptr_t jmp_list_next[2];
>> >>      uintptr_t jmp_dest[2];
>> >> +
>> >> +    // Pointer to a struct where statistics from the TB is stored
>> >
>> > No c++ style comments
>> >
>> >> +    TBStatistics *tb_stats;
>> >>  };
>> >>
>> >>  extern bool parallel_cpus;
>> >> diff --git a/include/exec/tb-context.h b/include/exec/tb-context.h
>> >> index feb585e0a7..a78ce92e60 100644
>> >> --- a/include/exec/tb-context.h
>> >> +++ b/include/exec/tb-context.h
>> >> @@ -35,6 +35,7 @@ struct TBContext {
>> >>
>> >>      /* statistics */
>> >>      unsigned tb_flush_count;
>> >> +    GList *tb_statistics;
>> >>  };
>> >>
>> >>  extern TBContext tb_ctx;
>> >> diff --git a/include/qemu/log.h b/include/qemu/log.h
>> >> index b097a6cae1..2fca65dd01 100644
>> >> --- a/include/qemu/log.h
>> >> +++ b/include/qemu/log.h
>> >> @@ -45,6 +45,7 @@ static inline bool qemu_log_separate(void)
>> >>  /* LOG_TRACE (1 << 15) is defined in log-for-trace.h */
>> >>  #define CPU_LOG_TB_OP_IND  (1 << 16)
>> >>  #define CPU_LOG_TB_FPU     (1 << 17)
>> >> +#define CPU_LOG_HOT_TBS    (1 << 18)
>> >>
>> >>  /* Lock output for a series of related logs.  Since this is not needed
>> >>   * for a single qemu_log / qemu_log_mask / qemu_log_mask_and_addr, we
>>
>>
>> --
>> Alex Bennée
>>


--
Alex Bennée


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/4] add and link a statistic struct to TBs
  2019-06-25 18:02         ` Alex Bennée
@ 2019-06-26 11:05           ` Alex Bennée
  0 siblings, 0 replies; 14+ messages in thread
From: Alex Bennée @ 2019-06-26 11:05 UTC (permalink / raw)
  To: Vanderson Martins do Rosario; +Cc: Paolo Bonzini, qemu-devel, Richard Henderson


Alex Bennée <alex.bennee@linaro.org> writes:

> Vanderson Martins do Rosario <vandersonmr2@gmail.com> writes:
>
>> When the tb_flush removes a block and it's recreated, this shouldn't
>> be creating a new block but using the one that is found by:
>>
>> lookup_result = g_list_find_custom(tb_ctx.tb_statistics, new_stats,
>> statistics_cmp);
>>
>> So the tb_statisticics will be reused and we could just add this
>> regen counter in the if statement: if (lookup_result)
>>
>> isn't that correct?
>
> Yes, although as I said earlier you want to use a QHT hash table rather
> than a g_list to avoid racing with multiple translations at once.
>
>>
>> Vanderson M. Rosario
>>
>>
>> On Tue, Jun 25, 2019 at 1:28 PM Alex Bennée <alex.bennee@linaro.org> wrote:
>>
>>>
>>> Alex Bennée <alex.bennee@linaro.org> writes:
>>>
>>> > vandersonmr <vandersonmr2@gmail.com> writes:
>>> >
>>> >> We want to store statistics for each TB even after flushes.
>>> >> We do not want to modify or grow the TB struct.
>>> >> So we create a new struct to contain this statistics and
>>> >> link it to each TB while they are created.
>>> >>
>>> >> Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
>>> >> ---
>>> >>  accel/tcg/translate-all.c | 40 +++++++++++++++++++++++++++++++++++++++
>>> >>  include/exec/exec-all.h   | 21 ++++++++++++++++++++
>>> >>  include/exec/tb-context.h |  1 +
>>> >>  include/qemu/log.h        |  1 +
>>> >>  4 files changed, 63 insertions(+)
>>> >>
>>> >> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
>>> >> index 5d1e08b169..f7e99f90e0 100644
>>> >> --- a/accel/tcg/translate-all.c
>>> >> +++ b/accel/tcg/translate-all.c
>>> >> @@ -1118,6 +1118,18 @@ static inline void code_gen_alloc(size_t tb_size)
>>> >>      }
>>> >>  }
>>> >>
>>> >> +static gint statistics_cmp(gconstpointer p1, gconstpointer p2)
>>> >> +{
>>> >> +    const TBStatistics *a = (TBStatistics *) p1;
>>> >> +    const TBStatistics *b = (TBStatistics *) p2;
>>> >> +
>>> >> +    return (a->pc == b->pc &&
>>> >> +               a->cs_base == b->cs_base &&
>>> >> +               a->flags == b->flags &&
>>> >> +           a->page_addr[0] == b->page_addr[0] &&
>>> >> +               a->page_addr[1] == b->page_addr[1]) ? 0 : 1;
>>> >> +}
>>> >> +
>>> >
>>> > There are a bunch of white space errors that need fixing up as you
>>> > probably already know from patchew ;-)
>>> >
>>> >>  static bool tb_cmp(const void *ap, const void *bp)
>>> >>  {
>>> >>      const TranslationBlock *a = ap;
>>> >> @@ -1586,6 +1598,29 @@ static inline void tb_page_add(PageDesc *p,
>>> TranslationBlock *tb,
>>> >>  #endif
>>> >>  }
>>> >>
>>> >> +static void tb_insert_statistics_structure(TranslationBlock *tb) {
>>> >> +    TBStatistics *new_stats = g_new0(TBStatistics, 1);
>>> >> +    new_stats->pc = tb->pc;
>>> >> +    new_stats->cs_base = tb->cs_base;
>>> >> +    new_stats->flags = tb->flags;
>>> >> +    new_stats->page_addr[0] = tb->page_addr[0];
>>> >> +    new_stats->page_addr[1] = tb->page_addr[1];
>>> >> +
>>> >> +    GList *lookup_result = g_list_find_custom(tb_ctx.tb_statistics,
>>> new_stats, statistics_cmp);
>>> >> +
>>> >> +    if (lookup_result) {
>>> >> +            /* If there is already a TBStatistic for this TB from a
>>> previous flush
>>> >> +            * then just make the new TB point to the older TBStatistic
>>> >> +            */
>>> >> +            free(new_stats);
>>> >> +            tb->tb_stats = lookup_result->data;
>>> >> +    } else {
>>> >> +            /* If not, then points to the new tb_statistics and add it
>>> to the hash */
>>> >> +            tb->tb_stats = new_stats;
>>> >> +            tb_ctx.tb_statistics = g_list_prepend(tb_ctx.tb_statistics,
>>> >> new_stats);
>>> >
>>> > This is going to race as nothing prevents two threads attempting to
>>> > insert records at the same time.
>>> >
>>> >> +    }
>>> >> +}
>>> >> +
>>> >>  /* add a new TB and link it to the physical page tables. phys_page2 is
>>> >>   * (-1) to indicate that only one page contains the TB.
>>> >>   *
>>> >> @@ -1636,6 +1671,11 @@ tb_link_page(TranslationBlock *tb,
>>> tb_page_addr_t phys_pc,
>>> >>          void *existing_tb = NULL;
>>> >>          uint32_t h;
>>> >>
>>> >> +        if (qemu_loglevel_mask(CPU_LOG_HOT_TBS)) {
>>> >> +            /* create and link to its TB a structure to store
>>> statistics */
>>> >> +            tb_insert_statistics_structure(tb);
>>> >> +            }
>>> >> +
>>> >>          /* add in the hash table */
>>> >>          h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->cflags &
>>> CF_HASH_MASK,
>>> >>                           tb->trace_vcpu_dstate);
>>> >
>>> > Perhaps we could just have a second qht array which allows for
>>> > "unlocked" insertion of record. You could take advantage of the fact the
>>> > hash would be the same:
>>> >
>>> >         h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->cflags &
>>> CF_HASH_MASK,
>>> >                          tb->trace_vcpu_dstate);
>>> >         qht_insert(&tb_ctx.htable, tb, h, &existing_tb);
>>> >
>>> >         /* remove TB from the page(s) if we couldn't insert it */
>>> >         if (unlikely(existing_tb)) {
>>> >             ...
>>> >             tb = existing_tb;
>>> >         } else if (qemu_loglevel_mask(CPU_LOG_HOT_TBS)) {
>>> >             TBStatistics *new_stats = g_new0(TBStatistics, 1);
>>> >             bool ok;
>>> >             new_stats->pc = tb->pc;
>>> >             new_stats->cs_base = tb->cs_base;
>>> >             new_stats->flags = tb->flags;
>>> >             new_stats->page_addr[0] = tb->page_addr[0];
>>> >             new_stats->page_addr[1] = tb->page_addr[1];
>>> >             ok = qht_insert(&tb_ctx.tb_stats, new_stats, h, NULL);
>>> >             /*
>>> >              * This should never fail as we succeeded in inserting the
>>> >              * TB itself which means we haven't seen this combination
>>> yet.
>>> >              */
>>> >             g_assert(ok);
>>> >         }
>>> >
>>> > It would also avoid the clunkiness of having to allocate and then
>>> > freeing an unused structure.
>>>
>>>
>>> Actually thinking on this we still have to handle it. We may have
>>> tb_flushed away a translation and just be regenerating the same block.
>>> As TBStatistics should transcend tb_flush events we need to re-use it's
>>> structure.
>>>
>>> It would be worth counting the regens as well so we can see blocks that
>>> keep getting re-translated after each flush.

Another issue is lifetime. We really want to have the statistics
available before we generate the code (so we can pass a direct pointer
to the helper) and we can't wait until after we qht_insert the TB into
the buffer (as any thread can start executing that buffer from that
point). So we need to ensure tb->tb_stats is live and ready as
translation time.

--
Alex Bennée


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2019-06-26 11:06 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-24  5:54 [Qemu-devel] [PATCH v2 0/4] dumping hot TBs vandersonmr
2019-06-24  5:54 ` [Qemu-devel] [PATCH v2 1/4] add and link a statistic struct to TBs vandersonmr
2019-06-24 15:06   ` Alex Bennée
2019-06-25 16:28     ` Alex Bennée
2019-06-25 17:37       ` Vanderson Martins do Rosario
2019-06-25 18:02         ` Alex Bennée
2019-06-26 11:05           ` Alex Bennée
2019-06-24  5:54 ` [Qemu-devel] [PATCH v2 2/4] Adding an optional tb execution counter vandersonmr
2019-06-25  9:51   ` Alex Bennée
2019-06-24  5:54 ` [Qemu-devel] [PATCH v2 3/4] Introduce dump of hot TBs vandersonmr
2019-06-25 10:38   ` Alex Bennée
2019-06-24  5:54 ` [Qemu-devel] [PATCH v2 4/4] adding -d hot_tbs:limit command line option vandersonmr
2019-06-25 12:13   ` Alex Bennée
2019-06-24  6:12 ` [Qemu-devel] [PATCH v2 0/4] dumping hot TBs no-reply

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).