All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/5] plugins/cache: L2 cache modelling and a minor leak fix
@ 2021-08-10 13:48 Mahmoud Mandour
  2021-08-10 13:48 ` [PATCH 1/5] plugins/cache: freed heap-allocated mutexes Mahmoud Mandour
                   ` (6 more replies)
  0 siblings, 7 replies; 11+ messages in thread
From: Mahmoud Mandour @ 2021-08-10 13:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: Mahmoud Mandour, alex.bennee

Hello,

This series implements a simple L2 unified per-core cache emulation, the L2
cache is not enabled by default and is only enabled on specifying so using the
plugin arguments. L2 cache is only accessed if L1 does not contain the wanted
block. If a miss occur in L1, the block is "fetched" to L1, and then L2 is
interrogated. This effectively mean that on a cache miss, we bring the data to
all the cache hierarchy, which is quite reasonable.

Note: Some +80-column lines are left as is, since they're all 81~83 cols and I
thought that it's so important to break them since breaking a line usually looks
ugly when not sufficiently long(?) LMK if I need to fix this :)

Mahmoud Mandour (5):
  plugins/cache: freed heap-allocated mutexes
  plugins/cache: implement unified L2 cache emulation
  plugins/cache: split command line arguments into name and value
  plugins/cache: make L2 emulation optional through args
  docs/tcg-plugins: add L2 arguments to cache docs

 contrib/plugins/cache.c    | 318 ++++++++++++++++++++++++++-----------
 docs/devel/tcg-plugins.rst |  22 ++-
 2 files changed, 244 insertions(+), 96 deletions(-)

-- 
2.25.1



^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/5] plugins/cache: freed heap-allocated mutexes
  2021-08-10 13:48 [PATCH 0/5] plugins/cache: L2 cache modelling and a minor leak fix Mahmoud Mandour
@ 2021-08-10 13:48 ` Mahmoud Mandour
  2021-10-08 14:45   ` Alex Bennée
  2021-08-10 13:48 ` [PATCH 2/5] plugins/cache: implement unified L2 cache emulation Mahmoud Mandour
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 11+ messages in thread
From: Mahmoud Mandour @ 2021-08-10 13:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexandre Iooss, Mahmoud Mandour, alex.bennee

Signed-off-by: Mahmoud Mandour <ma.mandourr@gmail.com>
---
 contrib/plugins/cache.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/contrib/plugins/cache.c b/contrib/plugins/cache.c
index a1e03ca882..a255e26e25 100644
--- a/contrib/plugins/cache.c
+++ b/contrib/plugins/cache.c
@@ -614,6 +614,9 @@ static void plugin_exit(qemu_plugin_id_t id, void *p)
     caches_free(dcaches);
     caches_free(icaches);
 
+    g_free(dcache_locks);
+    g_free(icache_locks);
+
     g_hash_table_destroy(miss_ht);
 }
 
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/5] plugins/cache: implement unified L2 cache emulation
  2021-08-10 13:48 [PATCH 0/5] plugins/cache: L2 cache modelling and a minor leak fix Mahmoud Mandour
  2021-08-10 13:48 ` [PATCH 1/5] plugins/cache: freed heap-allocated mutexes Mahmoud Mandour
@ 2021-08-10 13:48 ` Mahmoud Mandour
  2021-10-08 15:44   ` Alex Bennée
  2021-08-10 13:48 ` [PATCH 3/5] plugins/cache: split command line arguments into name and value Mahmoud Mandour
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 11+ messages in thread
From: Mahmoud Mandour @ 2021-08-10 13:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexandre Iooss, Mahmoud Mandour, alex.bennee

This adds an implementation of a simple L2 configuration, in which a
unified L2 cache (stores both blocks of instructions and data) is
maintained for each core separately, with no inter-core interaction
taken in account. The L2 cache is used as a backup for L1 and is only
accessed if the wanted block does not exist in L1.

In terms of multi-threaded user-space emulation, the same approximation
of L1 is done, a static number of caches is maintained, and each and
every memory access initiated by a thread will have to go through one of
the available caches.

An atomic increment is used to maintain the number of L2 misses per
instruction.

The default cache parameters of L2 caches is:

    2MB cache size
    16-way associativity
    64-byte blocks

Signed-off-by: Mahmoud Mandour <ma.mandourr@gmail.com>
---
 contrib/plugins/cache.c | 256 +++++++++++++++++++++++++++-------------
 1 file changed, 175 insertions(+), 81 deletions(-)

diff --git a/contrib/plugins/cache.c b/contrib/plugins/cache.c
index a255e26e25..908c967a09 100644
--- a/contrib/plugins/cache.c
+++ b/contrib/plugins/cache.c
@@ -82,8 +82,9 @@ typedef struct {
     char *disas_str;
     const char *symbol;
     uint64_t addr;
-    uint64_t dmisses;
-    uint64_t imisses;
+    uint64_t l1_dmisses;
+    uint64_t l1_imisses;
+    uint64_t l2_misses;
 } InsnData;
 
 void (*update_hit)(Cache *cache, int set, int blk);
@@ -93,15 +94,20 @@ void (*metadata_init)(Cache *cache);
 void (*metadata_destroy)(Cache *cache);
 
 static int cores;
-static Cache **dcaches, **icaches;
+static Cache **l1_dcaches, **l1_icaches;
+static Cache **l2_ucaches;
 
-static GMutex *dcache_locks;
-static GMutex *icache_locks;
+static GMutex *l1_dcache_locks;
+static GMutex *l1_icache_locks;
+static GMutex *l2_ucache_locks;
 
-static uint64_t all_dmem_accesses;
-static uint64_t all_imem_accesses;
-static uint64_t all_imisses;
-static uint64_t all_dmisses;
+static uint64_t l1_dmem_accesses;
+static uint64_t l1_imem_accesses;
+static uint64_t l1_imisses;
+static uint64_t l1_dmisses;
+
+static uint64_t l2_mem_accesses;
+static uint64_t l2_misses;
 
 static int pow_of_two(int num)
 {
@@ -382,6 +388,7 @@ static void vcpu_mem_access(unsigned int vcpu_index, qemu_plugin_meminfo_t info,
     struct qemu_plugin_hwaddr *hwaddr;
     int cache_idx;
     InsnData *insn;
+    bool hit_in_l1;
 
     hwaddr = qemu_plugin_get_hwaddr(info, vaddr);
     if (hwaddr && qemu_plugin_hwaddr_is_io(hwaddr)) {
@@ -391,14 +398,29 @@ static void vcpu_mem_access(unsigned int vcpu_index, qemu_plugin_meminfo_t info,
     effective_addr = hwaddr ? qemu_plugin_hwaddr_phys_addr(hwaddr) : vaddr;
     cache_idx = vcpu_index % cores;
 
-    g_mutex_lock(&dcache_locks[cache_idx]);
-    if (!access_cache(dcaches[cache_idx], effective_addr)) {
+    g_mutex_lock(&l1_dcache_locks[cache_idx]);
+    hit_in_l1 = access_cache(l1_dcaches[cache_idx], effective_addr);
+    if (!hit_in_l1) {
+        insn = (InsnData *) userdata;
+        __atomic_fetch_add(&insn->l1_dmisses, 1, __ATOMIC_SEQ_CST);
+        l1_dcaches[cache_idx]->misses++;
+    }
+    l1_dcaches[cache_idx]->accesses++;
+    g_mutex_unlock(&l1_dcache_locks[cache_idx]);
+
+    if (hit_in_l1) {
+        /* No need to access L2 */
+        return;
+    }
+
+    g_mutex_lock(&l2_ucache_locks[cache_idx]);
+    if (!access_cache(l2_ucaches[cache_idx], effective_addr)) {
         insn = (InsnData *) userdata;
-        __atomic_fetch_add(&insn->dmisses, 1, __ATOMIC_SEQ_CST);
-        dcaches[cache_idx]->misses++;
+        __atomic_fetch_add(&insn->l2_misses, 1, __ATOMIC_SEQ_CST);
+        l2_ucaches[cache_idx]->misses++;
     }
-    dcaches[cache_idx]->accesses++;
-    g_mutex_unlock(&dcache_locks[cache_idx]);
+    l2_ucaches[cache_idx]->accesses++;
+    g_mutex_unlock(&l2_ucache_locks[cache_idx]);
 }
 
 static void vcpu_insn_exec(unsigned int vcpu_index, void *userdata)
@@ -406,18 +428,34 @@ static void vcpu_insn_exec(unsigned int vcpu_index, void *userdata)
     uint64_t insn_addr;
     InsnData *insn;
     int cache_idx;
+    bool hit_in_l1;
 
     insn_addr = ((InsnData *) userdata)->addr;
 
     cache_idx = vcpu_index % cores;
-    g_mutex_lock(&icache_locks[cache_idx]);
-    if (!access_cache(icaches[cache_idx], insn_addr)) {
+    g_mutex_lock(&l1_icache_locks[cache_idx]);
+    hit_in_l1 = access_cache(l1_icaches[cache_idx], insn_addr);
+    if (!hit_in_l1) {
+        insn = (InsnData *) userdata;
+        __atomic_fetch_add(&insn->l1_imisses, 1, __ATOMIC_SEQ_CST);
+        l1_icaches[cache_idx]->misses++;
+    }
+    l1_icaches[cache_idx]->accesses++;
+    g_mutex_unlock(&l1_icache_locks[cache_idx]);
+
+    if (hit_in_l1) {
+        /* No need to access L2 */
+        return;
+    }
+
+    g_mutex_lock(&l2_ucache_locks[cache_idx]);
+    if (!access_cache(l2_ucaches[cache_idx], insn_addr)) {
         insn = (InsnData *) userdata;
-        __atomic_fetch_add(&insn->imisses, 1, __ATOMIC_SEQ_CST);
-        icaches[cache_idx]->misses++;
+        __atomic_fetch_add(&insn->l2_misses, 1, __ATOMIC_SEQ_CST);
+        l2_ucaches[cache_idx]->misses++;
     }
-    icaches[cache_idx]->accesses++;
-    g_mutex_unlock(&icache_locks[cache_idx]);
+    l2_ucaches[cache_idx]->accesses++;
+    g_mutex_unlock(&l2_ucache_locks[cache_idx]);
 }
 
 static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
@@ -493,30 +531,28 @@ static void caches_free(Cache **caches)
     }
 }
 
-static int dcmp(gconstpointer a, gconstpointer b)
-{
-    InsnData *insn_a = (InsnData *) a;
-    InsnData *insn_b = (InsnData *) b;
-
-    return insn_a->dmisses < insn_b->dmisses ? 1 : -1;
-}
-
-static void append_stats_line(GString *line, uint64_t daccess, uint64_t dmisses,
-                              uint64_t iaccess, uint64_t imisses)
+static void append_stats_line(GString *line, uint64_t l1_daccess,
+                              uint64_t l1_dmisses, uint64_t l1_iaccess,
+                              uint64_t l1_imisses,  uint64_t l2_access,
+                              uint64_t l2_misses)
 {
-    double dmiss_rate, imiss_rate;
+    double l1_dmiss_rate, l1_imiss_rate, l2_miss_rate;
 
-    dmiss_rate = ((double) dmisses) / (daccess) * 100.0;
-    imiss_rate = ((double) imisses) / (iaccess) * 100.0;
+    l1_dmiss_rate = ((double) l1_dmisses) / (l1_daccess) * 100.0;
+    l1_imiss_rate = ((double) l1_imisses) / (l1_iaccess) * 100.0;
+    l2_miss_rate =  ((double) l2_misses) / (l2_access) * 100.0;
 
     g_string_append_printf(line, "%-14lu %-12lu %9.4lf%%  %-14lu %-12lu"
-                           " %9.4lf%%\n",
-                           daccess,
-                           dmisses,
-                           daccess ? dmiss_rate : 0.0,
-                           iaccess,
-                           imisses,
-                           iaccess ? imiss_rate : 0.0);
+                           " %9.4lf%%  %-12lu %-11lu %10.4lf%%\n",
+                           l1_daccess,
+                           l1_dmisses,
+                           l1_daccess ? l1_dmiss_rate : 0.0,
+                           l1_iaccess,
+                           l1_imisses,
+                           l1_iaccess ? l1_imiss_rate : 0.0,
+                           l2_access,
+                           l2_misses,
+                           l2_access ? l2_miss_rate : 0.0);
 }
 
 static void sum_stats(void)
@@ -525,43 +561,66 @@ static void sum_stats(void)
 
     g_assert(cores > 1);
     for (i = 0; i < cores; i++) {
-        all_imisses += icaches[i]->misses;
-        all_dmisses += dcaches[i]->misses;
-        all_imem_accesses += icaches[i]->accesses;
-        all_dmem_accesses += dcaches[i]->accesses;
+        l1_imisses += l1_icaches[i]->misses;
+        l1_dmisses += l1_dcaches[i]->misses;
+        l1_imem_accesses += l1_icaches[i]->accesses;
+        l1_dmem_accesses += l1_dcaches[i]->accesses;
+
+        l2_misses += l2_ucaches[i]->misses;
+        l2_mem_accesses += l2_ucaches[i]->accesses;
     }
 }
 
+static int dcmp(gconstpointer a, gconstpointer b)
+{
+    InsnData *insn_a = (InsnData *) a;
+    InsnData *insn_b = (InsnData *) b;
+
+    return insn_a->l1_dmisses < insn_b->l1_dmisses ? 1 : -1;
+}
+
 static int icmp(gconstpointer a, gconstpointer b)
 {
     InsnData *insn_a = (InsnData *) a;
     InsnData *insn_b = (InsnData *) b;
 
-    return insn_a->imisses < insn_b->imisses ? 1 : -1;
+    return insn_a->l1_imisses < insn_b->l1_imisses ? 1 : -1;
+}
+
+static int l2_cmp(gconstpointer a, gconstpointer b)
+{
+    InsnData *insn_a = (InsnData *) a;
+    InsnData *insn_b = (InsnData *) b;
+
+    return insn_a->l2_misses < insn_b->l2_misses ? 1 : -1;
 }
 
 static void log_stats(void)
 {
     int i;
-    Cache *icache, *dcache;
+    Cache *icache, *dcache, *l2_cache;
 
     g_autoptr(GString) rep = g_string_new("core #, data accesses, data misses,"
                                           " dmiss rate, insn accesses,"
-                                          " insn misses, imiss rate\n");
+                                          " insn misses, imiss rate,"
+                                          " l2 accesses, l2 misses,"
+                                          " l2 miss rate\n");
 
     for (i = 0; i < cores; i++) {
         g_string_append_printf(rep, "%-8d", i);
-        dcache = dcaches[i];
-        icache = icaches[i];
+        dcache = l1_dcaches[i];
+        icache = l1_icaches[i];
+        l2_cache = l2_ucaches[i];
         append_stats_line(rep, dcache->accesses, dcache->misses,
-                icache->accesses, icache->misses);
+                icache->accesses, icache->misses, l2_cache->accesses,
+                l2_cache->misses);
     }
 
     if (cores > 1) {
         sum_stats();
         g_string_append_printf(rep, "%-8s", "sum");
-        append_stats_line(rep, all_dmem_accesses, all_dmisses,
-                all_imem_accesses, all_imisses);
+        append_stats_line(rep, l1_dmem_accesses, l1_dmisses,
+                l1_imem_accesses, l1_imisses, l2_mem_accesses, l2_misses);
     }
 
     g_string_append(rep, "\n");
@@ -585,7 +644,7 @@ static void log_top_insns(void)
         if (insn->symbol) {
             g_string_append_printf(rep, " (%s)", insn->symbol);
         }
-        g_string_append_printf(rep, ", %ld, %s\n", insn->dmisses,
+        g_string_append_printf(rep, ", %ld, %s\n", insn->l1_dmisses,
                                insn->disas_str);
     }
 
@@ -598,7 +657,20 @@ static void log_top_insns(void)
         if (insn->symbol) {
             g_string_append_printf(rep, " (%s)", insn->symbol);
         }
-        g_string_append_printf(rep, ", %ld, %s\n", insn->imisses,
+        g_string_append_printf(rep, ", %ld, %s\n", insn->l1_imisses,
+                               insn->disas_str);
+    }
+
+    miss_insns = g_list_sort(miss_insns, l2_cmp);
+    g_string_append_printf(rep, "%s", "\naddress, L2 misses, instruction\n");
+
+    for (curr = miss_insns, i = 0; curr && i < limit; i++, curr = curr->next) {
+        insn = (InsnData *) curr->data;
+        g_string_append_printf(rep, "0x%" PRIx64, insn->addr);
+        if (insn->symbol) {
+            g_string_append_printf(rep, " (%s)", insn->symbol);
+        }
+        g_string_append_printf(rep, ", %ld, %s\n", insn->l2_misses,
                                insn->disas_str);
     }
 
@@ -611,11 +683,13 @@ static void plugin_exit(qemu_plugin_id_t id, void *p)
     log_stats();
     log_top_insns();
 
-    caches_free(dcaches);
-    caches_free(icaches);
+    caches_free(l1_dcaches);
+    caches_free(l1_icaches);
+    caches_free(l2_ucaches);
 
-    g_free(dcache_locks);
-    g_free(icache_locks);
+    g_free(l1_dcache_locks);
+    g_free(l1_icache_locks);
+    g_free(l2_ucache_locks);
 
     g_hash_table_destroy(miss_ht);
 }
@@ -647,19 +721,24 @@ int qemu_plugin_install(qemu_plugin_id_t id, const qemu_info_t *info,
                         int argc, char **argv)
 {
     int i;
-    int iassoc, iblksize, icachesize;
-    int dassoc, dblksize, dcachesize;
+    int l1_iassoc, l1_iblksize, l1_icachesize;
+    int l1_dassoc, l1_dblksize, l1_dcachesize;
+    int l2_assoc, l2_blksize, l2_cachesize;
 
     limit = 32;
     sys = info->system_emulation;
 
-    dassoc = 8;
-    dblksize = 64;
-    dcachesize = dblksize * dassoc * 32;
+    l1_dassoc = 8;
+    l1_dblksize = 64;
+    l1_dcachesize = l1_dblksize * l1_dassoc * 32;
+
+    l1_iassoc = 8;
+    l1_iblksize = 64;
+    l1_icachesize = l1_iblksize * l1_iassoc * 32;
 
-    iassoc = 8;
-    iblksize = 64;
-    icachesize = iblksize * iassoc * 32;
+    l2_assoc = 16;
+    l2_blksize = 64;
+    l2_cachesize = l2_assoc * l2_blksize * 2048;
 
     policy = LRU;
 
@@ -668,21 +747,27 @@ int qemu_plugin_install(qemu_plugin_id_t id, const qemu_info_t *info,
     for (i = 0; i < argc; i++) {
         char *opt = argv[i];
         if (g_str_has_prefix(opt, "iblksize=")) {
-            iblksize = g_ascii_strtoll(opt + 9, NULL, 10);
+            l1_iblksize = g_ascii_strtoll(opt + 9, NULL, 10);
         } else if (g_str_has_prefix(opt, "iassoc=")) {
-            iassoc = g_ascii_strtoll(opt + 7, NULL, 10);
+            l1_iassoc = g_ascii_strtoll(opt + 7, NULL, 10);
         } else if (g_str_has_prefix(opt, "icachesize=")) {
-            icachesize = g_ascii_strtoll(opt + 11, NULL, 10);
+            l1_icachesize = g_ascii_strtoll(opt + 11, NULL, 10);
         } else if (g_str_has_prefix(opt, "dblksize=")) {
-            dblksize = g_ascii_strtoll(opt + 9, NULL, 10);
+            l1_dblksize = g_ascii_strtoll(opt + 9, NULL, 10);
         } else if (g_str_has_prefix(opt, "dassoc=")) {
-            dassoc = g_ascii_strtoll(opt + 7, NULL, 10);
+            l1_dassoc = g_ascii_strtoll(opt + 7, NULL, 10);
         } else if (g_str_has_prefix(opt, "dcachesize=")) {
-            dcachesize = g_ascii_strtoll(opt + 11, NULL, 10);
+            l1_dcachesize = g_ascii_strtoll(opt + 11, NULL, 10);
         } else if (g_str_has_prefix(opt, "limit=")) {
             limit = g_ascii_strtoll(opt + 6, NULL, 10);
         } else if (g_str_has_prefix(opt, "cores=")) {
             cores = g_ascii_strtoll(opt + 6, NULL, 10);
+        } else if (g_str_has_prefix(opt, "l2cachesize=")) {
+            l2_cachesize = g_ascii_strtoll(opt + 6, NULL, 10);
+        } else if (g_str_has_prefix(opt, "l2blksize=")) {
+            l2_blksize = g_ascii_strtoll(opt + 6, NULL, 10);
+        } else if (g_str_has_prefix(opt, "l2assoc=")) {
+            l2_assoc = g_ascii_strtoll(opt + 6, NULL, 10);
         } else if (g_str_has_prefix(opt, "evict=")) {
             gchar *p = opt + 6;
             if (g_strcmp0(p, "rand") == 0) {
@@ -703,24 +788,33 @@ int qemu_plugin_install(qemu_plugin_id_t id, const qemu_info_t *info,
 
     policy_init();
 
-    dcaches = caches_init(dblksize, dassoc, dcachesize);
-    if (!dcaches) {
-        const char *err = cache_config_error(dblksize, dassoc, dcachesize);
+    l1_dcaches = caches_init(l1_dblksize, l1_dassoc, l1_dcachesize);
+    if (!l1_dcaches) {
+        const char *err = cache_config_error(l1_dblksize, l1_dassoc, l1_dcachesize);
         fprintf(stderr, "dcache cannot be constructed from given parameters\n");
         fprintf(stderr, "%s\n", err);
         return -1;
     }
 
-    icaches = caches_init(iblksize, iassoc, icachesize);
-    if (!icaches) {
-        const char *err = cache_config_error(iblksize, iassoc, icachesize);
+    l1_icaches = caches_init(l1_iblksize, l1_iassoc, l1_icachesize);
+    if (!l1_icaches) {
+        const char *err = cache_config_error(l1_iblksize, l1_iassoc, l1_icachesize);
         fprintf(stderr, "icache cannot be constructed from given parameters\n");
         fprintf(stderr, "%s\n", err);
         return -1;
     }
 
-    dcache_locks = g_new0(GMutex, cores);
-    icache_locks = g_new0(GMutex, cores);
+    l2_ucaches = caches_init(l2_blksize, l2_assoc, l2_cachesize);
+    if (!l2_ucaches) {
+        const char *err = cache_config_error(l2_blksize, l2_assoc, l2_cachesize);
+        fprintf(stderr, "L2 cache cannot be constructed from given parameters\n");
+        fprintf(stderr, "%s\n", err);
+        return -1;
+    }
+
+    l1_dcache_locks = g_new0(GMutex, cores);
+    l1_icache_locks = g_new0(GMutex, cores);
+    l2_ucache_locks = g_new0(GMutex, cores);
 
     qemu_plugin_register_vcpu_tb_trans_cb(id, vcpu_tb_trans);
     qemu_plugin_register_atexit_cb(id, plugin_exit, NULL);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/5] plugins/cache: split command line arguments into name and value
  2021-08-10 13:48 [PATCH 0/5] plugins/cache: L2 cache modelling and a minor leak fix Mahmoud Mandour
  2021-08-10 13:48 ` [PATCH 1/5] plugins/cache: freed heap-allocated mutexes Mahmoud Mandour
  2021-08-10 13:48 ` [PATCH 2/5] plugins/cache: implement unified L2 cache emulation Mahmoud Mandour
@ 2021-08-10 13:48 ` Mahmoud Mandour
  2021-10-08 14:05   ` Alex Bennée
  2021-08-10 13:48 ` [PATCH 4/5] plugins/cache: make L2 emulation optional through args Mahmoud Mandour
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 11+ messages in thread
From: Mahmoud Mandour @ 2021-08-10 13:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexandre Iooss, Mahmoud Mandour, alex.bennee

This way of handling args is more lenient and sets a better framework to
parse boolean command line arguments.

Signed-off-by: Mahmoud Mandour <ma.mandourr@gmail.com>
---
 contrib/plugins/cache.c | 57 ++++++++++++++++++++++-------------------
 1 file changed, 30 insertions(+), 27 deletions(-)

diff --git a/contrib/plugins/cache.c b/contrib/plugins/cache.c
index 908c967a09..ff325beb9f 100644
--- a/contrib/plugins/cache.c
+++ b/contrib/plugins/cache.c
@@ -11,6 +11,8 @@
 
 #include <qemu-plugin.h>
 
+#define STRTOLL(x) g_ascii_strtoll(x, NULL, 10)
+
 QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
 
 static enum qemu_plugin_mem_rw rw = QEMU_PLUGIN_MEM_RW;
@@ -746,35 +748,36 @@ int qemu_plugin_install(qemu_plugin_id_t id, const qemu_info_t *info,
 
     for (i = 0; i < argc; i++) {
         char *opt = argv[i];
-        if (g_str_has_prefix(opt, "iblksize=")) {
-            l1_iblksize = g_ascii_strtoll(opt + 9, NULL, 10);
-        } else if (g_str_has_prefix(opt, "iassoc=")) {
-            l1_iassoc = g_ascii_strtoll(opt + 7, NULL, 10);
-        } else if (g_str_has_prefix(opt, "icachesize=")) {
-            l1_icachesize = g_ascii_strtoll(opt + 11, NULL, 10);
-        } else if (g_str_has_prefix(opt, "dblksize=")) {
-            l1_dblksize = g_ascii_strtoll(opt + 9, NULL, 10);
-        } else if (g_str_has_prefix(opt, "dassoc=")) {
-            l1_dassoc = g_ascii_strtoll(opt + 7, NULL, 10);
-        } else if (g_str_has_prefix(opt, "dcachesize=")) {
-            l1_dcachesize = g_ascii_strtoll(opt + 11, NULL, 10);
-        } else if (g_str_has_prefix(opt, "limit=")) {
-            limit = g_ascii_strtoll(opt + 6, NULL, 10);
-        } else if (g_str_has_prefix(opt, "cores=")) {
-            cores = g_ascii_strtoll(opt + 6, NULL, 10);
-        } else if (g_str_has_prefix(opt, "l2cachesize=")) {
-            l2_cachesize = g_ascii_strtoll(opt + 6, NULL, 10);
-        } else if (g_str_has_prefix(opt, "l2blksize=")) {
-            l2_blksize = g_ascii_strtoll(opt + 6, NULL, 10);
-        } else if (g_str_has_prefix(opt, "l2assoc=")) {
-            l2_assoc = g_ascii_strtoll(opt + 6, NULL, 10);
-        } else if (g_str_has_prefix(opt, "evict=")) {
-            gchar *p = opt + 6;
-            if (g_strcmp0(p, "rand") == 0) {
+        g_autofree char **tokens = g_strsplit(opt, "=", 2);
+
+        if (g_strcmp0(tokens[0], "iblksize") == 0) {
+            l1_iblksize = STRTOLL(tokens[1]);
+        } else if (g_strcmp0(tokens[0], "iassoc") == 0) {
+            l1_iassoc = STRTOLL(tokens[1]);
+        } else if (g_strcmp0(tokens[0], "icachesize") == 0) {
+            l1_icachesize = STRTOLL(tokens[1]);
+        } else if (g_strcmp0(tokens[0], "dblksize") == 0) {
+            l1_dblksize = STRTOLL(tokens[1]);
+        } else if (g_strcmp0(tokens[0], "dassoc") == 0) {
+            l1_dassoc = STRTOLL(tokens[1]);
+        } else if (g_strcmp0(tokens[0], "dcachesize") == 0) {
+            l1_dcachesize = STRTOLL(tokens[1]);
+        } else if (g_strcmp0(tokens[0], "limit") == 0) {
+            limit = STRTOLL(tokens[1]);
+        } else if (g_strcmp0(tokens[0], "cores") == 0) {
+            cores = STRTOLL(tokens[1]);
+        } else if (g_strcmp0(tokens[0], "l2cachesize") == 0) {
+            l2_cachesize = STRTOLL(tokens[1]);
+        } else if (g_strcmp0(tokens[0], "l2blksize") == 0) {
+            l2_blksize = STRTOLL(tokens[1]);
+        } else if (g_strcmp0(tokens[0], "l2assoc") == 0) {
+            l2_assoc = STRTOLL(tokens[1]);
+        } else if (g_strcmp0(tokens[0], "evict") == 0) {
+            if (g_strcmp0(tokens[1], "rand") == 0) {
                 policy = RAND;
-            } else if (g_strcmp0(p, "lru") == 0) {
+            } else if (g_strcmp0(tokens[1], "lru") == 0) {
                 policy = LRU;
-            } else if (g_strcmp0(p, "fifo") == 0) {
+            } else if (g_strcmp0(tokens[1], "fifo") == 0) {
                 policy = FIFO;
             } else {
                 fprintf(stderr, "invalid eviction policy: %s\n", opt);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 4/5] plugins/cache: make L2 emulation optional through args
  2021-08-10 13:48 [PATCH 0/5] plugins/cache: L2 cache modelling and a minor leak fix Mahmoud Mandour
                   ` (2 preceding siblings ...)
  2021-08-10 13:48 ` [PATCH 3/5] plugins/cache: split command line arguments into name and value Mahmoud Mandour
@ 2021-08-10 13:48 ` Mahmoud Mandour
  2021-08-10 13:48 ` [PATCH 5/5] docs/tcg-plugins: add L2 arguments to cache docs Mahmoud Mandour
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Mahmoud Mandour @ 2021-08-10 13:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexandre Iooss, Mahmoud Mandour, alex.bennee

By default L2 is not enabled and is enabled by either using the
newly-introduced "l2" boolean argument, or by setting any of the L2
cache parameters using args. On specifying "l2=on", the default cache
configuration is used.

Signed-off-by: Mahmoud Mandour <ma.mandourr@gmail.com>
---
 contrib/plugins/cache.c | 76 +++++++++++++++++++++++++++++------------
 1 file changed, 54 insertions(+), 22 deletions(-)

diff --git a/contrib/plugins/cache.c b/contrib/plugins/cache.c
index ff325beb9f..b9226e7c40 100644
--- a/contrib/plugins/cache.c
+++ b/contrib/plugins/cache.c
@@ -97,6 +97,8 @@ void (*metadata_destroy)(Cache *cache);
 
 static int cores;
 static Cache **l1_dcaches, **l1_icaches;
+
+static bool use_l2;
 static Cache **l2_ucaches;
 
 static GMutex *l1_dcache_locks;
@@ -410,7 +412,7 @@ static void vcpu_mem_access(unsigned int vcpu_index, qemu_plugin_meminfo_t info,
     l1_dcaches[cache_idx]->accesses++;
     g_mutex_unlock(&l1_dcache_locks[cache_idx]);
 
-    if (hit_in_l1) {
+    if (hit_in_l1 || !use_l2) {
         /* No need to access L2 */
         return;
     }
@@ -445,7 +447,7 @@ static void vcpu_insn_exec(unsigned int vcpu_index, void *userdata)
     l1_icaches[cache_idx]->accesses++;
     g_mutex_unlock(&l1_icache_locks[cache_idx]);
 
-    if (hit_in_l1) {
+    if (hit_in_l1 || !use_l2) {
         /* No need to access L2 */
         return;
     }
@@ -542,19 +544,25 @@ static void append_stats_line(GString *line, uint64_t l1_daccess,
 
     l1_dmiss_rate = ((double) l1_dmisses) / (l1_daccess) * 100.0;
     l1_imiss_rate = ((double) l1_imisses) / (l1_iaccess) * 100.0;
-    l2_miss_rate =  ((double) l2_misses) / (l2_access) * 100.0;
 
     g_string_append_printf(line, "%-14lu %-12lu %9.4lf%%  %-14lu %-12lu"
-                           " %9.4lf%%  %-12lu %-11lu %10.4lf%%\n",
+                           " %9.4lf%%",
                            l1_daccess,
                            l1_dmisses,
                            l1_daccess ? l1_dmiss_rate : 0.0,
                            l1_iaccess,
                            l1_imisses,
-                           l1_iaccess ? l1_imiss_rate : 0.0,
-                           l2_access,
-                           l2_misses,
-                           l2_access ? l2_miss_rate : 0.0);
+                           l1_iaccess ? l1_imiss_rate : 0.0);
+
+    if (use_l2) {
+        l2_miss_rate =  ((double) l2_misses) / (l2_access) * 100.0;
+        g_string_append_printf(line, "  %-12lu %-11lu %10.4lf%%",
+                               l2_access,
+                               l2_misses,
+                               l2_access ? l2_miss_rate : 0.0);
+    }
+
+    g_string_append(line, "\n");
 }
 
 static void sum_stats(void)
@@ -568,8 +576,10 @@ static void sum_stats(void)
         l1_imem_accesses += l1_icaches[i]->accesses;
         l1_dmem_accesses += l1_dcaches[i]->accesses;
 
-        l2_misses += l2_ucaches[i]->misses;
-        l2_mem_accesses += l2_ucaches[i]->accesses;
+        if (use_l2) {
+            l2_misses += l2_ucaches[i]->misses;
+            l2_mem_accesses += l2_ucaches[i]->accesses;
+        }
     }
 }
 
@@ -604,25 +614,31 @@ static void log_stats(void)
 
     g_autoptr(GString) rep = g_string_new("core #, data accesses, data misses,"
                                           " dmiss rate, insn accesses,"
-                                          " insn misses, imiss rate,"
-                                          " l2 accesses, l2 misses,"
-                                          " l2 miss rate\n");
+                                          " insn misses, imiss rate");
+
+    if (use_l2) {
+        g_string_append(rep, ", l2 accesses, l2 misses, l2 miss rate");
+    }
+
+    g_string_append(rep, "\n");
 
     for (i = 0; i < cores; i++) {
         g_string_append_printf(rep, "%-8d", i);
         dcache = l1_dcaches[i];
         icache = l1_icaches[i];
-        l2_cache = l2_ucaches[i];
+        l2_cache = use_l2 ? l2_ucaches[i] : NULL;
         append_stats_line(rep, dcache->accesses, dcache->misses,
-                icache->accesses, icache->misses, l2_cache->accesses,
-                l2_cache->misses);
+                icache->accesses, icache->misses,
+                l2_cache ? l2_cache->accesses : 0,
+                l2_cache ? l2_cache->misses : 0);
     }
 
     if (cores > 1) {
         sum_stats();
         g_string_append_printf(rep, "%-8s", "sum");
         append_stats_line(rep, l1_dmem_accesses, l1_dmisses,
-                l1_imem_accesses, l1_imisses, l2_mem_accesses, l2_misses);
+                l1_imem_accesses, l1_imisses,
+                l2_cache ? l2_mem_accesses : 0, l2_cache ? l2_misses : 0);
     }
 
     g_string_append(rep, "\n");
@@ -663,6 +679,10 @@ static void log_top_insns(void)
                                insn->disas_str);
     }
 
+    if (!use_l2) {
+        goto finish;
+    }
+
     miss_insns = g_list_sort(miss_insns, l2_cmp);
     g_string_append_printf(rep, "%s", "\naddress, L2 misses, instruction\n");
 
@@ -676,6 +696,7 @@ static void log_top_insns(void)
                                insn->disas_str);
     }
 
+finish:
     qemu_plugin_outs(rep->str);
     g_list_free(miss_insns);
 }
@@ -687,11 +708,14 @@ static void plugin_exit(qemu_plugin_id_t id, void *p)
 
     caches_free(l1_dcaches);
     caches_free(l1_icaches);
-    caches_free(l2_ucaches);
 
     g_free(l1_dcache_locks);
     g_free(l1_icache_locks);
-    g_free(l2_ucache_locks);
+
+    if (use_l2) {
+        caches_free(l2_ucaches);
+        g_free(l2_ucache_locks);
+    }
 
     g_hash_table_destroy(miss_ht);
 }
@@ -767,11 +791,19 @@ int qemu_plugin_install(qemu_plugin_id_t id, const qemu_info_t *info,
         } else if (g_strcmp0(tokens[0], "cores") == 0) {
             cores = STRTOLL(tokens[1]);
         } else if (g_strcmp0(tokens[0], "l2cachesize") == 0) {
+            use_l2 = true;
             l2_cachesize = STRTOLL(tokens[1]);
         } else if (g_strcmp0(tokens[0], "l2blksize") == 0) {
+            use_l2 = true;
             l2_blksize = STRTOLL(tokens[1]);
         } else if (g_strcmp0(tokens[0], "l2assoc") == 0) {
+            use_l2 = true;
             l2_assoc = STRTOLL(tokens[1]);
+        } else if (g_strcmp0(tokens[0], "l2") == 0) {
+            if (!qemu_plugin_bool_parse(tokens[0], tokens[1], &use_l2)) {
+                fprintf(stderr, "boolean argument parsing failed: %s\n", opt);
+                return -1;
+            }
         } else if (g_strcmp0(tokens[0], "evict") == 0) {
             if (g_strcmp0(tokens[1], "rand") == 0) {
                 policy = RAND;
@@ -807,8 +839,8 @@ int qemu_plugin_install(qemu_plugin_id_t id, const qemu_info_t *info,
         return -1;
     }
 
-    l2_ucaches = caches_init(l2_blksize, l2_assoc, l2_cachesize);
-    if (!l2_ucaches) {
+    l2_ucaches = use_l2 ? caches_init(l2_blksize, l2_assoc, l2_cachesize) : NULL;
+    if (!l2_ucaches && use_l2) {
         const char *err = cache_config_error(l2_blksize, l2_assoc, l2_cachesize);
         fprintf(stderr, "L2 cache cannot be constructed from given parameters\n");
         fprintf(stderr, "%s\n", err);
@@ -817,7 +849,7 @@ int qemu_plugin_install(qemu_plugin_id_t id, const qemu_info_t *info,
 
     l1_dcache_locks = g_new0(GMutex, cores);
     l1_icache_locks = g_new0(GMutex, cores);
-    l2_ucache_locks = g_new0(GMutex, cores);
+    l2_ucache_locks = use_l2 ? g_new0(GMutex, cores) : NULL;
 
     qemu_plugin_register_vcpu_tb_trans_cb(id, vcpu_tb_trans);
     qemu_plugin_register_atexit_cb(id, plugin_exit, NULL);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 5/5] docs/tcg-plugins: add L2 arguments to cache docs
  2021-08-10 13:48 [PATCH 0/5] plugins/cache: L2 cache modelling and a minor leak fix Mahmoud Mandour
                   ` (3 preceding siblings ...)
  2021-08-10 13:48 ` [PATCH 4/5] plugins/cache: make L2 emulation optional through args Mahmoud Mandour
@ 2021-08-10 13:48 ` Mahmoud Mandour
  2021-08-10 14:56 ` [PATCH 0/5] plugins/cache: L2 cache modelling and a minor leak fix Mahmoud Mandour
  2021-09-20 15:26 ` Mahmoud Mandour
  6 siblings, 0 replies; 11+ messages in thread
From: Mahmoud Mandour @ 2021-08-10 13:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexandre Iooss, Mahmoud Mandour, alex.bennee

cache plugin now allows optional L2 per-core cache emulation that can be
configured through plugin arguments, this commit adds this functionality
to the docs.

While I'm at it, I editted the bullet point for cache plugin to say:
    contrib/plugins/cache.c
instead of
    contrib/plugins/cache
to match other plugins.

Signed-off-by: Mahmoud Mandour <ma.mandourr@gmail.com>
---
 docs/devel/tcg-plugins.rst | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/docs/devel/tcg-plugins.rst b/docs/devel/tcg-plugins.rst
index fcc460bf7b..ee3fab75bd 100644
--- a/docs/devel/tcg-plugins.rst
+++ b/docs/devel/tcg-plugins.rst
@@ -360,10 +360,11 @@ which will output an execution trace following this structure::
   0, 0xd34, 0xf9c8f000, "bl #0x10c8"
   0, 0x10c8, 0xfff96c43, "ldr r3, [r0, #0x44]", load, 0x200000e4, RAM
 
-- contrib/plugins/cache
+- contrib/plugins/cache.c
 
-Cache modelling plugin that measures the performance of a given cache
-configuration when a given working set is run::
+Cache modelling plugin that measures the performance of a given L1 cache
+configuration, and optionally a unified L2 per-core cache when a given working
+set is run::
 
     qemu-x86_64 -plugin ./contrib/plugins/libcache.so \
       -d plugin -D cache.log ./tests/tcg/x86_64-linux-user/float_convs
@@ -421,3 +422,18 @@ The plugin has a number of arguments, all of them are optional:
   Sets the number of cores for which we maintain separate icache and dcache.
   (default: for linux-user, N = 1, for full system emulation: N = cores
   available to guest)
+
+  * l2=on
+
+  Simulates a unified L2 cache (stores blocks for both instructions and data)
+  using the default L2 configuration (cache size = 2MB, associativity = 16-way,
+  block size = 64B).
+
+  * l2cachesize=N
+  * l2blksize=B
+  * l2assoc=A
+
+  L2 cache configuration arguments. They specify the cache size, block size, and
+  associativity of the L2 cache, respectively. Setting any of the L2
+  configuration arguments implies ``l2=on``.
+  (default: N = 2097152 (2MB), B = 64, A = 16)
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/5] plugins/cache: L2 cache modelling and a minor leak fix
  2021-08-10 13:48 [PATCH 0/5] plugins/cache: L2 cache modelling and a minor leak fix Mahmoud Mandour
                   ` (4 preceding siblings ...)
  2021-08-10 13:48 ` [PATCH 5/5] docs/tcg-plugins: add L2 arguments to cache docs Mahmoud Mandour
@ 2021-08-10 14:56 ` Mahmoud Mandour
  2021-09-20 15:26 ` Mahmoud Mandour
  6 siblings, 0 replies; 11+ messages in thread
From: Mahmoud Mandour @ 2021-08-10 14:56 UTC (permalink / raw)
  To: open list:All patches CC here; +Cc: Alex Bennée

[-- Attachment #1: Type: text/plain, Size: 1602 bytes --]

On Tue, Aug 10, 2021 at 3:48 PM Mahmoud Mandour <ma.mandourr@gmail.com>
wrote:

> Hello,
>
> This series implements a simple L2 unified per-core cache emulation, the L2
> cache is not enabled by default and is only enabled on specifying so using
> the
> plugin arguments. L2 cache is only accessed if L1 does not contain the
> wanted
> block. If a miss occur in L1, the block is "fetched" to L1, and then L2 is
> interrogated. This effectively mean that on a cache miss, we bring the
> data to
> all the cache hierarchy, which is quite reasonable.
>
> Note: Some +80-column lines are left as is, since they're all 81~83 cols
> and I
> thought that it's so important to break them since breaking a line usually
> looks
> ugly when not sufficiently long(?) LMK if I need to fix this :)
>
>
Based-on: <20210730135817.17816-1-ma.mandourr@gmail.com>
since it uses `qemu_plugin_bool_parse` and based also on the changes done
in the plugin docs in the same series.

This can hopefully be applied cleanly to plugins/next since it has all my
multicore
cache patches and also the new arg-passing scheme patches.


> Mahmoud Mandour (5):
>   plugins/cache: freed heap-allocated mutexes
>   plugins/cache: implement unified L2 cache emulation
>   plugins/cache: split command line arguments into name and value
>   plugins/cache: make L2 emulation optional through args
>   docs/tcg-plugins: add L2 arguments to cache docs
>
>  contrib/plugins/cache.c    | 318 ++++++++++++++++++++++++++-----------
>  docs/devel/tcg-plugins.rst |  22 ++-
>  2 files changed, 244 insertions(+), 96 deletions(-)
>
> --
> 2.25.1
>
>

[-- Attachment #2: Type: text/html, Size: 2290 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/5] plugins/cache: L2 cache modelling and a minor leak fix
  2021-08-10 13:48 [PATCH 0/5] plugins/cache: L2 cache modelling and a minor leak fix Mahmoud Mandour
                   ` (5 preceding siblings ...)
  2021-08-10 14:56 ` [PATCH 0/5] plugins/cache: L2 cache modelling and a minor leak fix Mahmoud Mandour
@ 2021-09-20 15:26 ` Mahmoud Mandour
  6 siblings, 0 replies; 11+ messages in thread
From: Mahmoud Mandour @ 2021-09-20 15:26 UTC (permalink / raw)
  To: open list:All patches CC here; +Cc: Alex Bennée

[-- Attachment #1: Type: text/plain, Size: 1294 bytes --]

Ping :-)

On Tue, Aug 10, 2021 at 3:48 PM Mahmoud Mandour <ma.mandourr@gmail.com>
wrote:

> Hello,
>
> This series implements a simple L2 unified per-core cache emulation, the L2
> cache is not enabled by default and is only enabled on specifying so using
> the
> plugin arguments. L2 cache is only accessed if L1 does not contain the
> wanted
> block. If a miss occur in L1, the block is "fetched" to L1, and then L2 is
> interrogated. This effectively mean that on a cache miss, we bring the
> data to
> all the cache hierarchy, which is quite reasonable.
>
> Note: Some +80-column lines are left as is, since they're all 81~83 cols
> and I
> thought that it's so important to break them since breaking a line usually
> looks
> ugly when not sufficiently long(?) LMK if I need to fix this :)
>
> Mahmoud Mandour (5):
>   plugins/cache: freed heap-allocated mutexes
>   plugins/cache: implement unified L2 cache emulation
>   plugins/cache: split command line arguments into name and value
>   plugins/cache: make L2 emulation optional through args
>   docs/tcg-plugins: add L2 arguments to cache docs
>
>  contrib/plugins/cache.c    | 318 ++++++++++++++++++++++++++-----------
>  docs/devel/tcg-plugins.rst |  22 ++-
>  2 files changed, 244 insertions(+), 96 deletions(-)
>
> --
> 2.25.1
>
>

[-- Attachment #2: Type: text/html, Size: 1682 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/5] plugins/cache: split command line arguments into name and value
  2021-08-10 13:48 ` [PATCH 3/5] plugins/cache: split command line arguments into name and value Mahmoud Mandour
@ 2021-10-08 14:05   ` Alex Bennée
  0 siblings, 0 replies; 11+ messages in thread
From: Alex Bennée @ 2021-10-08 14:05 UTC (permalink / raw)
  To: Mahmoud Mandour; +Cc: Alexandre Iooss, qemu-devel


Mahmoud Mandour <ma.mandourr@gmail.com> writes:

> This way of handling args is more lenient and sets a better framework to
> parse boolean command line arguments.
>
> Signed-off-by: Mahmoud Mandour <ma.mandourr@gmail.com>
> ---
>  contrib/plugins/cache.c | 57 ++++++++++++++++++++++-------------------
>  1 file changed, 30 insertions(+), 27 deletions(-)
>
> diff --git a/contrib/plugins/cache.c b/contrib/plugins/cache.c
> index 908c967a09..ff325beb9f 100644
> --- a/contrib/plugins/cache.c
> +++ b/contrib/plugins/cache.c
> @@ -11,6 +11,8 @@
>  
>  #include <qemu-plugin.h>
>  
> +#define STRTOLL(x) g_ascii_strtoll(x, NULL, 10)
> +

Why wrap the strtoll in a macro here? Also do we deal with signed
numbers, otherwise strtoull makes more sense.

>  QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
>  
>  static enum qemu_plugin_mem_rw rw = QEMU_PLUGIN_MEM_RW;
> @@ -746,35 +748,36 @@ int qemu_plugin_install(qemu_plugin_id_t id, const qemu_info_t *info,
>  
>      for (i = 0; i < argc; i++) {
>          char *opt = argv[i];
> -        if (g_str_has_prefix(opt, "iblksize=")) {
> -            l1_iblksize = g_ascii_strtoll(opt + 9, NULL, 10);
> -        } else if (g_str_has_prefix(opt, "iassoc=")) {
> -            l1_iassoc = g_ascii_strtoll(opt + 7, NULL, 10);
> -        } else if (g_str_has_prefix(opt, "icachesize=")) {
> -            l1_icachesize = g_ascii_strtoll(opt + 11, NULL, 10);
> -        } else if (g_str_has_prefix(opt, "dblksize=")) {
> -            l1_dblksize = g_ascii_strtoll(opt + 9, NULL, 10);
> -        } else if (g_str_has_prefix(opt, "dassoc=")) {
> -            l1_dassoc = g_ascii_strtoll(opt + 7, NULL, 10);
> -        } else if (g_str_has_prefix(opt, "dcachesize=")) {
> -            l1_dcachesize = g_ascii_strtoll(opt + 11, NULL, 10);
> -        } else if (g_str_has_prefix(opt, "limit=")) {
> -            limit = g_ascii_strtoll(opt + 6, NULL, 10);
> -        } else if (g_str_has_prefix(opt, "cores=")) {
> -            cores = g_ascii_strtoll(opt + 6, NULL, 10);
> -        } else if (g_str_has_prefix(opt, "l2cachesize=")) {
> -            l2_cachesize = g_ascii_strtoll(opt + 6, NULL, 10);
> -        } else if (g_str_has_prefix(opt, "l2blksize=")) {
> -            l2_blksize = g_ascii_strtoll(opt + 6, NULL, 10);
> -        } else if (g_str_has_prefix(opt, "l2assoc=")) {
> -            l2_assoc = g_ascii_strtoll(opt + 6, NULL, 10);
> -        } else if (g_str_has_prefix(opt, "evict=")) {
> -            gchar *p = opt + 6;
> -            if (g_strcmp0(p, "rand") == 0) {
> +        g_autofree char **tokens = g_strsplit(opt, "=", 2);

I think using strssplit here is fine, but we don't seem to to take care
that there is a valid tokens[1]. 

> +
> +        if (g_strcmp0(tokens[0], "iblksize") == 0) {
> +            l1_iblksize = STRTOLL(tokens[1]);
> +        } else if (g_strcmp0(tokens[0], "iassoc") == 0) {
> +            l1_iassoc = STRTOLL(tokens[1]);
> +        } else if (g_strcmp0(tokens[0], "icachesize") == 0) {
> +            l1_icachesize = STRTOLL(tokens[1]);
> +        } else if (g_strcmp0(tokens[0], "dblksize") == 0) {
> +            l1_dblksize = STRTOLL(tokens[1]);
> +        } else if (g_strcmp0(tokens[0], "dassoc") == 0) {
> +            l1_dassoc = STRTOLL(tokens[1]);
> +        } else if (g_strcmp0(tokens[0], "dcachesize") == 0) {
> +            l1_dcachesize = STRTOLL(tokens[1]);
> +        } else if (g_strcmp0(tokens[0], "limit") == 0) {
> +            limit = STRTOLL(tokens[1]);
> +        } else if (g_strcmp0(tokens[0], "cores") == 0) {
> +            cores = STRTOLL(tokens[1]);
> +        } else if (g_strcmp0(tokens[0], "l2cachesize") == 0) {
> +            l2_cachesize = STRTOLL(tokens[1]);
> +        } else if (g_strcmp0(tokens[0], "l2blksize") == 0) {
> +            l2_blksize = STRTOLL(tokens[1]);
> +        } else if (g_strcmp0(tokens[0], "l2assoc") == 0) {
> +            l2_assoc = STRTOLL(tokens[1]);
> +        } else if (g_strcmp0(tokens[0], "evict") == 0) {
> +            if (g_strcmp0(tokens[1], "rand") == 0) {
>                  policy = RAND;
> -            } else if (g_strcmp0(p, "lru") == 0) {
> +            } else if (g_strcmp0(tokens[1], "lru") == 0) {
>                  policy = LRU;
> -            } else if (g_strcmp0(p, "fifo") == 0) {
> +            } else if (g_strcmp0(tokens[1], "fifo") == 0) {
>                  policy = FIFO;
>              } else {
>                  fprintf(stderr, "invalid eviction policy: %s\n", opt);


-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/5] plugins/cache: freed heap-allocated mutexes
  2021-08-10 13:48 ` [PATCH 1/5] plugins/cache: freed heap-allocated mutexes Mahmoud Mandour
@ 2021-10-08 14:45   ` Alex Bennée
  0 siblings, 0 replies; 11+ messages in thread
From: Alex Bennée @ 2021-10-08 14:45 UTC (permalink / raw)
  To: Mahmoud Mandour; +Cc: Alexandre Iooss, qemu-devel


Mahmoud Mandour <ma.mandourr@gmail.com> writes:

> Signed-off-by: Mahmoud Mandour <ma.mandourr@gmail.com>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/5] plugins/cache: implement unified L2 cache emulation
  2021-08-10 13:48 ` [PATCH 2/5] plugins/cache: implement unified L2 cache emulation Mahmoud Mandour
@ 2021-10-08 15:44   ` Alex Bennée
  0 siblings, 0 replies; 11+ messages in thread
From: Alex Bennée @ 2021-10-08 15:44 UTC (permalink / raw)
  To: Mahmoud Mandour; +Cc: Alexandre Iooss, qemu-devel


Mahmoud Mandour <ma.mandourr@gmail.com> writes:

> This adds an implementation of a simple L2 configuration, in which a
> unified L2 cache (stores both blocks of instructions and data) is
> maintained for each core separately, with no inter-core interaction
> taken in account. The L2 cache is used as a backup for L1 and is only
> accessed if the wanted block does not exist in L1.
>
> In terms of multi-threaded user-space emulation, the same approximation
> of L1 is done, a static number of caches is maintained, and each and
> every memory access initiated by a thread will have to go through one of
> the available caches.
>
> An atomic increment is used to maintain the number of L2 misses per
> instruction.
>
> The default cache parameters of L2 caches is:
>
>     2MB cache size
>     16-way associativity
>     64-byte blocks
>
> Signed-off-by: Mahmoud Mandour <ma.mandourr@gmail.com>
> ---
>  contrib/plugins/cache.c | 256 +++++++++++++++++++++++++++-------------
>  1 file changed, 175 insertions(+), 81 deletions(-)
>
> diff --git a/contrib/plugins/cache.c b/contrib/plugins/cache.c
> index a255e26e25..908c967a09 100644
> --- a/contrib/plugins/cache.c
> +++ b/contrib/plugins/cache.c
> @@ -82,8 +82,9 @@ typedef struct {
>      char *disas_str;
>      const char *symbol;
>      uint64_t addr;
> -    uint64_t dmisses;
> -    uint64_t imisses;
> +    uint64_t l1_dmisses;
> +    uint64_t l1_imisses;
> +    uint64_t l2_misses;
>  } InsnData;
>  
>  void (*update_hit)(Cache *cache, int set, int blk);
> @@ -93,15 +94,20 @@ void (*metadata_init)(Cache *cache);
>  void (*metadata_destroy)(Cache *cache);
>  
>  static int cores;
> -static Cache **dcaches, **icaches;
> +static Cache **l1_dcaches, **l1_icaches;
> +static Cache **l2_ucaches;
>  
> -static GMutex *dcache_locks;
> -static GMutex *icache_locks;
> +static GMutex *l1_dcache_locks;
> +static GMutex *l1_icache_locks;
> +static GMutex *l2_ucache_locks;

Did you experiment with keeping a single locking hierarchy? I measured
quite a high contention with perf while running on system emulation.
While splitting locks can reduce contention I suspect the pattern of
access might just lead to 2 threads serialising twice in a row and
therfore adding to latency.

It might be overly complicated by the current split between i and d
cache for layer 1 which probably makes sense.

Otherwise looks reasonable to me:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-10-08 15:54 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-10 13:48 [PATCH 0/5] plugins/cache: L2 cache modelling and a minor leak fix Mahmoud Mandour
2021-08-10 13:48 ` [PATCH 1/5] plugins/cache: freed heap-allocated mutexes Mahmoud Mandour
2021-10-08 14:45   ` Alex Bennée
2021-08-10 13:48 ` [PATCH 2/5] plugins/cache: implement unified L2 cache emulation Mahmoud Mandour
2021-10-08 15:44   ` Alex Bennée
2021-08-10 13:48 ` [PATCH 3/5] plugins/cache: split command line arguments into name and value Mahmoud Mandour
2021-10-08 14:05   ` Alex Bennée
2021-08-10 13:48 ` [PATCH 4/5] plugins/cache: make L2 emulation optional through args Mahmoud Mandour
2021-08-10 13:48 ` [PATCH 5/5] docs/tcg-plugins: add L2 arguments to cache docs Mahmoud Mandour
2021-08-10 14:56 ` [PATCH 0/5] plugins/cache: L2 cache modelling and a minor leak fix Mahmoud Mandour
2021-09-20 15:26 ` Mahmoud Mandour

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.