* [PATCH v5 0/2] plugins/cache: multicore cache modelling
@ 2021-08-03 15:12 Mahmoud Mandour
  2021-08-03 15:13 ` [PATCH v5 1/2] plugins/cache: supported " Mahmoud Mandour
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Mahmoud Mandour @ 2021-08-03 15:12 UTC (permalink / raw)
  To: qemu-devel; +Cc: Mahmoud Mandour, alex.bennee

Hello,

This series introduces multicore cache modelling in
contrib/plugins/cache.c.

For full-system emulation, a private L1 cache is maintained for each
core available to the system. For multi-threaded userspace emulation, a
static number of cores is assumed for the overall system, and every
memory access goes through the caches of one of these cores, even if
the number of spawned threads exceeds that number.
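
As a rough illustration of the mapping described above, the sketch below
shows how a vCPU index can be folded onto a fixed number of modelled
cores. The patch itself computes `vcpu_index % cores` inline; the helper
name cache_index_for_vcpu is hypothetical and only used here for
illustration.

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of the plugin): pick which modelled
 * core's private caches a given vCPU should use. With more threads
 * than modelled cores, several vCPUs share one cache index.
 */
static int cache_index_for_vcpu(unsigned int vcpu_index, int cores)
{
    /* A zero or negative core count would make the modulo invalid. */
    assert(cores > 0);
    return (int)(vcpu_index % (unsigned int)cores);
}
```

With cores=4, vCPU 5 lands on cache index 1, so it shares (and may
evict entries from) the caches also used by vCPU 1.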

v4 -> v5:
    1. Reserved a mutex lock for each cache structure; now callbacks generated
    by accesses done by different vcpus don't block each other.
    2. Used atomic increment to access hashtable entries instead of locking.
    3. Renamed mtx to hashtable_lock to reflect its job more explicitly.
    4. Dropped the usage of CoreStats, embedded stats in the cache structure.
    5. append_stats_line now takes the stats explicitly.
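
The locking scheme from items 1 and 2 can be sketched roughly as
follows. This is a simplified stand-in, not the plugin code: it uses
pthread mutexes instead of GLib's GMutex so that it builds without
GLib, and the names ModelCache, ModelInsn, model_cache_init and
model_access are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one core-private cache: counters only. */
typedef struct {
    pthread_mutex_t lock;   /* guards this cache's counters only */
    uint64_t accesses;
    uint64_t misses;
} ModelCache;

/* Hypothetical stand-in for a per-instruction hash table entry. */
typedef struct {
    uint64_t misses;        /* shared across cores, updated atomically */
} ModelInsn;

static void model_cache_init(ModelCache *c)
{
    pthread_mutex_init(&c->lock, NULL);
    c->accesses = 0;
    c->misses = 0;
}

/*
 * Record one access on cache c. The hit/miss decision is passed in by
 * the caller; the real plugin derives it from access_cache().
 */
static void model_access(ModelCache *c, ModelInsn *insn, int hit)
{
    assert(c != NULL && insn != NULL);

    /* Only accesses to the same cache serialize on this lock. */
    pthread_mutex_lock(&c->lock);
    if (!hit) {
        /* The entry is shared by all cores, so no per-cache lock helps. */
        __atomic_fetch_add(&insn->misses, 1, __ATOMIC_SEQ_CST);
        c->misses++;
    }
    c->accesses++;
    pthread_mutex_unlock(&c->lock);
}
```

Because each cache has its own lock, callbacks for different modelled
cores do not contend with each other, while the per-instruction miss
counter stays consistent through the atomic increment.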

Mahmoud Mandour (2):
  plugins/cache: supported multicore cache modelling
  docs/devel/tcg-plugins: added cores arg to cache plugin

 contrib/plugins/cache.c    | 176 +++++++++++++++++++++++++++----------
 docs/devel/tcg-plugins.rst |  13 +--
 2 files changed, 140 insertions(+), 49 deletions(-)

-- 
2.25.1




* [PATCH v5 1/2] plugins/cache: supported multicore cache modelling
  2021-08-03 15:12 [PATCH v5 0/2] plugins/cache: multicore cache modelling Mahmoud Mandour
@ 2021-08-03 15:13 ` Mahmoud Mandour
  2021-08-03 15:13 ` [PATCH v5 2/2] docs/devel/tcg-plugins: added cores arg to cache plugin Mahmoud Mandour
  2021-08-03 21:10 ` [PATCH v5 0/2] plugins/cache: multicore cache modelling Alex Bennée
  2 siblings, 0 replies; 6+ messages in thread
From: Mahmoud Mandour @ 2021-08-03 15:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexandre Iooss, Mahmoud Mandour, alex.bennee

Multicore L1 cache modelling is introduced and is supported for both
full system emulation and linux-user.

For full-system emulation, L1 icache and dcache are maintained for each
available core, since this information is exposed to the plugin through
`qemu_plugin_n_vcpus()`.

For linux-user, a static number of cores is assumed (1 by default; it
can be set with the plugin argument `cores=N`). Every memory access
goes through one of these caches. This approach is taken because it is
somewhat akin to what happens on a real setup: when a program
dispatches more threads than there are available cores, the threads
thrash each other.

Signed-off-by: Mahmoud Mandour <ma.mandourr@gmail.com>
---
 contrib/plugins/cache.c | 176 ++++++++++++++++++++++++++++++----------
 1 file changed, 132 insertions(+), 44 deletions(-)

diff --git a/contrib/plugins/cache.c b/contrib/plugins/cache.c
index 066ea6d8ec..a1e03ca882 100644
--- a/contrib/plugins/cache.c
+++ b/contrib/plugins/cache.c
@@ -17,18 +17,12 @@ static enum qemu_plugin_mem_rw rw = QEMU_PLUGIN_MEM_RW;
 
 static GHashTable *miss_ht;
 
-static GMutex mtx;
+static GMutex hashtable_lock;
 static GRand *rng;
 
 static int limit;
 static bool sys;
 
-static uint64_t dmem_accesses;
-static uint64_t dmisses;
-
-static uint64_t imem_accesses;
-static uint64_t imisses;
-
 enum EvictionPolicy {
     LRU,
     FIFO,
@@ -80,6 +74,8 @@ typedef struct {
     int blksize_shift;
     uint64_t set_mask;
     uint64_t tag_mask;
+    uint64_t accesses;
+    uint64_t misses;
 } Cache;
 
 typedef struct {
@@ -96,7 +92,16 @@ void (*update_miss)(Cache *cache, int set, int blk);
 void (*metadata_init)(Cache *cache);
 void (*metadata_destroy)(Cache *cache);
 
-Cache *dcache, *icache;
+static int cores;
+static Cache **dcaches, **icaches;
+
+static GMutex *dcache_locks;
+static GMutex *icache_locks;
+
+static uint64_t all_dmem_accesses;
+static uint64_t all_imem_accesses;
+static uint64_t all_imisses;
+static uint64_t all_dmisses;
 
 static int pow_of_two(int num)
 {
@@ -233,20 +238,24 @@ static bool bad_cache_params(int blksize, int assoc, int cachesize)
 
 static Cache *cache_init(int blksize, int assoc, int cachesize)
 {
-    if (bad_cache_params(blksize, assoc, cachesize)) {
-        return NULL;
-    }
-
     Cache *cache;
     int i;
     uint64_t blk_mask;
 
+    /*
+     * This function shall not be called directly, and hence expects suitable
+     * parameters.
+     */
+    g_assert(!bad_cache_params(blksize, assoc, cachesize));
+
     cache = g_new(Cache, 1);
     cache->assoc = assoc;
     cache->cachesize = cachesize;
     cache->num_sets = cachesize / (blksize * assoc);
     cache->sets = g_new(CacheSet, cache->num_sets);
     cache->blksize_shift = pow_of_two(blksize);
+    cache->accesses = 0;
+    cache->misses = 0;
 
     for (i = 0; i < cache->num_sets; i++) {
         cache->sets[i].blocks = g_new0(CacheBlock, assoc);
@@ -263,6 +272,24 @@ static Cache *cache_init(int blksize, int assoc, int cachesize)
     return cache;
 }
 
+static Cache **caches_init(int blksize, int assoc, int cachesize)
+{
+    Cache **caches;
+    int i;
+
+    if (bad_cache_params(blksize, assoc, cachesize)) {
+        return NULL;
+    }
+
+    caches = g_new(Cache *, cores);
+
+    for (i = 0; i < cores; i++) {
+        caches[i] = cache_init(blksize, assoc, cachesize);
+    }
+
+    return caches;
+}
+
 static int get_invalid_block(Cache *cache, uint64_t set)
 {
     int i;
@@ -353,6 +380,7 @@ static void vcpu_mem_access(unsigned int vcpu_index, qemu_plugin_meminfo_t info,
 {
     uint64_t effective_addr;
     struct qemu_plugin_hwaddr *hwaddr;
+    int cache_idx;
     InsnData *insn;
 
     hwaddr = qemu_plugin_get_hwaddr(info, vaddr);
@@ -361,32 +389,35 @@ static void vcpu_mem_access(unsigned int vcpu_index, qemu_plugin_meminfo_t info,
     }
 
     effective_addr = hwaddr ? qemu_plugin_hwaddr_phys_addr(hwaddr) : vaddr;
+    cache_idx = vcpu_index % cores;
 
-    g_mutex_lock(&mtx);
-    if (!access_cache(dcache, effective_addr)) {
+    g_mutex_lock(&dcache_locks[cache_idx]);
+    if (!access_cache(dcaches[cache_idx], effective_addr)) {
         insn = (InsnData *) userdata;
-        insn->dmisses++;
-        dmisses++;
+        __atomic_fetch_add(&insn->dmisses, 1, __ATOMIC_SEQ_CST);
+        dcaches[cache_idx]->misses++;
     }
-    dmem_accesses++;
-    g_mutex_unlock(&mtx);
+    dcaches[cache_idx]->accesses++;
+    g_mutex_unlock(&dcache_locks[cache_idx]);
 }
 
 static void vcpu_insn_exec(unsigned int vcpu_index, void *userdata)
 {
     uint64_t insn_addr;
     InsnData *insn;
+    int cache_idx;
 
-    g_mutex_lock(&mtx);
     insn_addr = ((InsnData *) userdata)->addr;
 
-    if (!access_cache(icache, insn_addr)) {
+    cache_idx = vcpu_index % cores;
+    g_mutex_lock(&icache_locks[cache_idx]);
+    if (!access_cache(icaches[cache_idx], insn_addr)) {
         insn = (InsnData *) userdata;
-        insn->imisses++;
-        imisses++;
+        __atomic_fetch_add(&insn->imisses, 1, __ATOMIC_SEQ_CST);
+        icaches[cache_idx]->misses++;
     }
-    imem_accesses++;
-    g_mutex_unlock(&mtx);
+    icaches[cache_idx]->accesses++;
+    g_mutex_unlock(&icache_locks[cache_idx]);
 }
 
 static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
@@ -411,7 +442,7 @@ static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
          * new entries for those instructions. Instead, we fetch the same
          * entry from the hash table and register it for the callback again.
          */
-        g_mutex_lock(&mtx);
+        g_mutex_lock(&hashtable_lock);
         data = g_hash_table_lookup(miss_ht, GUINT_TO_POINTER(effective_addr));
         if (data == NULL) {
             data = g_new0(InsnData, 1);
@@ -421,7 +452,7 @@ static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
             g_hash_table_insert(miss_ht, GUINT_TO_POINTER(effective_addr),
                                (gpointer) data);
         }
-        g_mutex_unlock(&mtx);
+        g_mutex_unlock(&hashtable_lock);
 
         qemu_plugin_register_vcpu_mem_cb(insn, vcpu_mem_access,
                                          QEMU_PLUGIN_CB_NO_REGS,
@@ -453,6 +484,15 @@ static void cache_free(Cache *cache)
     g_free(cache);
 }
 
+static void caches_free(Cache **caches)
+{
+    int i;
+
+    for (i = 0; i < cores; i++) {
+        cache_free(caches[i]);
+    }
+}
+
 static int dcmp(gconstpointer a, gconstpointer b)
 {
     InsnData *insn_a = (InsnData *) a;
@@ -461,6 +501,37 @@ static int dcmp(gconstpointer a, gconstpointer b)
     return insn_a->dmisses < insn_b->dmisses ? 1 : -1;
 }
 
+static void append_stats_line(GString *line, uint64_t daccess, uint64_t dmisses,
+                              uint64_t iaccess, uint64_t imisses)
+{
+    double dmiss_rate, imiss_rate;
+
+    dmiss_rate = ((double) dmisses) / (daccess) * 100.0;
+    imiss_rate = ((double) imisses) / (iaccess) * 100.0;
+
+    g_string_append_printf(line, "%-14lu %-12lu %9.4lf%%  %-14lu %-12lu"
+                           " %9.4lf%%\n",
+                           daccess,
+                           dmisses,
+                           daccess ? dmiss_rate : 0.0,
+                           iaccess,
+                           imisses,
+                           iaccess ? imiss_rate : 0.0);
+}
+
+static void sum_stats(void)
+{
+    int i;
+
+    g_assert(cores > 1);
+    for (i = 0; i < cores; i++) {
+        all_imisses += icaches[i]->misses;
+        all_dmisses += dcaches[i]->misses;
+        all_imem_accesses += icaches[i]->accesses;
+        all_dmem_accesses += dcaches[i]->accesses;
+    }
+}
+
 static int icmp(gconstpointer a, gconstpointer b)
 {
     InsnData *insn_a = (InsnData *) a;
@@ -471,19 +542,29 @@ static int icmp(gconstpointer a, gconstpointer b)
 
 static void log_stats(void)
 {
-    g_autoptr(GString) rep = g_string_new("");
-    g_string_append_printf(rep,
-        "Data accesses: %lu, Misses: %lu\nMiss rate: %lf%%\n\n",
-        dmem_accesses,
-        dmisses,
-        ((double) dmisses / (double) dmem_accesses) * 100.0);
-
-    g_string_append_printf(rep,
-        "Instruction accesses: %lu, Misses: %lu\nMiss rate: %lf%%\n\n",
-        imem_accesses,
-        imisses,
-        ((double) imisses / (double) imem_accesses) * 100.0);
+    int i;
+    Cache *icache, *dcache;
+
+    g_autoptr(GString) rep = g_string_new("core #, data accesses, data misses,"
+                                          " dmiss rate, insn accesses,"
+                                          " insn misses, imiss rate\n");
+
+    for (i = 0; i < cores; i++) {
+        g_string_append_printf(rep, "%-8d", i);
+        dcache = dcaches[i];
+        icache = icaches[i];
+        append_stats_line(rep, dcache->accesses, dcache->misses,
+                icache->accesses, icache->misses);
+    }
+
+    if (cores > 1) {
+        sum_stats();
+        g_string_append_printf(rep, "%-8s", "sum");
+        append_stats_line(rep, all_dmem_accesses, all_dmisses,
+                all_imem_accesses, all_imisses);
+    }
 
+    g_string_append(rep, "\n");
     qemu_plugin_outs(rep->str);
 }
 
@@ -530,8 +611,8 @@ static void plugin_exit(qemu_plugin_id_t id, void *p)
     log_stats();
     log_top_insns();
 
-    cache_free(dcache);
-    cache_free(icache);
+    caches_free(dcaches);
+    caches_free(icaches);
 
     g_hash_table_destroy(miss_ht);
 }
@@ -579,6 +660,8 @@ int qemu_plugin_install(qemu_plugin_id_t id, const qemu_info_t *info,
 
     policy = LRU;
 
+    cores = sys ? qemu_plugin_n_vcpus() : 1;
+
     for (i = 0; i < argc; i++) {
         char *opt = argv[i];
         if (g_str_has_prefix(opt, "iblksize=")) {
@@ -595,6 +678,8 @@ int qemu_plugin_install(qemu_plugin_id_t id, const qemu_info_t *info,
             dcachesize = g_ascii_strtoll(opt + 11, NULL, 10);
         } else if (g_str_has_prefix(opt, "limit=")) {
             limit = g_ascii_strtoll(opt + 6, NULL, 10);
+        } else if (g_str_has_prefix(opt, "cores=")) {
+            cores = g_ascii_strtoll(opt + 6, NULL, 10);
         } else if (g_str_has_prefix(opt, "evict=")) {
             gchar *p = opt + 6;
             if (g_strcmp0(p, "rand") == 0) {
@@ -615,22 +700,25 @@ int qemu_plugin_install(qemu_plugin_id_t id, const qemu_info_t *info,
 
     policy_init();
 
-    dcache = cache_init(dblksize, dassoc, dcachesize);
-    if (!dcache) {
+    dcaches = caches_init(dblksize, dassoc, dcachesize);
+    if (!dcaches) {
         const char *err = cache_config_error(dblksize, dassoc, dcachesize);
         fprintf(stderr, "dcache cannot be constructed from given parameters\n");
         fprintf(stderr, "%s\n", err);
         return -1;
     }
 
-    icache = cache_init(iblksize, iassoc, icachesize);
-    if (!icache) {
+    icaches = caches_init(iblksize, iassoc, icachesize);
+    if (!icaches) {
         const char *err = cache_config_error(iblksize, iassoc, icachesize);
         fprintf(stderr, "icache cannot be constructed from given parameters\n");
         fprintf(stderr, "%s\n", err);
         return -1;
     }
 
+    dcache_locks = g_new0(GMutex, cores);
+    icache_locks = g_new0(GMutex, cores);
+
     qemu_plugin_register_vcpu_tb_trans_cb(id, vcpu_tb_trans);
     qemu_plugin_register_atexit_cb(id, plugin_exit, NULL);
 
-- 
2.25.1




* [PATCH v5 2/2] docs/devel/tcg-plugins: added cores arg to cache plugin
  2021-08-03 15:12 [PATCH v5 0/2] plugins/cache: multicore cache modelling Mahmoud Mandour
  2021-08-03 15:13 ` [PATCH v5 1/2] plugins/cache: supported " Mahmoud Mandour
@ 2021-08-03 15:13 ` Mahmoud Mandour
  2021-08-03 21:10 ` [PATCH v5 0/2] plugins/cache: multicore cache modelling Alex Bennée
  2 siblings, 0 replies; 6+ messages in thread
From: Mahmoud Mandour @ 2021-08-03 15:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexandre Iooss, Mahmoud Mandour, alex.bennee

Signed-off-by: Mahmoud Mandour <ma.mandourr@gmail.com>
---
 docs/devel/tcg-plugins.rst | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/docs/devel/tcg-plugins.rst b/docs/devel/tcg-plugins.rst
index 7e54f12837..863828809d 100644
--- a/docs/devel/tcg-plugins.rst
+++ b/docs/devel/tcg-plugins.rst
@@ -355,11 +355,8 @@ configuration when a given working set is run::
 
 will report the following::
 
-    Data accesses: 996479, Misses: 507
-    Miss rate: 0.050879%
-
-    Instruction accesses: 2641737, Misses: 18617
-    Miss rate: 0.704726%
+    core #, data accesses, data misses, dmiss rate, insn accesses, insn misses, imiss rate
+    0       996695         508             0.0510%  2642799        18617           0.7044%
 
     address, data misses, instruction
     0x424f1e (_int_malloc), 109, movq %rax, 8(%rcx)
@@ -403,3 +400,9 @@ The plugin has a number of arguments, all of them are optional:
   Sets the eviction policy to POLICY. Available policies are: :code:`lru`,
   :code:`fifo`, and :code:`rand`. The plugin will use the specified policy for
   both instruction and data caches. (default: POLICY = :code:`lru`)
+
+  * arg="cores=N"
+
+  Sets the number of cores for which we maintain separate icache and dcache.
+  (default: for linux-user, N = 1, for full system emulation: N = cores
+  available to guest)
-- 
2.25.1




* Re: [PATCH v5 0/2] plugins/cache: multicore cache modelling
  2021-08-03 15:12 [PATCH v5 0/2] plugins/cache: multicore cache modelling Mahmoud Mandour
  2021-08-03 15:13 ` [PATCH v5 1/2] plugins/cache: supported " Mahmoud Mandour
  2021-08-03 15:13 ` [PATCH v5 2/2] docs/devel/tcg-plugins: added cores arg to cache plugin Mahmoud Mandour
@ 2021-08-03 21:10 ` Alex Bennée
  2021-08-04 11:54   ` Mahmoud Mandour
  2 siblings, 1 reply; 6+ messages in thread
From: Alex Bennée @ 2021-08-03 21:10 UTC (permalink / raw)
  To: Mahmoud Mandour; +Cc: qemu-devel


Mahmoud Mandour <ma.mandourr@gmail.com> writes:

> Hello,
>
> This series introduce multicore cache modelling in contrib/plugins/cache.c
>
> Multi-core cache modelling is handled such that for full-system
> emulation, a private L1 cache is maintained to each core available to
> the system. For multi-threaded userspace emulation, a static number of
> cores is maintained for the overall system, and every memory access go
> through one of these, even if the number of fired threads is more than
> that number.

Queued to plugins/next, thanks.

-- 
Alex Bennée



* Re: [PATCH v5 0/2] plugins/cache: multicore cache modelling
  2021-08-03 21:10 ` [PATCH v5 0/2] plugins/cache: multicore cache modelling Alex Bennée
@ 2021-08-04 11:54   ` Mahmoud Mandour
  2021-08-04 14:47     ` Alex Bennée
  0 siblings, 1 reply; 6+ messages in thread
From: Mahmoud Mandour @ 2021-08-04 11:54 UTC (permalink / raw)
  To: Alex Bennée; +Cc: open list:All patches CC here


On Tue, Aug 3, 2021 at 11:10 PM Alex Bennée <alex.bennee@linaro.org> wrote:

>
> Mahmoud Mandour <ma.mandourr@gmail.com> writes:
>
> > Hello,
> >
> > This series introduce multicore cache modelling in
> contrib/plugins/cache.c
> >
> > Multi-core cache modelling is handled such that for full-system
> > emulation, a private L1 cache is maintained to each core available to
> > the system. For multi-threaded userspace emulation, a static number of
> > cores is maintained for the overall system, and every memory access go
> > through one of these, even if the number of fired threads is more than
> > that number.
>
> Queued to plugins/next, thanks.
>
>
From what I can see in the plugins/next branch of your fork
<https://github.com/stsquad/qemu/blob/plugins/next/contrib/plugins/cache.c>,
I think you queued v4 of the patches.


> --
> Alex Bennée
>


* Re: [PATCH v5 0/2] plugins/cache: multicore cache modelling
  2021-08-04 11:54   ` Mahmoud Mandour
@ 2021-08-04 14:47     ` Alex Bennée
  0 siblings, 0 replies; 6+ messages in thread
From: Alex Bennée @ 2021-08-04 14:47 UTC (permalink / raw)
  To: Mahmoud Mandour; +Cc: open list:All patches CC here


Mahmoud Mandour <ma.mandourr@gmail.com> writes:

> On Tue, Aug 3, 2021 at 11:10 PM Alex Bennée <alex.bennee@linaro.org> wrote:
>
>  Mahmoud Mandour <ma.mandourr@gmail.com> writes:
>
>  > Hello,
>  >
>  > This series introduce multicore cache modelling in contrib/plugins/cache.c
>  >
>  > Multi-core cache modelling is handled such that for full-system
>  > emulation, a private L1 cache is maintained to each core available to
>  > the system. For multi-threaded userspace emulation, a static number of
>  > cores is maintained for the overall system, and every memory access go
>  > through one of these, even if the number of fired threads is more than
>  > that number.
>
>  Queued to plugins/next, thanks.
>
> From what I can see from your fork, qemu/cache.c at plugins/next · stsquad/qemu · GitHub, 
> here, I think you enqueued v4 of the patches

No, I just haven't re-pushed the branch yet.

>  
>  -- 
>  Alex Bennée


-- 
Alex Bennée

