All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH  v2 0/6] plugins/next (lockstep, api, hwprofile)
@ 2020-06-10 15:55 Alex Bennée
  2020-06-10 15:55 ` [PATCH v2 1/6] iotests: 194: wait migration completion on target too Alex Bennée
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Alex Bennée @ 2020-06-10 15:55 UTC (permalink / raw)
  To: qemu-devel
  Cc: robert.foley, robhenry, aaron, cota, kuhn.chenqun, peter.puhov,
	Alex Bennée

Hi,

This is the current plugins/next queue. The main changes are:

  - cputlb corruption workaround now just saves data ahead of io_writex
  - tweak to format of virtio-pci naming
  - the hwaddr device name now returns a g_intern_string()
  - bunch of extra features to hwprofile

The question of if we should expose mr->name is still an open one. My
alternate suggestion of sticking to explicitly -device with id=
entries met with crickets so I would welcome other thoughts.

The following are still missing reviews:

 - cputlb: ensure we save the IOTLB data in case of reset
 - plugins: new lockstep plugin for debugging TCG changes

Alex Bennée (5):
  plugins: new lockstep plugin for debugging TCG changes
  cputlb: ensure we save the IOTLB data in case of reset
  hw/virtio/pci: include vdev name in registered PCI sections
  plugins: add API to return a name for a IO device
  plugins: new hwprofile plugin

Vladimir Sementsov-Ogievskiy (1):
  iotests: 194: wait migration completion on target too

 include/qemu/qemu-plugin.h |   6 +
 accel/tcg/cputlb.c         |  63 ++++++-
 hw/virtio/virtio-pci.c     |  22 ++-
 plugins/api.c              |  20 +++
 tests/plugin/hwprofile.c   | 305 +++++++++++++++++++++++++++++++++
 tests/plugin/lockstep.c    | 340 +++++++++++++++++++++++++++++++++++++
 tests/plugin/Makefile      |   2 +
 tests/qemu-iotests/194     |  10 ++
 tests/qemu-iotests/194.out |   5 +
 tests/tcg/Makefile.target  |   9 +-
 10 files changed, 771 insertions(+), 11 deletions(-)
 create mode 100644 tests/plugin/hwprofile.c
 create mode 100644 tests/plugin/lockstep.c

-- 
2.20.1



^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2 1/6] iotests: 194: wait migration completion on target too
  2020-06-10 15:55 [PATCH v2 0/6] plugins/next (lockstep, api, hwprofile) Alex Bennée
@ 2020-06-10 15:55 ` Alex Bennée
  2020-06-10 16:38   ` Alex Bennée
  2020-06-10 15:55 ` [PATCH v2 2/6] plugins: new lockstep plugin for debugging TCG changes Alex Bennée
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Alex Bennée @ 2020-06-10 15:55 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Thomas Huth, Vladimir Sementsov-Ogievskiy,
	robert.foley, open list:Block layer core, Alex Bennée,
	robhenry, Max Reitz, aaron, cota, kuhn.chenqun, peter.puhov

From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

It is possible, that shutdown on target occurs earlier than migration
finish. In this case we crash in bdrv_release_dirty_bitmap_locked()
on assertion "assert(!bdrv_dirty_bitmap_busy(bitmap));" as we do have
busy bitmap, as bitmap migration is ongoing.

We'll fix bitmap migration to gracefully cancel on early shutdown soon.
Now let's fix iotest 194 to wait migration completion before shutdown.

Note that in this test dest_vm.shutdown() is called implicitly, as vms
used as context-providers, see __exit__() method of QEMUMachine class.

Actually, not waiting migration finish is a wrong thing, but the test
started to crash after commit ae00aa239847682
"iotests: 194: test also migration of dirty bitmap", which added dirty
bitmaps here. So, Fixes: tag won't hurt.

Fixes: ae00aa2398476824f0eca80461da215e7cdc1c3b
Reported-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20200604083341.26978-1-vsementsov@virtuozzo.com>
---
 tests/qemu-iotests/194     | 10 ++++++++++
 tests/qemu-iotests/194.out |  5 +++++
 2 files changed, 15 insertions(+)

diff --git a/tests/qemu-iotests/194 b/tests/qemu-iotests/194
index 3fad7c6c1ab..6dc2bc94d7e 100755
--- a/tests/qemu-iotests/194
+++ b/tests/qemu-iotests/194
@@ -87,4 +87,14 @@ with iotests.FilePath('source.img') as source_img_path, \
             iotests.log(dest_vm.qmp('nbd-server-stop'))
             break
 
+    iotests.log('Wait migration completion on target...')
+    migr_events = (('MIGRATION', {'data': {'status': 'completed'}}),
+                   ('MIGRATION', {'data': {'status': 'failed'}}))
+    event = dest_vm.events_wait(migr_events)
+    iotests.log(event, filters=[iotests.filter_qmp_event])
+
+    iotests.log('Check bitmaps on source:')
     iotests.log(source_vm.qmp('query-block')['return'][0]['dirty-bitmaps'])
+
+    iotests.log('Check bitmaps on target:')
+    iotests.log(dest_vm.qmp('query-block')['return'][0]['dirty-bitmaps'])
diff --git a/tests/qemu-iotests/194.out b/tests/qemu-iotests/194.out
index dd60dcc14f1..f70cf7610e0 100644
--- a/tests/qemu-iotests/194.out
+++ b/tests/qemu-iotests/194.out
@@ -21,4 +21,9 @@ Gracefully ending the `drive-mirror` job on source...
 {"data": {"device": "mirror-job0", "len": 1073741824, "offset": 1073741824, "speed": 0, "type": "mirror"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
 Stopping the NBD server on destination...
 {"return": {}}
+Wait migration completion on target...
+{"data": {"status": "completed"}, "event": "MIGRATION", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
+Check bitmaps on source:
+[{"busy": false, "count": 0, "granularity": 65536, "name": "bitmap0", "persistent": false, "recording": true, "status": "active"}]
+Check bitmaps on target:
 [{"busy": false, "count": 0, "granularity": 65536, "name": "bitmap0", "persistent": false, "recording": true, "status": "active"}]
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 2/6] plugins: new lockstep plugin for debugging TCG changes
  2020-06-10 15:55 [PATCH v2 0/6] plugins/next (lockstep, api, hwprofile) Alex Bennée
  2020-06-10 15:55 ` [PATCH v2 1/6] iotests: 194: wait migration completion on target too Alex Bennée
@ 2020-06-10 15:55 ` Alex Bennée
  2020-06-11 17:04   ` Robert Foley
  2020-06-10 15:55 ` [PATCH v2 3/6] cputlb: ensure we save the IOTLB data in case of reset Alex Bennée
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Alex Bennée @ 2020-06-10 15:55 UTC (permalink / raw)
  To: qemu-devel
  Cc: robert.foley, Mark Cave-Ayland, Richard Henderson, robhenry,
	Philippe Mathieu-Daudé,
	aaron, cota, kuhn.chenqun, peter.puhov, Alex Bennée

When we make changes to the TCG we sometimes cause regressions that
are deep into the execution cycle of the guest. Debugging this often
requires comparing large volumes of trace information to figure out
where behaviour has diverged.

The lockstep plugin utilises a shared socket so two QEMU's running
with the plugin will write their current execution position and wait
to receive the position of their partner process. When execution
diverges the plugins output where they were and the previous few
blocks before unloading themselves and letting execution continue.

Originally I planned for this to be most useful with -icount but it
turns out you can get divergence pretty quickly due to asynchronous
qemu_cpu_kick_rr_cpus() events causing one side to eventually run into
a short block a few cycles before the other side. For this reason I've
added a bit of tracking and I think the divergence reporting could be
finessed to report only if we really start to diverge in execution.

An example run would be:

  qemu-system-sparc -monitor none -parallel none -net none \
    -M SS-20 -m 256 -kernel day11/zImage.elf \
    -plugin ./tests/plugin/liblockstep.so,arg=lockstep-sparc.sock \
    -d plugin,nochain

with an identical command in another window in the same working
directory.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20200429200754.18327-1-alex.bennee@linaro.org>

---
v3
  - added verbose flag
  - basic heuristics to detect "real" divergence
  - checkpatch tweaks
---
 tests/plugin/lockstep.c   | 340 ++++++++++++++++++++++++++++++++++++++
 tests/plugin/Makefile     |   1 +
 tests/tcg/Makefile.target |   2 +-
 3 files changed, 342 insertions(+), 1 deletion(-)
 create mode 100644 tests/plugin/lockstep.c

diff --git a/tests/plugin/lockstep.c b/tests/plugin/lockstep.c
new file mode 100644
index 00000000000..a696673dff3
--- /dev/null
+++ b/tests/plugin/lockstep.c
@@ -0,0 +1,340 @@
+/*
+ * Lockstep Execution Plugin
+ *
+ * Allows you to execute two QEMU instances in lockstep and report
+ * when their execution diverges. This is mainly useful for developers
+ * who want to see where a change to TCG code generation has
+ * introduced a subtle and hard to find bug.
+ *
+ * Caveats:
+ *   - single-threaded linux-user apps only with non-deterministic syscalls
+ *   - no MTTCG enabled system emulation (icount may help)
+ *
+ * While icount makes things more deterministic it doesn't mean a
+ * particular run may execute the exact same sequence of blocks. An
+ * asynchronous event (for example X11 graphics update) may cause a
+ * block to end early and a new partial block to start. This means
+ * serial only test cases are a better bet. -d nochain may also help.
+ *
+ * This code is not thread safe!
+ *
+ * Copyright (c) 2020 Linaro Ltd
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include <glib.h>
+#include <inttypes.h>
+#include <unistd.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+#include <stdio.h>
+#include <errno.h>
+
+#include <qemu-plugin.h>
+
+QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
+
+/* saved so we can uninstall later */
+static qemu_plugin_id_t our_id;
+
+static unsigned long bb_count;
+static unsigned long insn_count;
+
+/* Information about a translated block */
+typedef struct {
+    uint64_t pc;
+    uint64_t insns;
+} BlockInfo;
+
+/* Information about an execution state in the log */
+typedef struct {
+    BlockInfo *block;
+    unsigned long insn_count;
+    unsigned long block_count;
+} ExecInfo;
+
+/* The execution state we compare */
+typedef struct {
+    uint64_t pc;
+    unsigned long insn_count;
+} ExecState;
+
+typedef struct {
+    GSList *log_pos;
+    int distance;
+} DivergeState;
+
+/* list of translated block info */
+static GSList *blocks;
+
+/* execution log and points of divergence */
+static GSList *log, *divergence_log;
+
+static int socket_fd;
+static char *path_to_unlink;
+
+static bool verbose;
+
+static void plugin_cleanup(qemu_plugin_id_t id)
+{
+    /* Free our block data */
+    g_slist_free_full(blocks, &g_free);
+    g_slist_free_full(log, &g_free);
+    g_slist_free(divergence_log);
+
+    close(socket_fd);
+    if (path_to_unlink) {
+        unlink(path_to_unlink);
+    }
+}
+
+static void plugin_exit(qemu_plugin_id_t id, void *p)
+{
+    g_autoptr(GString) out = g_string_new("No divergence :-)\n");
+    g_string_append_printf(out, "Executed %ld/%d blocks\n",
+                           bb_count, g_slist_length(log));
+    g_string_append_printf(out, "Executed ~%ld instructions\n", insn_count);
+    qemu_plugin_outs(out->str);
+
+    plugin_cleanup(id);
+}
+
+static void report_divergance(ExecState *us, ExecState *them)
+{
+    DivergeState divrec = { log, 0 };
+    g_autoptr(GString) out = g_string_new("");
+    bool diverged = false;
+
+    /*
+     * If we have diverged before did we get back on track or are we
+     * totally loosing it?
+     */
+    if (divergence_log) {
+        DivergeState *last = (DivergeState *) divergence_log->data;
+        GSList *entry;
+
+        for (entry = log; g_slist_next(entry); entry = g_slist_next(entry)) {
+            if (entry == last->log_pos) {
+                break;
+            }
+            divrec.distance++;
+        }
+
+        /*
+         * If the last two records are so close it is likely we will
+         * not recover synchronisation with the other end.
+         */
+        if (divrec.distance == 1 && last->distance == 1) {
+            diverged = true;
+        }
+    }
+    divergence_log = g_slist_prepend(divergence_log,
+                                     g_memdup(&divrec, sizeof(divrec)));
+
+    /* Output short log entry of going out of sync... */
+    if (verbose || divrec.distance == 1 || diverged) {
+        g_string_printf(out, "@ %#016lx vs %#016lx (%d/%d since last)\n",
+                        us->pc, them->pc, g_slist_length(divergence_log),
+                        divrec.distance);
+        qemu_plugin_outs(out->str);
+    }
+
+    if (diverged) {
+        int i;
+        GSList *entry;
+
+        g_string_printf(out, "Δ insn_count @ %#016lx (%ld) vs %#016lx (%ld)\n",
+                        us->pc, us->insn_count, them->pc, them->insn_count);
+
+        for (entry = log, i = 0;
+             g_slist_next(entry) && i < 5;
+             entry = g_slist_next(entry), i++) {
+            ExecInfo *prev = (ExecInfo *) entry->data;
+            g_string_append_printf(out,
+                                   "  previously @ %#016lx/%ld (%ld insns)\n",
+                                   prev->block->pc, prev->block->insns,
+                                   prev->insn_count);
+        }
+        qemu_plugin_outs(out->str);
+        qemu_plugin_outs("too much divergence... giving up.");
+        qemu_plugin_uninstall(our_id, plugin_cleanup);
+    }
+}
+
+static void vcpu_tb_exec(unsigned int cpu_index, void *udata)
+{
+    BlockInfo *bi = (BlockInfo *) udata;
+    ExecState us, them;
+    ssize_t bytes;
+    ExecInfo *exec;
+
+    us.pc = bi->pc;
+    us.insn_count = insn_count;
+
+    /*
+     * Write our current position to the other end. If we fail the
+     * other end has probably died and we should shut down gracefully.
+     */
+    bytes = write(socket_fd, &us, sizeof(ExecState));
+    if (bytes < sizeof(ExecState)) {
+        qemu_plugin_outs(bytes < 0 ?
+                         "problem writing to socket" :
+                         "wrote less than expected to socket");
+        qemu_plugin_uninstall(our_id, plugin_cleanup);
+        return;
+    }
+
+    /*
+     * Now read where our peer has reached. Again a failure probably
+     * indicates the other end died and we should close down cleanly.
+     */
+    bytes = read(socket_fd, &them, sizeof(ExecState));
+    if (bytes < sizeof(ExecState)) {
+        qemu_plugin_outs(bytes < 0 ?
+                         "problem reading from socket" :
+                         "read less than expected");
+        qemu_plugin_uninstall(our_id, plugin_cleanup);
+        return;
+    }
+
+    /*
+     * Compare and report if we have diverged.
+     */
+    if (us.pc != them.pc) {
+        report_divergance(&us, &them);
+    }
+
+    /*
+     * Assume this block will execute fully and record it
+     * in the execution log.
+     */
+    insn_count += bi->insns;
+    bb_count++;
+    exec = g_new0(ExecInfo, 1);
+    exec->block = bi;
+    exec->insn_count = insn_count;
+    exec->block_count = bb_count;
+    log = g_slist_prepend(log, exec);
+}
+
+static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
+{
+    BlockInfo *bi = g_new0(BlockInfo, 1);
+    bi->pc = qemu_plugin_tb_vaddr(tb);
+    bi->insns = qemu_plugin_tb_n_insns(tb);
+
+    /* save a reference so we can free later */
+    blocks = g_slist_prepend(blocks, bi);
+    qemu_plugin_register_vcpu_tb_exec_cb(tb, vcpu_tb_exec,
+                                         QEMU_PLUGIN_CB_NO_REGS, (void *)bi);
+}
+
+
+/*
+ * Instead of encoding master/slave status into what is essentially
+ * two peers we shall just take the simple approach of checking for
+ * the existence of the pipe and assuming if it's not there we are the
+ * first process.
+ */
+static bool setup_socket(const char *path)
+{
+    struct sockaddr_un sockaddr;
+    int fd;
+
+    fd = socket(AF_UNIX, SOCK_STREAM, 0);
+    if (fd < 0) {
+        perror("create socket");
+        return false;
+    }
+
+    sockaddr.sun_family = AF_UNIX;
+    g_strlcpy(sockaddr.sun_path, path, sizeof(sockaddr.sun_path) - 1);
+    if (bind(fd, (struct sockaddr *)&sockaddr, sizeof(sockaddr)) < 0) {
+        perror("bind socket");
+        close(fd);
+        return false;
+    }
+
+    /* remember to clean-up */
+    path_to_unlink = g_strdup(path);
+
+    if (listen(fd, 1) < 0) {
+        perror("listen socket");
+        close(fd);
+        return false;
+    }
+
+    socket_fd = accept(fd, NULL, NULL);
+    if (socket_fd < 0 && errno != EINTR) {
+        perror("accept socket");
+        return false;
+    }
+
+    qemu_plugin_outs("setup_socket::ready\n");
+
+    return true;
+}
+
+static bool connect_socket(const char *path)
+{
+    int fd;
+    struct sockaddr_un sockaddr;
+
+    fd = socket(AF_UNIX, SOCK_STREAM, 0);
+    if (fd < 0) {
+        perror("create socket");
+        return false;
+    }
+
+    sockaddr.sun_family = AF_UNIX;
+    g_strlcpy(sockaddr.sun_path, path, sizeof(sockaddr.sun_path) - 1);
+
+    if (connect(fd, (struct sockaddr *)&sockaddr, sizeof(sockaddr)) < 0) {
+        perror("failed to connect");
+        return false;
+    }
+
+    qemu_plugin_outs("connect_socket::ready\n");
+
+    socket_fd = fd;
+    return true;
+}
+
+static bool setup_unix_socket(const char *path)
+{
+    if (g_file_test(path, G_FILE_TEST_EXISTS)) {
+        return connect_socket(path);
+    } else {
+        return setup_socket(path);
+    }
+}
+
+
+QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
+                                           const qemu_info_t *info,
+                                           int argc, char **argv)
+{
+    int i;
+
+    if (!argc || !argv[0]) {
+        qemu_plugin_outs("Need a socket path to talk to other instance.");
+        return -1;
+    }
+
+    for (i = 0; i < argc; i++) {
+        char *p = argv[i];
+        if (strcmp(p, "verbose") == 0) {
+            verbose = true;
+        } else if (!setup_unix_socket(argv[0])) {
+            qemu_plugin_outs("Failed to setup socket for communications.");
+            return -1;
+        }
+    }
+
+    our_id = id;
+
+    qemu_plugin_register_vcpu_tb_trans_cb(id, vcpu_tb_trans);
+    qemu_plugin_register_atexit_cb(id, plugin_exit, NULL);
+    return 0;
+}
diff --git a/tests/plugin/Makefile b/tests/plugin/Makefile
index 75467b6db85..b3250e2504c 100644
--- a/tests/plugin/Makefile
+++ b/tests/plugin/Makefile
@@ -13,6 +13,7 @@ NAMES += mem
 NAMES += hotblocks
 NAMES += howvec
 NAMES += hotpages
+NAMES += lockstep
 
 SONAMES := $(addsuffix .so,$(addprefix lib,$(NAMES)))
 
diff --git a/tests/tcg/Makefile.target b/tests/tcg/Makefile.target
index b3cff3cad1a..075daf3d22d 100644
--- a/tests/tcg/Makefile.target
+++ b/tests/tcg/Makefile.target
@@ -128,7 +128,7 @@ RUN_TESTS=$(patsubst %,run-%, $(TESTS))
 ifeq ($(CONFIG_PLUGIN),y)
 PLUGIN_DIR=../../plugin
 VPATH+=$(PLUGIN_DIR)
-PLUGINS=$(notdir $(wildcard $(PLUGIN_DIR)/*.so))
+PLUGINS=$(filter-out liblockstep.so,$(notdir $(wildcard $(PLUGIN_DIR)/*.so)))
 
 # We need to ensure expand the run-plugin-TEST-with-PLUGIN
 # pre-requistes manually here as we can't use stems to handle it. We
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 3/6] cputlb: ensure we save the IOTLB data in case of reset
  2020-06-10 15:55 [PATCH v2 0/6] plugins/next (lockstep, api, hwprofile) Alex Bennée
  2020-06-10 15:55 ` [PATCH v2 1/6] iotests: 194: wait migration completion on target too Alex Bennée
  2020-06-10 15:55 ` [PATCH v2 2/6] plugins: new lockstep plugin for debugging TCG changes Alex Bennée
@ 2020-06-10 15:55 ` Alex Bennée
  2020-06-21 20:33   ` Emilio G. Cota
  2020-06-10 15:55 ` [PATCH v2 4/6] hw/virtio/pci: include vdev name in registered PCI sections Alex Bennée
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Alex Bennée @ 2020-06-10 15:55 UTC (permalink / raw)
  To: qemu-devel
  Cc: robert.foley, Paolo Bonzini, robhenry, aaron, cota, kuhn.chenqun,
	peter.puhov, Alex Bennée, Richard Henderson

Any write to a device might cause a re-arrangement of memory
triggering a TLB flush and potential re-size of the TLB invalidating
previous entries. This would cause users of qemu_plugin_get_hwaddr()
to see the warning:

  invalid use of qemu_plugin_get_hwaddr

because of the failed tlb_lookup which should always succeed. To
prevent this we save the IOTLB data in case it is later needed by a
plugin doing a lookup.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v2
  - save the entry instead of re-running the tlb_fill.

squash! cputlb: ensure we save the IOTLB entry in case of reset
---
 accel/tcg/cputlb.c | 63 ++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 61 insertions(+), 2 deletions(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index eb2cf9de5e6..9bf9e479c7c 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1058,6 +1058,47 @@ static uint64_t io_readx(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
     return val;
 }
 
+#ifdef CONFIG_PLUGIN
+
+typedef struct SavedIOTLB {
+    struct rcu_head rcu;
+    struct SavedIOTLB **save_loc;
+    MemoryRegionSection *section;
+    hwaddr mr_offset;
+} SavedIOTLB;
+
+static void clean_saved_entry(SavedIOTLB *s)
+{
+    atomic_rcu_set(s->save_loc, NULL);
+    g_free(s);
+}
+
+static __thread SavedIOTLB *saved_for_plugin;
+
+/*
+ * Save a potentially trashed IOTLB entry for later lookup by plugin.
+ *
+ * We also need to track the thread storage address because the RCU
+ * cleanup that runs when we leave the critical region (the current
+ * execution) is actually in a different thread.
+ */
+static void save_iotlb_data(MemoryRegionSection *section, hwaddr mr_offset)
+{
+    SavedIOTLB *s = g_new(SavedIOTLB, 1);
+    s->save_loc = &saved_for_plugin;
+    s->section = section;
+    s->mr_offset = mr_offset;
+    atomic_rcu_set(&saved_for_plugin, s);
+    call_rcu(s, clean_saved_entry, rcu);
+}
+
+#else
+static void save_iotlb_data(MemoryRegionSection *section, hwaddr mr_offset)
+{
+    /* do nothing */
+}
+#endif
+
 static void io_writex(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
                       int mmu_idx, uint64_t val, target_ulong addr,
                       uintptr_t retaddr, MemOp op)
@@ -1077,6 +1118,12 @@ static void io_writex(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
     }
     cpu->mem_io_pc = retaddr;
 
+    /*
+     * The memory_region_dispatch may trigger a flush/resize
+     * so for plugins we save the iotlb_data just in case.
+     */
+    save_iotlb_data(section, mr_offset);
+
     if (mr->global_locking && !qemu_mutex_iothread_locked()) {
         qemu_mutex_lock_iothread();
         locked = true;
@@ -1091,6 +1138,7 @@ static void io_writex(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
                                MMU_DATA_STORE, mmu_idx, iotlbentry->attrs, r,
                                retaddr);
     }
+
     if (locked) {
         qemu_mutex_unlock_iothread();
     }
@@ -1366,8 +1414,11 @@ void *tlb_vaddr_to_host(CPUArchState *env, abi_ptr addr,
  * in the softmmu lookup code (or helper). We don't handle re-fills or
  * checking the victim table. This is purely informational.
  *
- * This should never fail as the memory access being instrumented
- * should have just filled the TLB.
+ * This almost never fails as the memory access being instrumented
+ * should have just filled the TLB. The one corner case is io_writex
+ * which can cause TLB flushes and potential resizing of the TLBs
+ * loosing the information we need. In those cases we need to recover
+ * data from a thread local copy of the io_tlb entry.
  */
 
 bool tlb_plugin_lookup(CPUState *cpu, target_ulong addr, int mmu_idx,
@@ -1391,6 +1442,14 @@ bool tlb_plugin_lookup(CPUState *cpu, target_ulong addr, int mmu_idx,
             data->v.ram.hostaddr = addr + tlbe->addend;
         }
         return true;
+    } else {
+        SavedIOTLB *saved = atomic_rcu_read(&saved_for_plugin);
+        if (saved) {
+            data->is_io = true;
+            data->v.io.section = saved->section;
+            data->v.io.offset = saved->mr_offset;
+            return true;
+        }
     }
     return false;
 }
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 4/6] hw/virtio/pci: include vdev name in registered PCI sections
  2020-06-10 15:55 [PATCH v2 0/6] plugins/next (lockstep, api, hwprofile) Alex Bennée
                   ` (2 preceding siblings ...)
  2020-06-10 15:55 ` [PATCH v2 3/6] cputlb: ensure we save the IOTLB data in case of reset Alex Bennée
@ 2020-06-10 15:55 ` Alex Bennée
  2020-06-10 15:55 ` [PATCH v2 5/6] plugins: add API to return a name for a IO device Alex Bennée
  2020-06-10 15:55 ` [PATCH v2 6/6] plugins: new hwprofile plugin Alex Bennée
  5 siblings, 0 replies; 12+ messages in thread
From: Alex Bennée @ 2020-06-10 15:55 UTC (permalink / raw)
  To: qemu-devel
  Cc: robert.foley, Michael S . Tsirkin, Philippe Mathieu-Daudé,
	robhenry, aaron, cota, kuhn.chenqun, peter.puhov,
	Alex Bennée

When viewing/debugging memory regions it is sometimes hard to figure
out which PCI device something belongs to. Make the names unique by
including the vdev name in the name string.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>

---
v2
  - swap ()'s for an extra -
---
 hw/virtio/virtio-pci.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index d028c17c240..51ab67304bc 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1390,7 +1390,8 @@ static void virtio_pci_device_write(void *opaque, hwaddr addr,
     }
 }
 
-static void virtio_pci_modern_regions_init(VirtIOPCIProxy *proxy)
+static void virtio_pci_modern_regions_init(VirtIOPCIProxy *proxy,
+                                           const char *vdev_name)
 {
     static const MemoryRegionOps common_ops = {
         .read = virtio_pci_common_read,
@@ -1437,36 +1438,41 @@ static void virtio_pci_modern_regions_init(VirtIOPCIProxy *proxy)
         },
         .endianness = DEVICE_LITTLE_ENDIAN,
     };
+    g_autoptr(GString) name = g_string_new(NULL);
 
-
+    g_string_printf(name, "virtio-pci-common-%s", vdev_name);
     memory_region_init_io(&proxy->common.mr, OBJECT(proxy),
                           &common_ops,
                           proxy,
-                          "virtio-pci-common",
+                          name->str,
                           proxy->common.size);
 
+    g_string_printf(name, "virtio-pci-isr-%s", vdev_name);
     memory_region_init_io(&proxy->isr.mr, OBJECT(proxy),
                           &isr_ops,
                           proxy,
-                          "virtio-pci-isr",
+                          name->str,
                           proxy->isr.size);
 
+    g_string_printf(name, "virtio-pci-device-%s", vdev_name);
     memory_region_init_io(&proxy->device.mr, OBJECT(proxy),
                           &device_ops,
                           virtio_bus_get_device(&proxy->bus),
-                          "virtio-pci-device",
+                          name->str,
                           proxy->device.size);
 
+    g_string_printf(name, "virtio-pci-notify-%s", vdev_name);
     memory_region_init_io(&proxy->notify.mr, OBJECT(proxy),
                           &notify_ops,
                           virtio_bus_get_device(&proxy->bus),
-                          "virtio-pci-notify",
+                          name->str,
                           proxy->notify.size);
 
+    g_string_printf(name, "virtio-pci-notify-pio-%s", vdev_name);
     memory_region_init_io(&proxy->notify_pio.mr, OBJECT(proxy),
                           &notify_pio_ops,
                           virtio_bus_get_device(&proxy->bus),
-                          "virtio-pci-notify-pio",
+                          name->str,
                           proxy->notify_pio.size);
 }
 
@@ -1607,7 +1613,7 @@ static void virtio_pci_device_plugged(DeviceState *d, Error **errp)
 
         struct virtio_pci_cfg_cap *cfg_mask;
 
-        virtio_pci_modern_regions_init(proxy);
+        virtio_pci_modern_regions_init(proxy, vdev->name);
 
         virtio_pci_modern_mem_region_map(proxy, &proxy->common, &cap);
         virtio_pci_modern_mem_region_map(proxy, &proxy->isr, &cap);
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH  v2 5/6] plugins: add API to return a name for a IO device
  2020-06-10 15:55 [PATCH v2 0/6] plugins/next (lockstep, api, hwprofile) Alex Bennée
                   ` (3 preceding siblings ...)
  2020-06-10 15:55 ` [PATCH v2 4/6] hw/virtio/pci: include vdev name in registered PCI sections Alex Bennée
@ 2020-06-10 15:55 ` Alex Bennée
  2020-06-10 15:55 ` [PATCH v2 6/6] plugins: new hwprofile plugin Alex Bennée
  5 siblings, 0 replies; 12+ messages in thread
From: Alex Bennée @ 2020-06-10 15:55 UTC (permalink / raw)
  To: qemu-devel
  Cc: robert.foley, robhenry, aaron, cota, kuhn.chenqun, peter.puhov,
	Clement Deschamps, Alex Bennée

This may well end up being anonymous but it should always be unique.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
[r-b provisional given change to g_intern_string]
Reviewed-by: Clement Deschamps <clement.deschamps@greensocs.com>
Reviewed-by: Emilio G. Cota <cota@braap.org>

---
v3
  - return a non-freeable const g_intern_string()
  - checkpatch cleanups
---
 include/qemu/qemu-plugin.h |  6 ++++++
 plugins/api.c              | 20 ++++++++++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/include/qemu/qemu-plugin.h b/include/qemu/qemu-plugin.h
index bab8b0d4b3a..c98c18d6b05 100644
--- a/include/qemu/qemu-plugin.h
+++ b/include/qemu/qemu-plugin.h
@@ -335,6 +335,12 @@ struct qemu_plugin_hwaddr *qemu_plugin_get_hwaddr(qemu_plugin_meminfo_t info,
 bool qemu_plugin_hwaddr_is_io(const struct qemu_plugin_hwaddr *haddr);
 uint64_t qemu_plugin_hwaddr_device_offset(const struct qemu_plugin_hwaddr *haddr);
 
+/*
+ * Returns a string representing the device. The string is valid for
+ * the lifetime of the plugin.
+ */
+const char *qemu_plugin_hwaddr_device_name(const struct qemu_plugin_hwaddr *h);
+
 typedef void
 (*qemu_plugin_vcpu_mem_cb_t)(unsigned int vcpu_index,
                              qemu_plugin_meminfo_t info, uint64_t vaddr,
diff --git a/plugins/api.c b/plugins/api.c
index bbdc5a4eb46..4304e63f0cf 100644
--- a/plugins/api.c
+++ b/plugins/api.c
@@ -303,6 +303,26 @@ uint64_t qemu_plugin_hwaddr_device_offset(const struct qemu_plugin_hwaddr *haddr
     return 0;
 }
 
+const char *qemu_plugin_hwaddr_device_name(const struct qemu_plugin_hwaddr *h)
+{
+#ifdef CONFIG_SOFTMMU
+    if (h && h->is_io) {
+        MemoryRegionSection *mrs = h->v.io.section;
+        if (!mrs->mr->name) {
+            unsigned long maddr = 0xffffffff & (uintptr_t) mrs->mr;
+            g_autofree char *temp = g_strdup_printf("anon%08lx", maddr);
+            return g_intern_string(temp);
+        } else {
+            return g_intern_string(mrs->mr->name);
+        }
+    } else {
+        return g_intern_string("RAM");
+    }
+#else
+    return g_intern_string("Invalid");
+#endif
+}
+
 /*
  * Queries to the number and potential maximum number of vCPUs there
  * will be. This helps the plugin dimension per-vcpu arrays.
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH  v2 6/6] plugins: new hwprofile plugin
  2020-06-10 15:55 [PATCH v2 0/6] plugins/next (lockstep, api, hwprofile) Alex Bennée
                   ` (4 preceding siblings ...)
  2020-06-10 15:55 ` [PATCH v2 5/6] plugins: add API to return a name for a IO device Alex Bennée
@ 2020-06-10 15:55 ` Alex Bennée
  5 siblings, 0 replies; 12+ messages in thread
From: Alex Bennée @ 2020-06-10 15:55 UTC (permalink / raw)
  To: qemu-devel
  Cc: robert.foley, Philippe Mathieu-Daudé,
	robhenry, aaron, cota, kuhn.chenqun, peter.puhov,
	Alex Bennée

This is a plugin intended to help with profiling access to various
bits of system hardware. It only really makes sense for system
emulation.

It takes advantage of the recently exposed helper API that allows us
to see the device name (memory region name) associated with a device.

You can specify arg=read or arg=write to limit the tracking to just
reads or writes (by default it does both).

The pattern option:

  -plugin ./tests/plugin/libhwprofile.so,arg=pattern

will allow you to see the access pattern to devices, eg:

  gic_cpu @ 0xffffffc010040000
    off:00000000, 8, 1, 8, 1
    off:00000000, 4, 1, 4, 1
    off:00000000, 2, 1, 2, 1
    off:00000000, 1, 1, 1, 1

The source option:

  -plugin ./tests/plugin/libhwprofile.so,arg=source

will track the virtual source address of the instruction making the
access:

  pl011 @ 0xffffffc010031000
    pc:ffffffc0104c785c, 1, 4, 0, 0
    pc:ffffffc0104c7898, 1, 4, 0, 0
    pc:ffffffc010512bcc, 2, 1867, 0, 0

You cannot mix source and pattern.

Finally the match option allow you to limit the tracking to just the
devices you care about.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Robert Foley <robert.foley@linaro.org>
Tested-by: Robert Foley <robert.foley@linaro.org>

---
v2
  - use 64 bit cpu masks
  - warn if we will exceed cpu mask capability
  - don't run in linux-user mode
  - skip in check-tcg for linux-user test
v3
  - update device name API
  - re-factor and clean-up
  - add source tracking
  - use direct hash for source/pattern
  - add match function
  - expand the commit message
  - checkpatch cleanups
---
 tests/plugin/hwprofile.c  | 305 ++++++++++++++++++++++++++++++++++++++
 tests/plugin/Makefile     |   1 +
 tests/tcg/Makefile.target |   9 +-
 3 files changed, 314 insertions(+), 1 deletion(-)
 create mode 100644 tests/plugin/hwprofile.c

diff --git a/tests/plugin/hwprofile.c b/tests/plugin/hwprofile.c
new file mode 100644
index 00000000000..6dac1d5f854
--- /dev/null
+++ b/tests/plugin/hwprofile.c
@@ -0,0 +1,305 @@
+/*
+ * Copyright (C) 2020, Alex Bennée <alex.bennee@linaro.org>
+ *
+ * HW Profile - breakdown access patterns for IO to devices
+ *
+ * License: GNU GPL, version 2 or later.
+ *   See the COPYING file in the top-level directory.
+ */
+
+#include <inttypes.h>
+#include <assert.h>
+#include <stdlib.h>
+#include <inttypes.h>
+#include <string.h>
+#include <unistd.h>
+#include <stdio.h>
+#include <glib.h>
+
+#include <qemu-plugin.h>
+
+QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
+
+#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
+
+typedef struct {
+    uint64_t cpu_read;
+    uint64_t cpu_write;
+    uint64_t reads;
+    uint64_t writes;
+} IOCounts;
+
+typedef struct {
+    uint64_t off_or_pc;
+    IOCounts counts;
+} IOLocationCounts;
+
+typedef struct {
+    const char *name;
+    uint64_t   base;
+    IOCounts   totals;
+    GHashTable *detail;
+} DeviceCounts;
+
+static GMutex lock;
+static GHashTable *devices;
+
+/* track the access pattern to a piece of HW */
+static bool pattern;
+/* track the source address of access to HW */
+static bool source;
+/* track only matched regions of HW */
+static bool check_match;
+static gchar **matches;
+
+static enum qemu_plugin_mem_rw rw = QEMU_PLUGIN_MEM_RW;
+
+static inline bool track_reads(void)
+{
+    return rw == QEMU_PLUGIN_MEM_RW || rw == QEMU_PLUGIN_MEM_R;
+}
+
+static inline bool track_writes(void)
+{
+    return rw == QEMU_PLUGIN_MEM_RW || rw == QEMU_PLUGIN_MEM_W;
+}
+
+static void plugin_init(void)
+{
+    devices = g_hash_table_new(NULL, NULL);
+}
+
+static gint sort_cmp(gconstpointer a, gconstpointer b)
+{
+    DeviceCounts *ea = (DeviceCounts *) a;
+    DeviceCounts *eb = (DeviceCounts *) b;
+    return ea->totals.reads + ea->totals.writes >
+           eb->totals.reads + eb->totals.writes ? -1 : 1;
+}
+
+static gint sort_loc(gconstpointer a, gconstpointer b)
+{
+    IOLocationCounts *ea = (IOLocationCounts *) a;
+    IOLocationCounts *eb = (IOLocationCounts *) b;
+    return ea->off_or_pc > eb->off_or_pc;
+}
+
+static void fmt_iocount_record(GString *s, IOCounts *rec)
+{
+    if (track_reads()) {
+        g_string_append_printf(s, ", %"PRIx64", %"PRId64,
+                               rec->cpu_read, rec->reads);
+    }
+    if (track_writes()) {
+        g_string_append_printf(s, ", %"PRIx64", %"PRId64,
+                               rec->cpu_write, rec->writes);
+    }
+}
+
+static void fmt_dev_record(GString *s, DeviceCounts *rec)
+{
+    g_string_append_printf(s, "%s, 0x%"PRIx64,
+                           rec->name, rec->base);
+    fmt_iocount_record(s, &rec->totals);
+    g_string_append_c(s, '\n');
+}
+
+static void plugin_exit(qemu_plugin_id_t id, void *p)
+{
+    g_autoptr(GString) report = g_string_new("");
+    GList *counts;
+
+    if (!(pattern || source)) {
+        g_string_printf(report, "Device, Address");
+        if (track_reads()) {
+            g_string_append_printf(report, ", RCPUs, Reads");
+        }
+        if (track_writes()) {
+            g_string_append_printf(report, ",  WCPUs, Writes");
+        }
+        g_string_append_c(report, '\n');
+    }
+
+    counts = g_hash_table_get_values(devices);
+    if (counts && g_list_next(counts)) {
+        GList *it;
+
+        it = g_list_sort(counts, sort_cmp);
+
+        while (it) {
+            DeviceCounts *rec = (DeviceCounts *) it->data;
+            if (rec->detail) {
+                GList *accesses = g_hash_table_get_values(rec->detail);
+                GList *io_it = g_list_sort(accesses, sort_loc);
+                const char *prefix = pattern ? "off" : "pc";
+                g_string_append_printf(report, "%s @ 0x%"PRIx64"\n",
+                                       rec->name, rec->base);
+                while (io_it) {
+                    IOLocationCounts *loc = (IOLocationCounts *) io_it->data;
+                    g_string_append_printf(report, "  %s:%08"PRIx64,
+                                           prefix, loc->off_or_pc);
+                    fmt_iocount_record(report, &loc->counts);
+                    g_string_append_c(report, '\n');
+                    io_it = io_it->next;
+                }
+            } else {
+                fmt_dev_record(report, rec);
+            }
+            it = it->next;
+        };
+        g_list_free(it);
+    }
+
+    qemu_plugin_outs(report->str);
+}
+
+static DeviceCounts *new_count(const char *name, uint64_t base)
+{
+    DeviceCounts *count = g_new0(DeviceCounts, 1);
+    count->name = name;
+    count->base = base;
+    if (pattern || source) {
+        count->detail = g_hash_table_new(NULL, NULL);
+    }
+    g_hash_table_insert(devices, (gpointer) name, count);
+    return count;
+}
+
+static IOLocationCounts *new_location(GHashTable *table, uint64_t off_or_pc)
+{
+    IOLocationCounts *loc = g_new0(IOLocationCounts, 1);
+    loc->off_or_pc = off_or_pc;
+    g_hash_table_insert(table, (gpointer) off_or_pc, loc);
+    return loc;
+}
+
+static void hwprofile_match_hit(DeviceCounts *rec, uint64_t off)
+{
+    g_autoptr(GString) report = g_string_new("hwprofile: match @ offset");
+    g_string_append_printf(report, "%"PRIx64", previous hits\n", off);
+    fmt_dev_record(report, rec);
+    qemu_plugin_outs(report->str);
+}
+
+static void inc_count(IOCounts *count, bool is_write, unsigned int cpu_index)
+{
+    if (is_write) {
+        count->writes++;
+        count->cpu_write |= (1 << cpu_index);
+    } else {
+        count->reads++;
+        count->cpu_read |= (1 << cpu_index);
+    }
+}
+
+static void vcpu_haddr(unsigned int cpu_index, qemu_plugin_meminfo_t meminfo,
+                       uint64_t vaddr, void *udata)
+{
+    struct qemu_plugin_hwaddr *hwaddr = qemu_plugin_get_hwaddr(meminfo, vaddr);
+
+    if (!hwaddr || !qemu_plugin_hwaddr_is_io(hwaddr)) {
+        return;
+    } else {
+        const char *name = qemu_plugin_hwaddr_device_name(hwaddr);
+        uint64_t off = qemu_plugin_hwaddr_device_offset(hwaddr);
+        bool is_write = qemu_plugin_mem_is_store(meminfo);
+        DeviceCounts *counts;
+
+        g_mutex_lock(&lock);
+        counts = (DeviceCounts *) g_hash_table_lookup(devices, name);
+
+        if (!counts) {
+            uint64_t base = vaddr - off;
+            counts = new_count(name, base);
+        }
+
+        if (check_match) {
+            if (g_strv_contains((const char * const *)matches, counts->name)) {
+                hwprofile_match_hit(counts, off);
+                inc_count(&counts->totals, is_write, cpu_index);
+            }
+        } else {
+            inc_count(&counts->totals, is_write, cpu_index);
+        }
+
+        /* either track offsets or source of access */
+        if (source) {
+            off = (uint64_t) udata;
+        }
+
+        if (pattern || source) {
+            IOLocationCounts *io_count = g_hash_table_lookup(counts->detail,
+                                                             (gpointer) off);
+            if (!io_count) {
+                io_count = new_location(counts->detail, off);
+            }
+            inc_count(&io_count->counts, is_write, cpu_index);
+        }
+
+        g_mutex_unlock(&lock);
+    }
+}
+
+static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
+{
+    size_t n = qemu_plugin_tb_n_insns(tb);
+    size_t i;
+
+    for (i = 0; i < n; i++) {
+        struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, i);
+        gpointer udata = (gpointer) (source ? qemu_plugin_insn_vaddr(insn) : 0);
+        qemu_plugin_register_vcpu_mem_cb(insn, vcpu_haddr,
+                                         QEMU_PLUGIN_CB_NO_REGS,
+                                         rw, udata);
+    }
+}
+
+QEMU_PLUGIN_EXPORT
+int qemu_plugin_install(qemu_plugin_id_t id, const qemu_info_t *info,
+                        int argc, char **argv)
+{
+    int i;
+
+    for (i = 0; i < argc; i++) {
+        char *opt = argv[i];
+        if (g_strcmp0(opt, "read") == 0) {
+            rw = QEMU_PLUGIN_MEM_R;
+        } else if (g_strcmp0(opt, "write") == 0) {
+            rw = QEMU_PLUGIN_MEM_W;
+        } else if (g_strcmp0(opt, "pattern") == 0) {
+            pattern = true;
+        } else if (g_strcmp0(opt, "source") == 0) {
+            source = true;
+        } else if (g_str_has_prefix(opt, "match")) {
+            gchar **parts = g_strsplit(opt, "=", 2);
+            check_match = true;
+            matches = g_strsplit(parts[1], ",", -1);
+            g_strfreev(parts);
+        } else {
+            fprintf(stderr, "option parsing failed: %s\n", opt);
+            return -1;
+        }
+    }
+
+    if (source && pattern) {
+        fprintf(stderr, "can only currently track either source or pattern.\n");
+        return -1;
+    }
+
+    if (!info->system_emulation) {
+        fprintf(stderr, "hwprofile: plugin only useful for system emulation\n");
+        return -1;
+    }
+
+    /* Just warn about overflow */
+    if (info->system.smp_vcpus > 64 ||
+        info->system.max_vcpus > 64) {
+        fprintf(stderr, "hwprofile: can only track up to 64 CPUs\n");
+    }
+
+    plugin_init();
+
+    qemu_plugin_register_vcpu_tb_trans_cb(id, vcpu_tb_trans);
+    qemu_plugin_register_atexit_cb(id, plugin_exit, NULL);
+    return 0;
+}
diff --git a/tests/plugin/Makefile b/tests/plugin/Makefile
index b3250e2504c..d87b8d40699 100644
--- a/tests/plugin/Makefile
+++ b/tests/plugin/Makefile
@@ -14,6 +14,7 @@ NAMES += hotblocks
 NAMES += howvec
 NAMES += hotpages
 NAMES += lockstep
+NAMES += hwprofile
 
 SONAMES := $(addsuffix .so,$(addprefix lib,$(NAMES)))
 
diff --git a/tests/tcg/Makefile.target b/tests/tcg/Makefile.target
index 075daf3d22d..5785554901b 100644
--- a/tests/tcg/Makefile.target
+++ b/tests/tcg/Makefile.target
@@ -128,7 +128,14 @@ RUN_TESTS=$(patsubst %,run-%, $(TESTS))
 ifeq ($(CONFIG_PLUGIN),y)
 PLUGIN_DIR=../../plugin
 VPATH+=$(PLUGIN_DIR)
-PLUGINS=$(filter-out liblockstep.so,$(notdir $(wildcard $(PLUGIN_DIR)/*.so)))
+
+# Some plugins aren't testable automatically
+SKIP_PLUGINS=liblockstep.so
+ifdef CONFIG_USER_ONLY
+SKIP_PLUGINS+=libhwprofile.so
+endif
+
+PLUGINS=$(filter-out $(SKIP_PLUGINS),$(notdir $(wildcard $(PLUGIN_DIR)/*.so)))
 
 # We need to ensure expand the run-plugin-TEST-with-PLUGIN
 # pre-requistes manually here as we can't use stems to handle it. We
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH  v2 1/6] iotests: 194: wait migration completion on target too
  2020-06-10 15:55 ` [PATCH v2 1/6] iotests: 194: wait migration completion on target too Alex Bennée
@ 2020-06-10 16:38   ` Alex Bennée
  0 siblings, 0 replies; 12+ messages in thread
From: Alex Bennée @ 2020-06-10 16:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Thomas Huth, Vladimir Sementsov-Ogievskiy,
	robert.foley, open list:Block layer core, Alex Bennée,
	robhenry, Max Reitz, aaron, cota, kuhn.chenqun, peter.puhov


Alex Bennée <alex.bennee@linaro.org> writes:

> From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>
> It is possible, that shutdown on target occurs earlier than migration
> finish. In this case we crash in bdrv_release_dirty_bitmap_locked()
> on assertion "assert(!bdrv_dirty_bitmap_busy(bitmap));" as we do have
> busy bitmap, as bitmap migration is ongoing.
>
> We'll fix bitmap migration to gracefully cancel on early shutdown soon.
> Now let's fix iotest 194 to wait migration completion before shutdown.
>
> Note that in this test dest_vm.shutdown() is called implicitly, as vms
> used as context-providers, see __exit__() method of QEMUMachine class.
>
> Actually, not waiting migration finish is a wrong thing, but the test
> started to crash after commit ae00aa239847682
> "iotests: 194: test also migration of dirty bitmap", which added dirty
> bitmaps here. So, Fixes: tag won't hurt.
>
> Fixes: ae00aa2398476824f0eca80461da215e7cdc1c3b
> Reported-by: Thomas Huth <thuth@redhat.com>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Tested-by: Thomas Huth <thuth@redhat.com>
> Reviewed-by: Eric Blake <eblake@redhat.com>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Message-Id: <20200604083341.26978-1-vsementsov@virtuozzo.com>

Obviously this patch isn't going in via plugins/next - I had it in my
tree to keep CI green and forgot to take that into account when
generating the series!

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/6] plugins: new lockstep plugin for debugging TCG changes
  2020-06-10 15:55 ` [PATCH v2 2/6] plugins: new lockstep plugin for debugging TCG changes Alex Bennée
@ 2020-06-11 17:04   ` Robert Foley
  0 siblings, 0 replies; 12+ messages in thread
From: Robert Foley @ 2020-06-11 17:04 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Mark Cave-Ayland, Richard Henderson, QEMU Developers, robhenry,
	aaron, Emilio G. Cota, kuhn.chenqun, Peter Puhov,
	Philippe Mathieu-Daudé

Reviewed-by: Robert Foley <robert.foley@linaro.org>
Tested-by: Robert Foley <robert.foley@linaro.org>

The testing was mostly bringing up pairs of aarch64 VMs and
either waiting for the divergence or exiting out before divergence
with ctrl-a x at various stages of boot.

When we exit from the VM before divergence, we observe the below:
we seem to get "No divergence :-)" and also immediately
"too much divergence... giving up".  Not sure if this is expected in this case.

See below for an example.

No divergence :-)
Executed 1318101/1308985 blocks
Executed ~3052742 instructions
@ 0x005631f2aa0fb0 vs 0x000000000124e8 (1/0 since last)
@ 0x007fc04923e410 vs 0x00000000012f2c (2/1 since last)
@ 0x007fc04923e550 vs 0x00000000012f08 (3/1 since last)
Δ insn_count @ 0x007fc04923e550 (3052937) vs 0x00000000012f08 (3052937)
  previously @ 0x007fc04923e410/1 (3052937 insns)
  previously @ 0x005631f2aa0fb0/2 (3052936 insns)
  previously @ 0x007fc04923e390/3 (3052934 insns)
  previously @ 0x007fc04923e550/2 (3052931 insns)
  previously @ 0x007fc04923e410/1 (3052929 insns)
too much divergence... giving up.

On Wed, 10 Jun 2020 at 11:55, Alex Bennée <alex.bennee@linaro.org> wrote:
>
> When we make changes to the TCG we sometimes cause regressions that
> are deep into the execution cycle of the guest. Debugging this often
> requires comparing large volumes of trace information to figure out
> where behaviour has diverged.
>
> The lockstep plugin utilises a shared socket so two QEMU's running
> with the plugin will write their current execution position and wait
> to receive the position of their partner process. When execution
> diverges the plugins output where they were and the previous few
> blocks before unloading themselves and letting execution continue.
>
> Originally I planned for this to be most useful with -icount but it
> turns out you can get divergence pretty quickly due to asynchronous
> qemu_cpu_kick_rr_cpus() events causing one side to eventually run into
> a short block a few cycles before the other side. For this reason I've
> added a bit of tracking and I think the divergence reporting could be
> finessed to report only if we really start to diverge in execution.
>
> An example run would be:
>
>   qemu-system-sparc -monitor none -parallel none -net none \
>     -M SS-20 -m 256 -kernel day11/zImage.elf \
>     -plugin ./tests/plugin/liblockstep.so,arg=lockstep-sparc.sock \
>     -d plugin,nochain
>
> with an identical command in another window in the same working
> directory.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Cc: Richard Henderson <richard.henderson@linaro.org>
> Cc: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
> Message-Id: <20200429200754.18327-1-alex.bennee@linaro.org>
>
> ---
> v3
>   - added verbose flag
>   - basic heuristics to detect "real" divergence
>   - checkpatch tweaks
> ---
>  tests/plugin/lockstep.c   | 340 ++++++++++++++++++++++++++++++++++++++
>  tests/plugin/Makefile     |   1 +
>  tests/tcg/Makefile.target |   2 +-
>  3 files changed, 342 insertions(+), 1 deletion(-)
>  create mode 100644 tests/plugin/lockstep.c
>
> diff --git a/tests/plugin/lockstep.c b/tests/plugin/lockstep.c
> new file mode 100644
> index 00000000000..a696673dff3
> --- /dev/null
> +++ b/tests/plugin/lockstep.c
> @@ -0,0 +1,340 @@
> +/*
> + * Lockstep Execution Plugin
> + *
> + * Allows you to execute two QEMU instances in lockstep and report
> + * when their execution diverges. This is mainly useful for developers
> + * who want to see where a change to TCG code generation has
> + * introduced a subtle and hard to find bug.
> + *
> + * Caveats:
> + *   - single-threaded linux-user apps only with non-deterministic syscalls
> + *   - no MTTCG enabled system emulation (icount may help)
> + *
> + * While icount makes things more deterministic it doesn't mean a
> + * particular run may execute the exact same sequence of blocks. An
> + * asynchronous event (for example X11 graphics update) may cause a
> + * block to end early and a new partial block to start. This means
> + * serial only test cases are a better bet. -d nochain may also help.
> + *
> + * This code is not thread safe!
> + *
> + * Copyright (c) 2020 Linaro Ltd
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#include <glib.h>
> +#include <inttypes.h>
> +#include <unistd.h>
> +#include <sys/socket.h>
> +#include <sys/un.h>
> +#include <stdio.h>
> +#include <errno.h>
> +
> +#include <qemu-plugin.h>
> +
> +QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
> +
> +/* saved so we can uninstall later */
> +static qemu_plugin_id_t our_id;
> +
> +static unsigned long bb_count;
> +static unsigned long insn_count;
> +
> +/* Information about a translated block */
> +typedef struct {
> +    uint64_t pc;
> +    uint64_t insns;
> +} BlockInfo;
> +
> +/* Information about an execution state in the log */
> +typedef struct {
> +    BlockInfo *block;
> +    unsigned long insn_count;
> +    unsigned long block_count;
> +} ExecInfo;
> +
> +/* The execution state we compare */
> +typedef struct {
> +    uint64_t pc;
> +    unsigned long insn_count;
> +} ExecState;
> +
> +typedef struct {
> +    GSList *log_pos;
> +    int distance;
> +} DivergeState;
> +
> +/* list of translated block info */
> +static GSList *blocks;
> +
> +/* execution log and points of divergence */
> +static GSList *log, *divergence_log;
> +
> +static int socket_fd;
> +static char *path_to_unlink;
> +
> +static bool verbose;
> +
> +static void plugin_cleanup(qemu_plugin_id_t id)
> +{
> +    /* Free our block data */
> +    g_slist_free_full(blocks, &g_free);
> +    g_slist_free_full(log, &g_free);
> +    g_slist_free(divergence_log);
> +
> +    close(socket_fd);
> +    if (path_to_unlink) {
> +        unlink(path_to_unlink);
> +    }
> +}
> +
> +static void plugin_exit(qemu_plugin_id_t id, void *p)
> +{
> +    g_autoptr(GString) out = g_string_new("No divergence :-)\n");
> +    g_string_append_printf(out, "Executed %ld/%d blocks\n",
> +                           bb_count, g_slist_length(log));
> +    g_string_append_printf(out, "Executed ~%ld instructions\n", insn_count);
> +    qemu_plugin_outs(out->str);
> +
> +    plugin_cleanup(id);
> +}
> +
> +static void report_divergance(ExecState *us, ExecState *them)
> +{
> +    DivergeState divrec = { log, 0 };
> +    g_autoptr(GString) out = g_string_new("");
> +    bool diverged = false;
> +
> +    /*
> +     * If we have diverged before did we get back on track or are we
> +     * totally loosing it?
> +     */
> +    if (divergence_log) {
> +        DivergeState *last = (DivergeState *) divergence_log->data;
> +        GSList *entry;
> +
> +        for (entry = log; g_slist_next(entry); entry = g_slist_next(entry)) {
> +            if (entry == last->log_pos) {
> +                break;
> +            }
> +            divrec.distance++;
> +        }
> +
> +        /*
> +         * If the last two records are so close it is likely we will
> +         * not recover synchronisation with the other end.
> +         */
> +        if (divrec.distance == 1 && last->distance == 1) {
> +            diverged = true;
> +        }
> +    }
> +    divergence_log = g_slist_prepend(divergence_log,
> +                                     g_memdup(&divrec, sizeof(divrec)));
> +
> +    /* Output short log entry of going out of sync... */
> +    if (verbose || divrec.distance == 1 || diverged) {
> +        g_string_printf(out, "@ %#016lx vs %#016lx (%d/%d since last)\n",
> +                        us->pc, them->pc, g_slist_length(divergence_log),
> +                        divrec.distance);
> +        qemu_plugin_outs(out->str);
> +    }
> +
> +    if (diverged) {
> +        int i;
> +        GSList *entry;
> +
> +        g_string_printf(out, "Δ insn_count @ %#016lx (%ld) vs %#016lx (%ld)\n",
> +                        us->pc, us->insn_count, them->pc, them->insn_count);
> +
> +        for (entry = log, i = 0;
> +             g_slist_next(entry) && i < 5;
> +             entry = g_slist_next(entry), i++) {
> +            ExecInfo *prev = (ExecInfo *) entry->data;
> +            g_string_append_printf(out,
> +                                   "  previously @ %#016lx/%ld (%ld insns)\n",
> +                                   prev->block->pc, prev->block->insns,
> +                                   prev->insn_count);
> +        }
> +        qemu_plugin_outs(out->str);
> +        qemu_plugin_outs("too much divergence... giving up.");
> +        qemu_plugin_uninstall(our_id, plugin_cleanup);
> +    }
> +}
> +
> +static void vcpu_tb_exec(unsigned int cpu_index, void *udata)
> +{
> +    BlockInfo *bi = (BlockInfo *) udata;
> +    ExecState us, them;
> +    ssize_t bytes;
> +    ExecInfo *exec;
> +
> +    us.pc = bi->pc;
> +    us.insn_count = insn_count;
> +
> +    /*
> +     * Write our current position to the other end. If we fail the
> +     * other end has probably died and we should shut down gracefully.
> +     */
> +    bytes = write(socket_fd, &us, sizeof(ExecState));
> +    if (bytes < sizeof(ExecState)) {
> +        qemu_plugin_outs(bytes < 0 ?
> +                         "problem writing to socket" :
> +                         "wrote less than expected to socket");
> +        qemu_plugin_uninstall(our_id, plugin_cleanup);
> +        return;
> +    }
> +
> +    /*
> +     * Now read where our peer has reached. Again a failure probably
> +     * indicates the other end died and we should close down cleanly.
> +     */
> +    bytes = read(socket_fd, &them, sizeof(ExecState));
> +    if (bytes < sizeof(ExecState)) {
> +        qemu_plugin_outs(bytes < 0 ?
> +                         "problem reading from socket" :
> +                         "read less than expected");
> +        qemu_plugin_uninstall(our_id, plugin_cleanup);
> +        return;
> +    }
> +
> +    /*
> +     * Compare and report if we have diverged.
> +     */
> +    if (us.pc != them.pc) {
> +        report_divergance(&us, &them);
> +    }
> +
> +    /*
> +     * Assume this block will execute fully and record it
> +     * in the execution log.
> +     */
> +    insn_count += bi->insns;
> +    bb_count++;
> +    exec = g_new0(ExecInfo, 1);
> +    exec->block = bi;
> +    exec->insn_count = insn_count;
> +    exec->block_count = bb_count;
> +    log = g_slist_prepend(log, exec);
> +}
> +
> +static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
> +{
> +    BlockInfo *bi = g_new0(BlockInfo, 1);
> +    bi->pc = qemu_plugin_tb_vaddr(tb);
> +    bi->insns = qemu_plugin_tb_n_insns(tb);
> +
> +    /* save a reference so we can free later */
> +    blocks = g_slist_prepend(blocks, bi);
> +    qemu_plugin_register_vcpu_tb_exec_cb(tb, vcpu_tb_exec,
> +                                         QEMU_PLUGIN_CB_NO_REGS, (void *)bi);
> +}
> +
> +
> +/*
> + * Instead of encoding master/slave status into what is essentially
> + * two peers we shall just take the simple approach of checking for
> + * the existence of the pipe and assuming if it's not there we are the
> + * first process.
> + */
> +static bool setup_socket(const char *path)
> +{
> +    struct sockaddr_un sockaddr;
> +    int fd;
> +
> +    fd = socket(AF_UNIX, SOCK_STREAM, 0);
> +    if (fd < 0) {
> +        perror("create socket");
> +        return false;
> +    }
> +
> +    sockaddr.sun_family = AF_UNIX;
> +    g_strlcpy(sockaddr.sun_path, path, sizeof(sockaddr.sun_path) - 1);
> +    if (bind(fd, (struct sockaddr *)&sockaddr, sizeof(sockaddr)) < 0) {
> +        perror("bind socket");
> +        close(fd);
> +        return false;
> +    }
> +
> +    /* remember to clean-up */
> +    path_to_unlink = g_strdup(path);
> +
> +    if (listen(fd, 1) < 0) {
> +        perror("listen socket");
> +        close(fd);
> +        return false;
> +    }
> +
> +    socket_fd = accept(fd, NULL, NULL);
> +    if (socket_fd < 0 && errno != EINTR) {
> +        perror("accept socket");
> +        return false;
> +    }
> +
> +    qemu_plugin_outs("setup_socket::ready\n");
> +
> +    return true;
> +}
> +
> +static bool connect_socket(const char *path)
> +{
> +    int fd;
> +    struct sockaddr_un sockaddr;
> +
> +    fd = socket(AF_UNIX, SOCK_STREAM, 0);
> +    if (fd < 0) {
> +        perror("create socket");
> +        return false;
> +    }
> +
> +    sockaddr.sun_family = AF_UNIX;
> +    g_strlcpy(sockaddr.sun_path, path, sizeof(sockaddr.sun_path) - 1);
> +
> +    if (connect(fd, (struct sockaddr *)&sockaddr, sizeof(sockaddr)) < 0) {
> +        perror("failed to connect");
> +        return false;
> +    }
> +
> +    qemu_plugin_outs("connect_socket::ready\n");
> +
> +    socket_fd = fd;
> +    return true;
> +}
> +
> +static bool setup_unix_socket(const char *path)
> +{
> +    if (g_file_test(path, G_FILE_TEST_EXISTS)) {
> +        return connect_socket(path);
> +    } else {
> +        return setup_socket(path);
> +    }
> +}
> +
> +
> +QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
> +                                           const qemu_info_t *info,
> +                                           int argc, char **argv)
> +{
> +    int i;
> +
> +    if (!argc || !argv[0]) {
> +        qemu_plugin_outs("Need a socket path to talk to other instance.");
> +        return -1;
> +    }
> +
> +    for (i = 0; i < argc; i++) {
> +        char *p = argv[i];
> +        if (strcmp(p, "verbose") == 0) {
> +            verbose = true;
> +        } else if (!setup_unix_socket(argv[0])) {
> +            qemu_plugin_outs("Failed to setup socket for communications.");
> +            return -1;
> +        }
> +    }
> +
> +    our_id = id;
> +
> +    qemu_plugin_register_vcpu_tb_trans_cb(id, vcpu_tb_trans);
> +    qemu_plugin_register_atexit_cb(id, plugin_exit, NULL);
> +    return 0;
> +}
> diff --git a/tests/plugin/Makefile b/tests/plugin/Makefile
> index 75467b6db85..b3250e2504c 100644
> --- a/tests/plugin/Makefile
> +++ b/tests/plugin/Makefile
> @@ -13,6 +13,7 @@ NAMES += mem
>  NAMES += hotblocks
>  NAMES += howvec
>  NAMES += hotpages
> +NAMES += lockstep
>
>  SONAMES := $(addsuffix .so,$(addprefix lib,$(NAMES)))
>
> diff --git a/tests/tcg/Makefile.target b/tests/tcg/Makefile.target
> index b3cff3cad1a..075daf3d22d 100644
> --- a/tests/tcg/Makefile.target
> +++ b/tests/tcg/Makefile.target
> @@ -128,7 +128,7 @@ RUN_TESTS=$(patsubst %,run-%, $(TESTS))
>  ifeq ($(CONFIG_PLUGIN),y)
>  PLUGIN_DIR=../../plugin
>  VPATH+=$(PLUGIN_DIR)
> -PLUGINS=$(notdir $(wildcard $(PLUGIN_DIR)/*.so))
> +PLUGINS=$(filter-out liblockstep.so,$(notdir $(wildcard $(PLUGIN_DIR)/*.so)))
>
>  # We need to ensure expand the run-plugin-TEST-with-PLUGIN
>  # pre-requistes manually here as we can't use stems to handle it. We
> --
> 2.20.1
>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 3/6] cputlb: ensure we save the IOTLB data in case of reset
  2020-06-10 15:55 ` [PATCH v2 3/6] cputlb: ensure we save the IOTLB data in case of reset Alex Bennée
@ 2020-06-21 20:33   ` Emilio G. Cota
  2020-06-22  9:02     ` Alex Bennée
  0 siblings, 1 reply; 12+ messages in thread
From: Emilio G. Cota @ 2020-06-21 20:33 UTC (permalink / raw)
  To: Alex Bennée
  Cc: robert.foley, qemu-devel, robhenry, aaron, Paolo Bonzini,
	kuhn.chenqun, peter.puhov, Richard Henderson

On Wed, Jun 10, 2020 at 16:55:06 +0100, Alex Bennée wrote:
> Any write to a device might cause a re-arrangement of memory
> triggering a TLB flush and potential re-size of the TLB invalidating
> previous entries. This would cause users of qemu_plugin_get_hwaddr()
> to see the warning:
> 
>   invalid use of qemu_plugin_get_hwaddr
> 
> because of the failed tlb_lookup which should always succeed. To
> prevent this we save the IOTLB data in case it is later needed by a
> plugin doing a lookup.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> 
> ---
> v2
>   - save the entry instead of re-running the tlb_fill.
> 
> squash! cputlb: ensure we save the IOTLB entry in case of reset
> ---
>  accel/tcg/cputlb.c | 63 ++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 61 insertions(+), 2 deletions(-)
> 
> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index eb2cf9de5e6..9bf9e479c7c 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -1058,6 +1058,47 @@ static uint64_t io_readx(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
>      return val;
>  }
>  
> +#ifdef CONFIG_PLUGIN
> +
> +typedef struct SavedIOTLB {
> +    struct rcu_head rcu;
> +    struct SavedIOTLB **save_loc;
> +    MemoryRegionSection *section;
> +    hwaddr mr_offset;
> +} SavedIOTLB;
> +
> +static void clean_saved_entry(SavedIOTLB *s)
> +{
> +    atomic_rcu_set(s->save_loc, NULL);

This will race with the CPU thread that sets saved_for_plugin in
save_iotlb_data().

> +    g_free(s);
> +}
> +
> +static __thread SavedIOTLB *saved_for_plugin;

Apologies if this has been discussed, but why is this using TLS
variables and not state embedded in CPUState?
I see that qemu_plugin_get_hwaddr does not take a cpu_index, but
maybe it should? We could then just embed the RCU pointer in CPUState.

> +
> +/*
> + * Save a potentially trashed IOTLB entry for later lookup by plugin.
> + *
> + * We also need to track the thread storage address because the RCU
> + * cleanup that runs when we leave the critical region (the current
> + * execution) is actually in a different thread.
> + */
> +static void save_iotlb_data(MemoryRegionSection *section, hwaddr mr_offset)
> +{
> +    SavedIOTLB *s = g_new(SavedIOTLB, 1);
> +    s->save_loc = &saved_for_plugin;
> +    s->section = section;
> +    s->mr_offset = mr_offset;
> +    atomic_rcu_set(&saved_for_plugin, s);
> +    call_rcu(s, clean_saved_entry, rcu);

Here we could just publish the new pointer and g_free_rcu the old
one, if any.

> +}
> +
> +#else
> +static void save_iotlb_data(MemoryRegionSection *section, hwaddr mr_offset)
> +{
> +    /* do nothing */
> +}
> +#endif
> +
>  static void io_writex(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
>                        int mmu_idx, uint64_t val, target_ulong addr,
>                        uintptr_t retaddr, MemOp op)
> @@ -1077,6 +1118,12 @@ static void io_writex(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
>      }
>      cpu->mem_io_pc = retaddr;
>  
> +    /*
> +     * The memory_region_dispatch may trigger a flush/resize
> +     * so for plugins we save the iotlb_data just in case.
> +     */
> +    save_iotlb_data(section, mr_offset);
> +
>      if (mr->global_locking && !qemu_mutex_iothread_locked()) {
>          qemu_mutex_lock_iothread();
>          locked = true;
> @@ -1091,6 +1138,7 @@ static void io_writex(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
>                                 MMU_DATA_STORE, mmu_idx, iotlbentry->attrs, r,
>                                 retaddr);
>      }
> +

Stray whitespace change.

>      if (locked) {
>          qemu_mutex_unlock_iothread();
>      }
> @@ -1366,8 +1414,11 @@ void *tlb_vaddr_to_host(CPUArchState *env, abi_ptr addr,
>   * in the softmmu lookup code (or helper). We don't handle re-fills or
>   * checking the victim table. This is purely informational.
>   *
> - * This should never fail as the memory access being instrumented
> - * should have just filled the TLB.
> + * This almost never fails as the memory access being instrumented
> + * should have just filled the TLB. The one corner case is io_writex
> + * which can cause TLB flushes and potential resizing of the TLBs
> + * loosing the information we need. In those cases we need to recover
> + * data from a thread local copy of the io_tlb entry.
>   */
>  
>  bool tlb_plugin_lookup(CPUState *cpu, target_ulong addr, int mmu_idx,
> @@ -1391,6 +1442,14 @@ bool tlb_plugin_lookup(CPUState *cpu, target_ulong addr, int mmu_idx,
>              data->v.ram.hostaddr = addr + tlbe->addend;
>          }
>          return true;
> +    } else {
> +        SavedIOTLB *saved = atomic_rcu_read(&saved_for_plugin);
> +        if (saved) {
> +            data->is_io = true;
> +            data->v.io.section = saved->section;
> +            data->v.io.offset = saved->mr_offset;
> +            return true;
> +        }

Shouldn't we check that the contents of the saved IOTLB match the
parameters of the lookup? Otherwise passing a random address is likely
to land here.

Thanks,
		Emilio


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH  v2 3/6] cputlb: ensure we save the IOTLB data in case of reset
  2020-06-21 20:33   ` Emilio G. Cota
@ 2020-06-22  9:02     ` Alex Bennée
  2020-06-23  1:54       ` Emilio G. Cota
  0 siblings, 1 reply; 12+ messages in thread
From: Alex Bennée @ 2020-06-22  9:02 UTC (permalink / raw)
  To: Emilio G. Cota
  Cc: robert.foley, qemu-devel, robhenry, aaron, Paolo Bonzini,
	kuhn.chenqun, peter.puhov, Richard Henderson


Emilio G. Cota <cota@braap.org> writes:

> On Wed, Jun 10, 2020 at 16:55:06 +0100, Alex Bennée wrote:
>> Any write to a device might cause a re-arrangement of memory
>> triggering a TLB flush and potential re-size of the TLB invalidating
>> previous entries. This would cause users of qemu_plugin_get_hwaddr()
>> to see the warning:
>> 
>>   invalid use of qemu_plugin_get_hwaddr
>> 
>> because of the failed tlb_lookup which should always succeed. To
>> prevent this we save the IOTLB data in case it is later needed by a
>> plugin doing a lookup.
>> 
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>> 
>> ---
>> v2
>>   - save the entry instead of re-running the tlb_fill.
>> 
>> squash! cputlb: ensure we save the IOTLB entry in case of reset
>> ---
>>  accel/tcg/cputlb.c | 63 ++++++++++++++++++++++++++++++++++++++++++++--
>>  1 file changed, 61 insertions(+), 2 deletions(-)
>> 
>> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
>> index eb2cf9de5e6..9bf9e479c7c 100644
>> --- a/accel/tcg/cputlb.c
>> +++ b/accel/tcg/cputlb.c
>> @@ -1058,6 +1058,47 @@ static uint64_t io_readx(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
>>      return val;
>>  }
>>  
>> +#ifdef CONFIG_PLUGIN
>> +
>> +typedef struct SavedIOTLB {
>> +    struct rcu_head rcu;
>> +    struct SavedIOTLB **save_loc;
>> +    MemoryRegionSection *section;
>> +    hwaddr mr_offset;
>> +} SavedIOTLB;
>> +
>> +static void clean_saved_entry(SavedIOTLB *s)
>> +{
>> +    atomic_rcu_set(s->save_loc, NULL);
>
> This will race with the CPU thread that sets saved_for_plugin in
> save_iotlb_data().

Surely that only happens outside the critical section?

>
>> +    g_free(s);
>> +}
>> +
>> +static __thread SavedIOTLB *saved_for_plugin;
>
> Apologies if this has been discussed, but why is this using TLS
> variables and not state embedded in CPUState?

Good point - I guess I;m being lazy.

> I see that qemu_plugin_get_hwaddr does not take a cpu_index, but
> maybe it should? We could then just embed the RCU pointer in CPUState.
>
>> +
>> +/*
>> + * Save a potentially trashed IOTLB entry for later lookup by plugin.
>> + *
>> + * We also need to track the thread storage address because the RCU
>> + * cleanup that runs when we leave the critical region (the current
>> + * execution) is actually in a different thread.
>> + */
>> +static void save_iotlb_data(MemoryRegionSection *section, hwaddr mr_offset)
>> +{
>> +    SavedIOTLB *s = g_new(SavedIOTLB, 1);
>> +    s->save_loc = &saved_for_plugin;
>> +    s->section = section;
>> +    s->mr_offset = mr_offset;
>> +    atomic_rcu_set(&saved_for_plugin, s);
>> +    call_rcu(s, clean_saved_entry, rcu);
>
> Here we could just publish the new pointer and g_free_rcu the old
> one, if any.

That would be simpler. I'll re-spin.

>
>> +}
>> +
>> +#else
>> +static void save_iotlb_data(MemoryRegionSection *section, hwaddr mr_offset)
>> +{
>> +    /* do nothing */
>> +}
>> +#endif
>> +
>>  static void io_writex(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
>>                        int mmu_idx, uint64_t val, target_ulong addr,
>>                        uintptr_t retaddr, MemOp op)
>> @@ -1077,6 +1118,12 @@ static void io_writex(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
>>      }
>>      cpu->mem_io_pc = retaddr;
>>  
>> +    /*
>> +     * The memory_region_dispatch may trigger a flush/resize
>> +     * so for plugins we save the iotlb_data just in case.
>> +     */
>> +    save_iotlb_data(section, mr_offset);
>> +
>>      if (mr->global_locking && !qemu_mutex_iothread_locked()) {
>>          qemu_mutex_lock_iothread();
>>          locked = true;
>> @@ -1091,6 +1138,7 @@ static void io_writex(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
>>                                 MMU_DATA_STORE, mmu_idx, iotlbentry->attrs, r,
>>                                 retaddr);
>>      }
>> +
>
> Stray whitespace change.
>
>>      if (locked) {
>>          qemu_mutex_unlock_iothread();
>>      }
>> @@ -1366,8 +1414,11 @@ void *tlb_vaddr_to_host(CPUArchState *env, abi_ptr addr,
>>   * in the softmmu lookup code (or helper). We don't handle re-fills or
>>   * checking the victim table. This is purely informational.
>>   *
>> - * This should never fail as the memory access being instrumented
>> - * should have just filled the TLB.
>> + * This almost never fails as the memory access being instrumented
>> + * should have just filled the TLB. The one corner case is io_writex
>> + * which can cause TLB flushes and potential resizing of the TLBs
>> + * loosing the information we need. In those cases we need to recover
>> + * data from a thread local copy of the io_tlb entry.
>>   */
>>  
>>  bool tlb_plugin_lookup(CPUState *cpu, target_ulong addr, int mmu_idx,
>> @@ -1391,6 +1442,14 @@ bool tlb_plugin_lookup(CPUState *cpu, target_ulong addr, int mmu_idx,
>>              data->v.ram.hostaddr = addr + tlbe->addend;
>>          }
>>          return true;
>> +    } else {
>> +        SavedIOTLB *saved = atomic_rcu_read(&saved_for_plugin);
>> +        if (saved) {
>> +            data->is_io = true;
>> +            data->v.io.section = saved->section;
>> +            data->v.io.offset = saved->mr_offset;
>> +            return true;
>> +        }
>
> Shouldn't we check that the contents of the saved IOTLB match the
> parameters of the lookup? Otherwise passing a random address is likely
> to land here.

Good point - I'm being too trusting here ;-)

Thanks for the review.

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 3/6] cputlb: ensure we save the IOTLB data in case of reset
  2020-06-22  9:02     ` Alex Bennée
@ 2020-06-23  1:54       ` Emilio G. Cota
  0 siblings, 0 replies; 12+ messages in thread
From: Emilio G. Cota @ 2020-06-23  1:54 UTC (permalink / raw)
  To: Alex Bennée
  Cc: robert.foley, qemu-devel, robhenry, aaron, Paolo Bonzini,
	kuhn.chenqun, peter.puhov, Richard Henderson

On Mon, Jun 22, 2020 at 10:02:50 +0100, Alex Bennée wrote:
> Emilio G. Cota <cota@braap.org> writes:
> > On Wed, Jun 10, 2020 at 16:55:06 +0100, Alex Bennée wrote:
(snip)
> >> +#ifdef CONFIG_PLUGIN
> >> +
> >> +typedef struct SavedIOTLB {
> >> +    struct rcu_head rcu;
> >> +    struct SavedIOTLB **save_loc;
> >> +    MemoryRegionSection *section;
> >> +    hwaddr mr_offset;
> >> +} SavedIOTLB;
> >> +
> >> +static void clean_saved_entry(SavedIOTLB *s)
> >> +{
> >> +    atomic_rcu_set(s->save_loc, NULL);
> >
> > This will race with the CPU thread that sets saved_for_plugin in
> > save_iotlb_data().
> 
> Surely that only happens outside the critical section?

I am not sure which critical section you're referring to.

With call_rcu() we defer the execution of the function to the RCU
thread at a later time, where "later time" is defined as any time
after the pre-existing RCU read critical sections have elapsed.

So we could have the RCU thread clearing the variable while the
CPU thread, which is in a _later_ RCU read critical section, is
setting said variable. This is the race I was referring to.

Thanks,
	
		Emilio


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2020-06-23  1:55 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-10 15:55 [PATCH v2 0/6] plugins/next (lockstep, api, hwprofile) Alex Bennée
2020-06-10 15:55 ` [PATCH v2 1/6] iotests: 194: wait migration completion on target too Alex Bennée
2020-06-10 16:38   ` Alex Bennée
2020-06-10 15:55 ` [PATCH v2 2/6] plugins: new lockstep plugin for debugging TCG changes Alex Bennée
2020-06-11 17:04   ` Robert Foley
2020-06-10 15:55 ` [PATCH v2 3/6] cputlb: ensure we save the IOTLB data in case of reset Alex Bennée
2020-06-21 20:33   ` Emilio G. Cota
2020-06-22  9:02     ` Alex Bennée
2020-06-23  1:54       ` Emilio G. Cota
2020-06-10 15:55 ` [PATCH v2 4/6] hw/virtio/pci: include vdev name in registered PCI sections Alex Bennée
2020-06-10 15:55 ` [PATCH v2 5/6] plugins: add API to return a name for a IO device Alex Bennée
2020-06-10 15:55 ` [PATCH v2 6/6] plugins: new hwprofile plugin Alex Bennée

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.