All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH  v2 00/21] plugins/next pre-PR (hwprofile, regression fixes, icount count fix)
@ 2021-02-10 22:10 Alex Bennée
  2021-02-10 22:10 ` [PATCH v2 01/21] hw/virtio/pci: include vdev name in registered PCI sections Alex Bennée
                   ` (20 more replies)
  0 siblings, 21 replies; 53+ messages in thread
From: Alex Bennée @ 2021-02-10 22:10 UTC (permalink / raw)
  To: qemu-devel
  Cc: robhenry, mahmoudabdalghany, aaron, cota, kuhn.chenqun, Alex Bennée

Hi,

OK time to start preparing a plugins PR with all the accumulated fixes
over the last few months. Broadly they are:

  - a new API for HW access profiling
  - a bunch of style clean-ups
  - a fix for a regression caused by recent TCG updates
  - Richard's io_recompile clean-ups
  - a fix for an icount/io_recompile miscount

The following patches could still do with some review:
  
 - tests/acceptance: add a new tests to detect counting errors
 - accel/tcg: re-factor non-RAM execution code
 - accel/tcg: cache single instruction TB on pending replay exception
 - accel/tcg: actually cache our partial icount TB
 - tests/acceptance: add a new set of tests to exercise plugins
 - tests/plugin: expand insn test to detect duplicate instructions

Alex Bennée (12):
  hw/virtio/pci: include vdev name in registered PCI sections
  plugins: add API to return a name for a IO device
  plugins: new hwprofile plugin
  accel/tcg/plugin-gen: fix the call signature for inline callbacks
  tests/plugin: expand insn test to detect duplicate instructions
  tests/acceptance: add a new set of tests to exercise plugins
  accel/tcg: actually cache our partial icount TB
  accel/tcg: cache single instruction TB on pending replay exception
  accel/tcg: re-factor non-RAM execution code
  accel/tcg: remove CF_NOCACHE and special cases
  accel/tcg: allow plugin instrumentation to be disable via cflags
  tests/acceptance: add a new tests to detect counting errors

Richard Henderson (4):
  exec: Move TranslationBlock typedef to qemu/typedefs.h
  accel/tcg: Create io_recompile_replay_branch hook
  target/mips: Create mips_io_recompile_replay_branch
  target/sh4: Create superh_io_recompile_replay_branch

zhouyang (5):
  contrib: Don't use '#' flag of printf format
  contrib: Fix some code style problems, ERROR: "foo * bar" should be
    "foo *bar"
  contrib: Add spaces around operator
  contrib: space required after that ','
  contrib: Open brace '{' following struct go on the same line

 docs/devel/tcg-plugins.rst               |  34 +++
 include/exec/exec-all.h                  |   9 +-
 include/exec/tb-context.h                |   1 -
 include/hw/core/cpu.h                    |   4 +-
 include/hw/core/tcg-cpu-ops.h            |  13 +-
 include/qemu/qemu-plugin.h               |   6 +
 include/qemu/typedefs.h                  |   1 +
 target/arm/internals.h                   |   3 +-
 accel/tcg/cpu-exec.c                     |  61 +----
 accel/tcg/plugin-gen.c                   |  32 +--
 accel/tcg/translate-all.c                | 129 ++++------
 accel/tcg/translator.c                   |   2 +-
 contrib/ivshmem-server/main.c            |   2 +-
 contrib/plugins/hotblocks.c              |   2 +-
 contrib/plugins/hotpages.c               |   2 +-
 contrib/plugins/howvec.c                 |  19 +-
 contrib/plugins/hwprofile.c              | 305 +++++++++++++++++++++++
 contrib/plugins/lockstep.c               |   6 +-
 hw/virtio/virtio-pci.c                   |  22 +-
 plugins/api.c                            |  20 ++
 target/cris/translate.c                  |   2 +-
 target/lm32/translate.c                  |   2 +-
 target/mips/cpu.c                        |  18 ++
 target/moxie/translate.c                 |   2 +-
 target/sh4/cpu.c                         |  18 ++
 target/unicore32/translate.c             |   2 +-
 tests/plugin/insn.c                      |  12 +-
 contrib/plugins/Makefile                 |   1 +
 tests/acceptance/tcg_plugins.py          | 134 ++++++++++
 tests/tcg/i386/Makefile.softmmu-target   |  10 +
 tests/tcg/i386/Makefile.target           |   7 +
 tests/tcg/x86_64/Makefile.softmmu-target |  10 +
 32 files changed, 697 insertions(+), 194 deletions(-)
 create mode 100644 contrib/plugins/hwprofile.c
 create mode 100644 tests/acceptance/tcg_plugins.py

-- 
2.20.1



^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v2 01/21] hw/virtio/pci: include vdev name in registered PCI sections
  2021-02-10 22:10 [PATCH v2 00/21] plugins/next pre-PR (hwprofile, regression fixes, icount count fix) Alex Bennée
@ 2021-02-10 22:10 ` Alex Bennée
  2021-02-10 22:10 ` [PATCH v2 02/21] plugins: add API to return a name for a IO device Alex Bennée
                   ` (19 subsequent siblings)
  20 siblings, 0 replies; 53+ messages in thread
From: Alex Bennée @ 2021-02-10 22:10 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S . Tsirkin, Philippe Mathieu-Daudé,
	robhenry, mahmoudabdalghany, aaron, cota, kuhn.chenqun,
	Alex Bennée

When viewing/debugging memory regions it is sometimes hard to figure
out which PCI device something belongs to. Make the names unique by
including the vdev name in the name string.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>

Message-Id: <20200713200415.26214-10-alex.bennee@linaro.org>
---
 hw/virtio/virtio-pci.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 094c36aa3e..883045a223 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1423,7 +1423,8 @@ static void virtio_pci_device_write(void *opaque, hwaddr addr,
     }
 }
 
-static void virtio_pci_modern_regions_init(VirtIOPCIProxy *proxy)
+static void virtio_pci_modern_regions_init(VirtIOPCIProxy *proxy,
+                                           const char *vdev_name)
 {
     static const MemoryRegionOps common_ops = {
         .read = virtio_pci_common_read,
@@ -1470,36 +1471,41 @@ static void virtio_pci_modern_regions_init(VirtIOPCIProxy *proxy)
         },
         .endianness = DEVICE_LITTLE_ENDIAN,
     };
+    g_autoptr(GString) name = g_string_new(NULL);
 
-
+    g_string_printf(name, "virtio-pci-common-%s", vdev_name);
     memory_region_init_io(&proxy->common.mr, OBJECT(proxy),
                           &common_ops,
                           proxy,
-                          "virtio-pci-common",
+                          name->str,
                           proxy->common.size);
 
+    g_string_printf(name, "virtio-pci-isr-%s", vdev_name);
     memory_region_init_io(&proxy->isr.mr, OBJECT(proxy),
                           &isr_ops,
                           proxy,
-                          "virtio-pci-isr",
+                          name->str,
                           proxy->isr.size);
 
+    g_string_printf(name, "virtio-pci-device-%s", vdev_name);
     memory_region_init_io(&proxy->device.mr, OBJECT(proxy),
                           &device_ops,
                           proxy,
-                          "virtio-pci-device",
+                          name->str,
                           proxy->device.size);
 
+    g_string_printf(name, "virtio-pci-notify-%s", vdev_name);
     memory_region_init_io(&proxy->notify.mr, OBJECT(proxy),
                           &notify_ops,
                           proxy,
-                          "virtio-pci-notify",
+                          name->str,
                           proxy->notify.size);
 
+    g_string_printf(name, "virtio-pci-notify-pio-%s", vdev_name);
     memory_region_init_io(&proxy->notify_pio.mr, OBJECT(proxy),
                           &notify_pio_ops,
                           proxy,
-                          "virtio-pci-notify-pio",
+                          name->str,
                           proxy->notify_pio.size);
 }
 
@@ -1654,7 +1660,7 @@ static void virtio_pci_device_plugged(DeviceState *d, Error **errp)
 
         struct virtio_pci_cfg_cap *cfg_mask;
 
-        virtio_pci_modern_regions_init(proxy);
+        virtio_pci_modern_regions_init(proxy, vdev->name);
 
         virtio_pci_modern_mem_region_map(proxy, &proxy->common, &cap);
         virtio_pci_modern_mem_region_map(proxy, &proxy->isr, &cap);
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH  v2 02/21] plugins: add API to return a name for a IO device
  2021-02-10 22:10 [PATCH v2 00/21] plugins/next pre-PR (hwprofile, regression fixes, icount count fix) Alex Bennée
  2021-02-10 22:10 ` [PATCH v2 01/21] hw/virtio/pci: include vdev name in registered PCI sections Alex Bennée
@ 2021-02-10 22:10 ` Alex Bennée
  2021-02-10 22:10 ` [PATCH v2 03/21] plugins: new hwprofile plugin Alex Bennée
                   ` (18 subsequent siblings)
  20 siblings, 0 replies; 53+ messages in thread
From: Alex Bennée @ 2021-02-10 22:10 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, robhenry, mahmoudabdalghany, aaron, cota,
	kuhn.chenqun, Clement Deschamps, Alex Bennée

This may well end up being anonymous but it should always be unique.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Clement Deschamps <clement.deschamps@greensocs.com>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20200713200415.26214-11-alex.bennee@linaro.org>

---
v4
  - use g_intern_static_string for static strings
---
 include/qemu/qemu-plugin.h |  6 ++++++
 plugins/api.c              | 20 ++++++++++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/include/qemu/qemu-plugin.h b/include/qemu/qemu-plugin.h
index 5775e82c4e..c66507fe8f 100644
--- a/include/qemu/qemu-plugin.h
+++ b/include/qemu/qemu-plugin.h
@@ -330,6 +330,12 @@ struct qemu_plugin_hwaddr *qemu_plugin_get_hwaddr(qemu_plugin_meminfo_t info,
 bool qemu_plugin_hwaddr_is_io(const struct qemu_plugin_hwaddr *haddr);
 uint64_t qemu_plugin_hwaddr_device_offset(const struct qemu_plugin_hwaddr *haddr);
 
+/*
+ * Returns a string representing the device. The string is valid for
+ * the lifetime of the plugin.
+ */
+const char *qemu_plugin_hwaddr_device_name(const struct qemu_plugin_hwaddr *h);
+
 typedef void
 (*qemu_plugin_vcpu_mem_cb_t)(unsigned int vcpu_index,
                              qemu_plugin_meminfo_t info, uint64_t vaddr,
diff --git a/plugins/api.c b/plugins/api.c
index bbdc5a4eb4..5dc8e6f934 100644
--- a/plugins/api.c
+++ b/plugins/api.c
@@ -303,6 +303,26 @@ uint64_t qemu_plugin_hwaddr_device_offset(const struct qemu_plugin_hwaddr *haddr
     return 0;
 }
 
+const char *qemu_plugin_hwaddr_device_name(const struct qemu_plugin_hwaddr *h)
+{
+#ifdef CONFIG_SOFTMMU
+    if (h && h->is_io) {
+        MemoryRegionSection *mrs = h->v.io.section;
+        if (!mrs->mr->name) {
+            unsigned long maddr = 0xffffffff & (uintptr_t) mrs->mr;
+            g_autofree char *temp = g_strdup_printf("anon%08lx", maddr);
+            return g_intern_string(temp);
+        } else {
+            return g_intern_string(mrs->mr->name);
+        }
+    } else {
+        return g_intern_static_string("RAM");
+    }
+#else
+    return g_intern_static_string("Invalid");
+#endif
+}
+
 /*
  * Queries to the number and potential maximum number of vCPUs there
  * will be. This helps the plugin dimension per-vcpu arrays.
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH  v2 03/21] plugins: new hwprofile plugin
  2021-02-10 22:10 [PATCH v2 00/21] plugins/next pre-PR (hwprofile, regression fixes, icount count fix) Alex Bennée
  2021-02-10 22:10 ` [PATCH v2 01/21] hw/virtio/pci: include vdev name in registered PCI sections Alex Bennée
  2021-02-10 22:10 ` [PATCH v2 02/21] plugins: add API to return a name for a IO device Alex Bennée
@ 2021-02-10 22:10 ` Alex Bennée
  2021-02-10 22:10 ` [PATCH v2 04/21] contrib: Don't use '#' flag of printf format Alex Bennée
                   ` (17 subsequent siblings)
  20 siblings, 0 replies; 53+ messages in thread
From: Alex Bennée @ 2021-02-10 22:10 UTC (permalink / raw)
  To: qemu-devel
  Cc: Robert Foley, robhenry, mahmoudabdalghany, aaron, cota,
	kuhn.chenqun, Alex Bennée

This is a plugin intended to help with profiling access to various
bits of system hardware. It only really makes sense for system
emulation.

It takes advantage of the recently exposed helper API that allows us
to see the device name (memory region name) associated with a device.

You can specify arg=read or arg=write to limit the tracking to just
reads or writes (by default it does both).

The pattern option:

  -plugin ./tests/plugin/libhwprofile.so,arg=pattern

will allow you to see the access pattern to devices, eg:

  gic_cpu @ 0xffffffc010040000
    off:00000000, 8, 1, 8, 1
    off:00000000, 4, 1, 4, 1
    off:00000000, 2, 1, 2, 1
    off:00000000, 1, 1, 1, 1

The source option:

  -plugin ./tests/plugin/libhwprofile.so,arg=source

will track the virtual source address of the instruction making the
access:

  pl011 @ 0xffffffc010031000
    pc:ffffffc0104c785c, 1, 4, 0, 0
    pc:ffffffc0104c7898, 1, 4, 0, 0
    pc:ffffffc010512bcc, 2, 1867, 0, 0

You cannot mix source and pattern.

Finally the match option allow you to limit the tracking to just the
devices you care about.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Robert Foley <robert.foley@linaro.org>
Tested-by: Robert Foley <robert.foley@linaro.org>
Message-Id: <20200713200415.26214-12-alex.bennee@linaro.org>

---
vN
  - add some notes to tcg-plugins.rst
---
 docs/devel/tcg-plugins.rst  |  34 ++++
 contrib/plugins/hwprofile.c | 305 ++++++++++++++++++++++++++++++++++++
 contrib/plugins/Makefile    |   1 +
 3 files changed, 340 insertions(+)
 create mode 100644 contrib/plugins/hwprofile.c

diff --git a/docs/devel/tcg-plugins.rst b/docs/devel/tcg-plugins.rst
index 0568dfa6a4..39ce86ed96 100644
--- a/docs/devel/tcg-plugins.rst
+++ b/docs/devel/tcg-plugins.rst
@@ -280,3 +280,37 @@ which will eventually report::
     previously @ 0x000000ffd08098/5 (809900593 insns)
     previously @ 0x000000ffd080c0/1 (809900588 insns)
 
+- contrib/plugins/hwprofile
+
+The hwprofile tool can only be used with system emulation and allows
+the user to see what hardware is accessed how often. It has a number of options:
+
+ * arg=read or arg=write
+
+ By default the plugin tracks both reads and writes. You can use one
+ of these options to limit the tracking to just one class of accesses.
+
+ * arg=source
+
+ Will include a detailed break down of what the guest PC that made the
+ access was. Not compatible with arg=pattern. Example output::
+
+   cirrus-low-memory @ 0xfffffd00000a0000
+    pc:fffffc0000005cdc, 1, 256
+    pc:fffffc0000005ce8, 1, 256
+    pc:fffffc0000005cec, 1, 256
+
+ * arg=pattern
+
+ Instead break down the accesses based on the offset into the HW
+ region. This can be useful for seeing the most used registers of a
+ device. Example output::
+
+    pci0-conf @ 0xfffffd01fe000000
+      off:00000004, 1, 1
+      off:00000010, 1, 3
+      off:00000014, 1, 3
+      off:00000018, 1, 2
+      off:0000001c, 1, 2
+      off:00000020, 1, 2
+      ...
diff --git a/contrib/plugins/hwprofile.c b/contrib/plugins/hwprofile.c
new file mode 100644
index 0000000000..6dac1d5f85
--- /dev/null
+++ b/contrib/plugins/hwprofile.c
@@ -0,0 +1,305 @@
+/*
+ * Copyright (C) 2020, Alex Bennée <alex.bennee@linaro.org>
+ *
+ * HW Profile - breakdown access patterns for IO to devices
+ *
+ * License: GNU GPL, version 2 or later.
+ *   See the COPYING file in the top-level directory.
+ */
+
+#include <inttypes.h>
+#include <assert.h>
+#include <stdlib.h>
+#include <inttypes.h>
+#include <string.h>
+#include <unistd.h>
+#include <stdio.h>
+#include <glib.h>
+
+#include <qemu-plugin.h>
+
+QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
+
+#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
+
+typedef struct {
+    uint64_t cpu_read;
+    uint64_t cpu_write;
+    uint64_t reads;
+    uint64_t writes;
+} IOCounts;
+
+typedef struct {
+    uint64_t off_or_pc;
+    IOCounts counts;
+} IOLocationCounts;
+
+typedef struct {
+    const char *name;
+    uint64_t   base;
+    IOCounts   totals;
+    GHashTable *detail;
+} DeviceCounts;
+
+static GMutex lock;
+static GHashTable *devices;
+
+/* track the access pattern to a piece of HW */
+static bool pattern;
+/* track the source address of access to HW */
+static bool source;
+/* track only matched regions of HW */
+static bool check_match;
+static gchar **matches;
+
+static enum qemu_plugin_mem_rw rw = QEMU_PLUGIN_MEM_RW;
+
+static inline bool track_reads(void)
+{
+    return rw == QEMU_PLUGIN_MEM_RW || rw == QEMU_PLUGIN_MEM_R;
+}
+
+static inline bool track_writes(void)
+{
+    return rw == QEMU_PLUGIN_MEM_RW || rw == QEMU_PLUGIN_MEM_W;
+}
+
+static void plugin_init(void)
+{
+    devices = g_hash_table_new(NULL, NULL);
+}
+
+static gint sort_cmp(gconstpointer a, gconstpointer b)
+{
+    DeviceCounts *ea = (DeviceCounts *) a;
+    DeviceCounts *eb = (DeviceCounts *) b;
+    return ea->totals.reads + ea->totals.writes >
+           eb->totals.reads + eb->totals.writes ? -1 : 1;
+}
+
+static gint sort_loc(gconstpointer a, gconstpointer b)
+{
+    IOLocationCounts *ea = (IOLocationCounts *) a;
+    IOLocationCounts *eb = (IOLocationCounts *) b;
+    return ea->off_or_pc > eb->off_or_pc;
+}
+
+static void fmt_iocount_record(GString *s, IOCounts *rec)
+{
+    if (track_reads()) {
+        g_string_append_printf(s, ", %"PRIx64", %"PRId64,
+                               rec->cpu_read, rec->reads);
+    }
+    if (track_writes()) {
+        g_string_append_printf(s, ", %"PRIx64", %"PRId64,
+                               rec->cpu_write, rec->writes);
+    }
+}
+
+static void fmt_dev_record(GString *s, DeviceCounts *rec)
+{
+    g_string_append_printf(s, "%s, 0x%"PRIx64,
+                           rec->name, rec->base);
+    fmt_iocount_record(s, &rec->totals);
+    g_string_append_c(s, '\n');
+}
+
+static void plugin_exit(qemu_plugin_id_t id, void *p)
+{
+    g_autoptr(GString) report = g_string_new("");
+    GList *counts;
+
+    if (!(pattern || source)) {
+        g_string_printf(report, "Device, Address");
+        if (track_reads()) {
+            g_string_append_printf(report, ", RCPUs, Reads");
+        }
+        if (track_writes()) {
+            g_string_append_printf(report, ",  WCPUs, Writes");
+        }
+        g_string_append_c(report, '\n');
+    }
+
+    counts = g_hash_table_get_values(devices);
+    if (counts && g_list_next(counts)) {
+        GList *it;
+
+        it = g_list_sort(counts, sort_cmp);
+
+        while (it) {
+            DeviceCounts *rec = (DeviceCounts *) it->data;
+            if (rec->detail) {
+                GList *accesses = g_hash_table_get_values(rec->detail);
+                GList *io_it = g_list_sort(accesses, sort_loc);
+                const char *prefix = pattern ? "off" : "pc";
+                g_string_append_printf(report, "%s @ 0x%"PRIx64"\n",
+                                       rec->name, rec->base);
+                while (io_it) {
+                    IOLocationCounts *loc = (IOLocationCounts *) io_it->data;
+                    g_string_append_printf(report, "  %s:%08"PRIx64,
+                                           prefix, loc->off_or_pc);
+                    fmt_iocount_record(report, &loc->counts);
+                    g_string_append_c(report, '\n');
+                    io_it = io_it->next;
+                }
+            } else {
+                fmt_dev_record(report, rec);
+            }
+            it = it->next;
+        };
+        g_list_free(it);
+    }
+
+    qemu_plugin_outs(report->str);
+}
+
+static DeviceCounts *new_count(const char *name, uint64_t base)
+{
+    DeviceCounts *count = g_new0(DeviceCounts, 1);
+    count->name = name;
+    count->base = base;
+    if (pattern || source) {
+        count->detail = g_hash_table_new(NULL, NULL);
+    }
+    g_hash_table_insert(devices, (gpointer) name, count);
+    return count;
+}
+
+static IOLocationCounts *new_location(GHashTable *table, uint64_t off_or_pc)
+{
+    IOLocationCounts *loc = g_new0(IOLocationCounts, 1);
+    loc->off_or_pc = off_or_pc;
+    g_hash_table_insert(table, (gpointer) off_or_pc, loc);
+    return loc;
+}
+
+static void hwprofile_match_hit(DeviceCounts *rec, uint64_t off)
+{
+    g_autoptr(GString) report = g_string_new("hwprofile: match @ offset");
+    g_string_append_printf(report, "%"PRIx64", previous hits\n", off);
+    fmt_dev_record(report, rec);
+    qemu_plugin_outs(report->str);
+}
+
+static void inc_count(IOCounts *count, bool is_write, unsigned int cpu_index)
+{
+    if (is_write) {
+        count->writes++;
+        count->cpu_write |= (1 << cpu_index);
+    } else {
+        count->reads++;
+        count->cpu_read |= (1 << cpu_index);
+    }
+}
+
+static void vcpu_haddr(unsigned int cpu_index, qemu_plugin_meminfo_t meminfo,
+                       uint64_t vaddr, void *udata)
+{
+    struct qemu_plugin_hwaddr *hwaddr = qemu_plugin_get_hwaddr(meminfo, vaddr);
+
+    if (!hwaddr || !qemu_plugin_hwaddr_is_io(hwaddr)) {
+        return;
+    } else {
+        const char *name = qemu_plugin_hwaddr_device_name(hwaddr);
+        uint64_t off = qemu_plugin_hwaddr_device_offset(hwaddr);
+        bool is_write = qemu_plugin_mem_is_store(meminfo);
+        DeviceCounts *counts;
+
+        g_mutex_lock(&lock);
+        counts = (DeviceCounts *) g_hash_table_lookup(devices, name);
+
+        if (!counts) {
+            uint64_t base = vaddr - off;
+            counts = new_count(name, base);
+        }
+
+        if (check_match) {
+            if (g_strv_contains((const char * const *)matches, counts->name)) {
+                hwprofile_match_hit(counts, off);
+                inc_count(&counts->totals, is_write, cpu_index);
+            }
+        } else {
+            inc_count(&counts->totals, is_write, cpu_index);
+        }
+
+        /* either track offsets or source of access */
+        if (source) {
+            off = (uint64_t) udata;
+        }
+
+        if (pattern || source) {
+            IOLocationCounts *io_count = g_hash_table_lookup(counts->detail,
+                                                             (gpointer) off);
+            if (!io_count) {
+                io_count = new_location(counts->detail, off);
+            }
+            inc_count(&io_count->counts, is_write, cpu_index);
+        }
+
+        g_mutex_unlock(&lock);
+    }
+}
+
+static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
+{
+    size_t n = qemu_plugin_tb_n_insns(tb);
+    size_t i;
+
+    for (i = 0; i < n; i++) {
+        struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, i);
+        gpointer udata = (gpointer) (source ? qemu_plugin_insn_vaddr(insn) : 0);
+        qemu_plugin_register_vcpu_mem_cb(insn, vcpu_haddr,
+                                         QEMU_PLUGIN_CB_NO_REGS,
+                                         rw, udata);
+    }
+}
+
+QEMU_PLUGIN_EXPORT
+int qemu_plugin_install(qemu_plugin_id_t id, const qemu_info_t *info,
+                        int argc, char **argv)
+{
+    int i;
+
+    for (i = 0; i < argc; i++) {
+        char *opt = argv[i];
+        if (g_strcmp0(opt, "read") == 0) {
+            rw = QEMU_PLUGIN_MEM_R;
+        } else if (g_strcmp0(opt, "write") == 0) {
+            rw = QEMU_PLUGIN_MEM_W;
+        } else if (g_strcmp0(opt, "pattern") == 0) {
+            pattern = true;
+        } else if (g_strcmp0(opt, "source") == 0) {
+            source = true;
+        } else if (g_str_has_prefix(opt, "match")) {
+            gchar **parts = g_strsplit(opt, "=", 2);
+            check_match = true;
+            matches = g_strsplit(parts[1], ",", -1);
+            g_strfreev(parts);
+        } else {
+            fprintf(stderr, "option parsing failed: %s\n", opt);
+            return -1;
+        }
+    }
+
+    if (source && pattern) {
+        fprintf(stderr, "can only currently track either source or pattern.\n");
+        return -1;
+    }
+
+    if (!info->system_emulation) {
+        fprintf(stderr, "hwprofile: plugin only useful for system emulation\n");
+        return -1;
+    }
+
+    /* Just warn about overflow */
+    if (info->system.smp_vcpus > 64 ||
+        info->system.max_vcpus > 64) {
+        fprintf(stderr, "hwprofile: can only track up to 64 CPUs\n");
+    }
+
+    plugin_init();
+
+    qemu_plugin_register_vcpu_tb_trans_cb(id, vcpu_tb_trans);
+    qemu_plugin_register_atexit_cb(id, plugin_exit, NULL);
+    return 0;
+}
diff --git a/contrib/plugins/Makefile b/contrib/plugins/Makefile
index 7801b08b0d..b9d7935e5e 100644
--- a/contrib/plugins/Makefile
+++ b/contrib/plugins/Makefile
@@ -17,6 +17,7 @@ NAMES += hotblocks
 NAMES += hotpages
 NAMES += howvec
 NAMES += lockstep
+NAMES += hwprofile
 
 SONAMES := $(addsuffix .so,$(addprefix lib,$(NAMES)))
 
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH  v2 04/21] contrib: Don't use '#' flag of printf format
  2021-02-10 22:10 [PATCH v2 00/21] plugins/next pre-PR (hwprofile, regression fixes, icount count fix) Alex Bennée
                   ` (2 preceding siblings ...)
  2021-02-10 22:10 ` [PATCH v2 03/21] plugins: new hwprofile plugin Alex Bennée
@ 2021-02-10 22:10 ` Alex Bennée
  2021-02-10 22:10 ` [PATCH v2 05/21] contrib: Fix some code style problems, ERROR: "foo * bar" should be "foo *bar" Alex Bennée
                   ` (16 subsequent siblings)
  20 siblings, 0 replies; 53+ messages in thread
From: Alex Bennée @ 2021-02-10 22:10 UTC (permalink / raw)
  To: qemu-devel
  Cc: zhouyang, robhenry, mahmoudabdalghany, aaron, cota, kuhn.chenqun,
	Alex Bennée

From: zhouyang <zhouyang789@huawei.com>

I am reading contrib related code and found some style problems while
check the code using checkpatch.pl. This commit fixs the misuse of
'#' flag of printf format

Signed-off-by: zhouyang <zhouyang789@huawei.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210118031004.1662363-2-zhouyang789@huawei.com>
---
 contrib/plugins/hotblocks.c | 2 +-
 contrib/plugins/hotpages.c  | 2 +-
 contrib/plugins/howvec.c    | 2 +-
 contrib/plugins/lockstep.c  | 6 +++---
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/contrib/plugins/hotblocks.c b/contrib/plugins/hotblocks.c
index 37435a3fc7..4b08340143 100644
--- a/contrib/plugins/hotblocks.c
+++ b/contrib/plugins/hotblocks.c
@@ -63,7 +63,7 @@ static void plugin_exit(qemu_plugin_id_t id, void *p)
 
         for (i = 0; i < limit && it->next; i++, it = it->next) {
             ExecCount *rec = (ExecCount *) it->data;
-            g_string_append_printf(report, "%#016"PRIx64", %d, %ld, %"PRId64"\n",
+            g_string_append_printf(report, "0x%016"PRIx64", %d, %ld, %"PRId64"\n",
                                    rec->start_addr, rec->trans_count,
                                    rec->insns, rec->exec_count);
         }
diff --git a/contrib/plugins/hotpages.c b/contrib/plugins/hotpages.c
index ecd6c18732..eacc678eac 100644
--- a/contrib/plugins/hotpages.c
+++ b/contrib/plugins/hotpages.c
@@ -88,7 +88,7 @@ static void plugin_exit(qemu_plugin_id_t id, void *p)
         for (i = 0; i < limit && it->next; i++, it = it->next) {
             PageCounters *rec = (PageCounters *) it->data;
             g_string_append_printf(report,
-                                   "%#016"PRIx64", 0x%04x, %"PRId64
+                                   "0x%016"PRIx64", 0x%04x, %"PRId64
                                    ", 0x%04x, %"PRId64"\n",
                                    rec->page_address,
                                    rec->cpu_read, rec->reads,
diff --git a/contrib/plugins/howvec.c b/contrib/plugins/howvec.c
index 3b9a6939f2..6e602aaccf 100644
--- a/contrib/plugins/howvec.c
+++ b/contrib/plugins/howvec.c
@@ -209,7 +209,7 @@ static void plugin_exit(qemu_plugin_id_t id, void *p)
              i++, counts = g_list_next(counts)) {
             InsnExecCount *rec = (InsnExecCount *) counts->data;
             g_string_append_printf(report,
-                                   "Instr: %-24s\t(%ld hits)\t(op=%#08x/%s)\n",
+                                   "Instr: %-24s\t(%ld hits)\t(op=0x%08x/%s)\n",
                                    rec->insn,
                                    rec->count,
                                    rec->opcode,
diff --git a/contrib/plugins/lockstep.c b/contrib/plugins/lockstep.c
index 5aad50869d..7fd35eb669 100644
--- a/contrib/plugins/lockstep.c
+++ b/contrib/plugins/lockstep.c
@@ -134,7 +134,7 @@ static void report_divergance(ExecState *us, ExecState *them)
 
     /* Output short log entry of going out of sync... */
     if (verbose || divrec.distance == 1 || diverged) {
-        g_string_printf(out, "@ %#016lx vs %#016lx (%d/%d since last)\n",
+        g_string_printf(out, "@ 0x%016lx vs 0x%016lx (%d/%d since last)\n",
                         us->pc, them->pc, g_slist_length(divergence_log),
                         divrec.distance);
         qemu_plugin_outs(out->str);
@@ -144,7 +144,7 @@ static void report_divergance(ExecState *us, ExecState *them)
         int i;
         GSList *entry;
 
-        g_string_printf(out, "Δ insn_count @ %#016lx (%ld) vs %#016lx (%ld)\n",
+        g_string_printf(out, "Δ insn_count @ 0x%016lx (%ld) vs 0x%016lx (%ld)\n",
                         us->pc, us->insn_count, them->pc, them->insn_count);
 
         for (entry = log, i = 0;
@@ -152,7 +152,7 @@ static void report_divergance(ExecState *us, ExecState *them)
              entry = g_slist_next(entry), i++) {
             ExecInfo *prev = (ExecInfo *) entry->data;
             g_string_append_printf(out,
-                                   "  previously @ %#016lx/%ld (%ld insns)\n",
+                                   "  previously @ 0x%016lx/%ld (%ld insns)\n",
                                    prev->block->pc, prev->block->insns,
                                    prev->insn_count);
         }
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH  v2 05/21] contrib: Fix some code style problems, ERROR: "foo * bar" should be "foo *bar"
  2021-02-10 22:10 [PATCH v2 00/21] plugins/next pre-PR (hwprofile, regression fixes, icount count fix) Alex Bennée
                   ` (3 preceding siblings ...)
  2021-02-10 22:10 ` [PATCH v2 04/21] contrib: Don't use '#' flag of printf format Alex Bennée
@ 2021-02-10 22:10 ` Alex Bennée
  2021-02-10 22:10 ` [PATCH v2 06/21] contrib: Add spaces around operator Alex Bennée
                   ` (15 subsequent siblings)
  20 siblings, 0 replies; 53+ messages in thread
From: Alex Bennée @ 2021-02-10 22:10 UTC (permalink / raw)
  To: qemu-devel
  Cc: zhouyang, robhenry, mahmoudabdalghany, aaron, cota, kuhn.chenqun,
	Alex Bennée

From: zhouyang <zhouyang789@huawei.com>

I am reading contrib related code and found some style problems while
check the code using checkpatch.pl. This commit fixs the issue below:
ERROR: "foo * bar" should be "foo *bar"

Signed-off-by: zhouyang <zhouyang789@huawei.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210118031004.1662363-3-zhouyang789@huawei.com>
---
 contrib/plugins/howvec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/contrib/plugins/howvec.c b/contrib/plugins/howvec.c
index 6e602aaccf..2f892da17d 100644
--- a/contrib/plugins/howvec.c
+++ b/contrib/plugins/howvec.c
@@ -235,7 +235,7 @@ static void vcpu_insn_exec_before(unsigned int cpu_index, void *udata)
     (*count)++;
 }
 
-static uint64_t * find_counter(struct qemu_plugin_insn *insn)
+static uint64_t *find_counter(struct qemu_plugin_insn *insn)
 {
     int i;
     uint64_t *cnt = NULL;
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH  v2 06/21] contrib: Add spaces around operator
  2021-02-10 22:10 [PATCH v2 00/21] plugins/next pre-PR (hwprofile, regression fixes, icount count fix) Alex Bennée
                   ` (4 preceding siblings ...)
  2021-02-10 22:10 ` [PATCH v2 05/21] contrib: Fix some code style problems, ERROR: "foo * bar" should be "foo *bar" Alex Bennée
@ 2021-02-10 22:10 ` Alex Bennée
  2021-02-10 22:10 ` [PATCH v2 07/21] contrib: space required after that ',' Alex Bennée
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 53+ messages in thread
From: Alex Bennée @ 2021-02-10 22:10 UTC (permalink / raw)
  To: qemu-devel
  Cc: zhouyang, robhenry, mahmoudabdalghany, aaron, cota, kuhn.chenqun,
	Alex Bennée

From: zhouyang <zhouyang789@huawei.com>

I am reading contrib related code and found some style problems while
check the code using checkpatch.pl. This commit fixs the issue below:
ERROR: spaces required around that '*'

Signed-off-by: zhouyang <zhouyang789@huawei.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210118031004.1662363-4-zhouyang789@huawei.com>
---
 contrib/ivshmem-server/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/contrib/ivshmem-server/main.c b/contrib/ivshmem-server/main.c
index ee08c4ced0..224dbeb547 100644
--- a/contrib/ivshmem-server/main.c
+++ b/contrib/ivshmem-server/main.c
@@ -17,7 +17,7 @@
 #define IVSHMEM_SERVER_DEFAULT_PID_FILE       "/var/run/ivshmem-server.pid"
 #define IVSHMEM_SERVER_DEFAULT_UNIX_SOCK_PATH "/tmp/ivshmem_socket"
 #define IVSHMEM_SERVER_DEFAULT_SHM_PATH       "ivshmem"
-#define IVSHMEM_SERVER_DEFAULT_SHM_SIZE       (4*1024*1024)
+#define IVSHMEM_SERVER_DEFAULT_SHM_SIZE       (4 * 1024 * 1024)
 #define IVSHMEM_SERVER_DEFAULT_N_VECTORS      1
 
 /* used to quit on signal SIGTERM */
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH  v2 07/21] contrib: space required after that ','
  2021-02-10 22:10 [PATCH v2 00/21] plugins/next pre-PR (hwprofile, regression fixes, icount count fix) Alex Bennée
                   ` (5 preceding siblings ...)
  2021-02-10 22:10 ` [PATCH v2 06/21] contrib: Add spaces around operator Alex Bennée
@ 2021-02-10 22:10 ` Alex Bennée
  2021-02-10 22:10 ` [PATCH v2 08/21] contrib: Open brace '{' following struct go on the same line Alex Bennée
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 53+ messages in thread
From: Alex Bennée @ 2021-02-10 22:10 UTC (permalink / raw)
  To: qemu-devel
  Cc: zhouyang, robhenry, mahmoudabdalghany, aaron, cota, kuhn.chenqun,
	Alex Bennée

From: zhouyang <zhouyang789@huawei.com>

I am reading contrib related code and found some style problems while
check the code using checkpatch.pl. This commit fixs the issue below:
ERROR: space required after that ','

Signed-off-by: zhouyang <zhouyang789@huawei.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210118031004.1662363-5-zhouyang789@huawei.com>
---
 contrib/plugins/howvec.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/contrib/plugins/howvec.c b/contrib/plugins/howvec.c
index 2f892da17d..9d6fa33297 100644
--- a/contrib/plugins/howvec.c
+++ b/contrib/plugins/howvec.c
@@ -68,7 +68,7 @@ static InsnClassExecCount aarch64_insn_classes[] = {
     { "Reserved",            "res",    0x1e000000, 0x00000000, COUNT_CLASS},
     /* Data Processing Immediate */
     { "  PCrel addr",        "pcrel",  0x1f000000, 0x10000000, COUNT_CLASS},
-    { "  Add/Sub (imm,tags)","asit",   0x1f800000, 0x11800000, COUNT_CLASS},
+    { "  Add/Sub (imm,tags)", "asit",   0x1f800000, 0x11800000, COUNT_CLASS},
     { "  Add/Sub (imm)",     "asi",    0x1f000000, 0x11000000, COUNT_CLASS},
     { "  Logical (imm)",     "logi",   0x1f800000, 0x12000000, COUNT_CLASS},
     { "  Move Wide (imm)",   "movwi",  0x1f800000, 0x12800000, COUNT_CLASS},
@@ -91,17 +91,17 @@ static InsnClassExecCount aarch64_insn_classes[] = {
     { "Branches",            "branch", 0x1c000000, 0x14000000, COUNT_CLASS},
     /* Loads and Stores */
     { "  AdvSimd ldstmult",  "advlsm", 0xbfbf0000, 0x0c000000, COUNT_CLASS},
-    { "  AdvSimd ldstmult++","advlsmp",0xbfb00000, 0x0c800000, COUNT_CLASS},
+    { "  AdvSimd ldstmult++", "advlsmp", 0xbfb00000, 0x0c800000, COUNT_CLASS},
     { "  AdvSimd ldst",      "advlss", 0xbf9f0000, 0x0d000000, COUNT_CLASS},
-    { "  AdvSimd ldst++",    "advlssp",0xbf800000, 0x0d800000, COUNT_CLASS},
+    { "  AdvSimd ldst++",    "advlssp", 0xbf800000, 0x0d800000, COUNT_CLASS},
     { "  ldst excl",         "ldstx",  0x3f000000, 0x08000000, COUNT_CLASS},
     { "    Prefetch",        "prfm",   0xff000000, 0xd8000000, COUNT_CLASS},
     { "  Load Reg (lit)",    "ldlit",  0x1b000000, 0x18000000, COUNT_CLASS},
-    { "  ldst noalloc pair", "ldstnap",0x3b800000, 0x28000000, COUNT_CLASS},
+    { "  ldst noalloc pair", "ldstnap", 0x3b800000, 0x28000000, COUNT_CLASS},
     { "  ldst pair",         "ldstp",  0x38000000, 0x28000000, COUNT_CLASS},
     { "  ldst reg",          "ldstr",  0x3b200000, 0x38000000, COUNT_CLASS},
     { "  Atomic ldst",       "atomic", 0x3b200c00, 0x38200000, COUNT_CLASS},
-    { "  ldst reg (reg off)","ldstro", 0x3b200b00, 0x38200800, COUNT_CLASS},
+    { "  ldst reg (reg off)", "ldstro", 0x3b200b00, 0x38200800, COUNT_CLASS},
     { "  ldst reg (pac)",    "ldstpa", 0x3b200200, 0x38200800, COUNT_CLASS},
     { "  ldst reg (imm)",    "ldsti",  0x3b000000, 0x39000000, COUNT_CLASS},
     { "Loads & Stores",      "ldst",   0x0a000000, 0x08000000, COUNT_CLASS},
@@ -202,7 +202,7 @@ static void plugin_exit(qemu_plugin_id_t id, void *p)
 
     counts = g_hash_table_get_values(insns);
     if (counts && g_list_next(counts)) {
-        g_string_append_printf(report,"Individual Instructions:\n");
+        g_string_append_printf(report, "Individual Instructions:\n");
         counts = g_list_sort(counts, cmp_exec_count);
 
         for (i = 0; i < limit && g_list_next(counts);
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v2 08/21] contrib: Open brace '{' following struct go on the same line
  2021-02-10 22:10 [PATCH v2 00/21] plugins/next pre-PR (hwprofile, regression fixes, icount count fix) Alex Bennée
                   ` (6 preceding siblings ...)
  2021-02-10 22:10 ` [PATCH v2 07/21] contrib: space required after that ',' Alex Bennée
@ 2021-02-10 22:10 ` Alex Bennée
  2021-02-10 22:10 ` [PATCH v2 09/21] accel/tcg/plugin-gen: fix the call signature for inline callbacks Alex Bennée
                   ` (12 subsequent siblings)
  20 siblings, 0 replies; 53+ messages in thread
From: Alex Bennée @ 2021-02-10 22:10 UTC (permalink / raw)
  To: qemu-devel
  Cc: zhouyang, robhenry, mahmoudabdalghany, aaron, cota, kuhn.chenqun,
	Alex Bennée

From: zhouyang <zhouyang789@huawei.com>

I found some style problems whil check the code using checkpatch.pl.
This commit fixs the issue below:
ERROR: that open brace { should be on the previous line

Signed-off-by: zhouyang <zhouyang789@huawei.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210118031004.1662363-6-zhouyang789@huawei.com>
---
 contrib/plugins/howvec.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/contrib/plugins/howvec.c b/contrib/plugins/howvec.c
index 9d6fa33297..600f7facc1 100644
--- a/contrib/plugins/howvec.c
+++ b/contrib/plugins/howvec.c
@@ -145,8 +145,7 @@ typedef struct {
     int table_sz;
 } ClassSelector;
 
-static ClassSelector class_tables[] =
-{
+static ClassSelector class_tables[] = {
     { "aarch64", aarch64_insn_classes, ARRAY_SIZE(aarch64_insn_classes) },
     { "sparc",   sparc32_insn_classes, ARRAY_SIZE(sparc32_insn_classes) },
     { "sparc64", sparc64_insn_classes, ARRAY_SIZE(sparc64_insn_classes) },
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v2 09/21] accel/tcg/plugin-gen: fix the call signature for inline callbacks
  2021-02-10 22:10 [PATCH v2 00/21] plugins/next pre-PR (hwprofile, regression fixes, icount count fix) Alex Bennée
                   ` (7 preceding siblings ...)
  2021-02-10 22:10 ` [PATCH v2 08/21] contrib: Open brace '{' following struct go on the same line Alex Bennée
@ 2021-02-10 22:10 ` Alex Bennée
  2021-02-10 22:10 ` [PATCH v2 10/21] exec: Move TranslationBlock typedef to qemu/typedefs.h Alex Bennée
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 53+ messages in thread
From: Alex Bennée @ 2021-02-10 22:10 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, robhenry, mahmoudabdalghany, aaron, cota,
	Paolo Bonzini, kuhn.chenqun, Alex Bennée

A recent change to the handling of constants in TCG changed the
pattern of ops emitted for a constant add. We no longer emit a mov and
the constant can be applied directly to the TCG_op_add arguments. This
was causing SEGVs when running the insn plugin with arg=inline. Fix
this by updating copy_add_i64 to do the right thing while also adding
a comment at the top of the append section as an aide memoir if
something like this happens again.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Cc: Emilio G. Cota <cota@braap.org>
Message-Id: <20210210172751.11669-1-alex.bennee@linaro.org>
---
 accel/tcg/plugin-gen.c | 32 +++++++++++---------------------
 1 file changed, 11 insertions(+), 21 deletions(-)

diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
index e5dc9d0ca9..8a1bb801e0 100644
--- a/accel/tcg/plugin-gen.c
+++ b/accel/tcg/plugin-gen.c
@@ -320,22 +320,6 @@ static TCGOp *copy_const_ptr(TCGOp **begin_op, TCGOp *op, void *ptr)
     return op;
 }
 
-static TCGOp *copy_const_i64(TCGOp **begin_op, TCGOp *op, uint64_t v)
-{
-    if (TCG_TARGET_REG_BITS == 32) {
-        /* 2x mov_i32 */
-        op = copy_op(begin_op, op, INDEX_op_mov_i32);
-        op->args[1] = tcgv_i32_arg(tcg_constant_i32(v));
-        op = copy_op(begin_op, op, INDEX_op_mov_i32);
-        op->args[1] = tcgv_i32_arg(tcg_constant_i32(v >> 32));
-    } else {
-        /* mov_i64 */
-        op = copy_op(begin_op, op, INDEX_op_mov_i64);
-        op->args[1] = tcgv_i64_arg(tcg_constant_i64(v));
-    }
-    return op;
-}
-
 static TCGOp *copy_extu_tl_i64(TCGOp **begin_op, TCGOp *op)
 {
     if (TARGET_LONG_BITS == 32) {
@@ -374,14 +358,17 @@ static TCGOp *copy_st_i64(TCGOp **begin_op, TCGOp *op)
     return op;
 }
 
-static TCGOp *copy_add_i64(TCGOp **begin_op, TCGOp *op)
+static TCGOp *copy_add_i64(TCGOp **begin_op, TCGOp *op, uint64_t v)
 {
     if (TCG_TARGET_REG_BITS == 32) {
         /* all 32-bit backends must implement add2_i32 */
         g_assert(TCG_TARGET_HAS_add2_i32);
         op = copy_op(begin_op, op, INDEX_op_add2_i32);
+        op->args[4] = tcgv_i32_arg(tcg_constant_i32(v));
+        op->args[5] = tcgv_i32_arg(tcg_constant_i32(v >> 32));
     } else {
         op = copy_op(begin_op, op, INDEX_op_add_i64);
+        op->args[2] = tcgv_i64_arg(tcg_constant_i64(v));
     }
     return op;
 }
@@ -431,6 +418,12 @@ static TCGOp *copy_call(TCGOp **begin_op, TCGOp *op, void *empty_func,
     return op;
 }
 
+/*
+ * When we append/replace ops here we are sensitive to changing patterns of
+ * TCGOps generated by the tcg_gen_FOO calls when we generated the
+ * empty callbacks. This will assert very quickly in a debug build as
+ * we assert the ops we are replacing are the correct ones.
+ */
 static TCGOp *append_udata_cb(const struct qemu_plugin_dyn_cb *cb,
                               TCGOp *begin_op, TCGOp *op, int *cb_idx)
 {
@@ -462,11 +455,8 @@ static TCGOp *append_inline_cb(const struct qemu_plugin_dyn_cb *cb,
     /* ld_i64 */
     op = copy_ld_i64(&begin_op, op);
 
-    /* const_i64 */
-    op = copy_const_i64(&begin_op, op, cb->inline_insn.imm);
-
     /* add_i64 */
-    op = copy_add_i64(&begin_op, op);
+    op = copy_add_i64(&begin_op, op, cb->inline_insn.imm);
 
     /* st_i64 */
     op = copy_st_i64(&begin_op, op);
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v2 10/21] exec: Move TranslationBlock typedef to qemu/typedefs.h
  2021-02-10 22:10 [PATCH v2 00/21] plugins/next pre-PR (hwprofile, regression fixes, icount count fix) Alex Bennée
                   ` (8 preceding siblings ...)
  2021-02-10 22:10 ` [PATCH v2 09/21] accel/tcg/plugin-gen: fix the call signature for inline callbacks Alex Bennée
@ 2021-02-10 22:10 ` Alex Bennée
  2021-02-11 10:14   ` Philippe Mathieu-Daudé
  2021-02-10 22:10 ` [PATCH v2 11/21] accel/tcg: Create io_recompile_replay_branch hook Alex Bennée
                   ` (10 subsequent siblings)
  20 siblings, 1 reply; 53+ messages in thread
From: Alex Bennée @ 2021-02-10 22:10 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Eduardo Habkost, Anthony Green, Richard Henderson,
	Michael Walle, robhenry, mahmoudabdalghany, aaron, cota,
	Paolo Bonzini, kuhn.chenqun, Guan Xuetao, Alex Bennée,
	Edgar E. Iglesias, open list:ARM TCG CPUs

From: Richard Henderson <richard.henderson@linaro.org>

This also means we don't need an extra declaration of
the structure in hw/core/cpu.h.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210208233906.479571-2-richard.henderson@linaro.org>
Message-Id: <20210209182749.31323-2-alex.bennee@linaro.org>
---
 include/exec/tb-context.h     | 1 -
 include/hw/core/cpu.h         | 4 +---
 include/hw/core/tcg-cpu-ops.h | 3 +--
 include/qemu/typedefs.h       | 1 +
 target/arm/internals.h        | 3 +--
 target/cris/translate.c       | 2 +-
 target/lm32/translate.c       | 2 +-
 target/moxie/translate.c      | 2 +-
 target/unicore32/translate.c  | 2 +-
 9 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/include/exec/tb-context.h b/include/exec/tb-context.h
index ec4c13b455..cc33979113 100644
--- a/include/exec/tb-context.h
+++ b/include/exec/tb-context.h
@@ -26,7 +26,6 @@
 #define CODE_GEN_HTABLE_BITS     15
 #define CODE_GEN_HTABLE_SIZE     (1 << CODE_GEN_HTABLE_BITS)
 
-typedef struct TranslationBlock TranslationBlock;
 typedef struct TBContext TBContext;
 
 struct TBContext {
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 38d813c389..c005d3dc2d 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -74,8 +74,6 @@ typedef enum MMUAccessType {
 
 typedef struct CPUWatchpoint CPUWatchpoint;
 
-struct TranslationBlock;
-
 /* see tcg-cpu-ops.h */
 struct TCGCPUOps;
 
@@ -375,7 +373,7 @@ struct CPUState {
     IcountDecr *icount_decr_ptr;
 
     /* Accessed in parallel; all accesses must be atomic */
-    struct TranslationBlock *tb_jmp_cache[TB_JMP_CACHE_SIZE];
+    TranslationBlock *tb_jmp_cache[TB_JMP_CACHE_SIZE];
 
     struct GDBRegisterState *gdb_regs;
     int gdb_num_regs;
diff --git a/include/hw/core/tcg-cpu-ops.h b/include/hw/core/tcg-cpu-ops.h
index ccc97d1894..ac3bb051f2 100644
--- a/include/hw/core/tcg-cpu-ops.h
+++ b/include/hw/core/tcg-cpu-ops.h
@@ -30,8 +30,7 @@ struct TCGCPUOps {
      * If more state needs to be restored, the target must implement a
      * function to restore all the state, and register it here.
      */
-    void (*synchronize_from_tb)(CPUState *cpu,
-                                const struct TranslationBlock *tb);
+    void (*synchronize_from_tb)(CPUState *cpu, const TranslationBlock *tb);
     /** @cpu_exec_enter: Callback for cpu_exec preparation */
     void (*cpu_exec_enter)(CPUState *cpu);
     /** @cpu_exec_exit: Callback for cpu_exec cleanup */
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index dc39b05c30..ee60eb3de4 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -120,6 +120,7 @@ typedef struct ReservedRegion ReservedRegion;
 typedef struct SavedIOTLB SavedIOTLB;
 typedef struct SHPCDevice SHPCDevice;
 typedef struct SSIBus SSIBus;
+typedef struct TranslationBlock TranslationBlock;
 typedef struct VirtIODevice VirtIODevice;
 typedef struct Visitor Visitor;
 typedef struct VMChangeStateEntry VMChangeStateEntry;
diff --git a/target/arm/internals.h b/target/arm/internals.h
index 448982dd2f..7d26ce0c9d 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -172,8 +172,7 @@ void arm_cpu_register_gdb_regs_for_features(ARMCPU *cpu);
 void arm_translate_init(void);
 
 #ifdef CONFIG_TCG
-void arm_cpu_synchronize_from_tb(CPUState *cs,
-                                 const struct TranslationBlock *tb);
+void arm_cpu_synchronize_from_tb(CPUState *cs, const TranslationBlock *tb);
 #endif /* CONFIG_TCG */
 
 
diff --git a/target/cris/translate.c b/target/cris/translate.c
index c893f877ab..65c168c0c7 100644
--- a/target/cris/translate.c
+++ b/target/cris/translate.c
@@ -132,7 +132,7 @@ typedef struct DisasContext {
 
     int delayed_branch;
 
-    struct TranslationBlock *tb;
+    TranslationBlock *tb;
     int singlestep_enabled;
 } DisasContext;
 
diff --git a/target/lm32/translate.c b/target/lm32/translate.c
index 030b232d66..20c70d03f1 100644
--- a/target/lm32/translate.c
+++ b/target/lm32/translate.c
@@ -93,7 +93,7 @@ typedef struct DisasContext {
     unsigned int tb_flags, synced_flags; /* tb dependent flags.  */
     int is_jmp;
 
-    struct TranslationBlock *tb;
+    TranslationBlock *tb;
     int singlestep_enabled;
 
     uint32_t features;
diff --git a/target/moxie/translate.c b/target/moxie/translate.c
index d5fb27dfb8..24a742b25e 100644
--- a/target/moxie/translate.c
+++ b/target/moxie/translate.c
@@ -36,7 +36,7 @@
 
 /* This is the state at translation time.  */
 typedef struct DisasContext {
-    struct TranslationBlock *tb;
+    TranslationBlock *tb;
     target_ulong pc, saved_pc;
     uint32_t opcode;
     uint32_t fp_status;
diff --git a/target/unicore32/translate.c b/target/unicore32/translate.c
index 962f9877a0..370709c9ea 100644
--- a/target/unicore32/translate.c
+++ b/target/unicore32/translate.c
@@ -34,7 +34,7 @@ typedef struct DisasContext {
     int condjmp;
     /* The label that will be jumped to when the instruction is skipped.  */
     TCGLabel *condlabel;
-    struct TranslationBlock *tb;
+    TranslationBlock *tb;
     int singlestep_enabled;
 #ifndef CONFIG_USER_ONLY
     int user;
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH  v2 11/21] accel/tcg: Create io_recompile_replay_branch hook
  2021-02-10 22:10 [PATCH v2 00/21] plugins/next pre-PR (hwprofile, regression fixes, icount count fix) Alex Bennée
                   ` (9 preceding siblings ...)
  2021-02-10 22:10 ` [PATCH v2 10/21] exec: Move TranslationBlock typedef to qemu/typedefs.h Alex Bennée
@ 2021-02-10 22:10 ` Alex Bennée
  2021-02-11 10:12   ` Philippe Mathieu-Daudé
  2021-02-10 22:10 ` [PATCH v2 12/21] target/mips: Create mips_io_recompile_replay_branch Alex Bennée
                   ` (9 subsequent siblings)
  20 siblings, 1 reply; 53+ messages in thread
From: Alex Bennée @ 2021-02-10 22:10 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, robhenry, mahmoudabdalghany, aaron, cota,
	Paolo Bonzini, kuhn.chenqun, Alex Bennée

From: Richard Henderson <richard.henderson@linaro.org>

Create a hook in which to split out the mips and
sh4 ifdefs from cpu_io_recompile.

[AJB: s/stoped/stopped/]

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210208233906.479571-3-richard.henderson@linaro.org>
Message-Id: <20210209182749.31323-3-alex.bennee@linaro.org>
---
 include/hw/core/tcg-cpu-ops.h | 10 ++++++++++
 accel/tcg/translate-all.c     | 17 +++++++++++++----
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/include/hw/core/tcg-cpu-ops.h b/include/hw/core/tcg-cpu-ops.h
index ac3bb051f2..72d791438c 100644
--- a/include/hw/core/tcg-cpu-ops.h
+++ b/include/hw/core/tcg-cpu-ops.h
@@ -88,6 +88,16 @@ struct TCGCPUOps {
      */
     bool (*debug_check_watchpoint)(CPUState *cpu, CPUWatchpoint *wp);
 
+    /**
+     * @io_recompile_replay_branch: Callback for cpu_io_recompile.
+     *
+     * The cpu has been stopped, and cpu_restore_state_from_tb has been
+     * called.  If the faulting instruction is in a delay slot, and the
+     * target architecture requires re-execution of the branch, then
+     * adjust the cpu state as required and return true.
+     */
+    bool (*io_recompile_replay_branch)(CPUState *cpu,
+                                       const TranslationBlock *tb);
 #endif /* CONFIG_SOFTMMU */
 #endif /* NEED_CPU_H */
 
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 81d4c83f22..6eb37883bd 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -60,6 +60,7 @@
 #include "sysemu/cpu-timers.h"
 #include "sysemu/tcg.h"
 #include "qapi/error.h"
+#include "hw/core/tcg-cpu-ops.h"
 #include "internal.h"
 
 /* #define DEBUG_TB_INVALIDATE */
@@ -2420,6 +2421,7 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
     CPUArchState *env = cpu->env_ptr;
 #endif
     TranslationBlock *tb;
+    CPUClass *cc;
     uint32_t n;
 
     tb = tcg_tb_lookup(retaddr);
@@ -2429,11 +2431,18 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
     }
     cpu_restore_state_from_tb(cpu, tb, retaddr, true);
 
-    /* On MIPS and SH, delay slot instructions can only be restarted if
-       they were already the first instruction in the TB.  If this is not
-       the first instruction in a TB then re-execute the preceding
-       branch.  */
+    /*
+     * Some guests must re-execute the branch when re-executing a delay
+     * slot instruction.  When this is the case, adjust icount and N
+     * to account for the re-execution of the branch.
+     */
     n = 1;
+    cc = CPU_GET_CLASS(cpu);
+    if (cc->tcg_ops->io_recompile_replay_branch &&
+        cc->tcg_ops->io_recompile_replay_branch(cpu, tb)) {
+        cpu_neg(cpu)->icount_decr.u16.low++;
+        n = 2;
+    }
 #if defined(TARGET_MIPS)
     if ((env->hflags & MIPS_HFLAG_BMASK) != 0
         && env->active_tc.PC != tb->pc) {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v2 12/21] target/mips: Create mips_io_recompile_replay_branch
  2021-02-10 22:10 [PATCH v2 00/21] plugins/next pre-PR (hwprofile, regression fixes, icount count fix) Alex Bennée
                   ` (10 preceding siblings ...)
  2021-02-10 22:10 ` [PATCH v2 11/21] accel/tcg: Create io_recompile_replay_branch hook Alex Bennée
@ 2021-02-10 22:10 ` Alex Bennée
  2021-02-11 10:10   ` Philippe Mathieu-Daudé
  2021-02-10 22:10 ` [PATCH v2 13/21] target/sh4: Create superh_io_recompile_replay_branch Alex Bennée
                   ` (8 subsequent siblings)
  20 siblings, 1 reply; 53+ messages in thread
From: Alex Bennée @ 2021-02-10 22:10 UTC (permalink / raw)
  To: qemu-devel
  Cc: Aleksandar Rikalo, Richard Henderson, Philippe Mathieu-Daudé,
	robhenry, mahmoudabdalghany, aaron, cota, Paolo Bonzini,
	kuhn.chenqun, Alex Bennée, Aurelien Jarno

From: Richard Henderson <richard.henderson@linaro.org>

Move the code from accel/tcg/translate-all.c to target/mips/cpu.c.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210208233906.479571-4-richard.henderson@linaro.org>
Message-Id: <20210209182749.31323-4-alex.bennee@linaro.org>
---
 accel/tcg/translate-all.c | 12 ++----------
 target/mips/cpu.c         | 18 ++++++++++++++++++
 2 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 6eb37883bd..470657b02a 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -2417,7 +2417,7 @@ void tb_check_watchpoint(CPUState *cpu, uintptr_t retaddr)
  */
 void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
 {
-#if defined(TARGET_MIPS) || defined(TARGET_SH4)
+#if defined(TARGET_SH4)
     CPUArchState *env = cpu->env_ptr;
 #endif
     TranslationBlock *tb;
@@ -2443,15 +2443,7 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
         cpu_neg(cpu)->icount_decr.u16.low++;
         n = 2;
     }
-#if defined(TARGET_MIPS)
-    if ((env->hflags & MIPS_HFLAG_BMASK) != 0
-        && env->active_tc.PC != tb->pc) {
-        env->active_tc.PC -= (env->hflags & MIPS_HFLAG_B16 ? 2 : 4);
-        cpu_neg(cpu)->icount_decr.u16.low++;
-        env->hflags &= ~MIPS_HFLAG_BMASK;
-        n = 2;
-    }
-#elif defined(TARGET_SH4)
+#if defined(TARGET_SH4)
     if ((env->flags & ((DELAY_SLOT | DELAY_SLOT_CONDITIONAL))) != 0
         && env->pc != tb->pc) {
         env->pc -= 2;
diff --git a/target/mips/cpu.c b/target/mips/cpu.c
index ad163ead62..bf70c77295 100644
--- a/target/mips/cpu.c
+++ b/target/mips/cpu.c
@@ -268,6 +268,23 @@ static void mips_cpu_synchronize_from_tb(CPUState *cs,
     env->hflags &= ~MIPS_HFLAG_BMASK;
     env->hflags |= tb->flags & MIPS_HFLAG_BMASK;
 }
+
+# ifndef CONFIG_USER_ONLY
+static bool mips_io_recompile_replay_branch(CPUState *cs,
+                                            const TranslationBlock *tb)
+{
+    MIPSCPU *cpu = MIPS_CPU(cs);
+    CPUMIPSState *env = &cpu->env;
+
+    if ((env->hflags & MIPS_HFLAG_BMASK) != 0
+        && env->active_tc.PC != tb->pc) {
+        env->active_tc.PC -= (env->hflags & MIPS_HFLAG_B16 ? 2 : 4);
+        env->hflags &= ~MIPS_HFLAG_BMASK;
+        return true;
+    }
+    return false;
+}
+# endif /* !CONFIG_USER_ONLY */
 #endif /* CONFIG_TCG */
 
 static bool mips_cpu_has_work(CPUState *cs)
@@ -679,6 +696,7 @@ static struct TCGCPUOps mips_tcg_ops = {
     .do_interrupt = mips_cpu_do_interrupt,
     .do_transaction_failed = mips_cpu_do_transaction_failed,
     .do_unaligned_access = mips_cpu_do_unaligned_access,
+    .io_recompile_replay_branch = mips_io_recompile_replay_branch,
 #endif /* !CONFIG_USER_ONLY */
 };
 #endif /* CONFIG_TCG */
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v2 13/21] target/sh4: Create superh_io_recompile_replay_branch
  2021-02-10 22:10 [PATCH v2 00/21] plugins/next pre-PR (hwprofile, regression fixes, icount count fix) Alex Bennée
                   ` (11 preceding siblings ...)
  2021-02-10 22:10 ` [PATCH v2 12/21] target/mips: Create mips_io_recompile_replay_branch Alex Bennée
@ 2021-02-10 22:10 ` Alex Bennée
  2021-02-11 10:13   ` Philippe Mathieu-Daudé
  2021-02-10 22:10 ` [PATCH v2 14/21] tests/plugin: expand insn test to detect duplicate instructions Alex Bennée
                   ` (7 subsequent siblings)
  20 siblings, 1 reply; 53+ messages in thread
From: Alex Bennée @ 2021-02-10 22:10 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yoshinori Sato, Richard Henderson, robhenry, mahmoudabdalghany,
	aaron, cota, Paolo Bonzini, kuhn.chenqun, Alex Bennée

From: Richard Henderson <richard.henderson@linaro.org>

Move the code from accel/tcg/translate-all.c to target/sh4/cpu.c.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210208233906.479571-5-richard.henderson@linaro.org>
Message-Id: <20210209182749.31323-5-alex.bennee@linaro.org>
---
 accel/tcg/translate-all.c | 12 ------------
 target/sh4/cpu.c          | 18 ++++++++++++++++++
 2 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 470657b02a..b8ad95aa1b 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -2417,9 +2417,6 @@ void tb_check_watchpoint(CPUState *cpu, uintptr_t retaddr)
  */
 void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
 {
-#if defined(TARGET_SH4)
-    CPUArchState *env = cpu->env_ptr;
-#endif
     TranslationBlock *tb;
     CPUClass *cc;
     uint32_t n;
@@ -2443,15 +2440,6 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
         cpu_neg(cpu)->icount_decr.u16.low++;
         n = 2;
     }
-#if defined(TARGET_SH4)
-    if ((env->flags & ((DELAY_SLOT | DELAY_SLOT_CONDITIONAL))) != 0
-        && env->pc != tb->pc) {
-        env->pc -= 2;
-        cpu_neg(cpu)->icount_decr.u16.low++;
-        env->flags &= ~(DELAY_SLOT | DELAY_SLOT_CONDITIONAL);
-        n = 2;
-    }
-#endif
 
     /* Generate a new TB executing the I/O insn.  */
     cpu->cflags_next_tb = curr_cflags() | CF_LAST_IO | n;
diff --git a/target/sh4/cpu.c b/target/sh4/cpu.c
index a78d283bc8..ac65c88f1f 100644
--- a/target/sh4/cpu.c
+++ b/target/sh4/cpu.c
@@ -43,6 +43,23 @@ static void superh_cpu_synchronize_from_tb(CPUState *cs,
     cpu->env.flags = tb->flags & TB_FLAG_ENVFLAGS_MASK;
 }
 
+#ifndef CONFIG_USER_ONLY
+static bool superh_io_recompile_replay_branch(CPUState *cs,
+                                              const TranslationBlock *tb)
+{
+    SuperHCPU *cpu = SUPERH_CPU(cs);
+    CPUSH4State *env = &cpu->env;
+
+    if ((env->flags & ((DELAY_SLOT | DELAY_SLOT_CONDITIONAL))) != 0
+        && env->pc != tb->pc) {
+        env->pc -= 2;
+        env->flags &= ~(DELAY_SLOT | DELAY_SLOT_CONDITIONAL);
+        return true;
+    }
+    return false;
+}
+#endif
+
 static bool superh_cpu_has_work(CPUState *cs)
 {
     return cs->interrupt_request & CPU_INTERRUPT_HARD;
@@ -217,6 +234,7 @@ static struct TCGCPUOps superh_tcg_ops = {
 #ifndef CONFIG_USER_ONLY
     .do_interrupt = superh_cpu_do_interrupt,
     .do_unaligned_access = superh_cpu_do_unaligned_access,
+    .io_recompile_replay_branch = superh_io_recompile_replay_branch,
 #endif /* !CONFIG_USER_ONLY */
 };
 
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v2 14/21] tests/plugin: expand insn test to detect duplicate instructions
  2021-02-10 22:10 [PATCH v2 00/21] plugins/next pre-PR (hwprofile, regression fixes, icount count fix) Alex Bennée
                   ` (12 preceding siblings ...)
  2021-02-10 22:10 ` [PATCH v2 13/21] target/sh4: Create superh_io_recompile_replay_branch Alex Bennée
@ 2021-02-10 22:10 ` Alex Bennée
  2021-02-10 22:10 ` [PATCH v2 15/21] tests/acceptance: add a new set of tests to exercise plugins Alex Bennée
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 53+ messages in thread
From: Alex Bennée @ 2021-02-10 22:10 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Richard Henderson, robhenry, mahmoudabdalghany,
	aaron, cota, Paolo Bonzini, kuhn.chenqun, Alex Bennée

A duplicate insn is one that is appears to be executed twice in a row.
This is currently possible due to -icount and cpu_io_recompile()
causing a re-translation of a block. On it's own this won't trigger
any tests though.

The heuristics that the plugin use can't deal with the x86 rep
instruction which (validly) will look like executing the same
instruction several times. To avoid problems later we tweak the rules
for x86 to run the "inline" version of the plugin. This also has the
advantage of increasing coverage of the plugin code (see bugfix in
previous commit).

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210209182749.31323-6-alex.bennee@linaro.org>

---
v2
  - make i386/x86_64 run inline version
---
 tests/plugin/insn.c                      | 12 +++++++++++-
 tests/tcg/i386/Makefile.softmmu-target   | 10 ++++++++++
 tests/tcg/i386/Makefile.target           |  7 +++++++
 tests/tcg/x86_64/Makefile.softmmu-target | 10 ++++++++++
 4 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/tests/plugin/insn.c b/tests/plugin/insn.c
index a9a6e41237..c253980ec8 100644
--- a/tests/plugin/insn.c
+++ b/tests/plugin/insn.c
@@ -21,6 +21,14 @@ static bool do_inline;
 
 static void vcpu_insn_exec_before(unsigned int cpu_index, void *udata)
 {
+    static uint64_t last_pc;
+    uint64_t this_pc = GPOINTER_TO_UINT(udata);
+    if (this_pc == last_pc) {
+        g_autofree gchar *out = g_strdup_printf("detected repeat execution @ 0x%"
+                                                PRIx64 "\n", this_pc);
+        qemu_plugin_outs(out);
+    }
+    last_pc = this_pc;
     insn_count++;
 }
 
@@ -36,8 +44,10 @@ static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
             qemu_plugin_register_vcpu_insn_exec_inline(
                 insn, QEMU_PLUGIN_INLINE_ADD_U64, &insn_count, 1);
         } else {
+            uint64_t vaddr = qemu_plugin_insn_vaddr(insn);
             qemu_plugin_register_vcpu_insn_exec_cb(
-                insn, vcpu_insn_exec_before, QEMU_PLUGIN_CB_NO_REGS, NULL);
+                insn, vcpu_insn_exec_before, QEMU_PLUGIN_CB_NO_REGS,
+                GUINT_TO_POINTER(vaddr));
         }
     }
 }
diff --git a/tests/tcg/i386/Makefile.softmmu-target b/tests/tcg/i386/Makefile.softmmu-target
index 5266f2335a..fa9b1b9f90 100644
--- a/tests/tcg/i386/Makefile.softmmu-target
+++ b/tests/tcg/i386/Makefile.softmmu-target
@@ -33,5 +33,15 @@ EXTRA_RUNS+=$(MULTIARCH_RUNS)
 
 memory: CFLAGS+=-DCHECK_UNALIGNED=1
 
+# non-inline runs will trigger the duplicate instruction heuristics in libinsn.so
+run-plugin-%-with-libinsn.so:
+	$(call run-test, $@, \
+	  $(QEMU) -monitor none -display none \
+		  -chardev file$(COMMA)path=$@.out$(COMMA)id=output \
+                  -plugin ../../plugin/libinsn.so$(COMMA)arg=inline \
+	    	  -d plugin -D $*-with-libinsn.so.pout \
+	   	  $(QEMU_OPTS) $*, \
+		  "$* on $(TARGET_NAME)")
+
 # Running
 QEMU_OPTS+=-device isa-debugcon,chardev=output -device isa-debug-exit,iobase=0xf4,iosize=0x4 -kernel
diff --git a/tests/tcg/i386/Makefile.target b/tests/tcg/i386/Makefile.target
index ad187cb2c9..c4a6f91966 100644
--- a/tests/tcg/i386/Makefile.target
+++ b/tests/tcg/i386/Makefile.target
@@ -48,6 +48,13 @@ else
 SKIP_I386_TESTS+=test-i386-fprem
 endif
 
+# non-inline runs will trigger the duplicate instruction heuristics in libinsn.so
+run-plugin-%-with-libinsn.so:
+	$(call run-test, $@, $(QEMU) $(QEMU_OPTS) \
+	       -plugin ../../plugin/libinsn.so$(COMMA)arg=inline \
+	       -d plugin -D $*-with-libinsn.so.pout $*, \
+		"$* (inline) on $(TARGET_NAME)")
+
 # Update TESTS
 I386_TESTS:=$(filter-out $(SKIP_I386_TESTS), $(ALL_X86_TESTS))
 TESTS=$(MULTIARCH_TESTS) $(I386_TESTS)
diff --git a/tests/tcg/x86_64/Makefile.softmmu-target b/tests/tcg/x86_64/Makefile.softmmu-target
index 1bd763f2e6..9896319f0e 100644
--- a/tests/tcg/x86_64/Makefile.softmmu-target
+++ b/tests/tcg/x86_64/Makefile.softmmu-target
@@ -33,5 +33,15 @@ EXTRA_RUNS+=$(MULTIARCH_RUNS)
 
 memory: CFLAGS+=-DCHECK_UNALIGNED=1
 
+# non-inline runs will trigger the duplicate instruction heuristics in libinsn.so
+run-plugin-%-with-libinsn.so:
+	$(call run-test, $@, \
+	  $(QEMU) -monitor none -display none \
+		  -chardev file$(COMMA)path=$@.out$(COMMA)id=output \
+                  -plugin ../../plugin/libinsn.so$(COMMA)arg=inline \
+	    	  -d plugin -D $*-with-libinsn.so.pout \
+	   	  $(QEMU_OPTS) $*, \
+		  "$* on $(TARGET_NAME)")
+
 # Running
 QEMU_OPTS+=-device isa-debugcon,chardev=output -device isa-debug-exit,iobase=0xf4,iosize=0x4 -kernel
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v2 15/21] tests/acceptance: add a new set of tests to exercise plugins
  2021-02-10 22:10 [PATCH v2 00/21] plugins/next pre-PR (hwprofile, regression fixes, icount count fix) Alex Bennée
                   ` (13 preceding siblings ...)
  2021-02-10 22:10 ` [PATCH v2 14/21] tests/plugin: expand insn test to detect duplicate instructions Alex Bennée
@ 2021-02-10 22:10 ` Alex Bennée
  2021-02-11 10:31   ` Philippe Mathieu-Daudé
  2021-02-11 19:51   ` Wainer dos Santos Moschetta
  2021-02-10 22:10 ` [PATCH v2 16/21] accel/tcg: actually cache our partial icount TB Alex Bennée
                   ` (5 subsequent siblings)
  20 siblings, 2 replies; 53+ messages in thread
From: Alex Bennée @ 2021-02-10 22:10 UTC (permalink / raw)
  To: qemu-devel
  Cc: Philippe Mathieu-Daudé,
	robhenry, mahmoudabdalghany, aaron, cota,
	Wainer dos Santos Moschetta, Cleber Rosa, kuhn.chenqun,
	Alex Bennée

This is just a simple test to count the instructions executed by a
kernel. However a later test will detect a failure condition when
icount is enabled.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210209182749.31323-7-alex.bennee@linaro.org>
---
 tests/acceptance/tcg_plugins.py | 103 ++++++++++++++++++++++++++++++++
 1 file changed, 103 insertions(+)
 create mode 100644 tests/acceptance/tcg_plugins.py

diff --git a/tests/acceptance/tcg_plugins.py b/tests/acceptance/tcg_plugins.py
new file mode 100644
index 0000000000..b512979769
--- /dev/null
+++ b/tests/acceptance/tcg_plugins.py
@@ -0,0 +1,103 @@
+# TCG Plugins tests
+#
+# These are a little more involved than the basic tests run by check-tcg.
+#
+# Copyright (c) 2021 Linaro
+#
+# Author:
+#  Alex Bennée <alex.bennee@linaro.org>
+#
+# SPDX-License-Identifier: GPL-2.0-or-later
+
+import logging
+import time
+import tempfile
+import mmap
+import re
+
+from boot_linux_console import LinuxKernelTest
+
+
+class PluginKernelBase(LinuxKernelTest):
+    """
+    Boots a Linux kernel with a TCG plugin enabled.
+    """
+
+    timeout = 120
+    KERNEL_COMMON_COMMAND_LINE = 'printk.time=1 panic=-1 '
+
+    def run_vm(self, kernel_path, kernel_command_line,
+               plugin, plugin_log, console_pattern, args):
+
+        logger = logging.getLogger('plugin')
+        start_time = time.time()
+        vm = self.get_vm()
+        vm.set_console()
+        vm.add_args('-kernel', kernel_path,
+                    '-append', kernel_command_line,
+                    '-plugin', plugin,
+                    '-d', 'plugin',
+                    '-D', plugin_log,
+                    '-net', 'none',
+                    '-no-reboot')
+        if args:
+            vm.add_args(*args)
+
+        try:
+            vm.launch()
+        except:
+            # fails if plugins not enabled
+            self.cancel("TCG Plugins not enabled")
+
+        self.wait_for_console_pattern(console_pattern, vm)
+        elapsed = time.time() - start_time
+        logger.info('elapsed time %.2f sec' % elapsed)
+        # ensure logs are flushed
+        vm.shutdown()
+        return elapsed
+
+
+class PluginKernelNormal(PluginKernelBase):
+
+    def _grab_aarch64_kernel(self):
+        kernel_url = ('http://security.debian.org/'
+                      'debian-security/pool/updates/main/l/linux-signed-arm64/'
+                      'linux-image-4.19.0-12-arm64_4.19.152-1_arm64.deb')
+        kernel_sha1 = '2036c2792f80ac9c4ccaae742b2e0a28385b6010'
+        kernel_deb = self.fetch_asset(kernel_url, asset_hash=kernel_sha1)
+        kernel_path = self.extract_from_deb(kernel_deb,
+                                            "/boot/vmlinuz-4.19.0-12-arm64")
+        return kernel_path
+
+    def test_aarch64_virt_insn(self):
+        """
+        :avocado: tags=accel:tcg
+        :avocado: tags=arch:aarch64
+        :avocado: tags=machine:virt
+        :avocado: tags=cpu:cortex-a57
+        """
+        kernel_path = self._grab_aarch64_kernel()
+        kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE +
+                               'console=ttyAMA0')
+        console_pattern = 'Kernel panic - not syncing: VFS:'
+
+        plugin_log = tempfile.NamedTemporaryFile(mode="r+t", prefix="plugin",
+                                                 suffix=".log")
+
+        self.run_vm(kernel_path, kernel_command_line,
+                    "tests/plugin/libinsn.so", plugin_log.name,
+                    console_pattern,
+                    args=('-cpu', 'cortex-a53'))
+
+        logger = logging.getLogger()
+
+        with plugin_log as lf, \
+             mmap.mmap(lf.fileno(), 0, access=mmap.ACCESS_READ) as s:
+
+            m = re.search(br"insns: (?P<count>\d+)", s)
+            if "count" in m.groupdict():
+                logger.debug("reported %d instructions",
+                             int(m.group("count")))
+            else:
+                logger.debug("Failed to find instruction count")
+                self.fail
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH  v2 16/21] accel/tcg: actually cache our partial icount TB
  2021-02-10 22:10 [PATCH v2 00/21] plugins/next pre-PR (hwprofile, regression fixes, icount count fix) Alex Bennée
                   ` (14 preceding siblings ...)
  2021-02-10 22:10 ` [PATCH v2 15/21] tests/acceptance: add a new set of tests to exercise plugins Alex Bennée
@ 2021-02-10 22:10 ` Alex Bennée
  2021-02-11 10:21   ` Philippe Mathieu-Daudé
  2021-02-11 18:48   ` Richard Henderson
  2021-02-10 22:10 ` [PATCH v2 17/21] accel/tcg: cache single instruction TB on pending replay exception Alex Bennée
                   ` (4 subsequent siblings)
  20 siblings, 2 replies; 53+ messages in thread
From: Alex Bennée @ 2021-02-10 22:10 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, robhenry, mahmoudabdalghany, aaron, cota,
	Paolo Bonzini, kuhn.chenqun, Alex Bennée

When we exit a block under icount with instructions left to execute we
might need a shorter than normal block to take us to the next
deterministic event. Instead of creating a throwaway block on demand
we use the existing compile flags mechanism to ensure we fetch (or
compile and fetch) a block with exactly the number of instructions we
need.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210209182749.31323-8-alex.bennee@linaro.org>

---
v2
  - drop pointless assert
---
 accel/tcg/cpu-exec.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index d9ef69121c..5b6a4fe84b 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -730,16 +730,17 @@ static inline void cpu_loop_exec_tb(CPUState *cpu, TranslationBlock *tb,
     /* Ensure global icount has gone forward */
     icount_update(cpu);
     /* Refill decrementer and continue execution.  */
-    insns_left = MIN(0xffff, cpu->icount_budget);
+    insns_left = MIN(CF_COUNT_MASK, cpu->icount_budget);
     cpu_neg(cpu)->icount_decr.u16.low = insns_left;
     cpu->icount_extra = cpu->icount_budget - insns_left;
-    if (!cpu->icount_extra && insns_left < tb->icount) {
-        /* Execute any remaining instructions, then let the main loop
-         * handle the next event.
-         */
-        if (insns_left > 0) {
-            cpu_exec_nocache(cpu, insns_left, tb, false);
-        }
+
+    /*
+     * If the next tb has more instructions than we have left to
+     * execute we need to ensure we find/generate a TB with exactly
+     * insns_left instructions in it.
+     */
+    if (!cpu->icount_extra && insns_left > 0 && insns_left < tb->icount)  {
+        cpu->cflags_next_tb = (tb->cflags & ~CF_COUNT_MASK) | insns_left;
     }
 #endif
 }
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v2 17/21] accel/tcg: cache single instruction TB on pending replay exception
  2021-02-10 22:10 [PATCH v2 00/21] plugins/next pre-PR (hwprofile, regression fixes, icount count fix) Alex Bennée
                   ` (15 preceding siblings ...)
  2021-02-10 22:10 ` [PATCH v2 16/21] accel/tcg: actually cache our partial icount TB Alex Bennée
@ 2021-02-10 22:10 ` Alex Bennée
  2021-02-11 19:12   ` Richard Henderson
  2021-02-10 22:10 ` [PATCH v2 18/21] accel/tcg: re-factor non-RAM execution code Alex Bennée
                   ` (3 subsequent siblings)
  20 siblings, 1 reply; 53+ messages in thread
From: Alex Bennée @ 2021-02-10 22:10 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, robhenry, mahmoudabdalghany, aaron, cota,
	Paolo Bonzini, kuhn.chenqun, Alex Bennée

Again there is no reason to jump through the nocache hoops to execute
a single instruction block. We do have to add an additional wrinkle to
the cpu_handle_interrupt case to ensure we let through a TB where we
have specifically disabled icount for the block.

As the last user of cpu_exec_nocache we can now remove the function.
Further clean-up will follow in subsequent patches.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210209182749.31323-9-alex.bennee@linaro.org>
---
 accel/tcg/cpu-exec.c | 44 ++++----------------------------------------
 1 file changed, 4 insertions(+), 40 deletions(-)

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 5b6a4fe84b..438fece73b 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -224,40 +224,6 @@ cpu_tb_exec(CPUState *cpu, TranslationBlock *itb, int *tb_exit)
     return last_tb;
 }
 
-#ifndef CONFIG_USER_ONLY
-/* Execute the code without caching the generated code. An interpreter
-   could be used if available. */
-static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
-                             TranslationBlock *orig_tb, bool ignore_icount)
-{
-    TranslationBlock *tb;
-    uint32_t cflags = curr_cflags() | CF_NOCACHE;
-    int tb_exit;
-
-    if (ignore_icount) {
-        cflags &= ~CF_USE_ICOUNT;
-    }
-
-    /* Should never happen.
-       We only end up here when an existing TB is too long.  */
-    cflags |= MIN(max_cycles, CF_COUNT_MASK);
-
-    mmap_lock();
-    tb = tb_gen_code(cpu, orig_tb->pc, orig_tb->cs_base,
-                     orig_tb->flags, cflags);
-    tb->orig_tb = orig_tb;
-    mmap_unlock();
-
-    /* execute the generated code */
-    trace_exec_tb_nocache(tb, tb->pc);
-    cpu_tb_exec(cpu, tb, &tb_exit);
-
-    mmap_lock();
-    tb_phys_invalidate(tb, -1);
-    mmap_unlock();
-    tcg_tb_remove(tb);
-}
-#endif
 
 static void cpu_exec_enter(CPUState *cpu)
 {
@@ -524,15 +490,12 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
 #ifndef CONFIG_USER_ONLY
         if (replay_has_exception()
             && cpu_neg(cpu)->icount_decr.u16.low + cpu->icount_extra == 0) {
-            /* try to cause an exception pending in the log */
-            cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0, curr_cflags()), true);
+            /* Execute just one insn to trigger exception pending in the log */
+            cpu->cflags_next_tb = (curr_cflags() & ~CF_USE_ICOUNT) | 1;
         }
 #endif
-        if (cpu->exception_index < 0) {
-            return false;
-        }
+        return false;
     }
-
     if (cpu->exception_index >= EXCP_INTERRUPT) {
         /* exit request from the cpu execution loop */
         *ret = cpu->exception_index;
@@ -688,6 +651,7 @@ static inline bool cpu_handle_interrupt(CPUState *cpu,
     /* Finally, check if we need to exit to the main loop.  */
     if (unlikely(qatomic_read(&cpu->exit_request))
         || (icount_enabled()
+            && (cpu->cflags_next_tb == -1 || cpu->cflags_next_tb & CF_USE_ICOUNT)
             && cpu_neg(cpu)->icount_decr.u16.low + cpu->icount_extra == 0)) {
         qatomic_set(&cpu->exit_request, 0);
         if (cpu->exception_index == -1) {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH  v2 18/21] accel/tcg: re-factor non-RAM execution code
  2021-02-10 22:10 [PATCH v2 00/21] plugins/next pre-PR (hwprofile, regression fixes, icount count fix) Alex Bennée
                   ` (16 preceding siblings ...)
  2021-02-10 22:10 ` [PATCH v2 17/21] accel/tcg: cache single instruction TB on pending replay exception Alex Bennée
@ 2021-02-10 22:10 ` Alex Bennée
  2021-02-11 19:19   ` Richard Henderson
  2021-02-10 22:10 ` [PATCH v2 19/21] accel/tcg: remove CF_NOCACHE and special cases Alex Bennée
                   ` (2 subsequent siblings)
  20 siblings, 1 reply; 53+ messages in thread
From: Alex Bennée @ 2021-02-10 22:10 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, robhenry, mahmoudabdalghany, aaron, cota,
	Paolo Bonzini, kuhn.chenqun, Alex Bennée

There is no real need to use CF_NOCACHE here. As long as the TB isn't
linked to other TBs or included in the QHT or jump cache then it will
only get executed once.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210209182749.31323-10-alex.bennee@linaro.org>
---
 accel/tcg/translate-all.c | 30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index b8ad95aa1b..7e62d8ad97 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1778,7 +1778,8 @@ static inline void tb_page_add(PageDesc *p, TranslationBlock *tb,
 #endif
 }
 
-/* add a new TB and link it to the physical page tables. phys_page2 is
+/*
+ * Add a new TB and link it to the physical page tables. phys_page2 is
  * (-1) to indicate that only one page contains the TB.
  *
  * Called with mmap_lock held for user-mode emulation.
@@ -1797,17 +1798,6 @@ tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
 
     assert_memory_lock();
 
-    if (phys_pc == -1) {
-        /*
-         * If the TB is not associated with a physical RAM page then
-         * it must be a temporary one-insn TB, and we have nothing to do
-         * except fill in the page_addr[] fields.
-         */
-        assert(tb->cflags & CF_NOCACHE);
-        tb->page_addr[0] = tb->page_addr[1] = -1;
-        return tb;
-    }
-
     /*
      * Add the TB to the page list, acquiring first the pages's locks.
      * We keep the locks held until after inserting the TB in the hash table,
@@ -1880,9 +1870,8 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     phys_pc = get_page_addr_code(env, pc);
 
     if (phys_pc == -1) {
-        /* Generate a temporary TB with 1 insn in it */
-        cflags &= ~CF_COUNT_MASK;
-        cflags |= CF_NOCACHE | 1;
+        /* Generate a one-shot TB with 1 insn in it */
+        cflags = (cflags & ~CF_COUNT_MASK) | 1;
     }
 
     cflags &= ~CF_CLUSTER_MASK;
@@ -2096,6 +2085,17 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
         tb_reset_jump(tb, 1);
     }
 
+    /*
+     * If the TB is not associated with a physical RAM page then
+     * it must be a temporary one-insn TB, and we have nothing to do
+     * except fill in the page_addr[] fields. Return early before
+     * attempting to link to other TBs or add to the lookup table.
+     */
+    if (phys_pc == -1) {
+        tb->page_addr[0] = tb->page_addr[1] = -1;
+        return tb;
+    }
+
     /* check next page if needed */
     virt_page2 = (pc + tb->size - 1) & TARGET_PAGE_MASK;
     phys_page2 = -1;
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH  v2 19/21] accel/tcg: remove CF_NOCACHE and special cases
  2021-02-10 22:10 [PATCH v2 00/21] plugins/next pre-PR (hwprofile, regression fixes, icount count fix) Alex Bennée
                   ` (17 preceding siblings ...)
  2021-02-10 22:10 ` [PATCH v2 18/21] accel/tcg: re-factor non-RAM execution code Alex Bennée
@ 2021-02-10 22:10 ` Alex Bennée
  2021-02-10 22:10 ` [PATCH v2 20/21] accel/tcg: allow plugin instrumentation to be disable via cflags Alex Bennée
  2021-02-10 22:10 ` [PATCH v2 21/21] tests/acceptance: add a new tests to detect counting errors Alex Bennée
  20 siblings, 0 replies; 53+ messages in thread
From: Alex Bennée @ 2021-02-10 22:10 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, robhenry, mahmoudabdalghany, aaron, cota,
	Paolo Bonzini, kuhn.chenqun, Alex Bennée

Now we no longer generate CF_NOCACHE blocks we can remove a bunch of
the special case handling for them. While we are at it we can remove
the unused tb->orig_tb field and save a few bytes on the TB structure.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210209182749.31323-11-alex.bennee@linaro.org>
---
 include/exec/exec-all.h   |  3 ---
 accel/tcg/translate-all.c | 51 ++++++++++++---------------------------
 2 files changed, 15 insertions(+), 39 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index f933c74c44..e08179de34 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -454,7 +454,6 @@ struct TranslationBlock {
     uint32_t cflags;    /* compile flags */
 #define CF_COUNT_MASK  0x00007fff
 #define CF_LAST_IO     0x00008000 /* Last insn may be an IO access.  */
-#define CF_NOCACHE     0x00010000 /* To be freed after execution */
 #define CF_USE_ICOUNT  0x00020000
 #define CF_INVALID     0x00040000 /* TB is stale. Set with @jmp_lock held */
 #define CF_PARALLEL    0x00080000 /* Generate code for a parallel context */
@@ -469,8 +468,6 @@ struct TranslationBlock {
 
     struct tb_tc tc;
 
-    /* original tb when cflags has CF_NOCACHE */
-    struct TranslationBlock *orig_tb;
     /* first and second physical page containing code. The lower bit
        of the pointer tells the index in page_next[].
        The list is protected by the TB's page('s) lock(s) */
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 7e62d8ad97..0666f9ef14 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -409,12 +409,6 @@ bool cpu_restore_state(CPUState *cpu, uintptr_t host_pc, bool will_exit)
         TranslationBlock *tb = tcg_tb_lookup(host_pc);
         if (tb) {
             cpu_restore_state_from_tb(cpu, tb, host_pc, will_exit);
-            if (tb_cflags(tb) & CF_NOCACHE) {
-                /* one-shot translation, invalidate it immediately */
-                tb_phys_invalidate(tb, -1);
-                tcg_tb_remove(tb);
-                tb_destroy(tb);
-            }
             return true;
         }
     }
@@ -1633,8 +1627,7 @@ static void do_tb_phys_invalidate(TranslationBlock *tb, bool rm_from_page_list)
     phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
     h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb_cflags(tb) & CF_HASH_MASK,
                      tb->trace_vcpu_dstate);
-    if (!(tb->cflags & CF_NOCACHE) &&
-        !qht_remove(&tb_ctx.htable, tb, h)) {
+    if (!qht_remove(&tb_ctx.htable, tb, h)) {
         return;
     }
 
@@ -1795,6 +1788,8 @@ tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
 {
     PageDesc *p;
     PageDesc *p2 = NULL;
+    void *existing_tb = NULL;
+    uint32_t h;
 
     assert_memory_lock();
 
@@ -1814,25 +1809,20 @@ tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
         tb->page_addr[1] = -1;
     }
 
-    if (!(tb->cflags & CF_NOCACHE)) {
-        void *existing_tb = NULL;
-        uint32_t h;
-
-        /* add in the hash table */
-        h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->cflags & CF_HASH_MASK,
-                         tb->trace_vcpu_dstate);
-        qht_insert(&tb_ctx.htable, tb, h, &existing_tb);
+    /* add in the hash table */
+    h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->cflags & CF_HASH_MASK,
+                     tb->trace_vcpu_dstate);
+    qht_insert(&tb_ctx.htable, tb, h, &existing_tb);
 
-        /* remove TB from the page(s) if we couldn't insert it */
-        if (unlikely(existing_tb)) {
-            tb_page_remove(p, tb);
-            invalidate_page_bitmap(p);
-            if (p2) {
-                tb_page_remove(p2, tb);
-                invalidate_page_bitmap(p2);
-            }
-            tb = existing_tb;
+    /* remove TB from the page(s) if we couldn't insert it */
+    if (unlikely(existing_tb)) {
+        tb_page_remove(p, tb);
+        invalidate_page_bitmap(p);
+        if (p2) {
+            tb_page_remove(p2, tb);
+            invalidate_page_bitmap(p2);
         }
+        tb = existing_tb;
     }
 
     if (p2 && p2 != p) {
@@ -1905,7 +1895,6 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     tb->cs_base = cs_base;
     tb->flags = flags;
     tb->cflags = cflags;
-    tb->orig_tb = NULL;
     tb->trace_vcpu_dstate = *cpu->trace_dstate;
     tcg_ctx->tb_cflags = cflags;
  tb_overflow:
@@ -2444,16 +2433,6 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
     /* Generate a new TB executing the I/O insn.  */
     cpu->cflags_next_tb = curr_cflags() | CF_LAST_IO | n;
 
-    if (tb_cflags(tb) & CF_NOCACHE) {
-        if (tb->orig_tb) {
-            /* Invalidate original TB if this TB was generated in
-             * cpu_exec_nocache() */
-            tb_phys_invalidate(tb->orig_tb, -1);
-        }
-        tcg_tb_remove(tb);
-        tb_destroy(tb);
-    }
-
     qemu_log_mask_and_addr(CPU_LOG_EXEC, tb->pc,
                            "cpu_io_recompile: rewound execution of TB to "
                            TARGET_FMT_lx "\n", tb->pc);
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v2 20/21] accel/tcg: allow plugin instrumentation to be disable via cflags
  2021-02-10 22:10 [PATCH v2 00/21] plugins/next pre-PR (hwprofile, regression fixes, icount count fix) Alex Bennée
                   ` (18 preceding siblings ...)
  2021-02-10 22:10 ` [PATCH v2 19/21] accel/tcg: remove CF_NOCACHE and special cases Alex Bennée
@ 2021-02-10 22:10 ` Alex Bennée
  2021-02-12  0:53   ` Aaron Lindsay via
  2021-02-10 22:10 ` [PATCH v2 21/21] tests/acceptance: add a new tests to detect counting errors Alex Bennée
  20 siblings, 1 reply; 53+ messages in thread
From: Alex Bennée @ 2021-02-10 22:10 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, robhenry, mahmoudabdalghany, aaron, cota,
	Paolo Bonzini, kuhn.chenqun, Alex Bennée

When icount is enabled and we recompile an MMIO access we end up
double counting the instruction execution. To avoid this we introduce
the CF_NOINSTR cflag which disables instrumentation for the next TB.
As this is part of the hashed compile flags we will only execute the
generated TB while coming out of a cpu_io_recompile.

While we are at it delete the old TODO. We might as well keep the
translation handy as it's likely you will repeatedly hit it on each
MMIO access.

Reported-by: Aaron Lindsay <aaron@os.amperecomputing.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210209182749.31323-12-alex.bennee@linaro.org>

---
v2
  - squashed CH_HASHMASK to ~CF_INVALID
---
 include/exec/exec-all.h   |  6 +++---
 accel/tcg/translate-all.c | 17 ++++++++---------
 accel/tcg/translator.c    |  2 +-
 3 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index e08179de34..299282cc59 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -454,14 +454,14 @@ struct TranslationBlock {
     uint32_t cflags;    /* compile flags */
 #define CF_COUNT_MASK  0x00007fff
 #define CF_LAST_IO     0x00008000 /* Last insn may be an IO access.  */
+#define CF_NOINSTR     0x00010000 /* Disable instrumentation of TB */
 #define CF_USE_ICOUNT  0x00020000
 #define CF_INVALID     0x00040000 /* TB is stale. Set with @jmp_lock held */
 #define CF_PARALLEL    0x00080000 /* Generate code for a parallel context */
 #define CF_CLUSTER_MASK 0xff000000 /* Top 8 bits are cluster ID */
 #define CF_CLUSTER_SHIFT 24
-/* cflags' mask for hashing/comparison */
-#define CF_HASH_MASK   \
-    (CF_COUNT_MASK | CF_LAST_IO | CF_USE_ICOUNT | CF_PARALLEL | CF_CLUSTER_MASK)
+/* cflags' mask for hashing/comparison, basically ignore CF_INVALID */
+#define CF_HASH_MASK   (~CF_INVALID)
 
     /* Per-vCPU dynamic tracing state used to generate this TB */
     uint32_t trace_vcpu_dstate;
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 0666f9ef14..32a3d8fe24 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -2399,7 +2399,8 @@ void tb_check_watchpoint(CPUState *cpu, uintptr_t retaddr)
 }
 
 #ifndef CONFIG_USER_ONLY
-/* in deterministic execution mode, instructions doing device I/Os
+/*
+ * In deterministic execution mode, instructions doing device I/Os
  * must be at the end of the TB.
  *
  * Called by softmmu_template.h, with iothread mutex not held.
@@ -2430,19 +2431,17 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
         n = 2;
     }
 
-    /* Generate a new TB executing the I/O insn.  */
-    cpu->cflags_next_tb = curr_cflags() | CF_LAST_IO | n;
+    /*
+     * Exit the loop and potentially generate a new TB executing the
+     * just the I/O insns. We also disable instrumentation so we don't
+     * double count the instruction.
+     */
+    cpu->cflags_next_tb = curr_cflags() | CF_NOINSTR | CF_LAST_IO | n;
 
     qemu_log_mask_and_addr(CPU_LOG_EXEC, tb->pc,
                            "cpu_io_recompile: rewound execution of TB to "
                            TARGET_FMT_lx "\n", tb->pc);
 
-    /* TODO: If env->pc != tb->pc (i.e. the faulting instruction was not
-     * the first in the TB) then we end up generating a whole new TB and
-     *  repeating the fault, which is horribly inefficient.
-     *  Better would be to execute just this insn uncached, or generate a
-     *  second new TB.
-     */
     cpu_loop_exit_noexc(cpu);
 }
 
diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
index a49a794065..14d1ea795d 100644
--- a/accel/tcg/translator.c
+++ b/accel/tcg/translator.c
@@ -58,7 +58,7 @@ void translator_loop(const TranslatorOps *ops, DisasContextBase *db,
     ops->tb_start(db, cpu);
     tcg_debug_assert(db->is_jmp == DISAS_NEXT);  /* no early exit */
 
-    plugin_enabled = plugin_gen_tb_start(cpu, tb);
+    plugin_enabled = !(tb_cflags(db->tb) & CF_NOINSTR) && plugin_gen_tb_start(cpu, tb);
 
     while (true) {
         db->num_insns++;
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v2 21/21] tests/acceptance: add a new tests to detect counting errors
  2021-02-10 22:10 [PATCH v2 00/21] plugins/next pre-PR (hwprofile, regression fixes, icount count fix) Alex Bennée
                   ` (19 preceding siblings ...)
  2021-02-10 22:10 ` [PATCH v2 20/21] accel/tcg: allow plugin instrumentation to be disable via cflags Alex Bennée
@ 2021-02-10 22:10 ` Alex Bennée
  2021-02-11 10:24   ` Philippe Mathieu-Daudé
  2021-02-11 19:56   ` Wainer dos Santos Moschetta
  20 siblings, 2 replies; 53+ messages in thread
From: Alex Bennée @ 2021-02-10 22:10 UTC (permalink / raw)
  To: qemu-devel
  Cc: Philippe Mathieu-Daudé,
	robhenry, mahmoudabdalghany, aaron, cota,
	Wainer dos Santos Moschetta, Cleber Rosa, kuhn.chenqun,
	Alex Bennée

The insn plugin has a simple heuristic to detect if an instruction is
detected running twice in a row. Check the plugin log after the run
and pass accordingly.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210209182749.31323-13-alex.bennee@linaro.org>
---
 tests/acceptance/tcg_plugins.py | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/tests/acceptance/tcg_plugins.py b/tests/acceptance/tcg_plugins.py
index b512979769..acab599505 100644
--- a/tests/acceptance/tcg_plugins.py
+++ b/tests/acceptance/tcg_plugins.py
@@ -101,3 +101,34 @@ class PluginKernelNormal(PluginKernelBase):
             else:
                 logger.debug("Failed to find instruction count")
                 self.fail
+
+    def test_aarch64_virt_insn_icount(self):
+        """
+        :avocado: tags=accel:tcg
+        :avocado: tags=arch:aarch64
+        :avocado: tags=machine:virt
+        :avocado: tags=cpu:cortex-a57
+        """
+        kernel_path = self._grab_aarch64_kernel()
+        kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE +
+                               'console=ttyAMA0')
+        console_pattern = 'Kernel panic - not syncing: VFS:'
+
+        plugin_log = tempfile.NamedTemporaryFile(mode="r+t", prefix="plugin",
+                                                 suffix=".log", delete=False)
+
+        self.run_vm(kernel_path, kernel_command_line,
+                    "tests/plugin/libinsn.so", plugin_log.name,
+                    console_pattern,
+                    args=('-cpu', 'cortex-a53', '-icount', 'shift=1'))
+
+        logger = logging.getLogger()
+
+        with plugin_log as lf, \
+             mmap.mmap(lf.fileno(), 0, access=mmap.ACCESS_READ) as s:
+            m = re.search(br"detected repeat execution @ (?P<addr>0x[0-9A-Fa-f]+)", s)
+            if m is not None and "addr" in m.groupdict():
+                logger.debug("detected repeat instructions")
+                self.fail("detected repeated instructions")
+            else:
+                logger.debug("no repeats detected: %s", m)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 12/21] target/mips: Create mips_io_recompile_replay_branch
  2021-02-10 22:10 ` [PATCH v2 12/21] target/mips: Create mips_io_recompile_replay_branch Alex Bennée
@ 2021-02-11 10:10   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 53+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-02-11 10:10 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: Aleksandar Rikalo, Richard Henderson, robhenry,
	mahmoudabdalghany, aaron, cota, kuhn.chenqun, Paolo Bonzini,
	Aurelien Jarno

On 2/10/21 11:10 PM, Alex Bennée wrote:
> From: Richard Henderson <richard.henderson@linaro.org>
> 
> Move the code from accel/tcg/translate-all.c to target/mips/cpu.c.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> Message-Id: <20210208233906.479571-4-richard.henderson@linaro.org>
> Message-Id: <20210209182749.31323-4-alex.bennee@linaro.org>
> ---
>  accel/tcg/translate-all.c | 12 ++----------
>  target/mips/cpu.c         | 18 ++++++++++++++++++
>  2 files changed, 20 insertions(+), 10 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 11/21] accel/tcg: Create io_recompile_replay_branch hook
  2021-02-10 22:10 ` [PATCH v2 11/21] accel/tcg: Create io_recompile_replay_branch hook Alex Bennée
@ 2021-02-11 10:12   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 53+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-02-11 10:12 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: Richard Henderson, robhenry, mahmoudabdalghany, aaron, cota,
	kuhn.chenqun, Paolo Bonzini

On 2/10/21 11:10 PM, Alex Bennée wrote:
> From: Richard Henderson <richard.henderson@linaro.org>
> 
> Create a hook in which to split out the mips and
> sh4 ifdefs from cpu_io_recompile.
> 
> [AJB: s/stoped/stopped/]
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> Message-Id: <20210208233906.479571-3-richard.henderson@linaro.org>
> Message-Id: <20210209182749.31323-3-alex.bennee@linaro.org>
> ---
>  include/hw/core/tcg-cpu-ops.h | 10 ++++++++++
>  accel/tcg/translate-all.c     | 17 +++++++++++++----
>  2 files changed, 23 insertions(+), 4 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 13/21] target/sh4: Create superh_io_recompile_replay_branch
  2021-02-10 22:10 ` [PATCH v2 13/21] target/sh4: Create superh_io_recompile_replay_branch Alex Bennée
@ 2021-02-11 10:13   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 53+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-02-11 10:13 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: Yoshinori Sato, Richard Henderson, robhenry, mahmoudabdalghany,
	aaron, cota, kuhn.chenqun, Paolo Bonzini

On 2/10/21 11:10 PM, Alex Bennée wrote:
> From: Richard Henderson <richard.henderson@linaro.org>
> 
> Move the code from accel/tcg/translate-all.c to target/sh4/cpu.c.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> Message-Id: <20210208233906.479571-5-richard.henderson@linaro.org>
> Message-Id: <20210209182749.31323-5-alex.bennee@linaro.org>
> ---
>  accel/tcg/translate-all.c | 12 ------------
>  target/sh4/cpu.c          | 18 ++++++++++++++++++
>  2 files changed, 18 insertions(+), 12 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 10/21] exec: Move TranslationBlock typedef to qemu/typedefs.h
  2021-02-10 22:10 ` [PATCH v2 10/21] exec: Move TranslationBlock typedef to qemu/typedefs.h Alex Bennée
@ 2021-02-11 10:14   ` Philippe Mathieu-Daudé
  2021-02-11 10:24     ` Alex Bennée
  0 siblings, 1 reply; 53+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-02-11 10:14 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: Peter Maydell, Eduardo Habkost, Anthony Green, Richard Henderson,
	aaron, robhenry, mahmoudabdalghany, Michael Walle, cota,
	kuhn.chenqun, Paolo Bonzini, Guan Xuetao, open list:ARM TCG CPUs

On 2/10/21 11:10 PM, Alex Bennée wrote:
> From: Richard Henderson <richard.henderson@linaro.org>
> 
> This also means we don't need an extra declaration of
> the structure in hw/core/cpu.h.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> Message-Id: <20210208233906.479571-2-richard.henderson@linaro.org>

I'd say this one matters ^,

> Message-Id: <20210209182749.31323-2-alex.bennee@linaro.org>

but this one less.

> ---
>  include/exec/tb-context.h     | 1 -
>  include/hw/core/cpu.h         | 4 +---
>  include/hw/core/tcg-cpu-ops.h | 3 +--
>  include/qemu/typedefs.h       | 1 +
>  target/arm/internals.h        | 3 +--
>  target/cris/translate.c       | 2 +-
>  target/lm32/translate.c       | 2 +-
>  target/moxie/translate.c      | 2 +-
>  target/unicore32/translate.c  | 2 +-
>  9 files changed, 8 insertions(+), 12 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 16/21] accel/tcg: actually cache our partial icount TB
  2021-02-10 22:10 ` [PATCH v2 16/21] accel/tcg: actually cache our partial icount TB Alex Bennée
@ 2021-02-11 10:21   ` Philippe Mathieu-Daudé
  2021-02-11 18:48     ` Richard Henderson
  2021-02-11 18:48   ` Richard Henderson
  1 sibling, 1 reply; 53+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-02-11 10:21 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: Richard Henderson, robhenry, mahmoudabdalghany, aaron, cota,
	kuhn.chenqun, Paolo Bonzini

Hi Alex,

On 2/10/21 11:10 PM, Alex Bennée wrote:
> When we exit a block under icount with instructions left to execute we
> might need a shorter than normal block to take us to the next
> deterministic event. Instead of creating a throwaway block on demand
> we use the existing compile flags mechanism to ensure we fetch (or
> compile and fetch) a block with exactly the number of instructions we
> need.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Message-Id: <20210209182749.31323-8-alex.bennee@linaro.org>
> 
> ---
> v2
>   - drop pointless assert
> ---
>  accel/tcg/cpu-exec.c | 17 +++++++++--------
>  1 file changed, 9 insertions(+), 8 deletions(-)
> 
> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> index d9ef69121c..5b6a4fe84b 100644
> --- a/accel/tcg/cpu-exec.c
> +++ b/accel/tcg/cpu-exec.c
> @@ -730,16 +730,17 @@ static inline void cpu_loop_exec_tb(CPUState *cpu, TranslationBlock *tb,
>      /* Ensure global icount has gone forward */
>      icount_update(cpu);
>      /* Refill decrementer and continue execution.  */
> -    insns_left = MIN(0xffff, cpu->icount_budget);
> +    insns_left = MIN(CF_COUNT_MASK, cpu->icount_budget);

Can you describe this change a bit please?

>      cpu_neg(cpu)->icount_decr.u16.low = insns_left;
>      cpu->icount_extra = cpu->icount_budget - insns_left;
> -    if (!cpu->icount_extra && insns_left < tb->icount) {
> -        /* Execute any remaining instructions, then let the main loop
> -         * handle the next event.
> -         */
> -        if (insns_left > 0) {
> -            cpu_exec_nocache(cpu, insns_left, tb, false);
> -        }
> +
> +    /*
> +     * If the next tb has more instructions than we have left to
> +     * execute we need to ensure we find/generate a TB with exactly
> +     * insns_left instructions in it.
> +     */
> +    if (!cpu->icount_extra && insns_left > 0 && insns_left < tb->icount)  {
> +        cpu->cflags_next_tb = (tb->cflags & ~CF_COUNT_MASK) | insns_left;
>      }
>  #endif
>  }
> 



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 10/21] exec: Move TranslationBlock typedef to qemu/typedefs.h
  2021-02-11 10:14   ` Philippe Mathieu-Daudé
@ 2021-02-11 10:24     ` Alex Bennée
  0 siblings, 0 replies; 53+ messages in thread
From: Alex Bennée @ 2021-02-11 10:24 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Peter Maydell, Eduardo Habkost, Anthony Green, Richard Henderson,
	qemu-devel, robhenry, aaron, mahmoudabdalghany, Michael Walle,
	cota, kuhn.chenqun, Paolo Bonzini, Guan Xuetao,
	open list:ARM TCG CPUs


Philippe Mathieu-Daudé <f4bug@amsat.org> writes:

> On 2/10/21 11:10 PM, Alex Bennée wrote:
>> From: Richard Henderson <richard.henderson@linaro.org>
>> 
>> This also means we don't need an extra declaration of
>> the structure in hw/core/cpu.h.
>> 
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
>> Message-Id: <20210208233906.479571-2-richard.henderson@linaro.org>
>
> I'd say this one matters ^,
>
>> Message-Id: <20210209182749.31323-2-alex.bennee@linaro.org>
>
> but this one less.

My general approach to Message-Id's is to keep the one from where I
snarfed the patch and the last Message-Id from whichever iteration of
posting it's had. IOW the second one will be replaced by whatever the ID
is of the parent post here.

>
>> ---
>>  include/exec/tb-context.h     | 1 -
>>  include/hw/core/cpu.h         | 4 +---
>>  include/hw/core/tcg-cpu-ops.h | 3 +--
>>  include/qemu/typedefs.h       | 1 +
>>  target/arm/internals.h        | 3 +--
>>  target/cris/translate.c       | 2 +-
>>  target/lm32/translate.c       | 2 +-
>>  target/moxie/translate.c      | 2 +-
>>  target/unicore32/translate.c  | 2 +-
>>  9 files changed, 8 insertions(+), 12 deletions(-)
>
> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>


-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 21/21] tests/acceptance: add a new tests to detect counting errors
  2021-02-10 22:10 ` [PATCH v2 21/21] tests/acceptance: add a new tests to detect counting errors Alex Bennée
@ 2021-02-11 10:24   ` Philippe Mathieu-Daudé
  2021-02-11 19:56   ` Wainer dos Santos Moschetta
  1 sibling, 0 replies; 53+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-02-11 10:24 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: robhenry, mahmoudabdalghany, aaron, cota,
	Wainer dos Santos Moschetta, Cleber Rosa, kuhn.chenqun

On 2/10/21 11:10 PM, Alex Bennée wrote:
> The insn plugin has a simple heuristic to detect if an instruction is
> detected running twice in a row. Check the plugin log after the run
> and pass accordingly.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Message-Id: <20210209182749.31323-13-alex.bennee@linaro.org>
> ---
>  tests/acceptance/tcg_plugins.py | 31 +++++++++++++++++++++++++++++++
>  1 file changed, 31 insertions(+)
> 
> diff --git a/tests/acceptance/tcg_plugins.py b/tests/acceptance/tcg_plugins.py
> index b512979769..acab599505 100644
> --- a/tests/acceptance/tcg_plugins.py
> +++ b/tests/acceptance/tcg_plugins.py
> @@ -101,3 +101,34 @@ class PluginKernelNormal(PluginKernelBase):
>              else:
>                  logger.debug("Failed to find instruction count")
>                  self.fail
> +
> +    def test_aarch64_virt_insn_icount(self):
> +        """
> +        :avocado: tags=accel:tcg
> +        :avocado: tags=arch:aarch64
> +        :avocado: tags=machine:virt
> +        :avocado: tags=cpu:cortex-a57
> +        """
> +        kernel_path = self._grab_aarch64_kernel()
> +        kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE +
> +                               'console=ttyAMA0')
> +        console_pattern = 'Kernel panic - not syncing: VFS:'
> +
> +        plugin_log = tempfile.NamedTemporaryFile(mode="r+t", prefix="plugin",
> +                                                 suffix=".log", delete=False)
> +
> +        self.run_vm(kernel_path, kernel_command_line,
> +                    "tests/plugin/libinsn.so", plugin_log.name,
> +                    console_pattern,
> +                    args=('-cpu', 'cortex-a53', '-icount', 'shift=1'))
> +
> +        logger = logging.getLogger()
> +
> +        with plugin_log as lf, \
> +             mmap.mmap(lf.fileno(), 0, access=mmap.ACCESS_READ) as s:
> +            m = re.search(br"detected repeat execution @ (?P<addr>0x[0-9A-Fa-f]+)", s)
> +            if m is not None and "addr" in m.groupdict():
> +                logger.debug("detected repeat instructions")

I suppose this is debug left-over and we can remove this line now.

Otherwise:
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

> +                self.fail("detected repeated instructions")
> +            else:
> +                logger.debug("no repeats detected: %s", m)
> 



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 15/21] tests/acceptance: add a new set of tests to exercise plugins
  2021-02-10 22:10 ` [PATCH v2 15/21] tests/acceptance: add a new set of tests to exercise plugins Alex Bennée
@ 2021-02-11 10:31   ` Philippe Mathieu-Daudé
  2021-02-11 18:59     ` Wainer dos Santos Moschetta
  2021-02-11 19:51   ` Wainer dos Santos Moschetta
  1 sibling, 1 reply; 53+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-02-11 10:31 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel, Willian Rampazzo
  Cc: robhenry, mahmoudabdalghany, aaron, cota,
	Wainer dos Santos Moschetta, Cleber Rosa, kuhn.chenqun

On 2/10/21 11:10 PM, Alex Bennée wrote:
> This is just a simple test to count the instructions executed by a
> kernel. However a later test will detect a failure condition when
> icount is enabled.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Message-Id: <20210209182749.31323-7-alex.bennee@linaro.org>
> ---
>  tests/acceptance/tcg_plugins.py | 103 ++++++++++++++++++++++++++++++++
>  1 file changed, 103 insertions(+)
>  create mode 100644 tests/acceptance/tcg_plugins.py

Can you add this file to "TCG Plugins" in MAINTAINERS please?

> 
> diff --git a/tests/acceptance/tcg_plugins.py b/tests/acceptance/tcg_plugins.py
> new file mode 100644
> index 0000000000..b512979769
> --- /dev/null
> +++ b/tests/acceptance/tcg_plugins.py
> @@ -0,0 +1,103 @@
> +# TCG Plugins tests
> +#
> +# These are a little more involved than the basic tests run by check-tcg.
> +#
> +# Copyright (c) 2021 Linaro
> +#
> +# Author:
> +#  Alex Bennée <alex.bennee@linaro.org>
> +#
> +# SPDX-License-Identifier: GPL-2.0-or-later
> +
> +import logging
> +import time
> +import tempfile
> +import mmap
> +import re
> +
> +from boot_linux_console import LinuxKernelTest
> +
> +
> +class PluginKernelBase(LinuxKernelTest):
> +    """
> +    Boots a Linux kernel with a TCG plugin enabled.
> +    """
> +
> +    timeout = 120
> +    KERNEL_COMMON_COMMAND_LINE = 'printk.time=1 panic=-1 '
> +
> +    def run_vm(self, kernel_path, kernel_command_line,
> +               plugin, plugin_log, console_pattern, args):
> +
> +        logger = logging.getLogger('plugin')
> +        start_time = time.time()
> +        vm = self.get_vm()
> +        vm.set_console()
> +        vm.add_args('-kernel', kernel_path,
> +                    '-append', kernel_command_line,
> +                    '-plugin', plugin,
> +                    '-d', 'plugin',
> +                    '-D', plugin_log,
> +                    '-net', 'none',
> +                    '-no-reboot')
> +        if args:
> +            vm.add_args(*args)
> +
> +        try:
> +            vm.launch()
> +        except:
> +            # fails if plugins not enabled
> +            self.cancel("TCG Plugins not enabled")

The test could fail for other reasons you want to catch...

We don't have yet the possibility to query the binary if it
has the plugin feature builtin. Can you add a TODO comment
so we fix that once we got it?

> +
> +        self.wait_for_console_pattern(console_pattern, vm)
> +        elapsed = time.time() - start_time
> +        logger.info('elapsed time %.2f sec' % elapsed)
> +        # ensure logs are flushed
> +        vm.shutdown()
> +        return elapsed
> +
> +
> +class PluginKernelNormal(PluginKernelBase):
> +
> +    def _grab_aarch64_kernel(self):
> +        kernel_url = ('http://security.debian.org/'
> +                      'debian-security/pool/updates/main/l/linux-signed-arm64/'
> +                      'linux-image-4.19.0-12-arm64_4.19.152-1_arm64.deb')
> +        kernel_sha1 = '2036c2792f80ac9c4ccaae742b2e0a28385b6010'
> +        kernel_deb = self.fetch_asset(kernel_url, asset_hash=kernel_sha1)
> +        kernel_path = self.extract_from_deb(kernel_deb,
> +                                            "/boot/vmlinuz-4.19.0-12-arm64")
> +        return kernel_path
> +
> +    def test_aarch64_virt_insn(self):
> +        """
> +        :avocado: tags=accel:tcg
> +        :avocado: tags=arch:aarch64
> +        :avocado: tags=machine:virt
> +        :avocado: tags=cpu:cortex-a57
> +        """
> +        kernel_path = self._grab_aarch64_kernel()
> +        kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE +
> +                               'console=ttyAMA0')
> +        console_pattern = 'Kernel panic - not syncing: VFS:'
> +
> +        plugin_log = tempfile.NamedTemporaryFile(mode="r+t", prefix="plugin",
> +                                                 suffix=".log")
> +
> +        self.run_vm(kernel_path, kernel_command_line,
> +                    "tests/plugin/libinsn.so", plugin_log.name,
> +                    console_pattern,
> +                    args=('-cpu', 'cortex-a53'))

I had the understanding that by using 'tags=cpu' QEMU would be called
with "-cpu tags['cpu']". Cleber can you confirm please?

> +
> +        logger = logging.getLogger()
> +
> +        with plugin_log as lf, \
> +             mmap.mmap(lf.fileno(), 0, access=mmap.ACCESS_READ) as s:
> +
> +            m = re.search(br"insns: (?P<count>\d+)", s)
> +            if "count" in m.groupdict():
> +                logger.debug("reported %d instructions",
> +                             int(m.group("count")))
> +            else:
> +                logger.debug("Failed to find instruction count")
> +                self.fail
> 

With MAINTAINERS & TODO:
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

Thanks,

Phil.


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 16/21] accel/tcg: actually cache our partial icount TB
  2021-02-11 10:21   ` Philippe Mathieu-Daudé
@ 2021-02-11 18:48     ` Richard Henderson
  2021-02-12 15:40       ` Philippe Mathieu-Daudé
  0 siblings, 1 reply; 53+ messages in thread
From: Richard Henderson @ 2021-02-11 18:48 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, Alex Bennée, qemu-devel
  Cc: robhenry, mahmoudabdalghany, aaron, cota, Paolo Bonzini, kuhn.chenqun

On 2/11/21 2:21 AM, Philippe Mathieu-Daudé wrote:
>> -    insns_left = MIN(0xffff, cpu->icount_budget);
>> +    insns_left = MIN(CF_COUNT_MASK, cpu->icount_budget);
> 
> Can you describe this change a bit please?

Replacing a magic number with its proper symbol.


r~


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 16/21] accel/tcg: actually cache our partial icount TB
  2021-02-10 22:10 ` [PATCH v2 16/21] accel/tcg: actually cache our partial icount TB Alex Bennée
  2021-02-11 10:21   ` Philippe Mathieu-Daudé
@ 2021-02-11 18:48   ` Richard Henderson
  1 sibling, 0 replies; 53+ messages in thread
From: Richard Henderson @ 2021-02-11 18:48 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: robhenry, mahmoudabdalghany, aaron, cota, Paolo Bonzini, kuhn.chenqun

On 2/10/21 2:10 PM, Alex Bennée wrote:
> When we exit a block under icount with instructions left to execute we
> might need a shorter than normal block to take us to the next
> deterministic event. Instead of creating a throwaway block on demand
> we use the existing compile flags mechanism to ensure we fetch (or
> compile and fetch) a block with exactly the number of instructions we
> need.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Message-Id: <20210209182749.31323-8-alex.bennee@linaro.org>

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 15/21] tests/acceptance: add a new set of tests to exercise plugins
  2021-02-11 10:31   ` Philippe Mathieu-Daudé
@ 2021-02-11 18:59     ` Wainer dos Santos Moschetta
  0 siblings, 0 replies; 53+ messages in thread
From: Wainer dos Santos Moschetta @ 2021-02-11 18:59 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé,
	Alex Bennée, qemu-devel, Willian Rampazzo
  Cc: robhenry, mahmoudabdalghany, aaron, cota, Cleber Rosa, kuhn.chenqun


On 2/11/21 7:31 AM, Philippe Mathieu-Daudé wrote:
> On 2/10/21 11:10 PM, Alex Bennée wrote:
>> This is just a simple test to count the instructions executed by a
>> kernel. However a later test will detect a failure condition when
>> icount is enabled.
>>
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>> Message-Id: <20210209182749.31323-7-alex.bennee@linaro.org>
>> ---
>>   tests/acceptance/tcg_plugins.py | 103 ++++++++++++++++++++++++++++++++
>>   1 file changed, 103 insertions(+)
>>   create mode 100644 tests/acceptance/tcg_plugins.py
> Can you add this file to "TCG Plugins" in MAINTAINERS please?
>
>> diff --git a/tests/acceptance/tcg_plugins.py b/tests/acceptance/tcg_plugins.py
>> new file mode 100644
>> index 0000000000..b512979769
>> --- /dev/null
>> +++ b/tests/acceptance/tcg_plugins.py
>> @@ -0,0 +1,103 @@
>> +# TCG Plugins tests
>> +#
>> +# These are a little more involved than the basic tests run by check-tcg.
>> +#
>> +# Copyright (c) 2021 Linaro
>> +#
>> +# Author:
>> +#  Alex Bennée <alex.bennee@linaro.org>
>> +#
>> +# SPDX-License-Identifier: GPL-2.0-or-later
>> +
>> +import logging
>> +import time
>> +import tempfile
>> +import mmap
>> +import re
>> +
>> +from boot_linux_console import LinuxKernelTest
>> +
>> +
>> +class PluginKernelBase(LinuxKernelTest):
>> +    """
>> +    Boots a Linux kernel with a TCG plugin enabled.
>> +    """
>> +
>> +    timeout = 120
>> +    KERNEL_COMMON_COMMAND_LINE = 'printk.time=1 panic=-1 '
>> +
>> +    def run_vm(self, kernel_path, kernel_command_line,
>> +               plugin, plugin_log, console_pattern, args):
>> +
>> +        logger = logging.getLogger('plugin')
>> +        start_time = time.time()
>> +        vm = self.get_vm()
>> +        vm.set_console()
>> +        vm.add_args('-kernel', kernel_path,
>> +                    '-append', kernel_command_line,
>> +                    '-plugin', plugin,
>> +                    '-d', 'plugin',
>> +                    '-D', plugin_log,
>> +                    '-net', 'none',
>> +                    '-no-reboot')
>> +        if args:
>> +            vm.add_args(*args)
>> +
>> +        try:
>> +            vm.launch()
>> +        except:
>> +            # fails if plugins not enabled
>> +            self.cancel("TCG Plugins not enabled")
> The test could fail for other reasons you want to catch...
>
> We don't have yet the possibility to query the binary if it
> has the plugin feature builtin. Can you add a TODO comment
> so we fix that once we got it?
>
>> +
>> +        self.wait_for_console_pattern(console_pattern, vm)
>> +        elapsed = time.time() - start_time
>> +        logger.info('elapsed time %.2f sec' % elapsed)
>> +        # ensure logs are flushed
>> +        vm.shutdown()
>> +        return elapsed
>> +
>> +
>> +class PluginKernelNormal(PluginKernelBase):
>> +
>> +    def _grab_aarch64_kernel(self):
>> +        kernel_url = ('http://security.debian.org/'
>> +                      'debian-security/pool/updates/main/l/linux-signed-arm64/'
>> +                      'linux-image-4.19.0-12-arm64_4.19.152-1_arm64.deb')
>> +        kernel_sha1 = '2036c2792f80ac9c4ccaae742b2e0a28385b6010'
>> +        kernel_deb = self.fetch_asset(kernel_url, asset_hash=kernel_sha1)
>> +        kernel_path = self.extract_from_deb(kernel_deb,
>> +                                            "/boot/vmlinuz-4.19.0-12-arm64")
>> +        return kernel_path
>> +
>> +    def test_aarch64_virt_insn(self):
>> +        """
>> +        :avocado: tags=accel:tcg
>> +        :avocado: tags=arch:aarch64
>> +        :avocado: tags=machine:virt
>> +        :avocado: tags=cpu:cortex-a57
>> +        """
>> +        kernel_path = self._grab_aarch64_kernel()
>> +        kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE +
>> +                               'console=ttyAMA0')
>> +        console_pattern = 'Kernel panic - not syncing: VFS:'
>> +
>> +        plugin_log = tempfile.NamedTemporaryFile(mode="r+t", prefix="plugin",
>> +                                                 suffix=".log")
>> +
>> +        self.run_vm(kernel_path, kernel_command_line,
>> +                    "tests/plugin/libinsn.so", plugin_log.name,
>> +                    console_pattern,
>> +                    args=('-cpu', 'cortex-a53'))
> I had the understanding that by using 'tags=cpu' QEMU would be called
> with "-cpu tags['cpu']". Cleber can you confirm please?

Currently only the 'machine' tag is automatically converted to QEMU 
arguments (in tests/acceptance/avocado_qemu/__init__.py::get_vm()). 
Having the same behavior for other tags like 'cpu' and 'accel' should be 
helpful IMO.


- Wainer

>
>> +
>> +        logger = logging.getLogger()
>> +
>> +        with plugin_log as lf, \
>> +             mmap.mmap(lf.fileno(), 0, access=mmap.ACCESS_READ) as s:
>> +
>> +            m = re.search(br"insns: (?P<count>\d+)", s)
>> +            if "count" in m.groupdict():
>> +                logger.debug("reported %d instructions",
>> +                             int(m.group("count")))
>> +            else:
>> +                logger.debug("Failed to find instruction count")
>> +                self.fail
>>
> With MAINTAINERS & TODO:
> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
>
> Thanks,
>
> Phil.
>



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 17/21] accel/tcg: cache single instruction TB on pending replay exception
  2021-02-10 22:10 ` [PATCH v2 17/21] accel/tcg: cache single instruction TB on pending replay exception Alex Bennée
@ 2021-02-11 19:12   ` Richard Henderson
  2021-02-11 20:00     ` Alex Bennée
  0 siblings, 1 reply; 53+ messages in thread
From: Richard Henderson @ 2021-02-11 19:12 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: robhenry, mahmoudabdalghany, aaron, cota, Paolo Bonzini, kuhn.chenqun

On 2/10/21 2:10 PM, Alex Bennée wrote:
> Again there is no reason to jump through the nocache hoops to execute
> a single instruction block. We do have to add an additional wrinkle to
> the cpu_handle_interrupt case to ensure we let through a TB where we
> have specifically disabled icount for the block.

Can you say more about this?  Because...

>      if (unlikely(qatomic_read(&cpu->exit_request))
>          || (icount_enabled()
> +            && (cpu->cflags_next_tb == -1 || cpu->cflags_next_tb & CF_USE_ICOUNT)
>              && cpu_neg(cpu)->icount_decr.u16.low + cpu->icount_extra == 0)) {

... this does not appear to match.  You're checking that icount has been
explicitly *enabled*?  Or am I reading the logic backward and only if icount is
enabled will we take EXCP_INTERRUPT?


r~


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 18/21] accel/tcg: re-factor non-RAM execution code
  2021-02-10 22:10 ` [PATCH v2 18/21] accel/tcg: re-factor non-RAM execution code Alex Bennée
@ 2021-02-11 19:19   ` Richard Henderson
  0 siblings, 0 replies; 53+ messages in thread
From: Richard Henderson @ 2021-02-11 19:19 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: robhenry, mahmoudabdalghany, aaron, cota, Paolo Bonzini, kuhn.chenqun

On 2/10/21 2:10 PM, Alex Bennée wrote:
> There is no real need to use CF_NOCACHE here. As long as the TB isn't
> linked to other TBs or included in the QHT or jump cache then it will
> only get executed once.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Message-Id: <20210209182749.31323-10-alex.bennee@linaro.org>
> ---
>  accel/tcg/translate-all.c | 30 +++++++++++++++---------------
>  1 file changed, 15 insertions(+), 15 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 15/21] tests/acceptance: add a new set of tests to exercise plugins
  2021-02-10 22:10 ` [PATCH v2 15/21] tests/acceptance: add a new set of tests to exercise plugins Alex Bennée
  2021-02-11 10:31   ` Philippe Mathieu-Daudé
@ 2021-02-11 19:51   ` Wainer dos Santos Moschetta
  1 sibling, 0 replies; 53+ messages in thread
From: Wainer dos Santos Moschetta @ 2021-02-11 19:51 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: robhenry, mahmoudabdalghany, aaron, cota, Cleber Rosa,
	kuhn.chenqun, Philippe Mathieu-Daudé


On 2/10/21 7:10 PM, Alex Bennée wrote:
> This is just a simple test to count the instructions executed by a
> kernel. However a later test will detect a failure condition when
> icount is enabled.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Message-Id: <20210209182749.31323-7-alex.bennee@linaro.org>
> ---
>   tests/acceptance/tcg_plugins.py | 103 ++++++++++++++++++++++++++++++++
>   1 file changed, 103 insertions(+)
>   create mode 100644 tests/acceptance/tcg_plugins.py

Configured with

   ./configure --enable-tcg --enable-plugins --disable-docs 
--target-list=aarch64-softmmu

and worked fine here, so

Tested-by: Wainer dos Santos Moschetta <wainersm@redhat.com>

>
> diff --git a/tests/acceptance/tcg_plugins.py b/tests/acceptance/tcg_plugins.py
> new file mode 100644
> index 0000000000..b512979769
> --- /dev/null
> +++ b/tests/acceptance/tcg_plugins.py
> @@ -0,0 +1,103 @@
> +# TCG Plugins tests
> +#
> +# These are a little more involved than the basic tests run by check-tcg.
> +#
> +# Copyright (c) 2021 Linaro
> +#
> +# Author:
> +#  Alex Bennée <alex.bennee@linaro.org>
> +#
> +# SPDX-License-Identifier: GPL-2.0-or-later
> +
> +import logging
> +import time
> +import tempfile
> +import mmap
> +import re
> +
> +from boot_linux_console import LinuxKernelTest
> +
> +
> +class PluginKernelBase(LinuxKernelTest):
> +    """
> +    Boots a Linux kernel with a TCG plugin enabled.
> +    """
> +
> +    timeout = 120
> +    KERNEL_COMMON_COMMAND_LINE = 'printk.time=1 panic=-1 '
> +
> +    def run_vm(self, kernel_path, kernel_command_line,
> +               plugin, plugin_log, console_pattern, args):
> +
> +        logger = logging.getLogger('plugin')
> +        start_time = time.time()
> +        vm = self.get_vm()
> +        vm.set_console()
> +        vm.add_args('-kernel', kernel_path,
> +                    '-append', kernel_command_line,
> +                    '-plugin', plugin,
> +                    '-d', 'plugin',
> +                    '-D', plugin_log,
> +                    '-net', 'none',
> +                    '-no-reboot')
> +        if args:
> +            vm.add_args(*args)
> +
> +        try:
> +            vm.launch()
> +        except:
> +            # fails if plugins not enabled
> +            self.cancel("TCG Plugins not enabled")
> +
> +        self.wait_for_console_pattern(console_pattern, vm)
> +        elapsed = time.time() - start_time
> +        logger.info('elapsed time %.2f sec' % elapsed)
> +        # ensure logs are flushed
> +        vm.shutdown()
> +        return elapsed
> +
> +
> +class PluginKernelNormal(PluginKernelBase):
> +
> +    def _grab_aarch64_kernel(self):
> +        kernel_url = ('http://security.debian.org/'
> +                      'debian-security/pool/updates/main/l/linux-signed-arm64/'
> +                      'linux-image-4.19.0-12-arm64_4.19.152-1_arm64.deb')
> +        kernel_sha1 = '2036c2792f80ac9c4ccaae742b2e0a28385b6010'
> +        kernel_deb = self.fetch_asset(kernel_url, asset_hash=kernel_sha1)
> +        kernel_path = self.extract_from_deb(kernel_deb,
> +                                            "/boot/vmlinuz-4.19.0-12-arm64")
> +        return kernel_path
> +
> +    def test_aarch64_virt_insn(self):
> +        """
> +        :avocado: tags=accel:tcg
> +        :avocado: tags=arch:aarch64
> +        :avocado: tags=machine:virt
> +        :avocado: tags=cpu:cortex-a57
> +        """
> +        kernel_path = self._grab_aarch64_kernel()
> +        kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE +
> +                               'console=ttyAMA0')
> +        console_pattern = 'Kernel panic - not syncing: VFS:'
> +
> +        plugin_log = tempfile.NamedTemporaryFile(mode="r+t", prefix="plugin",
> +                                                 suffix=".log")
> +
> +        self.run_vm(kernel_path, kernel_command_line,
> +                    "tests/plugin/libinsn.so", plugin_log.name,
> +                    console_pattern,
> +                    args=('-cpu', 'cortex-a53'))
> +
> +        logger = logging.getLogger()
> +
> +        with plugin_log as lf, \
> +             mmap.mmap(lf.fileno(), 0, access=mmap.ACCESS_READ) as s:
> +
> +            m = re.search(br"insns: (?P<count>\d+)", s)
> +            if "count" in m.groupdict():
> +                logger.debug("reported %d instructions",
> +                             int(m.group("count")))
> +            else:
> +                logger.debug("Failed to find instruction count")
> +                self.fail



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 21/21] tests/acceptance: add a new tests to detect counting errors
  2021-02-10 22:10 ` [PATCH v2 21/21] tests/acceptance: add a new tests to detect counting errors Alex Bennée
  2021-02-11 10:24   ` Philippe Mathieu-Daudé
@ 2021-02-11 19:56   ` Wainer dos Santos Moschetta
  1 sibling, 0 replies; 53+ messages in thread
From: Wainer dos Santos Moschetta @ 2021-02-11 19:56 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: robhenry, mahmoudabdalghany, aaron, cota, Cleber Rosa,
	kuhn.chenqun, Philippe Mathieu-Daudé


On 2/10/21 7:10 PM, Alex Bennée wrote:
> The insn plugin has a simple heuristic to detect if an instruction is
> detected running twice in a row. Check the plugin log after the run
> and pass accordingly.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Message-Id: <20210209182749.31323-13-alex.bennee@linaro.org>
> ---
>   tests/acceptance/tcg_plugins.py | 31 +++++++++++++++++++++++++++++++
>   1 file changed, 31 insertions(+)

Likewise,

Tested-by: Wainer dos Santos Moschetta <wainersm@redhat.com>

>
> diff --git a/tests/acceptance/tcg_plugins.py b/tests/acceptance/tcg_plugins.py
> index b512979769..acab599505 100644
> --- a/tests/acceptance/tcg_plugins.py
> +++ b/tests/acceptance/tcg_plugins.py
> @@ -101,3 +101,34 @@ class PluginKernelNormal(PluginKernelBase):
>               else:
>                   logger.debug("Failed to find instruction count")
>                   self.fail
> +
> +    def test_aarch64_virt_insn_icount(self):
> +        """
> +        :avocado: tags=accel:tcg
> +        :avocado: tags=arch:aarch64
> +        :avocado: tags=machine:virt
> +        :avocado: tags=cpu:cortex-a57
> +        """
> +        kernel_path = self._grab_aarch64_kernel()
> +        kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE +
> +                               'console=ttyAMA0')
> +        console_pattern = 'Kernel panic - not syncing: VFS:'
> +
> +        plugin_log = tempfile.NamedTemporaryFile(mode="r+t", prefix="plugin",
> +                                                 suffix=".log", delete=False)


In case you find it useful to save the file in the Avocado's tests logs 
directory for debugging, just use the `self.outputdir` property:

diff --git a/tests/acceptance/tcg_plugins.py 
b/tests/acceptance/tcg_plugins.py
index acab599505..da5c8ae267 100644
--- a/tests/acceptance/tcg_plugins.py
+++ b/tests/acceptance/tcg_plugins.py
@@ -13,6 +13,7 @@ import logging
  import time
diff --git a/tests/acceptance/tcg_plugins.py 
b/tests/acceptance/tcg_plugins.py
index acab599505..da5c8ae267 100644
--- a/tests/acceptance/tcg_plugins.py
+++ b/tests/acceptance/tcg_plugins.py
@@ -13,6 +13,7 @@ import logging
  import time
  import tempfile
  import mmap
+import os
  import re

  from boot_linux_console import LinuxKernelTest
@@ -114,17 +115,16 @@ class PluginKernelNormal(PluginKernelBase):
                                 'console=ttyAMA0')
          console_pattern = 'Kernel panic - not syncing: VFS:'

-        plugin_log = tempfile.NamedTemporaryFile(mode="r+t", 
prefix="plugin",
-                                                 suffix=".log", 
delete=False)
+        plugin_log = os.path.join(self.outputdir, "plugin.log")

          self.run_vm(kernel_path, kernel_command_line,
-                    "tests/plugin/libinsn.so", plugin_log.name,
+                    "tests/plugin/libinsn.so", plugin_log,
                      console_pattern,
                      args=('-cpu', 'cortex-a53', '-icount', 'shift=1'))

          logger = logging.getLogger()

-        with plugin_log as lf, \
+        with open(plugin_log, 'rt') as lf, \
               mmap.mmap(lf.fileno(), 0, access=mmap.ACCESS_READ) as s:
              m = re.search(br"detected repeat execution @ 
(?P<addr>0x[0-9A-Fa-f]+)", s)
              if m is not None and "addr" in m.groupdict():
---

> +
> +        self.run_vm(kernel_path, kernel_command_line,
> +                    "tests/plugin/libinsn.so", plugin_log.name,
> +                    console_pattern,
> +                    args=('-cpu', 'cortex-a53', '-icount', 'shift=1'))
> +
> +        logger = logging.getLogger()
> +
> +        with plugin_log as lf, \
> +             mmap.mmap(lf.fileno(), 0, access=mmap.ACCESS_READ) as s:
> +            m = re.search(br"detected repeat execution @ (?P<addr>0x[0-9A-Fa-f]+)", s)
> +            if m is not None and "addr" in m.groupdict():
> +                logger.debug("detected repeat instructions")
> +                self.fail("detected repeated instructions")
> +            else:
> +                logger.debug("no repeats detected: %s", m)



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 17/21] accel/tcg: cache single instruction TB on pending replay exception
  2021-02-11 19:12   ` Richard Henderson
@ 2021-02-11 20:00     ` Alex Bennée
  0 siblings, 0 replies; 53+ messages in thread
From: Alex Bennée @ 2021-02-11 20:00 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-devel, robhenry, mahmoudabdalghany, aaron, cota,
	Paolo Bonzini, kuhn.chenqun


Richard Henderson <richard.henderson@linaro.org> writes:

> On 2/10/21 2:10 PM, Alex Bennée wrote:
>> Again there is no reason to jump through the nocache hoops to execute
>> a single instruction block. We do have to add an additional wrinkle to
>> the cpu_handle_interrupt case to ensure we let through a TB where we
>> have specifically disabled icount for the block.
>
> Can you say more about this?  Because...
>
>>      if (unlikely(qatomic_read(&cpu->exit_request))
>>          || (icount_enabled()
>> +            && (cpu->cflags_next_tb == -1 || cpu->cflags_next_tb & CF_USE_ICOUNT)
>>              && cpu_neg(cpu)->icount_decr.u16.low + cpu->icount_extra == 0)) {
>
> ... this does not appear to match.  You're checking that icount has been
> explicitly *enabled*?

If icount has been enabled and we are using the default cflags or
enabled and we have the explicit CF_ICOUNT. The replay exception leg
explicitly disables icount because otherwise we'd never actually execute
the block because we have a budget of 0 cycles left. Previously we ran
that block at the exception handling point - now we fall through and
have to make sure we don't trigger an IRQ.

> Or am I reading the logic backward and only if icount is
> enabled will we take EXCP_INTERRUPT?

Or I guess we have an exit_request which hasn't been handled yet but
there is no EXCP_ pending.

>
>
> r~


-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH  v2 20/21] accel/tcg: allow plugin instrumentation to be disable via cflags
  2021-02-10 22:10 ` [PATCH v2 20/21] accel/tcg: allow plugin instrumentation to be disable via cflags Alex Bennée
@ 2021-02-12  0:53   ` Aaron Lindsay via
  2021-02-12 11:22     ` Alex Bennée
  2021-02-12 14:43     ` Alex Bennée
  0 siblings, 2 replies; 53+ messages in thread
From: Aaron Lindsay via @ 2021-02-12  0:53 UTC (permalink / raw)
  To: Alex Bennée
  Cc: qemu-devel, cota, kuhn.chenqun, robhenry, mahmoudabdalghany,
	Richard Henderson, Paolo Bonzini

On Feb 10 22:10, Alex Bennée wrote:
> When icount is enabled and we recompile an MMIO access we end up
> double counting the instruction execution. To avoid this we introduce
> the CF_NOINSTR cflag which disables instrumentation for the next TB.
> As this is part of the hashed compile flags we will only execute the
> generated TB while coming out of a cpu_io_recompile.

Unfortunately this patch works a little too well!

With this change, the memory access callbacks registered via
`qemu_plugin_register_vcpu_mem_cb()` are never called for the
re-translated instruction making the IO access, since we've disabled all
instrumentation.

Is it possible to selectively disable only instruction callbacks using
this mechanism, while still allowing others that would not yet have been
called for the re-translated instruction?

-Aaron

> While we are at it delete the old TODO. We might as well keep the
> translation handy as it's likely you will repeatedly hit it on each
> MMIO access.
> 
> Reported-by: Aaron Lindsay <aaron@os.amperecomputing.com>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> Message-Id: <20210209182749.31323-12-alex.bennee@linaro.org>
> 
> ---
> v2
>   - squashed CH_HASHMASK to ~CF_INVALID
> ---
>  include/exec/exec-all.h   |  6 +++---
>  accel/tcg/translate-all.c | 17 ++++++++---------
>  accel/tcg/translator.c    |  2 +-
>  3 files changed, 12 insertions(+), 13 deletions(-)
> 
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index e08179de34..299282cc59 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -454,14 +454,14 @@ struct TranslationBlock {
>      uint32_t cflags;    /* compile flags */
>  #define CF_COUNT_MASK  0x00007fff
>  #define CF_LAST_IO     0x00008000 /* Last insn may be an IO access.  */
> +#define CF_NOINSTR     0x00010000 /* Disable instrumentation of TB */
>  #define CF_USE_ICOUNT  0x00020000
>  #define CF_INVALID     0x00040000 /* TB is stale. Set with @jmp_lock held */
>  #define CF_PARALLEL    0x00080000 /* Generate code for a parallel context */
>  #define CF_CLUSTER_MASK 0xff000000 /* Top 8 bits are cluster ID */
>  #define CF_CLUSTER_SHIFT 24
> -/* cflags' mask for hashing/comparison */
> -#define CF_HASH_MASK   \
> -    (CF_COUNT_MASK | CF_LAST_IO | CF_USE_ICOUNT | CF_PARALLEL | CF_CLUSTER_MASK)
> +/* cflags' mask for hashing/comparison, basically ignore CF_INVALID */
> +#define CF_HASH_MASK   (~CF_INVALID)
>  
>      /* Per-vCPU dynamic tracing state used to generate this TB */
>      uint32_t trace_vcpu_dstate;
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index 0666f9ef14..32a3d8fe24 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -2399,7 +2399,8 @@ void tb_check_watchpoint(CPUState *cpu, uintptr_t retaddr)
>  }
>  
>  #ifndef CONFIG_USER_ONLY
> -/* in deterministic execution mode, instructions doing device I/Os
> +/*
> + * In deterministic execution mode, instructions doing device I/Os
>   * must be at the end of the TB.
>   *
>   * Called by softmmu_template.h, with iothread mutex not held.
> @@ -2430,19 +2431,17 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
>          n = 2;
>      }
>  
> -    /* Generate a new TB executing the I/O insn.  */
> -    cpu->cflags_next_tb = curr_cflags() | CF_LAST_IO | n;
> +    /*
> +     * Exit the loop and potentially generate a new TB executing the
> +     * just the I/O insns. We also disable instrumentation so we don't
> +     * double count the instruction.
> +     */
> +    cpu->cflags_next_tb = curr_cflags() | CF_NOINSTR | CF_LAST_IO | n;
>  
>      qemu_log_mask_and_addr(CPU_LOG_EXEC, tb->pc,
>                             "cpu_io_recompile: rewound execution of TB to "
>                             TARGET_FMT_lx "\n", tb->pc);
>  
> -    /* TODO: If env->pc != tb->pc (i.e. the faulting instruction was not
> -     * the first in the TB) then we end up generating a whole new TB and
> -     *  repeating the fault, which is horribly inefficient.
> -     *  Better would be to execute just this insn uncached, or generate a
> -     *  second new TB.
> -     */
>      cpu_loop_exit_noexc(cpu);
>  }
>  
> diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
> index a49a794065..14d1ea795d 100644
> --- a/accel/tcg/translator.c
> +++ b/accel/tcg/translator.c
> @@ -58,7 +58,7 @@ void translator_loop(const TranslatorOps *ops, DisasContextBase *db,
>      ops->tb_start(db, cpu);
>      tcg_debug_assert(db->is_jmp == DISAS_NEXT);  /* no early exit */
>  
> -    plugin_enabled = plugin_gen_tb_start(cpu, tb);
> +    plugin_enabled = !(tb_cflags(db->tb) & CF_NOINSTR) && plugin_gen_tb_start(cpu, tb);
>  
>      while (true) {
>          db->num_insns++;
> -- 
> 2.20.1
> 


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH  v2 20/21] accel/tcg: allow plugin instrumentation to be disable via cflags
  2021-02-12  0:53   ` Aaron Lindsay via
@ 2021-02-12 11:22     ` Alex Bennée
  2021-02-12 14:31       ` Aaron Lindsay via
  2021-02-12 14:43     ` Alex Bennée
  1 sibling, 1 reply; 53+ messages in thread
From: Alex Bennée @ 2021-02-12 11:22 UTC (permalink / raw)
  To: Aaron Lindsay
  Cc: Richard Henderson, qemu-devel, robhenry, mahmoudabdalghany, cota,
	Paolo Bonzini, kuhn.chenqun


Aaron Lindsay <aaron@os.amperecomputing.com> writes:

> On Feb 10 22:10, Alex Bennée wrote:
>> When icount is enabled and we recompile an MMIO access we end up
>> double counting the instruction execution. To avoid this we introduce
>> the CF_NOINSTR cflag which disables instrumentation for the next TB.
>> As this is part of the hashed compile flags we will only execute the
>> generated TB while coming out of a cpu_io_recompile.
>
> Unfortunately this patch works a little too well!
>
> With this change, the memory access callbacks registered via
> `qemu_plugin_register_vcpu_mem_cb()` are never called for the
> re-translated instruction making the IO access, since we've disabled all
> instrumentation.

Hmm well we correctly don't instrument stores (as we have already
executed the plugin for them) - but of course the load instrumentation
is after the fact so we are now missing them.


> Is it possible to selectively disable only instruction callbacks using
> this mechanism, while still allowing others that would not yet have been
> called for the re-translated instruction?

Hmmm let me see if I can finesse the CF_NOINSTR logic to allow
plugin_gen_insn_end() without the rest? It probably needs a better name
for the flag as well. 

>
> -Aaron
>
>> While we are at it delete the old TODO. We might as well keep the
>> translation handy as it's likely you will repeatedly hit it on each
>> MMIO access.
>> 
>> Reported-by: Aaron Lindsay <aaron@os.amperecomputing.com>
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
>> Message-Id: <20210209182749.31323-12-alex.bennee@linaro.org>
>> 
>> ---
>> v2
>>   - squashed CH_HASHMASK to ~CF_INVALID
>> ---
>>  include/exec/exec-all.h   |  6 +++---
>>  accel/tcg/translate-all.c | 17 ++++++++---------
>>  accel/tcg/translator.c    |  2 +-
>>  3 files changed, 12 insertions(+), 13 deletions(-)
>> 
>> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
>> index e08179de34..299282cc59 100644
>> --- a/include/exec/exec-all.h
>> +++ b/include/exec/exec-all.h
>> @@ -454,14 +454,14 @@ struct TranslationBlock {
>>      uint32_t cflags;    /* compile flags */
>>  #define CF_COUNT_MASK  0x00007fff
>>  #define CF_LAST_IO     0x00008000 /* Last insn may be an IO access.  */
>> +#define CF_NOINSTR     0x00010000 /* Disable instrumentation of TB */
>>  #define CF_USE_ICOUNT  0x00020000
>>  #define CF_INVALID     0x00040000 /* TB is stale. Set with @jmp_lock held */
>>  #define CF_PARALLEL    0x00080000 /* Generate code for a parallel context */
>>  #define CF_CLUSTER_MASK 0xff000000 /* Top 8 bits are cluster ID */
>>  #define CF_CLUSTER_SHIFT 24
>> -/* cflags' mask for hashing/comparison */
>> -#define CF_HASH_MASK   \
>> -    (CF_COUNT_MASK | CF_LAST_IO | CF_USE_ICOUNT | CF_PARALLEL | CF_CLUSTER_MASK)
>> +/* cflags' mask for hashing/comparison, basically ignore CF_INVALID */
>> +#define CF_HASH_MASK   (~CF_INVALID)
>>  
>>      /* Per-vCPU dynamic tracing state used to generate this TB */
>>      uint32_t trace_vcpu_dstate;
>> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
>> index 0666f9ef14..32a3d8fe24 100644
>> --- a/accel/tcg/translate-all.c
>> +++ b/accel/tcg/translate-all.c
>> @@ -2399,7 +2399,8 @@ void tb_check_watchpoint(CPUState *cpu, uintptr_t retaddr)
>>  }
>>  
>>  #ifndef CONFIG_USER_ONLY
>> -/* in deterministic execution mode, instructions doing device I/Os
>> +/*
>> + * In deterministic execution mode, instructions doing device I/Os
>>   * must be at the end of the TB.
>>   *
>>   * Called by softmmu_template.h, with iothread mutex not held.
>> @@ -2430,19 +2431,17 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
>>          n = 2;
>>      }
>>  
>> -    /* Generate a new TB executing the I/O insn.  */
>> -    cpu->cflags_next_tb = curr_cflags() | CF_LAST_IO | n;
>> +    /*
>> +     * Exit the loop and potentially generate a new TB executing the
>> +     * just the I/O insns. We also disable instrumentation so we don't
>> +     * double count the instruction.
>> +     */
>> +    cpu->cflags_next_tb = curr_cflags() | CF_NOINSTR | CF_LAST_IO | n;
>>  
>>      qemu_log_mask_and_addr(CPU_LOG_EXEC, tb->pc,
>>                             "cpu_io_recompile: rewound execution of TB to "
>>                             TARGET_FMT_lx "\n", tb->pc);
>>  
>> -    /* TODO: If env->pc != tb->pc (i.e. the faulting instruction was not
>> -     * the first in the TB) then we end up generating a whole new TB and
>> -     *  repeating the fault, which is horribly inefficient.
>> -     *  Better would be to execute just this insn uncached, or generate a
>> -     *  second new TB.
>> -     */
>>      cpu_loop_exit_noexc(cpu);
>>  }
>>  
>> diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
>> index a49a794065..14d1ea795d 100644
>> --- a/accel/tcg/translator.c
>> +++ b/accel/tcg/translator.c
>> @@ -58,7 +58,7 @@ void translator_loop(const TranslatorOps *ops, DisasContextBase *db,
>>      ops->tb_start(db, cpu);
>>      tcg_debug_assert(db->is_jmp == DISAS_NEXT);  /* no early exit */
>>  
>> -    plugin_enabled = plugin_gen_tb_start(cpu, tb);
>> +    plugin_enabled = !(tb_cflags(db->tb) & CF_NOINSTR) && plugin_gen_tb_start(cpu, tb);
>>  
>>      while (true) {
>>          db->num_insns++;
>> -- 
>> 2.20.1
>> 


-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH  v2 20/21] accel/tcg: allow plugin instrumentation to be disable via cflags
  2021-02-12 11:22     ` Alex Bennée
@ 2021-02-12 14:31       ` Aaron Lindsay via
  2021-02-12 14:59         ` Alex Bennée
  0 siblings, 1 reply; 53+ messages in thread
From: Aaron Lindsay via @ 2021-02-12 14:31 UTC (permalink / raw)
  To: Alex Bennée
  Cc: qemu-devel, cota, kuhn.chenqun, robhenry, mahmoudabdalghany,
	Richard Henderson, Paolo Bonzini

On Feb 12 11:22, Alex Bennée wrote:
> Aaron Lindsay <aaron@os.amperecomputing.com> writes:
> > On Feb 10 22:10, Alex Bennée wrote:
> >> When icount is enabled and we recompile an MMIO access we end up
> >> double counting the instruction execution. To avoid this we introduce
> >> the CF_NOINSTR cflag which disables instrumentation for the next TB.
> >> As this is part of the hashed compile flags we will only execute the
> >> generated TB while coming out of a cpu_io_recompile.
> >
> > Unfortunately this patch works a little too well!
> >
> > With this change, the memory access callbacks registered via
> > `qemu_plugin_register_vcpu_mem_cb()` are never called for the
> > re-translated instruction making the IO access, since we've disabled all
> > instrumentation.
> 
> Hmm well we correctly don't instrument stores (as we have already
> executed the plugin for them) - but of course the load instrumentation
> is after the fact so we are now missing them.

I do not believe I am seeing memory callbacks for stores, either. Are
you saying I definitely should be?

My original observation was that the callbacks for store instructions to
IO followed the same pattern as loads:

1) Initial instruction callback (presumably as part of larger block)
2) Second instruction callback (presumably as part of single-instruction block)
3) Memory callback (presumably as part of single-instruction block)

After applying v2 of your patchset I now see only 1), even for stores.

> > Is it possible to selectively disable only instruction callbacks using
> > this mechanism, while still allowing others that would not yet have been
> > called for the re-translated instruction?
> 
> Hmmm let me see if I can finesse the CF_NOINSTR logic to allow
> plugin_gen_insn_end() without the rest? It probably needs a better name
> for the flag as well. 

Funny, the first time reading through this patch I was unsure for a
second whether "CF_NOINSTR" stood for "NO INSTRuction callbacks" or "NO
INSTRumentation"!

-Aaron


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH  v2 20/21] accel/tcg: allow plugin instrumentation to be disable via cflags
  2021-02-12  0:53   ` Aaron Lindsay via
  2021-02-12 11:22     ` Alex Bennée
@ 2021-02-12 14:43     ` Alex Bennée
  2021-02-12 15:41       ` Aaron Lindsay via
  2021-02-12 16:00       ` Alex Bennée
  1 sibling, 2 replies; 53+ messages in thread
From: Alex Bennée @ 2021-02-12 14:43 UTC (permalink / raw)
  To: Aaron Lindsay
  Cc: Richard Henderson, qemu-devel, robhenry, mahmoudabdalghany, cota,
	Paolo Bonzini, kuhn.chenqun


Aaron Lindsay <aaron@os.amperecomputing.com> writes:

> On Feb 10 22:10, Alex Bennée wrote:
>> When icount is enabled and we recompile an MMIO access we end up
>> double counting the instruction execution. To avoid this we introduce
>> the CF_NOINSTR cflag which disables instrumentation for the next TB.
>> As this is part of the hashed compile flags we will only execute the
>> generated TB while coming out of a cpu_io_recompile.
>
> Unfortunately this patch works a little too well!
>
> With this change, the memory access callbacks registered via
> `qemu_plugin_register_vcpu_mem_cb()` are never called for the
> re-translated instruction making the IO access, since we've disabled all
> instrumentation.
>
> Is it possible to selectively disable only instruction callbacks using
> this mechanism, while still allowing others that would not yet have been
> called for the re-translated instruction?

Can you try the following fugly patch on top of this series:

--8<---------------cut here---------------start------------->8---
diff --git a/include/exec/plugin-gen.h b/include/exec/plugin-gen.h
index 4834a9e2f4..b1b72b5d90 100644
--- a/include/exec/plugin-gen.h
+++ b/include/exec/plugin-gen.h
@@ -19,7 +19,7 @@ struct DisasContextBase;
 
 #ifdef CONFIG_PLUGIN
 
-bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb);
+bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb, bool supress);
 void plugin_gen_tb_end(CPUState *cpu);
 void plugin_gen_insn_start(CPUState *cpu, const struct DisasContextBase *db);
 void plugin_gen_insn_end(void);
@@ -41,7 +41,7 @@ static inline void plugin_insn_append(const void *from, size_t size)
 #else /* !CONFIG_PLUGIN */
 
 static inline
-bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb)
+bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb, bool supress)
 {
     return false;
 }
diff --git a/include/qemu/plugin.h b/include/qemu/plugin.h
index 841deed79c..2a26a2277f 100644
--- a/include/qemu/plugin.h
+++ b/include/qemu/plugin.h
@@ -92,6 +92,7 @@ struct qemu_plugin_dyn_cb {
     };
 };
 
+/* Internal context for instrumenting an instruction */
 struct qemu_plugin_insn {
     GByteArray *data;
     uint64_t vaddr;
@@ -99,6 +100,7 @@ struct qemu_plugin_insn {
     GArray *cbs[PLUGIN_N_CB_TYPES][PLUGIN_N_CB_SUBTYPES];
     bool calls_helpers;
     bool mem_helper;
+    bool store_only;
 };
 
 /*
@@ -128,6 +130,7 @@ static inline struct qemu_plugin_insn *qemu_plugin_insn_alloc(void)
     return insn;
 }
 
+/* Internal context for this TranslationBlock */
 struct qemu_plugin_tb {
     GPtrArray *insns;
     size_t n;
@@ -135,6 +138,7 @@ struct qemu_plugin_tb {
     uint64_t vaddr2;
     void *haddr1;
     void *haddr2;
+    bool store_only;
     GArray *cbs[PLUGIN_N_CB_SUBTYPES];
 };
 
diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
index 8a1bb801e0..137b91282e 100644
--- a/accel/tcg/plugin-gen.c
+++ b/accel/tcg/plugin-gen.c
@@ -842,7 +842,7 @@ static void plugin_gen_inject(const struct qemu_plugin_tb *plugin_tb)
     pr_ops();
 }
 
-bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb)
+bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb, bool store_only)
 {
     struct qemu_plugin_tb *ptb = tcg_ctx->plugin_tb;
     bool ret = false;
@@ -855,6 +855,7 @@ bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb)
         ptb->vaddr2 = -1;
         get_page_addr_code_hostp(cpu->env_ptr, tb->pc, &ptb->haddr1);
         ptb->haddr2 = NULL;
+        ptb->store_only = store_only;
 
         plugin_gen_empty_callback(PLUGIN_GEN_FROM_TB);
     }
diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
index 14d1ea795d..082f2c8ee1 100644
--- a/accel/tcg/translator.c
+++ b/accel/tcg/translator.c
@@ -58,7 +58,7 @@ void translator_loop(const TranslatorOps *ops, DisasContextBase *db,
     ops->tb_start(db, cpu);
     tcg_debug_assert(db->is_jmp == DISAS_NEXT);  /* no early exit */
 
-    plugin_enabled = !(tb_cflags(db->tb) & CF_NOINSTR) && plugin_gen_tb_start(cpu, tb);
+    plugin_enabled = plugin_gen_tb_start(cpu, tb, tb_cflags(db->tb) & CF_NOINSTR);
 
     while (true) {
         db->num_insns++;
@@ -100,6 +100,8 @@ void translator_loop(const TranslatorOps *ops, DisasContextBase *db,
             gen_io_start();
             ops->translate_insn(db, cpu);
         } else {
+            /* we should only see NOINSTR for io_recompile */
+            g_assert(!(tb_cflags(db->tb) & CF_NOINSTR));
             ops->translate_insn(db, cpu);
         }
 
diff --git a/plugins/api.c b/plugins/api.c
index 5dc8e6f934..ac8475707d 100644
--- a/plugins/api.c
+++ b/plugins/api.c
@@ -84,15 +84,19 @@ void qemu_plugin_register_vcpu_tb_exec_cb(struct qemu_plugin_tb *tb,
                                           enum qemu_plugin_cb_flags flags,
                                           void *udata)
 {
-    plugin_register_dyn_cb__udata(&tb->cbs[PLUGIN_CB_REGULAR],
-                                  cb, flags, udata);
+    if (!tb->store_only) {
+        plugin_register_dyn_cb__udata(&tb->cbs[PLUGIN_CB_REGULAR],
+                                      cb, flags, udata);
+    }
 }
 
 void qemu_plugin_register_vcpu_tb_exec_inline(struct qemu_plugin_tb *tb,
                                               enum qemu_plugin_op op,
                                               void *ptr, uint64_t imm)
 {
-    plugin_register_inline_op(&tb->cbs[PLUGIN_CB_INLINE], 0, op, ptr, imm);
+    if (!tb->store_only) {
+        plugin_register_inline_op(&tb->cbs[PLUGIN_CB_INLINE], 0, op, ptr, imm);
+    }
 }
 
 void qemu_plugin_register_vcpu_insn_exec_cb(struct qemu_plugin_insn *insn,
@@ -100,16 +104,20 @@ void qemu_plugin_register_vcpu_insn_exec_cb(struct qemu_plugin_insn *insn,
                                             enum qemu_plugin_cb_flags flags,
                                             void *udata)
 {
-    plugin_register_dyn_cb__udata(&insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_REGULAR],
-        cb, flags, udata);
+    if (!insn->store_only) {
+        plugin_register_dyn_cb__udata(&insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_REGULAR],
+                                      cb, flags, udata);
+    }
 }
 
 void qemu_plugin_register_vcpu_insn_exec_inline(struct qemu_plugin_insn *insn,
                                                 enum qemu_plugin_op op,
                                                 void *ptr, uint64_t imm)
 {
-    plugin_register_inline_op(&insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_INLINE],
-                              0, op, ptr, imm);
+    if (!insn->store_only) {
+        plugin_register_inline_op(&insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_INLINE],
+                                  0, op, ptr, imm);
+    }
 }
 
 
@@ -120,8 +128,13 @@ void qemu_plugin_register_vcpu_mem_cb(struct qemu_plugin_insn *insn,
                                       enum qemu_plugin_mem_rw rw,
                                       void *udata)
 {
-    plugin_register_vcpu_mem_cb(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR],
-                                cb, flags, rw, udata);
+    if (insn->store_only && (rw & QEMU_PLUGIN_MEM_W)) {
+        plugin_register_vcpu_mem_cb(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR],
+                                    cb, flags, QEMU_PLUGIN_MEM_W, udata);
+    } else {
+        plugin_register_vcpu_mem_cb(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR],
+                                    cb, flags, rw, udata);
+    }
 }
 
 void qemu_plugin_register_vcpu_mem_inline(struct qemu_plugin_insn *insn,
@@ -129,8 +142,10 @@ void qemu_plugin_register_vcpu_mem_inline(struct qemu_plugin_insn *insn,
                                           enum qemu_plugin_op op, void *ptr,
                                           uint64_t imm)
 {
-    plugin_register_inline_op(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_INLINE],
-        rw, op, ptr, imm);
+    if (!insn->store_only) {
+        plugin_register_inline_op(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_INLINE],
+                                  rw, op, ptr, imm);
+    }
 }
 
 void qemu_plugin_register_vcpu_tb_trans_cb(qemu_plugin_id_t id,
@@ -181,10 +196,13 @@ uint64_t qemu_plugin_tb_vaddr(const struct qemu_plugin_tb *tb)
 struct qemu_plugin_insn *
 qemu_plugin_tb_get_insn(const struct qemu_plugin_tb *tb, size_t idx)
 {
+    struct qemu_plugin_insn *insn;
     if (unlikely(idx >= tb->n)) {
         return NULL;
     }
-    return g_ptr_array_index(tb->insns, idx);
+    insn = g_ptr_array_index(tb->insns, idx);
+    insn->store_only = tb->store_only;
+    return insn;
 }
 
 /*
--8<---------------cut here---------------end--------------->8---

-- 
Alex Bennée


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH  v2 20/21] accel/tcg: allow plugin instrumentation to be disable via cflags
  2021-02-12 14:31       ` Aaron Lindsay via
@ 2021-02-12 14:59         ` Alex Bennée
  0 siblings, 0 replies; 53+ messages in thread
From: Alex Bennée @ 2021-02-12 14:59 UTC (permalink / raw)
  To: Aaron Lindsay
  Cc: Richard Henderson, qemu-devel, robhenry, mahmoudabdalghany, cota,
	Paolo Bonzini, kuhn.chenqun


Aaron Lindsay <aaron@os.amperecomputing.com> writes:

> On Feb 12 11:22, Alex Bennée wrote:
>> Aaron Lindsay <aaron@os.amperecomputing.com> writes:
>> > On Feb 10 22:10, Alex Bennée wrote:
>> >> When icount is enabled and we recompile an MMIO access we end up
>> >> double counting the instruction execution. To avoid this we introduce
>> >> the CF_NOINSTR cflag which disables instrumentation for the next TB.
>> >> As this is part of the hashed compile flags we will only execute the
>> >> generated TB while coming out of a cpu_io_recompile.
>> >
>> > Unfortunately this patch works a little too well!
>> >
>> > With this change, the memory access callbacks registered via
>> > `qemu_plugin_register_vcpu_mem_cb()` are never called for the
>> > re-translated instruction making the IO access, since we've disabled all
>> > instrumentation.
>> 
>> Hmm well we correctly don't instrument stores (as we have already
>> executed the plugin for them) - but of course the load instrumentation
>> is after the fact so we are now missing them.
>
> I do not believe I am seeing memory callbacks for stores, either. Are
> you saying I definitely should be?
>
> My original observation was that the callbacks for store instructions to
> IO followed the same pattern as loads:
>
> 1) Initial instruction callback (presumably as part of larger block)
> 2) Second instruction callback (presumably as part of single-instruction block)
> 3) Memory callback (presumably as part of single-instruction block)
>
> After applying v2 of your patchset I now see only 1), even for stores.

Right - but any pre-instruction instrumentation shouldn't be done in the
(now badly names CF_NOINSTR) case. It's also confusing because we have
pre and post helpers and inline callbacks are always pre (you can only
count so don't see data).

Can you check the patch in my other email and see if that works better?

>
>> > Is it possible to selectively disable only instruction callbacks using
>> > this mechanism, while still allowing others that would not yet have been
>> > called for the re-translated instruction?
>> 
>> Hmmm let me see if I can finesse the CF_NOINSTR logic to allow
>> plugin_gen_insn_end() without the rest? It probably needs a better name
>> for the flag as well. 
>
> Funny, the first time reading through this patch I was unsure for a
> second whether "CF_NOINSTR" stood for "NO INSTRuction callbacks" or "NO
> INSTRumentation"!
>
> -Aaron


-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 16/21] accel/tcg: actually cache our partial icount TB
  2021-02-11 18:48     ` Richard Henderson
@ 2021-02-12 15:40       ` Philippe Mathieu-Daudé
  2021-02-12 17:06         ` Alex Bennée
  0 siblings, 1 reply; 53+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-02-12 15:40 UTC (permalink / raw)
  To: Richard Henderson, Alex Bennée, qemu-devel
  Cc: robhenry, mahmoudabdalghany, aaron, cota, Paolo Bonzini, kuhn.chenqun

On 2/11/21 7:48 PM, Richard Henderson wrote:
> On 2/11/21 2:21 AM, Philippe Mathieu-Daudé wrote:
>>> -    insns_left = MIN(0xffff, cpu->icount_budget);
>>> +    insns_left = MIN(CF_COUNT_MASK, cpu->icount_budget);
>>
>> Can you describe this change a bit please?
> 
> Replacing a magic number with its proper symbol.

I am confuse because I see:

#define CF_COUNT_MASK  0x00007fff


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH  v2 20/21] accel/tcg: allow plugin instrumentation to be disable via cflags
  2021-02-12 14:43     ` Alex Bennée
@ 2021-02-12 15:41       ` Aaron Lindsay via
  2021-02-12 16:04         ` Alex Bennée
  2021-02-12 16:00       ` Alex Bennée
  1 sibling, 1 reply; 53+ messages in thread
From: Aaron Lindsay via @ 2021-02-12 15:41 UTC (permalink / raw)
  To: Alex Bennée
  Cc: qemu-devel, cota, kuhn.chenqun, robhenry, mahmoudabdalghany,
	Richard Henderson, Paolo Bonzini

On Feb 12 14:43, Alex Bennée wrote:
> Aaron Lindsay <aaron@os.amperecomputing.com> writes:
> > On Feb 10 22:10, Alex Bennée wrote:
> >> When icount is enabled and we recompile an MMIO access we end up
> >> double counting the instruction execution. To avoid this we introduce
> >> the CF_NOINSTR cflag which disables instrumentation for the next TB.
> >> As this is part of the hashed compile flags we will only execute the
> >> generated TB while coming out of a cpu_io_recompile.
> >
> > Unfortunately this patch works a little too well!
> >
> > With this change, the memory access callbacks registered via
> > `qemu_plugin_register_vcpu_mem_cb()` are never called for the
> > re-translated instruction making the IO access, since we've disabled all
> > instrumentation.
> >
> > Is it possible to selectively disable only instruction callbacks using
> > this mechanism, while still allowing others that would not yet have been
> > called for the re-translated instruction?
> 
> Can you try the following fugly patch on top of this series:

This patch does allow me to successfully observe memory callbacks for
stores in this case. It seems from looking at the patch that you
intentionally only allowed memory callbacks for stores in this case, and
I still don't see callbacks any for loads.

-Aaron

> --8<---------------cut here---------------start------------->8---
> diff --git a/include/exec/plugin-gen.h b/include/exec/plugin-gen.h
> index 4834a9e2f4..b1b72b5d90 100644
> --- a/include/exec/plugin-gen.h
> +++ b/include/exec/plugin-gen.h
> @@ -19,7 +19,7 @@ struct DisasContextBase;
>  
>  #ifdef CONFIG_PLUGIN
>  
> -bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb);
> +bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb, bool supress);
>  void plugin_gen_tb_end(CPUState *cpu);
>  void plugin_gen_insn_start(CPUState *cpu, const struct DisasContextBase *db);
>  void plugin_gen_insn_end(void);
> @@ -41,7 +41,7 @@ static inline void plugin_insn_append(const void *from, size_t size)
>  #else /* !CONFIG_PLUGIN */
>  
>  static inline
> -bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb)
> +bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb, bool supress)
>  {
>      return false;
>  }
> diff --git a/include/qemu/plugin.h b/include/qemu/plugin.h
> index 841deed79c..2a26a2277f 100644
> --- a/include/qemu/plugin.h
> +++ b/include/qemu/plugin.h
> @@ -92,6 +92,7 @@ struct qemu_plugin_dyn_cb {
>      };
>  };
>  
> +/* Internal context for instrumenting an instruction */
>  struct qemu_plugin_insn {
>      GByteArray *data;
>      uint64_t vaddr;
> @@ -99,6 +100,7 @@ struct qemu_plugin_insn {
>      GArray *cbs[PLUGIN_N_CB_TYPES][PLUGIN_N_CB_SUBTYPES];
>      bool calls_helpers;
>      bool mem_helper;
> +    bool store_only;
>  };
>  
>  /*
> @@ -128,6 +130,7 @@ static inline struct qemu_plugin_insn *qemu_plugin_insn_alloc(void)
>      return insn;
>  }
>  
> +/* Internal context for this TranslationBlock */
>  struct qemu_plugin_tb {
>      GPtrArray *insns;
>      size_t n;
> @@ -135,6 +138,7 @@ struct qemu_plugin_tb {
>      uint64_t vaddr2;
>      void *haddr1;
>      void *haddr2;
> +    bool store_only;
>      GArray *cbs[PLUGIN_N_CB_SUBTYPES];
>  };
>  
> diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
> index 8a1bb801e0..137b91282e 100644
> --- a/accel/tcg/plugin-gen.c
> +++ b/accel/tcg/plugin-gen.c
> @@ -842,7 +842,7 @@ static void plugin_gen_inject(const struct qemu_plugin_tb *plugin_tb)
>      pr_ops();
>  }
>  
> -bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb)
> +bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb, bool store_only)
>  {
>      struct qemu_plugin_tb *ptb = tcg_ctx->plugin_tb;
>      bool ret = false;
> @@ -855,6 +855,7 @@ bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb)
>          ptb->vaddr2 = -1;
>          get_page_addr_code_hostp(cpu->env_ptr, tb->pc, &ptb->haddr1);
>          ptb->haddr2 = NULL;
> +        ptb->store_only = store_only;
>  
>          plugin_gen_empty_callback(PLUGIN_GEN_FROM_TB);
>      }
> diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
> index 14d1ea795d..082f2c8ee1 100644
> --- a/accel/tcg/translator.c
> +++ b/accel/tcg/translator.c
> @@ -58,7 +58,7 @@ void translator_loop(const TranslatorOps *ops, DisasContextBase *db,
>      ops->tb_start(db, cpu);
>      tcg_debug_assert(db->is_jmp == DISAS_NEXT);  /* no early exit */
>  
> -    plugin_enabled = !(tb_cflags(db->tb) & CF_NOINSTR) && plugin_gen_tb_start(cpu, tb);
> +    plugin_enabled = plugin_gen_tb_start(cpu, tb, tb_cflags(db->tb) & CF_NOINSTR);
>  
>      while (true) {
>          db->num_insns++;
> @@ -100,6 +100,8 @@ void translator_loop(const TranslatorOps *ops, DisasContextBase *db,
>              gen_io_start();
>              ops->translate_insn(db, cpu);
>          } else {
> +            /* we should only see NOINSTR for io_recompile */
> +            g_assert(!(tb_cflags(db->tb) & CF_NOINSTR));
>              ops->translate_insn(db, cpu);
>          }
>  
> diff --git a/plugins/api.c b/plugins/api.c
> index 5dc8e6f934..ac8475707d 100644
> --- a/plugins/api.c
> +++ b/plugins/api.c
> @@ -84,15 +84,19 @@ void qemu_plugin_register_vcpu_tb_exec_cb(struct qemu_plugin_tb *tb,
>                                            enum qemu_plugin_cb_flags flags,
>                                            void *udata)
>  {
> -    plugin_register_dyn_cb__udata(&tb->cbs[PLUGIN_CB_REGULAR],
> -                                  cb, flags, udata);
> +    if (!tb->store_only) {
> +        plugin_register_dyn_cb__udata(&tb->cbs[PLUGIN_CB_REGULAR],
> +                                      cb, flags, udata);
> +    }
>  }
>  
>  void qemu_plugin_register_vcpu_tb_exec_inline(struct qemu_plugin_tb *tb,
>                                                enum qemu_plugin_op op,
>                                                void *ptr, uint64_t imm)
>  {
> -    plugin_register_inline_op(&tb->cbs[PLUGIN_CB_INLINE], 0, op, ptr, imm);
> +    if (!tb->store_only) {
> +        plugin_register_inline_op(&tb->cbs[PLUGIN_CB_INLINE], 0, op, ptr, imm);
> +    }
>  }
>  
>  void qemu_plugin_register_vcpu_insn_exec_cb(struct qemu_plugin_insn *insn,
> @@ -100,16 +104,20 @@ void qemu_plugin_register_vcpu_insn_exec_cb(struct qemu_plugin_insn *insn,
>                                              enum qemu_plugin_cb_flags flags,
>                                              void *udata)
>  {
> -    plugin_register_dyn_cb__udata(&insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_REGULAR],
> -        cb, flags, udata);
> +    if (!insn->store_only) {
> +        plugin_register_dyn_cb__udata(&insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_REGULAR],
> +                                      cb, flags, udata);
> +    }
>  }
>  
>  void qemu_plugin_register_vcpu_insn_exec_inline(struct qemu_plugin_insn *insn,
>                                                  enum qemu_plugin_op op,
>                                                  void *ptr, uint64_t imm)
>  {
> -    plugin_register_inline_op(&insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_INLINE],
> -                              0, op, ptr, imm);
> +    if (!insn->store_only) {
> +        plugin_register_inline_op(&insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_INLINE],
> +                                  0, op, ptr, imm);
> +    }
>  }
>  
>  
> @@ -120,8 +128,13 @@ void qemu_plugin_register_vcpu_mem_cb(struct qemu_plugin_insn *insn,
>                                        enum qemu_plugin_mem_rw rw,
>                                        void *udata)
>  {
> -    plugin_register_vcpu_mem_cb(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR],
> -                                cb, flags, rw, udata);
> +    if (insn->store_only && (rw & QEMU_PLUGIN_MEM_W)) {
> +        plugin_register_vcpu_mem_cb(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR],
> +                                    cb, flags, QEMU_PLUGIN_MEM_W, udata);
> +    } else {
> +        plugin_register_vcpu_mem_cb(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR],
> +                                    cb, flags, rw, udata);
> +    }
>  }
>  
>  void qemu_plugin_register_vcpu_mem_inline(struct qemu_plugin_insn *insn,
> @@ -129,8 +142,10 @@ void qemu_plugin_register_vcpu_mem_inline(struct qemu_plugin_insn *insn,
>                                            enum qemu_plugin_op op, void *ptr,
>                                            uint64_t imm)
>  {
> -    plugin_register_inline_op(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_INLINE],
> -        rw, op, ptr, imm);
> +    if (!insn->store_only) {
> +        plugin_register_inline_op(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_INLINE],
> +                                  rw, op, ptr, imm);
> +    }
>  }
>  
>  void qemu_plugin_register_vcpu_tb_trans_cb(qemu_plugin_id_t id,
> @@ -181,10 +196,13 @@ uint64_t qemu_plugin_tb_vaddr(const struct qemu_plugin_tb *tb)
>  struct qemu_plugin_insn *
>  qemu_plugin_tb_get_insn(const struct qemu_plugin_tb *tb, size_t idx)
>  {
> +    struct qemu_plugin_insn *insn;
>      if (unlikely(idx >= tb->n)) {
>          return NULL;
>      }
> -    return g_ptr_array_index(tb->insns, idx);
> +    insn = g_ptr_array_index(tb->insns, idx);
> +    insn->store_only = tb->store_only;
> +    return insn;
>  }
>  
>  /*
> --8<---------------cut here---------------end--------------->8---
> 
> -- 
> Alex Bennée


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH  v2 20/21] accel/tcg: allow plugin instrumentation to be disable via cflags
  2021-02-12 14:43     ` Alex Bennée
  2021-02-12 15:41       ` Aaron Lindsay via
@ 2021-02-12 16:00       ` Alex Bennée
  2021-02-12 17:04         ` Aaron Lindsay via
  1 sibling, 1 reply; 53+ messages in thread
From: Alex Bennée @ 2021-02-12 16:00 UTC (permalink / raw)
  To: Aaron Lindsay
  Cc: Richard Henderson, qemu-devel, robhenry, mahmoudabdalghany, cota,
	Paolo Bonzini, kuhn.chenqun


Alex Bennée <alex.bennee@linaro.org> writes:

> Aaron Lindsay <aaron@os.amperecomputing.com> writes:
>
>> On Feb 10 22:10, Alex Bennée wrote:
>>> When icount is enabled and we recompile an MMIO access we end up
>>> double counting the instruction execution. To avoid this we introduce
>>> the CF_NOINSTR cflag which disables instrumentation for the next TB.
>>> As this is part of the hashed compile flags we will only execute the
>>> generated TB while coming out of a cpu_io_recompile.
>>
>> Unfortunately this patch works a little too well!
>>
>> With this change, the memory access callbacks registered via
>> `qemu_plugin_register_vcpu_mem_cb()` are never called for the
>> re-translated instruction making the IO access, since we've disabled all
>> instrumentation.
>>
>> Is it possible to selectively disable only instruction callbacks using
>> this mechanism, while still allowing others that would not yet have been
>> called for the re-translated instruction?
>
> Can you try the following fugly patch on top of this series:
>
<snip>
> @@ -120,8 +128,13 @@ void qemu_plugin_register_vcpu_mem_cb(struct qemu_plugin_insn *insn,
>                                        enum qemu_plugin_mem_rw rw,
>                                        void *udata)
>  {
> -    plugin_register_vcpu_mem_cb(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR],
> -                                cb, flags, rw, udata);
> +    if (insn->store_only && (rw & QEMU_PLUGIN_MEM_W)) {
> +        plugin_register_vcpu_mem_cb(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR],
> +                                    cb, flags, QEMU_PLUGIN_MEM_W, udata);
> +    } else {
> +        plugin_register_vcpu_mem_cb(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR],
> +                                    cb, flags, rw, udata);
> +    }
>  }
<snip>

Actually I'm wondering if I've got my sense the wrong way around. Should
it be loads only:

  void qemu_plugin_register_vcpu_mem_cb(struct qemu_plugin_insn *insn,
                                        qemu_plugin_vcpu_mem_cb_t cb,
                                        enum qemu_plugin_cb_flags flags,
                                        enum qemu_plugin_mem_rw rw,
                                        void *udata)
  {
      if (insn->store_only && (rw & QEMU_PLUGIN_MEM_R)) {
          plugin_register_vcpu_mem_cb(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR],
                                      cb, flags, QEMU_PLUGIN_MEM_R, udata);
      } else {
          plugin_register_vcpu_mem_cb(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR],
                                      cb, flags, rw, udata);
      }
  }


obviously I'd have to rename the variables :-/

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 20/21] accel/tcg: allow plugin instrumentation to be disable via cflags
  2021-02-12 15:41       ` Aaron Lindsay via
@ 2021-02-12 16:04         ` Alex Bennée
  2021-02-12 16:50           ` Aaron Lindsay via
  0 siblings, 1 reply; 53+ messages in thread
From: Alex Bennée @ 2021-02-12 16:04 UTC (permalink / raw)
  To: Aaron Lindsay
  Cc: Richard Henderson, QEMU Developers, Robert Henry,
	mahmoudabdalghany, Emilio G. Cota, Paolo Bonzini, Chenqun (kuhn)

Do you see two stores or one store? I think I got the sense the wrong
way around because the store is instrumented before the mmu code,
hence should be skipped on a re-instrumented block.

On Fri, 12 Feb 2021 at 15:41, Aaron Lindsay
<aaron@os.amperecomputing.com> wrote:
>
> On Feb 12 14:43, Alex Bennée wrote:
> > Aaron Lindsay <aaron@os.amperecomputing.com> writes:
> > > On Feb 10 22:10, Alex Bennée wrote:
> > >> When icount is enabled and we recompile an MMIO access we end up
> > >> double counting the instruction execution. To avoid this we introduce
> > >> the CF_NOINSTR cflag which disables instrumentation for the next TB.
> > >> As this is part of the hashed compile flags we will only execute the
> > >> generated TB while coming out of a cpu_io_recompile.
> > >
> > > Unfortunately this patch works a little too well!
> > >
> > > With this change, the memory access callbacks registered via
> > > `qemu_plugin_register_vcpu_mem_cb()` are never called for the
> > > re-translated instruction making the IO access, since we've disabled all
> > > instrumentation.
> > >
> > > Is it possible to selectively disable only instruction callbacks using
> > > this mechanism, while still allowing others that would not yet have been
> > > called for the re-translated instruction?
> >
> > Can you try the following fugly patch on top of this series:
>
> This patch does allow me to successfully observe memory callbacks for
> stores in this case. It seems from looking at the patch that you
> intentionally only allowed memory callbacks for stores in this case, and
> I still don't see callbacks any for loads.
>
> -Aaron
>
> > --8<---------------cut here---------------start------------->8---
> > diff --git a/include/exec/plugin-gen.h b/include/exec/plugin-gen.h
> > index 4834a9e2f4..b1b72b5d90 100644
> > --- a/include/exec/plugin-gen.h
> > +++ b/include/exec/plugin-gen.h
> > @@ -19,7 +19,7 @@ struct DisasContextBase;
> >
> >  #ifdef CONFIG_PLUGIN
> >
> > -bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb);
> > +bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb, bool supress);
> >  void plugin_gen_tb_end(CPUState *cpu);
> >  void plugin_gen_insn_start(CPUState *cpu, const struct DisasContextBase *db);
> >  void plugin_gen_insn_end(void);
> > @@ -41,7 +41,7 @@ static inline void plugin_insn_append(const void *from, size_t size)
> >  #else /* !CONFIG_PLUGIN */
> >
> >  static inline
> > -bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb)
> > +bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb, bool supress)
> >  {
> >      return false;
> >  }
> > diff --git a/include/qemu/plugin.h b/include/qemu/plugin.h
> > index 841deed79c..2a26a2277f 100644
> > --- a/include/qemu/plugin.h
> > +++ b/include/qemu/plugin.h
> > @@ -92,6 +92,7 @@ struct qemu_plugin_dyn_cb {
> >      };
> >  };
> >
> > +/* Internal context for instrumenting an instruction */
> >  struct qemu_plugin_insn {
> >      GByteArray *data;
> >      uint64_t vaddr;
> > @@ -99,6 +100,7 @@ struct qemu_plugin_insn {
> >      GArray *cbs[PLUGIN_N_CB_TYPES][PLUGIN_N_CB_SUBTYPES];
> >      bool calls_helpers;
> >      bool mem_helper;
> > +    bool store_only;
> >  };
> >
> >  /*
> > @@ -128,6 +130,7 @@ static inline struct qemu_plugin_insn *qemu_plugin_insn_alloc(void)
> >      return insn;
> >  }
> >
> > +/* Internal context for this TranslationBlock */
> >  struct qemu_plugin_tb {
> >      GPtrArray *insns;
> >      size_t n;
> > @@ -135,6 +138,7 @@ struct qemu_plugin_tb {
> >      uint64_t vaddr2;
> >      void *haddr1;
> >      void *haddr2;
> > +    bool store_only;
> >      GArray *cbs[PLUGIN_N_CB_SUBTYPES];
> >  };
> >
> > diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
> > index 8a1bb801e0..137b91282e 100644
> > --- a/accel/tcg/plugin-gen.c
> > +++ b/accel/tcg/plugin-gen.c
> > @@ -842,7 +842,7 @@ static void plugin_gen_inject(const struct qemu_plugin_tb *plugin_tb)
> >      pr_ops();
> >  }
> >
> > -bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb)
> > +bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb, bool store_only)
> >  {
> >      struct qemu_plugin_tb *ptb = tcg_ctx->plugin_tb;
> >      bool ret = false;
> > @@ -855,6 +855,7 @@ bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb)
> >          ptb->vaddr2 = -1;
> >          get_page_addr_code_hostp(cpu->env_ptr, tb->pc, &ptb->haddr1);
> >          ptb->haddr2 = NULL;
> > +        ptb->store_only = store_only;
> >
> >          plugin_gen_empty_callback(PLUGIN_GEN_FROM_TB);
> >      }
> > diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
> > index 14d1ea795d..082f2c8ee1 100644
> > --- a/accel/tcg/translator.c
> > +++ b/accel/tcg/translator.c
> > @@ -58,7 +58,7 @@ void translator_loop(const TranslatorOps *ops, DisasContextBase *db,
> >      ops->tb_start(db, cpu);
> >      tcg_debug_assert(db->is_jmp == DISAS_NEXT);  /* no early exit */
> >
> > -    plugin_enabled = !(tb_cflags(db->tb) & CF_NOINSTR) && plugin_gen_tb_start(cpu, tb);
> > +    plugin_enabled = plugin_gen_tb_start(cpu, tb, tb_cflags(db->tb) & CF_NOINSTR);
> >
> >      while (true) {
> >          db->num_insns++;
> > @@ -100,6 +100,8 @@ void translator_loop(const TranslatorOps *ops, DisasContextBase *db,
> >              gen_io_start();
> >              ops->translate_insn(db, cpu);
> >          } else {
> > +            /* we should only see NOINSTR for io_recompile */
> > +            g_assert(!(tb_cflags(db->tb) & CF_NOINSTR));
> >              ops->translate_insn(db, cpu);
> >          }
> >
> > diff --git a/plugins/api.c b/plugins/api.c
> > index 5dc8e6f934..ac8475707d 100644
> > --- a/plugins/api.c
> > +++ b/plugins/api.c
> > @@ -84,15 +84,19 @@ void qemu_plugin_register_vcpu_tb_exec_cb(struct qemu_plugin_tb *tb,
> >                                            enum qemu_plugin_cb_flags flags,
> >                                            void *udata)
> >  {
> > -    plugin_register_dyn_cb__udata(&tb->cbs[PLUGIN_CB_REGULAR],
> > -                                  cb, flags, udata);
> > +    if (!tb->store_only) {
> > +        plugin_register_dyn_cb__udata(&tb->cbs[PLUGIN_CB_REGULAR],
> > +                                      cb, flags, udata);
> > +    }
> >  }
> >
> >  void qemu_plugin_register_vcpu_tb_exec_inline(struct qemu_plugin_tb *tb,
> >                                                enum qemu_plugin_op op,
> >                                                void *ptr, uint64_t imm)
> >  {
> > -    plugin_register_inline_op(&tb->cbs[PLUGIN_CB_INLINE], 0, op, ptr, imm);
> > +    if (!tb->store_only) {
> > +        plugin_register_inline_op(&tb->cbs[PLUGIN_CB_INLINE], 0, op, ptr, imm);
> > +    }
> >  }
> >
> >  void qemu_plugin_register_vcpu_insn_exec_cb(struct qemu_plugin_insn *insn,
> > @@ -100,16 +104,20 @@ void qemu_plugin_register_vcpu_insn_exec_cb(struct qemu_plugin_insn *insn,
> >                                              enum qemu_plugin_cb_flags flags,
> >                                              void *udata)
> >  {
> > -    plugin_register_dyn_cb__udata(&insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_REGULAR],
> > -        cb, flags, udata);
> > +    if (!insn->store_only) {
> > +        plugin_register_dyn_cb__udata(&insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_REGULAR],
> > +                                      cb, flags, udata);
> > +    }
> >  }
> >
> >  void qemu_plugin_register_vcpu_insn_exec_inline(struct qemu_plugin_insn *insn,
> >                                                  enum qemu_plugin_op op,
> >                                                  void *ptr, uint64_t imm)
> >  {
> > -    plugin_register_inline_op(&insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_INLINE],
> > -                              0, op, ptr, imm);
> > +    if (!insn->store_only) {
> > +        plugin_register_inline_op(&insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_INLINE],
> > +                                  0, op, ptr, imm);
> > +    }
> >  }
> >
> >
> > @@ -120,8 +128,13 @@ void qemu_plugin_register_vcpu_mem_cb(struct qemu_plugin_insn *insn,
> >                                        enum qemu_plugin_mem_rw rw,
> >                                        void *udata)
> >  {
> > -    plugin_register_vcpu_mem_cb(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR],
> > -                                cb, flags, rw, udata);
> > +    if (insn->store_only && (rw & QEMU_PLUGIN_MEM_W)) {
> > +        plugin_register_vcpu_mem_cb(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR],
> > +                                    cb, flags, QEMU_PLUGIN_MEM_W, udata);
> > +    } else {
> > +        plugin_register_vcpu_mem_cb(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR],
> > +                                    cb, flags, rw, udata);
> > +    }
> >  }
> >
> >  void qemu_plugin_register_vcpu_mem_inline(struct qemu_plugin_insn *insn,
> > @@ -129,8 +142,10 @@ void qemu_plugin_register_vcpu_mem_inline(struct qemu_plugin_insn *insn,
> >                                            enum qemu_plugin_op op, void *ptr,
> >                                            uint64_t imm)
> >  {
> > -    plugin_register_inline_op(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_INLINE],
> > -        rw, op, ptr, imm);
> > +    if (!insn->store_only) {
> > +        plugin_register_inline_op(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_INLINE],
> > +                                  rw, op, ptr, imm);
> > +    }
> >  }
> >
> >  void qemu_plugin_register_vcpu_tb_trans_cb(qemu_plugin_id_t id,
> > @@ -181,10 +196,13 @@ uint64_t qemu_plugin_tb_vaddr(const struct qemu_plugin_tb *tb)
> >  struct qemu_plugin_insn *
> >  qemu_plugin_tb_get_insn(const struct qemu_plugin_tb *tb, size_t idx)
> >  {
> > +    struct qemu_plugin_insn *insn;
> >      if (unlikely(idx >= tb->n)) {
> >          return NULL;
> >      }
> > -    return g_ptr_array_index(tb->insns, idx);
> > +    insn = g_ptr_array_index(tb->insns, idx);
> > +    insn->store_only = tb->store_only;
> > +    return insn;
> >  }
> >
> >  /*
> > --8<---------------cut here---------------end--------------->8---
> >
> > --
> > Alex Bennée



-- 
Alex Bennée
KVM/QEMU Hacker for Linaro


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 20/21] accel/tcg: allow plugin instrumentation to be disable via cflags
  2021-02-12 16:04         ` Alex Bennée
@ 2021-02-12 16:50           ` Aaron Lindsay via
  2021-02-12 17:19             ` Alex Bennée
  2021-02-16 10:34             ` Alex Bennée
  0 siblings, 2 replies; 53+ messages in thread
From: Aaron Lindsay via @ 2021-02-12 16:50 UTC (permalink / raw)
  To: Alex Bennée
  Cc: QEMU Developers, Emilio G. Cota, Chenqun (kuhn),
	Robert Henry, mahmoudabdalghany, Richard Henderson,
	Paolo Bonzini

On Feb 12 16:04, Alex Bennée wrote:
> Do you see two stores or one store? I think I got the sense the wrong
> way around because the store is instrumented before the mmu code,
> hence should be skipped on a re-instrumented block.

I only see one store between the instruction callback for the store and
the instruction callback for the subsequent instruction.

-Aaron

> On Fri, 12 Feb 2021 at 15:41, Aaron Lindsay
> <aaron@os.amperecomputing.com> wrote:
> >
> > On Feb 12 14:43, Alex Bennée wrote:
> > > Aaron Lindsay <aaron@os.amperecomputing.com> writes:
> > > > On Feb 10 22:10, Alex Bennée wrote:
> > > >> When icount is enabled and we recompile an MMIO access we end up
> > > >> double counting the instruction execution. To avoid this we introduce
> > > >> the CF_NOINSTR cflag which disables instrumentation for the next TB.
> > > >> As this is part of the hashed compile flags we will only execute the
> > > >> generated TB while coming out of a cpu_io_recompile.
> > > >
> > > > Unfortunately this patch works a little too well!
> > > >
> > > > With this change, the memory access callbacks registered via
> > > > `qemu_plugin_register_vcpu_mem_cb()` are never called for the
> > > > re-translated instruction making the IO access, since we've disabled all
> > > > instrumentation.
> > > >
> > > > Is it possible to selectively disable only instruction callbacks using
> > > > this mechanism, while still allowing others that would not yet have been
> > > > called for the re-translated instruction?
> > >
> > > Can you try the following fugly patch on top of this series:
> >
> > This patch does allow me to successfully observe memory callbacks for
> > stores in this case. It seems from looking at the patch that you
> > intentionally only allowed memory callbacks for stores in this case, and
> > I still don't see callbacks any for loads.
> >
> > -Aaron
> >
> > > --8<---------------cut here---------------start------------->8---
> > > diff --git a/include/exec/plugin-gen.h b/include/exec/plugin-gen.h
> > > index 4834a9e2f4..b1b72b5d90 100644
> > > --- a/include/exec/plugin-gen.h
> > > +++ b/include/exec/plugin-gen.h
> > > @@ -19,7 +19,7 @@ struct DisasContextBase;
> > >
> > >  #ifdef CONFIG_PLUGIN
> > >
> > > -bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb);
> > > +bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb, bool supress);
> > >  void plugin_gen_tb_end(CPUState *cpu);
> > >  void plugin_gen_insn_start(CPUState *cpu, const struct DisasContextBase *db);
> > >  void plugin_gen_insn_end(void);
> > > @@ -41,7 +41,7 @@ static inline void plugin_insn_append(const void *from, size_t size)
> > >  #else /* !CONFIG_PLUGIN */
> > >
> > >  static inline
> > > -bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb)
> > > +bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb, bool supress)
> > >  {
> > >      return false;
> > >  }
> > > diff --git a/include/qemu/plugin.h b/include/qemu/plugin.h
> > > index 841deed79c..2a26a2277f 100644
> > > --- a/include/qemu/plugin.h
> > > +++ b/include/qemu/plugin.h
> > > @@ -92,6 +92,7 @@ struct qemu_plugin_dyn_cb {
> > >      };
> > >  };
> > >
> > > +/* Internal context for instrumenting an instruction */
> > >  struct qemu_plugin_insn {
> > >      GByteArray *data;
> > >      uint64_t vaddr;
> > > @@ -99,6 +100,7 @@ struct qemu_plugin_insn {
> > >      GArray *cbs[PLUGIN_N_CB_TYPES][PLUGIN_N_CB_SUBTYPES];
> > >      bool calls_helpers;
> > >      bool mem_helper;
> > > +    bool store_only;
> > >  };
> > >
> > >  /*
> > > @@ -128,6 +130,7 @@ static inline struct qemu_plugin_insn *qemu_plugin_insn_alloc(void)
> > >      return insn;
> > >  }
> > >
> > > +/* Internal context for this TranslationBlock */
> > >  struct qemu_plugin_tb {
> > >      GPtrArray *insns;
> > >      size_t n;
> > > @@ -135,6 +138,7 @@ struct qemu_plugin_tb {
> > >      uint64_t vaddr2;
> > >      void *haddr1;
> > >      void *haddr2;
> > > +    bool store_only;
> > >      GArray *cbs[PLUGIN_N_CB_SUBTYPES];
> > >  };
> > >
> > > diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
> > > index 8a1bb801e0..137b91282e 100644
> > > --- a/accel/tcg/plugin-gen.c
> > > +++ b/accel/tcg/plugin-gen.c
> > > @@ -842,7 +842,7 @@ static void plugin_gen_inject(const struct qemu_plugin_tb *plugin_tb)
> > >      pr_ops();
> > >  }
> > >
> > > -bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb)
> > > +bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb, bool store_only)
> > >  {
> > >      struct qemu_plugin_tb *ptb = tcg_ctx->plugin_tb;
> > >      bool ret = false;
> > > @@ -855,6 +855,7 @@ bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb)
> > >          ptb->vaddr2 = -1;
> > >          get_page_addr_code_hostp(cpu->env_ptr, tb->pc, &ptb->haddr1);
> > >          ptb->haddr2 = NULL;
> > > +        ptb->store_only = store_only;
> > >
> > >          plugin_gen_empty_callback(PLUGIN_GEN_FROM_TB);
> > >      }
> > > diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
> > > index 14d1ea795d..082f2c8ee1 100644
> > > --- a/accel/tcg/translator.c
> > > +++ b/accel/tcg/translator.c
> > > @@ -58,7 +58,7 @@ void translator_loop(const TranslatorOps *ops, DisasContextBase *db,
> > >      ops->tb_start(db, cpu);
> > >      tcg_debug_assert(db->is_jmp == DISAS_NEXT);  /* no early exit */
> > >
> > > -    plugin_enabled = !(tb_cflags(db->tb) & CF_NOINSTR) && plugin_gen_tb_start(cpu, tb);
> > > +    plugin_enabled = plugin_gen_tb_start(cpu, tb, tb_cflags(db->tb) & CF_NOINSTR);
> > >
> > >      while (true) {
> > >          db->num_insns++;
> > > @@ -100,6 +100,8 @@ void translator_loop(const TranslatorOps *ops, DisasContextBase *db,
> > >              gen_io_start();
> > >              ops->translate_insn(db, cpu);
> > >          } else {
> > > +            /* we should only see NOINSTR for io_recompile */
> > > +            g_assert(!(tb_cflags(db->tb) & CF_NOINSTR));
> > >              ops->translate_insn(db, cpu);
> > >          }
> > >
> > > diff --git a/plugins/api.c b/plugins/api.c
> > > index 5dc8e6f934..ac8475707d 100644
> > > --- a/plugins/api.c
> > > +++ b/plugins/api.c
> > > @@ -84,15 +84,19 @@ void qemu_plugin_register_vcpu_tb_exec_cb(struct qemu_plugin_tb *tb,
> > >                                            enum qemu_plugin_cb_flags flags,
> > >                                            void *udata)
> > >  {
> > > -    plugin_register_dyn_cb__udata(&tb->cbs[PLUGIN_CB_REGULAR],
> > > -                                  cb, flags, udata);
> > > +    if (!tb->store_only) {
> > > +        plugin_register_dyn_cb__udata(&tb->cbs[PLUGIN_CB_REGULAR],
> > > +                                      cb, flags, udata);
> > > +    }
> > >  }
> > >
> > >  void qemu_plugin_register_vcpu_tb_exec_inline(struct qemu_plugin_tb *tb,
> > >                                                enum qemu_plugin_op op,
> > >                                                void *ptr, uint64_t imm)
> > >  {
> > > -    plugin_register_inline_op(&tb->cbs[PLUGIN_CB_INLINE], 0, op, ptr, imm);
> > > +    if (!tb->store_only) {
> > > +        plugin_register_inline_op(&tb->cbs[PLUGIN_CB_INLINE], 0, op, ptr, imm);
> > > +    }
> > >  }
> > >
> > >  void qemu_plugin_register_vcpu_insn_exec_cb(struct qemu_plugin_insn *insn,
> > > @@ -100,16 +104,20 @@ void qemu_plugin_register_vcpu_insn_exec_cb(struct qemu_plugin_insn *insn,
> > >                                              enum qemu_plugin_cb_flags flags,
> > >                                              void *udata)
> > >  {
> > > -    plugin_register_dyn_cb__udata(&insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_REGULAR],
> > > -        cb, flags, udata);
> > > +    if (!insn->store_only) {
> > > +        plugin_register_dyn_cb__udata(&insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_REGULAR],
> > > +                                      cb, flags, udata);
> > > +    }
> > >  }
> > >
> > >  void qemu_plugin_register_vcpu_insn_exec_inline(struct qemu_plugin_insn *insn,
> > >                                                  enum qemu_plugin_op op,
> > >                                                  void *ptr, uint64_t imm)
> > >  {
> > > -    plugin_register_inline_op(&insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_INLINE],
> > > -                              0, op, ptr, imm);
> > > +    if (!insn->store_only) {
> > > +        plugin_register_inline_op(&insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_INLINE],
> > > +                                  0, op, ptr, imm);
> > > +    }
> > >  }
> > >
> > >
> > > @@ -120,8 +128,13 @@ void qemu_plugin_register_vcpu_mem_cb(struct qemu_plugin_insn *insn,
> > >                                        enum qemu_plugin_mem_rw rw,
> > >                                        void *udata)
> > >  {
> > > -    plugin_register_vcpu_mem_cb(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR],
> > > -                                cb, flags, rw, udata);
> > > +    if (insn->store_only && (rw & QEMU_PLUGIN_MEM_W)) {
> > > +        plugin_register_vcpu_mem_cb(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR],
> > > +                                    cb, flags, QEMU_PLUGIN_MEM_W, udata);
> > > +    } else {
> > > +        plugin_register_vcpu_mem_cb(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR],
> > > +                                    cb, flags, rw, udata);
> > > +    }
> > >  }
> > >
> > >  void qemu_plugin_register_vcpu_mem_inline(struct qemu_plugin_insn *insn,
> > > @@ -129,8 +142,10 @@ void qemu_plugin_register_vcpu_mem_inline(struct qemu_plugin_insn *insn,
> > >                                            enum qemu_plugin_op op, void *ptr,
> > >                                            uint64_t imm)
> > >  {
> > > -    plugin_register_inline_op(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_INLINE],
> > > -        rw, op, ptr, imm);
> > > +    if (!insn->store_only) {
> > > +        plugin_register_inline_op(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_INLINE],
> > > +                                  rw, op, ptr, imm);
> > > +    }
> > >  }
> > >
> > >  void qemu_plugin_register_vcpu_tb_trans_cb(qemu_plugin_id_t id,
> > > @@ -181,10 +196,13 @@ uint64_t qemu_plugin_tb_vaddr(const struct qemu_plugin_tb *tb)
> > >  struct qemu_plugin_insn *
> > >  qemu_plugin_tb_get_insn(const struct qemu_plugin_tb *tb, size_t idx)
> > >  {
> > > +    struct qemu_plugin_insn *insn;
> > >      if (unlikely(idx >= tb->n)) {
> > >          return NULL;
> > >      }
> > > -    return g_ptr_array_index(tb->insns, idx);
> > > +    insn = g_ptr_array_index(tb->insns, idx);
> > > +    insn->store_only = tb->store_only;
> > > +    return insn;
> > >  }
> > >
> > >  /*
> > > --8<---------------cut here---------------end--------------->8---
> > >
> > > --
> > > Alex Bennée
> 
> 
> 
> -- 
> Alex Bennée
> KVM/QEMU Hacker for Linaro


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH  v2 20/21] accel/tcg: allow plugin instrumentation to be disable via cflags
  2021-02-12 16:00       ` Alex Bennée
@ 2021-02-12 17:04         ` Aaron Lindsay via
  0 siblings, 0 replies; 53+ messages in thread
From: Aaron Lindsay via @ 2021-02-12 17:04 UTC (permalink / raw)
  To: Alex Bennée
  Cc: qemu-devel, cota, kuhn.chenqun, robhenry, mahmoudabdalghany,
	Richard Henderson, Paolo Bonzini

On Feb 12 16:00, Alex Bennée wrote:
> 
> Alex Bennée <alex.bennee@linaro.org> writes:
> 
> > Aaron Lindsay <aaron@os.amperecomputing.com> writes:
> >
> >> On Feb 10 22:10, Alex Bennée wrote:
> >>> When icount is enabled and we recompile an MMIO access we end up
> >>> double counting the instruction execution. To avoid this we introduce
> >>> the CF_NOINSTR cflag which disables instrumentation for the next TB.
> >>> As this is part of the hashed compile flags we will only execute the
> >>> generated TB while coming out of a cpu_io_recompile.
> >>
> >> Unfortunately this patch works a little too well!
> >>
> >> With this change, the memory access callbacks registered via
> >> `qemu_plugin_register_vcpu_mem_cb()` are never called for the
> >> re-translated instruction making the IO access, since we've disabled all
> >> instrumentation.
> >>
> >> Is it possible to selectively disable only instruction callbacks using
> >> this mechanism, while still allowing others that would not yet have been
> >> called for the re-translated instruction?
> >
> > Can you try the following fugly patch on top of this series:
> >
> <snip>
> > @@ -120,8 +128,13 @@ void qemu_plugin_register_vcpu_mem_cb(struct qemu_plugin_insn *insn,
> >                                        enum qemu_plugin_mem_rw rw,
> >                                        void *udata)
> >  {
> > -    plugin_register_vcpu_mem_cb(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR],
> > -                                cb, flags, rw, udata);
> > +    if (insn->store_only && (rw & QEMU_PLUGIN_MEM_W)) {
> > +        plugin_register_vcpu_mem_cb(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR],
> > +                                    cb, flags, QEMU_PLUGIN_MEM_W, udata);
> > +    } else {
> > +        plugin_register_vcpu_mem_cb(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR],
> > +                                    cb, flags, rw, udata);
> > +    }
> >  }
> <snip>
> 
> Actually I'm wondering if I've got my sense the wrong way around. Should
> it be loads only:
> 
>   void qemu_plugin_register_vcpu_mem_cb(struct qemu_plugin_insn *insn,
>                                         qemu_plugin_vcpu_mem_cb_t cb,
>                                         enum qemu_plugin_cb_flags flags,
>                                         enum qemu_plugin_mem_rw rw,
>                                         void *udata)
>   {
>       if (insn->store_only && (rw & QEMU_PLUGIN_MEM_R)) {
>           plugin_register_vcpu_mem_cb(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR],
>                                       cb, flags, QEMU_PLUGIN_MEM_R, udata);
>       } else {
>           plugin_register_vcpu_mem_cb(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR],
>                                       cb, flags, rw, udata);
>       }
>   }
> 
> obviously I'd have to rename the variables :-/

This gets me only loads and no stores. I've modified it to be just:

void qemu_plugin_register_vcpu_mem_cb(struct qemu_plugin_insn *insn,
                                      qemu_plugin_vcpu_mem_cb_t cb,
                                      enum qemu_plugin_cb_flags flags,
                                      enum qemu_plugin_mem_rw rw,
                                      void *udata)
{
    plugin_register_vcpu_mem_cb(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR],
                                cb, flags, rw, udata);
}

And that appears to get me one memory callback both for loads and stores.

-Aaron


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 16/21] accel/tcg: actually cache our partial icount TB
  2021-02-12 15:40       ` Philippe Mathieu-Daudé
@ 2021-02-12 17:06         ` Alex Bennée
  0 siblings, 0 replies; 53+ messages in thread
From: Alex Bennée @ 2021-02-12 17:06 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Richard Henderson, qemu-devel, robhenry, mahmoudabdalghany,
	aaron, cota, kuhn.chenqun, Paolo Bonzini


Philippe Mathieu-Daudé <f4bug@amsat.org> writes:

> On 2/11/21 7:48 PM, Richard Henderson wrote:
>> On 2/11/21 2:21 AM, Philippe Mathieu-Daudé wrote:
>>>> -    insns_left = MIN(0xffff, cpu->icount_budget);
>>>> +    insns_left = MIN(CF_COUNT_MASK, cpu->icount_budget);
>>>
>>> Can you describe this change a bit please?
>> 
>> Replacing a magic number with its proper symbol.
>
> I am confuse because I see:
>
> #define CF_COUNT_MASK  0x00007fff

This is the largest partial count you can store in the CF flags (0x8000
is used for LAST_IO). The decrement field can handle the full u16
although in practice I would never expect a final block to be more than
a few instructions. We could probably shorten the mask without any
deleterious effect if we needed to scrape together any more CFLAGS.

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 20/21] accel/tcg: allow plugin instrumentation to be disable via cflags
  2021-02-12 16:50           ` Aaron Lindsay via
@ 2021-02-12 17:19             ` Alex Bennée
  2021-02-16 10:34             ` Alex Bennée
  1 sibling, 0 replies; 53+ messages in thread
From: Alex Bennée @ 2021-02-12 17:19 UTC (permalink / raw)
  To: Aaron Lindsay
  Cc: Richard Henderson, QEMU Developers, Robert Henry,
	mahmoudabdalghany, Emilio G. Cota, Paolo Bonzini, Chenqun (kuhn)


Aaron Lindsay <aaron@os.amperecomputing.com> writes:

> On Feb 12 16:04, Alex Bennée wrote:
>> Do you see two stores or one store? I think I got the sense the wrong
>> way around because the store is instrumented before the mmu code,
>> hence should be skipped on a re-instrumented block.
>
> I only see one store between the instruction callback for the store and
> the instruction callback for the subsequent instruction.

OK - having looked more closely and reminded myself what's going on I
think the difference is memory callbacks versus memory inline. All
inline calls happen before the actual instructions. The callbacks have a
pre and post memory helper where the actual callback comes after the
operation. Those are what we want to preserve.

Let me re-spin the patch and see if I can add a test case to compare the
counts between inline and cb (which should be the same with
deterministic icount).

>
> -Aaron
>
>> On Fri, 12 Feb 2021 at 15:41, Aaron Lindsay
>> <aaron@os.amperecomputing.com> wrote:
>> >
>> > On Feb 12 14:43, Alex Bennée wrote:
>> > > Aaron Lindsay <aaron@os.amperecomputing.com> writes:
>> > > > On Feb 10 22:10, Alex Bennée wrote:
>> > > >> When icount is enabled and we recompile an MMIO access we end up
>> > > >> double counting the instruction execution. To avoid this we introduce
>> > > >> the CF_NOINSTR cflag which disables instrumentation for the next TB.
>> > > >> As this is part of the hashed compile flags we will only execute the
>> > > >> generated TB while coming out of a cpu_io_recompile.
>> > > >
>> > > > Unfortunately this patch works a little too well!
>> > > >
>> > > > With this change, the memory access callbacks registered via
>> > > > `qemu_plugin_register_vcpu_mem_cb()` are never called for the
>> > > > re-translated instruction making the IO access, since we've disabled all
>> > > > instrumentation.
>> > > >
>> > > > Is it possible to selectively disable only instruction callbacks using
>> > > > this mechanism, while still allowing others that would not yet have been
>> > > > called for the re-translated instruction?
>> > >
>> > > Can you try the following fugly patch on top of this series:
>> >
>> > This patch does allow me to successfully observe memory callbacks for
>> > stores in this case. It seems from looking at the patch that you
>> > intentionally only allowed memory callbacks for stores in this case, and
>> > I still don't see callbacks any for loads.
>> >
>> > -Aaron
>> >
>> > > --8<---------------cut here---------------start------------->8---
>> > > diff --git a/include/exec/plugin-gen.h b/include/exec/plugin-gen.h
>> > > index 4834a9e2f4..b1b72b5d90 100644
>> > > --- a/include/exec/plugin-gen.h
>> > > +++ b/include/exec/plugin-gen.h
>> > > @@ -19,7 +19,7 @@ struct DisasContextBase;
>> > >
>> > >  #ifdef CONFIG_PLUGIN
>> > >
>> > > -bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb);
>> > > +bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb, bool supress);
>> > >  void plugin_gen_tb_end(CPUState *cpu);
>> > >  void plugin_gen_insn_start(CPUState *cpu, const struct DisasContextBase *db);
>> > >  void plugin_gen_insn_end(void);
>> > > @@ -41,7 +41,7 @@ static inline void plugin_insn_append(const void *from, size_t size)
>> > >  #else /* !CONFIG_PLUGIN */
>> > >
>> > >  static inline
>> > > -bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb)
>> > > +bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb, bool supress)
>> > >  {
>> > >      return false;
>> > >  }
>> > > diff --git a/include/qemu/plugin.h b/include/qemu/plugin.h
>> > > index 841deed79c..2a26a2277f 100644
>> > > --- a/include/qemu/plugin.h
>> > > +++ b/include/qemu/plugin.h
>> > > @@ -92,6 +92,7 @@ struct qemu_plugin_dyn_cb {
>> > >      };
>> > >  };
>> > >
>> > > +/* Internal context for instrumenting an instruction */
>> > >  struct qemu_plugin_insn {
>> > >      GByteArray *data;
>> > >      uint64_t vaddr;
>> > > @@ -99,6 +100,7 @@ struct qemu_plugin_insn {
>> > >      GArray *cbs[PLUGIN_N_CB_TYPES][PLUGIN_N_CB_SUBTYPES];
>> > >      bool calls_helpers;
>> > >      bool mem_helper;
>> > > +    bool store_only;
>> > >  };
>> > >
>> > >  /*
>> > > @@ -128,6 +130,7 @@ static inline struct qemu_plugin_insn *qemu_plugin_insn_alloc(void)
>> > >      return insn;
>> > >  }
>> > >
>> > > +/* Internal context for this TranslationBlock */
>> > >  struct qemu_plugin_tb {
>> > >      GPtrArray *insns;
>> > >      size_t n;
>> > > @@ -135,6 +138,7 @@ struct qemu_plugin_tb {
>> > >      uint64_t vaddr2;
>> > >      void *haddr1;
>> > >      void *haddr2;
>> > > +    bool store_only;
>> > >      GArray *cbs[PLUGIN_N_CB_SUBTYPES];
>> > >  };
>> > >
>> > > diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
>> > > index 8a1bb801e0..137b91282e 100644
>> > > --- a/accel/tcg/plugin-gen.c
>> > > +++ b/accel/tcg/plugin-gen.c
>> > > @@ -842,7 +842,7 @@ static void plugin_gen_inject(const struct qemu_plugin_tb *plugin_tb)
>> > >      pr_ops();
>> > >  }
>> > >
>> > > -bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb)
>> > > +bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb, bool store_only)
>> > >  {
>> > >      struct qemu_plugin_tb *ptb = tcg_ctx->plugin_tb;
>> > >      bool ret = false;
>> > > @@ -855,6 +855,7 @@ bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb)
>> > >          ptb->vaddr2 = -1;
>> > >          get_page_addr_code_hostp(cpu->env_ptr, tb->pc, &ptb->haddr1);
>> > >          ptb->haddr2 = NULL;
>> > > +        ptb->store_only = store_only;
>> > >
>> > >          plugin_gen_empty_callback(PLUGIN_GEN_FROM_TB);
>> > >      }
>> > > diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
>> > > index 14d1ea795d..082f2c8ee1 100644
>> > > --- a/accel/tcg/translator.c
>> > > +++ b/accel/tcg/translator.c
>> > > @@ -58,7 +58,7 @@ void translator_loop(const TranslatorOps *ops, DisasContextBase *db,
>> > >      ops->tb_start(db, cpu);
>> > >      tcg_debug_assert(db->is_jmp == DISAS_NEXT);  /* no early exit */
>> > >
>> > > -    plugin_enabled = !(tb_cflags(db->tb) & CF_NOINSTR) && plugin_gen_tb_start(cpu, tb);
>> > > +    plugin_enabled = plugin_gen_tb_start(cpu, tb, tb_cflags(db->tb) & CF_NOINSTR);
>> > >
>> > >      while (true) {
>> > >          db->num_insns++;
>> > > @@ -100,6 +100,8 @@ void translator_loop(const TranslatorOps *ops, DisasContextBase *db,
>> > >              gen_io_start();
>> > >              ops->translate_insn(db, cpu);
>> > >          } else {
>> > > +            /* we should only see NOINSTR for io_recompile */
>> > > +            g_assert(!(tb_cflags(db->tb) & CF_NOINSTR));
>> > >              ops->translate_insn(db, cpu);
>> > >          }
>> > >
>> > > diff --git a/plugins/api.c b/plugins/api.c
>> > > index 5dc8e6f934..ac8475707d 100644
>> > > --- a/plugins/api.c
>> > > +++ b/plugins/api.c
>> > > @@ -84,15 +84,19 @@ void qemu_plugin_register_vcpu_tb_exec_cb(struct qemu_plugin_tb *tb,
>> > >                                            enum qemu_plugin_cb_flags flags,
>> > >                                            void *udata)
>> > >  {
>> > > -    plugin_register_dyn_cb__udata(&tb->cbs[PLUGIN_CB_REGULAR],
>> > > -                                  cb, flags, udata);
>> > > +    if (!tb->store_only) {
>> > > +        plugin_register_dyn_cb__udata(&tb->cbs[PLUGIN_CB_REGULAR],
>> > > +                                      cb, flags, udata);
>> > > +    }
>> > >  }
>> > >
>> > >  void qemu_plugin_register_vcpu_tb_exec_inline(struct qemu_plugin_tb *tb,
>> > >                                                enum qemu_plugin_op op,
>> > >                                                void *ptr, uint64_t imm)
>> > >  {
>> > > -    plugin_register_inline_op(&tb->cbs[PLUGIN_CB_INLINE], 0, op, ptr, imm);
>> > > +    if (!tb->store_only) {
>> > > +        plugin_register_inline_op(&tb->cbs[PLUGIN_CB_INLINE], 0, op, ptr, imm);
>> > > +    }
>> > >  }
>> > >
>> > >  void qemu_plugin_register_vcpu_insn_exec_cb(struct qemu_plugin_insn *insn,
>> > > @@ -100,16 +104,20 @@ void qemu_plugin_register_vcpu_insn_exec_cb(struct qemu_plugin_insn *insn,
>> > >                                              enum qemu_plugin_cb_flags flags,
>> > >                                              void *udata)
>> > >  {
>> > > -    plugin_register_dyn_cb__udata(&insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_REGULAR],
>> > > -        cb, flags, udata);
>> > > +    if (!insn->store_only) {
>> > > +        plugin_register_dyn_cb__udata(&insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_REGULAR],
>> > > +                                      cb, flags, udata);
>> > > +    }
>> > >  }
>> > >
>> > >  void qemu_plugin_register_vcpu_insn_exec_inline(struct qemu_plugin_insn *insn,
>> > >                                                  enum qemu_plugin_op op,
>> > >                                                  void *ptr, uint64_t imm)
>> > >  {
>> > > -    plugin_register_inline_op(&insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_INLINE],
>> > > -                              0, op, ptr, imm);
>> > > +    if (!insn->store_only) {
>> > > +        plugin_register_inline_op(&insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_INLINE],
>> > > +                                  0, op, ptr, imm);
>> > > +    }
>> > >  }
>> > >
>> > >
>> > > @@ -120,8 +128,13 @@ void qemu_plugin_register_vcpu_mem_cb(struct qemu_plugin_insn *insn,
>> > >                                        enum qemu_plugin_mem_rw rw,
>> > >                                        void *udata)
>> > >  {
>> > > -    plugin_register_vcpu_mem_cb(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR],
>> > > -                                cb, flags, rw, udata);
>> > > +    if (insn->store_only && (rw & QEMU_PLUGIN_MEM_W)) {
>> > > +        plugin_register_vcpu_mem_cb(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR],
>> > > +                                    cb, flags, QEMU_PLUGIN_MEM_W, udata);
>> > > +    } else {
>> > > +        plugin_register_vcpu_mem_cb(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR],
>> > > +                                    cb, flags, rw, udata);
>> > > +    }
>> > >  }
>> > >
>> > >  void qemu_plugin_register_vcpu_mem_inline(struct qemu_plugin_insn *insn,
>> > > @@ -129,8 +142,10 @@ void qemu_plugin_register_vcpu_mem_inline(struct qemu_plugin_insn *insn,
>> > >                                            enum qemu_plugin_op op, void *ptr,
>> > >                                            uint64_t imm)
>> > >  {
>> > > -    plugin_register_inline_op(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_INLINE],
>> > > -        rw, op, ptr, imm);
>> > > +    if (!insn->store_only) {
>> > > +        plugin_register_inline_op(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_INLINE],
>> > > +                                  rw, op, ptr, imm);
>> > > +    }
>> > >  }
>> > >
>> > >  void qemu_plugin_register_vcpu_tb_trans_cb(qemu_plugin_id_t id,
>> > > @@ -181,10 +196,13 @@ uint64_t qemu_plugin_tb_vaddr(const struct qemu_plugin_tb *tb)
>> > >  struct qemu_plugin_insn *
>> > >  qemu_plugin_tb_get_insn(const struct qemu_plugin_tb *tb, size_t idx)
>> > >  {
>> > > +    struct qemu_plugin_insn *insn;
>> > >      if (unlikely(idx >= tb->n)) {
>> > >          return NULL;
>> > >      }
>> > > -    return g_ptr_array_index(tb->insns, idx);
>> > > +    insn = g_ptr_array_index(tb->insns, idx);
>> > > +    insn->store_only = tb->store_only;
>> > > +    return insn;
>> > >  }
>> > >
>> > >  /*
>> > > --8<---------------cut here---------------end--------------->8---
>> > >
>> > > --
>> > > Alex Bennée
>> 
>> 
>> 
>> -- 
>> Alex Bennée
>> KVM/QEMU Hacker for Linaro


-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 20/21] accel/tcg: allow plugin instrumentation to be disable via cflags
  2021-02-12 16:50           ` Aaron Lindsay via
  2021-02-12 17:19             ` Alex Bennée
@ 2021-02-16 10:34             ` Alex Bennée
  2021-02-17 16:32               ` Aaron Lindsay via
  1 sibling, 1 reply; 53+ messages in thread
From: Alex Bennée @ 2021-02-16 10:34 UTC (permalink / raw)
  To: Aaron Lindsay
  Cc: Richard Henderson, QEMU Developers, Robert Henry,
	mahmoudabdalghany, Emilio G. Cota, Paolo Bonzini, Chenqun (kuhn)


Aaron Lindsay <aaron@os.amperecomputing.com> writes:

> On Feb 12 16:04, Alex Bennée wrote:
>> Do you see two stores or one store? I think I got the sense the wrong
>> way around because the store is instrumented before the mmu code,
>> hence should be skipped on a re-instrumented block.
>
> I only see one store between the instruction callback for the store and
> the instruction callback for the subsequent instruction.

I've posted:

  Subject: [PATCH  v3 00/23] plugins/next pre-PR (hwprofile, regression fixes, icount count fix)
  Date: Sat, 13 Feb 2021 13:03:02 +0000
  Message-Id: <20210213130325.14781-1-alex.bennee@linaro.org>

which I think solves it. Could you have a look?

>
> -Aaron
>
>> On Fri, 12 Feb 2021 at 15:41, Aaron Lindsay
>> <aaron@os.amperecomputing.com> wrote:
>> >
>> > On Feb 12 14:43, Alex Bennée wrote:
>> > > Aaron Lindsay <aaron@os.amperecomputing.com> writes:
>> > > > On Feb 10 22:10, Alex Bennée wrote:
>> > > >> When icount is enabled and we recompile an MMIO access we end up
>> > > >> double counting the instruction execution. To avoid this we introduce
>> > > >> the CF_NOINSTR cflag which disables instrumentation for the next TB.
>> > > >> As this is part of the hashed compile flags we will only execute the
>> > > >> generated TB while coming out of a cpu_io_recompile.
>> > > >
>> > > > Unfortunately this patch works a little too well!
>> > > >
>> > > > With this change, the memory access callbacks registered via
>> > > > `qemu_plugin_register_vcpu_mem_cb()` are never called for the
>> > > > re-translated instruction making the IO access, since we've disabled all
>> > > > instrumentation.
>> > > >
>> > > > Is it possible to selectively disable only instruction callbacks using
>> > > > this mechanism, while still allowing others that would not yet have been
>> > > > called for the re-translated instruction?
>> > >
>> > > Can you try the following fugly patch on top of this series:
>> >
>> > This patch does allow me to successfully observe memory callbacks for
>> > stores in this case. It seems from looking at the patch that you
>> > intentionally only allowed memory callbacks for stores in this case, and
>> > I still don't see callbacks any for loads.
>> >
>> > -Aaron
>> >
>> > > --8<---------------cut here---------------start------------->8---
>> > > diff --git a/include/exec/plugin-gen.h b/include/exec/plugin-gen.h
>> > > index 4834a9e2f4..b1b72b5d90 100644
>> > > --- a/include/exec/plugin-gen.h
>> > > +++ b/include/exec/plugin-gen.h
>> > > @@ -19,7 +19,7 @@ struct DisasContextBase;
>> > >
>> > >  #ifdef CONFIG_PLUGIN
>> > >
>> > > -bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb);
>> > > +bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb, bool supress);
>> > >  void plugin_gen_tb_end(CPUState *cpu);
>> > >  void plugin_gen_insn_start(CPUState *cpu, const struct DisasContextBase *db);
>> > >  void plugin_gen_insn_end(void);
>> > > @@ -41,7 +41,7 @@ static inline void plugin_insn_append(const void *from, size_t size)
>> > >  #else /* !CONFIG_PLUGIN */
>> > >
>> > >  static inline
>> > > -bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb)
>> > > +bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb, bool supress)
>> > >  {
>> > >      return false;
>> > >  }
>> > > diff --git a/include/qemu/plugin.h b/include/qemu/plugin.h
>> > > index 841deed79c..2a26a2277f 100644
>> > > --- a/include/qemu/plugin.h
>> > > +++ b/include/qemu/plugin.h
>> > > @@ -92,6 +92,7 @@ struct qemu_plugin_dyn_cb {
>> > >      };
>> > >  };
>> > >
>> > > +/* Internal context for instrumenting an instruction */
>> > >  struct qemu_plugin_insn {
>> > >      GByteArray *data;
>> > >      uint64_t vaddr;
>> > > @@ -99,6 +100,7 @@ struct qemu_plugin_insn {
>> > >      GArray *cbs[PLUGIN_N_CB_TYPES][PLUGIN_N_CB_SUBTYPES];
>> > >      bool calls_helpers;
>> > >      bool mem_helper;
>> > > +    bool store_only;
>> > >  };
>> > >
>> > >  /*
>> > > @@ -128,6 +130,7 @@ static inline struct qemu_plugin_insn *qemu_plugin_insn_alloc(void)
>> > >      return insn;
>> > >  }
>> > >
>> > > +/* Internal context for this TranslationBlock */
>> > >  struct qemu_plugin_tb {
>> > >      GPtrArray *insns;
>> > >      size_t n;
>> > > @@ -135,6 +138,7 @@ struct qemu_plugin_tb {
>> > >      uint64_t vaddr2;
>> > >      void *haddr1;
>> > >      void *haddr2;
>> > > +    bool store_only;
>> > >      GArray *cbs[PLUGIN_N_CB_SUBTYPES];
>> > >  };
>> > >
>> > > diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
>> > > index 8a1bb801e0..137b91282e 100644
>> > > --- a/accel/tcg/plugin-gen.c
>> > > +++ b/accel/tcg/plugin-gen.c
>> > > @@ -842,7 +842,7 @@ static void plugin_gen_inject(const struct qemu_plugin_tb *plugin_tb)
>> > >      pr_ops();
>> > >  }
>> > >
>> > > -bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb)
>> > > +bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb, bool store_only)
>> > >  {
>> > >      struct qemu_plugin_tb *ptb = tcg_ctx->plugin_tb;
>> > >      bool ret = false;
>> > > @@ -855,6 +855,7 @@ bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb)
>> > >          ptb->vaddr2 = -1;
>> > >          get_page_addr_code_hostp(cpu->env_ptr, tb->pc, &ptb->haddr1);
>> > >          ptb->haddr2 = NULL;
>> > > +        ptb->store_only = store_only;
>> > >
>> > >          plugin_gen_empty_callback(PLUGIN_GEN_FROM_TB);
>> > >      }
>> > > diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
>> > > index 14d1ea795d..082f2c8ee1 100644
>> > > --- a/accel/tcg/translator.c
>> > > +++ b/accel/tcg/translator.c
>> > > @@ -58,7 +58,7 @@ void translator_loop(const TranslatorOps *ops, DisasContextBase *db,
>> > >      ops->tb_start(db, cpu);
>> > >      tcg_debug_assert(db->is_jmp == DISAS_NEXT);  /* no early exit */
>> > >
>> > > -    plugin_enabled = !(tb_cflags(db->tb) & CF_NOINSTR) && plugin_gen_tb_start(cpu, tb);
>> > > +    plugin_enabled = plugin_gen_tb_start(cpu, tb, tb_cflags(db->tb) & CF_NOINSTR);
>> > >
>> > >      while (true) {
>> > >          db->num_insns++;
>> > > @@ -100,6 +100,8 @@ void translator_loop(const TranslatorOps *ops, DisasContextBase *db,
>> > >              gen_io_start();
>> > >              ops->translate_insn(db, cpu);
>> > >          } else {
>> > > +            /* we should only see NOINSTR for io_recompile */
>> > > +            g_assert(!(tb_cflags(db->tb) & CF_NOINSTR));
>> > >              ops->translate_insn(db, cpu);
>> > >          }
>> > >
>> > > diff --git a/plugins/api.c b/plugins/api.c
>> > > index 5dc8e6f934..ac8475707d 100644
>> > > --- a/plugins/api.c
>> > > +++ b/plugins/api.c
>> > > @@ -84,15 +84,19 @@ void qemu_plugin_register_vcpu_tb_exec_cb(struct qemu_plugin_tb *tb,
>> > >                                            enum qemu_plugin_cb_flags flags,
>> > >                                            void *udata)
>> > >  {
>> > > -    plugin_register_dyn_cb__udata(&tb->cbs[PLUGIN_CB_REGULAR],
>> > > -                                  cb, flags, udata);
>> > > +    if (!tb->store_only) {
>> > > +        plugin_register_dyn_cb__udata(&tb->cbs[PLUGIN_CB_REGULAR],
>> > > +                                      cb, flags, udata);
>> > > +    }
>> > >  }
>> > >
>> > >  void qemu_plugin_register_vcpu_tb_exec_inline(struct qemu_plugin_tb *tb,
>> > >                                                enum qemu_plugin_op op,
>> > >                                                void *ptr, uint64_t imm)
>> > >  {
>> > > -    plugin_register_inline_op(&tb->cbs[PLUGIN_CB_INLINE], 0, op, ptr, imm);
>> > > +    if (!tb->store_only) {
>> > > +        plugin_register_inline_op(&tb->cbs[PLUGIN_CB_INLINE], 0, op, ptr, imm);
>> > > +    }
>> > >  }
>> > >
>> > >  void qemu_plugin_register_vcpu_insn_exec_cb(struct qemu_plugin_insn *insn,
>> > > @@ -100,16 +104,20 @@ void qemu_plugin_register_vcpu_insn_exec_cb(struct qemu_plugin_insn *insn,
>> > >                                              enum qemu_plugin_cb_flags flags,
>> > >                                              void *udata)
>> > >  {
>> > > -    plugin_register_dyn_cb__udata(&insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_REGULAR],
>> > > -        cb, flags, udata);
>> > > +    if (!insn->store_only) {
>> > > +        plugin_register_dyn_cb__udata(&insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_REGULAR],
>> > > +                                      cb, flags, udata);
>> > > +    }
>> > >  }
>> > >
>> > >  void qemu_plugin_register_vcpu_insn_exec_inline(struct qemu_plugin_insn *insn,
>> > >                                                  enum qemu_plugin_op op,
>> > >                                                  void *ptr, uint64_t imm)
>> > >  {
>> > > -    plugin_register_inline_op(&insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_INLINE],
>> > > -                              0, op, ptr, imm);
>> > > +    if (!insn->store_only) {
>> > > +        plugin_register_inline_op(&insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_INLINE],
>> > > +                                  0, op, ptr, imm);
>> > > +    }
>> > >  }
>> > >
>> > >
>> > > @@ -120,8 +128,13 @@ void qemu_plugin_register_vcpu_mem_cb(struct qemu_plugin_insn *insn,
>> > >                                        enum qemu_plugin_mem_rw rw,
>> > >                                        void *udata)
>> > >  {
>> > > -    plugin_register_vcpu_mem_cb(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR],
>> > > -                                cb, flags, rw, udata);
>> > > +    if (insn->store_only && (rw & QEMU_PLUGIN_MEM_W)) {
>> > > +        plugin_register_vcpu_mem_cb(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR],
>> > > +                                    cb, flags, QEMU_PLUGIN_MEM_W, udata);
>> > > +    } else {
>> > > +        plugin_register_vcpu_mem_cb(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR],
>> > > +                                    cb, flags, rw, udata);
>> > > +    }
>> > >  }
>> > >
>> > >  void qemu_plugin_register_vcpu_mem_inline(struct qemu_plugin_insn *insn,
>> > > @@ -129,8 +142,10 @@ void qemu_plugin_register_vcpu_mem_inline(struct qemu_plugin_insn *insn,
>> > >                                            enum qemu_plugin_op op, void *ptr,
>> > >                                            uint64_t imm)
>> > >  {
>> > > -    plugin_register_inline_op(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_INLINE],
>> > > -        rw, op, ptr, imm);
>> > > +    if (!insn->store_only) {
>> > > +        plugin_register_inline_op(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_INLINE],
>> > > +                                  rw, op, ptr, imm);
>> > > +    }
>> > >  }
>> > >
>> > >  void qemu_plugin_register_vcpu_tb_trans_cb(qemu_plugin_id_t id,
>> > > @@ -181,10 +196,13 @@ uint64_t qemu_plugin_tb_vaddr(const struct qemu_plugin_tb *tb)
>> > >  struct qemu_plugin_insn *
>> > >  qemu_plugin_tb_get_insn(const struct qemu_plugin_tb *tb, size_t idx)
>> > >  {
>> > > +    struct qemu_plugin_insn *insn;
>> > >      if (unlikely(idx >= tb->n)) {
>> > >          return NULL;
>> > >      }
>> > > -    return g_ptr_array_index(tb->insns, idx);
>> > > +    insn = g_ptr_array_index(tb->insns, idx);
>> > > +    insn->store_only = tb->store_only;
>> > > +    return insn;
>> > >  }
>> > >
>> > >  /*
>> > > --8<---------------cut here---------------end--------------->8---
>> > >
>> > > --
>> > > Alex Bennée
>> 
>> 
>> 
>> -- 
>> Alex Bennée
>> KVM/QEMU Hacker for Linaro


-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 20/21] accel/tcg: allow plugin instrumentation to be disable via cflags
  2021-02-16 10:34             ` Alex Bennée
@ 2021-02-17 16:32               ` Aaron Lindsay via
  0 siblings, 0 replies; 53+ messages in thread
From: Aaron Lindsay via @ 2021-02-17 16:32 UTC (permalink / raw)
  To: Alex Bennée
  Cc: QEMU Developers, Emilio G. Cota, Chenqun (kuhn),
	Robert Henry, mahmoudabdalghany, Richard Henderson,
	Paolo Bonzini

On Feb 16 10:34, Alex Bennée wrote:
> 
> Aaron Lindsay <aaron@os.amperecomputing.com> writes:
> 
> > On Feb 12 16:04, Alex Bennée wrote:
> >> Do you see two stores or one store? I think I got the sense the wrong
> >> way around because the store is instrumented before the mmu code,
> >> hence should be skipped on a re-instrumented block.
> >
> > I only see one store between the instruction callback for the store and
> > the instruction callback for the subsequent instruction.
> 
> I've posted:
> 
>   Subject: [PATCH  v3 00/23] plugins/next pre-PR (hwprofile, regression fixes, icount count fix)
>   Date: Sat, 13 Feb 2021 13:03:02 +0000
>   Message-Id: <20210213130325.14781-1-alex.bennee@linaro.org>
> 
> which I think solves it. Could you have a look?

Just did, and it looks good to me - Thanks!

-Aaron

> >
> > -Aaron
> >
> >> On Fri, 12 Feb 2021 at 15:41, Aaron Lindsay
> >> <aaron@os.amperecomputing.com> wrote:
> >> >
> >> > On Feb 12 14:43, Alex Bennée wrote:
> >> > > Aaron Lindsay <aaron@os.amperecomputing.com> writes:
> >> > > > On Feb 10 22:10, Alex Bennée wrote:
> >> > > >> When icount is enabled and we recompile an MMIO access we end up
> >> > > >> double counting the instruction execution. To avoid this we introduce
> >> > > >> the CF_NOINSTR cflag which disables instrumentation for the next TB.
> >> > > >> As this is part of the hashed compile flags we will only execute the
> >> > > >> generated TB while coming out of a cpu_io_recompile.
> >> > > >
> >> > > > Unfortunately this patch works a little too well!
> >> > > >
> >> > > > With this change, the memory access callbacks registered via
> >> > > > `qemu_plugin_register_vcpu_mem_cb()` are never called for the
> >> > > > re-translated instruction making the IO access, since we've disabled all
> >> > > > instrumentation.
> >> > > >
> >> > > > Is it possible to selectively disable only instruction callbacks using
> >> > > > this mechanism, while still allowing others that would not yet have been
> >> > > > called for the re-translated instruction?
> >> > >
> >> > > Can you try the following fugly patch on top of this series:
> >> >
> >> > This patch does allow me to successfully observe memory callbacks for
> >> > stores in this case. It seems from looking at the patch that you
> >> > intentionally only allowed memory callbacks for stores in this case, and
> >> > I still don't see callbacks any for loads.
> >> >
> >> > -Aaron
> >> >
> >> > > --8<---------------cut here---------------start------------->8---
> >> > > diff --git a/include/exec/plugin-gen.h b/include/exec/plugin-gen.h
> >> > > index 4834a9e2f4..b1b72b5d90 100644
> >> > > --- a/include/exec/plugin-gen.h
> >> > > +++ b/include/exec/plugin-gen.h
> >> > > @@ -19,7 +19,7 @@ struct DisasContextBase;
> >> > >
> >> > >  #ifdef CONFIG_PLUGIN
> >> > >
> >> > > -bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb);
> >> > > +bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb, bool supress);
> >> > >  void plugin_gen_tb_end(CPUState *cpu);
> >> > >  void plugin_gen_insn_start(CPUState *cpu, const struct DisasContextBase *db);
> >> > >  void plugin_gen_insn_end(void);
> >> > > @@ -41,7 +41,7 @@ static inline void plugin_insn_append(const void *from, size_t size)
> >> > >  #else /* !CONFIG_PLUGIN */
> >> > >
> >> > >  static inline
> >> > > -bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb)
> >> > > +bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb, bool supress)
> >> > >  {
> >> > >      return false;
> >> > >  }
> >> > > diff --git a/include/qemu/plugin.h b/include/qemu/plugin.h
> >> > > index 841deed79c..2a26a2277f 100644
> >> > > --- a/include/qemu/plugin.h
> >> > > +++ b/include/qemu/plugin.h
> >> > > @@ -92,6 +92,7 @@ struct qemu_plugin_dyn_cb {
> >> > >      };
> >> > >  };
> >> > >
> >> > > +/* Internal context for instrumenting an instruction */
> >> > >  struct qemu_plugin_insn {
> >> > >      GByteArray *data;
> >> > >      uint64_t vaddr;
> >> > > @@ -99,6 +100,7 @@ struct qemu_plugin_insn {
> >> > >      GArray *cbs[PLUGIN_N_CB_TYPES][PLUGIN_N_CB_SUBTYPES];
> >> > >      bool calls_helpers;
> >> > >      bool mem_helper;
> >> > > +    bool store_only;
> >> > >  };
> >> > >
> >> > >  /*
> >> > > @@ -128,6 +130,7 @@ static inline struct qemu_plugin_insn *qemu_plugin_insn_alloc(void)
> >> > >      return insn;
> >> > >  }
> >> > >
> >> > > +/* Internal context for this TranslationBlock */
> >> > >  struct qemu_plugin_tb {
> >> > >      GPtrArray *insns;
> >> > >      size_t n;
> >> > > @@ -135,6 +138,7 @@ struct qemu_plugin_tb {
> >> > >      uint64_t vaddr2;
> >> > >      void *haddr1;
> >> > >      void *haddr2;
> >> > > +    bool store_only;
> >> > >      GArray *cbs[PLUGIN_N_CB_SUBTYPES];
> >> > >  };
> >> > >
> >> > > diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
> >> > > index 8a1bb801e0..137b91282e 100644
> >> > > --- a/accel/tcg/plugin-gen.c
> >> > > +++ b/accel/tcg/plugin-gen.c
> >> > > @@ -842,7 +842,7 @@ static void plugin_gen_inject(const struct qemu_plugin_tb *plugin_tb)
> >> > >      pr_ops();
> >> > >  }
> >> > >
> >> > > -bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb)
> >> > > +bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb, bool store_only)
> >> > >  {
> >> > >      struct qemu_plugin_tb *ptb = tcg_ctx->plugin_tb;
> >> > >      bool ret = false;
> >> > > @@ -855,6 +855,7 @@ bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb)
> >> > >          ptb->vaddr2 = -1;
> >> > >          get_page_addr_code_hostp(cpu->env_ptr, tb->pc, &ptb->haddr1);
> >> > >          ptb->haddr2 = NULL;
> >> > > +        ptb->store_only = store_only;
> >> > >
> >> > >          plugin_gen_empty_callback(PLUGIN_GEN_FROM_TB);
> >> > >      }
> >> > > diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
> >> > > index 14d1ea795d..082f2c8ee1 100644
> >> > > --- a/accel/tcg/translator.c
> >> > > +++ b/accel/tcg/translator.c
> >> > > @@ -58,7 +58,7 @@ void translator_loop(const TranslatorOps *ops, DisasContextBase *db,
> >> > >      ops->tb_start(db, cpu);
> >> > >      tcg_debug_assert(db->is_jmp == DISAS_NEXT);  /* no early exit */
> >> > >
> >> > > -    plugin_enabled = !(tb_cflags(db->tb) & CF_NOINSTR) && plugin_gen_tb_start(cpu, tb);
> >> > > +    plugin_enabled = plugin_gen_tb_start(cpu, tb, tb_cflags(db->tb) & CF_NOINSTR);
> >> > >
> >> > >      while (true) {
> >> > >          db->num_insns++;
> >> > > @@ -100,6 +100,8 @@ void translator_loop(const TranslatorOps *ops, DisasContextBase *db,
> >> > >              gen_io_start();
> >> > >              ops->translate_insn(db, cpu);
> >> > >          } else {
> >> > > +            /* we should only see NOINSTR for io_recompile */
> >> > > +            g_assert(!(tb_cflags(db->tb) & CF_NOINSTR));
> >> > >              ops->translate_insn(db, cpu);
> >> > >          }
> >> > >
> >> > > diff --git a/plugins/api.c b/plugins/api.c
> >> > > index 5dc8e6f934..ac8475707d 100644
> >> > > --- a/plugins/api.c
> >> > > +++ b/plugins/api.c
> >> > > @@ -84,15 +84,19 @@ void qemu_plugin_register_vcpu_tb_exec_cb(struct qemu_plugin_tb *tb,
> >> > >                                            enum qemu_plugin_cb_flags flags,
> >> > >                                            void *udata)
> >> > >  {
> >> > > -    plugin_register_dyn_cb__udata(&tb->cbs[PLUGIN_CB_REGULAR],
> >> > > -                                  cb, flags, udata);
> >> > > +    if (!tb->store_only) {
> >> > > +        plugin_register_dyn_cb__udata(&tb->cbs[PLUGIN_CB_REGULAR],
> >> > > +                                      cb, flags, udata);
> >> > > +    }
> >> > >  }
> >> > >
> >> > >  void qemu_plugin_register_vcpu_tb_exec_inline(struct qemu_plugin_tb *tb,
> >> > >                                                enum qemu_plugin_op op,
> >> > >                                                void *ptr, uint64_t imm)
> >> > >  {
> >> > > -    plugin_register_inline_op(&tb->cbs[PLUGIN_CB_INLINE], 0, op, ptr, imm);
> >> > > +    if (!tb->store_only) {
> >> > > +        plugin_register_inline_op(&tb->cbs[PLUGIN_CB_INLINE], 0, op, ptr, imm);
> >> > > +    }
> >> > >  }
> >> > >
> >> > >  void qemu_plugin_register_vcpu_insn_exec_cb(struct qemu_plugin_insn *insn,
> >> > > @@ -100,16 +104,20 @@ void qemu_plugin_register_vcpu_insn_exec_cb(struct qemu_plugin_insn *insn,
> >> > >                                              enum qemu_plugin_cb_flags flags,
> >> > >                                              void *udata)
> >> > >  {
> >> > > -    plugin_register_dyn_cb__udata(&insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_REGULAR],
> >> > > -        cb, flags, udata);
> >> > > +    if (!insn->store_only) {
> >> > > +        plugin_register_dyn_cb__udata(&insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_REGULAR],
> >> > > +                                      cb, flags, udata);
> >> > > +    }
> >> > >  }
> >> > >
> >> > >  void qemu_plugin_register_vcpu_insn_exec_inline(struct qemu_plugin_insn *insn,
> >> > >                                                  enum qemu_plugin_op op,
> >> > >                                                  void *ptr, uint64_t imm)
> >> > >  {
> >> > > -    plugin_register_inline_op(&insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_INLINE],
> >> > > -                              0, op, ptr, imm);
> >> > > +    if (!insn->store_only) {
> >> > > +        plugin_register_inline_op(&insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_INLINE],
> >> > > +                                  0, op, ptr, imm);
> >> > > +    }
> >> > >  }
> >> > >
> >> > >
> >> > > @@ -120,8 +128,13 @@ void qemu_plugin_register_vcpu_mem_cb(struct qemu_plugin_insn *insn,
> >> > >                                        enum qemu_plugin_mem_rw rw,
> >> > >                                        void *udata)
> >> > >  {
> >> > > -    plugin_register_vcpu_mem_cb(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR],
> >> > > -                                cb, flags, rw, udata);
> >> > > +    if (insn->store_only && (rw & QEMU_PLUGIN_MEM_W)) {
> >> > > +        plugin_register_vcpu_mem_cb(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR],
> >> > > +                                    cb, flags, QEMU_PLUGIN_MEM_W, udata);
> >> > > +    } else {
> >> > > +        plugin_register_vcpu_mem_cb(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR],
> >> > > +                                    cb, flags, rw, udata);
> >> > > +    }
> >> > >  }
> >> > >
> >> > >  void qemu_plugin_register_vcpu_mem_inline(struct qemu_plugin_insn *insn,
> >> > > @@ -129,8 +142,10 @@ void qemu_plugin_register_vcpu_mem_inline(struct qemu_plugin_insn *insn,
> >> > >                                            enum qemu_plugin_op op, void *ptr,
> >> > >                                            uint64_t imm)
> >> > >  {
> >> > > -    plugin_register_inline_op(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_INLINE],
> >> > > -        rw, op, ptr, imm);
> >> > > +    if (!insn->store_only) {
> >> > > +        plugin_register_inline_op(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_INLINE],
> >> > > +                                  rw, op, ptr, imm);
> >> > > +    }
> >> > >  }
> >> > >
> >> > >  void qemu_plugin_register_vcpu_tb_trans_cb(qemu_plugin_id_t id,
> >> > > @@ -181,10 +196,13 @@ uint64_t qemu_plugin_tb_vaddr(const struct qemu_plugin_tb *tb)
> >> > >  struct qemu_plugin_insn *
> >> > >  qemu_plugin_tb_get_insn(const struct qemu_plugin_tb *tb, size_t idx)
> >> > >  {
> >> > > +    struct qemu_plugin_insn *insn;
> >> > >      if (unlikely(idx >= tb->n)) {
> >> > >          return NULL;
> >> > >      }
> >> > > -    return g_ptr_array_index(tb->insns, idx);
> >> > > +    insn = g_ptr_array_index(tb->insns, idx);
> >> > > +    insn->store_only = tb->store_only;
> >> > > +    return insn;
> >> > >  }
> >> > >
> >> > >  /*
> >> > > --8<---------------cut here---------------end--------------->8---
> >> > >
> >> > > --
> >> > > Alex Bennée
> >> 
> >> 
> >> 
> >> -- 
> >> Alex Bennée
> >> KVM/QEMU Hacker for Linaro
> 
> 
> -- 
> Alex Bennée


^ permalink raw reply	[flat|nested] 53+ messages in thread

end of thread, other threads:[~2021-02-17 16:35 UTC | newest]

Thread overview: 53+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-10 22:10 [PATCH v2 00/21] plugins/next pre-PR (hwprofile, regression fixes, icount count fix) Alex Bennée
2021-02-10 22:10 ` [PATCH v2 01/21] hw/virtio/pci: include vdev name in registered PCI sections Alex Bennée
2021-02-10 22:10 ` [PATCH v2 02/21] plugins: add API to return a name for a IO device Alex Bennée
2021-02-10 22:10 ` [PATCH v2 03/21] plugins: new hwprofile plugin Alex Bennée
2021-02-10 22:10 ` [PATCH v2 04/21] contrib: Don't use '#' flag of printf format Alex Bennée
2021-02-10 22:10 ` [PATCH v2 05/21] contrib: Fix some code style problems, ERROR: "foo * bar" should be "foo *bar" Alex Bennée
2021-02-10 22:10 ` [PATCH v2 06/21] contrib: Add spaces around operator Alex Bennée
2021-02-10 22:10 ` [PATCH v2 07/21] contrib: space required after that ',' Alex Bennée
2021-02-10 22:10 ` [PATCH v2 08/21] contrib: Open brace '{' following struct go on the same line Alex Bennée
2021-02-10 22:10 ` [PATCH v2 09/21] accel/tcg/plugin-gen: fix the call signature for inline callbacks Alex Bennée
2021-02-10 22:10 ` [PATCH v2 10/21] exec: Move TranslationBlock typedef to qemu/typedefs.h Alex Bennée
2021-02-11 10:14   ` Philippe Mathieu-Daudé
2021-02-11 10:24     ` Alex Bennée
2021-02-10 22:10 ` [PATCH v2 11/21] accel/tcg: Create io_recompile_replay_branch hook Alex Bennée
2021-02-11 10:12   ` Philippe Mathieu-Daudé
2021-02-10 22:10 ` [PATCH v2 12/21] target/mips: Create mips_io_recompile_replay_branch Alex Bennée
2021-02-11 10:10   ` Philippe Mathieu-Daudé
2021-02-10 22:10 ` [PATCH v2 13/21] target/sh4: Create superh_io_recompile_replay_branch Alex Bennée
2021-02-11 10:13   ` Philippe Mathieu-Daudé
2021-02-10 22:10 ` [PATCH v2 14/21] tests/plugin: expand insn test to detect duplicate instructions Alex Bennée
2021-02-10 22:10 ` [PATCH v2 15/21] tests/acceptance: add a new set of tests to exercise plugins Alex Bennée
2021-02-11 10:31   ` Philippe Mathieu-Daudé
2021-02-11 18:59     ` Wainer dos Santos Moschetta
2021-02-11 19:51   ` Wainer dos Santos Moschetta
2021-02-10 22:10 ` [PATCH v2 16/21] accel/tcg: actually cache our partial icount TB Alex Bennée
2021-02-11 10:21   ` Philippe Mathieu-Daudé
2021-02-11 18:48     ` Richard Henderson
2021-02-12 15:40       ` Philippe Mathieu-Daudé
2021-02-12 17:06         ` Alex Bennée
2021-02-11 18:48   ` Richard Henderson
2021-02-10 22:10 ` [PATCH v2 17/21] accel/tcg: cache single instruction TB on pending replay exception Alex Bennée
2021-02-11 19:12   ` Richard Henderson
2021-02-11 20:00     ` Alex Bennée
2021-02-10 22:10 ` [PATCH v2 18/21] accel/tcg: re-factor non-RAM execution code Alex Bennée
2021-02-11 19:19   ` Richard Henderson
2021-02-10 22:10 ` [PATCH v2 19/21] accel/tcg: remove CF_NOCACHE and special cases Alex Bennée
2021-02-10 22:10 ` [PATCH v2 20/21] accel/tcg: allow plugin instrumentation to be disable via cflags Alex Bennée
2021-02-12  0:53   ` Aaron Lindsay via
2021-02-12 11:22     ` Alex Bennée
2021-02-12 14:31       ` Aaron Lindsay via
2021-02-12 14:59         ` Alex Bennée
2021-02-12 14:43     ` Alex Bennée
2021-02-12 15:41       ` Aaron Lindsay via
2021-02-12 16:04         ` Alex Bennée
2021-02-12 16:50           ` Aaron Lindsay via
2021-02-12 17:19             ` Alex Bennée
2021-02-16 10:34             ` Alex Bennée
2021-02-17 16:32               ` Aaron Lindsay via
2021-02-12 16:00       ` Alex Bennée
2021-02-12 17:04         ` Aaron Lindsay via
2021-02-10 22:10 ` [PATCH v2 21/21] tests/acceptance: add a new tests to detect counting errors Alex Bennée
2021-02-11 10:24   ` Philippe Mathieu-Daudé
2021-02-11 19:56   ` Wainer dos Santos Moschetta

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.