* [Qemu-devel] [ PATCH v7 00/22] replay additions
@ 2018-02-27  9:51 Pavel Dovgalyuk
  2018-02-27  9:51 ` [Qemu-devel] [PATCH v7 01/22] cpu-exec: fix exception_index handling Pavel Dovgalyuk
                   ` (22 more replies)
  0 siblings, 23 replies; 29+ messages in thread
From: Pavel Dovgalyuk @ 2018-02-27  9:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, peter.maydell, war2jordan, boost.lists, quintela,
	ciro.santilli, jasowang, mst, zuban32s, maria.klimushenkova,
	dovgaluk, kraxel, pavel.dovgaluk, thomas.dullien, pbonzini,
	alex.bennee

This set of patches moves the replay lock higher up the function call tree.
The replay lock now functions similarly to the BQL in older versions and allows
deterministic execution of the threads in icount mode.
It also fixes some vmstate creation (and loading) issues
in record/replay modes:
 - VM start/stop fixes in replay mode
 - overlay creation for blkreplay filter
 - fixes for vmstate save/load in record/replay mode
 - fixes for host clock vmstate

There is also a set of helper scripts written by Alex Bennée
for debugging the record/replay code.

v6 patches with updates for v7 are available in the repository:
https://github.com/ispras/qemu/tree/rr-180207

v7 changes:
 - updated record/replay documentation
 - removed abort() from mutex stub functions
 - fixed cpu_io_recompile function

v6 changes:
 - removed the BQL optimization entirely
 - refined replay lock patches
 - removed lock/unlock from replay-audio

v5 changes:
 - removed patch for narrowing BQL-protected code
 - disabled bdrv_(drain/flush)_all for record/replay mode

v4 changes:
 - removed upstreamed patches
 - added patch for saving async queue state in replay
 - minor fixes

v3 changes:
 - removed upstreamed patches
 - fixed bug with recursive checkpoints
 - fixed bug with icount warp checkpoint

v2 changes:
 - updated lock/unlock logic (as suggested by Paolo Bonzini)
 - updated cpu execution loop to avoid races in setting/resetting exit request (as suggested by Paolo Bonzini)
 - minor changes

---

Alex Bennée (5):
      replay/replay.c: bump REPLAY_VERSION again
      replay/replay-internal.c: track holding of replay_lock
      replay: make locking visible outside replay code
      replay: push replay_mutex_lock up the call tree
      scripts/replay-dump.py: replay log dumper

Pavel Dovgalyuk (17):
      cpu-exec: fix exception_index handling
      block: implement bdrv_snapshot_goto for blkreplay
      blkreplay: create temporary overlay for underlaying devices
      replay: disable default snapshot for record/replay
      replay: fix processing async events
      replay: fixed replay_enable_events
      replay: fix save/load vm for non-empty queue
      replay: added replay log format description
      replay: save prior value of the host clock
      replay: don't destroy mutex at exit
      replay: check return values of fwrite
      replay: avoid recursive call of checkpoints
      replay: don't process async events when warping the clock
      replay: save vmstate of the asynchronous events
      replay: don't drain/flush bdrv queue while RR is working
      replay: update documentation
      tcg: fix cpu_io_recompile


 accel/tcg/cpu-exec.c      |    5 +
 accel/tcg/translate-all.c |   18 ++-
 block/blkreplay.c         |   75 +++++++++++
 block/io.c                |   22 +++
 cpus.c                    |   26 +++-
 docs/replay.txt           |  163 +++++++++++++++++++++---
 include/qemu/timer.h      |   14 ++
 include/sysemu/replay.h   |   18 +++
 migration/savevm.c        |   13 ++
 replay/replay-audio.c     |   14 +-
 replay/replay-char.c      |   21 +--
 replay/replay-events.c    |   75 +++++------
 replay/replay-internal.c  |   47 ++++++-
 replay/replay-internal.h  |   16 ++
 replay/replay-snapshot.c  |   12 ++
 replay/replay-time.c      |   10 +
 replay/replay.c           |   62 ++++++---
 scripts/replay-dump.py    |  308 +++++++++++++++++++++++++++++++++++++++++++++
 stubs/replay.c            |    9 +
 util/main-loop.c          |   15 ++
 util/qemu-timer.c         |   12 ++
 vl.c                      |   12 +-
 22 files changed, 831 insertions(+), 136 deletions(-)
 create mode 100755 scripts/replay-dump.py

-- 
Pavel Dovgalyuk


* [Qemu-devel] [PATCH v7 01/22] cpu-exec: fix exception_index handling
  2018-02-27  9:51 [Qemu-devel] [ PATCH v7 00/22] replay additions Pavel Dovgalyuk
@ 2018-02-27  9:51 ` Pavel Dovgalyuk
  2018-02-27  9:51 ` [Qemu-devel] [PATCH v7 02/22] block: implement bdrv_snapshot_goto for blkreplay Pavel Dovgalyuk
                   ` (21 subsequent siblings)
  22 siblings, 0 replies; 29+ messages in thread
From: Pavel Dovgalyuk @ 2018-02-27  9:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, peter.maydell, war2jordan, boost.lists, quintela,
	ciro.santilli, jasowang, mst, zuban32s, maria.klimushenkova,
	dovgaluk, kraxel, pavel.dovgaluk, thomas.dullien, pbonzini,
	alex.bennee

The function cpu_handle_interrupt calls cc->cpu_exec_interrupt to process
pending hardware interrupts. Under the hood, cpu_exec_interrupt uses
cpu->exception_index to pass information to an internal function that
is usually shared by exception and interrupt processing.
This value is not reset on return, so it could be processed again
by cpu_handle_exception. In practice this does not happen, because
exception_index is overwritten at the end of cpu_handle_interrupt.
However, that branch may also overwrite a valid exception_index in some cases.
Therefore this patch:
 1. resets exception_index just after the call to cpu_exec_interrupt
 2. prevents overwriting the meaningful value of exception_index
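
The two changes can be illustrated with a small standalone model. This is
only a sketch: ToyCPU, toy_exec_interrupt and toy_handle_interrupt are
hypothetical stand-ins rather than QEMU code, and the constant values are
chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the QEMU definitions. */
#define TOY_EXCP_NONE      (-1)
#define TOY_EXCP_INTERRUPT 0x10000

typedef struct {
    int exception_index;
    bool exit_request;
} ToyCPU;

/* Models a target hook that borrows exception_index while delivering
   a hardware interrupt and does not reset it afterwards. */
static bool toy_exec_interrupt(ToyCPU *cpu, int vector)
{
    assert(cpu);
    cpu->exception_index = vector;
    return true;
}

/* Models the patched flow; returns true when the execution loop
   has to be left. */
static bool toy_handle_interrupt(ToyCPU *cpu, bool irq_pending, int vector)
{
    assert(cpu);
    if (irq_pending && toy_exec_interrupt(cpu, vector)) {
        /* Change 1: drop the stale value left behind by the hook. */
        cpu->exception_index = TOY_EXCP_NONE;
    }
    if (cpu->exit_request) {
        cpu->exit_request = false;
        /* Change 2: keep an already pending exception untouched. */
        if (cpu->exception_index == TOY_EXCP_NONE) {
            cpu->exception_index = TOY_EXCP_INTERRUPT;
        }
        return true;
    }
    return false;
}
```

With this model, a delivered interrupt leaves exception_index at -1, and an
exit request no longer clobbers an exception that is still waiting to be
handled.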

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 accel/tcg/cpu-exec.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 280200f..9cc6972 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -585,6 +585,7 @@ static inline bool cpu_handle_interrupt(CPUState *cpu,
         else {
             if (cc->cpu_exec_interrupt(cpu, interrupt_request)) {
                 replay_interrupt();
+                cpu->exception_index = -1;
                 *last_tb = NULL;
             }
             /* The target hook may have updated the 'cpu->interrupt_request';
@@ -606,7 +607,9 @@ static inline bool cpu_handle_interrupt(CPUState *cpu,
     if (unlikely(atomic_read(&cpu->exit_request)
         || (use_icount && cpu->icount_decr.u16.low + cpu->icount_extra == 0))) {
         atomic_set(&cpu->exit_request, 0);
-        cpu->exception_index = EXCP_INTERRUPT;
+        if (cpu->exception_index == -1) {
+            cpu->exception_index = EXCP_INTERRUPT;
+        }
         return true;
     }
 


* [Qemu-devel] [PATCH v7 02/22] block: implement bdrv_snapshot_goto for blkreplay
  2018-02-27  9:51 [Qemu-devel] [ PATCH v7 00/22] replay additions Pavel Dovgalyuk
  2018-02-27  9:51 ` [Qemu-devel] [PATCH v7 01/22] cpu-exec: fix exception_index handling Pavel Dovgalyuk
@ 2018-02-27  9:51 ` Pavel Dovgalyuk
  2018-02-27  9:51 ` [Qemu-devel] [PATCH v7 03/22] blkreplay: create temporary overlay for underlaying devices Pavel Dovgalyuk
                   ` (20 subsequent siblings)
  22 siblings, 0 replies; 29+ messages in thread
From: Pavel Dovgalyuk @ 2018-02-27  9:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, peter.maydell, war2jordan, boost.lists, quintela,
	ciro.santilli, jasowang, mst, zuban32s, maria.klimushenkova,
	dovgaluk, kraxel, pavel.dovgaluk, thomas.dullien, pbonzini,
	alex.bennee

From: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>

This patch enables making snapshots when blkreplay is used in
block devices.
The new function is required so that bdrv_snapshot_goto works without
calling .bdrv_open, which is not implemented.

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
---
 block/blkreplay.c |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/block/blkreplay.c b/block/blkreplay.c
index 61e44a1..4c58bd2 100755
--- a/block/blkreplay.c
+++ b/block/blkreplay.c
@@ -127,6 +127,12 @@ static int coroutine_fn blkreplay_co_flush(BlockDriverState *bs)
     return ret;
 }
 
+static int blkreplay_snapshot_goto(BlockDriverState *bs,
+                                   const char *snapshot_id)
+{
+    return bdrv_snapshot_goto(bs->file->bs, snapshot_id, NULL);
+}
+
 static BlockDriver bdrv_blkreplay = {
     .format_name            = "blkreplay",
     .protocol_name          = "blkreplay",
@@ -143,6 +149,8 @@ static BlockDriver bdrv_blkreplay = {
     .bdrv_co_pwrite_zeroes  = blkreplay_co_pwrite_zeroes,
     .bdrv_co_pdiscard       = blkreplay_co_pdiscard,
     .bdrv_co_flush          = blkreplay_co_flush,
+
+    .bdrv_snapshot_goto     = blkreplay_snapshot_goto,
 };
 
 static void bdrv_blkreplay_init(void)


* [Qemu-devel] [PATCH v7 03/22] blkreplay: create temporary overlay for underlaying devices
  2018-02-27  9:51 [Qemu-devel] [ PATCH v7 00/22] replay additions Pavel Dovgalyuk
  2018-02-27  9:51 ` [Qemu-devel] [PATCH v7 01/22] cpu-exec: fix exception_index handling Pavel Dovgalyuk
  2018-02-27  9:51 ` [Qemu-devel] [PATCH v7 02/22] block: implement bdrv_snapshot_goto for blkreplay Pavel Dovgalyuk
@ 2018-02-27  9:51 ` Pavel Dovgalyuk
  2018-02-27  9:51 ` [Qemu-devel] [PATCH v7 04/22] replay: disable default snapshot for record/replay Pavel Dovgalyuk
                   ` (19 subsequent siblings)
  22 siblings, 0 replies; 29+ messages in thread
From: Pavel Dovgalyuk @ 2018-02-27  9:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, peter.maydell, war2jordan, boost.lists, quintela,
	ciro.santilli, jasowang, mst, zuban32s, maria.klimushenkova,
	dovgaluk, kraxel, pavel.dovgaluk, thomas.dullien, pbonzini,
	alex.bennee

From: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>

This patch allows using '-snapshot' behavior in record/replay mode.
The blkreplay layer creates temporary overlays on top of the underlying
disk images. This is needed because creating an overlay over blkreplay
breaks the determinism.
This patch creates a similar temporary overlay (when it is needed)
under the blkreplay driver. Therefore all block operations are controlled
by blkreplay.

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
---
 block/blkreplay.c |   67 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 stubs/replay.c    |    1 +
 vl.c              |    2 +-
 3 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/block/blkreplay.c b/block/blkreplay.c
index 4c58bd2..2b68ac3 100755
--- a/block/blkreplay.c
+++ b/block/blkreplay.c
@@ -14,12 +14,71 @@
 #include "block/block_int.h"
 #include "sysemu/replay.h"
 #include "qapi/error.h"
+#include "qapi/qmp/qstring.h"
+#include "qapi/qmp/qdict.h"
+#include "qemu/option.h"
 
 typedef struct Request {
     Coroutine *co;
     QEMUBH *bh;
 } Request;
 
+static BlockDriverState *blkreplay_append_snapshot(BlockDriverState *bs,
+                                                   Error **errp)
+{
+    int ret;
+    BlockDriverState *bs_snapshot;
+    int64_t total_size;
+    QemuOpts *opts = NULL;
+    char tmp_filename[PATH_MAX + 1];
+    QDict *snapshot_options = qdict_new();
+
+    /* Prepare options QDict for the overlay file */
+    qdict_put(snapshot_options, "file.driver", qstring_from_str("file"));
+    qdict_put(snapshot_options, "driver", qstring_from_str("qcow2"));
+
+    /* Create temporary file */
+    ret = get_tmp_filename(tmp_filename, PATH_MAX + 1);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret, "Could not get temporary filename");
+        goto out;
+    }
+    qdict_put(snapshot_options, "file.filename",
+              qstring_from_str(tmp_filename));
+
+    /* Get the required size from the image */
+    total_size = bdrv_getlength(bs);
+    if (total_size < 0) {
+        error_setg_errno(errp, -total_size, "Could not get image size");
+        goto out;
+    }
+
+    opts = qemu_opts_create(bdrv_qcow2.create_opts, NULL, 0, &error_abort);
+    qemu_opt_set_number(opts, BLOCK_OPT_SIZE, total_size, &error_abort);
+    ret = bdrv_create(&bdrv_qcow2, tmp_filename, opts, errp);
+    qemu_opts_del(opts);
+    if (ret < 0) {
+        error_prepend(errp, "Could not create temporary overlay '%s': ",
+                      tmp_filename);
+        goto out;
+    }
+
+    bs_snapshot = bdrv_open(NULL, NULL, snapshot_options,
+                            BDRV_O_RDWR | BDRV_O_TEMPORARY, errp);
+    snapshot_options = NULL;
+    if (!bs_snapshot) {
+        goto out;
+    }
+
+    bdrv_append(bs_snapshot, bs, errp);
+
+    return bs_snapshot;
+
+out:
+    QDECREF(snapshot_options);
+    return NULL;
+}
+
 static int blkreplay_open(BlockDriverState *bs, QDict *options, int flags,
                           Error **errp)
 {
@@ -35,6 +94,14 @@ static int blkreplay_open(BlockDriverState *bs, QDict *options, int flags,
         goto fail;
     }
 
+    /* Add temporary snapshot to preserve the image */
+    if (!replay_snapshot
+        && !blkreplay_append_snapshot(bs->file->bs, &local_err)) {
+        ret = -EINVAL;
+        error_propagate(errp, local_err);
+        goto fail;
+    }
+
     ret = 0;
 fail:
     return ret;
diff --git a/stubs/replay.c b/stubs/replay.c
index 9c8aa48..9991ee5 100644
--- a/stubs/replay.c
+++ b/stubs/replay.c
@@ -3,6 +3,7 @@
 #include "sysemu/sysemu.h"
 
 ReplayMode replay_mode;
+char *replay_snapshot;
 
 int64_t replay_save_clock(unsigned int kind, int64_t clock)
 {
diff --git a/vl.c b/vl.c
index 9e7235d..d260a06 100644
--- a/vl.c
+++ b/vl.c
@@ -4523,7 +4523,7 @@ int main(int argc, char **argv, char **envp)
         qapi_free_BlockdevOptions(bdo->bdo);
         g_free(bdo);
     }
-    if (snapshot || replay_mode != REPLAY_MODE_NONE) {
+    if (snapshot) {
         qemu_opts_foreach(qemu_find_opts("drive"), drive_enable_snapshot,
                           NULL, NULL);
     }


* [Qemu-devel] [PATCH v7 04/22] replay: disable default snapshot for record/replay
  2018-02-27  9:51 [Qemu-devel] [ PATCH v7 00/22] replay additions Pavel Dovgalyuk
                   ` (2 preceding siblings ...)
  2018-02-27  9:51 ` [Qemu-devel] [PATCH v7 03/22] blkreplay: create temporary overlay for underlaying devices Pavel Dovgalyuk
@ 2018-02-27  9:51 ` Pavel Dovgalyuk
  2018-02-27  9:52 ` [Qemu-devel] [PATCH v7 05/22] replay: fix processing async events Pavel Dovgalyuk
                   ` (18 subsequent siblings)
  22 siblings, 0 replies; 29+ messages in thread
From: Pavel Dovgalyuk @ 2018-02-27  9:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, peter.maydell, war2jordan, boost.lists, quintela,
	ciro.santilli, jasowang, mst, zuban32s, maria.klimushenkova,
	dovgaluk, kraxel, pavel.dovgaluk, thomas.dullien, pbonzini,
	alex.bennee

From: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>

This patch stops enabling the '-snapshot' option by default
in record/replay mode. This is needed for creating vmstates in record
and replay modes.

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
---
 vl.c |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/vl.c b/vl.c
index d260a06..1170c69 100644
--- a/vl.c
+++ b/vl.c
@@ -3221,7 +3221,13 @@ int main(int argc, char **argv, char **envp)
                 drive_add(IF_PFLASH, -1, optarg, PFLASH_OPTS);
                 break;
             case QEMU_OPTION_snapshot:
-                snapshot = 1;
+                {
+                    Error *blocker = NULL;
+                    snapshot = 1;
+                    error_setg(&blocker, QERR_REPLAY_NOT_SUPPORTED,
+                               "-snapshot");
+                    replay_add_blocker(blocker);
+                }
                 break;
             case QEMU_OPTION_numa:
                 opts = qemu_opts_parse_noisily(qemu_find_opts("numa"),


* [Qemu-devel] [PATCH v7 05/22] replay: fix processing async events
  2018-02-27  9:51 [Qemu-devel] [ PATCH v7 00/22] replay additions Pavel Dovgalyuk
                   ` (3 preceding siblings ...)
  2018-02-27  9:51 ` [Qemu-devel] [PATCH v7 04/22] replay: disable default snapshot for record/replay Pavel Dovgalyuk
@ 2018-02-27  9:52 ` Pavel Dovgalyuk
  2018-02-27  9:52 ` [Qemu-devel] [PATCH v7 06/22] replay: fixed replay_enable_events Pavel Dovgalyuk
                   ` (17 subsequent siblings)
  22 siblings, 0 replies; 29+ messages in thread
From: Pavel Dovgalyuk @ 2018-02-27  9:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, peter.maydell, war2jordan, boost.lists, quintela,
	ciro.santilli, jasowang, mst, zuban32s, maria.klimushenkova,
	dovgaluk, kraxel, pavel.dovgaluk, thomas.dullien, pbonzini,
	alex.bennee

Asynchronous events saved at checkpoints may invoke
callbacks when processed. These callbacks may also generate/read
new events (e.g. clock reads). Therefore the event processing flag must be
reset before the callback is invoked.
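
A reduced model of the ordering problem follows. It is only a sketch: the
toy_* names and the single boolean flag are hypothetical simplifications,
not the actual replay state.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy counterpart of the "an event is being processed" state. */
static bool toy_event_in_progress;
/* Number of nested log reads that were allowed to proceed. */
static int toy_nested_reads;

/* A nested log access (e.g. a clock read) is only legal when no other
   event is still marked as being processed. */
static bool toy_read_clock(void)
{
    if (toy_event_in_progress) {
        return false;
    }
    toy_nested_reads++;
    return true;
}

/* Callback attached to an asynchronous event; it reads the clock. */
static bool toy_callback(void)
{
    return toy_read_clock();
}

/* Processes one queued event. With finish_first set, the flag is reset
   before the callback runs, which is the order this patch introduces. */
static bool toy_process_event(bool finish_first)
{
    bool ok;

    toy_event_in_progress = true;
    if (finish_first) {
        toy_event_in_progress = false;
    }
    ok = toy_callback();
    toy_event_in_progress = false;
    return ok;
}
```

Calling toy_process_event(true) lets the nested read succeed, while
toy_process_event(false) reproduces the old order and the nested read is
rejected.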

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
---
 replay/replay-events.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/replay/replay-events.c b/replay/replay-events.c
index 94a6dcc..768b505 100644
--- a/replay/replay-events.c
+++ b/replay/replay-events.c
@@ -295,13 +295,13 @@ void replay_read_events(int checkpoint)
         if (!event) {
             break;
         }
+        replay_finish_event();
+        read_event_kind = -1;
         replay_mutex_unlock();
         replay_run_event(event);
         replay_mutex_lock();
 
         g_free(event);
-        replay_finish_event();
-        read_event_kind = -1;
     }
 }
 


* [Qemu-devel] [PATCH v7 06/22] replay: fixed replay_enable_events
  2018-02-27  9:51 [Qemu-devel] [ PATCH v7 00/22] replay additions Pavel Dovgalyuk
                   ` (4 preceding siblings ...)
  2018-02-27  9:52 ` [Qemu-devel] [PATCH v7 05/22] replay: fix processing async events Pavel Dovgalyuk
@ 2018-02-27  9:52 ` Pavel Dovgalyuk
  2018-02-27  9:52 ` [Qemu-devel] [PATCH v7 07/22] replay: fix save/load vm for non-empty queue Pavel Dovgalyuk
                   ` (16 subsequent siblings)
  22 siblings, 0 replies; 29+ messages in thread
From: Pavel Dovgalyuk @ 2018-02-27  9:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, peter.maydell, war2jordan, boost.lists, quintela,
	ciro.santilli, jasowang, mst, zuban32s, maria.klimushenkova,
	dovgaluk, kraxel, pavel.dovgaluk, thomas.dullien, pbonzini,
	alex.bennee

This patch fixes the assignment to the internal events_enabled variable.
It is now set only in record/replay mode. This affects the behavior
of the external functions that check this flag.
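
The resulting invariant (events_enabled implies record/replay mode) is what
lets callers test the flag alone. A minimal sketch, with hypothetical toy_*
names and counters in place of the real queue and bottom-half scheduler:

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { TOY_MODE_NONE, TOY_MODE_RECORD, TOY_MODE_PLAY } ToyMode;

static ToyMode toy_mode = TOY_MODE_NONE;
static bool toy_events_enabled;
static int toy_queued;      /* events put into the replay queue */
static int toy_scheduled;   /* callbacks scheduled directly */

/* The flag can only become true in record/replay mode. */
static void toy_enable_events(void)
{
    if (toy_mode != TOY_MODE_NONE) {
        toy_events_enabled = true;
    }
}

/* Callers no longer need to check the mode themselves. */
static void toy_bh_schedule_event(void)
{
    if (toy_events_enabled) {
        assert(toy_mode != TOY_MODE_NONE);
        toy_queued++;
    } else {
        toy_scheduled++;
    }
}
```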

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
---
 replay/replay-events.c |    8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/replay/replay-events.c b/replay/replay-events.c
index 768b505..e858254 100644
--- a/replay/replay-events.c
+++ b/replay/replay-events.c
@@ -67,7 +67,9 @@ static void replay_run_event(Event *event)
 
 void replay_enable_events(void)
 {
-    events_enabled = true;
+    if (replay_mode != REPLAY_MODE_NONE) {
+        events_enabled = true;
+    }
 }
 
 bool replay_has_events(void)
@@ -141,7 +143,7 @@ void replay_add_event(ReplayAsyncEventKind event_kind,
 
 void replay_bh_schedule_event(QEMUBH *bh)
 {
-    if (replay_mode != REPLAY_MODE_NONE && events_enabled) {
+    if (events_enabled) {
         uint64_t id = replay_get_current_step();
         replay_add_event(REPLAY_ASYNC_EVENT_BH, bh, NULL, id);
     } else {
@@ -161,7 +163,7 @@ void replay_add_input_sync_event(void)
 
 void replay_block_event(QEMUBH *bh, uint64_t id)
 {
-    if (replay_mode != REPLAY_MODE_NONE && events_enabled) {
+    if (events_enabled) {
         replay_add_event(REPLAY_ASYNC_EVENT_BLOCK, bh, NULL, id);
     } else {
         qemu_bh_schedule(bh);


* [Qemu-devel] [PATCH v7 07/22] replay: fix save/load vm for non-empty queue
  2018-02-27  9:51 [Qemu-devel] [ PATCH v7 00/22] replay additions Pavel Dovgalyuk
                   ` (5 preceding siblings ...)
  2018-02-27  9:52 ` [Qemu-devel] [PATCH v7 06/22] replay: fixed replay_enable_events Pavel Dovgalyuk
@ 2018-02-27  9:52 ` Pavel Dovgalyuk
  2018-02-27  9:52 ` [Qemu-devel] [PATCH v7 08/22] replay: added replay log format description Pavel Dovgalyuk
                   ` (15 subsequent siblings)
  22 siblings, 0 replies; 29+ messages in thread
From: Pavel Dovgalyuk @ 2018-02-27  9:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, peter.maydell, war2jordan, boost.lists, quintela,
	ciro.santilli, jasowang, mst, zuban32s, maria.klimushenkova,
	dovgaluk, kraxel, pavel.dovgaluk, thomas.dullien, pbonzini,
	alex.bennee

This patch does not allow saving/loading vmstate when the
replay events queue is not empty. There is no reliable
way to save the events queue, because it describes internal
coroutine state. Therefore saving and loading operations
should be deferred to another record/replay step.
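
From the caller's point of view the deferral looks like a retry. The sketch
below assumes that the queue is drained at a later checkpoint; all toy_*
names are hypothetical and the return codes are simplified.

```c
#include <assert.h>
#include <stdbool.h>

static bool toy_replay_active;  /* true in record or replay mode */
static int toy_queue_len;       /* pending asynchronous events */

/* A snapshot is allowed outside record/replay, or with an empty queue. */
static bool toy_can_snapshot(void)
{
    assert(toy_queue_len >= 0);
    return !toy_replay_active || toy_queue_len == 0;
}

/* Stands in for a checkpoint that processes all queued events. */
static void toy_checkpoint(void)
{
    toy_queue_len = 0;
}

/* Returns 0 on success and -1 when the request has to be repeated later. */
static int toy_save_snapshot(void)
{
    if (!toy_can_snapshot()) {
        return -1;
    }
    return 0;
}
```

A request made while events are queued fails, and the same request succeeds
once toy_checkpoint() has emptied the queue.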

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>

--

v2: fixed error_report calls
---
 include/sysemu/replay.h  |    3 +++
 migration/savevm.c       |   13 +++++++++++++
 replay/replay-snapshot.c |    6 ++++++
 3 files changed, 22 insertions(+)

diff --git a/include/sysemu/replay.h b/include/sysemu/replay.h
index dc8ae7b..5462555 100644
--- a/include/sysemu/replay.h
+++ b/include/sysemu/replay.h
@@ -164,5 +164,8 @@ void replay_audio_in(int *recorded, void *samples, int *wpos, int size);
 /*! Called at the start of execution.
     Loads or saves initial vmstate depending on execution mode. */
 void replay_vmstate_init(void);
+/*! Called to ensure that replay state is consistent and VM snapshot
+    can be created */
+bool replay_can_snapshot(void);
 
 #endif
diff --git a/migration/savevm.c b/migration/savevm.c
index 8e6d872..d98ea64 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -53,6 +53,7 @@
 #include "qemu/cutils.h"
 #include "io/channel-buffer.h"
 #include "io/channel-file.h"
+#include "sysemu/replay.h"
 
 #ifndef ETH_P_RARP
 #define ETH_P_RARP 0x8035
@@ -2196,6 +2197,12 @@ int save_snapshot(const char *name, Error **errp)
     struct tm tm;
     AioContext *aio_context;
 
+    if (!replay_can_snapshot()) {
+        error_report("Record/replay does not allow making snapshot "
+                     "right now. Try once more later.");
+        return ret;
+    }
+
     if (!bdrv_all_can_snapshot(&bs)) {
         error_setg(errp, "Device '%s' is writable but does not support "
                    "snapshots", bdrv_get_device_name(bs));
@@ -2387,6 +2394,12 @@ int load_snapshot(const char *name, Error **errp)
     AioContext *aio_context;
     MigrationIncomingState *mis = migration_incoming_get_current();
 
+    if (!replay_can_snapshot()) {
+        error_report("Record/replay does not allow loading snapshot "
+                     "right now. Try once more later.");
+        return -EINVAL;
+    }
+
     if (!bdrv_all_can_snapshot(&bs)) {
         error_setg(errp,
                    "Device '%s' is writable but does not support snapshots",
diff --git a/replay/replay-snapshot.c b/replay/replay-snapshot.c
index b2e1076..7075986 100644
--- a/replay/replay-snapshot.c
+++ b/replay/replay-snapshot.c
@@ -83,3 +83,9 @@ void replay_vmstate_init(void)
         }
     }
 }
+
+bool replay_can_snapshot(void)
+{
+    return replay_mode == REPLAY_MODE_NONE
+        || !replay_has_events();
+}


* [Qemu-devel] [PATCH v7 08/22] replay: added replay log format description
  2018-02-27  9:51 [Qemu-devel] [ PATCH v7 00/22] replay additions Pavel Dovgalyuk
                   ` (6 preceding siblings ...)
  2018-02-27  9:52 ` [Qemu-devel] [PATCH v7 07/22] replay: fix save/load vm for non-empty queue Pavel Dovgalyuk
@ 2018-02-27  9:52 ` Pavel Dovgalyuk
  2018-02-27  9:52 ` [Qemu-devel] [PATCH v7 09/22] replay: save prior value of the host clock Pavel Dovgalyuk
                   ` (14 subsequent siblings)
  22 siblings, 0 replies; 29+ messages in thread
From: Pavel Dovgalyuk @ 2018-02-27  9:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, peter.maydell, war2jordan, boost.lists, quintela,
	ciro.santilli, jasowang, mst, zuban32s, maria.klimushenkova,
	dovgaluk, kraxel, pavel.dovgaluk, thomas.dullien, pbonzini,
	alex.bennee

From: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>

This patch adds a description of the replay log file format
to docs/replay.txt.

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
---
 docs/replay.txt |   69 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 69 insertions(+)

diff --git a/docs/replay.txt b/docs/replay.txt
index 486c1e0..c52407f 100644
--- a/docs/replay.txt
+++ b/docs/replay.txt
@@ -232,3 +232,72 @@ Audio devices
 Audio data is recorded and replay automatically. The command line for recording
 and replaying must contain identical specifications of audio hardware, e.g.:
  -soundhw ac97
+
+Replay log format
+-----------------
+
+Record/replay log consists of the header and the sequence of execution
+events. The header includes 4-byte replay version id and 8-byte reserved
+field. Version is updated every time replay log format changes to prevent
+using replay log created by another build of qemu.
+
+The sequence of the events describes virtual machine state changes.
+It includes all non-deterministic inputs of VM, synchronization marks and
+instruction counts used to correctly inject inputs at replay.
+
+Synchronization marks (checkpoints) are used for synchronizing qemu threads
+that perform operations with virtual hardware. These operations may change
+system's state (e.g., change some register or generate interrupt) and
+therefore should execute synchronously with CPU thread.
+
+Every event in the log includes 1-byte event id and optional arguments.
+When argument is an array, it is stored as 4-byte array length
+and corresponding number of bytes with data.
+Here is the list of events that are written into the log:
+
+ - EVENT_INSTRUCTION. Instructions executed since last event.
+   Argument: 4-byte number of executed instructions.
+ - EVENT_INTERRUPT. Used to synchronize interrupt processing.
+ - EVENT_EXCEPTION. Used to synchronize exception handling.
+ - EVENT_ASYNC. This is a group of events. They are always processed
+   together with checkpoints. When such an event is generated, it is
+   stored in the queue and processed only when checkpoint occurs.
+   Every such event is followed by 1-byte checkpoint id and 1-byte
+   async event id from the following list:
+     - REPLAY_ASYNC_EVENT_BH. Bottom-half callback. This event synchronizes
+       callbacks that affect virtual machine state, but normally called
+       asynchronously.
+       Argument: 8-byte operation id.
+     - REPLAY_ASYNC_EVENT_INPUT. Input device event. Contains
+       parameters of keyboard and mouse input operations
+       (key press/release, mouse pointer movement).
+       Arguments: 9-16 bytes depending on the input event.
+     - REPLAY_ASYNC_EVENT_INPUT_SYNC. Internal input synchronization event.
+     - REPLAY_ASYNC_EVENT_CHAR_READ. Character (e.g., serial port) device input
+       initiated by the sender.
+       Arguments: 1-byte character device id.
+                  Array with bytes that were read.
+     - REPLAY_ASYNC_EVENT_BLOCK. Block device operation. Used to synchronize
+       operations with disk and flash drives with CPU.
+       Argument: 8-byte operation id.
+     - REPLAY_ASYNC_EVENT_NET. Incoming network packet.
+       Arguments: 1-byte network adapter id.
+                  4-byte packet flags.
+                  Array with packet bytes.
+ - EVENT_SHUTDOWN. Occurs when user sends shutdown event to qemu,
+   e.g., by closing the window.
+ - EVENT_CHAR_WRITE. Used to synchronize character output operations.
+   Arguments: 4-byte output function return value.
+              4-byte offset in the output array.
+ - EVENT_CHAR_READ_ALL. Used to synchronize character input operations,
+   initiated by qemu.
+   Argument: Array with bytes that were read.
+ - EVENT_CHAR_READ_ALL_ERROR. Unsuccessful character input operation,
+   initiated by qemu.
+   Argument: 4-byte error code.
+ - EVENT_CLOCK + clock_id. Group of events for host clock read operations.
+   Argument: 8-byte clock value.
+ - EVENT_CHECKPOINT + checkpoint_id. Checkpoint for synchronization of
+   CPU, internal threads, and asynchronous input events. May be followed
+   by one or more EVENT_ASYNC events.
+ - EVENT_END. Last event in the log.
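
The header and array encodings described in this documentation can be read
with a few lines of C. This is a sketch under stated assumptions, not the
dumper from this series: the text above does not specify a byte order, so
big-endian (most significant byte first) is assumed here, and LogReader,
read_be, read_header and read_array are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cursor over an in-memory copy of a replay log. */
typedef struct {
    const uint8_t *data;
    size_t len;
    size_t pos;
    int error;      /* set once a read would run past the end */
} LogReader;

/* Reads an nbytes-wide unsigned integer, assuming big-endian order. */
static uint64_t read_be(LogReader *r, size_t nbytes)
{
    uint64_t v = 0;

    assert(nbytes <= 8);
    if (r->error || r->len - r->pos < nbytes) {
        r->error = 1;
        return 0;
    }
    for (size_t i = 0; i < nbytes; i++) {
        v = (v << 8) | r->data[r->pos++];
    }
    return v;
}

/* Header: 4-byte version id followed by an 8-byte reserved field. */
static uint32_t read_header(LogReader *r)
{
    uint32_t version = (uint32_t)read_be(r, 4);

    (void)read_be(r, 8);
    return version;
}

/* Array argument: 4-byte length, then that many data bytes.
   Returns a pointer into the buffer, or NULL on a truncated log. */
static const uint8_t *read_array(LogReader *r, uint32_t *size)
{
    uint32_t n = (uint32_t)read_be(r, 4);

    if (r->error || r->len - r->pos < n) {
        r->error = 1;
        return NULL;
    }
    const uint8_t *p = r->data + r->pos;
    r->pos += n;
    *size = n;
    return p;
}
```

A caller would read the header once, then loop: read the 1-byte event id
with read_be(r, 1) and decode the arguments that belong to that event.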


* [Qemu-devel] [PATCH v7 09/22] replay: save prior value of the host clock
  2018-02-27  9:51 [Qemu-devel] [ PATCH v7 00/22] replay additions Pavel Dovgalyuk
                   ` (7 preceding siblings ...)
  2018-02-27  9:52 ` [Qemu-devel] [PATCH v7 08/22] replay: added replay log format description Pavel Dovgalyuk
@ 2018-02-27  9:52 ` Pavel Dovgalyuk
  2018-02-27  9:52 ` [Qemu-devel] [PATCH v7 10/22] replay/replay.c: bump REPLAY_VERSION again Pavel Dovgalyuk
                   ` (13 subsequent siblings)
  22 siblings, 0 replies; 29+ messages in thread
From: Pavel Dovgalyuk @ 2018-02-27  9:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, peter.maydell, war2jordan, boost.lists, quintela,
	ciro.santilli, jasowang, mst, zuban32s, maria.klimushenkova,
	dovgaluk, kraxel, pavel.dovgaluk, thomas.dullien, pbonzini,
	alex.bennee

This patch adds saving/restoring of the host clock field 'last'.
This field is used in the host clock calculation, so without it the clock
may become incorrect when a restored vmstate is used.

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/qemu/timer.h     |   14 ++++++++++++++
 replay/replay-internal.h |    2 ++
 replay/replay-snapshot.c |    3 +++
 util/qemu-timer.c        |   12 ++++++++++++
 4 files changed, 31 insertions(+)

diff --git a/include/qemu/timer.h b/include/qemu/timer.h
index 3b5a54b..39ea907 100644
--- a/include/qemu/timer.h
+++ b/include/qemu/timer.h
@@ -251,6 +251,20 @@ bool qemu_clock_run_timers(QEMUClockType type);
  */
 bool qemu_clock_run_all_timers(void);
 
+/**
+ * qemu_clock_get_last:
+ *
+ * Returns last clock query time.
+ */
+uint64_t qemu_clock_get_last(QEMUClockType type);
+/**
+ * qemu_clock_set_last:
+ *
+ * Sets last clock query time.
+ */
+void qemu_clock_set_last(QEMUClockType type, uint64_t last);
+
+
 /*
  * QEMUTimerList
  */
diff --git a/replay/replay-internal.h b/replay/replay-internal.h
index 3ebb199..be96d7e 100644
--- a/replay/replay-internal.h
+++ b/replay/replay-internal.h
@@ -78,6 +78,8 @@ typedef struct ReplayState {
         This counter is global, because requests from different
         block devices should not get overlapping ids. */
     uint64_t block_request_id;
+    /*! Prior value of the host clock */
+    uint64_t host_clock_last;
 } ReplayState;
 extern ReplayState replay_state;
 
diff --git a/replay/replay-snapshot.c b/replay/replay-snapshot.c
index 7075986..e0b2204 100644
--- a/replay/replay-snapshot.c
+++ b/replay/replay-snapshot.c
@@ -25,6 +25,7 @@ static int replay_pre_save(void *opaque)
 {
     ReplayState *state = opaque;
     state->file_offset = ftell(replay_file);
+    state->host_clock_last = qemu_clock_get_last(QEMU_CLOCK_HOST);
 
     return 0;
 }
@@ -33,6 +34,7 @@ static int replay_post_load(void *opaque, int version_id)
 {
     ReplayState *state = opaque;
     fseek(replay_file, state->file_offset, SEEK_SET);
+    qemu_clock_set_last(QEMU_CLOCK_HOST, state->host_clock_last);
     /* If this was a vmstate, saved in recording mode,
        we need to initialize replay data fields. */
     replay_fetch_data_kind();
@@ -54,6 +56,7 @@ static const VMStateDescription vmstate_replay = {
         VMSTATE_UINT32(has_unread_data, ReplayState),
         VMSTATE_UINT64(file_offset, ReplayState),
         VMSTATE_UINT64(block_request_id, ReplayState),
+        VMSTATE_UINT64(host_clock_last, ReplayState),
         VMSTATE_END_OF_LIST()
     },
 };
diff --git a/util/qemu-timer.c b/util/qemu-timer.c
index 82d5650..2ed1bf2 100644
--- a/util/qemu-timer.c
+++ b/util/qemu-timer.c
@@ -622,6 +622,18 @@ int64_t qemu_clock_get_ns(QEMUClockType type)
     }
 }
 
+uint64_t qemu_clock_get_last(QEMUClockType type)
+{
+    QEMUClock *clock = qemu_clock_ptr(type);
+    return clock->last;
+}
+
+void qemu_clock_set_last(QEMUClockType type, uint64_t last)
+{
+    QEMUClock *clock = qemu_clock_ptr(type);
+    clock->last = last;
+}
+
 void qemu_clock_register_reset_notifier(QEMUClockType type,
                                         Notifier *notifier)
 {


* [Qemu-devel] [PATCH v7 10/22] replay/replay.c: bump REPLAY_VERSION again
  2018-02-27  9:51 [Qemu-devel] [ PATCH v7 00/22] replay additions Pavel Dovgalyuk
                   ` (8 preceding siblings ...)
  2018-02-27  9:52 ` [Qemu-devel] [PATCH v7 09/22] replay: save prior value of the host clock Pavel Dovgalyuk
@ 2018-02-27  9:52 ` Pavel Dovgalyuk
  2018-02-27  9:52 ` [Qemu-devel] [PATCH v7 11/22] replay/replay-internal.c: track holding of replay_lock Pavel Dovgalyuk
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 29+ messages in thread
From: Pavel Dovgalyuk @ 2018-02-27  9:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, peter.maydell, war2jordan, boost.lists, quintela,
	ciro.santilli, jasowang, mst, zuban32s, maria.klimushenkova,
	dovgaluk, kraxel, pavel.dovgaluk, thomas.dullien, pbonzini,
	alex.bennee

From: Alex Bennée <alex.bennee@linaro.org>

This time commit 802f045a5f61b781df55e4492d896b4d20503ba7 broke the
replay file format. Also add a comment about this to
replay-internal.h.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
---
 replay/replay-internal.h |    2 +-
 replay/replay.c          |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/replay/replay-internal.h b/replay/replay-internal.h
index be96d7e..8e4c701 100644
--- a/replay/replay-internal.h
+++ b/replay/replay-internal.h
@@ -12,7 +12,7 @@
  *
  */
 
-
+/* Any changes to order/number of events will need to bump REPLAY_VERSION */
 enum ReplayEvents {
     /* for instruction event */
     EVENT_INSTRUCTION,
diff --git a/replay/replay.c b/replay/replay.c
index 7a23c62..9cddb6b 100644
--- a/replay/replay.c
+++ b/replay/replay.c
@@ -22,7 +22,7 @@
 
 /* Current version of the replay mechanism.
    Increase it when file format changes. */
-#define REPLAY_VERSION              0xe02006
+#define REPLAY_VERSION              0xe02007
 /* Size of replay log header */
 #define HEADER_SIZE                 (sizeof(uint32_t) + sizeof(uint64_t))
 


* [Qemu-devel] [PATCH v7 11/22] replay/replay-internal.c: track holding of replay_lock
  2018-02-27  9:51 [Qemu-devel] [ PATCH v7 00/22] replay additions Pavel Dovgalyuk
                   ` (9 preceding siblings ...)
  2018-02-27  9:52 ` [Qemu-devel] [PATCH v7 10/22] replay/replay.c: bump REPLAY_VERSION again Pavel Dovgalyuk
@ 2018-02-27  9:52 ` Pavel Dovgalyuk
  2018-02-27  9:52 ` [Qemu-devel] [PATCH v7 12/22] replay: make locking visible outside replay code Pavel Dovgalyuk
                   ` (11 subsequent siblings)
  22 siblings, 0 replies; 29+ messages in thread
From: Pavel Dovgalyuk @ 2018-02-27  9:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, peter.maydell, war2jordan, boost.lists, quintela,
	ciro.santilli, jasowang, mst, zuban32s, maria.klimushenkova,
	dovgaluk, kraxel, pavel.dovgaluk, thomas.dullien, pbonzini,
	alex.bennee

From: Alex Bennée <alex.bennee@linaro.org>

This is modelled after the iothread mutex lock. We keep a TLS flag to
indicate when that thread has acquired the lock and assert we don't
double-lock or release when we shouldn't have.
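
As a stand-alone illustration of the idea (not the code in this patch), the
sketch below uses a plain pthread mutex and a GCC/Clang `__thread` flag. All
`demo_*` names are hypothetical and exist only for this example; QEMU itself
uses its own qemu_mutex_* wrappers and g_assert().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the replay lock; not a QEMU symbol. */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Per-thread flag: true only in the thread that currently holds demo_lock. */
static __thread bool demo_locked;

static bool demo_mutex_locked(void)
{
    return demo_locked;
}

static void demo_mutex_lock(void)
{
    /* Catch an attempt to take the lock twice from the same thread. */
    assert(!demo_mutex_locked());
    pthread_mutex_lock(&demo_lock);
    demo_locked = true;
}

static void demo_mutex_unlock(void)
{
    /* Catch a release from a thread that does not hold the lock. */
    assert(demo_mutex_locked());
    demo_locked = false;
    pthread_mutex_unlock(&demo_lock);
}
```

Because the flag is thread-local, it only answers "does this thread hold the
lock?", which is what the assertions need; it says nothing about other threads.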

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
---
 replay/replay-internal.c |   11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/replay/replay-internal.c b/replay/replay-internal.c
index fca8514..0d7e1d6 100644
--- a/replay/replay-internal.c
+++ b/replay/replay-internal.c
@@ -169,6 +169,8 @@ void replay_finish_event(void)
     replay_fetch_data_kind();
 }
 
+static __thread bool replay_locked;
+
 void replay_mutex_init(void)
 {
     qemu_mutex_init(&lock);
@@ -179,13 +181,22 @@ void replay_mutex_destroy(void)
     qemu_mutex_destroy(&lock);
 }
 
+static bool replay_mutex_locked(void)
+{
+    return replay_locked;
+}
+
 void replay_mutex_lock(void)
 {
+    g_assert(!replay_mutex_locked());
     qemu_mutex_lock(&lock);
+    replay_locked = true;
 }
 
 void replay_mutex_unlock(void)
 {
+    g_assert(replay_mutex_locked());
+    replay_locked = false;
     qemu_mutex_unlock(&lock);
 }
 


* [Qemu-devel] [PATCH v7 12/22] replay: make locking visible outside replay code
  2018-02-27  9:51 [Qemu-devel] [ PATCH v7 00/22] replay additions Pavel Dovgalyuk
                   ` (10 preceding siblings ...)
  2018-02-27  9:52 ` [Qemu-devel] [PATCH v7 11/22] replay/replay-internal.c: track holding of replay_lock Pavel Dovgalyuk
@ 2018-02-27  9:52 ` Pavel Dovgalyuk
  2018-02-27  9:52 ` [Qemu-devel] [PATCH v7 13/22] replay: push replay_mutex_lock up the call tree Pavel Dovgalyuk
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 29+ messages in thread
From: Pavel Dovgalyuk @ 2018-02-27  9:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, peter.maydell, war2jordan, boost.lists, quintela,
	ciro.santilli, jasowang, mst, zuban32s, maria.klimushenkova,
	dovgaluk, kraxel, pavel.dovgaluk, thomas.dullien, pbonzini,
	alex.bennee

From: Alex Bennée <alex.bennee@linaro.org>

The replay_mutex_lock/unlock/locked functions are now going to be used
for ensuring lock-step behaviour between the two threads. Make them
public API functions and also provide stubs for non-QEMU builds on
common paths.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>

--

v7: - removed abort() from stubs to allow calling from utils
    - removed exported replay_mutex_locked()
---
 include/sysemu/replay.h  |   13 +++++++++++++
 replay/replay-internal.c |    2 +-
 replay/replay-internal.h |    6 +++---
 stubs/replay.c           |    8 ++++++++
 4 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/include/sysemu/replay.h b/include/sysemu/replay.h
index 5462555..291bcbc 100644
--- a/include/sysemu/replay.h
+++ b/include/sysemu/replay.h
@@ -46,6 +46,19 @@ extern ReplayMode replay_mode;
 /* Name of the initial VM snapshot */
 extern char *replay_snapshot;
 
+/* Replay locking
+ *
+ * The locks are needed to protect the shared structures and log file
+ * when doing record/replay. They also are the main sync-point between
+ * the main-loop thread and the vCPU thread. This was a role
+ * previously filled by the BQL which has been busy trying to reduce
+ * its impact across the code. This ensures blocks of events stay
+ * sequential and reproducible.
+ */
+
+void replay_mutex_lock(void);
+void replay_mutex_unlock(void);
+
 /* Replay process control functions */
 
 /*! Enables recording or saving event log with specified parameters */
diff --git a/replay/replay-internal.c b/replay/replay-internal.c
index 0d7e1d6..7cdefea 100644
--- a/replay/replay-internal.c
+++ b/replay/replay-internal.c
@@ -181,7 +181,7 @@ void replay_mutex_destroy(void)
     qemu_mutex_destroy(&lock);
 }
 
-static bool replay_mutex_locked(void)
+bool replay_mutex_locked(void)
 {
     return replay_locked;
 }
diff --git a/replay/replay-internal.h b/replay/replay-internal.h
index 8e4c701..41eee66 100644
--- a/replay/replay-internal.h
+++ b/replay/replay-internal.h
@@ -100,12 +100,12 @@ int64_t replay_get_qword(void);
 void replay_get_array(uint8_t *buf, size_t *size);
 void replay_get_array_alloc(uint8_t **buf, size_t *size);
 
-/* Mutex functions for protecting replay log file */
+/* Mutex functions for protecting replay log file and ensuring
+ * synchronisation between vCPU and main-loop threads. */
 
 void replay_mutex_init(void);
 void replay_mutex_destroy(void);
-void replay_mutex_lock(void);
-void replay_mutex_unlock(void);
+bool replay_mutex_locked(void);
 
 /*! Checks error status of the file. */
 void replay_check_error(void);
diff --git a/stubs/replay.c b/stubs/replay.c
index 9991ee5..18ba0bb 100644
--- a/stubs/replay.c
+++ b/stubs/replay.c
@@ -73,3 +73,11 @@ uint64_t blkreplay_next_id(void)
 {
     return 0;
 }
+
+void replay_mutex_lock(void)
+{
+}
+
+void replay_mutex_unlock(void)
+{
+}


* [Qemu-devel] [PATCH v7 13/22] replay: push replay_mutex_lock up the call tree
  2018-02-27  9:51 [Qemu-devel] [ PATCH v7 00/22] replay additions Pavel Dovgalyuk
                   ` (11 preceding siblings ...)
  2018-02-27  9:52 ` [Qemu-devel] [PATCH v7 12/22] replay: make locking visible outside replay code Pavel Dovgalyuk
@ 2018-02-27  9:52 ` Pavel Dovgalyuk
  2018-03-12 13:02   ` Paolo Bonzini
  2018-02-27  9:52 ` [Qemu-devel] [PATCH v7 14/22] replay: don't destroy mutex at exit Pavel Dovgalyuk
                   ` (9 subsequent siblings)
  22 siblings, 1 reply; 29+ messages in thread
From: Pavel Dovgalyuk @ 2018-02-27  9:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, peter.maydell, war2jordan, boost.lists, quintela,
	ciro.santilli, jasowang, mst, zuban32s, maria.klimushenkova,
	dovgaluk, kraxel, pavel.dovgaluk, thomas.dullien, pbonzini,
	alex.bennee

From: Alex Bennée <alex.bennee@linaro.org>

Instead of using the replay_lock only to guard the output of the log, we
now use it to protect the whole execution section. This replaces what
the BQL used to do when it was held during TCG execution.

We also introduce some rules for locking order - mainly that you
cannot take the replay_mutex while holding the BQL. This leads to some
slight sophistry during start-up and extending the
replay_mutex_destroy function to unlock the mutex without checking
for the BQL condition so it can be cleanly dropped in the non-replay
case.
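
The ordering rule can be sketched outside QEMU as follows. This is only an
illustration under assumed names: `big_lock` stands in for the BQL and
`rr_lock` for the replay mutex, and none of the helpers below are QEMU
functions. A thread that already holds the big lock has to drop it, take the
replay lock, and then retake the big lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins: big_lock for the BQL, rr_lock for the replay lock. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t rr_lock = PTHREAD_MUTEX_INITIALIZER;

/* Per-thread ownership flags used only for the assertions. */
static __thread bool big_locked;
static __thread bool rr_locked;

static void big_lock_acquire(void)
{
    pthread_mutex_lock(&big_lock);
    big_locked = true;
}

static void big_lock_release(void)
{
    assert(big_locked);
    big_locked = false;
    pthread_mutex_unlock(&big_lock);
}

static void rr_lock_acquire(void)
{
    /* Ordering rule: the replay lock must be taken before the big lock. */
    assert(!big_locked);
    assert(!rr_locked);
    pthread_mutex_lock(&rr_lock);
    rr_locked = true;
}

static void rr_lock_release(void)
{
    /* Release is allowed while the big lock is still held. */
    assert(rr_locked);
    rr_locked = false;
    pthread_mutex_unlock(&rr_lock);
}

/* Called with big_lock held; returns with both locks held. */
static void take_rr_lock_while_holding_big_lock(void)
{
    big_lock_release();
    rr_lock_acquire();
    big_lock_acquire();
}
```

The acquire side is strict, while the release side is not: rr_lock_release()
does not check the big lock, so the replay lock can be dropped directly.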

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Tested-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>

--

v6: refined lock/unlock logic due to removing the BQL patches
    removed replay lock/unlock from audio functions

v2: updated replay_mutex_lock/unlock functions as suggested by Paolo Bonzini
    updated docs
---
 cpus.c                   |   24 ++++++++++++++++++++++--
 docs/replay.txt          |   22 ++++++++++++++++++++++
 include/sysemu/replay.h  |    2 ++
 replay/replay-audio.c    |   14 ++++----------
 replay/replay-char.c     |   21 ++++++++-------------
 replay/replay-events.c   |   20 +++++++-------------
 replay/replay-internal.c |   27 +++++++++++++++++++--------
 replay/replay-time.c     |   10 +++++-----
 replay/replay.c          |   38 ++++++++++++++++++--------------------
 util/main-loop.c         |   15 +++++++++++----
 vl.c                     |    2 ++
 11 files changed, 120 insertions(+), 75 deletions(-)

diff --git a/cpus.c b/cpus.c
index f298b65..40ed0e6 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1307,6 +1307,8 @@ static void prepare_icount_for_run(CPUState *cpu)
         insns_left = MIN(0xffff, cpu->icount_budget);
         cpu->icount_decr.u16.low = insns_left;
         cpu->icount_extra = cpu->icount_budget - insns_left;
+
+        replay_mutex_lock();
     }
 }
 
@@ -1322,6 +1324,8 @@ static void process_icount_data(CPUState *cpu)
         cpu->icount_budget = 0;
 
         replay_account_executed_instructions();
+
+        replay_mutex_unlock();
     }
 }
 
@@ -1336,11 +1340,9 @@ static int tcg_cpu_exec(CPUState *cpu)
 #ifdef CONFIG_PROFILER
     ti = profile_getclock();
 #endif
-    qemu_mutex_unlock_iothread();
     cpu_exec_start(cpu);
     ret = cpu_exec(cpu);
     cpu_exec_end(cpu);
-    qemu_mutex_lock_iothread();
 #ifdef CONFIG_PROFILER
     tcg_time += profile_getclock() - ti;
 #endif
@@ -1409,6 +1411,9 @@ static void *qemu_tcg_rr_cpu_thread_fn(void *arg)
     cpu->exit_request = 1;
 
     while (1) {
+        qemu_mutex_unlock_iothread();
+        replay_mutex_lock();
+        qemu_mutex_lock_iothread();
         /* Account partial waits to QEMU_CLOCK_VIRTUAL.  */
         qemu_account_warp_timer();
 
@@ -1417,6 +1422,8 @@ static void *qemu_tcg_rr_cpu_thread_fn(void *arg)
          */
         handle_icount_deadline();
 
+        replay_mutex_unlock();
+
         if (!cpu) {
             cpu = first_cpu;
         }
@@ -1432,11 +1439,13 @@ static void *qemu_tcg_rr_cpu_thread_fn(void *arg)
             if (cpu_can_run(cpu)) {
                 int r;
 
+                qemu_mutex_unlock_iothread();
                 prepare_icount_for_run(cpu);
 
                 r = tcg_cpu_exec(cpu);
 
                 process_icount_data(cpu);
+                qemu_mutex_lock_iothread();
 
                 if (r == EXCP_DEBUG) {
                     cpu_handle_guest_debug(cpu);
@@ -1626,7 +1635,9 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
     while (1) {
         if (cpu_can_run(cpu)) {
             int r;
+            qemu_mutex_unlock_iothread();
             r = tcg_cpu_exec(cpu);
+            qemu_mutex_lock_iothread();
             switch (r) {
             case EXCP_DEBUG:
                 cpu_handle_guest_debug(cpu);
@@ -1773,12 +1784,21 @@ void pause_all_vcpus(void)
         }
     }
 
+    /* We need to drop the replay_lock so any vCPU threads woken up
+     * can finish their replay tasks
+     */
+    replay_mutex_unlock();
+
     while (!all_vcpus_paused()) {
         qemu_cond_wait(&qemu_pause_cond, &qemu_global_mutex);
         CPU_FOREACH(cpu) {
             qemu_cpu_kick(cpu);
         }
     }
+
+    qemu_mutex_unlock_iothread();
+    replay_mutex_lock();
+    qemu_mutex_lock_iothread();
 }
 
 void cpu_resume(CPUState *cpu)
diff --git a/docs/replay.txt b/docs/replay.txt
index c52407f..959633e 100644
--- a/docs/replay.txt
+++ b/docs/replay.txt
@@ -49,6 +49,28 @@ Modifications of qemu include:
  * recording/replaying user input (mouse and keyboard)
  * adding internal checkpoints for cpu and io synchronization
 
+Locking and thread synchronisation
+----------------------------------
+
+Previously the synchronisation of the main thread and the vCPU thread
+was ensured by the holding of the BQL. However the trend has been to
+reduce the time the BQL was held across the system including under TCG
+system emulation. As it is important that batches of events are kept
+in sequence (e.g. expiring timers and checkpoints in the main thread
+while instruction checkpoints are written by the vCPU thread) we need
+another lock to keep things in lock-step. This role is now handled by
+the replay_mutex_lock. It used to be held only for each event being
+written but now it is held for a whole execution period. This results
+in a deterministic ping-pong between the two main threads.
+
+As the BQL is now a finer grained lock than the replay_lock it is almost
+certainly a bug, and a source of deadlocks, to take the
+replay_mutex_lock while the BQL is held. This is enforced by an assert.
+While the unlocks are usually in the reverse order, this is not
+necessary; you can drop the replay_lock while holding the BQL, without
+doing a more complicated unlock_iothread/replay_unlock/lock_iothread
+sequence.
+
 Non-deterministic events
 ------------------------
 
diff --git a/include/sysemu/replay.h b/include/sysemu/replay.h
index 291bcbc..239d00d 100644
--- a/include/sysemu/replay.h
+++ b/include/sysemu/replay.h
@@ -61,6 +61,8 @@ void replay_mutex_unlock(void);
 
 /* Replay process control functions */
 
+/*! Enables and take replay locks (even if we don't use it) */
+void replay_init_locks(void);
 /*! Enables recording or saving event log with specified parameters */
 void replay_configure(struct QemuOpts *opts);
 /*! Initializes timers used for snapshotting and enables events recording */
diff --git a/replay/replay-audio.c b/replay/replay-audio.c
index 3d83743..b113836 100644
--- a/replay/replay-audio.c
+++ b/replay/replay-audio.c
@@ -19,20 +19,17 @@
 void replay_audio_out(int *played)
 {
     if (replay_mode == REPLAY_MODE_RECORD) {
+        g_assert(replay_mutex_locked());
         replay_save_instructions();
-        replay_mutex_lock();
         replay_put_event(EVENT_AUDIO_OUT);
         replay_put_dword(*played);
-        replay_mutex_unlock();
     } else if (replay_mode == REPLAY_MODE_PLAY) {
+        g_assert(replay_mutex_locked());
         replay_account_executed_instructions();
-        replay_mutex_lock();
         if (replay_next_event_is(EVENT_AUDIO_OUT)) {
             *played = replay_get_dword();
             replay_finish_event();
-            replay_mutex_unlock();
         } else {
-            replay_mutex_unlock();
             error_report("Missing audio out event in the replay log");
             abort();
         }
@@ -44,8 +41,8 @@ void replay_audio_in(int *recorded, void *samples, int *wpos, int size)
     int pos;
     uint64_t left, right;
     if (replay_mode == REPLAY_MODE_RECORD) {
+        g_assert(replay_mutex_locked());
         replay_save_instructions();
-        replay_mutex_lock();
         replay_put_event(EVENT_AUDIO_IN);
         replay_put_dword(*recorded);
         replay_put_dword(*wpos);
@@ -55,10 +52,9 @@ void replay_audio_in(int *recorded, void *samples, int *wpos, int size)
             replay_put_qword(left);
             replay_put_qword(right);
         }
-        replay_mutex_unlock();
     } else if (replay_mode == REPLAY_MODE_PLAY) {
+        g_assert(replay_mutex_locked());
         replay_account_executed_instructions();
-        replay_mutex_lock();
         if (replay_next_event_is(EVENT_AUDIO_IN)) {
             *recorded = replay_get_dword();
             *wpos = replay_get_dword();
@@ -69,9 +65,7 @@ void replay_audio_in(int *recorded, void *samples, int *wpos, int size)
                 audio_sample_from_uint64(samples, pos, left, right);
             }
             replay_finish_event();
-            replay_mutex_unlock();
         } else {
-            replay_mutex_unlock();
             error_report("Missing audio in event in the replay log");
             abort();
         }
diff --git a/replay/replay-char.c b/replay/replay-char.c
index cbf7c04..736cc8c 100755
--- a/replay/replay-char.c
+++ b/replay/replay-char.c
@@ -96,25 +96,24 @@ void *replay_event_char_read_load(void)
 
 void replay_char_write_event_save(int res, int offset)
 {
+    g_assert(replay_mutex_locked());
+
     replay_save_instructions();
-    replay_mutex_lock();
     replay_put_event(EVENT_CHAR_WRITE);
     replay_put_dword(res);
     replay_put_dword(offset);
-    replay_mutex_unlock();
 }
 
 void replay_char_write_event_load(int *res, int *offset)
 {
+    g_assert(replay_mutex_locked());
+
     replay_account_executed_instructions();
-    replay_mutex_lock();
     if (replay_next_event_is(EVENT_CHAR_WRITE)) {
         *res = replay_get_dword();
         *offset = replay_get_dword();
         replay_finish_event();
-        replay_mutex_unlock();
     } else {
-        replay_mutex_unlock();
         error_report("Missing character write event in the replay log");
         exit(1);
     }
@@ -122,23 +121,21 @@ void replay_char_write_event_load(int *res, int *offset)
 
 int replay_char_read_all_load(uint8_t *buf)
 {
-    replay_mutex_lock();
+    g_assert(replay_mutex_locked());
+
     if (replay_next_event_is(EVENT_CHAR_READ_ALL)) {
         size_t size;
         int res;
         replay_get_array(buf, &size);
         replay_finish_event();
-        replay_mutex_unlock();
         res = (int)size;
         assert(res >= 0);
         return res;
     } else if (replay_next_event_is(EVENT_CHAR_READ_ALL_ERROR)) {
         int res = replay_get_dword();
         replay_finish_event();
-        replay_mutex_unlock();
         return res;
     } else {
-        replay_mutex_unlock();
         error_report("Missing character read all event in the replay log");
         exit(1);
     }
@@ -146,19 +143,17 @@ int replay_char_read_all_load(uint8_t *buf)
 
 void replay_char_read_all_save_error(int res)
 {
+    g_assert(replay_mutex_locked());
     assert(res < 0);
     replay_save_instructions();
-    replay_mutex_lock();
     replay_put_event(EVENT_CHAR_READ_ALL_ERROR);
     replay_put_dword(res);
-    replay_mutex_unlock();
 }
 
 void replay_char_read_all_save_buf(uint8_t *buf, int offset)
 {
+    g_assert(replay_mutex_locked());
     replay_save_instructions();
-    replay_mutex_lock();
     replay_put_event(EVENT_CHAR_READ_ALL);
     replay_put_array(buf, offset);
-    replay_mutex_unlock();
 }
diff --git a/replay/replay-events.c b/replay/replay-events.c
index e858254..54dd9d2 100644
--- a/replay/replay-events.c
+++ b/replay/replay-events.c
@@ -79,16 +79,14 @@ bool replay_has_events(void)
 
 void replay_flush_events(void)
 {
-    replay_mutex_lock();
+    g_assert(replay_mutex_locked());
+
     while (!QTAILQ_EMPTY(&events_list)) {
         Event *event = QTAILQ_FIRST(&events_list);
-        replay_mutex_unlock();
         replay_run_event(event);
-        replay_mutex_lock();
         QTAILQ_REMOVE(&events_list, event, events);
         g_free(event);
     }
-    replay_mutex_unlock();
 }
 
 void replay_disable_events(void)
@@ -102,14 +100,14 @@ void replay_disable_events(void)
 
 void replay_clear_events(void)
 {
-    replay_mutex_lock();
+    g_assert(replay_mutex_locked());
+
     while (!QTAILQ_EMPTY(&events_list)) {
         Event *event = QTAILQ_FIRST(&events_list);
         QTAILQ_REMOVE(&events_list, event, events);
 
         g_free(event);
     }
-    replay_mutex_unlock();
 }
 
 /*! Adds specified async event to the queue */
@@ -136,9 +134,8 @@ void replay_add_event(ReplayAsyncEventKind event_kind,
     event->opaque2 = opaque2;
     event->id = id;
 
-    replay_mutex_lock();
+    g_assert(replay_mutex_locked());
     QTAILQ_INSERT_TAIL(&events_list, event, events);
-    replay_mutex_unlock();
 }
 
 void replay_bh_schedule_event(QEMUBH *bh)
@@ -207,13 +204,11 @@ static void replay_save_event(Event *event, int checkpoint)
 /* Called with replay mutex locked */
 void replay_save_events(int checkpoint)
 {
+    g_assert(replay_mutex_locked());
     while (!QTAILQ_EMPTY(&events_list)) {
         Event *event = QTAILQ_FIRST(&events_list);
         replay_save_event(event, checkpoint);
-
-        replay_mutex_unlock();
         replay_run_event(event);
-        replay_mutex_lock();
         QTAILQ_REMOVE(&events_list, event, events);
         g_free(event);
     }
@@ -292,6 +287,7 @@ static Event *replay_read_event(int checkpoint)
 /* Called with replay mutex locked */
 void replay_read_events(int checkpoint)
 {
+    g_assert(replay_mutex_locked());
     while (replay_state.data_kind == EVENT_ASYNC) {
         Event *event = replay_read_event(checkpoint);
         if (!event) {
@@ -299,9 +295,7 @@ void replay_read_events(int checkpoint)
         }
         replay_finish_event();
         read_event_kind = -1;
-        replay_mutex_unlock();
         replay_run_event(event);
-        replay_mutex_lock();
 
         g_free(event);
     }
diff --git a/replay/replay-internal.c b/replay/replay-internal.c
index 7cdefea..139b9fa 100644
--- a/replay/replay-internal.c
+++ b/replay/replay-internal.c
@@ -174,10 +174,16 @@ static __thread bool replay_locked;
 void replay_mutex_init(void)
 {
     qemu_mutex_init(&lock);
+    /* Hold the mutex while we start-up */
+    qemu_mutex_lock(&lock);
+    replay_locked = true;
 }
 
 void replay_mutex_destroy(void)
 {
+    if (replay_mutex_locked()) {
+        qemu_mutex_unlock(&lock);
+    }
     qemu_mutex_destroy(&lock);
 }
 
@@ -186,25 +192,31 @@ bool replay_mutex_locked(void)
     return replay_locked;
 }
 
+/* Ordering constraints, replay_lock must be taken before BQL */
 void replay_mutex_lock(void)
 {
-    g_assert(!replay_mutex_locked());
-    qemu_mutex_lock(&lock);
-    replay_locked = true;
+    if (replay_mode != REPLAY_MODE_NONE) {
+        g_assert(!qemu_mutex_iothread_locked());
+        g_assert(!replay_mutex_locked());
+        qemu_mutex_lock(&lock);
+        replay_locked = true;
+    }
 }
 
 void replay_mutex_unlock(void)
 {
-    g_assert(replay_mutex_locked());
-    replay_locked = false;
-    qemu_mutex_unlock(&lock);
+    if (replay_mode != REPLAY_MODE_NONE) {
+        g_assert(replay_mutex_locked());
+        replay_locked = false;
+        qemu_mutex_unlock(&lock);
+    }
 }
 
 /*! Saves cached instructions. */
 void replay_save_instructions(void)
 {
     if (replay_file && replay_mode == REPLAY_MODE_RECORD) {
-        replay_mutex_lock();
+        g_assert(replay_mutex_locked());
         int diff = (int)(replay_get_current_step() - replay_state.current_step);
 
         /* Time can only go forward */
@@ -215,6 +227,5 @@ void replay_save_instructions(void)
             replay_put_dword(diff);
             replay_state.current_step += diff;
         }
-        replay_mutex_unlock();
     }
 }
diff --git a/replay/replay-time.c b/replay/replay-time.c
index f70382a..6a7565e 100644
--- a/replay/replay-time.c
+++ b/replay/replay-time.c
@@ -17,13 +17,13 @@
 
 int64_t replay_save_clock(ReplayClockKind kind, int64_t clock)
 {
-    replay_save_instructions();
 
     if (replay_file) {
-        replay_mutex_lock();
+        g_assert(replay_mutex_locked());
+
+        replay_save_instructions();
         replay_put_event(EVENT_CLOCK + kind);
         replay_put_qword(clock);
-        replay_mutex_unlock();
     }
 
     return clock;
@@ -46,16 +46,16 @@ void replay_read_next_clock(ReplayClockKind kind)
 /*! Reads next clock event from the input. */
 int64_t replay_read_clock(ReplayClockKind kind)
 {
+    g_assert(replay_file && replay_mutex_locked());
+
     replay_account_executed_instructions();
 
     if (replay_file) {
         int64_t ret;
-        replay_mutex_lock();
         if (replay_next_event_is(EVENT_CLOCK + kind)) {
             replay_read_next_clock(kind);
         }
         ret = replay_state.cached_clock[kind];
-        replay_mutex_unlock();
 
         return ret;
     }
diff --git a/replay/replay.c b/replay/replay.c
index 9cddb6b..a8b57cd 100644
--- a/replay/replay.c
+++ b/replay/replay.c
@@ -81,7 +81,7 @@ int replay_get_instructions(void)
 void replay_account_executed_instructions(void)
 {
     if (replay_mode == REPLAY_MODE_PLAY) {
-        replay_mutex_lock();
+        g_assert(replay_mutex_locked());
         if (replay_state.instructions_count > 0) {
             int count = (int)(replay_get_current_step()
                               - replay_state.current_step);
@@ -100,24 +100,22 @@ void replay_account_executed_instructions(void)
                 qemu_notify_event();
             }
         }
-        replay_mutex_unlock();
     }
 }
 
 bool replay_exception(void)
 {
+
     if (replay_mode == REPLAY_MODE_RECORD) {
+        g_assert(replay_mutex_locked());
         replay_save_instructions();
-        replay_mutex_lock();
         replay_put_event(EVENT_EXCEPTION);
-        replay_mutex_unlock();
         return true;
     } else if (replay_mode == REPLAY_MODE_PLAY) {
+        g_assert(replay_mutex_locked());
         bool res = replay_has_exception();
         if (res) {
-            replay_mutex_lock();
             replay_finish_event();
-            replay_mutex_unlock();
         }
         return res;
     }
@@ -129,10 +127,9 @@ bool replay_has_exception(void)
 {
     bool res = false;
     if (replay_mode == REPLAY_MODE_PLAY) {
+        g_assert(replay_mutex_locked());
         replay_account_executed_instructions();
-        replay_mutex_lock();
         res = replay_next_event_is(EVENT_EXCEPTION);
-        replay_mutex_unlock();
     }
 
     return res;
@@ -141,17 +138,15 @@ bool replay_has_exception(void)
 bool replay_interrupt(void)
 {
     if (replay_mode == REPLAY_MODE_RECORD) {
+        g_assert(replay_mutex_locked());
         replay_save_instructions();
-        replay_mutex_lock();
         replay_put_event(EVENT_INTERRUPT);
-        replay_mutex_unlock();
         return true;
     } else if (replay_mode == REPLAY_MODE_PLAY) {
+        g_assert(replay_mutex_locked());
         bool res = replay_has_interrupt();
         if (res) {
-            replay_mutex_lock();
             replay_finish_event();
-            replay_mutex_unlock();
         }
         return res;
     }
@@ -163,10 +158,9 @@ bool replay_has_interrupt(void)
 {
     bool res = false;
     if (replay_mode == REPLAY_MODE_PLAY) {
+        g_assert(replay_mutex_locked());
         replay_account_executed_instructions();
-        replay_mutex_lock();
         res = replay_next_event_is(EVENT_INTERRUPT);
-        replay_mutex_unlock();
     }
     return res;
 }
@@ -174,9 +168,8 @@ bool replay_has_interrupt(void)
 void replay_shutdown_request(ShutdownCause cause)
 {
     if (replay_mode == REPLAY_MODE_RECORD) {
-        replay_mutex_lock();
+        g_assert(replay_mutex_locked());
         replay_put_event(EVENT_SHUTDOWN + cause);
-        replay_mutex_unlock();
     }
 }
 
@@ -190,9 +183,9 @@ bool replay_checkpoint(ReplayCheckpoint checkpoint)
         return true;
     }
 
-    replay_mutex_lock();
 
     if (replay_mode == REPLAY_MODE_PLAY) {
+        g_assert(replay_mutex_locked());
         if (replay_next_event_is(EVENT_CHECKPOINT + checkpoint)) {
             replay_finish_event();
         } else if (replay_state.data_kind != EVENT_ASYNC) {
@@ -205,15 +198,20 @@ bool replay_checkpoint(ReplayCheckpoint checkpoint)
            checkpoint were processed */
         res = replay_state.data_kind != EVENT_ASYNC;
     } else if (replay_mode == REPLAY_MODE_RECORD) {
+        g_assert(replay_mutex_locked());
         replay_put_event(EVENT_CHECKPOINT + checkpoint);
         replay_save_events(checkpoint);
         res = true;
     }
 out:
-    replay_mutex_unlock();
     return res;
 }
 
+void replay_init_locks(void)
+{
+    replay_mutex_init();
+}
+
 static void replay_enable(const char *fname, int mode)
 {
     const char *fmode = NULL;
@@ -233,8 +231,6 @@ static void replay_enable(const char *fname, int mode)
 
     atexit(replay_finish);
 
-    replay_mutex_init();
-
     replay_file = fopen(fname, fmode);
     if (replay_file == NULL) {
         fprintf(stderr, "Replay: open %s: %s\n", fname, strerror(errno));
@@ -274,6 +270,8 @@ void replay_configure(QemuOpts *opts)
     Location loc;
 
     if (!opts) {
+        /* we no longer need this lock */
+        replay_mutex_destroy();
         return;
     }
 
diff --git a/util/main-loop.c b/util/main-loop.c
index 7558eb5..992f9b0 100644
--- a/util/main-loop.c
+++ b/util/main-loop.c
@@ -29,6 +29,7 @@
 #include "qemu/sockets.h"	// struct in_addr needed for libslirp.h
 #include "sysemu/qtest.h"
 #include "sysemu/cpus.h"
+#include "sysemu/replay.h"
 #include "slirp/libslirp.h"
 #include "qemu/main-loop.h"
 #include "block/aio.h"
@@ -245,18 +246,19 @@ static int os_host_main_loop_wait(int64_t timeout)
         timeout = SCALE_MS;
     }
 
+
     if (timeout) {
         spin_counter = 0;
-        qemu_mutex_unlock_iothread();
     } else {
         spin_counter++;
     }
+    qemu_mutex_unlock_iothread();
+    replay_mutex_unlock();
 
     ret = qemu_poll_ns((GPollFD *)gpollfds->data, gpollfds->len, timeout);
 
-    if (timeout) {
-        qemu_mutex_lock_iothread();
-    }
+    replay_mutex_lock();
+    qemu_mutex_lock_iothread();
 
     glib_pollfds_poll();
 
@@ -463,8 +465,13 @@ static int os_host_main_loop_wait(int64_t timeout)
     poll_timeout_ns = qemu_soonest_timeout(poll_timeout_ns, timeout);
 
     qemu_mutex_unlock_iothread();
+
+    replay_mutex_unlock();
+
     g_poll_ret = qemu_poll_ns(poll_fds, n_poll_fds + w->num, poll_timeout_ns);
 
+    replay_mutex_lock();
+
     qemu_mutex_lock_iothread();
     if (g_poll_ret > 0) {
         for (i = 0; i < w->num; i++) {
diff --git a/vl.c b/vl.c
index 1170c69..47517f9 100644
--- a/vl.c
+++ b/vl.c
@@ -3061,6 +3061,8 @@ int main(int argc, char **argv, char **envp)
 
     qemu_init_cpu_list();
     qemu_init_cpu_loop();
+
+    replay_init_locks();
     qemu_mutex_lock_iothread();
 
     atexit(qemu_run_exit_notifiers);
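
Editorial sketch, not part of the patch: the main-loop hunks above release the BQL and then the replay lock before polling, and retake the replay lock before the BQL afterwards. The ordering can be modelled in Python 3 roughly as follows. The names replay_lock, bql and poll_once are illustrative stand-ins, not QEMU API, and plain threading.Lock objects are assumed to behave closely enough to the real mutexes for this purpose.

```python
import threading

# Stand-ins for the replay mutex and the iothread mutex (illustrative names).
replay_lock = threading.Lock()
bql = threading.Lock()

order = []  # records lock transitions so the ordering can be inspected


def poll_once(poll):
    """Drop both locks around a blocking poll, then retake them.

    Assumes the caller holds replay_lock and bql on entry, as the
    main loop does in the patch.
    """
    bql.release()
    order.append("bql-unlock")
    replay_lock.release()
    order.append("replay-unlock")
    try:
        return poll()
    finally:
        # The replay lock is always taken before the BQL.
        replay_lock.acquire()
        order.append("replay-lock")
        bql.acquire()
        order.append("bql-lock")


# Usage: take the locks in the same order the main loop would.
replay_lock.acquire()
bql.acquire()
result = poll_once(lambda: 0)
```

Keeping one fixed acquisition order (replay lock first, BQL second) in every thread is what avoids lock-order inversion between the main loop and the vCPU threads.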


* [Qemu-devel] [PATCH v7 14/22] replay: don't destroy mutex at exit
  2018-02-27  9:51 [Qemu-devel] [ PATCH v7 00/22] replay additions Pavel Dovgalyuk
                   ` (12 preceding siblings ...)
  2018-02-27  9:52 ` [Qemu-devel] [PATCH v7 13/22] replay: push replay_mutex_lock up the call tree Pavel Dovgalyuk
@ 2018-02-27  9:52 ` Pavel Dovgalyuk
  2018-02-27  9:52 ` [Qemu-devel] [PATCH v7 15/22] replay: check return values of fwrite Pavel Dovgalyuk
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 29+ messages in thread
From: Pavel Dovgalyuk @ 2018-02-27  9:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, peter.maydell, war2jordan, boost.lists, quintela,
	ciro.santilli, jasowang, mst, zuban32s, maria.klimushenkova,
	dovgaluk, kraxel, pavel.dovgaluk, thomas.dullien, pbonzini,
	alex.bennee

The replay mutex is held by the vCPU thread, while the destroy function is
called from the atexit handler of the main thread. Therefore the mutex cannot
be destroyed safely at exit.

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
---
 replay/replay.c |    1 -
 1 file changed, 1 deletion(-)

diff --git a/replay/replay.c b/replay/replay.c
index a8b57cd..60659c9 100644
--- a/replay/replay.c
+++ b/replay/replay.c
@@ -356,7 +356,6 @@ void replay_finish(void)
     replay_snapshot = NULL;
 
     replay_finish_events();
-    replay_mutex_destroy();
 }
 
 void replay_add_blocker(Error *reason)


* [Qemu-devel] [PATCH v7 15/22] replay: check return values of fwrite
  2018-02-27  9:51 [Qemu-devel] [ PATCH v7 00/22] replay additions Pavel Dovgalyuk
                   ` (13 preceding siblings ...)
  2018-02-27  9:52 ` [Qemu-devel] [PATCH v7 14/22] replay: don't destroy mutex at exit Pavel Dovgalyuk
@ 2018-02-27  9:52 ` Pavel Dovgalyuk
  2018-02-27  9:53 ` [Qemu-devel] [PATCH v7 16/22] replay: avoid recursive call of checkpoints Pavel Dovgalyuk
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 29+ messages in thread
From: Pavel Dovgalyuk @ 2018-02-27  9:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, peter.maydell, war2jordan, boost.lists, quintela,
	ciro.santilli, jasowang, mst, zuban32s, maria.klimushenkova,
	dovgaluk, kraxel, pavel.dovgaluk, thomas.dullien, pbonzini,
	alex.bennee

This patch adds error reporting when fwrite cannot completely
save the buffer to the file.

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>

--

v3: also check putc() return value
---
 replay/replay-internal.c |   17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/replay/replay-internal.c b/replay/replay-internal.c
index 139b9fa..fd88b5b 100644
--- a/replay/replay-internal.c
+++ b/replay/replay-internal.c
@@ -24,12 +24,23 @@
 static QemuMutex lock;
 
 /* File for replay writing */
+static bool write_error;
 FILE *replay_file;
 
+static void replay_write_error(void)
+{
+    if (!write_error) {
+        error_report("replay write error");
+        write_error = true;
+    }
+}
+
 void replay_put_byte(uint8_t byte)
 {
     if (replay_file) {
-        putc(byte, replay_file);
+        if (putc(byte, replay_file) == EOF) {
+            replay_write_error();
+        }
     }
 }
 
@@ -62,7 +73,9 @@ void replay_put_array(const uint8_t *buf, size_t size)
 {
     if (replay_file) {
         replay_put_dword(size);
-        fwrite(buf, 1, size, replay_file);
+        if (fwrite(buf, 1, size, replay_file) != size) {
+            replay_write_error();
+        }
     }
 }
 

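Editorial sketch, not part of the patch: the "report the first write failure only" pattern added above can be modelled in Python 3 as follows. ReplayWriter, FullDisk and the report hook are hypothetical names introduced for illustration. The big-endian size prefix follows the byte order described in the replay-dump.py comments later in this series.

```python
import io
import struct
import sys


class ReplayWriter:
    """Writes length-prefixed arrays and reports only the first failure."""

    def __init__(self, stream, report=None):
        self.stream = stream
        self.write_error = False
        # 'report' is a hypothetical hook standing in for error_report().
        self.report = report or (lambda msg: print(msg, file=sys.stderr))

    def _fail(self):
        if not self.write_error:
            self.report("replay write error")
            self.write_error = True

    def _write(self, data):
        # A short write counts as an error, like fwrite() != size.
        if self.stream.write(data) != len(data):
            self._fail()

    def put_array(self, buf):
        self._write(struct.pack(">I", len(buf)))  # big-endian size prefix
        self._write(buf)


class FullDisk:
    """Test double whose writes never store anything."""

    def write(self, data):
        return 0


messages = []
broken = ReplayWriter(FullDisk(), messages.append)
broken.put_array(b"abc")
broken.put_array(b"def")

good = ReplayWriter(io.BytesIO(), messages.append)
good.put_array(b"hi")
```

The latch keeps a full disk from flooding the log with one message per byte, at the cost of hiding how many writes actually failed.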

* [Qemu-devel] [PATCH v7 16/22] replay: avoid recursive call of checkpoints
  2018-02-27  9:51 [Qemu-devel] [ PATCH v7 00/22] replay additions Pavel Dovgalyuk
                   ` (14 preceding siblings ...)
  2018-02-27  9:52 ` [Qemu-devel] [PATCH v7 15/22] replay: check return values of fwrite Pavel Dovgalyuk
@ 2018-02-27  9:53 ` Pavel Dovgalyuk
  2018-02-27  9:53 ` [Qemu-devel] [PATCH v7 17/22] scripts/replay-dump.py: replay log dumper Pavel Dovgalyuk
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 29+ messages in thread
From: Pavel Dovgalyuk @ 2018-02-27  9:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, peter.maydell, war2jordan, boost.lists, quintela,
	ciro.santilli, jasowang, mst, zuban32s, maria.klimushenkova,
	dovgaluk, kraxel, pavel.dovgaluk, thomas.dullien, pbonzini,
	alex.bennee

This patch adds a flag that prevents recursive calls of the replay_checkpoint
function. Checkpoints may be accompanied by hardware events. When an event
is processed, the virtual device may invoke timer modification functions that
also invoke the checkpoint function. This leads to an infinite loop.

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
---
 replay/replay.c |   14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/replay/replay.c b/replay/replay.c
index 60659c9..d5c3a66 100644
--- a/replay/replay.c
+++ b/replay/replay.c
@@ -176,13 +176,24 @@ void replay_shutdown_request(ShutdownCause cause)
 bool replay_checkpoint(ReplayCheckpoint checkpoint)
 {
     bool res = false;
+    static bool in_checkpoint;
     assert(EVENT_CHECKPOINT + checkpoint <= EVENT_CHECKPOINT_LAST);
-    replay_save_instructions();
 
     if (!replay_file) {
         return true;
     }
 
+    if (in_checkpoint) {
+        /* If we are already in checkpoint, then there is no need
+           for additional synchronization.
+           Recursion occurs when HW event modifies timers.
+           Timer modification may invoke the checkpoint and
+           proceed to recursion. */
+        return true;
+    }
+    in_checkpoint = true;
+
+    replay_save_instructions();
 
     if (replay_mode == REPLAY_MODE_PLAY) {
         g_assert(replay_mutex_locked());
@@ -204,6 +215,7 @@ bool replay_checkpoint(ReplayCheckpoint checkpoint)
         res = true;
     }
 out:
+    in_checkpoint = false;
     return res;
 }
 

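Editorial sketch, not part of the patch: the re-entrancy guard added above, modelled in Python 3. Checkpointer and its on_event callback are hypothetical names. The callback stands in for a hardware event handler that modifies a timer and thereby reaches the checkpoint function again.

```python
class Checkpointer:
    """Processes checkpoints and ignores nested calls made while busy."""

    def __init__(self):
        self.in_checkpoint = False
        self.log = []

    def checkpoint(self, name, on_event=None):
        if self.in_checkpoint:
            # Already inside a checkpoint: nothing more to synchronize.
            return True
        self.in_checkpoint = True
        try:
            self.log.append(name)
            if on_event is not None:
                on_event()  # may call checkpoint() again
            return True
        finally:
            # Clear the flag on every exit path, like the 'out:' label.
            self.in_checkpoint = False


cp = Checkpointer()


def timer_mod():
    # A device handler that modifies a timer ends up here.
    cp.checkpoint("nested")


cp.checkpoint("outer", timer_mod)
cp.checkpoint("next")
```

Only "outer" and "next" end up in cp.log; the nested call returns immediately instead of recursing.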

* [Qemu-devel] [PATCH v7 17/22] scripts/replay-dump.py: replay log dumper
  2018-02-27  9:51 [Qemu-devel] [ PATCH v7 00/22] replay additions Pavel Dovgalyuk
                   ` (15 preceding siblings ...)
  2018-02-27  9:53 ` [Qemu-devel] [PATCH v7 16/22] replay: avoid recursive call of checkpoints Pavel Dovgalyuk
@ 2018-02-27  9:53 ` Pavel Dovgalyuk
  2018-02-27  9:53 ` [Qemu-devel] [PATCH v7 18/22] replay: don't process async events when warping the clock Pavel Dovgalyuk
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 29+ messages in thread
From: Pavel Dovgalyuk @ 2018-02-27  9:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, peter.maydell, war2jordan, boost.lists, quintela,
	ciro.santilli, jasowang, mst, zuban32s, maria.klimushenkova,
	dovgaluk, kraxel, pavel.dovgaluk, thomas.dullien, pbonzini,
	alex.bennee

From: Alex Bennée <alex.bennee@linaro.org>

This script is a debugging tool for looking through the contents of a
replay log file. It is incomplete but should fail gracefully at events
it doesn't understand.

It currently understands two different log formats, as the audio
record/replay support was merged after MTTCG. It was written to
help debug what caused the BQL changes to break replay support.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v2
  - yet another update to the log format
---
 scripts/replay-dump.py |  308 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 308 insertions(+)
 create mode 100755 scripts/replay-dump.py

diff --git a/scripts/replay-dump.py b/scripts/replay-dump.py
new file mode 100755
index 0000000..203bb31
--- /dev/null
+++ b/scripts/replay-dump.py
@@ -0,0 +1,308 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+#
+# Dump the contents of a recorded execution stream
+#
+#  Copyright (c) 2017 Alex Bennée <alex.bennee@linaro.org>
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
+
+import argparse
+import struct
+from collections import namedtuple
+
+# This mirrors some of the global replay state which some of the
+# stream loading refers to. Some decoders may read the next event so
+# we need to handle that case. Calling reuse_event will ensure the next
+# event is read from the cache rather than advancing the file.
+
+class ReplayState(object):
+    def __init__(self):
+        self.event = -1
+        self.event_count = 0
+        self.already_read = False
+        self.current_checkpoint = 0
+        self.checkpoint = 0
+
+    def set_event(self, ev):
+        self.event = ev
+        self.event_count += 1
+
+    def get_event(self):
+        self.already_read = False
+        return self.event
+
+    def reuse_event(self, ev):
+        self.event = ev
+        self.already_read = True
+
+    def set_checkpoint(self):
+        self.checkpoint = self.event - self.checkpoint_start
+
+    def get_checkpoint(self):
+        return self.checkpoint
+
+replay_state = ReplayState()
+
+# Simple read functions that mirror replay-internal.c
+# The file-stream is big-endian and manually written out a byte at a time.
+
+def read_byte(fin):
+    "Read a single byte"
+    return struct.unpack('>B', fin.read(1))[0]
+
+def read_event(fin):
+    "Read a single byte event, but save some state"
+    if replay_state.already_read:
+        return replay_state.get_event()
+    else:
+        replay_state.set_event(read_byte(fin))
+        return replay_state.event
+
+def read_word(fin):
+    "Read a 16 bit word"
+    return struct.unpack('>H', fin.read(2))[0]
+
+def read_dword(fin):
+    "Read a 32 bit word"
+    return struct.unpack('>I', fin.read(4))[0]
+
+def read_qword(fin):
+    "Read a 64 bit word"
+    return struct.unpack('>Q', fin.read(8))[0]
+
+# Generic decoder structure
+Decoder = namedtuple("Decoder", "eid name fn")
+
+def call_decode(table, index, dumpfile):
+    "Search decode table for next step"
+    decoder = next((d for d in table if d.eid == index), None)
+    if not decoder:
+        print "Could not decode index: %d" % (index)
+        print "Entry is: %s" % (decoder)
+        print "Decode Table is:\n%s" % (table)
+        return False
+    else:
+        return decoder.fn(decoder.eid, decoder.name, dumpfile)
+
+# Print event
+def print_event(eid, name, string=None, event_count=None):
+    "Print event with count"
+    if not event_count:
+        event_count = replay_state.event_count
+
+    if string:
+        print "%d:%s(%d) %s" % (event_count, name, eid, string)
+    else:
+        print "%d:%s(%d)" % (event_count, name, eid)
+
+
+# Decoders for each event type
+
+def decode_unimp(eid, name, _unused_dumpfile):
+    "Unimplemented decoder, will trigger exit"
+    print "%s not handled - will now stop" % (name)
+    return False
+
+# Checkpoint decoder
+def swallow_async_qword(eid, name, dumpfile):
+    "Swallow a qword of data without looking at it"
+    step_id = read_qword(dumpfile)
+    print "  %s(%d) @ %d" % (name, eid, step_id)
+    return True
+
+async_decode_table = [ Decoder(0, "REPLAY_ASYNC_EVENT_BH", swallow_async_qword),
+                       Decoder(1, "REPLAY_ASYNC_INPUT", decode_unimp),
+                       Decoder(2, "REPLAY_ASYNC_INPUT_SYNC", decode_unimp),
+                       Decoder(3, "REPLAY_ASYNC_CHAR_READ", decode_unimp),
+                       Decoder(4, "REPLAY_ASYNC_EVENT_BLOCK", decode_unimp),
+                       Decoder(5, "REPLAY_ASYNC_EVENT_NET", decode_unimp),
+]
+# See replay_read_events/replay_read_event
+def decode_async(eid, name, dumpfile):
+    """Decode an ASYNC event"""
+
+    print_event(eid, name)
+
+    async_event_kind = read_byte(dumpfile)
+    async_event_checkpoint = read_byte(dumpfile)
+
+    if async_event_checkpoint != replay_state.current_checkpoint:
+        print "  mismatch between checkpoint %d and async data %d" % (
+            replay_state.current_checkpoint, async_event_checkpoint)
+        return True
+
+    return call_decode(async_decode_table, async_event_kind, dumpfile)
+
+
+def decode_instruction(eid, name, dumpfile):
+    ins_diff = read_dword(dumpfile)
+    print_event(eid, name, "0x%x" % (ins_diff))
+    return True
+
+def decode_audio_out(eid, name, dumpfile):
+    audio_data = read_dword(dumpfile)
+    print_event(eid, name, "%d" % (audio_data))
+    return True
+
+def decode_checkpoint(eid, name, dumpfile):
+    """Decode a checkpoint.
+
+    Checkpoints contain a series of async events with their own specific data.
+    """
+    replay_state.set_checkpoint()
+    # save event count as we peek ahead
+    event_number = replay_state.event_count
+    next_event = read_event(dumpfile)
+
+    # if the next event is EVENT_ASYNC there are a bunch of
+    # async events to read, otherwise we are done
+    if next_event != 3:
+        print_event(eid, name, "no additional data", event_number)
+    else:
+        print_event(eid, name, "more data follows", event_number)
+
+    replay_state.reuse_event(next_event)
+    return True
+
+def decode_checkpoint_init(eid, name, dumpfile):
+    print_event(eid, name)
+    return True
+
+def decode_interrupt(eid, name, dumpfile):
+    print_event(eid, name)
+    return True
+
+def decode_clock(eid, name, dumpfile):
+    clock_data = read_qword(dumpfile)
+    print_event(eid, name, "0x%x" % (clock_data))
+    return True
+
+
+# pre-MTTCG merge
+v5_event_table = [Decoder(0, "EVENT_INSTRUCTION", decode_instruction),
+                  Decoder(1, "EVENT_INTERRUPT", decode_interrupt),
+                  Decoder(2, "EVENT_EXCEPTION", decode_unimp),
+                  Decoder(3, "EVENT_ASYNC", decode_async),
+                  Decoder(4, "EVENT_SHUTDOWN", decode_unimp),
+                  Decoder(5, "EVENT_CHAR_WRITE", decode_unimp),
+                  Decoder(6, "EVENT_CHAR_READ_ALL", decode_unimp),
+                  Decoder(7, "EVENT_CHAR_READ_ALL_ERROR", decode_unimp),
+                  Decoder(8, "EVENT_CLOCK_HOST", decode_clock),
+                  Decoder(9, "EVENT_CLOCK_VIRTUAL_RT", decode_clock),
+                  Decoder(10, "EVENT_CP_CLOCK_WARP_START", decode_checkpoint),
+                  Decoder(11, "EVENT_CP_CLOCK_WARP_ACCOUNT", decode_checkpoint),
+                  Decoder(12, "EVENT_CP_RESET_REQUESTED", decode_checkpoint),
+                  Decoder(13, "EVENT_CP_SUSPEND_REQUESTED", decode_checkpoint),
+                  Decoder(14, "EVENT_CP_CLOCK_VIRTUAL", decode_checkpoint),
+                  Decoder(15, "EVENT_CP_CLOCK_HOST", decode_checkpoint),
+                  Decoder(16, "EVENT_CP_CLOCK_VIRTUAL_RT", decode_checkpoint),
+                  Decoder(17, "EVENT_CP_INIT", decode_checkpoint_init),
+                  Decoder(18, "EVENT_CP_RESET", decode_checkpoint),
+]
+
+# post-MTTCG merge, AUDIO support added
+v6_event_table = [Decoder(0, "EVENT_INSTRUCTION", decode_instruction),
+                  Decoder(1, "EVENT_INTERRUPT", decode_interrupt),
+                  Decoder(2, "EVENT_EXCEPTION", decode_unimp),
+                  Decoder(3, "EVENT_ASYNC", decode_async),
+                  Decoder(4, "EVENT_SHUTDOWN", decode_unimp),
+                  Decoder(5, "EVENT_CHAR_WRITE", decode_unimp),
+                  Decoder(6, "EVENT_CHAR_READ_ALL", decode_unimp),
+                  Decoder(7, "EVENT_CHAR_READ_ALL_ERROR", decode_unimp),
+                  Decoder(8, "EVENT_AUDIO_OUT", decode_audio_out),
+                  Decoder(9, "EVENT_AUDIO_IN", decode_unimp),
+                  Decoder(10, "EVENT_CLOCK_HOST", decode_clock),
+                  Decoder(11, "EVENT_CLOCK_VIRTUAL_RT", decode_clock),
+                  Decoder(12, "EVENT_CP_CLOCK_WARP_START", decode_checkpoint),
+                  Decoder(13, "EVENT_CP_CLOCK_WARP_ACCOUNT", decode_checkpoint),
+                  Decoder(14, "EVENT_CP_RESET_REQUESTED", decode_checkpoint),
+                  Decoder(15, "EVENT_CP_SUSPEND_REQUESTED", decode_checkpoint),
+                  Decoder(16, "EVENT_CP_CLOCK_VIRTUAL", decode_checkpoint),
+                  Decoder(17, "EVENT_CP_CLOCK_HOST", decode_checkpoint),
+                  Decoder(18, "EVENT_CP_CLOCK_VIRTUAL_RT", decode_checkpoint),
+                  Decoder(19, "EVENT_CP_INIT", decode_checkpoint_init),
+                  Decoder(20, "EVENT_CP_RESET", decode_checkpoint),
+]
+
+# Shutdown cause added
+v7_event_table = [Decoder(0, "EVENT_INSTRUCTION", decode_instruction),
+                  Decoder(1, "EVENT_INTERRUPT", decode_interrupt),
+                  Decoder(2, "EVENT_EXCEPTION", decode_unimp),
+                  Decoder(3, "EVENT_ASYNC", decode_async),
+                  Decoder(4, "EVENT_SHUTDOWN", decode_unimp),
+                  Decoder(5, "EVENT_SHUTDOWN_HOST_ERR", decode_unimp),
+                  Decoder(6, "EVENT_SHUTDOWN_HOST_QMP", decode_unimp),
+                  Decoder(7, "EVENT_SHUTDOWN_HOST_SIGNAL", decode_unimp),
+                  Decoder(8, "EVENT_SHUTDOWN_HOST_UI", decode_unimp),
+                  Decoder(9, "EVENT_SHUTDOWN_GUEST_SHUTDOWN", decode_unimp),
+                  Decoder(10, "EVENT_SHUTDOWN_GUEST_RESET", decode_unimp),
+                  Decoder(11, "EVENT_SHUTDOWN_GUEST_PANIC", decode_unimp),
+                  Decoder(12, "EVENT_SHUTDOWN___MAX", decode_unimp),
+                  Decoder(13, "EVENT_CHAR_WRITE", decode_unimp),
+                  Decoder(14, "EVENT_CHAR_READ_ALL", decode_unimp),
+                  Decoder(15, "EVENT_CHAR_READ_ALL_ERROR", decode_unimp),
+                  Decoder(16, "EVENT_AUDIO_OUT", decode_audio_out),
+                  Decoder(17, "EVENT_AUDIO_IN", decode_unimp),
+                  Decoder(18, "EVENT_CLOCK_HOST", decode_clock),
+                  Decoder(19, "EVENT_CLOCK_VIRTUAL_RT", decode_clock),
+                  Decoder(20, "EVENT_CP_CLOCK_WARP_START", decode_checkpoint),
+                  Decoder(21, "EVENT_CP_CLOCK_WARP_ACCOUNT", decode_checkpoint),
+                  Decoder(22, "EVENT_CP_RESET_REQUESTED", decode_checkpoint),
+                  Decoder(23, "EVENT_CP_SUSPEND_REQUESTED", decode_checkpoint),
+                  Decoder(24, "EVENT_CP_CLOCK_VIRTUAL", decode_checkpoint),
+                  Decoder(25, "EVENT_CP_CLOCK_HOST", decode_checkpoint),
+                  Decoder(26, "EVENT_CP_CLOCK_VIRTUAL_RT", decode_checkpoint),
+                  Decoder(27, "EVENT_CP_INIT", decode_checkpoint_init),
+                  Decoder(28, "EVENT_CP_RESET", decode_checkpoint),
+]
+
+def parse_arguments():
+    "Grab arguments for script"
+    parser = argparse.ArgumentParser()
+    parser.add_argument("-f", "--file", help='record/replay dump to read from',
+                        required=True)
+    return parser.parse_args()
+
+def decode_file(filename):
+    "Decode a record/replay dump"
+    dumpfile = open(filename, "rb")
+
+    # read and throwaway the header
+    version = read_dword(dumpfile)
+    junk = read_qword(dumpfile)
+
+    print "HEADER: version 0x%x" % (version)
+
+    if version == 0xe02007:
+        event_decode_table = v7_event_table
+        replay_state.checkpoint_start = 20
+    elif version == 0xe02006:
+        event_decode_table = v6_event_table
+        replay_state.checkpoint_start = 12
+    else:
+        event_decode_table = v5_event_table
+        replay_state.checkpoint_start = 10
+
+    try:
+        decode_ok = True
+        while decode_ok:
+            event = read_event(dumpfile)
+            decode_ok = call_decode(event_decode_table, event, dumpfile)
+    finally:
+        dumpfile.close()
+
+if __name__ == "__main__":
+    args = parse_arguments()
+    decode_file(args.file)
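
Editorial sketch, not part of the patch: the script above uses Python 2 print statements. The header-reading step can be tried under Python 3 roughly as follows. The 12-byte sample header is synthetic and built here only for illustration. read_header is a hypothetical helper that mirrors the first lines of decode_file.

```python
import io
import struct


def read_dword(fin):
    "Read a big-endian 32-bit value"
    return struct.unpack(">I", fin.read(4))[0]


def read_qword(fin):
    "Read a big-endian 64-bit value"
    return struct.unpack(">Q", fin.read(8))[0]


def read_header(fin):
    "Return the version dword and skip the qword that follows it"
    version = read_dword(fin)
    read_qword(fin)
    return version


# Synthetic header: version dword followed by an 8-byte field.
sample = io.BytesIO(struct.pack(">IQ", 0xE02007, 0))
version = read_header(sample)
print("HEADER: version 0x%x" % version)
```

The ">" prefix in the struct format selects big-endian order with no padding, so the header is exactly 12 bytes long.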


* [Qemu-devel] [PATCH v7 18/22] replay: don't process async events when warping the clock
  2018-02-27  9:51 [Qemu-devel] [ PATCH v7 00/22] replay additions Pavel Dovgalyuk
                   ` (16 preceding siblings ...)
  2018-02-27  9:53 ` [Qemu-devel] [PATCH v7 17/22] scripts/replay-dump.py: replay log dumper Pavel Dovgalyuk
@ 2018-02-27  9:53 ` Pavel Dovgalyuk
  2018-02-27  9:53 ` [Qemu-devel] [PATCH v7 19/22] replay: save vmstate of the asynchronous events Pavel Dovgalyuk
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 29+ messages in thread
From: Pavel Dovgalyuk @ 2018-02-27  9:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, peter.maydell, war2jordan, boost.lists, quintela,
	ciro.santilli, jasowang, mst, zuban32s, maria.klimushenkova,
	dovgaluk, kraxel, pavel.dovgaluk, thomas.dullien, pbonzini,
	alex.bennee

The virtual clock is warped from both the iothread and the vCPU thread. When
hardware events are associated with the warp checkpoint, interrupt delivery
may be non-deterministic if the checkpoint is processed in different threads
during record and replay.
This patch disables event processing for the clock warp checkpoint and leaves
all hardware events to other checkpoints (e.g., virtual clock).

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>

--

v4: added assert for replay_save_events function
---
 replay/replay-events.c |    1 +
 replay/replay.c        |    7 ++++++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/replay/replay-events.c b/replay/replay-events.c
index 54dd9d2..3d5fc8a 100644
--- a/replay/replay-events.c
+++ b/replay/replay-events.c
@@ -205,6 +205,7 @@ static void replay_save_event(Event *event, int checkpoint)
 void replay_save_events(int checkpoint)
 {
     g_assert(replay_mutex_locked());
+    g_assert(checkpoint != CHECKPOINT_CLOCK_WARP_START);
     while (!QTAILQ_EMPTY(&events_list)) {
         Event *event = QTAILQ_FIRST(&events_list);
         replay_save_event(event, checkpoint);
diff --git a/replay/replay.c b/replay/replay.c
index d5c3a66..19721b0 100644
--- a/replay/replay.c
+++ b/replay/replay.c
@@ -211,7 +211,12 @@ bool replay_checkpoint(ReplayCheckpoint checkpoint)
     } else if (replay_mode == REPLAY_MODE_RECORD) {
         g_assert(replay_mutex_locked());
         replay_put_event(EVENT_CHECKPOINT + checkpoint);
-        replay_save_events(checkpoint);
+        /* This checkpoint belongs to several threads.
+           Processing events from different threads is
+           non-deterministic */
+        if (checkpoint != CHECKPOINT_CLOCK_WARP_START) {
+            replay_save_events(checkpoint);
+        }
         res = true;
     }
 out:
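
Editorial sketch, not part of the patch: the record-side behaviour after this change, modelled in Python 3. The checkpoint numbers are illustrative offsets only and are not taken from the QEMU headers. record_checkpoint and the pending list are hypothetical stand-ins for replay_checkpoint and the async event queue.

```python
CHECKPOINT_CLOCK_WARP_START = 0  # illustrative value
CHECKPOINT_CLOCK_VIRTUAL = 4     # illustrative value


def record_checkpoint(log, pending, checkpoint):
    """Append a checkpoint marker, flushing queued events unless warping."""
    log.append(("checkpoint", checkpoint))
    # The warp checkpoint is reached from several threads, so queued
    # events are left for a later checkpoint.
    if checkpoint != CHECKPOINT_CLOCK_WARP_START:
        while pending:
            log.append(("async", checkpoint, pending.pop(0)))


log, pending = [], ["bh-1", "net-2"]
record_checkpoint(log, pending, CHECKPOINT_CLOCK_WARP_START)
record_checkpoint(log, pending, CHECKPOINT_CLOCK_VIRTUAL)
```

The two queued events survive the warp checkpoint untouched and are written out under the virtual clock checkpoint instead.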


* [Qemu-devel] [PATCH v7 19/22] replay: save vmstate of the asynchronous events
  2018-02-27  9:51 [Qemu-devel] [ PATCH v7 00/22] replay additions Pavel Dovgalyuk
                   ` (17 preceding siblings ...)
  2018-02-27  9:53 ` [Qemu-devel] [PATCH v7 18/22] replay: don't process async events when warping the clock Pavel Dovgalyuk
@ 2018-02-27  9:53 ` Pavel Dovgalyuk
  2018-02-27  9:53 ` [Qemu-devel] [PATCH v7 20/22] replay: don't drain/flush bdrv queue while RR is working Pavel Dovgalyuk
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 29+ messages in thread
From: Pavel Dovgalyuk @ 2018-02-27  9:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, peter.maydell, war2jordan, boost.lists, quintela,
	ciro.santilli, jasowang, mst, zuban32s, maria.klimushenkova,
	dovgaluk, kraxel, pavel.dovgaluk, thomas.dullien, pbonzini,
	alex.bennee

This patch fixes saving and loading of snapshots in replay mode.
It is required for snapshots created at the moment when the header
of an asynchronous event has been read. This information was not saved in
the snapshot. After loading the vmstate, replay continued with the file offset
past the event header. The event header is lost in this case and replay
hangs.

Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
---
 replay/replay-events.c   |   44 +++++++++++++++++++++-----------------------
 replay/replay-internal.h |    6 ++++++
 replay/replay-snapshot.c |    3 +++
 3 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/replay/replay-events.c b/replay/replay-events.c
index 3d5fc8a..707de38 100644
--- a/replay/replay-events.c
+++ b/replay/replay-events.c
@@ -27,10 +27,6 @@ typedef struct Event {
 } Event;
 
 static QTAILQ_HEAD(, Event) events_list = QTAILQ_HEAD_INITIALIZER(events_list);
-static unsigned int read_event_kind = -1;
-static uint64_t read_id = -1;
-static int read_checkpoint = -1;
-
 static bool events_enabled;
 
 /* Functions */
@@ -218,58 +214,60 @@ void replay_save_events(int checkpoint)
 static Event *replay_read_event(int checkpoint)
 {
     Event *event;
-    if (read_event_kind == -1) {
-        read_checkpoint = replay_get_byte();
-        read_event_kind = replay_get_byte();
-        read_id = -1;
+    if (replay_state.read_event_kind == -1) {
+        replay_state.read_event_checkpoint = replay_get_byte();
+        replay_state.read_event_kind = replay_get_byte();
+        replay_state.read_event_id = -1;
         replay_check_error();
     }
 
-    if (checkpoint != read_checkpoint) {
+    if (checkpoint != replay_state.read_event_checkpoint) {
         return NULL;
     }
 
     /* Events that has not to be in the queue */
-    switch (read_event_kind) {
+    switch (replay_state.read_event_kind) {
     case REPLAY_ASYNC_EVENT_BH:
-        if (read_id == -1) {
-            read_id = replay_get_qword();
+        if (replay_state.read_event_id == -1) {
+            replay_state.read_event_id = replay_get_qword();
         }
         break;
     case REPLAY_ASYNC_EVENT_INPUT:
         event = g_malloc0(sizeof(Event));
-        event->event_kind = read_event_kind;
+        event->event_kind = replay_state.read_event_kind;
         event->opaque = replay_read_input_event();
         return event;
     case REPLAY_ASYNC_EVENT_INPUT_SYNC:
         event = g_malloc0(sizeof(Event));
-        event->event_kind = read_event_kind;
+        event->event_kind = replay_state.read_event_kind;
         event->opaque = 0;
         return event;
     case REPLAY_ASYNC_EVENT_CHAR_READ:
         event = g_malloc0(sizeof(Event));
-        event->event_kind = read_event_kind;
+        event->event_kind = replay_state.read_event_kind;
         event->opaque = replay_event_char_read_load();
         return event;
     case REPLAY_ASYNC_EVENT_BLOCK:
-        if (read_id == -1) {
-            read_id = replay_get_qword();
+        if (replay_state.read_event_id == -1) {
+            replay_state.read_event_id = replay_get_qword();
         }
         break;
     case REPLAY_ASYNC_EVENT_NET:
         event = g_malloc0(sizeof(Event));
-        event->event_kind = read_event_kind;
+        event->event_kind = replay_state.read_event_kind;
         event->opaque = replay_event_net_load();
         return event;
     default:
-        error_report("Unknown ID %d of replay event", read_event_kind);
+        error_report("Unknown ID %d of replay event",
+            replay_state.read_event_kind);
         exit(1);
         break;
     }
 
     QTAILQ_FOREACH(event, &events_list, events) {
-        if (event->event_kind == read_event_kind
-            && (read_id == -1 || read_id == event->id)) {
+        if (event->event_kind == replay_state.read_event_kind
+            && (replay_state.read_event_id == -1
+                || replay_state.read_event_id == event->id)) {
             break;
         }
     }
@@ -295,7 +293,7 @@ void replay_read_events(int checkpoint)
             break;
         }
         replay_finish_event();
-        read_event_kind = -1;
+        replay_state.read_event_kind = -1;
         replay_run_event(event);
 
         g_free(event);
@@ -304,7 +302,7 @@ void replay_read_events(int checkpoint)
 
 void replay_init_events(void)
 {
-    read_event_kind = -1;
+    replay_state.read_event_kind = -1;
 }
 
 void replay_finish_events(void)
diff --git a/replay/replay-internal.h b/replay/replay-internal.h
index 41eee66..1284444 100644
--- a/replay/replay-internal.h
+++ b/replay/replay-internal.h
@@ -80,6 +80,12 @@ typedef struct ReplayState {
     uint64_t block_request_id;
     /*! Prior value of the host clock */
     uint64_t host_clock_last;
+    /*! Asynchronous event type read from the log */
+    int32_t read_event_kind;
+    /*! Asynchronous event id read from the log */
+    uint64_t read_event_id;
+    /*! Asynchronous event checkpoint id read from the log */
+    int32_t read_event_checkpoint;
 } ReplayState;
 extern ReplayState replay_state;
 
diff --git a/replay/replay-snapshot.c b/replay/replay-snapshot.c
index e0b2204..2ab85cf 100644
--- a/replay/replay-snapshot.c
+++ b/replay/replay-snapshot.c
@@ -57,6 +57,9 @@ static const VMStateDescription vmstate_replay = {
         VMSTATE_UINT64(file_offset, ReplayState),
         VMSTATE_UINT64(block_request_id, ReplayState),
         VMSTATE_UINT64(host_clock_last, ReplayState),
+        VMSTATE_INT32(read_event_kind, ReplayState),
+        VMSTATE_UINT64(read_event_id, ReplayState),
+        VMSTATE_INT32(read_event_checkpoint, ReplayState),
         VMSTATE_END_OF_LIST()
     },
 };
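
Editorial sketch, not part of the patch: why the already-read header has to travel with the snapshot, modelled in Python 3. AsyncReader and its save/load methods are hypothetical. The byte order (checkpoint first, then kind) follows replay_read_event in the diff above.

```python
import io


class AsyncReader:
    """Reads (checkpoint, kind) headers and keeps them until consumed."""

    def __init__(self, data):
        self.fin = io.BytesIO(data)
        self.read_event_kind = -1
        self.read_event_checkpoint = -1

    def peek_header(self):
        # Only touch the file if no header is pending.
        if self.read_event_kind == -1:
            self.read_event_checkpoint = self.fin.read(1)[0]
            self.read_event_kind = self.fin.read(1)[0]
        return self.read_event_checkpoint, self.read_event_kind

    def finish_event(self):
        self.read_event_kind = -1

    def save(self):
        # The pending header is saved next to the file offset.
        return {
            "file_offset": self.fin.tell(),
            "read_event_kind": self.read_event_kind,
            "read_event_checkpoint": self.read_event_checkpoint,
        }

    def load(self, state):
        self.fin.seek(state["file_offset"])
        self.read_event_kind = state["read_event_kind"]
        self.read_event_checkpoint = state["read_event_checkpoint"]


data = bytes([4, 0, 4, 5])  # two headers: (cp 4, kind 0), (cp 4, kind 5)
reader = AsyncReader(data)
first = reader.peek_header()   # header read, event not yet consumed
snapshot = reader.save()

restored = AsyncReader(data)
restored.load(snapshot)
again = restored.peek_header()  # comes from saved state, not the file
restored.finish_event()
second = restored.peek_header()
```

If save() stored only the file offset, the restored reader would skip straight to the second header and the first event would never be matched.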


* [Qemu-devel] [PATCH v7 20/22] replay: don't drain/flush bdrv queue while RR is working
  2018-02-27  9:51 [Qemu-devel] [ PATCH v7 00/22] replay additions Pavel Dovgalyuk
                   ` (18 preceding siblings ...)
  2018-02-27  9:53 ` [Qemu-devel] [PATCH v7 19/22] replay: save vmstate of the asynchronous events Pavel Dovgalyuk
@ 2018-02-27  9:53 ` Pavel Dovgalyuk
  2018-03-12 13:06   ` Paolo Bonzini
  2018-02-27  9:53 ` [Qemu-devel] [PATCH v7 21/22] replay: update documentation Pavel Dovgalyuk
                   ` (2 subsequent siblings)
  22 siblings, 1 reply; 29+ messages in thread
From: Pavel Dovgalyuk @ 2018-02-27  9:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, peter.maydell, war2jordan, boost.lists, quintela,
	ciro.santilli, jasowang, mst, zuban32s, maria.klimushenkova,
	dovgaluk, kraxel, pavel.dovgaluk, thomas.dullien, pbonzini,
	alex.bennee

In record/replay mode the bdrv queue is controlled by the replay mechanism.
It does not allow saving or loading snapshots
when the bdrv queue is not empty. Stopping the VM is not blocked by a nonempty
queue, but flushing the queue is still impossible there,
because it may cause deadlocks in replay mode.
This patch disables bdrv_drain_all and bdrv_flush_all in
record/replay mode.

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
---
 block/io.c |   22 ++++++++++++++++++++++
 cpus.c     |    2 --
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/block/io.c b/block/io.c
index 89d0745..a0fd54f 100644
--- a/block/io.c
+++ b/block/io.c
@@ -31,6 +31,7 @@
 #include "qemu/cutils.h"
 #include "qapi/error.h"
 #include "qemu/error-report.h"
+#include "sysemu/replay.h"
 
 #define NOT_DONE 0x7fffffff /* used while emulated sync operation in progress */
 
@@ -407,6 +408,13 @@ void bdrv_drain_all_begin(void)
     BdrvNextIterator it;
     GSList *aio_ctxs = NULL, *ctx;
 
+    /* bdrv queue is managed by record/replay,
+       waiting for finishing the I/O requests may
+       be infinite */
+    if (replay_events_enabled()) {
+        return;
+    }
+
     /* BDRV_POLL_WHILE() for a node can only be called from its own I/O thread
      * or the main loop AioContext. We potentially use BDRV_POLL_WHILE() on
      * nodes in several different AioContexts, so make sure we're in the main
@@ -458,6 +466,13 @@ void bdrv_drain_all_end(void)
     BlockDriverState *bs;
     BdrvNextIterator it;
 
+    /* bdrv queue is managed by record/replay,
+       waiting for finishing the I/O requests may
+       be endless */
+    if (replay_events_enabled()) {
+        return;
+    }
+
     for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
         AioContext *aio_context = bdrv_get_aio_context(bs);
 
@@ -1839,6 +1854,13 @@ int bdrv_flush_all(void)
     BlockDriverState *bs = NULL;
     int result = 0;
 
+    /* bdrv queue is managed by record/replay,
+       creating new flush request for stopping
+       the VM may break the determinism */
+    if (replay_events_enabled()) {
+        return result;
+    }
+
     for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
         AioContext *aio_context = bdrv_get_aio_context(bs);
         int ret;
diff --git a/cpus.c b/cpus.c
index 40ed0e6..83e022e 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1006,7 +1006,6 @@ static int do_vm_stop(RunState state)
     }
 
     bdrv_drain_all();
-    replay_disable_events();
     ret = bdrv_flush_all();
 
     return ret;
@@ -2054,7 +2053,6 @@ int vm_prepare_start(void)
         qapi_event_send_stop(&error_abort);
         res = -1;
     } else {
-        replay_enable_events();
         cpu_enable_ticks();
         runstate_set(RUN_STATE_RUNNING);
         vm_state_notify(1, RUN_STATE_RUNNING);

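For readers skimming the hunks above, here is a minimal standalone sketch of
the early-return guard they add. It is not the QEMU code itself: the names
sketch_rr_events_enabled, sketch_pending_requests, sketch_drain_all and
sketch_flush_all are hypothetical stand-ins (the real check is
replay_events_enabled()), and draining is assumed to be modelled by clearing
a counter.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not QEMU's real state or API. */
bool sketch_rr_events_enabled;   /* models the result of replay_events_enabled() */
int sketch_pending_requests;     /* models in-flight block requests */

/* Drain: normally waits until no requests are pending. */
void sketch_drain_all(void)
{
    if (sketch_rr_events_enabled) {
        return;                  /* queue is owned by record/replay: do not wait */
    }
    sketch_pending_requests = 0; /* models waiting for every request to complete */
}

/* Flush: returns 0 on success. Under record/replay it reports success
 * without issuing any new request. */
int sketch_flush_all(void)
{
    int result = 0;

    if (sketch_rr_events_enabled) {
        return result;
    }
    /* A real implementation would flush every block device here. */
    return result;
}
```

With the guard active, the flush reports success without doing any work, so
a caller cannot tell a skipped flush from a completed one.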

* [Qemu-devel] [PATCH v7 21/22] replay: update documentation
  2018-02-27  9:51 [Qemu-devel] [ PATCH v7 00/22] replay additions Pavel Dovgalyuk
                   ` (19 preceding siblings ...)
  2018-02-27  9:53 ` [Qemu-devel] [PATCH v7 20/22] replay: don't drain/flush bdrv queue while RR is working Pavel Dovgalyuk
@ 2018-02-27  9:53 ` Pavel Dovgalyuk
  2018-02-27  9:53 ` [Qemu-devel] [PATCH v7 22/22] tcg: fix cpu_io_recompile Pavel Dovgalyuk
  2018-03-12 10:32 ` [Qemu-devel] [ PATCH v7 00/22] replay additions Pavel Dovgalyuk
  22 siblings, 0 replies; 29+ messages in thread
From: Pavel Dovgalyuk @ 2018-02-27  9:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, peter.maydell, war2jordan, boost.lists, quintela,
	ciro.santilli, jasowang, mst, zuban32s, maria.klimushenkova,
	dovgaluk, kraxel, pavel.dovgaluk, thomas.dullien, pbonzini,
	alex.bennee

This patch clarifies the description of the record/replay feature
in docs/replay.txt.

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
---
 docs/replay.txt |   72 ++++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 52 insertions(+), 20 deletions(-)

diff --git a/docs/replay.txt b/docs/replay.txt
index 959633e..2e21e9c 100644
--- a/docs/replay.txt
+++ b/docs/replay.txt
@@ -7,14 +7,10 @@ See the COPYING file in the top-level directory.
 Record/replay
 -------------
 
-Record/replay functions are used for the reverse execution and deterministic
-replay of qemu execution. This implementation of deterministic replay can
-be used for deterministic debugging of guest code through a gdb remote
-interface.
-
+Record/replay functions are used for the deterministic replay of qemu execution.
 Execution recording writes a non-deterministic events log, which can be later
 used for replaying the execution anywhere and for unlimited number of times.
-It also supports checkpointing for faster rewinding during reverse debugging.
+It also supports checkpointing for faster rewind to the specific replay moment.
 Execution replaying reads the log and replays all non-deterministic events
 including external input, hardware clocks, and interrupts.
 
@@ -28,16 +24,36 @@ Deterministic replay has the following features:
    input devices.
 
 Usage of the record/replay:
- * First, record the execution, by adding the following arguments to the command line:
-   '-icount shift=7,rr=record,rrfile=replay.bin -net none'.
-   Block devices' images are not actually changed in the recording mode,
+ * First, record the execution with the following command line:
+    qemu-system-i386 \
+     -icount shift=7,rr=record,rrfile=replay.bin \
+     -drive file=disk.qcow2,if=none,id=img-direct \
+     -drive driver=blkreplay,if=none,image=img-direct,id=img-blkreplay \
+     -device ide-hd,drive=img-blkreplay \
+     -netdev user,id=net1 -device rtl8139,netdev=net1 \
+     -object filter-replay,id=replay,netdev=net1
+ * After recording, you can replay it by using another command line:
+    qemu-system-i386 \
+     -icount shift=7,rr=replay,rrfile=replay.bin \
+     -drive file=disk.qcow2,if=none,id=img-direct \
+     -drive driver=blkreplay,if=none,image=img-direct,id=img-blkreplay \
+     -device ide-hd,drive=img-blkreplay \
+     -netdev user,id=net1 -device rtl8139,netdev=net1 \
+     -object filter-replay,id=replay,netdev=net1
+   The only difference from recording is that the rr option changes
+   from record to replay.
+ * Block device images are not actually changed in the recording mode,
    because all of the changes are written to the temporary overlay file.
- * Then you can replay it by using another command
-   line option: '-icount shift=7,rr=replay,rrfile=replay.bin -net none'
- * '-net none' option should also be specified if network replay patches
-   are not applied.
-
-Papers with description of deterministic replay implementation:
+   This behavior is enabled by using the blkreplay driver. It should be used
+   for every enabled block device, as described in the 'Block devices' section.
+ * The '-net none' option should be specified when the network is not used,
+   because QEMU adds a network card by default. When the network is needed,
+   it should be configured explicitly with the replay filter, as described
+   in the 'Network devices' section.
+ * Interaction with audio devices and serial ports is recorded and replayed
+   automatically when such devices are enabled.
+
+Academic papers with description of deterministic replay implementation:
 http://www.computer.org/csdl/proceedings/csmr/2012/4666/00/4666a553-abs.html
 http://dl.acm.org/citation.cfm?id=2786805.2803179
 
@@ -46,8 +62,11 @@ Modifications of qemu include:
  * saving different asynchronous events (e.g. system shutdown) into the log
  * synchronization of the bottom halves execution
  * synchronization of the threads from thread pool
- * recording/replaying user input (mouse and keyboard)
+ * recording/replaying user input (mouse, keyboard, and microphone)
  * adding internal checkpoints for cpu and io synchronization
+ * network filter for recording and replaying the packets
+ * block driver for making block layer deterministic
+ * serial port input record and replay
 
 Locking and thread synchronisation
 ----------------------------------
@@ -77,12 +96,11 @@ Non-deterministic events
 Our record/replay system is based on saving and replaying non-deterministic
 events (e.g. keyboard input) and simulating deterministic ones (e.g. reading
 from HDD or memory of the VM). Saving only non-deterministic events makes
-log file smaller, simulation faster, and allows using reverse debugging even
-for realtime applications.
+log file smaller and simulation faster.
 
 The following non-deterministic data from peripheral devices is saved into
 the log: mouse and keyboard input, network packets, audio controller input,
-USB packets, serial port input, and hardware clocks (they are non-deterministic
+serial port input, and hardware clocks (they are non-deterministic
 too, because their values are taken from the host machine). Inputs from
 simulated hardware, memory of VM, software interrupts, and execution of
 instructions are not saved into the log, because they are deterministic and
@@ -205,7 +223,7 @@ Block devices record/replay module intercepts calls of
 bdrv coroutine functions at the top of block drivers stack.
 To record and replay block operations the drive must be configured
 as following:
- -drive file=disk.qcow,if=none,id=img-direct
+ -drive file=disk.qcow2,if=none,id=img-direct
  -drive driver=blkreplay,if=none,image=img-direct,id=img-blkreplay
  -device ide-hd,drive=img-blkreplay
 
@@ -234,6 +252,12 @@ This snapshot is created at start of recording and restored at start
 of replaying. It also can be loaded while replaying to roll back
 the execution.
 
+Use the QEMU monitor to create additional snapshots. The 'savevm <name>'
+command creates a snapshot and 'loadvm <name>' restores it. To prevent
+corruption of the original disk image, use overlay files linked to the
+original images. All new snapshots (including the starting one) are then
+saved in the overlays and the original image remains unchanged.
+
 Network devices
 ---------------
 
@@ -255,6 +279,14 @@ Audio data is recorded and replay automatically. The command line for recording
 and replaying must contain identical specifications of audio hardware, e.g.:
  -soundhw ac97
 
+Serial ports
+------------
+
+Serial port input is recorded and replayed automatically. The command lines
+for recording and replaying must specify the same number of ports,
+but their backends may differ.
+E.g., '-serial stdio' in record mode, and '-serial null' in replay mode.
+
 Replay log format
 -----------------
 


* [Qemu-devel] [PATCH v7 22/22] tcg: fix cpu_io_recompile
  2018-02-27  9:51 [Qemu-devel] [ PATCH v7 00/22] replay additions Pavel Dovgalyuk
                   ` (20 preceding siblings ...)
  2018-02-27  9:53 ` [Qemu-devel] [PATCH v7 21/22] replay: update documentation Pavel Dovgalyuk
@ 2018-02-27  9:53 ` Pavel Dovgalyuk
  2018-03-16 11:35   ` Richard Henderson
  2018-03-12 10:32 ` [Qemu-devel] [ PATCH v7 00/22] replay additions Pavel Dovgalyuk
  22 siblings, 1 reply; 29+ messages in thread
From: Pavel Dovgalyuk @ 2018-02-27  9:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, peter.maydell, war2jordan, boost.lists, quintela,
	ciro.santilli, jasowang, mst, zuban32s, maria.klimushenkova,
	dovgaluk, kraxel, pavel.dovgaluk, thomas.dullien, pbonzini,
	alex.bennee

The cpu_io_recompile() function was broken by
commit 9b990ee5a3cc6aa38f81266fb0c6ef37a36c45b9. Instead of regenerating
the block starting from the PC of the original block, it only set the
instruction counter for TCG. In most cases this went unnoticed, but in
icount mode there was an exception for incorrect usage of the CF_LAST_IO
flag. This patch restores recompilation of the original block and also
configures translation to execute the single I/O instruction that
caused the recompilation.

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
---
 accel/tcg/translate-all.c |   18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 67795cd..5ad1b91 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1728,7 +1728,8 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
     CPUArchState *env = cpu->env_ptr;
 #endif
     TranslationBlock *tb;
-    uint32_t n;
+    uint32_t n, flags;
+    target_ulong pc, cs_base;
 
     tb_lock();
     tb = tb_find_pc(retaddr);
@@ -1766,8 +1767,14 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
         cpu_abort(cpu, "TB too big during recompile");
     }
 
-    /* Adjust the execution state of the next TB.  */
-    cpu->cflags_next_tb = curr_cflags() | CF_LAST_IO | n;
+    pc = tb->pc;
+    cs_base = tb->cs_base;
+    flags = tb->flags;
+    tb_phys_invalidate(tb, -1);
+
+    /* Execute one IO instruction without caching
+       instead of creating large TB. */
+    cpu->cflags_next_tb = curr_cflags() | CF_LAST_IO | CF_NOCACHE | 1;
 
     if (tb->cflags & CF_NOCACHE) {
         if (tb->orig_tb) {
@@ -1778,6 +1785,11 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
         tb_remove(tb);
     }
 
+    /* Generate new TB instead of the current one. */
+    /* FIXME: In theory this could raise an exception.  In practice
+       we have already translated the block once so it's probably ok.  */
+    tb_gen_code(cpu, pc, cs_base, flags, curr_cflags() | CF_LAST_IO | n);
+
     /* TODO: If env->pc != tb->pc (i.e. the faulting instruction was not
      * the first in the TB) then we end up generating a whole new TB and
      *  repeating the fault, which is horribly inefficient.


* Re: [Qemu-devel] [ PATCH v7 00/22] replay additions
  2018-02-27  9:51 [Qemu-devel] [ PATCH v7 00/22] replay additions Pavel Dovgalyuk
                   ` (21 preceding siblings ...)
  2018-02-27  9:53 ` [Qemu-devel] [PATCH v7 22/22] tcg: fix cpu_io_recompile Pavel Dovgalyuk
@ 2018-03-12 10:32 ` Pavel Dovgalyuk
  2018-03-12 10:44   ` Ciro Santilli
  22 siblings, 1 reply; 29+ messages in thread
From: Pavel Dovgalyuk @ 2018-03-12 10:32 UTC (permalink / raw)
  To: 'Pavel Dovgalyuk', qemu-devel
  Cc: kwolf, peter.maydell, war2jordan, boost.lists, quintela,
	ciro.santilli, jasowang, mst, zuban32s, maria.klimushenkova,
	kraxel, thomas.dullien, pbonzini, alex.bennee, rth,
	crosthwaite.peter

Ping.

Pavel Dovgalyuk


> -----Original Message-----
> From: Pavel Dovgalyuk [mailto:Pavel.Dovgaluk@ispras.ru]
> Sent: Tuesday, February 27, 2018 12:52 PM
> To: qemu-devel@nongnu.org
> Cc: kwolf@redhat.com; peter.maydell@linaro.org; war2jordan@live.com; boost.lists@gmail.com;
> quintela@redhat.com; ciro.santilli@gmail.com; jasowang@redhat.com; mst@redhat.com;
> zuban32s@gmail.com; maria.klimushenkova@ispras.ru; dovgaluk@ispras.ru; kraxel@redhat.com;
> pavel.dovgaluk@ispras.ru; thomas.dullien@googlemail.com; pbonzini@redhat.com;
> alex.bennee@linaro.org
> Subject: [ PATCH v7 00/22] replay additions
> 
> This set of patches moves replay lock upper in the function call tree.
> Now replay lock functions similar to BQL in older version and allows
> deterministic execution of the threads in icount mode.
> It is also fixes some vmstate creation (and loading) issues
> in record/replay modes:
>  - VM start/stop fixes in replay mode
>  - overlay creation for blkreplay filter
>  - fixes for vmstate save/load in record/replay mode
>  - fixes for host clock vmstate
> 
> There is also a set of helper scripts written by Alex Bennée
> for debugging the record/replay code.
> 
> v6 patches with updates for v7 are available in the repository:
> https://github.com/ispras/qemu/tree/rr-180207
> 
> v7 changes:
>  - updated record/replay documentation
>  - removed abort() from mutex stub functions
>  - fixed cpu_io_recompile function
> 
> v6 changes:
>  - removed BQL optimization at all
>  - refined replay lock patches
>  - removed lock/unlock from replay-audio
> 
> v5 changes:
>  - removed patch for narrowing BQL-protected code
>  - disabled bdrv_(drain/flush)_all for record/replay mode
> 
> v4 changes:
>  - removed upstreamed patches
>  - added patch for saving async queue state in replay
>  - minor fixes
> 
> v3 changes:
>  - removed upstreamed patches
>  - fixed bug with recursive checkpoints
>  - fixed bug with icount warp checkpoint
> 
> v2 changes:
>  - updated lock/unlock logic (as suggested by Paolo Bonzini)
>  - updated cpu execution loop to avoid races in setting/resetting exit request (as suggested
> by Paolo Bonzini)
>  - minor changes
> 
> ---
> 
> Alex Bennée (5):
>       replay/replay.c: bump REPLAY_VERSION again
>       replay/replay-internal.c: track holding of replay_lock
>       replay: make locking visible outside replay code
>       replay: push replay_mutex_lock up the call tree
>       scripts/replay-dump.py: replay log dumper
> 
> Pavel Dovgalyuk (17):
>       cpu-exec: fix exception_index handling
>       block: implement bdrv_snapshot_goto for blkreplay
>       blkreplay: create temporary overlay for underlaying devices
>       replay: disable default snapshot for record/replay
>       replay: fix processing async events
>       replay: fixed replay_enable_events
>       replay: fix save/load vm for non-empty queue
>       replay: added replay log format description
>       replay: save prior value of the host clock
>       replay: don't destroy mutex at exit
>       replay: check return values of fwrite
>       replay: avoid recursive call of checkpoints
>       replay: don't process async events when warping the clock
>       replay: save vmstate of the asynchronous events
>       replay: don't drain/flush bdrv queue while RR is working
>       replay: update documentation
>       tcg: fix cpu_io_recompile
> 
> 
>  accel/tcg/cpu-exec.c      |    5 +
>  accel/tcg/translate-all.c |   18 ++-
>  block/blkreplay.c         |   75 +++++++++++
>  block/io.c                |   22 +++
>  cpus.c                    |   26 +++-
>  docs/replay.txt           |  163 +++++++++++++++++++++---
>  include/qemu/timer.h      |   14 ++
>  include/sysemu/replay.h   |   18 +++
>  migration/savevm.c        |   13 ++
>  replay/replay-audio.c     |   14 +-
>  replay/replay-char.c      |   21 +--
>  replay/replay-events.c    |   75 +++++------
>  replay/replay-internal.c  |   47 ++++++-
>  replay/replay-internal.h  |   16 ++
>  replay/replay-snapshot.c  |   12 ++
>  replay/replay-time.c      |   10 +
>  replay/replay.c           |   62 ++++++---
>  scripts/replay-dump.py    |  308 +++++++++++++++++++++++++++++++++++++++++++++
>  stubs/replay.c            |    9 +
>  util/main-loop.c          |   15 ++
>  util/qemu-timer.c         |   12 ++
>  vl.c                      |   12 +-
>  22 files changed, 831 insertions(+), 136 deletions(-)
>  create mode 100755 scripts/replay-dump.py
> 
> --
> Pavel Dovgalyuk


* Re: [Qemu-devel] [ PATCH v7 00/22] replay additions
  2018-03-12 10:32 ` [Qemu-devel] [ PATCH v7 00/22] replay additions Pavel Dovgalyuk
@ 2018-03-12 10:44   ` Ciro Santilli
  0 siblings, 0 replies; 29+ messages in thread
From: Ciro Santilli @ 2018-03-12 10:44 UTC (permalink / raw)
  To: Pavel Dovgalyuk
  Cc: Pavel Dovgalyuk, QEMU Developers, Kevin Wolf, Peter Maydell,
	war2jordan, Igor R, Juan Quintela, Jason Wang,
	Michael S. Tsirkin, Aleksandr Bezzubikov, maria.klimushenkova,
	Gerd Hoffmann, Thomas Dullien, Paolo Bonzini, Alex Bennée,
	rth, crosthwaite.peter

Just to re-affirm, I have run this patch on x86 and arm, and it worked.


On Mon, Mar 12, 2018 at 10:32 AM, Pavel Dovgalyuk <dovgaluk@ispras.ru> wrote:
> Ping.
>
> Pavel Dovgalyuk


* Re: [Qemu-devel] [PATCH v7 13/22] replay: push replay_mutex_lock up the call tree
  2018-02-27  9:52 ` [Qemu-devel] [PATCH v7 13/22] replay: push replay_mutex_lock up the call tree Pavel Dovgalyuk
@ 2018-03-12 13:02   ` Paolo Bonzini
  0 siblings, 0 replies; 29+ messages in thread
From: Paolo Bonzini @ 2018-03-12 13:02 UTC (permalink / raw)
  To: Pavel Dovgalyuk, qemu-devel
  Cc: kwolf, peter.maydell, war2jordan, boost.lists, quintela,
	ciro.santilli, jasowang, mst, zuban32s, maria.klimushenkova,
	dovgaluk, kraxel, thomas.dullien, alex.bennee

On 27/02/2018 10:52, Pavel Dovgalyuk wrote:
>  
> +void replay_init_locks(void)
> +{
> +    replay_mutex_init();
> +}
> +

This should not be needed as a public function: until replay_mode
is set, replay_lock and replay_unlock do nothing.  I'm squashing this:

diff --git a/include/sysemu/replay.h b/include/sysemu/replay.h
index eeec66b8f4..3ced6bc231 100644
--- a/include/sysemu/replay.h
+++ b/include/sysemu/replay.h
@@ -63,8 +63,6 @@ void replay_mutex_unlock(void);
 
 /* Replay process control functions */
 
-/*! Enables and take replay locks (even if we don't use it) */
-void replay_init_locks(void);
 /*! Enables recording or saving event log with specified parameters */
 void replay_configure(struct QemuOpts *opts);
 /*! Initializes timers used for snapshotting and enables events recording */
diff --git a/replay/replay.c b/replay/replay.c
index a8b57cd077..b3e814a875 100644
--- a/replay/replay.c
+++ b/replay/replay.c
@@ -207,11 +207,6 @@ out:
     return res;
 }
 
-void replay_init_locks(void)
-{
-    replay_mutex_init();
-}
-
 static void replay_enable(const char *fname, int mode)
 {
     const char *fmode = NULL;
@@ -238,8 +233,9 @@ static void replay_enable(const char *fname, int mode)
     }
 
     replay_filename = g_strdup(fname);
-
     replay_mode = mode;
+    replay_mutex_init();
+
     replay_state.data_kind = -1;
     replay_state.instructions_count = 0;
     replay_state.current_step = 0;
diff --git a/vl.c b/vl.c
index 08e81c46a0..5925a4b502 100644
--- a/vl.c
+++ b/vl.c
@@ -3059,7 +3059,6 @@ int main(int argc, char **argv, char **envp)
     qemu_init_cpu_list();
     qemu_init_cpu_loop();
 
-    replay_init_locks();
     qemu_mutex_lock_iothread();
 
     atexit(qemu_run_exit_notifiers);

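The property relied on here, that the lock helpers do nothing until replay
mode is set, can be sketched in isolation. This is a simplified
single-threaded model, not the replay code: the names sketch_mode,
sketch_enable, sketch_lock and sketch_unlock are hypothetical, and a counter
stands in for the real mutex.

```c
#include <stdbool.h>

/* Hypothetical model for illustration; these are not QEMU identifiers. */
enum sketch_mode { SKETCH_MODE_NONE, SKETCH_MODE_RECORD, SKETCH_MODE_PLAY };

enum sketch_mode sketch_replay_mode = SKETCH_MODE_NONE;
bool sketch_lock_ready;   /* set once the lock has been initialised */
int sketch_lock_depth;    /* counter standing in for a held mutex */

/* Enabling record or replay sets the mode and initialises the lock together,
 * so no separate public "init locks" entry point is needed. */
void sketch_enable(enum sketch_mode mode)
{
    sketch_replay_mode = mode;
    sketch_lock_ready = true;
}

void sketch_lock(void)
{
    if (sketch_replay_mode == SKETCH_MODE_NONE) {
        return;           /* replay disabled: locking is a no-op */
    }
    sketch_lock_depth++;
}

void sketch_unlock(void)
{
    if (sketch_replay_mode == SKETCH_MODE_NONE) {
        return;           /* replay disabled: unlocking is a no-op */
    }
    sketch_lock_depth--;
}
```

Because both helpers check the mode first, calling them before
sketch_enable() never touches the uninitialised lock.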

* Re: [Qemu-devel] [PATCH v7 20/22] replay: don't drain/flush bdrv queue while RR is working
  2018-02-27  9:53 ` [Qemu-devel] [PATCH v7 20/22] replay: don't drain/flush bdrv queue while RR is working Pavel Dovgalyuk
@ 2018-03-12 13:06   ` Paolo Bonzini
  0 siblings, 0 replies; 29+ messages in thread
From: Paolo Bonzini @ 2018-03-12 13:06 UTC (permalink / raw)
  To: Pavel Dovgalyuk, qemu-devel
  Cc: kwolf, peter.maydell, war2jordan, boost.lists, quintela,
	ciro.santilli, jasowang, mst, zuban32s, maria.klimushenkova,
	dovgaluk, kraxel, thomas.dullien, alex.bennee

On 27/02/2018 10:53, Pavel Dovgalyuk wrote:
> In record/replay mode bdrv queue is controlled by replay mechanism.
> It does not allow saving or loading the snapshots
> when bdrv queue is not empty. Stopping the VM is not blocked by nonempty
> queue, but flushing the queue is still impossible there,
> because it may cause deadlocks in replay mode.
> This patch disables bdrv_drain_all and bdrv_flush_all in
> record/replay mode.
> 
> Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>

A lot of monitor commands that use bdrv_drain_all are going to remain
broken, so I'm not sure about this patch.  (There have been many threads
about bdrv_drain and bdrv_drain_all being a bad API...).  I'm queuing
everything except 2-3-4-20.

Thanks Pavel and Ciro.

Paolo


* Re: [Qemu-devel] [PATCH v7 22/22] tcg: fix cpu_io_recompile
  2018-02-27  9:53 ` [Qemu-devel] [PATCH v7 22/22] tcg: fix cpu_io_recompile Pavel Dovgalyuk
@ 2018-03-16 11:35   ` Richard Henderson
  2018-03-16 11:42     ` Pavel Dovgalyuk
  0 siblings, 1 reply; 29+ messages in thread
From: Richard Henderson @ 2018-03-16 11:35 UTC (permalink / raw)
  To: Pavel Dovgalyuk
  Cc: qemu-devel@nongnu.org Developers, kwolf, Peter Maydell,
	war2jordan, Juan Quintela, ciro.santilli, Jason Wang,
	Michael S. Tsirkin, zuban32s, maria.klimushenkova, dovgaluk,
	Gerd Hoffmann, boost.lists, thomas.dullien, Paolo Bonzini,
	Alex Bennée

On 27 February 2018 at 17:53, Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru> wrote:
>
> cpu_io_recompile() function was broken by
> the commit 9b990ee5a3cc6aa38f81266fb0c6ef37a36c45b9. Instead of regenerating
> the block starting from PC of the original block, it just set the instruction
> counter for TCG. In most cases this was unnoticed, but in icount mode
> there was an exception for incorrect usage of CF_LAST_IO flag.
> This patch recovers recompilation of the original block and also
> configures translation for executing single IO instruction which
> caused a recompilation.
>
> Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
> ---
>  accel/tcg/translate-all.c |   18 +++++++++++++++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index 67795cd..5ad1b91 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -1728,7 +1728,8 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
>      CPUArchState *env = cpu->env_ptr;
>  #endif
>      TranslationBlock *tb;
> -    uint32_t n;
> +    uint32_t n, flags;
> +    target_ulong pc, cs_base;
>
>      tb_lock();
>      tb = tb_find_pc(retaddr);
> @@ -1766,8 +1767,14 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
>          cpu_abort(cpu, "TB too big during recompile");
>      }
>
> -    /* Adjust the execution state of the next TB.  */
> -    cpu->cflags_next_tb = curr_cflags() | CF_LAST_IO | n;
> +    pc = tb->pc;
> +    cs_base = tb->cs_base;
> +    flags = tb->flags;
> +    tb_phys_invalidate(tb, -1);
> +
> +    /* Execute one IO instruction without caching
> +       instead of creating large TB. */
> +    cpu->cflags_next_tb = curr_cflags() | CF_LAST_IO | CF_NOCACHE | 1;
>
>      if (tb->cflags & CF_NOCACHE) {
>          if (tb->orig_tb) {
> @@ -1778,6 +1785,11 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
>          tb_remove(tb);
>      }
>
> +    /* Generate new TB instead of the current one. */
> +    /* FIXME: In theory this could raise an exception.  In practice
> +       we have already translated the block once so it's probably ok.  */
> +    tb_gen_code(cpu, pc, cs_base, flags, curr_cflags() | CF_LAST_IO | n);

This isn't right.  The whole point of the patch that you reference as
having broken things is that calls to tb_gen_code which ignore their
return value are by definition relying on the side effect of altering
the TB cache, and are therefore by definition racy.

That is exactly the point of cpu->cflags_next_tb, that when we next
arrive in cpu_exec we look up (or generate) the next TB with the given
flags.  At which point we will *not* be relying on the TB cache, and
we'll execute the generated TB right away.
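
The one-shot behaviour of cpu->cflags_next_tb described above can be
illustrated with a small stand-alone model. This is only a sketch under
stated assumptions: ToyCPU, the toy_* functions and the TOY_CF_* constants
are hypothetical names invented for the illustration and are not QEMU
definitions. The model assumes that an all-ones value marks "no request
pending" and that the execution loop clears the request as soon as it has
read it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; these are not QEMU definitions. */
#define TOY_CF_COUNT_MASK 0x00007fffu   /* low bits: instruction count */
#define TOY_CF_LAST_IO    0x00008000u   /* last instruction may do I/O */
#define TOY_CFLAGS_UNSET  UINT32_MAX    /* assumed "no request" marker */

typedef struct ToyCPU {
    uint32_t default_cflags;   /* plays the role of curr_cflags() */
    uint32_t cflags_next_tb;   /* one-shot request for the next block */
} ToyCPU;

void toy_cpu_init(ToyCPU *cpu, uint32_t default_cflags)
{
    cpu->default_cflags = default_cflags;
    cpu->cflags_next_tb = TOY_CFLAGS_UNSET;
}

/* The I/O path only records what the next block should look like.
   It does not generate a block and does not touch any cache. */
void toy_request_recompile(ToyCPU *cpu, uint32_t n)
{
    assert(n <= TOY_CF_COUNT_MASK);
    cpu->cflags_next_tb = cpu->default_cflags | TOY_CF_LAST_IO | n;
}

/* The execution loop consumes the request exactly once and then
   falls back to the default flags for all later blocks. */
uint32_t toy_pick_cflags(ToyCPU *cpu)
{
    uint32_t cflags = cpu->cflags_next_tb;

    if (cflags == TOY_CFLAGS_UNSET) {
        return cpu->default_cflags;
    }
    cpu->cflags_next_tb = TOY_CFLAGS_UNSET;
    return cflags;
}
```

In this model, a call to toy_request_recompile() followed by two calls to
toy_pick_cflags() yields the requested flags once and the default flags
afterwards, so the requester never depends on what another thread did to
a shared cache in between.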

I do not have enough context within this patch to determine what the
proper solution is.


r~


* Re: [Qemu-devel] [PATCH v7 22/22] tcg: fix cpu_io_recompile
  2018-03-16 11:35   ` Richard Henderson
@ 2018-03-16 11:42     ` Pavel Dovgalyuk
  0 siblings, 0 replies; 29+ messages in thread
From: Pavel Dovgalyuk @ 2018-03-16 11:42 UTC (permalink / raw)
  To: 'Richard Henderson', 'Pavel Dovgalyuk'
  Cc: qemu-devel, kwolf, 'Peter Maydell',
	war2jordan, 'Juan Quintela',
	ciro.santilli, 'Jason Wang', 'Michael S. Tsirkin',
	zuban32s, maria.klimushenkova, 'Gerd Hoffmann',
	boost.lists, thomas.dullien, 'Paolo Bonzini',
	'Alex Bennée'

> From: Richard Henderson [mailto:richard.henderson@linaro.org]
> On 27 February 2018 at 17:53, Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru> wrote:
> >
> > cpu_io_recompile() function was broken by
> > the commit 9b990ee5a3cc6aa38f81266fb0c6ef37a36c45b9. Instead of regenerating
> > the block starting from PC of the original block, it just set the instruction
> > counter for TCG. In most cases this was unnoticed, but in icount mode
> > there was an exception for incorrect usage of CF_LAST_IO flag.
> > This patch recovers recompilation of the original block and also
> > configures translation for executing single IO instruction which
> > caused a recompilation.
> >
> > Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
> > ---
> >  accel/tcg/translate-all.c |   18 +++++++++++++++---
> >  1 file changed, 15 insertions(+), 3 deletions(-)
> >
> > diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> > index 67795cd..5ad1b91 100644
> > --- a/accel/tcg/translate-all.c
> > +++ b/accel/tcg/translate-all.c
> > @@ -1728,7 +1728,8 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
> >      CPUArchState *env = cpu->env_ptr;
> >  #endif
> >      TranslationBlock *tb;
> > -    uint32_t n;
> > +    uint32_t n, flags;
> > +    target_ulong pc, cs_base;
> >
> >      tb_lock();
> >      tb = tb_find_pc(retaddr);
> > @@ -1766,8 +1767,14 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
> >          cpu_abort(cpu, "TB too big during recompile");
> >      }
> >
> > -    /* Adjust the execution state of the next TB.  */
> > -    cpu->cflags_next_tb = curr_cflags() | CF_LAST_IO | n;
> > +    pc = tb->pc;
> > +    cs_base = tb->cs_base;
> > +    flags = tb->flags;
> > +    tb_phys_invalidate(tb, -1);
> > +
> > +    /* Execute one IO instruction without caching
> > +       instead of creating large TB. */
> > +    cpu->cflags_next_tb = curr_cflags() | CF_LAST_IO | CF_NOCACHE | 1;
> >
> >      if (tb->cflags & CF_NOCACHE) {
> >          if (tb->orig_tb) {
> > @@ -1778,6 +1785,11 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
> >          tb_remove(tb);
> >      }
> >
> > +    /* Generate new TB instead of the current one. */
> > +    /* FIXME: In theory this could raise an exception.  In practice
> > +       we have already translated the block once so it's probably ok.  */
> > +    tb_gen_code(cpu, pc, cs_base, flags, curr_cflags() | CF_LAST_IO | n);
> 
> This isn't right.  The whole point of the patch that you reference as
> having broken things is that calls to tb_gen_code which ignore their
> return value are by definition relying on the side effect of altering
> the TB cache, and are therefore by definition racy.

I see.

> That is exactly the point of cpu->cflags_next_tb, that when we next
> arrive in cpu_exec we look up (or generate) the next TB with the given
> flags.  At which point we will *not* be relying on the TB cache, and
> we'll execute the generated TB right away.

Well, as an inefficient workaround, we can just omit the tb_gen_code call,
but still set
+    cpu->cflags_next_tb = curr_cflags() | CF_LAST_IO | CF_NOCACHE | 1;

Then the recompilation will occur every time the I/O instruction is
reached, because the translation for the original address is not limited
by an instruction count.
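
The cost argument above can be sketched with a toy counter. This is an
illustrative model only, not QEMU code: toy_count_recompiles and its
parameters are hypothetical, and the model deliberately ignores the
concurrency objection raised earlier in the thread. It assumes that a
block is re-executed several times and that an I/O instruction found in
the middle of the block forces one recompilation.

```c
#include <stdbool.h>

/* Toy model, not QEMU code: one guest block of `len` instructions is
   executed `passes` times, and instruction `io_idx` performs I/O.
   The model requires an I/O instruction to be the last one in the
   block that executes it; otherwise a "recompile" is counted. */
unsigned toy_count_recompiles(unsigned len, unsigned io_idx,
                              unsigned passes, bool keep_truncated_block)
{
    unsigned cached_len = len;   /* length of the block kept for this PC */
    unsigned recompiles = 0;

    for (unsigned p = 0; p < passes; p++) {
        if (io_idx + 1 < cached_len) {
            /* I/O found in the middle of the block. */
            recompiles++;
            if (keep_truncated_block) {
                /* Later passes reuse a block that ends at the I/O. */
                cached_len = io_idx + 1;
            }
            /* Otherwise only a one-off, uncached single-instruction
               block runs, and the full-length block is used again. */
        }
    }
    return recompiles;
}
```

With keep_truncated_block set to false, which corresponds to the
single-instruction, uncached variant in this model, the count grows with
the number of passes. With it set to true, the count stays at one.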

> I do not have enough context within this patch to determine what the
> proper solution is.

The context is here:

https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04818.html


Pavel Dovgalyuk

Thread overview: 29+ messages
2018-02-27  9:51 [Qemu-devel] [ PATCH v7 00/22] replay additions Pavel Dovgalyuk
2018-02-27  9:51 ` [Qemu-devel] [PATCH v7 01/22] cpu-exec: fix exception_index handling Pavel Dovgalyuk
2018-02-27  9:51 ` [Qemu-devel] [PATCH v7 02/22] block: implement bdrv_snapshot_goto for blkreplay Pavel Dovgalyuk
2018-02-27  9:51 ` [Qemu-devel] [PATCH v7 03/22] blkreplay: create temporary overlay for underlaying devices Pavel Dovgalyuk
2018-02-27  9:51 ` [Qemu-devel] [PATCH v7 04/22] replay: disable default snapshot for record/replay Pavel Dovgalyuk
2018-02-27  9:52 ` [Qemu-devel] [PATCH v7 05/22] replay: fix processing async events Pavel Dovgalyuk
2018-02-27  9:52 ` [Qemu-devel] [PATCH v7 06/22] replay: fixed replay_enable_events Pavel Dovgalyuk
2018-02-27  9:52 ` [Qemu-devel] [PATCH v7 07/22] replay: fix save/load vm for non-empty queue Pavel Dovgalyuk
2018-02-27  9:52 ` [Qemu-devel] [PATCH v7 08/22] replay: added replay log format description Pavel Dovgalyuk
2018-02-27  9:52 ` [Qemu-devel] [PATCH v7 09/22] replay: save prior value of the host clock Pavel Dovgalyuk
2018-02-27  9:52 ` [Qemu-devel] [PATCH v7 10/22] replay/replay.c: bump REPLAY_VERSION again Pavel Dovgalyuk
2018-02-27  9:52 ` [Qemu-devel] [PATCH v7 11/22] replay/replay-internal.c: track holding of replay_lock Pavel Dovgalyuk
2018-02-27  9:52 ` [Qemu-devel] [PATCH v7 12/22] replay: make locking visible outside replay code Pavel Dovgalyuk
2018-02-27  9:52 ` [Qemu-devel] [PATCH v7 13/22] replay: push replay_mutex_lock up the call tree Pavel Dovgalyuk
2018-03-12 13:02   ` Paolo Bonzini
2018-02-27  9:52 ` [Qemu-devel] [PATCH v7 14/22] replay: don't destroy mutex at exit Pavel Dovgalyuk
2018-02-27  9:52 ` [Qemu-devel] [PATCH v7 15/22] replay: check return values of fwrite Pavel Dovgalyuk
2018-02-27  9:53 ` [Qemu-devel] [PATCH v7 16/22] replay: avoid recursive call of checkpoints Pavel Dovgalyuk
2018-02-27  9:53 ` [Qemu-devel] [PATCH v7 17/22] scripts/replay-dump.py: replay log dumper Pavel Dovgalyuk
2018-02-27  9:53 ` [Qemu-devel] [PATCH v7 18/22] replay: don't process async events when warping the clock Pavel Dovgalyuk
2018-02-27  9:53 ` [Qemu-devel] [PATCH v7 19/22] replay: save vmstate of the asynchronous events Pavel Dovgalyuk
2018-02-27  9:53 ` [Qemu-devel] [PATCH v7 20/22] replay: don't drain/flush bdrv queue while RR is working Pavel Dovgalyuk
2018-03-12 13:06   ` Paolo Bonzini
2018-02-27  9:53 ` [Qemu-devel] [PATCH v7 21/22] replay: update documentation Pavel Dovgalyuk
2018-02-27  9:53 ` [Qemu-devel] [PATCH v7 22/22] tcg: fix cpu_io_recompile Pavel Dovgalyuk
2018-03-16 11:35   ` Richard Henderson
2018-03-16 11:42     ` Pavel Dovgalyuk
2018-03-12 10:32 ` [Qemu-devel] [ PATCH v7 00/22] replay additions Pavel Dovgalyuk
2018-03-12 10:44   ` Ciro Santilli
