All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH for-8.2 0/4] Assorted replay patches
@ 2023-08-14 16:31 Nicholas Piggin
  2023-08-14 16:31 ` [PATCH 1/4] scripts/replay_dump.sh: Update to current rr record format Nicholas Piggin
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Nicholas Piggin @ 2023-08-14 16:31 UTC (permalink / raw)
  To: Pavel Dovgalyuk
  Cc: Nicholas Piggin, Paolo Bonzini, John Snow, Cleber Rosa,
	Philippe Mathieu-Daudé,
	Wainer dos Santos Moschetta, Beraldo Leal, qemu-devel

Hi,

These are a few small things I have found helpful while trying to
implement and test rr changes. Patch 2 depends on 1, but otherwise
the patches are independent.

Thanks,
Nick

Nicholas Piggin (4):
  scripts/replay_dump.sh: Update to current rr record format
  tests/avocado: replay_linux.py add replay-dump.py test
  replay: allow runstate shutdown->running when replaying trace
  replay: simple auto-snapshot mode for record

 docs/system/replay.rst        |   5 ++
 include/sysemu/replay.h       |  11 ++++
 include/sysemu/runstate.h     |   1 +
 qemu-options.hx               |   9 ++-
 replay/replay-snapshot.c      |  57 +++++++++++++++++
 replay/replay.c               |  27 ++++++++
 scripts/replay-dump.py        | 113 +++++++++++++++++++++++++++++++---
 softmmu/runstate.c            |  19 ++++++
 softmmu/vl.c                  |   9 +++
 tests/avocado/replay_linux.py |  16 ++++-
 10 files changed, 256 insertions(+), 11 deletions(-)

-- 
2.40.1



^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/4] scripts/replay_dump.sh: Update to current rr record format
  2023-08-14 16:31 [PATCH for-8.2 0/4] Assorted replay patches Nicholas Piggin
@ 2023-08-14 16:31 ` Nicholas Piggin
  2023-08-18  4:19   ` Pavel Dovgalyuk
  2023-08-14 16:31 ` [PATCH 2/4] tests/avocado: replay_linux.py add replay-dump.py test Nicholas Piggin
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Nicholas Piggin @ 2023-08-14 16:31 UTC (permalink / raw)
  To: Pavel Dovgalyuk
  Cc: Nicholas Piggin, Paolo Bonzini, John Snow, Cleber Rosa,
	Philippe Mathieu-Daudé,
	Wainer dos Santos Moschetta, Beraldo Leal, qemu-devel

This thing seems to have fallen by the wayside. This gets it working with
the current format, although does not quite implement all events.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
My python skills are not good. Any help on this or patch 2 is
appreciated.

Thanks,
Nick

 scripts/replay-dump.py | 107 ++++++++++++++++++++++++++++++++++++++---
 1 file changed, 101 insertions(+), 6 deletions(-)

diff --git a/scripts/replay-dump.py b/scripts/replay-dump.py
index 3ba97a6d30..937ae19ff1 100755
--- a/scripts/replay-dump.py
+++ b/scripts/replay-dump.py
@@ -20,6 +20,7 @@
 
 import argparse
 import struct
+import os
 from collections import namedtuple
 
 # This mirrors some of the global replay state which some of the
@@ -62,6 +63,10 @@ def read_byte(fin):
     "Read a single byte"
     return struct.unpack('>B', fin.read(1))[0]
 
+def read_bytes(fin, nr):
+    "Read a nr bytes"
+    return fin.read(nr)
+
 def read_event(fin):
     "Read a single byte event, but save some state"
     if replay_state.already_read:
@@ -122,12 +127,18 @@ def swallow_async_qword(eid, name, dumpfile):
     print("  %s(%d) @ %d" % (name, eid, step_id))
     return True
 
+def swallow_bytes(eid, name, dumpfile, nr):
+    "Swallow nr bytes of data without looking at it"
+    dumpfile.seek(nr, os.SEEK_CUR)
+    return True
+
 async_decode_table = [ Decoder(0, "REPLAY_ASYNC_EVENT_BH", swallow_async_qword),
-                       Decoder(1, "REPLAY_ASYNC_INPUT", decode_unimp),
-                       Decoder(2, "REPLAY_ASYNC_INPUT_SYNC", decode_unimp),
-                       Decoder(3, "REPLAY_ASYNC_CHAR_READ", decode_unimp),
-                       Decoder(4, "REPLAY_ASYNC_EVENT_BLOCK", decode_unimp),
-                       Decoder(5, "REPLAY_ASYNC_EVENT_NET", decode_unimp),
+                       Decoder(1, "REPLAY_ASYNC_BH_ONESHOT", decode_unimp),
+                       Decoder(2, "REPLAY_ASYNC_INPUT", decode_unimp),
+                       Decoder(3, "REPLAY_ASYNC_INPUT_SYNC", decode_unimp),
+                       Decoder(4, "REPLAY_ASYNC_CHAR_READ", decode_unimp),
+                       Decoder(5, "REPLAY_ASYNC_EVENT_BLOCK", decode_unimp),
+                       Decoder(6, "REPLAY_ASYNC_EVENT_NET", decode_unimp),
 ]
 # See replay_read_events/replay_read_event
 def decode_async(eid, name, dumpfile):
@@ -156,6 +167,13 @@ def decode_audio_out(eid, name, dumpfile):
     print_event(eid, name, "%d" % (audio_data))
     return True
 
+def decode_random(eid, name, dumpfile):
+    ret = read_dword(dumpfile)
+    size = read_dword(dumpfile)
+    swallow_bytes(eid, name, dumpfile, size)
+    print_event(eid, name, "%d %d" % (ret, size))
+    return True
+
 def decode_checkpoint(eid, name, dumpfile):
     """Decode a checkpoint.
 
@@ -184,6 +202,38 @@ def decode_interrupt(eid, name, dumpfile):
     print_event(eid, name)
     return True
 
+def decode_exception(eid, name, dumpfile):
+    print_event(eid, name)
+    return True
+
+def decode_shutdown(eid, name, dumpfile):
+    print_event(eid, name)
+    return True
+
+def decode_end(eid, name, dumpfile):
+    print_event(eid, name)
+    return False
+
+def decode_char_write(eid, name, dumpfile):
+    res = read_dword(dumpfile)
+    offset = read_dword(dumpfile)
+    print_event(eid, name)
+    return True
+
+def decode_async_char_read(eid, name, dumpfile):
+    char_id = read_byte(dumpfile)
+    size = read_dword(dumpfile)
+    print_event(eid, name, "device:%x chars:%s" % (char_id, read_bytes(dumpfile, size)))
+    return True
+
+def decode_async_net(eid, name, dumpfile):
+    net_id = read_byte(dumpfile)
+    flags = read_dword(dumpfile)
+    size = read_dword(dumpfile)
+    swallow_bytes(eid, name, dumpfile, size)
+    print_event(eid, name, "net:%x flags:%x bytes:%d" % (net_id, flags, size))
+    return True
+
 def decode_clock(eid, name, dumpfile):
     clock_data = read_qword(dumpfile)
     print_event(eid, name, "0x%x" % (clock_data))
@@ -268,6 +318,48 @@ def decode_clock(eid, name, dumpfile):
                   Decoder(28, "EVENT_CP_RESET", decode_checkpoint),
 ]
 
+v12_event_table = [Decoder(0, "EVENT_INSTRUCTION", decode_instruction),
+                  Decoder(1, "EVENT_INTERRUPT", decode_interrupt),
+                  Decoder(2, "EVENT_EXCEPTION", decode_exception),
+                  Decoder(3, "EVENT_ASYNC_BH", swallow_async_qword),
+                  Decoder(4, "EVENT_ASYNC_BH_ONESHOT", swallow_async_qword),
+                  Decoder(5, "EVENT_ASYNC_INPUT", decode_unimp),
+                  Decoder(6, "EVENT_ASYNC_INPUT_SYNC", decode_unimp),
+                  Decoder(7, "EVENT_ASYNC_CHAR_READ", decode_async_char_read),
+                  Decoder(8, "EVENT_ASYNC_BLOCK", swallow_async_qword),
+                  Decoder(9, "EVENT_ASYNC_NET", decode_async_net),
+                  Decoder(10, "EVENT_SHUTDOWN", decode_unimp),
+                  Decoder(11, "EVENT_SHUTDOWN_HOST_ERR", decode_shutdown),
+                  Decoder(12, "EVENT_SHUTDOWN_HOST_QMP_QUIT", decode_shutdown),
+                  Decoder(13, "EVENT_SHUTDOWN_HOST_QMP_RESET", decode_shutdown),
+                  Decoder(14, "EVENT_SHUTDOWN_HOST_SIGNAL", decode_shutdown),
+                  Decoder(15, "EVENT_SHUTDOWN_HOST_UI", decode_shutdown),
+                  Decoder(16, "EVENT_SHUTDOWN_GUEST_SHUTDOWN", decode_shutdown),
+                  Decoder(17, "EVENT_SHUTDOWN_GUEST_RESET", decode_shutdown),
+                  Decoder(18, "EVENT_SHUTDOWN_GUEST_PANIC", decode_shutdown),
+                  Decoder(19, "EVENT_SHUTDOWN_SUBSYS_RESET", decode_shutdown),
+                  Decoder(20, "EVENT_SHUTDOWN_SNAPSHOT_LOAD", decode_shutdown),
+                  Decoder(21, "EVENT_SHUTDOWN___MAX", decode_shutdown),
+                  Decoder(22, "EVENT_CHAR_WRITE", decode_char_write),
+                  Decoder(23, "EVENT_CHAR_READ_ALL", decode_unimp),
+                  Decoder(24, "EVENT_CHAR_READ_ALL_ERROR", decode_unimp),
+                  Decoder(25, "EVENT_AUDIO_OUT", decode_audio_out),
+                  Decoder(26, "EVENT_AUDIO_IN", decode_unimp),
+                  Decoder(27, "EVENT_RANDOM", decode_random),
+                  Decoder(28, "EVENT_CLOCK_HOST", decode_clock),
+                  Decoder(29, "EVENT_CLOCK_VIRTUAL_RT", decode_clock),
+                  Decoder(30, "EVENT_CP_CLOCK_WARP_START", decode_checkpoint),
+                  Decoder(31, "EVENT_CP_CLOCK_WARP_ACCOUNT", decode_checkpoint),
+                  Decoder(32, "EVENT_CP_RESET_REQUESTED", decode_checkpoint),
+                  Decoder(33, "EVENT_CP_SUSPEND_REQUESTED", decode_checkpoint),
+                  Decoder(34, "EVENT_CP_CLOCK_VIRTUAL", decode_checkpoint),
+                  Decoder(35, "EVENT_CP_CLOCK_HOST", decode_checkpoint),
+                  Decoder(36, "EVENT_CP_CLOCK_VIRTUAL_RT", decode_checkpoint),
+                  Decoder(37, "EVENT_CP_INIT", decode_checkpoint_init),
+                  Decoder(38, "EVENT_CP_RESET", decode_checkpoint),
+                  Decoder(39, "EVENT_END", decode_end),
+]
+
 def parse_arguments():
     "Grab arguments for script"
     parser = argparse.ArgumentParser()
@@ -285,7 +377,10 @@ def decode_file(filename):
 
     print("HEADER: version 0x%x" % (version))
 
-    if version == 0xe02007:
+    if version == 0xe0200c:
+        event_decode_table = v12_event_table
+        replay_state.checkpoint_start = 12
+    elif version == 0xe02007:
         event_decode_table = v7_event_table
         replay_state.checkpoint_start = 12
     elif version == 0xe02006:
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/4] tests/avocado: replay_linux.py add replay-dump.py test
  2023-08-14 16:31 [PATCH for-8.2 0/4] Assorted replay patches Nicholas Piggin
  2023-08-14 16:31 ` [PATCH 1/4] scripts/replay_dump.sh: Update to current rr record format Nicholas Piggin
@ 2023-08-14 16:31 ` Nicholas Piggin
  2023-08-18  4:23   ` Pavel Dovgalyuk
  2023-08-14 16:31 ` [PATCH 3/4] replay: allow runstate shutdown->running when replaying trace Nicholas Piggin
  2023-08-14 16:31 ` [PATCH 4/4] replay: simple auto-snapshot mode for record Nicholas Piggin
  3 siblings, 1 reply; 13+ messages in thread
From: Nicholas Piggin @ 2023-08-14 16:31 UTC (permalink / raw)
  To: Pavel Dovgalyuk
  Cc: Nicholas Piggin, Paolo Bonzini, John Snow, Cleber Rosa,
	Philippe Mathieu-Daudé,
	Wainer dos Santos Moschetta, Beraldo Leal, qemu-devel

This runs replay-dump.py after recording a trace, and fails the test if
the script fails.

replay-dump.py is modified to exit with non-zero if an error is
encountered while parsing.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
It's possible this could introduce failures to existing test if an
unimplemented event gets recorded. I would make a new test for this but
it takes quite a while to record such a long trace that includes some
block and net events to excercise the script.

Thanks,
Nick

 scripts/replay-dump.py        |  6 ++++--
 tests/avocado/replay_linux.py | 16 +++++++++++++++-
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/scripts/replay-dump.py b/scripts/replay-dump.py
index 937ae19ff1..8f4715632a 100755
--- a/scripts/replay-dump.py
+++ b/scripts/replay-dump.py
@@ -21,6 +21,7 @@
 import argparse
 import struct
 import os
+import sys
 from collections import namedtuple
 
 # This mirrors some of the global replay state which some of the
@@ -97,7 +98,7 @@ def call_decode(table, index, dumpfile):
         print("Could not decode index: %d" % (index))
         print("Entry is: %s" % (decoder))
         print("Decode Table is:\n%s" % (table))
-        return False
+        sys.exit(1)
     else:
         return decoder.fn(decoder.eid, decoder.name, dumpfile)
 
@@ -118,7 +119,7 @@ def print_event(eid, name, string=None, event_count=None):
 def decode_unimp(eid, name, _unused_dumpfile):
     "Unimplimented decoder, will trigger exit"
     print("%s not handled - will now stop" % (name))
-    return False
+    sys.exit(1)
 
 # Checkpoint decoder
 def swallow_async_qword(eid, name, dumpfile):
@@ -401,3 +402,4 @@ def decode_file(filename):
 if __name__ == "__main__":
     args = parse_arguments()
     decode_file(args.file)
+    sys.exit(0)
diff --git a/tests/avocado/replay_linux.py b/tests/avocado/replay_linux.py
index a76dd507fc..12937ce0ec 100644
--- a/tests/avocado/replay_linux.py
+++ b/tests/avocado/replay_linux.py
@@ -11,6 +11,7 @@
 import os
 import logging
 import time
+import subprocess
 
 from avocado import skipUnless
 from avocado_qemu import BUILD_DIR
@@ -21,6 +22,11 @@
 from avocado.utils.path import find_command
 from avocado_qemu import LinuxTest
 
+from pathlib import Path
+
+self_dir = Path(__file__).parent
+src_dir = self_dir.parent.parent
+
 class ReplayLinux(LinuxTest):
     """
     Boots a Linux system, checking for a successful initialization
@@ -94,7 +100,7 @@ def launch_and_wait(self, record, args, shift):
         else:
             vm.event_wait('SHUTDOWN', self.timeout)
             vm.shutdown(True)
-            logger.info('successfully fihished the replay')
+            logger.info('successfully finished the replay')
         elapsed = time.time() - start_time
         logger.info('elapsed time %.2f sec' % elapsed)
         return elapsed
@@ -105,6 +111,14 @@ def run_rr(self, args=None, shift=7):
         logger = logging.getLogger('replay')
         logger.info('replay overhead {:.2%}'.format(t2 / t1 - 1))
 
+        try:
+            replay_path = os.path.join(self.workdir, 'replay.bin')
+            subprocess.check_call(["./scripts/replay-dump.py",
+                                   "-f", replay_path],
+                                  cwd=src_dir, stdout=subprocess.DEVNULL)
+        except subprocess.CalledProcessError:
+            self.fail('replay-dump.py failed')
+
 @skipUnless(os.getenv('AVOCADO_TIMEOUT_EXPECTED'), 'Test might timeout')
 class ReplayLinuxX8664(ReplayLinux):
     """
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/4] replay: allow runstate shutdown->running when replaying trace
  2023-08-14 16:31 [PATCH for-8.2 0/4] Assorted replay patches Nicholas Piggin
  2023-08-14 16:31 ` [PATCH 1/4] scripts/replay_dump.sh: Update to current rr record format Nicholas Piggin
  2023-08-14 16:31 ` [PATCH 2/4] tests/avocado: replay_linux.py add replay-dump.py test Nicholas Piggin
@ 2023-08-14 16:31 ` Nicholas Piggin
  2023-08-18  4:25   ` Pavel Dovgalyuk
  2023-08-14 16:31 ` [PATCH 4/4] replay: simple auto-snapshot mode for record Nicholas Piggin
  3 siblings, 1 reply; 13+ messages in thread
From: Nicholas Piggin @ 2023-08-14 16:31 UTC (permalink / raw)
  To: Pavel Dovgalyuk
  Cc: Nicholas Piggin, Paolo Bonzini, John Snow, Cleber Rosa,
	Philippe Mathieu-Daudé,
	Wainer dos Santos Moschetta, Beraldo Leal, qemu-devel

When replaying a trace, it is possible to go from shutdown to
running with a reverse-debugging step. This can be useful if the
problem being debugged triggers a reset or shutdown.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 include/sysemu/runstate.h |  1 +
 replay/replay.c           |  2 ++
 softmmu/runstate.c        | 19 +++++++++++++++++++
 3 files changed, 22 insertions(+)

diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
index 7beb29c2e2..85a1167ccb 100644
--- a/include/sysemu/runstate.h
+++ b/include/sysemu/runstate.h
@@ -9,6 +9,7 @@ void runstate_set(RunState new_state);
 RunState runstate_get(void);
 bool runstate_is_running(void);
 bool runstate_needs_reset(void);
+void runstate_replay_enable(void);
 
 typedef void VMChangeStateHandler(void *opaque, bool running, RunState state);
 
diff --git a/replay/replay.c b/replay/replay.c
index 0f7d766efe..e64f71209a 100644
--- a/replay/replay.c
+++ b/replay/replay.c
@@ -272,6 +272,8 @@ static void replay_enable(const char *fname, int mode)
         /* go to the beginning */
         fseek(replay_file, HEADER_SIZE, SEEK_SET);
         replay_fetch_data_kind();
+
+        runstate_replay_enable();
     }
 
     replay_init_events();
diff --git a/softmmu/runstate.c b/softmmu/runstate.c
index f3bd862818..9fd3e57485 100644
--- a/softmmu/runstate.c
+++ b/softmmu/runstate.c
@@ -174,6 +174,12 @@ static const RunStateTransition runstate_transitions_def[] = {
     { RUN_STATE__MAX, RUN_STATE__MAX },
 };
 
+static const RunStateTransition replay_runstate_transitions_def[] = {
+    { RUN_STATE_SHUTDOWN, RUN_STATE_RUNNING},
+
+    { RUN_STATE__MAX, RUN_STATE__MAX },
+};
+
 static bool runstate_valid_transitions[RUN_STATE__MAX][RUN_STATE__MAX];
 
 bool runstate_check(RunState state)
@@ -181,6 +187,19 @@ bool runstate_check(RunState state)
     return current_run_state == state;
 }
 
+void runstate_replay_enable(void)
+{
+    const RunStateTransition *p;
+
+    assert(replay_mode == REPLAY_MODE_PLAY);
+
+    for (p = &replay_runstate_transitions_def[0]; p->from != RUN_STATE__MAX;
+         p++) {
+        runstate_valid_transitions[p->from][p->to] = true;
+    }
+
+}
+
 static void runstate_init(void)
 {
     const RunStateTransition *p;
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/4] replay: simple auto-snapshot mode for record
  2023-08-14 16:31 [PATCH for-8.2 0/4] Assorted replay patches Nicholas Piggin
                   ` (2 preceding siblings ...)
  2023-08-14 16:31 ` [PATCH 3/4] replay: allow runstate shutdown->running when replaying trace Nicholas Piggin
@ 2023-08-14 16:31 ` Nicholas Piggin
  2023-08-18  4:36   ` Pavel Dovgalyuk
  3 siblings, 1 reply; 13+ messages in thread
From: Nicholas Piggin @ 2023-08-14 16:31 UTC (permalink / raw)
  To: Pavel Dovgalyuk
  Cc: Nicholas Piggin, Paolo Bonzini, John Snow, Cleber Rosa,
	Philippe Mathieu-Daudé,
	Wainer dos Santos Moschetta, Beraldo Leal, qemu-devel

record makes an initial snapshot when the machine is created, to enable
reverse-debugging. Often the issue being debugged appears near the end of
the trace, so it is important for performance to keep snapshots close to
the end.

This implements a periodic snapshot mode that keeps a rolling set of
recent snapshots.

Arguably this should be done by the debugger or a program that talks to
QMP, but for setting up simple scenarios and tests, it is convenient to
have this feature.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 docs/system/replay.rst   |  5 ++++
 include/sysemu/replay.h  | 11 ++++++++
 qemu-options.hx          |  9 +++++--
 replay/replay-snapshot.c | 57 ++++++++++++++++++++++++++++++++++++++++
 replay/replay.c          | 25 ++++++++++++++++++
 softmmu/vl.c             |  9 +++++++
 6 files changed, 114 insertions(+), 2 deletions(-)

diff --git a/docs/system/replay.rst b/docs/system/replay.rst
index 3105327423..bef9ea4171 100644
--- a/docs/system/replay.rst
+++ b/docs/system/replay.rst
@@ -156,6 +156,11 @@ for storing VM snapshots. Here is the example of the command line for this:
 ``empty.qcow2`` drive does not connected to any virtual block device and used
 for VM snapshots only.
 
+``rrsnapmode`` can be used to select just an initial snapshot or periodic
+snapshots, with ``rrsnapcount`` specifying the number of periodic snapshots
+to maintain, and ``rrsnaptime`` the amount of run time in seconds between
+periodic snapshots.
+
 .. _network-label:
 
 Network devices
diff --git a/include/sysemu/replay.h b/include/sysemu/replay.h
index 08aae5869f..a83e54afc6 100644
--- a/include/sysemu/replay.h
+++ b/include/sysemu/replay.h
@@ -45,6 +45,17 @@ typedef enum ReplayCheckpoint ReplayCheckpoint;
 
 typedef struct ReplayNetState ReplayNetState;
 
+enum ReplaySnapshotMode {
+    REPLAY_SNAPSHOT_MODE_INITIAL,
+    REPLAY_SNAPSHOT_MODE_PERIODIC,
+};
+typedef enum ReplaySnapshotMode ReplaySnapshotMode;
+
+extern ReplaySnapshotMode replay_snapshot_mode;
+
+extern uint64_t replay_snapshot_periodic_delay;
+extern int replay_snapshot_periodic_nr_keep;
+
 /* Name of the initial VM snapshot */
 extern char *replay_snapshot;
 
diff --git a/qemu-options.hx b/qemu-options.hx
index 29b98c3d4c..0dce93eeab 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4530,13 +4530,13 @@ SRST
 ERST
 
 DEF("icount", HAS_ARG, QEMU_OPTION_icount, \
-    "-icount [shift=N|auto][,align=on|off][,sleep=on|off][,rr=record|replay,rrfile=<filename>[,rrsnapshot=<snapshot>]]\n" \
+    "-icount [shift=N|auto][,align=on|off][,sleep=on|off][,rr=record|replay,rrfile=<filename>[,rrsnapshot=<snapshot>][,rrsnapmode=initial|periodic][,rrsnaptime=secs][,rrsnapcount=N]\n" \
     "                enable virtual instruction counter with 2^N clock ticks per\n" \
     "                instruction, enable aligning the host and virtual clocks\n" \
     "                or disable real time cpu sleeping, and optionally enable\n" \
     "                record-and-replay mode\n", QEMU_ARCH_ALL)
 SRST
-``-icount [shift=N|auto][,align=on|off][,sleep=on|off][,rr=record|replay,rrfile=filename[,rrsnapshot=snapshot]]``
+``-icount [shift=N|auto][,align=on|off][,sleep=on|off][,rr=record|replay,rrfile=filename[,rrsnapshot=snapshot][,rrsnapmode=initial|periodic][,rrsnaptime=secs][,rrsnapcount=N]]``
     Enable virtual instruction counter. The virtual cpu will execute one
     instruction every 2^N ns of virtual time. If ``auto`` is specified
     then the virtual cpu speed will be automatically adjusted to keep
@@ -4578,6 +4578,11 @@ SRST
     name. In record mode, a new VM snapshot with the given name is created
     at the start of execution recording. In replay mode this option
     specifies the snapshot name used to load the initial VM state.
+    ``rrsnapmode=periodic`` will additionally cause a periodic snapshot to
+    be created after ``rrsnaptime=secs`` seconds of real runtime. The last
+    ``rrsnapcount=N`` periodic snapshots (not including the initial) will
+    be kept (0 for infinite). Periodic snapshots are useful to speed
+    reverse debugging operations near the end of the recorded trace.
 ERST
 
 DEF("watchdog-action", HAS_ARG, QEMU_OPTION_watchdog_action, \
diff --git a/replay/replay-snapshot.c b/replay/replay-snapshot.c
index 10a7cf7992..38eac61c43 100644
--- a/replay/replay-snapshot.c
+++ b/replay/replay-snapshot.c
@@ -69,6 +69,53 @@ void replay_vmstate_register(void)
     vmstate_register(NULL, 0, &vmstate_replay, &replay_state);
 }
 
+static QEMUTimer *replay_snapshot_timer;
+static int replay_snapshot_count;
+
+static void replay_snapshot_timer_cb(void *opaque)
+{
+    Error *err = NULL;
+    char *name;
+
+    if (!replay_can_snapshot()) {
+        /* Try again soon */
+        timer_mod(replay_snapshot_timer,
+                  qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
+                  replay_snapshot_periodic_delay / 10);
+        return;
+    }
+
+    name = g_strdup_printf("%s-%d", replay_snapshot, replay_snapshot_count);
+    if (!save_snapshot(name,
+                       true, NULL, false, NULL, &err)) {
+        error_report_err(err);
+        error_report("Could not create periodic snapshot "
+                     "for icount record, disabling");
+        g_free(name);
+        return;
+    }
+    g_free(name);
+    replay_snapshot_count++;
+
+    if (replay_snapshot_periodic_nr_keep >= 1 &&
+        replay_snapshot_count > replay_snapshot_periodic_nr_keep) {
+        int del_nr;
+
+        del_nr = replay_snapshot_count - replay_snapshot_periodic_nr_keep - 1;
+        name = g_strdup_printf("%s-%d", replay_snapshot, del_nr);
+        if (!delete_snapshot(name, false, NULL, &err)) {
+            error_report_err(err);
+            error_report("Could not delete periodic snapshot "
+                         "for icount record");
+        }
+        g_free(name);
+    }
+
+    timer_mod(replay_snapshot_timer,
+              qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
+              replay_snapshot_periodic_delay);
+}
+
 void replay_vmstate_init(void)
 {
     Error *err = NULL;
@@ -81,6 +128,16 @@ void replay_vmstate_init(void)
                 error_report("Could not create snapshot for icount record");
                 exit(1);
             }
+
+            if (replay_snapshot_mode == REPLAY_SNAPSHOT_MODE_PERIODIC) {
+                replay_snapshot_timer = timer_new_ms(QEMU_CLOCK_REALTIME,
+                                                     replay_snapshot_timer_cb,
+                                                     NULL);
+                timer_mod(replay_snapshot_timer,
+                          qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
+                          replay_snapshot_periodic_delay);
+            }
+
         } else if (replay_mode == REPLAY_MODE_PLAY) {
             if (!load_snapshot(replay_snapshot, NULL, false, NULL, &err)) {
                 error_report_err(err);
diff --git a/replay/replay.c b/replay/replay.c
index e64f71209a..fa5930700d 100644
--- a/replay/replay.c
+++ b/replay/replay.c
@@ -29,6 +29,10 @@
 ReplayMode replay_mode = REPLAY_MODE_NONE;
 char *replay_snapshot;
 
+ReplaySnapshotMode replay_snapshot_mode;
+uint64_t replay_snapshot_periodic_delay;
+int replay_snapshot_periodic_nr_keep;
+
 /* Name of replay file  */
 static char *replay_filename;
 ReplayState replay_state;
@@ -313,6 +317,27 @@ void replay_configure(QemuOpts *opts)
     }
 
     replay_snapshot = g_strdup(qemu_opt_get(opts, "rrsnapshot"));
+    if (replay_snapshot && mode == REPLAY_MODE_RECORD) {
+        const char *snapmode;
+
+        snapmode = qemu_opt_get(opts, "rrsnapmode");
+        if (!snapmode || !strcmp(snapmode, "initial")) {
+            replay_snapshot_mode = REPLAY_SNAPSHOT_MODE_INITIAL;
+        } else if (!strcmp(snapmode, "periodic")) {
+            replay_snapshot_mode = REPLAY_SNAPSHOT_MODE_PERIODIC;
+        } else {
+            error_report("Invalid rrsnapmode option: %s", snapmode);
+            exit(1);
+        }
+
+        /* Default 10 host seconds of machine runtime per snapshot. */
+        replay_snapshot_periodic_delay =
+                           qemu_opt_get_number(opts, "rrsnaptime", 10) * 1000;
+
+        /* Default 2, to cover at least the last 10 host seconds of runtime. */
+        replay_snapshot_periodic_nr_keep =
+                           qemu_opt_get_number(opts, "rrsnapcount", 2);
+    }
     replay_vmstate_register();
     replay_enable(fname, mode);
 
diff --git a/softmmu/vl.c b/softmmu/vl.c
index b0b96f67fa..e032eb45e8 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -446,6 +446,15 @@ static QemuOptsList qemu_icount_opts = {
         }, {
             .name = "rrsnapshot",
             .type = QEMU_OPT_STRING,
+        }, {
+            .name = "rrsnapmode",
+            .type = QEMU_OPT_STRING,
+        }, {
+            .name = "rrsnaptime",
+            .type = QEMU_OPT_NUMBER,
+        }, {
+            .name = "rrsnapcount",
+            .type = QEMU_OPT_NUMBER,
         },
         { /* end of list */ }
     },
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/4] scripts/replay_dump.sh: Update to current rr record format
  2023-08-14 16:31 ` [PATCH 1/4] scripts/replay_dump.sh: Update to current rr record format Nicholas Piggin
@ 2023-08-18  4:19   ` Pavel Dovgalyuk
  0 siblings, 0 replies; 13+ messages in thread
From: Pavel Dovgalyuk @ 2023-08-18  4:19 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Paolo Bonzini, John Snow, Cleber Rosa,
	Philippe Mathieu-Daudé,
	Wainer dos Santos Moschetta, Beraldo Leal, qemu-devel,
	Довгалюк
	Павел

On 14.08.2023 19:31, Nicholas Piggin wrote:
> This thing seems to have fallen by the wayside. This gets it working with
> the current format, although does not quite implement all events.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

The code looks ok, therefore
Rewieved-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>

However, there is one thing about idea or replay-dump script.
I think it never will be used for parsing the older versions of the log.
Record/replay can only replay the log generated by the same
QEMU version. Any of virtual hw behavior change or some main loop
refactoring may break the replay.

That is why I think that support of the past replay log
formats is useless.

> ---
> My python skills are not good. Any help on this or patch 2 is
> appreciated.
> 
> Thanks,
> Nick
> 
>   scripts/replay-dump.py | 107 ++++++++++++++++++++++++++++++++++++++---
>   1 file changed, 101 insertions(+), 6 deletions(-)
> 
> diff --git a/scripts/replay-dump.py b/scripts/replay-dump.py
> index 3ba97a6d30..937ae19ff1 100755
> --- a/scripts/replay-dump.py
> +++ b/scripts/replay-dump.py
> @@ -20,6 +20,7 @@
>   
>   import argparse
>   import struct
> +import os
>   from collections import namedtuple
>   
>   # This mirrors some of the global replay state which some of the
> @@ -62,6 +63,10 @@ def read_byte(fin):
>       "Read a single byte"
>       return struct.unpack('>B', fin.read(1))[0]
>   
> +def read_bytes(fin, nr):
> +    "Read a nr bytes"
> +    return fin.read(nr)
> +
>   def read_event(fin):
>       "Read a single byte event, but save some state"
>       if replay_state.already_read:
> @@ -122,12 +127,18 @@ def swallow_async_qword(eid, name, dumpfile):
>       print("  %s(%d) @ %d" % (name, eid, step_id))
>       return True
>   
> +def swallow_bytes(eid, name, dumpfile, nr):
> +    "Swallow nr bytes of data without looking at it"
> +    dumpfile.seek(nr, os.SEEK_CUR)
> +    return True
> +
>   async_decode_table = [ Decoder(0, "REPLAY_ASYNC_EVENT_BH", swallow_async_qword),
> -                       Decoder(1, "REPLAY_ASYNC_INPUT", decode_unimp),
> -                       Decoder(2, "REPLAY_ASYNC_INPUT_SYNC", decode_unimp),
> -                       Decoder(3, "REPLAY_ASYNC_CHAR_READ", decode_unimp),
> -                       Decoder(4, "REPLAY_ASYNC_EVENT_BLOCK", decode_unimp),
> -                       Decoder(5, "REPLAY_ASYNC_EVENT_NET", decode_unimp),
> +                       Decoder(1, "REPLAY_ASYNC_BH_ONESHOT", decode_unimp),
> +                       Decoder(2, "REPLAY_ASYNC_INPUT", decode_unimp),
> +                       Decoder(3, "REPLAY_ASYNC_INPUT_SYNC", decode_unimp),
> +                       Decoder(4, "REPLAY_ASYNC_CHAR_READ", decode_unimp),
> +                       Decoder(5, "REPLAY_ASYNC_EVENT_BLOCK", decode_unimp),
> +                       Decoder(6, "REPLAY_ASYNC_EVENT_NET", decode_unimp),
>   ]
>   # See replay_read_events/replay_read_event
>   def decode_async(eid, name, dumpfile):
> @@ -156,6 +167,13 @@ def decode_audio_out(eid, name, dumpfile):
>       print_event(eid, name, "%d" % (audio_data))
>       return True
>   
> +def decode_random(eid, name, dumpfile):
> +    ret = read_dword(dumpfile)
> +    size = read_dword(dumpfile)
> +    swallow_bytes(eid, name, dumpfile, size)
> +    print_event(eid, name, "%d %d" % (ret, size))
> +    return True
> +
>   def decode_checkpoint(eid, name, dumpfile):
>       """Decode a checkpoint.
>   
> @@ -184,6 +202,38 @@ def decode_interrupt(eid, name, dumpfile):
>       print_event(eid, name)
>       return True
>   
> +def decode_exception(eid, name, dumpfile):
> +    print_event(eid, name)
> +    return True
> +
> +def decode_shutdown(eid, name, dumpfile):
> +    print_event(eid, name)
> +    return True
> +
> +def decode_end(eid, name, dumpfile):
> +    print_event(eid, name)
> +    return False
> +
> +def decode_char_write(eid, name, dumpfile):
> +    res = read_dword(dumpfile)
> +    offset = read_dword(dumpfile)
> +    print_event(eid, name)
> +    return True
> +
> +def decode_async_char_read(eid, name, dumpfile):
> +    char_id = read_byte(dumpfile)
> +    size = read_dword(dumpfile)
> +    print_event(eid, name, "device:%x chars:%s" % (char_id, read_bytes(dumpfile, size)))
> +    return True
> +
> +def decode_async_net(eid, name, dumpfile):
> +    net_id = read_byte(dumpfile)
> +    flags = read_dword(dumpfile)
> +    size = read_dword(dumpfile)
> +    swallow_bytes(eid, name, dumpfile, size)
> +    print_event(eid, name, "net:%x flags:%x bytes:%d" % (net_id, flags, size))
> +    return True
> +
>   def decode_clock(eid, name, dumpfile):
>       clock_data = read_qword(dumpfile)
>       print_event(eid, name, "0x%x" % (clock_data))
> @@ -268,6 +318,48 @@ def decode_clock(eid, name, dumpfile):
>                     Decoder(28, "EVENT_CP_RESET", decode_checkpoint),
>   ]
>   
> +v12_event_table = [Decoder(0, "EVENT_INSTRUCTION", decode_instruction),
> +                  Decoder(1, "EVENT_INTERRUPT", decode_interrupt),
> +                  Decoder(2, "EVENT_EXCEPTION", decode_exception),
> +                  Decoder(3, "EVENT_ASYNC_BH", swallow_async_qword),
> +                  Decoder(4, "EVENT_ASYNC_BH_ONESHOT", swallow_async_qword),
> +                  Decoder(5, "EVENT_ASYNC_INPUT", decode_unimp),
> +                  Decoder(6, "EVENT_ASYNC_INPUT_SYNC", decode_unimp),
> +                  Decoder(7, "EVENT_ASYNC_CHAR_READ", decode_async_char_read),
> +                  Decoder(8, "EVENT_ASYNC_BLOCK", swallow_async_qword),
> +                  Decoder(9, "EVENT_ASYNC_NET", decode_async_net),
> +                  Decoder(10, "EVENT_SHUTDOWN", decode_unimp),
> +                  Decoder(11, "EVENT_SHUTDOWN_HOST_ERR", decode_shutdown),
> +                  Decoder(12, "EVENT_SHUTDOWN_HOST_QMP_QUIT", decode_shutdown),
> +                  Decoder(13, "EVENT_SHUTDOWN_HOST_QMP_RESET", decode_shutdown),
> +                  Decoder(14, "EVENT_SHUTDOWN_HOST_SIGNAL", decode_shutdown),
> +                  Decoder(15, "EVENT_SHUTDOWN_HOST_UI", decode_shutdown),
> +                  Decoder(16, "EVENT_SHUTDOWN_GUEST_SHUTDOWN", decode_shutdown),
> +                  Decoder(17, "EVENT_SHUTDOWN_GUEST_RESET", decode_shutdown),
> +                  Decoder(18, "EVENT_SHUTDOWN_GUEST_PANIC", decode_shutdown),
> +                  Decoder(19, "EVENT_SHUTDOWN_SUBSYS_RESET", decode_shutdown),
> +                  Decoder(20, "EVENT_SHUTDOWN_SNAPSHOT_LOAD", decode_shutdown),
> +                  Decoder(21, "EVENT_SHUTDOWN___MAX", decode_shutdown),
> +                  Decoder(22, "EVENT_CHAR_WRITE", decode_char_write),
> +                  Decoder(23, "EVENT_CHAR_READ_ALL", decode_unimp),
> +                  Decoder(24, "EVENT_CHAR_READ_ALL_ERROR", decode_unimp),
> +                  Decoder(25, "EVENT_AUDIO_OUT", decode_audio_out),
> +                  Decoder(26, "EVENT_AUDIO_IN", decode_unimp),
> +                  Decoder(27, "EVENT_RANDOM", decode_random),
> +                  Decoder(28, "EVENT_CLOCK_HOST", decode_clock),
> +                  Decoder(29, "EVENT_CLOCK_VIRTUAL_RT", decode_clock),
> +                  Decoder(30, "EVENT_CP_CLOCK_WARP_START", decode_checkpoint),
> +                  Decoder(31, "EVENT_CP_CLOCK_WARP_ACCOUNT", decode_checkpoint),
> +                  Decoder(32, "EVENT_CP_RESET_REQUESTED", decode_checkpoint),
> +                  Decoder(33, "EVENT_CP_SUSPEND_REQUESTED", decode_checkpoint),
> +                  Decoder(34, "EVENT_CP_CLOCK_VIRTUAL", decode_checkpoint),
> +                  Decoder(35, "EVENT_CP_CLOCK_HOST", decode_checkpoint),
> +                  Decoder(36, "EVENT_CP_CLOCK_VIRTUAL_RT", decode_checkpoint),
> +                  Decoder(37, "EVENT_CP_INIT", decode_checkpoint_init),
> +                  Decoder(38, "EVENT_CP_RESET", decode_checkpoint),
> +                  Decoder(39, "EVENT_END", decode_end),
> +]
> +
>   def parse_arguments():
>       "Grab arguments for script"
>       parser = argparse.ArgumentParser()
> @@ -285,7 +377,10 @@ def decode_file(filename):
>   
>       print("HEADER: version 0x%x" % (version))
>   
> -    if version == 0xe02007:
> +    if version == 0xe0200c:
> +        event_decode_table = v12_event_table
> +        replay_state.checkpoint_start = 12
> +    elif version == 0xe02007:
>           event_decode_table = v7_event_table
>           replay_state.checkpoint_start = 12
>       elif version == 0xe02006:



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/4] tests/avocado: replay_linux.py add replay-dump.py test
  2023-08-14 16:31 ` [PATCH 2/4] tests/avocado: replay_linux.py add replay-dump.py test Nicholas Piggin
@ 2023-08-18  4:23   ` Pavel Dovgalyuk
  0 siblings, 0 replies; 13+ messages in thread
From: Pavel Dovgalyuk @ 2023-08-18  4:23 UTC (permalink / raw)
  To: Nicholas Piggin, Pavel Dovgalyuk
  Cc: Paolo Bonzini, John Snow, Cleber Rosa,
	Philippe Mathieu-Daudé,
	Wainer dos Santos Moschetta, Beraldo Leal, qemu-devel

On 14.08.2023 19:31, Nicholas Piggin wrote:
> This runs replay-dump.py after recording a trace, and fails the test if
> the script fails.
> 
> replay-dump.py is modified to exit with non-zero if an error is
> encountered while parsing.

I would like to have separate test for replay-dump, because
replay-linux tests are very heavy to replay and knowing the exact
reason of the failure in advance would be more convenient.

What do you think of splitting the test?

> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> It's possible this could introduce failures to existing test if an
> unimplemented event gets recorded. I would make a new test for this but
> it takes quite a while to record such a long trace that includes some
> block and net events to excercise the script.
> 
> Thanks,
> Nick
> 
>   scripts/replay-dump.py        |  6 ++++--
>   tests/avocado/replay_linux.py | 16 +++++++++++++++-
>   2 files changed, 19 insertions(+), 3 deletions(-)
> 
> diff --git a/scripts/replay-dump.py b/scripts/replay-dump.py
> index 937ae19ff1..8f4715632a 100755
> --- a/scripts/replay-dump.py
> +++ b/scripts/replay-dump.py
> @@ -21,6 +21,7 @@
>   import argparse
>   import struct
>   import os
> +import sys
>   from collections import namedtuple
>   
>   # This mirrors some of the global replay state which some of the
> @@ -97,7 +98,7 @@ def call_decode(table, index, dumpfile):
>           print("Could not decode index: %d" % (index))
>           print("Entry is: %s" % (decoder))
>           print("Decode Table is:\n%s" % (table))
> -        return False
> +        sys.exit(1)
>       else:
>           return decoder.fn(decoder.eid, decoder.name, dumpfile)
>   
> @@ -118,7 +119,7 @@ def print_event(eid, name, string=None, event_count=None):
>   def decode_unimp(eid, name, _unused_dumpfile):
>       "Unimplimented decoder, will trigger exit"
>       print("%s not handled - will now stop" % (name))
> -    return False
> +    sys.exit(1)
>   
>   # Checkpoint decoder
>   def swallow_async_qword(eid, name, dumpfile):
> @@ -401,3 +402,4 @@ def decode_file(filename):
>   if __name__ == "__main__":
>       args = parse_arguments()
>       decode_file(args.file)
> +    sys.exit(0)
> diff --git a/tests/avocado/replay_linux.py b/tests/avocado/replay_linux.py
> index a76dd507fc..12937ce0ec 100644
> --- a/tests/avocado/replay_linux.py
> +++ b/tests/avocado/replay_linux.py
> @@ -11,6 +11,7 @@
>   import os
>   import logging
>   import time
> +import subprocess
>   
>   from avocado import skipUnless
>   from avocado_qemu import BUILD_DIR
> @@ -21,6 +22,11 @@
>   from avocado.utils.path import find_command
>   from avocado_qemu import LinuxTest
>   
> +from pathlib import Path
> +
> +self_dir = Path(__file__).parent
> +src_dir = self_dir.parent.parent
> +
>   class ReplayLinux(LinuxTest):
>       """
>       Boots a Linux system, checking for a successful initialization
> @@ -94,7 +100,7 @@ def launch_and_wait(self, record, args, shift):
>           else:
>               vm.event_wait('SHUTDOWN', self.timeout)
>               vm.shutdown(True)
> -            logger.info('successfully fihished the replay')
> +            logger.info('successfully finished the replay')
>           elapsed = time.time() - start_time
>           logger.info('elapsed time %.2f sec' % elapsed)
>           return elapsed
> @@ -105,6 +111,14 @@ def run_rr(self, args=None, shift=7):
>           logger = logging.getLogger('replay')
>           logger.info('replay overhead {:.2%}'.format(t2 / t1 - 1))
>   
> +        try:
> +            replay_path = os.path.join(self.workdir, 'replay.bin')
> +            subprocess.check_call(["./scripts/replay-dump.py",
> +                                   "-f", replay_path],
> +                                  cwd=src_dir, stdout=subprocess.DEVNULL)
> +        except subprocess.CalledProcessError:
> +            self.fail('replay-dump.py failed')
> +
>   @skipUnless(os.getenv('AVOCADO_TIMEOUT_EXPECTED'), 'Test might timeout')
>   class ReplayLinuxX8664(ReplayLinux):
>       """



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/4] replay: allow runstate shutdown->running when replaying trace
  2023-08-14 16:31 ` [PATCH 3/4] replay: allow runstate shutdown->running when replaying trace Nicholas Piggin
@ 2023-08-18  4:25   ` Pavel Dovgalyuk
  0 siblings, 0 replies; 13+ messages in thread
From: Pavel Dovgalyuk @ 2023-08-18  4:25 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Paolo Bonzini, John Snow, Cleber Rosa,
	Philippe Mathieu-Daudé,
	Wainer dos Santos Moschetta, Beraldo Leal, qemu-devel,
	Довгалюк
	Павел

Acked-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>

On 14.08.2023 19:31, Nicholas Piggin wrote:
> When replaying a trace, it is possible to go from shutdown to
> running with a reverse-debugging step. This can be useful if the
> problem being debugged triggers a reset or shutdown.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   include/sysemu/runstate.h |  1 +
>   replay/replay.c           |  2 ++
>   softmmu/runstate.c        | 19 +++++++++++++++++++
>   3 files changed, 22 insertions(+)
> 
> diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
> index 7beb29c2e2..85a1167ccb 100644
> --- a/include/sysemu/runstate.h
> +++ b/include/sysemu/runstate.h
> @@ -9,6 +9,7 @@ void runstate_set(RunState new_state);
>   RunState runstate_get(void);
>   bool runstate_is_running(void);
>   bool runstate_needs_reset(void);
> +void runstate_replay_enable(void);
>   
>   typedef void VMChangeStateHandler(void *opaque, bool running, RunState state);
>   
> diff --git a/replay/replay.c b/replay/replay.c
> index 0f7d766efe..e64f71209a 100644
> --- a/replay/replay.c
> +++ b/replay/replay.c
> @@ -272,6 +272,8 @@ static void replay_enable(const char *fname, int mode)
>           /* go to the beginning */
>           fseek(replay_file, HEADER_SIZE, SEEK_SET);
>           replay_fetch_data_kind();
> +
> +        runstate_replay_enable();
>       }
>   
>       replay_init_events();
> diff --git a/softmmu/runstate.c b/softmmu/runstate.c
> index f3bd862818..9fd3e57485 100644
> --- a/softmmu/runstate.c
> +++ b/softmmu/runstate.c
> @@ -174,6 +174,12 @@ static const RunStateTransition runstate_transitions_def[] = {
>       { RUN_STATE__MAX, RUN_STATE__MAX },
>   };
>   
> +static const RunStateTransition replay_runstate_transitions_def[] = {
> +    { RUN_STATE_SHUTDOWN, RUN_STATE_RUNNING},
> +
> +    { RUN_STATE__MAX, RUN_STATE__MAX },
> +};
> +
>   static bool runstate_valid_transitions[RUN_STATE__MAX][RUN_STATE__MAX];
>   
>   bool runstate_check(RunState state)
> @@ -181,6 +187,19 @@ bool runstate_check(RunState state)
>       return current_run_state == state;
>   }
>   
> +void runstate_replay_enable(void)
> +{
> +    const RunStateTransition *p;
> +
> +    assert(replay_mode == REPLAY_MODE_PLAY);
> +
> +    for (p = &replay_runstate_transitions_def[0]; p->from != RUN_STATE__MAX;
> +         p++) {
> +        runstate_valid_transitions[p->from][p->to] = true;
> +    }
> +
> +}
> +
>   static void runstate_init(void)
>   {
>       const RunStateTransition *p;



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 4/4] replay: simple auto-snapshot mode for record
  2023-08-14 16:31 ` [PATCH 4/4] replay: simple auto-snapshot mode for record Nicholas Piggin
@ 2023-08-18  4:36   ` Pavel Dovgalyuk
  2024-02-26  7:36     ` Nicholas Piggin
  0 siblings, 1 reply; 13+ messages in thread
From: Pavel Dovgalyuk @ 2023-08-18  4:36 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Paolo Bonzini, John Snow, Cleber Rosa,
	Philippe Mathieu-Daudé,
	Wainer dos Santos Moschetta, Beraldo Leal, qemu-devel,
	Довгалюк
	Павел

On 14.08.2023 19:31, Nicholas Piggin wrote:
> record makes an initial snapshot when the machine is created, to enable
> reverse-debugging. Often the issue being debugged appears near the end of
> the trace, so it is important for performance to keep snapshots close to
> the end.
> 
> This implements a periodic snapshot mode that keeps a rolling set of
> recent snapshots.
> 
> Arguably this should be done by the debugger or a program that talks to
> QMP, but for setting up simple scenarios and tests, it is convenient to
> have this feature.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   docs/system/replay.rst   |  5 ++++
>   include/sysemu/replay.h  | 11 ++++++++
>   qemu-options.hx          |  9 +++++--
>   replay/replay-snapshot.c | 57 ++++++++++++++++++++++++++++++++++++++++
>   replay/replay.c          | 25 ++++++++++++++++++
>   softmmu/vl.c             |  9 +++++++
>   6 files changed, 114 insertions(+), 2 deletions(-)
> 
> diff --git a/docs/system/replay.rst b/docs/system/replay.rst
> index 3105327423..bef9ea4171 100644
> --- a/docs/system/replay.rst
> +++ b/docs/system/replay.rst
> @@ -156,6 +156,11 @@ for storing VM snapshots. Here is the example of the command line for this:
>   ``empty.qcow2`` drive does not connected to any virtual block device and used
>   for VM snapshots only.
>   
> +``rrsnapmode`` can be used to select just an initial snapshot or periodic
> +snapshots, with ``rrsnapcount`` specifying the number of periodic snapshots
> +to maintain, and ``rrsnaptime`` the amount of run time in seconds between
> +periodic snapshots.
> +
>   .. _network-label:
>   
>   Network devices
> diff --git a/include/sysemu/replay.h b/include/sysemu/replay.h
> index 08aae5869f..a83e54afc6 100644
> --- a/include/sysemu/replay.h
> +++ b/include/sysemu/replay.h
> @@ -45,6 +45,17 @@ typedef enum ReplayCheckpoint ReplayCheckpoint;
>   
>   typedef struct ReplayNetState ReplayNetState;
>   
> +enum ReplaySnapshotMode {
> +    REPLAY_SNAPSHOT_MODE_INITIAL,
> +    REPLAY_SNAPSHOT_MODE_PERIODIC,
> +};
> +typedef enum ReplaySnapshotMode ReplaySnapshotMode;
> +
> +extern ReplaySnapshotMode replay_snapshot_mode;
> +
> +extern uint64_t replay_snapshot_periodic_delay;
> +extern int replay_snapshot_periodic_nr_keep;
> +

It seems that all of these doesn't have to be exported,
you can add it into the internal replay header.

>   /* Name of the initial VM snapshot */
>   extern char *replay_snapshot
>   
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 29b98c3d4c..0dce93eeab 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -4530,13 +4530,13 @@ SRST
>   ERST
>   
>   DEF("icount", HAS_ARG, QEMU_OPTION_icount, \
> -    "-icount [shift=N|auto][,align=on|off][,sleep=on|off][,rr=record|replay,rrfile=<filename>[,rrsnapshot=<snapshot>]]\n" \
> +    "-icount [shift=N|auto][,align=on|off][,sleep=on|off][,rr=record|replay,rrfile=<filename>[,rrsnapshot=<snapshot>][,rrsnapmode=initial|periodic][,rrsnaptime=secs][,rrsnapcount=N]\n" \
>       "                enable virtual instruction counter with 2^N clock ticks per\n" \
>       "                instruction, enable aligning the host and virtual clocks\n" \
>       "                or disable real time cpu sleeping, and optionally enable\n" \
>       "                record-and-replay mode\n", QEMU_ARCH_ALL)
>   SRST
> -``-icount [shift=N|auto][,align=on|off][,sleep=on|off][,rr=record|replay,rrfile=filename[,rrsnapshot=snapshot]]``
> +``-icount [shift=N|auto][,align=on|off][,sleep=on|off][,rr=record|replay,rrfile=filename[,rrsnapshot=snapshot][,rrsnapmode=initial|periodic][,rrsnaptime=secs][,rrsnapcount=N]]``
>       Enable virtual instruction counter. The virtual cpu will execute one
>       instruction every 2^N ns of virtual time. If ``auto`` is specified
>       then the virtual cpu speed will be automatically adjusted to keep
> @@ -4578,6 +4578,11 @@ SRST
>       name. In record mode, a new VM snapshot with the given name is created
>       at the start of execution recording. In replay mode this option
>       specifies the snapshot name used to load the initial VM state.
> +    ``rrsnapmode=periodic`` will additionally cause a periodic snapshot to
> +    be created after ``rrsnaptime=secs`` seconds of real runtime. The last
> +    ``rrsnapcount=N`` periodic snapshots (not including the initial) will
> +    be kept (0 for infinite). Periodic snapshots are useful to speed
> +    reverse debugging operations near the end of the recorded trace.
>   ERST
>   
>   DEF("watchdog-action", HAS_ARG, QEMU_OPTION_watchdog_action, \
> diff --git a/replay/replay-snapshot.c b/replay/replay-snapshot.c
> index 10a7cf7992..38eac61c43 100644
> --- a/replay/replay-snapshot.c
> +++ b/replay/replay-snapshot.c
> @@ -69,6 +69,53 @@ void replay_vmstate_register(void)
>       vmstate_register(NULL, 0, &vmstate_replay, &replay_state);
>   }
>   
> +static QEMUTimer *replay_snapshot_timer;
> +static int replay_snapshot_count;
> +
> +static void replay_snapshot_timer_cb(void *opaque)
> +{
> +    Error *err = NULL;
> +    char *name;
> +
> +    if (!replay_can_snapshot()) {
> +        /* Try again soon */
> +        timer_mod(replay_snapshot_timer,
> +                  qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
> +                  replay_snapshot_periodic_delay / 10);
> +        return;
> +    }
> +
> +    name = g_strdup_printf("%s-%d", replay_snapshot, replay_snapshot_count);
> +    if (!save_snapshot(name,
> +                       true, NULL, false, NULL, &err)) {
> +        error_report_err(err);
> +        error_report("Could not create periodic snapshot "
> +                     "for icount record, disabling");
> +        g_free(name);
> +        return;
> +    }
> +    g_free(name);
> +    replay_snapshot_count++;
> +
> +    if (replay_snapshot_periodic_nr_keep >= 1 &&
> +        replay_snapshot_count > replay_snapshot_periodic_nr_keep) {
> +        int del_nr;
> +
> +        del_nr = replay_snapshot_count - replay_snapshot_periodic_nr_keep - 1;
> +        name = g_strdup_printf("%s-%d", replay_snapshot, del_nr);
> +        if (!delete_snapshot(name, false, NULL, &err)) {
> +            error_report_err(err);
> +            error_report("Could not delete periodic snapshot "
> +                         "for icount record");
> +        }
> +        g_free(name);
> +    }
> +
> +    timer_mod(replay_snapshot_timer,
> +              qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
> +              replay_snapshot_periodic_delay);

I'm not sure that realtime is not the best choice for such of a timer.
Virtual machine may be stopped or slowed down for some reason.

> +}
> +
>   void replay_vmstate_init(void)
>   {
>       Error *err = NULL;
> @@ -81,6 +128,16 @@ void replay_vmstate_init(void)
>                   error_report("Could not create snapshot for icount record");
>                   exit(1);
>               }
> +
> +            if (replay_snapshot_mode == REPLAY_SNAPSHOT_MODE_PERIODIC) {
> +                replay_snapshot_timer = timer_new_ms(QEMU_CLOCK_REALTIME,
> +                                                     replay_snapshot_timer_cb,
> +                                                     NULL);
> +                timer_mod(replay_snapshot_timer,
> +                          qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
> +                          replay_snapshot_periodic_delay);
> +            }
> +

Please also delete placeholder comment for the snapshotting timer
in replay_enable function.

>           } else if (replay_mode == REPLAY_MODE_PLAY) {
>               if (!load_snapshot(replay_snapshot, NULL, false, NULL, &err)) {
>                   error_report_err(err);
> diff --git a/replay/replay.c b/replay/replay.c
> index e64f71209a..fa5930700d 100644
> --- a/replay/replay.c
> +++ b/replay/replay.c
> @@ -29,6 +29,10 @@
>   ReplayMode replay_mode = REPLAY_MODE_NONE;
>   char *replay_snapshot;
>   
> +ReplaySnapshotMode replay_snapshot_mode;
> +uint64_t replay_snapshot_periodic_delay;
> +int replay_snapshot_periodic_nr_keep;
> +
>   /* Name of replay file  */
>   static char *replay_filename;
>   ReplayState replay_state;
> @@ -313,6 +317,27 @@ void replay_configure(QemuOpts *opts)
>       }
>   
>       replay_snapshot = g_strdup(qemu_opt_get(opts, "rrsnapshot"));
> +    if (replay_snapshot && mode == REPLAY_MODE_RECORD) {

Can such a snapshotting may be useful in replay mode?

> +        const char *snapmode;
> +
> +        snapmode = qemu_opt_get(opts, "rrsnapmode");
> +        if (!snapmode || !strcmp(snapmode, "initial")) {
> +            replay_snapshot_mode = REPLAY_SNAPSHOT_MODE_INITIAL;
> +        } else if (!strcmp(snapmode, "periodic")) {
> +            replay_snapshot_mode = REPLAY_SNAPSHOT_MODE_PERIODIC;
> +        } else {
> +            error_report("Invalid rrsnapmode option: %s", snapmode);
> +            exit(1);
> +        }
> +
> +        /* Default 10 host seconds of machine runtime per snapshot. */
> +        replay_snapshot_periodic_delay =
> +                           qemu_opt_get_number(opts, "rrsnaptime", 10) * 1000;
> +
> +        /* Default 2, to cover at least the last 10 host seconds of runtime. */
> +        replay_snapshot_periodic_nr_keep =
> +                           qemu_opt_get_number(opts, "rrsnapcount", 2);
> +    }
>       replay_vmstate_register();
>       replay_enable(fname, mode);
>   
> diff --git a/softmmu/vl.c b/softmmu/vl.c
> index b0b96f67fa..e032eb45e8 100644
> --- a/softmmu/vl.c
> +++ b/softmmu/vl.c
> @@ -446,6 +446,15 @@ static QemuOptsList qemu_icount_opts = {
>           }, {
>               .name = "rrsnapshot",
>               .type = QEMU_OPT_STRING,
> +        }, {
> +            .name = "rrsnapmode",
> +            .type = QEMU_OPT_STRING,
> +        }, {
> +            .name = "rrsnaptime",
> +            .type = QEMU_OPT_NUMBER,
> +        }, {
> +            .name = "rrsnapcount",
> +            .type = QEMU_OPT_NUMBER,
>           },
>           { /* end of list */ }
>       },



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 4/4] replay: simple auto-snapshot mode for record
  2023-08-18  4:36   ` Pavel Dovgalyuk
@ 2024-02-26  7:36     ` Nicholas Piggin
  2024-02-28  5:07       ` Pavel Dovgalyuk
  0 siblings, 1 reply; 13+ messages in thread
From: Nicholas Piggin @ 2024-02-26  7:36 UTC (permalink / raw)
  To: Pavel Dovgalyuk
  Cc: Paolo Bonzini, John Snow, Cleber Rosa,
	Philippe Mathieu-Daudé,
	Wainer dos Santos Moschetta, Beraldo Leal, qemu-devel,
	Довгалюк
	Павел

On Fri Aug 18, 2023 at 2:36 PM AEST, Pavel Dovgalyuk wrote:
> On 14.08.2023 19:31, Nicholas Piggin wrote:
> > record makes an initial snapshot when the machine is created, to enable
> > reverse-debugging. Often the issue being debugged appears near the end of
> > the trace, so it is important for performance to keep snapshots close to
> > the end.
> > 
> > This implements a periodic snapshot mode that keeps a rolling set of
> > recent snapshots.
> > 
> > Arguably this should be done by the debugger or a program that talks to
> > QMP, but for setting up simple scenarios and tests, it is convenient to
> > have this feature.

I'm looking at resurrecting this to help add a bit of testing...

[snip]

> > +static void replay_snapshot_timer_cb(void *opaque)
> > +{
> > +    Error *err = NULL;
> > +    char *name;
> > +
> > +    if (!replay_can_snapshot()) {
> > +        /* Try again soon */
> > +        timer_mod(replay_snapshot_timer,
> > +                  qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
> > +                  replay_snapshot_periodic_delay / 10);
> > +        return;
> > +    }
> > +
> > +    name = g_strdup_printf("%s-%d", replay_snapshot, replay_snapshot_count);
> > +    if (!save_snapshot(name,
> > +                       true, NULL, false, NULL, &err)) {
> > +        error_report_err(err);
> > +        error_report("Could not create periodic snapshot "
> > +                     "for icount record, disabling");
> > +        g_free(name);
> > +        return;
> > +    }
> > +    g_free(name);
> > +    replay_snapshot_count++;
> > +
> > +    if (replay_snapshot_periodic_nr_keep >= 1 &&
> > +        replay_snapshot_count > replay_snapshot_periodic_nr_keep) {
> > +        int del_nr;
> > +
> > +        del_nr = replay_snapshot_count - replay_snapshot_periodic_nr_keep - 1;
> > +        name = g_strdup_printf("%s-%d", replay_snapshot, del_nr);
> > +        if (!delete_snapshot(name, false, NULL, &err)) {
> > +            error_report_err(err);
> > +            error_report("Could not delete periodic snapshot "
> > +                         "for icount record");
> > +        }
> > +        g_free(name);
> > +    }
> > +
> > +    timer_mod(replay_snapshot_timer,
> > +              qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
> > +              replay_snapshot_periodic_delay);
>
> I'm not sure that realtime is not the best choice for such of a timer.
> Virtual machine may be stopped or slowed down for some reason.

My thinking was that, say if you snapshot every 10 seconds of real time
executed, then you should have an upper limit on the order of 10 seconds
to perform a reverse-debug operation (so long as you don't exceed your
nr_keep limit).

Is it worth worrying about complexity of slowdowns and vm pausing?
Maybe it could stop snapshotting on a host pause.

> > +}
> > +
> >   void replay_vmstate_init(void)
> >   {
> >       Error *err = NULL;
> > @@ -81,6 +128,16 @@ void replay_vmstate_init(void)
> >                   error_report("Could not create snapshot for icount record");
> >                   exit(1);
> >               }
> > +
> > +            if (replay_snapshot_mode == REPLAY_SNAPSHOT_MODE_PERIODIC) {
> > +                replay_snapshot_timer = timer_new_ms(QEMU_CLOCK_REALTIME,
> > +                                                     replay_snapshot_timer_cb,
> > +                                                     NULL);
> > +                timer_mod(replay_snapshot_timer,
> > +                          qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
> > +                          replay_snapshot_periodic_delay);
> > +            }
> > +
>
> Please also delete placeholder comment for the snapshotting timer
> in replay_enable function.

Wil do.

> >           } else if (replay_mode == REPLAY_MODE_PLAY) {
> >               if (!load_snapshot(replay_snapshot, NULL, false, NULL, &err)) {
> >                   error_report_err(err);
> > diff --git a/replay/replay.c b/replay/replay.c
> > index e64f71209a..fa5930700d 100644
> > --- a/replay/replay.c
> > +++ b/replay/replay.c
> > @@ -29,6 +29,10 @@
> >   ReplayMode replay_mode = REPLAY_MODE_NONE;
> >   char *replay_snapshot;
> >   
> > +ReplaySnapshotMode replay_snapshot_mode;
> > +uint64_t replay_snapshot_periodic_delay;
> > +int replay_snapshot_periodic_nr_keep;
> > +
> >   /* Name of replay file  */
> >   static char *replay_filename;
> >   ReplayState replay_state;
> > @@ -313,6 +317,27 @@ void replay_configure(QemuOpts *opts)
> >       }
> >   
> >       replay_snapshot = g_strdup(qemu_opt_get(opts, "rrsnapshot"));
> > +    if (replay_snapshot && mode == REPLAY_MODE_RECORD) {
>
> Can such a snapshotting may be useful in replay mode?

Does snapshotting do anything in replay mode? I assume if we did
snapshotting based on the machine timer then we'd have to support
it here so the timer events get replayed properly, at least. But
I was trying to get by with minimum complexity :)

Thanks,
Nick


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 4/4] replay: simple auto-snapshot mode for record
  2024-02-26  7:36     ` Nicholas Piggin
@ 2024-02-28  5:07       ` Pavel Dovgalyuk
  2024-02-29  6:32         ` Nicholas Piggin
  2024-03-04 16:33         ` Nicholas Piggin
  0 siblings, 2 replies; 13+ messages in thread
From: Pavel Dovgalyuk @ 2024-02-28  5:07 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Paolo Bonzini, John Snow, Cleber Rosa,
	Philippe Mathieu-Daudé,
	Wainer dos Santos Moschetta, Beraldo Leal, qemu-devel

On 26.02.2024 10:36, Nicholas Piggin wrote:
> On Fri Aug 18, 2023 at 2:36 PM AEST, Pavel Dovgalyuk wrote:
>> On 14.08.2023 19:31, Nicholas Piggin wrote:
>>> record makes an initial snapshot when the machine is created, to enable
>>> reverse-debugging. Often the issue being debugged appears near the end of
>>> the trace, so it is important for performance to keep snapshots close to
>>> the end.
>>>
>>> This implements a periodic snapshot mode that keeps a rolling set of
>>> recent snapshots.
>>>
>>> Arguably this should be done by the debugger or a program that talks to
>>> QMP, but for setting up simple scenarios and tests, it is convenient to
>>> have this feature.
> 
> I'm looking at resurrecting this to help add a bit of testing...
> 
> [snip]
> 
>>> +static void replay_snapshot_timer_cb(void *opaque)
>>> +{
>>> +    Error *err = NULL;
>>> +    char *name;
>>> +
>>> +    if (!replay_can_snapshot()) {
>>> +        /* Try again soon */
>>> +        timer_mod(replay_snapshot_timer,
>>> +                  qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
>>> +                  replay_snapshot_periodic_delay / 10);
>>> +        return;
>>> +    }
>>> +
>>> +    name = g_strdup_printf("%s-%d", replay_snapshot, replay_snapshot_count);
>>> +    if (!save_snapshot(name,
>>> +                       true, NULL, false, NULL, &err)) {
>>> +        error_report_err(err);
>>> +        error_report("Could not create periodic snapshot "
>>> +                     "for icount record, disabling");
>>> +        g_free(name);
>>> +        return;
>>> +    }
>>> +    g_free(name);
>>> +    replay_snapshot_count++;
>>> +
>>> +    if (replay_snapshot_periodic_nr_keep >= 1 &&
>>> +        replay_snapshot_count > replay_snapshot_periodic_nr_keep) {
>>> +        int del_nr;
>>> +
>>> +        del_nr = replay_snapshot_count - replay_snapshot_periodic_nr_keep - 1;
>>> +        name = g_strdup_printf("%s-%d", replay_snapshot, del_nr);
>>> +        if (!delete_snapshot(name, false, NULL, &err)) {
>>> +            error_report_err(err);
>>> +            error_report("Could not delete periodic snapshot "
>>> +                         "for icount record");
>>> +        }
>>> +        g_free(name);
>>> +    }
>>> +
>>> +    timer_mod(replay_snapshot_timer,
>>> +              qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
>>> +              replay_snapshot_periodic_delay);
>>
>> I'm not sure that realtime is not the best choice for such of a timer.
>> Virtual machine may be stopped or slowed down for some reason.
> 
> My thinking was that, say if you snapshot every 10 seconds of real time
> executed, then you should have an upper limit on the order of 10 seconds
> to perform a reverse-debug operation (so long as you don't exceed your
> nr_keep limit).

But in some cases savevm itself could take more than 10 seconds.
We'll have infinite saving in this case. That's why I propose using 
virtual clock with the QEMU_TIMER_ATTR_EXTERNAL attribute.

> 
> Is it worth worrying about complexity of slowdowns and vm pausing?
> Maybe it could stop snapshotting on a host pause.
> 
>>> +}
>>> +
>>>    void replay_vmstate_init(void)
>>>    {
>>>        Error *err = NULL;
>>> @@ -81,6 +128,16 @@ void replay_vmstate_init(void)
>>>                    error_report("Could not create snapshot for icount record");
>>>                    exit(1);
>>>                }
>>> +
>>> +            if (replay_snapshot_mode == REPLAY_SNAPSHOT_MODE_PERIODIC) {
>>> +                replay_snapshot_timer = timer_new_ms(QEMU_CLOCK_REALTIME,
>>> +                                                     replay_snapshot_timer_cb,
>>> +                                                     NULL);
>>> +                timer_mod(replay_snapshot_timer,
>>> +                          qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
>>> +                          replay_snapshot_periodic_delay);
>>> +            }
>>> +
>>
>> Please also delete placeholder comment for the snapshotting timer
>> in replay_enable function.
> 
> Wil do.
> 
>>>            } else if (replay_mode == REPLAY_MODE_PLAY) {
>>>                if (!load_snapshot(replay_snapshot, NULL, false, NULL, &err)) {
>>>                    error_report_err(err);
>>> diff --git a/replay/replay.c b/replay/replay.c
>>> index e64f71209a..fa5930700d 100644
>>> --- a/replay/replay.c
>>> +++ b/replay/replay.c
>>> @@ -29,6 +29,10 @@
>>>    ReplayMode replay_mode = REPLAY_MODE_NONE;
>>>    char *replay_snapshot;
>>>    
>>> +ReplaySnapshotMode replay_snapshot_mode;
>>> +uint64_t replay_snapshot_periodic_delay;
>>> +int replay_snapshot_periodic_nr_keep;
>>> +
>>>    /* Name of replay file  */
>>>    static char *replay_filename;
>>>    ReplayState replay_state;
>>> @@ -313,6 +317,27 @@ void replay_configure(QemuOpts *opts)
>>>        }
>>>    
>>>        replay_snapshot = g_strdup(qemu_opt_get(opts, "rrsnapshot"));
>>> +    if (replay_snapshot && mode == REPLAY_MODE_RECORD) {
>>
>> Can such a snapshotting may be useful in replay mode?
> 
> Does snapshotting do anything in replay mode? 

Yes, you can create as many snapshots as you want if 'snapshot=on'
option of the disk image was not used.

> I assume if we did
> snapshotting based on the machine timer then we'd have to support
> it here so the timer events get replayed properly, at least. But
> I was trying to get by with minimum complexity :)

Use QEMU_TIMER_ATTR_EXTERNAL attribute for the timer and then its
events will not affect the replay.

> 
> Thanks,
> Nick



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 4/4] replay: simple auto-snapshot mode for record
  2024-02-28  5:07       ` Pavel Dovgalyuk
@ 2024-02-29  6:32         ` Nicholas Piggin
  2024-03-04 16:33         ` Nicholas Piggin
  1 sibling, 0 replies; 13+ messages in thread
From: Nicholas Piggin @ 2024-02-29  6:32 UTC (permalink / raw)
  To: Pavel Dovgalyuk
  Cc: Paolo Bonzini, John Snow, Cleber Rosa,
	Philippe Mathieu-Daudé,
	Wainer dos Santos Moschetta, Beraldo Leal, qemu-devel

On Wed Feb 28, 2024 at 3:07 PM AEST, Pavel Dovgalyuk wrote:
> On 26.02.2024 10:36, Nicholas Piggin wrote:
> > On Fri Aug 18, 2023 at 2:36 PM AEST, Pavel Dovgalyuk wrote:
> >> On 14.08.2023 19:31, Nicholas Piggin wrote:
> >>> record makes an initial snapshot when the machine is created, to enable
> >>> reverse-debugging. Often the issue being debugged appears near the end of
> >>> the trace, so it is important for performance to keep snapshots close to
> >>> the end.
> >>>
> >>> This implements a periodic snapshot mode that keeps a rolling set of
> >>> recent snapshots.
> >>>
> >>> Arguably this should be done by the debugger or a program that talks to
> >>> QMP, but for setting up simple scenarios and tests, it is convenient to
> >>> have this feature.
> > 
> > I'm looking at resurrecting this to help add a bit of testing...
> > 
> > [snip]
> > 
> >>> +static void replay_snapshot_timer_cb(void *opaque)
> >>> +{
> >>> +    Error *err = NULL;
> >>> +    char *name;
> >>> +
> >>> +    if (!replay_can_snapshot()) {
> >>> +        /* Try again soon */
> >>> +        timer_mod(replay_snapshot_timer,
> >>> +                  qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
> >>> +                  replay_snapshot_periodic_delay / 10);
> >>> +        return;
> >>> +    }
> >>> +
> >>> +    name = g_strdup_printf("%s-%d", replay_snapshot, replay_snapshot_count);
> >>> +    if (!save_snapshot(name,
> >>> +                       true, NULL, false, NULL, &err)) {
> >>> +        error_report_err(err);
> >>> +        error_report("Could not create periodic snapshot "
> >>> +                     "for icount record, disabling");
> >>> +        g_free(name);
> >>> +        return;
> >>> +    }
> >>> +    g_free(name);
> >>> +    replay_snapshot_count++;
> >>> +
> >>> +    if (replay_snapshot_periodic_nr_keep >= 1 &&
> >>> +        replay_snapshot_count > replay_snapshot_periodic_nr_keep) {
> >>> +        int del_nr;
> >>> +
> >>> +        del_nr = replay_snapshot_count - replay_snapshot_periodic_nr_keep - 1;
> >>> +        name = g_strdup_printf("%s-%d", replay_snapshot, del_nr);
> >>> +        if (!delete_snapshot(name, false, NULL, &err)) {
> >>> +            error_report_err(err);
> >>> +            error_report("Could not delete periodic snapshot "
> >>> +                         "for icount record");
> >>> +        }
> >>> +        g_free(name);
> >>> +    }
> >>> +
> >>> +    timer_mod(replay_snapshot_timer,
> >>> +              qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
> >>> +              replay_snapshot_periodic_delay);
> >>
> >> I'm not sure that realtime is not the best choice for such of a timer.
> >> Virtual machine may be stopped or slowed down for some reason.
> > 
> > My thinking was that, say if you snapshot every 10 seconds of real time
> > executed, then you should have an upper limit on the order of 10 seconds
> > to perform a reverse-debug operation (so long as you don't exceed your
> > nr_keep limit).
>
> But in some cases savevm itself could take more than 10 seconds.
> We'll have infinite saving in this case. That's why I propose using 
> virtual clock with the QEMU_TIMER_ATTR_EXTERNAL attribute.

It shouldn't do that because it sets the next timer after
end of snapshot time + delay.

With virtual time, won't an idle machine just keep warping
the time to the next snapshot save and that *could* cause
infinite snapshotting. I worry that icount time is difficult
to predict and understand. Both have issues but I think real
time is better.

> > 
> > Is it worth worrying about complexity of slowdowns and vm pausing?
> > Maybe it could stop snapshotting on a host pause.
> > 
> >>> +}
> >>> +
> >>>    void replay_vmstate_init(void)
> >>>    {
> >>>        Error *err = NULL;
> >>> @@ -81,6 +128,16 @@ void replay_vmstate_init(void)
> >>>                    error_report("Could not create snapshot for icount record");
> >>>                    exit(1);
> >>>                }
> >>> +
> >>> +            if (replay_snapshot_mode == REPLAY_SNAPSHOT_MODE_PERIODIC) {
> >>> +                replay_snapshot_timer = timer_new_ms(QEMU_CLOCK_REALTIME,
> >>> +                                                     replay_snapshot_timer_cb,
> >>> +                                                     NULL);
> >>> +                timer_mod(replay_snapshot_timer,
> >>> +                          qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
> >>> +                          replay_snapshot_periodic_delay);
> >>> +            }
> >>> +
> >>
> >> Please also delete placeholder comment for the snapshotting timer
> >> in replay_enable function.
> > 
> > Wil do.
> > 
> >>>            } else if (replay_mode == REPLAY_MODE_PLAY) {
> >>>                if (!load_snapshot(replay_snapshot, NULL, false, NULL, &err)) {
> >>>                    error_report_err(err);
> >>> diff --git a/replay/replay.c b/replay/replay.c
> >>> index e64f71209a..fa5930700d 100644
> >>> --- a/replay/replay.c
> >>> +++ b/replay/replay.c
> >>> @@ -29,6 +29,10 @@
> >>>    ReplayMode replay_mode = REPLAY_MODE_NONE;
> >>>    char *replay_snapshot;
> >>>    
> >>> +ReplaySnapshotMode replay_snapshot_mode;
> >>> +uint64_t replay_snapshot_periodic_delay;
> >>> +int replay_snapshot_periodic_nr_keep;
> >>> +
> >>>    /* Name of replay file  */
> >>>    static char *replay_filename;
> >>>    ReplayState replay_state;
> >>> @@ -313,6 +317,27 @@ void replay_configure(QemuOpts *opts)
> >>>        }
> >>>    
> >>>        replay_snapshot = g_strdup(qemu_opt_get(opts, "rrsnapshot"));
> >>> +    if (replay_snapshot && mode == REPLAY_MODE_RECORD) {
> >>
> >> Can such a snapshotting may be useful in replay mode?
> > 
> > Does snapshotting do anything in replay mode? 
>
> Yes, you can create as many snapshots as you want if 'snapshot=on'
> option of the disk image was not used.

Oh, so it will make real snapshots? I just assumed it would
"replay" the motions of the snapshot.

In that case actually that could be useful if your recording
didn't have snapshots and you want to debug it. I'll try it.

Thanks,
Nick


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 4/4] replay: simple auto-snapshot mode for record
  2024-02-28  5:07       ` Pavel Dovgalyuk
  2024-02-29  6:32         ` Nicholas Piggin
@ 2024-03-04 16:33         ` Nicholas Piggin
  1 sibling, 0 replies; 13+ messages in thread
From: Nicholas Piggin @ 2024-03-04 16:33 UTC (permalink / raw)
  To: Pavel Dovgalyuk
  Cc: Paolo Bonzini, John Snow, Cleber Rosa,
	Philippe Mathieu-Daudé,
	Wainer dos Santos Moschetta, Beraldo Leal, qemu-devel

On Wed Feb 28, 2024 at 3:07 PM AEST, Pavel Dovgalyuk wrote:
> On 26.02.2024 10:36, Nicholas Piggin wrote:
> >>> @@ -313,6 +317,27 @@ void replay_configure(QemuOpts *opts)
> >>>        }
> >>>    
> >>>        replay_snapshot = g_strdup(qemu_opt_get(opts, "rrsnapshot"));
> >>> +    if (replay_snapshot && mode == REPLAY_MODE_RECORD) {
> >>
> >> Can such a snapshotting may be useful in replay mode?
> > 
> > Does snapshotting do anything in replay mode? 
>
> Yes, you can create as many snapshots as you want if 'snapshot=on'
> option of the disk image was not used.

Actually something goes wrong and the machine crashes when I try
this, so it's not so simple. It's a nice idea to be able to do
because it really makes reverse debugging much faster, but no time
to look into it further at the moment.

So I will resubmit basically as is, which works for me nicely.
Tweaks can be made or other timer behaviour can be added later,
it's not a hard API with precisely defined semantics. So it would
be nice to get the basic mechanisms merged and tests added.

Thanks,
Nick


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2024-03-04 16:34 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-14 16:31 [PATCH for-8.2 0/4] Assorted replay patches Nicholas Piggin
2023-08-14 16:31 ` [PATCH 1/4] scripts/replay_dump.sh: Update to current rr record format Nicholas Piggin
2023-08-18  4:19   ` Pavel Dovgalyuk
2023-08-14 16:31 ` [PATCH 2/4] tests/avocado: replay_linux.py add replay-dump.py test Nicholas Piggin
2023-08-18  4:23   ` Pavel Dovgalyuk
2023-08-14 16:31 ` [PATCH 3/4] replay: allow runstate shutdown->running when replaying trace Nicholas Piggin
2023-08-18  4:25   ` Pavel Dovgalyuk
2023-08-14 16:31 ` [PATCH 4/4] replay: simple auto-snapshot mode for record Nicholas Piggin
2023-08-18  4:36   ` Pavel Dovgalyuk
2024-02-26  7:36     ` Nicholas Piggin
2024-02-28  5:07       ` Pavel Dovgalyuk
2024-02-29  6:32         ` Nicholas Piggin
2024-03-04 16:33         ` Nicholas Piggin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.