All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/11] record/replay fixes, maybe for 8.2 or for post release stable?
@ 2023-12-05 20:40 Alex Bennée
  2023-12-05 20:40 ` [PATCH 01/11] tests/avocado: add a simple i386 replay kernel test Alex Bennée
                   ` (10 more replies)
  0 siblings, 11 replies; 40+ messages in thread
From: Alex Bennée @ 2023-12-05 20:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: Cleber Rosa, Beraldo Leal, Eduardo Habkost, Richard Henderson,
	Wainer dos Santos Moschetta, Paolo Bonzini,
	Marc-André Lureau, Philippe Mathieu-Daudé,
	John Snow, Pavel Dovgalyuk, Alex Bennée

As I'm a glutton for punishment I thought I'd have a go at fixing the
slowly growing number of record/replay bugs. The two fixes are:

 replay: stop us hanging in rr_wait_io_event
 chardev: force write all when recording replay logs

And the rest is various clean-ups and debugging aids. I don't know if
its worth pushing for 8.2 but it would certainly be nice to get some
review/testing to see if it solves issues for anyone else.

Alex.

Alex Bennée (11):
  tests/avocado: add a simple i386 replay kernel test
  tests/avocado: fix typo in replay_linux
  scripts/replay-dump: update to latest format
  scripts/replay_dump: track total number of instructions
  replay: remove host_clock_last
  replay: add proper kdoc for ReplayState
  replay: make has_unread_data a bool
  replay: introduce a central report point for sync errors
  replay: stop us hanging in rr_wait_io_event
  chardev: force write all when recording replay logs
  tests/avocado: remove skips from replay_kernel

 include/sysemu/replay.h                |  5 ++
 replay/replay-internal.h               | 43 ++++++++++-----
 accel/tcg/tcg-accel-ops-rr.c           |  2 +-
 chardev/char.c                         |  3 +-
 replay/replay-char.c                   |  6 +--
 replay/replay-internal.c               |  5 +-
 replay/replay-snapshot.c               |  6 +--
 replay/replay.c                        | 35 +++++++++++-
 roms/SLOF                              |  2 +-
 scripts/replay-dump.py                 | 75 +++++++++++++++++++++++---
 tests/avocado/replay_kernel.py         | 25 +++++----
 tests/avocado/replay_linux.py          |  2 +-
 tests/tcg/i386/Makefile.softmmu-target | 19 +++++++
 13 files changed, 185 insertions(+), 43 deletions(-)

-- 
2.39.2



^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 01/11] tests/avocado: add a simple i386 replay kernel test
  2023-12-05 20:40 [PATCH 00/11] record/replay fixes, maybe for 8.2 or for post release stable? Alex Bennée
@ 2023-12-05 20:40 ` Alex Bennée
  2023-12-06 16:21   ` Richard Henderson
  2023-12-07  8:20   ` Pavel Dovgalyuk
  2023-12-05 20:40 ` [PATCH 02/11] tests/avocado: fix typo in replay_linux Alex Bennée
                   ` (9 subsequent siblings)
  10 siblings, 2 replies; 40+ messages in thread
From: Alex Bennée @ 2023-12-05 20:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: Cleber Rosa, Beraldo Leal, Eduardo Habkost, Richard Henderson,
	Wainer dos Santos Moschetta, Paolo Bonzini,
	Marc-André Lureau, Philippe Mathieu-Daudé,
	John Snow, Pavel Dovgalyuk, Alex Bennée

There are a number of bugs against 32 bit x86 on the tracker. Lets at
least establish a baseline pure kernel boot can do record/replay
before we start looking at the devices.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 tests/avocado/replay_kernel.py | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/tests/avocado/replay_kernel.py b/tests/avocado/replay_kernel.py
index c37afa662c..1eaa36444c 100644
--- a/tests/avocado/replay_kernel.py
+++ b/tests/avocado/replay_kernel.py
@@ -82,6 +82,22 @@ def run_rr(self, kernel_path, kernel_command_line, console_pattern,
 
 class ReplayKernelNormal(ReplayKernelBase):
 
+    def test_i386_pc(self):
+        """
+        :avocado: tags=arch:i386
+        :avocado: tags=machine:pc
+        """
+        kernel_url = ('https://storage.tuxboot.com/20230331/i386/bzImage')
+        kernel_hash = 'a3e5b32a354729e65910f5a1ffcda7c14a6c12a55e8213fb86e277f1b76ed956'
+        kernel_path = self.fetch_asset(kernel_url,
+                                       asset_hash=kernel_hash,
+                                       algorithm = "sha256")
+
+        kernel_command_line = self.KERNEL_COMMON_COMMAND_LINE + 'console=ttyS0'
+        console_pattern = 'VFS: Cannot open root device'
+
+        self.run_rr(kernel_path, kernel_command_line, console_pattern, shift=5)
+
     # See https://gitlab.com/qemu-project/qemu/-/issues/2010
     @skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test sometimes gets stuck')
     def test_x86_64_pc(self):
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 02/11] tests/avocado: fix typo in replay_linux
  2023-12-05 20:40 [PATCH 00/11] record/replay fixes, maybe for 8.2 or for post release stable? Alex Bennée
  2023-12-05 20:40 ` [PATCH 01/11] tests/avocado: add a simple i386 replay kernel test Alex Bennée
@ 2023-12-05 20:40 ` Alex Bennée
  2023-12-06 11:24   ` Philippe Mathieu-Daudé
                     ` (2 more replies)
  2023-12-05 20:40 ` [PATCH 03/11] scripts/replay-dump: update to latest format Alex Bennée
                   ` (8 subsequent siblings)
  10 siblings, 3 replies; 40+ messages in thread
From: Alex Bennée @ 2023-12-05 20:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: Cleber Rosa, Beraldo Leal, Eduardo Habkost, Richard Henderson,
	Wainer dos Santos Moschetta, Paolo Bonzini,
	Marc-André Lureau, Philippe Mathieu-Daudé,
	John Snow, Pavel Dovgalyuk, Alex Bennée

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 tests/avocado/replay_linux.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/avocado/replay_linux.py b/tests/avocado/replay_linux.py
index 270ccc1eae..e95bff3299 100644
--- a/tests/avocado/replay_linux.py
+++ b/tests/avocado/replay_linux.py
@@ -94,7 +94,7 @@ def launch_and_wait(self, record, args, shift):
         else:
             vm.event_wait('SHUTDOWN', self.timeout)
             vm.wait()
-            logger.info('successfully fihished the replay')
+            logger.info('successfully finished the replay')
         elapsed = time.time() - start_time
         logger.info('elapsed time %.2f sec' % elapsed)
         return elapsed
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 03/11] scripts/replay-dump: update to latest format
  2023-12-05 20:40 [PATCH 00/11] record/replay fixes, maybe for 8.2 or for post release stable? Alex Bennée
  2023-12-05 20:40 ` [PATCH 01/11] tests/avocado: add a simple i386 replay kernel test Alex Bennée
  2023-12-05 20:40 ` [PATCH 02/11] tests/avocado: fix typo in replay_linux Alex Bennée
@ 2023-12-05 20:40 ` Alex Bennée
  2023-12-06 16:29   ` Richard Henderson
  2023-12-05 20:40 ` [PATCH 04/11] scripts/replay_dump: track total number of instructions Alex Bennée
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 40+ messages in thread
From: Alex Bennée @ 2023-12-05 20:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: Cleber Rosa, Beraldo Leal, Eduardo Habkost, Richard Henderson,
	Wainer dos Santos Moschetta, Paolo Bonzini,
	Marc-André Lureau, Philippe Mathieu-Daudé,
	John Snow, Pavel Dovgalyuk, Alex Bennée

To help debugging replay logs I've implemented decode_plain and
decode_char_write as well as put in a new table for the current format
of log.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 scripts/replay-dump.py | 70 ++++++++++++++++++++++++++++++++++++++----
 1 file changed, 64 insertions(+), 6 deletions(-)

diff --git a/scripts/replay-dump.py b/scripts/replay-dump.py
index b89dc29555..8b9f914534 100755
--- a/scripts/replay-dump.py
+++ b/scripts/replay-dump.py
@@ -115,6 +115,11 @@ def decode_unimp(eid, name, _unused_dumpfile):
     print("%s not handled - will now stop" % (name))
     return False
 
+def decode_plain(eid, name, _unused_dumpfile):
+    "Plain events without additional data"
+    print_event(eid, name, "no data")
+    return True
+
 # Checkpoint decoder
 def swallow_async_qword(eid, name, dumpfile):
     "Swallow a qword of data without looking at it"
@@ -151,6 +156,12 @@ def decode_instruction(eid, name, dumpfile):
     print_event(eid, name, "0x%x" % (ins_diff))
     return True
 
+def decode_char_write(eid, name, dumpfile):
+    res = read_dword(dumpfile)
+    offset = read_dword(dumpfile)
+    print_event(eid, name, "%d -> %d" % (offset, res))
+    return True
+
 def decode_audio_out(eid, name, dumpfile):
     audio_data = read_dword(dumpfile)
     print_event(eid, name, "%d" % (audio_data))
@@ -193,10 +204,10 @@ def decode_clock(eid, name, dumpfile):
 # pre-MTTCG merge
 v5_event_table = [Decoder(0, "EVENT_INSTRUCTION", decode_instruction),
                   Decoder(1, "EVENT_INTERRUPT", decode_interrupt),
-                  Decoder(2, "EVENT_EXCEPTION", decode_unimp),
+                  Decoder(2, "EVENT_EXCEPTION", decode_plain),
                   Decoder(3, "EVENT_ASYNC", decode_async),
                   Decoder(4, "EVENT_SHUTDOWN", decode_unimp),
-                  Decoder(5, "EVENT_CHAR_WRITE", decode_unimp),
+                  Decoder(5, "EVENT_CHAR_WRITE", decode_char_write),
                   Decoder(6, "EVENT_CHAR_READ_ALL", decode_unimp),
                   Decoder(7, "EVENT_CHAR_READ_ALL_ERROR", decode_unimp),
                   Decoder(8, "EVENT_CLOCK_HOST", decode_clock),
@@ -215,10 +226,10 @@ def decode_clock(eid, name, dumpfile):
 # post-MTTCG merge, AUDIO support added
 v6_event_table = [Decoder(0, "EVENT_INSTRUCTION", decode_instruction),
                   Decoder(1, "EVENT_INTERRUPT", decode_interrupt),
-                  Decoder(2, "EVENT_EXCEPTION", decode_unimp),
+                  Decoder(2, "EVENT_EXCEPTION", decode_plain),
                   Decoder(3, "EVENT_ASYNC", decode_async),
                   Decoder(4, "EVENT_SHUTDOWN", decode_unimp),
-                  Decoder(5, "EVENT_CHAR_WRITE", decode_unimp),
+                  Decoder(5, "EVENT_CHAR_WRITE", decode_char_write),
                   Decoder(6, "EVENT_CHAR_READ_ALL", decode_unimp),
                   Decoder(7, "EVENT_CHAR_READ_ALL_ERROR", decode_unimp),
                   Decoder(8, "EVENT_AUDIO_OUT", decode_audio_out),
@@ -250,7 +261,7 @@ def decode_clock(eid, name, dumpfile):
                   Decoder(10, "EVENT_SHUTDOWN_GUEST_RESET", decode_unimp),
                   Decoder(11, "EVENT_SHUTDOWN_GUEST_PANIC", decode_unimp),
                   Decoder(12, "EVENT_SHUTDOWN___MAX", decode_unimp),
-                  Decoder(13, "EVENT_CHAR_WRITE", decode_unimp),
+                  Decoder(13, "EVENT_CHAR_WRITE", decode_char_write),
                   Decoder(14, "EVENT_CHAR_READ_ALL", decode_unimp),
                   Decoder(15, "EVENT_CHAR_READ_ALL_ERROR", decode_unimp),
                   Decoder(16, "EVENT_AUDIO_OUT", decode_audio_out),
@@ -268,6 +279,49 @@ def decode_clock(eid, name, dumpfile):
                   Decoder(28, "EVENT_CP_RESET", decode_checkpoint),
 ]
 
+# Shutdown cause added
+v12_event_table = [Decoder(0, "EVENT_INSTRUCTION", decode_instruction),
+                  Decoder(1, "EVENT_INTERRUPT", decode_interrupt),
+                  Decoder(2, "EVENT_EXCEPTION", decode_plain),
+                  Decoder(3, "EVENT_ASYNC", decode_async),
+                  Decoder(4, "EVENT_ASYNC", decode_async),
+                  Decoder(5, "EVENT_ASYNC", decode_async),
+                  Decoder(6, "EVENT_ASYNC", decode_async),
+                  Decoder(6, "EVENT_ASYNC", decode_async),
+                  Decoder(8, "EVENT_ASYNC", decode_async),
+                  Decoder(9, "EVENT_ASYNC", decode_async),
+                  Decoder(10, "EVENT_ASYNC", decode_async),
+                  Decoder(11, "EVENT_SHUTDOWN", decode_unimp),
+                  Decoder(12, "EVENT_SHUTDOWN_HOST_ERR", decode_unimp),
+                  Decoder(13, "EVENT_SHUTDOWN_HOST_QMP_QUIT", decode_unimp),
+                  Decoder(14, "EVENT_SHUTDOWN_HOST_QMP_RESET", decode_unimp),
+                  Decoder(14, "EVENT_SHUTDOWN_HOST_SIGNAL", decode_unimp),
+                  Decoder(15, "EVENT_SHUTDOWN_HOST_UI", decode_unimp),
+                  Decoder(16, "EVENT_SHUTDOWN_GUEST_SHUTDOWN", decode_unimp),
+                  Decoder(17, "EVENT_SHUTDOWN_GUEST_RESET", decode_unimp),
+                  Decoder(18, "EVENT_SHUTDOWN_GUEST_PANIC", decode_unimp),
+                  Decoder(19, "EVENT_SHUTDOWN_GUEST_SUBSYSTEM_RESET", decode_unimp),
+                  Decoder(20, "EVENT_SHUTDOWN_GUEST_SNAPSHOT_LOAD", decode_unimp),
+                  Decoder(21, "EVENT_SHUTDOWN___MAX", decode_unimp),
+                  Decoder(22, "EVENT_CHAR_WRITE", decode_char_write),
+                  Decoder(23, "EVENT_CHAR_READ_ALL", decode_unimp),
+                  Decoder(24, "EVENT_CHAR_READ_ALL_ERROR", decode_unimp),
+                  Decoder(25, "EVENT_AUDIO_IN", decode_unimp),
+                  Decoder(26, "EVENT_AUDIO_OUT", decode_audio_out),
+                  Decoder(27, "EVENT_RANDOM", decode_unimp),
+                  Decoder(28, "EVENT_CLOCK_HOST", decode_clock),
+                  Decoder(29, "EVENT_CLOCK_VIRTUAL_RT", decode_clock),
+                  Decoder(30, "EVENT_CP_CLOCK_WARP_START", decode_checkpoint),
+                  Decoder(31, "EVENT_CP_CLOCK_WARP_ACCOUNT", decode_checkpoint),
+                  Decoder(32, "EVENT_CP_RESET_REQUESTED", decode_checkpoint),
+                  Decoder(33, "EVENT_CP_SUSPEND_REQUESTED", decode_checkpoint),
+                  Decoder(34, "EVENT_CP_CLOCK_VIRTUAL", decode_checkpoint),
+                  Decoder(35, "EVENT_CP_CLOCK_HOST", decode_checkpoint),
+                  Decoder(36, "EVENT_CP_CLOCK_VIRTUAL_RT", decode_checkpoint),
+                  Decoder(37, "EVENT_CP_INIT", decode_checkpoint_init),
+                  Decoder(38, "EVENT_CP_RESET", decode_checkpoint),
+]
+
 def parse_arguments():
     "Grab arguments for script"
     parser = argparse.ArgumentParser()
@@ -283,9 +337,13 @@ def decode_file(filename):
     version = read_dword(dumpfile)
     junk = read_qword(dumpfile)
 
+    # see REPLAY_VERSION
     print("HEADER: version 0x%x" % (version))
 
-    if version == 0xe02007:
+    if version == 0xe0200c:
+        event_decode_table = v12_event_table
+        replay_state.checkpoint_start = 30
+    elif version == 0xe02007:
         event_decode_table = v7_event_table
         replay_state.checkpoint_start = 12
     elif version == 0xe02006:
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 04/11] scripts/replay_dump: track total number of instructions
  2023-12-05 20:40 [PATCH 00/11] record/replay fixes, maybe for 8.2 or for post release stable? Alex Bennée
                   ` (2 preceding siblings ...)
  2023-12-05 20:40 ` [PATCH 03/11] scripts/replay-dump: update to latest format Alex Bennée
@ 2023-12-05 20:40 ` Alex Bennée
  2023-12-06 11:25   ` Philippe Mathieu-Daudé
                     ` (2 more replies)
  2023-12-05 20:41 ` [PATCH 05/11] replay: remove host_clock_last Alex Bennée
                   ` (6 subsequent siblings)
  10 siblings, 3 replies; 40+ messages in thread
From: Alex Bennée @ 2023-12-05 20:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: Cleber Rosa, Beraldo Leal, Eduardo Habkost, Richard Henderson,
	Wainer dos Santos Moschetta, Paolo Bonzini,
	Marc-André Lureau, Philippe Mathieu-Daudé,
	John Snow, Pavel Dovgalyuk, Alex Bennée

This will help in tracking where we are in the stream when debugging.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 scripts/replay-dump.py | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/scripts/replay-dump.py b/scripts/replay-dump.py
index 8b9f914534..2212b09322 100755
--- a/scripts/replay-dump.py
+++ b/scripts/replay-dump.py
@@ -150,10 +150,13 @@ def decode_async(eid, name, dumpfile):
 
     return call_decode(async_decode_table, async_event_kind, dumpfile)
 
+total_insns = 0
 
 def decode_instruction(eid, name, dumpfile):
+    global total_insns
     ins_diff = read_dword(dumpfile)
-    print_event(eid, name, "0x%x" % (ins_diff))
+    total_insns += ins_diff
+    print_event(eid, name, "+ %d -> %d" % (ins_diff, total_insns))
     return True
 
 def decode_char_write(eid, name, dumpfile):
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 05/11] replay: remove host_clock_last
  2023-12-05 20:40 [PATCH 00/11] record/replay fixes, maybe for 8.2 or for post release stable? Alex Bennée
                   ` (3 preceding siblings ...)
  2023-12-05 20:40 ` [PATCH 04/11] scripts/replay_dump: track total number of instructions Alex Bennée
@ 2023-12-05 20:41 ` Alex Bennée
  2023-12-06 11:33   ` Philippe Mathieu-Daudé
                     ` (2 more replies)
  2023-12-05 20:41 ` [PATCH 06/11] replay: add proper kdoc for ReplayState Alex Bennée
                   ` (5 subsequent siblings)
  10 siblings, 3 replies; 40+ messages in thread
From: Alex Bennée @ 2023-12-05 20:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: Cleber Rosa, Beraldo Leal, Eduardo Habkost, Richard Henderson,
	Wainer dos Santos Moschetta, Paolo Bonzini,
	Marc-André Lureau, Philippe Mathieu-Daudé,
	John Snow, Pavel Dovgalyuk, Alex Bennée

Fixes: a02fe2ca70 (replay: Remove host_clock_last)
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 replay/replay-internal.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/replay/replay-internal.h b/replay/replay-internal.h
index b6836354ac..516147ddbc 100644
--- a/replay/replay-internal.h
+++ b/replay/replay-internal.h
@@ -80,8 +80,6 @@ typedef struct ReplayState {
         This counter is global, because requests from different
         block devices should not get overlapping ids. */
     uint64_t block_request_id;
-    /*! Prior value of the host clock */
-    uint64_t host_clock_last;
     /*! Asynchronous event id read from the log */
     uint64_t read_event_id;
 } ReplayState;
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 06/11] replay: add proper kdoc for ReplayState
  2023-12-05 20:40 [PATCH 00/11] record/replay fixes, maybe for 8.2 or for post release stable? Alex Bennée
                   ` (4 preceding siblings ...)
  2023-12-05 20:41 ` [PATCH 05/11] replay: remove host_clock_last Alex Bennée
@ 2023-12-05 20:41 ` Alex Bennée
  2023-12-06 11:27   ` Philippe Mathieu-Daudé
  2023-12-07  8:38   ` Pavel Dovgalyuk
  2023-12-05 20:41 ` [PATCH 07/11] replay: make has_unread_data a bool Alex Bennée
                   ` (4 subsequent siblings)
  10 siblings, 2 replies; 40+ messages in thread
From: Alex Bennée @ 2023-12-05 20:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: Cleber Rosa, Beraldo Leal, Eduardo Habkost, Richard Henderson,
	Wainer dos Santos Moschetta, Paolo Bonzini,
	Marc-André Lureau, Philippe Mathieu-Daudé,
	John Snow, Pavel Dovgalyuk, Alex Bennée

Remove the non-standard comment formatting and move the descriptions
into a proper kdoc comment.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 replay/replay-internal.h               | 27 ++++++++++++++++----------
 roms/SLOF                              |  2 +-
 tests/tcg/i386/Makefile.softmmu-target | 19 ++++++++++++++++++
 3 files changed, 37 insertions(+), 11 deletions(-)

diff --git a/replay/replay-internal.h b/replay/replay-internal.h
index 516147ddbc..98ca3748ed 100644
--- a/replay/replay-internal.h
+++ b/replay/replay-internal.h
@@ -63,24 +63,31 @@ enum ReplayEvents {
     EVENT_COUNT
 };
 
+/**
+ * typedef ReplayState - global tracking Replay state
+ *
+ * This structure tracks where we are in the current ReplayState
+ * including the logged events from the recorded replay stream. Some
+ * of the data is also stored/restored from VMStateDescription when VM
+ * save/restore events take place.
+ *
+ * @cached_clock: Cached clocks values
+ * @current_icount: number of processed instructions
+ * @instruction_count: number of instructions until next event
+ * @data_kind: current event
+ * @has_unread_data: 1 if event not yet processed
+ * @file_offset: offset into replay log at replay snapshot
+ * @block_request_id: current serialised block request id
+ * @read_event_id: current async read event id
+ */
 typedef struct ReplayState {
-    /*! Cached clock values. */
     int64_t cached_clock[REPLAY_CLOCK_COUNT];
-    /*! Current icount - number of processed instructions. */
     uint64_t current_icount;
-    /*! Number of instructions to be executed before other events happen. */
     int instruction_count;
-    /*! Type of the currently executed event. */
     unsigned int data_kind;
-    /*! Flag which indicates that event is not processed yet. */
     unsigned int has_unread_data;
-    /*! Temporary variable for saving current log offset. */
     uint64_t file_offset;
-    /*! Next block operation id.
-        This counter is global, because requests from different
-        block devices should not get overlapping ids. */
     uint64_t block_request_id;
-    /*! Asynchronous event id read from the log */
     uint64_t read_event_id;
 } ReplayState;
 extern ReplayState replay_state;
diff --git a/roms/SLOF b/roms/SLOF
index 3a259df244..6b6c16b4b4 160000
--- a/roms/SLOF
+++ b/roms/SLOF
@@ -1 +1 @@
-Subproject commit 3a259df2449fc4a4e43ab5f33f0b2c66484b4bc3
+Subproject commit 6b6c16b4b40763507cf1f518096f3c3883c5cf2d
diff --git a/tests/tcg/i386/Makefile.softmmu-target b/tests/tcg/i386/Makefile.softmmu-target
index 5266f2335a..b9bef72dcf 100644
--- a/tests/tcg/i386/Makefile.softmmu-target
+++ b/tests/tcg/i386/Makefile.softmmu-target
@@ -33,5 +33,24 @@ EXTRA_RUNS+=$(MULTIARCH_RUNS)
 
 memory: CFLAGS+=-DCHECK_UNALIGNED=1
 
+# Simple Record/Replay Test
+.PHONY: memory-record
+run-memory-record: memory-record memory
+	$(call run-test, $<, \
+	  $(QEMU) -monitor none -display none \
+		  -chardev file$(COMMA)path=$<.out$(COMMA)id=output \
+		  -icount shift=5$(COMMA)rr=record$(COMMA)rrfile=record.bin \
+		  $(QEMU_OPTS) memory)
+
+.PHONY: memory-replay
+run-memory-replay: memory-replay run-memory-record
+	$(call run-test, $<, \
+	  $(QEMU) -monitor none -display none \
+		  -chardev file$(COMMA)path=$<.out$(COMMA)id=output \
+		  -icount shift=5$(COMMA)rr=replay$(COMMA)rrfile=record.bin \
+		  $(QEMU_OPTS) memory)
+
+EXTRA_RUNS+=run-memory-replay
+
 # Running
 QEMU_OPTS+=-device isa-debugcon,chardev=output -device isa-debug-exit,iobase=0xf4,iosize=0x4 -kernel
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 07/11] replay: make has_unread_data a bool
  2023-12-05 20:40 [PATCH 00/11] record/replay fixes, maybe for 8.2 or for post release stable? Alex Bennée
                   ` (5 preceding siblings ...)
  2023-12-05 20:41 ` [PATCH 06/11] replay: add proper kdoc for ReplayState Alex Bennée
@ 2023-12-05 20:41 ` Alex Bennée
  2023-12-06 11:31   ` Philippe Mathieu-Daudé
                     ` (2 more replies)
  2023-12-05 20:41 ` [PATCH 08/11] replay: introduce a central report point for sync errors Alex Bennée
                   ` (3 subsequent siblings)
  10 siblings, 3 replies; 40+ messages in thread
From: Alex Bennée @ 2023-12-05 20:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: Cleber Rosa, Beraldo Leal, Eduardo Habkost, Richard Henderson,
	Wainer dos Santos Moschetta, Paolo Bonzini,
	Marc-André Lureau, Philippe Mathieu-Daudé,
	John Snow, Pavel Dovgalyuk, Alex Bennée

For clarity given it only has two states.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 replay/replay-internal.h | 4 ++--
 replay/replay-internal.c | 4 ++--
 replay/replay-snapshot.c | 6 +++---
 replay/replay.c          | 2 +-
 4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/replay/replay-internal.h b/replay/replay-internal.h
index 98ca3748ed..1bc8fd5086 100644
--- a/replay/replay-internal.h
+++ b/replay/replay-internal.h
@@ -75,7 +75,7 @@ enum ReplayEvents {
  * @current_icount: number of processed instructions
  * @instruction_count: number of instructions until next event
  * @data_kind: current event
- * @has_unread_data: 1 if event not yet processed
+ * @has_unread_data: true if event not yet processed
  * @file_offset: offset into replay log at replay snapshot
  * @block_request_id: current serialised block request id
  * @read_event_id: current async read event id
@@ -85,7 +85,7 @@ typedef struct ReplayState {
     uint64_t current_icount;
     int instruction_count;
     unsigned int data_kind;
-    unsigned int has_unread_data;
+    bool has_unread_data;
     uint64_t file_offset;
     uint64_t block_request_id;
     uint64_t read_event_id;
diff --git a/replay/replay-internal.c b/replay/replay-internal.c
index 77d0c82327..634025096e 100644
--- a/replay/replay-internal.c
+++ b/replay/replay-internal.c
@@ -179,7 +179,7 @@ void replay_fetch_data_kind(void)
                 replay_state.instruction_count = replay_get_dword();
             }
             replay_check_error();
-            replay_state.has_unread_data = 1;
+            replay_state.has_unread_data = true;
             if (replay_state.data_kind >= EVENT_COUNT) {
                 error_report("Replay: unknown event kind %d",
                              replay_state.data_kind);
@@ -191,7 +191,7 @@ void replay_fetch_data_kind(void)
 
 void replay_finish_event(void)
 {
-    replay_state.has_unread_data = 0;
+    replay_state.has_unread_data = false;
     replay_fetch_data_kind();
 }
 
diff --git a/replay/replay-snapshot.c b/replay/replay-snapshot.c
index 10a7cf7992..bf75c2ed28 100644
--- a/replay/replay-snapshot.c
+++ b/replay/replay-snapshot.c
@@ -47,8 +47,8 @@ static int replay_post_load(void *opaque, int version_id)
 
 static const VMStateDescription vmstate_replay = {
     .name = "replay",
-    .version_id = 2,
-    .minimum_version_id = 2,
+    .version_id = 3,
+    .minimum_version_id = 3,
     .pre_save = replay_pre_save,
     .post_load = replay_post_load,
     .fields = (VMStateField[]) {
@@ -56,7 +56,7 @@ static const VMStateDescription vmstate_replay = {
         VMSTATE_UINT64(current_icount, ReplayState),
         VMSTATE_INT32(instruction_count, ReplayState),
         VMSTATE_UINT32(data_kind, ReplayState),
-        VMSTATE_UINT32(has_unread_data, ReplayState),
+        VMSTATE_BOOL(has_unread_data, ReplayState),
         VMSTATE_UINT64(file_offset, ReplayState),
         VMSTATE_UINT64(block_request_id, ReplayState),
         VMSTATE_UINT64(read_event_id, ReplayState),
diff --git a/replay/replay.c b/replay/replay.c
index 0f7d766efe..d729214197 100644
--- a/replay/replay.c
+++ b/replay/replay.c
@@ -258,7 +258,7 @@ static void replay_enable(const char *fname, int mode)
     replay_state.data_kind = -1;
     replay_state.instruction_count = 0;
     replay_state.current_icount = 0;
-    replay_state.has_unread_data = 0;
+    replay_state.has_unread_data = false;
 
     /* skip file header for RECORD and check it for PLAY */
     if (replay_mode == REPLAY_MODE_RECORD) {
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 08/11] replay: introduce a central report point for sync errors
  2023-12-05 20:40 [PATCH 00/11] record/replay fixes, maybe for 8.2 or for post release stable? Alex Bennée
                   ` (6 preceding siblings ...)
  2023-12-05 20:41 ` [PATCH 07/11] replay: make has_unread_data a bool Alex Bennée
@ 2023-12-05 20:41 ` Alex Bennée
  2023-12-06 11:35   ` Philippe Mathieu-Daudé
  2023-12-06 16:47   ` Richard Henderson
  2023-12-05 20:41 ` [PATCH 09/11] replay: stop us hanging in rr_wait_io_event Alex Bennée
                   ` (2 subsequent siblings)
  10 siblings, 2 replies; 40+ messages in thread
From: Alex Bennée @ 2023-12-05 20:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: Cleber Rosa, Beraldo Leal, Eduardo Habkost, Richard Henderson,
	Wainer dos Santos Moschetta, Paolo Bonzini,
	Marc-André Lureau, Philippe Mathieu-Daudé,
	John Snow, Pavel Dovgalyuk, Alex Bennée

Figuring out why replay has failed is tricky at the best of times.
Lets centralise the reporting of a replay sync error and add a little
bit of extra information to help with debugging.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 replay/replay-internal.h | 12 ++++++++++++
 replay/replay-char.c     |  6 ++----
 replay/replay-internal.c |  1 +
 replay/replay.c          |  9 +++++++++
 4 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/replay/replay-internal.h b/replay/replay-internal.h
index 1bc8fd5086..709e2eb4cb 100644
--- a/replay/replay-internal.h
+++ b/replay/replay-internal.h
@@ -74,6 +74,7 @@ enum ReplayEvents {
  * @cached_clock: Cached clocks values
  * @current_icount: number of processed instructions
  * @instruction_count: number of instructions until next event
+ * @current_event: current event index
  * @data_kind: current event
  * @has_unread_data: true if event not yet processed
  * @file_offset: offset into replay log at replay snapshot
@@ -84,6 +85,7 @@ typedef struct ReplayState {
     int64_t cached_clock[REPLAY_CLOCK_COUNT];
     uint64_t current_icount;
     int instruction_count;
+    unsigned int current_event;
     unsigned int data_kind;
     bool has_unread_data;
     uint64_t file_offset;
@@ -188,6 +190,16 @@ void replay_event_net_save(void *opaque);
 /*! Reads network from the file. */
 void *replay_event_net_load(void);
 
+/* Diagnostics */
+
+/**
+ * replay_sync_error(): report sync error and exit
+ *
+ * When we reach an error condition we want to report it centrally so
+ * we can also dump some useful information into the logs.
+ */
+G_NORETURN void replay_sync_error(const char *error);
+
 /* VMState-related functions */
 
 /* Registers replay VMState.
diff --git a/replay/replay-char.c b/replay/replay-char.c
index a31aded032..72b1f832dd 100644
--- a/replay/replay-char.c
+++ b/replay/replay-char.c
@@ -113,8 +113,7 @@ void replay_char_write_event_load(int *res, int *offset)
         *offset = replay_get_dword();
         replay_finish_event();
     } else {
-        error_report("Missing character write event in the replay log");
-        exit(1);
+        replay_sync_error("Missing character write event in the replay log");
     }
 }
 
@@ -135,8 +134,7 @@ int replay_char_read_all_load(uint8_t *buf)
         replay_finish_event();
         return res;
     } else {
-        error_report("Missing character read all event in the replay log");
-        exit(1);
+        replay_sync_error("Missing character read all event in the replay log");
     }
 }
 
diff --git a/replay/replay-internal.c b/replay/replay-internal.c
index 634025096e..654b99cfb5 100644
--- a/replay/replay-internal.c
+++ b/replay/replay-internal.c
@@ -175,6 +175,7 @@ void replay_fetch_data_kind(void)
     if (replay_file) {
         if (!replay_state.has_unread_data) {
             replay_state.data_kind = replay_get_byte();
+            replay_state.current_event++;
             if (replay_state.data_kind == EVENT_INSTRUCTION) {
                 replay_state.instruction_count = replay_get_dword();
             }
diff --git a/replay/replay.c b/replay/replay.c
index d729214197..e83c01285c 100644
--- a/replay/replay.c
+++ b/replay/replay.c
@@ -226,6 +226,14 @@ bool replay_has_event(void)
     return res;
 }
 
+G_NORETURN void replay_sync_error(const char *error)
+{
+    error_report("%s (insn total %"PRId64"/%d left, event %u/%u)", error,
+                 replay_state.current_icount, replay_state.instruction_count,
+                 replay_state.current_event, replay_state.data_kind);
+    abort();
+}
+
 static void replay_enable(const char *fname, int mode)
 {
     const char *fmode = NULL;
@@ -258,6 +266,7 @@ static void replay_enable(const char *fname, int mode)
     replay_state.data_kind = -1;
     replay_state.instruction_count = 0;
     replay_state.current_icount = 0;
+    replay_state.current_event = 0;
     replay_state.has_unread_data = false;
 
     /* skip file header for RECORD and check it for PLAY */
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 09/11] replay: stop us hanging in rr_wait_io_event
  2023-12-05 20:40 [PATCH 00/11] record/replay fixes, maybe for 8.2 or for post release stable? Alex Bennée
                   ` (7 preceding siblings ...)
  2023-12-05 20:41 ` [PATCH 08/11] replay: introduce a central report point for sync errors Alex Bennée
@ 2023-12-05 20:41 ` Alex Bennée
  2023-12-06 16:51   ` Richard Henderson
  2023-12-08  8:32   ` Pavel Dovgalyuk
  2023-12-05 20:41 ` [PATCH 10/11] chardev: force write all when recording replay logs Alex Bennée
  2023-12-05 20:41 ` [PATCH 11/11] tests/avocado: remove skips from replay_kernel Alex Bennée
  10 siblings, 2 replies; 40+ messages in thread
From: Alex Bennée @ 2023-12-05 20:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: Cleber Rosa, Beraldo Leal, Eduardo Habkost, Richard Henderson,
	Wainer dos Santos Moschetta, Paolo Bonzini,
	Marc-André Lureau, Philippe Mathieu-Daudé,
	John Snow, Pavel Dovgalyuk, Alex Bennée

A lot of the hang I see are when we end up spinning in
rr_wait_io_event for an event that will never come in playback. As a
new check functions which can see if we are in PLAY mode and kick us
us the wait function so the event can be processed.

This fixes most of the failures in replay_kernel.py

Fixes: https://gitlab.com/qemu-project/qemu/-/issues/2013
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Cc: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
---
 include/sysemu/replay.h      |  5 +++++
 accel/tcg/tcg-accel-ops-rr.c |  2 +-
 replay/replay.c              | 24 ++++++++++++++++++++++++
 3 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/include/sysemu/replay.h b/include/sysemu/replay.h
index 08aae5869f..83995ae4bd 100644
--- a/include/sysemu/replay.h
+++ b/include/sysemu/replay.h
@@ -70,6 +70,11 @@ int replay_get_instructions(void);
 /*! Updates instructions counter in replay mode. */
 void replay_account_executed_instructions(void);
 
+/**
+ * replay_can_wait: check if we should pause for wait-io
+ */
+bool replay_can_wait(void);
+
 /* Processing clocks and other time sources */
 
 /*! Save the specified clock */
diff --git a/accel/tcg/tcg-accel-ops-rr.c b/accel/tcg/tcg-accel-ops-rr.c
index 611932f3c3..825e35b3dc 100644
--- a/accel/tcg/tcg-accel-ops-rr.c
+++ b/accel/tcg/tcg-accel-ops-rr.c
@@ -109,7 +109,7 @@ static void rr_wait_io_event(void)
 {
     CPUState *cpu;
 
-    while (all_cpu_threads_idle()) {
+    while (all_cpu_threads_idle() && replay_can_wait()) {
         rr_stop_kick_timer();
         qemu_cond_wait_iothread(first_cpu->halt_cond);
     }
diff --git a/replay/replay.c b/replay/replay.c
index e83c01285c..042a6a9636 100644
--- a/replay/replay.c
+++ b/replay/replay.c
@@ -347,6 +347,30 @@ void replay_start(void)
     replay_enable_events();
 }
 
+/*
+ * For none/record the answer is yes.
+ */
+bool replay_can_wait(void)
+{
+    if (replay_mode == REPLAY_MODE_PLAY) {
+        /*
+         * For playback we shouldn't ever be at a point we wait. If
+         * the instruction count has reached zero and we have an
+         * unconsumed event we should go around again and consume it.
+         */
+        if (replay_state.instruction_count == 0 && replay_state.has_unread_data) {
+            return false;
+        } else {
+            fprintf(stderr, "Error: Invalid replay state\n");
+            fprintf(stderr,"instruction_count = %d, has = %d, event_kind = %d\n",
+                    replay_state.instruction_count, replay_state.has_unread_data, replay_state.data_kind);
+            abort();
+        }
+    }
+    return true;
+}
+
+
 void replay_finish(void)
 {
     if (replay_mode == REPLAY_MODE_NONE) {
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 10/11] chardev: force write all when recording replay logs
  2023-12-05 20:40 [PATCH 00/11] record/replay fixes, maybe for 8.2 or for post release stable? Alex Bennée
                   ` (8 preceding siblings ...)
  2023-12-05 20:41 ` [PATCH 09/11] replay: stop us hanging in rr_wait_io_event Alex Bennée
@ 2023-12-05 20:41 ` Alex Bennée
  2023-12-06 14:25   ` Philippe Mathieu-Daudé
  2023-12-07  8:46   ` Pavel Dovgalyuk
  2023-12-05 20:41 ` [PATCH 11/11] tests/avocado: remove skips from replay_kernel Alex Bennée
  10 siblings, 2 replies; 40+ messages in thread
From: Alex Bennée @ 2023-12-05 20:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: Cleber Rosa, Beraldo Leal, Eduardo Habkost, Richard Henderson,
	Wainer dos Santos Moschetta, Paolo Bonzini,
	Marc-André Lureau, Philippe Mathieu-Daudé,
	John Snow, Pavel Dovgalyuk, Alex Bennée

This is mostly a problem within avocado as serial generally isn't busy
enough to overfill pipes. However the consequences of recording a
failed write will haunt us on replay when causing the log to go out of
sync.

Fixes: https://gitlab.com/qemu-project/qemu/-/issues/2010
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Cc: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
---
 chardev/char.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/chardev/char.c b/chardev/char.c
index 996a024c7a..6e5b4d7345 100644
--- a/chardev/char.c
+++ b/chardev/char.c
@@ -171,7 +171,8 @@ int qemu_chr_write(Chardev *s, const uint8_t *buf, int len, bool write_all)
         return res;
     }
 
-    res = qemu_chr_write_buffer(s, buf, len, &offset, write_all);
+    res = qemu_chr_write_buffer(s, buf, len, &offset,
+                                replay_mode == REPLAY_MODE_RECORD ? true : write_all);
 
     if (qemu_chr_replay(s) && replay_mode == REPLAY_MODE_RECORD) {
         replay_char_write_event_save(res, offset);
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 11/11] tests/avocado: remove skips from replay_kernel
  2023-12-05 20:40 [PATCH 00/11] record/replay fixes, maybe for 8.2 or for post release stable? Alex Bennée
                   ` (9 preceding siblings ...)
  2023-12-05 20:41 ` [PATCH 10/11] chardev: force write all when recording replay logs Alex Bennée
@ 2023-12-05 20:41 ` Alex Bennée
  2023-12-07  8:46   ` Pavel Dovgalyuk
  10 siblings, 1 reply; 40+ messages in thread
From: Alex Bennée @ 2023-12-05 20:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: Cleber Rosa, Beraldo Leal, Eduardo Habkost, Richard Henderson,
	Wainer dos Santos Moschetta, Paolo Bonzini,
	Marc-André Lureau, Philippe Mathieu-Daudé,
	John Snow, Pavel Dovgalyuk, Alex Bennée

With the latest fixes for #2010 and #2013 these tests look pretty
stable now. Of course the only way to be really sure is to run it in
the CI infrastructure and see what breaks.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 tests/avocado/replay_kernel.py | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/tests/avocado/replay_kernel.py b/tests/avocado/replay_kernel.py
index 1eaa36444c..c54e96c9ff 100644
--- a/tests/avocado/replay_kernel.py
+++ b/tests/avocado/replay_kernel.py
@@ -98,8 +98,6 @@ def test_i386_pc(self):
 
         self.run_rr(kernel_path, kernel_command_line, console_pattern, shift=5)
 
-    # See https://gitlab.com/qemu-project/qemu/-/issues/2010
-    @skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test sometimes gets stuck')
     def test_x86_64_pc(self):
         """
         :avocado: tags=arch:x86_64
@@ -135,8 +133,6 @@ def test_mips_malta(self):
 
         self.run_rr(kernel_path, kernel_command_line, console_pattern, shift=5)
 
-    # See https://gitlab.com/qemu-project/qemu/-/issues/2013
-    @skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test is unstable on GitLab')
     def test_mips64el_malta(self):
         """
         This test requires the ar tool to extract "data.tar.gz" from
@@ -152,7 +148,6 @@ def test_mips64el_malta(self):
 
         :avocado: tags=arch:mips64el
         :avocado: tags=machine:malta
-        :avocado: tags=flaky
         """
         deb_url = ('http://snapshot.debian.org/archive/debian/'
                    '20130217T032700Z/pool/main/l/linux-2.6/'
@@ -200,8 +195,6 @@ def test_arm_virt(self):
 
         self.run_rr(kernel_path, kernel_command_line, console_pattern, shift=1)
 
-    @skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test is unstable on GitLab')
-
     def test_arm_cubieboard_initrd(self):
         """
         :avocado: tags=arch:arm
@@ -354,7 +347,6 @@ def test_m68k_mcf5208evb(self):
         file_path = self.fetch_asset(tar_url, asset_hash=tar_hash)
         self.do_test_advcal_2018(file_path, 'sanity-clause.elf')
 
-    @skip("Test currently broken") # Console stuck as of 5.2-rc1
     def test_microblaze_s3adsp1800(self):
         """
         :avocado: tags=arch:microblaze
@@ -389,7 +381,6 @@ def test_or1k_sim(self):
         file_path = self.fetch_asset(tar_url, asset_hash=tar_hash)
         self.do_test_advcal_2018(file_path, 'vmlinux')
 
-    @skip("nios2 emulation is buggy under record/replay")
     def test_nios2_10m50(self):
         """
         :avocado: tags=arch:nios2
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH 02/11] tests/avocado: fix typo in replay_linux
  2023-12-05 20:40 ` [PATCH 02/11] tests/avocado: fix typo in replay_linux Alex Bennée
@ 2023-12-06 11:24   ` Philippe Mathieu-Daudé
  2023-12-06 16:22   ` Richard Henderson
  2023-12-07  8:20   ` Pavel Dovgalyuk
  2 siblings, 0 replies; 40+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-12-06 11:24 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: Cleber Rosa, Beraldo Leal, Eduardo Habkost, Richard Henderson,
	Wainer dos Santos Moschetta, Paolo Bonzini,
	Marc-André Lureau, John Snow, Pavel Dovgalyuk

On 5/12/23 21:40, Alex Bennée wrote:
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>   tests/avocado/replay_linux.py | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>




^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 04/11] scripts/replay_dump: track total number of instructions
  2023-12-05 20:40 ` [PATCH 04/11] scripts/replay_dump: track total number of instructions Alex Bennée
@ 2023-12-06 11:25   ` Philippe Mathieu-Daudé
  2023-12-06 16:38   ` Richard Henderson
  2023-12-07  8:26   ` Pavel Dovgalyuk
  2 siblings, 0 replies; 40+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-12-06 11:25 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: Cleber Rosa, Beraldo Leal, Eduardo Habkost, Richard Henderson,
	Wainer dos Santos Moschetta, Paolo Bonzini,
	Marc-André Lureau, John Snow, Pavel Dovgalyuk

On 5/12/23 21:40, Alex Bennée wrote:
> This will help in tracking where we are in the stream when debugging.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>   scripts/replay-dump.py | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>




^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 06/11] replay: add proper kdoc for ReplayState
  2023-12-05 20:41 ` [PATCH 06/11] replay: add proper kdoc for ReplayState Alex Bennée
@ 2023-12-06 11:27   ` Philippe Mathieu-Daudé
  2023-12-06 11:56     ` Alex Bennée
  2023-12-07  8:38   ` Pavel Dovgalyuk
  1 sibling, 1 reply; 40+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-12-06 11:27 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: Cleber Rosa, Beraldo Leal, Eduardo Habkost, Richard Henderson,
	Wainer dos Santos Moschetta, Paolo Bonzini,
	Marc-André Lureau, John Snow, Pavel Dovgalyuk

On 5/12/23 21:41, Alex Bennée wrote:
> Remove the non-standard comment formatting and move the descriptions
> into a proper kdoc comment.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>   replay/replay-internal.h               | 27 ++++++++++++++++----------

~~~

>   roms/SLOF                              |  2 +-
>   tests/tcg/i386/Makefile.softmmu-target | 19 ++++++++++++++++++
>   3 files changed, 37 insertions(+), 11 deletions(-)
> 
> diff --git a/replay/replay-internal.h b/replay/replay-internal.h
> index 516147ddbc..98ca3748ed 100644
> --- a/replay/replay-internal.h
> +++ b/replay/replay-internal.h
> @@ -63,24 +63,31 @@ enum ReplayEvents {
>       EVENT_COUNT
>   };
>   
> +/**
> + * typedef ReplayState - global tracking Replay state
> + *
> + * This structure tracks where we are in the current ReplayState
> + * including the logged events from the recorded replay stream. Some
> + * of the data is also stored/restored from VMStateDescription when VM
> + * save/restore events take place.
> + *
> + * @cached_clock: Cached clocks values
> + * @current_icount: number of processed instructions
> + * @instruction_count: number of instructions until next event
> + * @data_kind: current event
> + * @has_unread_data: 1 if event not yet processed
> + * @file_offset: offset into replay log at replay snapshot
> + * @block_request_id: current serialised block request id
> + * @read_event_id: current async read event id
> + */
>   typedef struct ReplayState {
> -    /*! Cached clock values. */
>       int64_t cached_clock[REPLAY_CLOCK_COUNT];
> -    /*! Current icount - number of processed instructions. */
>       uint64_t current_icount;
> -    /*! Number of instructions to be executed before other events happen. */
>       int instruction_count;
> -    /*! Type of the currently executed event. */
>       unsigned int data_kind;
> -    /*! Flag which indicates that event is not processed yet. */
>       unsigned int has_unread_data;
> -    /*! Temporary variable for saving current log offset. */
>       uint64_t file_offset;
> -    /*! Next block operation id.
> -        This counter is global, because requests from different
> -        block devices should not get overlapping ids. */
>       uint64_t block_request_id;
> -    /*! Asynchronous event id read from the log */
>       uint64_t read_event_id;
>   } ReplayState;
>   extern ReplayState replay_state;

Up to here:
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>

The rest doesn't belong to this commit:

> diff --git a/roms/SLOF b/roms/SLOF
> index 3a259df244..6b6c16b4b4 160000
> --- a/roms/SLOF
> +++ b/roms/SLOF
> @@ -1 +1 @@
> -Subproject commit 3a259df2449fc4a4e43ab5f33f0b2c66484b4bc3
> +Subproject commit 6b6c16b4b40763507cf1f518096f3c3883c5cf2d
> diff --git a/tests/tcg/i386/Makefile.softmmu-target b/tests/tcg/i386/Makefile.softmmu-target
> index 5266f2335a..b9bef72dcf 100644
> --- a/tests/tcg/i386/Makefile.softmmu-target
> +++ b/tests/tcg/i386/Makefile.softmmu-target
> @@ -33,5 +33,24 @@ EXTRA_RUNS+=$(MULTIARCH_RUNS)
>   
>   memory: CFLAGS+=-DCHECK_UNALIGNED=1
>   
> +# Simple Record/Replay Test
> +.PHONY: memory-record
> +run-memory-record: memory-record memory
> +	$(call run-test, $<, \
> +	  $(QEMU) -monitor none -display none \
> +		  -chardev file$(COMMA)path=$<.out$(COMMA)id=output \
> +		  -icount shift=5$(COMMA)rr=record$(COMMA)rrfile=record.bin \
> +		  $(QEMU_OPTS) memory)
> +
> +.PHONY: memory-replay
> +run-memory-replay: memory-replay run-memory-record
> +	$(call run-test, $<, \
> +	  $(QEMU) -monitor none -display none \
> +		  -chardev file$(COMMA)path=$<.out$(COMMA)id=output \
> +		  -icount shift=5$(COMMA)rr=replay$(COMMA)rrfile=record.bin \
> +		  $(QEMU_OPTS) memory)
> +
> +EXTRA_RUNS+=run-memory-replay
> +
>   # Running
>   QEMU_OPTS+=-device isa-debugcon,chardev=output -device isa-debug-exit,iobase=0xf4,iosize=0x4 -kernel



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 07/11] replay: make has_unread_data a bool
  2023-12-05 20:41 ` [PATCH 07/11] replay: make has_unread_data a bool Alex Bennée
@ 2023-12-06 11:31   ` Philippe Mathieu-Daudé
  2023-12-06 16:42   ` Richard Henderson
  2023-12-07  8:39   ` Pavel Dovgalyuk
  2 siblings, 0 replies; 40+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-12-06 11:31 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: Cleber Rosa, Beraldo Leal, Eduardo Habkost, Richard Henderson,
	Wainer dos Santos Moschetta, Paolo Bonzini,
	Marc-André Lureau, John Snow, Pavel Dovgalyuk

On 5/12/23 21:41, Alex Bennée wrote:
> For clarity given it only has two states.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>   replay/replay-internal.h | 4 ++--
>   replay/replay-internal.c | 4 ++--
>   replay/replay-snapshot.c | 6 +++---
>   replay/replay.c          | 2 +-
>   4 files changed, 8 insertions(+), 8 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 05/11] replay: remove host_clock_last
  2023-12-05 20:41 ` [PATCH 05/11] replay: remove host_clock_last Alex Bennée
@ 2023-12-06 11:33   ` Philippe Mathieu-Daudé
  2023-12-06 16:39   ` Richard Henderson
  2023-12-07  8:26   ` Pavel Dovgalyuk
  2 siblings, 0 replies; 40+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-12-06 11:33 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: Cleber Rosa, Beraldo Leal, Eduardo Habkost, Richard Henderson,
	Wainer dos Santos Moschetta, Paolo Bonzini,
	Marc-André Lureau, John Snow, Pavel Dovgalyuk

On 5/12/23 21:41, Alex Bennée wrote:
> Fixes: a02fe2ca70 (replay: Remove host_clock_last)
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>   replay/replay-internal.h | 2 --
>   1 file changed, 2 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>




^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 08/11] replay: introduce a central report point for sync errors
  2023-12-05 20:41 ` [PATCH 08/11] replay: introduce a central report point for sync errors Alex Bennée
@ 2023-12-06 11:35   ` Philippe Mathieu-Daudé
  2023-12-06 16:48     ` Richard Henderson
  2023-12-06 16:47   ` Richard Henderson
  1 sibling, 1 reply; 40+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-12-06 11:35 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: Cleber Rosa, Beraldo Leal, Eduardo Habkost, Richard Henderson,
	Wainer dos Santos Moschetta, Paolo Bonzini,
	Marc-André Lureau, John Snow, Pavel Dovgalyuk

Hi Alex,

On 5/12/23 21:41, Alex Bennée wrote:
> Figuring out why replay has failed is tricky at the best of times.
> Lets centralise the reporting of a replay sync error and add a little
> bit of extra information to help with debugging.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>   replay/replay-internal.h | 12 ++++++++++++
>   replay/replay-char.c     |  6 ++----
>   replay/replay-internal.c |  1 +
>   replay/replay.c          |  9 +++++++++
>   4 files changed, 24 insertions(+), 4 deletions(-)
> 
> diff --git a/replay/replay-internal.h b/replay/replay-internal.h
> index 1bc8fd5086..709e2eb4cb 100644
> --- a/replay/replay-internal.h
> +++ b/replay/replay-internal.h
> @@ -74,6 +74,7 @@ enum ReplayEvents {
>    * @cached_clock: Cached clocks values
>    * @current_icount: number of processed instructions
>    * @instruction_count: number of instructions until next event
> + * @current_event: current event index
>    * @data_kind: current event
>    * @has_unread_data: true if event not yet processed
>    * @file_offset: offset into replay log at replay snapshot
> @@ -84,6 +85,7 @@ typedef struct ReplayState {
>       int64_t cached_clock[REPLAY_CLOCK_COUNT];
>       uint64_t current_icount;
>       int instruction_count;
> +    unsigned int current_event;
>       unsigned int data_kind;
>       bool has_unread_data;
>       uint64_t file_offset;
Shouldn't this field be migrated?


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 06/11] replay: add proper kdoc for ReplayState
  2023-12-06 11:27   ` Philippe Mathieu-Daudé
@ 2023-12-06 11:56     ` Alex Bennée
  0 siblings, 0 replies; 40+ messages in thread
From: Alex Bennée @ 2023-12-06 11:56 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: qemu-devel, Cleber Rosa, Beraldo Leal, Eduardo Habkost,
	Richard Henderson, Wainer dos Santos Moschetta, Paolo Bonzini,
	Marc-André Lureau, John Snow, Pavel Dovgalyuk

Philippe Mathieu-Daudé <philmd@linaro.org> writes:

> On 5/12/23 21:41, Alex Bennée wrote:
>> Remove the non-standard comment formatting and move the descriptions
>> into a proper kdoc comment.
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>> ---
>>   replay/replay-internal.h               | 27 ++++++++++++++++----------
>
> ~~~
>
>>   roms/SLOF                              |  2 +-
>>   tests/tcg/i386/Makefile.softmmu-target | 19 ++++++++++++++++++
>>   3 files changed, 37 insertions(+), 11 deletions(-)
>> diff --git a/replay/replay-internal.h b/replay/replay-internal.h
>> index 516147ddbc..98ca3748ed 100644
>> --- a/replay/replay-internal.h
>> +++ b/replay/replay-internal.h
>> @@ -63,24 +63,31 @@ enum ReplayEvents {
>>       EVENT_COUNT
>>   };
>>   +/**
>> + * typedef ReplayState - global tracking Replay state
>> + *
>> + * This structure tracks where we are in the current ReplayState
>> + * including the logged events from the recorded replay stream. Some
>> + * of the data is also stored/restored from VMStateDescription when VM
>> + * save/restore events take place.
>> + *
>> + * @cached_clock: Cached clocks values
>> + * @current_icount: number of processed instructions
>> + * @instruction_count: number of instructions until next event
>> + * @data_kind: current event
>> + * @has_unread_data: 1 if event not yet processed
>> + * @file_offset: offset into replay log at replay snapshot
>> + * @block_request_id: current serialised block request id
>> + * @read_event_id: current async read event id
>> + */
>>   typedef struct ReplayState {
>> -    /*! Cached clock values. */
>>       int64_t cached_clock[REPLAY_CLOCK_COUNT];
>> -    /*! Current icount - number of processed instructions. */
>>       uint64_t current_icount;
>> -    /*! Number of instructions to be executed before other events happen. */
>>       int instruction_count;
>> -    /*! Type of the currently executed event. */
>>       unsigned int data_kind;
>> -    /*! Flag which indicates that event is not processed yet. */
>>       unsigned int has_unread_data;
>> -    /*! Temporary variable for saving current log offset. */
>>       uint64_t file_offset;
>> -    /*! Next block operation id.
>> -        This counter is global, because requests from different
>> -        block devices should not get overlapping ids. */
>>       uint64_t block_request_id;
>> -    /*! Asynchronous event id read from the log */
>>       uint64_t read_event_id;
>>   } ReplayState;
>>   extern ReplayState replay_state;
>
> Up to here:
> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
>
> The rest doesn't belong to this commit:

Oops, I missed that when rushing this out last night... will delete.

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 10/11] chardev: force write all when recording replay logs
  2023-12-05 20:41 ` [PATCH 10/11] chardev: force write all when recording replay logs Alex Bennée
@ 2023-12-06 14:25   ` Philippe Mathieu-Daudé
  2023-12-07  8:46   ` Pavel Dovgalyuk
  1 sibling, 0 replies; 40+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-12-06 14:25 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: Cleber Rosa, Beraldo Leal, Eduardo Habkost, Richard Henderson,
	Wainer dos Santos Moschetta, Paolo Bonzini,
	Marc-André Lureau, John Snow, Pavel Dovgalyuk

On 5/12/23 21:41, Alex Bennée wrote:
> This is mostly a problem within avocado as serial generally isn't busy
> enough to overfill pipes. However the consequences of recording a
> failed write will haunt us on replay when causing the log to go out of
> sync.
> 
> Fixes: https://gitlab.com/qemu-project/qemu/-/issues/2010
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Cc: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
> ---
>   chardev/char.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/chardev/char.c b/chardev/char.c
> index 996a024c7a..6e5b4d7345 100644
> --- a/chardev/char.c
> +++ b/chardev/char.c
> @@ -171,7 +171,8 @@ int qemu_chr_write(Chardev *s, const uint8_t *buf, int len, bool write_all)
>           return res;
>       }

Please add a comment explaining why in the code. Maybe simpler
using:

        if (replay_mode == REPLAY_MODE_RECORD) {
            /* $explanation */
            write_all = true;
        }

> -    res = qemu_chr_write_buffer(s, buf, len, &offset, write_all);
> +    res = qemu_chr_write_buffer(s, buf, len, &offset,
> +                                replay_mode == REPLAY_MODE_RECORD ? true : write_all);
With a comment:
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 01/11] tests/avocado: add a simple i386 replay kernel test
  2023-12-05 20:40 ` [PATCH 01/11] tests/avocado: add a simple i386 replay kernel test Alex Bennée
@ 2023-12-06 16:21   ` Richard Henderson
  2023-12-07  8:20   ` Pavel Dovgalyuk
  1 sibling, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2023-12-06 16:21 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: Cleber Rosa, Beraldo Leal, Eduardo Habkost,
	Wainer dos Santos Moschetta, Paolo Bonzini,
	Marc-André Lureau, Philippe Mathieu-Daudé,
	John Snow, Pavel Dovgalyuk

On 12/5/23 12:40, Alex Bennée wrote:
> There are a number of bugs against 32 bit x86 on the tracker. Lets at
> least establish a baseline pure kernel boot can do record/replay
> before we start looking at the devices.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>   tests/avocado/replay_kernel.py | 16 ++++++++++++++++
>   1 file changed, 16 insertions(+)

Acked-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 02/11] tests/avocado: fix typo in replay_linux
  2023-12-05 20:40 ` [PATCH 02/11] tests/avocado: fix typo in replay_linux Alex Bennée
  2023-12-06 11:24   ` Philippe Mathieu-Daudé
@ 2023-12-06 16:22   ` Richard Henderson
  2023-12-07  8:20   ` Pavel Dovgalyuk
  2 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2023-12-06 16:22 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: Cleber Rosa, Beraldo Leal, Eduardo Habkost,
	Wainer dos Santos Moschetta, Paolo Bonzini,
	Marc-André Lureau, Philippe Mathieu-Daudé,
	John Snow, Pavel Dovgalyuk

On 12/5/23 12:40, Alex Bennée wrote:
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>   tests/avocado/replay_linux.py | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 03/11] scripts/replay-dump: update to latest format
  2023-12-05 20:40 ` [PATCH 03/11] scripts/replay-dump: update to latest format Alex Bennée
@ 2023-12-06 16:29   ` Richard Henderson
  0 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2023-12-06 16:29 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: Cleber Rosa, Beraldo Leal, Eduardo Habkost,
	Wainer dos Santos Moschetta, Paolo Bonzini,
	Marc-André Lureau, Philippe Mathieu-Daudé,
	John Snow, Pavel Dovgalyuk

On 12/5/23 12:40, Alex Bennée wrote:
> @@ -268,6 +279,49 @@ def decode_clock(eid, name, dumpfile):
>                     Decoder(28, "EVENT_CP_RESET", decode_checkpoint),
>   ]
>   
> +# Shutdown cause added
> +v12_event_table = [Decoder(0, "EVENT_INSTRUCTION", decode_instruction),

This comment applied to the v7 changes.

Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 04/11] scripts/replay_dump: track total number of instructions
  2023-12-05 20:40 ` [PATCH 04/11] scripts/replay_dump: track total number of instructions Alex Bennée
  2023-12-06 11:25   ` Philippe Mathieu-Daudé
@ 2023-12-06 16:38   ` Richard Henderson
  2023-12-07  8:26   ` Pavel Dovgalyuk
  2 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2023-12-06 16:38 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: Cleber Rosa, Beraldo Leal, Eduardo Habkost,
	Wainer dos Santos Moschetta, Paolo Bonzini,
	Marc-André Lureau, Philippe Mathieu-Daudé,
	John Snow, Pavel Dovgalyuk

On 12/5/23 12:40, Alex Bennée wrote:
> This will help in tracking where we are in the stream when debugging.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>   scripts/replay-dump.py | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 05/11] replay: remove host_clock_last
  2023-12-05 20:41 ` [PATCH 05/11] replay: remove host_clock_last Alex Bennée
  2023-12-06 11:33   ` Philippe Mathieu-Daudé
@ 2023-12-06 16:39   ` Richard Henderson
  2023-12-07  8:26   ` Pavel Dovgalyuk
  2 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2023-12-06 16:39 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: Cleber Rosa, Beraldo Leal, Eduardo Habkost,
	Wainer dos Santos Moschetta, Paolo Bonzini,
	Marc-André Lureau, Philippe Mathieu-Daudé,
	John Snow, Pavel Dovgalyuk

On 12/5/23 12:41, Alex Bennée wrote:
> Fixes: a02fe2ca70 (replay: Remove host_clock_last)
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>   replay/replay-internal.h | 2 --
>   1 file changed, 2 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 07/11] replay: make has_unread_data a bool
  2023-12-05 20:41 ` [PATCH 07/11] replay: make has_unread_data a bool Alex Bennée
  2023-12-06 11:31   ` Philippe Mathieu-Daudé
@ 2023-12-06 16:42   ` Richard Henderson
  2023-12-07  8:39   ` Pavel Dovgalyuk
  2 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2023-12-06 16:42 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: Cleber Rosa, Beraldo Leal, Eduardo Habkost,
	Wainer dos Santos Moschetta, Paolo Bonzini,
	Marc-André Lureau, Philippe Mathieu-Daudé,
	John Snow, Pavel Dovgalyuk

On 12/5/23 12:41, Alex Bennée wrote:
> For clarity given it only has two states.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>   replay/replay-internal.h | 4 ++--
>   replay/replay-internal.c | 4 ++--
>   replay/replay-snapshot.c | 6 +++---
>   replay/replay.c          | 2 +-
>   4 files changed, 8 insertions(+), 8 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 08/11] replay: introduce a central report point for sync errors
  2023-12-05 20:41 ` [PATCH 08/11] replay: introduce a central report point for sync errors Alex Bennée
  2023-12-06 11:35   ` Philippe Mathieu-Daudé
@ 2023-12-06 16:47   ` Richard Henderson
  1 sibling, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2023-12-06 16:47 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: Cleber Rosa, Beraldo Leal, Eduardo Habkost,
	Wainer dos Santos Moschetta, Paolo Bonzini,
	Marc-André Lureau, Philippe Mathieu-Daudé,
	John Snow, Pavel Dovgalyuk

On 12/5/23 12:41, Alex Bennée wrote:
> Figuring out why replay has failed is tricky at the best of times.
> Lets centralise the reporting of a replay sync error and add a little
> bit of extra information to help with debugging.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>   replay/replay-internal.h | 12 ++++++++++++
>   replay/replay-char.c     |  6 ++----
>   replay/replay-internal.c |  1 +
>   replay/replay.c          |  9 +++++++++
>   4 files changed, 24 insertions(+), 4 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 08/11] replay: introduce a central report point for sync errors
  2023-12-06 11:35   ` Philippe Mathieu-Daudé
@ 2023-12-06 16:48     ` Richard Henderson
  2023-12-07  8:45       ` Pavel Dovgalyuk
  0 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2023-12-06 16:48 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, Alex Bennée, qemu-devel
  Cc: Cleber Rosa, Beraldo Leal, Eduardo Habkost,
	Wainer dos Santos Moschetta, Paolo Bonzini,
	Marc-André Lureau, John Snow, Pavel Dovgalyuk

On 12/6/23 03:35, Philippe Mathieu-Daudé wrote:
> Hi Alex,
> 
> On 5/12/23 21:41, Alex Bennée wrote:
>> Figuring out why replay has failed is tricky at the best of times.
>> Lets centralise the reporting of a replay sync error and add a little
>> bit of extra information to help with debugging.
>>
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>> ---
>>   replay/replay-internal.h | 12 ++++++++++++
>>   replay/replay-char.c     |  6 ++----
>>   replay/replay-internal.c |  1 +
>>   replay/replay.c          |  9 +++++++++
>>   4 files changed, 24 insertions(+), 4 deletions(-)
>>
>> diff --git a/replay/replay-internal.h b/replay/replay-internal.h
>> index 1bc8fd5086..709e2eb4cb 100644
>> --- a/replay/replay-internal.h
>> +++ b/replay/replay-internal.h
>> @@ -74,6 +74,7 @@ enum ReplayEvents {
>>    * @cached_clock: Cached clocks values
>>    * @current_icount: number of processed instructions
>>    * @instruction_count: number of instructions until next event
>> + * @current_event: current event index
>>    * @data_kind: current event
>>    * @has_unread_data: true if event not yet processed
>>    * @file_offset: offset into replay log at replay snapshot
>> @@ -84,6 +85,7 @@ typedef struct ReplayState {
>>       int64_t cached_clock[REPLAY_CLOCK_COUNT];
>>       uint64_t current_icount;
>>       int instruction_count;
>> +    unsigned int current_event;
>>       unsigned int data_kind;
>>       bool has_unread_data;
>>       uint64_t file_offset;
> Shouldn't this field be migrated?

No, it's for diagnostic use only.


r~



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 09/11] replay: stop us hanging in rr_wait_io_event
  2023-12-05 20:41 ` [PATCH 09/11] replay: stop us hanging in rr_wait_io_event Alex Bennée
@ 2023-12-06 16:51   ` Richard Henderson
  2023-12-08  8:32   ` Pavel Dovgalyuk
  1 sibling, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2023-12-06 16:51 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: Cleber Rosa, Beraldo Leal, Eduardo Habkost,
	Wainer dos Santos Moschetta, Paolo Bonzini,
	Marc-André Lureau, Philippe Mathieu-Daudé,
	John Snow, Pavel Dovgalyuk

On 12/5/23 12:41, Alex Bennée wrote:
> A lot of the hang I see are when we end up spinning in
> rr_wait_io_event for an event that will never come in playback. As a
> new check functions which can see if we are in PLAY mode and kick us
> us the wait function so the event can be processed.
> 
> This fixes most of the failures in replay_kernel.py
> 
> Fixes: https://gitlab.com/qemu-project/qemu/-/issues/2013
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Cc: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
> ---
>   include/sysemu/replay.h      |  5 +++++
>   accel/tcg/tcg-accel-ops-rr.c |  2 +-
>   replay/replay.c              | 24 ++++++++++++++++++++++++
>   3 files changed, 30 insertions(+), 1 deletion(-)
> 
> diff --git a/include/sysemu/replay.h b/include/sysemu/replay.h
> index 08aae5869f..83995ae4bd 100644
> --- a/include/sysemu/replay.h
> +++ b/include/sysemu/replay.h
> @@ -70,6 +70,11 @@ int replay_get_instructions(void);
>   /*! Updates instructions counter in replay mode. */
>   void replay_account_executed_instructions(void);
>   
> +/**
> + * replay_can_wait: check if we should pause for wait-io
> + */
> +bool replay_can_wait(void);
> +
>   /* Processing clocks and other time sources */
>   
>   /*! Save the specified clock */
> diff --git a/accel/tcg/tcg-accel-ops-rr.c b/accel/tcg/tcg-accel-ops-rr.c
> index 611932f3c3..825e35b3dc 100644
> --- a/accel/tcg/tcg-accel-ops-rr.c
> +++ b/accel/tcg/tcg-accel-ops-rr.c
> @@ -109,7 +109,7 @@ static void rr_wait_io_event(void)
>   {
>       CPUState *cpu;
>   
> -    while (all_cpu_threads_idle()) {
> +    while (all_cpu_threads_idle() && replay_can_wait()) {
>           rr_stop_kick_timer();
>           qemu_cond_wait_iothread(first_cpu->halt_cond);
>       }
> diff --git a/replay/replay.c b/replay/replay.c
> index e83c01285c..042a6a9636 100644
> --- a/replay/replay.c
> +++ b/replay/replay.c
> @@ -347,6 +347,30 @@ void replay_start(void)
>       replay_enable_events();
>   }
>   
> +/*
> + * For none/record the answer is yes.
> + */
> +bool replay_can_wait(void)
> +{
> +    if (replay_mode == REPLAY_MODE_PLAY) {
> +        /*
> +         * For playback we shouldn't ever be at a point we wait. If
> +         * the instruction count has reached zero and we have an
> +         * unconsumed event we should go around again and consume it.
> +         */
> +        if (replay_state.instruction_count == 0 && replay_state.has_unread_data) {
> +            return false;
> +        } else {
> +            fprintf(stderr, "Error: Invalid replay state\n");
> +            fprintf(stderr,"instruction_count = %d, has = %d, event_kind = %d\n",
> +                    replay_state.instruction_count, replay_state.has_unread_data, replay_state.data_kind);
> +            abort();

error_report.


r~

> +        }
> +    }
> +    return true;
> +}
> +
> +
>   void replay_finish(void)
>   {
>       if (replay_mode == REPLAY_MODE_NONE) {



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 01/11] tests/avocado: add a simple i386 replay kernel test
  2023-12-05 20:40 ` [PATCH 01/11] tests/avocado: add a simple i386 replay kernel test Alex Bennée
  2023-12-06 16:21   ` Richard Henderson
@ 2023-12-07  8:20   ` Pavel Dovgalyuk
  1 sibling, 0 replies; 40+ messages in thread
From: Pavel Dovgalyuk @ 2023-12-07  8:20 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: Cleber Rosa, Beraldo Leal, Eduardo Habkost, Richard Henderson,
	Wainer dos Santos Moschetta, Paolo Bonzini,
	Marc-André Lureau, Philippe Mathieu-Daudé,
	John Snow, Pavel Dovgalyuk

On 05.12.2023 23:40, Alex Bennée wrote:
> There are a number of bugs against 32 bit x86 on the tracker. Lets at
> least establish a baseline pure kernel boot can do record/replay
> before we start looking at the devices.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>   tests/avocado/replay_kernel.py | 16 ++++++++++++++++
>   1 file changed, 16 insertions(+)
> 

Acked-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>




^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 02/11] tests/avocado: fix typo in replay_linux
  2023-12-05 20:40 ` [PATCH 02/11] tests/avocado: fix typo in replay_linux Alex Bennée
  2023-12-06 11:24   ` Philippe Mathieu-Daudé
  2023-12-06 16:22   ` Richard Henderson
@ 2023-12-07  8:20   ` Pavel Dovgalyuk
  2 siblings, 0 replies; 40+ messages in thread
From: Pavel Dovgalyuk @ 2023-12-07  8:20 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: Cleber Rosa, Beraldo Leal, Eduardo Habkost, Richard Henderson,
	Wainer dos Santos Moschetta, Paolo Bonzini,
	Marc-André Lureau, Philippe Mathieu-Daudé,
	John Snow, Pavel Dovgalyuk

On 05.12.2023 23:40, Alex Bennée wrote:
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>   tests/avocado/replay_linux.py | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tests/avocado/replay_linux.py b/tests/avocado/replay_linux.py
> index 270ccc1eae..e95bff3299 100644
> --- a/tests/avocado/replay_linux.py
> +++ b/tests/avocado/replay_linux.py
> @@ -94,7 +94,7 @@ def launch_and_wait(self, record, args, shift):
>           else:
>               vm.event_wait('SHUTDOWN', self.timeout)
>               vm.wait()
> -            logger.info('successfully fihished the replay')
> +            logger.info('successfully finished the replay')
>           elapsed = time.time() - start_time
>           logger.info('elapsed time %.2f sec' % elapsed)
>           return elapsed


Reviewed-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 04/11] scripts/replay_dump: track total number of instructions
  2023-12-05 20:40 ` [PATCH 04/11] scripts/replay_dump: track total number of instructions Alex Bennée
  2023-12-06 11:25   ` Philippe Mathieu-Daudé
  2023-12-06 16:38   ` Richard Henderson
@ 2023-12-07  8:26   ` Pavel Dovgalyuk
  2 siblings, 0 replies; 40+ messages in thread
From: Pavel Dovgalyuk @ 2023-12-07  8:26 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: Cleber Rosa, Beraldo Leal, Eduardo Habkost, Richard Henderson,
	Wainer dos Santos Moschetta, Paolo Bonzini,
	Marc-André Lureau, Philippe Mathieu-Daudé,
	John Snow, Pavel Dovgalyuk

On 05.12.2023 23:40, Alex Bennée wrote:
> This will help in tracking where we are in the stream when debugging.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>   scripts/replay-dump.py | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/scripts/replay-dump.py b/scripts/replay-dump.py
> index 8b9f914534..2212b09322 100755
> --- a/scripts/replay-dump.py
> +++ b/scripts/replay-dump.py
> @@ -150,10 +150,13 @@ def decode_async(eid, name, dumpfile):
>   
>       return call_decode(async_decode_table, async_event_kind, dumpfile)
>   
> +total_insns = 0
>   
>   def decode_instruction(eid, name, dumpfile):
> +    global total_insns
>       ins_diff = read_dword(dumpfile)
> -    print_event(eid, name, "0x%x" % (ins_diff))
> +    total_insns += ins_diff
> +    print_event(eid, name, "+ %d -> %d" % (ins_diff, total_insns))
>       return True
>   
>   def decode_char_write(eid, name, dumpfile):

Reviewed-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 05/11] replay: remove host_clock_last
  2023-12-05 20:41 ` [PATCH 05/11] replay: remove host_clock_last Alex Bennée
  2023-12-06 11:33   ` Philippe Mathieu-Daudé
  2023-12-06 16:39   ` Richard Henderson
@ 2023-12-07  8:26   ` Pavel Dovgalyuk
  2 siblings, 0 replies; 40+ messages in thread
From: Pavel Dovgalyuk @ 2023-12-07  8:26 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: Cleber Rosa, Beraldo Leal, Eduardo Habkost, Richard Henderson,
	Wainer dos Santos Moschetta, Paolo Bonzini,
	Marc-André Lureau, Philippe Mathieu-Daudé,
	John Snow, Pavel Dovgalyuk

On 05.12.2023 23:41, Alex Bennée wrote:
> Fixes: a02fe2ca70 (replay: Remove host_clock_last)
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>   replay/replay-internal.h | 2 --
>   1 file changed, 2 deletions(-)
> 
> diff --git a/replay/replay-internal.h b/replay/replay-internal.h
> index b6836354ac..516147ddbc 100644
> --- a/replay/replay-internal.h
> +++ b/replay/replay-internal.h
> @@ -80,8 +80,6 @@ typedef struct ReplayState {
>           This counter is global, because requests from different
>           block devices should not get overlapping ids. */
>       uint64_t block_request_id;
> -    /*! Prior value of the host clock */
> -    uint64_t host_clock_last;
>       /*! Asynchronous event id read from the log */
>       uint64_t read_event_id;
>   } ReplayState;

Reviewed-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 06/11] replay: add proper kdoc for ReplayState
  2023-12-05 20:41 ` [PATCH 06/11] replay: add proper kdoc for ReplayState Alex Bennée
  2023-12-06 11:27   ` Philippe Mathieu-Daudé
@ 2023-12-07  8:38   ` Pavel Dovgalyuk
  1 sibling, 0 replies; 40+ messages in thread
From: Pavel Dovgalyuk @ 2023-12-07  8:38 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: Cleber Rosa, Beraldo Leal, Eduardo Habkost, Richard Henderson,
	Wainer dos Santos Moschetta, Paolo Bonzini,
	Marc-André Lureau, Philippe Mathieu-Daudé,
	John Snow, Pavel Dovgalyuk

On 05.12.2023 23:41, Alex Bennée wrote:
> Remove the non-standard comment formatting and move the descriptions
> into a proper kdoc comment.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>   replay/replay-internal.h               | 27 ++++++++++++++++----------
>   roms/SLOF                              |  2 +-
>   tests/tcg/i386/Makefile.softmmu-target | 19 ++++++++++++++++++
>   3 files changed, 37 insertions(+), 11 deletions(-)
> 
> diff --git a/replay/replay-internal.h b/replay/replay-internal.h
> index 516147ddbc..98ca3748ed 100644
> --- a/replay/replay-internal.h
> +++ b/replay/replay-internal.h
> @@ -63,24 +63,31 @@ enum ReplayEvents {
>       EVENT_COUNT
>   };
>   
> +/**
> + * typedef ReplayState - global tracking Replay state
> + *
> + * This structure tracks where we are in the current ReplayState
> + * including the logged events from the recorded replay stream. Some
> + * of the data is also stored/restored from VMStateDescription when VM
> + * save/restore events take place.
> + *
> + * @cached_clock: Cached clocks values
> + * @current_icount: number of processed instructions
> + * @instruction_count: number of instructions until next event
> + * @data_kind: current event
> + * @has_unread_data: 1 if event not yet processed
> + * @file_offset: offset into replay log at replay snapshot
> + * @block_request_id: current serialised block request id
> + * @read_event_id: current async read event id
> + */
>   typedef struct ReplayState {
> -    /*! Cached clock values. */
>       int64_t cached_clock[REPLAY_CLOCK_COUNT];
> -    /*! Current icount - number of processed instructions. */
>       uint64_t current_icount;
> -    /*! Number of instructions to be executed before other events happen. */
>       int instruction_count;
> -    /*! Type of the currently executed event. */
>       unsigned int data_kind;
> -    /*! Flag which indicates that event is not processed yet. */
>       unsigned int has_unread_data;
> -    /*! Temporary variable for saving current log offset. */
>       uint64_t file_offset;
> -    /*! Next block operation id.
> -        This counter is global, because requests from different
> -        block devices should not get overlapping ids. */
>       uint64_t block_request_id;
> -    /*! Asynchronous event id read from the log */
>       uint64_t read_event_id;
>   } ReplayState;
>   extern ReplayState replay_state;


Reviewed-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 07/11] replay: make has_unread_data a bool
  2023-12-05 20:41 ` [PATCH 07/11] replay: make has_unread_data a bool Alex Bennée
  2023-12-06 11:31   ` Philippe Mathieu-Daudé
  2023-12-06 16:42   ` Richard Henderson
@ 2023-12-07  8:39   ` Pavel Dovgalyuk
  2 siblings, 0 replies; 40+ messages in thread
From: Pavel Dovgalyuk @ 2023-12-07  8:39 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: Cleber Rosa, Beraldo Leal, Eduardo Habkost, Richard Henderson,
	Wainer dos Santos Moschetta, Paolo Bonzini,
	Marc-André Lureau, Philippe Mathieu-Daudé,
	John Snow, Pavel Dovgalyuk

On 05.12.2023 23:41, Alex Bennée wrote:
> For clarity given it only has two states.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>   replay/replay-internal.h | 4 ++--
>   replay/replay-internal.c | 4 ++--
>   replay/replay-snapshot.c | 6 +++---
>   replay/replay.c          | 2 +-
>   4 files changed, 8 insertions(+), 8 deletions(-)
> 

Reviewed-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>




^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 08/11] replay: introduce a central report point for sync errors
  2023-12-06 16:48     ` Richard Henderson
@ 2023-12-07  8:45       ` Pavel Dovgalyuk
  0 siblings, 0 replies; 40+ messages in thread
From: Pavel Dovgalyuk @ 2023-12-07  8:45 UTC (permalink / raw)
  To: Richard Henderson, Philippe Mathieu-Daudé,
	Alex Bennée, qemu-devel
  Cc: Cleber Rosa, Beraldo Leal, Eduardo Habkost,
	Wainer dos Santos Moschetta, Paolo Bonzini,
	Marc-André Lureau, John Snow, Pavel Dovgalyuk

On 06.12.2023 19:48, Richard Henderson wrote:
> On 12/6/23 03:35, Philippe Mathieu-Daudé wrote:
>> Hi Alex,
>>
>> On 5/12/23 21:41, Alex Bennée wrote:
>>> Figuring out why replay has failed is tricky at the best of times.
>>> Lets centralise the reporting of a replay sync error and add a little
>>> bit of extra information to help with debugging.
>>>
>>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>>> ---
>>>   replay/replay-internal.h | 12 ++++++++++++
>>>   replay/replay-char.c     |  6 ++----
>>>   replay/replay-internal.c |  1 +
>>>   replay/replay.c          |  9 +++++++++
>>>   4 files changed, 24 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/replay/replay-internal.h b/replay/replay-internal.h
>>> index 1bc8fd5086..709e2eb4cb 100644
>>> --- a/replay/replay-internal.h
>>> +++ b/replay/replay-internal.h
>>> @@ -74,6 +74,7 @@ enum ReplayEvents {
>>>    * @cached_clock: Cached clocks values
>>>    * @current_icount: number of processed instructions
>>>    * @instruction_count: number of instructions until next event
>>> + * @current_event: current event index
>>>    * @data_kind: current event
>>>    * @has_unread_data: true if event not yet processed
>>>    * @file_offset: offset into replay log at replay snapshot
>>> @@ -84,6 +85,7 @@ typedef struct ReplayState {
>>>       int64_t cached_clock[REPLAY_CLOCK_COUNT];
>>>       uint64_t current_icount;
>>>       int instruction_count;
>>> +    unsigned int current_event;
>>>       unsigned int data_kind;
>>>       bool has_unread_data;
>>>       uint64_t file_offset;
>> Shouldn't this field be migrated?
> 
> No, it's for diagnostic use only.

It should be migrated, because RR may be started from the snapshot, 
which references the middle of replayed scenario.


Pavel Dovgalyuk




^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 11/11] tests/avocado: remove skips from replay_kernel
  2023-12-05 20:41 ` [PATCH 11/11] tests/avocado: remove skips from replay_kernel Alex Bennée
@ 2023-12-07  8:46   ` Pavel Dovgalyuk
  0 siblings, 0 replies; 40+ messages in thread
From: Pavel Dovgalyuk @ 2023-12-07  8:46 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: Cleber Rosa, Beraldo Leal, Eduardo Habkost, Richard Henderson,
	Wainer dos Santos Moschetta, Paolo Bonzini,
	Marc-André Lureau, Philippe Mathieu-Daudé,
	John Snow, Pavel Dovgalyuk

On 05.12.2023 23:41, Alex Bennée wrote:
> With the latest fixes for #2010 and #2013 these tests look pretty
> stable now. Of course the only way to be really sure is to run it in
> the CI infrastructure and see what breaks.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>   tests/avocado/replay_kernel.py | 9 ---------
>   1 file changed, 9 deletions(-)
> 
> diff --git a/tests/avocado/replay_kernel.py b/tests/avocado/replay_kernel.py
> index 1eaa36444c..c54e96c9ff 100644
> --- a/tests/avocado/replay_kernel.py
> +++ b/tests/avocado/replay_kernel.py
> @@ -98,8 +98,6 @@ def test_i386_pc(self):
>   
>           self.run_rr(kernel_path, kernel_command_line, console_pattern, shift=5)
>   
> -    # See https://gitlab.com/qemu-project/qemu/-/issues/2010
> -    @skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test sometimes gets stuck')
>       def test_x86_64_pc(self):
>           """
>           :avocado: tags=arch:x86_64
> @@ -135,8 +133,6 @@ def test_mips_malta(self):
>   
>           self.run_rr(kernel_path, kernel_command_line, console_pattern, shift=5)
>   
> -    # See https://gitlab.com/qemu-project/qemu/-/issues/2013
> -    @skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test is unstable on GitLab')
>       def test_mips64el_malta(self):
>           """
>           This test requires the ar tool to extract "data.tar.gz" from
> @@ -152,7 +148,6 @@ def test_mips64el_malta(self):
>   
>           :avocado: tags=arch:mips64el
>           :avocado: tags=machine:malta
> -        :avocado: tags=flaky
>           """
>           deb_url = ('http://snapshot.debian.org/archive/debian/'
>                      '20130217T032700Z/pool/main/l/linux-2.6/'
> @@ -200,8 +195,6 @@ def test_arm_virt(self):
>   
>           self.run_rr(kernel_path, kernel_command_line, console_pattern, shift=1)
>   
> -    @skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test is unstable on GitLab')
> -
>       def test_arm_cubieboard_initrd(self):
>           """
>           :avocado: tags=arch:arm
> @@ -354,7 +347,6 @@ def test_m68k_mcf5208evb(self):
>           file_path = self.fetch_asset(tar_url, asset_hash=tar_hash)
>           self.do_test_advcal_2018(file_path, 'sanity-clause.elf')
>   
> -    @skip("Test currently broken") # Console stuck as of 5.2-rc1
>       def test_microblaze_s3adsp1800(self):
>           """
>           :avocado: tags=arch:microblaze
> @@ -389,7 +381,6 @@ def test_or1k_sim(self):
>           file_path = self.fetch_asset(tar_url, asset_hash=tar_hash)
>           self.do_test_advcal_2018(file_path, 'vmlinux')
>   
> -    @skip("nios2 emulation is buggy under record/replay")
>       def test_nios2_10m50(self):
>           """
>           :avocado: tags=arch:nios2


Acked-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 10/11] chardev: force write all when recording replay logs
  2023-12-05 20:41 ` [PATCH 10/11] chardev: force write all when recording replay logs Alex Bennée
  2023-12-06 14:25   ` Philippe Mathieu-Daudé
@ 2023-12-07  8:46   ` Pavel Dovgalyuk
  1 sibling, 0 replies; 40+ messages in thread
From: Pavel Dovgalyuk @ 2023-12-07  8:46 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: Cleber Rosa, Beraldo Leal, Eduardo Habkost, Richard Henderson,
	Wainer dos Santos Moschetta, Paolo Bonzini,
	Marc-André Lureau, Philippe Mathieu-Daudé,
	John Snow, Pavel Dovgalyuk

On 05.12.2023 23:41, Alex Bennée wrote:
> This is mostly a problem within avocado as serial generally isn't busy
> enough to overfill pipes. However the consequences of recording a
> failed write will haunt us on replay when causing the log to go out of
> sync.
> 
> Fixes: https://gitlab.com/qemu-project/qemu/-/issues/2010
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Cc: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
> ---
>   chardev/char.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/chardev/char.c b/chardev/char.c
> index 996a024c7a..6e5b4d7345 100644
> --- a/chardev/char.c
> +++ b/chardev/char.c
> @@ -171,7 +171,8 @@ int qemu_chr_write(Chardev *s, const uint8_t *buf, int len, bool write_all)
>           return res;
>       }
>   
> -    res = qemu_chr_write_buffer(s, buf, len, &offset, write_all);
> +    res = qemu_chr_write_buffer(s, buf, len, &offset,
> +                                replay_mode == REPLAY_MODE_RECORD ? true : write_all);
>   
>       if (qemu_chr_replay(s) && replay_mode == REPLAY_MODE_RECORD) {
>           replay_char_write_event_save(res, offset);

Acked-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 09/11] replay: stop us hanging in rr_wait_io_event
  2023-12-05 20:41 ` [PATCH 09/11] replay: stop us hanging in rr_wait_io_event Alex Bennée
  2023-12-06 16:51   ` Richard Henderson
@ 2023-12-08  8:32   ` Pavel Dovgalyuk
  2023-12-08  9:29     ` Alex Bennée
  1 sibling, 1 reply; 40+ messages in thread
From: Pavel Dovgalyuk @ 2023-12-08  8:32 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: Cleber Rosa, Beraldo Leal, Eduardo Habkost, Richard Henderson,
	Wainer dos Santos Moschetta, Paolo Bonzini,
	Marc-André Lureau, Philippe Mathieu-Daudé,
	John Snow, Pavel Dovgalyuk

On 05.12.2023 23:41, Alex Bennée wrote:
> A lot of the hang I see are when we end up spinning in
> rr_wait_io_event for an event that will never come in playback. As a
> new check functions which can see if we are in PLAY mode and kick us
> us the wait function so the event can be processed.
> 
> This fixes most of the failures in replay_kernel.py

Is there an effect for console QEMU only?
I've tested this patch on Windows7 boot scenario and replay speed has 
not changed.

> 
> Fixes: https://gitlab.com/qemu-project/qemu/-/issues/2013
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Cc: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
> ---
>   include/sysemu/replay.h      |  5 +++++
>   accel/tcg/tcg-accel-ops-rr.c |  2 +-
>   replay/replay.c              | 24 ++++++++++++++++++++++++
>   3 files changed, 30 insertions(+), 1 deletion(-)
> 
> diff --git a/include/sysemu/replay.h b/include/sysemu/replay.h
> index 08aae5869f..83995ae4bd 100644
> --- a/include/sysemu/replay.h
> +++ b/include/sysemu/replay.h
> @@ -70,6 +70,11 @@ int replay_get_instructions(void);
>   /*! Updates instructions counter in replay mode. */
>   void replay_account_executed_instructions(void);
>   
> +/**
> + * replay_can_wait: check if we should pause for wait-io
> + */
> +bool replay_can_wait(void);
> +
>   /* Processing clocks and other time sources */
>   
>   /*! Save the specified clock */
> diff --git a/accel/tcg/tcg-accel-ops-rr.c b/accel/tcg/tcg-accel-ops-rr.c
> index 611932f3c3..825e35b3dc 100644
> --- a/accel/tcg/tcg-accel-ops-rr.c
> +++ b/accel/tcg/tcg-accel-ops-rr.c
> @@ -109,7 +109,7 @@ static void rr_wait_io_event(void)
>   {
>       CPUState *cpu;
>   
> -    while (all_cpu_threads_idle()) {
> +    while (all_cpu_threads_idle() && replay_can_wait()) {
>           rr_stop_kick_timer();
>           qemu_cond_wait_iothread(first_cpu->halt_cond);
>       }
> diff --git a/replay/replay.c b/replay/replay.c
> index e83c01285c..042a6a9636 100644
> --- a/replay/replay.c
> +++ b/replay/replay.c
> @@ -347,6 +347,30 @@ void replay_start(void)
>       replay_enable_events();
>   }
>   
> +/*
> + * For none/record the answer is yes.
> + */
> +bool replay_can_wait(void)
> +{
> +    if (replay_mode == REPLAY_MODE_PLAY) {
> +        /*
> +         * For playback we shouldn't ever be at a point we wait. If
> +         * the instruction count has reached zero and we have an
> +         * unconsumed event we should go around again and consume it.
> +         */
> +        if (replay_state.instruction_count == 0 && replay_state.has_unread_data) {
> +            return false;
> +        } else {
> +            fprintf(stderr, "Error: Invalid replay state\n");
> +            fprintf(stderr,"instruction_count = %d, has = %d, event_kind = %d\n",
> +                    replay_state.instruction_count, replay_state.has_unread_data, replay_state.data_kind);
> +            abort();
> +        }
> +    }
> +    return true;
> +}
> +
> +
>   void replay_finish(void)
>   {
>       if (replay_mode == REPLAY_MODE_NONE) {



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 09/11] replay: stop us hanging in rr_wait_io_event
  2023-12-08  8:32   ` Pavel Dovgalyuk
@ 2023-12-08  9:29     ` Alex Bennée
  0 siblings, 0 replies; 40+ messages in thread
From: Alex Bennée @ 2023-12-08  9:29 UTC (permalink / raw)
  To: Pavel Dovgalyuk
  Cc: qemu-devel, Cleber Rosa, Beraldo Leal, Eduardo Habkost,
	Richard Henderson, Wainer dos Santos Moschetta, Paolo Bonzini,
	Marc-André Lureau, Philippe Mathieu-Daudé,
	John Snow, Pavel Dovgalyuk

Pavel Dovgalyuk <pavel.dovgalyuk@ispras.ru> writes:

> On 05.12.2023 23:41, Alex Bennée wrote:
>> A lot of the hang I see are when we end up spinning in
>> rr_wait_io_event for an event that will never come in playback. As a
>> new check functions which can see if we are in PLAY mode and kick us
>> us the wait function so the event can be processed.
>> This fixes most of the failures in replay_kernel.py
>
> Is there an effect for console QEMU only?
> I've tested this patch on Windows7 boot scenario and replay speed has
> not changed.

It was a lock up I was seeing (because once it was in this mode what was
going to wake it up). It could be outside of running in avocado some
other random event woke things up and allowed stuff to progress. However
I think this is a correctness issue, when would we ever wait for io
during playback?

>
>> Fixes: https://gitlab.com/qemu-project/qemu/-/issues/2013
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>> Cc: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
>> ---
>>   include/sysemu/replay.h      |  5 +++++
>>   accel/tcg/tcg-accel-ops-rr.c |  2 +-
>>   replay/replay.c              | 24 ++++++++++++++++++++++++
>>   3 files changed, 30 insertions(+), 1 deletion(-)
>> diff --git a/include/sysemu/replay.h b/include/sysemu/replay.h
>> index 08aae5869f..83995ae4bd 100644
>> --- a/include/sysemu/replay.h
>> +++ b/include/sysemu/replay.h
>> @@ -70,6 +70,11 @@ int replay_get_instructions(void);
>>   /*! Updates instructions counter in replay mode. */
>>   void replay_account_executed_instructions(void);
>>   +/**
>> + * replay_can_wait: check if we should pause for wait-io
>> + */
>> +bool replay_can_wait(void);
>> +
>>   /* Processing clocks and other time sources */
>>     /*! Save the specified clock */
>> diff --git a/accel/tcg/tcg-accel-ops-rr.c b/accel/tcg/tcg-accel-ops-rr.c
>> index 611932f3c3..825e35b3dc 100644
>> --- a/accel/tcg/tcg-accel-ops-rr.c
>> +++ b/accel/tcg/tcg-accel-ops-rr.c
>> @@ -109,7 +109,7 @@ static void rr_wait_io_event(void)
>>   {
>>       CPUState *cpu;
>>   -    while (all_cpu_threads_idle()) {
>> +    while (all_cpu_threads_idle() && replay_can_wait()) {
>>           rr_stop_kick_timer();
>>           qemu_cond_wait_iothread(first_cpu->halt_cond);
>>       }
>> diff --git a/replay/replay.c b/replay/replay.c
>> index e83c01285c..042a6a9636 100644
>> --- a/replay/replay.c
>> +++ b/replay/replay.c
>> @@ -347,6 +347,30 @@ void replay_start(void)
>>       replay_enable_events();
>>   }
>>   +/*
>> + * For none/record the answer is yes.
>> + */
>> +bool replay_can_wait(void)
>> +{
>> +    if (replay_mode == REPLAY_MODE_PLAY) {
>> +        /*
>> +         * For playback we shouldn't ever be at a point we wait. If
>> +         * the instruction count has reached zero and we have an
>> +         * unconsumed event we should go around again and consume it.
>> +         */
>> +        if (replay_state.instruction_count == 0 && replay_state.has_unread_data) {
>> +            return false;
>> +        } else {
>> +            fprintf(stderr, "Error: Invalid replay state\n");
>> +            fprintf(stderr,"instruction_count = %d, has = %d, event_kind = %d\n",
>> +                    replay_state.instruction_count, replay_state.has_unread_data, replay_state.data_kind);
>> +            abort();
>> +        }
>> +    }
>> +    return true;
>> +}
>> +
>> +
>>   void replay_finish(void)
>>   {
>>       if (replay_mode == REPLAY_MODE_NONE) {

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro


^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2023-12-08  9:30 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-05 20:40 [PATCH 00/11] record/replay fixes, maybe for 8.2 or for post release stable? Alex Bennée
2023-12-05 20:40 ` [PATCH 01/11] tests/avocado: add a simple i386 replay kernel test Alex Bennée
2023-12-06 16:21   ` Richard Henderson
2023-12-07  8:20   ` Pavel Dovgalyuk
2023-12-05 20:40 ` [PATCH 02/11] tests/avocado: fix typo in replay_linux Alex Bennée
2023-12-06 11:24   ` Philippe Mathieu-Daudé
2023-12-06 16:22   ` Richard Henderson
2023-12-07  8:20   ` Pavel Dovgalyuk
2023-12-05 20:40 ` [PATCH 03/11] scripts/replay-dump: update to latest format Alex Bennée
2023-12-06 16:29   ` Richard Henderson
2023-12-05 20:40 ` [PATCH 04/11] scripts/replay_dump: track total number of instructions Alex Bennée
2023-12-06 11:25   ` Philippe Mathieu-Daudé
2023-12-06 16:38   ` Richard Henderson
2023-12-07  8:26   ` Pavel Dovgalyuk
2023-12-05 20:41 ` [PATCH 05/11] replay: remove host_clock_last Alex Bennée
2023-12-06 11:33   ` Philippe Mathieu-Daudé
2023-12-06 16:39   ` Richard Henderson
2023-12-07  8:26   ` Pavel Dovgalyuk
2023-12-05 20:41 ` [PATCH 06/11] replay: add proper kdoc for ReplayState Alex Bennée
2023-12-06 11:27   ` Philippe Mathieu-Daudé
2023-12-06 11:56     ` Alex Bennée
2023-12-07  8:38   ` Pavel Dovgalyuk
2023-12-05 20:41 ` [PATCH 07/11] replay: make has_unread_data a bool Alex Bennée
2023-12-06 11:31   ` Philippe Mathieu-Daudé
2023-12-06 16:42   ` Richard Henderson
2023-12-07  8:39   ` Pavel Dovgalyuk
2023-12-05 20:41 ` [PATCH 08/11] replay: introduce a central report point for sync errors Alex Bennée
2023-12-06 11:35   ` Philippe Mathieu-Daudé
2023-12-06 16:48     ` Richard Henderson
2023-12-07  8:45       ` Pavel Dovgalyuk
2023-12-06 16:47   ` Richard Henderson
2023-12-05 20:41 ` [PATCH 09/11] replay: stop us hanging in rr_wait_io_event Alex Bennée
2023-12-06 16:51   ` Richard Henderson
2023-12-08  8:32   ` Pavel Dovgalyuk
2023-12-08  9:29     ` Alex Bennée
2023-12-05 20:41 ` [PATCH 10/11] chardev: force write all when recording replay logs Alex Bennée
2023-12-06 14:25   ` Philippe Mathieu-Daudé
2023-12-07  8:46   ` Pavel Dovgalyuk
2023-12-05 20:41 ` [PATCH 11/11] tests/avocado: remove skips from replay_kernel Alex Bennée
2023-12-07  8:46   ` Pavel Dovgalyuk

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.