* [Qemu-devel] [RFC PATCH v1 0/9] MTTCG and record/replay fixes for rc3
@ 2017-04-03 12:45 Alex Bennée
  2017-04-03 12:45 ` [Qemu-devel] [RFC PATCH v1 1/9] scripts/qemugdb/mtree.py: fix up mtree dump Alex Bennée
                   ` (9 more replies)
  0 siblings, 10 replies; 32+ messages in thread
From: Alex Bennée @ 2017-04-03 12:45 UTC (permalink / raw)
  To: dovgaluk, rth, pbonzini
  Cc: peter.maydell, qemu-devel, mttcg, fred.konrad, a.rigo, cota,
	bobby.prani, nikunj, Alex Bennée

Hi,

This is the current state of my fixes for icount based record and
replay. It doesn't completely fix the problem (hence the RFC status)
but improves it to the point that I have been able to record and
replay the boot of a vexpress kernel.

The first 3 patches are helper scripts I've been using during my
debugging. Of these, only the first is a real fix; the following 2
should probably be dropped from any pull request as they introduce new
features rather than fix anything.

We then have another BQL fix for i386. I haven't had a chance to
reproduce the problem myself so far, but the fix looks perfectly sane
to me.

Finally the fixes for icount:

  cpus: remove icount handling from qemu_tcg_cpu_thread_fn

  Simple clean-up as we don't do icount for MTTCG

  cpus: check cpu->running in cpu_get_icount_raw()

  I'm not sure the race actually happens, and once outside of
  cpu->running the icount counters should be zero. However, it seems a
  sensible precaution.

  cpus: move icount preparation out of tcg_exec_cpu

  This is a light refactoring that stops the icount work getting in
  the way of the main part of tcg_exec_cpu. It also removes some
  redundant assignments and replaces them with asserts for now.

  cpus: don't credit executed instructions before they have run

  This is the main one which ensures we never jump forward in time and
  cpu_get_icount_raw() remains consistent.
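  To illustrate the idea only, here is a toy model in Python. It is not
  the code from the patch: ToyVCPU, ToyClock, execute() and every field
  name are made up for this sketch, and it assumes the fix amounts to
  granting a budget up front and crediting the global count only with
  instructions that have actually run. The read() method also folds in
  the cpu->running check from the earlier patch, as I understand it.

```python
class ToyVCPU:
    """Toy stand-in for a vCPU under icount; not QEMU's real CPUState."""

    def __init__(self):
        self.running = False
        self.budget = 0      # instructions granted for the current slice
        self.remaining = 0   # instructions of that slice not yet executed


class ToyClock:
    """Hypothetical global instruction counter."""

    def __init__(self):
        self.icount = 0  # instructions known to have executed

    def start_slice(self, cpu, budget):
        # Grant a budget but do NOT add it to self.icount yet.
        cpu.budget = budget
        cpu.remaining = budget
        cpu.running = True

    def read(self, cpu=None):
        # Only a running vCPU has in-flight progress to account for.
        if cpu is not None and cpu.running:
            return self.icount + (cpu.budget - cpu.remaining)
        return self.icount

    def end_slice(self, cpu):
        # Credit only what actually ran, then clear the per-slice state.
        self.icount += cpu.budget - cpu.remaining
        cpu.budget = 0
        cpu.remaining = 0
        cpu.running = False


def execute(cpu, n):
    """Pretend to run up to n instructions from the current budget."""
    ran = min(n, cpu.remaining)
    cpu.remaining -= ran
    return ran
```

  In this model a reader that passes the running vCPU sees the count
  advance only as instructions retire, so it never jumps forward by a
  whole budget. A reader without a vCPU sees only the committed count,
  which is the kind of disparity the replay patch below has to cope
  with.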

  replay: gracefully handle backward time events

  This is the most hand-wavy patch. It glosses over the disparity in
  time between the vCPU thread and the main-loop by jumping forward to
  the most current time value. However, it is not really deterministic
  and runs into potential problems with sequencing of log events.

  I think a better fix would be to extend replay_lock() so all related
  log events are serialised and we don't end up with interleaved
  events from the vCPU thread and the main-loop.
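
The serialisation idea in the last paragraph can be sketched with a toy
model in Python. This is an assumption about what an extended
replay_lock() would guarantee, not QEMU code: ToyReplayLog,
write_group() and the event names "clock", "event" and "end" are all
hypothetical.

```python
import threading


class ToyReplayLog:
    """Toy replay log; a single lock covers each group of related events."""

    def __init__(self):
        self._lock = threading.Lock()
        self.events = []

    def write_group(self, source, kinds):
        # Hold the lock for the whole group so another thread cannot
        # slip its own events in between.
        with self._lock:
            for kind in kinds:
                self.events.append((source, kind))


def run_demo(rounds=200):
    """Have a 'vcpu' and a 'main-loop' thread write groups concurrently."""
    log = ToyReplayLog()

    def worker(source):
        for _ in range(rounds):
            log.write_group(source, ["clock", "event", "end"])

    threads = [threading.Thread(target=worker, args=(name,))
               for name in ("vcpu", "main-loop")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log.events


def groups_are_contiguous(events, group_len=3):
    """Check that every group of group_len events has a single source."""
    for i in range(0, len(events), group_len):
        chunk = events[i:i + group_len]
        if len({source for source, _ in chunk}) != 1:
            return False
    return True
```

The order in which groups from the two threads land in the log is still
up to the scheduler; the lock only guarantees that each group stays in
one piece.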

I think the cpus: patches should probably go into the next
pull-request while we see if we can come up with a better final
solution for fixing record/replay. However given how long this
regression has run during the release candidate process I wanted to
update everyone on the current status and get feedback ASAP.

Cheers,


Alex Bennée (9):
  scripts/qemugdb/mtree.py: fix up mtree dump
  scripts/qemu-gdb/timers.py: new helper to dump timer state
  scripts/replay-dump.py: replay log dumper
  target/i386/misc_helper: wrap BQL around another IRQ generator
  cpus: remove icount handling from qemu_tcg_cpu_thread_fn
  cpus: check cpu->running in cpu_get_icount_raw()
  cpus: move icount preparation out of tcg_exec_cpu
  cpus: don't credit executed instructions before they have run
  replay: gracefully handle backward time events

 cpus.c                    |  94 +++++++++++-----
 include/qom/cpu.h         |   1 +
 replay/replay-internal.c  |   7 ++
 replay/replay.c           |   9 +-
 scripts/qemu-gdb.py       |   3 +-
 scripts/qemugdb/mtree.py  |  12 +-
 scripts/qemugdb/timers.py |  54 +++++++++
 scripts/replay-dump.py    | 272 ++++++++++++++++++++++++++++++++++++++++++++++
 target/i386/misc_helper.c |   3 +
 9 files changed, 423 insertions(+), 32 deletions(-)
 create mode 100644 scripts/qemugdb/timers.py
 create mode 100755 scripts/replay-dump.py

-- 
2.11.0


Thread overview: 32+ messages
2017-04-03 12:45 [Qemu-devel] [RFC PATCH v1 0/9] MTTCG and record/replay fixes for rc3 Alex Bennée
2017-04-03 12:45 ` [Qemu-devel] [RFC PATCH v1 1/9] scripts/qemugdb/mtree.py: fix up mtree dump Alex Bennée
2017-04-03 12:45 ` [Qemu-devel] [RFC PATCH v1 2/9] scripts/qemu-gdb/timers.py: new helper to dump timer state Alex Bennée
2017-04-03 14:02   ` Philippe Mathieu-Daudé
2017-04-03 12:45 ` [Qemu-devel] [RFC PATCH v1 3/9] scripts/replay-dump.py: replay log dumper Alex Bennée
2017-04-03 12:45 ` [Qemu-devel] [RFC PATCH v1 4/9] target/i386/misc_helper: wrap BQL around another IRQ generator Alex Bennée
2017-04-04 16:53   ` Richard Henderson
2017-04-04 17:36     ` Eduardo Habkost
2017-04-03 12:45 ` [Qemu-devel] [RFC PATCH v1 5/9] cpus: remove icount handling from qemu_tcg_cpu_thread_fn Alex Bennée
2017-04-04 16:53   ` Richard Henderson
2017-04-03 12:45 ` [Qemu-devel] [RFC PATCH v1 6/9] cpus: check cpu->running in cpu_get_icount_raw() Alex Bennée
2017-04-03 14:00   ` Philippe Mathieu-Daudé
2017-04-04 16:54   ` Richard Henderson
2017-04-03 12:45 ` [Qemu-devel] [RFC PATCH v1 7/9] cpus: move icount preparation out of tcg_exec_cpu Alex Bennée
2017-04-04  5:39   ` Pavel Dovgalyuk
2017-04-04  8:56     ` Alex Bennée
2017-04-04 10:46       ` Alex Bennée
2017-04-04 10:53         ` Paolo Bonzini
2017-04-04 12:31           ` Alex Bennée
2017-04-04 12:37             ` Paolo Bonzini
2017-04-04 13:29               ` Alex Bennée
2017-04-05 10:44                 ` Pavel Dovgalyuk
2017-04-05 11:18                   ` Alex Bennée
2017-04-03 12:45 ` [Qemu-devel] [RFC PATCH v1 8/9] cpus: don't credit executed instructions before they have run Alex Bennée
2017-04-03 17:04   ` Paolo Bonzini
2017-04-04  5:37   ` Pavel Dovgalyuk
2017-04-04 10:13     ` Paolo Bonzini
2017-04-07 11:27       ` Pavel Dovgalyuk
2017-04-04 14:39   ` Paolo Bonzini
2017-04-03 12:45 ` [Qemu-devel] [RFC PATCH v1 9/9] replay: gracefully handle backward time events Alex Bennée
2017-04-03 17:03 ` [Qemu-devel] [RFC PATCH v1 0/9] MTTCG and record/replay fixes for rc3 Paolo Bonzini
2017-04-04  8:50   ` Alex Bennée
