qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Kevin Wolf <kwolf@redhat.com>
To: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
Cc: edgar.iglesias@xilinx.com, peter.maydell@linaro.org,
	igor.rubinov@gmail.com, mark.burton@greensocs.com,
	real@ispras.ru, hines@cert.org, qemu-devel@nongnu.org,
	maria.klimushenkova@ispras.ru, stefanha@redhat.com,
	pbonzini@redhat.com, batuzovk@ispras.ru, alex.bennee@linaro.org,
	fred.konrad@greensocs.com
Subject: Re: [Qemu-devel] [PATCH 3/3] replay: introduce block devices record/replay
Date: Wed, 10 Feb 2016 13:28:16 +0100	[thread overview]
Message-ID: <20160210122816.GB5474@noname.redhat.com> (raw)
In-Reply-To: <000601d163fb$4cbcae70$e6360b50$@Dovgaluk@ispras.ru>

Am 10.02.2016 um 13:05 hat Pavel Dovgalyuk geschrieben:
> > -----Original Message-----
> > Am 09.02.2016 um 12:52 hat Pavel Dovgalyuk geschrieben:
> > > > From: Kevin Wolf [mailto:kwolf@redhat.com]
> > > > But even this doesn't feel completely right, because block drivers are
> > > > already layered and there is no need to hardcode something optional (and
> > > > rarely used) in the hot code path that could just be another layer.
> > > >
> > > > I assume that you know beforehand if you want to replay something, so
> > > > requiring you to configure your block devices with a replay driver on
> > > > top of the stack seems reasonable enough.
> > >
> > > I cannot use block drivers for this. When driver functions are used, QEMU
> > > is already used coroutines (and probably started bottom halves).
> > > Coroutines make execution non-deterministic.
> > > That's why we have to intercept blk_aio_ functions, that are called
> > > deterministically.
> > 
> > What does "deterministic" mean in this context, i.e. what are your exact
> > requirements?
> 
> "Deterministic" means that the replayed execution should run exactly
> the same guest instructions in the same sequence, as in recording session.

Okay. I think with this we can do better than what you have now.

> > I don't think that coroutines introduce anything non-deterministic per
> > se. Depending on what you mean by it, the block layer code paths in
> > block.c may contain problematic code.
> 
> They are non-deterministic if we need instruction-level accuracy.
> Thread switching (and therefore callbacks and BH execution) is non-deterministic.

Thread switching depends on an external event (the kernel scheduler
deciding to switch), so agreed, if a thread switch ever influences what
the guest sees, that would be a problem.

Generally, however, callbacks and BHs don't involve a thread switch at
all (BHs can be invoked from a different thread in theory, but we have
very few of those cases and they shouldn't be visible for the guest).
The same is true for coroutines, which are semantically equivalent to
callbacks.

> In two different executions these callbacks may happen at different moments of
> time (counting in number of executed instructions).
> All operations with virtual devices (including memory, interrupt controller,
> and disk drive controller) should happen at deterministic moments of time
> to be replayable.

Right, so let's talk about what this external non-deterministic event
really is.

I think the only thing whose timing is unknown in the block layer is the
completion of I/O requests. This non-determinism comes from the time the
I/O syscalls made by the lowest layer (usually raw-posix) take.

This means that we can add logic to remove the non-determinism at the
point of our choice between raw-posix and the guest device emulation. A
block driver on top is as good as anything else.

While recording, this block driver would just pass the request to next
lower layer (starting a request is deterministic, so it doesn't need to
be logged) and once the request completes it logs it. While replaying,
the completion of requests is delayed until we read it in the log; if we
read it in the log and the request hasn't completed yet, we do a busy
wait for it (while(!completed) aio_poll();).

This model would get rid of the bdrv_drain_all() that you call
everywhere and therefore allow concurrent requests, giving a result that
is much closer to the "normal" behaviour without replay.

> > The block layer uses bottom halves in some cases for request completion,
> > but not until calling into the first driver (why would they be a
> > problem?). What could happen is that a request is serialised and
> > therefore delayed in some special configurations, which sounds a lot
> > like what you wanted to avoid.
> 
> Drivers cannot distinguish the requests from guest CPU and from savevm/loadvm.
> First ones have to be deterministic, because they affect guest memory,
> virtual disk controller, and interrupts.

Sure they can, these are two different callbacks. But even if they
couldn't, making more things than necessary deterministic might be
wasteful, but not really harmful.

> > Is there a writeup somewhere of the context in which this code is called
> > and the rules that need to be considered for replay to work?
> 
> There is some information about it in docs/replay.txt and in
> http://wiki.qemu.org/Features/record-replay

Thanks, that was useful. Especially the explanation of non-deterministic
events in docs/replay.txt seems to be very much in line with what I'm
suggesting, as it made clear that external events are what we should be
looking for.

Kevin

  reply	other threads:[~2016-02-10 12:28 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-02-09  5:55 [Qemu-devel] [PATCH 0/3] Deterministic replay extensions Pavel Dovgalyuk
2016-02-09  5:55 ` [Qemu-devel] [PATCH 1/3] replay: character devices Pavel Dovgalyuk
2016-02-09  5:55 ` [Qemu-devel] [PATCH 2/3] replay: introduce new checkpoint for icount warp Pavel Dovgalyuk
2016-02-09  5:55 ` [Qemu-devel] [PATCH 3/3] replay: introduce block devices record/replay Pavel Dovgalyuk
2016-02-09 10:27   ` Kevin Wolf
2016-02-09 11:52     ` Pavel Dovgalyuk
2016-02-10 11:45       ` Kevin Wolf
2016-02-10 12:05         ` Pavel Dovgalyuk
2016-02-10 12:28           ` Kevin Wolf [this message]
2016-02-10 12:51             ` Pavel Dovgalyuk
2016-02-10 13:25               ` Kevin Wolf
2016-02-10 13:33                 ` Pavel Dovgalyuk
2016-02-10 13:52                   ` Kevin Wolf
2016-02-11  6:05                 ` Pavel Dovgalyuk
2016-02-11  9:43                   ` Kevin Wolf
2016-02-11 11:00                     ` Pavel Dovgalyuk
2016-02-11 12:18                       ` Kevin Wolf
2016-02-11 12:24                         ` Pavel Dovgalyuk
2016-02-12  8:33                         ` Pavel Dovgalyuk
2016-02-12  9:44                           ` Kevin Wolf
2016-02-12 13:19                 ` Pavel Dovgalyuk
2016-02-12 13:58                   ` Kevin Wolf
2016-02-15  8:38                     ` Pavel Dovgalyuk
2016-02-15  9:10                       ` Kevin Wolf
2016-02-15  9:14                       ` Pavel Dovgalyuk
2016-02-15  9:38                         ` Kevin Wolf
2016-02-15 11:19                           ` Pavel Dovgalyuk
2016-02-15 12:46                             ` Kevin Wolf
2016-02-15 13:54                           ` Pavel Dovgalyuk
2016-02-15 14:06                             ` Kevin Wolf
2016-02-15 14:24                               ` Pavel Dovgalyuk
2016-02-15 15:01                                 ` Kevin Wolf
2016-02-16  6:25                                   ` Pavel Dovgalyuk
2016-02-16 10:02                                     ` Kevin Wolf
2016-02-16 11:20                                       ` Pavel Dovgalyuk
2016-02-16 12:54                                         ` Kevin Wolf
2016-02-18  9:18                                           ` Pavel Dovgalyuk
2016-02-20  7:11                                           ` Pavel Dovgalyuk
2016-02-22 11:06                                             ` Kevin Wolf
2016-02-24 11:59                                               ` Pavel Dovgalyuk
2016-02-24 13:14                                                 ` Kevin Wolf
2016-02-25  9:06                                                   ` Pavel Dovgalyuk
2016-02-26  9:01                                                     ` Kevin Wolf
2016-02-29  7:03                                                       ` Pavel Dovgalyuk
2016-02-29  7:54                                                         ` Kevin Wolf
2016-02-15 14:50                               ` Pavel Dovgalyuk

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160210122816.GB5474@noname.redhat.com \
    --to=kwolf@redhat.com \
    --cc=Pavel.Dovgaluk@ispras.ru \
    --cc=alex.bennee@linaro.org \
    --cc=batuzovk@ispras.ru \
    --cc=edgar.iglesias@xilinx.com \
    --cc=fred.konrad@greensocs.com \
    --cc=hines@cert.org \
    --cc=igor.rubinov@gmail.com \
    --cc=maria.klimushenkova@ispras.ru \
    --cc=mark.burton@greensocs.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=real@ispras.ru \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).