qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Kevin Wolf <kwolf@redhat.com>
To: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
Cc: edgar.iglesias@xilinx.com, peter.maydell@linaro.org,
	igor.rubinov@gmail.com, mark.burton@greensocs.com,
	real@ispras.ru, hines@cert.org, qemu-devel@nongnu.org,
	maria.klimushenkova@ispras.ru, stefanha@redhat.com,
	pbonzini@redhat.com, batuzovk@ispras.ru, alex.bennee@linaro.org,
	fred.konrad@greensocs.com
Subject: Re: [Qemu-devel] [PATCH 3/3] replay: introduce block devices record/replay
Date: Wed, 10 Feb 2016 14:25:45 +0100	[thread overview]
Message-ID: <20160210132545.GC5474@noname.redhat.com> (raw)
In-Reply-To: <000a01d16401$be4d31d0$3ae79570$@Dovgaluk@ispras.ru>

Am 10.02.2016 um 13:51 hat Pavel Dovgalyuk geschrieben:
> > From: Kevin Wolf [mailto:kwolf@redhat.com]
> > Am 10.02.2016 um 13:05 hat Pavel Dovgalyuk geschrieben:
> > > > Am 09.02.2016 um 12:52 hat Pavel Dovgalyuk geschrieben:
> > > > > > From: Kevin Wolf [mailto:kwolf@redhat.com]
> > > > > > But even this doesn't feel completely right, because block drivers are
> > > > > > already layered and there is no need to hardcode something optional (and
> > > > > > rarely used) in the hot code path that could just be another layer.
> > > > > >
> > > > > > I assume that you know beforehand if you want to replay something, so
> > > > > > requiring you to configure your block devices with a replay driver on
> > > > > > top of the stack seems reasonable enough.
> > > > >
> > > > > I cannot use block drivers for this. When driver functions are used, QEMU
> > > > > is already used coroutines (and probably started bottom halves).
> > > > > Coroutines make execution non-deterministic.
> > > > > That's why we have to intercept blk_aio_ functions, that are called
> > > > > deterministically.
> > > >
> > > > What does "deterministic" mean in this context, i.e. what are your exact
> > > > requirements?
> > >
> > > "Deterministic" means that the replayed execution should run exactly
> > > the same guest instructions in the same sequence, as in recording session.
> > 
> > Okay. I think with this we can do better than what you have now.
> > 
> > > > I don't think that coroutines introduce anything non-deterministic per
> > > > se. Depending on what you mean by it, the block layer code paths in
> > > > block.c may contain problematic code.
> > >
> > > They are non-deterministic if we need instruction-level accuracy.
> > > Thread switching (and therefore callbacks and BH execution) is non-deterministic.
> > 
> > Thread switching depends on an external event (the kernel scheduler
> > deciding to switch), so agreed, if a thread switch ever influences what
> > the guest sees, that would be a problem.
> > 
> > Generally, however, callbacks and BHs don't involve a thread switch at
> > all (BHs can be invoked from a different thread in theory, but we have
> > very few of those cases and they shouldn't be visible for the guest).
> > The same is true for coroutines, which are semantically equivalent to
> > callbacks.
> > 
> > > In two different executions these callbacks may happen at different moments of
> > > time (counting in number of executed instructions).
> > > All operations with virtual devices (including memory, interrupt controller,
> > > and disk drive controller) should happen at deterministic moments of time
> > > to be replayable.
> > 
> > Right, so let's talk about what this external non-deterministic event
> > really is.
> > 
> > I think the only thing whose timing is unknown in the block layer is the
> > completion of I/O requests. This non-determinism comes from the time the
> > I/O syscalls made by the lowest layer (usually raw-posix) take.
> 
> Right.
> 
> > This means that we can add logic to remove the non-determinism at the
> > point of our choice between raw-posix and the guest device emulation. A
> > block driver on top is as good as anything else.
> > 
> > While recording, this block driver would just pass the request to next
> > lower layer (starting a request is deterministic, so it doesn't need to
> > be logged) and once the request completes it logs it. While replaying,
> > the completion of requests is delayed until we read it in the log; if we
> > read it in the log and the request hasn't completed yet, we do a busy
> > wait for it (while(!completed) aio_poll();).
> 
> I tried serializing all bottom halves and worker thread callbacks in
> previous version of the patches. That code was much more complicated 
> and error-prone than the current version. We had to classify all bottom
> halves to recorded and non-recorded (because sometimes they are used
> for qemu's purposes, not the guest ones).
> 
> However, I don't understand yet which layer do you offer as the candidate
> for record/replay? What functions should be changed?
> I would like to investigate this way, but I don't got it yet.

At the core, I wouldn't change any existing function, but introduce a
new block driver. You could copy raw_bsd.c for a start and then tweak
it. Leave out functions that you don't want to support, and add the
necessary magic to .bdrv_co_readv/writev.

Something like this (can probably be generalised for more than just
reads as the part after the bdrv_co_reads() call should be the same for
reads, writes and any other request types):

int blkreplay_co_readv()
{
    BlockReplayState *s = bs->opaque;
    int reqid = s->reqid++;

    bdrv_co_readv(bs->file, ...);

    if (mode == record) {
        log(reqid, time);
    } else {
        assert(mode == replay);
        bool *done = req_replayed_list_get(reqid)
        if (done) {
            *done = true;
        } else {
            req_completed_list_insert(reqid, qemu_coroutine_self());
            qemu_coroutine_yield();
        }
    }
}

/* called by replay.c */
int blkreplay_run_event()
{
    if (mode == replay) {
        co = req_completed_list_get(e.reqid);
        if (co) {
            qemu_coroutine_enter(co);
        } else {
            bool done = false;
            req_replayed_list_insert(reqid, &done);
            /* wait synchronously for completion */
            while (!done) {
                aio_poll();
            }
        }
    }
}

Where we could consider changing existing code is that it might be
desirable to automatically put an instance of this block driver on top
of every block device when record/replay is used. If we don't do that,
you need to explicitly specify -drive driver=blkreplay,...

> > This model would get rid of the bdrv_drain_all() that you call
> > everywhere and therefore allow concurrent requests, giving a result that
> > is much closer to the "normal" behaviour without replay.
> > 
> > > > The block layer uses bottom halves in some cases for request completion,
> > > > but not until calling into the first driver (why would they be a
> > > > problem?). What could happen is that a request is serialised and
> > > > therefore delayed in some special configurations, which sounds a lot
> > > > like what you wanted to avoid.
> > >
> > > Drivers cannot distinguish the requests from guest CPU and from savevm/loadvm.
> > > First ones have to be deterministic, because they affect guest memory,
> > > virtual disk controller, and interrupts.
> > 
> > Sure they can, these are two different callbacks. But even if they
> > couldn't, making more things than necessary deterministic might be
> > wasteful, but not really harmful.
> 
> Is there any universal way to check this?

No, the block layer doesn't know its caller. It's only possible in this
specific case because the guest never calls .bdrv_save_vmstate().

But as I said, logging more requests than necessary doesn't really hurt
expect making the log a bit larger.

Kevin

  reply	other threads:[~2016-02-10 13:25 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-02-09  5:55 [Qemu-devel] [PATCH 0/3] Deterministic replay extensions Pavel Dovgalyuk
2016-02-09  5:55 ` [Qemu-devel] [PATCH 1/3] replay: character devices Pavel Dovgalyuk
2016-02-09  5:55 ` [Qemu-devel] [PATCH 2/3] replay: introduce new checkpoint for icount warp Pavel Dovgalyuk
2016-02-09  5:55 ` [Qemu-devel] [PATCH 3/3] replay: introduce block devices record/replay Pavel Dovgalyuk
2016-02-09 10:27   ` Kevin Wolf
2016-02-09 11:52     ` Pavel Dovgalyuk
2016-02-10 11:45       ` Kevin Wolf
2016-02-10 12:05         ` Pavel Dovgalyuk
2016-02-10 12:28           ` Kevin Wolf
2016-02-10 12:51             ` Pavel Dovgalyuk
2016-02-10 13:25               ` Kevin Wolf [this message]
2016-02-10 13:33                 ` Pavel Dovgalyuk
2016-02-10 13:52                   ` Kevin Wolf
2016-02-11  6:05                 ` Pavel Dovgalyuk
2016-02-11  9:43                   ` Kevin Wolf
2016-02-11 11:00                     ` Pavel Dovgalyuk
2016-02-11 12:18                       ` Kevin Wolf
2016-02-11 12:24                         ` Pavel Dovgalyuk
2016-02-12  8:33                         ` Pavel Dovgalyuk
2016-02-12  9:44                           ` Kevin Wolf
2016-02-12 13:19                 ` Pavel Dovgalyuk
2016-02-12 13:58                   ` Kevin Wolf
2016-02-15  8:38                     ` Pavel Dovgalyuk
2016-02-15  9:10                       ` Kevin Wolf
2016-02-15  9:14                       ` Pavel Dovgalyuk
2016-02-15  9:38                         ` Kevin Wolf
2016-02-15 11:19                           ` Pavel Dovgalyuk
2016-02-15 12:46                             ` Kevin Wolf
2016-02-15 13:54                           ` Pavel Dovgalyuk
2016-02-15 14:06                             ` Kevin Wolf
2016-02-15 14:24                               ` Pavel Dovgalyuk
2016-02-15 15:01                                 ` Kevin Wolf
2016-02-16  6:25                                   ` Pavel Dovgalyuk
2016-02-16 10:02                                     ` Kevin Wolf
2016-02-16 11:20                                       ` Pavel Dovgalyuk
2016-02-16 12:54                                         ` Kevin Wolf
2016-02-18  9:18                                           ` Pavel Dovgalyuk
2016-02-20  7:11                                           ` Pavel Dovgalyuk
2016-02-22 11:06                                             ` Kevin Wolf
2016-02-24 11:59                                               ` Pavel Dovgalyuk
2016-02-24 13:14                                                 ` Kevin Wolf
2016-02-25  9:06                                                   ` Pavel Dovgalyuk
2016-02-26  9:01                                                     ` Kevin Wolf
2016-02-29  7:03                                                       ` Pavel Dovgalyuk
2016-02-29  7:54                                                         ` Kevin Wolf
2016-02-15 14:50                               ` Pavel Dovgalyuk

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160210132545.GC5474@noname.redhat.com \
    --to=kwolf@redhat.com \
    --cc=Pavel.Dovgaluk@ispras.ru \
    --cc=alex.bennee@linaro.org \
    --cc=batuzovk@ispras.ru \
    --cc=edgar.iglesias@xilinx.com \
    --cc=fred.konrad@greensocs.com \
    --cc=hines@cert.org \
    --cc=igor.rubinov@gmail.com \
    --cc=maria.klimushenkova@ispras.ru \
    --cc=mark.burton@greensocs.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=real@ispras.ru \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).