From: Kevin Wolf <kwolf@redhat.com>
To: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
Cc: edgar.iglesias@xilinx.com, peter.maydell@linaro.org,
igor.rubinov@gmail.com, mark.burton@greensocs.com,
real@ispras.ru, hines@cert.org, qemu-devel@nongnu.org,
maria.klimushenkova@ispras.ru, stefanha@redhat.com,
pbonzini@redhat.com, batuzovk@ispras.ru, alex.bennee@linaro.org,
fred.konrad@greensocs.com
Subject: Re: [Qemu-devel] [PATCH 3/3] replay: introduce block devices record/replay
Date: Fri, 12 Feb 2016 10:44:41 +0100 [thread overview]
Message-ID: <20160212094441.GA4828@noname.redhat.com> (raw)
In-Reply-To: <000b01d16570$099e8020$1cdb8060$@Dovgaluk@ispras.ru>
Am 12.02.2016 um 09:33 hat Pavel Dovgalyuk geschrieben:
> > From: Kevin Wolf [mailto:kwolf@redhat.com]
> > Am 11.02.2016 um 12:00 hat Pavel Dovgalyuk geschrieben:
> > > > From: Kevin Wolf [mailto:kwolf@redhat.com]
> > > > Am 11.02.2016 um 07:05 hat Pavel Dovgalyuk geschrieben:
> > > > > > From: Kevin Wolf [mailto:kwolf@redhat.com]
> > > > > > Am 10.02.2016 um 13:51 hat Pavel Dovgalyuk geschrieben:
> > > > > > > However, I don't understand yet which layer do you offer as the candidate
> > > > > > > for record/replay? What functions should be changed?
> > > > > > > I would like to investigate this way, but I don't got it yet.
> > > > > >
> > > > > > At the core, I wouldn't change any existing function, but introduce a
> > > > > > new block driver. You could copy raw_bsd.c for a start and then tweak
> > > > > > it. Leave out functions that you don't want to support, and add the
> > > > > > necessary magic to .bdrv_co_readv/writev.
> > > > > >
> > > > > > Something like this (can probably be generalised for more than just
> > > > > > reads as the part after the bdrv_co_reads() call should be the same for
> > > > > > reads, writes and any other request types):
> > > > > >
> > > > > > int blkreplay_co_readv()
> > > > > > {
> > > > > > BlockReplayState *s = bs->opaque;
> > > > > > int reqid = s->reqid++;
> > > > > >
> > > > > > bdrv_co_readv(bs->file, ...);
> > > > > >
> > > > > > if (mode == record) {
> > > > > > log(reqid, time);
> > > > > > } else {
> > > > > > assert(mode == replay);
> > > > > > bool *done = req_replayed_list_get(reqid)
> > > > > > if (done) {
> > > > > > *done = true;
> > > > > > } else {
> > > > > > req_completed_list_insert(reqid, qemu_coroutine_self());
> > > > > > qemu_coroutine_yield();
> > > > > > }
> > > > > > }
> > > > > > }
> > > > > >
> > > > > > /* called by replay.c */
> > > > > > int blkreplay_run_event()
> > > > > > {
> > > > > > if (mode == replay) {
> > > > > > co = req_completed_list_get(e.reqid);
> > > > > > if (co) {
> > > > > > qemu_coroutine_enter(co);
> > > > > > } else {
> > > > > > bool done = false;
> > > > > > req_replayed_list_insert(reqid, &done);
> > > > > > /* wait synchronously for completion */
> > > > > > while (!done) {
> > > > > > aio_poll();
> > > > > > }
> > > > > > }
> > > > > > }
> > > > > > }
> > > > > >
> > > > > > Where we could consider changing existing code is that it might be
> > > > > > desirable to automatically put an instance of this block driver on top
> > > > > > of every block device when record/replay is used. If we don't do that,
> > > > > > you need to explicitly specify -drive driver=blkreplay,...
> > > > >
> > > > > As far, as I understand, all synchronous read/write request are also passed
> > > > > through this coroutines layer.
> > > >
> > > > Yes, all read/write requests go through the same function internally, no
> > > > matter which external interface was used.
> > > >
> > > > > It means that every disk access in replay phase should match the recording phase.
> > > >
> > > > Right. If I'm not mistaken, this was the fundamental requirement you
> > > > have, so I wouldn't have suggested this otherwise.
> > > >
> > > > > Record/replay is intended to be used for debugging and analysis.
> > > > > When execution is replayed, guest machine cannot notice analysis overhead.
> > > > > Some analysis methods may include disk image reading. E.g., qemu-based
> > > > > analysis framework DECAF uses sleuthkit for disk forensics (
> > > > https://github.com/sycurelab/DECAF ).
> > > > > If similar framework will be used with replay, forensics disk access operations
> > > > > won't work if we will record/replay the coroutines.
> > > >
> > > > Sorry, I'm not sure if I can follow.
> > > >
> > > > If such analysis software runs in the guest, it's not a replay any more
> > > > and I completely fail to see what you're doing.
> > > >
> > > > If it's a qemu component independent from the guest, then my method
> > > > gives you a clean way to bypass the replay driver that wouldn't be
> > > > possible with yours.
> > >
> > > The second one. qemu may be extended with some components that
> > > perform guest introspection.
> > >
> > > > If your plan was to record/replay only async requests and then use sync
> > > > requests to bypass the record/replay, let me clearly state that this is
> > > > the wrong approach: There are still guest devices which do synchronous
> > > > I/O and need to be considered in the replay log, and you shouldn't
> > > > prevent the analysis code from using AIO (in fact, using sync I/O in new
> > > > code is very much frowned upon).
> > >
> > > Why do guest synchronous requests have to be recorded?
> > > Aren't they completely deterministic?
> >
> > Good point. I think you're right in practice. In theory, with dataplane
> > (i.e. when running the request in a separate thread) it could happen,
> > but I guess that isn't very compatible with replay anyway - and at the
> > first sight I couldn't see it performing synchronous requests either.
> >
> > > > I can explain in more detail what the block device structure looks like
> > > > and how to access an image with and without record/replay, but first let
> > > > me please know whether I guessed right what your problem is. Or if I
> > > > missed your point, can you please describe in detail a case that
> > > > wouldn't work?
> > >
> > > You have understood it correctly.
> > > And what is the solution for bypassing one of the layers from component that
> > > should not affect the replay?
> >
> > For this, you need to understand how block drivers are stacked in qemu.
> > Each driver in the stack has a separate struct BlockDriverState, which
> > can be used to access its data. You could hook up things like this:
> >
> > virtio-blk NBD server
> > -------------- ------------
> > | |
> > v |
> > +------------+ |
> > | blkreplay | |
> > +------------+ |
> > | |
> > v |
> > +------------+ |
> > | qcow2 | <---------------+
> > +------------+
> > |
> > v
> > +------------+
> > | raw-posix |
> > +------------+
> > |
> > v
> > --------------
> > filesystem
> >
> > As you see, what I've chosen for the external analysis interface is just
> > an NBD server as this is the component that we already have today. You
> > could hook up any other (new) code there; the important part is that it
> > doesn't work on the BDS of the blkreplay driver, but directly on the BDS
> > of the qcow2 driver.
> >
> > On the command line, it could look like this (this assumes that we don't
> > add syntactic sugar that creates the blkreplay part automatically - we
> > can always do that):
> >
> > -drive file=image.qcow2,if=none,id=img-direct
> > -drive driver=blkreplay,if=none,image=img-direct,id=img-blkreplay
> > -device virtio-blk-pci,drive=img-blkreplay
>
> Are there any hints for driver with these options?
> I can't figure out how to create _open function for that.
> blkdebug driver seems similar, but it receives image name directly, without referencing
> the lower level.
Actually, the 'image' option of blkdebug works like this. For example, I
just tried this commandline and it works:
-drive file=/tmp/test.qcow2,if=none,id=img
-drive driver=blkdebug,image=img,if=virtio
Just ignore the legacy part with 'x-image' and pass NULL as the filename
to bdrv_open_child().
Another driver that could be useful as an example is quorum.
Kevin
next prev parent reply other threads:[~2016-02-12 9:44 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-02-09 5:55 [Qemu-devel] [PATCH 0/3] Deterministic replay extensions Pavel Dovgalyuk
2016-02-09 5:55 ` [Qemu-devel] [PATCH 1/3] replay: character devices Pavel Dovgalyuk
2016-02-09 5:55 ` [Qemu-devel] [PATCH 2/3] replay: introduce new checkpoint for icount warp Pavel Dovgalyuk
2016-02-09 5:55 ` [Qemu-devel] [PATCH 3/3] replay: introduce block devices record/replay Pavel Dovgalyuk
2016-02-09 10:27 ` Kevin Wolf
2016-02-09 11:52 ` Pavel Dovgalyuk
2016-02-10 11:45 ` Kevin Wolf
2016-02-10 12:05 ` Pavel Dovgalyuk
2016-02-10 12:28 ` Kevin Wolf
2016-02-10 12:51 ` Pavel Dovgalyuk
2016-02-10 13:25 ` Kevin Wolf
2016-02-10 13:33 ` Pavel Dovgalyuk
2016-02-10 13:52 ` Kevin Wolf
2016-02-11 6:05 ` Pavel Dovgalyuk
2016-02-11 9:43 ` Kevin Wolf
2016-02-11 11:00 ` Pavel Dovgalyuk
2016-02-11 12:18 ` Kevin Wolf
2016-02-11 12:24 ` Pavel Dovgalyuk
2016-02-12 8:33 ` Pavel Dovgalyuk
2016-02-12 9:44 ` Kevin Wolf [this message]
2016-02-12 13:19 ` Pavel Dovgalyuk
2016-02-12 13:58 ` Kevin Wolf
2016-02-15 8:38 ` Pavel Dovgalyuk
2016-02-15 9:10 ` Kevin Wolf
2016-02-15 9:14 ` Pavel Dovgalyuk
2016-02-15 9:38 ` Kevin Wolf
2016-02-15 11:19 ` Pavel Dovgalyuk
2016-02-15 12:46 ` Kevin Wolf
2016-02-15 13:54 ` Pavel Dovgalyuk
2016-02-15 14:06 ` Kevin Wolf
2016-02-15 14:24 ` Pavel Dovgalyuk
2016-02-15 15:01 ` Kevin Wolf
2016-02-16 6:25 ` Pavel Dovgalyuk
2016-02-16 10:02 ` Kevin Wolf
2016-02-16 11:20 ` Pavel Dovgalyuk
2016-02-16 12:54 ` Kevin Wolf
2016-02-18 9:18 ` Pavel Dovgalyuk
2016-02-20 7:11 ` Pavel Dovgalyuk
2016-02-22 11:06 ` Kevin Wolf
2016-02-24 11:59 ` Pavel Dovgalyuk
2016-02-24 13:14 ` Kevin Wolf
2016-02-25 9:06 ` Pavel Dovgalyuk
2016-02-26 9:01 ` Kevin Wolf
2016-02-29 7:03 ` Pavel Dovgalyuk
2016-02-29 7:54 ` Kevin Wolf
2016-02-15 14:50 ` Pavel Dovgalyuk
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160212094441.GA4828@noname.redhat.com \
--to=kwolf@redhat.com \
--cc=Pavel.Dovgaluk@ispras.ru \
--cc=alex.bennee@linaro.org \
--cc=batuzovk@ispras.ru \
--cc=edgar.iglesias@xilinx.com \
--cc=fred.konrad@greensocs.com \
--cc=hines@cert.org \
--cc=igor.rubinov@gmail.com \
--cc=maria.klimushenkova@ispras.ru \
--cc=mark.burton@greensocs.com \
--cc=pbonzini@redhat.com \
--cc=peter.maydell@linaro.org \
--cc=qemu-devel@nongnu.org \
--cc=real@ispras.ru \
--cc=stefanha@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).