From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:38110) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aTozh-0004P7-MG for qemu-devel@nongnu.org; Thu, 11 Feb 2016 06:01:06 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1aTozb-0002ht-LW for qemu-devel@nongnu.org; Thu, 11 Feb 2016 06:01:05 -0500 Received: from mail.ispras.ru ([83.149.199.45]:58324) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aTozb-0002hh-8q for qemu-devel@nongnu.org; Thu, 11 Feb 2016 06:00:59 -0500 From: "Pavel Dovgalyuk" References: <20160209055506.8208.67.stgit@PASHA-ISP> <20160209055524.8208.16023.stgit@PASHA-ISP> <20160209102739.GB8554@noname.redhat.com> <001201d16330$6ce67e40$46b37ac0$@Dovgaluk@ispras.ru> <20160210114531.GA5474@noname.redhat.com> <000601d163fb$4cbcae70$e6360b50$@Dovgaluk@ispras.ru> <20160210122816.GB5474@noname.redhat.com> <000a01d16401$be4d31d0$3ae79570$@Dovgaluk@ispras.ru> <20160210132545.GC5474@noname.redhat.com> <000601d16492$3a67d0a0$af3771e0$@Dovgaluk@ispras.ru> <20160211094343.GA4419@noname.redhat.com> In-Reply-To: <20160211094343.GA4419@noname.redhat.com> Date: Thu, 11 Feb 2016 14:00:51 +0300 Message-ID: <001001d164bb$789e0960$69da1c20$@Dovgaluk@ispras.ru> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Content-Language: ru Subject: Re: [Qemu-devel] [PATCH 3/3] replay: introduce block devices record/replay List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: 'Kevin Wolf' Cc: edgar.iglesias@xilinx.com, peter.maydell@linaro.org, igor.rubinov@gmail.com, mark.burton@greensocs.com, real@ispras.ru, hines@cert.org, qemu-devel@nongnu.org, maria.klimushenkova@ispras.ru, stefanha@redhat.com, pbonzini@redhat.com, batuzovk@ispras.ru, alex.bennee@linaro.org, fred.konrad@greensocs.com > From: Kevin Wolf [mailto:kwolf@redhat.com] > Am 11.02.2016 um 07:05 hat Pavel Dovgalyuk geschrieben: > > > From: Kevin Wolf [mailto:kwolf@redhat.com] > > > Am 10.02.2016 um 13:51 hat Pavel Dovgalyuk geschrieben: > > > > However, I don't understand yet which layer do you offer as the candidate > > > > for record/replay? What functions should be changed? > > > > I would like to investigate this way, but I don't got it yet. > > > > > > At the core, I wouldn't change any existing function, but introduce a > > > new block driver. You could copy raw_bsd.c for a start and then tweak > > > it. Leave out functions that you don't want to support, and add the > > > necessary magic to .bdrv_co_readv/writev. > > > > > > Something like this (can probably be generalised for more than just > > > reads as the part after the bdrv_co_reads() call should be the same for > > > reads, writes and any other request types): > > > > > > int blkreplay_co_readv() > > > { > > > BlockReplayState *s = bs->opaque; > > > int reqid = s->reqid++; > > > > > > bdrv_co_readv(bs->file, ...); > > > > > > if (mode == record) { > > > log(reqid, time); > > > } else { > > > assert(mode == replay); > > > bool *done = req_replayed_list_get(reqid) > > > if (done) { > > > *done = true; > > > } else { > > > req_completed_list_insert(reqid, qemu_coroutine_self()); > > > qemu_coroutine_yield(); > > > } > > > } > > > } > > > > > > /* called by replay.c */ > > > int blkreplay_run_event() > > > { > > > if (mode == replay) { > > > co = req_completed_list_get(e.reqid); > > > if (co) { > > > qemu_coroutine_enter(co); > > > } else { > > > bool done = false; > > > req_replayed_list_insert(reqid, &done); > > > /* wait synchronously for completion */ > > > while (!done) { > > > aio_poll(); > > > } > > > } > > > } > > > } > > > > > > Where we could consider changing existing code is that it might be > > > desirable to automatically put an instance of this block driver on top > > > of every block device when record/replay is used. If we don't do that, > > > you need to explicitly specify -drive driver=blkreplay,... > > > > As far, as I understand, all synchronous read/write request are also passed > > through this coroutines layer. > > Yes, all read/write requests go through the same function internally, no > matter which external interface was used. > > > It means that every disk access in replay phase should match the recording phase. > > Right. If I'm not mistaken, this was the fundamental requirement you > have, so I wouldn't have suggested this otherwise. > > > Record/replay is intended to be used for debugging and analysis. > > When execution is replayed, guest machine cannot notice analysis overhead. > > Some analysis methods may include disk image reading. E.g., qemu-based > > analysis framework DECAF uses sleuthkit for disk forensics ( > https://github.com/sycurelab/DECAF ). > > If similar framework will be used with replay, forensics disk access operations > > won't work if we will record/replay the coroutines. > > Sorry, I'm not sure if I can follow. > > If such analysis software runs in the guest, it's not a replay any more > and I completely fail to see what you're doing. > > If it's a qemu component independent from the guest, then my method > gives you a clean way to bypass the replay driver that wouldn't be > possible with yours. The second one. qemu may be extended with some components that perform guest introspection. > If your plan was to record/replay only async requests and then use sync > requests to bypass the record/replay, let me clearly state that this is > the wrong approach: There are still guest devices which do synchronous > I/O and need to be considered in the replay log, and you shouldn't > prevent the analysis code from using AIO (in fact, using sync I/O in new > code is very much frowned upon). Why do guest synchronous requests have to be recorded? Aren't they completely deterministic? > I can explain in more detail what the block device structure looks like > and how to access an image with and without record/replay, but first let > me please know whether I guessed right what your problem is. Or if I > missed your point, can you please describe in detail a case that > wouldn't work? You have understood it correctly. And what is the solution for bypassing one of the layers from component that should not affect the replay? > > And if we'll record only high-level asynchronous disk operations, then > > there will be a way to performs disk accesses without matching these > > operations with replay log. > > > > However, we already tried to record disk operations completion events > > (in previous more complicated version). Recording overhead was the same > > as with new blk_aio_ version of the patch. > > Well, if your code really intentionally ignores synchronous requests, > then the advantage you get from my approach is being correct. > > Other than that, I'm personally mainly interested in keeping the core > block layer easily maintainable and the hot I/O paths clean for cases > that don't use all the features qemu has. This means that in the mid > term existing features need to move out of the core block layer rather > than new ones being added. So doing things in a separate block driver > might not be an advantage for your immediate needs, but it's important > for clean modularisation and a requirement for things to get merged > (unless you can prove that it's impossible to do this way, but I don't > think you can - otherwise the whole block layer design would be wrong). Ok, let's try to find the solution with a new driver. Pavel Dovgalyuk