From: "Pavel Dovgalyuk"
Date: Thu, 25 Feb 2016 12:06:41 +0300
Subject: Re: [Qemu-devel] [PATCH 3/3] replay: introduce block devices record/replay
To: 'Kevin Wolf'
Cc: edgar.iglesias@xilinx.com, peter.maydell@linaro.org, igor.rubinov@gmail.com,
    mark.burton@greensocs.com, real@ispras.ru, hines@cert.org, qemu-devel@nongnu.org,
    maria.klimushenkova@ispras.ru, stefanha@redhat.com, pbonzini@redhat.com,
    batuzovk@ispras.ru, alex.bennee@linaro.org, fred.konrad@greensocs.com

> From: Kevin Wolf [mailto:kwolf@redhat.com]
> > > Coroutines aren't randomly assigned to threads, but threads actively
> > > enter coroutines. To my knowledge this happens only when starting a
> > > request (either vcpu or I/O thread; consistent per device) or by a
> > > callback when some event happens (only I/O thread). I can't see any
> > > non-determinism here.
> >
> > The behavior of coroutines looks strange to me.
> > Consider the code below (the co_readv function of the replay driver).
> > In record mode it somehow changes the thread it is assigned to:
> > the code at point A is executed in the CPU thread, while the code at
> > point B runs in some other thread.
> > Could this happen because the coroutine yields somewhere and its
> > execution is resumed by aio_poll, which is called from the iothread?
> > In that case the event-finishing callback cannot be executed
> > deterministically (always in the CPU thread or always in the I/O thread).
> >
> > static int coroutine_fn blkreplay_co_readv(BlockDriverState *bs,
> >     int64_t sector_num, int nb_sectors, QEMUIOVector *qiov)
> > {
> >     BDRVBlkreplayState *s = bs->opaque;
> >     uint32_t reqid = request_id++;
> >     Request *req;
> >     // A
> >     bdrv_co_readv(bs->file->bs, sector_num, nb_sectors, qiov);
> >
> >     if (replay_mode == REPLAY_MODE_RECORD) {
> >         replay_save_block_event(reqid);
> >     } else {
> >         assert(replay_mode == REPLAY_MODE_PLAY);
> >         if (reqid == current_request) {
> >             current_finished = true;
> >         } else {
> >             req = block_request_insert(reqid, bs, qemu_coroutine_self());
> >             qemu_coroutine_yield();
> >             block_request_remove(req);
> >         }
> >     }
> >     // B
> >     return 0;
> > }
>
> Yes, I guess this can happen. As I described above, the coroutine can be
> entered from a vcpu thread initially.
> After yielding for the first time, it is resumed from the I/O thread. So if
> there are paths where the coroutine never yields, the coroutine completes in
> the original vcpu thread. (It's not the common case that bdrv_co_readv()
> doesn't yield, but it happens e.g. with unallocated sectors in qcow2.)
>
> If this is a problem for you, you need to force the coroutine into the
> I/O thread. You can do that by scheduling a BH, then yield, and then let
> the BH reenter the coroutine.

Thanks, this approach seems to work. I got rid of replay_run_block_event,
because the BH basically does the same job (a rough sketch of the read path
is at the end of this mail).

There is one problem with the flush event: the flush callbacks are called for
all layers, and I couldn't synchronize them correctly yet. I'll probably have
to add a new callback to the block driver that handles the flush request for
the whole stack of drivers.

Pavel Dovgalyuk
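
P.S. A minimal sketch of the BH-based reentry, assuming a Request struct that
carries the coroutine and the BH. The names here (blkreplay_bh_cb, the Request
fields) and the omitted record/replay bookkeeping are only illustrative, not
the final patch:

typedef struct Request {
    Coroutine *co;
    QEMUBH *bh;
} Request;

/* The BH runs in the AioContext of the device (i.e. the I/O thread) and
 * re-enters the coroutine, so everything after the yield below executes
 * there. */
static void blkreplay_bh_cb(void *opaque)
{
    Request *req = opaque;
    qemu_coroutine_enter(req->co, NULL);
    qemu_bh_delete(req->bh);
    g_free(req);
}

static int coroutine_fn blkreplay_co_readv(BlockDriverState *bs,
    int64_t sector_num, int nb_sectors, QEMUIOVector *qiov)
{
    uint32_t reqid = request_id++;
    int ret = bdrv_co_readv(bs->file->bs, sector_num, nb_sectors, qiov);
    Request *req = g_new0(Request, 1);

    /* Schedule a BH in the AioContext of this BDS and yield; the BH
     * re-enters the coroutine, so point B and the replay event handling
     * always run in the I/O thread, no matter which thread started the
     * request. */
    req->co = qemu_coroutine_self();
    req->bh = aio_bh_new(bdrv_get_aio_context(bs), blkreplay_bh_cb, req);
    qemu_bh_schedule(req->bh);
    qemu_coroutine_yield();

    return ret;
}

This way both record and replay finish the block request in the same thread,
which is what makes the event-finishing callback deterministic.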