Date: Mon, 15 Feb 2016 13:46:23 +0100
From: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20160215124623.GD5244@noname.str.redhat.com>
In-Reply-To: <004601d167e2$b039e8b0$10adba10$@ru>
Subject: Re: [Qemu-devel] [PATCH 3/3] replay: introduce block devices record/replay
To: Pavel Dovgalyuk <dovgaluk@ispras.ru>
Cc: edgar.iglesias@xilinx.com, peter.maydell@linaro.org, igor.rubinov@gmail.com,
    mark.burton@greensocs.com, real@ispras.ru, hines@cert.org, qemu-devel@nongnu.org,
    maria.klimushenkova@ispras.ru, stefanha@redhat.com, pbonzini@redhat.com,
    batuzovk@ispras.ru, alex.bennee@linaro.org, fred.konrad@greensocs.com

On 15.02.2016 at 12:19, Pavel Dovgalyuk wrote:
> > From: Kevin Wolf [mailto:kwolf@redhat.com]
> > On 15.02.2016 at 10:14, Pavel Dovgalyuk wrote:
> > > > From: Pavel Dovgalyuk [mailto:dovgaluk@ispras.ru]
> > > > > From: Kevin Wolf [mailto:kwolf@redhat.com]
> > > > > > >
> > > > > > > int blkreplay_co_readv()
> > > > > > > {
> > > > > > >     BlockReplayState *s = bs->opaque;
> > > > > > >     int reqid = s->reqid++;
> > > > > > >
> > > > > > >     bdrv_co_readv(bs->file, ...);
> > > > > > >
> > > > > > >     if (mode == record) {
> > > > > > >         log(reqid, time);
> > > > > > >     } else {
> > > > > > >         assert(mode == replay);
> > > > > > >         bool *done = req_replayed_list_get(reqid);
> > > > > > >         if (done) {
> > > > > > >             *done = true;
> > > > > > >         } else {
> > > > > > point A
> > > > > > >             req_completed_list_insert(reqid, qemu_coroutine_self());
> > > > > > >             qemu_coroutine_yield();
> > > > > > >         }
> > > > > > >     }
> > > > > > > }
> > > > > > >
> > > > > > > /* called by replay.c */
> > > > > > > int blkreplay_run_event()
> > > > > > > {
> > > > > > >     if (mode == replay) {
> > > > > > >         co = req_completed_list_get(e.reqid);
> > > > > > >         if (co) {
> > > > > > >             qemu_coroutine_enter(co);
> > > > > > >         } else {
> > > > > > >             bool done = false;
> > > > > > >             req_replayed_list_insert(reqid, &done);
> > > > > > point B
> > > > > > >             /* wait synchronously for completion */
> > > > > > >             while (!done) {
> > > > > > >                 aio_poll();
> > > > > > >             }
> > > > > > >         }
> > > > > > >     }
> > > > > > > }
> > >
> > > Now I've encountered a situation where blkreplay_run_event is called
> > > from the read coroutine:
> > >
> > > bdrv_prwv_co -> aio_poll -> qemu_clock_get_ns -> replay_read_clock -> blkreplay_run_event
> > >            \--> bdrv_co_readv -> blkreplay_co_readv -> bdrv_co_readv(lower layer)
> > >
> > > bdrv_co_readv inside blkreplay_co_readv can't proceed in this situation.
> > > This is probably because aio_poll has taken the AioContext?
> > > How can I resolve this?
> >
> > First of all, I'm not sure if running replay events from
> > qemu_clock_get_ns() is such a great idea. This is not a function that
> > callers expect to change any state. If you absolutely have to do it
> > there instead of in the clock device emulations, maybe restricting it to
> > replaying clock events could make it a bit less harmful.
>
> qemu_clock_get_ns() wants to read some clock data from the log.
> While reading, it finds a block driver event and tries to process it.
> These block events may occur at any moment, because blkreplay_co_readv()
> writes them immediately as it executes.

I understand what's happening. I just think it's asking for bugs when
you do this in a low-level function like qemu_clock_get_ns() that
nobody expects to have any side effects. But we'll see how many bugs
this will cause.

> An alternative approach is adding these events to a queue and executing
> them when a checkpoint is reached. I'm not sure that this is easy to
> implement with coroutines.

Shouldn't be hard, you're queueing requests already. But it's unlikely
to be the right solution. Why does qemu_clock_get_ns() need to replay
events to begin with? Shouldn't it just return the current state
without replaying anything new?
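For what it's worth, the queueing itself could be as simple as the
sketch below. This is just an illustration written from scratch, not
QEMU code, and every name in it (BlockEvent, queue_block_event,
flush_block_events) is made up:

#include <stdlib.h>

typedef struct BlockEvent {
    int reqid;                     /* request id from the replay log */
    struct BlockEvent *next;
} BlockEvent;

static BlockEvent *pending_head;
static BlockEvent **pending_tail = &pending_head;

/* The log reader hits a block event in the middle of a clock read:
 * just remember it instead of running it right away. */
static void queue_block_event(int reqid)
{
    BlockEvent *e = malloc(sizeof(*e));
    e->reqid = reqid;
    e->next = NULL;
    *pending_tail = e;
    pending_tail = &e->next;
}

/* At the next checkpoint, outside of qemu_clock_get_ns(), it is safe
 * to complete the queued requests. */
static void flush_block_events(void (*run_event)(int reqid))
{
    while (pending_head) {
        BlockEvent *e = pending_head;
        pending_head = e->next;
        run_event(e->reqid);       /* e.g. blkreplay_run_event() */
        free(e);
    }
    pending_tail = &pending_head;
}

But as I said, I'd first want to understand why qemu_clock_get_ns()
has to replay block events at all before adding more machinery.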
> > Anyway, what does "can't proceed" mean? The coroutine yields because
> > it's waiting for I/O, but it is never reentered? Or is it hanging while
> > trying to acquire a lock?
>
> bdrv_co_io_em() (raw file read/write) yields and waits indefinitely to
> return, because aio_poll hangs in some clock read.

Okay, so the coroutine is sleeping while it waits for I/O to complete.
What's missing is the I/O completion callback.

When the coroutine yields in bdrv_co_io_em(), the caller coroutine
continues execution at blkreplay_run_event(). My naive expectation would
be that it returns to replay_read_clock(), which in turn returns to
qemu_clock_get_ns(), which knows the right time now and returns to
aio_poll(), which would process I/O completions and call the callback.
The callback would reenter the other coroutine in bdrv_co_io_em() and
the request would be completed.

As the callback isn't happening, something in this unwinding process
doesn't work as expected and we're actually hanging somewhere else. Do
you know where that is?

Kevin
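PS: To make the expected completion path concrete, here is the pattern
from bdrv_co_io_em() reduced to a standalone sketch, in the same
pseudocode style as above; submit_aio_read() is an invented placeholder
for the raw AIO submission:

/* Request side: submit the AIO read, then sleep until the completion
 * callback reenters this coroutine. */
coroutine_fn void read_request(void)
{
    submit_aio_read(read_complete, qemu_coroutine_self());
    qemu_coroutine_yield();     /* sleeps here until reentered */
    /* the request is complete when we get back here */
}

/* Completion side, run from aio_poll() when the lower layer finishes
 * the read. This reentry is exactly the step that never happens in
 * your trace. */
void read_complete(void *opaque)
{
    Coroutine *co = opaque;
    qemu_coroutine_enter(co);
}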