Date: Mon, 15 Feb 2016 13:46:23 +0100
From: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20160215124623.GD5244@noname.str.redhat.com>
In-Reply-To: <004601d167e2$b039e8b0$10adba10$@ru>
Subject: Re: [Qemu-devel] [PATCH 3/3] replay: introduce block devices record/replay
To: Pavel Dovgalyuk <dovgaluk@ispras.ru>
Cc: edgar.iglesias@xilinx.com, peter.maydell@linaro.org, igor.rubinov@gmail.com,
    mark.burton@greensocs.com, real@ispras.ru, hines@cert.org, qemu-devel@nongnu.org,
    maria.klimushenkova@ispras.ru, stefanha@redhat.com, pbonzini@redhat.com,
    batuzovk@ispras.ru, alex.bennee@linaro.org, fred.konrad@greensocs.com

On 15.02.2016 at 12:19, Pavel Dovgalyuk wrote:
> > From: Kevin Wolf [mailto:kwolf@redhat.com]
> > On 15.02.2016 at 10:14, Pavel Dovgalyuk wrote:
> > > > From: Pavel Dovgalyuk [mailto:dovgaluk@ispras.ru]
> > > > > From: Kevin Wolf [mailto:kwolf@redhat.com]
> > > > > > >
> > > > > > > int blkreplay_co_readv()
> > > > > > > {
> > > > > > >     BlockReplayState *s = bs->opaque;
> > > > > > >     int reqid = s->reqid++;
> > > > > > >
> > > > > > >     bdrv_co_readv(bs->file, ...);
> > > > > > >
> > > > > > >     if (mode == record) {
> > > > > > >         log(reqid, time);
> > > > > > >     } else {
> > > > > > >         assert(mode == replay);
> > > > > > >         bool *done = req_replayed_list_get(reqid);
> > > > > > >         if (done) {
> > > > > > >             *done = true;
> > > > > > >         } else {
> > > > > > point A
> > > > > > >             req_completed_list_insert(reqid, qemu_coroutine_self());
> > > > > > >             qemu_coroutine_yield();
> > > > > > >         }
> > > > > > >     }
> > > > > > > }
> > > > > > >
> > > > > > > /* called by replay.c */
> > > > > > > int blkreplay_run_event()
> > > > > > > {
> > > > > > >     if (mode == replay) {
> > > > > > >         co = req_completed_list_get(e.reqid);
> > > > > > >         if (co) {
> > > > > > >             qemu_coroutine_enter(co);
> > > > > > >         } else {
> > > > > > >             bool done = false;
> > > > > > >             req_replayed_list_insert(reqid, &done);
> > > > > > point B
> > > > > > >             /* wait synchronously for completion */
> > > > > > >             while (!done) {
> > > > > > >                 aio_poll();
> > > > > > >             }
> > > > > > >         }
> > > > > > >     }
> > > > > > > }
> > >
> > > Now I've encountered a situation where blkreplay_run_event is called
> > > from the read coroutine:
> > >
> > > bdrv_prwv_co -> aio_poll -> qemu_clock_get_ns -> replay_read_clock -> blkreplay_run_event
> > >            \--> bdrv_co_readv -> blkreplay_co_readv -> bdrv_co_readv(lower layer)
> > >
> > > bdrv_co_readv inside blkreplay_co_readv can't proceed in this situation.
> > > This is probably because aio_poll has taken the AioContext?
> > > How can I resolve this?
> >
> > First of all, I'm not sure if running replay events from
> > qemu_clock_get_ns() is such a great idea. This is not a function that
> > callers expect to change any state. If you absolutely have to do it
> > there instead of in the clock device emulations, maybe restricting it to
> > replaying clock events could make it a bit less harmful.
>
> qemu_clock_get_ns() wants to read some clock data from the log.
> While reading, it finds a block driver event and tries to process it.
> These block events may occur at any moment, because blkreplay_co_readv()
> writes them immediately as it executes.

I understand what's happening. I just think it's asking for bugs when
you do this in a low-level function like qemu_clock_get_ns() that
nobody expects to have any side effects. But we'll see how many bugs
this will cause.

> An alternative approach is adding these events to a queue and executing
> them when a checkpoint is reached. I'm not sure that this is easy to
> implement with coroutines.

Shouldn't be hard, you're queueing requests already. But it's unlikely
to be the right solution. Why does qemu_clock_get_ns() need to replay
events to begin with? Shouldn't it just return the current state
without replaying anything new?
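For what it's worth, the queueing itself could be as simple as the
sketch below. This is just an illustration written from scratch, not
QEMU code, and every name in it (BlockEvent, queue_block_event,
flush_block_events) is made up:

#include <stdlib.h>

typedef struct BlockEvent {
    int reqid;                     /* request id from the replay log */
    struct BlockEvent *next;
} BlockEvent;

static BlockEvent *pending_head;
static BlockEvent **pending_tail = &pending_head;

/* The log reader hits a block event in the middle of a clock read:
 * just remember it instead of running it right away. */
static void queue_block_event(int reqid)
{
    BlockEvent *e = malloc(sizeof(*e));
    e->reqid = reqid;
    e->next = NULL;
    *pending_tail = e;
    pending_tail = &e->next;
}

/* At the next checkpoint, outside of qemu_clock_get_ns(), it is safe
 * to complete the queued requests. */
static void flush_block_events(void (*run_event)(int reqid))
{
    while (pending_head) {
        BlockEvent *e = pending_head;
        pending_head = e->next;
        run_event(e->reqid);       /* e.g. blkreplay_run_event() */
        free(e);
    }
    pending_tail = &pending_head;
}

But as I said, I'd first want to understand why qemu_clock_get_ns()
has to replay block events at all before adding more machinery.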
> > Anyway, what does "can't proceed" mean? The coroutine yields because
> > it's waiting for I/O, but it is never reentered? Or is it hanging while
> > trying to acquire a lock?
>
> bdrv_co_io_em() (raw file read/write) yields and waits indefinitely to
> return, because aio_poll hangs in some clock read.

Okay, so the coroutine is sleeping while it waits for I/O to complete.
What's missing is the I/O completion callback.

When the coroutine yields in bdrv_co_io_em(), the caller coroutine
continues execution at blkreplay_run_event(). My naive expectation would
be that it returns to replay_read_clock(), which in turn returns to
qemu_clock_get_ns(), which knows the right time now and returns to
aio_poll(), which would process I/O completions and call the callback.
The callback would reenter the other coroutine in bdrv_co_io_em() and
the request would be completed.

As the callback isn't happening, something in this unwinding process
doesn't work as expected and we're actually hanging somewhere else. Do
you know where that is?

Kevin
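PS: To make the expected completion path concrete, here is the pattern
from bdrv_co_io_em() reduced to a standalone sketch, in the same
pseudocode style as above; submit_aio_read() is an invented placeholder
for the raw AIO submission:

/* Request side: submit the AIO read, then sleep until the completion
 * callback reenters this coroutine. */
coroutine_fn void read_request(void)
{
    submit_aio_read(read_complete, qemu_coroutine_self());
    qemu_coroutine_yield();     /* sleeps here until reentered */
    /* the request is complete when we get back here */
}

/* Completion side, run from aio_poll() when the lower layer finishes
 * the read. This reentry is exactly the step that never happens in
 * your trace. */
void read_complete(void *opaque)
{
    Coroutine *co = opaque;
    qemu_coroutine_enter(co);
}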