From mboxrd@z Thu Jan 1 00:00:00 1970
Sender: Paolo Bonzini
Message-ID: <53BC2579.9060200@redhat.com>
Date: Tue, 08 Jul 2014 19:08:09 +0200
From: Paolo Bonzini
MIME-Version: 1.0
References: <53BA8B49.9050709@de.ibm.com> <20140708155956.GB11505@stefanha-thinkpad.redhat.com>
In-Reply-To: <20140708155956.GB11505@stefanha-thinkpad.redhat.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] another locking issue in current dataplane code?
To: Stefan Hajnoczi, Christian Borntraeger
Cc: Cornelia Huck, Kevin Wolf, ming.lei@canonical.com, "qemu-devel@nongnu.org", Dominik Dingel

On 08/07/2014 17:59, Stefan Hajnoczi wrote:
> I sent Christian an initial patch to fix this but now both threads are
> stuck in rfifolock_lock() inside cond wait. That's very strange and
> should never happen.

I had this patch pending for 2.2:

commit 6c81e31615c3cda5ea981a998ba8b1b8ed17de6f
Author: Paolo Bonzini
Date:   Mon Jul 7 10:39:49 2014 +0200

    iothread: do not rely on aio_poll(ctx, true) result to end a loop

    Currently, whenever aio_poll(ctx, true) has completed all pending work it
    returns true *and* the next call to aio_poll(ctx, true) will not block.
    This invariant has its roots in qemu_aio_flush()'s implementation
    as "while (qemu_aio_wait()) {}".  However, qemu_aio_flush() does not
    exist anymore and bdrv_drain_all() is implemented differently; and
    this invariant is complicated to maintain and subtly different from
    the return value of GMainLoop's g_main_context_iteration.

    All calls to aio_poll(ctx, true) except one are guarded by a while()
    loop checking for a request to be incomplete, or a BlockDriverState
    to be idle.  Modify that one exception in iothread.c.

    Signed-off-by: Paolo Bonzini

diff --git a/iothread.c b/iothread.c
index 1fbf9f1..d9403cf 100644
--- a/iothread.c
+++ b/iothread.c
@@ -30,6 +30,7 @@ typedef ObjectClass IOThreadClass;
 static void *iothread_run(void *opaque)
 {
     IOThread *iothread = opaque;
+    bool blocking;

     qemu_mutex_lock(&iothread->init_done_lock);
     iothread->thread_id = qemu_get_thread_id();
@@ -38,8 +39,10 @@ static void *iothread_run(void *opaque)

     while (!iothread->stopping) {
         aio_context_acquire(iothread->ctx);
-        while (!iothread->stopping && aio_poll(iothread->ctx, true)) {
+        blocking = true;
+        while (!iothread->stopping && aio_poll(iothread->ctx, blocking)) {
             /* Progress was made, keep going */
+            blocking = false;
         }
         aio_context_release(iothread->ctx);
     }

Christian, can you test it?

Paolo