From: Fam Zheng
Date: Thu, 28 Jan 2016 15:02:51 +0800
Message-Id: <1453964571-23016-3-git-send-email-famz@redhat.com>
In-Reply-To: <1453964571-23016-1-git-send-email-famz@redhat.com>
References: <1453964571-23016-1-git-send-email-famz@redhat.com>
Subject: [Qemu-devel] [PATCH 2/2] blockjob: Fix hang in block_job_finish_sync
To: qemu-devel@nongnu.org
Cc: Kevin Wolf, qemu-block@nongnu.org, Jeff Cody, mreitz@redhat.com, stefanha@redhat.com, jsnow@redhat.com

With a mirror job running on a virtio-blk dataplane disk, sending "q" to
HMP causes an infinite loop in block_job_finish_sync. This is because
aio_poll() only processes the AIO context of bs, which has no more work
to do, while the main loop BH that is scheduled to set the
job->completed flag is never processed.

Fix this by adding a "ctx" pointer to the BlockJob structure, to track
which context to poll for the block job to make progress. Its value is
set to the BDS context at block job creation and stays that way until
block_job_coroutine_complete() is called by the block job coroutine.
After that point, the block job's work is deferred to a main loop BH, so
"ctx" is switched to the main loop context.

Signed-off-by: Fam Zheng
---
 blockjob.c               | 4 +++-
 include/block/blockjob.h | 2 ++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/blockjob.c b/blockjob.c
index 4b16720..4ea1ce0 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -74,6 +74,7 @@ void *block_job_create(const BlockJobDriver *driver, BlockDriverState *bs,
     job->opaque = opaque;
     job->busy = true;
     job->refcnt = 1;
+    job->ctx = bdrv_get_aio_context(bs);
     bs->job = job;
 
     /* Only set speed when necessary to avoid NotSupported error */
@@ -304,7 +305,7 @@ static int block_job_finish_sync(BlockJob *job,
         return -EBUSY;
     }
     while (!job->completed) {
-        aio_poll(bdrv_get_aio_context(bs), true);
+        aio_poll(job->ctx, true);
     }
     ret = (job->cancelled && job->ret == 0) ? -ECANCELED : job->ret;
     block_job_unref(job);
@@ -497,6 +498,7 @@ void block_job_coroutine_complete(BlockJob *job,
     data->aio_context = bdrv_get_aio_context(job->bs);
     data->fn = fn;
     data->opaque = opaque;
+    job->ctx = qemu_get_aio_context();
 
     qemu_bh_schedule(data->bh);
 }
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index de59fc2..5c6a884 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -92,6 +92,8 @@ struct BlockJob {
      */
     char *id;
 
+    AioContext *ctx;
+
     /**
      * The coroutine that executes the job. If not NULL, it is
      * reentered when busy is false and the job is cancelled.
-- 
2.4.3
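
For readers who want to see the failure mode in isolation, here is a minimal standalone sketch of the idea described in the commit message. It is not QEMU code: ToyContext, ToyJob and every toy_* function are hypothetical stand-ins invented for illustration, each context holds at most one pending callback, and the polling loop is bounded so that the "hang" shows up as a failed completion rather than an actual infinite loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an AioContext: at most one pending callback. */
typedef struct ToyContext {
    void (*pending)(void *opaque);
    void *opaque;
} ToyContext;

/* Hypothetical stand-in for BlockJob, reduced to the fields discussed. */
typedef struct ToyJob {
    ToyContext *ctx;   /* context to poll for the job to make progress */
    bool completed;
} ToyJob;

/* Run the pending callback of ctx, if any. Returns true on progress. */
static bool toy_poll(ToyContext *ctx)
{
    assert(ctx != NULL);
    if (!ctx->pending) {
        return false;
    }
    void (*fn)(void *) = ctx->pending;
    ctx->pending = NULL;
    fn(ctx->opaque);
    return true;
}

/* The deferred callback: this is what finally marks the job completed. */
static void toy_set_completed(void *opaque)
{
    ((ToyJob *)opaque)->completed = true;
}

/*
 * Models the deferral step: the completion work is queued on the main
 * context, and job->ctx is retargeted so pollers follow it there.
 */
static void toy_defer_to_main_loop(ToyJob *job, ToyContext *main_ctx)
{
    main_ctx->pending = toy_set_completed;
    main_ctx->opaque = job;
    job->ctx = main_ctx;
}

/*
 * Bounded version of the finish loop. If fixed_ctx is non-NULL, it is
 * polled on every iteration (the pre-patch behaviour); if it is NULL,
 * job->ctx is polled instead (the post-patch behaviour).
 * Returns true if the job completed within max_iters polls.
 */
static bool toy_finish_sync(ToyJob *job, ToyContext *fixed_ctx, int max_iters)
{
    for (int i = 0; i < max_iters && !job->completed; i++) {
        toy_poll(fixed_ctx ? fixed_ctx : job->ctx);
    }
    return job->completed;
}
```

In this model, once toy_defer_to_main_loop() has run, calling toy_finish_sync() with the original disk context as fixed_ctx never completes the job, because that context has nothing left to run. Passing NULL so that job->ctx is polled completes it on the first iteration.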