All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
To: 08005325@163.com, qemu-block@nongnu.org, qemu-devel@nongnu.org
Cc: kwolf@redhat.com, Michael Qiu <qiudayu@huayun.com>,
	mreitz@redhat.com, liangpeng10@huawei.com, quweijie@huayun.com,
	jsnow@redhat.com
Subject: Re: [PATCH] blockjob: Fix crash with IOthread when block commit after snapshot
Date: Wed, 3 Feb 2021 10:45:07 +0300	[thread overview]
Message-ID: <4a1d2a61-0c6c-7d44-092b-26a05798e54a@virtuozzo.com> (raw)
In-Reply-To: <20210203024059.52683-1-08005325@163.com>

subject should start with [PATCH v5]

03.02.2021 05:40, 08005325@163.com wrote:
> From: Michael Qiu <qiudayu@huayun.com>
> 
> v5: reformat the commit log with backtrace of main thread
>      Add a boolean variable to make main thread could re-acquire
>      aio_context on success path.
> 
> v4: rebase to latest code
> 
> v3: reformat the commit log, remove duplicate content

patch history shouldn't go into commit message. So you should place it under '---' [*], after calling git format-patch

> 
> Currently, if guest has workloads, IO thread will acquire aio_context
> lock before do io_submit, it leads to segmentfault when do block commit
> after snapshot. Just like below:
> 
> Program received signal SIGSEGV, Segmentation fault.
> 
> [Switching to Thread 0x7f7c7d91f700 (LWP 99907)]
> 0x00005576d0f65aab in bdrv_mirror_top_pwritev at ../block/mirror.c:1437
> 1437    ../block/mirror.c: No such file or directory.
> (gdb) p s->job
> $17 = (MirrorBlockJob *) 0x0
> (gdb) p s->stop
> $18 = false
> 
> Call trace of IO thread:
> 0  0x00005576d0f65aab in bdrv_mirror_top_pwritev at ../block/mirror.c:1437
> 1  0x00005576d0f7f3ab in bdrv_driver_pwritev at ../block/io.c:1174
> 2  0x00005576d0f8139d in bdrv_aligned_pwritev at ../block/io.c:1988
> 3  0x00005576d0f81b65 in bdrv_co_pwritev_part at ../block/io.c:2156
> 4  0x00005576d0f8e6b7 in blk_do_pwritev_part at ../block/block-backend.c:1260
> 5  0x00005576d0f8e84d in blk_aio_write_entry at ../block/block-backend.c:1476
> ...
> 
> Switch to qemu main thread:
> 0  0x00007f903be704ed in __lll_lock_wait at
> /lib/../lib64/libpthread.so.0
> 1  0x00007f903be6bde6 in _L_lock_941 at /lib/../lib64/libpthread.so.0
> 2  0x00007f903be6bcdf in pthread_mutex_lock at
> /lib/../lib64/libpthread.so.0
> 3  0x0000564b21456889 in qemu_mutex_lock_impl at
> ../util/qemu-thread-posix.c:79
> 4  0x0000564b213af8a5 in block_job_add_bdrv at ../blockjob.c:224
> 5  0x0000564b213b00ad in block_job_create at ../blockjob.c:440
> 6  0x0000564b21357c0a in mirror_start_job at ../block/mirror.c:1622
> 7  0x0000564b2135a9af in commit_active_start at ../block/mirror.c:1867
> 8  0x0000564b2133d132 in qmp_block_commit at ../blockdev.c:2768
> 9  0x0000564b2141fef3 in qmp_marshal_block_commit at
> qapi/qapi-commands-block-core.c:346
> 10 0x0000564b214503c9 in do_qmp_dispatch_bh at
> ../qapi/qmp-dispatch.c:110
> 11 0x0000564b21451996 in aio_bh_poll at ../util/async.c:164
> 12 0x0000564b2146018e in aio_dispatch at ../util/aio-posix.c:381
> 13 0x0000564b2145187e in aio_ctx_dispatch at ../util/async.c:306
> 14 0x00007f9040239049 in g_main_context_dispatch at
> /lib/../lib64/libglib-2.0.so.0
> 15 0x0000564b21447368 in main_loop_wait at ../util/main-loop.c:232
> 16 0x0000564b21447368 in main_loop_wait at ../util/main-loop.c:255
> 17 0x0000564b21447368 in main_loop_wait at ../util/main-loop.c:531
> 18 0x0000564b212304e1 in qemu_main_loop at ../softmmu/runstate.c:721
> 19 0x0000564b20f7975e in main at ../softmmu/main.c:50
> 
> In IO thread when do bdrv_mirror_top_pwritev, the job is NULL, and stop field
> is false, this means the MirrorBDSOpaque "s" object has not been initialized
> yet, and this object is initialized by block_job_create(), but the initialize
> process is stuck in acquiring the lock.
> 
> In this situation, IO thread come to bdrv_mirror_top_pwritev(),which means that
> mirror-top node is already inserted into block graph, but its bs->opaque->job
> is not initialized.
> 
> The root cause is that qemu main thread do release/acquire when hold the lock,
> at the same time, IO thread get the lock after release stage, and the crash
> occured.
> 
> Actually, in this situation, job->job.aio_context will not equal to
> qemu_get_aio_context(), and will be the same as bs->aio_context,
> thus, no need to release the lock, becasue bdrv_root_attach_child()
> will not change the context.
> 
> This patch fix this issue.
> 
> Fixes: 132ada80 "block: Adjust AioContexts when attaching nodes"
> 
> Signed-off-by: Michael Qiu <qiudayu@huayun.com>

I feel like there may be more problems (like the fact that drained section should be expanded, and
that expanding doesn't help as Michael said), but I think that temporary releasing locks is unsafe
thing, and if we can avoid it for some cases it's good, especially if it fixes some bug:

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

> ---

[*] patch history and anything that you don't want to put into final commit message goes here.

>   blockjob.c | 10 ++++++++--
>   1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/blockjob.c b/blockjob.c
> index db3a21699c..d9dca36f65 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -212,15 +212,21 @@ int block_job_add_bdrv(BlockJob *job, const char *name, BlockDriverState *bs,
>                          uint64_t perm, uint64_t shared_perm, Error **errp)
>   {
>       BdrvChild *c;
> +    bool need_context_ops;
>   
>       bdrv_ref(bs);
> -    if (job->job.aio_context != qemu_get_aio_context()) {
> +
> +    need_context_ops = bdrv_get_aio_context(bs) != job->job.aio_context;

I'd also put the second condition into same variable, just for less typing. Still it should work as is.

> +
> +    if (need_context_ops &&
> +        job->job.aio_context != qemu_get_aio_context()) {
>           aio_context_release(job->job.aio_context);
>       }
>       c = bdrv_root_attach_child(bs, name, &child_job, 0,
>                                  job->job.aio_context, perm, shared_perm, job,
>                                  errp);
> -    if (job->job.aio_context != qemu_get_aio_context()) {
> +    if (need_context_ops &&
> +        job->job.aio_context != qemu_get_aio_context()) {
>           aio_context_acquire(job->job.aio_context);
>       }
>       if (c == NULL) {
> 


-- 
Best regards,
Vladimir


  reply	other threads:[~2021-02-03  7:56 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-03  2:40 [PATCH] blockjob: Fix crash with IOthread when block commit after snapshot 08005325
2021-02-03  7:45 ` Vladimir Sementsov-Ogievskiy [this message]
2021-02-04  8:38   ` Michael Qiu
2021-02-12 13:03   ` Kevin Wolf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4a1d2a61-0c6c-7d44-092b-26a05798e54a@virtuozzo.com \
    --to=vsementsov@virtuozzo.com \
    --cc=08005325@163.com \
    --cc=jsnow@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=liangpeng10@huawei.com \
    --cc=mreitz@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=qiudayu@huayun.com \
    --cc=quweijie@huayun.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.