All of lore.kernel.org
 help / color / mirror / Atom feed
From: Anton Nefedov <anton.nefedov@virtuozzo.com>
To: Alberto Garcia <berto@igalia.com>, Fam Zheng <famz@redhat.com>
Cc: qemu-devel@nongnu.org, kwolf@redhat.com, qemu-block@nongnu.org,
	mreitz@redhat.com
Subject: Re: [Qemu-devel] [Qemu-block] segfault in parallel blockjobs (iotest 30)
Date: Wed, 15 Nov 2017 19:31:20 +0300	[thread overview]
Message-ID: <14461b9b-d62d-3723-d2bb-c2fe873207c5@virtuozzo.com> (raw)
In-Reply-To: <w518tf7k0ov.fsf@maestria.local.igalia.com>



On 15/11/2017 6:42 PM, Alberto Garcia wrote:
> On Fri 10 Nov 2017 04:02:23 AM CET, Fam Zheng wrote:
>>>> I'm thinking that perhaps we should add the pause point directly to
>>>> block_job_defer_to_main_loop(), to prevent any block job from
>>>> running the exit function when it's paused.
>>>
>>> I was trying this and unfortunately this breaks the mirror job at
>>> least (reproduced with a simple block-commit on the topmost node,
>>> which uses commit_active_start() -> mirror_start_job()).
>>>
>>> So what happens is that mirror_run() always calls
>>> bdrv_drained_begin() before returning, pausing the block job. The
>>> closing bdrv_drained_end() call is at the end of mirror_exit(),
>>> already in the main loop.
>>>
>>> So the mirror job is always calling block_job_defer_to_main_loop()
>>> and mirror_exit() when the job is paused.
>>
>> FWIW, I think Max's report on 194 failures is related:
>>
>> https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg01822.html
>>
>> so perhaps it's worth testing his patch too:
>>
>> https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg01835.html
> 
> Well, that doesn't solve the original crash with parallel block jobs.
> The root of the crash is that the mirror job manipulates the graph
> _while being paused_, so the BlockReopenQueue in bdrv_reopen_multiple()
> gets messed up, and pausing the jobs (commit 40840e419be31e) won't help.
> 
> I have the impression that one major source of headaches is the fact
> that the reopen queue contains nodes that don't need to be reopened at
> all. Ideally this should be detected early on in bdrv_reopen_queue(), so
> there's no chance that the queue contains nodes used by a different
> block job. If we had that then op blockers should be enough to prevent
> these things. Or am I missing something?
> 
> Berto
> 

After applying Max's patch I tried the similar approach; that is keep
BDSes referenced while they are in the reopen queue.
Now I get the stream job hanging. Somehow one blk_root_drained_begin()
is not paired with blk_root_drained_end(). So the job stays paused.
Didn't dig deeper yet, but at first glance the reduced reopen queue
won't help with this, as reopen drains all BDSes anyway (or can we avoid
that too?)

/Anton

  reply	other threads:[~2017-11-15 16:31 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-07 16:19 [Qemu-devel] segfault in parallel blockjobs (iotest 30) Anton Nefedov
2017-11-08 13:58 ` [Qemu-devel] [Qemu-block] " Alberto Garcia
2017-11-08 14:45 ` Alberto Garcia
2017-11-08 15:50   ` Anton Nefedov
2017-11-09  6:05     ` Fam Zheng
2017-11-09 10:03       ` Alberto Garcia
2017-11-09 16:26   ` Alberto Garcia
2017-11-10  3:02     ` Fam Zheng
2017-11-15 15:42       ` Alberto Garcia
2017-11-15 16:31         ` Anton Nefedov [this message]
2017-11-16 15:54           ` Alberto Garcia
2017-11-16 16:09             ` Anton Nefedov
2017-11-21 12:51               ` Alberto Garcia
2017-11-21 15:18                 ` Anton Nefedov
2017-11-21 15:31                   ` Alberto Garcia
2017-11-22  0:13                     ` John Snow
2017-11-22 12:55                   ` Alberto Garcia
2017-11-22 15:58                     ` John Snow
2017-11-16 21:56             ` John Snow
2017-11-17 16:16               ` Alberto Garcia
2017-11-22 15:05 ` Alberto Garcia

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=14461b9b-d62d-3723-d2bb-c2fe873207c5@virtuozzo.com \
    --to=anton.nefedov@virtuozzo.com \
    --cc=berto@igalia.com \
    --cc=famz@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.