All of lore.kernel.org
 help / color / mirror / Atom feed
From: Max Reitz <mreitz@redhat.com>
To: Qemu-block <qemu-block@nongnu.org>, Qemu-devel <qemu-devel@nongnu.org>
Subject: [Qemu-devel] Drainage in bdrv_replace_child_noperm()
Date: Mon, 6 Nov 2017 19:49:17 +0100	[thread overview]
Message-ID: <8a184b91-49ef-bb52-d190-053c4c0861a1@redhat.com> (raw)

[-- Attachment #1: Type: text/plain, Size: 4291 bytes --]

Hi everyone,

On my quest to fix some flaky iotests, I came to a bit of a halt on 129.
 (Details: Its issue is that block jobs now generally ignore throttling
in a BB (because they use their own), so we have to add a throttle node
instead.  However, when I added it, I got an abort.)

My issue can be reproduced as follows:

$ x86_64-softmmu/qemu-system-x86_64 \
    -qmp stdio \
    -object throttle-group,id=tg0 \
    -blockdev "{'driver':'throttle','node-name':'drive0',
                'throttle-group':'tg0','file':{'driver':'null-co'}}" \
    -blockdev node-name=target,driver=null-co
{"QMP": {"version": {"qemu": {"micro": 50, "minor": 10, "major": 2},
"package": " (v2.9.0-632-g4a52d43-dirty)"}, "capabilities": []}}
{'execute':'qmp_capabilities'}
{"return": {}}
{'execute':'blockdev-mirror','arguments':{
    'device':'drive0','job-id':'job0','target':'target','sync':'full',
    'filter-node-name':'mirror-node' }}
qemu-system-x86_64: block/throttle.c:213: throttle_co_drain_end:
Assertion `tgm->io_limits_disabled' failed.
[1]    3524 abort (core dumped)  x86_64-softmmu/qemu-system-x86_64 -qmp
stdio -object throttle-group,id=tg0

Here's what happens:

(1) bdrv_drained_begin(bs) in mirror_start_job() starts draining drive0.

(2) bdrv_append(...) puts mirror-node above drive0.  Through
bdrv_replace_child_noperm(), this will invoke
bdrv_child_cb_drained_begin() on mirror-node.  This is necessary because
drive0 is drained, so the new parent needs to be drained as well.
However, note that drive0 is not yet attached to mirror-node.
Therefore, mirror-node cannot drain drive0 recursively.

This is seemingly fine because drive0 is drained anyway.  However, this
is different from what would happen if we would have drained drive0 with
mirror-node already attached to it as its parent: Then, we would have
drained drive0 twice; once by itself, and another time recursively
through mirror-node.

This will be important in a second...

(3) ...and this second is now: We invoke bdrv_drained_end() on drive0.
Now, through bdrv_parent_drained_end() and bdrv_child_cb_drained_end()
that goes up to mirror-node which recursively un-drains drive0.  Fine so
far.  But once that parent un-drain is done, we un-drain drive0 by
itself: And this fails the assertion in the throttle driver because we
attempt to un-drain it twice, although we've drained it only once.


So the issue has two parts:

(A) (Un-)Draining a parent from a child will always (?[1]) (un-)drain
that child, too.  This seems a bit superfluous to me and I would guess
that it results in worst-case O(n^2) function calls to drain a block
graph consisting of n nodes.

(B) In bdrv_replace_child_noperm() we try to drain the parent if the new
child is drained; specifically, we want it to be in a state as if it had
been a parent when the child was originally drained.  However, we fail
at this because we drain the parent without the child attached, so we
don't drain the child twice.  This bites us when we undrain everything.

(Most importantly, ideally we'd want to attach the new child to the
parent and then drain the parent: This would give us exactly the state
we want.  However, attaching the child first and then draining the
parent is unsafe, so we cannot do it...)

[1] Whether the parent (un-)drains the child depends on the
BdrvChildRole.drained_{begin,end}() implementation, strictly speaking.
We cannot say it generally.

OK, so how to fix it?  I don't know, so I'm asking you. :-)

I have two ideas:

One is to assume that (un-)draining a parent will always (un-)drain all
children, including the one the (un-)drain comes from.  This assumption
seems wrong, see [1], but maybe it isn't.  Anyway, if so, we could just
explicitly drain the new child in bdrv_replace_child_noperm() after
having drained the parent and thus get a consistent state again.

The other is to declare (A) wrong.  Maybe when
BdrvChildRole.drained_{begin,end}() is invoked, we should not drain that
child because we can declare it the caller's responsibility to make sure
it's drained.  This seems logical to me because usually those methods
are invoked when the child is drained anyway.  But maybe I'm wrong. :-)


So, any ideas?

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 512 bytes --]

             reply	other threads:[~2017-11-06 18:49 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-06 18:49 Max Reitz [this message]
2017-11-07  5:21 ` [Qemu-devel] Drainage in bdrv_replace_child_noperm() Fam Zheng
2017-11-08 19:39   ` Max Reitz
2017-11-07 14:22 ` [Qemu-devel] [Qemu-block] " Kevin Wolf
2017-11-08 20:16   ` Max Reitz
2017-11-09 16:25     ` Kevin Wolf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8a184b91-49ef-bb52-d190-053c4c0861a1@redhat.com \
    --to=mreitz@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.