From: Max Reitz <mreitz@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>,
	qemu-devel@nongnu.org, qemu-block@nongnu.org
Subject: Re: [Qemu-devel] [PATCH v2 0/9] block: Delay poll when ending drained sections
Date: Wed, 17 Jul 2019 15:20:07 +0200
Message-ID: <ced90b10-b5ee-21dc-4c46-e47aaac27fc9@redhat.com>
In-Reply-To: <20190716163746.GH7297@linux.fritz.box>

On 16.07.19 18:37, Kevin Wolf wrote:
> On 16.07.2019 at 18:24, Max Reitz wrote:
>> On 16.07.19 16:40, Kevin Wolf wrote:
>>> On 19.06.2019 at 17:25, Max Reitz wrote:
>>>> Hi,
>>>>
>>>> This is v2 to “block: Keep track of parent quiescing”.
>>>>
>>>> Please read this cover letter, because I’m very unsure about the design
>>>> in this series and I’d appreciate some comments.
>>>>
>>>> As Kevin wrote in his reply to that series, the actual problem is that
>>>> bdrv_drain_invoke() polls on every node whenever ending a drain.  This
>>>> may cause graph changes, which is actually forbidden.
>>>>
>>>> To solve that problem, this series makes the drain code construct a list
>>>> of undrain operations that have been initiated, and then polls all of
>>>> them on the root level once graph changes are acceptable.
>>>>
>>>> Note that I don’t like this list concept very much, so I’m open to
>>>> alternatives.
>>>
>>> So drain_end is different from drain_begin in that it wants to wait only
>>> for all bdrv_drain_invoke() calls to complete, but not for other
>>> requests that are in flight. Makes sense.
>>>
>>> Though instead of managing a whole list, wouldn't a counter suffice?
>>>
>>>> Furthermore, all BdrvChildRoles with BDS parents have a broken
>>>> .drained_end() implementation.  The documentation clearly states that
>>>> this function is not allowed to poll, but it does.  So this series
>>>> changes it to a variant (using the new code) that does not poll.
>>>>
>>>> There is a catch, which may actually be a problem, I don’t know: The new
>>>> variant of that .drained_end() does not poll at all, never.  As
>>>> described above, now every bdrv_drain_invoke() returns an object that
>>>> describes when it will be done and which can thus be polled for.  These
>>>> objects are just discarded when using BdrvChildRole.drained_end().  That
>>>> does not feel quite right.  It would probably be more correct to let
>>>> BdrvChildRole.drained_end() return these objects so the top level
>>>> bdrv_drained_end() can poll for their completion.
>>>>
>>>> I decided not to do this, for two reasons:
>>>> (1) Doing so would spill the “list of objects to poll for” design to
>>>>     places outside of block/io.c.  I don’t like the design very much as
>>>>     it is, but I can live with it as long as it’s constrained to the
>>>>     core drain code in block/io.c.
>>>>     This is made worse by the fact that currently, those objects are of
>>>>     type BdrvCoDrainData.  But it shouldn’t be a problem to add a new
>>>>     type that is externally visible (we only need the AioContext and
>>>>     whether bdrv_drain_invoke_entry() is done).
>>>>
>>>> (2) It seems to work as it is.
>>>>
>>>> The alternative would be to add the same GSList ** parameter to
>>>> BdrvChildRole.drained_end() that I added in the core drain code in patch
>>>> 2, and then let the .drained_end() implementation fill that with objects
>>>> to poll for.  (Which would be accomplished by adding a frontend to
>>>> bdrv_do_drained_end() that lets bdrv_child_cb_drained_poll() pass the
>>>> parameter through.)
>>>>
>>>> Opinions?
>>>
>>> I think I would add an int* to BdrvChildRole.drained_end() so that we
>>> can just increase the counter wherever we need to.
>>
>> So you mean just polling the @bs for which a caller gave poll=true until
>> the counter reaches 0?  I’ll try, sounds good (if I can get it to work).
> 
> Yes, that's what I have in mind.
> 
> We expect graph changes to happen during the polling, but I think the
> caller is responsible for making sure that the top-level @bs stays
> around, so we don't need to be extra careful here.
> 
> Also, bdrv_drain_invoke() is always called in the same AioContext as the
> top-level drain operation, so the whole aio_context_acquire/release
> stuff from this series should become unnecessary, and we don't need
> atomics to access the counter either.
> 
> So I think this should really simplify the series a lot.

Hm.  Unfortunately, not all nodes in a chain always have the same
AioContext.

I think they generally should, but there is at least one exception:
bdrv_set_aio_context*() itself.  bdrv_set_aio_context_ignore() drains
the node, then puts other members of the subgraph into the same
AioContext, then itself.

Now say this reaches the bottom node.  That node will not recurse
anywhere else, but only change its own AioContext, in a drained section.
So when that section ends, the bottom node will be in a different
AioContext than the other nodes.
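
To illustrate the ordering described above, here is a minimal toy model.
It is only a sketch and not QEMU code: ToyNode, toy_set_ctx(),
toy_drained_end() and mixed_ctx_seen are hypothetical names, an int stands
in for an AioContext, and each node has at most one child.  It only
assumes the order stated above (drain the node, recurse, then switch the
node itself).

```c
#include <stddef.h>

/* Toy stand-ins; these are NOT the QEMU types. */
typedef struct ToyNode {
    const char *name;
    int ctx;                /* stand-in for an AioContext */
    int quiesce_counter;
    struct ToyNode *child;  /* a single child keeps the chain simple */
} ToyNode;

/* Set to 1 if any drained section ends while the chain has mixed contexts. */
static int mixed_ctx_seen;

/* Returns 1 if every node in the chain shares the top node's context. */
static int chain_is_uniform(const ToyNode *top)
{
    for (const ToyNode *n = top; n != NULL; n = n->child) {
        if (n->ctx != top->ctx) {
            return 0;
        }
    }
    return 1;
}

/* End the drained section on @n and record whether the chain is mixed. */
static void toy_drained_end(ToyNode *n, const ToyNode *top)
{
    n->quiesce_counter--;
    if (!chain_is_uniform(top)) {
        mixed_ctx_seen = 1;
    }
}

/* Models the order above: drain, recurse into the child, switch, undrain. */
static void toy_set_ctx(ToyNode *n, int new_ctx, const ToyNode *top)
{
    n->quiesce_counter++;               /* begin drained section */
    if (n->child != NULL) {
        toy_set_ctx(n->child, new_ctx, top);
    }
    n->ctx = new_ctx;                   /* switch this node last */
    toy_drained_end(n, top);            /* end drained section */
}
```

With a two-node chain, the bottom node's drained section ends while the
top node is still in the old context, so mixed_ctx_seen gets set.  By the
time the top node's own section ends, the chain is uniform again.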

So, er, well.  I have three ideas:

(1) Skip the polling on the top level drained_end if the node still has
another quiesce_counter on it.  Sounds a bit too error-prone to me.

(2) Drop the drained sections in bdrv_set_aio_context_ignore().  Instead
require the root caller to have the whole subtree drained.  That way,
drained_end will never be invoked while the subtree has different
AioContexts.

(3) I need a list after all (one that only contains AioContexts, but still).


I like (3) as little as I did in this series.  (1) seems wrong.  I’ll
try (2) first.
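
If (2) works out, the counter approach from the quoted discussion could
look roughly like the following toy sketch.  None of this is the actual
QEMU code: ToyNode, toy_drain_invoke_end(), toy_do_drained_end(),
toy_poll_once() and toy_drained_end() are hypothetical stand-ins, and the
sketch assumes that every node is in the same AioContext, which is why a
plain int without atomics is used for the counter.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a block node; NOT the QEMU type. */
typedef struct ToyNode {
    int quiesce_counter;
    int pending_cb;         /* 1 while the "drain end" callback is pending */
    struct ToyNode *child;
} ToyNode;

/* Stand-in for scheduling the drain-end callback: nothing runs here, we
 * only note that one more callback is outstanding. */
static void toy_drain_invoke_end(ToyNode *n, int *drained_end_counter)
{
    n->pending_cb = 1;
    (*drained_end_counter)++;
}

/* Recursive part: never polls, only bumps the shared counter. */
static void toy_do_drained_end(ToyNode *n, int *drained_end_counter)
{
    assert(n->quiesce_counter > 0);
    n->quiesce_counter--;
    toy_drain_invoke_end(n, drained_end_counter);
    if (n->child != NULL) {
        toy_do_drained_end(n->child, drained_end_counter);
    }
}

/* Stand-in for one poll iteration: completes at most one pending callback.
 * Returns 1 if progress was made, 0 otherwise. */
static int toy_poll_once(ToyNode *top, int *drained_end_counter)
{
    for (ToyNode *n = top; n != NULL; n = n->child) {
        if (n->pending_cb) {
            n->pending_cb = 0;
            (*drained_end_counter)--;
            return 1;
        }
    }
    return 0;
}

/* Top level: the only place that polls.  Returns the number of poll
 * iterations needed until the counter dropped back to 0. */
static int toy_drained_end(ToyNode *top)
{
    int drained_end_counter = 0;
    int iterations = 0;

    toy_do_drained_end(top, &drained_end_counter);
    while (drained_end_counter > 0) {
        if (!toy_poll_once(top, &drained_end_counter)) {
            break;  /* no progress possible; cannot happen in this toy */
        }
        iterations++;
    }
    return iterations;
}
```

The point of the sketch is only the shape: the recursion increments the
counter wherever a callback is started, and the top-level caller polls
until it is back at 0, so nothing below the top level ever polls.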

Max

Thread overview: 15+ messages
2019-06-19 15:25 [Qemu-devel] [PATCH v2 0/9] block: Delay poll when ending drained sections Max Reitz
2019-06-19 15:25 ` [Qemu-devel] [PATCH v2 1/9] block: Introduce BdrvChild.parent_quiesce_counter Max Reitz
2019-06-19 15:25 ` [Qemu-devel] [PATCH v2 2/9] block: Add @data_objs to bdrv_drain_invoke() Max Reitz
2019-06-19 15:25 ` [Qemu-devel] [PATCH v2 3/9] block: Add bdrv_poll_drain_data_objs() Max Reitz
2019-06-19 15:25 ` [Qemu-devel] [PATCH v2 4/9] block: Move polling out of bdrv_drain_invoke() Max Reitz
2019-06-19 15:25 ` [Qemu-devel] [PATCH v2 5/9] block: Add @poll to bdrv_parent_drained_end_single() Max Reitz
2019-06-19 15:26 ` [Qemu-devel] [PATCH v2 6/9] block: Add bdrv_drained_end_no_poll() Max Reitz
2019-06-19 15:26 ` [Qemu-devel] [PATCH v2 7/9] block: Fix BDS children's .drained_end() Max Reitz
2019-06-19 15:26 ` [Qemu-devel] [PATCH v2 8/9] iotests: Add @has_quit to vm.shutdown() Max Reitz
2019-06-19 15:26 ` [Qemu-devel] [PATCH v2 9/9] iotests: Test commit with a filter on the chain Max Reitz
2019-07-15 13:24 ` [Qemu-devel] [PATCH v2 0/9] block: Delay poll when ending drained sections Max Reitz
2019-07-16 14:40 ` Kevin Wolf
2019-07-16 16:24   ` Max Reitz
2019-07-16 16:37     ` Kevin Wolf
2019-07-17 13:20       ` Max Reitz [this message]