All of lore.kernel.org
 help / color / mirror / Atom feed
From: Hanna Reitz <hreitz@redhat.com>
To: Emanuele Giuseppe Esposito <eesposit@redhat.com>,
	Stefan Hajnoczi <stefanha@redhat.com>
Cc: Fam Zheng <fam@euphon.net>, Kevin Wolf <kwolf@redhat.com>,
	Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
	qemu-block@nongnu.org, qemu-devel@nongnu.org,
	Paolo Bonzini <pbonzini@redhat.com>, John Snow <jsnow@redhat.com>
Subject: Re: [RFC PATCH 0/5] Removal of AioContext lock, bs->parents and ->children: proof of concept
Date: Wed, 30 Mar 2022 16:12:30 +0200	[thread overview]
Message-ID: <c7f6fe7e-c309-010a-eaba-549fbfcb45ce@redhat.com> (raw)
In-Reply-To: <a4d3fc47-0769-7d11-47aa-a1c4ac503406@redhat.com>

On 30.03.22 13:55, Emanuele Giuseppe Esposito wrote:
>
> Am 30/03/2022 um 12:53 schrieb Hanna Reitz:
>> On 17.03.22 17:23, Emanuele Giuseppe Esposito wrote:
>>> Am 09/03/2022 um 14:26 schrieb Emanuele Giuseppe Esposito:
>>>>>> * Drains allow the caller (either main loop or iothread running
>>>>>> the context) to wait all in_flights requests and operations
>>>>>> of a BDS: normal drains target a given node and is parents, while
>>>>>> subtree ones also include the subgraph of the node. Siblings are
>>>>>> not affected by any of these two kind of drains.
>>>>> Siblings are drained to the extent required for their parent node to
>>>>> reach in_flight == 0.
>>>>>
>>>>> I haven't checked the code but I guess the case you're alluding to is
>>>>> that siblings with multiple parents could have other I/O in flight that
>>>>> will not be drained and further I/O can be submitted after the parent
>>>>> has drained?
>>>> Yes, this in theory can happen. I don't really know if this happens
>>>> practically, and how likely is to happen.
>>>>
>>>> The alternative would be to make a drain that blocks the whole graph,
>>>> siblings included, but that would probably be an overkill.
>>>>
>>> So I have thought about this, and I think maybe this is not a concrete
>>> problem.
>>> Suppose we have a graph where "parent" has 2 children: "child" and
>>> "sibling". "sibling" also has a blockjob.
>>>
>>> Now, main loop wants to modify parent-child relation and maybe detach
>>> child from parent.
>>>
>>> 1st wrong assumption: the sibling is not drained. Actually my strategy
>>> takes into account draining both nodes, also because parent could be in
>>> another graph. Therefore sibling is drained.
>>>
>>> But let's assume "sibling" is the sibling of the parent.
>>>
>>> Therefore we have
>>> "child" -> "parent" -> "grandparent"
>>> and
>>> "blockjob" -> "sibling" -> "grandparent"
>>>
>>> The issue is the following: main loop can't drain "sibling", because
>>> subtree_drained does not reach it. Therefore blockjob can still run
>>> while main loop modifies "child" -> "parent". Blockjob can either:
>>> 1) drain, but this won't affect "child" -> "parent"
>>> 2) read the graph in other ways different from drain, for example
>>> .set_aio_context recursively touches the whole graph.
>>> 3) write the graph.
>> I don’t really understand the problem here.  If the block job only
>> operates on the sibling subgraph, why would it care what’s going on in
>> the other subgraph?
> We are talking about something that probably does not happen, but what
> if it calls a callback similar to .set_aio_context that goes through the
> whole graph?

Hm.  Quite unfortunate if such a callback can operate on drained nodes, 
I’d say.  Ideally callbacks wouldn’t do that, but probably they will. :/

> Even though the first question is: is there such callback?

I mean, you could say any callback qualifies.  Draining a node will only 
drain its recursive parents, so siblings are not affected.  If the 
sibling issues the callback on its parent...  (E.g. changes in the 
backing chain requiring a qcow2 parent node to change the backing file 
string in its image file)

> Second even more irrealistic case is when a job randomly looks for a bs
> in another connectivity component and for example drains it.
> Again probably impossible.

I hope so, but the block layer sure likes to surprise me.

>> Block jobs should own all nodes that are associated with them (e.g.
>> because they intend to drop or replace them when the job is done), so
>> when part of the graph is drained, all jobs that could modify that part
>> should be drained, too.
> What do you mean with "own"?

They’re added with block_job_add_bdrv(), and then are children of the 
BlockJob object.

>>> 3) can be only performed in the main loop, because it's a graph
>>> operation. It means that the blockjob runs when the graph modifying
>>> coroutine/bh is not running. They never run together.
>>> The safety of this operation relies on where the drains are and will be
>>> inserted. If you do like in my patch "block.c:
>>> bdrv_replace_child_noperm: first call ->attach(), and then add child\x0f",
>>> then we would have problem, because we drain between two writes, and the
>>> blockjob will find an inconsistent graph. If we do it as we seem to do
>>> it so far, then we won't really have any problem.
>>>
>>> 2) is a read, and can theoretically be performed by another thread. But
>>> is there a function that does that? .set_aio_context for example is a GS
>>> function, so we will fall back to case 3) and nothing bad would happen.
>>>
>>> Is there a counter example for this?
>>>
>>> -----------
>>>
>>> Talking about something else, I discussed with Kevin what *seems* to be
>>> an alternative way to do this, instead of adding drains everywhere.
>>> His idea is to replicate what blk_wait_while_drained() currently does
>>> but on a larger scale. It is something in between this subtree_drains
>>> logic and a rwlock.
>>>
>>> Basically if I understood correctly, we could implement
>>> bdrv_wait_while_drained(), and put in all places where we would put a
>>> read lock: all the reads to ->parents and ->children.
>>> This function detects if the bdrv is under drain, and if so it will stop
>>> and wait that the drain finishes (ie the graph modification).
>>> On the other side, each write would just need to drain probably both
>>> nodes (simple drain), to signal that we are modifying the graph. Once
>>> bdrv_drained_begin() finishes, we are sure all coroutines are stopped.
>>> Once bdrv_drained_end() finishes, we automatically let all coroutine
>>> restart, and continue where they left off.
>>>
>>> Seems a good compromise between drains and rwlock. What do you think?
>> Well, sounds complicated.  So I’m asking myself whether this would be
>> noticeably better than just an RwLock for graph modifications, like the
>> global lock Vladimir has proposed.
> But the point is then: aren't we re-inventing an AioContext lock?

I don’t know how AioContext locks would even help with graph changes.  
If I want to change a block subgraph that’s in a different I/O thread, 
locking that thread isn’t enough (I would’ve thought); because I have no 
idea what the thread is doing when I’m locking it.  Perhaps it’s 
iterating through ->children right now (with some yields in between), 
and by pausing it, changing the graph, and then resuming it, it’ll still 
cause problems.

> the lock will protect not only ->parents and ->child, but also other
> bdrv fields that are concurrently read/written.

I would’ve thought this lock should only protect ->parents and ->children.

> I don't know, it seems to me that there is a lot of uncertainty on which
> way to take...

Definitely. :)

I wouldn’t call that a bad thing, necessarily.  Let’s look at the 
positive side: There are many ideas!

Hanna



  reply	other threads:[~2022-03-30 14:13 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-01 14:21 [RFC PATCH 0/5] Removal of AioContext lock, bs->parents and ->children: proof of concept Emanuele Giuseppe Esposito
2022-03-01 14:21 ` [RFC PATCH 1/5] aio-wait.h: introduce AIO_WAIT_WHILE_UNLOCKED Emanuele Giuseppe Esposito
2022-03-02 16:21   ` Stefan Hajnoczi
2022-03-01 14:21 ` [RFC PATCH 2/5] introduce BDRV_POLL_WHILE_UNLOCKED Emanuele Giuseppe Esposito
2022-03-02 16:22   ` Stefan Hajnoczi
2022-03-09 13:49   ` Eric Blake
2022-03-01 14:21 ` [RFC PATCH 3/5] block/io.c: introduce bdrv_subtree_drained_{begin/end}_unlocked Emanuele Giuseppe Esposito
2022-03-02 16:25   ` Stefan Hajnoczi
2022-03-01 14:21 ` [RFC PATCH 4/5] child_job_drained_poll: override polling condition only when in home thread Emanuele Giuseppe Esposito
2022-03-02 16:37   ` Stefan Hajnoczi
2022-03-01 14:21 ` [RFC PATCH 5/5] test-bdrv-drain: ensure draining from main loop stops iothreads Emanuele Giuseppe Esposito
2022-03-01 14:26 ` [RFC PATCH 0/5] Removal of AioContext lock, bs->parents and ->children: proof of concept Emanuele Giuseppe Esposito
2022-03-02  9:47 ` Stefan Hajnoczi
2022-03-09 13:26   ` Emanuele Giuseppe Esposito
2022-03-10 15:54     ` Stefan Hajnoczi
2022-03-17 16:23     ` Emanuele Giuseppe Esposito
2022-03-30 10:53       ` Hanna Reitz
2022-03-30 11:55         ` Emanuele Giuseppe Esposito
2022-03-30 14:12           ` Hanna Reitz [this message]
2022-03-30 16:02         ` Paolo Bonzini
2022-03-31  9:59           ` Paolo Bonzini
2022-03-31 13:51             ` Emanuele Giuseppe Esposito
2022-03-31 16:40               ` Paolo Bonzini
2022-04-01  8:05                 ` Emanuele Giuseppe Esposito
2022-04-01 11:01                   ` Paolo Bonzini
2022-04-04  9:25                     ` Stefan Hajnoczi
2022-04-04  9:41                       ` Paolo Bonzini
2022-04-04  9:51                         ` Emanuele Giuseppe Esposito
2022-04-04 10:07                           ` Paolo Bonzini
2022-04-05  9:39                         ` Stefan Hajnoczi
2022-04-05 10:43                         ` Kevin Wolf
2022-04-13 13:43                     ` Emanuele Giuseppe Esposito
2022-04-13 14:51                       ` Kevin Wolf
2022-04-13 15:14                         ` Emanuele Giuseppe Esposito
2022-04-13 15:22                           ` Emanuele Giuseppe Esposito
2022-04-13 16:29                           ` Kevin Wolf
2022-04-13 20:43                             ` Paolo Bonzini
2022-04-13 20:46                         ` Paolo Bonzini
2022-03-02 11:07 ` Vladimir Sementsov-Ogievskiy
2022-03-02 16:20   ` Stefan Hajnoczi
2022-03-09 13:26   ` Emanuele Giuseppe Esposito
2022-03-16 21:55     ` Emanuele Giuseppe Esposito
2022-03-21 12:22       ` Vladimir Sementsov-Ogievskiy
2022-03-21 15:24     ` Vladimir Sementsov-Ogievskiy
2022-03-21 15:44     ` Vladimir Sementsov-Ogievskiy
2022-03-30  9:09       ` Emanuele Giuseppe Esposito
2022-03-30  9:52         ` Vladimir Sementsov-Ogievskiy
2022-03-30  9:58           ` Emanuele Giuseppe Esposito
2022-04-05 10:55             ` Kevin Wolf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c7f6fe7e-c309-010a-eaba-549fbfcb45ce@redhat.com \
    --to=hreitz@redhat.com \
    --cc=eesposit@redhat.com \
    --cc=fam@euphon.net \
    --cc=jsnow@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    --cc=vsementsov@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.