From: Emanuele Giuseppe Esposito <eesposit@redhat.com>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Fam Zheng <fam@euphon.net>, Kevin Wolf <kwolf@redhat.com>,
	Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
	qemu-block@nongnu.org, qemu-devel@nongnu.org,
	Hanna Reitz <hreitz@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>, John Snow <jsnow@redhat.com>
Subject: Re: [RFC PATCH 0/5] Removal of AioContext lock, bs->parents and ->children: proof of concept
Date: Wed, 9 Mar 2022 14:26:28 +0100
Message-ID: <af53599c-c7de-d2b8-00fa-0e7d28121251@redhat.com>
In-Reply-To: <Yh89L8gT46MbSJCQ@stefanha-x1.localdomain>



On 02/03/2022 10:47, Stefan Hajnoczi wrote:
> On Tue, Mar 01, 2022 at 09:21:08AM -0500, Emanuele Giuseppe Esposito wrote:
>> This series tries to provide a proof of concept and a clear
>> explanation of why we need to use drains (and more precisely
>> subtree drains) to replace the AioContext lock, especially to
>> protect the BlockDriverState ->children and ->parents lists.
>>
>> Just a small recap on the key concepts:
>> * We split block layer APIs in "global state" (GS), "I/O", and
>> "global state or I/O".
>>   GS APIs run in the main loop, under the BQL, and are the
>>   only ones allowed to modify the BlockDriverState graph.
>>
>>   I/O APIs are thread-safe and can run in any thread.
>>
>>   "global state or I/O" are essentially all APIs that use
>>   BDRV_POLL_WHILE. This is because there can be only 2 threads
>>   that can use BDRV_POLL_WHILE: main loop and the iothread that
>>   runs the aiocontext.
>>
>> * Drains allow the caller (either the main loop or the iothread
>> running the context) to wait for all in_flight requests and
>> operations of a BDS: normal drains target a given node and its
>> parents, while subtree drains also include the node's subgraph.
>> Siblings are not affected by either kind of drain.
> 
> Siblings are drained to the extent required for their parent node to
> reach in_flight == 0.
> 
> I haven't checked the code but I guess the case you're alluding to is
> that siblings with multiple parents could have other I/O in flight that
> will not be drained and further I/O can be submitted after the parent
> has drained?

Yes, in theory this can happen. I don't really know whether it happens
in practice, or how likely it is.

The alternative would be to make a drain that blocks the whole graph,
siblings included, but that would probably be overkill.
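
To make the scope concrete, consider this made-up graph (node names
invented purely for illustration):

             parent
             /    \
         node      sibling
           |
         child

    bdrv_drained_begin(node)          -> quiesces node and parent
    bdrv_subtree_drained_begin(node)  -> quiesces node, parent, child

In both cases sibling is only drained to the extent parent needs to
reach in_flight == 0; a hypothetical whole-graph drain would also
quiesce sibling completely.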

> 
>> After bdrv_drained_begin, no more requests are allowed to come
>> from the affected nodes. Therefore the only actor left working
>> on a drained part of the graph should be the main loop.
> 
> It's worth remembering that this invariant is not enforced. Draining is
> a cooperative scheme. Everything *should* be notified and stop
> submitting I/O, but there is no assertion or return -EBUSY if a request
> is submitted while the BDS is drained.

Yes, it is a cooperative scheme, and everything except mirror (fixed
in the last patch) seems to follow the invariant.
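
Nothing enforces it today, agreed. If we ever wanted to catch
violations, I imagine a purely hypothetical check in the request path,
something like this (not part of this series, just to illustrate the
invariant):

    /* Hypothetical: catch I/O submitted to a drained node from
     * any thread other than the main loop. */
    void bdrv_inc_in_flight(BlockDriverState *bs)
    {
        assert(qatomic_read(&bs->quiesce_counter) == 0 ||
               qemu_in_main_thread());
        qatomic_inc(&bs->in_flight);
    }
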
> 
> If the thread that drained the BDS wants, I think it can (legally)
> submit I/O within the drained section.
> 

Yes, but at some point the main loop also drains before changing the
graph. At that point the thread must be stopped, in accordance with
the invariant above, so it doesn't matter whether the thread has
already drained or not.
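
Concretely, the pattern I have in mind for the main loop is roughly
the following (a sketch using the _unlocked variants from patch 3;
error handling omitted, and bdrv_replace_node() is just an example of
a graph-changing operation):

    /* Main loop, under BQL; sketch only. */
    bdrv_subtree_drained_begin_unlocked(old_bs);
    bdrv_subtree_drained_begin_unlocked(new_bs);

    /* No iothread can read or modify ->parents/->children of
     * either subtree here, so the graph change is safe. */
    bdrv_replace_node(old_bs, new_bs, &error_abort);

    bdrv_subtree_drained_end_unlocked(new_bs);
    bdrv_subtree_drained_end_unlocked(old_bs);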

>> What do we intend to do
>> -----------------------
>> We want to remove the AioContext lock. It is not 100% clear how
>> many things we are protecting with it, and why.
>> As a start, we want to protect the BlockDriverState ->parents and
>> ->children lists, since they are read by the main loop and I/O
>> threads but written only by the main loop under the BQL. The
>> function that modifies these lists is bdrv_replace_child_common().
>>
>> How do we want to do it
>> -----------------------
>> We identified the subtree_drain API as the ideal substitute for
>> the AioContext lock. The reason is simple: draining prevents the
>> iothread from reading or writing the nodes, so once the main loop
>> finishes executing bdrv_drained_begin() on the interested graph,
>> we are sure that the iothread is not going to look at or interfere
>> with that part of the graph. We are also sure that the only two
>> actors that can look at a specific BlockDriverState in any given
>> context are the main loop and the iothread running the AioContext
>> (ensured by the "global state or I/O" logic).
>>
>> Why use _subtree_ instead of normal drain
>> -----------------------------------------
>> A simple drain "blocks" a given node and all its parents, but it
>> doesn't touch the children. This means that if we use a simple
>> drain, a child can keep processing requests and can eventually
>> end up calling bdrv_drained_begin itself, reading the parents
>> list while the main loop is modifying it. Therefore a subtree
>> drain is necessary.
>>
>> Possible scenarios
>> -------------------
>> Keeping in mind that only an iothread and the main loop can be
>> draining a certain node, we could have:
>>
>> main loop successfully drains and then the iothread tries to drain:
>>   impossible scenario, as the iothread is already stopped once the
>>   main loop successfully drains.
>>
>> iothread successfully drains and then main loop drains:
>>   should not be a problem, as:
>>   1) the iothread should already be "blocked" by its own drain
> 
> Once drained_begin() returns in the IOThread, the IOThread can do
> anything it wants, including submitting more I/O. I don't consider
> that "blocked", so I'm not sure what this sentence means?
> 
> The way the main loop thread protects itself against the IOThread is via
> the aio "external" handler concept and block job drain callbacks, which
> are activated by drained_begin(). They ensure that the IOThread will not
> perform further processing that submits I/O, but the IOThread code that
> invoked drained_begin() can still do anything it wants.

As above, I think that regardless of what the iothread is doing, once
the main loop has finished executing bdrv_drained_begin the iothread
should not be doing anything related to the nodes that have been
drained.
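
(For reference, the idea in patch 4, sketched rather than quoted from
the diff: only the job's home iothread may use the relaxed polling
condition, so the main loop keeps polling until the job is really
quiescent.)

    /* Sketch of the patch 4 idea, not the exact code. */
    static bool child_job_drained_poll(BdrvChild *c)
    {
        BlockJob *bjob = c->opaque;
        const BlockJobDriver *drv = block_job_driver(bjob);

        /* ... busy/completed checks elided ... */

        if (drv->drained_poll &&
            qemu_get_current_aio_context() == bjob->job.aio_context) {
            /* Only the home thread may short-circuit the wait. */
            return drv->drained_poll(bjob);
        }
        return true;
    }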

> 
>>   2) the main loop would still wait for it to completely block.
>>   There is the issue of mirror overriding this scenario to avoid
>>   deadlocks, but that is handled in the next section.
>>
>> main loop and iothread try to drain together:
>>   As above, this case doesn't really matter. As long as the
>>   bdrv_drained_begin invariant is respected, the main loop will
>>   continue only once the iothread is "blocked" on that part of
>>   the graph.
> 
> I'm not sure about this, see above.
> 
>>
>> A note on iothread draining
>> ---------------------------
>> Theoretically, draining from an iothread should not be possible,
>> as the iothread would be scheduling a BH in the main loop and
>> waiting for itself to stop, even though it is not yet stopped
>> because it is waiting for the BH.
> 
> I'm not sure what you mean. Which function schedules this BH?

bdrv_drained_begin and _end schedule a BH to run the draining code in
the main loop if they are called in a coroutine. That's what I meant
here :)
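
Roughly this path, paraphrased from block/io.c with the details
trimmed:

    /* Paraphrase, not a complete quote of the real code. */
    void bdrv_drained_begin(BlockDriverState *bs)
    {
        if (qemu_in_coroutine()) {
            /* Schedule a BH to run the drain outside coroutine
             * context and yield until it has completed. */
            bdrv_co_yield_to_drain(bs, true, ...);
            return;
        }
        /* ... non-coroutine drain ... */
    }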
> 
>>
>> This is what would happen in the tests in patch 5 if .drained_poll
>> was not implemented.
>>
>> Therefore, one solution is to use the .drained_poll callback in
>> BlockJobDriver. This callback overrides the default job poll()
>> behavior and allows the polling condition to stop waiting for the
>> job. It is actually only used in mirror.
>> This, however, breaks the bdrv_drained_begin invariant, because
>> the iothread is not really blocked on that node but continues
>> running. In order to fix this, patch 4 allows the polling
>> condition to be used only by the iothread, and not by the main
>> loop too, preventing the drain from returning before the iothread
>> has effectively stopped. This is also shown in the tests in patch
>> 5: if the fix in patch 4 is removed, then the main loop drain
>> returns earlier and allows the iothread to run and drain
>> concurrently.
>>
>> The other patches in this series are cherry-picked from the
>> various series I have already sent, and are included here just to
>> allow the subtree_drained_begin/end_unlocked implementation.
>>
>> Emanuele Giuseppe Esposito (5):
>>   aio-wait.h: introduce AIO_WAIT_WHILE_UNLOCKED
>>   introduce BDRV_POLL_WHILE_UNLOCKED
>>   block/io.c: introduce bdrv_subtree_drained_{begin/end}_unlocked
>>   child_job_drained_poll: override polling condition only when in home
>>     thread
>>   test-bdrv-drain: ensure draining from main loop stops iothreads
>>
>>  block/io.c                   |  48 ++++++--
>>  blockjob.c                   |   3 +-
>>  include/block/aio-wait.h     |  15 ++-
>>  include/block/block.h        |   7 ++
>>  tests/unit/test-bdrv-drain.c | 218 +++++++++++++++++++++++++++++++++++
>>  5 files changed, 274 insertions(+), 17 deletions(-)
>>
>> -- 
>> 2.31.1
>>



