From: Emanuele Giuseppe Esposito <eesposit@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: Fam Zheng <fam@euphon.net>,
	Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
	qemu-block@nongnu.org, qemu-devel@nongnu.org,
	Hanna Reitz <hreitz@redhat.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>, John Snow <jsnow@redhat.com>
Subject: Re: [RFC PATCH 0/5] Removal of AioContext lock, bs->parents and ->children: proof of concept
Date: Wed, 13 Apr 2022 17:14:04 +0200
Message-ID: <5d34e709-fe59-70df-2723-49f252aaed78@redhat.com>
In-Reply-To: <Ylbjd3kzEsBZmgJQ@redhat.com>



On 13/04/2022 at 16:51, Kevin Wolf wrote:
> On 13.04.2022 at 15:43, Emanuele Giuseppe Esposito wrote:
>> So this is a more concrete and up-to-date header.
>>
>> A few things to note:
>> - we have a list of AioContext. They are registered once an aiocontext
>> is created, and deleted when it is destroyed.
>> This list is helpful because each aiocontext can only modify its own
>> number of readers, avoiding unnecessary cacheline bouncing
>>
>> - if a coroutine changes aiocontext, it's ok with regard to the
>> per-aiocontext reader counter. As long as the sum is correct, there's no
>> issue. The problem comes only once the original aiocontext is deleted,
>> and at that point we need to move the count it held to a shared global
>> variable, otherwise we risk losing track of readers.
> 
> So the idea is that we can do bdrv_graph_co_rdlock() in one thread and
> the corresponding bdrv_graph_co_rdunlock() in a different thread?
> 
> Would the unlock somehow remember the original thread, or do you use the
> "sum is correct" argument and allow negative counter values, so you can
> end up having count +1 in A and -1 in B to represent "no active
> readers"? If this happens, it's likely to happen many times, so do we
> have to take integer overflows into account then?
> 
>> - All synchronization between the flags explained in this header is of
>> course handled in the implementation. But for now it would be nice to
>> have feedback on the idea/API.
>>
>> So in short we need:
>> - per-aiocontext counter
>> - global list of aiocontext
>> - global additional reader counter (in case an aiocontext is deleted)
>> - global CoQueue
>> - global has_writer flag
>> - global QemuMutex to protect the list access
>>
>> Emanuele
>>
>> #ifndef BLOCK_LOCK_H
>> #define BLOCK_LOCK_H
>>
>> #include "qemu/osdep.h"
>>
>> /*
>>  * register_aiocontext:
>>  * Add AioContext @ctx to the list of AioContext.
>>  * This list is used to obtain the total number of readers
>>  * currently running the graph.
>>  */
>> void register_aiocontext(AioContext *ctx);
>>
>> /*
>>  * unregister_aiocontext:
>>  * Removes AioContext @ctx from the list of AioContext.
>>  */
>> void unregister_aiocontext(AioContext *ctx);
>>
>> /*
>>  * bdrv_graph_wrlock:
>>  * Modify the graph. Nobody else is allowed to access the graph.
>>  * Set global has_writer to 1, so that subsequent readers will wait
>>  * in a coroutine queue until the writer is done.
>>  * Then keep track of the running readers by computing the total
>>  * number of readers (sum of all aiocontext readers), and wait until
>>  * they all finish with AIO_WAIT_WHILE.
>>  */
>> void bdrv_graph_wrlock(void);
> 
> Do we need a coroutine version that yields instead of using
> AIO_WAIT_WHILE() or are we sure this will only ever be called from
> non-coroutine contexts?

Writes (graph modifications) are always done under BQL in the main loop.
Except for a unit test, I don't think a coroutine ever does that.
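
To make the writer-side accounting from the bdrv_graph_wrlock comment
concrete, here is a minimal single-threaded sketch. It assumes that unlock
simply decrements the counter of whichever context it runs in, which is one
of the options raised above and not something settled in this thread. All
names (MAX_CTX, reader_count, orphan_reader_count, rdlock_in, ...) are
hypothetical, the array stands in for the global AioContext list, and there
are no atomics, barriers, or waiting here, all of which a real version
would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_CTX 4   /* hypothetical stand-in for the global AioContext list */

/* Hypothetical per-context reader counters. */
static uint32_t reader_count[MAX_CTX];
/* Hypothetical global counter that inherits the count of a deleted context. */
static uint32_t orphan_reader_count;
static bool has_writer;

/* Each context only touches its own slot. */
static void rdlock_in(int ctx)   { reader_count[ctx]++; }
static void rdunlock_in(int ctx) { reader_count[ctx]--; } /* may wrap below 0 */

/*
 * Unsigned arithmetic is modulo 2^32, so a "+1 in A, -1 in B" pair still
 * sums to 0 even though B's slot has wrapped around.
 */
static uint32_t total_readers(void)
{
    uint32_t sum = orphan_reader_count;
    for (int i = 0; i < MAX_CTX; i++) {
        sum += reader_count[i];
    }
    return sum;
}

/* On context deletion, fold its count into the shared global counter. */
static void unregister_ctx(int ctx)
{
    orphan_reader_count += reader_count[ctx];
    reader_count[ctx] = 0;
}

/* Writer side: announce the writer, then report whether readers drained. */
static bool wrlock_may_proceed(void)
{
    has_writer = true;
    return total_readers() == 0;
}
```

With unsigned counters the individual slots can wrap freely; only the sum
is meaningful, and it overflows only if more than 2^32 - 1 readers are
active at once.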

> 
>> /*
>>  * bdrv_graph_wrunlock:
>>  * Write finished, reset global has_writer to 0 and restart
>>  * all readers that are waiting.
>>  */
>> void bdrv_graph_wrunlock(void);
>>
>> /*
>>  * bdrv_graph_co_rdlock:
>>  * Read the bs graph. Increases the reader counter of the current
>>  * aiocontext, and if has_writer is set, it means that the writer is
>>  * modifying the graph, therefore wait in a coroutine queue.
>>  * The writer will then wake this coroutine once it is done.
>>  *
>>  * This lock cannot be taken recursively.
>>  */
>> void coroutine_fn bdrv_graph_co_rdlock(void);
> 
> What prevents it from being taken recursively when it's just a counter?
> (I do see however, that you can't take a reader lock while you have the
> writer lock or vice versa because it would deadlock.)
> 
I actually didn't add the assertion to prevent it from being recursive
yet, but I think it simplifies everything if it's not recursive.
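
A minimal sketch of what such an assertion could look like, assuming each
reader carries a flag that records whether it already holds the read lock.
DemoCoroutine, holds_graph_rdlock and the demo_* functions are hypothetical
names for illustration only; nothing here is existing QEMU API, and the
has_writer wait is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-reader state; a real version might live elsewhere. */
typedef struct DemoCoroutine {
    bool holds_graph_rdlock;
} DemoCoroutine;

static unsigned demo_readers;   /* stand-in for the per-context counter */

/* True if @co may take the read lock, i.e. it does not hold it already. */
static bool demo_rdlock_allowed(const DemoCoroutine *co)
{
    return !co->holds_graph_rdlock;
}

static void demo_graph_rdlock(DemoCoroutine *co)
{
    assert(demo_rdlock_allowed(co));   /* catch recursive locking early */
    co->holds_graph_rdlock = true;
    demo_readers++;
}

static void demo_graph_rdunlock(DemoCoroutine *co)
{
    assert(co->holds_graph_rdlock);    /* catch unbalanced unlock */
    co->holds_graph_rdlock = false;
    demo_readers--;
}
```

One reason to forbid recursion in a design like this: if a writer sets
has_writer between the outer and the inner rdlock, the inner call would
queue behind the writer while the writer waits for the outer reader to
finish.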

> Does this being a coroutine_fn mean that we would have to convert QMP
> command handlers to coroutines so that they can take the rdlock while
> they don't expect the graph to change? Or should we have a non-coroutine
> version, too, that works with AIO_WAIT_WHILE()?

Why convert the QMP command handlers? coroutine_fn was just meant to signal
that it can also be called from coroutines, like the ones created by the
blk_* API.
A reader does not have to be a coroutine. AIO_WAIT_WHILE is not
mandatory to allow it to finish; it helps to ensure progress in case
some reader is waiting for something, but other than that it is not
necessary IMO.

> Or should this only be taken for very small pieces of code directly
> accessing the BdrvChild objects, and high-level users like QMP commands
> shouldn't even consider themselves readers?
> 

No, I think if we focus on small pieces of code we end up having a
million lock/unlock pairs.

>> /*
>>  * bdrv_graph_co_rdunlock:
>>  * Read terminated, decrease the count of readers in the current aiocontext.
>>  * If the writer is waiting for reads to finish (has_writer == 1), signal
>>  * the writer that we are done via aio_wait_kick() to let it continue.
>>  */
>> void coroutine_fn bdrv_graph_co_rdunlock(void);
>>
>> #endif /* BLOCK_LOCK_H */
> 
> I expect that in the final version, we might want to have some sugar
> like a WITH_BDRV_GRAPH_RDLOCK_GUARD() macro, but obviously that doesn't
> affect the fundamental design.

Yeah I will ping you once I get to that point ;)
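
For reference, a rough sketch of how a scoped guard of that kind can be
built in C, assuming the GCC/Clang cleanup attribute is available.
WITH_DEMO_RDLOCK_GUARD, DemoGuard and the demo_* functions are hypothetical
stand-ins, not the eventual macro; the lock and unlock bodies only bump a
counter.

```c
#include <assert.h>

static int demo_readers;                 /* stand-in for the reader counter */
static void demo_rdlock(void)   { demo_readers++; }
static void demo_rdunlock(void) { demo_readers--; }

typedef struct DemoGuard {
    int active;                          /* 1 while the lock is held */
} DemoGuard;

static DemoGuard demo_guard_acquire(void)
{
    demo_rdlock();
    return (DemoGuard){ 1 };
}

/* Idempotent: runs at the end of the block and again at scope exit. */
static void demo_guard_release(DemoGuard *g)
{
    if (g->active) {
        demo_rdunlock();
        g->active = 0;
    }
}

/* The cleanup attribute is a GCC/Clang extension, not standard C. */
#define WITH_DEMO_RDLOCK_GUARD()                                      \
    for (__attribute__((cleanup(demo_guard_release)))                 \
             DemoGuard demo_guard_ = demo_guard_acquire();            \
         demo_guard_.active;                                          \
         demo_guard_release(&demo_guard_))

/* Early return inside the guarded block still drops the lock. */
static int demo_count_inside_guard(void)
{
    WITH_DEMO_RDLOCK_GUARD() {
        return demo_readers;
    }
    return -1;
}
```

The for loop runs its body exactly once; the cleanup handler covers return
and break, which would otherwise skip the unlock in the loop's third
expression.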

Emanuele
> 
> Kevin
> 


