All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eric Blake <eblake@redhat.com>
To: zhuyangyang <zhuyangyang14@huawei.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>, Fam Zheng <fam@euphon.net>,
	 qemu-block@nongnu.org, qemu-devel@nongnu.org,
	qemu-stable@nongnu.org,  luolongmin@huawei.com,
	suxiaodong1@huawei.com, chenxiaoyu48@huawei.com,
	 wangyan122@huawei.com, yebiaoxiang@huawei.com
Subject: Re: [PATCH v1] coroutine: avoid inserting duplicate coroutine to co_queue_wakeup
Date: Mon, 25 Mar 2024 11:00:31 -0500	[thread overview]
Message-ID: <wac6lbzuvpdyttfpuoeac4ygps2elzktf5dcw22wruaupqy5r6@sfausqavuvil> (raw)
In-Reply-To: <20240325091850.1087235-1-zhuyangyang14@huawei.com>

I've seen (and agree with) Stefan's reply that a more thorough audit
is needed, but am still providing a preliminary review based on what I
see here.

On Mon, Mar 25, 2024 at 05:18:50PM +0800, zhuyangyang via wrote:
> If g_main_loop_run()/aio_poll() is called in the coroutine context,
> the pending coroutine may be woken up repeatedly, and the co_queue_wakeup
> may be disordered.

coroutine context should not be doing anything that can block; you may
have uncovered a bigger bug that we need to address.

> 
> When the poll() syscall exited in g_main_loop_run()/aio_poll(), it means
> some listened events is completed. Therefore, the completion callback
> function is dispatched.
> 
> If this callback function needs to invoke aio_co_enter(), it will only
> wake up the coroutine (because we are already in coroutine context),
> which may cause that the data on this listening event_fd/socket_fd
> is not read/cleared. When the next poll () exits, it will be woken up again
> and inserted into the wakeup queue again.
> 
> For example, if TLS is enabled in NBD, the server will call g_main_loop_run()
> in the coroutine, and repeatedly wake up the io_read event on a socket.
> The call stack is as follows:
> 
> aio_co_enter()
> aio_co_wake()
> qio_channel_restart_read()
> aio_dispatch_handler()
> aio_dispatch_handlers()
> aio_dispatch()
> aio_ctx_dispatch()
> g_main_context_dispatch()
> g_main_loop_run()
> nbd_negotiate_handle_starttls()
> nbd_negotiate_options()
> nbd_negotiate()
> nbd_co_client_start()
> coroutine_trampoline()
> 
> Signed-off-by: zhuyangyang <zhuyangyang14@huawei.com>
> ---
>  util/async.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/util/async.c b/util/async.c
> index 0467890052..25fc1e6083 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -705,7 +705,18 @@ void aio_co_enter(AioContext *ctx, Coroutine *co)
>      if (qemu_in_coroutine()) {
>          Coroutine *self = qemu_coroutine_self();
>          assert(self != co);
> -        QSIMPLEQ_INSERT_TAIL(&self->co_queue_wakeup, co, co_queue_next);
> +        /*
> +         * If the Coroutine *co is already in the co_queue_wakeup, this
> +         * repeated insertion will causes the loss of other queue element

s/causes/cause/

> +         * or infinite loop.
> +         * For examplex:

s/examplex/example/

> +         * Head->a->b->c->NULL, after insert_tail(head, b) => Head->a->b->NULL
> +         * Head->a-b>->NULL, after insert_tail(head, b) => Head->a->b->b...

s/b>->/b->/

> +         */
> +        if (!co->co_queue_next.sqe_next &&
> +            self->co_queue_wakeup.sqh_last != &co->co_queue_next.sqe_next) {
> +            QSIMPLEQ_INSERT_TAIL(&self->co_queue_wakeup, co, co_queue_next);
> +        }
>      } else {
>          qemu_aio_coroutine_enter(ctx, co);
>      }

Intuitively, attacking the symptoms (avoiding bogus list insertion
when it is already on the list) makes sense; but I want to make sure
we attack the root cause.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org



  parent reply	other threads:[~2024-03-25 16:02 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-25  9:18 [PATCH v1] coroutine: avoid inserting duplicate coroutine to co_queue_wakeup zhuyangyang via
2024-03-25 15:50 ` Stefan Hajnoczi
2024-03-26 11:53   ` zhuyangyang via
2024-03-27 22:13   ` Eric Blake
2024-03-28  8:54     ` Kevin Wolf
2024-03-25 16:00 ` Eric Blake [this message]
2024-03-26 12:25   ` zhuyangyang via
2024-03-28 12:40 ` Eric Blake
2024-03-29 13:09   ` Zhu Yangyang via
2024-04-01  8:04   ` Zhu Yangyang via

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=wac6lbzuvpdyttfpuoeac4ygps2elzktf5dcw22wruaupqy5r6@sfausqavuvil \
    --to=eblake@redhat.com \
    --cc=chenxiaoyu48@huawei.com \
    --cc=fam@euphon.net \
    --cc=luolongmin@huawei.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-stable@nongnu.org \
    --cc=stefanha@redhat.com \
    --cc=suxiaodong1@huawei.com \
    --cc=wangyan122@huawei.com \
    --cc=yebiaoxiang@huawei.com \
    --cc=zhuyangyang14@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.