From: Roman Kagan <rvkagan@yandex-team.ru>
To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Cc: kwolf@redhat.com, qemu-block@nongnu.org, qemu-devel@nongnu.org,
mreitz@redhat.com, den@openvz.org
Subject: Re: [PATCH 1/4] block/nbd: fix drain dead-lock because of nbd reconnect-delay
Date: Wed, 3 Feb 2021 18:00:20 +0300 [thread overview]
Message-ID: <20210203150020.GE2786@rvkaganb.lan> (raw)
In-Reply-To: <e4387e05-4963-e335-3002-ddbbcf5aaf82@virtuozzo.com>
On Wed, Feb 03, 2021 at 05:44:34PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> 03.02.2021 17:21, Roman Kagan wrote:
> > On Wed, Feb 03, 2021 at 04:10:41PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > 03.02.2021 13:53, Roman Kagan wrote:
> > > > On Thu, Sep 03, 2020 at 10:02:58PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > > We pause the reconnect process during a drained section. So, if we
> > > > > have requests waiting for reconnect, we should cancel them; otherwise
> > > > > they deadlock the drained section.
> > > > >
> > > > > How to reproduce:
> > > > >
> > > > > 1. Create an image:
> > > > > qemu-img create -f qcow2 xx 100M
> > > > >
> > > > > 2. Start NBD server:
> > > > > qemu-nbd xx
> > > > >
> > > > > 3. Start vm with second nbd disk on node2, like this:
> > > > >
> > > > > ./build/x86_64-softmmu/qemu-system-x86_64 -nodefaults -drive \
> > > > > file=/work/images/cent7.qcow2 -drive \
> > > > > driver=nbd,server.type=inet,server.host=192.168.100.5,server.port=10809,reconnect-delay=60 \
> > > > > -vnc :0 -m 2G -enable-kvm -vga std
> > > > >
> > > > > 4. Access the vm through vnc (or some other way?), and check that NBD
> > > > > drive works:
> > > > >
> > > > > dd if=/dev/sdb of=/dev/null bs=1M count=10
> > > > >
> > > > > - the command should succeed.
> > > > >
> > > > > 5. Now, kill the nbd server, and run dd in the guest again:
> > > > >
> > > > > dd if=/dev/sdb of=/dev/null bs=1M count=10
> > > > >
> > > > > Now QEMU is trying to reconnect, and dd-generated requests are waiting
> > > > > for the connection (they will wait up to 60 seconds (see the
> > > > > reconnect-delay option above) and then fail). But sometimes the VM may
> > > > > hang completely in the deadlock. You may need to increase the
> > > > > reconnect-delay period to catch the deadlock.
> > > > >
> > > > > The VM doesn't respond because the drain deadlock happens in the CPU
> > > > > thread with the global mutex taken. That's not a good thing by itself
> > > > > and is not fixed by this commit (the proper way is to use iothreads).
> > > > > Still, this commit fixes the drain deadlock itself.
> > > > >
> > > > > Note: probably, we can instead continue to reconnect during the
> > > > > drained section. To achieve this, we may move negotiation to the
> > > > > connect thread to make it independent of the bs aio context. But
> > > > > extending the drained section doesn't seem good anyway. So, let's fix
> > > > > the bug in the simplest way for now.
> > > > >
> > > > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> > > > > ---
> > > > > block/nbd.c | 5 +++++
> > > > > 1 file changed, 5 insertions(+)
> > > > >
> > > > > diff --git a/block/nbd.c b/block/nbd.c
> > > > > index 9daf003bea..912ea27be7 100644
> > > > > --- a/block/nbd.c
> > > > > +++ b/block/nbd.c
> > > > > @@ -242,6 +242,11 @@ static void coroutine_fn nbd_client_co_drain_begin(BlockDriverState *bs)
> > > > > }
> > > > > nbd_co_establish_connection_cancel(bs, false);
> > > > > +
> > > > > + if (s->state == NBD_CLIENT_CONNECTING_WAIT) {
> > > > > + s->state = NBD_CLIENT_CONNECTING_NOWAIT;
> > > > > + qemu_co_queue_restart_all(&s->free_sema);
> > > > > + }
> > > > > }
> > > > > static void coroutine_fn nbd_client_co_drain_end(BlockDriverState *bs)
> > > >
> > > > This basically defeats the whole purpose of reconnect: if the nbd client
> > > > is trying to reconnect, drain effectively cancels that and makes all
> > > > in-flight requests complete with an error.
> > > >
> > > > I'm not suggesting to revert this patch (it's now in the tree as commit
> > > > 8c517de24a), because the deadlock is no better, but I'm afraid the only
> > > > real fix is to implement reconnect during the drain section. I'm still
> > > > trying to get my head around it so no patch yet, but I just wanted to
> > > > bring this up in case anybody beats me to it.
> > > >
> > >
> > >
> > > What do you mean by "reconnect during drained section"? Trying to
> > > establish the connection? Or keeping in-flight requests instead of
> > > cancelling them? We can't keep in-flight requests during a drained
> > > section, as that's the whole point of a drained section: no in-flight
> > > requests. So we'll have to wait for them at drain_begin (waiting up to
> > > reconnect-delay, which doesn't seem good and triggers the deadlock in a
> > > non-iothread environment), or just cancel them.
> > >
> > > Do you argue that waiting on drained-begin is somehow better than
> > > cancelling?
> >
> > Sorry I should have used more accurate wording to be clear.
> >
> > Yes, my point is that canceling the requests on entry to a drained
> > section is incorrect. And no, it doesn't matter if it can be long:
> > that's the price you pay for doing the drain. Think of reconnect as a
> > special case of a slow connection: if an nbd reply from the server is
> > delayed for whatever reason without dropping the connection, you have to
> > wait here, too. (In fact, reconnect is even slightly better here since
> > it has a configurable finite timeout while the message delay does not
> > AFAIK.)
> >
> > Does it make sense?
>
> Hmm, yes..
>
> But then we should fix the original deadlock some other way.
Exactly.
> Probably it will only be possible by using an iothread for the nbd node
> (I failed to find the original thread where someone said that iothreads
> are a solution). And then we can do the cancel in
> nbd_client_co_drain_begin() only if bs doesn't have a separate iothread.
Without this commit, I see qemu hang with nbd in an iothread, too. I'll
double-check if it's that very same deadlock or something else.
Thanks,
Roman.