From: Kevin Wolf <kwolf@redhat.com>
To: Dietmar Maurer <dietmar@proxmox.com>
Cc: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	qemu-block@nongnu.org, Max Reitz <mreitz@redhat.com>
Subject: Re: bdrv_drained_begin deadlock with io-threads
Date: Thu, 2 Apr 2020 16:25:24 +0200	[thread overview]
Message-ID: <20200402142524.GD4006@linux.fritz.box> (raw)
In-Reply-To: <20200402121403.GB4006@linux.fritz.box>

Am 02.04.2020 um 14:14 hat Kevin Wolf geschrieben:
> Am 02.04.2020 um 11:10 hat Dietmar Maurer geschrieben:
> > > It seems to fix it, yes. Now I don't get any hangs any more. 
> > 
> > I just tested using your configuration, and a recent centos8 image
> > running dd loop inside it:
> > 
> > # while dd if=/dev/urandom of=testfile.raw bs=1M count=100; do sync; done
> > 
> > With that, I am unable to trigger the bug.
> > 
> > Would you mind running the test with a Debian Buster image that runs "stress-ng -d 5" inside?
> > I (and two other people here) can trigger the bug quite reliably with that.
> > 
> > On Debian, you can easily install stress-ng using apt:
> > 
> > # apt update
> > # apt install stress-ng
> > 
> > Seems stress-ng uses a different write pattern which can trigger the bug
> > more reliably.
> 
> I was going to, just give me some time...

Can you reproduce the problem with my script, but pointing it to your
Debian image and running stress-ng instead of dd? If so, how long does
it take to reproduce for you?

I was just about to write that I couldn't reproduce it on my first attempt
(which is still with the image on tmpfs as in my script, and therefore
without O_DIRECT or Linux AIO) when it finally did hang. However, this
is still while completing a job, not while starting it:

(gdb) bt
#0  0x00007f8b6b4e9526 in ppoll () at /lib64/libc.so.6
#1  0x00005619fc090919 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  0x00005619fc090919 in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at util/qemu-timer.c:335
#3  0x00005619fc0930f1 in fdmon_poll_wait (ctx=0x5619fe79ae00, ready_list=0x7fff4006cf58, timeout=-1) at util/fdmon-poll.c:79
#4  0x00005619fc0926d7 in aio_poll (ctx=0x5619fe79ae00, blocking=blocking@entry=true) at util/aio-posix.c:589
#5  0x00005619fbfefd83 in bdrv_do_drained_begin (poll=<optimized out>, ignore_bds_parents=false, parent=0x0, recursive=false, bs=0x5619fe81e490) at block/io.c:429
#6  0x00005619fbfefd83 in bdrv_do_drained_begin (bs=0x5619fe81e490, recursive=<optimized out>, parent=0x0, ignore_bds_parents=<optimized out>, poll=<optimized out>) at block/io.c:395
#7  0x00005619fbfe0ce7 in blk_drain (blk=0x5619ffd35c00) at block/block-backend.c:1617
#8  0x00005619fbfe18cd in blk_unref (blk=0x5619ffd35c00) at block/block-backend.c:473
#9  0x00005619fbf9b185 in block_job_free (job=0x5619ffd0b800) at blockjob.c:89
#10 0x00005619fbf9c769 in job_unref (job=0x5619ffd0b800) at job.c:378
#11 0x00005619fbf9c769 in job_unref (job=0x5619ffd0b800) at job.c:370
#12 0x00005619fbf9d57d in job_exit (opaque=0x5619ffd0b800) at job.c:892
#13 0x00005619fc08eea5 in aio_bh_call (bh=0x7f8b5406f410) at util/async.c:164
#14 0x00005619fc08eea5 in aio_bh_poll (ctx=ctx@entry=0x5619fe79ae00) at util/async.c:164
#15 0x00005619fc09252e in aio_dispatch (ctx=0x5619fe79ae00) at util/aio-posix.c:380
#16 0x00005619fc08ed8e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:298
#17 0x00007f8b6df5606d in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#18 0x00005619fc091798 in glib_pollfds_poll () at util/main-loop.c:219
#19 0x00005619fc091798 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242
#20 0x00005619fc091798 in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:518
#21 0x00005619fbd07559 in qemu_main_loop () at /home/kwolf/source/qemu/softmmu/vl.c:1664
#22 0x00005619fbbf093e in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /home/kwolf/source/qemu/softmmu/main.c:49

It does look more like your case, because I now have bs.in_flight == 0
while the BlockBackend of the scsi-hd device has in_flight == 8. Of
course, this still doesn't answer why it happens, and I'm not sure if we
can tell without adding some debug code.
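To illustrate why bs.in_flight == 0 alone is not enough for the drain to
finish: the poll condition also covers the parents of the node, here the
BlockBackend of the scsi-hd device. Below is a rough toy model of that
condition, not QEMU code; the struct and field names are made up for
illustration only:

    /*
     * Toy model (NOT QEMU code): the drain keeps polling until the node
     * itself *and* all of its parents report zero in-flight requests.
     * ToyNode, bs_in_flight and parent_in_flight are made-up names.
     */
    #include <stdbool.h>
    #include <stdio.h>

    typedef struct {
        int bs_in_flight;      /* requests tracked by the BDS itself */
        int parent_in_flight;  /* requests tracked by a parent, e.g. a BlockBackend */
    } ToyNode;

    /* Roughly what the drain loop waits on: both counters must reach zero. */
    static bool drain_poll(const ToyNode *n)
    {
        return n->bs_in_flight > 0 || n->parent_in_flight > 0;
    }

    int main(void)
    {
        /* The state seen in the backtrace: bs.in_flight == 0, blk in_flight == 8. */
        ToyNode n = { .bs_in_flight = 0, .parent_in_flight = 8 };

        /* bdrv_drained_begin() would sit in aio_poll() as long as this is
         * true; if the 8 parent requests never complete, it never returns. */
        printf("drain still waiting: %s\n", drain_poll(&n) ? "yes" : "no");
        return 0;
    }

So with the counts above, the drain can only return once those 8 requests
on the scsi-hd BlockBackend complete, which they apparently never do.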

I'm testing on my current block branch with Stefan's fixes on top.

Kevin


