* [Qemu-devel] Intermittent hang of iotest 194 (bdrv_drain_all after non-shared storage migration)
From: Max Reitz @ 2017-11-09  0:48 UTC (permalink / raw)
  To: Qemu-block; +Cc: qemu-devel, Stefan Hajnoczi


Hi,

More exciting news from the bdrv_drain() front!

I've noticed in the past that iotest 194 sometimes hangs.  I usually run
the tests on tmpfs, but I've just now verified that it happens on my SSD
just as well.

So the reproducer is a plain:

while ./check -raw 194; do; done

(No difference between raw or qcow2, though.)
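
(Depending on your shell, the empty loop body may need a placeholder --
e.g. "while ./check -raw 194; do :; done" in bash.)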

And then, after a couple of runs (or a couple dozen), it will just hang.
The reason is that the source VM lingers around and doesn't quit
voluntarily -- the test itself was successful, but it just can't exit.

If you force it to exit by killing the VM (e.g. through pkill -11 qemu),
this is the backtrace:

#0  0x00007f7cfc297e06 in ppoll () at /lib64/libc.so.6
#1  0x0000563b846bcac9 in ppoll (__ss=0x0, __timeout=0x0,
__nfds=<optimized out>, __fds=<optimized out>) at
/usr/include/bits/poll2.h:77
#2  0x0000563b846bcac9 in qemu_poll_ns (fds=<optimized out>,
nfds=<optimized out>, timeout=<optimized out>) at util/qemu-timer.c:322
#3  0x0000563b846be711 in aio_poll (ctx=ctx@entry=0x563b856e3e80,
blocking=<optimized out>) at util/aio-posix.c:629
#4  0x0000563b8463afa4 in bdrv_drain_recurse
(bs=bs@entry=0x563b865568a0, begin=begin@entry=true) at block/io.c:201
#5  0x0000563b8463baff in bdrv_drain_all_begin () at block/io.c:381
#6  0x0000563b8463bc99 in bdrv_drain_all () at block/io.c:411
#7  0x0000563b8459888b in block_migration_cleanup (opaque=<optimized
out>) at migration/block.c:714
#8  0x0000563b845883be in qemu_savevm_state_cleanup () at
migration/savevm.c:1251
#9  0x0000563b845811fd in migration_thread (opaque=0x563b856f1da0) at
migration/migration.c:2298
#10 0x00007f7cfc56f36d in start_thread () at /lib64/libpthread.so.0
#11 0x00007f7cfc2a3e1f in clone () at /lib64/libc.so.6


And when you make bdrv_drain_all_begin() print what we are trying to
drain, you can see that it's the format node (managed by the "raw"
driver in this case).
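
Something along these lines in the bdrv_next() loop of
bdrv_drain_all_begin() will do if you want to reproduce that -- a
sketch, not the exact change I used:

    /* hypothetical debug aid */
    fprintf(stderr, "bdrv_drain_all_begin: draining %s (driver: %s)\n",
            bdrv_get_node_name(bs),
            bs->drv ? bs->drv->format_name : "<none>");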

So I thought, before I put more time into this, let's ask whether the
test author has any ideas. :-)

Max



* Re: [Qemu-devel] Intermittent hang of iotest 194 (bdrv_drain_all after non-shared storage migration)
From: Fam Zheng @ 2017-11-09  4:21 UTC (permalink / raw)
  To: Max Reitz; +Cc: Qemu-block, qemu-devel, Stefan Hajnoczi

On Thu, 11/09 01:48, Max Reitz wrote:
> Hi,
> 
> More exciting news from the bdrv_drain() front!
> 
> I've noticed in the past that iotest 194 sometimes hangs.  I usually run
> the tests on tmpfs, but I've just now verified that it happens on my SSD
> just as well.
> 
> So the reproducer is a plain:
> 
> while ./check -raw 194; do; done

I cannot reproduce it on my machine.

> 
> (No difference between raw or qcow2, though.)
> 
> And then, after a couple of runs (or a couple ten), it will just hang.
> The reason is that the source VM lingers around and doesn't quit
> voluntarily -- the test itself was successful, but it just can't exit.
> 
> If you force it to exit by killing the VM (e.g. through pkill -11 qemu),
> this is the backtrace:
> 
> #0  0x00007f7cfc297e06 in ppoll () at /lib64/libc.so.6
> #1  0x0000563b846bcac9 in ppoll (__ss=0x0, __timeout=0x0,
> __nfds=<optimized out>, __fds=<optimized out>) at
> /usr/include/bits/poll2.h:77

Looking at the 0 timeout, it seems we are in the aio_poll(ctx,
blocking=false) branch of BDRV_POLL_WHILE? Is it a busy loop? If so, I
wonder what is making progress and causing aio_poll() to return true.

> #2  0x0000563b846bcac9 in qemu_poll_ns (fds=<optimized out>,
> nfds=<optimized out>, timeout=<optimized out>) at util/qemu-timer.c:322
> #3  0x0000563b846be711 in aio_poll (ctx=ctx@entry=0x563b856e3e80,
> blocking=<optimized out>) at util/aio-posix.c:629
> #4  0x0000563b8463afa4 in bdrv_drain_recurse
> (bs=bs@entry=0x563b865568a0, begin=begin@entry=true) at block/io.c:201
> #5  0x0000563b8463baff in bdrv_drain_all_begin () at block/io.c:381
> #6  0x0000563b8463bc99 in bdrv_drain_all () at block/io.c:411
> #7  0x0000563b8459888b in block_migration_cleanup (opaque=<optimized
> out>) at migration/block.c:714
> #8  0x0000563b845883be in qemu_savevm_state_cleanup () at
> migration/savevm.c:1251
> #9  0x0000563b845811fd in migration_thread (opaque=0x563b856f1da0) at
> migration/migration.c:2298
> #10 0x00007f7cfc56f36d in start_thread () at /lib64/libpthread.so.0
> #11 0x00007f7cfc2a3e1f in clone () at /lib64/libc.so.6
> 
> 
> And when you make bdrv_drain_all_begin() print what we are trying to
> drain, you can see that it's the format node (managed by the "raw"
> driver in this case).

So what is the value of bs->in_flight?
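(Attaching gdb to the hanging source QEMU should tell, e.g. "gdb -p
<pid>", then "thread apply all bt" to find the migration thread, select
the bdrv_drain_recurse frame and "print bs->in_flight" / "print *bs".)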

> 
> So I thought, before I put more time into this, let's ask whether the
> test author has any ideas. :-)

Fam


* Re: [Qemu-devel] Intermittent hang of iotest 194 (bdrv_drain_all after non-shared storage migration)
From: Max Reitz @ 2017-11-09 15:14 UTC (permalink / raw)
  To: Fam Zheng; +Cc: Qemu-block, qemu-devel, Stefan Hajnoczi


On 2017-11-09 05:21, Fam Zheng wrote:
> On Thu, 11/09 01:48, Max Reitz wrote:
>> Hi,
>>
>> More exciting news from the bdrv_drain() front!
>>
>> I've noticed in the past that iotest 194 sometimes hangs.  I usually run
>> the tests on tmpfs, but I've just now verified that it happens on my SSD
>> just as well.
>>
>> So the reproducer is a plain:
>>
>> while ./check -raw 194; do; done
> 
> I cannot produce it on my machine.

Hm, too bad.  I see it both on my work laptop (with Fedora) and my
desktop (with Arch)...

>> (No difference between raw or qcow2, though.)
>>
>> And then, after a couple of runs (or a couple ten), it will just hang.
>> The reason is that the source VM lingers around and doesn't quit
>> voluntarily -- the test itself was successful, but it just can't exit.
>>
>> If you force it to exit by killing the VM (e.g. through pkill -11 qemu),
>> this is the backtrace:
>>
>> #0  0x00007f7cfc297e06 in ppoll () at /lib64/libc.so.6
>> #1  0x0000563b846bcac9 in ppoll (__ss=0x0, __timeout=0x0,
>> __nfds=<optimized out>, __fds=<optimized out>) at
>> /usr/include/bits/poll2.h:77
> 
> Looking at the 0 timeout it seems we are in the aio_poll(ctx, blocking=false);
> branches of BDRV_POLL_WHILE? Is it a busy loop? If so I wonder what is making
> progress and causing the return value to be true in aio_poll().
> 
>> #2  0x0000563b846bcac9 in qemu_poll_ns (fds=<optimized out>,
>> nfds=<optimized out>, timeout=<optimized out>) at util/qemu-timer.c:322
>> #3  0x0000563b846be711 in aio_poll (ctx=ctx@entry=0x563b856e3e80,
>> blocking=<optimized out>) at util/aio-posix.c:629
>> #4  0x0000563b8463afa4 in bdrv_drain_recurse
>> (bs=bs@entry=0x563b865568a0, begin=begin@entry=true) at block/io.c:201
>> #5  0x0000563b8463baff in bdrv_drain_all_begin () at block/io.c:381
>> #6  0x0000563b8463bc99 in bdrv_drain_all () at block/io.c:411
>> #7  0x0000563b8459888b in block_migration_cleanup (opaque=<optimized
>> out>) at migration/block.c:714
>> #8  0x0000563b845883be in qemu_savevm_state_cleanup () at
>> migration/savevm.c:1251
>> #9  0x0000563b845811fd in migration_thread (opaque=0x563b856f1da0) at
>> migration/migration.c:2298
>> #10 0x00007f7cfc56f36d in start_thread () at /lib64/libpthread.so.0
>> #11 0x00007f7cfc2a3e1f in clone () at /lib64/libc.so.6
>>
>>
>> And when you make bdrv_drain_all_begin() print what we are trying to
>> drain, you can see that it's the format node (managed by the "raw"
>> driver in this case).
> 
> So what is the value of bs->in_flight?

gdb:
> (gdb) print bs->in_flight 
> $3 = 2307492233

"That's weird, why would it..."

> (gdb) print *bs
> $4 = {open_flags = -1202160144, read_only = 161, encrypted = 85, sg = false, probed = false, force_share = 96, implicit = 159, drv = 0x0, opaque = 0x0, aio_context = 0x8989898989898989, aio_notifiers = {lh_first = 0x8989898989898989}, 
>   walking_aio_notifiers = 137, filename = '\211' <repeats 4096 times>, backing_file = '\211' <repeats 4096 times>, backing_format = '\211' <repeats 16 times>, full_open_options = 0x8989898989898989, 
>   exact_filename = '\211' <repeats 4096 times>, backing = 0x8989898989898989, file = 0x8989898989898989, bl = {request_alignment = 2307492233, max_pdiscard = -1987475063, pdiscard_alignment = 2307492233, 
>     max_pwrite_zeroes = -1987475063, pwrite_zeroes_alignment = 2307492233, opt_transfer = 2307492233, max_transfer = 2307492233, min_mem_alignment = 9910603678816504201, opt_mem_alignment = 9910603678816504201, max_iov = -1987475063}, 
>   supported_write_flags = 2307492233, supported_zero_flags = 2307492233, node_name = '\211' <repeats 32 times>, node_list = {tqe_next = 0x8989898989898989, tqe_prev = 0x8989898989898989}, bs_list = {tqe_next = 0x8989898989898989, 
>     tqe_prev = 0x8989898989898989}, monitor_list = {tqe_next = 0x8989898989898989, tqe_prev = 0x8989898989898989}, refcnt = -1987475063, op_blockers = {{lh_first = 0x8989898989898989} <repeats 16 times>}, job = 0x8989898989898989, 
>   inherits_from = 0x8989898989898989, children = {lh_first = 0x8989898989898989}, parents = {lh_first = 0x8989898989898989}, options = 0x8989898989898989, explicit_options = 0x8989898989898989, detect_zeroes = 2307492233, 
>   backing_blocker = 0x8989898989898989, total_sectors = -8536140394893047415, before_write_notifiers = {notifiers = {lh_first = 0x8989898989898989}}, write_threshold_offset = 9910603678816504201, write_threshold_notifier = {notify = 
>     0x8989898989898989, node = {le_next = 0x8989898989898989, le_prev = 0x8989898989898989}}, dirty_bitmap_mutex = {lock = {__data = {__lock = -1987475063, __count = 2307492233, __owner = -1987475063, __nusers = 2307492233, 
>         __kind = -1987475063, __spins = -30327, __elision = -30327, __list = {__prev = 0x8989898989898989, __next = 0x8989898989898989}}, __size = '\211' <repeats 40 times>, __align = -8536140394893047415}, initialized = 137}, 
>   dirty_bitmaps = {lh_first = 0x8989898989898989}, wr_highest_offset = {value = 9910603678816504201}, copy_on_read = -1987475063, in_flight = 2307492233, serialising_in_flight = 2307492233, wakeup = 137, io_plugged = 2307492233, 
>   enable_write_cache = -1987475063, quiesce_counter = -1987475063, write_gen = 2307492233, reqs_lock = {locked = 2307492233, ctx = 0x8989898989898989, from_push = {slh_first = 0x8989898989898989}, to_pop = {
>       slh_first = 0x8989898989898989}, handoff = 2307492233, sequence = 2307492233, holder = 0x8989898989898989}, tracked_requests = {lh_first = 0x8989898989898989}, flush_queue = {entries = {sqh_first = 0x8989898989898989, 
>       sqh_last = 0x8989898989898989}}, active_flush_req = 160, flushed_gen = 0}
"Oh."

*sigh*

OK, I'll look into it...

Max

>> So I thought, before I put more time into this, let's ask whether the
>> test author has any ideas. :-)
> 
> Fam
> 




* Re: [Qemu-devel] Intermittent hang of iotest 194 (bdrv_drain_all after non-shared storage migration)
From: Fam Zheng @ 2017-11-09 15:30 UTC (permalink / raw)
  To: Max Reitz; +Cc: Qemu-block, qemu-devel, Stefan Hajnoczi

On Thu, 11/09 16:14, Max Reitz wrote:
> On 2017-11-09 05:21, Fam Zheng wrote:
> > On Thu, 11/09 01:48, Max Reitz wrote:
> >> Hi,
> >>
> >> More exciting news from the bdrv_drain() front!
> >>
> >> I've noticed in the past that iotest 194 sometimes hangs.  I usually run
> >> the tests on tmpfs, but I've just now verified that it happens on my SSD
> >> just as well.
> >>
> >> So the reproducer is a plain:
> >>
> >> while ./check -raw 194; do; done
> > 
> > I cannot produce it on my machine.
> 
> Hm, too bad.  I see it both on my work laptop (with Fedora) and my
> desktop (with Arch)...
> 
> >> (No difference between raw or qcow2, though.)
> >>
> >> And then, after a couple of runs (or a couple ten), it will just hang.
> >> The reason is that the source VM lingers around and doesn't quit
> >> voluntarily -- the test itself was successful, but it just can't exit.
> >>
> >> If you force it to exit by killing the VM (e.g. through pkill -11 qemu),
> >> this is the backtrace:
> >>
> >> #0  0x00007f7cfc297e06 in ppoll () at /lib64/libc.so.6
> >> #1  0x0000563b846bcac9 in ppoll (__ss=0x0, __timeout=0x0,
> >> __nfds=<optimized out>, __fds=<optimized out>) at
> >> /usr/include/bits/poll2.h:77
> > 
> > Looking at the 0 timeout it seems we are in the aio_poll(ctx, blocking=false);
> > branches of BDRV_POLL_WHILE? Is it a busy loop? If so I wonder what is making
> > progress and causing the return value to be true in aio_poll().
> > 
> >> #2  0x0000563b846bcac9 in qemu_poll_ns (fds=<optimized out>,
> >> nfds=<optimized out>, timeout=<optimized out>) at util/qemu-timer.c:322
> >> #3  0x0000563b846be711 in aio_poll (ctx=ctx@entry=0x563b856e3e80,
> >> blocking=<optimized out>) at util/aio-posix.c:629
> >> #4  0x0000563b8463afa4 in bdrv_drain_recurse
> >> (bs=bs@entry=0x563b865568a0, begin=begin@entry=true) at block/io.c:201
> >> #5  0x0000563b8463baff in bdrv_drain_all_begin () at block/io.c:381
> >> #6  0x0000563b8463bc99 in bdrv_drain_all () at block/io.c:411
> >> #7  0x0000563b8459888b in block_migration_cleanup (opaque=<optimized
> >> out>) at migration/block.c:714
> >> #8  0x0000563b845883be in qemu_savevm_state_cleanup () at
> >> migration/savevm.c:1251
> >> #9  0x0000563b845811fd in migration_thread (opaque=0x563b856f1da0) at
> >> migration/migration.c:2298
> >> #10 0x00007f7cfc56f36d in start_thread () at /lib64/libpthread.so.0
> >> #11 0x00007f7cfc2a3e1f in clone () at /lib64/libc.so.6
> >>
> >>
> >> And when you make bdrv_drain_all_begin() print what we are trying to
> >> drain, you can see that it's the format node (managed by the "raw"
> >> driver in this case).
> > 
> > So what is the value of bs->in_flight?
> 
> gdb:
> > (gdb) print bs->in_flight 
> > $3 = 2307492233
> 
> "That's weird, why would it..."
> 
> > (gdb) print *bs
> > $4 = {open_flags = -1202160144, read_only = 161, encrypted = 85, sg = false, probed = false, force_share = 96, implicit = 159, drv = 0x0, opaque = 0x0, aio_context = 0x8989898989898989, aio_notifiers = {lh_first = 0x8989898989898989}, 
> >   walking_aio_notifiers = 137, filename = '\211' <repeats 4096 times>, backing_file = '\211' <repeats 4096 times>, backing_format = '\211' <repeats 16 times>, full_open_options = 0x8989898989898989, 
> >   exact_filename = '\211' <repeats 4096 times>, backing = 0x8989898989898989, file = 0x8989898989898989, bl = {request_alignment = 2307492233, max_pdiscard = -1987475063, pdiscard_alignment = 2307492233, 
> >     max_pwrite_zeroes = -1987475063, pwrite_zeroes_alignment = 2307492233, opt_transfer = 2307492233, max_transfer = 2307492233, min_mem_alignment = 9910603678816504201, opt_mem_alignment = 9910603678816504201, max_iov = -1987475063}, 
> >   supported_write_flags = 2307492233, supported_zero_flags = 2307492233, node_name = '\211' <repeats 32 times>, node_list = {tqe_next = 0x8989898989898989, tqe_prev = 0x8989898989898989}, bs_list = {tqe_next = 0x8989898989898989, 
> >     tqe_prev = 0x8989898989898989}, monitor_list = {tqe_next = 0x8989898989898989, tqe_prev = 0x8989898989898989}, refcnt = -1987475063, op_blockers = {{lh_first = 0x8989898989898989} <repeats 16 times>}, job = 0x8989898989898989, 
> >   inherits_from = 0x8989898989898989, children = {lh_first = 0x8989898989898989}, parents = {lh_first = 0x8989898989898989}, options = 0x8989898989898989, explicit_options = 0x8989898989898989, detect_zeroes = 2307492233, 
> >   backing_blocker = 0x8989898989898989, total_sectors = -8536140394893047415, before_write_notifiers = {notifiers = {lh_first = 0x8989898989898989}}, write_threshold_offset = 9910603678816504201, write_threshold_notifier = {notify = 
> >     0x8989898989898989, node = {le_next = 0x8989898989898989, le_prev = 0x8989898989898989}}, dirty_bitmap_mutex = {lock = {__data = {__lock = -1987475063, __count = 2307492233, __owner = -1987475063, __nusers = 2307492233, 
> >         __kind = -1987475063, __spins = -30327, __elision = -30327, __list = {__prev = 0x8989898989898989, __next = 0x8989898989898989}}, __size = '\211' <repeats 40 times>, __align = -8536140394893047415}, initialized = 137}, 
> >   dirty_bitmaps = {lh_first = 0x8989898989898989}, wr_highest_offset = {value = 9910603678816504201}, copy_on_read = -1987475063, in_flight = 2307492233, serialising_in_flight = 2307492233, wakeup = 137, io_plugged = 2307492233, 
> >   enable_write_cache = -1987475063, quiesce_counter = -1987475063, write_gen = 2307492233, reqs_lock = {locked = 2307492233, ctx = 0x8989898989898989, from_push = {slh_first = 0x8989898989898989}, to_pop = {
> >       slh_first = 0x8989898989898989}, handoff = 2307492233, sequence = 2307492233, holder = 0x8989898989898989}, tracked_requests = {lh_first = 0x8989898989898989}, flush_queue = {entries = {sqh_first = 0x8989898989898989, 
> >       sqh_last = 0x8989898989898989}}, active_flush_req = 160, flushed_gen = 0}
> "Oh."
> 
> *sigh*
> 
> OK, I'll look into it...

OK, I'll leave it to you. Just one more thing: could it be related to
the use-after-free bug reported for block_job_defer_to_main_loop()?

https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg01144.html

Fam


* Re: [Qemu-devel] Intermittent hang of iotest 194 (bdrv_drain_all after non-shared storage migration)
From: Max Reitz @ 2017-11-09 19:31 UTC (permalink / raw)
  To: Fam Zheng; +Cc: Qemu-block, qemu-devel, Stefan Hajnoczi


On 2017-11-09 16:30, Fam Zheng wrote:
> On Thu, 11/09 16:14, Max Reitz wrote:
>> On 2017-11-09 05:21, Fam Zheng wrote:
>>> On Thu, 11/09 01:48, Max Reitz wrote:
>>>> Hi,
>>>>
>>>> More exciting news from the bdrv_drain() front!
>>>>
>>>> I've noticed in the past that iotest 194 sometimes hangs.  I usually run
>>>> the tests on tmpfs, but I've just now verified that it happens on my SSD
>>>> just as well.
>>>>
>>>> So the reproducer is a plain:
>>>>
>>>> while ./check -raw 194; do; done
>>>
>>> I cannot produce it on my machine.
>>
>> Hm, too bad.  I see it both on my work laptop (with Fedora) and my
>> desktop (with Arch)...
>>
>>>> (No difference between raw or qcow2, though.)
>>>>
>>>> And then, after a couple of runs (or a couple ten), it will just hang.
>>>> The reason is that the source VM lingers around and doesn't quit
>>>> voluntarily -- the test itself was successful, but it just can't exit.
>>>>
>>>> If you force it to exit by killing the VM (e.g. through pkill -11 qemu),
>>>> this is the backtrace:
>>>>
>>>> #0  0x00007f7cfc297e06 in ppoll () at /lib64/libc.so.6
>>>> #1  0x0000563b846bcac9 in ppoll (__ss=0x0, __timeout=0x0,
>>>> __nfds=<optimized out>, __fds=<optimized out>) at
>>>> /usr/include/bits/poll2.h:77
>>>
>>> Looking at the 0 timeout it seems we are in the aio_poll(ctx, blocking=false);
>>> branches of BDRV_POLL_WHILE? Is it a busy loop? If so I wonder what is making
>>> progress and causing the return value to be true in aio_poll().
>>>
>>>> #2  0x0000563b846bcac9 in qemu_poll_ns (fds=<optimized out>,
>>>> nfds=<optimized out>, timeout=<optimized out>) at util/qemu-timer.c:322
>>>> #3  0x0000563b846be711 in aio_poll (ctx=ctx@entry=0x563b856e3e80,
>>>> blocking=<optimized out>) at util/aio-posix.c:629
>>>> #4  0x0000563b8463afa4 in bdrv_drain_recurse
>>>> (bs=bs@entry=0x563b865568a0, begin=begin@entry=true) at block/io.c:201
>>>> #5  0x0000563b8463baff in bdrv_drain_all_begin () at block/io.c:381
>>>> #6  0x0000563b8463bc99 in bdrv_drain_all () at block/io.c:411
>>>> #7  0x0000563b8459888b in block_migration_cleanup (opaque=<optimized
>>>> out>) at migration/block.c:714
>>>> #8  0x0000563b845883be in qemu_savevm_state_cleanup () at
>>>> migration/savevm.c:1251
>>>> #9  0x0000563b845811fd in migration_thread (opaque=0x563b856f1da0) at
>>>> migration/migration.c:2298
>>>> #10 0x00007f7cfc56f36d in start_thread () at /lib64/libpthread.so.0
>>>> #11 0x00007f7cfc2a3e1f in clone () at /lib64/libc.so.6
>>>>
>>>>
>>>> And when you make bdrv_drain_all_begin() print what we are trying to
>>>> drain, you can see that it's the format node (managed by the "raw"
>>>> driver in this case).
>>>
>>> So what is the value of bs->in_flight?
>>
>> gdb:
>>> (gdb) print bs->in_flight 
>>> $3 = 2307492233
>>
>> "That's weird, why would it..."
>>
>>> (gdb) print *bs
>>> $4 = {open_flags = -1202160144, read_only = 161, encrypted = 85, sg = false, probed = false, force_share = 96, implicit = 159, drv = 0x0, opaque = 0x0, aio_context = 0x8989898989898989, aio_notifiers = {lh_first = 0x8989898989898989}, 
>>>   walking_aio_notifiers = 137, filename = '\211' <repeats 4096 times>, backing_file = '\211' <repeats 4096 times>, backing_format = '\211' <repeats 16 times>, full_open_options = 0x8989898989898989, 
>>>   exact_filename = '\211' <repeats 4096 times>, backing = 0x8989898989898989, file = 0x8989898989898989, bl = {request_alignment = 2307492233, max_pdiscard = -1987475063, pdiscard_alignment = 2307492233, 
>>>     max_pwrite_zeroes = -1987475063, pwrite_zeroes_alignment = 2307492233, opt_transfer = 2307492233, max_transfer = 2307492233, min_mem_alignment = 9910603678816504201, opt_mem_alignment = 9910603678816504201, max_iov = -1987475063}, 
>>>   supported_write_flags = 2307492233, supported_zero_flags = 2307492233, node_name = '\211' <repeats 32 times>, node_list = {tqe_next = 0x8989898989898989, tqe_prev = 0x8989898989898989}, bs_list = {tqe_next = 0x8989898989898989, 
>>>     tqe_prev = 0x8989898989898989}, monitor_list = {tqe_next = 0x8989898989898989, tqe_prev = 0x8989898989898989}, refcnt = -1987475063, op_blockers = {{lh_first = 0x8989898989898989} <repeats 16 times>}, job = 0x8989898989898989, 
>>>   inherits_from = 0x8989898989898989, children = {lh_first = 0x8989898989898989}, parents = {lh_first = 0x8989898989898989}, options = 0x8989898989898989, explicit_options = 0x8989898989898989, detect_zeroes = 2307492233, 
>>>   backing_blocker = 0x8989898989898989, total_sectors = -8536140394893047415, before_write_notifiers = {notifiers = {lh_first = 0x8989898989898989}}, write_threshold_offset = 9910603678816504201, write_threshold_notifier = {notify = 
>>>     0x8989898989898989, node = {le_next = 0x8989898989898989, le_prev = 0x8989898989898989}}, dirty_bitmap_mutex = {lock = {__data = {__lock = -1987475063, __count = 2307492233, __owner = -1987475063, __nusers = 2307492233, 
>>>         __kind = -1987475063, __spins = -30327, __elision = -30327, __list = {__prev = 0x8989898989898989, __next = 0x8989898989898989}}, __size = '\211' <repeats 40 times>, __align = -8536140394893047415}, initialized = 137}, 
>>>   dirty_bitmaps = {lh_first = 0x8989898989898989}, wr_highest_offset = {value = 9910603678816504201}, copy_on_read = -1987475063, in_flight = 2307492233, serialising_in_flight = 2307492233, wakeup = 137, io_plugged = 2307492233, 
>>>   enable_write_cache = -1987475063, quiesce_counter = -1987475063, write_gen = 2307492233, reqs_lock = {locked = 2307492233, ctx = 0x8989898989898989, from_push = {slh_first = 0x8989898989898989}, to_pop = {
>>>       slh_first = 0x8989898989898989}, handoff = 2307492233, sequence = 2307492233, holder = 0x8989898989898989}, tracked_requests = {lh_first = 0x8989898989898989}, flush_queue = {entries = {sqh_first = 0x8989898989898989, 
>>>       sqh_last = 0x8989898989898989}}, active_flush_req = 160, flushed_gen = 0}
>> "Oh."
>>
>> *sigh*
>>
>> OK, I'll look into it...
> 
> OK, I'll let you.. Just one more thing: could it relate to the use-after-free
> bug reported on block_job_defer_to_main_loop()?
> 
> https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg01144.html

Thanks for the heads-up; I think it's a different issue, though.

What appears to be happening is that the mirror job completes and then
drains its BDS.  While that is happening, a bdrv_drain_all() comes in
from block_migration_cleanup().

That now tries to drain the mirror node.  However, that node cannot be
drained until the job is truly gone, so that is what happens next:
mirror_exit() is called, cleans up, destroys the mirror node, and
returns.

Now bdrv_drain_all() can go on, specifically the BDRV_POLL_WHILE() on
the mirror node.  However, oops, that node is gone now...  So that's
where the issue seems to be. :-/

Maybe all that we need to do is wrap the bdrv_drain_recurse() call in
bdrv_drain_all_begin() in a bdrv_ref()/bdrv_unref() pair?  Having run
194 for a couple of minutes, that does indeed seem to work -- until it
dies because of an invalid BB pointer in bdrv_next().  I guess that is
because bdrv_next() does not guard against deleted BDSs.
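
(That attempt amounts to something like this, for reference:

    /* in the drain loop of bdrv_drain_all_begin() */
    bdrv_ref(bs);
    bdrv_drain_recurse(bs, true);
    bdrv_unref(bs);

-- enough to keep the node alive across its own drain, but not enough
for bdrv_next() afterwards.)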

Copying all BDSs into a separate list (in both bdrv_drain_all_begin()
and bdrv_drain_all_end()), taking a strong reference to every single
one, and then draining them really does seem to work, though.
(Survived 9000 iterations, which seems good enough for something that
usually fails after, like, 5.)
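
Roughly along these lines -- a heavily simplified, untested sketch of
the idea, not an actual patch; everything else the real function does
(AioContext handling etc.) is omitted:

    void bdrv_drain_all_begin(void)
    {
        GSList *bs_list = NULL, *elem;
        BlockDriverState *bs;
        BdrvNextIterator it;

        /* Collect all nodes up front and take a strong reference to
         * each, so that none of them can be deleted while we drain */
        for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
            bdrv_ref(bs);
            bs_list = g_slist_prepend(bs_list, bs);
        }

        for (elem = bs_list; elem; elem = elem->next) {
            AioContext *ctx;

            bs = elem->data;
            ctx = bdrv_get_aio_context(bs);
            aio_context_acquire(ctx);
            bdrv_drain_recurse(bs, true);
            aio_context_release(ctx);
        }

        /* Drop the references again; bdrv_drain_all_end() builds its
         * own list the same way */
        g_slist_free_full(bs_list, (GDestroyNotify)bdrv_unref);
    }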

Max



* Re: [Qemu-devel] Intermittent hang of iotest 194 (bdrv_drain_all after non-shared storage migration)
From: Fam Zheng @ 2017-11-10  2:36 UTC (permalink / raw)
  To: Max Reitz; +Cc: Qemu-block, qemu-devel, Stefan Hajnoczi

On Thu, 11/09 20:31, Max Reitz wrote:
> On 2017-11-09 16:30, Fam Zheng wrote:
> > On Thu, 11/09 16:14, Max Reitz wrote:
> >> On 2017-11-09 05:21, Fam Zheng wrote:
> >>> On Thu, 11/09 01:48, Max Reitz wrote:
> >>>> Hi,
> >>>>
> >>>> More exciting news from the bdrv_drain() front!
> >>>>
> >>>> I've noticed in the past that iotest 194 sometimes hangs.  I usually run
> >>>> the tests on tmpfs, but I've just now verified that it happens on my SSD
> >>>> just as well.
> >>>>
> >>>> So the reproducer is a plain:
> >>>>
> >>>> while ./check -raw 194; do; done
> >>>
> >>> I cannot produce it on my machine.
> >>
> >> Hm, too bad.  I see it both on my work laptop (with Fedora) and my
> >> desktop (with Arch)...
> >>
> >>>> (No difference between raw or qcow2, though.)
> >>>>
> >>>> And then, after a couple of runs (or a couple ten), it will just hang.
> >>>> The reason is that the source VM lingers around and doesn't quit
> >>>> voluntarily -- the test itself was successful, but it just can't exit.
> >>>>
> >>>> If you force it to exit by killing the VM (e.g. through pkill -11 qemu),
> >>>> this is the backtrace:
> >>>>
> >>>> #0  0x00007f7cfc297e06 in ppoll () at /lib64/libc.so.6
> >>>> #1  0x0000563b846bcac9 in ppoll (__ss=0x0, __timeout=0x0,
> >>>> __nfds=<optimized out>, __fds=<optimized out>) at
> >>>> /usr/include/bits/poll2.h:77
> >>>
> >>> Looking at the 0 timeout it seems we are in the aio_poll(ctx, blocking=false);
> >>> branches of BDRV_POLL_WHILE? Is it a busy loop? If so I wonder what is making
> >>> progress and causing the return value to be true in aio_poll().
> >>>
> >>>> #2  0x0000563b846bcac9 in qemu_poll_ns (fds=<optimized out>,
> >>>> nfds=<optimized out>, timeout=<optimized out>) at util/qemu-timer.c:322
> >>>> #3  0x0000563b846be711 in aio_poll (ctx=ctx@entry=0x563b856e3e80,
> >>>> blocking=<optimized out>) at util/aio-posix.c:629
> >>>> #4  0x0000563b8463afa4 in bdrv_drain_recurse
> >>>> (bs=bs@entry=0x563b865568a0, begin=begin@entry=true) at block/io.c:201
> >>>> #5  0x0000563b8463baff in bdrv_drain_all_begin () at block/io.c:381
> >>>> #6  0x0000563b8463bc99 in bdrv_drain_all () at block/io.c:411
> >>>> #7  0x0000563b8459888b in block_migration_cleanup (opaque=<optimized
> >>>> out>) at migration/block.c:714
> >>>> #8  0x0000563b845883be in qemu_savevm_state_cleanup () at
> >>>> migration/savevm.c:1251
> >>>> #9  0x0000563b845811fd in migration_thread (opaque=0x563b856f1da0) at
> >>>> migration/migration.c:2298
> >>>> #10 0x00007f7cfc56f36d in start_thread () at /lib64/libpthread.so.0
> >>>> #11 0x00007f7cfc2a3e1f in clone () at /lib64/libc.so.6
> >>>>
> >>>>
> >>>> And when you make bdrv_drain_all_begin() print what we are trying to
> >>>> drain, you can see that it's the format node (managed by the "raw"
> >>>> driver in this case).
> >>>
> >>> So what is the value of bs->in_flight?
> >>
> >> gdb:
> >>> (gdb) print bs->in_flight 
> >>> $3 = 2307492233
> >>
> >> "That's weird, why would it..."
> >>
> >>> (gdb) print *bs
> >>> $4 = {open_flags = -1202160144, read_only = 161, encrypted = 85, sg = false, probed = false, force_share = 96, implicit = 159, drv = 0x0, opaque = 0x0, aio_context = 0x8989898989898989, aio_notifiers = {lh_first = 0x8989898989898989}, 
> >>>   walking_aio_notifiers = 137, filename = '\211' <repeats 4096 times>, backing_file = '\211' <repeats 4096 times>, backing_format = '\211' <repeats 16 times>, full_open_options = 0x8989898989898989, 
> >>>   exact_filename = '\211' <repeats 4096 times>, backing = 0x8989898989898989, file = 0x8989898989898989, bl = {request_alignment = 2307492233, max_pdiscard = -1987475063, pdiscard_alignment = 2307492233, 
> >>>     max_pwrite_zeroes = -1987475063, pwrite_zeroes_alignment = 2307492233, opt_transfer = 2307492233, max_transfer = 2307492233, min_mem_alignment = 9910603678816504201, opt_mem_alignment = 9910603678816504201, max_iov = -1987475063}, 
> >>>   supported_write_flags = 2307492233, supported_zero_flags = 2307492233, node_name = '\211' <repeats 32 times>, node_list = {tqe_next = 0x8989898989898989, tqe_prev = 0x8989898989898989}, bs_list = {tqe_next = 0x8989898989898989, 
> >>>     tqe_prev = 0x8989898989898989}, monitor_list = {tqe_next = 0x8989898989898989, tqe_prev = 0x8989898989898989}, refcnt = -1987475063, op_blockers = {{lh_first = 0x8989898989898989} <repeats 16 times>}, job = 0x8989898989898989, 
> >>>   inherits_from = 0x8989898989898989, children = {lh_first = 0x8989898989898989}, parents = {lh_first = 0x8989898989898989}, options = 0x8989898989898989, explicit_options = 0x8989898989898989, detect_zeroes = 2307492233, 
> >>>   backing_blocker = 0x8989898989898989, total_sectors = -8536140394893047415, before_write_notifiers = {notifiers = {lh_first = 0x8989898989898989}}, write_threshold_offset = 9910603678816504201, write_threshold_notifier = {notify = 
> >>>     0x8989898989898989, node = {le_next = 0x8989898989898989, le_prev = 0x8989898989898989}}, dirty_bitmap_mutex = {lock = {__data = {__lock = -1987475063, __count = 2307492233, __owner = -1987475063, __nusers = 2307492233, 
> >>>         __kind = -1987475063, __spins = -30327, __elision = -30327, __list = {__prev = 0x8989898989898989, __next = 0x8989898989898989}}, __size = '\211' <repeats 40 times>, __align = -8536140394893047415}, initialized = 137}, 
> >>>   dirty_bitmaps = {lh_first = 0x8989898989898989}, wr_highest_offset = {value = 9910603678816504201}, copy_on_read = -1987475063, in_flight = 2307492233, serialising_in_flight = 2307492233, wakeup = 137, io_plugged = 2307492233, 
> >>>   enable_write_cache = -1987475063, quiesce_counter = -1987475063, write_gen = 2307492233, reqs_lock = {locked = 2307492233, ctx = 0x8989898989898989, from_push = {slh_first = 0x8989898989898989}, to_pop = {
> >>>       slh_first = 0x8989898989898989}, handoff = 2307492233, sequence = 2307492233, holder = 0x8989898989898989}, tracked_requests = {lh_first = 0x8989898989898989}, flush_queue = {entries = {sqh_first = 0x8989898989898989, 
> >>>       sqh_last = 0x8989898989898989}}, active_flush_req = 160, flushed_gen = 0}
> >> "Oh."
> >>
> >> *sigh*
> >>
> >> OK, I'll look into it...
> > 
> > OK, I'll let you.. Just one more thing: could it relate to the use-after-free
> > bug reported on block_job_defer_to_main_loop()?
> > 
> > https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg01144.html
> 
> Thanks for the heads-up; I think it's a different issue, though.
> 
> What appears to be happening is that the mirror job completes and then
> drains its BDS.  While that is happening, a bdrv_drain_all() comes in
> from block_migration_cleanup().
> 
> That now tries to drain the mirror node.  However, that node cannot be
> drained until the job is truly gone now, so that is what's happening:
> mirror_exit() is called, it cleans up, destroys the mirror node, and
> returns.
> 
> Now bdrv_drain_all() can go on, specifically the BDRV_POLL_WHILE() on
> the mirror node.  However, oops, that node is gone now...  So that's
> where the issue seems to be. :-/
> 
> Maybe all that we need to do is wrap the bdrv_drain_recurse() call in
> bdrv_drain_all_begin() in a bdrv_ref()/bdrv_unref() pair?  Having run
> 194 for a couple of minutes, that seems to indeed work -- until it dies
> because of an invalid BB pointer in bdrv_next().  I guess that is
> because bdrv_next() does not guard against deleted BDSs.
> 
> Copying all BDS into an own list (in both bdrv_drain_all_begin() and
> bdrv_drain_all_end()), with a strong reference to every single one, and
> then draining them really seems to work, though.  (Survived 9000
> iterations, that seems good enough for something that usually fails
> after, like, 5.)

Yes, that makes sense. I'm curious if the patch in

https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg01649.html

would also work?

Fam


* Re: [Qemu-devel] Intermittent hang of iotest 194 (bdrv_drain_all after non-shared storage migration)
From: Max Reitz @ 2017-11-10 17:48 UTC (permalink / raw)
  To: Fam Zheng; +Cc: Qemu-block, qemu-devel, Stefan Hajnoczi


On 2017-11-10 03:36, Fam Zheng wrote:
> On Thu, 11/09 20:31, Max Reitz wrote:
>> On 2017-11-09 16:30, Fam Zheng wrote:
>>> On Thu, 11/09 16:14, Max Reitz wrote:

[...]

>>>> *sigh*
>>>>
>>>> OK, I'll look into it...
>>>
>>> OK, I'll let you.. Just one more thing: could it relate to the use-after-free
>>> bug reported on block_job_defer_to_main_loop()?
>>>
>>> https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg01144.html
>>
>> Thanks for the heads-up; I think it's a different issue, though.
>>
>> What appears to be happening is that the mirror job completes and then
>> drains its BDS.  While that is happening, a bdrv_drain_all() comes in
>> from block_migration_cleanup().
>>
>> That now tries to drain the mirror node.  However, that node cannot be
>> drained until the job is truly gone now, so that is what's happening:
>> mirror_exit() is called, it cleans up, destroys the mirror node, and
>> returns.
>>
>> Now bdrv_drain_all() can go on, specifically the BDRV_POLL_WHILE() on
>> the mirror node.  However, oops, that node is gone now...  So that's
>> where the issue seems to be. :-/
>>
>> Maybe all that we need to do is wrap the bdrv_drain_recurse() call in
>> bdrv_drain_all_begin() in a bdrv_ref()/bdrv_unref() pair?  Having run
>> 194 for a couple of minutes, that seems to indeed work -- until it dies
>> because of an invalid BB pointer in bdrv_next().  I guess that is
>> because bdrv_next() does not guard against deleted BDSs.
>>
>> Copying all BDS into an own list (in both bdrv_drain_all_begin() and
>> bdrv_drain_all_end()), with a strong reference to every single one, and
>> then draining them really seems to work, though.  (Survived 9000
>> iterations, that seems good enough for something that usually fails
>> after, like, 5.)
> 
> Yes, that makes sense. I'm curious if the patch in
> 
> https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg01649.html
> 
> would also work?

No, unfortunately it did not.

(Or maybe fortunately so, since that means I didn't do a whole lot of
work for nothing :-))

Max


