* Qemu-KVM 0.12.3 and Multipath -> Assertion
From: Peter Lieven @ 2010-05-03 21:26 UTC
  To: kvm, qemu-devel

Hi Qemu/KVM Devel Team,

I'm using qemu-kvm 0.12.3 with the latest kernel, 2.6.33.3.
As the storage backend we use open-iSCSI with dm-multipath.

Multipath is configured to queue I/O if no path is available.
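
For reference, this queuing behaviour corresponds to a multipath.conf
fragment roughly like the following - a sketch only, the exact option
spelling in our production config may differ:

defaults {
        # queue I/O indefinitely instead of failing it back to the
        # initiator when all paths are down
        features "1 queue_if_no_path"
        # newer multipath-tools can express the same thing as:
        # no_path_retry queue
}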

If we fail all paths, qemu starts to consume 100% CPU due to
I/O waits, which is OK so far.

One odd thing: the monitor interface is no longer responding ...

What is a real blocker is that KVM crashes with:
kvm: /usr/src/qemu-kvm-0.12.3/hw/ide/internal.h:507: bmdma_active_if: 
Assertion `bmdma->unit != (uint8_t)-1' failed.

after multipath has reestablished at least one path.
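
For context, the failing helper in hw/ide/internal.h looks roughly like
this in 0.12.x (quoted from memory, so the exact field names are not
guaranteed):

static inline IDEState *bmdma_active_if(BMDMAState *bmdma)
{
    /* unit is set to -1 while no DMA request is active, so this
     * fires when a completion callback runs after the request has
     * already been cancelled */
    assert(bmdma->unit != (uint8_t)-1);
    return bmdma->bus->ifs + bmdma->unit;
}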

Any ideas? I remember this was working with earlier kernel/kvm/qemu 
versions.

Thanks,
Peter





* Re: Qemu-KVM 0.12.3 and Multipath -> Assertion
From: André Weidemann @ 2010-05-04 5:38 UTC
  To: Peter Lieven; +Cc: kvm, qemu-devel

Hi Peter,
On 03.05.2010 23:26, Peter Lieven wrote:

> Hi Qemu/KVM Devel Team,
>
> I'm using qemu-kvm 0.12.3 with the latest kernel, 2.6.33.3.
> As the storage backend we use open-iSCSI with dm-multipath.
>
> Multipath is configured to queue I/O if no path is available.
>
> If we fail all paths, qemu starts to consume 100% CPU due to
> I/O waits, which is OK so far.
>
> One odd thing: the monitor interface is no longer responding ...
>
> What is a real blocker is that KVM crashes with:
> kvm: /usr/src/qemu-kvm-0.12.3/hw/ide/internal.h:507: bmdma_active_if:
> Assertion `bmdma->unit != (uint8_t)-1' failed.
>
> after multipath has reestablished at least one path.
>
> Any ideas? I remember this was working with earlier kernel/kvm/qemu
> versions.


I have the same issue on my machine, although I am using local storage
(LVM or a physical disk) to write my data to.
I reported the "Assertion failed" error on March 17th to the list.
Marcelo and Avi asked some questions back then, but I don't know
whether they have come up with a fix for it.

Regards
  André



* Re: [Qemu-devel] Qemu-KVM 0.12.3 and Multipath -> Assertion
From: Kevin Wolf @ 2010-05-04 8:35 UTC
  To: Peter Lieven; +Cc: kvm, qemu-devel

On 03.05.2010 23:26, Peter Lieven wrote:
> Hi Qemu/KVM Devel Team,
>
> I'm using qemu-kvm 0.12.3 with the latest kernel, 2.6.33.3.
> As the storage backend we use open-iSCSI with dm-multipath.
>
> Multipath is configured to queue I/O if no path is available.
>
> If we fail all paths, qemu starts to consume 100% CPU due to
> I/O waits, which is OK so far.
>
> One odd thing: the monitor interface is no longer responding ...
>
> What is a real blocker is that KVM crashes with:
> kvm: /usr/src/qemu-kvm-0.12.3/hw/ide/internal.h:507: bmdma_active_if:
> Assertion `bmdma->unit != (uint8_t)-1' failed.
>
> after multipath has reestablished at least one path.

Can you get a stack backtrace with gdb?

> Any ideas? I remember this was working with earlier kernel/kvm/qemu 
> versions.

If it works in the same setup with an older qemu version, bisecting
might help.
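
A minimal sketch of such a bisect run, assuming 0.11.0 was the last
known-good release (substitute whatever tag you last saw working):

$ git bisect start
$ git bisect bad qemu-kvm-0.12.3     # version that shows the assertion
$ git bisect good qemu-kvm-0.11.0    # assumed last known-good tag
$ # build, reproduce the multipath failover, then mark the result:
$ git bisect good                    # or: git bisect bad
$ # repeat until git prints the first bad commit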

Kevin



* Re: [Qemu-devel] Qemu-KVM 0.12.3 and Multipath -> Assertion
From: Peter Lieven @ 2010-05-04 11:38 UTC
  To: Kevin Wolf; +Cc: kvm, qemu-devel

Hi Kevin,

I set a breakpoint at bmdma_active_if. The first two breaks were hit
when the last path in the multipath failed, but the assertion did not
fire. When I brought one path back in, the breakpoint was reached
again, this time leading to the assert. The stack trace below is from
the point shortly before.

Hope this helps.

BR,
Peter
--

(gdb) b bmdma_active_if
Breakpoint 2 at 0x43f2e0: file 
/usr/src/qemu-kvm-0.12.3/hw/ide/internal.h, line 507.
(gdb) c
Continuing.
[Switching to Thread 0x7f7b3300d950 (LWP 21171)]

Breakpoint 2, bmdma_active_if (bmdma=0xe31fd8) at 
/usr/src/qemu-kvm-0.12.3/hw/ide/internal.h:507
507        assert(bmdma->unit != (uint8_t)-1);
(gdb) c
Continuing.

Breakpoint 2, bmdma_active_if (bmdma=0xe31fd8) at 
/usr/src/qemu-kvm-0.12.3/hw/ide/internal.h:507
507        assert(bmdma->unit != (uint8_t)-1);
(gdb) c
Continuing.

Breakpoint 2, bmdma_active_if (bmdma=0xe31fd8) at 
/usr/src/qemu-kvm-0.12.3/hw/ide/internal.h:507
507        assert(bmdma->unit != (uint8_t)-1);
(gdb) bt full
#0  bmdma_active_if (bmdma=0xe31fd8) at 
/usr/src/qemu-kvm-0.12.3/hw/ide/internal.h:507
    __PRETTY_FUNCTION__ = "bmdma_active_if"
#1  0x000000000043f6ba in ide_read_dma_cb (opaque=0xe31fd8, ret=0) at 
/usr/src/qemu-kvm-0.12.3/hw/ide/core.c:554
    bm = (BMDMAState *) 0xe31fd8
    s = (IDEState *) 0xe17940
    n = 0
    sector_num = 0
#2  0x000000000058730c in dma_bdrv_cb (opaque=0xe17940, ret=0) at 
/usr/src/qemu-kvm-0.12.3/dma-helpers.c:94
    dbs = (DMAAIOCB *) 0xe17940
    cur_addr = 0
    cur_len = 0
    mem = (void *) 0x0
#3  0x000000000049e510 in qemu_laio_process_completion (s=0xe119c0, 
laiocb=0xe179c0) at linux-aio.c:68
    ret = 0
#4  0x000000000049e611 in qemu_laio_enqueue_completed (s=0xe119c0, 
laiocb=0xe179c0) at linux-aio.c:107
No locals.
#5  0x000000000049e787 in qemu_laio_completion_cb (opaque=0xe119c0) at 
linux-aio.c:144
    iocb = (struct iocb *) 0xe179f0
    laiocb = (struct qemu_laiocb *) 0xe179c0
    val = 1
    ret = 8
    nevents = 1
    i = 0
    events = {{data = 0x0, obj = 0xe179f0, res = 4096, res2 = 0}, {data 
= 0x0, obj = 0x0, res = 0, res2 = 0} <repeats 46 times>, {data = 0x0, 
obj = 0x0, res = 0,
    res2 = 4365191}, {data = 0x429abf, obj = 0x7f7b3300c410, res = 
4614129721674825936, res2 = 14777248}, {data = 0x3000000018, obj = 
0x7f7b3300c4c0, res = 140167113393152,
    res2 = 47259417504}, {data = 0xe17740, obj = 0xa3300c4e0, res = 
140167113393184, res2 = 0}, {data = 0xe17740, obj = 0x0, res = 0, res2 = 
17}, {data = 0x7f7b3300ccf0,
    obj = 0x92, res = 32, res2 = 168}, {data = 0x7f7b33797a00, obj = 
0x801000, res = 0, res2 = 140167141433408}, {data = 0x7f7b34496e00, obj 
= 0x7f7b33797a00,
    res = 140167113393392, res2 = 8392704}, {data = 0x0, obj = 
0x7f7b34aca040, res = 140167134932480, res2 = 140167118209654}, {data = 
0x7f7b3300d950, obj = 0x42603d, res = 0,
    res2 = 42949672960}, {data = 0x7f7b3300c510, obj = 0xe17ba0, res = 
14776128, res2 = 43805361568}, {data = 0x7f7b3300ced0, obj = 0x42797e, 
res = 0, res2 = 14777248}, {
    data = 0x174, obj = 0x0, res = 373, res2 = 0}, {data = 0x176, obj = 
0x0, res = 3221225601, res2 = 0}, {data = 0x4008ae89c0000083, obj = 0x0, 
res = 209379655938, res2 = 0}, {
    data = 0x7f7bc0000084, obj = 0x0, res = 3221225602, res2 = 0}, {data 
= 0x7f7b00000012, obj = 0x0, res = 17, res2 = 0}, {data = 0x0, obj = 
0x11, res = 140167113395840,
    res2 = 146}, {data = 0x20, obj = 0xa8, res = 140167121304064, res2 = 
8392704}, {data = 0x0, obj = 0x7f7b34aca040, res = 140167134932480, res2 
= 140167121304064}, {
    data = 0x7f7b3300c680, obj = 0x801000, res = 0, res2 = 
140167141433408}, {data = 0x7f7b34496e00, obj = 0x7f7b334a4276, res = 
140167113398608, res2 = 4350013}, {data = 0x0,
    obj = 0xa00000000, res = 140167113393824, res2 = 14777248}, {data = 
0xe2c010, obj = 0xa3300c730, res = 140167113396320, res2 = 4356478}, 
{data = 0x0, obj = 0xe17ba0,
    res = 372, res2 = 0}, {data = 0x175, obj = 0x0, res = 374, res2 = 
0}, {data = 0xc0000081, obj = 0x0, res = 3221225603, res2 = 0}, {data = 
0xc0000102, obj = 0x0,
    res = 3221225604, res2 = 0}, {data = 0xc0000082, obj = 0x0, res = 
18, res2 = 0}, {data = 0x11, obj = 0x0, res = 0, res2 = 0}, {data = 0x0, 
obj = 0x0, res = 0, res2 = 0}, {
    data = 0x0, obj = 0x0, res = 0, res2 = 0}, {data = 0x0, obj = 0x0, 
res = 0, res2 = 0}, {data = 0x0, obj = 0x0, res = 0, res2 = 0}, {data = 
0x0, obj = 0x0, res = 0,
    res2 = 140167139245116}, {data = 0x0, obj = 0x7f7b34abe118, res = 9, 
res2 = 13}, {data = 0x25bf5fc6, obj = 0x7f7b348b40f0, res = 
140167117719264, res2 = 6}, {
    data = 0x96fd7f, obj = 0x7f7b3300c850, res = 140167113394680, res2 = 
140167117724520}, {data = 0x0, obj = 0x7f7b34abe168, res = 
140167141388288, res2 = 4206037}, {
    data = 0x7f7b3343a210, obj = 0x402058, res = 21474836480, res2 = 
4294968102}, {data = 0x0, obj = 0x7f7b34ac8358, res = 140167113394736, 
res2 = 140167113394680}, {
    data = 0x25bf5fc6, obj = 0x7f7b3300c9e0, res = 0, res2 = 
140167139246910}, {data = 0x0, obj = 0x7f7b34abe168, res = 5, res2 = 
140167139245116}, {data = 0x1,
    obj = 0x7f7b34abe120, res = 10, res2 = 13}, {data = 0x9fd7b9dd, obj 
= 0x7f7b348b40f0, res = 140167139205904, res2 = 140166257704989}, {data 
= 0x27f5ee7,
    obj = 0x7f7b3300c950, res = 140167113394936, res2 = 
140167139205920}, {data = 0x0, obj = 0x7f7b34abe510, res = 
140167141434664, res2 = 140167134874710}, {
    data = 0x7f7b348aa5a8, obj = 0x7f7b34486e30, res = 21474836480, res2 
= 4294967319}, {data = 0x7f7b3428427c, obj = 0x7f7b34ac7cc8, res = 
140167113394992,
    res2 = 140167113394936}, {data = 0x9fd7b9dd, obj = 0x7f7b3300cae0, 
res = 0, res2 = 140167139246910}, {data = 0x0, obj = 0x7f7b34abe510, res 
= 140166257704965, res2 = 0}, {
    data = 0x500000001, obj = 0xffffffff, res = 0, res2 = 8627912}, 
{data = 0x801000, obj = 0x100000000, res = 140167141385488, res2 = 
140167141424328}, {data = 0x7f7b3300cb60,
    obj = 0x7f7b34ac7970, res = 140167134874710, res2 = 0}, {data = 0x5, 
obj = 0x0, res = 140167117743864, res2 = 140167113398608}, {data = 
0x7f7b3300cb00, obj = 0x7f7b348bf592,
    res = 14665264, res2 = 90}, {data = 0x2, obj = 0x7f7b33508255, res = 
140167113398608, res2 = 140167113394944}, {data = 0x801000, obj = 0x0, 
res = 140167141433408,
    res2 = 4966659}, {data = 0x50, obj = 0x23300cb2f, res = 
140167139206472, res2 = 140167141434664}, {data = 0x98, obj = 
0xffffffff, res = 140167113395072, res2 = 2191368}, {
    data = 0x801000, obj = 0x3, res = 140167141433408, res2 = 
140167139245116}, {data = 0x7f7b34486000, obj = 0x7f7b34abe0e8, res = 3, 
res2 = 13}, {data = 0xa896c0a2,
    obj = 0x7f7b348b40f0, res = 140167132758616, res2 = 34}, {data = 
0x2a25b02, obj = 0x7f7b3300cb90, res = 140167113395512, res2 = 
140167132758672}, {data = 0x0,
    obj = 0x7f7b34abe1b0, res = 140167141396480, res2 = 4204852}, {data 
= 0x7f7b34284458, obj = 0x400d38, res = 21474836480, res2 = 4294967302}, 
{data = 0xc008ae67325339e0,
    obj = 0x7f7b34ac8358, res = 140167113395568, res2 = 
140167113395512}, {data = 0xa896c0a2, obj = 0x7f7b3300cd20, res = 0, 
res2 = 140167139246910}, {data = 0x0,
    obj = 0x7f7b34abe1b0, res = 140166257704965, res2 = 0}, {data = 0x1, 
obj = 0x101010101010101, res = 140167113395504, res2 = 14776768}, {data 
= 0x7f7b3300cd20,
    obj = 0x13300cd7c, res = 140167141384624, res2 = 140167141426008}, 
{data = 0x7f7b3300cda0, obj = 0x7f7b34ac8000, res = 4204852, res2 = 
4364819}, {data = 0x42994b,
    obj = 0x7f7b3300ccf0, res = 13837501612500713496, res2 = 14658720}, 
{data = 0x3000000018, obj = 0x7f7b3300cda0, res = 140167113395424, res2 
= 4371109}, {
    data = 0x7f7b3300cdc0, obj = 0x3300cd4c, res = 64424509441, res2 = 
4364819}, {data = 0x42994b, obj = 0x7f7b3300cd50, res = 
13837501612500713473, res2 = 14658720}, {
    data = 0x3000000018, obj = 0x7f7b3300ce00, res = 140167132758816, 
res2 = 140167141396480}, {data = 0x7f7b3300ce20, obj = 0xffffffff, res = 
140167113395580, res2 = 8626296}, {
    data = 0x801000, obj = 0x0, res = 140167141433408, res2 = 
140167134932480}, {data = 0x0, obj = 0x7f7b348b871a, res = 
140166257704965, res2 = 0}, {data = 0x7f7b3300cdd0,
    obj = 0x42b2a5, res = 140167132758816, res2 = 140167113398608}}
    ts = {tv_sec = 0, tv_nsec = 0}
    s = (struct qemu_laio_state *) 0xe119c0
#6  0x000000000049e841 in laio_cancel (blockacb=0xe179c0) at linux-aio.c:184
    laiocb = (struct qemu_laiocb *) 0xe179c0
    event = {data = 0x1, obj = 0x4c7fb1, res = 140167113395792, res2 = 
4384262}
    ret = -22
#7  0x000000000049a29b in bdrv_aio_cancel (acb=0xe179c0) at block.c:1792
No locals.
#8  0x000000000058755a in dma_aio_cancel (acb=0xe17940) at 
/usr/src/qemu-kvm-0.12.3/dma-helpers.c:138
    dbs = (DMAAIOCB *) 0xe17940
#9  0x000000000049a29b in bdrv_aio_cancel (acb=0xe17940) at block.c:1792
No locals.
#10 0x0000000000444a0c in ide_dma_cancel (bm=0xe31fd8) at 
/usr/src/qemu-kvm-0.12.3/hw/ide/core.c:2838
No locals.
#11 0x0000000000444f39 in bmdma_cmd_writeb (opaque=0xe31fd8, addr=49152, 
val=8) at /usr/src/qemu-kvm-0.12.3/hw/ide/pci.c:44
    bm = (BMDMAState *) 0xe31fd8
#12 0x00000000004c81bc in ioport_write (index=0, address=49152, data=8) 
at ioport.c:80
    func = (IOPortWriteFunc *) 0x444f0c <bmdma_cmd_writeb>
    default_func = {0x4c81d0 <default_ioport_writeb>, 0x4c8225 
<default_ioport_writew>, 0x4c8282 <default_ioport_writel>}
#13 0x00000000004c8543 in cpu_outb (addr=49152, val=8 '\b') at ioport.c:198
No locals.
#14 0x0000000000429689 in kvm_handle_io (port=49152, 
data=0x7f7b34ab7000, direction=1, size=1, count=1) at 
/usr/src/qemu-kvm-0.12.3/kvm-all.c:535
    i = 0
    ptr = (uint8_t *) 0x7f7b34ab7000 "\b"
#15 0x000000000042bac3 in kvm_run (env=0xe17ba0) at 
/usr/src/qemu-kvm-0.12.3/qemu-kvm.c:964
    r = 0
    kvm = (kvm_context_t) 0xdfb0d0
    run = (struct kvm_run *) 0x7f7b34ab6000
    fd = 15
#16 0x000000000042cdda in kvm_cpu_exec (env=0xe17ba0) at 
/usr/src/qemu-kvm-0.12.3/qemu-kvm.c:1647
    r = 0
#17 0x000000000042d564 in kvm_main_loop_cpu (env=0xe17ba0) at 
/usr/src/qemu-kvm-0.12.3/qemu-kvm.c:1889
    run_cpu = 1
#18 0x000000000042d6a5 in ap_main_loop (_env=0xe17ba0) at 
/usr/src/qemu-kvm-0.12.3/qemu-kvm.c:1939
    env = (struct CPUX86State *) 0xe17ba0
    signals = {__val = {18446744067267100671, 18446744073709551615 
<repeats 15 times>}}
    data = (struct ioperm_data *) 0x0
#19 0x00007f7b3448d3ba in start_thread () from /lib/libpthread.so.0
No symbol table info available.
#20 0x00007f7b3350ffcd in clone () from /lib/libc.so.6
No symbol table info available.
#21 0x0000000000000000 in ?? ()
No symbol table info available.
(gdb) c
Continuing.
kvm: /usr/src/qemu-kvm-0.12.3/hw/ide/internal.h:507: bmdma_active_if: 
Assertion `bmdma->unit != (uint8_t)-1' failed.

Program received signal SIGABRT, Aborted.
0x00007f7b3345cfb5 in raise () from /lib/libc.so.6


Kevin Wolf wrote:
> On 03.05.2010 23:26, Peter Lieven wrote:
>
>> Hi Qemu/KVM Devel Team,
>>
>> I'm using qemu-kvm 0.12.3 with the latest kernel, 2.6.33.3.
>> As the storage backend we use open-iSCSI with dm-multipath.
>>
>> Multipath is configured to queue I/O if no path is available.
>>
>> If we fail all paths, qemu starts to consume 100% CPU due to
>> I/O waits, which is OK so far.
>>
>> One odd thing: the monitor interface is no longer responding ...
>>
>> What is a real blocker is that KVM crashes with:
>> kvm: /usr/src/qemu-kvm-0.12.3/hw/ide/internal.h:507: bmdma_active_if:
>> Assertion `bmdma->unit != (uint8_t)-1' failed.
>>
>> after multipath has reestablished at least one path.
>
> Can you get a stack backtrace with gdb?
>
>> Any ideas? I remember this was working with earlier kernel/kvm/qemu
>> versions.
>
> If it works in the same setup with an older qemu version, bisecting
> might help.
>
> Kevin




* Re: [Qemu-devel] Qemu-KVM 0.12.3 and Multipath -> Assertion
From: Kevin Wolf @ 2010-05-04 12:20 UTC
  To: Peter Lieven; +Cc: kvm, qemu-devel, Christoph Hellwig

On 04.05.2010 13:38, Peter Lieven wrote:
> Hi Kevin,
>
> I set a breakpoint at bmdma_active_if. The first two breaks were hit
> when the last path in the multipath failed, but the assertion did
> not fire. When I brought one path back in, the breakpoint was
> reached again, this time leading to the assert. The stack trace is
> from the point shortly before.
>
> Hope this helps.

Hm, looks like there's something wrong with cancelling requests -
bdrv_aio_cancel might decide that it completes a request (and
consequently calls the callback for it) whereas the IDE emulation
decides that it's done with the request before calling bdrv_aio_cancel.

I haven't looked in much detail what this could break, but does
something like this help?

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 0757528..3cd55e3 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -2838,10 +2838,6 @@ static void ide_dma_restart(IDEState *s, int is_read)
 void ide_dma_cancel(BMDMAState *bm)
 {
     if (bm->status & BM_STATUS_DMAING) {
-        bm->status &= ~BM_STATUS_DMAING;
-        /* cancel DMA request */
-        bm->unit = -1;
-        bm->dma_cb = NULL;
         if (bm->aiocb) {
 #ifdef DEBUG_AIO
             printf("aio_cancel\n");
@@ -2849,6 +2845,10 @@ void ide_dma_cancel(BMDMAState *bm)
             bdrv_aio_cancel(bm->aiocb);
             bm->aiocb = NULL;
         }
+        bm->status &= ~BM_STATUS_DMAING;
+        /* cancel DMA request */
+        bm->unit = -1;
+        bm->dma_cb = NULL;
     }
 }

Kevin



* Re: [Qemu-devel] Qemu-KVM 0.12.3 and Multipath -> Assertion
From: Peter Lieven @ 2010-05-04 13:42 UTC
  To: Kevin Wolf; +Cc: kvm, qemu-devel, Christoph Hellwig

Hi Kevin,

you did it *g*

Looks promising. I applied this patch and have not been able to
reproduce the crash yet :-)

A reliable way to reproduce was to shut down all multipath paths, then
initiate I/O in the VM (e.g. start an application). Of course,
everything hangs at this point.

After reenabling one path, the VM used to crash. Now it seems to
behave correctly: it just reports a DMA timeout and continues normally
afterwards.

Can you think of any way to prevent the VM from consuming 100% CPU in
that waiting state?
My current approach is to run all VMs with nice 1, which helped to
keep the machine responsive when all VMs (in my test case 64 on a box)
have hanging I/O at the same time.
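
Concretely that just means wrapping the invocation in nice; the
command line below is a placeholder for illustration, not our real
config:

$ nice -n 1 qemu-system-x86_64 -m 1024 \
      -drive file=/dev/mapper/mpath-vm150,if=ide,cache=none \
      -vnc :0

Already-running VMs can be adjusted with "renice -n 1 -p <qemu-pid>".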

BR,
Peter



Kevin Wolf wrote:
> On 04.05.2010 13:38, Peter Lieven wrote:
>
>> Hi Kevin,
>>
>> I set a breakpoint at bmdma_active_if. The first two breaks were hit
>> when the last path in the multipath failed, but the assertion did
>> not fire. When I brought one path back in, the breakpoint was
>> reached again, this time leading to the assert. The stack trace is
>> from the point shortly before.
>>
>> Hope this helps.
>
> Hm, looks like there's something wrong with cancelling requests -
> bdrv_aio_cancel might decide that it completes a request (and
> consequently calls the callback for it) whereas the IDE emulation
> decides that it's done with the request before calling bdrv_aio_cancel.
>
> I haven't looked in much detail what this could break, but does
> something like this help?
>
> diff --git a/hw/ide/core.c b/hw/ide/core.c
> index 0757528..3cd55e3 100644
> --- a/hw/ide/core.c
> +++ b/hw/ide/core.c
> @@ -2838,10 +2838,6 @@ static void ide_dma_restart(IDEState *s, int is_read)
>  void ide_dma_cancel(BMDMAState *bm)
>  {
>      if (bm->status & BM_STATUS_DMAING) {
> -        bm->status &= ~BM_STATUS_DMAING;
> -        /* cancel DMA request */
> -        bm->unit = -1;
> -        bm->dma_cb = NULL;
>          if (bm->aiocb) {
>  #ifdef DEBUG_AIO
>              printf("aio_cancel\n");
> @@ -2849,6 +2845,10 @@ void ide_dma_cancel(BMDMAState *bm)
>              bdrv_aio_cancel(bm->aiocb);
>              bm->aiocb = NULL;
>          }
> +        bm->status &= ~BM_STATUS_DMAING;
> +        /* cancel DMA request */
> +        bm->unit = -1;
> +        bm->dma_cb = NULL;
>      }
>  }
>
> Kevin
>
>   




* Re: [Qemu-devel] Qemu-KVM 0.12.3 and Multipath -> Assertion
From: Kevin Wolf @ 2010-05-04 14:01 UTC
  To: Peter Lieven; +Cc: kvm, qemu-devel, Christoph Hellwig

On 04.05.2010 15:42, Peter Lieven wrote:
> Hi Kevin,
>
> you did it *g*
>
> Looks promising. I applied this patch and have not been able to
> reproduce the crash yet :-)
>
> A reliable way to reproduce was to shut down all multipath paths,
> then initiate I/O in the VM (e.g. start an application). Of course,
> everything hangs at this point.
>
> After reenabling one path, the VM used to crash. Now it seems to
> behave correctly: it just reports a DMA timeout and continues
> normally afterwards.

Great, I'm going to submit it as a proper patch then.

Christoph, by now I'm pretty sure it's right, but can you have another
look at whether this is correct, anyway?

> Can you think of any way to prevent the VM from consuming 100% CPU
> in that waiting state?
> My current approach is to run all VMs with nice 1, which helped to
> keep the machine responsive when all VMs (in my test case 64 on a
> box) have hanging I/O at the same time.

I don't have anything particular in mind, but you could just attach gdb
and get another backtrace while it consumes 100% CPU (you'll need to use
"thread apply all bt" to catch everything). Then we should see where
it's hanging.
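
Something like this (the binary name and the way you find the PID are
placeholders, adjust to your setup):

$ gdb -p $(pidof qemu-system-x86_64)
(gdb) thread apply all bt
(gdb) detach
(gdb) quit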

Kevin



* Re: [Qemu-devel] Qemu-KVM 0.12.3 and Multipath -> Assertion
From: Christoph Hellwig @ 2010-05-04 17:07 UTC
  To: Kevin Wolf; +Cc: Peter Lieven, qemu-devel, kvm

On Tue, May 04, 2010 at 04:01:35PM +0200, Kevin Wolf wrote:
> Great, I'm going to submit it as a proper patch then.
> 
> Christoph, by now I'm pretty sure it's right, but can you have another
> look at whether this is correct, anyway?

It looks correct to me - we really shouldn't update the fields until
bdrv_aio_cancel has returned.  In fact, more often than not we cannot
actually cancel a request, so there's a fairly high chance it will
complete.


Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [Qemu-devel] Qemu-KVM 0.12.3 and Multipath -> Assertion
From: André Weidemann @ 2010-05-08 9:53 UTC
  To: Kevin Wolf; +Cc: Peter Lieven, kvm, qemu-devel, Christoph Hellwig

Hi Kevin,
On 04.05.2010 14:20, Kevin Wolf wrote:

> On 04.05.2010 13:38, Peter Lieven wrote:
>> Hi Kevin,
>>
>> I set a breakpoint at bmdma_active_if. The first two breaks were hit
>> when the last path in the multipath failed, but the assertion did
>> not fire. When I brought one path back in, the breakpoint was
>> reached again, this time leading to the assert. The stack trace is
>> from the point shortly before.
>>
>> Hope this helps.
>
> Hm, looks like there's something wrong with cancelling requests -
> bdrv_aio_cancel might decide that it completes a request (and
> consequently calls the callback for it) whereas the IDE emulation
> decides that it's done with the request before calling bdrv_aio_cancel.
>
> I haven't looked in much detail what this could break, but does
> something like this help?

Your attached patch fixes the problem I had as well. I ran 3
consecutive tests tonight, all of which finished without crashing the
VM. I reported my "assertion failed" error on March 14th while doing
disk performance tests using iozone in an Ubuntu 9.10 VM with
qemu-kvm 0.12.3.
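
The tests were plain iozone runs along these lines (the flags here are
only illustrative, not the exact command I used):

$ iozone -a -s 4g -r 64k -f /mnt/testdisk/iozone.tmp

i.e. automatic mode over a file large enough to generate sustained DMA
traffic on the emulated IDE disk.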

Thank you very much.
  André



* qemu-kvm hangs if multipath device is queuing (was: Re: [Qemu-devel] Qemu-KVM 0.12.3 and Multipath -> Assertion)
From: Peter Lieven @ 2010-05-12 14:01 UTC
  To: Kevin Wolf; +Cc: qemu-devel, kvm, Christoph Hellwig

Hi Kevin,

here we go. I created a blocking multipath device (interrupted all
paths). qemu-kvm hangs with 100% CPU, and the monitor is not
responding either.
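
To interrupt the paths I simply log the iSCSI sessions out and back
in, roughly like this (the target and portal are placeholders for our
setup):

# fail all paths by logging out of every iSCSI session
$ iscsiadm -m node -U all
# ... let the guest issue I/O; everything queues ...
# restore a single path
$ iscsiadm -m node -T <target-iqn> -p <portal-ip> --login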

If I restore at least one path, the VM continues.

BR,
Peter


^C
Program received signal SIGINT, Interrupt.
0x00007fd8a6aaea94 in __lll_lock_wait () from /lib/libpthread.so.0
(gdb) bt
#0  0x00007fd8a6aaea94 in __lll_lock_wait () from /lib/libpthread.so.0
#1  0x00007fd8a6aaa190 in _L_lock_102 () from /lib/libpthread.so.0
#2  0x00007fd8a6aa9a7e in pthread_mutex_lock () from /lib/libpthread.so.0
#3  0x000000000042e739 in kvm_mutex_lock () at 
/usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2524
#4  0x000000000042e76e in qemu_mutex_lock_iothread () at 
/usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2537
#5  0x000000000040c262 in main_loop_wait (timeout=1000) at 
/usr/src/qemu-kvm-0.12.4/vl.c:3995
#6  0x000000000042dcf1 in kvm_main_loop () at 
/usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2126
#7  0x000000000040c98c in main_loop () at /usr/src/qemu-kvm-0.12.4/vl.c:4212
#8  0x000000000041054b in main (argc=30, argv=0x7fff266a77e8, 
envp=0x7fff266a78e0) at /usr/src/qemu-kvm-0.12.4/vl.c:6252
(gdb) bt full
#0  0x00007fd8a6aaea94 in __lll_lock_wait () from /lib/libpthread.so.0
No symbol table info available.
#1  0x00007fd8a6aaa190 in _L_lock_102 () from /lib/libpthread.so.0
No symbol table info available.
#2  0x00007fd8a6aa9a7e in pthread_mutex_lock () from /lib/libpthread.so.0
No symbol table info available.
#3  0x000000000042e739 in kvm_mutex_lock () at 
/usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2524
No locals.
#4  0x000000000042e76e in qemu_mutex_lock_iothread () at 
/usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2537
No locals.
#5  0x000000000040c262 in main_loop_wait (timeout=1000) at 
/usr/src/qemu-kvm-0.12.4/vl.c:3995
    ioh = (IOHandlerRecord *) 0x0
    rfds = {fds_bits = {1048576, 0 <repeats 15 times>}}
    wfds = {fds_bits = {0 <repeats 16 times>}}
    xfds = {fds_bits = {0 <repeats 16 times>}}
    ret = 1
    nfds = 21
    tv = {tv_sec = 0, tv_usec = 999761}
#6  0x000000000042dcf1 in kvm_main_loop () at 
/usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2126
    fds = {18, 19}
    mask = {__val = {268443712, 0 <repeats 15 times>}}
    sigfd = 20
#7  0x000000000040c98c in main_loop () at /usr/src/qemu-kvm-0.12.4/vl.c:4212
    r = 0
#8  0x000000000041054b in main (argc=30, argv=0x7fff266a77e8, 
envp=0x7fff266a78e0) at /usr/src/qemu-kvm-0.12.4/vl.c:6252
    gdbstub_dev = 0x0
    boot_devices_bitmap = 12
    i = 0
    snapshot = 0
    linux_boot = 0
    initrd_filename = 0x0
    kernel_filename = 0x0
    kernel_cmdline = 0x588fac ""
    boot_devices = "dc", '\0' <repeats 30 times>
    ds = (DisplayState *) 0x198bf00
    dcl = (DisplayChangeListener *) 0x0
    cyls = 0
    heads = 0
    secs = 0
    translation = 0
    hda_opts = (QemuOpts *) 0x0
    opts = (QemuOpts *) 0x1957390
    optind = 30
    r = 0x7fff266a8a23 "-usbdevice"
    optarg = 0x7fff266a8a2e "tablet"
    loadvm = 0x0
    machine = (QEMUMachine *) 0x861720
    cpu_model = 0x7fff266a8917 "qemu64,model_id=Intel(R) Xeon(R) CPU", ' 
' <repeats 11 times>, "E5520  @ 2.27GHz"
    fds = {644511720, 32767}
    tb_size = 0
    pid_file = 0x7fff266a89bb "/var/run/qemu/vm-150.pid"
    incoming = 0x0
    fd = 0
    pwd = (struct passwd *) 0x0
    chroot_dir = 0x0
    run_as = 0x0
    env = (struct CPUX86State *) 0x0
    show_vnc_port = 0
    params = {0x58cc76 "order", 0x58cc7c "once", 0x58cc81 "menu", 0x0}

Kevin Wolf wrote:
> On 04.05.2010 15:42, Peter Lieven wrote:
>
>> Hi Kevin,
>>
>> you did it *g*
>>
>> Looks promising. I applied this patch and have not been able to
>> reproduce the crash yet :-)
>>
>> A reliable way to reproduce was to shut down all multipath paths,
>> then initiate I/O in the VM (e.g. start an application). Of course,
>> everything hangs at this point.
>>
>> After reenabling one path, the VM used to crash. Now it seems to
>> behave correctly: it just reports a DMA timeout and continues
>> normally afterwards.
>
> Great, I'm going to submit it as a proper patch then.
>
> Christoph, by now I'm pretty sure it's right, but can you have another
> look at whether this is correct, anyway?
>
>   
>> Can you think of any way to prevent the VM from consuming 100% CPU
>> in that waiting state?
>> My current approach is to run all VMs with nice 1, which helped to
>> keep the machine responsive when all VMs (in my test case 64 on a
>> box) have hanging I/O at the same time.
>
> I don't have anything particular in mind, but you could just attach gdb
> and get another backtrace while it consumes 100% CPU (you'll need to use
> "thread apply all bt" to catch everything). Then we should see where
> it's hanging.
>
> Kevin
>
>
>
>   


-- 
Mit freundlichen Grüßen/Kind Regards

Peter Lieven

..........................................................................................................

   KAMP Netzwerkdienste GmbH
   Vestische Str. 89-91 | 46117 Oberhausen
   Tel: +49 (0) 208.89 402-50 | Fax: +49 (0) 208.89 402-40
   mailto:pl@kamp.de | http://www.kamp.de

   Managing Directors: Heiner Lante | Michael Lante
   District Court (Amtsgericht) Duisburg | HRB No. 12154
   VAT ID No.: DE 120607556

......................................................................................................... 



* Re: qemu-kvm hangs if multipath device is queuing
From: Kevin Wolf @ 2010-05-14 9:26 UTC
  To: Peter Lieven; +Cc: qemu-devel, kvm, Christoph Hellwig

Hi Peter,

On 12.05.2010 16:01, Peter Lieven wrote:
> Hi Kevin,
>
> here we go. I created a blocking multipath device (interrupted all
> paths). qemu-kvm hangs with 100% CPU, and the monitor is not
> responding either.
>
> If I restore at least one path, the VM continues.
> 
> BR,
> Peter

This seems to be the backtrace of only one thread, and likely not the
interesting one. Can you please use "thread apply all bt" to get the
backtrace of all threads?

Kevin

> 
> 
> ^C
> Program received signal SIGINT, Interrupt.
> 0x00007fd8a6aaea94 in __lll_lock_wait () from /lib/libpthread.so.0
> (gdb) bt
> #0  0x00007fd8a6aaea94 in __lll_lock_wait () from /lib/libpthread.so.0
> #1  0x00007fd8a6aaa190 in _L_lock_102 () from /lib/libpthread.so.0
> #2  0x00007fd8a6aa9a7e in pthread_mutex_lock () from /lib/libpthread.so.0
> #3  0x000000000042e739 in kvm_mutex_lock () at 
> /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2524
> #4  0x000000000042e76e in qemu_mutex_lock_iothread () at 
> /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2537
> #5  0x000000000040c262 in main_loop_wait (timeout=1000) at 
> /usr/src/qemu-kvm-0.12.4/vl.c:3995
> #6  0x000000000042dcf1 in kvm_main_loop () at 
> /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2126
> #7  0x000000000040c98c in main_loop () at /usr/src/qemu-kvm-0.12.4/vl.c:4212
> #8  0x000000000041054b in main (argc=30, argv=0x7fff266a77e8, 
> envp=0x7fff266a78e0) at /usr/src/qemu-kvm-0.12.4/vl.c:6252
> (gdb) bt full
> #0  0x00007fd8a6aaea94 in __lll_lock_wait () from /lib/libpthread.so.0
> No symbol table info available.
> #1  0x00007fd8a6aaa190 in _L_lock_102 () from /lib/libpthread.so.0
> No symbol table info available.
> #2  0x00007fd8a6aa9a7e in pthread_mutex_lock () from /lib/libpthread.so.0
> No symbol table info available.
> #3  0x000000000042e739 in kvm_mutex_lock () at 
> /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2524
> No locals.
> #4  0x000000000042e76e in qemu_mutex_lock_iothread () at 
> /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2537
> No locals.
> #5  0x000000000040c262 in main_loop_wait (timeout=1000) at 
> /usr/src/qemu-kvm-0.12.4/vl.c:3995
>     ioh = (IOHandlerRecord *) 0x0
>     rfds = {fds_bits = {1048576, 0 <repeats 15 times>}}
>     wfds = {fds_bits = {0 <repeats 16 times>}}
>     xfds = {fds_bits = {0 <repeats 16 times>}}
>     ret = 1
>     nfds = 21
>     tv = {tv_sec = 0, tv_usec = 999761}
> #6  0x000000000042dcf1 in kvm_main_loop () at 
> /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2126
>     fds = {18, 19}
>     mask = {__val = {268443712, 0 <repeats 15 times>}}
>     sigfd = 20
> #7  0x000000000040c98c in main_loop () at /usr/src/qemu-kvm-0.12.4/vl.c:4212
>     r = 0
> #8  0x000000000041054b in main (argc=30, argv=0x7fff266a77e8, 
> envp=0x7fff266a78e0) at /usr/src/qemu-kvm-0.12.4/vl.c:6252
>     gdbstub_dev = 0x0
>     boot_devices_bitmap = 12
>     i = 0
>     snapshot = 0
>     linux_boot = 0
>     initrd_filename = 0x0
>     kernel_filename = 0x0
>     kernel_cmdline = 0x588fac ""
>     boot_devices = "dc", '\0' <repeats 30 times>
>     ds = (DisplayState *) 0x198bf00
>     dcl = (DisplayChangeListener *) 0x0
>     cyls = 0
>     heads = 0
>     secs = 0
>     translation = 0
>     hda_opts = (QemuOpts *) 0x0
>     opts = (QemuOpts *) 0x1957390
>     optind = 30
>     r = 0x7fff266a8a23 "-usbdevice"
>     optarg = 0x7fff266a8a2e "tablet"
>     loadvm = 0x0
>     machine = (QEMUMachine *) 0x861720
>     cpu_model = 0x7fff266a8917 "qemu64,model_id=Intel(R) Xeon(R) CPU", ' 
> ' <repeats 11 times>, "E5520  @ 2.27GHz"
>     fds = {644511720, 32767}
>     tb_size = 0
>     pid_file = 0x7fff266a89bb "/var/run/qemu/vm-150.pid"
>     incoming = 0x0
>     fd = 0
>     pwd = (struct passwd *) 0x0
>     chroot_dir = 0x0
>     run_as = 0x0
>     env = (struct CPUX86State *) 0x0
>     show_vnc_port = 0
>     params = {0x58cc76 "order", 0x58cc7c "once", 0x58cc81 "menu", 0x0}
> 
> Kevin Wolf wrote:
>> On 04.05.2010 15:42, Peter Lieven wrote:
>>   
>>> hi kevin,
>>>
>>> you did it *g*
>>>
>>> looks promising. I applied this patch and have not been able to reproduce yet :-)
>>>
>>> A reliable way to reproduce was to shut down all multipath paths, then 
>>> initiate I/O
>>> in the VM (e.g. start an application). Of course, everything hangs at 
>>> this point.
>>>
>>> After re-enabling one path, the VM crashed. Now it seems to behave correctly:
>>> it just reports a DMA timeout and continues normally afterwards.
>>>     
>>
>> Great, I'm going to submit it as a proper patch then.
>>
>> Christoph, by now I'm pretty sure it's right, but can you have another
>> look if this is correct, anyway?
>>
>>   
>>> can you think of any way to prevent the VM from consuming 100% CPU in
>>> that waiting state?
>>> My current approach is to run all VMs with nice 1, which helped to keep the
>>> machine responsive when all VMs (in my test case, 64 on a box) have hanging
>>> I/O at the same time.
>>>     
>>
>> I don't have anything particular in mind, but you could just attach gdb
>> and get another backtrace while it consumes 100% CPU (you'll need to use
>> "thread apply all bt" to catch everything). Then we should see where
>> it's hanging.
>>
>> Kevin
>>
>>
>>
>>   
> 
> 



* Re: qemu-kvm hangs if multipath device is queing
  2010-05-14  9:26               ` [Qemu-devel] " Kevin Wolf
@ 2010-05-18 11:10                 ` Peter Lieven
  -1 siblings, 0 replies; 34+ messages in thread
From: Peter Lieven @ 2010-05-18 11:10 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, kvm, Christoph Hellwig

hi kevin,

here is the backtrace of (hopefully) all threads:

^C
Program received signal SIGINT, Interrupt.
[Switching to Thread 0x7f39b72656f0 (LWP 10695)]
0x00007f39b6c3ea94 in __lll_lock_wait () from /lib/libpthread.so.0

(gdb) thread apply all bt

Thread 2 (Thread 0x7f39b57b8950 (LWP 10698)):
#0  0x00007f39b6c3eedb in read () from /lib/libpthread.so.0
#1  0x000000000049e723 in qemu_laio_completion_cb (opaque=0x22b4010) at 
linux-aio.c:125
#2  0x000000000049e8ad in laio_cancel (blockacb=0x22ba310) at 
linux-aio.c:184
#3  0x000000000049a309 in bdrv_aio_cancel (acb=0x22ba310) at block.c:1800
#4  0x0000000000587a52 in dma_aio_cancel (acb=0x22ba170) at 
/usr/src/qemu-kvm-0.12.4/dma-helpers.c:138
#5  0x000000000049a309 in bdrv_aio_cancel (acb=0x22ba170) at block.c:1800
#6  0x0000000000444aac in ide_dma_cancel (bm=0x2800fd8) at 
/usr/src/qemu-kvm-0.12.4/hw/ide/core.c:2834
#7  0x0000000000445001 in bmdma_cmd_writeb (opaque=0x2800fd8, 
addr=49152, val=8) at /usr/src/qemu-kvm-0.12.4/hw/ide/pci.c:44
#8  0x00000000004c85f0 in ioport_write (index=0, address=49152, data=8) 
at ioport.c:80
#9  0x00000000004c8977 in cpu_outb (addr=49152, val=8 '\b') at ioport.c:198
#10 0x0000000000429731 in kvm_handle_io (port=49152, 
data=0x7f39b7263000, direction=1, size=1, count=1)
    at /usr/src/qemu-kvm-0.12.4/kvm-all.c:535
#11 0x000000000042bb8b in kvm_run (env=0x22ba5d0) at 
/usr/src/qemu-kvm-0.12.4/qemu-kvm.c:968
#12 0x000000000042cea2 in kvm_cpu_exec (env=0x22ba5d0) at 
/usr/src/qemu-kvm-0.12.4/qemu-kvm.c:1651
#13 0x000000000042d62c in kvm_main_loop_cpu (env=0x22ba5d0) at 
/usr/src/qemu-kvm-0.12.4/qemu-kvm.c:1893
#14 0x000000000042d76d in ap_main_loop (_env=0x22ba5d0) at 
/usr/src/qemu-kvm-0.12.4/qemu-kvm.c:1943
#15 0x00007f39b6c383ba in start_thread () from /lib/libpthread.so.0
#16 0x00007f39b5cbafcd in clone () from /lib/libc.so.6
#17 0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f39b72656f0 (LWP 10695)):
#0  0x00007f39b6c3ea94 in __lll_lock_wait () from /lib/libpthread.so.0
#1  0x00007f39b6c3a190 in _L_lock_102 () from /lib/libpthread.so.0
#2  0x00007f39b6c39a7e in pthread_mutex_lock () from /lib/libpthread.so.0
#3  0x000000000042e739 in kvm_mutex_lock () at 
/usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2524
#4  0x000000000042e76e in qemu_mutex_lock_iothread () at 
/usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2537
#5  0x000000000040c262 in main_loop_wait (timeout=1000) at 
/usr/src/qemu-kvm-0.12.4/vl.c:3995
#6  0x000000000042dcf1 in kvm_main_loop () at 
/usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2126
#7  0x000000000040c98c in main_loop () at /usr/src/qemu-kvm-0.12.4/vl.c:4212
#8  0x000000000041054b in main (argc=30, argv=0x7fff019f1ca8, 
envp=0x7fff019f1da0) at /usr/src/qemu-kvm-0.12.4/vl.c:6252



* Re: [Qemu-devel] Qemu-KVM 0.12.3 and Multipath -> Assertion
  2010-05-04 17:07           ` Christoph Hellwig
@ 2010-05-18 11:13             ` Peter Lieven
  2010-05-18 12:14                 ` Kevin Wolf
  0 siblings, 1 reply; 34+ messages in thread
From: Peter Lieven @ 2010-05-18 11:13 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Kevin Wolf, qemu-devel, kvm

hi,

will this patch make it into 0.12.4.1?

br,
peter

Christoph Hellwig wrote:
> On Tue, May 04, 2010 at 04:01:35PM +0200, Kevin Wolf wrote:
>   
>> Great, I'm going to submit it as a proper patch then.
>>
>> Christoph, by now I'm pretty sure it's right, but can you have another
>> look if this is correct, anyway?
>>     
>
> It looks correct to me - we really shouldn't update the fields
> until bdrv_aio_cancel has returned.  In fact, more often than not we
> cannot actually cancel a request, so there's a fairly high chance it
> will complete.
>
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
>
>   
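
The fix under discussion reorders ide_dma_cancel() so that the BMDMA
state is reset only after bdrv_aio_cancel() has returned, since the
cancel path may still invoke the completion callback that asserts on
bm->unit. A rough sketch of that ordering, reconstructed from this
thread rather than copied verbatim from commit 38d8dfa1 (BMDMAState,
BM_STATUS_DMAING and bdrv_aio_cancel() are QEMU internals):

    static void ide_dma_cancel(BMDMAState *bm)
    {
        if (bm->status & BM_STATUS_DMAING) {
            bm->status &= ~BM_STATUS_DMAING;
            /* cancel the request first; its completion callback may
             * still run and read bm->unit and bm->dma_cb */
            if (bm->aiocb) {
                bdrv_aio_cancel(bm->aiocb);
                bm->aiocb = NULL;
            }
            /* only now is it safe to reset the DMA state */
            bm->unit = -1;
            bm->dma_cb = NULL;
        }
    }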





* Re: [Qemu-devel] Qemu-KVM 0.12.3 and Multipath -> Assertion
  2010-05-18 11:13             ` Peter Lieven
@ 2010-05-18 12:14                 ` Kevin Wolf
  0 siblings, 0 replies; 34+ messages in thread
From: Kevin Wolf @ 2010-05-18 12:14 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Peter Lieven, Christoph Hellwig, qemu-devel, kvm

On 18.05.2010 13:13, Peter Lieven wrote:
> hi,
> 
> will this patch make it into 0.12.4.1?
> 
> br,
> peter

Anthony, can you please cherry-pick commit 38d8dfa1 into stable-0.12?

Kevin

> 
> Christoph Hellwig wrote:
>> On Tue, May 04, 2010 at 04:01:35PM +0200, Kevin Wolf wrote:
>>   
>>> Great, I'm going to submit it as a proper patch then.
>>>
>>> Christoph, by now I'm pretty sure it's right, but can you have another
>>> look if this is correct, anyway?
>>>     
>>
>> It looks correct to me - we really shouldn't update the fields
>> until bdrv_aio_cancel has returned.  In fact, more often than not we
>> cannot actually cancel a request, so there's a fairly high chance it
>> will complete.
>>
>>
>> Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: qemu-kvm hangs if multipath device is queing
  2010-05-18 11:10                 ` [Qemu-devel] " Peter Lieven
@ 2010-05-18 13:22                   ` Kevin Wolf
  -1 siblings, 0 replies; 34+ messages in thread
From: Kevin Wolf @ 2010-05-18 13:22 UTC (permalink / raw)
  To: Peter Lieven; +Cc: qemu-devel, kvm, Christoph Hellwig

On 18.05.2010 13:10, Peter Lieven wrote:
> hi kevin,
> 
> here is the backtrace of (hopefully) all threads:
> 
> ^C
> Program received signal SIGINT, Interrupt.
> [Switching to Thread 0x7f39b72656f0 (LWP 10695)]
> 0x00007f39b6c3ea94 in __lll_lock_wait () from /lib/libpthread.so.0
> 
> (gdb) thread apply all bt
> 
> Thread 2 (Thread 0x7f39b57b8950 (LWP 10698)):
> #0  0x00007f39b6c3eedb in read () from /lib/libpthread.so.0
> #1  0x000000000049e723 in qemu_laio_completion_cb (opaque=0x22b4010) at 
> linux-aio.c:125
> #2  0x000000000049e8ad in laio_cancel (blockacb=0x22ba310) at 
> linux-aio.c:184

I think it's stuck here in an endless loop:

    while (laiocb->ret == -EINPROGRESS)
        qemu_laio_completion_cb(laiocb->ctx);

Can you verify this by single-stepping one or two loop iterations? ret
and errno after the read call could be interesting, too.

We'll be stuck in an endless loop if the request doesn't complete, which
might well happen in your scenario. Not sure what the right thing to do
is. We probably need to fail the bdrv_aio_cancel to avoid blocking the
whole program, but I have no idea what device emulations should do on
that condition.

As long as we can't handle that condition correctly, leaving the hang in
place is probably the best option. Maybe add some sleep to avoid 100%
CPU consumption.

Kevin
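
A minimal sketch of that suggestion -- the same wait loop, with a short
sleep between polls so a stuck request no longer busy-loops the thread.
The struct and callback below are stand-ins for the linux-aio.c
internals quoted above, and the 10 ms figure is an arbitrary choice:

    #include <errno.h>
    #include <unistd.h>

    struct laio_ctx;                    /* stand-in AIO context */
    struct laiocb {
        int ret;                        /* -EINPROGRESS while pending */
        struct laio_ctx *ctx;
    };

    void qemu_laio_completion_cb(struct laio_ctx *ctx);  /* reaps events */

    static void laio_wait_cancelled(struct laiocb *laiocb)
    {
        while (laiocb->ret == -EINPROGRESS) {
            qemu_laio_completion_cb(laiocb->ctx);
            usleep(10 * 1000);          /* back off instead of spinning */
        }
    }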


* Re: qemu-kvm hangs if multipath device is queing
  2010-05-18 13:22                   ` [Qemu-devel] " Kevin Wolf
@ 2010-05-19  7:29                     ` Christoph Hellwig
  -1 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2010-05-19  7:29 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Peter Lieven, qemu-devel, kvm, Christoph Hellwig

On Tue, May 18, 2010 at 03:22:36PM +0200, Kevin Wolf wrote:
> I think it's stuck here in an endless loop:
> 
>     while (laiocb->ret == -EINPROGRESS)
>         qemu_laio_completion_cb(laiocb->ctx);
> 
> Can you verify this by single-stepping one or two loop iterations? ret
> and errno after the read call could be interesting, too.

Maybe the compiler is just too smart.  Without some form of barrier
it could just optimize the loop away as laiocb->ret couldn't change
in a normal single-threaded environment.
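
For illustration, the kind of compiler barrier meant here -- a sketch,
not a proposed patch; the macro is written out rather than taken from a
QEMU header:

    /* empty asm with a "memory" clobber: the compiler must assume any
     * memory may have changed, so laiocb->ret is reloaded each time */
    #define barrier() __asm__ __volatile__("" ::: "memory")

    while (laiocb->ret == -EINPROGRESS) {
        qemu_laio_completion_cb(laiocb->ctx);
        barrier();
    }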



* Re: qemu-kvm hangs if multipath device is queing
  2010-05-19  7:29                     ` [Qemu-devel] " Christoph Hellwig
@ 2010-05-19  7:48                       ` Kevin Wolf
  -1 siblings, 0 replies; 34+ messages in thread
From: Kevin Wolf @ 2010-05-19  7:48 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Peter Lieven, qemu-devel, kvm

On 19.05.2010 09:29, Christoph Hellwig wrote:
> On Tue, May 18, 2010 at 03:22:36PM +0200, Kevin Wolf wrote:
>> I think it's stuck here in an endless loop:
>>
>>     while (laiocb->ret == -EINPROGRESS)
>>         qemu_laio_completion_cb(laiocb->ctx);
>>
>> Can you verify this by single-stepping one or two loop iterations? ret
>> and errno after the read call could be interesting, too.
> 
> Maybe the compiler is just too smart.  Without some form of barrier
> it could just optimize the loop away as laiocb->ret couldn't change
> in a normal single-threaded environment.

It probably could in theory, but in practice we're in a read() inside
qemu_laio_completion, so it didn't do it here.

Kevin


* Re: qemu-kvm hangs if multipath device is queing
  2010-05-19  7:48                       ` [Qemu-devel] " Kevin Wolf
@ 2010-05-19  8:18                         ` Peter Lieven
  -1 siblings, 0 replies; 34+ messages in thread
From: Peter Lieven @ 2010-05-19  8:18 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Christoph Hellwig, qemu-devel, kvm

Kevin Wolf wrote:
> On 19.05.2010 09:29, Christoph Hellwig wrote:
>   
>> On Tue, May 18, 2010 at 03:22:36PM +0200, Kevin Wolf wrote:
>>     
>>> I think it's stuck here in an endless loop:
>>>
>>>     while (laiocb->ret == -EINPROGRESS)
>>>         qemu_laio_completion_cb(laiocb->ctx);
>>>
>>> Can you verify this by single-stepping one or two loop iterations? ret
>>> and errno after the read call could be interesting, too.
>>>       
>> Maybe the compiler is just too smart.  Without some form of barrier
>> it could just optimize the loop away as laiocb->ret couldn't change
>> in a normal single-threaded environment.
>>     
>
> It probably could in theory, but in practice we're in a read() inside
> qemu_laio_completion, so it didn't do it here.
>   
If you supply a patch that adds some usleeps at the point in
question, I'm willing to test whether it solves the 100% CPU problem.
> Kevin
>
>   



* Re: [Qemu-devel] Re: qemu-kvm hangs if multipath device is queing
  2010-05-19  8:18                         ` [Qemu-devel] " Peter Lieven
  (?)
@ 2010-05-23 10:30                         ` Peter Lieven
  -1 siblings, 0 replies; 34+ messages in thread
From: Peter Lieven @ 2010-05-23 10:30 UTC (permalink / raw)
  To: Peter Lieven; +Cc: Kevin Wolf, Christoph Hellwig, kvm, qemu-devel


On 19.05.2010 at 10:18, Peter Lieven wrote:

> Kevin Wolf wrote:
>> Am 19.05.2010 09:29, schrieb Christoph Hellwig:
>>  
>>> On Tue, May 18, 2010 at 03:22:36PM +0200, Kevin Wolf wrote:
>>>    
>>>> I think it's stuck here in an endless loop:
>>>> 
>>>>    while (laiocb->ret == -EINPROGRESS)
>>>>        qemu_laio_completion_cb(laiocb->ctx);
>>>> 
>>>> Can you verify this by single-stepping one or two loop iterations? ret
>>>> and errno after the read call could be interesting, too.
>>>>      
>>> Maybe the compiler is just too smart.  Without some form of barrier
>>> it could just optimize the loop away as laiocb->ret couldn't change
>>> in a normal single-threaded environment.
>>>    
>> 
>> It probably could in theory, but in practice we're in a read() inside
>> qemu_laio_completion, so it didn't do it here.
>>  
> If you supply a patch that adds some usleeps at the point in
> question, I'm willing to test whether it solves the 100% CPU problem.

Can someone help here? What would be the best option for adding some
usleeps?

>> Kevin
>> 
>>  
> 
> 
> 



end of thread, other threads:[~2010-05-23 10:30 UTC | newest]

Thread overview: 34+ messages
2010-05-03 21:26 Qemu-KVM 0.12.3 and Multipath -> Assertion Peter Lieven
2010-05-03 21:26 ` [Qemu-devel] " Peter Lieven
2010-05-04  5:38 ` André Weidemann
2010-05-04  5:38   ` [Qemu-devel] " André Weidemann
2010-05-04  8:35 ` [Qemu-devel] " Kevin Wolf
2010-05-04  8:35   ` Kevin Wolf
2010-05-04 11:38   ` Peter Lieven
2010-05-04 11:38     ` Peter Lieven
2010-05-04 12:20     ` Kevin Wolf
2010-05-04 12:20       ` Kevin Wolf
2010-05-04 13:42       ` Peter Lieven
2010-05-04 13:42         ` Peter Lieven
2010-05-04 14:01         ` Kevin Wolf
2010-05-04 14:01           ` Kevin Wolf
2010-05-04 17:07           ` Christoph Hellwig
2010-05-18 11:13             ` Peter Lieven
2010-05-18 12:14               ` Kevin Wolf
2010-05-18 12:14                 ` Kevin Wolf
2010-05-12 14:01           ` qemu-kvm hangs if multipath device is queing (was: Re: [Qemu-devel] Qemu-KVM 0.12.3 and Multipath -> Assertion) Peter Lieven
2010-05-14  9:26             ` qemu-kvm hangs if multipath device is queing Kevin Wolf
2010-05-14  9:26               ` [Qemu-devel] " Kevin Wolf
2010-05-18 11:10               ` Peter Lieven
2010-05-18 11:10                 ` [Qemu-devel] " Peter Lieven
2010-05-18 13:22                 ` Kevin Wolf
2010-05-18 13:22                   ` [Qemu-devel] " Kevin Wolf
2010-05-19  7:29                   ` Christoph Hellwig
2010-05-19  7:29                     ` [Qemu-devel] " Christoph Hellwig
2010-05-19  7:48                     ` Kevin Wolf
2010-05-19  7:48                       ` [Qemu-devel] " Kevin Wolf
2010-05-19  8:18                       ` Peter Lieven
2010-05-19  8:18                         ` [Qemu-devel] " Peter Lieven
2010-05-23 10:30                         ` Peter Lieven
2010-05-08  9:53       ` [Qemu-devel] Qemu-KVM 0.12.3 and Multipath -> Assertion André Weidemann
2010-05-08  9:53         ` André Weidemann
