* I/O scheduler deadlocks on Xen virtual block devices
@ 2010-09-01 14:17 Doug Neal
  2010-09-01 17:33 ` Jeff Moyer
  2010-09-03 12:22 ` Pasi Kärkkäinen
  0 siblings, 2 replies; 10+ messages in thread
From: Doug Neal @ 2010-09-01 14:17 UTC (permalink / raw)
  To: linux-kernel

Hello all,

I believe I have found a bug in the I/O scheduler which results in
access to a block device being blocked indefinitely.

The setup:
 * Xen dom0 version 3.4.2.
 * domU: Ubuntu 10.04, x86_64, with kernel 2.6.32.15+drm33.5.
 * Paravirtual disks and network interfaces.
 * Root filesystem on /dev/xvdb1
 * A scratch filesystem for the purposes of my tests on /dev/xvda
mounted on /mnt
 * Both filesystems are ext3, formatted and mounted with defaults
 * XVBDs are backed by LVM on top of an iSCSI SAN in the dom0.


Activities leading up to the incident:

To reproduce the bug, I run the VM on a Xen host which has a moderate
workload of other VMs. (The bug seems to manifest more readily under
load than when the host is otherwise idle.)

I repeatedly rsync the contents of a Linux install CD to an arbitrary
location on the scratch filesystem, e.g. /mnt/test, then rm -rf the
lot, and rsync again. It can sometimes take a few iterations before
the bug is triggered. Sometimes it's triggered on the rsync, sometimes
on the rm.
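
Roughly, the loop looks like this (the CD mount point and target path
are just examples from my setup):

  # copy an install CD onto the scratch filesystem, delete it, repeat
  while true; do
      rsync -a /media/cdrom/ /mnt/test/
      rm -rf /mnt/test
  done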

At some point during either the rsync or the rm, all the processes
accessing /dev/xvda (rsync, kjournald, flush) become stuck in the D
state. After 120 seconds the warnings start to appear in dmesg:

  [  840.070508] INFO: task rm:1455 blocked for more than 120 seconds.
  [  840.070514] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.

Full dmesg output is below, including call traces which show that each
of these processes is stuck in io_schedule, reached either directly via
get_request_wait or via sync_buffer.
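
As an aside, the same blocked-task dumps can be produced on demand,
rather than waiting for the 120-second hung task timer, assuming sysrq
is enabled in the guest:

  # dump all tasks currently stuck in the uninterruptible (D) state to dmesg
  echo w > /proc/sysrq-trigger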


I believe I have eliminated:
 * Problems with the underlying physical device
   - The same bug has manifested itself on two completely separate
sets of hardware, with different servers, switches, and SANs.

 * Problems with the host's iSCSI initiator
   - Other VMs depending on the same iSCSI session are unaffected
   - Other XVBDs within the same VM (in this case: /dev/xvdb1 mounted
on /) are unaffected

Things I've tried:
 * Noop, deadline, cfq and anticipatory elevators.
 * Reducing the VM's vCPU count to 1. I _wasn't_ able to trigger the
bug under only one vCPU.
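
The elevators above were switched at runtime via sysfs, along these lines:

  # show the current (bracketed) and available schedulers for the scratch disk
  cat /sys/block/xvda/queue/scheduler
  # switch to, e.g., the deadline elevator
  echo deadline > /sys/block/xvda/queue/scheduler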

Unfortunately the steps to reproduce are rather long-winded once you
factor in setting up the Xen dom0, which in this case is Citrix
XenServer, so if there is any way I can help with that please let me
know, e.g. by arranging remote access to my existing test environment
or by supplying a VM template image. However, I don't believe this
problem is limited to this environment, as I have seen quite a few
reports of similar or identical issues elsewhere (but not on LKML...):

http://lists.xensource.com/archives/html/xen-devel/2010-06/msg00590.html
- same issue reported with a non-commercial and more recent Xen
version than mine. Also links to three Redhat bug reports showing
similar behaviour.


Thanks for your time,

Doug



[  720.070103] INFO: task kjournald:1011 blocked for more than 120 seconds.
[  720.070121] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  720.070131] kjournald     D ffff8800035bcbc0     0  1011      2 0x00000000
[  720.070140]  ffff880002dc1a30 0000000000000246 0000000000015bc0
0000000000015bc0
[  720.070149]  ffff8800574db1a0 ffff880002dc1fd8 0000000000015bc0
ffff8800574dade0
[  720.070156]  0000000000015bc0 ffff880002dc1fd8 0000000000015bc0
ffff8800574db1a0
[  720.070164] Call Trace:
[  720.070177]  [<ffffffff81558bb7>] io_schedule+0x47/0x70
[  720.070184]  [<ffffffff812a095e>] get_request_wait+0xce/0x1a0
[  720.070191]  [<ffffffff81085090>] ? autoremove_wake_function+0x0/0x40
[  720.070195]  [<ffffffff812a1045>] __make_request+0x95/0x4a0
[  720.070201]  [<ffffffff810f6595>] ? mempool_alloc_slab+0x15/0x20
[  720.070208]  [<ffffffff8100f37f>] ? xen_restore_fl_direct_end+0x0/0x1
[  720.070213]  [<ffffffff8129f7a1>] generic_make_request+0x1b1/0x4f0
[  720.070218]  [<ffffffff810f6595>] ? mempool_alloc_slab+0x15/0x20
[  720.070223]  [<ffffffff8129fb60>] submit_bio+0x80/0x110
[  720.070229]  [<ffffffff8116d399>] submit_bh+0xf9/0x140
[  720.070235]  [<ffffffff8121495c>] journal_submit_data_buffers+0x24c/0x320
[  720.070240]  [<ffffffff81214d02>] journal_commit_transaction+0x2d2/0xe40
[  720.070245]  [<ffffffff8100f37f>] ? xen_restore_fl_direct_end+0x0/0x1
[  720.070250]  [<ffffffff8155ab5e>] ? _spin_unlock_irqrestore+0x1e/0x30
[  720.070256]  [<ffffffff81077719>] ? try_to_del_timer_sync+0x79/0xd0
[  720.070261]  [<ffffffff8100f37f>] ? xen_restore_fl_direct_end+0x0/0x1
[  720.070265]  [<ffffffff8155ab5e>] ? _spin_unlock_irqrestore+0x1e/0x30
[  720.070270]  [<ffffffff81218c1d>] kjournald+0xed/0x250
[  720.070275]  [<ffffffff81085090>] ? autoremove_wake_function+0x0/0x40
[  720.070279]  [<ffffffff81218b30>] ? kjournald+0x0/0x250
[  720.070284]  [<ffffffff81084d16>] kthread+0x96/0xa0
[  720.070289]  [<ffffffff810141ea>] child_rip+0xa/0x20
[  720.070295]  [<ffffffff810133d1>] ? int_ret_from_sys_call+0x7/0x1b
[  720.070299]  [<ffffffff81013b5d>] ? retint_restore_args+0x5/0x6
[  720.070313]  [<ffffffff810141e0>] ? child_rip+0x0/0x20
[  720.070317] INFO: task rm:1455 blocked for more than 120 seconds.
[  720.070323] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  720.070329] rm            D ffff8800035dabc0     0  1455    980 0x00000000
[  720.070333]  ffff880058bedb68 0000000000000286 0000000000015bc0
0000000000015bc0
[  720.070338]  ffff880002d39ab0 ffff880058bedfd8 0000000000015bc0
ffff880002d396f0
[  720.070343]  0000000000015bc0 ffff880058bedfd8 0000000000015bc0
ffff880002d39ab0
[  720.070348] Call Trace:
[  720.070351]  [<ffffffff8116e5d0>] ? sync_buffer+0x0/0x50
[  720.070354]  [<ffffffff81558bb7>] io_schedule+0x47/0x70
[  720.070357]  [<ffffffff8116e615>] sync_buffer+0x45/0x50
[  720.070360]  [<ffffffff8155908a>] __wait_on_bit_lock+0x5a/0xc0
[  720.070363]  [<ffffffff8116e5d0>] ? sync_buffer+0x0/0x50
[  720.070366]  [<ffffffff81559168>] out_of_line_wait_on_bit_lock+0x78/0x90
[  720.070369]  [<ffffffff810850d0>] ? wake_bit_function+0x0/0x40
[  720.070373]  [<ffffffff8116d259>] ? try_to_free_buffers+0x79/0xc0
[  720.070376]  [<ffffffff8116e796>] __lock_buffer+0x36/0x40
[  720.070379]  [<ffffffff812136d6>] journal_invalidatepage+0xc6/0xd0
[  720.070383]  [<ffffffff811b7e5d>] ext3_invalidatepage+0x3d/0x40
[  720.070388]  [<ffffffff810ff7a5>] do_invalidatepage+0x25/0x30
[  720.070390]  [<ffffffff810ffa8a>] truncate_inode_page+0x9a/0xb0
[  720.070393]  [<ffffffff810ffd80>] truncate_inode_pages_range+0x160/0x440
[  720.070396]  [<ffffffff811bb950>] ? ext3_delete_inode+0x0/0xf0
[  720.070399]  [<ffffffff81100075>] truncate_inode_pages+0x15/0x20
[  720.070402]  [<ffffffff811bb96d>] ext3_delete_inode+0x1d/0xf0
[  720.070405]  [<ffffffff811bb950>] ? ext3_delete_inode+0x0/0xf0
[  720.070409]  [<ffffffff8115cac3>] generic_delete_inode+0xe3/0x1c0
[  720.070412]  [<ffffffff8115cc05>] generic_drop_inode+0x65/0x80
[  720.070415]  [<ffffffff8115b592>] iput+0x62/0x70
[  720.070419]  [<ffffffff81151f82>] do_unlinkat+0x112/0x1d0
[  720.070422]  [<ffffffff811555e5>] ? sys_getdents+0xb5/0xf0
[  720.070425]  [<ffffffff811521b2>] sys_unlinkat+0x22/0x40
[  720.070428]  [<ffffffff810131b2>] system_call_fastpath+0x16/0x1b
[  840.070097] INFO: task flush-202:0:208 blocked for more than 120 seconds.
[  840.070115] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  840.070125] flush-202:0   D ffff88000359ebc0     0   208      2 0x00000000
[  840.070132]  ffff880002d698c0 0000000000000246 0000000000015bc0
0000000000015bc0
[  840.070142]  ffff88007d444890 ffff880002d69fd8 0000000000015bc0
ffff88007d4444d0
[  840.070149]  0000000000015bc0 ffff880002d69fd8 0000000000015bc0
ffff88007d444890
[  840.070156] Call Trace:
[  840.070170]  [<ffffffff8116e5d0>] ? sync_buffer+0x0/0x50
[  840.070177]  [<ffffffff81558bb7>] io_schedule+0x47/0x70
[  840.070182]  [<ffffffff8116e615>] sync_buffer+0x45/0x50
[  840.070187]  [<ffffffff8155908a>] __wait_on_bit_lock+0x5a/0xc0
[  840.070191]  [<ffffffff8116e5d0>] ? sync_buffer+0x0/0x50
[  840.070196]  [<ffffffff8116eb60>] ? end_buffer_async_write+0x0/0x180
[  840.070201]  [<ffffffff81559168>] out_of_line_wait_on_bit_lock+0x78/0x90
[  840.070208]  [<ffffffff810850d0>] ? wake_bit_function+0x0/0x40
[  840.070212]  [<ffffffff8116e796>] __lock_buffer+0x36/0x40
[  840.070217]  [<ffffffff8116f473>] __block_write_full_page+0x383/0x3c0
[  840.070222]  [<ffffffff8116eb60>] ? end_buffer_async_write+0x0/0x180
[  840.070227]  [<ffffffff8116fde0>] block_write_full_page_endio+0xe0/0x120
[  840.070232]  [<ffffffff811b70f0>] ? buffer_unmapped+0x0/0x20
[  840.070237]  [<ffffffff8116fe35>] block_write_full_page+0x15/0x20
[  840.070241]  [<ffffffff811b7bfd>] ext3_ordered_writepage+0x1dd/0x200
[  840.070248]  [<ffffffff810fc687>] __writepage+0x17/0x40
[  840.070254]  [<ffffffff810fd83f>] write_cache_pages+0x21f/0x4d0
[  840.070258]  [<ffffffff810fc670>] ? __writepage+0x0/0x40
[  840.070263]  [<ffffffff810fdb14>] generic_writepages+0x24/0x30
[  840.070268]  [<ffffffff810fdb55>] do_writepages+0x35/0x40
[  840.070273]  [<ffffffff81166976>] writeback_single_inode+0xf6/0x3d0
[  840.070279]  [<ffffffff811675de>] writeback_inodes_wb+0x40e/0x5e0
[  840.070283]  [<ffffffff811678ba>] wb_writeback+0x10a/0x1d0
[  840.070289]  [<ffffffff81077719>] ? try_to_del_timer_sync+0x79/0xd0
[  840.070294]  [<ffffffff81558e6d>] ? schedule_timeout+0x19d/0x300
[  840.070298]  [<ffffffff81167bec>] wb_do_writeback+0x18c/0x1a0
[  840.070303]  [<ffffffff81167c53>] bdi_writeback_task+0x53/0xe0
[  840.070308]  [<ffffffff8110f546>] bdi_start_fn+0x86/0x100
[  840.070312]  [<ffffffff8110f4c0>] ? bdi_start_fn+0x0/0x100
[  840.070317]  [<ffffffff81084d16>] kthread+0x96/0xa0
[  840.070322]  [<ffffffff810141ea>] child_rip+0xa/0x20
[  840.070329]  [<ffffffff810133d1>] ? int_ret_from_sys_call+0x7/0x1b
[  840.070333]  [<ffffffff81013b5d>] ? retint_restore_args+0x5/0x6
[  840.070337]  [<ffffffff810141e0>] ? child_rip+0x0/0x20
[  840.070346] INFO: task kjournald:1011 blocked for more than 120 seconds.
[  840.070354] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  840.070363] kjournald     D ffff8800035bcbc0     0  1011      2 0x00000000
[  840.070378]  ffff880002dc1a30 0000000000000246 0000000000015bc0
0000000000015bc0
[  840.070385]  ffff8800574db1a0 ffff880002dc1fd8 0000000000015bc0
ffff8800574dade0
[  840.070392]  0000000000015bc0 ffff880002dc1fd8 0000000000015bc0
ffff8800574db1a0
[  840.070399] Call Trace:
[  840.070404]  [<ffffffff81558bb7>] io_schedule+0x47/0x70
[  840.070409]  [<ffffffff812a095e>] get_request_wait+0xce/0x1a0
[  840.070414]  [<ffffffff81085090>] ? autoremove_wake_function+0x0/0x40
[  840.070418]  [<ffffffff812a1045>] __make_request+0x95/0x4a0
[  840.070423]  [<ffffffff810f6595>] ? mempool_alloc_slab+0x15/0x20
[  840.070428]  [<ffffffff8100f37f>] ? xen_restore_fl_direct_end+0x0/0x1
[  840.070434]  [<ffffffff8129f7a1>] generic_make_request+0x1b1/0x4f0
[  840.070438]  [<ffffffff810f6595>] ? mempool_alloc_slab+0x15/0x20
[  840.070443]  [<ffffffff8129fb60>] submit_bio+0x80/0x110
[  840.070448]  [<ffffffff8116d399>] submit_bh+0xf9/0x140
[  840.070463]  [<ffffffff8121495c>] journal_submit_data_buffers+0x24c/0x320
[  840.070466]  [<ffffffff81214d02>] journal_commit_transaction+0x2d2/0xe40
[  840.070470]  [<ffffffff8100f37f>] ? xen_restore_fl_direct_end+0x0/0x1
[  840.070473]  [<ffffffff8155ab5e>] ? _spin_unlock_irqrestore+0x1e/0x30
[  840.070476]  [<ffffffff81077719>] ? try_to_del_timer_sync+0x79/0xd0
[  840.070479]  [<ffffffff8100f37f>] ? xen_restore_fl_direct_end+0x0/0x1
[  840.070482]  [<ffffffff8155ab5e>] ? _spin_unlock_irqrestore+0x1e/0x30
[  840.070485]  [<ffffffff81218c1d>] kjournald+0xed/0x250
[  840.070488]  [<ffffffff81085090>] ? autoremove_wake_function+0x0/0x40
[  840.070491]  [<ffffffff81218b30>] ? kjournald+0x0/0x250
[  840.070494]  [<ffffffff81084d16>] kthread+0x96/0xa0
[  840.070497]  [<ffffffff810141ea>] child_rip+0xa/0x20
[  840.070500]  [<ffffffff810133d1>] ? int_ret_from_sys_call+0x7/0x1b
[  840.070503]  [<ffffffff81013b5d>] ? retint_restore_args+0x5/0x6
[  840.070506]  [<ffffffff810141e0>] ? child_rip+0x0/0x20
[  840.070508] INFO: task rm:1455 blocked for more than 120 seconds.
[  840.070514] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  840.070519] rm            D ffff8800035dabc0     0  1455    980 0x00000000
[  840.070524]  ffff880058bedb68 0000000000000286 0000000000015bc0
0000000000015bc0
[  840.070528]  ffff880002d39ab0 ffff880058bedfd8 0000000000015bc0
ffff880002d396f0
[  840.070533]  0000000000015bc0 ffff880058bedfd8 0000000000015bc0
ffff880002d39ab0
[  840.070538] Call Trace:
[  840.070541]  [<ffffffff8116e5d0>] ? sync_buffer+0x0/0x50
[  840.070544]  [<ffffffff81558bb7>] io_schedule+0x47/0x70
[  840.070547]  [<ffffffff8116e615>] sync_buffer+0x45/0x50
[  840.070550]  [<ffffffff8155908a>] __wait_on_bit_lock+0x5a/0xc0
[  840.070553]  [<ffffffff8116e5d0>] ? sync_buffer+0x0/0x50
[  840.070556]  [<ffffffff81559168>] out_of_line_wait_on_bit_lock+0x78/0x90
[  840.070560]  [<ffffffff810850d0>] ? wake_bit_function+0x0/0x40
[  840.070563]  [<ffffffff8116d259>] ? try_to_free_buffers+0x79/0xc0
[  840.070566]  [<ffffffff8116e796>] __lock_buffer+0x36/0x40
[  840.070569]  [<ffffffff812136d6>] journal_invalidatepage+0xc6/0xd0
[  840.070572]  [<ffffffff811b7e5d>] ext3_invalidatepage+0x3d/0x40
[  840.070575]  [<ffffffff810ff7a5>] do_invalidatepage+0x25/0x30
[  840.070578]  [<ffffffff810ffa8a>] truncate_inode_page+0x9a/0xb0
[  840.070581]  [<ffffffff810ffd80>] truncate_inode_pages_range+0x160/0x440
[  840.070584]  [<ffffffff811bb950>] ? ext3_delete_inode+0x0/0xf0
[  840.070587]  [<ffffffff81100075>] truncate_inode_pages+0x15/0x20
[  840.070590]  [<ffffffff811bb96d>] ext3_delete_inode+0x1d/0xf0
[  840.070593]  [<ffffffff811bb950>] ? ext3_delete_inode+0x0/0xf0
[  840.070597]  [<ffffffff8115cac3>] generic_delete_inode+0xe3/0x1c0
[  840.070600]  [<ffffffff8115cc05>] generic_drop_inode+0x65/0x80
[  840.070603]  [<ffffffff8115b592>] iput+0x62/0x70
[  840.070606]  [<ffffffff81151f82>] do_unlinkat+0x112/0x1d0
[  840.070610]  [<ffffffff811555e5>] ? sys_getdents+0xb5/0xf0
[  840.070613]  [<ffffffff811521b2>] sys_unlinkat+0x22/0x40
[  840.070616]  [<ffffffff810131b2>] system_call_fastpath+0x16/0x1b


* Re: I/O scheduler deadlocks on Xen virtual block devices
  2010-09-01 14:17 I/O scheduler deadlocks on Xen virtual block devices Doug Neal
@ 2010-09-01 17:33 ` Jeff Moyer
  2010-09-02  8:34   ` Doug Neal
  2010-09-03 12:22 ` Pasi Kärkkäinen
  1 sibling, 1 reply; 10+ messages in thread
From: Jeff Moyer @ 2010-09-01 17:33 UTC (permalink / raw)
  To: Doug Neal; +Cc: linux-kernel

Doug Neal <dneallkml@gmail.com> writes:

> Hello all,
>
> I believe I have found a bug in the I/O scheduler which results in
> access to a block device being blocked indefinitely.
>
> The setup:
>  * Xen dom0 version 3.4.2.
>  * domU: Ubuntu 10.04, x86_64, with kernel 2.6.32.15+drm33.5.
>  * Paravirtual disks and network interfaces.
>  * Root filesystem on /dev/xvdb1
>  * A scratch filesystem for the purposes of my tests on /dev/xvda
> mounted on /mnt
>  * Both filesystems are ext3, formatted and mounted with defaults
>  * XVBDs are backed by LVM on top of an iSCSI SAN in the dom0.
>
>
> Activities leading up to the incident:
>
> To reproduce the bug, I run the VM on a Xen host which has a moderate
> workload of other VMs. (The bug seems to manifest more readily under
> load than when the host is otherwise idle.)
>
> I repeatedly rsync the contents of a Linux install CD to an arbitrary
> location on the scratch filesystem, e.g. /mnt/test, then rm -rf the
> lot, and rsync again. It can sometimes take a few iterations before
> the bug is triggered. Sometimes it's triggered on the rsync, sometimes
> on the rm.
>
> At some point during either the rsync or the rm, all the processes
> accessing /dev/xvda (rsync, kjournald, flush) become stuck in the D
> state. After 120 seconds the warnings start to appear in dmesg:
>
>   [  840.070508] INFO: task rm:1455 blocked for more than 120 seconds.
>   [  840.070514] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
>
> Full dmesg output is below, including call traces which show that each
> of these processes is stuck in io_schedule, reached either directly via
> get_request_wait or via sync_buffer.
>
>
> I believe I have eliminated:
>  * Problems with the underlying physical device
>    - The same bug has manifested itself on two completely separate
> sets of hardware, with different servers, switches, and SANs.
>
>  * Problems with the host's iSCSI initiator
>    - Other VMs depending on the same iSCSI session are unaffected
>    - Other XVBDs within the same VM (in this case: /dev/xvdb1 mounted
> on /) are unaffected
>
> Things I've tried:
>  * Noop, deadline, cfq and anticipatory elevators.

Did you try these different I/O schedulers in the domU or on the dom0?
Does switching I/O schedulers in either place make the problem go away
when it happens?

Cheers,
Jeff


* Re: I/O scheduler deadlocks on Xen virtual block devices
  2010-09-01 17:33 ` Jeff Moyer
@ 2010-09-02  8:34   ` Doug Neal
  2010-09-02 13:47     ` Jeff Moyer
  0 siblings, 1 reply; 10+ messages in thread
From: Doug Neal @ 2010-09-02  8:34 UTC (permalink / raw)
  To: Jeff Moyer, linux-kernel

>
> Did you try these different I/O schedulers in the domU or on the dom0?
> Does switching I/O schedulers in either place make the problem go away
> when it happens?
>

In the domU, and the bug was present in all cases. The dom0 was using
cfq. I'll run the tests again using each scheduler in the dom0 with
domU set to noop and report back.

Thanks,

Doug


* Re: I/O scheduler deadlocks on Xen virtual block devices
  2010-09-02  8:34   ` Doug Neal
@ 2010-09-02 13:47     ` Jeff Moyer
  2010-09-02 14:06       ` Doug Neal
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff Moyer @ 2010-09-02 13:47 UTC (permalink / raw)
  To: Doug Neal; +Cc: linux-kernel

Doug Neal <dneallkml@gmail.com> writes:

>>
>> Did you try these different I/O schedulers in the domU or on the dom0?
>> Does switching I/O schedulers in either place make the problem go away
>> when it happens?
>>
>
> In the domU, and the bug was present in all cases. The dom0 was using
> cfq. I'll run the tests again using each scheduler in the dom0 with
> domU set to noop and report back.

While I think this is an interesting test, you need only test one I/O
scheduler in the dom0.  Also, I think you misunderstood the second
question.  I'd like to know if switching I/O schedulers while the system
is in this bad state helps get I/O going again.

Cheers,
Jeff


* Re: I/O scheduler deadlocks on Xen virtual block devices
  2010-09-02 13:47     ` Jeff Moyer
@ 2010-09-02 14:06       ` Doug Neal
  2010-09-02 15:25         ` Jeff Moyer
  0 siblings, 1 reply; 10+ messages in thread
From: Doug Neal @ 2010-09-02 14:06 UTC (permalink / raw)
  To: Jeff Moyer, linux-kernel

>>>
>>> Did you try these different I/O schedulers in the domU or on the dom0?
>>> Does switching I/O schedulers in either place make the problem go away
>>> when it happens?
>>>
>>
>> In the domU, and the bug was present in all cases. The dom0 was using
>> cfq. I'll run the tests again using each scheduler in the dom0 with
>> domU set to noop and report back.
>
> While I think this is an interesting test, you need only test one I/O
> scheduler in the dom0.  Also, I think you misunderstood the second
> question.  I'd like to know if switching I/O schedulers while the system
> is in this bad state helps get I/O going again.
>
Jeff,

Issue confirmed in the following cases:

dom0:domU
noop:noop
noop:deadline
noop:anticipatory
cfq:noop
cfq:deadline
cfq:anticipatory
cfq:cfq

If I try switching schedulers in the domU during the lockup, the echo x >
/sys/block/xvda/queue/scheduler command also hangs. Switching schedulers
in the dom0 has no effect.

Cheers,

Doug


* Re: I/O scheduler deadlocks on Xen virtual block devices
  2010-09-02 14:06       ` Doug Neal
@ 2010-09-02 15:25         ` Jeff Moyer
  2010-09-02 15:43           ` Doug Neal
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff Moyer @ 2010-09-02 15:25 UTC (permalink / raw)
  To: Doug Neal; +Cc: linux-kernel

Doug Neal <dneallkml@gmail.com> writes:

>>>>
>>>> Did you try these different I/O schedulers in the domU or on the dom0?
>>>> Does switching I/O schedulers in either place make the problem go away
>>>> when it happens?
>>>>
>>>
>>> In the domU, and the bug was present in all cases. The dom0 was using
>>> cfq. I'll run the tests again using each scheduler in the dom0 with
>>> domU set to noop and report back.
>>
>> While I think this is an interesting test, you need only test one I/O
>> scheduler in the dom0.  Also, I think you misunderstood the second
>> question.  I'd like to know if switching I/O schedulers while the system
>> is in this bad state helps get I/O going again.
>>
> Jeff,
>
> Issue confirmed in the following cases:
>
> dom0:domU
> noop:noop
> noop:deadline
> noop:anticipatory
> cfq:noop
> cfq:deadline
> cfq:anticipatory
> cfq:cfq
>
> If I try switching schedulers in the domU during the lockup, the echo x >
> /sys/block/xvda/queue/scheduler command also hangs. Switching schedulers
> in the dom0 has no effect.

OK.  So, this means that the I/O is not hung up in the I/O scheduler.  I
don't have a whole lot of experience with the Xen I/O stack, but I think
that's where I'd start digging (especially if there's nothing in the
logs about command timed out).

I hope this was a somewhat helpful exercise.  Sorry I can't be of more
help.

Cheers!
Jeff


* Re: I/O scheduler deadlocks on Xen virtual block devices
  2010-09-02 15:25         ` Jeff Moyer
@ 2010-09-02 15:43           ` Doug Neal
  0 siblings, 0 replies; 10+ messages in thread
From: Doug Neal @ 2010-09-02 15:43 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: linux-kernel

On Thu, Sep 2, 2010 at 4:25 PM, Jeff Moyer <jmoyer@redhat.com> wrote:
> Doug Neal <dneallkml@gmail.com> writes:
>
>>>>>
>>>>> Did you try these different I/O schedulers in the domU or on the dom0?
>>>>> Does switching I/O schedulers in either place make the problem go away
>>>>> when it happens?
>>>>>
>>>>
>>>> In the domU, and the bug was present in all cases. The dom0 was using
>>>> cfq. I'll run the tests again using each scheduler in the dom0 with
>>>> domU set to noop and report back.
>>>
>>> While I think this is an interesting test, you need only test one I/O
>>> scheduler in the dom0.  Also, I think you misunderstood the second
>>> question.  I'd like to know if switching I/O schedulers while the system
>>> is in this bad state helps get I/O going again.
>>>
>> Jeff,
>>
>> Issue confirmed in the following cases:
>>
>> dom0:domU
>> noop:noop
>> noop:deadline
>> noop:anticipatory
>> cfq:noop
>> cfq:deadline
>> cfq:anticipatory
>> cfq:cfq
>>
>> If I try switching schedulers in the domU during the lockup, the echo x >
>> /sys/block/xvda/queue/scheduler command also hangs. Switching schedulers
>> in the dom0 has no effect.
>
> OK.  So, this means that the I/O is not hung up in the I/O scheduler.  I
> don't have a whole lot of experience with the Xen I/O stack, but I think
> that's where I'd start digging (especially if there's nothing in the
> logs about command timed out).
>
> I hope this was a somewhat helpful exercise.  Sorry I can't be of more
> help.
>
> Cheers!
> Jeff
>

No problem, thanks for your assistance.


Doug


* Re: I/O scheduler deadlocks on Xen virtual block devices
  2010-09-01 14:17 I/O scheduler deadlocks on Xen virtual block devices Doug Neal
  2010-09-01 17:33 ` Jeff Moyer
@ 2010-09-03 12:22 ` Pasi Kärkkäinen
  2010-09-03 15:17   ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 10+ messages in thread
From: Pasi Kärkkäinen @ 2010-09-03 12:22 UTC (permalink / raw)
  To: Doug Neal; +Cc: linux-kernel, Jeremy Fitzhardinge

On Wed, Sep 01, 2010 at 03:17:25PM +0100, Doug Neal wrote:
> Hello all,
> 
> I believe I have found a bug in the I/O scheduler which results in
> access to a block device being blocked indefinitely.
> 
> The setup:
>  * Xen dom0 version 3.4.2.
>  * domU: Ubuntu 10.04, x86_64, with kernel 2.6.32.15+drm33.5.
>  * Paravirtual disks and network interfaces.
>  * Root filesystem on /dev/xvdb1
>  * A scratch filesystem for the purposes of my tests on /dev/xvda
> mounted on /mnt
>  * Both filesystems are ext3, formatted and mounted with defaults
>  * XVBDs are backed by LVM on top of an iSCSI SAN in the dom0.
> 

Hmm.. recently there were some patches to the Xen block device frontend
driver (sent to lkml). Maybe you should try those and see if they make a
difference?

Jeremy: What do you think? 

-- Pasi

> [rest of original message and dmesg output snipped]


* Re: I/O scheduler deadlocks on Xen virtual block devices
  2010-09-03 12:22 ` Pasi Kärkkäinen
@ 2010-09-03 15:17   ` Jeremy Fitzhardinge
  2010-09-06 12:44     ` Doug Neal
  0 siblings, 1 reply; 10+ messages in thread
From: Jeremy Fitzhardinge @ 2010-09-03 15:17 UTC (permalink / raw)
  To: Pasi Kärkkäinen; +Cc: Doug Neal, linux-kernel

 On 09/03/2010 05:22 AM, Pasi Kärkkäinen wrote:
> On Wed, Sep 01, 2010 at 03:17:25PM +0100, Doug Neal wrote:
>> Hello all,
>>
>> I believe I have found a bug in the I/O scheduler which results in
>> access to a block device being blocked indefinitely.
>>
>> The setup:
>>  * Xen dom0 version 3.4.2.
>>  * domU: Ubuntu 10.04, x86_64, with kernel 2.6.32.15+drm33.5.
>>  * Paravirtual disks and network interfaces.
>>  * Root filesystem on /dev/xvdb1
>>  * A scratch filesystem for the purposes of my tests on /dev/xvda
>> mounted on /mnt
>>  * Both filesystems are ext3, formatted and mounted with defaults
>>  * XVBDs are backed by LVM on top of an iSCSI SAN in the dom0.
>>
> Hmm.. recently there were some patches to the Xen block device frontend
> driver (sent to lkml). Maybe you should try those and see if they make a
> difference?
>
> Jeremy: What do you think? 

I think this is more likely to be related to the lost interrupt bug
which I sent fixes up for last week ("xen: use percpu interrupts for
IPIs and VIRQs" and "xen: handle events as edge-triggered").

Doug, as a workaround, try disabling irqbalanced.
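
On an Ubuntu 10.04 guest that should be something along these lines (the
exact service name may differ):

  # stop the running daemon and keep it from starting at boot
  /etc/init.d/irqbalance stop
  update-rc.d -f irqbalance remove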

Thanks,
    J


> [rest of quoted thread snipped]



* Re: I/O scheduler deadlocks on Xen virtual block devices
  2010-09-03 15:17   ` Jeremy Fitzhardinge
@ 2010-09-06 12:44     ` Doug Neal
  0 siblings, 0 replies; 10+ messages in thread
From: Doug Neal @ 2010-09-06 12:44 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Pasi Kärkkäinen, linux-kernel

On Fri, Sep 3, 2010 at 4:17 PM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>  On 09/03/2010 05:22 AM, Pasi Kärkkäinen wrote:
>> On Wed, Sep 01, 2010 at 03:17:25PM +0100, Doug Neal wrote:
>>> Hello all,
>>>
>>> I believe I have found a bug in the I/O scheduler which results in
>>> access to a block device being blocked indefinitely.
>>>
>>> The setup:
>>>  * Xen dom0 version 3.4.2.
>>>  * domU: Ubuntu 10.04, x86_64, with kernel 2.6.32.15+drm33.5.
>>>  * Paravirtual disks and network interfaces.
>>>  * Root filesystem on /dev/xvdb1
>>>  * A scratch filesystem for the purposes of my tests on /dev/xvda
>>> mounted on /mnt
>>>  * Both filesystems are ext3, formatted and mounted with defaults
>>>  * XVBDs are backed by LVM on top of an iSCSI SAN in the dom0.
>>>
>> Hmm.. recently there were some patches to the Xen block device frontend
>> driver (sent to lkml). Maybe you should try those and see if they make a
>> difference?
>>
>> Jeremy: What do you think?
>
> I think this is more likely to be related to the lost interrupt bug
> which I sent fixes up for last week ("xen: use percpu interrupts for
> IPIs and VIRQs" and "xen: handle events as edge-triggered").
>
> Doug, as a workaround, try disabling irqbalanced.
>
> Thanks,
>    J

Jeremy,

That seems to be doing the trick - thanks! Will see about trying the
patches as well.

Doug


Thread overview: 10+ messages
2010-09-01 14:17 I/O scheduler deadlocks on Xen virtual block devices Doug Neal
2010-09-01 17:33 ` Jeff Moyer
2010-09-02  8:34   ` Doug Neal
2010-09-02 13:47     ` Jeff Moyer
2010-09-02 14:06       ` Doug Neal
2010-09-02 15:25         ` Jeff Moyer
2010-09-02 15:43           ` Doug Neal
2010-09-03 12:22 ` Pasi Kärkkäinen
2010-09-03 15:17   ` Jeremy Fitzhardinge
2010-09-06 12:44     ` Doug Neal
