I/O scheduler deadlocks on Xen virtual block devices

* I/O scheduler deadlocks on Xen virtual block devices
@ 2010-09-01 14:17 Doug Neal
  2010-09-01 17:33 ` Jeff Moyer
  2010-09-03 12:22 ` Pasi Kärkkäinen
  0 siblings, 2 replies; 10+ messages in thread
From: Doug Neal @ 2010-09-01 14:17 UTC (permalink / raw)
  To: linux-kernel

Hello all,

I believe I have found a bug in the I/O scheduler which results in
access to a block device being blocked indefinitely.

The setup:
 * Xen dom0 version 3.4.2.
 * domU: Ubuntu 10.04, x86_64, with kernel 2.6.32.15+drm33.5.
 * Paravirtual disks and network interfaces.
 * Root filesystem on /dev/xvdb1
 * A scratch filesystem for the purposes of my tests on /dev/xvda
mounted on /mnt
 * Both filesystems are ext3, formatted and mounted with defaults
 * XVBDs are backed by LVM on top of an iSCSI SAN in the dom0.

Activities leading up to the incident:

To reproduce the bug, I run the VM on a Xen host which has a moderate
workload of other VMs. (It seems to manifest itself more readily than
if the host is otherwise idle).

I repeatedly rsync the contents of a Linux install CD to an arbitrary
location on the scratch filesystem, e.g. /mnt/test, then rm -rf the
lot, and rsync again. It can sometimes take a few iterations before
the bug is triggered. Sometimes it's triggered on the rsync, sometimes
on the rm.

At some point during either the rsync or the rm, all the processes
accessing /dev/xvda (rsync, kjournald, flush) become stuck in the D
state. After 120 seconds the warnings start to appear in dmesg:

  [  840.070508] INFO: task rm:1455 blocked for more than 120 seconds.
  [  840.070514] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.

Full dmesg output is below, including call traces which show that each
of these processes is stuck in io_schedule or sync_buffer called from
io_schedule.

I believe I have eliminated:
 * Problems with the underlying physical device
   - The same bug has manifested itself on two completely separate
sets of hardware, with different servers, switches, and SANs.

 * Problems with the host's iSCSI initiator
   - Other VMs depending on the same iSCSI session are unaffected
   - Other XVBDs within the same VM (in this case: /dev/xvdb1 mounted
on /) are unaffected

Things I've tried:
 * Noop, deadline, cfq and anticipatory elevators.
 * Reducing the VM's vCPU count to 1. I _wasn't_ able to trigger the
bug under only one vCPU.

Unfortunately the steps to reproduce are rather long winded once you
factor in setting up the Xen dom0 which is in this case Citrix
XenServer, so if there is any way I can help with that please let me
know, e.g. by arranging remote access to my existing test environment
or by supplying a VM template image. However I don't believe this
problem is limited to this environment as I have seen quite a few
reports of similar or identical issues elsewhere (but not on LKML...):

http://lists.xensource.com/archives/html/xen-devel/2010-06/msg00590.html
- same issue reported with a non-commercial and more recent Xen
version than mine. Also links to three Redhat bug reports showing
similar behaviour.

Thanks for your time,

Doug

[  720.070103] INFO: task kjournald:1011 blocked for more than 120 seconds.
[  720.070121] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  720.070131] kjournald     D ffff8800035bcbc0     0  1011      2 0x00000000
[  720.070140]  ffff880002dc1a30 0000000000000246 0000000000015bc0
0000000000015bc0
[  720.070149]  ffff8800574db1a0 ffff880002dc1fd8 0000000000015bc0
ffff8800574dade0
[  720.070156]  0000000000015bc0 ffff880002dc1fd8 0000000000015bc0
ffff8800574db1a0
[  720.070164] Call Trace:
[  720.070177]  [<ffffffff81558bb7>] io_schedule+0x47/0x70
[  720.070184]  [<ffffffff812a095e>] get_request_wait+0xce/0x1a0
[  720.070191]  [<ffffffff81085090>] ? autoremove_wake_function+0x0/0x40
[  720.070195]  [<ffffffff812a1045>] __make_request+0x95/0x4a0
[  720.070201]  [<ffffffff810f6595>] ? mempool_alloc_slab+0x15/0x20
[  720.070208]  [<ffffffff8100f37f>] ? xen_restore_fl_direct_end+0x0/0x1
[  720.070213]  [<ffffffff8129f7a1>] generic_make_request+0x1b1/0x4f0
[  720.070218]  [<ffffffff810f6595>] ? mempool_alloc_slab+0x15/0x20
[  720.070223]  [<ffffffff8129fb60>] submit_bio+0x80/0x110
[  720.070229]  [<ffffffff8116d399>] submit_bh+0xf9/0x140
[  720.070235]  [<ffffffff8121495c>] journal_submit_data_buffers+0x24c/0x320
[  720.070240]  [<ffffffff81214d02>] journal_commit_transaction+0x2d2/0xe40
[  720.070245]  [<ffffffff8100f37f>] ? xen_restore_fl_direct_end+0x0/0x1
[  720.070250]  [<ffffffff8155ab5e>] ? _spin_unlock_irqrestore+0x1e/0x30
[  720.070256]  [<ffffffff81077719>] ? try_to_del_timer_sync+0x79/0xd0
[  720.070261]  [<ffffffff8100f37f>] ? xen_restore_fl_direct_end+0x0/0x1
[  720.070265]  [<ffffffff8155ab5e>] ? _spin_unlock_irqrestore+0x1e/0x30
[  720.070270]  [<ffffffff81218c1d>] kjournald+0xed/0x250
[  720.070275]  [<ffffffff81085090>] ? autoremove_wake_function+0x0/0x40
[  720.070279]  [<ffffffff81218b30>] ? kjournald+0x0/0x250
[  720.070284]  [<ffffffff81084d16>] kthread+0x96/0xa0
[  720.070289]  [<ffffffff810141ea>] child_rip+0xa/0x20
[  720.070295]  [<ffffffff810133d1>] ? int_ret_from_sys_call+0x7/0x1b
[  720.070299]  [<ffffffff81013b5d>] ? retint_restore_args+0x5/0x6
[  720.070313]  [<ffffffff810141e0>] ? child_rip+0x0/0x20
[  720.070317] INFO: task rm:1455 blocked for more than 120 seconds.
[  720.070323] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  720.070329] rm            D ffff8800035dabc0     0  1455    980 0x00000000
[  720.070333]  ffff880058bedb68 0000000000000286 0000000000015bc0
0000000000015bc0
[  720.070338]  ffff880002d39ab0 ffff880058bedfd8 0000000000015bc0
ffff880002d396f0
[  720.070343]  0000000000015bc0 ffff880058bedfd8 0000000000015bc0
ffff880002d39ab0
[  720.070348] Call Trace:
[  720.070351]  [<ffffffff8116e5d0>] ? sync_buffer+0x0/0x50
[  720.070354]  [<ffffffff81558bb7>] io_schedule+0x47/0x70
[  720.070357]  [<ffffffff8116e615>] sync_buffer+0x45/0x50
[  720.070360]  [<ffffffff8155908a>] __wait_on_bit_lock+0x5a/0xc0
[  720.070363]  [<ffffffff8116e5d0>] ? sync_buffer+0x0/0x50
[  720.070366]  [<ffffffff81559168>] out_of_line_wait_on_bit_lock+0x78/0x90
[  720.070369]  [<ffffffff810850d0>] ? wake_bit_function+0x0/0x40
[  720.070373]  [<ffffffff8116d259>] ? try_to_free_buffers+0x79/0xc0
[  720.070376]  [<ffffffff8116e796>] __lock_buffer+0x36/0x40
[  720.070379]  [<ffffffff812136d6>] journal_invalidatepage+0xc6/0xd0
[  720.070383]  [<ffffffff811b7e5d>] ext3_invalidatepage+0x3d/0x40
[  720.070388]  [<ffffffff810ff7a5>] do_invalidatepage+0x25/0x30
[  720.070390]  [<ffffffff810ffa8a>] truncate_inode_page+0x9a/0xb0
[  720.070393]  [<ffffffff810ffd80>] truncate_inode_pages_range+0x160/0x440
[  720.070396]  [<ffffffff811bb950>] ? ext3_delete_inode+0x0/0xf0
[  720.070399]  [<ffffffff81100075>] truncate_inode_pages+0x15/0x20
[  720.070402]  [<ffffffff811bb96d>] ext3_delete_inode+0x1d/0xf0
[  720.070405]  [<ffffffff811bb950>] ? ext3_delete_inode+0x0/0xf0
[  720.070409]  [<ffffffff8115cac3>] generic_delete_inode+0xe3/0x1c0
[  720.070412]  [<ffffffff8115cc05>] generic_drop_inode+0x65/0x80
[  720.070415]  [<ffffffff8115b592>] iput+0x62/0x70
[  720.070419]  [<ffffffff81151f82>] do_unlinkat+0x112/0x1d0
[  720.070422]  [<ffffffff811555e5>] ? sys_getdents+0xb5/0xf0
[  720.070425]  [<ffffffff811521b2>] sys_unlinkat+0x22/0x40
[  720.070428]  [<ffffffff810131b2>] system_call_fastpath+0x16/0x1b
[  840.070097] INFO: task flush-202:0:208 blocked for more than 120 seconds.
[  840.070115] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  840.070125] flush-202:0   D ffff88000359ebc0     0   208      2 0x00000000
[  840.070132]  ffff880002d698c0 0000000000000246 0000000000015bc0
0000000000015bc0
[  840.070142]  ffff88007d444890 ffff880002d69fd8 0000000000015bc0
ffff88007d4444d0
[  840.070149]  0000000000015bc0 ffff880002d69fd8 0000000000015bc0
ffff88007d444890
[  840.070156] Call Trace:
[  840.070170]  [<ffffffff8116e5d0>] ? sync_buffer+0x0/0x50
[  840.070177]  [<ffffffff81558bb7>] io_schedule+0x47/0x70
[  840.070182]  [<ffffffff8116e615>] sync_buffer+0x45/0x50
[  840.070187]  [<ffffffff8155908a>] __wait_on_bit_lock+0x5a/0xc0
[  840.070191]  [<ffffffff8116e5d0>] ? sync_buffer+0x0/0x50
[  840.070196]  [<ffffffff8116eb60>] ? end_buffer_async_write+0x0/0x180
[  840.070201]  [<ffffffff81559168>] out_of_line_wait_on_bit_lock+0x78/0x90
[  840.070208]  [<ffffffff810850d0>] ? wake_bit_function+0x0/0x40
[  840.070212]  [<ffffffff8116e796>] __lock_buffer+0x36/0x40
[  840.070217]  [<ffffffff8116f473>] __block_write_full_page+0x383/0x3c0
[  840.070222]  [<ffffffff8116eb60>] ? end_buffer_async_write+0x0/0x180
[  840.070227]  [<ffffffff8116fde0>] block_write_full_page_endio+0xe0/0x120
[  840.070232]  [<ffffffff811b70f0>] ? buffer_unmapped+0x0/0x20
[  840.070237]  [<ffffffff8116fe35>] block_write_full_page+0x15/0x20
[  840.070241]  [<ffffffff811b7bfd>] ext3_ordered_writepage+0x1dd/0x200
[  840.070248]  [<ffffffff810fc687>] __writepage+0x17/0x40
[  840.070254]  [<ffffffff810fd83f>] write_cache_pages+0x21f/0x4d0
[  840.070258]  [<ffffffff810fc670>] ? __writepage+0x0/0x40
[  840.070263]  [<ffffffff810fdb14>] generic_writepages+0x24/0x30
[  840.070268]  [<ffffffff810fdb55>] do_writepages+0x35/0x40
[  840.070273]  [<ffffffff81166976>] writeback_single_inode+0xf6/0x3d0
[  840.070279]  [<ffffffff811675de>] writeback_inodes_wb+0x40e/0x5e0
[  840.070283]  [<ffffffff811678ba>] wb_writeback+0x10a/0x1d0
[  840.070289]  [<ffffffff81077719>] ? try_to_del_timer_sync+0x79/0xd0
[  840.070294]  [<ffffffff81558e6d>] ? schedule_timeout+0x19d/0x300
[  840.070298]  [<ffffffff81167bec>] wb_do_writeback+0x18c/0x1a0
[  840.070303]  [<ffffffff81167c53>] bdi_writeback_task+0x53/0xe0
[  840.070308]  [<ffffffff8110f546>] bdi_start_fn+0x86/0x100
[  840.070312]  [<ffffffff8110f4c0>] ? bdi_start_fn+0x0/0x100
[  840.070317]  [<ffffffff81084d16>] kthread+0x96/0xa0
[  840.070322]  [<ffffffff810141ea>] child_rip+0xa/0x20
[  840.070329]  [<ffffffff810133d1>] ? int_ret_from_sys_call+0x7/0x1b
[  840.070333]  [<ffffffff81013b5d>] ? retint_restore_args+0x5/0x6
[  840.070337]  [<ffffffff810141e0>] ? child_rip+0x0/0x20
[  840.070346] INFO: task kjournald:1011 blocked for more than 120 seconds.
[  840.070354] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  840.070363] kjournald     D ffff8800035bcbc0     0  1011      2 0x00000000
[  840.070378]  ffff880002dc1a30 0000000000000246 0000000000015bc0
0000000000015bc0
[  840.070385]  ffff8800574db1a0 ffff880002dc1fd8 0000000000015bc0
ffff8800574dade0
[  840.070392]  0000000000015bc0 ffff880002dc1fd8 0000000000015bc0
ffff8800574db1a0
[  840.070399] Call Trace:
[  840.070404]  [<ffffffff81558bb7>] io_schedule+0x47/0x70
[  840.070409]  [<ffffffff812a095e>] get_request_wait+0xce/0x1a0
[  840.070414]  [<ffffffff81085090>] ? autoremove_wake_function+0x0/0x40
[  840.070418]  [<ffffffff812a1045>] __make_request+0x95/0x4a0
[  840.070423]  [<ffffffff810f6595>] ? mempool_alloc_slab+0x15/0x20
[  840.070428]  [<ffffffff8100f37f>] ? xen_restore_fl_direct_end+0x0/0x1
[  840.070434]  [<ffffffff8129f7a1>] generic_make_request+0x1b1/0x4f0
[  840.070438]  [<ffffffff810f6595>] ? mempool_alloc_slab+0x15/0x20
[  840.070443]  [<ffffffff8129fb60>] submit_bio+0x80/0x110
[  840.070448]  [<ffffffff8116d399>] submit_bh+0xf9/0x140
[  840.070463]  [<ffffffff8121495c>] journal_submit_data_buffers+0x24c/0x320
[  840.070466]  [<ffffffff81214d02>] journal_commit_transaction+0x2d2/0xe40
[  840.070470]  [<ffffffff8100f37f>] ? xen_restore_fl_direct_end+0x0/0x1
[  840.070473]  [<ffffffff8155ab5e>] ? _spin_unlock_irqrestore+0x1e/0x30
[  840.070476]  [<ffffffff81077719>] ? try_to_del_timer_sync+0x79/0xd0
[  840.070479]  [<ffffffff8100f37f>] ? xen_restore_fl_direct_end+0x0/0x1
[  840.070482]  [<ffffffff8155ab5e>] ? _spin_unlock_irqrestore+0x1e/0x30
[  840.070485]  [<ffffffff81218c1d>] kjournald+0xed/0x250
[  840.070488]  [<ffffffff81085090>] ? autoremove_wake_function+0x0/0x40
[  840.070491]  [<ffffffff81218b30>] ? kjournald+0x0/0x250
[  840.070494]  [<ffffffff81084d16>] kthread+0x96/0xa0
[  840.070497]  [<ffffffff810141ea>] child_rip+0xa/0x20
[  840.070500]  [<ffffffff810133d1>] ? int_ret_from_sys_call+0x7/0x1b
[  840.070503]  [<ffffffff81013b5d>] ? retint_restore_args+0x5/0x6
[  840.070506]  [<ffffffff810141e0>] ? child_rip+0x0/0x20
[  840.070508] INFO: task rm:1455 blocked for more than 120 seconds.
[  840.070514] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  840.070519] rm            D ffff8800035dabc0     0  1455    980 0x00000000
[  840.070524]  ffff880058bedb68 0000000000000286 0000000000015bc0
0000000000015bc0
[  840.070528]  ffff880002d39ab0 ffff880058bedfd8 0000000000015bc0
ffff880002d396f0
[  840.070533]  0000000000015bc0 ffff880058bedfd8 0000000000015bc0
ffff880002d39ab0
[  840.070538] Call Trace:
[  840.070541]  [<ffffffff8116e5d0>] ? sync_buffer+0x0/0x50
[  840.070544]  [<ffffffff81558bb7>] io_schedule+0x47/0x70
[  840.070547]  [<ffffffff8116e615>] sync_buffer+0x45/0x50
[  840.070550]  [<ffffffff8155908a>] __wait_on_bit_lock+0x5a/0xc0
[  840.070553]  [<ffffffff8116e5d0>] ? sync_buffer+0x0/0x50
[  840.070556]  [<ffffffff81559168>] out_of_line_wait_on_bit_lock+0x78/0x90
[  840.070560]  [<ffffffff810850d0>] ? wake_bit_function+0x0/0x40
[  840.070563]  [<ffffffff8116d259>] ? try_to_free_buffers+0x79/0xc0
[  840.070566]  [<ffffffff8116e796>] __lock_buffer+0x36/0x40
[  840.070569]  [<ffffffff812136d6>] journal_invalidatepage+0xc6/0xd0
[  840.070572]  [<ffffffff811b7e5d>] ext3_invalidatepage+0x3d/0x40
[  840.070575]  [<ffffffff810ff7a5>] do_invalidatepage+0x25/0x30
[  840.070578]  [<ffffffff810ffa8a>] truncate_inode_page+0x9a/0xb0
[  840.070581]  [<ffffffff810ffd80>] truncate_inode_pages_range+0x160/0x440
[  840.070584]  [<ffffffff811bb950>] ? ext3_delete_inode+0x0/0xf0
[  840.070587]  [<ffffffff81100075>] truncate_inode_pages+0x15/0x20
[  840.070590]  [<ffffffff811bb96d>] ext3_delete_inode+0x1d/0xf0
[  840.070593]  [<ffffffff811bb950>] ? ext3_delete_inode+0x0/0xf0
[  840.070597]  [<ffffffff8115cac3>] generic_delete_inode+0xe3/0x1c0
[  840.070600]  [<ffffffff8115cc05>] generic_drop_inode+0x65/0x80
[  840.070603]  [<ffffffff8115b592>] iput+0x62/0x70
[  840.070606]  [<ffffffff81151f82>] do_unlinkat+0x112/0x1d0
[  840.070610]  [<ffffffff811555e5>] ? sys_getdents+0xb5/0xf0
[  840.070613]  [<ffffffff811521b2>] sys_unlinkat+0x22/0x40
[  840.070616]  [<ffffffff810131b2>] system_call_fastpath+0x16/0x1b

^ permalink raw reply	[flat|nested] 10+ messages in thread