xen-devel.lists.xenproject.org archive mirror
* dom0 suddenly blocking on all access to md device
@ 2021-02-26 22:39 Andy Smith
  2021-02-26 22:50 ` Andy Smith
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Andy Smith @ 2021-02-26 22:39 UTC (permalink / raw)
  To: xen-devel

Hi,

I suspect this might be an issue in the dom0 kernel (Debian buster,
kernel 4.19.0-13-amd64): just lately I've been sporadically having
issues where dom0 blocks or severely slows down on all access to the
particular md device that hosts all of the domU block devices.

Setup in dom0: an md RAID10 that is used as an LVM PV for an LVM volume
group, where all domU block devices are LVM logical volumes in that
group. So the relevant part of a domU config file might look like:

disk = [ "phy:/dev/myvg/domu_debtest1_xvda,xvda,w",
         "phy:/dev/myvg/domu_debtest1_xvdb,xvdb,w" ]

The guests are mostly PV, a sprinkling of PVH, no HVM.

There are 5 of these servers, but 3 of them have only recently been
upgraded to Xen 4.12.14 (on Debian buster) from Xen 4.10 (on Debian
jessie). The fact that all of them have been pretty stable in the
past, on differing hardware, makes me discount a hardware issue. The
fact that two of them have been buster / 4.12.x for a long time
without issue but are also now starting to see this does make me
think that it's a recent dom0 kernel issue.

When the problem occurs, inside every domU I see things like this:

Feb 26 20:02:34 backup4 kernel: [2530464.736085] INFO: task btrfs-transacti:333 blocked for more than 120 seconds.
Feb 26 20:02:34 backup4 kernel: [2530464.736107]       Not tainted 4.9.0-14-amd64 #1 Debian 4.9.246-2
Feb 26 20:02:34 backup4 kernel: [2530464.736117] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 26 20:02:34 backup4 kernel: [2530464.736131] btrfs-transacti D    0   333      2 0x00000000
Feb 26 20:02:34 backup4 kernel: [2530464.736146]  0000000000000246 ffff8800f4e0c400 0000000000000000 ffff8800f8a7f100
Feb 26 20:02:34 backup4 kernel: [2530464.736168]  ffff8800fad18a00 ffff8800fa7dd000 ffffc90040b2f670 ffffffff8161a979
Feb 26 20:02:34 backup4 kernel: [2530464.736188]  ffff8800fa6d0200 0000000000000000 ffff8800fad18a00 0000000000000010
Feb 26 20:02:34 backup4 kernel: [2530464.736209] Call Trace:
Feb 26 20:02:34 backup4 kernel: [2530464.736223]  [<ffffffff8161a979>] ? __schedule+0x239/0x6f0
Feb 26 20:02:34 backup4 kernel: [2530464.736236]  [<ffffffff8161ae62>] ? schedule+0x32/0x80
Feb 26 20:02:34 backup4 kernel: [2530464.736248]  [<ffffffff8161e1fd>] ? schedule_timeout+0x1dd/0x380
Feb 26 20:02:34 backup4 kernel: [2530464.736263]  [<ffffffff8101c201>] ? xen_clocksource_get_cycles+0x11/0x20
Feb 26 20:02:34 backup4 kernel: [2530464.736275]  [<ffffffff8161a6dd>] ? io_schedule_timeout+0x9d/0x100
Feb 26 20:02:34 backup4 kernel: [2530464.736289]  [<ffffffff81367964>] ? __sbitmap_queue_get+0x24/0x90
Feb 26 20:02:34 backup4 kernel: [2530464.736302]  [<ffffffff81317f60>] ? bt_get.isra.6+0x160/0x220
Feb 26 20:02:34 backup4 kernel: [2530464.736338]  [<ffffffffc0148bf8>] ? __btrfs_map_block+0x6c8/0x11d0 [btrfs]
Feb 26 20:02:34 backup4 kernel: [2530464.736353]  [<ffffffff810bf010>] ? prepare_to_wait_event+0xf0/0xf0
Feb 26 20:02:34 backup4 kernel: [2530464.736364]  [<ffffffff813182d3>] ? blk_mq_get_tag+0x23/0x90
Feb 26 20:02:34 backup4 kernel: [2530464.736377]  [<ffffffff81313b6a>] ? __blk_mq_alloc_request+0x1a/0x220
Feb 26 20:02:34 backup4 kernel: [2530464.736390]  [<ffffffff81314a39>] ? blk_mq_map_request+0xd9/0x170
Feb 26 20:02:34 backup4 kernel: [2530464.736402]  [<ffffffff8131726b>] ? blk_mq_make_request+0xbb/0x580
Feb 26 20:02:34 backup4 kernel: [2530464.736429]  [<ffffffffc0148bf8>] ? __btrfs_map_block+0x6c8/0x11d0 [btrfs]
Feb 26 20:02:34 backup4 kernel: [2530464.736444]  [<ffffffff8130b0f5>] ? generic_make_request+0x115/0x2d0
Feb 26 20:02:34 backup4 kernel: [2530464.736456]  [<ffffffff8130b326>] ? submit_bio+0x76/0x140
Feb 26 20:02:34 backup4 kernel: [2530464.736481]  [<ffffffffc0149d9a>] ? btrfs_map_bio+0x19a/0x340 [btrfs]
Feb 26 20:02:34 backup4 kernel: [2530464.736505]  [<ffffffffc0111635>] ? btree_submit_bio_hook+0xf5/0x110 [btrfs]
Feb 26 20:02:34 backup4 kernel: [2530464.736535]  [<ffffffffc0138318>] ? submit_one_bio+0x68/0x90 [btrfs]
Feb 26 20:02:34 backup4 kernel: [2530464.736561]  [<ffffffffc013fd4d>] ? read_extent_buffer_pages+0x1cd/0x300 [btrfs]
Feb 26 20:02:34 backup4 kernel: [2530464.736587]  [<ffffffffc010fbe0>] ? free_root_pointers+0x60/0x60 [btrfs]
Feb 26 20:02:34 backup4 kernel: [2530464.736609]  [<ffffffffc010ff9c>] ? btree_read_extent_buffer_pages+0x8c/0x100 [btrfs]
Feb 26 20:02:34 backup4 kernel: [2530464.736635]  [<ffffffffc0111814>] ? read_tree_block+0x34/0x50 [btrfs]
Feb 26 20:02:34 backup4 kernel: [2530464.736655]  [<ffffffffc00ef9f3>] ? read_block_for_search.isra.36+0x133/0x320 [btrfs]
Feb 26 20:02:34 backup4 kernel: [2530464.736678]  [<ffffffffc00eabe4>] ? unlock_up+0xd4/0x180 [btrfs]
Feb 26 20:02:34 backup4 kernel: [2530464.736700]  [<ffffffffc00f1b8d>] ? btrfs_search_slot+0x3ad/0xa00 [btrfs]
Feb 26 20:02:34 backup4 kernel: [2530464.736723]  [<ffffffffc00f3a47>] ? btrfs_insert_empty_items+0x67/0xc0 [btrfs]
Feb 26 20:02:34 backup4 kernel: [2530464.736748]  [<ffffffffc00ffe24>] ? __btrfs_run_delayed_refs+0xfc4/0x13a0 [btrfs]
Feb 26 20:02:34 backup4 kernel: [2530464.736763]  [<ffffffff810164bd>] ? xen_mc_flush+0xdd/0x1d0
Feb 26 20:02:34 backup4 kernel: [2530464.736785]  [<ffffffffc01033ad>] ? btrfs_run_delayed_refs+0x9d/0x2b0 [btrfs]
Feb 26 20:02:34 backup4 kernel: [2530464.736811]  [<ffffffffc0119817>] ? btrfs_commit_transaction+0x57/0xa10 [btrfs]
Feb 26 20:02:34 backup4 kernel: [2530464.736837]  [<ffffffffc011a266>] ? start_transaction+0x96/0x480 [btrfs]
Feb 26 20:02:34 backup4 kernel: [2530464.736861]  [<ffffffffc011464c>] ? transaction_kthread+0x1dc/0x200 [btrfs]
Feb 26 20:02:34 backup4 kernel: [2530464.736886]  [<ffffffffc0114470>] ? btrfs_cleanup_transaction+0x590/0x590 [btrfs]
Feb 26 20:02:34 backup4 kernel: [2530464.736901]  [<ffffffff8109be69>] ? kthread+0xd9/0xf0
Feb 26 20:02:34 backup4 kernel: [2530464.736913]  [<ffffffff8109bd90>] ? kthread_park+0x60/0x60
Feb 26 20:02:34 backup4 kernel: [2530464.736926]  [<ffffffff8161f8f7>] ? ret_from_fork+0x57/0x70

This happens with all kinds of guest kernel, and the blocked
processes are basically anything that tries to access the guest's
block devices.

Over in the dom0 at the time, I mostly haven't managed to get logs,
probably because its logging is on the same md device that is having
problems. Some of the servers are fortunate to have their dom0
operating system installed on separate devices from the guest devices,
and on one of those I got this:

Feb 20 00:58:44 talisker kernel: [5876461.472590] INFO: task md5_raid10:226 blocked for more than 120 seconds.
Feb 20 00:58:44 talisker kernel: [5876461.473105]       Not tainted 4.19.0-13-amd64 #1 Debian 4.19.160-2
Feb 20 00:58:44 talisker kernel: [5876461.473523] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 20 00:58:44 talisker kernel: [5876461.473936] md5_raid10      D    0   226      2 0x80000000
Feb 20 00:58:44 talisker kernel: [5876461.474341] Call Trace:
Feb 20 00:58:44 talisker kernel: [5876461.474743]  __schedule+0x29f/0x840
Feb 20 00:58:44 talisker kernel: [5876461.475142]  ? _raw_spin_unlock_irqrestore+0x14/0x20
Feb 20 00:58:44 talisker kernel: [5876461.475554]  schedule+0x28/0x80
Feb 20 00:58:44 talisker kernel: [5876461.475964]  md_super_wait+0x6e/0xa0 [md_mod]
Feb 20 00:58:44 talisker kernel: [5876461.476372]  ? finish_wait+0x80/0x80
Feb 20 00:58:44 talisker kernel: [5876461.476817]  md_bitmap_wait_writes+0x93/0xa0 [md_mod]
Feb 20 00:58:44 talisker kernel: [5876461.477504]  ? md_bitmap_get_counter+0x42/0xd0 [md_mod]
Feb 20 00:58:44 talisker kernel: [5876461.478248]  md_bitmap_daemon_work+0x1f7/0x370 [md_mod]
Feb 20 00:58:44 talisker kernel: [5876461.478904]  md_check_recovery+0x41/0x530 [md_mod]
Feb 20 00:58:44 talisker kernel: [5876461.479309]  raid10d+0x62/0x1460 [raid10]
Feb 20 00:58:44 talisker kernel: [5876461.479722]  ? __switch_to_asm+0x41/0x70
Feb 20 00:58:44 talisker kernel: [5876461.480133]  ? finish_task_switch+0x78/0x280
Feb 20 00:58:44 talisker kernel: [5876461.480540]  ? _raw_spin_lock_irqsave+0x15/0x40
Feb 20 00:58:44 talisker kernel: [5876461.480987]  ? lock_timer_base+0x67/0x80
Feb 20 00:58:44 talisker kernel: [5876461.481719]  ? _raw_spin_unlock_irqrestore+0x14/0x20
Feb 20 00:58:44 talisker kernel: [5876461.482358]  ? try_to_del_timer_sync+0x4d/0x80
Feb 20 00:58:44 talisker kernel: [5876461.482768]  ? del_timer_sync+0x37/0x40
Feb 20 00:58:44 talisker kernel: [5876461.483162]  ? schedule_timeout+0x173/0x3b0
Feb 20 00:58:44 talisker kernel: [5876461.483553]  ? md_rdev_init+0xb0/0xb0 [md_mod]
Feb 20 00:58:44 talisker kernel: [5876461.483944]  ? md_thread+0x94/0x150 [md_mod]
Feb 20 00:58:44 talisker kernel: [5876461.484345]  ? r10bio_pool_alloc+0x20/0x20 [raid10]
Feb 20 00:58:44 talisker kernel: [5876461.484777]  md_thread+0x94/0x150 [md_mod]
Feb 20 00:58:44 talisker kernel: [5876461.485500]  ? finish_wait+0x80/0x80
Feb 20 00:58:44 talisker kernel: [5876461.486083]  kthread+0x112/0x130
Feb 20 00:58:44 talisker kernel: [5876461.486479]  ? kthread_bind+0x30/0x30
Feb 20 00:58:44 talisker kernel: [5876461.486870]  ret_from_fork+0x35/0x40
Feb 20 00:58:44 talisker kernel: [5876461.487260] INFO: task 1.xvda-0:4237 blocked for more than 120 seconds.
Feb 20 00:58:44 talisker kernel: [5876461.487644]       Not tainted 4.19.0-13-amd64 #1 Debian 4.19.160-2
Feb 20 00:58:44 talisker kernel: [5876461.488027] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 20 00:58:44 talisker kernel: [5876461.488422] 1.xvda-0        D    0  4237      2 0x80000000
Feb 20 00:58:44 talisker kernel: [5876461.488842] Call Trace:
Feb 20 00:58:44 talisker kernel: [5876461.489530]  __schedule+0x29f/0x840
Feb 20 00:58:44 talisker kernel: [5876461.490149]  ? _raw_spin_unlock_irqrestore+0x14/0x20
Feb 20 00:58:44 talisker kernel: [5876461.490545]  schedule+0x28/0x80
Feb 20 00:58:44 talisker kernel: [5876461.490954]  md_super_wait+0x6e/0xa0 [md_mod]
Feb 20 00:58:44 talisker kernel: [5876461.491330]  ? finish_wait+0x80/0x80
Feb 20 00:58:44 talisker kernel: [5876461.491708]  md_bitmap_wait_writes+0x93/0xa0 [md_mod]
Feb 20 00:58:44 talisker kernel: [5876461.492101]  md_bitmap_unplug+0xc5/0x120 [md_mod]
Feb 20 00:58:44 talisker kernel: [5876461.492490]  raid10_unplug+0xd4/0x190 [raid10]
Feb 20 00:58:44 talisker kernel: [5876461.492926]  blk_flush_plug_list+0xcf/0x240
Feb 20 00:58:44 talisker kernel: [5876461.493648]  blk_finish_plug+0x21/0x2e
Feb 20 00:58:44 talisker kernel: [5876461.494277]  dispatch_rw_block_io+0x696/0x990 [xen_blkback]
Feb 20 00:58:44 talisker kernel: [5876461.494657]  ? inv_show+0x30/0x30
Feb 20 00:58:44 talisker kernel: [5876461.495043]  __do_block_io_op+0x30f/0x610 [xen_blkback]
Feb 20 00:58:44 talisker kernel: [5876461.495458]  ? _raw_spin_unlock_irqrestore+0x14/0x20
Feb 20 00:58:44 talisker kernel: [5876461.495871]  ? try_to_del_timer_sync+0x4d/0x80
Feb 20 00:58:44 talisker kernel: [5876461.496264]  xen_blkif_schedule+0xdb/0x650 [xen_blkback]
Feb 20 00:58:44 talisker kernel: [5876461.496784]  ? finish_wait+0x80/0x80
Feb 20 00:58:44 talisker kernel: [5876461.497418]  ? xen_blkif_be_int+0x30/0x30 [xen_blkback]
Feb 20 00:58:44 talisker kernel: [5876461.498041]  kthread+0x112/0x130
Feb 20 00:58:44 talisker kernel: [5876461.498668]  ? kthread_bind+0x30/0x30
Feb 20 00:58:44 talisker kernel: [5876461.499309]  ret_from_fork+0x35/0x40
Feb 20 00:58:44 talisker kernel: [5876461.499960] INFO: task 1.xvda-1:4238 blocked for more than 120 seconds.
Feb 20 00:58:44 talisker kernel: [5876461.500518]       Not tainted 4.19.0-13-amd64 #1 Debian 4.19.160-2
Feb 20 00:58:44 talisker kernel: [5876461.500943] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 20 00:58:44 talisker kernel: [5876461.501609] 1.xvda-1        D    0  4238      2 0x80000000
Feb 20 00:58:44 talisker kernel: [5876461.501992] Call Trace:
Feb 20 00:58:44 talisker kernel: [5876461.502372]  __schedule+0x29f/0x840
Feb 20 00:58:44 talisker kernel: [5876461.502747]  schedule+0x28/0x80
Feb 20 00:58:44 talisker kernel: [5876461.503121]  io_schedule+0x12/0x40
Feb 20 00:58:44 talisker kernel: [5876461.503494]  wbt_wait+0x205/0x300
Feb 20 00:58:44 talisker kernel: [5876461.503863]  ? wbt_wait+0x300/0x300
Feb 20 00:58:44 talisker kernel: [5876461.504237]  rq_qos_throttle+0x31/0x40
Feb 20 00:58:44 talisker kernel: [5876461.504637]  blk_mq_make_request+0x111/0x530
Feb 20 00:58:44 talisker kernel: [5876461.505319]  generic_make_request+0x1a4/0x400
Feb 20 00:58:44 talisker kernel: [5876461.505999]  raid10_unplug+0xfd/0x190 [raid10]
Feb 20 00:58:44 talisker kernel: [5876461.506402]  blk_flush_plug_list+0xcf/0x240
Feb 20 00:58:44 talisker kernel: [5876461.506772]  blk_finish_plug+0x21/0x2e
Feb 20 00:58:44 talisker kernel: [5876461.507140]  dispatch_rw_block_io+0x696/0x990 [xen_blkback]
Feb 20 00:58:44 talisker kernel: [5876461.507792]  ? inv_show+0x30/0x30
Feb 20 00:58:44 talisker kernel: [5876461.508166]  __do_block_io_op+0x30f/0x610 [xen_blkback]
Feb 20 00:58:44 talisker kernel: [5876461.508549]  ? _raw_spin_unlock_irqrestore+0x14/0x20
Feb 20 00:58:44 talisker kernel: [5876461.508967]  ? try_to_del_timer_sync+0x4d/0x80
Feb 20 00:58:44 talisker kernel: [5876461.509673]  xen_blkif_schedule+0xdb/0x650 [xen_blkback]
Feb 20 00:58:44 talisker kernel: [5876461.510304]  ? finish_wait+0x80/0x80
Feb 20 00:58:44 talisker kernel: [5876461.510678]  ? xen_blkif_be_int+0x30/0x30 [xen_blkback]
Feb 20 00:58:44 talisker kernel: [5876461.511049]  kthread+0x112/0x130
Feb 20 00:58:44 talisker kernel: [5876461.511413]  ? kthread_bind+0x30/0x30
Feb 20 00:58:44 talisker kernel: [5876461.511776]  ret_from_fork+0x35/0x40

Administrators of the guests notice problems and try to shut down or
reboot, but that fails because dom0 can't write to its xenstore, so
mostly domains can't be managed after this happens and the server
has to be forcibly rebooted.

These are all using the default scheduler, which I understand is
credit2 since 4.12. SMT is enabled and I've limited dom0 to 2 cores,
then pinned dom0 to cores 0 and 1, and pinned all other guests to
their choice out of the remaining cores. That is something I did
fairly recently though; for a long time there was no pinning yet
this still started happening.
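
Concretely, the pinning is done with something like the following,
where the exact core numbers vary per host and are only illustrative:

    # dom0, at boot
    xl vcpu-pin Domain-0 0 0
    xl vcpu-pin Domain-0 1 1

    # in each guest config, restricted to some of the remaining cores
    vcpus = 4
    cpus = "2-11"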

In a couple of cases I've been able to run
"xentop" and see a particular guest doing heavy block device reads.
I've done an "xl destroy" on that guest and then everything has
returned to normal. Unfortunately the times this has happened have
been on dom0s without useful logs. There's just a gap in logs
between when the problems started and when the (apparently)
problematic domU is destroyed. The problematic domU can then be
booted again and life goes on.
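
For the record, that is just something along these lines (the guest
name is a placeholder), watching the VBD_RD / VBD_WR columns in batch
mode and then destroying the offender:

    xentop -b -d 5 -i 2
    xl destroy <domname>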

So, it could be totally unrelated to Xen, and as I investigate
further I will try different kernels in dom0. But the way that
destroying a domU frees things up makes me wonder if it could be Xen
related, maybe scheduler related? Also, it's always the md device
that the guest block devices are on that is stalled - IO to other
devices in dom0

Are there any hypervisor magic sysrq debug keys that could provide
useful information to you in ruling in / out a Xen issue?
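
I know how to capture the generic ones, e.g. something like:

    xl debug-keys q    # dump domain and vcpu information
    xl debug-keys r    # dump run queues
    xl dmesg           # then read the hypervisor log

but I don't know which keys (if any) would be most useful here.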

Should I try using the "credit" scheduler (instead of "credit2") at
next boot?
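
If so, my understanding is that it's just a matter of:

    sched=credit               # added to the hypervisor command line, then reboot
    xl info | grep -i sched    # confirm the active scheduler afterwards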

I *think* this has only been seen with kernel version
4.19.0-13-amd64. Some of these servers have now been rebooted into
4.19.0-14-amd64 (the latest available package) due to the issue,
which has not yet recurred for them.

If it does recur with 4.19.0-14-amd64, what kernel version would
you advise I try out at next reboot so as to take the Debian kernel
out of the picture? I will download an upstream kernel release and
build a Debian package out of it, using my existing kernel config as
a base.
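
That build would be the usual bindeb-pkg route, roughly:

    cp /boot/config-4.19.0-13-amd64 .config
    make olddefconfig
    make -j$(nproc) bindeb-pkg

in an upstream source tree.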

As Debian buster is on the 4.19 series should I pick the latest
4.19.x longterm to be near to it, or the 5.10.x longterm, or the
5.11.x stable?

Thanks,
Andy



* Re: dom0 suddenly blocking on all access to md device
  2021-02-26 22:39 dom0 suddenly blocking on all access to md device Andy Smith
@ 2021-02-26 22:50 ` Andy Smith
  2021-03-02  0:44 ` Andy Smith
  2021-06-12 14:11 ` Andy Smith
  2 siblings, 0 replies; 6+ messages in thread
From: Andy Smith @ 2021-02-26 22:50 UTC (permalink / raw)
  To: xen-devel

Oops, I didn't finish this sentence before sending:

On Fri, Feb 26, 2021 at 10:39:27PM +0000, Andy Smith wrote:
> Also, it's always the md device that the guest block devices are
> on that is stalled - IO to other devices in dom0
…seems fine.

Thanks,
Andy



* Re: dom0 suddenly blocking on all access to md device
  2021-02-26 22:39 dom0 suddenly blocking on all access to md device Andy Smith
  2021-02-26 22:50 ` Andy Smith
@ 2021-03-02  0:44 ` Andy Smith
  2021-06-12 14:11 ` Andy Smith
  2 siblings, 0 replies; 6+ messages in thread
From: Andy Smith @ 2021-03-02  0:44 UTC (permalink / raw)
  To: xen-devel

Hello,

On Fri, Feb 26, 2021 at 10:39:27PM +0000, Andy Smith wrote:
> just lately I've been sporadically having issues where dom0 blocks
> or severely slows down on all access to the particular md device
> that hosts all domU block devices.

This just happened again on the same server as the previous
occurrence on 26th February.

Again I was able to get things going and avoid having to destroy all
guests and hard-reset the host, by destroying the one guest that was
seen (via xentop) to be doing the most block IO.

This means that this particular host is still on Debian buster
kernel 4.19.0-13-amd64 and credit2 scheduler. As it's been only a
few days this might present itself again on this host quite soon. Is
there any useful information I can provide when it does?
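
When it does, I can at least grab something like the following from
dom0 before destroying anything:

    echo w > /proc/sysrq-trigger   # dump blocked (D state) tasks to the kernel log
    cat /sys/block/md*/inflight    # in-flight request counts per md device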

Unfortunately there were no logs of interest in the dom0 syslog, the
guest syslog, or the dom0 serial console. After the heavy-IO guest was
destroyed, dom0 syslog did give:

Mar  2 00:14:08 clockwork kernel: [6732447.973298] xen-blkback: Scheduled work from previous purge is still busy, cannot purge list

…but I assume that is just a result of IO springing back to life.

Thanks,
Andy



* Re: dom0 suddenly blocking on all access to md device
  2021-02-26 22:39 dom0 suddenly blocking on all access to md device Andy Smith
  2021-02-26 22:50 ` Andy Smith
  2021-03-02  0:44 ` Andy Smith
@ 2021-06-12 14:11 ` Andy Smith
  2021-06-12 22:47   ` Rob Townley
  2 siblings, 1 reply; 6+ messages in thread
From: Andy Smith @ 2021-06-12 14:11 UTC (permalink / raw)
  To: xen-devel

Hello,

Unfortunately I'm still experiencing this problem as described in
the earlier email below and I'm running out of ideas for things to
test / try.

What was fine for a long time (~5 years): Debian jessie dom0 kernel
4.9.x with Xen 4.10.

The issues below started happening on the same machines once dom0 was
upgraded to the Debian buster 4.19.x kernel (currently
4.19.0-16-amd64) and the 4.12 hypervisor, starting around December
2020.

Since then I've also tried going to Xen 4.14.2 (plus latest XSA
patches up to XSA377) and it's still happening. I've also tried
switching to "credit" scheduler and that did not make a difference.
It can be a month or two between incidents although one machine just
had it happen twice in 3 days. Maybe half a dozen incidents so far
on different machines, different hardware configs.

Hypervisor command line is:

dom0_mem=4096M dom0_max_vcpus=2 com1=115200,8n1,0x2f8,10 console=com1,vga ucode=scan serial_tx_buffer=256k smt=1

There's a serial console but not much of interest is ever seen on it.
If there are some debug keys you would like to see the output of,
please let me know. Pretty much the only sort of thing that gets
logged in dom0 is the following, and that could just be a symptom.

Jun 12 12:04:40 clockwork kernel: [216427.246183] INFO: task md5_raid1:205 blocked for more than 120 seconds.
Jun 12 12:04:40 clockwork kernel: [216427.246995]       Not tainted 4.19.0-16-amd64 #1 Debian 4.19.181-1
Jun 12 12:04:40 clockwork kernel: [216427.247852] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 12 12:04:40 clockwork kernel: [216427.248674] md5_raid1       D 0   205      2 0x80000000
Jun 12 12:04:40 clockwork kernel: [216427.249534] Call Trace:
Jun 12 12:04:40 clockwork kernel: [216427.250368] __schedule+0x29f/0x840
Jun 12 12:04:40 clockwork kernel: [216427.251788]  ? _raw_spin_unlock_irqrestore+0x14/0x20
Jun 12 12:04:40 clockwork kernel: [216427.253078] schedule+0x28/0x80
Jun 12 12:04:40 clockwork kernel: [216427.253945] md_super_wait+0x6e/0xa0 [md_mod]
Jun 12 12:04:40 clockwork kernel: [216427.254812]  ? finish_wait+0x80/0x80
Jun 12 12:04:40 clockwork kernel: [216427.256139] md_bitmap_wait_writes+0x93/0xa0 [md_mod]
Jun 12 12:04:40 clockwork kernel: [216427.256994]  ? md_bitmap_get_counter+0x42/0xd0 [md_mod]
Jun 12 12:04:40 clockwork kernel: [216427.257787] md_bitmap_daemon_work+0x1f7/0x370 [md_mod]
Jun 12 12:04:40 clockwork kernel: [216427.258608]  ? md_rdev_init+0xb0/0xb0 [md_mod]
Jun 12 12:04:40 clockwork kernel: [216427.259553] md_check_recovery+0x41/0x530 [md_mod]
Jun 12 12:04:40 clockwork kernel: [216427.260304]  raid1d+0x5c/0xf10 [raid1]
Jun 12 12:04:40 clockwork kernel: [216427.261096]  ? lock_timer_base+0x67/0x80
Jun 12 12:04:40 clockwork kernel: [216427.261863]  ? _raw_spin_unlock_irqrestore+0x14/0x20
Jun 12 12:04:40 clockwork kernel: [216427.262659]  ? try_to_del_timer_sync+0x4d/0x80
Jun 12 12:04:40 clockwork kernel: [216427.263436]  ? del_timer_sync+0x37/0x40
Jun 12 12:04:40 clockwork kernel: [216427.264189]  ? schedule_timeout+0x173/0x3b0
Jun 12 12:04:40 clockwork kernel: [216427.264911]  ? md_rdev_init+0xb0/0xb0 [md_mod]
Jun 12 12:04:40 clockwork kernel: [216427.265664]  ? md_thread+0x94/0x150 [md_mod]
Jun 12 12:04:40 clockwork kernel: [216427.266412]  ? process_checks+0x4a0/0x4a0 [raid1]
Jun 12 12:04:40 clockwork kernel: [216427.267124] md_thread+0x94/0x150 [md_mod]
Jun 12 12:04:40 clockwork kernel: [216427.267842]  ? finish_wait+0x80/0x80
Jun 12 12:04:40 clockwork kernel: [216427.268539] kthread+0x112/0x130
Jun 12 12:04:40 clockwork kernel: [216427.269231]  ? kthread_bind+0x30/0x30
Jun 12 12:04:40 clockwork kernel: [216427.269903] ret_from_fork+0x35/0x40
Jun 12 12:04:40 clockwork kernel: [216427.270590] INFO: task md2_raid1:207 blocked for more than 120 seconds.
Jun 12 12:04:40 clockwork kernel: [216427.271260]       Not tainted 4.19.0-16-amd64 #1 Debian 4.19.181-1
Jun 12 12:04:40 clockwork kernel: [216427.271942] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 12 12:04:40 clockwork kernel: [216427.272721] md2_raid1       D 0   207      2 0x80000000
Jun 12 12:04:40 clockwork kernel: [216427.273432] Call Trace:
Jun 12 12:04:40 clockwork kernel: [216427.274172] __schedule+0x29f/0x840
Jun 12 12:04:40 clockwork kernel: [216427.274869] schedule+0x28/0x80
Jun 12 12:04:40 clockwork kernel: [216427.275543] io_schedule+0x12/0x40
Jun 12 12:04:40 clockwork kernel: [216427.276208] wbt_wait+0x205/0x300
Jun 12 12:04:40 clockwork kernel: [216427.276861]  ? wbt_wait+0x300/0x300
Jun 12 12:04:40 clockwork kernel: [216427.277503] rq_qos_throttle+0x31/0x40
Jun 12 12:04:40 clockwork kernel: [216427.278193] blk_mq_make_request+0x111/0x530
Jun 12 12:04:40 clockwork kernel: [216427.278876] generic_make_request+0x1a4/0x400
Jun 12 12:04:40 clockwork kernel: [216427.279657]  ? try_to_wake_up+0x54/0x470
Jun 12 12:04:40 clockwork kernel: [216427.280400] submit_bio+0x45/0x130
Jun 12 12:04:40 clockwork kernel: [216427.281136]  ? md_super_write.part.63+0x90/0x120 [md_mod]
Jun 12 12:04:40 clockwork kernel: [216427.281788] md_update_sb.part.65+0x3a8/0x8e0 [md_mod]
Jun 12 12:04:40 clockwork kernel: [216427.282480]  ? md_rdev_init+0xb0/0xb0 [md_mod]
Jun 12 12:04:40 clockwork kernel: [216427.283106] md_check_recovery+0x272/0x530 [md_mod]
Jun 12 12:04:40 clockwork kernel: [216427.283738]  raid1d+0x5c/0xf10 [raid1]
Jun 12 12:04:40 clockwork kernel: [216427.284345]  ? __schedule+0x2a7/0x840
Jun 12 12:04:40 clockwork kernel: [216427.284939]  ? md_rdev_init+0xb0/0xb0 [md_mod]
Jun 12 12:04:40 clockwork kernel: [216427.285522]  ? schedule+0x28/0x80
Jun 12 12:04:40 clockwork kernel: [216427.286121]  ? schedule_timeout+0x26d/0x3b0
Jun 12 12:04:40 clockwork kernel: [216427.286702]  ? __schedule+0x2a7/0x840
Jun 12 12:04:40 clockwork kernel: [216427.287279]  ? md_rdev_init+0xb0/0xb0 [md_mod]
Jun 12 12:04:40 clockwork kernel: [216427.287871]  ? md_thread+0x94/0x150 [md_mod]
Jun 12 12:04:40 clockwork kernel: [216427.288458]  ? process_checks+0x4a0/0x4a0 [raid1]
Jun 12 12:04:40 clockwork kernel: [216427.289062] md_thread+0x94/0x150 [md_mod]
Jun 12 12:04:40 clockwork kernel: [216427.289663]  ? finish_wait+0x80/0x80
Jun 12 12:04:40 clockwork kernel: [216427.290288] kthread+0x112/0x130
Jun 12 12:04:40 clockwork kernel: [216427.290858]  ? kthread_bind+0x30/0x30
Jun 12 12:04:40 clockwork kernel: [216427.291433] ret_from_fork+0x35/0x40

What I HAVEN'T yet tried is a much newer kernel. That will probably
be what I try next, having exhausted all ideas about upgrading or
configuring Xen.

Should I take a kernel from buster-backports which would currently
be:

    https://packages.debian.org/buster-backports/linux-image-5.10.0-0.bpo.5-amd64

or should I build a kernel package from a mainline release?
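
Taking the backports option would, I assume, just be a matter of:

    echo 'deb http://deb.debian.org/debian buster-backports main' \
        > /etc/apt/sources.list.d/backports.list
    apt update
    apt install -t buster-backports linux-image-amd64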

Thanks,
Andy

On Fri, Feb 26, 2021 at 10:39:27PM +0000, Andy Smith wrote:
> [...]



* Re: dom0 suddenly blocking on all access to md device
  2021-06-12 14:11 ` Andy Smith
@ 2021-06-12 22:47   ` Rob Townley
  2021-06-12 23:13     ` Andy Smith
  0 siblings, 1 reply; 6+ messages in thread
From: Rob Townley @ 2021-06-12 22:47 UTC (permalink / raw)
  To: Andy Smith; +Cc: xen-devel


mdadm.conf has email reporting capabilities to alert you to failing
drives. Test that you receive emails.

Use mdadm to run tests on the raid.

iostat may indicate a failing drive as well
smartctl -a /dev/
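
For example (array and device names below are just placeholders):

    # /etc/mdadm/mdadm.conf - make sure alerts go to a monitored address
    MAILADDR you@example.com

    mdadm --monitor --scan --oneshot --test    # send a test mail per array
    echo check > /sys/block/md5/md/sync_action # full redundancy check of md5
    iostat -x 5                                # per-device latency / utilisation
    smartctl -a /dev/sda                       # SMART health of one member drive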


On Sat, Jun 12, 2021 at 9:12 AM Andy Smith <andy@strugglers.net> wrote:

> Hello,
>
> Unfortunately I'm still experiencing this problem as described in
> the earlier email below and I'm running out of ideas for things to
> test / try.
>
> What was fine for a long time (~5 years): Debian jessie dom0 kernel
> 4.9.x with Xen 4.10.
>
> Below issues started happening on same machines once dom0 was
> upgraded to Debian buster 4.19.x kernel (currently 4.19.0-16-amd64)
> and 4.12 hypervisor. Starting around December 2020.
>
> Since then I've also tried going to Xen 4.14.2 (plus latest XSA
> patches up to XSA377) and it's still happening. I've also tried
> switching to "credit" scheduler and that did not make a difference.
> It can be a month or two between incidents although one machine just
> had it happen twice in 3 days. Maybe half a dozen incidents so far
> on different machines, different hardware configs.
>
> Hypervisor command line is:
>
> dom0_mem=4096M dom0_max_vcpus=2 com1=115200,8n1,0x2f8,10 console=com1,vga
> ucode=scan serial_tx_buffer=256k smt=1
>
> There's a serial console but not much interesting is ever seen on
> it. If there are some debug keys you would like to see the output of
> please let me know. Pretty much the only sort of thing that gets
> logged in dom0 is the following and that could just be a symptom.
>
> Jun 12 12:04:40 clockwork kernel: [216427.246183] INFO: task md5_raid1:205
> blocked for more than 120 seconds.
> Jun 12 12:04:40 clockwork kernel: [216427.246995]       Not tainted
> 4.19.0-16-amd64 #1 Debian 4.19.181-1
> Jun 12 12:04:40 clockwork kernel: [216427.247852] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Jun 12 12:04:40 clockwork kernel: [216427.248674] md5_raid1       D 0
>  205      2 0x80000000
> Jun 12 12:04:40 clockwork kernel: [216427.249534] Call Trace:
> Jun 12 12:04:40 clockwork kernel: [216427.250368] __schedule+0x29f/0x840
> Jun 12 12:04:40 clockwork kernel: [216427.251788]  ?
> _raw_spin_unlock_irqrestore+0x14/0x20
> Jun 12 12:04:40 clockwork kernel: [216427.253078] schedule+0x28/0x80
> Jun 12 12:04:40 clockwork kernel: [216427.253945] md_super_wait+0x6e/0xa0
> [md_mod]
> Jun 12 12:04:40 clockwork kernel: [216427.254812]  ? finish_wait+0x80/0x80
> Jun 12 12:04:40 clockwork kernel: [216427.256139]
> md_bitmap_wait_writes+0x93/0xa0 [md_mod]
> Jun 12 12:04:40 clockwork kernel: [216427.256994]  ?
> md_bitmap_get_counter+0x42/0xd0 [md_mod]
> Jun 12 12:04:40 clockwork kernel: [216427.257787]
> md_bitmap_daemon_work+0x1f7/0x370 [md_mod]
> Jun 12 12:04:40 clockwork kernel: [216427.258608]  ?
> md_rdev_init+0xb0/0xb0 [md_mod]
> Jun 12 12:04:40 clockwork kernel: [216427.259553]
> md_check_recovery+0x41/0x530 [md_mod]
> Jun 12 12:04:40 clockwork kernel: [216427.260304]  raid1d+0x5c/0xf10
> [raid1]
> Jun 12 12:04:40 clockwork kernel: [216427.261096]  ?
> lock_timer_base+0x67/0x80
> Jun 12 12:04:40 clockwork kernel: [216427.261863]  ?
> _raw_spin_unlock_irqrestore+0x14/0x20
> Jun 12 12:04:40 clockwork kernel: [216427.262659]  ?
> try_to_del_timer_sync+0x4d/0x80
> Jun 12 12:04:40 clockwork kernel: [216427.263436]  ?
> del_timer_sync+0x37/0x40
> Jun 12 12:04:40 clockwork kernel: [216427.264189]  ?
> schedule_timeout+0x173/0x3b0
> Jun 12 12:04:40 clockwork kernel: [216427.264911]  ?
> md_rdev_init+0xb0/0xb0 [md_mod]
> Jun 12 12:04:40 clockwork kernel: [216427.265664]  ? md_thread+0x94/0x150
> [md_mod]
> Jun 12 12:04:40 clockwork kernel: [216427.266412]  ?
> process_checks+0x4a0/0x4a0 [raid1]
> Jun 12 12:04:40 clockwork kernel: [216427.267124] md_thread+0x94/0x150
> [md_mod]
> Jun 12 12:04:40 clockwork kernel: [216427.267842]  ? finish_wait+0x80/0x80
> Jun 12 12:04:40 clockwork kernel: [216427.268539] kthread+0x112/0x130
> Jun 12 12:04:40 clockwork kernel: [216427.269231]  ? kthread_bind+0x30/0x30
> Jun 12 12:04:40 clockwork kernel: [216427.269903] ret_from_fork+0x35/0x40
> Jun 12 12:04:40 clockwork kernel: [216427.270590] INFO: task md2_raid1:207
> blocked for more than 120 seconds.
> Jun 12 12:04:40 clockwork kernel: [216427.271260]       Not tainted
> 4.19.0-16-amd64 #1 Debian 4.19.181-1
> Jun 12 12:04:40 clockwork kernel: [216427.271942] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Jun 12 12:04:40 clockwork kernel: [216427.272721] md2_raid1       D 0
>  207      2 0x80000000
> Jun 12 12:04:40 clockwork kernel: [216427.273432] Call Trace:
> Jun 12 12:04:40 clockwork kernel: [216427.274172] __schedule+0x29f/0x840
> Jun 12 12:04:40 clockwork kernel: [216427.274869] schedule+0x28/0x80
> Jun 12 12:04:40 clockwork kernel: [216427.275543] io_schedule+0x12/0x40
> Jun 12 12:04:40 clockwork kernel: [216427.276208] wbt_wait+0x205/0x300
> Jun 12 12:04:40 clockwork kernel: [216427.276861]  ? wbt_wait+0x300/0x300
> Jun 12 12:04:40 clockwork kernel: [216427.277503] rq_qos_throttle+0x31/0x40
> Jun 12 12:04:40 clockwork kernel: [216427.278193]
> blk_mq_make_request+0x111/0x530
> Jun 12 12:04:40 clockwork kernel: [216427.278876]
> generic_make_request+0x1a4/0x400
> Jun 12 12:04:40 clockwork kernel: [216427.279657]  ?
> try_to_wake_up+0x54/0x470
> Jun 12 12:04:40 clockwork kernel: [216427.280400] submit_bio+0x45/0x130
> Jun 12 12:04:40 clockwork kernel: [216427.281136]  ?
> md_super_write.part.63+0x90/0x120 [md_mod]
> Jun 12 12:04:40 clockwork kernel: [216427.281788]
> md_update_sb.part.65+0x3a8/0x8e0 [md_mod]
> Jun 12 12:04:40 clockwork kernel: [216427.282480]  ?
> md_rdev_init+0xb0/0xb0 [md_mod]
> Jun 12 12:04:40 clockwork kernel: [216427.283106]
> md_check_recovery+0x272/0x530 [md_mod]
> Jun 12 12:04:40 clockwork kernel: [216427.283738]  raid1d+0x5c/0xf10
> [raid1]
> Jun 12 12:04:40 clockwork kernel: [216427.284345]  ? __schedule+0x2a7/0x840
> Jun 12 12:04:40 clockwork kernel: [216427.284939]  ?
> md_rdev_init+0xb0/0xb0 [md_mod]
> Jun 12 12:04:40 clockwork kernel: [216427.285522]  ? schedule+0x28/0x80
> Jun 12 12:04:40 clockwork kernel: [216427.286121]  ?
> schedule_timeout+0x26d/0x3b0
> Jun 12 12:04:40 clockwork kernel: [216427.286702]  ? __schedule+0x2a7/0x840
> Jun 12 12:04:40 clockwork kernel: [216427.287279]  ?
> md_rdev_init+0xb0/0xb0 [md_mod]
> Jun 12 12:04:40 clockwork kernel: [216427.287871]  ? md_thread+0x94/0x150
> [md_mod]
> Jun 12 12:04:40 clockwork kernel: [216427.288458]  ?
> process_checks+0x4a0/0x4a0 [raid1]
> Jun 12 12:04:40 clockwork kernel: [216427.289062] md_thread+0x94/0x150
> [md_mod]
> Jun 12 12:04:40 clockwork kernel: [216427.289663]  ? finish_wait+0x80/0x80
> Jun 12 12:04:40 clockwork kernel: [216427.290288] kthread+0x112/0x130
> Jun 12 12:04:40 clockwork kernel: [216427.290858]  ? kthread_bind+0x30/0x30
> Jun 12 12:04:40 clockwork kernel: [216427.291433] ret_from_fork+0x35/0x40
>
> What I HAVEN'T yet tried is a much newer kernel. That will probably
> be what I try next having exhausted all ideas about upgrading or
> configuring Xen.
>
> Should I take a kernel from buster-backports which would currently
> be:
>
>
> https://packages.debian.org/buster-backports/linux-image-5.10.0-0.bpo.5-amd64
>
> or should I build a kernel package from a mainline release?
>
> Thanks,
> Andy
>
> On Fri, Feb 26, 2021 at 10:39:27PM +0000, Andy Smith wrote:
> > Hi,
> >
> > I suspect this might be an issue in the dom0 kernel (Debian buster,
> > kernel 4.19.0-13-amd64), but just lately I've been sporadically
> > having issues where dom0 blocks or severely slows down on all access
> > to the particular md device that hosts all domU block devices.
> >
> > Setup in dom0: an md RAID10 that is used as an LVM PV for an LVM volume
> > group, where all domU block devices are LVM logical volumes in that
> > group. So the relevant part of a domU config file might look like:
> >
> > disk = [ "phy:/dev/myvg/domu_debtest1_xvda,xvda,w",
> >          "phy:/dev/myvg/domu_debtest1_xvdb,xvdb,w" ]
> >
> > The guests are mostly PV, a sprinkling of PVH, no HVM.
> >
> > There's 5 of these servers but 3 of them have only recently been
> > upgraded to Xen 4.12.14 (on Debian buster) from Xen 4.10 (on Debian
> > jessie). The fact that all of them have been pretty stable in the
> > past, on differing hardware, makes me discount a hardware issue. The
> > fact that two of them have been buster / 4.12.x for a long time
> > without issue but are also now starting to see this does make me
> > think that it's a recent dom0 kernel issue.
> >
> > When the problem occurs, inside every domU I see things like this:
> >
> > Feb 26 20:02:34 backup4 kernel: [2530464.736085] INFO: task
> btrfs-transacti:333 blocked for more than 120 seconds.
> > Feb 26 20:02:34 backup4 kernel: [2530464.736107]       Not tainted
> 4.9.0-14-amd64 #1 Debian 4.9.246-2
> > Feb 26 20:02:34 backup4 kernel: [2530464.736117] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > Feb 26 20:02:34 backup4 kernel: [2530464.736131] btrfs-transacti D    0
>  333      2 0x00000000
> > Feb 26 20:02:34 backup4 kernel: [2530464.736146]  0000000000000246
> ffff8800f4e0c400 0000000000000000 ffff8800f8a7f100
> > Feb 26 20:02:34 backup4 kernel: [2530464.736168]  ffff8800fad18a00
> ffff8800fa7dd000 ffffc90040b2f670 ffffffff8161a979
> > Feb 26 20:02:34 backup4 kernel: [2530464.736188]  ffff8800fa6d0200
> 0000000000000000 ffff8800fad18a00 0000000000000010
> > Feb 26 20:02:34 backup4 kernel: [2530464.736209] Call Trace:
> > Feb 26 20:02:34 backup4 kernel: [2530464.736223]  [<ffffffff8161a979>] ?
> __schedule+0x239/0x6f0
> > Feb 26 20:02:34 backup4 kernel: [2530464.736236]  [<ffffffff8161ae62>] ?
> schedule+0x32/0x80
> > Feb 26 20:02:34 backup4 kernel: [2530464.736248]  [<ffffffff8161e1fd>] ?
> schedule_timeout+0x1dd/0x380
> > Feb 26 20:02:34 backup4 kernel: [2530464.736263]  [<ffffffff8101c201>] ?
> xen_clocksource_get_cycles+0x11/0x20
> > Feb 26 20:02:34 backup4 kernel: [2530464.736275]  [<ffffffff8161a6dd>] ?
> io_schedule_timeout+0x9d/0x100
> > Feb 26 20:02:34 backup4 kernel: [2530464.736289]  [<ffffffff81367964>] ?
> __sbitmap_queue_get+0x24/0x90
> > Feb 26 20:02:34 backup4 kernel: [2530464.736302]  [<ffffffff81317f60>] ?
> bt_get.isra.6+0x160/0x220
> > Feb 26 20:02:34 backup4 kernel: [2530464.736338]  [<ffffffffc0148bf8>] ?
> __btrfs_map_block+0x6c8/0x11d0 [btrfs]
> > Feb 26 20:02:34 backup4 kernel: [2530464.736353]  [<ffffffff810bf010>] ?
> prepare_to_wait_event+0xf0/0xf0
> > Feb 26 20:02:34 backup4 kernel: [2530464.736364]  [<ffffffff813182d3>] ?
> blk_mq_get_tag+0x23/0x90
> > Feb 26 20:02:34 backup4 kernel: [2530464.736377]  [<ffffffff81313b6a>] ?
> __blk_mq_alloc_request+0x1a/0x220
> > Feb 26 20:02:34 backup4 kernel: [2530464.736390]  [<ffffffff81314a39>] ?
> blk_mq_map_request+0xd9/0x170
> > Feb 26 20:02:34 backup4 kernel: [2530464.736402]  [<ffffffff8131726b>] ?
> blk_mq_make_request+0xbb/0x580
> > Feb 26 20:02:34 backup4 kernel: [2530464.736429]  [<ffffffffc0148bf8>] ?
> __btrfs_map_block+0x6c8/0x11d0 [btrfs]
> > Feb 26 20:02:34 backup4 kernel: [2530464.736444]  [<ffffffff8130b0f5>] ?
> generic_make_request+0x115/0x2d0
> > Feb 26 20:02:34 backup4 kernel: [2530464.736456]  [<ffffffff8130b326>] ?
> submit_bio+0x76/0x140
> > Feb 26 20:02:34 backup4 kernel: [2530464.736481]  [<ffffffffc0149d9a>] ?
> btrfs_map_bio+0x19a/0x340 [btrfs]
> > Feb 26 20:02:34 backup4 kernel: [2530464.736505]  [<ffffffffc0111635>] ?
> btree_submit_bio_hook+0xf5/0x110 [btrfs]
> > Feb 26 20:02:34 backup4 kernel: [2530464.736535]  [<ffffffffc0138318>] ?
> submit_one_bio+0x68/0x90 [btrfs]
> > Feb 26 20:02:34 backup4 kernel: [2530464.736561]  [<ffffffffc013fd4d>] ?
> read_extent_buffer_pages+0x1cd/0x300 [btrfs]
> > Feb 26 20:02:34 backup4 kernel: [2530464.736587]  [<ffffffffc010fbe0>] ?
> free_root_pointers+0x60/0x60 [btrfs]
> > Feb 26 20:02:34 backup4 kernel: [2530464.736609]  [<ffffffffc010ff9c>] ?
> btree_read_extent_buffer_pages+0x8c/0x100 [btrfs]
> > Feb 26 20:02:34 backup4 kernel: [2530464.736635]  [<ffffffffc0111814>] ?
> read_tree_block+0x34/0x50 [btrfs]
> > Feb 26 20:02:34 backup4 kernel: [2530464.736655]  [<ffffffffc00ef9f3>] ?
> read_block_for_search.isra.36+0x133/0x320 [btrfs]
> > Feb 26 20:02:34 backup4 kernel: [2530464.736678]  [<ffffffffc00eabe4>] ?
> unlock_up+0xd4/0x180 [btrfs]
> > Feb 26 20:02:34 backup4 kernel: [2530464.736700]  [<ffffffffc00f1b8d>] ?
> btrfs_search_slot+0x3ad/0xa00 [btrfs]
> > Feb 26 20:02:34 backup4 kernel: [2530464.736723]  [<ffffffffc00f3a47>] ?
> btrfs_insert_empty_items+0x67/0xc0 [btrfs]
> > Feb 26 20:02:34 backup4 kernel: [2530464.736748]  [<ffffffffc00ffe24>] ?
> __btrfs_run_delayed_refs+0xfc4/0x13a0 [btrfs]
> > Feb 26 20:02:34 backup4 kernel: [2530464.736763]  [<ffffffff810164bd>] ?
> xen_mc_flush+0xdd/0x1d0
> > Feb 26 20:02:34 backup4 kernel: [2530464.736785]  [<ffffffffc01033ad>] ?
> btrfs_run_delayed_refs+0x9d/0x2b0 [btrfs]
> > Feb 26 20:02:34 backup4 kernel: [2530464.736811]  [<ffffffffc0119817>] ?
> btrfs_commit_transaction+0x57/0xa10 [btrfs]
> > Feb 26 20:02:34 backup4 kernel: [2530464.736837]  [<ffffffffc011a266>] ?
> start_transaction+0x96/0x480 [btrfs]
> > Feb 26 20:02:34 backup4 kernel: [2530464.736861]  [<ffffffffc011464c>] ?
> transaction_kthread+0x1dc/0x200 [btrfs]
> > Feb 26 20:02:34 backup4 kernel: [2530464.736886]  [<ffffffffc0114470>] ?
> btrfs_cleanup_transaction+0x590/0x590 [btrfs]
> > Feb 26 20:02:34 backup4 kernel: [2530464.736901]  [<ffffffff8109be69>] ?
> kthread+0xd9/0xf0
> > Feb 26 20:02:34 backup4 kernel: [2530464.736913]  [<ffffffff8109bd90>] ?
> kthread_park+0x60/0x60
> > Feb 26 20:02:34 backup4 kernel: [2530464.736926]  [<ffffffff8161f8f7>] ?
> ret_from_fork+0x57/0x70
> >
> > It's all kinds of guest kernel, and the processes are basically
> > anything that tries to access its block devices.
> >
> > Over in the dom0 at the time, I mostly haven't managed to get logs,
> > probably because its logging is on the same md device that is having
> > problems. Some of the servers are fortunate to have their dom0
> > operating system installed on separate devices to the guest devices,
> > and on one of those I got this:
> >
> > Feb 20 00:58:44 talisker kernel: [5876461.472590] INFO: task md5_raid10:226 blocked for more than 120 seconds.
> > Feb 20 00:58:44 talisker kernel: [5876461.473105]       Not tainted 4.19.0-13-amd64 #1 Debian 4.19.160-2
> > Feb 20 00:58:44 talisker kernel: [5876461.473523] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > Feb 20 00:58:44 talisker kernel: [5876461.473936] md5_raid10      D    0   226      2 0x80000000
> > Feb 20 00:58:44 talisker kernel: [5876461.474341] Call Trace:
> > Feb 20 00:58:44 talisker kernel: [5876461.474743]  __schedule+0x29f/0x840
> > Feb 20 00:58:44 talisker kernel: [5876461.475142]  ? _raw_spin_unlock_irqrestore+0x14/0x20
> > Feb 20 00:58:44 talisker kernel: [5876461.475554]  schedule+0x28/0x80
> > Feb 20 00:58:44 talisker kernel: [5876461.475964]  md_super_wait+0x6e/0xa0 [md_mod]
> > Feb 20 00:58:44 talisker kernel: [5876461.476372]  ? finish_wait+0x80/0x80
> > Feb 20 00:58:44 talisker kernel: [5876461.476817]  md_bitmap_wait_writes+0x93/0xa0 [md_mod]
> > Feb 20 00:58:44 talisker kernel: [5876461.477504]  ? md_bitmap_get_counter+0x42/0xd0 [md_mod]
> > Feb 20 00:58:44 talisker kernel: [5876461.478248]  md_bitmap_daemon_work+0x1f7/0x370 [md_mod]
> > Feb 20 00:58:44 talisker kernel: [5876461.478904]  md_check_recovery+0x41/0x530 [md_mod]
> > Feb 20 00:58:44 talisker kernel: [5876461.479309]  raid10d+0x62/0x1460 [raid10]
> > Feb 20 00:58:44 talisker kernel: [5876461.479722]  ? __switch_to_asm+0x41/0x70
> > Feb 20 00:58:44 talisker kernel: [5876461.480133]  ? finish_task_switch+0x78/0x280
> > Feb 20 00:58:44 talisker kernel: [5876461.480540]  ? _raw_spin_lock_irqsave+0x15/0x40
> > Feb 20 00:58:44 talisker kernel: [5876461.480987]  ? lock_timer_base+0x67/0x80
> > Feb 20 00:58:44 talisker kernel: [5876461.481719]  ? _raw_spin_unlock_irqrestore+0x14/0x20
> > Feb 20 00:58:44 talisker kernel: [5876461.482358]  ? try_to_del_timer_sync+0x4d/0x80
> > Feb 20 00:58:44 talisker kernel: [5876461.482768]  ? del_timer_sync+0x37/0x40
> > Feb 20 00:58:44 talisker kernel: [5876461.483162]  ? schedule_timeout+0x173/0x3b0
> > Feb 20 00:58:44 talisker kernel: [5876461.483553]  ? md_rdev_init+0xb0/0xb0 [md_mod]
> > Feb 20 00:58:44 talisker kernel: [5876461.483944]  ? md_thread+0x94/0x150 [md_mod]
> > Feb 20 00:58:44 talisker kernel: [5876461.484345]  ? r10bio_pool_alloc+0x20/0x20 [raid10]
> > Feb 20 00:58:44 talisker kernel: [5876461.484777]  md_thread+0x94/0x150 [md_mod]
> > Feb 20 00:58:44 talisker kernel: [5876461.485500]  ? finish_wait+0x80/0x80
> > Feb 20 00:58:44 talisker kernel: [5876461.486083]  kthread+0x112/0x130
> > Feb 20 00:58:44 talisker kernel: [5876461.486479]  ? kthread_bind+0x30/0x30
> > Feb 20 00:58:44 talisker kernel: [5876461.486870]  ret_from_fork+0x35/0x40
> > Feb 20 00:58:44 talisker kernel: [5876461.487260] INFO: task 1.xvda-0:4237 blocked for more than 120 seconds.
> > Feb 20 00:58:44 talisker kernel: [5876461.487644]       Not tainted 4.19.0-13-amd64 #1 Debian 4.19.160-2
> > Feb 20 00:58:44 talisker kernel: [5876461.488027] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > Feb 20 00:58:44 talisker kernel: [5876461.488422] 1.xvda-0        D    0  4237      2 0x80000000
> > Feb 20 00:58:44 talisker kernel: [5876461.488842] Call Trace:
> > Feb 20 00:58:44 talisker kernel: [5876461.489530]  __schedule+0x29f/0x840
> > Feb 20 00:58:44 talisker kernel: [5876461.490149]  ? _raw_spin_unlock_irqrestore+0x14/0x20
> > Feb 20 00:58:44 talisker kernel: [5876461.490545]  schedule+0x28/0x80
> > Feb 20 00:58:44 talisker kernel: [5876461.490954]  md_super_wait+0x6e/0xa0 [md_mod]
> > Feb 20 00:58:44 talisker kernel: [5876461.491330]  ? finish_wait+0x80/0x80
> > Feb 20 00:58:44 talisker kernel: [5876461.491708]  md_bitmap_wait_writes+0x93/0xa0 [md_mod]
> > Feb 20 00:58:44 talisker kernel: [5876461.492101]  md_bitmap_unplug+0xc5/0x120 [md_mod]
> > Feb 20 00:58:44 talisker kernel: [5876461.492490]  raid10_unplug+0xd4/0x190 [raid10]
> > Feb 20 00:58:44 talisker kernel: [5876461.492926]  blk_flush_plug_list+0xcf/0x240
> > Feb 20 00:58:44 talisker kernel: [5876461.493648]  blk_finish_plug+0x21/0x2e
> > Feb 20 00:58:44 talisker kernel: [5876461.494277]  dispatch_rw_block_io+0x696/0x990 [xen_blkback]
> > Feb 20 00:58:44 talisker kernel: [5876461.494657]  ? inv_show+0x30/0x30
> > Feb 20 00:58:44 talisker kernel: [5876461.495043]  __do_block_io_op+0x30f/0x610 [xen_blkback]
> > Feb 20 00:58:44 talisker kernel: [5876461.495458]  ? _raw_spin_unlock_irqrestore+0x14/0x20
> > Feb 20 00:58:44 talisker kernel: [5876461.495871]  ? try_to_del_timer_sync+0x4d/0x80
> > Feb 20 00:58:44 talisker kernel: [5876461.496264]  xen_blkif_schedule+0xdb/0x650 [xen_blkback]
> > Feb 20 00:58:44 talisker kernel: [5876461.496784]  ? finish_wait+0x80/0x80
> > Feb 20 00:58:44 talisker kernel: [5876461.497418]  ? xen_blkif_be_int+0x30/0x30 [xen_blkback]
> > Feb 20 00:58:44 talisker kernel: [5876461.498041]  kthread+0x112/0x130
> > Feb 20 00:58:44 talisker kernel: [5876461.498668]  ? kthread_bind+0x30/0x30
> > Feb 20 00:58:44 talisker kernel: [5876461.499309]  ret_from_fork+0x35/0x40
> > Feb 20 00:58:44 talisker kernel: [5876461.499960] INFO: task 1.xvda-1:4238 blocked for more than 120 seconds.
> > Feb 20 00:58:44 talisker kernel: [5876461.500518]       Not tainted 4.19.0-13-amd64 #1 Debian 4.19.160-2
> > Feb 20 00:58:44 talisker kernel: [5876461.500943] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > Feb 20 00:58:44 talisker kernel: [5876461.501609] 1.xvda-1        D    0  4238      2 0x80000000
> > Feb 20 00:58:44 talisker kernel: [5876461.501992] Call Trace:
> > Feb 20 00:58:44 talisker kernel: [5876461.502372]  __schedule+0x29f/0x840
> > Feb 20 00:58:44 talisker kernel: [5876461.502747]  schedule+0x28/0x80
> > Feb 20 00:58:44 talisker kernel: [5876461.503121]  io_schedule+0x12/0x40
> > Feb 20 00:58:44 talisker kernel: [5876461.503494]  wbt_wait+0x205/0x300
> > Feb 20 00:58:44 talisker kernel: [5876461.503863]  ? wbt_wait+0x300/0x300
> > Feb 20 00:58:44 talisker kernel: [5876461.504237]  rq_qos_throttle+0x31/0x40
> > Feb 20 00:58:44 talisker kernel: [5876461.504637]  blk_mq_make_request+0x111/0x530
> > Feb 20 00:58:44 talisker kernel: [5876461.505319]  generic_make_request+0x1a4/0x400
> > Feb 20 00:58:44 talisker kernel: [5876461.505999]  raid10_unplug+0xfd/0x190 [raid10]
> > Feb 20 00:58:44 talisker kernel: [5876461.506402]  blk_flush_plug_list+0xcf/0x240
> > Feb 20 00:58:44 talisker kernel: [5876461.506772]  blk_finish_plug+0x21/0x2e
> > Feb 20 00:58:44 talisker kernel: [5876461.507140]  dispatch_rw_block_io+0x696/0x990 [xen_blkback]
> > Feb 20 00:58:44 talisker kernel: [5876461.507792]  ? inv_show+0x30/0x30
> > Feb 20 00:58:44 talisker kernel: [5876461.508166]  __do_block_io_op+0x30f/0x610 [xen_blkback]
> > Feb 20 00:58:44 talisker kernel: [5876461.508549]  ? _raw_spin_unlock_irqrestore+0x14/0x20
> > Feb 20 00:58:44 talisker kernel: [5876461.508967]  ? try_to_del_timer_sync+0x4d/0x80
> > Feb 20 00:58:44 talisker kernel: [5876461.509673]  xen_blkif_schedule+0xdb/0x650 [xen_blkback]
> > Feb 20 00:58:44 talisker kernel: [5876461.510304]  ? finish_wait+0x80/0x80
> > Feb 20 00:58:44 talisker kernel: [5876461.510678]  ? xen_blkif_be_int+0x30/0x30 [xen_blkback]
> > Feb 20 00:58:44 talisker kernel: [5876461.511049]  kthread+0x112/0x130
> > Feb 20 00:58:44 talisker kernel: [5876461.511413]  ? kthread_bind+0x30/0x30
> > Feb 20 00:58:44 talisker kernel: [5876461.511776]  ret_from_fork+0x35/0x40
> >
> > Administrators of the guests notice problems and try to shut down
> > or reboot, but that fails because dom0 can't write to its xenstore,
> > so the domains mostly can't be managed after this happens and the
> > server has to be forcibly rebooted.
> >
> > These are all using the default scheduler, which I understand has
> > been credit2 since 4.12. SMT is enabled and I've limited dom0 to 2
> > cores, then pinned dom0 to cores 0 and 1 and pinned all other
> > guests to their own selection of the remaining cores. That pinning
> > is something I did fairly recently, though; for a long time there
> > was no pinning, yet this still started happening.
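> >
> > For illustration, one way to get that kind of arrangement (the
> > exact CPU numbers and paths are just examples, not necessarily
> > what every host here uses) is:
> >
> >   # /etc/default/grub - Xen hypervisor command line
> >   GRUB_CMDLINE_XEN="dom0_max_vcpus=2 dom0_vcpus_pin"
> >
> >   # per-domU config: keep guests off the dom0 cores
> >   cpus = "2-11"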
> >
> > In a couple of cases I have found that I've been able to run
> > "xentop" and see a particular guest doing heavy block device reads.
> > I've then done an "xl destroy" on that guest and everything has
> > returned to normal. Unfortunately the times this has happened have
> > been on dom0s without useful logs: there's just a gap in the logs
> > between when the problems started and when the (apparently)
> > problematic domU was destroyed. The problematic domU can then be
> > booted again and life goes on.
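> >
> > Concretely, that recovery is something like the following (the
> > domain name is just a placeholder):
> >
> >   xentop -b -i 1                 # spot the guest with the huge VBD_RD
> >   xl destroy <domname>
> >   xl create /etc/xen/<domname>.cfg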
> >
> > So, it could be totally unrelated to Xen, and as I investigate
> > further I will try different kernels in dom0. But the way that
> > destroying a domU frees things up makes me wonder if it could be
> > Xen related, maybe scheduler related? Also, it's always the md
> > device that the guest block devices are on that stalls; IO to
> > other devices in dom0 appears to carry on fine.
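> >
> > (Something like "iostat -x 5" run in dom0 during an incident should
> > make that per-device difference visible, for what it's worth.)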
> >
> > Are there any hypervisor "magic sysrq"-style debug keys that could
> > provide useful information for ruling a Xen issue in or out?
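> >
> > (I'm thinking of something along the lines of:
> >
> >   xl debug-keys h    # list the available keys
> >   xl debug-keys q    # dump domain and vcpu state
> >   xl dmesg           # read the resulting output back
> >
> > if that sort of output would be useful to anyone.)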
> >
> > Should I try using the "credit" scheduler (instead of "credit2") at
> > next boot?
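> >
> > (I assume that would just mean booting the hypervisor with
> > something like:
> >
> >   GRUB_CMDLINE_XEN="sched=credit dom0_max_vcpus=2 dom0_vcpus_pin"
> >
> > and checking the scheduler reported by "xl info" after the reboot,
> > but please correct me if there is more to it.)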
> >
> > I *think* this has only been seen with kernel version
> > 4.19.0-13-amd64. Some of these servers have now been rebooted into
> > 4.19.0-14-amd64 (the latest available package) because of the
> > issue, and it has not yet recurred on them.
> >
> > If it does recur with 4.19.0-14-amd64, what kernel version would
> > you advise I try at the next reboot so as to take the Debian kernel
> > out of the picture? I will download an upstream kernel release and
> > build a Debian package out of it, using my existing kernel config
> > as a base.
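> >
> > (Presumably the usual route, roughly:
> >
> >   cp /boot/config-$(uname -r) .config
> >   make olddefconfig
> >   make -j$(nproc) bindeb-pkg
> >
> > unless there is a better-supported way of doing it.)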
> >
> > As Debian buster is on the 4.19 series, should I pick the latest
> > 4.19.x longterm to stay close to it, or the 5.10.x longterm, or
> > the 5.11.x stable?
> >
> > Thanks,
> > Andy
>
>

[-- Attachment #2: Type: text/html, Size: 26920 bytes --]

* Re: dom0 suddenly blocking on all access to md device
  2021-06-12 22:47   ` Rob Townley
@ 2021-06-12 23:13     ` Andy Smith
  0 siblings, 0 replies; 6+ messages in thread
From: Andy Smith @ 2021-06-12 23:13 UTC (permalink / raw)
  To: xen-devel

Hi Rob,

On Sat, Jun 12, 2021 at 05:47:49PM -0500, Rob Townley wrote:
> mdadm.conf has email reporting capabilities to alert to failing drives.
> Test that you receive emails.

I do receive those emails when such things occur, but the drives
are not failing.
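
For reference, the alerting side here is just the stock Debian mdadm
monitoring, roughly:

  # /etc/mdadm/mdadm.conf
  MAILADDR root

  # force a one-off test alert for each array
  mdadm --monitor --scan --oneshot --test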

Devices are not kicked out of the MD arrays; all IO just stalls
completely. Also, these incidents coincide with an upgrade of OS and
hypervisor and are happening on 5 different servers so far, so it
would be highly unlikely that so many devices suddenly went bad.

> Use mdadm to run tests on the raid.

Weekly scrubs already take place using /usr/share/mdadm/checkarray.

> smartctl -a /dev/

Yep, SMART health checks and self-testing are enabled.
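
For the record, those boil down to roughly the following (device
names obviously vary per server):

  /usr/share/mdadm/checkarray --all   # weekly cron; writes "check" to
                                      # /sys/block/md*/md/sync_action
  smartctl -H -a /dev/sda             # health / attribute report
  smartctl -t long /dev/sda           # the scheduled self-tests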

I've now put two test servers on linux-image-amd64/buster-backports,
and whenever one of the production servers next experiences the issue
I will reboot it into that kernel too.
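
(In practice that just means "apt install -t buster-backports
linux-image-amd64" on those machines, so they pick up the newer
backported kernel series.)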

Cheers,
Andy


end of thread, other threads:[~2021-06-12 23:14 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-26 22:39 dom0 suddenly blocking on all access to md device Andy Smith
2021-02-26 22:50 ` Andy Smith
2021-03-02  0:44 ` Andy Smith
2021-06-12 14:11 ` Andy Smith
2021-06-12 22:47   ` Rob Townley
2021-06-12 23:13     ` Andy Smith

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).