* [dm-crypt] Dm-integrity freeze
@ 2019-03-07 15:26 Victor Helmholtz
2019-03-07 16:37 ` Milan Broz
0 siblings, 1 reply; 10+ messages in thread
From: Victor Helmholtz @ 2019-03-07 15:26 UTC (permalink / raw)
To: dm-crypt
[-- Attachment #1: Type: text/plain, Size: 2982 bytes --]
Hi
I have recently had a problem with my server: all writes to the RAID drives were frozen and I had to force-reboot it. I looked at the kernel logs and I think dm-integrity might be the source of the problem. Could anyone help me understand the meaning of the following messages (this message was repeated 8 times; the full kernel log is here: https://pastebin.com/DMgMtNJq ):
Mar 1 23:48:21 unassigned-hostname kernel: [369732.498048] INFO: task kworker/0:112:26760 blocked for more than 120 seconds.
Mar 1 23:48:21 unassigned-hostname kernel: [369732.499784] Not tainted 4.19.0-2-amd64 #1 Debian 4.19.16-1
Mar 1 23:48:21 unassigned-hostname kernel: [369732.501222] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 1 23:48:21 unassigned-hostname kernel: [369732.503095] kworker/0:112 D 0 26760 2 0x80000080
Mar 1 23:48:21 unassigned-hostname kernel: [369732.504424] Workqueue: dm-integrity-metadata integrity_bio_wait [dm_integrity]
Mar 1 23:48:21 unassigned-hostname kernel: [369732.506149] Call Trace:
Mar 1 23:48:21 unassigned-hostname kernel: [369732.506782] ? __schedule+0x2a2/0x870
Mar 1 23:48:21 unassigned-hostname kernel: [369732.507706] schedule+0x28/0x80
Mar 1 23:48:21 unassigned-hostname kernel: [369732.508484] io_schedule+0x12/0x40
Mar 1 23:48:21 unassigned-hostname kernel: [369732.509321] wait_and_add_new_range+0x89/0xa0 [dm_integrity]
Mar 1 23:48:21 unassigned-hostname kernel: [369732.510709] dm_integrity_map_continue+0xe71/0x1020 [dm_integrity]
Mar 1 23:48:21 unassigned-hostname kernel: [369732.512193] ? __switch_to_asm+0x40/0x70
Mar 1 23:48:21 unassigned-hostname kernel: [369732.513149] ? __switch_to_asm+0x34/0x70
Mar 1 23:48:21 unassigned-hostname kernel: [369732.514105] ? __switch_to_asm+0x40/0x70
Mar 1 23:48:21 unassigned-hostname kernel: [369732.515088] ? __switch_to_asm+0x40/0x70
Mar 1 23:48:21 unassigned-hostname kernel: [369732.516048] ? __switch_to_asm+0x34/0x70
Mar 1 23:48:21 unassigned-hostname kernel: [369732.517000] ? __switch_to_asm+0x40/0x70
Mar 1 23:48:21 unassigned-hostname kernel: [369732.517978] ? __switch_to_asm+0x34/0x70
Mar 1 23:48:21 unassigned-hostname kernel: [369732.518968] ? __switch_to_asm+0x34/0x70
Mar 1 23:48:21 unassigned-hostname kernel: [369732.519926] ? __switch_to+0x8c/0x440
Mar 1 23:48:21 unassigned-hostname kernel: [369732.520822] process_one_work+0x1a7/0x3a0
Mar 1 23:48:21 unassigned-hostname kernel: [369732.521798] worker_thread+0x30/0x390
Mar 1 23:48:21 unassigned-hostname kernel: [369732.522725] ? pwq_unbound_release_workfn+0xd0/0xd0
Mar 1 23:48:21 unassigned-hostname kernel: [369732.523901] kthread+0x112/0x130
Mar 1 23:48:21 unassigned-hostname kernel: [369732.524691] ? kthread_bind+0x30/0x30
Mar 1 23:48:21 unassigned-hostname kernel: [369732.525584] ret_from_fork+0x35/0x40
Is this a bug in the kernel, or do I have a hardware problem?
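(For reference, the hung-task watchdog that prints these messages is controlled through standard procfs knobs, and a dump of all blocked tasks can be requested via the magic sysrq interface, if it is enabled. A sketch, to be run as root:)

```shell
# Show the current hung-task timeout (120 s by default)
cat /proc/sys/kernel/hung_task_timeout_secs

# Dump backtraces of all tasks in uninterruptible (D) state into the
# kernel log; requires the magic sysrq interface to be enabled
echo 1 > /proc/sys/kernel/sysrq
echo w > /proc/sysrq-trigger
dmesg | tail -n 200
```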
Regards
Victor
[-- Attachment #2: Type: text/html, Size: 4171 bytes --]
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [dm-crypt] Dm-integrity freeze
2019-03-07 15:26 [dm-crypt] Dm-integrity freeze Victor Helmholtz
@ 2019-03-07 16:37 ` Milan Broz
2019-05-21 8:43 ` [dm-crypt] " Hans van Kranenburg
0 siblings, 1 reply; 10+ messages in thread
From: Milan Broz @ 2019-03-07 16:37 UTC (permalink / raw)
To: Victor Helmholtz, dm-crypt; +Cc: device-mapper development, Mikulas Patocka
Hi,
from the full log it seems that it is stuck in tgtd (iSCSI).
Anyway, this is device-mapper, so dm-devel is the better list for this.
(Added to Cc, plus Mikulas.)
m.
On 3/7/19 4:26 PM, Victor Helmholtz wrote:
> Hi
>
> I have recently had a problem with my server: all writes to RAID drives were frozen and I had to force-reboot it. I looked at kernel logs and I think dm-integrity might be source of the problem. Could anyone help me understand meaning of following messages (this message was repeated 8 times, full kernel log is here: https://pastebin.com/DMgMtNJq ):
>
> [...]
>
> Is this a bug in kernel or do I have some problems with hardware?
>
> Regards
> Victor
>
> _______________________________________________
> dm-crypt mailing list
> dm-crypt@saout.de
> https://www.saout.de/mailman/listinfo/dm-crypt
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [dm-crypt] [dm-devel] Dm-integrity freeze
2019-03-07 16:37 ` Milan Broz
@ 2019-05-21 8:43 ` Hans van Kranenburg
0 siblings, 0 replies; 10+ messages in thread
From: Hans van Kranenburg @ 2019-05-21 8:43 UTC (permalink / raw)
To: Milan Broz, Victor Helmholtz, dm-crypt
Cc: device-mapper development, Mikulas Patocka
Hi,
I'm seeing the same lockup, also on 4.19. This is mdadm RAID10 on top of
four partitions, each carrying only dm-integrity.
It just happened out of the blue, no heavy load or anything. All IO to
it is frozen now.
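(For context, the layout described above can be reproduced roughly as follows. This is only a sketch: the device names /dev/sd[b-e]1, the mapping names, and /dev/md127 are placeholders, not taken from the report, and the commands will destroy existing data on the partitions.)

```shell
# Format four partitions as standalone dm-integrity devices (no
# encryption) and open them as device-mapper mappings.
for dev in b c d e; do
    integritysetup format /dev/sd${dev}1
    integritysetup open /dev/sd${dev}1 integ-${dev}
done

# Assemble the resulting dm-integrity mappings into a RAID10 array.
mdadm --create /dev/md127 --level=10 --raid-devices=4 \
    /dev/mapper/integ-b /dev/mapper/integ-c \
    /dev/mapper/integ-d /dev/mapper/integ-e
```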
[777184.932426] INFO: task md127_raid10:507 blocked for more than 120 seconds.
[777184.932544] Tainted: G W 4.19.0-mx4-amd64 #1 Debian 4.19.28-2+mendix1
[777184.932666] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[777184.932788] md127_raid10 D 0 507 2 0x80000000
[777184.932791] Call Trace:
[777184.932801] ? __schedule+0x2a2/0x870
[777184.932804] ? _raw_spin_unlock_irqrestore+0x14/0x20
[777184.932805] schedule+0x28/0x80
[777184.932814] md_super_wait+0x6e/0xa0 [md_mod]
[777184.932820] ? finish_wait+0x80/0x80
[777184.932824] write_page+0x172/0x320 [md_mod]
[777184.932826] ? __schedule+0x2aa/0x870
[777184.932828] ? _raw_spin_unlock_irqrestore+0x14/0x20
[777184.932832] md_update_sb.part.64+0x382/0x8f0 [md_mod]
[777184.932837] md_check_recovery+0x148/0x540 [md_mod]
[777184.932841] raid10d+0x62/0x1440 [raid10]
[777184.932843] ? __switch_to_asm+0x34/0x70
[777184.932845] ? finish_task_switch+0x78/0x260
[777184.932847] ? _raw_spin_lock_irqsave+0x15/0x40
[777184.932850] ? lock_timer_base+0x67/0x80
[777184.932852] ? _raw_spin_unlock_irqrestore+0x14/0x20
[777184.932854] ? try_to_del_timer_sync+0x4d/0x80
[777184.932855] ? del_timer_sync+0x35/0x40
[777184.932857] ? schedule_timeout+0x173/0x390
[777184.932861] ? md_rdev_init+0xb0/0xb0 [md_mod]
[777184.932864] ? md_thread+0x94/0x150 [md_mod]
[777184.932866] ? raid10_end_write_request+0x290/0x290 [raid10]
[777184.932869] md_thread+0x94/0x150 [md_mod]
[777184.932872] ? finish_wait+0x80/0x80
[777184.932873] kthread+0x112/0x130
[777184.932875] ? kthread_bind+0x30/0x30
[777184.932877] ret_from_fork+0x35/0x40
[777184.932884] INFO: task 1.xvda-1:1657 blocked for more than 120 seconds.
[777184.932979] Tainted: G W 4.19.0-mx4-amd64 #1 Debian 4.19.28-2+mendix1
[777184.933101] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[777184.933222] 1.xvda-1 D 0 1657 2 0x80000000
[777184.933223] Call Trace:
[777184.933225] ? __schedule+0x2a2/0x870
[777184.933227] ? _raw_spin_unlock_irqrestore+0x14/0x20
[777184.933228] schedule+0x28/0x80
[777184.933232] md_write_start+0x14b/0x220 [md_mod]
[777184.933234] ? finish_wait+0x80/0x80
[777184.933236] raid10_make_request+0x3d/0x130 [raid10]
[777184.933239] ? finish_wait+0x80/0x80
[777184.933242] md_handle_request+0x119/0x190 [md_mod]
[777184.933247] md_make_request+0x78/0x160 [md_mod]
[777184.933251] generic_make_request+0x1a4/0x410
[777184.933254] ? bvec_alloc+0x51/0xe0
[777184.933256] submit_bio+0x45/0x140
[777184.933258] ? bio_add_page+0x48/0x60
[777184.933263] dispatch_rw_block_io+0x68f/0x990 [xen_blkback]
[777184.933266] ? branch_type+0x2e4/0x350
[777184.933269] __do_block_io_op+0x2fc/0x5f0 [xen_blkback]
[777184.933271] ? _raw_spin_unlock_irqrestore+0x14/0x20
[777184.933272] ? try_to_del_timer_sync+0x4d/0x80
[777184.933275] xen_blkif_schedule+0xce/0x620 [xen_blkback]
[777184.933277] ? finish_wait+0x80/0x80
[777184.933279] ? xen_blkif_be_int+0x30/0x30 [xen_blkback]
[777184.933281] kthread+0x112/0x130
[777184.933282] ? kthread_bind+0x30/0x30
[777184.933283] ret_from_fork+0x35/0x40
[777184.933293] INFO: task 11.xvda-0:14864 blocked for more than 120 seconds.
[777184.933387] Tainted: G W 4.19.0-mx4-amd64 #1 Debian 4.19.28-2+mendix1
[777184.933509] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[777184.933638] 11.xvda-0 D 0 14864 2 0x80000000
[777184.933640] Call Trace:
[777184.933642] ? __schedule+0x2a2/0x870
[777184.933643] ? _raw_spin_unlock_irqrestore+0x14/0x20
[777184.933644] schedule+0x28/0x80
[777184.933648] md_write_start+0x14b/0x220 [md_mod]
[777184.933650] ? finish_wait+0x80/0x80
[777184.933652] raid10_make_request+0x3d/0x130 [raid10]
[777184.933654] ? finish_wait+0x80/0x80
[777184.933658] md_handle_request+0x119/0x190 [md_mod]
[777184.933662] md_make_request+0x78/0x160 [md_mod]
[777184.933664] generic_make_request+0x1a4/0x410
[777184.933666] submit_bio+0x45/0x140
[777184.933668] ? bio_add_page+0x48/0x60
[777184.933670] dispatch_rw_block_io+0x68f/0x990 [xen_blkback]
[777184.933672] ? branch_type+0x2e7/0x350
[777184.933674] __do_block_io_op+0x2fc/0x5f0 [xen_blkback]
[777184.933676] ? _raw_spin_unlock_irqrestore+0x14/0x20
[777184.933678] ? try_to_del_timer_sync+0x4d/0x80
[777184.933680] xen_blkif_schedule+0xce/0x620 [xen_blkback]
[777184.933682] ? __schedule+0x2aa/0x870
[777184.933684] ? finish_wait+0x80/0x80
[777184.933686] ? xen_blkif_be_int+0x30/0x30 [xen_blkback]
[777184.933687] kthread+0x112/0x130
[777184.933688] ? kthread_bind+0x30/0x30
[777184.933689] ret_from_fork+0x35/0x40
[777184.933694] INFO: task btrfs-transacti:15468 blocked for more than 120 seconds.
[777184.933815] Tainted: G W 4.19.0-mx4-amd64 #1 Debian 4.19.28-2+mendix1
[777184.933937] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[777184.934058] btrfs-transacti D 0 15468 2 0x80000000
[777184.934059] Call Trace:
[777184.934061] ? __schedule+0x2a2/0x870
[777184.934062] ? _raw_spin_unlock_irqrestore+0x14/0x20
[777184.934063] schedule+0x28/0x80
[777184.934067] md_write_start+0x14b/0x220 [md_mod]
[777184.934069] ? finish_wait+0x80/0x80
[777184.934072] raid10_make_request+0x3d/0x130 [raid10]
[777184.934074] ? finish_wait+0x80/0x80
[777184.934077] md_handle_request+0x119/0x190 [md_mod]
[777184.934081] md_make_request+0x78/0x160 [md_mod]
[777184.934083] generic_make_request+0x1a4/0x410
[777184.934085] submit_bio+0x45/0x140
[777184.934086] ? bio_clone_fast+0x2c/0x60
[777184.934119] btrfs_map_bio+0x20a/0x350 [btrfs]
[777184.934136] btree_submit_bio_hook+0x8d/0xc0 [btrfs]
[777184.934155] submit_one_bio+0x5f/0x80 [btrfs]
[777184.934173] flush_write_bio.isra.40+0x1d/0x30 [btrfs]
[777184.934191] btree_write_cache_pages+0x243/0x360 [btrfs]
[777184.934210] ? alloc_extent_state+0x1e/0xc0 [btrfs]
[777184.934226] ? btrfs_buffer_uptodate+0x33/0x50 [btrfs]
[777184.934238] ? read_block_for_search.isra.36+0xf2/0x340 [btrfs]
[777184.934241] do_writepages+0x41/0xd0
[777184.934244] ? __slab_alloc+0x27/0x30
[777184.934261] ? merge_state.part.45+0x3f/0x130 [btrfs]
[777184.934262] ? kmem_cache_alloc+0x15c/0x1c0
[777184.934279] ? clear_state_bit+0xdb/0x180 [btrfs]
[777184.934281] __filemap_fdatawrite_range+0xbe/0xf0
[777184.934298] btrfs_write_marked_extents+0x68/0x150 [btrfs]
[777184.934314] btrfs_write_and_wait_transaction.isra.21+0x4d/0xa0 [btrfs]
[777184.934331] btrfs_commit_transaction+0x56c/0x870 [btrfs]
[777184.934347] ? start_transaction+0x9d/0x3f0 [btrfs]
[777184.934362] transaction_kthread+0x147/0x180 [btrfs]
[777184.934379] ? btrfs_cleanup_transaction+0x530/0x530 [btrfs]
[777184.934380] kthread+0x112/0x130
[777184.934381] ? kthread_bind+0x30/0x30
[777184.934383] ret_from_fork+0x35/0x40
[777184.934391] INFO: task kworker/1:3:16943 blocked for more than 120 seconds.
[777184.934487] Tainted: G W 4.19.0-mx4-amd64 #1 Debian 4.19.28-2+mendix1
[777184.934617] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[777184.934747] kworker/1:3 D 0 16943 2 0x80000000
[777184.934753] Workqueue: dm-integrity-metadata integrity_bio_wait [dm_integrity]
[777184.934754] Call Trace:
[777184.934756] ? __schedule+0x2a2/0x870
[777184.934757] schedule+0x28/0x80
[777184.934760] io_schedule+0x12/0x40
[777184.934763] wait_and_add_new_range+0x89/0xa0 [dm_integrity]
[777184.934766] dm_integrity_map_continue+0xe71/0x1020 [dm_integrity]
[777184.934769] ? xen_load_sp0+0x77/0x170
[777184.934771] ? __switch_to+0x1e1/0x440
[777184.934773] ? finish_task_switch+0x78/0x260
[777184.934775] process_one_work+0x1a7/0x3a0
[777184.934777] worker_thread+0x30/0x390
[777184.934779] ? create_worker+0x1a0/0x1a0
[777184.934781] kthread+0x112/0x130
[777184.934782] ? kthread_bind+0x30/0x30
[777184.934783] ret_from_fork+0x35/0x40
[777184.934787] INFO: task kworker/0:8:17282 blocked for more than 120 seconds.
[777184.934882] Tainted: G W 4.19.0-mx4-amd64 #1 Debian 4.19.28-2+mendix1
[777184.935005] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[777184.935125] kworker/0:8 D 0 17282 2 0x80000000
[777184.935129] Workqueue: dm-integrity-metadata integrity_bio_wait [dm_integrity]
[777184.935130] Call Trace:
[777184.935132] ? __schedule+0x2a2/0x870
[777184.935133] schedule+0x28/0x80
[777184.935135] io_schedule+0x12/0x40
[777184.935137] wait_and_add_new_range+0x89/0xa0 [dm_integrity]
[777184.935139] dm_integrity_map_continue+0xe71/0x1020 [dm_integrity]
[777184.935141] ? xen_load_sp0+0x77/0x170
[777184.935143] ? __switch_to+0x1e1/0x440
[777184.935144] ? finish_task_switch+0x78/0x260
[777184.935146] process_one_work+0x1a7/0x3a0
[777184.935148] worker_thread+0x30/0x390
[777184.935149] ? create_worker+0x1a0/0x1a0
[777184.935150] kthread+0x112/0x130
[777184.935151] ? kthread_bind+0x30/0x30
[777184.935153] ret_from_fork+0x35/0x40
[777184.935156] INFO: task kworker/u8:9:17284 blocked for more than 120 seconds.
[777184.935251] Tainted: G W 4.19.0-mx4-amd64 #1 Debian 4.19.28-2+mendix1
[777184.935373] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[777184.935494] kworker/u8:9 D 0 17284 2 0x80000000
[777184.935515] Workqueue: btrfs-submit btrfs_submit_helper [btrfs]
[777184.935515] Call Trace:
[777184.935517] ? __schedule+0x2a2/0x870
[777184.935519] ? _raw_spin_unlock_irqrestore+0x14/0x20
[777184.935520] schedule+0x28/0x80
[777184.935525] md_write_start+0x14b/0x220 [md_mod]
[777184.935527] ? finish_wait+0x80/0x80
[777184.935529] raid10_make_request+0x3d/0x130 [raid10]
[777184.935531] ? finish_wait+0x80/0x80
[777184.935535] md_handle_request+0x119/0x190 [md_mod]
[777184.935539] md_make_request+0x78/0x160 [md_mod]
[777184.935541] generic_make_request+0x1a4/0x410
[777184.935544] submit_bio+0x45/0x140
[777184.935562] run_scheduled_bios+0x18e/0x440 [btrfs]
[777184.935581] normal_work_helper+0xba/0x300 [btrfs]
[777184.935584] process_one_work+0x1a7/0x3a0
[777184.935586] worker_thread+0x30/0x390
[777184.935587] ? create_worker+0x1a0/0x1a0
[777184.935588] kthread+0x112/0x130
[777184.935589] ? kthread_bind+0x30/0x30
[777184.935591] ret_from_fork+0x35/0x40
[777184.935593] INFO: task kworker/1:13:17315 blocked for more than 120 seconds.
[777184.935713] Tainted: G W 4.19.0-mx4-amd64 #1 Debian 4.19.28-2+mendix1
[777184.935836] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[777184.935957] kworker/1:13 D 0 17315 2 0x80000000
[777184.935960] Workqueue: dm-integrity-metadata integrity_bio_wait [dm_integrity]
[777184.935961] Call Trace:
[777184.935963] ? __schedule+0x2a2/0x870
[777184.935965] schedule+0x28/0x80
[777184.935967] io_schedule+0x12/0x40
[777184.935969] wait_and_add_new_range+0x89/0xa0 [dm_integrity]
[777184.935971] dm_integrity_map_continue+0xe71/0x1020 [dm_integrity]
[777184.935973] ? xen_load_sp0+0x77/0x170
[777184.935975] ? __switch_to+0x1e1/0x440
[777184.935976] ? finish_task_switch+0x78/0x260
[777184.935978] process_one_work+0x1a7/0x3a0
[777184.935980] worker_thread+0x30/0x390
[777184.935981] ? create_worker+0x1a0/0x1a0
[777184.935982] kthread+0x112/0x130
[777184.935983] ? kthread_bind+0x30/0x30
[777184.935985] ret_from_fork+0x35/0x40
[777184.935988] INFO: task kworker/0:20:17330 blocked for more than 120 seconds.
[777184.936083] Tainted: G W 4.19.0-mx4-amd64 #1 Debian 4.19.28-2+mendix1
[777184.936205] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[777184.936355] kworker/0:20 D 0 17330 2 0x80000000
[777184.936359] Workqueue: dm-integrity-metadata integrity_bio_wait [dm_integrity]
[777184.936360] Call Trace:
[777184.936363] ? __schedule+0x2a2/0x870
[777184.936364] schedule+0x28/0x80
[777184.936366] io_schedule+0x12/0x40
[777184.936368] wait_and_add_new_range+0x89/0xa0 [dm_integrity]
[777184.936370] dm_integrity_map_continue+0xe71/0x1020 [dm_integrity]
[777184.936373] ? xen_load_sp0+0x77/0x170
[777184.936374] ? __switch_to+0x1e1/0x440
[777184.936375] ? finish_task_switch+0x78/0x260
[777184.936377] process_one_work+0x1a7/0x3a0
[777184.936379] worker_thread+0x30/0x390
[777184.936381] ? create_worker+0x1a0/0x1a0
[777184.936382] kthread+0x112/0x130
[777184.936383] ? kthread_bind+0x30/0x30
[777184.936385] ret_from_fork+0x35/0x40
[777184.936390] INFO: task kworker/1:23:17371 blocked for more than 120 seconds.
[777184.936500] Tainted: G W 4.19.0-mx4-amd64 #1 Debian 4.19.28-2+mendix1
[777184.936625] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[777184.936762] kworker/1:23 D 0 17371 2 0x80000000
[777184.936775] Workqueue: dm-integrity-metadata integrity_bio_wait [dm_integrity]
[777184.936786] Call Trace:
[777184.936791] ? __schedule+0x2a2/0x870
[777184.936798] schedule+0x28/0x80
[777184.936805] io_schedule+0x12/0x40
[777184.936812] wait_and_add_new_range+0x89/0xa0 [dm_integrity]
[777184.936819] dm_integrity_map_continue+0xe71/0x1020 [dm_integrity]
[777184.936828] ? xen_load_sp0+0x77/0x170
[777184.936834] ? __switch_to+0x1e1/0x440
[777184.936842] ? finish_task_switch+0x78/0x260
[777184.936849] process_one_work+0x1a7/0x3a0
[777184.936856] worker_thread+0x30/0x390
[777184.936862] ? create_worker+0x1a0/0x1a0
[777184.936868] kthread+0x112/0x130
[777184.936875] ? kthread_bind+0x30/0x30
[777184.936880] ret_from_fork+0x35/0x40
Hans
On 3/7/19 5:37 PM, Milan Broz wrote:
> Hi,
>
> from the full log it seems that it is stuck in tgtd (iSCSI).
>
> Anyway, this is device-mapper, dm-devel is better list here.
> (added to cc + Mikulas)
>
> m.
>
> On 3/7/19 4:26 PM, Victor Helmholtz wrote:
>> Hi
>>
>> I have recently had a problem with my server: all writes to RAID drives were frozen and I had to force-reboot it. I looked at kernel logs and I think dm-integrity might be source of the problem. Could anyone help me understand meaning of following messages (this message was repeated 8 times, full kernel log is here: https://pastebin.com/DMgMtNJq ):
>>
>> [...]
>>
>> Is this a bug in kernel or do I have some problems with hardware?
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [dm-crypt] Dm-integrity freeze
@ 2019-05-21 8:43 ` Hans van Kranenburg
0 siblings, 0 replies; 10+ messages in thread
From: Hans van Kranenburg @ 2019-05-21 8:43 UTC (permalink / raw)
To: Milan Broz, Victor Helmholtz, dm-crypt
Cc: device-mapper development, Mikulas Patocka
Hi,
I'm seeing the same lockup, also 4.19. This is mdadm RAID10 on top of 4x
a partition with only dm-integrity.
It just happened out of the blue, no heavy load or anything. All IO to
it is frozen now.
[777184.932426] INFO: task md127_raid10:507 blocked for more than 120
seconds.
[777184.932544] Tainted: G W 4.19.0-mx4-amd64 #1
Debian 4.19.28-2+mendix1
[777184.932666] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[777184.932788] md127_raid10 D 0 507 2 0x80000000
[777184.932791] Call Trace:
[777184.932801] ? __schedule+0x2a2/0x870
[777184.932804] ? _raw_spin_unlock_irqrestore+0x14/0x20
[777184.932805] schedule+0x28/0x80
[777184.932814] md_super_wait+0x6e/0xa0 [md_mod]
[777184.932820] ? finish_wait+0x80/0x80
[777184.932824] write_page+0x172/0x320 [md_mod]
[777184.932826] ? __schedule+0x2aa/0x870
[777184.932828] ? _raw_spin_unlock_irqrestore+0x14/0x20
[777184.932832] md_update_sb.part.64+0x382/0x8f0 [md_mod]
[777184.932837] md_check_recovery+0x148/0x540 [md_mod]
[777184.932841] raid10d+0x62/0x1440 [raid10]
[777184.932843] ? __switch_to_asm+0x34/0x70
[777184.932845] ? finish_task_switch+0x78/0x260
[777184.932847] ? _raw_spin_lock_irqsave+0x15/0x40
[777184.932850] ? lock_timer_base+0x67/0x80
[777184.932852] ? _raw_spin_unlock_irqrestore+0x14/0x20
[777184.932854] ? try_to_del_timer_sync+0x4d/0x80
[777184.932855] ? del_timer_sync+0x35/0x40
[777184.932857] ? schedule_timeout+0x173/0x390
[777184.932861] ? md_rdev_init+0xb0/0xb0 [md_mod]
[777184.932864] ? md_thread+0x94/0x150 [md_mod]
[777184.932866] ? raid10_end_write_request+0x290/0x290 [raid10]
[777184.932869] md_thread+0x94/0x150 [md_mod]
[777184.932872] ? finish_wait+0x80/0x80
[777184.932873] kthread+0x112/0x130
[777184.932875] ? kthread_bind+0x30/0x30
[777184.932877] ret_from_fork+0x35/0x40
[777184.932884] INFO: task 1.xvda-1:1657 blocked for more than 120 seconds.
[777184.932979] Tainted: G W 4.19.0-mx4-amd64 #1
Debian 4.19.28-2+mendix1
[777184.933101] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[777184.933222] 1.xvda-1 D 0 1657 2 0x80000000
[777184.933223] Call Trace:
[777184.933225] ? __schedule+0x2a2/0x870
[777184.933227] ? _raw_spin_unlock_irqrestore+0x14/0x20
[777184.933228] schedule+0x28/0x80
[777184.933232] md_write_start+0x14b/0x220 [md_mod]
[777184.933234] ? finish_wait+0x80/0x80
[777184.933236] raid10_make_request+0x3d/0x130 [raid10]
[777184.933239] ? finish_wait+0x80/0x80
[777184.933242] md_handle_request+0x119/0x190 [md_mod]
[777184.933247] md_make_request+0x78/0x160 [md_mod]
[777184.933251] generic_make_request+0x1a4/0x410
[777184.933254] ? bvec_alloc+0x51/0xe0
[777184.933256] submit_bio+0x45/0x140
[777184.933258] ? bio_add_page+0x48/0x60
[777184.933263] dispatch_rw_block_io+0x68f/0x990 [xen_blkback]
[777184.933266] ? branch_type+0x2e4/0x350
[777184.933269] __do_block_io_op+0x2fc/0x5f0 [xen_blkback]
[777184.933271] ? _raw_spin_unlock_irqrestore+0x14/0x20
[777184.933272] ? try_to_del_timer_sync+0x4d/0x80
[777184.933275] xen_blkif_schedule+0xce/0x620 [xen_blkback]
[777184.933277] ? finish_wait+0x80/0x80
[777184.933279] ? xen_blkif_be_int+0x30/0x30 [xen_blkback]
[777184.933281] kthread+0x112/0x130
[777184.933282] ? kthread_bind+0x30/0x30
[777184.933283] ret_from_fork+0x35/0x40
[777184.933293] INFO: task 11.xvda-0:14864 blocked for more than 120
seconds.
[777184.933387] Tainted: G W 4.19.0-mx4-amd64 #1
Debian 4.19.28-2+mendix1
[777184.933509] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[777184.933638] 11.xvda-0 D 0 14864 2 0x80000000
[777184.933640] Call Trace:
[777184.933642] ? __schedule+0x2a2/0x870
[777184.933643] ? _raw_spin_unlock_irqrestore+0x14/0x20
[777184.933644] schedule+0x28/0x80
[777184.933648] md_write_start+0x14b/0x220 [md_mod]
[777184.933650] ? finish_wait+0x80/0x80
[777184.933652] raid10_make_request+0x3d/0x130 [raid10]
[777184.933654] ? finish_wait+0x80/0x80
[777184.933658] md_handle_request+0x119/0x190 [md_mod]
[777184.933662] md_make_request+0x78/0x160 [md_mod]
[777184.933664] generic_make_request+0x1a4/0x410
[777184.933666] submit_bio+0x45/0x140
[777184.933668] ? bio_add_page+0x48/0x60
[777184.933670] dispatch_rw_block_io+0x68f/0x990 [xen_blkback]
[777184.933672] ? branch_type+0x2e7/0x350
[777184.933674] __do_block_io_op+0x2fc/0x5f0 [xen_blkback]
[777184.933676] ? _raw_spin_unlock_irqrestore+0x14/0x20
[777184.933678] ? try_to_del_timer_sync+0x4d/0x80
[777184.933680] xen_blkif_schedule+0xce/0x620 [xen_blkback]
[777184.933682] ? __schedule+0x2aa/0x870
[777184.933684] ? finish_wait+0x80/0x80
[777184.933686] ? xen_blkif_be_int+0x30/0x30 [xen_blkback]
[777184.933687] kthread+0x112/0x130
[777184.933688] ? kthread_bind+0x30/0x30
[777184.933689] ret_from_fork+0x35/0x40
[777184.933694] INFO: task btrfs-transacti:15468 blocked for more than
120 seconds.
[777184.933815] Tainted: G W 4.19.0-mx4-amd64 #1
Debian 4.19.28-2+mendix1
[777184.933937] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[777184.934058] btrfs-transacti D 0 15468 2 0x80000000
[777184.934059] Call Trace:
[777184.934061] ? __schedule+0x2a2/0x870
[777184.934062] ? _raw_spin_unlock_irqrestore+0x14/0x20
[777184.934063] schedule+0x28/0x80
[777184.934067] md_write_start+0x14b/0x220 [md_mod]
[777184.934069] ? finish_wait+0x80/0x80
[777184.934072] raid10_make_request+0x3d/0x130 [raid10]
[777184.934074] ? finish_wait+0x80/0x80
[777184.934077] md_handle_request+0x119/0x190 [md_mod]
[777184.934081] md_make_request+0x78/0x160 [md_mod]
[777184.934083] generic_make_request+0x1a4/0x410
[777184.934085] submit_bio+0x45/0x140
[777184.934086] ? bio_clone_fast+0x2c/0x60
[777184.934119] btrfs_map_bio+0x20a/0x350 [btrfs]
[777184.934136] btree_submit_bio_hook+0x8d/0xc0 [btrfs]
[777184.934155] submit_one_bio+0x5f/0x80 [btrfs]
[777184.934173] flush_write_bio.isra.40+0x1d/0x30 [btrfs]
[777184.934191] btree_write_cache_pages+0x243/0x360 [btrfs]
[777184.934210] ? alloc_extent_state+0x1e/0xc0 [btrfs]
[777184.934226] ? btrfs_buffer_uptodate+0x33/0x50 [btrfs]
[777184.934238] ? read_block_for_search.isra.36+0xf2/0x340 [btrfs]
[777184.934241] do_writepages+0x41/0xd0
[777184.934244] ? __slab_alloc+0x27/0x30
[777184.934261] ? merge_state.part.45+0x3f/0x130 [btrfs]
[777184.934262] ? kmem_cache_alloc+0x15c/0x1c0
[777184.934279] ? clear_state_bit+0xdb/0x180 [btrfs]
[777184.934281] __filemap_fdatawrite_range+0xbe/0xf0
[777184.934298] btrfs_write_marked_extents+0x68/0x150 [btrfs]
[777184.934314] btrfs_write_and_wait_transaction.isra.21+0x4d/0xa0 [btrfs]
[777184.934331] btrfs_commit_transaction+0x56c/0x870 [btrfs]
[777184.934347] ? start_transaction+0x9d/0x3f0 [btrfs]
[777184.934362] transaction_kthread+0x147/0x180 [btrfs]
[777184.934379] ? btrfs_cleanup_transaction+0x530/0x530 [btrfs]
[777184.934380] kthread+0x112/0x130
[777184.934381] ? kthread_bind+0x30/0x30
[777184.934383] ret_from_fork+0x35/0x40
[777184.934391] INFO: task kworker/1:3:16943 blocked for more than 120
seconds.
[777184.934487] Tainted: G W 4.19.0-mx4-amd64 #1
Debian 4.19.28-2+mendix1
[777184.934617] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[777184.934747] kworker/1:3 D 0 16943 2 0x80000000
[777184.934753] Workqueue: dm-integrity-metadata integrity_bio_wait
[dm_integrity]
[777184.934754] Call Trace:
[777184.934756] ? __schedule+0x2a2/0x870
[777184.934757] schedule+0x28/0x80
[777184.934760] io_schedule+0x12/0x40
[777184.934763] wait_and_add_new_range+0x89/0xa0 [dm_integrity]
[777184.934766] dm_integrity_map_continue+0xe71/0x1020 [dm_integrity]
[777184.934769] ? xen_load_sp0+0x77/0x170
[777184.934771] ? __switch_to+0x1e1/0x440
[777184.934773] ? finish_task_switch+0x78/0x260
[777184.934775] process_one_work+0x1a7/0x3a0
[777184.934777] worker_thread+0x30/0x390
[777184.934779] ? create_worker+0x1a0/0x1a0
[777184.934781] kthread+0x112/0x130
[777184.934782] ? kthread_bind+0x30/0x30
[777184.934783] ret_from_fork+0x35/0x40
[777184.934787] INFO: task kworker/0:8:17282 blocked for more than 120 seconds.
[777184.934882] Tainted: G W 4.19.0-mx4-amd64 #1 Debian 4.19.28-2+mendix1
[777184.935005] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[777184.935125] kworker/0:8 D 0 17282 2 0x80000000
[777184.935129] Workqueue: dm-integrity-metadata integrity_bio_wait [dm_integrity]
[777184.935130] Call Trace:
[777184.935132] ? __schedule+0x2a2/0x870
[777184.935133] schedule+0x28/0x80
[777184.935135] io_schedule+0x12/0x40
[777184.935137] wait_and_add_new_range+0x89/0xa0 [dm_integrity]
[777184.935139] dm_integrity_map_continue+0xe71/0x1020 [dm_integrity]
[777184.935141] ? xen_load_sp0+0x77/0x170
[777184.935143] ? __switch_to+0x1e1/0x440
[777184.935144] ? finish_task_switch+0x78/0x260
[777184.935146] process_one_work+0x1a7/0x3a0
[777184.935148] worker_thread+0x30/0x390
[777184.935149] ? create_worker+0x1a0/0x1a0
[777184.935150] kthread+0x112/0x130
[777184.935151] ? kthread_bind+0x30/0x30
[777184.935153] ret_from_fork+0x35/0x40
[777184.935156] INFO: task kworker/u8:9:17284 blocked for more than 120 seconds.
[777184.935251] Tainted: G W 4.19.0-mx4-amd64 #1 Debian 4.19.28-2+mendix1
[777184.935373] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[777184.935494] kworker/u8:9 D 0 17284 2 0x80000000
[777184.935515] Workqueue: btrfs-submit btrfs_submit_helper [btrfs]
[777184.935515] Call Trace:
[777184.935517] ? __schedule+0x2a2/0x870
[777184.935519] ? _raw_spin_unlock_irqrestore+0x14/0x20
[777184.935520] schedule+0x28/0x80
[777184.935525] md_write_start+0x14b/0x220 [md_mod]
[777184.935527] ? finish_wait+0x80/0x80
[777184.935529] raid10_make_request+0x3d/0x130 [raid10]
[777184.935531] ? finish_wait+0x80/0x80
[777184.935535] md_handle_request+0x119/0x190 [md_mod]
[777184.935539] md_make_request+0x78/0x160 [md_mod]
[777184.935541] generic_make_request+0x1a4/0x410
[777184.935544] submit_bio+0x45/0x140
[777184.935562] run_scheduled_bios+0x18e/0x440 [btrfs]
[777184.935581] normal_work_helper+0xba/0x300 [btrfs]
[777184.935584] process_one_work+0x1a7/0x3a0
[777184.935586] worker_thread+0x30/0x390
[777184.935587] ? create_worker+0x1a0/0x1a0
[777184.935588] kthread+0x112/0x130
[777184.935589] ? kthread_bind+0x30/0x30
[777184.935591] ret_from_fork+0x35/0x40
[777184.935593] INFO: task kworker/1:13:17315 blocked for more than 120 seconds.
[777184.935713] Tainted: G W 4.19.0-mx4-amd64 #1 Debian 4.19.28-2+mendix1
[777184.935836] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[777184.935957] kworker/1:13 D 0 17315 2 0x80000000
[777184.935960] Workqueue: dm-integrity-metadata integrity_bio_wait [dm_integrity]
[777184.935961] Call Trace:
[777184.935963] ? __schedule+0x2a2/0x870
[777184.935965] schedule+0x28/0x80
[777184.935967] io_schedule+0x12/0x40
[777184.935969] wait_and_add_new_range+0x89/0xa0 [dm_integrity]
[777184.935971] dm_integrity_map_continue+0xe71/0x1020 [dm_integrity]
[777184.935973] ? xen_load_sp0+0x77/0x170
[777184.935975] ? __switch_to+0x1e1/0x440
[777184.935976] ? finish_task_switch+0x78/0x260
[777184.935978] process_one_work+0x1a7/0x3a0
[777184.935980] worker_thread+0x30/0x390
[777184.935981] ? create_worker+0x1a0/0x1a0
[777184.935982] kthread+0x112/0x130
[777184.935983] ? kthread_bind+0x30/0x30
[777184.935985] ret_from_fork+0x35/0x40
[777184.935988] INFO: task kworker/0:20:17330 blocked for more than 120 seconds.
[777184.936083] Tainted: G W 4.19.0-mx4-amd64 #1 Debian 4.19.28-2+mendix1
[777184.936205] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[777184.936355] kworker/0:20 D 0 17330 2 0x80000000
[777184.936359] Workqueue: dm-integrity-metadata integrity_bio_wait [dm_integrity]
[777184.936360] Call Trace:
[777184.936363] ? __schedule+0x2a2/0x870
[777184.936364] schedule+0x28/0x80
[777184.936366] io_schedule+0x12/0x40
[777184.936368] wait_and_add_new_range+0x89/0xa0 [dm_integrity]
[777184.936370] dm_integrity_map_continue+0xe71/0x1020 [dm_integrity]
[777184.936373] ? xen_load_sp0+0x77/0x170
[777184.936374] ? __switch_to+0x1e1/0x440
[777184.936375] ? finish_task_switch+0x78/0x260
[777184.936377] process_one_work+0x1a7/0x3a0
[777184.936379] worker_thread+0x30/0x390
[777184.936381] ? create_worker+0x1a0/0x1a0
[777184.936382] kthread+0x112/0x130
[777184.936383] ? kthread_bind+0x30/0x30
[777184.936385] ret_from_fork+0x35/0x40
[777184.936390] INFO: task kworker/1:23:17371 blocked for more than 120 seconds.
[777184.936500] Tainted: G W 4.19.0-mx4-amd64 #1 Debian 4.19.28-2+mendix1
[777184.936625] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[777184.936762] kworker/1:23 D 0 17371 2 0x80000000
[777184.936775] Workqueue: dm-integrity-metadata integrity_bio_wait [dm_integrity]
[777184.936786] Call Trace:
[777184.936791] ? __schedule+0x2a2/0x870
[777184.936798] schedule+0x28/0x80
[777184.936805] io_schedule+0x12/0x40
[777184.936812] wait_and_add_new_range+0x89/0xa0 [dm_integrity]
[777184.936819] dm_integrity_map_continue+0xe71/0x1020 [dm_integrity]
[777184.936828] ? xen_load_sp0+0x77/0x170
[777184.936834] ? __switch_to+0x1e1/0x440
[777184.936842] ? finish_task_switch+0x78/0x260
[777184.936849] process_one_work+0x1a7/0x3a0
[777184.936856] worker_thread+0x30/0x390
[777184.936862] ? create_worker+0x1a0/0x1a0
[777184.936868] kthread+0x112/0x130
[777184.936875] ? kthread_bind+0x30/0x30
[777184.936880] ret_from_fork+0x35/0x40
Hans
On 3/7/19 5:37 PM, Milan Broz wrote:
> Hi,
>
> from the full log it seems that it is stuck in tgtd (iSCSI).
>
> Anyway, this is device-mapper, dm-devel is better list here.
> (added to cc + Mikulas)
>
> m.
>
> On 3/7/19 4:26 PM, Victor Helmholtz wrote:
>> Hi
>>
>> I have recently had a problem with my server: all writes to RAID drives were frozen and I had to force-reboot it. I looked at kernel logs and I think dm-integrity might be source of the problem. Could anyone help me understand meaning of following messages (this message was repeated 8 times, full kernel log is here: https://pastebin.com/DMgMtNJq ):
>>
>> Mar 1 23:48:21 unassigned-hostname kernel: [369732.498048] INFO: task kworker/0:112:26760 blocked for more than 120 seconds.
>> Mar 1 23:48:21 unassigned-hostname kernel: [369732.499784] Not tainted 4.19.0-2-amd64 #1 Debian 4.19.16-1
>> Mar 1 23:48:21 unassigned-hostname kernel: [369732.501222] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> Mar 1 23:48:21 unassigned-hostname kernel: [369732.503095] kworker/0:112 D 0 26760 2 0x80000080
>> Mar 1 23:48:21 unassigned-hostname kernel: [369732.504424] Workqueue: dm-integrity-metadata integrity_bio_wait [dm_integrity]
>> Mar 1 23:48:21 unassigned-hostname kernel: [369732.506149] Call Trace:
>> Mar 1 23:48:21 unassigned-hostname kernel: [369732.506782] ? __schedule+0x2a2/0x870
>> Mar 1 23:48:21 unassigned-hostname kernel: [369732.507706] schedule+0x28/0x80
>> Mar 1 23:48:21 unassigned-hostname kernel: [369732.508484] io_schedule+0x12/0x40
>> Mar 1 23:48:21 unassigned-hostname kernel: [369732.509321] wait_and_add_new_range+0x89/0xa0 [dm_integrity]
>> Mar 1 23:48:21 unassigned-hostname kernel: [369732.510709] dm_integrity_map_continue+0xe71/0x1020 [dm_integrity]
>> Mar 1 23:48:21 unassigned-hostname kernel: [369732.512193] ? __switch_to_asm+0x40/0x70
>> Mar 1 23:48:21 unassigned-hostname kernel: [369732.513149] ? __switch_to_asm+0x34/0x70
>> Mar 1 23:48:21 unassigned-hostname kernel: [369732.514105] ? __switch_to_asm+0x40/0x70
>> Mar 1 23:48:21 unassigned-hostname kernel: [369732.515088] ? __switch_to_asm+0x40/0x70
>> Mar 1 23:48:21 unassigned-hostname kernel: [369732.516048] ? __switch_to_asm+0x34/0x70
>> Mar 1 23:48:21 unassigned-hostname kernel: [369732.517000] ? __switch_to_asm+0x40/0x70
>> Mar 1 23:48:21 unassigned-hostname kernel: [369732.517978] ? __switch_to_asm+0x34/0x70
>> Mar 1 23:48:21 unassigned-hostname kernel: [369732.518968] ? __switch_to_asm+0x34/0x70
>> Mar 1 23:48:21 unassigned-hostname kernel: [369732.519926] ? __switch_to+0x8c/0x440
>> Mar 1 23:48:21 unassigned-hostname kernel: [369732.520822] process_one_work+0x1a7/0x3a0
>> Mar 1 23:48:21 unassigned-hostname kernel: [369732.521798] worker_thread+0x30/0x390
>> Mar 1 23:48:21 unassigned-hostname kernel: [369732.522725] ? pwq_unbound_release_workfn+0xd0/0xd0
>> Mar 1 23:48:21 unassigned-hostname kernel: [369732.523901] kthread+0x112/0x130
>> Mar 1 23:48:21 unassigned-hostname kernel: [369732.524691] ? kthread_bind+0x30/0x30
>> Mar 1 23:48:21 unassigned-hostname kernel: [369732.525584] ret_from_fork+0x35/0x40
>>
>> Is this a bug in kernel or do I have some problems with hardware?
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [dm-crypt] [dm-devel] Dm-integrity freeze
2019-05-21 8:43 ` [dm-crypt] " Hans van Kranenburg
@ 2019-05-21 20:33 ` Hans van Kranenburg
-1 siblings, 0 replies; 10+ messages in thread
From: Hans van Kranenburg @ 2019-05-21 20:33 UTC (permalink / raw)
To: Milan Broz, Victor Helmholtz, dm-crypt
Cc: device-mapper development, Mikulas Patocka
Hi,
On 5/21/19 10:43 AM, Hans van Kranenburg wrote:
> Hi,
>
> I'm seeing the same lockup, also 4.19. This is mdadm RAID10 on top of 4x
> a partition with only dm-integrity.
>
> It just happened out of the blue, no heavy load or anything. All IO to
> it is frozen now.
>
> [...]
There it is again... dmesg dump below. All CPUs are at 100% iowait.
It's triggered after a few minutes by running a Windows Server 2019
install (ugh, don't ask) in a Xen HVM domU, which writes into a raw
sparse file on a btrfs filesystem on LVM on mdadm RAID10 on 4x
dm-integrity (wheeee!!)...
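For anyone wanting to reproduce a comparable setup, the layering described above can be sketched roughly as follows. This is a hypothetical reconstruction: the device names, sizes, and all integritysetup/mdadm/LVM options are placeholders, not the settings actually used on this machine.

```shell
# Hypothetical reconstruction of the reported stack; /dev/sd[abcd]1 are placeholders.
# 1. Standalone dm-integrity (no encryption) on each partition
for dev in /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1; do
    integritysetup format "$dev"
    integritysetup open "$dev" "int-${dev##*/}"
done

# 2. mdadm RAID10 across the four dm-integrity devices
mdadm --create /dev/md127 --level=10 --raid-devices=4 \
    /dev/mapper/int-sda1 /dev/mapper/int-sdb1 \
    /dev/mapper/int-sdc1 /dev/mapper/int-sdd1

# 3. LVM on top of the array, then btrfs in a logical volume
pvcreate /dev/md127
vgcreate vg0 /dev/md127
lvcreate -L 100G -n vms vg0
mkfs.btrfs /dev/vg0/vms
```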
This morning it was triggered a few minutes after starting an old
Windows Server 2008 image that I copied to this machine.
When running only Linux VMs, and when copying data onto filesystems
that live in LVM logical volumes, I haven't seen this problem at all
in the few weeks that this machine has been running.
I noticed there's a "dm integrity: fix deadlock with overlapping I/O"
fix in a later 4.19. Is there any chance this is related? I have no
idea, but any hints or suggestions about what to try would be appreciated.
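One way to check whether that fix is in a given stable release is to search the stable tree directly. A sketch, assuming a local linux-stable checkout (the tag `v4.19.28` matches the running kernel's base; adjust as needed):

```shell
# Fetch the 4.19 stable branch and list dm-integrity commits after 4.19.28
git fetch --tags \
    https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git linux-4.19.y
git log --oneline v4.19.28..FETCH_HEAD -- drivers/md/dm-integrity.c
```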
Right now, I still have the opportunity to stop the migration and keep
using the previous hardware for a few days, which gives me the chance to
reproduce this a few times and try things, if you want.
If not, I'll likely have to try removing the dm-integrity layer, but I'd
rather help fix this issue.
Thanks,
[37218.238621] INFO: task md127_raid10:485 blocked for more than 120 seconds.
[37218.238713] Not tainted 4.19.0-mx4-amd64 #1 Debian 4.19.28-2+mendix1
[37218.238782] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[37218.238856] md127_raid10 D 0 485 2 0x80000000
[37218.238859] Call Trace:
[37218.238868] ? __schedule+0x2a2/0x870
[37218.238871] ? _raw_spin_unlock_irqrestore+0x14/0x20
[37218.238872] schedule+0x28/0x80
[37218.238882] md_super_wait+0x6e/0xa0 [md_mod]
[37218.238887] ? finish_wait+0x80/0x80
[37218.238892] write_page+0x172/0x320 [md_mod]
[37218.238894] ? __schedule+0x2aa/0x870
[37218.238895] ? _raw_spin_unlock_irqrestore+0x14/0x20
[37218.238900] md_update_sb.part.64+0x382/0x8f0 [md_mod]
[37218.238904] md_check_recovery+0x148/0x540 [md_mod]
[37218.238907] raid10d+0x62/0x1440 [raid10]
[37218.238910] ? __switch_to_asm+0x34/0x70
[37218.238912] ? finish_task_switch+0x78/0x260
[37218.238914] ? _raw_spin_lock_irqsave+0x15/0x40
[37218.238917] ? lock_timer_base+0x67/0x80
[37218.238918] ? _raw_spin_unlock_irqrestore+0x14/0x20
[37218.238920] ? try_to_del_timer_sync+0x4d/0x80
[37218.238922] ? del_timer_sync+0x35/0x40
[37218.238923] ? schedule_timeout+0x173/0x390
[37218.238927] ? md_rdev_init+0xb0/0xb0 [md_mod]
[37218.238930] ? md_thread+0x94/0x150 [md_mod]
[37218.238932] ? raid10_end_write_request+0x290/0x290 [raid10]
[37218.238936] md_thread+0x94/0x150 [md_mod]
[37218.238938] ? finish_wait+0x80/0x80
[37218.238940] kthread+0x112/0x130
[37218.238941] ? kthread_bind+0x30/0x30
[37218.238943] ret_from_fork+0x35/0x40
[37218.238951] INFO: task btrfs-transacti:575 blocked for more than 120 seconds.
[37218.239023] Not tainted 4.19.0-mx4-amd64 #1 Debian 4.19.28-2+mendix1
[37218.239092] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[37218.239164] btrfs-transacti D 0 575 2 0x80000000
[37218.239166] Call Trace:
[37218.239168] ? __schedule+0x2a2/0x870
[37218.239169] ? _raw_spin_unlock_irqrestore+0x14/0x20
[37218.239171] schedule+0x28/0x80
[37218.239175] md_write_start+0x14b/0x220 [md_mod]
[37218.239177] ? finish_wait+0x80/0x80
[37218.239179] raid10_make_request+0x3d/0x130 [raid10]
[37218.239181] ? finish_wait+0x80/0x80
[37218.239185] md_handle_request+0x119/0x190 [md_mod]
[37218.239189] md_make_request+0x78/0x160 [md_mod]
[37218.239192] generic_make_request+0x1a4/0x410
[37218.239195] submit_bio+0x45/0x140
[37218.239196] ? bio_clone_fast+0x2c/0x60
[37218.239228] btrfs_map_bio+0x20a/0x350 [btrfs]
[37218.239246] btree_submit_bio_hook+0x8d/0xc0 [btrfs]
[37218.239264] submit_one_bio+0x5f/0x80 [btrfs]
[37218.239283] flush_write_bio.isra.40+0x1d/0x30 [btrfs]
[37218.239301] btree_write_cache_pages+0x243/0x360 [btrfs]
[37218.239318] ? btrfs_buffer_uptodate+0x33/0x50 [btrfs]
[37218.239331] ? read_block_for_search.isra.36+0xf2/0x340 [btrfs]
[37218.239333] do_writepages+0x41/0xd0
[37218.239351] ? merge_state.part.45+0x3f/0x130 [btrfs]
[37218.239353] ? kmem_cache_alloc+0x15c/0x1c0
[37218.239370] ? clear_state_bit+0xdb/0x180 [btrfs]
[37218.239372] __filemap_fdatawrite_range+0xbe/0xf0
[37218.239388] btrfs_write_marked_extents+0x68/0x150 [btrfs]
[37218.239405] btrfs_write_and_wait_transaction.isra.21+0x4d/0xa0 [btrfs]
[37218.239421] btrfs_commit_transaction+0x56c/0x870 [btrfs]
[37218.239437] ? start_transaction+0x9d/0x3f0 [btrfs]
[37218.239453] transaction_kthread+0x147/0x180 [btrfs]
[37218.239468] ? btrfs_cleanup_transaction+0x530/0x530 [btrfs]
[37218.239470] kthread+0x112/0x130
[37218.239471] ? kthread_bind+0x30/0x30
[37218.239473] ret_from_fork+0x35/0x40
[37218.239481] INFO: task 1.xvda-1:1820 blocked for more than 120 seconds.
[37218.239551] Not tainted 4.19.0-mx4-amd64 #1 Debian 4.19.28-2+mendix1
[37218.239620] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[37218.239692] 1.xvda-1 D 0 1820 2 0x80000000
[37218.239694] Call Trace:
[37218.239696] ? __schedule+0x2a2/0x870
[37218.239698] ? _raw_spin_unlock_irqrestore+0x14/0x20
[37218.239699] schedule+0x28/0x80
[37218.239703] md_write_start+0x14b/0x220 [md_mod]
[37218.239706] ? finish_wait+0x80/0x80
[37218.239708] raid10_make_request+0x3d/0x130 [raid10]
[37218.239710] ? finish_wait+0x80/0x80
[37218.239713] md_handle_request+0x119/0x190 [md_mod]
[37218.239718] md_make_request+0x78/0x160 [md_mod]
[37218.239720] generic_make_request+0x1a4/0x410
[37218.239723] ? bvec_alloc+0x51/0xe0
[37218.239725] submit_bio+0x45/0x140
[37218.239727] ? bio_add_page+0x48/0x60
[37218.239731] dispatch_rw_block_io+0x68f/0x990 [xen_blkback]
[37218.239734] ? branch_type+0x2e4/0x350
[37218.239737] __do_block_io_op+0x2fc/0x5f0 [xen_blkback]
[37218.239739] ? _raw_spin_unlock_irqrestore+0x14/0x20
[37218.239740] ? try_to_del_timer_sync+0x4d/0x80
[37218.239743] xen_blkif_schedule+0xce/0x620 [xen_blkback]
[37218.239745] ? finish_wait+0x80/0x80
[37218.239748] ? xen_blkif_be_int+0x30/0x30 [xen_blkback]
[37218.239749] kthread+0x112/0x130
[37218.239750] ? kthread_bind+0x30/0x30
[37218.239751] ret_from_fork+0x35/0x40
[37218.239757] INFO: task 2.xvda-0:2125 blocked for more than 120 seconds.
[37218.239825] Not tainted 4.19.0-mx4-amd64 #1 Debian 4.19.28-2+mendix1
[37218.239894] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[37218.239967] 2.xvda-0 D 0 2125 2 0x80000000
[37218.239968] Call Trace:
[37218.239970] ? __schedule+0x2a2/0x870
[37218.239971] ? _raw_spin_unlock_irqrestore+0x14/0x20
[37218.239972] schedule+0x28/0x80
[37218.239977] md_write_start+0x14b/0x220 [md_mod]
[37218.239979] ? finish_wait+0x80/0x80
[37218.239981] raid10_make_request+0x3d/0x130 [raid10]
[37218.239983] ? finish_wait+0x80/0x80
[37218.239986] md_handle_request+0x119/0x190 [md_mod]
[37218.239990] md_make_request+0x78/0x160 [md_mod]
[37218.239992] generic_make_request+0x1a4/0x410
[37218.239994] submit_bio+0x45/0x140
[37218.239996] ? bio_add_page+0x48/0x60
[37218.239998] dispatch_rw_block_io+0x68f/0x990 [xen_blkback]
[37218.240001] ? branch_type+0x2e7/0x350
[37218.240003] __do_block_io_op+0x2fc/0x5f0 [xen_blkback]
[37218.240005] ? _raw_spin_unlock_irqrestore+0x14/0x20
[37218.240006] ? try_to_del_timer_sync+0x4d/0x80
[37218.240009] xen_blkif_schedule+0xce/0x620 [xen_blkback]
[37218.240010] ? __schedule+0x2aa/0x870
[37218.240012] ? finish_wait+0x80/0x80
[37218.240014] ? xen_blkif_be_int+0x30/0x30 [xen_blkback]
[37218.240015] kthread+0x112/0x130
[37218.240016] ? kthread_bind+0x30/0x30
[37218.240018] ret_from_fork+0x35/0x40
[37218.240021] INFO: task qemu-system-i38:6225 blocked for more than 120 seconds.
[37218.240092] Not tainted 4.19.0-mx4-amd64 #1 Debian 4.19.28-2+mendix1
[37218.240161] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[37218.240258] qemu-system-i38 D 0 6225 1 0x00000000
[37218.240260] Call Trace:
[37218.240262] ? __schedule+0x2a2/0x870
[37218.240263] schedule+0x28/0x80
[37218.240266] io_schedule+0x12/0x40
[37218.240267] wait_on_page_bit+0x114/0x150
[37218.240269] ? page_cache_tree_insert+0xe0/0xe0
[37218.240288] extent_write_cache_pages+0x33e/0x3e0 [btrfs]
[37218.240292] ? timerqueue_del+0x1e/0x40
[37218.240293] ? __remove_hrtimer+0x35/0x70
[37218.240311] extent_writepages+0x50/0x80 [btrfs]
[37218.240314] do_writepages+0x41/0xd0
[37218.240315] ? __hrtimer_init+0xb0/0xb0
[37218.240317] __filemap_fdatawrite_range+0xbe/0xf0
[37218.240334] start_ordered_ops+0x4c/0x90 [btrfs]
[37218.240352] btrfs_sync_file+0x9a/0x3d0 [btrfs]
[37218.240357] do_fsync+0x38/0x70
[37218.240359] __x64_sys_fdatasync+0x13/0x20
[37218.240361] do_syscall_64+0x53/0x100
[37218.240364] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[37218.240366] RIP: 0033:0x7f2c4484f2e7
[37218.240373] Code: Bad RIP value.
[37218.240374] RSP: 002b:00007f2c40c24560 EFLAGS: 00000293 ORIG_RAX: 000000000000004b
[37218.240376] RAX: ffffffffffffffda RBX: 000000000000001c RCX: 00007f2c4484f2e7
[37218.240377] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 000000000000001c
[37218.240378] RBP: 000055f297129ea0 R08: 0000000000000000 R09: 00000000ffffffff
[37218.240379] R10: 00007f2c40c24660 R11: 0000000000000293 R12: 000055f2964b06e0
[37218.240380] R13: 000055f2970ea078 R14: 000055f2974aa920 R15: 000055f297d92ab0
[37218.240383] INFO: task kworker/1:2:4661 blocked for more than 120 seconds.
[37218.240478] Not tainted 4.19.0-mx4-amd64 #1 Debian 4.19.28-2+mendix1
[37218.240580] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[37218.240701] kworker/1:2 D 0 4661 2 0x80000000
[37218.240708] Workqueue: dm-integrity-metadata integrity_bio_wait [dm_integrity]
[37218.240709] Call Trace:
[37218.240711] ? __schedule+0x2a2/0x870
[37218.240713] schedule+0x28/0x80
[37218.240715] io_schedule+0x12/0x40
[37218.240717] wait_and_add_new_range+0x89/0xa0 [dm_integrity]
[37218.240719] dm_integrity_map_continue+0xe71/0x1020 [dm_integrity]
[37218.240723] ? xen_load_sp0+0x77/0x170
[37218.240725] ? __switch_to+0x1e1/0x440
[37218.240727] ? finish_task_switch+0x78/0x260
[37218.240730] process_one_work+0x1a7/0x3a0
[37218.240732] worker_thread+0x30/0x390
[37218.240734] ? create_worker+0x1a0/0x1a0
[37218.240735] kthread+0x112/0x130
[37218.240736] ? kthread_bind+0x30/0x30
[37218.240738] ret_from_fork+0x35/0x40
[37218.240741] INFO: task kworker/1:11:4887 blocked for more than 120 seconds.
[37218.240836] Not tainted 4.19.0-mx4-amd64 #1 Debian 4.19.28-2+mendix1
[37218.240929] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[37218.241050] kworker/1:11 D 0 4887 2 0x80000000
[37218.241053] Workqueue: dm-integrity-metadata integrity_bio_wait [dm_integrity]
[37218.241054] Call Trace:
[37218.241056] ? __schedule+0x2a2/0x870
[37218.241058] schedule+0x28/0x80
[37218.241059] io_schedule+0x12/0x40
[37218.241061] wait_and_add_new_range+0x89/0xa0 [dm_integrity]
[37218.241063] dm_integrity_map_continue+0xe71/0x1020 [dm_integrity]
[37218.241065] ? xen_load_sp0+0x77/0x170
[37218.241067] ? __switch_to+0x1e1/0x440
[37218.241068] ? finish_task_switch+0x78/0x260
[37218.241070] process_one_work+0x1a7/0x3a0
[37218.241072] worker_thread+0x30/0x390
[37218.241073] ? create_worker+0x1a0/0x1a0
[37218.241074] kthread+0x112/0x130
[37218.241075] ? kthread_bind+0x30/0x30
[37218.241077] ret_from_fork+0x35/0x40
[37218.241082] INFO: task kworker/0:19:5061 blocked for more than 120 seconds.
[37218.241176] Not tainted 4.19.0-mx4-amd64 #1 Debian 4.19.28-2+mendix1
[37218.241270] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[37218.241391] kworker/0:19 D 0 5061 2 0x80000000
[37218.241394] Workqueue: dm-integrity-metadata integrity_bio_wait [dm_integrity]
[37218.241395] Call Trace:
[37218.241397] ? __schedule+0x2a2/0x870
[37218.241398] schedule+0x28/0x80
[37218.241400] io_schedule+0x12/0x40
[37218.241402] wait_and_add_new_range+0x89/0xa0 [dm_integrity]
[37218.241404] dm_integrity_map_continue+0xe71/0x1020 [dm_integrity]
[37218.241406] ? xen_load_sp0+0x77/0x170
[37218.241407] ? __switch_to+0x1e1/0x440
[37218.241408] ? finish_task_switch+0x78/0x260
[37218.241410] process_one_work+0x1a7/0x3a0
[37218.241412] worker_thread+0x30/0x390
[37218.241414] ? create_worker+0x1a0/0x1a0
[37218.241415] kthread+0x112/0x130
[37218.241416] ? kthread_bind+0x30/0x30
[37218.241417] ret_from_fork+0x35/0x40
[37218.241420] INFO: task kworker/1:3:5090 blocked for more than 120 seconds.
[37218.241514] Not tainted 4.19.0-mx4-amd64 #1 Debian 4.19.28-2+mendix1
[37218.241616] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[37218.241737] kworker/1:3 D 0 5090 2 0x80000000
[37218.241740] Workqueue: dm-integrity-metadata integrity_bio_wait [dm_integrity]
[37218.241740] Call Trace:
[37218.241742] ? __schedule+0x2a2/0x870
[37218.241744] schedule+0x28/0x80
[37218.241745] io_schedule+0x12/0x40
[37218.241747] wait_and_add_new_range+0x89/0xa0 [dm_integrity]
[37218.241749] dm_integrity_map_continue+0xe71/0x1020 [dm_integrity]
[37218.241751] ? xen_load_sp0+0x77/0x170
[37218.241752] ? __switch_to+0x1e1/0x440
[37218.241753] ? finish_task_switch+0x78/0x260
[37218.241755] process_one_work+0x1a7/0x3a0
[37218.241757] worker_thread+0x30/0x390
[37218.241758] ? create_worker+0x1a0/0x1a0
[37218.241759] kthread+0x112/0x130
[37218.241760] ? kthread_bind+0x30/0x30
[37218.241762] ret_from_fork+0x35/0x40
[37218.241764] INFO: task kworker/1:10:5264 blocked for more than 120 seconds.
[37218.241859] Not tainted 4.19.0-mx4-amd64 #1 Debian 4.19.28-2+mendix1
[37218.241952] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[37218.242073] kworker/1:10 D 0 5264 2 0x80000000
[37218.242076] Workqueue: dm-integrity-metadata integrity_bio_wait [dm_integrity]
[37218.242077] Call Trace:
[37218.242079] ? __schedule+0x2a2/0x870
[37218.242080] schedule+0x28/0x80
[37218.242082] io_schedule+0x12/0x40
[37218.242083] wait_and_add_new_range+0x89/0xa0 [dm_integrity]
[37218.242085] dm_integrity_map_continue+0xe71/0x1020 [dm_integrity]
[37218.242088] ? xen_load_sp0+0x77/0x170
[37218.242089] ? __switch_to+0x1e1/0x440
[37218.242090] ? finish_task_switch+0x78/0x260
[37218.242092] process_one_work+0x1a7/0x3a0
[37218.242094] worker_thread+0x30/0x390
[37218.242095] ? create_worker+0x1a0/0x1a0
[37218.242096] kthread+0x112/0x130
[37218.242097] ? kthread_bind+0x30/0x30
[37218.242099] ret_from_fork+0x35/0x40
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [dm-crypt] Dm-integrity freeze
@ 2019-05-21 20:33 ` Hans van Kranenburg
0 siblings, 0 replies; 10+ messages in thread
From: Hans van Kranenburg @ 2019-05-21 20:33 UTC (permalink / raw)
To: Milan Broz, Victor Helmholtz, dm-crypt
Cc: device-mapper development, Mikulas Patocka
Hi,
On 5/21/19 10:43 AM, Hans van Kranenburg wrote:
> Hi,
>
> I'm seeing the same lockup, also 4.19. This is mdadm RAID10 on top of 4x
> a partition with only dm-integrity.
>
> It just happened out of the blue, no heavy load or anything. All IO to
> it is frozen now.
>
> [...]
There it is again... dmesg dump below. All cpus on 100% iowait.
It's triggered after a few minutes by running some Windows 2019 server
install (ugh, don't ask) in a Xen HVM domU, which writes into a raw
sparse file on a btrfs filesystem on LVM on mdadm RAID10 on 4x
dm-integrity (wheeee!!)...
This morning it was triggered a few minutes after starting an old
windows 2008 server image that I copied to this machine.
When running only other Linux vms, and when copying data onto
filesystems that live in LVM logical volumes I haven't seen this problem
yet, at all, in the last few weeks that this machine is running.
I noticed there's a "dm integrity: fix deadlock with overlapping I/O"
fix in a later 4.19. Is there any chance this is related? I have no
idea, but any hints or suggestions about what to try would be appreciated.
Right now, I still have the opportunity to stop the migration and keep
using the previous hardware for some days, having the possibility to
reproduce this a few times and try things, if you want.
If not, I likely have to go try remove the dm-integrity layer, but I'd
rather help fixing this issue.
Thanks,
[37218.238621] INFO: task md127_raid10:485 blocked for more than 120
seconds.
[37218.238713] Not tainted 4.19.0-mx4-amd64 #1 Debian
4.19.28-2+mendix1
[37218.238782] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[37218.238856] md127_raid10 D 0 485 2 0x80000000
[37218.238859] Call Trace:
[37218.238868] ? __schedule+0x2a2/0x870
[37218.238871] ? _raw_spin_unlock_irqrestore+0x14/0x20
[37218.238872] schedule+0x28/0x80
[37218.238882] md_super_wait+0x6e/0xa0 [md_mod]
[37218.238887] ? finish_wait+0x80/0x80
[37218.238892] write_page+0x172/0x320 [md_mod]
[37218.238894] ? __schedule+0x2aa/0x870
[37218.238895] ? _raw_spin_unlock_irqrestore+0x14/0x20
[37218.238900] md_update_sb.part.64+0x382/0x8f0 [md_mod]
[37218.238904] md_check_recovery+0x148/0x540 [md_mod]
[37218.238907] raid10d+0x62/0x1440 [raid10]
[37218.238910] ? __switch_to_asm+0x34/0x70
[37218.238912] ? finish_task_switch+0x78/0x260
[37218.238914] ? _raw_spin_lock_irqsave+0x15/0x40
[37218.238917] ? lock_timer_base+0x67/0x80
[37218.238918] ? _raw_spin_unlock_irqrestore+0x14/0x20
[37218.238920] ? try_to_del_timer_sync+0x4d/0x80
[37218.238922] ? del_timer_sync+0x35/0x40
[37218.238923] ? schedule_timeout+0x173/0x390
[37218.238927] ? md_rdev_init+0xb0/0xb0 [md_mod]
[37218.238930] ? md_thread+0x94/0x150 [md_mod]
[37218.238932] ? raid10_end_write_request+0x290/0x290 [raid10]
[37218.238936] md_thread+0x94/0x150 [md_mod]
[37218.238938] ? finish_wait+0x80/0x80
[37218.238940] kthread+0x112/0x130
[37218.238941] ? kthread_bind+0x30/0x30
[37218.238943] ret_from_fork+0x35/0x40
[37218.238951] INFO: task btrfs-transacti:575 blocked for more than 120
seconds.
[37218.239023] Not tainted 4.19.0-mx4-amd64 #1 Debian
4.19.28-2+mendix1
[37218.239092] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[37218.239164] btrfs-transacti D 0 575 2 0x80000000
[37218.239166] Call Trace:
[37218.239168] ? __schedule+0x2a2/0x870
[37218.239169] ? _raw_spin_unlock_irqrestore+0x14/0x20
[37218.239171] schedule+0x28/0x80
[37218.239175] md_write_start+0x14b/0x220 [md_mod]
[37218.239177] ? finish_wait+0x80/0x80
[37218.239179] raid10_make_request+0x3d/0x130 [raid10]
[37218.239181] ? finish_wait+0x80/0x80
[37218.239185] md_handle_request+0x119/0x190 [md_mod]
[37218.239189] md_make_request+0x78/0x160 [md_mod]
[37218.239192] generic_make_request+0x1a4/0x410
[37218.239195] submit_bio+0x45/0x140
[37218.239196] ? bio_clone_fast+0x2c/0x60
[37218.239228] btrfs_map_bio+0x20a/0x350 [btrfs]
[37218.239246] btree_submit_bio_hook+0x8d/0xc0 [btrfs]
[37218.239264] submit_one_bio+0x5f/0x80 [btrfs]
[37218.239283] flush_write_bio.isra.40+0x1d/0x30 [btrfs]
[37218.239301] btree_write_cache_pages+0x243/0x360 [btrfs]
[37218.239318] ? btrfs_buffer_uptodate+0x33/0x50 [btrfs]
[37218.239331] ? read_block_for_search.isra.36+0xf2/0x340 [btrfs]
[37218.239333] do_writepages+0x41/0xd0
[37218.239351] ? merge_state.part.45+0x3f/0x130 [btrfs]
[37218.239353] ? kmem_cache_alloc+0x15c/0x1c0
[37218.239370] ? clear_state_bit+0xdb/0x180 [btrfs]
[37218.239372] __filemap_fdatawrite_range+0xbe/0xf0
[37218.239388] btrfs_write_marked_extents+0x68/0x150 [btrfs]
[37218.239405] btrfs_write_and_wait_transaction.isra.21+0x4d/0xa0 [btrfs]
[37218.239421] btrfs_commit_transaction+0x56c/0x870 [btrfs]
[37218.239437] ? start_transaction+0x9d/0x3f0 [btrfs]
[37218.239453] transaction_kthread+0x147/0x180 [btrfs]
[37218.239468] ? btrfs_cleanup_transaction+0x530/0x530 [btrfs]
[37218.239470] kthread+0x112/0x130
[37218.239471] ? kthread_bind+0x30/0x30
[37218.239473] ret_from_fork+0x35/0x40
[37218.239481] INFO: task 1.xvda-1:1820 blocked for more than 120 seconds.
[37218.239551] Not tainted 4.19.0-mx4-amd64 #1 Debian
4.19.28-2+mendix1
[37218.239620] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[37218.239692] 1.xvda-1 D 0 1820 2 0x80000000
[37218.239694] Call Trace:
[37218.239696] ? __schedule+0x2a2/0x870
[37218.239698] ? _raw_spin_unlock_irqrestore+0x14/0x20
[37218.239699] schedule+0x28/0x80
[37218.239703] md_write_start+0x14b/0x220 [md_mod]
[37218.239706] ? finish_wait+0x80/0x80
[37218.239708] raid10_make_request+0x3d/0x130 [raid10]
[37218.239710] ? finish_wait+0x80/0x80
[37218.239713] md_handle_request+0x119/0x190 [md_mod]
[37218.239718] md_make_request+0x78/0x160 [md_mod]
[37218.239720] generic_make_request+0x1a4/0x410
[37218.239723] ? bvec_alloc+0x51/0xe0
[37218.239725] submit_bio+0x45/0x140
[37218.239727] ? bio_add_page+0x48/0x60
[37218.239731] dispatch_rw_block_io+0x68f/0x990 [xen_blkback]
[37218.239734] ? branch_type+0x2e4/0x350
[37218.239737] __do_block_io_op+0x2fc/0x5f0 [xen_blkback]
[37218.239739] ? _raw_spin_unlock_irqrestore+0x14/0x20
[37218.239740] ? try_to_del_timer_sync+0x4d/0x80
[37218.239743] xen_blkif_schedule+0xce/0x620 [xen_blkback]
[37218.239745] ? finish_wait+0x80/0x80
[37218.239748] ? xen_blkif_be_int+0x30/0x30 [xen_blkback]
[37218.239749] kthread+0x112/0x130
[37218.239750] ? kthread_bind+0x30/0x30
[37218.239751] ret_from_fork+0x35/0x40
[37218.239757] INFO: task 2.xvda-0:2125 blocked for more than 120 seconds.
[37218.239825] Not tainted 4.19.0-mx4-amd64 #1 Debian
4.19.28-2+mendix1
[37218.239894] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[37218.239967] 2.xvda-0 D 0 2125 2 0x80000000
[37218.239968] Call Trace:
[37218.239970] ? __schedule+0x2a2/0x870
[37218.239971] ? _raw_spin_unlock_irqrestore+0x14/0x20
[37218.239972] schedule+0x28/0x80
[37218.239977] md_write_start+0x14b/0x220 [md_mod]
[37218.239979] ? finish_wait+0x80/0x80
[37218.239981] raid10_make_request+0x3d/0x130 [raid10]
[37218.239983] ? finish_wait+0x80/0x80
[37218.239986] md_handle_request+0x119/0x190 [md_mod]
[37218.239990] md_make_request+0x78/0x160 [md_mod]
[37218.239992] generic_make_request+0x1a4/0x410
[37218.239994] submit_bio+0x45/0x140
[37218.239996] ? bio_add_page+0x48/0x60
[37218.239998] dispatch_rw_block_io+0x68f/0x990 [xen_blkback]
[37218.240001] ? branch_type+0x2e7/0x350
[37218.240003] __do_block_io_op+0x2fc/0x5f0 [xen_blkback]
[37218.240005] ? _raw_spin_unlock_irqrestore+0x14/0x20
[37218.240006] ? try_to_del_timer_sync+0x4d/0x80
[37218.240009] xen_blkif_schedule+0xce/0x620 [xen_blkback]
[37218.240010] ? __schedule+0x2aa/0x870
[37218.240012] ? finish_wait+0x80/0x80
[37218.240014] ? xen_blkif_be_int+0x30/0x30 [xen_blkback]
[37218.240015] kthread+0x112/0x130
[37218.240016] ? kthread_bind+0x30/0x30
[37218.240018] ret_from_fork+0x35/0x40
[37218.240021] INFO: task qemu-system-i38:6225 blocked for more than 120
seconds.
[37218.240092] Not tainted 4.19.0-mx4-amd64 #1 Debian
4.19.28-2+mendix1
[37218.240161] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[37218.240258] qemu-system-i38 D 0 6225 1 0x00000000
[37218.240260] Call Trace:
[37218.240262] ? __schedule+0x2a2/0x870
[37218.240263] schedule+0x28/0x80
[37218.240266] io_schedule+0x12/0x40
[37218.240267] wait_on_page_bit+0x114/0x150
[37218.240269] ? page_cache_tree_insert+0xe0/0xe0
[37218.240288] extent_write_cache_pages+0x33e/0x3e0 [btrfs]
[37218.240292] ? timerqueue_del+0x1e/0x40
[37218.240293] ? __remove_hrtimer+0x35/0x70
[37218.240311] extent_writepages+0x50/0x80 [btrfs]
[37218.240314] do_writepages+0x41/0xd0
[37218.240315] ? __hrtimer_init+0xb0/0xb0
[37218.240317] __filemap_fdatawrite_range+0xbe/0xf0
[37218.240334] start_ordered_ops+0x4c/0x90 [btrfs]
[37218.240352] btrfs_sync_file+0x9a/0x3d0 [btrfs]
[37218.240357] do_fsync+0x38/0x70
[37218.240359] __x64_sys_fdatasync+0x13/0x20
[37218.240361] do_syscall_64+0x53/0x100
[37218.240364] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[37218.240366] RIP: 0033:0x7f2c4484f2e7
[37218.240373] Code: Bad RIP value.
[37218.240374] RSP: 002b:00007f2c40c24560 EFLAGS: 00000293 ORIG_RAX: 000000000000004b
[37218.240376] RAX: ffffffffffffffda RBX: 000000000000001c RCX: 00007f2c4484f2e7
[37218.240377] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 000000000000001c
[37218.240378] RBP: 000055f297129ea0 R08: 0000000000000000 R09: 00000000ffffffff
[37218.240379] R10: 00007f2c40c24660 R11: 0000000000000293 R12: 000055f2964b06e0
[37218.240380] R13: 000055f2970ea078 R14: 000055f2974aa920 R15: 000055f297d92ab0
[37218.240383] INFO: task kworker/1:2:4661 blocked for more than 120 seconds.
[37218.240478] Not tainted 4.19.0-mx4-amd64 #1 Debian 4.19.28-2+mendix1
[37218.240580] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[37218.240701] kworker/1:2 D 0 4661 2 0x80000000
[37218.240708] Workqueue: dm-integrity-metadata integrity_bio_wait [dm_integrity]
[37218.240709] Call Trace:
[37218.240711] ? __schedule+0x2a2/0x870
[37218.240713] schedule+0x28/0x80
[37218.240715] io_schedule+0x12/0x40
[37218.240717] wait_and_add_new_range+0x89/0xa0 [dm_integrity]
[37218.240719] dm_integrity_map_continue+0xe71/0x1020 [dm_integrity]
[37218.240723] ? xen_load_sp0+0x77/0x170
[37218.240725] ? __switch_to+0x1e1/0x440
[37218.240727] ? finish_task_switch+0x78/0x260
[37218.240730] process_one_work+0x1a7/0x3a0
[37218.240732] worker_thread+0x30/0x390
[37218.240734] ? create_worker+0x1a0/0x1a0
[37218.240735] kthread+0x112/0x130
[37218.240736] ? kthread_bind+0x30/0x30
[37218.240738] ret_from_fork+0x35/0x40
[37218.240741] INFO: task kworker/1:11:4887 blocked for more than 120 seconds.
[37218.240836] Not tainted 4.19.0-mx4-amd64 #1 Debian 4.19.28-2+mendix1
[37218.240929] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[37218.241050] kworker/1:11 D 0 4887 2 0x80000000
[37218.241053] Workqueue: dm-integrity-metadata integrity_bio_wait [dm_integrity]
[37218.241054] Call Trace:
[37218.241056] ? __schedule+0x2a2/0x870
[37218.241058] schedule+0x28/0x80
[37218.241059] io_schedule+0x12/0x40
[37218.241061] wait_and_add_new_range+0x89/0xa0 [dm_integrity]
[37218.241063] dm_integrity_map_continue+0xe71/0x1020 [dm_integrity]
[37218.241065] ? xen_load_sp0+0x77/0x170
[37218.241067] ? __switch_to+0x1e1/0x440
[37218.241068] ? finish_task_switch+0x78/0x260
[37218.241070] process_one_work+0x1a7/0x3a0
[37218.241072] worker_thread+0x30/0x390
[37218.241073] ? create_worker+0x1a0/0x1a0
[37218.241074] kthread+0x112/0x130
[37218.241075] ? kthread_bind+0x30/0x30
[37218.241077] ret_from_fork+0x35/0x40
[37218.241082] INFO: task kworker/0:19:5061 blocked for more than 120 seconds.
[37218.241176] Not tainted 4.19.0-mx4-amd64 #1 Debian 4.19.28-2+mendix1
[37218.241270] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[37218.241391] kworker/0:19 D 0 5061 2 0x80000000
[37218.241394] Workqueue: dm-integrity-metadata integrity_bio_wait [dm_integrity]
[37218.241395] Call Trace:
[37218.241397] ? __schedule+0x2a2/0x870
[37218.241398] schedule+0x28/0x80
[37218.241400] io_schedule+0x12/0x40
[37218.241402] wait_and_add_new_range+0x89/0xa0 [dm_integrity]
[37218.241404] dm_integrity_map_continue+0xe71/0x1020 [dm_integrity]
[37218.241406] ? xen_load_sp0+0x77/0x170
[37218.241407] ? __switch_to+0x1e1/0x440
[37218.241408] ? finish_task_switch+0x78/0x260
[37218.241410] process_one_work+0x1a7/0x3a0
[37218.241412] worker_thread+0x30/0x390
[37218.241414] ? create_worker+0x1a0/0x1a0
[37218.241415] kthread+0x112/0x130
[37218.241416] ? kthread_bind+0x30/0x30
[37218.241417] ret_from_fork+0x35/0x40
[37218.241420] INFO: task kworker/1:3:5090 blocked for more than 120 seconds.
[37218.241514] Not tainted 4.19.0-mx4-amd64 #1 Debian 4.19.28-2+mendix1
[37218.241616] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[37218.241737] kworker/1:3 D 0 5090 2 0x80000000
[37218.241740] Workqueue: dm-integrity-metadata integrity_bio_wait [dm_integrity]
[37218.241740] Call Trace:
[37218.241742] ? __schedule+0x2a2/0x870
[37218.241744] schedule+0x28/0x80
[37218.241745] io_schedule+0x12/0x40
[37218.241747] wait_and_add_new_range+0x89/0xa0 [dm_integrity]
[37218.241749] dm_integrity_map_continue+0xe71/0x1020 [dm_integrity]
[37218.241751] ? xen_load_sp0+0x77/0x170
[37218.241752] ? __switch_to+0x1e1/0x440
[37218.241753] ? finish_task_switch+0x78/0x260
[37218.241755] process_one_work+0x1a7/0x3a0
[37218.241757] worker_thread+0x30/0x390
[37218.241758] ? create_worker+0x1a0/0x1a0
[37218.241759] kthread+0x112/0x130
[37218.241760] ? kthread_bind+0x30/0x30
[37218.241762] ret_from_fork+0x35/0x40
[37218.241764] INFO: task kworker/1:10:5264 blocked for more than 120 seconds.
[37218.241859] Not tainted 4.19.0-mx4-amd64 #1 Debian 4.19.28-2+mendix1
[37218.241952] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[37218.242073] kworker/1:10 D 0 5264 2 0x80000000
[37218.242076] Workqueue: dm-integrity-metadata integrity_bio_wait [dm_integrity]
[37218.242077] Call Trace:
[37218.242079] ? __schedule+0x2a2/0x870
[37218.242080] schedule+0x28/0x80
[37218.242082] io_schedule+0x12/0x40
[37218.242083] wait_and_add_new_range+0x89/0xa0 [dm_integrity]
[37218.242085] dm_integrity_map_continue+0xe71/0x1020 [dm_integrity]
[37218.242088] ? xen_load_sp0+0x77/0x170
[37218.242089] ? __switch_to+0x1e1/0x440
[37218.242090] ? finish_task_switch+0x78/0x260
[37218.242092] process_one_work+0x1a7/0x3a0
[37218.242094] worker_thread+0x30/0x390
[37218.242095] ? create_worker+0x1a0/0x1a0
[37218.242096] kthread+0x112/0x130
[37218.242097] ? kthread_bind+0x30/0x30
[37218.242099] ret_from_fork+0x35/0x40
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [dm-crypt] Dm-integrity freeze
2019-05-21 20:33 ` [dm-crypt] " Hans van Kranenburg
@ 2019-05-21 22:07 ` Mike Snitzer
-1 siblings, 0 replies; 10+ messages in thread
From: Mike Snitzer @ 2019-05-21 22:07 UTC (permalink / raw)
To: Hans van Kranenburg
Cc: Milan Broz, Victor Helmholtz, dm-crypt,
device-mapper development, Mikulas Patocka
On Tue, May 21 2019 at 4:33pm -0400,
Hans van Kranenburg <hans@knorrie.org> wrote:
> Hi,
>
> On 5/21/19 10:43 AM, Hans van Kranenburg wrote:
> > Hi,
> >
> > I'm seeing the same lockup, also 4.19. This is mdadm RAID10 on top of 4x
> > a partition with only dm-integrity.
> >
> > It just happened out of the blue, no heavy load or anything. All IO to
> > it is frozen now.
> >
> > [...]
>
> There it is again... dmesg dump below. All cpus on 100% iowait.
>
> It's triggered after a few minutes by running some Windows 2019 server
> install (ugh, don't ask) in a Xen HVM domU, which writes into a raw
> sparse file on a btrfs filesystem on LVM on mdadm RAID10 on 4x
> dm-integrity (wheeee!!)...
>
> This morning it was triggered a few minutes after starting an old
> windows 2008 server image that I copied to this machine.
>
> When running only other Linux VMs, and when copying data onto
> filesystems that live in LVM logical volumes, I haven't seen this
> problem at all in the few weeks this machine has been running.
>
> I noticed there's a "dm integrity: fix deadlock with overlapping I/O"
> fix in a later 4.19. Is there any chance this is related? I have no
> idea, but any hints or suggestions about what to try would be appreciated.
Yes, all your hung tasks are hung in wait_and_add_new_range().
Please use that later 4.19 or apply commit 4ed319c6ac08 ("dm integrity:
fix deadlock with overlapping I/O")
Mike
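[As a rough illustration of Mike's advice (this sketch is not from the thread): an affected machine can be checked by comparing its stable point release against 4.19.35, the first 4.19.y release carrying commit 4ed319c6ac08 ("dm integrity: fix deadlock with overlapping I/O"). The version strings below are hard-coded assumptions, since a Debian `uname -r` such as `4.19.0-mx4-amd64` does not expose the upstream point release; on Debian it would come from the kernel package changelog.]

```shell
# Sketch: decide whether the running kernel predates the dm-integrity fix.
# Assumptions: "running" is the upstream stable version of the booted kernel
# (illustrative value here), taken e.g. from the Debian package changelog.
running="4.19.28"   # stable version of the running kernel (illustrative)
fixed="4.19.35"     # first 4.19.y release with commit 4ed319c6ac08

# sort -V orders version strings numerically; if the lowest of the pair is
# the running version and the two differ, the deadlock fix is not present.
lowest=$(printf '%s\n%s\n' "$running" "$fixed" | sort -V | head -n1)
if [ "$lowest" = "$running" ] && [ "$running" != "$fixed" ]; then
    echo "kernel $running predates the dm-integrity deadlock fix ($fixed)"
else
    echo "kernel $running includes the dm-integrity deadlock fix"
fi
```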
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [dm-crypt] Dm-integrity freeze
2019-05-21 22:07 ` Mike Snitzer
@ 2019-05-22 7:52 ` Hans van Kranenburg
-1 siblings, 0 replies; 10+ messages in thread
From: Hans van Kranenburg @ 2019-05-22 7:52 UTC (permalink / raw)
To: Mike Snitzer
Cc: Milan Broz, Victor Helmholtz, dm-crypt,
device-mapper development, Mikulas Patocka
On 5/22/19 12:07 AM, Mike Snitzer wrote:
> On Tue, May 21 2019 at 4:33pm -0400,
> Hans van Kranenburg <hans@knorrie.org> wrote:
>
>> Hi,
>>
>> On 5/21/19 10:43 AM, Hans van Kranenburg wrote:
>>> Hi,
>>>
>>> I'm seeing the same lockup, also 4.19. This is mdadm RAID10 on top of 4x
>>> a partition with only dm-integrity.
>>>
>>> It just happened out of the blue, no heavy load or anything. All IO to
>>> it is frozen now.
>>>
>>> [...]
>>
>> There it is again... dmesg dump below. All cpus on 100% iowait.
>>
>> It's triggered after a few minutes by running some Windows 2019 server
>> install (ugh, don't ask) in a Xen HVM domU, which writes into a raw
>> sparse file on a btrfs filesystem on LVM on mdadm RAID10 on 4x
>> dm-integrity (wheeee!!)...
>>
>> This morning it was triggered a few minutes after starting an old
>> windows 2008 server image that I copied to this machine.
>>
>> When running only other Linux VMs, and when copying data onto
>> filesystems that live in LVM logical volumes, I haven't seen this
>> problem at all in the few weeks this machine has been running.
>>
>> I noticed there's a "dm integrity: fix deadlock with overlapping I/O"
>> fix in a later 4.19. Is there any chance this is related? I have no
>> idea, but any hints or suggestions about what to try would be appreciated.
>
> Yes, all your hung tasks are hung in wait_and_add_new_range().
>
> Please use that later 4.19 or apply commit 4ed319c6ac08 ("dm integrity:
> fix deadlock with overlapping I/O")
Oh, silly me, that one is in 4.19.35 and I already have 4.19.37 based on
Debian in my repo; just rebooted the machine into it. Let's see what
happens today.
Thanks for confirming, sorry for the noise.
Hans
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2019-05-22 7:52 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-07 15:26 [dm-crypt] Dm-integrity freeze Victor Helmholtz
2019-03-07 16:37 ` Milan Broz
2019-05-21 8:43 ` [dm-crypt] [dm-devel] " Hans van Kranenburg
2019-05-21 8:43 ` [dm-crypt] " Hans van Kranenburg
2019-05-21 20:33 ` [dm-crypt] [dm-devel] " Hans van Kranenburg
2019-05-21 20:33 ` [dm-crypt] " Hans van Kranenburg
2019-05-21 22:07 ` Mike Snitzer
2019-05-21 22:07 ` Mike Snitzer
2019-05-22 7:52 ` [dm-crypt] " Hans van Kranenburg
2019-05-22 7:52 ` Hans van Kranenburg