* Fix "dm kcopyd: Fix bug causing workqueue stalls" causes dead lock
@ 2019-09-27 13:19 Guruswamy Basavaiah
  2019-09-27 19:33 ` Nikos Tsironis
  0 siblings, 1 reply; 19+ messages in thread
From: Guruswamy Basavaiah @ 2019-09-27 13:19 UTC (permalink / raw)
  To: dm-devel, ntsironis, iliastsi

Hello,
We have a DRBD partition on top of an LVM partition. When the node
holding the secondary DRBD partition comes up, a large amount of data
is synced from the primary to the secondary DRBD partition.

During this time, the DRBD sync (resync) stops at some point.
After 120 seconds we see hung-task-timeout warnings in the logs (see
the end of this email).

If I increase the cow_count semaphore value from 2048 to 8192, or
remove the patch below, the DRBD sync works seamlessly.

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/patch/?id=721b1d98fb517ae99ab3b757021cf81db41e67be
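
To illustrate why raising the limit can hide the stall, here is a minimal
user-space sketch of a counting-semaphore throttle. It is not the dm-snapshot
code: the class CopyThrottle and its methods are hypothetical names, the limit
of 2 merely stands in for 2048, and the acquire is non-blocking only so the
example terminates (the kernel's down() would sleep instead).

```python
import threading


class CopyThrottle:
    """Hypothetical stand-in for a cow_count-style limit on in-flight copies."""

    def __init__(self, limit: int) -> None:
        self._slots = threading.Semaphore(limit)

    def try_start_copy(self) -> bool:
        # Take one slot if available. Non-blocking so the example finishes;
        # a blocking acquire here is where a caller would hang.
        return self._slots.acquire(blocking=False)

    def copy_done(self) -> None:
        # Only a completed copy gives a slot back.
        self._slots.release()


throttle = CopyThrottle(limit=2)          # 2 stands in for 2048
started = [throttle.try_start_copy() for _ in range(3)]
throttle.copy_done()                      # one completion frees one slot
after_completion = throttle.try_start_copy()
```

In this model the third start fails until a completion runs, so anything that
keeps completions from running also keeps new copies from starting. A larger
limit only delays the point at which that happens.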

I am not familiar with the dm code. From the hung-task back traces, what I
understand is this: a thread trying to queue work to kcopyd holds
"&_origins_lock" and is blocked on the cow_count semaphore, while
jobs from kcopyd are trying to queue work to the same kcopyd and are
blocked on "&_origins_lock", which results in a deadlock.
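
The circular wait described above can be sketched as a small wait-for graph.
This is only a model of my reading of the traces, not kernel code: the edges
are assumptions taken from the description (the task names come from the log
below), and the function find_cycle is a hypothetical helper. The model simply
treats "&_origins_lock" as contended and does not explain why a read
acquisition would block.

```python
from typing import Dict, List, Optional


def find_cycle(wait_for: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle in the wait-for graph as a list of nodes, or None."""
    WHITE, GREY, BLACK = 0, 1, 2
    color: Dict[str, int] = {}
    stack: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        stack.append(node)
        for nxt in wait_for.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GREY:
                # Back edge: nxt is already on the current path.
                return stack[stack.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for start in list(wait_for):
        if color.get(start, WHITE) == WHITE:
            cycle = visit(start)
            if cycle:
                return cycle
    return None


# Edges mean "is waiting for something only the target can provide".
wait_for = {
    # Holds _origins_lock, sleeps in down() on cow_count; a slot is only
    # freed when a kcopyd job completes.
    "drbd_r_r1": ["kcopyd"],
    # The kcopyd completion path wants _origins_lock, held by drbd_r_r1.
    "kcopyd": ["drbd_r_r1"],
    # Other writers queue up on _origins_lock behind the holder.
    "drbd_r_r5": ["drbd_r_r1"],
    "drbd_r_r6": ["drbd_r_r1"],
}

cycle = find_cycle(wait_for)
```

Under these assumptions the graph contains the cycle drbd_r_r1 -> kcopyd ->
drbd_r_r1, so neither side can make progress and every other task waiting on
the lock hangs behind them.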

Below are the hung-task back traces.
Sep 24 12:08:48.974658 err CFPU-1 kernel: [  279.991760] INFO: task
kworker/1:1:170 blocked for more than 120 seconds.
Sep 24 12:08:48.974658 err CFPU-1 kernel: [  279.998569]
Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
Sep 24 12:08:48.974658 err CFPU-1 kernel: [  280.006593] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 24 12:08:48.974658 info CFPU-1 kernel: [  280.014435] kworker/1:1
   D ffffffff80e1db78     0   170      2 0x00100000
Sep 24 12:08:48.974658 info CFPU-1 kernel: [  280.014482] Workqueue:
kcopyd do_work [dm_mod]
Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487] Stack :
0000000000000000 0000000000000001 0003000300000000 80000007fde8bac8
Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
80000007fe759b00 0000000000000002 ffffffffc0285294 80000007f8d1ca00
Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
ffffffffc027eda8 0000000000000001 ffffffff80b30000 0000000000000100
Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
8000000784c098c8 ffffffff80e1db78 80000007fe759b00 ffffffff80e204b8
Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
8000000788ef79c0 800000078505ba70 80000007fe759b00 00000001852b4620
Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
ffffffffc0280000 80000007852b4620 80000007eebf5758 ffffffffc027edec
Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
0000000000000000 80000007852b4620 80000007835d8e80 ffffffffc027f38c
Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
8000000787ac0580 0000000000000001 80000007f8d1ca60 8000000785aeb080
Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
0000000000000000 0000000000000000 0000000000000200 ffffffffc0282488
Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
0000000000000200 80000007f8d1ca00 ffffffffc0280000 ffffffffc027db90
Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]   ...
Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014558] Call Trace:
Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014570]
[<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014580]
[<ffffffff80e1db78>] schedule+0x38/0x98
Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014590]
[<ffffffff80e204b8>] __down_read+0xa8/0xf0
Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014609]
[<ffffffffc027edec>] do_origin.isra.13+0x44/0x110 [dm_snapshot]
Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014625]
[<ffffffffc027f38c>] pending_complete+0x1ac/0x378 [dm_snapshot]
Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014642]
[<ffffffffc0282488>] persistent_commit_exception+0x140/0x218
[dm_snapshot]
Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014659]
[<ffffffffc027db90>] copy_callback+0x108/0x1a0 [dm_snapshot]
Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014688]
[<ffffffffc01eb6a4>] run_complete_job+0x8c/0x148 [dm_mod]
Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014724]
[<ffffffffc01eaae8>] process_jobs+0xc8/0x1e0 [dm_mod]
Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014759]
[<ffffffffc01eb1e0>] do_work+0xb8/0x110 [dm_mod]
Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014783]
[<ffffffff808b18e0>] process_one_work+0x190/0x480
Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014791]
[<ffffffff808b1d18>] worker_thread+0x148/0x580
Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014801]
[<ffffffff808b813c>] kthread+0xdc/0xf8
Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014811]
[<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014817]
Sep 24 12:08:48.974658 err CFPU-1 kernel: [  280.015046] INFO: task
drbd_r_r1:7772 blocked for more than 120 seconds.
Sep 24 12:08:48.974658 err CFPU-1 kernel: [  280.021766]
Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
Sep 24 12:08:48.974658 err CFPU-1 kernel: [  280.029781] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 24 12:08:48.982578 info CFPU-1 kernel: [  280.037637] drbd_r_r1
   D ffffffff80e1db78     0  7772      2 0x00100002
Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648] Stack :
00000000000207e8 ffffffff80b35530 8000000788e77860 8000000788e77860
Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
7fffffffffffffff 80000007f8d1cb28 8000000788e0a880 0000000000000002
Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
ffffffff80e20000 80000007f9306480 00000000000207e8 ffffffff808dfe58
Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
80000007eebf5168 ffffffff80e1db78 0000000002411200 ffffffff80e208d8
Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
80000007eab5b440 ffffffff8097b25c 00000000000207a0 ffffffff808dfe58
Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
80000007f8d1ca00 ffffffff809ce014 80000007f9306000 0000000002011200
Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
7fffffffffffffff 80000007f8d1cb28 8000000788e0a880 0000000000000002
Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
ffffffff80e20000 ffffffff80e1f8d8 80000007f8d1cb30 80000007f8d1cb30
Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
8000000788e0a880 00ffffff808db9e8 0000000000000000 80000007f8d1cb28
Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
80000007f9306480 ffffffffc02834d0 ffffffff808e0000 ffffffff808dfba4
Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]   ...
Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037774] Call Trace:
Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037797]
[<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037849]
[<ffffffff80e1db78>] schedule+0x38/0x98
Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037865]
[<ffffffff80e208d8>] schedule_timeout+0x240/0x2a0
Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037874]
[<ffffffff80e1f8d8>] __down+0x90/0xd8
Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037884]
[<ffffffff808dfba4>] down+0x54/0x70
Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037901]
[<ffffffffc027da48>] start_copy+0x98/0xd8 [dm_snapshot]
Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037934]
[<ffffffffc027e7d4>] __origin_write+0x184/0x2c0 [dm_snapshot]
Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037968]
[<ffffffffc027ee4c>] do_origin.isra.13+0xa4/0x110 [dm_snapshot]
Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038016]
[<ffffffffc01dc2c0>] __map_bio+0xb0/0x258 [dm_mod]
Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038071]
[<ffffffffc01deb94>] __split_and_process_bio+0x274/0x488 [dm_mod]
Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038126]
[<ffffffffc01dee3c>] dm_make_request+0x94/0x128 [dm_mod]
Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038174]
[<ffffffff80b3347c>] generic_make_request+0x114/0x290
Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038238]
[<ffffffffc0511090>] drbd_submit_peer_request+0x258/0x618 [drbd]
Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038337]
[<ffffffffc05121f8>] receive_RSDataReply+0x3b8/0x770 [drbd]
Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038426]
[<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038526]
[<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
Sep 24 12:08:48.985184 warn CFPU-1 kernel: [  280.038588]
[<ffffffff808b813c>] kthread+0xdc/0xf8
Sep 24 12:08:48.985184 warn CFPU-1 kernel: [  280.038604]
[<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
Sep 24 12:08:48.985184 warn CFPU-1 kernel: [  280.038612]
Sep 24 12:08:48.985184 err CFPU-1 kernel: [  280.038629] INFO: task
drbd_r_r5:7910 blocked for more than 120 seconds.
Sep 24 12:08:48.990218 err CFPU-1 kernel: [  280.045351]
Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
Sep 24 12:08:48.998215 err CFPU-1 kernel: [  280.053372] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 24 12:08:49.006141 info CFPU-1 kernel: [  280.061248] drbd_r_r5
   D ffffffff80e1db78     0  7910      2 0x00100002
Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257] Stack :
0000000000000001 ffffffff80b35530 8000000788ef7950 8000000788ef7950
Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
8000000787936c00 0000000000000002 ffffffffc0285294 00000000000049ba
Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
0000000000000001 00000000000049ba 0000000000000001 ffffffff80b2aa90
Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
0000000000000001 ffffffff80e1db78 8000000787936c00 ffffffff80e204b8
Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
800000078785b9c0 80000007fde8bb30 8000000787936c00 0000000100000000
Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
ffffffffc0280000 80000007eff43920 8000000788ca0198 ffffffffc027edec
Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
80000007eff43900 c000000001f5b080 80000007eff43920 ffffffffc01dc2c0
Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
80000007eb29bd80 0000000000000000 0000000002400000 80000007eff42900
Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
00000000000049ba ffffffff80b2abec 8000000788ef7a50 8000000788ef7a50
Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
ffffffffc01dbad8 c000000001f5b080 c000000001f5b080 80000007eff43900
Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]   ...
Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061375] Call Trace:
Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061392]
[<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061435]
[<ffffffff80e1db78>] schedule+0x38/0x98
Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061445]
[<ffffffff80e204b8>] __down_read+0xa8/0xf0
Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061463]
[<ffffffffc027edec>] do_origin.isra.13+0x44/0x110 [dm_snapshot]
Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061509]
[<ffffffffc01dc2c0>] __map_bio+0xb0/0x258 [dm_mod]
Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061563]
[<ffffffffc01deb94>] __split_and_process_bio+0x274/0x488 [dm_mod]
Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061598]
[<ffffffffc01dee3c>] dm_make_request+0x94/0x128 [dm_mod]
Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061624]
[<ffffffff80b3347c>] generic_make_request+0x114/0x290
Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061674]
[<ffffffffc0511090>] drbd_submit_peer_request+0x258/0x618 [drbd]
Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061739]
[<ffffffffc0512c78>] receive_Data+0x6c8/0x1368 [drbd]
Sep 24 12:08:49.007987 warn CFPU-1 kernel: [  280.061816]
[<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
Sep 24 12:08:49.007987 warn CFPU-1 kernel: [  280.061882]
[<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
Sep 24 12:08:49.007987 warn CFPU-1 kernel: [  280.061922]
[<ffffffff808b813c>] kthread+0xdc/0xf8
Sep 24 12:08:49.007987 warn CFPU-1 kernel: [  280.061931]
[<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
Sep 24 12:08:49.007987 warn CFPU-1 kernel: [  280.061937]
Sep 24 12:08:49.007987 err CFPU-1 kernel: [  280.061948] INFO: task
drbd_r_r6:7991 blocked for more than 120 seconds.
Sep 24 12:08:49.013584 err CFPU-1 kernel: [  280.068669]
Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
Sep 24 12:08:49.021541 err CFPU-1 kernel: [  280.076683] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 24 12:08:49.029370 info CFPU-1 kernel: [  280.084523] drbd_r_r6
   D ffffffff80e1db78     0  7991      2 0x00100002
Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532] Stack :
0000000000000001 ffffffff80b35530 800000078785b950 800000078785b950
Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
8000000787b79b00 0000000000000002 ffffffffc0285294 000000000000049a
Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
0000000000000001 000000000000049a 0000000000000001 ffffffff80b2aa90
Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
0000000000000001 ffffffff80e1db78 8000000787b79b00 ffffffff80e204b8
Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
800000078465b940 8000000788ef79c0 8000000787b79b00 0000000100000000
Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
ffffffffc0280000 80000007e90d6c20 80000007878cf2d8 ffffffffc027edec
Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
80000007e90d6c00 c000000001fcc080 80000007e90d6c20 ffffffffc01dc2c0
Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
80000007f93fd900 0000000000000000 0000000002400000 80000007e90d6a00
Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
000000000000049a ffffffff80b2abec 800000078785ba50 800000078785ba50
Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
ffffffffc01dbad8 c000000001fcc080 c000000001fcc080 80000007e90d6c00
Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]   ...
Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084610] Call Trace:
Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084621]
[<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084631]
[<ffffffff80e1db78>] schedule+0x38/0x98
Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084641]
[<ffffffff80e204b8>] __down_read+0xa8/0xf0
Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084659]
[<ffffffffc027edec>] do_origin.isra.13+0x44/0x110 [dm_snapshot]
Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084696]
[<ffffffffc01dc2c0>] __map_bio+0xb0/0x258 [dm_mod]
Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084729]
[<ffffffffc01deb94>] __split_and_process_bio+0x274/0x488 [dm_mod]
Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084777]
[<ffffffffc01dee3c>] dm_make_request+0x94/0x128 [dm_mod]
Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.084810]
[<ffffffff80b3347c>] generic_make_request+0x114/0x290
Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.084863]
[<ffffffffc0511090>] drbd_submit_peer_request+0x258/0x618 [drbd]
Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.084928]
[<ffffffffc0512c78>] receive_Data+0x6c8/0x1368 [drbd]
Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.084992]
[<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.085057]
[<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.085097]
[<ffffffff808b813c>] kthread+0xdc/0xf8
Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.085106]
[<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.085112]
Sep 24 12:08:49.031000 err CFPU-1 kernel: [  280.085121] INFO: task
drbd_r_r7:8046 blocked for more than 120 seconds.
Sep 24 12:08:49.036834 err CFPU-1 kernel: [  280.091854]
Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
Sep 24 12:08:49.044747 err CFPU-1 kernel: [  280.099876] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 24 12:08:49.056612 info CFPU-1 kernel: [  280.107739] drbd_r_r7
   D ffffffff80e1db78     0  8046      2 0x00100000
Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749] Stack :
0000000000000000 0000000000000010 ffffffff811b3a80 0000000000000000
Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
7fffffffffffffff 7fffffffffffffff 0000000000000000 0000000000000002
Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
8000000788fcfa98 0000000000000000 ffffffffc053baa8 ffffffff80be4360
Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
ffffffffc0550000 ffffffff80e1db78 ffffffff81070660 ffffffff80e208d8
Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
8000000797a43600 ffffffff808c4878 80000007ff2f4100 0000000000000000
Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
80000007f93fdb40 0000000000000000 80000007ff007000 ffffffff81070660
Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
8000000788fcfa90 7fffffffffffffff 0000000000000000 0000000000000002
Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
8000000788fcfa98 ffffffff80e1e8a0 0000000100000000 8000000797a42880
Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
ffffffff808c4b60 8000000788fcfaa0 8000000788fcfaa0 ffffffffc053baa8
Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
ffffffff80be4360 ffffffff81cb4540 ffffffff81cb0000 80000007f93fdb40
Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]   ...
Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107821] Call Trace:
Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107833]
[<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107842]
[<ffffffff80e1db78>] schedule+0x38/0x98
Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107850]
[<ffffffff80e208d8>] schedule_timeout+0x240/0x2a0
Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107859]
[<ffffffff80e1e8a0>] wait_for_common+0xd8/0x1b0
Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107873]
[<ffffffff808acc20>] call_usermodehelper_exec+0x110/0x188
Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107932]
[<ffffffffc053da3c>] drbd_khelper+0x1c4/0x310 [drbd]
Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107998]
[<ffffffffc0500a14>] drbd_start_resync+0x534/0x968 [drbd]
Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.108060]
[<ffffffffc05037dc>] receive_sync_uuid+0x2d4/0x5a0 [drbd]
Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.108124]
[<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.108189]
[<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.108229]
[<ffffffff808b813c>] kthread+0xdc/0xf8
Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.108239]
[<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.108245]
Sep 24 12:08:49.056612 err CFPU-1 kernel: [  280.108471] INFO: task
drbd_r_r8:8120 blocked for more than 120 seconds.
Sep 24 12:08:49.060085 err CFPU-1 kernel: [  280.115194]
Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
Sep 24 12:08:49.075924 err CFPU-1 kernel: [  280.123218] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 24 12:08:49.075924 info CFPU-1 kernel: [  280.131070] drbd_r_r8
   D ffffffff80e1db78     0  8120      2 0x00100002
Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083] Stack :
0000000000000000 0000000000000010 ffffffff811b3a80 0000000000000000
Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
7fffffffffffffff 7fffffffffffffff 0000000000000000 0000000000000002
Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
80000007851d3a98 0000000000000000 ffffffffc053baa8 ffffffff80be4360
Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
ffffffffc0550000 ffffffff80e1db78 ffffffff81070660 ffffffff80e208d8
Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
80000007f96f6c00 ffffffff808c4878 80000007ff2f4100 0000000000000000
Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
80000007ee99f360 0000000000000000 80000007ff007000 ffffffff81070660
Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
80000007851d3a90 7fffffffffffffff 0000000000000000 0000000000000002
Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
80000007851d3a98 ffffffff80e1e8a0 0000000100000000 8000000784ba3600
Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
ffffffff808c4b60 80000007851d3aa0 80000007851d3aa0 ffffffffc053baa8
Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
ffffffff80be4360 ffffffff81cb4540 ffffffff81cb0000 80000007ee99f360
Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]   ...
Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131162] Call Trace:
Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131174]
[<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131183]
[<ffffffff80e1db78>] schedule+0x38/0x98
Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131195]
[<ffffffff80e208d8>] schedule_timeout+0x240/0x2a0
Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131204]
[<ffffffff80e1e8a0>] wait_for_common+0xd8/0x1b0
Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131216]
[<ffffffff808acc20>] call_usermodehelper_exec+0x110/0x188
Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131274]
[<ffffffffc053da3c>] drbd_khelper+0x1c4/0x310 [drbd]
Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131341]
[<ffffffffc0500a14>] drbd_start_resync+0x534/0x968 [drbd]
Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131403]
[<ffffffffc05037dc>] receive_sync_uuid+0x2d4/0x5a0 [drbd]
Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131482]
[<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
Sep 24 12:08:49.077857 warn CFPU-1 kernel: [  280.131557]
[<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
Sep 24 12:08:49.077857 warn CFPU-1 kernel: [  280.131597]
[<ffffffff808b813c>] kthread+0xdc/0xf8
Sep 24 12:08:49.077857 warn CFPU-1 kernel: [  280.131606]
[<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
Sep 24 12:08:49.077857 warn CFPU-1 kernel: [  280.131612]
Sep 24 12:08:49.077857 err CFPU-1 kernel: [  280.131641] INFO: task
drbd_r_r9:8173 blocked for more than 120 seconds.
Sep 24 12:08:49.091245 err CFPU-1 kernel: [  280.138358]
Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
Sep 24 12:08:49.091245 err CFPU-1 kernel: [  280.146373] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 24 12:08:49.099067 info CFPU-1 kernel: [  280.154219] drbd_r_r9
   D ffffffff80e1db78     0  8173      2 0x00100002
Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229] Stack :
0000000000000000 0000000000000010 ffffffff811b3a80 0000000000000000
Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
7fffffffffffffff 7fffffffffffffff 0000000000000000 0000000000000002
Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
8000000783c9fa98 0000000000000000 ffffffffc053baa8 ffffffff80be4360
Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
ffffffffc0550000 ffffffff80e1db78 ffffffff81070660 ffffffff80e208d8
Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
8000000797a45100 ffffffff808c4878 80000007ff2f4100 0000000000000000
Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
8000000788ca9d20 0000000000000000 80000007ff007000 ffffffff81070660
Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
8000000783c9fa90 7fffffffffffffff 0000000000000000 0000000000000002
Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
8000000783c9fa98 ffffffff80e1e8a0 0000000100000000 80000007843d4380
Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
ffffffff808c4b60 8000000783c9faa0 8000000783c9faa0 ffffffffc053baa8
Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
ffffffff80be4360 ffffffff81cb4540 ffffffff81cb0000 8000000788ca9d20
Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]   ...
Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154305] Call Trace:
Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154315]
[<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154325]
[<ffffffff80e1db78>] schedule+0x38/0x98
Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154333]
[<ffffffff80e208d8>] schedule_timeout+0x240/0x2a0
Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154342]
[<ffffffff80e1e8a0>] wait_for_common+0xd8/0x1b0
Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154354]
[<ffffffff808acc20>] call_usermodehelper_exec+0x110/0x188
Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154410]
[<ffffffffc053da3c>] drbd_khelper+0x1c4/0x310 [drbd]
Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154475]
[<ffffffffc0500a14>] drbd_start_resync+0x534/0x968 [drbd]
Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154540]
[<ffffffffc05037dc>] receive_sync_uuid+0x2d4/0x5a0 [drbd]
Sep 24 12:08:49.100714 warn CFPU-1 kernel: [  280.154612]
[<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
Sep 24 12:08:49.100714 warn CFPU-1 kernel: [  280.154678]
[<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
Sep 24 12:08:49.100714 warn CFPU-1 kernel: [  280.154717]
[<ffffffff808b813c>] kthread+0xdc/0xf8
Sep 24 12:08:49.100714 warn CFPU-1 kernel: [  280.154726]
[<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
Sep 24 12:08:49.100714 warn CFPU-1 kernel: [  280.154732]
Sep 24 12:08:49.100714 err CFPU-1 kernel: [  280.154742] INFO: task
drbd_r_r10:8254 blocked for more than 120 seconds.
Sep 24 12:08:49.106539 err CFPU-1 kernel: [  280.161554]
Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
Sep 24 12:08:49.114430 err CFPU-1 kernel: [  280.169569] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 24 12:08:49.129749 info CFPU-1 kernel: [  280.177434] drbd_r_r10
   D ffffffff80e1db78     0  8254      2 0x00100002
Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445] Stack :
0000000000000000 0000000000000010 ffffffff811b3a80 0000000000000000
Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
7fffffffffffffff 7fffffffffffffff 0000000000000000 0000000000000002
Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
80000007846efa98 0000000000000000 ffffffffc053baa8 ffffffff80be4360
Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
ffffffffc0550000 ffffffff80e1db78 ffffffff81070660 ffffffff80e208d8
Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
8000000784893600 ffffffff808c4878 80000007ff2f4100 0000000000000000
Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
80000007e9498180 0000000000000000 80000007ff007000 ffffffff81070660
Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
80000007846efa90 7fffffffffffffff 0000000000000000 0000000000000002
Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
80000007846efa98 ffffffff80e1e8a0 0000000100000000 8000000783cf8d80
Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
ffffffff808c4b60 80000007846efaa0 80000007846efaa0 ffffffffc053baa8
Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
ffffffff80be4360 ffffffff81cb4540 ffffffff81cb0000 80000007e9498180
Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]   ...
Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177517] Call Trace:
Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177529]
[<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177539]
[<ffffffff80e1db78>] schedule+0x38/0x98
Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177547]
[<ffffffff80e208d8>] schedule_timeout+0x240/0x2a0
Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177556]
[<ffffffff80e1e8a0>] wait_for_common+0xd8/0x1b0
Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177569]
[<ffffffff808acc20>] call_usermodehelper_exec+0x110/0x188
Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177631]
[<ffffffffc053da3c>] drbd_khelper+0x1c4/0x310 [drbd]
Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177697]
[<ffffffffc0500a14>] drbd_start_resync+0x534/0x968 [drbd]
Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177760]
[<ffffffffc05037dc>] receive_sync_uuid+0x2d4/0x5a0 [drbd]
Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177823]
[<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177888]
[<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177928]
[<ffffffff808b813c>] kthread+0xdc/0xf8
Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177938]
[<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177944]
Sep 24 12:08:49.129749 err CFPU-1 kernel: [  280.177960] INFO: task
drbd_r_r11:8328 blocked for more than 120 seconds.
Sep 24 12:08:49.129749 err CFPU-1 kernel: [  280.184775]
Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
Sep 24 12:08:49.137674 err CFPU-1 kernel: [  280.192793] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 24 12:08:49.152797 info CFPU-1 kernel: [  280.200644] drbd_r_r11
   D ffffffff80e1db78     0  8328      2 0x00100002
Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654] Stack :
0000000000000000 0000000000000010 ffffffff811b3a80 0000000000000000
Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
7fffffffffffffff 7fffffffffffffff 0000000000000000 0000000000000002
Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
8000000783c5fa98 0000000000000000 ffffffffc053baa8 ffffffff80be4360
Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
ffffffffc0550000 ffffffff80e1db78 ffffffff81070660 ffffffff80e208d8
Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
8000000787935100 ffffffff808c4878 80000007ff2f4100 0000000000000000
Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
80000007eebd25a0 0000000000000000 80000007ff007000 ffffffff81070660
Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
8000000783c5fa90 7fffffffffffffff 0000000000000000 0000000000000002
Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
8000000783c5fa98 ffffffff80e1e8a0 0000000100000000 800000078371de80
Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
ffffffff808c4b60 8000000783c5faa0 8000000783c5faa0 ffffffffc053baa8
Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
ffffffff80be4360 ffffffff81cb4540 ffffffff81cb0000 80000007eebd25a0
Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]   ...
Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200726] Call Trace:
Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200739]
[<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200749]
[<ffffffff80e1db78>] schedule+0x38/0x98
Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200757]
[<ffffffff80e208d8>] schedule_timeout+0x240/0x2a0
Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200766]
[<ffffffff80e1e8a0>] wait_for_common+0xd8/0x1b0
Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200778]
[<ffffffff808acc20>] call_usermodehelper_exec+0x110/0x188
Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200840]
[<ffffffffc053da3c>] drbd_khelper+0x1c4/0x310 [drbd]
Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200906]
[<ffffffffc0500a14>] drbd_start_resync+0x534/0x968 [drbd]
Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200969]
[<ffffffffc05037dc>] receive_sync_uuid+0x2d4/0x5a0 [drbd]
Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.201032]
[<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.201097]
[<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.201137]
[<ffffffff808b813c>] kthread+0xdc/0xf8
Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.201147]
[<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.201153]
Sep 24 12:08:49.152797 err CFPU-1 kernel: [  280.201183] INFO: task
lvcreate:8585 blocked for more than 120 seconds.
Sep 24 12:08:49.152797 err CFPU-1 kernel: [  280.207823]
Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
Sep 24 12:08:49.160715 err CFPU-1 kernel: [  280.215860] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 24 12:08:49.168570 info CFPU-1 kernel: [  280.223704] lvcreate
   D ffffffff80e1db78     0  8585   8582 0x00100000
Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719] Stack :
ffffffff809ce420 ffffffffc027e170 80000007ea05c720 ffffffff809ce014
Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
ffffffffc0285294 ffffffffc0285290 8000000797bdb600 0000000000000002
Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
ffffffffc02834d0 ffffffffc0280000 ffffffff809ce420 ffffffffc027e170
Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
80000007f9275d80 ffffffff80e1db78 ffffffffc0285294 ffffffff80e20584
Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
80000007fde8bb30 ffffffffc0285298 8000000797bdb600 0000000000000000
Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
c00000000205a080 0000000000000000 c00000000205a080 80000007ea05c600
Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
80000007fec10300 ffffffffc027fd20 8000000780cb0150 ffffffffc01e2e9c
Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
00000008c0284b30 00000002c0284b30 80000007ea05c620 ffffffffc0280000
Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
c00000000205a080 80000007ea05de00 0000000000000000 8000000780cb0150
Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
8000000780cb0160 0000000001d4c000 0000000000000000 8000000780cb0000
Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]   ...
Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223796] Call Trace:
Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223805]
[<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223814]
[<ffffffff80e1db78>] schedule+0x38/0x98
Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223824]
[<ffffffff80e20584>] __down_write_nested+0x84/0xe8
Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223843]
[<ffffffffc027fd20>] snapshot_ctr+0x4d0/0x868 [dm_snapshot]
Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223886]
[<ffffffffc01e31d4>] dm_table_add_target+0x164/0x418 [dm_mod]
Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223921]
[<ffffffffc01e7bdc>] table_load+0x194/0x478 [dm_mod]
Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223956]
[<ffffffffc01e8e1c>] ctl_ioctl+0x424/0x678 [dm_mod]
Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.224006]
[<ffffffffc01e90a0>] dm_ctl_ioctl+0x30/0x40 [dm_mod]
Sep 24 12:08:49.170177 warn CFPU-1 kernel: [  280.224040]
[<ffffffff809f29cc>] do_vfs_ioctl+0x38c/0x5f8
Sep 24 12:08:49.170177 warn CFPU-1 kernel: [  280.224049]
[<ffffffff809f2c98>] SyS_ioctl+0x60/0xc8
Sep 24 12:08:49.170177 warn CFPU-1 kernel: [  280.224059]
[<ffffffff80879b70>] syscall_common+0x34/0x58
Sep 24 12:08:49.170177 warn CFPU-1 kernel: [  280.224065]

-- 
Guruswamy Basavaiah


* Re: Fix "dm kcopyd: Fix bug causing workqueue stalls" causes dead lock
  2019-09-27 13:19 Fix "dm kcopyd: Fix bug causing workqueue stalls" causes dead lock Guruswamy Basavaiah
@ 2019-09-27 19:33 ` Nikos Tsironis
  2019-09-29 14:36   ` Guruswamy Basavaiah
  0 siblings, 1 reply; 19+ messages in thread
From: Nikos Tsironis @ 2019-09-27 19:33 UTC (permalink / raw)
  To: Guruswamy Basavaiah, dm-devel, iliastsi; +Cc: agk, Mike Snitzer

On 9/27/19 4:19 PM, Guruswamy Basavaiah wrote:
> Hello,
>  We have drbd partition on top of lvm partition. when node having
> secondary drbd partition is coming up, large amount of data will be
> synced between primary to secondary drbd partitions.
> 
> During this time, we see the drbd Sync(Resync) stops at some point.
> After 120 seconds we see hung-task-timeout warnings in the logs.(see
> at the end of this email)
> 
> If i increase the cow_count semaphore value from 2048 to 8192 or
> remove the below patch, drbd sync works seamlessly.
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/patch/?id=721b1d98fb517ae99ab3b757021cf81db41e67be
> 
> I am not familiar with dm code, from hung task back traces what i
> understand is, when thread is trying to queue work to kcopyd, holding
> "&_origins_lock" and blocked on cow_count lock,
> jobs from kcopyd is trying to queue work to same kcopyd and blocked on
> "&_origins_lock" and dead lock.
> 

Hello Guruswamy,

I am Cc-ing the maintainers, so they can be in the loop.

I examined the attached logs and I believe the following happens:

1. DRBD issues a number of writes to the snapshot origin device. These
   writes cause COW, which is performed by kcopyd.

2. At some point DRBD reaches the cow_count semaphore limit (2048) and
   blocks in down(&s->cow_count), holding a read lock on _origins_lock.

3. Someone tries to create a new snapshot. This involves taking a write
   lock on _origins_lock, which blocks because DRBD at step (2) already
   holds a read lock on it. That's the blocked lvcreate at the end of
   the trace.

4. A COW operation, issued by step (1), completes and kcopyd runs
   dm-snapshot's completion callback, which tries to take a read lock on
   _origins_lock, before signaling the cow_count semaphore. This read
   lock blocks, the semaphore is never signaled and we have the deadlock
   you experienced.

At first glance this seemed strange, because DRBD at step (2) holds a
read lock on _origins_lock, so taking another read lock should be
possible.

But, if I am not missing something, the read-write semaphore
implementation gives priority to writers: as soon as a writer (the
lvcreate in our case) tries to enter the critical section, no new
readers are allowed in until all writers have completed their work.

That's what I believe is causing the deadlock you are experiencing.
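
To make the sequence in steps 1-4 concrete, here is a minimal user-space
sketch. It is not the dm-snapshot code: the toy_* types and functions and
replay_steps() are hypothetical names made up for illustration, and the
rule that new readers queue behind a waiting writer is the assumption
stated above, not something verified against the rwsem source. The sketch
replays the steps single-threaded and counts how many of the three tasks
(DRBD writer, lvcreate, kcopyd callback) end up blocked.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy reader-writer semaphore. Assumption: new readers are refused
 * while a writer holds the lock or is waiting for it. */
struct toy_rwsem {
    int readers;          /* current read-lock holders */
    int waiting_writers;  /* writers queued for the lock */
    bool writer;          /* true while a writer holds the lock */
};

/* Toy counting semaphore standing in for cow_count. */
struct toy_sem {
    int count;
};

/* Each "down" returns false where the real call would block. */
static bool toy_down_read(struct toy_rwsem *l)
{
    if (l->writer || l->waiting_writers > 0)
        return false;
    l->readers++;
    return true;
}

static void toy_up_read(struct toy_rwsem *l)
{
    l->readers--;
}

static bool toy_down_write(struct toy_rwsem *l)
{
    if (l->readers > 0 || l->writer) {
        l->waiting_writers++;   /* writer queues behind current holders */
        return false;
    }
    l->writer = true;
    return true;
}

static bool toy_down(struct toy_sem *s)
{
    if (s->count == 0)
        return false;
    s->count--;
    return true;
}

static void toy_up(struct toy_sem *s)
{
    s->count++;
}

/* Replays steps 1-4 and returns the number of blocked tasks. */
int replay_steps(int cow_limit, bool create_snapshot)
{
    struct toy_rwsem origins_lock = { 0, 0, false };
    struct toy_sem cow_count = { cow_limit };
    int i;

    /* Step 1: DRBD writes; each COW takes one slot under the read lock. */
    for (i = 0; i < cow_limit; i++) {
        toy_down_read(&origins_lock);
        toy_down(&cow_count);
        toy_up_read(&origins_lock);
    }

    /* Step 2: the next write holds the read lock and finds no free slot. */
    toy_down_read(&origins_lock);
    bool drbd_blocked = !toy_down(&cow_count);

    /* Step 3: lvcreate asks for the write lock while DRBD holds a read lock. */
    bool lvcreate_blocked = create_snapshot && !toy_down_write(&origins_lock);

    /* Step 4: the kcopyd callback needs the read lock before up(cow_count). */
    bool kcopyd_blocked = !toy_down_read(&origins_lock);
    if (!kcopyd_blocked) {
        toy_up_read(&origins_lock);
        toy_up(&cow_count);
        drbd_blocked = !toy_down(&cow_count);  /* DRBD gets the freed slot */
    }

    return drbd_blocked + lvcreate_blocked + kcopyd_blocked;
}
```

In this model, replay_steps(2048, true) reports all three tasks blocked,
while replay_steps(2048, false), i.e. the same load with no snapshot being
created, reports none: the callback gets its read lock, signals cow_count
and DRBD continues.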

I will send a patch fixing this and I will let you know.

Thanks,
Nikos

> Below is the hung task back traces.
> Sep 24 12:08:48.974658 err CFPU-1 kernel: [  279.991760] INFO: task
> kworker/1:1:170 blocked for more than 120 seconds.
> Sep 24 12:08:48.974658 err CFPU-1 kernel: [  279.998569]
> Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
> Sep 24 12:08:48.974658 err CFPU-1 kernel: [  280.006593] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Sep 24 12:08:48.974658 info CFPU-1 kernel: [  280.014435] kworker/1:1
>    D ffffffff80e1db78     0   170      2 0x00100000
> Sep 24 12:08:48.974658 info CFPU-1 kernel: [  280.014482] Workqueue:
> kcopyd do_work [dm_mod]
> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487] Stack :
> 0000000000000000 0000000000000001 0003000300000000 80000007fde8bac8
> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
> 80000007fe759b00 0000000000000002 ffffffffc0285294 80000007f8d1ca00
> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
> ffffffffc027eda8 0000000000000001 ffffffff80b30000 0000000000000100
> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
> 8000000784c098c8 ffffffff80e1db78 80000007fe759b00 ffffffff80e204b8
> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
> 8000000788ef79c0 800000078505ba70 80000007fe759b00 00000001852b4620
> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
> ffffffffc0280000 80000007852b4620 80000007eebf5758 ffffffffc027edec
> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
> 0000000000000000 80000007852b4620 80000007835d8e80 ffffffffc027f38c
> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
> 8000000787ac0580 0000000000000001 80000007f8d1ca60 8000000785aeb080
> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
> 0000000000000000 0000000000000000 0000000000000200 ffffffffc0282488
> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
> 0000000000000200 80000007f8d1ca00 ffffffffc0280000 ffffffffc027db90
> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]   ...
> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014558] Call Trace:
> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014570]
> [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014580]
> [<ffffffff80e1db78>] schedule+0x38/0x98
> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014590]
> [<ffffffff80e204b8>] __down_read+0xa8/0xf0
> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014609]
> [<ffffffffc027edec>] do_origin.isra.13+0x44/0x110 [dm_snapshot]
> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014625]
> [<ffffffffc027f38c>] pending_complete+0x1ac/0x378 [dm_snapshot]
> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014642]
> [<ffffffffc0282488>] persistent_commit_exception+0x140/0x218
> [dm_snapshot]
> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014659]
> [<ffffffffc027db90>] copy_callback+0x108/0x1a0 [dm_snapshot]
> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014688]
> [<ffffffffc01eb6a4>] run_complete_job+0x8c/0x148 [dm_mod]
> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014724]
> [<ffffffffc01eaae8>] process_jobs+0xc8/0x1e0 [dm_mod]
> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014759]
> [<ffffffffc01eb1e0>] do_work+0xb8/0x110 [dm_mod]
> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014783]
> [<ffffffff808b18e0>] process_one_work+0x190/0x480
> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014791]
> [<ffffffff808b1d18>] worker_thread+0x148/0x580
> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014801]
> [<ffffffff808b813c>] kthread+0xdc/0xf8
> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014811]
> [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014817]
> Sep 24 12:08:48.974658 err CFPU-1 kernel: [  280.015046] INFO: task
> drbd_r_r1:7772 blocked for more than 120 seconds.
> Sep 24 12:08:48.974658 err CFPU-1 kernel: [  280.021766]
> Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
> Sep 24 12:08:48.974658 err CFPU-1 kernel: [  280.029781] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Sep 24 12:08:48.982578 info CFPU-1 kernel: [  280.037637] drbd_r_r1
>    D ffffffff80e1db78     0  7772      2 0x00100002
> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648] Stack :
> 00000000000207e8 ffffffff80b35530 8000000788e77860 8000000788e77860
> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
> 7fffffffffffffff 80000007f8d1cb28 8000000788e0a880 0000000000000002
> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
> ffffffff80e20000 80000007f9306480 00000000000207e8 ffffffff808dfe58
> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
> 80000007eebf5168 ffffffff80e1db78 0000000002411200 ffffffff80e208d8
> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
> 80000007eab5b440 ffffffff8097b25c 00000000000207a0 ffffffff808dfe58
> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
> 80000007f8d1ca00 ffffffff809ce014 80000007f9306000 0000000002011200
> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
> 7fffffffffffffff 80000007f8d1cb28 8000000788e0a880 0000000000000002
> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
> ffffffff80e20000 ffffffff80e1f8d8 80000007f8d1cb30 80000007f8d1cb30
> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
> 8000000788e0a880 00ffffff808db9e8 0000000000000000 80000007f8d1cb28
> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
> 80000007f9306480 ffffffffc02834d0 ffffffff808e0000 ffffffff808dfba4
> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]   ...
> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037774] Call Trace:
> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037797]
> [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037849]
> [<ffffffff80e1db78>] schedule+0x38/0x98
> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037865]
> [<ffffffff80e208d8>] schedule_timeout+0x240/0x2a0
> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037874]
> [<ffffffff80e1f8d8>] __down+0x90/0xd8
> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037884]
> [<ffffffff808dfba4>] down+0x54/0x70
> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037901]
> [<ffffffffc027da48>] start_copy+0x98/0xd8 [dm_snapshot]
> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037934]
> [<ffffffffc027e7d4>] __origin_write+0x184/0x2c0 [dm_snapshot]
> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037968]
> [<ffffffffc027ee4c>] do_origin.isra.13+0xa4/0x110 [dm_snapshot]
> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038016]
> [<ffffffffc01dc2c0>] __map_bio+0xb0/0x258 [dm_mod]
> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038071]
> [<ffffffffc01deb94>] __split_and_process_bio+0x274/0x488 [dm_mod]
> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038126]
> [<ffffffffc01dee3c>] dm_make_request+0x94/0x128 [dm_mod]
> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038174]
> [<ffffffff80b3347c>] generic_make_request+0x114/0x290
> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038238]
> [<ffffffffc0511090>] drbd_submit_peer_request+0x258/0x618 [drbd]
> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038337]
> [<ffffffffc05121f8>] receive_RSDataReply+0x3b8/0x770 [drbd]
> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038426]
> [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038526]
> [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
> Sep 24 12:08:48.985184 warn CFPU-1 kernel: [  280.038588]
> [<ffffffff808b813c>] kthread+0xdc/0xf8
> Sep 24 12:08:48.985184 warn CFPU-1 kernel: [  280.038604]
> [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
> Sep 24 12:08:48.985184 warn CFPU-1 kernel: [  280.038612]
> Sep 24 12:08:48.985184 err CFPU-1 kernel: [  280.038629] INFO: task
> drbd_r_r5:7910 blocked for more than 120 seconds.
> Sep 24 12:08:48.990218 err CFPU-1 kernel: [  280.045351]
> Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
> Sep 24 12:08:48.998215 err CFPU-1 kernel: [  280.053372] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Sep 24 12:08:49.006141 info CFPU-1 kernel: [  280.061248] drbd_r_r5
>    D ffffffff80e1db78     0  7910      2 0x00100002
> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257] Stack :
> 0000000000000001 ffffffff80b35530 8000000788ef7950 8000000788ef7950
> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
> 8000000787936c00 0000000000000002 ffffffffc0285294 00000000000049ba
> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
> 0000000000000001 00000000000049ba 0000000000000001 ffffffff80b2aa90
> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
> 0000000000000001 ffffffff80e1db78 8000000787936c00 ffffffff80e204b8
> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
> 800000078785b9c0 80000007fde8bb30 8000000787936c00 0000000100000000
> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
> ffffffffc0280000 80000007eff43920 8000000788ca0198 ffffffffc027edec
> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
> 80000007eff43900 c000000001f5b080 80000007eff43920 ffffffffc01dc2c0
> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
> 80000007eb29bd80 0000000000000000 0000000002400000 80000007eff42900
> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
> 00000000000049ba ffffffff80b2abec 8000000788ef7a50 8000000788ef7a50
> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
> ffffffffc01dbad8 c000000001f5b080 c000000001f5b080 80000007eff43900
> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]   ...
> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061375] Call Trace:
> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061392]
> [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061435]
> [<ffffffff80e1db78>] schedule+0x38/0x98
> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061445]
> [<ffffffff80e204b8>] __down_read+0xa8/0xf0
> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061463]
> [<ffffffffc027edec>] do_origin.isra.13+0x44/0x110 [dm_snapshot]
> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061509]
> [<ffffffffc01dc2c0>] __map_bio+0xb0/0x258 [dm_mod]
> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061563]
> [<ffffffffc01deb94>] __split_and_process_bio+0x274/0x488 [dm_mod]
> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061598]
> [<ffffffffc01dee3c>] dm_make_request+0x94/0x128 [dm_mod]
> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061624]
> [<ffffffff80b3347c>] generic_make_request+0x114/0x290
> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061674]
> [<ffffffffc0511090>] drbd_submit_peer_request+0x258/0x618 [drbd]
> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061739]
> [<ffffffffc0512c78>] receive_Data+0x6c8/0x1368 [drbd]
> Sep 24 12:08:49.007987 warn CFPU-1 kernel: [  280.061816]
> [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
> Sep 24 12:08:49.007987 warn CFPU-1 kernel: [  280.061882]
> [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
> Sep 24 12:08:49.007987 warn CFPU-1 kernel: [  280.061922]
> [<ffffffff808b813c>] kthread+0xdc/0xf8
> Sep 24 12:08:49.007987 warn CFPU-1 kernel: [  280.061931]
> [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
> Sep 24 12:08:49.007987 warn CFPU-1 kernel: [  280.061937]
> Sep 24 12:08:49.007987 err CFPU-1 kernel: [  280.061948] INFO: task
> drbd_r_r6:7991 blocked for more than 120 seconds.
> Sep 24 12:08:49.013584 err CFPU-1 kernel: [  280.068669]
> Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
> Sep 24 12:08:49.021541 err CFPU-1 kernel: [  280.076683] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Sep 24 12:08:49.029370 info CFPU-1 kernel: [  280.084523] drbd_r_r6
>    D ffffffff80e1db78     0  7991      2 0x00100002
> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532] Stack :
> 0000000000000001 ffffffff80b35530 800000078785b950 800000078785b950
> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
> 8000000787b79b00 0000000000000002 ffffffffc0285294 000000000000049a
> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
> 0000000000000001 000000000000049a 0000000000000001 ffffffff80b2aa90
> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
> 0000000000000001 ffffffff80e1db78 8000000787b79b00 ffffffff80e204b8
> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
> 800000078465b940 8000000788ef79c0 8000000787b79b00 0000000100000000
> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
> ffffffffc0280000 80000007e90d6c20 80000007878cf2d8 ffffffffc027edec
> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
> 80000007e90d6c00 c000000001fcc080 80000007e90d6c20 ffffffffc01dc2c0
> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
> 80000007f93fd900 0000000000000000 0000000002400000 80000007e90d6a00
> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
> 000000000000049a ffffffff80b2abec 800000078785ba50 800000078785ba50
> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
> ffffffffc01dbad8 c000000001fcc080 c000000001fcc080 80000007e90d6c00
> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]   ...
> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084610] Call Trace:
> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084621]
> [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084631]
> [<ffffffff80e1db78>] schedule+0x38/0x98
> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084641]
> [<ffffffff80e204b8>] __down_read+0xa8/0xf0
> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084659]
> [<ffffffffc027edec>] do_origin.isra.13+0x44/0x110 [dm_snapshot]
> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084696]
> [<ffffffffc01dc2c0>] __map_bio+0xb0/0x258 [dm_mod]
> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084729]
> [<ffffffffc01deb94>] __split_and_process_bio+0x274/0x488 [dm_mod]
> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084777]
> [<ffffffffc01dee3c>] dm_make_request+0x94/0x128 [dm_mod]
> Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.084810]
> [<ffffffff80b3347c>] generic_make_request+0x114/0x290
> Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.084863]
> [<ffffffffc0511090>] drbd_submit_peer_request+0x258/0x618 [drbd]
> Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.084928]
> [<ffffffffc0512c78>] receive_Data+0x6c8/0x1368 [drbd]
> Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.084992]
> [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
> Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.085057]
> [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
> Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.085097]
> [<ffffffff808b813c>] kthread+0xdc/0xf8
> Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.085106]
> [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
> Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.085112]
> Sep 24 12:08:49.031000 err CFPU-1 kernel: [  280.085121] INFO: task
> drbd_r_r7:8046 blocked for more than 120 seconds.
> Sep 24 12:08:49.036834 err CFPU-1 kernel: [  280.091854]
> Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
> Sep 24 12:08:49.044747 err CFPU-1 kernel: [  280.099876] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Sep 24 12:08:49.056612 info CFPU-1 kernel: [  280.107739] drbd_r_r7
>    D ffffffff80e1db78     0  8046      2 0x00100000
> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749] Stack :
> 0000000000000000 0000000000000010 ffffffff811b3a80 0000000000000000
> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
> 7fffffffffffffff 7fffffffffffffff 0000000000000000 0000000000000002
> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
> 8000000788fcfa98 0000000000000000 ffffffffc053baa8 ffffffff80be4360
> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
> ffffffffc0550000 ffffffff80e1db78 ffffffff81070660 ffffffff80e208d8
> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
> 8000000797a43600 ffffffff808c4878 80000007ff2f4100 0000000000000000
> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
> 80000007f93fdb40 0000000000000000 80000007ff007000 ffffffff81070660
> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
> 8000000788fcfa90 7fffffffffffffff 0000000000000000 0000000000000002
> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
> 8000000788fcfa98 ffffffff80e1e8a0 0000000100000000 8000000797a42880
> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
> ffffffff808c4b60 8000000788fcfaa0 8000000788fcfaa0 ffffffffc053baa8
> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
> ffffffff80be4360 ffffffff81cb4540 ffffffff81cb0000 80000007f93fdb40
> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]   ...
> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107821] Call Trace:
> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107833]
> [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107842]
> [<ffffffff80e1db78>] schedule+0x38/0x98
> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107850]
> [<ffffffff80e208d8>] schedule_timeout+0x240/0x2a0
> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107859]
> [<ffffffff80e1e8a0>] wait_for_common+0xd8/0x1b0
> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107873]
> [<ffffffff808acc20>] call_usermodehelper_exec+0x110/0x188
> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107932]
> [<ffffffffc053da3c>] drbd_khelper+0x1c4/0x310 [drbd]
> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107998]
> [<ffffffffc0500a14>] drbd_start_resync+0x534/0x968 [drbd]
> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.108060]
> [<ffffffffc05037dc>] receive_sync_uuid+0x2d4/0x5a0 [drbd]
> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.108124]
> [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.108189]
> [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.108229]
> [<ffffffff808b813c>] kthread+0xdc/0xf8
> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.108239]
> [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.108245]
> Sep 24 12:08:49.056612 err CFPU-1 kernel: [  280.108471] INFO: task
> drbd_r_r8:8120 blocked for more than 120 seconds.
> Sep 24 12:08:49.060085 err CFPU-1 kernel: [  280.115194]
> Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
> Sep 24 12:08:49.075924 err CFPU-1 kernel: [  280.123218] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Sep 24 12:08:49.075924 info CFPU-1 kernel: [  280.131070] drbd_r_r8
>    D ffffffff80e1db78     0  8120      2 0x00100002
> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083] Stack :
> 0000000000000000 0000000000000010 ffffffff811b3a80 0000000000000000
> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
> 7fffffffffffffff 7fffffffffffffff 0000000000000000 0000000000000002
> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
> 80000007851d3a98 0000000000000000 ffffffffc053baa8 ffffffff80be4360
> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
> ffffffffc0550000 ffffffff80e1db78 ffffffff81070660 ffffffff80e208d8
> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
> 80000007f96f6c00 ffffffff808c4878 80000007ff2f4100 0000000000000000
> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
> 80000007ee99f360 0000000000000000 80000007ff007000 ffffffff81070660
> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
> 80000007851d3a90 7fffffffffffffff 0000000000000000 0000000000000002
> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
> 80000007851d3a98 ffffffff80e1e8a0 0000000100000000 8000000784ba3600
> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
> ffffffff808c4b60 80000007851d3aa0 80000007851d3aa0 ffffffffc053baa8
> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
> ffffffff80be4360 ffffffff81cb4540 ffffffff81cb0000 80000007ee99f360
> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]   ...
> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131162] Call Trace:
> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131174]
> [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131183]
> [<ffffffff80e1db78>] schedule+0x38/0x98
> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131195]
> [<ffffffff80e208d8>] schedule_timeout+0x240/0x2a0
> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131204]
> [<ffffffff80e1e8a0>] wait_for_common+0xd8/0x1b0
> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131216]
> [<ffffffff808acc20>] call_usermodehelper_exec+0x110/0x188
> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131274]
> [<ffffffffc053da3c>] drbd_khelper+0x1c4/0x310 [drbd]
> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131341]
> [<ffffffffc0500a14>] drbd_start_resync+0x534/0x968 [drbd]
> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131403]
> [<ffffffffc05037dc>] receive_sync_uuid+0x2d4/0x5a0 [drbd]
> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131482]
> [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
> Sep 24 12:08:49.077857 warn CFPU-1 kernel: [  280.131557]
> [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
> Sep 24 12:08:49.077857 warn CFPU-1 kernel: [  280.131597]
> [<ffffffff808b813c>] kthread+0xdc/0xf8
> Sep 24 12:08:49.077857 warn CFPU-1 kernel: [  280.131606]
> [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
> Sep 24 12:08:49.077857 warn CFPU-1 kernel: [  280.131612]
> Sep 24 12:08:49.077857 err CFPU-1 kernel: [  280.131641] INFO: task
> drbd_r_r9:8173 blocked for more than 120 seconds.
> Sep 24 12:08:49.091245 err CFPU-1 kernel: [  280.138358]
> Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
> Sep 24 12:08:49.091245 err CFPU-1 kernel: [  280.146373] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Sep 24 12:08:49.099067 info CFPU-1 kernel: [  280.154219] drbd_r_r9
>    D ffffffff80e1db78     0  8173      2 0x00100002
> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229] Stack :
> 0000000000000000 0000000000000010 ffffffff811b3a80 0000000000000000
> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
> 7fffffffffffffff 7fffffffffffffff 0000000000000000 0000000000000002
> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
> 8000000783c9fa98 0000000000000000 ffffffffc053baa8 ffffffff80be4360
> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
> ffffffffc0550000 ffffffff80e1db78 ffffffff81070660 ffffffff80e208d8
> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
> 8000000797a45100 ffffffff808c4878 80000007ff2f4100 0000000000000000
> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
> 8000000788ca9d20 0000000000000000 80000007ff007000 ffffffff81070660
> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
> 8000000783c9fa90 7fffffffffffffff 0000000000000000 0000000000000002
> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
> 8000000783c9fa98 ffffffff80e1e8a0 0000000100000000 80000007843d4380
> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
> ffffffff808c4b60 8000000783c9faa0 8000000783c9faa0 ffffffffc053baa8
> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
> ffffffff80be4360 ffffffff81cb4540 ffffffff81cb0000 8000000788ca9d20
> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]   ...
> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154305] Call Trace:
> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154315]
> [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154325]
> [<ffffffff80e1db78>] schedule+0x38/0x98
> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154333]
> [<ffffffff80e208d8>] schedule_timeout+0x240/0x2a0
> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154342]
> [<ffffffff80e1e8a0>] wait_for_common+0xd8/0x1b0
> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154354]
> [<ffffffff808acc20>] call_usermodehelper_exec+0x110/0x188
> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154410]
> [<ffffffffc053da3c>] drbd_khelper+0x1c4/0x310 [drbd]
> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154475]
> [<ffffffffc0500a14>] drbd_start_resync+0x534/0x968 [drbd]
> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154540]
> [<ffffffffc05037dc>] receive_sync_uuid+0x2d4/0x5a0 [drbd]
> Sep 24 12:08:49.100714 warn CFPU-1 kernel: [  280.154612]
> [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
> Sep 24 12:08:49.100714 warn CFPU-1 kernel: [  280.154678]
> [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
> Sep 24 12:08:49.100714 warn CFPU-1 kernel: [  280.154717]
> [<ffffffff808b813c>] kthread+0xdc/0xf8
> Sep 24 12:08:49.100714 warn CFPU-1 kernel: [  280.154726]
> [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
> Sep 24 12:08:49.100714 warn CFPU-1 kernel: [  280.154732]
> Sep 24 12:08:49.100714 err CFPU-1 kernel: [  280.154742] INFO: task
> drbd_r_r10:8254 blocked for more than 120 seconds.
> Sep 24 12:08:49.106539 err CFPU-1 kernel: [  280.161554]
> Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
> Sep 24 12:08:49.114430 err CFPU-1 kernel: [  280.169569] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Sep 24 12:08:49.129749 info CFPU-1 kernel: [  280.177434] drbd_r_r10
>    D ffffffff80e1db78     0  8254      2 0x00100002
> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445] Stack :
> 0000000000000000 0000000000000010 ffffffff811b3a80 0000000000000000
> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
> 7fffffffffffffff 7fffffffffffffff 0000000000000000 0000000000000002
> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
> 80000007846efa98 0000000000000000 ffffffffc053baa8 ffffffff80be4360
> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
> ffffffffc0550000 ffffffff80e1db78 ffffffff81070660 ffffffff80e208d8
> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
> 8000000784893600 ffffffff808c4878 80000007ff2f4100 0000000000000000
> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
> 80000007e9498180 0000000000000000 80000007ff007000 ffffffff81070660
> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
> 80000007846efa90 7fffffffffffffff 0000000000000000 0000000000000002
> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
> 80000007846efa98 ffffffff80e1e8a0 0000000100000000 8000000783cf8d80
> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
> ffffffff808c4b60 80000007846efaa0 80000007846efaa0 ffffffffc053baa8
> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
> ffffffff80be4360 ffffffff81cb4540 ffffffff81cb0000 80000007e9498180
> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]   ...
> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177517] Call Trace:
> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177529]
> [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177539]
> [<ffffffff80e1db78>] schedule+0x38/0x98
> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177547]
> [<ffffffff80e208d8>] schedule_timeout+0x240/0x2a0
> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177556]
> [<ffffffff80e1e8a0>] wait_for_common+0xd8/0x1b0
> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177569]
> [<ffffffff808acc20>] call_usermodehelper_exec+0x110/0x188
> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177631]
> [<ffffffffc053da3c>] drbd_khelper+0x1c4/0x310 [drbd]
> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177697]
> [<ffffffffc0500a14>] drbd_start_resync+0x534/0x968 [drbd]
> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177760]
> [<ffffffffc05037dc>] receive_sync_uuid+0x2d4/0x5a0 [drbd]
> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177823]
> [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177888]
> [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177928]
> [<ffffffff808b813c>] kthread+0xdc/0xf8
> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177938]
> [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177944]
> Sep 24 12:08:49.129749 err CFPU-1 kernel: [  280.177960] INFO: task
> drbd_r_r11:8328 blocked for more than 120 seconds.
> Sep 24 12:08:49.129749 err CFPU-1 kernel: [  280.184775]
> Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
> Sep 24 12:08:49.137674 err CFPU-1 kernel: [  280.192793] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Sep 24 12:08:49.152797 info CFPU-1 kernel: [  280.200644] drbd_r_r11
>    D ffffffff80e1db78     0  8328      2 0x00100002
> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654] Stack :
> 0000000000000000 0000000000000010 ffffffff811b3a80 0000000000000000
> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
> 7fffffffffffffff 7fffffffffffffff 0000000000000000 0000000000000002
> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
> 8000000783c5fa98 0000000000000000 ffffffffc053baa8 ffffffff80be4360
> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
> ffffffffc0550000 ffffffff80e1db78 ffffffff81070660 ffffffff80e208d8
> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
> 8000000787935100 ffffffff808c4878 80000007ff2f4100 0000000000000000
> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
> 80000007eebd25a0 0000000000000000 80000007ff007000 ffffffff81070660
> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
> 8000000783c5fa90 7fffffffffffffff 0000000000000000 0000000000000002
> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
> 8000000783c5fa98 ffffffff80e1e8a0 0000000100000000 800000078371de80
> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
> ffffffff808c4b60 8000000783c5faa0 8000000783c5faa0 ffffffffc053baa8
> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
> ffffffff80be4360 ffffffff81cb4540 ffffffff81cb0000 80000007eebd25a0
> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]   ...
> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200726] Call Trace:
> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200739]
> [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200749]
> [<ffffffff80e1db78>] schedule+0x38/0x98
> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200757]
> [<ffffffff80e208d8>] schedule_timeout+0x240/0x2a0
> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200766]
> [<ffffffff80e1e8a0>] wait_for_common+0xd8/0x1b0
> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200778]
> [<ffffffff808acc20>] call_usermodehelper_exec+0x110/0x188
> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200840]
> [<ffffffffc053da3c>] drbd_khelper+0x1c4/0x310 [drbd]
> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200906]
> [<ffffffffc0500a14>] drbd_start_resync+0x534/0x968 [drbd]
> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200969]
> [<ffffffffc05037dc>] receive_sync_uuid+0x2d4/0x5a0 [drbd]
> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.201032]
> [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.201097]
> [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.201137]
> [<ffffffff808b813c>] kthread+0xdc/0xf8
> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.201147]
> [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.201153]
> Sep 24 12:08:49.152797 err CFPU-1 kernel: [  280.201183] INFO: task
> lvcreate:8585 blocked for more than 120 seconds.
> Sep 24 12:08:49.152797 err CFPU-1 kernel: [  280.207823]
> Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
> Sep 24 12:08:49.160715 err CFPU-1 kernel: [  280.215860] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Sep 24 12:08:49.168570 info CFPU-1 kernel: [  280.223704] lvcreate
>    D ffffffff80e1db78     0  8585   8582 0x00100000
> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719] Stack :
> ffffffff809ce420 ffffffffc027e170 80000007ea05c720 ffffffff809ce014
> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
> ffffffffc0285294 ffffffffc0285290 8000000797bdb600 0000000000000002
> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
> ffffffffc02834d0 ffffffffc0280000 ffffffff809ce420 ffffffffc027e170
> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
> 80000007f9275d80 ffffffff80e1db78 ffffffffc0285294 ffffffff80e20584
> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
> 80000007fde8bb30 ffffffffc0285298 8000000797bdb600 0000000000000000
> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
> c00000000205a080 0000000000000000 c00000000205a080 80000007ea05c600
> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
> 80000007fec10300 ffffffffc027fd20 8000000780cb0150 ffffffffc01e2e9c
> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
> 00000008c0284b30 00000002c0284b30 80000007ea05c620 ffffffffc0280000
> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
> c00000000205a080 80000007ea05de00 0000000000000000 8000000780cb0150
> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
> 8000000780cb0160 0000000001d4c000 0000000000000000 8000000780cb0000
> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]   ...
> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223796] Call Trace:
> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223805]
> [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223814]
> [<ffffffff80e1db78>] schedule+0x38/0x98
> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223824]
> [<ffffffff80e20584>] __down_write_nested+0x84/0xe8
> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223843]
> [<ffffffffc027fd20>] snapshot_ctr+0x4d0/0x868 [dm_snapshot]
> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223886]
> [<ffffffffc01e31d4>] dm_table_add_target+0x164/0x418 [dm_mod]
> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223921]
> [<ffffffffc01e7bdc>] table_load+0x194/0x478 [dm_mod]
> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223956]
> [<ffffffffc01e8e1c>] ctl_ioctl+0x424/0x678 [dm_mod]
> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.224006]
> [<ffffffffc01e90a0>] dm_ctl_ioctl+0x30/0x40 [dm_mod]
> Sep 24 12:08:49.170177 warn CFPU-1 kernel: [  280.224040]
> [<ffffffff809f29cc>] do_vfs_ioctl+0x38c/0x5f8
> Sep 24 12:08:49.170177 warn CFPU-1 kernel: [  280.224049]
> [<ffffffff809f2c98>] SyS_ioctl+0x60/0xc8
> Sep 24 12:08:49.170177 warn CFPU-1 kernel: [  280.224059]
> [<ffffffff80879b70>] syscall_common+0x34/0x58
> Sep 24 12:08:49.170177 warn CFPU-1 kernel: [  280.224065]
> 

* Re: Fix "dm kcopyd: Fix bug causing workqueue stalls" causes dead lock
  2019-09-27 19:33 ` Nikos Tsironis
@ 2019-09-29 14:36   ` Guruswamy Basavaiah
  2019-10-01 12:12     ` Nikos Tsironis
  0 siblings, 1 reply; 19+ messages in thread
From: Guruswamy Basavaiah @ 2019-09-29 14:36 UTC (permalink / raw)
  To: Nikos Tsironis; +Cc: dm-devel, agk, Mike Snitzer, iliastsi

Hello Nikos,
 Thanks for pointing out the lvcreate write lock.

Guru


On Sat, 28 Sep 2019 at 01:03, Nikos Tsironis <ntsironis@arrikto.com> wrote:
>
> On 9/27/19 4:19 PM, Guruswamy Basavaiah wrote:
> > Hello,
> >  We have a DRBD partition on top of an LVM partition. When the node
> > holding the secondary DRBD partition comes up, a large amount of data
> > is synced from the primary to the secondary DRBD partition.
> >
> > During this time, the DRBD sync (resync) stops at some point.
> > After 120 seconds we see hung-task-timeout warnings in the logs (see
> > the end of this email).
> >
> > If I increase the cow_count semaphore value from 2048 to 8192, or
> > remove the patch below, the DRBD sync works seamlessly.
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/patch/?id=721b1d98fb517ae99ab3b757021cf81db41e67be
> >
> > I am not familiar with the dm code. From the hung-task backtraces, what
> > I understand is this: a thread trying to queue work to kcopyd holds
> > "&_origins_lock" and blocks on the cow_count lock, while
> > jobs from kcopyd try to queue work to the same kcopyd and block on
> > "&_origins_lock", resulting in a deadlock.
> >
>
> Hello Guruswamy,
>
> I am Cc-ing the maintainers, so they can be in the loop.
>
> I examined the attached logs and I believe the following happens:
>
> 1. DRBD issues a number of writes to the snapshot origin device. These
>    writes cause COW, which is performed by kcopyd.
>
> 2. At some point DRBD reaches the cow_count semaphore limit (2048) and
>    blocks in down(&s->cow_count), holding a read lock on _origins_lock.
>
> 3. Someone tries to create a new snapshot. This involves taking a write
>    lock on _origins_lock, which blocks because DRBD at step (2) already
>    holds a read lock on it. That's the blocked lvcreate at the end of
>    the trace.
>
> 4. A COW operation, issued by step (1), completes and kcopyd runs
>    dm-snapshot's completion callback, which tries to take a read lock on
>    _origins_lock, before signaling the cow_count semaphore. This read
>    lock blocks, the semaphore is never signaled and we have the deadlock
>    you experienced.
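
The four steps above can be read as a wait-for graph. The sketch below is only an illustration of that reading, not dm-snapshot code: the task names are taken from the hung-task report, while the `WAITS_FOR` encoding and the `find_cycle` helper are hypothetical names introduced here. It assumes, as the analysis suggests, that a reader arriving after a queued writer has to wait behind it.

```python
from typing import Dict, List, Optional

# Wait-for graph: each blocked task points at the task it is waiting on.
# Task names come from the hung-task report; the edges encode steps 2-4.
WAITS_FOR: Dict[str, str] = {
    # Step 2: DRBD holds a read lock on _origins_lock and sleeps in
    # down(&s->cow_count); only kcopyd's completion callback signals it.
    "drbd_r_r1": "kworker/1:1 (kcopyd)",
    # Step 4: the completion callback wants a read lock on _origins_lock,
    # but (by assumption) is queued behind the waiting writer.
    "kworker/1:1 (kcopyd)": "lvcreate",
    # Step 3: lvcreate wants the write lock, which DRBD's read lock blocks.
    "lvcreate": "drbd_r_r1",
}


def find_cycle(graph: Dict[str, str], start: str) -> Optional[List[str]]:
    """Follow wait-for edges from start; return the cycle if one is found."""
    path: List[str] = []
    node: Optional[str] = start
    while node is not None and node not in path:
        path.append(node)
        node = graph.get(node)
    if node is None:
        return None  # the chain ended at a task that is not blocked
    return path[path.index(node):] + [node]


cycle = find_cycle(WAITS_FOR, "drbd_r_r1")
print(" -> ".join(cycle) if cycle else "no cycle")
```

Removing any one edge, for example by letting the completion callback signal cow_count without first taking _origins_lock, leaves the graph without a cycle.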
>
> At first glance this seemed strange, because DRBD at step (2) holds a
> read lock on _origins_lock, so taking another read lock should be
> possible.
>
> But, if I am not missing something, the read-write semaphore
> implementation gives priority to writers, meaning that as soon as a
> writer tries to enter the critical section, the lvcreate in our case, no
> readers will be allowed in until all writers have completed their work.
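
A toy model may help make the writer-priority point concrete. This is a deliberately simplified, single-threaded sketch and not the kernel's rw_semaphore implementation: the class `WriterPreferringRWLock` and its methods are hypothetical, and the only rule it encodes is the one described above, namely that a reader arriving after a writer has queued must wait.

```python
from collections import deque


class WriterPreferringRWLock:
    """Toy model: readers that arrive after a queued writer must wait."""

    def __init__(self):
        self.active_readers = 0
        self.writer_active = False
        self.waiters = deque()  # names of blocked tasks, in arrival order

    def down_read(self, task):
        # A reader gets in only if no writer holds the lock and nobody
        # is already queued ahead of it.
        if not self.writer_active and not self.waiters:
            self.active_readers += 1
            return True
        self.waiters.append(task)
        return False

    def down_write(self, task):
        # A writer needs the lock completely free.
        if (not self.writer_active and self.active_readers == 0
                and not self.waiters):
            self.writer_active = True
            return True
        self.waiters.append(task)
        return False


lock = WriterPreferringRWLock()
first = lock.down_read("drbd_r_r1")      # read lock granted
writer = lock.down_write("lvcreate")     # blocked by the active reader
second = lock.down_read("kworker/1:1")   # blocked behind the queued writer
print(first, writer, second, list(lock.waiters))
```

In this model the second reader is refused even though only a read lock is currently held, which matches the behaviour the analysis attributes to the blocked kcopyd worker.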
>
> That's what I believe is causing the deadlock you are experiencing.
>
> I will send a patch fixing this and I will let you know.
>
> Thanks,
> Nikos
>
> > Below are the hung-task backtraces.
> > Sep 24 12:08:48.974658 err CFPU-1 kernel: [  279.991760] INFO: task
> > kworker/1:1:170 blocked for more than 120 seconds.
> > Sep 24 12:08:48.974658 err CFPU-1 kernel: [  279.998569]
> > Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
> > Sep 24 12:08:48.974658 err CFPU-1 kernel: [  280.006593] "echo 0 >
> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > Sep 24 12:08:48.974658 info CFPU-1 kernel: [  280.014435] kworker/1:1
> >    D ffffffff80e1db78     0   170      2 0x00100000
> > Sep 24 12:08:48.974658 info CFPU-1 kernel: [  280.014482] Workqueue:
> > kcopyd do_work [dm_mod]
> > Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487] Stack :
> > 0000000000000000 0000000000000001 0003000300000000 80000007fde8bac8
> > Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
> > 80000007fe759b00 0000000000000002 ffffffffc0285294 80000007f8d1ca00
> > Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
> > ffffffffc027eda8 0000000000000001 ffffffff80b30000 0000000000000100
> > Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
> > 8000000784c098c8 ffffffff80e1db78 80000007fe759b00 ffffffff80e204b8
> > Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
> > 8000000788ef79c0 800000078505ba70 80000007fe759b00 00000001852b4620
> > Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
> > ffffffffc0280000 80000007852b4620 80000007eebf5758 ffffffffc027edec
> > Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
> > 0000000000000000 80000007852b4620 80000007835d8e80 ffffffffc027f38c
> > Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
> > 8000000787ac0580 0000000000000001 80000007f8d1ca60 8000000785aeb080
> > Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
> > 0000000000000000 0000000000000000 0000000000000200 ffffffffc0282488
> > Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
> > 0000000000000200 80000007f8d1ca00 ffffffffc0280000 ffffffffc027db90
> > Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]   ...
> > Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014558] Call Trace:
> > Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014570]
> > [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
> > Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014580]
> > [<ffffffff80e1db78>] schedule+0x38/0x98
> > Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014590]
> > [<ffffffff80e204b8>] __down_read+0xa8/0xf0
> > Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014609]
> > [<ffffffffc027edec>] do_origin.isra.13+0x44/0x110 [dm_snapshot]
> > Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014625]
> > [<ffffffffc027f38c>] pending_complete+0x1ac/0x378 [dm_snapshot]
> > Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014642]
> > [<ffffffffc0282488>] persistent_commit_exception+0x140/0x218
> > [dm_snapshot]
> > Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014659]
> > [<ffffffffc027db90>] copy_callback+0x108/0x1a0 [dm_snapshot]
> > Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014688]
> > [<ffffffffc01eb6a4>] run_complete_job+0x8c/0x148 [dm_mod]
> > Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014724]
> > [<ffffffffc01eaae8>] process_jobs+0xc8/0x1e0 [dm_mod]
> > Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014759]
> > [<ffffffffc01eb1e0>] do_work+0xb8/0x110 [dm_mod]
> > Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014783]
> > [<ffffffff808b18e0>] process_one_work+0x190/0x480
> > Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014791]
> > [<ffffffff808b1d18>] worker_thread+0x148/0x580
> > Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014801]
> > [<ffffffff808b813c>] kthread+0xdc/0xf8
> > Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014811]
> > [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
> > Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014817]
> > Sep 24 12:08:48.974658 err CFPU-1 kernel: [  280.015046] INFO: task
> > drbd_r_r1:7772 blocked for more than 120 seconds.
> > Sep 24 12:08:48.974658 err CFPU-1 kernel: [  280.021766]
> > Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
> > Sep 24 12:08:48.974658 err CFPU-1 kernel: [  280.029781] "echo 0 >
> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > Sep 24 12:08:48.982578 info CFPU-1 kernel: [  280.037637] drbd_r_r1
> >    D ffffffff80e1db78     0  7772      2 0x00100002
> > Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648] Stack :
> > 00000000000207e8 ffffffff80b35530 8000000788e77860 8000000788e77860
> > Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
> > 7fffffffffffffff 80000007f8d1cb28 8000000788e0a880 0000000000000002
> > Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
> > ffffffff80e20000 80000007f9306480 00000000000207e8 ffffffff808dfe58
> > Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
> > 80000007eebf5168 ffffffff80e1db78 0000000002411200 ffffffff80e208d8
> > Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
> > 80000007eab5b440 ffffffff8097b25c 00000000000207a0 ffffffff808dfe58
> > Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
> > 80000007f8d1ca00 ffffffff809ce014 80000007f9306000 0000000002011200
> > Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
> > 7fffffffffffffff 80000007f8d1cb28 8000000788e0a880 0000000000000002
> > Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
> > ffffffff80e20000 ffffffff80e1f8d8 80000007f8d1cb30 80000007f8d1cb30
> > Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
> > 8000000788e0a880 00ffffff808db9e8 0000000000000000 80000007f8d1cb28
> > Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
> > 80000007f9306480 ffffffffc02834d0 ffffffff808e0000 ffffffff808dfba4
> > Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]   ...
> > Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037774] Call Trace:
> > Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037797]
> > [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
> > Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037849]
> > [<ffffffff80e1db78>] schedule+0x38/0x98
> > Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037865]
> > [<ffffffff80e208d8>] schedule_timeout+0x240/0x2a0
> > Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037874]
> > [<ffffffff80e1f8d8>] __down+0x90/0xd8
> > Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037884]
> > [<ffffffff808dfba4>] down+0x54/0x70
> > Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037901]
> > [<ffffffffc027da48>] start_copy+0x98/0xd8 [dm_snapshot]
> > Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037934]
> > [<ffffffffc027e7d4>] __origin_write+0x184/0x2c0 [dm_snapshot]
> > Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037968]
> > [<ffffffffc027ee4c>] do_origin.isra.13+0xa4/0x110 [dm_snapshot]
> > Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038016]
> > [<ffffffffc01dc2c0>] __map_bio+0xb0/0x258 [dm_mod]
> > Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038071]
> > [<ffffffffc01deb94>] __split_and_process_bio+0x274/0x488 [dm_mod]
> > Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038126]
> > [<ffffffffc01dee3c>] dm_make_request+0x94/0x128 [dm_mod]
> > Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038174]
> > [<ffffffff80b3347c>] generic_make_request+0x114/0x290
> > Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038238]
> > [<ffffffffc0511090>] drbd_submit_peer_request+0x258/0x618 [drbd]
> > Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038337]
> > [<ffffffffc05121f8>] receive_RSDataReply+0x3b8/0x770 [drbd]
> > Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038426]
> > [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
> > Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038526]
> > [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
> > Sep 24 12:08:48.985184 warn CFPU-1 kernel: [  280.038588]
> > [<ffffffff808b813c>] kthread+0xdc/0xf8
> > Sep 24 12:08:48.985184 warn CFPU-1 kernel: [  280.038604]
> > [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
> > Sep 24 12:08:48.985184 warn CFPU-1 kernel: [  280.038612]
> > Sep 24 12:08:48.985184 err CFPU-1 kernel: [  280.038629] INFO: task
> > drbd_r_r5:7910 blocked for more than 120 seconds.
> > Sep 24 12:08:48.990218 err CFPU-1 kernel: [  280.045351]
> > Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
> > Sep 24 12:08:48.998215 err CFPU-1 kernel: [  280.053372] "echo 0 >
> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > Sep 24 12:08:49.006141 info CFPU-1 kernel: [  280.061248] drbd_r_r5
> >    D ffffffff80e1db78     0  7910      2 0x00100002
> > Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257] Stack :
> > 0000000000000001 ffffffff80b35530 8000000788ef7950 8000000788ef7950
> > Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
> > 8000000787936c00 0000000000000002 ffffffffc0285294 00000000000049ba
> > Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
> > 0000000000000001 00000000000049ba 0000000000000001 ffffffff80b2aa90
> > Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
> > 0000000000000001 ffffffff80e1db78 8000000787936c00 ffffffff80e204b8
> > Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
> > 800000078785b9c0 80000007fde8bb30 8000000787936c00 0000000100000000
> > Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
> > ffffffffc0280000 80000007eff43920 8000000788ca0198 ffffffffc027edec
> > Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
> > 80000007eff43900 c000000001f5b080 80000007eff43920 ffffffffc01dc2c0
> > Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
> > 80000007eb29bd80 0000000000000000 0000000002400000 80000007eff42900
> > Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
> > 00000000000049ba ffffffff80b2abec 8000000788ef7a50 8000000788ef7a50
> > Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
> > ffffffffc01dbad8 c000000001f5b080 c000000001f5b080 80000007eff43900
> > Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]   ...
> > Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061375] Call Trace:
> > Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061392]
> > [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
> > Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061435]
> > [<ffffffff80e1db78>] schedule+0x38/0x98
> > Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061445]
> > [<ffffffff80e204b8>] __down_read+0xa8/0xf0
> > Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061463]
> > [<ffffffffc027edec>] do_origin.isra.13+0x44/0x110 [dm_snapshot]
> > Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061509]
> > [<ffffffffc01dc2c0>] __map_bio+0xb0/0x258 [dm_mod]
> > Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061563]
> > [<ffffffffc01deb94>] __split_and_process_bio+0x274/0x488 [dm_mod]
> > Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061598]
> > [<ffffffffc01dee3c>] dm_make_request+0x94/0x128 [dm_mod]
> > Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061624]
> > [<ffffffff80b3347c>] generic_make_request+0x114/0x290
> > Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061674]
> > [<ffffffffc0511090>] drbd_submit_peer_request+0x258/0x618 [drbd]
> > Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061739]
> > [<ffffffffc0512c78>] receive_Data+0x6c8/0x1368 [drbd]
> > Sep 24 12:08:49.007987 warn CFPU-1 kernel: [  280.061816]
> > [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
> > Sep 24 12:08:49.007987 warn CFPU-1 kernel: [  280.061882]
> > [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
> > Sep 24 12:08:49.007987 warn CFPU-1 kernel: [  280.061922]
> > [<ffffffff808b813c>] kthread+0xdc/0xf8
> > Sep 24 12:08:49.007987 warn CFPU-1 kernel: [  280.061931]
> > [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
> > Sep 24 12:08:49.007987 warn CFPU-1 kernel: [  280.061937]
> > Sep 24 12:08:49.007987 err CFPU-1 kernel: [  280.061948] INFO: task
> > drbd_r_r6:7991 blocked for more than 120 seconds.
> > Sep 24 12:08:49.013584 err CFPU-1 kernel: [  280.068669]
> > Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
> > Sep 24 12:08:49.021541 err CFPU-1 kernel: [  280.076683] "echo 0 >
> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > Sep 24 12:08:49.029370 info CFPU-1 kernel: [  280.084523] drbd_r_r6
> >    D ffffffff80e1db78     0  7991      2 0x00100002
> > Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532] Stack :
> > 0000000000000001 ffffffff80b35530 800000078785b950 800000078785b950
> > Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
> > 8000000787b79b00 0000000000000002 ffffffffc0285294 000000000000049a
> > Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
> > 0000000000000001 000000000000049a 0000000000000001 ffffffff80b2aa90
> > Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
> > 0000000000000001 ffffffff80e1db78 8000000787b79b00 ffffffff80e204b8
> > Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
> > 800000078465b940 8000000788ef79c0 8000000787b79b00 0000000100000000
> > Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
> > ffffffffc0280000 80000007e90d6c20 80000007878cf2d8 ffffffffc027edec
> > Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
> > 80000007e90d6c00 c000000001fcc080 80000007e90d6c20 ffffffffc01dc2c0
> > Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
> > 80000007f93fd900 0000000000000000 0000000002400000 80000007e90d6a00
> > Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
> > 000000000000049a ffffffff80b2abec 800000078785ba50 800000078785ba50
> > Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
> > ffffffffc01dbad8 c000000001fcc080 c000000001fcc080 80000007e90d6c00
> > Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]   ...
> > Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084610] Call Trace:
> > Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084621]
> > [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
> > Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084631]
> > [<ffffffff80e1db78>] schedule+0x38/0x98
> > Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084641]
> > [<ffffffff80e204b8>] __down_read+0xa8/0xf0
> > Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084659]
> > [<ffffffffc027edec>] do_origin.isra.13+0x44/0x110 [dm_snapshot]
> > Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084696]
> > [<ffffffffc01dc2c0>] __map_bio+0xb0/0x258 [dm_mod]
> > Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084729]
> > [<ffffffffc01deb94>] __split_and_process_bio+0x274/0x488 [dm_mod]
> > Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084777]
> > [<ffffffffc01dee3c>] dm_make_request+0x94/0x128 [dm_mod]
> > Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.084810]
> > [<ffffffff80b3347c>] generic_make_request+0x114/0x290
> > Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.084863]
> > [<ffffffffc0511090>] drbd_submit_peer_request+0x258/0x618 [drbd]
> > Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.084928]
> > [<ffffffffc0512c78>] receive_Data+0x6c8/0x1368 [drbd]
> > Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.084992]
> > [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
> > Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.085057]
> > [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
> > Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.085097]
> > [<ffffffff808b813c>] kthread+0xdc/0xf8
> > Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.085106]
> > [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
> > Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.085112]
> > Sep 24 12:08:49.031000 err CFPU-1 kernel: [  280.085121] INFO: task
> > drbd_r_r7:8046 blocked for more than 120 seconds.
> > Sep 24 12:08:49.036834 err CFPU-1 kernel: [  280.091854]
> > Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
> > Sep 24 12:08:49.044747 err CFPU-1 kernel: [  280.099876] "echo 0 >
> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > Sep 24 12:08:49.056612 info CFPU-1 kernel: [  280.107739] drbd_r_r7
> >    D ffffffff80e1db78     0  8046      2 0x00100000
> > Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749] Stack :
> > 0000000000000000 0000000000000010 ffffffff811b3a80 0000000000000000
> > Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
> > 7fffffffffffffff 7fffffffffffffff 0000000000000000 0000000000000002
> > Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
> > 8000000788fcfa98 0000000000000000 ffffffffc053baa8 ffffffff80be4360
> > Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
> > ffffffffc0550000 ffffffff80e1db78 ffffffff81070660 ffffffff80e208d8
> > Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
> > 8000000797a43600 ffffffff808c4878 80000007ff2f4100 0000000000000000
> > Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
> > 80000007f93fdb40 0000000000000000 80000007ff007000 ffffffff81070660
> > Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
> > 8000000788fcfa90 7fffffffffffffff 0000000000000000 0000000000000002
> > Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
> > 8000000788fcfa98 ffffffff80e1e8a0 0000000100000000 8000000797a42880
> > Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
> > ffffffff808c4b60 8000000788fcfaa0 8000000788fcfaa0 ffffffffc053baa8
> > Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
> > ffffffff80be4360 ffffffff81cb4540 ffffffff81cb0000 80000007f93fdb40
> > Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]   ...
> > Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107821] Call Trace:
> > Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107833]
> > [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
> > Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107842]
> > [<ffffffff80e1db78>] schedule+0x38/0x98
> > Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107850]
> > [<ffffffff80e208d8>] schedule_timeout+0x240/0x2a0
> > Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107859]
> > [<ffffffff80e1e8a0>] wait_for_common+0xd8/0x1b0
> > Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107873]
> > [<ffffffff808acc20>] call_usermodehelper_exec+0x110/0x188
> > Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107932]
> > [<ffffffffc053da3c>] drbd_khelper+0x1c4/0x310 [drbd]
> > Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107998]
> > [<ffffffffc0500a14>] drbd_start_resync+0x534/0x968 [drbd]
> > Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.108060]
> > [<ffffffffc05037dc>] receive_sync_uuid+0x2d4/0x5a0 [drbd]
> > Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.108124]
> > [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
> > Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.108189]
> > [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
> > Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.108229]
> > [<ffffffff808b813c>] kthread+0xdc/0xf8
> > Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.108239]
> > [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
> > Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.108245]
> > Sep 24 12:08:49.056612 err CFPU-1 kernel: [  280.108471] INFO: task
> > drbd_r_r8:8120 blocked for more than 120 seconds.
> > Sep 24 12:08:49.060085 err CFPU-1 kernel: [  280.115194]
> > Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
> > Sep 24 12:08:49.075924 err CFPU-1 kernel: [  280.123218] "echo 0 >
> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > Sep 24 12:08:49.075924 info CFPU-1 kernel: [  280.131070] drbd_r_r8
> >    D ffffffff80e1db78     0  8120      2 0x00100002
> > Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083] Stack :
> > 0000000000000000 0000000000000010 ffffffff811b3a80 0000000000000000
> > Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
> > 7fffffffffffffff 7fffffffffffffff 0000000000000000 0000000000000002
> > Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
> > 80000007851d3a98 0000000000000000 ffffffffc053baa8 ffffffff80be4360
> > Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
> > ffffffffc0550000 ffffffff80e1db78 ffffffff81070660 ffffffff80e208d8
> > Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
> > 80000007f96f6c00 ffffffff808c4878 80000007ff2f4100 0000000000000000
> > Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
> > 80000007ee99f360 0000000000000000 80000007ff007000 ffffffff81070660
> > Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
> > 80000007851d3a90 7fffffffffffffff 0000000000000000 0000000000000002
> > Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
> > 80000007851d3a98 ffffffff80e1e8a0 0000000100000000 8000000784ba3600
> > Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
> > ffffffff808c4b60 80000007851d3aa0 80000007851d3aa0 ffffffffc053baa8
> > Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
> > ffffffff80be4360 ffffffff81cb4540 ffffffff81cb0000 80000007ee99f360
> > Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]   ...
> > Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131162] Call Trace:
> > Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131174]
> > [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
> > Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131183]
> > [<ffffffff80e1db78>] schedule+0x38/0x98
> > Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131195]
> > [<ffffffff80e208d8>] schedule_timeout+0x240/0x2a0
> > Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131204]
> > [<ffffffff80e1e8a0>] wait_for_common+0xd8/0x1b0
> > Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131216]
> > [<ffffffff808acc20>] call_usermodehelper_exec+0x110/0x188
> > Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131274]
> > [<ffffffffc053da3c>] drbd_khelper+0x1c4/0x310 [drbd]
> > Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131341]
> > [<ffffffffc0500a14>] drbd_start_resync+0x534/0x968 [drbd]
> > Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131403]
> > [<ffffffffc05037dc>] receive_sync_uuid+0x2d4/0x5a0 [drbd]
> > Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131482]
> > [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
> > Sep 24 12:08:49.077857 warn CFPU-1 kernel: [  280.131557]
> > [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
> > Sep 24 12:08:49.077857 warn CFPU-1 kernel: [  280.131597]
> > [<ffffffff808b813c>] kthread+0xdc/0xf8
> > Sep 24 12:08:49.077857 warn CFPU-1 kernel: [  280.131606]
> > [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
> > Sep 24 12:08:49.077857 warn CFPU-1 kernel: [  280.131612]
> > Sep 24 12:08:49.077857 err CFPU-1 kernel: [  280.131641] INFO: task
> > drbd_r_r9:8173 blocked for more than 120 seconds.
> > Sep 24 12:08:49.091245 err CFPU-1 kernel: [  280.138358]
> > Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
> > Sep 24 12:08:49.091245 err CFPU-1 kernel: [  280.146373] "echo 0 >
> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > Sep 24 12:08:49.099067 info CFPU-1 kernel: [  280.154219] drbd_r_r9
> >    D ffffffff80e1db78     0  8173      2 0x00100002
> > Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229] Stack :
> > 0000000000000000 0000000000000010 ffffffff811b3a80 0000000000000000
> > Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
> > 7fffffffffffffff 7fffffffffffffff 0000000000000000 0000000000000002
> > Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
> > 8000000783c9fa98 0000000000000000 ffffffffc053baa8 ffffffff80be4360
> > Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
> > ffffffffc0550000 ffffffff80e1db78 ffffffff81070660 ffffffff80e208d8
> > Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
> > 8000000797a45100 ffffffff808c4878 80000007ff2f4100 0000000000000000
> > Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
> > 8000000788ca9d20 0000000000000000 80000007ff007000 ffffffff81070660
> > Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
> > 8000000783c9fa90 7fffffffffffffff 0000000000000000 0000000000000002
> > Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
> > 8000000783c9fa98 ffffffff80e1e8a0 0000000100000000 80000007843d4380
> > Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
> > ffffffff808c4b60 8000000783c9faa0 8000000783c9faa0 ffffffffc053baa8
> > Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
> > ffffffff80be4360 ffffffff81cb4540 ffffffff81cb0000 8000000788ca9d20
> > Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]   ...
> > Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154305] Call Trace:
> > Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154315]
> > [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
> > Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154325]
> > [<ffffffff80e1db78>] schedule+0x38/0x98
> > Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154333]
> > [<ffffffff80e208d8>] schedule_timeout+0x240/0x2a0
> > Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154342]
> > [<ffffffff80e1e8a0>] wait_for_common+0xd8/0x1b0
> > Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154354]
> > [<ffffffff808acc20>] call_usermodehelper_exec+0x110/0x188
> > Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154410]
> > [<ffffffffc053da3c>] drbd_khelper+0x1c4/0x310 [drbd]
> > Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154475]
> > [<ffffffffc0500a14>] drbd_start_resync+0x534/0x968 [drbd]
> > Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154540]
> > [<ffffffffc05037dc>] receive_sync_uuid+0x2d4/0x5a0 [drbd]
> > Sep 24 12:08:49.100714 warn CFPU-1 kernel: [  280.154612]
> > [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
> > Sep 24 12:08:49.100714 warn CFPU-1 kernel: [  280.154678]
> > [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
> > Sep 24 12:08:49.100714 warn CFPU-1 kernel: [  280.154717]
> > [<ffffffff808b813c>] kthread+0xdc/0xf8
> > Sep 24 12:08:49.100714 warn CFPU-1 kernel: [  280.154726]
> > [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
> > Sep 24 12:08:49.100714 warn CFPU-1 kernel: [  280.154732]
> > Sep 24 12:08:49.100714 err CFPU-1 kernel: [  280.154742] INFO: task
> > drbd_r_r10:8254 blocked for more than 120 seconds.
> > Sep 24 12:08:49.106539 err CFPU-1 kernel: [  280.161554]
> > Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
> > Sep 24 12:08:49.114430 err CFPU-1 kernel: [  280.169569] "echo 0 >
> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > Sep 24 12:08:49.129749 info CFPU-1 kernel: [  280.177434] drbd_r_r10
> >    D ffffffff80e1db78     0  8254      2 0x00100002
> > Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445] Stack :
> > 0000000000000000 0000000000000010 ffffffff811b3a80 0000000000000000
> > Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
> > 7fffffffffffffff 7fffffffffffffff 0000000000000000 0000000000000002
> > Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
> > 80000007846efa98 0000000000000000 ffffffffc053baa8 ffffffff80be4360
> > Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
> > ffffffffc0550000 ffffffff80e1db78 ffffffff81070660 ffffffff80e208d8
> > Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
> > 8000000784893600 ffffffff808c4878 80000007ff2f4100 0000000000000000
> > Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
> > 80000007e9498180 0000000000000000 80000007ff007000 ffffffff81070660
> > Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
> > 80000007846efa90 7fffffffffffffff 0000000000000000 0000000000000002
> > Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
> > 80000007846efa98 ffffffff80e1e8a0 0000000100000000 8000000783cf8d80
> > Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
> > ffffffff808c4b60 80000007846efaa0 80000007846efaa0 ffffffffc053baa8
> > Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
> > ffffffff80be4360 ffffffff81cb4540 ffffffff81cb0000 80000007e9498180
> > Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]   ...
> > Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177517] Call Trace:
> > Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177529]
> > [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
> > Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177539]
> > [<ffffffff80e1db78>] schedule+0x38/0x98
> > Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177547]
> > [<ffffffff80e208d8>] schedule_timeout+0x240/0x2a0
> > Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177556]
> > [<ffffffff80e1e8a0>] wait_for_common+0xd8/0x1b0
> > Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177569]
> > [<ffffffff808acc20>] call_usermodehelper_exec+0x110/0x188
> > Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177631]
> > [<ffffffffc053da3c>] drbd_khelper+0x1c4/0x310 [drbd]
> > Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177697]
> > [<ffffffffc0500a14>] drbd_start_resync+0x534/0x968 [drbd]
> > Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177760]
> > [<ffffffffc05037dc>] receive_sync_uuid+0x2d4/0x5a0 [drbd]
> > Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177823]
> > [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
> > Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177888]
> > [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
> > Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177928]
> > [<ffffffff808b813c>] kthread+0xdc/0xf8
> > Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177938]
> > [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
> > Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177944]
> > Sep 24 12:08:49.129749 err CFPU-1 kernel: [  280.177960] INFO: task
> > drbd_r_r11:8328 blocked for more than 120 seconds.
> > Sep 24 12:08:49.129749 err CFPU-1 kernel: [  280.184775]
> > Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
> > Sep 24 12:08:49.137674 err CFPU-1 kernel: [  280.192793] "echo 0 >
> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > Sep 24 12:08:49.152797 info CFPU-1 kernel: [  280.200644] drbd_r_r11
> >    D ffffffff80e1db78     0  8328      2 0x00100002
> > Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654] Stack :
> > 0000000000000000 0000000000000010 ffffffff811b3a80 0000000000000000
> > Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
> > 7fffffffffffffff 7fffffffffffffff 0000000000000000 0000000000000002
> > Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
> > 8000000783c5fa98 0000000000000000 ffffffffc053baa8 ffffffff80be4360
> > Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
> > ffffffffc0550000 ffffffff80e1db78 ffffffff81070660 ffffffff80e208d8
> > Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
> > 8000000787935100 ffffffff808c4878 80000007ff2f4100 0000000000000000
> > Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
> > 80000007eebd25a0 0000000000000000 80000007ff007000 ffffffff81070660
> > Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
> > 8000000783c5fa90 7fffffffffffffff 0000000000000000 0000000000000002
> > Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
> > 8000000783c5fa98 ffffffff80e1e8a0 0000000100000000 800000078371de80
> > Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
> > ffffffff808c4b60 8000000783c5faa0 8000000783c5faa0 ffffffffc053baa8
> > Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
> > ffffffff80be4360 ffffffff81cb4540 ffffffff81cb0000 80000007eebd25a0
> > Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]   ...
> > Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200726] Call Trace:
> > Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200739]
> > [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
> > Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200749]
> > [<ffffffff80e1db78>] schedule+0x38/0x98
> > Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200757]
> > [<ffffffff80e208d8>] schedule_timeout+0x240/0x2a0
> > Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200766]
> > [<ffffffff80e1e8a0>] wait_for_common+0xd8/0x1b0
> > Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200778]
> > [<ffffffff808acc20>] call_usermodehelper_exec+0x110/0x188
> > Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200840]
> > [<ffffffffc053da3c>] drbd_khelper+0x1c4/0x310 [drbd]
> > Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200906]
> > [<ffffffffc0500a14>] drbd_start_resync+0x534/0x968 [drbd]
> > Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200969]
> > [<ffffffffc05037dc>] receive_sync_uuid+0x2d4/0x5a0 [drbd]
> > Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.201032]
> > [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
> > Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.201097]
> > [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
> > Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.201137]
> > [<ffffffff808b813c>] kthread+0xdc/0xf8
> > Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.201147]
> > [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
> > Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.201153]
> > Sep 24 12:08:49.152797 err CFPU-1 kernel: [  280.201183] INFO: task
> > lvcreate:8585 blocked for more than 120 seconds.
> > Sep 24 12:08:49.152797 err CFPU-1 kernel: [  280.207823]
> > Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
> > Sep 24 12:08:49.160715 err CFPU-1 kernel: [  280.215860] "echo 0 >
> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > Sep 24 12:08:49.168570 info CFPU-1 kernel: [  280.223704] lvcreate
> >    D ffffffff80e1db78     0  8585   8582 0x00100000
> > Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719] Stack :
> > ffffffff809ce420 ffffffffc027e170 80000007ea05c720 ffffffff809ce014
> > Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
> > ffffffffc0285294 ffffffffc0285290 8000000797bdb600 0000000000000002
> > Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
> > ffffffffc02834d0 ffffffffc0280000 ffffffff809ce420 ffffffffc027e170
> > Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
> > 80000007f9275d80 ffffffff80e1db78 ffffffffc0285294 ffffffff80e20584
> > Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
> > 80000007fde8bb30 ffffffffc0285298 8000000797bdb600 0000000000000000
> > Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
> > c00000000205a080 0000000000000000 c00000000205a080 80000007ea05c600
> > Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
> > 80000007fec10300 ffffffffc027fd20 8000000780cb0150 ffffffffc01e2e9c
> > Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
> > 00000008c0284b30 00000002c0284b30 80000007ea05c620 ffffffffc0280000
> > Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
> > c00000000205a080 80000007ea05de00 0000000000000000 8000000780cb0150
> > Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
> > 8000000780cb0160 0000000001d4c000 0000000000000000 8000000780cb0000
> > Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]   ...
> > Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223796] Call Trace:
> > Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223805]
> > [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
> > Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223814]
> > [<ffffffff80e1db78>] schedule+0x38/0x98
> > Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223824]
> > [<ffffffff80e20584>] __down_write_nested+0x84/0xe8
> > Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223843]
> > [<ffffffffc027fd20>] snapshot_ctr+0x4d0/0x868 [dm_snapshot]
> > Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223886]
> > [<ffffffffc01e31d4>] dm_table_add_target+0x164/0x418 [dm_mod]
> > Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223921]
> > [<ffffffffc01e7bdc>] table_load+0x194/0x478 [dm_mod]
> > Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223956]
> > [<ffffffffc01e8e1c>] ctl_ioctl+0x424/0x678 [dm_mod]
> > Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.224006]
> > [<ffffffffc01e90a0>] dm_ctl_ioctl+0x30/0x40 [dm_mod]
> > Sep 24 12:08:49.170177 warn CFPU-1 kernel: [  280.224040]
> > [<ffffffff809f29cc>] do_vfs_ioctl+0x38c/0x5f8
> > Sep 24 12:08:49.170177 warn CFPU-1 kernel: [  280.224049]
> > [<ffffffff809f2c98>] SyS_ioctl+0x60/0xc8
> > Sep 24 12:08:49.170177 warn CFPU-1 kernel: [  280.224059]
> > [<ffffffff80879b70>] syscall_common+0x34/0x58
> > Sep 24 12:08:49.170177 warn CFPU-1 kernel: [  280.224065]
> >



--
Guruswamy Basavaiah

* Re: Fix "dm kcopyd: Fix bug causing workqueue stalls" causes dead lock
  2019-09-29 14:36   ` Guruswamy Basavaiah
@ 2019-10-01 12:12     ` Nikos Tsironis
  2019-10-01 12:27       ` Guruswamy Basavaiah
  0 siblings, 1 reply; 19+ messages in thread
From: Nikos Tsironis @ 2019-10-01 12:12 UTC (permalink / raw)
  To: Guruswamy Basavaiah; +Cc: dm-devel, agk, Mike Snitzer, iliastsi

On 9/29/19 5:36 PM, Guruswamy Basavaiah wrote:
> Hello Nikos,
>  Thanks for pointing out the lvcreate write lock.
> 
> Guru
> 

Hi Guru,

I have sent a fix for this and I have Cc-ed you.

Is this something you are able to consistently reproduce? If so, it
would be great if you could also test the fix.

Thanks,
Nikos

> 
> On Sat, 28 Sep 2019 at 01:03, Nikos Tsironis <ntsironis@arrikto.com> wrote:
>>
>> On 9/27/19 4:19 PM, Guruswamy Basavaiah wrote:
>>> Hello,
>>>  We have drbd partition on top of lvm partition. when node having
>>> secondary drbd partition is coming up, large amount of data will be
>>> synced between primary to secondary drbd partitions.
>>>
>>> During this time, we see the drbd Sync(Resync) stops at some point.
>>> After 120 seconds we see hung-task-timeout warnings in the logs.(see
>>> at the end of this email)
>>>
>>> If i increase the cow_count semaphore value from 2048 to 8192 or
>>> remove the below patch, drbd sync works seamlessly.
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/patch/?id=721b1d98fb517ae99ab3b757021cf81db41e67be
>>>
>>> I am not familiar with dm code, from hung task back traces what i
>>> understand is, when thread is trying to queue work to kcopyd, holding
>>> "&_origins_lock" and blocked on cow_count lock,
>>> jobs from kcopyd is trying to queue work to same kcopyd and blocked on
>>> "&_origins_lock" and dead lock.
>>>
>>
>> Hello Guruswamy,
>>
>> I am Cc-ing the maintainers, so they can be in the loop.
>>
>> I examined the attached logs and I believe the following happens:
>>
>> 1. DRBD issues a number of writes to the snapshot origin device. These
>>    writes cause COW, which is performed by kcopyd.
>>
>> 2. At some point DRBD reaches the cow_count semaphore limit (2048) and
>>    blocks in down(&s->cow_count), holding a read lock on _origins_lock.
>>
>> 3. Someone tries to create a new snapshot. This involves taking a write
>>    lock on _origins_lock, which blocks because DRBD at step (2) already
>>    holds a read lock on it. That's the blocked lvcreate at the end of
>>    the trace.
>>
>> 4. A COW operation, issued by step (1), completes and kcopyd runs
>>    dm-snapshot's completion callback, which tries to take a read lock on
>>    _origins_lock, before signaling the cow_count semaphore. This read
>>    lock blocks, the semaphore is never signaled and we have the deadlock
>>    you experienced.
>>
>> At first glance this seemed strange, because DRBD at step (2) holds a
>> read lock on _origins_lock, so taking another read lock should be
>> possible.
>>
>> But, if I am not missing something, the read-write semaphore
>> implementation gives priority to writers, meaning that as soon as a
>> writer tries to enter the critical section, the lvcreate in our case, no
>> readers will be allowed in until all writers have completed their work.
>>
>> That's what I believe is causing the deadlock you are experiencing.
>>
>> I will send a patch fixing this and I will let you know.
>>
>> Thanks,
>> Nikos
>>
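The four steps above can be replayed in a small, single-threaded toy
model. This is only a sketch and not the dm-snapshot or rwsem code: the
names FairRwSem, CountingSem and simulate are made up for illustration,
the limit is shrunk from 2048 to 2, and the only property assumed of the
read-write semaphore is the one described above, namely that a reader
arriving after a queued writer has to wait behind it.

```python
from collections import deque


class FairRwSem:
    """Toy rwsem: once anyone is queued, later arrivals queue behind them."""

    def __init__(self):
        self.readers = set()    # tasks currently holding the read lock
        self.writer = None      # task currently holding the write lock
        self.waiters = deque()  # (task, mode) in arrival order

    def down_read(self, task):
        # A reader gets in only if there is no writer and nobody is queued.
        if self.writer is None and not self.waiters:
            self.readers.add(task)
            return True
        self.waiters.append((task, "read"))
        return False

    def up_read(self, task):
        # Waking queued waiters is not modeled; the sketch does not need it.
        self.readers.discard(task)

    def down_write(self, task):
        # A writer needs the lock completely free.
        if self.writer is None and not self.readers and not self.waiters:
            self.writer = task
            return True
        self.waiters.append((task, "write"))
        return False


class CountingSem:
    """Toy counting semaphore; down() reports failure instead of sleeping."""

    def __init__(self, count):
        self.count = count

    def down(self):
        if self.count > 0:
            self.count -= 1
            return True
        return False

    def up(self):
        self.count += 1


def simulate(limit=2, snapshot_create=True):
    """Return {task: what it is blocked on} after replaying steps 1-4."""
    origins_lock = FairRwSem()
    cow_count = CountingSem(limit)
    blocked = {}

    # Step 1: earlier origin writes already have `limit` COW jobs in flight.
    for _ in range(limit):
        cow_count.down()

    # Step 2: the next origin write takes the read lock, then blocks in down().
    origins_lock.down_read("drbd")
    if not cow_count.down():
        blocked["drbd"] = "cow_count"

    # Step 3: snapshot creation wants the write lock while a reader holds it.
    if snapshot_create and not origins_lock.down_write("lvcreate"):
        blocked["lvcreate"] = "_origins_lock (write)"

    # Step 4: the completion callback needs the read lock before up().
    if origins_lock.down_read("kcopyd"):
        cow_count.up()
        origins_lock.up_read("kcopyd")
        if cow_count.down():  # the origin write from step 2 can now proceed
            del blocked["drbd"]
    else:
        blocked["kcopyd"] = "_origins_lock (read)"

    return blocked


if __name__ == "__main__":
    print(simulate())
    print(simulate(snapshot_create=False))
```

With snapshot_create=True all three tasks end up blocked on each other,
which matches the kworker, drbd_r_* and lvcreate traces. With
snapshot_create=False nothing stays blocked, because the completion
callback can take the read lock and signal the semaphore.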
>>> Below is the hung task back traces.
>>> Sep 24 12:08:48.974658 err CFPU-1 kernel: [  279.991760] INFO: task
>>> kworker/1:1:170 blocked for more than 120 seconds.
>>> Sep 24 12:08:48.974658 err CFPU-1 kernel: [  279.998569]
>>> Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
>>> Sep 24 12:08:48.974658 err CFPU-1 kernel: [  280.006593] "echo 0 >
>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> Sep 24 12:08:48.974658 info CFPU-1 kernel: [  280.014435] kworker/1:1
>>>    D ffffffff80e1db78     0   170      2 0x00100000
>>> Sep 24 12:08:48.974658 info CFPU-1 kernel: [  280.014482] Workqueue:
>>> kcopyd do_work [dm_mod]
>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487] Stack :
>>> 0000000000000000 0000000000000001 0003000300000000 80000007fde8bac8
>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
>>> 80000007fe759b00 0000000000000002 ffffffffc0285294 80000007f8d1ca00
>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
>>> ffffffffc027eda8 0000000000000001 ffffffff80b30000 0000000000000100
>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
>>> 8000000784c098c8 ffffffff80e1db78 80000007fe759b00 ffffffff80e204b8
>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
>>> 8000000788ef79c0 800000078505ba70 80000007fe759b00 00000001852b4620
>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
>>> ffffffffc0280000 80000007852b4620 80000007eebf5758 ffffffffc027edec
>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
>>> 0000000000000000 80000007852b4620 80000007835d8e80 ffffffffc027f38c
>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
>>> 8000000787ac0580 0000000000000001 80000007f8d1ca60 8000000785aeb080
>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
>>> 0000000000000000 0000000000000000 0000000000000200 ffffffffc0282488
>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
>>> 0000000000000200 80000007f8d1ca00 ffffffffc0280000 ffffffffc027db90
>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]   ...
>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014558] Call Trace:
>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014570]
>>> [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014580]
>>> [<ffffffff80e1db78>] schedule+0x38/0x98
>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014590]
>>> [<ffffffff80e204b8>] __down_read+0xa8/0xf0
>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014609]
>>> [<ffffffffc027edec>] do_origin.isra.13+0x44/0x110 [dm_snapshot]
>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014625]
>>> [<ffffffffc027f38c>] pending_complete+0x1ac/0x378 [dm_snapshot]
>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014642]
>>> [<ffffffffc0282488>] persistent_commit_exception+0x140/0x218
>>> [dm_snapshot]
>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014659]
>>> [<ffffffffc027db90>] copy_callback+0x108/0x1a0 [dm_snapshot]
>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014688]
>>> [<ffffffffc01eb6a4>] run_complete_job+0x8c/0x148 [dm_mod]
>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014724]
>>> [<ffffffffc01eaae8>] process_jobs+0xc8/0x1e0 [dm_mod]
>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014759]
>>> [<ffffffffc01eb1e0>] do_work+0xb8/0x110 [dm_mod]
>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014783]
>>> [<ffffffff808b18e0>] process_one_work+0x190/0x480
>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014791]
>>> [<ffffffff808b1d18>] worker_thread+0x148/0x580
>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014801]
>>> [<ffffffff808b813c>] kthread+0xdc/0xf8
>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014811]
>>> [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014817]
>>> Sep 24 12:08:48.974658 err CFPU-1 kernel: [  280.015046] INFO: task
>>> drbd_r_r1:7772 blocked for more than 120 seconds.
>>> Sep 24 12:08:48.974658 err CFPU-1 kernel: [  280.021766]
>>> Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
>>> Sep 24 12:08:48.974658 err CFPU-1 kernel: [  280.029781] "echo 0 >
>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> Sep 24 12:08:48.982578 info CFPU-1 kernel: [  280.037637] drbd_r_r1
>>>    D ffffffff80e1db78     0  7772      2 0x00100002
>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648] Stack :
>>> 00000000000207e8 ffffffff80b35530 8000000788e77860 8000000788e77860
>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
>>> 7fffffffffffffff 80000007f8d1cb28 8000000788e0a880 0000000000000002
>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
>>> ffffffff80e20000 80000007f9306480 00000000000207e8 ffffffff808dfe58
>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
>>> 80000007eebf5168 ffffffff80e1db78 0000000002411200 ffffffff80e208d8
>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
>>> 80000007eab5b440 ffffffff8097b25c 00000000000207a0 ffffffff808dfe58
>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
>>> 80000007f8d1ca00 ffffffff809ce014 80000007f9306000 0000000002011200
>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
>>> 7fffffffffffffff 80000007f8d1cb28 8000000788e0a880 0000000000000002
>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
>>> ffffffff80e20000 ffffffff80e1f8d8 80000007f8d1cb30 80000007f8d1cb30
>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
>>> 8000000788e0a880 00ffffff808db9e8 0000000000000000 80000007f8d1cb28
>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
>>> 80000007f9306480 ffffffffc02834d0 ffffffff808e0000 ffffffff808dfba4
>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]   ...
>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037774] Call Trace:
>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037797]
>>> [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037849]
>>> [<ffffffff80e1db78>] schedule+0x38/0x98
>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037865]
>>> [<ffffffff80e208d8>] schedule_timeout+0x240/0x2a0
>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037874]
>>> [<ffffffff80e1f8d8>] __down+0x90/0xd8
>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037884]
>>> [<ffffffff808dfba4>] down+0x54/0x70
>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037901]
>>> [<ffffffffc027da48>] start_copy+0x98/0xd8 [dm_snapshot]
>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037934]
>>> [<ffffffffc027e7d4>] __origin_write+0x184/0x2c0 [dm_snapshot]
>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037968]
>>> [<ffffffffc027ee4c>] do_origin.isra.13+0xa4/0x110 [dm_snapshot]
>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038016]
>>> [<ffffffffc01dc2c0>] __map_bio+0xb0/0x258 [dm_mod]
>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038071]
>>> [<ffffffffc01deb94>] __split_and_process_bio+0x274/0x488 [dm_mod]
>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038126]
>>> [<ffffffffc01dee3c>] dm_make_request+0x94/0x128 [dm_mod]
>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038174]
>>> [<ffffffff80b3347c>] generic_make_request+0x114/0x290
>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038238]
>>> [<ffffffffc0511090>] drbd_submit_peer_request+0x258/0x618 [drbd]
>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038337]
>>> [<ffffffffc05121f8>] receive_RSDataReply+0x3b8/0x770 [drbd]
>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038426]
>>> [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038526]
>>> [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
>>> Sep 24 12:08:48.985184 warn CFPU-1 kernel: [  280.038588]
>>> [<ffffffff808b813c>] kthread+0xdc/0xf8
>>> Sep 24 12:08:48.985184 warn CFPU-1 kernel: [  280.038604]
>>> [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
>>> Sep 24 12:08:48.985184 warn CFPU-1 kernel: [  280.038612]
>>> Sep 24 12:08:48.985184 err CFPU-1 kernel: [  280.038629] INFO: task
>>> drbd_r_r5:7910 blocked for more than 120 seconds.
>>> Sep 24 12:08:48.990218 err CFPU-1 kernel: [  280.045351]
>>> Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
>>> Sep 24 12:08:48.998215 err CFPU-1 kernel: [  280.053372] "echo 0 >
>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> Sep 24 12:08:49.006141 info CFPU-1 kernel: [  280.061248] drbd_r_r5
>>>    D ffffffff80e1db78     0  7910      2 0x00100002
>>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257] Stack :
>>> 0000000000000001 ffffffff80b35530 8000000788ef7950 8000000788ef7950
>>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
>>> 8000000787936c00 0000000000000002 ffffffffc0285294 00000000000049ba
>>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
>>> 0000000000000001 00000000000049ba 0000000000000001 ffffffff80b2aa90
>>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
>>> 0000000000000001 ffffffff80e1db78 8000000787936c00 ffffffff80e204b8
>>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
>>> 800000078785b9c0 80000007fde8bb30 8000000787936c00 0000000100000000
>>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
>>> ffffffffc0280000 80000007eff43920 8000000788ca0198 ffffffffc027edec
>>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
>>> 80000007eff43900 c000000001f5b080 80000007eff43920 ffffffffc01dc2c0
>>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
>>> 80000007eb29bd80 0000000000000000 0000000002400000 80000007eff42900
>>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
>>> 00000000000049ba ffffffff80b2abec 8000000788ef7a50 8000000788ef7a50
>>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
>>> ffffffffc01dbad8 c000000001f5b080 c000000001f5b080 80000007eff43900
>>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]   ...
>>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061375] Call Trace:
>>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061392]
>>> [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
>>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061435]
>>> [<ffffffff80e1db78>] schedule+0x38/0x98
>>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061445]
>>> [<ffffffff80e204b8>] __down_read+0xa8/0xf0
>>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061463]
>>> [<ffffffffc027edec>] do_origin.isra.13+0x44/0x110 [dm_snapshot]
>>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061509]
>>> [<ffffffffc01dc2c0>] __map_bio+0xb0/0x258 [dm_mod]
>>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061563]
>>> [<ffffffffc01deb94>] __split_and_process_bio+0x274/0x488 [dm_mod]
>>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061598]
>>> [<ffffffffc01dee3c>] dm_make_request+0x94/0x128 [dm_mod]
>>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061624]
>>> [<ffffffff80b3347c>] generic_make_request+0x114/0x290
>>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061674]
>>> [<ffffffffc0511090>] drbd_submit_peer_request+0x258/0x618 [drbd]
>>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061739]
>>> [<ffffffffc0512c78>] receive_Data+0x6c8/0x1368 [drbd]
>>> Sep 24 12:08:49.007987 warn CFPU-1 kernel: [  280.061816]
>>> [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
>>> Sep 24 12:08:49.007987 warn CFPU-1 kernel: [  280.061882]
>>> [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
>>> Sep 24 12:08:49.007987 warn CFPU-1 kernel: [  280.061922]
>>> [<ffffffff808b813c>] kthread+0xdc/0xf8
>>> Sep 24 12:08:49.007987 warn CFPU-1 kernel: [  280.061931]
>>> [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
>>> Sep 24 12:08:49.007987 warn CFPU-1 kernel: [  280.061937]
>>> Sep 24 12:08:49.007987 err CFPU-1 kernel: [  280.061948] INFO: task
>>> drbd_r_r6:7991 blocked for more than 120 seconds.
>>> Sep 24 12:08:49.013584 err CFPU-1 kernel: [  280.068669]
>>> Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
>>> Sep 24 12:08:49.021541 err CFPU-1 kernel: [  280.076683] "echo 0 >
>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> Sep 24 12:08:49.029370 info CFPU-1 kernel: [  280.084523] drbd_r_r6
>>>    D ffffffff80e1db78     0  7991      2 0x00100002
>>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532] Stack :
>>> 0000000000000001 ffffffff80b35530 800000078785b950 800000078785b950
>>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
>>> 8000000787b79b00 0000000000000002 ffffffffc0285294 000000000000049a
>>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
>>> 0000000000000001 000000000000049a 0000000000000001 ffffffff80b2aa90
>>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
>>> 0000000000000001 ffffffff80e1db78 8000000787b79b00 ffffffff80e204b8
>>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
>>> 800000078465b940 8000000788ef79c0 8000000787b79b00 0000000100000000
>>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
>>> ffffffffc0280000 80000007e90d6c20 80000007878cf2d8 ffffffffc027edec
>>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
>>> 80000007e90d6c00 c000000001fcc080 80000007e90d6c20 ffffffffc01dc2c0
>>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
>>> 80000007f93fd900 0000000000000000 0000000002400000 80000007e90d6a00
>>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
>>> 000000000000049a ffffffff80b2abec 800000078785ba50 800000078785ba50
>>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
>>> ffffffffc01dbad8 c000000001fcc080 c000000001fcc080 80000007e90d6c00
>>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]   ...
>>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084610] Call Trace:
>>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084621]
>>> [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
>>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084631]
>>> [<ffffffff80e1db78>] schedule+0x38/0x98
>>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084641]
>>> [<ffffffff80e204b8>] __down_read+0xa8/0xf0
>>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084659]
>>> [<ffffffffc027edec>] do_origin.isra.13+0x44/0x110 [dm_snapshot]
>>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084696]
>>> [<ffffffffc01dc2c0>] __map_bio+0xb0/0x258 [dm_mod]
>>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084729]
>>> [<ffffffffc01deb94>] __split_and_process_bio+0x274/0x488 [dm_mod]
>>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084777]
>>> [<ffffffffc01dee3c>] dm_make_request+0x94/0x128 [dm_mod]
>>> Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.084810]
>>> [<ffffffff80b3347c>] generic_make_request+0x114/0x290
>>> Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.084863]
>>> [<ffffffffc0511090>] drbd_submit_peer_request+0x258/0x618 [drbd]
>>> Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.084928]
>>> [<ffffffffc0512c78>] receive_Data+0x6c8/0x1368 [drbd]
>>> Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.084992]
>>> [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
>>> Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.085057]
>>> [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
>>> Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.085097]
>>> [<ffffffff808b813c>] kthread+0xdc/0xf8
>>> Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.085106]
>>> [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
>>> Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.085112]
>>> Sep 24 12:08:49.031000 err CFPU-1 kernel: [  280.085121] INFO: task
>>> drbd_r_r7:8046 blocked for more than 120 seconds.
>>> Sep 24 12:08:49.036834 err CFPU-1 kernel: [  280.091854]
>>> Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
>>> Sep 24 12:08:49.044747 err CFPU-1 kernel: [  280.099876] "echo 0 >
>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> Sep 24 12:08:49.056612 info CFPU-1 kernel: [  280.107739] drbd_r_r7
>>>    D ffffffff80e1db78     0  8046      2 0x00100000
>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749] Stack :
>>> 0000000000000000 0000000000000010 ffffffff811b3a80 0000000000000000
>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
>>> 7fffffffffffffff 7fffffffffffffff 0000000000000000 0000000000000002
>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
>>> 8000000788fcfa98 0000000000000000 ffffffffc053baa8 ffffffff80be4360
>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
>>> ffffffffc0550000 ffffffff80e1db78 ffffffff81070660 ffffffff80e208d8
>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
>>> 8000000797a43600 ffffffff808c4878 80000007ff2f4100 0000000000000000
>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
>>> 80000007f93fdb40 0000000000000000 80000007ff007000 ffffffff81070660
>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
>>> 8000000788fcfa90 7fffffffffffffff 0000000000000000 0000000000000002
>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
>>> 8000000788fcfa98 ffffffff80e1e8a0 0000000100000000 8000000797a42880
>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
>>> ffffffff808c4b60 8000000788fcfaa0 8000000788fcfaa0 ffffffffc053baa8
>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
>>> ffffffff80be4360 ffffffff81cb4540 ffffffff81cb0000 80000007f93fdb40
>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]   ...
>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107821] Call Trace:
>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107833]
>>> [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107842]
>>> [<ffffffff80e1db78>] schedule+0x38/0x98
>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107850]
>>> [<ffffffff80e208d8>] schedule_timeout+0x240/0x2a0
>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107859]
>>> [<ffffffff80e1e8a0>] wait_for_common+0xd8/0x1b0
>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107873]
>>> [<ffffffff808acc20>] call_usermodehelper_exec+0x110/0x188
>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107932]
>>> [<ffffffffc053da3c>] drbd_khelper+0x1c4/0x310 [drbd]
>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107998]
>>> [<ffffffffc0500a14>] drbd_start_resync+0x534/0x968 [drbd]
>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.108060]
>>> [<ffffffffc05037dc>] receive_sync_uuid+0x2d4/0x5a0 [drbd]
>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.108124]
>>> [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.108189]
>>> [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.108229]
>>> [<ffffffff808b813c>] kthread+0xdc/0xf8
>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.108239]
>>> [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.108245]
>>> Sep 24 12:08:49.056612 err CFPU-1 kernel: [  280.108471] INFO: task
>>> drbd_r_r8:8120 blocked for more than 120 seconds.
>>> Sep 24 12:08:49.060085 err CFPU-1 kernel: [  280.115194]
>>> Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
>>> Sep 24 12:08:49.075924 err CFPU-1 kernel: [  280.123218] "echo 0 >
>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> Sep 24 12:08:49.075924 info CFPU-1 kernel: [  280.131070] drbd_r_r8
>>>    D ffffffff80e1db78     0  8120      2 0x00100002
>>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083] Stack :
>>> 0000000000000000 0000000000000010 ffffffff811b3a80 0000000000000000
>>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
>>> 7fffffffffffffff 7fffffffffffffff 0000000000000000 0000000000000002
>>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
>>> 80000007851d3a98 0000000000000000 ffffffffc053baa8 ffffffff80be4360
>>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
>>> ffffffffc0550000 ffffffff80e1db78 ffffffff81070660 ffffffff80e208d8
>>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
>>> 80000007f96f6c00 ffffffff808c4878 80000007ff2f4100 0000000000000000
>>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
>>> 80000007ee99f360 0000000000000000 80000007ff007000 ffffffff81070660
>>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
>>> 80000007851d3a90 7fffffffffffffff 0000000000000000 0000000000000002
>>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
>>> 80000007851d3a98 ffffffff80e1e8a0 0000000100000000 8000000784ba3600
>>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
>>> ffffffff808c4b60 80000007851d3aa0 80000007851d3aa0 ffffffffc053baa8
>>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
>>> ffffffff80be4360 ffffffff81cb4540 ffffffff81cb0000 80000007ee99f360
>>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]   ...
>>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131162] Call Trace:
>>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131174]
>>> [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
>>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131183]
>>> [<ffffffff80e1db78>] schedule+0x38/0x98
>>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131195]
>>> [<ffffffff80e208d8>] schedule_timeout+0x240/0x2a0
>>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131204]
>>> [<ffffffff80e1e8a0>] wait_for_common+0xd8/0x1b0
>>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131216]
>>> [<ffffffff808acc20>] call_usermodehelper_exec+0x110/0x188
>>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131274]
>>> [<ffffffffc053da3c>] drbd_khelper+0x1c4/0x310 [drbd]
>>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131341]
>>> [<ffffffffc0500a14>] drbd_start_resync+0x534/0x968 [drbd]
>>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131403]
>>> [<ffffffffc05037dc>] receive_sync_uuid+0x2d4/0x5a0 [drbd]
>>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131482]
>>> [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
>>> Sep 24 12:08:49.077857 warn CFPU-1 kernel: [  280.131557]
>>> [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
>>> Sep 24 12:08:49.077857 warn CFPU-1 kernel: [  280.131597]
>>> [<ffffffff808b813c>] kthread+0xdc/0xf8
>>> Sep 24 12:08:49.077857 warn CFPU-1 kernel: [  280.131606]
>>> [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
>>> Sep 24 12:08:49.077857 warn CFPU-1 kernel: [  280.131612]
>>> Sep 24 12:08:49.077857 err CFPU-1 kernel: [  280.131641] INFO: task
>>> drbd_r_r9:8173 blocked for more than 120 seconds.
>>> Sep 24 12:08:49.091245 err CFPU-1 kernel: [  280.138358]
>>> Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
>>> Sep 24 12:08:49.091245 err CFPU-1 kernel: [  280.146373] "echo 0 >
>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> Sep 24 12:08:49.099067 info CFPU-1 kernel: [  280.154219] drbd_r_r9
>>>    D ffffffff80e1db78     0  8173      2 0x00100002
>>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229] Stack :
>>> 0000000000000000 0000000000000010 ffffffff811b3a80 0000000000000000
>>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
>>> 7fffffffffffffff 7fffffffffffffff 0000000000000000 0000000000000002
>>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
>>> 8000000783c9fa98 0000000000000000 ffffffffc053baa8 ffffffff80be4360
>>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
>>> ffffffffc0550000 ffffffff80e1db78 ffffffff81070660 ffffffff80e208d8
>>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
>>> 8000000797a45100 ffffffff808c4878 80000007ff2f4100 0000000000000000
>>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
>>> 8000000788ca9d20 0000000000000000 80000007ff007000 ffffffff81070660
>>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
>>> 8000000783c9fa90 7fffffffffffffff 0000000000000000 0000000000000002
>>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
>>> 8000000783c9fa98 ffffffff80e1e8a0 0000000100000000 80000007843d4380
>>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
>>> ffffffff808c4b60 8000000783c9faa0 8000000783c9faa0 ffffffffc053baa8
>>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
>>> ffffffff80be4360 ffffffff81cb4540 ffffffff81cb0000 8000000788ca9d20
>>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]   ...
>>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154305] Call Trace:
>>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154315]
>>> [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
>>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154325]
>>> [<ffffffff80e1db78>] schedule+0x38/0x98
>>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154333]
>>> [<ffffffff80e208d8>] schedule_timeout+0x240/0x2a0
>>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154342]
>>> [<ffffffff80e1e8a0>] wait_for_common+0xd8/0x1b0
>>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154354]
>>> [<ffffffff808acc20>] call_usermodehelper_exec+0x110/0x188
>>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154410]
>>> [<ffffffffc053da3c>] drbd_khelper+0x1c4/0x310 [drbd]
>>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154475]
>>> [<ffffffffc0500a14>] drbd_start_resync+0x534/0x968 [drbd]
>>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154540]
>>> [<ffffffffc05037dc>] receive_sync_uuid+0x2d4/0x5a0 [drbd]
>>> Sep 24 12:08:49.100714 warn CFPU-1 kernel: [  280.154612]
>>> [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
>>> Sep 24 12:08:49.100714 warn CFPU-1 kernel: [  280.154678]
>>> [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
>>> Sep 24 12:08:49.100714 warn CFPU-1 kernel: [  280.154717]
>>> [<ffffffff808b813c>] kthread+0xdc/0xf8
>>> Sep 24 12:08:49.100714 warn CFPU-1 kernel: [  280.154726]
>>> [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
>>> Sep 24 12:08:49.100714 warn CFPU-1 kernel: [  280.154732]
>>> Sep 24 12:08:49.100714 err CFPU-1 kernel: [  280.154742] INFO: task
>>> drbd_r_r10:8254 blocked for more than 120 seconds.
>>> Sep 24 12:08:49.106539 err CFPU-1 kernel: [  280.161554]
>>> Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
>>> Sep 24 12:08:49.114430 err CFPU-1 kernel: [  280.169569] "echo 0 >
>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> Sep 24 12:08:49.129749 info CFPU-1 kernel: [  280.177434] drbd_r_r10
>>>    D ffffffff80e1db78     0  8254      2 0x00100002
>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445] Stack :
>>> 0000000000000000 0000000000000010 ffffffff811b3a80 0000000000000000
>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
>>> 7fffffffffffffff 7fffffffffffffff 0000000000000000 0000000000000002
>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
>>> 80000007846efa98 0000000000000000 ffffffffc053baa8 ffffffff80be4360
>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
>>> ffffffffc0550000 ffffffff80e1db78 ffffffff81070660 ffffffff80e208d8
>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
>>> 8000000784893600 ffffffff808c4878 80000007ff2f4100 0000000000000000
>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
>>> 80000007e9498180 0000000000000000 80000007ff007000 ffffffff81070660
>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
>>> 80000007846efa90 7fffffffffffffff 0000000000000000 0000000000000002
>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
>>> 80000007846efa98 ffffffff80e1e8a0 0000000100000000 8000000783cf8d80
>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
>>> ffffffff808c4b60 80000007846efaa0 80000007846efaa0 ffffffffc053baa8
>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
>>> ffffffff80be4360 ffffffff81cb4540 ffffffff81cb0000 80000007e9498180
>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]   ...
>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177517] Call Trace:
>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177529]
>>> [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177539]
>>> [<ffffffff80e1db78>] schedule+0x38/0x98
>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177547]
>>> [<ffffffff80e208d8>] schedule_timeout+0x240/0x2a0
>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177556]
>>> [<ffffffff80e1e8a0>] wait_for_common+0xd8/0x1b0
>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177569]
>>> [<ffffffff808acc20>] call_usermodehelper_exec+0x110/0x188
>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177631]
>>> [<ffffffffc053da3c>] drbd_khelper+0x1c4/0x310 [drbd]
>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177697]
>>> [<ffffffffc0500a14>] drbd_start_resync+0x534/0x968 [drbd]
>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177760]
>>> [<ffffffffc05037dc>] receive_sync_uuid+0x2d4/0x5a0 [drbd]
>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177823]
>>> [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177888]
>>> [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177928]
>>> [<ffffffff808b813c>] kthread+0xdc/0xf8
>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177938]
>>> [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177944]
>>> Sep 24 12:08:49.129749 err CFPU-1 kernel: [  280.177960] INFO: task
>>> drbd_r_r11:8328 blocked for more than 120 seconds.
>>> Sep 24 12:08:49.129749 err CFPU-1 kernel: [  280.184775]
>>> Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
>>> Sep 24 12:08:49.137674 err CFPU-1 kernel: [  280.192793] "echo 0 >
>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> Sep 24 12:08:49.152797 info CFPU-1 kernel: [  280.200644] drbd_r_r11
>>>    D ffffffff80e1db78     0  8328      2 0x00100002
>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654] Stack :
>>> 0000000000000000 0000000000000010 ffffffff811b3a80 0000000000000000
>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
>>> 7fffffffffffffff 7fffffffffffffff 0000000000000000 0000000000000002
>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
>>> 8000000783c5fa98 0000000000000000 ffffffffc053baa8 ffffffff80be4360
>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
>>> ffffffffc0550000 ffffffff80e1db78 ffffffff81070660 ffffffff80e208d8
>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
>>> 8000000787935100 ffffffff808c4878 80000007ff2f4100 0000000000000000
>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
>>> 80000007eebd25a0 0000000000000000 80000007ff007000 ffffffff81070660
>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
>>> 8000000783c5fa90 7fffffffffffffff 0000000000000000 0000000000000002
>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
>>> 8000000783c5fa98 ffffffff80e1e8a0 0000000100000000 800000078371de80
>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
>>> ffffffff808c4b60 8000000783c5faa0 8000000783c5faa0 ffffffffc053baa8
>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
>>> ffffffff80be4360 ffffffff81cb4540 ffffffff81cb0000 80000007eebd25a0
>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]   ...
>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200726] Call Trace:
>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200739]
>>> [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200749]
>>> [<ffffffff80e1db78>] schedule+0x38/0x98
>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200757]
>>> [<ffffffff80e208d8>] schedule_timeout+0x240/0x2a0
>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200766]
>>> [<ffffffff80e1e8a0>] wait_for_common+0xd8/0x1b0
>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200778]
>>> [<ffffffff808acc20>] call_usermodehelper_exec+0x110/0x188
>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200840]
>>> [<ffffffffc053da3c>] drbd_khelper+0x1c4/0x310 [drbd]
>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200906]
>>> [<ffffffffc0500a14>] drbd_start_resync+0x534/0x968 [drbd]
>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200969]
>>> [<ffffffffc05037dc>] receive_sync_uuid+0x2d4/0x5a0 [drbd]
>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.201032]
>>> [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.201097]
>>> [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.201137]
>>> [<ffffffff808b813c>] kthread+0xdc/0xf8
>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.201147]
>>> [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.201153]
>>> Sep 24 12:08:49.152797 err CFPU-1 kernel: [  280.201183] INFO: task
>>> lvcreate:8585 blocked for more than 120 seconds.
>>> Sep 24 12:08:49.152797 err CFPU-1 kernel: [  280.207823]
>>> Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
>>> Sep 24 12:08:49.160715 err CFPU-1 kernel: [  280.215860] "echo 0 >
>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> Sep 24 12:08:49.168570 info CFPU-1 kernel: [  280.223704] lvcreate
>>>    D ffffffff80e1db78     0  8585   8582 0x00100000
>>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719] Stack :
>>> ffffffff809ce420 ffffffffc027e170 80000007ea05c720 ffffffff809ce014
>>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
>>> ffffffffc0285294 ffffffffc0285290 8000000797bdb600 0000000000000002
>>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
>>> ffffffffc02834d0 ffffffffc0280000 ffffffff809ce420 ffffffffc027e170
>>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
>>> 80000007f9275d80 ffffffff80e1db78 ffffffffc0285294 ffffffff80e20584
>>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
>>> 80000007fde8bb30 ffffffffc0285298 8000000797bdb600 0000000000000000
>>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
>>> c00000000205a080 0000000000000000 c00000000205a080 80000007ea05c600
>>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
>>> 80000007fec10300 ffffffffc027fd20 8000000780cb0150 ffffffffc01e2e9c
>>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
>>> 00000008c0284b30 00000002c0284b30 80000007ea05c620 ffffffffc0280000
>>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
>>> c00000000205a080 80000007ea05de00 0000000000000000 8000000780cb0150
>>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
>>> 8000000780cb0160 0000000001d4c000 0000000000000000 8000000780cb0000
>>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]   ...
>>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223796] Call Trace:
>>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223805]
>>> [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
>>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223814]
>>> [<ffffffff80e1db78>] schedule+0x38/0x98
>>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223824]
>>> [<ffffffff80e20584>] __down_write_nested+0x84/0xe8
>>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223843]
>>> [<ffffffffc027fd20>] snapshot_ctr+0x4d0/0x868 [dm_snapshot]
>>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223886]
>>> [<ffffffffc01e31d4>] dm_table_add_target+0x164/0x418 [dm_mod]
>>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223921]
>>> [<ffffffffc01e7bdc>] table_load+0x194/0x478 [dm_mod]
>>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223956]
>>> [<ffffffffc01e8e1c>] ctl_ioctl+0x424/0x678 [dm_mod]
>>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.224006]
>>> [<ffffffffc01e90a0>] dm_ctl_ioctl+0x30/0x40 [dm_mod]
>>> Sep 24 12:08:49.170177 warn CFPU-1 kernel: [  280.224040]
>>> [<ffffffff809f29cc>] do_vfs_ioctl+0x38c/0x5f8
>>> Sep 24 12:08:49.170177 warn CFPU-1 kernel: [  280.224049]
>>> [<ffffffff809f2c98>] SyS_ioctl+0x60/0xc8
>>> Sep 24 12:08:49.170177 warn CFPU-1 kernel: [  280.224059]
>>> [<ffffffff80879b70>] syscall_common+0x34/0x58
>>> Sep 24 12:08:49.170177 warn CFPU-1 kernel: [  280.224065]
>>>
> 
> 
> 
> --
> Guruswamy Basavaiah
> 

* Re: Fix "dm kcopyd: Fix bug causing workqueue stalls" causes dead lock
  2019-10-01 12:12     ` Nikos Tsironis
@ 2019-10-01 12:27       ` Guruswamy Basavaiah
  2019-10-01 12:43         ` Nikos Tsironis
  0 siblings, 1 reply; 19+ messages in thread
From: Guruswamy Basavaiah @ 2019-10-01 12:27 UTC (permalink / raw)
  To: Nikos Tsironis; +Cc: dm-devel, agk, Mike Snitzer, iliastsi

Hello Nikos,
 Yes, the issue is consistently reproducible for us in a particular
set-up and test case.
 I will get access to the set-up next week. I will test the fix and let
you know the results before the end of next week.

Guru

On Tue, 1 Oct 2019 at 17:42, Nikos Tsironis <ntsironis@arrikto.com> wrote:
>
> On 9/29/19 5:36 PM, Guruswamy Basavaiah wrote:
> > Hello Nikos,
> >  Thanks for pointing out the lvcreate write lock.
> >
> > Guru
> >
>
> Hi Guru,
>
> I have sent a fix for this and I have Cc-ed you.
>
> Is this something you are able to consistently reproduce? If so, it
> would be great if you could also test the fix.
>
> Thanks,
> Nikos
>
> >
> > On Sat, 28 Sep 2019 at 01:03, Nikos Tsironis <ntsironis@arrikto.com> wrote:
> >>
> >> On 9/27/19 4:19 PM, Guruswamy Basavaiah wrote:
> >>> Hello,
> > >>>  We have a DRBD partition on top of an LVM partition. When the node
> > >>> holding the secondary DRBD partition comes up, a large amount of data
> > >>> is synced from the primary to the secondary DRBD partition.
> > >>>
> > >>> During this time, the DRBD sync (resync) stops at some point.
> > >>> After 120 seconds we see hung-task-timeout warnings in the logs (see
> > >>> the end of this email).
> > >>>
> > >>> If I increase the cow_count semaphore value from 2048 to 8192, or
> > >>> remove the patch below, the DRBD sync works seamlessly.
> >>>
> >>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/patch/?id=721b1d98fb517ae99ab3b757021cf81db41e67be
> >>>
> > >>> I am not familiar with the dm code. From the hung-task back traces,
> > >>> what I understand is this: a thread trying to queue work to kcopyd
> > >>> holds "&_origins_lock" and is blocked on the cow_count lock, while
> > >>> jobs from kcopyd are trying to queue work to the same kcopyd and are
> > >>> blocked on "&_origins_lock", which results in a deadlock.
> >>>
> >>
> >> Hello Guruswamy,
> >>
> >> I am Cc-ing the maintainers, so they can be in the loop.
> >>
> >> I examined the attached logs and I believe the following happens:
> >>
> >> 1. DRBD issues a number of writes to the snapshot origin device. These
> >>    writes cause COW, which is performed by kcopyd.
> >>
> >> 2. At some point DRBD reaches the cow_count semaphore limit (2048) and
> >>    blocks in down(&s->cow_count), holding a read lock on _origins_lock.
> >>
> >> 3. Someone tries to create a new snapshot. This involves taking a write
> >>    lock on _origins_lock, which blocks because DRBD at step (2) already
> >>    holds a read lock on it. That's the blocked lvcreate at the end of
> >>    the trace.
> >>
> >> 4. A COW operation, issued by step (1), completes and kcopyd runs
> >>    dm-snapshot's completion callback, which tries to take a read lock on
> >>    _origins_lock, before signaling the cow_count semaphore. This read
> >>    lock blocks, the semaphore is never signaled and we have the deadlock
> >>    you experienced.
> >>
> >> At first glance this seemed strange, because DRBD at step (2) holds a
> >> read lock on _origins_lock, so taking another read lock should be
> >> possible.
> >>
> >> But, if I am not missing something, the read-write semaphore
> >> implementation gives priority to writers, meaning that as soon as a
> >> writer tries to enter the critical section, the lvcreate in our case, no
> >> readers will be allowed in until all writers have completed their work.
> >>
> >> That's what I believe is causing the deadlock you are experiencing.
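
The four steps quoted above can be replayed in a small, single-threaded
toy model. This is only a sketch under stated assumptions: every name in
it (toy_rwsem, toy_sem, toy_replay) is hypothetical, it is not
dm-snapshot or kernel rwsem code, and it assumes the behaviour described
in the thread, namely that a queued writer keeps new readers out. It
also simplifies step 1 by taking the cow_count slots without modelling
the read lock that each write takes and releases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy read-write lock: a queued writer keeps new readers out (assumption). */
struct toy_rwsem {
    int readers;          /* readers currently holding the lock */
    int waiting_writers;  /* writers queued behind those readers */
    bool writer;          /* true while a writer holds the lock */
};

/* Toy counting semaphore standing in for cow_count. */
struct toy_sem {
    int count;
};

/* Returns true if the read lock is granted, false if the caller would block. */
static bool toy_down_read(struct toy_rwsem *l)
{
    if (l->writer || l->waiting_writers > 0)
        return false;
    l->readers++;
    return true;
}

/* Returns true if the write lock is granted; otherwise the writer queues. */
static bool toy_down_write(struct toy_rwsem *l)
{
    if (l->writer || l->readers > 0) {
        l->waiting_writers++;
        return false;
    }
    l->writer = true;
    return true;
}

/* Returns true if a slot was taken, false if the caller would block. */
static bool toy_down(struct toy_sem *s)
{
    if (s->count == 0)
        return false;
    s->count--;
    return true;
}

/*
 * Replays steps 1-4 with a COW limit of `limit`. If `with_lvcreate` is
 * false, step 3 is skipped. Returns true when both the DRBD writer and
 * the kcopyd completion callback end up blocked, i.e. the deadlock.
 */
bool toy_replay(int limit, bool with_lvcreate)
{
    struct toy_rwsem origins_lock = { 0, 0, false };
    struct toy_sem cow_count = { limit };
    bool drbd_blocked, kcopyd_blocked;
    int i;

    /* Step 1: writes start COW jobs until the limit is used up. */
    for (i = 0; i < limit; i++)
        toy_down(&cow_count);

    /* Step 2: the next write takes the read lock, then blocks on cow_count. */
    toy_down_read(&origins_lock);
    drbd_blocked = !toy_down(&cow_count);

    /* Step 3: lvcreate wants the write lock and queues behind the reader. */
    if (with_lvcreate)
        toy_down_write(&origins_lock);

    /* Step 4: the completion callback needs the read lock before it can
     * release a cow_count slot; a queued writer keeps it out. */
    kcopyd_blocked = !toy_down_read(&origins_lock);

    return drbd_blocked && kcopyd_blocked;
}
```

In this model, toy_replay(2048, true) reports the deadlock, while
toy_replay(2048, false) does not: without the queued writer the
completion callback gets its read lock and could release a cow_count
slot.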
> >>
> >> I will send a patch fixing this and I will let you know.
> >>
> >> Thanks,
> >> Nikos
> >>
> >>> Below are the hung task back traces.
> >>> Sep 24 12:08:48.974658 err CFPU-1 kernel: [  279.991760] INFO: task
> >>> kworker/1:1:170 blocked for more than 120 seconds.
> >>> Sep 24 12:08:48.974658 err CFPU-1 kernel: [  279.998569]
> >>> Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
> >>> Sep 24 12:08:48.974658 err CFPU-1 kernel: [  280.006593] "echo 0 >
> >>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >>> Sep 24 12:08:48.974658 info CFPU-1 kernel: [  280.014435] kworker/1:1
> >>>    D ffffffff80e1db78     0   170      2 0x00100000
> >>> Sep 24 12:08:48.974658 info CFPU-1 kernel: [  280.014482] Workqueue:
> >>> kcopyd do_work [dm_mod]
> >>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487] Stack :
> >>> 0000000000000000 0000000000000001 0003000300000000 80000007fde8bac8
> >>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
> >>> 80000007fe759b00 0000000000000002 ffffffffc0285294 80000007f8d1ca00
> >>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
> >>> ffffffffc027eda8 0000000000000001 ffffffff80b30000 0000000000000100
> >>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
> >>> 8000000784c098c8 ffffffff80e1db78 80000007fe759b00 ffffffff80e204b8
> >>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
> >>> 8000000788ef79c0 800000078505ba70 80000007fe759b00 00000001852b4620
> >>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
> >>> ffffffffc0280000 80000007852b4620 80000007eebf5758 ffffffffc027edec
> >>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
> >>> 0000000000000000 80000007852b4620 80000007835d8e80 ffffffffc027f38c
> >>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
> >>> 8000000787ac0580 0000000000000001 80000007f8d1ca60 8000000785aeb080
> >>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
> >>> 0000000000000000 0000000000000000 0000000000000200 ffffffffc0282488
> >>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
> >>> 0000000000000200 80000007f8d1ca00 ffffffffc0280000 ffffffffc027db90
> >>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]   ...
> >>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014558] Call Trace:
> >>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014570]
> >>> [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
> >>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014580]
> >>> [<ffffffff80e1db78>] schedule+0x38/0x98
> >>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014590]
> >>> [<ffffffff80e204b8>] __down_read+0xa8/0xf0
> >>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014609]
> >>> [<ffffffffc027edec>] do_origin.isra.13+0x44/0x110 [dm_snapshot]
> >>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014625]
> >>> [<ffffffffc027f38c>] pending_complete+0x1ac/0x378 [dm_snapshot]
> >>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014642]
> >>> [<ffffffffc0282488>] persistent_commit_exception+0x140/0x218
> >>> [dm_snapshot]
> >>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014659]
> >>> [<ffffffffc027db90>] copy_callback+0x108/0x1a0 [dm_snapshot]
> >>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014688]
> >>> [<ffffffffc01eb6a4>] run_complete_job+0x8c/0x148 [dm_mod]
> >>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014724]
> >>> [<ffffffffc01eaae8>] process_jobs+0xc8/0x1e0 [dm_mod]
> >>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014759]
> >>> [<ffffffffc01eb1e0>] do_work+0xb8/0x110 [dm_mod]
> >>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014783]
> >>> [<ffffffff808b18e0>] process_one_work+0x190/0x480
> >>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014791]
> >>> [<ffffffff808b1d18>] worker_thread+0x148/0x580
> >>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014801]
> >>> [<ffffffff808b813c>] kthread+0xdc/0xf8
> >>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014811]
> >>> [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
> >>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014817]
> >>> Sep 24 12:08:48.974658 err CFPU-1 kernel: [  280.015046] INFO: task
> >>> drbd_r_r1:7772 blocked for more than 120 seconds.
> >>> Sep 24 12:08:48.974658 err CFPU-1 kernel: [  280.021766]
> >>> Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
> >>> Sep 24 12:08:48.974658 err CFPU-1 kernel: [  280.029781] "echo 0 >
> >>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >>> Sep 24 12:08:48.982578 info CFPU-1 kernel: [  280.037637] drbd_r_r1
> >>>    D ffffffff80e1db78     0  7772      2 0x00100002
> >>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648] Stack :
> >>> 00000000000207e8 ffffffff80b35530 8000000788e77860 8000000788e77860
> >>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
> >>> 7fffffffffffffff 80000007f8d1cb28 8000000788e0a880 0000000000000002
> >>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
> >>> ffffffff80e20000 80000007f9306480 00000000000207e8 ffffffff808dfe58
> >>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
> >>> 80000007eebf5168 ffffffff80e1db78 0000000002411200 ffffffff80e208d8
> >>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
> >>> 80000007eab5b440 ffffffff8097b25c 00000000000207a0 ffffffff808dfe58
> >>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
> >>> 80000007f8d1ca00 ffffffff809ce014 80000007f9306000 0000000002011200
> >>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
> >>> 7fffffffffffffff 80000007f8d1cb28 8000000788e0a880 0000000000000002
> >>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
> >>> ffffffff80e20000 ffffffff80e1f8d8 80000007f8d1cb30 80000007f8d1cb30
> >>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
> >>> 8000000788e0a880 00ffffff808db9e8 0000000000000000 80000007f8d1cb28
> >>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
> >>> 80000007f9306480 ffffffffc02834d0 ffffffff808e0000 ffffffff808dfba4
> >>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]   ...
> >>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037774] Call Trace:
> >>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037797]
> >>> [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
> >>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037849]
> >>> [<ffffffff80e1db78>] schedule+0x38/0x98
> >>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037865]
> >>> [<ffffffff80e208d8>] schedule_timeout+0x240/0x2a0
> >>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037874]
> >>> [<ffffffff80e1f8d8>] __down+0x90/0xd8
> >>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037884]
> >>> [<ffffffff808dfba4>] down+0x54/0x70
> >>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037901]
> >>> [<ffffffffc027da48>] start_copy+0x98/0xd8 [dm_snapshot]
> >>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037934]
> >>> [<ffffffffc027e7d4>] __origin_write+0x184/0x2c0 [dm_snapshot]
> >>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037968]
> >>> [<ffffffffc027ee4c>] do_origin.isra.13+0xa4/0x110 [dm_snapshot]
> >>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038016]
> >>> [<ffffffffc01dc2c0>] __map_bio+0xb0/0x258 [dm_mod]
> >>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038071]
> >>> [<ffffffffc01deb94>] __split_and_process_bio+0x274/0x488 [dm_mod]
> >>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038126]
> >>> [<ffffffffc01dee3c>] dm_make_request+0x94/0x128 [dm_mod]
> >>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038174]
> >>> [<ffffffff80b3347c>] generic_make_request+0x114/0x290
> >>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038238]
> >>> [<ffffffffc0511090>] drbd_submit_peer_request+0x258/0x618 [drbd]
> >>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038337]
> >>> [<ffffffffc05121f8>] receive_RSDataReply+0x3b8/0x770 [drbd]
> >>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038426]
> >>> [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
> >>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038526]
> >>> [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
> >>> Sep 24 12:08:48.985184 warn CFPU-1 kernel: [  280.038588]
> >>> [<ffffffff808b813c>] kthread+0xdc/0xf8
> >>> Sep 24 12:08:48.985184 warn CFPU-1 kernel: [  280.038604]
> >>> [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
> >>> Sep 24 12:08:48.985184 warn CFPU-1 kernel: [  280.038612]
> >>> Sep 24 12:08:48.985184 err CFPU-1 kernel: [  280.038629] INFO: task
> >>> drbd_r_r5:7910 blocked for more than 120 seconds.
> >>> Sep 24 12:08:48.990218 err CFPU-1 kernel: [  280.045351]
> >>> Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
> >>> Sep 24 12:08:48.998215 err CFPU-1 kernel: [  280.053372] "echo 0 >
> >>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >>> Sep 24 12:08:49.006141 info CFPU-1 kernel: [  280.061248] drbd_r_r5
> >>>    D ffffffff80e1db78     0  7910      2 0x00100002
> >>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257] Stack :
> >>> 0000000000000001 ffffffff80b35530 8000000788ef7950 8000000788ef7950
> >>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
> >>> 8000000787936c00 0000000000000002 ffffffffc0285294 00000000000049ba
> >>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
> >>> 0000000000000001 00000000000049ba 0000000000000001 ffffffff80b2aa90
> >>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
> >>> 0000000000000001 ffffffff80e1db78 8000000787936c00 ffffffff80e204b8
> >>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
> >>> 800000078785b9c0 80000007fde8bb30 8000000787936c00 0000000100000000
> >>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
> >>> ffffffffc0280000 80000007eff43920 8000000788ca0198 ffffffffc027edec
> >>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
> >>> 80000007eff43900 c000000001f5b080 80000007eff43920 ffffffffc01dc2c0
> >>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
> >>> 80000007eb29bd80 0000000000000000 0000000002400000 80000007eff42900
> >>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
> >>> 00000000000049ba ffffffff80b2abec 8000000788ef7a50 8000000788ef7a50
> >>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
> >>> ffffffffc01dbad8 c000000001f5b080 c000000001f5b080 80000007eff43900
> >>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]   ...
> >>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061375] Call Trace:
> >>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061392]
> >>> [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
> >>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061435]
> >>> [<ffffffff80e1db78>] schedule+0x38/0x98
> >>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061445]
> >>> [<ffffffff80e204b8>] __down_read+0xa8/0xf0
> >>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061463]
> >>> [<ffffffffc027edec>] do_origin.isra.13+0x44/0x110 [dm_snapshot]
> >>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061509]
> >>> [<ffffffffc01dc2c0>] __map_bio+0xb0/0x258 [dm_mod]
> >>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061563]
> >>> [<ffffffffc01deb94>] __split_and_process_bio+0x274/0x488 [dm_mod]
> >>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061598]
> >>> [<ffffffffc01dee3c>] dm_make_request+0x94/0x128 [dm_mod]
> >>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061624]
> >>> [<ffffffff80b3347c>] generic_make_request+0x114/0x290
> >>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061674]
> >>> [<ffffffffc0511090>] drbd_submit_peer_request+0x258/0x618 [drbd]
> >>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061739]
> >>> [<ffffffffc0512c78>] receive_Data+0x6c8/0x1368 [drbd]
> >>> Sep 24 12:08:49.007987 warn CFPU-1 kernel: [  280.061816]
> >>> [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
> >>> Sep 24 12:08:49.007987 warn CFPU-1 kernel: [  280.061882]
> >>> [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
> >>> Sep 24 12:08:49.007987 warn CFPU-1 kernel: [  280.061922]
> >>> [<ffffffff808b813c>] kthread+0xdc/0xf8
> >>> Sep 24 12:08:49.007987 warn CFPU-1 kernel: [  280.061931]
> >>> [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
> >>> Sep 24 12:08:49.007987 warn CFPU-1 kernel: [  280.061937]
> >>> Sep 24 12:08:49.007987 err CFPU-1 kernel: [  280.061948] INFO: task
> >>> drbd_r_r6:7991 blocked for more than 120 seconds.
> >>> Sep 24 12:08:49.013584 err CFPU-1 kernel: [  280.068669]
> >>> Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
> >>> Sep 24 12:08:49.021541 err CFPU-1 kernel: [  280.076683] "echo 0 >
> >>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >>> Sep 24 12:08:49.029370 info CFPU-1 kernel: [  280.084523] drbd_r_r6
> >>>    D ffffffff80e1db78     0  7991      2 0x00100002
> >>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532] Stack :
> >>> 0000000000000001 ffffffff80b35530 800000078785b950 800000078785b950
> >>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
> >>> 8000000787b79b00 0000000000000002 ffffffffc0285294 000000000000049a
> >>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
> >>> 0000000000000001 000000000000049a 0000000000000001 ffffffff80b2aa90
> >>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
> >>> 0000000000000001 ffffffff80e1db78 8000000787b79b00 ffffffff80e204b8
> >>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
> >>> 800000078465b940 8000000788ef79c0 8000000787b79b00 0000000100000000
> >>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
> >>> ffffffffc0280000 80000007e90d6c20 80000007878cf2d8 ffffffffc027edec
> >>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
> >>> 80000007e90d6c00 c000000001fcc080 80000007e90d6c20 ffffffffc01dc2c0
> >>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
> >>> 80000007f93fd900 0000000000000000 0000000002400000 80000007e90d6a00
> >>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
> >>> 000000000000049a ffffffff80b2abec 800000078785ba50 800000078785ba50
> >>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
> >>> ffffffffc01dbad8 c000000001fcc080 c000000001fcc080 80000007e90d6c00
> >>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]   ...
> >>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084610] Call Trace:
> >>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084621]
> >>> [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
> >>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084631]
> >>> [<ffffffff80e1db78>] schedule+0x38/0x98
> >>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084641]
> >>> [<ffffffff80e204b8>] __down_read+0xa8/0xf0
> >>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084659]
> >>> [<ffffffffc027edec>] do_origin.isra.13+0x44/0x110 [dm_snapshot]
> >>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084696]
> >>> [<ffffffffc01dc2c0>] __map_bio+0xb0/0x258 [dm_mod]
> >>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084729]
> >>> [<ffffffffc01deb94>] __split_and_process_bio+0x274/0x488 [dm_mod]
> >>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084777]
> >>> [<ffffffffc01dee3c>] dm_make_request+0x94/0x128 [dm_mod]
> >>> Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.084810]
> >>> [<ffffffff80b3347c>] generic_make_request+0x114/0x290
> >>> Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.084863]
> >>> [<ffffffffc0511090>] drbd_submit_peer_request+0x258/0x618 [drbd]
> >>> Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.084928]
> >>> [<ffffffffc0512c78>] receive_Data+0x6c8/0x1368 [drbd]
> >>> Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.084992]
> >>> [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
> >>> Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.085057]
> >>> [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
> >>> Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.085097]
> >>> [<ffffffff808b813c>] kthread+0xdc/0xf8
> >>> Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.085106]
> >>> [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
> >>> Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.085112]
> >>> Sep 24 12:08:49.031000 err CFPU-1 kernel: [  280.085121] INFO: task
> >>> drbd_r_r7:8046 blocked for more than 120 seconds.
> >>> Sep 24 12:08:49.036834 err CFPU-1 kernel: [  280.091854]
> >>> Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
> >>> Sep 24 12:08:49.044747 err CFPU-1 kernel: [  280.099876] "echo 0 >
> >>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >>> Sep 24 12:08:49.056612 info CFPU-1 kernel: [  280.107739] drbd_r_r7
> >>>    D ffffffff80e1db78     0  8046      2 0x00100000
> >>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749] Stack :
> >>> 0000000000000000 0000000000000010 ffffffff811b3a80 0000000000000000
> >>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
> >>> 7fffffffffffffff 7fffffffffffffff 0000000000000000 0000000000000002
> >>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
> >>> 8000000788fcfa98 0000000000000000 ffffffffc053baa8 ffffffff80be4360
> >>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
> >>> ffffffffc0550000 ffffffff80e1db78 ffffffff81070660 ffffffff80e208d8
> >>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
> >>> 8000000797a43600 ffffffff808c4878 80000007ff2f4100 0000000000000000
> >>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
> >>> 80000007f93fdb40 0000000000000000 80000007ff007000 ffffffff81070660
> >>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
> >>> 8000000788fcfa90 7fffffffffffffff 0000000000000000 0000000000000002
> >>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
> >>> 8000000788fcfa98 ffffffff80e1e8a0 0000000100000000 8000000797a42880
> >>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
> >>> ffffffff808c4b60 8000000788fcfaa0 8000000788fcfaa0 ffffffffc053baa8
> >>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
> >>> ffffffff80be4360 ffffffff81cb4540 ffffffff81cb0000 80000007f93fdb40
> >>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]   ...
> >>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107821] Call Trace:
> >>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107833]
> >>> [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
> >>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107842]
> >>> [<ffffffff80e1db78>] schedule+0x38/0x98
> >>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107850]
> >>> [<ffffffff80e208d8>] schedule_timeout+0x240/0x2a0
> >>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107859]
> >>> [<ffffffff80e1e8a0>] wait_for_common+0xd8/0x1b0
> >>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107873]
> >>> [<ffffffff808acc20>] call_usermodehelper_exec+0x110/0x188
> >>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107932]
> >>> [<ffffffffc053da3c>] drbd_khelper+0x1c4/0x310 [drbd]
> >>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107998]
> >>> [<ffffffffc0500a14>] drbd_start_resync+0x534/0x968 [drbd]
> >>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.108060]
> >>> [<ffffffffc05037dc>] receive_sync_uuid+0x2d4/0x5a0 [drbd]
> >>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.108124]
> >>> [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
> >>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.108189]
> >>> [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
> >>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.108229]
> >>> [<ffffffff808b813c>] kthread+0xdc/0xf8
> >>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.108239]
> >>> [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
> >>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.108245]
> >>> Sep 24 12:08:49.056612 err CFPU-1 kernel: [  280.108471] INFO: task
> >>> drbd_r_r8:8120 blocked for more than 120 seconds.
> >>> Sep 24 12:08:49.060085 err CFPU-1 kernel: [  280.115194]
> >>> Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
> >>> Sep 24 12:08:49.075924 err CFPU-1 kernel: [  280.123218] "echo 0 >
> >>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >>> Sep 24 12:08:49.075924 info CFPU-1 kernel: [  280.131070] drbd_r_r8
> >>>    D ffffffff80e1db78     0  8120      2 0x00100002
> >>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083] Stack :
> >>> 0000000000000000 0000000000000010 ffffffff811b3a80 0000000000000000
> >>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
> >>> 7fffffffffffffff 7fffffffffffffff 0000000000000000 0000000000000002
> >>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
> >>> 80000007851d3a98 0000000000000000 ffffffffc053baa8 ffffffff80be4360
> >>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
> >>> ffffffffc0550000 ffffffff80e1db78 ffffffff81070660 ffffffff80e208d8
> >>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
> >>> 80000007f96f6c00 ffffffff808c4878 80000007ff2f4100 0000000000000000
> >>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
> >>> 80000007ee99f360 0000000000000000 80000007ff007000 ffffffff81070660
> >>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
> >>> 80000007851d3a90 7fffffffffffffff 0000000000000000 0000000000000002
> >>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
> >>> 80000007851d3a98 ffffffff80e1e8a0 0000000100000000 8000000784ba3600
> >>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
> >>> ffffffff808c4b60 80000007851d3aa0 80000007851d3aa0 ffffffffc053baa8
> >>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
> >>> ffffffff80be4360 ffffffff81cb4540 ffffffff81cb0000 80000007ee99f360
> >>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]   ...
> >>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131162] Call Trace:
> >>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131174]
> >>> [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
> >>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131183]
> >>> [<ffffffff80e1db78>] schedule+0x38/0x98
> >>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131195]
> >>> [<ffffffff80e208d8>] schedule_timeout+0x240/0x2a0
> >>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131204]
> >>> [<ffffffff80e1e8a0>] wait_for_common+0xd8/0x1b0
> >>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131216]
> >>> [<ffffffff808acc20>] call_usermodehelper_exec+0x110/0x188
> >>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131274]
> >>> [<ffffffffc053da3c>] drbd_khelper+0x1c4/0x310 [drbd]
> >>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131341]
> >>> [<ffffffffc0500a14>] drbd_start_resync+0x534/0x968 [drbd]
> >>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131403]
> >>> [<ffffffffc05037dc>] receive_sync_uuid+0x2d4/0x5a0 [drbd]
> >>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131482]
> >>> [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
> >>> Sep 24 12:08:49.077857 warn CFPU-1 kernel: [  280.131557]
> >>> [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
> >>> Sep 24 12:08:49.077857 warn CFPU-1 kernel: [  280.131597]
> >>> [<ffffffff808b813c>] kthread+0xdc/0xf8
> >>> Sep 24 12:08:49.077857 warn CFPU-1 kernel: [  280.131606]
> >>> [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
> >>> Sep 24 12:08:49.077857 warn CFPU-1 kernel: [  280.131612]
> >>> Sep 24 12:08:49.077857 err CFPU-1 kernel: [  280.131641] INFO: task
> >>> drbd_r_r9:8173 blocked for more than 120 seconds.
> >>> Sep 24 12:08:49.091245 err CFPU-1 kernel: [  280.138358]
> >>> Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
> >>> Sep 24 12:08:49.091245 err CFPU-1 kernel: [  280.146373] "echo 0 >
> >>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >>> Sep 24 12:08:49.099067 info CFPU-1 kernel: [  280.154219] drbd_r_r9
> >>>    D ffffffff80e1db78     0  8173      2 0x00100002
> >>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229] Stack :
> >>> 0000000000000000 0000000000000010 ffffffff811b3a80 0000000000000000
> >>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
> >>> 7fffffffffffffff 7fffffffffffffff 0000000000000000 0000000000000002
> >>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
> >>> 8000000783c9fa98 0000000000000000 ffffffffc053baa8 ffffffff80be4360
> >>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
> >>> ffffffffc0550000 ffffffff80e1db78 ffffffff81070660 ffffffff80e208d8
> >>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
> >>> 8000000797a45100 ffffffff808c4878 80000007ff2f4100 0000000000000000
> >>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
> >>> 8000000788ca9d20 0000000000000000 80000007ff007000 ffffffff81070660
> >>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
> >>> 8000000783c9fa90 7fffffffffffffff 0000000000000000 0000000000000002
> >>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
> >>> 8000000783c9fa98 ffffffff80e1e8a0 0000000100000000 80000007843d4380
> >>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
> >>> ffffffff808c4b60 8000000783c9faa0 8000000783c9faa0 ffffffffc053baa8
> >>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
> >>> ffffffff80be4360 ffffffff81cb4540 ffffffff81cb0000 8000000788ca9d20
> >>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]   ...
> >>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154305] Call Trace:
> >>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154315]
> >>> [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
> >>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154325]
> >>> [<ffffffff80e1db78>] schedule+0x38/0x98
> >>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154333]
> >>> [<ffffffff80e208d8>] schedule_timeout+0x240/0x2a0
> >>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154342]
> >>> [<ffffffff80e1e8a0>] wait_for_common+0xd8/0x1b0
> >>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154354]
> >>> [<ffffffff808acc20>] call_usermodehelper_exec+0x110/0x188
> >>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154410]
> >>> [<ffffffffc053da3c>] drbd_khelper+0x1c4/0x310 [drbd]
> >>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154475]
> >>> [<ffffffffc0500a14>] drbd_start_resync+0x534/0x968 [drbd]
> >>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154540]
> >>> [<ffffffffc05037dc>] receive_sync_uuid+0x2d4/0x5a0 [drbd]
> >>> Sep 24 12:08:49.100714 warn CFPU-1 kernel: [  280.154612]
> >>> [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
> >>> Sep 24 12:08:49.100714 warn CFPU-1 kernel: [  280.154678]
> >>> [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
> >>> Sep 24 12:08:49.100714 warn CFPU-1 kernel: [  280.154717]
> >>> [<ffffffff808b813c>] kthread+0xdc/0xf8
> >>> Sep 24 12:08:49.100714 warn CFPU-1 kernel: [  280.154726]
> >>> [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
> >>> Sep 24 12:08:49.100714 warn CFPU-1 kernel: [  280.154732]
> >>> Sep 24 12:08:49.100714 err CFPU-1 kernel: [  280.154742] INFO: task
> >>> drbd_r_r10:8254 blocked for more than 120 seconds.
> >>> Sep 24 12:08:49.106539 err CFPU-1 kernel: [  280.161554]
> >>> Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
> >>> Sep 24 12:08:49.114430 err CFPU-1 kernel: [  280.169569] "echo 0 >
> >>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >>> Sep 24 12:08:49.129749 info CFPU-1 kernel: [  280.177434] drbd_r_r10
> >>>    D ffffffff80e1db78     0  8254      2 0x00100002
> >>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445] Stack :
> >>> 0000000000000000 0000000000000010 ffffffff811b3a80 0000000000000000
> >>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
> >>> 7fffffffffffffff 7fffffffffffffff 0000000000000000 0000000000000002
> >>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
> >>> 80000007846efa98 0000000000000000 ffffffffc053baa8 ffffffff80be4360
> >>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
> >>> ffffffffc0550000 ffffffff80e1db78 ffffffff81070660 ffffffff80e208d8
> >>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
> >>> 8000000784893600 ffffffff808c4878 80000007ff2f4100 0000000000000000
> >>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
> >>> 80000007e9498180 0000000000000000 80000007ff007000 ffffffff81070660
> >>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
> >>> 80000007846efa90 7fffffffffffffff 0000000000000000 0000000000000002
> >>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
> >>> 80000007846efa98 ffffffff80e1e8a0 0000000100000000 8000000783cf8d80
> >>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
> >>> ffffffff808c4b60 80000007846efaa0 80000007846efaa0 ffffffffc053baa8
> >>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
> >>> ffffffff80be4360 ffffffff81cb4540 ffffffff81cb0000 80000007e9498180
> >>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]   ...
> >>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177517] Call Trace:
> >>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177529]
> >>> [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
> >>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177539]
> >>> [<ffffffff80e1db78>] schedule+0x38/0x98
> >>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177547]
> >>> [<ffffffff80e208d8>] schedule_timeout+0x240/0x2a0
> >>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177556]
> >>> [<ffffffff80e1e8a0>] wait_for_common+0xd8/0x1b0
> >>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177569]
> >>> [<ffffffff808acc20>] call_usermodehelper_exec+0x110/0x188
> >>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177631]
> >>> [<ffffffffc053da3c>] drbd_khelper+0x1c4/0x310 [drbd]
> >>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177697]
> >>> [<ffffffffc0500a14>] drbd_start_resync+0x534/0x968 [drbd]
> >>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177760]
> >>> [<ffffffffc05037dc>] receive_sync_uuid+0x2d4/0x5a0 [drbd]
> >>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177823]
> >>> [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
> >>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177888]
> >>> [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
> >>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177928]
> >>> [<ffffffff808b813c>] kthread+0xdc/0xf8
> >>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177938]
> >>> [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
> >>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177944]
> >>> Sep 24 12:08:49.129749 err CFPU-1 kernel: [  280.177960] INFO: task
> >>> drbd_r_r11:8328 blocked for more than 120 seconds.
> >>> Sep 24 12:08:49.129749 err CFPU-1 kernel: [  280.184775]
> >>> Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
> >>> Sep 24 12:08:49.137674 err CFPU-1 kernel: [  280.192793] "echo 0 >
> >>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >>> Sep 24 12:08:49.152797 info CFPU-1 kernel: [  280.200644] drbd_r_r11
> >>>    D ffffffff80e1db78     0  8328      2 0x00100002
> >>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654] Stack :
> >>> 0000000000000000 0000000000000010 ffffffff811b3a80 0000000000000000
> >>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
> >>> 7fffffffffffffff 7fffffffffffffff 0000000000000000 0000000000000002
> >>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
> >>> 8000000783c5fa98 0000000000000000 ffffffffc053baa8 ffffffff80be4360
> >>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
> >>> ffffffffc0550000 ffffffff80e1db78 ffffffff81070660 ffffffff80e208d8
> >>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
> >>> 8000000787935100 ffffffff808c4878 80000007ff2f4100 0000000000000000
> >>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
> >>> 80000007eebd25a0 0000000000000000 80000007ff007000 ffffffff81070660
> >>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
> >>> 8000000783c5fa90 7fffffffffffffff 0000000000000000 0000000000000002
> >>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
> >>> 8000000783c5fa98 ffffffff80e1e8a0 0000000100000000 800000078371de80
> >>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
> >>> ffffffff808c4b60 8000000783c5faa0 8000000783c5faa0 ffffffffc053baa8
> >>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
> >>> ffffffff80be4360 ffffffff81cb4540 ffffffff81cb0000 80000007eebd25a0
> >>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]   ...
> >>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200726] Call Trace:
> >>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200739]
> >>> [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
> >>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200749]
> >>> [<ffffffff80e1db78>] schedule+0x38/0x98
> >>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200757]
> >>> [<ffffffff80e208d8>] schedule_timeout+0x240/0x2a0
> >>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200766]
> >>> [<ffffffff80e1e8a0>] wait_for_common+0xd8/0x1b0
> >>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200778]
> >>> [<ffffffff808acc20>] call_usermodehelper_exec+0x110/0x188
> >>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200840]
> >>> [<ffffffffc053da3c>] drbd_khelper+0x1c4/0x310 [drbd]
> >>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200906]
> >>> [<ffffffffc0500a14>] drbd_start_resync+0x534/0x968 [drbd]
> >>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200969]
> >>> [<ffffffffc05037dc>] receive_sync_uuid+0x2d4/0x5a0 [drbd]
> >>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.201032]
> >>> [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
> >>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.201097]
> >>> [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
> >>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.201137]
> >>> [<ffffffff808b813c>] kthread+0xdc/0xf8
> >>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.201147]
> >>> [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
> >>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.201153]
> >>> Sep 24 12:08:49.152797 err CFPU-1 kernel: [  280.201183] INFO: task
> >>> lvcreate:8585 blocked for more than 120 seconds.
> >>> Sep 24 12:08:49.152797 err CFPU-1 kernel: [  280.207823]
> >>> Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
> >>> Sep 24 12:08:49.160715 err CFPU-1 kernel: [  280.215860] "echo 0 >
> >>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >>> Sep 24 12:08:49.168570 info CFPU-1 kernel: [  280.223704] lvcreate
> >>>    D ffffffff80e1db78     0  8585   8582 0x00100000
> >>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719] Stack :
> >>> ffffffff809ce420 ffffffffc027e170 80000007ea05c720 ffffffff809ce014
> >>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
> >>> ffffffffc0285294 ffffffffc0285290 8000000797bdb600 0000000000000002
> >>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
> >>> ffffffffc02834d0 ffffffffc0280000 ffffffff809ce420 ffffffffc027e170
> >>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
> >>> 80000007f9275d80 ffffffff80e1db78 ffffffffc0285294 ffffffff80e20584
> >>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
> >>> 80000007fde8bb30 ffffffffc0285298 8000000797bdb600 0000000000000000
> >>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
> >>> c00000000205a080 0000000000000000 c00000000205a080 80000007ea05c600
> >>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
> >>> 80000007fec10300 ffffffffc027fd20 8000000780cb0150 ffffffffc01e2e9c
> >>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
> >>> 00000008c0284b30 00000002c0284b30 80000007ea05c620 ffffffffc0280000
> >>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
> >>> c00000000205a080 80000007ea05de00 0000000000000000 8000000780cb0150
> >>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
> >>> 8000000780cb0160 0000000001d4c000 0000000000000000 8000000780cb0000
> >>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]   ...
> >>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223796] Call Trace:
> >>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223805]
> >>> [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
> >>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223814]
> >>> [<ffffffff80e1db78>] schedule+0x38/0x98
> >>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223824]
> >>> [<ffffffff80e20584>] __down_write_nested+0x84/0xe8
> >>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223843]
> >>> [<ffffffffc027fd20>] snapshot_ctr+0x4d0/0x868 [dm_snapshot]
> >>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223886]
> >>> [<ffffffffc01e31d4>] dm_table_add_target+0x164/0x418 [dm_mod]
> >>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223921]
> >>> [<ffffffffc01e7bdc>] table_load+0x194/0x478 [dm_mod]
> >>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223956]
> >>> [<ffffffffc01e8e1c>] ctl_ioctl+0x424/0x678 [dm_mod]
> >>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.224006]
> >>> [<ffffffffc01e90a0>] dm_ctl_ioctl+0x30/0x40 [dm_mod]
> >>> Sep 24 12:08:49.170177 warn CFPU-1 kernel: [  280.224040]
> >>> [<ffffffff809f29cc>] do_vfs_ioctl+0x38c/0x5f8
> >>> Sep 24 12:08:49.170177 warn CFPU-1 kernel: [  280.224049]
> >>> [<ffffffff809f2c98>] SyS_ioctl+0x60/0xc8
> >>> Sep 24 12:08:49.170177 warn CFPU-1 kernel: [  280.224059]
> >>> [<ffffffff80879b70>] syscall_common+0x34/0x58
> >>> Sep 24 12:08:49.170177 warn CFPU-1 kernel: [  280.224065]
> >>>
> >
> >
> >
> > --
> > Guruswamy Basavaiah
> >



-- 
Guruswamy Basavaiah


* Re: Fix "dm kcopyd: Fix bug causing workqueue stalls" causes dead lock
  2019-10-01 12:27       ` Guruswamy Basavaiah
@ 2019-10-01 12:43         ` Nikos Tsironis
  2019-10-09 14:13           ` Mike Snitzer
  0 siblings, 1 reply; 19+ messages in thread
From: Nikos Tsironis @ 2019-10-01 12:43 UTC (permalink / raw)
  To: Guruswamy Basavaiah; +Cc: dm-devel, agk, Mike Snitzer, iliastsi

On 10/1/19 3:27 PM, Guruswamy Basavaiah wrote:
> Hello Nikos,
>  Yes, the issue is consistently reproducible for us with a particular
> set-up and test case.
>  I will get access to the set-up next week, and will try to test the
> fix and let you know the results before the end of next week.
> 

That sounds great!

Thanks a lot,
Nikos

> Guru
> 
> On Tue, 1 Oct 2019 at 17:42, Nikos Tsironis <ntsironis@arrikto.com> wrote:
>>
>> On 9/29/19 5:36 PM, Guruswamy Basavaiah wrote:
>>> Hello Nikos,
>>>  Thanks for pointing out the lvcreate write lock.
>>>
>>> Guru
>>>
>>
>> Hi Guru,
>>
>> I have sent a fix for this and I have Cc-ed you.
>>
>> Is this something you are able to consistently reproduce? If so, it
>> would be great if you could also test the fix.
>>
>> Thanks,
>> Nikos
>>
>>>
>>> On Sat, 28 Sep 2019 at 01:03, Nikos Tsironis <ntsironis@arrikto.com> wrote:
>>>>
>>>> On 9/27/19 4:19 PM, Guruswamy Basavaiah wrote:
>>>>> Hello,
>>>>>  We have a DRBD partition on top of an LVM partition. When the node
>>>>> holding the secondary DRBD partition comes up, a large amount of data
>>>>> is synced from the primary to the secondary DRBD partitions.
>>>>>
>>>>> During this time, the DRBD sync (resync) stops at some point.
>>>>> After 120 seconds we see hung-task-timeout warnings in the logs (see
>>>>> the end of this email).
>>>>>
>>>>> If I increase the cow_count semaphore value from 2048 to 8192, or
>>>>> remove the patch below, the DRBD sync works seamlessly.
>>>>>
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/patch/?id=721b1d98fb517ae99ab3b757021cf81db41e67be
>>>>>
>>>>> I am not familiar with the dm code. From the hung-task backtraces, my
>>>>> understanding is that a thread trying to queue work to kcopyd holds
>>>>> "&_origins_lock" and is blocked on the cow_count lock, while jobs from
>>>>> kcopyd try to queue work to the same kcopyd and are blocked on
>>>>> "&_origins_lock", resulting in a deadlock.
>>>>>
>>>>
>>>> Hello Guruswamy,
>>>>
>>>> I am Cc-ing the maintainers, so they can be in the loop.
>>>>
>>>> I examined the attached logs and I believe the following happens:
>>>>
>>>> 1. DRBD issues a number of writes to the snapshot origin device. These
>>>>    writes cause COW, which is performed by kcopyd.
>>>>
>>>> 2. At some point DRBD reaches the cow_count semaphore limit (2048) and
>>>>    blocks in down(&s->cow_count), holding a read lock on _origins_lock.
>>>>
>>>> 3. Someone tries to create a new snapshot. This involves taking a write
>>>>    lock on _origins_lock, which blocks because DRBD at step (2) already
>>>>    holds a read lock on it. That's the blocked lvcreate at the end of
>>>>    the trace.
>>>>
>>>> 4. A COW operation, issued by step (1), completes and kcopyd runs
>>>>    dm-snapshot's completion callback, which tries to take a read lock on
>>>>    _origins_lock, before signaling the cow_count semaphore. This read
>>>>    lock blocks, the semaphore is never signaled and we have the deadlock
>>>>    you experienced.
>>>>
>>>> At first glance this seemed strange, because DRBD at step (2) holds a
>>>> read lock on _origins_lock, so taking another read lock should be
>>>> possible.
>>>>
>>>> But, if I am not missing something, the read-write semaphore
>>>> implementation gives priority to writers, meaning that as soon as a
>>>> writer tries to enter the critical section, the lvcreate in our case, no
>>>> readers will be allowed in until all writers have completed their work.
>>>>
>>>> That's what I believe is causing the deadlock you are experiencing.
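
The four steps and the writer-priority behaviour described above can be
replayed in a small, single-threaded C model. This is only a sketch
under stated assumptions, not dm-snapshot code: `struct model`,
`try_down_read()`, `up_read()`, `try_down_write()`, `try_down()` and
`replay_deadlocks()` are hypothetical names chosen for illustration,
COW_LIMIT is shrunk from 2048 to 4, and "blocking" is modelled as a
failed try rather than as a sleeping task.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative limit only; the thread mentions 2048 for cow_count. */
#define COW_LIMIT 4

struct model {
    int  readers;         /* holders of the read side of _origins_lock */
    bool writer_waiting;  /* a writer (lvcreate) is queued on the lock */
    int  cow_count;       /* free slots left in the cow_count semaphore */
};

/* With writer priority, a queued writer keeps new readers out. */
static bool try_down_read(struct model *m, bool writer_priority)
{
    if (writer_priority && m->writer_waiting)
        return false;
    m->readers++;
    return true;
}

static void up_read(struct model *m)
{
    assert(m->readers > 0);
    m->readers--;
}

/* A writer needs the lock free of readers; otherwise it queues. */
static bool try_down_write(struct model *m)
{
    if (m->readers > 0) {
        m->writer_waiting = true;
        return false;
    }
    m->writer_waiting = false;
    return true;
}

/* Counting semaphore: fails when no slot is left. */
static bool try_down(struct model *m)
{
    if (m->cow_count == 0)
        return false;
    m->cow_count--;
    return true;
}

/* Replays steps 1-4; returns true if every actor is stuck. */
bool replay_deadlocks(bool writer_priority)
{
    struct model m = { .readers = 0, .writer_waiting = false,
                       .cow_count = COW_LIMIT };
    bool ok;

    /* Steps 1-2: the origin writer takes a read lock, starts COW_LIMIT
     * copies, then blocks on the next down() while still holding it. */
    ok = try_down_read(&m, writer_priority);
    assert(ok);
    for (int i = 0; i < COW_LIMIT; i++) {
        ok = try_down(&m);
        assert(ok);
    }
    bool drbd_blocked = !try_down(&m);

    /* Step 3: lvcreate asks for the write lock and has to queue. */
    bool lvcreate_blocked = !try_down_write(&m);

    /* Step 4: the completion callback wants a read lock before it
     * signals the semaphore. */
    bool callback_blocked = !try_down_read(&m, writer_priority);
    if (!callback_blocked) {
        up_read(&m);
        m.cow_count++;                  /* models up(&s->cow_count) */
        drbd_blocked = !try_down(&m);   /* the origin writer retries */
    }

    return drbd_blocked && lvcreate_blocked && callback_blocked;
}
```

Under these assumptions, replay_deadlocks(true) reports that all three
actors are stuck, which matches the cycle in steps 2 to 4. With
replay_deadlocks(false), a lock that still admits readers while a writer
waits, the completion callback gets in, frees a cow_count slot, and the
origin writer can continue.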
>>>>
>>>> I will send a patch fixing this and I will let you know.
>>>>
>>>> Thanks,
>>>> Nikos
>>>>
>>>>> Below are the hung-task backtraces.
>>>>> Sep 24 12:08:48.974658 err CFPU-1 kernel: [  279.991760] INFO: task
>>>>> kworker/1:1:170 blocked for more than 120 seconds.
>>>>> Sep 24 12:08:48.974658 err CFPU-1 kernel: [  279.998569]
>>>>> Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
>>>>> Sep 24 12:08:48.974658 err CFPU-1 kernel: [  280.006593] "echo 0 >
>>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>> Sep 24 12:08:48.974658 info CFPU-1 kernel: [  280.014435] kworker/1:1
>>>>>    D ffffffff80e1db78     0   170      2 0x00100000
>>>>> Sep 24 12:08:48.974658 info CFPU-1 kernel: [  280.014482] Workqueue:
>>>>> kcopyd do_work [dm_mod]
>>>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487] Stack :
>>>>> 0000000000000000 0000000000000001 0003000300000000 80000007fde8bac8
>>>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
>>>>> 80000007fe759b00 0000000000000002 ffffffffc0285294 80000007f8d1ca00
>>>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
>>>>> ffffffffc027eda8 0000000000000001 ffffffff80b30000 0000000000000100
>>>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
>>>>> 8000000784c098c8 ffffffff80e1db78 80000007fe759b00 ffffffff80e204b8
>>>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
>>>>> 8000000788ef79c0 800000078505ba70 80000007fe759b00 00000001852b4620
>>>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
>>>>> ffffffffc0280000 80000007852b4620 80000007eebf5758 ffffffffc027edec
>>>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
>>>>> 0000000000000000 80000007852b4620 80000007835d8e80 ffffffffc027f38c
>>>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
>>>>> 8000000787ac0580 0000000000000001 80000007f8d1ca60 8000000785aeb080
>>>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
>>>>> 0000000000000000 0000000000000000 0000000000000200 ffffffffc0282488
>>>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]
>>>>> 0000000000000200 80000007f8d1ca00 ffffffffc0280000 ffffffffc027db90
>>>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014487]   ...
>>>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014558] Call Trace:
>>>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014570]
>>>>> [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
>>>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014580]
>>>>> [<ffffffff80e1db78>] schedule+0x38/0x98
>>>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014590]
>>>>> [<ffffffff80e204b8>] __down_read+0xa8/0xf0
>>>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014609]
>>>>> [<ffffffffc027edec>] do_origin.isra.13+0x44/0x110 [dm_snapshot]
>>>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014625]
>>>>> [<ffffffffc027f38c>] pending_complete+0x1ac/0x378 [dm_snapshot]
>>>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014642]
>>>>> [<ffffffffc0282488>] persistent_commit_exception+0x140/0x218
>>>>> [dm_snapshot]
>>>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014659]
>>>>> [<ffffffffc027db90>] copy_callback+0x108/0x1a0 [dm_snapshot]
>>>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014688]
>>>>> [<ffffffffc01eb6a4>] run_complete_job+0x8c/0x148 [dm_mod]
>>>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014724]
>>>>> [<ffffffffc01eaae8>] process_jobs+0xc8/0x1e0 [dm_mod]
>>>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014759]
>>>>> [<ffffffffc01eb1e0>] do_work+0xb8/0x110 [dm_mod]
>>>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014783]
>>>>> [<ffffffff808b18e0>] process_one_work+0x190/0x480
>>>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014791]
>>>>> [<ffffffff808b1d18>] worker_thread+0x148/0x580
>>>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014801]
>>>>> [<ffffffff808b813c>] kthread+0xdc/0xf8
>>>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014811]
>>>>> [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
>>>>> Sep 24 12:08:48.974658 warn CFPU-1 kernel: [  280.014817]
>>>>> Sep 24 12:08:48.974658 err CFPU-1 kernel: [  280.015046] INFO: task
>>>>> drbd_r_r1:7772 blocked for more than 120 seconds.
>>>>> Sep 24 12:08:48.974658 err CFPU-1 kernel: [  280.021766]
>>>>> Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
>>>>> Sep 24 12:08:48.974658 err CFPU-1 kernel: [  280.029781] "echo 0 >
>>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>> Sep 24 12:08:48.982578 info CFPU-1 kernel: [  280.037637] drbd_r_r1
>>>>>    D ffffffff80e1db78     0  7772      2 0x00100002
>>>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648] Stack :
>>>>> 00000000000207e8 ffffffff80b35530 8000000788e77860 8000000788e77860
>>>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
>>>>> 7fffffffffffffff 80000007f8d1cb28 8000000788e0a880 0000000000000002
>>>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
>>>>> ffffffff80e20000 80000007f9306480 00000000000207e8 ffffffff808dfe58
>>>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
>>>>> 80000007eebf5168 ffffffff80e1db78 0000000002411200 ffffffff80e208d8
>>>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
>>>>> 80000007eab5b440 ffffffff8097b25c 00000000000207a0 ffffffff808dfe58
>>>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
>>>>> 80000007f8d1ca00 ffffffff809ce014 80000007f9306000 0000000002011200
>>>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
>>>>> 7fffffffffffffff 80000007f8d1cb28 8000000788e0a880 0000000000000002
>>>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
>>>>> ffffffff80e20000 ffffffff80e1f8d8 80000007f8d1cb30 80000007f8d1cb30
>>>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
>>>>> 8000000788e0a880 00ffffff808db9e8 0000000000000000 80000007f8d1cb28
>>>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]
>>>>> 80000007f9306480 ffffffffc02834d0 ffffffff808e0000 ffffffff808dfba4
>>>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037648]   ...
>>>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037774] Call Trace:
>>>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037797]
>>>>> [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
>>>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037849]
>>>>> [<ffffffff80e1db78>] schedule+0x38/0x98
>>>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037865]
>>>>> [<ffffffff80e208d8>] schedule_timeout+0x240/0x2a0
>>>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037874]
>>>>> [<ffffffff80e1f8d8>] __down+0x90/0xd8
>>>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037884]
>>>>> [<ffffffff808dfba4>] down+0x54/0x70
>>>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037901]
>>>>> [<ffffffffc027da48>] start_copy+0x98/0xd8 [dm_snapshot]
>>>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037934]
>>>>> [<ffffffffc027e7d4>] __origin_write+0x184/0x2c0 [dm_snapshot]
>>>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.037968]
>>>>> [<ffffffffc027ee4c>] do_origin.isra.13+0xa4/0x110 [dm_snapshot]
>>>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038016]
>>>>> [<ffffffffc01dc2c0>] __map_bio+0xb0/0x258 [dm_mod]
>>>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038071]
>>>>> [<ffffffffc01deb94>] __split_and_process_bio+0x274/0x488 [dm_mod]
>>>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038126]
>>>>> [<ffffffffc01dee3c>] dm_make_request+0x94/0x128 [dm_mod]
>>>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038174]
>>>>> [<ffffffff80b3347c>] generic_make_request+0x114/0x290
>>>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038238]
>>>>> [<ffffffffc0511090>] drbd_submit_peer_request+0x258/0x618 [drbd]
>>>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038337]
>>>>> [<ffffffffc05121f8>] receive_RSDataReply+0x3b8/0x770 [drbd]
>>>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038426]
>>>>> [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
>>>>> Sep 24 12:08:48.983234 warn CFPU-1 kernel: [  280.038526]
>>>>> [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
>>>>> Sep 24 12:08:48.985184 warn CFPU-1 kernel: [  280.038588]
>>>>> [<ffffffff808b813c>] kthread+0xdc/0xf8
>>>>> Sep 24 12:08:48.985184 warn CFPU-1 kernel: [  280.038604]
>>>>> [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
>>>>> Sep 24 12:08:48.985184 warn CFPU-1 kernel: [  280.038612]
>>>>> Sep 24 12:08:48.985184 err CFPU-1 kernel: [  280.038629] INFO: task
>>>>> drbd_r_r5:7910 blocked for more than 120 seconds.
>>>>> Sep 24 12:08:48.990218 err CFPU-1 kernel: [  280.045351]
>>>>> Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
>>>>> Sep 24 12:08:48.998215 err CFPU-1 kernel: [  280.053372] "echo 0 >
>>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>> Sep 24 12:08:49.006141 info CFPU-1 kernel: [  280.061248] drbd_r_r5
>>>>>    D ffffffff80e1db78     0  7910      2 0x00100002
>>>>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257] Stack :
>>>>> 0000000000000001 ffffffff80b35530 8000000788ef7950 8000000788ef7950
>>>>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
>>>>> 8000000787936c00 0000000000000002 ffffffffc0285294 00000000000049ba
>>>>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
>>>>> 0000000000000001 00000000000049ba 0000000000000001 ffffffff80b2aa90
>>>>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
>>>>> 0000000000000001 ffffffff80e1db78 8000000787936c00 ffffffff80e204b8
>>>>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
>>>>> 800000078785b9c0 80000007fde8bb30 8000000787936c00 0000000100000000
>>>>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
>>>>> ffffffffc0280000 80000007eff43920 8000000788ca0198 ffffffffc027edec
>>>>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
>>>>> 80000007eff43900 c000000001f5b080 80000007eff43920 ffffffffc01dc2c0
>>>>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
>>>>> 80000007eb29bd80 0000000000000000 0000000002400000 80000007eff42900
>>>>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
>>>>> 00000000000049ba ffffffff80b2abec 8000000788ef7a50 8000000788ef7a50
>>>>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]
>>>>> ffffffffc01dbad8 c000000001f5b080 c000000001f5b080 80000007eff43900
>>>>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061257]   ...
>>>>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061375] Call Trace:
>>>>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061392]
>>>>> [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
>>>>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061435]
>>>>> [<ffffffff80e1db78>] schedule+0x38/0x98
>>>>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061445]
>>>>> [<ffffffff80e204b8>] __down_read+0xa8/0xf0
>>>>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061463]
>>>>> [<ffffffffc027edec>] do_origin.isra.13+0x44/0x110 [dm_snapshot]
>>>>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061509]
>>>>> [<ffffffffc01dc2c0>] __map_bio+0xb0/0x258 [dm_mod]
>>>>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061563]
>>>>> [<ffffffffc01deb94>] __split_and_process_bio+0x274/0x488 [dm_mod]
>>>>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061598]
>>>>> [<ffffffffc01dee3c>] dm_make_request+0x94/0x128 [dm_mod]
>>>>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061624]
>>>>> [<ffffffff80b3347c>] generic_make_request+0x114/0x290
>>>>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061674]
>>>>> [<ffffffffc0511090>] drbd_submit_peer_request+0x258/0x618 [drbd]
>>>>> Sep 24 12:08:49.006513 warn CFPU-1 kernel: [  280.061739]
>>>>> [<ffffffffc0512c78>] receive_Data+0x6c8/0x1368 [drbd]
>>>>> Sep 24 12:08:49.007987 warn CFPU-1 kernel: [  280.061816]
>>>>> [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
>>>>> Sep 24 12:08:49.007987 warn CFPU-1 kernel: [  280.061882]
>>>>> [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
>>>>> Sep 24 12:08:49.007987 warn CFPU-1 kernel: [  280.061922]
>>>>> [<ffffffff808b813c>] kthread+0xdc/0xf8
>>>>> Sep 24 12:08:49.007987 warn CFPU-1 kernel: [  280.061931]
>>>>> [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
>>>>> Sep 24 12:08:49.007987 warn CFPU-1 kernel: [  280.061937]
>>>>> Sep 24 12:08:49.007987 err CFPU-1 kernel: [  280.061948] INFO: task
>>>>> drbd_r_r6:7991 blocked for more than 120 seconds.
>>>>> Sep 24 12:08:49.013584 err CFPU-1 kernel: [  280.068669]
>>>>> Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
>>>>> Sep 24 12:08:49.021541 err CFPU-1 kernel: [  280.076683] "echo 0 >
>>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>> Sep 24 12:08:49.029370 info CFPU-1 kernel: [  280.084523] drbd_r_r6
>>>>>    D ffffffff80e1db78     0  7991      2 0x00100002
>>>>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532] Stack :
>>>>> 0000000000000001 ffffffff80b35530 800000078785b950 800000078785b950
>>>>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
>>>>> 8000000787b79b00 0000000000000002 ffffffffc0285294 000000000000049a
>>>>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
>>>>> 0000000000000001 000000000000049a 0000000000000001 ffffffff80b2aa90
>>>>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
>>>>> 0000000000000001 ffffffff80e1db78 8000000787b79b00 ffffffff80e204b8
>>>>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
>>>>> 800000078465b940 8000000788ef79c0 8000000787b79b00 0000000100000000
>>>>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
>>>>> ffffffffc0280000 80000007e90d6c20 80000007878cf2d8 ffffffffc027edec
>>>>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
>>>>> 80000007e90d6c00 c000000001fcc080 80000007e90d6c20 ffffffffc01dc2c0
>>>>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
>>>>> 80000007f93fd900 0000000000000000 0000000002400000 80000007e90d6a00
>>>>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
>>>>> 000000000000049a ffffffff80b2abec 800000078785ba50 800000078785ba50
>>>>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]
>>>>> ffffffffc01dbad8 c000000001fcc080 c000000001fcc080 80000007e90d6c00
>>>>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084532]   ...
>>>>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084610] Call Trace:
>>>>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084621]
>>>>> [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
>>>>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084631]
>>>>> [<ffffffff80e1db78>] schedule+0x38/0x98
>>>>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084641]
>>>>> [<ffffffff80e204b8>] __down_read+0xa8/0xf0
>>>>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084659]
>>>>> [<ffffffffc027edec>] do_origin.isra.13+0x44/0x110 [dm_snapshot]
>>>>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084696]
>>>>> [<ffffffffc01dc2c0>] __map_bio+0xb0/0x258 [dm_mod]
>>>>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084729]
>>>>> [<ffffffffc01deb94>] __split_and_process_bio+0x274/0x488 [dm_mod]
>>>>> Sep 24 12:08:49.029567 warn CFPU-1 kernel: [  280.084777]
>>>>> [<ffffffffc01dee3c>] dm_make_request+0x94/0x128 [dm_mod]
>>>>> Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.084810]
>>>>> [<ffffffff80b3347c>] generic_make_request+0x114/0x290
>>>>> Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.084863]
>>>>> [<ffffffffc0511090>] drbd_submit_peer_request+0x258/0x618 [drbd]
>>>>> Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.084928]
>>>>> [<ffffffffc0512c78>] receive_Data+0x6c8/0x1368 [drbd]
>>>>> Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.084992]
>>>>> [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
>>>>> Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.085057]
>>>>> [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
>>>>> Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.085097]
>>>>> [<ffffffff808b813c>] kthread+0xdc/0xf8
>>>>> Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.085106]
>>>>> [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
>>>>> Sep 24 12:08:49.031000 warn CFPU-1 kernel: [  280.085112]
>>>>> Sep 24 12:08:49.031000 err CFPU-1 kernel: [  280.085121] INFO: task
>>>>> drbd_r_r7:8046 blocked for more than 120 seconds.
>>>>> Sep 24 12:08:49.036834 err CFPU-1 kernel: [  280.091854]
>>>>> Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
>>>>> Sep 24 12:08:49.044747 err CFPU-1 kernel: [  280.099876] "echo 0 >
>>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>> Sep 24 12:08:49.056612 info CFPU-1 kernel: [  280.107739] drbd_r_r7
>>>>>    D ffffffff80e1db78     0  8046      2 0x00100000
>>>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749] Stack :
>>>>> 0000000000000000 0000000000000010 ffffffff811b3a80 0000000000000000
>>>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
>>>>> 7fffffffffffffff 7fffffffffffffff 0000000000000000 0000000000000002
>>>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
>>>>> 8000000788fcfa98 0000000000000000 ffffffffc053baa8 ffffffff80be4360
>>>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
>>>>> ffffffffc0550000 ffffffff80e1db78 ffffffff81070660 ffffffff80e208d8
>>>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
>>>>> 8000000797a43600 ffffffff808c4878 80000007ff2f4100 0000000000000000
>>>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
>>>>> 80000007f93fdb40 0000000000000000 80000007ff007000 ffffffff81070660
>>>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
>>>>> 8000000788fcfa90 7fffffffffffffff 0000000000000000 0000000000000002
>>>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
>>>>> 8000000788fcfa98 ffffffff80e1e8a0 0000000100000000 8000000797a42880
>>>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
>>>>> ffffffff808c4b60 8000000788fcfaa0 8000000788fcfaa0 ffffffffc053baa8
>>>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]
>>>>> ffffffff80be4360 ffffffff81cb4540 ffffffff81cb0000 80000007f93fdb40
>>>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107749]   ...
>>>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107821] Call Trace:
>>>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107833]
>>>>> [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
>>>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107842]
>>>>> [<ffffffff80e1db78>] schedule+0x38/0x98
>>>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107850]
>>>>> [<ffffffff80e208d8>] schedule_timeout+0x240/0x2a0
>>>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107859]
>>>>> [<ffffffff80e1e8a0>] wait_for_common+0xd8/0x1b0
>>>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107873]
>>>>> [<ffffffff808acc20>] call_usermodehelper_exec+0x110/0x188
>>>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107932]
>>>>> [<ffffffffc053da3c>] drbd_khelper+0x1c4/0x310 [drbd]
>>>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.107998]
>>>>> [<ffffffffc0500a14>] drbd_start_resync+0x534/0x968 [drbd]
>>>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.108060]
>>>>> [<ffffffffc05037dc>] receive_sync_uuid+0x2d4/0x5a0 [drbd]
>>>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.108124]
>>>>> [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
>>>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.108189]
>>>>> [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
>>>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.108229]
>>>>> [<ffffffff808b813c>] kthread+0xdc/0xf8
>>>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.108239]
>>>>> [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
>>>>> Sep 24 12:08:49.056612 warn CFPU-1 kernel: [  280.108245]
>>>>> Sep 24 12:08:49.056612 err CFPU-1 kernel: [  280.108471] INFO: task
>>>>> drbd_r_r8:8120 blocked for more than 120 seconds.
>>>>> Sep 24 12:08:49.060085 err CFPU-1 kernel: [  280.115194]
>>>>> Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
>>>>> Sep 24 12:08:49.075924 err CFPU-1 kernel: [  280.123218] "echo 0 >
>>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>> Sep 24 12:08:49.075924 info CFPU-1 kernel: [  280.131070] drbd_r_r8
>>>>>    D ffffffff80e1db78     0  8120      2 0x00100002
>>>>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083] Stack :
>>>>> 0000000000000000 0000000000000010 ffffffff811b3a80 0000000000000000
>>>>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
>>>>> 7fffffffffffffff 7fffffffffffffff 0000000000000000 0000000000000002
>>>>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
>>>>> 80000007851d3a98 0000000000000000 ffffffffc053baa8 ffffffff80be4360
>>>>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
>>>>> ffffffffc0550000 ffffffff80e1db78 ffffffff81070660 ffffffff80e208d8
>>>>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
>>>>> 80000007f96f6c00 ffffffff808c4878 80000007ff2f4100 0000000000000000
>>>>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
>>>>> 80000007ee99f360 0000000000000000 80000007ff007000 ffffffff81070660
>>>>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
>>>>> 80000007851d3a90 7fffffffffffffff 0000000000000000 0000000000000002
>>>>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
>>>>> 80000007851d3a98 ffffffff80e1e8a0 0000000100000000 8000000784ba3600
>>>>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
>>>>> ffffffff808c4b60 80000007851d3aa0 80000007851d3aa0 ffffffffc053baa8
>>>>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]
>>>>> ffffffff80be4360 ffffffff81cb4540 ffffffff81cb0000 80000007ee99f360
>>>>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131083]   ...
>>>>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131162] Call Trace:
>>>>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131174]
>>>>> [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
>>>>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131183]
>>>>> [<ffffffff80e1db78>] schedule+0x38/0x98
>>>>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131195]
>>>>> [<ffffffff80e208d8>] schedule_timeout+0x240/0x2a0
>>>>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131204]
>>>>> [<ffffffff80e1e8a0>] wait_for_common+0xd8/0x1b0
>>>>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131216]
>>>>> [<ffffffff808acc20>] call_usermodehelper_exec+0x110/0x188
>>>>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131274]
>>>>> [<ffffffffc053da3c>] drbd_khelper+0x1c4/0x310 [drbd]
>>>>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131341]
>>>>> [<ffffffffc0500a14>] drbd_start_resync+0x534/0x968 [drbd]
>>>>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131403]
>>>>> [<ffffffffc05037dc>] receive_sync_uuid+0x2d4/0x5a0 [drbd]
>>>>> Sep 24 12:08:49.076256 warn CFPU-1 kernel: [  280.131482]
>>>>> [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
>>>>> Sep 24 12:08:49.077857 warn CFPU-1 kernel: [  280.131557]
>>>>> [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
>>>>> Sep 24 12:08:49.077857 warn CFPU-1 kernel: [  280.131597]
>>>>> [<ffffffff808b813c>] kthread+0xdc/0xf8
>>>>> Sep 24 12:08:49.077857 warn CFPU-1 kernel: [  280.131606]
>>>>> [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
>>>>> Sep 24 12:08:49.077857 warn CFPU-1 kernel: [  280.131612]
>>>>> Sep 24 12:08:49.077857 err CFPU-1 kernel: [  280.131641] INFO: task
>>>>> drbd_r_r9:8173 blocked for more than 120 seconds.
>>>>> Sep 24 12:08:49.091245 err CFPU-1 kernel: [  280.138358]
>>>>> Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
>>>>> Sep 24 12:08:49.091245 err CFPU-1 kernel: [  280.146373] "echo 0 >
>>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>> Sep 24 12:08:49.099067 info CFPU-1 kernel: [  280.154219] drbd_r_r9
>>>>>    D ffffffff80e1db78     0  8173      2 0x00100002
>>>>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229] Stack :
>>>>> 0000000000000000 0000000000000010 ffffffff811b3a80 0000000000000000
>>>>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
>>>>> 7fffffffffffffff 7fffffffffffffff 0000000000000000 0000000000000002
>>>>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
>>>>> 8000000783c9fa98 0000000000000000 ffffffffc053baa8 ffffffff80be4360
>>>>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
>>>>> ffffffffc0550000 ffffffff80e1db78 ffffffff81070660 ffffffff80e208d8
>>>>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
>>>>> 8000000797a45100 ffffffff808c4878 80000007ff2f4100 0000000000000000
>>>>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
>>>>> 8000000788ca9d20 0000000000000000 80000007ff007000 ffffffff81070660
>>>>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
>>>>> 8000000783c9fa90 7fffffffffffffff 0000000000000000 0000000000000002
>>>>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
>>>>> 8000000783c9fa98 ffffffff80e1e8a0 0000000100000000 80000007843d4380
>>>>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
>>>>> ffffffff808c4b60 8000000783c9faa0 8000000783c9faa0 ffffffffc053baa8
>>>>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]
>>>>> ffffffff80be4360 ffffffff81cb4540 ffffffff81cb0000 8000000788ca9d20
>>>>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154229]   ...
>>>>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154305] Call Trace:
>>>>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154315]
>>>>> [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
>>>>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154325]
>>>>> [<ffffffff80e1db78>] schedule+0x38/0x98
>>>>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154333]
>>>>> [<ffffffff80e208d8>] schedule_timeout+0x240/0x2a0
>>>>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154342]
>>>>> [<ffffffff80e1e8a0>] wait_for_common+0xd8/0x1b0
>>>>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154354]
>>>>> [<ffffffff808acc20>] call_usermodehelper_exec+0x110/0x188
>>>>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154410]
>>>>> [<ffffffffc053da3c>] drbd_khelper+0x1c4/0x310 [drbd]
>>>>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154475]
>>>>> [<ffffffffc0500a14>] drbd_start_resync+0x534/0x968 [drbd]
>>>>> Sep 24 12:08:49.099314 warn CFPU-1 kernel: [  280.154540]
>>>>> [<ffffffffc05037dc>] receive_sync_uuid+0x2d4/0x5a0 [drbd]
>>>>> Sep 24 12:08:49.100714 warn CFPU-1 kernel: [  280.154612]
>>>>> [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
>>>>> Sep 24 12:08:49.100714 warn CFPU-1 kernel: [  280.154678]
>>>>> [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
>>>>> Sep 24 12:08:49.100714 warn CFPU-1 kernel: [  280.154717]
>>>>> [<ffffffff808b813c>] kthread+0xdc/0xf8
>>>>> Sep 24 12:08:49.100714 warn CFPU-1 kernel: [  280.154726]
>>>>> [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
>>>>> Sep 24 12:08:49.100714 warn CFPU-1 kernel: [  280.154732]
>>>>> Sep 24 12:08:49.100714 err CFPU-1 kernel: [  280.154742] INFO: task
>>>>> drbd_r_r10:8254 blocked for more than 120 seconds.
>>>>> Sep 24 12:08:49.106539 err CFPU-1 kernel: [  280.161554]
>>>>> Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
>>>>> Sep 24 12:08:49.114430 err CFPU-1 kernel: [  280.169569] "echo 0 >
>>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>> Sep 24 12:08:49.129749 info CFPU-1 kernel: [  280.177434] drbd_r_r10
>>>>>    D ffffffff80e1db78     0  8254      2 0x00100002
>>>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445] Stack :
>>>>> 0000000000000000 0000000000000010 ffffffff811b3a80 0000000000000000
>>>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
>>>>> 7fffffffffffffff 7fffffffffffffff 0000000000000000 0000000000000002
>>>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
>>>>> 80000007846efa98 0000000000000000 ffffffffc053baa8 ffffffff80be4360
>>>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
>>>>> ffffffffc0550000 ffffffff80e1db78 ffffffff81070660 ffffffff80e208d8
>>>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
>>>>> 8000000784893600 ffffffff808c4878 80000007ff2f4100 0000000000000000
>>>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
>>>>> 80000007e9498180 0000000000000000 80000007ff007000 ffffffff81070660
>>>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
>>>>> 80000007846efa90 7fffffffffffffff 0000000000000000 0000000000000002
>>>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
>>>>> 80000007846efa98 ffffffff80e1e8a0 0000000100000000 8000000783cf8d80
>>>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
>>>>> ffffffff808c4b60 80000007846efaa0 80000007846efaa0 ffffffffc053baa8
>>>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]
>>>>> ffffffff80be4360 ffffffff81cb4540 ffffffff81cb0000 80000007e9498180
>>>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177445]   ...
>>>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177517] Call Trace:
>>>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177529]
>>>>> [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
>>>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177539]
>>>>> [<ffffffff80e1db78>] schedule+0x38/0x98
>>>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177547]
>>>>> [<ffffffff80e208d8>] schedule_timeout+0x240/0x2a0
>>>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177556]
>>>>> [<ffffffff80e1e8a0>] wait_for_common+0xd8/0x1b0
>>>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177569]
>>>>> [<ffffffff808acc20>] call_usermodehelper_exec+0x110/0x188
>>>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177631]
>>>>> [<ffffffffc053da3c>] drbd_khelper+0x1c4/0x310 [drbd]
>>>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177697]
>>>>> [<ffffffffc0500a14>] drbd_start_resync+0x534/0x968 [drbd]
>>>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177760]
>>>>> [<ffffffffc05037dc>] receive_sync_uuid+0x2d4/0x5a0 [drbd]
>>>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177823]
>>>>> [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
>>>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177888]
>>>>> [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
>>>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177928]
>>>>> [<ffffffff808b813c>] kthread+0xdc/0xf8
>>>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177938]
>>>>> [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
>>>>> Sep 24 12:08:49.129749 warn CFPU-1 kernel: [  280.177944]
>>>>> Sep 24 12:08:49.129749 err CFPU-1 kernel: [  280.177960] INFO: task
>>>>> drbd_r_r11:8328 blocked for more than 120 seconds.
>>>>> Sep 24 12:08:49.129749 err CFPU-1 kernel: [  280.184775]
>>>>> Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
>>>>> Sep 24 12:08:49.137674 err CFPU-1 kernel: [  280.192793] "echo 0 >
>>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>> Sep 24 12:08:49.152797 info CFPU-1 kernel: [  280.200644] drbd_r_r11
>>>>>    D ffffffff80e1db78     0  8328      2 0x00100002
>>>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654] Stack :
>>>>> 0000000000000000 0000000000000010 ffffffff811b3a80 0000000000000000
>>>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
>>>>> 7fffffffffffffff 7fffffffffffffff 0000000000000000 0000000000000002
>>>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
>>>>> 8000000783c5fa98 0000000000000000 ffffffffc053baa8 ffffffff80be4360
>>>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
>>>>> ffffffffc0550000 ffffffff80e1db78 ffffffff81070660 ffffffff80e208d8
>>>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
>>>>> 8000000787935100 ffffffff808c4878 80000007ff2f4100 0000000000000000
>>>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
>>>>> 80000007eebd25a0 0000000000000000 80000007ff007000 ffffffff81070660
>>>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
>>>>> 8000000783c5fa90 7fffffffffffffff 0000000000000000 0000000000000002
>>>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
>>>>> 8000000783c5fa98 ffffffff80e1e8a0 0000000100000000 800000078371de80
>>>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
>>>>> ffffffff808c4b60 8000000783c5faa0 8000000783c5faa0 ffffffffc053baa8
>>>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]
>>>>> ffffffff80be4360 ffffffff81cb4540 ffffffff81cb0000 80000007eebd25a0
>>>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200654]   ...
>>>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200726] Call Trace:
>>>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200739]
>>>>> [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
>>>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200749]
>>>>> [<ffffffff80e1db78>] schedule+0x38/0x98
>>>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200757]
>>>>> [<ffffffff80e208d8>] schedule_timeout+0x240/0x2a0
>>>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200766]
>>>>> [<ffffffff80e1e8a0>] wait_for_common+0xd8/0x1b0
>>>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200778]
>>>>> [<ffffffff808acc20>] call_usermodehelper_exec+0x110/0x188
>>>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200840]
>>>>> [<ffffffffc053da3c>] drbd_khelper+0x1c4/0x310 [drbd]
>>>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200906]
>>>>> [<ffffffffc0500a14>] drbd_start_resync+0x534/0x968 [drbd]
>>>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.200969]
>>>>> [<ffffffffc05037dc>] receive_sync_uuid+0x2d4/0x5a0 [drbd]
>>>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.201032]
>>>>> [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
>>>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.201097]
>>>>> [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
>>>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.201137]
>>>>> [<ffffffff808b813c>] kthread+0xdc/0xf8
>>>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.201147]
>>>>> [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c
>>>>> Sep 24 12:08:49.152797 warn CFPU-1 kernel: [  280.201153]
>>>>> Sep 24 12:08:49.152797 err CFPU-1 kernel: [  280.201183] INFO: task
>>>>> lvcreate:8585 blocked for more than 120 seconds.
>>>>> Sep 24 12:08:49.152797 err CFPU-1 kernel: [  280.207823]
>>>>> Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
>>>>> Sep 24 12:08:49.160715 err CFPU-1 kernel: [  280.215860] "echo 0 >
>>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>> Sep 24 12:08:49.168570 info CFPU-1 kernel: [  280.223704] lvcreate
>>>>>    D ffffffff80e1db78     0  8585   8582 0x00100000
>>>>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719] Stack :
>>>>> ffffffff809ce420 ffffffffc027e170 80000007ea05c720 ffffffff809ce014
>>>>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
>>>>> ffffffffc0285294 ffffffffc0285290 8000000797bdb600 0000000000000002
>>>>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
>>>>> ffffffffc02834d0 ffffffffc0280000 ffffffff809ce420 ffffffffc027e170
>>>>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
>>>>> 80000007f9275d80 ffffffff80e1db78 ffffffffc0285294 ffffffff80e20584
>>>>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
>>>>> 80000007fde8bb30 ffffffffc0285298 8000000797bdb600 0000000000000000
>>>>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
>>>>> c00000000205a080 0000000000000000 c00000000205a080 80000007ea05c600
>>>>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
>>>>> 80000007fec10300 ffffffffc027fd20 8000000780cb0150 ffffffffc01e2e9c
>>>>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
>>>>> 00000008c0284b30 00000002c0284b30 80000007ea05c620 ffffffffc0280000
>>>>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
>>>>> c00000000205a080 80000007ea05de00 0000000000000000 8000000780cb0150
>>>>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]
>>>>> 8000000780cb0160 0000000001d4c000 0000000000000000 8000000780cb0000
>>>>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223719]   ...
>>>>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223796] Call Trace:
>>>>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223805]
>>>>> [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
>>>>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223814]
>>>>> [<ffffffff80e1db78>] schedule+0x38/0x98
>>>>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223824]
>>>>> [<ffffffff80e20584>] __down_write_nested+0x84/0xe8
>>>>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223843]
>>>>> [<ffffffffc027fd20>] snapshot_ctr+0x4d0/0x868 [dm_snapshot]
>>>>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223886]
>>>>> [<ffffffffc01e31d4>] dm_table_add_target+0x164/0x418 [dm_mod]
>>>>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223921]
>>>>> [<ffffffffc01e7bdc>] table_load+0x194/0x478 [dm_mod]
>>>>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.223956]
>>>>> [<ffffffffc01e8e1c>] ctl_ioctl+0x424/0x678 [dm_mod]
>>>>> Sep 24 12:08:49.168796 warn CFPU-1 kernel: [  280.224006]
>>>>> [<ffffffffc01e90a0>] dm_ctl_ioctl+0x30/0x40 [dm_mod]
>>>>> Sep 24 12:08:49.170177 warn CFPU-1 kernel: [  280.224040]
>>>>> [<ffffffff809f29cc>] do_vfs_ioctl+0x38c/0x5f8
>>>>> Sep 24 12:08:49.170177 warn CFPU-1 kernel: [  280.224049]
>>>>> [<ffffffff809f2c98>] SyS_ioctl+0x60/0xc8
>>>>> Sep 24 12:08:49.170177 warn CFPU-1 kernel: [  280.224059]
>>>>> [<ffffffff80879b70>] syscall_common+0x34/0x58
>>>>> Sep 24 12:08:49.170177 warn CFPU-1 kernel: [  280.224065]
>>>>>
>>>
>>>
>>>
>>> --
>>> Guruswamy Basavaiah
>>>
> 
> 
> 

* Re: Fix "dm kcopyd: Fix bug causing workqueue stalls" causes dead lock
  2019-10-01 12:43         ` Nikos Tsironis
@ 2019-10-09 14:13           ` Mike Snitzer
  2019-10-09 15:44             ` Nikos Tsironis
  0 siblings, 1 reply; 19+ messages in thread
From: Mike Snitzer @ 2019-10-09 14:13 UTC (permalink / raw)
  To: Nikos Tsironis, Guruswamy Basavaiah; +Cc: dm-devel, agk, iliastsi

On Tue, Oct 01 2019 at  8:43am -0400,
Nikos Tsironis <ntsironis@arrikto.com> wrote:

> On 10/1/19 3:27 PM, Guruswamy Basavaiah wrote:
> > Hello Nikos,
> >  Yes, issue is consistently reproducible with us, in a particular
> > set-up and test case.
> >  I will get the access to set-up next week, will try to test and let
> > you know the results before end of next week.
> > 
> 
> That sounds great!
> 
> Thanks a lot,
> Nikos

Hi Guru,

Any chance you could try this fix that I've staged to send to Linus?
https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.4&id=633b1613b2a49304743c18314bb6e6465c21fd8a

Short of that, Nikos: do you happen to have a test scenario that teases
out this deadlock?

Thanks,
Mike

* Re: Fix "dm kcopyd: Fix bug causing workqueue stalls" causes dead lock
  2019-10-09 14:13           ` Mike Snitzer
@ 2019-10-09 15:44             ` Nikos Tsironis
  2019-10-09 16:04               ` Mike Snitzer
  0 siblings, 1 reply; 19+ messages in thread
From: Nikos Tsironis @ 2019-10-09 15:44 UTC (permalink / raw)
  To: Mike Snitzer, Guruswamy Basavaiah; +Cc: dm-devel, agk, iliastsi

On 10/9/19 5:13 PM, Mike Snitzer wrote:
> On Tue, Oct 01 2019 at  8:43am -0400,
> Nikos Tsironis <ntsironis@arrikto.com> wrote:
> 
>> On 10/1/19 3:27 PM, Guruswamy Basavaiah wrote:
>>> Hello Nikos,
>>>  Yes, issue is consistently reproducible with us, in a particular
>>> set-up and test case.
>>>  I will get the access to set-up next week, will try to test and let
>>> you know the results before end of next week.
>>>
>>
>> That sounds great!
>>
>> Thanks a lot,
>> Nikos
> 
> Hi Guru,
> 
> Any chance you could try this fix that I've staged to send to Linus?
> https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.4&id=633b1613b2a49304743c18314bb6e6465c21fd8a
> 
> Short of that, Nikos: do you happen to have a test scenario that teases
> out this deadlock?
> 

Hi Mike,

Yes,

I created a 50G LV and took a snapshot of the same size:

  lvcreate -n data-lv -L50G testvg
  lvcreate -n snap-lv -L50G -s testvg/data-lv

Then I ran the following fio job:

[global]
randrepeat=1
ioengine=libaio
bs=1M
size=6G
offset_increment=6G
numjobs=8
direct=1
iodepth=32
group_reporting
filename=/dev/testvg/data-lv

[test]
rw=write
timeout=180

I ran it concurrently with the following script:

lvcreate -n dummy-lv -L1G testvg

while true
do
 lvcreate -n dummy-snap -L1M -s testvg/dummy-lv
 lvremove -f testvg/dummy-snap
done

This reproduced the deadlock for me. I also ran 'echo 30 >
/proc/sys/kernel/hung_task_timeout_secs', to reduce the hung task
timeout.
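
For convenience, the steps above can be collected into one script. This
is only a sketch: the VG and LV names are the ones used in this mail,
while JOB_FILE, the function names and the "run" argument are made up
for illustration. It assumes fio and the LVM tools are installed and
that it is run as root on a scratch volume group.

```shell
#!/bin/sh
# Sketch of the reproducer described above, gathered into one script.
# VG/LV names come from the mail; JOB_FILE is a hypothetical path.

VG=${VG:-testvg}
JOB_FILE=${JOB_FILE:-/tmp/snap-deadlock.fio}

# Write the fio job from the mail to the file named by $1.
write_fio_job() {
    cat > "$1" <<EOF
[global]
randrepeat=1
ioengine=libaio
bs=1M
size=6G
offset_increment=6G
numjobs=8
direct=1
iodepth=32
group_reporting
filename=/dev/$VG/data-lv

[test]
rw=write
timeout=180
EOF
}

# Create and remove a tiny snapshot in a loop, so that snapshot
# creation keeps running while fio generates COW traffic on data-lv.
snapshot_churn() {
    lvcreate -n dummy-lv -L1G "$VG"
    while true
    do
        lvcreate -n dummy-snap -L1M -s "$VG/dummy-lv"
        lvremove -f "$VG/dummy-snap"
    done
}

run_repro() {
    lvcreate -n data-lv -L50G "$VG"
    lvcreate -n snap-lv -L50G -s "$VG/data-lv"
    # Report hung tasks after 30 seconds instead of the default 120.
    echo 30 > /proc/sys/kernel/hung_task_timeout_secs
    write_fio_job "$JOB_FILE"
    snapshot_churn &
    churn_pid=$!
    fio "$JOB_FILE"
    kill "$churn_pid"
}

# Only touch LVM when explicitly asked to, e.g. "./repro.sh run".
if [ "${1:-}" = "run" ]; then
    run_repro
fi
```

Without the "run" argument the script only defines the functions, so
the job file can be generated and inspected without creating any LVs.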

Nikos.

> Thanks,
> Mike
> 

* Re: Fix "dm kcopyd: Fix bug causing workqueue stalls" causes dead lock
  2019-10-09 15:44             ` Nikos Tsironis
@ 2019-10-09 16:04               ` Mike Snitzer
  2019-10-09 20:03                 ` Guruswamy Basavaiah
  2019-10-10 11:58                 ` Nikos Tsironis
  0 siblings, 2 replies; 19+ messages in thread
From: Mike Snitzer @ 2019-10-09 16:04 UTC (permalink / raw)
  To: Nikos Tsironis; +Cc: dm-devel, iliastsi, agk, Guruswamy Basavaiah

On Wed, Oct 09 2019 at 11:44am -0400,
Nikos Tsironis <ntsironis@arrikto.com> wrote:

> On 10/9/19 5:13 PM, Mike Snitzer wrote:
> > On Tue, Oct 01 2019 at  8:43am -0400,
> > Nikos Tsironis <ntsironis@arrikto.com> wrote:
> > 
> >> On 10/1/19 3:27 PM, Guruswamy Basavaiah wrote:
> >>> Hello Nikos,
> >>>  Yes, issue is consistently reproducible with us, in a particular
> >>> set-up and test case.
> >>>  I will get the access to set-up next week, will try to test and let
> >>> you know the results before end of next week.
> >>>
> >>
> >> That sounds great!
> >>
> >> Thanks a lot,
> >> Nikos
> > 
> > Hi Guru,
> > 
> > Any chance you could try this fix that I've staged to send to Linus?
> > https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.4&id=633b1613b2a49304743c18314bb6e6465c21fd8a
> > 
> > Short of that, Nikos: do you happen to have a test scenario that teases
> > out this deadlock?
> > 
> 
> Hi Mike,
> 
> Yes,
> 
> I created a 50G LV and took a snapshot of the same size:
> 
>   lvcreate -n data-lv -L50G testvg
>   lvcreate -n snap-lv -L50G -s testvg/data-lv
> 
> Then I ran the following fio job:
> 
> [global]
> randrepeat=1
> ioengine=libaio
> bs=1M
> size=6G
> offset_increment=6G
> numjobs=8
> direct=1
> iodepth=32
> group_reporting
> filename=/dev/testvg/data-lv
> 
> [test]
> rw=write
> timeout=180
> 
> , concurrently with the following script:
> 
> lvcreate -n dummy-lv -L1G testvg
> 
> while true
> do
>  lvcreate -n dummy-snap -L1M -s testvg/dummy-lv
>  lvremove -f testvg/dummy-snap
> done
> 
> This reproduced the deadlock for me. I also ran 'echo 30 >
> /proc/sys/kernel/hung_task_timeout_secs', to reduce the hung task
> timeout.
> 
> Nikos.

Very nice, well done.  Have you tested with the fix I've staged (see
above)?  If so, does it resolve the deadlock?  If you've had success, I'd
be happy to update the tags in the commit header to include your
Tested-by before sending it to Linus.  Also, any review of the patch
that you can do would be appreciated, and a formal Reviewed-by reply
would be welcomed and folded in too.

Mike

* Re: Fix "dm kcopyd: Fix bug causing workqueue stalls" causes dead lock
  2019-10-09 16:04               ` Mike Snitzer
@ 2019-10-09 20:03                 ` Guruswamy Basavaiah
  2019-10-10  6:34                   ` Guruswamy Basavaiah
  2019-10-10 11:58                 ` Nikos Tsironis
  1 sibling, 1 reply; 19+ messages in thread
From: Guruswamy Basavaiah @ 2019-10-09 20:03 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: dm-devel, Nikos Tsironis, agk, iliastsi

Hello Mike,
I will have the test results before the end of Thursday.
Guru

On Wed, 9 Oct 2019 at 21:34, Mike Snitzer <snitzer@redhat.com> wrote:
>
> On Wed, Oct 09 2019 at 11:44am -0400,
> Nikos Tsironis <ntsironis@arrikto.com> wrote:
>
> > On 10/9/19 5:13 PM, Mike Snitzer wrote:
> > > On Tue, Oct 01 2019 at  8:43am -0400,
> > > Nikos Tsironis <ntsironis@arrikto.com> wrote:
> > >
> > >> On 10/1/19 3:27 PM, Guruswamy Basavaiah wrote:
> > >>> Hello Nikos,
> > >>>  Yes, issue is consistently reproducible with us, in a particular
> > >>> set-up and test case.
> > >>>  I will get the access to set-up next week, will try to test and let
> > >>> you know the results before end of next week.
> > >>>
> > >>
> > >> That sounds great!
> > >>
> > >> Thanks a lot,
> > >> Nikos
> > >
> > > Hi Guru,
> > >
> > > Any chance you could try this fix that I've staged to send to Linus?
> > > https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.4&id=633b1613b2a49304743c18314bb6e6465c21fd8a
> > >
> > > Short of that, Nikos: do you happen to have a test scenario that teases
> > > out this deadlock?
> > >
> >
> > Hi Mike,
> >
> > Yes,
> >
> > I created a 50G LV and took a snapshot of the same size:
> >
> >   lvcreate -n data-lv -L50G testvg
> >   lvcreate -n snap-lv -L50G -s testvg/data-lv
> >
> > Then I ran the following fio job:
> >
> > [global]
> > randrepeat=1
> > ioengine=libaio
> > bs=1M
> > size=6G
> > offset_increment=6G
> > numjobs=8
> > direct=1
> > iodepth=32
> > group_reporting
> > filename=/dev/testvg/data-lv
> >
> > [test]
> > rw=write
> > timeout=180
> >
> > , concurrently with the following script:
> >
> > lvcreate -n dummy-lv -L1G testvg
> >
> > while true
> > do
> >  lvcreate -n dummy-snap -L1M -s testvg/dummy-lv
> >  lvremove -f testvg/dummy-snap
> > done
> >
> > This reproduced the deadlock for me. I also ran 'echo 30 >
> > /proc/sys/kernel/hung_task_timeout_secs', to reduce the hung task
> > timeout.
> >
> > Nikos.
>
> Very nice, well done.  Curious if you've tested with the fix I've staged
> (see above)?  If so, does it resolve the deadlock?  If you've had
> success I'd be happy to update the tags in the commit header to include
> your Tested-by before sending it to Linus.  Also, any review of the
> patch that you can do would be appreciated and with your formal
> Reviewed-by reply would be welcomed and folded in too.
>
> Mike



-- 
Guruswamy Basavaiah

* Re: Fix "dm kcopyd: Fix bug causing workqueue stalls" causes dead lock
  2019-10-09 20:03                 ` Guruswamy Basavaiah
@ 2019-10-10  6:34                   ` Guruswamy Basavaiah
  2019-10-10 12:03                     ` Nikos Tsironis
  0 siblings, 1 reply; 19+ messages in thread
From: Guruswamy Basavaiah @ 2019-10-10  6:34 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: dm-devel, Nikos Tsironis, agk, iliastsi

Hello,
We use 4.4.184 in our builds, and the patch fails to apply.
Would it be possible to provide a patch for the 4.4.x branch?

Patch log:
patching file drivers/md/dm-snap.c
Hunk #1 succeeded at 19 (offset 1 line).
Hunk #2 succeeded at 105 (offset -1 lines).
Hunk #3 succeeded at 157 (offset -4 lines).
Hunk #4 succeeded at 1206 (offset -120 lines).
Hunk #5 FAILED at 1508.
Hunk #6 succeeded at 1412 (offset -124 lines).
Hunk #7 succeeded at 1425 (offset -124 lines).
Hunk #8 FAILED at 1925.
Hunk #9 succeeded at 1866 with fuzz 2 (offset -255 lines).
Hunk #10 succeeded at 2202 (offset -294 lines).
Hunk #11 succeeded at 2332 (offset -294 lines).
2 out of 11 hunks FAILED -- saving rejects to file drivers/md/dm-snap.c.rej
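
A log like the one above can be obtained without touching the tree by
using the dry-run mode of patch. The helper below is only a sketch: the
function name check_patch and its arguments are made up for
illustration, and it assumes a patch implementation that supports
--dry-run (GNU patch does).

```shell
# check_patch DIR PATCHFILE
# Report whether PATCHFILE applies cleanly to the tree in DIR.
# --dry-run makes patch go through the hunks without changing any
# file and without writing .rej files.
# PATCHFILE should be an absolute path, since the function changes
# directory before reading it.
check_patch() {
    if (cd "$1" && patch -p1 --dry-run < "$2" > /dev/null 2>&1); then
        echo "applies"
    else
        echo "rejects"
    fi
}
```

For example, check_patch /path/to/linux-4.4.184 /tmp/fix.patch (both
paths hypothetical) would print "rejects" for the patch discussed here.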

Guru

On Thu, 10 Oct 2019 at 01:33, Guruswamy Basavaiah <guru2018@gmail.com> wrote:
>
> Hello Mike,
>  I will get the testing result before end of Thursday.
> Guru
>
> On Wed, 9 Oct 2019 at 21:34, Mike Snitzer <snitzer@redhat.com> wrote:
> >
> > On Wed, Oct 09 2019 at 11:44am -0400,
> > Nikos Tsironis <ntsironis@arrikto.com> wrote:
> >
> > > On 10/9/19 5:13 PM, Mike Snitzer wrote:
> > > > On Tue, Oct 01 2019 at  8:43am -0400,
> > > > Nikos Tsironis <ntsironis@arrikto.com> wrote:
> > > >
> > > >> On 10/1/19 3:27 PM, Guruswamy Basavaiah wrote:
> > > >>> Hello Nikos,
> > > >>>  Yes, issue is consistently reproducible with us, in a particular
> > > >>> set-up and test case.
> > > >>>  I will get the access to set-up next week, will try to test and let
> > > >>> you know the results before end of next week.
> > > >>>
> > > >>
> > > >> That sounds great!
> > > >>
> > > >> Thanks a lot,
> > > >> Nikos
> > > >
> > > > Hi Guru,
> > > >
> > > > Any chance you could try this fix that I've staged to send to Linus?
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.4&id=633b1613b2a49304743c18314bb6e6465c21fd8a
> > > >
> > > > Short of that, Nikos: do you happen to have a test scenario that teases
> > > > out this deadlock?
> > > >
> > >
> > > Hi Mike,
> > >
> > > Yes,
> > >
> > > I created a 50G LV and took a snapshot of the same size:
> > >
> > >   lvcreate -n data-lv -L50G testvg
> > >   lvcreate -n snap-lv -L50G -s testvg/data-lv
> > >
> > > Then I ran the following fio job:
> > >
> > > [global]
> > > randrepeat=1
> > > ioengine=libaio
> > > bs=1M
> > > size=6G
> > > offset_increment=6G
> > > numjobs=8
> > > direct=1
> > > iodepth=32
> > > group_reporting
> > > filename=/dev/testvg/data-lv
> > >
> > > [test]
> > > rw=write
> > > timeout=180
> > >
> > > , concurrently with the following script:
> > >
> > > lvcreate -n dummy-lv -L1G testvg
> > >
> > > while true
> > > do
> > >  lvcreate -n dummy-snap -L1M -s testvg/dummy-lv
> > >  lvremove -f testvg/dummy-snap
> > > done
> > >
> > > This reproduced the deadlock for me. I also ran 'echo 30 >
> > > /proc/sys/kernel/hung_task_timeout_secs', to reduce the hung task
> > > timeout.
> > >
> > > Nikos.
> >
> > Very nice, well done.  Curious if you've tested with the fix I've staged
> > (see above)?  If so, does it resolve the deadlock?  If you've had
> > success I'd be happy to update the tags in the commit header to include
> > your Tested-by before sending it to Linus.  Also, any review of the
> > patch that you can do would be appreciated and with your formal
> > Reviewed-by reply would be welcomed and folded in too.
> >
> > Mike
>
>
>
> --
> Guruswamy Basavaiah



-- 
Guruswamy Basavaiah

* Re: Fix "dm kcopyd: Fix bug causing workqueue stalls" causes dead lock
  2019-10-09 16:04               ` Mike Snitzer
  2019-10-09 20:03                 ` Guruswamy Basavaiah
@ 2019-10-10 11:58                 ` Nikos Tsironis
  1 sibling, 0 replies; 19+ messages in thread
From: Nikos Tsironis @ 2019-10-10 11:58 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: dm-devel, iliastsi, agk, Guruswamy Basavaiah

On 10/9/19 7:04 PM, Mike Snitzer wrote:
> On Wed, Oct 09 2019 at 11:44am -0400,
> Nikos Tsironis <ntsironis@arrikto.com> wrote:
> 
>> On 10/9/19 5:13 PM, Mike Snitzer wrote:
>>> On Tue, Oct 01 2019 at  8:43am -0400,
>>> Nikos Tsironis <ntsironis@arrikto.com> wrote:
>>>
>>>> On 10/1/19 3:27 PM, Guruswamy Basavaiah wrote:
>>>>> Hello Nikos,
>>>>>  Yes, issue is consistently reproducible with us, in a particular
>>>>> set-up and test case.
>>>>>  I will get the access to set-up next week, will try to test and let
>>>>> you know the results before end of next week.
>>>>>
>>>>
>>>> That sounds great!
>>>>
>>>> Thanks a lot,
>>>> Nikos
>>>
>>> Hi Guru,
>>>
>>> Any chance you could try this fix that I've staged to send to Linus?
>>> https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.4&id=633b1613b2a49304743c18314bb6e6465c21fd8a
>>>
>>> Short of that, Nikos: do you happen to have a test scenario that teases
>>> out this deadlock?
>>>
>>
>> Hi Mike,
>>
>> Yes,
>>
>> I created a 50G LV and took a snapshot of the same size:
>>
>>   lvcreate -n data-lv -L50G testvg
>>   lvcreate -n snap-lv -L50G -s testvg/data-lv
>>
>> Then I ran the following fio job:
>>
>> [global]
>> randrepeat=1
>> ioengine=libaio
>> bs=1M
>> size=6G
>> offset_increment=6G
>> numjobs=8
>> direct=1
>> iodepth=32
>> group_reporting
>> filename=/dev/testvg/data-lv
>>
>> [test]
>> rw=write
>> timeout=180
>>
>> , concurrently with the following script:
>>
>> lvcreate -n dummy-lv -L1G testvg
>>
>> while true
>> do
>>  lvcreate -n dummy-snap -L1M -s testvg/dummy-lv
>>  lvremove -f testvg/dummy-snap
>> done
>>
>> This reproduced the deadlock for me. I also ran 'echo 30 >
>> /proc/sys/kernel/hung_task_timeout_secs', to reduce the hung task
>> timeout.
>>
>> Nikos.
> 
> Very nice, well done.  Curious if you've tested with the fix I've staged
> (see above)?  If so, does it resolve the deadlock?  If you've had
> success I'd be happy to update the tags in the commit header to include
> your Tested-by before sending it to Linus.  Also, any review of the
> patch that you can do would be appreciated and with your formal
> Reviewed-by reply would be welcomed and folded in too.
> 

Yes, I have tested the staged fix. I forgot to mention it in my previous
mail.

I ran the test for the default 'snapshot_cow_threshold' value of 2048
and I also ran it for a value of 1, to stress it a little more.

In both cases everything went fine, the deadlock was gone.
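
For anyone repeating the test with a different threshold, something
like the following could be used. It is a sketch under assumptions: I
am assuming the parameter is exposed at runtime under
/sys/module/dm_snapshot/parameters/ and is writable by root; the
function name set_cow_threshold and the PARAM variable are made up for
illustration.

```shell
# Assumed sysfs location of the module parameter; adjust if your
# kernel exposes it elsewhere.
PARAM=${PARAM:-/sys/module/dm_snapshot/parameters/snapshot_cow_threshold}

# set_cow_threshold VALUE
# Write VALUE to the parameter file and print the value read back.
set_cow_threshold() {
    if [ ! -w "$PARAM" ]; then
        echo "cannot write $PARAM" >&2
        return 1
    fi
    echo "$1" > "$PARAM"
    cat "$PARAM"
}
```

Calling set_cow_threshold 1 before starting the reproducer would
correspond to the second run described above, and set_cow_threshold 2048
to the default one.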

Nikos

> Mike
> 

* Re: Fix "dm kcopyd: Fix bug causing workqueue stalls" causes dead lock
  2019-10-10  6:34                   ` Guruswamy Basavaiah
@ 2019-10-10 12:03                     ` Nikos Tsironis
  2019-10-11 10:17                       ` Guruswamy Basavaiah
  0 siblings, 1 reply; 19+ messages in thread
From: Nikos Tsironis @ 2019-10-10 12:03 UTC (permalink / raw)
  To: Guruswamy Basavaiah, Mike Snitzer; +Cc: dm-devel, agk, iliastsi

[-- Attachment #1: Type: text/plain, Size: 3515 bytes --]

On 10/10/19 9:34 AM, Guruswamy Basavaiah wrote:
> Hello,
> We use 4.4.184 in our builds and the patch fails to apply.
> Is it possible to give a patch for 4.4.x branch ?
Hi Guru,

I have attached the two patches that fix the deadlock, rebased onto the
4.4.x branch.

Nikos

> 
> patching Logs.
> patching file drivers/md/dm-snap.c
> Hunk #1 succeeded at 19 (offset 1 line).
> Hunk #2 succeeded at 105 (offset -1 lines).
> Hunk #3 succeeded at 157 (offset -4 lines).
> Hunk #4 succeeded at 1206 (offset -120 lines).
> Hunk #5 FAILED at 1508.
> Hunk #6 succeeded at 1412 (offset -124 lines).
> Hunk #7 succeeded at 1425 (offset -124 lines).
> Hunk #8 FAILED at 1925.
> Hunk #9 succeeded at 1866 with fuzz 2 (offset -255 lines).
> Hunk #10 succeeded at 2202 (offset -294 lines).
> Hunk #11 succeeded at 2332 (offset -294 lines).
> 2 out of 11 hunks FAILED -- saving rejects to file drivers/md/dm-snap.c.rej
> 
> Guru
> 
> On Thu, 10 Oct 2019 at 01:33, Guruswamy Basavaiah <guru2018@gmail.com> wrote:
>>
>> Hello Mike,
>>  I will get the testing result before end of Thursday.
>> Guru
>>
>> On Wed, 9 Oct 2019 at 21:34, Mike Snitzer <snitzer@redhat.com> wrote:
>>>
>>> On Wed, Oct 09 2019 at 11:44am -0400,
>>> Nikos Tsironis <ntsironis@arrikto.com> wrote:
>>>
>>>> On 10/9/19 5:13 PM, Mike Snitzer wrote:
>>>>> On Tue, Oct 01 2019 at  8:43am -0400,
>>>>> Nikos Tsironis <ntsironis@arrikto.com> wrote:
>>>>>
>>>>>> On 10/1/19 3:27 PM, Guruswamy Basavaiah wrote:
>>>>>>> Hello Nikos,
>>>>>>>  Yes, issue is consistently reproducible with us, in a particular
>>>>>>> set-up and test case.
>>>>>>>  I will get the access to set-up next week, will try to test and let
>>>>>>> you know the results before end of next week.
>>>>>>>
>>>>>>
>>>>>> That sounds great!
>>>>>>
>>>>>> Thanks a lot,
>>>>>> Nikos
>>>>>
>>>>> Hi Guru,
>>>>>
>>>>> Any chance you could try this fix that I've staged to send to Linus?
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.4&id=633b1613b2a49304743c18314bb6e6465c21fd8a
>>>>>
>>>>> Short of that, Nikos: do you happen to have a test scenario that teases
>>>>> out this deadlock?
>>>>>
>>>>
>>>> Hi Mike,
>>>>
>>>> Yes,
>>>>
>>>> I created a 50G LV and took a snapshot of the same size:
>>>>
>>>>   lvcreate -n data-lv -L50G testvg
>>>>   lvcreate -n snap-lv -L50G -s testvg/data-lv
>>>>
>>>> Then I ran the following fio job:
>>>>
>>>> [global]
>>>> randrepeat=1
>>>> ioengine=libaio
>>>> bs=1M
>>>> size=6G
>>>> offset_increment=6G
>>>> numjobs=8
>>>> direct=1
>>>> iodepth=32
>>>> group_reporting
>>>> filename=/dev/testvg/data-lv
>>>>
>>>> [test]
>>>> rw=write
>>>> timeout=180
>>>>
>>>> , concurrently with the following script:
>>>>
>>>> lvcreate -n dummy-lv -L1G testvg
>>>>
>>>> while true
>>>> do
>>>>  lvcreate -n dummy-snap -L1M -s testvg/dummy-lv
>>>>  lvremove -f testvg/dummy-snap
>>>> done
>>>>
>>>> This reproduced the deadlock for me. I also ran 'echo 30 >
>>>> /proc/sys/kernel/hung_task_timeout_secs', to reduce the hung task
>>>> timeout.
>>>>
>>>> Nikos.
>>>
>>> Very nice, well done.  Curious if you've tested with the fix I've staged
>>> (see above)?  If so, does it resolve the deadlock?  If you've had
>>> success I'd be happy to update the tags in the commit header to include
>>> your Tested-by before sending it to Linus.  Also, any review of the
>>> patch that you can do would be appreciated and with your formal
>>> Reviewed-by reply would be welcomed and folded in too.
>>>
>>> Mike
>>
>>
>>
>> --
>> Guruswamy Basavaiah
> 
> 
> 

[-- Attachment #2: 0001-dm-snapshot-introduce-account_start_copy-and-account.patch --]
[-- Type: text/x-patch, Size: 1903 bytes --]

From 5b1ae3cfc07e53e6e6e37f9f40b074dd7a8536b9 Mon Sep 17 00:00:00 2001
From: Mikulas Patocka <mpatocka@redhat.com>
Date: Wed, 2 Oct 2019 06:14:17 -0400
Subject: [PATCH 1/2] dm snapshot: introduce account_start_copy() and
 account_end_copy()

This simple refactoring moves code for modifying the semaphore cow_count
into separate functions to prepare for changes that will extend these
methods to provide for a more sophisticated mechanism for COW
throttling.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-snap.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index 5d3797728b9c..a9c82fd036c6 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -1398,6 +1398,16 @@ static void snapshot_dtr(struct dm_target *ti)
 	kfree(s);
 }
 
+static void account_start_copy(struct dm_snapshot *s)
+{
+	down(&s->cow_count);
+}
+
+static void account_end_copy(struct dm_snapshot *s)
+{
+	up(&s->cow_count);
+}
+
 /*
  * Flush a list of buffers.
  */
@@ -1582,7 +1592,7 @@ static void copy_callback(int read_err, unsigned long write_err, void *context)
 		}
 		list_add(&pe->out_of_order_entry, lh);
 	}
-	up(&s->cow_count);
+	account_end_copy(s);
 }
 
 /*
@@ -1606,7 +1616,7 @@ static void start_copy(struct dm_snap_pending_exception *pe)
 	dest.count = src.count;
 
 	/* Hand over to kcopyd */
-	down(&s->cow_count);
+	account_start_copy(s);
 	dm_kcopyd_copy(s->kcopyd_client, &src, 1, &dest, 0, copy_callback, pe);
 }
 
@@ -1627,7 +1637,7 @@ static void start_full_bio(struct dm_snap_pending_exception *pe,
 	pe->full_bio_end_io = bio->bi_end_io;
 	pe->full_bio_private = bio->bi_private;
 
-	down(&s->cow_count);
+	account_start_copy(s);
 	callback_data = dm_kcopyd_prepare_callback(s->kcopyd_client,
 						   copy_callback, pe);
 
-- 
2.11.0


[-- Attachment #3: 0002-dm-snapshot-rework-COW-throttling-to-fix-deadlock.patch --]
[-- Type: text/x-patch, Size: 7999 bytes --]

From cec63eda390b8759adf9c1888530b90b12a16903 Mon Sep 17 00:00:00 2001
From: Mikulas Patocka <mpatocka@redhat.com>
Date: Wed, 2 Oct 2019 06:15:53 -0400
Subject: [PATCH 2/2] dm snapshot: rework COW throttling to fix deadlock

Commit 721b1d98fb517a ("dm snapshot: Fix excessive memory usage and
workqueue stalls") introduced a semaphore to limit the maximum number of
in-flight kcopyd (COW) jobs.

The implementation of this throttling mechanism is prone to a deadlock:

1. One or more threads write to the origin device causing COW, which is
   performed by kcopyd.

2. At some point some of these threads might reach the s->cow_count
   semaphore limit and block in down(&s->cow_count), holding a read lock
   on _origins_lock.

3. Someone tries to acquire a write lock on _origins_lock, e.g.,
   snapshot_ctr(), which blocks because the threads at step (2) already
   hold a read lock on it.

4. A COW operation completes and kcopyd runs dm-snapshot's completion
   callback, which ends up calling pending_complete().
   pending_complete() tries to resubmit any deferred origin bios. This
   requires acquiring a read lock on _origins_lock, which blocks.

   This happens because the read-write semaphore implementation gives
   priority to writers, meaning that as soon as a writer tries to enter
   the critical section, no readers will be allowed in, until all
   writers have completed their work.

   So, pending_complete() waits for the writer at step (3) to acquire
   and release the lock. This writer waits for the readers at step (2)
   to release the read lock and those readers wait for
   pending_complete() (the kcopyd thread) to signal the s->cow_count
   semaphore: DEADLOCK.

The above was thoroughly analyzed and documented by Nikos Tsironis as
part of his initial proposal for fixing this deadlock, see:
https://www.redhat.com/archives/dm-devel/2019-October/msg00001.html

Fix this deadlock by reworking COW throttling so that it waits without
holding any locks. Add a variable 'in_progress' that counts how many
kcopyd jobs are running. A function wait_for_in_progress() will sleep if
'in_progress' is over the limit. It drops _origins_lock in order to
avoid the deadlock.

Reported-by: Guruswamy Basavaiah <guru2018@gmail.com>
Reported-by: Nikos Tsironis <ntsironis@arrikto.com>
Fixes: 721b1d98fb51 ("dm snapshot: Fix excessive memory usage and workqueue stalls")
Cc: stable@vger.kernel.org # v5.0+
Depends-on: 4a3f111a73a8c ("dm snapshot: introduce account_start_copy() and account_end_copy()")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-snap.c | 78 ++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 64 insertions(+), 14 deletions(-)

diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index a9c82fd036c6..0141b7089506 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -19,7 +19,6 @@
 #include <linux/vmalloc.h>
 #include <linux/log2.h>
 #include <linux/dm-kcopyd.h>
-#include <linux/semaphore.h>
 
 #include "dm.h"
 
@@ -106,8 +105,8 @@ struct dm_snapshot {
 	/* The on disk metadata handler */
 	struct dm_exception_store *store;
 
-	/* Maximum number of in-flight COW jobs. */
-	struct semaphore cow_count;
+	unsigned in_progress;
+	struct wait_queue_head in_progress_wait;
 
 	struct dm_kcopyd_client *kcopyd_client;
 
@@ -158,8 +157,8 @@ struct dm_snapshot {
  */
 #define DEFAULT_COW_THRESHOLD 2048
 
-static int cow_threshold = DEFAULT_COW_THRESHOLD;
-module_param_named(snapshot_cow_threshold, cow_threshold, int, 0644);
+static unsigned cow_threshold = DEFAULT_COW_THRESHOLD;
+module_param_named(snapshot_cow_threshold, cow_threshold, uint, 0644);
 MODULE_PARM_DESC(snapshot_cow_threshold, "Maximum number of chunks being copied on write");
 
 DECLARE_DM_KCOPYD_THROTTLE_WITH_MODULE_PARM(snapshot_copy_throttle,
@@ -1207,7 +1206,7 @@ static int snapshot_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 		goto bad_hash_tables;
 	}
 
-	sema_init(&s->cow_count, (cow_threshold > 0) ? cow_threshold : INT_MAX);
+	init_waitqueue_head(&s->in_progress_wait);
 
 	s->kcopyd_client = dm_kcopyd_client_create(&dm_kcopyd_throttle);
 	if (IS_ERR(s->kcopyd_client)) {
@@ -1395,17 +1394,54 @@ static void snapshot_dtr(struct dm_target *ti)
 
 	dm_put_device(ti, s->origin);
 
+	WARN_ON(s->in_progress);
+
 	kfree(s);
 }
 
 static void account_start_copy(struct dm_snapshot *s)
 {
-	down(&s->cow_count);
+	spin_lock(&s->in_progress_wait.lock);
+	s->in_progress++;
+	spin_unlock(&s->in_progress_wait.lock);
 }
 
 static void account_end_copy(struct dm_snapshot *s)
 {
-	up(&s->cow_count);
+	spin_lock(&s->in_progress_wait.lock);
+	BUG_ON(!s->in_progress);
+	s->in_progress--;
+	if (likely(s->in_progress <= cow_threshold) &&
+	    unlikely(waitqueue_active(&s->in_progress_wait)))
+		wake_up_locked(&s->in_progress_wait);
+	spin_unlock(&s->in_progress_wait.lock);
+}
+
+static bool wait_for_in_progress(struct dm_snapshot *s, bool unlock_origins)
+{
+	if (unlikely(s->in_progress > cow_threshold)) {
+		spin_lock(&s->in_progress_wait.lock);
+		if (likely(s->in_progress > cow_threshold)) {
+			/*
+			 * NOTE: this throttle doesn't account for whether
+			 * the caller is servicing an IO that will trigger a COW
+			 * so excess throttling may result for chunks not required
+			 * to be COW'd.  But if cow_threshold was reached, extra
+			 * throttling is unlikely to negatively impact performance.
+			 */
+			DECLARE_WAITQUEUE(wait, current);
+			__add_wait_queue(&s->in_progress_wait, &wait);
+			__set_current_state(TASK_UNINTERRUPTIBLE);
+			spin_unlock(&s->in_progress_wait.lock);
+			if (unlock_origins)
+				up_read(&_origins_lock);
+			io_schedule();
+			remove_wait_queue(&s->in_progress_wait, &wait);
+			return false;
+		}
+		spin_unlock(&s->in_progress_wait.lock);
+	}
+	return true;
 }
 
 /*
@@ -1423,7 +1459,7 @@ static void flush_bios(struct bio *bio)
 	}
 }
 
-static int do_origin(struct dm_dev *origin, struct bio *bio);
+static int do_origin(struct dm_dev *origin, struct bio *bio, bool limit);
 
 /*
  * Flush a list of buffers.
@@ -1436,7 +1472,7 @@ static void retry_origin_bios(struct dm_snapshot *s, struct bio *bio)
 	while (bio) {
 		n = bio->bi_next;
 		bio->bi_next = NULL;
-		r = do_origin(s->origin, bio);
+		r = do_origin(s->origin, bio, false);
 		if (r == DM_MAPIO_REMAPPED)
 			generic_make_request(bio);
 		bio = n;
@@ -1728,6 +1764,11 @@ static int snapshot_map(struct dm_target *ti, struct bio *bio)
 	if (!s->valid)
 		return -EIO;
 
+	if (bio_data_dir(bio) == WRITE) {
+		while (unlikely(!wait_for_in_progress(s, false)))
+			; /* wait_for_in_progress() has slept */
+	}
+
 	/* FIXME: should only take write lock if we need
 	 * to copy an exception */
 	down_write(&s->lock);
@@ -1877,7 +1918,7 @@ static int snapshot_merge_map(struct dm_target *ti, struct bio *bio)
 
 	if (bio_rw(bio) == WRITE) {
 		up_write(&s->lock);
-		return do_origin(s->origin, bio);
+		return do_origin(s->origin, bio, false);
 	}
 
 out_unlock:
@@ -2213,15 +2254,24 @@ static int __origin_write(struct list_head *snapshots, sector_t sector,
 /*
  * Called on a write from the origin driver.
  */
-static int do_origin(struct dm_dev *origin, struct bio *bio)
+static int do_origin(struct dm_dev *origin, struct bio *bio, bool limit)
 {
 	struct origin *o;
 	int r = DM_MAPIO_REMAPPED;
 
+again:
 	down_read(&_origins_lock);
 	o = __lookup_origin(origin->bdev);
-	if (o)
+	if (o) {
+		if (limit) {
+			struct dm_snapshot *s;
+			list_for_each_entry(s, &o->snapshots, list)
+				if (unlikely(!wait_for_in_progress(s, true)))
+					goto again;
+		}
+
 		r = __origin_write(&o->snapshots, bio->bi_iter.bi_sector, bio);
+	}
 	up_read(&_origins_lock);
 
 	return r;
@@ -2334,7 +2384,7 @@ static int origin_map(struct dm_target *ti, struct bio *bio)
 		dm_accept_partial_bio(bio, available_sectors);
 
 	/* Only tell snapshots if this is a write */
-	return do_origin(o->dev, bio);
+	return do_origin(o->dev, bio, true);
 }
 
 /*
-- 
2.11.0


[-- Attachment #4: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: Fix "dm kcopyd: Fix bug causing workqueue stalls" causes dead lock
  2019-10-10 12:03                     ` Nikos Tsironis
@ 2019-10-11 10:17                       ` Guruswamy Basavaiah
  2019-10-11 11:39                         ` Nikos Tsironis
  0 siblings, 1 reply; 19+ messages in thread
From: Guruswamy Basavaiah @ 2019-10-11 10:17 UTC (permalink / raw)
  To: Nikos Tsironis; +Cc: dm-devel, agk, Mike Snitzer, iliastsi

[-- Attachment #1: Type: text/plain, Size: 4245 bytes --]

Hello Nikos,
 I applied these patches and tested them.
 We still see hung_task_timeout backtraces, and the drbd Resync is still blocked.
 I have attached the backtrace; please let me know if you need any other information.

 In patch "0002-dm-snapshot-rework-COW-throttling-to-fix-deadlock.patch"
I changed "struct wait_queue_head" to "wait_queue_head_t", as I was
getting a compilation error with the former.

On Thu, 10 Oct 2019 at 17:33, Nikos Tsironis <ntsironis@arrikto.com> wrote:
>
> On 10/10/19 9:34 AM, Guruswamy Basavaiah wrote:
> > Hello,
> > We use 4.4.184 in our builds and the patch fails to apply.
> > Is it possible to provide a patch for the 4.4.x branch?
> Hi Guru,
>
> I have attached the two patches fixing the deadlock, rebased onto the 4.4.x branch.
>
> Nikos
>
> >
> > Patch log:
> > patching file drivers/md/dm-snap.c
> > Hunk #1 succeeded at 19 (offset 1 line).
> > Hunk #2 succeeded at 105 (offset -1 lines).
> > Hunk #3 succeeded at 157 (offset -4 lines).
> > Hunk #4 succeeded at 1206 (offset -120 lines).
> > Hunk #5 FAILED at 1508.
> > Hunk #6 succeeded at 1412 (offset -124 lines).
> > Hunk #7 succeeded at 1425 (offset -124 lines).
> > Hunk #8 FAILED at 1925.
> > Hunk #9 succeeded at 1866 with fuzz 2 (offset -255 lines).
> > Hunk #10 succeeded at 2202 (offset -294 lines).
> > Hunk #11 succeeded at 2332 (offset -294 lines).
> > 2 out of 11 hunks FAILED -- saving rejects to file drivers/md/dm-snap.c.rej
> >
> > Guru
> >
> > On Thu, 10 Oct 2019 at 01:33, Guruswamy Basavaiah <guru2018@gmail.com> wrote:
> >>
> >> Hello Mike,
> >>  I will have the test results before the end of Thursday.
> >> Guru
> >>
> >> On Wed, 9 Oct 2019 at 21:34, Mike Snitzer <snitzer@redhat.com> wrote:
> >>>
> >>> On Wed, Oct 09 2019 at 11:44am -0400,
> >>> Nikos Tsironis <ntsironis@arrikto.com> wrote:
> >>>
> >>>> On 10/9/19 5:13 PM, Mike Snitzer wrote:
> >>>>> On Tue, Oct 01 2019 at  8:43am -0400,
> >>>>> Nikos Tsironis <ntsironis@arrikto.com> wrote:
> >>>>>
> >>>>>> On 10/1/19 3:27 PM, Guruswamy Basavaiah wrote:
> >>>>>>> Hello Nikos,
> >>>>>>>  Yes, the issue is consistently reproducible for us, in a particular
> >>>>>>> set-up and test case.
> >>>>>>>  I will get access to the set-up next week; I will try to test and let
> >>>>>>> you know the results before the end of next week.
> >>>>>>>
> >>>>>>
> >>>>>> That sounds great!
> >>>>>>
> >>>>>> Thanks a lot,
> >>>>>> Nikos
> >>>>>
> >>>>> Hi Guru,
> >>>>>
> >>>>> Any chance you could try this fix that I've staged to send to Linus?
> >>>>> https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.4&id=633b1613b2a49304743c18314bb6e6465c21fd8a
> >>>>>
> >>>>> Short of that, Nikos: do you happen to have a test scenario that teases
> >>>>> out this deadlock?
> >>>>>
> >>>>
> >>>> Hi Mike,
> >>>>
> >>>> Yes,
> >>>>
> >>>> I created a 50G LV and took a snapshot of the same size:
> >>>>
> >>>>   lvcreate -n data-lv -L50G testvg
> >>>>   lvcreate -n snap-lv -L50G -s testvg/data-lv
> >>>>
> >>>> Then I ran the following fio job:
> >>>>
> >>>> [global]
> >>>> randrepeat=1
> >>>> ioengine=libaio
> >>>> bs=1M
> >>>> size=6G
> >>>> offset_increment=6G
> >>>> numjobs=8
> >>>> direct=1
> >>>> iodepth=32
> >>>> group_reporting
> >>>> filename=/dev/testvg/data-lv
> >>>>
> >>>> [test]
> >>>> rw=write
> >>>> timeout=180
> >>>>
> >>>> , concurrently with the following script:
> >>>>
> >>>> lvcreate -n dummy-lv -L1G testvg
> >>>>
> >>>> while true
> >>>> do
> >>>>  lvcreate -n dummy-snap -L1M -s testvg/dummy-lv
> >>>>  lvremove -f testvg/dummy-snap
> >>>> done
> >>>>
> >>>> This reproduced the deadlock for me. I also ran 'echo 30 >
> >>>> /proc/sys/kernel/hung_task_timeout_secs', to reduce the hung task
> >>>> timeout.
> >>>>
> >>>> Nikos.
> >>>
> >>> Very nice, well done.  Curious if you've tested with the fix I've staged
> >>> (see above)?  If so, does it resolve the deadlock?  If you've had
> >>> success I'd be happy to update the tags in the commit header to include
> >>> your Tested-by before sending it to Linus.  Also, any review of the
> >>> patch that you can do would be appreciated, and your formal
> >>> Reviewed-by reply would be welcomed and folded in too.
> >>>
> >>> Mike
> >>
> >>
> >>
> >> --
> >> Guruswamy Basavaiah
> >
> >
> >



-- 
Guruswamy Basavaiah

[-- Attachment #2: reboot.1.log --]
[-- Type: text/x-log, Size: 26740 bytes --]

[  279.965655] INFO: task drbd_r_r4:7898 blocked for more than 120 seconds.
[  279.972382]       Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
[  279.980404] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  279.988248] drbd_r_r4       D ffffffff80e1db78     0  7898      2 0x00100000
[  279.988258] Stack : ffffffff81d00000 ffffffff8114aa38 0000000000000000 ffffffff808dfdb8
               	  7fffffffffffffff 0000000000000000 80000000c9df7c38 7fffffffffffffff
               	  ffffffffc0280000 80000007ea1a0498 0000000000000001 ffffffffc0280000
               	  80000007efcf0200 ffffffff80e1db78 8000000788ae7830 ffffffff80e208d8
               	  0000000000000001 ffffffff80b35530 8000000788ae7820 8000000788ae7820
               	  8000000788ae7830 8000000788ae7830 ffffffff80fb772d 80000007efcf0200
               	  80000000c9df7380 0000000000000000 80000000c9df7c38 7fffffffffffffff
               	  ffffffffc0280000 ffffffff80e1d0b4 0000000000000002 80000007efcf0200
               	  ffffffffc0280000 0000000000000001 80000007efcf0330 ffffffffc027eb14
               	  0000000000000000 8000000788ac6c00 ffffffff808c4b60 80000007efcf0338
               	  ...
[  279.988330] Call Trace:
[  279.988342] [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
[  279.988351] [<ffffffff80e1db78>] schedule+0x38/0x98
[  279.988359] [<ffffffff80e208d8>] schedule_timeout+0x240/0x2a0
[  279.988368] [<ffffffff80e1d0b4>] io_schedule_timeout+0x8c/0xc0
[  279.988389] [<ffffffffc027eb14>] wait_for_in_progress+0x12c/0x168 [dm_snapshot]
[  279.988406] [<ffffffffc027ec34>] do_origin+0xe4/0x170 [dm_snapshot]
[  279.988446] [<ffffffffc01dc2c0>] __map_bio+0xb0/0x258 [dm_mod]
[  279.988479] [<ffffffffc01deb94>] __split_and_process_bio+0x274/0x488 [dm_mod]
[  279.988511] [<ffffffffc01dee3c>] dm_make_request+0x94/0x128 [dm_mod]
[  279.988535] [<ffffffff80b3347c>] generic_make_request+0x114/0x290
[  279.988543] [<ffffffff80b336c0>] submit_bio+0xc8/0x1e0
[  279.988604] [<ffffffffc051cf60>] drbd_md_sync_page_io+0x360/0x670 [drbd]
[  279.988671] [<ffffffffc052ae20>] drbd_md_write+0x1c8/0x320 [drbd]
[  279.988738] [<ffffffffc052b0d4>] drbd_md_sync+0x15c/0x350 [drbd]
[  279.988803] [<ffffffffc0500bcc>] drbd_start_resync+0x6ec/0x968 [drbd]
[  279.988866] [<ffffffffc05037dc>] receive_sync_uuid+0x2d4/0x5a0 [drbd]
[  279.988929] [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
[  279.988994] [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
[  279.989035] [<ffffffff808b813c>] kthread+0xdc/0xf8
[  279.989045] [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c

[  399.988466] INFO: task drbd_r_r4:7898 blocked for more than 120 seconds.
[  399.995189]       Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
[  400.003206] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  400.011066] drbd_r_r4       D ffffffff80e1db78     0  7898      2 0x00100000
[  400.011076] Stack : ffffffff81d00000 ffffffff8114aa38 0000000000000000 ffffffff808dfdb8
               	  7fffffffffffffff 0000000000000000 80000000c9df7c38 7fffffffffffffff
               	  ffffffffc0280000 80000007ea1a0498 0000000000000001 ffffffffc0280000
               	  80000007efcf0200 ffffffff80e1db78 8000000788ae7830 ffffffff80e208d8
               	  0000000000000001 ffffffff80b35530 8000000788ae7820 8000000788ae7820
               	  8000000788ae7830 8000000788ae7830 ffffffff80fb772d 80000007efcf0200
               	  80000000c9df7380 0000000000000000 80000000c9df7c38 7fffffffffffffff
               	  ffffffffc0280000 ffffffff80e1d0b4 0000000000000002 80000007efcf0200
               	  ffffffffc0280000 0000000000000001 80000007efcf0330 ffffffffc027eb14
               	  0000000000000000 8000000788ac6c00 ffffffff808c4b60 80000007efcf0338
               	  ...
[  400.011148] Call Trace:
[  400.011159] [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
[  400.011169] [<ffffffff80e1db78>] schedule+0x38/0x98
[  400.011177] [<ffffffff80e208d8>] schedule_timeout+0x240/0x2a0
[  400.011186] [<ffffffff80e1d0b4>] io_schedule_timeout+0x8c/0xc0
[  400.011205] [<ffffffffc027eb14>] wait_for_in_progress+0x12c/0x168 [dm_snapshot]
[  400.011222] [<ffffffffc027ec34>] do_origin+0xe4/0x170 [dm_snapshot]
[  400.011263] [<ffffffffc01dc2c0>] __map_bio+0xb0/0x258 [dm_mod]
[  400.011295] [<ffffffffc01deb94>] __split_and_process_bio+0x274/0x488 [dm_mod]
[  400.011328] [<ffffffffc01dee3c>] dm_make_request+0x94/0x128 [dm_mod]
[  400.011351] [<ffffffff80b3347c>] generic_make_request+0x114/0x290
[  400.011359] [<ffffffff80b336c0>] submit_bio+0xc8/0x1e0
[  400.011419] [<ffffffffc051cf60>] drbd_md_sync_page_io+0x360/0x670 [drbd]
[  400.011486] [<ffffffffc052ae20>] drbd_md_write+0x1c8/0x320 [drbd]
[  400.011554] [<ffffffffc052b0d4>] drbd_md_sync+0x15c/0x350 [drbd]
[  400.011618] [<ffffffffc0500bcc>] drbd_start_resync+0x6ec/0x968 [drbd]
[  400.011681] [<ffffffffc05037dc>] receive_sync_uuid+0x2d4/0x5a0 [drbd]
[  400.011744] [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
[  400.011809] [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
[  400.011850] [<ffffffff808b813c>] kthread+0xdc/0xf8
[  400.011860] [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c

[  520.011262] INFO: task drbd_r_r4:7898 blocked for more than 120 seconds.
[  520.017985]       Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
[  520.026006] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  520.033855] drbd_r_r4       D ffffffff80e1db78     0  7898      2 0x00100000
[  520.033896] Stack : ffffffff81d00000 ffffffff8114aa38 0000000000000000 ffffffff808dfdb8
               	  7fffffffffffffff 0000000000000000 80000000c9df7c38 7fffffffffffffff
               	  ffffffffc0280000 80000007ea1a0498 0000000000000001 ffffffffc0280000
               	  80000007efcf0200 ffffffff80e1db78 8000000788ae7830 ffffffff80e208d8
               	  0000000000000001 ffffffff80b35530 8000000788ae7820 8000000788ae7820
               	  8000000788ae7830 8000000788ae7830 ffffffff80fb772d 80000007efcf0200
               	  80000000c9df7380 0000000000000000 80000000c9df7c38 7fffffffffffffff
               	  ffffffffc0280000 ffffffff80e1d0b4 0000000000000002 80000007efcf0200
               	  ffffffffc0280000 0000000000000001 80000007efcf0330 ffffffffc027eb14
               	  0000000000000000 8000000788ac6c00 ffffffff808c4b60 80000007efcf0338
               	  ...
[  520.033976] Call Trace:
[  520.033988] [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
[  520.034014] [<ffffffff80e1db78>] schedule+0x38/0x98
[  520.034022] [<ffffffff80e208d8>] schedule_timeout+0x240/0x2a0
[  520.034031] [<ffffffff80e1d0b4>] io_schedule_timeout+0x8c/0xc0
[  520.034050] [<ffffffffc027eb14>] wait_for_in_progress+0x12c/0x168 [dm_snapshot]
[  520.034067] [<ffffffffc027ec34>] do_origin+0xe4/0x170 [dm_snapshot]
[  520.034107] [<ffffffffc01dc2c0>] __map_bio+0xb0/0x258 [dm_mod]
[  520.034140] [<ffffffffc01deb94>] __split_and_process_bio+0x274/0x488 [dm_mod]
[  520.034192] [<ffffffffc01dee3c>] dm_make_request+0x94/0x128 [dm_mod]
[  520.034230] [<ffffffff80b3347c>] generic_make_request+0x114/0x290
[  520.034241] [<ffffffff80b336c0>] submit_bio+0xc8/0x1e0
[  520.034301] [<ffffffffc051cf60>] drbd_md_sync_page_io+0x360/0x670 [drbd]
[  520.034368] [<ffffffffc052ae20>] drbd_md_write+0x1c8/0x320 [drbd]
[  520.034444] [<ffffffffc052b0d4>] drbd_md_sync+0x15c/0x350 [drbd]
[  520.034524] [<ffffffffc0500bcc>] drbd_start_resync+0x6ec/0x968 [drbd]
[  520.034589] [<ffffffffc05037dc>] receive_sync_uuid+0x2d4/0x5a0 [drbd]
[  520.034654] [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
[  520.034736] [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
[  520.034778] [<ffffffff808b813c>] kthread+0xdc/0xf8
[  520.034788] [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c

[  540.587484] dmxmsg: CAC hand 1002 sequence is 2.
[  600.590573] dmxmsg: CAC hand 1002 sequence is 3.
[  640.034067] INFO: task drbd_r_r4:7898 blocked for more than 120 seconds.
[  640.040796]       Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
[  640.048820] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  640.056675] drbd_r_r4       D ffffffff80e1db78     0  7898      2 0x00100000
[  640.056685] Stack : ffffffff81d00000 ffffffff8114aa38 0000000000000000 ffffffff808dfdb8
               	  7fffffffffffffff 0000000000000000 80000000c9df7c38 7fffffffffffffff
               	  ffffffffc0280000 80000007ea1a0498 0000000000000001 ffffffffc0280000
               	  80000007efcf0200 ffffffff80e1db78 8000000788ae7830 ffffffff80e208d8
               	  0000000000000001 ffffffff80b35530 8000000788ae7820 8000000788ae7820
               	  8000000788ae7830 8000000788ae7830 ffffffff80fb772d 80000007efcf0200
               	  80000000c9df7380 0000000000000000 80000000c9df7c38 7fffffffffffffff
               	  ffffffffc0280000 ffffffff80e1d0b4 0000000000000002 80000007efcf0200
               	  ffffffffc0280000 0000000000000001 80000007efcf0330 ffffffffc027eb14
               	  0000000000000000 8000000788ac6c00 ffffffff808c4b60 80000007efcf0338
               	  ...
[  640.056757] Call Trace:
[  640.056769] [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
[  640.056778] [<ffffffff80e1db78>] schedule+0x38/0x98
[  640.056786] [<ffffffff80e208d8>] schedule_timeout+0x240/0x2a0
[  640.056795] [<ffffffff80e1d0b4>] io_schedule_timeout+0x8c/0xc0
[  640.056814] [<ffffffffc027eb14>] wait_for_in_progress+0x12c/0x168 [dm_snapshot]
[  640.056831] [<ffffffffc027ec34>] do_origin+0xe4/0x170 [dm_snapshot]
[  640.056871] [<ffffffffc01dc2c0>] __map_bio+0xb0/0x258 [dm_mod]
[  640.056903] [<ffffffffc01deb94>] __split_and_process_bio+0x274/0x488 [dm_mod]
[  640.056935] [<ffffffffc01dee3c>] dm_make_request+0x94/0x128 [dm_mod]
[  640.056959] [<ffffffff80b3347c>] generic_make_request+0x114/0x290
[  640.056967] [<ffffffff80b336c0>] submit_bio+0xc8/0x1e0
[  640.057028] [<ffffffffc051cf60>] drbd_md_sync_page_io+0x360/0x670 [drbd]
[  640.057094] [<ffffffffc052ae20>] drbd_md_write+0x1c8/0x320 [drbd]
[  640.057161] [<ffffffffc052b0d4>] drbd_md_sync+0x15c/0x350 [drbd]
[  640.057226] [<ffffffffc0500bcc>] drbd_start_resync+0x6ec/0x968 [drbd]
[  640.057289] [<ffffffffc05037dc>] receive_sync_uuid+0x2d4/0x5a0 [drbd]
[  640.057352] [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
[  640.057417] [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
[  640.057458] [<ffffffff808b813c>] kthread+0xdc/0xf8
[  640.057468] [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c

[  733.653271] Process {pid:9165, uid:0, comm:HealthDetector} is killing process {pid:12786, comm:getFCSStats.sh}
[  760.056793] INFO: task drbd_r_r4:7898 blocked for more than 120 seconds.
[  760.063517]       Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
[  760.071540] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  760.079402] drbd_r_r4       D ffffffff80e1db78     0  7898      2 0x00100000
[  760.079412] Stack : ffffffff81d00000 ffffffff8114aa38 0000000000000000 ffffffff808dfdb8
               	  7fffffffffffffff 0000000000000000 80000000c9df7c38 7fffffffffffffff
               	  ffffffffc0280000 80000007ea1a0498 0000000000000001 ffffffffc0280000
               	  80000007efcf0200 ffffffff80e1db78 8000000788ae7830 ffffffff80e208d8
               	  0000000000000001 ffffffff80b35530 8000000788ae7820 8000000788ae7820
               	  8000000788ae7830 8000000788ae7830 ffffffff80fb772d 80000007efcf0200
               	  80000000c9df7380 0000000000000000 80000000c9df7c38 7fffffffffffffff
               	  ffffffffc0280000 ffffffff80e1d0b4 0000000000000002 80000007efcf0200
               	  ffffffffc0280000 0000000000000001 80000007efcf0330 ffffffffc027eb14
               	  0000000000000000 8000000788ac6c00 ffffffff808c4b60 80000007efcf0338
               	  ...
[  760.079483] Call Trace:
[  760.079495] [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
[  760.079504] [<ffffffff80e1db78>] schedule+0x38/0x98
[  760.079512] [<ffffffff80e208d8>] schedule_timeout+0x240/0x2a0
[  760.079521] [<ffffffff80e1d0b4>] io_schedule_timeout+0x8c/0xc0
[  760.079541] [<ffffffffc027eb14>] wait_for_in_progress+0x12c/0x168 [dm_snapshot]
[  760.079557] [<ffffffffc027ec34>] do_origin+0xe4/0x170 [dm_snapshot]
[  760.079598] [<ffffffffc01dc2c0>] __map_bio+0xb0/0x258 [dm_mod]
[  760.079631] [<ffffffffc01deb94>] __split_and_process_bio+0x274/0x488 [dm_mod]
[  760.079663] [<ffffffffc01dee3c>] dm_make_request+0x94/0x128 [dm_mod]
[  760.079686] [<ffffffff80b3347c>] generic_make_request+0x114/0x290
[  760.079695] [<ffffffff80b336c0>] submit_bio+0xc8/0x1e0
[  760.079755] [<ffffffffc051cf60>] drbd_md_sync_page_io+0x360/0x670 [drbd]
[  760.079822] [<ffffffffc052ae20>] drbd_md_write+0x1c8/0x320 [drbd]
[  760.079890] [<ffffffffc052b0d4>] drbd_md_sync+0x15c/0x350 [drbd]
[  760.079954] [<ffffffffc0500bcc>] drbd_start_resync+0x6ec/0x968 [drbd]
[  760.080017] [<ffffffffc05037dc>] receive_sync_uuid+0x2d4/0x5a0 [drbd]
[  760.080081] [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
[  760.080146] [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
[  760.080187] [<ffffffff808b813c>] kthread+0xdc/0xf8
[  760.080197] [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c

[  880.079674] INFO: task drbd_r_r4:7898 blocked for more than 120 seconds.
[  880.086398]       Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
[  880.094421] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  880.102263] drbd_r_r4       D ffffffff80e1db78     0  7898      2 0x00100000
[  880.102273] Stack : ffffffff81d00000 ffffffff8114aa38 0000000000000000 ffffffff808dfdb8
               	  7fffffffffffffff 0000000000000000 80000000c9df7c38 7fffffffffffffff
               	  ffffffffc0280000 80000007ea1a0498 0000000000000001 ffffffffc0280000
               	  80000007efcf0200 ffffffff80e1db78 8000000788ae7830 ffffffff80e208d8
               	  0000000000000001 ffffffff80b35530 8000000788ae7820 8000000788ae7820
               	  8000000788ae7830 8000000788ae7830 ffffffff80fb772d 80000007efcf0200
               	  80000000c9df7380 0000000000000000 80000000c9df7c38 7fffffffffffffff
               	  ffffffffc0280000 ffffffff80e1d0b4 0000000000000002 80000007efcf0200
               	  ffffffffc0280000 0000000000000001 80000007efcf0330 ffffffffc027eb14
               	  0000000000000000 8000000788ac6c00 ffffffff808c4b60 80000007efcf0338
               	  ...
[  880.102344] Call Trace:
[  880.102356] [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
[  880.102365] [<ffffffff80e1db78>] schedule+0x38/0x98
[  880.102373] [<ffffffff80e208d8>] schedule_timeout+0x240/0x2a0
[  880.102382] [<ffffffff80e1d0b4>] io_schedule_timeout+0x8c/0xc0
[  880.102402] [<ffffffffc027eb14>] wait_for_in_progress+0x12c/0x168 [dm_snapshot]
[  880.102419] [<ffffffffc027ec34>] do_origin+0xe4/0x170 [dm_snapshot]
[  880.102459] [<ffffffffc01dc2c0>] __map_bio+0xb0/0x258 [dm_mod]
[  880.102491] [<ffffffffc01deb94>] __split_and_process_bio+0x274/0x488 [dm_mod]
[  880.102523] [<ffffffffc01dee3c>] dm_make_request+0x94/0x128 [dm_mod]
[  880.102546] [<ffffffff80b3347c>] generic_make_request+0x114/0x290
[  880.102555] [<ffffffff80b336c0>] submit_bio+0xc8/0x1e0
[  880.102615] [<ffffffffc051cf60>] drbd_md_sync_page_io+0x360/0x670 [drbd]
[  880.102682] [<ffffffffc052ae20>] drbd_md_write+0x1c8/0x320 [drbd]
[  880.102749] [<ffffffffc052b0d4>] drbd_md_sync+0x15c/0x350 [drbd]
[  880.102814] [<ffffffffc0500bcc>] drbd_start_resync+0x6ec/0x968 [drbd]
[  880.102877] [<ffffffffc05037dc>] receive_sync_uuid+0x2d4/0x5a0 [drbd]
[  880.102940] [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
[  880.103005] [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
[  880.103046] [<ffffffff808b813c>] kthread+0xdc/0xf8
[  880.103056] [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c

[  900.595553] dmxmsg: CAC hand 1002 sequence is 8.
[  960.598987] dmxmsg: CAC hand 1002 sequence is 9.
[ 1000.102471] INFO: task drbd_r_r4:7898 blocked for more than 120 seconds.
[ 1000.109200]       Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
[ 1000.117218] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1000.125058] drbd_r_r4       D ffffffff80e1db78     0  7898      2 0x00100000
[ 1000.125068] Stack : ffffffff81d00000 ffffffff8114aa38 0000000000000000 ffffffff808dfdb8
               	  7fffffffffffffff 0000000000000000 80000000c9df7c38 7fffffffffffffff
               	  ffffffffc0280000 80000007ea1a0498 0000000000000001 ffffffffc0280000
               	  80000007efcf0200 ffffffff80e1db78 8000000788ae7830 ffffffff80e208d8
               	  0000000000000001 ffffffff80b35530 8000000788ae7820 8000000788ae7820
               	  8000000788ae7830 8000000788ae7830 ffffffff80fb772d 80000007efcf0200
               	  80000000c9df7380 0000000000000000 80000000c9df7c38 7fffffffffffffff
               	  ffffffffc0280000 ffffffff80e1d0b4 0000000000000002 80000007efcf0200
               	  ffffffffc0280000 0000000000000001 80000007efcf0330 ffffffffc027eb14
               	  0000000000000000 8000000788ac6c00 ffffffff808c4b60 80000007efcf0338
               	  ...
[ 1000.125144] Call Trace:
[ 1000.125157] [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
[ 1000.125166] [<ffffffff80e1db78>] schedule+0x38/0x98
[ 1000.125174] [<ffffffff80e208d8>] schedule_timeout+0x240/0x2a0
[ 1000.125183] [<ffffffff80e1d0b4>] io_schedule_timeout+0x8c/0xc0
[ 1000.125203] [<ffffffffc027eb14>] wait_for_in_progress+0x12c/0x168 [dm_snapshot]
[ 1000.125219] [<ffffffffc027ec34>] do_origin+0xe4/0x170 [dm_snapshot]
[ 1000.125260] [<ffffffffc01dc2c0>] __map_bio+0xb0/0x258 [dm_mod]
[ 1000.125292] [<ffffffffc01deb94>] __split_and_process_bio+0x274/0x488 [dm_mod]
[ 1000.125324] [<ffffffffc01dee3c>] dm_make_request+0x94/0x128 [dm_mod]
[ 1000.125376] [<ffffffff80b3347c>] generic_make_request+0x114/0x290
[ 1000.125394] [<ffffffff80b336c0>] submit_bio+0xc8/0x1e0
[ 1000.125458] [<ffffffffc051cf60>] drbd_md_sync_page_io+0x360/0x670 [drbd]
[ 1000.125525] [<ffffffffc052ae20>] drbd_md_write+0x1c8/0x320 [drbd]
[ 1000.125592] [<ffffffffc052b0d4>] drbd_md_sync+0x15c/0x350 [drbd]
[ 1000.125657] [<ffffffffc0500bcc>] drbd_start_resync+0x6ec/0x968 [drbd]
[ 1000.125719] [<ffffffffc05037dc>] receive_sync_uuid+0x2d4/0x5a0 [drbd]
[ 1000.125783] [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
[ 1000.125848] [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
[ 1000.125888] [<ffffffff808b813c>] kthread+0xdc/0xf8
[ 1000.125898] [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c

[ 1033.650169] Process {pid:9165, uid:0, comm:HealthDetector} is killing process {pid:16180, comm:getFCSStats.sh}
[ 1120.125262] INFO: task drbd_r_r4:7898 blocked for more than 120 seconds.
[ 1120.131989]       Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
[ 1120.140011] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1120.147862] drbd_r_r4       D ffffffff80e1db78     0  7898      2 0x00100000
[ 1120.147872] Stack : ffffffff81d00000 ffffffff8114aa38 0000000000000000 ffffffff808dfdb8
               	  7fffffffffffffff 0000000000000000 80000000c9df7c38 7fffffffffffffff
               	  ffffffffc0280000 80000007ea1a0498 0000000000000001 ffffffffc0280000
               	  80000007efcf0200 ffffffff80e1db78 8000000788ae7830 ffffffff80e208d8
               	  0000000000000001 ffffffff80b35530 8000000788ae7820 8000000788ae7820
               	  8000000788ae7830 8000000788ae7830 ffffffff80fb772d 80000007efcf0200
               	  80000000c9df7380 0000000000000000 80000000c9df7c38 7fffffffffffffff
               	  ffffffffc0280000 ffffffff80e1d0b4 0000000000000002 80000007efcf0200
               	  ffffffffc0280000 0000000000000001 80000007efcf0330 ffffffffc027eb14
               	  0000000000000000 8000000788ac6c00 ffffffff808c4b60 80000007efcf0338
               	  ...
[ 1120.147944] Call Trace:
[ 1120.147956] [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
[ 1120.147965] [<ffffffff80e1db78>] schedule+0x38/0x98
[ 1120.147973] [<ffffffff80e208d8>] schedule_timeout+0x240/0x2a0
[ 1120.147982] [<ffffffff80e1d0b4>] io_schedule_timeout+0x8c/0xc0
[ 1120.148002] [<ffffffffc027eb14>] wait_for_in_progress+0x12c/0x168 [dm_snapshot]
[ 1120.148019] [<ffffffffc027ec34>] do_origin+0xe4/0x170 [dm_snapshot]
[ 1120.148058] [<ffffffffc01dc2c0>] __map_bio+0xb0/0x258 [dm_mod]
[ 1120.148091] [<ffffffffc01deb94>] __split_and_process_bio+0x274/0x488 [dm_mod]
[ 1120.148123] [<ffffffffc01dee3c>] dm_make_request+0x94/0x128 [dm_mod]
[ 1120.148146] [<ffffffff80b3347c>] generic_make_request+0x114/0x290
[ 1120.148154] [<ffffffff80b336c0>] submit_bio+0xc8/0x1e0
[ 1120.148216] [<ffffffffc051cf60>] drbd_md_sync_page_io+0x360/0x670 [drbd]
[ 1120.148282] [<ffffffffc052ae20>] drbd_md_write+0x1c8/0x320 [drbd]
[ 1120.148350] [<ffffffffc052b0d4>] drbd_md_sync+0x15c/0x350 [drbd]
[ 1120.148415] [<ffffffffc0500bcc>] drbd_start_resync+0x6ec/0x968 [drbd]
[ 1120.148477] [<ffffffffc05037dc>] receive_sync_uuid+0x2d4/0x5a0 [drbd]
[ 1120.148541] [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
[ 1120.148606] [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
[ 1120.148647] [<ffffffff808b813c>] kthread+0xdc/0xf8
[ 1120.148657] [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c

[ 1240.148088] INFO: task drbd_r_r4:7898 blocked for more than 120 seconds.
[ 1240.154815]       Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
[ 1240.162841] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1240.170689] drbd_r_r4       D ffffffff80e1db78     0  7898      2 0x00100000
[ 1240.170712] Stack : ffffffff81d00000 ffffffff8114aa38 0000000000000000 ffffffff808dfdb8
               	  7fffffffffffffff 0000000000000000 80000000c9df7c38 7fffffffffffffff
               	  ffffffffc0280000 80000007ea1a0498 0000000000000001 ffffffffc0280000
               	  80000007efcf0200 ffffffff80e1db78 8000000788ae7830 ffffffff80e208d8
               	  0000000000000001 ffffffff80b35530 8000000788ae7820 8000000788ae7820
               	  8000000788ae7830 8000000788ae7830 ffffffff80fb772d 80000007efcf0200
               	  80000000c9df7380 0000000000000000 80000000c9df7c38 7fffffffffffffff
               	  ffffffffc0280000 ffffffff80e1d0b4 0000000000000002 80000007efcf0200
               	  ffffffffc0280000 0000000000000001 80000007efcf0330 ffffffffc027eb14
               	  0000000000000000 8000000788ac6c00 ffffffff808c4b60 80000007efcf0338
               	  ...
[ 1240.170788] Call Trace:
[ 1240.170800] [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
[ 1240.170809] [<ffffffff80e1db78>] schedule+0x38/0x98
[ 1240.170817] [<ffffffff80e208d8>] schedule_timeout+0x240/0x2a0
[ 1240.170826] [<ffffffff80e1d0b4>] io_schedule_timeout+0x8c/0xc0
[ 1240.170847] [<ffffffffc027eb14>] wait_for_in_progress+0x12c/0x168 [dm_snapshot]
[ 1240.170863] [<ffffffffc027ec34>] do_origin+0xe4/0x170 [dm_snapshot]
[ 1240.170904] [<ffffffffc01dc2c0>] __map_bio+0xb0/0x258 [dm_mod]
[ 1240.170936] [<ffffffffc01deb94>] __split_and_process_bio+0x274/0x488 [dm_mod]
[ 1240.170969] [<ffffffffc01dee3c>] dm_make_request+0x94/0x128 [dm_mod]
[ 1240.171009] [<ffffffff80b3347c>] generic_make_request+0x114/0x290
[ 1240.171027] [<ffffffff80b336c0>] submit_bio+0xc8/0x1e0
[ 1240.171096] [<ffffffffc051cf60>] drbd_md_sync_page_io+0x360/0x670 [drbd]
[ 1240.171170] [<ffffffffc052ae20>] drbd_md_write+0x1c8/0x320 [drbd]
[ 1240.171247] [<ffffffffc052b0d4>] drbd_md_sync+0x15c/0x350 [drbd]
[ 1240.171332] [<ffffffffc0500bcc>] drbd_start_resync+0x6ec/0x968 [drbd]
[ 1240.171396] [<ffffffffc05037dc>] receive_sync_uuid+0x2d4/0x5a0 [drbd]
[ 1240.171463] [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
[ 1240.171541] [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
[ 1240.171582] [<ffffffff808b813c>] kthread+0xdc/0xf8
[ 1240.171592] [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c

[ 1333.647369] Process {pid:9165, uid:0, comm:HealthDetector} is killing process {pid:17698, comm:getFCSStats.sh}
[ 1360.170893] INFO: task drbd_r_r4:7898 blocked for more than 120 seconds.
[ 1360.177617]       Tainted: P           O    4.4.184-octeon-distro.git-v2.96-4-rc-wnd #1
[ 1360.185639] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1360.193504] drbd_r_r4       D ffffffff80e1db78     0  7898      2 0x00100000
[ 1360.193514] Stack : ffffffff81d00000 ffffffff8114aa38 0000000000000000 ffffffff808dfdb8
               	  7fffffffffffffff 0000000000000000 80000000c9df7c38 7fffffffffffffff
               	  ffffffffc0280000 80000007ea1a0498 0000000000000001 ffffffffc0280000
               	  80000007efcf0200 ffffffff80e1db78 8000000788ae7830 ffffffff80e208d8
               	  0000000000000001 ffffffff80b35530 8000000788ae7820 8000000788ae7820
               	  8000000788ae7830 8000000788ae7830 ffffffff80fb772d 80000007efcf0200
               	  80000000c9df7380 0000000000000000 80000000c9df7c38 7fffffffffffffff
               	  ffffffffc0280000 ffffffff80e1d0b4 0000000000000002 80000007efcf0200
               	  ffffffffc0280000 0000000000000001 80000007efcf0330 ffffffffc027eb14
               	  0000000000000000 8000000788ac6c00 ffffffff808c4b60 80000007efcf0338
               	  ...
[ 1360.193586] Call Trace:
[ 1360.193598] [<ffffffff80e1d4a8>] __schedule+0x3c0/0xa58
[ 1360.193607] [<ffffffff80e1db78>] schedule+0x38/0x98
[ 1360.193615] [<ffffffff80e208d8>] schedule_timeout+0x240/0x2a0
[ 1360.193624] [<ffffffff80e1d0b4>] io_schedule_timeout+0x8c/0xc0
[ 1360.193644] [<ffffffffc027eb14>] wait_for_in_progress+0x12c/0x168 [dm_snapshot]
[ 1360.193660] [<ffffffffc027ec34>] do_origin+0xe4/0x170 [dm_snapshot]
[ 1360.193701] [<ffffffffc01dc2c0>] __map_bio+0xb0/0x258 [dm_mod]
[ 1360.193733] [<ffffffffc01deb94>] __split_and_process_bio+0x274/0x488 [dm_mod]
[ 1360.193765] [<ffffffffc01dee3c>] dm_make_request+0x94/0x128 [dm_mod]
[ 1360.193788] [<ffffffff80b3347c>] generic_make_request+0x114/0x290
[ 1360.193797] [<ffffffff80b336c0>] submit_bio+0xc8/0x1e0
[ 1360.193857] [<ffffffffc051cf60>] drbd_md_sync_page_io+0x360/0x670 [drbd]
[ 1360.193924] [<ffffffffc052ae20>] drbd_md_write+0x1c8/0x320 [drbd]
[ 1360.193991] [<ffffffffc052b0d4>] drbd_md_sync+0x15c/0x350 [drbd]
[ 1360.194056] [<ffffffffc0500bcc>] drbd_start_resync+0x6ec/0x968 [drbd]
[ 1360.194118] [<ffffffffc05037dc>] receive_sync_uuid+0x2d4/0x5a0 [drbd]
[ 1360.194182] [<ffffffffc0515b30>] drbd_receiver+0x210/0x420 [drbd]
[ 1360.194247] [<ffffffffc0523a3c>] drbd_thread_setup+0x74/0x1a8 [drbd]
[ 1360.194287] [<ffffffff808b813c>] kthread+0xdc/0xf8
[ 1360.194297] [<ffffffff8086bf28>] ret_from_kernel_thread+0x14/0x1c



* Re: Fix "dm kcopyd: Fix bug causing workqueue stalls" causes dead lock
  2019-10-11 10:17                       ` Guruswamy Basavaiah
@ 2019-10-11 11:39                         ` Nikos Tsironis
  2019-10-11 12:17                           ` Nikos Tsironis
  0 siblings, 1 reply; 19+ messages in thread
From: Nikos Tsironis @ 2019-10-11 11:39 UTC (permalink / raw)
  To: Guruswamy Basavaiah
  Cc: dm-devel, Mikulas Patocka, agk, Mike Snitzer, iliastsi

On 10/11/19 1:17 PM, Guruswamy Basavaiah wrote:
> Hello Nikos,
>  Applied these patches and tested.
>  We still see hung_task_timeout back traces and the drbd Resync is blocked.
>  Attached the back trace, please let me know if you need any other information.
> 

Hi Guru,

Can you provide more information about your setup? The output of
'dmsetup table', 'dmsetup ls --tree' and the DRBD configuration would
help to get a better picture of your I/O stack.

Also, is it possible to describe the test case you are running and
exactly what it does?

Thanks,
Nikos

>  In patch "0002-dm-snapshot-rework-COW-throttling-to-fix-deadlock.patch"
> I changed "struct wait_queue_head" to "wait_queue_head_t", as I was
> getting a compilation error with the former.
> 
> On Thu, 10 Oct 2019 at 17:33, Nikos Tsironis <ntsironis@arrikto.com> wrote:
>>
>> On 10/10/19 9:34 AM, Guruswamy Basavaiah wrote:
>>> Hello,
>>> We use 4.4.184 in our builds and the patch fails to apply.
>>> Is it possible to give a patch for 4.4.x branch ?
>> Hi Guru,
>>
>> I attach the two patches fixing the deadlock rebased on the 4.4.x branch.
>>
>> Nikos
>>
>>>
>>> patching Logs.
>>> patching file drivers/md/dm-snap.c
>>> Hunk #1 succeeded at 19 (offset 1 line).
>>> Hunk #2 succeeded at 105 (offset -1 lines).
>>> Hunk #3 succeeded at 157 (offset -4 lines).
>>> Hunk #4 succeeded at 1206 (offset -120 lines).
>>> Hunk #5 FAILED at 1508.
>>> Hunk #6 succeeded at 1412 (offset -124 lines).
>>> Hunk #7 succeeded at 1425 (offset -124 lines).
>>> Hunk #8 FAILED at 1925.
>>> Hunk #9 succeeded at 1866 with fuzz 2 (offset -255 lines).
>>> Hunk #10 succeeded at 2202 (offset -294 lines).
>>> Hunk #11 succeeded at 2332 (offset -294 lines).
>>> 2 out of 11 hunks FAILED -- saving rejects to file drivers/md/dm-snap.c.rej
>>>
>>> Guru
>>>
>>> On Thu, 10 Oct 2019 at 01:33, Guruswamy Basavaiah <guru2018@gmail.com> wrote:
>>>>
>>>> Hello Mike,
>>>>  I will get the testing result before end of Thursday.
>>>> Guru
>>>>
>>>> On Wed, 9 Oct 2019 at 21:34, Mike Snitzer <snitzer@redhat.com> wrote:
>>>>>
>>>>> On Wed, Oct 09 2019 at 11:44am -0400,
>>>>> Nikos Tsironis <ntsironis@arrikto.com> wrote:
>>>>>
>>>>>> On 10/9/19 5:13 PM, Mike Snitzer wrote:
>>>>>>> On Tue, Oct 01 2019 at  8:43am -0400,
>>>>>>> Nikos Tsironis <ntsironis@arrikto.com> wrote:
>>>>>>>
>>>>>>>> On 10/1/19 3:27 PM, Guruswamy Basavaiah wrote:
>>>>>>>>> Hello Nikos,
>>>>>>>>>  Yes, issue is consistently reproducible with us, in a particular
>>>>>>>>> set-up and test case.
>>>>>>>>>  I will get the access to set-up next week, will try to test and let
>>>>>>>>> you know the results before end of next week.
>>>>>>>>>
>>>>>>>>
>>>>>>>> That sounds great!
>>>>>>>>
>>>>>>>> Thanks a lot,
>>>>>>>> Nikos
>>>>>>>
>>>>>>> Hi Guru,
>>>>>>>
>>>>>>> Any chance you could try this fix that I've staged to send to Linus?
>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.4&id=633b1613b2a49304743c18314bb6e6465c21fd8a
>>>>>>>
>>>>>>> Short of that, Nikos: do you happen to have a test scenario that teases
>>>>>>> out this deadlock?
>>>>>>>
>>>>>>
>>>>>> Hi Mike,
>>>>>>
>>>>>> Yes,
>>>>>>
>>>>>> I created a 50G LV and took a snapshot of the same size:
>>>>>>
>>>>>>   lvcreate -n data-lv -L50G testvg
>>>>>>   lvcreate -n snap-lv -L50G -s testvg/data-lv
>>>>>>
>>>>>> Then I ran the following fio job:
>>>>>>
>>>>>> [global]
>>>>>> randrepeat=1
>>>>>> ioengine=libaio
>>>>>> bs=1M
>>>>>> size=6G
>>>>>> offset_increment=6G
>>>>>> numjobs=8
>>>>>> direct=1
>>>>>> iodepth=32
>>>>>> group_reporting
>>>>>> filename=/dev/testvg/data-lv
>>>>>>
>>>>>> [test]
>>>>>> rw=write
>>>>>> timeout=180
>>>>>>
>>>>>> , concurrently with the following script:
>>>>>>
>>>>>> lvcreate -n dummy-lv -L1G testvg
>>>>>>
>>>>>> while true
>>>>>> do
>>>>>>  lvcreate -n dummy-snap -L1M -s testvg/dummy-lv
>>>>>>  lvremove -f testvg/dummy-snap
>>>>>> done
>>>>>>
>>>>>> This reproduced the deadlock for me. I also ran 'echo 30 >
>>>>>> /proc/sys/kernel/hung_task_timeout_secs', to reduce the hung task
>>>>>> timeout.
>>>>>>
>>>>>> Nikos.
>>>>>
>>>>> Very nice, well done.  Curious if you've tested with the fix I've staged
>>>>> (see above)?  If so, does it resolve the deadlock?  If you've had
>>>>> success I'd be happy to update the tags in the commit header to include
>>>>> your Tested-by before sending it to Linus.  Also, any review of the
>>>>> patch that you can do would be appreciated and with your formal
>>>>> Reviewed-by reply would be welcomed and folded in too.
>>>>>
>>>>> Mike
>>>>
>>>>
>>>>
>>>> --
>>>> Guruswamy Basavaiah
>>>
>>>
>>>
> 
> 
> 


* Re: Fix "dm kcopyd: Fix bug causing workqueue stalls" causes dead lock
  2019-10-11 11:39                         ` Nikos Tsironis
@ 2019-10-11 12:17                           ` Nikos Tsironis
  2019-10-12  8:46                             ` Guruswamy Basavaiah
  0 siblings, 1 reply; 19+ messages in thread
From: Nikos Tsironis @ 2019-10-11 12:17 UTC (permalink / raw)
  To: Guruswamy Basavaiah
  Cc: dm-devel, Mikulas Patocka, agk, Mike Snitzer, iliastsi

[-- Attachment #1: Type: text/plain, Size: 5025 bytes --]

On 10/11/19 2:39 PM, Nikos Tsironis wrote:
> On 10/11/19 1:17 PM, Guruswamy Basavaiah wrote:
>> Hello Nikos,
>>  Applied these patches and tested.
>>  We still see hung_task_timeout back traces and the drbd Resync is blocked.
>>  Attached the back trace, please let me know if you need any other information.
>>
> 
> Hi Guru,
> 
> Can you provide more information about your setup? The output of
> 'dmsetup table', 'dmsetup ls --tree' and the DRBD configuration would
> help to get a better picture of your I/O stack.
> 
> Also, is it possible to describe the test case you are running and
> exactly what it does?
> 
> Thanks,
> Nikos
> 

Hi Guru,

I believe I found the mistake. The in_progress variable was never
initialized to zero.

I attach a new version of the second patch correcting this.

Can you please test again with this patch?

Thanks,
Nikos
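
To make the initialization problem above concrete, here is a minimal
userspace sketch in C. It is not kernel code. It assumes the snapshot
structure comes from an allocator that does not zero memory (the
allocation call is not shown in this thread), and the names snap_model,
snap_alloc_poisoned() and would_throttle() are hypothetical, introduced
only for illustration. The memset() stands in for whatever stale bytes
such an allocation might contain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Mirrors DEFAULT_COW_THRESHOLD from the patch. */
#define COW_THRESHOLD 2048u

/* Hypothetical stand-in for the single dm_snapshot field of interest. */
struct snap_model {
	unsigned in_progress;
};

/*
 * Allocate a model and fill it with a poison pattern. This is an
 * assumption that imitates stale heap contents; it makes the "garbage"
 * value deterministic for the purpose of the example.
 */
static struct snap_model *snap_alloc_poisoned(void)
{
	struct snap_model *s = malloc(sizeof(*s));

	if (s)
		memset(s, 0xAA, sizeof(*s));
	return s;
}

/* The same comparison wait_for_in_progress() uses before sleeping. */
static bool would_throttle(const struct snap_model *s)
{
	return s->in_progress > COW_THRESHOLD;
}
```

With the poisoned counter, would_throttle() reports that a writer
should sleep even though no copy job has started, and since nothing is
in flight there is no completion left to wake it up. Setting
in_progress to 0 right after allocation, as the corrected patch does,
removes that state.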

>>  In patch "0002-dm-snapshot-rework-COW-throttling-to-fix-deadlock.patch"
>> I changed "struct wait_queue_head" to "wait_queue_head_t", as I was
>> getting a compilation error with the former.
>>
>> On Thu, 10 Oct 2019 at 17:33, Nikos Tsironis <ntsironis@arrikto.com> wrote:
>>>
>>> On 10/10/19 9:34 AM, Guruswamy Basavaiah wrote:
>>>> Hello,
>>>> We use 4.4.184 in our builds and the patch fails to apply.
>>>> Is it possible to give a patch for 4.4.x branch ?
>>> Hi Guru,
>>>
>>> I attach the two patches fixing the deadlock rebased on the 4.4.x branch.
>>>
>>> Nikos
>>>
>>>>
>>>> patching Logs.
>>>> patching file drivers/md/dm-snap.c
>>>> Hunk #1 succeeded at 19 (offset 1 line).
>>>> Hunk #2 succeeded at 105 (offset -1 lines).
>>>> Hunk #3 succeeded at 157 (offset -4 lines).
>>>> Hunk #4 succeeded at 1206 (offset -120 lines).
>>>> Hunk #5 FAILED at 1508.
>>>> Hunk #6 succeeded at 1412 (offset -124 lines).
>>>> Hunk #7 succeeded at 1425 (offset -124 lines).
>>>> Hunk #8 FAILED at 1925.
>>>> Hunk #9 succeeded at 1866 with fuzz 2 (offset -255 lines).
>>>> Hunk #10 succeeded at 2202 (offset -294 lines).
>>>> Hunk #11 succeeded at 2332 (offset -294 lines).
>>>> 2 out of 11 hunks FAILED -- saving rejects to file drivers/md/dm-snap.c.rej
>>>>
>>>> Guru
>>>>
>>>> On Thu, 10 Oct 2019 at 01:33, Guruswamy Basavaiah <guru2018@gmail.com> wrote:
>>>>>
>>>>> Hello Mike,
>>>>>  I will get the testing result before end of Thursday.
>>>>> Guru
>>>>>
>>>>> On Wed, 9 Oct 2019 at 21:34, Mike Snitzer <snitzer@redhat.com> wrote:
>>>>>>
>>>>>> On Wed, Oct 09 2019 at 11:44am -0400,
>>>>>> Nikos Tsironis <ntsironis@arrikto.com> wrote:
>>>>>>
>>>>>>> On 10/9/19 5:13 PM, Mike Snitzer wrote:
>>>>>>>> On Tue, Oct 01 2019 at  8:43am -0400,
>>>>>>>> Nikos Tsironis <ntsironis@arrikto.com> wrote:
>>>>>>>>
>>>>>>>>> On 10/1/19 3:27 PM, Guruswamy Basavaiah wrote:
>>>>>>>>>> Hello Nikos,
>>>>>>>>>>  Yes, issue is consistently reproducible with us, in a particular
>>>>>>>>>> set-up and test case.
>>>>>>>>>>  I will get the access to set-up next week, will try to test and let
>>>>>>>>>> you know the results before end of next week.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> That sounds great!
>>>>>>>>>
>>>>>>>>> Thanks a lot,
>>>>>>>>> Nikos
>>>>>>>>
>>>>>>>> Hi Guru,
>>>>>>>>
>>>>>>>> Any chance you could try this fix that I've staged to send to Linus?
>>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.4&id=633b1613b2a49304743c18314bb6e6465c21fd8a
>>>>>>>>
>>>>>>>> Short of that, Nikos: do you happen to have a test scenario that teases
>>>>>>>> out this deadlock?
>>>>>>>>
>>>>>>>
>>>>>>> Hi Mike,
>>>>>>>
>>>>>>> Yes,
>>>>>>>
>>>>>>> I created a 50G LV and took a snapshot of the same size:
>>>>>>>
>>>>>>>   lvcreate -n data-lv -L50G testvg
>>>>>>>   lvcreate -n snap-lv -L50G -s testvg/data-lv
>>>>>>>
>>>>>>> Then I ran the following fio job:
>>>>>>>
>>>>>>> [global]
>>>>>>> randrepeat=1
>>>>>>> ioengine=libaio
>>>>>>> bs=1M
>>>>>>> size=6G
>>>>>>> offset_increment=6G
>>>>>>> numjobs=8
>>>>>>> direct=1
>>>>>>> iodepth=32
>>>>>>> group_reporting
>>>>>>> filename=/dev/testvg/data-lv
>>>>>>>
>>>>>>> [test]
>>>>>>> rw=write
>>>>>>> timeout=180
>>>>>>>
>>>>>>> , concurrently with the following script:
>>>>>>>
>>>>>>> lvcreate -n dummy-lv -L1G testvg
>>>>>>>
>>>>>>> while true
>>>>>>> do
>>>>>>>  lvcreate -n dummy-snap -L1M -s testvg/dummy-lv
>>>>>>>  lvremove -f testvg/dummy-snap
>>>>>>> done
>>>>>>>
>>>>>>> This reproduced the deadlock for me. I also ran 'echo 30 >
>>>>>>> /proc/sys/kernel/hung_task_timeout_secs', to reduce the hung task
>>>>>>> timeout.
>>>>>>>
>>>>>>> Nikos.
>>>>>>
>>>>>> Very nice, well done.  Curious if you've tested with the fix I've staged
>>>>>> (see above)?  If so, does it resolve the deadlock?  If you've had
>>>>>> success I'd be happy to update the tags in the commit header to include
>>>>>> your Tested-by before sending it to Linus.  Also, any review of the
>>>>>> patch that you can do would be appreciated and with your formal
>>>>>> Reviewed-by reply would be welcomed and folded in too.
>>>>>>
>>>>>> Mike
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Guruswamy Basavaiah
>>>>
>>>>
>>>>
>>
>>
>>

[-- Attachment #2: 0002-dm-snapshot-rework-COW-throttling-to-fix-deadlock.patch --]
[-- Type: text/x-patch, Size: 7911 bytes --]

From 80c68f059b5ce9828030569aadabb97085ffea5e Mon Sep 17 00:00:00 2001
From: Mikulas Patocka <mpatocka@redhat.com>
Date: Wed, 2 Oct 2019 06:15:53 -0400
Subject: [PATCH] dm snapshot: rework COW throttling to fix deadlock

Commit 721b1d98fb517a ("dm snapshot: Fix excessive memory usage and
workqueue stalls") introduced a semaphore to limit the maximum number of
in-flight kcopyd (COW) jobs.

The implementation of this throttling mechanism is prone to a deadlock:

1. One or more threads write to the origin device causing COW, which is
   performed by kcopyd.

2. At some point some of these threads might reach the s->cow_count
   semaphore limit and block in down(&s->cow_count), holding a read lock
   on _origins_lock.

3. Someone tries to acquire a write lock on _origins_lock, e.g.,
   snapshot_ctr(), which blocks because the threads at step (2) already
   hold a read lock on it.

4. A COW operation completes and kcopyd runs dm-snapshot's completion
   callback, which ends up calling pending_complete().
   pending_complete() tries to resubmit any deferred origin bios. This
   requires acquiring a read lock on _origins_lock, which blocks.

   This happens because the read-write semaphore implementation gives
   priority to writers, meaning that as soon as a writer tries to enter
   the critical section, no readers will be allowed in, until all
   writers have completed their work.

   So, pending_complete() waits for the writer at step (3) to acquire
   and release the lock. This writer waits for the readers at step (2)
   to release the read lock and those readers wait for
   pending_complete() (the kcopyd thread) to signal the s->cow_count
   semaphore: DEADLOCK.

The above was thoroughly analyzed and documented by Nikos Tsironis as
part of his initial proposal for fixing this deadlock, see:
https://www.redhat.com/archives/dm-devel/2019-October/msg00001.html

Fix this deadlock by reworking COW throttling so that it waits without
holding any locks. Add a variable 'in_progress' that counts how many
kcopyd jobs are running. A function wait_for_in_progress() will sleep if
'in_progress' is over the limit. It drops _origins_lock in order to
avoid the deadlock.

Reported-by: Guruswamy Basavaiah <guru2018@gmail.com>
Reported-by: Nikos Tsironis <ntsironis@arrikto.com>
Fixes: 721b1d98fb51 ("dm snapshot: Fix excessive memory usage and workqueue stalls")
Cc: stable@vger.kernel.org # v5.0+
Depends-on: 4a3f111a73a8c ("dm snapshot: introduce account_start_copy() and account_end_copy()")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-snap.c | 79 ++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 65 insertions(+), 14 deletions(-)

diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index a9c82fd036c6..9f127a4e26b5 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -19,7 +19,6 @@
 #include <linux/vmalloc.h>
 #include <linux/log2.h>
 #include <linux/dm-kcopyd.h>
-#include <linux/semaphore.h>
 
 #include "dm.h"
 
@@ -106,8 +105,8 @@ struct dm_snapshot {
 	/* The on disk metadata handler */
 	struct dm_exception_store *store;
 
-	/* Maximum number of in-flight COW jobs. */
-	struct semaphore cow_count;
+	unsigned in_progress;
+	wait_queue_head_t in_progress_wait;
 
 	struct dm_kcopyd_client *kcopyd_client;
 
@@ -158,8 +157,8 @@ struct dm_snapshot {
  */
 #define DEFAULT_COW_THRESHOLD 2048
 
-static int cow_threshold = DEFAULT_COW_THRESHOLD;
-module_param_named(snapshot_cow_threshold, cow_threshold, int, 0644);
+static unsigned cow_threshold = DEFAULT_COW_THRESHOLD;
+module_param_named(snapshot_cow_threshold, cow_threshold, uint, 0644);
 MODULE_PARM_DESC(snapshot_cow_threshold, "Maximum number of chunks being copied on write");
 
 DECLARE_DM_KCOPYD_THROTTLE_WITH_MODULE_PARM(snapshot_copy_throttle,
@@ -1207,7 +1206,8 @@ static int snapshot_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 		goto bad_hash_tables;
 	}
 
-	sema_init(&s->cow_count, (cow_threshold > 0) ? cow_threshold : INT_MAX);
+	init_waitqueue_head(&s->in_progress_wait);
+	s->in_progress = 0;
 
 	s->kcopyd_client = dm_kcopyd_client_create(&dm_kcopyd_throttle);
 	if (IS_ERR(s->kcopyd_client)) {
@@ -1395,17 +1395,54 @@ static void snapshot_dtr(struct dm_target *ti)
 
 	dm_put_device(ti, s->origin);
 
+	WARN_ON(s->in_progress);
+
 	kfree(s);
 }
 
 static void account_start_copy(struct dm_snapshot *s)
 {
-	down(&s->cow_count);
+	spin_lock(&s->in_progress_wait.lock);
+	s->in_progress++;
+	spin_unlock(&s->in_progress_wait.lock);
 }
 
 static void account_end_copy(struct dm_snapshot *s)
 {
-	up(&s->cow_count);
+	spin_lock(&s->in_progress_wait.lock);
+	BUG_ON(!s->in_progress);
+	s->in_progress--;
+	if (likely(s->in_progress <= cow_threshold) &&
+	    unlikely(waitqueue_active(&s->in_progress_wait)))
+		wake_up_locked(&s->in_progress_wait);
+	spin_unlock(&s->in_progress_wait.lock);
+}
+
+static bool wait_for_in_progress(struct dm_snapshot *s, bool unlock_origins)
+{
+	if (unlikely(s->in_progress > cow_threshold)) {
+		spin_lock(&s->in_progress_wait.lock);
+		if (likely(s->in_progress > cow_threshold)) {
+			/*
+			 * NOTE: this throttle doesn't account for whether
+			 * the caller is servicing an IO that will trigger a COW
+			 * so excess throttling may result for chunks not required
+			 * to be COW'd.  But if cow_threshold was reached, extra
+			 * throttling is unlikely to negatively impact performance.
+			 */
+			DECLARE_WAITQUEUE(wait, current);
+			__add_wait_queue(&s->in_progress_wait, &wait);
+			__set_current_state(TASK_UNINTERRUPTIBLE);
+			spin_unlock(&s->in_progress_wait.lock);
+			if (unlock_origins)
+				up_read(&_origins_lock);
+			io_schedule();
+			remove_wait_queue(&s->in_progress_wait, &wait);
+			return false;
+		}
+		spin_unlock(&s->in_progress_wait.lock);
+	}
+	return true;
 }
 
 /*
@@ -1423,7 +1460,7 @@ static void flush_bios(struct bio *bio)
 	}
 }
 
-static int do_origin(struct dm_dev *origin, struct bio *bio);
+static int do_origin(struct dm_dev *origin, struct bio *bio, bool limit);
 
 /*
  * Flush a list of buffers.
@@ -1436,7 +1473,7 @@ static void retry_origin_bios(struct dm_snapshot *s, struct bio *bio)
 	while (bio) {
 		n = bio->bi_next;
 		bio->bi_next = NULL;
-		r = do_origin(s->origin, bio);
+		r = do_origin(s->origin, bio, false);
 		if (r == DM_MAPIO_REMAPPED)
 			generic_make_request(bio);
 		bio = n;
@@ -1728,6 +1765,11 @@ static int snapshot_map(struct dm_target *ti, struct bio *bio)
 	if (!s->valid)
 		return -EIO;
 
+	if (bio_data_dir(bio) == WRITE) {
+		while (unlikely(!wait_for_in_progress(s, false)))
+			; /* wait_for_in_progress() has slept */
+	}
+
 	/* FIXME: should only take write lock if we need
 	 * to copy an exception */
 	down_write(&s->lock);
@@ -1877,7 +1919,7 @@ redirect_to_origin:
 
 	if (bio_rw(bio) == WRITE) {
 		up_write(&s->lock);
-		return do_origin(s->origin, bio);
+		return do_origin(s->origin, bio, false);
 	}
 
 out_unlock:
@@ -2213,15 +2255,24 @@ next_snapshot:
 /*
  * Called on a write from the origin driver.
  */
-static int do_origin(struct dm_dev *origin, struct bio *bio)
+static int do_origin(struct dm_dev *origin, struct bio *bio, bool limit)
 {
 	struct origin *o;
 	int r = DM_MAPIO_REMAPPED;
 
+again:
 	down_read(&_origins_lock);
 	o = __lookup_origin(origin->bdev);
-	if (o)
+	if (o) {
+		if (limit) {
+			struct dm_snapshot *s;
+			list_for_each_entry(s, &o->snapshots, list)
+				if (unlikely(!wait_for_in_progress(s, true)))
+					goto again;
+		}
+
 		r = __origin_write(&o->snapshots, bio->bi_iter.bi_sector, bio);
+	}
 	up_read(&_origins_lock);
 
 	return r;
@@ -2334,7 +2385,7 @@ static int origin_map(struct dm_target *ti, struct bio *bio)
 		dm_accept_partial_bio(bio, available_sectors);
 
 	/* Only tell snapshots if this is a write */
-	return do_origin(o->dev, bio);
+	return do_origin(o->dev, bio, true);
 }
 
 /*
-- 
2.11.0



* Re: Fix "dm kcopyd: Fix bug causing workqueue stalls" causes dead lock
  2019-10-11 12:17                           ` Nikos Tsironis
@ 2019-10-12  8:46                             ` Guruswamy Basavaiah
  2019-10-17  5:58                               ` Guruswamy Basavaiah
  0 siblings, 1 reply; 19+ messages in thread
From: Guruswamy Basavaiah @ 2019-10-12  8:46 UTC (permalink / raw)
  To: Nikos Tsironis; +Cc: dm-devel, Mikulas Patocka, agk, Mike Snitzer, iliastsi

Hello Nikos,
 I am having some issues with our set-up; I will try to get the results ASAP.
Guru


On Fri, 11 Oct 2019 at 17:47, Nikos Tsironis <ntsironis@arrikto.com> wrote:
>
> On 10/11/19 2:39 PM, Nikos Tsironis wrote:
> > On 10/11/19 1:17 PM, Guruswamy Basavaiah wrote:
> >> Hello Nikos,
> >>  Applied these patches and tested.
> >>  We still see hung_task_timeout back traces and the drbd Resync is blocked.
> >>  Attached the back trace, please let me know if you need any other information.
> >>
> >
> > Hi Guru,
> >
> > Can you provide more information about your setup? The output of
> > 'dmsetup table', 'dmsetup ls --tree' and the DRBD configuration would
> > help to get a better picture of your I/O stack.
> >
> > Also, is it possible to describe the test case you are running and
> > exactly what it does?
> >
> > Thanks,
> > Nikos
> >
>
> Hi Guru,
>
> I believe I found the mistake. The in_progress variable was never
> initialized to zero.
>
> I attach a new version of the second patch correcting this.
>
> Can you please test again with this patch?
>
> Thanks,
> Nikos
>
> >>  In patch "0002-dm-snapshot-rework-COW-throttling-to-fix-deadlock.patch"
> >> I changed "struct wait_queue_head" to "wait_queue_head_t", as I was
> >> getting a compilation error with the former.
> >>
> >> On Thu, 10 Oct 2019 at 17:33, Nikos Tsironis <ntsironis@arrikto.com> wrote:
> >>>
> >>> On 10/10/19 9:34 AM, Guruswamy Basavaiah wrote:
> >>>> Hello,
> >>>> We use 4.4.184 in our builds and the patch fails to apply.
> >>>> Is it possible to give a patch for 4.4.x branch ?
> >>> Hi Guru,
> >>>
> >>> I attach the two patches fixing the deadlock rebased on the 4.4.x branch.
> >>>
> >>> Nikos
> >>>
> >>>>
> >>>> patching Logs.
> >>>> patching file drivers/md/dm-snap.c
> >>>> Hunk #1 succeeded at 19 (offset 1 line).
> >>>> Hunk #2 succeeded at 105 (offset -1 lines).
> >>>> Hunk #3 succeeded at 157 (offset -4 lines).
> >>>> Hunk #4 succeeded at 1206 (offset -120 lines).
> >>>> Hunk #5 FAILED at 1508.
> >>>> Hunk #6 succeeded at 1412 (offset -124 lines).
> >>>> Hunk #7 succeeded at 1425 (offset -124 lines).
> >>>> Hunk #8 FAILED at 1925.
> >>>> Hunk #9 succeeded at 1866 with fuzz 2 (offset -255 lines).
> >>>> Hunk #10 succeeded at 2202 (offset -294 lines).
> >>>> Hunk #11 succeeded at 2332 (offset -294 lines).
> >>>> 2 out of 11 hunks FAILED -- saving rejects to file drivers/md/dm-snap.c.rej
> >>>>
> >>>> Guru
> >>>>
> >>>> On Thu, 10 Oct 2019 at 01:33, Guruswamy Basavaiah <guru2018@gmail.com> wrote:
> >>>>>
> >>>>> Hello Mike,
> >>>>>  I will get the testing result before end of Thursday.
> >>>>> Guru
> >>>>>
> >>>>> On Wed, 9 Oct 2019 at 21:34, Mike Snitzer <snitzer@redhat.com> wrote:
> >>>>>>
> >>>>>> On Wed, Oct 09 2019 at 11:44am -0400,
> >>>>>> Nikos Tsironis <ntsironis@arrikto.com> wrote:
> >>>>>>
> >>>>>>> On 10/9/19 5:13 PM, Mike Snitzer wrote:
> >>>>>>>> On Tue, Oct 01 2019 at  8:43am -0400,
> >>>>>>>> Nikos Tsironis <ntsironis@arrikto.com> wrote:
> >>>>>>>>
> >>>>>>>>> On 10/1/19 3:27 PM, Guruswamy Basavaiah wrote:
> >>>>>>>>>> Hello Nikos,
> >>>>>>>>>>  Yes, issue is consistently reproducible with us, in a particular
> >>>>>>>>>> set-up and test case.
> >>>>>>>>>>  I will get the access to set-up next week, will try to test and let
> >>>>>>>>>> you know the results before end of next week.
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> That sounds great!
> >>>>>>>>>
> >>>>>>>>> Thanks a lot,
> >>>>>>>>> Nikos
> >>>>>>>>
> >>>>>>>> Hi Guru,
> >>>>>>>>
> >>>>>>>> Any chance you could try this fix that I've staged to send to Linus?
> >>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.4&id=633b1613b2a49304743c18314bb6e6465c21fd8a
> >>>>>>>>
> >>>>>>>> Shiort of that, Nikos: do you happen to have a test scenario that teases
> >>>>>>>> out this deadlock?
> >>>>>>>>
> >>>>>>>
> >>>>>>> Hi Mike,
> >>>>>>>
> >>>>>>> Yes,
> >>>>>>>
> >>>>>>> I created a 50G LV and took a snapshot of the same size:
> >>>>>>>
> >>>>>>>   lvcreate -n data-lv -L50G testvg
> >>>>>>>   lvcreate -n snap-lv -L50G -s testvg/data-lv
> >>>>>>>
> >>>>>>> Then I ran the following fio job:
> >>>>>>>
> >>>>>>> [global]
> >>>>>>> randrepeat=1
> >>>>>>> ioengine=libaio
> >>>>>>> bs=1M
> >>>>>>> size=6G
> >>>>>>> offset_increment=6G
> >>>>>>> numjobs=8
> >>>>>>> direct=1
> >>>>>>> iodepth=32
> >>>>>>> group_reporting
> >>>>>>> filename=/dev/testvg/data-lv
> >>>>>>>
> >>>>>>> [test]
> >>>>>>> rw=write
> >>>>>>> timeout=180
> >>>>>>>
> >>>>>>> , concurrently with the following script:
> >>>>>>>
> >>>>>>> lvcreate -n dummy-lv -L1G testvg
> >>>>>>>
> >>>>>>> while true
> >>>>>>> do
> >>>>>>>  lvcreate -n dummy-snap -L1M -s testvg/dummy-lv
> >>>>>>>  lvremove -f testvg/dummy-snap
> >>>>>>> done
> >>>>>>>
> >>>>>>> This reproduced the deadlock for me. I also ran 'echo 30 >
> >>>>>>> /proc/sys/kernel/hung_task_timeout_secs', to reduce the hung task
> >>>>>>> timeout.
> >>>>>>>
> >>>>>>> Nikos.
> >>>>>>
> >>>>>> Very nice, well done.  Curious if you've tested with the fix I've staged
> >>>>>> (see above)?  If so, does it resolve the deadlock?  If you've had
> >>>>>> success I'd be happy to update the tags in the commit header to include
> >>>>>> your Tested-by before sending it to Linus.  Also, any review of the
> >>>>>> patch that you can do would be appreciated and with your formal
> >>>>>> Reviewed-by reply would be welcomed and folded in too.
> >>>>>>
> >>>>>> Mike
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Guruswamy Basavaiah
> >>>>
> >>>>
> >>>>
> >>
> >>
> >>



--
Guruswamy Basavaiah


* Re: Fix "dm kcopyd: Fix bug causing workqueue stalls" causes dead lock
  2019-10-12  8:46                             ` Guruswamy Basavaiah
@ 2019-10-17  5:58                               ` Guruswamy Basavaiah
  2019-10-17  7:43                                 ` Nikos Tsironis
  0 siblings, 1 reply; 19+ messages in thread
From: Guruswamy Basavaiah @ 2019-10-17  5:58 UTC (permalink / raw)
  To: Nikos Tsironis; +Cc: dm-devel, Mikulas Patocka, agk, Mike Snitzer, iliastsi

Hello Nikos,
 Tested with your new patches. The issue is resolved. Thank you.
 In the second patch I changed "struct wait_queue_head" to
"wait_queue_head_t" for the variable in_progress_wait; otherwise
compilation fails with the error
 "error: field 'in_progress_wait' has incomplete type
  struct wait_queue_head in_progress_wait;"
 I have attached the changed patch.

Guru

On Sat, 12 Oct 2019 at 14:16, Guruswamy Basavaiah <guru2018@gmail.com> wrote:
>
> Hello Nikos,
>  I am having some issues in our set-up, I will try to get the results ASAP.
> Guru
>
>



-- 
Guruswamy Basavaiah


* Re: Fix "dm kcopyd: Fix bug causing workqueue stalls" causes dead lock
  2019-10-17  5:58                               ` Guruswamy Basavaiah
@ 2019-10-17  7:43                                 ` Nikos Tsironis
  0 siblings, 0 replies; 19+ messages in thread
From: Nikos Tsironis @ 2019-10-17  7:43 UTC (permalink / raw)
  To: Guruswamy Basavaiah
  Cc: dm-devel, Mikulas Patocka, agk, Mike Snitzer, iliastsi

On 10/17/19 8:58 AM, Guruswamy Basavaiah wrote:
>Hello Nikos,
>  Tested with your new patches. Issue is resolved. Thank you.

Hi Guru,

That's great. Thanks for testing the patches.

>  In second patch "struct wait_queue_head" to "wait_queue_head_t" for
> variable in_progress_wait, else compilation is failing with error
>  "error: field 'in_progress_wait' has incomplete type
>   struct wait_queue_head in_progress_wait;"

"struct wait_queue_head" was introduced by commit 9d9d676f595b50
("sched/wait: Standardize internal naming of wait-queue heads"), which
is included in kernels starting from v4.13.

So, the patch works fine with the latest kernel, but needs adapting for
older kernels, which I missed when rebasing the patches for the 4.4.x
kernel series.
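
For anyone else backporting the patch, the version boundary described
above can be sketched as a small userspace check. This is only an
illustration under stated assumptions: VER(), has_struct_wait_queue_head()
and wait_queue_head_spelling() are hypothetical names, not part of the
patch or of the kernel. VER() mirrors the classic KERNEL_VERSION()
encoding from <linux/version.h>, and the 4.13 cutoff is the one given for
commit 9d9d676f595b50 above.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for KERNEL_VERSION() from <linux/version.h>;
 * same encoding for patch levels up to 255.
 */
#define VER(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/*
 * "struct wait_queue_head" exists from v4.13 onward (commit
 * 9d9d676f595b50); older trees such as 4.4.x only have the
 * wait_queue_head_t typedef.
 */
static inline bool has_struct_wait_queue_head(int major, int minor, int patch)
{
	return VER(major, minor, patch) >= VER(4, 13, 0);
}

/* Spelling of the wait-queue head type that compiles on a given version. */
static inline const char *wait_queue_head_spelling(int major, int minor,
						   int patch)
{
	return has_struct_wait_queue_head(major, minor, patch)
		? "struct wait_queue_head"
		: "wait_queue_head_t";
}
```

In an actual backport the simpler route is the one Guru took: declare the
field as wait_queue_head_t. As far as I know, newer kernels still provide
that name as a typedef of struct wait_queue_head, so it should build on
both sides of the rename.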

Nikos.

>  Attached the changed patch.
> 
> Guru
> 


Thread overview: 19+ messages
2019-09-27 13:19 Fix "dm kcopyd: Fix bug causing workqueue stalls" causes dead lock Guruswamy Basavaiah
2019-09-27 19:33 ` Nikos Tsironis
2019-09-29 14:36   ` Guruswamy Basavaiah
2019-10-01 12:12     ` Nikos Tsironis
2019-10-01 12:27       ` Guruswamy Basavaiah
2019-10-01 12:43         ` Nikos Tsironis
2019-10-09 14:13           ` Mike Snitzer
2019-10-09 15:44             ` Nikos Tsironis
2019-10-09 16:04               ` Mike Snitzer
2019-10-09 20:03                 ` Guruswamy Basavaiah
2019-10-10  6:34                   ` Guruswamy Basavaiah
2019-10-10 12:03                     ` Nikos Tsironis
2019-10-11 10:17                       ` Guruswamy Basavaiah
2019-10-11 11:39                         ` Nikos Tsironis
2019-10-11 12:17                           ` Nikos Tsironis
2019-10-12  8:46                             ` Guruswamy Basavaiah
2019-10-17  5:58                               ` Guruswamy Basavaiah
2019-10-17  7:43                                 ` Nikos Tsironis
2019-10-10 11:58                 ` Nikos Tsironis
