* Bisected: Kernel deadlock bug related to cgroups.
@ 2016-08-30 18:39 Brent Lovelace
From: Brent Lovelace @ 2016-08-30 18:39 UTC (permalink / raw)
  To: linux-kernel; +Cc: oleg, lizefan, tj

I found a kernel deadlock regression bug introduced in the 4.4 kernel.

This bug occurs when running an FTP send-to-self test: we used libcurl against
an FTP server, downloading a 45 KB file about 17 times per second. The system
locks up on its own after about 5 minutes, and the lockup happens reliably
when attempting to ssh into the system while the FTP test is running.
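
Roughly, the load is equivalent to the loop below. This is only a sketch of
our libcurl harness; the server address and file name are placeholders:

    # rough stand-in for the libcurl test (URL and file name are placeholders)
    while :; do
        curl -s -o /dev/null ftp://10.0.0.1/test-45k.bin
        sleep 0.06    # ~17 requests per second
    done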

Please CC me as I'm not on the lkml mailing list.


We used special routing rules to make send-to-self work.

[root@brent-6port ~]# ip ru show
11:    from all to 10.0.0.1 iif eth3 lookup local
11:    from all to 10.0.0.2 iif eth2 lookup local
50:    from all oif eth0 lookup 1
50:    from all oif eth1 lookup 2
50:    from all oif eth3 lookup 4
50:    from all oif eth2 lookup 3
50:    from all oif eth4 lookup 5
400:    from 10.0.0.1 lookup 4
400:    from 10.0.0.2 lookup 3
512:    from all lookup local
32766:    from all lookup main
32767:    from all lookup default

[root@brent-6port ~]# ip route show
default via 192.168.100.1 dev eth5
10.0.0.0/24 dev eth3  proto kernel  scope link  src 10.0.0.1
10.0.0.0/24 dev eth2  proto kernel  scope link  src 10.0.0.2
192.168.100.0/24 dev eth5  proto kernel  scope link  src 192.168.100.86

[root@brent-6port ~]# ip route show table 3
10.0.0.0/24 via 10.0.0.2 dev eth2

[root@brent-6port ~]# ip route show table 4
10.0.0.0/24 via 10.0.0.1 dev eth3
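
For completeness, the configuration above can be reconstructed with roughly
the following commands (interface names, table numbers, and preference values
are taken from the output above; this is a sketch, not our exact setup script,
and the rules for the remaining ports are analogous):

    ip rule add pref 11 to 10.0.0.1 iif eth3 table local
    ip rule add pref 11 to 10.0.0.2 iif eth2 table local
    ip rule add pref 50 oif eth3 table 4
    ip rule add pref 50 oif eth2 table 3
    ip rule add pref 400 from 10.0.0.1 table 4
    ip rule add pref 400 from 10.0.0.2 table 3
    ip route add 10.0.0.0/24 via 10.0.0.2 dev eth2 table 3
    ip route add 10.0.0.0/24 via 10.0.0.1 dev eth3 table 4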

I bisected to this commit id:
----------------------------------------------------------------------------------
commit c9e75f0492b248aeaa7af8991a6fc9a21506bc96
Author: Oleg Nesterov <oleg@redhat.com>
Date:   Fri Nov 27 19:57:19 2015 +0100

      cgroup: pids: fix race between cgroup_post_fork() and cgroup_migrate()

      If the new child migrates to another cgroup before cgroup_post_fork()
      calls subsys->fork(), then both pids_can_attach() and pids_fork() will
      do the same pids_uncharge(old_pids) + pids_charge(pids) sequence twice.

      Change copy_process() to call threadgroup_change_begin/threadgroup_change_end
      unconditionally. percpu_down_read() is cheap and this allows other
      cleanups, see the next changes.

      Also, this way we can unify cgroup_threadgroup_rwsem and dup_mmap_sem.

      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Zefan Li <lizefan@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>

:040000 040000 ec136eff1aee5082f3aed3e9aa8c205e699d0c94 9123aae80295b265518797c6fd2a3af86b867648 M    kernel
----------------------------------------------------------------------------------
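
For reference, the bisection followed the usual procedure, roughly (the
version tags below are illustrative):

    git bisect start
    git bisect bad v4.4-rc1    # first kernel where we saw the deadlock
    git bisect good v4.3       # last known-good kernel
    # at each step: build and boot, run the ftp send-to-self test,
    # then mark the result:
    git bisect good            # or: git bisect bad
    # ...repeated until git named c9e75f0492b2 as the first bad commit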

Here is the console splat after the lockup occurs.

brent-6port login: INFO: task systemd:1 blocked for more than 180 seconds.
        Not tainted 4.4.0-rc1+ #22
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
systemd         D ffff88007dfcfd10     0     1      0 0x00000000
   ffff88007dfcfd10 00ff88007dfcfcd8 0000000000000001 ffff88007e296f80
   ffff88007dff8000 ffff88007dfd0000 ffffffff82bae220 ffff880036758c00
   fffffffffffffff2 ffff88005e327b00 ffff88007dfcfd28 ffffffff816c571f
Call Trace:
   [<ffffffff816c571f>] schedule+0x7a/0x8f
   [<ffffffff81126ed2>] percpu_down_write+0xad/0xc4
   [<ffffffff811220c8>] ? wake_up_atomic_t+0x25/0x25
   [<ffffffff81169b4d>] __cgroup_procs_write+0x72/0x229
   [<ffffffff8112b2eb>] ? lock_acquire+0x103/0x18f
   [<ffffffff81169d21>] cgroup_procs_write+0xf/0x11
   [<ffffffff81167b58>] cgroup_file_write+0x2c/0xd0
   [<ffffffff8126dace>] kernfs_fop_write+0x106/0x14e
   [<ffffffff812069cd>] __vfs_write+0x21/0xa0
   [<ffffffff81126dad>] ? percpu_down_read+0x36/0x5f
   [<ffffffff81208faa>] ? __sb_start_write+0x5a/0xab
   [<ffffffff81208faa>] ? __sb_start_write+0x5a/0xab
   [<ffffffff8120722d>] vfs_write+0x95/0xe0
   [<ffffffff812079e0>] SyS_write+0x4b/0x79
   [<ffffffff816c99b6>] entry_SYSCALL_64_fastpath+0x16/0x76
4 locks held by systemd/1:
   #0:  (sb_writers#6){.+.+.+}, at: [<ffffffff81208faa>] __sb_start_write+0x5a/0xab
   #1:  (&of->mutex){+.+.+.}, at: [<ffffffff8126da5d>] kernfs_fop_write+0x95/0x14e
   #2:  (cgroup_mutex){+.+.+.}, at: [<ffffffff81167369>] cgroup_kn_lock_live+0x4b/0x98
   #3:  (&cgroup_threadgroup_rwsem){++++++}, at: [<ffffffff81126e44>] percpu_down_write+0x1f/0xc4
INFO: task kworker/u8:2:185 blocked for more than 180 seconds.
        Not tainted 4.4.0-rc1+ #22
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u8:2    D ffff880036befb58     0   185      2 0x00000000
Workqueue: netns cleanup_net
   ffff880036befb58 00ff880036befbd0 ffffffff00000002 ffff88007e316f80
   ffff8800783e8000 ffff880036bf0000 ffff88005917bed0 ffff8800783e8000
   ffffffff816c8953 ffff88005917bed8 ffff880036befb70 ffffffff816c571f
Call Trace:
   [<ffffffff816c8953>] ? usleep_range+0x3a/0x3a
   [<ffffffff816c571f>] schedule+0x7a/0x8f
   [<ffffffff816c8982>] schedule_timeout+0x2f/0xd8
   [<ffffffff816c9204>] ? _raw_spin_unlock_irq+0x27/0x3f
   [<ffffffff816c8953>] ? usleep_range+0x3a/0x3a
   [<ffffffff81129ea2>] ? trace_hardirqs_on_caller+0x16f/0x18b
   [<ffffffff816c5f44>] do_wait_for_common+0xf0/0x127
   [<ffffffff816c5f44>] ? do_wait_for_common+0xf0/0x127
   [<ffffffff811101c3>] ? wake_up_q+0x42/0x42
   [<ffffffff816c5ff2>] wait_for_common+0x36/0x50
   [<ffffffff816c6024>] wait_for_completion+0x18/0x1a
   [<ffffffff81106b5f>] kthread_stop+0xc8/0x217
   [<ffffffffa06d440c>] pg_net_exit+0xbc/0x112 [pktgen]
   [<ffffffff81604fa2>] ops_exit_list+0x3d/0x4e
   [<ffffffff81606123>] cleanup_net+0x19f/0x234
   [<ffffffff81101aa3>] process_one_work+0x237/0x46b
   [<ffffffff8110216d>] worker_thread+0x1e7/0x292
   [<ffffffff81101f86>] ? rescuer_thread+0x285/0x285
   [<ffffffff81106ded>] kthread+0xc4/0xcc
   [<ffffffff81106d29>] ? kthread_parkme+0x1f/0x1f
   [<ffffffff816c9d6f>] ret_from_fork+0x3f/0x70
   [<ffffffff81106d29>] ? kthread_parkme+0x1f/0x1f
3 locks held by kworker/u8:2/185:
   #0:  ("%s""netns"){.+.+.+}, at: [<ffffffff811019ad>] 
process_one_work+0x141/0x46b
   #1:  (net_cleanup_work){+.+.+.}, at: [<ffffffff811019ad>] 
process_one_work+0x141/0x46b
   #2:  (net_mutex){+.+.+.}, at: [<ffffffff81605ffe>] cleanup_net+0x7a/0x234
INFO: task systemd-udevd:704 blocked for more than 180 seconds.
        Not tainted 4.4.0-rc1+ #22
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
systemd-udevd   D ffff880078767cd8     0   704      1 0x00000080
   ffff880078767cd8 0000000000000000 0000000000000000 ffff88007e216f80
   ffff8800366ea480 ffff880078768000 ffff8800366ea480 0000000000000000
   0000000001200011 00005649eb435028 ffff880078767cf0 ffffffff816c571f
Call Trace:
   [<ffffffff816c571f>] schedule+0x7a/0x8f
   [<ffffffff816c85ee>] rwsem_down_read_failed+0xdc/0xf8
   [<ffffffff8136d634>] call_rwsem_down_read_failed+0x14/0x30
   [<ffffffff8136d634>] ? call_rwsem_down_read_failed+0x14/0x30
   [<ffffffff810e9b97>] ? copy_process+0x5b8/0x195d
   [<ffffffff81126dc4>] ? percpu_down_read+0x4d/0x5f
   [<ffffffff810e9b97>] copy_process+0x5b8/0x195d
   [<ffffffff810eb094>] _do_fork+0x8c/0x2fb
   [<ffffffff81177b2c>] ? __audit_syscall_entry+0xbb/0xdf
   [<ffffffff81003526>] ? do_audit_syscall_entry+0x5b/0x5d
   [<ffffffff8100366d>] ? syscall_trace_enter_phase1+0xe0/0xeb
   [<ffffffff81003017>] ? trace_hardirqs_on_thunk+0x17/0x19
   [<ffffffff810eb375>] SyS_clone+0x14/0x16
   [<ffffffff816c99b6>] entry_SYSCALL_64_fastpath+0x16/0x76
1 lock held by systemd-udevd/704:
   #0:  (&cgroup_threadgroup_rwsem){++++++}, at: [<ffffffff810e9b97>] copy_process+0x5b8/0x195d
INFO: task btserver:1838 blocked for more than 180 seconds.
        Not tainted 4.4.0-rc1+ #22
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
btserver        D ffff88007a2a7cd8     0  1838   1663 0x00000000
   ffff88007a2a7cd8 0000000000000000 0000000000000002 ffff88007e316f80
   ffff88007a374900 ffff88007a2a8000 ffff88007a374900 0000000000000000
   0000000000100011 00000000007d51bf ffff88007a2a7cf0 ffffffff816c571f
Call Trace:
   [<ffffffff816c571f>] schedule+0x7a/0x8f
   [<ffffffff816c85ee>] rwsem_down_read_failed+0xdc/0xf8
   [<ffffffff8136d634>] call_rwsem_down_read_failed+0x14/0x30
   [<ffffffff8136d634>] ? call_rwsem_down_read_failed+0x14/0x30
   [<ffffffff810e9b97>] ? copy_process+0x5b8/0x195d
   [<ffffffff81126dc4>] ? percpu_down_read+0x4d/0x5f
   [<ffffffff810e9b97>] copy_process+0x5b8/0x195d
   [<ffffffff810eb094>] _do_fork+0x8c/0x2fb
   [<ffffffff81129ecb>] ? trace_hardirqs_on+0xd/0xf
   [<ffffffff816c9209>] ? _raw_spin_unlock_irq+0x2c/0x3f
   [<ffffffff810f9339>] ? __set_current_blocked+0x44/0x49
   [<ffffffff810f5cc5>] ? copy_to_user+0x32/0x38
   [<ffffffff81003044>] ? lockdep_sys_exit_thunk+0x12/0x14
   [<ffffffff810eb375>] SyS_clone+0x14/0x16
   [<ffffffff816c99b6>] entry_SYSCALL_64_fastpath+0x16/0x76
1 lock held by btserver/1838:
   #0:  (&cgroup_threadgroup_rwsem){++++++}, at: [<ffffffff810e9b97>] copy_process+0x5b8/0x195d
INFO: task btserver:2199 blocked for more than 180 seconds.
        Not tainted 4.4.0-rc1+ #22
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
btserver        D ffff88005e29bcd8     0  2199   1652 0x00000000
   ffff88005e29bcd8 0000000000000000 0000000000000000 ffff88007e216f80
   ffff88000019a480 ffff88005e29c000 ffff88000019a480 0000000000000000
   0000000000100011 00000000007d51bf ffff88005e29bcf0 ffffffff816c571f
Call Trace:
   [<ffffffff816c571f>] schedule+0x7a/0x8f
   [<ffffffff816c85ee>] rwsem_down_read_failed+0xdc/0xf8
   [<ffffffff8136d634>] call_rwsem_down_read_failed+0x14/0x30
   [<ffffffff8136d634>] ? call_rwsem_down_read_failed+0x14/0x30
   [<ffffffff810e9b97>] ? copy_process+0x5b8/0x195d
   [<ffffffff81126dc4>] ? percpu_down_read+0x4d/0x5f
   [<ffffffff810e9b97>] copy_process+0x5b8/0x195d
   [<ffffffff810eb094>] _do_fork+0x8c/0x2fb
   [<ffffffff81129ecb>] ? trace_hardirqs_on+0xd/0xf
   [<ffffffff816c9209>] ? _raw_spin_unlock_irq+0x2c/0x3f
   [<ffffffff810f9339>] ? __set_current_blocked+0x44/0x49
   [<ffffffff810f5cc5>] ? copy_to_user+0x32/0x38
   [<ffffffff81003044>] ? lockdep_sys_exit_thunk+0x12/0x14
   [<ffffffff810eb375>] SyS_clone+0x14/0x16
   [<ffffffff816c99b6>] entry_SYSCALL_64_fastpath+0x16/0x76
1 lock held by btserver/2199:
   #0:  (&cgroup_threadgroup_rwsem){++++++}, at: [<ffffffff810e9b97>] copy_process+0x5b8/0x195d
INFO: task vsftpd:4352 blocked for more than 180 seconds.
        Not tainted 4.4.0-rc1+ #22
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
vsftpd          D ffff880054867c68     0  4352   2611 0x00000000
   ffff880054867c68 00ff88005933a480 ffff880000000000 ffff88007e216f80
   ffff88005933a480 ffff880054868000 0000000000000246 ffff880054867cc0
   ffff88005933a480 ffffffff81cea268 ffff880054867c80 ffffffff816c571f
Call Trace:
   [<ffffffff816c571f>] schedule+0x7a/0x8f
   [<ffffffff816c59b1>] schedule_preempt_disabled+0x10/0x19
   [<ffffffff816c64c0>] mutex_lock_nested+0x1c0/0x3a0
   [<ffffffff81606233>] ? copy_net_ns+0x7b/0xf8
   [<ffffffff81606233>] copy_net_ns+0x7b/0xf8
   [<ffffffff81606233>] ? copy_net_ns+0x7b/0xf8
   [<ffffffff811073c9>] create_new_namespaces+0xfc/0x16b
   [<ffffffff8110759c>] copy_namespaces+0x164/0x186
   [<ffffffff810ea6b1>] copy_process+0x10d2/0x195d
   [<ffffffff810eb094>] _do_fork+0x8c/0x2fb
   [<ffffffff81003044>] ? lockdep_sys_exit_thunk+0x12/0x14
   [<ffffffff810eb375>] SyS_clone+0x14/0x16
   [<ffffffff816c99b6>] entry_SYSCALL_64_fastpath+0x16/0x76
2 locks held by vsftpd/4352:
   #0:  (&cgroup_threadgroup_rwsem){++++++}, at: [<ffffffff810e9b97>] copy_process+0x5b8/0x195d
   #1:  (net_mutex){+.+.+.}, at: [<ffffffff81606233>] copy_net_ns+0x7b/0xf8
INFO: task kpktgend_0:4354 blocked for more than 180 seconds.
        Not tainted 4.4.0-rc1+ #22
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kpktgend_0      D ffff88005917bce8     0  4354      2 0x00000000
   ffff88005917bce8 00ffffffa06d5d06 ffff880000000000 ffff88007e216f80
   ffff88007a4ec900 ffff88005917c000 ffff88007a4ec900 ffffffffa06d5d06
   ffff88005917bed0 0000000000000000 ffff88005917bd00 ffffffff816c571f
Call Trace:
   [<ffffffffa06d5d06>] ? pg_net_init+0x346/0x346 [pktgen]
   [<ffffffff816c571f>] schedule+0x7a/0x8f
   [<ffffffff816c85ee>] rwsem_down_read_failed+0xdc/0xf8
   [<ffffffff8136d634>] call_rwsem_down_read_failed+0x14/0x30
   [<ffffffff8136d634>] ? call_rwsem_down_read_failed+0x14/0x30
   [<ffffffff810f91e0>] ? exit_signals+0x17/0x103
   [<ffffffff81126dc4>] ? percpu_down_read+0x4d/0x5f
   [<ffffffff810f91e0>] exit_signals+0x17/0x103
   [<ffffffff810ed342>] do_exit+0x105/0x9a4
   [<ffffffffa06d5d06>] ? pg_net_init+0x346/0x346 [pktgen]
   [<ffffffff81106df5>] kthread+0xcc/0xcc
   [<ffffffff81106d29>] ? kthread_parkme+0x1f/0x1f
   [<ffffffff816c9d6f>] ret_from_fork+0x3f/0x70
   [<ffffffff81106d29>] ? kthread_parkme+0x1f/0x1f
1 lock held by kpktgend_0/4354:
   #0:  (&cgroup_threadgroup_rwsem){++++++}, at: [<ffffffff810f91e0>] exit_signals+0x17/0x103
INFO: task kpktgend_1:4355 blocked for more than 180 seconds.
        Not tainted 4.4.0-rc1+ #22
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kpktgend_1      D ffff88007a4e3ce8     0  4355      2 0x00000000
   ffff88007a4e3ce8 00ffffffa06d5d06 ffff880000000001 ffff88007e296f80
   ffff88007a350000 ffff88007a4e4000 ffff88007a350000 ffffffffa06d5d06
   ffff88007a4e3ed0 0000000000000000 ffff88007a4e3d00 ffffffff816c571f
Call Trace:
   [<ffffffffa06d5d06>] ? pg_net_init+0x346/0x346 [pktgen]
   [<ffffffff816c571f>] schedule+0x7a/0x8f
   [<ffffffff816c85ee>] rwsem_down_read_failed+0xdc/0xf8
   [<ffffffff8136d634>] call_rwsem_down_read_failed+0x14/0x30
   [<ffffffff8136d634>] ? call_rwsem_down_read_failed+0x14/0x30
   [<ffffffff810f91e0>] ? exit_signals+0x17/0x103
   [<ffffffff81126dc4>] ? percpu_down_read+0x4d/0x5f
   [<ffffffff810f91e0>] exit_signals+0x17/0x103
   [<ffffffff810ed342>] do_exit+0x105/0x9a4
   [<ffffffffa06d5d06>] ? pg_net_init+0x346/0x346 [pktgen]
   [<ffffffff81106df5>] kthread+0xcc/0xcc
   [<ffffffff81106d29>] ? kthread_parkme+0x1f/0x1f
   [<ffffffff816c9d6f>] ret_from_fork+0x3f/0x70
   [<ffffffff81106d29>] ? kthread_parkme+0x1f/0x1f
1 lock held by kpktgend_1/4355:
   #0:  (&cgroup_threadgroup_rwsem){++++++}, at: [<ffffffff810f91e0>] 
exit_signals+0x17/0x103
INFO: task kpktgend_2:4356 blocked for more than 180 seconds.
        Not tainted 4.4.0-rc1+ #22
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kpktgend_2      D ffff8800549f7ce8     0  4356      2 0x00000000
   ffff8800549f7ce8 00ffffffa06d5d06 ffff880000000002 ffff88007e316f80
   ffff880054a52480 ffff8800549f8000 ffff880054a52480 ffffffffa06d5d06
   ffff8800549f7ed0 0000000000000000 ffff8800549f7d00 ffffffff816c571f
Call Trace:
   [<ffffffffa06d5d06>] ? pg_net_init+0x346/0x346 [pktgen]
   [<ffffffff816c571f>] schedule+0x7a/0x8f
   [<ffffffff816c85ee>] rwsem_down_read_failed+0xdc/0xf8
   [<ffffffff8136d634>] call_rwsem_down_read_failed+0x14/0x30
   [<ffffffff8136d634>] ? call_rwsem_down_read_failed+0x14/0x30
   [<ffffffff810f91e0>] ? exit_signals+0x17/0x103
   [<ffffffff81126dc4>] ? percpu_down_read+0x4d/0x5f
   [<ffffffff810f91e0>] exit_signals+0x17/0x103
   [<ffffffff810ed342>] do_exit+0x105/0x9a4
   [<ffffffffa06d5d06>] ? pg_net_init+0x346/0x346 [pktgen]
   [<ffffffff81106df5>] kthread+0xcc/0xcc
   [<ffffffff81106d29>] ? kthread_parkme+0x1f/0x1f
   [<ffffffff816c9d6f>] ret_from_fork+0x3f/0x70
   [<ffffffff81106d29>] ? kthread_parkme+0x1f/0x1f
1 lock held by kpktgend_2/4356:
   #0:  (&cgroup_threadgroup_rwsem){++++++}, at: [<ffffffff810f91e0>] exit_signals+0x17/0x103
INFO: task kpktgend_3:4357 blocked for more than 180 seconds.
        Not tainted 4.4.0-rc1+ #22
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kpktgend_3      D ffff88005e2b7ce8     0  4357      2 0x00000000
   ffff88005e2b7ce8 00ffffffa06d5d06 ffff880000000003 ffff88007e396f80
   ffff880054a54900 ffff88005e2b8000 ffff880054a54900 ffffffffa06d5d06
   ffff88005e2b7ed0 0000000000000000 ffff88005e2b7d00 ffffffff816c571f
Call Trace:
   [<ffffffffa06d5d06>] ? pg_net_init+0x346/0x346 [pktgen]
   [<ffffffff816c571f>] schedule+0x7a/0x8f
   [<ffffffff816c85ee>] rwsem_down_read_failed+0xdc/0xf8
   [<ffffffff8136d634>] call_rwsem_down_read_failed+0x14/0x30
   [<ffffffff8136d634>] ? call_rwsem_down_read_failed+0x14/0x30
   [<ffffffff810f91e0>] ? exit_signals+0x17/0x103
   [<ffffffff81126dc4>] ? percpu_down_read+0x4d/0x5f
   [<ffffffff810f91e0>] exit_signals+0x17/0x103
   [<ffffffff810ed342>] do_exit+0x105/0x9a4
   [<ffffffffa06d5d06>] ? pg_net_init+0x346/0x346 [pktgen]
   [<ffffffff81106df5>] kthread+0xcc/0xcc
   [<ffffffff81106d29>] ? kthread_parkme+0x1f/0x1f
   [<ffffffff816c9d6f>] ret_from_fork+0x3f/0x70
   [<ffffffff81106d29>] ? kthread_parkme+0x1f/0x1f
1 lock held by kpktgend_3/4357:
   #0:  (&cgroup_threadgroup_rwsem){++++++}, at: [<ffffffff810f91e0>] exit_signals+0x17/0x103

login: timed out after 60 seconds



Please let me know if I can offer further help with troubleshooting this problem.


* Re: Bisected: Kernel deadlock bug related to cgroups.
From: Oleg Nesterov @ 2016-08-31 10:16 UTC (permalink / raw)
  To: Brent Lovelace; +Cc: linux-kernel, lizefan, tj, Balbir Singh

On 08/30, Brent Lovelace wrote:
>
> I found a kernel deadlock regression bug introduced in the 4.4 kernel.
...
> I bisected to this commit id:
> ----------------------------------------------------------------------------------
> commit c9e75f0492b248aeaa7af8991a6fc9a21506bc96
> Author: Oleg Nesterov <oleg@redhat.com>
> Date:   Fri Nov 27 19:57:19 2015 +0100
>
>      cgroup: pids: fix race between cgroup_post_fork() and cgroup_migrate()

Thanks Brent!

> systemd         D ffff88007dfcfd10     0     1      0 0x00000000
>   ffff88007dfcfd10 00ff88007dfcfcd8 0000000000000001 ffff88007e296f80
>   ffff88007dff8000 ffff88007dfd0000 ffffffff82bae220 ffff880036758c00
>   fffffffffffffff2 ffff88005e327b00 ffff88007dfcfd28 ffffffff816c571f
> Call Trace:
>   [<ffffffff816c571f>] schedule+0x7a/0x8f
>   [<ffffffff81126ed2>] percpu_down_write+0xad/0xc4
>   [<ffffffff811220c8>] ? wake_up_atomic_t+0x25/0x25
>   [<ffffffff81169b4d>] __cgroup_procs_write+0x72/0x229
>   [<ffffffff8112b2eb>] ? lock_acquire+0x103/0x18f

so it sleeps in wait_event() waiting for active readers, and the new
readers will block. In particular, do_exit() will block.

> kworker/u8:2    D ffff880036befb58     0   185      2 0x00000000
> Workqueue: netns cleanup_net
>   ffff880036befb58 00ff880036befbd0 ffffffff00000002 ffff88007e316f80
>   ffff8800783e8000 ffff880036bf0000 ffff88005917bed0 ffff8800783e8000
>   ffffffff816c8953 ffff88005917bed8 ffff880036befb70 ffffffff816c571f
> Call Trace:
>   [<ffffffff816c8953>] ? usleep_range+0x3a/0x3a
>   [<ffffffff816c571f>] schedule+0x7a/0x8f
>   [<ffffffff816c8982>] schedule_timeout+0x2f/0xd8
>   [<ffffffff816c9204>] ? _raw_spin_unlock_irq+0x27/0x3f
>   [<ffffffff816c8953>] ? usleep_range+0x3a/0x3a
>   [<ffffffff81129ea2>] ? trace_hardirqs_on_caller+0x16f/0x18b
>   [<ffffffff816c5f44>] do_wait_for_common+0xf0/0x127
>   [<ffffffff816c5f44>] ? do_wait_for_common+0xf0/0x127
>   [<ffffffff811101c3>] ? wake_up_q+0x42/0x42
>   [<ffffffff816c5ff2>] wait_for_common+0x36/0x50
>   [<ffffffff816c6024>] wait_for_completion+0x18/0x1a
>   [<ffffffff81106b5f>] kthread_stop+0xc8/0x217
>   [<ffffffffa06d440c>] pg_net_exit+0xbc/0x112 [pktgen]
>   [<ffffffff81604fa2>] ops_exit_list+0x3d/0x4e
>   [<ffffffff81606123>] cleanup_net+0x19f/0x234
>   [<ffffffff81101aa3>] process_one_work+0x237/0x46b
>   [<ffffffff8110216d>] worker_thread+0x1e7/0x292
>   [<ffffffff81101f86>] ? rescuer_thread+0x285/0x285
>   [<ffffffff81106ded>] kthread+0xc4/0xcc
>   [<ffffffff81106d29>] ? kthread_parkme+0x1f/0x1f
>   [<ffffffff816c9d6f>] ret_from_fork+0x3f/0x70
>   [<ffffffff81106d29>] ? kthread_parkme+0x1f/0x1f
> 3 locks held by kworker/u8:2/185:
>   #0:  ("%s""netns"){.+.+.+}, at: [<ffffffff811019ad>]
> process_one_work+0x141/0x46b
>   #1:  (net_cleanup_work){+.+.+.}, at: [<ffffffff811019ad>]
> process_one_work+0x141/0x46b
>   #2:  (net_mutex){+.+.+.}, at: [<ffffffff81605ffe>] cleanup_net+0x7a/0x234

Note that it sleeps with net_mutex held. Probably waiting for kpktgend_*
below.

> vsftpd          D ffff880054867c68     0  4352   2611 0x00000000
>   ffff880054867c68 00ff88005933a480 ffff880000000000 ffff88007e216f80
>   ffff88005933a480 ffff880054868000 0000000000000246 ffff880054867cc0
>   ffff88005933a480 ffffffff81cea268 ffff880054867c80 ffffffff816c571f
> Call Trace:
>   [<ffffffff816c571f>] schedule+0x7a/0x8f
>   [<ffffffff816c59b1>] schedule_preempt_disabled+0x10/0x19
>   [<ffffffff816c64c0>] mutex_lock_nested+0x1c0/0x3a0
>   [<ffffffff81606233>] ? copy_net_ns+0x7b/0xf8
>   [<ffffffff81606233>] copy_net_ns+0x7b/0xf8
>   [<ffffffff81606233>] ? copy_net_ns+0x7b/0xf8
>   [<ffffffff811073c9>] create_new_namespaces+0xfc/0x16b
>   [<ffffffff8110759c>] copy_namespaces+0x164/0x186
>   [<ffffffff810ea6b1>] copy_process+0x10d2/0x195d
>   [<ffffffff810eb094>] _do_fork+0x8c/0x2fb
>   [<ffffffff81003044>] ? lockdep_sys_exit_thunk+0x12/0x14
>   [<ffffffff810eb375>] SyS_clone+0x14/0x16
>   [<ffffffff816c99b6>] entry_SYSCALL_64_fastpath+0x16/0x76
> 2 locks held by vsftpd/4352:
>   #0:  (&cgroup_threadgroup_rwsem){++++++}, at: [<ffffffff810e9b97>]
> copy_process+0x5b8/0x195d
>   #1:  (net_mutex){+.+.+.}, at: [<ffffffff81606233>] copy_net_ns+0x7b/0xf8

This waits for net_mutex held by kworker/u8:2 above. And with
cgroup_threadgroup_rwsem acquired for reading, that is why systemd
above hangs.

> kpktgend_0      D ffff88005917bce8     0  4354      2 0x00000000
>   ffff88005917bce8 00ffffffa06d5d06 ffff880000000000 ffff88007e216f80
>   ffff88007a4ec900 ffff88005917c000 ffff88007a4ec900 ffffffffa06d5d06
>   ffff88005917bed0 0000000000000000 ffff88005917bd00 ffffffff816c571f
> Call Trace:
>   [<ffffffffa06d5d06>] ? pg_net_init+0x346/0x346 [pktgen]
>   [<ffffffff816c571f>] schedule+0x7a/0x8f
>   [<ffffffff816c85ee>] rwsem_down_read_failed+0xdc/0xf8
>   [<ffffffff8136d634>] call_rwsem_down_read_failed+0x14/0x30
>   [<ffffffff8136d634>] ? call_rwsem_down_read_failed+0x14/0x30
>   [<ffffffff810f91e0>] ? exit_signals+0x17/0x103
>   [<ffffffff81126dc4>] ? percpu_down_read+0x4d/0x5f
>   [<ffffffff810f91e0>] exit_signals+0x17/0x103
>   [<ffffffff810ed342>] do_exit+0x105/0x9a4
>   [<ffffffffa06d5d06>] ? pg_net_init+0x346/0x346 [pktgen]
>   [<ffffffff81106df5>] kthread+0xcc/0xcc
>   [<ffffffff81106d29>] ? kthread_parkme+0x1f/0x1f
>   [<ffffffff816c9d6f>] ret_from_fork+0x3f/0x70
>   [<ffffffff81106d29>] ? kthread_parkme+0x1f/0x1f
> 1 lock held by kpktgend_0/4354:
>   #0:  (&cgroup_threadgroup_rwsem){++++++}, at: [<ffffffff810f91e0>]

it can't take cgroup_threadgroup_rwsem for reading, so it can't exit,
and that is why kworker/u8:2 hangs.

> kpktgend_1      D ffff88007a4e3ce8     0  4355      2 0x00000000
...
> kpktgend_2      D ffff8800549f7ce8     0  4356      2 0x00000000
...
> kpktgend_3      D ffff88005e2b7ce8     0  4357      2 0x00000000
...

The same.
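
Spelling out the cycle as I read the traces:

    systemd       holds cgroup_mutex, sleeps in percpu_down_write()
                  waiting for the readers of cgroup_threadgroup_rwsem
    vsftpd        holds cgroup_threadgroup_rwsem for reading (fork),
                  waits for net_mutex in copy_net_ns()
    kworker/u8:2  holds net_mutex (cleanup_net), waits in kthread_stop()
                  for the kpktgend_* threads to exit
    kpktgend_*    exiting, blocked in percpu_down_read() behind the
                  pending writer (systemd)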

Could you try the recent 568ac888215c7fb2fab "cgroup: reduce read
locked section of cgroup_threadgroup_rwsem during fork" patch?
Attached below.


With this patch copy_net_ns() should be called outside of
cgroup_threadgroup_rwsem, so the deadlock should hopefully go away.

Thanks,

Oleg.
---

commit 568ac888215c7fb2fabe8ea739b00ec3c1f5d440
Author: Balbir Singh <bsingharora@gmail.com>
Date:   Wed Aug 10 15:43:06 2016 -0400

    cgroup: reduce read locked section of cgroup_threadgroup_rwsem during fork
    
    cgroup_threadgroup_rwsem is acquired in read mode during process exit
    and fork.  It is also grabbed in write mode during
    __cgroups_proc_write().  I've recently run into a scenario with lots
    of memory pressure and OOM and I am beginning to see
    
    systemd
    
     __switch_to+0x1f8/0x350
     __schedule+0x30c/0x990
     schedule+0x48/0xc0
     percpu_down_write+0x114/0x170
     __cgroup_procs_write.isra.12+0xb8/0x3c0
     cgroup_file_write+0x74/0x1a0
     kernfs_fop_write+0x188/0x200
     __vfs_write+0x6c/0xe0
     vfs_write+0xc0/0x230
     SyS_write+0x6c/0x110
     system_call+0x38/0xb4
    
    This thread is waiting on the reader of cgroup_threadgroup_rwsem to
    exit.  The reader itself is under memory pressure and has gone into
    reclaim after fork. There are times the reader also ends up waiting on
    oom_lock as well.
    
     __switch_to+0x1f8/0x350
     __schedule+0x30c/0x990
     schedule+0x48/0xc0
     jbd2_log_wait_commit+0xd4/0x180
     ext4_evict_inode+0x88/0x5c0
     evict+0xf8/0x2a0
     dispose_list+0x50/0x80
     prune_icache_sb+0x6c/0x90
     super_cache_scan+0x190/0x210
     shrink_slab.part.15+0x22c/0x4c0
     shrink_zone+0x288/0x3c0
     do_try_to_free_pages+0x1dc/0x590
     try_to_free_pages+0xdc/0x260
     __alloc_pages_nodemask+0x72c/0xc90
     alloc_pages_current+0xb4/0x1a0
     page_table_alloc+0xc0/0x170
     __pte_alloc+0x58/0x1f0
     copy_page_range+0x4ec/0x950
     copy_process.isra.5+0x15a0/0x1870
     _do_fork+0xa8/0x4b0
     ppc_clone+0x8/0xc
    
    In the meanwhile, all processes exiting/forking are blocked almost
    stalling the system.
    
    This patch moves the threadgroup_change_begin from before
    cgroup_fork() to just before cgroup_can_fork().  There is no need to
    worry about threadgroup changes till the task is actually added to the
    threadgroup.  This avoids having to call reclaim with
    cgroup_threadgroup_rwsem held.
    
    tj: Subject and description edits.
    
    Signed-off-by: Balbir Singh <bsingharora@gmail.com>
    Acked-by: Zefan Li <lizefan@huawei.com>
    Cc: Oleg Nesterov <oleg@redhat.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: stable@vger.kernel.org # v4.2+
    Signed-off-by: Tejun Heo <tj@kernel.org>

diff --git a/kernel/fork.c b/kernel/fork.c
index 52e725d..aaf7823 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1404,7 +1404,6 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 	p->real_start_time = ktime_get_boot_ns();
 	p->io_context = NULL;
 	p->audit_context = NULL;
-	threadgroup_change_begin(current);
 	cgroup_fork(p);
 #ifdef CONFIG_NUMA
 	p->mempolicy = mpol_dup(p->mempolicy);
@@ -1556,6 +1555,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 	INIT_LIST_HEAD(&p->thread_group);
 	p->task_works = NULL;
 
+	threadgroup_change_begin(current);
 	/*
 	 * Ensure that the cgroup subsystem policies allow the new process to be
 	 * forked. It should be noted the the new process's css_set can be changed
@@ -1656,6 +1656,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 bad_fork_cancel_cgroup:
 	cgroup_cancel_fork(p);
 bad_fork_free_pid:
+	threadgroup_change_end(current);
 	if (pid != &init_struct_pid)
 		free_pid(pid);
 bad_fork_cleanup_thread:
@@ -1688,7 +1689,6 @@ bad_fork_cleanup_policy:
 	mpol_put(p->mempolicy);
 bad_fork_cleanup_threadgroup_lock:
 #endif
-	threadgroup_change_end(current);
 	delayacct_tsk_free(p);
 bad_fork_cleanup_count:
 	atomic_dec(&p->cred->user->processes);


* Re: Bisected: Kernel deadlock bug related to cgroups.
From: Balbir Singh @ 2016-08-31 10:33 UTC (permalink / raw)
  To: Oleg Nesterov, Brent Lovelace; +Cc: linux-kernel, lizefan, tj



On 31/08/16 20:16, Oleg Nesterov wrote:
> On 08/30, Brent Lovelace wrote:
>>
>> I found a kernel deadlock regression bug introduced in the 4.4 kernel.
> ...
>> I bisected to this commit id:
>> ----------------------------------------------------------------------------------
>> commit c9e75f0492b248aeaa7af8991a6fc9a21506bc96
>> Author: Oleg Nesterov <oleg@redhat.com>
>> Date:   Fri Nov 27 19:57:19 2015 +0100
>>
>>      cgroup: pids: fix race between cgroup_post_fork() and cgroup_migrate()
> 
> Thanks Brent!
> 
>> systemd         D ffff88007dfcfd10     0     1      0 0x00000000
>>   ffff88007dfcfd10 00ff88007dfcfcd8 0000000000000001 ffff88007e296f80
>>   ffff88007dff8000 ffff88007dfd0000 ffffffff82bae220 ffff880036758c00
>>   fffffffffffffff2 ffff88005e327b00 ffff88007dfcfd28 ffffffff816c571f
>> Call Trace:
>>   [<ffffffff816c571f>] schedule+0x7a/0x8f
>>   [<ffffffff81126ed2>] percpu_down_write+0xad/0xc4
>>   [<ffffffff811220c8>] ? wake_up_atomic_t+0x25/0x25
>>   [<ffffffff81169b4d>] __cgroup_procs_write+0x72/0x229
>>   [<ffffffff8112b2eb>] ? lock_acquire+0x103/0x18f
> 
> so it sleeps in wait_event() waiting for active readers, and the new
> readers will block. In particular, do_exit() will block.
> 

New readers and writers will both block; usually this is a sign that a
reader is already holding the percpu rwsem.

>> kworker/u8:2    D ffff880036befb58     0   185      2 0x00000000
>> Workqueue: netns cleanup_net
>>   ffff880036befb58 00ff880036befbd0 ffffffff00000002 ffff88007e316f80
>>   ffff8800783e8000 ffff880036bf0000 ffff88005917bed0 ffff8800783e8000
>>   ffffffff816c8953 ffff88005917bed8 ffff880036befb70 ffffffff816c571f
>> Call Trace:
>>   [<ffffffff816c8953>] ? usleep_range+0x3a/0x3a
>>   [<ffffffff816c571f>] schedule+0x7a/0x8f
>>   [<ffffffff816c8982>] schedule_timeout+0x2f/0xd8
>>   [<ffffffff816c9204>] ? _raw_spin_unlock_irq+0x27/0x3f
>>   [<ffffffff816c8953>] ? usleep_range+0x3a/0x3a
>>   [<ffffffff81129ea2>] ? trace_hardirqs_on_caller+0x16f/0x18b
>>   [<ffffffff816c5f44>] do_wait_for_common+0xf0/0x127
>>   [<ffffffff816c5f44>] ? do_wait_for_common+0xf0/0x127
>>   [<ffffffff811101c3>] ? wake_up_q+0x42/0x42
>>   [<ffffffff816c5ff2>] wait_for_common+0x36/0x50
>>   [<ffffffff816c6024>] wait_for_completion+0x18/0x1a
>>   [<ffffffff81106b5f>] kthread_stop+0xc8/0x217
>>   [<ffffffffa06d440c>] pg_net_exit+0xbc/0x112 [pktgen]
>>   [<ffffffff81604fa2>] ops_exit_list+0x3d/0x4e
>>   [<ffffffff81606123>] cleanup_net+0x19f/0x234
>>   [<ffffffff81101aa3>] process_one_work+0x237/0x46b
>>   [<ffffffff8110216d>] worker_thread+0x1e7/0x292
>>   [<ffffffff81101f86>] ? rescuer_thread+0x285/0x285
>>   [<ffffffff81106ded>] kthread+0xc4/0xcc
>>   [<ffffffff81106d29>] ? kthread_parkme+0x1f/0x1f
>>   [<ffffffff816c9d6f>] ret_from_fork+0x3f/0x70
>>   [<ffffffff81106d29>] ? kthread_parkme+0x1f/0x1f
>> 3 locks held by kworker/u8:2/185:
>>   #0:  ("%s""netns"){.+.+.+}, at: [<ffffffff811019ad>]
>> process_one_work+0x141/0x46b
>>   #1:  (net_cleanup_work){+.+.+.}, at: [<ffffffff811019ad>]
>> process_one_work+0x141/0x46b
>>   #2:  (net_mutex){+.+.+.}, at: [<ffffffff81605ffe>] cleanup_net+0x7a/0x234
> 
> Note that it sleeps with net_mutex held. Probably waiting for kpktgend_*
> below.
> 
>> vsftpd          D ffff880054867c68     0  4352   2611 0x00000000
>>   ffff880054867c68 00ff88005933a480 ffff880000000000 ffff88007e216f80
>>   ffff88005933a480 ffff880054868000 0000000000000246 ffff880054867cc0
>>   ffff88005933a480 ffffffff81cea268 ffff880054867c80 ffffffff816c571f
>> Call Trace:
>>   [<ffffffff816c571f>] schedule+0x7a/0x8f
>>   [<ffffffff816c59b1>] schedule_preempt_disabled+0x10/0x19
>>   [<ffffffff816c64c0>] mutex_lock_nested+0x1c0/0x3a0
>>   [<ffffffff81606233>] ? copy_net_ns+0x7b/0xf8
>>   [<ffffffff81606233>] copy_net_ns+0x7b/0xf8
>>   [<ffffffff81606233>] ? copy_net_ns+0x7b/0xf8
>>   [<ffffffff811073c9>] create_new_namespaces+0xfc/0x16b
>>   [<ffffffff8110759c>] copy_namespaces+0x164/0x186
>>   [<ffffffff810ea6b1>] copy_process+0x10d2/0x195d
>>   [<ffffffff810eb094>] _do_fork+0x8c/0x2fb
>>   [<ffffffff81003044>] ? lockdep_sys_exit_thunk+0x12/0x14
>>   [<ffffffff810eb375>] SyS_clone+0x14/0x16
>>   [<ffffffff816c99b6>] entry_SYSCALL_64_fastpath+0x16/0x76
>> 2 locks held by vsftpd/4352:
>>   #0:  (&cgroup_threadgroup_rwsem){++++++}, at: [<ffffffff810e9b97>]
>> copy_process+0x5b8/0x195d
>>   #1:  (net_mutex){+.+.+.}, at: [<ffffffff81606233>] copy_net_ns+0x7b/0xf8
> 
> This waits for net_mutex held by kworker/u8:2 above. And with
> cgroup_threadgroup_rwsem acquired for reading, that is why systemd
> above hangs.
> 
>> kpktgend_0      D ffff88005917bce8     0  4354      2 0x00000000
>>   ffff88005917bce8 00ffffffa06d5d06 ffff880000000000 ffff88007e216f80
>>   ffff88007a4ec900 ffff88005917c000 ffff88007a4ec900 ffffffffa06d5d06
>>   ffff88005917bed0 0000000000000000 ffff88005917bd00 ffffffff816c571f
>> Call Trace:
>>   [<ffffffffa06d5d06>] ? pg_net_init+0x346/0x346 [pktgen]
>>   [<ffffffff816c571f>] schedule+0x7a/0x8f
>>   [<ffffffff816c85ee>] rwsem_down_read_failed+0xdc/0xf8
>>   [<ffffffff8136d634>] call_rwsem_down_read_failed+0x14/0x30
>>   [<ffffffff8136d634>] ? call_rwsem_down_read_failed+0x14/0x30
>>   [<ffffffff810f91e0>] ? exit_signals+0x17/0x103
>>   [<ffffffff81126dc4>] ? percpu_down_read+0x4d/0x5f
>>   [<ffffffff810f91e0>] exit_signals+0x17/0x103
>>   [<ffffffff810ed342>] do_exit+0x105/0x9a4
>>   [<ffffffffa06d5d06>] ? pg_net_init+0x346/0x346 [pktgen]
>>   [<ffffffff81106df5>] kthread+0xcc/0xcc
>>   [<ffffffff81106d29>] ? kthread_parkme+0x1f/0x1f
>>   [<ffffffff816c9d6f>] ret_from_fork+0x3f/0x70
>>   [<ffffffff81106d29>] ? kthread_parkme+0x1f/0x1f
>> 1 lock held by kpktgend_0/4354:
>>   #0:  (&cgroup_threadgroup_rwsem){++++++}, at: [<ffffffff810f91e0>]
> 
> it can't take cgroup_threadgroup_rwsem for reading, so it can't exit,
> and that is why kworker/u8:2 hangs.
> 
>> kpktgend_1      D ffff88007a4e3ce8     0  4355      2 0x00000000
> ...
>> kpktgend_2      D ffff8800549f7ce8     0  4356      2 0x00000000
> ...
>> kpktgend_3      D ffff88005e2b7ce8     0  4357      2 0x00000000
> ...
> 
> The same.
> 
> Could you try the recent 568ac888215c7fb2fab "cgroup: reduce read
> locked section of cgroup_threadgroup_rwsem during fork" patch?
> Attached below.
> 
> 
> With this patch copy_net_ns() should be called outside of
> cgroup_threadgroup_rwsem, so the deadlock should hopefully go away.
> 
> Thanks,

Yes, I would be interested in seeing if this race goes away. Thanks for
the pointer, Oleg!

Balbir Singh.


* Re: Bisected: Kernel deadlock bug related to cgroups.
From: Brent Lovelace @ 2016-08-31 23:19 UTC (permalink / raw)
  To: Balbir Singh, Oleg Nesterov; +Cc: linux-kernel, lizefan, tj



On 08/31/2016 03:33 AM, Balbir Singh wrote:
>
> On 31/08/16 20:16, Oleg Nesterov wrote:
>> On 08/30, Brent Lovelace wrote:
>>> I found a kernel deadlock regression bug introduced in the 4.4 kernel.
>> ...
>>> I bisected to this commit id:
>>> ----------------------------------------------------------------------------------
>>> commit c9e75f0492b248aeaa7af8991a6fc9a21506bc96
>>> Author: Oleg Nesterov <oleg@redhat.com>
>>> Date:   Fri Nov 27 19:57:19 2015 +0100
>>>
>>>       cgroup: pids: fix race between cgroup_post_fork() and cgroup_migrate()
>> Thanks Brent!
>>
>>> systemd         D ffff88007dfcfd10     0     1      0 0x00000000
>>>    ffff88007dfcfd10 00ff88007dfcfcd8 0000000000000001 ffff88007e296f80
>>>    ffff88007dff8000 ffff88007dfd0000 ffffffff82bae220 ffff880036758c00
>>>    fffffffffffffff2 ffff88005e327b00 ffff88007dfcfd28 ffffffff816c571f
>>> Call Trace:
>>>    [<ffffffff816c571f>] schedule+0x7a/0x8f
>>>    [<ffffffff81126ed2>] percpu_down_write+0xad/0xc4
>>>    [<ffffffff811220c8>] ? wake_up_atomic_t+0x25/0x25
>>>    [<ffffffff81169b4d>] __cgroup_procs_write+0x72/0x229
>>>    [<ffffffff8112b2eb>] ? lock_acquire+0x103/0x18f
>> so it sleeps in wait_event() waiting for active readers, and the new
>> readers will block. In particular, do_exit() will block.
>>
> New readers and writers will both block; usually this is a sign that a
> reader is already holding the percpu rwsem.
>
>>> kworker/u8:2    D ffff880036befb58     0   185      2 0x00000000
>>> Workqueue: netns cleanup_net
>>>    ffff880036befb58 00ff880036befbd0 ffffffff00000002 ffff88007e316f80
>>>    ffff8800783e8000 ffff880036bf0000 ffff88005917bed0 ffff8800783e8000
>>>    ffffffff816c8953 ffff88005917bed8 ffff880036befb70 ffffffff816c571f
>>> Call Trace:
>>>    [<ffffffff816c8953>] ? usleep_range+0x3a/0x3a
>>>    [<ffffffff816c571f>] schedule+0x7a/0x8f
>>>    [<ffffffff816c8982>] schedule_timeout+0x2f/0xd8
>>>    [<ffffffff816c9204>] ? _raw_spin_unlock_irq+0x27/0x3f
>>>    [<ffffffff816c8953>] ? usleep_range+0x3a/0x3a
>>>    [<ffffffff81129ea2>] ? trace_hardirqs_on_caller+0x16f/0x18b
>>>    [<ffffffff816c5f44>] do_wait_for_common+0xf0/0x127
>>>    [<ffffffff816c5f44>] ? do_wait_for_common+0xf0/0x127
>>>    [<ffffffff811101c3>] ? wake_up_q+0x42/0x42
>>>    [<ffffffff816c5ff2>] wait_for_common+0x36/0x50
>>>    [<ffffffff816c6024>] wait_for_completion+0x18/0x1a
>>>    [<ffffffff81106b5f>] kthread_stop+0xc8/0x217
>>>    [<ffffffffa06d440c>] pg_net_exit+0xbc/0x112 [pktgen]
>>>    [<ffffffff81604fa2>] ops_exit_list+0x3d/0x4e
>>>    [<ffffffff81606123>] cleanup_net+0x19f/0x234
>>>    [<ffffffff81101aa3>] process_one_work+0x237/0x46b
>>>    [<ffffffff8110216d>] worker_thread+0x1e7/0x292
>>>    [<ffffffff81101f86>] ? rescuer_thread+0x285/0x285
>>>    [<ffffffff81106ded>] kthread+0xc4/0xcc
>>>    [<ffffffff81106d29>] ? kthread_parkme+0x1f/0x1f
>>>    [<ffffffff816c9d6f>] ret_from_fork+0x3f/0x70
>>>    [<ffffffff81106d29>] ? kthread_parkme+0x1f/0x1f
>>> 3 locks held by kworker/u8:2/185:
>>>    #0:  ("%s""netns"){.+.+.+}, at: [<ffffffff811019ad>]
>>> process_one_work+0x141/0x46b
>>>    #1:  (net_cleanup_work){+.+.+.}, at: [<ffffffff811019ad>]
>>> process_one_work+0x141/0x46b
>>>    #2:  (net_mutex){+.+.+.}, at: [<ffffffff81605ffe>] cleanup_net+0x7a/0x234
>> Note that it sleeps with net_mutex held. Probably waiting for kpktgend_*
>> below.
>>
>>> vsftpd          D ffff880054867c68     0  4352   2611 0x00000000
>>>    ffff880054867c68 00ff88005933a480 ffff880000000000 ffff88007e216f80
>>>    ffff88005933a480 ffff880054868000 0000000000000246 ffff880054867cc0
>>>    ffff88005933a480 ffffffff81cea268 ffff880054867c80 ffffffff816c571f
>>> Call Trace:
>>>    [<ffffffff816c571f>] schedule+0x7a/0x8f
>>>    [<ffffffff816c59b1>] schedule_preempt_disabled+0x10/0x19
>>>    [<ffffffff816c64c0>] mutex_lock_nested+0x1c0/0x3a0
>>>    [<ffffffff81606233>] ? copy_net_ns+0x7b/0xf8
>>>    [<ffffffff81606233>] copy_net_ns+0x7b/0xf8
>>>    [<ffffffff81606233>] ? copy_net_ns+0x7b/0xf8
>>>    [<ffffffff811073c9>] create_new_namespaces+0xfc/0x16b
>>>    [<ffffffff8110759c>] copy_namespaces+0x164/0x186
>>>    [<ffffffff810ea6b1>] copy_process+0x10d2/0x195d
>>>    [<ffffffff810eb094>] _do_fork+0x8c/0x2fb
>>>    [<ffffffff81003044>] ? lockdep_sys_exit_thunk+0x12/0x14
>>>    [<ffffffff810eb375>] SyS_clone+0x14/0x16
>>>    [<ffffffff816c99b6>] entry_SYSCALL_64_fastpath+0x16/0x76
>>> 2 locks held by vsftpd/4352:
>>>    #0:  (&cgroup_threadgroup_rwsem){++++++}, at: [<ffffffff810e9b97>]
>>> copy_process+0x5b8/0x195d
>>>    #1:  (net_mutex){+.+.+.}, at: [<ffffffff81606233>] copy_net_ns+0x7b/0xf8
>> This waits for net_mutex held by kworker/u8:2 above. And with
>> cgroup_threadgroup_rwsem acquired for reading, that is why systemd
>> above hangs.
>>
>>> kpktgend_0      D ffff88005917bce8     0  4354      2 0x00000000
>>>    ffff88005917bce8 00ffffffa06d5d06 ffff880000000000 ffff88007e216f80
>>>    ffff88007a4ec900 ffff88005917c000 ffff88007a4ec900 ffffffffa06d5d06
>>>    ffff88005917bed0 0000000000000000 ffff88005917bd00 ffffffff816c571f
>>> Call Trace:
>>>    [<ffffffffa06d5d06>] ? pg_net_init+0x346/0x346 [pktgen]
>>>    [<ffffffff816c571f>] schedule+0x7a/0x8f
>>>    [<ffffffff816c85ee>] rwsem_down_read_failed+0xdc/0xf8
>>>    [<ffffffff8136d634>] call_rwsem_down_read_failed+0x14/0x30
>>>    [<ffffffff8136d634>] ? call_rwsem_down_read_failed+0x14/0x30
>>>    [<ffffffff810f91e0>] ? exit_signals+0x17/0x103
>>>    [<ffffffff81126dc4>] ? percpu_down_read+0x4d/0x5f
>>>    [<ffffffff810f91e0>] exit_signals+0x17/0x103
>>>    [<ffffffff810ed342>] do_exit+0x105/0x9a4
>>>    [<ffffffffa06d5d06>] ? pg_net_init+0x346/0x346 [pktgen]
>>>    [<ffffffff81106df5>] kthread+0xcc/0xcc
>>>    [<ffffffff81106d29>] ? kthread_parkme+0x1f/0x1f
>>>    [<ffffffff816c9d6f>] ret_from_fork+0x3f/0x70
>>>    [<ffffffff81106d29>] ? kthread_parkme+0x1f/0x1f
>>> 1 lock held by kpktgend_0/4354:
>>>    #0:  (&cgroup_threadgroup_rwsem){++++++}, at: [<ffffffff810f91e0>]
>> it can't take cgroup_threadgroup_rwsem for reading, so it can't exit,
>> and that is why kworker/u8:2 hangs.
>>
>>> kpktgend_1      D ffff88007a4e3ce8     0  4355      2 0x00000000
>> ...
>>> kpktgend_2      D ffff8800549f7ce8     0  4356      2 0x00000000
>> ...
>>> kpktgend_3      D ffff88005e2b7ce8     0  4357      2 0x00000000
>> ...
>>
>> The same.
>>
>> Could you try the recent 568ac888215c7fb2fab "cgroup: reduce read
>> locked section of cgroup_threadgroup_rwsem during fork" patch?
>> Attached below.
>>
>>
>> With this patch copy_net_ns() should be called outside of
>> cgroup_threadgroup_rwsem, so the deadlock should hopefully go away.
>>
>> Thanks,
> Yes, I would be interested in seeing if this race goes away. Thanks for
> the pointer, Oleg!
>
> Balbir Singh.
>
Thank you so much for the help! With 568ac888215c7fb2fab patched into 
the 4.4 kernel, the deadlock issue went away.

