* Deadlock in cleanup_net and addrconf_verify_work locks up workqueue
From: Sargun Dhillon @ 2020-02-11 19:23 UTC (permalink / raw)
  To: netdev, LKML
  Cc: Tetsuo Handa, Gabriel Hartmann, Rob Gulewich, Bruce Curtis,
	Hannes Frederic Sowa

We've found a workqueue stall / deadlock. Our workload is container-oriented
and uses IPv6. Our container (namespace) churn is quite frequent, and
containers can be terminated before their networking is even set up.

We're running 4.19.73 in production, and from investigating the underlying
causes, I don't think later 4.19 releases fix it.

We've narrowed it down to a lockup between ipv6_addrconf and cleanup_net.

crash> bt 8
PID: 8      TASK: ffff9a1072b50000  CPU: 24  COMMAND: "kworker/u192:0"
 #0 [ffffbfe2c00fbb70] __schedule at ffffffffa7f02bf7
 #1 [ffffbfe2c00fbc10] schedule at ffffffffa7f031e8
 #2 [ffffbfe2c00fbc18] schedule_timeout at ffffffffa7f0700e
 #3 [ffffbfe2c00fbc90] wait_for_completion at ffffffffa7f03b50
 #4 [ffffbfe2c00fbce0] __flush_work at ffffffffa76a2532
 #5 [ffffbfe2c00fbd58] rollback_registered_many at ffffffffa7dbcdf4
 #6 [ffffbfe2c00fbdc0] unregister_netdevice_many at ffffffffa7dbd31e
 #7 [ffffbfe2c00fbdd0] default_device_exit_batch at ffffffffa7dbd512
 #8 [ffffbfe2c00fbe40] cleanup_net at ffffffffa7dab970
 #9 [ffffbfe2c00fbe98] process_one_work at ffffffffa76a17c4
#10 [ffffbfe2c00fbed8] worker_thread at ffffffffa76a19dd
#11 [ffffbfe2c00fbf10] kthread at ffffffffa76a7fd3
#12 [ffffbfe2c00fbf50] ret_from_fork at ffffffffa80001ff

crash> bt 1369493
PID: 1369493  TASK: ffff9a03684d9600  CPU: 58  COMMAND: "kworker/58:1"
 #0 [ffffbfe30d68fd48] __schedule at ffffffffa7f02bf7
 #1 [ffffbfe30d68fde8] schedule at ffffffffa7f031e8
 #2 [ffffbfe30d68fdf0] schedule_preempt_disabled at ffffffffa7f0349a
 #3 [ffffbfe30d68fdf8] __mutex_lock at ffffffffa7f04aed
 #4 [ffffbfe30d68fe90] addrconf_verify_work at ffffffffa7e8d1aa
 #5 [ffffbfe30d68fe98] process_one_work at ffffffffa76a17c4
 #6 [ffffbfe30d68fed8] worker_thread at ffffffffa76a19dd
 #7 [ffffbfe30d68ff10] kthread at ffffffffa76a7fd3
 #8 [ffffbfe30d68ff50] ret_from_fork at ffffffffa80001ff



 struct -x mutex.owner.counter rtnl_mutex
  owner.counter = 0xffff9a1072b50001

0xffff9a1072b50001 & (~0x07) = 0xffff9a1072b50000

This points back to PID 8 / CPU 24. That task is running cleanup_net, and part
of cleanup_net involves calling ops_exit_list, which in turn calls
default_device_exit_batch. default_device_exit_batch takes the rtnl lock before
calling into unregister_netdevice_many and rollback_registered_many.
rollback_registered_many calls flush_all_backlogs. This will never complete
because the task is holding the rtnl lock, and PID 1369493 / CPU 58 is waiting
for rtnl_lock.
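
For reference on the owner decode: in kernels of this era the mutex owner word
packs the owning task_struct pointer together with three low flag bits
(waiters/handoff/pickup), so masking those bits off recovers the task pointer.
A tiny userspace sketch of the arithmetic above (the 0x07 mask is assumed to
match MUTEX_FLAGS in kernel/locking/mutex.c):

#include <stdint.h>
#include <stdio.h>

#define MUTEX_FLAGS 0x07UL	/* low bits carry flags, not part of the pointer */

int main(void)
{
	/* rtnl_mutex.owner.counter as reported by crash above */
	uint64_t owner = 0xffff9a1072b50001ULL;
	uint64_t task  = owner & ~MUTEX_FLAGS;	/* -> 0xffff9a1072b50000, PID 8's task_struct */

	printf("owner task_struct: 0x%llx, flag bits: 0x%llx\n",
	       (unsigned long long)task,
	       (unsigned long long)(owner & MUTEX_FLAGS));
	return 0;
}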

If relevant, the workqueue stalls themselves look something like:
BUG: workqueue lockup - pool cpus=70 node=0 flags=0x0 nice=0 stuck for 3720s!
BUG: workqueue lockup - pool cpus=70 node=0 flags=0x0 nice=-20 stuck for 3719s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
  pwq 32: cpus=16 node=0 flags=0x0 nice=0 active=2/256
    in-flight: 1274779:slab_caches_to_rcu_destroy_workfn slab_caches_to_rcu_destroy_workfn
workqueue events_highpri: flags=0x10
  pwq 141: cpus=70 node=0 flags=0x0 nice=-20 active=1/256
    pending: flush_backlog BAR(8)
workqueue events_power_efficient: flags=0x82
  pwq 193: cpus=0-23,48-71 node=0 flags=0x4 nice=0 active=1/256
    in-flight: 1396446:check_lifetime
workqueue mm_percpu_wq: flags=0x8
  pwq 140: cpus=70 node=0 flags=0x0 nice=0 active=1/256
    pending: vmstat_update
workqueue netns: flags=0xe000a
  pwq 192: cpus=0-95 flags=0x4 nice=0 active=1/1
    in-flight: 8:cleanup_net
    delayed: cleanup_net
workqueue writeback: flags=0x4e
  pwq 193: cpus=0-23,48-71 node=0 flags=0x4 nice=0 active=1/256
    in-flight: 1334335:wb_workfn
workqueue kblockd: flags=0x18
  pwq 141: cpus=70 node=0 flags=0x0 nice=-20 active=1/256
    pending: blk_mq_run_work_fn
workqueue ipv6_addrconf: flags=0x40008
  pwq 116: cpus=58 node=0 flags=0x0 nice=0 active=1/1
    in-flight: 1369493:addrconf_verify_work
workqueue ena: flags=0xe000a
  pwq 192: cpus=0-95 flags=0x4 nice=0 active=1/1
    in-flight: 7505:ena_fw_reset_device [ena]


* Re: Deadlock in cleanup_net and addrconf_verify_work locks up workqueue
From: Eric Dumazet @ 2020-02-12  5:08 UTC (permalink / raw)
  To: Sargun Dhillon, netdev, LKML
  Cc: Tetsuo Handa, Gabriel Hartmann, Rob Gulewich, Bruce Curtis,
	Hannes Frederic Sowa



On 2/11/20 11:23 AM, Sargun Dhillon wrote:
> We've found a workqueue stall / deadlock. Our workload is a container-oriented
> workload in which we utilize IPv6. Our container (namespace) churn is quite
> frequent, and containers can be terminated before their networking is
> even setup.
> 
> We're running 4.19.73 in production, and in investigation of the underlying
> causes, I don't think that future versions of 4.19 fix it.
> 
> We've narrowed it down to a lockup between ipv6_addrconf, and cleanup_net.

Sure, PID 1369493 addrconf_verify_work() is waiting for RTNL.

But PID 8?

__flush_work() is being called.

But from where? The stacks seem incomplete.


> 
> crash> bt 8
> PID: 8      TASK: ffff9a1072b50000  CPU: 24  COMMAND: "kworker/u192:0"
>  #0 [ffffbfe2c00fbb70] __schedule at ffffffffa7f02bf7
>  #1 [ffffbfe2c00fbc10] schedule at ffffffffa7f031e8
>  #2 [ffffbfe2c00fbc18] schedule_timeout at ffffffffa7f0700e
>  #3 [ffffbfe2c00fbc90] wait_for_completion at ffffffffa7f03b50
>  #4 [ffffbfe2c00fbce0] __flush_work at ffffffffa76a2532
>  #5 [ffffbfe2c00fbd58] rollback_registered_many at ffffffffa7dbcdf4
>  #6 [ffffbfe2c00fbdc0] unregister_netdevice_many at ffffffffa7dbd31e
>  #7 [ffffbfe2c00fbdd0] default_device_exit_batch at ffffffffa7dbd512
>  #8 [ffffbfe2c00fbe40] cleanup_net at ffffffffa7dab970
>  #9 [ffffbfe2c00fbe98] process_one_work at ffffffffa76a17c4
> #10 [ffffbfe2c00fbed8] worker_thread at ffffffffa76a19dd
> #11 [ffffbfe2c00fbf10] kthread at ffffffffa76a7fd3
> #12 [ffffbfe2c00fbf50] ret_from_fork at ffffffffa80001ff
> 
> crash> bt 1369493
> PID: 1369493  TASK: ffff9a03684d9600  CPU: 58  COMMAND: "kworker/58:1"
>  #0 [ffffbfe30d68fd48] __schedule at ffffffffa7f02bf7
>  #1 [ffffbfe30d68fde8] schedule at ffffffffa7f031e8
>  #2 [ffffbfe30d68fdf0] schedule_preempt_disabled at ffffffffa7f0349a
>  #3 [ffffbfe30d68fdf8] __mutex_lock at ffffffffa7f04aed
>  #4 [ffffbfe30d68fe90] addrconf_verify_work at ffffffffa7e8d1aa
>  #5 [ffffbfe30d68fe98] process_one_work at ffffffffa76a17c4
>  #6 [ffffbfe30d68fed8] worker_thread at ffffffffa76a19dd
>  #7 [ffffbfe30d68ff10] kthread at ffffffffa76a7fd3
>  #8 [ffffbfe30d68ff50] ret_from_fork at ffffffffa80001ff
> 
> 
> 
>  struct -x mutex.owner.counter rtnl_mutex
>   owner.counter = 0xffff9a1072b50001
> 
> 0xffff9a1072b50001 & (~0x07) = 0xffff9a1072b50000
> 
> This points back to PID 8 / CPU 24. It is working on cleanup_net, and a part
> of cleanup net involves calling ops_exit_list, and as part of that it calls
> default_device_exit_batch. default_device_exit_batch takes the rtnl lock before
> calling into unregister_netdevice_many, and rollback_registered_many.
> rollback_registered_many calls flush_all_backlogs. This will never complete
> because it is holding the rtnl lock, and PID 1369493 / CPU 58 is waiting
> for rtnl_lock.

But PID 1369493 is waiting on a mutex, thus properly yielding the cpu.
(schedule() is clearly shown)

This should not prevent other threads
from making progress so that flush_all_backlogs() completes eventually.

flush_all_backlogs() does not care about how many threads are currently blocked
because they cannot grab rtnl while flush_all_backlogs() is running.
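
For context, flush_all_backlogs() in kernels of this era looks roughly like the
sketch below (paraphrased from memory of net/core/dev.c, not copied from this
exact tree): it queues one flush_backlog work item per online CPU on the
high-priority system workqueue and then waits for each of them, so it depends
on every per-CPU highpri worker pool making progress rather than on anyone
being able to take rtnl.

static DEFINE_PER_CPU(struct work_struct, flush_works);

static void flush_all_backlogs(void)
{
	unsigned int cpu;

	get_online_cpus();

	/* kick one flush_backlog work item per online CPU ... */
	for_each_online_cpu(cpu)
		queue_work_on(cpu, system_highpri_wq,
			      per_cpu_ptr(&flush_works, cpu));

	/* ... and wait for all of them to finish */
	for_each_online_cpu(cpu)
		flush_work(per_cpu_ptr(&flush_works, cpu));

	put_online_cpus();
}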

> 
> If relevant, the workqueue stalls themselves look something like:
> BUG: workqueue lockup - pool cpus=70 node=0 flags=0x0 nice=0 stuck for 3720s!
> BUG: workqueue lockup - pool cpus=70 node=0 flags=0x0 nice=-20 stuck for 3719s!
> Showing busy workqueues and worker pools:
> workqueue events: flags=0x0
>   pwq 32: cpus=16 node=0 flags=0x0 nice=0 active=2/256
>     in-flight: 1274779:slab_caches_to_rcu_destroy_workfn slab_caches_to_rcu_destroy_workfn
> workqueue events_highpri: flags=0x10
>   pwq 141: cpus=70 node=0 flags=0x0 nice=-20 active=1/256
>     pending: flush_backlog BAR(8)
> workqueue events_power_efficient: flags=0x82
>   pwq 193: cpus=0-23,48-71 node=0 flags=0x4 nice=0 active=1/256
>     in-flight: 1396446:check_lifetime
> workqueue mm_percpu_wq: flags=0x8
>   pwq 140: cpus=70 node=0 flags=0x0 nice=0 active=1/256
>     pending: vmstat_update
> workqueue netns: flags=0xe000a
>   pwq 192: cpus=0-95 flags=0x4 nice=0 active=1/1
>     in-flight: 8:cleanup_net
>     delayed: cleanup_net
> workqueue writeback: flags=0x4e
>   pwq 193: cpus=0-23,48-71 node=0 flags=0x4 nice=0 active=1/256
>     in-flight: 1334335:wb_workfn
> workqueue kblockd: flags=0x18
>   pwq 141: cpus=70 node=0 flags=0x0 nice=-20 active=1/256
>     pending: blk_mq_run_work_fn
> workqueue ipv6_addrconf: flags=0x40008
>   pwq 116: cpus=58 node=0 flags=0x0 nice=0 active=1/1
>     in-flight: 1369493:addrconf_verify_work
> workqueue ena: flags=0xe000a
>   pwq 192: cpus=0-95 flags=0x4 nice=0 active=1/1
>     in-flight: 7505:ena_fw_reset_device [ena]
> 


* Re: Deadlock in cleanup_net and addrconf_verify_work locks up workqueue
From: Eric Dumazet @ 2020-02-12  5:36 UTC (permalink / raw)
  To: Sargun Dhillon, netdev, LKML
  Cc: Tetsuo Handa, Gabriel Hartmann, Rob Gulewich, Bruce Curtis,
	Hannes Frederic Sowa



On 2/11/20 9:08 PM, Eric Dumazet wrote:
> 
> 
> On 2/11/20 11:23 AM, Sargun Dhillon wrote:
>> We've found a workqueue stall / deadlock. Our workload is a container-oriented
>> workload in which we utilize IPv6. Our container (namespace) churn is quite
>> frequent, and containers can be terminated before their networking is
>> even setup.
>>
>> We're running 4.19.73 in production, and in investigation of the underlying
>> causes, I don't think that future versions of 4.19 fix it.
>>
>> We've narrowed it down to a lockup between ipv6_addrconf, and cleanup_net.
> 
> Sure, PID 1369493 addrconf_verify_work() is waiting for RTNL.
> 
> But PID 8  ?
> 
> __flush_work() is being called.
> 
> But from where ? Stacks seem not complete.
> 
> 
>>
>> crash> bt 8
>> PID: 8      TASK: ffff9a1072b50000  CPU: 24  COMMAND: "kworker/u192:0"
>>  #0 [ffffbfe2c00fbb70] __schedule at ffffffffa7f02bf7
>>  #1 [ffffbfe2c00fbc10] schedule at ffffffffa7f031e8
>>  #2 [ffffbfe2c00fbc18] schedule_timeout at ffffffffa7f0700e
>>  #3 [ffffbfe2c00fbc90] wait_for_completion at ffffffffa7f03b50
>>  #4 [ffffbfe2c00fbce0] __flush_work at ffffffffa76a2532
>>  #5 [ffffbfe2c00fbd58] rollback_registered_many at ffffffffa7dbcdf4
>>  #6 [ffffbfe2c00fbdc0] unregister_netdevice_many at ffffffffa7dbd31e
>>  #7 [ffffbfe2c00fbdd0] default_device_exit_batch at ffffffffa7dbd512
>>  #8 [ffffbfe2c00fbe40] cleanup_net at ffffffffa7dab970
>>  #9 [ffffbfe2c00fbe98] process_one_work at ffffffffa76a17c4
>> #10 [ffffbfe2c00fbed8] worker_thread at ffffffffa76a19dd
>> #11 [ffffbfe2c00fbf10] kthread at ffffffffa76a7fd3
>> #12 [ffffbfe2c00fbf50] ret_from_fork at ffffffffa80001ff
>>
>> crash> bt 1369493
>> PID: 1369493  TASK: ffff9a03684d9600  CPU: 58  COMMAND: "kworker/58:1"
>>  #0 [ffffbfe30d68fd48] __schedule at ffffffffa7f02bf7
>>  #1 [ffffbfe30d68fde8] schedule at ffffffffa7f031e8
>>  #2 [ffffbfe30d68fdf0] schedule_preempt_disabled at ffffffffa7f0349a
>>  #3 [ffffbfe30d68fdf8] __mutex_lock at ffffffffa7f04aed
>>  #4 [ffffbfe30d68fe90] addrconf_verify_work at ffffffffa7e8d1aa
>>  #5 [ffffbfe30d68fe98] process_one_work at ffffffffa76a17c4
>>  #6 [ffffbfe30d68fed8] worker_thread at ffffffffa76a19dd
>>  #7 [ffffbfe30d68ff10] kthread at ffffffffa76a7fd3
>>  #8 [ffffbfe30d68ff50] ret_from_fork at ffffffffa80001ff
>>
>>
>>
>>  struct -x mutex.owner.counter rtnl_mutex
>>   owner.counter = 0xffff9a1072b50001
>>
>> 0xffff9a1072b50001 & (~0x07) = 0xffff9a1072b50000
>>
>> This points back to PID 8 / CPU 24. It is working on cleanup_net, and a part
>> of cleanup net involves calling ops_exit_list, and as part of that it calls
>> default_device_exit_batch. default_device_exit_batch takes the rtnl lock before
>> calling into unregister_netdevice_many, and rollback_registered_many.
>> rollback_registered_many calls flush_all_backlogs. This will never complete
>> because it is holding the rtnl lock, and PID 1369493 / CPU 58 is waiting
>> for rtnl_lock.
> 
> But PID 1369493 is waiting on a mutex, thus properly yielding the cpu.
> (schedule() is clearly shown)
> 
> This should not prevent other threads
> from making progress so that flush_all_backlogs() completes eventually.
> 
> flush_all_backlogs() does not care of how many threads are currently blocked
> because they can not grab rtnl while flush_all_backlogs() is running.
> 
>>
>> If relevant, the workqueue stalls themselves look something like:
>> BUG: workqueue lockup - pool cpus=70 node=0 flags=0x0 nice=0 stuck for 3720s!
>> BUG: workqueue lockup - pool cpus=70 node=0 flags=0x0 nice=-20 stuck for 3719s!
>> Showing busy workqueues and worker pools:
>> workqueue events: flags=0x0
>>   pwq 32: cpus=16 node=0 flags=0x0 nice=0 active=2/256
>>     in-flight: 1274779:slab_caches_to_rcu_destroy_workfn slab_caches_to_rcu_destroy_workfn
>> workqueue events_highpri: flags=0x10
>>   pwq 141: cpus=70 node=0 flags=0x0 nice=-20 active=1/256
>>     pending: flush_backlog BAR(8)
>> workqueue events_power_efficient: flags=0x82
>>   pwq 193: cpus=0-23,48-71 node=0 flags=0x4 nice=0 active=1/256
>>     in-flight: 1396446:check_lifetime
>> workqueue mm_percpu_wq: flags=0x8
>>   pwq 140: cpus=70 node=0 flags=0x0 nice=0 active=1/256
>>     pending: vmstat_update
>> workqueue netns: flags=0xe000a
>>   pwq 192: cpus=0-95 flags=0x4 nice=0 active=1/1
>>     in-flight: 8:cleanup_net
>>     delayed: cleanup_net
>> workqueue writeback: flags=0x4e
>>   pwq 193: cpus=0-23,48-71 node=0 flags=0x4 nice=0 active=1/256
>>     in-flight: 1334335:wb_workfn
>> workqueue kblockd: flags=0x18
>>   pwq 141: cpus=70 node=0 flags=0x0 nice=-20 active=1/256
>>     pending: blk_mq_run_work_fn
>> workqueue ipv6_addrconf: flags=0x40008
>>   pwq 116: cpus=58 node=0 flags=0x0 nice=0 active=1/1
>>     in-flight: 1369493:addrconf_verify_work
>> workqueue ena: flags=0xe000a
>>   pwq 192: cpus=0-95 flags=0x4 nice=0 active=1/1
>>     in-flight: 7505:ena_fw_reset_device [ena]
>>

Can you test the following :

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index cb493e15959c4d1bb68cf30f4099a8daa785bb84..bcc7ce03f13881415f64c7329559c7ed1e6321f3 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4410,8 +4410,6 @@ static void addrconf_verify_rtnl(void)
        now = jiffies;
        next = round_jiffies_up(now + ADDR_CHECK_FREQUENCY);
 
-       cancel_delayed_work(&addr_chk_work);
-
        for (i = 0; i < IN6_ADDR_HSIZE; i++) {
 restart:
                hlist_for_each_entry_rcu_bh(ifp, &inet6_addr_lst[i], addr_lst) {


* Re: Deadlock in cleanup_net and addrconf_verify_work locks up workqueue
From: Dongyang @ 2023-01-04 13:06 UTC (permalink / raw)
  To: eric.dumazet, sargun
  Cc: brucec, ghartmann, hannes, linux-kernel, netdev, penguin-kernel,
	rgulewich, dongyang626

Hello guys,

At the beginning of the new year, I also ran into this issue.

Hi Sargun,
Did you ever resolve this issue? It has been about three years, but I hope you
still remember something about this case. Thanks.

Hi Eric,
My log is below; let me give some feedback on your earlier comments. Thanks.

Jan  1 00:06:30 kernel: [109121.968881] 000: perf: interrupt took too long (3914 > 3912), lowering kernel.perf_event_max_sample_rate to 51000
Jan  2 00:00:06 kernel: [195138.235171] 026: audit: type=1400 audit(1672588806.418:41): apparmor="DENIED" operation="capable" profile="/usr/sbin/cups-browsed" pid=6221 comm="cups-browsed" capability=23  capname="sys_nice"
Jan  2 00:04:08 kernel: [195380.604772] 027: INFO: task kworker/u56:2:6079 blocked for more than 122 seconds.
Jan  2 00:04:08 kernel: [195380.604776] 027:       Tainted: G           OE     5.4.161-rt67-rc1 #1
Jan  2 00:04:08 kernel: [195380.604777] 027: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan  2 00:04:08 kernel: [195380.604779] 027: kworker/u56:2   D    0  6079      2 0x80084000
Jan  2 00:04:08 kernel: [195380.604789] 027: Workqueue: netns cleanup_net
Jan  2 00:04:08 kernel: [195380.604790] 027: Call Trace:
Jan  2 00:04:08 kernel: [195380.604793] 027:  __schedule+0x3d4/0x8a0
Jan  2 00:04:08 kernel: [195380.604799] 027:  ? __switch_to_asm+0x34/0x70
Jan  2 00:04:08 kernel: [195380.604801] 027:  schedule+0x49/0x100
Jan  2 00:04:08 kernel: [195380.604804] 027:  schedule_timeout+0x1ed/0x3b0
Jan  2 00:04:08 kernel: [195380.604807] 027:  wait_for_completion+0x86/0xe0
Jan  2 00:04:08 kernel: [195380.604810] 027:  __flush_work+0x121/0x1d0
Jan  2 00:04:08 kernel: [195380.604814] 027:  ? flush_workqueue_prep_pwqs+0x140/0x140
Jan  2 00:04:08 kernel: [195380.604817] 027:  flush_work+0x10/0x20
Jan  2 00:04:08 kernel: [195380.604819] 027:  rollback_registered_many+0x1b2/0x530
Jan  2 00:04:08 kernel: [195380.604824] 027:  unregister_netdevice_many.part.0+0x12/0x90
Jan  2 00:04:08 kernel: [195380.604826] 027:  default_device_exit_batch+0x15c/0x190
Jan  2 00:04:08 kernel: [195380.604828] 027:  ? do_wait_intr_irq+0x90/0x90
Jan  2 00:04:08 kernel: [195380.604832] 027:  ops_exit_list.isra.0+0x61/0x70
Jan  2 00:04:08 kernel: [195380.604835] 027:  cleanup_net+0x269/0x3a0
Jan  2 00:04:08 kernel: [195380.604837] 027:  process_one_work+0x1c8/0x470
Jan  2 00:04:08 kernel: [195380.604840] 027:  worker_thread+0x4a/0x3d0
Jan  2 00:04:08 kernel: [195380.604843] 027:  kthread+0x133/0x180
Jan  2 00:04:08 kernel: [195380.604846] 027:  ? process_one_work+0x470/0x470
Jan  2 00:04:08 kernel: [195380.604848] 027:  ? kthread_park+0x90/0x90
Jan  2 00:04:08 kernel: [195380.604850] 027:  ret_from_fork+0x35/0x40
Jan  2 00:06:11 kernel: [195503.484781] 000: INFO: task kworker/u56:2:6079 blocked for more than 245 seconds.
Jan  2 00:06:11 kernel: [195503.484784] 000:       Tainted: G           OE     5.4.161-rt67-rc1 #1
Jan  2 00:06:11 kernel: [195503.484786] 000: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan  2 00:06:11 kernel: [195503.484788] 000: kworker/u56:2   D    0  6079      2 0x80084000
Jan  2 00:06:11 kernel: [195503.484797] 000: Workqueue: netns cleanup_net
Jan  2 00:06:11 kernel: [195503.484798] 000: Call Trace:
Jan  2 00:06:11 kernel: [195503.484802] 000:  __schedule+0x3d4/0x8a0
Jan  2 00:06:11 kernel: [195503.484806] 000:  ? __switch_to_asm+0x34/0x70
Jan  2 00:06:11 kernel: [195503.484809] 000:  schedule+0x49/0x100
Jan  2 00:06:11 kernel: [195503.484811] 000:  schedule_timeout+0x1ed/0x3b0
Jan  2 00:06:11 kernel: [195503.484815] 000:  wait_for_completion+0x86/0xe0
Jan  2 00:06:11 kernel: [195503.484818] 000:  __flush_work+0x121/0x1d0
Jan  2 00:06:11 kernel: [195503.484822] 000:  ? flush_workqueue_prep_pwqs+0x140/0x140
Jan  2 00:06:11 kernel: [195503.484825] 000:  flush_work+0x10/0x20
Jan  2 00:06:11 kernel: [195503.484827] 000:  rollback_registered_many+0x1b2/0x530
Jan  2 00:06:11 kernel: [195503.484832] 000:  unregister_netdevice_many.part.0+0x12/0x90
Jan  2 00:06:11 kernel: [195503.484834] 000:  default_device_exit_batch+0x15c/0x190
Jan  2 00:06:11 kernel: [195503.484837] 000:  ? do_wait_intr_irq+0x90/0x90
Jan  2 00:06:11 kernel: [195503.484840] 000:  ops_exit_list.isra.0+0x61/0x70
Jan  2 00:06:11 kernel: [195503.484843] 000:  cleanup_net+0x269/0x3a0
Jan  2 00:06:11 kernel: [195503.484846] 000:  process_one_work+0x1c8/0x470
Jan  2 00:06:11 kernel: [195503.484849] 000:  worker_thread+0x4a/0x3d0
Jan  2 00:06:11 kernel: [195503.484852] 000:  kthread+0x133/0x180
Jan  2 00:06:11 kernel: [195503.484854] 000:  ? process_one_work+0x470/0x470
Jan  2 00:06:11 kernel: [195503.484856] 000:  ? kthread_park+0x90/0x90
Jan  2 00:06:11 kernel: [195503.484858] 000:  ret_from_fork+0x35/0x40
Jan  2 00:06:11 kernel: [195503.484863] 000: INFO: task kworker/26:0:6200 blocked for more than 122 seconds.
Jan  2 00:06:11 kernel: [195503.484864] 000:       Tainted: G           OE     5.4.161-rt67-rc1 #1
Jan  2 00:06:11 kernel: [195503.484865] 000: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan  2 00:06:11 kernel: [195503.484866] 000: kworker/26:0    D    0  6200      2 0x80084000
Jan  2 00:06:11 kernel: [195503.484873] 000: Workqueue: ipv6_addrconf addrconf_verify_work
Jan  2 00:06:11 kernel: [195503.484874] 000: Call Trace:
Jan  2 00:06:11 kernel: [195503.484875] 000:  __schedule+0x3d4/0x8a0
Jan  2 00:06:11 kernel: [195503.484877] 000:  schedule+0x49/0x100
Jan  2 00:06:11 kernel: [195503.484879] 000:  __rt_mutex_slowlock+0x8a/0x150
Jan  2 00:06:11 kernel: [195503.484882] 000:  rt_mutex_slowlock_locked+0xbb/0x280
Jan  2 00:06:11 kernel: [195503.484884] 000:  ? __switch_to_asm+0x40/0x70
Jan  2 00:06:11 kernel: [195503.484886] 000:  rt_mutex_slowlock+0x76/0xc0
Jan  2 00:06:11 kernel: [195503.484889] 000:  __rt_mutex_lock_state+0x75/0x90
Jan  2 00:06:11 kernel: [195503.484891] 000:  _mutex_lock+0x13/0x20
Jan  2 00:06:11 kernel: [195503.484894] 000:  rtnl_lock+0x15/0x20
Jan  2 00:06:11 kernel: [195503.484897] 000:  addrconf_verify_work+0xe/0x20
Jan  2 00:06:11 kernel: [195503.484899] 000:  process_one_work+0x1c8/0x470
Jan  2 00:06:11 kernel: [195503.484902] 000:  worker_thread+0x4a/0x3d0
Jan  2 00:06:11 kernel: [195503.484905] 000:  kthread+0x133/0x180
Jan  2 00:06:11 kernel: [195503.484906] 000:  ? process_one_work+0x470/0x470
Jan  2 00:06:11 kernel: [195503.484908] 000:  ? kthread_park+0x90/0x90
Jan  2 00:06:11 kernel: [195503.484910] 000:  ret_from_fork+0x35/0x40
Jan  2 00:08:14 kernel: [195626.364781] 027: INFO: task kworker/u56:2:6079 blocked for more than 368 seconds.
Jan  2 00:08:14 kernel: [195626.364785] 027:       Tainted: G           OE     5.4.161-rt67-rc1 #1
Jan  2 00:08:14 kernel: [195626.364786] 027: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan  2 00:08:14 kernel: [195626.364788] 027: kworker/u56:2   D    0  6079      2 0x80084000
Jan  2 00:08:14 kernel: [195626.364797] 027: Workqueue: netns cleanup_net
Jan  2 00:08:14 kernel: [195626.364798] 027: Call Trace:
Jan  2 00:08:14 kernel: [195626.364802] 027:  __schedule+0x3d4/0x8a0
Jan  2 00:08:14 kernel: [195626.364807] 027:  ? __switch_to_asm+0x34/0x70
Jan  2 00:08:14 kernel: [195626.364810] 027:  schedule+0x49/0x100
Jan  2 00:08:14 kernel: [195626.364812] 027:  schedule_timeout+0x1ed/0x3b0
Jan  2 00:08:14 kernel: [195626.364816] 027:  wait_for_completion+0x86/0xe0
Jan  2 00:08:14 kernel: [195626.364818] 027:  __flush_work+0x121/0x1d0
Jan  2 00:08:14 kernel: [195626.364822] 027:  ? flush_workqueue_prep_pwqs+0x140/0x140
Jan  2 00:08:14 kernel: [195626.364825] 027:  flush_work+0x10/0x20
Jan  2 00:08:14 kernel: [195626.364827] 027:  rollback_registered_many+0x1b2/0x530
Jan  2 00:08:14 kernel: [195626.364832] 027:  unregister_netdevice_many.part.0+0x12/0x90
Jan  2 00:08:14 kernel: [195626.364835] 027:  default_device_exit_batch+0x15c/0x190
Jan  2 00:08:14 kernel: [195626.364837] 027:  ? do_wait_intr_irq+0x90/0x90
Jan  2 00:08:14 kernel: [195626.364841] 027:  ops_exit_list.isra.0+0x61/0x70
Jan  2 00:08:14 kernel: [195626.364843] 027:  cleanup_net+0x269/0x3a0
Jan  2 00:08:14 kernel: [195626.364846] 027:  process_one_work+0x1c8/0x470
Jan  2 00:08:14 kernel: [195626.364849] 027:  worker_thread+0x4a/0x3d0
Jan  2 00:08:14 kernel: [195626.364852] 027:  kthread+0x133/0x180
Jan  2 00:08:14 kernel: [195626.364855] 027:  ? process_one_work+0x470/0x470
Jan  2 00:08:14 kernel: [195626.364857] 027:  ? kthread_park+0x90/0x90
Jan  2 00:08:14 kernel: [195626.364859] 027:  ret_from_fork+0x35/0x40
Jan  2 00:08:14 kernel: [195626.364863] 027: INFO: task kworker/26:0:6200 blocked for more than 245 seconds.
Jan  2 00:08:14 kernel: [195626.364865] 027:       Tainted: G           OE     5.4.161-rt67-rc1 #1
Jan  2 00:08:14 kernel: [195626.364866] 027: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan  2 00:08:14 kernel: [195626.364867] 027: kworker/26:0    D    0  6200      2 0x80084000
Jan  2 00:08:14 kernel: [195626.364873] 027: Workqueue: ipv6_addrconf addrconf_verify_work
Jan  2 00:08:14 kernel: [195626.364874] 027: Call Trace:
Jan  2 00:08:14 kernel: [195626.364875] 027:  __schedule+0x3d4/0x8a0
Jan  2 00:08:14 kernel: [195626.364878] 027:  schedule+0x49/0x100
Jan  2 00:08:14 kernel: [195626.364880] 027:  __rt_mutex_slowlock+0x8a/0x150
Jan  2 00:08:14 kernel: [195626.364882] 027:  rt_mutex_slowlock_locked+0xbb/0x280
Jan  2 00:08:14 kernel: [195626.364885] 027:  ? __switch_to_asm+0x40/0x70
Jan  2 00:08:14 kernel: [195626.364886] 027:  rt_mutex_slowlock+0x76/0xc0
Jan  2 00:08:14 kernel: [195626.364889] 027:  __rt_mutex_lock_state+0x75/0x90
Jan  2 00:08:14 kernel: [195626.364892] 027:  _mutex_lock+0x13/0x20
Jan  2 00:08:14 kernel: [195626.364894] 027:  rtnl_lock+0x15/0x20
Jan  2 00:08:14 kernel: [195626.364898] 027:  addrconf_verify_work+0xe/0x20
Jan  2 00:08:14 kernel: [195626.364900] 027:  process_one_work+0x1c8/0x470
Jan  2 00:08:14 kernel: [195626.364902] 027:  worker_thread+0x4a/0x3d0
Jan  2 00:08:14 kernel: [195626.364905] 027:  kthread+0x133/0x180
Jan  2 00:08:14 kernel: [195626.364907] 027:  ? process_one_work+0x470/0x470
Jan  2 00:08:14 kernel: [195626.364909] 027:  ? kthread_park+0x90/0x90
Jan  2 00:08:14 kernel: [195626.364911] 027:  ret_from_fork+0x35/0x40
................
Jan  2 00:14:23 kernel: [195995.004768] 000: INFO: task kworker/u56:2:6079 blocked for more than 737 seconds.
Jan  2 00:14:23 kernel: [195995.004771] 000:       Tainted: G           OE     5.4.161-rt67-rc1 #1
Jan  2 00:14:23 kernel: [195995.004772] 000: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan  2 00:14:23 kernel: [195995.004774] 000: kworker/u56:2   D    0  6079      2 0x80084000
Jan  2 00:14:23 kernel: [195995.004783] 000: Workqueue: netns cleanup_net
Jan  2 00:14:23 kernel: [195995.004784] 000: Call Trace:
Jan  2 00:14:23 kernel: [195995.004788] 000:  __schedule+0x3d4/0x8a0
Jan  2 00:14:23 kernel: [195995.004793] 000:  ? __switch_to_asm+0x34/0x70
Jan  2 00:14:23 kernel: [195995.004795] 000:  schedule+0x49/0x100
Jan  2 00:14:23 kernel: [195995.004797] 000:  schedule_timeout+0x1ed/0x3b0
Jan  2 00:14:23 kernel: [195995.004801] 000:  wait_for_completion+0x86/0xe0
Jan  2 00:14:23 kernel: [195995.004804] 000:  __flush_work+0x121/0x1d0
Jan  2 00:14:23 kernel: [195995.004807] 000:  ? flush_workqueue_prep_pwqs+0x140/0x140
Jan  2 00:14:23 kernel: [195995.004810] 000:  flush_work+0x10/0x20
Jan  2 00:14:23 kernel: [195995.004813] 000:  rollback_registered_many+0x1b2/0x530
Jan  2 00:14:23 kernel: [195995.004817] 000:  unregister_netdevice_many.part.0+0x12/0x90
Jan  2 00:14:23 kernel: [195995.004819] 000:  default_device_exit_batch+0x15c/0x190
Jan  2 00:14:23 kernel: [195995.004822] 000:  ? do_wait_intr_irq+0x90/0x90
Jan  2 00:14:23 kernel: [195995.004825] 000:  ops_exit_list.isra.0+0x61/0x70
Jan  2 00:14:23 kernel: [195995.004828] 000:  cleanup_net+0x269/0x3a0
Jan  2 00:14:23 kernel: [195995.004831] 000:  process_one_work+0x1c8/0x470
Jan  2 00:14:23 kernel: [195995.004834] 000:  worker_thread+0x4a/0x3d0
Jan  2 00:14:23 kernel: [195995.004837] 000:  kthread+0x133/0x180
Jan  2 00:14:23 kernel: [195995.004839] 000:  ? process_one_work+0x470/0x470
Jan  2 00:14:23 kernel: [195995.004841] 000:  ? kthread_park+0x90/0x90
Jan  2 00:14:23 kernel: [195995.004843] 000:  ret_from_fork+0x35/0x40

After "task kworker/u56:2:6079 blocked for more than 737 seconds.", 
the network seems down, both ssh, ifconfig can't work.


>> Sure, PID 1369493 addrconf_verify_work() is waiting for RTNL.
>> 
>> But PID 8  ?
>> 
>> __flush_work() is being called.
>> 
>> But from where ? Stacks seem not complete.

__flush_work is reached just from the workqueue path, so yes, we don't know who
put the work into the queue, and we don't know whether the netns cleanup work
is reasonable. Maybe I need to add a trace at put_net.

 put_net
     __put_net  queue_work(netns_wq, &net_cleanup_work);
         DECLARE_WORK(net_cleanup_work, cleanup_net);
         cleanup_net
             ops_exit_list
                 default_device_exit_batch
                     unregister_netdevice_many
                         rollback_registered_many
                             flush_work
                                 __flush_work

>> But PID 1369493 is waiting on a mutex, thus properly yielding the cpu.
>> (schedule() is clearly shown)
>> 
>> This should not prevent other threads
>> from making progress so that flush_all_backlogs() completes eventually.
>> 
>> flush_all_backlogs() does not care of how many threads are currently blocked
>> because they can not grab rtnl while flush_all_backlogs() is running.

In my log, even though schedule() is shown, the task is still in D state.

It seems addrconf_verify_work is blocked by cleanup_net.
But why is cleanup_net blocked?
My current plan is to add a trace/print in __flush_work to see which work item
is blocked, and then figure out who owns that work.
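
For instance, a minimal sketch of that instrumentation could be a throwaway
kprobe module that logs which callback each __flush_work() call is waiting on
(the module name, messages, and the x86_64 first-argument register below are my
own assumptions, not something from this thread):

/* Illustration only: probe __flush_work() and report the work callback the
 * caller is about to wait for.  Assumes x86_64, where the first argument is
 * passed in %rdi (regs->di).
 */
#include <linux/module.h>
#include <linux/kprobes.h>
#include <linux/sched.h>
#include <linux/workqueue.h>

static int flush_work_pre(struct kprobe *p, struct pt_regs *regs)
{
	struct work_struct *work = (struct work_struct *)regs->di;

	pr_info("flush_work_trace: %s (pid %d) flushing %ps\n",
		current->comm, current->pid, work->func);
	return 0;
}

static struct kprobe kp = {
	.symbol_name = "__flush_work",
	.pre_handler = flush_work_pre,
};

static int __init flush_work_trace_init(void)
{
	return register_kprobe(&kp);
}

static void __exit flush_work_trace_exit(void)
{
	unregister_kprobe(&kp);
}

module_init(flush_work_trace_init);
module_exit(flush_work_trace_exit);
MODULE_LICENSE("GPL");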

If you can give any advice, I would be very grateful.

BTW, this is an RT kernel; I'm also checking the system workload.

BR,
Dongyang

