From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: tj@kernel.org, linux-mm@kvack.org
Subject: mm: Why WQ_MEM_RECLAIM workqueue remains pending?
Date: Thu, 29 Jun 2017 19:57:58 +0900
Message-ID: <201706291957.JGH39511.tQMOFSLOFJVHOF@I-love.SAKURA.ne.jp>

I hit a lockup in which the OOM killer cannot be invoked, shown below. Judging
from the "cpus=1 node=0 flags=0x0 nice=0" part, a drain_local_pages_wq work
item appears to be stuck even though it is queued on the WQ_MEM_RECLAIM
mm_percpu_wq workqueue.

    mm_percpu_wq = alloc_workqueue("mm_percpu_wq", WQ_MEM_RECLAIM, 0);
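
For reference, the pending drain_local_pages_wq items are queued from
drain_all_pages(); roughly like this (a simplified sketch of mm/page_alloc.c
in this kernel, with locking and cpumask setup omitted):

----------
static DEFINE_PER_CPU(struct work_struct, pcpu_drain);

void drain_all_pages(struct zone *zone)
{
	static cpumask_t cpus_with_pcps;
	int cpu;

	/* ... populate cpus_with_pcps with CPUs that have per-cpu pages ... */

	for_each_cpu(cpu, &cpus_with_pcps) {
		struct work_struct *work = per_cpu_ptr(&pcpu_drain, cpu);

		INIT_WORK(work, drain_local_pages_wq);
		queue_work_on(cpu, mm_percpu_wq, work);
	}
	for_each_cpu(cpu, &cpus_with_pcps)
		flush_work(per_cpu_ptr(&pcpu_drain, cpu));
}
----------

The BAR(8775) entry in the dump below is such a flush_work() barrier, tagged
with the pid of the flushing task.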

Complete log is at http://I-love.SAKURA.ne.jp/tmp/serial-20170629.txt.xz .
----------
[  423.393025] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 75s!
[  423.405078] Showing busy workqueues and worker pools:
[  423.411988] workqueue events: flags=0x0
[  423.417675]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[  423.425512]     pending: vmpressure_work_fn
[  423.431489]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=3/256
[  423.435671]     pending: vmstat_shepherd, rht_deferred_worker, vmw_fb_dirty_flush [vmwgfx]
[  423.439838] workqueue events_power_efficient: flags=0x80
[  423.442709]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256
[  423.445870]     pending: neigh_periodic_work, do_cache_clean
[  423.448884] workqueue events_freezable_power_: flags=0x84
[  423.451809]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[  423.454953]     pending: disk_events_workfn
[  423.457371] workqueue mm_percpu_wq: flags=0x8
[  423.459883]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=2/256
[  423.463018]     pending: drain_local_pages_wq BAR(8775), vmstat_update
[  423.465553] workqueue writeback: flags=0x4e
[  423.467221]   pwq 128: cpus=0-63 flags=0x4 nice=0 active=2/256
[  423.469645]     in-flight: 355:wb_workfn wb_workfn
[  423.472129] workqueue ipv6_addrconf: flags=0x40008
[  423.474416]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/1
[  423.476725]     pending: addrconf_verify_work
[  423.478513] workqueue mpt_poll_0: flags=0x8
[  423.480199]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
[  423.482462]     pending: mpt_fault_reset_work [mptbase]
[  423.484533] workqueue xfs-data/sda1: flags=0xc
[  423.486388]   pwq 6: cpus=3 node=0 flags=0x0 nice=0 active=4/256
[  423.488674]     in-flight: 95:xfs_end_io [xfs], 8978:xfs_end_io [xfs], 65:xfs_end_io [xfs], 8979:xfs_end_io [xfs]
[  423.492249]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=2/256
[  423.494519]     in-flight: 205:xfs_end_io [xfs], 23:xfs_end_io [xfs]
[  423.496943]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=17/256 MAYDAY
[  423.499432]     in-flight: 17:xfs_end_io [xfs], 8985:xfs_end_io [xfs], 8980:xfs_end_io [xfs], 7805:xfs_end_io [xfs], 8984:xfs_end_io [xfs]
[  423.503673]     pending: xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs]
[  423.511095]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=8/256 MAYDAY
[  423.513586]     in-flight: 8981:xfs_end_io [xfs], 375(RESCUER):xfs_end_io [xfs] xfs_end_io [xfs], 33:xfs_end_io [xfs], 3:xfs_end_io [xfs], 7839:xfs_end_io [xfs], 8976:xfs_end_io [xfs]
[  423.519583]     pending: xfs_end_io [xfs]
[  423.521566] workqueue xfs-eofblocks/sda1: flags=0xc
[  423.523645]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[  423.526041]     in-flight: 168:xfs_eofblocks_worker [xfs]
[  423.528253] pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=75s workers=6 manager: 217
[  423.531114] pool 2: cpus=1 node=0 flags=0x0 nice=0 hung=76s workers=7 manager: 43
[  423.534038] pool 4: cpus=2 node=0 flags=0x0 nice=0 hung=0s workers=8 idle: 8977 7840 60 7837 8983 8982
[  423.537554] pool 6: cpus=3 node=0 flags=0x0 nice=0 hung=0s workers=7 idle: 464 42 29
[  423.540547] pool 128: cpus=0-63 flags=0x4 nice=0 hung=48s workers=3 idle: 354 356
[  453.599480] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 105s!
[  453.613056] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 106s!
[  453.623744] Showing busy workqueues and worker pools:
[  453.631182] workqueue events: flags=0x0
[  453.638161]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[  453.644207]     pending: vmpressure_work_fn
[  453.647411]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=3/256
[  453.651569]     pending: vmstat_shepherd, rht_deferred_worker, vmw_fb_dirty_flush [vmwgfx]
[  453.656989] workqueue events_power_efficient: flags=0x80
[  453.660815]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256
[  453.665043]     pending: neigh_periodic_work, do_cache_clean
[  453.669263] workqueue events_freezable_power_: flags=0x84
[  453.672641]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[  453.675053]     pending: disk_events_workfn
[  453.676885] workqueue mm_percpu_wq: flags=0x8
[  453.678730]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=2/256
[  453.681075]     pending: drain_local_pages_wq BAR(8775), vmstat_update
[  453.683610] workqueue writeback: flags=0x4e
[  453.685412]   pwq 128: cpus=0-63 flags=0x4 nice=0 active=2/256
[  453.687703]     in-flight: 355:wb_workfn wb_workfn
[  453.690022] workqueue ipv6_addrconf: flags=0x40008
[  453.692018]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/1
[  453.694295]     pending: addrconf_verify_work
[  453.696139] workqueue mpt_poll_0: flags=0x8
[  453.698024]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
[  453.700362]     pending: mpt_fault_reset_work [mptbase]
[  453.702515] workqueue xfs-data/sda1: flags=0xc
[  453.704470]   pwq 6: cpus=3 node=0 flags=0x0 nice=0 active=4/256
[  453.706809]     in-flight: 95:xfs_end_io [xfs], 8978:xfs_end_io [xfs], 65:xfs_end_io [xfs], 8979:xfs_end_io [xfs]
[  453.710578]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=2/256
[  453.712956]     in-flight: 205:xfs_end_io [xfs], 23:xfs_end_io [xfs]
[  453.715424]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=17/256 MAYDAY
[  453.717958]     in-flight: 17:xfs_end_io [xfs], 8985:xfs_end_io [xfs], 8980:xfs_end_io [xfs], 7805:xfs_end_io [xfs], 8984:xfs_end_io [xfs]
[  453.722214]     pending: xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs]
[  453.729761]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=8/256 MAYDAY
[  453.732276]     in-flight: 8981:xfs_end_io [xfs], 375(RESCUER):xfs_end_io [xfs] xfs_end_io [xfs], 33:xfs_end_io [xfs], 3:xfs_end_io [xfs], 7839:xfs_end_io [xfs], 8976:xfs_end_io [xfs]
[  453.738334]     pending: xfs_end_io [xfs]
[  453.740136] workqueue xfs-eofblocks/sda1: flags=0xc
[  453.742165]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[  453.744507]     in-flight: 168:xfs_eofblocks_worker [xfs]
[  453.746655] pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=106s workers=6 manager: 217
[  453.749528] pool 2: cpus=1 node=0 flags=0x0 nice=0 hung=106s workers=7 manager: 43
[  453.752382] pool 4: cpus=2 node=0 flags=0x0 nice=0 hung=0s workers=8 idle: 8977 7840 60 7837 8983 8982
[  453.755740] pool 6: cpus=3 node=0 flags=0x0 nice=0 hung=0s workers=7 idle: 464 42 29
[  453.758603] pool 128: cpus=0-63 flags=0x4 nice=0 hung=78s workers=3 idle: 354 356
----------

This might be just another example of getting caught in the too_many_isolated()
trap in shrink_inactive_list(). But I expected that work items on the
mm_percpu_wq workqueue are always processed promptly as long as the in-flight
work items call schedule().
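
By the too_many_isolated() trap I mean this loop at the top of
shrink_inactive_list() (roughly, from mm/vmscan.c in this kernel), where
reclaimers sleep in congestion_wait() for as long as too many pages remain
isolated:

----------
	while (unlikely(too_many_isolated(pgdat, file, sc))) {
		congestion_wait(BLK_RW_ASYNC, HZ/10);

		/* We are about to die and free our memory. Return now. */
		if (fatal_signal_pending(current))
			return SWAP_CLUSTER_MAX;
	}
----------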

Why was "pending: drain_local_pages_wq" not processed even though all in-flight
work items on "pool 2: cpus=1 node=0 flags=0x0 nice=0" (that is, tasks 17, 8985,
8980, 7805, 8984 and 168 shown below) were sleeping at schedule()? Or am I just
fooled by appearances, and drain_local_pages_wq was actually processed but
constantly and immediately re-queued, so that it merely looks as if it was never
processed?
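
The mechanism behind my expectation is wq_worker_sleeping(): when a busy
concurrency-managed worker is about to block, the scheduler asks the pool for
an idle worker to wake up so that pending work items keep running. Roughly
(simplified from kernel/workqueue.c in this kernel):

----------
/* Called from __schedule() when a busy PF_WQ_WORKER task is about to sleep. */
struct task_struct *wq_worker_sleeping(struct task_struct *task)
{
	struct worker *worker = kthread_data(task), *to_wakeup = NULL;
	struct worker_pool *pool;

	if (worker->flags & WORKER_NOT_RUNNING)
		return NULL;

	pool = worker->pool;

	/*
	 * If this was the last running worker of the pool and work items
	 * (such as drain_local_pages_wq) are still pending, pick the first
	 * idle worker so that the pool keeps making forward progress.
	 */
	if (atomic_dec_and_test(&pool->nr_running) &&
	    !list_empty(&pool->worklist))
		to_wakeup = first_idle_worker(pool);
	return to_wakeup ? to_wakeup->task : NULL;
}
----------

So as long as the workers of pool 2 shown below really are sleeping at
schedule(), I would expect an idle worker of that pool to pick up
drain_local_pages_wq.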

----------
[  521.100167] kworker/1:0     D12608    17      2 0x00000000
[  521.102148] Workqueue: xfs-data/sda1 xfs_end_io [xfs]
[  521.103916] Call Trace:
[  521.104962]  __schedule+0x23f/0x5d0
[  521.106326]  schedule+0x31/0x80
[  521.107569]  schedule_timeout+0x189/0x290
[  521.109033]  ? del_timer_sync+0x40/0x40
[  521.110496]  io_schedule_timeout+0x19/0x40
[  521.112035]  ? io_schedule_timeout+0x19/0x40
[  521.113609]  congestion_wait+0x7d/0xd0
[  521.115128]  ? wait_woken+0x80/0x80
[  521.116440]  shrink_inactive_list+0x3e3/0x4d0
[  521.118048]  shrink_node_memcg+0x360/0x780
[  521.119592]  shrink_node+0xdc/0x310
[  521.120903]  ? shrink_node+0xdc/0x310
[  521.122293]  do_try_to_free_pages+0xea/0x370
[  521.123828]  try_to_free_pages+0xc3/0x100
[  521.125351]  __alloc_pages_slowpath+0x441/0xd50
[  521.126999]  __alloc_pages_nodemask+0x20c/0x250
[  521.128611]  alloc_pages_current+0x65/0xd0
[  521.130129]  new_slab+0x472/0x600
[  521.131405]  ___slab_alloc+0x41b/0x590
[  521.132835]  ? kmem_alloc+0x8a/0x110 [xfs]
[  521.134361]  ? ___slab_alloc+0x1b6/0x590
[  521.135803]  ? kmem_alloc+0x8a/0x110 [xfs]
[  521.137344]  __slab_alloc+0x1b/0x30
[  521.138653]  ? __slab_alloc+0x1b/0x30
[  521.140217]  __kmalloc+0x17e/0x200
[  521.141507]  kmem_alloc+0x8a/0x110 [xfs]
[  521.142980]  xfs_log_commit_cil+0x276/0x750 [xfs]
[  521.144692]  __xfs_trans_commit+0x7d/0x280 [xfs]
[  521.146332]  xfs_trans_commit+0xb/0x10 [xfs]
[  521.147898]  __xfs_setfilesize+0x7c/0xb0 [xfs]
[  521.149522]  xfs_setfilesize_ioend+0x41/0x60 [xfs]
[  521.151215]  xfs_end_io+0x44/0x130 [xfs]
[  521.152724]  process_one_work+0x1f5/0x390
[  521.154187]  worker_thread+0x46/0x410
[  521.155594]  kthread+0xff/0x140
[  521.156836]  ? process_one_work+0x390/0x390
[  521.158356]  ? kthread_create_on_node+0x60/0x60
[  521.160024]  ret_from_fork+0x25/0x30

[  521.760190] kswapd0         D11640    51      2 0x00000000
[  521.762176] Call Trace:
[  521.763204]  __schedule+0x23f/0x5d0
[  521.764581]  schedule+0x31/0x80
[  521.765826]  schedule_timeout+0x1c1/0x290
[  521.767351]  ? save_stack_trace+0x16/0x20
[  521.768827]  ? set_track+0x6b/0x140
[  521.770204]  ? init_object+0x64/0xa0
[  521.771561]  __down+0x85/0xd0
[  521.772765]  ? __down+0x85/0xd0
[  521.773982]  ? cmpxchg_double_slab.isra.73+0x140/0x150
[  521.775827]  down+0x3c/0x50
[  521.776991]  ? down+0x3c/0x50
[  521.778190]  xfs_buf_lock+0x21/0x50 [xfs]
[  521.779716]  _xfs_buf_find+0x3cd/0x640 [xfs]
[  521.781267]  xfs_buf_get_map+0x25/0x150 [xfs]
[  521.782952]  xfs_buf_read_map+0x25/0xc0 [xfs]
[  521.784610]  xfs_trans_read_buf_map+0xef/0x2f0 [xfs]
[  521.786358]  xfs_read_agf+0x86/0x110 [xfs]
[  521.787898]  ? wakeup_preempt_entity.isra.76+0x39/0x50
[  521.789746]  xfs_alloc_read_agf+0x3e/0x140 [xfs]
[  521.791416]  xfs_alloc_fix_freelist+0x3e8/0x4e0 [xfs]
[  521.793231]  ? ttwu_do_activate.isra.66+0x6d/0x80
[  521.794951]  ? try_to_wake_up+0x23b/0x3c0
[  521.796443]  ? radix_tree_lookup+0xd/0x10
[  521.798011]  ? xfs_perag_get+0x16/0x50 [xfs]
[  521.799592]  ? xfs_bmap_longest_free_extent+0x8e/0xb0 [xfs]
[  521.801515]  xfs_alloc_vextent+0x15a/0x4a0 [xfs]
[  521.803210]  xfs_bmap_btalloc+0x33f/0x910 [xfs]
[  521.804884]  xfs_bmap_alloc+0x9/0x10 [xfs]
[  521.806382]  xfs_bmapi_write+0x7ca/0x1170 [xfs]
[  521.808072]  xfs_iomap_write_allocate+0x191/0x3b0 [xfs]
[  521.809953]  xfs_map_blocks+0x180/0x240 [xfs]
[  521.811527]  xfs_do_writepage+0x259/0x780 [xfs]
[  521.813200]  ? list_lru_add+0x3d/0xe0
[  521.814602]  xfs_vm_writepage+0x36/0x70 [xfs]
[  521.816171]  pageout.isra.53+0x195/0x2c0
[  521.817645]  shrink_page_list+0xa72/0xd50
[  521.819112]  shrink_inactive_list+0x239/0x4d0
[  521.820709]  ? radix_tree_gang_lookup_tag+0xd7/0x150
[  521.822599]  shrink_node_memcg+0x360/0x780
[  521.824238]  shrink_node+0xdc/0x310
[  521.825805]  ? shrink_node+0xdc/0x310
[  521.827318]  kswapd+0x373/0x6a0
[  521.828580]  kthread+0xff/0x140
[  521.829912]  ? mem_cgroup_shrink_node+0xb0/0xb0
[  521.831541]  ? kthread_create_on_node+0x60/0x60
[  521.833218]  ret_from_fork+0x25/0x30

[  522.064227] kworker/1:2     D11688   168      2 0x00000000
[  522.066193] Workqueue: xfs-eofblocks/sda1 xfs_eofblocks_worker [xfs]
[  522.068401] Call Trace:
[  522.069479]  __schedule+0x23f/0x5d0
[  522.070807]  schedule+0x31/0x80
[  522.072085]  schedule_timeout+0x1c1/0x290
[  522.073561]  ? init_object+0x64/0xa0
[  522.074956]  __down+0x85/0xd0
[  522.076122]  ? __down+0x85/0xd0
[  522.077381]  ? deactivate_slab.isra.83+0xa0/0x4b0
[  522.079052]  down+0x3c/0x50
[  522.080228]  ? down+0x3c/0x50
[  522.081403]  xfs_buf_lock+0x21/0x50 [xfs]
[  522.082919]  _xfs_buf_find+0x3cd/0x640 [xfs]
[  522.084521]  xfs_buf_get_map+0x25/0x150 [xfs]
[  522.086098]  xfs_buf_read_map+0x25/0xc0 [xfs]
[  522.087723]  xfs_trans_read_buf_map+0xef/0x2f0 [xfs]
[  522.089532]  xfs_read_agf+0x86/0x110 [xfs]
[  522.091034]  xfs_alloc_read_agf+0x3e/0x140 [xfs]
[  522.092730]  xfs_alloc_fix_freelist+0x3e8/0x4e0 [xfs]
[  522.094561]  ? kmem_zone_alloc+0x8a/0x110 [xfs]
[  522.096186]  ? set_track+0x6b/0x140
[  522.097565]  ? init_object+0x64/0xa0
[  522.098912]  ? ___slab_alloc+0x1b6/0x590
[  522.100391]  ? ___slab_alloc+0x1b6/0x590
[  522.101886]  xfs_free_extent_fix_freelist+0x78/0xe0 [xfs]
[  522.103767]  xfs_free_extent+0x6a/0x1d0 [xfs]
[  522.105396]  xfs_trans_free_extent+0x2c/0xb0 [xfs]
[  522.107145]  xfs_extent_free_finish_item+0x21/0x40 [xfs]
[  522.109003]  xfs_defer_finish+0x143/0x2b0 [xfs]
[  522.110686]  xfs_itruncate_extents+0x1a5/0x3d0 [xfs]
[  522.112499]  xfs_free_eofblocks+0x1a8/0x200 [xfs]
[  522.114191]  xfs_inode_free_eofblocks+0xe3/0x110 [xfs]
[  522.116056]  ? xfs_inode_ag_walk_grab+0x63/0xa0 [xfs]
[  522.117888]  xfs_inode_ag_walk.isra.23+0x20a/0x450 [xfs]
[  522.119808]  ? __xfs_inode_clear_eofblocks_tag+0x120/0x120 [xfs]
[  522.121921]  xfs_inode_ag_iterator_tag+0x6e/0xa0 [xfs]
[  522.123744]  ? __xfs_inode_clear_eofblocks_tag+0x120/0x120 [xfs]
[  522.125870]  xfs_eofblocks_worker+0x28/0x40 [xfs]
[  522.127617]  process_one_work+0x1f5/0x390
[  522.129111]  worker_thread+0x46/0x410
[  522.130531]  ? __schedule+0x247/0x5d0
[  522.132044]  kthread+0xff/0x140
[  522.133673]  ? process_one_work+0x390/0x390
[  522.135320]  ? kthread_create_on_node+0x60/0x60
[  522.137024]  ret_from_fork+0x25/0x30

[  526.558822] kworker/1:3     D12608  7805      2 0x00000080
[  526.560825] Workqueue: xfs-data/sda1 xfs_end_io [xfs]
[  526.562636] Call Trace:
[  526.563678]  __schedule+0x23f/0x5d0
[  526.565058]  schedule+0x31/0x80
[  526.566295]  schedule_timeout+0x189/0x290
[  526.567823]  ? del_timer_sync+0x40/0x40
[  526.569339]  io_schedule_timeout+0x19/0x40
[  526.570849]  ? io_schedule_timeout+0x19/0x40
[  526.572434]  congestion_wait+0x7d/0xd0
[  526.573841]  ? wait_woken+0x80/0x80
[  526.575209]  shrink_inactive_list+0x3e3/0x4d0
[  526.576848]  shrink_node_memcg+0x360/0x780
[  526.578358]  shrink_node+0xdc/0x310
[  526.579760]  ? shrink_node+0xdc/0x310
[  526.581142]  do_try_to_free_pages+0xea/0x370
[  526.582720]  try_to_free_pages+0xc3/0x100
[  526.584251]  __alloc_pages_slowpath+0x441/0xd50
[  526.585882]  __alloc_pages_nodemask+0x20c/0x250
[  526.587539]  alloc_pages_current+0x65/0xd0
[  526.589045]  new_slab+0x472/0x600
[  526.590364]  ___slab_alloc+0x41b/0x590
[  526.591805]  ? kmem_alloc+0x8a/0x110 [xfs]
[  526.593313]  ? ___slab_alloc+0x1b6/0x590
[  526.594838]  ? kmem_alloc+0x8a/0x110 [xfs]
[  526.596346]  __slab_alloc+0x1b/0x30
[  526.598171]  ? __slab_alloc+0x1b/0x30
[  526.599658]  __kmalloc+0x17e/0x200
[  526.600959]  ? __slab_free+0x9f/0x300
[  526.602382]  kmem_alloc+0x8a/0x110 [xfs]
[  526.603839]  xfs_log_commit_cil+0x276/0x750 [xfs]
[  526.605563]  __xfs_trans_commit+0x7d/0x280 [xfs]
[  526.607256]  xfs_trans_commit+0xb/0x10 [xfs]
[  526.608814]  __xfs_setfilesize+0x7c/0xb0 [xfs]
[  526.610494]  xfs_setfilesize_ioend+0x41/0x60 [xfs]
[  526.612224]  xfs_end_io+0x44/0x130 [xfs]
[  526.613680]  process_one_work+0x1f5/0x390
[  526.615232]  worker_thread+0x46/0x410
[  526.616641]  ? __schedule+0x247/0x5d0
[  526.618029]  kthread+0xff/0x140
[  526.619319]  ? process_one_work+0x390/0x390
[  526.620845]  ? kthread_create_on_node+0x60/0x60
[  526.622497]  ret_from_fork+0x25/0x30

[  551.327823] c.out           D11000  8775   7943 0x00000080
[  551.329829] Call Trace:
[  551.330945]  __schedule+0x23f/0x5d0
[  551.332300]  schedule+0x31/0x80
[  551.333552]  schedule_timeout+0x189/0x290
[  551.335052]  ? release_pages+0x30f/0x3d0
[  551.336564]  ? del_timer_sync+0x40/0x40
[  551.338013]  io_schedule_timeout+0x19/0x40
[  551.339583]  ? io_schedule_timeout+0x19/0x40
[  551.341270]  congestion_wait+0x7d/0xd0
[  551.342696]  ? wait_woken+0x80/0x80
[  551.344077]  shrink_inactive_list+0x3e3/0x4d0
[  551.345672]  shrink_node_memcg+0x360/0x780
[  551.347237]  shrink_node+0xdc/0x310
[  551.348636]  ? shrink_node+0xdc/0x310
[  551.350027]  do_try_to_free_pages+0xea/0x370
[  551.351627]  try_to_free_pages+0xc3/0x100
[  551.353115]  __alloc_pages_slowpath+0x441/0xd50
[  551.355045]  ? account_page_dirtied+0x109/0x160
[  551.356768]  __alloc_pages_nodemask+0x20c/0x250
[  551.358458]  alloc_pages_current+0x65/0xd0
[  551.359980]  __page_cache_alloc+0x95/0xb0
[  551.361584]  __do_page_cache_readahead+0x10a/0x2d0
[  551.363307]  ? radix_tree_lookup_slot+0x22/0x50
[  551.364950]  ? find_get_entry+0x19/0x140
[  551.366458]  filemap_fault+0x4b1/0x760
[  551.367872]  ? filemap_fault+0x4b1/0x760
[  551.369430]  ? iomap_apply+0xc8/0x110
[  551.370909]  ? _cond_resched+0x15/0x40
[  551.372895]  xfs_filemap_fault+0x34/0x50 [xfs]
[  551.374640]  __do_fault+0x19/0xf0
[  551.376074]  __handle_mm_fault+0xb0b/0x1030
[  551.377612]  handle_mm_fault+0xf4/0x220
[  551.379086]  __do_page_fault+0x25b/0x4a0
[  551.380540]  do_page_fault+0x30/0x80
[  551.381940]  ? do_syscall_64+0xfd/0x140
[  551.383419]  page_fault+0x28/0x30
[  551.384693] RIP: 0033:0x7f9efff4fc90
[  551.386114] RSP: 002b:00007ffecf5cc0c8 EFLAGS: 00010246
[  551.388004] RAX: 0000000000001000 RBX: 0000000000000003 RCX: 00007f9efff4fc90
[  551.390470] RDX: 0000000000001000 RSI: 00000000006010c0 RDI: 0000000000000003
[  551.392898] RBP: 0000000000000003 R08: 00007f9effeaf938 R09: 000000000000000e
[  551.395300] R10: 00007ffecf5cbe50 R11: 0000000000000246 R12: 000000000040085d
[  551.397912] R13: 00007ffecf5cc1d0 R14: 0000000000000000 R15: 0000000000000000

[  563.729125] kworker/1:4     D11568  8980      2 0x00000080
[  563.731075] Workqueue: xfs-data/sda1 xfs_end_io [xfs]
[  563.732874] Call Trace:
[  563.733893]  __schedule+0x23f/0x5d0
[  563.735218]  schedule+0x31/0x80
[  563.736493]  schedule_timeout+0x189/0x290
[  563.737985]  ? del_timer_sync+0x40/0x40
[  563.739399]  io_schedule_timeout+0x19/0x40
[  563.741018]  ? io_schedule_timeout+0x19/0x40
[  563.742560]  congestion_wait+0x7d/0xd0
[  563.743979]  ? wait_woken+0x80/0x80
[  563.745311]  shrink_inactive_list+0x3e3/0x4d0
[  563.746909]  shrink_node_memcg+0x360/0x780
[  563.748425]  shrink_node+0xdc/0x310
[  563.749739]  ? shrink_node+0xdc/0x310
[  563.751437]  do_try_to_free_pages+0xea/0x370
[  563.753020]  try_to_free_pages+0xc3/0x100
[  563.754486]  __alloc_pages_slowpath+0x441/0xd50
[  563.756234]  __alloc_pages_nodemask+0x20c/0x250
[  563.757884]  alloc_pages_current+0x65/0xd0
[  563.759375]  new_slab+0x472/0x600
[  563.760771]  ___slab_alloc+0x41b/0x590
[  563.762221]  ? kmem_alloc+0x8a/0x110 [xfs]
[  563.763751]  ? ___slab_alloc+0x1b6/0x590
[  563.765198]  ? kmem_alloc+0x8a/0x110 [xfs]
[  563.766741]  __slab_alloc+0x1b/0x30
[  563.768092]  ? __slab_alloc+0x1b/0x30
[  563.769450]  __kmalloc+0x17e/0x200
[  563.770786]  kmem_alloc+0x8a/0x110 [xfs]
[  563.772260]  xfs_log_commit_cil+0x276/0x750 [xfs]
[  563.773982]  __xfs_trans_commit+0x7d/0x280 [xfs]
[  563.775684]  xfs_trans_commit+0xb/0x10 [xfs]
[  563.777231]  __xfs_setfilesize+0x7c/0xb0 [xfs]
[  563.778850]  xfs_setfilesize_ioend+0x41/0x60 [xfs]
[  563.780580]  xfs_end_io+0x44/0x130 [xfs]
[  563.782051]  process_one_work+0x1f5/0x390
[  563.783573]  worker_thread+0x46/0x410
[  563.784938]  ? __schedule+0x247/0x5d0
[  563.786368]  kthread+0xff/0x140
[  563.787576]  ? process_one_work+0x390/0x390
[  563.789131]  ? kthread_create_on_node+0x60/0x60
[  563.790789]  ? do_syscall_64+0x13a/0x140
[  563.792221]  ret_from_fork+0x25/0x30

[  563.887126] kworker/1:5     D11320  8984      2 0x00000080
[  563.889083] Workqueue: xfs-data/sda1 xfs_end_io [xfs]
[  563.890905] Call Trace:
[  563.891922]  __schedule+0x23f/0x5d0
[  563.893279]  schedule+0x31/0x80
[  563.894501]  schedule_timeout+0x189/0x290
[  563.896034]  ? del_timer_sync+0x40/0x40
[  563.897450]  io_schedule_timeout+0x19/0x40
[  563.898967]  ? io_schedule_timeout+0x19/0x40
[  563.900541]  congestion_wait+0x7d/0xd0
[  563.901937]  ? wait_woken+0x80/0x80
[  563.903282]  shrink_inactive_list+0x3e3/0x4d0
[  563.904853]  shrink_node_memcg+0x360/0x780
[  563.906403]  shrink_node+0xdc/0x310
[  563.907720]  ? shrink_node+0xdc/0x310
[  563.909133]  do_try_to_free_pages+0xea/0x370
[  563.910731]  try_to_free_pages+0xc3/0x100
[  563.912198]  __alloc_pages_slowpath+0x441/0xd50
[  563.913836]  __alloc_pages_nodemask+0x20c/0x250
[  563.915526]  alloc_pages_current+0x65/0xd0
[  563.917012]  new_slab+0x472/0x600
[  563.918291]  ___slab_alloc+0x41b/0x590
[  563.919687]  ? kmem_alloc+0x8a/0x110 [xfs]
[  563.921204]  ? ___slab_alloc+0x1b6/0x590
[  563.922647]  ? kmem_alloc+0x8a/0x110 [xfs]
[  563.924159]  __slab_alloc+0x1b/0x30
[  563.925593]  ? __slab_alloc+0x1b/0x30
[  563.926952]  __kmalloc+0x17e/0x200
[  563.928286]  kmem_alloc+0x8a/0x110 [xfs]
[  563.929724]  xfs_log_commit_cil+0x276/0x750 [xfs]
[  563.931446]  __xfs_trans_commit+0x7d/0x280 [xfs]
[  563.933086]  xfs_trans_commit+0xb/0x10 [xfs]
[  563.934631]  __xfs_setfilesize+0x7c/0xb0 [xfs]
[  563.936279]  xfs_setfilesize_ioend+0x41/0x60 [xfs]
[  563.938007]  xfs_end_io+0x44/0x130 [xfs]
[  563.939445]  process_one_work+0x1f5/0x390
[  563.941032]  worker_thread+0x46/0x410
[  563.942439]  ? __schedule+0x247/0x5d0
[  563.943850]  kthread+0xff/0x140
[  563.945056]  ? process_one_work+0x390/0x390
[  563.946616]  ? kthread_create_on_node+0x60/0x60
[  563.948261]  ? do_syscall_64+0x13a/0x140
[  563.949698]  ret_from_fork+0x25/0x30

[  563.951084] kworker/1:6     D12128  8985      2 0x00000080
[  563.953006] Workqueue: xfs-data/sda1 xfs_end_io [xfs]
[  563.954773] Call Trace:
[  563.955861]  __schedule+0x23f/0x5d0
[  563.957178]  schedule+0x31/0x80
[  563.958430]  schedule_timeout+0x189/0x290
[  563.959893]  ? del_timer_sync+0x40/0x40
[  563.961344]  io_schedule_timeout+0x19/0x40
[  563.962852]  ? io_schedule_timeout+0x19/0x40
[  563.964390]  congestion_wait+0x7d/0xd0
[  563.965843]  ? wait_woken+0x80/0x80
[  563.967153]  shrink_inactive_list+0x3e3/0x4d0
[  563.968781]  shrink_node_memcg+0x360/0x780
[  563.970268]  shrink_node+0xdc/0x310
[  563.971602]  ? shrink_node+0xdc/0x310
[  563.972983]  do_try_to_free_pages+0xea/0x370
[  563.974519]  try_to_free_pages+0xc3/0x100
[  563.976045]  __alloc_pages_slowpath+0x441/0xd50
[  563.977658]  __alloc_pages_nodemask+0x20c/0x250
[  563.979303]  alloc_pages_current+0x65/0xd0
[  563.980818]  new_slab+0x472/0x600
[  563.982079]  ___slab_alloc+0x41b/0x590
[  563.983499]  ? kmem_alloc+0x8a/0x110 [xfs]
[  563.984987]  ? ___slab_alloc+0x1b6/0x590
[  563.986552]  ? kmem_alloc+0x8a/0x110 [xfs]
[  563.988076]  __slab_alloc+0x1b/0x30
[  563.989384]  ? __slab_alloc+0x1b/0x30
[  563.990788]  __kmalloc+0x17e/0x200
[  563.992077]  kmem_alloc+0x8a/0x110 [xfs]
[  563.993586]  xfs_log_commit_cil+0x276/0x750 [xfs]
[  563.995257]  __xfs_trans_commit+0x7d/0x280 [xfs]
[  563.996950]  xfs_trans_commit+0xb/0x10 [xfs]
[  563.998530]  __xfs_setfilesize+0x7c/0xb0 [xfs]
[  564.000132]  xfs_setfilesize_ioend+0x41/0x60 [xfs]
[  564.001882]  xfs_end_io+0x44/0x130 [xfs]
[  564.003348]  process_one_work+0x1f5/0x390
[  564.004814]  worker_thread+0x46/0x410
[  564.006217]  ? __schedule+0x247/0x5d0
[  564.007578]  kthread+0xff/0x140
[  564.008837]  ? process_one_work+0x390/0x390
[  564.010392]  ? kthread_create_on_node+0x60/0x60
[  564.012005]  ret_from_fork+0x25/0x30
----------

I wish we could print how long each work item has remained
pending or in-flight. Something like below...

----------
diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index c102ef6..444f86f 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -101,6 +101,7 @@ struct work_struct {
 	atomic_long_t data;
 	struct list_head entry;
 	work_func_t func;
+	unsigned long stamp;
 #ifdef CONFIG_LOCKDEP
 	struct lockdep_map lockdep_map;
 #endif
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index a86688f..6be185a 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1296,6 +1296,7 @@ static void insert_work(struct pool_workqueue *pwq, struct work_struct *work,
 	struct worker_pool *pool = pwq->pool;
 
 	/* we own @work, set data and link */
+	work->stamp = jiffies;
 	set_work_pwq(work, pwq, extra_flags);
 	list_add_tail(&work->entry, head);
 	get_pwq(pwq);
@@ -4316,10 +4317,10 @@ static void pr_cont_work(bool comma, struct work_struct *work)
 
 		barr = container_of(work, struct wq_barrier, work);
 
-		pr_cont("%s BAR(%d)", comma ? "," : "",
-			task_pid_nr(barr->task));
+		pr_cont("%s BAR(%d){%lu}", comma ? "," : "",
+			task_pid_nr(barr->task), jiffies - work->stamp);
 	} else {
-		pr_cont("%s %pf", comma ? "," : "", work->func);
+		pr_cont("%s %pf{%lu}", comma ? "," : "", work->func, jiffies - work->stamp);
 	}
 }
 
@@ -4351,10 +4352,11 @@ static void show_pwq(struct pool_workqueue *pwq)
 			if (worker->current_pwq != pwq)
 				continue;
 
-			pr_cont("%s %d%s:%pf", comma ? "," : "",
+			pr_cont("%s %d%s:%pf{%lu}", comma ? "," : "",
 				task_pid_nr(worker->task),
 				worker == pwq->wq->rescuer ? "(RESCUER)" : "",
-				worker->current_func);
+				worker->current_func, worker->current_work ?
+				jiffies - worker->current_work->stamp : 0);
 			list_for_each_entry(work, &worker->scheduled, entry)
 				pr_cont_work(false, work);
 			comma = true;
----------

----------
[  872.639478] Showing busy workqueues and worker pools:
[  872.641577] workqueue events_freezable_power_: flags=0x84
[  872.643751]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
[  872.646118]     in-flight: 10905:disk_events_workfn{155066}
[  872.648446] workqueue writeback: flags=0x4e
[  872.650194]   pwq 128: cpus=0-63 flags=0x4 nice=0 active=2/256
[  872.652507]     in-flight: 354:wb_workfn{153828} wb_workfn{153828}
[  872.655406] workqueue xfs-eofblocks/sda1: flags=0xc
[  872.657444]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
[  872.659827]     in-flight: 3:xfs_eofblocks_worker [xfs]{146617}
[  872.662295] pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=0s workers=7 idle: 94 7730 9860 33 10902
[  872.665704] pool 128: cpus=0-63 flags=0x4 nice=0 hung=22s workers=3 idle: 353 352

[ 1107.349044] Showing busy workqueues and worker pools:
[ 1107.351129] workqueue events_freezable_power_: flags=0x84
[ 1107.353297]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
[ 1107.355662]     in-flight: 10905:disk_events_workfn{389810}
[ 1107.357929] workqueue writeback: flags=0x4e
[ 1107.359674]   pwq 128: cpus=0-63 flags=0x4 nice=0 active=2/256
[ 1107.362093]     in-flight: 354:wb_workfn{388572} wb_workfn{388572}
[ 1107.364976] workqueue xfs-eofblocks/sda1: flags=0xc
[ 1107.366977]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
[ 1107.369354]     in-flight: 3:xfs_eofblocks_worker [xfs]{381361}
[ 1107.371770] pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=0s workers=4 idle: 94 7730
[ 1107.374847] pool 128: cpus=0-63 flags=0x4 nice=0 hung=257s workers=3 idle: 353 352

[ 1239.796406] Showing busy workqueues and worker pools:
[ 1239.798505] workqueue events_freezable_power_: flags=0x84
[ 1239.800809]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
[ 1239.803200]     in-flight: 10905:disk_events_workfn{522257}
[ 1239.805537] workqueue writeback: flags=0x4e
[ 1239.807338]   pwq 128: cpus=0-63 flags=0x4 nice=0 active=2/256
[ 1239.809714]     in-flight: 354:wb_workfn{521034} wb_workfn{521034}
[ 1239.812589] workqueue xfs-eofblocks/sda1: flags=0xc
[ 1239.814682]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
[ 1239.817207]     in-flight: 3:xfs_eofblocks_worker [xfs]{513823}
[ 1239.819686] pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=0s workers=4 idle: 94 7730
[ 1239.822659] pool 128: cpus=0-63 flags=0x4 nice=0 hung=389s workers=3 idle: 353 352
----------

I haven't succeeded in reproducing the mm_percpu_wq stall with this patch
applied. What is strange is that "BUG: workqueue lockup" messages do not show
up even though SysRq-t says that pool 128 is hung...
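
Perhaps that is because the lockup detector only examines pools which still
have something pending on pool->worklist; roughly (simplified from
wq_watchdog_timer_fn() in kernel/workqueue.c):

----------
	for_each_pool(pool, pi) {
		unsigned long pool_ts, touched, ts;

		/* pools with no pending work items are skipped */
		if (list_empty(&pool->worklist))
			continue;

		/* ... ts = newest of pool->watchdog_ts and the touch stamps ... */

		/* did we stall? */
		if (time_after(jiffies, ts + thresh)) {
			lockup_detected = true;
			pr_emerg("BUG: workqueue lockup - pool");
			pr_cont_pool_info(pool);
			pr_cont(" stuck for %us!\n",
				jiffies_to_msecs(jiffies - pool_ts) / 1000);
		}
	}
----------

Pool 128 has only in-flight wb_workfn items and nothing pending, so it would
presumably be skipped here even though show_workqueue_state() reports it as
hung.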
