linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* timer code oops when calling mod_delayed_work
@ 2015-10-29 14:31 Jeff Layton
  2015-10-29 17:58 ` Jeff Layton
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff Layton @ 2015-10-29 14:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: Tejun Heo, bfields, Michael Skralivetsky, Chris Worley, Trond Myklebust

Our QA guys hit the following warning and oops today on a relatively
recent upstream kernel. They were running a fairly busy NFSv3 I/O test
against this server:

[   71.074180] NFSD: starting 90-second grace period (net ffffffff81cf3b40)
[  221.146802] Ebtables v2.0 unregistered
[  318.909835] ------------[ cut here ]------------
[  318.909846] WARNING: CPU: 22 PID: 3402 at kernel/workqueue.c:1444 __queue_delayed_work+0xe7/0x190()
[  318.909848] Modules linked in: bridge stp llc intel_rapl iosf_mbi x86_pkg_temp_thermal coretemp kvm_intel kvm ipmi_ssif crct10dif_pclmul crc32_pclmul iTCO_wdt iTCO_vendor_support joydev mei_me mei sb_edac shpchp ipmi_si lpc_ich i2c_i801 edac_core tpm_tis wmi ipmi_msghandler tpm acpi_power_meter acpi_pad nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c mlx4_en vxlan ip6_udp_tunnel udp_tunnel ast drm_kms_helper ttm crc32c_intel igb drm mlx4_core ptp nvme pps_core dca i2c_algo_bit fjes [last unloaded: iptable_security]
[  318.909889] CPU: 22 PID: 3402 Comm: nfsd Not tainted 4.3.0-0.rc7.git0.1.fc24.x86_64 #1
[  318.909891] Hardware name: Quanta D51B/D51BP (1G LOM), BIOS S2BP3A05 01/29/2015
[  318.909892]  0000000000000000 00000000319846b5 ffff880322d0f440 ffffffff813a48cf
[  318.909895]  0000000000000000 ffff880322d0f478 ffffffff8109f122 0000000000000400
[  318.909898]  ffff88085f48fc00 ffffffffa03623e0 ffff88085f48fc00 000000000001d4c0
[  318.909901] Call Trace:
[  318.909906]  [<ffffffff813a48cf>] dump_stack+0x44/0x55
[  318.909909]  [<ffffffff8109f122>] warn_slowpath_common+0x82/0xc0
[  318.909913]  [<ffffffff8109f26a>] warn_slowpath_null+0x1a/0x20
[  318.909915]  [<ffffffff810b67a7>] __queue_delayed_work+0xe7/0x190
[  318.909917]  [<ffffffff810b70de>] mod_delayed_work_on+0x5e/0xa0
[  318.909931]  [<ffffffffa0333394>] prune_cache_entries+0x94/0xb0 [nfsd]
[  318.909936]  [<ffffffffa03333be>] nfsd_reply_cache_scan+0xe/0x10 [nfsd]
[  318.909941]  [<ffffffff811b78e2>] shrink_slab.part.42+0x1e2/0x380
[  318.909944]  [<ffffffff811bbfc6>] shrink_zone+0x2b6/0x2c0
[  318.909946]  [<ffffffff811bc145>] do_try_to_free_pages+0x175/0x400
[  318.909949]  [<ffffffff811bc4da>] try_to_free_pages+0x10a/0x150
[  318.909954]  [<ffffffff811ae5b1>] __alloc_pages_nodemask+0x581/0x990
[  318.909957]  [<ffffffff811b19f5>] ? account_page_dirtied+0xd5/0x160
[  318.909960]  [<ffffffff811f7b41>] alloc_pages_current+0x91/0x100
[  318.909963]  [<ffffffff811a45bb>] __page_cache_alloc+0xab/0xe0
[  318.909965]  [<ffffffff811a4c94>] pagecache_get_page+0x84/0x1e0
[  318.909967]  [<ffffffff811a4e16>] grab_cache_page_write_begin+0x26/0x40
[  318.909990]  [<ffffffffa0221cd4>] xfs_vm_write_begin+0x34/0xe0 [xfs]
[  318.910005]  [<ffffffffa0221a39>] ? xfs_vm_write_end+0x29/0x80 [xfs]
[  318.910007]  [<ffffffff811a418e>] generic_perform_write+0xce/0x1c0
[  318.910023]  [<ffffffffa022efb7>] xfs_file_buffered_aio_write+0x127/0x250 [xfs]
[  318.910028]  [<ffffffffa032be60>] ? nfserrno+0x60/0x60 [nfsd]
[  318.910031]  [<ffffffff812035e9>] ? __kmalloc+0x1b9/0x260
[  318.910034]  [<ffffffff81223617>] ? rw_copy_check_uvector+0x57/0x110
[  318.910047]  [<ffffffffa022f1d5>] xfs_file_write_iter+0xf5/0x110 [xfs]
[  318.910050]  [<ffffffff81221e3c>] do_iter_readv_writev+0x6c/0xa0
[  318.910053]  [<ffffffff812229bf>] do_readv_writev+0x18f/0x230
[  318.910065]  [<ffffffffa022f0e0>] ? xfs_file_buffered_aio_write+0x250/0x250 [xfs]
[  318.910068]  [<ffffffff813284bb>] ? security_file_open+0x8b/0x90
[  318.910070]  [<ffffffff8121ffa4>] ? do_dentry_open+0x264/0x2f0
[  318.910084]  [<ffffffffa022ced0>] ? xfs_extent_busy_ag_cmp+0x20/0x20 [xfs]
[  318.910086]  [<ffffffff81222ae9>] vfs_writev+0x39/0x50
[  318.910092]  [<ffffffffa032df4d>] nfsd_vfs_write+0xbd/0x380 [nfsd]
[  318.910097]  [<ffffffffa0330568>] nfsd_write+0xd8/0xf0 [nfsd]
[  318.910103]  [<ffffffffa03367fc>] nfsd3_proc_write+0xbc/0x150 [nfsd]
[  318.910107]  [<ffffffffa0328e68>] nfsd_dispatch+0xb8/0x200 [nfsd]
[  318.910128]  [<ffffffffa02d39dc>] svc_process_common+0x40c/0x650 [sunrpc]
[  318.910137]  [<ffffffffa02d4b14>] svc_process+0xf4/0x180 [sunrpc]
[  318.910142]  [<ffffffffa03288bf>] nfsd+0xef/0x160 [nfsd]
[  318.910146]  [<ffffffffa03287d0>] ? nfsd_destroy+0x60/0x60 [nfsd]
[  318.910149]  [<ffffffff810bd6a8>] kthread+0xd8/0xf0
[  318.910152]  [<ffffffff810bd5d0>] ? kthread_worker_fn+0x160/0x160
[  318.910155]  [<ffffffff8177f19f>] ret_from_fork+0x3f/0x70
[  318.910158]  [<ffffffff810bd5d0>] ? kthread_worker_fn+0x160/0x160
[  318.910160] ---[ end trace db375019f3368852 ]---
[  318.910169] ------------[ cut here ]------------
[  318.910201] kernel BUG at kernel/time/timer.c:977!
[  318.910226] invalid opcode: 0000 [#1] SMP 
[  318.910249] Modules linked in: bridge stp llc intel_rapl iosf_mbi x86_pkg_temp_thermal coretemp kvm_intel kvm ipmi_ssif crct10dif_pclmul crc32_pclmul iTCO_wdt iTCO_vendor_support joydev mei_me mei sb_edac shpchp ipmi_si lpc_ich i2c_i801 edac_core tpm_tis wmi ipmi_msghandler tpm acpi_power_meter acpi_pad nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c mlx4_en vxlan ip6_udp_tunnel udp_tunnel ast drm_kms_helper ttm crc32c_intel igb drm mlx4_core ptp nvme pps_core dca i2c_algo_bit fjes [last unloaded: iptable_security]
[  318.910588] CPU: 22 PID: 3402 Comm: nfsd Tainted: G        W       4.3.0-0.rc7.git0.1.fc24.x86_64 #1
[  318.910629] Hardware name: Quanta D51B/D51BP (1G LOM), BIOS S2BP3A05 01/29/2015
[  318.910664] task: ffff880322c49e00 ti: ffff880322d0c000 task.ti: ffff880322d0c000
[  318.910698] RIP: 0010:[<ffffffff811076a6>]  [<ffffffff811076a6>] add_timer_on+0xc6/0x110
[  318.910736] RSP: 0018:ffff880322d0f458  EFLAGS: 00010082
[  318.910761] RAX: 0000000000000000 RBX: 0000000000000016 RCX: 0000000100021e85
[  318.910793] RDX: ffffffffa03623c0 RSI: 0000000000000016 RDI: ffffffffa03623e0
[  318.910826] RBP: ffff880322d0f488 R08: 000000000000000a R09: 00000000000005f2
[  318.910856] R10: 000000000000027c R11: 00000000000005f2 R12: ffff88085f48fc00
[  318.910888] R13: ffff88085fd8e080 R14: ffff88085f48fc00 R15: 000000000001d4c0
[  318.910922] FS:  0000000000000000(0000) GS:ffff88085fd80000(0000) knlGS:0000000000000000
[  318.910956] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  318.910981] CR2: 00007f9d6ea41000 CR3: 0000000001c0b000 CR4: 00000000001406e0
[  318.911012] Stack:
[  318.911025]  ffff88085f48fc00 0000000000000016 ffff88085f48fc00 ffffffffa03623e0
[  318.911069]  ffff88085f48fc00 000000000001d4c0 ffff880322d0f4c0 ffffffff810b6749
[  318.911111]  000000000001d4c0 ffffffffa03623c0 0000000000000001 ffffffffa03623c0
[  318.911153] Call Trace:
[  318.911170]  [<ffffffff810b6749>] __queue_delayed_work+0x89/0x190
[  318.911201]  [<ffffffff810b70de>] mod_delayed_work_on+0x5e/0xa0
[  318.911236]  [<ffffffffa0333394>] prune_cache_entries+0x94/0xb0 [nfsd]
[  318.911267]  [<ffffffffa03333be>] nfsd_reply_cache_scan+0xe/0x10 [nfsd]
[  318.911299]  [<ffffffff811b78e2>] shrink_slab.part.42+0x1e2/0x380
[  318.911330]  [<ffffffff811bbfc6>] shrink_zone+0x2b6/0x2c0
[  318.911358]  [<ffffffff811bc145>] do_try_to_free_pages+0x175/0x400
[  318.911391]  [<ffffffff811bc4da>] try_to_free_pages+0x10a/0x150
[  318.911422]  [<ffffffff811ae5b1>] __alloc_pages_nodemask+0x581/0x990
[  318.911454]  [<ffffffff811b19f5>] ? account_page_dirtied+0xd5/0x160
[  318.911483]  [<ffffffff811f7b41>] alloc_pages_current+0x91/0x100
[  318.911513]  [<ffffffff811a45bb>] __page_cache_alloc+0xab/0xe0
[  318.911543]  [<ffffffff811a4c94>] pagecache_get_page+0x84/0x1e0
[  318.911573]  [<ffffffff811a4e16>] grab_cache_page_write_begin+0x26/0x40
[  318.911620]  [<ffffffffa0221cd4>] xfs_vm_write_begin+0x34/0xe0 [xfs]
[  318.911661]  [<ffffffffa0221a39>] ? xfs_vm_write_end+0x29/0x80 [xfs]
[  318.912748]  [<ffffffff811a418e>] generic_perform_write+0xce/0x1c0
[  318.913810]  [<ffffffffa022efb7>] xfs_file_buffered_aio_write+0x127/0x250 [xfs]
[  318.914858]  [<ffffffffa032be60>] ? nfserrno+0x60/0x60 [nfsd]
[  318.915715]  [<ffffffff812035e9>] ? __kmalloc+0x1b9/0x260
[  318.916360]  [<ffffffff81223617>] ? rw_copy_check_uvector+0x57/0x110
[  318.917014]  [<ffffffffa022f1d5>] xfs_file_write_iter+0xf5/0x110 [xfs]
[  318.917650]  [<ffffffff81221e3c>] do_iter_readv_writev+0x6c/0xa0
[  318.918312]  [<ffffffff812229bf>] do_readv_writev+0x18f/0x230
[  318.918954]  [<ffffffffa022f0e0>] ? xfs_file_buffered_aio_write+0x250/0x250 [xfs]
[  318.919575]  [<ffffffff813284bb>] ? security_file_open+0x8b/0x90
[  318.920178]  [<ffffffff8121ffa4>] ? do_dentry_open+0x264/0x2f0
[  318.920771]  [<ffffffffa022ced0>] ? xfs_extent_busy_ag_cmp+0x20/0x20 [xfs]
[  318.921354]  [<ffffffff81222ae9>] vfs_writev+0x39/0x50
[  318.921961]  [<ffffffffa032df4d>] nfsd_vfs_write+0xbd/0x380 [nfsd]
[  318.922509]  [<ffffffffa0330568>] nfsd_write+0xd8/0xf0 [nfsd]
[  318.923045]  [<ffffffffa03367fc>] nfsd3_proc_write+0xbc/0x150 [nfsd]
[  318.923574]  [<ffffffffa0328e68>] nfsd_dispatch+0xb8/0x200 [nfsd]
[  318.924111]  [<ffffffffa02d39dc>] svc_process_common+0x40c/0x650 [sunrpc]
[  318.924633]  [<ffffffffa02d4b14>] svc_process+0xf4/0x180 [sunrpc]
[  318.925144]  [<ffffffffa03288bf>] nfsd+0xef/0x160 [nfsd]
[  318.925654]  [<ffffffffa03287d0>] ? nfsd_destroy+0x60/0x60 [nfsd]
[  318.926163]  [<ffffffff810bd6a8>] kthread+0xd8/0xf0
[  318.926668]  [<ffffffff810bd5d0>] ? kthread_worker_fn+0x160/0x160
[  318.927189]  [<ffffffff8177f19f>] ret_from_fork+0x3f/0x70
[  318.927685]  [<ffffffff810bd5d0>] ? kthread_worker_fn+0x160/0x160
[  318.928210] Code: 4d 85 c0 75 e4 48 89 de 4c 89 ef e8 55 f9 ff ff 48 8b 34 24 4c 89 ef e8 19 72 67 00 48 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d c3 <0f> 0b 48 83 7f 38 00 48 8b 45 08 0f 85 5f ff ff ff 48 89 47 38 
[  318.929315] RIP  [<ffffffff811076a6>] add_timer_on+0xc6/0x110
[  318.930019]  RSP <ffff880322d0f458>

Basically the shrinker kicked in. After we shrank a bunch of cache
entries, we tried to mod_delayed_work on the cache cleaner job so that
it wouldn't run too soon, and that's where the code oopsed.

Is our usage of mod_delayed_work in prune_cache_entries() wrong in some
way? Or is there some breakage in the workqueue or timer code?

-- 
Jeff Layton <jlayton@poochiereds.net>

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2015-11-04 19:43 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-29 14:31 timer code oops when calling mod_delayed_work Jeff Layton
2015-10-29 17:58 ` Jeff Layton
2015-10-31  2:00   ` Tejun Heo
2015-10-31 11:34     ` Jeff Layton
2015-10-31 21:31       ` Tejun Heo
2015-10-31 21:54         ` Jeff Layton
2015-11-02 19:48           ` Chris Worley
2015-11-02 19:56             ` Jeff Layton
2015-11-03  1:33               ` Jeff Layton
2015-11-03 17:55                 ` Jeff Layton
2015-11-03 22:54                   ` Tejun Heo
2015-11-04  0:06                     ` Tejun Heo
2015-11-04 11:48                       ` Jeff Layton
2015-11-04 17:15                         ` [PATCH] timer: add_timer_on() should perform proper migration Tejun Heo
2015-11-04 19:27                           ` [tip:timers/urgent] timers: Use proper base migration in add_timer_on() tip-bot for Tejun Heo
2015-11-04 19:35                           ` [PATCH] timer: add_timer_on() should perform proper migration Thomas Gleixner
2015-11-04 19:43                             ` Tejun Heo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).