linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff Layton <jlayton@poochiereds.net>
To: linux-kernel@vger.kernel.org
Cc: Tejun Heo <tj@kernel.org>,
	bfields@fieldses.org,
	Michael Skralivetsky <michael.skralivetsky@primarydata.com>,
	Chris Worley <chris.worley@primarydata.com>,
	Trond Myklebust <trond.myklebust@primarydata.com>
Subject: Re: timer code oops when calling mod_delayed_work
Date: Thu, 29 Oct 2015 13:58:36 -0400	[thread overview]
Message-ID: <20151029135836.02ad9000@synchrony.poochiereds.net> (raw)
In-Reply-To: <20151029103113.2f893924@tlielax.poochiereds.net>

On Thu, 29 Oct 2015 10:31:13 -0400
Jeff Layton <jlayton@poochiereds.net> wrote:

> Our QA guys hit the following warning and oops today on a relatively
> recent upstream kernel. They were running a fairly busy NFSv3 I/O test
> against this server:
> 
> [   71.074180] NFSD: starting 90-second grace period (net ffffffff81cf3b40)
> [  221.146802] Ebtables v2.0 unregistered
> [  318.909835] ------------[ cut here ]------------
> [  318.909846] WARNING: CPU: 22 PID: 3402 at kernel/workqueue.c:1444 __queue_delayed_work+0xe7/0x190()
> [  318.909848] Modules linked in: bridge stp llc intel_rapl iosf_mbi x86_pkg_temp_thermal coretemp kvm_intel kvm ipmi_ssif crct10dif_pclmul crc32_pclmul iTCO_wdt iTCO_vendor_support joydev mei_me mei sb_edac shpchp ipmi_si lpc_ich i2c_i801 edac_core tpm_tis wmi ipmi_msghandler tpm acpi_power_meter acpi_pad nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c mlx4_en vxlan ip6_udp_tunnel udp_tunnel ast drm_kms_helper ttm crc32c_intel igb drm mlx4_core ptp nvme pps_core dca i2c_algo_bit fjes [last unloaded: iptable_security]
> [  318.909889] CPU: 22 PID: 3402 Comm: nfsd Not tainted 4.3.0-0.rc7.git0.1.fc24.x86_64 #1
> [  318.909891] Hardware name: Quanta D51B/D51BP (1G LOM), BIOS S2BP3A05 01/29/2015
> [  318.909892]  0000000000000000 00000000319846b5 ffff880322d0f440 ffffffff813a48cf
> [  318.909895]  0000000000000000 ffff880322d0f478 ffffffff8109f122 0000000000000400
> [  318.909898]  ffff88085f48fc00 ffffffffa03623e0 ffff88085f48fc00 000000000001d4c0
> [  318.909901] Call Trace:
> [  318.909906]  [<ffffffff813a48cf>] dump_stack+0x44/0x55
> [  318.909909]  [<ffffffff8109f122>] warn_slowpath_common+0x82/0xc0
> [  318.909913]  [<ffffffff8109f26a>] warn_slowpath_null+0x1a/0x20
> [  318.909915]  [<ffffffff810b67a7>] __queue_delayed_work+0xe7/0x190
> [  318.909917]  [<ffffffff810b70de>] mod_delayed_work_on+0x5e/0xa0
> [  318.909931]  [<ffffffffa0333394>] prune_cache_entries+0x94/0xb0 [nfsd]
> [  318.909936]  [<ffffffffa03333be>] nfsd_reply_cache_scan+0xe/0x10 [nfsd]
> [  318.909941]  [<ffffffff811b78e2>] shrink_slab.part.42+0x1e2/0x380
> [  318.909944]  [<ffffffff811bbfc6>] shrink_zone+0x2b6/0x2c0
> [  318.909946]  [<ffffffff811bc145>] do_try_to_free_pages+0x175/0x400
> [  318.909949]  [<ffffffff811bc4da>] try_to_free_pages+0x10a/0x150
> [  318.909954]  [<ffffffff811ae5b1>] __alloc_pages_nodemask+0x581/0x990
> [  318.909957]  [<ffffffff811b19f5>] ? account_page_dirtied+0xd5/0x160
> [  318.909960]  [<ffffffff811f7b41>] alloc_pages_current+0x91/0x100
> [  318.909963]  [<ffffffff811a45bb>] __page_cache_alloc+0xab/0xe0
> [  318.909965]  [<ffffffff811a4c94>] pagecache_get_page+0x84/0x1e0
> [  318.909967]  [<ffffffff811a4e16>] grab_cache_page_write_begin+0x26/0x40
> [  318.909990]  [<ffffffffa0221cd4>] xfs_vm_write_begin+0x34/0xe0 [xfs]
> [  318.910005]  [<ffffffffa0221a39>] ? xfs_vm_write_end+0x29/0x80 [xfs]
> [  318.910007]  [<ffffffff811a418e>] generic_perform_write+0xce/0x1c0
> [  318.910023]  [<ffffffffa022efb7>] xfs_file_buffered_aio_write+0x127/0x250 [xfs]
> [  318.910028]  [<ffffffffa032be60>] ? nfserrno+0x60/0x60 [nfsd]
> [  318.910031]  [<ffffffff812035e9>] ? __kmalloc+0x1b9/0x260
> [  318.910034]  [<ffffffff81223617>] ? rw_copy_check_uvector+0x57/0x110
> [  318.910047]  [<ffffffffa022f1d5>] xfs_file_write_iter+0xf5/0x110 [xfs]
> [  318.910050]  [<ffffffff81221e3c>] do_iter_readv_writev+0x6c/0xa0
> [  318.910053]  [<ffffffff812229bf>] do_readv_writev+0x18f/0x230
> [  318.910065]  [<ffffffffa022f0e0>] ? xfs_file_buffered_aio_write+0x250/0x250 [xfs]
> [  318.910068]  [<ffffffff813284bb>] ? security_file_open+0x8b/0x90
> [  318.910070]  [<ffffffff8121ffa4>] ? do_dentry_open+0x264/0x2f0
> [  318.910084]  [<ffffffffa022ced0>] ? xfs_extent_busy_ag_cmp+0x20/0x20 [xfs]
> [  318.910086]  [<ffffffff81222ae9>] vfs_writev+0x39/0x50
> [  318.910092]  [<ffffffffa032df4d>] nfsd_vfs_write+0xbd/0x380 [nfsd]
> [  318.910097]  [<ffffffffa0330568>] nfsd_write+0xd8/0xf0 [nfsd]
> [  318.910103]  [<ffffffffa03367fc>] nfsd3_proc_write+0xbc/0x150 [nfsd]
> [  318.910107]  [<ffffffffa0328e68>] nfsd_dispatch+0xb8/0x200 [nfsd]
> [  318.910128]  [<ffffffffa02d39dc>] svc_process_common+0x40c/0x650 [sunrpc]
> [  318.910137]  [<ffffffffa02d4b14>] svc_process+0xf4/0x180 [sunrpc]
> [  318.910142]  [<ffffffffa03288bf>] nfsd+0xef/0x160 [nfsd]
> [  318.910146]  [<ffffffffa03287d0>] ? nfsd_destroy+0x60/0x60 [nfsd]
> [  318.910149]  [<ffffffff810bd6a8>] kthread+0xd8/0xf0
> [  318.910152]  [<ffffffff810bd5d0>] ? kthread_worker_fn+0x160/0x160
> [  318.910155]  [<ffffffff8177f19f>] ret_from_fork+0x3f/0x70
> [  318.910158]  [<ffffffff810bd5d0>] ? kthread_worker_fn+0x160/0x160
> [  318.910160] ---[ end trace db375019f3368852 ]---
> [  318.910169] ------------[ cut here ]------------
> [  318.910201] kernel BUG at kernel/time/timer.c:977!
> [  318.910226] invalid opcode: 0000 [#1] SMP 
> [  318.910249] Modules linked in: bridge stp llc intel_rapl iosf_mbi x86_pkg_temp_thermal coretemp kvm_intel kvm ipmi_ssif crct10dif_pclmul crc32_pclmul iTCO_wdt iTCO_vendor_support joydev mei_me mei sb_edac shpchp ipmi_si lpc_ich i2c_i801 edac_core tpm_tis wmi ipmi_msghandler tpm acpi_power_meter acpi_pad nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c mlx4_en vxlan ip6_udp_tunnel udp_tunnel ast drm_kms_helper ttm crc32c_intel igb drm mlx4_core ptp nvme pps_core dca i2c_algo_bit fjes [last unloaded: iptable_security]
> [  318.910588] CPU: 22 PID: 3402 Comm: nfsd Tainted: G        W       4.3.0-0.rc7.git0.1.fc24.x86_64 #1
> [  318.910629] Hardware name: Quanta D51B/D51BP (1G LOM), BIOS S2BP3A05 01/29/2015
> [  318.910664] task: ffff880322c49e00 ti: ffff880322d0c000 task.ti: ffff880322d0c000
> [  318.910698] RIP: 0010:[<ffffffff811076a6>]  [<ffffffff811076a6>] add_timer_on+0xc6/0x110
> [  318.910736] RSP: 0018:ffff880322d0f458  EFLAGS: 00010082
> [  318.910761] RAX: 0000000000000000 RBX: 0000000000000016 RCX: 0000000100021e85
> [  318.910793] RDX: ffffffffa03623c0 RSI: 0000000000000016 RDI: ffffffffa03623e0
> [  318.910826] RBP: ffff880322d0f488 R08: 000000000000000a R09: 00000000000005f2
> [  318.910856] R10: 000000000000027c R11: 00000000000005f2 R12: ffff88085f48fc00
> [  318.910888] R13: ffff88085fd8e080 R14: ffff88085f48fc00 R15: 000000000001d4c0
> [  318.910922] FS:  0000000000000000(0000) GS:ffff88085fd80000(0000) knlGS:0000000000000000
> [  318.910956] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  318.910981] CR2: 00007f9d6ea41000 CR3: 0000000001c0b000 CR4: 00000000001406e0
> [  318.911012] Stack:
> [  318.911025]  ffff88085f48fc00 0000000000000016 ffff88085f48fc00 ffffffffa03623e0
> [  318.911069]  ffff88085f48fc00 000000000001d4c0 ffff880322d0f4c0 ffffffff810b6749
> [  318.911111]  000000000001d4c0 ffffffffa03623c0 0000000000000001 ffffffffa03623c0
> [  318.911153] Call Trace:
> [  318.911170]  [<ffffffff810b6749>] __queue_delayed_work+0x89/0x190
> [  318.911201]  [<ffffffff810b70de>] mod_delayed_work_on+0x5e/0xa0
> [  318.911236]  [<ffffffffa0333394>] prune_cache_entries+0x94/0xb0 [nfsd]
> [  318.911267]  [<ffffffffa03333be>] nfsd_reply_cache_scan+0xe/0x10 [nfsd]
> [  318.911299]  [<ffffffff811b78e2>] shrink_slab.part.42+0x1e2/0x380
> [  318.911330]  [<ffffffff811bbfc6>] shrink_zone+0x2b6/0x2c0
> [  318.911358]  [<ffffffff811bc145>] do_try_to_free_pages+0x175/0x400
> [  318.911391]  [<ffffffff811bc4da>] try_to_free_pages+0x10a/0x150
> [  318.911422]  [<ffffffff811ae5b1>] __alloc_pages_nodemask+0x581/0x990
> [  318.911454]  [<ffffffff811b19f5>] ? account_page_dirtied+0xd5/0x160
> [  318.911483]  [<ffffffff811f7b41>] alloc_pages_current+0x91/0x100
> [  318.911513]  [<ffffffff811a45bb>] __page_cache_alloc+0xab/0xe0
> [  318.911543]  [<ffffffff811a4c94>] pagecache_get_page+0x84/0x1e0
> [  318.911573]  [<ffffffff811a4e16>] grab_cache_page_write_begin+0x26/0x40
> [  318.911620]  [<ffffffffa0221cd4>] xfs_vm_write_begin+0x34/0xe0 [xfs]
> [  318.911661]  [<ffffffffa0221a39>] ? xfs_vm_write_end+0x29/0x80 [xfs]
> [  318.912748]  [<ffffffff811a418e>] generic_perform_write+0xce/0x1c0
> [  318.913810]  [<ffffffffa022efb7>] xfs_file_buffered_aio_write+0x127/0x250 [xfs]
> [  318.914858]  [<ffffffffa032be60>] ? nfserrno+0x60/0x60 [nfsd]
> [  318.915715]  [<ffffffff812035e9>] ? __kmalloc+0x1b9/0x260
> [  318.916360]  [<ffffffff81223617>] ? rw_copy_check_uvector+0x57/0x110
> [  318.917014]  [<ffffffffa022f1d5>] xfs_file_write_iter+0xf5/0x110 [xfs]
> [  318.917650]  [<ffffffff81221e3c>] do_iter_readv_writev+0x6c/0xa0
> [  318.918312]  [<ffffffff812229bf>] do_readv_writev+0x18f/0x230
> [  318.918954]  [<ffffffffa022f0e0>] ? xfs_file_buffered_aio_write+0x250/0x250 [xfs]
> [  318.919575]  [<ffffffff813284bb>] ? security_file_open+0x8b/0x90
> [  318.920178]  [<ffffffff8121ffa4>] ? do_dentry_open+0x264/0x2f0
> [  318.920771]  [<ffffffffa022ced0>] ? xfs_extent_busy_ag_cmp+0x20/0x20 [xfs]
> [  318.921354]  [<ffffffff81222ae9>] vfs_writev+0x39/0x50
> [  318.921961]  [<ffffffffa032df4d>] nfsd_vfs_write+0xbd/0x380 [nfsd]
> [  318.922509]  [<ffffffffa0330568>] nfsd_write+0xd8/0xf0 [nfsd]
> [  318.923045]  [<ffffffffa03367fc>] nfsd3_proc_write+0xbc/0x150 [nfsd]
> [  318.923574]  [<ffffffffa0328e68>] nfsd_dispatch+0xb8/0x200 [nfsd]
> [  318.924111]  [<ffffffffa02d39dc>] svc_process_common+0x40c/0x650 [sunrpc]
> [  318.924633]  [<ffffffffa02d4b14>] svc_process+0xf4/0x180 [sunrpc]
> [  318.925144]  [<ffffffffa03288bf>] nfsd+0xef/0x160 [nfsd]
> [  318.925654]  [<ffffffffa03287d0>] ? nfsd_destroy+0x60/0x60 [nfsd]
> [  318.926163]  [<ffffffff810bd6a8>] kthread+0xd8/0xf0
> [  318.926668]  [<ffffffff810bd5d0>] ? kthread_worker_fn+0x160/0x160
> [  318.927189]  [<ffffffff8177f19f>] ret_from_fork+0x3f/0x70
> [  318.927685]  [<ffffffff810bd5d0>] ? kthread_worker_fn+0x160/0x160
> [  318.928210] Code: 4d 85 c0 75 e4 48 89 de 4c 89 ef e8 55 f9 ff ff 48 8b 34 24 4c 89 ef e8 19 72 67 00 48 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d c3 <0f> 0b 48 83 7f 38 00 48 8b 45 08 0f 85 5f ff ff ff 48 89 47 38 
> [  318.929315] RIP  [<ffffffff811076a6>] add_timer_on+0xc6/0x110
> [  318.930019]  RSP <ffff880322d0f458>
> 
> Basically the shrinker kicked in. After we shrank a bunch of cache
> entries, we tried to mod_delayed_work on the cache cleaner job so that
> it wouldn't run too soon, and that's where the code oopsed.
> 
> Is our usage of mod_delayed_work in prune_cache_entries() wrong in some
> way? Or is there some breakage in the workqueue or timer code?
> 

Ok, a bit more analysis...

We did get a vmcore from this. Here is the "cache_cleaner" delayed work:

crash> p cache_cleaner
cache_cleaner = $12 = {
  work = {
    data = {
      counter = 0xfffffffe1
    }, 
    entry = {
      next = 0xffffffffa03623c8 <cache_cleaner+8>, 
      prev = 0xffffffffa03623c8 <cache_cleaner+8>
    }, 
    func = 0xffffffffa03333c0 <cache_cleaner_func>
  }, 
  timer = {
    entry = {
      next = 0x0, 
      pprev = 0xffff88085fd0eaf8
    }, 
    expires = 0x100021e99, 
    function = 0xffffffff810b66a0 <delayed_work_timer_fn>, 
    data = 0xffffffffa03623c0, 
    flags = 0x200014, 
    slack = 0xffffffff, 
    start_pid = 0x0, 
    start_site = 0x0, 
    start_comm = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
  }, 
  wq = 0xffff88085f48fc00, 
  cpu = 0x14
}

So the PENDING bit is set (lowest bit in data.counter), and timer->entry.pprev
pprev pointer is not NULL (so timer_pending is true). I also see that
there are several nfsd threads running the shrinker at the same time.

There is one potential problem that I see, but I'd appreciate someone
sanity checking me on this. Here is mod_delayed_work_on:

------------------[snip]------------------------
bool mod_delayed_work_on(int cpu, struct workqueue_struct *wq,
                         struct delayed_work *dwork, unsigned long delay)
{
        unsigned long flags;
        int ret;

        do {
                ret = try_to_grab_pending(&dwork->work, true, &flags);
        } while (unlikely(ret == -EAGAIN));

        if (likely(ret >= 0)) {
                __queue_delayed_work(cpu, wq, dwork, delay);
                local_irq_restore(flags);
        }

        /* -ENOENT from try_to_grab_pending() becomes %true */
        return ret;
}
------------------[snip]------------------------

...and here is the beginning of try_to_grab_pending:

------------------[snip]------------------------
        /* try to steal the timer if it exists */
        if (is_dwork) {
                struct delayed_work *dwork = to_delayed_work(work);

                /*
                 * dwork->timer is irqsafe.  If del_timer() fails, it's
                 * guaranteed that the timer is not queued anywhere and not
                 * running on the local CPU.
                 */
                if (likely(del_timer(&dwork->timer)))
                        return 1;
        }

        /* try to claim PENDING the normal way */
        if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work)))
                return 0;
------------------[snip]------------------------


...so if del_timer returns true, we'll return 1 from
try_to_grab_pending without actually setting the
WORK_STRUCT_PENDING_BIT, and then will end up calling
__queue_delayed_work.

That seems wrong to me -- shouldn't we be ensuring that that bit is set
when returning 1 from try_to_grab_pending to guard against concurrent
queue_delayed_work_on calls?

-- 
Jeff Layton <jlayton@poochiereds.net>

  reply	other threads:[~2015-10-29 17:58 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-10-29 14:31 timer code oops when calling mod_delayed_work Jeff Layton
2015-10-29 17:58 ` Jeff Layton [this message]
2015-10-31  2:00   ` Tejun Heo
2015-10-31 11:34     ` Jeff Layton
2015-10-31 21:31       ` Tejun Heo
2015-10-31 21:54         ` Jeff Layton
2015-11-02 19:48           ` Chris Worley
2015-11-02 19:56             ` Jeff Layton
2015-11-03  1:33               ` Jeff Layton
2015-11-03 17:55                 ` Jeff Layton
2015-11-03 22:54                   ` Tejun Heo
2015-11-04  0:06                     ` Tejun Heo
2015-11-04 11:48                       ` Jeff Layton
2015-11-04 17:15                         ` [PATCH] timer: add_timer_on() should perform proper migration Tejun Heo
2015-11-04 19:27                           ` [tip:timers/urgent] timers: Use proper base migration in add_timer_on() tip-bot for Tejun Heo
2015-11-04 19:35                           ` [PATCH] timer: add_timer_on() should perform proper migration Thomas Gleixner
2015-11-04 19:43                             ` Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20151029135836.02ad9000@synchrony.poochiereds.net \
    --to=jlayton@poochiereds.net \
    --cc=bfields@fieldses.org \
    --cc=chris.worley@primarydata.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=michael.skralivetsky@primarydata.com \
    --cc=tj@kernel.org \
    --cc=trond.myklebust@primarydata.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).