From: Tejun Heo <tj@kernel.org>
To: Jeff Layton <jlayton@poochiereds.net>
Cc: linux-kernel@vger.kernel.org, bfields@fieldses.org,
Michael Skralivetsky <michael.skralivetsky@primarydata.com>,
Chris Worley <chris.worley@primarydata.com>,
Trond Myklebust <trond.myklebust@primarydata.com>,
Lai Jiangshan <laijs@cn.fujitsu.com>
Subject: Re: timer code oops when calling mod_delayed_work
Date: Sat, 31 Oct 2015 11:00:12 +0900
Message-ID: <20151031020012.GH3582@mtj.duckdns.org>
In-Reply-To: <20151029135836.02ad9000@synchrony.poochiereds.net>
(cc'ing Lai)
Hello, Jeff.
On Thu, Oct 29, 2015 at 01:58:36PM -0400, Jeff Layton wrote:
> crash> p cache_cleaner
> cache_cleaner = $12 = {
> work = {
> data = {
> counter = 0xfffffffe1
If I'm not mistaken, that decodes to PENDING, flush color 14, OFFQ and POOL_NONE.
> },
> entry = {
> next = 0xffffffffa03623c8 <cache_cleaner+8>,
> prev = 0xffffffffa03623c8 <cache_cleaner+8>
Empty entry: next and prev both point back at the entry itself.
> },
> func = 0xffffffffa03333c0 <cache_cleaner_func>
> },
> timer = {
> entry = {
> next = 0x0,
> pprev = 0xffff88085fd0eaf8
> },
> expires = 0x100021e99,
> function = 0xffffffff810b66a0 <delayed_work_timer_fn>,
> data = 0xffffffffa03623c0,
> flags = 0x200014,
> slack = 0xffffffff,
> start_pid = 0x0,
> start_site = 0x0,
> start_comm = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> },
> wq = 0xffff88085f48fc00,
> cpu = 0x14
> }
>
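The reading of data.counter = 0xfffffffe1 given above can be checked with a small userspace sketch. It assumes the work_struct data layout from include/linux/workqueue.h of that era on a 64-bit build without CONFIG_DEBUG_OBJECTS_WORK (color shift 4, off-queue pool shift 5, 31 pool ID bits); the struct and function names are hypothetical and not kernel API. Note that the flush color field and the off-queue fields share the same bits, so with the PWQ bit clear the value 14 in bits 4..7 is simply the low end of the POOL_NONE pattern.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: 64-bit kernel, CONFIG_DEBUG_OBJECTS_WORK disabled. */
enum {
	PENDING_BIT     = 0,	/* WORK_STRUCT_PENDING_BIT */
	PWQ_BIT         = 2,	/* WORK_STRUCT_PWQ_BIT: data holds a pwq pointer */
	COLOR_SHIFT     = 4,	/* WORK_STRUCT_COLOR_SHIFT */
	COLOR_BITS      = 4,	/* WORK_STRUCT_COLOR_BITS */
	OFFQ_POOL_SHIFT = 5,	/* WORK_OFFQ_POOL_SHIFT */
	OFFQ_POOL_BITS  = 31	/* WORK_OFFQ_POOL_BITS */
};

#define OFFQ_POOL_NONE ((UINT64_C(1) << OFFQ_POOL_BITS) - 1)

/* Hypothetical helper type, for illustration only. */
struct decoded_work_data {
	bool pending;		/* PENDING bit set */
	bool on_queue;		/* PWQ bit set, upper bits are a pwq pointer */
	unsigned color_bits;	/* bits 4..7 read as a flush color */
	uint64_t pool_id;	/* bits 5..35 read as an off-queue pool ID */
	bool pool_none;		/* pool ID equals WORK_OFFQ_POOL_NONE */
};

static struct decoded_work_data decode_work_data(uint64_t data)
{
	struct decoded_work_data d;

	d.pending = (data >> PENDING_BIT) & 1;
	d.on_queue = (data >> PWQ_BIT) & 1;
	d.color_bits = (unsigned)((data >> COLOR_SHIFT) &
				  ((1u << COLOR_BITS) - 1));
	d.pool_id = (data >> OFFQ_POOL_SHIFT) & OFFQ_POOL_NONE;
	d.pool_none = (d.pool_id == OFFQ_POOL_NONE);
	return d;
}
```

Calling decode_work_data(0xfffffffe1) under these assumptions yields pending set, the PWQ bit clear (off queue), 14 in the color bits and a pool ID equal to POOL_NONE.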
> So the PENDING bit is set (lowest bit in data.counter), and the timer->entry.pprev
> pointer is not NULL (so timer_pending is true). I also see that
> there are several nfsd threads running the shrinker at the same time.
>
> There is one potential problem that I see, but I'd appreciate someone
> sanity checking me on this. Here is mod_delayed_work_on:
...
> ...and here is the beginning of try_to_grab_pending:
>
> ------------------[snip]------------------------
> /* try to steal the timer if it exists */
> if (is_dwork) {
> struct delayed_work *dwork = to_delayed_work(work);
>
> /*
> * dwork->timer is irqsafe. If del_timer() fails, it's
> * guaranteed that the timer is not queued anywhere and not
> * running on the local CPU.
> */
> if (likely(del_timer(&dwork->timer)))
> return 1;
> }
>
> /* try to claim PENDING the normal way */
> if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work)))
> return 0;
> ------------------[snip]------------------------
>
>
> ...so if del_timer returns true, we'll return 1 from
> try_to_grab_pending without actually setting the
> WORK_STRUCT_PENDING_BIT, and then will end up calling
> __queue_delayed_work.
>
> That seems wrong to me -- shouldn't we be ensuring that that bit is set
> when returning 1 from try_to_grab_pending to guard against concurrent
> queue_delayed_work_on calls?
But if try_to_grab_pending() succeeded at stealing dwork->timer, it's
known that the PENDING bit must already be set. IOW, the bit is
stolen together with the timer.
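The invariant behind this can be sketched as a toy, single-threaded userspace model. It is not the kernel implementation: the struct, the function names and the -1 return value are hypothetical, and a plain bool stands in for the timer. The assumption it encodes is that the queueing side claims PENDING before arming the timer, so an armed timer implies PENDING is already held.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for struct delayed_work (hypothetical, illustration only). */
struct model_dwork {
	atomic_bool pending;	/* models WORK_STRUCT_PENDING_BIT */
	bool timer_armed;	/* models timer_pending(&dwork->timer) */
};

static void model_init(struct model_dwork *d)
{
	atomic_init(&d->pending, false);
	d->timer_armed = false;
}

/* Queueing side: claim PENDING first, arm the timer only if we won it. */
static bool model_queue_delayed_work(struct model_dwork *d)
{
	if (atomic_exchange(&d->pending, true))
		return false;	/* someone else already owns PENDING */
	d->timer_armed = true;
	return true;
}

/*
 * Grabbing side, reduced to the two steps quoted above.
 * Returns 1 if the timer was stolen (PENDING comes along with it),
 * 0 if PENDING was claimed on an idle item, -1 if it is held elsewhere.
 */
static int model_try_to_grab_pending(struct model_dwork *d)
{
	if (d->timer_armed) {		/* models del_timer() succeeding */
		d->timer_armed = false;
		return 1;		/* PENDING was set by the queueing side */
	}
	if (!atomic_exchange(&d->pending, true))
		return 0;
	return -1;
}
```

In this model, a grab that returns 1 never needs to touch the pending flag: it was set when the timer was armed, and ownership passes to the caller that removed the timer. A concurrent queue attempt in that window still sees PENDING set and backs off.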
Heh, this one is tricky. Yeah, try_to_grab_pending() missing PENDING
would explain the failure but I can't see how it'd leak at the moment.
Thanks.
--
tejun