linux-kernel.vger.kernel.org archive mirror
From: Tejun Heo <tj@kernel.org>
To: Jeff Layton <jlayton@poochiereds.net>
Cc: linux-kernel@vger.kernel.org, bfields@fieldses.org,
	Michael Skralivetsky <michael.skralivetsky@primarydata.com>,
	Chris Worley <chris.worley@primarydata.com>,
	Trond Myklebust <trond.myklebust@primarydata.com>,
	Lai Jiangshan <laijs@cn.fujitsu.com>
Subject: Re: timer code oops when calling mod_delayed_work
Date: Sat, 31 Oct 2015 11:00:12 +0900
Message-ID: <20151031020012.GH3582@mtj.duckdns.org>
In-Reply-To: <20151029135836.02ad9000@synchrony.poochiereds.net>

(cc'ing Lai)

Hello, Jeff.

On Thu, Oct 29, 2015 at 01:58:36PM -0400, Jeff Layton wrote:
> crash> p cache_cleaner
> cache_cleaner = $12 = {
>   work = {
>     data = {
>       counter = 0xfffffffe1

If I'm not mistaken, PENDING, flush color 14, OFFQ and POOL_NONE.
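
For reference, that reading can be reproduced with a small userspace
sketch. The bit positions below are assumptions recalled from
include/linux/workqueue.h of that era (64-bit, without
CONFIG_DEBUG_OBJECTS_WORK) and are not taken from this thread; the
wd_* names are hypothetical. Under this assumed layout the color field
shares bits with the off-queue flag and pool ID, so 14 is simply what
bits 4..7 read as.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed work->data layout; verify against the kernel tree in use. */
#define WD_PENDING     (1ULL << 0)         /* WORK_STRUCT_PENDING */
#define WD_PWQ         (1ULL << 2)         /* set when data points to a pwq */
#define WD_COLOR_SHIFT 4                   /* flush color: bits 4..7 */
#define WD_COLOR_MASK  0xfULL
#define WD_POOL_SHIFT  5                   /* off-queue pool ID starts here */
#define WD_POOL_NONE   ((1ULL << 31) - 1)  /* all 31 pool ID bits set */

struct wd_fields {
	bool pending;     /* PENDING bit set */
	bool off_queue;   /* PWQ bit clear, so data is not a pwq pointer */
	unsigned color;   /* raw value of the color bits */
	bool pool_none;   /* pool ID field equals POOL_NONE */
};

/* Split a raw data.counter value into the fields named above. */
struct wd_fields wd_decode(uint64_t data)
{
	struct wd_fields f;

	f.pending = (data & WD_PENDING) != 0;
	f.off_queue = (data & WD_PWQ) == 0;
	f.color = (unsigned)((data >> WD_COLOR_SHIFT) & WD_COLOR_MASK);
	f.pool_none = ((data >> WD_POOL_SHIFT) & WD_POOL_NONE) == WD_POOL_NONE;
	return f;
}

/* Usage: decode the value from the crash dump quoted above. */
void wd_decode_dump_example(void)
{
	struct wd_fields f = wd_decode(0xfffffffe1ULL);

	assert(f.pending);
	assert(f.off_queue);
	assert(f.color == 14);
	assert(f.pool_none);
}
```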

>     }, 
>     entry = {
>       next = 0xffffffffa03623c8 <cache_cleaner+8>, 
>       prev = 0xffffffffa03623c8 <cache_cleaner+8>

Empty entry.

>     }, 
>     func = 0xffffffffa03333c0 <cache_cleaner_func>
>   }, 
>   timer = {
>     entry = {
>       next = 0x0, 
>       pprev = 0xffff88085fd0eaf8
>     }, 
>     expires = 0x100021e99, 
>     function = 0xffffffff810b66a0 <delayed_work_timer_fn>, 
>     data = 0xffffffffa03623c0, 
>     flags = 0x200014, 
>     slack = 0xffffffff, 
>     start_pid = 0x0, 
>     start_site = 0x0, 
>     start_comm = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
>   }, 
>   wq = 0xffff88085f48fc00, 
>   cpu = 0x14
> }
> 
> So the PENDING bit is set (lowest bit in data.counter), and the
> timer->entry.pprev pointer is not NULL (so timer_pending is true). I also see that
> there are several nfsd threads running the shrinker at the same time.
> 
> There is one potential problem that I see, but I'd appreciate someone
> sanity checking me on this. Here is mod_delayed_work_on:
...
> ...and here is the beginning of try_to_grab_pending:
> 
> ------------------[snip]------------------------
>         /* try to steal the timer if it exists */
>         if (is_dwork) {
>                 struct delayed_work *dwork = to_delayed_work(work);
> 
>                 /*
>                  * dwork->timer is irqsafe.  If del_timer() fails, it's
>                  * guaranteed that the timer is not queued anywhere and not
>                  * running on the local CPU.
>                  */
>                 if (likely(del_timer(&dwork->timer)))
>                         return 1;
>         }
> 
>         /* try to claim PENDING the normal way */
>         if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work)))
>                 return 0;
> ------------------[snip]------------------------
> 
> 
> ...so if del_timer returns true, we'll return 1 from
> try_to_grab_pending without actually setting the
> WORK_STRUCT_PENDING_BIT, and then will end up calling
> __queue_delayed_work.
> 
> That seems wrong to me -- shouldn't we be ensuring that that bit is set
> when returning 1 from try_to_grab_pending to guard against concurrent
> queue_delayed_work_on calls?

But if try_to_grab_pending() succeeded at stealing dwork->timer, it's
known that the PENDING bit must already be set.  IOW, the bit is
stolen together with the timer.
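
The invariant can be illustrated with a single-threaded toy model. This
is a sketch, not kernel code: the toy_* names are hypothetical, the
bit operations are not atomic, and the -1 return stands in for the
kernel's error returns when another party owns PENDING. The only point
it makes is that the timer is armed solely after PENDING has been
claimed, so a successful del_timer() implies PENDING is already set.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PENDING 0x1UL

struct toy_dwork {
	unsigned long data;   /* only the PENDING bit is modeled */
	bool timer_armed;     /* stands in for timer_pending() */
};

/* Non-atomic stand-in for test_and_set_bit(); returns the old bit. */
static bool toy_test_and_set_pending(struct toy_dwork *d)
{
	bool was_set = (d->data & TOY_PENDING) != 0;

	d->data |= TOY_PENDING;
	return was_set;
}

/* Stand-in for del_timer(); true if an armed timer was removed. */
static bool toy_del_timer(struct toy_dwork *d)
{
	bool was_armed = d->timer_armed;

	d->timer_armed = false;
	return was_armed;
}

/* Queueing claims PENDING first and arms the timer only on success. */
bool toy_queue_delayed_work(struct toy_dwork *d)
{
	if (toy_test_and_set_pending(d))
		return false;   /* already pending, nothing to do */
	d->timer_armed = true;
	return true;
}

/*
 * Returns 1 if PENDING was stolen along with the timer, 0 if PENDING
 * was claimed from an idle item, -1 if someone else owns PENDING.
 */
int toy_try_to_grab_pending(struct toy_dwork *d)
{
	if (toy_del_timer(d)) {
		/* The timer was armed, so its owner had set PENDING. */
		assert(d->data & TOY_PENDING);
		return 1;
	}
	if (!toy_test_and_set_pending(d))
		return 0;
	return -1;
}
```

In this model the return-1 path never needs to set the bit itself; a
timer armed without PENDING would trip the assertion, which is the kind
of leak being looked for here.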

Heh, this one is tricky.  Yeah, try_to_grab_pending() missing PENDING
would explain the failure but I can't see how it'd leak at the moment.

Thanks.

-- 
tejun
