All of lore.kernel.org
 help / color / mirror / Atom feed
From: "George Spelvin" <linux@sciencehorizons.net>
To: tglx@linutronix.de
Cc: edumazet@google.com, linux@sciencehorizons.net,
	linux-kernel@vger.kernel.org, peterz@infradead.org,
	richardcochran@gmail.com
Subject: Re: [patch 13/20] timer: Switch to a non cascading wheel
Date: 14 Jun 2016 04:16:02 -0400	[thread overview]
Message-ID: <20160614081602.10755.qmail@ns.sciencehorizons.net> (raw)

Nice cleanup!


I think I see a buglet in your level-5 cascading.

Suppose a timer is requested far in the future for a time
that is an exact multiple of 32768 jiffies.

collect_expired_timers() scans level 5 after all the previous ones,
and will cascade it to level 0, in a level-0 bucket which has already
been scanned, and won't be scanned again for 64 jiffies.

I agree that 64 jiffies is well within your allowed rounding accuracy,
and order of timer firing is not guaranteed when they're for the same
time, but it is a bit odd when a timer fires 32 jiffies *before* another
timer scheduled for 32 jiffies later.  That's the sort of peculiarity
that could lead to a subtle bug.


While I like the cleanup of just limiting long-term resolution, if
it turns out to be necessary, it's not too hard to add exact timers
back in if a need is found in future.  All you need is a second
__internal_add_timer function that rounds down rather than up, and to
teach expire_timers() to cascade in the unlikely situation that a timer
does have an expiry time in the future.

(It also gets rid of the special case for level 5.)


Other, mostly minor, code comments:

> +/* Level offsets in the wheel */
> +#define LVL0_OFFS	(0)
> +#define LVL1_OFFS	(LVL_SIZE)
> +#define LVL2_OFFS	(LVL1_OFFS + LVL_SIZE)
> +#define LVL3_OFFS	(LVL2_OFFS + LVL_SIZE)
> +#define LVL4_OFFS	(LVL3_OFFS + LVL_SIZE)
> +#define LVL5_OFFS	(LVL4_OFFS + LVL_SIZE)
> +
> +/* Clock divisor for the next level */
> +#define LVL_CLK_SHIFT	3
> +#define LVL_CLK_DIV	(1 << LVL_CLK_SHIFT)
> +#define LVL_CLK_MASK	(LVL_CLK_DIV - 1)
> +
> +/* The shift constants for selecting the bucket at the levels */
> +#define LVL1_SHIFT	(1 * LVL_CLK_SHIFT)
> +#define LVL2_SHIFT	(2 * LVL_CLK_SHIFT)
> +#define LVL3_SHIFT	(3 * LVL_CLK_SHIFT)
> +#define LVL4_SHIFT	(4 * LVL_CLK_SHIFT)
> +#define LVL5_SHIFT	(5 * LVL_CLK_SHIFT)
> +
> +/* The granularity of each level */
> +#define LVL0_GRAN	0x00000001
> +#define LVL1_GRAN	(LVL0_GRAN << LVL_CLK_SHIFT)
> +#define LVL2_GRAN	(LVL1_GRAN << LVL_CLK_SHIFT)
> +#define LVL3_GRAN	(LVL2_GRAN << LVL_CLK_SHIFT)
> +#define LVL4_GRAN	(LVL3_GRAN << LVL_CLK_SHIFT)
> +#define LVL5_GRAN	(LVL4_GRAN << LVL_CLK_SHIFT)

Wouldn't this all be so much simpler as

#define LVL_BITS	6	/* Renamed previous LVL_SHIFT */
#define LVL_SIZE	(1 << LVL_BITS)
#define LVL_MASK	(LVL_BITS - 1)
#define LVL_OFFS(n)	((n) * LVL_SIZE)
#define LVL_SHIFT(n)	((n) * LVL_CLK_SHIFT)
#define LVL_GRAN(n)	(1 << LVL_SHIFT(n))

Then you could do
+static inline unsigned calc_index(unsigned expires, unsigned level),
+{
+	/* Round up to next bin bin */
+	expires = ((expires - 1) >> LVL_SHIFT(level)) + 1;
+	return LVL_OFFS(level) + (expires & LVL_MASK);
+}


> +#define LVL1_TSTART	(LVL_SIZE - 1)

Er... isn't that LVL_SIZE, as documented in the table above?
Then it could be
#define LVL_TSTART(n) (LVL_SIZE << LVL_SHIFT(n))

Ideally, you'd like all of that

+	if (delta < LVL1_TSTART) {
+		idx = (expires + LVL0_GRAN) & LVL_MASK;
+	} else if (delta < LVL2_TSTART) {
+		idx = calc_index(expires, LVL1_GRAN, LVL1_SHIFT, LVL1_OFFS);
+	} else if (delta < LVL3_TSTART) {
+		idx = calc_index(expires, LVL2_GRAN, LVL2_SHIFT, LVL2_OFFS);
+	} else if (delta < LVL4_TSTART) {
+		idx = calc_index(expires, LVL3_GRAN, LVL3_SHIFT, LVL3_OFFS);
+	} else if (delta < LVL5_TSTART) {
+		idx = calc_index(expires, LVL4_GRAN, LVL4_SHIFT, LVL4_OFFS);

to be replaced with __builtin_clz or similar:

	level = __fls(delta | LVL_MASK);
	if (level <  LVL_BITS + LVL_SHIFT(LVL_DEPTH-1)) {	/* or LVL_DEPTH-2, no difference */
		level = (level + LVL_CLK_SHIFT - LVL_BITS) / LVL_CLK_SHIFT;
	} else if ((long)delta < 0) {
		expires = base->clk;
		level = 0;
	} else {
		level = LVL_DEPTH - 1;
	}
	index = calc_index(expires, level);


> +static inline void detach_expired_timer(struct timer_list *timer)
>  {
>  	detach_timer(timer, true);
> -	if (!(timer->flags & TIMER_DEFERRABLE))
> -		base->active_timers--;
> -	base->all_timers--;
>  }

Is there even a reason to have this wrapper any more?  Why not
just replace all calls to it in the source?


> +		timer = hlist_entry(head->first, struct timer_list, entry);
> +		fn = timer->function;
> +		data = timer->data;
> +
> +		timer_stats_account_timer(timer);
> +
> +		base->running_timer = timer;
> +		detach_expired_timer(timer);

Is there some non-obvious reason that you have to fetch fn and data
so early?  It seems like a register pressure pessimization, if the
compiler can't figure out that timer_stats code can't change them.

The cache line containing this timer was already prefetched when you
updated its entry.pprev as part of removing the previous entry from
the list.

I see why you want to fetch them with the lock held in case there's some
freaky race, but I'd do it all after detach_timer().

             reply	other threads:[~2016-06-14  8:16 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-14  8:16 George Spelvin [this message]
2016-06-14  8:50 ` [patch 13/20] timer: Switch to a non cascading wheel Thomas Gleixner
2016-06-14 10:15   ` George Spelvin
2016-06-14 10:20 ` Peter Zijlstra
2016-06-14 12:58   ` George Spelvin
2016-06-14 16:48     ` Thomas Gleixner
2016-06-14 19:56       ` George Spelvin
  -- strict thread matches above, loose matches on Subject: below --
2016-06-13  8:40 [patch 00/20] timer: Refactor the timer wheel Thomas Gleixner
2016-06-13  8:41 ` [patch 13/20] timer: Switch to a non cascading wheel Thomas Gleixner
2016-06-13 11:40   ` Peter Zijlstra
2016-06-13 12:30     ` Thomas Gleixner
2016-06-13 12:46       ` Eric Dumazet
2016-06-13 14:30         ` Thomas Gleixner
2016-06-14 10:11       ` Ingo Molnar
2016-06-14 16:28         ` Thomas Gleixner
2016-06-14 17:14           ` Arjan van de Ven
2016-06-14 18:05             ` Thomas Gleixner
2016-06-14 20:34               ` Peter Zijlstra
2016-06-14 20:42               ` Peter Zijlstra
2016-06-14 21:17                 ` Eric Dumazet
2016-06-15 14:53                   ` Thomas Gleixner
2016-06-15 14:55                     ` Arjan van de Ven
2016-06-15 16:43                       ` Thomas Gleixner
2016-06-16 15:43                         ` Thomas Gleixner
2016-06-16 16:02                           ` Paul E. McKenney
2016-06-16 18:14                             ` Peter Zijlstra
2016-06-17  0:40                               ` Paul E. McKenney
2016-06-17  4:04                                 ` Paul E. McKenney
2016-06-16 16:04                           ` Arjan van de Ven
2016-06-16 16:09                             ` Thomas Gleixner
2016-06-15 15:05                     ` Eric Dumazet
2016-06-13 14:36   ` Richard Cochran
2016-06-13 14:39     ` Thomas Gleixner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160614081602.10755.qmail@ns.sciencehorizons.net \
    --to=linux@sciencehorizons.net \
    --cc=edumazet@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=richardcochran@gmail.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.