From: "George Spelvin" <linux@sciencehorizons.net>
To: linux-kernel@vger.kernel.org, tglx@linutronix.de
Cc: arjan@infradead.org, clm@fb.com, edumazet@google.com,
fweisbec@gmail.com, lenb@kernel.org, linux@sciencehorizons.net,
mingo@kernel.org, paulmck@linux.vnet.ibm.com,
peterz@infradead.org, riel@redhat.com, rt@linutronix.de,
torvalds@linux-foundation.org
Subject: Re: [patch V2 12/20] timer: Switch to a non cascading wheel
Date: 18 Jun 2016 05:55:39 -0400 [thread overview]
Message-ID: <20160618095539.20718.qmail@ns.sciencehorizons.net> (raw)
In-Reply-To: <20160617131003.810514984@linutronix.de>
I want to read this even more, but here's a dump of my comments so far...
> 1) Cascading is avoided (except for extreme long time timers)
> + * Note: This implementation might be suboptimal vs. timers enqueued in the
> + * cascade level because we do not look at the timers to figure out when
> + * they really expire. So for now, we just treat the cascading timers
> + * like any other timer. If each cascading bucket has a timer, we wake
> + * up with the granularity of the last level.
You've eliminated cascading entirely, so these comments are stale, no?
> +# define BASE_RND_DN(n) ((n) & ~BASE_MASK)
> +# define BASE_RND_UP(n) (BASE_RND_DN(n) + BASE_INCR)
Er... is this correct? Usually I'd expect the result of rounding up
to occasionally be equal to the original (e.g. BASE_RND_UP(0) == 0), but
this doesn't have that property.
Given that you don't use BASE_RND_DN anywhere, maybe shrink this to one definition?
Looking at the __next_timer_interrupt function, it seems that it does
a lot more work than necessary. Once a timeout has been found in the
current level, the range which must be searched in the following level
is limited to 1/LVL_CLK_DIV of the range in the current level.
That quickly tapers off to zero and the search can stop.
In particular, if a timeout is found at level 0 between the immediately
next bucket and the next bucket which is a multiple of LEVEL_SHIFT_DIV,
inclusive (1 <= x <= 8 buckets depending on the sbits of base->clk),
then the search can stop immediately.
This is hairy code and the following untested code is probably buggy,
but the basic idea is:
/*
* Search span bits beginning at (offset + clk) for a set bit, wrapping
* at the end of the level. Return the position of the bit relative to
* (offset + clk), or >= span if none.
*/
static unsigned next_pending_bucket(struct timer_base *base, unsigned offset,
unsigned clk, unsigned span)
{
unsigned pos;
if (clk + span <= LVL_SIZE) {
/* No wrap, simple search */
clk += offset;
return find_next_bit(base->pending_map, clk + span, clk);
return pos - clk;
} else {
/* Search wraps */
clk += offset;
pos = find_next_bit(base->pending_map, offset + LVL_SIZE, clk);
if (pos >= offset + LVL_SIZE)
return pos - clk;
clk -= LVL_SIZE;
pos = find_next_bit(base->pending_map, clk + span, offset);
return pos - clk;
}
}
/* Find the next expiring timer list >= base->clk */
static unsigned long __next_timer_interrupt(struct timer_base *base)
{
unsigned long clk, end, next;
unsigned lvl, offset. bit;
/* Phase 1: Find the starting level */
bit = find_first_bit(base->pending_map, WHEEL_SIZE);
if (unlkely(bit >= WHEEL_SIZE)) {
/* No pending timers */
next = base->clk + NEXT_TIMER_MAX_DELTA;
goto done;
}
lvl = (unsigned)bit / LVL_SIZE;
clk = (base->clk + LVL_GRAN(lvl) - 1) >> LVL_SHIFT(lvl);
offset = (bit | LVL_MASK) + 1; /* End of the current level */
/* Phase 2: Find the next-expiring list in this level */
if ((clk & LVL_MASK) > (bit & LVL_MASK)) {
unsigned b = offset - LVL_SIZE + (clk & LVL_MASK);
b = find_next_bit(base->pending_map, offset, b);
if (b < offset)
bit = b;
}
end = clk + ((bit - clk) & LVL_MASK); /* The next expiration time */
next = end << LVL_SHIFT(lvl);
/*
* At this point, clk is the current time, in units of the current
* level's granularity, and rounded up. end is the time of the
* earliest expiration found so far, in the same units and rounded
* down. next is the unrounded expiration time in jiffies.
*
* Phase 3: Search higher levels for expirations in [clk, end).
*/
while (++lvl < LVL_DEPTH) {
unsigned b;
clk = (clk + LVL_CLK_MASK) >> LVL_CLK_SHIFT;
end >>= LVL_CLK_SHIFT;
if (clk >= end)
break;
b = next_pending_bucket(base, offset, clk & LVL_MASK, end-clk);
if (b < end - clk) {
end = clk + b;
next = end << LVL_SHIFT(lvl);
}
offset += LVL_SIZE;
}
done:
spin_unlock(&base->lock);
return next;
}
next prev parent reply other threads:[~2016-06-18 9:55 UTC|newest]
Thread overview: 52+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-06-17 13:26 [patch V2 00/20] timer: Refactor the timer wheel Thomas Gleixner
2016-06-17 13:26 ` [patch V2 01/20] timer: Make pinned a timer property Thomas Gleixner
2016-06-17 13:26 ` [patch V2 02/20] x86/apic/uv: Initialize timer as pinned Thomas Gleixner
2016-06-17 13:26 ` [patch V2 03/20] x86/mce: " Thomas Gleixner
2016-06-17 13:26 ` [patch V2 05/20] driver/net/ethernet/tile: " Thomas Gleixner
2016-06-21 18:14 ` Peter Zijlstra
2016-06-17 13:26 ` [patch V2 04/20] cpufreq/powernv: " Thomas Gleixner
2016-06-17 13:26 ` [patch V2 06/20] drivers/tty/metag_da: " Thomas Gleixner
2016-06-17 13:26 ` [patch V2 07/20] drivers/tty/mips_ejtag: " Thomas Gleixner
2016-06-17 13:26 ` [patch V2 08/20] net/ipv4/inet: Initialize timers " Thomas Gleixner
2016-06-17 13:26 ` [patch V2 09/20] timer: Remove mod_timer_pinned Thomas Gleixner
2016-06-17 13:26 ` [patch V2 10/20] hlist: Add hlist_is_singular_node() helper Thomas Gleixner
2016-06-17 13:26 ` [patch V2 11/20] timer: Give a few structs and members proper names Thomas Gleixner
2016-06-17 13:26 ` [patch V2 12/20] timer: Switch to a non cascading wheel Thomas Gleixner
2016-06-18 9:55 ` George Spelvin [this message]
2016-06-24 10:06 ` Thomas Gleixner
2016-06-17 13:26 ` [patch V2 13/20] timer: Remove slack leftovers Thomas Gleixner
2016-06-17 13:26 ` [patch V2 14/20] timer: Move __run_timers() function Thomas Gleixner
2016-06-17 13:26 ` [patch V2 15/20] timer: Optimize collect timers for NOHZ Thomas Gleixner
2016-06-17 13:26 ` [patch V2 16/20] tick/sched: Remove pointless empty function Thomas Gleixner
2016-06-17 13:26 ` [patch V2 17/20] timer: Forward wheel clock whenever possible Thomas Gleixner
2016-06-17 13:26 ` [patch V2 18/20] timer: Only wake softirq if necessary Thomas Gleixner
2016-06-17 13:26 ` [patch V2 19/20] timer: Split out index calculation Thomas Gleixner
2016-06-17 13:26 ` [patch V2 20/20] timer: Optimization for same expiry time in mod_timer() Thomas Gleixner
2016-06-17 13:48 ` [patch V2 00/20] timer: Refactor the timer wheel Eric Dumazet
2016-06-17 13:57 ` Thomas Gleixner
2016-06-17 14:25 ` Eric Dumazet
2016-06-20 13:56 ` Thomas Gleixner
2016-06-20 14:46 ` Arjan van de Ven
2016-06-20 14:46 ` Thomas Gleixner
2016-06-20 14:49 ` Arjan van de Ven
2016-06-20 19:03 ` Rik van Riel
2016-06-21 2:48 ` Eric Dumazet
2016-06-17 14:26 ` Arjan van de Ven
2016-06-20 15:05 ` Paul E. McKenney
2016-06-20 15:13 ` Thomas Gleixner
2016-06-20 15:41 ` Paul E. McKenney
2016-06-22 7:37 ` Mike Galbraith
2016-06-22 8:44 ` Thomas Gleixner
2016-06-22 9:06 ` Mike Galbraith
2016-06-22 13:37 ` Mike Galbraith
2016-06-22 10:28 ` [LTP] " Cyril Hrubis
2016-06-23 8:27 ` Thomas Gleixner
2016-06-23 11:47 ` Cyril Hrubis
2016-06-23 13:58 ` George Spelvin
2016-06-23 14:10 ` Thomas Gleixner
2016-06-23 15:11 ` Cyril Hrubis
2016-06-23 15:21 ` Thomas Gleixner
2016-06-23 16:31 ` Cyril Hrubis
2016-06-26 19:00 ` Pavel Machek
2016-06-26 19:21 ` Arjan van de Ven
2016-06-26 20:02 ` Pavel Machek
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160618095539.20718.qmail@ns.sciencehorizons.net \
--to=linux@sciencehorizons.net \
--cc=arjan@infradead.org \
--cc=clm@fb.com \
--cc=edumazet@google.com \
--cc=fweisbec@gmail.com \
--cc=lenb@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=paulmck@linux.vnet.ibm.com \
--cc=peterz@infradead.org \
--cc=riel@redhat.com \
--cc=rt@linutronix.de \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).