linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "George Spelvin" <linux@sciencehorizons.net>
To: linux-kernel@vger.kernel.org, tglx@linutronix.de
Cc: arjan@infradead.org, clm@fb.com, edumazet@google.com,
	fweisbec@gmail.com, lenb@kernel.org, linux@sciencehorizons.net,
	mingo@kernel.org, paulmck@linux.vnet.ibm.com,
	peterz@infradead.org, riel@redhat.com, rt@linutronix.de,
	torvalds@linux-foundation.org
Subject: Re: [patch V2 12/20] timer: Switch to a non cascading wheel
Date: 18 Jun 2016 05:55:39 -0400	[thread overview]
Message-ID: <20160618095539.20718.qmail@ns.sciencehorizons.net> (raw)
In-Reply-To: <20160617131003.810514984@linutronix.de>

I want to read this even more, but here's a dump of my comments so far...

> 1) Cascading is avoided (except for extreme long time timers)

> + * Note: This implementation might be suboptimal vs. timers enqueued in the
> + *	 cascade level because we do not look at the timers to figure out when
> + *	 they really expire. So for now, we just treat the cascading timers
> + *	 like any other timer. If each cascading bucket has a timer, we wake
> + *	 up with the granularity of the last level.

You've eliminated cascading entirely, so these comments are stale, no?

> +# define BASE_RND_DN(n)        ((n) & ~BASE_MASK)
> +# define BASE_RND_UP(n)        (BASE_RND_DN(n) + BASE_INCR)

Er... is this correct?  Usually I'd expect the result of rounding up
to occasionally be equal to the original (e.g. BASE_RND_UP(0) == 0), but
this doesn't have that property.

Given that you don't use BASE_RND_DN anywhere, maybe shrink this to one definition?


Looking at the __next_timer_interrupt function, it seems that it does
a lot more work than necessary.  Once a timeout has been found in the
current level, the range which must be searched in the following level
is limited to 1/LVL_CLK_DIV of the range in the current level.

That quickly tapers off to zero and the search can stop.

In particular, if a timeout is found at level 0 between the immediately
next bucket and the next bucket which is a multiple of LEVEL_SHIFT_DIV,
inclusive (1 <= x <= 8 buckets depending on the sbits of base->clk),
then the search can stop immediately.


This is hairy code and the following untested code is probably buggy,
but the basic idea is:

/*
 * Search span bits beginning at (offset + clk) for a set bit, wrapping
 * at the end of the level.  Return the position of the bit relative to
 * (offset + clk), or >= span if none.
 */
static unsigned next_pending_bucket(struct timer_base *base, unsigned offset,
	unsigned clk, unsigned span)
{
	unsigned pos;

	if (clk + span <= LVL_SIZE) {
		/* No wrap, simple search */
		clk += offset;
		return find_next_bit(base->pending_map, clk + span, clk);
		return pos - clk;
	} else {
		/* Search wraps */
		clk += offset;
		pos = find_next_bit(base->pending_map, offset + LVL_SIZE, clk);
		if (pos >= offset + LVL_SIZE)
			return pos - clk;
		clk -= LVL_SIZE;
		pos = find_next_bit(base->pending_map, clk + span, offset);
		return pos - clk;
	}
}

/* Find the next expiring timer list >= base->clk */
static unsigned long __next_timer_interrupt(struct timer_base *base)
{
	unsigned long clk, end, next;
	unsigned lvl, offset. bit;

	/* Phase 1: Find the starting level */
	bit = find_first_bit(base->pending_map, WHEEL_SIZE);
	if (unlkely(bit >= WHEEL_SIZE)) {
		/* No pending timers */
		next = base->clk + NEXT_TIMER_MAX_DELTA;
		goto done;
	}
	lvl = (unsigned)bit / LVL_SIZE;
	clk = (base->clk + LVL_GRAN(lvl) - 1) >> LVL_SHIFT(lvl);
	offset = (bit | LVL_MASK) + 1;	/* End of the current level */

	/* Phase 2: Find the next-expiring list in this level */
	if ((clk & LVL_MASK) > (bit & LVL_MASK)) {
		unsigned b = offset - LVL_SIZE + (clk & LVL_MASK);

		b = find_next_bit(base->pending_map, offset, b);
		if (b < offset)
			bit = b;
	}
	end = clk + ((bit - clk) & LVL_MASK);	/* The next expiration time */
	next = end << LVL_SHIFT(lvl);

	/*
	 * At this point, clk is the current time, in units of the current
	 * level's granularity, and rounded up.  end is the time of the
	 * earliest expiration found so far, in the same units and rounded
	 * down.  next is the unrounded expiration time in jiffies.
	 *
	 * Phase 3: Search higher levels for expirations in [clk, end).
	 */
	while (++lvl < LVL_DEPTH) {
		unsigned b;

		clk = (clk + LVL_CLK_MASK) >> LVL_CLK_SHIFT;
		end >>= LVL_CLK_SHIFT;
		if (clk >= end)
			break;
		b = next_pending_bucket(base, offset, clk & LVL_MASK, end-clk);
		if (b < end - clk) {
			end = clk + b;
			next = end << LVL_SHIFT(lvl);
		}
		offset += LVL_SIZE;
	}
done:
	spin_unlock(&base->lock);
	return next;
}

  reply	other threads:[~2016-06-18  9:55 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-17 13:26 [patch V2 00/20] timer: Refactor the timer wheel Thomas Gleixner
2016-06-17 13:26 ` [patch V2 01/20] timer: Make pinned a timer property Thomas Gleixner
2016-06-17 13:26 ` [patch V2 02/20] x86/apic/uv: Initialize timer as pinned Thomas Gleixner
2016-06-17 13:26 ` [patch V2 03/20] x86/mce: " Thomas Gleixner
2016-06-17 13:26 ` [patch V2 05/20] driver/net/ethernet/tile: " Thomas Gleixner
2016-06-21 18:14   ` Peter Zijlstra
2016-06-17 13:26 ` [patch V2 04/20] cpufreq/powernv: " Thomas Gleixner
2016-06-17 13:26 ` [patch V2 06/20] drivers/tty/metag_da: " Thomas Gleixner
2016-06-17 13:26 ` [patch V2 07/20] drivers/tty/mips_ejtag: " Thomas Gleixner
2016-06-17 13:26 ` [patch V2 08/20] net/ipv4/inet: Initialize timers " Thomas Gleixner
2016-06-17 13:26 ` [patch V2 09/20] timer: Remove mod_timer_pinned Thomas Gleixner
2016-06-17 13:26 ` [patch V2 10/20] hlist: Add hlist_is_singular_node() helper Thomas Gleixner
2016-06-17 13:26 ` [patch V2 11/20] timer: Give a few structs and members proper names Thomas Gleixner
2016-06-17 13:26 ` [patch V2 12/20] timer: Switch to a non cascading wheel Thomas Gleixner
2016-06-18  9:55   ` George Spelvin [this message]
2016-06-24 10:06     ` Thomas Gleixner
2016-06-17 13:26 ` [patch V2 13/20] timer: Remove slack leftovers Thomas Gleixner
2016-06-17 13:26 ` [patch V2 14/20] timer: Move __run_timers() function Thomas Gleixner
2016-06-17 13:26 ` [patch V2 15/20] timer: Optimize collect timers for NOHZ Thomas Gleixner
2016-06-17 13:26 ` [patch V2 16/20] tick/sched: Remove pointless empty function Thomas Gleixner
2016-06-17 13:26 ` [patch V2 17/20] timer: Forward wheel clock whenever possible Thomas Gleixner
2016-06-17 13:26 ` [patch V2 18/20] timer: Only wake softirq if necessary Thomas Gleixner
2016-06-17 13:26 ` [patch V2 19/20] timer: Split out index calculation Thomas Gleixner
2016-06-17 13:26 ` [patch V2 20/20] timer: Optimization for same expiry time in mod_timer() Thomas Gleixner
2016-06-17 13:48 ` [patch V2 00/20] timer: Refactor the timer wheel Eric Dumazet
2016-06-17 13:57   ` Thomas Gleixner
2016-06-17 14:25     ` Eric Dumazet
2016-06-20 13:56       ` Thomas Gleixner
2016-06-20 14:46         ` Arjan van de Ven
2016-06-20 14:46           ` Thomas Gleixner
2016-06-20 14:49             ` Arjan van de Ven
2016-06-20 19:03         ` Rik van Riel
2016-06-21  2:48           ` Eric Dumazet
2016-06-17 14:26 ` Arjan van de Ven
2016-06-20 15:05 ` Paul E. McKenney
2016-06-20 15:13   ` Thomas Gleixner
2016-06-20 15:41     ` Paul E. McKenney
2016-06-22  7:37 ` Mike Galbraith
2016-06-22  8:44   ` Thomas Gleixner
2016-06-22  9:06     ` Mike Galbraith
2016-06-22 13:37       ` Mike Galbraith
2016-06-22 10:28     ` [LTP] " Cyril Hrubis
2016-06-23  8:27       ` Thomas Gleixner
2016-06-23 11:47         ` Cyril Hrubis
2016-06-23 13:58           ` George Spelvin
2016-06-23 14:10             ` Thomas Gleixner
2016-06-23 15:11             ` Cyril Hrubis
2016-06-23 15:21               ` Thomas Gleixner
2016-06-23 16:31                 ` Cyril Hrubis
2016-06-26 19:00     ` Pavel Machek
2016-06-26 19:21       ` Arjan van de Ven
2016-06-26 20:02         ` Pavel Machek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160618095539.20718.qmail@ns.sciencehorizons.net \
    --to=linux@sciencehorizons.net \
    --cc=arjan@infradead.org \
    --cc=clm@fb.com \
    --cc=edumazet@google.com \
    --cc=fweisbec@gmail.com \
    --cc=lenb@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    --cc=riel@redhat.com \
    --cc=rt@linutronix.de \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).