linux-kernel.vger.kernel.org archive mirror
From: "Rafael J. Wysocki" <rjw@rjwysocki.net>
To: Linux PM <linux-pm@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Paul McKenney <paulmck@linux.vnet.ibm.com>,
	Thomas Ilsche <thomas.ilsche@tu-dresden.de>,
	Doug Smythies <dsmythies@telus.net>,
	Rik van Riel <riel@surriel.com>,
	Aubrey Li <aubrey.li@linux.intel.com>,
	Mike Galbraith <mgalbraith@suse.de>,
	LKML <linux-kernel@vger.kernel.org>,
	Len Brown <len.brown@intel.com>
Subject: [PATCH v8 08/10] sched: idle: Select idle state before stopping the tick
Date: Thu, 29 Mar 2018 14:16:24 +0200
Message-ID: <1717553.uJRtP9dZYi@aspire.rjw.lan>
In-Reply-To: <40092860.XNQZrLjKDd@aspire.rjw.lan>

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

In order to address the issue with short idle duration predictions
by the idle governor after the scheduler tick has been stopped,
reorder the code in cpuidle_idle_call() so that the governor idle
state selection runs before tick_nohz_idle_stop_tick() and use the
"nohz" hint returned by cpuidle_select() to decide whether or not
to stop the tick.
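
For illustration only, the reordered flow can be modeled in user space
roughly as below.  This is a sketch, not kernel code: struct idle_sim,
sim_select() and sim_idle_call() are hypothetical names, and the policy
in sim_select() (keep the tick when the predicted idle duration is
shorter than one tick period) is an assumption made for the example,
not the behavior of any particular governor.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the state the idle loop touches. */
struct idle_sim {
	bool tick_stopped;			/* result of the stop/retain decision */
	unsigned long long predicted_ns;	/* governor's idle duration prediction */
	unsigned long long tick_period_ns;	/* length of one scheduler tick */
};

/*
 * Models cpuidle_select(): returns an idle state index and may clear
 * *stop_tick, which plays the role of the "nohz" hint.
 * Assumed policy: retain the tick for predictions below one tick period.
 */
static int sim_select(const struct idle_sim *s, bool *stop_tick)
{
	if (s->predicted_ns < s->tick_period_ns) {
		*stop_tick = false;
		return 0;	/* shallow state */
	}
	return 1;		/* deep state, hint left at its default */
}

/* Models the new ordering: select the state first, then decide on the tick. */
static int sim_idle_call(struct idle_sim *s)
{
	bool stop_tick = true;
	int next_state = sim_select(s, &stop_tick);

	/* Stands in for tick_nohz_idle_stop_tick() / tick_nohz_idle_retain_tick(). */
	s->tick_stopped = stop_tick;

	return next_state;
}
```

Calling sim_idle_call() with a prediction below tick_period_ns leaves
tick_stopped false, while a longer prediction sets it.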

This isn't straightforward, because menu_select() invokes
tick_nohz_get_sleep_length() to get the time to the next timer
event and the number returned by the latter comes from
__tick_nohz_idle_stop_tick().  Fortunately, however, it is possible
to compute that number without actually stopping the tick and with
the help of the existing code.

Namely, tick_nohz_get_sleep_length() can be made to call
tick_nohz_next_event(), introduced earlier, to get the time to the
next non-highres timer event.  If that happens, tick_nohz_next_event()
need not be called by __tick_nohz_idle_stop_tick() again.
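
The reuse of the already computed expiration time can be sketched in
user space as follows.  The names (struct ts_sim, sim_next_event() and
friends) are hypothetical, and clearing timer_expires_base once the
value has been consumed is an assumption of this sketch about how the
cached value gets invalidated.

```c
#include <stdint.h>

/* Hypothetical model of the two struct tick_sched fields involved. */
struct ts_sim {
	uint64_t timer_expires;		/* cached expiration time */
	uint64_t timer_expires_base;	/* nonzero: timer_expires is valid */
	unsigned int next_event_calls;	/* how often the computation ran */
};

/* Models tick_nohz_next_event(): compute and cache the expiration time. */
static uint64_t sim_next_event(struct ts_sim *ts, uint64_t basemono,
			       uint64_t next_timer)
{
	ts->next_event_calls++;
	ts->timer_expires = next_timer;
	ts->timer_expires_base = basemono;	/* must be nonzero to mark validity */
	return ts->timer_expires;
}

/* Models tick_nohz_get_sleep_length(): runs the computation up front. */
static uint64_t sim_get_sleep_length(struct ts_sim *ts, uint64_t now,
				     uint64_t next_timer)
{
	return sim_next_event(ts, now, next_timer) - now;
}

/* Models __tick_nohz_idle_stop_tick(): reuse the cached value if present. */
static uint64_t sim_stop_tick(struct ts_sim *ts, uint64_t now,
			      uint64_t next_timer)
{
	uint64_t expires;

	if (ts->timer_expires_base)
		expires = ts->timer_expires;
	else
		expires = sim_next_event(ts, now, next_timer);

	ts->timer_expires_base = 0;	/* assumed: value is consumed here */
	return expires;
}
```

When sim_get_sleep_length() runs first, the following sim_stop_tick()
does not bump next_event_calls, which is the saving described above.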

If it turns out that the scheduler tick cannot be stopped going
forward or the next timer event is too close for the tick to be
stopped, tick_nohz_get_sleep_length() can simply return the time to
the next event currently programmed into the corresponding clock
event device.

In addition to knowing the return value of tick_nohz_next_event(),
however, tick_nohz_get_sleep_length() needs to know the time to the
next highres timer event, but with the scheduler tick timer excluded,
which can be computed with the help of hrtimer_get_next_event().

The minimum of that number and the tick_nohz_next_event() return
value is the total time to the next timer event with the assumption
that the tick will be stopped.  It can be returned to the idle
governor which can use it for predicting idle duration (under the
assumption that the tick will be stopped) and deciding whether or
not it makes sense to stop the tick before putting the CPU into the
selected idle state.
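
As a rough user-space model of the computation described above (again
only a sketch: struct sleep_inputs and sim_sleep_length() are
hypothetical, and all times are plain nanosecond counters rather than
ktime_t values):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inputs standing in for the kernel state consulted. */
struct sleep_inputs {
	bool can_stop_tick;		/* models can_stop_idle_tick() */
	uint64_t now;			/* models ts->idle_entrytime */
	uint64_t dev_next_event;	/* event programmed into the clock event device */
	uint64_t next_timer_event;	/* models tick_nohz_next_event(), 0: too close */
	uint64_t next_hrtimer_event;	/* models hrtimer_get_next_event(), tick excluded */
};

/* Returns the time to the next timer event assuming the tick gets stopped. */
static uint64_t sim_sleep_length(const struct sleep_inputs *in)
{
	uint64_t next_event;

	/* Tick cannot be stopped or the next event is too close: use the device. */
	if (!in->can_stop_tick || !in->next_timer_event)
		return in->dev_next_event - in->now;

	/* Otherwise take the earlier of the two candidate events. */
	next_event = in->next_timer_event;
	if (in->next_hrtimer_event < next_event)
		next_event = in->next_hrtimer_event;

	return next_event - in->now;
}
```

With now = 1000, a non-highres event at 9000 and a highres event at
6000, the result is 5000.  If the tick cannot be stopped and the device
is programmed for 2000, the result is 1000.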

With the above, the sleep_length field in struct tick_sched is not
necessary any more, so drop it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

v7 -> v8:
  * Some changes moved to patch [06/10].
  * Use hrtimer_get_next_event() in tick_nohz_get_sleep_length().
  * Add timer_clear_idle() to tick_nohz_idle_retain_tick().

---
 include/linux/hrtimer.h  |    2 -
 include/linux/tick.h     |    2 +
 kernel/sched/idle.c      |   11 ++++++--
 kernel/time/hrtimer.c    |   56 +++++++++++++++++++++++++++++++++----------
 kernel/time/tick-sched.c |   61 +++++++++++++++++++++++++++++++++++++----------
 kernel/time/tick-sched.h |    2 -
 kernel/time/timer.c      |    4 +--
 7 files changed, 105 insertions(+), 33 deletions(-)

Index: linux-pm/include/linux/tick.h
===================================================================
--- linux-pm.orig/include/linux/tick.h
+++ linux-pm/include/linux/tick.h
@@ -115,6 +115,7 @@ enum tick_dep_bits {
 extern bool tick_nohz_enabled;
 extern int tick_nohz_tick_stopped(void);
 extern void tick_nohz_idle_stop_tick(void);
+extern void tick_nohz_idle_retain_tick(void);
 extern void tick_nohz_idle_restart_tick(void);
 extern void tick_nohz_idle_enter(void);
 extern void tick_nohz_idle_exit(void);
@@ -137,6 +138,7 @@ static inline void tick_nohz_idle_stop_t
 #define tick_nohz_enabled (0)
 static inline int tick_nohz_tick_stopped(void) { return 0; }
 static inline void tick_nohz_idle_stop_tick(void) { }
+static inline void tick_nohz_idle_retain_tick(void) { }
 static inline void tick_nohz_idle_restart_tick(void) { }
 static inline void tick_nohz_idle_enter(void) { }
 static inline void tick_nohz_idle_exit(void) { }
Index: linux-pm/kernel/sched/idle.c
===================================================================
--- linux-pm.orig/kernel/sched/idle.c
+++ linux-pm/kernel/sched/idle.c
@@ -190,13 +190,18 @@ static void cpuidle_idle_call(void)
 	} else {
 		bool stop_tick = true;
 
-		tick_nohz_idle_stop_tick();
-		rcu_idle_enter();
-
 		/*
 		 * Ask the cpuidle framework to choose a convenient idle state.
 		 */
 		next_state = cpuidle_select(drv, dev, &stop_tick);
+
+		if (stop_tick)
+			tick_nohz_idle_stop_tick();
+		else
+			tick_nohz_idle_retain_tick();
+
+		rcu_idle_enter();
+
 		entered_state = call_cpuidle(drv, dev, next_state);
 		/*
 		 * Give the governor an opportunity to reflect on the outcome
Index: linux-pm/kernel/time/tick-sched.c
===================================================================
--- linux-pm.orig/kernel/time/tick-sched.c
+++ linux-pm/kernel/time/tick-sched.c
@@ -930,16 +930,19 @@ static bool can_stop_idle_tick(int cpu,
 
 static void __tick_nohz_idle_stop_tick(struct tick_sched *ts)
 {
-	struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev);
 	ktime_t expires;
 	int cpu = smp_processor_id();
 
-	WARN_ON_ONCE(ts->timer_expires_base);
-
-	if (!can_stop_idle_tick(cpu, ts))
-		goto out;
-
-	expires = tick_nohz_next_event(ts, cpu);
+	/*
+	 * If tick_nohz_get_sleep_length() ran tick_nohz_next_event(), the
+	 * tick timer expiration time is known already.
+	 */
+	if (ts->timer_expires_base)
+		expires = ts->timer_expires;
+	else if (can_stop_idle_tick(cpu, ts))
+		expires = tick_nohz_next_event(ts, cpu);
+	else
+		return;
 
 	ts->idle_calls++;
 
@@ -958,9 +961,6 @@ static void __tick_nohz_idle_stop_tick(s
 	} else {
 		tick_nohz_retain_tick(ts);
 	}
-
-out:
-	ts->sleep_length = ktime_sub(dev->next_event, ts->idle_entrytime);
 }
 
 /**
@@ -973,6 +973,16 @@ void tick_nohz_idle_stop_tick(void)
 	__tick_nohz_idle_stop_tick(this_cpu_ptr(&tick_cpu_sched));
 }
 
+void tick_nohz_idle_retain_tick(void)
+{
+	tick_nohz_retain_tick(this_cpu_ptr(&tick_cpu_sched));
+	/*
+	 * Undo the effect of get_next_timer_interrupt() called from
+	 * tick_nohz_next_event().
+	 */
+	timer_clear_idle();
+}
+
 /**
  * tick_nohz_idle_enter - prepare for entering idle on the current CPU
  *
@@ -1036,15 +1046,42 @@ bool tick_nohz_idle_got_tick(void)
 }
 
 /**
- * tick_nohz_get_sleep_length - return the length of the current sleep
+ * tick_nohz_get_sleep_length - return the expected length of the current sleep
  *
  * Called from power state control code with interrupts disabled
  */
 ktime_t tick_nohz_get_sleep_length(void)
 {
+	struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev);
 	struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
+	int cpu = smp_processor_id();
+	/*
+	 * The idle entry time is expected to be a sufficient approximation of
+	 * the current time at this point.
+	 */
+	ktime_t now = ts->idle_entrytime;
+	ktime_t next_event;
+
+	WARN_ON_ONCE(!ts->inidle);
+
+	if (!can_stop_idle_tick(cpu, ts))
+		goto out_dev;
+
+	next_event = tick_nohz_next_event(ts, cpu);
+	if (!next_event)
+		goto out_dev;
+
+	/*
+	 * If the next highres timer to expire is earlier than next_event, the
+	 * idle governor needs to know that.
+	 */
+	next_event = min_t(u64, next_event,
+			   hrtimer_get_next_event(&ts->sched_timer));
+
+	return ktime_sub(next_event, now);
 
-	return ts->sleep_length;
+out_dev:
+	return ktime_sub(dev->next_event, now);
 }
 
 /**
Index: linux-pm/kernel/time/tick-sched.h
===================================================================
--- linux-pm.orig/kernel/time/tick-sched.h
+++ linux-pm/kernel/time/tick-sched.h
@@ -38,7 +38,6 @@ enum tick_nohz_mode {
  * @idle_exittime:	Time when the idle state was left
  * @idle_sleeptime:	Sum of the time slept in idle with sched tick stopped
  * @iowait_sleeptime:	Sum of the time slept in idle with sched tick stopped, with IO outstanding
- * @sleep_length:	Duration of the current idle sleep
  * @timer_expires:	Anticipated timer expiration time (in case sched tick is stopped)
  * @timer_expires_base:	Base time clock monotonic for @timer_expires
  * @do_timer_lst:	CPU was the last one doing do_timer before going idle
@@ -60,7 +59,6 @@ struct tick_sched {
 	ktime_t				idle_exittime;
 	ktime_t				idle_sleeptime;
 	ktime_t				iowait_sleeptime;
-	ktime_t				sleep_length;
 	unsigned long			last_jiffies;
 	u64				timer_expires;
 	u64				timer_expires_base;