LKML Archive on lore.kernel.org
 help / color / Atom feed
From: "Rafael J. Wysocki" <rjw@rjwysocki.net>
To: Peter Zijlstra <peterz@infradead.org>,
	Linux PM <linux-pm@vger.kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Paul McKenney <paulmck@linux.vnet.ibm.com>,
	Thomas Ilsche <thomas.ilsche@tu-dresden.de>,
	Doug Smythies <dsmythies@telus.net>,
	Rik van Riel <riel@surriel.com>,
	Aubrey Li <aubrey.li@linux.intel.com>,
	Mike Galbraith <mgalbraith@suse.de>,
	LKML <linux-kernel@vger.kernel.org>
Subject: [RFT][PATCH v7.2 5/8] cpuidle: Return nohz hint from cpuidle_select()
Date: Wed, 21 Mar 2018 14:03:51 +0100
Message-ID: <2313989.EXUHJuTSS6@aspire.rjw.lan> (raw)
In-Reply-To: <1635957.yuHkCe9oyz@aspire.rjw.lan>

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Add a new pointer argument to cpuidle_select() and to the ->select
cpuidle governor callback to allow a boolean value indicating
whether or not the tick should be stopped before entering the
selected state to be returned from there.

Make the ladder governor ignore that pointer (to preserve its
current behavior) and make the menu governor return 'false" through
it if:
 (1) the idle exit latency is constrained at 0, or
 (2) the selected state is a polling one, or
 (3) the expected idle period duration is within the tick period
     range.

In addition to that, the correction factor computations in the menu
governor need to take the possibility that the tick may not be
stopped into account to avoid artificially small correction factor
values.  To that end, add a mechanism to record tick wakeups, as
suggested by Peter Zijlstra, and use it to modify the menu_reflect()
behavior when tick wakeup occurs.  Namely, if the CPU is woken up by
the tick and the return value of tick_nohz_get_sleep_length() is not
within the tick boundary, the predicted idle duration is likely too
short, so make menu_reflect() try to compensate for that by bumping
up the correction factor with a (sufficiently large) constant value.

Since the value returned through the new argument pointer of
cpuidle_select() is not used by its caller yet, this change by
itself is not expected to alter the functionality of the code.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

This fixes the bug in the v7.1 of patch [5/8] (and in the original v7 of it
for that matter) by which tick wakeups occurring for tick_nohz_get_sleep_length()
return values within the tick boundary were treated in a special way
incorrectly.

It doesn't run menu_update() on tick wakeups occurring when the return
value of tick_nohz_get_sleep_length() is not within the tick boundary, but
simply bumps up the correction factor alone then.

>From the theoretical standpoint this is my favorite version of patch [5/8]
so far, as there's a clear (to me at least) reason for all of the changes.
Of course, the constant value used for bumping up the correction factor in
menu_reflect() is a matter of speculation at this point, but this is the
only remaining sort of moving part I can see.  And it can be adjusted later. :-)

Still, the theory needs to meet practice and we'll see what comes out of that ...

Thanks!

---
 drivers/cpuidle/cpuidle.c          |   10 +++++-
 drivers/cpuidle/governors/ladder.c |    3 +
 drivers/cpuidle/governors/menu.c   |   57 +++++++++++++++++++++++++++++++++----
 include/linux/cpuidle.h            |    8 +++--
 include/linux/tick.h               |    2 +
 kernel/sched/idle.c                |    4 +-
 kernel/time/tick-sched.c           |   20 ++++++++++++
 7 files changed, 92 insertions(+), 12 deletions(-)

Index: linux-pm/include/linux/cpuidle.h
===================================================================
--- linux-pm.orig/include/linux/cpuidle.h
+++ linux-pm/include/linux/cpuidle.h
@@ -135,7 +135,8 @@ extern bool cpuidle_not_available(struct
 				  struct cpuidle_device *dev);
 
 extern int cpuidle_select(struct cpuidle_driver *drv,
-			  struct cpuidle_device *dev);
+			  struct cpuidle_device *dev,
+			  bool *stop_tick);
 extern int cpuidle_enter(struct cpuidle_driver *drv,
 			 struct cpuidle_device *dev, int index);
 extern void cpuidle_reflect(struct cpuidle_device *dev, int index);
@@ -167,7 +168,7 @@ static inline bool cpuidle_not_available
 					 struct cpuidle_device *dev)
 {return true; }
 static inline int cpuidle_select(struct cpuidle_driver *drv,
-				 struct cpuidle_device *dev)
+				 struct cpuidle_device *dev, bool *stop_tick)
 {return -ENODEV; }
 static inline int cpuidle_enter(struct cpuidle_driver *drv,
 				struct cpuidle_device *dev, int index)
@@ -250,7 +251,8 @@ struct cpuidle_governor {
 					struct cpuidle_device *dev);
 
 	int  (*select)		(struct cpuidle_driver *drv,
-					struct cpuidle_device *dev);
+					struct cpuidle_device *dev,
+					bool *stop_tick);
 	void (*reflect)		(struct cpuidle_device *dev, int index);
 };
 
Index: linux-pm/kernel/sched/idle.c
===================================================================
--- linux-pm.orig/kernel/sched/idle.c
+++ linux-pm/kernel/sched/idle.c
@@ -188,13 +188,15 @@ static void cpuidle_idle_call(void)
 		next_state = cpuidle_find_deepest_state(drv, dev);
 		call_cpuidle(drv, dev, next_state);
 	} else {
+		bool stop_tick = true;
+
 		tick_nohz_idle_stop_tick();
 		rcu_idle_enter();
 
 		/*
 		 * Ask the cpuidle framework to choose a convenient idle state.
 		 */
-		next_state = cpuidle_select(drv, dev);
+		next_state = cpuidle_select(drv, dev, &stop_tick);
 		entered_state = call_cpuidle(drv, dev, next_state);
 		/*
 		 * Give the governor an opportunity to reflect on the outcome
Index: linux-pm/drivers/cpuidle/cpuidle.c
===================================================================
--- linux-pm.orig/drivers/cpuidle/cpuidle.c
+++ linux-pm/drivers/cpuidle/cpuidle.c
@@ -272,12 +272,18 @@ int cpuidle_enter_state(struct cpuidle_d
  *
  * @drv: the cpuidle driver
  * @dev: the cpuidle device
+ * @stop_tick: indication on whether or not to stop the tick
  *
  * Returns the index of the idle state.  The return value must not be negative.
+ *
+ * The memory location pointed to by @stop_tick is expected to be written the
+ * 'false' boolean value if the scheduler tick should not be stopped before
+ * entering the returned state.
  */
-int cpuidle_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
+int cpuidle_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
+		   bool *stop_tick)
 {
-	return cpuidle_curr_governor->select(drv, dev);
+	return cpuidle_curr_governor->select(drv, dev, stop_tick);
 }
 
 /**
Index: linux-pm/drivers/cpuidle/governors/ladder.c
===================================================================
--- linux-pm.orig/drivers/cpuidle/governors/ladder.c
+++ linux-pm/drivers/cpuidle/governors/ladder.c
@@ -63,9 +63,10 @@ static inline void ladder_do_selection(s
  * ladder_select_state - selects the next state to enter
  * @drv: cpuidle driver
  * @dev: the CPU
+ * @dummy: not used
  */
 static int ladder_select_state(struct cpuidle_driver *drv,
-				struct cpuidle_device *dev)
+			       struct cpuidle_device *dev, bool *dummy)
 {
 	struct ladder_device *ldev = this_cpu_ptr(&ladder_devices);
 	struct device *device = get_cpu_device(dev->cpu);
Index: linux-pm/drivers/cpuidle/governors/menu.c
===================================================================
--- linux-pm.orig/drivers/cpuidle/governors/menu.c
+++ linux-pm/drivers/cpuidle/governors/menu.c
@@ -279,8 +279,10 @@ again:
  * menu_select - selects the next idle state to enter
  * @drv: cpuidle driver containing state data
  * @dev: the CPU
+ * @stop_tick: indication on whether or not to stop the tick
  */
-static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
+static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
+		       bool *stop_tick)
 {
 	struct menu_device *data = this_cpu_ptr(&menu_devices);
 	struct device *device = get_cpu_device(dev->cpu);
@@ -303,8 +305,10 @@ static int menu_select(struct cpuidle_dr
 		latency_req = resume_latency;
 
 	/* Special case when user has set very strict latency requirement */
-	if (unlikely(latency_req == 0))
+	if (unlikely(latency_req == 0)) {
+		*stop_tick = false;
 		return 0;
+	}
 
 	/* determine the expected residency time, round up */
 	data->next_timer_us = ktime_to_us(tick_nohz_get_sleep_length());
@@ -354,6 +358,7 @@ static int menu_select(struct cpuidle_dr
 	if (latency_req > interactivity_req)
 		latency_req = interactivity_req;
 
+	expected_interval = data->predicted_us;
 	/*
 	 * Find the idle state with the lowest power while satisfying
 	 * our constraints.
@@ -369,15 +374,30 @@ static int menu_select(struct cpuidle_dr
 			idx = i; /* first enabled state */
 		if (s->target_residency > data->predicted_us)
 			break;
-		if (s->exit_latency > latency_req)
+		if (s->exit_latency > latency_req) {
+			/*
+			 * If we break out of the loop for latency reasons, use
+			 * the target residency of the selected state as the
+			 * expected idle duration so that the tick is retained
+			 * as long as that target residency is low enough.
+			 */
+			expected_interval = drv->states[idx].target_residency;
 			break;
-
+		}
 		idx = i;
 	}
 
 	if (idx == -1)
 		idx = 0; /* No states enabled. Must use 0. */
 
+	/*
+	 * Don't stop the tick if the selected state is a polling one or if the
+	 * expected idle duration is shorter than the tick period length.
+	 */
+	if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
+	    expected_interval < TICK_USEC)
+		*stop_tick = false;
+
 	data->last_state_idx = idx;
 
 	return data->last_state_idx;
@@ -396,7 +416,34 @@ static void menu_reflect(struct cpuidle_
 	struct menu_device *data = this_cpu_ptr(&menu_devices);
 
 	data->last_state_idx = index;
-	data->needs_update = 1;
+	/*
+	 * Tick wakeups occurring when the tick_nohz_get_sleep_length() return
+	 * value is within the tick boundary should be treated as regular ones,
+	 * as the nohz code itself doesn't stop the tick then.
+	 */
+	if (tick_nohz_idle_got_tick() && data->next_timer_us > TICK_USEC) {
+		unsigned int new_factor = data->correction_factor[data->bucket];
+
+		/*
+		 * Only update the correction factor, don't update the repeating
+		 * pattern data to avoid polluting it with the tick period
+		 * length which is a design artifact here.
+		 */
+		new_factor -= new_factor / DECAY;
+		/*
+		 * The nohz code said that there wouldn't be any wakeups
+		 * within the tick boundary (if the tick wasn't stopped), but
+		 * menu_select() had a differing opinion.  Yet, the CPU was
+		 * woken up by the tick, so menu_select() was not quite right.
+		 * Try to make it do a better job next time by bumping up the
+		 * correction factor.  Use 0.75 * RESOLUTION (which is easy
+		 * enough to get) that should work fine on the average.
+		 */
+		new_factor += RESOLUTION / 2 + RESOLUTION / 4;
+		data->correction_factor[data->bucket] = new_factor;
+	} else {
+		data->needs_update = 1;
+	}
 }
 
 /**
Index: linux-pm/kernel/time/tick-sched.c
===================================================================
--- linux-pm.orig/kernel/time/tick-sched.c
+++ linux-pm/kernel/time/tick-sched.c
@@ -991,6 +991,20 @@ void tick_nohz_irq_exit(void)
 }
 
 /**
+ * tick_nohz_idle_got_tick - Check whether or not the tick handler has run
+ */
+bool tick_nohz_idle_got_tick(void)
+{
+	struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
+
+	if (ts->inidle > 1) {
+		ts->inidle = 1;
+		return true;
+	}
+	return false;
+}
+
+/**
  * tick_nohz_get_sleep_length - return the length of the current sleep
  *
  * Called from power state control code with interrupts disabled
@@ -1101,6 +1115,9 @@ static void tick_nohz_handler(struct clo
 	struct pt_regs *regs = get_irq_regs();
 	ktime_t now = ktime_get();
 
+	if (ts->inidle)
+		ts->inidle = 2;
+
 	dev->next_event = KTIME_MAX;
 
 	tick_sched_do_timer(now);
@@ -1198,6 +1215,9 @@ static enum hrtimer_restart tick_sched_t
 	struct pt_regs *regs = get_irq_regs();
 	ktime_t now = ktime_get();
 
+	if (ts->inidle)
+		ts->inidle = 2;
+
 	tick_sched_do_timer(now);
 
 	/*
Index: linux-pm/include/linux/tick.h
===================================================================
--- linux-pm.orig/include/linux/tick.h
+++ linux-pm/include/linux/tick.h
@@ -119,6 +119,7 @@ extern void tick_nohz_idle_restart_tick(
 extern void tick_nohz_idle_enter(void);
 extern void tick_nohz_idle_exit(void);
 extern void tick_nohz_irq_exit(void);
+extern bool tick_nohz_idle_got_tick(void);
 extern ktime_t tick_nohz_get_sleep_length(void);
 extern unsigned long tick_nohz_get_idle_calls(void);
 extern unsigned long tick_nohz_get_idle_calls_cpu(int cpu);
@@ -139,6 +140,7 @@ static inline void tick_nohz_idle_stop_t
 static inline void tick_nohz_idle_restart_tick(void) { }
 static inline void tick_nohz_idle_enter(void) { }
 static inline void tick_nohz_idle_exit(void) { }
+static inline bool tick_nohz_idle_got_tick(void) { return false; }
 
 static inline ktime_t tick_nohz_get_sleep_length(void)
 {

  parent reply index

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-20 15:12 [RFT][PATCH v7 0/8] sched/cpuidle: Idle loop rework Rafael J. Wysocki
2018-03-20 15:13 ` [RFT][PATCH v7 1/8] time: tick-sched: Reorganize idle tick management code Rafael J. Wysocki
2018-03-20 15:15 ` [RFT][PATCH v7 2/8] sched: idle: Do not stop the tick upfront in the idle loop Rafael J. Wysocki
2018-03-20 15:15 ` [RFT][PATCH v7 3/8] " Rafael J. Wysocki
2018-03-20 18:00   ` [Correction][RFT][PATCH v7 3/8] sched: idle: Do not stop the tick before cpuidle_idle_call() Rafael J. Wysocki
2018-03-20 15:16 ` [RFT][PATCH v7 4/8] jiffies: Introduce USER_TICK_USEC and redefine TICK_USEC Rafael J. Wysocki
2018-03-20 15:45 ` [RFT][PATCH v7 5/8] cpuidle: Return nohz hint from cpuidle_select() Rafael J. Wysocki
2018-03-21  6:48   ` [RFT][PATCH v7.1 " Rafael J. Wysocki
2018-03-21 11:52     ` Rafael J. Wysocki
2018-03-21 13:03   ` Rafael J. Wysocki [this message]
2018-03-21 14:36   ` [RFT][PATCH v7 " Rafael J. Wysocki
2018-03-21 17:59     ` Thomas Ilsche
2018-03-21 22:15       ` Rafael J. Wysocki
2018-03-22 13:18         ` Thomas Ilsche
2018-03-22 17:23           ` Rafael J. Wysocki
2018-03-22  6:24       ` Doug Smythies
2018-03-22 15:41       ` Doug Smythies
2018-03-22 17:21         ` Rafael J. Wysocki
2018-03-21 18:23     ` Doug Smythies
2018-03-22 17:40   ` [RFT][PATCH v7.3 " Rafael J. Wysocki
2018-03-28  9:14     ` Thomas Ilsche
2018-03-30  9:39       ` Rafael J. Wysocki
2018-04-10 15:22         ` Thomas Ilsche
2018-03-22 20:46   ` Doug Smythies
2018-03-20 15:45 ` [RFT][PATCH v7 6/8] sched: idle: Select idle state before stopping the tick Rafael J. Wysocki
2018-03-27 21:50   ` Thomas Ilsche
2018-03-27 22:10     ` Rafael J. Wysocki
2018-03-28  8:13       ` Rafael J. Wysocki
2018-03-28  8:38         ` Thomas Ilsche
2018-03-28 10:37           ` Rafael J. Wysocki
2018-03-28 10:56             ` Rafael J. Wysocki
2018-03-28 15:15               ` Thomas Ilsche
2018-03-28 20:41               ` Doug Smythies
2018-03-28 23:11                 ` Rafael J. Wysocki
2018-03-20 15:46 ` [RFT][PATCH v7 7/8] cpuidle: menu: Refine idle state selection for running tick Rafael J. Wysocki
2018-03-20 15:47 ` [RFT][PATCH v7 8/8] cpuidle: menu: Avoid selecting shallow states with stopped tick Rafael J. Wysocki
2018-03-20 17:52 ` [RFT][PATCH v7 3/8] sched: idle: Do not stop the tick upfront in the idle loop Doug Smythies
2018-03-20 18:01   ` Rafael J. Wysocki
2018-03-21 12:31 ` [RFT][PATCH v7 0/8] sched/cpuidle: Idle loop rework Rik van Riel
2018-03-21 13:55   ` Rafael J. Wysocki
2018-03-21 14:53     ` Rik van Riel

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2313989.EXUHJuTSS6@aspire.rjw.lan \
    --to=rjw@rjwysocki.net \
    --cc=aubrey.li@linux.intel.com \
    --cc=dsmythies@telus.net \
    --cc=fweisbec@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=mgalbraith@suse.de \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    --cc=riel@surriel.com \
    --cc=tglx@linutronix.de \
    --cc=thomas.ilsche@tu-dresden.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org linux-kernel@archiver.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/ public-inbox