From: "Rafael J. Wysocki" <rjw@rjwysocki.net>
To: Peter Zijlstra <peterz@infradead.org>,
Linux PM <linux-pm@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
Frederic Weisbecker <fweisbec@gmail.com>,
Paul McKenney <paulmck@linux.vnet.ibm.com>,
Thomas Ilsche <thomas.ilsche@tu-dresden.de>,
Doug Smythies <dsmythies@telus.net>,
Rik van Riel <riel@surriel.com>,
Aubrey Li <aubrey.li@linux.intel.com>,
Mike Galbraith <mgalbraith@suse.de>,
LKML <linux-kernel@vger.kernel.org>
Subject: [RFC/RFT][PATCH v2 5/6] sched: idle: Select idle state before stopping the tick
Date: Tue, 06 Mar 2018 10:10:35 +0100 [thread overview]
Message-ID: <1545256.RXd5dazKu1@aspire.rjw.lan> (raw)
In-Reply-To: <2067762.1uWBf5RSRc@aspire.rjw.lan>
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In order to address the issue with short idle duration predictions
by the idle governor after the tick has been stopped, reorder the
code in cpuidle_idle_call() so that the governor idle state selection
runs before tick_nohz_idle_go_idle() and use the "nohz" hint returned
by cpuidle_select() to tell tick_nohz_idle_go_idle() whether or not
to stop the tick.
This isn't straightforward, because menu_predict() invokes
tick_nohz_get_sleep_length() to get the time to the next timer
event and the number returned by the latter comes from
__tick_nohz_idle_enter(). Fortunately, however, it is possible
to compute that number without actually stopping the tick and with
the help of the existing code.
Namely, notice that tick_nohz_stop_sched_tick() already computes the
next timer event time to reprogram the scheduler tick hrtimer and
that time can be used as a proxy for the actual next timer event
time in the idle duration predicition.
Accordingly, rename the original tick_nohz_stop_sched_tick() to
__tick_nohz_next_event() and add the stop_tick argument indicating
whether or not to stop the tick to it. If that argument is 'true',
the function will work like the original tick_nohz_stop_sched_tick(),
but otherwise it will just compute the next event time without
stopping the tick. Next, redefine tick_nohz_stop_sched_tick() as
a wrapper around the new function.
Following that, make tick_nohz_get_sleep_length() call
__tick_nohz_next_event() to compute the next timer event time
and make it use the new last_jiffies_update field in struct
tick_sched to tell __tick_nohz_idle_enter() to skip some code
that has run already.
[After this change the __tick_nohz_next_event() code computing the
next event time will run twice in a row if the expected idle period
duration coming from cpuidle_select() is large enough which is sort
of ugly, but the next set of changes deals with that separately.
To do that, it uses the value of the last_jiffies_update field in
struct tick_sched introduced here, among other things.]
Finally, drop the now redundant sleep_length field from struct
tick_sched.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
-> v2: Use the "nohz" hint from cpuidle_select() instead of the expected
idle duration.
---
kernel/sched/idle.c | 7 ++---
kernel/time/tick-sched.c | 64 +++++++++++++++++++++++++++++++++--------------
kernel/time/tick-sched.h | 3 --
3 files changed, 50 insertions(+), 24 deletions(-)
Index: linux-pm/kernel/sched/idle.c
===================================================================
--- linux-pm.orig/kernel/sched/idle.c
+++ linux-pm/kernel/sched/idle.c
@@ -188,13 +188,14 @@ static void cpuidle_idle_call(void)
} else {
bool nohz = true;
- tick_nohz_idle_go_idle(true);
- rcu_idle_enter();
-
/*
* Ask the cpuidle framework to choose a convenient idle state.
*/
next_state = cpuidle_select(drv, dev, &nohz);
+
+ tick_nohz_idle_go_idle(nohz);
+ rcu_idle_enter();
+
entered_state = call_cpuidle(drv, dev, next_state);
/*
* Give the governor an opportunity to reflect on the outcome
Index: linux-pm/kernel/time/tick-sched.c
===================================================================
--- linux-pm.orig/kernel/time/tick-sched.c
+++ linux-pm/kernel/time/tick-sched.c
@@ -655,8 +655,8 @@ static inline bool local_timer_softirq_p
return local_softirq_pending() & TIMER_SOFTIRQ;
}
-static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
- ktime_t now, int cpu)
+static ktime_t __tick_nohz_next_event(struct tick_sched *ts, int cpu,
+ bool stop_tick)
{
struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev);
u64 basemono, next_tick, next_tmr, next_rcu, delta, expires;
@@ -670,6 +670,7 @@ static ktime_t tick_nohz_stop_sched_tick
basejiff = jiffies;
} while (read_seqretry(&jiffies_lock, seq));
ts->last_jiffies = basejiff;
+ ts->last_jiffies_update = basemono;
/*
* Keep the periodic tick, when RCU, architecture or irq_work
@@ -732,8 +733,10 @@ static ktime_t tick_nohz_stop_sched_tick
*/
delta = timekeeping_max_deferment();
if (cpu == tick_do_timer_cpu) {
- tick_do_timer_cpu = TICK_DO_TIMER_NONE;
- ts->do_timer_last = 1;
+ if (stop_tick) {
+ tick_do_timer_cpu = TICK_DO_TIMER_NONE;
+ ts->do_timer_last = 1;
+ }
} else if (tick_do_timer_cpu != TICK_DO_TIMER_NONE) {
delta = KTIME_MAX;
ts->do_timer_last = 0;
@@ -756,6 +759,12 @@ static ktime_t tick_nohz_stop_sched_tick
expires = min_t(u64, expires, next_tick);
tick = expires;
+ if (!stop_tick) {
+ /* Undo the effect of get_next_timer_interrupt(). */
+ timer_clear_idle();
+ goto out;
+ }
+
/* Skip reprogram of event if its not changed */
if (ts->tick_stopped && (expires == ts->next_tick)) {
/* Sanity check: make sure clockevent is actually programmed */
@@ -804,14 +813,14 @@ static ktime_t tick_nohz_stop_sched_tick
else
tick_program_event(tick, 1);
out:
- /*
- * Update the estimated sleep length until the next timer
- * (not only the tick).
- */
- ts->sleep_length = ktime_sub(dev->next_event, now);
return tick;
}
+static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts, int cpu)
+{
+ return __tick_nohz_next_event(ts, cpu, true);
+}
+
static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now)
{
/* Update jiffies first */
@@ -847,7 +856,7 @@ static void tick_nohz_full_update_tick(s
return;
if (can_stop_full_tick(cpu, ts))
- tick_nohz_stop_sched_tick(ts, ktime_get(), cpu);
+ tick_nohz_stop_sched_tick(ts, cpu);
else if (ts->tick_stopped)
tick_nohz_restart_sched_tick(ts, ktime_get());
#endif
@@ -873,10 +882,8 @@ static bool can_stop_idle_tick(int cpu,
return false;
}
- if (unlikely(ts->nohz_mode == NOHZ_MODE_INACTIVE)) {
- ts->sleep_length = NSEC_PER_SEC / HZ;
+ if (unlikely(ts->nohz_mode == NOHZ_MODE_INACTIVE))
return false;
- }
if (need_resched())
return false;
@@ -913,17 +920,22 @@ static bool can_stop_idle_tick(int cpu,
static void __tick_nohz_idle_enter(struct tick_sched *ts, bool stop_tick)
{
- ktime_t now, expires;
int cpu = smp_processor_id();
- now = tick_nohz_start_idle(ts);
+ if (!ts->last_jiffies_update) {
+ /* tick_nohz_get_sleep_length() has not run. */
+ tick_nohz_start_idle(ts);
+ if (!can_stop_idle_tick(cpu, ts))
+ return;
+ }
- if (can_stop_idle_tick(cpu, ts) && stop_tick) {
+ if (stop_tick) {
int was_stopped = ts->tick_stopped;
+ ktime_t expires;
ts->idle_calls++;
- expires = tick_nohz_stop_sched_tick(ts, now, cpu);
+ expires = tick_nohz_stop_sched_tick(ts, cpu);
if (expires > 0LL) {
ts->idle_sleeps++;
ts->idle_expires = expires;
@@ -934,6 +946,8 @@ static void __tick_nohz_idle_enter(struc
nohz_balance_enter_idle(cpu);
}
}
+
+ ts->last_jiffies_update = 0;
}
void __tick_nohz_idle_prepare(void)
@@ -1013,15 +1027,27 @@ void tick_nohz_irq_exit(void)
}
/**
- * tick_nohz_get_sleep_length - return the length of the current sleep
+ * tick_nohz_get_sleep_length - return the expected length of the current sleep
*
* Called from power state control code with interrupts disabled
*/
ktime_t tick_nohz_get_sleep_length(void)
{
struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
+ int cpu = smp_processor_id();
+ ktime_t now, next_event;
- return ts->sleep_length;
+ now = tick_nohz_start_idle(ts);
+
+ if (can_stop_idle_tick(cpu, ts)) {
+ next_event = __tick_nohz_next_event(ts, cpu, false);
+ } else {
+ struct clock_event_device *dev;
+
+ dev = __this_cpu_read(tick_cpu_device.evtdev);
+ next_event = dev->next_event;
+ }
+ return ktime_sub(next_event, now);;
}
/**
Index: linux-pm/kernel/time/tick-sched.h
===================================================================
--- linux-pm.orig/kernel/time/tick-sched.h
+++ linux-pm/kernel/time/tick-sched.h
@@ -38,7 +38,6 @@ enum tick_nohz_mode {
* @idle_exittime: Time when the idle state was left
* @idle_sleeptime: Sum of the time slept in idle with sched tick stopped
* @iowait_sleeptime: Sum of the time slept in idle with sched tick stopped, with IO outstanding
- * @sleep_length: Duration of the current idle sleep
* @do_timer_lst: CPU was the last one doing do_timer before going idle
*/
struct tick_sched {
@@ -58,8 +57,8 @@ struct tick_sched {
ktime_t idle_exittime;
ktime_t idle_sleeptime;
ktime_t iowait_sleeptime;
- ktime_t sleep_length;
unsigned long last_jiffies;
+ u64 last_jiffies_update;
u64 next_timer;
ktime_t idle_expires;
int do_timer_last;
next prev parent reply other threads:[~2018-03-06 9:16 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-03-06 8:57 [RFC/RFT][PATCH v2 0/6] sched/cpuidle: Idle loop rework Rafael J. Wysocki
2018-03-06 9:02 ` [RFC/RFT][PATCH v2 1/6] time: tick-sched: Reorganize idle tick management code Rafael J. Wysocki
2018-03-07 23:18 ` Frederic Weisbecker
2018-03-08 9:22 ` Rafael J. Wysocki
2018-03-08 15:14 ` Frederic Weisbecker
2018-03-08 16:34 ` Rafael J. Wysocki
2018-03-08 17:00 ` Frederic Weisbecker
2018-03-06 9:02 ` [RFC/RFT][PATCH v2 2/6] sched: idle: Do not stop the tick upfront in the idle loop Rafael J. Wysocki
2018-03-07 23:39 ` Frederic Weisbecker
2018-03-08 9:05 ` Rafael J. Wysocki
2018-03-06 9:03 ` [RFC/RFT][PATCH v2 3/6] sched: idle: Do not stop the tick before cpuidle_idle_call() Rafael J. Wysocki
2018-03-06 9:05 ` [RFC/RFT][PATCH v2 4/6] cpuidle: Return nohz hint from cpuidle_select() Rafael J. Wysocki
2018-03-06 9:28 ` Rafael J. Wysocki
2018-03-06 10:06 ` [Update][RFC/RFT][PATCH " Rafael J. Wysocki
2018-03-06 9:10 ` Rafael J. Wysocki [this message]
2018-03-06 9:10 ` [RFC/RFT][PATCH v2 6/6] time: tick-sched: Avoid running the same code twice in a row Rafael J. Wysocki
2018-03-08 10:31 ` [RFC/RFT][PATCH v2 0/6] sched/cpuidle: Idle loop rework Mike Galbraith
2018-03-08 11:10 ` Rafael J. Wysocki
2018-03-08 13:40 ` Mike Galbraith
2018-03-09 9:58 ` Rafael J. Wysocki
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1545256.RXd5dazKu1@aspire.rjw.lan \
--to=rjw@rjwysocki.net \
--cc=aubrey.li@linux.intel.com \
--cc=dsmythies@telus.net \
--cc=fweisbec@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pm@vger.kernel.org \
--cc=mgalbraith@suse.de \
--cc=paulmck@linux.vnet.ibm.com \
--cc=peterz@infradead.org \
--cc=riel@surriel.com \
--cc=tglx@linutronix.de \
--cc=thomas.ilsche@tu-dresden.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).