All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/4] tick/sched: Forward timer even in nohz mode
@ 2019-12-16 23:22 Scott Wood
  2019-12-16 23:22 ` [PATCH 2/4] tick/sched: Set last_tick in init paths Scott Wood
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: Scott Wood @ 2019-12-16 23:22 UTC (permalink / raw)
  To: Peter Zijlstra, Frederic Weisbecker, Thomas Gleixner, Ingo Molnar
  Cc: linux-kernel, Scott Wood

Currently when exiting nohz, the expiry will be forwarded as if we
had just run the timer.  If we re-enter nohz before this new expiry,
and exit after, this forwarding will happen again.  If this load pattern
recurs the tick can be indefinitely postponed.

To avoid this, use last_tick as-is rather than calling hrtimer_forward().
However, in some cases the tick *will* have just run (despite being
"stopped"), and leading to double timer execution.

To avoid that, forward the timer after every tick (regardless of nohz
status) and keep last_tick up-to-date.  During nohz, last_tick will
reflect what the expiry would have been if not in nohz mode.

Signed-off-by: Scott Wood <swood@redhat.com>
---
 kernel/time/tick-sched.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 8b192e67aabc..8936b604dd6c 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -642,9 +642,6 @@ static void tick_nohz_restart(struct tick_sched *ts, ktime_t now)
 	hrtimer_cancel(&ts->sched_timer);
 	hrtimer_set_expires(&ts->sched_timer, ts->last_tick);
 
-	/* Forward the time to expire in the future */
-	hrtimer_forward(&ts->sched_timer, now, tick_period);
-
 	if (ts->nohz_mode == NOHZ_MODE_HIGHRES) {
 		hrtimer_start_expires(&ts->sched_timer,
 				      HRTIMER_MODE_ABS_PINNED_HARD);
@@ -1207,12 +1204,13 @@ static void tick_nohz_handler(struct clock_event_device *dev)
 
 	tick_sched_do_timer(ts, now);
 	tick_sched_handle(ts, regs);
+	hrtimer_forward(&ts->sched_timer, now, tick_period);
+	ts->last_tick = hrtimer_get_expires(&ts->sched_timer);
 
 	/* No need to reprogram if we are running tickless  */
 	if (unlikely(ts->tick_stopped))
 		return;
 
-	hrtimer_forward(&ts->sched_timer, now, tick_period);
 	tick_program_event(hrtimer_get_expires(&ts->sched_timer), 1);
 }
 
@@ -1311,12 +1309,13 @@ static enum hrtimer_restart tick_sched_timer(struct hrtimer *timer)
 	else
 		ts->next_tick = 0;
 
+	hrtimer_forward(timer, now, tick_period);
+	ts->last_tick = hrtimer_get_expires(&ts->sched_timer);
+
 	/* No need to reprogram if we are in idle or full dynticks mode */
 	if (unlikely(ts->tick_stopped))
 		return HRTIMER_NORESTART;
 
-	hrtimer_forward(timer, now, tick_period);
-
 	return HRTIMER_RESTART;
 }
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/4] tick/sched: Set last_tick in init paths
  2019-12-16 23:22 [PATCH 1/4] tick/sched: Forward timer even in nohz mode Scott Wood
@ 2019-12-16 23:22 ` Scott Wood
  2019-12-16 23:22 ` [PATCH 3/4] sched/core: Don't skip remote tick for idle cpus Scott Wood
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Scott Wood @ 2019-12-16 23:22 UTC (permalink / raw)
  To: Peter Zijlstra, Frederic Weisbecker, Thomas Gleixner, Ingo Molnar
  Cc: linux-kernel, Scott Wood

This eliminates the need to save last_tick on nohz entry.

Signed-off-by: Scott Wood <swood@redhat.com>
---
 kernel/time/tick-sched.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 8936b604dd6c..59e663e240fc 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -794,7 +794,6 @@ static void tick_nohz_stop_tick(struct tick_sched *ts, int cpu)
 		calc_load_nohz_start();
 		quiet_vmstat();
 
-		ts->last_tick = hrtimer_get_expires(&ts->sched_timer);
 		ts->tick_stopped = 1;
 		trace_tick_stop(1, TICK_DEP_MASK_NONE);
 	}
@@ -1248,6 +1247,7 @@ static void tick_nohz_switch_to_nohz(void)
 
 	hrtimer_set_expires(&ts->sched_timer, next);
 	hrtimer_forward_now(&ts->sched_timer, tick_period);
+	ts->last_tick = hrtimer_get_expires(&ts->sched_timer);
 	tick_program_event(hrtimer_get_expires(&ts->sched_timer), 1);
 	tick_nohz_activate(ts, NOHZ_MODE_LOWRES);
 }
@@ -1355,6 +1355,7 @@ void tick_setup_sched_timer(void)
 	}
 
 	hrtimer_forward(&ts->sched_timer, now, tick_period);
+	ts->last_tick = hrtimer_get_expires(&ts->sched_timer);
 	hrtimer_start_expires(&ts->sched_timer, HRTIMER_MODE_ABS_PINNED_HARD);
 	tick_nohz_activate(ts, NOHZ_MODE_HIGHRES);
 }
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 3/4] sched/core: Don't skip remote tick for idle cpus
  2019-12-16 23:22 [PATCH 1/4] tick/sched: Forward timer even in nohz mode Scott Wood
  2019-12-16 23:22 ` [PATCH 2/4] tick/sched: Set last_tick in init paths Scott Wood
@ 2019-12-16 23:22 ` Scott Wood
  2019-12-16 23:22 ` [PATCH 4/4] timers/nohz: Update nohz load in remote tick Scott Wood
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Scott Wood @ 2019-12-16 23:22 UTC (permalink / raw)
  To: Peter Zijlstra, Frederic Weisbecker, Thomas Gleixner, Ingo Molnar
  Cc: linux-kernel, Scott Wood

This will be used in the next patch to get a loadavg update from
nohz cpus.  The delta check is skipped because idle_sched_class
doesn't update se.exec_start.

Signed-off-by: Scott Wood <swood@redhat.com>
---
 kernel/sched/core.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 90e4b00ace89..dfb8ea801700 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3668,22 +3668,24 @@ static void sched_tick_remote(struct work_struct *work)
 	 * statistics and checks timeslices in a time-independent way, regardless
 	 * of when exactly it is running.
 	 */
-	if (idle_cpu(cpu) || !tick_nohz_tick_stopped_cpu(cpu))
+	if (!tick_nohz_tick_stopped_cpu(cpu))
 		goto out_requeue;
 
 	rq_lock_irq(rq, &rf);
 	curr = rq->curr;
-	if (is_idle_task(curr) || cpu_is_offline(cpu))
+	if (cpu_is_offline(cpu))
 		goto out_unlock;
 
 	update_rq_clock(rq);
-	delta = rq_clock_task(rq) - curr->se.exec_start;
 
-	/*
-	 * Make sure the next tick runs within a reasonable
-	 * amount of time.
-	 */
-	WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3);
+	if (!is_idle_task(curr)) {
+		/*
+		 * Make sure the next tick runs within a reasonable
+		 * amount of time.
+		 */
+		delta = rq_clock_task(rq) - curr->se.exec_start;
+		WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3);
+	}
 	curr->sched_class->task_tick(rq, curr, 0);
 
 out_unlock:
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 4/4] timers/nohz: Update nohz load in remote tick
  2019-12-16 23:22 [PATCH 1/4] tick/sched: Forward timer even in nohz mode Scott Wood
  2019-12-16 23:22 ` [PATCH 2/4] tick/sched: Set last_tick in init paths Scott Wood
  2019-12-16 23:22 ` [PATCH 3/4] sched/core: Don't skip remote tick for idle cpus Scott Wood
@ 2019-12-16 23:22 ` Scott Wood
  2020-01-07  9:26   ` Peter Zijlstra
  2020-01-06 20:12 ` [PATCH 1/4] tick/sched: Forward timer even in nohz mode Scott Wood
  2020-01-06 22:18 ` Frederic Weisbecker
  4 siblings, 1 reply; 8+ messages in thread
From: Scott Wood @ 2019-12-16 23:22 UTC (permalink / raw)
  To: Peter Zijlstra, Frederic Weisbecker, Thomas Gleixner, Ingo Molnar
  Cc: linux-kernel, Scott Wood

From: Peter Zijlstra <peterz@infradead.org>

The way loadavg is tracked during nohz only pays attention to the load
upon entering nohz.  This can be particularly noticeable if full nohz is
entered while non-idle, and then the cpu goes idle and stays that way for
a long time.

Use the remote tick to ensure that full nohz cpus report their deltas
within a reasonable time.

[swood: added changelog and removed recheck of stopped tick]
Signed-off-by: Scott Wood <swood@redhat.com>
---
This patch was provided by Peter in an unsigned email reply --
Peter, can I get a signoff?
---
 include/linux/sched/nohz.h |  2 ++
 kernel/sched/core.c        |  4 +++-
 kernel/sched/loadavg.c     | 33 +++++++++++++++++++++++----------
 3 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/include/linux/sched/nohz.h b/include/linux/sched/nohz.h
index 1abe91ff6e4a..6d67e9a5af6b 100644
--- a/include/linux/sched/nohz.h
+++ b/include/linux/sched/nohz.h
@@ -15,9 +15,11 @@ static inline void nohz_balance_enter_idle(int cpu) { }
 
 #ifdef CONFIG_NO_HZ_COMMON
 void calc_load_nohz_start(void);
+void calc_load_nohz_remote(struct rq *rq);
 void calc_load_nohz_stop(void);
 #else
 static inline void calc_load_nohz_start(void) { }
+static inline void calc_load_nohz_remote(struct rq *rq) { }
 static inline void calc_load_nohz_stop(void) { }
 #endif /* CONFIG_NO_HZ_COMMON */
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index dfb8ea801700..2e4a505e48af 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3676,6 +3676,7 @@ static void sched_tick_remote(struct work_struct *work)
 	if (cpu_is_offline(cpu))
 		goto out_unlock;
 
+	curr = rq->curr;
 	update_rq_clock(rq);
 
 	if (!is_idle_task(curr)) {
@@ -3688,10 +3689,11 @@ static void sched_tick_remote(struct work_struct *work)
 	}
 	curr->sched_class->task_tick(rq, curr, 0);
 
+	calc_load_nohz_remote(rq);
 out_unlock:
 	rq_unlock_irq(rq, &rf);
-
 out_requeue:
+
 	/*
 	 * Run the remote tick once per second (1Hz). This arbitrary
 	 * frequency is large enough to avoid overload but short enough
diff --git a/kernel/sched/loadavg.c b/kernel/sched/loadavg.c
index 28a516575c18..de22da666ac7 100644
--- a/kernel/sched/loadavg.c
+++ b/kernel/sched/loadavg.c
@@ -231,16 +231,11 @@ static inline int calc_load_read_idx(void)
 	return calc_load_idx & 1;
 }
 
-void calc_load_nohz_start(void)
+static void calc_load_nohz_fold(struct rq *rq)
 {
-	struct rq *this_rq = this_rq();
 	long delta;
 
-	/*
-	 * We're going into NO_HZ mode, if there's any pending delta, fold it
-	 * into the pending NO_HZ delta.
-	 */
-	delta = calc_load_fold_active(this_rq, 0);
+	delta = calc_load_fold_active(rq, 0);
 	if (delta) {
 		int idx = calc_load_write_idx();
 
@@ -248,6 +243,24 @@ void calc_load_nohz_start(void)
 	}
 }
 
+void calc_load_nohz_start(void)
+{
+	/*
+	 * We're going into NO_HZ mode, if there's any pending delta, fold it
+	 * into the pending NO_HZ delta.
+	 */
+	calc_load_nohz_fold(this_rq());
+}
+
+/*
+ * Keep track of the load for NOHZ_FULL, must be called between
+ * calc_load_nohz_{start,stop}().
+ */
+void calc_load_nohz_remote(struct rq *rq)
+{
+	calc_load_nohz_fold(rq);
+}
+
 void calc_load_nohz_stop(void)
 {
 	struct rq *this_rq = this_rq();
@@ -268,7 +281,7 @@ void calc_load_nohz_stop(void)
 		this_rq->calc_load_update += LOAD_FREQ;
 }
 
-static long calc_load_nohz_fold(void)
+static long calc_load_nohz_read(void)
 {
 	int idx = calc_load_read_idx();
 	long delta = 0;
@@ -323,7 +336,7 @@ static void calc_global_nohz(void)
 }
 #else /* !CONFIG_NO_HZ_COMMON */
 
-static inline long calc_load_nohz_fold(void) { return 0; }
+static inline long calc_load_nohz_read(void) { return 0; }
 static inline void calc_global_nohz(void) { }
 
 #endif /* CONFIG_NO_HZ_COMMON */
@@ -346,7 +359,7 @@ void calc_global_load(unsigned long ticks)
 	/*
 	 * Fold the 'old' NO_HZ-delta to include all NO_HZ CPUs.
 	 */
-	delta = calc_load_nohz_fold();
+	delta = calc_load_nohz_read();
 	if (delta)
 		atomic_long_add(delta, &calc_load_tasks);
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/4] tick/sched: Forward timer even in nohz mode
  2019-12-16 23:22 [PATCH 1/4] tick/sched: Forward timer even in nohz mode Scott Wood
                   ` (2 preceding siblings ...)
  2019-12-16 23:22 ` [PATCH 4/4] timers/nohz: Update nohz load in remote tick Scott Wood
@ 2020-01-06 20:12 ` Scott Wood
  2020-01-06 22:18 ` Frederic Weisbecker
  4 siblings, 0 replies; 8+ messages in thread
From: Scott Wood @ 2020-01-06 20:12 UTC (permalink / raw)
  To: Peter Zijlstra, Frederic Weisbecker, Thomas Gleixner, Ingo Molnar
  Cc: linux-kernel

On Mon, 2019-12-16 at 18:22 -0500, Scott Wood wrote:
> Currently when exiting nohz, the expiry will be forwarded as if we
> had just run the timer.  If we re-enter nohz before this new expiry,
> and exit after, this forwarding will happen again.  If this load pattern
> recurs the tick can be indefinitely postponed.
> 
> To avoid this, use last_tick as-is rather than calling hrtimer_forward().
> However, in some cases the tick *will* have just run (despite being
> "stopped"), and leading to double timer execution.
> 
> To avoid that, forward the timer after every tick (regardless of nohz
> status) and keep last_tick up-to-date.  During nohz, last_tick will
> reflect what the expiry would have been if not in nohz mode.
> 
> Signed-off-by: Scott Wood <swood@redhat.com>
> ---
>  kernel/time/tick-sched.c | 11 +++++------
>  1 file changed, 5 insertions(+), 6 deletions(-)

Any comments on these patches?

Thanks,
Scott



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/4] tick/sched: Forward timer even in nohz mode
  2019-12-16 23:22 [PATCH 1/4] tick/sched: Forward timer even in nohz mode Scott Wood
                   ` (3 preceding siblings ...)
  2020-01-06 20:12 ` [PATCH 1/4] tick/sched: Forward timer even in nohz mode Scott Wood
@ 2020-01-06 22:18 ` Frederic Weisbecker
  2020-01-08 23:18   ` Scott Wood
  4 siblings, 1 reply; 8+ messages in thread
From: Frederic Weisbecker @ 2020-01-06 22:18 UTC (permalink / raw)
  To: Scott Wood
  Cc: Peter Zijlstra, Frederic Weisbecker, Thomas Gleixner,
	Ingo Molnar, linux-kernel

On Mon, Dec 16, 2019 at 06:22:22PM -0500, Scott Wood wrote:
> Currently when exiting nohz, the expiry will be forwarded as if we
> had just run the timer.  If we re-enter nohz before this new expiry,
> and exit after, this forwarding will happen again.  If this load pattern
> recurs the tick can be indefinitely postponed.

I must be missing something but I don't see why that would be a problem.
Indeed the tick can be indefinitely postponed but that's as long as it's
not needed. As soon as it's needed (timer callback expired, RCU, ...), the
tick will be retained and it will eventually fire.

> @@ -642,9 +642,6 @@ static void tick_nohz_restart(struct tick_sched *ts, ktime_t now)
>  	hrtimer_cancel(&ts->sched_timer);
>  	hrtimer_set_expires(&ts->sched_timer, ts->last_tick);
>  
> -	/* Forward the time to expire in the future */
> -	hrtimer_forward(&ts->sched_timer, now, tick_period);
> -

By doing that, you may program a past tick and thus add a useless interrupt
at each idle exit.

Thanks.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 4/4] timers/nohz: Update nohz load in remote tick
  2019-12-16 23:22 ` [PATCH 4/4] timers/nohz: Update nohz load in remote tick Scott Wood
@ 2020-01-07  9:26   ` Peter Zijlstra
  0 siblings, 0 replies; 8+ messages in thread
From: Peter Zijlstra @ 2020-01-07  9:26 UTC (permalink / raw)
  To: Scott Wood
  Cc: Frederic Weisbecker, Thomas Gleixner, Ingo Molnar, linux-kernel

On Mon, Dec 16, 2019 at 06:22:25PM -0500, Scott Wood wrote:
> From: Peter Zijlstra <peterz@infradead.org>
> 
> The way loadavg is tracked during nohz only pays attention to the load
> upon entering nohz.  This can be particularly noticeable if full nohz is
> entered while non-idle, and then the cpu goes idle and stays that way for
> a long time.
> 
> Use the remote tick to ensure that full nohz cpus report their deltas
> within a reasonable time.
> 
> [swood: added changelog and removed recheck of stopped tick]
> Signed-off-by: Scott Wood <swood@redhat.com>
> ---
> This patch was provided by Peter in an unsigned email reply --
> Peter, can I get a signoff?

Here goes, glad it works :-)

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/4] tick/sched: Forward timer even in nohz mode
  2020-01-06 22:18 ` Frederic Weisbecker
@ 2020-01-08 23:18   ` Scott Wood
  0 siblings, 0 replies; 8+ messages in thread
From: Scott Wood @ 2020-01-08 23:18 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Peter Zijlstra, Frederic Weisbecker, Thomas Gleixner,
	Ingo Molnar, linux-kernel

On Mon, 2020-01-06 at 23:18 +0100, Frederic Weisbecker wrote:
> On Mon, Dec 16, 2019 at 06:22:22PM -0500, Scott Wood wrote:
> > Currently when exiting nohz, the expiry will be forwarded as if we
> > had just run the timer.  If we re-enter nohz before this new expiry,
> > and exit after, this forwarding will happen again.  If this load pattern
> > recurs the tick can be indefinitely postponed.
> 
> I must be missing something but I don't see why that would be a problem.
> Indeed the tick can be indefinitely postponed but that's as long as it's
> not needed. As soon as it's needed (timer callback expired, RCU, ...), the
> tick will be retained and it will eventually fire.

My main concern was the loadavg recording, which does not prevent entering
nohz.  However, if we're going into and out of nohz then the call to
calc_load_nohz_start() in tick_nohz_stop_tick() should take care of it, so I
guess we only need patches 3 and 4, if there's nothing else in the tick that
doesn't either prevent nohz or get handled during the transition
(psi_task_tick?).

-Scott



^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2020-01-08 23:18 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-16 23:22 [PATCH 1/4] tick/sched: Forward timer even in nohz mode Scott Wood
2019-12-16 23:22 ` [PATCH 2/4] tick/sched: Set last_tick in init paths Scott Wood
2019-12-16 23:22 ` [PATCH 3/4] sched/core: Don't skip remote tick for idle cpus Scott Wood
2019-12-16 23:22 ` [PATCH 4/4] timers/nohz: Update nohz load in remote tick Scott Wood
2020-01-07  9:26   ` Peter Zijlstra
2020-01-06 20:12 ` [PATCH 1/4] tick/sched: Forward timer even in nohz mode Scott Wood
2020-01-06 22:18 ` Frederic Weisbecker
2020-01-08 23:18   ` Scott Wood

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.