linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/10] tick/nohz updates
@ 2021-03-11 12:36 Frederic Weisbecker
  2021-03-11 12:36 ` [PATCH 01/10] tick/nohz: Prevent tick_nohz_get_sleep_length() from returning negative value Frederic Weisbecker
                   ` (9 more replies)
  0 siblings, 10 replies; 35+ messages in thread
From: Frederic Weisbecker @ 2021-03-11 12:36 UTC (permalink / raw)
  To: Peter Zijlstra, Thomas Gleixner
  Cc: LKML, Frederic Weisbecker, Rafael J . Wysocki, Ti Zhou,
	Yunfeng Ye, Paul E . McKenney, Marcelo Tosatti, Ingo Molnar

Various updates for the timer/nohz side.

* Two fixes (maybe 01/10 deserves a stable tag, I should check)

* A Kconfig help change

* A debug check while enqueuing timers when the tick is stopped in idle.

* The rest is noise reduction for full nohz ("tick/nohz: Kick only 
  _queued_ task whose tick dependency is updated").
  It relies on the fact that p->on_rq is never set to anything else than
  TASK_ON_RQ_QUEUED while the task is running on a CPU, the only relevant
  exception being the task dequeuing itself on schedule().
  I haven't found issues in my modest reviews of deactivate_task() callers
  but I lost my headlamp while visiting fair's sched entity dequeuing
  and throttling (had to find my way back in the dark again).

So please double check the last patch.

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
	timers/nohz

HEAD: e469edfa00f97e5ec5d31fe31d3aef0a5c9bd607

Thanks,
	Frederic
---

Frederic Weisbecker (5):
      tick/nohz: Add tick_nohz_full_this_cpu()
      tick/nohz: Remove superflous check for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
      timer: Report ignored local enqueue in nohz mode
      tick/nohz: Update nohz_full Kconfig help
      tick/nohz: Only wakeup a single target cpu when kicking a task

Marcelo Tosatti (2):
      tick/nohz: Change signal tick dependency to wakeup CPUs of member tasks
      tick/nohz: Kick only _queued_ task whose tick dependency is updated

Yunfeng Ye (2):
      tick/nohz: Conditionally restart tick on idle exit
      tick/nohz: Update idle_exittime on actual idle exit

Zhou Ti (x2019cwm) (1):
      tick/nohz: Prevent tick_nohz_get_sleep_length() from returning negative value


 include/linux/sched.h          |   2 +
 include/linux/tick.h           |  19 ++++--
 kernel/sched/core.c            |  25 +++++++-
 kernel/time/Kconfig            |  11 ++--
 kernel/time/posix-cpu-timers.c |   4 +-
 kernel/time/tick-sched.c       | 134 +++++++++++++++++++++++++++++------------
 6 files changed, 142 insertions(+), 53 deletions(-)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 01/10] tick/nohz: Prevent tick_nohz_get_sleep_length() from returning negative value
  2021-03-11 12:36 [PATCH 00/10] tick/nohz updates Frederic Weisbecker
@ 2021-03-11 12:36 ` Frederic Weisbecker
  2021-03-16 12:21   ` Peter Zijlstra
  2021-03-11 12:37 ` [PATCH 02/10] tick/nohz: Add tick_nohz_full_this_cpu() Frederic Weisbecker
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Frederic Weisbecker @ 2021-03-11 12:36 UTC (permalink / raw)
  To: Peter Zijlstra, Thomas Gleixner
  Cc: LKML, Zhou Ti (x2019cwm),
	Rafael J . Wysocki, Yunfeng Ye, Frederic Weisbecker,
	Paul E . McKenney, Marcelo Tosatti, Ingo Molnar

From: "Zhou Ti (x2019cwm)" <x2019cwm@stfx.ca>

If the hardware clock happens to fire its interrupts late, two possible
issues can happen while calling tick_nohz_get_sleep_length(). Either:

1) The next clockevent device event is due past the last idle entry time.

or:

2) The last timekeeping update happened before the last idle entry time
   and the next timer callback expires before the last idle entry time.

Make sure that both cases are handled to avoid returning a negative
duration to the cpuidle governors.

Signed-off-by: Ti Zhou <x2019cwm@stfx.ca>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/time/tick-sched.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index e10a4af88737..22b6a46818cf 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1142,6 +1142,9 @@ ktime_t tick_nohz_get_sleep_length(ktime_t *delta_next)
 
 	*delta_next = ktime_sub(dev->next_event, now);
 
+	if (unlikely(*delta_next < 0))
+		*delta_next = 0;
+
 	if (!can_stop_idle_tick(cpu, ts))
 		return *delta_next;
 
@@ -1156,6 +1159,9 @@ ktime_t tick_nohz_get_sleep_length(ktime_t *delta_next)
 	next_event = min_t(u64, next_event,
 			   hrtimer_next_event_without(&ts->sched_timer));
 
+	if (unlikely(next_event < now))
+		next_event = now;
+
 	return ktime_sub(next_event, now);
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 02/10] tick/nohz: Add tick_nohz_full_this_cpu()
  2021-03-11 12:36 [PATCH 00/10] tick/nohz updates Frederic Weisbecker
  2021-03-11 12:36 ` [PATCH 01/10] tick/nohz: Prevent tick_nohz_get_sleep_length() from returning negative value Frederic Weisbecker
@ 2021-03-11 12:37 ` Frederic Weisbecker
  2021-03-16 12:28   ` Peter Zijlstra
  2021-03-11 12:37 ` [PATCH 03/10] tick/nohz: Conditionally restart tick on idle exit Frederic Weisbecker
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Frederic Weisbecker @ 2021-03-11 12:37 UTC (permalink / raw)
  To: Peter Zijlstra, Thomas Gleixner
  Cc: LKML, Frederic Weisbecker, Rafael J . Wysocki, Ti Zhou,
	Yunfeng Ye, Paul E . McKenney, Marcelo Tosatti, Ingo Molnar

Optimize further the check for local full dynticks CPU. Testing directly
tick_nohz_full_cpu(smp_processor_id()) is suboptimal because the
compiler first fetches the CPU number and only then processes the
static key.

It's best to evaluate the static branch before anything. Provide with a
function that handles that correctly and convert some users along.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Yunfeng Ye <yeyunfeng@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 include/linux/tick.h     | 11 ++++++++++-
 kernel/time/tick-sched.c | 12 +++++-------
 2 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index 7340613c7eff..bfc96cbe955c 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -193,6 +193,14 @@ static inline bool tick_nohz_full_cpu(int cpu)
 	return cpumask_test_cpu(cpu, tick_nohz_full_mask);
 }
 
+static inline bool tick_nohz_full_this_cpu(void)
+{
+	if (!tick_nohz_full_enabled())
+		return false;
+
+	return cpumask_test_cpu(smp_processor_id(), tick_nohz_full_mask);
+}
+
 static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask)
 {
 	if (tick_nohz_full_enabled())
@@ -271,6 +279,7 @@ extern void __init tick_nohz_full_setup(cpumask_var_t cpumask);
 #else
 static inline bool tick_nohz_full_enabled(void) { return false; }
 static inline bool tick_nohz_full_cpu(int cpu) { return false; }
+static inline bool tick_nohz_full_this_cpu(void) { return false; }
 static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask) { }
 
 static inline void tick_nohz_dep_set_cpu(int cpu, enum tick_dep_bits bit) { }
@@ -296,7 +305,7 @@ static inline void tick_nohz_full_setup(cpumask_var_t cpumask) { }
 
 static inline void tick_nohz_task_switch(void)
 {
-	if (tick_nohz_full_enabled())
+	if (tick_nohz_full_this_cpu())
 		__tick_nohz_task_switch();
 }
 
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 22b6a46818cf..af76cfa51b57 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -304,7 +304,7 @@ static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) =
  */
 static void tick_nohz_full_kick(void)
 {
-	if (!tick_nohz_full_cpu(smp_processor_id()))
+	if (!tick_nohz_full_this_cpu())
 		return;
 
 	irq_work_queue(this_cpu_ptr(&nohz_full_kick_work));
@@ -452,9 +452,6 @@ void __tick_nohz_task_switch(void)
 
 	local_irq_save(flags);
 
-	if (!tick_nohz_full_cpu(smp_processor_id()))
-		goto out;
-
 	ts = this_cpu_ptr(&tick_cpu_sched);
 
 	if (ts->tick_stopped) {
@@ -462,7 +459,6 @@ void __tick_nohz_task_switch(void)
 		    atomic_read(&current->signal->tick_dep_mask))
 			tick_nohz_full_kick();
 	}
-out:
 	local_irq_restore(flags);
 }
 
@@ -929,14 +925,16 @@ static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now)
 static void tick_nohz_full_update_tick(struct tick_sched *ts)
 {
 #ifdef CONFIG_NO_HZ_FULL
-	int cpu = smp_processor_id();
+	int cpu;
 
-	if (!tick_nohz_full_cpu(cpu))
+	if (!tick_nohz_full_this_cpu())
 		return;
 
 	if (!ts->tick_stopped && ts->nohz_mode == NOHZ_MODE_INACTIVE)
 		return;
 
+	cpu = smp_processor_id();
+
 	if (can_stop_full_tick(cpu, ts))
 		tick_nohz_stop_sched_tick(ts, cpu);
 	else if (ts->tick_stopped)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 03/10] tick/nohz: Conditionally restart tick on idle exit
  2021-03-11 12:36 [PATCH 00/10] tick/nohz updates Frederic Weisbecker
  2021-03-11 12:36 ` [PATCH 01/10] tick/nohz: Prevent tick_nohz_get_sleep_length() from returning negative value Frederic Weisbecker
  2021-03-11 12:37 ` [PATCH 02/10] tick/nohz: Add tick_nohz_full_this_cpu() Frederic Weisbecker
@ 2021-03-11 12:37 ` Frederic Weisbecker
  2021-03-11 12:37 ` [PATCH 04/10] tick/nohz: Remove superflous check for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE Frederic Weisbecker
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 35+ messages in thread
From: Frederic Weisbecker @ 2021-03-11 12:37 UTC (permalink / raw)
  To: Peter Zijlstra, Thomas Gleixner
  Cc: LKML, Yunfeng Ye, Rafael J . Wysocki, Ti Zhou,
	Frederic Weisbecker, Paul E . McKenney, Marcelo Tosatti,
	Ingo Molnar

From: Yunfeng Ye <yeyunfeng@huawei.com>

In nohz_full mode, switching from idle to a task will unconditionally
issue a tick restart. If the task is alone in the runqueue or is the
highest priority, the tick will fire once then eventually stop. But that
alone is still undesired noise.

Therefore, only restart the tick on idle exit when it's strictly
necessary.

Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/time/tick-sched.c | 50 ++++++++++++++++++++++++----------------
 1 file changed, 30 insertions(+), 20 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index af76cfa51b57..77dc8cd61dc8 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -922,26 +922,30 @@ static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now)
 	tick_nohz_restart(ts, now);
 }
 
-static void tick_nohz_full_update_tick(struct tick_sched *ts)
+static void __tick_nohz_full_update_tick(struct tick_sched *ts,
+					 ktime_t now)
 {
 #ifdef CONFIG_NO_HZ_FULL
-	int cpu;
-
-	if (!tick_nohz_full_this_cpu())
-		return;
-
-	if (!ts->tick_stopped && ts->nohz_mode == NOHZ_MODE_INACTIVE)
-		return;
-
-	cpu = smp_processor_id();
+	int cpu = smp_processor_id();
 
 	if (can_stop_full_tick(cpu, ts))
 		tick_nohz_stop_sched_tick(ts, cpu);
 	else if (ts->tick_stopped)
-		tick_nohz_restart_sched_tick(ts, ktime_get());
+		tick_nohz_restart_sched_tick(ts, now);
 #endif
 }
 
+static void tick_nohz_full_update_tick(struct tick_sched *ts)
+{
+	if (!tick_nohz_full_this_cpu())
+		return;
+
+	if (!ts->tick_stopped && ts->nohz_mode == NOHZ_MODE_INACTIVE)
+		return;
+
+	__tick_nohz_full_update_tick(ts, ktime_get());
+}
+
 static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
 {
 	/*
@@ -1209,18 +1213,24 @@ static void tick_nohz_account_idle_ticks(struct tick_sched *ts)
 #endif
 }
 
-static void __tick_nohz_idle_restart_tick(struct tick_sched *ts, ktime_t now)
-{
-	tick_nohz_restart_sched_tick(ts, now);
-	tick_nohz_account_idle_ticks(ts);
-}
-
 void tick_nohz_idle_restart_tick(void)
 {
 	struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
 
-	if (ts->tick_stopped)
-		__tick_nohz_idle_restart_tick(ts, ktime_get());
+	if (ts->tick_stopped) {
+		tick_nohz_restart_sched_tick(ts, ktime_get());
+		tick_nohz_account_idle_ticks(ts);
+	}
+}
+
+static void tick_nohz_idle_update_tick(struct tick_sched *ts, ktime_t now)
+{
+	if (tick_nohz_full_this_cpu())
+		__tick_nohz_full_update_tick(ts, now);
+	else
+		tick_nohz_restart_sched_tick(ts, now);
+
+	tick_nohz_account_idle_ticks(ts);
 }
 
 /**
@@ -1252,7 +1262,7 @@ void tick_nohz_idle_exit(void)
 		tick_nohz_stop_idle(ts, now);
 
 	if (tick_stopped)
-		__tick_nohz_idle_restart_tick(ts, now);
+		tick_nohz_idle_update_tick(ts, now);
 
 	local_irq_enable();
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 04/10] tick/nohz: Remove superflous check for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
  2021-03-11 12:36 [PATCH 00/10] tick/nohz updates Frederic Weisbecker
                   ` (2 preceding siblings ...)
  2021-03-11 12:37 ` [PATCH 03/10] tick/nohz: Conditionally restart tick on idle exit Frederic Weisbecker
@ 2021-03-11 12:37 ` Frederic Weisbecker
  2021-03-11 12:37 ` [PATCH 05/10] tick/nohz: Update idle_exittime on actual idle exit Frederic Weisbecker
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 35+ messages in thread
From: Frederic Weisbecker @ 2021-03-11 12:37 UTC (permalink / raw)
  To: Peter Zijlstra, Thomas Gleixner
  Cc: LKML, Frederic Weisbecker, Rafael J . Wysocki, Ti Zhou,
	Yunfeng Ye, Paul E . McKenney, Marcelo Tosatti, Ingo Molnar

The vtime_accounting_enabled_this_cpu() early check already makes what
follows as dead code in the case of CONFIG_VIRT_CPU_ACCOUNTING_NATIVE.
No need to keep the ifdeferry around.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Yunfeng Ye <yeyunfeng@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 kernel/time/tick-sched.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 77dc8cd61dc8..c86b586d65e0 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1194,7 +1194,6 @@ unsigned long tick_nohz_get_idle_calls(void)
 
 static void tick_nohz_account_idle_ticks(struct tick_sched *ts)
 {
-#ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
 	unsigned long ticks;
 
 	if (vtime_accounting_enabled_this_cpu())
@@ -1210,7 +1209,6 @@ static void tick_nohz_account_idle_ticks(struct tick_sched *ts)
 	 */
 	if (ticks && ticks < LONG_MAX)
 		account_idle_ticks(ticks);
-#endif
 }
 
 void tick_nohz_idle_restart_tick(void)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 05/10] tick/nohz: Update idle_exittime on actual idle exit
  2021-03-11 12:36 [PATCH 00/10] tick/nohz updates Frederic Weisbecker
                   ` (3 preceding siblings ...)
  2021-03-11 12:37 ` [PATCH 04/10] tick/nohz: Remove superflous check for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE Frederic Weisbecker
@ 2021-03-11 12:37 ` Frederic Weisbecker
  2021-03-11 12:37 ` [PATCH 06/10] timer: Report ignored local enqueue in nohz mode Frederic Weisbecker
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 35+ messages in thread
From: Frederic Weisbecker @ 2021-03-11 12:37 UTC (permalink / raw)
  To: Peter Zijlstra, Thomas Gleixner
  Cc: LKML, Yunfeng Ye, Rafael J . Wysocki, Ti Zhou,
	Frederic Weisbecker, Paul E . McKenney, Marcelo Tosatti,
	Ingo Molnar

From: Yunfeng Ye <yeyunfeng@huawei.com>

The idle_exittime field of tick_sched is used to record the time when
the idle state was left. but currently the idle_exittime is updated in
the function tick_nohz_restart_sched_tick(), which is not always in idle
state when nohz_full is configured:

  tick_irq_exit
    tick_nohz_irq_exit
      tick_nohz_full_update_tick
        tick_nohz_restart_sched_tick
          ts->idle_exittime = now;

It's thus overwritten by mistake on nohz_full tick restart. Move the
update to the appropriate idle exit path instead.

Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
Cc: Yunfeng Ye <yeyunfeng@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/time/tick-sched.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index c86b586d65e0..2a041d0dc3eb 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -917,8 +917,6 @@ static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now)
 	 * Cancel the scheduled timer and restore the tick
 	 */
 	ts->tick_stopped  = 0;
-	ts->idle_exittime = now;
-
 	tick_nohz_restart(ts, now);
 }
 
@@ -1192,10 +1190,13 @@ unsigned long tick_nohz_get_idle_calls(void)
 	return ts->idle_calls;
 }
 
-static void tick_nohz_account_idle_ticks(struct tick_sched *ts)
+static void tick_nohz_account_idle_time(struct tick_sched *ts,
+					ktime_t now)
 {
 	unsigned long ticks;
 
+	ts->idle_exittime = now;
+
 	if (vtime_accounting_enabled_this_cpu())
 		return;
 	/*
@@ -1216,8 +1217,9 @@ void tick_nohz_idle_restart_tick(void)
 	struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
 
 	if (ts->tick_stopped) {
-		tick_nohz_restart_sched_tick(ts, ktime_get());
-		tick_nohz_account_idle_ticks(ts);
+		ktime_t now = ktime_get();
+		tick_nohz_restart_sched_tick(ts, now);
+		tick_nohz_account_idle_time(ts, now);
 	}
 }
 
@@ -1228,7 +1230,7 @@ static void tick_nohz_idle_update_tick(struct tick_sched *ts, ktime_t now)
 	else
 		tick_nohz_restart_sched_tick(ts, now);
 
-	tick_nohz_account_idle_ticks(ts);
+	tick_nohz_account_idle_time(ts, now);
 }
 
 /**
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 06/10] timer: Report ignored local enqueue in nohz mode
  2021-03-11 12:36 [PATCH 00/10] tick/nohz updates Frederic Weisbecker
                   ` (4 preceding siblings ...)
  2021-03-11 12:37 ` [PATCH 05/10] tick/nohz: Update idle_exittime on actual idle exit Frederic Weisbecker
@ 2021-03-11 12:37 ` Frederic Weisbecker
  2021-03-16 15:27   ` Peter Zijlstra
  2021-03-11 12:37 ` [PATCH 07/10] tick/nohz: Update nohz_full Kconfig help Frederic Weisbecker
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Frederic Weisbecker @ 2021-03-11 12:37 UTC (permalink / raw)
  To: Peter Zijlstra, Thomas Gleixner
  Cc: LKML, Frederic Weisbecker, Rafael J . Wysocki, Ti Zhou,
	Yunfeng Ye, Paul E . McKenney, Marcelo Tosatti, Ingo Molnar

Enqueuing a local timer after the tick has been stopped will result in
the timer being ignored until the next random interrupt.

Perform sanity checks to report these situations.

Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/sched/core.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ca2bb629595f..24552911f92b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -674,6 +674,22 @@ int get_nohz_timer_target(void)
 	return cpu;
 }
 
+/* Make sure the timer won't be ignored in dynticks-idle case */
+static void wake_idle_assert_possible(void)
+{
+#ifdef CONFIG_SCHED_DEBUG
+	/*
+	 * Timers are re-evaluated after idle IRQs. In case of softirq,
+	 * we assume IRQ tail. Ksoftirqd shouldn't reach here as the
+	 * timer base wouldn't be idle. And inline softirq processing
+	 * after a call to local_bh_enable() within idle loop sound too
+	 * fun to be considered here.
+	 */
+	WARN_ONCE(in_task(),
+		  "Late timer enqueue may be ignored\n");
+#endif
+}
+
 /*
  * When add_timer_on() enqueues a timer into the timer wheel of an
  * idle CPU then this timer might expire before the next timer event
@@ -688,8 +704,10 @@ static void wake_up_idle_cpu(int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
 
-	if (cpu == smp_processor_id())
+	if (cpu == smp_processor_id()) {
+		wake_idle_assert_possible();
 		return;
+	}
 
 	if (set_nr_and_not_polling(rq->idle))
 		smp_send_reschedule(cpu);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 07/10] tick/nohz: Update nohz_full Kconfig help
  2021-03-11 12:36 [PATCH 00/10] tick/nohz updates Frederic Weisbecker
                   ` (5 preceding siblings ...)
  2021-03-11 12:37 ` [PATCH 06/10] timer: Report ignored local enqueue in nohz mode Frederic Weisbecker
@ 2021-03-11 12:37 ` Frederic Weisbecker
  2021-03-11 12:37 ` [PATCH 08/10] tick/nohz: Only wakeup a single target cpu when kicking a task Frederic Weisbecker
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 35+ messages in thread
From: Frederic Weisbecker @ 2021-03-11 12:37 UTC (permalink / raw)
  To: Peter Zijlstra, Thomas Gleixner
  Cc: LKML, Frederic Weisbecker, Rafael J . Wysocki, Ti Zhou,
	Yunfeng Ye, Paul E . McKenney, Marcelo Tosatti, Ingo Molnar

CONFIG_NO_HZ_FULL behaves just like CONFIG_NO_HZ_IDLE by default.
Reassure distros about it.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Yunfeng Ye <yeyunfeng@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 kernel/time/Kconfig | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig
index 83e158d016ba..6649e1d2dba5 100644
--- a/kernel/time/Kconfig
+++ b/kernel/time/Kconfig
@@ -117,13 +117,14 @@ config NO_HZ_FULL
 	 the task mostly runs in userspace and has few kernel activity.
 
 	 You need to fill up the nohz_full boot parameter with the
-	 desired range of dynticks CPUs.
+	 desired range of dynticks CPUs to use it. This is implemented at
+	 the expense of some overhead in user <-> kernel transitions:
+	 syscalls, exceptions and interrupts.
 
-	 This is implemented at the expense of some overhead in user <-> kernel
-	 transitions: syscalls, exceptions and interrupts. Even when it's
-	 dynamically off.
+	 By default, without passing nohz_full parameter, this behaves just
+	 like NO_HZ_IDLE.
 
-	 Say N.
+	 If you're a distro say Y.
 
 endchoice
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 08/10] tick/nohz: Only wakeup a single target cpu when kicking a task
  2021-03-11 12:36 [PATCH 00/10] tick/nohz updates Frederic Weisbecker
                   ` (6 preceding siblings ...)
  2021-03-11 12:37 ` [PATCH 07/10] tick/nohz: Update nohz_full Kconfig help Frederic Weisbecker
@ 2021-03-11 12:37 ` Frederic Weisbecker
  2021-03-11 12:37 ` [PATCH 09/10] tick/nohz: Change signal tick dependency to wakeup CPUs of member tasks Frederic Weisbecker
  2021-03-11 12:37 ` [PATCH 10/10] tick/nohz: Kick only _queued_ task whose tick dependency is updated Frederic Weisbecker
  9 siblings, 0 replies; 35+ messages in thread
From: Frederic Weisbecker @ 2021-03-11 12:37 UTC (permalink / raw)
  To: Peter Zijlstra, Thomas Gleixner
  Cc: LKML, Frederic Weisbecker, Rafael J . Wysocki, Ti Zhou,
	Yunfeng Ye, Paul E . McKenney, Marcelo Tosatti, Ingo Molnar

When adding a tick dependency to a task, its necessary to
wakeup the CPU where the task resides to reevaluate tick
dependencies on that CPU.

However the current code wakes up all nohz_full CPUs, which
is unnecessary.

Switch to waking up a single CPU, by using ordering of writes
to task->cpu and task->tick_dep_mask.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Yunfeng Ye <yeyunfeng@huawei.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
 kernel/time/tick-sched.c | 40 +++++++++++++++++++++++++++-------------
 1 file changed, 27 insertions(+), 13 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 2a041d0dc3eb..8457f15a5073 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -322,6 +322,31 @@ void tick_nohz_full_kick_cpu(int cpu)
 	irq_work_queue_on(&per_cpu(nohz_full_kick_work, cpu), cpu);
 }
 
+static void tick_nohz_kick_task(struct task_struct *tsk)
+{
+	int cpu = task_cpu(tsk);
+
+	/*
+	 * If the task concurrently migrates to another cpu,
+	 * we guarantee it sees the new tick dependency upon
+	 * schedule.
+	 *
+	 *
+	 * set_task_cpu(p, cpu);
+	 *   STORE p->cpu = @cpu
+	 * __schedule() (switch to task 'p')
+	 *   LOCK rq->lock
+	 *   smp_mb__after_spin_lock()          STORE p->tick_dep_mask
+	 *   tick_nohz_task_switch()            smp_mb() (atomic_fetch_or())
+	 *      LOAD p->tick_dep_mask           LOAD p->cpu
+	 */
+
+	preempt_disable();
+	if (cpu_online(cpu))
+		tick_nohz_full_kick_cpu(cpu);
+	preempt_enable();
+}
+
 /*
  * Kick all full dynticks CPUs in order to force these to re-evaluate
  * their dependency on the tick and restart it if necessary.
@@ -404,19 +429,8 @@ EXPORT_SYMBOL_GPL(tick_nohz_dep_clear_cpu);
  */
 void tick_nohz_dep_set_task(struct task_struct *tsk, enum tick_dep_bits bit)
 {
-	if (!atomic_fetch_or(BIT(bit), &tsk->tick_dep_mask)) {
-		if (tsk == current) {
-			preempt_disable();
-			tick_nohz_full_kick();
-			preempt_enable();
-		} else {
-			/*
-			 * Some future tick_nohz_full_kick_task()
-			 * should optimize this.
-			 */
-			tick_nohz_full_kick_all();
-		}
-	}
+	if (!atomic_fetch_or(BIT(bit), &tsk->tick_dep_mask))
+		tick_nohz_kick_task(tsk);
 }
 EXPORT_SYMBOL_GPL(tick_nohz_dep_set_task);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 09/10] tick/nohz: Change signal tick dependency to wakeup CPUs of member tasks
  2021-03-11 12:36 [PATCH 00/10] tick/nohz updates Frederic Weisbecker
                   ` (7 preceding siblings ...)
  2021-03-11 12:37 ` [PATCH 08/10] tick/nohz: Only wakeup a single target cpu when kicking a task Frederic Weisbecker
@ 2021-03-11 12:37 ` Frederic Weisbecker
  2021-03-11 12:37 ` [PATCH 10/10] tick/nohz: Kick only _queued_ task whose tick dependency is updated Frederic Weisbecker
  9 siblings, 0 replies; 35+ messages in thread
From: Frederic Weisbecker @ 2021-03-11 12:37 UTC (permalink / raw)
  To: Peter Zijlstra, Thomas Gleixner
  Cc: LKML, Marcelo Tosatti, Rafael J . Wysocki, Ti Zhou, Yunfeng Ye,
	Frederic Weisbecker, Paul E . McKenney, Ingo Molnar

From: Marcelo Tosatti <mtosatti@redhat.com>

Rather than waking up all nohz_full CPUs on the system, only wakeup
the target CPUs of member threads of the signal.

Reduces interruptions to nohz_full CPUs.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Yunfeng Ye <yeyunfeng@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 include/linux/tick.h           |  8 ++++----
 kernel/time/posix-cpu-timers.c |  4 ++--
 kernel/time/tick-sched.c       | 15 +++++++++++++--
 3 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index bfc96cbe955c..9e71ab1889aa 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -215,7 +215,7 @@ extern void tick_nohz_dep_set_task(struct task_struct *tsk,
 				   enum tick_dep_bits bit);
 extern void tick_nohz_dep_clear_task(struct task_struct *tsk,
 				     enum tick_dep_bits bit);
-extern void tick_nohz_dep_set_signal(struct signal_struct *signal,
+extern void tick_nohz_dep_set_signal(struct task_struct *tsk,
 				     enum tick_dep_bits bit);
 extern void tick_nohz_dep_clear_signal(struct signal_struct *signal,
 				       enum tick_dep_bits bit);
@@ -260,11 +260,11 @@ static inline void tick_dep_clear_task(struct task_struct *tsk,
 	if (tick_nohz_full_enabled())
 		tick_nohz_dep_clear_task(tsk, bit);
 }
-static inline void tick_dep_set_signal(struct signal_struct *signal,
+static inline void tick_dep_set_signal(struct task_struct *tsk,
 				       enum tick_dep_bits bit)
 {
 	if (tick_nohz_full_enabled())
-		tick_nohz_dep_set_signal(signal, bit);
+		tick_nohz_dep_set_signal(tsk, bit);
 }
 static inline void tick_dep_clear_signal(struct signal_struct *signal,
 					 enum tick_dep_bits bit)
@@ -293,7 +293,7 @@ static inline void tick_dep_set_task(struct task_struct *tsk,
 				     enum tick_dep_bits bit) { }
 static inline void tick_dep_clear_task(struct task_struct *tsk,
 				       enum tick_dep_bits bit) { }
-static inline void tick_dep_set_signal(struct signal_struct *signal,
+static inline void tick_dep_set_signal(struct task_struct *tsk,
 				       enum tick_dep_bits bit) { }
 static inline void tick_dep_clear_signal(struct signal_struct *signal,
 					 enum tick_dep_bits bit) { }
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index a71758e34e45..932e0cb4b57b 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -523,7 +523,7 @@ static void arm_timer(struct k_itimer *timer, struct task_struct *p)
 	if (CPUCLOCK_PERTHREAD(timer->it_clock))
 		tick_dep_set_task(p, TICK_DEP_BIT_POSIX_TIMER);
 	else
-		tick_dep_set_signal(p->signal, TICK_DEP_BIT_POSIX_TIMER);
+		tick_dep_set_signal(p, TICK_DEP_BIT_POSIX_TIMER);
 }
 
 /*
@@ -1358,7 +1358,7 @@ void set_process_cpu_timer(struct task_struct *tsk, unsigned int clkid,
 	if (*newval < *nextevt)
 		*nextevt = *newval;
 
-	tick_dep_set_signal(tsk->signal, TICK_DEP_BIT_POSIX_TIMER);
+	tick_dep_set_signal(tsk, TICK_DEP_BIT_POSIX_TIMER);
 }
 
 static int do_cpu_nanosleep(const clockid_t which_clock, int flags,
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 8457f15a5073..a866fd8e7bb5 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -444,9 +444,20 @@ EXPORT_SYMBOL_GPL(tick_nohz_dep_clear_task);
  * Set a per-taskgroup tick dependency. Posix CPU timers need this in order to elapse
  * per process timers.
  */
-void tick_nohz_dep_set_signal(struct signal_struct *sig, enum tick_dep_bits bit)
+void tick_nohz_dep_set_signal(struct task_struct *tsk,
+			      enum tick_dep_bits bit)
 {
-	tick_nohz_dep_set_all(&sig->tick_dep_mask, bit);
+	int prev;
+	struct signal_struct *sig = tsk->signal;
+
+	prev = atomic_fetch_or(BIT(bit), &sig->tick_dep_mask);
+	if (!prev) {
+		struct task_struct *t;
+
+		lockdep_assert_held(&tsk->sighand->siglock);
+		__for_each_thread(sig, t)
+			tick_nohz_kick_task(t);
+	}
 }
 
 void tick_nohz_dep_clear_signal(struct signal_struct *sig, enum tick_dep_bits bit)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 10/10] tick/nohz: Kick only _queued_ task whose tick dependency is updated
  2021-03-11 12:36 [PATCH 00/10] tick/nohz updates Frederic Weisbecker
                   ` (8 preceding siblings ...)
  2021-03-11 12:37 ` [PATCH 09/10] tick/nohz: Change signal tick dependency to wakeup CPUs of member tasks Frederic Weisbecker
@ 2021-03-11 12:37 ` Frederic Weisbecker
  9 siblings, 0 replies; 35+ messages in thread
From: Frederic Weisbecker @ 2021-03-11 12:37 UTC (permalink / raw)
  To: Peter Zijlstra, Thomas Gleixner
  Cc: LKML, Marcelo Tosatti, Rafael J . Wysocki, Ti Zhou, Yunfeng Ye,
	Frederic Weisbecker, Paul E . McKenney, Ingo Molnar

From: Marcelo Tosatti <mtosatti@redhat.com>

When the tick dependency of a task is updated, we want it to aknowledge
the new state and restart the tick if needed. If the task is not
running, we don't need to kick it because it will observe the new
dependency upon scheduling in. But if the task is running, we may need
to send an IPI to it so that it gets notified.

Unfortunately we don't have the means to check if a task is running
in a race free way. Checking p->on_cpu in a synchronized way against
p->tick_dep_mask would imply adding a full barrier between
prepare_task_switch() and tick_nohz_task_switch(), which we want to
avoid in this fast-path.

Therefore we blindly fire an IPI to the task's CPU.

Meanwhile we can check if the task is queued on the CPU rq because
p->on_rq is always set to TASK_ON_RQ_QUEUED _before_ schedule() and its
full barrier that precedes tick_nohz_task_switch(). And if the task is
queued on a nohz_full CPU, it also has fair chances to be running as the
isolation constraints prescribe running single tasks on full dynticks
CPUs.

So use this as a trick to check if we can spare an IPI toward a
non-running task.

NOTE: For the ordering to be correct, it is assumed that we never
deactivate a task while it is running, the only exception being the task
deactivating itself while scheduling out.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Yunfeng Ye <yeyunfeng@huawei.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 include/linux/sched.h    |  2 ++
 kernel/sched/core.c      |  5 +++++
 kernel/time/tick-sched.c | 19 +++++++++++++++++--
 3 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index ef00bb22164c..64dd6f698f3a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1999,6 +1999,8 @@ static inline void set_task_cpu(struct task_struct *p, unsigned int cpu)
 
 #endif /* CONFIG_SMP */
 
+extern bool sched_task_on_rq(struct task_struct *p);
+
 /*
  * In order to reduce various lock holder preemption latencies provide an
  * interface to see if a vCPU is currently running or not.
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 24552911f92b..7ec46f3e8110 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1598,6 +1598,11 @@ static inline void uclamp_post_fork(struct task_struct *p) { }
 static inline void init_uclamp(void) { }
 #endif /* CONFIG_UCLAMP_TASK */
 
+bool sched_task_on_rq(struct task_struct *p)
+{
+	return task_on_rq_queued(p);
+}
+
 static inline void enqueue_task(struct rq *rq, struct task_struct *p, int flags)
 {
 	if (!(flags & ENQUEUE_NOCLOCK))
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index a866fd8e7bb5..015281c3dd8d 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -324,8 +324,6 @@ void tick_nohz_full_kick_cpu(int cpu)
 
 static void tick_nohz_kick_task(struct task_struct *tsk)
 {
-	int cpu = task_cpu(tsk);
-
 	/*
 	 * If the task concurrently migrates to another cpu,
 	 * we guarantee it sees the new tick dependency upon
@@ -340,6 +338,23 @@ static void tick_nohz_kick_task(struct task_struct *tsk)
 	 *   tick_nohz_task_switch()            smp_mb() (atomic_fetch_or())
 	 *      LOAD p->tick_dep_mask           LOAD p->cpu
 	 */
+	int cpu = task_cpu(tsk);
+
+	/*
+	 * If the task is not running, run_posix_cpu_timers
+	 * has nothing to elapsed, can spare IPI in that
+	 * case.
+	 *
+	 * activate_task()                      STORE p->tick_dep_mask
+	 * STORE p->on_rq
+	 * __schedule() (switch to task 'p')    smp_mb() (atomic_fetch_or())
+	 * LOCK rq->lock                        LOAD p->on_rq
+	 * smp_mb__after_spin_lock()
+	 * tick_nohz_task_switch()
+	 *	LOAD p->tick_dep_mask
+	 */
+	if (!sched_task_on_rq(tsk))
+		return;
 
 	preempt_disable();
 	if (cpu_online(cpu))
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 01/10] tick/nohz: Prevent tick_nohz_get_sleep_length() from returning negative value
  2021-03-11 12:36 ` [PATCH 01/10] tick/nohz: Prevent tick_nohz_get_sleep_length() from returning negative value Frederic Weisbecker
@ 2021-03-16 12:21   ` Peter Zijlstra
  2021-03-16 13:37     ` Frederic Weisbecker
  0 siblings, 1 reply; 35+ messages in thread
From: Peter Zijlstra @ 2021-03-16 12:21 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Thomas Gleixner, LKML, Zhou Ti (x2019cwm),
	Rafael J . Wysocki, Yunfeng Ye, Paul E . McKenney,
	Marcelo Tosatti, Ingo Molnar

On Thu, Mar 11, 2021 at 01:36:59PM +0100, Frederic Weisbecker wrote:
> From: "Zhou Ti (x2019cwm)" <x2019cwm@stfx.ca>
> 
> If the hardware clock happens to fire its interrupts late, two possible
> issues can happen while calling tick_nohz_get_sleep_length(). Either:
> 
> 1) The next clockevent device event is due past the last idle entry time.
> 
> or:
> 
> 2) The last timekeeping update happened before the last idle entry time
>    and the next timer callback expires before the last idle entry time.
> 
> Make sure that both cases are handled to avoid returning a negative
> duration to the cpuidle governors.

Why? ... and wouldn't it be cheaper the fix the caller to
check negative once, instead of adding two branches here?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 02/10] tick/nohz: Add tick_nohz_full_this_cpu()
  2021-03-11 12:37 ` [PATCH 02/10] tick/nohz: Add tick_nohz_full_this_cpu() Frederic Weisbecker
@ 2021-03-16 12:28   ` Peter Zijlstra
  2021-03-16 13:05     ` Frederic Weisbecker
  0 siblings, 1 reply; 35+ messages in thread
From: Peter Zijlstra @ 2021-03-16 12:28 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Thomas Gleixner, LKML, Rafael J . Wysocki, Ti Zhou, Yunfeng Ye,
	Paul E . McKenney, Marcelo Tosatti, Ingo Molnar

On Thu, Mar 11, 2021 at 01:37:00PM +0100, Frederic Weisbecker wrote:
> Optimize further the check for local full dynticks CPU. Testing directly
> tick_nohz_full_cpu(smp_processor_id()) is suboptimal because the
> compiler first fetches the CPU number and only then processes the
> static key.
> 
> It's best to evaluate the static branch before anything.

Or you do tricky things like this ;-)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index 7340613c7eff..bd4a6b055b80 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -185,13 +185,12 @@ static inline bool tick_nohz_full_enabled(void)
 	return tick_nohz_full_running;
 }
 
-static inline bool tick_nohz_full_cpu(int cpu)
-{
-	if (!tick_nohz_full_enabled())
-		return false;
-
-	return cpumask_test_cpu(cpu, tick_nohz_full_mask);
-}
+#define tick_nohz_full_cpu(_cpu) ({					\
+	bool __ret = false;						\
+	if (tick_nohz_full_enabled())					\
+		__ret = cpumask_test_cpu((_cpu), tick_nohz_full_mask);	\
+	__ret;								\
+})
 
 static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask)
 {



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 02/10] tick/nohz: Add tick_nohz_full_this_cpu()
  2021-03-16 12:28   ` Peter Zijlstra
@ 2021-03-16 13:05     ` Frederic Weisbecker
  0 siblings, 0 replies; 35+ messages in thread
From: Frederic Weisbecker @ 2021-03-16 13:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, LKML, Rafael J . Wysocki, Ti Zhou, Yunfeng Ye,
	Paul E . McKenney, Marcelo Tosatti, Ingo Molnar

On Tue, Mar 16, 2021 at 01:28:01PM +0100, Peter Zijlstra wrote:
> On Thu, Mar 11, 2021 at 01:37:00PM +0100, Frederic Weisbecker wrote:
> > Optimize further the check for local full dynticks CPU. Testing directly
> > tick_nohz_full_cpu(smp_processor_id()) is suboptimal because the
> > compiler first fetches the CPU number and only then processes the
> > static key.
> > 
> > It's best to evaluate the static branch before anything.
> 
> Or you do tricky things like this ;-)

Good point!

I'll check the asm diff to see if that really does what we want.
I expect it will.

Thanks.

> 
> diff --git a/include/linux/tick.h b/include/linux/tick.h
> index 7340613c7eff..bd4a6b055b80 100644
> --- a/include/linux/tick.h
> +++ b/include/linux/tick.h
> @@ -185,13 +185,12 @@ static inline bool tick_nohz_full_enabled(void)
>  	return tick_nohz_full_running;
>  }
>  
> -static inline bool tick_nohz_full_cpu(int cpu)
> -{
> -	if (!tick_nohz_full_enabled())
> -		return false;
> -
> -	return cpumask_test_cpu(cpu, tick_nohz_full_mask);
> -}
> +#define tick_nohz_full_cpu(_cpu) ({					\
> +	bool __ret = false;						\
> +	if (tick_nohz_full_enabled())					\
> +		__ret = cpumask_test_cpu((_cpu), tick_nohz_full_mask);	\
> +	__ret;								\
> +})
>  
>  static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask)
>  {
> 
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 01/10] tick/nohz: Prevent tick_nohz_get_sleep_length() from returning negative value
  2021-03-16 12:21   ` Peter Zijlstra
@ 2021-03-16 13:37     ` Frederic Weisbecker
  2021-03-16 14:35       ` Peter Zijlstra
  0 siblings, 1 reply; 35+ messages in thread
From: Frederic Weisbecker @ 2021-03-16 13:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, LKML, Zhou Ti (x2019cwm),
	Rafael J . Wysocki, Yunfeng Ye, Paul E . McKenney,
	Marcelo Tosatti, Ingo Molnar

On Tue, Mar 16, 2021 at 01:21:29PM +0100, Peter Zijlstra wrote:
> On Thu, Mar 11, 2021 at 01:36:59PM +0100, Frederic Weisbecker wrote:
> > From: "Zhou Ti (x2019cwm)" <x2019cwm@stfx.ca>
> > 
> > If the hardware clock happens to fire its interrupts late, two possible
> > issues can happen while calling tick_nohz_get_sleep_length(). Either:
> > 
> > 1) The next clockevent device event is due past the last idle entry time.
> > 
> > or:
> > 
> > 2) The last timekeeping update happened before the last idle entry time
> >    and the next timer callback expires before the last idle entry time.
> > 
> > Make sure that both cases are handled to avoid returning a negative
> > duration to the cpuidle governors.
> 
> Why? ... and wouldn't it be cheaper the fix the caller to
> check negative once, instead of adding two branches here?

There are already two callers and potentially two return values to check
for each because the function returns two values.

I'd rather make the API more robust instead of fixing each callers and worrying
about future ones.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 01/10] tick/nohz: Prevent tick_nohz_get_sleep_length() from returning negative value
  2021-03-16 13:37     ` Frederic Weisbecker
@ 2021-03-16 14:35       ` Peter Zijlstra
  2021-03-16 14:53         ` Frederic Weisbecker
  0 siblings, 1 reply; 35+ messages in thread
From: Peter Zijlstra @ 2021-03-16 14:35 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Thomas Gleixner, LKML, Zhou Ti (x2019cwm),
	Rafael J . Wysocki, Yunfeng Ye, Paul E . McKenney,
	Marcelo Tosatti, Ingo Molnar

On Tue, Mar 16, 2021 at 02:37:03PM +0100, Frederic Weisbecker wrote:
> On Tue, Mar 16, 2021 at 01:21:29PM +0100, Peter Zijlstra wrote:
> > On Thu, Mar 11, 2021 at 01:36:59PM +0100, Frederic Weisbecker wrote:
> > > From: "Zhou Ti (x2019cwm)" <x2019cwm@stfx.ca>
> > > 
> > > If the hardware clock happens to fire its interrupts late, two possible
> > > issues can happen while calling tick_nohz_get_sleep_length(). Either:
> > > 
> > > 1) The next clockevent device event is due past the last idle entry time.
> > > 
> > > or:
> > > 
> > > 2) The last timekeeping update happened before the last idle entry time
> > >    and the next timer callback expires before the last idle entry time.
> > > 
> > > Make sure that both cases are handled to avoid returning a negative
> > > duration to the cpuidle governors.
> > 
> > Why? ... and wouldn't it be cheaper the fix the caller to
> > check negative once, instead of adding two branches here?
> 
> There are already two callers and potentially two return values to check
> for each because the function returns two values.
> 
> I'd rather make the API more robust instead of fixing each callers and worrying
> about future ones.

But what's the actual problem? The Changelog doesn't say why returning a
negative value is a problem, and in fact the return value is explicitly
signed.

Anyway, I don't terribly mind the patch, I was just confused by the lack
of actual justification.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 01/10] tick/nohz: Prevent tick_nohz_get_sleep_length() from returning negative value
  2021-03-16 14:35       ` Peter Zijlstra
@ 2021-03-16 14:53         ` Frederic Weisbecker
  2021-03-16 15:26           ` Rafael J. Wysocki
  0 siblings, 1 reply; 35+ messages in thread
From: Frederic Weisbecker @ 2021-03-16 14:53 UTC (permalink / raw)
  To: Peter Zijlstra, Rafael J . Wysocki
  Cc: Thomas Gleixner, LKML, Zhou Ti (x2019cwm),
	Yunfeng Ye, Paul E . McKenney, Marcelo Tosatti, Ingo Molnar

On Tue, Mar 16, 2021 at 03:35:37PM +0100, Peter Zijlstra wrote:
> On Tue, Mar 16, 2021 at 02:37:03PM +0100, Frederic Weisbecker wrote:
> > On Tue, Mar 16, 2021 at 01:21:29PM +0100, Peter Zijlstra wrote:
> > > On Thu, Mar 11, 2021 at 01:36:59PM +0100, Frederic Weisbecker wrote:
> > > > From: "Zhou Ti (x2019cwm)" <x2019cwm@stfx.ca>
> > > > 
> > > > If the hardware clock happens to fire its interrupts late, two possible
> > > > issues can happen while calling tick_nohz_get_sleep_length(). Either:
> > > > 
> > > > 1) The next clockevent device event is due past the last idle entry time.
> > > > 
> > > > or:
> > > > 
> > > > 2) The last timekeeping update happened before the last idle entry time
> > > >    and the next timer callback expires before the last idle entry time.
> > > > 
> > > > Make sure that both cases are handled to avoid returning a negative
> > > > duration to the cpuidle governors.
> > > 
> > > Why? ... and wouldn't it be cheaper the fix the caller to
> > > check negative once, instead of adding two branches here?
> > 
> > There are already two callers and potentially two return values to check
> > for each because the function returns two values.
> > 
> > I'd rather make the API more robust instead of fixing each callers and worrying
> > about future ones.
> 
> But what's the actual problem? The Changelog doesn't say why returning a
> negative value is a problem, and in fact the return value is explicitly
> signed.
> 
> Anyway, I don't terribly mind the patch, I was just confused by the lack
> of actual justification.

And you're right, the motivation is pure FUD: I don't know exactly
how the cpuidle governors may react to such negative values and so this
is just to prevent from potential accident.

Rafael, does that look harmless to you?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 01/10] tick/nohz: Prevent tick_nohz_get_sleep_length() from returning negative value
  2021-03-16 14:53         ` Frederic Weisbecker
@ 2021-03-16 15:26           ` Rafael J. Wysocki
  2021-03-16 15:57             ` 回复: " Zhou Ti (x2019cwm)
  0 siblings, 1 reply; 35+ messages in thread
From: Rafael J. Wysocki @ 2021-03-16 15:26 UTC (permalink / raw)
  To: Frederic Weisbecker, Peter Zijlstra
  Cc: Thomas Gleixner, LKML, Zhou Ti (x2019cwm),
	Yunfeng Ye, Paul E . McKenney, Marcelo Tosatti, Ingo Molnar,
	rafael

On 3/16/2021 3:53 PM, Frederic Weisbecker wrote:
> On Tue, Mar 16, 2021 at 03:35:37PM +0100, Peter Zijlstra wrote:
>> On Tue, Mar 16, 2021 at 02:37:03PM +0100, Frederic Weisbecker wrote:
>>> On Tue, Mar 16, 2021 at 01:21:29PM +0100, Peter Zijlstra wrote:
>>>> On Thu, Mar 11, 2021 at 01:36:59PM +0100, Frederic Weisbecker wrote:
>>>>> From: "Zhou Ti (x2019cwm)" <x2019cwm@stfx.ca>
>>>>>
>>>>> If the hardware clock happens to fire its interrupts late, two possible
>>>>> issues can happen while calling tick_nohz_get_sleep_length(). Either:
>>>>>
>>>>> 1) The next clockevent device event is due past the last idle entry time.
>>>>>
>>>>> or:
>>>>>
>>>>> 2) The last timekeeping update happened before the last idle entry time
>>>>>     and the next timer callback expires before the last idle entry time.
>>>>>
>>>>> Make sure that both cases are handled to avoid returning a negative
>>>>> duration to the cpuidle governors.
>>>> Why? ... and wouldn't it be cheaper the fix the caller to
>>>> check negative once, instead of adding two branches here?
>>> There are already two callers and potentially two return values to check
>>> for each because the function returns two values.
>>>
>>> I'd rather make the API more robust instead of fixing each callers and worrying
>>> about future ones.
>> But what's the actual problem? The Changelog doesn't say why returning a
>> negative value is a problem, and in fact the return value is explicitly
>> signed.
>>
>> Anyway, I don't terribly mind the patch, I was just confused by the lack
>> of actual justification.
> And you're right, the motivation is pure FUD: I don't know exactly
> how the cpuidle governors may react to such negative values and so this
> is just to prevent from potential accident.
>
> Rafael, does that look harmless to you?

No, this is a problem now.  Both governors using this assign the return 
value of it to a u64 var and so negative values confuse them.

That said I think it's better to deal with the issue in the callers.

I can send a patch for that if needed.



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 06/10] timer: Report ignored local enqueue in nohz mode
  2021-03-11 12:37 ` [PATCH 06/10] timer: Report ignored local enqueue in nohz mode Frederic Weisbecker
@ 2021-03-16 15:27   ` Peter Zijlstra
  2021-03-25 13:07     ` Frederic Weisbecker
  0 siblings, 1 reply; 35+ messages in thread
From: Peter Zijlstra @ 2021-03-16 15:27 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Thomas Gleixner, LKML, Rafael J . Wysocki, Ti Zhou, Yunfeng Ye,
	Paul E . McKenney, Marcelo Tosatti, Ingo Molnar

On Thu, Mar 11, 2021 at 01:37:04PM +0100, Frederic Weisbecker wrote:
> Enqueuing a local timer after the tick has been stopped will result in
> the timer being ignored until the next random interrupt.
> 
> Perform sanity checks to report these situations.
> 
> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Paul E. McKenney <paulmck@kernel.org>
> ---
>  kernel/sched/core.c | 20 +++++++++++++++++++-
>  1 file changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index ca2bb629595f..24552911f92b 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -674,6 +674,22 @@ int get_nohz_timer_target(void)
>  	return cpu;
>  }
>  
> +/* Make sure the timer won't be ignored in dynticks-idle case */
> +static void wake_idle_assert_possible(void)
> +{
> +#ifdef CONFIG_SCHED_DEBUG
> +	/*
> +	 * Timers are re-evaluated after idle IRQs. In case of softirq,
> +	 * we assume IRQ tail. Ksoftirqd shouldn't reach here as the
> +	 * timer base wouldn't be idle. And inline softirq processing
> +	 * after a call to local_bh_enable() within idle loop sound too
> +	 * fun to be considered here.
> +	 */
> +	WARN_ONCE(in_task(),
> +		  "Late timer enqueue may be ignored\n");
> +#endif
> +}
> +
>  /*
>   * When add_timer_on() enqueues a timer into the timer wheel of an
>   * idle CPU then this timer might expire before the next timer event
> @@ -688,8 +704,10 @@ static void wake_up_idle_cpu(int cpu)
>  {
>  	struct rq *rq = cpu_rq(cpu);
>  
> -	if (cpu == smp_processor_id())
> +	if (cpu == smp_processor_id()) {
> +		wake_idle_assert_possible();
>  		return;
> +	}
>  
>  	if (set_nr_and_not_polling(rq->idle))
>  		smp_send_reschedule(cpu);

I'm not entirely sure I understand this one. What's the callchain that
leads to this?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* 回复: [PATCH 01/10] tick/nohz: Prevent tick_nohz_get_sleep_length() from returning negative value
  2021-03-16 15:26           ` Rafael J. Wysocki
@ 2021-03-16 15:57             ` Zhou Ti (x2019cwm)
  2021-03-16 16:08               ` Zhou Ti (x2019cwm)
  0 siblings, 1 reply; 35+ messages in thread
From: Zhou Ti (x2019cwm) @ 2021-03-16 15:57 UTC (permalink / raw)
  To: Rafael J. Wysocki, Frederic Weisbecker, Peter Zijlstra
  Cc: Thomas Gleixner, LKML, Yunfeng Ye, Paul E . McKenney,
	Marcelo Tosatti, Ingo Molnar, rafael

Yes, the return of a negative number results in a very large unsigned integer, which idle governors use as a baseline prediction for future interrupts and to correct their own parameters. This problem can lead to the selection of idle states that are too deep, which can be detrimental to both energy and performance.

________________________________________
发件人: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
发送时间: 2021年3月16日 3:26
收件人: Frederic Weisbecker; Peter Zijlstra
抄送: Thomas Gleixner; LKML; Zhou Ti (x2019cwm); Yunfeng Ye; Paul E . McKenney; Marcelo Tosatti; Ingo Molnar; rafael@kernel.org
主题: Re: [PATCH 01/10] tick/nohz: Prevent tick_nohz_get_sleep_length() from returning negative value

On 3/16/2021 3:53 PM, Frederic Weisbecker wrote:
> On Tue, Mar 16, 2021 at 03:35:37PM +0100, Peter Zijlstra wrote:
>> On Tue, Mar 16, 2021 at 02:37:03PM +0100, Frederic Weisbecker wrote:
>>> On Tue, Mar 16, 2021 at 01:21:29PM +0100, Peter Zijlstra wrote:
>>>> On Thu, Mar 11, 2021 at 01:36:59PM +0100, Frederic Weisbecker wrote:
>>>>> From: "Zhou Ti (x2019cwm)" <x2019cwm@stfx.ca>
>>>>>
>>>>> If the hardware clock happens to fire its interrupts late, two possible
>>>>> issues can happen while calling tick_nohz_get_sleep_length(). Either:
>>>>>
>>>>> 1) The next clockevent device event is due past the last idle entry time.
>>>>>
>>>>> or:
>>>>>
>>>>> 2) The last timekeeping update happened before the last idle entry time
>>>>>     and the next timer callback expires before the last idle entry time.
>>>>>
>>>>> Make sure that both cases are handled to avoid returning a negative
>>>>> duration to the cpuidle governors.
>>>> Why? ... and wouldn't it be cheaper the fix the caller to
>>>> check negative once, instead of adding two branches here?
>>> There are already two callers and potentially two return values to check
>>> for each because the function returns two values.
>>>
>>> I'd rather make the API more robust instead of fixing each callers and worrying
>>> about future ones.
>> But what's the actual problem? The Changelog doesn't say why returning a
>> negative value is a problem, and in fact the return value is explicitly
>> signed.
>>
>> Anyway, I don't terribly mind the patch, I was just confused by the lack
>> of actual justification.
> And you're right, the motivation is pure FUD: I don't know exactly
> how the cpuidle governors may react to such negative values and so this
> is just to prevent from potential accident.
>
> Rafael, does that look harmless to you?

No, this is a problem now.  Both governors using this assign the return
value of it to a u64 var and so negative values confuse them.

That said I think it's better to deal with the issue in the callers.

I can send a patch for that if needed.



^ permalink raw reply	[flat|nested] 35+ messages in thread

* 回复: [PATCH 01/10] tick/nohz: Prevent tick_nohz_get_sleep_length() from returning negative value
  2021-03-16 15:57             ` 回复: " Zhou Ti (x2019cwm)
@ 2021-03-16 16:08               ` Zhou Ti (x2019cwm)
  2021-03-16 16:25                 ` Peter Zijlstra
  2021-03-25 13:14                 ` Frederic Weisbecker
  0 siblings, 2 replies; 35+ messages in thread
From: Zhou Ti (x2019cwm) @ 2021-03-16 16:08 UTC (permalink / raw)
  To: Rafael J. Wysocki, Frederic Weisbecker, Peter Zijlstra
  Cc: Thomas Gleixner, LKML, Yunfeng Ye, Paul E . McKenney,
	Marcelo Tosatti, Ingo Molnar, rafael

But I don't think it's a good idea to handle this in callers, because logically the function shouldn't return negative values. Returning 0 directly would allow idle governors to get another chance to select again.

________________________________________
发件人: Zhou Ti (x2019cwm) <x2019cwm@stfx.ca>
发送时间: 2021年3月16日 3:57
收件人: Rafael J. Wysocki; Frederic Weisbecker; Peter Zijlstra
抄送: Thomas Gleixner; LKML; Yunfeng Ye; Paul E . McKenney; Marcelo Tosatti; Ingo Molnar; rafael@kernel.org
主题: 回复: [PATCH 01/10] tick/nohz: Prevent tick_nohz_get_sleep_length() from returning negative value

Yes, the return of a negative number results in a very large unsigned integer, which idle governors use as a baseline prediction for future interrupts and to correct their own parameters. This problem can lead to the selection of idle states that are too deep, which can be detrimental to both energy and performance.

________________________________________
发件人: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
发送时间: 2021年3月16日 3:26
收件人: Frederic Weisbecker; Peter Zijlstra
抄送: Thomas Gleixner; LKML; Zhou Ti (x2019cwm); Yunfeng Ye; Paul E . McKenney; Marcelo Tosatti; Ingo Molnar; rafael@kernel.org
主题: Re: [PATCH 01/10] tick/nohz: Prevent tick_nohz_get_sleep_length() from returning negative value

On 3/16/2021 3:53 PM, Frederic Weisbecker wrote:
> On Tue, Mar 16, 2021 at 03:35:37PM +0100, Peter Zijlstra wrote:
>> On Tue, Mar 16, 2021 at 02:37:03PM +0100, Frederic Weisbecker wrote:
>>> On Tue, Mar 16, 2021 at 01:21:29PM +0100, Peter Zijlstra wrote:
>>>> On Thu, Mar 11, 2021 at 01:36:59PM +0100, Frederic Weisbecker wrote:
>>>>> From: "Zhou Ti (x2019cwm)" <x2019cwm@stfx.ca>
>>>>>
>>>>> If the hardware clock happens to fire its interrupts late, two possible
>>>>> issues can happen while calling tick_nohz_get_sleep_length(). Either:
>>>>>
>>>>> 1) The next clockevent device event is due past the last idle entry time.
>>>>>
>>>>> or:
>>>>>
>>>>> 2) The last timekeeping update happened before the last idle entry time
>>>>>     and the next timer callback expires before the last idle entry time.
>>>>>
>>>>> Make sure that both cases are handled to avoid returning a negative
>>>>> duration to the cpuidle governors.
>>>> Why? ... and wouldn't it be cheaper the fix the caller to
>>>> check negative once, instead of adding two branches here?
>>> There are already two callers and potentially two return values to check
>>> for each because the function returns two values.
>>>
>>> I'd rather make the API more robust instead of fixing each callers and worrying
>>> about future ones.
>> But what's the actual problem? The Changelog doesn't say why returning a
>> negative value is a problem, and in fact the return value is explicitly
>> signed.
>>
>> Anyway, I don't terribly mind the patch, I was just confused by the lack
>> of actual justification.
> And you're right, the motivation is pure FUD: I don't know exactly
> how the cpuidle governors may react to such negative values and so this
> is just to prevent from potential accident.
>
> Rafael, does that look harmless to you?

No, this is a problem now.  Both governors using this assign the return
value of it to a u64 var and so negative values confuse them.

That said I think it's better to deal with the issue in the callers.

I can send a patch for that if needed.



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 回复: [PATCH 01/10] tick/nohz: Prevent tick_nohz_get_sleep_length() from returning negative value
  2021-03-16 16:08               ` Zhou Ti (x2019cwm)
@ 2021-03-16 16:25                 ` Peter Zijlstra
  2021-03-17 21:49                   ` Zhou Ti (x2019cwm)
  2021-03-25 13:14                 ` Frederic Weisbecker
  1 sibling, 1 reply; 35+ messages in thread
From: Peter Zijlstra @ 2021-03-16 16:25 UTC (permalink / raw)
  To: Zhou Ti (x2019cwm), g
  Cc: Rafael J. Wysocki, Frederic Weisbecker, Thomas Gleixner, LKML,
	Yunfeng Ye, Paul E . McKenney, Marcelo Tosatti, Ingo Molnar,
	rafael

On Tue, Mar 16, 2021 at 04:08:08PM +0000, Zhou Ti (x2019cwm) wrote:
> But I don't think it's a good idea to handle this in callers, because
> logically the function shouldn't return negative values. Returning 0
> directly would allow idle governors to get another chance to select
> again.


A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 回复: [PATCH 01/10] tick/nohz: Prevent tick_nohz_get_sleep_length() from returning negative value
  2021-03-16 16:25                 ` Peter Zijlstra
@ 2021-03-17 21:49                   ` Zhou Ti (x2019cwm)
  0 siblings, 0 replies; 35+ messages in thread
From: Zhou Ti (x2019cwm) @ 2021-03-17 21:49 UTC (permalink / raw)
  To: Peter Zijlstra, g
  Cc: Rafael J. Wysocki, Frederic Weisbecker, Thomas Gleixner, LKML,
	Yunfeng Ye, Paul E . McKenney, Marcelo Tosatti, Ingo Molnar,
	rafael

On March 16, 2021 12:25, Peter Zijlstra wrote:
>On Tue, Mar 16, 2021 at 04:08:08PM +0000, Zhou Ti (x2019cwm) wrote:
>> But I don't think it's a good idea to handle this in callers, because
>> logically the function shouldn't return negative values. Returning 0
>> directly would allow idle governors to get another chance to select
>> again.


>A: Because it messes up the order in which people normally read text.
>Q: Why is top-posting such a bad thing?
>A: Top-posting.
>Q: What is the most annoying thing in e-mail?

Sorry about it, I am a newbie, I won't do it again.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 06/10] timer: Report ignored local enqueue in nohz mode
  2021-03-16 15:27   ` Peter Zijlstra
@ 2021-03-25 13:07     ` Frederic Weisbecker
  0 siblings, 0 replies; 35+ messages in thread
From: Frederic Weisbecker @ 2021-03-25 13:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, LKML, Rafael J . Wysocki, Ti Zhou, Yunfeng Ye,
	Paul E . McKenney, Marcelo Tosatti, Ingo Molnar

On Tue, Mar 16, 2021 at 04:27:56PM +0100, Peter Zijlstra wrote:
> On Thu, Mar 11, 2021 at 01:37:04PM +0100, Frederic Weisbecker wrote:
> > Enqueuing a local timer after the tick has been stopped will result in
> > the timer being ignored until the next random interrupt.
> > 
> > Perform sanity checks to report these situations.
> > 
> > Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Cc: Paul E. McKenney <paulmck@kernel.org>
> > ---
> >  kernel/sched/core.c | 20 +++++++++++++++++++-
> >  1 file changed, 19 insertions(+), 1 deletion(-)
> > 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index ca2bb629595f..24552911f92b 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -674,6 +674,22 @@ int get_nohz_timer_target(void)
> >  	return cpu;
> >  }
> >  
> > +/* Make sure the timer won't be ignored in dynticks-idle case */
> > +static void wake_idle_assert_possible(void)
> > +{
> > +#ifdef CONFIG_SCHED_DEBUG
> > +	/*
> > +	 * Timers are re-evaluated after idle IRQs. In case of softirq,
> > +	 * we assume IRQ tail. Ksoftirqd shouldn't reach here as the
> > +	 * timer base wouldn't be idle. And inline softirq processing
> > +	 * after a call to local_bh_enable() within idle loop sound too
> > +	 * fun to be considered here.
> > +	 */
> > +	WARN_ONCE(in_task(),
> > +		  "Late timer enqueue may be ignored\n");
> > +#endif
> > +}
> > +
> >  /*
> >   * When add_timer_on() enqueues a timer into the timer wheel of an
> >   * idle CPU then this timer might expire before the next timer event
> > @@ -688,8 +704,10 @@ static void wake_up_idle_cpu(int cpu)
> >  {
> >  	struct rq *rq = cpu_rq(cpu);
> >  
> > -	if (cpu == smp_processor_id())
> > +	if (cpu == smp_processor_id()) {
> > +		wake_idle_assert_possible();
> >  		return;
> > +	}
> >  
> >  	if (set_nr_and_not_polling(rq->idle))
> >  		smp_send_reschedule(cpu);
> 
> I'm not entirely sure I understand this one. What's the callchain that
> leads to this?

That's while calling add_timer*() or mod_timer() on an idle target.

Now the issue is only relevant when these timer functions are called
after cpuidle_select(), which arguably makes a small vulnerable window
that could be spotted in the future if the timer functions are called
after instrumentation_end()?

Thanks.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 回复: [PATCH 01/10] tick/nohz: Prevent tick_nohz_get_sleep_length() from returning negative value
  2021-03-16 16:08               ` Zhou Ti (x2019cwm)
  2021-03-16 16:25                 ` Peter Zijlstra
@ 2021-03-25 13:14                 ` Frederic Weisbecker
  2021-03-25 18:56                   ` Rafael J. Wysocki
  1 sibling, 1 reply; 35+ messages in thread
From: Frederic Weisbecker @ 2021-03-25 13:14 UTC (permalink / raw)
  To: Zhou Ti (x2019cwm)
  Cc: Rafael J. Wysocki, Peter Zijlstra, Thomas Gleixner, LKML,
	Yunfeng Ye, Paul E . McKenney, Marcelo Tosatti, Ingo Molnar,
	rafael

On Tue, Mar 16, 2021 at 04:08:08PM +0000, Zhou Ti (x2019cwm) wrote:
> But I don't think it's a good idea to handle this in callers, because logically the function shouldn't return negative values. Returning 0 directly would allow idle governors to get another chance to select again.

Hmm, I'm going to leave the last word to Rafael since cpuidle are the only
callers of this. In any case we need to fix it.

Thanks.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 回复: [PATCH 01/10] tick/nohz: Prevent tick_nohz_get_sleep_length() from returning negative value
  2021-03-25 13:14                 ` Frederic Weisbecker
@ 2021-03-25 18:56                   ` Rafael J. Wysocki
  2021-03-25 19:18                     ` Zhou Ti (x2019cwm)
  0 siblings, 1 reply; 35+ messages in thread
From: Rafael J. Wysocki @ 2021-03-25 18:56 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Zhou Ti (x2019cwm),
	linux-kernel, Rafael J. Wysocki, Peter Zijlstra, Thomas Gleixner,
	LKML, Yunfeng Ye, Paul E . McKenney, Marcelo Tosatti,
	Ingo Molnar, rafael, Linux PM

On Thursday, March 25, 2021 2:14:00 PM CET Frederic Weisbecker wrote:
> On Tue, Mar 16, 2021 at 04:08:08PM +0000, Zhou Ti (x2019cwm) wrote:
> > But I don't think it's a good idea to handle this in callers, because logically the function shouldn't return negative values. Returning 0 directly would allow idle governors to get another chance to select again.
> 
> Hmm, I'm going to leave the last word to Rafael since cpuidle are the only
> callers of this. In any case we need to fix it.

Yes, we do.

So I said that I preferred to address this in the callers and the reason why
is because, for example, for the teo governor it would be a matter of using
a different data type to store the tick_nohz_get_sleep_length() return value,
like in the (untested) patch below.

So at least in this case there is no need to add any new branches anywhere.

I'm still not sure about menu, because it is more complicated, but even if
that one needs an extra branch, that is a win already.

---
 drivers/cpuidle/governors/teo.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Index: linux-pm/drivers/cpuidle/governors/teo.c
===================================================================
--- linux-pm.orig/drivers/cpuidle/governors/teo.c
+++ linux-pm/drivers/cpuidle/governors/teo.c
@@ -100,8 +100,8 @@ struct teo_idle_state {
  * @intervals: Saved idle duration values.
  */
 struct teo_cpu {
-	u64 time_span_ns;
-	u64 sleep_length_ns;
+	s64 time_span_ns;
+	s64 sleep_length_ns;
 	struct teo_idle_state states[CPUIDLE_STATE_MAX];
 	int interval_idx;
 	u64 intervals[INTERVALS];
@@ -216,7 +216,7 @@ static bool teo_time_ok(u64 interval_ns)
  */
 static int teo_find_shallower_state(struct cpuidle_driver *drv,
 				    struct cpuidle_device *dev, int state_idx,
-				    u64 duration_ns)
+				    s64 duration_ns)
 {
 	int i;
 
@@ -242,7 +242,7 @@ static int teo_select(struct cpuidle_dri
 {
 	struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
 	s64 latency_req = cpuidle_governor_latency_req(dev->cpu);
-	u64 duration_ns;
+	s64 duration_ns;
 	unsigned int hits, misses, early_hits;
 	int max_early_idx, prev_max_early_idx, constraint_idx, idx, i;
 	ktime_t delta_tick;




^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 回复: [PATCH 01/10] tick/nohz: Prevent tick_nohz_get_sleep_length() from returning negative value
  2021-03-25 18:56                   ` Rafael J. Wysocki
@ 2021-03-25 19:18                     ` Zhou Ti (x2019cwm)
  2021-03-25 19:50                       ` Rafael J. Wysocki
  0 siblings, 1 reply; 35+ messages in thread
From: Zhou Ti (x2019cwm) @ 2021-03-25 19:18 UTC (permalink / raw)
  To: Rafael J. Wysocki, Frederic Weisbecker
  Cc: LKML, Rafael J. Wysocki, Peter Zijlstra, Thomas Gleixner,
	Yunfeng Ye, Paul E . McKenney, Marcelo Tosatti, Ingo Molnar,
	rafael, Linux PM

On March 25, 2021 14:56, Rafael J. Wysocki wrote:
> On Thursday, March 25, 2021 2:14:00 PM CET Frederic Weisbecker wrote:
> > On Tue, Mar 16, 2021 at 04:08:08PM +0000, Zhou Ti (x2019cwm) wrote:
> > > But I don't think it's a good idea to handle this in callers, because logically the function shouldn't return negative values. Returning 0 directly would allow idle governors to get another chance to select again.
> >
> > Hmm, I'm going to leave the last word to Rafael since cpuidle are the only
> > callers of this. In any case we need to fix it.
> 
> Yes, we do.
> 
> So I said that I preferred to address this in the callers and the reason why
> is because, for example, for the teo governor it would be a matter of using
> a different data type to store the tick_nohz_get_sleep_length() return value,
> like in the (untested) patch below.
> 
> So at least in this case there is no need to add any new branches anywhere.
> 
> I'm still not sure about menu, because it is more complicated, but even if
> that one needs an extra branch, that is a win already.

I would like to point out the potential trouble that fixing this issue in the 
callers could cause.

1. This function is called multiple times in menu governor and TEO 
governor. I'm not sure that receiving results using signed integers is enough 
to solve all the problems, in the worst case it may require increasing 
the logical complexity of the code.

2. This function is important for developing idle governor. 
If the problem is not fixed in the function itself, then this potential 
pitfall should be explicitly stated in the documentation. This is because 
it is difficult to predict from the definition and naming of the function 
that it might return a negative number. I actually discovered this anomaly 
when I was doing data analysis on my own idle governor. For some idle control 
algorithms, this exception return could lead to serious consequences, 
because negative return logically won't happen.

> 
> ---
>  drivers/cpuidle/governors/teo.c |    8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> Index: linux-pm/drivers/cpuidle/governors/teo.c
> ===================================================================
> --- linux-pm.orig/drivers/cpuidle/governors/teo.c
> +++ linux-pm/drivers/cpuidle/governors/teo.c
> @@ -100,8 +100,8 @@ struct teo_idle_state {
>   * @intervals: Saved idle duration values.
>   */
>  struct teo_cpu {
> -       u64 time_span_ns;
> -       u64 sleep_length_ns;
> +       s64 time_span_ns;
> +       s64 sleep_length_ns;
>         struct teo_idle_state states[CPUIDLE_STATE_MAX];
>         int interval_idx;
>         u64 intervals[INTERVALS];
> @@ -216,7 +216,7 @@ static bool teo_time_ok(u64 interval_ns)
>   */
>  static int teo_find_shallower_state(struct cpuidle_driver *drv,
>                                     struct cpuidle_device *dev, int state_idx,
> -                                   u64 duration_ns)
> +                                   s64 duration_ns)
>  {
>         int i;
> 
> @@ -242,7 +242,7 @@ static int teo_select(struct cpuidle_dri
>  {
>         struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
>         s64 latency_req = cpuidle_governor_latency_req(dev->cpu);
> -       u64 duration_ns;
> +       s64 duration_ns;
>         unsigned int hits, misses, early_hits;
>         int max_early_idx, prev_max_early_idx, constraint_idx, idx, i;
>         ktime_t delta_tick;


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 回复: [PATCH 01/10] tick/nohz: Prevent tick_nohz_get_sleep_length() from returning negative value
  2021-03-25 19:18                     ` Zhou Ti (x2019cwm)
@ 2021-03-25 19:50                       ` Rafael J. Wysocki
  2021-03-25 20:37                         ` Zhou Ti (x2019cwm)
  0 siblings, 1 reply; 35+ messages in thread
From: Rafael J. Wysocki @ 2021-03-25 19:50 UTC (permalink / raw)
  To: Zhou Ti (x2019cwm)
  Cc: Rafael J. Wysocki, Frederic Weisbecker, LKML, Rafael J. Wysocki,
	Peter Zijlstra, Thomas Gleixner, Yunfeng Ye, Paul E . McKenney,
	Marcelo Tosatti, Ingo Molnar, rafael, Linux PM

On Thu, Mar 25, 2021 at 8:18 PM Zhou Ti (x2019cwm) <x2019cwm@stfx.ca> wrote:
>
> On March 25, 2021 14:56, Rafael J. Wysocki wrote:
> > On Thursday, March 25, 2021 2:14:00 PM CET Frederic Weisbecker wrote:
> > > On Tue, Mar 16, 2021 at 04:08:08PM +0000, Zhou Ti (x2019cwm) wrote:
> > > > But I don't think it's a good idea to handle this in callers, because logically the function shouldn't return negative values. Returning 0 directly would allow idle governors to get another chance to select again.
> > >
> > > Hmm, I'm going to leave the last word to Rafael since cpuidle are the only
> > > callers of this. In any case we need to fix it.
> >
> > Yes, we do.
> >
> > So I said that I preferred to address this in the callers and the reason why
> > is because, for example, for the teo governor it would be a matter of using
> > a different data type to store the tick_nohz_get_sleep_length() return value,
> > like in the (untested) patch below.
> >
> > So at least in this case there is no need to add any new branches anywhere.
> >
> > I'm still not sure about menu, because it is more complicated, but even if
> > that one needs an extra branch, that is a win already.
>
> I would like to point out the potential trouble that fixing this issue in the
> callers could cause.
>
> 1. This function is called multiple times in menu governor and TEO
> governor.

What do you mean by "multiple times"?

Each of the governors calls it once per cycle and its previous return
value is not used in the next cycle at least in teo.

> I'm not sure that receiving results using signed integers is enough
> to solve all the problems, in the worst case it may require increasing
> the logical complexity of the code.

That is a valid concern, so it is a tradeoff between increasing the
logical complexity of the code and adding branches to it.

> 2. This function is important for developing idle governor.
> If the problem is not fixed in the function itself, then this potential
> pitfall should be explicitly stated in the documentation.

That I can agree with.

> This is because
> it is difficult to predict from the definition and naming of the function
> that it might return a negative number. I actually discovered this anomaly
> when I was doing data analysis on my own idle governor. For some idle control
> algorithms, this exception return could lead to serious consequences,
> because negative return logically won't happen.

Well, it's a matter of how to take the possible negative return value
into account so it does not affect the result of the computations.

> >
> > ---
> >  drivers/cpuidle/governors/teo.c |    8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > Index: linux-pm/drivers/cpuidle/governors/teo.c
> > ===================================================================
> > --- linux-pm.orig/drivers/cpuidle/governors/teo.c
> > +++ linux-pm/drivers/cpuidle/governors/teo.c
> > @@ -100,8 +100,8 @@ struct teo_idle_state {
> >   * @intervals: Saved idle duration values.
> >   */
> >  struct teo_cpu {
> > -       u64 time_span_ns;
> > -       u64 sleep_length_ns;
> > +       s64 time_span_ns;
> > +       s64 sleep_length_ns;
> >         struct teo_idle_state states[CPUIDLE_STATE_MAX];
> >         int interval_idx;
> >         u64 intervals[INTERVALS];
> > @@ -216,7 +216,7 @@ static bool teo_time_ok(u64 interval_ns)
> >   */
> >  static int teo_find_shallower_state(struct cpuidle_driver *drv,
> >                                     struct cpuidle_device *dev, int state_idx,
> > -                                   u64 duration_ns)
> > +                                   s64 duration_ns)
> >  {
> >         int i;
> >
> > @@ -242,7 +242,7 @@ static int teo_select(struct cpuidle_dri
> >  {
> >         struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
> >         s64 latency_req = cpuidle_governor_latency_req(dev->cpu);
> > -       u64 duration_ns;
> > +       s64 duration_ns;
> >         unsigned int hits, misses, early_hits;
> >         int max_early_idx, prev_max_early_idx, constraint_idx, idx, i;
> >         ktime_t delta_tick;
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 回复: [PATCH 01/10] tick/nohz: Prevent tick_nohz_get_sleep_length() from returning negative value
  2021-03-25 19:50                       ` Rafael J. Wysocki
@ 2021-03-25 20:37                         ` Zhou Ti (x2019cwm)
  2021-03-26 17:01                           ` Rafael J. Wysocki
  0 siblings, 1 reply; 35+ messages in thread
From: Zhou Ti (x2019cwm) @ 2021-03-25 20:37 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Frederic Weisbecker, LKML, Rafael J. Wysocki,
	Peter Zijlstra, Thomas Gleixner, Yunfeng Ye, Paul E . McKenney,
	Marcelo Tosatti, Ingo Molnar, Linux PM

On March 25, 2021 15:50, Rafael J. Wysocki wrote:
> On Thu, Mar 25, 2021 at 8:18 PM Zhou Ti (x2019cwm) <x2019cwm@stfx.ca> wrote:
> >
> > On March 25, 2021 14:56, Rafael J. Wysocki wrote:
> > > On Thursday, March 25, 2021 2:14:00 PM CET Frederic Weisbecker wrote:
> > > > On Tue, Mar 16, 2021 at 04:08:08PM +0000, Zhou Ti (x2019cwm) wrote:
> > > > > But I don't think it's a good idea to handle this in callers, because logically the function shouldn't return negative values. Returning 0 directly would allow idle governors to get another chance to select again.
> > > >
> > > > Hmm, I'm going to leave the last word to Rafael since cpuidle are the only
> > > > callers of this. In any case we need to fix it.
> > >
> > > Yes, we do.
> > >
> > > So I said that I preferred to address this in the callers and the reason why
> > > is because, for example, for the teo governor it would be a matter of using
> > > a different data type to store the tick_nohz_get_sleep_length() return value,
> > > like in the (untested) patch below.
> > >
> > > So at least in this case there is no need to add any new branches anywhere.
> > >
> > > I'm still not sure about menu, because it is more complicated, but even if
> > > that one needs an extra branch, that is a win already.
> >
> > I would like to point out the potential trouble that fixing this issue in the
> > callers could cause.
> >
> > 1. This function is called multiple times in menu governor and TEO
> > governor.
> 
> What do you mean by "multiple times"?
> 
> Each of the governors calls it once per cycle and its previous return
> value is not used in the next cycle at least in teo.

I remember a governor called this function twice in a cycle, I guess I remember 
wrong.

> 
> > I'm not sure that receiving results using signed integers is enough
> > to solve all the problems, in the worst case it may require increasing
> > the logical complexity of the code.
> 
> That is a valid concern, so it is a tradeoff between increasing the
> logical complexity of the code and adding branches to it.
> 
> > 2. This function is important for developing idle governor.
> > If the problem is not fixed in the function itself, then this potential
> > pitfall should be explicitly stated in the documentation.
> 
> That I can agree with.
> 
> > This is because
> > it is difficult to predict from the definition and naming of the function
> > that it might return a negative number. I actually discovered this anomaly
> > when I was doing data analysis on my own idle governor. For some idle control
> > algorithms, this exception return could lead to serious consequences,
> > because negative return logically won't happen.
> 
> Well, it's a matter of how to take the possible negative return value
> into account so it does not affect the result of the computations.

I think it is challenging for some algorithms to take negative return values 
into account properly. For TEO (and even menu), it is possible to 
solve the problem by just changing the way the data is received is because the 
learning mechanism for both algorithms is simple. 

One of the interesting things about the CPUIdle subsystem is that it is well 
suited to introduce machine learning and probabilistic statistical methods.
This means that many of the more complex and data-sensitive algorithms can 
potentially be explored. In the best case we will still need to add additional 
code complexity to a new algorithm.

It would reduce a lot of unnecessary considerations (for example, highlight 
this shortcoming in the documentation) if we could ensure that this function 
would work as it is logically defined. But I don't really understand 
how much of a burden adding an extra branch would impose, so I don't know if 
this tradeoff is worth it.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 回复: [PATCH 01/10] tick/nohz: Prevent tick_nohz_get_sleep_length() from returning negative value
  2021-03-25 20:37                         ` Zhou Ti (x2019cwm)
@ 2021-03-26 17:01                           ` Rafael J. Wysocki
  2021-03-26 17:53                             ` Zhou Ti (x2019cwm)
  0 siblings, 1 reply; 35+ messages in thread
From: Rafael J. Wysocki @ 2021-03-26 17:01 UTC (permalink / raw)
  To: Zhou Ti (x2019cwm)
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Frederic Weisbecker, LKML,
	Rafael J. Wysocki, Peter Zijlstra, Thomas Gleixner, Yunfeng Ye,
	Paul E . McKenney, Marcelo Tosatti, Ingo Molnar, Linux PM

On Thu, Mar 25, 2021 at 9:37 PM Zhou Ti (x2019cwm) <x2019cwm@stfx.ca> wrote:
>
> On March 25, 2021 15:50, Rafael J. Wysocki wrote:
> > On Thu, Mar 25, 2021 at 8:18 PM Zhou Ti (x2019cwm) <x2019cwm@stfx.ca> wrote:
> > >
> > > On March 25, 2021 14:56, Rafael J. Wysocki wrote:
> > > > On Thursday, March 25, 2021 2:14:00 PM CET Frederic Weisbecker wrote:
> > > > > On Tue, Mar 16, 2021 at 04:08:08PM +0000, Zhou Ti (x2019cwm) wrote:
> > > > > > But I don't think it's a good idea to handle this in callers, because logically the function shouldn't return negative values. Returning 0 directly would allow idle governors to get another chance to select again.
> > > > >
> > > > > Hmm, I'm going to leave the last word to Rafael since cpuidle are the only
> > > > > callers of this. In any case we need to fix it.
> > > >
> > > > Yes, we do.
> > > >
> > > > So I said that I preferred to address this in the callers and the reason why
> > > > is because, for example, for the teo governor it would be a matter of using
> > > > a different data type to store the tick_nohz_get_sleep_length() return value,
> > > > like in the (untested) patch below.
> > > >
> > > > So at least in this case there is no need to add any new branches anywhere.
> > > >
> > > > I'm still not sure about menu, because it is more complicated, but even if
> > > > that one needs an extra branch, that is a win already.
> > >
> > > I would like to point out the potential trouble that fixing this issue in the
> > > callers could cause.
> > >
> > > 1. This function is called multiple times in menu governor and TEO
> > > governor.
> >
> > What do you mean by "multiple times"?
> >
> > Each of the governors calls it once per cycle and its previous return
> > value is not used in the next cycle at least in teo.
>
> I remember a governor called this function twice in a cycle, I guess I remember
> wrong.

That obviously depends on the governor, but both teo and menu call it
once per cycle.

> > > I'm not sure that receiving results using signed integers is enough
> > > to solve all the problems, in the worst case it may require increasing
> > > the logical complexity of the code.
> >
> > That is a valid concern, so it is a tradeoff between increasing the
> > logical complexity of the code and adding branches to it.
> >
> > > 2. This function is important for developing idle governor.
> > > If the problem is not fixed in the function itself, then this potential
> > > pitfall should be explicitly stated in the documentation.
> >
> > That I can agree with.
> >
> > > This is because
> > > it is difficult to predict from the definition and naming of the function
> > > that it might return a negative number. I actually discovered this anomaly
> > > when I was doing data analysis on my own idle governor. For some idle control
> > > algorithms, this exception return could lead to serious consequences,
> > > because negative return logically won't happen.
> >
> > Well, it's a matter of how to take the possible negative return value
> > into account so it does not affect the result of the computations.
>
> I think it is challenging for some algorithms to take negative return values
> into account properly. For TEO (and even menu), it is possible to
> solve the problem by just changing the way the data is received is because the
> learning mechanism for both algorithms is simple.

Of course this depends on the governor.

> One of the interesting things about the CPUIdle subsystem is that it is well
> suited to introduce machine learning and probabilistic statistical methods.

You need to remember that the governor code runs in the idle loop
context which is expected to be reasonably fast.

That's why we are worrying about individual branches here.

> This means that many of the more complex and data-sensitive algorithms can
> potentially be explored. In the best case we will still need to add additional
> code complexity to a new algorithm.

So I'm not sure what the problem with adding an upfront negative value
check to the governor is.

> It would reduce a lot of unnecessary considerations (for example, highlight
> this shortcoming in the documentation) if we could ensure that this function
> would work as it is logically defined. But I don't really understand
> how much of a burden adding an extra branch would impose, so I don't know if
> this tradeoff is worth it.

It ultimately depends on the governor, which is why I think that the
negative value check should be done by the governor, if needed, and
not by the function called by it, because in the latter case the check
may be redundant and we end up with an extra branch (or two branches
in this particular case) for no good reason whatsoever.

Yes, there are governors which simply can do the negative value check
upfront right after calling that function and ensure that they will
not deal with negative values going forward.  This is probably what
I'll do in the menu case.

However, if the governor is simple enough and it can avoid doing the
explicit negative value check, I don't see a reason to do that check
elsewhere "just in case".

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 回复: [PATCH 01/10] tick/nohz: Prevent tick_nohz_get_sleep_length() from returning negative value
  2021-03-26 17:01                           ` Rafael J. Wysocki
@ 2021-03-26 17:53                             ` Zhou Ti (x2019cwm)
  2021-03-26 18:54                               ` Rafael J. Wysocki
  0 siblings, 1 reply; 35+ messages in thread
From: Zhou Ti (x2019cwm) @ 2021-03-26 17:53 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Frederic Weisbecker, LKML, Rafael J. Wysocki,
	Peter Zijlstra, Thomas Gleixner, Yunfeng Ye, Paul E . McKenney,
	Marcelo Tosatti, Ingo Molnar, Linux PM

On Fri, 26 Mar 2021 18:01:47 +0100, Rafael J. Wysocki wrote:
> On Thu, Mar 25, 2021 at 9:37 PM Zhou Ti (x2019cwm) <x2019cwm@stfx.ca> wrote:
> >
> > On March 25, 2021 15:50, Rafael J. Wysocki wrote:
> > > On Thu, Mar 25, 2021 at 8:18 PM Zhou Ti (x2019cwm) <x2019cwm@stfx.ca> wrote:
> > > >
> > > > On March 25, 2021 14:56, Rafael J. Wysocki wrote:
> > > > > On Thursday, March 25, 2021 2:14:00 PM CET Frederic Weisbecker wrote:
> > > > > > On Tue, Mar 16, 2021 at 04:08:08PM +0000, Zhou Ti (x2019cwm) wrote:
> > > > > > > But I don't think it's a good idea to handle this in callers, because logically the function shouldn't return negative values. Returning 0 directly would allow idle governors to get another chance to select again.
> > > > > >
> > > > > > Hmm, I'm going to leave the last word to Rafael since cpuidle are the only
> > > > > > callers of this. In any case we need to fix it.
> > > > >
> > > > > Yes, we do.
> > > > >
> > > > > So I said that I preferred to address this in the callers and the reason why
> > > > > is because, for example, for the teo governor it would be a matter of using
> > > > > a different data type to store the tick_nohz_get_sleep_length() return value,
> > > > > like in the (untested) patch below.
> > > > >
> > > > > So at least in this case there is no need to add any new branches anywhere.
> > > > >
> > > > > I'm still not sure about menu, because it is more complicated, but even if
> > > > > that one needs an extra branch, that is a win already.
> > > >
> > > > I would like to point out the potential trouble that fixing this issue in the
> > > > callers could cause.
> > > >
> > > > 1. This function is called multiple times in menu governor and TEO
> > > > governor.
> > >
> > > What do you mean by "multiple times"?
> > >
> > > Each of the governors calls it once per cycle and its previous return
> > > value is not used in the next cycle at least in teo.
> >
> > I remember a governor called this function twice in a cycle, I guess I remember
> > wrong.
>
> That obviously depends on the governor, but both teo and menu call it
> once per cycle.
>
> > > > I'm not sure that receiving results using signed integers is enough
> > > > to solve all the problems, in the worst case it may require increasing
> > > > the logical complexity of the code.
> > >
> > > That is a valid concern, so it is a tradeoff between increasing the
> > > logical complexity of the code and adding branches to it.
> > >
> > > > 2. This function is important for developing idle governor.
> > > > If the problem is not fixed in the function itself, then this potential
> > > > pitfall should be explicitly stated in the documentation.
> > >
> > > That I can agree with.
> > >
> > > > This is because
> > > > it is difficult to predict from the definition and naming of the function
> > > > that it might return a negative number. I actually discovered this anomaly
> > > > when I was doing data analysis on my own idle governor. For some idle control
> > > > algorithms, this exception return could lead to serious consequences,
> > > > because negative return logically won't happen.
> > >
> > > Well, it's a matter of how to take the possible negative return value
> > > into account so it does not affect the result of the computations.
> >
> > I think it is challenging for some algorithms to take negative return values
> > into account properly. For TEO (and even menu), it is possible to
> > solve the problem by just changing the way the data is received is because the
> > learning mechanism for both algorithms is simple.
>
> Of course this depends on the governor.
>
> > One of the interesting things about the CPUIdle subsystem is that it is well
> > suited to introduce machine learning and probabilistic statistical methods.
>
> You need to remember that the governor code runs in the idle loop
> context which is expected to be reasonably fast.
>
> That's why we are worrying about individual branches here.
>
> > This means that many of the more complex and data-sensitive algorithms can
> > potentially be explored. In the best case we will still need to add additional
> > code complexity to a new algorithm.
>
> So I'm not sure what the problem with adding an upfront negative value
> check to the governor is.
>
> > It would reduce a lot of unnecessary considerations (for example, highlight
> > this shortcoming in the documentation) if we could ensure that this function
> > would work as it is logically defined. But I don't really understand
> > how much of a burden adding an extra branch would impose, so I don't know if
> > this tradeoff is worth it.
>
> It ultimately depends on the governor, which is why I think that the
> negative value check should be done by the governor, if needed, and
> not by the function called by it, because in the latter case the check
> may be redundant and we end up with an extra branch (or two branches
> in this particular case) for no good reason whatsoever.
>
> Yes, there are governors which simply can do the negative value check
> upfront right after calling that function and ensure that they will
> not deal with negative values going forward.  This is probably what
> I'll do in the menu case.
>
> However, if the governor is simple enough and it can avoid doing the
> explicit negative value check, I don't see a reason to do that check
> elsewhere "just in case".

Makes sense. I will submit my patch to fix this issue in menu and TEO.

________________________________________
From: Rafael J. Wysocki <rafael@kernel.org>
Sent: March 26, 2021 13:01
To: Zhou Ti (x2019cwm)
Cc: Rafael J. Wysocki; Rafael J. Wysocki; Frederic Weisbecker; LKML; Rafael J. Wysocki; Peter Zijlstra; Thomas Gleixner; Yunfeng Ye; Paul E . McKenney; Marcelo Tosatti; Ingo Molnar; Linux PM
Subject: Re: 回复: [PATCH 01/10] tick/nohz: Prevent tick_nohz_get_sleep_length() from returning negative value

On Thu, Mar 25, 2021 at 9:37 PM Zhou Ti (x2019cwm) <x2019cwm@stfx.ca> wrote:
>
> On March 25, 2021 15:50, Rafael J. Wysocki wrote:
> > On Thu, Mar 25, 2021 at 8:18 PM Zhou Ti (x2019cwm) <x2019cwm@stfx.ca> wrote:
> > >
> > > On March 25, 2021 14:56, Rafael J. Wysocki wrote:
> > > > On Thursday, March 25, 2021 2:14:00 PM CET Frederic Weisbecker wrote:
> > > > > On Tue, Mar 16, 2021 at 04:08:08PM +0000, Zhou Ti (x2019cwm) wrote:
> > > > > > But I don't think it's a good idea to handle this in callers, because logically the function shouldn't return negative values. Returning 0 directly would allow idle governors to get another chance to select again.
> > > > >
> > > > > Hmm, I'm going to leave the last word to Rafael since cpuidle are the only
> > > > > callers of this. In any case we need to fix it.
> > > >
> > > > Yes, we do.
> > > >
> > > > So I said that I preferred to address this in the callers and the reason why
> > > > is because, for example, for the teo governor it would be a matter of using
> > > > a different data type to store the tick_nohz_get_sleep_length() return value,
> > > > like in the (untested) patch below.
> > > >
> > > > So at least in this case there is no need to add any new branches anywhere.
> > > >
> > > > I'm still not sure about menu, because it is more complicated, but even if
> > > > that one needs an extra branch, that is a win already.
> > >
> > > I would like to point out the potential trouble that fixing this issue in the
> > > callers could cause.
> > >
> > > 1. This function is called multiple times in menu governor and TEO
> > > governor.
> >
> > What do you mean by "multiple times"?
> >
> > Each of the governors calls it once per cycle and its previous return
> > value is not used in the next cycle at least in teo.
>
> I remember a governor called this function twice in a cycle, I guess I remember
> wrong.

That obviously depends on the governor, but both teo and menu call it
once per cycle.

> > > I'm not sure that receiving results using signed integers is enough
> > > to solve all the problems, in the worst case it may require increasing
> > > the logical complexity of the code.
> >
> > That is a valid concern, so it is a tradeoff between increasing the
> > logical complexity of the code and adding branches to it.
> >
> > > 2. This function is important for developing idle governor.
> > > If the problem is not fixed in the function itself, then this potential
> > > pitfall should be explicitly stated in the documentation.
> >
> > That I can agree with.
> >
> > > This is because
> > > it is difficult to predict from the definition and naming of the function
> > > that it might return a negative number. I actually discovered this anomaly
> > > when I was doing data analysis on my own idle governor. For some idle control
> > > algorithms, this exception return could lead to serious consequences,
> > > because negative return logically won't happen.
> >
> > Well, it's a matter of how to take the possible negative return value
> > into account so it does not affect the result of the computations.
>
> I think it is challenging for some algorithms to take negative return values
> into account properly. For TEO (and even menu), it is possible to
> solve the problem by just changing the way the data is received is because the
> learning mechanism for both algorithms is simple.

Of course this depends on the governor.

> One of the interesting things about the CPUIdle subsystem is that it is well
> suited to introduce machine learning and probabilistic statistical methods.

You need to remember that the governor code runs in the idle loop
context which is expected to be reasonably fast.

That's why we are worrying about individual branches here.

> This means that many of the more complex and data-sensitive algorithms can
> potentially be explored. In the best case we will still need to add additional
> code complexity to a new algorithm.

So I'm not sure what the problem with adding an upfront negative value
check to the governor is.

> It would reduce a lot of unnecessary considerations (for example, highlight
> this shortcoming in the documentation) if we could ensure that this function
> would work as it is logically defined. But I don't really understand
> how much of a burden adding an extra branch would impose, so I don't know if
> this tradeoff is worth it.

It ultimately depends on the governor, which is why I think that the
negative value check should be done by the governor, if needed, and
not by the function called by it, because in the latter case the check
may be redundant and we end up with an extra branch (or two branches
in this particular case) for no good reason whatsoever.

Yes, there are governors which simply can do the negative value check
upfront right after calling that function and ensure that they will
not deal with negative values going forward.  This is probably what
I'll do in the menu case.

However, if the governor is simple enough and it can avoid doing the
explicit negative value check, I don't see a reason to do that check
elsewhere "just in case".

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 回复: [PATCH 01/10] tick/nohz: Prevent tick_nohz_get_sleep_length() from returning negative value
  2021-03-26 17:53                             ` Zhou Ti (x2019cwm)
@ 2021-03-26 18:54                               ` Rafael J. Wysocki
  2021-03-26 22:53                                 ` Zhou Ti (x2019cwm)
  0 siblings, 1 reply; 35+ messages in thread
From: Rafael J. Wysocki @ 2021-03-26 18:54 UTC (permalink / raw)
  To: Zhou Ti (x2019cwm)
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Frederic Weisbecker, LKML,
	Rafael J. Wysocki, Peter Zijlstra, Thomas Gleixner, Yunfeng Ye,
	Paul E . McKenney, Marcelo Tosatti, Ingo Molnar, Linux PM

On Fri, Mar 26, 2021 at 6:53 PM Zhou Ti (x2019cwm) <x2019cwm@stfx.ca> wrote:
>
> On Fri, 26 Mar 2021 18:01:47 +0100, Rafael J. Wysocki wrote:
> > On Thu, Mar 25, 2021 at 9:37 PM Zhou Ti (x2019cwm) <x2019cwm@stfx.ca> wrote:
> > >
> > > On March 25, 2021 15:50, Rafael J. Wysocki wrote:
> > > > On Thu, Mar 25, 2021 at 8:18 PM Zhou Ti (x2019cwm) <x2019cwm@stfx.ca> wrote:
> > > > >
> > > > > On March 25, 2021 14:56, Rafael J. Wysocki wrote:
> > > > > > On Thursday, March 25, 2021 2:14:00 PM CET Frederic Weisbecker wrote:
> > > > > > > On Tue, Mar 16, 2021 at 04:08:08PM +0000, Zhou Ti (x2019cwm) wrote:
> > > > > > > > But I don't think it's a good idea to handle this in callers, because logically the function shouldn't return negative values. Returning 0 directly would allow idle governors to get another chance to select again.
> > > > > > >
> > > > > > > Hmm, I'm going to leave the last word to Rafael since cpuidle are the only
> > > > > > > callers of this. In any case we need to fix it.
> > > > > >
> > > > > > Yes, we do.
> > > > > >
> > > > > > So I said that I preferred to address this in the callers and the reason why
> > > > > > is because, for example, for the teo governor it would be a matter of using
> > > > > > a different data type to store the tick_nohz_get_sleep_length() return value,
> > > > > > like in the (untested) patch below.
> > > > > >
> > > > > > So at least in this case there is no need to add any new branches anywhere.
> > > > > >
> > > > > > I'm still not sure about menu, because it is more complicated, but even if
> > > > > > that one needs an extra branch, that is a win already.
> > > > >
> > > > > I would like to point out the potential trouble that fixing this issue in the
> > > > > callers could cause.
> > > > >
> > > > > 1. This function is called multiple times in menu governor and TEO
> > > > > governor.
> > > >
> > > > What do you mean by "multiple times"?
> > > >
> > > > Each of the governors calls it once per cycle and its previous return
> > > > value is not used in the next cycle at least in teo.
> > >
> > > I remember a governor called this function twice in a cycle, I guess I remember
> > > wrong.
> >
> > That obviously depends on the governor, but both teo and menu call it
> > once per cycle.
> >
> > > > > I'm not sure that receiving results using signed integers is enough
> > > > > to solve all the problems, in the worst case it may require increasing
> > > > > the logical complexity of the code.
> > > >
> > > > That is a valid concern, so it is a tradeoff between increasing the
> > > > logical complexity of the code and adding branches to it.
> > > >
> > > > > 2. This function is important for developing idle governor.
> > > > > If the problem is not fixed in the function itself, then this potential
> > > > > pitfall should be explicitly stated in the documentation.
> > > >
> > > > That I can agree with.
> > > >
> > > > > This is because
> > > > > it is difficult to predict from the definition and naming of the function
> > > > > that it might return a negative number. I actually discovered this anomaly
> > > > > when I was doing data analysis on my own idle governor. For some idle control
> > > > > algorithms, this exception return could lead to serious consequences,
> > > > > because negative return logically won't happen.
> > > >
> > > > Well, it's a matter of how to take the possible negative return value
> > > > into account so it does not affect the result of the computations.
> > >
> > > I think it is challenging for some algorithms to take negative return values
> > > into account properly. For TEO (and even menu), it is possible to
> > > solve the problem by just changing the way the data is received is because the
> > > learning mechanism for both algorithms is simple.
> >
> > Of course this depends on the governor.
> >
> > > One of the interesting things about the CPUIdle subsystem is that it is well
> > > suited to introduce machine learning and probabilistic statistical methods.
> >
> > You need to remember that the governor code runs in the idle loop
> > context which is expected to be reasonably fast.
> >
> > That's why we are worrying about individual branches here.
> >
> > > This means that many of the more complex and data-sensitive algorithms can
> > > potentially be explored. In the best case we will still need to add additional
> > > code complexity to a new algorithm.
> >
> > So I'm not sure what the problem with adding an upfront negative value
> > check to the governor is.
> >
> > > It would reduce a lot of unnecessary considerations (for example, highlight
> > > this shortcoming in the documentation) if we could ensure that this function
> > > would work as it is logically defined. But I don't really understand
> > > how much of a burden adding an extra branch would impose, so I don't know if
> > > this tradeoff is worth it.
> >
> > It ultimately depends on the governor, which is why I think that the
> > negative value check should be done by the governor, if needed, and
> > not by the function called by it, because in the latter case the check
> > may be redundant and we end up with an extra branch (or two branches
> > in this particular case) for no good reason whatsoever.
> >
> > Yes, there are governors which simply can do the negative value check
> > upfront right after calling that function and ensure that they will
> > not deal with negative values going forward.  This is probably what
> > I'll do in the menu case.
> >
> > However, if the governor is simple enough and it can avoid doing the
> > explicit negative value check, I don't see a reason to do that check
> > elsewhere "just in case".
>
> Makes sense. I will submit my patch to fix this issue in menu and TEO.

Well, I have patches for that already and they are not
super-straightforward.  Though If you want to try to fix this
yourself, I'll wait for your submission.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 回复: [PATCH 01/10] tick/nohz: Prevent tick_nohz_get_sleep_length() from returning negative value
  2021-03-26 18:54                               ` Rafael J. Wysocki
@ 2021-03-26 22:53                                 ` Zhou Ti (x2019cwm)
  2021-03-29 12:44                                   ` Rafael J. Wysocki
  0 siblings, 1 reply; 35+ messages in thread
From: Zhou Ti (x2019cwm) @ 2021-03-26 22:53 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Frederic Weisbecker, LKML, Rafael J. Wysocki,
	Peter Zijlstra, Thomas Gleixner, Yunfeng Ye, Paul E . McKenney,
	Marcelo Tosatti, Ingo Molnar, Linux PM

On Fri, 26 Mar 2021 19:54:26 +0100, Rafael J. Wysocki wrote:
> On Fri, Mar 26, 2021 at 6:53 PM Zhou Ti (x2019cwm) <x2019cwm@stfx.ca> wrote:
> >
> > On Fri, 26 Mar 2021 18:01:47 +0100, Rafael J. Wysocki wrote:
> > > On Thu, Mar 25, 2021 at 9:37 PM Zhou Ti (x2019cwm) <x2019cwm@stfx.ca> wrote:
> > > >
> > > > On March 25, 2021 15:50, Rafael J. Wysocki wrote:
> > > > > On Thu, Mar 25, 2021 at 8:18 PM Zhou Ti (x2019cwm) <x2019cwm@stfx.ca> wrote:
> > > > > >
> > > > > > On March 25, 2021 14:56, Rafael J. Wysocki wrote:
> > > > > > > On Thursday, March 25, 2021 2:14:00 PM CET Frederic Weisbecker wrote:
> > > > > > > > On Tue, Mar 16, 2021 at 04:08:08PM +0000, Zhou Ti (x2019cwm) wrote:
> > > > > > > > > But I don't think it's a good idea to handle this in callers, because logically the function shouldn't return negative values. Returning 0 directly would allow idle governors to get another chance to select again.
> > > > > > > >
> > > > > > > > Hmm, I'm going to leave the last word to Rafael since cpuidle are the only
> > > > > > > > callers of this. In any case we need to fix it.
> > > > > > >
> > > > > > > Yes, we do.
> > > > > > >
> > > > > > > So I said that I preferred to address this in the callers and the reason why
> > > > > > > is because, for example, for the teo governor it would be a matter of using
> > > > > > > a different data type to store the tick_nohz_get_sleep_length() return value,
> > > > > > > like in the (untested) patch below.
> > > > > > >
> > > > > > > So at least in this case there is no need to add any new branches anywhere.
> > > > > > >
> > > > > > > I'm still not sure about menu, because it is more complicated, but even if
> > > > > > > that one needs an extra branch, that is a win already.
> > > > > >
> > > > > > I would like to point out the potential trouble that fixing this issue in the
> > > > > > callers could cause.
> > > > > >
> > > > > > 1. This function is called multiple times in menu governor and TEO
> > > > > > governor.
> > > > >
> > > > > What do you mean by "multiple times"?
> > > > >
> > > > > Each of the governors calls it once per cycle and its previous return
> > > > > value is not used in the next cycle at least in teo.
> > > >
> > > > I remember a governor called this function twice in a cycle, I guess I remember
> > > > wrong.
> > >
> > > That obviously depends on the governor, but both teo and menu call it
> > > once per cycle.
> > >
> > > > > > I'm not sure that receiving results using signed integers is enough
> > > > > > to solve all the problems, in the worst case it may require increasing
> > > > > > the logical complexity of the code.
> > > > >
> > > > > That is a valid concern, so it is a tradeoff between increasing the
> > > > > logical complexity of the code and adding branches to it.
> > > > >
> > > > > > 2. This function is important for developing idle governor.
> > > > > > If the problem is not fixed in the function itself, then this potential
> > > > > > pitfall should be explicitly stated in the documentation.
> > > > >
> > > > > That I can agree with.
> > > > >
> > > > > > This is because
> > > > > > it is difficult to predict from the definition and naming of the function
> > > > > > that it might return a negative number. I actually discovered this anomaly
> > > > > > when I was doing data analysis on my own idle governor. For some idle control
> > > > > > algorithms, this exception return could lead to serious consequences,
> > > > > > because negative return logically won't happen.
> > > > >
> > > > > Well, it's a matter of how to take the possible negative return value
> > > > > into account so it does not affect the result of the computations.
> > > >
> > > > I think it is challenging for some algorithms to take negative return values
> > > > into account properly. For TEO (and even menu), it is possible to
> > > > solve the problem by just changing the way the data is received is because the
> > > > learning mechanism for both algorithms is simple.
> > >
> > > Of course this depends on the governor.
> > >
> > > > One of the interesting things about the CPUIdle subsystem is that it is well
> > > > suited to introduce machine learning and probabilistic statistical methods.
> > >
> > > You need to remember that the governor code runs in the idle loop
> > > context which is expected to be reasonably fast.
> > >
> > > That's why we are worrying about individual branches here.
> > >
> > > > This means that many of the more complex and data-sensitive algorithms can
> > > > potentially be explored. In the best case we will still need to add additional
> > > > code complexity to a new algorithm.
> > >
> > > So I'm not sure what the problem with adding an upfront negative value
> > > check to the governor is.
> > >
> > > > It would reduce a lot of unnecessary considerations (for example, highlight
> > > > this shortcoming in the documentation) if we could ensure that this function
> > > > would work as it is logically defined. But I don't really understand
> > > > how much of a burden adding an extra branch would impose, so I don't know if
> > > > this tradeoff is worth it.
> > >
> > > It ultimately depends on the governor, which is why I think that the
> > > negative value check should be done by the governor, if needed, and
> > > not by the function called by it, because in the latter case the check
> > > may be redundant and we end up with an extra branch (or two branches
> > > in this particular case) for no good reason whatsoever.
> > >
> > > Yes, there are governors which simply can do the negative value check
> > > upfront right after calling that function and ensure that they will
> > > not deal with negative values going forward.  This is probably what
> > > I'll do in the menu case.
> > >
> > > However, if the governor is simple enough and it can avoid doing the
> > > explicit negative value check, I don't see a reason to do that check
> > > elsewhere "just in case".
> >
> > Makes sense. I will submit my patch to fix this issue in menu and TEO.
> 
> Well, I have patches for that already and they are not
> super-straightforward.  Though If you want to try to fix this
> yourself, I'll wait for your submission.

Thanks! I really like this subsystem, so I hope to contribute a little.
I still have some questions.

For TEO governor:
    Even if we change some datatypes as your patch did before, some explicit 
    type conversions still need to be added to prevent wrong results.

    For example:
        line 276 (because 1u > -1 will be false)
        line 329
        line 422
    
    Since some of them are in a loop, does the overhead caused by the
    type conversions worth it? or do I need to do some pre-processing to avoid 
    duplicate conversions (which may cause additional space overhead) ?
    
For menu governor:
    If we simply change the datatypes, the conversions required are even more.

    For example:
        lines 133-142
        line 202
        line 289
        lines 302-309
        line 320
        lines 327-328
        line 347

    If we just do exception handling on next_timer_ns and delta_next,
    that still requires two additional branches.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 回复: [PATCH 01/10] tick/nohz: Prevent tick_nohz_get_sleep_length() from returning negative value
  2021-03-26 22:53                                 ` Zhou Ti (x2019cwm)
@ 2021-03-29 12:44                                   ` Rafael J. Wysocki
  2021-03-29 14:49                                     ` Zhou Ti (x2019cwm)
  0 siblings, 1 reply; 35+ messages in thread
From: Rafael J. Wysocki @ 2021-03-29 12:44 UTC (permalink / raw)
  To: Zhou Ti (x2019cwm)
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Frederic Weisbecker, LKML,
	Rafael J. Wysocki, Peter Zijlstra, Thomas Gleixner, Yunfeng Ye,
	Paul E . McKenney, Marcelo Tosatti, Ingo Molnar, Linux PM

On Fri, Mar 26, 2021 at 11:53 PM Zhou Ti (x2019cwm) <x2019cwm@stfx.ca> wrote:
>
> On Fri, 26 Mar 2021 19:54:26 +0100, Rafael J. Wysocki wrote:
> > On Fri, Mar 26, 2021 at 6:53 PM Zhou Ti (x2019cwm) <x2019cwm@stfx.ca> wrote:
> > >
> > > On Fri, 26 Mar 2021 18:01:47 +0100, Rafael J. Wysocki wrote:
> > > > On Thu, Mar 25, 2021 at 9:37 PM Zhou Ti (x2019cwm) <x2019cwm@stfx.ca> wrote:
> > > > >
> > > > > On March 25, 2021 15:50, Rafael J. Wysocki wrote:
> > > > > > On Thu, Mar 25, 2021 at 8:18 PM Zhou Ti (x2019cwm) <x2019cwm@stfx.ca> wrote:
> > > > > > >
> > > > > > > On March 25, 2021 14:56, Rafael J. Wysocki wrote:
> > > > > > > > On Thursday, March 25, 2021 2:14:00 PM CET Frederic Weisbecker wrote:
> > > > > > > > > On Tue, Mar 16, 2021 at 04:08:08PM +0000, Zhou Ti (x2019cwm) wrote:
> > > > > > > > > > But I don't think it's a good idea to handle this in callers, because logically the function shouldn't return negative values. Returning 0 directly would allow idle governors to get another chance to select again.
> > > > > > > > >
> > > > > > > > > Hmm, I'm going to leave the last word to Rafael since cpuidle are the only
> > > > > > > > > callers of this. In any case we need to fix it.
> > > > > > > >
> > > > > > > > Yes, we do.
> > > > > > > >
> > > > > > > > So I said that I preferred to address this in the callers and the reason why
> > > > > > > > is because, for example, for the teo governor it would be a matter of using
> > > > > > > > a different data type to store the tick_nohz_get_sleep_length() return value,
> > > > > > > > like in the (untested) patch below.
> > > > > > > >
> > > > > > > > So at least in this case there is no need to add any new branches anywhere.
> > > > > > > >
> > > > > > > > I'm still not sure about menu, because it is more complicated, but even if
> > > > > > > > that one needs an extra branch, that is a win already.
> > > > > > >
> > > > > > > I would like to point out the potential trouble that fixing this issue in the
> > > > > > > callers could cause.
> > > > > > >
> > > > > > > 1. This function is called multiple times in menu governor and TEO
> > > > > > > governor.
> > > > > >
> > > > > > What do you mean by "multiple times"?
> > > > > >
> > > > > > Each of the governors calls it once per cycle and its previous return
> > > > > > value is not used in the next cycle at least in teo.
> > > > >
> > > > > I remember a governor called this function twice in a cycle, I guess I remember
> > > > > wrong.
> > > >
> > > > That obviously depends on the governor, but both teo and menu call it
> > > > once per cycle.
> > > >
> > > > > > > I'm not sure that receiving results using signed integers is enough
> > > > > > > to solve all the problems, in the worst case it may require increasing
> > > > > > > the logical complexity of the code.
> > > > > >
> > > > > > That is a valid concern, so it is a tradeoff between increasing the
> > > > > > logical complexity of the code and adding branches to it.
> > > > > >
> > > > > > > 2. This function is important for developing idle governor.
> > > > > > > If the problem is not fixed in the function itself, then this potential
> > > > > > > pitfall should be explicitly stated in the documentation.
> > > > > >
> > > > > > That I can agree with.
> > > > > >
> > > > > > > This is because
> > > > > > > it is difficult to predict from the definition and naming of the function
> > > > > > > that it might return a negative number. I actually discovered this anomaly
> > > > > > > when I was doing data analysis on my own idle governor. For some idle control
> > > > > > > algorithms, this exception return could lead to serious consequences,
> > > > > > > because negative return logically won't happen.
> > > > > >
> > > > > > Well, it's a matter of how to take the possible negative return value
> > > > > > into account so it does not affect the result of the computations.
> > > > >
> > > > > I think it is challenging for some algorithms to take negative return values
> > > > > into account properly. For TEO (and even menu), it is possible to
> > > > > solve the problem by just changing the way the data is received is because the
> > > > > learning mechanism for both algorithms is simple.
> > > >
> > > > Of course this depends on the governor.
> > > >
> > > > > One of the interesting things about the CPUIdle subsystem is that it is well
> > > > > suited to introduce machine learning and probabilistic statistical methods.
> > > >
> > > > You need to remember that the governor code runs in the idle loop
> > > > context which is expected to be reasonably fast.
> > > >
> > > > That's why we are worrying about individual branches here.
> > > >
> > > > > This means that many of the more complex and data-sensitive algorithms can
> > > > > potentially be explored. In the best case we will still need to add additional
> > > > > code complexity to a new algorithm.
> > > >
> > > > So I'm not sure what the problem with adding an upfront negative value
> > > > check to the governor is.
> > > >
> > > > > It would reduce a lot of unnecessary considerations (for example, highlight
> > > > > this shortcoming in the documentation) if we could ensure that this function
> > > > > would work as it is logically defined. But I don't really understand
> > > > > how much of a burden adding an extra branch would impose, so I don't know if
> > > > > this tradeoff is worth it.
> > > >
> > > > It ultimately depends on the governor, which is why I think that the
> > > > negative value check should be done by the governor, if needed, and
> > > > not by the function called by it, because in the latter case the check
> > > > may be redundant and we end up with an extra branch (or two branches
> > > > in this particular case) for no good reason whatsoever.
> > > >
> > > > Yes, there are governors which simply can do the negative value check
> > > > upfront right after calling that function and ensure that they will
> > > > not deal with negative values going forward.  This is probably what
> > > > I'll do in the menu case.
> > > >
> > > > However, if the governor is simple enough and it can avoid doing the
> > > > explicit negative value check, I don't see a reason to do that check
> > > > elsewhere "just in case".
> > >
> > > Makes sense. I will submit my patch to fix this issue in menu and TEO.
> >
> > Well, I have patches for that already and they are not
> > super-straightforward.  Though If you want to try to fix this
> > yourself, I'll wait for your submission.
>
> Thanks! I really like this subsystem, so I hope to contribute a little.
> I still have some questions.
>
> For TEO governor:
>     Even if we change some datatypes as your patch did before, some explicit
>     type conversions still need to be added to prevent wrong results.

That's true.  That patch was way incomplete and I would start with
changing the data type of exit_latency_ns and target_residency_ns in
struct cpuidle_state to s64 for this reason.

>     For example:
>         line 276 (because 1u > -1 will be false)
>         line 329
>         line 422
>
>     Since some of them are in a loop, does the overhead caused by the
>     type conversions worth it? or do I need to do some pre-processing to avoid
>     duplicate conversions (which may cause additional space overhead) ?

Well, it looks like it may be better if I send my patches after all,
because it will take less time overall than me explaining here what I
would do.  Let me do that.

> For menu governor:
>     If we simply change the datatypes, the conversions required are even more.

I would add a negative return value check to menu as I said before.

>     For example:
>         lines 133-142
>         line 202
>         line 289
>         lines 302-309
>         line 320
>         lines 327-328
>         line 347
>
>     If we just do exception handling on next_timer_ns and delta_next,
>     that still requires two additional branches.

Not really AFAICS, but again let me send a patch to illustrate my point.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 回复: [PATCH 01/10] tick/nohz: Prevent tick_nohz_get_sleep_length() from returning negative value
  2021-03-29 12:44                                   ` Rafael J. Wysocki
@ 2021-03-29 14:49                                     ` Zhou Ti (x2019cwm)
  0 siblings, 0 replies; 35+ messages in thread
From: Zhou Ti (x2019cwm) @ 2021-03-29 14:49 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Frederic Weisbecker, LKML, Rafael J. Wysocki,
	Peter Zijlstra, Thomas Gleixner, Yunfeng Ye, Paul E . McKenney,
	Marcelo Tosatti, Ingo Molnar, Linux PM

On Mon 2021-03-29 8:45, Rafael J. Wysocki wrote:
> On Fri, Mar 26, 2021 at 11:53 PM Zhou Ti (x2019cwm) <x2019cwm@stfx.ca> wrote:
> >
> > On Fri, 26 Mar 2021 19:54:26 +0100, Rafael J. Wysocki wrote:
> > > On Fri, Mar 26, 2021 at 6:53 PM Zhou Ti (x2019cwm) <x2019cwm@stfx.ca> wrote:
> > > >
> > > > On Fri, 26 Mar 2021 18:01:47 +0100, Rafael J. Wysocki wrote:
> > > > > On Thu, Mar 25, 2021 at 9:37 PM Zhou Ti (x2019cwm) <x2019cwm@stfx.ca> wrote:
> > > > > >
> > > > > > On March 25, 2021 15:50, Rafael J. Wysocki wrote:
> > > > > > > On Thu, Mar 25, 2021 at 8:18 PM Zhou Ti (x2019cwm) <x2019cwm@stfx.ca> wrote:
> > > > > > > >
> > > > > > > > On March 25, 2021 14:56, Rafael J. Wysocki wrote:
> > > > > > > > > On Thursday, March 25, 2021 2:14:00 PM CET Frederic Weisbecker wrote:
> > > > > > > > > > On Tue, Mar 16, 2021 at 04:08:08PM +0000, Zhou Ti (x2019cwm) wrote:
> > > > > > > > > > > But I don't think it's a good idea to handle this in callers, because logically the function shouldn't return negative values. Returning 0 directly would allow idle governors to get another chance to select again.
> > > > > > > > > >
> > > > > > > > > > Hmm, I'm going to leave the last word to Rafael since cpuidle are the only
> > > > > > > > > > callers of this. In any case we need to fix it.
> > > > > > > > >
> > > > > > > > > Yes, we do.
> > > > > > > > >
> > > > > > > > > So I said that I preferred to address this in the callers and the reason why
> > > > > > > > > is because, for example, for the teo governor it would be a matter of using
> > > > > > > > > a different data type to store the tick_nohz_get_sleep_length() return value,
> > > > > > > > > like in the (untested) patch below.
> > > > > > > > >
> > > > > > > > > So at least in this case there is no need to add any new branches anywhere.
> > > > > > > > >
> > > > > > > > > I'm still not sure about menu, because it is more complicated, but even if
> > > > > > > > > that one needs an extra branch, that is a win already.
> > > > > > > >
> > > > > > > > I would like to point out the potential trouble that fixing this issue in the
> > > > > > > > callers could cause.
> > > > > > > >
> > > > > > > > 1. This function is called multiple times in menu governor and TEO
> > > > > > > > governor.
> > > > > > >
> > > > > > > What do you mean by "multiple times"?
> > > > > > >
> > > > > > > Each of the governors calls it once per cycle and its previous return
> > > > > > > value is not used in the next cycle at least in teo.
> > > > > >
> > > > > > I remember a governor called this function twice in a cycle, I guess I remember
> > > > > > wrong.
> > > > >
> > > > > That obviously depends on the governor, but both teo and menu call it
> > > > > once per cycle.
> > > > >
> > > > > > > > I'm not sure that receiving results using signed integers is enough
> > > > > > > > to solve all the problems, in the worst case it may require increasing
> > > > > > > > the logical complexity of the code.
> > > > > > >
> > > > > > > That is a valid concern, so it is a tradeoff between increasing the
> > > > > > > logical complexity of the code and adding branches to it.
> > > > > > >
> > > > > > > > 2. This function is important for developing idle governor.
> > > > > > > > If the problem is not fixed in the function itself, then this potential
> > > > > > > > pitfall should be explicitly stated in the documentation.
> > > > > > >
> > > > > > > That I can agree with.
> > > > > > >
> > > > > > > > This is because
> > > > > > > > it is difficult to predict from the definition and naming of the function
> > > > > > > > that it might return a negative number. I actually discovered this anomaly
> > > > > > > > when I was doing data analysis on my own idle governor. For some idle control
> > > > > > > > algorithms, this exception return could lead to serious consequences,
> > > > > > > > because negative return logically won't happen.
> > > > > > >
> > > > > > > Well, it's a matter of how to take the possible negative return value
> > > > > > > into account so it does not affect the result of the computations.
> > > > > >
> > > > > > I think it is challenging for some algorithms to take negative return values
> > > > > > into account properly. For TEO (and even menu), it is possible to
> > > > > > solve the problem by just changing the way the data is received is because the
> > > > > > learning mechanism for both algorithms is simple.
> > > > >
> > > > > Of course this depends on the governor.
> > > > >
> > > > > > One of the interesting things about the CPUIdle subsystem is that it is well
> > > > > > suited to introduce machine learning and probabilistic statistical methods.
> > > > >
> > > > > You need to remember that the governor code runs in the idle loop
> > > > > context which is expected to be reasonably fast.
> > > > >
> > > > > That's why we are worrying about individual branches here.
> > > > >
> > > > > > This means that many of the more complex and data-sensitive algorithms can
> > > > > > potentially be explored. In the best case we will still need to add additional
> > > > > > code complexity to a new algorithm.
> > > > >
> > > > > So I'm not sure what the problem with adding an upfront negative value
> > > > > check to the governor is.
> > > > >
> > > > > > It would reduce a lot of unnecessary considerations (for example, highlight
> > > > > > this shortcoming in the documentation) if we could ensure that this function
> > > > > > would work as it is logically defined. But I don't really understand
> > > > > > how much of a burden adding an extra branch would impose, so I don't know if
> > > > > > this tradeoff is worth it.
> > > > >
> > > > > It ultimately depends on the governor, which is why I think that the
> > > > > negative value check should be done by the governor, if needed, and
> > > > > not by the function called by it, because in the latter case the check
> > > > > may be redundant and we end up with an extra branch (or two branches
> > > > > in this particular case) for no good reason whatsoever.
> > > > >
> > > > > Yes, there are governors which simply can do the negative value check
> > > > > upfront right after calling that function and ensure that they will
> > > > > not deal with negative values going forward.  This is probably what
> > > > > I'll do in the menu case.
> > > > >
> > > > > However, if the governor is simple enough and it can avoid doing the
> > > > > explicit negative value check, I don't see a reason to do that check
> > > > > elsewhere "just in case".
> > > >
> > > > Makes sense. I will submit my patch to fix this issue in menu and TEO.
> > >
> > > Well, I have patches for that already and they are not
> > > super-straightforward.  Though If you want to try to fix this
> > > yourself, I'll wait for your submission.
> >
> > Thanks! I really like this subsystem, so I hope to contribute a little.
> > I still have some questions.
> >
> > For TEO governor:
> >     Even if we change some datatypes as your patch did before, some explicit
> >     type conversions still need to be added to prevent wrong results.
> 
> That's true.  That patch was way incomplete and I would start with
> changing the data type of exit_latency_ns and target_residency_ns in
> struct cpuidle_state to s64 for this reason.
> 
> >     For example:
> >         line 276 (because 1u > -1 will be false)
> >         line 329
> >         line 422
> >
> >     Since some of them are in a loop, does the overhead caused by the
> >     type conversions worth it? or do I need to do some pre-processing to avoid
> >     duplicate conversions (which may cause additional space overhead) ?
> 
> Well, it looks like it may be better if I send my patches after all,
> because it will take less time overall than me explaining here what I
> would do.  Let me do that.
> 
> > For menu governor:
> >     If we simply change the datatypes, the conversions required are even more.
> 
> I would add a negative return value check to menu as I said before.
> 
> Not really AFAICS, but again let me send a patch to illustrate my point.

Sure! I will study your patches carefully.

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2021-03-29 14:49 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-11 12:36 [PATCH 00/10] tick/nohz updates Frederic Weisbecker
2021-03-11 12:36 ` [PATCH 01/10] tick/nohz: Prevent tick_nohz_get_sleep_length() from returning negative value Frederic Weisbecker
2021-03-16 12:21   ` Peter Zijlstra
2021-03-16 13:37     ` Frederic Weisbecker
2021-03-16 14:35       ` Peter Zijlstra
2021-03-16 14:53         ` Frederic Weisbecker
2021-03-16 15:26           ` Rafael J. Wysocki
2021-03-16 15:57             ` 回复: " Zhou Ti (x2019cwm)
2021-03-16 16:08               ` Zhou Ti (x2019cwm)
2021-03-16 16:25                 ` Peter Zijlstra
2021-03-17 21:49                   ` Zhou Ti (x2019cwm)
2021-03-25 13:14                 ` Frederic Weisbecker
2021-03-25 18:56                   ` Rafael J. Wysocki
2021-03-25 19:18                     ` Zhou Ti (x2019cwm)
2021-03-25 19:50                       ` Rafael J. Wysocki
2021-03-25 20:37                         ` Zhou Ti (x2019cwm)
2021-03-26 17:01                           ` Rafael J. Wysocki
2021-03-26 17:53                             ` Zhou Ti (x2019cwm)
2021-03-26 18:54                               ` Rafael J. Wysocki
2021-03-26 22:53                                 ` Zhou Ti (x2019cwm)
2021-03-29 12:44                                   ` Rafael J. Wysocki
2021-03-29 14:49                                     ` Zhou Ti (x2019cwm)
2021-03-11 12:37 ` [PATCH 02/10] tick/nohz: Add tick_nohz_full_this_cpu() Frederic Weisbecker
2021-03-16 12:28   ` Peter Zijlstra
2021-03-16 13:05     ` Frederic Weisbecker
2021-03-11 12:37 ` [PATCH 03/10] tick/nohz: Conditionally restart tick on idle exit Frederic Weisbecker
2021-03-11 12:37 ` [PATCH 04/10] tick/nohz: Remove superflous check for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE Frederic Weisbecker
2021-03-11 12:37 ` [PATCH 05/10] tick/nohz: Update idle_exittime on actual idle exit Frederic Weisbecker
2021-03-11 12:37 ` [PATCH 06/10] timer: Report ignored local enqueue in nohz mode Frederic Weisbecker
2021-03-16 15:27   ` Peter Zijlstra
2021-03-25 13:07     ` Frederic Weisbecker
2021-03-11 12:37 ` [PATCH 07/10] tick/nohz: Update nohz_full Kconfig help Frederic Weisbecker
2021-03-11 12:37 ` [PATCH 08/10] tick/nohz: Only wakeup a single target cpu when kicking a task Frederic Weisbecker
2021-03-11 12:37 ` [PATCH 09/10] tick/nohz: Change signal tick dependency to wakeup CPUs of member tasks Frederic Weisbecker
2021-03-11 12:37 ` [PATCH 10/10] tick/nohz: Kick only _queued_ task whose tick dependency is updated Frederic Weisbecker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).