* [PATCH] nohz_full: Fix TICK_DO_TIMER_NONE vs nohz_full warning
From: Valentin Schneider @ 2022-09-27 12:03 UTC
  To: linux-kernel
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Frederic Weisbecker,
	Paul E. McKenney, Thomas Gleixner, Nicholas Piggin,
	Marcelo Tosatti

Booting a system with an invalid nohz_full cmdline mask (in my case
nr_cpus=X on the cmdline makes the nohz_full CPUs out of range) triggers:

[    1.209455] WARNING: CPU: 1 PID: 1 at kernel/time/tick-sched.c:191 tick_sched_do_timer+0x90/0xa0
[    1.209455] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.19.0-00675-g7e9518baed4c #39
[    1.209455] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[    1.209455] RIP: 0010:tick_sched_do_timer+0x90/0xa0
[    1.209455] Call Trace:
[    1.209455]  <IRQ>
[    1.209455]  tick_sched_timer+0x2e/0x80
[    1.209455]  __hrtimer_run_queues+0xfc/0x2a0
[    1.209455]  hrtimer_interrupt+0x105/0x240
[    1.209455]  __sysvec_apic_timer_interrupt+0x7a/0x160
[    1.209455]  sysvec_apic_timer_interrupt+0x85/0xb0
[    1.209455]  </IRQ>

This is because nothing checks the actual contents of the mask in
housekeeping_setup(), so in those scenarios we do end up invoking
tick_nohz_full_setup() and thus setting tick_nohz_full_running to true.

However, later on in tick_nohz_init(), this ends up being a no-op:

  for_each_cpu(cpu, tick_nohz_full_mask)
	  ct_cpu_track_user(cpu);

This in turn means we end up with:
  tick_nohz_full_running == true
  context_tracking_enabled() == false
In other words:
  tick_nohz_full_enabled() == false
Thus, __tick_nohz_idle_stop_tick() can legitimately stop the tick during
idle on the tick_do_timer_cpu and set tick_do_timer_cpu to
TICK_DO_TIMER_NONE. This triggers the warning when the tick later fires
and tick_sched_do_timer() detects that tick_do_timer_cpu was relinquished.
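
The sequence leading to the warning can be sketched as a simplified state
machine. Everything here is an assumption made for illustration: struct
tick_model, the model_*() helpers and MODEL_TICK_DO_TIMER_NONE are
hypothetical names, and the idle path is reduced to the single decision that
matters for this bug.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_TICK_DO_TIMER_NONE (-1)	/* placeholder for "no CPU owns the duty" */

/* Hypothetical model of the relevant global state. */
struct tick_model {
	int tick_do_timer_cpu;		/* CPU currently doing timekeeping */
	bool tick_nohz_full_running;	/* set from the cmdline path */
	bool tick_nohz_full_enabled;	/* running AND context tracking enabled */
	bool warned;			/* stands in for WARN_ON_ONCE() firing */
};

/* Idle path: the duty may only be dropped when nohz_full is not enabled. */
static void model_idle_stop_tick(struct tick_model *t, int cpu)
{
	if (t->tick_do_timer_cpu == cpu && !t->tick_nohz_full_enabled)
		t->tick_do_timer_cpu = MODEL_TICK_DO_TIMER_NONE;
}

/* Tick path with the check on tick_nohz_full_running. */
static void model_tick_sched_do_timer(struct tick_model *t, int cpu)
{
	assert(cpu >= 0);

	if (t->tick_do_timer_cpu == MODEL_TICK_DO_TIMER_NONE) {
		if (t->tick_nohz_full_running)
			t->warned = true;
		t->tick_do_timer_cpu = cpu;	/* take the duty back */
	}
}
```

Starting from "running but not enabled", one idle entry followed by one tick
on the same CPU is enough to set warned in this model.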

Check the contents of the non_housekeeping_mask after it is
parsed (rcu_init_nohz() does a similar check). For good measure, also
update the check in tick_sched_do_timer() to use tick_nohz_full_enabled().
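
As a rough illustration of the mask check, the subset test can be modelled on
plain 64-bit words. The model_*() helpers are hypothetical and only mimic the
semantics of a subset test against the possible-CPU mask for at most 64 CPUs;
they are not the kernel's cpumask API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if every bit set in a is also set in b. */
static bool model_mask_subset(uint64_t a, uint64_t b)
{
	return (a & ~b) == 0;
}

/* Assumption: possible CPUs are exactly 0..nr_cpus-1. */
static uint64_t model_possible_mask(unsigned int nr_cpus)
{
	assert(nr_cpus >= 1 && nr_cpus <= 64);
	return nr_cpus == 64 ? UINT64_MAX : (1ULL << nr_cpus) - 1;
}

/* Accept the parsed mask only if it names no out-of-range CPU. */
static bool model_housekeeping_mask_ok(uint64_t non_housekeeping,
				       unsigned int nr_cpus)
{
	return model_mask_subset(non_housekeeping,
				 model_possible_mask(nr_cpus));
}
```

Note that a subset test rejects a mask as soon as one listed CPU is out of
range, even when other listed CPUs are valid.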

Fixes: 08ae95f4fd3b ("nohz_full: Allow the boot CPU to be nohz_full")
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
---
 kernel/sched/isolation.c | 3 ++-
 kernel/time/tick-sched.c | 4 +---
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 373d42c707bc..774cd187a1f7 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -120,7 +120,8 @@ static int __init housekeeping_setup(char *str, unsigned long flags)
 	}
 
 	alloc_bootmem_cpumask_var(&non_housekeeping_mask);
-	if (cpulist_parse(str, non_housekeeping_mask) < 0) {
+	if (cpulist_parse(str, non_housekeeping_mask) < 0 ||
+	    !cpumask_subset(non_housekeeping_mask, cpu_possible_mask)) {
 		pr_warn("Housekeeping: nohz_full= or isolcpus= incorrect CPU range\n");
 		goto free_non_housekeeping_mask;
 	}
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index b0e3c9205946..dae01a6577ab 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -187,9 +187,7 @@ static void tick_sched_do_timer(struct tick_sched *ts, ktime_t now)
 	 * tick_do_timer_cpu never relinquishes.
 	 */
 	if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE)) {
-#ifdef CONFIG_NO_HZ_FULL
-		WARN_ON_ONCE(tick_nohz_full_running);
-#endif
+		WARN_ON_ONCE(tick_nohz_full_enabled());
 		tick_do_timer_cpu = cpu;
 	}
 #endif
-- 
2.31.1