linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/5] Allow CPU0 to be nohz full
@ 2019-04-11  3:34 Nicholas Piggin
  2019-04-11  3:34 ` [PATCH v2 1/5] sched/core: allow the remote scheduler tick to be started on CPU0 Nicholas Piggin
                   ` (5 more replies)
  0 siblings, 6 replies; 11+ messages in thread
From: Nicholas Piggin @ 2019-04-11  3:34 UTC (permalink / raw)
  To: Thomas Gleixner, Frederic Weisbecker
  Cc: linux-arch, Peter Zijlstra, Rafael J . Wysocki, linux-kernel,
	Nicholas Piggin, Ingo Molnar, linuxppc-dev

Since last time, I added a compile time option to opt-out of this
if the platform does not support suspend on non-zero, and tried to
improve legibility of changelogs and explain the justification
better.

I have been testing this on powerpc/pseries and it seems to work
fine (the firmware call to suspend can be called on any CPU and
resumes where it left off), but not included here because the
code has some bitrot unrelated to this series which I hacked to
fix. I will discuss it and either send an acked patch to go with
this series if it is small, or fix it in powerpc tree.

Thanks,
Nick

Nicholas Piggin (5):
  sched/core: allow the remote scheduler tick to be started on CPU0
  PM / suspend: add function to disable secondaries for suspend
  kernel/cpu: Allow non-zero CPU to be primary for suspend / kexec
    freeze
  kernel/sched/isolation: require a present CPU in housekeeping mask
  nohz_full: Allow the boot CPU to be nohz_full

 include/linux/cpu.h       | 15 ++++++++++++
 kernel/cpu.c              | 10 +++++++-
 kernel/kexec_core.c       |  4 ++--
 kernel/power/Kconfig      |  9 +++++++
 kernel/power/hibernate.c  | 12 +++++-----
 kernel/power/suspend.c    |  4 ++--
 kernel/sched/core.c       |  2 +-
 kernel/sched/isolation.c  | 18 ++++++++++----
 kernel/time/tick-common.c | 50 +++++++++++++++++++++++++++++++++++----
 kernel/time/tick-sched.c  | 34 ++++++++++++++++++--------
 11 files changed, 131 insertions(+), 31 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2 1/5] sched/core: allow the remote scheduler tick to be started on CPU0
  2019-04-11  3:34 [PATCH v2 0/5] Allow CPU0 to be nohz full Nicholas Piggin
@ 2019-04-11  3:34 ` Nicholas Piggin
  2019-04-11  3:34 ` [PATCH v2 2/5] PM / suspend: add function to disable secondaries for suspend Nicholas Piggin
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Nicholas Piggin @ 2019-04-11  3:34 UTC (permalink / raw)
  To: Thomas Gleixner, Frederic Weisbecker
  Cc: linux-arch, Peter Zijlstra, Rafael J . Wysocki, linux-kernel,
	Nicholas Piggin, Ingo Molnar, linuxppc-dev

This has no effect yet because CPU0 will always be a housekeeping CPU
until a later change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 kernel/sched/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4778c48a7fda..10e05ec049b6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5918,7 +5918,7 @@ void __init sched_init_smp(void)
 
 static int __init migration_init(void)
 {
-	sched_rq_cpu_starting(smp_processor_id());
+	sched_cpu_starting(smp_processor_id());
 	return 0;
 }
 early_initcall(migration_init);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 2/5] PM / suspend: add function to disable secondaries for suspend
  2019-04-11  3:34 [PATCH v2 0/5] Allow CPU0 to be nohz full Nicholas Piggin
  2019-04-11  3:34 ` [PATCH v2 1/5] sched/core: allow the remote scheduler tick to be started on CPU0 Nicholas Piggin
@ 2019-04-11  3:34 ` Nicholas Piggin
  2019-04-11  3:34 ` [PATCH v2 3/5] kernel/cpu: Allow non-zero CPU to be primary for suspend / kexec freeze Nicholas Piggin
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Nicholas Piggin @ 2019-04-11  3:34 UTC (permalink / raw)
  To: Thomas Gleixner, Frederic Weisbecker
  Cc: linux-arch, Peter Zijlstra, Rafael J . Wysocki, linux-kernel,
	Nicholas Piggin, Ingo Molnar, linuxppc-dev

This adds a function to disable secondary CPUs for suspend that are
not necessarily non-zero / non-boot CPUs. Platforms will be able to
use this to suspend using non-zero CPUs.

Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 include/linux/cpu.h      | 10 ++++++++++
 kernel/kexec_core.c      |  4 ++--
 kernel/power/hibernate.c | 12 ++++++------
 kernel/power/suspend.c   |  4 ++--
 4 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 5041357d0297..563e697e7779 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -137,6 +137,16 @@ static inline int disable_nonboot_cpus(void)
 	return freeze_secondary_cpus(0);
 }
 extern void enable_nonboot_cpus(void);
+
+static inline int suspend_disable_secondary_cpus(void)
+{
+	return freeze_secondary_cpus(0);
+}
+static inline void suspend_enable_secondary_cpus(void)
+{
+	return enable_nonboot_cpus();
+}
+
 #else /* !CONFIG_PM_SLEEP_SMP */
 static inline int disable_nonboot_cpus(void) { return 0; }
 static inline void enable_nonboot_cpus(void) {}
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index d7140447be75..fd5c95ff9251 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -1150,7 +1150,7 @@ int kernel_kexec(void)
 		error = dpm_suspend_end(PMSG_FREEZE);
 		if (error)
 			goto Resume_devices;
-		error = disable_nonboot_cpus();
+		error = suspend_disable_secondary_cpus();
 		if (error)
 			goto Enable_cpus;
 		local_irq_disable();
@@ -1183,7 +1183,7 @@ int kernel_kexec(void)
  Enable_irqs:
 		local_irq_enable();
  Enable_cpus:
-		enable_nonboot_cpus();
+		suspend_enable_secondary_cpus();
 		dpm_resume_start(PMSG_RESTORE);
  Resume_devices:
 		dpm_resume_end(PMSG_RESTORE);
diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index abef759de7c8..cfc7a57049e4 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -281,7 +281,7 @@ static int create_image(int platform_mode)
 	if (error || hibernation_test(TEST_PLATFORM))
 		goto Platform_finish;
 
-	error = disable_nonboot_cpus();
+	error = suspend_disable_secondary_cpus();
 	if (error || hibernation_test(TEST_CPUS))
 		goto Enable_cpus;
 
@@ -323,7 +323,7 @@ static int create_image(int platform_mode)
 	local_irq_enable();
 
  Enable_cpus:
-	enable_nonboot_cpus();
+	suspend_enable_secondary_cpus();
 
  Platform_finish:
 	platform_finish(platform_mode);
@@ -417,7 +417,7 @@ int hibernation_snapshot(int platform_mode)
 
 int __weak hibernate_resume_nonboot_cpu_disable(void)
 {
-	return disable_nonboot_cpus();
+	return suspend_disable_secondary_cpus();
 }
 
 /**
@@ -486,7 +486,7 @@ static int resume_target_kernel(bool platform_mode)
 	local_irq_enable();
 
  Enable_cpus:
-	enable_nonboot_cpus();
+	suspend_enable_secondary_cpus();
 
  Cleanup:
 	platform_restore_cleanup(platform_mode);
@@ -564,7 +564,7 @@ int hibernation_platform_enter(void)
 	if (error)
 		goto Platform_finish;
 
-	error = disable_nonboot_cpus();
+	error = suspend_disable_secondary_cpus();
 	if (error)
 		goto Enable_cpus;
 
@@ -586,7 +586,7 @@ int hibernation_platform_enter(void)
 	local_irq_enable();
 
  Enable_cpus:
-	enable_nonboot_cpus();
+	suspend_enable_secondary_cpus();
 
  Platform_finish:
 	hibernation_ops->finish();
diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
index 0bd595a0b610..59b6def23046 100644
--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -428,7 +428,7 @@ static int suspend_enter(suspend_state_t state, bool *wakeup)
 	if (suspend_test(TEST_PLATFORM))
 		goto Platform_wake;
 
-	error = disable_nonboot_cpus();
+	error = suspend_disable_secondary_cpus();
 	if (error || suspend_test(TEST_CPUS))
 		goto Enable_cpus;
 
@@ -458,7 +458,7 @@ static int suspend_enter(suspend_state_t state, bool *wakeup)
 	BUG_ON(irqs_disabled());
 
  Enable_cpus:
-	enable_nonboot_cpus();
+	suspend_enable_secondary_cpus();
 
  Platform_wake:
 	platform_resume_noirq(state);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 3/5] kernel/cpu: Allow non-zero CPU to be primary for suspend / kexec freeze
  2019-04-11  3:34 [PATCH v2 0/5] Allow CPU0 to be nohz full Nicholas Piggin
  2019-04-11  3:34 ` [PATCH v2 1/5] sched/core: allow the remote scheduler tick to be started on CPU0 Nicholas Piggin
  2019-04-11  3:34 ` [PATCH v2 2/5] PM / suspend: add function to disable secondaries for suspend Nicholas Piggin
@ 2019-04-11  3:34 ` Nicholas Piggin
  2019-04-25 12:02   ` Peter Zijlstra
  2019-04-11  3:34 ` [PATCH v2 4/5] kernel/sched/isolation: require a present CPU in housekeeping mask Nicholas Piggin
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 11+ messages in thread
From: Nicholas Piggin @ 2019-04-11  3:34 UTC (permalink / raw)
  To: Thomas Gleixner, Frederic Weisbecker
  Cc: linux-arch, Peter Zijlstra, Rafael J . Wysocki, linux-kernel,
	Nicholas Piggin, Ingo Molnar, linuxppc-dev

This patch provides an arch option, ARCH_SUSPEND_NONZERO_CPU, to
opt-in to allowing suspend to occur on one of the housekeeping CPUs
rather than hardcoded CPU0.

This will allow CPU0 to be a nohz_full CPU with a later change.

It may be possible for platforms with hardware/firmware restrictions
on suspend/wake effectively support this by handing off the final
stage to CPU0 when kernel housekeeping is no longer required. Another
option is to make housekeeping / nohz_full mask dynamic at runtime,
but the complexity could not be justified at this time.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/Kconfig |  4 ++++
 include/linux/cpu.h  |  7 ++++++-
 kernel/cpu.c         | 10 +++++++++-
 kernel/power/Kconfig |  9 +++++++++
 4 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 2d0be82c3061..bc98b0e37a10 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -318,6 +318,10 @@ config ARCH_SUSPEND_POSSIBLE
 		   (PPC_85xx && !PPC_E500MC) || PPC_86xx || PPC_PSERIES \
 		   || 44x || 40x
 
+config ARCH_SUSPEND_NONZERO_CPU
+	def_bool y
+	depends on PPC_POWERNV || PPC_PSERIES
+
 config PPC_DCR_NATIVE
 	bool
 
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 563e697e7779..dd3813959d62 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -140,7 +140,12 @@ extern void enable_nonboot_cpus(void);
 
 static inline int suspend_disable_secondary_cpus(void)
 {
-	return freeze_secondary_cpus(0);
+	int cpu = 0;
+
+	if (IS_ENABLED(CONFIG_PM_SLEEP_SMP_NONZERO_CPU))
+		cpu = -1;
+
+	return freeze_secondary_cpus(cpu);
 }
 static inline void suspend_enable_secondary_cpus(void)
 {
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 6754f3ecfd94..d1bf6e2b4752 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -9,6 +9,7 @@
 #include <linux/notifier.h>
 #include <linux/sched/signal.h>
 #include <linux/sched/hotplug.h>
+#include <linux/sched/isolation.h>
 #include <linux/sched/task.h>
 #include <linux/sched/smt.h>
 #include <linux/unistd.h>
@@ -1199,8 +1200,15 @@ int freeze_secondary_cpus(int primary)
 	int cpu, error = 0;
 
 	cpu_maps_update_begin();
-	if (!cpu_online(primary))
+	if (primary == -1) {
 		primary = cpumask_first(cpu_online_mask);
+		if (!housekeeping_cpu(primary, HK_FLAG_TIMER))
+			primary = housekeeping_any_cpu(HK_FLAG_TIMER);
+	} else {
+		if (!cpu_online(primary))
+			primary = cpumask_first(cpu_online_mask);
+	}
+
 	/*
 	 * We take down all of the non-boot CPUs in one shot to avoid races
 	 * with the userspace trying to use the CPU hotplug at the same time
diff --git a/kernel/power/Kconfig b/kernel/power/Kconfig
index f8fe57d1022e..9bbaaab14b36 100644
--- a/kernel/power/Kconfig
+++ b/kernel/power/Kconfig
@@ -114,6 +114,15 @@ config PM_SLEEP_SMP
 	depends on PM_SLEEP
 	select HOTPLUG_CPU
 
+config PM_SLEEP_SMP_NONZERO_CPU
+	def_bool y
+	depends on PM_SLEEP_SMP
+	depends on ARCH_SUSPEND_NONZERO_CPU
+	---help---
+	If an arch can suspend (for suspend, hibernate, kexec, etc) on a
+	non-zero numbered CPU, it may define ARCH_SUSPEND_NONZERO_CPU. This
+	will allow nohz_full mask to include CPU0.
+
 config PM_AUTOSLEEP
 	bool "Opportunistic sleep"
 	depends on PM_SLEEP
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 4/5] kernel/sched/isolation: require a present CPU in housekeeping mask
  2019-04-11  3:34 [PATCH v2 0/5] Allow CPU0 to be nohz full Nicholas Piggin
                   ` (2 preceding siblings ...)
  2019-04-11  3:34 ` [PATCH v2 3/5] kernel/cpu: Allow non-zero CPU to be primary for suspend / kexec freeze Nicholas Piggin
@ 2019-04-11  3:34 ` Nicholas Piggin
  2019-04-11  3:34 ` [PATCH v2 5/5] nohz_full: Allow the boot CPU to be nohz_full Nicholas Piggin
  2019-04-25 12:04 ` [PATCH v2 0/5] Allow CPU0 to be nohz full Peter Zijlstra
  5 siblings, 0 replies; 11+ messages in thread
From: Nicholas Piggin @ 2019-04-11  3:34 UTC (permalink / raw)
  To: Thomas Gleixner, Frederic Weisbecker
  Cc: linux-arch, Peter Zijlstra, Rafael J . Wysocki, linux-kernel,
	Nicholas Piggin, Ingo Molnar, linuxppc-dev

During housekeeping mask setup, currently a possible CPU is required.
That does not guarantee the CPU would be available at boot time, so
check to ensure that at least one present CPU is in the mask.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 kernel/sched/isolation.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index b02d148e7672..687302051a27 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -65,6 +65,7 @@ void __init housekeeping_init(void)
 static int __init housekeeping_setup(char *str, enum hk_flags flags)
 {
 	cpumask_var_t non_housekeeping_mask;
+	cpumask_var_t tmp;
 	int err;
 
 	alloc_bootmem_cpumask_var(&non_housekeeping_mask);
@@ -75,16 +76,23 @@ static int __init housekeeping_setup(char *str, enum hk_flags flags)
 		return 0;
 	}
 
+	alloc_bootmem_cpumask_var(&tmp);
 	if (!housekeeping_flags) {
 		alloc_bootmem_cpumask_var(&housekeeping_mask);
 		cpumask_andnot(housekeeping_mask,
 			       cpu_possible_mask, non_housekeeping_mask);
-		if (cpumask_empty(housekeeping_mask))
+
+		cpumask_andnot(tmp, cpu_present_mask, non_housekeeping_mask);
+		if (cpumask_empty(tmp)) {
+			pr_warn("Housekeeping: must include one present CPU, "
+				"using boot CPU:%d\n", smp_processor_id());
 			__cpumask_set_cpu(smp_processor_id(), housekeeping_mask);
+			__cpumask_clear_cpu(smp_processor_id(), non_housekeeping_mask);
+		}
 	} else {
-		cpumask_var_t tmp;
-
-		alloc_bootmem_cpumask_var(&tmp);
+		cpumask_andnot(tmp, cpu_present_mask, non_housekeeping_mask);
+		if (cpumask_empty(tmp))
+			__cpumask_clear_cpu(smp_processor_id(), non_housekeeping_mask);
 		cpumask_andnot(tmp, cpu_possible_mask, non_housekeeping_mask);
 		if (!cpumask_equal(tmp, housekeeping_mask)) {
 			pr_warn("Housekeeping: nohz_full= must match isolcpus=\n");
@@ -92,8 +100,8 @@ static int __init housekeeping_setup(char *str, enum hk_flags flags)
 			free_bootmem_cpumask_var(non_housekeeping_mask);
 			return 0;
 		}
-		free_bootmem_cpumask_var(tmp);
 	}
+	free_bootmem_cpumask_var(tmp);
 
 	if ((flags & HK_FLAG_TICK) && !(housekeeping_flags & HK_FLAG_TICK)) {
 		if (IS_ENABLED(CONFIG_NO_HZ_FULL)) {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 5/5] nohz_full: Allow the boot CPU to be nohz_full
  2019-04-11  3:34 [PATCH v2 0/5] Allow CPU0 to be nohz full Nicholas Piggin
                   ` (3 preceding siblings ...)
  2019-04-11  3:34 ` [PATCH v2 4/5] kernel/sched/isolation: require a present CPU in housekeeping mask Nicholas Piggin
@ 2019-04-11  3:34 ` Nicholas Piggin
  2019-04-25 12:04 ` [PATCH v2 0/5] Allow CPU0 to be nohz full Peter Zijlstra
  5 siblings, 0 replies; 11+ messages in thread
From: Nicholas Piggin @ 2019-04-11  3:34 UTC (permalink / raw)
  To: Thomas Gleixner, Frederic Weisbecker
  Cc: linux-arch, Peter Zijlstra, Rafael J . Wysocki, linux-kernel,
	Nicholas Piggin, Ingo Molnar, linuxppc-dev

Allow the boot CPU / CPU0 to be nohz_full. Have the boot CPU take the
do_timer duty during boot until a housekeeping CPU can take over.

This is supported when CONFIG_PM_SLEEP_SMP is not configured, or when
it is configured and the arch allows suspend on non-zero CPUs.

nohz_full has been trialed at a large supercomputer site and found to
significantly reduce jitter. In order to deploy it in production, they
need CPU0 to be nohz_full because their job control system requires
the application CPUs to start from 0, and the housekeeping CPUs are
placed higher. An equivalent job scheduling that uses CPU0 for
housekeeping could be achieved by modifying their system, but it is
preferable if nohz_full can support their environment without
modification.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 kernel/time/tick-common.c | 50 +++++++++++++++++++++++++++++++++++----
 kernel/time/tick-sched.c  | 34 ++++++++++++++++++--------
 2 files changed, 70 insertions(+), 14 deletions(-)

diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index 529143b4c8d2..31146c13226e 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -46,6 +46,14 @@ ktime_t tick_period;
  *    procedure also covers cpu hotplug.
  */
 int tick_do_timer_cpu __read_mostly = TICK_DO_TIMER_BOOT;
+#ifdef CONFIG_NO_HZ_FULL
+/*
+ * tick_do_timer_boot_cpu indicates the boot CPU temporarily owns
+ * tick_do_timer_cpu and it should be taken over by an eligible secondary
+ * when one comes online.
+ */
+static int tick_do_timer_boot_cpu __read_mostly = -1;
+#endif
 
 /*
  * Debugging: see timer_list.c
@@ -167,6 +175,26 @@ void tick_setup_periodic(struct clock_event_device *dev, int broadcast)
 	}
 }
 
+#ifdef CONFIG_NO_HZ_FULL
+static void giveup_do_timer(void *info)
+{
+	int cpu = *(unsigned int *)info;
+
+	WARN_ON(tick_do_timer_cpu != smp_processor_id());
+
+	tick_do_timer_cpu = cpu;
+}
+
+static void tick_take_do_timer_from_boot(void)
+{
+	int cpu = smp_processor_id();
+	int from = tick_do_timer_boot_cpu;
+
+	if (from >= 0 && from != cpu)
+		smp_call_function_single(from, giveup_do_timer, &cpu, 1);
+}
+#endif
+
 /*
  * Setup the tick device
  */
@@ -186,12 +214,26 @@ static void tick_setup_device(struct tick_device *td,
 		 * this cpu:
 		 */
 		if (tick_do_timer_cpu == TICK_DO_TIMER_BOOT) {
-			if (!tick_nohz_full_cpu(cpu))
-				tick_do_timer_cpu = cpu;
-			else
-				tick_do_timer_cpu = TICK_DO_TIMER_NONE;
+			tick_do_timer_cpu = cpu;
+
 			tick_next_period = ktime_get();
 			tick_period = NSEC_PER_SEC / HZ;
+#ifdef CONFIG_NO_HZ_FULL
+			/*
+			 * The boot CPU may be nohz_full, in which case set
+			 * tick_do_timer_boot_cpu so the first housekeeping
+			 * secondary that comes up will take do_timer from
+			 * us.
+			 */
+			if (tick_nohz_full_cpu(cpu))
+				tick_do_timer_boot_cpu = cpu;
+
+		} else if (tick_do_timer_boot_cpu != -1 &&
+						!tick_nohz_full_cpu(cpu)) {
+			tick_take_do_timer_from_boot();
+			tick_do_timer_boot_cpu = -1;
+			WARN_ON(tick_do_timer_cpu != cpu);
+#endif
 		}
 
 		/*
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 6fa52cd6df0b..4aa917acbe1c 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -121,10 +121,16 @@ static void tick_sched_do_timer(struct tick_sched *ts, ktime_t now)
 	 * into a long sleep. If two CPUs happen to assign themselves to
 	 * this duty, then the jiffies update is still serialized by
 	 * jiffies_lock.
+	 *
+	 * If nohz_full is enabled, this should not happen because the
+	 * tick_do_timer_cpu never relinquishes.
 	 */
-	if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE)
-	    && !tick_nohz_full_cpu(cpu))
+	if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE)) {
+#ifdef CONFIG_NO_HZ_FULL
+		WARN_ON(tick_nohz_full_running);
+#endif
 		tick_do_timer_cpu = cpu;
+	}
 #endif
 
 	/* Check, if the jiffies need an update */
@@ -395,8 +401,8 @@ void __init tick_nohz_full_setup(cpumask_var_t cpumask)
 static int tick_nohz_cpu_down(unsigned int cpu)
 {
 	/*
-	 * The boot CPU handles housekeeping duty (unbound timers,
-	 * workqueues, timekeeping, ...) on behalf of full dynticks
+	 * The tick_do_timer_cpu CPU handles housekeeping duty (unbound
+	 * timers, workqueues, timekeeping, ...) on behalf of full dynticks
 	 * CPUs. It must remain online when nohz full is enabled.
 	 */
 	if (tick_nohz_full_running && tick_do_timer_cpu == cpu)
@@ -423,12 +429,15 @@ void __init tick_nohz_init(void)
 		return;
 	}
 
-	cpu = smp_processor_id();
+	if (IS_ENABLED(CONFIG_PM_SLEEP_SMP) &&
+			!IS_ENABLED(CONFIG_PM_SLEEP_SMP_NONZERO_CPU)) {
+		cpu = smp_processor_id();
 
-	if (cpumask_test_cpu(cpu, tick_nohz_full_mask)) {
-		pr_warn("NO_HZ: Clearing %d from nohz_full range for timekeeping\n",
-			cpu);
-		cpumask_clear_cpu(cpu, tick_nohz_full_mask);
+		if (cpumask_test_cpu(cpu, tick_nohz_full_mask)) {
+			pr_warn("NO_HZ: Clearing %d from nohz_full range "
+				"for timekeeping\n", cpu);
+			cpumask_clear_cpu(cpu, tick_nohz_full_mask);
+		}
 	}
 
 	for_each_cpu(cpu, tick_nohz_full_mask)
@@ -904,8 +913,13 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
 		/*
 		 * Boot safety: make sure the timekeeping duty has been
 		 * assigned before entering dyntick-idle mode,
+		 * tick_do_timer_cpu is TICK_DO_TIMER_BOOT
 		 */
-		if (tick_do_timer_cpu == TICK_DO_TIMER_NONE)
+		if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_BOOT))
+			return false;
+
+		/* Should not happen for nohz-full */
+		if (WARN_ON_ONCE(tick_do_timer_cpu == TICK_DO_TIMER_NONE))
 			return false;
 	}
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 3/5] kernel/cpu: Allow non-zero CPU to be primary for suspend / kexec freeze
  2019-04-11  3:34 ` [PATCH v2 3/5] kernel/cpu: Allow non-zero CPU to be primary for suspend / kexec freeze Nicholas Piggin
@ 2019-04-25 12:02   ` Peter Zijlstra
  2019-04-26  4:32     ` Nicholas Piggin
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2019-04-25 12:02 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: linux-arch, Frederic Weisbecker, Rafael J . Wysocki,
	linux-kernel, Ingo Molnar, Thomas Gleixner, linuxppc-dev

On Thu, Apr 11, 2019 at 01:34:46PM +1000, Nicholas Piggin wrote:
> This patch provides an arch option, ARCH_SUSPEND_NONZERO_CPU, to
> opt-in to allowing suspend to occur on one of the housekeeping CPUs
> rather than hardcoded CPU0.
> 
> This will allow CPU0 to be a nohz_full CPU with a later change.
> 
> It may be possible for platforms with hardware/firmware restrictions
> on suspend/wake effectively support this by handing off the final
> stage to CPU0 when kernel housekeeping is no longer required. Another
> option is to make housekeeping / nohz_full mask dynamic at runtime,
> but the complexity could not be justified at this time.

Should we not tie this into whatever already allows an achitecture to
hotplug CPU-0? For instance, x86 default disallows this but has
cpu0_hotpluggable to allow this.

Presumably POWER already allows hotplugging CPU-0 ?



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 0/5] Allow CPU0 to be nohz full
  2019-04-11  3:34 [PATCH v2 0/5] Allow CPU0 to be nohz full Nicholas Piggin
                   ` (4 preceding siblings ...)
  2019-04-11  3:34 ` [PATCH v2 5/5] nohz_full: Allow the boot CPU to be nohz_full Nicholas Piggin
@ 2019-04-25 12:04 ` Peter Zijlstra
  2019-04-30  2:46   ` Nicholas Piggin
  5 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2019-04-25 12:04 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: linux-arch, Frederic Weisbecker, Rafael J . Wysocki,
	linux-kernel, Ingo Molnar, Thomas Gleixner, linuxppc-dev

On Thu, Apr 11, 2019 at 01:34:43PM +1000, Nicholas Piggin wrote:
> Since last time, I added a compile time option to opt-out of this
> if the platform does not support suspend on non-zero, and tried to
> improve legibility of changelogs and explain the justification
> better.
> 
> I have been testing this on powerpc/pseries and it seems to work
> fine (the firmware call to suspend can be called on any CPU and
> resumes where it left off), but not included here because the
> code has some bitrot unrelated to this series which I hacked to
> fix. I will discuss it and either send an acked patch to go with
> this series if it is small, or fix it in powerpc tree.
> 

Rafael, Frederic, any comments?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 3/5] kernel/cpu: Allow non-zero CPU to be primary for suspend / kexec freeze
  2019-04-25 12:02   ` Peter Zijlstra
@ 2019-04-26  4:32     ` Nicholas Piggin
  0 siblings, 0 replies; 11+ messages in thread
From: Nicholas Piggin @ 2019-04-26  4:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-arch, Frederic Weisbecker, Rafael J . Wysocki,
	linux-kernel, Ingo Molnar, Thomas Gleixner, linuxppc-dev

Peter Zijlstra's on April 25, 2019 10:02 pm:
> On Thu, Apr 11, 2019 at 01:34:46PM +1000, Nicholas Piggin wrote:
>> This patch provides an arch option, ARCH_SUSPEND_NONZERO_CPU, to
>> opt-in to allowing suspend to occur on one of the housekeeping CPUs
>> rather than hardcoded CPU0.
>> 
>> This will allow CPU0 to be a nohz_full CPU with a later change.
>> 
>> It may be possible for platforms with hardware/firmware restrictions
>> on suspend/wake effectively support this by handing off the final
>> stage to CPU0 when kernel housekeeping is no longer required. Another
>> option is to make housekeeping / nohz_full mask dynamic at runtime,
>> but the complexity could not be justified at this time.
> 
> Should we not tie this into whatever already allows an achitecture to
> hotplug CPU-0? For instance, x86 default disallows this but has
> cpu0_hotpluggable to allow this.
 
I didn't know about that option, but I see it still has the suspend
/ hibernate restriction though, which is what this patch is breaking
from.

If we are to prevent suspend completely at boot time, then it's no
problem to run cpu0 with nohz_full, but TPTB decided that's a bad
thing.

But I have no problem with an arch adding another boot time option
or hook into cpu0_hotpluggable that allows you to force nohz_full.

> Presumably POWER already allows hotplugging CPU-0 ?

Yeah it does. Suspend in pseries is actually really just used for
some proprietary hypervisor partition migration scheme where you
suspend, image gets saved, then resume it somewhere else. So that's
easy, it's just a hypercall that appears to return exactly as it was
called. No restriction on CPU number.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 0/5] Allow CPU0 to be nohz full
  2019-04-25 12:04 ` [PATCH v2 0/5] Allow CPU0 to be nohz full Peter Zijlstra
@ 2019-04-30  2:46   ` Nicholas Piggin
  2019-04-30 12:07     ` Peter Zijlstra
  0 siblings, 1 reply; 11+ messages in thread
From: Nicholas Piggin @ 2019-04-30  2:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-arch, Frederic Weisbecker, Rafael J . Wysocki,
	linux-kernel, Ingo Molnar, Thomas Gleixner, linuxppc-dev

Peter Zijlstra's on April 25, 2019 10:04 pm:
> On Thu, Apr 11, 2019 at 01:34:43PM +1000, Nicholas Piggin wrote:
>> Since last time, I added a compile time option to opt-out of this
>> if the platform does not support suspend on non-zero, and tried to
>> improve legibility of changelogs and explain the justification
>> better.
>> 
>> I have been testing this on powerpc/pseries and it seems to work
>> fine (the firmware call to suspend can be called on any CPU and
>> resumes where it left off), but not included here because the
>> code has some bitrot unrelated to this series which I hacked to
>> fix. I will discuss it and either send an acked patch to go with
>> this series if it is small, or fix it in powerpc tree.
>> 
> 
> Rafael, Frederic, any comments?
> 

Sorry to ping again, I guess people are probably busy after vacation.
Any chance we could get this in next merge window? Peter are you okay
with the config option as it is, then we can look at adapting it to
what x86 needs as a follow up (e.g., allow nohz CPU0 for
cpu0_hotpluggable case)?

Thanks,
Nick


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 0/5] Allow CPU0 to be nohz full
  2019-04-30  2:46   ` Nicholas Piggin
@ 2019-04-30 12:07     ` Peter Zijlstra
  0 siblings, 0 replies; 11+ messages in thread
From: Peter Zijlstra @ 2019-04-30 12:07 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: linux-arch, Frederic Weisbecker, Rafael J . Wysocki,
	linux-kernel, Ingo Molnar, Thomas Gleixner, linuxppc-dev

On Tue, Apr 30, 2019 at 12:46:40PM +1000, Nicholas Piggin wrote:
> Peter Zijlstra's on April 25, 2019 10:04 pm:
> > On Thu, Apr 11, 2019 at 01:34:43PM +1000, Nicholas Piggin wrote:
> >> Since last time, I added a compile time option to opt-out of this
> >> if the platform does not support suspend on non-zero, and tried to
> >> improve legibility of changelogs and explain the justification
> >> better.
> >> 
> >> I have been testing this on powerpc/pseries and it seems to work
> >> fine (the firmware call to suspend can be called on any CPU and
> >> resumes where it left off), but not included here because the
> >> code has some bitrot unrelated to this series which I hacked to
> >> fix. I will discuss it and either send an acked patch to go with
> >> this series if it is small, or fix it in powerpc tree.
> >> 
> > 
> > Rafael, Frederic, any comments?
> > 
> 
> Sorry to ping again, I guess people are probably busy after vacation.
> Any chance we could get this in next merge window? Peter are you okay
> with the config option as it is, then we can look at adapting it to
> what x86 needs as a follow up (e.g., allow nohz CPU0 for
> cpu0_hotpluggable case)?

Yeah, let me just queue these here patches. Not sure they'll still make
the upcoming merge window, but we can try.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2019-04-30 12:10 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-11  3:34 [PATCH v2 0/5] Allow CPU0 to be nohz full Nicholas Piggin
2019-04-11  3:34 ` [PATCH v2 1/5] sched/core: allow the remote scheduler tick to be started on CPU0 Nicholas Piggin
2019-04-11  3:34 ` [PATCH v2 2/5] PM / suspend: add function to disable secondaries for suspend Nicholas Piggin
2019-04-11  3:34 ` [PATCH v2 3/5] kernel/cpu: Allow non-zero CPU to be primary for suspend / kexec freeze Nicholas Piggin
2019-04-25 12:02   ` Peter Zijlstra
2019-04-26  4:32     ` Nicholas Piggin
2019-04-11  3:34 ` [PATCH v2 4/5] kernel/sched/isolation: require a present CPU in housekeeping mask Nicholas Piggin
2019-04-11  3:34 ` [PATCH v2 5/5] nohz_full: Allow the boot CPU to be nohz_full Nicholas Piggin
2019-04-25 12:04 ` [PATCH v2 0/5] Allow CPU0 to be nohz full Peter Zijlstra
2019-04-30  2:46   ` Nicholas Piggin
2019-04-30 12:07     ` Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).