linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] Allow CPU0 to be nohz full
@ 2019-04-04 12:07 Nicholas Piggin
  2019-04-04 12:07 ` [PATCH 1/4] sched/core: allow the remote scheduler tick to be started on CPU0 Nicholas Piggin
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Nicholas Piggin @ 2019-04-04 12:07 UTC (permalink / raw)
  To: Thomas Gleixner, Frederic Weisbecker
  Cc: Nicholas Piggin, Ingo Molnar, Peter Zijlstra, Rafael J . Wysocki,
	linux-kernel

I've been looking at ways to fix suspend breakage with CPU0 as a
nohz CPU. I started looking at various things like allowing CPU0
to take over do_timer again temporarily or allowing nohz full
to be stopped at runtime (that is quite a significant change for
little real benefit). The problem then was having the housekeeping
CPU go offline.

So I decided to try just allowing the freeze to occur on non-zero
CPU. This seems to be a lot simpler to get working, but I guess
some archs won't be able to deal with this? Would it be okay to
make it opt-in per arch?

Thanks,
Nick

Nicholas Piggin (4):
  sched/core: allow the remote scheduler tick to be started on CPU0
  kernel/cpu: Allow non-zero CPU to be primary for suspend / kexec
    freeze
  kernel/sched/isolation: require a present CPU in housekeeping mask
  nohz_full: Allow the boot CPU to be full nohz

 include/linux/cpu.h       |  2 +-
 kernel/cpu.c              | 10 +++++++-
 kernel/sched/core.c       |  2 +-
 kernel/sched/isolation.c  | 18 ++++++++++----
 kernel/time/tick-common.c | 50 +++++++++++++++++++++++++++++++++++----
 kernel/time/tick-sched.c  | 27 +++++++++++----------
 6 files changed, 84 insertions(+), 25 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/4] sched/core: allow the remote scheduler tick to be started on CPU0
  2019-04-04 12:07 [PATCH 0/4] Allow CPU0 to be nohz full Nicholas Piggin
@ 2019-04-04 12:07 ` Nicholas Piggin
  2019-04-04 12:07 ` [PATCH 2/4] kernel/cpu: Allow non-zero CPU to be primary for suspend / kexec freeze Nicholas Piggin
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: Nicholas Piggin @ 2019-04-04 12:07 UTC (permalink / raw)
  To: Thomas Gleixner, Frederic Weisbecker
  Cc: Nicholas Piggin, Ingo Molnar, Peter Zijlstra, Rafael J . Wysocki,
	linux-kernel

This has on effect yet because CPU0 will always be a housekeeping CPU
until a later change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 kernel/sched/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4778c48a7fda..10e05ec049b6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5918,7 +5918,7 @@ void __init sched_init_smp(void)
 
 static int __init migration_init(void)
 {
-	sched_rq_cpu_starting(smp_processor_id());
+	sched_cpu_starting(smp_processor_id());
 	return 0;
 }
 early_initcall(migration_init);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/4] kernel/cpu: Allow non-zero CPU to be primary for suspend / kexec freeze
  2019-04-04 12:07 [PATCH 0/4] Allow CPU0 to be nohz full Nicholas Piggin
  2019-04-04 12:07 ` [PATCH 1/4] sched/core: allow the remote scheduler tick to be started on CPU0 Nicholas Piggin
@ 2019-04-04 12:07 ` Nicholas Piggin
  2019-04-04 12:07 ` [PATCH 3/4] kernel/sched/isolation: require a present CPU in housekeeping mask Nicholas Piggin
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: Nicholas Piggin @ 2019-04-04 12:07 UTC (permalink / raw)
  To: Thomas Gleixner, Frederic Weisbecker
  Cc: Nicholas Piggin, Ingo Molnar, Peter Zijlstra, Rafael J . Wysocki,
	linux-kernel

This patch chooses a housekeeping CPU to be the primary when disabling
CPUs for suspend / kexec freeze. This should not have any effect until
a later change because CPU0 is always a housekeeping CPU.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 include/linux/cpu.h |  2 +-
 kernel/cpu.c        | 10 +++++++++-
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 5041357d0297..b11c94d88953 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -134,7 +134,7 @@ static inline void put_online_cpus(void) { cpus_read_unlock(); }
 extern int freeze_secondary_cpus(int primary);
 static inline int disable_nonboot_cpus(void)
 {
-	return freeze_secondary_cpus(0);
+	return freeze_secondary_cpus(-1);
 }
 extern void enable_nonboot_cpus(void);
 #else /* !CONFIG_PM_SLEEP_SMP */
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 6754f3ecfd94..d1bf6e2b4752 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -9,6 +9,7 @@
 #include <linux/notifier.h>
 #include <linux/sched/signal.h>
 #include <linux/sched/hotplug.h>
+#include <linux/sched/isolation.h>
 #include <linux/sched/task.h>
 #include <linux/sched/smt.h>
 #include <linux/unistd.h>
@@ -1199,8 +1200,15 @@ int freeze_secondary_cpus(int primary)
 	int cpu, error = 0;
 
 	cpu_maps_update_begin();
-	if (!cpu_online(primary))
+	if (primary == -1) {
 		primary = cpumask_first(cpu_online_mask);
+		if (!housekeeping_cpu(primary, HK_FLAG_TIMER))
+			primary = housekeeping_any_cpu(HK_FLAG_TIMER);
+	} else {
+		if (!cpu_online(primary))
+			primary = cpumask_first(cpu_online_mask);
+	}
+
 	/*
 	 * We take down all of the non-boot CPUs in one shot to avoid races
 	 * with the userspace trying to use the CPU hotplug at the same time
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/4] kernel/sched/isolation: require a present CPU in housekeeping mask
  2019-04-04 12:07 [PATCH 0/4] Allow CPU0 to be nohz full Nicholas Piggin
  2019-04-04 12:07 ` [PATCH 1/4] sched/core: allow the remote scheduler tick to be started on CPU0 Nicholas Piggin
  2019-04-04 12:07 ` [PATCH 2/4] kernel/cpu: Allow non-zero CPU to be primary for suspend / kexec freeze Nicholas Piggin
@ 2019-04-04 12:07 ` Nicholas Piggin
  2019-04-04 12:07 ` [PATCH 4/4] nohz_full: Allow the boot CPU to be full nohz Nicholas Piggin
  2019-04-04 14:36 ` [PATCH 0/4] Allow CPU0 to be nohz full Thomas Gleixner
  4 siblings, 0 replies; 12+ messages in thread
From: Nicholas Piggin @ 2019-04-04 12:07 UTC (permalink / raw)
  To: Thomas Gleixner, Frederic Weisbecker
  Cc: Nicholas Piggin, Ingo Molnar, Peter Zijlstra, Rafael J . Wysocki,
	linux-kernel

During housekeeping mask setup, currently a possible CPU is required.
That does not guarantee a CPU at boot time, so check to ensure that
at least one present CPU is in the mask.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 kernel/sched/isolation.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index b02d148e7672..687302051a27 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -65,6 +65,7 @@ void __init housekeeping_init(void)
 static int __init housekeeping_setup(char *str, enum hk_flags flags)
 {
 	cpumask_var_t non_housekeeping_mask;
+	cpumask_var_t tmp;
 	int err;
 
 	alloc_bootmem_cpumask_var(&non_housekeeping_mask);
@@ -75,16 +76,23 @@ static int __init housekeeping_setup(char *str, enum hk_flags flags)
 		return 0;
 	}
 
+	alloc_bootmem_cpumask_var(&tmp);
 	if (!housekeeping_flags) {
 		alloc_bootmem_cpumask_var(&housekeeping_mask);
 		cpumask_andnot(housekeeping_mask,
 			       cpu_possible_mask, non_housekeeping_mask);
-		if (cpumask_empty(housekeeping_mask))
+
+		cpumask_andnot(tmp, cpu_present_mask, non_housekeeping_mask);
+		if (cpumask_empty(tmp)) {
+			pr_warn("Housekeeping: must include one present CPU, "
+				"using boot CPU:%d\n", smp_processor_id());
 			__cpumask_set_cpu(smp_processor_id(), housekeeping_mask);
+			__cpumask_clear_cpu(smp_processor_id(), non_housekeeping_mask);
+		}
 	} else {
-		cpumask_var_t tmp;
-
-		alloc_bootmem_cpumask_var(&tmp);
+		cpumask_andnot(tmp, cpu_present_mask, non_housekeeping_mask);
+		if (cpumask_empty(tmp))
+			__cpumask_clear_cpu(smp_processor_id(), non_housekeeping_mask);
 		cpumask_andnot(tmp, cpu_possible_mask, non_housekeeping_mask);
 		if (!cpumask_equal(tmp, housekeeping_mask)) {
 			pr_warn("Housekeeping: nohz_full= must match isolcpus=\n");
@@ -92,8 +100,8 @@ static int __init housekeeping_setup(char *str, enum hk_flags flags)
 			free_bootmem_cpumask_var(non_housekeeping_mask);
 			return 0;
 		}
-		free_bootmem_cpumask_var(tmp);
 	}
+	free_bootmem_cpumask_var(tmp);
 
 	if ((flags & HK_FLAG_TICK) && !(housekeeping_flags & HK_FLAG_TICK)) {
 		if (IS_ENABLED(CONFIG_NO_HZ_FULL)) {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 4/4] nohz_full: Allow the boot CPU to be full nohz
  2019-04-04 12:07 [PATCH 0/4] Allow CPU0 to be nohz full Nicholas Piggin
                   ` (2 preceding siblings ...)
  2019-04-04 12:07 ` [PATCH 3/4] kernel/sched/isolation: require a present CPU in housekeeping mask Nicholas Piggin
@ 2019-04-04 12:07 ` Nicholas Piggin
  2019-04-04 14:36 ` [PATCH 0/4] Allow CPU0 to be nohz full Thomas Gleixner
  4 siblings, 0 replies; 12+ messages in thread
From: Nicholas Piggin @ 2019-04-04 12:07 UTC (permalink / raw)
  To: Thomas Gleixner, Frederic Weisbecker
  Cc: Nicholas Piggin, Ingo Molnar, Peter Zijlstra, Rafael J . Wysocki,
	linux-kernel

Allow the boot CPU to be full nohz, and have it take the do_timer duty
temporarily during boot.

nohz_full has been successful at significantly reducing jitter for a
large supercomputer customer, but their job control system requires
CPU0 to be for housekeeping.

This will cause suspend / kexec freeze to occur on a non-boot CPU,
so the option may need to be made conditional by arch?

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 kernel/time/tick-common.c | 50 +++++++++++++++++++++++++++++++++++----
 kernel/time/tick-sched.c  | 27 +++++++++++----------
 2 files changed, 60 insertions(+), 17 deletions(-)

diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index 529143b4c8d2..31146c13226e 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -46,6 +46,14 @@ ktime_t tick_period;
  *    procedure also covers cpu hotplug.
  */
 int tick_do_timer_cpu __read_mostly = TICK_DO_TIMER_BOOT;
+#ifdef CONFIG_NO_HZ_FULL
+/*
+ * tick_do_timer_boot_cpu indicates the boot CPU temporarily owns
+ * tick_do_timer_cpu and it should be taken over by an eligible secondary
+ * when one comes online.
+ */
+static int tick_do_timer_boot_cpu __read_mostly = -1;
+#endif
 
 /*
  * Debugging: see timer_list.c
@@ -167,6 +175,26 @@ void tick_setup_periodic(struct clock_event_device *dev, int broadcast)
 	}
 }
 
+#ifdef CONFIG_NO_HZ_FULL
+static void giveup_do_timer(void *info)
+{
+	int cpu = *(unsigned int *)info;
+
+	WARN_ON(tick_do_timer_cpu != smp_processor_id());
+
+	tick_do_timer_cpu = cpu;
+}
+
+static void tick_take_do_timer_from_boot(void)
+{
+	int cpu = smp_processor_id();
+	int from = tick_do_timer_boot_cpu;
+
+	if (from >= 0 && from != cpu)
+		smp_call_function_single(from, giveup_do_timer, &cpu, 1);
+}
+#endif
+
 /*
  * Setup the tick device
  */
@@ -186,12 +214,26 @@ static void tick_setup_device(struct tick_device *td,
 		 * this cpu:
 		 */
 		if (tick_do_timer_cpu == TICK_DO_TIMER_BOOT) {
-			if (!tick_nohz_full_cpu(cpu))
-				tick_do_timer_cpu = cpu;
-			else
-				tick_do_timer_cpu = TICK_DO_TIMER_NONE;
+			tick_do_timer_cpu = cpu;
+
 			tick_next_period = ktime_get();
 			tick_period = NSEC_PER_SEC / HZ;
+#ifdef CONFIG_NO_HZ_FULL
+			/*
+			 * The boot CPU may be nohz_full, in which case set
+			 * tick_do_timer_boot_cpu so the first housekeeping
+			 * secondary that comes up will take do_timer from
+			 * us.
+			 */
+			if (tick_nohz_full_cpu(cpu))
+				tick_do_timer_boot_cpu = cpu;
+
+		} else if (tick_do_timer_boot_cpu != -1 &&
+						!tick_nohz_full_cpu(cpu)) {
+			tick_take_do_timer_from_boot();
+			tick_do_timer_boot_cpu = -1;
+			WARN_ON(tick_do_timer_cpu != cpu);
+#endif
 		}
 
 		/*
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 6fa52cd6df0b..c0105bf4ecd9 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -121,10 +121,14 @@ static void tick_sched_do_timer(struct tick_sched *ts, ktime_t now)
 	 * into a long sleep. If two CPUs happen to assign themselves to
 	 * this duty, then the jiffies update is still serialized by
 	 * jiffies_lock.
+	 *
+	 * If nohz_full is enabled, this should not happen because the
+	 * tick_do_timer_cpu never relinquishes.
 	 */
-	if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE)
-	    && !tick_nohz_full_cpu(cpu))
+	if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE)) {
+		WARN_ON(tick_nohz_full_running);
 		tick_do_timer_cpu = cpu;
+	}
 #endif
 
 	/* Check, if the jiffies need an update */
@@ -395,8 +399,8 @@ void __init tick_nohz_full_setup(cpumask_var_t cpumask)
 static int tick_nohz_cpu_down(unsigned int cpu)
 {
 	/*
-	 * The boot CPU handles housekeeping duty (unbound timers,
-	 * workqueues, timekeeping, ...) on behalf of full dynticks
+	 * The tick_do_timer_cpu CPU handles housekeeping duty (unbound
+	 * timers, workqueues, timekeeping, ...) on behalf of full dynticks
 	 * CPUs. It must remain online when nohz full is enabled.
 	 */
 	if (tick_nohz_full_running && tick_do_timer_cpu == cpu)
@@ -423,14 +427,6 @@ void __init tick_nohz_init(void)
 		return;
 	}
 
-	cpu = smp_processor_id();
-
-	if (cpumask_test_cpu(cpu, tick_nohz_full_mask)) {
-		pr_warn("NO_HZ: Clearing %d from nohz_full range for timekeeping\n",
-			cpu);
-		cpumask_clear_cpu(cpu, tick_nohz_full_mask);
-	}
-
 	for_each_cpu(cpu, tick_nohz_full_mask)
 		context_tracking_cpu_set(cpu);
 
@@ -904,8 +900,13 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
 		/*
 		 * Boot safety: make sure the timekeeping duty has been
 		 * assigned before entering dyntick-idle mode,
+		 * tick_do_timer_cpu is TICK_DO_TIMER_BOOT
 		 */
-		if (tick_do_timer_cpu == TICK_DO_TIMER_NONE)
+		if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_BOOT))
+			return false;
+
+		/* Should not happen for nohz-full */
+		if (WARN_ON_ONCE(tick_do_timer_cpu == TICK_DO_TIMER_NONE))
 			return false;
 	}
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/4] Allow CPU0 to be nohz full
  2019-04-04 12:07 [PATCH 0/4] Allow CPU0 to be nohz full Nicholas Piggin
                   ` (3 preceding siblings ...)
  2019-04-04 12:07 ` [PATCH 4/4] nohz_full: Allow the boot CPU to be full nohz Nicholas Piggin
@ 2019-04-04 14:36 ` Thomas Gleixner
  2019-04-04 16:02   ` Nicholas Piggin
  4 siblings, 1 reply; 12+ messages in thread
From: Thomas Gleixner @ 2019-04-04 14:36 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Frederic Weisbecker, Ingo Molnar, Peter Zijlstra,
	Rafael J . Wysocki, linux-kernel

On Thu, 4 Apr 2019, Nicholas Piggin wrote:

> I've been looking at ways to fix suspend breakage with CPU0 as a
> nohz CPU. I started looking at various things like allowing CPU0
> to take over do_timer again temporarily or allowing nohz full
> to be stopped at runtime (that is quite a significant change for
> little real benefit). The problem then was having the housekeeping
> CPU go offline.
> 
> So I decided to try just allowing the freeze to occur on non-zero
> CPU. This seems to be a lot simpler to get working, but I guess
> some archs won't be able to deal with this? Would it be okay to
> make it opt-in per arch?

It needs to be opt in. x86 will fall on its nose with that.

Now the real interesting question is WHY do we need that at all?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/4] Allow CPU0 to be nohz full
  2019-04-04 14:36 ` [PATCH 0/4] Allow CPU0 to be nohz full Thomas Gleixner
@ 2019-04-04 16:02   ` Nicholas Piggin
  2019-04-05 17:54     ` Thomas Gleixner
  0 siblings, 1 reply; 12+ messages in thread
From: Nicholas Piggin @ 2019-04-04 16:02 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Frederic Weisbecker, linux-kernel, Ingo Molnar, Peter Zijlstra,
	Rafael J . Wysocki

Thomas Gleixner's on April 5, 2019 12:36 am:
> On Thu, 4 Apr 2019, Nicholas Piggin wrote:
> 
>> I've been looking at ways to fix suspend breakage with CPU0 as a
>> nohz CPU. I started looking at various things like allowing CPU0
>> to take over do_timer again temporarily or allowing nohz full
>> to be stopped at runtime (that is quite a significant change for
>> little real benefit). The problem then was having the housekeeping
>> CPU go offline.
>> 
>> So I decided to try just allowing the freeze to occur on non-zero
>> CPU. This seems to be a lot simpler to get working, but I guess
>> some archs won't be able to deal with this? Would it be okay to
>> make it opt-in per arch?
> 
> It needs to be opt in. x86 will fall on its nose with that.

Okay I can add that.

> Now the real interesting question is WHY do we need that at all?

Why full nohz for CPU0? Basically this is how their job system was
written and used, testing nohz full was a change that came much later 
as an optimisation.

I don't think there is a fundamental reason an equivalent system
could not be made that uses a different CPU for housekeeping, but I
was assured the change would be quite difficult for them.

If we can support it, it seems nice if you can take a particular
configuration and just apply nohz_full to your application processors
without any other changes.

Thanks,
Nick


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/4] Allow CPU0 to be nohz full
  2019-04-04 16:02   ` Nicholas Piggin
@ 2019-04-05 17:54     ` Thomas Gleixner
  2019-04-09  9:21       ` Nicholas Piggin
  0 siblings, 1 reply; 12+ messages in thread
From: Thomas Gleixner @ 2019-04-05 17:54 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Frederic Weisbecker, linux-kernel, Ingo Molnar, Peter Zijlstra,
	Rafael J . Wysocki

On Fri, 5 Apr 2019, Nicholas Piggin wrote:
> Thomas Gleixner's on April 5, 2019 12:36 am:
> > On Thu, 4 Apr 2019, Nicholas Piggin wrote:
> > 
> >> I've been looking at ways to fix suspend breakage with CPU0 as a
> >> nohz CPU. I started looking at various things like allowing CPU0
> >> to take over do_timer again temporarily or allowing nohz full
> >> to be stopped at runtime (that is quite a significant change for
> >> little real benefit). The problem then was having the housekeeping
> >> CPU go offline.
> >> 
> >> So I decided to try just allowing the freeze to occur on non-zero
> >> CPU. This seems to be a lot simpler to get working, but I guess
> >> some archs won't be able to deal with this? Would it be okay to
> >> make it opt-in per arch?
> > 
> > It needs to be opt in. x86 will fall on its nose with that.
> 
> Okay I can add that.
> 
> > Now the real interesting question is WHY do we need that at all?
> 
> Why full nohz for CPU0? Basically this is how their job system was
> written and used, testing nohz full was a change that came much later 
> as an optimisation.
> 
> I don't think there is a fundamental reason an equivalent system
> could not be made that uses a different CPU for housekeeping, but I
> was assured the change would be quite difficult for them.
> 
> If we can support it, it seems nice if you can take a particular
> configuration and just apply nohz_full to your application processors
> without any other changes.

This wants an explanation in the patches. And patch 4 has in the changelog:

   nohz_full has been successful at significantly reducing jitter for a
   large supercomputer customer, but their job control system requires CPU0
   to be for housekeeping.

which just makes me dazed and confused :)

Other than some coherent explanation and making it opt in, I don't think
there is a fundamental issue with that.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/4] Allow CPU0 to be nohz full
  2019-04-05 17:54     ` Thomas Gleixner
@ 2019-04-09  9:21       ` Nicholas Piggin
  2019-04-11 15:42         ` Paul E. McKenney
  0 siblings, 1 reply; 12+ messages in thread
From: Nicholas Piggin @ 2019-04-09  9:21 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Frederic Weisbecker, linux-kernel, Ingo Molnar, Peter Zijlstra,
	Rafael J . Wysocki

Thomas Gleixner's on April 6, 2019 3:54 am:
> On Fri, 5 Apr 2019, Nicholas Piggin wrote:
>> Thomas Gleixner's on April 5, 2019 12:36 am:
>> > On Thu, 4 Apr 2019, Nicholas Piggin wrote:
>> > 
>> >> I've been looking at ways to fix suspend breakage with CPU0 as a
>> >> nohz CPU. I started looking at various things like allowing CPU0
>> >> to take over do_timer again temporarily or allowing nohz full
>> >> to be stopped at runtime (that is quite a significant change for
>> >> little real benefit). The problem then was having the housekeeping
>> >> CPU go offline.
>> >> 
>> >> So I decided to try just allowing the freeze to occur on non-zero
>> >> CPU. This seems to be a lot simpler to get working, but I guess
>> >> some archs won't be able to deal with this? Would it be okay to
>> >> make it opt-in per arch?
>> > 
>> > It needs to be opt in. x86 will fall on its nose with that.
>> 
>> Okay I can add that.
>> 
>> > Now the real interesting question is WHY do we need that at all?
>> 
>> Why full nohz for CPU0? Basically this is how their job system was
>> written and used, testing nohz full was a change that came much later 
>> as an optimisation.
>> 
>> I don't think there is a fundamental reason an equivalent system
>> could not be made that uses a different CPU for housekeeping, but I
>> was assured the change would be quite difficult for them.
>> 
>> If we can support it, it seems nice if you can take a particular
>> configuration and just apply nohz_full to your application processors
>> without any other changes.
> 
> This wants an explanation in the patches.

Okay.

> And patch 4 has in the changelog:
> 
>    nohz_full has been successful at significantly reducing jitter for a
>    large supercomputer customer, but their job control system requires CPU0
>    to be for housekeeping.
> 
> which just makes me dazed and confused :)
> 
> Other than some coherent explanation and making it opt in, I don't think
> there is a fundamental issue with that.

I will try to make the changelogs less jibberish then :)

Thanks,
Nick


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/4] Allow CPU0 to be nohz full
  2019-04-09  9:21       ` Nicholas Piggin
@ 2019-04-11 15:42         ` Paul E. McKenney
  2019-04-12  3:16           ` Nicholas Piggin
  0 siblings, 1 reply; 12+ messages in thread
From: Paul E. McKenney @ 2019-04-11 15:42 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Thomas Gleixner, Frederic Weisbecker, linux-kernel, Ingo Molnar,
	Peter Zijlstra, Rafael J . Wysocki

On Tue, Apr 09, 2019 at 07:21:54PM +1000, Nicholas Piggin wrote:
> Thomas Gleixner's on April 6, 2019 3:54 am:
> > On Fri, 5 Apr 2019, Nicholas Piggin wrote:
> >> Thomas Gleixner's on April 5, 2019 12:36 am:
> >> > On Thu, 4 Apr 2019, Nicholas Piggin wrote:
> >> > 
> >> >> I've been looking at ways to fix suspend breakage with CPU0 as a
> >> >> nohz CPU. I started looking at various things like allowing CPU0
> >> >> to take over do_timer again temporarily or allowing nohz full
> >> >> to be stopped at runtime (that is quite a significant change for
> >> >> little real benefit). The problem then was having the housekeeping
> >> >> CPU go offline.
> >> >> 
> >> >> So I decided to try just allowing the freeze to occur on non-zero
> >> >> CPU. This seems to be a lot simpler to get working, but I guess
> >> >> some archs won't be able to deal with this? Would it be okay to
> >> >> make it opt-in per arch?
> >> > 
> >> > It needs to be opt in. x86 will fall on its nose with that.
> >> 
> >> Okay I can add that.
> >> 
> >> > Now the real interesting question is WHY do we need that at all?
> >> 
> >> Why full nohz for CPU0? Basically this is how their job system was
> >> written and used, testing nohz full was a change that came much later 
> >> as an optimisation.
> >> 
> >> I don't think there is a fundamental reason an equivalent system
> >> could not be made that uses a different CPU for housekeeping, but I
> >> was assured the change would be quite difficult for them.
> >> 
> >> If we can support it, it seems nice if you can take a particular
> >> configuration and just apply nohz_full to your application processors
> >> without any other changes.
> > 
> > This wants an explanation in the patches.
> 
> Okay.
> 
> > And patch 4 has in the changelog:
> > 
> >    nohz_full has been successful at significantly reducing jitter for a
> >    large supercomputer customer, but their job control system requires CPU0
> >    to be for housekeeping.
> > 
> > which just makes me dazed and confused :)
> > 
> > Other than some coherent explanation and making it opt in, I don't think
> > there is a fundamental issue with that.
> 
> I will try to make the changelogs less jibberish then :)

Maybe this is all taken care of now, but do the various clocks stay
synchronized with wall-clock time if all CPUs are in nohz_full mode?
At one time, at least one CPU needed to keep its scheduler-clock
interrupt going in order to keep things in sync.

The ppc timebase register might make it possible to do this without any
scheduler-clock interrupts, but figured I should check.  ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/4] Allow CPU0 to be nohz full
  2019-04-11 15:42         ` Paul E. McKenney
@ 2019-04-12  3:16           ` Nicholas Piggin
  2019-04-12 11:30             ` Paul E. McKenney
  0 siblings, 1 reply; 12+ messages in thread
From: Nicholas Piggin @ 2019-04-12  3:16 UTC (permalink / raw)
  To: paulmck
  Cc: Frederic Weisbecker, linux-kernel, Ingo Molnar, Peter Zijlstra,
	Rafael J . Wysocki, Thomas Gleixner

Paul E. McKenney's on April 12, 2019 1:42 am:
> On Tue, Apr 09, 2019 at 07:21:54PM +1000, Nicholas Piggin wrote:
>> Thomas Gleixner's on April 6, 2019 3:54 am:
>> > On Fri, 5 Apr 2019, Nicholas Piggin wrote:
>> >> Thomas Gleixner's on April 5, 2019 12:36 am:
>> >> > On Thu, 4 Apr 2019, Nicholas Piggin wrote:
>> >> > 
>> >> >> I've been looking at ways to fix suspend breakage with CPU0 as a
>> >> >> nohz CPU. I started looking at various things like allowing CPU0
>> >> >> to take over do_timer again temporarily or allowing nohz full
>> >> >> to be stopped at runtime (that is quite a significant change for
>> >> >> little real benefit). The problem then was having the housekeeping
>> >> >> CPU go offline.
>> >> >> 
>> >> >> So I decided to try just allowing the freeze to occur on non-zero
>> >> >> CPU. This seems to be a lot simpler to get working, but I guess
>> >> >> some archs won't be able to deal with this? Would it be okay to
>> >> >> make it opt-in per arch?
>> >> > 
>> >> > It needs to be opt in. x86 will fall on its nose with that.
>> >> 
>> >> Okay I can add that.
>> >> 
>> >> > Now the real interesting question is WHY do we need that at all?
>> >> 
>> >> Why full nohz for CPU0? Basically this is how their job system was
>> >> written and used, testing nohz full was a change that came much later 
>> >> as an optimisation.
>> >> 
>> >> I don't think there is a fundamental reason an equivalent system
>> >> could not be made that uses a different CPU for housekeeping, but I
>> >> was assured the change would be quite difficult for them.
>> >> 
>> >> If we can support it, it seems nice if you can take a particular
>> >> configuration and just apply nohz_full to your application processors
>> >> without any other changes.
>> > 
>> > This wants an explanation in the patches.
>> 
>> Okay.
>> 
>> > And patch 4 has in the changelog:
>> > 
>> >    nohz_full has been successful at significantly reducing jitter for a
>> >    large supercomputer customer, but their job control system requires CPU0
>> >    to be for housekeeping.
>> > 
>> > which just makes me dazed and confused :)
>> > 
>> > Other than some coherent explanation and making it opt in, I don't think
>> > there is a fundamental issue with that.
>> 
>> I will try to make the changelogs less jibberish then :)
> 
> Maybe this is all taken care of now, but do the various clocks stay
> synchronized with wall-clock time if all CPUs are in nohz_full mode?
> At one time, at least one CPU needed to keep its scheduler-clock
> interrupt going in order to keep things in sync.

Ah, may not have been clear in the changelog -- the series still 
requires at least one CPU present at boot time to be a housekeeper 
that keeps things running. So conceptually this doesn't change 
anything about runtime behaviour, the main change is the boot-time
handoff from CPU0.

> The ppc timebase register might make it possible to do this without any
> scheduler-clock interrupts, but figured I should check.  ;-)

I dont know all this code too well, but if we really wanted to push 
things, I think nohz-full could be more aggressive in shutting down 
the tick and possibly even avoiding a housekeeping CPU completely, but 
you would have to do that work on user->kernel switch too. Likely the 
complexity and overhead is not worthwhile.

Other thing is you might be able to avoid the jiffies tick completely
and change jiffies to read from timebase register. Lot of interesting
things we could try.

Thanks,
Nick


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/4] Allow CPU0 to be nohz full
  2019-04-12  3:16           ` Nicholas Piggin
@ 2019-04-12 11:30             ` Paul E. McKenney
  0 siblings, 0 replies; 12+ messages in thread
From: Paul E. McKenney @ 2019-04-12 11:30 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Frederic Weisbecker, linux-kernel, Ingo Molnar, Peter Zijlstra,
	Rafael J . Wysocki, Thomas Gleixner

On Fri, Apr 12, 2019 at 01:16:01PM +1000, Nicholas Piggin wrote:
> Paul E. McKenney's on April 12, 2019 1:42 am:
> > On Tue, Apr 09, 2019 at 07:21:54PM +1000, Nicholas Piggin wrote:
> >> Thomas Gleixner's on April 6, 2019 3:54 am:
> >> > On Fri, 5 Apr 2019, Nicholas Piggin wrote:
> >> >> Thomas Gleixner's on April 5, 2019 12:36 am:
> >> >> > On Thu, 4 Apr 2019, Nicholas Piggin wrote:
> >> >> > 
> >> >> >> I've been looking at ways to fix suspend breakage with CPU0 as a
> >> >> >> nohz CPU. I started looking at various things like allowing CPU0
> >> >> >> to take over do_timer again temporarily or allowing nohz full
> >> >> >> to be stopped at runtime (that is quite a significant change for
> >> >> >> little real benefit). The problem then was having the housekeeping
> >> >> >> CPU go offline.
> >> >> >> 
> >> >> >> So I decided to try just allowing the freeze to occur on non-zero
> >> >> >> CPU. This seems to be a lot simpler to get working, but I guess
> >> >> >> some archs won't be able to deal with this? Would it be okay to
> >> >> >> make it opt-in per arch?
> >> >> > 
> >> >> > It needs to be opt in. x86 will fall on its nose with that.
> >> >> 
> >> >> Okay I can add that.
> >> >> 
> >> >> > Now the real interesting question is WHY do we need that at all?
> >> >> 
> >> >> Why full nohz for CPU0? Basically this is how their job system was
> >> >> written and used, testing nohz full was a change that came much later 
> >> >> as an optimisation.
> >> >> 
> >> >> I don't think there is a fundamental reason an equivalent system
> >> >> could not be made that uses a different CPU for housekeeping, but I
> >> >> was assured the change would be quite difficult for them.
> >> >> 
> >> >> If we can support it, it seems nice if you can take a particular
> >> >> configuration and just apply nohz_full to your application processors
> >> >> without any other changes.
> >> > 
> >> > This wants an explanation in the patches.
> >> 
> >> Okay.
> >> 
> >> > And patch 4 has in the changelog:
> >> > 
> >> >    nohz_full has been successful at significantly reducing jitter for a
> >> >    large supercomputer customer, but their job control system requires CPU0
> >> >    to be for housekeeping.
> >> > 
> >> > which just makes me dazed and confused :)
> >> > 
> >> > Other than some coherent explanation and making it opt in, I don't think
> >> > there is a fundamental issue with that.
> >> 
> >> I will try to make the changelogs less jibberish then :)
> > 
> > Maybe this is all taken care of now, but do the various clocks stay
> > synchronized with wall-clock time if all CPUs are in nohz_full mode?
> > At one time, at least one CPU needed to keep its scheduler-clock
> > interrupt going in order to keep things in sync.
> 
> Ah, may not have been clear in the changelog -- the series still 
> requires at least one CPU present at boot time to be a housekeeper 
> that keeps things running. So conceptually this doesn't change 
> anything about runtime behaviour, the main change is the boot-time
> handoff from CPU0.

I did miss that, thank you for the update.

> > The ppc timebase register might make it possible to do this without any
> > scheduler-clock interrupts, but figured I should check.  ;-)
> 
> I dont know all this code too well, but if we really wanted to push 
> things, I think nohz-full could be more aggressive in shutting down 
> the tick and possibly even avoiding a housekeeping CPU completely, but 
> you would have to do that work on user->kernel switch too. Likely the 
> complexity and overhead is not worthwhile.

There was some RCU functionality that detected when all the
non-housekeeping CPUs went idle, but it went unused for some years, so
I reverted it.  This revert commit is at tag sysidle.2017.05.11a in my
-rcu tree.  If it is actually going to be used, I could of course add
it back.  ;-)

> Other thing is you might be able to avoid the jiffies tick completely
> and change jiffies to read from timebase register. Lot of interesting
> things we could try.

Or make userspace use the timebase register to avoid the need for
in-kernel time adjustments, though the connection to NTP and similar
would still need to be maintained.  I supposed that the jiffies
counter could be fixed up on entry to the kernel?

							Thanx, Paul


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2019-04-12 11:30 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-04 12:07 [PATCH 0/4] Allow CPU0 to be nohz full Nicholas Piggin
2019-04-04 12:07 ` [PATCH 1/4] sched/core: allow the remote scheduler tick to be started on CPU0 Nicholas Piggin
2019-04-04 12:07 ` [PATCH 2/4] kernel/cpu: Allow non-zero CPU to be primary for suspend / kexec freeze Nicholas Piggin
2019-04-04 12:07 ` [PATCH 3/4] kernel/sched/isolation: require a present CPU in housekeeping mask Nicholas Piggin
2019-04-04 12:07 ` [PATCH 4/4] nohz_full: Allow the boot CPU to be full nohz Nicholas Piggin
2019-04-04 14:36 ` [PATCH 0/4] Allow CPU0 to be nohz full Thomas Gleixner
2019-04-04 16:02   ` Nicholas Piggin
2019-04-05 17:54     ` Thomas Gleixner
2019-04-09  9:21       ` Nicholas Piggin
2019-04-11 15:42         ` Paul E. McKenney
2019-04-12  3:16           ` Nicholas Piggin
2019-04-12 11:30             ` Paul E. McKenney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).