* [PATCH] kernel/isolation: Assert that a housekeeping CPU comes up at boot time
@ 2019-06-01 11:39 Nicholas Piggin
From: Nicholas Piggin @ 2019-06-01 11:39 UTC (permalink / raw)
  To: linux-kernel
  Cc: Nicholas Piggin, Frederic Weisbecker, Ingo Molnar, Peter Zijlstra

With the change to allow the boot CPU (CPU0) to be isolated, it is possible
to specify command-line options that leave no housekeeping CPU online at
boot.

For example, an 8-CPU system booted with "nohz_full=0-6 maxcpus=4" leaves
only CPU7 for housekeeping, but maxcpus=4 brings up only CPUs 0-3.

At housekeeping init time it is not easy to know every SMP option that
can result in an invalid configuration, so this patch adds a sanity check
after SMP init to ensure that at least one housekeeping CPU has been
brought online.

The panic is undesirable, but it is better than the alternative of an
obscure, non-deterministic failure. The panic reliably happens when these
advanced parameters are used incorrectly.
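The failure condition in the "nohz_full=0-6 maxcpus=4" example can be modelled in plain C. This is a hypothetical userspace sketch, not kernel code: a uint64_t stands in for struct cpumask, and the helper names cpu_range(), housekeeping_from() and housekeeping_cpu_online() are invented for illustration. It assumes, as we read housekeeping_setup(), that the housekeeping set is the possible CPUs minus the nohz_full= list, and that maxcpus=4 brings up CPUs 0-3.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model: bit N of a mask represents CPU N (N < 64). */
typedef uint64_t cpumask_model_t;

/* Build a mask with CPUs lo..hi (inclusive) set, like the CPU list "0-6". */
static cpumask_model_t cpu_range(unsigned int lo, unsigned int hi)
{
	cpumask_model_t mask = 0;

	assert(lo <= hi && hi < 64);
	for (unsigned int cpu = lo; cpu <= hi; cpu++)
		mask |= (cpumask_model_t)1 << cpu;
	return mask;
}

/* Assumed rule: housekeeping CPUs are the possible CPUs not in nohz_full=. */
static cpumask_model_t housekeeping_from(cpumask_model_t possible,
					 cpumask_model_t nohz_full)
{
	return possible & ~nohz_full;
}

/*
 * Same idea as the loop in housekeeping_verify_smp(): walk the online CPUs
 * and stop at the first housekeeping one. Returns 1 if one is found, and
 * 0 in the case where the patch would panic().
 */
static int housekeeping_cpu_online(cpumask_model_t online,
				   cpumask_model_t housekeeping)
{
	for (unsigned int cpu = 0; cpu < 64; cpu++) {
		cpumask_model_t bit = (cpumask_model_t)1 << cpu;

		if ((online & bit) && (housekeeping & bit))
			return 1;
	}
	return 0;
}
```

With the example values, housekeeping_from(cpu_range(0, 7), cpu_range(0, 6)) is 0x80 (CPU7 only) while the online mask cpu_range(0, 3) is 0x0F, so housekeeping_cpu_online() returns 0: the two sets never intersect.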

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 kernel/sched/isolation.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 123ea07a3f3b..7b9e1e0d4ec3 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -63,6 +63,29 @@ void __init housekeeping_init(void)
 	WARN_ON_ONCE(cpumask_empty(housekeeping_mask));
 }
 
+static int __init housekeeping_verify_smp(void)
+{
+	int cpu;
+
+	/*
+	 * Early housekeeping setup is done before CPUs come up, and there are
+	 * a range of options scattered around that can restrict which CPUs
+	 * come up. It is possible to pass in a combination of housekeeping
+	 * and SMP arguments that result in housekeeping assigned to an
+	 * offline CPU.
+	 *
+	 * Check that condition here after SMP comes up, and give a useful
+	 * error message rather than an obscure non deterministic crash or
+	 * hang later.
+	 */
+	for_each_online_cpu(cpu) {
+		if (cpumask_test_cpu(cpu, housekeeping_mask))
+			return 0;
+	}
+	panic("Housekeeping: nohz_full= or isolcpus= resulted in no online CPUs for housekeeping.\n");
+}
+core_initcall(housekeeping_verify_smp);
+
 static int __init housekeeping_setup(char *str, enum hk_flags flags)
 {
 	cpumask_var_t non_housekeeping_mask;
-- 
2.20.1



* Re: [PATCH] kernel/isolation: Assert that a housekeeping CPU comes up at boot time
@ 2019-06-10  7:24 ` Nicholas Piggin
From: Nicholas Piggin @ 2019-06-10  7:24 UTC (permalink / raw)
  To: linux-kernel; +Cc: Frederic Weisbecker, Ingo Molnar, Peter Zijlstra

Nicholas Piggin's on June 1, 2019 9:39 pm:
> With the change to allow the boot CPU0 to be isolated, it is possible
> to specify command line options that result in no housekeeping CPU
> online at boot.
> 
> An 8 CPU system booted with "nohz_full=0-6 maxcpus=4", for example.
> 
> It is not easily possible at housekeeping init time to know all the
> various SMP options that will result in an invalid configuration, so
> this patch adds a sanity check after SMP init, to ensure that a
> housekeeping CPU has been onlined.
> 
> The panic is undesirable, but it's better than the alternative of an
> obscure non deterministic failure. The panic will reliably happen
> when advanced parameters are used incorrectly.

Ping on this one? This should resolve Frederic's remaining objection
to the series (at least until he solves it more generally).

As the series has already been merged, should we get this upstream
before release?

Thanks,
Nick


* Re: [PATCH] kernel/isolation: Assert that a housekeeping CPU comes up at boot time
@ 2019-06-17 15:59   ` Peter Zijlstra
From: Peter Zijlstra @ 2019-06-17 15:59 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linux-kernel, Frederic Weisbecker, Ingo Molnar

On Mon, Jun 10, 2019 at 05:24:32PM +1000, Nicholas Piggin wrote:
> Nicholas Piggin's on June 1, 2019 9:39 pm:
> > With the change to allow the boot CPU0 to be isolated, it is possible
> > to specify command line options that result in no housekeeping CPU
> > online at boot.
> > 
> > An 8 CPU system booted with "nohz_full=0-6 maxcpus=4", for example.
> > 
> > It is not easily possible at housekeeping init time to know all the
> > various SMP options that will result in an invalid configuration, so
> > this patch adds a sanity check after SMP init, to ensure that a
> > housekeeping CPU has been onlined.
> > 
> > The panic is undesirable, but it's better than the alternative of an
> > obscure non deterministic failure. The panic will reliably happen
> > when advanced parameters are used incorrectly.
> 
> Ping on this one? This should resolve Frederic's remaining objection
> to the series (at least until he solves it more generally).
> 
> As the series has already been merged, should we get this upstream
> before release?

I was hoping for feedback from Frederic; lacking that, I've queued it
now.



* Re: [PATCH] kernel/isolation: Assert that a housekeeping CPU comes up at boot time
@ 2019-06-17 19:05     ` Frederic Weisbecker
From: Frederic Weisbecker @ 2019-06-17 19:05 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Nicholas Piggin, linux-kernel, Ingo Molnar

On Mon, Jun 17, 2019 at 05:59:31PM +0200, Peter Zijlstra wrote:
> On Mon, Jun 10, 2019 at 05:24:32PM +1000, Nicholas Piggin wrote:
> > Nicholas Piggin's on June 1, 2019 9:39 pm:
> > > With the change to allow the boot CPU0 to be isolated, it is possible
> > > to specify command line options that result in no housekeeping CPU
> > > online at boot.
> > > 
> > > An 8 CPU system booted with "nohz_full=0-6 maxcpus=4", for example.
> > > 
> > > It is not easily possible at housekeeping init time to know all the
> > > various SMP options that will result in an invalid configuration, so
> > > this patch adds a sanity check after SMP init, to ensure that a
> > > housekeeping CPU has been onlined.
> > > 
> > > The panic is undesirable, but it's better than the alternative of an
> > > obscure non deterministic failure. The panic will reliably happen
> > > when advanced parameters are used incorrectly.
> > 
> > Ping on this one? This should resolve Frederic's remaining objection
> > to the series (at least until he solves it more generally).
> > 
> > As the series has already been merged, should we get this upstream
> > before release?
> 
> I was hoping for feedback from Frederic, lacking that, I've queued it
> now.
> 

Sorry, I just came back from vacation. Any chance we can use a WARN() instead?
I prefer to use panic() only when data is really threatened or such.

Thanks.


* Re: [PATCH] kernel/isolation: Assert that a housekeeping CPU comes up at boot time
@ 2019-06-19  2:40       ` Nicholas Piggin
From: Nicholas Piggin @ 2019-06-19  2:40 UTC (permalink / raw)
  To: Frederic Weisbecker, Peter Zijlstra; +Cc: linux-kernel, Ingo Molnar

Frederic Weisbecker's on June 18, 2019 5:05 am:
> On Mon, Jun 17, 2019 at 05:59:31PM +0200, Peter Zijlstra wrote:
>> On Mon, Jun 10, 2019 at 05:24:32PM +1000, Nicholas Piggin wrote:
>> > Nicholas Piggin's on June 1, 2019 9:39 pm:
>> > > With the change to allow the boot CPU0 to be isolated, it is possible
>> > > to specify command line options that result in no housekeeping CPU
>> > > online at boot.
>> > > 
>> > > An 8 CPU system booted with "nohz_full=0-6 maxcpus=4", for example.
>> > > 
>> > > It is not easily possible at housekeeping init time to know all the
>> > > various SMP options that will result in an invalid configuration, so
>> > > this patch adds a sanity check after SMP init, to ensure that a
>> > > housekeeping CPU has been onlined.
>> > > 
>> > > The panic is undesirable, but it's better than the alternative of an
>> > > obscure non deterministic failure. The panic will reliably happen
>> > > when advanced parameters are used incorrectly.
>> > 
>> > Ping on this one? This should resolve Frederic's remaining objection
>> > to the series (at least until he solves it more generally).
>> > 
>> > As the series has already been merged, should we get this upstream
>> > before release?
>> 
>> I was hoping for feedback from Frederic, lacking that, I've queued it
>> now.
>> 
> 
> Sorry I just came back from vacation. Any chance we can use a WARN() instead?
> I prefer to use panic() only when data is really threatened or such.

I thought it was decided to panic here: without a housekeeping CPU, the
system is unlikely to behave properly. A WARN might scroll off the screen
by the time things grind to a halt.

This is a one-time boot parameter misconfiguration, and many such
misconfigurations already cause a panic and stop the boot.

No question, if we can make this more dynamic that would be better, but
for the near term at least, can we go with this?

Thanks,
Nick


* Re: [PATCH] kernel/isolation: Assert that a housekeeping CPU comes up at boot time
@ 2019-06-24 10:57 ` Qais Yousef
From: Qais Yousef @ 2019-06-24 10:57 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: linux-kernel, Frederic Weisbecker, Ingo Molnar, Peter Zijlstra

On 06/01/19 21:39, Nicholas Piggin wrote:
> With the change to allow the boot CPU0 to be isolated, it is possible
> to specify command line options that result in no housekeeping CPU
> online at boot.
> 
> An 8 CPU system booted with "nohz_full=0-6 maxcpus=4", for example.
> 
> It is not easily possible at housekeeping init time to know all the
> various SMP options that will result in an invalid configuration, so
> this patch adds a sanity check after SMP init, to ensure that a
> housekeeping CPU has been onlined.
> 
> The panic is undesirable, but it's better than the alternative of an
> obscure non deterministic failure. The panic will reliably happen
> when advanced parameters are used incorrectly.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  kernel/sched/isolation.c | 23 +++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
> 
> diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
> index 123ea07a3f3b..7b9e1e0d4ec3 100644
> --- a/kernel/sched/isolation.c
> +++ b/kernel/sched/isolation.c
> @@ -63,6 +63,29 @@ void __init housekeeping_init(void)
>  	WARN_ON_ONCE(cpumask_empty(housekeeping_mask));
>  }
>  
> +static int __init housekeeping_verify_smp(void)
> +{
> +	int cpu;
> +
> +	/*
> +	 * Early housekeeping setup is done before CPUs come up, and there are
> +	 * a range of options scattered around that can restrict which CPUs
> +	 * come up. It is possible to pass in a combination of housekeeping
> +	 * and SMP arguments that result in housekeeping assigned to an
> +	 * offline CPU.
> +	 *
> +	 * Check that condition here after SMP comes up, and give a useful
> +	 * error message rather than an obscure non deterministic crash or
> +	 * hang later.
> +	 */
> +	for_each_online_cpu(cpu) {
> +		if (cpumask_test_cpu(cpu, housekeeping_mask))
> +			return 0;
> +	}
> +	panic("Housekeeping: nohz_full= or isolcpus= resulted in no online CPUs for housekeeping.\n");

I am hitting this panic when I boot my Juno board.


I have CONFIG_CPU_ISOLATION=y, but I pass neither nohz_full= nor isolcpus= on
the command line. I think what's going on is that housekeeping_setup() doesn't
get called, and hence housekeeping_mask isn't initialized in my case, causing
this check to fail and trigger the panic.
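This diagnosis can be sketched as a small userspace model. It assumes housekeeping_mask sits in zero-initialized static storage when housekeeping_setup() never runs (our assumption; with CONFIG_CPUMASK_OFFSTACK=y the mask would instead be an unallocated pointer), and it uses any nonzero flags value as a placeholder for whichever HK_FLAG_* bits the boot parameters would set. The names hk_model and verify_smp_model are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the two globals in kernel/sched/isolation.c. */
struct hk_model {
	unsigned int flags;	/* stands in for housekeeping_flags */
	uint64_t mask;		/* stands in for housekeeping_mask, bit N = CPU N */
};

/*
 * Model of housekeeping_verify_smp(). The for_each_online_cpu() loop is
 * collapsed into a bitwise AND of the online and housekeeping masks.
 * Returns 0 on success and -1 where the kernel would panic().
 * with_guard selects whether the proposed early
 * "if (!housekeeping_flags) return 0;" check is present.
 */
static int verify_smp_model(const struct hk_model *hk, uint64_t online,
			    int with_guard)
{
	assert(hk);
	if (with_guard && !hk->flags)
		return 0;	/* no nohz_full=/isolcpus=: mask never set up */
	if (online & hk->mask)
		return 0;	/* at least one online housekeeping CPU */
	return -1;
}
```

With no boot parameters the zeroed mask intersects no online CPU, so the unguarded check fails even on a correctly configured system. The guard skips the check when nothing was configured, while a genuine misconfiguration (flags set, mask disjoint from the online CPUs) still fails.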

The change below seems to 'fix' it, though I'm not sure it's the right way
forward. A revert obviously fixes it too, but I doubt we want that :-)


diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 7b9e1e0d4ec3..a9ca8628c1a2 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -67,6 +67,9 @@ static int __init housekeeping_verify_smp(void)
 {
 	int cpu;
 
+	if (!housekeeping_flags)
+		return 0;
+
 	/*
 	 * Early housekeeping setup is done before CPUs come up, and there are
 	 * a range of options scattered around that can restrict which CPUs


Cheers

--
Qais Yousef


> +}
> +core_initcall(housekeeping_verify_smp);
> +
>  static int __init housekeeping_setup(char *str, enum hk_flags flags)
>  {
>  	cpumask_var_t non_housekeeping_mask;
> -- 
> 2.20.1
> 


* Re: [PATCH] kernel/isolation: Assert that a housekeeping CPU comes up at boot time
@ 2019-06-25  0:05   ` Nicholas Piggin
From: Nicholas Piggin @ 2019-06-25  0:05 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Frederic Weisbecker, linux-kernel, Ingo Molnar, Peter Zijlstra

Qais Yousef's on June 24, 2019 8:57 pm:
> On 06/01/19 21:39, Nicholas Piggin wrote:
>> With the change to allow the boot CPU0 to be isolated, it is possible
>> to specify command line options that result in no housekeeping CPU
>> online at boot.
>> 
>> An 8 CPU system booted with "nohz_full=0-6 maxcpus=4", for example.
>> 
>> It is not easily possible at housekeeping init time to know all the
>> various SMP options that will result in an invalid configuration, so
>> this patch adds a sanity check after SMP init, to ensure that a
>> housekeeping CPU has been onlined.
>> 
>> The panic is undesirable, but it's better than the alternative of an
>> obscure non deterministic failure. The panic will reliably happen
>> when advanced parameters are used incorrectly.
>> 
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> ---
>>  kernel/sched/isolation.c | 23 +++++++++++++++++++++++
>>  1 file changed, 23 insertions(+)
>> 
>> diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
>> index 123ea07a3f3b..7b9e1e0d4ec3 100644
>> --- a/kernel/sched/isolation.c
>> +++ b/kernel/sched/isolation.c
>> @@ -63,6 +63,29 @@ void __init housekeeping_init(void)
>>  	WARN_ON_ONCE(cpumask_empty(housekeeping_mask));
>>  }
>>  
>> +static int __init housekeeping_verify_smp(void)
>> +{
>> +	int cpu;
>> +
>> +	/*
>> +	 * Early housekeeping setup is done before CPUs come up, and there are
>> +	 * a range of options scattered around that can restrict which CPUs
>> +	 * come up. It is possible to pass in a combination of housekeeping
>> +	 * and SMP arguments that result in housekeeping assigned to an
>> +	 * offline CPU.
>> +	 *
>> +	 * Check that condition here after SMP comes up, and give a useful
>> +	 * error message rather than an obscure non deterministic crash or
>> +	 * hang later.
>> +	 */
>> +	for_each_online_cpu(cpu) {
>> +		if (cpumask_test_cpu(cpu, housekeeping_mask))
>> +			return 0;
>> +	}
>> +	panic("Housekeeping: nohz_full= or isolcpus= resulted in no online CPUs for housekeeping.\n");
> 
> I am hitting this panic when I boot my juno board.
> 
> 
> I have CONFIG_CPU_ISOLATION=y but I don't pass nohz_full nor isolcpus in the
> commandline. I think what's going on is that housekeeping_setup() doesn't get
> called and hence housekeeping_mask isn't initialized in my case, causing this
> check to fail and trigger the panic.
> 
> The below seems to 'fix' it though not sure if it's the right way forward.
> A revert obviously fixes it too but I doubt we want that :-)

That'll do it. Thanks for the report and investigation.

> 
> 
> diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
> index 7b9e1e0d4ec3..a9ca8628c1a2 100644
> --- a/kernel/sched/isolation.c
> +++ b/kernel/sched/isolation.c
> @@ -67,6 +67,9 @@ static int __init housekeeping_verify_smp(void)
>  {
>  	int cpu;
>  
> +	if (!housekeeping_flags)
> +		return 0;
> +
>  	/*
>  	 * Early housekeeping setup is done before CPUs come up, and there are
>  	 * a range of options scattered around that can restrict which CPUs
> 
> 
> Cheers
> 
> --
> Qais Yousef
> 
> 
>> +}
>> +core_initcall(housekeeping_verify_smp);
>> +
>>  static int __init housekeeping_setup(char *str, enum hk_flags flags)
>>  {
>>  	cpumask_var_t non_housekeeping_mask;
>> -- 
>> 2.20.1
>> 
> 

