linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] smp,cpumask: Don't call functions on offline CPUs
@ 2019-05-22 11:15 Andrew Murray
  2019-05-22 14:09 ` Peter Zijlstra
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew Murray @ 2019-05-22 11:15 UTC (permalink / raw)
  To: peterz, Thomas Gleixner; +Cc: Rik van Riel, linux-kernel

When we are able to allocate a cpumask in on_each_cpu_cond_mask
we call functions with on_each_cpu_mask - this masks out offline
cpus via smp_call_function_many.

However when we fail to allocate a cpumask in on_each_cpu_cond_mask
we call functions with smp_call_function_single - this will return
-ENXIO from generic_exec_single if a CPU is offline which will
result in a WARN_ON_ONCE.

Let's avoid the WARN by only calling smp_call_function_single when
the CPU is online and thus making both paths consistent with each
other.

Signed-off-by: Andrew Murray <andrew.murray@arm.com>
---
 kernel/smp.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index f4cf1b0bb3b8..10970692f1c0 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -259,6 +259,7 @@ static void flush_smp_call_function_queue(bool warn_cpu_offline)
 
 /*
  * smp_call_function_single - Run a function on a specific CPU
+ * @cpu:  The CPU to run on.
  * @func: The function to run. This must be fast and non-blocking.
  * @info: An arbitrary pointer to pass to the function.
  * @wait: If true, wait until function has completed on other CPUs.
@@ -657,6 +658,8 @@ EXPORT_SYMBOL(on_each_cpu_mask);
  *		completed on other CPUs.
  * @gfp_flags:	GFP flags to use when allocating the cpumask
  *		used internally by the function.
+ * @mask:	The set of cpus to run on (only runs on online
+		subset).
  *
  * The function might sleep if the GFP flags indicates a non
  * atomic allocation is allowed.
@@ -690,12 +693,13 @@ void on_each_cpu_cond_mask(bool (*cond_func)(int cpu, void *info),
 		 * just have to IPI them one by one.
 		 */
 		preempt_disable();
-		for_each_cpu(cpu, mask)
-			if (cond_func(cpu, info)) {
+		for_each_cpu(cpu, mask) {
+			if (cpu_online(cpu) && cond_func(cpu, info)) {
 				ret = smp_call_function_single(cpu, func,
 								info, wait);
 				WARN_ON_ONCE(ret);
 			}
+		}
 		preempt_enable();
 	}
 }
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] smp,cpumask: Don't call functions on offline CPUs
  2019-05-22 11:15 [PATCH] smp,cpumask: Don't call functions on offline CPUs Andrew Murray
@ 2019-05-22 14:09 ` Peter Zijlstra
  2019-05-22 14:37   ` Andrew Murray
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2019-05-22 14:09 UTC (permalink / raw)
  To: Andrew Murray; +Cc: Thomas Gleixner, Rik van Riel, linux-kernel

On Wed, May 22, 2019 at 12:15:37PM +0100, Andrew Murray wrote:
> When we are able to allocate a cpumask in on_each_cpu_cond_mask
> we call functions with on_each_cpu_mask - this masks out offline
> cpus via smp_call_function_many.
> 
> However when we fail to allocate a cpumask in on_each_cpu_cond_mask
> we call functions with smp_call_function_single - this will return
> -ENXIO from generic_exec_single if a CPU is offline which will
> result in a WARN_ON_ONCE.
> 
> Let's avoid the WARN by only calling smp_call_function_single when
> the CPU is online and thus making both paths consistent with each
> other.

I'm confused, why are you feeding it offline CPUs to begin with? @mask
shouldn't include them.

Is perhaps the problem that on_each_cpu_cond() uses cpu_onlne_mask
without protection?

Something like so?

diff --git a/kernel/smp.c b/kernel/smp.c
index f4cf1b0bb3b8..a493b3dfa67f 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -705,8 +707,10 @@ void on_each_cpu_cond(bool (*cond_func)(int cpu, void *info),
 			smp_call_func_t func, void *info, bool wait,
 			gfp_t gfp_flags)
 {
+	cpus_read_lock();
 	on_each_cpu_cond_mask(cond_func, func, info, wait, gfp_flags,
 				cpu_online_mask);
+	cpus_read_unlock();
 }
 EXPORT_SYMBOL(on_each_cpu_cond);
 


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] smp,cpumask: Don't call functions on offline CPUs
  2019-05-22 14:09 ` Peter Zijlstra
@ 2019-05-22 14:37   ` Andrew Murray
  2019-05-22 14:49     ` Peter Zijlstra
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew Murray @ 2019-05-22 14:37 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Thomas Gleixner, Rik van Riel, linux-kernel

On Wed, May 22, 2019 at 04:09:21PM +0200, Peter Zijlstra wrote:
> On Wed, May 22, 2019 at 12:15:37PM +0100, Andrew Murray wrote:
> > When we are able to allocate a cpumask in on_each_cpu_cond_mask
> > we call functions with on_each_cpu_mask - this masks out offline
> > cpus via smp_call_function_many.
> > 
> > However when we fail to allocate a cpumask in on_each_cpu_cond_mask
> > we call functions with smp_call_function_single - this will return
> > -ENXIO from generic_exec_single if a CPU is offline which will
> > result in a WARN_ON_ONCE.
> > 
> > Let's avoid the WARN by only calling smp_call_function_single when
> > the CPU is online and thus making both paths consistent with each
> > other.
> 
> I'm confused, why are you feeding it offline CPUs to begin with? @mask
> shouldn't include them.

There are only two users of this function: on_each_cpu_cond which passes
cpu_online_mask, and native_flush_tlb_others from arch/x86/mm/tlb.c. But
I haven't followed all callers to native_flush_tlb_others to determine if
they are only passing in online CPUs.

Whilst trying to understand the code I couldn't help but notice that the
behaviour changes depending on the ability of memory allocation at the
time. This seemed odd. And of course there may be future callers of this
function that for some strange reason do pass in offline CPUs. (Maybe this
is OK and it's something they shouldn't be doing).

I felt that this function should be consistent and document the behaviour
via the @mask doc like some of the other functions in the file.

> 
> Is perhaps the problem that on_each_cpu_cond() uses cpu_onlne_mask
> without protection?

Does this prevent racing with a CPU going offline? I guess this prevents
the warning at the expense of a lock - but is only beneficial in the
unlikely path. (In the likely path this prevents new CPUs going offline
but we don't care because we don't WARN if they aren't they when we
attempt to call functions).

At least this is my limited understanding.

Thanks,

Andrew Murray

> 
> Something like so?
> 
> diff --git a/kernel/smp.c b/kernel/smp.c
> index f4cf1b0bb3b8..a493b3dfa67f 100644
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -705,8 +707,10 @@ void on_each_cpu_cond(bool (*cond_func)(int cpu, void *info),
>  			smp_call_func_t func, void *info, bool wait,
>  			gfp_t gfp_flags)
>  {
> +	cpus_read_lock();
>  	on_each_cpu_cond_mask(cond_func, func, info, wait, gfp_flags,
>  				cpu_online_mask);
> +	cpus_read_unlock();
>  }
>  EXPORT_SYMBOL(on_each_cpu_cond);
>  
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] smp,cpumask: Don't call functions on offline CPUs
  2019-05-22 14:37   ` Andrew Murray
@ 2019-05-22 14:49     ` Peter Zijlstra
  2019-05-22 15:14       ` Andrew Murray
  2019-05-22 18:23       ` Rik van Riel
  0 siblings, 2 replies; 8+ messages in thread
From: Peter Zijlstra @ 2019-05-22 14:49 UTC (permalink / raw)
  To: Andrew Murray; +Cc: Thomas Gleixner, Rik van Riel, linux-kernel

On Wed, May 22, 2019 at 03:37:11PM +0100, Andrew Murray wrote:
> > Is perhaps the problem that on_each_cpu_cond() uses cpu_onlne_mask
> > without protection?
> 
> Does this prevent racing with a CPU going offline? I guess this prevents
> the warning at the expense of a lock - but is only beneficial in the
> unlikely path. (In the likely path this prevents new CPUs going offline
> but we don't care because we don't WARN if they aren't they when we
> attempt to call functions).
> 
> At least this is my limited understanding.

Hmm.. I don't think it could matter, we only use the mask when
preempt_disable(), which would already block offline, due to it using
stop_machine().

So the patch is a no-op.

What's the WARN you see? TLB invalidation should pass mm_cpumask(),
which similarly should not contain offline CPUs I'm thinking.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] smp,cpumask: Don't call functions on offline CPUs
  2019-05-22 14:49     ` Peter Zijlstra
@ 2019-05-22 15:14       ` Andrew Murray
  2019-05-27 10:52         ` Peter Zijlstra
  2019-05-22 18:23       ` Rik van Riel
  1 sibling, 1 reply; 8+ messages in thread
From: Andrew Murray @ 2019-05-22 15:14 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Thomas Gleixner, Rik van Riel, linux-kernel

On Wed, May 22, 2019 at 04:49:18PM +0200, Peter Zijlstra wrote:
> On Wed, May 22, 2019 at 03:37:11PM +0100, Andrew Murray wrote:
> > > Is perhaps the problem that on_each_cpu_cond() uses cpu_onlne_mask
> > > without protection?
> > 
> > Does this prevent racing with a CPU going offline? I guess this prevents
> > the warning at the expense of a lock - but is only beneficial in the
> > unlikely path. (In the likely path this prevents new CPUs going offline
> > but we don't care because we don't WARN if they aren't they when we
> > attempt to call functions).
> > 
> > At least this is my limited understanding.
> 
> Hmm.. I don't think it could matter, we only use the mask when
> preempt_disable(), which would already block offline, due to it using
> stop_machine().

OK, so as long as all arches use stop_machine to bring down a CPU then
preeempt_disable will stop racing with CPUs going offline (I don't know
if they all do or not).

> 
> So the patch is a no-op.
> 
> What's the WARN you see? TLB invalidation should pass mm_cpumask(),
> which similarly should not contain offline CPUs I'm thinking.

So the only remaining issue is if callers pass offline CPUs to the
function (I guess there is still the chance of a race between calling
on_each_cpu_cond_mask and entering the preempt disabled'd bit). As you
suggest they probably don't.

Given the above, should we add a " @mask: The mask of cpus it can run on."
to on_each_cpu_cond_mask? (Which is different to the wording of other
functions in the same file that mask it with online CPUs, e.g.
" @mask: The set of cpus to run on (only runs on online subset).".

(I haven't seen any WARN, but from looking at the code thought that it
was possible.)

Thanks,

Andrew Murray

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] smp,cpumask: Don't call functions on offline CPUs
  2019-05-22 14:49     ` Peter Zijlstra
  2019-05-22 15:14       ` Andrew Murray
@ 2019-05-22 18:23       ` Rik van Riel
  2019-05-27 10:53         ` Peter Zijlstra
  1 sibling, 1 reply; 8+ messages in thread
From: Rik van Riel @ 2019-05-22 18:23 UTC (permalink / raw)
  To: Peter Zijlstra, Andrew Murray; +Cc: Thomas Gleixner, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1372 bytes --]

On Wed, 2019-05-22 at 16:49 +0200, Peter Zijlstra wrote:
> On Wed, May 22, 2019 at 03:37:11PM +0100, Andrew Murray wrote:
> > > Is perhaps the problem that on_each_cpu_cond() uses
> > > cpu_onlne_mask
> > > without protection?
> > 
> > Does this prevent racing with a CPU going offline? I guess this
> > prevents
> > the warning at the expense of a lock - but is only beneficial in
> > the
> > unlikely path. (In the likely path this prevents new CPUs going
> > offline
> > but we don't care because we don't WARN if they aren't they when we
> > attempt to call functions).
> > 
> > At least this is my limited understanding.
> 
> Hmm.. I don't think it could matter, we only use the mask when
> preempt_disable(), which would already block offline, due to it using
> stop_machine().
> 
> So the patch is a no-op.
> 
> What's the WARN you see? TLB invalidation should pass mm_cpumask(),
> which similarly should not contain offline CPUs I'm thinking.

Does the TLB invalidation code have anything in it
to prevent from racing with the CPU offline code?

In other words, could we end up with the TLB
invalidation code building its bitmask, getting
interrupted (eg. hypervisor preemption, NMI),
and not sending out the IPI to that bitmask of
CPUs until after one of the CPUs in the bitmap
has gotten offlined?

-- 
All Rights Reversed.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] smp,cpumask: Don't call functions on offline CPUs
  2019-05-22 15:14       ` Andrew Murray
@ 2019-05-27 10:52         ` Peter Zijlstra
  0 siblings, 0 replies; 8+ messages in thread
From: Peter Zijlstra @ 2019-05-27 10:52 UTC (permalink / raw)
  To: Andrew Murray; +Cc: Thomas Gleixner, Rik van Riel, linux-kernel

On Wed, May 22, 2019 at 04:14:23PM +0100, Andrew Murray wrote:
> On Wed, May 22, 2019 at 04:49:18PM +0200, Peter Zijlstra wrote:
> > On Wed, May 22, 2019 at 03:37:11PM +0100, Andrew Murray wrote:
> > > > Is perhaps the problem that on_each_cpu_cond() uses cpu_onlne_mask
> > > > without protection?
> > > 
> > > Does this prevent racing with a CPU going offline? I guess this prevents
> > > the warning at the expense of a lock - but is only beneficial in the
> > > unlikely path. (In the likely path this prevents new CPUs going offline
> > > but we don't care because we don't WARN if they aren't they when we
> > > attempt to call functions).
> > > 
> > > At least this is my limited understanding.
> > 
> > Hmm.. I don't think it could matter, we only use the mask when
> > preempt_disable(), which would already block offline, due to it using
> > stop_machine().
> 
> OK, so as long as all arches use stop_machine to bring down a CPU then
> preeempt_disable will stop racing with CPUs going offline (I don't know
> if they all do or not).

kernel/cpu.c:takedown_cpu() is generic code :-)

> > So the patch is a no-op.
> > 
> > What's the WARN you see? TLB invalidation should pass mm_cpumask(),
> > which similarly should not contain offline CPUs I'm thinking.
> 
> So the only remaining issue is if callers pass offline CPUs to the
> function (I guess there is still the chance of a race between calling
> on_each_cpu_cond_mask and entering the preempt disabled'd bit). As you
> suggest they probably don't.

Yes, it is possible to construct fail that way.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] smp,cpumask: Don't call functions on offline CPUs
  2019-05-22 18:23       ` Rik van Riel
@ 2019-05-27 10:53         ` Peter Zijlstra
  0 siblings, 0 replies; 8+ messages in thread
From: Peter Zijlstra @ 2019-05-27 10:53 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Andrew Murray, Thomas Gleixner, linux-kernel

On Wed, May 22, 2019 at 02:23:47PM -0400, Rik van Riel wrote:
> On Wed, 2019-05-22 at 16:49 +0200, Peter Zijlstra wrote:
> > On Wed, May 22, 2019 at 03:37:11PM +0100, Andrew Murray wrote:
> > > > Is perhaps the problem that on_each_cpu_cond() uses
> > > > cpu_onlne_mask
> > > > without protection?
> > > 
> > > Does this prevent racing with a CPU going offline? I guess this
> > > prevents
> > > the warning at the expense of a lock - but is only beneficial in
> > > the
> > > unlikely path. (In the likely path this prevents new CPUs going
> > > offline
> > > but we don't care because we don't WARN if they aren't they when we
> > > attempt to call functions).
> > > 
> > > At least this is my limited understanding.
> > 
> > Hmm.. I don't think it could matter, we only use the mask when
> > preempt_disable(), which would already block offline, due to it using
> > stop_machine().
> > 
> > So the patch is a no-op.
> > 
> > What's the WARN you see? TLB invalidation should pass mm_cpumask(),
> > which similarly should not contain offline CPUs I'm thinking.
> 
> Does the TLB invalidation code have anything in it
> to prevent from racing with the CPU offline code?
> 
> In other words, could we end up with the TLB
> invalidation code building its bitmask, getting
> interrupted (eg. hypervisor preemption, NMI),
> and not sending out the IPI to that bitmask of
> CPUs until after one of the CPUs in the bitmap
> has gotten offlined?

One possible thing would be if cpu-offline didn't remove the bit from
mm_cpumask() because it entered lazy state earlier or something.

Then, mm_cpumask() would contain an offline CPU and the WARN could
trigger, but I don't _think_ we do that, but I didn't check.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2019-05-27 10:53 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-22 11:15 [PATCH] smp,cpumask: Don't call functions on offline CPUs Andrew Murray
2019-05-22 14:09 ` Peter Zijlstra
2019-05-22 14:37   ` Andrew Murray
2019-05-22 14:49     ` Peter Zijlstra
2019-05-22 15:14       ` Andrew Murray
2019-05-27 10:52         ` Peter Zijlstra
2019-05-22 18:23       ` Rik van Riel
2019-05-27 10:53         ` Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).