linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] smp: avoid generic_exec_single cause system lockup
@ 2019-07-18  8:03 luferry
  2019-07-18  8:07 ` Thomas Gleixner
  0 siblings, 1 reply; 7+ messages in thread
From: luferry @ 2019-07-18  8:03 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra (Intel),
	Rik van Riel, Greg Kroah-Hartman, Josh Poimboeuf, linux-kernel,
	luferry

From: luferry <luferry@163.com>

The race can reproduced by sending wait enabled IPI in softirq/irq env

src cpu only send ipi when dst cpu with queue empty, if interrupts
disturbed between llist_add and send_ipi. Interrupt handler may raise
softirq.In irq env, if src cpu try send_ipi to same dst cpu with
wait enabled. Since dst cpu's queue is not empty, src cpu won't send
ipi and dst cpu won't be waked up. src cpu will stall in
csd_lock_wait(csd). Which may cause soft lockup or hard lockup depends on
which time other cpus do send IPI to dst cpu.

So just send IPI when wait enabled and in_interrupt()

if (llist_add(&csd->llist, &per_cpu(call_single_queue, cpu)))
	// src cpu got interrupt here
     arch_send_call_function_single_ipi(cpu);

CPU0                                   CPU1

kernel env:smp_call_function         call_single_queue empty
kernel env:llist_add
                                       call_single_queue got csd
get interrupt
raise softirq
irq env:smp_call_function with wait
irq env:llist_add
irq env:queue not empty and skip send ipi
irq env:waiting for csd execution

Signed-off-by: luferry <luferry@163.com>
---
 kernel/smp.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index d155374632eb..5f5343e17bb3 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -142,9 +142,8 @@ static DEFINE_PER_CPU_SHARED_ALIGNED(call_single_data_t, csd_data);
 static int generic_exec_single(int cpu, call_single_data_t *csd,
 			       smp_call_func_t func, void *info)
 {
+	unsigned long flags;
 	if (cpu == smp_processor_id()) {
-		unsigned long flags;
-
 		/*
 		 * We can unlock early even for the synchronous on-stack case,
 		 * since we're doing this from the same CPU..
@@ -176,8 +175,10 @@ static int generic_exec_single(int cpu, call_single_data_t *csd,
 	 * locking and barrier primitives. Generic code isn't really
 	 * equipped to do the right thing...
 	 */
+	local_irq_save(flags);
 	if (llist_add(&csd->llist, &per_cpu(call_single_queue, cpu)))
 		arch_send_call_function_single_ipi(cpu);
+	local_irq_restore(flags);
 
 	return 0;
 }
@@ -404,6 +405,7 @@ EXPORT_SYMBOL_GPL(smp_call_function_any);
 void smp_call_function_many(const struct cpumask *mask,
 			    smp_call_func_t func, void *info, bool wait)
 {
+	unsigned long flags;
 	struct call_function_data *cfd;
 	int cpu, next_cpu, this_cpu = smp_processor_id();
 
@@ -446,6 +448,8 @@ void smp_call_function_many(const struct cpumask *mask,
 		return;
 
 	cpumask_clear(cfd->cpumask_ipi);
+
+	local_irq_save(flags);
 	for_each_cpu(cpu, cfd->cpumask) {
 		call_single_data_t *csd = per_cpu_ptr(cfd->csd, cpu);
 
@@ -460,6 +464,7 @@ void smp_call_function_many(const struct cpumask *mask,
 
 	/* Send a message to all CPUs in the map */
 	arch_send_call_function_ipi_mask(cfd->cpumask_ipi);
+	local_irq_restore(flags);
 
 	if (wait) {
 		for_each_cpu(cpu, cfd->cpumask) {
-- 
2.14.1.40.g8e62ba1



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] smp: avoid generic_exec_single cause system lockup
  2019-07-18  8:03 [PATCH v2] smp: avoid generic_exec_single cause system lockup luferry
@ 2019-07-18  8:07 ` Thomas Gleixner
  2019-07-18  8:50   ` luferry
  0 siblings, 1 reply; 7+ messages in thread
From: Thomas Gleixner @ 2019-07-18  8:07 UTC (permalink / raw)
  To: luferry
  Cc: Peter Zijlstra (Intel),
	Rik van Riel, Greg Kroah-Hartman, Josh Poimboeuf, linux-kernel

On Thu, 18 Jul 2019, luferry@163.com wrote:

> From: luferry <luferry@163.com>
> 
> The race can reproduced by sending wait enabled IPI in softirq/irq env

Which code path is doing that?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re:Re: [PATCH v2] smp: avoid generic_exec_single cause system lockup
  2019-07-18  8:07 ` Thomas Gleixner
@ 2019-07-18  8:50   ` luferry
  2019-07-18  9:58     ` Thomas Gleixner
  0 siblings, 1 reply; 7+ messages in thread
From: luferry @ 2019-07-18  8:50 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra (Intel),
	Rik van Riel, Greg Kroah-Hartman, Josh Poimboeuf, linux-kernel











At 2019-07-18 16:07:58, "Thomas Gleixner" <tglx@linutronix.de> wrote:
>On Thu, 18 Jul 2019, luferry@163.com wrote:
>
>> From: luferry <luferry@163.com>
>> 
>> The race can reproduced by sending wait enabled IPI in softirq/irq env
>
>Which code path is doing that?
>
>Thanks,
>
>	tglx

Thanks for your kindly reply.
I checked kernel and found no code path can run into this.
Actually , i encounter with this problem by my own code.
I need to do some specific urgent work periodicity and these 
work may run for quite a while. So i can't disable irq during these work 
which stops me from using hrtimer to do this. So i did add an extra 
sofitrq action which may invoke smp_call.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re:Re: [PATCH v2] smp: avoid generic_exec_single cause system lockup
  2019-07-18  8:50   ` luferry
@ 2019-07-18  9:58     ` Thomas Gleixner
  2019-07-18 16:06       ` Peter Zijlstra
  0 siblings, 1 reply; 7+ messages in thread
From: Thomas Gleixner @ 2019-07-18  9:58 UTC (permalink / raw)
  To: luferry
  Cc: Peter Zijlstra (Intel),
	Rik van Riel, Greg Kroah-Hartman, Josh Poimboeuf, linux-kernel

On Thu, 18 Jul 2019, luferry wrote:
> At 2019-07-18 16:07:58, "Thomas Gleixner" <tglx@linutronix.de> wrote:
> >On Thu, 18 Jul 2019, luferry@163.com wrote:
> >
> >> From: luferry <luferry@163.com>
> >> 
> >> The race can reproduced by sending wait enabled IPI in softirq/irq env
> >
> >Which code path is doing that?
>
> I checked kernel and found no code path can run into this.

For a good reason.

> Actually , i encounter with this problem by my own code.
> I need to do some specific urgent work periodicity and these 
> work may run for quite a while. So i can't disable irq during these work 
> which stops me from using hrtimer to do this. So i did add an extra 
> sofitrq action which may invoke smp_call.

Well, from softirq handling context the only allowed interface is
smp_call_function_single_async().

The code is actually missing a warning to that effect. See below.

Vs. your proposed change. It's broken in various ways and no, we are not
going to support that and definitely we are not going to disable interrupts
around a loop over all cpus in a mask.

Thanks,

	tglx

8<--------------
Subject: smp: Warn on function calls from softirq context
From: Thomas Gleixner <tglx@linutronix.de>
Date: Thu, 18 Jul 2019 11:20:09 +0200

It's clearly documented that smp function calls cannot be invoked from
softirq handling context. Unfortunately nothing enforces that or emits a
warning.

A single function call can be invoked from softirq context only via
smp_call_function_single_async().

Reported-by: luferry <luferry@163.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/smp.c |   16 ++++++++++++++++
 1 file changed, 16 insertions(+)

--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -291,6 +291,15 @@ int smp_call_function_single(int cpu, sm
 	WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
 		     && !oops_in_progress);
 
+	/*
+	 * Can deadlock when the softirq is executed on return from
+	 * interrupt and the interrupt hit between llist_add() and
+	 * arch_send_call_function_single_ipi() because then this
+	 * invocation sees the list non-empty, skips the IPI send
+	 * and waits forever.
+	 */
+	WARN_ON_ONCE(is_serving_softirq() && wait);
+
 	csd = &csd_stack;
 	if (!wait) {
 		csd = this_cpu_ptr(&csd_data);
@@ -416,6 +425,13 @@ void smp_call_function_many(const struct
 	WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
 		     && !oops_in_progress && !early_boot_irqs_disabled);
 
+	/*
+	 * Bottom half handlers are not allowed to call this as they might
+	 * corrupt cfd_data when the interrupt which triggered softirq
+	 * processing hit this function.
+	 */
+	WARN_ON_ONCE(is_serving_softirq());
+
 	/* Try to fastpath.  So, what's a CPU they want? Ignoring this one. */
 	cpu = cpumask_first_and(mask, cpu_online_mask);
 	if (cpu == this_cpu)



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Re: [PATCH v2] smp: avoid generic_exec_single cause system lockup
  2019-07-18  9:58     ` Thomas Gleixner
@ 2019-07-18 16:06       ` Peter Zijlstra
  2019-07-18 18:17         ` Thomas Gleixner
  2019-07-20  9:31         ` [tip:smp/urgent] smp: Warn on function calls from softirq context tip-bot for Peter Zijlstra
  0 siblings, 2 replies; 7+ messages in thread
From: Peter Zijlstra @ 2019-07-18 16:06 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: luferry, Rik van Riel, Greg Kroah-Hartman, Josh Poimboeuf, linux-kernel

On Thu, Jul 18, 2019 at 11:58:47AM +0200, Thomas Gleixner wrote:
> Subject: smp: Warn on function calls from softirq context
> From: Thomas Gleixner <tglx@linutronix.de>
> Date: Thu, 18 Jul 2019 11:20:09 +0200
> 
> It's clearly documented that smp function calls cannot be invoked from
> softirq handling context. Unfortunately nothing enforces that or emits a
> warning.
> 
> A single function call can be invoked from softirq context only via
> smp_call_function_single_async().
> 
> Reported-by: luferry <luferry@163.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  kernel/smp.c |   16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -291,6 +291,15 @@ int smp_call_function_single(int cpu, sm
>  	WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
>  		     && !oops_in_progress);
>  
> +	/*
> +	 * Can deadlock when the softirq is executed on return from
> +	 * interrupt and the interrupt hit between llist_add() and
> +	 * arch_send_call_function_single_ipi() because then this
> +	 * invocation sees the list non-empty, skips the IPI send
> +	 * and waits forever.
> +	 */
> +	WARN_ON_ONCE(is_serving_softirq() && wait);
> +
>  	csd = &csd_stack;
>  	if (!wait) {
>  		csd = this_cpu_ptr(&csd_data);
> @@ -416,6 +425,13 @@ void smp_call_function_many(const struct
>  	WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
>  		     && !oops_in_progress && !early_boot_irqs_disabled);
>  
> +	/*
> +	 * Bottom half handlers are not allowed to call this as they might
> +	 * corrupt cfd_data when the interrupt which triggered softirq
> +	 * processing hit this function.
> +	 */
> +	WARN_ON_ONCE(is_serving_softirq());
> +
>  	/* Try to fastpath.  So, what's a CPU they want? Ignoring this one. */
>  	cpu = cpumask_first_and(mask, cpu_online_mask);
>  	if (cpu == this_cpu)

As we discussed on IRC, it is worse, we can only use these functions
from task/process context. We need something like the below.

I've build a kernel with this applied and nothing went *splat*.

diff --git a/kernel/smp.c b/kernel/smp.c
index 616d4d114847..7dbcb402c2fc 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -291,6 +291,14 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
 	WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
 		     && !oops_in_progress);
 
+	/*
+	 * When @wait we can deadlock when we interrupt between llist_add() and
+	 * arch_send_call_function_ipi*(); when !@wait we can deadlock due to
+	 * csd_lock() on because the interrupt context uses the same csd
+	 * storage.
+	 */
+	WARN_ON_ONCE(!in_task());
+
 	csd = &csd_stack;
 	if (!wait) {
 		csd = this_cpu_ptr(&csd_data);
@@ -416,6 +424,14 @@ void smp_call_function_many(const struct cpumask *mask,
 	WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
 		     && !oops_in_progress && !early_boot_irqs_disabled);
 
+	/*
+	 * When @wait we can deadlock when we interrupt between llist_add() and
+	 * arch_send_call_function_ipi*(); when !@wait we can deadlock due to
+	 * csd_lock() on because the interrupt context uses the same csd
+	 * storage.
+	 */
+	WARN_ON_ONCE(!in_task());
+
 	/* Try to fastpath.  So, what's a CPU they want? Ignoring this one. */
 	cpu = cpumask_first_and(mask, cpu_online_mask);
 	if (cpu == this_cpu)

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: Re: [PATCH v2] smp: avoid generic_exec_single cause system lockup
  2019-07-18 16:06       ` Peter Zijlstra
@ 2019-07-18 18:17         ` Thomas Gleixner
  2019-07-20  9:31         ` [tip:smp/urgent] smp: Warn on function calls from softirq context tip-bot for Peter Zijlstra
  1 sibling, 0 replies; 7+ messages in thread
From: Thomas Gleixner @ 2019-07-18 18:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: luferry, Rik van Riel, Greg Kroah-Hartman, Josh Poimboeuf, linux-kernel

On Thu, 18 Jul 2019, Peter Zijlstra wrote:
> On Thu, Jul 18, 2019 at 11:58:47AM +0200, Thomas Gleixner wrote:
> 
> As we discussed on IRC, it is worse, we can only use these functions
> from task/process context. We need something like the below.

Indeed that's better defined.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [tip:smp/urgent] smp: Warn on function calls from softirq context
  2019-07-18 16:06       ` Peter Zijlstra
  2019-07-18 18:17         ` Thomas Gleixner
@ 2019-07-20  9:31         ` tip-bot for Peter Zijlstra
  1 sibling, 0 replies; 7+ messages in thread
From: tip-bot for Peter Zijlstra @ 2019-07-20  9:31 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: peterz, linux-kernel, luferry, hpa, mingo, tglx

Commit-ID:  19dbdcb8039cff16669a05136a29180778d16d0a
Gitweb:     https://git.kernel.org/tip/19dbdcb8039cff16669a05136a29180778d16d0a
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Thu, 18 Jul 2019 11:20:09 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sat, 20 Jul 2019 11:27:16 +0200

smp: Warn on function calls from softirq context

It's clearly documented that smp function calls cannot be invoked from
softirq handling context. Unfortunately nothing enforces that or emits a
warning.

A single function call can be invoked from softirq context only via
smp_call_function_single_async().

The only legit context is task context, so add a warning to that effect.

Reported-by: luferry <luferry@163.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190718160601.GP3402@hirez.programming.kicks-ass.net
---
 kernel/smp.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/kernel/smp.c b/kernel/smp.c
index 616d4d114847..7dbcb402c2fc 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -291,6 +291,14 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
 	WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
 		     && !oops_in_progress);
 
+	/*
+	 * When @wait we can deadlock when we interrupt between llist_add() and
+	 * arch_send_call_function_ipi*(); when !@wait we can deadlock due to
+	 * csd_lock() on because the interrupt context uses the same csd
+	 * storage.
+	 */
+	WARN_ON_ONCE(!in_task());
+
 	csd = &csd_stack;
 	if (!wait) {
 		csd = this_cpu_ptr(&csd_data);
@@ -416,6 +424,14 @@ void smp_call_function_many(const struct cpumask *mask,
 	WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
 		     && !oops_in_progress && !early_boot_irqs_disabled);
 
+	/*
+	 * When @wait we can deadlock when we interrupt between llist_add() and
+	 * arch_send_call_function_ipi*(); when !@wait we can deadlock due to
+	 * csd_lock() on because the interrupt context uses the same csd
+	 * storage.
+	 */
+	WARN_ON_ONCE(!in_task());
+
 	/* Try to fastpath.  So, what's a CPU they want? Ignoring this one. */
 	cpu = cpumask_first_and(mask, cpu_online_mask);
 	if (cpu == this_cpu)

^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-07-20  9:32 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-18  8:03 [PATCH v2] smp: avoid generic_exec_single cause system lockup luferry
2019-07-18  8:07 ` Thomas Gleixner
2019-07-18  8:50   ` luferry
2019-07-18  9:58     ` Thomas Gleixner
2019-07-18 16:06       ` Peter Zijlstra
2019-07-18 18:17         ` Thomas Gleixner
2019-07-20  9:31         ` [tip:smp/urgent] smp: Warn on function calls from softirq context tip-bot for Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).