linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH]smp: Fix send func call IPI to empty cpu mask
@ 2013-01-26  7:53 Wang YanQing
  2013-01-26 20:06 ` Linus Torvalds
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Wang YanQing @ 2013-01-26  7:53 UTC (permalink / raw)
  To: akpm; +Cc: peterz, tglx, mina86, srivatsa.bhat, linux-kernel, torvalds, stable

I get below warning every day with 3.7,
one or two times per day.

[ 2235.186027] WARNING: at /mnt/sda7/kernel/linux/arch/x86/kernel/apic/ipi.c:109 default_send_IPI_mask_logical+0x2f/0xb8()
[ 2235.186030] Hardware name: Aspire 4741
[ 2235.186032] empty IPI mask
[ 2235.186034] Modules linked in: vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) nvidia(PO) wl(O)
[ 2235.186046] Pid: 5542, comm: pool Tainted: P           O 3.7.2+ #41
[ 2235.186049] Call Trace:
[ 2235.186059]  [<c1020ec7>] warn_slowpath_common+0x65/0x7a
[ 2235.186064]  [<c1016bf0>] ? default_send_IPI_mask_logical+0x2f/0xb8
[ 2235.186069]  [<c1020f40>] warn_slowpath_fmt+0x26/0x2a
[ 2235.186074]  [<c1016bf0>] default_send_IPI_mask_logical+0x2f/0xb8
[ 2235.186079]  [<c1015cbc>] native_send_call_func_ipi+0x4f/0x57
[ 2235.186087]  [<c1053453>] smp_call_function_many+0x191/0x1a9
[ 2235.186092]  [<c101dffc>] ? do_flush_tlb_all+0x3f/0x3f
[ 2235.186097]  [<c101e074>] native_flush_tlb_others+0x21/0x24
[ 2235.186101]  [<c101e0da>] flush_tlb_page+0x63/0x89
[ 2235.186105]  [<c101d360>] ptep_set_access_flags+0x20/0x26
[ 2235.186111]  [<c108fadd>] do_wp_page+0x234/0x502
[ 2235.186117]  [<c1043f1a>] ? T.2009+0x31/0x35
[ 2235.186121]  [<c1090825>] handle_pte_fault+0x50d/0x54c
[ 2235.186128]  [<c1027150>] ? irq_exit+0x5f/0x61
[ 2235.186133]  [<c1015c47>] ? smp_call_function_interrupt+0x2c/0x2e
[ 2235.186143]  [<c12db06d>] ? call_function_interrupt+0x2d/0x34
[ 2235.186148]  [<c1090934>] handle_mm_fault+0xd0/0xe2
[ 2235.186153]  [<c12dd143>] __do_page_fault+0x411/0x42d
[ 2235.186158]  [<c1052523>] ? sys_futex+0xa9/0xee
[ 2235.186162]  [<c12dd15f>] ? __do_page_fault+0x42d/0x42d
[ 2235.186166]  [<c12dd167>] do_page_fault+0x8/0xa
[ 2235.186170]  [<c12db31a>] error_code+0x5a/0x60
[ 2235.186174]  [<c12dd15f>] ? __do_page_fault+0x42d/0x42d
[ 2235.186177] ---[ end trace 089b20858c3cb340 ]---

This patch fix it.

This patch also fix some system hang problem:
If the data->cpumask been cleared after pass

        if (WARN_ONCE(!mask, "empty IPI mask"))
                return;
then the problem 83d349f3 fix will happen again.

Signed-off-by: Wang YanQing <udknight@gmail.com>
---
 kernel/smp.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 29dd40a..7c56aba 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -33,6 +33,7 @@ struct call_function_data {
 	struct call_single_data	csd;
 	atomic_t		refs;
 	cpumask_var_t		cpumask;
+	cpumask_var_t		cpumask_ipi;
 };
 
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct call_function_data, cfd_data);
@@ -526,6 +527,13 @@ void smp_call_function_many(const struct cpumask *mask,
 		return;
 	}
 
+	/*
+	 * After we put entry into list, data->cpumask
+	 * may be cleared when others cpu respone other
+	 * IPI for call function, then data->cpumask will
+	 * be zero.
+	 */
+	cpumask_copy(data->cpumask_ipi, data->cpumask);
 	raw_spin_lock_irqsave(&call_function.lock, flags);
 	/*
 	 * Place entry at the _HEAD_ of the list, so that any cpu still
@@ -549,7 +557,7 @@ void smp_call_function_many(const struct cpumask *mask,
 	smp_mb();
 
 	/* Send a message to all CPUs in the map */
-	arch_send_call_function_ipi_mask(data->cpumask);
+	arch_send_call_function_ipi_mask(data->cpumask_ipi);
 
 	/* Optionally wait for the CPUs to complete */
 	if (wait)
-- 
1.7.11.1.116.g8228a23

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH]smp: Fix send func call IPI to empty cpu mask
  2013-01-26  7:53 [PATCH]smp: Fix send func call IPI to empty cpu mask Wang YanQing
@ 2013-01-26 20:06 ` Linus Torvalds
  2013-01-27 15:50   ` Ingo Molnar
  2013-01-27 14:35 ` [tip:x86/urgent] smp: Fix SMP function call empty cpu mask race tip-bot for Wang YanQing
  2013-01-28 10:29 ` tip-bot for Wang YanQing
  2 siblings, 1 reply; 8+ messages in thread
From: Linus Torvalds @ 2013-01-26 20:06 UTC (permalink / raw)
  To: Wang YanQing, Andrew Morton, Peter Zijlstra, Thomas Gleixner,
	mina86, Srivatsa S. Bhat, Linux Kernel Mailing List, stable,
	Ingo Molnar, Mike Galbraith, Jan Beulich, Milton Miller

On Fri, Jan 25, 2013 at 11:53 PM, Wang YanQing <udknight@gmail.com> wrote:
> I get below warning every day with 3.7,
> one or two times per day.
>
> [ 2235.186027] WARNING: at /mnt/sda7/kernel/linux/arch/x86/kernel/apic/ipi.c:109 default_send_IPI_mask_logical+0x2f/0xb8()
> [ 2235.186030] Hardware name: Aspire 4741
> [ 2235.186032] empty IPI mask
> [ 2235.186079]  [<c1015cbc>] native_send_call_func_ipi+0x4f/0x57
> [ 2235.186087]  [<c1053453>] smp_call_function_many+0x191/0x1a9
> [ 2235.186097]  [<c101e074>] native_flush_tlb_others+0x21/0x24
> [ 2235.186101]  [<c101e0da>] flush_tlb_page+0x63/0x89
> [ 2235.186105]  [<c101d360>] ptep_set_access_flags+0x20/0x26
> [ 2235.186111]  [<c108fadd>] do_wp_page+0x234/0x502
> [ 2235.186121]  [<c1090825>] handle_pte_fault+0x50d/0x54c
> [ 2235.186148]  [<c1090934>] handle_mm_fault+0xd0/0xe2
> [ 2235.186153]  [<c12dd143>] __do_page_fault+0x411/0x42d
> [ 2235.186166]  [<c12dd167>] do_page_fault+0x8/0xa
> [ 2235.186170]  [<c12db31a>] error_code+0x5a/0x60
>
> This patch fix it.
>
> This patch also fix some system hang problem:
> If the data->cpumask been cleared after pass
>
>         if (WARN_ONCE(!mask, "empty IPI mask"))
>                 return;
> then the problem 83d349f3 fix will happen again.

Hmm. We have very consciously tried to avoid the extra copy, although
I'm not entirely sure why (it might possibly hurt on the MAXSMP
configuration).

See for example commit 723aae25d5cd ("smp_call_function_many: handle
concurrent clearing of mask") which fixed another version of this
problem.

But I do agree that it looks like the copy is required, simply because
- as you say - once we've done the "list_add_rcu()" to add it to the
queue, we can have (another) IPI to the target CPU that can now see it
and clear the mask.

So by the time we get to actually send the IPI, the mask might have
been cleared by another IPI. So I do agree that your patch seems
correct, but I really really want to run it by other people.

Guys? Original patch on lkml. The other possible fix might be to take
the &call_function.lock earlier in
generic_smp_call_function_interrupt(), so that we can never clear the
bit while somebody is adding entries to the list... But I think it
very much tries to avoid that on purpose right now, with only the last
CPU responding to that IPI taking the lock.

So copying the IPI mask seems to be the reasonable approach. Comments?

                Linus

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [tip:x86/urgent] smp: Fix SMP function call empty cpu mask race
  2013-01-26  7:53 [PATCH]smp: Fix send func call IPI to empty cpu mask Wang YanQing
  2013-01-26 20:06 ` Linus Torvalds
@ 2013-01-27 14:35 ` tip-bot for Wang YanQing
  2013-01-28 10:29 ` tip-bot for Wang YanQing
  2 siblings, 0 replies; 8+ messages in thread
From: tip-bot for Wang YanQing @ 2013-01-27 14:35 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, torvalds, udknight, paulmck, akpm,
	stable, tglx

Commit-ID:  c7b798525b50256c8084215a139fa40b0114bfcc
Gitweb:     http://git.kernel.org/tip/c7b798525b50256c8084215a139fa40b0114bfcc
Author:     Wang YanQing <udknight@gmail.com>
AuthorDate: Sat, 26 Jan 2013 15:53:57 +0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sun, 27 Jan 2013 13:29:14 +0100

smp: Fix SMP function call empty cpu mask race

I get the following warning every day with v3.7, once or
twice a day:

  [ 2235.186027] WARNING: at /mnt/sda7/kernel/linux/arch/x86/kernel/apic/ipi.c:109 default_send_IPI_mask_logical+0x2f/0xb8()

As explained by Linus as well:

 |
 | Once we've done the "list_add_rcu()" to add it to the
 | queue, we can have (another) IPI to the target CPU that can
 | now see it and clear the mask.
 |
 | So by the time we get to actually send the IPI, the mask might
 | have been cleared by another IPI.
 |

This patch also fixes a system hang problem, if the data->cpumask
gets cleared after passing this point:

        if (WARN_ONCE(!mask, "empty IPI mask"))
                return;

then the problem in commit 83d349f35e1a ("x86: don't send an IPI to
the empty set of CPU's") will happen again.

Signed-off-by: Wang YanQing <udknight@gmail.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: peterz@infradead.org
Cc: mina86@mina86.org
Cc: srivatsa.bhat@linux.vnet.ibm.com
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/20130126075357.GA3205@udknight
[ Tidied up the changelog and the comment in the code. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/smp.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 29dd40a..93e576e 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -33,6 +33,7 @@ struct call_function_data {
 	struct call_single_data	csd;
 	atomic_t		refs;
 	cpumask_var_t		cpumask;
+	cpumask_var_t		cpumask_ipi;
 };
 
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct call_function_data, cfd_data);
@@ -526,6 +527,12 @@ void smp_call_function_many(const struct cpumask *mask,
 		return;
 	}
 
+	/*
+	 * After we put an entry into the list, data->cpumask
+	 * may be cleared again when another CPU sends another IPI for
+	 * a SMP function call, so data->cpumask will be zero.
+	 */
+	cpumask_copy(data->cpumask_ipi, data->cpumask);
 	raw_spin_lock_irqsave(&call_function.lock, flags);
 	/*
 	 * Place entry at the _HEAD_ of the list, so that any cpu still
@@ -549,7 +556,7 @@ void smp_call_function_many(const struct cpumask *mask,
 	smp_mb();
 
 	/* Send a message to all CPUs in the map */
-	arch_send_call_function_ipi_mask(data->cpumask);
+	arch_send_call_function_ipi_mask(data->cpumask_ipi);
 
 	/* Optionally wait for the CPUs to complete */
 	if (wait)

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH]smp: Fix send func call IPI to empty cpu mask
  2013-01-26 20:06 ` Linus Torvalds
@ 2013-01-27 15:50   ` Ingo Molnar
  2013-01-28  9:25     ` Jan Beulich
  0 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2013-01-27 15:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Wang YanQing, Andrew Morton, Peter Zijlstra, Thomas Gleixner,
	mina86, Srivatsa S. Bhat, Linux Kernel Mailing List, stable,
	Mike Galbraith, Jan Beulich, Milton Miller


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Fri, Jan 25, 2013 at 11:53 PM, Wang YanQing <udknight@gmail.com> wrote:
> > I get below warning every day with 3.7,
> > one or two times per day.
> >
> > [ 2235.186027] WARNING: at /mnt/sda7/kernel/linux/arch/x86/kernel/apic/ipi.c:109 default_send_IPI_mask_logical+0x2f/0xb8()
> > [ 2235.186030] Hardware name: Aspire 4741
> > [ 2235.186032] empty IPI mask
> > [ 2235.186079]  [<c1015cbc>] native_send_call_func_ipi+0x4f/0x57
> > [ 2235.186087]  [<c1053453>] smp_call_function_many+0x191/0x1a9
> > [ 2235.186097]  [<c101e074>] native_flush_tlb_others+0x21/0x24
> > [ 2235.186101]  [<c101e0da>] flush_tlb_page+0x63/0x89
> > [ 2235.186105]  [<c101d360>] ptep_set_access_flags+0x20/0x26
> > [ 2235.186111]  [<c108fadd>] do_wp_page+0x234/0x502
> > [ 2235.186121]  [<c1090825>] handle_pte_fault+0x50d/0x54c
> > [ 2235.186148]  [<c1090934>] handle_mm_fault+0xd0/0xe2
> > [ 2235.186153]  [<c12dd143>] __do_page_fault+0x411/0x42d
> > [ 2235.186166]  [<c12dd167>] do_page_fault+0x8/0xa
> > [ 2235.186170]  [<c12db31a>] error_code+0x5a/0x60
> >
> > This patch fix it.
> >
> > This patch also fix some system hang problem:
> > If the data->cpumask been cleared after pass
> >
> >         if (WARN_ONCE(!mask, "empty IPI mask"))
> >                 return;
> > then the problem 83d349f3 fix will happen again.
> 
> Hmm. We have very consciously tried to avoid the extra copy, although
> I'm not entirely sure why (it might possibly hurt on the MAXSMP
> configuration).
> 
> See for example commit 723aae25d5cd ("smp_call_function_many: handle
> concurrent clearing of mask") which fixed another version of this
> problem.
> 
> But I do agree that it looks like the copy is required, simply because
> - as you say - once we've done the "list_add_rcu()" to add it to the
> queue, we can have (another) IPI to the target CPU that can now see it
> and clear the mask.
> 
> So by the time we get to actually send the IPI, the mask might have
> been cleared by another IPI. So I do agree that your patch seems
> correct, but I really really want to run it by other people.
> 
> Guys? Original patch on lkml. The other possible fix might be 
> to take the &call_function.lock earlier in 
> generic_smp_call_function_interrupt(), so that we can never 
> clear the bit while somebody is adding entries to the list... 
> But I think it very much tries to avoid that on purpose right 
> now, with only the last CPU responding to that IPI taking the 
> lock.
> 
> So copying the IPI mask seems to be the reasonable approach. 
> Comments?

Agreed, looks correct to me as well - I've queued the fix up in 
tip:x86/urgent.

( I've added your Acked-by to the patch, please holler if you 
  disagree with how the final commit ended up looking like. )

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH]smp: Fix send func call IPI to empty cpu mask
  2013-01-27 15:50   ` Ingo Molnar
@ 2013-01-28  9:25     ` Jan Beulich
  2013-01-28  9:32       ` Wang YanQing
  2013-01-28 10:20       ` Ingo Molnar
  0 siblings, 2 replies; 8+ messages in thread
From: Jan Beulich @ 2013-01-28  9:25 UTC (permalink / raw)
  To: Ingo Molnar, Linus Torvalds
  Cc: Milton Miller, Wang YanQing, Mike Galbraith, Peter Zijlstra,
	Thomas Gleixner, Andrew Morton, Srivatsa S. Bhat, mina86,
	Linux Kernel Mailing List, stable

>>> On 27.01.13 at 16:50, Ingo Molnar <mingo@kernel.org> wrote:

> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
>> On Fri, Jan 25, 2013 at 11:53 PM, Wang YanQing <udknight@gmail.com> wrote:
>> > I get below warning every day with 3.7,
>> > one or two times per day.
>> >
>> > [ 2235.186027] WARNING: at 
> /mnt/sda7/kernel/linux/arch/x86/kernel/apic/ipi.c:109 
> default_send_IPI_mask_logical+0x2f/0xb8()
>> > [ 2235.186030] Hardware name: Aspire 4741
>> > [ 2235.186032] empty IPI mask
>> > [ 2235.186079]  [<c1015cbc>] native_send_call_func_ipi+0x4f/0x57
>> > [ 2235.186087]  [<c1053453>] smp_call_function_many+0x191/0x1a9
>> > [ 2235.186097]  [<c101e074>] native_flush_tlb_others+0x21/0x24
>> > [ 2235.186101]  [<c101e0da>] flush_tlb_page+0x63/0x89
>> > [ 2235.186105]  [<c101d360>] ptep_set_access_flags+0x20/0x26
>> > [ 2235.186111]  [<c108fadd>] do_wp_page+0x234/0x502
>> > [ 2235.186121]  [<c1090825>] handle_pte_fault+0x50d/0x54c
>> > [ 2235.186148]  [<c1090934>] handle_mm_fault+0xd0/0xe2
>> > [ 2235.186153]  [<c12dd143>] __do_page_fault+0x411/0x42d
>> > [ 2235.186166]  [<c12dd167>] do_page_fault+0x8/0xa
>> > [ 2235.186170]  [<c12db31a>] error_code+0x5a/0x60
>> >
>> > This patch fix it.
>> >
>> > This patch also fix some system hang problem:
>> > If the data->cpumask been cleared after pass
>> >
>> >         if (WARN_ONCE(!mask, "empty IPI mask"))
>> >                 return;
>> > then the problem 83d349f3 fix will happen again.
>> 
>> Hmm. We have very consciously tried to avoid the extra copy, although
>> I'm not entirely sure why (it might possibly hurt on the MAXSMP
>> configuration).
>> 
>> See for example commit 723aae25d5cd ("smp_call_function_many: handle
>> concurrent clearing of mask") which fixed another version of this
>> problem.
>> 
>> But I do agree that it looks like the copy is required, simply because
>> - as you say - once we've done the "list_add_rcu()" to add it to the
>> queue, we can have (another) IPI to the target CPU that can now see it
>> and clear the mask.
>> 
>> So by the time we get to actually send the IPI, the mask might have
>> been cleared by another IPI. So I do agree that your patch seems
>> correct, but I really really want to run it by other people.
>> 
>> Guys? Original patch on lkml. The other possible fix might be 
>> to take the &call_function.lock earlier in 
>> generic_smp_call_function_interrupt(), so that we can never 
>> clear the bit while somebody is adding entries to the list... 
>> But I think it very much tries to avoid that on purpose right 
>> now, with only the last CPU responding to that IPI taking the 
>> lock.
>> 
>> So copying the IPI mask seems to be the reasonable approach. 
>> Comments?
> 
> Agreed, looks correct to me as well - I've queued the fix up in 
> tip:x86/urgent.

But the patch is obviously incomplete for the CPUMASK_OFFSTACK
case, as the newly added cpumask_ipi member never gets
its bit array allocated.

Jan


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH]smp: Fix send func call IPI to empty cpu mask
  2013-01-28  9:25     ` Jan Beulich
@ 2013-01-28  9:32       ` Wang YanQing
  2013-01-28 10:20       ` Ingo Molnar
  1 sibling, 0 replies; 8+ messages in thread
From: Wang YanQing @ 2013-01-28  9:32 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Ingo Molnar, Linus Torvalds, Milton Miller, Mike Galbraith,
	Peter Zijlstra, Thomas Gleixner, Andrew Morton, Srivatsa S. Bhat,
	mina86, Linux Kernel Mailing List, stable

On Mon, Jan 28, 2013 at 09:25:55AM +0000, Jan Beulich wrote:
> But the patch is obviously incomplete for the CPUMASK_OFFSTACK
> case, as the newly added cpumask_ipi member never gets
> its bit array allocated.
> 
> Jan
Yes, I have found it, and the patch is in lkml some hours ago.

Thanks.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH]smp: Fix send func call IPI to empty cpu mask
  2013-01-28  9:25     ` Jan Beulich
  2013-01-28  9:32       ` Wang YanQing
@ 2013-01-28 10:20       ` Ingo Molnar
  1 sibling, 0 replies; 8+ messages in thread
From: Ingo Molnar @ 2013-01-28 10:20 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Linus Torvalds, Milton Miller, Wang YanQing, Mike Galbraith,
	Peter Zijlstra, Thomas Gleixner, Andrew Morton, Srivatsa S. Bhat,
	mina86, Linux Kernel Mailing List, stable


* Jan Beulich <JBeulich@suse.com> wrote:

> >>> On 27.01.13 at 16:50, Ingo Molnar <mingo@kernel.org> wrote:
> 
> > * Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > 
> >> On Fri, Jan 25, 2013 at 11:53 PM, Wang YanQing <udknight@gmail.com> wrote:
> >> > I get below warning every day with 3.7,
> >> > one or two times per day.
> >> >
> >> > [ 2235.186027] WARNING: at 
> > /mnt/sda7/kernel/linux/arch/x86/kernel/apic/ipi.c:109 
> > default_send_IPI_mask_logical+0x2f/0xb8()
> >> > [ 2235.186030] Hardware name: Aspire 4741
> >> > [ 2235.186032] empty IPI mask
> >> > [ 2235.186079]  [<c1015cbc>] native_send_call_func_ipi+0x4f/0x57
> >> > [ 2235.186087]  [<c1053453>] smp_call_function_many+0x191/0x1a9
> >> > [ 2235.186097]  [<c101e074>] native_flush_tlb_others+0x21/0x24
> >> > [ 2235.186101]  [<c101e0da>] flush_tlb_page+0x63/0x89
> >> > [ 2235.186105]  [<c101d360>] ptep_set_access_flags+0x20/0x26
> >> > [ 2235.186111]  [<c108fadd>] do_wp_page+0x234/0x502
> >> > [ 2235.186121]  [<c1090825>] handle_pte_fault+0x50d/0x54c
> >> > [ 2235.186148]  [<c1090934>] handle_mm_fault+0xd0/0xe2
> >> > [ 2235.186153]  [<c12dd143>] __do_page_fault+0x411/0x42d
> >> > [ 2235.186166]  [<c12dd167>] do_page_fault+0x8/0xa
> >> > [ 2235.186170]  [<c12db31a>] error_code+0x5a/0x60
> >> >
> >> > This patch fix it.
> >> >
> >> > This patch also fix some system hang problem:
> >> > If the data->cpumask been cleared after pass
> >> >
> >> >         if (WARN_ONCE(!mask, "empty IPI mask"))
> >> >                 return;
> >> > then the problem 83d349f3 fix will happen again.
> >> 
> >> Hmm. We have very consciously tried to avoid the extra copy, although
> >> I'm not entirely sure why (it might possibly hurt on the MAXSMP
> >> configuration).
> >> 
> >> See for example commit 723aae25d5cd ("smp_call_function_many: handle
> >> concurrent clearing of mask") which fixed another version of this
> >> problem.
> >> 
> >> But I do agree that it looks like the copy is required, simply because
> >> - as you say - once we've done the "list_add_rcu()" to add it to the
> >> queue, we can have (another) IPI to the target CPU that can now see it
> >> and clear the mask.
> >> 
> >> So by the time we get to actually send the IPI, the mask might have
> >> been cleared by another IPI. So I do agree that your patch seems
> >> correct, but I really really want to run it by other people.
> >> 
> >> Guys? Original patch on lkml. The other possible fix might be 
> >> to take the &call_function.lock earlier in 
> >> generic_smp_call_function_interrupt(), so that we can never 
> >> clear the bit while somebody is adding entries to the list... 
> >> But I think it very much tries to avoid that on purpose right 
> >> now, with only the last CPU responding to that IPI taking the 
> >> lock.
> >> 
> >> So copying the IPI mask seems to be the reasonable approach. 
> >> Comments?
> > 
> > Agreed, looks correct to me as well - I've queued the fix up in 
> > tip:x86/urgent.
> 
> But the patch is obviously incomplete for the CPUMASK_OFFSTACK 
> case, as the newly added cpumask_ipi member never gets its bit 
> array allocated.

Yes, indeed - I'll amend it with the fix.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [tip:x86/urgent] smp: Fix SMP function call empty cpu mask race
  2013-01-26  7:53 [PATCH]smp: Fix send func call IPI to empty cpu mask Wang YanQing
  2013-01-26 20:06 ` Linus Torvalds
  2013-01-27 14:35 ` [tip:x86/urgent] smp: Fix SMP function call empty cpu mask race tip-bot for Wang YanQing
@ 2013-01-28 10:29 ` tip-bot for Wang YanQing
  2 siblings, 0 replies; 8+ messages in thread
From: tip-bot for Wang YanQing @ 2013-01-28 10:29 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, torvalds, jbeulich, udknight, paulmck,
	akpm, stable, tglx

Commit-ID:  f44310b98ddb7f0d06550d73ed67df5865e3eda5
Gitweb:     http://git.kernel.org/tip/f44310b98ddb7f0d06550d73ed67df5865e3eda5
Author:     Wang YanQing <udknight@gmail.com>
AuthorDate: Sat, 26 Jan 2013 15:53:57 +0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 28 Jan 2013 11:21:57 +0100

smp: Fix SMP function call empty cpu mask race

I get the following warning every day with v3.7, once or
twice a day:

  [ 2235.186027] WARNING: at /mnt/sda7/kernel/linux/arch/x86/kernel/apic/ipi.c:109 default_send_IPI_mask_logical+0x2f/0xb8()

As explained by Linus as well:

 |
 | Once we've done the "list_add_rcu()" to add it to the
 | queue, we can have (another) IPI to the target CPU that can
 | now see it and clear the mask.
 |
 | So by the time we get to actually send the IPI, the mask might
 | have been cleared by another IPI.
 |

This patch also fixes a system hang problem, if the data->cpumask
gets cleared after passing this point:

        if (WARN_ONCE(!mask, "empty IPI mask"))
                return;

then the problem in commit 83d349f35e1a ("x86: don't send an IPI to
the empty set of CPU's") will happen again.

Signed-off-by: Wang YanQing <udknight@gmail.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: peterz@infradead.org
Cc: mina86@mina86.org
Cc: srivatsa.bhat@linux.vnet.ibm.com
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/20130126075357.GA3205@udknight
[ Tidied up the changelog and the comment in the code. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/smp.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 29dd40a..69f38bd 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -33,6 +33,7 @@ struct call_function_data {
 	struct call_single_data	csd;
 	atomic_t		refs;
 	cpumask_var_t		cpumask;
+	cpumask_var_t		cpumask_ipi;
 };
 
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct call_function_data, cfd_data);
@@ -56,6 +57,9 @@ hotplug_cfd(struct notifier_block *nfb, unsigned long action, void *hcpu)
 		if (!zalloc_cpumask_var_node(&cfd->cpumask, GFP_KERNEL,
 				cpu_to_node(cpu)))
 			return notifier_from_errno(-ENOMEM);
+		if (!zalloc_cpumask_var_node(&cfd->cpumask_ipi, GFP_KERNEL,
+				cpu_to_node(cpu)))
+			return notifier_from_errno(-ENOMEM);
 		break;
 
 #ifdef CONFIG_HOTPLUG_CPU
@@ -65,6 +69,7 @@ hotplug_cfd(struct notifier_block *nfb, unsigned long action, void *hcpu)
 	case CPU_DEAD:
 	case CPU_DEAD_FROZEN:
 		free_cpumask_var(cfd->cpumask);
+		free_cpumask_var(cfd->cpumask_ipi);
 		break;
 #endif
 	};
@@ -526,6 +531,12 @@ void smp_call_function_many(const struct cpumask *mask,
 		return;
 	}
 
+	/*
+	 * After we put an entry into the list, data->cpumask
+	 * may be cleared again when another CPU sends another IPI for
+	 * a SMP function call, so data->cpumask will be zero.
+	 */
+	cpumask_copy(data->cpumask_ipi, data->cpumask);
 	raw_spin_lock_irqsave(&call_function.lock, flags);
 	/*
 	 * Place entry at the _HEAD_ of the list, so that any cpu still
@@ -549,7 +560,7 @@ void smp_call_function_many(const struct cpumask *mask,
 	smp_mb();
 
 	/* Send a message to all CPUs in the map */
-	arch_send_call_function_ipi_mask(data->cpumask);
+	arch_send_call_function_ipi_mask(data->cpumask_ipi);
 
 	/* Optionally wait for the CPUs to complete */
 	if (wait)

^ permalink raw reply related	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2013-01-28 10:29 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-01-26  7:53 [PATCH]smp: Fix send func call IPI to empty cpu mask Wang YanQing
2013-01-26 20:06 ` Linus Torvalds
2013-01-27 15:50   ` Ingo Molnar
2013-01-28  9:25     ` Jan Beulich
2013-01-28  9:32       ` Wang YanQing
2013-01-28 10:20       ` Ingo Molnar
2013-01-27 14:35 ` [tip:x86/urgent] smp: Fix SMP function call empty cpu mask race tip-bot for Wang YanQing
2013-01-28 10:29 ` tip-bot for Wang YanQing

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).