* [PATCH]smp: Fix send func call IPI to empty cpu mask
@ 2013-01-26 7:53 Wang YanQing
2013-01-26 20:06 ` Linus Torvalds
0 siblings, 1 reply; 6+ messages in thread
From: Wang YanQing @ 2013-01-26 7:53 UTC (permalink / raw)
To: akpm; +Cc: peterz, tglx, mina86, srivatsa.bhat, linux-kernel, torvalds, stable
I get below warning every day with 3.7,
one or two times per day.
[ 2235.186027] WARNING: at /mnt/sda7/kernel/linux/arch/x86/kernel/apic/ipi.c:109 default_send_IPI_mask_logical+0x2f/0xb8()
[ 2235.186030] Hardware name: Aspire 4741
[ 2235.186032] empty IPI mask
[ 2235.186034] Modules linked in: vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) nvidia(PO) wl(O)
[ 2235.186046] Pid: 5542, comm: pool Tainted: P O 3.7.2+ #41
[ 2235.186049] Call Trace:
[ 2235.186059] [<c1020ec7>] warn_slowpath_common+0x65/0x7a
[ 2235.186064] [<c1016bf0>] ? default_send_IPI_mask_logical+0x2f/0xb8
[ 2235.186069] [<c1020f40>] warn_slowpath_fmt+0x26/0x2a
[ 2235.186074] [<c1016bf0>] default_send_IPI_mask_logical+0x2f/0xb8
[ 2235.186079] [<c1015cbc>] native_send_call_func_ipi+0x4f/0x57
[ 2235.186087] [<c1053453>] smp_call_function_many+0x191/0x1a9
[ 2235.186092] [<c101dffc>] ? do_flush_tlb_all+0x3f/0x3f
[ 2235.186097] [<c101e074>] native_flush_tlb_others+0x21/0x24
[ 2235.186101] [<c101e0da>] flush_tlb_page+0x63/0x89
[ 2235.186105] [<c101d360>] ptep_set_access_flags+0x20/0x26
[ 2235.186111] [<c108fadd>] do_wp_page+0x234/0x502
[ 2235.186117] [<c1043f1a>] ? T.2009+0x31/0x35
[ 2235.186121] [<c1090825>] handle_pte_fault+0x50d/0x54c
[ 2235.186128] [<c1027150>] ? irq_exit+0x5f/0x61
[ 2235.186133] [<c1015c47>] ? smp_call_function_interrupt+0x2c/0x2e
[ 2235.186143] [<c12db06d>] ? call_function_interrupt+0x2d/0x34
[ 2235.186148] [<c1090934>] handle_mm_fault+0xd0/0xe2
[ 2235.186153] [<c12dd143>] __do_page_fault+0x411/0x42d
[ 2235.186158] [<c1052523>] ? sys_futex+0xa9/0xee
[ 2235.186162] [<c12dd15f>] ? __do_page_fault+0x42d/0x42d
[ 2235.186166] [<c12dd167>] do_page_fault+0x8/0xa
[ 2235.186170] [<c12db31a>] error_code+0x5a/0x60
[ 2235.186174] [<c12dd15f>] ? __do_page_fault+0x42d/0x42d
[ 2235.186177] ---[ end trace 089b20858c3cb340 ]---
This patch fix it.
This patch also fix some system hang problem:
If the data->cpumask been cleared after pass
if (WARN_ONCE(!mask, "empty IPI mask"))
return;
then the problem 83d349f3 fix will happen again.
Signed-off-by: Wang YanQing <udknight@gmail.com>
---
kernel/smp.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/kernel/smp.c b/kernel/smp.c
index 29dd40a..7c56aba 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -33,6 +33,7 @@ struct call_function_data {
struct call_single_data csd;
atomic_t refs;
cpumask_var_t cpumask;
+ cpumask_var_t cpumask_ipi;
};
static DEFINE_PER_CPU_SHARED_ALIGNED(struct call_function_data, cfd_data);
@@ -526,6 +527,13 @@ void smp_call_function_many(const struct cpumask *mask,
return;
}
+ /*
+ * After we put entry into list, data->cpumask
+ * may be cleared when others cpu respone other
+ * IPI for call function, then data->cpumask will
+ * be zero.
+ */
+ cpumask_copy(data->cpumask_ipi, data->cpumask);
raw_spin_lock_irqsave(&call_function.lock, flags);
/*
* Place entry at the _HEAD_ of the list, so that any cpu still
@@ -549,7 +557,7 @@ void smp_call_function_many(const struct cpumask *mask,
smp_mb();
/* Send a message to all CPUs in the map */
- arch_send_call_function_ipi_mask(data->cpumask);
+ arch_send_call_function_ipi_mask(data->cpumask_ipi);
/* Optionally wait for the CPUs to complete */
if (wait)
--
1.7.11.1.116.g8228a23
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH]smp: Fix send func call IPI to empty cpu mask
2013-01-26 7:53 [PATCH]smp: Fix send func call IPI to empty cpu mask Wang YanQing
@ 2013-01-26 20:06 ` Linus Torvalds
2013-01-27 15:50 ` Ingo Molnar
0 siblings, 1 reply; 6+ messages in thread
From: Linus Torvalds @ 2013-01-26 20:06 UTC (permalink / raw)
To: Wang YanQing, Andrew Morton, Peter Zijlstra, Thomas Gleixner,
mina86, Srivatsa S. Bhat, Linux Kernel Mailing List, stable,
Ingo Molnar, Mike Galbraith, Jan Beulich, Milton Miller
On Fri, Jan 25, 2013 at 11:53 PM, Wang YanQing <udknight@gmail.com> wrote:
> I get below warning every day with 3.7,
> one or two times per day.
>
> [ 2235.186027] WARNING: at /mnt/sda7/kernel/linux/arch/x86/kernel/apic/ipi.c:109 default_send_IPI_mask_logical+0x2f/0xb8()
> [ 2235.186030] Hardware name: Aspire 4741
> [ 2235.186032] empty IPI mask
> [ 2235.186079] [<c1015cbc>] native_send_call_func_ipi+0x4f/0x57
> [ 2235.186087] [<c1053453>] smp_call_function_many+0x191/0x1a9
> [ 2235.186097] [<c101e074>] native_flush_tlb_others+0x21/0x24
> [ 2235.186101] [<c101e0da>] flush_tlb_page+0x63/0x89
> [ 2235.186105] [<c101d360>] ptep_set_access_flags+0x20/0x26
> [ 2235.186111] [<c108fadd>] do_wp_page+0x234/0x502
> [ 2235.186121] [<c1090825>] handle_pte_fault+0x50d/0x54c
> [ 2235.186148] [<c1090934>] handle_mm_fault+0xd0/0xe2
> [ 2235.186153] [<c12dd143>] __do_page_fault+0x411/0x42d
> [ 2235.186166] [<c12dd167>] do_page_fault+0x8/0xa
> [ 2235.186170] [<c12db31a>] error_code+0x5a/0x60
>
> This patch fix it.
>
> This patch also fix some system hang problem:
> If the data->cpumask been cleared after pass
>
> if (WARN_ONCE(!mask, "empty IPI mask"))
> return;
> then the problem 83d349f3 fix will happen again.
Hmm. We have very consciously tried to avoid the extra copy, although
I'm not entirely sure why (it might possibly hurt on the MAXSMP
configuration).
See for example commit 723aae25d5cd ("smp_call_function_many: handle
concurrent clearing of mask") which fixed another version of this
problem.
But I do agree that it looks like the copy is required, simply because
- as you say - once we've done the "list_add_rcu()" to add it to the
queue, we can have (another) IPI to the target CPU that can now see it
and clear the mask.
So by the time we get to actually send the IPI, the mask might have
been cleared by another IPI. So I do agree that your patch seems
correct, but I really really want to run it by other people.
Guys? Original patch on lkml. The other possible fix might be to take
the &call_function.lock earlier in
generic_smp_call_function_interrupt(), so that we can never clear the
bit while somebody is adding entries to the list... But I think it
very much tries to avoid that on purpose right now, with only the last
CPU responding to that IPI taking the lock.
So copying the IPI mask seems to be the reasonable approach. Comments?
Linus
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH]smp: Fix send func call IPI to empty cpu mask
2013-01-26 20:06 ` Linus Torvalds
@ 2013-01-27 15:50 ` Ingo Molnar
2013-01-28 9:25 ` Jan Beulich
0 siblings, 1 reply; 6+ messages in thread
From: Ingo Molnar @ 2013-01-27 15:50 UTC (permalink / raw)
To: Linus Torvalds
Cc: Wang YanQing, Andrew Morton, Peter Zijlstra, Thomas Gleixner,
mina86, Srivatsa S. Bhat, Linux Kernel Mailing List, stable,
Mike Galbraith, Jan Beulich, Milton Miller
* Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Fri, Jan 25, 2013 at 11:53 PM, Wang YanQing <udknight@gmail.com> wrote:
> > I get below warning every day with 3.7,
> > one or two times per day.
> >
> > [ 2235.186027] WARNING: at /mnt/sda7/kernel/linux/arch/x86/kernel/apic/ipi.c:109 default_send_IPI_mask_logical+0x2f/0xb8()
> > [ 2235.186030] Hardware name: Aspire 4741
> > [ 2235.186032] empty IPI mask
> > [ 2235.186079] [<c1015cbc>] native_send_call_func_ipi+0x4f/0x57
> > [ 2235.186087] [<c1053453>] smp_call_function_many+0x191/0x1a9
> > [ 2235.186097] [<c101e074>] native_flush_tlb_others+0x21/0x24
> > [ 2235.186101] [<c101e0da>] flush_tlb_page+0x63/0x89
> > [ 2235.186105] [<c101d360>] ptep_set_access_flags+0x20/0x26
> > [ 2235.186111] [<c108fadd>] do_wp_page+0x234/0x502
> > [ 2235.186121] [<c1090825>] handle_pte_fault+0x50d/0x54c
> > [ 2235.186148] [<c1090934>] handle_mm_fault+0xd0/0xe2
> > [ 2235.186153] [<c12dd143>] __do_page_fault+0x411/0x42d
> > [ 2235.186166] [<c12dd167>] do_page_fault+0x8/0xa
> > [ 2235.186170] [<c12db31a>] error_code+0x5a/0x60
> >
> > This patch fix it.
> >
> > This patch also fix some system hang problem:
> > If the data->cpumask been cleared after pass
> >
> > if (WARN_ONCE(!mask, "empty IPI mask"))
> > return;
> > then the problem 83d349f3 fix will happen again.
>
> Hmm. We have very consciously tried to avoid the extra copy, although
> I'm not entirely sure why (it might possibly hurt on the MAXSMP
> configuration).
>
> See for example commit 723aae25d5cd ("smp_call_function_many: handle
> concurrent clearing of mask") which fixed another version of this
> problem.
>
> But I do agree that it looks like the copy is required, simply because
> - as you say - once we've done the "list_add_rcu()" to add it to the
> queue, we can have (another) IPI to the target CPU that can now see it
> and clear the mask.
>
> So by the time we get to actually send the IPI, the mask might have
> been cleared by another IPI. So I do agree that your patch seems
> correct, but I really really want to run it by other people.
>
> Guys? Original patch on lkml. The other possible fix might be
> to take the &call_function.lock earlier in
> generic_smp_call_function_interrupt(), so that we can never
> clear the bit while somebody is adding entries to the list...
> But I think it very much tries to avoid that on purpose right
> now, with only the last CPU responding to that IPI taking the
> lock.
>
> So copying the IPI mask seems to be the reasonable approach.
> Comments?
Agreed, looks correct to me as well - I've queued the fix up in
tip:x86/urgent.
( I've added your Acked-by to the patch, please holler if you
disagree with how the final commit ended up looking like. )
Thanks,
Ingo
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH]smp: Fix send func call IPI to empty cpu mask
2013-01-27 15:50 ` Ingo Molnar
@ 2013-01-28 9:25 ` Jan Beulich
2013-01-28 9:32 ` Wang YanQing
2013-01-28 10:20 ` Ingo Molnar
0 siblings, 2 replies; 6+ messages in thread
From: Jan Beulich @ 2013-01-28 9:25 UTC (permalink / raw)
To: Ingo Molnar, Linus Torvalds
Cc: Milton Miller, Wang YanQing, Mike Galbraith, Peter Zijlstra,
Thomas Gleixner, Andrew Morton, Srivatsa S. Bhat, mina86,
Linux Kernel Mailing List, stable
>>> On 27.01.13 at 16:50, Ingo Molnar <mingo@kernel.org> wrote:
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
>> On Fri, Jan 25, 2013 at 11:53 PM, Wang YanQing <udknight@gmail.com> wrote:
>> > I get below warning every day with 3.7,
>> > one or two times per day.
>> >
>> > [ 2235.186027] WARNING: at
> /mnt/sda7/kernel/linux/arch/x86/kernel/apic/ipi.c:109
> default_send_IPI_mask_logical+0x2f/0xb8()
>> > [ 2235.186030] Hardware name: Aspire 4741
>> > [ 2235.186032] empty IPI mask
>> > [ 2235.186079] [<c1015cbc>] native_send_call_func_ipi+0x4f/0x57
>> > [ 2235.186087] [<c1053453>] smp_call_function_many+0x191/0x1a9
>> > [ 2235.186097] [<c101e074>] native_flush_tlb_others+0x21/0x24
>> > [ 2235.186101] [<c101e0da>] flush_tlb_page+0x63/0x89
>> > [ 2235.186105] [<c101d360>] ptep_set_access_flags+0x20/0x26
>> > [ 2235.186111] [<c108fadd>] do_wp_page+0x234/0x502
>> > [ 2235.186121] [<c1090825>] handle_pte_fault+0x50d/0x54c
>> > [ 2235.186148] [<c1090934>] handle_mm_fault+0xd0/0xe2
>> > [ 2235.186153] [<c12dd143>] __do_page_fault+0x411/0x42d
>> > [ 2235.186166] [<c12dd167>] do_page_fault+0x8/0xa
>> > [ 2235.186170] [<c12db31a>] error_code+0x5a/0x60
>> >
>> > This patch fix it.
>> >
>> > This patch also fix some system hang problem:
>> > If the data->cpumask been cleared after pass
>> >
>> > if (WARN_ONCE(!mask, "empty IPI mask"))
>> > return;
>> > then the problem 83d349f3 fix will happen again.
>>
>> Hmm. We have very consciously tried to avoid the extra copy, although
>> I'm not entirely sure why (it might possibly hurt on the MAXSMP
>> configuration).
>>
>> See for example commit 723aae25d5cd ("smp_call_function_many: handle
>> concurrent clearing of mask") which fixed another version of this
>> problem.
>>
>> But I do agree that it looks like the copy is required, simply because
>> - as you say - once we've done the "list_add_rcu()" to add it to the
>> queue, we can have (another) IPI to the target CPU that can now see it
>> and clear the mask.
>>
>> So by the time we get to actually send the IPI, the mask might have
>> been cleared by another IPI. So I do agree that your patch seems
>> correct, but I really really want to run it by other people.
>>
>> Guys? Original patch on lkml. The other possible fix might be
>> to take the &call_function.lock earlier in
>> generic_smp_call_function_interrupt(), so that we can never
>> clear the bit while somebody is adding entries to the list...
>> But I think it very much tries to avoid that on purpose right
>> now, with only the last CPU responding to that IPI taking the
>> lock.
>>
>> So copying the IPI mask seems to be the reasonable approach.
>> Comments?
>
> Agreed, looks correct to me as well - I've queued the fix up in
> tip:x86/urgent.
But the patch is obviously incomplete for the CPUMASK_OFFSTACK
case, as the newly added cpumask_ipi member never gets
its bit array allocated.
Jan
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH]smp: Fix send func call IPI to empty cpu mask
2013-01-28 9:25 ` Jan Beulich
@ 2013-01-28 9:32 ` Wang YanQing
2013-01-28 10:20 ` Ingo Molnar
1 sibling, 0 replies; 6+ messages in thread
From: Wang YanQing @ 2013-01-28 9:32 UTC (permalink / raw)
To: Jan Beulich
Cc: Ingo Molnar, Linus Torvalds, Milton Miller, Mike Galbraith,
Peter Zijlstra, Thomas Gleixner, Andrew Morton, Srivatsa S. Bhat,
mina86, Linux Kernel Mailing List, stable
On Mon, Jan 28, 2013 at 09:25:55AM +0000, Jan Beulich wrote:
> But the patch is obviously incomplete for the CPUMASK_OFFSTACK
> case, as the newly added cpumask_ipi member never gets
> its bit array allocated.
>
> Jan
Yes, I have found it, and the patch is in lkml some hours ago.
Thanks.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH]smp: Fix send func call IPI to empty cpu mask
2013-01-28 9:25 ` Jan Beulich
2013-01-28 9:32 ` Wang YanQing
@ 2013-01-28 10:20 ` Ingo Molnar
1 sibling, 0 replies; 6+ messages in thread
From: Ingo Molnar @ 2013-01-28 10:20 UTC (permalink / raw)
To: Jan Beulich
Cc: Linus Torvalds, Milton Miller, Wang YanQing, Mike Galbraith,
Peter Zijlstra, Thomas Gleixner, Andrew Morton, Srivatsa S. Bhat,
mina86, Linux Kernel Mailing List, stable
* Jan Beulich <JBeulich@suse.com> wrote:
> >>> On 27.01.13 at 16:50, Ingo Molnar <mingo@kernel.org> wrote:
>
> > * Linus Torvalds <torvalds@linux-foundation.org> wrote:
> >
> >> On Fri, Jan 25, 2013 at 11:53 PM, Wang YanQing <udknight@gmail.com> wrote:
> >> > I get below warning every day with 3.7,
> >> > one or two times per day.
> >> >
> >> > [ 2235.186027] WARNING: at
> > /mnt/sda7/kernel/linux/arch/x86/kernel/apic/ipi.c:109
> > default_send_IPI_mask_logical+0x2f/0xb8()
> >> > [ 2235.186030] Hardware name: Aspire 4741
> >> > [ 2235.186032] empty IPI mask
> >> > [ 2235.186079] [<c1015cbc>] native_send_call_func_ipi+0x4f/0x57
> >> > [ 2235.186087] [<c1053453>] smp_call_function_many+0x191/0x1a9
> >> > [ 2235.186097] [<c101e074>] native_flush_tlb_others+0x21/0x24
> >> > [ 2235.186101] [<c101e0da>] flush_tlb_page+0x63/0x89
> >> > [ 2235.186105] [<c101d360>] ptep_set_access_flags+0x20/0x26
> >> > [ 2235.186111] [<c108fadd>] do_wp_page+0x234/0x502
> >> > [ 2235.186121] [<c1090825>] handle_pte_fault+0x50d/0x54c
> >> > [ 2235.186148] [<c1090934>] handle_mm_fault+0xd0/0xe2
> >> > [ 2235.186153] [<c12dd143>] __do_page_fault+0x411/0x42d
> >> > [ 2235.186166] [<c12dd167>] do_page_fault+0x8/0xa
> >> > [ 2235.186170] [<c12db31a>] error_code+0x5a/0x60
> >> >
> >> > This patch fix it.
> >> >
> >> > This patch also fix some system hang problem:
> >> > If the data->cpumask been cleared after pass
> >> >
> >> > if (WARN_ONCE(!mask, "empty IPI mask"))
> >> > return;
> >> > then the problem 83d349f3 fix will happen again.
> >>
> >> Hmm. We have very consciously tried to avoid the extra copy, although
> >> I'm not entirely sure why (it might possibly hurt on the MAXSMP
> >> configuration).
> >>
> >> See for example commit 723aae25d5cd ("smp_call_function_many: handle
> >> concurrent clearing of mask") which fixed another version of this
> >> problem.
> >>
> >> But I do agree that it looks like the copy is required, simply because
> >> - as you say - once we've done the "list_add_rcu()" to add it to the
> >> queue, we can have (another) IPI to the target CPU that can now see it
> >> and clear the mask.
> >>
> >> So by the time we get to actually send the IPI, the mask might have
> >> been cleared by another IPI. So I do agree that your patch seems
> >> correct, but I really really want to run it by other people.
> >>
> >> Guys? Original patch on lkml. The other possible fix might be
> >> to take the &call_function.lock earlier in
> >> generic_smp_call_function_interrupt(), so that we can never
> >> clear the bit while somebody is adding entries to the list...
> >> But I think it very much tries to avoid that on purpose right
> >> now, with only the last CPU responding to that IPI taking the
> >> lock.
> >>
> >> So copying the IPI mask seems to be the reasonable approach.
> >> Comments?
> >
> > Agreed, looks correct to me as well - I've queued the fix up in
> > tip:x86/urgent.
>
> But the patch is obviously incomplete for the CPUMASK_OFFSTACK
> case, as the newly added cpumask_ipi member never gets its bit
> array allocated.
Yes, indeed - I'll amend it with the fix.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2013-01-28 10:20 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-01-26 7:53 [PATCH]smp: Fix send func call IPI to empty cpu mask Wang YanQing
2013-01-26 20:06 ` Linus Torvalds
2013-01-27 15:50 ` Ingo Molnar
2013-01-28 9:25 ` Jan Beulich
2013-01-28 9:32 ` Wang YanQing
2013-01-28 10:20 ` Ingo Molnar
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).