* smp_call_function_single with wait=0 considered harmful @ 2013-12-04 16:46 Christoph Hellwig 2013-12-05 21:43 ` Bjorn Helgaas 2014-02-28 12:26 ` Peter Zijlstra 0 siblings, 2 replies; 8+ messages in thread From: Christoph Hellwig @ 2013-12-04 16:46 UTC (permalink / raw) To: Andrew Morton, Ingo Molnar, Thomas Gleixner, Tony Luck, Robert Richter, Bjorn Helgaas, Aaro Koskinen, David Daney Cc: linux-kernel While doing my recent work on the generic smp function calls I noticed that smp_call_function_single without the wait flag can't work, as it allocates struct call_single_data on stack, and without the wait flag will happily return before the IPI has been executed. This affects the following callers: arch/ia64/kernel/mca.c:mca_cpu_callback() arch/ia64/kernel/smpboot.c:ia64_sync_itc() arch/x86/kernel/kvm.c:kvm_cpu_notify() arch/x86/oprofile/nmi_int.c:oprofile_cpu_notifier() arch/x86/pci/amd_bus.c:amd_cpu_notify() drivers/staging/octeon/ethernet-rx.c:cvm_oct_enable_one_cpu() kernel/stop_machine.c:stop_two_cpus() It would be good to get these fixed so that we could remove the parameter. Either convert them to wait, or use a preallocated call_single_data and __smp_call_function_single. After that I'd like to remove the wait argument to prevent further abuses. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: smp_call_function_single with wait=0 considered harmful 2013-12-04 16:46 smp_call_function_single with wait=0 considered harmful Christoph Hellwig @ 2013-12-05 21:43 ` Bjorn Helgaas 2013-12-06 10:56 ` Christoph Hellwig 2014-02-28 12:26 ` Peter Zijlstra 1 sibling, 1 reply; 8+ messages in thread From: Bjorn Helgaas @ 2013-12-05 21:43 UTC (permalink / raw) To: Christoph Hellwig Cc: Andrew Morton, Ingo Molnar, Thomas Gleixner, Tony Luck, Robert Richter, Aaro Koskinen, David Daney, linux-kernel On Wed, Dec 4, 2013 at 9:46 AM, Christoph Hellwig <hch@infradead.org> wrote: > While doing my recent work on the generic smp function calls I noticed > that smp_call_function_single without the wait flag can't work, as > it allocates struct call_single_data on stack, and without the wait > flag will happily return before the IPI has been executed. I don't understand the problem yet. With wait==0, smp_call_function_single() sets "csd = &__get_cpu_var(csd_data)", so it's not using a struct on the stack. We'll queue up "func" and likely will return before it is executed, but that should be fine because nobody will overwrite csd_data until it *is* executed and csd_unlock() has been called. > This affects the following callers: > > arch/ia64/kernel/mca.c:mca_cpu_callback() > arch/ia64/kernel/smpboot.c:ia64_sync_itc() > arch/x86/kernel/kvm.c:kvm_cpu_notify() > arch/x86/oprofile/nmi_int.c:oprofile_cpu_notifier() > arch/x86/pci/amd_bus.c:amd_cpu_notify() I don't see any reason why amd_cpu_notify() needs to use wait==0. > drivers/staging/octeon/ethernet-rx.c:cvm_oct_enable_one_cpu() > kernel/stop_machine.c:stop_two_cpus() > > It would be good to get these fixed so that we could remove the > parameter. Either convert them to wait, or use a preallocated > call_single_data and __smp_call_function_single. > > After that I'd like to remove the wait argument to prevent further > abuses. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: smp_call_function_single with wait=0 considered harmful 2013-12-05 21:43 ` Bjorn Helgaas @ 2013-12-06 10:56 ` Christoph Hellwig 0 siblings, 0 replies; 8+ messages in thread From: Christoph Hellwig @ 2013-12-06 10:56 UTC (permalink / raw) To: Bjorn Helgaas Cc: Christoph Hellwig, Andrew Morton, Ingo Molnar, Thomas Gleixner, Tony Luck, Robert Richter, Aaro Koskinen, David Daney, linux-kernel On Thu, Dec 05, 2013 at 02:43:03PM -0700, Bjorn Helgaas wrote: > smp_call_function_single() sets "csd = &__get_cpu_var(csd_data)", so > it's not using a struct on the stack. We'll queue up "func" and > likely will return before it is executed, but that should be fine > because nobody will overwrite csd_data until it *is* executed and > csd_unlock() has been called. You're right, I missed the usage of the per-cpu data later in the function. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: smp_call_function_single with wait=0 considered harmful 2013-12-04 16:46 smp_call_function_single with wait=0 considered harmful Christoph Hellwig 2013-12-05 21:43 ` Bjorn Helgaas @ 2014-02-28 12:26 ` Peter Zijlstra 2014-02-28 12:39 ` Peter Zijlstra 1 sibling, 1 reply; 8+ messages in thread From: Peter Zijlstra @ 2014-02-28 12:26 UTC (permalink / raw) To: Christoph Hellwig Cc: Andrew Morton, Ingo Molnar, Thomas Gleixner, Tony Luck, Robert Richter, Bjorn Helgaas, Aaro Koskinen, David Daney, linux-kernel On Wed, Dec 04, 2013 at 08:46:27AM -0800, Christoph Hellwig wrote: > While doing my recent work on the generic smp function calls I noticed > that smp_call_function_single without the wait flag can't work, as > it allocates struct call_single_data on stack, and without the wait > flag will happily return before the IPI has been executed. It doesn't actually; it uses a per-cpu one in the !wait case. The subsequent csd_lock() ensures it will wait for any prior user to complete, so only if you're doing multiple smp_call_function_single() invocations back-to-back will they queue up. > This affects the following callers: <snip> > kernel/stop_machine.c:stop_two_cpus() That site should work with .wait=1 just fine, but given the above, the .wait=0 doesn't appear problematic at all. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: smp_call_function_single with wait=0 considered harmful 2014-02-28 12:26 ` Peter Zijlstra @ 2014-02-28 12:39 ` Peter Zijlstra 2014-02-28 17:06 ` Rik van Riel ` (2 more replies) 0 siblings, 3 replies; 8+ messages in thread From: Peter Zijlstra @ 2014-02-28 12:39 UTC (permalink / raw) To: Christoph Hellwig Cc: Andrew Morton, Ingo Molnar, Thomas Gleixner, Tony Luck, Robert Richter, Bjorn Helgaas, Aaro Koskinen, David Daney, linux-kernel, Prarit Bhargava, Rik van Riel, Mel Gorman On Fri, Feb 28, 2014 at 01:26:24PM +0100, Peter Zijlstra wrote: > On Wed, Dec 04, 2013 at 08:46:27AM -0800, Christoph Hellwig wrote: > > kernel/stop_machine.c:stop_two_cpus() > > That site should work with .wait=1 just fine, but given the above, the > .wait=0 doesn't appear problematic at all. Scratch that; its broken, but not because of smp_call_function_single(). --- Subject: stop_machine: Fix^2 race between stop_two_cpus() and stop_cpus() We must use smp_call_function_single(.wait=1) for the irq_cpu_stop_queue_work() to ensure the queueing is actually done under stop_cpus_lock. Without this we could have dropped the lock by the time we do the queueing and get the race we tried to fix. Fixes: 7053ea1a34fa ("stop_machine: Fix race between stop_two_cpus() and stop_cpus()") Cc: Prarit Bhargava <prarit@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Peter Zijlstra <peterz@infradead.org> --- kernel/stop_machine.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c index 84571e09c907..01fbae5b97b7 100644 --- a/kernel/stop_machine.c +++ b/kernel/stop_machine.c @@ -293,7 +293,7 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void * */ smp_call_function_single(min(cpu1, cpu2), &irq_cpu_stop_queue_work, - &call_args, 0); + &call_args, 1); lg_local_unlock(&stop_cpus_lock); preempt_enable(); ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: smp_call_function_single with wait=0 considered harmful 2014-02-28 12:39 ` Peter Zijlstra @ 2014-02-28 17:06 ` Rik van Riel 2014-02-28 17:34 ` Prarit Bhargava 2014-03-11 12:36 ` [tip:sched/core] stop_machine: Fix^2 race between stop_two_cpus() and stop_cpus() tip-bot for Peter Zijlstra 2 siblings, 0 replies; 8+ messages in thread From: Rik van Riel @ 2014-02-28 17:06 UTC (permalink / raw) To: Peter Zijlstra, Christoph Hellwig Cc: Andrew Morton, Ingo Molnar, Thomas Gleixner, Tony Luck, Robert Richter, Bjorn Helgaas, Aaro Koskinen, David Daney, linux-kernel, Prarit Bhargava, Mel Gorman On 02/28/2014 07:39 AM, Peter Zijlstra wrote: > Subject: stop_machine: Fix^2 race between stop_two_cpus() and stop_cpus() > > We must use smp_call_function_single(.wait=1) for the > irq_cpu_stop_queue_work() to ensure the queueing is actually done under > stop_cpus_lock. Without this we could have dropped the lock by the time > we do the queueing and get the race we tried to fix. > > Fixes: 7053ea1a34fa ("stop_machine: Fix race between stop_two_cpus() and stop_cpus()") > Cc: Prarit Bhargava <prarit@redhat.com> > Cc: Rik van Riel <riel@redhat.com> > Cc: Mel Gorman <mgorman@suse.de> > Signed-off-by: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Rik van Riel <riel@redhat.com> -- All rights reversed ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: smp_call_function_single with wait=0 considered harmful 2014-02-28 12:39 ` Peter Zijlstra 2014-02-28 17:06 ` Rik van Riel @ 2014-02-28 17:34 ` Prarit Bhargava 2014-03-11 12:36 ` [tip:sched/core] stop_machine: Fix^2 race between stop_two_cpus() and stop_cpus() tip-bot for Peter Zijlstra 2 siblings, 0 replies; 8+ messages in thread From: Prarit Bhargava @ 2014-02-28 17:34 UTC (permalink / raw) To: Peter Zijlstra Cc: Christoph Hellwig, Andrew Morton, Ingo Molnar, Thomas Gleixner, Tony Luck, Robert Richter, Bjorn Helgaas, Aaro Koskinen, David Daney, linux-kernel, Rik van Riel, Mel Gorman On 02/28/2014 07:39 AM, Peter Zijlstra wrote: > On Fri, Feb 28, 2014 at 01:26:24PM +0100, Peter Zijlstra wrote: >> On Wed, Dec 04, 2013 at 08:46:27AM -0800, Christoph Hellwig wrote: >>> kernel/stop_machine.c:stop_two_cpus() >> >> That site should work with .wait=1 just fine, but given the above, the >> .wait=0 doesn't appear problematic at all. > > Scratch that; its broken, but not because of smp_call_function_single(). > > --- > Subject: stop_machine: Fix^2 race between stop_two_cpus() and stop_cpus() > > We must use smp_call_function_single(.wait=1) for the > irq_cpu_stop_queue_work() to ensure the queueing is actually done under > stop_cpus_lock. Without this we could have dropped the lock by the time > we do the queueing and get the race we tried to fix. > > Fixes: 7053ea1a34fa ("stop_machine: Fix race between stop_two_cpus() and stop_cpus()") > Cc: Prarit Bhargava <prarit@redhat.com> > Cc: Rik van Riel <riel@redhat.com> > Cc: Mel Gorman <mgorman@suse.de> > Signed-off-by: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Prarit Bhargava <prarit@redhat.com> P. ^ permalink raw reply [flat|nested] 8+ messages in thread
* [tip:sched/core] stop_machine: Fix^2 race between stop_two_cpus() and stop_cpus() 2014-02-28 12:39 ` Peter Zijlstra 2014-02-28 17:06 ` Rik van Riel 2014-02-28 17:34 ` Prarit Bhargava @ 2014-03-11 12:36 ` tip-bot for Peter Zijlstra 2 siblings, 0 replies; 8+ messages in thread From: tip-bot for Peter Zijlstra @ 2014-03-11 12:36 UTC (permalink / raw) To: linux-tip-commits Cc: linux-kernel, hpa, mingo, peterz, hch, riel, akpm, mgorman, tglx, prarit Commit-ID: 177c53d943368fc97644ebc0a250dc8e2d124250 Gitweb: http://git.kernel.org/tip/177c53d943368fc97644ebc0a250dc8e2d124250 Author: Peter Zijlstra <peterz@infradead.org> AuthorDate: Fri, 28 Feb 2014 13:39:05 +0100 Committer: Ingo Molnar <mingo@kernel.org> CommitDate: Tue, 11 Mar 2014 11:33:47 +0100 stop_machine: Fix^2 race between stop_two_cpus() and stop_cpus() We must use smp_call_function_single(.wait=1) for the irq_cpu_stop_queue_work() to ensure the queueing is actually done under stop_cpus_lock. Without this we could have dropped the lock by the time we do the queueing and get the race we tried to fix. Fixes: 7053ea1a34fa ("stop_machine: Fix race between stop_two_cpus() and stop_cpus()") Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20140228123905.GK3104@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org> --- kernel/stop_machine.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c index 84571e0..01fbae5 100644 --- a/kernel/stop_machine.c +++ b/kernel/stop_machine.c @@ -293,7 +293,7 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void * */ smp_call_function_single(min(cpu1, cpu2), &irq_cpu_stop_queue_work, - &call_args, 0); + &call_args, 1); lg_local_unlock(&stop_cpus_lock); preempt_enable(); ^ permalink raw reply related [flat|nested] 8+ messages in thread
end of thread, other threads:[~2014-03-11 12:37 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2013-12-04 16:46 smp_call_function_single with wait=0 considered harmful Christoph Hellwig 2013-12-05 21:43 ` Bjorn Helgaas 2013-12-06 10:56 ` Christoph Hellwig 2014-02-28 12:26 ` Peter Zijlstra 2014-02-28 12:39 ` Peter Zijlstra 2014-02-28 17:06 ` Rik van Riel 2014-02-28 17:34 ` Prarit Bhargava 2014-03-11 12:36 ` [tip:sched/core] stop_machine: Fix^2 race between stop_two_cpus() and stop_cpus() tip-bot for Peter Zijlstra
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.