Subject: Re: [patch] generic-ipi: remove kmalloc, cleanup
From: Peter Zijlstra
To: Ingo Molnar
Cc: Frederic Weisbecker, Thomas Gleixner, LKML, rt-users,
	Steven Rostedt, Carsten Emde, Clark Williams, rusty
In-Reply-To: <20090212100756.GA12790@elte.hu>
References: <20090212005032.GA4788@nowhere> <20090212021257.GB4697@nowhere>
	<20090212081801.GA22979@elte.hu> <20090212081923.GA26838@elte.hu>
	<1234430564.23438.205.camel@twins> <20090212100756.GA12790@elte.hu>
Date: Thu, 12 Feb 2009 11:16:10 +0100
Message-Id: <1234433770.23438.210.camel@twins>

On Thu, 2009-02-12 at 11:07 +0100, Ingo Molnar wrote:
> * Peter Zijlstra wrote:
> 
> > On Thu, 2009-02-12 at 09:19 +0100, Ingo Molnar wrote:
> > > * Ingo Molnar wrote:
> > > 
> > > > hm, that's a complex one - we do kfree() from IPI context, [...]
> > > 
> > > The patch below might do the trick - it offloads this to a softirq.
> > > Not tested yet.
> > 
> > The simple fix is something like:
> > 
> > ---
> >  kernel/smp.c |    8 ++++++++
> >  1 files changed, 8 insertions(+), 0 deletions(-)
> 
> ok, i made it unconditional (not just a PREEMPT_RT hack) and did the
> cleanup below on top of it.
> 
> I don't think repeat, queued IPIs are all that interesting from a
> performance point of view. If they are, it will all be clearly
> bisectable.

Right, except I really don't like the smp_call_function_many() slow path
that's now the only path.

Rusty did that, I think, but he also had some idea on how to fix it; I
think it boiled down to sticking a count in the call data instead of the
full cpumask.

So I'd rather we first fix that code, and then remove the kmalloc
altogether like you propose here. (A rough, untested sketch of that
counting idea is appended below, after the quoted patch.)

> 	Ingo
> 
> --------------->
> Subject: generic-ipi: remove kmalloc, cleanup
> From: Ingo Molnar
> 
> Now that we don't use the kmalloc() sequence anymore, remove
> CSD_FLAG_ALLOC and all its dependencies.
> 
> Signed-off-by: Ingo Molnar
> ---
>  kernel/smp.c |   86 +++++++++++------------------------------------------------
>  1 file changed, 17 insertions(+), 69 deletions(-)
> 
> Index: tip/kernel/smp.c
> ===================================================================
> --- tip.orig/kernel/smp.c
> +++ tip/kernel/smp.c
> @@ -17,8 +17,7 @@ __cacheline_aligned_in_smp DEFINE_RAW_SP
>  
>  enum {
>  	CSD_FLAG_WAIT		= 0x01,
> -	CSD_FLAG_ALLOC		= 0x02,
> -	CSD_FLAG_LOCK		= 0x04,
> +	CSD_FLAG_LOCK		= 0x02,
>  };
>  
>  struct call_function_data {
> @@ -85,15 +84,6 @@ static void generic_exec_single(int cpu,
>  	csd_flag_wait(data);
>  }
>  
> -static void rcu_free_call_data(struct rcu_head *head)
> -{
> -	struct call_function_data *data;
> -
> -	data = container_of(head, struct call_function_data, rcu_head);
> -
> -	kfree(data);
> -}
> -
>  /*
>   * Invoked by arch to handle an IPI for call function. Must be called with
>   * interrupts disabled.
> @@ -138,8 +128,6 @@ void generic_smp_call_function_interrupt
>  			smp_wmb();
>  			data->csd.flags &= ~CSD_FLAG_WAIT;
>  		}
> -		if (data->csd.flags & CSD_FLAG_ALLOC)
> -			call_rcu(&data->rcu_head, rcu_free_call_data);
>  	}
>  	rcu_read_unlock();
>  
> @@ -190,8 +178,7 @@ void generic_smp_call_function_single_in
>  		} else if (data_flags & CSD_FLAG_LOCK) {
>  			smp_wmb();
>  			data->flags &= ~CSD_FLAG_LOCK;
> -		} else if (data_flags & CSD_FLAG_ALLOC)
> -			kfree(data);
> +		}
>  	}
>  	/*
>  	 * See comment on outer loop
> @@ -236,13 +223,11 @@ int smp_call_function_single(int cpu, vo
>  			/*
>  			 * We are calling a function on a single CPU
>  			 * and we are not going to wait for it to finish.
> -			 * We first try to allocate the data, but if we
> -			 * fail, we fall back to use a per cpu data to pass
> -			 * the information to that CPU. Since all callers
> -			 * of this code will use the same data, we must
> -			 * synchronize the callers to prevent a new caller
> -			 * from corrupting the data before the callee
> -			 * can access it.
> +			 * We use a per cpu data to pass the information to
> +			 * that CPU. Since all callers of this code will use
> +			 * the same data, we must synchronize the callers to
> +			 * prevent a new caller from corrupting the data before
> +			 * the callee can access it.
>  			 *
>  			 * The CSD_FLAG_LOCK is used to let us know when
>  			 * the IPI handler is done with the data.
> @@ -252,15 +237,10 @@ int smp_call_function_single(int cpu, vo
>  			 * will make sure the callee is done with the
>  			 * data before a new caller will use it.
>  			 */
> -			data = NULL;
> -			if (data)
> -				data->flags = CSD_FLAG_ALLOC;
> -			else {
> -				data = &per_cpu(csd_data, me);
> -				while (data->flags & CSD_FLAG_LOCK)
> -					cpu_relax();
> -				data->flags = CSD_FLAG_LOCK;
> -			}
> +			data = &per_cpu(csd_data, me);
> +			while (data->flags & CSD_FLAG_LOCK)
> +				cpu_relax();
> +			data->flags = CSD_FLAG_LOCK;
>  		} else {
>  			data = &d;
>  			data->flags = CSD_FLAG_WAIT;
> @@ -321,8 +301,6 @@ void smp_call_function_many(const struct
>  			    void (*func)(void *), void *info,
>  			    bool wait)
>  {
> -	struct call_function_data *data;
> -	unsigned long flags;
>  	int cpu, next_cpu;
>  
>  	/* Can deadlock when called with interrupts disabled */
> @@ -347,43 +325,13 @@ void smp_call_function_many(const struct
>  		return;
>  	}
>  
> -	data = NULL;
> -	if (unlikely(!data)) {
> -		/* Slow path. */
> -		for_each_online_cpu(cpu) {
> -			if (cpu == smp_processor_id())
> -				continue;
> -			if (cpumask_test_cpu(cpu, mask))
> -				smp_call_function_single(cpu, func, info, wait);
> -		}
> -		return;
> +	/* Slow path. */
> +	for_each_online_cpu(cpu) {
> +		if (cpu == smp_processor_id())
> +			continue;
> +		if (cpumask_test_cpu(cpu, mask))
> +			smp_call_function_single(cpu, func, info, wait);
>  	}
> -
> -	spin_lock_init(&data->lock);
> -	data->csd.flags = CSD_FLAG_ALLOC;
> -	if (wait)
> -		data->csd.flags |= CSD_FLAG_WAIT;
> -	data->csd.func = func;
> -	data->csd.info = info;
> -	cpumask_and(to_cpumask(data->cpumask_bits), mask, cpu_online_mask);
> -	cpumask_clear_cpu(smp_processor_id(), to_cpumask(data->cpumask_bits));
> -	data->refs = cpumask_weight(to_cpumask(data->cpumask_bits));
> -
> -	spin_lock_irqsave(&call_function_lock, flags);
> -	list_add_tail_rcu(&data->csd.list, &call_function_queue);
> -	spin_unlock_irqrestore(&call_function_lock, flags);
> -
> -	/*
> -	 * Make the list addition visible before sending the ipi.
> -	 */
> -	smp_mb();
> -
> -	/* Send a message to all CPUs in the map */
> -	arch_send_call_function_ipi_mask(to_cpumask(data->cpumask_bits));
> -
> -	/* optionally wait for the CPUs to complete */
> -	if (wait)
> -		csd_flag_wait(&data->csd);
>  }
>  EXPORT_SYMBOL(smp_call_function_many);
> 
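For reference, here is the rough sketch of the "count in the call data"
direction I alluded to above -- untested, written from memory, and every
identifier in it (cfd_sketch, cfd_sketch_data, sketch_call_function_many(),
sketch_run_one()) is made up for illustration, not taken from any tree. The
point is that each CPU owns one static slot and completion is tracked by an
atomic count of CPUs that still have to run the callback, rather than by
clearing bits in the full cpumask, so the sender can reuse its slot once the
count reaches zero and neither kmalloc() nor the RCU-deferred kfree() is
needed. How a target CPU finds pending slots from the IPI handler (the
call_function_queue walk) and the finer memory-ordering details are left out.

struct cfd_sketch {
	void			(*func)(void *info);
	void			*info;
	atomic_t		refs;		/* CPUs that still have to run func */
	struct cpumask		targets;	/* only needed to address the IPI */
};

static DEFINE_PER_CPU(struct cfd_sketch, cfd_sketch_data);

/*
 * Sender side; like smp_call_function_many() this is assumed to run
 * with preemption disabled.
 */
static void sketch_call_function_many(const struct cpumask *mask,
				      void (*func)(void *), void *info)
{
	struct cfd_sketch *cfd = &per_cpu(cfd_sketch_data, smp_processor_id());

	/* Wait until every CPU from our previous call has dropped its ref. */
	while (atomic_read(&cfd->refs))
		cpu_relax();

	cfd->func = func;
	cfd->info = info;
	cpumask_and(&cfd->targets, mask, cpu_online_mask);
	cpumask_clear_cpu(smp_processor_id(), &cfd->targets);

	/* Completion is judged by this count, not by clearing mask bits. */
	atomic_set(&cfd->refs, cpumask_weight(&cfd->targets));

	/* Make the slot contents visible before the IPI can arrive. */
	smp_wmb();

	arch_send_call_function_ipi_mask(&cfd->targets);
}

/* Target side, invoked from the IPI handler for each pending slot. */
static void sketch_run_one(struct cfd_sketch *cfd)
{
	cfd->func(cfd->info);

	/* Drop our ref; once it hits zero the sender may reuse the slot. */
	atomic_dec(&cfd->refs);
}

With something along those lines the many-CPU path would not have to fall
back to looping over smp_call_function_single() as in the cleanup above.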