All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	netfilter-devel@vger.kernel.org, mingo@elte.hu,
	akpm@linux-foundation.org, torvalds@linux-foundation.org,
	davem@davemloft.net, dada1@cosmosbay.com, zbr@ioremap.net,
	jeff.chua.linux@gmail.com, paulus@samba.org,
	laijs@cn.fujitsu.com, jengelh@medozas.de, r000n@r000n.net,
	benh@kernel.crashing.org
Subject: Re: [PATCH RFC] v1 expedited "big hammer" RCU grace periods
Date: Thu, 23 Apr 2009 09:45:39 -0400	[thread overview]
Message-ID: <20090423134539.GC11261@Krystal> (raw)
In-Reply-To: <20090423052520.GA13036@linux.vnet.ibm.com>

* Paul E. McKenney (paulmck@linux.vnet.ibm.com) wrote:
> First cut of "big hammer" expedited RCU grace periods, but only for
> rcu_bh.  This creates another softirq vector, so that entering this
> softirq vector will have forced an rcu_bh quiescent state (as noted by
> Dave Miller).  Use smp_call_function() to invoke raise_softirq() on all
> CPUs in order to cause this to happen.  Track the CPUs that have passed
> through a quiescent state (or gone offline) with a cpumask.
> 
> Does nothing to expedite callbacks already registered with call_rcu_bh(),
> but there is no need to.
> 
> Shortcomings:
> 
> o	Untested, probably does not compile, not for inclusion.
> 
> o	Does not handle rcu, only rcu_bh.
> 
> Thoughts?
> 
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> ---
> 
>  include/linux/interrupt.h |    1 
>  include/linux/rcupdate.h  |    1 
>  kernel/rcupdate.c         |  106 ++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 108 insertions(+)
> 
> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> index 91bb76f..b7b58cc 100644
> --- a/include/linux/interrupt.h
> +++ b/include/linux/interrupt.h
> @@ -338,6 +338,7 @@ enum
>  	TASKLET_SOFTIRQ,
>  	SCHED_SOFTIRQ,
>  	HRTIMER_SOFTIRQ,
> +	RCU_EXPEDITED_SOFTIRQ,
>  	RCU_SOFTIRQ,	/* Preferable RCU should always be the last softirq */
>  
>  	NR_SOFTIRQS
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 15fbb3c..d4af557 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -264,6 +264,7 @@ extern void synchronize_rcu(void);
>  extern void rcu_barrier(void);
>  extern void rcu_barrier_bh(void);
>  extern void rcu_barrier_sched(void);
> +extern void synchronize_rcu_bh_expedited(void);
>  
>  /* Internal to kernel */
>  extern void rcu_init(void);
> diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c
> index a967c9f..bfa98dd 100644
> --- a/kernel/rcupdate.c
> +++ b/kernel/rcupdate.c
> @@ -217,10 +217,116 @@ static int __cpuinit rcu_barrier_cpu_hotplug(struct notifier_block *self,
>  	return NOTIFY_OK;
>  }
>  
> +static DEFINE_MUTEX(synchronize_rcu_bh_mutex);
> +static long synchronize_rcu_bh_completed; /* Expedited-grace-period count. */

Given you access this variable through a mutex, you might want to
consider using a u64 (unsigned long long) to make sure it counts a full
64-bits on 32-bits architectures. This would make overflows less likely
to happen. Note that the reads should be done through a mutex then.
Another alternative is to protect it using a seqlock. Note that the two
previous "non-atomic" read solutions should never be used in NMI
context. :-/

> +
> +#ifndef CONFIG_SMP
> +
> +static void __init synchronize_rcu_expedited_init(void)
> +{
> +}
> +
> +void synchronize_rcu_bh_expedited(void)
> +{
> +	mutex_lock(&synchronize_rcu_bh_mutex);
> +	synchronize_rcu_bh_completed++;
> +	mutex_unlock(&synchronize_rcu_bh_mutex);
> +}
> +
> +#else /* #ifndef CONFIG_SMP */
> +
> +static DEFINE_PER_CPU(int, rcu_bh_need_qs);
> +static cpumask_var_t rcu_bh_waiting_map;
> +
> +static void synchronize_rcu_bh_expedited_help(struct softirq_action *unused)
> +{
> +	if (__get_cpu_var(rcu_bh_need_qs)) {

Is this test useful at all ? The value can only be 0 or 1, so if it is
already 0, then we set it to 0 again. I doubt you care that much about
cache-line bouncing ?

> +		smp_mb();
> +		__get_cpu_var(rcu_bh_need_qs) = 0;
> +		smp_mb();

Is this second mb required ? We want to make sure no memory access
coming from code executed before the flag write goes into the next qs
period, this is fine (and explains the first mb()), but do we care
about interleaving writes done after the flag write ? In the worse case,
they will be put in the previous qs period, and that seems fine.

> +	}
> +}
> +
> +static void rcu_bh_fast_qs(void *unused)
> +{
> +	raise_softirq(RCU_EXPEDITED_SOFTIRQ);
> +}
> +
> +static void __init synchronize_rcu_expedited_init(void)
> +{
> +	open_softirq(RCU_EXPEDITED_SOFTIRQ, synchronize_rcu_bh_expedited_help);
> +	alloc_bootmem_cpumask_var(&rcu_bh_waiting_map);
> +}
> +
> +void synchronize_rcu_bh_expedited(void)
> +{
> +	int cpu;
> +	int done;
> +	int times = 0;
> +
> +	mutex_lock(&synchronize_rcu_bh_mutex);
> +
> +	/* Take snapshot of online CPUs, blocking CPU hotplug. */
> +	preempt_disable();

Hrm, how about simply 

get_online_cpus();

put_online_cpus();

just once at the beginning/end of this function to stop cpu hotplug ?

All these preempt_enable/disable seems a bit too much. But I see that
you might not want to disabling cpu hotplug for too long, so you
probably have a good reason to do it this way.

> +	cpumask_copy(rcu_bh_waiting_map, &cpu_online_map);
> +	preempt_enable();
> +
> +	/* Mark each online CPU as needing a quiescent state. */
> +	for_each_cpu(cpu, rcu_bh_waiting_map)
> +		per_cpu(rcu_bh_need_qs, cpu) = 1;
> +
> +	/* Call for a quiescent state on each online CPU. */
> +	preempt_disable();
> +	cpumask_clear_cpu(smp_processor_id(), rcu_bh_waiting_map);

Commenting that smp_call_function has a smp_mb(), which makes sure
per_cpu(rcu_bh_need_qs, cpu) writes happened before calling the remote
functions would not hurt.

Mathieu

> +	smp_call_function(rcu_bh_fast_qs, NULL, 1);
> +	preempt_enable();
> +
> +	/*
> +	 * Loop waiting for each CPU to either pass through a quiescent
> +	 * state or to go offline.  We don't care which.
> +	 */
> +	for (;;) {
> +		
> +		/* Ignore CPUs that have gone offline, blocking CPU hotplug. */
> +		preempt_disable();
> +		cpumask_and(rcu_bh_waiting_map, rcu_bh_waiting_map,
> +			    &cpu_online_map);
> +		cpumask_clear_cpu(smp_processor_id(), rcu_bh_waiting_map);
> +		preempt_enable();
> +
> +		/* Check if any CPUs still need a quiescent state. */
> +		done = 1;
> +		for_each_cpu(cpu, rcu_bh_waiting_map) {
> +			if (per_cpu(rcu_bh_need_qs, cpu)) {
> +				done = 0;
> +				break;
> +			}
> +			cpumask_clear_cpu(cpu, rcu_bh_waiting_map);
> +		}
> +		if (done)
> +			break;
> +
> +		/*
> +		 * Wait a bit.  If we have already waited a fair
> +		 * amount of time, sleep.
> +		 */
> +		if (++times < 10)
> +			udelay(10 * times);
> +		else
> +			schedule_timeout_uninterruptible(1);
> +	}
> +
> +	synchronize_rcu_bh_completed++;

> +	mutex_unlock(&synchronize_rcu_bh_mutex);
> +}
> +
> +#endif /* #else #ifndef CONFIG_SMP */
> +
>  void __init rcu_init(void)
>  {
>  	__rcu_init();
>  	hotcpu_notifier(rcu_barrier_cpu_hotplug, 0);
> +	synchronize_rcu_expedited_init();
>  }
>  
>  void rcu_scheduler_starting(void)

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

  parent reply	other threads:[~2009-04-23 13:46 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-04-23  5:25 [PATCH RFC] v1 expedited "big hammer" RCU grace periods Paul E. McKenney
2009-04-23  6:11 ` Lai Jiangshan
2009-04-23 15:15   ` Paul E. McKenney
2009-04-24  0:39     ` Lai Jiangshan
2009-04-24  1:13       ` Paul E. McKenney
2009-04-23  7:54 ` Ingo Molnar
2009-04-23 15:34   ` Paul E. McKenney
2009-04-23 15:47     ` Linus Torvalds
2009-04-23 18:29       ` Paul E. McKenney
2009-04-23 13:45 ` Mathieu Desnoyers [this message]
2009-04-23 16:54   ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090423134539.GC11261@Krystal \
    --to=mathieu.desnoyers@polymtl.ca \
    --cc=akpm@linux-foundation.org \
    --cc=benh@kernel.crashing.org \
    --cc=dada1@cosmosbay.com \
    --cc=davem@davemloft.net \
    --cc=jeff.chua.linux@gmail.com \
    --cc=jengelh@medozas.de \
    --cc=laijs@cn.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=netdev@vger.kernel.org \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=paulus@samba.org \
    --cc=r000n@r000n.net \
    --cc=torvalds@linux-foundation.org \
    --cc=zbr@ioremap.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.