From: Josh Triplett <josh@joshtriplett.org>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: linux-kernel@vger.kernel.org, mingo@elte.hu,
	laijs@cn.fujitsu.com, dipankar@in.ibm.com,
	akpm@linux-foundation.org, mathieu.desnoyers@polymtl.ca,
	niv@us.ibm.com, tglx@linutronix.de, peterz@infradead.org,
	rostedt@goodmis.org, Valdis.Kletnieks@vt.edu,
	dhowells@redhat.com, eric.dumazet@gmail.com, darren@dvhart.com,
	fweisbec@gmail.com, sbw@mit.edu, patches@linaro.org
Subject: Re: [PATCH tip/core/rcu 17/23] rcu: Fix day-zero grace-period initialization/cleanup race
Date: Mon, 3 Sep 2012 02:39:57 -0700
Message-ID: <20120903093957.GF5574@leaf>
In-Reply-To: <1346350718-30937-17-git-send-email-paulmck@linux.vnet.ibm.com>

On Thu, Aug 30, 2012 at 11:18:32AM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> 
> The current approach to grace-period initialization is vulnerable to
> extremely low-probability races.  These races stem from the fact that the
> old grace period is marked completed on the same traversal through the
> rcu_node structure that is marking the start of the new grace period.
> These races can result in too-short grace periods, as shown in the
> following scenario:
> 
> 1.	CPU 0 completes a grace period, but needs an additional
> 	grace period, so starts initializing one, initializing all
> 	the non-leaf rcu_node structures and the first leaf rcu_node
> 	structure.  Because CPU 0 is both completing the old grace
> 	period and starting a new one, it marks the completion of
> 	the old grace period and the start of the new grace period
> 	in a single traversal of the rcu_node structures.
> 
> 	Therefore, CPUs corresponding to the first rcu_node structure
> 	can become aware that the prior grace period has completed, but
> 	CPUs corresponding to the other rcu_node structures will see
> 	this same prior grace period as still being in progress.
> 
> 2.	CPU 1 passes through a quiescent state, and therefore informs
> 	the RCU core.  Because its leaf rcu_node structure has already
> 	been initialized, this CPU's quiescent state is applied to the
> 	new (and only partially initialized) grace period.
> 
> 3.	CPU 1 enters an RCU read-side critical section and acquires
> 	a reference to data item A.  Note that this critical section
> 	started after the beginning of the new grace period, and
> 	therefore will not block this new grace period.
> 
> 4.	CPU 16 exits dyntick-idle mode.  Because it was in dyntick-idle
> 	mode, other CPUs informed the RCU core of its extended quiescent
> 	state for the past several grace periods.  This means that CPU
> 	16 is not yet aware that these past grace periods have ended.
> 	Assume that CPU 16 corresponds to the second leaf rcu_node
> 	structure.
> 
> 5.	CPU 16 removes data item A from its enclosing data structure
> 	and passes it to call_rcu(), which queues a callback in the
> 	RCU_NEXT_TAIL segment of the callback queue.
> 
> 6.	CPU 16 enters the RCU core, possibly because it has taken a
> 	scheduling-clock interrupt, or alternatively because it has more
> 	than 10,000 callbacks queued.  It notes that the second most
> 	recent grace period has completed (recall that it cannot yet
> 	become aware that the most recent grace period has completed),
> 	and therefore advances its callbacks.  The callback for data
> 	item A is therefore in the RCU_NEXT_READY_TAIL segment of the
> 	callback queue.
> 
> 7.	CPU 0 completes initialization of the remaining leaf rcu_node
> 	structures for the new grace period, including the structure
> 	corresponding to CPU 16.
> 
> 8.	CPU 16 again enters the RCU core, again, possibly because it has
> 	taken a scheduling-clock interrupt, or alternatively because
> 	it now has more than 10,000 callbacks queued.  It notes that
> 	the most recent grace period has ended, and therefore advances
> 	its callbacks.  The callback for data item A is therefore in
> 	the RCU_WAIT_TAIL segment of the callback queue.
> 
> 9.	All CPUs other than CPU 1 pass through quiescent states.  Because
> 	CPU 1 already passed through its quiescent state, the new grace
> 	period completes.  Note that CPU 1 is still in its RCU read-side
> 	critical section, still referencing data item A.
> 
> 10.	Suppose that CPU 2 was the last CPU to pass through a quiescent
> 	state for the new grace period, and suppose further that CPU 2
> 	did not have any callbacks queued, therefore not needing an
> 	additional grace period.  CPU 2 therefore traverses all of the
> 	rcu_node structures, marking the new grace period as completed,
> 	but does not initialize a new grace period.
> 
> 11.	CPU 16 yet again enters the RCU core, yet again possibly because
> 	it has taken a scheduling-clock interrupt, or alternatively
> 	because it now has more than 10,000 callbacks queued.  It notes
> 	that the new grace period has ended, and therefore advances
> 	its callbacks.  The callback for data item A is therefore in
> 	the RCU_DONE_TAIL segment of the callback queue.  This means
> 	that this callback is now considered ready to be invoked.
> 
> 12.	CPU 16 invokes the callback, freeing data item A while CPU 1
> 	is still referencing it.
> 
> This scenario represents a day-zero bug for TREE_RCU.  This commit
> therefore ensures that the old grace period is marked completed in
> all leaf rcu_node structures before a new grace period is marked
> started in any of them.
> 
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

Reviewed-by: Josh Triplett <josh@joshtriplett.org>
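
For readers following the twelve-step scenario quoted above, here is a
minimal user-space sketch of how CPU 16's callback for data item A moves
through the callback-queue segments.  It is not kernel code: toy_cpu,
toy_note_gp_end() and toy_replay() are hypothetical names, grace periods
are numbered 0 and 1 (old) and 2 (new), and the model assumes that a CPU
advances its callbacks by exactly one segment each time it notices a
changed ->completed value in its leaf rcu_node structure.

```c
#include <assert.h>

/* Callback-queue segments, ordered from newly queued to ready to invoke. */
enum toy_seg { NEXT_TAIL, NEXT_READY_TAIL, WAIT_TAIL, DONE_TAIL };

/* Hypothetical per-CPU state: only what the scenario needs. */
struct toy_cpu {
	long completed;       /* last ->completed value this CPU noticed */
	enum toy_seg cb_seg;  /* segment holding the callback for item A */
};

/* Advance callbacks one segment if the leaf shows a newer ->completed. */
void toy_note_gp_end(struct toy_cpu *cpu, long leaf_completed)
{
	assert(leaf_completed >= cpu->completed); /* numbers never go back */
	if (cpu->completed == leaf_completed)
		return;
	if (cpu->cb_seg != DONE_TAIL)
		cpu->cb_seg = (enum toy_seg)(cpu->cb_seg + 1);
	cpu->completed = leaf_completed;
}

/*
 * Replay steps 5-11 for CPU 16.  If cleanup_first is zero (assumed old
 * behavior), CPU 16's leaf still shows ->completed == 0 at step 6.
 * Otherwise the old grace period (1) was already marked completed on
 * every leaf before grace period 2 started anywhere.
 */
enum toy_seg toy_replay(int cleanup_first)
{
	/* Step 4: CPU 16 was idle and has not noticed recent grace periods. */
	struct toy_cpu cpu16 = { .completed = -1, .cb_seg = NEXT_TAIL };
	long leaf_completed = cleanup_first ? 1 : 0;

	/* Step 5: call_rcu() leaves A's callback in NEXT_TAIL. */
	/* Step 6: first pass through the RCU core. */
	toy_note_gp_end(&cpu16, leaf_completed);
	/* Step 7: initialization of grace period 2 reaches this leaf. */
	leaf_completed = 1;
	/* Step 8: second pass through the RCU core. */
	toy_note_gp_end(&cpu16, leaf_completed);
	/* Step 10: grace period 2 is marked completed on this leaf. */
	leaf_completed = 2;
	/* Step 11: third pass through the RCU core. */
	toy_note_gp_end(&cpu16, leaf_completed);
	return cpu16.cb_seg;
}
```

Under these assumptions toy_replay(0) ends with the callback in DONE_TAIL
after grace period 2, which CPU 1's reader does not block, whereas
toy_replay(1) ends in WAIT_TAIL, so the callback must wait for a further
grace period.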

>  kernel/rcutree.c |   36 +++++++++++++-----------------------
>  1 files changed, 13 insertions(+), 23 deletions(-)
> 
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index d435009..4cfe488 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -1161,33 +1161,23 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
>  	 * they can do to advance the grace period.  It is therefore
>  	 * safe for us to drop the lock in order to mark the grace
>  	 * period as completed in all of the rcu_node structures.
> -	 *
> -	 * But if this CPU needs another grace period, it will take
> -	 * care of this while initializing the next grace period.
> -	 * We use RCU_WAIT_TAIL instead of the usual RCU_DONE_TAIL
> -	 * because the callbacks have not yet been advanced: Those
> -	 * callbacks are waiting on the grace period that just now
> -	 * completed.
>  	 */
> -	rdp = this_cpu_ptr(rsp->rda);
> -	if (*rdp->nxttail[RCU_WAIT_TAIL] == NULL) {
> -		raw_spin_unlock_irqrestore(&rnp->lock, flags);
> +	raw_spin_unlock_irqrestore(&rnp->lock, flags);
>  
> -		/*
> -		 * Propagate new ->completed value to rcu_node
> -		 * structures so that other CPUs don't have to
> -		 * wait until the start of the next grace period
> -		 * to process their callbacks.
> -		 */
> -		rcu_for_each_node_breadth_first(rsp, rnp) {
> -			raw_spin_lock_irqsave(&rnp->lock, flags);
> -			rnp->completed = rsp->gpnum;
> -			raw_spin_unlock_irqrestore(&rnp->lock, flags);
> -			cond_resched();
> -		}
> -		rnp = rcu_get_root(rsp);
> +	/*
> +	 * Propagate new ->completed value to rcu_node structures so
> +	 * that other CPUs don't have to wait until the start of the next
> +	 * grace period to process their callbacks.  This also avoids
> +	 * some nasty RCU grace-period initialization races.
> +	 */
> +	rcu_for_each_node_breadth_first(rsp, rnp) {
>  		raw_spin_lock_irqsave(&rnp->lock, flags);
> +		rnp->completed = rsp->gpnum;
> +		raw_spin_unlock_irqrestore(&rnp->lock, flags);
> +		cond_resched();
>  	}
> +	rnp = rcu_get_root(rsp);
> +	raw_spin_lock_irqsave(&rnp->lock, flags);
>  
>  	rsp->completed = rsp->gpnum; /* Declare grace period done. */
>  	trace_rcu_grace_period(rsp->name, rsp->completed, "end");
> -- 
> 1.7.8
> 
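
The ordering this patch relies on can be illustrated with a small
user-space sketch.  It is only a model under stated assumptions: the
rcu_node tree is flattened into an array, locking is omitted, and
toy_node, toy_mixed_state(), toy_single_traversal() and
toy_split_traversal() are hypothetical names that do not exist in
kernel/rcutree.c.  The check flags any moment at which one node has
already started a grace period while another node still shows the
previous grace period as in progress.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two rcu_node fields of interest. */
struct toy_node {
	long gpnum;      /* most recent grace period started on this node */
	long completed;  /* most recent grace period completed on this node */
};

/*
 * Return 1 if some node has started a grace period that is more than
 * one ahead of what another node has seen complete, which is the
 * inconsistent view described in step 1 of the scenario.
 */
int toy_mixed_state(const struct toy_node *n, size_t cnt)
{
	long max_gpnum, min_completed;
	size_t i;

	assert(cnt > 0);
	max_gpnum = n[0].gpnum;
	min_completed = n[0].completed;
	for (i = 1; i < cnt; i++) {
		if (n[i].gpnum > max_gpnum)
			max_gpnum = n[i].gpnum;
		if (n[i].completed < min_completed)
			min_completed = n[i].completed;
	}
	return max_gpnum > min_completed + 1;
}

/* Assumed old behavior: one traversal ends gp - 1 and starts gp. */
int toy_single_traversal(struct toy_node *n, size_t cnt, long gp)
{
	int mixed = 0;
	size_t i;

	for (i = 0; i < cnt; i++) {
		n[i].completed = gp - 1;
		n[i].gpnum = gp;
		mixed |= toy_mixed_state(n, cnt);
	}
	return mixed;
}

/* Patched ordering: finish gp - 1 everywhere, then start gp. */
int toy_split_traversal(struct toy_node *n, size_t cnt, long gp)
{
	int mixed = 0;
	size_t i;

	for (i = 0; i < cnt; i++) {
		n[i].completed = gp - 1;
		mixed |= toy_mixed_state(n, cnt);
	}
	for (i = 0; i < cnt; i++) {
		n[i].gpnum = gp;
		mixed |= toy_mixed_state(n, cnt);
	}
	return mixed;
}
```

Starting from four nodes with gpnum 1 and completed 0, the single
traversal passes through a mixed state as soon as the first node is
updated, while the split traversal never does.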

