From: Josh Triplett <josh@joshtriplett.org>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: linux-kernel@vger.kernel.org, mingo@elte.hu,
	laijs@cn.fujitsu.com, dipankar@in.ibm.com,
	akpm@linux-foundation.org, mathieu.desnoyers@polymtl.ca,
	niv@us.ibm.com, tglx@linutronix.de, peterz@infradead.org,
	rostedt@goodmis.org, Valdis.Kletnieks@vt.edu,
	dhowells@redhat.com, eric.dumazet@gmail.com, darren@dvhart.com,
	fweisbec@gmail.com, sbw@mit.edu, patches@linaro.org
Subject: Re: [PATCH tip/core/rcu 01/23] rcu: Move RCU grace-period initialization into a kthread
Date: Sat, 1 Sep 2012 18:04:23 -0700
Message-ID: <20120902010422.GA5713@leaf>
In-Reply-To: <1346350718-30937-1-git-send-email-paulmck@linux.vnet.ibm.com>

On Thu, Aug 30, 2012 at 11:18:16AM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> 
> As the first step towards allowing grace-period initialization to be
> preemptible, this commit moves the RCU grace-period initialization
> into its own kthread.  This is needed to keep large-system scheduling
> latency at reasonable levels.
> 
> Reported-by: Mike Galbraith <mgalbraith@suse.de>
> Reported-by: Dimitri Sivanich <sivanich@sgi.com>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

Reviewed-by: Josh Triplett <josh@joshtriplett.org>

>  kernel/rcutree.c |  191 ++++++++++++++++++++++++++++++++++++------------------
>  kernel/rcutree.h |    3 +
>  2 files changed, 130 insertions(+), 64 deletions(-)
> 
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index f280e54..e1c5868 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -1040,6 +1040,103 @@ rcu_start_gp_per_cpu(struct rcu_state *rsp, struct rcu_node *rnp, struct rcu_dat
>  }
>  
>  /*
> + * Body of kthread that handles grace periods.
> + */
> +static int rcu_gp_kthread(void *arg)
> +{
> +	unsigned long flags;
> +	struct rcu_data *rdp;
> +	struct rcu_node *rnp;
> +	struct rcu_state *rsp = arg;
> +
> +	for (;;) {
> +
> +		/* Handle grace-period start. */
> +		rnp = rcu_get_root(rsp);
> +		for (;;) {
> +			wait_event_interruptible(rsp->gp_wq, rsp->gp_flags);
> +			if (rsp->gp_flags)
> +				break;
> +			flush_signals(current);
> +		}
> +		raw_spin_lock_irqsave(&rnp->lock, flags);
> +		rsp->gp_flags = 0;
> +		rdp = this_cpu_ptr(rsp->rda);
> +
> +		if (rcu_gp_in_progress(rsp)) {
> +			/*
> +			 * A grace period is already in progress, so
> +			 * don't start another one.
> +			 */
> +			raw_spin_unlock_irqrestore(&rnp->lock, flags);
> +			continue;
> +		}
> +
> +		if (rsp->fqs_active) {
> +			/*
> +			 * We need a grace period, but force_quiescent_state()
> +			 * is running.  Tell it to start one on our behalf.
> +			 */
> +			rsp->fqs_need_gp = 1;
> +			raw_spin_unlock_irqrestore(&rnp->lock, flags);
> +			continue;
> +		}
> +
> +		/* Advance to a new grace period and initialize state. */
> +		rsp->gpnum++;
> +		trace_rcu_grace_period(rsp->name, rsp->gpnum, "start");
> +		WARN_ON_ONCE(rsp->fqs_state == RCU_GP_INIT);
> +		rsp->fqs_state = RCU_GP_INIT; /* Stop force_quiescent_state. */
> +		rsp->jiffies_force_qs = jiffies + RCU_JIFFIES_TILL_FORCE_QS;
> +		record_gp_stall_check_time(rsp);
> +		raw_spin_unlock(&rnp->lock);  /* leave irqs disabled. */
> +
> +		/* Exclude any concurrent CPU-hotplug operations. */
> +		raw_spin_lock(&rsp->onofflock);  /* irqs already disabled. */
> +
> +		/*
> +		 * Set the quiescent-state-needed bits in all the rcu_node
> +		 * structures for all currently online CPUs in breadth-first
> +		 * order, starting from the root rcu_node structure.
> +		 * This operation relies on the layout of the hierarchy
> +		 * within the rsp->node[] array.  Note that other CPUs will
> +		 * access only the leaves of the hierarchy, which still
> +		 * indicate that no grace period is in progress, at least
> +		 * until the corresponding leaf node has been initialized.
> +		 * In addition, we have excluded CPU-hotplug operations.
> +		 *
> +		 * Note that the grace period cannot complete until
> +		 * we finish the initialization process, as there will
> +		 * be at least one qsmask bit set in the root node until
> +		 * that time, namely the one corresponding to this CPU,
> +		 * due to the fact that we have irqs disabled.
> +		 */
> +		rcu_for_each_node_breadth_first(rsp, rnp) {
> +			raw_spin_lock(&rnp->lock); /* irqs already disabled. */
> +			rcu_preempt_check_blocked_tasks(rnp);
> +			rnp->qsmask = rnp->qsmaskinit;
> +			rnp->gpnum = rsp->gpnum;
> +			rnp->completed = rsp->completed;
> +			if (rnp == rdp->mynode)
> +				rcu_start_gp_per_cpu(rsp, rnp, rdp);
> +			rcu_preempt_boost_start_gp(rnp);
> +			trace_rcu_grace_period_init(rsp->name, rnp->gpnum,
> +						    rnp->level, rnp->grplo,
> +						    rnp->grphi, rnp->qsmask);
> +			raw_spin_unlock(&rnp->lock); /* irqs remain disabled. */
> +		}
> +
> +		rnp = rcu_get_root(rsp);
> +		raw_spin_lock(&rnp->lock); /* irqs already disabled. */
> +		/* force_quiescent_state() now OK. */
> +		rsp->fqs_state = RCU_SIGNAL_INIT;
> +		raw_spin_unlock(&rnp->lock); /* irqs remain disabled. */
> +		raw_spin_unlock_irqrestore(&rsp->onofflock, flags);
> +	}
> +	return 0;
> +}
> +
> +/*
>   * Start a new RCU grace period if warranted, re-initializing the hierarchy
>   * in preparation for detecting the next grace period.  The caller must hold
>   * the root node's ->lock, which is released before return.  Hard irqs must
> @@ -1056,77 +1153,20 @@ rcu_start_gp(struct rcu_state *rsp, unsigned long flags)
>  	struct rcu_data *rdp = this_cpu_ptr(rsp->rda);
>  	struct rcu_node *rnp = rcu_get_root(rsp);
>  
> -	if (!rcu_scheduler_fully_active ||
> +	if (!rsp->gp_kthread ||
>  	    !cpu_needs_another_gp(rsp, rdp)) {
>  		/*
> -		 * Either the scheduler hasn't yet spawned the first
> -		 * non-idle task or this CPU does not need another
> -		 * grace period.  Either way, don't start a new grace
> -		 * period.
> -		 */
> -		raw_spin_unlock_irqrestore(&rnp->lock, flags);
> -		return;
> -	}
> -
> -	if (rsp->fqs_active) {
> -		/*
> -		 * This CPU needs a grace period, but force_quiescent_state()
> -		 * is running.  Tell it to start one on this CPU's behalf.
> +		 * Either we have not yet spawned the grace-period
> +		 * task or this CPU does not need another grace period.
> +		 * Either way, don't start a new grace period.
>  		 */
> -		rsp->fqs_need_gp = 1;
>  		raw_spin_unlock_irqrestore(&rnp->lock, flags);
>  		return;
>  	}
>  
> -	/* Advance to a new grace period and initialize state. */
> -	rsp->gpnum++;
> -	trace_rcu_grace_period(rsp->name, rsp->gpnum, "start");
> -	WARN_ON_ONCE(rsp->fqs_state == RCU_GP_INIT);
> -	rsp->fqs_state = RCU_GP_INIT; /* Hold off force_quiescent_state. */
> -	rsp->jiffies_force_qs = jiffies + RCU_JIFFIES_TILL_FORCE_QS;
> -	record_gp_stall_check_time(rsp);
> -	raw_spin_unlock(&rnp->lock);  /* leave irqs disabled. */
> -
> -	/* Exclude any concurrent CPU-hotplug operations. */
> -	raw_spin_lock(&rsp->onofflock);  /* irqs already disabled. */
> -
> -	/*
> -	 * Set the quiescent-state-needed bits in all the rcu_node
> -	 * structures for all currently online CPUs in breadth-first
> -	 * order, starting from the root rcu_node structure.  This
> -	 * operation relies on the layout of the hierarchy within the
> -	 * rsp->node[] array.  Note that other CPUs will access only
> -	 * the leaves of the hierarchy, which still indicate that no
> -	 * grace period is in progress, at least until the corresponding
> -	 * leaf node has been initialized.  In addition, we have excluded
> -	 * CPU-hotplug operations.
> -	 *
> -	 * Note that the grace period cannot complete until we finish
> -	 * the initialization process, as there will be at least one
> -	 * qsmask bit set in the root node until that time, namely the
> -	 * one corresponding to this CPU, due to the fact that we have
> -	 * irqs disabled.
> -	 */
> -	rcu_for_each_node_breadth_first(rsp, rnp) {
> -		raw_spin_lock(&rnp->lock);	/* irqs already disabled. */
> -		rcu_preempt_check_blocked_tasks(rnp);
> -		rnp->qsmask = rnp->qsmaskinit;
> -		rnp->gpnum = rsp->gpnum;
> -		rnp->completed = rsp->completed;
> -		if (rnp == rdp->mynode)
> -			rcu_start_gp_per_cpu(rsp, rnp, rdp);
> -		rcu_preempt_boost_start_gp(rnp);
> -		trace_rcu_grace_period_init(rsp->name, rnp->gpnum,
> -					    rnp->level, rnp->grplo,
> -					    rnp->grphi, rnp->qsmask);
> -		raw_spin_unlock(&rnp->lock);	/* irqs remain disabled. */
> -	}
> -
> -	rnp = rcu_get_root(rsp);
> -	raw_spin_lock(&rnp->lock);		/* irqs already disabled. */
> -	rsp->fqs_state = RCU_SIGNAL_INIT; /* force_quiescent_state now OK. */
> -	raw_spin_unlock(&rnp->lock);		/* irqs remain disabled. */
> -	raw_spin_unlock_irqrestore(&rsp->onofflock, flags);
> +	rsp->gp_flags = 1;
> +	raw_spin_unlock_irqrestore(&rnp->lock, flags);
> +	wake_up(&rsp->gp_wq);
>  }
>  
>  /*
> @@ -2627,6 +2667,28 @@ static int __cpuinit rcu_cpu_notify(struct notifier_block *self,
>  }
>  
>  /*
> + * Spawn the kthread that handles this RCU flavor's grace periods.
> + */
> +static int __init rcu_spawn_gp_kthread(void)
> +{
> +	unsigned long flags;
> +	struct rcu_node *rnp;
> +	struct rcu_state *rsp;
> +	struct task_struct *t;
> +
> +	for_each_rcu_flavor(rsp) {
> +		t = kthread_run(rcu_gp_kthread, rsp, rsp->name);
> +		BUG_ON(IS_ERR(t));
> +		rnp = rcu_get_root(rsp);
> +		raw_spin_lock_irqsave(&rnp->lock, flags);
> +		rsp->gp_kthread = t;
> +		raw_spin_unlock_irqrestore(&rnp->lock, flags);
> +	}
> +	return 0;
> +}
> +early_initcall(rcu_spawn_gp_kthread);
> +
> +/*
>   * This function is invoked towards the end of the scheduler's initialization
>   * process.  Before this is called, the idle task might contain
>   * RCU read-side critical sections (during which time, this idle
> @@ -2727,6 +2789,7 @@ static void __init rcu_init_one(struct rcu_state *rsp,
>  	}
>  
>  	rsp->rda = rda;
> +	init_waitqueue_head(&rsp->gp_wq);
>  	rnp = rsp->level[rcu_num_lvls - 1];
>  	for_each_possible_cpu(i) {
>  		while (i > rnp->grphi)
> diff --git a/kernel/rcutree.h b/kernel/rcutree.h
> index 4d29169..117a150 100644
> --- a/kernel/rcutree.h
> +++ b/kernel/rcutree.h
> @@ -385,6 +385,9 @@ struct rcu_state {
>  	u8	boost;				/* Subject to priority boost. */
>  	unsigned long gpnum;			/* Current gp number. */
>  	unsigned long completed;		/* # of last completed gp. */
> +	struct task_struct *gp_kthread;		/* Task for grace periods. */
> +	wait_queue_head_t gp_wq;		/* Where GP task waits. */
> +	int gp_flags;				/* Commands for GP task. */
>  
>  	/* End of fields guarded by root rcu_node's lock. */
>  
> -- 
> 1.7.8
> 
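
For readers following the handoff in this patch, here is a minimal
userspace sketch of the same pattern: the requester sets a flag under a
lock and wakes a dedicated thread, and that thread rechecks the flag
after every wakeup before clearing it and doing the work.  This is an
analogue only, not kernel code.  It assumes POSIX threads, with a mutex
and condition variable standing in for the root rcu_node ->lock and
->gp_wq.  The names gp_state, gp_worker, gp_request, and gp_demo, and
the have_worker, stop, and done members, are hypothetical and exist
only for this illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the fields this patch adds to struct rcu_state. */
struct gp_state {
	pthread_mutex_t lock;	/* plays the role of the root rcu_node ->lock */
	pthread_cond_t wq;	/* plays the role of ->gp_wq */
	pthread_cond_t done;	/* demo only: lets the requester see progress */
	int gp_flags;		/* nonzero: a grace period has been requested */
	bool have_worker;	/* plays the role of a non-NULL ->gp_kthread */
	bool stop;		/* demo only: the kernel thread never exits */
	unsigned long gpnum;	/* number of grace periods started */
};

/* Analogue of rcu_gp_kthread(): sleep until asked, then start one GP. */
static void *gp_worker(void *arg)
{
	struct gp_state *s = arg;

	pthread_mutex_lock(&s->lock);
	for (;;) {
		/* Recheck the predicate after every wakeup. */
		while (!s->gp_flags && !s->stop)
			pthread_cond_wait(&s->wq, &s->lock);
		if (s->stop)
			break;
		s->gp_flags = 0;
		s->gpnum++;	/* stands in for grace-period initialization */
		pthread_cond_broadcast(&s->done);
	}
	pthread_mutex_unlock(&s->lock);
	return NULL;
}

/* Analogue of the new rcu_start_gp(): record the request, wake the worker. */
bool gp_request(struct gp_state *s)
{
	pthread_mutex_lock(&s->lock);
	if (!s->have_worker) {
		/* Worker not spawned yet, so do not request anything. */
		pthread_mutex_unlock(&s->lock);
		return false;
	}
	s->gp_flags = 1;
	pthread_mutex_unlock(&s->lock);
	pthread_cond_signal(&s->wq);
	return true;
}

/* Issue n requests one at a time; return how many grace periods started. */
unsigned long gp_demo(unsigned long n)
{
	struct gp_state s = { .gp_flags = 0 };
	pthread_t t;
	unsigned long i, target, result;
	bool early;

	pthread_mutex_init(&s.lock, NULL);
	pthread_cond_init(&s.wq, NULL);
	pthread_cond_init(&s.done, NULL);

	early = gp_request(&s);	/* refused: no worker exists yet */
	assert(!early);
	(void)early;

	if (pthread_create(&t, NULL, gp_worker, &s) != 0)
		return 0;
	pthread_mutex_lock(&s.lock);
	s.have_worker = true;	/* published under the lock, as in the patch */
	pthread_mutex_unlock(&s.lock);

	for (i = 0; i < n; i++) {
		pthread_mutex_lock(&s.lock);
		target = s.gpnum + 1;
		pthread_mutex_unlock(&s.lock);
		gp_request(&s);
		pthread_mutex_lock(&s.lock);
		while (s.gpnum < target)
			pthread_cond_wait(&s.done, &s.lock);
		pthread_mutex_unlock(&s.lock);
	}

	pthread_mutex_lock(&s.lock);
	s.stop = true;
	pthread_mutex_unlock(&s.lock);
	pthread_cond_signal(&s.wq);
	pthread_join(t, NULL);

	result = s.gpnum;
	pthread_cond_destroy(&s.done);
	pthread_cond_destroy(&s.wq);
	pthread_mutex_destroy(&s.lock);
	return result;
}
```

Calling gp_demo(3) from a small main() and building with -pthread
should report three grace periods started.  Because the worker tests
the flag under the lock before sleeping, a request that arrives before
the worker first waits is not lost, which is the same property the
wait_event_interruptible() loop gives the kthread in the patch.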
