* [PATCH] sched: Don't call any kfree*() API in do_set_cpus_allowed()
@ 2023-10-31  0:14 Waiman Long
  2023-10-31  8:53 ` Peter Zijlstra
  0 siblings, 1 reply; 11+ messages in thread
From: Waiman Long @ 2023-10-31  0:14 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider
  Cc: linux-kernel, Phil Auld, kernel test robot, aubrey.li, yu.c.chen,
	Waiman Long

Commit 851a723e45d1 ("sched: Always clear user_cpus_ptr in
do_set_cpus_allowed()") added a kfree() call to free any user
provided affinity mask, if present. It was changed later to use
kfree_rcu() in commit 9a5418bc48ba ("sched/core: Use kfree_rcu()
in do_set_cpus_allowed()") to avoid a circular locking dependency
problem.

It turns out that even kfree_rcu() isn't safe for avoiding the
circular locking problem. As reported by the kernel test robot,
the following circular locking dependency still exists:

  &rdp->nocb_lock --> rcu_node_0 --> &rq->__lock

So no kfree*() API can be used in do_set_cpus_allowed(). To prevent
memory leakage, the unused user provided affinity mask is now saved in a
lockless list to be reused later by subsequent sched_setaffinity() calls.

Without kfree_rcu(), the internal cpumask_rcuhead union can be removed
too as a lockless list entry only holds a single pointer.
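
As an aside, purely illustrative (a userspace sketch with made-up names,
not kernel code): the reuse pattern amounts to pushing the object onto a
lock-free list instead of freeing it, and letting the next allocation pop
from that list before falling back to the allocator. Something like:

#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

struct free_node { struct free_node *next; };

static _Atomic(struct free_node *) free_head;

/* Analogue of llist_add(): safe against concurrent pushers. */
static void stash_instead_of_free(void *obj)
{
	struct free_node *node = obj;
	struct free_node *old = atomic_load(&free_head);

	do {
		node->next = old;
	} while (!atomic_compare_exchange_weak(&free_head, &old, node));
}

/*
 * Analogue of the alloc side: reuse a stashed object if one is available.
 * Kept to a single concurrent popper to keep the sketch simple.
 */
static void *alloc_with_reuse(size_t size)
{
	struct free_node *node = atomic_load(&free_head);

	while (node &&
	       !atomic_compare_exchange_weak(&free_head, &node, node->next))
		;

	return node ? (void *)node : malloc(size);
}

int main(void)
{
	void *mask = alloc_with_reuse(128);	/* stand-in for a cpumask */

	stash_instead_of_free(mask);		/* the "can't kfree() here" path */
	printf("reused: %s\n", alloc_with_reuse(128) == mask ? "yes" : "no");
	return 0;
}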

Fixes: 851a723e45d1 ("sched: Always clear user_cpus_ptr in do_set_cpus_allowed()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202310302207.a25f1a30-oliver.sang@intel.com
Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/sched/core.c | 31 ++++++++++++++++++-------------
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 802551e0009b..f536d11a284e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2789,6 +2789,11 @@ __do_set_cpus_allowed(struct task_struct *p, struct affinity_context *ctx)
 		set_next_task(rq, p);
 }
 
+/*
+ * A lockless list of free cpumasks to be used for user cpumasks.
+ */
+static LLIST_HEAD(cpumask_free_lhead);
+
 /*
  * Used for kthread_bind() and select_fallback_rq(), in both cases the user
  * affinity (if any) should be destroyed too.
@@ -2800,29 +2805,29 @@ void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
 		.user_mask = NULL,
 		.flags     = SCA_USER,	/* clear the user requested mask */
 	};
-	union cpumask_rcuhead {
-		cpumask_t cpumask;
-		struct rcu_head rcu;
-	};
 
 	__do_set_cpus_allowed(p, &ac);
 
 	/*
-	 * Because this is called with p->pi_lock held, it is not possible
-	 * to use kfree() here (when PREEMPT_RT=y), therefore punt to using
-	 * kfree_rcu().
+	 * We can't call any kfree*() API here as p->pi_lock and/or rq lock
+	 * may be held. So we save it in a llist to be reused in the next
+	 * sched_setaffinity() call.
 	 */
-	kfree_rcu((union cpumask_rcuhead *)ac.user_mask, rcu);
+	if (ac.user_mask)
+		llist_add((struct llist_node *)ac.user_mask, &cpumask_free_lhead);
 }
 
 static cpumask_t *alloc_user_cpus_ptr(int node)
 {
-	/*
-	 * See do_set_cpus_allowed() above for the rcu_head usage.
-	 */
-	int size = max_t(int, cpumask_size(), sizeof(struct rcu_head));
+	struct cpumask *pmask = NULL;
+
+	if (!llist_empty(&cpumask_free_lhead))
+		pmask = (struct cpumask *)llist_del_first(&cpumask_free_lhead);
+
+	if (!pmask)
+		pmask = kmalloc_node(cpumask_size(), GFP_KERNEL, node);
 
-	return kmalloc_node(size, GFP_KERNEL, node);
+	return pmask;
 }
 
 int dup_user_cpus_ptr(struct task_struct *dst, struct task_struct *src,
-- 
2.39.3



* Re: [PATCH] sched: Don't call any kfree*() API in do_set_cpus_allowed()
  2023-10-31  0:14 [PATCH] sched: Don't call any kfree*() API in do_set_cpus_allowed() Waiman Long
@ 2023-10-31  8:53 ` Peter Zijlstra
  2023-10-31 10:52   ` Frederic Weisbecker
  2023-10-31 14:29   ` Paul E. McKenney
  0 siblings, 2 replies; 11+ messages in thread
From: Peter Zijlstra @ 2023-10-31  8:53 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, linux-kernel,
	Phil Auld, kernel test robot, aubrey.li, yu.c.chen, paulmck,
	frederic, quic_neeraju, joel, josh, boqun.feng,
	mathieu.desnoyers, jiangshanlai, qiang.zhang1211

On Mon, Oct 30, 2023 at 08:14:18PM -0400, Waiman Long wrote:
> Commit 851a723e45d1 ("sched: Always clear user_cpus_ptr in
> do_set_cpus_allowed()") added a kfree() call to free any user
> provided affinity mask, if present. It was changed later to use
> kfree_rcu() in commit 9a5418bc48ba ("sched/core: Use kfree_rcu()
> in do_set_cpus_allowed()") to avoid a circular locking dependency
> problem.
> 
> It turns out that even kfree_rcu() isn't safe for avoiding
> circular locking problem. As reported by kernel test robot,
> the following circular locking dependency still exists:
> 
>   &rdp->nocb_lock --> rcu_node_0 --> &rq->__lock
> 
> So no kfree*() API can be used in do_set_cpus_allowed(). To prevent
> memory leakage, the unused user provided affinity mask is now saved in a
> lockless list to be reused later by subsequent sched_setaffinity() calls.
> 
> Without kfree_rcu(), the internal cpumask_rcuhead union can be removed
> too as a lockless list entry only holds a single pointer.
> 
> Fixes: 851a723e45d1 ("sched: Always clear user_cpus_ptr in do_set_cpus_allowed()")

Bah, or we fix RCU...  Paul, how insane is the below?

---
 kernel/rcu/tree.c | 31 +++++++++++++++++++++----------
 1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index cb1caefa8bd0..4b8e26a028ee 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -754,15 +754,20 @@ static int dyntick_save_progress_counter(struct rcu_data *rdp)
 }
 
 /*
- * Return true if the specified CPU has passed through a quiescent
- * state by virtue of being in or having passed through an dynticks
- * idle state since the last call to dyntick_save_progress_counter()
- * for this same CPU, or by virtue of having been offline.
+ * Returns positive if the specified CPU has passed through a quiescent state
+ * by virtue of being in or having passed through an dynticks idle state since
+ * the last call to dyntick_save_progress_counter() for this same CPU, or by
+ * virtue of having been offline.
+ *
+ * Returns negative if the specified CPU needs a force resched.
+ *
+ * Returns zero otherwise.
  */
 static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
 {
-	unsigned long jtsq;
 	struct rcu_node *rnp = rdp->mynode;
+	unsigned long jtsq;
+	int ret = 0;
 
 	/*
 	 * If the CPU passed through or entered a dynticks idle phase with
@@ -847,8 +852,8 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
 	    (time_after(jiffies, READ_ONCE(rdp->last_fqs_resched) + jtsq * 3) ||
 	     rcu_state.cbovld)) {
 		WRITE_ONCE(rdp->rcu_urgent_qs, true);
-		resched_cpu(rdp->cpu);
 		WRITE_ONCE(rdp->last_fqs_resched, jiffies);
+		ret = -1;
 	}
 
 	/*
@@ -891,7 +896,7 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
 		}
 	}
 
-	return 0;
+	return ret;
 }
 
 /* Trace-event wrapper function for trace_rcu_future_grace_period.  */
@@ -2255,11 +2260,11 @@ void rcu_sched_clock_irq(int user)
  */
 static void force_qs_rnp(int (*f)(struct rcu_data *rdp))
 {
-	int cpu;
+	unsigned long mask, rsmask = 0;
 	unsigned long flags;
-	unsigned long mask;
 	struct rcu_data *rdp;
 	struct rcu_node *rnp;
+	int cpu, ret;
 
 	rcu_state.cbovld = rcu_state.cbovldnext;
 	rcu_state.cbovldnext = false;
@@ -2284,10 +2289,13 @@ static void force_qs_rnp(int (*f)(struct rcu_data *rdp))
 		}
 		for_each_leaf_node_cpu_mask(rnp, cpu, rnp->qsmask) {
 			rdp = per_cpu_ptr(&rcu_data, cpu);
-			if (f(rdp)) {
+			ret = f(rdp);
+			if (ret > 0) {
 				mask |= rdp->grpmask;
 				rcu_disable_urgency_upon_qs(rdp);
 			}
+			if (ret < 0)
+				rsmask |= 1UL << (cpu - rnp->grplo);
 		}
 		if (mask != 0) {
 			/* Idle/offline CPUs, report (releases rnp->lock). */
@@ -2296,6 +2304,9 @@ static void force_qs_rnp(int (*f)(struct rcu_data *rdp))
 			/* Nothing to do here, so just drop the lock. */
 			raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 		}
+
+		for_each_leaf_node_cpu_mask(rnp, cpu, rsmask)
+			resched_cpu(cpu);
 	}
 }
 


* Re: [PATCH] sched: Don't call any kfree*() API in do_set_cpus_allowed()
  2023-10-31  8:53 ` Peter Zijlstra
@ 2023-10-31 10:52   ` Frederic Weisbecker
  2023-10-31 12:18     ` Peter Zijlstra
  2023-10-31 14:29   ` Paul E. McKenney
  1 sibling, 1 reply; 11+ messages in thread
From: Frederic Weisbecker @ 2023-10-31 10:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Waiman Long, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, linux-kernel,
	Phil Auld, kernel test robot, aubrey.li, yu.c.chen, paulmck,
	quic_neeraju, joel, josh, boqun.feng, mathieu.desnoyers,
	jiangshanlai, qiang.zhang1211

On Tue, Oct 31, 2023 at 09:53:08AM +0100, Peter Zijlstra wrote:
> On Mon, Oct 30, 2023 at 08:14:18PM -0400, Waiman Long wrote:
> > Commit 851a723e45d1 ("sched: Always clear user_cpus_ptr in
> > do_set_cpus_allowed()") added a kfree() call to free any user
> > provided affinity mask, if present. It was changed later to use
> > kfree_rcu() in commit 9a5418bc48ba ("sched/core: Use kfree_rcu()
> > in do_set_cpus_allowed()") to avoid a circular locking dependency
> > problem.
> > 
> > It turns out that even kfree_rcu() isn't safe for avoiding
> > circular locking problem. As reported by kernel test robot,
> > the following circular locking dependency still exists:
> > 
> >   &rdp->nocb_lock --> rcu_node_0 --> &rq->__lock
> > 
> > So no kfree*() API can be used in do_set_cpus_allowed(). To prevent
> > memory leakage, the unused user provided affinity mask is now saved in a
> > lockless list to be reused later by subsequent sched_setaffinity() calls.
> > 
> > Without kfree_rcu(), the internal cpumask_rcuhead union can be removed
> > too as a lockless list entry only holds a single pointer.
> > 
> > Fixes: 851a723e45d1 ("sched: Always clear user_cpus_ptr in do_set_cpus_allowed()")
> 
> Bah, or we fix RCU...  Paul, how insane is the below?

Makes sense. We can't remove &rdp->nocb_lock --> rcu_node_0 but we can (and
should) indeed remove rcu_node_0 --> &rq->__lock

Just a detail below:

> @@ -2284,10 +2289,13 @@ static void force_qs_rnp(int (*f)(struct rcu_data *rdp))
>  		}
>  		for_each_leaf_node_cpu_mask(rnp, cpu, rnp->qsmask) {
>  			rdp = per_cpu_ptr(&rcu_data, cpu);
> -			if (f(rdp)) {
> +			ret = f(rdp);
> +			if (ret > 0) {
>  				mask |= rdp->grpmask;
>  				rcu_disable_urgency_upon_qs(rdp);
>  			}
> +			if (ret < 0)
> +				rsmask |= 1UL << (cpu - rnp->grplo);

I guess this can be simplified with rsmask |= rdp->grpmask;

Thanks.

>  		}
>  		if (mask != 0) {
>  			/* Idle/offline CPUs, report (releases rnp->lock). */
> @@ -2296,6 +2304,9 @@ static void force_qs_rnp(int (*f)(struct rcu_data *rdp))
>  			/* Nothing to do here, so just drop the lock. */
>  			raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
>  		}
> +
> +		for_each_leaf_node_cpu_mask(rnp, cpu, rsmask)
> +			resched_cpu(cpu);
>  	}
>  }
>  


* Re: [PATCH] sched: Don't call any kfree*() API in do_set_cpus_allowed()
  2023-10-31 10:52   ` Frederic Weisbecker
@ 2023-10-31 12:18     ` Peter Zijlstra
  0 siblings, 0 replies; 11+ messages in thread
From: Peter Zijlstra @ 2023-10-31 12:18 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Waiman Long, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, linux-kernel,
	Phil Auld, kernel test robot, aubrey.li, yu.c.chen, paulmck,
	quic_neeraju, joel, josh, boqun.feng, mathieu.desnoyers,
	jiangshanlai, qiang.zhang1211

On Tue, Oct 31, 2023 at 11:52:38AM +0100, Frederic Weisbecker wrote:
> On Tue, Oct 31, 2023 at 09:53:08AM +0100, Peter Zijlstra wrote:
> > On Mon, Oct 30, 2023 at 08:14:18PM -0400, Waiman Long wrote:
> > > Commit 851a723e45d1 ("sched: Always clear user_cpus_ptr in
> > > do_set_cpus_allowed()") added a kfree() call to free any user
> > > provided affinity mask, if present. It was changed later to use
> > > kfree_rcu() in commit 9a5418bc48ba ("sched/core: Use kfree_rcu()
> > > in do_set_cpus_allowed()") to avoid a circular locking dependency
> > > problem.
> > > 
> > > It turns out that even kfree_rcu() isn't safe for avoiding
> > > circular locking problem. As reported by kernel test robot,
> > > the following circular locking dependency still exists:
> > > 
> > >   &rdp->nocb_lock --> rcu_node_0 --> &rq->__lock
> > > 
> > > So no kfree*() API can be used in do_set_cpus_allowed(). To prevent
> > > memory leakage, the unused user provided affinity mask is now saved in a
> > > lockless list to be reused later by subsequent sched_setaffinity() calls.
> > > 
> > > Without kfree_rcu(), the internal cpumask_rcuhead union can be removed
> > > too as a lockless list entry only holds a single pointer.
> > > 
> > > Fixes: 851a723e45d1 ("sched: Always clear user_cpus_ptr in do_set_cpus_allowed()")
> > 
> > Bah, or we fix RCU...  Paul, how insane is the below?
> 
> Makes sense. We can't remove &rdp->nocb_lock --> rcu_node_0 but we can (and
> should) indeed remove rcu_node_0 --> &rq->__lock
> 
> Just a detail below:
> 
> > @@ -2284,10 +2289,13 @@ static void force_qs_rnp(int (*f)(struct rcu_data *rdp))
> >  		}
> >  		for_each_leaf_node_cpu_mask(rnp, cpu, rnp->qsmask) {
> >  			rdp = per_cpu_ptr(&rcu_data, cpu);
> > -			if (f(rdp)) {
> > +			ret = f(rdp);
> > +			if (ret > 0) {
> >  				mask |= rdp->grpmask;
> >  				rcu_disable_urgency_upon_qs(rdp);
> >  			}
> > +			if (ret < 0)
> > +				rsmask |= 1UL << (cpu - rnp->grplo);
> 
> I guess this can be simplified with rsmask |= rdp->grpmask;

Ah, I wasn't sure that was actually the same. I looked for a minute and
gave up :-)
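
FWIW, if memory serves (worth double-checking against the tree), the
per-CPU bit is set up once in rcu_boot_init_percpu_data() as:

	rdp->grpmask = leaf_node_cpu_bit(rdp->mynode, cpu);
	/* leaf_node_cpu_bit(rnp, cpu) is 1UL << ((cpu) - (rnp)->grplo) */

so the two forms should indeed be the same.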


* Re: [PATCH] sched: Don't call any kfree*() API in do_set_cpus_allowed()
  2023-10-31  8:53 ` Peter Zijlstra
  2023-10-31 10:52   ` Frederic Weisbecker
@ 2023-10-31 14:29   ` Paul E. McKenney
  2023-10-31 20:02     ` [PATCH] rcu: Break rcu_node_0 --> &rq->__lock order Peter Zijlstra
  1 sibling, 1 reply; 11+ messages in thread
From: Paul E. McKenney @ 2023-10-31 14:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Waiman Long, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, linux-kernel,
	Phil Auld, kernel test robot, aubrey.li, yu.c.chen, frederic,
	quic_neeraju, joel, josh, boqun.feng, mathieu.desnoyers,
	jiangshanlai, qiang.zhang1211

On Tue, Oct 31, 2023 at 09:53:08AM +0100, Peter Zijlstra wrote:
> On Mon, Oct 30, 2023 at 08:14:18PM -0400, Waiman Long wrote:
> > Commit 851a723e45d1 ("sched: Always clear user_cpus_ptr in
> > do_set_cpus_allowed()") added a kfree() call to free any user
> > provided affinity mask, if present. It was changed later to use
> > kfree_rcu() in commit 9a5418bc48ba ("sched/core: Use kfree_rcu()
> > in do_set_cpus_allowed()") to avoid a circular locking dependency
> > problem.
> > 
> > It turns out that even kfree_rcu() isn't safe for avoiding
> > circular locking problem. As reported by kernel test robot,
> > the following circular locking dependency still exists:
> > 
> >   &rdp->nocb_lock --> rcu_node_0 --> &rq->__lock
> > 
> > So no kfree*() API can be used in do_set_cpus_allowed(). To prevent
> > memory leakage, the unused user provided affinity mask is now saved in a
> > lockless list to be reused later by subsequent sched_setaffinity() calls.
> > 
> > Without kfree_rcu(), the internal cpumask_rcuhead union can be removed
> > too as a lockless list entry only holds a single pointer.
> > 
> > Fixes: 851a723e45d1 ("sched: Always clear user_cpus_ptr in do_set_cpus_allowed()")
> 
> Bah, or we fix RCU...  Paul, how insane is the below?

Other than the de-alphabetization of the local variables, it looks
plausible to me.  Frederic's suggestion also sounds plausible to me.

							Thanx, Paul

> ---
>  kernel/rcu/tree.c | 31 +++++++++++++++++++++----------
>  1 file changed, 21 insertions(+), 10 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index cb1caefa8bd0..4b8e26a028ee 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -754,15 +754,20 @@ static int dyntick_save_progress_counter(struct rcu_data *rdp)
>  }
>  
>  /*
> - * Return true if the specified CPU has passed through a quiescent
> - * state by virtue of being in or having passed through an dynticks
> - * idle state since the last call to dyntick_save_progress_counter()
> - * for this same CPU, or by virtue of having been offline.
> + * Returns positive if the specified CPU has passed through a quiescent state
> + * by virtue of being in or having passed through an dynticks idle state since
> + * the last call to dyntick_save_progress_counter() for this same CPU, or by
> + * virtue of having been offline.
> + *
> + * Returns negative if the specified CPU needs a force resched.
> + *
> + * Returns zero otherwise.
>   */
>  static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
>  {
> -	unsigned long jtsq;
>  	struct rcu_node *rnp = rdp->mynode;
> +	unsigned long jtsq;
> +	int ret = 0;
>  
>  	/*
>  	 * If the CPU passed through or entered a dynticks idle phase with
> @@ -847,8 +852,8 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
>  	    (time_after(jiffies, READ_ONCE(rdp->last_fqs_resched) + jtsq * 3) ||
>  	     rcu_state.cbovld)) {
>  		WRITE_ONCE(rdp->rcu_urgent_qs, true);
> -		resched_cpu(rdp->cpu);
>  		WRITE_ONCE(rdp->last_fqs_resched, jiffies);
> +		ret = -1;
>  	}
>  
>  	/*
> @@ -891,7 +896,7 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
>  		}
>  	}
>  
> -	return 0;
> +	return ret;
>  }
>  
>  /* Trace-event wrapper function for trace_rcu_future_grace_period.  */
> @@ -2255,11 +2260,11 @@ void rcu_sched_clock_irq(int user)
>   */
>  static void force_qs_rnp(int (*f)(struct rcu_data *rdp))
>  {
> -	int cpu;
> +	unsigned long mask, rsmask = 0;
>  	unsigned long flags;
> -	unsigned long mask;
>  	struct rcu_data *rdp;
>  	struct rcu_node *rnp;
> +	int cpu, ret;
>  
>  	rcu_state.cbovld = rcu_state.cbovldnext;
>  	rcu_state.cbovldnext = false;
> @@ -2284,10 +2289,13 @@ static void force_qs_rnp(int (*f)(struct rcu_data *rdp))
>  		}
>  		for_each_leaf_node_cpu_mask(rnp, cpu, rnp->qsmask) {
>  			rdp = per_cpu_ptr(&rcu_data, cpu);
> -			if (f(rdp)) {
> +			ret = f(rdp);
> +			if (ret > 0) {
>  				mask |= rdp->grpmask;
>  				rcu_disable_urgency_upon_qs(rdp);
>  			}
> +			if (ret < 0)
> +				rsmask |= 1UL << (cpu - rnp->grplo);
>  		}
>  		if (mask != 0) {
>  			/* Idle/offline CPUs, report (releases rnp->lock). */
> @@ -2296,6 +2304,9 @@ static void force_qs_rnp(int (*f)(struct rcu_data *rdp))
>  			/* Nothing to do here, so just drop the lock. */
>  			raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
>  		}
> +
> +		for_each_leaf_node_cpu_mask(rnp, cpu, rsmask)
> +			resched_cpu(cpu);
>  	}
>  }
>  


* [PATCH] rcu: Break rcu_node_0 --> &rq->__lock order
  2023-10-31 14:29   ` Paul E. McKenney
@ 2023-10-31 20:02     ` Peter Zijlstra
  2023-10-31 21:49       ` Paul E. McKenney
                         ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Peter Zijlstra @ 2023-10-31 20:02 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Waiman Long, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, linux-kernel,
	Phil Auld, kernel test robot, aubrey.li, yu.c.chen, frederic,
	quic_neeraju, joel, josh, boqun.feng, mathieu.desnoyers,
	jiangshanlai, qiang.zhang1211

On Tue, Oct 31, 2023 at 07:29:04AM -0700, Paul E. McKenney wrote:
> Other than the de-alphabetization of the local variables, it looks
> plausible to me.  Frederic's suggestion also sounds plausible to me.

Having spent the better part of the past two decades using upside down
xmas trees for local variables, this alphabet thing is obnoxious :-)

But your code, your rules.

To reduce the number of alphabet songs required, I've taken the liberty
to move a few variables into a narrower scope, hope that doesn't offend.

---
Subject: rcu: Break rcu_node_0 --> &rq->__lock order
From: Peter Zijlstra <peterz@infradead.org>
Date: Tue, 31 Oct 2023 09:53:08 +0100

Commit 851a723e45d1 ("sched: Always clear user_cpus_ptr in
do_set_cpus_allowed()") added a kfree() call to free any user
provided affinity mask, if present. It was changed later to use
kfree_rcu() in commit 9a5418bc48ba ("sched/core: Use kfree_rcu()
in do_set_cpus_allowed()") to avoid a circular locking dependency
problem.

It turns out that even kfree_rcu() isn't safe for avoiding the
circular locking problem. As reported by the kernel test robot,
the following circular locking dependency now exists:

  &rdp->nocb_lock --> rcu_node_0 --> &rq->__lock

Solve this by breaking the rcu_node_0 --> &rq->__lock chain by moving
the resched_cpu() out from under the rcu_node lock.
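
In generic terms (a userspace sketch with made-up names, not kernel code),
the pattern is: while holding lock A, only record which entries need an
action that would have to take lock B, and perform that action after A has
been dropped, so an A --> B ordering never enters the lock graph:

#include <pthread.h>
#include <stdio.h>

#define NR_ITEMS 8

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;	/* think: rcu_node lock */
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;	/* think: rq->__lock */
static int needs_kick[NR_ITEMS];				/* protected by lock_a */

/* Takes lock B, as resched_cpu() ends up taking the runqueue lock. */
static void kick(int i)
{
	pthread_mutex_lock(&lock_b);
	printf("kicking %d\n", i);
	pthread_mutex_unlock(&lock_b);
}

static void scan_and_kick(void)
{
	unsigned long pending = 0;
	int i;

	pthread_mutex_lock(&lock_a);
	for (i = 0; i < NR_ITEMS; i++)
		if (needs_kick[i])
			pending |= 1UL << i;	/* record only; no kick() under lock_a */
	pthread_mutex_unlock(&lock_a);

	for (i = 0; i < NR_ITEMS; i++)		/* act only after lock_a is dropped */
		if (pending & (1UL << i))
			kick(i);
}

int main(void)
{
	needs_kick[3] = 1;
	scan_and_kick();
	return 0;
}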

[peterz: heavily borrowed from Waiman's Changelog]
Fixes: 851a723e45d1 ("sched: Always clear user_cpus_ptr in do_set_cpus_allowed()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/oe-lkp/202310302207.a25f1a30-oliver.sang@intel.com
---
 kernel/rcu/tree.c |   34 ++++++++++++++++++++++++----------
 1 file changed, 24 insertions(+), 10 deletions(-)

--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -754,14 +754,19 @@ static int dyntick_save_progress_counter
 }
 
 /*
- * Return true if the specified CPU has passed through a quiescent
- * state by virtue of being in or having passed through an dynticks
- * idle state since the last call to dyntick_save_progress_counter()
- * for this same CPU, or by virtue of having been offline.
+ * Returns positive if the specified CPU has passed through a quiescent state
+ * by virtue of being in or having passed through an dynticks idle state since
+ * the last call to dyntick_save_progress_counter() for this same CPU, or by
+ * virtue of having been offline.
+ *
+ * Returns negative if the specified CPU needs a force resched.
+ *
+ * Returns zero otherwise.
  */
 static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
 {
 	unsigned long jtsq;
+	int ret = 0;
 	struct rcu_node *rnp = rdp->mynode;
 
 	/*
@@ -847,8 +852,8 @@ static int rcu_implicit_dynticks_qs(stru
 	    (time_after(jiffies, READ_ONCE(rdp->last_fqs_resched) + jtsq * 3) ||
 	     rcu_state.cbovld)) {
 		WRITE_ONCE(rdp->rcu_urgent_qs, true);
-		resched_cpu(rdp->cpu);
 		WRITE_ONCE(rdp->last_fqs_resched, jiffies);
+		ret = -1;
 	}
 
 	/*
@@ -891,7 +896,7 @@ static int rcu_implicit_dynticks_qs(stru
 		}
 	}
 
-	return 0;
+	return ret;
 }
 
 /* Trace-event wrapper function for trace_rcu_future_grace_period.  */
@@ -2257,15 +2262,15 @@ static void force_qs_rnp(int (*f)(struct
 {
 	int cpu;
 	unsigned long flags;
-	unsigned long mask;
-	struct rcu_data *rdp;
 	struct rcu_node *rnp;
 
 	rcu_state.cbovld = rcu_state.cbovldnext;
 	rcu_state.cbovldnext = false;
 	rcu_for_each_leaf_node(rnp) {
+		unsigned long mask = 0;
+		unsigned long rsmask = 0;
+
 		cond_resched_tasks_rcu_qs();
-		mask = 0;
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 		rcu_state.cbovldnext |= !!rnp->cbovldmask;
 		if (rnp->qsmask == 0) {
@@ -2283,11 +2288,17 @@ static void force_qs_rnp(int (*f)(struct
 			continue;
 		}
 		for_each_leaf_node_cpu_mask(rnp, cpu, rnp->qsmask) {
+			struct rcu_data *rdp;
+			int ret;
+
 			rdp = per_cpu_ptr(&rcu_data, cpu);
-			if (f(rdp)) {
+			ret = f(rdp);
+			if (ret > 0) {
 				mask |= rdp->grpmask;
 				rcu_disable_urgency_upon_qs(rdp);
 			}
+			if (ret < 0)
+				rsmask |= rdp->grpmask;
 		}
 		if (mask != 0) {
 			/* Idle/offline CPUs, report (releases rnp->lock). */
@@ -2296,6 +2307,9 @@ static void force_qs_rnp(int (*f)(struct
 			/* Nothing to do here, so just drop the lock. */
 			raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 		}
+
+		for_each_leaf_node_cpu_mask(rnp, cpu, rsmask)
+			resched_cpu(cpu);
 	}
 }
 


* Re: [PATCH] rcu: Break rcu_node_0 --> &rq->__lock order
  2023-10-31 20:02     ` [PATCH] rcu: Break rcu_node_0 --> &rq->__lock order Peter Zijlstra
@ 2023-10-31 21:49       ` Paul E. McKenney
  2023-11-01  0:07       ` Waiman Long
  2023-11-01 11:21       ` Z qiang
  2 siblings, 0 replies; 11+ messages in thread
From: Paul E. McKenney @ 2023-10-31 21:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Waiman Long, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, linux-kernel,
	Phil Auld, kernel test robot, aubrey.li, yu.c.chen, frederic,
	quic_neeraju, joel, josh, boqun.feng, mathieu.desnoyers,
	jiangshanlai, qiang.zhang1211

On Tue, Oct 31, 2023 at 09:02:28PM +0100, Peter Zijlstra wrote:
> On Tue, Oct 31, 2023 at 07:29:04AM -0700, Paul E. McKenney wrote:
> > Other than the de-alphabetization of the local variables, it looks
> > plausible to me.  Frederic's suggestion also sounds plausible to me.
> 
> Having spend the better part of the past two decades using upside down
> xmas trees for local variables, this alphabet thing is obnoxious :-)
> 
> But your code, your rules.
> 
> To reduce the number of alphabet songs required, I've taken the liberty
> to move a few variables into a narrower scope, hope that doesn't offend.

I have no problem with pushing local variables to local scopes!  ;-)

> ---
> Subject: rcu: Break rcu_node_0 --> &rq->__lock order
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Tue, 31 Oct 2023 09:53:08 +0100
> 
> Commit 851a723e45d1 ("sched: Always clear user_cpus_ptr in
> do_set_cpus_allowed()") added a kfree() call to free any user
> provided affinity mask, if present. It was changed later to use
> kfree_rcu() in commit 9a5418bc48ba ("sched/core: Use kfree_rcu()
> in do_set_cpus_allowed()") to avoid a circular locking dependency
> problem.
> 
> It turns out that even kfree_rcu() isn't safe for avoiding
> circular locking problem. As reported by kernel test robot,
> the following circular locking dependency now exists:
> 
>   &rdp->nocb_lock --> rcu_node_0 --> &rq->__lock
> 
> Solve this by breaking the rcu_node_0 --> &rq->__lock chain by moving
> the resched_cpu() out from under rcu_node lock.
> 
> [peterz: heavily borrowed from Waiman's Changelog]
> Fixes: 851a723e45d1 ("sched: Always clear user_cpus_ptr in do_set_cpus_allowed()")
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Link: https://lore.kernel.org/oe-lkp/202310302207.a25f1a30-oliver.sang@intel.com

This passes light testing, so I have queued it for further review and
testing.

							Thanx, Paul

> ---
>  kernel/rcu/tree.c |   34 ++++++++++++++++++++++++----------
>  1 file changed, 24 insertions(+), 10 deletions(-)
> 
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -754,14 +754,19 @@ static int dyntick_save_progress_counter
>  }
>  
>  /*
> - * Return true if the specified CPU has passed through a quiescent
> - * state by virtue of being in or having passed through an dynticks
> - * idle state since the last call to dyntick_save_progress_counter()
> - * for this same CPU, or by virtue of having been offline.
> + * Returns positive if the specified CPU has passed through a quiescent state
> + * by virtue of being in or having passed through an dynticks idle state since
> + * the last call to dyntick_save_progress_counter() for this same CPU, or by
> + * virtue of having been offline.
> + *
> + * Returns negative if the specified CPU needs a force resched.
> + *
> + * Returns zero otherwise.
>   */
>  static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
>  {
>  	unsigned long jtsq;
> +	int ret = 0;
>  	struct rcu_node *rnp = rdp->mynode;
>  
>  	/*
> @@ -847,8 +852,8 @@ static int rcu_implicit_dynticks_qs(stru
>  	    (time_after(jiffies, READ_ONCE(rdp->last_fqs_resched) + jtsq * 3) ||
>  	     rcu_state.cbovld)) {
>  		WRITE_ONCE(rdp->rcu_urgent_qs, true);
> -		resched_cpu(rdp->cpu);
>  		WRITE_ONCE(rdp->last_fqs_resched, jiffies);
> +		ret = -1;
>  	}
>  
>  	/*
> @@ -891,7 +896,7 @@ static int rcu_implicit_dynticks_qs(stru
>  		}
>  	}
>  
> -	return 0;
> +	return ret;
>  }
>  
>  /* Trace-event wrapper function for trace_rcu_future_grace_period.  */
> @@ -2257,15 +2262,15 @@ static void force_qs_rnp(int (*f)(struct
>  {
>  	int cpu;
>  	unsigned long flags;
> -	unsigned long mask;
> -	struct rcu_data *rdp;
>  	struct rcu_node *rnp;
>  
>  	rcu_state.cbovld = rcu_state.cbovldnext;
>  	rcu_state.cbovldnext = false;
>  	rcu_for_each_leaf_node(rnp) {
> +		unsigned long mask = 0;
> +		unsigned long rsmask = 0;
> +
>  		cond_resched_tasks_rcu_qs();
> -		mask = 0;
>  		raw_spin_lock_irqsave_rcu_node(rnp, flags);
>  		rcu_state.cbovldnext |= !!rnp->cbovldmask;
>  		if (rnp->qsmask == 0) {
> @@ -2283,11 +2288,17 @@ static void force_qs_rnp(int (*f)(struct
>  			continue;
>  		}
>  		for_each_leaf_node_cpu_mask(rnp, cpu, rnp->qsmask) {
> +			struct rcu_data *rdp;
> +			int ret;
> +
>  			rdp = per_cpu_ptr(&rcu_data, cpu);
> -			if (f(rdp)) {
> +			ret = f(rdp);
> +			if (ret > 0) {
>  				mask |= rdp->grpmask;
>  				rcu_disable_urgency_upon_qs(rdp);
>  			}
> +			if (ret < 0)
> +				rsmask |= rdp->grpmask;
>  		}
>  		if (mask != 0) {
>  			/* Idle/offline CPUs, report (releases rnp->lock). */
> @@ -2296,6 +2307,9 @@ static void force_qs_rnp(int (*f)(struct
>  			/* Nothing to do here, so just drop the lock. */
>  			raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
>  		}
> +
> +		for_each_leaf_node_cpu_mask(rnp, cpu, rsmask)
> +			resched_cpu(cpu);
>  	}
>  }
>  


* Re: [PATCH] rcu: Break rcu_node_0 --> &rq->__lock order
  2023-10-31 20:02     ` [PATCH] rcu: Break rcu_node_0 --> &rq->__lock order Peter Zijlstra
  2023-10-31 21:49       ` Paul E. McKenney
@ 2023-11-01  0:07       ` Waiman Long
  2023-11-01 11:21       ` Z qiang
  2 siblings, 0 replies; 11+ messages in thread
From: Waiman Long @ 2023-11-01  0:07 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E. McKenney
  Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, linux-kernel,
	Phil Auld, kernel test robot, aubrey.li, yu.c.chen, frederic,
	quic_neeraju, joel, josh, boqun.feng, mathieu.desnoyers,
	jiangshanlai, qiang.zhang1211

On 10/31/23 16:02, Peter Zijlstra wrote:
> On Tue, Oct 31, 2023 at 07:29:04AM -0700, Paul E. McKenney wrote:
>> Other than the de-alphabetization of the local variables, it looks
>> plausible to me.  Frederic's suggestion also sounds plausible to me.
> Having spend the better part of the past two decades using upside down
> xmas trees for local variables, this alphabet thing is obnoxious :-)
>
> But your code, your rules.
>
> To reduce the number of alphabet songs required, I've taken the liberty
> to move a few variables into a narrower scope, hope that doesn't offend.
>
> ---
> Subject: rcu: Break rcu_node_0 --> &rq->__lock order
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Tue, 31 Oct 2023 09:53:08 +0100
>
> Commit 851a723e45d1 ("sched: Always clear user_cpus_ptr in
> do_set_cpus_allowed()") added a kfree() call to free any user
> provided affinity mask, if present. It was changed later to use
> kfree_rcu() in commit 9a5418bc48ba ("sched/core: Use kfree_rcu()
> in do_set_cpus_allowed()") to avoid a circular locking dependency
> problem.
>
> It turns out that even kfree_rcu() isn't safe for avoiding
> circular locking problem. As reported by kernel test robot,
> the following circular locking dependency now exists:
>
>    &rdp->nocb_lock --> rcu_node_0 --> &rq->__lock
>
> Solve this by breaking the rcu_node_0 --> &rq->__lock chain by moving
> the resched_cpu() out from under rcu_node lock.
>
> [peterz: heavily borrowed from Waiman's Changelog]
> Fixes: 851a723e45d1 ("sched: Always clear user_cpus_ptr in do_set_cpus_allowed()")
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Link: https://lore.kernel.org/oe-lkp/202310302207.a25f1a30-oliver.sang@intel.com

Thanks for addressing this issue. I am fine with your way as long as it
gets the job done. I am not familiar enough with the RCU code to do a
proper review, but I do get the general idea of your change and it looks
good to me.

Acked-by: Waiman Long <longman@redhat.com>

> ---
>   kernel/rcu/tree.c |   34 ++++++++++++++++++++++++----------
>   1 file changed, 24 insertions(+), 10 deletions(-)
>
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -754,14 +754,19 @@ static int dyntick_save_progress_counter
>   }
>   
>   /*
> - * Return true if the specified CPU has passed through a quiescent
> - * state by virtue of being in or having passed through an dynticks
> - * idle state since the last call to dyntick_save_progress_counter()
> - * for this same CPU, or by virtue of having been offline.
> + * Returns positive if the specified CPU has passed through a quiescent state
> + * by virtue of being in or having passed through an dynticks idle state since
> + * the last call to dyntick_save_progress_counter() for this same CPU, or by
> + * virtue of having been offline.
> + *
> + * Returns negative if the specified CPU needs a force resched.
> + *
> + * Returns zero otherwise.
>    */
>   static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
>   {
>   	unsigned long jtsq;
> +	int ret = 0;
>   	struct rcu_node *rnp = rdp->mynode;
>   
>   	/*
> @@ -847,8 +852,8 @@ static int rcu_implicit_dynticks_qs(stru
>   	    (time_after(jiffies, READ_ONCE(rdp->last_fqs_resched) + jtsq * 3) ||
>   	     rcu_state.cbovld)) {
>   		WRITE_ONCE(rdp->rcu_urgent_qs, true);
> -		resched_cpu(rdp->cpu);
>   		WRITE_ONCE(rdp->last_fqs_resched, jiffies);
> +		ret = -1;
>   	}
>   
>   	/*
> @@ -891,7 +896,7 @@ static int rcu_implicit_dynticks_qs(stru
>   		}
>   	}
>   
> -	return 0;
> +	return ret;
>   }
>   
>   /* Trace-event wrapper function for trace_rcu_future_grace_period.  */
> @@ -2257,15 +2262,15 @@ static void force_qs_rnp(int (*f)(struct
>   {
>   	int cpu;
>   	unsigned long flags;
> -	unsigned long mask;
> -	struct rcu_data *rdp;
>   	struct rcu_node *rnp;
>   
>   	rcu_state.cbovld = rcu_state.cbovldnext;
>   	rcu_state.cbovldnext = false;
>   	rcu_for_each_leaf_node(rnp) {
> +		unsigned long mask = 0;
> +		unsigned long rsmask = 0;
> +
>   		cond_resched_tasks_rcu_qs();
> -		mask = 0;
>   		raw_spin_lock_irqsave_rcu_node(rnp, flags);
>   		rcu_state.cbovldnext |= !!rnp->cbovldmask;
>   		if (rnp->qsmask == 0) {
> @@ -2283,11 +2288,17 @@ static void force_qs_rnp(int (*f)(struct
>   			continue;
>   		}
>   		for_each_leaf_node_cpu_mask(rnp, cpu, rnp->qsmask) {
> +			struct rcu_data *rdp;
> +			int ret;
> +
>   			rdp = per_cpu_ptr(&rcu_data, cpu);
> -			if (f(rdp)) {
> +			ret = f(rdp);
> +			if (ret > 0) {
>   				mask |= rdp->grpmask;
>   				rcu_disable_urgency_upon_qs(rdp);
>   			}
> +			if (ret < 0)
> +				rsmask |= rdp->grpmask;
>   		}
>   		if (mask != 0) {
>   			/* Idle/offline CPUs, report (releases rnp->lock). */
> @@ -2296,6 +2307,9 @@ static void force_qs_rnp(int (*f)(struct
>   			/* Nothing to do here, so just drop the lock. */
>   			raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
>   		}
> +
> +		for_each_leaf_node_cpu_mask(rnp, cpu, rsmask)
> +			resched_cpu(cpu);
>   	}
>   }
>   
>



* Re: [PATCH] rcu: Break rcu_node_0 --> &rq->__lock order
  2023-10-31 20:02     ` [PATCH] rcu: Break rcu_node_0 --> &rq->__lock order Peter Zijlstra
  2023-10-31 21:49       ` Paul E. McKenney
  2023-11-01  0:07       ` Waiman Long
@ 2023-11-01 11:21       ` Z qiang
  2023-11-01 14:38         ` Peter Zijlstra
  2 siblings, 1 reply; 11+ messages in thread
From: Z qiang @ 2023-11-01 11:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul E. McKenney, Waiman Long, Ingo Molnar, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
	linux-kernel, Phil Auld, kernel test robot, aubrey.li, yu.c.chen,
	frederic, quic_neeraju, joel, josh, boqun.feng,
	mathieu.desnoyers, jiangshanlai

>
> On Tue, Oct 31, 2023 at 07:29:04AM -0700, Paul E. McKenney wrote:
> > Other than the de-alphabetization of the local variables, it looks
> > plausible to me.  Frederic's suggestion also sounds plausible to me.
>
> Having spend the better part of the past two decades using upside down
> xmas trees for local variables, this alphabet thing is obnoxious :-)
>
> But your code, your rules.
>
> To reduce the number of alphabet songs required, I've taken the liberty
> to move a few variables into a narrower scope, hope that doesn't offend.
>
> ---
> Subject: rcu: Break rcu_node_0 --> &rq->__lock order
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Tue, 31 Oct 2023 09:53:08 +0100
>
> Commit 851a723e45d1 ("sched: Always clear user_cpus_ptr in
> do_set_cpus_allowed()") added a kfree() call to free any user
> provided affinity mask, if present. It was changed later to use
> kfree_rcu() in commit 9a5418bc48ba ("sched/core: Use kfree_rcu()
> in do_set_cpus_allowed()") to avoid a circular locking dependency
> problem.
>
> It turns out that even kfree_rcu() isn't safe for avoiding
> circular locking problem. As reported by kernel test robot,
> the following circular locking dependency now exists:
>
>   &rdp->nocb_lock --> rcu_node_0 --> &rq->__lock
>
> Solve this by breaking the rcu_node_0 --> &rq->__lock chain by moving
> the resched_cpu() out from under rcu_node lock.
>
> [peterz: heavily borrowed from Waiman's Changelog]
> Fixes: 851a723e45d1 ("sched: Always clear user_cpus_ptr in do_set_cpus_allowed()")
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Link: https://lore.kernel.org/oe-lkp/202310302207.a25f1a30-oliver.sang@intel.com
> ---
>  kernel/rcu/tree.c |   34 ++++++++++++++++++++++++----------
>  1 file changed, 24 insertions(+), 10 deletions(-)
>
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -754,14 +754,19 @@ static int dyntick_save_progress_counter
>  }
>
>  /*
> - * Return true if the specified CPU has passed through a quiescent
> - * state by virtue of being in or having passed through an dynticks
> - * idle state since the last call to dyntick_save_progress_counter()
> - * for this same CPU, or by virtue of having been offline.
> + * Returns positive if the specified CPU has passed through a quiescent state
> + * by virtue of being in or having passed through an dynticks idle state since
> + * the last call to dyntick_save_progress_counter() for this same CPU, or by
> + * virtue of having been offline.
> + *
> + * Returns negative if the specified CPU needs a force resched.
> + *
> + * Returns zero otherwise.
>   */
>  static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
>  {
>         unsigned long jtsq;
> +       int ret = 0;
>         struct rcu_node *rnp = rdp->mynode;
>
>         /*
> @@ -847,8 +852,8 @@ static int rcu_implicit_dynticks_qs(stru
>             (time_after(jiffies, READ_ONCE(rdp->last_fqs_resched) + jtsq * 3) ||
>              rcu_state.cbovld)) {
>                 WRITE_ONCE(rdp->rcu_urgent_qs, true);
> -               resched_cpu(rdp->cpu);
>                 WRITE_ONCE(rdp->last_fqs_resched, jiffies);
> +               ret = -1;
>         }
>


Maybe some modifications are missing here:

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index aa4c808978b8..77e7a0dc722a 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -867,8 +867,8 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
        if (time_after(jiffies, rcu_state.jiffies_resched)) {
                if (time_after(jiffies,
                               READ_ONCE(rdp->last_fqs_resched) + jtsq)) {
-                       resched_cpu(rdp->cpu);
                        WRITE_ONCE(rdp->last_fqs_resched, jiffies);
+                       ret = -1;
                }
                if (IS_ENABLED(CONFIG_IRQ_WORK) &&
                    !rdp->rcu_iw_pending && rdp->rcu_iw_gp_seq != rnp->gp_seq &&



Thanks
Zqiang


>         /*
> @@ -891,7 +896,7 @@ static int rcu_implicit_dynticks_qs(stru
>                 }
>         }
>
> -       return 0;
> +       return ret;
>  }
>
>  /* Trace-event wrapper function for trace_rcu_future_grace_period.  */
> @@ -2257,15 +2262,15 @@ static void force_qs_rnp(int (*f)(struct
>  {
>         int cpu;
>         unsigned long flags;
> -       unsigned long mask;
> -       struct rcu_data *rdp;
>         struct rcu_node *rnp;
>
>         rcu_state.cbovld = rcu_state.cbovldnext;
>         rcu_state.cbovldnext = false;
>         rcu_for_each_leaf_node(rnp) {
> +               unsigned long mask = 0;
> +               unsigned long rsmask = 0;
> +
>                 cond_resched_tasks_rcu_qs();
> -               mask = 0;
>                 raw_spin_lock_irqsave_rcu_node(rnp, flags);
>                 rcu_state.cbovldnext |= !!rnp->cbovldmask;
>                 if (rnp->qsmask == 0) {
> @@ -2283,11 +2288,17 @@ static void force_qs_rnp(int (*f)(struct
>                         continue;
>                 }
>                 for_each_leaf_node_cpu_mask(rnp, cpu, rnp->qsmask) {
> +                       struct rcu_data *rdp;
> +                       int ret;
> +
>                         rdp = per_cpu_ptr(&rcu_data, cpu);
> -                       if (f(rdp)) {
> +                       ret = f(rdp);
> +                       if (ret > 0) {
>                                 mask |= rdp->grpmask;
>                                 rcu_disable_urgency_upon_qs(rdp);
>                         }
> +                       if (ret < 0)
> +                               rsmask |= rdp->grpmask;
>                 }
>                 if (mask != 0) {
>                         /* Idle/offline CPUs, report (releases rnp->lock). */
> @@ -2296,6 +2307,9 @@ static void force_qs_rnp(int (*f)(struct
>                         /* Nothing to do here, so just drop the lock. */
>                         raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
>                 }
> +
> +               for_each_leaf_node_cpu_mask(rnp, cpu, rsmask)
> +                       resched_cpu(cpu);
>         }
>  }
>


* Re: [PATCH] rcu: Break rcu_node_0 --> &rq->__lock order
  2023-11-01 11:21       ` Z qiang
@ 2023-11-01 14:38         ` Peter Zijlstra
  2023-11-01 16:52           ` Paul E. McKenney
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2023-11-01 14:38 UTC (permalink / raw)
  To: Z qiang
  Cc: Paul E. McKenney, Waiman Long, Ingo Molnar, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
	linux-kernel, Phil Auld, kernel test robot, aubrey.li, yu.c.chen,
	frederic, quic_neeraju, joel, josh, boqun.feng,
	mathieu.desnoyers, jiangshanlai

On Wed, Nov 01, 2023 at 07:21:09PM +0800, Z qiang wrote:

> >  static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
> >  {
> >         unsigned long jtsq;
> > +       int ret = 0;
> >         struct rcu_node *rnp = rdp->mynode;
> >
> >         /*
> > @@ -847,8 +852,8 @@ static int rcu_implicit_dynticks_qs(stru
> >             (time_after(jiffies, READ_ONCE(rdp->last_fqs_resched) + jtsq * 3) ||
> >              rcu_state.cbovld)) {
> >                 WRITE_ONCE(rdp->rcu_urgent_qs, true);
> > -               resched_cpu(rdp->cpu);
> >                 WRITE_ONCE(rdp->last_fqs_resched, jiffies);
> > +               ret = -1;
> >         }
> >
> 
> 
> Maybe some modifications are missing here:
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index aa4c808978b8..77e7a0dc722a 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -867,8 +867,8 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
>         if (time_after(jiffies, rcu_state.jiffies_resched)) {
>                 if (time_after(jiffies,
>                                READ_ONCE(rdp->last_fqs_resched) + jtsq)) {
> -                       resched_cpu(rdp->cpu);
>                         WRITE_ONCE(rdp->last_fqs_resched, jiffies);
> +                       ret = -1;
>                 }
>                 if (IS_ENABLED(CONFIG_IRQ_WORK) &&
>                     !rdp->rcu_iw_pending && rdp->rcu_iw_gp_seq != rnp->gp_seq &&
> 
> 

Bah, you're quite right, I missed that there were two sites calling
resched_cpu().

Paul, do you want a fixed up version or will you fold in the fix?


* Re: [PATCH] rcu: Break rcu_node_0 --> &rq->__lock order
  2023-11-01 14:38         ` Peter Zijlstra
@ 2023-11-01 16:52           ` Paul E. McKenney
  0 siblings, 0 replies; 11+ messages in thread
From: Paul E. McKenney @ 2023-11-01 16:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Z qiang, Waiman Long, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, linux-kernel,
	Phil Auld, kernel test robot, aubrey.li, yu.c.chen, frederic,
	quic_neeraju, joel, josh, boqun.feng, mathieu.desnoyers,
	jiangshanlai

On Wed, Nov 01, 2023 at 03:38:14PM +0100, Peter Zijlstra wrote:
> On Wed, Nov 01, 2023 at 07:21:09PM +0800, Z qiang wrote:
> 
> > >  static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
> > >  {
> > >         unsigned long jtsq;
> > > +       int ret = 0;
> > >         struct rcu_node *rnp = rdp->mynode;
> > >
> > >         /*
> > > @@ -847,8 +852,8 @@ static int rcu_implicit_dynticks_qs(stru
> > >             (time_after(jiffies, READ_ONCE(rdp->last_fqs_resched) + jtsq * 3) ||
> > >              rcu_state.cbovld)) {
> > >                 WRITE_ONCE(rdp->rcu_urgent_qs, true);
> > > -               resched_cpu(rdp->cpu);
> > >                 WRITE_ONCE(rdp->last_fqs_resched, jiffies);
> > > +               ret = -1;
> > >         }
> > >
> > 
> > 
> > Maybe some modifications are missing here:
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index aa4c808978b8..77e7a0dc722a 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -867,8 +867,8 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
> >         if (time_after(jiffies, rcu_state.jiffies_resched)) {
> >                 if (time_after(jiffies,
> >                                READ_ONCE(rdp->last_fqs_resched) + jtsq)) {
> > -                       resched_cpu(rdp->cpu);
> >                         WRITE_ONCE(rdp->last_fqs_resched, jiffies);
> > +                       ret = -1;
> >                 }
> >                 if (IS_ENABLED(CONFIG_IRQ_WORK) &&
> >                     !rdp->rcu_iw_pending && rdp->rcu_iw_gp_seq != rnp->gp_seq &&
> > 
> > 
> 
> Bah, you're quite right, I missed that there were two sites calling
> resched_cpu().
> 
> Paul, do you want a fixed up version or will you fold in the fix?

I can fold it in.  I also clearly need to add a 15-second stall to at
least one of the rcutorture scenarios to exercise this code path...
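
(If memory serves, rcutorture already has a stall-injection module parameter
for that; something like "rcutorture.stall_cpu=15" in one of the scenario
.boot files might do it, though the parameter name is from memory and
unverified.)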

(And Frederic might be pushing this one, his choice.)

							Thanx, Paul

