linux-kernel.vger.kernel.org archive mirror
* [PATCH] rcu: exp: Fix "must hold exp_mutex" comments for QS reporting functions
@ 2018-03-07  8:49 Boqun Feng
  2018-03-07 15:48 ` Paul E. McKenney
  0 siblings, 1 reply; 11+ messages in thread
From: Boqun Feng @ 2018-03-07  8:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: Boqun Feng, Paul E. McKenney, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan

Since commit d9a3da0699b2 ("rcu: Add expedited grace-period support for
preemptible RCU"), there have been comments for some functions in
rcu_report_exp_rnp()'s call chain saying that exp_mutex or one of its
predecessors must be held.

However, exp_mutex and its predecessors are used only to synchronize
between grace periods, and all variables accessed by those functions are
clearly protected by the rcu_node's ->lock. Moreover, those functions
are currently called without exp_mutex held, and this does not appear to
cause any trouble.

Fix this by correcting the comments.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Fixes: d9a3da0699b2 ("rcu: Add expedited grace-period support for preemptible RCU")
---
 kernel/rcu/tree_exp.h | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index f8e4571efabf..2fd882b08b7c 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -154,7 +154,7 @@ static void __maybe_unused sync_exp_reset_tree(struct rcu_state *rsp)
  * for the current expedited grace period.  Works only for preemptible
  * RCU -- other RCU implementation use other means.
  *
- * Caller must hold the rcu_state's exp_mutex.
+ * Caller must hold the specified rcu_node structure's ->lock.
  */
 static bool sync_rcu_preempt_exp_done(struct rcu_node *rnp)
 {
@@ -170,8 +170,7 @@ static bool sync_rcu_preempt_exp_done(struct rcu_node *rnp)
  * recursively up the tree.  (Calm down, calm down, we do the recursion
  * iteratively!)
  *
- * Caller must hold the rcu_state's exp_mutex and the specified rcu_node
- * structure's ->lock.
+ * Caller must hold the specified rcu_node structure's ->lock.
  */
 static void __rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp,
 				 bool wake, unsigned long flags)
@@ -207,8 +206,6 @@ static void __rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp,
 /*
  * Report expedited quiescent state for specified node.  This is a
  * lock-acquisition wrapper function for __rcu_report_exp_rnp().
- *
- * Caller must hold the rcu_state's exp_mutex.
  */
 static void __maybe_unused rcu_report_exp_rnp(struct rcu_state *rsp,
 					      struct rcu_node *rnp, bool wake)
@@ -221,8 +218,7 @@ static void __maybe_unused rcu_report_exp_rnp(struct rcu_state *rsp,
 
 /*
  * Report expedited quiescent state for multiple CPUs, all covered by the
- * specified leaf rcu_node structure.  Caller must hold the rcu_state's
- * exp_mutex.
+ * specified leaf rcu_node structure.
  */
 static void rcu_report_exp_cpu_mult(struct rcu_state *rsp, struct rcu_node *rnp,
 				    unsigned long mask, bool wake)
-- 
2.16.2


* Re: [PATCH] rcu: exp: Fix "must hold exp_mutex" comments for QS reporting functions
  2018-03-07  8:49 [PATCH] rcu: exp: Fix "must hold exp_mutex" comments for QS reporting functions Boqun Feng
@ 2018-03-07 15:48 ` Paul E. McKenney
  2018-03-08  3:46   ` Boqun Feng
  0 siblings, 1 reply; 11+ messages in thread
From: Paul E. McKenney @ 2018-03-07 15:48 UTC (permalink / raw)
  To: Boqun Feng
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan

On Wed, Mar 07, 2018 at 04:49:39PM +0800, Boqun Feng wrote:
> Since commit d9a3da0699b2 ("rcu: Add expedited grace-period support for
> preemptible RCU"), there have been comments for some functions in
> rcu_report_exp_rnp()'s call chain saying that exp_mutex or one of its
> predecessors must be held.
> 
> However, exp_mutex and its predecessors are used only to synchronize
> between grace periods, and all variables accessed by those functions are
> clearly protected by the rcu_node's ->lock. Moreover, those functions
> are currently called without exp_mutex held, and this does not appear to
> cause any trouble.
> 
> Fix this by correcting the comments.
> 
> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> Fixes: d9a3da0699b2 ("rcu: Add expedited grace-period support for preemptible RCU")

Good catch, applied!

These have been around for almost a decade!  ;-)  They happened because
I ended up rewriting expedited support several times before it was
accepted, and apparently failed to update the comments towards the end.

So thank you for catching this one!

But wouldn't it be better to use lockdep instead of comments in this case?

							Thanx, Paul

> ---
>  kernel/rcu/tree_exp.h | 10 +++-------
>  1 file changed, 3 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index f8e4571efabf..2fd882b08b7c 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -154,7 +154,7 @@ static void __maybe_unused sync_exp_reset_tree(struct rcu_state *rsp)
>   * for the current expedited grace period.  Works only for preemptible
>   * RCU -- other RCU implementation use other means.
>   *
> - * Caller must hold the rcu_state's exp_mutex.
> + * Caller must hold the specified rcu_node structure's ->lock.
>   */
>  static bool sync_rcu_preempt_exp_done(struct rcu_node *rnp)
>  {
> @@ -170,8 +170,7 @@ static bool sync_rcu_preempt_exp_done(struct rcu_node *rnp)
>   * recursively up the tree.  (Calm down, calm down, we do the recursion
>   * iteratively!)
>   *
> - * Caller must hold the rcu_state's exp_mutex and the specified rcu_node
> - * structure's ->lock.
> + * Caller must hold the specified rcu_node structure's ->lock.
>   */
>  static void __rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp,
>  				 bool wake, unsigned long flags)
> @@ -207,8 +206,6 @@ static void __rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp,
>  /*
>   * Report expedited quiescent state for specified node.  This is a
>   * lock-acquisition wrapper function for __rcu_report_exp_rnp().
> - *
> - * Caller must hold the rcu_state's exp_mutex.
>   */
>  static void __maybe_unused rcu_report_exp_rnp(struct rcu_state *rsp,
>  					      struct rcu_node *rnp, bool wake)
> @@ -221,8 +218,7 @@ static void __maybe_unused rcu_report_exp_rnp(struct rcu_state *rsp,
> 
>  /*
>   * Report expedited quiescent state for multiple CPUs, all covered by the
> - * specified leaf rcu_node structure.  Caller must hold the rcu_state's
> - * exp_mutex.
> + * specified leaf rcu_node structure.
>   */
>  static void rcu_report_exp_cpu_mult(struct rcu_state *rsp, struct rcu_node *rnp,
>  				    unsigned long mask, bool wake)
> -- 
> 2.16.2
> 


* Re: [PATCH] rcu: exp: Fix "must hold exp_mutex" comments for QS reporting functions
  2018-03-07 15:48 ` Paul E. McKenney
@ 2018-03-08  3:46   ` Boqun Feng
  2018-03-08  4:30     ` Paul E. McKenney
  0 siblings, 1 reply; 11+ messages in thread
From: Boqun Feng @ 2018-03-08  3:46 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan

On Wed, Mar 07, 2018 at 07:48:29AM -0800, Paul E. McKenney wrote:
> On Wed, Mar 07, 2018 at 04:49:39PM +0800, Boqun Feng wrote:
> > Since commit d9a3da0699b2 ("rcu: Add expedited grace-period support for
> > preemptible RCU"), there have been comments for some functions in
> > rcu_report_exp_rnp()'s call chain saying that exp_mutex or one of its
> > predecessors must be held.
> > 
> > However, exp_mutex and its predecessors are used only to synchronize
> > between grace periods, and all variables accessed by those functions are
> > clearly protected by the rcu_node's ->lock. Moreover, those functions
> > are currently called without exp_mutex held, and this does not appear to
> > cause any trouble.
> > 
> > Fix this by correcting the comments.
> > 
> > Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> > Fixes: d9a3da0699b2 ("rcu: Add expedited grace-period support for preemptible RCU")
> 
> Good catch, applied!
> 
> These have been around for almost a decade!  ;-)  They happened because
> I ended up rewriting expedited support several times before it was
> accepted, and apparently failed to update the comments towards the end.
> 
> So thank you for catching this one!
> 
> But wouldn't it be better to use lockdep instead of comments in this case?
> 

Agreed. That's a good idea. And I may have found something with it, so I
added this check to sync_rcu_preempt_exp_done():

diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 2fd882b08b7c..1fa394fe2f8a 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -20,6 +20,8 @@
  * Authors: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
  */
 
+#include <linux/lockdep.h>
+
 /*
  * Record the start of an expedited grace period.
  */
@@ -158,6 +160,8 @@ static void __maybe_unused sync_exp_reset_tree(struct rcu_state *rsp)
  */
 static bool sync_rcu_preempt_exp_done(struct rcu_node *rnp)
 {
+	BUG_ON(!lockdep_is_held(&rnp->lock));
+
 	return rnp->exp_tasks == NULL &&
 	       READ_ONCE(rnp->expmask) == 0;
 }
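
As an aside, a sketch of the same precondition using the helper lockdep
already provides for this purpose (a kernel fragment, not buildable
standalone): lockdep_assert_held() compiles away entirely when lockdep
is disabled, so it avoids the unconditional BUG_ON:

```c
static bool sync_rcu_preempt_exp_done(struct rcu_node *rnp)
{
	lockdep_assert_held(&rnp->lock);  /* splats once if ->lock not held */

	return rnp->exp_tasks == NULL &&
	       READ_ONCE(rnp->expmask) == 0;
}
```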

And I got the following splat with rcutorture (--configs TREE03 --bootargs
"rcutorture.gp_exp=1" --duration 10 --kconfig "CONFIG_PROVE_LOCKING=y"):

[    0.013629] ------------[ cut here ]------------
[    0.014000] kernel BUG at /home/fixme/linux-rcu/kernel/rcu/tree_exp.h:163!
[    0.014009] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[    0.014697] Modules linked in:
[    0.014992] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.16.0-rc1+ #1
[    0.015000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
[    0.015000] RIP: 0010:sync_rcu_preempt_exp_done+0x30/0x40
[    0.015000] RSP: 0000:ffffffffb7003d00 EFLAGS: 00010246
[    0.015000] RAX: 0000000000000000 RBX: ffffffffb7064d00 RCX: 0000000000000003
[    0.015000] RDX: 0000000080000003 RSI: ffffffffb7064d18 RDI: ffffffffb7024da8
[    0.015000] RBP: ffffffffb7064d00 R08: 0000000000000000 R09: 0000000000000000
[    0.015000] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffb70686d0
[    0.015000] R13: ffffffffb765e2e0 R14: ffffffffb7064d00 R15: 0000000000000004
[    0.015000] FS:  0000000000000000(0000) GS:ffff94fb9fc00000(0000) knlGS:0000000000000000
[    0.015000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.015000] CR2: ffff94fb8e57f000 CR3: 000000000d01e000 CR4: 00000000000006f0
[    0.015000] Call Trace:
[    0.015000]  rcu_exp_wait_wake+0x6d/0x9e0
[    0.015000]  ? rcu_cleanup_dead_rnp+0x90/0x90
[    0.015000]  ? rcu_report_exp_cpu_mult+0x60/0x60
[    0.015000]  _synchronize_rcu_expedited+0x680/0x6f0
[    0.015000]  ? rcu_cleanup_dead_rnp+0x90/0x90
[    0.015000]  ? acpi_hw_read_port+0x4c/0xc2
[    0.015000]  ? acpi_hw_read+0xf4/0x153
[    0.015000]  synchronize_rcu.part.79+0x53/0x60
[    0.015000]  ? acpi_read_bit_register+0x38/0x69
[    0.015000]  rcu_test_sync_prims+0x5/0x20
[    0.015000]  rest_init+0x6/0xc0
[    0.015000]  start_kernel+0x447/0x467
[    0.015000]  secondary_startup_64+0xa5/0xb0
[    0.015000] Code: ff 48 89 fb 48 83 c7 18 e8 9e 52 fd ff 85 c0 74 1a 31 c0 48 83 bb c0 00 00 00 00 74 02 5b c3 48 8b 43 68 5b 48 85 c0 0f 94 c0 c3 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 41 56 49 89 fe 
[    0.015000] RIP: sync_rcu_preempt_exp_done+0x30/0x40 RSP: ffffffffb7003d00
[    0.015007] ---[ end trace 09539f6b90637d6c ]---


And I found that in rcu_exp_wait_wake(), some calls to
sync_rcu_preempt_exp_done() are not protected by the rcu_node's ->lock.
I have a straightforward fix (along with the debug changes I mentioned
above).

With this fix, rcutorture no longer complains about the missing lock.
I will continue to investigate whether the missing lock is a real
problem, but in the meanwhile, I'm looking forward to your insight ;-)

Regards,
Boqun
---------------------------------->8
Subject: [PATCH] rcu: exp: Protect sync_rcu_preempt_exp_done() with rcu_node
 lock

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 kernel/rcu/tree_exp.h | 35 ++++++++++++++++++++++++++++++++---
 1 file changed, 32 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 2fd882b08b7c..0001ac0ce6a7 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -20,6 +20,8 @@
  * Authors: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
  */
 
+#include <linux/lockdep.h>
+
 /*
  * Record the start of an expedited grace period.
  */
@@ -158,10 +160,30 @@ static void __maybe_unused sync_exp_reset_tree(struct rcu_state *rsp)
  */
 static bool sync_rcu_preempt_exp_done(struct rcu_node *rnp)
 {
+	BUG_ON(!lockdep_is_held(&rnp->lock));
+
 	return rnp->exp_tasks == NULL &&
 	       READ_ONCE(rnp->expmask) == 0;
 }
 
+/*
+ * Like sync_rcu_preempt_exp_done(), but this function assumes the caller
+ * doesn't hold the rcu_node's ->lock, and will acquire and release the lock
+ * itself.
+ */
+static bool sync_rcu_preempt_exp_done_unlocked(struct rcu_node *rnp)
+{
+	unsigned long flags;
+	bool ret;
+
+	raw_spin_lock_irqsave_rcu_node(rnp, flags);
+	ret = sync_rcu_preempt_exp_done(rnp);
+	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+
+	return ret;
+}
+
+
 /*
  * Report the exit from RCU read-side critical section for the last task
  * that queued itself during or before the current expedited preemptible-RCU
@@ -490,6 +512,7 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
 	struct rcu_node *rnp;
 	struct rcu_node *rnp_root = rcu_get_root(rsp);
 	int ret;
+	unsigned long flags;
 
 	trace_rcu_exp_grace_period(rsp->name, rcu_exp_gp_seq_endval(rsp), TPS("startwait"));
 	jiffies_stall = rcu_jiffies_till_stall_check();
@@ -498,9 +521,9 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
 	for (;;) {
 		ret = swait_event_timeout(
 				rsp->expedited_wq,
-				sync_rcu_preempt_exp_done(rnp_root),
+				sync_rcu_preempt_exp_done_unlocked(rnp_root),
 				jiffies_stall);
-		if (ret > 0 || sync_rcu_preempt_exp_done(rnp_root))
+		if (ret > 0 || sync_rcu_preempt_exp_done_unlocked(rnp_root))
 			return;
 		WARN_ON(ret < 0);  /* workqueues should not be signaled. */
 		if (rcu_cpu_stall_suppress)
@@ -533,8 +556,14 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
 			rcu_for_each_node_breadth_first(rsp, rnp) {
 				if (rnp == rnp_root)
 					continue; /* printed unconditionally */
-				if (sync_rcu_preempt_exp_done(rnp))
+
+				raw_spin_lock_irqsave_rcu_node(rnp, flags);
+				if (sync_rcu_preempt_exp_done(rnp)) {
+					raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 					continue;
+				}
+				raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+
 				pr_cont(" l=%u:%d-%d:%#lx/%c",
 					rnp->level, rnp->grplo, rnp->grphi,
 					rnp->expmask,
-- 
2.16.2




* Re: [PATCH] rcu: exp: Fix "must hold exp_mutex" comments for QS reporting functions
  2018-03-08  3:46   ` Boqun Feng
@ 2018-03-08  4:30     ` Paul E. McKenney
  2018-03-08  4:54       ` Boqun Feng
  0 siblings, 1 reply; 11+ messages in thread
From: Paul E. McKenney @ 2018-03-08  4:30 UTC (permalink / raw)
  To: Boqun Feng
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan

On Thu, Mar 08, 2018 at 11:46:27AM +0800, Boqun Feng wrote:
> On Wed, Mar 07, 2018 at 07:48:29AM -0800, Paul E. McKenney wrote:
> > On Wed, Mar 07, 2018 at 04:49:39PM +0800, Boqun Feng wrote:
> > > Since commit d9a3da0699b2 ("rcu: Add expedited grace-period support for
> > > preemptible RCU"), there have been comments for some functions in
> > > rcu_report_exp_rnp()'s call chain saying that exp_mutex or one of its
> > > predecessors must be held.
> > > 
> > > However, exp_mutex and its predecessors are used only to synchronize
> > > between grace periods, and all variables accessed by those functions are
> > > clearly protected by the rcu_node's ->lock. Moreover, those functions
> > > are currently called without exp_mutex held, and this does not appear to
> > > cause any trouble.
> > > 
> > > Fix this by correcting the comments.
> > > 
> > > Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> > > Fixes: d9a3da0699b2 ("rcu: Add expedited grace-period support for preemptible RCU")
> > 
> > Good catch, applied!
> > 
> > These have been around for almost a decade!  ;-)  They happened because
> > I ended up rewriting expedited support several times before it was
> > accepted, and apparently failed to update the comments towards the end.
> > 
> > So thank you for catching this one!
> > 
> > But wouldn't it be better to use lockdep instead of comments in this case?
> > 
> 
> Agreed. That's a good idea. And I may have found something with it, so I
> added this check to sync_rcu_preempt_exp_done():
> 
> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index 2fd882b08b7c..1fa394fe2f8a 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -20,6 +20,8 @@
>   * Authors: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>   */
>  
> +#include <linux/lockdep.h>
> +
>  /*
>   * Record the start of an expedited grace period.
>   */
> @@ -158,6 +160,8 @@ static void __maybe_unused sync_exp_reset_tree(struct rcu_state *rsp)
>   */
>  static bool sync_rcu_preempt_exp_done(struct rcu_node *rnp)
>  {
> +	BUG_ON(!lockdep_is_held(&rnp->lock));
> +
>  	return rnp->exp_tasks == NULL &&
>  	       READ_ONCE(rnp->expmask) == 0;
>  }
> 
> And I got the following splat with rcutorture (--configs TREE03 --bootargs
> "rcutorture.gp_exp=1" --duration 10 --kconfig "CONFIG_PROVE_LOCKING=y"):

Certainly stronger medicine than the comment!  ;-)

> [    0.013629] ------------[ cut here ]------------
> [    0.014000] kernel BUG at /home/fixme/linux-rcu/kernel/rcu/tree_exp.h:163!
> [    0.014009] invalid opcode: 0000 [#1] PREEMPT SMP PTI
> [    0.014697] Modules linked in:
> [    0.014992] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.16.0-rc1+ #1
> [    0.015000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
> [    0.015000] RIP: 0010:sync_rcu_preempt_exp_done+0x30/0x40
> [    0.015000] RSP: 0000:ffffffffb7003d00 EFLAGS: 00010246
> [    0.015000] RAX: 0000000000000000 RBX: ffffffffb7064d00 RCX: 0000000000000003
> [    0.015000] RDX: 0000000080000003 RSI: ffffffffb7064d18 RDI: ffffffffb7024da8
> [    0.015000] RBP: ffffffffb7064d00 R08: 0000000000000000 R09: 0000000000000000
> [    0.015000] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffb70686d0
> [    0.015000] R13: ffffffffb765e2e0 R14: ffffffffb7064d00 R15: 0000000000000004
> [    0.015000] FS:  0000000000000000(0000) GS:ffff94fb9fc00000(0000) knlGS:0000000000000000
> [    0.015000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    0.015000] CR2: ffff94fb8e57f000 CR3: 000000000d01e000 CR4: 00000000000006f0
> [    0.015000] Call Trace:
> [    0.015000]  rcu_exp_wait_wake+0x6d/0x9e0
> [    0.015000]  ? rcu_cleanup_dead_rnp+0x90/0x90
> [    0.015000]  ? rcu_report_exp_cpu_mult+0x60/0x60
> [    0.015000]  _synchronize_rcu_expedited+0x680/0x6f0
> [    0.015000]  ? rcu_cleanup_dead_rnp+0x90/0x90
> [    0.015000]  ? acpi_hw_read_port+0x4c/0xc2
> [    0.015000]  ? acpi_hw_read+0xf4/0x153
> [    0.015000]  synchronize_rcu.part.79+0x53/0x60
> [    0.015000]  ? acpi_read_bit_register+0x38/0x69
> [    0.015000]  rcu_test_sync_prims+0x5/0x20
> [    0.015000]  rest_init+0x6/0xc0
> [    0.015000]  start_kernel+0x447/0x467
> [    0.015000]  secondary_startup_64+0xa5/0xb0
> [    0.015000] Code: ff 48 89 fb 48 83 c7 18 e8 9e 52 fd ff 85 c0 74 1a 31 c0 48 83 bb c0 00 00 00 00 74 02 5b c3 48 8b 43 68 5b 48 85 c0 0f 94 c0 c3 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 41 56 49 89 fe 
> [    0.015000] RIP: sync_rcu_preempt_exp_done+0x30/0x40 RSP: ffffffffb7003d00
> [    0.015007] ---[ end trace 09539f6b90637d6c ]---
> 
> 
> And I found that in rcu_exp_wait_wake(), some calls to
> sync_rcu_preempt_exp_done() are not protected by the rcu_node's ->lock.
> I have a straightforward fix (along with the debug changes I mentioned
> above).
> 
> With this fix, rcutorture no longer complains about the missing lock.
> I will continue to investigate whether the missing lock is a real
> problem, but in the meanwhile, I'm looking forward to your insight ;-)
> 
> Regards,
> Boqun
> ---------------------------------->8
> Subject: [PATCH] rcu: exp: Protect sync_rcu_preempt_exp_done() with rcu_node
>  lock
> 
> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> ---
>  kernel/rcu/tree_exp.h | 35 ++++++++++++++++++++++++++++++++---
>  1 file changed, 32 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index 2fd882b08b7c..0001ac0ce6a7 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -20,6 +20,8 @@
>   * Authors: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>   */
>  
> +#include <linux/lockdep.h>
> +
>  /*
>   * Record the start of an expedited grace period.
>   */
> @@ -158,10 +160,30 @@ static void __maybe_unused sync_exp_reset_tree(struct rcu_state *rsp)
>   */
>  static bool sync_rcu_preempt_exp_done(struct rcu_node *rnp)
>  {
> +	BUG_ON(!lockdep_is_held(&rnp->lock));
> +
>  	return rnp->exp_tasks == NULL &&
>  	       READ_ONCE(rnp->expmask) == 0;
>  }
>  
> +/*
> + * Like sync_rcu_preempt_exp_done(), but this function assumes the caller
> + * doesn't hold the rcu_node's ->lock, and will acquire and release the lock
> + * itself
> + */
> +static bool sync_rcu_preempt_exp_done_unlocked(struct rcu_node *rnp)
> +{
> +	unsigned long flags;
> +	bool ret;
> +
> +	raw_spin_lock_irqsave_rcu_node(rnp, flags);
> +	ret = sync_rcu_preempt_exp_done(rnp);

Let's see...  The sync_rcu_preempt_exp_done() function checks the
->exp_tasks pointer and the ->expmask bitmask.  The number of bits in the
mask can only decrease, and the ->exp_tasks pointer can only transition
from NULL to non-NULL when there is at least one bit set.  However,
there is no ordering in sync_rcu_preempt_exp_done(), so it is possible
that it could be fooled without the lock:

o	CPU 0 in sync_rcu_preempt_exp_done() reads ->exp_tasks and
	sees that it is NULL.

o	CPU 1 blocks within an RCU read-side critical section, so
	it enqueues the task and points ->exp_tasks at it and
	clears CPU 1's bit in ->expmask.

o	All other CPUs clear their bits in ->expmask.

o	CPU 0 reads ->expmask, sees that it is zero, so incorrectly
	concludes that all quiescent states have completed, despite
	the fact that ->exp_tasks is non-NULL.

So it seems to me that the lock is needed.  Good catch!!!  The problem
would occur only if the task running on CPU 0 received a spurious
wakeup, but that could potentially happen.

If lock contention becomes a problem, memory-ordering tricks could be
applied, but the lock is of course simpler.

I am guessing that this is a prototype patch, and that you are planning
to add lockdep annotations in more places, but either way please let
me know.

						Thanx, Paul

> +	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> +
> +	return ret;
> +}
> +
> +
>  /*
>   * Report the exit from RCU read-side critical section for the last task
>   * that queued itself during or before the current expedited preemptible-RCU
> @@ -490,6 +512,7 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
>  	struct rcu_node *rnp;
>  	struct rcu_node *rnp_root = rcu_get_root(rsp);
>  	int ret;
> +	unsigned long flags;
>  
>  	trace_rcu_exp_grace_period(rsp->name, rcu_exp_gp_seq_endval(rsp), TPS("startwait"));
>  	jiffies_stall = rcu_jiffies_till_stall_check();
> @@ -498,9 +521,9 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
>  	for (;;) {
>  		ret = swait_event_timeout(
>  				rsp->expedited_wq,
> -				sync_rcu_preempt_exp_done(rnp_root),
> +				sync_rcu_preempt_exp_done_unlocked(rnp_root),
>  				jiffies_stall);
> -		if (ret > 0 || sync_rcu_preempt_exp_done(rnp_root))
> +		if (ret > 0 || sync_rcu_preempt_exp_done_unlocked(rnp_root))
>  			return;
>  		WARN_ON(ret < 0);  /* workqueues should not be signaled. */
>  		if (rcu_cpu_stall_suppress)
> @@ -533,8 +556,14 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
>  			rcu_for_each_node_breadth_first(rsp, rnp) {
>  				if (rnp == rnp_root)
>  					continue; /* printed unconditionally */
> -				if (sync_rcu_preempt_exp_done(rnp))
> +
> +				raw_spin_lock_irqsave_rcu_node(rnp, flags);
> +				if (sync_rcu_preempt_exp_done(rnp)) {
> +					raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
>  					continue;
> +				}
> +				raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> +
>  				pr_cont(" l=%u:%d-%d:%#lx/%c",
>  					rnp->level, rnp->grplo, rnp->grphi,
>  					rnp->expmask,
> -- 
> 2.16.2
> 


* Re: [PATCH] rcu: exp: Fix "must hold exp_mutex" comments for QS reporting functions
  2018-03-08  4:30     ` Paul E. McKenney
@ 2018-03-08  4:54       ` Boqun Feng
  2018-03-08  8:30         ` Boqun Feng
  0 siblings, 1 reply; 11+ messages in thread
From: Boqun Feng @ 2018-03-08  4:54 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan

On Wed, Mar 07, 2018 at 08:30:17PM -0800, Paul E. McKenney wrote:
[...]
> >  
> > +/*
> > + * Like sync_rcu_preempt_exp_done(), but this function assumes the caller
> > + * doesn't hold the rcu_node's ->lock, and will acquire and release the lock
> > + * itself
> > + */
> > +static bool sync_rcu_preempt_exp_done_unlocked(struct rcu_node *rnp)
> > +{
> > +	unsigned long flags;
> > +	bool ret;
> > +
> > +	raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > +	ret = sync_rcu_preempt_exp_done(rnp);
> 
> Let's see...  The sync_rcu_preempt_exp_done() function checks the
> ->exp_tasks pointer and the ->expmask bitmask.  The number of bits in the
> mask can only decrease, and the ->exp_tasks pointer can only transition
> from NULL to non-NULL when there is at least one bit set.  However,
> there is no ordering in sync_rcu_preempt_exp_done(), so it is possible
> that it could be fooled without the lock:
> 
> o	CPU 0 in sync_rcu_preempt_exp_done() reads ->exp_tasks and
> 	sees that it is NULL.
> 
> o	CPU 1 blocks within an RCU read-side critical section, so
> 	it enqueues the task and points ->exp_tasks at it and
> 	clears CPU 1's bit in ->expmask.
> 
> o	All other CPUs clear their bits in ->expmask.
> 
> o	CPU 0 reads ->expmask, sees that it is zero, so incorrectly
> 	concludes that all quiescent states have completed, despite
> 	the fact that ->exp_tasks is non-NULL.
> 
> So it seems to me that the lock is needed.  Good catch!!!  The problem
> would occur only if the task running on CPU 0 received a spurious
> wakeup, but that could potentially happen.
> 

Thanks for the analysis ;-)

> If lock contention becomes a problem, memory-ordering tricks could be
> applied, but the lock is of course simpler.
> 

Agreed.

> I am guessing that this is a prototype patch, and that you are planning

Yes, this is a prototype. And I'm preparing a proper patch to send
later.

> to add lockdep annotations in more places, but either way please let
> me know.
> 

Given that it's a bug as per your analysis, I'd like to defer the other
lockdep annotations and send this fix first. However, I'm currently
getting other lockdep splats after applying it, so I need to get those
sorted out first.

Regards,
Boqun

> 						Thanx, Paul
> 
> > +	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > +
> > +	return ret;
> > +}
> > +
> > +
> >  /*
> >   * Report the exit from RCU read-side critical section for the last task
> >   * that queued itself during or before the current expedited preemptible-RCU
> > @@ -490,6 +512,7 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
> >  	struct rcu_node *rnp;
> >  	struct rcu_node *rnp_root = rcu_get_root(rsp);
> >  	int ret;
> > +	unsigned long flags;
> >  
> >  	trace_rcu_exp_grace_period(rsp->name, rcu_exp_gp_seq_endval(rsp), TPS("startwait"));
> >  	jiffies_stall = rcu_jiffies_till_stall_check();
> > @@ -498,9 +521,9 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
> >  	for (;;) {
> >  		ret = swait_event_timeout(
> >  				rsp->expedited_wq,
> > -				sync_rcu_preempt_exp_done(rnp_root),
> > +				sync_rcu_preempt_exp_done_unlocked(rnp_root),
> >  				jiffies_stall);
> > -		if (ret > 0 || sync_rcu_preempt_exp_done(rnp_root))
> > +		if (ret > 0 || sync_rcu_preempt_exp_done_unlocked(rnp_root))
> >  			return;
> >  		WARN_ON(ret < 0);  /* workqueues should not be signaled. */
> >  		if (rcu_cpu_stall_suppress)
> > @@ -533,8 +556,14 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
> >  			rcu_for_each_node_breadth_first(rsp, rnp) {
> >  				if (rnp == rnp_root)
> >  					continue; /* printed unconditionally */
> > -				if (sync_rcu_preempt_exp_done(rnp))
> > +
> > +				raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > +				if (sync_rcu_preempt_exp_done(rnp)) {
> > +					raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> >  					continue;
> > +				}
> > +				raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > +
> >  				pr_cont(" l=%u:%d-%d:%#lx/%c",
> >  					rnp->level, rnp->grplo, rnp->grphi,
> >  					rnp->expmask,
> > -- 
> > 2.16.2
> > 
> 
> 



* Re: [PATCH] rcu: exp: Fix "must hold exp_mutex" comments for QS reporting functions
  2018-03-08  4:54       ` Boqun Feng
@ 2018-03-08  8:30         ` Boqun Feng
  2018-03-08 15:42           ` Paul E. McKenney
  0 siblings, 1 reply; 11+ messages in thread
From: Boqun Feng @ 2018-03-08  8:30 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan

On Thu, Mar 08, 2018 at 12:54:29PM +0800, Boqun Feng wrote:
> On Wed, Mar 07, 2018 at 08:30:17PM -0800, Paul E. McKenney wrote:
> [...]
> > >  
> > > +/*
> > > + * Like sync_rcu_preempt_exp_done(), but this function assumes the caller
> > > + * doesn't hold the rcu_node's ->lock, and will acquire and release the lock
> > > + * itself
> > > + */
> > > +static bool sync_rcu_preempt_exp_done_unlocked(struct rcu_node *rnp)
> > > +{
> > > +	unsigned long flags;
> > > +	bool ret;
> > > +
> > > +	raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > > +	ret = sync_rcu_preempt_exp_done(rnp);
> > 
> > Let's see...  The sync_rcu_preempt_exp_done() function checks the
> > ->exp_tasks pointer and the ->expmask bitmask.  The number of bits in the
> > mask can only decrease, and the ->exp_tasks pointer can only transition
> > from NULL to non-NULL when there is at least one bit set.  However,
> > there is no ordering in sync_rcu_preempt_exp_done(), so it is possible
> > that it could be fooled without the lock:
> > 
> > o	CPU 0 in sync_rcu_preempt_exp_done() reads ->exp_tasks and
> > 	sees that it is NULL.
> > 
> > o	CPU 1 blocks within an RCU read-side critical section, so
> > 	it enqueues the task and points ->exp_tasks at it and
> > 	clears CPU 1's bit in ->expmask.
> > 
> > o	All other CPUs clear their bits in ->expmask.
> > 
> > o	CPU 0 reads ->expmask, sees that it is zero, so incorrectly
> > 	concludes that all quiescent states have completed, despite
> > 	the fact that ->exp_tasks is non-NULL.
> > 
> > So it seems to me that the lock is needed.  Good catch!!!  The problem
> > would occur only if the task running on CPU 0 received a spurious
> > wakeup, but that could potentially happen.
> > 
> 
> Thanks for the analysis ;-)
> 
> > If lock contention becomes a problem, memory-ordering tricks could be
> > applied, but the lock is of course simpler.
> > 
> 
> Agreed.
> 
> > I am guessing that this is a prototype patch, and that you are planning
> 
> Yes, this is a prototype. And I'm preparing a proper patch to send
> later.
> 
> > to add lockdep annotations in more places, but either way please let
> > me know.
> > 
> 
> Given it's a bug as per your analysis, I'd like to defer other lockdep
> annotations and send this first. However, I'm currently getting other
> lockdep splats after applying this, so I need to get that sorted first.
> 

Hmm... the other lockdep splat seems unrelated to my patch; I can
observe it on mainline using rcutorture with CONFIG_PROVE_LOCKING=y. I'll
spend some more time on it and, in the meantime, send a proper patch for
this sync_rcu_preempt_exp_done() issue.
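
For readers following along, the shape of the fix under discussion can be
modeled in userspace C. This is a sketch only: the struct below is a
stand-in for the two rcu_node fields involved, and a pthread mutex stands
in for raw_spin_lock_irqsave_rcu_node(); it is not the kernel source.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the rcu_node fields involved; a sketch of the
 * idea, not the kernel's struct rcu_node. */
struct rcu_node {
	pthread_mutex_t lock;	/* models rnp->lock */
	void *exp_tasks;	/* tasks blocking the expedited GP */
	unsigned long expmask;	/* CPUs still owing a quiescent state */
};

/* Caller must hold rnp->lock: reading ->exp_tasks and ->expmask in one
 * critical section prevents the interleaving described above, where a
 * blocking CPU publishes ->exp_tasks between the two reads. */
static bool sync_rcu_preempt_exp_done(struct rcu_node *rnp)
{
	return rnp->exp_tasks == NULL && rnp->expmask == 0;
}

/* Like sync_rcu_preempt_exp_done(), but acquires and releases the lock
 * itself, for callers such as the swait_event_timeout() condition that
 * do not already hold it. */
static bool sync_rcu_preempt_exp_done_unlocked(struct rcu_node *rnp)
{
	bool ret;

	pthread_mutex_lock(&rnp->lock);	  /* raw_spin_lock_irqsave_rcu_node() */
	ret = sync_rcu_preempt_exp_done(rnp);
	pthread_mutex_unlock(&rnp->lock); /* raw_spin_unlock_irqrestore...() */
	return ret;
}
```

The point is only that both fields get sampled atomically with respect to
updaters; the real patch changes nothing about what is checked.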

Regards,
Boqun

> Regards,
> Boqun
> 
> > 						Thanx, Paul
> > 
> > > +	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > > +
> > > +	return ret;
> > > +}
> > > +
> > > +
> > >  /*
> > >   * Report the exit from RCU read-side critical section for the last task
> > >   * that queued itself during or before the current expedited preemptible-RCU
> > > @@ -490,6 +512,7 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
> > >  	struct rcu_node *rnp;
> > >  	struct rcu_node *rnp_root = rcu_get_root(rsp);
> > >  	int ret;
> > > +	unsigned long flags;
> > >  
> > >  	trace_rcu_exp_grace_period(rsp->name, rcu_exp_gp_seq_endval(rsp), TPS("startwait"));
> > >  	jiffies_stall = rcu_jiffies_till_stall_check();
> > > @@ -498,9 +521,9 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
> > >  	for (;;) {
> > >  		ret = swait_event_timeout(
> > >  				rsp->expedited_wq,
> > > -				sync_rcu_preempt_exp_done(rnp_root),
> > > +				sync_rcu_preempt_exp_done_unlocked(rnp_root),
> > >  				jiffies_stall);
> > > -		if (ret > 0 || sync_rcu_preempt_exp_done(rnp_root))
> > > +		if (ret > 0 || sync_rcu_preempt_exp_done_unlocked(rnp_root))
> > >  			return;
> > >  		WARN_ON(ret < 0);  /* workqueues should not be signaled. */
> > >  		if (rcu_cpu_stall_suppress)
> > > @@ -533,8 +556,14 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
> > >  			rcu_for_each_node_breadth_first(rsp, rnp) {
> > >  				if (rnp == rnp_root)
> > >  					continue; /* printed unconditionally */
> > > -				if (sync_rcu_preempt_exp_done(rnp))
> > > +
> > > +				raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > > +				if (sync_rcu_preempt_exp_done(rnp)) {
> > > +					raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > >  					continue;
> > > +				}
> > > +				raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > > +
> > >  				pr_cont(" l=%u:%d-%d:%#lx/%c",
> > >  					rnp->level, rnp->grplo, rnp->grphi,
> > >  					rnp->expmask,
> > > -- 
> > > 2.16.2
> > > 
> > 
> > 




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] rcu: exp: Fix "must hold exp_mutex" comments for QS reporting functions
  2018-03-08  8:30         ` Boqun Feng
@ 2018-03-08 15:42           ` Paul E. McKenney
  2018-03-09  6:57             ` Boqun Feng
  0 siblings, 1 reply; 11+ messages in thread
From: Paul E. McKenney @ 2018-03-08 15:42 UTC (permalink / raw)
  To: Boqun Feng
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan

On Thu, Mar 08, 2018 at 04:30:06PM +0800, Boqun Feng wrote:
> On Thu, Mar 08, 2018 at 12:54:29PM +0800, Boqun Feng wrote:
> > On Wed, Mar 07, 2018 at 08:30:17PM -0800, Paul E. McKenney wrote:
> > [...]
> > > >  
> > > > +/*
> > > > + * Like sync_rcu_preempt_exp_done(), but this function assumes the caller
> > > > + * doesn't hold the rcu_node's ->lock, and will acquire and release the lock
> > > > + * itself
> > > > + */
> > > > +static bool sync_rcu_preempt_exp_done_unlocked(struct rcu_node *rnp)
> > > > +{
> > > > +	unsigned long flags;
> > > > +	bool ret;
> > > > +
> > > > +	raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > > > +	ret = sync_rcu_preempt_exp_done(rnp);
> > > 
> > > Let's see...  The sync_rcu_preempt_exp_done() function checks the
> > > ->exp_tasks pointer and the ->expmask bitmask.  The number of bits in the
> > > mask can only decrease, and the ->exp_tasks pointer can only transition
> > > from NULL to non-NULL when there is at least one bit set.  However,
> > > there is no ordering in sync_rcu_preempt_exp_done(), so it is possible
> > > that it could be fooled without the lock:
> > > 
> > > o	CPU 0 in sync_rcu_preempt_exp_done() reads ->exp_tasks and
> > > 	sees that it is NULL.
> > > 
> > > o	CPU 1 blocks within an RCU read-side critical section, so
> > > 	it enqueues the task and points ->exp_tasks at it and
> > > 	clears CPU 1's bit in ->expmask.
> > > 
> > > o	All other CPUs clear their bits in ->expmask.
> > > 
> > > o	CPU 0 reads ->expmask, sees that it is zero, so incorrectly
> > > 	concludes that all quiescent states have completed, despite
> > > 	the fact that ->exp_tasks is non-NULL.
> > > 
> > > So it seems to me that the lock is needed.  Good catch!!!  The problem
> > > would occur only if the task running on CPU 0 received a spurious
> > > wakeup, but that could potentially happen.
> > 
> > Thanks for the analysis ;-)

The other limitation is that it occurs only on systems small enough
to have a single-node rcu_node tree.  But still...
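
As an aside, the "memory-ordering tricks" mentioned below could take
roughly the following shape. This is purely hypothetical (modeled with
C11 atomics rather than the kernel's primitives, and the actual fix keeps
the lock): the updater publishes ->exp_tasks before clearing its ->expmask
bit with a release store, and the reader acquire-loads ->expmask before
looking at ->exp_tasks, so a reader that sees the mask go to zero cannot
miss the publication.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock-free model; names mirror the kernel's fields. */
struct exp_state {
	_Atomic(void *) exp_tasks;	/* set before the CPU's bit clears */
	_Atomic unsigned long expmask;
};

/* Updater on a blocking CPU: publish the blocked task, then clear the
 * CPU's bit with release so a reader that observes the cleared bit also
 * observes the non-NULL ->exp_tasks. */
static void block_and_clear_bit(struct exp_state *s, void *task,
				unsigned long bit)
{
	atomic_store_explicit(&s->exp_tasks, task, memory_order_relaxed);
	atomic_fetch_and_explicit(&s->expmask, ~bit, memory_order_release);
}

/* Reader: acquire-load ->expmask first; only when it is zero can the
 * subsequent ->exp_tasks load be trusted not to miss a publication. */
static bool exp_done(struct exp_state *s)
{
	if (atomic_load_explicit(&s->expmask, memory_order_acquire) != 0)
		return false;
	return atomic_load_explicit(&s->exp_tasks,
				    memory_order_relaxed) == NULL;
}
```

Taking the lock is the simpler choice, as noted; this just illustrates
what the ordering-based alternative would have to guarantee.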

> > > If lock contention becomes a problem, memory-ordering tricks could be
> > > applied, but the lock is of course simpler.
> > > 
> > 
> > Agreed.
> > 
> > > I am guessing that this is a prototype patch, and that you are planning
> > 
> > Yes, this is a prototype. And I'm preparing a proper patch to send
> > later.

Very good, thank you!

> > > to add lockdep annotations in more places, but either way please let
> > > me know.
> > 
> > Given it's a bug as per your analysis, I'd like to defer other lockdep
> > annotations and send this first. However, I'm currently getting other
> > lockdep splats after applying this, so I need to get that sorted first.
> 
> Hmm... the other lockdep splat seems unrelated to my patch; I can
> observe it on mainline using rcutorture with CONFIG_PROVE_LOCKING=y. I'll
> spend some more time on it and, in the meantime, send a proper patch for
> this sync_rcu_preempt_exp_done() issue.

I am not seeing that one, but am very interested in getting it fixed!  ;-)

							Thanx, Paul

> Regards,
> Boqun
> 
> > Regards,
> > Boqun
> > 
> > > 						Thanx, Paul
> > > 
> > > > +	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > > > +
> > > > +	return ret;
> > > > +}
> > > > +
> > > > +
> > > >  /*
> > > >   * Report the exit from RCU read-side critical section for the last task
> > > >   * that queued itself during or before the current expedited preemptible-RCU
> > > > @@ -490,6 +512,7 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
> > > >  	struct rcu_node *rnp;
> > > >  	struct rcu_node *rnp_root = rcu_get_root(rsp);
> > > >  	int ret;
> > > > +	unsigned long flags;
> > > >  
> > > >  	trace_rcu_exp_grace_period(rsp->name, rcu_exp_gp_seq_endval(rsp), TPS("startwait"));
> > > >  	jiffies_stall = rcu_jiffies_till_stall_check();
> > > > @@ -498,9 +521,9 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
> > > >  	for (;;) {
> > > >  		ret = swait_event_timeout(
> > > >  				rsp->expedited_wq,
> > > > -				sync_rcu_preempt_exp_done(rnp_root),
> > > > +				sync_rcu_preempt_exp_done_unlocked(rnp_root),
> > > >  				jiffies_stall);
> > > > -		if (ret > 0 || sync_rcu_preempt_exp_done(rnp_root))
> > > > +		if (ret > 0 || sync_rcu_preempt_exp_done_unlocked(rnp_root))
> > > >  			return;
> > > >  		WARN_ON(ret < 0);  /* workqueues should not be signaled. */
> > > >  		if (rcu_cpu_stall_suppress)
> > > > @@ -533,8 +556,14 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
> > > >  			rcu_for_each_node_breadth_first(rsp, rnp) {
> > > >  				if (rnp == rnp_root)
> > > >  					continue; /* printed unconditionally */
> > > > -				if (sync_rcu_preempt_exp_done(rnp))
> > > > +
> > > > +				raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > > > +				if (sync_rcu_preempt_exp_done(rnp)) {
> > > > +					raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > > >  					continue;
> > > > +				}
> > > > +				raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > > > +
> > > >  				pr_cont(" l=%u:%d-%d:%#lx/%c",
> > > >  					rnp->level, rnp->grplo, rnp->grphi,
> > > >  					rnp->expmask,
> > > > -- 
> > > > 2.16.2
> > > > 
> > > 
> > > 
> 
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] rcu: exp: Fix "must hold exp_mutex" comments for QS reporting functions
  2018-03-08 15:42           ` Paul E. McKenney
@ 2018-03-09  6:57             ` Boqun Feng
  2018-03-09 20:17               ` Paul E. McKenney
  0 siblings, 1 reply; 11+ messages in thread
From: Boqun Feng @ 2018-03-09  6:57 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan

On Thu, Mar 08, 2018 at 07:42:55AM -0800, Paul E. McKenney wrote:
> On Thu, Mar 08, 2018 at 04:30:06PM +0800, Boqun Feng wrote:
> > On Thu, Mar 08, 2018 at 12:54:29PM +0800, Boqun Feng wrote:
> > > On Wed, Mar 07, 2018 at 08:30:17PM -0800, Paul E. McKenney wrote:
> > > [...]
> > > > >  
> > > > > +/*
> > > > > + * Like sync_rcu_preempt_exp_done(), but this function assumes the caller
> > > > > + * doesn't hold the rcu_node's ->lock, and will acquire and release the lock
> > > > > + * itself
> > > > > + */
> > > > > +static bool sync_rcu_preempt_exp_done_unlocked(struct rcu_node *rnp)
> > > > > +{
> > > > > +	unsigned long flags;
> > > > > +	bool ret;
> > > > > +
> > > > > +	raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > > > > +	ret = sync_rcu_preempt_exp_done(rnp);
> > > > 
> > > > Let's see...  The sync_rcu_preempt_exp_done() function checks the
> > > > ->exp_tasks pointer and the ->expmask bitmask.  The number of bits in the
> > > > mask can only decrease, and the ->exp_tasks pointer can only transition
> > > > from NULL to non-NULL when there is at least one bit set.  However,
> > > > there is no ordering in sync_rcu_preempt_exp_done(), so it is possible
> > > > that it could be fooled without the lock:
> > > > 
> > > > o	CPU 0 in sync_rcu_preempt_exp_done() reads ->exp_tasks and
> > > > 	sees that it is NULL.
> > > > 
> > > > o	CPU 1 blocks within an RCU read-side critical section, so
> > > > 	it enqueues the task and points ->exp_tasks at it and
> > > > 	clears CPU 1's bit in ->expmask.
> > > > 
> > > > o	All other CPUs clear their bits in ->expmask.
> > > > 
> > > > o	CPU 0 reads ->expmask, sees that it is zero, so incorrectly
> > > > 	concludes that all quiescent states have completed, despite
> > > > 	the fact that ->exp_tasks is non-NULL.
> > > > 
> > > > So it seems to me that the lock is needed.  Good catch!!!  The problem
> > > > would occur only if the task running on CPU 0 received a spurious
> > > > wakeup, but that could potentially happen.
> > > 
> > > Thanks for the analysis ;-)
> 
> The other limitation is that it occurs only on systems small enough
> to have a single-node rcu_node tree.  But still...
> 
> > > > If lock contention becomes a problem, memory-ordering tricks could be
> > > > applied, but the lock is of course simpler.
> > > > 
> > > 
> > > Agreed.
> > > 
> > > > I am guessing that this is a prototype patch, and that you are planning
> > > 
> > > Yes, this is a prototype. And I'm preparing a proper patch to send
> > > later.
> 
> Very good, thank you!
> 
> > > > to add lockdep annotations in more places, but either way please let
> > > > me know.
> > > 
> > > > Given it's a bug as per your analysis, I'd like to defer other lockdep
> > > annotations and send this first. However, I'm currently getting other
> > > lockdep splats after applying this, so I need to get that sorted first.
> > 
> > Hmm... the other lockdep splat seems unrelated to my patch; I can
> > observe it on mainline using rcutorture with CONFIG_PROVE_LOCKING=y. I'll
> > spend some more time on it and, in the meantime, send a proper patch for
> > this sync_rcu_preempt_exp_done() issue.
> 
> I am not seeing that one, but am very interested in getting it fixed!  ;-)
> 

Found the root cause, and sent out the patch ;-)

Regards,
Boqun

> 							Thanx, Paul
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] rcu: exp: Fix "must hold exp_mutex" comments for QS reporting functions
  2018-03-09  6:57             ` Boqun Feng
@ 2018-03-09 20:17               ` Paul E. McKenney
  2018-03-12  5:28                 ` Boqun Feng
  0 siblings, 1 reply; 11+ messages in thread
From: Paul E. McKenney @ 2018-03-09 20:17 UTC (permalink / raw)
  To: Boqun Feng
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan

On Fri, Mar 09, 2018 at 02:57:00PM +0800, Boqun Feng wrote:
> On Thu, Mar 08, 2018 at 07:42:55AM -0800, Paul E. McKenney wrote:
> > On Thu, Mar 08, 2018 at 04:30:06PM +0800, Boqun Feng wrote:
> > > On Thu, Mar 08, 2018 at 12:54:29PM +0800, Boqun Feng wrote:
> > > > On Wed, Mar 07, 2018 at 08:30:17PM -0800, Paul E. McKenney wrote:
> > > > [...]
> > > > > >  
> > > > > > +/*
> > > > > > + * Like sync_rcu_preempt_exp_done(), but this function assumes the caller
> > > > > > + * doesn't hold the rcu_node's ->lock, and will acquire and release the lock
> > > > > > + * itself
> > > > > > + */
> > > > > > +static bool sync_rcu_preempt_exp_done_unlocked(struct rcu_node *rnp)
> > > > > > +{
> > > > > > +	unsigned long flags;
> > > > > > +	bool ret;
> > > > > > +
> > > > > > +	raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > > > > > +	ret = sync_rcu_preempt_exp_done(rnp);
> > > > > 
> > > > > Let's see...  The sync_rcu_preempt_exp_done() function checks the
> > > > > ->exp_tasks pointer and the ->expmask bitmask.  The number of bits in the
> > > > > mask can only decrease, and the ->exp_tasks pointer can only transition
> > > > > from NULL to non-NULL when there is at least one bit set.  However,
> > > > > there is no ordering in sync_rcu_preempt_exp_done(), so it is possible
> > > > > that it could be fooled without the lock:
> > > > > 
> > > > > o	CPU 0 in sync_rcu_preempt_exp_done() reads ->exp_tasks and
> > > > > 	sees that it is NULL.
> > > > > 
> > > > > o	CPU 1 blocks within an RCU read-side critical section, so
> > > > > 	it enqueues the task and points ->exp_tasks at it and
> > > > > 	clears CPU 1's bit in ->expmask.
> > > > > 
> > > > > o	All other CPUs clear their bits in ->expmask.
> > > > > 
> > > > > o	CPU 0 reads ->expmask, sees that it is zero, so incorrectly
> > > > > 	concludes that all quiescent states have completed, despite
> > > > > 	the fact that ->exp_tasks is non-NULL.
> > > > > 
> > > > > So it seems to me that the lock is needed.  Good catch!!!  The problem
> > > > > would occur only if the task running on CPU 0 received a spurious
> > > > > wakeup, but that could potentially happen.
> > > > 
> > > > Thanks for the analysis ;-)
> > 
> > The other limitation is that it occurs only on systems small enough
> > to have a single-node rcu_node tree.  But still...
> > 
> > > > > If lock contention becomes a problem, memory-ordering tricks could be
> > > > > applied, but the lock is of course simpler.
> > > > > 
> > > > 
> > > > Agreed.
> > > > 
> > > > > I am guessing that this is a prototype patch, and that you are planning
> > > > 
> > > > Yes, this is a prototype. And I'm preparing a proper patch to send
> > > > later.
> > 
> > Very good, thank you!
> > 
> > > > > to add lockdep annotations in more places, but either way please let
> > > > > me know.
> > > > 
> > > > Given it's a bug as per your analysis, I'd like to defer other lockdep
> > > > annotations and send this first. However, I'm currently getting other
> > > > lockdep splats after applying this, so I need to get that sorted first.
> > > 
> > > Hmm... the other lockdep splat seems unrelated to my patch; I can
> > > observe it on mainline using rcutorture with CONFIG_PROVE_LOCKING=y. I'll
> > > spend some more time on it and, in the meantime, send a proper patch for
> > > this sync_rcu_preempt_exp_done() issue.
> > 
> > I am not seeing that one, but am very interested in getting it fixed!  ;-)
> 
> Found the root cause, and sent out the patch ;-)

Very good!  Still not sure why I don't see it, but as long as it is fixed!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] rcu: exp: Fix "must hold exp_mutex" comments for QS reporting functions
  2018-03-09 20:17               ` Paul E. McKenney
@ 2018-03-12  5:28                 ` Boqun Feng
  2018-03-13  0:15                   ` Paul E. McKenney
  0 siblings, 1 reply; 11+ messages in thread
From: Boqun Feng @ 2018-03-12  5:28 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan

On Fri, Mar 09, 2018 at 12:17:07PM -0800, Paul E. McKenney wrote:
> On Fri, Mar 09, 2018 at 02:57:00PM +0800, Boqun Feng wrote:
> > On Thu, Mar 08, 2018 at 07:42:55AM -0800, Paul E. McKenney wrote:
> > > On Thu, Mar 08, 2018 at 04:30:06PM +0800, Boqun Feng wrote:
> > > > On Thu, Mar 08, 2018 at 12:54:29PM +0800, Boqun Feng wrote:
> > > > > On Wed, Mar 07, 2018 at 08:30:17PM -0800, Paul E. McKenney wrote:
> > > > > [...]
> > > > > > >  
> > > > > > > +/*
> > > > > > > + * Like sync_rcu_preempt_exp_done(), but this function assumes the caller
> > > > > > > + * doesn't hold the rcu_node's ->lock, and will acquire and release the lock
> > > > > > > + * itself
> > > > > > > + */
> > > > > > > +static bool sync_rcu_preempt_exp_done_unlocked(struct rcu_node *rnp)
> > > > > > > +{
> > > > > > > +	unsigned long flags;
> > > > > > > +	bool ret;
> > > > > > > +
> > > > > > > +	raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > > > > > > +	ret = sync_rcu_preempt_exp_done(rnp);
> > > > > > 
> > > > > > Let's see...  The sync_rcu_preempt_exp_done() function checks the
> > > > > > ->exp_tasks pointer and the ->expmask bitmask.  The number of bits in the
> > > > > > mask can only decrease, and the ->exp_tasks pointer can only transition
> > > > > > from NULL to non-NULL when there is at least one bit set.  However,
> > > > > > there is no ordering in sync_rcu_preempt_exp_done(), so it is possible
> > > > > > that it could be fooled without the lock:
> > > > > > 
> > > > > > o	CPU 0 in sync_rcu_preempt_exp_done() reads ->exp_tasks and
> > > > > > 	sees that it is NULL.
> > > > > > 
> > > > > > o	CPU 1 blocks within an RCU read-side critical section, so
> > > > > > 	it enqueues the task and points ->exp_tasks at it and
> > > > > > 	clears CPU 1's bit in ->expmask.
> > > > > > 
> > > > > > o	All other CPUs clear their bits in ->expmask.
> > > > > > 
> > > > > > o	CPU 0 reads ->expmask, sees that it is zero, so incorrectly
> > > > > > 	concludes that all quiescent states have completed, despite
> > > > > > 	the fact that ->exp_tasks is non-NULL.
> > > > > > 
> > > > > > So it seems to me that the lock is needed.  Good catch!!!  The problem
> > > > > > would occur only if the task running on CPU 0 received a spurious
> > > > > > wakeup, but that could potentially happen.
> > > > > 
> > > > > Thanks for the analysis ;-)
> > > 
> > > The other limitation is that it occurs only on systems small enough
> > > to have a single-node rcu_node tree.  But still...
> > > 
> > > > > > If lock contention becomes a problem, memory-ordering tricks could be
> > > > > > applied, but the lock is of course simpler.
> > > > > > 
> > > > > 
> > > > > Agreed.
> > > > > 
> > > > > > I am guessing that this is a prototype patch, and that you are planning
> > > > > 
> > > > > Yes, this is a prototype. And I'm preparing a proper patch to send
> > > > > later.
> > > 
> > > Very good, thank you!
> > > 
> > > > > > to add lockdep annotations in more places, but either way please let
> > > > > > me know.
> > > > > 
> > > > > Given it's a bug as per your analysis, I'd like to defer other lockdep
> > > > > annotations and send this first. However, I'm currently getting other
> > > > > lockdep splats after applying this, so I need to get that sorted first.
> > > > 
> > > > Hmm... the other lockdep splat seems unrelated to my patch; I can
> > > > observe it on mainline using rcutorture with CONFIG_PROVE_LOCKING=y. I'll
> > > > spend some more time on it and, in the meantime, send a proper patch for
> > > > this sync_rcu_preempt_exp_done() issue.
> > > 
> > > I am not seeing that one, but am very interested in getting it fixed!  ;-)
> > 
> > Found the root cause, and sent out the patch ;-)
> 
> Very good!  Still not sure why I don't see it, but as long as it is fixed!
> 

One reason I could hit this is that I ran rcutorture with
CONFIG_PROVE_LOCKING=y; maybe you could consider adding that to your
rcutorture test suite?
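
For what it's worth, pulling PROVE_LOCKING into a run might look like the
following. This is a sketch only: the kvm.sh path and the --configs,
--kconfig, and --duration flags are assumptions about the rcutorture
scripts of that era, so check the script's --help output first.

```shell
# Hypothetical invocation from a kernel source tree; flags are assumptions.
tools/testing/selftests/rcutorture/bin/kvm.sh \
	--configs "TREE03" \
	--kconfig "CONFIG_PROVE_LOCKING=y" \
	--duration 30
```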

Regards,
Boqun

> 							Thanx, Paul
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] rcu: exp: Fix "must hold exp_mutex" comments for QS reporting functions
  2018-03-12  5:28                 ` Boqun Feng
@ 2018-03-13  0:15                   ` Paul E. McKenney
  0 siblings, 0 replies; 11+ messages in thread
From: Paul E. McKenney @ 2018-03-13  0:15 UTC (permalink / raw)
  To: Boqun Feng
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan

On Mon, Mar 12, 2018 at 01:28:38PM +0800, Boqun Feng wrote:
> On Fri, Mar 09, 2018 at 12:17:07PM -0800, Paul E. McKenney wrote:
> > On Fri, Mar 09, 2018 at 02:57:00PM +0800, Boqun Feng wrote:
> > > On Thu, Mar 08, 2018 at 07:42:55AM -0800, Paul E. McKenney wrote:
> > > > On Thu, Mar 08, 2018 at 04:30:06PM +0800, Boqun Feng wrote:
> > > > > On Thu, Mar 08, 2018 at 12:54:29PM +0800, Boqun Feng wrote:
> > > > > > On Wed, Mar 07, 2018 at 08:30:17PM -0800, Paul E. McKenney wrote:
> > > > > > [...]
> > > > > > > >  
> > > > > > > > +/*
> > > > > > > > + * Like sync_rcu_preempt_exp_done(), but this function assumes the caller
> > > > > > > > + * doesn't hold the rcu_node's ->lock, and will acquire and release the lock
> > > > > > > > + * itself
> > > > > > > > + */
> > > > > > > > +static bool sync_rcu_preempt_exp_done_unlocked(struct rcu_node *rnp)
> > > > > > > > +{
> > > > > > > > +	unsigned long flags;
> > > > > > > > +	bool ret;
> > > > > > > > +
> > > > > > > > +	raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > > > > > > > +	ret = sync_rcu_preempt_exp_done(rnp);
> > > > > > > 
> > > > > > > Let's see...  The sync_rcu_preempt_exp_done() function checks the
> > > > > > > ->exp_tasks pointer and the ->expmask bitmask.  The number of bits in the
> > > > > > > mask can only decrease, and the ->exp_tasks pointer can only transition
> > > > > > > from NULL to non-NULL when there is at least one bit set.  However,
> > > > > > > there is no ordering in sync_rcu_preempt_exp_done(), so it is possible
> > > > > > > that it could be fooled without the lock:
> > > > > > > 
> > > > > > > o	CPU 0 in sync_rcu_preempt_exp_done() reads ->exp_tasks and
> > > > > > > 	sees that it is NULL.
> > > > > > > 
> > > > > > > o	CPU 1 blocks within an RCU read-side critical section, so
> > > > > > > 	it enqueues the task and points ->exp_tasks at it and
> > > > > > > 	clears CPU 1's bit in ->expmask.
> > > > > > > 
> > > > > > > o	All other CPUs clear their bits in ->expmask.
> > > > > > > 
> > > > > > > o	CPU 0 reads ->expmask, sees that it is zero, so incorrectly
> > > > > > > 	concludes that all quiescent states have completed, despite
> > > > > > > 	the fact that ->exp_tasks is non-NULL.
> > > > > > > 
> > > > > > > So it seems to me that the lock is needed.  Good catch!!!  The problem
> > > > > > > would occur only if the task running on CPU 0 received a spurious
> > > > > > > wakeup, but that could potentially happen.
> > > > > > 
> > > > > > Thanks for the analysis ;-)
> > > > 
> > > > The other limitation is that it occurs only on systems small enough
> > > > to have a single-node rcu_node tree.  But still...
> > > > 
> > > > > > > If lock contention becomes a problem, memory-ordering tricks could be
> > > > > > > applied, but the lock is of course simpler.
> > > > > > > 
> > > > > > 
> > > > > > Agreed.
> > > > > > 
> > > > > > > I am guessing that this is a prototype patch, and that you are planning
> > > > > > 
> > > > > > Yes, this is a prototype. And I'm preparing a proper patch to send
> > > > > > later.
> > > > 
> > > > Very good, thank you!
> > > > 
> > > > > > > to add lockdep annotations in more places, but either way please let
> > > > > > > me know.
> > > > > > 
> > > > > > Given it's a bug as per your analysis, I'd like to defer other lockdep
> > > > > > annotations and send this first. However, I'm currently getting other
> > > > > > lockdep splats after applying this, so I need to get that sorted first.
> > > > > 
> > > > > Hmm... the other lockdep splat seems unrelated to my patch; I can
> > > > > observe it on mainline using rcutorture with CONFIG_PROVE_LOCKING=y. I'll
> > > > > spend some more time on it and, in the meantime, send a proper patch for
> > > > > this sync_rcu_preempt_exp_done() issue.
> > > > 
> > > > I am not seeing that one, but am very interested in getting it fixed!  ;-)
> > > 
> > > Found the root cause, and sent out the patch ;-)
> > 
> > Very good!  Still not sure why I don't see it, but as long as it is fixed!
> > 
> 
> One reason I could hit this is that I ran rcutorture with
> CONFIG_PROVE_LOCKING=y; maybe you could consider adding that to your
> rcutorture test suite?

I also need to test without lockdep, but you are right that I should
certainly do a full-up lockdep run once in a while!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2018-03-13  0:15 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-07  8:49 [PATCH] rcu: exp: Fix "must hold exp_mutex" comments for QS reporting functions Boqun Feng
2018-03-07 15:48 ` Paul E. McKenney
2018-03-08  3:46   ` Boqun Feng
2018-03-08  4:30     ` Paul E. McKenney
2018-03-08  4:54       ` Boqun Feng
2018-03-08  8:30         ` Boqun Feng
2018-03-08 15:42           ` Paul E. McKenney
2018-03-09  6:57             ` Boqun Feng
2018-03-09 20:17               ` Paul E. McKenney
2018-03-12  5:28                 ` Boqun Feng
2018-03-13  0:15                   ` Paul E. McKenney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).