RCU Archive on lore.kernel.org
* [PATCH tip/core/rcu 0/30] Miscellaneous fixes for v5.7
@ 2020-02-14 23:55 Paul E. McKenney
  2020-02-14 23:55 ` [PATCH tip/core/rcu 01/30] nfs: Fix nfs_access_get_cached_rcu() sparse error paulmck
                   ` (29 more replies)
  0 siblings, 30 replies; 46+ messages in thread
From: Paul E. McKenney @ 2020-02-14 23:55 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel

Hello!

This series provides miscellaneous fixes.

1.	Fix nfs_access_get_cached_rcu() sparse error, courtesy of
	Madhuparna Bhowmik.

2.	Warn on for_each_leaf_node_cpu_mask() from non-leaf rcu_node
	structure.

3.	Fix exp_funnel_lock()/rcu_exp_wait_wake() datarace.

4.	Provide debug symbols and line numbers in KCSAN runs.

5.	Add WRITE_ONCE() to rcu_node ->qsmask update.

6.	Add WRITE_ONCE to rcu_node ->exp_seq_rq store.

7.	Add READ_ONCE() to rcu_node ->gp_seq.

8.	Add WRITE_ONCE() to rcu_state ->gp_req_activity.

9.	Add WRITE_ONCE() to rcu_node ->qsmaskinitnext.

10.	Add WRITE_ONCE() to rt_mutex ->owner.

11.	Add READ_ONCE() to rcu_segcblist ->tails[].

12.	*_ONCE() for grace-period progress indicators.

13.	Fix typos in beginning comments, courtesy of SeongJae Park.

14.	Add READ_ONCE() to rcu_data ->gpwrap.

15.	Add *_ONCE() to rcu_data ->rcu_forced_tick.

16.	Add *_ONCE() to rcu_node ->boost_kthread_status.

17.	Use hlist_unhashed_lockless() in timer_pending(), courtesy of
	Eric Dumazet.

18.	Remove dead code from rcu_segcblist_insert_pend_cbs().

19.	Add WRITE_ONCE() to rcu_state ->gp_start.

20.	Fix rcu_barrier_callback() race condition.

21.	Add brackets around cond argument in __list_check_rcu macro,
	courtesy of Amol Grover.

22.	Don't flag non-starting GPs before GP kthread is running.

23.	Add missing annotation for rcu_nocb_bypass_lock(), courtesy
	of Jules Irenge.

24.	Add missing annotation for rcu_nocb_bypass_unlock(), courtesy
	of Jules Irenge.

25.	Optimize and protect atomic_cmpxchg() loop.

26.	Tighten rcu_lockdep_assert_cblist_protected() check.

27.	Make nocb_gp_wait() double-check unexpected-callback warning.

28.	Mark rcu_state.ncpus to detect concurrent writes.

29.	Mark rcu_state.gp_seq to detect concurrent writes.

30.	Make rcu_barrier() account for offline no-CBs CPUs.

							Thanx, Paul

------------------------------------------------------------------------

 fs/nfs/dir.c               |    2 
 include/linux/rculist.h    |    4 -
 include/linux/timer.h      |    2 
 include/trace/events/rcu.h |    1 
 kernel/locking/rtmutex.c   |    2 
 kernel/rcu/Makefile        |    4 +
 kernel/rcu/rcu.h           |    6 +-
 kernel/rcu/rcu_segcblist.c |    4 -
 kernel/rcu/srcutree.c      |    2 
 kernel/rcu/tree.c          |  134 +++++++++++++++++++++++++--------------------
 kernel/rcu/tree_exp.h      |    4 -
 kernel/rcu/tree_plugin.h   |   21 ++++---
 kernel/rcu/tree_stall.h    |   41 +++++++------
 kernel/time/timer.c        |    7 +-
 14 files changed, 135 insertions(+), 99 deletions(-)

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH tip/core/rcu 01/30] nfs: Fix nfs_access_get_cached_rcu() sparse error
  2020-02-14 23:55 [PATCH tip/core/rcu 0/30] Miscellaneous fixes for v5.7 Paul E. McKenney
@ 2020-02-14 23:55 ` paulmck
  2020-02-14 23:55 ` [PATCH tip/core/rcu 02/30] rcu: Warn on for_each_leaf_node_cpu_mask() from non-leaf paulmck
                   ` (28 subsequent siblings)
  29 siblings, 0 replies; 46+ messages in thread
From: paulmck @ 2020-02-14 23:55 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Madhuparna Bhowmik,
	Paul E . McKenney

From: Madhuparna Bhowmik <madhuparnabhowmik04@gmail.com>

This patch fixes the following sparse error:
fs/nfs/dir.c:2353:14: error: incompatible types in comparison expression (different address spaces):
fs/nfs/dir.c:2353:14:    struct list_head [noderef] <asn:4> *
fs/nfs/dir.c:2353:14:    struct list_head *

Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik04@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 fs/nfs/dir.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 1320288..55a29b0 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -2383,7 +2383,7 @@ static int nfs_access_get_cached_rcu(struct inode *inode, const struct cred *cre
 	rcu_read_lock();
 	if (nfsi->cache_validity & NFS_INO_INVALID_ACCESS)
 		goto out;
-	lh = rcu_dereference(nfsi->access_cache_entry_lru.prev);
+	lh = rcu_dereference(list_tail_rcu(&nfsi->access_cache_entry_lru));
 	cache = list_entry(lh, struct nfs_access_entry, lru);
 	if (lh == &nfsi->access_cache_entry_lru ||
 	    cred_fscmp(cred, cache->cred) != 0)
-- 
2.9.5



* [PATCH tip/core/rcu 02/30] rcu: Warn on for_each_leaf_node_cpu_mask() from non-leaf
  2020-02-14 23:55 [PATCH tip/core/rcu 0/30] Miscellaneous fixes for v5.7 Paul E. McKenney
  2020-02-14 23:55 ` [PATCH tip/core/rcu 01/30] nfs: Fix nfs_access_get_cached_rcu() sparse error paulmck
@ 2020-02-14 23:55 ` paulmck
  2020-02-14 23:55 ` [PATCH tip/core/rcu 03/30] rcu: Fix exp_funnel_lock()/rcu_exp_wait_wake() datarace paulmck
                   ` (27 subsequent siblings)
  29 siblings, 0 replies; 46+ messages in thread
From: paulmck @ 2020-02-14 23:55 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

The for_each_leaf_node_cpu_mask() and for_each_leaf_node_possible_cpu()
macros must be invoked only on leaf rcu_node structures.  Failing to
abide by this restriction can result in infinite loops on systems with
more than 64 CPUs (or for more than 32 CPUs on 32-bit systems).  This
commit therefore adds WARN_ON_ONCE() calls to make misuse of these two
macros easier to debug.

Reported-by: Qian Cai <cai@lca.pw>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/rcu.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 05f936e..f6ce173 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -325,7 +325,8 @@ static inline void rcu_init_levelspread(int *levelspread, const int *levelcnt)
  * Iterate over all possible CPUs in a leaf RCU node.
  */
 #define for_each_leaf_node_possible_cpu(rnp, cpu) \
-	for ((cpu) = cpumask_next((rnp)->grplo - 1, cpu_possible_mask); \
+	for (WARN_ON_ONCE(!rcu_is_leaf_node(rnp)), \
+	     (cpu) = cpumask_next((rnp)->grplo - 1, cpu_possible_mask); \
 	     (cpu) <= rnp->grphi; \
 	     (cpu) = cpumask_next((cpu), cpu_possible_mask))
 
@@ -335,7 +336,8 @@ static inline void rcu_init_levelspread(int *levelspread, const int *levelcnt)
 #define rcu_find_next_bit(rnp, cpu, mask) \
 	((rnp)->grplo + find_next_bit(&(mask), BITS_PER_LONG, (cpu)))
 #define for_each_leaf_node_cpu_mask(rnp, cpu, mask) \
-	for ((cpu) = rcu_find_next_bit((rnp), 0, (mask)); \
+	for (WARN_ON_ONCE(!rcu_is_leaf_node(rnp)), \
+	     (cpu) = rcu_find_next_bit((rnp), 0, (mask)); \
 	     (cpu) <= rnp->grphi; \
 	     (cpu) = rcu_find_next_bit((rnp), (cpu) + 1 - (rnp->grplo), (mask)))
 
-- 
2.9.5



* [PATCH tip/core/rcu 03/30] rcu: Fix exp_funnel_lock()/rcu_exp_wait_wake() datarace
  2020-02-14 23:55 [PATCH tip/core/rcu 0/30] Miscellaneous fixes for v5.7 Paul E. McKenney
  2020-02-14 23:55 ` [PATCH tip/core/rcu 01/30] nfs: Fix nfs_access_get_cached_rcu() sparse error paulmck
  2020-02-14 23:55 ` [PATCH tip/core/rcu 02/30] rcu: Warn on for_each_leaf_node_cpu_mask() from non-leaf paulmck
@ 2020-02-14 23:55 ` paulmck
  2020-02-14 23:55 ` [PATCH tip/core/rcu 04/30] rcu: Provide debug symbols and line numbers in KCSAN runs paulmck
                   ` (26 subsequent siblings)
  29 siblings, 0 replies; 46+ messages in thread
From: paulmck @ 2020-02-14 23:55 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

The rcu_node structure's ->exp_seq_rq field is accessed locklessly, so
updates must use WRITE_ONCE().  This commit therefore adds the needed
WRITE_ONCE() invocation where it was missed.

This data race was reported by KCSAN.  Not appropriate for backporting
due to failure being unlikely.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree_exp.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index dcbd757..d7e0484 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -589,7 +589,7 @@ static void rcu_exp_wait_wake(unsigned long s)
 			spin_lock(&rnp->exp_lock);
 			/* Recheck, avoid hang in case someone just arrived. */
 			if (ULONG_CMP_LT(rnp->exp_seq_rq, s))
-				rnp->exp_seq_rq = s;
+				WRITE_ONCE(rnp->exp_seq_rq, s);
 			spin_unlock(&rnp->exp_lock);
 		}
 		smp_mb(); /* All above changes before wakeup. */
-- 
2.9.5



* [PATCH tip/core/rcu 04/30] rcu: Provide debug symbols and line numbers in KCSAN runs
  2020-02-14 23:55 [PATCH tip/core/rcu 0/30] Miscellaneous fixes for v5.7 Paul E. McKenney
                   ` (2 preceding siblings ...)
  2020-02-14 23:55 ` [PATCH tip/core/rcu 03/30] rcu: Fix exp_funnel_lock()/rcu_exp_wait_wake() datarace paulmck
@ 2020-02-14 23:55 ` paulmck
  2020-02-14 23:55 ` [PATCH tip/core/rcu 05/30] rcu: Add WRITE_ONCE() to rcu_node ->qsmask update paulmck
                   ` (25 subsequent siblings)
  29 siblings, 0 replies; 46+ messages in thread
From: paulmck @ 2020-02-14 23:55 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

This commit adds "-g -fno-omit-frame-pointer" to ease interpretation
of KCSAN output, but only for CONFIG_KCSAN=y kernels.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/Makefile | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/rcu/Makefile b/kernel/rcu/Makefile
index 82d5fba..f91f2c2 100644
--- a/kernel/rcu/Makefile
+++ b/kernel/rcu/Makefile
@@ -3,6 +3,10 @@
 # and is generally not a function of system call inputs.
 KCOV_INSTRUMENT := n
 
+ifeq ($(CONFIG_KCSAN),y)
+KBUILD_CFLAGS += -g -fno-omit-frame-pointer
+endif
+
 obj-y += update.o sync.o
 obj-$(CONFIG_TREE_SRCU) += srcutree.o
 obj-$(CONFIG_TINY_SRCU) += srcutiny.o
-- 
2.9.5



* [PATCH tip/core/rcu 05/30] rcu: Add WRITE_ONCE() to rcu_node ->qsmask update
  2020-02-14 23:55 [PATCH tip/core/rcu 0/30] Miscellaneous fixes for v5.7 Paul E. McKenney
                   ` (3 preceding siblings ...)
  2020-02-14 23:55 ` [PATCH tip/core/rcu 04/30] rcu: Provide debug symbols and line numbers in KCSAN runs paulmck
@ 2020-02-14 23:55 ` paulmck
  2020-02-14 23:55 ` [PATCH tip/core/rcu 06/30] rcu: Add WRITE_ONCE to rcu_node ->exp_seq_rq store paulmck
                   ` (24 subsequent siblings)
  29 siblings, 0 replies; 46+ messages in thread
From: paulmck @ 2020-02-14 23:55 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

The rcu_node structure's ->qsmask field is read locklessly, so this
commit adds the WRITE_ONCE() to an update in order to provide proper
documentation and READ_ONCE()/WRITE_ONCE() pairing.

This data race was reported by KCSAN.  Not appropriate for backporting
due to failure being unlikely.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index d91c915..bb57c24 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1881,7 +1881,7 @@ static void rcu_report_qs_rnp(unsigned long mask, struct rcu_node *rnp,
 		WARN_ON_ONCE(oldmask); /* Any child must be all zeroed! */
 		WARN_ON_ONCE(!rcu_is_leaf_node(rnp) &&
 			     rcu_preempt_blocked_readers_cgp(rnp));
-		rnp->qsmask &= ~mask;
+		WRITE_ONCE(rnp->qsmask, rnp->qsmask & ~mask);
 		trace_rcu_quiescent_state_report(rcu_state.name, rnp->gp_seq,
 						 mask, rnp->qsmask, rnp->level,
 						 rnp->grplo, rnp->grphi,
-- 
2.9.5



* [PATCH tip/core/rcu 06/30] rcu: Add WRITE_ONCE to rcu_node ->exp_seq_rq store
  2020-02-14 23:55 [PATCH tip/core/rcu 0/30] Miscellaneous fixes for v5.7 Paul E. McKenney
                   ` (4 preceding siblings ...)
  2020-02-14 23:55 ` [PATCH tip/core/rcu 05/30] rcu: Add WRITE_ONCE() to rcu_node ->qsmask update paulmck
@ 2020-02-14 23:55 ` paulmck
  2020-02-15  3:47   ` Steven Rostedt
  2020-02-14 23:55 ` [PATCH tip/core/rcu 07/30] rcu: Add READ_ONCE() to rcu_node ->gp_seq paulmck
                   ` (23 subsequent siblings)
  29 siblings, 1 reply; 46+ messages in thread
From: paulmck @ 2020-02-14 23:55 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

The rcu_node structure's ->exp_seq_rq field is read locklessly, so
this commit adds the WRITE_ONCE() to a store in order to provide proper
documentation and READ_ONCE()/WRITE_ONCE() pairing.

This data race was reported by KCSAN.  Not appropriate for backporting
due to failure being unlikely.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree_exp.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index d7e0484..85b009e 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -314,7 +314,7 @@ static bool exp_funnel_lock(unsigned long s)
 				   sync_exp_work_done(s));
 			return true;
 		}
-		rnp->exp_seq_rq = s; /* Followers can wait on us. */
+		WRITE_ONCE(rnp->exp_seq_rq, s); /* Followers can wait on us. */
 		spin_unlock(&rnp->exp_lock);
 		trace_rcu_exp_funnel_lock(rcu_state.name, rnp->level,
 					  rnp->grplo, rnp->grphi, TPS("nxtlvl"));
-- 
2.9.5



* [PATCH tip/core/rcu 07/30] rcu: Add READ_ONCE() to rcu_node ->gp_seq
  2020-02-14 23:55 [PATCH tip/core/rcu 0/30] Miscellaneous fixes for v5.7 Paul E. McKenney
                   ` (5 preceding siblings ...)
  2020-02-14 23:55 ` [PATCH tip/core/rcu 06/30] rcu: Add WRITE_ONCE to rcu_node ->exp_seq_rq store paulmck
@ 2020-02-14 23:55 ` paulmck
  2020-02-14 23:55 ` [PATCH tip/core/rcu 08/30] rcu: Add WRITE_ONCE() to rcu_state ->gp_req_activity paulmck
                   ` (22 subsequent siblings)
  29 siblings, 0 replies; 46+ messages in thread
From: paulmck @ 2020-02-14 23:55 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

The rcu_node structure's ->gp_seq field is read locklessly, so this
commit adds the READ_ONCE() to several loads in order to avoid
destructive compiler optimizations.

This data race was reported by KCSAN.  Not appropriate for backporting
because this affects only tracing and warnings.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index bb57c24..236692d 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1126,8 +1126,9 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
 static void trace_rcu_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
 			      unsigned long gp_seq_req, const char *s)
 {
-	trace_rcu_future_grace_period(rcu_state.name, rnp->gp_seq, gp_seq_req,
-				      rnp->level, rnp->grplo, rnp->grphi, s);
+	trace_rcu_future_grace_period(rcu_state.name, READ_ONCE(rnp->gp_seq),
+				      gp_seq_req, rnp->level,
+				      rnp->grplo, rnp->grphi, s);
 }
 
 /*
@@ -1904,7 +1905,7 @@ static void rcu_report_qs_rnp(unsigned long mask, struct rcu_node *rnp,
 		rnp_c = rnp;
 		rnp = rnp->parent;
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
-		oldmask = rnp_c->qsmask;
+		oldmask = READ_ONCE(rnp_c->qsmask);
 	}
 
 	/*
@@ -2052,7 +2053,7 @@ int rcutree_dying_cpu(unsigned int cpu)
 		return 0;
 
 	blkd = !!(rnp->qsmask & rdp->grpmask);
-	trace_rcu_grace_period(rcu_state.name, rnp->gp_seq,
+	trace_rcu_grace_period(rcu_state.name, READ_ONCE(rnp->gp_seq),
 			       blkd ? TPS("cpuofl") : TPS("cpuofl-bgp"));
 	return 0;
 }
-- 
2.9.5



* [PATCH tip/core/rcu 08/30] rcu: Add WRITE_ONCE() to rcu_state ->gp_req_activity
  2020-02-14 23:55 [PATCH tip/core/rcu 0/30] Miscellaneous fixes for v5.7 Paul E. McKenney
                   ` (6 preceding siblings ...)
  2020-02-14 23:55 ` [PATCH tip/core/rcu 07/30] rcu: Add READ_ONCE() to rcu_node ->gp_seq paulmck
@ 2020-02-14 23:55 ` paulmck
  2020-02-14 23:55 ` [PATCH tip/core/rcu 09/30] rcu: Add WRITE_ONCE() to rcu_node ->qsmaskinitnext paulmck
                   ` (21 subsequent siblings)
  29 siblings, 0 replies; 46+ messages in thread
From: paulmck @ 2020-02-14 23:55 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

The rcu_state structure's ->gp_req_activity field is read locklessly,
so this commit adds the WRITE_ONCE() to an update in order to provide
proper documentation and READ_ONCE()/WRITE_ONCE() pairing.

This data race was reported by KCSAN.  Not appropriate for backporting
due to failure being unlikely.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 236692d..9443909 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1200,7 +1200,7 @@ static bool rcu_start_this_gp(struct rcu_node *rnp_start, struct rcu_data *rdp,
 	}
 	trace_rcu_this_gp(rnp, rdp, gp_seq_req, TPS("Startedroot"));
 	WRITE_ONCE(rcu_state.gp_flags, rcu_state.gp_flags | RCU_GP_FLAG_INIT);
-	rcu_state.gp_req_activity = jiffies;
+	WRITE_ONCE(rcu_state.gp_req_activity, jiffies);
 	if (!rcu_state.gp_kthread) {
 		trace_rcu_this_gp(rnp, rdp, gp_seq_req, TPS("NoGPkthread"));
 		goto unlock_out;
@@ -1775,7 +1775,7 @@ static void rcu_gp_cleanup(void)
 		    rcu_segcblist_is_offloaded(&rdp->cblist);
 	if ((offloaded || !rcu_accelerate_cbs(rnp, rdp)) && needgp) {
 		WRITE_ONCE(rcu_state.gp_flags, RCU_GP_FLAG_INIT);
-		rcu_state.gp_req_activity = jiffies;
+		WRITE_ONCE(rcu_state.gp_req_activity, jiffies);
 		trace_rcu_grace_period(rcu_state.name,
 				       READ_ONCE(rcu_state.gp_seq),
 				       TPS("newreq"));
-- 
2.9.5



* [PATCH tip/core/rcu 09/30] rcu: Add WRITE_ONCE() to rcu_node ->qsmaskinitnext
  2020-02-14 23:55 [PATCH tip/core/rcu 0/30] Miscellaneous fixes for v5.7 Paul E. McKenney
                   ` (7 preceding siblings ...)
  2020-02-14 23:55 ` [PATCH tip/core/rcu 08/30] rcu: Add WRITE_ONCE() to rcu_state ->gp_req_activity paulmck
@ 2020-02-14 23:55 ` paulmck
  2020-02-14 23:55 ` [PATCH tip/core/rcu 10/30] locking/rtmutex: rcu: Add WRITE_ONCE() to rt_mutex ->owner paulmck
                   ` (20 subsequent siblings)
  29 siblings, 0 replies; 46+ messages in thread
From: paulmck @ 2020-02-14 23:55 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

The rcu_state structure's ->qsmaskinitnext field is read locklessly,
so this commit adds the WRITE_ONCE() to an update in order to provide
proper documentation and READ_ONCE()/WRITE_ONCE() pairing.

This data race was reported by KCSAN.  Not appropriate for backporting
due to failure being unlikely for systems not doing incessant CPU-hotplug
operations.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 9443909..346321a 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3379,7 +3379,7 @@ void rcu_cpu_starting(unsigned int cpu)
 	rnp = rdp->mynode;
 	mask = rdp->grpmask;
 	raw_spin_lock_irqsave_rcu_node(rnp, flags);
-	rnp->qsmaskinitnext |= mask;
+	WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext | mask);
 	oldmask = rnp->expmaskinitnext;
 	rnp->expmaskinitnext |= mask;
 	oldmask ^= rnp->expmaskinitnext;
@@ -3432,7 +3432,7 @@ void rcu_report_dead(unsigned int cpu)
 		rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	}
-	rnp->qsmaskinitnext &= ~mask;
+	WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext & ~mask);
 	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 	raw_spin_unlock(&rcu_state.ofl_lock);
 
-- 
2.9.5



* [PATCH tip/core/rcu 10/30] locking/rtmutex: rcu: Add WRITE_ONCE() to rt_mutex ->owner
  2020-02-14 23:55 [PATCH tip/core/rcu 0/30] Miscellaneous fixes for v5.7 Paul E. McKenney
                   ` (8 preceding siblings ...)
  2020-02-14 23:55 ` [PATCH tip/core/rcu 09/30] rcu: Add WRITE_ONCE() to rcu_node ->qsmaskinitnext paulmck
@ 2020-02-14 23:55 ` paulmck
  2020-02-14 23:55 ` [PATCH tip/core/rcu 11/30] rcu: Add READ_ONCE() to rcu_segcblist ->tails[] paulmck
                   ` (19 subsequent siblings)
  29 siblings, 0 replies; 46+ messages in thread
From: paulmck @ 2020-02-14 23:55 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney, Ingo Molnar,
	Will Deacon

From: "Paul E. McKenney" <paulmck@kernel.org>

The rt_mutex structure's ->owner field is read locklessly, so this
commit adds the WRITE_ONCE() to an update in order to provide proper
documentation and READ_ONCE()/WRITE_ONCE() pairing.

This data race was reported by KCSAN.  Not appropriate for backporting
due to failure being unlikely.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Will Deacon <will@kernel.org>
---
 kernel/locking/rtmutex.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 851bbb1..c9f090d 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -57,7 +57,7 @@ rt_mutex_set_owner(struct rt_mutex *lock, struct task_struct *owner)
 	if (rt_mutex_has_waiters(lock))
 		val |= RT_MUTEX_HAS_WAITERS;
 
-	lock->owner = (struct task_struct *)val;
+	WRITE_ONCE(lock->owner, (struct task_struct *)val);
 }
 
 static inline void clear_rt_mutex_waiters(struct rt_mutex *lock)
-- 
2.9.5



* [PATCH tip/core/rcu 11/30] rcu: Add READ_ONCE() to rcu_segcblist ->tails[]
  2020-02-14 23:55 [PATCH tip/core/rcu 0/30] Miscellaneous fixes for v5.7 Paul E. McKenney
                   ` (9 preceding siblings ...)
  2020-02-14 23:55 ` [PATCH tip/core/rcu 10/30] locking/rtmutex: rcu: Add WRITE_ONCE() to rt_mutex ->owner paulmck
@ 2020-02-14 23:55 ` paulmck
  2020-02-14 23:55 ` [PATCH tip/core/rcu 12/30] rcu: *_ONCE() for grace-period progress indicators paulmck
                   ` (18 subsequent siblings)
  29 siblings, 0 replies; 46+ messages in thread
From: paulmck @ 2020-02-14 23:55 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

The rcu_segcblist structure's ->tails[] array entries are read
locklessly, so this commit adds the READ_ONCE() to a load in order to
avoid destructive compiler optimizations.

This data race was reported by KCSAN.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/rcu_segcblist.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index 5f4fd3b..426a472 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -182,7 +182,7 @@ void rcu_segcblist_offload(struct rcu_segcblist *rsclp)
 bool rcu_segcblist_ready_cbs(struct rcu_segcblist *rsclp)
 {
 	return rcu_segcblist_is_enabled(rsclp) &&
-	       &rsclp->head != rsclp->tails[RCU_DONE_TAIL];
+	       &rsclp->head != READ_ONCE(rsclp->tails[RCU_DONE_TAIL]);
 }
 
 /*
-- 
2.9.5



* [PATCH tip/core/rcu 12/30] rcu: *_ONCE() for grace-period progress indicators
  2020-02-14 23:55 [PATCH tip/core/rcu 0/30] Miscellaneous fixes for v5.7 Paul E. McKenney
                   ` (10 preceding siblings ...)
  2020-02-14 23:55 ` [PATCH tip/core/rcu 11/30] rcu: Add READ_ONCE() to rcu_segcblist ->tails[] paulmck
@ 2020-02-14 23:55 ` paulmck
  2020-02-14 23:55 ` [PATCH tip/core/rcu 13/30] rcu: Fix typos in beginning comments paulmck
                   ` (17 subsequent siblings)
  29 siblings, 0 replies; 46+ messages in thread
From: paulmck @ 2020-02-14 23:55 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

The various RCU structures' ->gp_seq, ->gp_seq_needed, ->gp_req_activity,
and ->gp_activity fields are read locklessly, so they must be updated with
WRITE_ONCE() and, when read locklessly, with READ_ONCE().  This commit makes
these changes.

This data race was reported by KCSAN.  Not appropriate for backporting
due to failure being unlikely.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree.c        | 14 +++++++-------
 kernel/rcu/tree_plugin.h |  2 +-
 kernel/rcu/tree_stall.h  | 26 +++++++++++++++-----------
 3 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 346321a..53946b1 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1175,7 +1175,7 @@ static bool rcu_start_this_gp(struct rcu_node *rnp_start, struct rcu_data *rdp,
 					  TPS("Prestarted"));
 			goto unlock_out;
 		}
-		rnp->gp_seq_needed = gp_seq_req;
+		WRITE_ONCE(rnp->gp_seq_needed, gp_seq_req);
 		if (rcu_seq_state(rcu_seq_current(&rnp->gp_seq))) {
 			/*
 			 * We just marked the leaf or internal node, and a
@@ -1210,8 +1210,8 @@ static bool rcu_start_this_gp(struct rcu_node *rnp_start, struct rcu_data *rdp,
 unlock_out:
 	/* Push furthest requested GP to leaf node and rcu_data structure. */
 	if (ULONG_CMP_LT(gp_seq_req, rnp->gp_seq_needed)) {
-		rnp_start->gp_seq_needed = rnp->gp_seq_needed;
-		rdp->gp_seq_needed = rnp->gp_seq_needed;
+		WRITE_ONCE(rnp_start->gp_seq_needed, rnp->gp_seq_needed);
+		WRITE_ONCE(rdp->gp_seq_needed, rnp->gp_seq_needed);
 	}
 	if (rnp != rnp_start)
 		raw_spin_unlock_rcu_node(rnp);
@@ -1423,7 +1423,7 @@ static bool __note_gp_changes(struct rcu_node *rnp, struct rcu_data *rdp)
 	}
 	rdp->gp_seq = rnp->gp_seq;  /* Remember new grace-period state. */
 	if (ULONG_CMP_LT(rdp->gp_seq_needed, rnp->gp_seq_needed) || rdp->gpwrap)
-		rdp->gp_seq_needed = rnp->gp_seq_needed;
+		WRITE_ONCE(rdp->gp_seq_needed, rnp->gp_seq_needed);
 	WRITE_ONCE(rdp->gpwrap, false);
 	rcu_gpnum_ovf(rnp, rdp);
 	return ret;
@@ -3276,12 +3276,12 @@ int rcutree_prepare_cpu(unsigned int cpu)
 	rnp = rdp->mynode;
 	raw_spin_lock_rcu_node(rnp);		/* irqs already disabled. */
 	rdp->beenonline = true;	 /* We have now been online. */
-	rdp->gp_seq = rnp->gp_seq;
-	rdp->gp_seq_needed = rnp->gp_seq;
+	rdp->gp_seq = READ_ONCE(rnp->gp_seq);
+	rdp->gp_seq_needed = rdp->gp_seq;
 	rdp->cpu_no_qs.b.norm = true;
 	rdp->core_needs_qs = false;
 	rdp->rcu_iw_pending = false;
-	rdp->rcu_iw_gp_seq = rnp->gp_seq - 1;
+	rdp->rcu_iw_gp_seq = rdp->gp_seq - 1;
 	trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("cpuonl"));
 	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 	rcu_prepare_kthreads(cpu);
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index c6ea81c..b5ba148 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -753,7 +753,7 @@ dump_blkd_tasks(struct rcu_node *rnp, int ncheck)
 	raw_lockdep_assert_held_rcu_node(rnp);
 	pr_info("%s: grp: %d-%d level: %d ->gp_seq %ld ->completedqs %ld\n",
 		__func__, rnp->grplo, rnp->grphi, rnp->level,
-		(long)rnp->gp_seq, (long)rnp->completedqs);
+		(long)READ_ONCE(rnp->gp_seq), (long)rnp->completedqs);
 	for (rnp1 = rnp; rnp1; rnp1 = rnp1->parent)
 		pr_info("%s: %d:%d ->qsmask %#lx ->qsmaskinit %#lx ->qsmaskinitnext %#lx\n",
 			__func__, rnp1->grplo, rnp1->grphi, rnp1->qsmask, rnp1->qsmaskinit, rnp1->qsmaskinitnext);
diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index 55f9b84..3dcbecf 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -592,21 +592,22 @@ void show_rcu_gp_kthreads(void)
 		(long)READ_ONCE(rcu_get_root()->gp_seq_needed),
 		READ_ONCE(rcu_state.gp_flags));
 	rcu_for_each_node_breadth_first(rnp) {
-		if (ULONG_CMP_GE(rcu_state.gp_seq, rnp->gp_seq_needed))
+		if (ULONG_CMP_GE(READ_ONCE(rcu_state.gp_seq),
+				 READ_ONCE(rnp->gp_seq_needed)))
 			continue;
 		pr_info("\trcu_node %d:%d ->gp_seq %ld ->gp_seq_needed %ld\n",
-			rnp->grplo, rnp->grphi, (long)rnp->gp_seq,
-			(long)rnp->gp_seq_needed);
+			rnp->grplo, rnp->grphi, (long)READ_ONCE(rnp->gp_seq),
+			(long)READ_ONCE(rnp->gp_seq_needed));
 		if (!rcu_is_leaf_node(rnp))
 			continue;
 		for_each_leaf_node_possible_cpu(rnp, cpu) {
 			rdp = per_cpu_ptr(&rcu_data, cpu);
 			if (rdp->gpwrap ||
-			    ULONG_CMP_GE(rcu_state.gp_seq,
-					 rdp->gp_seq_needed))
+			    ULONG_CMP_GE(READ_ONCE(rcu_state.gp_seq),
+					 READ_ONCE(rdp->gp_seq_needed)))
 				continue;
 			pr_info("\tcpu %d ->gp_seq_needed %ld\n",
-				cpu, (long)rdp->gp_seq_needed);
+				cpu, (long)READ_ONCE(rdp->gp_seq_needed));
 		}
 	}
 	for_each_possible_cpu(cpu) {
@@ -631,7 +632,8 @@ static void rcu_check_gp_start_stall(struct rcu_node *rnp, struct rcu_data *rdp,
 	static atomic_t warned = ATOMIC_INIT(0);
 
 	if (!IS_ENABLED(CONFIG_PROVE_RCU) || rcu_gp_in_progress() ||
-	    ULONG_CMP_GE(rnp_root->gp_seq, rnp_root->gp_seq_needed))
+	    ULONG_CMP_GE(READ_ONCE(rnp_root->gp_seq),
+	    		 READ_ONCE(rnp_root->gp_seq_needed)))
 		return;
 	j = jiffies; /* Expensive access, and in common case don't get here. */
 	if (time_before(j, READ_ONCE(rcu_state.gp_req_activity) + gpssdelay) ||
@@ -642,7 +644,8 @@ static void rcu_check_gp_start_stall(struct rcu_node *rnp, struct rcu_data *rdp,
 	raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	j = jiffies;
 	if (rcu_gp_in_progress() ||
-	    ULONG_CMP_GE(rnp_root->gp_seq, rnp_root->gp_seq_needed) ||
+	    ULONG_CMP_GE(READ_ONCE(rnp_root->gp_seq),
+	    		 READ_ONCE(rnp_root->gp_seq_needed)) ||
 	    time_before(j, READ_ONCE(rcu_state.gp_req_activity) + gpssdelay) ||
 	    time_before(j, READ_ONCE(rcu_state.gp_activity) + gpssdelay) ||
 	    atomic_read(&warned)) {
@@ -655,9 +658,10 @@ static void rcu_check_gp_start_stall(struct rcu_node *rnp, struct rcu_data *rdp,
 		raw_spin_lock_rcu_node(rnp_root); /* irqs already disabled. */
 	j = jiffies;
 	if (rcu_gp_in_progress() ||
-	    ULONG_CMP_GE(rnp_root->gp_seq, rnp_root->gp_seq_needed) ||
-	    time_before(j, rcu_state.gp_req_activity + gpssdelay) ||
-	    time_before(j, rcu_state.gp_activity + gpssdelay) ||
+	    ULONG_CMP_GE(READ_ONCE(rnp_root->gp_seq),
+	    		 READ_ONCE(rnp_root->gp_seq_needed)) ||
+	    time_before(j, READ_ONCE(rcu_state.gp_req_activity) + gpssdelay) ||
+	    time_before(j, READ_ONCE(rcu_state.gp_activity) + gpssdelay) ||
 	    atomic_xchg(&warned, 1)) {
 		if (rnp_root != rnp)
 			/* irqs remain disabled. */
-- 
2.9.5


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH tip/core/rcu 13/30] rcu: Fix typos in beginning comments
  2020-02-14 23:55 [PATCH tip/core/rcu 0/30] Miscellaneous fixes for v5.7 Paul E. McKenney
                   ` (11 preceding siblings ...)
  2020-02-14 23:55 ` [PATCH tip/core/rcu 12/30] rcu: *_ONCE() for grace-period progress indicators paulmck
@ 2020-02-14 23:55 ` paulmck
  2020-02-14 23:55 ` [PATCH tip/core/rcu 14/30] rcu: Add READ_ONCE() to rcu_data ->gpwrap paulmck
                   ` (16 subsequent siblings)
  29 siblings, 0 replies; 46+ messages in thread
From: paulmck @ 2020-02-14 23:55 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, SeongJae Park, Paul E . McKenney

From: SeongJae Park <sjpark@amazon.de>

Signed-off-by: SeongJae Park <sjpark@amazon.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/srcutree.c | 2 +-
 kernel/rcu/tree.c     | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 657e6a7..7ddb29c 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -5,7 +5,7 @@
  * Copyright (C) IBM Corporation, 2006
  * Copyright (C) Fujitsu, 2012
  *
- * Author: Paul McKenney <paulmck@linux.ibm.com>
+ * Authors: Paul McKenney <paulmck@linux.ibm.com>
  *	   Lai Jiangshan <laijs@cn.fujitsu.com>
  *
  * For detailed explanation of Read-Copy Update mechanism see -
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 53946b1..a70f56b 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1,12 +1,12 @@
 // SPDX-License-Identifier: GPL-2.0+
 /*
- * Read-Copy Update mechanism for mutual exclusion
+ * Read-Copy Update mechanism for mutual exclusion (tree-based version)
  *
  * Copyright IBM Corporation, 2008
  *
  * Authors: Dipankar Sarma <dipankar@in.ibm.com>
  *	    Manfred Spraul <manfred@colorfullife.com>
- *	    Paul E. McKenney <paulmck@linux.ibm.com> Hierarchical version
+ *	    Paul E. McKenney <paulmck@linux.ibm.com>
  *
  * Based on the original work by Paul McKenney <paulmck@linux.ibm.com>
  * and inputs from Rusty Russell, Andrea Arcangeli and Andi Kleen.
-- 
2.9.5


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH tip/core/rcu 14/30] rcu: Add READ_ONCE() to rcu_data ->gpwrap
  2020-02-14 23:55 [PATCH tip/core/rcu 0/30] Miscellaneous fixes for v5.7 Paul E. McKenney
                   ` (12 preceding siblings ...)
  2020-02-14 23:55 ` [PATCH tip/core/rcu 13/30] rcu: Fix typos in beginning comments paulmck
@ 2020-02-14 23:55 ` paulmck
  2020-02-14 23:55 ` [PATCH tip/core/rcu 15/30] rcu: Add *_ONCE() to rcu_data ->rcu_forced_tick paulmck
                   ` (15 subsequent siblings)
  29 siblings, 0 replies; 46+ messages in thread
From: paulmck @ 2020-02-14 23:55 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

The rcu_data structure's ->gpwrap field is read locklessly, and so
this commit adds the required READ_ONCE() to a pair of loads in order
to avoid destructive compiler optimizations.

This data race was reported by KCSAN.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree.c       | 2 +-
 kernel/rcu/tree_stall.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index a70f56b..e851a12 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1322,7 +1322,7 @@ static void rcu_accelerate_cbs_unlocked(struct rcu_node *rnp,
 
 	rcu_lockdep_assert_cblist_protected(rdp);
 	c = rcu_seq_snap(&rcu_state.gp_seq);
-	if (!rdp->gpwrap && ULONG_CMP_GE(rdp->gp_seq_needed, c)) {
+	if (!READ_ONCE(rdp->gpwrap) && ULONG_CMP_GE(rdp->gp_seq_needed, c)) {
 		/* Old request still live, so mark recent callbacks. */
 		(void)rcu_segcblist_accelerate(&rdp->cblist, c);
 		return;
diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index 3dcbecf..3275f27 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -602,7 +602,7 @@ void show_rcu_gp_kthreads(void)
 			continue;
 		for_each_leaf_node_possible_cpu(rnp, cpu) {
 			rdp = per_cpu_ptr(&rcu_data, cpu);
-			if (rdp->gpwrap ||
+			if (READ_ONCE(rdp->gpwrap) ||
 			    ULONG_CMP_GE(READ_ONCE(rcu_state.gp_seq),
 					 READ_ONCE(rdp->gp_seq_needed)))
 				continue;
-- 
2.9.5


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH tip/core/rcu 15/30] rcu: Add *_ONCE() to rcu_data ->rcu_forced_tick
  2020-02-14 23:55 [PATCH tip/core/rcu 0/30] Miscellaneous fixes for v5.7 Paul E. McKenney
                   ` (13 preceding siblings ...)
  2020-02-14 23:55 ` [PATCH tip/core/rcu 14/30] rcu: Add READ_ONCE() to rcu_data ->gpwrap paulmck
@ 2020-02-14 23:55 ` paulmck
  2020-02-14 23:55 ` [PATCH tip/core/rcu 16/30] rcu: Add *_ONCE() to rcu_node ->boost_kthread_status paulmck
                   ` (14 subsequent siblings)
  29 siblings, 0 replies; 46+ messages in thread
From: paulmck @ 2020-02-14 23:55 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

The rcu_data structure's ->rcu_forced_tick field is read locklessly, so
this commit adds WRITE_ONCE() to all updates and READ_ONCE() to all
lockless reads.

This data race was reported by KCSAN.  Not appropriate for backporting
due to failure being unlikely.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index e851a12..be59a5d 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -818,11 +818,12 @@ static __always_inline void rcu_nmi_enter_common(bool irq)
 		incby = 1;
 	} else if (tick_nohz_full_cpu(rdp->cpu) &&
 		   rdp->dynticks_nmi_nesting == DYNTICK_IRQ_NONIDLE &&
-		   READ_ONCE(rdp->rcu_urgent_qs) && !rdp->rcu_forced_tick) {
+		   READ_ONCE(rdp->rcu_urgent_qs) &&
+		   !READ_ONCE(rdp->rcu_forced_tick)) {
 		raw_spin_lock_rcu_node(rdp->mynode);
 		// Recheck under lock.
 		if (rdp->rcu_urgent_qs && !rdp->rcu_forced_tick) {
-			rdp->rcu_forced_tick = true;
+			WRITE_ONCE(rdp->rcu_forced_tick, true);
 			tick_dep_set_cpu(rdp->cpu, TICK_DEP_BIT_RCU);
 		}
 		raw_spin_unlock_rcu_node(rdp->mynode);
@@ -899,7 +900,7 @@ static void rcu_disable_urgency_upon_qs(struct rcu_data *rdp)
 	WRITE_ONCE(rdp->rcu_need_heavy_qs, false);
 	if (tick_nohz_full_cpu(rdp->cpu) && rdp->rcu_forced_tick) {
 		tick_dep_clear_cpu(rdp->cpu, TICK_DEP_BIT_RCU);
-		rdp->rcu_forced_tick = false;
+		WRITE_ONCE(rdp->rcu_forced_tick, false);
 	}
 }
 
-- 
2.9.5


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH tip/core/rcu 16/30] rcu: Add *_ONCE() to rcu_node ->boost_kthread_status
  2020-02-14 23:55 [PATCH tip/core/rcu 0/30] Miscellaneous fixes for v5.7 Paul E. McKenney
                   ` (14 preceding siblings ...)
  2020-02-14 23:55 ` [PATCH tip/core/rcu 15/30] rcu: Add *_ONCE() to rcu_data ->rcu_forced_tick paulmck
@ 2020-02-14 23:55 ` paulmck
  2020-02-14 23:55 ` [PATCH tip/core/rcu 17/30] timer: Use hlist_unhashed_lockless() in timer_pending() paulmck
                   ` (13 subsequent siblings)
  29 siblings, 0 replies; 46+ messages in thread
From: paulmck @ 2020-02-14 23:55 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

The rcu_node structure's ->boost_kthread_status field is accessed
locklessly, so this commit causes all updates to use WRITE_ONCE() and
all reads to use READ_ONCE().

This data race was reported by KCSAN.  Not appropriate for backporting
due to failure being unlikely.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree_plugin.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index b5ba148..0f8b714 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1032,18 +1032,18 @@ static int rcu_boost_kthread(void *arg)
 
 	trace_rcu_utilization(TPS("Start boost kthread@init"));
 	for (;;) {
-		rnp->boost_kthread_status = RCU_KTHREAD_WAITING;
+		WRITE_ONCE(rnp->boost_kthread_status, RCU_KTHREAD_WAITING);
 		trace_rcu_utilization(TPS("End boost kthread@rcu_wait"));
 		rcu_wait(rnp->boost_tasks || rnp->exp_tasks);
 		trace_rcu_utilization(TPS("Start boost kthread@rcu_wait"));
-		rnp->boost_kthread_status = RCU_KTHREAD_RUNNING;
+		WRITE_ONCE(rnp->boost_kthread_status, RCU_KTHREAD_RUNNING);
 		more2boost = rcu_boost(rnp);
 		if (more2boost)
 			spincnt++;
 		else
 			spincnt = 0;
 		if (spincnt > 10) {
-			rnp->boost_kthread_status = RCU_KTHREAD_YIELDING;
+			WRITE_ONCE(rnp->boost_kthread_status, RCU_KTHREAD_YIELDING);
 			trace_rcu_utilization(TPS("End boost kthread@rcu_yield"));
 			schedule_timeout_interruptible(2);
 			trace_rcu_utilization(TPS("Start boost kthread@rcu_yield"));
@@ -1082,7 +1082,7 @@ static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags)
 			rnp->boost_tasks = rnp->gp_tasks;
 		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 		rcu_wake_cond(rnp->boost_kthread_task,
-			      rnp->boost_kthread_status);
+			      READ_ONCE(rnp->boost_kthread_status));
 	} else {
 		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 	}
-- 
2.9.5


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH tip/core/rcu 17/30] timer: Use hlist_unhashed_lockless() in timer_pending()
  2020-02-14 23:55 [PATCH tip/core/rcu 0/30] Miscellaneous fixes for v5.7 Paul E. McKenney
                   ` (15 preceding siblings ...)
  2020-02-14 23:55 ` [PATCH tip/core/rcu 16/30] rcu: Add *_ONCE() to rcu_node ->boost_kthread_status paulmck
@ 2020-02-14 23:55 ` paulmck
  2020-02-14 23:55 ` [PATCH tip/core/rcu 18/30] rcu: Remove dead code from rcu_segcblist_insert_pend_cbs() paulmck
                   ` (12 subsequent siblings)
  29 siblings, 0 replies; 46+ messages in thread
From: paulmck @ 2020-02-14 23:55 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E . McKenney

From: Eric Dumazet <edumazet@google.com>

The timer_pending() function is mostly used in lockless contexts, so
without proper annotations KCSAN might detect a data race [1].

Using hlist_unhashed_lockless() instead of hand-coding it seems
appropriate (as suggested by Paul E. McKenney).

[1]

BUG: KCSAN: data-race in del_timer / detach_if_pending

write to 0xffff88808697d870 of 8 bytes by task 10 on cpu 0:
 __hlist_del include/linux/list.h:764 [inline]
 detach_timer kernel/time/timer.c:815 [inline]
 detach_if_pending+0xcd/0x2d0 kernel/time/timer.c:832
 try_to_del_timer_sync+0x60/0xb0 kernel/time/timer.c:1226
 del_timer_sync+0x6b/0xa0 kernel/time/timer.c:1365
 schedule_timeout+0x2d2/0x6e0 kernel/time/timer.c:1896
 rcu_gp_fqs_loop+0x37c/0x580 kernel/rcu/tree.c:1639
 rcu_gp_kthread+0x143/0x230 kernel/rcu/tree.c:1799
 kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352

read to 0xffff88808697d870 of 8 bytes by task 12060 on cpu 1:
 del_timer+0x3b/0xb0 kernel/time/timer.c:1198
 sk_stop_timer+0x25/0x60 net/core/sock.c:2845
 inet_csk_clear_xmit_timers+0x69/0xa0 net/ipv4/inet_connection_sock.c:523
 tcp_clear_xmit_timers include/net/tcp.h:606 [inline]
 tcp_v4_destroy_sock+0xa3/0x3f0 net/ipv4/tcp_ipv4.c:2096
 inet_csk_destroy_sock+0xf4/0x250 net/ipv4/inet_connection_sock.c:836
 tcp_close+0x6f3/0x970 net/ipv4/tcp.c:2497
 inet_release+0x86/0x100 net/ipv4/af_inet.c:427
 __sock_release+0x85/0x160 net/socket.c:590
 sock_close+0x24/0x30 net/socket.c:1268
 __fput+0x1e1/0x520 fs/file_table.c:280
 ____fput+0x1f/0x30 fs/file_table.c:313
 task_work_run+0xf6/0x130 kernel/task_work.c:113
 tracehook_notify_resume include/linux/tracehook.h:188 [inline]
 exit_to_usermode_loop+0x2b4/0x2c0 arch/x86/entry/common.c:163

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 12060 Comm: syz-executor.5 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine,

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
[ paulmck: Pulled in Eric's later amendments. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 include/linux/timer.h | 2 +-
 kernel/time/timer.c   | 7 ++++---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/linux/timer.h b/include/linux/timer.h
index 1e6650e..0dc19a8 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -164,7 +164,7 @@ static inline void destroy_timer_on_stack(struct timer_list *timer) { }
  */
 static inline int timer_pending(const struct timer_list * timer)
 {
-	return timer->entry.pprev != NULL;
+	return !hlist_unhashed_lockless(&timer->entry);
 }
 
 extern void add_timer_on(struct timer_list *timer, int cpu);
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 4820823..568564a 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -944,6 +944,7 @@ static struct timer_base *lock_timer_base(struct timer_list *timer,
 
 #define MOD_TIMER_PENDING_ONLY		0x01
 #define MOD_TIMER_REDUCE		0x02
+#define MOD_TIMER_NOTPENDING		0x04
 
 static inline int
 __mod_timer(struct timer_list *timer, unsigned long expires, unsigned int options)
@@ -960,7 +961,7 @@ __mod_timer(struct timer_list *timer, unsigned long expires, unsigned int option
 	 * the timer is re-modified to have the same timeout or ends up in the
 	 * same array bucket then just return:
 	 */
-	if (timer_pending(timer)) {
+	if (!(options & MOD_TIMER_NOTPENDING) && timer_pending(timer)) {
 		/*
 		 * The downside of this optimization is that it can result in
 		 * larger granularity than you would get from adding a new
@@ -1133,7 +1134,7 @@ EXPORT_SYMBOL(timer_reduce);
 void add_timer(struct timer_list *timer)
 {
 	BUG_ON(timer_pending(timer));
-	mod_timer(timer, timer->expires);
+	__mod_timer(timer, timer->expires, MOD_TIMER_NOTPENDING);
 }
 EXPORT_SYMBOL(add_timer);
 
@@ -1891,7 +1892,7 @@ signed long __sched schedule_timeout(signed long timeout)
 
 	timer.task = current;
 	timer_setup_on_stack(&timer.timer, process_timeout, 0);
-	__mod_timer(&timer.timer, expire, 0);
+	__mod_timer(&timer.timer, expire, MOD_TIMER_NOTPENDING);
 	schedule();
 	del_singleshot_timer_sync(&timer.timer);
 
-- 
2.9.5


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH tip/core/rcu 18/30] rcu: Remove dead code from rcu_segcblist_insert_pend_cbs()
  2020-02-14 23:55 [PATCH tip/core/rcu 0/30] Miscellaneous fixes for v5.7 Paul E. McKenney
                   ` (16 preceding siblings ...)
  2020-02-14 23:55 ` [PATCH tip/core/rcu 17/30] timer: Use hlist_unhashed_lockless() in timer_pending() paulmck
@ 2020-02-14 23:55 ` paulmck
  2020-02-14 23:55 ` [PATCH tip/core/rcu 19/30] rcu: Add WRITE_ONCE() to rcu_state ->gp_start paulmck
                   ` (11 subsequent siblings)
  29 siblings, 0 replies; 46+ messages in thread
From: paulmck @ 2020-02-14 23:55 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

The rcu_segcblist_insert_pend_cbs() function currently (partially)
initializes the rcu_cblist that it pulls callbacks from.  However, all
the resulting stores are dead because all callers pass in the address of
an on-stack cblist that is not used afterwards.  This commit therefore
removes this pointless initialization.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/rcu_segcblist.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index 426a472..9a0f661 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -381,8 +381,6 @@ void rcu_segcblist_insert_pend_cbs(struct rcu_segcblist *rsclp,
 		return; /* Nothing to do. */
 	WRITE_ONCE(*rsclp->tails[RCU_NEXT_TAIL], rclp->head);
 	WRITE_ONCE(rsclp->tails[RCU_NEXT_TAIL], rclp->tail);
-	rclp->head = NULL;
-	rclp->tail = &rclp->head;
 }
 
 /*
-- 
2.9.5


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH tip/core/rcu 19/30] rcu: Add WRITE_ONCE() to rcu_state ->gp_start
  2020-02-14 23:55 [PATCH tip/core/rcu 0/30] Miscellaneous fixes for v5.7 Paul E. McKenney
                   ` (17 preceding siblings ...)
  2020-02-14 23:55 ` [PATCH tip/core/rcu 18/30] rcu: Remove dead code from rcu_segcblist_insert_pend_cbs() paulmck
@ 2020-02-14 23:55 ` paulmck
  2020-02-14 23:55 ` [PATCH tip/core/rcu 20/30] rcu: Fix rcu_barrier_callback() race condition paulmck
                   ` (10 subsequent siblings)
  29 siblings, 0 replies; 46+ messages in thread
From: paulmck @ 2020-02-14 23:55 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

The rcu_state structure's ->gp_start field is read locklessly, so this
commit adds the WRITE_ONCE() to an update in order to provide proper
documentation and READ_ONCE()/WRITE_ONCE() pairing.

This data race was reported by KCSAN.  Not appropriate for backporting
due to failure being unlikely.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree_stall.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index 3275f27..56df88e 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -102,7 +102,7 @@ static void record_gp_stall_check_time(void)
 	unsigned long j = jiffies;
 	unsigned long j1;
 
-	rcu_state.gp_start = j;
+	WRITE_ONCE(rcu_state.gp_start, j);
 	j1 = rcu_jiffies_till_stall_check();
 	/* Record ->gp_start before ->jiffies_stall. */
 	smp_store_release(&rcu_state.jiffies_stall, j + j1); /* ^^^ */
-- 
2.9.5


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH tip/core/rcu 20/30] rcu: Fix rcu_barrier_callback() race condition
  2020-02-14 23:55 [PATCH tip/core/rcu 0/30] Miscellaneous fixes for v5.7 Paul E. McKenney
                   ` (18 preceding siblings ...)
  2020-02-14 23:55 ` [PATCH tip/core/rcu 19/30] rcu: Add WRITE_ONCE() to rcu_state ->gp_start paulmck
@ 2020-02-14 23:55 ` paulmck
  2020-02-14 23:55 ` [PATCH tip/core/rcu 21/30] rculist: Add brackets around cond argument in __list_check_rcu macro paulmck
                   ` (9 subsequent siblings)
  29 siblings, 0 replies; 46+ messages in thread
From: paulmck @ 2020-02-14 23:55 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

The rcu_barrier_callback() function does an atomic_dec_and_test(), and
if it is the last CPU to check in, does the required wakeup.  Either way,
it does an event trace.  Unfortunately, this is susceptible to the
following sequence of events:

o	CPU 0 invokes rcu_barrier_callback(), but atomic_dec_and_test()
	says that it is not last.  But at this point, CPU 0 is delayed,
	perhaps due to an NMI, SMI, or vCPU preemption.

o	CPU 1 invokes rcu_barrier_callback(), and atomic_dec_and_test()
	says that it is last.  So CPU 1 traces completion and does
	the needed wakeup.

o	The awakened rcu_barrier() function does cleanup and releases
	rcu_state.barrier_mutex.

o	Another CPU now acquires rcu_state.barrier_mutex and starts
	another round of rcu_barrier() processing, including updating
	rcu_state.barrier_sequence.

o	CPU 0 gets its act back together and does its tracing.  Except
	that rcu_state.barrier_sequence has already been updated, so
	its tracing is incorrect and probably quite confusing.
	(Wait!  Why did this CPU check in twice for one rcu_barrier()
	invocation???)

This commit therefore causes rcu_barrier_callback() to take a
snapshot of the value of rcu_state.barrier_sequence before invoking
atomic_dec_and_test(), thus guaranteeing that the event-trace output
is sensible, even if the timing of the event-trace output might still
be confusing.  (Wait!  Why did the old rcu_barrier() complete before
all of its CPUs checked in???)  But being that this is RCU, only so much
confusion can reasonably be eliminated.

This data race was reported by KCSAN.  Not appropriate for backporting
due to failure being unlikely and due to the mild consequences of the
failure, namely a confusing event trace.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index be59a5d..62383ce 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3077,15 +3077,22 @@ static void rcu_barrier_trace(const char *s, int cpu, unsigned long done)
 /*
  * RCU callback function for rcu_barrier().  If we are last, wake
  * up the task executing rcu_barrier().
+ *
+ * Note that the value of rcu_state.barrier_sequence must be captured
+ * before the atomic_dec_and_test().  Otherwise, if this CPU is not last,
+ * other CPUs might count the value down to zero before this CPU gets
+ * around to invoking rcu_barrier_trace(), which might result in bogus
+ * data from the next instance of rcu_barrier().
  */
 static void rcu_barrier_callback(struct rcu_head *rhp)
 {
+	unsigned long __maybe_unused s = rcu_state.barrier_sequence;
+
 	if (atomic_dec_and_test(&rcu_state.barrier_cpu_count)) {
-		rcu_barrier_trace(TPS("LastCB"), -1,
-				  rcu_state.barrier_sequence);
+		rcu_barrier_trace(TPS("LastCB"), -1, s);
 		complete(&rcu_state.barrier_completion);
 	} else {
-		rcu_barrier_trace(TPS("CB"), -1, rcu_state.barrier_sequence);
+		rcu_barrier_trace(TPS("CB"), -1, s);
 	}
 }
 
-- 
2.9.5


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH tip/core/rcu 21/30] rculist: Add brackets around cond argument in __list_check_rcu macro
  2020-02-14 23:55 [PATCH tip/core/rcu 0/30] Miscellaneous fixes for v5.7 Paul E. McKenney
                   ` (19 preceding siblings ...)
  2020-02-14 23:55 ` [PATCH tip/core/rcu 20/30] rcu: Fix rcu_barrier_callback() race condition paulmck
@ 2020-02-14 23:55 ` paulmck
  2020-02-14 23:55 ` [PATCH tip/core/rcu 22/30] rcu: Don't flag non-starting GPs before GP kthread is running paulmck
                   ` (8 subsequent siblings)
  29 siblings, 0 replies; 46+ messages in thread
From: paulmck @ 2020-02-14 23:55 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Amol Grover, Paul E . McKenney

From: Amol Grover <frextrite@gmail.com>

Passing a complex lockdep condition to __list_check_rcu results
in a false-positive lockdep splat due to incorrect expression
evaluation.

For example, a lockdep check condition `cond1 || cond2` is
evaluated as `!cond1 || cond2 && !rcu_read_lock_any_held()`
which, according to operator precedence, evaluates to
`!cond1 || (cond2 && !rcu_read_lock_any_held())`.
This would result in a lockdep splat when cond1 is false
and cond2 is true, which is logically incorrect.

Signed-off-by: Amol Grover <frextrite@gmail.com>
Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 include/linux/rculist.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/rculist.h b/include/linux/rculist.h
index 9f313e4..8214cdc 100644
--- a/include/linux/rculist.h
+++ b/include/linux/rculist.h
@@ -60,9 +60,9 @@ static inline void INIT_LIST_HEAD_RCU(struct list_head *list)
 #define __list_check_rcu(dummy, cond, extra...)				\
 	({								\
 	check_arg_count_one(extra);					\
-	RCU_LOCKDEP_WARN(!cond && !rcu_read_lock_any_held(),		\
+	RCU_LOCKDEP_WARN(!(cond) && !rcu_read_lock_any_held(),		\
 			 "RCU-list traversed in non-reader section!");	\
-	 })
+	})
 #else
 #define __list_check_rcu(dummy, cond, extra...)				\
 	({ check_arg_count_one(extra); })
-- 
2.9.5


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH tip/core/rcu 22/30] rcu: Don't flag non-starting GPs before GP kthread is running
  2020-02-14 23:55 [PATCH tip/core/rcu 0/30] Miscellaneous fixes for v5.7 Paul E. McKenney
                   ` (20 preceding siblings ...)
  2020-02-14 23:55 ` [PATCH tip/core/rcu 21/30] rculist: Add brackets around cond argument in __list_check_rcu macro paulmck
@ 2020-02-14 23:55 ` paulmck
  2020-02-15  3:53   ` Steven Rostedt
  2020-02-14 23:56 ` [PATCH tip/core/rcu 23/30] rcu: Add missing annotation for rcu_nocb_bypass_lock() paulmck
                   ` (7 subsequent siblings)
  29 siblings, 1 reply; 46+ messages in thread
From: paulmck @ 2020-02-14 23:55 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

Currently rcu_check_gp_start_stall() complains if a grace period takes
too long to start, where "too long" is roughly one RCU CPU stall-warning
interval.  This has worked well, but there are some debugging Kconfig
options (such as CONFIG_EFI_PGT_DUMP=y) that can make booting take a
very long time, so much so that the stall-warning interval has expired
before RCU's grace-period kthread has even been spawned.

This commit therefore resets the rcu_state.gp_req_activity and
rcu_state.gp_activity timestamps just before the grace-period kthread
is spawned, and modifies the checks and adds ordering to ensure that
if rcu_check_gp_start_stall() sees that the grace-period kthread
has been spawned, that it will also see the resets applied to the
rcu_state.gp_req_activity and rcu_state.gp_activity timestamps.

Reported-by: Qian Cai <cai@lca.pw>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
[ paulmck: Fix whitespace issues reported by Qian Cai. ]
Tested-by: Qian Cai <cai@lca.pw>
---
 kernel/rcu/tree.c       | 11 +++++++----
 kernel/rcu/tree_stall.h | 11 ++++++-----
 2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 62383ce..5ee5657 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1202,7 +1202,7 @@ static bool rcu_start_this_gp(struct rcu_node *rnp_start, struct rcu_data *rdp,
 	trace_rcu_this_gp(rnp, rdp, gp_seq_req, TPS("Startedroot"));
 	WRITE_ONCE(rcu_state.gp_flags, rcu_state.gp_flags | RCU_GP_FLAG_INIT);
 	WRITE_ONCE(rcu_state.gp_req_activity, jiffies);
-	if (!rcu_state.gp_kthread) {
+	if (!READ_ONCE(rcu_state.gp_kthread)) {
 		trace_rcu_this_gp(rnp, rdp, gp_seq_req, TPS("NoGPkthread"));
 		goto unlock_out;
 	}
@@ -1252,10 +1252,10 @@ static bool rcu_future_gp_cleanup(struct rcu_node *rnp)
  */
 static void rcu_gp_kthread_wake(void)
 {
-	if ((current == rcu_state.gp_kthread &&
+	if ((current == READ_ONCE(rcu_state.gp_kthread) &&
 	     !in_irq() && !in_serving_softirq()) ||
 	    !READ_ONCE(rcu_state.gp_flags) ||
-	    !rcu_state.gp_kthread)
+	    !READ_ONCE(rcu_state.gp_kthread))
 		return;
 	WRITE_ONCE(rcu_state.gp_wake_time, jiffies);
 	WRITE_ONCE(rcu_state.gp_wake_seq, READ_ONCE(rcu_state.gp_seq));
@@ -3554,7 +3554,10 @@ static int __init rcu_spawn_gp_kthread(void)
 	}
 	rnp = rcu_get_root();
 	raw_spin_lock_irqsave_rcu_node(rnp, flags);
-	rcu_state.gp_kthread = t;
+	WRITE_ONCE(rcu_state.gp_activity, jiffies);
+	WRITE_ONCE(rcu_state.gp_req_activity, jiffies);
+	// Reset .gp_activity and .gp_req_activity before setting .gp_kthread.
+	smp_store_release(&rcu_state.gp_kthread, t);  /* ^^^ */
 	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 	wake_up_process(t);
 	rcu_spawn_nocb_kthreads();
diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index 56df88e..4c216f7 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -578,6 +578,7 @@ void show_rcu_gp_kthreads(void)
 	unsigned long jw;
 	struct rcu_data *rdp;
 	struct rcu_node *rnp;
+	struct task_struct *t = READ_ONCE(rcu_state.gp_kthread);
 
 	j = jiffies;
 	ja = j - READ_ONCE(rcu_state.gp_activity);
@@ -585,8 +586,7 @@ void show_rcu_gp_kthreads(void)
 	jw = j - READ_ONCE(rcu_state.gp_wake_time);
 	pr_info("%s: wait state: %s(%d) ->state: %#lx delta ->gp_activity %lu ->gp_req_activity %lu ->gp_wake_time %lu ->gp_wake_seq %ld ->gp_seq %ld ->gp_seq_needed %ld ->gp_flags %#x\n",
 		rcu_state.name, gp_state_getname(rcu_state.gp_state),
-		rcu_state.gp_state,
-		rcu_state.gp_kthread ? rcu_state.gp_kthread->state : 0x1ffffL,
+		rcu_state.gp_state, t ? t->state : 0x1ffffL,
 		ja, jr, jw, (long)READ_ONCE(rcu_state.gp_wake_seq),
 		(long)READ_ONCE(rcu_state.gp_seq),
 		(long)READ_ONCE(rcu_get_root()->gp_seq_needed),
@@ -633,7 +633,8 @@ static void rcu_check_gp_start_stall(struct rcu_node *rnp, struct rcu_data *rdp,
 
 	if (!IS_ENABLED(CONFIG_PROVE_RCU) || rcu_gp_in_progress() ||
 	    ULONG_CMP_GE(READ_ONCE(rnp_root->gp_seq),
-	    		 READ_ONCE(rnp_root->gp_seq_needed)))
+			 READ_ONCE(rnp_root->gp_seq_needed)) ||
+	    !smp_load_acquire(&rcu_state.gp_kthread))
 		return;
 	j = jiffies; /* Expensive access, and in common case don't get here. */
 	if (time_before(j, READ_ONCE(rcu_state.gp_req_activity) + gpssdelay) ||
@@ -645,7 +646,7 @@ static void rcu_check_gp_start_stall(struct rcu_node *rnp, struct rcu_data *rdp,
 	j = jiffies;
 	if (rcu_gp_in_progress() ||
 	    ULONG_CMP_GE(READ_ONCE(rnp_root->gp_seq),
-	    		 READ_ONCE(rnp_root->gp_seq_needed)) ||
+			 READ_ONCE(rnp_root->gp_seq_needed)) ||
 	    time_before(j, READ_ONCE(rcu_state.gp_req_activity) + gpssdelay) ||
 	    time_before(j, READ_ONCE(rcu_state.gp_activity) + gpssdelay) ||
 	    atomic_read(&warned)) {
@@ -659,7 +660,7 @@ static void rcu_check_gp_start_stall(struct rcu_node *rnp, struct rcu_data *rdp,
 	j = jiffies;
 	if (rcu_gp_in_progress() ||
 	    ULONG_CMP_GE(READ_ONCE(rnp_root->gp_seq),
-	    		 READ_ONCE(rnp_root->gp_seq_needed)) ||
+			 READ_ONCE(rnp_root->gp_seq_needed)) ||
 	    time_before(j, READ_ONCE(rcu_state.gp_req_activity) + gpssdelay) ||
 	    time_before(j, READ_ONCE(rcu_state.gp_activity) + gpssdelay) ||
 	    atomic_xchg(&warned, 1)) {
-- 
2.9.5


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH tip/core/rcu 23/30] rcu: Add missing annotation for rcu_nocb_bypass_lock()
  2020-02-14 23:55 [PATCH tip/core/rcu 0/30] Miscellaneous fixes for v5.7 Paul E. McKenney
                   ` (21 preceding siblings ...)
  2020-02-14 23:55 ` [PATCH tip/core/rcu 22/30] rcu: Don't flag non-starting GPs before GP kthread is running paulmck
@ 2020-02-14 23:56 ` paulmck
  2020-02-14 23:56 ` [PATCH tip/core/rcu 24/30] rcu/nocb: Add missing annotation for rcu_nocb_bypass_unlock() paulmck
                   ` (6 subsequent siblings)
  29 siblings, 0 replies; 46+ messages in thread
From: paulmck @ 2020-02-14 23:56 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Jules Irenge, Paul E . McKenney

From: Jules Irenge <jbi.octave@gmail.com>

Sparse reports a warning at rcu_nocb_bypass_lock()

|warning: context imbalance in rcu_nocb_bypass_lock() - wrong count at exit

To fix this, this commit adds an __acquires(&rdp->nocb_bypass_lock)
annotation.  Given that rcu_nocb_bypass_lock() does actually call
raw_spin_lock() when raw_spin_trylock() fails, this not only fixes the
warning but also improves the readability of the code.

Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree_plugin.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 0f8b714..6db2cad 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1486,6 +1486,7 @@ module_param(nocb_nobypass_lim_per_jiffy, int, 0);
  * flag the contention.
  */
 static void rcu_nocb_bypass_lock(struct rcu_data *rdp)
+	__acquires(&rdp->nocb_bypass_lock)
 {
 	lockdep_assert_irqs_disabled();
 	if (raw_spin_trylock(&rdp->nocb_bypass_lock))
-- 
2.9.5


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH tip/core/rcu 24/30] rcu/nocb: Add missing annotation for rcu_nocb_bypass_unlock()
  2020-02-14 23:55 [PATCH tip/core/rcu 0/30] Miscellaneous fixes for v5.7 Paul E. McKenney
                   ` (22 preceding siblings ...)
  2020-02-14 23:56 ` [PATCH tip/core/rcu 23/30] rcu: Add missing annotation for rcu_nocb_bypass_lock() paulmck
@ 2020-02-14 23:56 ` paulmck
  2020-02-14 23:56 ` [PATCH tip/core/rcu 25/30] rcu: Optimize and protect atomic_cmpxchg() loop paulmck
                   ` (5 subsequent siblings)
  29 siblings, 0 replies; 46+ messages in thread
From: paulmck @ 2020-02-14 23:56 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Jules Irenge, Paul E . McKenney

From: Jules Irenge <jbi.octave@gmail.com>

Sparse reports a warning at rcu_nocb_bypass_unlock()

warning: context imbalance in rcu_nocb_bypass_unlock() - unexpected unlock

The root cause of the warning is a missing annotation on
rcu_nocb_bypass_unlock().

This commit therefore adds the missing __releases(&rdp->nocb_bypass_lock)
annotation.

Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Boqun Feng <boqun.feng@gmail.com>
---
 kernel/rcu/tree_plugin.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 6db2cad..3846519 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1530,6 +1530,7 @@ static bool rcu_nocb_bypass_trylock(struct rcu_data *rdp)
  * Release the specified rcu_data structure's ->nocb_bypass_lock.
  */
 static void rcu_nocb_bypass_unlock(struct rcu_data *rdp)
+	__releases(&rdp->nocb_bypass_lock)
 {
 	lockdep_assert_irqs_disabled();
 	raw_spin_unlock(&rdp->nocb_bypass_lock);
-- 
2.9.5


^ permalink raw reply	[flat|nested] 46+ messages in thread
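The two patches above add sparse lock-context annotations.  As a rough illustration of how such annotations work, here is a minimal user-space sketch: the `context` attribute is only meaningful when the code is run through sparse (which defines `__CHECKER__`); under an ordinary compiler the macros expand to nothing.  The flag-based "lock" and all names here are illustrative, not the kernel's.

```c
/*
 * Minimal user-space sketch of sparse's __acquires()/__releases()
 * lock-context annotations.  Under sparse (__CHECKER__ defined), the
 * context attribute lets the checker verify lock balance at function
 * entry and exit; otherwise the macros compile away.
 */
#ifdef __CHECKER__
# define __acquires(x)	__attribute__((context(x, 0, 1)))
# define __releases(x)	__attribute__((context(x, 1, 0)))
#else
# define __acquires(x)
# define __releases(x)
#endif

static int bypass_lock_held;	/* stand-in for ->nocb_bypass_lock */

static void bypass_lock(void) __acquires(bypass_lock_held)
{
	bypass_lock_held = 1;	/* acquire the (illustrative) lock */
}

static void bypass_unlock(void) __releases(bypass_lock_held)
{
	bypass_lock_held = 0;	/* release it again */
}
```

With the annotations in place, sparse no longer reports a context imbalance for a function that acquires (or releases) a lock it does not also release (or acquire).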

* [PATCH tip/core/rcu 25/30] rcu: Optimize and protect atomic_cmpxchg() loop
  2020-02-14 23:55 [PATCH tip/core/rcu 0/30] Miscellaneous fixes for v5.7 Paul E. McKenney
                   ` (23 preceding siblings ...)
  2020-02-14 23:56 ` [PATCH tip/core/rcu 24/30] rcu/nocb: Add missing annotation for rcu_nocb_bypass_unlock() paulmck
@ 2020-02-14 23:56 ` paulmck
  2020-02-14 23:56 ` [PATCH tip/core/rcu 26/30] rcu: Tighten rcu_lockdep_assert_cblist_protected() check paulmck
                   ` (4 subsequent siblings)
  29 siblings, 0 replies; 46+ messages in thread
From: paulmck @ 2020-02-14 23:56 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

This commit reworks the atomic_cmpxchg() loop in rcu_eqs_special_set()
to do only the initial read from the current CPU's rcu_data structure's
->dynticks field explicitly.  On subsequent passes, this value is instead
retained from the failing atomic_cmpxchg() operation.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 5ee5657..4146207 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -342,14 +342,17 @@ bool rcu_eqs_special_set(int cpu)
 {
 	int old;
 	int new;
+	int new_old;
 	struct rcu_data *rdp = &per_cpu(rcu_data, cpu);
 
+	new_old = atomic_read(&rdp->dynticks);
 	do {
-		old = atomic_read(&rdp->dynticks);
+		old = new_old;
 		if (old & RCU_DYNTICK_CTRL_CTR)
 			return false;
 		new = old | RCU_DYNTICK_CTRL_MASK;
-	} while (atomic_cmpxchg(&rdp->dynticks, old, new) != old);
+		new_old = atomic_cmpxchg(&rdp->dynticks, old, new);
+	} while (new_old != old);
 	return true;
 }
 
-- 
2.9.5


^ permalink raw reply	[flat|nested] 46+ messages in thread
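The retry idiom in the patch above, doing one explicit initial read and thereafter reusing the value returned by the failing compare-exchange, can be sketched in user-space C11.  The constants and function name below are stand-ins, not the kernel's definitions; C11's `atomic_compare_exchange_strong()` conveniently writes the observed value back into `old` on failure, so no re-read is needed.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative stand-ins for RCU_DYNTICK_CTRL_CTR/_MASK. */
#define CTRL_CTR	0x2
#define CTRL_MASK	0x1

/*
 * Analogue of the reworked rcu_eqs_special_set() loop: read the
 * counter explicitly only once, then on each failed compare-exchange
 * reuse the value it observed rather than re-reading.
 */
static bool eqs_special_set(atomic_int *dynticks)
{
	int old = atomic_load(dynticks);	/* sole explicit read */
	int new;

	do {
		if (old & CTRL_CTR)
			return false;		/* not in extended QS */
		new = old | CTRL_MASK;
		/* On failure, "old" is updated to the current value. */
	} while (!atomic_compare_exchange_strong(dynticks, &old, new));
	return true;
}
```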

* [PATCH tip/core/rcu 26/30] rcu: Tighten rcu_lockdep_assert_cblist_protected() check
  2020-02-14 23:55 [PATCH tip/core/rcu 0/30] Miscellaneous fixes for v5.7 Paul E. McKenney
                   ` (24 preceding siblings ...)
  2020-02-14 23:56 ` [PATCH tip/core/rcu 25/30] rcu: Optimize and protect atomic_cmpxchg() loop paulmck
@ 2020-02-14 23:56 ` paulmck
  2020-02-14 23:56 ` [PATCH tip/core/rcu 27/30] rcu: Make nocb_gp_wait() double-check unexpected-callback warning paulmck
                   ` (3 subsequent siblings)
  29 siblings, 0 replies; 46+ messages in thread
From: paulmck @ 2020-02-14 23:56 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

The ->nocb_lock lockdep assertion is currently guarded by cpu_online(),
which is incorrect for no-CBs CPUs, whose callback lists must be
protected by ->nocb_lock regardless of whether or not the corresponding
CPU is online.  This situation could result in failure to detect bugs
resulting from failing to hold ->nocb_lock for offline CPUs.

This commit therefore removes the cpu_online() guard.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree_plugin.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 3846519..70b3c0f 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1579,8 +1579,7 @@ static void rcu_nocb_unlock_irqrestore(struct rcu_data *rdp,
 static void rcu_lockdep_assert_cblist_protected(struct rcu_data *rdp)
 {
 	lockdep_assert_irqs_disabled();
-	if (rcu_segcblist_is_offloaded(&rdp->cblist) &&
-	    cpu_online(rdp->cpu))
+	if (rcu_segcblist_is_offloaded(&rdp->cblist))
 		lockdep_assert_held(&rdp->nocb_lock);
 }
 
-- 
2.9.5


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH tip/core/rcu 27/30] rcu: Make nocb_gp_wait() double-check unexpected-callback warning
  2020-02-14 23:55 [PATCH tip/core/rcu 0/30] Miscellaneous fixes for v5.7 Paul E. McKenney
                   ` (25 preceding siblings ...)
  2020-02-14 23:56 ` [PATCH tip/core/rcu 26/30] rcu: Tighten rcu_lockdep_assert_cblist_protected() check paulmck
@ 2020-02-14 23:56 ` paulmck
  2020-02-14 23:56 ` [PATCH tip/core/rcu 28/30] rcu: Mark rcu_state.ncpus to detect concurrent writes paulmck
                   ` (2 subsequent siblings)
  29 siblings, 0 replies; 46+ messages in thread
From: paulmck @ 2020-02-14 23:56 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

Currently, nocb_gp_wait() unconditionally complains if there is a
callback not already associated with a grace period.  This assumes
either that there was no such callback initially, or that the
rcu_advance_cbs() function assigned all such callbacks to a grace
period.  However, in theory there are some situations that would
prevent rcu_advance_cbs() from assigning all of the callbacks.

This commit therefore checks for unassociated callbacks immediately after
rcu_advance_cbs() returns, while the corresponding rcu_node structure's
->lock is still held.  If there are unassociated callbacks at that point,
the subsequent WARN_ON_ONCE() is disabled.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree_plugin.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 70b3c0f..36e71c9 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1931,6 +1931,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 	struct rcu_data *rdp;
 	struct rcu_node *rnp;
 	unsigned long wait_gp_seq = 0; // Suppress "use uninitialized" warning.
+	bool wasempty = false;
 
 	/*
 	 * Each pass through the following loop checks for CBs and for the
@@ -1970,10 +1971,13 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 		     rcu_seq_done(&rnp->gp_seq, cur_gp_seq))) {
 			raw_spin_lock_rcu_node(rnp); /* irqs disabled. */
 			needwake_gp = rcu_advance_cbs(rnp, rdp);
+			wasempty = rcu_segcblist_restempty(&rdp->cblist,
+							   RCU_NEXT_READY_TAIL);
 			raw_spin_unlock_rcu_node(rnp); /* irqs disabled. */
 		}
 		// Need to wait on some grace period?
-		WARN_ON_ONCE(!rcu_segcblist_restempty(&rdp->cblist,
+		WARN_ON_ONCE(wasempty &&
+			     !rcu_segcblist_restempty(&rdp->cblist,
 						      RCU_NEXT_READY_TAIL));
 		if (rcu_segcblist_nextgp(&rdp->cblist, &cur_gp_seq)) {
 			if (!needwait_gp ||
-- 
2.9.5


^ permalink raw reply	[flat|nested] 46+ messages in thread
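The pattern used in the patch above, snapshotting a predicate while the lock is held and using that snapshot to qualify a later warning, can be sketched as follows.  This is a single-threaded model with illustrative names; the real code takes the snapshot under the rcu_node ->lock and the predicate is rcu_segcblist_restempty().

```c
#include <stdbool.h>

static bool rest_was_empty_now;	/* stand-in for the cblist predicate */

static void advance_callbacks(void)
{
	/* The kernel would assign callbacks to grace periods here. */
}

/*
 * Returns true when the warning should fire: the rest of the list was
 * fully assigned immediately after advancing (snapshot taken "under
 * the lock"), yet an unassigned callback is visible at the later,
 * lockless check.  A false snapshot disables the warning entirely.
 */
static bool unexpected_callback(bool rest_empty_at_check)
{
	bool wasempty;

	/* lock(); */
	advance_callbacks();
	wasempty = rest_was_empty_now;	/* snapshot under the lock */
	/* unlock(); */

	return wasempty && !rest_empty_at_check;
}
```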

* [PATCH tip/core/rcu 28/30] rcu: Mark rcu_state.ncpus to detect concurrent writes
  2020-02-14 23:55 [PATCH tip/core/rcu 0/30] Miscellaneous fixes for v5.7 Paul E. McKenney
                   ` (26 preceding siblings ...)
  2020-02-14 23:56 ` [PATCH tip/core/rcu 27/30] rcu: Make nocb_gp_wait() double-check unexpected-callback warning paulmck
@ 2020-02-14 23:56 ` paulmck
  2020-02-14 23:56 ` [PATCH tip/core/rcu 29/30] rcu: Mark rcu_state.gp_seq " paulmck
  2020-02-14 23:56 ` [PATCH tip/core/rcu 30/30] rcu: Make rcu_barrier() account for offline no-CBs CPUs paulmck
  29 siblings, 0 replies; 46+ messages in thread
From: paulmck @ 2020-02-14 23:56 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

The rcu_state structure's ncpus field is only to be modified by the
CPU-hotplug CPU-online code path, which is single-threaded.  This
commit therefore enlists KCSAN's help in enforcing this restriction.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 4146207..3a60a160 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3397,6 +3397,7 @@ void rcu_cpu_starting(unsigned int cpu)
 	nbits = bitmap_weight(&oldmask, BITS_PER_LONG);
 	/* Allow lockless access for expedited grace periods. */
 	smp_store_release(&rcu_state.ncpus, rcu_state.ncpus + nbits); /* ^^^ */
+	ASSERT_EXCLUSIVE_WRITER(rcu_state.ncpus);
 	rcu_gpnum_ovf(rnp, rdp); /* Offline-induced counter wrap? */
 	rdp->rcu_onl_gp_seq = READ_ONCE(rcu_state.gp_seq);
 	rdp->rcu_onl_gp_flags = READ_ONCE(rcu_state.gp_flags);
-- 
2.9.5


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH tip/core/rcu 29/30] rcu: Mark rcu_state.gp_seq to detect concurrent writes
  2020-02-14 23:55 [PATCH tip/core/rcu 0/30] Miscellaneous fixes for v5.7 Paul E. McKenney
                   ` (27 preceding siblings ...)
  2020-02-14 23:56 ` [PATCH tip/core/rcu 28/30] rcu: Mark rcu_state.ncpus to detect concurrent writes paulmck
@ 2020-02-14 23:56 ` " paulmck
  2020-02-14 23:56 ` [PATCH tip/core/rcu 30/30] rcu: Make rcu_barrier() account for offline no-CBs CPUs paulmck
  29 siblings, 0 replies; 46+ messages in thread
From: paulmck @ 2020-02-14 23:56 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

The rcu_state structure's gp_seq field is only to be modified by the RCU
grace-period kthread, which is single-threaded.  This commit therefore
enlists KCSAN's help in enforcing this restriction.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree.c | 24 ++++++++++--------------
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 3a60a160..d15041f 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1209,7 +1209,7 @@ static bool rcu_start_this_gp(struct rcu_node *rnp_start, struct rcu_data *rdp,
 		trace_rcu_this_gp(rnp, rdp, gp_seq_req, TPS("NoGPkthread"));
 		goto unlock_out;
 	}
-	trace_rcu_grace_period(rcu_state.name, READ_ONCE(rcu_state.gp_seq), TPS("newreq"));
+	trace_rcu_grace_period(rcu_state.name, data_race(rcu_state.gp_seq), TPS("newreq"));
 	ret = true;  /* Caller must wake GP kthread. */
 unlock_out:
 	/* Push furthest requested GP to leaf node and rcu_data structure. */
@@ -1494,6 +1494,7 @@ static bool rcu_gp_init(void)
 	record_gp_stall_check_time();
 	/* Record GP times before starting GP, hence rcu_seq_start(). */
 	rcu_seq_start(&rcu_state.gp_seq);
+	ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq);
 	trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("start"));
 	raw_spin_unlock_irq_rcu_node(rnp);
 
@@ -1656,8 +1657,7 @@ static void rcu_gp_fqs_loop(void)
 			WRITE_ONCE(rcu_state.jiffies_kick_kthreads,
 				   jiffies + (j ? 3 * j : 2));
 		}
-		trace_rcu_grace_period(rcu_state.name,
-				       READ_ONCE(rcu_state.gp_seq),
+		trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
 				       TPS("fqswait"));
 		rcu_state.gp_state = RCU_GP_WAIT_FQS;
 		ret = swait_event_idle_timeout_exclusive(
@@ -1671,13 +1671,11 @@ static void rcu_gp_fqs_loop(void)
 		/* If time for quiescent-state forcing, do it. */
 		if (ULONG_CMP_GE(jiffies, rcu_state.jiffies_force_qs) ||
 		    (gf & RCU_GP_FLAG_FQS)) {
-			trace_rcu_grace_period(rcu_state.name,
-					       READ_ONCE(rcu_state.gp_seq),
+			trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
 					       TPS("fqsstart"));
 			rcu_gp_fqs(first_gp_fqs);
 			first_gp_fqs = false;
-			trace_rcu_grace_period(rcu_state.name,
-					       READ_ONCE(rcu_state.gp_seq),
+			trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
 					       TPS("fqsend"));
 			cond_resched_tasks_rcu_qs();
 			WRITE_ONCE(rcu_state.gp_activity, jiffies);
@@ -1688,8 +1686,7 @@ static void rcu_gp_fqs_loop(void)
 			cond_resched_tasks_rcu_qs();
 			WRITE_ONCE(rcu_state.gp_activity, jiffies);
 			WARN_ON(signal_pending(current));
-			trace_rcu_grace_period(rcu_state.name,
-					       READ_ONCE(rcu_state.gp_seq),
+			trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
 					       TPS("fqswaitsig"));
 			ret = 1; /* Keep old FQS timing. */
 			j = jiffies;
@@ -1766,6 +1763,7 @@ static void rcu_gp_cleanup(void)
 	/* Declare grace period done, trace first to use old GP number. */
 	trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("end"));
 	rcu_seq_end(&rcu_state.gp_seq);
+	ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq);
 	rcu_state.gp_state = RCU_GP_IDLE;
 	/* Check for GP requests since above loop. */
 	rdp = this_cpu_ptr(&rcu_data);
@@ -1781,7 +1779,7 @@ static void rcu_gp_cleanup(void)
 		WRITE_ONCE(rcu_state.gp_flags, RCU_GP_FLAG_INIT);
 		WRITE_ONCE(rcu_state.gp_req_activity, jiffies);
 		trace_rcu_grace_period(rcu_state.name,
-				       READ_ONCE(rcu_state.gp_seq),
+				       rcu_state.gp_seq,
 				       TPS("newreq"));
 	} else {
 		WRITE_ONCE(rcu_state.gp_flags,
@@ -1800,8 +1798,7 @@ static int __noreturn rcu_gp_kthread(void *unused)
 
 		/* Handle grace-period start. */
 		for (;;) {
-			trace_rcu_grace_period(rcu_state.name,
-					       READ_ONCE(rcu_state.gp_seq),
+			trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
 					       TPS("reqwait"));
 			rcu_state.gp_state = RCU_GP_WAIT_GPS;
 			swait_event_idle_exclusive(rcu_state.gp_wq,
@@ -1814,8 +1811,7 @@ static int __noreturn rcu_gp_kthread(void *unused)
 			cond_resched_tasks_rcu_qs();
 			WRITE_ONCE(rcu_state.gp_activity, jiffies);
 			WARN_ON(signal_pending(current));
-			trace_rcu_grace_period(rcu_state.name,
-					       READ_ONCE(rcu_state.gp_seq),
+			trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
 					       TPS("reqwaitsig"));
 		}
 
-- 
2.9.5


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH tip/core/rcu 30/30] rcu: Make rcu_barrier() account for offline no-CBs CPUs
  2020-02-14 23:55 [PATCH tip/core/rcu 0/30] Miscellaneous fixes for v5.7 Paul E. McKenney
                   ` (28 preceding siblings ...)
  2020-02-14 23:56 ` [PATCH tip/core/rcu 29/30] rcu: Mark rcu_state.gp_seq " paulmck
@ 2020-02-14 23:56 ` paulmck
  2020-02-25 10:24   ` Boqun Feng
  29 siblings, 1 reply; 46+ messages in thread
From: paulmck @ 2020-02-14 23:56 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney, # 5 . 5 . x

From: "Paul E. McKenney" <paulmck@kernel.org>

Currently, rcu_barrier() ignores offline CPUs.  However, it is possible
for an offline no-CBs CPU to have callbacks queued, and rcu_barrier()
must wait for those callbacks.  This commit therefore makes rcu_barrier()
directly invoke rcu_barrier_func() with interrupts disabled for such
CPUs.  This requires passing the CPU number into this function so that
it can entrain the rcu_barrier() callback onto the correct CPU's callback
list, given that for offline CPUs the code must instead execute on the
current CPU.

While in the area, this commit fixes a bug where the first CPU's callback
might have been invoked before rcu_segcblist_entrain() returned, which
would also result in an early wakeup.

Fixes: 5d6742b37727 ("rcu/nocb: Use rcu_segcblist for no-CBs CPUs")
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: <stable@vger.kernel.org> # 5.5.x
---
 include/trace/events/rcu.h |  1 +
 kernel/rcu/tree.c          | 32 ++++++++++++++++++++------------
 2 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index 5e49b06..d56d54c 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -712,6 +712,7 @@ TRACE_EVENT_RCU(rcu_torture_read,
  *	"Begin": rcu_barrier() started.
  *	"EarlyExit": rcu_barrier() piggybacked, thus early exit.
  *	"Inc1": rcu_barrier() piggyback check counter incremented.
+ *	"OfflineNoCBQ": rcu_barrier() found offline no-CBs CPU with callbacks.
  *	"OnlineQ": rcu_barrier() found online CPU with callbacks.
  *	"OnlineNQ": rcu_barrier() found online CPU, no callbacks.
  *	"IRQ": An rcu_barrier_callback() callback posted on remote CPU.
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index d15041f..160643e 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3098,9 +3098,10 @@ static void rcu_barrier_callback(struct rcu_head *rhp)
 /*
  * Called with preemption disabled, and from cross-cpu IRQ context.
  */
-static void rcu_barrier_func(void *unused)
+static void rcu_barrier_func(void *cpu_in)
 {
-	struct rcu_data *rdp = raw_cpu_ptr(&rcu_data);
+	uintptr_t cpu = (uintptr_t)cpu_in;
+	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
 
 	rcu_barrier_trace(TPS("IRQ"), -1, rcu_state.barrier_sequence);
 	rdp->barrier_head.func = rcu_barrier_callback;
@@ -3127,7 +3128,7 @@ static void rcu_barrier_func(void *unused)
  */
 void rcu_barrier(void)
 {
-	int cpu;
+	uintptr_t cpu;
 	struct rcu_data *rdp;
 	unsigned long s = rcu_seq_snap(&rcu_state.barrier_sequence);
 
@@ -3150,13 +3151,14 @@ void rcu_barrier(void)
 	rcu_barrier_trace(TPS("Inc1"), -1, rcu_state.barrier_sequence);
 
 	/*
-	 * Initialize the count to one rather than to zero in order to
-	 * avoid a too-soon return to zero in case of a short grace period
-	 * (or preemption of this task).  Exclude CPU-hotplug operations
-	 * to ensure that no offline CPU has callbacks queued.
+	 * Initialize the count to two rather than to zero in order
+	 * to avoid a too-soon return to zero in case of an immediate
+	 * invocation of the just-enqueued callback (or preemption of
+	 * this task).  Exclude CPU-hotplug operations to ensure that no
+	 * offline non-offloaded CPU has callbacks queued.
 	 */
 	init_completion(&rcu_state.barrier_completion);
-	atomic_set(&rcu_state.barrier_cpu_count, 1);
+	atomic_set(&rcu_state.barrier_cpu_count, 2);
 	get_online_cpus();
 
 	/*
@@ -3166,13 +3168,19 @@ void rcu_barrier(void)
 	 */
 	for_each_possible_cpu(cpu) {
 		rdp = per_cpu_ptr(&rcu_data, cpu);
-		if (!cpu_online(cpu) &&
+		if (cpu_is_offline(cpu) &&
 		    !rcu_segcblist_is_offloaded(&rdp->cblist))
 			continue;
-		if (rcu_segcblist_n_cbs(&rdp->cblist)) {
+		if (rcu_segcblist_n_cbs(&rdp->cblist) && cpu_online(cpu)) {
 			rcu_barrier_trace(TPS("OnlineQ"), cpu,
 					  rcu_state.barrier_sequence);
-			smp_call_function_single(cpu, rcu_barrier_func, NULL, 1);
+			smp_call_function_single(cpu, rcu_barrier_func, (void *)cpu, 1);
+		} else if (cpu_is_offline(cpu)) {
+			rcu_barrier_trace(TPS("OfflineNoCBQ"), cpu,
+					  rcu_state.barrier_sequence);
+			local_irq_disable();
+			rcu_barrier_func((void *)cpu);
+			local_irq_enable();
 		} else {
 			rcu_barrier_trace(TPS("OnlineNQ"), cpu,
 					  rcu_state.barrier_sequence);
@@ -3184,7 +3192,7 @@ void rcu_barrier(void)
 	 * Now that we have an rcu_barrier_callback() callback on each
 	 * CPU, and thus each counted, remove the initial count.
 	 */
-	if (atomic_dec_and_test(&rcu_state.barrier_cpu_count))
+	if (atomic_sub_and_test(2, &rcu_state.barrier_cpu_count))
 		complete(&rcu_state.barrier_completion);
 
 	/* Wait for all rcu_barrier_callback() callbacks to be invoked. */
-- 
2.9.5


^ permalink raw reply	[flat|nested] 46+ messages in thread
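The counting change in the patch above, biasing the count by two instead of one, can be modeled in user-space C11.  This is a deliberately single-threaded sketch with illustrative names: each "enqueue" is immediately followed by a direct call to the callback, which models the worst case the commit message describes, an immediate invocation of the just-enqueued callback.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int barrier_cpu_count;
static bool barrier_completed;

/* The decrement that reaches zero completes the barrier. */
static void barrier_callback(void)
{
	if (atomic_fetch_sub(&barrier_cpu_count, 1) == 1)
		barrier_completed = true;
}

/*
 * Model of rcu_barrier()'s counting: the count starts at 2 so that a
 * callback invoked immediately upon enqueue can never drive the count
 * to zero before enqueuing has finished.
 */
static void barrier_model(int ncpus_with_cbs)
{
	barrier_completed = false;
	atomic_store(&barrier_cpu_count, 2);	/* initial bias */
	for (int i = 0; i < ncpus_with_cbs; i++) {
		atomic_fetch_add(&barrier_cpu_count, 1);
		barrier_callback();	/* model immediate invocation */
	}
	/* Remove the bias of 2; reaching zero means nothing pending. */
	if (atomic_fetch_sub(&barrier_cpu_count, 2) == 2)
		barrier_completed = true;
}
```

With a bias of only 1, an immediately invoked callback could momentarily bring the count back to zero mid-enqueue, signaling completion too soon.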

* Re: [PATCH tip/core/rcu 06/30] rcu: Add WRITE_ONCE to rcu_node ->exp_seq_rq store
  2020-02-14 23:55 ` [PATCH tip/core/rcu 06/30] rcu: Add WRITE_ONCE to rcu_node ->exp_seq_rq store paulmck
@ 2020-02-15  3:47   ` Steven Rostedt
  2020-02-15 10:58     ` Paul E. McKenney
  0 siblings, 1 reply; 46+ messages in thread
From: Steven Rostedt @ 2020-02-15  3:47 UTC (permalink / raw)
  To: paulmck
  Cc: rcu, linux-kernel, kernel-team, mingo, jiangshanlai, dipankar,
	akpm, mathieu.desnoyers, josh, tglx, peterz, dhowells, edumazet,
	fweisbec, oleg, joel, Linus Torvalds

On Fri, 14 Feb 2020 15:55:43 -0800
paulmck@kernel.org wrote:

> From: "Paul E. McKenney" <paulmck@kernel.org>
> 
> The rcu_node structure's ->exp_seq_rq field is read locklessly, so
> this commit adds a WRITE_ONCE() to a store in order to provide proper
> documentation and READ_ONCE()/WRITE_ONCE() pairing.
> 
> This data race was reported by KCSAN.  Not appropriate for backporting
> due to failure being unlikely.
> 
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> ---
>  kernel/rcu/tree_exp.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index d7e0484..85b009e 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -314,7 +314,7 @@ static bool exp_funnel_lock(unsigned long s)
>  				   sync_exp_work_done(s));
>  			return true;
>  		}
> -		rnp->exp_seq_rq = s; /* Followers can wait on us. */
> +		WRITE_ONCE(rnp->exp_seq_rq, s); /* Followers can wait on us. */

Didn't Linus say this is basically bogus?

Perhaps just using it as documenting that it's read locklessly, but is
it really needed?

-- Steve



>  		spin_unlock(&rnp->exp_lock);
>  		trace_rcu_exp_funnel_lock(rcu_state.name, rnp->level,
>  					  rnp->grplo, rnp->grphi, TPS("nxtlvl"));


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH tip/core/rcu 22/30] rcu: Don't flag non-starting GPs before GP kthread is running
  2020-02-14 23:55 ` [PATCH tip/core/rcu 22/30] rcu: Don't flag non-starting GPs before GP kthread is running paulmck
@ 2020-02-15  3:53   ` Steven Rostedt
  2020-02-15 11:01     ` Paul E. McKenney
  0 siblings, 1 reply; 46+ messages in thread
From: Steven Rostedt @ 2020-02-15  3:53 UTC (permalink / raw)
  To: paulmck
  Cc: rcu, linux-kernel, kernel-team, mingo, jiangshanlai, dipankar,
	akpm, mathieu.desnoyers, josh, tglx, peterz, dhowells, edumazet,
	fweisbec, oleg, joel, Linus Torvalds

On Fri, 14 Feb 2020 15:55:59 -0800
paulmck@kernel.org wrote:

> @@ -1252,10 +1252,10 @@ static bool rcu_future_gp_cleanup(struct rcu_node *rnp)
>   */
>  static void rcu_gp_kthread_wake(void)
>  {
> -	if ((current == rcu_state.gp_kthread &&
> +	if ((current == READ_ONCE(rcu_state.gp_kthread) &&
>  	     !in_irq() && !in_serving_softirq()) ||
>  	    !READ_ONCE(rcu_state.gp_flags) ||
> -	    !rcu_state.gp_kthread)
> +	    !READ_ONCE(rcu_state.gp_kthread))
>  		return;

This looks buggy. You have two instances of
READ_ONCE(rcu_state.gp_kthread), which means they can be different. Is
that intentional?

-- Steve

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH tip/core/rcu 06/30] rcu: Add WRITE_ONCE to rcu_node ->exp_seq_rq store
  2020-02-15  3:47   ` Steven Rostedt
@ 2020-02-15 10:58     ` Paul E. McKenney
  2020-02-17 21:11       ` Joel Fernandes
  0 siblings, 1 reply; 46+ messages in thread
From: Paul E. McKenney @ 2020-02-15 10:58 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: rcu, linux-kernel, kernel-team, mingo, jiangshanlai, dipankar,
	akpm, mathieu.desnoyers, josh, tglx, peterz, dhowells, edumazet,
	fweisbec, oleg, joel, Linus Torvalds

On Fri, Feb 14, 2020 at 10:47:43PM -0500, Steven Rostedt wrote:
> On Fri, 14 Feb 2020 15:55:43 -0800
> paulmck@kernel.org wrote:
> 
> > From: "Paul E. McKenney" <paulmck@kernel.org>
> > 
> > The rcu_node structure's ->exp_seq_rq field is read locklessly, so
> > this commit adds a WRITE_ONCE() to a store in order to provide proper
> > documentation and READ_ONCE()/WRITE_ONCE() pairing.
> > 
> > This data race was reported by KCSAN.  Not appropriate for backporting
> > due to failure being unlikely.
> > 
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > ---
> >  kernel/rcu/tree_exp.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> > index d7e0484..85b009e 100644
> > --- a/kernel/rcu/tree_exp.h
> > +++ b/kernel/rcu/tree_exp.h
> > @@ -314,7 +314,7 @@ static bool exp_funnel_lock(unsigned long s)
> >  				   sync_exp_work_done(s));
> >  			return true;
> >  		}
> > -		rnp->exp_seq_rq = s; /* Followers can wait on us. */
> > +		WRITE_ONCE(rnp->exp_seq_rq, s); /* Followers can wait on us. */
> 
> Didn't Linus say this is basically bogus?
> 
> Perhaps just using it as documenting that it's read locklessly, but is
> it really needed?

Yes, Linus explicitly stated that WRITE_ONCE() is not required in
this case, but he also said that he was OK with it being there for
documentation purposes.

And within RCU, I -do- need it because I absolutely need to see if a
given patch introduced new KCSAN reports.  So I need it for the same
reason that I need the build to proceed without warnings.

Others who are working with less concurrency-intensive code might quite
reasonably make other choices, of course.  And my setting certain KCSAN
config options in my own builds doesn't inconvenience them, so we should
all be happy, right?  :-)

							Thanx, Paul

> -- Steve
> 
> 
> 
> >  		spin_unlock(&rnp->exp_lock);
> >  		trace_rcu_exp_funnel_lock(rcu_state.name, rnp->level,
> >  					  rnp->grplo, rnp->grphi, TPS("nxtlvl"));
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread
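The READ_ONCE()/WRITE_ONCE() pairing discussed above has a close user-space analogue in C11 relaxed atomics: they tell the compiler the location is accessed concurrently, forbidding tearing, fusing, and repeated loads, while imposing no ordering.  The macro and variable names below are local to this sketch, not the kernel's definitions.

```c
#include <stdatomic.h>

/*
 * User-space analogue of READ_ONCE()/WRITE_ONCE(): relaxed atomics
 * document (and mechanically enforce, for tools like TSan or KCSAN)
 * that the location is read locklessly.
 */
#define MY_WRITE_ONCE(x, val) \
	atomic_store_explicit(&(x), (val), memory_order_relaxed)
#define MY_READ_ONCE(x) \
	atomic_load_explicit(&(x), memory_order_relaxed)

static _Atomic unsigned long exp_seq_rq;

static void publish_seq(unsigned long s)
{
	MY_WRITE_ONCE(exp_seq_rq, s);	/* lockless readers may see this */
}

static unsigned long snoop_seq(void)
{
	return MY_READ_ONCE(exp_seq_rq);	/* lockless read */
}
```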

* Re: [PATCH tip/core/rcu 22/30] rcu: Don't flag non-starting GPs before GP kthread is running
  2020-02-15  3:53   ` Steven Rostedt
@ 2020-02-15 11:01     ` Paul E. McKenney
  2020-02-15 13:42       ` Paul E. McKenney
  0 siblings, 1 reply; 46+ messages in thread
From: Paul E. McKenney @ 2020-02-15 11:01 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: rcu, linux-kernel, kernel-team, mingo, jiangshanlai, dipankar,
	akpm, mathieu.desnoyers, josh, tglx, peterz, dhowells, edumazet,
	fweisbec, oleg, joel, Linus Torvalds

On Fri, Feb 14, 2020 at 10:53:05PM -0500, Steven Rostedt wrote:
> On Fri, 14 Feb 2020 15:55:59 -0800
> paulmck@kernel.org wrote:
> 
> > @@ -1252,10 +1252,10 @@ static bool rcu_future_gp_cleanup(struct rcu_node *rnp)
> >   */
> >  static void rcu_gp_kthread_wake(void)
> >  {
> > -	if ((current == rcu_state.gp_kthread &&
> > +	if ((current == READ_ONCE(rcu_state.gp_kthread) &&
> >  	     !in_irq() && !in_serving_softirq()) ||
> >  	    !READ_ONCE(rcu_state.gp_flags) ||
> > -	    !rcu_state.gp_kthread)
> > +	    !READ_ONCE(rcu_state.gp_kthread))
> >  		return;
> 
> This looks buggy. You have two instances of
> READ_ONCE(rcu_state.gp_kthread), which means they can be different. Is
> that intentional?

It might well be a bug, but let's see...

The rcu_state.gp_kthread field is initially NULL and transitions only once
to the non-NULL pointer to the RCU grace-period kthread's task_struct
structure.  So yes, this does work, courtesy of the compiler not being
allowed to change the order of READ_ONCE() instances and coherence-order
rules for READ_ONCE() and WRITE_ONCE().

But it would clearly be way better to do just one READ_ONCE() into a
local variable and test that local variable twice.

I will make this change, and thank you for calling my attention to it!

						Thanx, Paul

^ permalink raw reply	[flat|nested] 46+ messages in thread
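The improvement Paul agrees to above, loading the shared pointer exactly once into a local and testing the local twice, can be sketched as follows.  This is a simplified model: the real check also involves ->gp_flags and interrupt context, and the struct and function names here are illustrative.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct task { int pid; };	/* stand-in for task_struct */

static _Atomic(struct task *) gp_kthread;	/* NULL until spawned */

/*
 * Load the shared pointer once into a local, then test the local
 * twice, so both tests are guaranteed to see the same value.
 * (A relaxed atomic load is the user-space analogue of READ_ONCE().)
 */
static bool wake_should_return_early(struct task *current_task)
{
	struct task *t = atomic_load_explicit(&gp_kthread,
					      memory_order_relaxed);

	return current_task == t ||	/* caller is the GP kthread */
	       t == NULL;		/* kthread not yet spawned */
}
```

Two separate READ_ONCE() instances would still behave correctly here only because the pointer transitions once from NULL to non-NULL; the single-load form is correct without relying on that property.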

* Re: [PATCH tip/core/rcu 22/30] rcu: Don't flag non-starting GPs before GP kthread is running
  2020-02-15 11:01     ` Paul E. McKenney
@ 2020-02-15 13:42       ` Paul E. McKenney
  2020-02-17 20:25         ` Steven Rostedt
  0 siblings, 1 reply; 46+ messages in thread
From: Paul E. McKenney @ 2020-02-15 13:42 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: rcu, linux-kernel, kernel-team, mingo, jiangshanlai, dipankar,
	akpm, mathieu.desnoyers, josh, tglx, peterz, dhowells, edumazet,
	fweisbec, oleg, joel, Linus Torvalds

On Sat, Feb 15, 2020 at 03:01:11AM -0800, Paul E. McKenney wrote:
> On Fri, Feb 14, 2020 at 10:53:05PM -0500, Steven Rostedt wrote:
> > On Fri, 14 Feb 2020 15:55:59 -0800
> > paulmck@kernel.org wrote:
> > 
> > > @@ -1252,10 +1252,10 @@ static bool rcu_future_gp_cleanup(struct rcu_node *rnp)
> > >   */
> > >  static void rcu_gp_kthread_wake(void)
> > >  {
> > > -	if ((current == rcu_state.gp_kthread &&
> > > +	if ((current == READ_ONCE(rcu_state.gp_kthread) &&
> > >  	     !in_irq() && !in_serving_softirq()) ||
> > >  	    !READ_ONCE(rcu_state.gp_flags) ||
> > > -	    !rcu_state.gp_kthread)
> > > +	    !READ_ONCE(rcu_state.gp_kthread))
> > >  		return;
> > 
> > This looks buggy. You have two instances of
> > READ_ONCE(rcu_state.gp_kthread), which means they can be different. Is
> > that intentional?
> 
> It might well be a bug, but let's see...
> 
> The rcu_state.gp_kthread field is initially NULL and transitions only once
> to the non-NULL pointer to the RCU grace-period kthread's task_struct
> structure.  So yes, this does work, courtesy of the compiler not being
> allowed to change the order of READ_ONCE() instances and coherence-order
> rules for READ_ONCE() and WRITE_ONCE().
> 
> But it would clearly be way better to do just one READ_ONCE() into a
> local variable and test that local variable twice.
> 
> I will make this change, and thank you for calling my attention to it!

And does the following V2 look better?

							Thanx, Paul

------------------------------------------------------------------------

commit 35f7c539d30d5b595718302d07334146f8eb7304
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Tue Jan 21 12:30:22 2020 -0800

    rcu: Don't flag non-starting GPs before GP kthread is running
    
    Currently rcu_check_gp_start_stall() complains if a grace period takes
    too long to start, where "too long" is roughly one RCU CPU stall-warning
    interval.  This has worked well, but there are some debugging Kconfig
    options (such as CONFIG_EFI_PGT_DUMP=y) that can make booting take a
    very long time, so much so that the stall-warning interval has expired
    before RCU's grace-period kthread has even been spawned.
    
    This commit therefore resets the rcu_state.gp_req_activity and
    rcu_state.gp_activity timestamps just before the grace-period kthread
    is spawned, and modifies the checks and adds ordering to ensure that
    if rcu_check_gp_start_stall() sees that the grace-period kthread
    has been spawned, it will also see the resets applied to the
    rcu_state.gp_req_activity and rcu_state.gp_activity timestamps.
    
    Reported-by: Qian Cai <cai@lca.pw>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    [ paulmck: Fix whitespace issues reported by Qian Cai. ]
    Tested-by: Qian Cai <cai@lca.pw>
    [ paulmck: Simplify grace-period wakeup check per Steve Rostedt feedback. ]

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 62383ce..4a4a975 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1202,7 +1202,7 @@ static bool rcu_start_this_gp(struct rcu_node *rnp_start, struct rcu_data *rdp,
 	trace_rcu_this_gp(rnp, rdp, gp_seq_req, TPS("Startedroot"));
 	WRITE_ONCE(rcu_state.gp_flags, rcu_state.gp_flags | RCU_GP_FLAG_INIT);
 	WRITE_ONCE(rcu_state.gp_req_activity, jiffies);
-	if (!rcu_state.gp_kthread) {
+	if (!READ_ONCE(rcu_state.gp_kthread)) {
 		trace_rcu_this_gp(rnp, rdp, gp_seq_req, TPS("NoGPkthread"));
 		goto unlock_out;
 	}
@@ -1237,12 +1237,13 @@ static bool rcu_future_gp_cleanup(struct rcu_node *rnp)
 }
 
 /*
- * Awaken the grace-period kthread.  Don't do a self-awaken (unless in
- * an interrupt or softirq handler), and don't bother awakening when there
- * is nothing for the grace-period kthread to do (as in several CPUs raced
- * to awaken, and we lost), and finally don't try to awaken a kthread that
- * has not yet been created.  If all those checks are passed, track some
- * debug information and awaken.
+ * Awaken the grace-period kthread.  Don't do a self-awaken (unless in an
+ * interrupt or softirq handler, in which case we just might immediately
+ * sleep upon return, resulting in a grace-period hang), and don't bother
+ * awakening when there is nothing for the grace-period kthread to do
+ * (as in several CPUs raced to awaken, and we lost), and finally don't try
+ * to awaken a kthread that has not yet been created.  If all those checks
+ * are passed, track some debug information and awaken.
  *
  * So why do the self-wakeup when in an interrupt or softirq handler
  * in the grace-period kthread's context?  Because the kthread might have
@@ -1252,10 +1253,10 @@ static bool rcu_future_gp_cleanup(struct rcu_node *rnp)
  */
 static void rcu_gp_kthread_wake(void)
 {
-	if ((current == rcu_state.gp_kthread &&
-	     !in_irq() && !in_serving_softirq()) ||
-	    !READ_ONCE(rcu_state.gp_flags) ||
-	    !rcu_state.gp_kthread)
+	struct task_struct *t = READ_ONCE(rcu_state.gp_kthread);
+
+	if ((current == t && !in_irq() && !in_serving_softirq()) ||
+	    !READ_ONCE(rcu_state.gp_flags) || !t)
 		return;
 	WRITE_ONCE(rcu_state.gp_wake_time, jiffies);
 	WRITE_ONCE(rcu_state.gp_wake_seq, READ_ONCE(rcu_state.gp_seq));
@@ -3554,7 +3555,10 @@ static int __init rcu_spawn_gp_kthread(void)
 	}
 	rnp = rcu_get_root();
 	raw_spin_lock_irqsave_rcu_node(rnp, flags);
-	rcu_state.gp_kthread = t;
+	WRITE_ONCE(rcu_state.gp_activity, jiffies);
+	WRITE_ONCE(rcu_state.gp_req_activity, jiffies);
+	// Reset .gp_activity and .gp_req_activity before setting .gp_kthread.
+	smp_store_release(&rcu_state.gp_kthread, t);  /* ^^^ */
 	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 	wake_up_process(t);
 	rcu_spawn_nocb_kthreads();
diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index 488b71d..16ad7ad 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -578,6 +578,7 @@ void show_rcu_gp_kthreads(void)
 	unsigned long jw;
 	struct rcu_data *rdp;
 	struct rcu_node *rnp;
+	struct task_struct *t = READ_ONCE(rcu_state.gp_kthread);
 
 	j = jiffies;
 	ja = j - READ_ONCE(rcu_state.gp_activity);
@@ -585,8 +586,7 @@ void show_rcu_gp_kthreads(void)
 	jw = j - READ_ONCE(rcu_state.gp_wake_time);
 	pr_info("%s: wait state: %s(%d) ->state: %#lx delta ->gp_activity %lu ->gp_req_activity %lu ->gp_wake_time %lu ->gp_wake_seq %ld ->gp_seq %ld ->gp_seq_needed %ld ->gp_flags %#x\n",
 		rcu_state.name, gp_state_getname(rcu_state.gp_state),
-		rcu_state.gp_state,
-		rcu_state.gp_kthread ? rcu_state.gp_kthread->state : 0x1ffffL,
+		rcu_state.gp_state, t ? t->state : 0x1ffffL,
 		ja, jr, jw, (long)READ_ONCE(rcu_state.gp_wake_seq),
 		(long)READ_ONCE(rcu_state.gp_seq),
 		(long)READ_ONCE(rcu_get_root()->gp_seq_needed),
@@ -633,7 +633,8 @@ static void rcu_check_gp_start_stall(struct rcu_node *rnp, struct rcu_data *rdp,
 
 	if (!IS_ENABLED(CONFIG_PROVE_RCU) || rcu_gp_in_progress() ||
 	    ULONG_CMP_GE(READ_ONCE(rnp_root->gp_seq),
-			 READ_ONCE(rnp_root->gp_seq_needed)))
+			 READ_ONCE(rnp_root->gp_seq_needed)) ||
+	    !smp_load_acquire(&rcu_state.gp_kthread)) // Get stable kthread.
 		return;
 	j = jiffies; /* Expensive access, and in common case don't get here. */
 	if (time_before(j, READ_ONCE(rcu_state.gp_req_activity) + gpssdelay) ||

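The release/acquire pairing introduced by the patch can be sketched with C11 atomics in userspace C; everything below is an illustrative stand-in, not the kernel implementation:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Publish pattern from the patch: plain-write the timestamps, then
 * release-store the kthread pointer.  A reader that acquire-loads a
 * non-NULL pointer is guaranteed to also see the timestamp writes. */

static unsigned long gp_activity, gp_req_activity;
static _Atomic(int *) gp_kthread;

static void spawn_gp_kthread(int *t, unsigned long now)
{
	gp_activity = now;
	gp_req_activity = now;
	/* analogue of smp_store_release(&rcu_state.gp_kthread, t) */
	atomic_store_explicit(&gp_kthread, t, memory_order_release);
}

/* analogue of smp_load_acquire(&rcu_state.gp_kthread) */
static int *get_gp_kthread(void)
{
	return atomic_load_explicit(&gp_kthread, memory_order_acquire);
}
```

A caller that sees get_gp_kthread() return non-NULL may then read the timestamps without further ordering, which is exactly the guarantee rcu_check_gp_start_stall() relies on.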
^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH tip/core/rcu 22/30] rcu: Don't flag non-starting GPs before GP kthread is running
  2020-02-15 13:42       ` Paul E. McKenney
@ 2020-02-17 20:25         ` Steven Rostedt
  2020-02-17 22:03           ` Paul E. McKenney
  0 siblings, 1 reply; 46+ messages in thread
From: Steven Rostedt @ 2020-02-17 20:25 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: rcu, linux-kernel, kernel-team, mingo, jiangshanlai, dipankar,
	akpm, mathieu.desnoyers, josh, tglx, peterz, dhowells, edumazet,
	fweisbec, oleg, joel, Linus Torvalds

On Sat, 15 Feb 2020 05:42:08 -0800
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> 
> And does the following V2 look better?
> 

For the issue I brought up, yes. But now I have to ask...

> @@ -1252,10 +1253,10 @@ static bool rcu_future_gp_cleanup(struct rcu_node *rnp)
>   */
>  static void rcu_gp_kthread_wake(void)
>  {
> -	if ((current == rcu_state.gp_kthread &&
> -	     !in_irq() && !in_serving_softirq()) ||
> -	    !READ_ONCE(rcu_state.gp_flags) ||
> -	    !rcu_state.gp_kthread)
> +	struct task_struct *t = READ_ONCE(rcu_state.gp_kthread);
> +
> +	if ((current == t && !in_irq() && !in_serving_softirq()) ||
> +	    !READ_ONCE(rcu_state.gp_flags) || !t)

Why not test !t first? As that is the fastest operation in the if
statement, and will shortcut all the other operations if it is true.

As I like to micro-optimize ;-), for or (||) statements, I like to add
the fastest operations first. To me, that would be:

> 	if (!t || !READ_ONCE(rcu_state.gp_flags) ||
	    (current == t && !in_irq() && !in_serving_softirq()))
		return;

Note, in_irq() reads preempt_count which is not always a fast operation.
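The short-circuit behavior this ordering relies on can be sketched in plain C; the counters below are purely illustrative, not kernel code:

```c
#include <assert.h>

static int cheap_calls, expensive_calls;

static int cheap(void)     { cheap_calls++;     return 1; }
static int expensive(void) { expensive_calls++; return 1; }

/* || evaluates left to right and short-circuits, so when the cheap
 * test is true the expensive one is never evaluated at all. */
static int should_return(void)
{
	return cheap() || expensive();
}
```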

-- Steve


>  		return;
>  	WRITE_ONCE(rcu_state.gp_wake_time, jiffies);
>  	WRITE_ONCE(rcu_state.gp_wake_seq, READ_ONCE(rcu_state.gp_seq));
> @@ -3554,7 +3555,10 @@ static int __init rcu_spawn_gp_kthread(void)
>  	}
>  	rnp = rcu_get_root();
>  	raw_spin_lock_irqsave_rcu_node(rnp, flags);
> -	rcu_state.gp_kthread = t;
> +	WRITE_ONCE(rcu_state.gp_activity, jiffies);
> +	WRITE_ONCE(rcu_state.gp_req_activity, jiffies);
> +	// Reset .gp_activity and .gp_req_activity before setting .gp_kthread.
> +	smp_store_release(&rcu_state.gp_kthread, t);  /* ^^^ */
>  	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
>  	wake_up_process(t);
>  	rcu_spawn_nocb_kthreads();
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index 488b71d..16ad7ad 100644

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH tip/core/rcu 06/30] rcu: Add WRITE_ONCE to rcu_node ->exp_seq_rq store
  2020-02-15 10:58     ` Paul E. McKenney
@ 2020-02-17 21:11       ` Joel Fernandes
  2020-02-17 21:36         ` Paul E. McKenney
  0 siblings, 1 reply; 46+ messages in thread
From: Joel Fernandes @ 2020-02-17 21:11 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Steven Rostedt, rcu, linux-kernel, kernel-team, mingo,
	jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, dhowells, edumazet, fweisbec, oleg, Linus Torvalds

On Sat, Feb 15, 2020 at 02:58:03AM -0800, Paul E. McKenney wrote:
> On Fri, Feb 14, 2020 at 10:47:43PM -0500, Steven Rostedt wrote:
> > On Fri, 14 Feb 2020 15:55:43 -0800
> > paulmck@kernel.org wrote:
> > 
> > > From: "Paul E. McKenney" <paulmck@kernel.org>
> > > 
> > > The rcu_node structure's ->exp_seq_rq field is read locklessly, so
> > > this commit adds a WRITE_ONCE() to the store in order to provide proper
> > > documentation and READ_ONCE()/WRITE_ONCE() pairing.
> > > 
> > > This data race was reported by KCSAN.  Not appropriate for backporting
> > > due to failure being unlikely.
> > > 
> > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > > ---
> > >  kernel/rcu/tree_exp.h | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> > > index d7e0484..85b009e 100644
> > > --- a/kernel/rcu/tree_exp.h
> > > +++ b/kernel/rcu/tree_exp.h
> > > @@ -314,7 +314,7 @@ static bool exp_funnel_lock(unsigned long s)
> > >  				   sync_exp_work_done(s));
> > >  			return true;
> > >  		}
> > > -		rnp->exp_seq_rq = s; /* Followers can wait on us. */
> > > +		WRITE_ONCE(rnp->exp_seq_rq, s); /* Followers can wait on us. */
> > 
> > Didn't Linus say this is basically bogus?
> > 
> > Perhaps just using it as documenting that it's read locklessly, but is
> > it really needed?
> 
> Yes, Linus explicitly stated that WRITE_ONCE() is not required in
> this case, but he also said that he was OK with it being there for
> documentation purposes.

Just to add, PeterZ does approve of WRITE_ONCE() to prevent store-tearing
where applicable.

And I have reproduced Will's example [1] with the arm64 Clang compiler
shipping with the latest Android NDK just now. It does break up stores when
> writing zeroes to 64-bit values; this is a real problem which WRITE_ONCE()
resolves. And I've verified GCC on arm64 does break up 64-bit immediate value
writes without WRITE_ONCE().
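Store-tearing itself depends on the compiler and ISA and cannot be demonstrated portably in a unit test, but the userspace shape of the guard is just a volatile access; the macro definitions below are simplified sketches of the kernel's, not the real implementations:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified userspace sketches of the kernel macros: the volatile
 * access obliges the compiler to perform the store (or load) as a
 * single full-width access, which is what rules out tearing. */
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))
#define READ_ONCE(x)       (*(volatile __typeof__(x) *)&(x))

static uint64_t shared;

static void clear_shared(void)
{
	/* Without WRITE_ONCE(), a compiler targeting arm64 may emit
	 * two 32-bit zero stores here; with it, one 64-bit store. */
	WRITE_ONCE(shared, 0);
}
```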

thanks,

 - Joel

[1] https://lore.kernel.org/lkml/20190821103200.kpufwtviqhpbuv2n@willie-the-truck/


> And within RCU, I -do- need it because I absolutely need to see if a
> given patch introduced new KCSAN reports.  So I need it for the same
> reason that I need the build to proceed without warnings.
> 
> Others who are working with less concurrency-intensive code might quite
> reasonably make other choices, of course.  And my setting certain KCSAN
> config options in my own builds doesn't inconvenience them, so we should
> all be happy, right?  :-)
> 
> 							Thanx, Paul
> 
> > -- Steve
> > 
> > 
> > 
> > >  		spin_unlock(&rnp->exp_lock);
> > >  		trace_rcu_exp_funnel_lock(rcu_state.name, rnp->level,
> > >  					  rnp->grplo, rnp->grphi, TPS("nxtlvl"));
> > 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH tip/core/rcu 06/30] rcu: Add WRITE_ONCE to rcu_node ->exp_seq_rq store
  2020-02-17 21:11       ` Joel Fernandes
@ 2020-02-17 21:36         ` Paul E. McKenney
  0 siblings, 0 replies; 46+ messages in thread
From: Paul E. McKenney @ 2020-02-17 21:36 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Steven Rostedt, rcu, linux-kernel, kernel-team, mingo,
	jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, dhowells, edumazet, fweisbec, oleg, Linus Torvalds

On Mon, Feb 17, 2020 at 04:11:35PM -0500, Joel Fernandes wrote:
> On Sat, Feb 15, 2020 at 02:58:03AM -0800, Paul E. McKenney wrote:
> > On Fri, Feb 14, 2020 at 10:47:43PM -0500, Steven Rostedt wrote:
> > > On Fri, 14 Feb 2020 15:55:43 -0800
> > > paulmck@kernel.org wrote:
> > > 
> > > > From: "Paul E. McKenney" <paulmck@kernel.org>
> > > > 
> > > > The rcu_node structure's ->exp_seq_rq field is read locklessly, so
> > > > this commit adds a WRITE_ONCE() to the store in order to provide proper
> > > > documentation and READ_ONCE()/WRITE_ONCE() pairing.
> > > > 
> > > > This data race was reported by KCSAN.  Not appropriate for backporting
> > > > due to failure being unlikely.
> > > > 
> > > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > > > ---
> > > >  kernel/rcu/tree_exp.h | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> > > > index d7e0484..85b009e 100644
> > > > --- a/kernel/rcu/tree_exp.h
> > > > +++ b/kernel/rcu/tree_exp.h
> > > > @@ -314,7 +314,7 @@ static bool exp_funnel_lock(unsigned long s)
> > > >  				   sync_exp_work_done(s));
> > > >  			return true;
> > > >  		}
> > > > -		rnp->exp_seq_rq = s; /* Followers can wait on us. */
> > > > +		WRITE_ONCE(rnp->exp_seq_rq, s); /* Followers can wait on us. */
> > > 
> > > Didn't Linus say this is basically bogus?
> > > 
> > > Perhaps just using it as documenting that it's read locklessly, but is
> > > it really needed?
> > 
> > Yes, Linus explicitly stated that WRITE_ONCE() is not required in
> > this case, but he also said that he was OK with it being there for
> > documentation purposes.
> 
> Just to add, PeterZ does approve of WRITE_ONCE() to prevent store-tearing
> where applicable.
> 
> And I have reproduced Will's example [1] with the arm64 Clang compiler
> shipping with the latest Android NDK just now. It does break up stores when
> > writing zeroes to 64-bit values; this is a real problem which WRITE_ONCE()
> resolves. And I've verified GCC on arm64 does break up 64-bit immediate value
> writes without WRITE_ONCE().

That does sound a bit on the required side!  ;-)

							Thanx, Paul

> thanks,
> 
>  - Joel
> 
> [1] https://lore.kernel.org/lkml/20190821103200.kpufwtviqhpbuv2n@willie-the-truck/
> 
> 
> > And within RCU, I -do- need it because I absolutely need to see if a
> > given patch introduced new KCSAN reports.  So I need it for the same
> > reason that I need the build to proceed without warnings.
> > 
> > Others who are working with less concurrency-intensive code might quite
> > reasonably make other choices, of course.  And my setting certain KCSAN
> > config options in my own builds doesn't inconvenience them, so we should
> > all be happy, right?  :-)
> > 
> > 							Thanx, Paul
> > 
> > > -- Steve
> > > 
> > > 
> > > 
> > > >  		spin_unlock(&rnp->exp_lock);
> > > >  		trace_rcu_exp_funnel_lock(rcu_state.name, rnp->level,
> > > >  					  rnp->grplo, rnp->grphi, TPS("nxtlvl"));
> > > 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH tip/core/rcu 22/30] rcu: Don't flag non-starting GPs before GP kthread is running
  2020-02-17 20:25         ` Steven Rostedt
@ 2020-02-17 22:03           ` Paul E. McKenney
  2020-02-17 22:21             ` Steven Rostedt
  0 siblings, 1 reply; 46+ messages in thread
From: Paul E. McKenney @ 2020-02-17 22:03 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: rcu, linux-kernel, kernel-team, mingo, jiangshanlai, dipankar,
	akpm, mathieu.desnoyers, josh, tglx, peterz, dhowells, edumazet,
	fweisbec, oleg, joel, Linus Torvalds

On Mon, Feb 17, 2020 at 03:25:17PM -0500, Steven Rostedt wrote:
> On Sat, 15 Feb 2020 05:42:08 -0800
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
> 
> > 
> > And does the following V2 look better?
> > 
> 
> For the issue I brought up, yes. But now I have to ask...
> 
> > @@ -1252,10 +1253,10 @@ static bool rcu_future_gp_cleanup(struct rcu_node *rnp)
> >   */
> >  static void rcu_gp_kthread_wake(void)
> >  {
> > -	if ((current == rcu_state.gp_kthread &&
> > -	     !in_irq() && !in_serving_softirq()) ||
> > -	    !READ_ONCE(rcu_state.gp_flags) ||
> > -	    !rcu_state.gp_kthread)
> > +	struct task_struct *t = READ_ONCE(rcu_state.gp_kthread);
> > +
> > +	if ((current == t && !in_irq() && !in_serving_softirq()) ||
> > +	    !READ_ONCE(rcu_state.gp_flags) || !t)
> 
> Why not test !t first? As that is the fastest operation in the if
> statement, and will shortcut all the other operations if it is true.
> 
> As I like to micro-optimize ;-), for or (||) statements, I like to add
> the fastest operations first. To me, that would be:
> 
> > 	if (!t || !READ_ONCE(rcu_state.gp_flags) ||
> 	    (current == t && !in_irq() && !in_serving_softirq()))
> 		return;
> 
> Note, in_irq() reads preempt_count which is not always a fast operation.

And what is a day without micro-optimization of a slowpath?  :-)

OK, let's see...

Grace-period kthread wakeups are normally mediated by rcu_start_this_gp(),
which uses a funnel lock to consolidate concurrent requests to start
a grace period.  If a grace period is already in progress, it refrains
from doing a wakeup because that means that the grace-period kthread
will check for another grace period being needed at the end of the
current grace period.

Exceptions include:

o	The wakeup reporting the last quiescent state of the current
	grace period.

o	Emergency situations such as callback overloads and RCU CPU stalls.

So on a busy system that is not overloaded, the common case is that
rcu_gp_kthread_wake() is invoked only once per grace period because there
is no emergency and there is a grace period in progress.  If this system
has short idle periods, a fair number of quiescent states, and a reasonable
amount of idle time, then the last quiescent state will not normally be
detected by the grace-period kthread.  But workloads can of course vary.

The "!t" holds only during early boot.  So we could put a likely() around
the "t".  But more to the point, at runtime, "!t" would always be false,
so it really should be last in the list of "||" clauses.  This isn't
enough of a fastpath for a static branch to make sense.

The "!READ_ONCE(rcu_state.gp_flags)" will normally hold, though it is
false often enough to pay for itself.  Or has been in the past, anyway.
I suspect that access to the global variable rcu_state.gp_flags is not
always fast either.

So I am having difficulty talking myself into modifying this one given
the frequency of operations.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH tip/core/rcu 22/30] rcu: Don't flag non-starting GPs before GP kthread is running
  2020-02-17 22:03           ` Paul E. McKenney
@ 2020-02-17 22:21             ` Steven Rostedt
  2020-02-17 23:03               ` Paul E. McKenney
  0 siblings, 1 reply; 46+ messages in thread
From: Steven Rostedt @ 2020-02-17 22:21 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: rcu, linux-kernel, kernel-team, mingo, jiangshanlai, dipankar,
	akpm, mathieu.desnoyers, josh, tglx, peterz, dhowells, edumazet,
	fweisbec, oleg, joel, Linus Torvalds

On Mon, 17 Feb 2020 14:03:56 -0800
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> And what is a day without micro-optimization of a slowpath?  :-)

A day you have off, but still find yourself working ;-)

> 
> OK, let's see...
> 
> Grace-period kthread wakeups are normally mediated by rcu_start_this_gp(),
> which uses a funnel lock to consolidate concurrent requests to start
> a grace period.  If a grace period is already in progress, it refrains
> from doing a wakeup because that means that the grace-period kthread
> will check for another grace period being needed at the end of the
> current grace period.
> 
> Exceptions include:
> 
> o	The wakeup reporting the last quiescent state of the current
> 	grace period.
> 
> o	Emergency situations such as callback overloads and RCU CPU stalls.
> 
> So on a busy system that is not overloaded, the common case is that
> rcu_gp_kthread_wake() is invoked only once per grace period because there
> is no emergency and there is a grace period in progress.  If this system
> has short idle periods, a fair number of quiescent states, and a reasonable
> amount of idle time, then the last quiescent state will not normally be
> detected by the grace-period kthread.  But workloads can of course vary.
> 
> The "!t" holds only during early boot.  So we could put a likely() around
> the "t".  But more to the point, at runtime, "!t" would always be false,
> so it really should be last in the list of "||" clauses.  This isn't
> enough of a fastpath for a static branch to make sense.

Hey! Does that mean we can add a static branch for that check?


struct static_key rcu_booting = STATIC_KEY_INIT_TRUE;

[...]

> 	if (!READ_ONCE(rcu_state.gp_flags) ||
> 	    (current == t && !in_irq() && !in_serving_softirq()))
		return;

	if (static_branch_unlikely(&rcu_booting) && !t)
		return;

At end of boot:

	static_key_disable(&rcu_booting);

That way we can really micro-optimize the slow path, and it basically
becomes a nop!

-- Steve

> 
> The "!READ_ONCE(rcu_state.gp_flags)" will normally hold, though it is
> false often enough to pay for itself.  Or has been in the past, anyway.
> I suspect that access to the global variable rcu_state.gp_flags is not
> always fast either.
> 
> So I am having difficulty talking myself into modifying this one given
> the frequency of operations.
> 
> 							Thanx, Paul


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH tip/core/rcu 22/30] rcu: Don't flag non-starting GPs before GP kthread is running
  2020-02-17 22:21             ` Steven Rostedt
@ 2020-02-17 23:03               ` Paul E. McKenney
  0 siblings, 0 replies; 46+ messages in thread
From: Paul E. McKenney @ 2020-02-17 23:03 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: rcu, linux-kernel, kernel-team, mingo, jiangshanlai, dipankar,
	akpm, mathieu.desnoyers, josh, tglx, peterz, dhowells, edumazet,
	fweisbec, oleg, joel, Linus Torvalds

On Mon, Feb 17, 2020 at 05:21:31PM -0500, Steven Rostedt wrote:
> On Mon, 17 Feb 2020 14:03:56 -0800
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
> 
> > And what is a day without micro-optimization of a slowpath?  :-)
> 
> A day you have off, but still find yourself working ;-)
> 
> > 
> > OK, let's see...
> > 
> > Grace-period kthread wakeups are normally mediated by rcu_start_this_gp(),
> > which uses a funnel lock to consolidate concurrent requests to start
> > a grace period.  If a grace period is already in progress, it refrains
> > from doing a wakeup because that means that the grace-period kthread
> > will check for another grace period being needed at the end of the
> > current grace period.
> > 
> > Exceptions include:
> > 
> > o	The wakeup reporting the last quiescent state of the current
> > 	grace period.
> > 
> > o	Emergency situations such as callback overloads and RCU CPU stalls.
> > 
> > So on a busy system that is not overloaded, the common case is that
> > rcu_gp_kthread_wake() is invoked only once per grace period because there
> > is no emergency and there is a grace period in progress.  If this system
> > has short idle periods, a fair number of quiescent states, and a reasonable
> > amount of idle time, then the last quiescent state will not normally be
> > detected by the grace-period kthread.  But workloads can of course vary.
> > 
> > The "!t" holds only during early boot.  So we could put a likely() around
> > the "t".  But more to the point, at runtime, "!t" would always be false,
> > so it really should be last in the list of "||" clauses.  This isn't
> > enough of a fastpath for a static branch to make sense.
> 
> Hey! Does that mean we can add a static branch for that check?
> 
> 
> struct static_key rcu_booting = STATIC_KEY_INIT_TRUE;
> 
> [...]
> 
> > 	if (!READ_ONCE(rcu_state.gp_flags) ||
> > 	    (current == t && !in_irq() && !in_serving_softirq()))
> 		return;
> 
> 	if (static_branch_unlikely(&rcu_booting) && !t)
> 		return;
> 
> At end of boot:
> 
> 	static_key_disable(&rcu_booting);
> 
> That way we can really micro-optimize the slow path, and it basically
> becomes a nop!

;-) ;-) ;-)

There might be some place where static branches would be of use in RCU,
but unfortunately none that I can think of on the read side.  Though I
am once again making progress on Lai Jiangshan's patchset, which might
one day result in inlining of __rcu_read_lock() and __rcu_read_unlock()
in PREEMPT=y kernels.

							Thanx, Paul

> -- Steve
> 
> > 
> > The "!READ_ONCE(rcu_state.gp_flags)" will normally hold, though it is
> > false often enough to pay for itself.  Or has been in the past, anyway.
> > I suspect that access to the global variable rcu_state.gp_flags is not
> > always fast either.
> > 
> > So I am having difficulty talking myself into modifying this one given
> > the frequency of operations.
> > 
> > 							Thanx, Paul
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH tip/core/rcu 30/30] rcu: Make rcu_barrier() account for offline no-CBs CPUs
  2020-02-14 23:56 ` [PATCH tip/core/rcu 30/30] rcu: Make rcu_barrier() account for offline no-CBs CPUs paulmck
@ 2020-02-25 10:24   ` Boqun Feng
  2020-02-26  3:14     ` Paul E. McKenney
  0 siblings, 1 reply; 46+ messages in thread
From: Boqun Feng @ 2020-02-25 10:24 UTC (permalink / raw)
  To: paulmck
  Cc: rcu, linux-kernel, kernel-team, mingo, jiangshanlai, dipankar,
	akpm, mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, # 5 . 5 . x

Hi Paul,

On Fri, Feb 14, 2020 at 03:56:07PM -0800, paulmck@kernel.org wrote:
> From: "Paul E. McKenney" <paulmck@kernel.org>
> 
> Currently, rcu_barrier() ignores offline CPUs.  However, it is possible
> for an offline no-CBs CPU to have callbacks queued, and rcu_barrier()
> must wait for those callbacks.  This commit therefore makes rcu_barrier()
> directly invoke the rcu_barrier_func() with interrupts disabled for such
> CPUs.  This requires passing the CPU number into this function so that
> it can entrain the rcu_barrier() callback onto the correct CPU's callback
> list, given that the code must instead execute on the current CPU.
> 
> While in the area, this commit fixes a bug where the first CPU's callback
> might have been invoked before rcu_segcblist_entrain() returned, which
> would also result in an early wakeup.
> 
> Fixes: 5d6742b37727 ("rcu/nocb: Use rcu_segcblist for no-CBs CPUs")
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> Cc: <stable@vger.kernel.org> # 5.5.x
> ---
>  include/trace/events/rcu.h |  1 +
>  kernel/rcu/tree.c          | 32 ++++++++++++++++++++------------
>  2 files changed, 21 insertions(+), 12 deletions(-)
> 
> diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
> index 5e49b06..d56d54c 100644
> --- a/include/trace/events/rcu.h
> +++ b/include/trace/events/rcu.h
> @@ -712,6 +712,7 @@ TRACE_EVENT_RCU(rcu_torture_read,
>   *	"Begin": rcu_barrier() started.
>   *	"EarlyExit": rcu_barrier() piggybacked, thus early exit.
>   *	"Inc1": rcu_barrier() piggyback check counter incremented.
> + *	"OfflineNoCBQ": rcu_barrier() found offline no-CBs CPU with callbacks.
>   *	"OnlineQ": rcu_barrier() found online CPU with callbacks.
>   *	"OnlineNQ": rcu_barrier() found online CPU, no callbacks.
>   *	"IRQ": An rcu_barrier_callback() callback posted on remote CPU.
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index d15041f..160643e 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3098,9 +3098,10 @@ static void rcu_barrier_callback(struct rcu_head *rhp)
>  /*
>   * Called with preemption disabled, and from cross-cpu IRQ context.
>   */
> -static void rcu_barrier_func(void *unused)
> +static void rcu_barrier_func(void *cpu_in)
>  {
> -	struct rcu_data *rdp = raw_cpu_ptr(&rcu_data);
> +	uintptr_t cpu = (uintptr_t)cpu_in;
> +	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
>  
>  	rcu_barrier_trace(TPS("IRQ"), -1, rcu_state.barrier_sequence);
>  	rdp->barrier_head.func = rcu_barrier_callback;
> @@ -3127,7 +3128,7 @@ static void rcu_barrier_func(void *unused)
>   */
>  void rcu_barrier(void)
>  {
> -	int cpu;
> +	uintptr_t cpu;
>  	struct rcu_data *rdp;
>  	unsigned long s = rcu_seq_snap(&rcu_state.barrier_sequence);
>  
> @@ -3150,13 +3151,14 @@ void rcu_barrier(void)
>  	rcu_barrier_trace(TPS("Inc1"), -1, rcu_state.barrier_sequence);
>  
>  	/*
> -	 * Initialize the count to one rather than to zero in order to
> -	 * avoid a too-soon return to zero in case of a short grace period
> -	 * (or preemption of this task).  Exclude CPU-hotplug operations
> -	 * to ensure that no offline CPU has callbacks queued.
> +	 * Initialize the count to two rather than to zero in order
> +	 * to avoid a too-soon return to zero in case of an immediate
> +	 * invocation of the just-enqueued callback (or preemption of
> +	 * this task).  Exclude CPU-hotplug operations to ensure that no
> +	 * offline non-offloaded CPU has callbacks queued.
>  	 */
>  	init_completion(&rcu_state.barrier_completion);
> -	atomic_set(&rcu_state.barrier_cpu_count, 1);
> +	atomic_set(&rcu_state.barrier_cpu_count, 2);
>  	get_online_cpus();
>  
>  	/*
> @@ -3166,13 +3168,19 @@ void rcu_barrier(void)
>  	 */
>  	for_each_possible_cpu(cpu) {
>  		rdp = per_cpu_ptr(&rcu_data, cpu);
> -		if (!cpu_online(cpu) &&
> +		if (cpu_is_offline(cpu) &&
>  		    !rcu_segcblist_is_offloaded(&rdp->cblist))
>  			continue;
> -		if (rcu_segcblist_n_cbs(&rdp->cblist)) {
> +		if (rcu_segcblist_n_cbs(&rdp->cblist) && cpu_online(cpu)) {
>  			rcu_barrier_trace(TPS("OnlineQ"), cpu,
>  					  rcu_state.barrier_sequence);
> -			smp_call_function_single(cpu, rcu_barrier_func, NULL, 1);
> +			smp_call_function_single(cpu, rcu_barrier_func, (void *)cpu, 1);
> +		} else if (cpu_is_offline(cpu)) {

I wonder whether this should be:

		  else if (rcu_segcblist_n_cbs(&rdp->cblist) && cpu_is_offline(cpu))

? Because I think we only want to queue the barrier call back if there
are callbacks for a particular CPU. Am I missing something subtle?

Regards,
Boqun

> +			rcu_barrier_trace(TPS("OfflineNoCBQ"), cpu,
> +					  rcu_state.barrier_sequence);
> +			local_irq_disable();
> +			rcu_barrier_func((void *)cpu);
> +			local_irq_enable();
>  		} else {
>  			rcu_barrier_trace(TPS("OnlineNQ"), cpu,
>  					  rcu_state.barrier_sequence);
> @@ -3184,7 +3192,7 @@ void rcu_barrier(void)
>  	 * Now that we have an rcu_barrier_callback() callback on each
>  	 * CPU, and thus each counted, remove the initial count.
>  	 */
> -	if (atomic_dec_and_test(&rcu_state.barrier_cpu_count))
> +	if (atomic_sub_and_test(2, &rcu_state.barrier_cpu_count))
>  		complete(&rcu_state.barrier_completion);
>  
>  	/* Wait for all rcu_barrier_callback() callbacks to be invoked. */
> -- 
> 2.9.5
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH tip/core/rcu 30/30] rcu: Make rcu_barrier() account for offline no-CBs CPUs
  2020-02-25 10:24   ` Boqun Feng
@ 2020-02-26  3:14     ` Paul E. McKenney
  2020-02-26  4:18       ` Paul E. McKenney
  2020-02-26  6:14       ` Boqun Feng
  0 siblings, 2 replies; 46+ messages in thread
From: Paul E. McKenney @ 2020-02-26  3:14 UTC (permalink / raw)
  To: Boqun Feng
  Cc: rcu, linux-kernel, kernel-team, mingo, jiangshanlai, dipankar,
	akpm, mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, # 5.5.x

On Tue, Feb 25, 2020 at 06:24:36PM +0800, Boqun Feng wrote:
> Hi Paul,
> 
> On Fri, Feb 14, 2020 at 03:56:07PM -0800, paulmck@kernel.org wrote:
> > From: "Paul E. McKenney" <paulmck@kernel.org>
> > 
> > Currently, rcu_barrier() ignores offline CPUs.  However, it is possible
> > for an offline no-CBs CPU to have callbacks queued, and rcu_barrier()
> > must wait for those callbacks.  This commit therefore makes rcu_barrier()
> > directly invoke the rcu_barrier_func() with interrupts disabled for such
> > CPUs.  This requires passing the CPU number into this function so that
> > it can entrain the rcu_barrier() callback onto the correct CPU's callback
> > list, given that the code must instead execute on the current CPU.
> > 
> > While in the area, this commit fixes a bug where the first CPU's callback
> > might have been invoked before rcu_segcblist_entrain() returned, which
> > would also result in an early wakeup.
> > 
> > Fixes: 5d6742b37727 ("rcu/nocb: Use rcu_segcblist for no-CBs CPUs")
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > Cc: <stable@vger.kernel.org> # 5.5.x
> > ---
> >  include/trace/events/rcu.h |  1 +
> >  kernel/rcu/tree.c          | 32 ++++++++++++++++++++------------
> >  2 files changed, 21 insertions(+), 12 deletions(-)
> > 
> > diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
> > index 5e49b06..d56d54c 100644
> > --- a/include/trace/events/rcu.h
> > +++ b/include/trace/events/rcu.h
> > @@ -712,6 +712,7 @@ TRACE_EVENT_RCU(rcu_torture_read,
> >   *	"Begin": rcu_barrier() started.
> >   *	"EarlyExit": rcu_barrier() piggybacked, thus early exit.
> >   *	"Inc1": rcu_barrier() piggyback check counter incremented.
> > + *	"OfflineNoCBQ": rcu_barrier() found offline no-CBs CPU with callbacks.
> >   *	"OnlineQ": rcu_barrier() found online CPU with callbacks.
> >   *	"OnlineNQ": rcu_barrier() found online CPU, no callbacks.
> >   *	"IRQ": An rcu_barrier_callback() callback posted on remote CPU.
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index d15041f..160643e 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -3098,9 +3098,10 @@ static void rcu_barrier_callback(struct rcu_head *rhp)
> >  /*
> >   * Called with preemption disabled, and from cross-cpu IRQ context.
> >   */
> > -static void rcu_barrier_func(void *unused)
> > +static void rcu_barrier_func(void *cpu_in)
> >  {
> > -	struct rcu_data *rdp = raw_cpu_ptr(&rcu_data);
> > +	uintptr_t cpu = (uintptr_t)cpu_in;
> > +	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> >  
> >  	rcu_barrier_trace(TPS("IRQ"), -1, rcu_state.barrier_sequence);
> >  	rdp->barrier_head.func = rcu_barrier_callback;
> > @@ -3127,7 +3128,7 @@ static void rcu_barrier_func(void *unused)
> >   */
> >  void rcu_barrier(void)
> >  {
> > -	int cpu;
> > +	uintptr_t cpu;
> >  	struct rcu_data *rdp;
> >  	unsigned long s = rcu_seq_snap(&rcu_state.barrier_sequence);
> >  
> > @@ -3150,13 +3151,14 @@ void rcu_barrier(void)
> >  	rcu_barrier_trace(TPS("Inc1"), -1, rcu_state.barrier_sequence);
> >  
> >  	/*
> > -	 * Initialize the count to one rather than to zero in order to
> > -	 * avoid a too-soon return to zero in case of a short grace period
> > -	 * (or preemption of this task).  Exclude CPU-hotplug operations
> > -	 * to ensure that no offline CPU has callbacks queued.
> > +	 * Initialize the count to two rather than to zero in order
> > +	 * to avoid a too-soon return to zero in case of an immediate
> > +	 * invocation of the just-enqueued callback (or preemption of
> > +	 * this task).  Exclude CPU-hotplug operations to ensure that no
> > +	 * offline non-offloaded CPU has callbacks queued.
> >  	 */
> >  	init_completion(&rcu_state.barrier_completion);
> > -	atomic_set(&rcu_state.barrier_cpu_count, 1);
> > +	atomic_set(&rcu_state.barrier_cpu_count, 2);
> >  	get_online_cpus();
> >  
> >  	/*
> > @@ -3166,13 +3168,19 @@ void rcu_barrier(void)
> >  	 */
> >  	for_each_possible_cpu(cpu) {
> >  		rdp = per_cpu_ptr(&rcu_data, cpu);
> > -		if (!cpu_online(cpu) &&
> > +		if (cpu_is_offline(cpu) &&
> >  		    !rcu_segcblist_is_offloaded(&rdp->cblist))
> >  			continue;
> > -		if (rcu_segcblist_n_cbs(&rdp->cblist)) {
> > +		if (rcu_segcblist_n_cbs(&rdp->cblist) && cpu_online(cpu)) {
> >  			rcu_barrier_trace(TPS("OnlineQ"), cpu,
> >  					  rcu_state.barrier_sequence);
> > -			smp_call_function_single(cpu, rcu_barrier_func, NULL, 1);
> > +			smp_call_function_single(cpu, rcu_barrier_func, (void *)cpu, 1);
> > +		} else if (cpu_is_offline(cpu)) {
> 
> I wonder whether this should be:
> 
> 		  else if (rcu_segcblist_n_cbs(&rdp->cblist) && cpu_is_offline(cpu))
> 
> ? Because I think we only want to queue the barrier callback if there
> are callbacks for a particular CPU. Am I missing something subtle?

I don't believe that you are missing anything at all!

Thank you very much -- this bug would not have shown up in any validation
setup that I am aware of.  ;-)

							Thanx, Paul

> Regards,
> Boqun
> 
> > +			rcu_barrier_trace(TPS("OfflineNoCBQ"), cpu,
> > +					  rcu_state.barrier_sequence);
> > +			local_irq_disable();
> > +			rcu_barrier_func((void *)cpu);
> > +			local_irq_enable();
> >  		} else {
> >  			rcu_barrier_trace(TPS("OnlineNQ"), cpu,
> >  					  rcu_state.barrier_sequence);
> > @@ -3184,7 +3192,7 @@ void rcu_barrier(void)
> >  	 * Now that we have an rcu_barrier_callback() callback on each
> >  	 * CPU, and thus each counted, remove the initial count.
> >  	 */
> > -	if (atomic_dec_and_test(&rcu_state.barrier_cpu_count))
> > +	if (atomic_sub_and_test(2, &rcu_state.barrier_cpu_count))
> >  		complete(&rcu_state.barrier_completion);
> >  
> >  	/* Wait for all rcu_barrier_callback() callbacks to be invoked. */
> > -- 
> > 2.9.5
> > 


* Re: [PATCH tip/core/rcu 30/30] rcu: Make rcu_barrier() account for offline no-CBs CPUs
  2020-02-26  3:14     ` Paul E. McKenney
@ 2020-02-26  4:18       ` Paul E. McKenney
  2020-02-26  6:14       ` Boqun Feng
  1 sibling, 0 replies; 46+ messages in thread
From: Paul E. McKenney @ 2020-02-26  4:18 UTC (permalink / raw)
  To: Boqun Feng
  Cc: rcu, linux-kernel, kernel-team, mingo, jiangshanlai, dipankar,
	akpm, mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, # 5.5.x

On Tue, Feb 25, 2020 at 07:14:55PM -0800, Paul E. McKenney wrote:
> On Tue, Feb 25, 2020 at 06:24:36PM +0800, Boqun Feng wrote:
> > Hi Paul,
> > 
> > On Fri, Feb 14, 2020 at 03:56:07PM -0800, paulmck@kernel.org wrote:
> > > From: "Paul E. McKenney" <paulmck@kernel.org>
> > > 
> > > Currently, rcu_barrier() ignores offline CPUs.  However, it is possible
> > > for an offline no-CBs CPU to have callbacks queued, and rcu_barrier()
> > > must wait for those callbacks.  This commit therefore makes rcu_barrier()
> > > directly invoke the rcu_barrier_func() with interrupts disabled for such
> > > CPUs.  This requires passing the CPU number into this function so that
> > > it can entrain the rcu_barrier() callback onto the correct CPU's callback
> > > list, given that the code must instead execute on the current CPU.
> > > 
> > > While in the area, this commit fixes a bug where the first CPU's callback
> > > might have been invoked before rcu_segcblist_entrain() returned, which
> > > would also result in an early wakeup.
> > > 
> > > Fixes: 5d6742b37727 ("rcu/nocb: Use rcu_segcblist for no-CBs CPUs")
> > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > > Cc: <stable@vger.kernel.org> # 5.5.x
> > > ---
> > >  include/trace/events/rcu.h |  1 +
> > >  kernel/rcu/tree.c          | 32 ++++++++++++++++++++------------
> > >  2 files changed, 21 insertions(+), 12 deletions(-)
> > > 
> > > diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
> > > index 5e49b06..d56d54c 100644
> > > --- a/include/trace/events/rcu.h
> > > +++ b/include/trace/events/rcu.h
> > > @@ -712,6 +712,7 @@ TRACE_EVENT_RCU(rcu_torture_read,
> > >   *	"Begin": rcu_barrier() started.
> > >   *	"EarlyExit": rcu_barrier() piggybacked, thus early exit.
> > >   *	"Inc1": rcu_barrier() piggyback check counter incremented.
> > > + *	"OfflineNoCBQ": rcu_barrier() found offline no-CBs CPU with callbacks.
> > >   *	"OnlineQ": rcu_barrier() found online CPU with callbacks.
> > >   *	"OnlineNQ": rcu_barrier() found online CPU, no callbacks.
> > >   *	"IRQ": An rcu_barrier_callback() callback posted on remote CPU.
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index d15041f..160643e 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -3098,9 +3098,10 @@ static void rcu_barrier_callback(struct rcu_head *rhp)
> > >  /*
> > >   * Called with preemption disabled, and from cross-cpu IRQ context.
> > >   */
> > > -static void rcu_barrier_func(void *unused)
> > > +static void rcu_barrier_func(void *cpu_in)
> > >  {
> > > -	struct rcu_data *rdp = raw_cpu_ptr(&rcu_data);
> > > +	uintptr_t cpu = (uintptr_t)cpu_in;
> > > +	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> > >  
> > >  	rcu_barrier_trace(TPS("IRQ"), -1, rcu_state.barrier_sequence);
> > >  	rdp->barrier_head.func = rcu_barrier_callback;
> > > @@ -3127,7 +3128,7 @@ static void rcu_barrier_func(void *unused)
> > >   */
> > >  void rcu_barrier(void)
> > >  {
> > > -	int cpu;
> > > +	uintptr_t cpu;
> > >  	struct rcu_data *rdp;
> > >  	unsigned long s = rcu_seq_snap(&rcu_state.barrier_sequence);
> > >  
> > > @@ -3150,13 +3151,14 @@ void rcu_barrier(void)
> > >  	rcu_barrier_trace(TPS("Inc1"), -1, rcu_state.barrier_sequence);
> > >  
> > >  	/*
> > > -	 * Initialize the count to one rather than to zero in order to
> > > -	 * avoid a too-soon return to zero in case of a short grace period
> > > -	 * (or preemption of this task).  Exclude CPU-hotplug operations
> > > -	 * to ensure that no offline CPU has callbacks queued.
> > > +	 * Initialize the count to two rather than to zero in order
> > > +	 * to avoid a too-soon return to zero in case of an immediate
> > > +	 * invocation of the just-enqueued callback (or preemption of
> > > +	 * this task).  Exclude CPU-hotplug operations to ensure that no
> > > +	 * offline non-offloaded CPU has callbacks queued.
> > >  	 */
> > >  	init_completion(&rcu_state.barrier_completion);
> > > -	atomic_set(&rcu_state.barrier_cpu_count, 1);
> > > +	atomic_set(&rcu_state.barrier_cpu_count, 2);
> > >  	get_online_cpus();
> > >  
> > >  	/*
> > > @@ -3166,13 +3168,19 @@ void rcu_barrier(void)
> > >  	 */
> > >  	for_each_possible_cpu(cpu) {
> > >  		rdp = per_cpu_ptr(&rcu_data, cpu);
> > > -		if (!cpu_online(cpu) &&
> > > +		if (cpu_is_offline(cpu) &&
> > >  		    !rcu_segcblist_is_offloaded(&rdp->cblist))
> > >  			continue;
> > > -		if (rcu_segcblist_n_cbs(&rdp->cblist)) {
> > > +		if (rcu_segcblist_n_cbs(&rdp->cblist) && cpu_online(cpu)) {
> > >  			rcu_barrier_trace(TPS("OnlineQ"), cpu,
> > >  					  rcu_state.barrier_sequence);
> > > -			smp_call_function_single(cpu, rcu_barrier_func, NULL, 1);
> > > +			smp_call_function_single(cpu, rcu_barrier_func, (void *)cpu, 1);
> > > +		} else if (cpu_is_offline(cpu)) {
> > 
> > I wonder whether this should be:
> > 
> > 		  else if (rcu_segcblist_n_cbs(&rdp->cblist) && cpu_is_offline(cpu))
> > 
> > ? Because I think we only want to queue the barrier callback if there
> > are callbacks for a particular CPU. Am I missing something subtle?
> 
> I don't believe that you are missing anything at all!
> 
> Thank you very much -- this bug would not have shown up in any validation
> setup that I am aware of.  ;-)

And with an additional adjustment to make the tracing accurate.

							Thanx, Paul


* Re: [PATCH tip/core/rcu 30/30] rcu: Make rcu_barrier() account for offline no-CBs CPUs
  2020-02-26  3:14     ` Paul E. McKenney
  2020-02-26  4:18       ` Paul E. McKenney
@ 2020-02-26  6:14       ` Boqun Feng
  1 sibling, 0 replies; 46+ messages in thread
From: Boqun Feng @ 2020-02-26  6:14 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: rcu, linux-kernel, kernel-team, mingo, jiangshanlai, dipankar,
	akpm, mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, # 5.5.x

On Tue, Feb 25, 2020 at 07:14:55PM -0800, Paul E. McKenney wrote:
> On Tue, Feb 25, 2020 at 06:24:36PM +0800, Boqun Feng wrote:
> > Hi Paul,
> > 
> > On Fri, Feb 14, 2020 at 03:56:07PM -0800, paulmck@kernel.org wrote:
> > > From: "Paul E. McKenney" <paulmck@kernel.org>
> > > 
> > > Currently, rcu_barrier() ignores offline CPUs.  However, it is possible
> > > for an offline no-CBs CPU to have callbacks queued, and rcu_barrier()
> > > must wait for those callbacks.  This commit therefore makes rcu_barrier()
> > > directly invoke the rcu_barrier_func() with interrupts disabled for such
> > > CPUs.  This requires passing the CPU number into this function so that
> > > it can entrain the rcu_barrier() callback onto the correct CPU's callback
> > > list, given that the code must instead execute on the current CPU.
> > > 
> > > While in the area, this commit fixes a bug where the first CPU's callback
> > > might have been invoked before rcu_segcblist_entrain() returned, which
> > > would also result in an early wakeup.
> > > 
> > > Fixes: 5d6742b37727 ("rcu/nocb: Use rcu_segcblist for no-CBs CPUs")
> > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > > Cc: <stable@vger.kernel.org> # 5.5.x
> > > ---
> > >  include/trace/events/rcu.h |  1 +
> > >  kernel/rcu/tree.c          | 32 ++++++++++++++++++++------------
> > >  2 files changed, 21 insertions(+), 12 deletions(-)
> > > 
> > > diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
> > > index 5e49b06..d56d54c 100644
> > > --- a/include/trace/events/rcu.h
> > > +++ b/include/trace/events/rcu.h
> > > @@ -712,6 +712,7 @@ TRACE_EVENT_RCU(rcu_torture_read,
> > >   *	"Begin": rcu_barrier() started.
> > >   *	"EarlyExit": rcu_barrier() piggybacked, thus early exit.
> > >   *	"Inc1": rcu_barrier() piggyback check counter incremented.
> > > + *	"OfflineNoCBQ": rcu_barrier() found offline no-CBs CPU with callbacks.
> > >   *	"OnlineQ": rcu_barrier() found online CPU with callbacks.
> > >   *	"OnlineNQ": rcu_barrier() found online CPU, no callbacks.
> > >   *	"IRQ": An rcu_barrier_callback() callback posted on remote CPU.
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index d15041f..160643e 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -3098,9 +3098,10 @@ static void rcu_barrier_callback(struct rcu_head *rhp)
> > >  /*
> > >   * Called with preemption disabled, and from cross-cpu IRQ context.
> > >   */
> > > -static void rcu_barrier_func(void *unused)
> > > +static void rcu_barrier_func(void *cpu_in)
> > >  {
> > > -	struct rcu_data *rdp = raw_cpu_ptr(&rcu_data);
> > > +	uintptr_t cpu = (uintptr_t)cpu_in;
> > > +	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> > >  
> > >  	rcu_barrier_trace(TPS("IRQ"), -1, rcu_state.barrier_sequence);
> > >  	rdp->barrier_head.func = rcu_barrier_callback;
> > > @@ -3127,7 +3128,7 @@ static void rcu_barrier_func(void *unused)
> > >   */
> > >  void rcu_barrier(void)
> > >  {
> > > -	int cpu;
> > > +	uintptr_t cpu;
> > >  	struct rcu_data *rdp;
> > >  	unsigned long s = rcu_seq_snap(&rcu_state.barrier_sequence);
> > >  
> > > @@ -3150,13 +3151,14 @@ void rcu_barrier(void)
> > >  	rcu_barrier_trace(TPS("Inc1"), -1, rcu_state.barrier_sequence);
> > >  
> > >  	/*
> > > -	 * Initialize the count to one rather than to zero in order to
> > > -	 * avoid a too-soon return to zero in case of a short grace period
> > > -	 * (or preemption of this task).  Exclude CPU-hotplug operations
> > > -	 * to ensure that no offline CPU has callbacks queued.
> > > +	 * Initialize the count to two rather than to zero in order
> > > +	 * to avoid a too-soon return to zero in case of an immediate
> > > +	 * invocation of the just-enqueued callback (or preemption of
> > > +	 * this task).  Exclude CPU-hotplug operations to ensure that no
> > > +	 * offline non-offloaded CPU has callbacks queued.
> > >  	 */
> > >  	init_completion(&rcu_state.barrier_completion);
> > > -	atomic_set(&rcu_state.barrier_cpu_count, 1);
> > > +	atomic_set(&rcu_state.barrier_cpu_count, 2);
> > >  	get_online_cpus();
> > >  
> > >  	/*
> > > @@ -3166,13 +3168,19 @@ void rcu_barrier(void)
> > >  	 */
> > >  	for_each_possible_cpu(cpu) {
> > >  		rdp = per_cpu_ptr(&rcu_data, cpu);
> > > -		if (!cpu_online(cpu) &&
> > > +		if (cpu_is_offline(cpu) &&
> > >  		    !rcu_segcblist_is_offloaded(&rdp->cblist))
> > >  			continue;
> > > -		if (rcu_segcblist_n_cbs(&rdp->cblist)) {
> > > +		if (rcu_segcblist_n_cbs(&rdp->cblist) && cpu_online(cpu)) {
> > >  			rcu_barrier_trace(TPS("OnlineQ"), cpu,
> > >  					  rcu_state.barrier_sequence);
> > > -			smp_call_function_single(cpu, rcu_barrier_func, NULL, 1);
> > > +			smp_call_function_single(cpu, rcu_barrier_func, (void *)cpu, 1);
> > > +		} else if (cpu_is_offline(cpu)) {
> > 
> > I wonder whether this should be:
> > 
> > 		  else if (rcu_segcblist_n_cbs(&rdp->cblist) && cpu_is_offline(cpu))
> > 
> > ? Because I think we only want to queue the barrier callback if there
> > are callbacks for a particular CPU. Am I missing something subtle?
> 
> I don't believe that you are missing anything at all!
> 
> Thank you very much -- this bug would not have shown up in any validation
> setup that I am aware of.  ;-)
> 
> 							Thanx, Paul
> 
> > Regards,
> > Boqun
> > 
> > > +			rcu_barrier_trace(TPS("OfflineNoCBQ"), cpu,
> > > +					  rcu_state.barrier_sequence);
> > > +			local_irq_disable();
> > > +			rcu_barrier_func((void *)cpu);
> > > +			local_irq_enable();

Another (interesting) thing I found here is that we actually don't need
the irq-off section to call rcu_barrier_func() in this branch.  Because
the target CPU is offline, the cblist is accessed in only two places,
IIUC: one is the rcuo kthread and the other is here (in rcu_barrier()),
and both run in process context rather than irq context, so irq-off is
not required to prevent deadlock.

But yes, I know that if we drop the local_irq_disable/enable() pair
here, it will make lockdep very unhappy ;-)

Regards,
Boqun

> > >  		} else {
> > >  			rcu_barrier_trace(TPS("OnlineNQ"), cpu,
> > >  					  rcu_state.barrier_sequence);
> > > @@ -3184,7 +3192,7 @@ void rcu_barrier(void)
> > >  	 * Now that we have an rcu_barrier_callback() callback on each
> > >  	 * CPU, and thus each counted, remove the initial count.
> > >  	 */
> > > -	if (atomic_dec_and_test(&rcu_state.barrier_cpu_count))
> > > +	if (atomic_sub_and_test(2, &rcu_state.barrier_cpu_count))
> > >  		complete(&rcu_state.barrier_completion);
> > >  
> > >  	/* Wait for all rcu_barrier_callback() callbacks to be invoked. */
> > > -- 
> > > 2.9.5
> > > 


Thread overview: 46+ messages
2020-02-14 23:55 [PATCH tip/core/rcu 0/30] Miscellaneous fixes for v5.7 Paul E. McKenney
2020-02-14 23:55 ` [PATCH tip/core/rcu 01/30] nfs: Fix nfs_access_get_cached_rcu() sparse error paulmck
2020-02-14 23:55 ` [PATCH tip/core/rcu 02/30] rcu: Warn on for_each_leaf_node_cpu_mask() from non-leaf paulmck
2020-02-14 23:55 ` [PATCH tip/core/rcu 03/30] rcu: Fix exp_funnel_lock()/rcu_exp_wait_wake() datarace paulmck
2020-02-14 23:55 ` [PATCH tip/core/rcu 04/30] rcu: Provide debug symbols and line numbers in KCSAN runs paulmck
2020-02-14 23:55 ` [PATCH tip/core/rcu 05/30] rcu: Add WRITE_ONCE() to rcu_node ->qsmask update paulmck
2020-02-14 23:55 ` [PATCH tip/core/rcu 06/30] rcu: Add WRITE_ONCE to rcu_node ->exp_seq_rq store paulmck
2020-02-15  3:47   ` Steven Rostedt
2020-02-15 10:58     ` Paul E. McKenney
2020-02-17 21:11       ` Joel Fernandes
2020-02-17 21:36         ` Paul E. McKenney
2020-02-14 23:55 ` [PATCH tip/core/rcu 07/30] rcu: Add READ_ONCE() to rcu_node ->gp_seq paulmck
2020-02-14 23:55 ` [PATCH tip/core/rcu 08/30] rcu: Add WRITE_ONCE() to rcu_state ->gp_req_activity paulmck
2020-02-14 23:55 ` [PATCH tip/core/rcu 09/30] rcu: Add WRITE_ONCE() to rcu_node ->qsmaskinitnext paulmck
2020-02-14 23:55 ` [PATCH tip/core/rcu 10/30] locking/rtmutex: rcu: Add WRITE_ONCE() to rt_mutex ->owner paulmck
2020-02-14 23:55 ` [PATCH tip/core/rcu 11/30] rcu: Add READ_ONCE() to rcu_segcblist ->tails[] paulmck
2020-02-14 23:55 ` [PATCH tip/core/rcu 12/30] rcu: *_ONCE() for grace-period progress indicators paulmck
2020-02-14 23:55 ` [PATCH tip/core/rcu 13/30] rcu: Fix typos in beginning comments paulmck
2020-02-14 23:55 ` [PATCH tip/core/rcu 14/30] rcu: Add READ_ONCE() to rcu_data ->gpwrap paulmck
2020-02-14 23:55 ` [PATCH tip/core/rcu 15/30] rcu: Add *_ONCE() to rcu_data ->rcu_forced_tick paulmck
2020-02-14 23:55 ` [PATCH tip/core/rcu 16/30] rcu: Add *_ONCE() to rcu_node ->boost_kthread_status paulmck
2020-02-14 23:55 ` [PATCH tip/core/rcu 17/30] timer: Use hlist_unhashed_lockless() in timer_pending() paulmck
2020-02-14 23:55 ` [PATCH tip/core/rcu 18/30] rcu: Remove dead code from rcu_segcblist_insert_pend_cbs() paulmck
2020-02-14 23:55 ` [PATCH tip/core/rcu 19/30] rcu: Add WRITE_ONCE() to rcu_state ->gp_start paulmck
2020-02-14 23:55 ` [PATCH tip/core/rcu 20/30] rcu: Fix rcu_barrier_callback() race condition paulmck
2020-02-14 23:55 ` [PATCH tip/core/rcu 21/30] rculist: Add brackets around cond argument in __list_check_rcu macro paulmck
2020-02-14 23:55 ` [PATCH tip/core/rcu 22/30] rcu: Don't flag non-starting GPs before GP kthread is running paulmck
2020-02-15  3:53   ` Steven Rostedt
2020-02-15 11:01     ` Paul E. McKenney
2020-02-15 13:42       ` Paul E. McKenney
2020-02-17 20:25         ` Steven Rostedt
2020-02-17 22:03           ` Paul E. McKenney
2020-02-17 22:21             ` Steven Rostedt
2020-02-17 23:03               ` Paul E. McKenney
2020-02-14 23:56 ` [PATCH tip/core/rcu 23/30] rcu: Add missing annotation for rcu_nocb_bypass_lock() paulmck
2020-02-14 23:56 ` [PATCH tip/core/rcu 24/30] rcu/nocb: Add missing annotation for rcu_nocb_bypass_unlock() paulmck
2020-02-14 23:56 ` [PATCH tip/core/rcu 25/30] rcu: Optimize and protect atomic_cmpxchg() loop paulmck
2020-02-14 23:56 ` [PATCH tip/core/rcu 26/30] rcu: Tighten rcu_lockdep_assert_cblist_protected() check paulmck
2020-02-14 23:56 ` [PATCH tip/core/rcu 27/30] rcu: Make nocb_gp_wait() double-check unexpected-callback warning paulmck
2020-02-14 23:56 ` [PATCH tip/core/rcu 28/30] rcu: Mark rcu_state.ncpus to detect concurrent writes paulmck
2020-02-14 23:56 ` [PATCH tip/core/rcu 29/30] rcu: Mark rcu_state.gp_seq " paulmck
2020-02-14 23:56 ` [PATCH tip/core/rcu 30/30] rcu: Make rcu_barrier() account for offline no-CBs CPUs paulmck
2020-02-25 10:24   ` Boqun Feng
2020-02-26  3:14     ` Paul E. McKenney
2020-02-26  4:18       ` Paul E. McKenney
2020-02-26  6:14       ` Boqun Feng
