All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH rcu 0/9] SRCU updates for v6.3
@ 2023-01-05  0:28 Paul E. McKenney
  2023-01-05  0:28 ` [PATCH rcu 1/9] srcu: Release early_srcu resources when no longer in use Paul E. McKenney
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: Paul E. McKenney @ 2023-01-05  0:28 UTC (permalink / raw)
  To: rcu; +Cc: linux-kernel, kernel-team, rostedt

Hello!

This series provides SRCU updates, perhaps most notably the addition
of a srcu_down_read() and srcu_up_read() that act like srcu_read_lock()
and srcu_read_unlock(), but which permit the srcu_up_read() to be invoked
from a different context than the srcu_down_read().

1.	Release early_srcu resources when no longer in use, courtesy
	of Zqiang.

2.	Delegate work to the boot cpu if using SRCU_SIZE_SMALL, courtesy
	of Pingfan Liu.

3.	Fix a misspelling in comment, courtesy of Pingfan Liu.

4.	Fix the comparision in srcu_invl_snp_seq(), courtesy of Pingfan
	Liu.

5.	Add srcu_down_read() and srcu_up_read().

6.	Add test code for semaphore-like SRCU readers.

7.	Remove needless rcu_seq_done() check while holding read lock,
	courtesy of Pingfan Liu.

8.	Yet more detail for srcu_readers_active_idx_check() comments.

9.	Update comment after the index flip.

						Thanx, Paul

------------------------------------------------------------------------

 b/include/linux/srcu.h     |   45 ++++++++++++++++++++++
 b/include/linux/srcutree.h |    2 -
 b/kernel/rcu/srcutree.c    |    9 ++--
 b/kernel/rcu/update.c      |    1 
 kernel/rcu/srcutree.c      |   89 ++++++++++++++++++++++++++++++++-------------
 kernel/rcu/update.c        |    3 +
 6 files changed, 119 insertions(+), 30 deletions(-)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH rcu 1/9] srcu: Release early_srcu resources when no longer in use
  2023-01-05  0:28 [PATCH rcu 0/9] SRCU updates for v6.3 Paul E. McKenney
@ 2023-01-05  0:28 ` Paul E. McKenney
  2023-01-05  0:28 ` [PATCH rcu 2/9] srcu: Delegate work to the boot cpu if using SRCU_SIZE_SMALL Paul E. McKenney
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Paul E. McKenney @ 2023-01-05  0:28 UTC (permalink / raw)
  To: rcu; +Cc: linux-kernel, kernel-team, rostedt, Zqiang, Paul E . McKenney

From: Zqiang <qiang1.zhang@intel.com>

Kernels built with the CONFIG_TREE_SRCU Kconfig option set and then
booted with rcupdate.rcu_self_test=1 and srcutree.convert_to_big=1 will
test Tree SRCU during early boot.  The early_srcu structure's srcu_node
array will be allocated when init_srcu_struct_fields() is invoked,
but after the test completes this early_srcu structure will not be used.

This commit therefore invokes cleanup_srcu_struct() to free that srcu_node
structure.

Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/update.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index f5e6a2f95a2a0..a5b4abbee6439 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -586,6 +586,7 @@ static int rcu_verify_early_boot_tests(void)
 		early_boot_test_counter++;
 		srcu_barrier(&early_srcu);
 		WARN_ON_ONCE(!poll_state_synchronize_srcu(&early_srcu, early_srcu_cookie));
+		cleanup_srcu_struct(&early_srcu);
 	}
 	if (rcu_self_test_counter != early_boot_test_counter) {
 		WARN_ON(1);
-- 
2.31.1.189.g2e36527f23


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH rcu 2/9] srcu: Delegate work to the boot cpu if using SRCU_SIZE_SMALL
  2023-01-05  0:28 [PATCH rcu 0/9] SRCU updates for v6.3 Paul E. McKenney
  2023-01-05  0:28 ` [PATCH rcu 1/9] srcu: Release early_srcu resources when no longer in use Paul E. McKenney
@ 2023-01-05  0:28 ` Paul E. McKenney
  2023-01-05  0:28 ` [PATCH rcu 3/9] srcu: Fix a misspelling in comment Paul E. McKenney
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Paul E. McKenney @ 2023-01-05  0:28 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, rostedt, Pingfan Liu,
	Paul E. McKenney, Lai Jiangshan, Josh Triplett,
	Mathieu Desnoyers

From: Pingfan Liu <kernelfans@gmail.com>

Commit 994f706872e6 ("srcu: Make Tree SRCU able to operate without
snp_node array") assumes that cpu 0 is always online.  However, there
really are situations when some other CPU is the boot CPU, for example,
when booting a kdump kernel with the maxcpus=1 boot parameter.

On PowerPC, the kdump kernel can hang as follows:
...
[    1.740036] systemd[1]: Hostname set to <xyz.com>
[  243.686240] INFO: task systemd:1 blocked for more than 122 seconds.
[  243.686264]       Not tainted 6.1.0-rc1 #1
[  243.686272] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.686281] task:systemd         state:D stack:0     pid:1     ppid:0      flags:0x00042000
[  243.686296] Call Trace:
[  243.686301] [c000000016657640] [c000000016657670] 0xc000000016657670 (unreliable)
[  243.686317] [c000000016657830] [c00000001001dec0] __switch_to+0x130/0x220
[  243.686333] [c000000016657890] [c000000010f607b8] __schedule+0x1f8/0x580
[  243.686347] [c000000016657940] [c000000010f60bb4] schedule+0x74/0x140
[  243.686361] [c0000000166579b0] [c000000010f699b8] schedule_timeout+0x168/0x1c0
[  243.686374] [c000000016657a80] [c000000010f61de8] __wait_for_common+0x148/0x360
[  243.686387] [c000000016657b20] [c000000010176bb0] __flush_work.isra.0+0x1c0/0x3d0
[  243.686401] [c000000016657bb0] [c0000000105f2768] fsnotify_wait_marks_destroyed+0x28/0x40
[  243.686415] [c000000016657bd0] [c0000000105f21b8] fsnotify_destroy_group+0x68/0x160
[  243.686428] [c000000016657c40] [c0000000105f6500] inotify_release+0x30/0xa0
[  243.686440] [c000000016657cb0] [c0000000105751a8] __fput+0xc8/0x350
[  243.686452] [c000000016657d00] [c00000001017d524] task_work_run+0xe4/0x170
[  243.686464] [c000000016657d50] [c000000010020e94] do_notify_resume+0x134/0x140
[  243.686478] [c000000016657d80] [c00000001002eb18] interrupt_exit_user_prepare_main+0x198/0x270
[  243.686493] [c000000016657de0] [c00000001002ec60] syscall_exit_prepare+0x70/0x180
[  243.686505] [c000000016657e10] [c00000001000bf7c] system_call_vectored_common+0xfc/0x280
[  243.686520] --- interrupt: 3000 at 0x7fffa47d5ba4
[  243.686528] NIP:  00007fffa47d5ba4 LR: 0000000000000000 CTR: 0000000000000000
[  243.686538] REGS: c000000016657e80 TRAP: 3000   Not tainted  (6.1.0-rc1)
[  243.686548] MSR:  800000000000d033 <SF,EE,PR,ME,IR,DR,RI,LE>  CR: 42044440  XER: 00000000
[  243.686572] IRQMASK: 0
[  243.686572] GPR00: 0000000000000006 00007ffffa606710 00007fffa48e7200 0000000000000000
[  243.686572] GPR04: 0000000000000002 000000000000000a 0000000000000000 0000000000000001
[  243.686572] GPR08: 000001000c172dd0 0000000000000000 0000000000000000 0000000000000000
[  243.686572] GPR12: 0000000000000000 00007fffa4ff4bc0 0000000000000000 0000000000000000
[  243.686572] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[  243.686572] GPR20: 0000000132dfdc50 000000000000000e 0000000000189375 0000000000000000
[  243.686572] GPR24: 00007ffffa606ae0 0000000000000005 000001000c185490 000001000c172570
[  243.686572] GPR28: 000001000c172990 000001000c184850 000001000c172e00 00007fffa4fedd98
[  243.686683] NIP [00007fffa47d5ba4] 0x7fffa47d5ba4
[  243.686691] LR [0000000000000000] 0x0
[  243.686698] --- interrupt: 3000
[  243.686708] INFO: task kworker/u16:1:24 blocked for more than 122 seconds.
[  243.686717]       Not tainted 6.1.0-rc1 #1
[  243.686724] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.686733] task:kworker/u16:1   state:D stack:0     pid:24    ppid:2      flags:0x00000800
[  243.686747] Workqueue: events_unbound fsnotify_mark_destroy_workfn
[  243.686758] Call Trace:
[  243.686762] [c0000000166736e0] [c00000004fd91000] 0xc00000004fd91000 (unreliable)
[  243.686775] [c0000000166738d0] [c00000001001dec0] __switch_to+0x130/0x220
[  243.686788] [c000000016673930] [c000000010f607b8] __schedule+0x1f8/0x580
[  243.686801] [c0000000166739e0] [c000000010f60bb4] schedule+0x74/0x140
[  243.686814] [c000000016673a50] [c000000010f699b8] schedule_timeout+0x168/0x1c0
[  243.686827] [c000000016673b20] [c000000010f61de8] __wait_for_common+0x148/0x360
[  243.686840] [c000000016673bc0] [c000000010210840] __synchronize_srcu.part.0+0xa0/0xe0
[  243.686855] [c000000016673c30] [c0000000105f2c64] fsnotify_mark_destroy_workfn+0xc4/0x1a0
[  243.686868] [c000000016673ca0] [c000000010174ea8] process_one_work+0x2a8/0x570
[  243.686882] [c000000016673d40] [c000000010175208] worker_thread+0x98/0x5e0
[  243.686895] [c000000016673dc0] [c0000000101828d4] kthread+0x124/0x130
[  243.686908] [c000000016673e10] [c00000001000cd40] ret_from_kernel_thread+0x5c/0x64
[  366.566274] INFO: task systemd:1 blocked for more than 245 seconds.
[  366.566298]       Not tainted 6.1.0-rc1 #1
[  366.566305] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  366.566314] task:systemd         state:D stack:0     pid:1     ppid:0      flags:0x00042000
[  366.566329] Call Trace:
...

The above splat occurs because PowerPC really does use maxcpus=1
instead of nr_cpus=1 in the kernel command line.  Consequently, the
(quite possibly non-zero) kdump CPU is the only online CPU in the kdump
kernel.  SRCU unconditionally queues a sdp->work on cpu 0, for which no
worker thread has been created, so sdp->work will be never executed and
__synchronize_srcu() will never be completed.

This commit therefore replaces CPU ID 0 with get_boot_cpu_id() in key
places in Tree SRCU.  Since the CPU indicated by get_boot_cpu_id()
is guaranteed to be online, this avoids the above splat.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: rcu@vger.kernel.org
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/srcutree.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index ca4b5dcec675b..16953784a0bdf 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -726,7 +726,7 @@ static void srcu_gp_start(struct srcu_struct *ssp)
 	int state;
 
 	if (smp_load_acquire(&ssp->srcu_size_state) < SRCU_SIZE_WAIT_BARRIER)
-		sdp = per_cpu_ptr(ssp->sda, 0);
+		sdp = per_cpu_ptr(ssp->sda, get_boot_cpu_id());
 	else
 		sdp = this_cpu_ptr(ssp->sda);
 	lockdep_assert_held(&ACCESS_PRIVATE(ssp, lock));
@@ -837,7 +837,8 @@ static void srcu_gp_end(struct srcu_struct *ssp)
 	/* Initiate callback invocation as needed. */
 	ss_state = smp_load_acquire(&ssp->srcu_size_state);
 	if (ss_state < SRCU_SIZE_WAIT_BARRIER) {
-		srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, 0), cbdelay);
+		srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, get_boot_cpu_id()),
+					cbdelay);
 	} else {
 		idx = rcu_seq_ctr(gpseq) % ARRAY_SIZE(snp->srcu_have_cbs);
 		srcu_for_each_node_breadth_first(ssp, snp) {
@@ -1161,7 +1162,7 @@ static unsigned long srcu_gp_start_if_needed(struct srcu_struct *ssp,
 	idx = __srcu_read_lock_nmisafe(ssp);
 	ss_state = smp_load_acquire(&ssp->srcu_size_state);
 	if (ss_state < SRCU_SIZE_WAIT_CALL)
-		sdp = per_cpu_ptr(ssp->sda, 0);
+		sdp = per_cpu_ptr(ssp->sda, get_boot_cpu_id());
 	else
 		sdp = raw_cpu_ptr(ssp->sda);
 	spin_lock_irqsave_sdp_contention(sdp, &flags);
@@ -1497,7 +1498,7 @@ void srcu_barrier(struct srcu_struct *ssp)
 
 	idx = __srcu_read_lock_nmisafe(ssp);
 	if (smp_load_acquire(&ssp->srcu_size_state) < SRCU_SIZE_WAIT_BARRIER)
-		srcu_barrier_one_cpu(ssp, per_cpu_ptr(ssp->sda, 0));
+		srcu_barrier_one_cpu(ssp, per_cpu_ptr(ssp->sda,	get_boot_cpu_id()));
 	else
 		for_each_possible_cpu(cpu)
 			srcu_barrier_one_cpu(ssp, per_cpu_ptr(ssp->sda, cpu));
-- 
2.31.1.189.g2e36527f23


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH rcu 3/9] srcu: Fix a misspelling in comment
  2023-01-05  0:28 [PATCH rcu 0/9] SRCU updates for v6.3 Paul E. McKenney
  2023-01-05  0:28 ` [PATCH rcu 1/9] srcu: Release early_srcu resources when no longer in use Paul E. McKenney
  2023-01-05  0:28 ` [PATCH rcu 2/9] srcu: Delegate work to the boot cpu if using SRCU_SIZE_SMALL Paul E. McKenney
@ 2023-01-05  0:28 ` Paul E. McKenney
  2023-01-05  0:28 ` [PATCH rcu 4/9] srcu: Fix the comparision in srcu_invl_snp_seq() Paul E. McKenney
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Paul E. McKenney @ 2023-01-05  0:28 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, rostedt, Pingfan Liu, Lai Jiangshan,
	Josh Triplett, Mathieu Desnoyers, Mukesh Ojha,
	Frederic Weisbecker, Paul E . McKenney

From: Pingfan Liu <kernelfans@gmail.com>

s/srcu_gq_seq/srcu_gp_seq/

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: <rcu@vger.kernel.org>
Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 include/linux/srcutree.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/srcutree.h b/include/linux/srcutree.h
index c689a81752c9a..558057b517b74 100644
--- a/include/linux/srcutree.h
+++ b/include/linux/srcutree.h
@@ -49,7 +49,7 @@ struct srcu_data {
 struct srcu_node {
 	spinlock_t __private lock;
 	unsigned long srcu_have_cbs[4];		/* GP seq for children having CBs, but only */
-						/*  if greater than ->srcu_gq_seq. */
+						/*  if greater than ->srcu_gp_seq. */
 	unsigned long srcu_data_have_cbs[4];	/* Which srcu_data structs have CBs for given GP? */
 	unsigned long srcu_gp_seq_needed_exp;	/* Furthest future exp GP. */
 	struct srcu_node *srcu_parent;		/* Next up in tree. */
-- 
2.31.1.189.g2e36527f23


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH rcu 4/9] srcu: Fix the comparision in srcu_invl_snp_seq()
  2023-01-05  0:28 [PATCH rcu 0/9] SRCU updates for v6.3 Paul E. McKenney
                   ` (2 preceding siblings ...)
  2023-01-05  0:28 ` [PATCH rcu 3/9] srcu: Fix a misspelling in comment Paul E. McKenney
@ 2023-01-05  0:28 ` Paul E. McKenney
  2023-01-05  0:28 ` [PATCH rcu 5/9] rcu: Add srcu_down_read() and srcu_up_read() Paul E. McKenney
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Paul E. McKenney @ 2023-01-05  0:28 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, rostedt, Pingfan Liu, Lai Jiangshan,
	Paul E. McKenney, Josh Triplett, Mathieu Desnoyers,
	Frederic Weisbecker

From: Pingfan Liu <kernelfans@gmail.com>

A grace-period sequence number contains two fields: counter and
state.  SRCU_SNP_INIT_SEQ provides a guaranteed invalid value for
grace-period sequence numbers in newly allocated srcu_node structures'
->srcu_have_cbs[] and ->srcu_gp_seq_needed_exp fields.  The point of the
comparison in srcu_invl_snp_seq() is not to detect invalid grace-period
sequence numbers in general, but rather to detect a newly allocated
srcu_node structure whose ->srcu_have_cbs[] and ->srcu_gp_seq_needed_exp
fields need to be brought into line with the srcu_struct structure's
->srcu_gp_seq field.

This commit therefore causes srcu_invl_snp_seq() to compare both fields
of the specified grace-period sequence number.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: <rcu@vger.kernel.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/srcutree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 16953784a0bdf..6af0312005801 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -154,7 +154,7 @@ static void init_srcu_struct_data(struct srcu_struct *ssp)
  */
 static inline bool srcu_invl_snp_seq(unsigned long s)
 {
-	return rcu_seq_state(s) == SRCU_SNP_INIT_SEQ;
+	return s == SRCU_SNP_INIT_SEQ;
 }
 
 /*
-- 
2.31.1.189.g2e36527f23


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH rcu 5/9] rcu: Add srcu_down_read() and srcu_up_read()
  2023-01-05  0:28 [PATCH rcu 0/9] SRCU updates for v6.3 Paul E. McKenney
                   ` (3 preceding siblings ...)
  2023-01-05  0:28 ` [PATCH rcu 4/9] srcu: Fix the comparision in srcu_invl_snp_seq() Paul E. McKenney
@ 2023-01-05  0:28 ` Paul E. McKenney
  2023-01-05  0:28 ` [PATCH rcu 6/9] rcu: Add test code for semaphore-like SRCU readers Paul E. McKenney
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Paul E. McKenney @ 2023-01-05  0:28 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, rostedt, Paul E. McKenney, Jan Kara,
	Amir Goldstein

A pair of matching srcu_read_lock() and srcu_read_unlock() invocations
must take place within the same context, for example, within the same
task.  Otherwise, lockdep complains, as is the right thing to do for
most use cases.

However, there are use cases involving asynchronous I/O where the
SRCU reader needs to begin on one task and end on another.  This commit
therefore supplies the semaphore-like srcu_down_read() and srcu_up_read(),
which act like srcu_read_lock() and srcu_read_unlock(), but permitting
srcu_up_read() to be invoked in a different context than was the matching
srcu_down_read().

Neither srcu_down_read() nor srcu_up_read() may be invoked from an
NMI handler.

Reported-by: Jan Kara <jack@suse.cz>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Amir Goldstein <amir73il@gmail.com>
---
 include/linux/srcu.h | 45 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index 9b9d0bbf1d3cf..74796cd7e7a9d 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -214,6 +214,34 @@ srcu_read_lock_notrace(struct srcu_struct *ssp) __acquires(ssp)
 	return retval;
 }
 
+/**
+ * srcu_down_read - register a new reader for an SRCU-protected structure.
+ * @ssp: srcu_struct in which to register the new reader.
+ *
+ * Enter a semaphore-like SRCU read-side critical section.  Note that
+ * SRCU read-side critical sections may be nested.  However, it is
+ * illegal to call anything that waits on an SRCU grace period for the
+ * same srcu_struct, whether directly or indirectly.  Please note that
+ * one way to indirectly wait on an SRCU grace period is to acquire
+ * a mutex that is held elsewhere while calling synchronize_srcu() or
+ * synchronize_srcu_expedited().  But if you want lockdep to help you
+ * keep this stuff straight, you should instead use srcu_read_lock().
+ *
+ * The semaphore-like nature of srcu_down_read() means that the matching
+ * srcu_up_read() can be invoked from some other context, for example,
+ * from some other task or from an irq handler.  However, neither
+ * srcu_down_read() nor srcu_up_read() may be invoked from an NMI handler.
+ *
+ * Calls to srcu_down_read() may be nested, similar to the manner in
+ * which calls to down_read() may be nested.
+ */
+static inline int srcu_down_read(struct srcu_struct *ssp) __acquires(ssp)
+{
+	WARN_ON_ONCE(in_nmi());
+	srcu_check_nmi_safety(ssp, false);
+	return __srcu_read_lock(ssp);
+}
+
 /**
  * srcu_read_unlock - unregister a old reader from an SRCU-protected structure.
  * @ssp: srcu_struct in which to unregister the old reader.
@@ -254,6 +282,23 @@ srcu_read_unlock_notrace(struct srcu_struct *ssp, int idx) __releases(ssp)
 	__srcu_read_unlock(ssp, idx);
 }
 
+/**
+ * srcu_up_read - unregister a old reader from an SRCU-protected structure.
+ * @ssp: srcu_struct in which to unregister the old reader.
+ * @idx: return value from corresponding srcu_read_lock().
+ *
+ * Exit an SRCU read-side critical section, but not necessarily from
+ * the same context as the maching srcu_down_read().
+ */
+static inline void srcu_up_read(struct srcu_struct *ssp, int idx)
+	__releases(ssp)
+{
+	WARN_ON_ONCE(idx & ~0x1);
+	WARN_ON_ONCE(in_nmi());
+	srcu_check_nmi_safety(ssp, false);
+	__srcu_read_unlock(ssp, idx);
+}
+
 /**
  * smp_mb__after_srcu_read_unlock - ensure full ordering after srcu_read_unlock
  *
-- 
2.31.1.189.g2e36527f23


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH rcu 6/9] rcu: Add test code for semaphore-like SRCU readers
  2023-01-05  0:28 [PATCH rcu 0/9] SRCU updates for v6.3 Paul E. McKenney
                   ` (4 preceding siblings ...)
  2023-01-05  0:28 ` [PATCH rcu 5/9] rcu: Add srcu_down_read() and srcu_up_read() Paul E. McKenney
@ 2023-01-05  0:28 ` Paul E. McKenney
  2023-01-05  0:28 ` [PATCH rcu 7/9] srcu: Remove needless rcu_seq_done() check while holding read lock Paul E. McKenney
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Paul E. McKenney @ 2023-01-05  0:28 UTC (permalink / raw)
  To: rcu; +Cc: linux-kernel, kernel-team, rostedt, Paul E. McKenney

This commit adds trivial test code for srcu_down_read() and
srcu_up_read().

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/update.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index a5b4abbee6439..a72f98c120f09 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -555,9 +555,12 @@ struct early_boot_kfree_rcu {
 static void early_boot_test_call_rcu(void)
 {
 	static struct rcu_head head;
+	int idx;
 	static struct rcu_head shead;
 	struct early_boot_kfree_rcu *rhp;
 
+	idx = srcu_down_read(&early_srcu);
+	srcu_up_read(&early_srcu, idx);
 	call_rcu(&head, test_callback);
 	early_srcu_cookie = start_poll_synchronize_srcu(&early_srcu);
 	call_srcu(&early_srcu, &shead, test_callback);
-- 
2.31.1.189.g2e36527f23


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH rcu 7/9] srcu: Remove needless rcu_seq_done() check while holding read lock
  2023-01-05  0:28 [PATCH rcu 0/9] SRCU updates for v6.3 Paul E. McKenney
                   ` (5 preceding siblings ...)
  2023-01-05  0:28 ` [PATCH rcu 6/9] rcu: Add test code for semaphore-like SRCU readers Paul E. McKenney
@ 2023-01-05  0:28 ` Paul E. McKenney
  2023-01-05  0:28 ` [PATCH rcu 8/9] srcu: Yet more detail for srcu_readers_active_idx_check() comments Paul E. McKenney
  2023-01-05  0:28 ` [PATCH rcu 9/9] srcu: Update comment after the index flip Paul E. McKenney
  8 siblings, 0 replies; 10+ messages in thread
From: Paul E. McKenney @ 2023-01-05  0:28 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, rostedt, Pingfan Liu, Lai Jiangshan,
	Frederic Weisbecker, Josh Triplett, Mathieu Desnoyers,
	Paul E . McKenney

From: Pingfan Liu <kernelfans@gmail.com>

The srcu_gp_start_if_needed() function now read-holds the srcu_struct
whose grace period is being started, which means that the corresponding
SRCU grace period cannot end.  This in turn means that the SRCU
grace-period sequence number returned by rcu_seq_snap() cannot expire
during this time.  And that means that the calls to rcu_seq_done() in
srcu_funnel_exp_start() and srcu_funnel_gp_start() can never return true.

This commit therefore removes these rcu_seq_done() checks, but adds checks
in kernels built with CONFIG_PROVE_RCU=y that splats if rcu_seq_done()
does somehow return true.

[ paulmck: Rearrange checks to handle kernels built with lockdep. ]

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: rcu@vger.kernel.org
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/srcutree.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 6af0312005801..68b8d8b150db1 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -915,7 +915,7 @@ static void srcu_funnel_exp_start(struct srcu_struct *ssp, struct srcu_node *snp
 	if (snp)
 		for (; snp != NULL; snp = snp->srcu_parent) {
 			sgsne = READ_ONCE(snp->srcu_gp_seq_needed_exp);
-			if (rcu_seq_done(&ssp->srcu_gp_seq, s) ||
+			if (WARN_ON_ONCE(rcu_seq_done(&ssp->srcu_gp_seq, s)) ||
 			    (!srcu_invl_snp_seq(sgsne) && ULONG_CMP_GE(sgsne, s)))
 				return;
 			spin_lock_irqsave_rcu_node(snp, flags);
@@ -942,6 +942,9 @@ static void srcu_funnel_exp_start(struct srcu_struct *ssp, struct srcu_node *snp
  *
  * Note that this function also does the work of srcu_funnel_exp_start(),
  * in some cases by directly invoking it.
+ *
+ * The srcu read lock should be hold around this function. And s is a seq snap
+ * after holding that lock.
  */
 static void srcu_funnel_gp_start(struct srcu_struct *ssp, struct srcu_data *sdp,
 				 unsigned long s, bool do_norm)
@@ -962,7 +965,7 @@ static void srcu_funnel_gp_start(struct srcu_struct *ssp, struct srcu_data *sdp,
 	if (snp_leaf)
 		/* Each pass through the loop does one level of the srcu_node tree. */
 		for (snp = snp_leaf; snp != NULL; snp = snp->srcu_parent) {
-			if (rcu_seq_done(&ssp->srcu_gp_seq, s) && snp != snp_leaf)
+			if (WARN_ON_ONCE(rcu_seq_done(&ssp->srcu_gp_seq, s)) && snp != snp_leaf)
 				return; /* GP already done and CBs recorded. */
 			spin_lock_irqsave_rcu_node(snp, flags);
 			snp_seq = snp->srcu_have_cbs[idx];
@@ -999,8 +1002,8 @@ static void srcu_funnel_gp_start(struct srcu_struct *ssp, struct srcu_data *sdp,
 	if (!do_norm && ULONG_CMP_LT(ssp->srcu_gp_seq_needed_exp, s))
 		WRITE_ONCE(ssp->srcu_gp_seq_needed_exp, s);
 
-	/* If grace period not already done and none in progress, start it. */
-	if (!rcu_seq_done(&ssp->srcu_gp_seq, s) &&
+	/* If grace period not already in progress, start it. */
+	if (!WARN_ON_ONCE(rcu_seq_done(&ssp->srcu_gp_seq, s)) &&
 	    rcu_seq_state(ssp->srcu_gp_seq) == SRCU_STATE_IDLE) {
 		WARN_ON_ONCE(ULONG_CMP_GE(ssp->srcu_gp_seq, ssp->srcu_gp_seq_needed));
 		srcu_gp_start(ssp);
-- 
2.31.1.189.g2e36527f23


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH rcu 8/9] srcu: Yet more detail for srcu_readers_active_idx_check() comments
  2023-01-05  0:28 [PATCH rcu 0/9] SRCU updates for v6.3 Paul E. McKenney
                   ` (6 preceding siblings ...)
  2023-01-05  0:28 ` [PATCH rcu 7/9] srcu: Remove needless rcu_seq_done() check while holding read lock Paul E. McKenney
@ 2023-01-05  0:28 ` Paul E. McKenney
  2023-01-05  0:28 ` [PATCH rcu 9/9] srcu: Update comment after the index flip Paul E. McKenney
  8 siblings, 0 replies; 10+ messages in thread
From: Paul E. McKenney @ 2023-01-05  0:28 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, rostedt, Paul E. McKenney, Boqun Feng,
	Frederic Weisbecker, Joel Fernandes (Google),
	Neeraj Upadhyay, Uladzislau Rezki

The comment in srcu_readers_active_idx_check() following the smp_mb()
is out of date, hailing from a simpler time when preemption was disabled
across the bulk of __srcu_read_lock().  The fact that preemption was
disabled meant that the number of tasks that had fetched the old index
but not yet incremented counters was limited by the number of CPUs.

In our more complex modern times, the number of CPUs is no longer a limit.
This commit therefore updates this comment, additionally giving more
memory-ordering detail.

[ paulmck: Apply Nt->Nc feedback from Joel Fernandes. ]

Reported-by: Boqun Feng <boqun.feng@gmail.com>
Reported-by: Frederic Weisbecker <frederic@kernel.org>
Reported-by: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Reported-by: Neeraj Upadhyay <neeraj.iitr10@gmail.com>
Reported-by: Uladzislau Rezki <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/srcutree.c | 67 ++++++++++++++++++++++++++++++++-----------
 1 file changed, 51 insertions(+), 16 deletions(-)

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 68b8d8b150db1..d2d2e31c42b13 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -469,24 +469,59 @@ static bool srcu_readers_active_idx_check(struct srcu_struct *ssp, int idx)
 
 	/*
 	 * If the locks are the same as the unlocks, then there must have
-	 * been no readers on this index at some time in between. This does
-	 * not mean that there are no more readers, as one could have read
-	 * the current index but not have incremented the lock counter yet.
+	 * been no readers on this index at some point in this function.
+	 * But there might be more readers, as a task might have read
+	 * the current ->srcu_idx but not yet have incremented its CPU's
+	 * ->srcu_lock_count[idx] counter.  In fact, it is possible
+	 * that most of the tasks have been preempted between fetching
+	 * ->srcu_idx and incrementing ->srcu_lock_count[idx].  And there
+	 * could be almost (ULONG_MAX / sizeof(struct task_struct)) tasks
+	 * in a system whose address space was fully populated with memory.
+	 * Call this quantity Nt.
 	 *
-	 * So suppose that the updater is preempted here for so long
-	 * that more than ULONG_MAX non-nested readers come and go in
-	 * the meantime.  It turns out that this cannot result in overflow
-	 * because if a reader modifies its unlock count after we read it
-	 * above, then that reader's next load of ->srcu_idx is guaranteed
-	 * to get the new value, which will cause it to operate on the
-	 * other bank of counters, where it cannot contribute to the
-	 * overflow of these counters.  This means that there is a maximum
-	 * of 2*NR_CPUS increments, which cannot overflow given current
-	 * systems, especially not on 64-bit systems.
+	 * So suppose that the updater is preempted at this point in the
+	 * code for a long time.  That now-preempted updater has already
+	 * flipped ->srcu_idx (possibly during the preceding grace period),
+	 * done an smp_mb() (again, possibly during the preceding grace
+	 * period), and summed up the ->srcu_unlock_count[idx] counters.
+	 * How many times can a given one of the aforementioned Nt tasks
+	 * increment the old ->srcu_idx value's ->srcu_lock_count[idx]
+	 * counter, in the absence of nesting?
 	 *
-	 * OK, how about nesting?  This does impose a limit on nesting
-	 * of floor(ULONG_MAX/NR_CPUS/2), which should be sufficient,
-	 * especially on 64-bit systems.
+	 * It can clearly do so once, given that it has already fetched
+	 * the old value of ->srcu_idx and is just about to use that value
+	 * to index its increment of ->srcu_lock_count[idx].  But as soon as
+	 * it leaves that SRCU read-side critical section, it will increment
+	 * ->srcu_unlock_count[idx], which must follow the updater's above
+	 * read from that same value.  Thus, as soon the reading task does
+	 * an smp_mb() and a later fetch from ->srcu_idx, that task will be
+	 * guaranteed to get the new index.  Except that the increment of
+	 * ->srcu_unlock_count[idx] in __srcu_read_unlock() is after the
+	 * smp_mb(), and the fetch from ->srcu_idx in __srcu_read_lock()
+	 * is before the smp_mb().  Thus, that task might not see the new
+	 * value of ->srcu_idx until the -second- __srcu_read_lock(),
+	 * which in turn means that this task might well increment
+	 * ->srcu_lock_count[idx] for the old value of ->srcu_idx twice,
+	 * not just once.
+	 *
+	 * However, it is important to note that a given smp_mb() takes
+	 * effect not just for the task executing it, but also for any
+	 * later task running on that same CPU.
+	 *
+	 * That is, there can be almost Nt + Nc further increments of
+	 * ->srcu_lock_count[idx] for the old index, where Nc is the number
+	 * of CPUs.  But this is OK because the size of the task_struct
+	 * structure limits the value of Nt and current systems limit Nc
+	 * to a few thousand.
+	 *
+	 * OK, but what about nesting?  This does impose a limit on
+	 * nesting of half of the size of the task_struct structure
+	 * (measured in bytes), which should be sufficient.  A late 2022
+	 * TREE01 rcutorture run reported this size to be no less than
+	 * 9408 bytes, allowing up to 4704 levels of nesting, which is
+	 * comfortably beyond excessive.  Especially on 64-bit systems,
+	 * which are unlikely to be configured with an address space fully
+	 * populated with memory, at least not anytime soon.
 	 */
 	return srcu_readers_lock_idx(ssp, idx) == unlocks;
 }
-- 
2.31.1.189.g2e36527f23


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH rcu 9/9] srcu: Update comment after the index flip
  2023-01-05  0:28 [PATCH rcu 0/9] SRCU updates for v6.3 Paul E. McKenney
                   ` (7 preceding siblings ...)
  2023-01-05  0:28 ` [PATCH rcu 8/9] srcu: Yet more detail for srcu_readers_active_idx_check() comments Paul E. McKenney
@ 2023-01-05  0:28 ` Paul E. McKenney
  8 siblings, 0 replies; 10+ messages in thread
From: Paul E. McKenney @ 2023-01-05  0:28 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, rostedt, Paul E. McKenney,
	Joel Fernandes, Boqun Feng, Frederic Weisbecker, Neeraj Upadhyay

Because there is not guaranteed to be a full memory barrier between
the ->srcu_unlock_count increment of an srcu_read_unlock() and the
->srcu_lock_count increment of the next srcu_read_lock(), this next
srcu_read_lock() is not guaranteed to see the effect of the index flip
just prior to this comment.  However, this next srcu_read_lock() will
execute a full memory barrier, so the srcu_read_lock() after that is
guaranteed to see that index flip.

This guarantee is illustrated by the following diagram of events and
the litmus test following that.

------------------------------------------------------------------------

READER                  UPDATER
-------------           ----------
                           // idx is initially 0.

                           srcu_flip() {
                              smp_mb();
// RSCS

srcu_read_unlock() {
  smp_mb();
                              idx++;    // P
                              smp_mb(); // QQ
                           }

                           srcu_readers_unlock_idx(0) {
        ,--counted------------ count all unlock[0]; // Q
        |
  unlock[0]++;  // X

}
                               smp_mb();
srcu_read_lock() {
  READ(idx) = 0;         ,---- count all lock[0]; // contributes imbalance of 1.
  lock[0]++;  ----counted              |
  smp_mb(); // PP          }           |
}                                      |
                                       |
// RSCS                             not going to effect above scan
                                       |
srcu_read_unlock() {                   |
  smp_mb();                            |
  unlock[0]++;                         |
}                                      |
                                      /
                                     /
srcu_read_lock() {                  |
  READ(idx);  // Y  -----cannot be counted because of P (has to sample idx as 1)
  lock[1]++;
  ...
}

------------------------------------------------------------------------

This makes it similar to the store buffer pattern. Using X, Y, P and Q
annotated above, we get:

------------------------------------------------------------------------

READER                    UPDATER
X (write)                 P (write)

smp_mb(); //PP            smp_mb(); //QQ

Y (read)                  Q (read)

------------------------------------------------------------------------

ASCII art courtesy of Joel Fernandes.

Reported-by: Joel Fernandes <joel@joelfernandes.org>
Reported-by: Boqun Feng <boqun.feng@gmail.com>
Reported-by: Frederic Weisbecker <frederic@kernel.org>
Reported-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/srcutree.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index d2d2e31c42b13..ab4ee58af84bf 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -1098,10 +1098,11 @@ static void srcu_flip(struct srcu_struct *ssp)
 
 	/*
 	 * Ensure that if the updater misses an __srcu_read_unlock()
-	 * increment, that task's next __srcu_read_lock() will see the
-	 * above counter update.  Note that both this memory barrier
-	 * and the one in srcu_readers_active_idx_check() provide the
-	 * guarantee for __srcu_read_lock().
+	 * increment, that task's __srcu_read_lock() following its next
+	 * __srcu_read_lock() or __srcu_read_unlock() will see the above
+	 * counter update.  Note that both this memory barrier and the
+	 * one in srcu_readers_active_idx_check() provide the guarantee
+	 * for __srcu_read_lock().
 	 */
 	smp_mb(); /* D */  /* Pairs with C. */
 }
-- 
2.31.1.189.g2e36527f23


^ permalink raw reply related	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2023-01-05  0:37 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-05  0:28 [PATCH rcu 0/9] SRCU updates for v6.3 Paul E. McKenney
2023-01-05  0:28 ` [PATCH rcu 1/9] srcu: Release early_srcu resources when no longer in use Paul E. McKenney
2023-01-05  0:28 ` [PATCH rcu 2/9] srcu: Delegate work to the boot cpu if using SRCU_SIZE_SMALL Paul E. McKenney
2023-01-05  0:28 ` [PATCH rcu 3/9] srcu: Fix a misspelling in comment Paul E. McKenney
2023-01-05  0:28 ` [PATCH rcu 4/9] srcu: Fix the comparision in srcu_invl_snp_seq() Paul E. McKenney
2023-01-05  0:28 ` [PATCH rcu 5/9] rcu: Add srcu_down_read() and srcu_up_read() Paul E. McKenney
2023-01-05  0:28 ` [PATCH rcu 6/9] rcu: Add test code for semaphore-like SRCU readers Paul E. McKenney
2023-01-05  0:28 ` [PATCH rcu 7/9] srcu: Remove needless rcu_seq_done() check while holding read lock Paul E. McKenney
2023-01-05  0:28 ` [PATCH rcu 8/9] srcu: Yet more detail for srcu_readers_active_idx_check() comments Paul E. McKenney
2023-01-05  0:28 ` [PATCH rcu 9/9] srcu: Update comment after the index flip Paul E. McKenney

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.