rcu.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH tip/core/rcu 0/12] Add strict short-grace-period Kconfig option
@ 2020-08-12 22:56 Paul E. McKenney
  2020-08-12 22:57 ` [PATCH tip/core/rcu 01/12] rcu: Add Kconfig option for strict RCU grace periods paulmck
                   ` (11 more replies)
  0 siblings, 12 replies; 15+ messages in thread
From: Paul E. McKenney @ 2020-08-12 22:56 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, elver, dvyukov, jannh

Hello!

This series adds a CONFIG_RCU_STRICT_GRACE_PERIOD Kconfig option that
causes RCU to strive for short grace periods even at great expense in
terms of performance, scalability, and real-time response.  This option is
therefore not for production use, but rather to allow tools such as KASAN
to more readily detect pointer leaks from RCU read-side critical sections.
Here is an example of such a pointer leak:

	rcu_read_lock();
	p = rcu_dereference(gp);
	do_something(p);
	rcu_read_unlock(); // *p might be freed immediately!
	do_something_else(p); // Potential use after free BUG!!!

This series also adds a rcutree.rcu_unlock_delay kernel boot parameter
that delays the specified number of microseconds after the outermost
rcu_read_unlock() in an attempt to further bend the odds in KASAN's favor.

The patches in this series are as follows:

1.	Add Kconfig option for strict RCU grace periods.

2.	Reduce leaf fanout for strict RCU grace periods.

3.	Restrict default jiffies_till_first_fqs for strict RCU GPs.

4.	Force DEFAULT_RCU_BLIMIT to 1000 for strict RCU GPs.

5.	Always set .need_qs from __rcu_read_lock() for strict GPs.

6.	Do full report for .need_qs for strict GPs.

7.	Attempt QS when CPU discovers GP for strict GPs.

8.	IPI all CPUs at GP start for strict GPs.

9.	IPI all CPUs at GP end for strict GPs.

10.	Provide optional RCU-reader exit delay for strict GPs.

11.	Execute RCU reader shortly after rcu_core for strict GPs.

12.	Report QS for outermost PREEMPT=n rcu_read_unlock() for strict GPs.

							Thanx, Paul

------------------------------------------------------------------------

 Documentation/admin-guide/kernel-parameters.txt |    9 +++
 include/linux/rcupdate.h                        |    7 ++
 kernel/rcu/Kconfig                              |    8 +-
 kernel/rcu/Kconfig.debug                        |   15 +++++
 kernel/rcu/tree.c                               |   65 +++++++++++++++++++++---
 kernel/rcu/tree.h                               |    1 
 kernel/rcu/tree_plugin.h                        |   47 +++++++++++++----
 7 files changed, 132 insertions(+), 20 deletions(-)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH tip/core/rcu 01/12] rcu: Add Kconfig option for strict RCU grace periods
  2020-08-12 22:56 [PATCH tip/core/rcu 0/12] Add strict short-grace-period Kconfig option Paul E. McKenney
@ 2020-08-12 22:57 ` paulmck
  2020-08-12 22:57 ` [PATCH tip/core/rcu 02/12] rcu: Reduce leaf fanout " paulmck
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: paulmck @ 2020-08-12 22:57 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, elver, dvyukov, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

People running automated tests have asked for a way to make RCU minimize
grace-period duration in order to increase the probability of KASAN
detecting a pointer being improperly leaked from an RCU read-side critical
section, for example, like this:

	rcu_read_lock();
	p = rcu_dereference(gp);
	do_something_with(p); // OK
	rcu_read_unlock();
	do_something_else_with(p); // BUG!!!

The rcupdate.rcu_expedited boot parameter is a start in this direction,
given that it makes calls to synchronize_rcu() instead invoke the faster
(and more wasteful) synchronize_rcu_expedited().  However, this does
nothing to shorten RCU grace periods that are instead initiated by
call_rcu(), and RCU pointer-leak bugs can involve call_rcu() just as
surely as they can synchronize_rcu().

This commit therefore adds a RCU_STRICT_GRACE_PERIOD Kconfig option
that will be used to shorten normal (non-expedited) RCU grace periods.
This commit also dumps out a message when this option is in effect.
Later commits will actually shorten grace periods.

Reported-by Jann Horn <jannh@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/Kconfig.debug | 15 +++++++++++++++
 kernel/rcu/tree_plugin.h |  2 ++
 2 files changed, 17 insertions(+)

diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
index 3cf6132..cab5a4b 100644
--- a/kernel/rcu/Kconfig.debug
+++ b/kernel/rcu/Kconfig.debug
@@ -114,4 +114,19 @@ config RCU_EQS_DEBUG
 	  Say N here if you need ultimate kernel/user switch latencies
 	  Say Y if you are unsure
 
+config RCU_STRICT_GRACE_PERIOD
+	bool "Provide debug RCU implementation with short grace periods"
+	depends on DEBUG_KERNEL && RCU_EXPERT
+	default n
+	select PREEMPT_COUNT if PREEMPT=n
+	help
+	  Select this option to build an RCU variant that is strict about
+	  grace periods, making them as short as it can.  This limits
+	  scalability, destroys real-time response, degrades battery
+	  lifetime and kills performance.  Don't try this on large
+	  machines, as in systems with more than about 10 or 20 CPUs.
+	  But in conjunction with tools like KASAN, it can be helpful
+	  when looking for certain types of RCU usage bugs, for example,
+	  too-short RCU read-side critical sections.
+
 endmenu # "RCU Debugging"
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index cb1e8c8..5c0c580 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -36,6 +36,8 @@ static void __init rcu_bootup_announce_oddness(void)
 		pr_info("\tRCU dyntick-idle grace-period acceleration is enabled.\n");
 	if (IS_ENABLED(CONFIG_PROVE_RCU))
 		pr_info("\tRCU lockdep checking is enabled.\n");
+	if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD))
+		pr_info("\tRCU strict (and thus non-scalable) grace periods enabled.\n");
 	if (RCU_NUM_LVLS >= 4)
 		pr_info("\tFour(or more)-level hierarchy is enabled.\n");
 	if (RCU_FANOUT_LEAF != 16)
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH tip/core/rcu 02/12] rcu: Reduce leaf fanout for strict RCU grace periods
  2020-08-12 22:56 [PATCH tip/core/rcu 0/12] Add strict short-grace-period Kconfig option Paul E. McKenney
  2020-08-12 22:57 ` [PATCH tip/core/rcu 01/12] rcu: Add Kconfig option for strict RCU grace periods paulmck
@ 2020-08-12 22:57 ` paulmck
  2020-08-12 22:57 ` [PATCH tip/core/rcu 03/12] rcu: Restrict default jiffies_till_first_fqs for strict RCU GPs paulmck
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: paulmck @ 2020-08-12 22:57 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, elver, dvyukov, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

Because strict RCU grace periods will complete more quickly, they will
experience greater lock contention on each leaf rcu_node structure's
->lock.  This commit therefore reduces the leaf fanout in order to reduce
this lock contention.

Note that this also has the effect of reducing the number of CPUs
supported to 16 in the case of CONFIG_RCU_FANOUT_LEAF=2 or 81 in the
case of CONFIG_RCU_FANOUT_LEAF=3.  However, greater numbers of CPUs are
probably a bad idea when using CONFIG_RCU_STRICT_GRACE_PERIOD=y.  Those
wishing to live dangerously are free to edit their kernel/rcu/Kconfig
files accordingly.

Reported-by Jann Horn <jannh@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/Kconfig | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
index 0ebe15a..b71e21f 100644
--- a/kernel/rcu/Kconfig
+++ b/kernel/rcu/Kconfig
@@ -135,10 +135,12 @@ config RCU_FANOUT
 
 config RCU_FANOUT_LEAF
 	int "Tree-based hierarchical RCU leaf-level fanout value"
-	range 2 64 if 64BIT
-	range 2 32 if !64BIT
+	range 2 64 if 64BIT && !RCU_STRICT_GRACE_PERIOD
+	range 2 32 if !64BIT && !RCU_STRICT_GRACE_PERIOD
+	range 2 3 if RCU_STRICT_GRACE_PERIOD
 	depends on TREE_RCU && RCU_EXPERT
-	default 16
+	default 16 if !RCU_STRICT_GRACE_PERIOD
+	default 2 if RCU_STRICT_GRACE_PERIOD
 	help
 	  This option controls the leaf-level fanout of hierarchical
 	  implementations of RCU, and allows trading off cache misses
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH tip/core/rcu 03/12] rcu: Restrict default jiffies_till_first_fqs for strict RCU GPs
  2020-08-12 22:56 [PATCH tip/core/rcu 0/12] Add strict short-grace-period Kconfig option Paul E. McKenney
  2020-08-12 22:57 ` [PATCH tip/core/rcu 01/12] rcu: Add Kconfig option for strict RCU grace periods paulmck
  2020-08-12 22:57 ` [PATCH tip/core/rcu 02/12] rcu: Reduce leaf fanout " paulmck
@ 2020-08-12 22:57 ` paulmck
  2020-08-12 22:57 ` [PATCH tip/core/rcu 04/12] rcu: Force DEFAULT_RCU_BLIMIT to 1000 " paulmck
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: paulmck @ 2020-08-12 22:57 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, elver, dvyukov, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

If there are idle CPUs, RCU's grace-period kthread will wait several
jiffies before even thinking about polling them.  This promotes
efficiency, which is normally a good thing, but when the kernel
has been built with CONFIG_RCU_STRICT_GRACE_PERIOD=y, we care more
about short grace periods.  This commit therefore restricts the
default jiffies_till_first_fqs value to zero in kernels built with
CONFIG_RCU_STRICT_GRACE_PERIOD=y, which causes RCU's grace-period kthread
to poll for idle CPUs immediately after starting a grace period.

Reported-by Jann Horn <jannh@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 65e1b5e..d333f1b 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -471,7 +471,7 @@ module_param(qhimark, long, 0444);
 module_param(qlowmark, long, 0444);
 module_param(qovld, long, 0444);
 
-static ulong jiffies_till_first_fqs = ULONG_MAX;
+static ulong jiffies_till_first_fqs = IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) ? 0 : ULONG_MAX;
 static ulong jiffies_till_next_fqs = ULONG_MAX;
 static bool rcu_kick_kthreads;
 static int rcu_divisor = 7;
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH tip/core/rcu 04/12] rcu: Force DEFAULT_RCU_BLIMIT to 1000 for strict RCU GPs
  2020-08-12 22:56 [PATCH tip/core/rcu 0/12] Add strict short-grace-period Kconfig option Paul E. McKenney
                   ` (2 preceding siblings ...)
  2020-08-12 22:57 ` [PATCH tip/core/rcu 03/12] rcu: Restrict default jiffies_till_first_fqs for strict RCU GPs paulmck
@ 2020-08-12 22:57 ` paulmck
  2020-08-12 22:57 ` [PATCH tip/core/rcu 05/12] rcu: Always set .need_qs from __rcu_read_lock() for strict GPs paulmck
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: paulmck @ 2020-08-12 22:57 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, elver, dvyukov, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

The value of DEFAULT_RCU_BLIMIT is normally set to 10, the idea being to
avoid needless response-time degradation due to RCU callback invocation.
However, when CONFIG_RCU_STRICT_GRACE_PERIOD=y it is better to avoid
throttling callback execution in order to better detect pointer
leaks from RCU read-side critical sections.  This commit therefore
sets the value of DEFAULT_RCU_BLIMIT to 1000 in kernels built with
CONFIG_RCU_STRICT_GRACE_PERIOD=y.

Reported-by Jann Horn <jannh@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index d333f1b..08cc91c 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -454,17 +454,18 @@ static int rcu_is_cpu_rrupt_from_idle(void)
 	return __this_cpu_read(rcu_data.dynticks_nesting) == 0;
 }
 
-#define DEFAULT_RCU_BLIMIT 10     /* Maximum callbacks per rcu_do_batch ... */
-#define DEFAULT_MAX_RCU_BLIMIT 10000 /* ... even during callback flood. */
+#define DEFAULT_RCU_BLIMIT (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) ? 1000 : 10)
+				// Maximum callbacks per rcu_do_batch ...
+#define DEFAULT_MAX_RCU_BLIMIT 10000 // ... even during callback flood.
 static long blimit = DEFAULT_RCU_BLIMIT;
-#define DEFAULT_RCU_QHIMARK 10000 /* If this many pending, ignore blimit. */
+#define DEFAULT_RCU_QHIMARK 10000 // If this many pending, ignore blimit.
 static long qhimark = DEFAULT_RCU_QHIMARK;
-#define DEFAULT_RCU_QLOMARK 100   /* Once only this many pending, use blimit. */
+#define DEFAULT_RCU_QLOMARK 100   // Once only this many pending, use blimit.
 static long qlowmark = DEFAULT_RCU_QLOMARK;
 #define DEFAULT_RCU_QOVLD_MULT 2
 #define DEFAULT_RCU_QOVLD (DEFAULT_RCU_QOVLD_MULT * DEFAULT_RCU_QHIMARK)
-static long qovld = DEFAULT_RCU_QOVLD; /* If this many pending, hammer QS. */
-static long qovld_calc = -1;	  /* No pre-initialization lock acquisitions! */
+static long qovld = DEFAULT_RCU_QOVLD; // If this many pending, hammer QS.
+static long qovld_calc = -1;	  // No pre-initialization lock acquisitions!
 
 module_param(blimit, long, 0444);
 module_param(qhimark, long, 0444);
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH tip/core/rcu 05/12] rcu: Always set .need_qs from __rcu_read_lock() for strict GPs
  2020-08-12 22:56 [PATCH tip/core/rcu 0/12] Add strict short-grace-period Kconfig option Paul E. McKenney
                   ` (3 preceding siblings ...)
  2020-08-12 22:57 ` [PATCH tip/core/rcu 04/12] rcu: Force DEFAULT_RCU_BLIMIT to 1000 " paulmck
@ 2020-08-12 22:57 ` paulmck
  2020-08-12 22:57 ` [PATCH tip/core/rcu 06/12] rcu: Do full report for .need_qs " paulmck
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: paulmck @ 2020-08-12 22:57 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, elver, dvyukov, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

The ->rcu_read_unlock_special.b.need_qs field in the task_struct
structure indicates that the RCU core needs a quiscent state from the
corresponding task.  The __rcu_read_unlock() function checks this (via
an eventual call to rcu_preempt_deferred_qs_irqrestore()), and if set
reports a quiscent state immediately upon exit from the outermost RCU
read-side critical section.

Currently, this flag is only set when the scheduling-clock interrupt
decides that the current RCU grace period is too old, as in about
one full second too old.  But if the kernel has been built with
CONFIG_RCU_STRICT_GRACE_PERIOD=y, we clearly do not want to wait that
long.  This commit therefore sets the .need_qs field immediately at the
start of the RCU read-side critical section from within __rcu_read_lock()
in order to unconditionally enlist help from __rcu_read_unlock().

But note the additional check for rcu_state.gp_kthread, which prevents
attempts to awaken RCU's grace-period kthread during early boot before
there is a scheduler.  Leaving off this check results in early boot hangs.
So early that there is no console output.  Thus, this additional check
fails until such time as RCU's grace-period kthread has been created,
avoiding these empty-console hangs.

Reported-by Jann Horn <jannh@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree_plugin.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 5c0c580..7ed55c5 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -376,6 +376,8 @@ void __rcu_read_lock(void)
 	rcu_preempt_read_enter();
 	if (IS_ENABLED(CONFIG_PROVE_LOCKING))
 		WARN_ON_ONCE(rcu_preempt_depth() > RCU_NEST_PMAX);
+	if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) && rcu_state.gp_kthread)
+		WRITE_ONCE(current->rcu_read_unlock_special.b.need_qs, true);
 	barrier();  /* critical section after entry code. */
 }
 EXPORT_SYMBOL_GPL(__rcu_read_lock);
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH tip/core/rcu 06/12] rcu: Do full report for .need_qs for strict GPs
  2020-08-12 22:56 [PATCH tip/core/rcu 0/12] Add strict short-grace-period Kconfig option Paul E. McKenney
                   ` (4 preceding siblings ...)
  2020-08-12 22:57 ` [PATCH tip/core/rcu 05/12] rcu: Always set .need_qs from __rcu_read_lock() for strict GPs paulmck
@ 2020-08-12 22:57 ` paulmck
  2020-08-13  0:50   ` Jann Horn
  2020-08-12 22:57 ` [PATCH tip/core/rcu 07/12] rcu: Attempt QS when CPU discovers GP " paulmck
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 15+ messages in thread
From: paulmck @ 2020-08-12 22:57 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, elver, dvyukov, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

The rcu_preempt_deferred_qs_irqrestore() function is invoked at
the end of an RCU read-side critical section (for example, directly
from rcu_read_unlock()) and, if .need_qs is set, invokes rcu_qs() to
report the new quiescent state.  This works, except that rcu_qs() only
updates per-CPU state, leaving reporting of the actual quiescent state
to a later call to rcu_report_qs_rdp(), for example from within a later
RCU_SOFTIRQ instance.  Although this approach is exactly what you want if
you are more concerned about efficiency than about short grace periods,
in CONFIG_RCU_STRICT_GRACE_PERIOD=y kernels, short grace periods are
the name of the game.

This commit therefore makes rcu_preempt_deferred_qs_irqrestore() directly
invoke rcu_report_qs_rdp() in CONFIG_RCU_STRICT_GRACE_PERIOD=y, thus
shortening grace periods.

Historical note:  To the best of my knowledge, causing rcu_read_unlock()
to directly report a quiescent state first appeared in Jim Houston's
and Joe Korty's JRCU.  This is the second instance of a Linux-kernel RCU
feature being inspired by JRCU, the first being RCU callback offloading
(as in the RCU_NOCB_CPU Kconfig option).

Reported-by Jann Horn <jannh@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree_plugin.h | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 7ed55c5..1761ff4 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -459,8 +459,12 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
 		return;
 	}
 	t->rcu_read_unlock_special.s = 0;
-	if (special.b.need_qs)
-		rcu_qs();
+	if (special.b.need_qs) {
+		if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD))
+			rcu_report_qs_rdp(rdp->cpu, rdp);
+		else
+			rcu_qs();
+	}
 
 	/*
 	 * Respond to a request by an expedited grace period for a
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH tip/core/rcu 07/12] rcu: Attempt QS when CPU discovers GP for strict GPs
  2020-08-12 22:56 [PATCH tip/core/rcu 0/12] Add strict short-grace-period Kconfig option Paul E. McKenney
                   ` (5 preceding siblings ...)
  2020-08-12 22:57 ` [PATCH tip/core/rcu 06/12] rcu: Do full report for .need_qs " paulmck
@ 2020-08-12 22:57 ` paulmck
  2020-08-12 22:57 ` [PATCH tip/core/rcu 08/12] rcu: IPI all CPUs at GP start " paulmck
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: paulmck @ 2020-08-12 22:57 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, elver, dvyukov, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

A given CPU normally notes a new grace period during one RCU_SOFTIRQ,
but avoids reporting the corresponding quiescent state until some later
RCU_SOFTIRQ.  This leisurly approach improves efficiency by increasing
the number of update requests served by each grace period, but is not
what is needed for kernels built with CONFIG_RCU_STRICT_GRACE_PERIOD=y.

This commit therefore adds a new rcu_strict_gp_check_qs() function
which, in CONFIG_RCU_STRICT_GRACE_PERIOD=y kernels, simply enters and
immediately exist an RCU read-side critical section.  If the CPU is
in a quiescent state, the rcu_read_unlock() will attempt to report an
immediate quiescent state.  This rcu_strict_gp_check_qs() function is
invoked from note_gp_changes(), so that a CPU just noticing a new grace
period might immediately report a quiescent state for that grace period.

Reported-by Jann Horn <jannh@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 08cc91c..4353a1a 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1557,6 +1557,19 @@ static void __maybe_unused rcu_advance_cbs_nowake(struct rcu_node *rnp,
 }
 
 /*
+ * In CONFIG_RCU_STRICT_GRACE_PERIOD=y kernels, attempt to generate a
+ * quiescent state.  This is intended to be invoked when the CPU notices
+ * a new grace period.
+ */
+static void rcu_strict_gp_check_qs(void)
+{
+	if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD)) {
+		rcu_read_lock();
+		rcu_read_unlock();
+	}
+}
+
+/*
  * Update CPU-local rcu_data state to record the beginnings and ends of
  * grace periods.  The caller must hold the ->lock of the leaf rcu_node
  * structure corresponding to the current CPU, and must have irqs disabled.
@@ -1626,6 +1639,7 @@ static void note_gp_changes(struct rcu_data *rdp)
 	}
 	needwake = __note_gp_changes(rnp, rdp);
 	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+	rcu_strict_gp_check_qs();
 	if (needwake)
 		rcu_gp_kthread_wake();
 }
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH tip/core/rcu 08/12] rcu: IPI all CPUs at GP start for strict GPs
  2020-08-12 22:56 [PATCH tip/core/rcu 0/12] Add strict short-grace-period Kconfig option Paul E. McKenney
                   ` (6 preceding siblings ...)
  2020-08-12 22:57 ` [PATCH tip/core/rcu 07/12] rcu: Attempt QS when CPU discovers GP " paulmck
@ 2020-08-12 22:57 ` paulmck
  2020-08-12 22:57 ` [PATCH tip/core/rcu 09/12] rcu: IPI all CPUs at GP end " paulmck
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: paulmck @ 2020-08-12 22:57 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, elver, dvyukov, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

Currently, each CPU discovers the beginning of a given grace period
on its own time, which is again good for efficiency but bad for fast
grace periods.  This commit therefore uses on_each_cpu() to IPI each
CPU after grace-period initialization in order to inform each CPU of
the new grace period in a timely manner, but only in kernels build with
CONFIG_RCU_STRICT_GRACE_PERIOD=y.

Reported-by Jann Horn <jannh@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 4353a1a..a30d6f3 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1678,6 +1678,15 @@ static void rcu_gp_torture_wait(void)
 }
 
 /*
+ * Handler for on_each_cpu() to invoke the target CPU's RCU core
+ * processing.
+ */
+static void rcu_strict_gp_boundary(void *unused)
+{
+	invoke_rcu_core();
+}
+
+/*
  * Initialize a new grace period.  Return false if no grace period required.
  */
 static bool rcu_gp_init(void)
@@ -1805,6 +1814,10 @@ static bool rcu_gp_init(void)
 		WRITE_ONCE(rcu_state.gp_activity, jiffies);
 	}
 
+	// If strict, make all CPUs aware of new grace period.
+	if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD))
+		on_each_cpu(rcu_strict_gp_boundary, NULL, 0);
+
 	return true;
 }
 
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH tip/core/rcu 09/12] rcu: IPI all CPUs at GP end for strict GPs
  2020-08-12 22:56 [PATCH tip/core/rcu 0/12] Add strict short-grace-period Kconfig option Paul E. McKenney
                   ` (7 preceding siblings ...)
  2020-08-12 22:57 ` [PATCH tip/core/rcu 08/12] rcu: IPI all CPUs at GP start " paulmck
@ 2020-08-12 22:57 ` paulmck
  2020-08-12 22:57 ` [PATCH tip/core/rcu 10/12] rcu: Provide optional RCU-reader exit delay " paulmck
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: paulmck @ 2020-08-12 22:57 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, elver, dvyukov, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

Currently, each CPU discovers the end of a given grace period on its
own time, which is again good for efficiency but bad for fast grace
periods, given that it is things like kfree() within the RCU callbacks
that will cause trouble for pointers leaked from RCU read-side critical
sections.  This commit therefore uses on_each_cpu() to IPI each CPU
after grace-period cleanup in order to inform each CPU of the end of
the old grace period in a timely manner, but only in kernels build with
CONFIG_RCU_STRICT_GRACE_PERIOD=y.

Reported-by Jann Horn <jannh@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index a30d6f3..dd7af40 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2034,6 +2034,10 @@ static void rcu_gp_cleanup(void)
 			   rcu_state.gp_flags & RCU_GP_FLAG_INIT);
 	}
 	raw_spin_unlock_irq_rcu_node(rnp);
+
+	// If strict, make all CPUs aware of the end of the old grace period.
+	if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD))
+		on_each_cpu(rcu_strict_gp_boundary, NULL, 0);
 }
 
 /*
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH tip/core/rcu 10/12] rcu: Provide optional RCU-reader exit delay for strict GPs
  2020-08-12 22:56 [PATCH tip/core/rcu 0/12] Add strict short-grace-period Kconfig option Paul E. McKenney
                   ` (8 preceding siblings ...)
  2020-08-12 22:57 ` [PATCH tip/core/rcu 09/12] rcu: IPI all CPUs at GP end " paulmck
@ 2020-08-12 22:57 ` paulmck
  2020-08-12 22:57 ` [PATCH tip/core/rcu 11/12] rcu: Execute RCU reader shortly after rcu_core " paulmck
  2020-08-12 22:57 ` [PATCH tip/core/rcu 12/12] rcu: Report QS for outermost PREEMPT=n rcu_read_unlock() " paulmck
  11 siblings, 0 replies; 15+ messages in thread
From: paulmck @ 2020-08-12 22:57 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, elver, dvyukov, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

The goal of this series is to increase the probability of tools like
KASAN detecting that an RCU-protected pointer was used outside of its
RCU read-side critical section.  Thus far, the approach has been to make
grace periods and callback processing happen faster.  Another approach
is to delay the pointer leaker.  This commit therefore allows a delay
to be applied to exit from RCU read-side critical sections.

This slowdown is specified by a new rcutree.rcu_unlock_delay kernel boot
parameter that specifies this delay in microseconds, defaulting to zero.

Reported-by Jann Horn <jannh@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 Documentation/admin-guide/kernel-parameters.txt |  9 +++++++++
 kernel/rcu/tree_plugin.h                        | 12 ++++++++++--
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 60e2c6e..c532c70 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4128,6 +4128,15 @@
 			This wake_up() will be accompanied by a
 			WARN_ONCE() splat and an ftrace_dump().
 
+	rcutree.rcu_unlock_delay= [KNL]
+			In CONFIG_RCU_STRICT_GRACE_PERIOD=y kernels,
+			this specifies an rcu_read_unlock()-time delay
+			in microseconds.  This defaults to zero.
+			Larger delays increase the probability of
+			catching RCU pointer leaks, that is, buggy use
+			of RCU-protected pointers after the relevant
+			rcu_read_unlock() has completed.
+
 	rcutree.sysrq_rcu= [KNL]
 			Commandeer a sysrq key to dump out Tree RCU's
 			rcu_node tree with an eye towards determining
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 1761ff4..25c9ee4 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -430,6 +430,12 @@ static bool rcu_preempt_has_tasks(struct rcu_node *rnp)
 	return !list_empty(&rnp->blkd_tasks);
 }
 
+// Add delay to rcu_read_unlock() for strict grace periods.
+static int rcu_unlock_delay;
+#ifdef CONFIG_RCU_STRICT_GRACE_PERIOD
+module_param(rcu_unlock_delay, int, 0444);
+#endif
+
 /*
  * Report deferred quiescent states.  The deferral time can
  * be quite short, for example, in the case of the call from
@@ -460,10 +466,12 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
 	}
 	t->rcu_read_unlock_special.s = 0;
 	if (special.b.need_qs) {
-		if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD))
+		if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD)) {
 			rcu_report_qs_rdp(rdp->cpu, rdp);
-		else
+			udelay(rcu_unlock_delay);
+		} else {
 			rcu_qs();
+		}
 	}
 
 	/*
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH tip/core/rcu 11/12] rcu: Execute RCU reader shortly after rcu_core for strict GPs
  2020-08-12 22:56 [PATCH tip/core/rcu 0/12] Add strict short-grace-period Kconfig option Paul E. McKenney
                   ` (9 preceding siblings ...)
  2020-08-12 22:57 ` [PATCH tip/core/rcu 10/12] rcu: Provide optional RCU-reader exit delay " paulmck
@ 2020-08-12 22:57 ` paulmck
  2020-08-12 22:57 ` [PATCH tip/core/rcu 12/12] rcu: Report QS for outermost PREEMPT=n rcu_read_unlock() " paulmck
  11 siblings, 0 replies; 15+ messages in thread
From: paulmck @ 2020-08-12 22:57 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, elver, dvyukov, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

A kernel built with CONFIG_RCU_STRICT_GRACE_PERIOD=y needs a quiescent
state to appear very shortly after a CPU has noticed a new grace period.
Placing an RCU reader immediately after this point is ineffective because
this normally happens in softirq context, which acts as a big RCU reader.
This commit therefore introduces a new per-CPU work_struct, which is
used at the end of rcu_core() processing to schedule an RCU read-side
critical section from within a clean environment.

Reported-by Jann Horn <jannh@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree.c | 13 +++++++++++++
 kernel/rcu/tree.h |  1 +
 2 files changed, 14 insertions(+)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index dd7af40..ac37343 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2634,6 +2634,14 @@ void rcu_force_quiescent_state(void)
 }
 EXPORT_SYMBOL_GPL(rcu_force_quiescent_state);
 
+// Workqueue handler for an RCU reader for kernels enforcing struct RCU
+// grace periods.
+static void strict_work_handler(struct work_struct *work)
+{
+	rcu_read_lock();
+	rcu_read_unlock();
+}
+
 /* Perform RCU core processing work for the current CPU.  */
 static __latent_entropy void rcu_core(void)
 {
@@ -2678,6 +2686,10 @@ static __latent_entropy void rcu_core(void)
 	/* Do any needed deferred wakeups of rcuo kthreads. */
 	do_nocb_deferred_wakeup(rdp);
 	trace_rcu_utilization(TPS("End RCU core"));
+
+	// If strict GPs, schedule an RCU reader in a clean environment.
+	if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD))
+		queue_work_on(rdp->cpu, rcu_gp_wq, &rdp->strict_work);
 }
 
 static void rcu_core_si(struct softirq_action *h)
@@ -3874,6 +3886,7 @@ rcu_boot_init_percpu_data(int cpu)
 
 	/* Set up local state, ensuring consistent view of global state. */
 	rdp->grpmask = leaf_node_cpu_bit(rdp->mynode, cpu);
+	INIT_WORK(&rdp->strict_work, strict_work_handler);
 	WARN_ON_ONCE(rdp->dynticks_nesting != 1);
 	WARN_ON_ONCE(rcu_dynticks_in_eqs(rcu_dynticks_snap(rdp)));
 	rdp->rcu_ofl_gp_seq = rcu_state.gp_seq;
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 309bc7f..e4f66b8 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -165,6 +165,7 @@ struct rcu_data {
 					/* period it is aware of. */
 	struct irq_work defer_qs_iw;	/* Obtain later scheduler attention. */
 	bool defer_qs_iw_pending;	/* Scheduler attention pending? */
+	struct work_struct strict_work;	/* Schedule readers for strict GPs. */
 
 	/* 2) batch handling */
 	struct rcu_segcblist cblist;	/* Segmented callback list, with */
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH tip/core/rcu 12/12] rcu: Report QS for outermost PREEMPT=n rcu_read_unlock() for strict GPs
  2020-08-12 22:56 [PATCH tip/core/rcu 0/12] Add strict short-grace-period Kconfig option Paul E. McKenney
                   ` (10 preceding siblings ...)
  2020-08-12 22:57 ` [PATCH tip/core/rcu 11/12] rcu: Execute RCU reader shortly after rcu_core " paulmck
@ 2020-08-12 22:57 ` paulmck
  11 siblings, 0 replies; 15+ messages in thread
From: paulmck @ 2020-08-12 22:57 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, elver, dvyukov, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

The CONFIG_PREEMPT=n instance of rcu_read_unlock is even more
aggressively than that of CONFIG_PREEMPT=y in deferring reporting
quiescent states to the RCU core.  This is just what is wanted in normal
use because it reduces overhead, but the resulting delay is not what
is wanted for kernels built with CONFIG_RCU_STRICT_GRACE_PERIOD=y.
This commit therefore adds an rcu_read_unlock_strict() function that
checks for exceptional conditions, and reports the newly started
quiescent state if it is safe to do so, also doing a spin-delay if
requested via rcutree.rcu_unlock_delay.  This commit also adds a call
to rcu_read_unlock_strict() from the CONFIG_PREEMPT=n instance of
__rcu_read_unlock().

[ paulmck: Fixed bug located by kernel test robot <lkp@intel.com> ]
Reported-by Jann Horn <jannh@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 include/linux/rcupdate.h |  7 +++++++
 kernel/rcu/tree.c        |  6 ++++++
 kernel/rcu/tree_plugin.h | 23 +++++++++++++++++------
 3 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index b47d6b6..7c1ceff 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -55,6 +55,12 @@ void __rcu_read_unlock(void);
 
 #else /* #ifdef CONFIG_PREEMPT_RCU */
 
+#ifdef CONFIG_TINY_RCU
+#define rcu_read_unlock_strict() do { } while (0)
+#else
+void rcu_read_unlock_strict(void);
+#endif
+
 static inline void __rcu_read_lock(void)
 {
 	preempt_disable();
@@ -63,6 +69,7 @@ static inline void __rcu_read_lock(void)
 static inline void __rcu_read_unlock(void)
 {
 	preempt_enable();
+	rcu_read_unlock_strict();
 }
 
 static inline int rcu_preempt_depth(void)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index ac37343..78852ef 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -164,6 +164,12 @@ module_param(gp_init_delay, int, 0444);
 static int gp_cleanup_delay;
 module_param(gp_cleanup_delay, int, 0444);
 
+// Add delay to rcu_read_unlock() for strict grace periods.
+static int rcu_unlock_delay;
+#ifdef CONFIG_RCU_STRICT_GRACE_PERIOD
+module_param(rcu_unlock_delay, int, 0444);
+#endif
+
 /*
  * This rcu parameter is runtime-read-only. It reflects
  * a minimum allowed number of objects which can be cached
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 25c9ee4..0881aef 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -430,12 +430,6 @@ static bool rcu_preempt_has_tasks(struct rcu_node *rnp)
 	return !list_empty(&rnp->blkd_tasks);
 }
 
-// Add delay to rcu_read_unlock() for strict grace periods.
-static int rcu_unlock_delay;
-#ifdef CONFIG_RCU_STRICT_GRACE_PERIOD
-module_param(rcu_unlock_delay, int, 0444);
-#endif
-
 /*
  * Report deferred quiescent states.  The deferral time can
  * be quite short, for example, in the case of the call from
@@ -785,6 +779,23 @@ dump_blkd_tasks(struct rcu_node *rnp, int ncheck)
 #else /* #ifdef CONFIG_PREEMPT_RCU */
 
 /*
+ * If strict grace periods are enabled, and if the calling
+ * __rcu_read_unlock() marks the beginning of a quiescent state, immediately
+ * report that quiescent state and, if requested, spin for a bit.
+ */
+void rcu_read_unlock_strict(void)
+{
+	struct rcu_data *rdp;
+
+	if (!IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) ||
+	   irqs_disabled() || preempt_count() || !rcu_state.gp_kthread)
+		return;
+	rdp = this_cpu_ptr(&rcu_data);
+	rcu_report_qs_rdp(rdp->cpu, rdp);
+	udelay(rcu_unlock_delay);
+}
+
+/*
  * Tell them what RCU they are running.
  */
 static void __init rcu_bootup_announce(void)
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH tip/core/rcu 06/12] rcu: Do full report for .need_qs for strict GPs
  2020-08-12 22:57 ` [PATCH tip/core/rcu 06/12] rcu: Do full report for .need_qs " paulmck
@ 2020-08-13  0:50   ` Jann Horn
  2020-08-13  3:29     ` Paul E. McKenney
  0 siblings, 1 reply; 15+ messages in thread
From: Jann Horn @ 2020-08-13  0:50 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: rcu, kernel list, Kernel Team, Ingo Molnar, Lai Jiangshan,
	dipankar, Andrew Morton, Mathieu Desnoyers, Josh Triplett,
	Thomas Gleixner, Peter Zijlstra, Steven Rostedt, David Howells,
	Eric Dumazet, Frederic Weisbecker, Oleg Nesterov,
	Joel Fernandes (Google),
	Marco Elver, Dmitry Vyukov

On Thu, Aug 13, 2020 at 12:57 AM <paulmck@kernel.org> wrote:
> The rcu_preempt_deferred_qs_irqrestore() function is invoked at
> the end of an RCU read-side critical section (for example, directly
> from rcu_read_unlock()) and, if .need_qs is set, invokes rcu_qs() to
> report the new quiescent state.  This works, except that rcu_qs() only
> updates per-CPU state, leaving reporting of the actual quiescent state
> to a later call to rcu_report_qs_rdp(), for example from within a later
> RCU_SOFTIRQ instance.  Although this approach is exactly what you want if
> you are more concerned about efficiency than about short grace periods,
> in CONFIG_RCU_STRICT_GRACE_PERIOD=y kernels, short grace periods are
> the name of the game.
>
> This commit therefore makes rcu_preempt_deferred_qs_irqrestore() directly
> invoke rcu_report_qs_rdp() in CONFIG_RCU_STRICT_GRACE_PERIOD=y, thus
> shortening grace periods.

Ooh, I'm very happy about this series! :)

> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 7ed55c5..1761ff4 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -459,8 +459,12 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
>                 return;
>         }
>         t->rcu_read_unlock_special.s = 0;
> -       if (special.b.need_qs)
> -               rcu_qs();
> +       if (special.b.need_qs) {
> +               if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD))
> +                       rcu_report_qs_rdp(rdp->cpu, rdp);

Not an issue with this patch specifically, but: I'm looking at
rcu_report_qs_rdp(), and some of the parts that I do vaguely
understand look a bit off to me.

rcu_report_qs_rdp() is given a CPU number as first argument, but never
actually uses that argument. (And the only existing caller also passes
in rdp->cpu, just like this patch.) I guess that argument can go away?

The comment above rcu_report_qs_rdp() claims that it "must be called
from the specified CPU", but there is a branch in there that
specifically checks whether that is true ("if (rdp->cpu ==
smp_processor_id())"). As far as I can tell, rcu_report_qs_rdp() is,
as the comment says, indeed never invoked with another CPU's rcu_data
(only invoked via rcu_core() -> rcu_check_quiescent_state() ->
rcu_report_qs_rdp(), and rcu_core() looks up "rdp =
raw_cpu_ptr(&rcu_data)"). So perhaps if there is a check for whether
rdp belongs to the current CPU, that check should have a WARN_ON(), or
something like that, since it indicates that the API contract
specified in the comment was violated?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH tip/core/rcu 06/12] rcu: Do full report for .need_qs for strict GPs
  2020-08-13  0:50   ` Jann Horn
@ 2020-08-13  3:29     ` Paul E. McKenney
  0 siblings, 0 replies; 15+ messages in thread
From: Paul E. McKenney @ 2020-08-13  3:29 UTC (permalink / raw)
  To: Jann Horn
  Cc: rcu, kernel list, Kernel Team, Ingo Molnar, Lai Jiangshan,
	dipankar, Andrew Morton, Mathieu Desnoyers, Josh Triplett,
	Thomas Gleixner, Peter Zijlstra, Steven Rostedt, David Howells,
	Eric Dumazet, Frederic Weisbecker, Oleg Nesterov,
	Joel Fernandes (Google),
	Marco Elver, Dmitry Vyukov

On Thu, Aug 13, 2020 at 02:50:27AM +0200, Jann Horn wrote:
> On Thu, Aug 13, 2020 at 12:57 AM <paulmck@kernel.org> wrote:
> > The rcu_preempt_deferred_qs_irqrestore() function is invoked at
> > the end of an RCU read-side critical section (for example, directly
> > from rcu_read_unlock()) and, if .need_qs is set, invokes rcu_qs() to
> > report the new quiescent state.  This works, except that rcu_qs() only
> > updates per-CPU state, leaving reporting of the actual quiescent state
> > to a later call to rcu_report_qs_rdp(), for example from within a later
> > RCU_SOFTIRQ instance.  Although this approach is exactly what you want if
> > you are more concerned about efficiency than about short grace periods,
> > in CONFIG_RCU_STRICT_GRACE_PERIOD=y kernels, short grace periods are
> > the name of the game.
> >
> > This commit therefore makes rcu_preempt_deferred_qs_irqrestore() directly
> > invoke rcu_report_qs_rdp() in CONFIG_RCU_STRICT_GRACE_PERIOD=y, thus
> > shortening grace periods.
> 
> Ooh, I'm very happy about this series! :)

Glad you like it!  And I hope that it helps!

One usability concern is whether rcutree.rcu_unlock_delay needs to be
applied only some small fraction of the time in order to allow the delay
to be large (a couple hundred microseconds?)  while still avoiding doing
too much more damage to timing and performance than absolutely necessary.

> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index 7ed55c5..1761ff4 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -459,8 +459,12 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
> >                 return;
> >         }
> >         t->rcu_read_unlock_special.s = 0;
> > -       if (special.b.need_qs)
> > -               rcu_qs();
> > +       if (special.b.need_qs) {
> > +               if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD))
> > +                       rcu_report_qs_rdp(rdp->cpu, rdp);
> 
> Not an issue with this patch specifically, but: I'm looking at
> rcu_report_qs_rdp(), and some of the parts that I do vaguely
> understand look a bit off to me.
> 
> rcu_report_qs_rdp() is given a CPU number as first argument, but never
> actually uses that argument. (And the only existing caller also passes
> in rdp->cpu, just like this patch.) I guess that argument can go away?
> 
> The comment above rcu_report_qs_rdp() claims that it "must be called
> from the specified CPU", but there is a branch in there that
> specifically checks whether that is true ("if (rdp->cpu ==
> smp_processor_id())"). As far as I can tell, rcu_report_qs_rdp() is,
> as the comment says, indeed never invoked with another CPU's rcu_data
> (only invoked via rcu_core() -> rcu_check_quiescent_state() ->
> rcu_report_qs_rdp(), and rcu_core() looks up "rdp =
> raw_cpu_ptr(&rcu_data)"). So perhaps if there is a check for whether
> rdp belongs to the current CPU, that check should have a WARN_ON(), or
> something like that, since it indicates that the API contract
> specified in the comment was violated?

It looks like you are correct, and that the first parameter can be
dropped, the "if" you mention replaced by a WARN_ON_ONCE(), and
the body of that "if" be unconditional.  I have it on my list, and
if it still looks correct in the cold hard light of dawn, I will
apply it with your Reported-by.

And thank you very much for looking it over!

						Thanx, Paul

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2020-08-13  3:29 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-12 22:56 [PATCH tip/core/rcu 0/12] Add strict short-grace-period Kconfig option Paul E. McKenney
2020-08-12 22:57 ` [PATCH tip/core/rcu 01/12] rcu: Add Kconfig option for strict RCU grace periods paulmck
2020-08-12 22:57 ` [PATCH tip/core/rcu 02/12] rcu: Reduce leaf fanout " paulmck
2020-08-12 22:57 ` [PATCH tip/core/rcu 03/12] rcu: Restrict default jiffies_till_first_fqs for strict RCU GPs paulmck
2020-08-12 22:57 ` [PATCH tip/core/rcu 04/12] rcu: Force DEFAULT_RCU_BLIMIT to 1000 " paulmck
2020-08-12 22:57 ` [PATCH tip/core/rcu 05/12] rcu: Always set .need_qs from __rcu_read_lock() for strict GPs paulmck
2020-08-12 22:57 ` [PATCH tip/core/rcu 06/12] rcu: Do full report for .need_qs " paulmck
2020-08-13  0:50   ` Jann Horn
2020-08-13  3:29     ` Paul E. McKenney
2020-08-12 22:57 ` [PATCH tip/core/rcu 07/12] rcu: Attempt QS when CPU discovers GP " paulmck
2020-08-12 22:57 ` [PATCH tip/core/rcu 08/12] rcu: IPI all CPUs at GP start " paulmck
2020-08-12 22:57 ` [PATCH tip/core/rcu 09/12] rcu: IPI all CPUs at GP end " paulmck
2020-08-12 22:57 ` [PATCH tip/core/rcu 10/12] rcu: Provide optional RCU-reader exit delay " paulmck
2020-08-12 22:57 ` [PATCH tip/core/rcu 11/12] rcu: Execute RCU reader shortly after rcu_core " paulmck
2020-08-12 22:57 ` [PATCH tip/core/rcu 12/12] rcu: Report QS for outermost PREEMPT=n rcu_read_unlock() " paulmck

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).