* [PATCH RFC tip/core/rcu 0/4] Accelerate RCU Tasks Trace updates
@ 2020-09-10 20:19 Paul E. McKenney
  2020-09-10 20:20 ` [PATCH RFC tip/core/rcu 1/4] rcu-tasks: Mark variables static paulmck
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Paul E. McKenney @ 2020-09-10 20:19 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, alexei.starovoitov, daniel,
	jolsa, bpf

Hello!

This series accelerates RCU Tasks Trace updates, reducing the average
grace-period latencies from about a second to about 20 milliseconds.
These are benchmark numbers, based on a previously posted patch to
rcuscale.c [1] running on my x86 laptop.  The patches in this series
are as follows:

1.	Mark variables static, noted during this effort but otherwise
	unconnected.  This change has no effect, so that the average
	grace-period latency remains at 980 milliseconds.

2.	Use more aggressive polling for RCU Tasks Trace.  This polling
	starts at five-millisecond intervals instead of the prior
	100-millisecond intervals.  As before, the polling interval
	increases in duration as the grace period ages, and again as
	before is capped at one second.  This change reduces the
	average grace-period latency to about 620 milliseconds.
	(A standalone sketch of the resulting polling schedule appears
	after this list.)

3.	Selectively enable more RCU Tasks Trace IPIs.  This retains
	the old behavior of suppressing IPIs for grace periods that are
	younger than 500 milliseconds for CONFIG_TASKS_TRACE_RCU_READ_MB=y
	kernels, including CONFIG_PREEMPT_RT=y kernels, but allows IPIs
	immediately on other kernels.  It is quite possible that a more
	sophisticated decision procedure will be required, and changes
	to RCU's dyntick-idle code might also be needed.  This change
	(along with the earlier ones) reduces the average grace-period
	latency to about 120 milliseconds.

4.	Shorten per-grace-period sleep for RCU Tasks Trace.  The
	current code sleeps for 100 milliseconds after the end of
	each grace period, which by itself prevents a back-to-back
	string of grace-period waits from completing faster than
	ten per second.  This patch also retains this old behavior
	for CONFIG_TASKS_TRACE_RCU_READ_MB=y (and again thus also
	for CONFIG_PREEMPT_RT=y kernels).  For other kernels, this
	post-grace-period sleep is reduced to five milliseconds.
	This change (along with the earlier ones) reduced the average
	grace-period latency to about 18 milliseconds, for an overall
	factor-of-50 reduction in latency.
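
For a concrete feel for the schedule that items 2 and 4 describe, here
is a standalone userspace sketch (not kernel code).  The init_fract and
gp_sleep values are taken from the patches below; the fract decrement
step is an assumption based on the "slowly back off to 1 HZ wait"
comment in the patched code.

#include <stdio.h>

#define HZ 1000	/* assume a 1000-Hz kernel for illustration */

int main(void)
{
	int init_fract = HZ / 5;	/* CONFIG_TASKS_TRACE_RCU_READ_MB=n value */
	int gp_sleep = HZ / 200;	/* post-grace-period sleep from patch 4 */
	int fract = init_fract;
	int scan;

	if (fract > HZ)			/* keep HZ/fract at one jiffy or more */
		fract = HZ;
	if (gp_sleep <= 0)		/* clamp for HZ < 200 */
		gp_sleep = 1;

	for (scan = 0; scan < 5; scan++) {
		printf("scan %d: sleep %d jiffies (~%d ms)\n",
		       scan, HZ / fract, (HZ / fract) * 1000 / HZ);
		if (fract > 1)		/* assumed back-off step toward 1-second waits */
			fract--;
	}
	printf("post-GP sleep: %d jiffies (~%d ms)\n",
	       gp_sleep, gp_sleep * 1000 / HZ);
	return 0;
}

With HZ=1000, this prints five-millisecond scan intervals that slowly
lengthen toward one second, plus a five-millisecond post-grace-period
sleep, which lines up with the latencies quoted above.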

Alexei Starovoitov benchmarked an earlier patch [2], producing results
that are roughly consistent with the above reduction in latency [3].

							Thanx, Paul

[1] https://lore.kernel.org/bpf/20200909193900.GK29330@paulmck-ThinkPad-P72/
[2] https://lore.kernel.org/bpf/20200910052727.GA4351@paulmck-ThinkPad-P72/
[3] https://lore.kernel.org/bpf/619554b2-4746-635e-22f3-7f0f09d97760@fb.com/

------------------------------------------------------------------------

 tasks.h |   37 +++++++++++++++++++++++++++++--------
 1 file changed, 29 insertions(+), 8 deletions(-)


* [PATCH RFC tip/core/rcu 1/4] rcu-tasks: Mark variables static
  2020-09-10 20:19 [PATCH RFC tip/core/rcu 0/4] Accelerate RCU Tasks Trace updates Paul E. McKenney
@ 2020-09-10 20:20 ` paulmck
  2020-09-10 20:20 ` [PATCH RFC tip/core/rcu 2/4] rcu-tasks: Use more aggressive polling for RCU Tasks Trace paulmck
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 16+ messages in thread
From: paulmck @ 2020-09-10 20:20 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, alexei.starovoitov, daniel,
	jolsa, bpf, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

The n_heavy_reader_attempts, n_heavy_reader_updates, and
n_heavy_reader_ofl_updates variables are not used outside of their
translation unit, so this commit marks them static.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tasks.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 05d3e13..978508e 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -745,9 +745,9 @@ static DEFINE_PER_CPU(bool, trc_ipi_to_cpu);
 
 // The number of detections of task quiescent state relying on
 // heavyweight readers executing explicit memory barriers.
-unsigned long n_heavy_reader_attempts;
-unsigned long n_heavy_reader_updates;
-unsigned long n_heavy_reader_ofl_updates;
+static unsigned long n_heavy_reader_attempts;
+static unsigned long n_heavy_reader_updates;
+static unsigned long n_heavy_reader_ofl_updates;
 
 void call_rcu_tasks_trace(struct rcu_head *rhp, rcu_callback_t func);
 DEFINE_RCU_TASKS(rcu_tasks_trace, rcu_tasks_wait_gp, call_rcu_tasks_trace,
-- 
2.9.5



* [PATCH RFC tip/core/rcu 2/4] rcu-tasks: Use more aggressive polling for RCU Tasks Trace
  2020-09-10 20:19 [PATCH RFC tip/core/rcu 0/4] Accelerate RCU Tasks Trace updates Paul E. McKenney
  2020-09-10 20:20 ` [PATCH RFC tip/core/rcu 1/4] rcu-tasks: Mark variables static paulmck
@ 2020-09-10 20:20 ` paulmck
  2020-09-10 20:20 ` [PATCH RFC tip/core/rcu 3/4] rcu-tasks: Selectively enable more RCU Tasks Trace IPIs paulmck
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 16+ messages in thread
From: paulmck @ 2020-09-10 20:20 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, alexei.starovoitov, daniel,
	jolsa, bpf, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

The RCU Tasks Trace grace periods are too slow, as in 40x slower than
those of RCU Tasks.  This is due to my having assumed a one-second grace
period was OK, and thus not having optimized any further.  This commit
provides the first step in this optimization process, namely by allowing
the task_list scan backoff interval to be specified on a per-flavor basis,
and then speeding up the scans for RCU Tasks Trace.  However, kernels
built with CONFIG_TASKS_TRACE_RCU_READ_MB=y continue to use the old slower
backoff, consistent with that Kconfig option's goal of reducing IPIs.

Link: https://lore.kernel.org/bpf/CAADnVQK_AiX+S_L_A4CQWT11XyveppBbQSQgH_qWGyzu_E8Yeg@mail.gmail.com/
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tasks.h | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 978508e..ad8c4f3 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -28,6 +28,7 @@ typedef void (*postgp_func_t)(struct rcu_tasks *rtp);
  * @kthread_ptr: This flavor's grace-period/callback-invocation kthread.
  * @gp_func: This flavor's grace-period-wait function.
  * @gp_state: Grace period's most recent state transition (debugging).
+ * @init_fract: Initial backoff sleep interval.
  * @gp_jiffies: Time of last @gp_state transition.
  * @gp_start: Most recent grace-period start in jiffies.
  * @n_gps: Number of grace periods completed since boot.
@@ -48,6 +49,7 @@ struct rcu_tasks {
 	struct wait_queue_head cbs_wq;
 	raw_spinlock_t cbs_lock;
 	int gp_state;
+	int init_fract;
 	unsigned long gp_jiffies;
 	unsigned long gp_start;
 	unsigned long n_gps;
@@ -329,8 +331,10 @@ static void rcu_tasks_wait_gp(struct rcu_tasks *rtp)
 	 */
 	lastreport = jiffies;
 
-	/* Start off with HZ/10 wait and slowly back off to 1 HZ wait. */
-	fract = 10;
+	// Start off with initial wait and slowly back off to 1 HZ wait.
+	fract = rtp->init_fract;
+	if (fract > HZ)
+		fract = HZ;
 
 	for (;;) {
 		bool firstreport;
@@ -553,6 +557,7 @@ EXPORT_SYMBOL_GPL(rcu_barrier_tasks);
 
 static int __init rcu_spawn_tasks_kthread(void)
 {
+	rcu_tasks.init_fract = 10;
 	rcu_tasks.pregp_func = rcu_tasks_pregp_step;
 	rcu_tasks.pertask_func = rcu_tasks_pertask;
 	rcu_tasks.postscan_func = rcu_tasks_postscan;
@@ -1163,6 +1168,13 @@ EXPORT_SYMBOL_GPL(rcu_barrier_tasks_trace);
 
 static int __init rcu_spawn_tasks_trace_kthread(void)
 {
+	if (IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB)) {
+		rcu_tasks_trace.init_fract = 10;
+	} else {
+		rcu_tasks_trace.init_fract = HZ / 5;
+		if (rcu_tasks_trace.init_fract <= 0)
+			rcu_tasks_trace.init_fract = 1;
+	}
 	rcu_tasks_trace.pregp_func = rcu_tasks_trace_pregp_step;
 	rcu_tasks_trace.pertask_func = rcu_tasks_trace_pertask;
 	rcu_tasks_trace.postscan_func = rcu_tasks_trace_postscan;
-- 
2.9.5



* [PATCH RFC tip/core/rcu 3/4] rcu-tasks: Selectively enable more RCU Tasks Trace IPIs
  2020-09-10 20:19 [PATCH RFC tip/core/rcu 0/4] Accelerate RCU Tasks Trace updates Paul E. McKenney
  2020-09-10 20:20 ` [PATCH RFC tip/core/rcu 1/4] rcu-tasks: Mark variables static paulmck
  2020-09-10 20:20 ` [PATCH RFC tip/core/rcu 2/4] rcu-tasks: Use more aggressive polling for RCU Tasks Trace paulmck
@ 2020-09-10 20:20 ` paulmck
  2020-09-10 20:20 ` [PATCH RFC tip/core/rcu 4/4] rcu-tasks: Shorten per-grace-period sleep for RCU Tasks Trace paulmck
  2020-09-17 21:06 ` [PATCH RFC tip/core/rcu 0/4] Accelerate RCU Tasks Trace updates Paul E. McKenney
  4 siblings, 0 replies; 16+ messages in thread
From: paulmck @ 2020-09-10 20:20 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, alexei.starovoitov, daniel,
	jolsa, bpf, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

Many workloads are quite sensitive to IPIs, and such workloads should
build kernels with CONFIG_TASKS_TRACE_RCU_READ_MB=y to prevent RCU
Tasks Trace from using them under normal conditions.  However, other
workloads are quite happy to permit more IPIs if doing so makes BPF
program updates go faster.  This commit therefore sets the default
value for the rcupdate.rcu_task_ipi_delay kernel parameter to zero for
kernels that have been built with CONFIG_TASKS_TRACE_RCU_READ_MB=n,
while retaining the old default of (HZ / 10) for kernels that have
indicated an aversion to IPIs via CONFIG_TASKS_TRACE_RCU_READ_MB=y.
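
Note that the hunk below also changes the comparison from
time_after(jiffies, ...) to time_after(jiffies + 1, ...).  Because
time_after(a, b) is true only when a is strictly after b, the old test
would still suppress IPIs during the very jiffy in which the grace
period started, even with a delay of zero; the "+ 1" presumably closes
that gap.  A standalone sketch, using the <linux/jiffies.h> definition
of time_after() minus its type checking:

#include <stdio.h>

#define time_after(a, b)  ((long)((b) - (a)) < 0)

int main(void)
{
	unsigned long gp_start = 1000;
	unsigned long jiffies = 1000;	/* the jiffy the grace period started */
	int delay = 0;			/* new default for ..._READ_MB=n kernels */

	/* Prints 0: IPI still suppressed despite the zero delay. */
	printf("old test: %d\n", time_after(jiffies, gp_start + delay));
	/* Prints 1: IPI allowed immediately. */
	printf("new test: %d\n", time_after(jiffies + 1, gp_start + delay));
	return 0;
}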

Link: https://lore.kernel.org/bpf/CAADnVQK_AiX+S_L_A4CQWT11XyveppBbQSQgH_qWGyzu_E8Yeg@mail.gmail.com/
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tasks.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index ad8c4f3..2b4df23 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -83,7 +83,7 @@ static struct rcu_tasks rt_name =					\
 DEFINE_STATIC_SRCU(tasks_rcu_exit_srcu);
 
 /* Avoid IPIing CPUs early in the grace period. */
-#define RCU_TASK_IPI_DELAY (HZ / 2)
+#define RCU_TASK_IPI_DELAY (IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB) ? HZ / 2 : 0)
 static int rcu_task_ipi_delay __read_mostly = RCU_TASK_IPI_DELAY;
 module_param(rcu_task_ipi_delay, int, 0644);
 
@@ -916,7 +916,8 @@ static void trc_wait_for_one_reader(struct task_struct *t,
 
 	// If currently running, send an IPI, either way, add to list.
 	trc_add_holdout(t, bhp);
-	if (task_curr(t) && time_after(jiffies, rcu_tasks_trace.gp_start + rcu_task_ipi_delay)) {
+	if (task_curr(t) &&
+	    time_after(jiffies + 1, rcu_tasks_trace.gp_start + rcu_task_ipi_delay)) {
 		// The task is currently running, so try IPIing it.
 		cpu = task_cpu(t);
 
-- 
2.9.5



* [PATCH RFC tip/core/rcu 4/4] rcu-tasks: Shorten per-grace-period sleep for RCU Tasks Trace
  2020-09-10 20:19 [PATCH RFC tip/core/rcu 0/4] Accelerate RCU Tasks Trace updates Paul E. McKenney
                   ` (2 preceding siblings ...)
  2020-09-10 20:20 ` [PATCH RFC tip/core/rcu 3/4] rcu-tasks: Selectively enable more RCU Tasks Trace IPIs paulmck
@ 2020-09-10 20:20 ` paulmck
  2020-09-11  3:18   ` Alexei Starovoitov
  2020-09-17 21:06 ` [PATCH RFC tip/core/rcu 0/4] Accelerate RCU Tasks Trace updates Paul E. McKenney
  4 siblings, 1 reply; 16+ messages in thread
From: paulmck @ 2020-09-10 20:20 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, alexei.starovoitov, daniel,
	jolsa, bpf, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

The various RCU tasks flavors currently wait 100 milliseconds between each
grace period in order to prevent CPU-bound loops and to favor efficiency
over latency.  However, RCU Tasks Trace needs to have a grace-period
latency of roughly 25 milliseconds, which is completely infeasible given
the 100-millisecond per-grace-period sleep.  This commit therefore reduces
this sleep duration to 5 milliseconds (or one jiffy, whichever is longer)
in kernels built with CONFIG_TASKS_TRACE_RCU_READ_MB=y.

Link: https://lore.kernel.org/bpf/CAADnVQK_AiX+S_L_A4CQWT11XyveppBbQSQgH_qWGyzu_E8Yeg@mail.gmail.com/
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tasks.h | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 2b4df23..a0eaed5 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -28,6 +28,7 @@ typedef void (*postgp_func_t)(struct rcu_tasks *rtp);
  * @kthread_ptr: This flavor's grace-period/callback-invocation kthread.
  * @gp_func: This flavor's grace-period-wait function.
  * @gp_state: Grace period's most recent state transition (debugging).
+ * @gp_sleep: Per-grace-period sleep to prevent CPU-bound looping.
  * @init_fract: Initial backoff sleep interval.
  * @gp_jiffies: Time of last @gp_state transition.
  * @gp_start: Most recent grace-period start in jiffies.
@@ -49,6 +50,7 @@ struct rcu_tasks {
 	struct wait_queue_head cbs_wq;
 	raw_spinlock_t cbs_lock;
 	int gp_state;
+	int gp_sleep;
 	int init_fract;
 	unsigned long gp_jiffies;
 	unsigned long gp_start;
@@ -233,7 +235,7 @@ static int __noreturn rcu_tasks_kthread(void *arg)
 			cond_resched();
 		}
 		/* Paranoid sleep to keep this from entering a tight loop */
-		schedule_timeout_idle(HZ/10);
+		schedule_timeout_idle(rtp->gp_sleep);
 
 		set_tasks_gp_state(rtp, RTGS_WAIT_CBS);
 	}
@@ -557,6 +559,7 @@ EXPORT_SYMBOL_GPL(rcu_barrier_tasks);
 
 static int __init rcu_spawn_tasks_kthread(void)
 {
+	rcu_tasks.gp_sleep = HZ / 10;
 	rcu_tasks.init_fract = 10;
 	rcu_tasks.pregp_func = rcu_tasks_pregp_step;
 	rcu_tasks.pertask_func = rcu_tasks_pertask;
@@ -690,6 +693,7 @@ EXPORT_SYMBOL_GPL(rcu_barrier_tasks_rude);
 
 static int __init rcu_spawn_tasks_rude_kthread(void)
 {
+	rcu_tasks_rude.gp_sleep = HZ / 10;
 	rcu_spawn_tasks_kthread_generic(&rcu_tasks_rude);
 	return 0;
 }
@@ -1170,8 +1174,12 @@ EXPORT_SYMBOL_GPL(rcu_barrier_tasks_trace);
 static int __init rcu_spawn_tasks_trace_kthread(void)
 {
 	if (IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB)) {
+		rcu_tasks_trace.gp_sleep = HZ / 10;
 		rcu_tasks_trace.init_fract = 10;
 	} else {
+		rcu_tasks_trace.gp_sleep = HZ / 200;
+		if (rcu_tasks_trace.gp_sleep <= 0)
+			rcu_tasks_trace.gp_sleep = 1;
 		rcu_tasks_trace.init_fract = HZ / 5;
 		if (rcu_tasks_trace.init_fract <= 0)
 			rcu_tasks_trace.init_fract = 1;
-- 
2.9.5



* Re: [PATCH RFC tip/core/rcu 4/4] rcu-tasks: Shorten per-grace-period sleep for RCU Tasks Trace
  2020-09-10 20:20 ` [PATCH RFC tip/core/rcu 4/4] rcu-tasks: Shorten per-grace-period sleep for RCU Tasks Trace paulmck
@ 2020-09-11  3:18   ` Alexei Starovoitov
  2020-09-11  4:37     ` Paul E. McKenney
  0 siblings, 1 reply; 16+ messages in thread
From: Alexei Starovoitov @ 2020-09-11  3:18 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: rcu, LKML, Kernel Team, Ingo Molnar, Lai Jiangshan, dipankar,
	Andrew Morton, Mathieu Desnoyers, Josh Triplett, Thomas Gleixner,
	Peter Zijlstra, Steven Rostedt, David Howells, Eric Dumazet,
	Frederic Weisbecker, Oleg Nesterov, Joel Fernandes,
	Daniel Borkmann, Jiri Olsa, bpf

On Thu, Sep 10, 2020 at 1:20 PM <paulmck@kernel.org> wrote:
>
> From: "Paul E. McKenney" <paulmck@kernel.org>
>
> The various RCU tasks flavors currently wait 100 milliseconds between each
> grace period in order to prevent CPU-bound loops and to favor efficiency
> over latency.  However, RCU Tasks Trace needs to have a grace-period
> latency of roughly 25 milliseconds, which is completely infeasible given
> the 100-millisecond per-grace-period sleep.  This commit therefore reduces
> this sleep duration to 5 milliseconds (or one jiffy, whichever is longer)
> in kernels built with CONFIG_TASKS_TRACE_RCU_READ_MB=y.

The commit log is either misleading or wrong?
If I read the code correctly, in the CONFIG_TASKS_TRACE_RCU_READ_MB=y
case the existing HZ/10 "paranoid sleep" is preserved.
It's for the MB=n case that it is reduced to HZ/200.
Also I don't understand why you're talking about milliseconds but
all the numbers are HZ based.  HZ/10 gives a different number of
milliseconds depending on HZ.


* Re: [PATCH RFC tip/core/rcu 4/4] rcu-tasks: Shorten per-grace-period sleep for RCU Tasks Trace
  2020-09-11  3:18   ` Alexei Starovoitov
@ 2020-09-11  4:37     ` Paul E. McKenney
  0 siblings, 0 replies; 16+ messages in thread
From: Paul E. McKenney @ 2020-09-11  4:37 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: rcu, LKML, Kernel Team, Ingo Molnar, Lai Jiangshan, dipankar,
	Andrew Morton, Mathieu Desnoyers, Josh Triplett, Thomas Gleixner,
	Peter Zijlstra, Steven Rostedt, David Howells, Eric Dumazet,
	Frederic Weisbecker, Oleg Nesterov, Joel Fernandes,
	Daniel Borkmann, Jiri Olsa, bpf

On Thu, Sep 10, 2020 at 08:18:01PM -0700, Alexei Starovoitov wrote:
> On Thu, Sep 10, 2020 at 1:20 PM <paulmck@kernel.org> wrote:
> >
> > From: "Paul E. McKenney" <paulmck@kernel.org>
> >
> > The various RCU tasks flavors currently wait 100 milliseconds between each
> > grace period in order to prevent CPU-bound loops and to favor efficiency
> > over latency.  However, RCU Tasks Trace needs to have a grace-period
> > latency of roughly 25 milliseconds, which is completely infeasible given
> > the 100-millisecond per-grace-period sleep.  This commit therefore reduces
> > this sleep duration to 5 milliseconds (or one jiffy, whichever is longer)
> > in kernels built with CONFIG_TASKS_TRACE_RCU_READ_MB=y.
> 
> The commit log is either misleading or wrong?
> If I read the code correctly, in the CONFIG_TASKS_TRACE_RCU_READ_MB=y
> case the existing HZ/10 "paranoid sleep" is preserved.

Yes, for CONFIG_TASKS_TRACE_RCU_READ_MB=y, the previous 100-millisecond
"paranoid sleep" is preserved.  Preserving previous behavior is of course
especially important for rcupdate.rcu_task_ipi_delay, given that real-time
applications are degraded by IPIs.  And given that we are avoiding IPIs
in this case, speeding up the polling is not all that helpful.

> It's for the MB=n case that it is reduced to HZ/200.

Yes, that is, to roughly 5 milliseconds for large HZ or to one jiffy
for HZ<200.  Here, we send IPIs much more aggressively, so polling
more frequently does help a lot.

> Also I don't understand why you're talking about milliseconds but
> all the numbers are HZ based.  HZ/10 gives a different number of
> milliseconds depending on HZ.

As long as HZ is 10 or greater, HZ/10 jiffies is roughly 100 milliseconds.
In the unlikely event that HZ is less than 10, the code clamps to one
jiffy.  Since schedule_timeout_idle() sleep time is specified in jiffies,
it all works out.
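
To make the clamping concrete, here is a standalone sketch of the
arithmetic only (the kernel's jiffies-to-milliseconds conversions are
more careful than this):

#include <stdio.h>

static int clamp_jiffies(int j)
{
	return j <= 0 ? 1 : j;	/* never sleep for zero jiffies */
}

int main(void)
{
	int hzs[] = { 100, 250, 1000 };
	int i;

	for (i = 0; i < 3; i++) {
		int hz = hzs[i];
		int slow = clamp_jiffies(hz / 10);	/* old 100-ms sleep */
		int fast = clamp_jiffies(hz / 200);	/* new 5-ms sleep */

		printf("HZ=%4d: HZ/10 -> %3d jiffies (%3d ms), HZ/200 -> %d jiffies (%d ms)\n",
		       hz, slow, slow * 1000 / hz, fast, fast * 1000 / hz);
	}
	return 0;
}

For HZ=100, HZ/200 truncates to zero and is clamped to one jiffy, that
is, 10 milliseconds rather than 5.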

							Thanx, Paul


* Re: [PATCH RFC tip/core/rcu 0/4] Accelerate RCU Tasks Trace updates
  2020-09-10 20:19 [PATCH RFC tip/core/rcu 0/4] Accelerate RCU Tasks Trace updates Paul E. McKenney
                   ` (3 preceding siblings ...)
  2020-09-10 20:20 ` [PATCH RFC tip/core/rcu 4/4] rcu-tasks: Shorten per-grace-period sleep for RCU Tasks Trace paulmck
@ 2020-09-17 21:06 ` Paul E. McKenney
  2020-09-17 21:07   ` [PATCH tip/core/rcu 1/8] rcu-tasks: Prevent complaints of unused show_rcu_tasks_classic_gp_kthread() paulmck
                     ` (7 more replies)
  4 siblings, 8 replies; 16+ messages in thread
From: Paul E. McKenney @ 2020-09-17 21:06 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, alexei.starovoitov, daniel,
	jolsa, bpf, sfr

Hello!

This series accelerates RCU Tasks Trace updates, reducing the average
grace-period latencies from about a second to about 20 milliseconds.
These are benchmark numbers, based on a previously posted patch to
rcuscale.c [1] running on my x86 laptop.  Additional patches
provide a compiler-warning cleanup and fix several race conditions that
were exposed by the faster grace periods.  The patches in this series
are as follows:

1.	Prevent complaints of unused show_rcu_tasks_classic_gp_kthread().
	This is not related to the problem at hand, but it is
	a pre-existing patch that provides a simple cleanup.
	The grace-period latency thus remains at 980 milliseconds.

2.	Mark variables static, noted during this effort but otherwise
	unconnected.  This change has no effect, so that the average
	grace-period latency remains at 980 milliseconds.

3.	Use more aggressive polling for RCU Tasks Trace.  This polling
	starts at five-millisecond intervals instead of the prior
	100-millisecond intervals.  As before, the polling interval
	increases in duration as the grace period ages, and again as
	before is capped at one second.  This change reduces the
	average grace-period latency to about 620 milliseconds.

4.	Selectively enable more RCU Tasks Trace IPIs.  This retains
	the old behavior of suppressing IPIs for grace periods that are
	younger than 500 milliseconds for CONFIG_TASKS_TRACE_RCU_READ_MB=y
	kernels, including CONFIG_PREEMPT_RT=y kernels, but allows IPIs
	immediately on other kernels.  It is quite possible that a more
	sophisticated decision procedure will be required, and changes
	to RCU's dyntick-idle code might also be needed.  This change
	(along with the earlier ones) reduces the average grace-period
	latency to about 120 milliseconds.

5.	Shorten per-grace-period sleep for RCU Tasks Trace.  The
	current code sleeps for 100 milliseconds after the end of
	each grace period, which by itself prevents a back-to-back
	string of grace-period waits from completing faster than
	ten per second.  This patch also retains this old behavior
	for CONFIG_TASKS_TRACE_RCU_READ_MB=y (and again thus also
	for CONFIG_PREEMPT_RT=y kernels).  For other kernels, this
	post-grace-period sleep is reduced to five milliseconds.
	This change (along with the earlier ones) reduced the average
	grace-period latency to about 18 milliseconds, for an overall
	factor-of-50 reduction in latency.

6.	Fix a deadlock-inducing race between rcu_read_unlock_trace()
	and trc_read_check_handler().  The race window was only a few
	instructions wide, but please see the commit log for the full
	sad story.  The grace-period speedup made this race 50 times
	more probable, and thus reduced the rcutorture runtime required
	to reproduce it from about five months to about four days.

7.	Fix a low-probability race between rcu_read_unlock_trace()
	and the RCU Tasks Trace CPU stall reporting loop in
	rcu_tasks_trace_postgp().  This race could result in leaking
	task_struct structures.

8.	Fix a low-probability race between the RCU Tasks Trace CPU
	stall reporting loop in rcu_tasks_trace_postgp() and task exit.
	This race could result in use-after-free errors.

Alexei Starovoitov benchmarked an earlier patch [2], producing results
that are roughly consistent with the above reduction in latency [3].

Changes since last week's RFC version:

o	Added patch #1, which cleans up a compiler warning.

o	Renumbered patches 1-4 to 2-5.

o	Added Ccs to patches 3, 4, and 5.

o	Added patches 6-8 to fix race conditions exposed by the 50x
	faster grace periods.  These races are either simply rare, or
	are both rare and able to occur only during an RCU Tasks Trace
	CPU stall warning.  Still, they need to be fixed.

o	This series maintains the sub-20-millisecond update-side
	grace-period delays of the RFC series.
	
o	Fixing the first of the race conditions required that a
	compiler barrier be added to rcu_read_lock_trace() and that
	another compiler barrier along with a WRITE_ONCE() be added to
	rcu_read_unlock_trace().  This fix therefore adds a fraction of
	a nanosecond to read-side overhead.  On my laptop, the increase
	is from about 2.6 nanoseconds to about 3 nanoseconds.  This
	small increase should not cause noticeable problems.
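
For reference, here is a hedged sketch of how these read-side
primitives pair with the update side whose grace period this series
accelerates.  The rcupdate_trace.h primitives and
synchronize_rcu_tasks_trace() are the real APIs touched by this series;
struct foo, gp_foo, reader(), and update() are hypothetical
illustrations.

#include <linux/atomic.h>
#include <linux/printk.h>
#include <linux/rcupdate_trace.h>
#include <linux/slab.h>

struct foo { int val; };
static struct foo *gp_foo;	/* hypothetical shared pointer */

static void reader(void)
{
	struct foo *p;

	rcu_read_lock_trace();	/* reader may be preempted in here */
	p = READ_ONCE(gp_foo);
	if (p)
		pr_info("val=%d\n", p->val);
	rcu_read_unlock_trace();
}

static void update(struct foo *newp)
{
	struct foo *old = xchg(&gp_foo, newp);

	/* The grace period this series shortens from ~1 s to ~20 ms. */
	synchronize_rcu_tasks_trace();
	kfree(old);
}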

							Thanx, Paul

[1] https://lore.kernel.org/bpf/20200909193900.GK29330@paulmck-ThinkPad-P72/
[2] https://lore.kernel.org/bpf/20200910052727.GA4351@paulmck-ThinkPad-P72/
[3] https://lore.kernel.org/bpf/619554b2-4746-635e-22f3-7f0f09d97760@fb.com/

------------------------------------------------------------------------

 include/linux/rcupdate_trace.h |    4 +
 kernel/rcu/tasks.h             |   92 +++++++++++++++++++++++++++++++----------
 2 files changed, 75 insertions(+), 21 deletions(-)


* [PATCH tip/core/rcu 1/8] rcu-tasks: Prevent complaints of unused show_rcu_tasks_classic_gp_kthread()
  2020-09-17 21:06 ` [PATCH RFC tip/core/rcu 0/4] Accelerate RCU Tasks Trace updates Paul E. McKenney
@ 2020-09-17 21:07   ` paulmck
  2020-09-17 21:07   ` [PATCH tip/core/rcu 2/8] rcu-tasks: Mark variables static paulmck
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: paulmck @ 2020-09-17 21:07 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, sfr, Paul E. McKenney,
	# 5.8.x

From: "Paul E. McKenney" <paulmck@kernel.org>

Commit 8344496e8b49 ("rcu-tasks: Conditionally compile
show_rcu_tasks_gp_kthreads()") introduced conditional
compilation of several functions, but forgot one occurrence of
show_rcu_tasks_classic_gp_kthread() that causes the compiler to warn of
an unused static function.  This commit uses "static inline" to avoid
these complaints and possibly also to avoid emitting an actual definition
of this function.
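
As a minimal userspace illustration of the warning being silenced (not
kernel code; compile with "gcc -Wall"):

static void unused_plain(void) { }	/* warning: defined but not used */
static inline void unused_inline(void) { }	/* no warning, typically no code emitted */

int main(void)
{
	return 0;
}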

Fixes: 8344496e8b49 ("rcu-tasks: Conditionally compile show_rcu_tasks_gp_kthreads()")
Cc: <stable@vger.kernel.org> # 5.8.x
Reported-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tasks.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 835e2df..05d3e13 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -590,7 +590,7 @@ void exit_tasks_rcu_finish(void) __releases(&tasks_rcu_exit_srcu)
 }
 
 #else /* #ifdef CONFIG_TASKS_RCU */
-static void show_rcu_tasks_classic_gp_kthread(void) { }
+static inline void show_rcu_tasks_classic_gp_kthread(void) { }
 void exit_tasks_rcu_start(void) { }
 void exit_tasks_rcu_finish(void) { exit_tasks_rcu_finish_trace(current); }
 #endif /* #else #ifdef CONFIG_TASKS_RCU */
-- 
2.9.5



* [PATCH tip/core/rcu 2/8] rcu-tasks: Mark variables static
  2020-09-17 21:06 ` [PATCH RFC tip/core/rcu 0/4] Accelerate RCU Tasks Trace updates Paul E. McKenney
  2020-09-17 21:07   ` [PATCH tip/core/rcu 1/8] rcu-tasks: Prevent complaints of unused show_rcu_tasks_classic_gp_kthread() paulmck
@ 2020-09-17 21:07   ` paulmck
  2020-09-17 21:07   ` [PATCH tip/core/rcu 3/8] rcu-tasks: Use more aggressive polling for RCU Tasks Trace paulmck
                     ` (5 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: paulmck @ 2020-09-17 21:07 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, sfr, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

The n_heavy_reader_attempts, n_heavy_reader_updates, and
n_heavy_reader_ofl_updates variables are not used outside of their
translation unit, so this commit marks them static.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tasks.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 05d3e13..978508e 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -745,9 +745,9 @@ static DEFINE_PER_CPU(bool, trc_ipi_to_cpu);
 
 // The number of detections of task quiescent state relying on
 // heavyweight readers executing explicit memory barriers.
-unsigned long n_heavy_reader_attempts;
-unsigned long n_heavy_reader_updates;
-unsigned long n_heavy_reader_ofl_updates;
+static unsigned long n_heavy_reader_attempts;
+static unsigned long n_heavy_reader_updates;
+static unsigned long n_heavy_reader_ofl_updates;
 
 void call_rcu_tasks_trace(struct rcu_head *rhp, rcu_callback_t func);
 DEFINE_RCU_TASKS(rcu_tasks_trace, rcu_tasks_wait_gp, call_rcu_tasks_trace,
-- 
2.9.5



* [PATCH tip/core/rcu 3/8] rcu-tasks: Use more aggressive polling for RCU Tasks Trace
  2020-09-17 21:06 ` [PATCH RFC tip/core/rcu 0/4] Accelerate RCU Tasks Trace updates Paul E. McKenney
  2020-09-17 21:07   ` [PATCH tip/core/rcu 1/8] rcu-tasks: Prevent complaints of unused show_rcu_tasks_classic_gp_kthread() paulmck
  2020-09-17 21:07   ` [PATCH tip/core/rcu 2/8] rcu-tasks: Mark variables static paulmck
@ 2020-09-17 21:07   ` paulmck
  2020-09-17 21:07   ` [PATCH tip/core/rcu 4/8] rcu-tasks: Selectively enable more RCU Tasks Trace IPIs paulmck
                     ` (4 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: paulmck @ 2020-09-17 21:07 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, sfr, Paul E. McKenney,
	Daniel Borkmann, Jiri Olsa, bpf

From: "Paul E. McKenney" <paulmck@kernel.org>

The RCU Tasks Trace grace periods are too slow, as in 40x slower than
those of RCU Tasks.  This is due to my having assumed a one-second grace
period was OK, and thus not having optimized any further.  This commit
provides the first step in this optimization process, namely by allowing
the task_list scan backoff interval to be specified on a per-flavor basis,
and then speeding up the scans for RCU Tasks Trace.  However, kernels
built with CONFIG_TASKS_TRACE_RCU_READ_MB=y continue to use the old slower
backoff, consistent with that Kconfig option's goal of reducing IPIs.

Link: https://lore.kernel.org/bpf/CAADnVQK_AiX+S_L_A4CQWT11XyveppBbQSQgH_qWGyzu_E8Yeg@mail.gmail.com/
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tasks.h | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 978508e..ad8c4f3 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -28,6 +28,7 @@ typedef void (*postgp_func_t)(struct rcu_tasks *rtp);
  * @kthread_ptr: This flavor's grace-period/callback-invocation kthread.
  * @gp_func: This flavor's grace-period-wait function.
  * @gp_state: Grace period's most recent state transition (debugging).
+ * @init_fract: Initial backoff sleep interval.
  * @gp_jiffies: Time of last @gp_state transition.
  * @gp_start: Most recent grace-period start in jiffies.
  * @n_gps: Number of grace periods completed since boot.
@@ -48,6 +49,7 @@ struct rcu_tasks {
 	struct wait_queue_head cbs_wq;
 	raw_spinlock_t cbs_lock;
 	int gp_state;
+	int init_fract;
 	unsigned long gp_jiffies;
 	unsigned long gp_start;
 	unsigned long n_gps;
@@ -329,8 +331,10 @@ static void rcu_tasks_wait_gp(struct rcu_tasks *rtp)
 	 */
 	lastreport = jiffies;
 
-	/* Start off with HZ/10 wait and slowly back off to 1 HZ wait. */
-	fract = 10;
+	// Start off with initial wait and slowly back off to 1 HZ wait.
+	fract = rtp->init_fract;
+	if (fract > HZ)
+		fract = HZ;
 
 	for (;;) {
 		bool firstreport;
@@ -553,6 +557,7 @@ EXPORT_SYMBOL_GPL(rcu_barrier_tasks);
 
 static int __init rcu_spawn_tasks_kthread(void)
 {
+	rcu_tasks.init_fract = 10;
 	rcu_tasks.pregp_func = rcu_tasks_pregp_step;
 	rcu_tasks.pertask_func = rcu_tasks_pertask;
 	rcu_tasks.postscan_func = rcu_tasks_postscan;
@@ -1163,6 +1168,13 @@ EXPORT_SYMBOL_GPL(rcu_barrier_tasks_trace);
 
 static int __init rcu_spawn_tasks_trace_kthread(void)
 {
+	if (IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB)) {
+		rcu_tasks_trace.init_fract = 10;
+	} else {
+		rcu_tasks_trace.init_fract = HZ / 5;
+		if (rcu_tasks_trace.init_fract <= 0)
+			rcu_tasks_trace.init_fract = 1;
+	}
 	rcu_tasks_trace.pregp_func = rcu_tasks_trace_pregp_step;
 	rcu_tasks_trace.pertask_func = rcu_tasks_trace_pertask;
 	rcu_tasks_trace.postscan_func = rcu_tasks_trace_postscan;
-- 
2.9.5



* [PATCH tip/core/rcu 4/8] rcu-tasks: Selectively enable more RCU Tasks Trace IPIs
  2020-09-17 21:06 ` [PATCH RFC tip/core/rcu 0/4] Accelerate RCU Tasks Trace updates Paul E. McKenney
                     ` (2 preceding siblings ...)
  2020-09-17 21:07   ` [PATCH tip/core/rcu 3/8] rcu-tasks: Use more aggressive polling for RCU Tasks Trace paulmck
@ 2020-09-17 21:07   ` paulmck
  2020-09-17 21:07   ` [PATCH tip/core/rcu 5/8] rcu-tasks: Shorten per-grace-period sleep for RCU Tasks Trace paulmck
                     ` (3 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: paulmck @ 2020-09-17 21:07 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, sfr, Paul E. McKenney,
	Daniel Borkmann, Jiri Olsa, bpf

From: "Paul E. McKenney" <paulmck@kernel.org>

Many workloads are quite sensitive to IPIs, and such workloads should
build kernels with CONFIG_TASKS_TRACE_RCU_READ_MB=y to prevent RCU
Tasks Trace from using them under normal conditions.  However, other
workloads are quite happy to permit more IPIs if doing so makes BPF
program updates go faster.  This commit therefore sets the default
value for the rcupdate.rcu_task_ipi_delay kernel parameter to zero for
kernels that have been built with CONFIG_TASKS_TRACE_RCU_READ_MB=n,
while retaining the old default of (HZ / 10) for kernels that have
indicated an aversion to IPIs via CONFIG_TASKS_TRACE_RCU_READ_MB=y.

Link: https://lore.kernel.org/bpf/CAADnVQK_AiX+S_L_A4CQWT11XyveppBbQSQgH_qWGyzu_E8Yeg@mail.gmail.com/
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tasks.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index ad8c4f3..2b4df23 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -83,7 +83,7 @@ static struct rcu_tasks rt_name =					\
 DEFINE_STATIC_SRCU(tasks_rcu_exit_srcu);
 
 /* Avoid IPIing CPUs early in the grace period. */
-#define RCU_TASK_IPI_DELAY (HZ / 2)
+#define RCU_TASK_IPI_DELAY (IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB) ? HZ / 2 : 0)
 static int rcu_task_ipi_delay __read_mostly = RCU_TASK_IPI_DELAY;
 module_param(rcu_task_ipi_delay, int, 0644);
 
@@ -916,7 +916,8 @@ static void trc_wait_for_one_reader(struct task_struct *t,
 
 	// If currently running, send an IPI, either way, add to list.
 	trc_add_holdout(t, bhp);
-	if (task_curr(t) && time_after(jiffies, rcu_tasks_trace.gp_start + rcu_task_ipi_delay)) {
+	if (task_curr(t) &&
+	    time_after(jiffies + 1, rcu_tasks_trace.gp_start + rcu_task_ipi_delay)) {
 		// The task is currently running, so try IPIing it.
 		cpu = task_cpu(t);
 
-- 
2.9.5



* [PATCH tip/core/rcu 5/8] rcu-tasks: Shorten per-grace-period sleep for RCU Tasks Trace
  2020-09-17 21:06 ` [PATCH RFC tip/core/rcu 0/4] Accelerate RCU Tasks Trace updates Paul E. McKenney
                     ` (3 preceding siblings ...)
  2020-09-17 21:07   ` [PATCH tip/core/rcu 4/8] rcu-tasks: Selectively enable more RCU Tasks Trace IPIs paulmck
@ 2020-09-17 21:07   ` paulmck
  2020-09-17 21:07   ` [PATCH tip/core/rcu 6/8] rcu-tasks: Fix grace-period/unlock race in " paulmck
                     ` (2 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: paulmck @ 2020-09-17 21:07 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, sfr, Paul E. McKenney,
	Daniel Borkmann, Jiri Olsa, bpf

From: "Paul E. McKenney" <paulmck@kernel.org>

The various RCU tasks flavors currently wait 100 milliseconds between each
grace period in order to prevent CPU-bound loops and to favor efficiency
over latency.  However, RCU Tasks Trace needs to have a grace-period
latency of roughly 25 milliseconds, which is completely infeasible given
the 100-millisecond per-grace-period sleep.  This commit therefore reduces
this sleep duration to 5 milliseconds (or one jiffy, whichever is longer)
in kernels built with CONFIG_TASKS_TRACE_RCU_READ_MB=n.  Kernels built
with CONFIG_TASKS_TRACE_RCU_READ_MB=y retain the old 100-millisecond sleep.

Link: https://lore.kernel.org/bpf/CAADnVQK_AiX+S_L_A4CQWT11XyveppBbQSQgH_qWGyzu_E8Yeg@mail.gmail.com/
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tasks.h | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 2b4df23..a0eaed5 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -28,6 +28,7 @@ typedef void (*postgp_func_t)(struct rcu_tasks *rtp);
  * @kthread_ptr: This flavor's grace-period/callback-invocation kthread.
  * @gp_func: This flavor's grace-period-wait function.
  * @gp_state: Grace period's most recent state transition (debugging).
+ * @gp_sleep: Per-grace-period sleep to prevent CPU-bound looping.
  * @init_fract: Initial backoff sleep interval.
  * @gp_jiffies: Time of last @gp_state transition.
  * @gp_start: Most recent grace-period start in jiffies.
@@ -49,6 +50,7 @@ struct rcu_tasks {
 	struct wait_queue_head cbs_wq;
 	raw_spinlock_t cbs_lock;
 	int gp_state;
+	int gp_sleep;
 	int init_fract;
 	unsigned long gp_jiffies;
 	unsigned long gp_start;
@@ -233,7 +235,7 @@ static int __noreturn rcu_tasks_kthread(void *arg)
 			cond_resched();
 		}
 		/* Paranoid sleep to keep this from entering a tight loop */
-		schedule_timeout_idle(HZ/10);
+		schedule_timeout_idle(rtp->gp_sleep);
 
 		set_tasks_gp_state(rtp, RTGS_WAIT_CBS);
 	}
@@ -557,6 +559,7 @@ EXPORT_SYMBOL_GPL(rcu_barrier_tasks);
 
 static int __init rcu_spawn_tasks_kthread(void)
 {
+	rcu_tasks.gp_sleep = HZ / 10;
 	rcu_tasks.init_fract = 10;
 	rcu_tasks.pregp_func = rcu_tasks_pregp_step;
 	rcu_tasks.pertask_func = rcu_tasks_pertask;
@@ -690,6 +693,7 @@ EXPORT_SYMBOL_GPL(rcu_barrier_tasks_rude);
 
 static int __init rcu_spawn_tasks_rude_kthread(void)
 {
+	rcu_tasks_rude.gp_sleep = HZ / 10;
 	rcu_spawn_tasks_kthread_generic(&rcu_tasks_rude);
 	return 0;
 }
@@ -1170,8 +1174,12 @@ EXPORT_SYMBOL_GPL(rcu_barrier_tasks_trace);
 static int __init rcu_spawn_tasks_trace_kthread(void)
 {
 	if (IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB)) {
+		rcu_tasks_trace.gp_sleep = HZ / 10;
 		rcu_tasks_trace.init_fract = 10;
 	} else {
+		rcu_tasks_trace.gp_sleep = HZ / 200;
+		if (rcu_tasks_trace.gp_sleep <= 0)
+			rcu_tasks_trace.gp_sleep = 1;
 		rcu_tasks_trace.init_fract = HZ / 5;
 		if (rcu_tasks_trace.init_fract <= 0)
 			rcu_tasks_trace.init_fract = 1;
-- 
2.9.5



* [PATCH tip/core/rcu 6/8] rcu-tasks: Fix grace-period/unlock race in RCU Tasks Trace
  2020-09-17 21:06 ` [PATCH RFC tip/core/rcu 0/4] Accelerate RCU Tasks Trace updates Paul E. McKenney
                     ` (4 preceding siblings ...)
  2020-09-17 21:07   ` [PATCH tip/core/rcu 5/8] rcu-tasks: Shorten per-grace-period sleep for RCU Tasks Trace paulmck
@ 2020-09-17 21:07   ` paulmck
  2020-09-17 21:07   ` [PATCH tip/core/rcu 7/8] rcu-tasks: Fix low-probability task_struct leak paulmck
  2020-09-17 21:07   ` [PATCH tip/core/rcu 8/8] rcu-tasks: Enclose task-list scan in rcu_read_lock() paulmck
  7 siblings, 0 replies; 16+ messages in thread
From: paulmck @ 2020-09-17 21:07 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, sfr, Paul E. McKenney,
	Alexei Starovoitov, Daniel Borkmann, Jiri Olsa, bpf, # 5.7.x

From: "Paul E. McKenney" <paulmck@kernel.org>

The more intense grace-period processing resulting from the 50x RCU
Tasks Trace grace-period speedups exposed the following race condition:

o	Task A running on CPU 0 executes rcu_read_lock_trace(),
	entering a read-side critical section.

o	When Task A eventually invokes rcu_read_unlock_trace()
	to exit its read-side critical section, this function
	notes that the ->trc_reader_special.s flag is zero and
	therefore will set ->trc_reader_nesting to zero using
	WRITE_ONCE().  But before that happens...

o	The RCU Tasks Trace grace-period kthread running on some other
	CPU interrogates Task A, but this fails because this task is
	currently running.  This kthread therefore sends an IPI to CPU 0.

o	CPU 0 receives the IPI, and thus invokes trc_read_check_handler().
	Because Task A has not yet cleared its ->trc_reader_nesting
	counter, this function sees that Task A is still within its
	read-side critical section.  This function therefore sets the
	->trc_reader_special.b.need_qs flag, AKA the .need_qs flag.

	Except that Task A has already checked the .need_qs flag, which
	is part of the ->trc_reader_special.s flag.  The .need_qs flag
	therefore remains set until Task A's next rcu_read_unlock_trace().

o	Task A now invokes synchronize_rcu_tasks_trace(), which cannot
	start a new grace period until the current grace period completes.
	And thus cannot return until after that time.

	But Task A's .need_qs flag is still set, which prevents the current
	grace period from completing.  And because Task A is blocked, it
	will never execute rcu_read_unlock_trace() until its call to
	synchronize_rcu_tasks_trace() returns.

	We are therefore deadlocked.

This race is improbable, but 80 hours of rcutorture made it happen twice.
The race was possible before the grace-period speedup, but roughly 50x
less probable.  Several thousand hours of rcutorture would have been
necessary to have a reasonable chance of making this happen before this
50x speedup.

This commit therefore eliminates this deadlock by setting
->trc_reader_nesting to a large negative number before checking the
.need_qs flag and then zeroing (or decrementing with respect to its
initial value) ->trc_reader_nesting.  For its part, the IPI handler's
trc_read_check_handler() function adds a check for negative values,
deferring evaluation of the task in this case.  Taken together, these
changes avoid this deadlock scenario.

Fixes: 276c410448db ("rcu-tasks: Split ->trc_reader_need_end")
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: <bpf@vger.kernel.org>
Cc: <stable@vger.kernel.org> # 5.7.x
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 include/linux/rcupdate_trace.h | 4 ++++
 kernel/rcu/tasks.h             | 6 ++++++
 2 files changed, 10 insertions(+)

diff --git a/include/linux/rcupdate_trace.h b/include/linux/rcupdate_trace.h
index d9015aa..a6a6a3a 100644
--- a/include/linux/rcupdate_trace.h
+++ b/include/linux/rcupdate_trace.h
@@ -50,6 +50,7 @@ static inline void rcu_read_lock_trace(void)
 	struct task_struct *t = current;
 
 	WRITE_ONCE(t->trc_reader_nesting, READ_ONCE(t->trc_reader_nesting) + 1);
+	barrier();
 	if (IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB) &&
 	    t->trc_reader_special.b.need_mb)
 		smp_mb(); // Pairs with update-side barriers
@@ -72,6 +73,9 @@ static inline void rcu_read_unlock_trace(void)
 
 	rcu_lock_release(&rcu_trace_lock_map);
 	nesting = READ_ONCE(t->trc_reader_nesting) - 1;
+	barrier(); // Critical section before disabling.
+	// Disable IPI-based setting of .need_qs.
+	WRITE_ONCE(t->trc_reader_nesting, INT_MIN);
 	if (likely(!READ_ONCE(t->trc_reader_special.s)) || nesting) {
 		WRITE_ONCE(t->trc_reader_nesting, nesting);
 		return;  // We assume shallow reader nesting.
diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index a0eaed5..e583a2d 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -830,6 +830,12 @@ static void trc_read_check_handler(void *t_in)
 		WRITE_ONCE(t->trc_reader_checked, true);
 		goto reset_ipi;
 	}
+	// If we are racing with an rcu_read_unlock_trace(), try again later.
+	if (unlikely(t->trc_reader_nesting < 0)) {
+		if (WARN_ON_ONCE(atomic_dec_and_test(&trc_n_readers_need_end)))
+			wake_up(&trc_wait);
+		goto reset_ipi;
+	}
 	WRITE_ONCE(t->trc_reader_checked, true);
 
 	// Get here if the task is in a read-side critical section.  Set
-- 
2.9.5



* [PATCH tip/core/rcu 7/8] rcu-tasks: Fix low-probability task_struct leak
  2020-09-17 21:06 ` [PATCH RFC tip/core/rcu 0/4] Accelerate RCU Tasks Trace updates Paul E. McKenney
                     ` (5 preceding siblings ...)
  2020-09-17 21:07   ` [PATCH tip/core/rcu 6/8] rcu-tasks: Fix grace-period/unlock race in " paulmck
@ 2020-09-17 21:07   ` paulmck
  2020-09-17 21:07   ` [PATCH tip/core/rcu 8/8] rcu-tasks: Enclose task-list scan in rcu_read_lock() paulmck
  7 siblings, 0 replies; 16+ messages in thread
From: paulmck @ 2020-09-17 21:07 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, sfr, Paul E. McKenney,
	Alexei Starovoitov, Daniel Borkmann, Jiri Olsa, bpf, # 5.7.x

From: "Paul E. McKenney" <paulmck@kernel.org>

When rcu_tasks_trace_postgp() function detects an RCU Tasks Trace
CPU stall, it adds all tasks blocking the current grace period to
a list, invoking get_task_struct() on each to prevent them from
being freed while on the list.  It then traverses that list,
printing stall-warning messages for each one that is still blocking
the current grace period and removing it from the list.  The list
removal invokes the matching put_task_struct().

This of course means that in the admittedly unlikely event that some
task executes its outermost rcu_read_unlock_trace() in the meantime, it
won't be removed from the list and put_task_struct() won't be executing,
resulting in a task_struct leak.  This commit therefore makes the list
removal and put_task_struct() unconditional, stopping the leak.

Note further that this bug can occur only after an RCU Tasks Trace CPU
stall warning, which by default only happens after a grace period has
extended for ten minutes (yes, not a typo, minutes).

Fixes: 4593e772b502 ("rcu-tasks: Add stall warnings for RCU Tasks Trace")
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: <bpf@vger.kernel.org>
Cc: <stable@vger.kernel.org> # 5.7.x
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tasks.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index e583a2d..fcd9c25 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -1092,11 +1092,11 @@ static void rcu_tasks_trace_postgp(struct rcu_tasks *rtp)
 			if (READ_ONCE(t->trc_reader_special.b.need_qs))
 				trc_add_holdout(t, &holdouts);
 		firstreport = true;
-		list_for_each_entry_safe(t, g, &holdouts, trc_holdout_list)
-			if (READ_ONCE(t->trc_reader_special.b.need_qs)) {
+		list_for_each_entry_safe(t, g, &holdouts, trc_holdout_list) {
+			if (READ_ONCE(t->trc_reader_special.b.need_qs))
 				show_stalled_task_trace(t, &firstreport);
-				trc_del_holdout(t);
-			}
+			trc_del_holdout(t); // Release task_struct reference.
+		}
 		if (firstreport)
 			pr_err("INFO: rcu_tasks_trace detected stalls? (Counter/taskslist mismatch?)\n");
 		show_stalled_ipi_trace();
-- 
2.9.5



* [PATCH tip/core/rcu 8/8] rcu-tasks: Enclose task-list scan in rcu_read_lock()
  2020-09-17 21:06 ` [PATCH RFC tip/core/rcu 0/4] Accelerate RCU Tasks Trace updates Paul E. McKenney
                     ` (6 preceding siblings ...)
  2020-09-17 21:07   ` [PATCH tip/core/rcu 7/8] rcu-tasks: Fix low-probability task_struct leak paulmck
@ 2020-09-17 21:07   ` paulmck
  7 siblings, 0 replies; 16+ messages in thread
From: paulmck @ 2020-09-17 21:07 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, sfr, Paul E. McKenney,
	Alexei Starovoitov, Daniel Borkmann, Jiri Olsa, bpf, # 5.7.x

From: "Paul E. McKenney" <paulmck@kernel.org>

The rcu_tasks_trace_postgp() function uses for_each_process_thread()
to scan the task list without the benefit of RCU read-side protection,
which can result in use-after-free errors on task_struct structures.
This error was missed because the TRACE01 rcutorture scenario enables
lockdep, but also builds with CONFIG_PREEMPT_NONE=y.  In this situation,
preemption is disabled everywhere, so lockdep thinks everywhere can
be a legitimate RCU reader.  This commit therefore adds the needed
rcu_read_lock() and rcu_read_unlock().

Note that this bug can occur only after an RCU Tasks Trace CPU stall
warning, which by default only happens after a grace period has extended
for ten minutes (yes, not a typo, minutes).

Fixes: 4593e772b502 ("rcu-tasks: Add stall warnings for RCU Tasks Trace")
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: <bpf@vger.kernel.org>
Cc: <stable@vger.kernel.org> # 5.7.x
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tasks.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index fcd9c25..d5d9f2d 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -1088,9 +1088,11 @@ static void rcu_tasks_trace_postgp(struct rcu_tasks *rtp)
 		if (ret)
 			break;  // Count reached zero.
 		// Stall warning time, so make a list of the offenders.
+		rcu_read_lock();
 		for_each_process_thread(g, t)
 			if (READ_ONCE(t->trc_reader_special.b.need_qs))
 				trc_add_holdout(t, &holdouts);
+		rcu_read_unlock();
 		firstreport = true;
 		list_for_each_entry_safe(t, g, &holdouts, trc_holdout_list) {
 			if (READ_ONCE(t->trc_reader_special.b.need_qs))
-- 
2.9.5


