rcu.vger.kernel.org archive mirror
* [PATCH AUTOSEL 6.0 02/18] rcu: Avoid triggering strict-GP irq-work when RCU is idle
From: Sasha Levin @ 2022-10-09 20:51 UTC
  To: linux-kernel, stable
  Cc: Zqiang, Paul E . McKenney, Sasha Levin, frederic, quic_neeraju,
	josh, rcu

From: Zqiang <qiang1.zhang@intel.com>

[ Upstream commit 621189a1fe93cb2b34d62c5cdb9e258bca044813 ]

Kernels built with PREEMPT_RCU=y and RCU_STRICT_GRACE_PERIOD=y trigger
irq-work from rcu_read_unlock(), and the resulting irq-work handler
invokes rcu_preempt_deferred_qs_handler().  The point of this triggering
is to force grace periods to end quickly in order to give tools like KASAN
a better chance of detecting RCU usage bugs such as leaking RCU-protected
pointers out of an RCU read-side critical section.

However, this irq-work triggering is unconditional.  This works, but
there is no point in doing this irq-work unless the current grace period
is waiting on the running CPU or task, which is not the common case.
After all, in the common case there are many rcu_read_unlock() calls
per CPU per grace period.

This commit therefore triggers the irq-work only when the current grace
period is waiting on the running CPU or task.
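
To make the new gating concrete, here is a small user-space toy model
of the condition (the toy_* names and simplified structures are
illustrative stand-ins, not kernel code; the real check is the
rcu_read_unlock_special() hunk in the diff below):

	#include <stdbool.h>
	#include <stdio.h>

	struct toy_rcu_node { unsigned long qsmask; };  /* CPUs the GP still waits on */
	struct toy_rcu_data { unsigned long grpmask; }; /* this CPU's bit in the node */

	/* Old behavior: CONFIG_RCU_STRICT_GRACE_PERIOD alone sufficed to
	 * trigger the irq-work.  New behavior: also require that this CPU
	 * still owes a quiescent state or that this task blocks the GP. */
	static bool toy_strict_irq_work_wanted(bool strict_gp,
					       const struct toy_rcu_data *rdp,
					       const struct toy_rcu_node *rnp,
					       bool task_blocks_gp)
	{
		return strict_gp &&
		       ((rdp->grpmask & rnp->qsmask) || task_blocks_gp);
	}

	int main(void)
	{
		struct toy_rcu_node rnp = { .qsmask = 0x2 };  /* GP waits on CPU 1 only */
		struct toy_rcu_data cpu0 = { .grpmask = 0x1 };
		struct toy_rcu_data cpu1 = { .grpmask = 0x2 };

		printf("cpu0 wants irq-work: %d\n",
		       toy_strict_irq_work_wanted(true, &cpu0, &rnp, false)); /* 0 */
		printf("cpu1 wants irq-work: %d\n",
		       toy_strict_irq_work_wanted(true, &cpu1, &rnp, false)); /* 1 */
		return 0;
	}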

This change was tested as follows on a four-CPU system:

	echo rcu_preempt_deferred_qs_handler > /sys/kernel/debug/tracing/set_ftrace_filter
	echo 1 > /sys/kernel/debug/tracing/function_profile_enabled
	insmod rcutorture.ko
	sleep 20
	rmmod rcutorture.ko
	echo 0 > /sys/kernel/debug/tracing/function_profile_enabled
	echo > /sys/kernel/debug/tracing/set_ftrace_filter

This procedure produces results in this per-CPU set of files:

	/sys/kernel/debug/tracing/trace_stat/function*

Sample output from one of these files is as follows:

  Function                               Hit    Time            Avg             s^2
  --------                               ---    ----            ---             ---
  rcu_preempt_deferred_qs_handle      838746    182650.3 us     0.217 us        0.004 us

The baseline sum of the "Hit" values (the number of calls to this
function) was 3,319,015.  With this commit, that sum was 1,140,359,
for a 2.9x reduction.  The worst-case variance across the CPUs was less
than 25%, so this large effect size is statistically significant.

The raw data is available in the Link: URL.

Link: https://lore.kernel.org/all/20220808022626.12825-1-qiang1.zhang@intel.com/
Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 kernel/rcu/tree_plugin.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 438ecae6bd7e..49468b4d1b43 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -641,7 +641,8 @@ static void rcu_read_unlock_special(struct task_struct *t)
 
 		expboost = (t->rcu_blocked_node && READ_ONCE(t->rcu_blocked_node->exp_tasks)) ||
 			   (rdp->grpmask & READ_ONCE(rnp->expmask)) ||
-			   IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) ||
+			   (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) &&
+			   ((rdp->grpmask & READ_ONCE(rnp->qsmask)) || t->rcu_blocked_node)) ||
 			   (IS_ENABLED(CONFIG_RCU_BOOST) && irqs_were_disabled &&
 			    t->rcu_blocked_node);
 		// Need to defer quiescent state until everything is enabled.
-- 
2.35.1


* [PATCH AUTOSEL 6.0 03/18] rcu: Back off upon fill_page_cache_func() allocation failure
From: Sasha Levin @ 2022-10-09 20:51 UTC
  To: linux-kernel, stable
  Cc: Michal Hocko, Uladzislau Rezki, Paul E. McKenney,
	Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	Sasha Levin, rcu

From: Michal Hocko <mhocko@suse.com>

[ Upstream commit 093590c16b447f53e66771c8579ae66c96f6ef61 ]

The fill_page_cache_func() function allocates a couple of pages to store
kvfree_rcu_bulk_data structures. This is a lightweight (GFP_NORETRY)
allocation which can fail under memory pressure. The function will,
however, keep retrying even when the previous attempt has failed.

This retrying is in theory correct, but in practice the allocation is
invoked from workqueue context, which means that if the memory reclaim
gets stuck, these retries can hog the worker for quite some time.
Although the workqueues subsystem automatically adjusts concurrency, such
adjustment is not guaranteed to happen until the worker context sleeps.
And the fill_page_cache_func() function's retry loop is not guaranteed
to sleep (see the should_reclaim_retry() function).

And we have seen this function cause workqueue lockups:

kernel: BUG: workqueue lockup - pool cpus=93 node=1 flags=0x1 nice=0 stuck for 32s!
[...]
kernel: pool 74: cpus=37 node=0 flags=0x1 nice=0 hung=32s workers=2 manager: 2146
kernel:   pwq 498: cpus=249 node=1 flags=0x1 nice=0 active=4/256 refcnt=5
kernel:     in-flight: 1917:fill_page_cache_func
kernel:     pending: dbs_work_handler, free_work, kfree_rcu_monitor

Originally, we thought that the root cause of this lockup was several
retries with direct reclaim, but this is not yet confirmed.  Furthermore,
we have seen similar lockups without any heavy memory pressure.  This
suggests that there are other factors contributing to these lockups.
However, it is not really clear that endless retries are desirable.

So let's make the fill_page_cache_func() function back off after
allocation failure.
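
The resulting control flow has, in shape, the structure of this
user-space toy (the toy_* names and malloc()/free() are stand-ins for
__get_free_page()/free_page() and put_cached_bnode(); only the
bail-out-on-failure structure matters, see the diff below):

	#include <stdbool.h>
	#include <stdlib.h>

	#define TOY_CACHE_CAP 5

	static void *toy_cache[TOY_CACHE_CAP];
	static int toy_cached;

	/* Stand-in for put_cached_bnode(): refuses pages once the cache
	 * is full, mirroring the !pushed case in the diff below. */
	static bool toy_push(void *page)
	{
		if (toy_cached >= TOY_CACHE_CAP)
			return false;
		toy_cache[toy_cached++] = page;
		return true;
	}

	static void toy_fill_cache(int attempts)
	{
		for (int i = 0; i < attempts; i++) {
			void *page = malloc(4096);

			if (!page)      /* new: back off on failure, no retry */
				break;

			if (!toy_push(page)) {
				free(page);  /* cache full: give the page back */
				break;
			}
		}
	}

	int main(void)
	{
		toy_fill_cache(16);  /* stops early on failure or a full cache */
		return 0;
	}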

Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 kernel/rcu/tree.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 79aea7df4345..eb435941e92f 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3183,15 +3183,16 @@ static void fill_page_cache_func(struct work_struct *work)
 		bnode = (struct kvfree_rcu_bulk_data *)
 			__get_free_page(GFP_KERNEL | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);
 
-		if (bnode) {
-			raw_spin_lock_irqsave(&krcp->lock, flags);
-			pushed = put_cached_bnode(krcp, bnode);
-			raw_spin_unlock_irqrestore(&krcp->lock, flags);
+		if (!bnode)
+			break;
 
-			if (!pushed) {
-				free_page((unsigned long) bnode);
-				break;
-			}
+		raw_spin_lock_irqsave(&krcp->lock, flags);
+		pushed = put_cached_bnode(krcp, bnode);
+		raw_spin_unlock_irqrestore(&krcp->lock, flags);
+
+		if (!pushed) {
+			free_page((unsigned long) bnode);
+			break;
 		}
 	}
 
-- 
2.35.1


* [PATCH AUTOSEL 6.0 04/18] rcu-tasks: Convert RCU_LOCKDEP_WARN() to WARN_ONCE()
From: Sasha Levin @ 2022-10-09 20:51 UTC
  To: linux-kernel, stable
  Cc: Zqiang, Paul E . McKenney, Sasha Levin, frederic, quic_neeraju,
	josh, rcu

From: Zqiang <qiang1.zhang@intel.com>

[ Upstream commit fcd53c8a4dfa38bafb89efdd0b0f718f3a03f884 ]

Kernels built with CONFIG_PROVE_RCU=y and CONFIG_DEBUG_LOCK_ALLOC=y
attempt to emit a warning when the synchronize_rcu_tasks_generic()
function is called during early boot while the rcu_scheduler_active
variable is RCU_SCHEDULER_INACTIVE.  However, the warning is not
actually printed because debug_lockdep_rcu_enabled() returns false,
exactly because the rcu_scheduler_active variable is still equal to
RCU_SCHEDULER_INACTIVE.

This commit therefore replaces RCU_LOCKDEP_WARN() with WARN_ONCE()
to force these warnings to actually be printed.
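
To see why the original warning could never fire, consider this
simplified user-space model (the toy_* stand-ins compress the real
RCU_LOCKDEP_WARN()/debug_lockdep_rcu_enabled() logic down to the part
that matters here):

	#include <stdbool.h>
	#include <stdio.h>

	enum { RCU_SCHEDULER_INACTIVE, RCU_SCHEDULER_INIT, RCU_SCHEDULER_RUNNING };
	static int rcu_scheduler_active = RCU_SCHEDULER_INACTIVE;

	/* Lockdep-RCU checking is off until the scheduler is up. */
	static bool toy_debug_lockdep_rcu_enabled(void)
	{
		return rcu_scheduler_active != RCU_SCHEDULER_INACTIVE;
	}

	/* Stand-in for RCU_LOCKDEP_WARN(): the condition is consulted
	 * only while lockdep-RCU checking is enabled... */
	static void toy_rcu_lockdep_warn(bool cond, const char *msg)
	{
		if (toy_debug_lockdep_rcu_enabled() && cond)
			fprintf(stderr, "WARNING: %s\n", msg);
	}

	int main(void)
	{
		/* ...so a condition asserting "the scheduler is inactive"
		 * is checked only when the scheduler is active: the gate
		 * and the condition are mutually exclusive and nothing is
		 * ever printed.  WARN_ONCE() has no such gate. */
		toy_rcu_lockdep_warn(rcu_scheduler_active == RCU_SCHEDULER_INACTIVE,
				     "synchronize_rcu_tasks called too soon");
		return 0;
	}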

Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 kernel/rcu/tasks.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 83c7e6620d40..469bf2a3b505 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -560,7 +560,7 @@ static int __noreturn rcu_tasks_kthread(void *arg)
 static void synchronize_rcu_tasks_generic(struct rcu_tasks *rtp)
 {
 	/* Complain if the scheduler has not started.  */
-	RCU_LOCKDEP_WARN(rcu_scheduler_active == RCU_SCHEDULER_INACTIVE,
+	WARN_ONCE(rcu_scheduler_active == RCU_SCHEDULER_INACTIVE,
 			 "synchronize_rcu_tasks called too soon");
 
 	// If the grace-period kthread is running, use it.
-- 
2.35.1


* [PATCH AUTOSEL 6.0 05/18] rcu-tasks: Ensure RCU Tasks Trace loops have quiescent states
From: Sasha Levin @ 2022-10-09 20:51 UTC
  To: linux-kernel, stable
  Cc: Paul E. McKenney, Sasha Levin, frederic, quic_neeraju, josh, rcu

From: "Paul E. McKenney" <paulmck@kernel.org>

[ Upstream commit d6ad60635cafe900bcd11ad588d8accb36c36b1b ]

The RCU Tasks Trace grace-period kthread loops across all CPUs, and
there can be quite a few CPUs, with some commercially available systems
sporting well over a thousand of them.  Some of these loops can feature
IPIs, which can take some time.  This commit therefore places a call to
cond_resched_tasks_rcu_qs() in each such loop.
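
The shape of the change, as a user-space sketch (the toy_* names are
illustrative; cond_resched_tasks_rcu_qs() is modeled by a plain
sched_yield(), which only approximates its quiescent-state reporting):

	#include <sched.h>

	#define TOY_NR_CPUS 1024

	static void toy_per_cpu_work(int cpu)
	{
		(void)cpu;  /* in the kernel this may send an IPI and wait */
	}

	static void toy_scan_all_cpus(void)
	{
		for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
			toy_per_cpu_work(cpu);
			/* Stand-in for cond_resched_tasks_rcu_qs(): offer a
			 * quiescent state and a reschedule point on every
			 * iteration instead of monopolizing the kthread. */
			sched_yield();
		}
	}

	int main(void)
	{
		toy_scan_all_cpus();
		return 0;
	}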

Link: https://docs.google.com/document/d/1V0YnG1HTWMt9WHJjroiJL9lf-hMrud4v8Fn3fhyY0cI/edit?usp=sharing
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 kernel/rcu/tasks.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 469bf2a3b505..f5bf6fb430da 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -1500,6 +1500,7 @@ static void rcu_tasks_trace_pregp_step(struct list_head *hop)
 		if (rcu_tasks_trace_pertask_prep(t, true))
 			trc_add_holdout(t, hop);
 		rcu_read_unlock();
+		cond_resched_tasks_rcu_qs();
 	}
 
 	// Only after all running tasks have been accounted for is it
@@ -1520,6 +1521,7 @@ static void rcu_tasks_trace_pregp_step(struct list_head *hop)
 			raw_spin_lock_irqsave_rcu_node(rtpcp, flags);
 		}
 		raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
+		cond_resched_tasks_rcu_qs();
 	}
 
 	// Re-enable CPU hotplug now that the holdout list is populated.
@@ -1619,6 +1621,7 @@ static void check_all_holdout_tasks_trace(struct list_head *hop,
 			trc_del_holdout(t);
 		else if (needreport)
 			show_stalled_task_trace(t, firstreport);
+		cond_resched_tasks_rcu_qs();
 	}
 
 	// Re-enable CPU hotplug now that the holdout list scan has completed.
-- 
2.35.1

