* [PATCH rcu 01/12] rcu: Decrease FQS scan wait time in case of callback overloading
  2022-06-20 22:20 [PATCH rcu 0/12] Miscellaneous fixes for v5.20 Paul E. McKenney
@ 2022-06-20 22:20 ` Paul E. McKenney
  2022-06-21  5:29   ` Neeraj Upadhyay
  2022-06-20 22:20 ` [PATCH rcu 02/12] rcu: Avoid tracing a few functions executed in stop machine Paul E. McKenney
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 39+ messages in thread
From: Paul E. McKenney @ 2022-06-20 22:20 UTC (permalink / raw)
  To: rcu; +Cc: linux-kernel, kernel-team, rostedt, Paul E. McKenney

The force-quiesce-state loop function rcu_gp_fqs_loop() checks for
callback overloading and does an immediate initial scan for idle CPUs
if so.  However, subsequent rescans will be carried out at as leisurely a
rate as they always are, as specified by the rcutree.jiffies_till_next_fqs
module parameter.  It might be tempting to just continue immediately
rescanning, but this turns the RCU grace-period kthread into a CPU hog.
It might also be tempting to reduce the time between rescans to a single
jiffy, but this can be problematic on larger systems.

This commit therefore divides the normal time between rescans by three,
rounding up.  Thus a small system running at HZ=1000 that is suffering
from callback overload will wait only one jiffy instead of the normal
three between rescans.
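
For reference, the rounding can be checked in isolation.  A minimal
userspace sketch of the same arithmetic (not part of the patch):

	/* Divide the rescan interval by three, rounding up. */
	static unsigned long cbovld_scale(unsigned long j)
	{
		j = (j + 2) / 3;
		return j ? j : 1;	/* Never wait less than one jiffy. */
	}

	/* cbovld_scale(3) == 1 (the HZ=1000 default), cbovld_scale(6) == 2. */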

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index c25ba442044a6..c19d5926886fb 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1993,6 +1993,11 @@ static noinline_for_stack void rcu_gp_fqs_loop(void)
 			WRITE_ONCE(rcu_state.jiffies_kick_kthreads,
 				   jiffies + (j ? 3 * j : 2));
 		}
+		if (rcu_state.cbovld) {
+			j = (j + 2) / 3;
+			if (j <= 0)
+				j = 1;
+		}
 		trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
 				       TPS("fqswait"));
 		WRITE_ONCE(rcu_state.gp_state, RCU_GP_WAIT_FQS);
-- 
2.31.1.189.g2e36527f23



* [PATCH rcu 0/12] Miscellaneous fixes for v5.20
@ 2022-06-20 22:20 Paul E. McKenney
  2022-06-20 22:20 ` [PATCH rcu 01/12] rcu: Decrease FQS scan wait time in case of callback overloading Paul E. McKenney
                   ` (11 more replies)
  0 siblings, 12 replies; 39+ messages in thread
From: Paul E. McKenney @ 2022-06-20 22:20 UTC (permalink / raw)
  To: rcu; +Cc: linux-kernel, kernel-team, rostedt

Hello!

This series contains miscellaneous fixes:

1.	Decrease FQS scan wait time in case of callback overloading.

2.	Avoid tracing a few functions executed in stop machine, courtesy
	of Patrick Wang.

3.	Add rnp->cbovldmask check in rcutree_migrate_callbacks(),
	courtesy of Zqiang.

4.	Immediately boost preempted readers for strict grace periods,
	courtesy of Zqiang.

5.	Forbid RCU_STRICT_GRACE_PERIOD in TINY_RCU kernels.

6.	locking/csd_lock: Change csdlock_debug from early_param to
	__setup, courtesy of Chen Zhongjin.

7.	tiny: Record kvfree_call_rcu() call stack for KASAN, courtesy
	of Johannes Berg.

8.	Cleanup RCU urgency state for offline CPU, courtesy of Zqiang.

9.	Remove useless monitor_todo flag, courtesy of "Joel Fernandes
	(Google)".

10.	Initialize first_gp_fqs at declaration in rcu_gp_fqs().

11.	Add comment to describe GP-done condition in fqs loop, courtesy
	of Neeraj Upadhyay.

12.	Block less aggressively for expedited grace periods.

						Thanx, Paul

------------------------------------------------------------------------

 b/include/linux/rcutiny.h  |   11 +++++++++-
 b/kernel/rcu/Kconfig.debug |    2 -
 b/kernel/rcu/srcutree.c    |   20 ++++++++++++------
 b/kernel/rcu/tiny.c        |   14 +++++++++++++
 b/kernel/rcu/tree.c        |    5 ++++
 b/kernel/rcu/tree_plugin.h |    6 ++---
 b/kernel/smp.c             |    4 +--
 kernel/rcu/tree.c          |   48 ++++++++++++++++++++++++++-------------------
 kernel/rcu/tree_plugin.h   |    3 +-
 9 files changed, 78 insertions(+), 35 deletions(-)


* [PATCH rcu 02/12] rcu: Avoid tracing a few functions executed in stop machine
  2022-06-20 22:20 [PATCH rcu 0/12] Miscellaneous fixes for v5.20 Paul E. McKenney
  2022-06-20 22:20 ` [PATCH rcu 01/12] rcu: Decrease FQS scan wait time in case of callback overloading Paul E. McKenney
@ 2022-06-20 22:20 ` Paul E. McKenney
  2022-06-21  5:47   ` Neeraj Upadhyay
  2022-06-20 22:20 ` [PATCH rcu 03/12] rcu: Add rnp->cbovldmask check in rcutree_migrate_callbacks() Paul E. McKenney
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 39+ messages in thread
From: Paul E. McKenney @ 2022-06-20 22:20 UTC (permalink / raw)
  To: rcu; +Cc: linux-kernel, kernel-team, rostedt, Patrick Wang, Paul E . McKenney

From: Patrick Wang <patrick.wang.shcn@gmail.com>

Stop-machine recently started calling additional functions while waiting:

----------------------------------------------------------------
Former stop machine wait loop:
do {
    cpu_relax(); => macro
    ...
} while (curstate != STOPMACHINE_EXIT);
-----------------------------------------------------------------
Current stop machine wait loop:
do {
    stop_machine_yield(cpumask); => function (notraced)
    ...
    touch_nmi_watchdog(); => function (notraced, inside calls also notraced)
    ...
    rcu_momentary_dyntick_idle(); => function (notraced, inside calls traced)
} while (curstate != MULTI_STOP_EXIT);
------------------------------------------------------------------

These functions (and the functions that they call) must be marked
notrace to prevent them from being updated while they are executing.
The consequences of failing to mark these functions can be severe:

  rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
  rcu: 	1-...!: (0 ticks this GP) idle=14f/1/0x4000000000000000 softirq=3397/3397 fqs=0
  rcu: 	3-...!: (0 ticks this GP) idle=ee9/1/0x4000000000000000 softirq=5168/5168 fqs=0
  	(detected by 0, t=8137 jiffies, g=5889, q=2 ncpus=4)
  Task dump for CPU 1:
  task:migration/1     state:R  running task     stack:    0 pid:   19 ppid:     2 flags:0x00000000
  Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
  Call Trace:
  Task dump for CPU 3:
  task:migration/3     state:R  running task     stack:    0 pid:   29 ppid:     2 flags:0x00000000
  Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
  Call Trace:
  rcu: rcu_preempt kthread timer wakeup didn't happen for 8136 jiffies! g5889 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
  rcu: 	Possible timer handling issue on cpu=2 timer-softirq=594
  rcu: rcu_preempt kthread starved for 8137 jiffies! g5889 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2
  rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
  rcu: RCU grace-period kthread stack dump:
  task:rcu_preempt     state:I stack:    0 pid:   14 ppid:     2 flags:0x00000000
  Call Trace:
    schedule+0x56/0xc2
    schedule_timeout+0x82/0x184
    rcu_gp_fqs_loop+0x19a/0x318
    rcu_gp_kthread+0x11a/0x140
    kthread+0xee/0x118
    ret_from_exception+0x0/0x14
  rcu: Stack dump where RCU GP kthread last ran:
  Task dump for CPU 2:
  task:migration/2     state:R  running task     stack:    0 pid:   24 ppid:     2 flags:0x00000000
  Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
  Call Trace:

This commit therefore marks these functions notrace:
 rcu_preempt_deferred_qs()
 rcu_preempt_need_deferred_qs()
 rcu_preempt_deferred_qs_irqrestore()
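
For background, the notrace marking boils down to the compiler's
no-instrumentation attribute, roughly as in include/linux/compiler_types.h
(the exact definition varies by architecture and configuration):

	/* Sketch: functions defined with this attribute get no mcount/fentry
	 * call, so ftrace never patches their entry points. */
	#define notrace __attribute__((__no_instrument_function__))

	static notrace void rcu_preempt_deferred_qs(struct task_struct *t);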

Signed-off-by: Patrick Wang <patrick.wang.shcn@gmail.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree_plugin.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index c8ba0fe17267c..440d9e02a26e0 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -460,7 +460,7 @@ static bool rcu_preempt_has_tasks(struct rcu_node *rnp)
  * be quite short, for example, in the case of the call from
  * rcu_read_unlock_special().
  */
-static void
+static notrace void
 rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
 {
 	bool empty_exp;
@@ -581,7 +581,7 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
  * is disabled.  This function cannot be expected to understand these
  * nuances, so the caller must handle them.
  */
-static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
+static notrace bool rcu_preempt_need_deferred_qs(struct task_struct *t)
 {
 	return (__this_cpu_read(rcu_data.cpu_no_qs.b.exp) ||
 		READ_ONCE(t->rcu_read_unlock_special.s)) &&
@@ -595,7 +595,7 @@ static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
  * evaluate safety in terms of interrupt, softirq, and preemption
  * disabling.
  */
-static void rcu_preempt_deferred_qs(struct task_struct *t)
+static notrace void rcu_preempt_deferred_qs(struct task_struct *t)
 {
 	unsigned long flags;
 
-- 
2.31.1.189.g2e36527f23



* [PATCH rcu 03/12] rcu: Add rnp->cbovldmask check in rcutree_migrate_callbacks()
  2022-06-20 22:20 [PATCH rcu 0/12] Miscellaneous fixes for v5.20 Paul E. McKenney
  2022-06-20 22:20 ` [PATCH rcu 01/12] rcu: Decrease FQS scan wait time in case of callback overloading Paul E. McKenney
  2022-06-20 22:20 ` [PATCH rcu 02/12] rcu: Avoid tracing a few functions executed in stop machine Paul E. McKenney
@ 2022-06-20 22:20 ` Paul E. McKenney
  2022-06-21  5:57   ` Neeraj Upadhyay
  2022-06-20 22:20 ` [PATCH rcu 04/12] rcu: Immediately boost preempted readers for strict grace periods Paul E. McKenney
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 39+ messages in thread
From: Paul E. McKenney @ 2022-06-20 22:20 UTC (permalink / raw)
  To: rcu; +Cc: linux-kernel, kernel-team, rostedt, Zqiang, Paul E . McKenney

From: Zqiang <qiang1.zhang@intel.com>

Currently, the rcu_node structure's ->cbovlmask field is set in call_rcu()
when a given CPU is suffering from callback overload.  But if that CPU
goes offline, the outgoing CPU's callbacks are migrated to the running
CPU, which is likely to overload the running CPU.  However, that CPU's
bit in its leaf rcu_node structure's ->cbovlmask field remains zero.

Initially, this is OK because the outgoing CPU's bit remains set.
However, that bit will be cleared at the next end of a grace period,
at which time it is quite possible that the running CPU will still
be overloaded.  If the running CPU invokes call_rcu(), then overload
will be checked for and the bit will be set.  Except that there is no
guarantee that the running CPU will invoke call_rcu(), in which case the
next grace period will fail to take the running CPU's overload condition
into account.  Plus, because the bit is not set, the end of the grace
period won't check for overload on this CPU.

This commit therefore adds a call to check_cb_ovld_locked() in
check_cb_ovld_locked() to set the running CPU's ->cbovlmask bit
appropriately.
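
For reference, a sketch of the helper being added to the migration path,
based on the existing tree.c definition (details may vary):

	/* Caller holds rnp->lock.  Set or clear this CPU's bit in the leaf
	 * rcu_node structure's ->cbovldmask based on its callback count. */
	static void check_cb_ovld_locked(struct rcu_data *rdp, struct rcu_node *rnp)
	{
		raw_lockdep_assert_held_rcu_node(rnp);
		if (qovld_calc <= 0)
			return;	/* Overload checking disabled. */
		if (rcu_segcblist_n_cbs(&rdp->cblist) >= qovld_calc)
			WRITE_ONCE(rnp->cbovldmask, rnp->cbovldmask | rdp->grpmask);
		else
			WRITE_ONCE(rnp->cbovldmask, rnp->cbovldmask & ~rdp->grpmask);
	}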

Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index c19d5926886fb..f4a37f2032664 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4491,6 +4491,7 @@ void rcutree_migrate_callbacks(int cpu)
 	needwake = needwake || rcu_advance_cbs(my_rnp, my_rdp);
 	rcu_segcblist_disable(&rdp->cblist);
 	WARN_ON_ONCE(rcu_segcblist_empty(&my_rdp->cblist) != !rcu_segcblist_n_cbs(&my_rdp->cblist));
+	check_cb_ovld_locked(my_rdp, my_rnp);
 	if (rcu_rdp_is_offloaded(my_rdp)) {
 		raw_spin_unlock_rcu_node(my_rnp); /* irqs remain disabled. */
 		__call_rcu_nocb_wake(my_rdp, true, flags);
-- 
2.31.1.189.g2e36527f23



* [PATCH rcu 04/12] rcu: Immediately boost preempted readers for strict grace periods
  2022-06-20 22:20 [PATCH rcu 0/12] Miscellaneous fixes for v5.20 Paul E. McKenney
                   ` (2 preceding siblings ...)
  2022-06-20 22:20 ` [PATCH rcu 03/12] rcu: Add rnp->cbovldmask check in rcutree_migrate_callbacks() Paul E. McKenney
@ 2022-06-20 22:20 ` Paul E. McKenney
  2022-06-21  6:00   ` Neeraj Upadhyay
  2022-06-20 22:20 ` [PATCH rcu 05/12] rcu: Forbid RCU_STRICT_GRACE_PERIOD in TINY_RCU kernels Paul E. McKenney
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 39+ messages in thread
From: Paul E. McKenney @ 2022-06-20 22:20 UTC (permalink / raw)
  To: rcu; +Cc: linux-kernel, kernel-team, rostedt, Zqiang, Paul E . McKenney

From: Zqiang <qiang1.zhang@intel.com>

The intent of the CONFIG_RCU_STRICT_GRACE_PERIOD Kconfig option is to
cause normal grace periods to complete quickly in order to better catch
errors resulting from improperly leaking pointers from RCU read-side
critical sections.  However, kernels built with this option enabled still
wait for some hundreds of milliseconds before boosting RCU readers that
have been preempted within their current critical section.  The value
of this delay is set by the CONFIG_RCU_BOOST_DELAY Kconfig option,
which defaults to 500 milliseconds.

This commit therefore causes kernels built with strict grace periods
to ignore CONFIG_RCU_BOOST_DELAY.  This causes rcu_initiate_boost()
to start boosting immediately after all CPUs on a given leaf rcu_node
structure have passed through their quiescent states.
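
For background, IS_ENABLED() folds to a compile-time constant (1 if the
Kconfig option is y or m, 0 otherwise), so the extra condition costs
nothing in other configurations.  A sketch, with start_boosting() as a
hypothetical stand-in for the boost-initiation path:

	/* With CONFIG_RCU_STRICT_GRACE_PERIOD=y this reduces to "|| 1",
	 * so the boost_time comparison is compiled away entirely. */
	if (!time_after(rnp->boost_time, jiffies) || rcu_state.cbovld ||
	    IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD))
		start_boosting();	/* hypothetical helper */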

Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree_plugin.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 440d9e02a26e0..53d05eb804121 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1140,7 +1140,8 @@ static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags)
 	    (rnp->gp_tasks != NULL &&
 	     rnp->boost_tasks == NULL &&
 	     rnp->qsmask == 0 &&
-	     (!time_after(rnp->boost_time, jiffies) || rcu_state.cbovld))) {
+	     (!time_after(rnp->boost_time, jiffies) || rcu_state.cbovld ||
+	      IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD)))) {
 		if (rnp->exp_tasks == NULL)
 			WRITE_ONCE(rnp->boost_tasks, rnp->gp_tasks);
 		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
-- 
2.31.1.189.g2e36527f23



* [PATCH rcu 05/12] rcu: Forbid RCU_STRICT_GRACE_PERIOD in TINY_RCU kernels
  2022-06-20 22:20 [PATCH rcu 0/12] Miscellaneous fixes for v5.20 Paul E. McKenney
                   ` (3 preceding siblings ...)
  2022-06-20 22:20 ` [PATCH rcu 04/12] rcu: Immediately boost preempted readers for strict grace periods Paul E. McKenney
@ 2022-06-20 22:20 ` Paul E. McKenney
  2022-06-21  6:02   ` Neeraj Upadhyay
  2022-06-20 22:20 ` [PATCH rcu 06/12] locking/csd_lock: Change csdlock_debug from early_param to __setup Paul E. McKenney
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 39+ messages in thread
From: Paul E. McKenney @ 2022-06-20 22:20 UTC (permalink / raw)
  To: rcu; +Cc: linux-kernel, kernel-team, rostedt, Paul E. McKenney

The RCU_STRICT_GRACE_PERIOD Kconfig option does nothing in kernels
built with CONFIG_TINY_RCU=y, so this commit adjusts the dependencies
to disallow this combination.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/Kconfig.debug | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
index 9b64e55d4f615..4da05beb13d79 100644
--- a/kernel/rcu/Kconfig.debug
+++ b/kernel/rcu/Kconfig.debug
@@ -121,7 +121,7 @@ config RCU_EQS_DEBUG
 
 config RCU_STRICT_GRACE_PERIOD
 	bool "Provide debug RCU implementation with short grace periods"
-	depends on DEBUG_KERNEL && RCU_EXPERT && NR_CPUS <= 4
+	depends on DEBUG_KERNEL && RCU_EXPERT && NR_CPUS <= 4 && !TINY_RCU
 	default n
 	select PREEMPT_COUNT if PREEMPT=n
 	help
-- 
2.31.1.189.g2e36527f23



* [PATCH rcu 06/12] locking/csd_lock: Change csdlock_debug from early_param to __setup
  2022-06-20 22:20 [PATCH rcu 0/12] Miscellaneous fixes for v5.20 Paul E. McKenney
                   ` (4 preceding siblings ...)
  2022-06-20 22:20 ` [PATCH rcu 05/12] rcu: Forbid RCU_STRICT_GRACE_PERIOD in TINY_RCU kernels Paul E. McKenney
@ 2022-06-20 22:20 ` Paul E. McKenney
  2022-06-20 22:20 ` [PATCH rcu 07/12] rcu: tiny: Record kvfree_call_rcu() call stack for KASAN Paul E. McKenney
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 39+ messages in thread
From: Paul E. McKenney @ 2022-06-20 22:20 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, rostedt, Chen Zhongjin, stable,
	Chen jingwen, Paul E . McKenney

From: Chen Zhongjin <chenzhongjin@huawei.com>

The csdlock_debug kernel-boot parameter is parsed by the
early_param() function csdlock_debug().  If set, csdlock_debug()
invokes static_branch_enable() to enable csd_lock_wait feature, which
triggers a panic on arm64 for kernels built with CONFIG_SPARSEMEM=y and
CONFIG_SPARSEMEM_VMEMMAP=n.

With CONFIG_SPARSEMEM_VMEMMAP=n, __nr_to_section is called in
static_key_enable() and returns NULL, resulting in a NULL dereference
because mem_section is initialized only later in sparse_init().

This is also a problem for powerpc because early_param() functions
are invoked earlier than jump_label_init(), also resulting in
static_key_enable() failures.  These failures cause the warning "static
key 'xxx' used before call to jump_label_init()".

Thus, early_param() runs too early for csdlock_debug() to safely invoke
static_branch_enable(), so this commit changes it to __setup() to fix
these failures.
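
The two registration forms also differ in their return-value conventions,
which is why the patch changes "return 0" to "return 1".  A sketch of the
semantics as assumed here (see include/linux/init.h):

	/* early_param(): parsed by parse_early_param(), which runs before
	 * jump_label_init(); the handler returns 0 on success. */
	early_param("csdlock_debug", csdlock_debug);

	/* __setup(): parsed later via obsolete_checksetup(); the handler
	 * returns 1 to mark the option as consumed. */
	__setup("csdlock_debug=", csdlock_debug);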

Fixes: 8d0968cc6b8f ("locking/csd_lock: Add boot parameter for controlling CSD lock debugging")
Cc: stable@vger.kernel.org
Reported-by: Chen jingwen <chenjingwen6@huawei.com>
Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/smp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index dd215f4394264..650810a6f29b3 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -174,9 +174,9 @@ static int __init csdlock_debug(char *str)
 	if (val)
 		static_branch_enable(&csdlock_debug_enabled);
 
-	return 0;
+	return 1;
 }
-early_param("csdlock_debug", csdlock_debug);
+__setup("csdlock_debug=", csdlock_debug);
 
 static DEFINE_PER_CPU(call_single_data_t *, cur_csd);
 static DEFINE_PER_CPU(smp_call_func_t, cur_csd_func);
-- 
2.31.1.189.g2e36527f23



* [PATCH rcu 07/12] rcu: tiny: Record kvfree_call_rcu() call stack for KASAN
  2022-06-20 22:20 [PATCH rcu 0/12] Miscellaneous fixes for v5.20 Paul E. McKenney
                   ` (5 preceding siblings ...)
  2022-06-20 22:20 ` [PATCH rcu 06/12] locking/csd_lock: Change csdlock_debug from early_param to __setup Paul E. McKenney
@ 2022-06-20 22:20 ` Paul E. McKenney
  2022-06-21  6:31   ` Neeraj Upadhyay
  2022-06-20 22:20 ` [PATCH rcu 08/12] rcu: Cleanup RCU urgency state for offline CPU Paul E. McKenney
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 39+ messages in thread
From: Paul E. McKenney @ 2022-06-20 22:20 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, rostedt, Johannes Berg, Dmitry Vyukov,
	Paul E . McKenney

From: Johannes Berg <johannes.berg@intel.com>

When running KASAN with Tiny RCU (e.g. under ARCH=um, where
a working KASAN patch is now available), we don't get any
information on the original kfree_rcu() (or similar) caller
when a problem is reported, as Tiny RCU doesn't record this.

Add the recording, which required pulling kvfree_call_rcu()
out of line for the KASAN case since the recording function
(kasan_record_aux_stack_noalloc) is neither exported, nor
can we include kasan.h into rcutiny.h.
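
For background on the pointer arithmetic in the new out-of-line function:
kvfree_rcu() encodes the offset of the rcu_head within the enclosing
object as the callback "function", so the allocation start can be
recovered.  A sketch using a hypothetical structure:

	struct foo {			/* hypothetical example object */
		int data;
		struct rcu_head rh;
	};

	/* kvfree_rcu(p, rh) passes offsetof(struct foo, rh) as 'func',
	 * so the object pointer handed to KASAN is recovered as: */
	void *ptr = (void *) head - (unsigned long) func;	/* == p */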

Without KASAN, the patch has no size impact (ARCH=um kernel):
    text       data         bss         dec        hex    filename
 6151515    4423154    33148520    43723189    29b29b5    linux
 6151515    4423154    33148520    43723189    29b29b5    linux + patch

With KASAN, the impact on my build was minimal:
    text       data         bss         dec        hex    filename
13915539    7388050    33282304    54585893    340ea25    linux
13911266    7392114    33282304    54585684    340e954    linux + patch
   -4273      +4064         +-0        -209

Acked-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 include/linux/rcutiny.h | 11 ++++++++++-
 kernel/rcu/tiny.c       | 14 ++++++++++++++
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index 5fed476f977f6..d84e13f2c3848 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -38,7 +38,7 @@ static inline void synchronize_rcu_expedited(void)
  */
 extern void kvfree(const void *addr);
 
-static inline void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
+static inline void __kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 {
 	if (head) {
 		call_rcu(head, func);
@@ -51,6 +51,15 @@ static inline void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 	kvfree((void *) func);
 }
 
+#ifdef CONFIG_KASAN_GENERIC
+void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func);
+#else
+static inline void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
+{
+	__kvfree_call_rcu(head, func);
+}
+#endif
+
 void rcu_qs(void);
 
 static inline void rcu_softirq_qs(void)
diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
index 340b3f8b090d4..58ff3721d975c 100644
--- a/kernel/rcu/tiny.c
+++ b/kernel/rcu/tiny.c
@@ -217,6 +217,20 @@ bool poll_state_synchronize_rcu(unsigned long oldstate)
 }
 EXPORT_SYMBOL_GPL(poll_state_synchronize_rcu);
 
+#ifdef CONFIG_KASAN_GENERIC
+void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
+{
+	if (head) {
+		void *ptr = (void *) head - (unsigned long) func;
+
+		kasan_record_aux_stack_noalloc(ptr);
+	}
+
+	__kvfree_call_rcu(head, func);
+}
+EXPORT_SYMBOL_GPL(kvfree_call_rcu);
+#endif
+
 void __init rcu_init(void)
 {
 	open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
-- 
2.31.1.189.g2e36527f23



* [PATCH rcu 08/12] rcu: Cleanup RCU urgency state for offline CPU
  2022-06-20 22:20 [PATCH rcu 0/12] Miscellaneous fixes for v5.20 Paul E. McKenney
                   ` (6 preceding siblings ...)
  2022-06-20 22:20 ` [PATCH rcu 07/12] rcu: tiny: Record kvfree_call_rcu() call stack for KASAN Paul E. McKenney
@ 2022-06-20 22:20 ` Paul E. McKenney
  2022-06-21  7:03   ` Neeraj Upadhyay
  2022-06-20 22:20 ` [PATCH rcu 09/12] rcu/kvfree: Remove useless monitor_todo flag Paul E. McKenney
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 39+ messages in thread
From: Paul E. McKenney @ 2022-06-20 22:20 UTC (permalink / raw)
  To: rcu; +Cc: linux-kernel, kernel-team, rostedt, Zqiang, Paul E . McKenney

From: Zqiang <qiang1.zhang@intel.com>

When a CPU is slow to provide a quiescent state for a given grace
period, RCU takes steps to encourage that CPU to get with the
quiescent-state program in a more timely fashion.  These steps
include these flags in the rcu_data structure:

1.	->rcu_urgent_qs, which causes the scheduling-clock interrupt to
	request an otherwise pointless context switch from the scheduler.

2.	->rcu_need_heavy_qs, which causes both cond_resched() and RCU's
	context-switch hook to do an immediate momentary quiescent state.

3.	->rcu_forced_tick, which causes the scheduler-clock tick to
	be enabled even on nohz_full CPUs with only one runnable task.

These flags are of course cleared once the corresponding CPU has passed
through a quiescent state.  Unless that quiescent state is the CPU
going offline, which means that when the CPU comes back online, it will
needlessly consume additional CPU time and incur additional latency,
which constitutes a minor but very real performance bug.

This commit therefore adds the call to rcu_disable_urgency_upon_qs()
that clears these flags to the CPU-hotplug offlining code path.
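
For reference, that helper clears roughly the following per-CPU state
(a sketch based on the existing tree.c definition; details may differ):

	static void rcu_disable_urgency_upon_qs(struct rcu_data *rdp)
	{
		WRITE_ONCE(rdp->rcu_urgent_qs, false);
		WRITE_ONCE(rdp->rcu_need_heavy_qs, false);
		if (tick_nohz_full_cpu(rdp->cpu) && rdp->rcu_forced_tick) {
			tick_dep_clear_cpu(rdp->cpu, TICK_DEP_BIT_RCU);
			WRITE_ONCE(rdp->rcu_forced_tick, false);
		}
	}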

Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index f4a37f2032664..5445b19b48408 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4446,6 +4446,7 @@ void rcu_report_dead(unsigned int cpu)
 	rdp->rcu_ofl_gp_flags = READ_ONCE(rcu_state.gp_flags);
 	if (rnp->qsmask & mask) { /* RCU waiting on outgoing CPU? */
 		/* Report quiescent state -before- changing ->qsmaskinitnext! */
+		rcu_disable_urgency_upon_qs(rdp);
 		rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	}
-- 
2.31.1.189.g2e36527f23



* [PATCH rcu 09/12] rcu/kvfree: Remove useless monitor_todo flag
  2022-06-20 22:20 [PATCH rcu 0/12] Miscellaneous fixes for v5.20 Paul E. McKenney
                   ` (7 preceding siblings ...)
  2022-06-20 22:20 ` [PATCH rcu 08/12] rcu: Cleanup RCU urgency state for offline CPU Paul E. McKenney
@ 2022-06-20 22:20 ` Paul E. McKenney
  2022-06-21 10:02   ` Neeraj Upadhyay
  2022-06-20 22:20 ` [PATCH rcu 10/12] rcu: Initialize first_gp_fqs at declaration in rcu_gp_fqs() Paul E. McKenney
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 39+ messages in thread
From: Paul E. McKenney @ 2022-06-20 22:20 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, rostedt, Joel Fernandes (Google),
	Uladzislau Rezki, Paul E . McKenney

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

The monitor_todo flag is not needed, as the work struct already tracks
whether work is pending.  Just use schedule_delayed_work(), which does
nothing if the work is already pending.
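
This works because schedule_delayed_work() refuses to queue work that is
already pending.  A simplified sketch of the semantics assumed here (from
include/linux/workqueue.h):

	/* Returns false, without re-queueing, if @dwork already has
	 * WORK_STRUCT_PENDING_BIT set; otherwise queues it and returns true. */
	static inline bool schedule_delayed_work(struct delayed_work *dwork,
						 unsigned long delay)
	{
		return queue_delayed_work(system_wq, dwork, delay);
	}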

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree.c | 33 ++++++++++++++++-----------------
 1 file changed, 16 insertions(+), 17 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 5445b19b48408..7919d7b48fa6a 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3216,7 +3216,6 @@ struct kfree_rcu_cpu_work {
  * @krw_arr: Array of batches of kfree_rcu() objects waiting for a grace period
  * @lock: Synchronize access to this structure
  * @monitor_work: Promote @head to @head_free after KFREE_DRAIN_JIFFIES
- * @monitor_todo: Tracks whether a @monitor_work delayed work is pending
  * @initialized: The @rcu_work fields have been initialized
  * @count: Number of objects for which GP not started
  * @bkvcache:
@@ -3241,7 +3240,6 @@ struct kfree_rcu_cpu {
 	struct kfree_rcu_cpu_work krw_arr[KFREE_N_BATCHES];
 	raw_spinlock_t lock;
 	struct delayed_work monitor_work;
-	bool monitor_todo;
 	bool initialized;
 	int count;
 
@@ -3421,6 +3419,18 @@ static void kfree_rcu_work(struct work_struct *work)
 	}
 }
 
+static bool
+need_offload_krc(struct kfree_rcu_cpu *krcp)
+{
+	int i;
+
+	for (i = 0; i < FREE_N_CHANNELS; i++)
+		if (krcp->bkvhead[i])
+			return true;
+
+	return !!krcp->head;
+}
+
 /*
  * This function is invoked after the KFREE_DRAIN_JIFFIES timeout.
  */
@@ -3477,9 +3487,7 @@ static void kfree_rcu_monitor(struct work_struct *work)
 	// of the channels that is still busy we should rearm the
 	// work to repeat an attempt. Because previous batches are
 	// still in progress.
-	if (!krcp->bkvhead[0] && !krcp->bkvhead[1] && !krcp->head)
-		krcp->monitor_todo = false;
-	else
+	if (need_offload_krc(krcp))
 		schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES);
 
 	raw_spin_unlock_irqrestore(&krcp->lock, flags);
@@ -3667,11 +3675,8 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 	WRITE_ONCE(krcp->count, krcp->count + 1);
 
 	// Set timer to drain after KFREE_DRAIN_JIFFIES.
-	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING &&
-	    !krcp->monitor_todo) {
-		krcp->monitor_todo = true;
+	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING)
 		schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES);
-	}
 
 unlock_return:
 	krc_this_cpu_unlock(krcp, flags);
@@ -3746,14 +3751,8 @@ void __init kfree_rcu_scheduler_running(void)
 		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
 
 		raw_spin_lock_irqsave(&krcp->lock, flags);
-		if ((!krcp->bkvhead[0] && !krcp->bkvhead[1] && !krcp->head) ||
-				krcp->monitor_todo) {
-			raw_spin_unlock_irqrestore(&krcp->lock, flags);
-			continue;
-		}
-		krcp->monitor_todo = true;
-		schedule_delayed_work_on(cpu, &krcp->monitor_work,
-					 KFREE_DRAIN_JIFFIES);
+		if (need_offload_krc(krcp))
+			schedule_delayed_work_on(cpu, &krcp->monitor_work, KFREE_DRAIN_JIFFIES);
 		raw_spin_unlock_irqrestore(&krcp->lock, flags);
 	}
 }
-- 
2.31.1.189.g2e36527f23



* [PATCH rcu 10/12] rcu: Initialize first_gp_fqs at declaration in rcu_gp_fqs()
  2022-06-20 22:20 [PATCH rcu 0/12] Miscellaneous fixes for v5.20 Paul E. McKenney
                   ` (8 preceding siblings ...)
  2022-06-20 22:20 ` [PATCH rcu 09/12] rcu/kvfree: Remove useless monitor_todo flag Paul E. McKenney
@ 2022-06-20 22:20 ` Paul E. McKenney
  2022-06-20 22:20 ` [PATCH rcu 11/12] rcu/tree: Add comment to describe GP-done condition in fqs loop Paul E. McKenney
  2022-06-20 22:20 ` [PATCH rcu 12/12] srcu: Block less aggressively for expedited grace periods Paul E. McKenney
  11 siblings, 0 replies; 39+ messages in thread
From: Paul E. McKenney @ 2022-06-20 22:20 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, rostedt, Paul E. McKenney,
	Frederic Weisbecker, Neeraj Upadhyay

This commit saves a line of code by initializing the rcu_gp_fqs()
function's first_gp_fqs local variable in its declaration.

Reported-by: Frederic Weisbecker <frederic@kernel.org>
Reported-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 7919d7b48fa6a..7d8e9cb852786 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1971,13 +1971,12 @@ static void rcu_gp_fqs(bool first_time)
  */
 static noinline_for_stack void rcu_gp_fqs_loop(void)
 {
-	bool first_gp_fqs;
+	bool first_gp_fqs = true;
 	int gf = 0;
 	unsigned long j;
 	int ret;
 	struct rcu_node *rnp = rcu_get_root();
 
-	first_gp_fqs = true;
 	j = READ_ONCE(jiffies_till_first_fqs);
 	if (rcu_state.cbovld)
 		gf = RCU_GP_FLAG_OVLD;
-- 
2.31.1.189.g2e36527f23



* [PATCH rcu 11/12] rcu/tree: Add comment to describe GP-done condition in fqs loop
  2022-06-20 22:20 [PATCH rcu 0/12] Miscellaneous fixes for v5.20 Paul E. McKenney
                   ` (9 preceding siblings ...)
  2022-06-20 22:20 ` [PATCH rcu 10/12] rcu: Initialize first_gp_fqs at declaration in rcu_gp_fqs() Paul E. McKenney
@ 2022-06-20 22:20 ` Paul E. McKenney
  2022-06-20 22:20 ` [PATCH rcu 12/12] srcu: Block less aggressively for expedited grace periods Paul E. McKenney
  11 siblings, 0 replies; 39+ messages in thread
From: Paul E. McKenney @ 2022-06-20 22:20 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, rostedt, Neeraj Upadhyay,
	Joel Fernandes, Paul E . McKenney

From: Neeraj Upadhyay <quic_neeraju@quicinc.com>

Add a comment to explain why !rcu_preempt_blocked_readers_cgp() condition
is required on root rnp node, for GP completion check in rcu_gp_fqs_loop().

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 7d8e9cb852786..cdffb3b60ee02 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2005,7 +2005,15 @@ static noinline_for_stack void rcu_gp_fqs_loop(void)
 		rcu_gp_torture_wait();
 		WRITE_ONCE(rcu_state.gp_state, RCU_GP_DOING_FQS);
 		/* Locking provides needed memory barriers. */
-		/* If grace period done, leave loop. */
+		/*
+		 * Exit the loop if the root rcu_node structure indicates that the grace period
+		 * has ended.  The rcu_preempt_blocked_readers_cgp(rnp) check
+		 * is required only for single-node rcu_node trees because readers blocking
+		 * the current grace period are queued only on leaf rcu_node structures.
+		 * For multi-node trees, checking the root node's ->qsmask suffices, because a
+		 * given root node's ->qsmask bit is cleared only when all CPUs and tasks from
+		 * the corresponding leaf nodes have passed through their quiescent state.
+		 */
 		if (!READ_ONCE(rnp->qsmask) &&
 		    !rcu_preempt_blocked_readers_cgp(rnp))
 			break;
-- 
2.31.1.189.g2e36527f23



* [PATCH rcu 12/12] srcu: Block less aggressively for expedited grace periods
  2022-06-20 22:20 [PATCH rcu 0/12] Miscellaneous fixes for v5.20 Paul E. McKenney
                   ` (10 preceding siblings ...)
  2022-06-20 22:20 ` [PATCH rcu 11/12] rcu/tree: Add comment to describe GP-done condition in fqs loop Paul E. McKenney
@ 2022-06-20 22:20 ` Paul E. McKenney
  2022-06-21  2:00   ` Zhangfei Gao
                     ` (2 more replies)
  11 siblings, 3 replies; 39+ messages in thread
From: Paul E. McKenney @ 2022-06-20 22:20 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, rostedt, Paul E. McKenney,
	Zhangfei Gao, Shameerali Kolothum Thodi, Paolo Bonzini

Commit 282d8998e997 ("srcu: Prevent expedited GPs and blocking readers
from consuming CPU") fixed a problem where a long-running expedited SRCU
grace period could block kernel live patching.  It did so by giving up
on expediting once a given SRCU expedited grace period grew too old.

Unfortunately, this added excessive delays to boots of embedded systems
running on qemu that use the ARM IORT RMR feature.  This commit therefore
makes the transition away from expediting less aggressive, increasing
the per-grace-period phase number of non-sleeping polls of readers from
one to three and increasing the required grace-period age from one jiffy
(actually from zero to one jiffies) to two jiffies (actually from one
to two jiffies).
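
Concretely, a sketch of how the new age computation plays out (values are
illustrative only):

	/* Suppose the grace period started at gpstart == 100:
	 *   jiffies == 101: j = 100, time_after(100, 100) fails -> jbase += 0
	 *   jiffies == 102: j = 101, time_after(101, 100) holds -> jbase += 1
	 * So no-delay polling persists until the GP is two jiffies old. */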

Fixes: 282d8998e997 ("srcu: Prevent expedited GPs and blocking readers from consuming CPU")
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reported-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Reported-by: "chenxiang (M)" <chenxiang66@hisilicon.com>
Cc: Shameerali Kolothum Thodi  <shameerali.kolothum.thodi@huawei.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/all/20615615-0013-5adc-584f-2b1d5c03ebfc@linaro.org/
---
 kernel/rcu/srcutree.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 50ba70f019dea..0db7873f4e95b 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -513,7 +513,7 @@ static bool srcu_readers_active(struct srcu_struct *ssp)
 
 #define SRCU_INTERVAL		1	// Base delay if no expedited GPs pending.
 #define SRCU_MAX_INTERVAL	10	// Maximum incremental delay from slow readers.
-#define SRCU_MAX_NODELAY_PHASE	1	// Maximum per-GP-phase consecutive no-delay instances.
+#define SRCU_MAX_NODELAY_PHASE	3	// Maximum per-GP-phase consecutive no-delay instances.
 #define SRCU_MAX_NODELAY	100	// Maximum consecutive no-delay instances.
 
 /*
@@ -522,16 +522,22 @@ static bool srcu_readers_active(struct srcu_struct *ssp)
  */
 static unsigned long srcu_get_delay(struct srcu_struct *ssp)
 {
+	unsigned long gpstart;
+	unsigned long j;
 	unsigned long jbase = SRCU_INTERVAL;
 
 	if (ULONG_CMP_LT(READ_ONCE(ssp->srcu_gp_seq), READ_ONCE(ssp->srcu_gp_seq_needed_exp)))
 		jbase = 0;
-	if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq)))
-		jbase += jiffies - READ_ONCE(ssp->srcu_gp_start);
-	if (!jbase) {
-		WRITE_ONCE(ssp->srcu_n_exp_nodelay, READ_ONCE(ssp->srcu_n_exp_nodelay) + 1);
-		if (READ_ONCE(ssp->srcu_n_exp_nodelay) > SRCU_MAX_NODELAY_PHASE)
-			jbase = 1;
+	if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq))) {
+		j = jiffies - 1;
+		gpstart = READ_ONCE(ssp->srcu_gp_start);
+		if (time_after(j, gpstart))
+			jbase += j - gpstart;
+		if (!jbase) {
+			WRITE_ONCE(ssp->srcu_n_exp_nodelay, READ_ONCE(ssp->srcu_n_exp_nodelay) + 1);
+			if (READ_ONCE(ssp->srcu_n_exp_nodelay) > SRCU_MAX_NODELAY_PHASE)
+				jbase = 1;
+		}
 	}
 	return jbase > SRCU_MAX_INTERVAL ? SRCU_MAX_INTERVAL : jbase;
 }
-- 
2.31.1.189.g2e36527f23



* Re: [PATCH rcu 12/12] srcu: Block less aggressively for expedited grace periods
  2022-06-20 22:20 ` [PATCH rcu 12/12] srcu: Block less aggressively for expedited grace periods Paul E. McKenney
@ 2022-06-21  2:00   ` Zhangfei Gao
  2022-06-21  3:15     ` Paul E. McKenney
  2022-06-21  7:43   ` Shameerali Kolothum Thodi
  2022-06-21 10:13   ` Neeraj Upadhyay
  2 siblings, 1 reply; 39+ messages in thread
From: Zhangfei Gao @ 2022-06-21  2:00 UTC (permalink / raw)
  To: Paul E. McKenney, rcu
  Cc: linux-kernel, kernel-team, rostedt, Zhangfei Gao,
	Shameerali Kolothum Thodi, Paolo Bonzini



On 2022/6/21 6:20 AM, Paul E. McKenney wrote:
> Commit 282d8998e997 ("srcu: Prevent expedited GPs and blocking readers
> from consuming CPU") fixed a problem where a long-running expedited SRCU
> grace period could block kernel live patching.  It did so by giving up
> on expediting once a given SRCU expedited grace period grew too old.
>
> Unfortunately, this added excessive delays to boots of embedded systems
> running on qemu that use the ARM IORT RMR feature.  This commit therefore
> makes the transition away from expediting less aggressive, increasing
> the per-grace-period phase number of non-sleeping polls of readers from
> one to three and increasing the required grace-period age from one jiffy
> (actually from zero to one jiffies) to two jiffies (actually from one
> to two jiffies).
>
> Fixes: 282d8998e997 ("srcu: Prevent expedited GPs and blocking readers from consuming CPU")
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> Reported-by: Zhangfei Gao <zhangfei.gao@linaro.org>
> Reported-by: "chenxiang (M)" <chenxiang66@hisilicon.com>
> Cc: Shameerali Kolothum Thodi  <shameerali.kolothum.thodi@huawei.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Link: https://lore.kernel.org/all/20615615-0013-5adc-584f-2b1d5c03ebfc@linaro.org/

Tested on 5.19-rc1 with this patch, booting qemu with -bios
QEMU_EFI-2022.fd; it seems not to work, same as rc1.

real    2m42.948s
user    0m2.843s
sys     0m1.170s

qemu: stable-6.1

build/aarch64-softmmu/qemu-system-aarch64 -machine 
virt,gic-version=3,iommu=smmuv3 \
-enable-kvm -cpu host -m 1024 \
-kernel /home/linaro/Image -initrd /home/linaro/tmp/ramdisk-new.img 
-nographic -append \
"rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000 kpti=off acpi=force" \
-bios QEMU_EFI-2022.fd

Thanks
> ---
>   kernel/rcu/srcutree.c | 20 +++++++++++++-------
>   1 file changed, 13 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> index 50ba70f019dea..0db7873f4e95b 100644
> --- a/kernel/rcu/srcutree.c
> +++ b/kernel/rcu/srcutree.c
> @@ -513,7 +513,7 @@ static bool srcu_readers_active(struct srcu_struct *ssp)
>   
>   #define SRCU_INTERVAL		1	// Base delay if no expedited GPs pending.
>   #define SRCU_MAX_INTERVAL	10	// Maximum incremental delay from slow readers.
> -#define SRCU_MAX_NODELAY_PHASE	1	// Maximum per-GP-phase consecutive no-delay instances.
> +#define SRCU_MAX_NODELAY_PHASE	3	// Maximum per-GP-phase consecutive no-delay instances.
>   #define SRCU_MAX_NODELAY	100	// Maximum consecutive no-delay instances.
>   
>   /*
> @@ -522,16 +522,22 @@ static bool srcu_readers_active(struct srcu_struct *ssp)
>    */
>   static unsigned long srcu_get_delay(struct srcu_struct *ssp)
>   {
> +	unsigned long gpstart;
> +	unsigned long j;
>   	unsigned long jbase = SRCU_INTERVAL;
>   
>   	if (ULONG_CMP_LT(READ_ONCE(ssp->srcu_gp_seq), READ_ONCE(ssp->srcu_gp_seq_needed_exp)))
>   		jbase = 0;
> -	if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq)))
> -		jbase += jiffies - READ_ONCE(ssp->srcu_gp_start);
> -	if (!jbase) {
> -		WRITE_ONCE(ssp->srcu_n_exp_nodelay, READ_ONCE(ssp->srcu_n_exp_nodelay) + 1);
> -		if (READ_ONCE(ssp->srcu_n_exp_nodelay) > SRCU_MAX_NODELAY_PHASE)
> -			jbase = 1;
> +	if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq))) {
> +		j = jiffies - 1;
> +		gpstart = READ_ONCE(ssp->srcu_gp_start);
> +		if (time_after(j, gpstart))
> +			jbase += j - gpstart;
> +		if (!jbase) {
> +			WRITE_ONCE(ssp->srcu_n_exp_nodelay, READ_ONCE(ssp->srcu_n_exp_nodelay) + 1);
> +			if (READ_ONCE(ssp->srcu_n_exp_nodelay) > SRCU_MAX_NODELAY_PHASE)
> +				jbase = 1;
> +		}
>   	}
>   	return jbase > SRCU_MAX_INTERVAL ? SRCU_MAX_INTERVAL : jbase;
>   }



* Re: [PATCH rcu 12/12] srcu: Block less aggressively for expedited grace periods
  2022-06-21  2:00   ` Zhangfei Gao
@ 2022-06-21  3:15     ` Paul E. McKenney
  0 siblings, 0 replies; 39+ messages in thread
From: Paul E. McKenney @ 2022-06-21  3:15 UTC (permalink / raw)
  To: Zhangfei Gao
  Cc: rcu, linux-kernel, kernel-team, rostedt,
	Shameerali Kolothum Thodi, Paolo Bonzini

On Tue, Jun 21, 2022 at 10:00:07AM +0800, Zhangfei Gao wrote:
> 
> 
> On 2022/6/21 6:20 AM, Paul E. McKenney wrote:
> > Commit 282d8998e997 ("srcu: Prevent expedited GPs and blocking readers
> > from consuming CPU") fixed a problem where a long-running expedited SRCU
> > grace period could block kernel live patching.  It did so by giving up
> > on expediting once a given SRCU expedited grace period grew too old.
> > 
> > Unfortunately, this added excessive delays to boots of embedded systems
> > running on qemu that use the ARM IORT RMR feature.  This commit therefore
> > makes the transition away from expediting less aggressive, increasing
> > the per-grace-period phase number of non-sleeping polls of readers from
> > one to three and increasing the required grace-period age from one jiffy
> > (actually from zero to one jiffies) to two jiffies (actually from one
> > to two jiffies).
> > 
> > Fixes: 282d8998e997 ("srcu: Prevent expedited GPs and blocking readers from consuming CPU")
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > Reported-by: Zhangfei Gao <zhangfei.gao@linaro.org>
> > Reported-by: "chenxiang (M)" <chenxiang66@hisilicon.com>
> > Cc: Shameerali Kolothum Thodi  <shameerali.kolothum.thodi@huawei.com>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Link: https://lore.kernel.org/all/20615615-0013-5adc-584f-2b1d5c03ebfc@linaro.org/
> 
> Tested on 5.19-rc1 with this patch, booting qemu with -bios QEMU_EFI-2022.fd;
> it seems not to work, same as rc1.
> 
> real    2m42.948s
> user    0m2.843s
> sys     0m1.170s
> 
> qemu: stable-6.1
> 
> build/aarch64-softmmu/qemu-system-aarch64 -machine
> virt,gic-version=3,iommu=smmuv3 \
> -enable-kvm -cpu host -m 1024 \
> -kernel /home/linaro/Image -initrd /home/linaro/tmp/ramdisk-new.img
> -nographic -append \
> "rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000 kpti=off acpi=force" \
> -bios QEMU_EFI-2022.fd

Understood.  This patch fixes some cases, but not your case.  Which is
why you guys are experimenting with additional changes.  In the meantime,
this patch helps at least some people.  I look forward to you guys
coming up with an appropriate solution that I can pull in on top of this one.

Or, if the solution shows up quickly enough, I can replace this patch
with your guys' solution.

							Thanx, Paul

> Thanks
> > ---
> >   kernel/rcu/srcutree.c | 20 +++++++++++++-------
> >   1 file changed, 13 insertions(+), 7 deletions(-)
> > 
> > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> > index 50ba70f019dea..0db7873f4e95b 100644
> > --- a/kernel/rcu/srcutree.c
> > +++ b/kernel/rcu/srcutree.c
> > @@ -513,7 +513,7 @@ static bool srcu_readers_active(struct srcu_struct *ssp)
> >   #define SRCU_INTERVAL		1	// Base delay if no expedited GPs pending.
> >   #define SRCU_MAX_INTERVAL	10	// Maximum incremental delay from slow readers.
> > -#define SRCU_MAX_NODELAY_PHASE	1	// Maximum per-GP-phase consecutive no-delay instances.
> > +#define SRCU_MAX_NODELAY_PHASE	3	// Maximum per-GP-phase consecutive no-delay instances.
> >   #define SRCU_MAX_NODELAY	100	// Maximum consecutive no-delay instances.
> >   /*
> > @@ -522,16 +522,22 @@ static bool srcu_readers_active(struct srcu_struct *ssp)
> >    */
> >   static unsigned long srcu_get_delay(struct srcu_struct *ssp)
> >   {
> > +	unsigned long gpstart;
> > +	unsigned long j;
> >   	unsigned long jbase = SRCU_INTERVAL;
> >   	if (ULONG_CMP_LT(READ_ONCE(ssp->srcu_gp_seq), READ_ONCE(ssp->srcu_gp_seq_needed_exp)))
> >   		jbase = 0;
> > -	if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq)))
> > -		jbase += jiffies - READ_ONCE(ssp->srcu_gp_start);
> > -	if (!jbase) {
> > -		WRITE_ONCE(ssp->srcu_n_exp_nodelay, READ_ONCE(ssp->srcu_n_exp_nodelay) + 1);
> > -		if (READ_ONCE(ssp->srcu_n_exp_nodelay) > SRCU_MAX_NODELAY_PHASE)
> > -			jbase = 1;
> > +	if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq))) {
> > +		j = jiffies - 1;
> > +		gpstart = READ_ONCE(ssp->srcu_gp_start);
> > +		if (time_after(j, gpstart))
> > +			jbase += j - gpstart;
> > +		if (!jbase) {
> > +			WRITE_ONCE(ssp->srcu_n_exp_nodelay, READ_ONCE(ssp->srcu_n_exp_nodelay) + 1);
> > +			if (READ_ONCE(ssp->srcu_n_exp_nodelay) > SRCU_MAX_NODELAY_PHASE)
> > +				jbase = 1;
> > +		}
> >   	}
> >   	return jbase > SRCU_MAX_INTERVAL ? SRCU_MAX_INTERVAL : jbase;
> >   }
> 


* Re: [PATCH rcu 01/12] rcu: Decrease FQS scan wait time in case of callback overloading
  2022-06-20 22:20 ` [PATCH rcu 01/12] rcu: Decrease FQS scan wait time in case of callback overloading Paul E. McKenney
@ 2022-06-21  5:29   ` Neeraj Upadhyay
  2022-06-21 22:19     ` Paul E. McKenney
  0 siblings, 1 reply; 39+ messages in thread
From: Neeraj Upadhyay @ 2022-06-21  5:29 UTC (permalink / raw)
  To: Paul E. McKenney, rcu; +Cc: linux-kernel, kernel-team, rostedt



On 6/21/2022 3:50 AM, Paul E. McKenney wrote:
> The force-quiesce-state loop function rcu_gp_fqs_loop() checks for
> callback overloading and does an immediate initial scan for idle CPUs
> if so.  However, subsequent rescans will be carried out at as leisurely a
> rate as they always are, as specified by the rcutree.jiffies_till_next_fqs
> module parameter.  It might be tempting to just continue immediately
> rescanning, but this turns the RCU grace-period kthread into a CPU hog.
> It might also be tempting to reduce the time between rescans to a single
> jiffy, but this can be problematic on larger systems.
> 
> This commit therefore divides the normal time between rescans by three,
> rounding up.  Thus a small system running at HZ=1000 that is suffering
> from callback overload will wait only one jiffy instead of the normal
> three between rescans.
> 
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> ---
>   kernel/rcu/tree.c | 5 +++++
>   1 file changed, 5 insertions(+)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index c25ba442044a6..c19d5926886fb 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1993,6 +1993,11 @@ static noinline_for_stack void rcu_gp_fqs_loop(void)
>   			WRITE_ONCE(rcu_state.jiffies_kick_kthreads,
>   				   jiffies + (j ? 3 * j : 2));
>   		}
> +		if (rcu_state.cbovld) {
> +			j = (j + 2) / 3;
> +			if (j <= 0)
> +				j = 1;
> +		}

We update 'j' here, after setting rcu_state.jiffies_force_qs

     WRITE_ONCE(rcu_state.jiffies_force_qs, jiffies + j)

So, we return from swait_event_idle_timeout_exclusive() after 1/3 of
the time duration.

     swait_event_idle_timeout_exclusive(rcu_state.gp_wq,
				 rcu_gp_fqs_check_wake(&gf), j);

This can result in the !time_after() check returning false, and we will
enter the 'else' (stray signal) block?

This might not matter for the first 2 fqs loop iterations, where
RCU_GP_FLAG_OVLD is set in 'gf', but subsequent iterations won't benefit
from this patch?


if (!time_after(rcu_state.jiffies_force_qs, jiffies) ||
	(gf & (RCU_GP_FLAG_FQS | RCU_GP_FLAG_OVLD))) {
			...
} else {
	/* Deal with stray signal. */
}


So, do we need to move this calculation above the 'if' block which sets 
rcu_state.jiffies_force_qs?
		if (!ret) {

			WRITE_ONCE(rcu_state.jiffies_force_qs, jiffies +
						j);...
		}

Thanks
Neeraj

>   		trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
>   				       TPS("fqswait"));
>   		WRITE_ONCE(rcu_state.gp_state, RCU_GP_WAIT_FQS);


* Re: [PATCH rcu 02/12] rcu: Avoid tracing a few functions executed in stop machine
  2022-06-20 22:20 ` [PATCH rcu 02/12] rcu: Avoid tracing a few functions executed in stop machine Paul E. McKenney
@ 2022-06-21  5:47   ` Neeraj Upadhyay
  2022-06-21 22:21     ` Paul E. McKenney
  0 siblings, 1 reply; 39+ messages in thread
From: Neeraj Upadhyay @ 2022-06-21  5:47 UTC (permalink / raw)
  To: Paul E. McKenney, rcu; +Cc: linux-kernel, kernel-team, rostedt, Patrick Wang



On 6/21/2022 3:50 AM, Paul E. McKenney wrote:
> From: Patrick Wang <patrick.wang.shcn@gmail.com>
> 
> Stop-machine recently started calling additional functions while waiting:
> 
> ----------------------------------------------------------------
> Former stop machine wait loop:
> do {
>      cpu_relax(); => macro
>      ...
> } while (curstate != STOPMACHINE_EXIT);
> -----------------------------------------------------------------
> Current stop machine wait loop:
> do {
>      stop_machine_yield(cpumask); => function (notraced)
>      ...
>      touch_nmi_watchdog(); => function (notraced, inside calls also notraced)
>      ...
>      rcu_momentary_dyntick_idle(); => function (notraced, inside calls traced)
> } while (curstate != MULTI_STOP_EXIT);
> ------------------------------------------------------------------
> 
> These functions (and the functions that they call) must be marked
> notrace to prevent them from being updated while they are executing.
> The consequences of failing to mark these functions can be severe:
> 
>    rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
>    rcu: 	1-...!: (0 ticks this GP) idle=14f/1/0x4000000000000000 softirq=3397/3397 fqs=0
>    rcu: 	3-...!: (0 ticks this GP) idle=ee9/1/0x4000000000000000 softirq=5168/5168 fqs=0
>    	(detected by 0, t=8137 jiffies, g=5889, q=2 ncpus=4)
>    Task dump for CPU 1:
>    task:migration/1     state:R  running task     stack:    0 pid:   19 ppid:     2 flags:0x00000000
>    Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
>    Call Trace:
>    Task dump for CPU 3:
>    task:migration/3     state:R  running task     stack:    0 pid:   29 ppid:     2 flags:0x00000000
>    Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
>    Call Trace:
>    rcu: rcu_preempt kthread timer wakeup didn't happen for 8136 jiffies! g5889 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
>    rcu: 	Possible timer handling issue on cpu=2 timer-softirq=594
>    rcu: rcu_preempt kthread starved for 8137 jiffies! g5889 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2
>    rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
>    rcu: RCU grace-period kthread stack dump:
>    task:rcu_preempt     state:I stack:    0 pid:   14 ppid:     2 flags:0x00000000
>    Call Trace:
>      schedule+0x56/0xc2
>      schedule_timeout+0x82/0x184
>      rcu_gp_fqs_loop+0x19a/0x318
>      rcu_gp_kthread+0x11a/0x140
>      kthread+0xee/0x118
>      ret_from_exception+0x0/0x14
>    rcu: Stack dump where RCU GP kthread last ran:
>    Task dump for CPU 2:
>    task:migration/2     state:R  running task     stack:    0 pid:   24 ppid:     2 flags:0x00000000
>    Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
>    Call Trace:
> 
> This commit therefore marks these functions notrace:
>   rcu_preempt_deferred_qs()
>   rcu_preempt_need_deferred_qs()
>   rcu_preempt_deferred_qs_irqrestore()
> 

Only the preemptible RCU definitions are updated; so, this change is not
required for the non-preemptible RCU case?


Thanks
Neeraj

> Signed-off-by: Patrick Wang <patrick.wang.shcn@gmail.com>
> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> ---
>   kernel/rcu/tree_plugin.h | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index c8ba0fe17267c..440d9e02a26e0 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -460,7 +460,7 @@ static bool rcu_preempt_has_tasks(struct rcu_node *rnp)
>    * be quite short, for example, in the case of the call from
>    * rcu_read_unlock_special().
>    */
> -static void
> +static notrace void
>   rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
>   {
>   	bool empty_exp;
> @@ -581,7 +581,7 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
>    * is disabled.  This function cannot be expected to understand these
>    * nuances, so the caller must handle them.
>    */
> -static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
> +static notrace bool rcu_preempt_need_deferred_qs(struct task_struct *t)
>   {
>   	return (__this_cpu_read(rcu_data.cpu_no_qs.b.exp) ||
>   		READ_ONCE(t->rcu_read_unlock_special.s)) &&
> @@ -595,7 +595,7 @@ static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
>    * evaluate safety in terms of interrupt, softirq, and preemption
>    * disabling.
>    */
> -static void rcu_preempt_deferred_qs(struct task_struct *t)
> +static notrace void rcu_preempt_deferred_qs(struct task_struct *t)
>   {
>   	unsigned long flags;
>   


* Re: [PATCH rcu 03/12] rcu: Add rnp->cbovldmask check in rcutree_migrate_callbacks()
  2022-06-20 22:20 ` [PATCH rcu 03/12] rcu: Add rnp->cbovldmask check in rcutree_migrate_callbacks() Paul E. McKenney
@ 2022-06-21  5:57   ` Neeraj Upadhyay
  2022-06-21 22:22     ` Paul E. McKenney
  0 siblings, 1 reply; 39+ messages in thread
From: Neeraj Upadhyay @ 2022-06-21  5:57 UTC (permalink / raw)
  To: Paul E. McKenney, rcu; +Cc: linux-kernel, kernel-team, rostedt, Zqiang



On 6/21/2022 3:50 AM, Paul E. McKenney wrote:
> From: Zqiang <qiang1.zhang@intel.com>
> 
> Currently, the rcu_node structure's ->cbovlmask field is set in call_rcu()
> when a given CPU is suffering from callback overload.  But if that CPU
> goes offline, the outgoing CPU's callbacks are migrated to the running
> CPU, which is likely to overload the running CPU.  However, that CPU's
> bit in its leaf rcu_node structure's ->cbovlmask field remains zero.
> 
> Initially, this is OK because the outgoing CPU's bit remains set.
> However, that bit will be cleared at the next end of a grace period,
> at which time it is quite possible that the running CPU will still
> be overloaded.  If the running CPU invokes call_rcu(), then overload
> will be checked for and the bit will be set.  Except that there is no
> guarantee that the running CPU will invoke call_rcu(), in which case the
> next grace period will fail to take the running CPU's overload condition
> into account.  Plus, because the bit is not set, the end of the grace
> period won't check for overload on this CPU.
> 
> This commit therefore adds a call to check_cb_ovld_locked() in
> check_cb_ovld_locked() to set the running CPU's ->cbovlmask bit

Nit: s/check_cb_ovld_locked/rcutree_migrate_callbacks/

> appropriately.
> 
> Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> ---

Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>


Thanks
Neeraj

>   kernel/rcu/tree.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index c19d5926886fb..f4a37f2032664 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -4491,6 +4491,7 @@ void rcutree_migrate_callbacks(int cpu)
>   	needwake = needwake || rcu_advance_cbs(my_rnp, my_rdp);
>   	rcu_segcblist_disable(&rdp->cblist);
>   	WARN_ON_ONCE(rcu_segcblist_empty(&my_rdp->cblist) != !rcu_segcblist_n_cbs(&my_rdp->cblist));
> +	check_cb_ovld_locked(my_rdp, my_rnp);
>   	if (rcu_rdp_is_offloaded(my_rdp)) {
>   		raw_spin_unlock_rcu_node(my_rnp); /* irqs remain disabled. */
>   		__call_rcu_nocb_wake(my_rdp, true, flags);

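For reference, the helper invoked by the new line looks roughly like the
following (paraphrased from this era's kernel/rcu/tree.c; treat it as an
illustrative sketch rather than the authoritative source):

static void check_cb_ovld_locked(struct rcu_data *rdp, struct rcu_node *rnp)
{
	raw_lockdep_assert_held_rcu_node(rnp);
	if (qovld_calc <= 0)
		return;	/* Early boot, or overload checking disabled. */
	if (rcu_segcblist_n_cbs(&rdp->cblist) >= qovld_calc)
		WRITE_ONCE(rnp->cbovldmask, rnp->cbovldmask | rdp->grpmask);
	else
		WRITE_ONCE(rnp->cbovldmask, rnp->cbovldmask & ~rdp->grpmask);
}

Because rcutree_migrate_callbacks() already holds my_rnp's lock at the
call site, the surviving CPU's bit is set (or cleared) under the same
lock that grace-period processing uses to consult ->cbovldmask.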
^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH rcu 04/12] rcu: Immediately boost preempted readers for strict grace periods
  2022-06-20 22:20 ` [PATCH rcu 04/12] rcu: Immediately boost preempted readers for strict grace periods Paul E. McKenney
@ 2022-06-21  6:00   ` Neeraj Upadhyay
  0 siblings, 0 replies; 39+ messages in thread
From: Neeraj Upadhyay @ 2022-06-21  6:00 UTC (permalink / raw)
  To: Paul E. McKenney, rcu; +Cc: linux-kernel, kernel-team, rostedt, Zqiang



On 6/21/2022 3:50 AM, Paul E. McKenney wrote:
> From: Zqiang <qiang1.zhang@intel.com>
> 
> The intent of the CONFIG_RCU_STRICT_GRACE_PERIOD Kconfig option is to
> cause normal grace periods to complete quickly in order to better catch
> errors resulting from improperly leaking pointers from RCU read-side
> critical sections.  However, kernels built with this option enabled still
> wait for some hundreds of milliseconds before boosting RCU readers that
> have been preempted within their current critical section.  The value
> of this delay is set by the CONFIG_RCU_BOOST_DELAY Kconfig option,
> which defaults to 500 milliseconds.
> 
> This commit therefore causes kernels built with strict grace periods
> to ignore CONFIG_RCU_BOOST_DELAY.  This causes rcu_initiate_boost()
> to start boosting immediately after all CPUs on a given leaf rcu_node
> structure have passed through their quiescent states.
> 
> Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> ---

Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>


Thanks
Neeraj

>   kernel/rcu/tree_plugin.h | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 440d9e02a26e0..53d05eb804121 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -1140,7 +1140,8 @@ static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags)
>   	    (rnp->gp_tasks != NULL &&
>   	     rnp->boost_tasks == NULL &&
>   	     rnp->qsmask == 0 &&
> -	     (!time_after(rnp->boost_time, jiffies) || rcu_state.cbovld))) {
> +	     (!time_after(rnp->boost_time, jiffies) || rcu_state.cbovld ||
> +	      IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD)))) {
>   		if (rnp->exp_tasks == NULL)
>   			WRITE_ONCE(rnp->boost_tasks, rnp->gp_tasks);
>   		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);

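A mechanical note: IS_ENABLED() expands to a compile-time constant 0 or 1
(1 when the option is built-in or modular), so non-strict kernels pay no
runtime cost for the new disjunct.  A minimal sketch of the resulting
boost condition, using the names from the patch:

	bool boost_now = rnp->gp_tasks != NULL &&
			 rnp->boost_tasks == NULL &&
			 rnp->qsmask == 0 &&
			 (!time_after(rnp->boost_time, jiffies) ||
			  rcu_state.cbovld ||
			  IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD));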
^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH rcu 05/12] rcu: Forbid RCU_STRICT_GRACE_PERIOD in TINY_RCU kernels
  2022-06-20 22:20 ` [PATCH rcu 05/12] rcu: Forbid RCU_STRICT_GRACE_PERIOD in TINY_RCU kernels Paul E. McKenney
@ 2022-06-21  6:02   ` Neeraj Upadhyay
  0 siblings, 0 replies; 39+ messages in thread
From: Neeraj Upadhyay @ 2022-06-21  6:02 UTC (permalink / raw)
  To: Paul E. McKenney, rcu; +Cc: linux-kernel, kernel-team, rostedt



On 6/21/2022 3:50 AM, Paul E. McKenney wrote:
> The RCU_STRICT_GRACE_PERIOD Kconfig option does nothing in kernels
> built with CONFIG_TINY_RCU=y, so this commit adjusts the dependencies
> to disallow this combination.
> 
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> ---

Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>


Thanks
Neeraj

>   kernel/rcu/Kconfig.debug | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
> index 9b64e55d4f615..4da05beb13d79 100644
> --- a/kernel/rcu/Kconfig.debug
> +++ b/kernel/rcu/Kconfig.debug
> @@ -121,7 +121,7 @@ config RCU_EQS_DEBUG
>   
>   config RCU_STRICT_GRACE_PERIOD
>   	bool "Provide debug RCU implementation with short grace periods"
> -	depends on DEBUG_KERNEL && RCU_EXPERT && NR_CPUS <= 4
> +	depends on DEBUG_KERNEL && RCU_EXPERT && NR_CPUS <= 4 && !TINY_RCU
>   	default n
>   	select PREEMPT_COUNT if PREEMPT=n
>   	help

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH rcu 07/12] rcu: tiny: Record kvfree_call_rcu() call stack for KASAN
  2022-06-20 22:20 ` [PATCH rcu 07/12] rcu: tiny: Record kvfree_call_rcu() call stack for KASAN Paul E. McKenney
@ 2022-06-21  6:31   ` Neeraj Upadhyay
  2022-06-21 19:31     ` Paul E. McKenney
  0 siblings, 1 reply; 39+ messages in thread
From: Neeraj Upadhyay @ 2022-06-21  6:31 UTC (permalink / raw)
  To: Paul E. McKenney, rcu
  Cc: linux-kernel, kernel-team, rostedt, Johannes Berg, Dmitry Vyukov



On 6/21/2022 3:50 AM, Paul E. McKenney wrote:
> From: Johannes Berg <johannes.berg@intel.com>
> 
> When running KASAN with Tiny RCU (e.g. under ARCH=um, where
> a working KASAN patch is now available), we don't get any
> information on the original kfree_rcu() (or similar) caller
> when a problem is reported, as Tiny RCU doesn't record this.
> 
> Add the recording, which required pulling kvfree_call_rcu()
> out of line for the KASAN case since the recording function
> (kasan_record_aux_stack_noalloc) is neither exported, nor
> can we include kasan.h into rcutiny.h.
> 
> without KASAN, the patch has no size impact (ARCH=um kernel):
>      text       data         bss         dec        hex    filename
>   6151515    4423154    33148520    43723189    29b29b5    linux
>   6151515    4423154    33148520    43723189    29b29b5    linux + patch
> 
> with KASAN, the impact on my build was minimal:
>      text       data         bss         dec        hex    filename
> 13915539    7388050    33282304    54585893    340ea25    linux
> 13911266    7392114    33282304    54585684    340e954    linux + patch
>     -4273      +4064         +-0        -209
> 
> Acked-by: Dmitry Vyukov <dvyukov@google.com>
> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> ---
>   include/linux/rcutiny.h | 11 ++++++++++-
>   kernel/rcu/tiny.c       | 14 ++++++++++++++
>   2 files changed, 24 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
> index 5fed476f977f6..d84e13f2c3848 100644
> --- a/include/linux/rcutiny.h
> +++ b/include/linux/rcutiny.h
> @@ -38,7 +38,7 @@ static inline void synchronize_rcu_expedited(void)
>    */
>   extern void kvfree(const void *addr);
>   
> -static inline void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> +static inline void __kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
>   {
>   	if (head) {
>   		call_rcu(head, func);
> @@ -51,6 +51,15 @@ static inline void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
>   	kvfree((void *) func);
>   }
>   
> +#ifdef CONFIG_KASAN_GENERIC
> +void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func);
> +#else
> +static inline void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> +{
> +	__kvfree_call_rcu(head, func);
> +}
> +#endif
> +
>   void rcu_qs(void);
>   
>   static inline void rcu_softirq_qs(void)
> diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
> index 340b3f8b090d4..58ff3721d975c 100644
> --- a/kernel/rcu/tiny.c
> +++ b/kernel/rcu/tiny.c
> @@ -217,6 +217,20 @@ bool poll_state_synchronize_rcu(unsigned long oldstate)
>   }
>   EXPORT_SYMBOL_GPL(poll_state_synchronize_rcu);
>   
> +#ifdef CONFIG_KASAN_GENERIC
> +void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> +{
> +	if (head) {
> +		void *ptr = (void *) head - (unsigned long) func;
> +
> +		kasan_record_aux_stack_noalloc(ptr);
> +	}

For the !head case, similar to Tree RCU's kvfree_call_rcu()
implementation, we do not need to record 'ptr' (which will be 'func')?


Thanks
Neeraj

> +
> +	__kvfree_call_rcu(head, func);
> +}
> +EXPORT_SYMBOL_GPL(kvfree_call_rcu);
> +#endif
> +
>   void __init rcu_init(void)
>   {
>   	open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);

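For readers following the head-versus-!head distinction discussed above,
a minimal usage sketch (struct foo and its callers are hypothetical, not
from the patch):

struct foo {
	int data;
	struct rcu_head rh;
};

static void release_foo(struct foo *p)
{
	kvfree_rcu(p, rh);	/* head != NULL: queued via call_rcu(). */
}

static void release_foo_may_sleep(struct foo *p)
{
	kvfree_rcu(p);		/* head == NULL: func carries the pointer;
				 * on Tiny RCU this is synchronize_rcu()
				 * followed by kvfree(). */
}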
^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH rcu 08/12] rcu: Cleanup RCU urgency state for offline CPU
  2022-06-20 22:20 ` [PATCH rcu 08/12] rcu: Cleanup RCU urgency state for offline CPU Paul E. McKenney
@ 2022-06-21  7:03   ` Neeraj Upadhyay
  2022-06-21 22:24     ` Paul E. McKenney
  0 siblings, 1 reply; 39+ messages in thread
From: Neeraj Upadhyay @ 2022-06-21  7:03 UTC (permalink / raw)
  To: Paul E. McKenney, rcu; +Cc: linux-kernel, kernel-team, rostedt, Zqiang



On 6/21/2022 3:50 AM, Paul E. McKenney wrote:
> From: Zqiang <qiang1.zhang@intel.com>
> 
> When a CPU is slow to provide a quiescent state for a given grace
> period, RCU takes steps to encourage that CPU to get with the
> quiescent-state program in a more timely fashion.  These steps
> include these flags in the rcu_data structure:
> 
> 1.	->rcu_urgent_qs, which causes the scheduling-clock interrupt to
> 	request an otherwise pointless context switch from the scheduler.
> 
> 2.	->rcu_need_heavy_qs, which causes both cond_resched() and RCU's
> 	context-switch hook to do an immediate momentary quiescent state.
> 
> 3.	->rcu_need_heavy_qs, which causes the scheduler-clock tick to

nit: s/->rcu_need_heavy_qs/->rcu_forced_tick/ ?

> 	be enabled even on nohz_full CPUs with only one runnable task.
> 
> These flags are of course cleared once the corresponding CPU has passed
> through a quiescent state.  Unless that quiescent state is the CPU
> going offline, which means that when the CPU comes back online, it will
> needlessly consume additional CPU time and incur additional latency,
> which constitutes a minor but very real performance bug.
> 
> This commit therefore adds the call to rcu_disable_urgency_upon_qs()
> that clears these flags to the CPU-hotplug offlining code path.
> 
> Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> ---


Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>




Thanks
Neeraj

>   kernel/rcu/tree.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index f4a37f2032664..5445b19b48408 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -4446,6 +4446,7 @@ void rcu_report_dead(unsigned int cpu)
>   	rdp->rcu_ofl_gp_flags = READ_ONCE(rcu_state.gp_flags);
>   	if (rnp->qsmask & mask) { /* RCU waiting on outgoing CPU? */
>   		/* Report quiescent state -before- changing ->qsmaskinitnext! */
> +		rcu_disable_urgency_upon_qs(rdp);
>   		rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
>   		raw_spin_lock_irqsave_rcu_node(rnp, flags);
>   	}

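For context, the helper being wired into the offline path clears the
urgency state roughly as follows (paraphrased from this era's
kernel/rcu/tree.c; illustrative only):

static void rcu_disable_urgency_upon_qs(struct rcu_data *rdp)
{
	WRITE_ONCE(rdp->rcu_urgent_qs, false);
	WRITE_ONCE(rdp->rcu_need_heavy_qs, false);
	if (tick_nohz_full_cpu(rdp->cpu) && rdp->rcu_forced_tick) {
		tick_dep_clear_cpu(rdp->cpu, TICK_DEP_BIT_RCU);
		rdp->rcu_forced_tick = false;
	}
}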
^ permalink raw reply	[flat|nested] 39+ messages in thread

* RE: [PATCH rcu 12/12] srcu: Block less aggressively for expedited grace periods
  2022-06-20 22:20 ` [PATCH rcu 12/12] srcu: Block less aggressively for expedited grace periods Paul E. McKenney
  2022-06-21  2:00   ` Zhangfei Gao
@ 2022-06-21  7:43   ` Shameerali Kolothum Thodi
  2022-06-21 19:36     ` Paul E. McKenney
  2022-06-21 10:13   ` Neeraj Upadhyay
  2 siblings, 1 reply; 39+ messages in thread
From: Shameerali Kolothum Thodi @ 2022-06-21  7:43 UTC (permalink / raw)
  To: Paul E. McKenney, rcu
  Cc: linux-kernel, kernel-team, rostedt, Zhangfei Gao, Paolo Bonzini



> -----Original Message-----
> From: Paul E. McKenney [mailto:paulmck@kernel.org]
> Sent: 20 June 2022 23:21
> To: rcu@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org; kernel-team@fb.com;
> rostedt@goodmis.org; Paul E. McKenney <paulmck@kernel.org>; Zhangfei
> Gao <zhangfei.gao@linaro.org>; Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; Paolo Bonzini
> <pbonzini@redhat.com>
> Subject: [PATCH rcu 12/12] srcu: Block less aggressively for expedited grace
> periods
> 
> Commit 282d8998e997 ("srcu: Prevent expedited GPs and blocking readers
> from consuming CPU") fixed a problem where a long-running expedited
> SRCU
> grace period could block kernel live patching.  It did so by giving up
> on expediting once a given SRCU expedited grace period grew too old.
> 
> Unfortunately, this added excessive delays to boots of embedded systems
> running on qemu that use the ARM IORT RMR feature. 

As pointed out here[0]/[1], this delay has nothing to do with the ARM IORT
RMR feature.  The delay is due to the "-bios QEMU_EFI.fd" line, which emulates
flash devices and requires excessive memslot ops during boot.

Thanks,
Shameer

[0] https://lore.kernel.org/all/110bbec5cee74efba0aad64360069a12@huawei.com/
[1] https://lore.kernel.org/all/8735g649ew.wl-maz@kernel.org/


> This commit therefore
> makes the transition away from expediting less aggressive, increasing
> the per-grace-period phase number of non-sleeping polls of readers from
> one to three and increasing the required grace-period age from one jiffy
> (actually from zero to one jiffies) to two jiffies (actually from one
> to two jiffies).
> 
> Fixes: 282d8998e997 ("srcu: Prevent expedited GPs and blocking readers from consuming CPU")
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> Reported-by: Zhangfei Gao <zhangfei.gao@linaro.org>
> Reported-by: "chenxiang (M)" <chenxiang66@hisilicon.com>
> Cc: Shameerali Kolothum Thodi  <shameerali.kolothum.thodi@huawei.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Link: https://lore.kernel.org/all/20615615-0013-5adc-584f-2b1d5c03ebfc@linaro.org/
> ---
>  kernel/rcu/srcutree.c | 20 +++++++++++++-------
>  1 file changed, 13 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> index 50ba70f019dea..0db7873f4e95b 100644
> --- a/kernel/rcu/srcutree.c
> +++ b/kernel/rcu/srcutree.c
> @@ -513,7 +513,7 @@ static bool srcu_readers_active(struct srcu_struct *ssp)
> 
>  #define SRCU_INTERVAL		1	// Base delay if no expedited GPs pending.
>  #define SRCU_MAX_INTERVAL	10	// Maximum incremental delay from slow readers.
> -#define SRCU_MAX_NODELAY_PHASE	1	// Maximum per-GP-phase consecutive no-delay instances.
> +#define SRCU_MAX_NODELAY_PHASE	3	// Maximum per-GP-phase consecutive no-delay instances.
>  #define SRCU_MAX_NODELAY	100	// Maximum consecutive no-delay instances.
> 
>  /*
> @@ -522,16 +522,22 @@ static bool srcu_readers_active(struct srcu_struct *ssp)
>   */
>  static unsigned long srcu_get_delay(struct srcu_struct *ssp)
>  {
> +	unsigned long gpstart;
> +	unsigned long j;
>  	unsigned long jbase = SRCU_INTERVAL;
> 
>  	if (ULONG_CMP_LT(READ_ONCE(ssp->srcu_gp_seq), READ_ONCE(ssp->srcu_gp_seq_needed_exp)))
>  		jbase = 0;
> -	if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq)))
> -		jbase += jiffies - READ_ONCE(ssp->srcu_gp_start);
> -	if (!jbase) {
> -		WRITE_ONCE(ssp->srcu_n_exp_nodelay, READ_ONCE(ssp->srcu_n_exp_nodelay) + 1);
> -		if (READ_ONCE(ssp->srcu_n_exp_nodelay) > SRCU_MAX_NODELAY_PHASE)
> -			jbase = 1;
> +	if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq))) {
> +		j = jiffies - 1;
> +		gpstart = READ_ONCE(ssp->srcu_gp_start);
> +		if (time_after(j, gpstart))
> +			jbase += j - gpstart;
> +		if (!jbase) {
> +			WRITE_ONCE(ssp->srcu_n_exp_nodelay, READ_ONCE(ssp->srcu_n_exp_nodelay) + 1);
> +			if (READ_ONCE(ssp->srcu_n_exp_nodelay) > SRCU_MAX_NODELAY_PHASE)
> +				jbase = 1;
> +		}
>  	}
>  	return jbase > SRCU_MAX_INTERVAL ? SRCU_MAX_INTERVAL : jbase;
>  }
> --
> 2.31.1.189.g2e36527f23


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH rcu 09/12] rcu/kvfree: Remove useless monitor_todo flag
  2022-06-20 22:20 ` [PATCH rcu 09/12] rcu/kvfree: Remove useless monitor_todo flag Paul E. McKenney
@ 2022-06-21 10:02   ` Neeraj Upadhyay
  0 siblings, 0 replies; 39+ messages in thread
From: Neeraj Upadhyay @ 2022-06-21 10:02 UTC (permalink / raw)
  To: Paul E. McKenney, rcu
  Cc: linux-kernel, kernel-team, rostedt, Joel Fernandes (Google),
	Uladzislau Rezki



On 6/21/2022 3:50 AM, Paul E. McKenney wrote:
> From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> 
> monitor_todo is not needed, as the work struct already tracks
> whether work is pending.  Just use that, via the
> schedule_delayed_work() helper, to know whether work is pending.
> 
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> ---

Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>


Thanks
Neeraj

>   kernel/rcu/tree.c | 33 ++++++++++++++++-----------------
>   1 file changed, 16 insertions(+), 17 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 5445b19b48408..7919d7b48fa6a 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3216,7 +3216,6 @@ struct kfree_rcu_cpu_work {
>    * @krw_arr: Array of batches of kfree_rcu() objects waiting for a grace period
>    * @lock: Synchronize access to this structure
>    * @monitor_work: Promote @head to @head_free after KFREE_DRAIN_JIFFIES
> - * @monitor_todo: Tracks whether a @monitor_work delayed work is pending
>    * @initialized: The @rcu_work fields have been initialized
>    * @count: Number of objects for which GP not started
>    * @bkvcache:
> @@ -3241,7 +3240,6 @@ struct kfree_rcu_cpu {
>   	struct kfree_rcu_cpu_work krw_arr[KFREE_N_BATCHES];
>   	raw_spinlock_t lock;
>   	struct delayed_work monitor_work;
> -	bool monitor_todo;
>   	bool initialized;
>   	int count;
>   
> @@ -3421,6 +3419,18 @@ static void kfree_rcu_work(struct work_struct *work)
>   	}
>   }
>   
> +static bool
> +need_offload_krc(struct kfree_rcu_cpu *krcp)
> +{
> +	int i;
> +
> +	for (i = 0; i < FREE_N_CHANNELS; i++)
> +		if (krcp->bkvhead[i])
> +			return true;
> +
> +	return !!krcp->head;
> +}
> +
>   /*
>    * This function is invoked after the KFREE_DRAIN_JIFFIES timeout.
>    */
> @@ -3477,9 +3487,7 @@ static void kfree_rcu_monitor(struct work_struct *work)
>   	// of the channels that is still busy we should rearm the
>   	// work to repeat an attempt. Because previous batches are
>   	// still in progress.
> -	if (!krcp->bkvhead[0] && !krcp->bkvhead[1] && !krcp->head)
> -		krcp->monitor_todo = false;
> -	else
> +	if (need_offload_krc(krcp))
>   		schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES);
>   
>   	raw_spin_unlock_irqrestore(&krcp->lock, flags);
> @@ -3667,11 +3675,8 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
>   	WRITE_ONCE(krcp->count, krcp->count + 1);
>   
>   	// Set timer to drain after KFREE_DRAIN_JIFFIES.
> -	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING &&
> -	    !krcp->monitor_todo) {
> -		krcp->monitor_todo = true;
> +	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING)
>   		schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES);
> -	}
>   
>   unlock_return:
>   	krc_this_cpu_unlock(krcp, flags);
> @@ -3746,14 +3751,8 @@ void __init kfree_rcu_scheduler_running(void)
>   		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
>   
>   		raw_spin_lock_irqsave(&krcp->lock, flags);
> -		if ((!krcp->bkvhead[0] && !krcp->bkvhead[1] && !krcp->head) ||
> -				krcp->monitor_todo) {
> -			raw_spin_unlock_irqrestore(&krcp->lock, flags);
> -			continue;
> -		}
> -		krcp->monitor_todo = true;
> -		schedule_delayed_work_on(cpu, &krcp->monitor_work,
> -					 KFREE_DRAIN_JIFFIES);
> +		if (need_offload_krc(krcp))
> +			schedule_delayed_work_on(cpu, &krcp->monitor_work, KFREE_DRAIN_JIFFIES);
>   		raw_spin_unlock_irqrestore(&krcp->lock, flags);
>   	}
>   }

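The reason the flag is redundant: schedule_delayed_work() bottoms out in
queue_delayed_work_on(), which already does an atomic test-and-set of the
work item's pending bit, roughly (condensed from kernel/workqueue.c;
illustrative):

	if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT,
			      work_data_bits(&dwork->work))) {
		__queue_delayed_work(cpu, wq, dwork, delay);
		ret = true;	/* Newly queued. */
	}
	/* Else already pending: return false and do nothing. */

So calling schedule_delayed_work() while monitor_work is still pending is
a cheap no-op, which is exactly the state monitor_todo used to mirror.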
^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH rcu 12/12] srcu: Block less aggressively for expedited grace periods
  2022-06-20 22:20 ` [PATCH rcu 12/12] srcu: Block less aggressively for expedited grace periods Paul E. McKenney
  2022-06-21  2:00   ` Zhangfei Gao
  2022-06-21  7:43   ` Shameerali Kolothum Thodi
@ 2022-06-21 10:13   ` Neeraj Upadhyay
  2022-06-21 22:25     ` Paul E. McKenney
  2 siblings, 1 reply; 39+ messages in thread
From: Neeraj Upadhyay @ 2022-06-21 10:13 UTC (permalink / raw)
  To: Paul E. McKenney, rcu
  Cc: linux-kernel, kernel-team, rostedt, Zhangfei Gao,
	Shameerali Kolothum Thodi, Paolo Bonzini



On 6/21/2022 3:50 AM, Paul E. McKenney wrote:
> Commit 282d8998e997 ("srcu: Prevent expedited GPs and blocking readers
> from consuming CPU") fixed a problem where a long-running expedited SRCU
> grace period could block kernel live patching.  It did so by giving up
> on expediting once a given SRCU expedited grace period grew too old.
> 
> Unfortunately, this added excessive delays to boots of embedded systems
> running on qemu that use the ARM IORT RMR feature.  This commit therefore
> makes the transition away from expediting less aggressive, increasing
> the per-grace-period phase number of non-sleeping polls of readers from
> one to three and increasing the required grace-period age from one jiffy
> (actually from zero to one jiffies) to two jiffies (actually from one
> to two jiffies).
> 
> Fixes: 282d8998e997 ("srcu: Prevent expedited GPs and blocking readers from consuming CPU")
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> Reported-by: Zhangfei Gao <zhangfei.gao@linaro.org>
> Reported-by: "chenxiang (M)" <chenxiang66@hisilicon.com>
> Cc: Shameerali Kolothum Thodi  <shameerali.kolothum.thodi@huawei.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Link: https://lore.kernel.org/all/20615615-0013-5adc-584f-2b1d5c03ebfc@linaro.org/
> ---


Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>


Thanks
Neeraj

>   kernel/rcu/srcutree.c | 20 +++++++++++++-------
>   1 file changed, 13 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> index 50ba70f019dea..0db7873f4e95b 100644
> --- a/kernel/rcu/srcutree.c
> +++ b/kernel/rcu/srcutree.c
> @@ -513,7 +513,7 @@ static bool srcu_readers_active(struct srcu_struct *ssp)
>   
>   #define SRCU_INTERVAL		1	// Base delay if no expedited GPs pending.
>   #define SRCU_MAX_INTERVAL	10	// Maximum incremental delay from slow readers.
> -#define SRCU_MAX_NODELAY_PHASE	1	// Maximum per-GP-phase consecutive no-delay instances.
> +#define SRCU_MAX_NODELAY_PHASE	3	// Maximum per-GP-phase consecutive no-delay instances.
>   #define SRCU_MAX_NODELAY	100	// Maximum consecutive no-delay instances.
>   
>   /*
> @@ -522,16 +522,22 @@ static bool srcu_readers_active(struct srcu_struct *ssp)
>    */
>   static unsigned long srcu_get_delay(struct srcu_struct *ssp)
>   {
> +	unsigned long gpstart;
> +	unsigned long j;
>   	unsigned long jbase = SRCU_INTERVAL;
>   
>   	if (ULONG_CMP_LT(READ_ONCE(ssp->srcu_gp_seq), READ_ONCE(ssp->srcu_gp_seq_needed_exp)))
>   		jbase = 0;
> -	if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq)))
> -		jbase += jiffies - READ_ONCE(ssp->srcu_gp_start);
> -	if (!jbase) {
> -		WRITE_ONCE(ssp->srcu_n_exp_nodelay, READ_ONCE(ssp->srcu_n_exp_nodelay) + 1);
> -		if (READ_ONCE(ssp->srcu_n_exp_nodelay) > SRCU_MAX_NODELAY_PHASE)
> -			jbase = 1;
> +	if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq))) {
> +		j = jiffies - 1;
> +		gpstart = READ_ONCE(ssp->srcu_gp_start);
> +		if (time_after(j, gpstart))
> +			jbase += j - gpstart;
> +		if (!jbase) {
> +			WRITE_ONCE(ssp->srcu_n_exp_nodelay, READ_ONCE(ssp->srcu_n_exp_nodelay) + 1);
> +			if (READ_ONCE(ssp->srcu_n_exp_nodelay) > SRCU_MAX_NODELAY_PHASE)
> +				jbase = 1;
> +		}
>   	}
>   	return jbase > SRCU_MAX_INTERVAL ? SRCU_MAX_INTERVAL : jbase;
>   }

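To make the new timing concrete, a worked walkthrough with illustrative
values (not taken from the patch itself):

/*
 * Expedited GP pending             -> jbase starts at 0.
 * GP started this jiffy (age 0-1)  -> j = jiffies - 1 is not after
 *     gpstart, so nothing is added: the grace period must now be at
 *     least two jiffies old (was: one) before jbase grows past zero.
 * While jbase == 0, srcu_n_exp_nodelay counts consecutive no-delay
 *     polls; only once it exceeds SRCU_MAX_NODELAY_PHASE (now 3,
 *     was 1) is jbase forced to 1, giving three non-sleeping polls
 *     per grace-period phase instead of one.
 */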
^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH rcu 07/12] rcu: tiny: Record kvfree_call_rcu() call stack for KASAN
  2022-06-21  6:31   ` Neeraj Upadhyay
@ 2022-06-21 19:31     ` Paul E. McKenney
  2022-06-21 21:14       ` Marco Elver
  0 siblings, 1 reply; 39+ messages in thread
From: Paul E. McKenney @ 2022-06-21 19:31 UTC (permalink / raw)
  To: Neeraj Upadhyay
  Cc: rcu, linux-kernel, kernel-team, rostedt, Johannes Berg,
	Dmitry Vyukov, elver

On Tue, Jun 21, 2022 at 12:01:29PM +0530, Neeraj Upadhyay wrote:
> 
> 
> On 6/21/2022 3:50 AM, Paul E. McKenney wrote:
> > From: Johannes Berg <johannes.berg@intel.com>
> > 
> > When running KASAN with Tiny RCU (e.g. under ARCH=um, where
> > a working KASAN patch is now available), we don't get any
> > information on the original kfree_rcu() (or similar) caller
> > when a problem is reported, as Tiny RCU doesn't record this.
> > 
> > Add the recording, which required pulling kvfree_call_rcu()
> > out of line for the KASAN case since the recording function
> > (kasan_record_aux_stack_noalloc) is neither exported, nor
> > can we include kasan.h into rcutiny.h.
> > 
> > without KASAN, the patch has no size impact (ARCH=um kernel):
> >      text       data         bss         dec        hex    filename
> >   6151515    4423154    33148520    43723189    29b29b5    linux
> >   6151515    4423154    33148520    43723189    29b29b5    linux + patch
> > 
> > with KASAN, the impact on my build was minimal:
> >      text       data         bss         dec        hex    filename
> > 13915539    7388050    33282304    54585893    340ea25    linux
> > 13911266    7392114    33282304    54585684    340e954    linux + patch
> >     -4273      +4064         +-0        -209
> > 
> > Acked-by: Dmitry Vyukov <dvyukov@google.com>
> > Signed-off-by: Johannes Berg <johannes.berg@intel.com>
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > ---
> >   include/linux/rcutiny.h | 11 ++++++++++-
> >   kernel/rcu/tiny.c       | 14 ++++++++++++++
> >   2 files changed, 24 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
> > index 5fed476f977f6..d84e13f2c3848 100644
> > --- a/include/linux/rcutiny.h
> > +++ b/include/linux/rcutiny.h
> > @@ -38,7 +38,7 @@ static inline void synchronize_rcu_expedited(void)
> >    */
> >   extern void kvfree(const void *addr);
> > -static inline void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> > +static inline void __kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> >   {
> >   	if (head) {
> >   		call_rcu(head, func);
> > @@ -51,6 +51,15 @@ static inline void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> >   	kvfree((void *) func);
> >   }
> > +#ifdef CONFIG_KASAN_GENERIC
> > +void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func);
> > +#else
> > +static inline void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> > +{
> > +	__kvfree_call_rcu(head, func);
> > +}
> > +#endif
> > +
> >   void rcu_qs(void);
> >   static inline void rcu_softirq_qs(void)
> > diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
> > index 340b3f8b090d4..58ff3721d975c 100644
> > --- a/kernel/rcu/tiny.c
> > +++ b/kernel/rcu/tiny.c
> > @@ -217,6 +217,20 @@ bool poll_state_synchronize_rcu(unsigned long oldstate)
> >   }
> >   EXPORT_SYMBOL_GPL(poll_state_synchronize_rcu);
> > +#ifdef CONFIG_KASAN_GENERIC
> > +void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> > +{
> > +	if (head) {
> > +		void *ptr = (void *) head - (unsigned long) func;
> > +
> > +		kasan_record_aux_stack_noalloc(ptr);
> > +	}
> 
> For the !head case, similar to Tree RCU's kvfree_call_rcu() implementation,
> we do not need to record 'ptr' (which will be 'func')?

My understanding is that we do not need to record in that case
because __kvfree_call_rcu() will simply invoke the almost-zero-cost
synchronize_rcu() and then invoke kfree().
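
For reference, the Tiny RCU !head path under discussion is roughly this
(lightly paraphrased from include/linux/rcutiny.h as of this series):

static inline void __kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
{
	if (head) {
		call_rcu(head, func);
		return;
	}

	/* kvfree_rcu(one_arg) call: func is really the pointer itself. */
	might_sleep();
	synchronize_rcu();
	kvfree((void *) func);
}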

Johannes, Dmitry, Marco, anything that I am missing?

							Thanx, Paul

> Thanks
> Neeraj
> 
> > +
> > +	__kvfree_call_rcu(head, func);
> > +}
> > +EXPORT_SYMBOL_GPL(kvfree_call_rcu);
> > +#endif
> > +
> >   void __init rcu_init(void)
> >   {
> >   	open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH rcu 12/12] srcu: Block less aggressively for expedited grace periods
  2022-06-21  7:43   ` Shameerali Kolothum Thodi
@ 2022-06-21 19:36     ` Paul E. McKenney
  0 siblings, 0 replies; 39+ messages in thread
From: Paul E. McKenney @ 2022-06-21 19:36 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi
  Cc: rcu, linux-kernel, kernel-team, rostedt, Zhangfei Gao, Paolo Bonzini

On Tue, Jun 21, 2022 at 07:43:08AM +0000, Shameerali Kolothum Thodi wrote:
> 
> 
> > -----Original Message-----
> > From: Paul E. McKenney [mailto:paulmck@kernel.org]
> > Sent: 20 June 2022 23:21
> > To: rcu@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org; kernel-team@fb.com;
> > rostedt@goodmis.org; Paul E. McKenney <paulmck@kernel.org>; Zhangfei
> > Gao <zhangfei.gao@linaro.org>; Shameerali Kolothum Thodi
> > <shameerali.kolothum.thodi@huawei.com>; Paolo Bonzini
> > <pbonzini@redhat.com>
> > Subject: [PATCH rcu 12/12] srcu: Block less aggressively for expedited grace
> > periods
> > 
> > Commit 282d8998e997 ("srcu: Prevent expedited GPs and blocking readers
> > from consuming CPU") fixed a problem where a long-running expedited
> > SRCU
> > grace period could block kernel live patching.  It did so by giving up
> > on expediting once a given SRCU expedited grace period grew too old.
> > 
> > Unfortunately, this added excessive delays to boots of embedded systems
> > running on qemu that use the ARM IORT RMR feature. 
> 
> As pointed out here[0]/[1], this delay has nothing to do with the ARM IORT
> RMR feature.  The delay is due to the "-bios QEMU_EFI.fd" line, which emulates
> flash devices and requires excessive memslot ops during boot.

Good eyes, and I will update.

Are we stuck with the excessive memslot ops?  If not, great.

If we are stuck with them, there probably needs to be a way of adjusting
the SRCU delay parameters.  Making "-bios QEMU_EFI.fd" go quickly seems
to require quite a bit of spinning, more than is likely to be helpful
in general.

							Thanx, Paul

> Thanks,
> Shameer
> 
> [0] https://lore.kernel.org/all/110bbec5cee74efba0aad64360069a12@huawei.com/
> [1] https://lore.kernel.org/all/8735g649ew.wl-maz@kernel.org/
> 
> 
> > This commit therefore
> > makes the transition away from expediting less aggressive, increasing
> > the per-grace-period phase number of non-sleeping polls of readers from
> > one to three and increasing the required grace-period age from one jiffy
> > (actually from zero to one jiffies) to two jiffies (actually from one
> > to two jiffies).
> > 
> > Fixes: 282d8998e997 ("srcu: Prevent expedited GPs and blocking readers from consuming CPU")
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > Reported-by: Zhangfei Gao <zhangfei.gao@linaro.org>
> > Reported-by: "chenxiang (M)" <chenxiang66@hisilicon.com>
> > Cc: Shameerali Kolothum Thodi  <shameerali.kolothum.thodi@huawei.com>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Link: https://lore.kernel.org/all/20615615-0013-5adc-584f-2b1d5c03ebfc@linaro.org/
> > ---
> >  kernel/rcu/srcutree.c | 20 +++++++++++++-------
> >  1 file changed, 13 insertions(+), 7 deletions(-)
> > 
> > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> > index 50ba70f019dea..0db7873f4e95b 100644
> > --- a/kernel/rcu/srcutree.c
> > +++ b/kernel/rcu/srcutree.c
> > @@ -513,7 +513,7 @@ static bool srcu_readers_active(struct srcu_struct *ssp)
> > 
> >  #define SRCU_INTERVAL		1	// Base delay if no expedited GPs pending.
> >  #define SRCU_MAX_INTERVAL	10	// Maximum incremental delay from slow readers.
> > -#define SRCU_MAX_NODELAY_PHASE	1	// Maximum per-GP-phase consecutive no-delay instances.
> > +#define SRCU_MAX_NODELAY_PHASE	3	// Maximum per-GP-phase consecutive no-delay instances.
> >  #define SRCU_MAX_NODELAY	100	// Maximum consecutive no-delay instances.
> > 
> >  /*
> > @@ -522,16 +522,22 @@ static bool srcu_readers_active(struct srcu_struct *ssp)
> >   */
> >  static unsigned long srcu_get_delay(struct srcu_struct *ssp)
> >  {
> > +	unsigned long gpstart;
> > +	unsigned long j;
> >  	unsigned long jbase = SRCU_INTERVAL;
> > 
> >  	if (ULONG_CMP_LT(READ_ONCE(ssp->srcu_gp_seq), READ_ONCE(ssp->srcu_gp_seq_needed_exp)))
> >  		jbase = 0;
> > -	if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq)))
> > -		jbase += jiffies - READ_ONCE(ssp->srcu_gp_start);
> > -	if (!jbase) {
> > -		WRITE_ONCE(ssp->srcu_n_exp_nodelay, READ_ONCE(ssp->srcu_n_exp_nodelay) + 1);
> > -		if (READ_ONCE(ssp->srcu_n_exp_nodelay) > SRCU_MAX_NODELAY_PHASE)
> > -			jbase = 1;
> > +	if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq))) {
> > +		j = jiffies - 1;
> > +		gpstart = READ_ONCE(ssp->srcu_gp_start);
> > +		if (time_after(j, gpstart))
> > +			jbase += j - gpstart;
> > +		if (!jbase) {
> > +			WRITE_ONCE(ssp->srcu_n_exp_nodelay, READ_ONCE(ssp->srcu_n_exp_nodelay) + 1);
> > +			if (READ_ONCE(ssp->srcu_n_exp_nodelay) > SRCU_MAX_NODELAY_PHASE)
> > +				jbase = 1;
> > +		}
> >  	}
> >  	return jbase > SRCU_MAX_INTERVAL ? SRCU_MAX_INTERVAL : jbase;
> >  }
> > --
> > 2.31.1.189.g2e36527f23
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH rcu 07/12] rcu: tiny: Record kvfree_call_rcu() call stack for KASAN
  2022-06-21 19:31     ` Paul E. McKenney
@ 2022-06-21 21:14       ` Marco Elver
  2022-06-21 22:17         ` Paul E. McKenney
  0 siblings, 1 reply; 39+ messages in thread
From: Marco Elver @ 2022-06-21 21:14 UTC (permalink / raw)
  To: paulmck
  Cc: Neeraj Upadhyay, rcu, linux-kernel, kernel-team, rostedt,
	Johannes Berg, Dmitry Vyukov

On Tue, 21 Jun 2022 at 21:31, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Tue, Jun 21, 2022 at 12:01:29PM +0530, Neeraj Upadhyay wrote:
> >
> >
> > On 6/21/2022 3:50 AM, Paul E. McKenney wrote:
> > > From: Johannes Berg <johannes.berg@intel.com>
> > >
> > > When running KASAN with Tiny RCU (e.g. under ARCH=um, where
> > > a working KASAN patch is now available), we don't get any
> > > information on the original kfree_rcu() (or similar) caller
> > > when a problem is reported, as Tiny RCU doesn't record this.
> > >
> > > Add the recording, which required pulling kvfree_call_rcu()
> > > out of line for the KASAN case since the recording function
> > > (kasan_record_aux_stack_noalloc) is neither exported, nor
> > > can we include kasan.h into rcutiny.h.
> > >
> > > without KASAN, the patch has no size impact (ARCH=um kernel):
> > >      text       data         bss         dec        hex    filename
> > >   6151515    4423154    33148520    43723189    29b29b5    linux
> > >   6151515    4423154    33148520    43723189    29b29b5    linux + patch
> > >
> > > with KASAN, the impact on my build was minimal:
> > >      text       data         bss         dec        hex    filename
> > > 13915539    7388050    33282304    54585893    340ea25    linux
> > > 13911266    7392114    33282304    54585684    340e954    linux + patch
> > >     -4273      +4064         +-0        -209
> > >
> > > Acked-by: Dmitry Vyukov <dvyukov@google.com>
> > > Signed-off-by: Johannes Berg <johannes.berg@intel.com>
> > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > > ---
> > >   include/linux/rcutiny.h | 11 ++++++++++-
> > >   kernel/rcu/tiny.c       | 14 ++++++++++++++
> > >   2 files changed, 24 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
> > > index 5fed476f977f6..d84e13f2c3848 100644
> > > --- a/include/linux/rcutiny.h
> > > +++ b/include/linux/rcutiny.h
> > > @@ -38,7 +38,7 @@ static inline void synchronize_rcu_expedited(void)
> > >    */
> > >   extern void kvfree(const void *addr);
> > > -static inline void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> > > +static inline void __kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> > >   {
> > >     if (head) {
> > >             call_rcu(head, func);
> > > @@ -51,6 +51,15 @@ static inline void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> > >     kvfree((void *) func);
> > >   }
> > > +#ifdef CONFIG_KASAN_GENERIC
> > > +void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func);
> > > +#else
> > > +static inline void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> > > +{
> > > +   __kvfree_call_rcu(head, func);
> > > +}
> > > +#endif
> > > +
> > >   void rcu_qs(void);
> > >   static inline void rcu_softirq_qs(void)
> > > diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
> > > index 340b3f8b090d4..58ff3721d975c 100644
> > > --- a/kernel/rcu/tiny.c
> > > +++ b/kernel/rcu/tiny.c
> > > @@ -217,6 +217,20 @@ bool poll_state_synchronize_rcu(unsigned long oldstate)
> > >   }
> > >   EXPORT_SYMBOL_GPL(poll_state_synchronize_rcu);
> > > +#ifdef CONFIG_KASAN_GENERIC
> > > +void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> > > +{
> > > +   if (head) {
> > > +           void *ptr = (void *) head - (unsigned long) func;
> > > +
> > > +           kasan_record_aux_stack_noalloc(ptr);
> > > +   }
> >
> > For the !head case, similar to Tree RCU's kvfree_call_rcu() implementation,
> > we do not need to record 'ptr' (which will be 'func')?
>
> My understanding is that we do not need to record in that case
> because __kvfree_call_rcu() will simply invoke the almost-zero-cost
> synchronize_rcu() and then invoke kfree().
>
> Johannes, Dmitry, Marco, anything that I am missing?

As-is looks sensible - doing kasan_record_aux_stack_noalloc() only
makes sense if the actual kfree() is not done with a callstack that
will point at the kvfree_call_rcu() caller. Otherwise we're doing
redundant work and just polluting the aux stack storage slots. So in
the case where kvfree_call_rcu() does synchronize_rcu() and kfree(),
the kvfree_call_rcu() caller is in the callstack, and would be shown
on use-after-free bugs.

Thanks,
-- Marco

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH rcu 07/12] rcu: tiny: Record kvfree_call_rcu() call stack for KASAN
  2022-06-21 21:14       ` Marco Elver
@ 2022-06-21 22:17         ` Paul E. McKenney
  0 siblings, 0 replies; 39+ messages in thread
From: Paul E. McKenney @ 2022-06-21 22:17 UTC (permalink / raw)
  To: Marco Elver
  Cc: Neeraj Upadhyay, rcu, linux-kernel, kernel-team, rostedt,
	Johannes Berg, Dmitry Vyukov

On Tue, Jun 21, 2022 at 11:14:04PM +0200, Marco Elver wrote:
> On Tue, 21 Jun 2022 at 21:31, Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Tue, Jun 21, 2022 at 12:01:29PM +0530, Neeraj Upadhyay wrote:
> > >
> > >
> > > On 6/21/2022 3:50 AM, Paul E. McKenney wrote:
> > > > From: Johannes Berg <johannes.berg@intel.com>
> > > >
> > > > When running KASAN with Tiny RCU (e.g. under ARCH=um, where
> > > > a working KASAN patch is now available), we don't get any
> > > > information on the original kfree_rcu() (or similar) caller
> > > > when a problem is reported, as Tiny RCU doesn't record this.
> > > >
> > > > Add the recording, which required pulling kvfree_call_rcu()
> > > > out of line for the KASAN case since the recording function
> > > > (kasan_record_aux_stack_noalloc) is neither exported, nor
> > > > can we include kasan.h into rcutiny.h.
> > > >
> > > > without KASAN, the patch has no size impact (ARCH=um kernel):
> > > >      text       data         bss         dec        hex    filename
> > > >   6151515    4423154    33148520    43723189    29b29b5    linux
> > > >   6151515    4423154    33148520    43723189    29b29b5    linux + patch
> > > >
> > > > with KASAN, the impact on my build was minimal:
> > > >      text       data         bss         dec        hex    filename
> > > > 13915539    7388050    33282304    54585893    340ea25    linux
> > > > 13911266    7392114    33282304    54585684    340e954    linux + patch
> > > >     -4273      +4064         +-0        -209
> > > >
> > > > Acked-by: Dmitry Vyukov <dvyukov@google.com>
> > > > Signed-off-by: Johannes Berg <johannes.berg@intel.com>
> > > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > > > ---
> > > >   include/linux/rcutiny.h | 11 ++++++++++-
> > > >   kernel/rcu/tiny.c       | 14 ++++++++++++++
> > > >   2 files changed, 24 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
> > > > index 5fed476f977f6..d84e13f2c3848 100644
> > > > --- a/include/linux/rcutiny.h
> > > > +++ b/include/linux/rcutiny.h
> > > > @@ -38,7 +38,7 @@ static inline void synchronize_rcu_expedited(void)
> > > >    */
> > > >   extern void kvfree(const void *addr);
> > > > -static inline void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> > > > +static inline void __kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> > > >   {
> > > >     if (head) {
> > > >             call_rcu(head, func);
> > > > @@ -51,6 +51,15 @@ static inline void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> > > >     kvfree((void *) func);
> > > >   }
> > > > +#ifdef CONFIG_KASAN_GENERIC
> > > > +void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func);
> > > > +#else
> > > > +static inline void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> > > > +{
> > > > +   __kvfree_call_rcu(head, func);
> > > > +}
> > > > +#endif
> > > > +
> > > >   void rcu_qs(void);
> > > >   static inline void rcu_softirq_qs(void)
> > > > diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
> > > > index 340b3f8b090d4..58ff3721d975c 100644
> > > > --- a/kernel/rcu/tiny.c
> > > > +++ b/kernel/rcu/tiny.c
> > > > @@ -217,6 +217,20 @@ bool poll_state_synchronize_rcu(unsigned long oldstate)
> > > >   }
> > > >   EXPORT_SYMBOL_GPL(poll_state_synchronize_rcu);
> > > > +#ifdef CONFIG_KASAN_GENERIC
> > > > +void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> > > > +{
> > > > +   if (head) {
> > > > +           void *ptr = (void *) head - (unsigned long) func;
> > > > +
> > > > +           kasan_record_aux_stack_noalloc(ptr);
> > > > +   }
> > >
> > > For the !head case, similar to Tree RCU's kvfree_call_rcu() implementation,
> > > we do not need to record 'ptr' (which will be 'func')?
> >
> > My understanding is that we do not need to record in that case
> > because __kvfree_call_rcu() will simply invoke the almost-zero-cost
> > synchronize_rcu() and then invoke kfree().
> >
> > Johannes, Dmitry, Marco, anything that I am missing?
> 
> As-is looks sensible - doing kasan_record_aux_stack_noalloc() only
> makes sense if the actual kfree() is not done with a callstack that
> will point at the kvfree_call_rcu() caller. Otherwise we're doing
> redundant work and just polluting the aux stack storage slots. So in
> the case where kvfree_call_rcu() does synchronize_rcu() and kfree(),
> the kvfree_call_rcu() caller is in the callstack, and would be shown
> on use-after-free bugs.

Thank you for checking and for confirmation!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH rcu 01/12] rcu: Decrease FQS scan wait time in case of callback overloading
  2022-06-21  5:29   ` Neeraj Upadhyay
@ 2022-06-21 22:19     ` Paul E. McKenney
  2022-06-22 11:46       ` Neeraj Upadhyay
  0 siblings, 1 reply; 39+ messages in thread
From: Paul E. McKenney @ 2022-06-21 22:19 UTC (permalink / raw)
  To: Neeraj Upadhyay; +Cc: rcu, linux-kernel, kernel-team, rostedt

On Tue, Jun 21, 2022 at 10:59:58AM +0530, Neeraj Upadhyay wrote:
> 
> 
> On 6/21/2022 3:50 AM, Paul E. McKenney wrote:
> > The force-quiesce-state loop function rcu_gp_fqs_loop() checks for
> > callback overloading and does an immediate initial scan for idle CPUs
> > if so.  However, subsequent rescans will be carried out at as leisurely a
> > rate as they always are, as specified by the rcutree.jiffies_till_next_fqs
> > module parameter.  It might be tempting to just continue immediately
> > rescanning, but this turns the RCU grace-period kthread into a CPU hog.
> > It might also be tempting to reduce the time between rescans to a single
> > jiffy, but this can be problematic on larger systems.
> > 
> > This commit therefore divides the normal time between rescans by three,
> > rounding up.  Thus a small system running at HZ=1000 that is suffering
> > from callback overload will wait only one jiffy instead of the normal
> > three between rescans.
> > 
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > ---
> >   kernel/rcu/tree.c | 5 +++++
> >   1 file changed, 5 insertions(+)
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index c25ba442044a6..c19d5926886fb 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -1993,6 +1993,11 @@ static noinline_for_stack void rcu_gp_fqs_loop(void)
> >   			WRITE_ONCE(rcu_state.jiffies_kick_kthreads,
> >   				   jiffies + (j ? 3 * j : 2));
> >   		}
> > +		if (rcu_state.cbovld) {
> > +			j = (j + 2) / 3;
> > +			if (j <= 0)
> > +				j = 1;
> > +		}
> 
> We update 'j' here, after setting rcu_state.jiffies_force_qs
> 
>     WRITE_ONCE(rcu_state.jiffies_force_qs, jiffies + j)
> 
> So, we return from swait_event_idle_timeout_exclusive() after one-third
> of the original timeout.
> 
>     swait_event_idle_timeout_exclusive(rcu_state.gp_wq,
> 				 rcu_gp_fqs_check_wake(&gf), j);
> 
> This can result in the !time_after() check returning false, and we will
> enter the 'else' (stray signal) block?
> 
> This might not matter for first 2 fqs loop iterations, where
> RCU_GP_FLAG_OVLD is set in 'gf', but subsequent iterations won't benefit
> from this patch?
> 
> 
> if (!time_after(rcu_state.jiffies_force_qs, jiffies) ||
> 	(gf & (RCU_GP_FLAG_FQS | RCU_GP_FLAG_OVLD))) {
> 			...
> } else {
> 	/* Deal with stray signal. */
> }
> 
> 
> So, do we need to move this calculation above the 'if' block which sets
> rcu_state.jiffies_force_qs?
> 		if (!ret) {
> 
> 			WRITE_ONCE(rcu_state.jiffies_force_qs, jiffies +
> 						j);...
> 		}

Good catch, thank you!  How about the updated patch shown below?

							Thanx, Paul

------------------------------------------------------------------------

commit 77de092c78f549b5c28075bfee9998a525d21f84
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Tue Apr 12 15:08:14 2022 -0700

    rcu: Decrease FQS scan wait time in case of callback overloading
    
    The force-quiesce-state loop function rcu_gp_fqs_loop() checks for
    callback overloading and does an immediate initial scan for idle CPUs
    if so.  However, subsequent rescans will be carried out at as leisurely a
    rate as they always are, as specified by the rcutree.jiffies_till_next_fqs
    module parameter.  It might be tempting to just continue immediately
    rescanning, but this turns the RCU grace-period kthread into a CPU hog.
    It might also be tempting to reduce the time between rescans to a single
    jiffy, but this can be problematic on larger systems.
    
    This commit therefore divides the normal time between rescans by three,
    rounding up.  Thus a small system running at HZ=1000 that is suffering
    from callback overload will wait only one jiffy instead of the normal
    three between rescans.
    
    [ paulmck: Apply Neeraj Upadhyay feedback. ]
    
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index c25ba442044a6..52094e72866e5 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1983,7 +1983,12 @@ static noinline_for_stack void rcu_gp_fqs_loop(void)
 		gf = RCU_GP_FLAG_OVLD;
 	ret = 0;
 	for (;;) {
-		if (!ret) {
+		if (rcu_state.cbovld) {
+			j = (j + 2) / 3;
+			if (j <= 0)
+				j = 1;
+		}
+		if (!ret || time_before(jiffies + j, rcu_state.jiffies_force_qs)) {
 			WRITE_ONCE(rcu_state.jiffies_force_qs, jiffies + j);
 			/*
 			 * jiffies_force_qs before RCU_GP_WAIT_FQS state

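As a quick sanity check of the divide-by-three rounding (illustrative
values only):

/* j = (j + 2) / 3 is integer round-up:                           */
/*   j = 3 (HZ=1000 default)  ->  (3 + 2) / 3  = 1 jiffy          */
/*   j = 10                   ->  (10 + 2) / 3 = 4 jiffies        */
/*   j = 0                    ->  (0 + 2) / 3  = 0, then forced   */
/*       back to 1 by the subsequent "if (j <= 0) j = 1;" check.  */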
^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH rcu 02/12] rcu: Avoid tracing a few functions executed in stop machine
  2022-06-21  5:47   ` Neeraj Upadhyay
@ 2022-06-21 22:21     ` Paul E. McKenney
  2022-06-22 11:50       ` Neeraj Upadhyay
  0 siblings, 1 reply; 39+ messages in thread
From: Paul E. McKenney @ 2022-06-21 22:21 UTC (permalink / raw)
  To: Neeraj Upadhyay; +Cc: rcu, linux-kernel, kernel-team, rostedt, Patrick Wang

On Tue, Jun 21, 2022 at 11:17:10AM +0530, Neeraj Upadhyay wrote:
> 
> 
> On 6/21/2022 3:50 AM, Paul E. McKenney wrote:
> > From: Patrick Wang <patrick.wang.shcn@gmail.com>
> > 
> > Stop-machine recently started calling additional functions while waiting:
> > 
> > ----------------------------------------------------------------
> > Former stop machine wait loop:
> > do {
> >      cpu_relax(); => macro
> >      ...
> > } while (curstate != STOPMACHINE_EXIT);
> > -----------------------------------------------------------------
> > Current stop machine wait loop:
> > do {
> >      stop_machine_yield(cpumask); => function (notraced)
> >      ...
> >      touch_nmi_watchdog(); => function (notraced, inside calls also notraced)
> >      ...
> >      rcu_momentary_dyntick_idle(); => function (notraced, inside calls traced)
> > } while (curstate != MULTI_STOP_EXIT);
> > ------------------------------------------------------------------
> > 
> > These functions (and the functions that they call) must be marked
> > notrace to prevent them from being updated while they are executing.
> > The consequences of failing to mark these functions can be severe:
> > 
> >    rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> >    rcu: 	1-...!: (0 ticks this GP) idle=14f/1/0x4000000000000000 softirq=3397/3397 fqs=0
> >    rcu: 	3-...!: (0 ticks this GP) idle=ee9/1/0x4000000000000000 softirq=5168/5168 fqs=0
> >    	(detected by 0, t=8137 jiffies, g=5889, q=2 ncpus=4)
> >    Task dump for CPU 1:
> >    task:migration/1     state:R  running task     stack:    0 pid:   19 ppid:     2 flags:0x00000000
> >    Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
> >    Call Trace:
> >    Task dump for CPU 3:
> >    task:migration/3     state:R  running task     stack:    0 pid:   29 ppid:     2 flags:0x00000000
> >    Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
> >    Call Trace:
> >    rcu: rcu_preempt kthread timer wakeup didn't happen for 8136 jiffies! g5889 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
> >    rcu: 	Possible timer handling issue on cpu=2 timer-softirq=594
> >    rcu: rcu_preempt kthread starved for 8137 jiffies! g5889 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2
> >    rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
> >    rcu: RCU grace-period kthread stack dump:
> >    task:rcu_preempt     state:I stack:    0 pid:   14 ppid:     2 flags:0x00000000
> >    Call Trace:
> >      schedule+0x56/0xc2
> >      schedule_timeout+0x82/0x184
> >      rcu_gp_fqs_loop+0x19a/0x318
> >      rcu_gp_kthread+0x11a/0x140
> >      kthread+0xee/0x118
> >      ret_from_exception+0x0/0x14
> >    rcu: Stack dump where RCU GP kthread last ran:
> >    Task dump for CPU 2:
> >    task:migration/2     state:R  running task     stack:    0 pid:   24 ppid:     2 flags:0x00000000
> >    Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
> >    Call Trace:
> > 
> > This commit therefore marks these functions notrace:
> >   rcu_preempt_deferred_qs()
> >   rcu_preempt_need_deferred_qs()
> >   rcu_preempt_deferred_qs_irqrestore()
> > 
> 
> Only the preemptible RCU definitions are updated; so, this change is not
> required for the non-preemptible RCU case?

It appears to me to be required.  How about as shown below?

							Thanx, Paul

------------------------------------------------------------------------

commit 06cfe0c675c93884c3ffc75ec24ece7d0acd7a32
Author: Patrick Wang <patrick.wang.shcn@gmail.com>
Date:   Tue Apr 26 18:45:02 2022 +0800

    rcu: Avoid tracing a few functions executed in stop machine
    
    Stop-machine recently started calling additional functions while waiting:
    
    ----------------------------------------------------------------
    Former stop machine wait loop:
    do {
        cpu_relax(); => macro
        ...
    } while (curstate != STOPMACHINE_EXIT);
    -----------------------------------------------------------------
    Current stop machine wait loop:
    do {
        stop_machine_yield(cpumask); => function (notraced)
        ...
        touch_nmi_watchdog(); => function (notraced, inside calls also notraced)
        ...
        rcu_momentary_dyntick_idle(); => function (notraced, inside calls traced)
    } while (curstate != MULTI_STOP_EXIT);
    ------------------------------------------------------------------
    
    These functions (and the functions that they call) must be marked
    notrace to prevent them from being updated while they are executing.
    The consequences of failing to mark these functions can be severe:
    
      rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
      rcu:  1-...!: (0 ticks this GP) idle=14f/1/0x4000000000000000 softirq=3397/3397 fqs=0
      rcu:  3-...!: (0 ticks this GP) idle=ee9/1/0x4000000000000000 softirq=5168/5168 fqs=0
            (detected by 0, t=8137 jiffies, g=5889, q=2 ncpus=4)
      Task dump for CPU 1:
      task:migration/1     state:R  running task     stack:    0 pid:   19 ppid:     2 flags:0x00000000
      Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
      Call Trace:
      Task dump for CPU 3:
      task:migration/3     state:R  running task     stack:    0 pid:   29 ppid:     2 flags:0x00000000
      Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
      Call Trace:
      rcu: rcu_preempt kthread timer wakeup didn't happen for 8136 jiffies! g5889 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
      rcu:  Possible timer handling issue on cpu=2 timer-softirq=594
      rcu: rcu_preempt kthread starved for 8137 jiffies! g5889 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2
      rcu:  Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
      rcu: RCU grace-period kthread stack dump:
      task:rcu_preempt     state:I stack:    0 pid:   14 ppid:     2 flags:0x00000000
      Call Trace:
        schedule+0x56/0xc2
        schedule_timeout+0x82/0x184
        rcu_gp_fqs_loop+0x19a/0x318
        rcu_gp_kthread+0x11a/0x140
        kthread+0xee/0x118
        ret_from_exception+0x0/0x14
      rcu: Stack dump where RCU GP kthread last ran:
      Task dump for CPU 2:
      task:migration/2     state:R  running task     stack:    0 pid:   24 ppid:     2 flags:0x00000000
      Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
      Call Trace:
    
    This commit therefore marks these functions notrace:
     rcu_preempt_deferred_qs()
     rcu_preempt_need_deferred_qs()
     rcu_preempt_deferred_qs_irqrestore()
    
    [ paulmck: Apply feedback from Neeraj Upadhyay. ]
    
    Signed-off-by: Patrick Wang <patrick.wang.shcn@gmail.com>
    Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index c8ba0fe17267c..7a07f2ca153e2 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -460,7 +460,7 @@ static bool rcu_preempt_has_tasks(struct rcu_node *rnp)
  * be quite short, for example, in the case of the call from
  * rcu_read_unlock_special().
  */
-static void
+static notrace void
 rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
 {
 	bool empty_exp;
@@ -581,7 +581,7 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
  * is disabled.  This function cannot be expected to understand these
  * nuances, so the caller must handle them.
  */
-static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
+static notrace bool rcu_preempt_need_deferred_qs(struct task_struct *t)
 {
 	return (__this_cpu_read(rcu_data.cpu_no_qs.b.exp) ||
 		READ_ONCE(t->rcu_read_unlock_special.s)) &&
@@ -595,7 +595,7 @@ static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
  * evaluate safety in terms of interrupt, softirq, and preemption
  * disabling.
  */
-static void rcu_preempt_deferred_qs(struct task_struct *t)
+static notrace void rcu_preempt_deferred_qs(struct task_struct *t)
 {
 	unsigned long flags;
 
@@ -926,7 +926,7 @@ static bool rcu_preempt_has_tasks(struct rcu_node *rnp)
  * Because there is no preemptible RCU, there can be no deferred quiescent
  * states.
  */
-static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
+static notrace bool rcu_preempt_need_deferred_qs(struct task_struct *t)
 {
 	return false;
 }
@@ -935,7 +935,7 @@ static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
 // period for a quiescent state from this CPU.  Note that requests from
 // tasks are handled when removing the task from the blocked-tasks list
 // below.
-static void rcu_preempt_deferred_qs(struct task_struct *t)
+static notrace void rcu_preempt_deferred_qs(struct task_struct *t)
 {
 	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
 

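For completeness, the notrace annotation maps (in the common case) to a
compiler attribute defined in include/linux/compiler_types.h,
approximately:

#define notrace __attribute__((__no_instrument_function__))

which keeps these functions out of the set of traceable entry points that
ftrace may be rewriting while stop-machine executes them.  (Some
architectures select hotpatch- or patchable-function-entry-based variants
instead.)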
^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH rcu 03/12] rcu: Add rnp->cbovldmask check in rcutree_migrate_callbacks()
  2022-06-21  5:57   ` Neeraj Upadhyay
@ 2022-06-21 22:22     ` Paul E. McKenney
  0 siblings, 0 replies; 39+ messages in thread
From: Paul E. McKenney @ 2022-06-21 22:22 UTC (permalink / raw)
  To: Neeraj Upadhyay; +Cc: rcu, linux-kernel, kernel-team, rostedt, Zqiang

On Tue, Jun 21, 2022 at 11:27:28AM +0530, Neeraj Upadhyay wrote:
> 
> 
> On 6/21/2022 3:50 AM, Paul E. McKenney wrote:
> > From: Zqiang <qiang1.zhang@intel.com>
> > 
> > Currently, the rcu_node structure's ->cbovlmask field is set in call_rcu()
> > when a given CPU is suffering from callback overload.  But if that CPU
> > goes offline, the outgoing CPU's callbacks are migrated to the running
> > CPU, which is likely to overload the running CPU.  However, that CPU's
> > bit in its leaf rcu_node structure's ->cbovlmask field remains zero.
> > 
> > Initially, this is OK because the outgoing CPU's bit remains set.
> > However, that bit will be cleared at the next end of a grace period,
> > at which time it is quite possible that the running CPU will still
> > be overloaded.  If the running CPU invokes call_rcu(), then overload
> > will be checked for and the bit will be set.  Except that there is no
> > guarantee that the running CPU will invoke call_rcu(), in which case the
> > next grace period will fail to take the running CPU's overload condition
> > into account.  Plus, because the bit is not set, the end of the grace
> > period won't check for overload on this CPU.
> > 
> > This commit therefore adds a call to check_cb_ovld_locked() in
> > check_cb_ovld_locked() to set the running CPU's ->cbovlmask bit
> 
> Nit: s/check_cb_ovld_locked/rcutree_migrate_callbacks/

Good catch, fixed!

> > appropriately.
> > 
> > Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > ---
> 
> Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>

Thank you, applied.

							Thanx, Paul

> Thanks
> Neeraj
> 
> >   kernel/rcu/tree.c | 1 +
> >   1 file changed, 1 insertion(+)
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index c19d5926886fb..f4a37f2032664 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -4491,6 +4491,7 @@ void rcutree_migrate_callbacks(int cpu)
> >   	needwake = needwake || rcu_advance_cbs(my_rnp, my_rdp);
> >   	rcu_segcblist_disable(&rdp->cblist);
> >   	WARN_ON_ONCE(rcu_segcblist_empty(&my_rdp->cblist) != !rcu_segcblist_n_cbs(&my_rdp->cblist));
> > +	check_cb_ovld_locked(my_rdp, my_rnp);
> >   	if (rcu_rdp_is_offloaded(my_rdp)) {
> >   		raw_spin_unlock_rcu_node(my_rnp); /* irqs remain disabled. */
> >   		__call_rcu_nocb_wake(my_rdp, true, flags);

^ permalink raw reply	[flat|nested] 39+ messages in thread
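
For context, the helper invoked by the added line has roughly the following
shape.  This is a paraphrase of check_cb_ovld_locked() from kernel/rcu/tree.c
reconstructed for illustration; consult the tree for the authoritative body.
Here qovld_calc is the boot-computed overload threshold.

/* Paraphrased sketch: with the leaf rcu_node lock held, set this CPU's
 * bit in ->cbovldmask when its callback list meets the overload
 * threshold, and clear the bit otherwise. */
static void check_cb_ovld_locked(struct rcu_data *rdp, struct rcu_node *rnp)
{
	if (qovld_calc <= 0)
		return;	/* Overload checking not yet enabled. */
	if (rcu_segcblist_n_cbs(&rdp->cblist) >= qovld_calc)
		WRITE_ONCE(rnp->cbovldmask, rnp->cbovldmask | rdp->grpmask);
	else
		WRITE_ONCE(rnp->cbovldmask, rnp->cbovldmask & ~rdp->grpmask);
}

With the one-line addition in the patch, the CPU that absorbed the outgoing
CPU's callbacks gets its bit set immediately, so the next grace period sees
the overload even if that CPU never invokes call_rcu() itself.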

* Re: [PATCH rcu 08/12] rcu: Cleanup RCU urgency state for offline CPU
  2022-06-21  7:03   ` Neeraj Upadhyay
@ 2022-06-21 22:24     ` Paul E. McKenney
  0 siblings, 0 replies; 39+ messages in thread
From: Paul E. McKenney @ 2022-06-21 22:24 UTC (permalink / raw)
  To: Neeraj Upadhyay; +Cc: rcu, linux-kernel, kernel-team, rostedt, Zqiang

On Tue, Jun 21, 2022 at 12:33:12PM +0530, Neeraj Upadhyay wrote:
> 
> 
> On 6/21/2022 3:50 AM, Paul E. McKenney wrote:
> > From: Zqiang <qiang1.zhang@intel.com>
> > 
> > When a CPU is slow to provide a quiescent state for a given grace
> > period, RCU takes steps to encourage that CPU to get with the
> > quiescent-state program in a more timely fashion.  These steps
> > include these flags in the rcu_data structure:
> > 
> > 1.	->rcu_urgent_qs, which causes the scheduling-clock interrupt to
> > 	request an otherwise pointless context switch from the scheduler.
> > 
> > 2.	->rcu_need_heavy_qs, which causes both cond_resched() and RCU's
> > 	context-switch hook to do an immediate momentary quiescent state.
> > 
> > 3.	->rcu_need_heavy_qs, which causes the scheduler-clock tick to
> 
> nit: s/->rcu_need_heavy_qs/->rcu_forced_tick/ ?

This one got lost in the shuffle, so I will apply it on my next rebase.

> > 	be enabled even on nohz_full CPUs with only one runnable task.
> > 
> > These flags are of course cleared once the corresponding CPU has passed
> > through a quiescent state.  Unless that quiescent state is the CPU
> > going offline, which means that when the CPU comes back online, it will
> > needlessly consume additional CPU time and incur additional latency,
> > which constitutes a minor but very real performance bug.
> > 
> > This commit therefore adds the call to rcu_disable_urgency_upon_qs()
> > that clears these flags to the CPU-hotplug offlining code path.
> > 
> > Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > ---
> 
> 
> Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>

And again, thank you!

							Thanx, Paul

> Thanks
> Neeraj
> 
> >   kernel/rcu/tree.c | 1 +
> >   1 file changed, 1 insertion(+)
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index f4a37f2032664..5445b19b48408 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -4446,6 +4446,7 @@ void rcu_report_dead(unsigned int cpu)
> >   	rdp->rcu_ofl_gp_flags = READ_ONCE(rcu_state.gp_flags);
> >   	if (rnp->qsmask & mask) { /* RCU waiting on outgoing CPU? */
> >   		/* Report quiescent state -before- changing ->qsmaskinitnext! */
> > +		rcu_disable_urgency_upon_qs(rdp);
> >   		rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
> >   		raw_spin_lock_irqsave_rcu_node(rnp, flags);
> >   	}

^ permalink raw reply	[flat|nested] 39+ messages in thread
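
For concreteness, clearing the urgency state amounts to roughly the
following.  The field names match struct rcu_data, but the body below is an
illustrative paraphrase of rcu_disable_urgency_upon_qs(), not the verbatim
in-tree code.

/* Paraphrased sketch: drop the urgency hints enumerated in the commit
 * log once this CPU has supplied its quiescent state, so that a later
 * onlining does not inherit stale grace-period pressure. */
static void rcu_disable_urgency_upon_qs(struct rcu_data *rdp)
{
	WRITE_ONCE(rdp->rcu_urgent_qs, false);		/* Hint 1 above. */
	WRITE_ONCE(rdp->rcu_need_heavy_qs, false);	/* Hint 2 above. */
	if (tick_nohz_full_cpu(rdp->cpu) && rdp->rcu_forced_tick) {
		tick_dep_clear_cpu(rdp->cpu, TICK_DEP_BIT_RCU);
		WRITE_ONCE(rdp->rcu_forced_tick, false); /* Hint 3 above. */
	}
}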

* Re: [PATCH rcu 12/12] srcu: Block less aggressively for expedited grace periods
  2022-06-21 10:13   ` Neeraj Upadhyay
@ 2022-06-21 22:25     ` Paul E. McKenney
  0 siblings, 0 replies; 39+ messages in thread
From: Paul E. McKenney @ 2022-06-21 22:25 UTC (permalink / raw)
  To: Neeraj Upadhyay
  Cc: rcu, linux-kernel, kernel-team, rostedt, Zhangfei Gao,
	Shameerali Kolothum Thodi, Paolo Bonzini

On Tue, Jun 21, 2022 at 03:43:30PM +0530, Neeraj Upadhyay wrote:
> 
> 
> On 6/21/2022 3:50 AM, Paul E. McKenney wrote:
> > Commit 282d8998e997 ("srcu: Prevent expedited GPs and blocking readers
> > from consuming CPU") fixed a problem where a long-running expedited SRCU
> > grace period could block kernel live patching.  It did so by giving up
> > on expediting once a given SRCU expedited grace period grew too old.
> > 
> > Unfortunately, this added excessive delays to boots of embedded systems
> > running on qemu that use the ARM IORT RMR feature.  This commit therefore
> > makes the transition away from expediting less aggressive, increasing
> > the number of non-sleeping polls of readers per grace-period phase from
> > one to three and increasing the required grace-period age from one jiffy
> > (actually from zero to one jiffies) to two jiffies (actually from one
> > to two jiffies).
> > 
> > Fixes: 282d8998e997 ("srcu: Prevent expedited GPs and blocking readers from consuming CPU")
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > Reported-by: Zhangfei Gao <zhangfei.gao@linaro.org>
> > Reported-by: "chenxiang (M)" <chenxiang66@hisilicon.com>
> > Cc: Shameerali Kolothum Thodi  <shameerali.kolothum.thodi@huawei.com>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Link: https://lore.kernel.org/all/20615615-0013-5adc-584f-2b1d5c03ebfc@linaro.org/
> > ---
> 
> 
> Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>

And I applied your Reviewed-by to the other commits you gave it for,
and thank you very much!  Much nicer to fix the bugs now than to have
to do separate patches for them later.  ;-)

							Thanx, Paul

> Thanks
> Neeraj
> 
> >   kernel/rcu/srcutree.c | 20 +++++++++++++-------
> >   1 file changed, 13 insertions(+), 7 deletions(-)
> > 
> > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> > index 50ba70f019dea..0db7873f4e95b 100644
> > --- a/kernel/rcu/srcutree.c
> > +++ b/kernel/rcu/srcutree.c
> > @@ -513,7 +513,7 @@ static bool srcu_readers_active(struct srcu_struct *ssp)
> >   #define SRCU_INTERVAL		1	// Base delay if no expedited GPs pending.
> >   #define SRCU_MAX_INTERVAL	10	// Maximum incremental delay from slow readers.
> > -#define SRCU_MAX_NODELAY_PHASE	1	// Maximum per-GP-phase consecutive no-delay instances.
> > +#define SRCU_MAX_NODELAY_PHASE	3	// Maximum per-GP-phase consecutive no-delay instances.
> >   #define SRCU_MAX_NODELAY	100	// Maximum consecutive no-delay instances.
> >   /*
> > @@ -522,16 +522,22 @@ static bool srcu_readers_active(struct srcu_struct *ssp)
> >    */
> >   static unsigned long srcu_get_delay(struct srcu_struct *ssp)
> >   {
> > +	unsigned long gpstart;
> > +	unsigned long j;
> >   	unsigned long jbase = SRCU_INTERVAL;
> >   	if (ULONG_CMP_LT(READ_ONCE(ssp->srcu_gp_seq), READ_ONCE(ssp->srcu_gp_seq_needed_exp)))
> >   		jbase = 0;
> > -	if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq)))
> > -		jbase += jiffies - READ_ONCE(ssp->srcu_gp_start);
> > -	if (!jbase) {
> > -		WRITE_ONCE(ssp->srcu_n_exp_nodelay, READ_ONCE(ssp->srcu_n_exp_nodelay) + 1);
> > -		if (READ_ONCE(ssp->srcu_n_exp_nodelay) > SRCU_MAX_NODELAY_PHASE)
> > -			jbase = 1;
> > +	if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq))) {
> > +		j = jiffies - 1;
> > +		gpstart = READ_ONCE(ssp->srcu_gp_start);
> > +		if (time_after(j, gpstart))
> > +			jbase += j - gpstart;
> > +		if (!jbase) {
> > +			WRITE_ONCE(ssp->srcu_n_exp_nodelay, READ_ONCE(ssp->srcu_n_exp_nodelay) + 1);
> > +			if (READ_ONCE(ssp->srcu_n_exp_nodelay) > SRCU_MAX_NODELAY_PHASE)
> > +				jbase = 1;
> > +		}
> >   	}
> >   	return jbase > SRCU_MAX_INTERVAL ? SRCU_MAX_INTERVAL : jbase;
> >   }

^ permalink raw reply	[flat|nested] 39+ messages in thread
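
The arithmetic in the new srcu_get_delay() can be exercised outside the
kernel.  The userspace harness below is hypothetical and reuses the kernel
names only for readability; it demonstrates the intended behavior, namely
that an expedited grace period now gets SRCU_MAX_NODELAY_PHASE no-delay
polls per phase before the delay is bumped to one, and that a grace period
must be at least two jiffies old before its age contributes to the delay.

#include <stdio.h>

#define SRCU_INTERVAL		1	/* Values as in the patch. */
#define SRCU_MAX_INTERVAL	10
#define SRCU_MAX_NODELAY_PHASE	3

static unsigned long n_exp_nodelay;	/* Stand-in for ssp->srcu_n_exp_nodelay. */

static unsigned long get_delay(unsigned long jiffies, unsigned long gp_start,
			       int gp_in_progress, int exp_pending)
{
	unsigned long jbase = exp_pending ? 0 : SRCU_INTERVAL;

	if (gp_in_progress) {
		unsigned long j = jiffies - 1;	/* GP must be >= 2 jiffies old. */

		if (j > gp_start)		/* Stand-in for time_after(). */
			jbase += j - gp_start;
		if (!jbase && ++n_exp_nodelay > SRCU_MAX_NODELAY_PHASE)
			jbase = 1;
	}
	return jbase > SRCU_MAX_INTERVAL ? SRCU_MAX_INTERVAL : jbase;
}

int main(void)
{
	/* Expedited GP that started this jiffy: prints 0, 0, 0, 1, 1. */
	for (int i = 0; i < 5; i++)
		printf("poll %d -> delay %lu\n", i, get_delay(100, 100, 1, 1));
	return 0;
}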

* Re: [PATCH rcu 01/12] rcu: Decrease FQS scan wait time in case of callback overloading
  2022-06-21 22:19     ` Paul E. McKenney
@ 2022-06-22 11:46       ` Neeraj Upadhyay
  0 siblings, 0 replies; 39+ messages in thread
From: Neeraj Upadhyay @ 2022-06-22 11:46 UTC (permalink / raw)
  To: paulmck; +Cc: rcu, linux-kernel, kernel-team, rostedt



On 6/22/2022 3:49 AM, Paul E. McKenney wrote:
> On Tue, Jun 21, 2022 at 10:59:58AM +0530, Neeraj Upadhyay wrote:
>>
>>
>> On 6/21/2022 3:50 AM, Paul E. McKenney wrote:
>>> The force-quiesce-state loop function rcu_gp_fqs_loop() checks for
>>> callback overloading and does an immediate initial scan for idle CPUs
>>> if so.  However, subsequent rescans will be carried out at as leisurely a
>>> rate as they always are, as specified by the rcutree.jiffies_till_next_fqs
>>> module parameter.  It might be tempting to just continue immediately
>>> rescanning, but this turns the RCU grace-period kthread into a CPU hog.
>>> It might also be tempting to reduce the time between rescans to a single
>>> jiffy, but this can be problematic on larger systems.
>>>
>>> This commit therefore divides the normal time between rescans by three,
>>> rounding up.  Thus a small system running at HZ=1000 that is suffering
>>> from callback overload will wait only one jiffy instead of the normal
>>> three between rescans.
>>>
>>> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
>>> ---
>>>    kernel/rcu/tree.c | 5 +++++
>>>    1 file changed, 5 insertions(+)
>>>
>>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>>> index c25ba442044a6..c19d5926886fb 100644
>>> --- a/kernel/rcu/tree.c
>>> +++ b/kernel/rcu/tree.c
>>> @@ -1993,6 +1993,11 @@ static noinline_for_stack void rcu_gp_fqs_loop(void)
>>>    			WRITE_ONCE(rcu_state.jiffies_kick_kthreads,
>>>    				   jiffies + (j ? 3 * j : 2));
>>>    		}
>>> +		if (rcu_state.cbovld) {
>>> +			j = (j + 2) / 3;
>>> +			if (j <= 0)
>>> +				j = 1;
>>> +		}
>>
>> We update 'j' here, after setting rcu_state.jiffies_force_qs
>>
>>      WRITE_ONCE(rcu_state.jiffies_force_qs, jiffies + j)
>>
>> So, we return from swait_event_idle_timeout_exclusive() after one third
>> of the timeout duration.
>>
>>      swait_event_idle_timeout_exclusive(rcu_state.gp_wq,
>> 				 rcu_gp_fqs_check_wake(&gf), j);
>>
>> This can result in the !time_after() check returning false, and we will
>> enter the 'else' (stray-signal) block?
>>
>> This might not matter for the first 2 fqs loop iterations, where
>> RCU_GP_FLAG_OVLD is set in 'gf', but subsequent iterations won't benefit
>> from this patch?
>>
>>
>> if (!time_after(rcu_state.jiffies_force_qs, jiffies) ||
>> 	(gf & (RCU_GP_FLAG_FQS | RCU_GP_FLAG_OVLD))) {
>> 			...
>> } else {
>> 	/* Deal with stray signal. */
>> }
>>
>>
>> So, do we need to move this calculation above the 'if' block which sets
>> rcu_state.jiffies_force_qs?
>> 		if (!ret) {
>>
>> 			WRITE_ONCE(rcu_state.jiffies_force_qs, jiffies +
>> 						j);...
>> 		}
> 
> Good catch, thank you!  How about the updated patch shown below?
> 

Looks good to me.


Thanks
Neeraj

> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> commit 77de092c78f549b5c28075bfee9998a525d21f84
> Author: Paul E. McKenney <paulmck@kernel.org>
> Date:   Tue Apr 12 15:08:14 2022 -0700
> 
>      rcu: Decrease FQS scan wait time in case of callback overloading
>      
>      The force-quiesce-state loop function rcu_gp_fqs_loop() checks for
>      callback overloading and does an immediate initial scan for idle CPUs
>      if so.  However, subsequent rescans will be carried out at as leisurely a
>      rate as they always are, as specified by the rcutree.jiffies_till_next_fqs
>      module parameter.  It might be tempting to just continue immediately
>      rescanning, but this turns the RCU grace-period kthread into a CPU hog.
>      It might also be tempting to reduce the time between rescans to a single
>      jiffy, but this can be problematic on larger systems.
>      
>      This commit therefore divides the normal time between rescans by three,
>      rounding up.  Thus a small system running at HZ=1000 that is suffering
>      from callback overload will wait only one jiffy instead of the normal
>      three between rescans.
>      
>      [ paulmck: Apply Neeraj Upadhyay feedback. ]
>      
>      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index c25ba442044a6..52094e72866e5 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1983,7 +1983,12 @@ static noinline_for_stack void rcu_gp_fqs_loop(void)
>   		gf = RCU_GP_FLAG_OVLD;
>   	ret = 0;
>   	for (;;) {
> -		if (!ret) {
> +		if (rcu_state.cbovld) {
> +			j = (j + 2) / 3;
> +			if (j <= 0)
> +				j = 1;
> +		}
> +		if (!ret || time_before(jiffies + j, rcu_state.jiffies_force_qs)) {
>   			WRITE_ONCE(rcu_state.jiffies_force_qs, jiffies + j);
>   			/*
>   			 * jiffies_force_qs before RCU_GP_WAIT_FQS state

^ permalink raw reply	[flat|nested] 39+ messages in thread
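
As a sanity check on the rounding in the final version, the divide-by-three
step can be run in isolation.  The snippet below is a plain userspace sketch
of the arithmetic only, not kernel code:

#include <stdio.h>

int main(void)
{
	/* j = (j + 2) / 3 rounds up: 1..3 -> 1, 4..6 -> 2, 7..9 -> 3.
	 * The patch additionally floors the result at one jiffy. */
	for (unsigned long j = 0; j <= 9; j++) {
		unsigned long jnew = (j + 2) / 3;

		if (jnew == 0)	/* The patch writes this as j <= 0. */
			jnew = 1;
		printf("j=%lu -> %lu\n", j, jnew);
	}
	return 0;
}

With the default rcutree.jiffies_till_next_fqs of three jiffies at HZ=1000,
this yields a one-jiffy wait under callback overload, matching the commit
log above.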

* Re: [PATCH rcu 02/12] rcu: Avoid tracing a few functions executed in stop machine
  2022-06-21 22:21     ` Paul E. McKenney
@ 2022-06-22 11:50       ` Neeraj Upadhyay
  2022-06-22 15:35         ` Paul E. McKenney
  0 siblings, 1 reply; 39+ messages in thread
From: Neeraj Upadhyay @ 2022-06-22 11:50 UTC (permalink / raw)
  To: paulmck; +Cc: rcu, linux-kernel, kernel-team, rostedt, Patrick Wang



On 6/22/2022 3:51 AM, Paul E. McKenney wrote:
> On Tue, Jun 21, 2022 at 11:17:10AM +0530, Neeraj Upadhyay wrote:
>>
>>
>> On 6/21/2022 3:50 AM, Paul E. McKenney wrote:
>>> From: Patrick Wang <patrick.wang.shcn@gmail.com>
>>>
>>> Stop-machine recently started calling additional functions while waiting:
>>>
>>> ----------------------------------------------------------------
>>> Former stop machine wait loop:
>>> do {
>>>       cpu_relax(); => macro
>>>       ...
>>> } while (curstate != STOPMACHINE_EXIT);
>>> -----------------------------------------------------------------
>>> Current stop machine wait loop:
>>> do {
>>>       stop_machine_yield(cpumask); => function (notraced)
>>>       ...
>>>       touch_nmi_watchdog(); => function (notraced, inside calls also notraced)
>>>       ...
>>>       rcu_momentary_dyntick_idle(); => function (notraced, inside calls traced)
>>> } while (curstate != MULTI_STOP_EXIT);
>>> ------------------------------------------------------------------
>>>
>>> These functions (and the functions that they call) must be marked
>>> notrace to prevent them from being updated while they are executing.
>>> The consequences of failing to mark these functions can be severe:
>>>
>>>     rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
>>>     rcu: 	1-...!: (0 ticks this GP) idle=14f/1/0x4000000000000000 softirq=3397/3397 fqs=0
>>>     rcu: 	3-...!: (0 ticks this GP) idle=ee9/1/0x4000000000000000 softirq=5168/5168 fqs=0
>>>     	(detected by 0, t=8137 jiffies, g=5889, q=2 ncpus=4)
>>>     Task dump for CPU 1:
>>>     task:migration/1     state:R  running task     stack:    0 pid:   19 ppid:     2 flags:0x00000000
>>>     Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
>>>     Call Trace:
>>>     Task dump for CPU 3:
>>>     task:migration/3     state:R  running task     stack:    0 pid:   29 ppid:     2 flags:0x00000000
>>>     Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
>>>     Call Trace:
>>>     rcu: rcu_preempt kthread timer wakeup didn't happen for 8136 jiffies! g5889 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
>>>     rcu: 	Possible timer handling issue on cpu=2 timer-softirq=594
>>>     rcu: rcu_preempt kthread starved for 8137 jiffies! g5889 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2
>>>     rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
>>>     rcu: RCU grace-period kthread stack dump:
>>>     task:rcu_preempt     state:I stack:    0 pid:   14 ppid:     2 flags:0x00000000
>>>     Call Trace:
>>>       schedule+0x56/0xc2
>>>       schedule_timeout+0x82/0x184
>>>       rcu_gp_fqs_loop+0x19a/0x318
>>>       rcu_gp_kthread+0x11a/0x140
>>>       kthread+0xee/0x118
>>>       ret_from_exception+0x0/0x14
>>>     rcu: Stack dump where RCU GP kthread last ran:
>>>     Task dump for CPU 2:
>>>     task:migration/2     state:R  running task     stack:    0 pid:   24 ppid:     2 flags:0x00000000
>>>     Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
>>>     Call Trace:
>>>
>>> This commit therefore marks these functions notrace:
>>>    rcu_preempt_deferred_qs()
>>>    rcu_preempt_need_deferred_qs()
>>>    rcu_preempt_deferred_qs_irqrestore()
>>>
>>
>> Only the preemptible RCU definitions are updated; so, this change is not
>> required for non-preemptible RCU case?
> 
> It appears to me to be required.  How about as shown below?
> 

Looks good to me.


Thanks
Neeraj

> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> commit 06cfe0c675c93884c3ffc75ec24ece7d0acd7a32
> Author: Patrick Wang <patrick.wang.shcn@gmail.com>
> Date:   Tue Apr 26 18:45:02 2022 +0800
> 
>      rcu: Avoid tracing a few functions executed in stop machine
>      
>      Stop-machine recently started calling additional functions while waiting:
>      
>      ----------------------------------------------------------------
>      Former stop machine wait loop:
>      do {
>          cpu_relax(); => macro
>          ...
>      } while (curstate != STOPMACHINE_EXIT);
>      -----------------------------------------------------------------
>      Current stop machine wait loop:
>      do {
>          stop_machine_yield(cpumask); => function (notraced)
>          ...
>          touch_nmi_watchdog(); => function (notraced, inside calls also notraced)
>          ...
>          rcu_momentary_dyntick_idle(); => function (notraced, inside calls traced)
>      } while (curstate != MULTI_STOP_EXIT);
>      ------------------------------------------------------------------
>      
>      These functions (and the functions that they call) must be marked
>      notrace to prevent them from being updated while they are executing.
>      The consequences of failing to mark these functions can be severe:
>      
>        rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
>        rcu:  1-...!: (0 ticks this GP) idle=14f/1/0x4000000000000000 softirq=3397/3397 fqs=0
>        rcu:  3-...!: (0 ticks this GP) idle=ee9/1/0x4000000000000000 softirq=5168/5168 fqs=0
>              (detected by 0, t=8137 jiffies, g=5889, q=2 ncpus=4)
>        Task dump for CPU 1:
>        task:migration/1     state:R  running task     stack:    0 pid:   19 ppid:     2 flags:0x00000000
>        Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
>        Call Trace:
>        Task dump for CPU 3:
>        task:migration/3     state:R  running task     stack:    0 pid:   29 ppid:     2 flags:0x00000000
>        Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
>        Call Trace:
>        rcu: rcu_preempt kthread timer wakeup didn't happen for 8136 jiffies! g5889 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
>        rcu:  Possible timer handling issue on cpu=2 timer-softirq=594
>        rcu: rcu_preempt kthread starved for 8137 jiffies! g5889 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2
>        rcu:  Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
>        rcu: RCU grace-period kthread stack dump:
>        task:rcu_preempt     state:I stack:    0 pid:   14 ppid:     2 flags:0x00000000
>        Call Trace:
>          schedule+0x56/0xc2
>          schedule_timeout+0x82/0x184
>          rcu_gp_fqs_loop+0x19a/0x318
>          rcu_gp_kthread+0x11a/0x140
>          kthread+0xee/0x118
>          ret_from_exception+0x0/0x14
>        rcu: Stack dump where RCU GP kthread last ran:
>        Task dump for CPU 2:
>        task:migration/2     state:R  running task     stack:    0 pid:   24 ppid:     2 flags:0x00000000
>        Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
>        Call Trace:
>      
>      This commit therefore marks these functions notrace:
>       rcu_preempt_deferred_qs()
>       rcu_preempt_need_deferred_qs()
>       rcu_preempt_deferred_qs_irqrestore()
>      
>      [ paulmck: Apply feedback from Neeraj Upadhyay. ]
>      
>      Signed-off-by: Patrick Wang <patrick.wang.shcn@gmail.com>
>      Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
>      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> 
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index c8ba0fe17267c..7a07f2ca153e2 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -460,7 +460,7 @@ static bool rcu_preempt_has_tasks(struct rcu_node *rnp)
>    * be quite short, for example, in the case of the call from
>    * rcu_read_unlock_special().
>    */
> -static void
> +static notrace void
>   rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
>   {
>   	bool empty_exp;
> @@ -581,7 +581,7 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
>    * is disabled.  This function cannot be expected to understand these
>    * nuances, so the caller must handle them.
>    */
> -static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
> +static notrace bool rcu_preempt_need_deferred_qs(struct task_struct *t)
>   {
>   	return (__this_cpu_read(rcu_data.cpu_no_qs.b.exp) ||
>   		READ_ONCE(t->rcu_read_unlock_special.s)) &&
> @@ -595,7 +595,7 @@ static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
>    * evaluate safety in terms of interrupt, softirq, and preemption
>    * disabling.
>    */
> -static void rcu_preempt_deferred_qs(struct task_struct *t)
> +static notrace void rcu_preempt_deferred_qs(struct task_struct *t)
>   {
>   	unsigned long flags;
>   
> @@ -926,7 +926,7 @@ static bool rcu_preempt_has_tasks(struct rcu_node *rnp)
>    * Because there is no preemptible RCU, there can be no deferred quiescent
>    * states.
>    */
> -static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
> +static notrace bool rcu_preempt_need_deferred_qs(struct task_struct *t)
>   {
>   	return false;
>   }
> @@ -935,7 +935,7 @@ static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
>   // period for a quiescent state from this CPU.  Note that requests from
>   // tasks are handled when removing the task from the blocked-tasks list
>   // below.
> -static void rcu_preempt_deferred_qs(struct task_struct *t)
> +static notrace void rcu_preempt_deferred_qs(struct task_struct *t)
>   {
>   	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
>   

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH rcu 02/12] rcu: Avoid tracing a few functions executed in stop machine
  2022-06-22 11:50       ` Neeraj Upadhyay
@ 2022-06-22 15:35         ` Paul E. McKenney
  2022-06-22 15:49           ` Neeraj Upadhyay
  0 siblings, 1 reply; 39+ messages in thread
From: Paul E. McKenney @ 2022-06-22 15:35 UTC (permalink / raw)
  To: Neeraj Upadhyay; +Cc: rcu, linux-kernel, kernel-team, rostedt, Patrick Wang

On Wed, Jun 22, 2022 at 05:20:35PM +0530, Neeraj Upadhyay wrote:
> 
> 
> On 6/22/2022 3:51 AM, Paul E. McKenney wrote:
> > On Tue, Jun 21, 2022 at 11:17:10AM +0530, Neeraj Upadhyay wrote:
> > > 
> > > 
> > > On 6/21/2022 3:50 AM, Paul E. McKenney wrote:
> > > > From: Patrick Wang <patrick.wang.shcn@gmail.com>
> > > > 
> > > > Stop-machine recently started calling additional functions while waiting:
> > > > 
> > > > ----------------------------------------------------------------
> > > > Former stop machine wait loop:
> > > > do {
> > > >       cpu_relax(); => macro
> > > >       ...
> > > > } while (curstate != STOPMACHINE_EXIT);
> > > > -----------------------------------------------------------------
> > > > Current stop machine wait loop:
> > > > do {
> > > >       stop_machine_yield(cpumask); => function (notraced)
> > > >       ...
> > > >       touch_nmi_watchdog(); => function (notraced, inside calls also notraced)
> > > >       ...
> > > >       rcu_momentary_dyntick_idle(); => function (notraced, inside calls traced)
> > > > } while (curstate != MULTI_STOP_EXIT);
> > > > ------------------------------------------------------------------
> > > > 
> > > > These functions (and the functions that they call) must be marked
> > > > notrace to prevent them from being updated while they are executing.
> > > > The consequences of failing to mark these functions can be severe:
> > > > 
> > > >     rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > >     rcu: 	1-...!: (0 ticks this GP) idle=14f/1/0x4000000000000000 softirq=3397/3397 fqs=0
> > > >     rcu: 	3-...!: (0 ticks this GP) idle=ee9/1/0x4000000000000000 softirq=5168/5168 fqs=0
> > > >     	(detected by 0, t=8137 jiffies, g=5889, q=2 ncpus=4)
> > > >     Task dump for CPU 1:
> > > >     task:migration/1     state:R  running task     stack:    0 pid:   19 ppid:     2 flags:0x00000000
> > > >     Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
> > > >     Call Trace:
> > > >     Task dump for CPU 3:
> > > >     task:migration/3     state:R  running task     stack:    0 pid:   29 ppid:     2 flags:0x00000000
> > > >     Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
> > > >     Call Trace:
> > > >     rcu: rcu_preempt kthread timer wakeup didn't happen for 8136 jiffies! g5889 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
> > > >     rcu: 	Possible timer handling issue on cpu=2 timer-softirq=594
> > > >     rcu: rcu_preempt kthread starved for 8137 jiffies! g5889 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2
> > > >     rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
> > > >     rcu: RCU grace-period kthread stack dump:
> > > >     task:rcu_preempt     state:I stack:    0 pid:   14 ppid:     2 flags:0x00000000
> > > >     Call Trace:
> > > >       schedule+0x56/0xc2
> > > >       schedule_timeout+0x82/0x184
> > > >       rcu_gp_fqs_loop+0x19a/0x318
> > > >       rcu_gp_kthread+0x11a/0x140
> > > >       kthread+0xee/0x118
> > > >       ret_from_exception+0x0/0x14
> > > >     rcu: Stack dump where RCU GP kthread last ran:
> > > >     Task dump for CPU 2:
> > > >     task:migration/2     state:R  running task     stack:    0 pid:   24 ppid:     2 flags:0x00000000
> > > >     Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
> > > >     Call Trace:
> > > > 
> > > > This commit therefore marks these functions notrace:
> > > >    rcu_preempt_deferred_qs()
> > > >    rcu_preempt_need_deferred_qs()
> > > >    rcu_preempt_deferred_qs_irqrestore()
> > > > 
> > > 
> > > Only the preemptible RCU definitions are updated; so, this change is not
> > > required for non-preemptible RCU case?
> > 
> > It appears to me to be required.  How about as shown below?
> > 
> 
> Looks good to me.

Thank you!  May I apply your Reviewed-by to both of these?  (1/12 and
2/12.)

							Thanx, Paul

> Thanks
> Neeraj
> 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > commit 06cfe0c675c93884c3ffc75ec24ece7d0acd7a32
> > Author: Patrick Wang <patrick.wang.shcn@gmail.com>
> > Date:   Tue Apr 26 18:45:02 2022 +0800
> > 
> >      rcu: Avoid tracing a few functions executed in stop machine
> >      Stop-machine recently started calling additional functions while waiting:
> >      ----------------------------------------------------------------
> >      Former stop machine wait loop:
> >      do {
> >          cpu_relax(); => macro
> >          ...
> >      } while (curstate != STOPMACHINE_EXIT);
> >      -----------------------------------------------------------------
> >      Current stop machine wait loop:
> >      do {
> >          stop_machine_yield(cpumask); => function (notraced)
> >          ...
> >          touch_nmi_watchdog(); => function (notraced, inside calls also notraced)
> >          ...
> >          rcu_momentary_dyntick_idle(); => function (notraced, inside calls traced)
> >      } while (curstate != MULTI_STOP_EXIT);
> >      ------------------------------------------------------------------
> >      These functions (and the functions that they call) must be marked
> >      notrace to prevent them from being updated while they are executing.
> >      The consequences of failing to mark these functions can be severe:
> >        rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> >        rcu:  1-...!: (0 ticks this GP) idle=14f/1/0x4000000000000000 softirq=3397/3397 fqs=0
> >        rcu:  3-...!: (0 ticks this GP) idle=ee9/1/0x4000000000000000 softirq=5168/5168 fqs=0
> >              (detected by 0, t=8137 jiffies, g=5889, q=2 ncpus=4)
> >        Task dump for CPU 1:
> >        task:migration/1     state:R  running task     stack:    0 pid:   19 ppid:     2 flags:0x00000000
> >        Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
> >        Call Trace:
> >        Task dump for CPU 3:
> >        task:migration/3     state:R  running task     stack:    0 pid:   29 ppid:     2 flags:0x00000000
> >        Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
> >        Call Trace:
> >        rcu: rcu_preempt kthread timer wakeup didn't happen for 8136 jiffies! g5889 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
> >        rcu:  Possible timer handling issue on cpu=2 timer-softirq=594
> >        rcu: rcu_preempt kthread starved for 8137 jiffies! g5889 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2
> >        rcu:  Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
> >        rcu: RCU grace-period kthread stack dump:
> >        task:rcu_preempt     state:I stack:    0 pid:   14 ppid:     2 flags:0x00000000
> >        Call Trace:
> >          schedule+0x56/0xc2
> >          schedule_timeout+0x82/0x184
> >          rcu_gp_fqs_loop+0x19a/0x318
> >          rcu_gp_kthread+0x11a/0x140
> >          kthread+0xee/0x118
> >          ret_from_exception+0x0/0x14
> >        rcu: Stack dump where RCU GP kthread last ran:
> >        Task dump for CPU 2:
> >        task:migration/2     state:R  running task     stack:    0 pid:   24 ppid:     2 flags:0x00000000
> >        Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
> >        Call Trace:
> >      This commit therefore marks these functions notrace:
> >       rcu_preempt_deferred_qs()
> >       rcu_preempt_need_deferred_qs()
> >       rcu_preempt_deferred_qs_irqrestore()
> >      [ paulmck: Apply feedback from Neeraj Upadhyay. ]
> >      Signed-off-by: Patrick Wang <patrick.wang.shcn@gmail.com>
> >      Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> >      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > 
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index c8ba0fe17267c..7a07f2ca153e2 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -460,7 +460,7 @@ static bool rcu_preempt_has_tasks(struct rcu_node *rnp)
> >    * be quite short, for example, in the case of the call from
> >    * rcu_read_unlock_special().
> >    */
> > -static void
> > +static notrace void
> >   rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
> >   {
> >   	bool empty_exp;
> > @@ -581,7 +581,7 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
> >    * is disabled.  This function cannot be expected to understand these
> >    * nuances, so the caller must handle them.
> >    */
> > -static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
> > +static notrace bool rcu_preempt_need_deferred_qs(struct task_struct *t)
> >   {
> >   	return (__this_cpu_read(rcu_data.cpu_no_qs.b.exp) ||
> >   		READ_ONCE(t->rcu_read_unlock_special.s)) &&
> > @@ -595,7 +595,7 @@ static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
> >    * evaluate safety in terms of interrupt, softirq, and preemption
> >    * disabling.
> >    */
> > -static void rcu_preempt_deferred_qs(struct task_struct *t)
> > +static notrace void rcu_preempt_deferred_qs(struct task_struct *t)
> >   {
> >   	unsigned long flags;
> > @@ -926,7 +926,7 @@ static bool rcu_preempt_has_tasks(struct rcu_node *rnp)
> >    * Because there is no preemptible RCU, there can be no deferred quiescent
> >    * states.
> >    */
> > -static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
> > +static notrace bool rcu_preempt_need_deferred_qs(struct task_struct *t)
> >   {
> >   	return false;
> >   }
> > @@ -935,7 +935,7 @@ static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
> >   // period for a quiescent state from this CPU.  Note that requests from
> >   // tasks are handled when removing the task from the blocked-tasks list
> >   // below.
> > -static void rcu_preempt_deferred_qs(struct task_struct *t)
> > +static notrace void rcu_preempt_deferred_qs(struct task_struct *t)
> >   {
> >   	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH rcu 02/12] rcu: Avoid tracing a few functions executed in stop machine
  2022-06-22 15:35         ` Paul E. McKenney
@ 2022-06-22 15:49           ` Neeraj Upadhyay
  2022-06-23  0:29             ` Paul E. McKenney
  0 siblings, 1 reply; 39+ messages in thread
From: Neeraj Upadhyay @ 2022-06-22 15:49 UTC (permalink / raw)
  To: paulmck; +Cc: rcu, linux-kernel, kernel-team, rostedt, Patrick Wang



On 6/22/2022 9:05 PM, Paul E. McKenney wrote:
> On Wed, Jun 22, 2022 at 05:20:35PM +0530, Neeraj Upadhyay wrote:
>>
>>
>> On 6/22/2022 3:51 AM, Paul E. McKenney wrote:
>>> On Tue, Jun 21, 2022 at 11:17:10AM +0530, Neeraj Upadhyay wrote:
>>>>
>>>>
>>>> On 6/21/2022 3:50 AM, Paul E. McKenney wrote:
>>>>> From: Patrick Wang <patrick.wang.shcn@gmail.com>
>>>>>
>>>>> Stop-machine recently started calling additional functions while waiting:
>>>>>
>>>>> ----------------------------------------------------------------
>>>>> Former stop machine wait loop:
>>>>> do {
>>>>>        cpu_relax(); => macro
>>>>>        ...
>>>>> } while (curstate != STOPMACHINE_EXIT);
>>>>> -----------------------------------------------------------------
>>>>> Current stop machine wait loop:
>>>>> do {
>>>>>        stop_machine_yield(cpumask); => function (notraced)
>>>>>        ...
>>>>>        touch_nmi_watchdog(); => function (notraced, inside calls also notraced)
>>>>>        ...
>>>>>        rcu_momentary_dyntick_idle(); => function (notraced, inside calls traced)
>>>>> } while (curstate != MULTI_STOP_EXIT);
>>>>> ------------------------------------------------------------------
>>>>>
>>>>> These functions (and the functions that they call) must be marked
>>>>> notrace to prevent them from being updated while they are executing.
>>>>> The consequences of failing to mark these functions can be severe:
>>>>>
>>>>>      rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
>>>>>      rcu: 	1-...!: (0 ticks this GP) idle=14f/1/0x4000000000000000 softirq=3397/3397 fqs=0
>>>>>      rcu: 	3-...!: (0 ticks this GP) idle=ee9/1/0x4000000000000000 softirq=5168/5168 fqs=0
>>>>>      	(detected by 0, t=8137 jiffies, g=5889, q=2 ncpus=4)
>>>>>      Task dump for CPU 1:
>>>>>      task:migration/1     state:R  running task     stack:    0 pid:   19 ppid:     2 flags:0x00000000
>>>>>      Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
>>>>>      Call Trace:
>>>>>      Task dump for CPU 3:
>>>>>      task:migration/3     state:R  running task     stack:    0 pid:   29 ppid:     2 flags:0x00000000
>>>>>      Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
>>>>>      Call Trace:
>>>>>      rcu: rcu_preempt kthread timer wakeup didn't happen for 8136 jiffies! g5889 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
>>>>>      rcu: 	Possible timer handling issue on cpu=2 timer-softirq=594
>>>>>      rcu: rcu_preempt kthread starved for 8137 jiffies! g5889 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2
>>>>>      rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
>>>>>      rcu: RCU grace-period kthread stack dump:
>>>>>      task:rcu_preempt     state:I stack:    0 pid:   14 ppid:     2 flags:0x00000000
>>>>>      Call Trace:
>>>>>        schedule+0x56/0xc2
>>>>>        schedule_timeout+0x82/0x184
>>>>>        rcu_gp_fqs_loop+0x19a/0x318
>>>>>        rcu_gp_kthread+0x11a/0x140
>>>>>        kthread+0xee/0x118
>>>>>        ret_from_exception+0x0/0x14
>>>>>      rcu: Stack dump where RCU GP kthread last ran:
>>>>>      Task dump for CPU 2:
>>>>>      task:migration/2     state:R  running task     stack:    0 pid:   24 ppid:     2 flags:0x00000000
>>>>>      Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
>>>>>      Call Trace:
>>>>>
>>>>> This commit therefore marks these functions notrace:
>>>>>     rcu_preempt_deferred_qs()
>>>>>     rcu_preempt_need_deferred_qs()
>>>>>     rcu_preempt_deferred_qs_irqrestore()
>>>>>
>>>>
>>>> Only the preemptible RCU definitions are updated; so, this change is not
>>>> required for non-preemptible RCU case?
>>>
>>> It appears to me to be required.  How about as shown below?
>>>
>>
>> Looks good to me.
> 
> Thank you!  May I apply your Reviewed-by to both of these?  (1/12 and
> 2/12.)
> 

Yes, sure.

Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>


Thanks
Neeraj

> 							Thanx, Paul
> 
>> Thanks
>> Neeraj
>>
>>> 							Thanx, Paul
>>>
>>> ------------------------------------------------------------------------
>>>
>>> commit 06cfe0c675c93884c3ffc75ec24ece7d0acd7a32
>>> Author: Patrick Wang <patrick.wang.shcn@gmail.com>
>>> Date:   Tue Apr 26 18:45:02 2022 +0800
>>>
>>>       rcu: Avoid tracing a few functions executed in stop machine
>>>       Stop-machine recently started calling additional functions while waiting:
>>>       ----------------------------------------------------------------
>>>       Former stop machine wait loop:
>>>       do {
>>>           cpu_relax(); => macro
>>>           ...
>>>       } while (curstate != STOPMACHINE_EXIT);
>>>       -----------------------------------------------------------------
>>>       Current stop machine wait loop:
>>>       do {
>>>           stop_machine_yield(cpumask); => function (notraced)
>>>           ...
>>>           touch_nmi_watchdog(); => function (notraced, inside calls also notraced)
>>>           ...
>>>           rcu_momentary_dyntick_idle(); => function (notraced, inside calls traced)
>>>       } while (curstate != MULTI_STOP_EXIT);
>>>       ------------------------------------------------------------------
>>>       These functions (and the functions that they call) must be marked
>>>       notrace to prevent them from being updated while they are executing.
>>>       The consequences of failing to mark these functions can be severe:
>>>         rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
>>>         rcu:  1-...!: (0 ticks this GP) idle=14f/1/0x4000000000000000 softirq=3397/3397 fqs=0
>>>         rcu:  3-...!: (0 ticks this GP) idle=ee9/1/0x4000000000000000 softirq=5168/5168 fqs=0
>>>               (detected by 0, t=8137 jiffies, g=5889, q=2 ncpus=4)
>>>         Task dump for CPU 1:
>>>         task:migration/1     state:R  running task     stack:    0 pid:   19 ppid:     2 flags:0x00000000
>>>         Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
>>>         Call Trace:
>>>         Task dump for CPU 3:
>>>         task:migration/3     state:R  running task     stack:    0 pid:   29 ppid:     2 flags:0x00000000
>>>         Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
>>>         Call Trace:
>>>         rcu: rcu_preempt kthread timer wakeup didn't happen for 8136 jiffies! g5889 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
>>>         rcu:  Possible timer handling issue on cpu=2 timer-softirq=594
>>>         rcu: rcu_preempt kthread starved for 8137 jiffies! g5889 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2
>>>         rcu:  Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
>>>         rcu: RCU grace-period kthread stack dump:
>>>         task:rcu_preempt     state:I stack:    0 pid:   14 ppid:     2 flags:0x00000000
>>>         Call Trace:
>>>           schedule+0x56/0xc2
>>>           schedule_timeout+0x82/0x184
>>>           rcu_gp_fqs_loop+0x19a/0x318
>>>           rcu_gp_kthread+0x11a/0x140
>>>           kthread+0xee/0x118
>>>           ret_from_exception+0x0/0x14
>>>         rcu: Stack dump where RCU GP kthread last ran:
>>>         Task dump for CPU 2:
>>>         task:migration/2     state:R  running task     stack:    0 pid:   24 ppid:     2 flags:0x00000000
>>>         Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
>>>         Call Trace:
>>>       This commit therefore marks these functions notrace:
>>>        rcu_preempt_deferred_qs()
>>>        rcu_preempt_need_deferred_qs()
>>>        rcu_preempt_deferred_qs_irqrestore()
>>>       [ paulmck: Apply feedback from Neeraj Upadhyay. ]
>>>       Signed-off-by: Patrick Wang <patrick.wang.shcn@gmail.com>
>>>       Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
>>>       Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
>>>
>>> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
>>> index c8ba0fe17267c..7a07f2ca153e2 100644
>>> --- a/kernel/rcu/tree_plugin.h
>>> +++ b/kernel/rcu/tree_plugin.h
>>> @@ -460,7 +460,7 @@ static bool rcu_preempt_has_tasks(struct rcu_node *rnp)
>>>     * be quite short, for example, in the case of the call from
>>>     * rcu_read_unlock_special().
>>>     */
>>> -static void
>>> +static notrace void
>>>    rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
>>>    {
>>>    	bool empty_exp;
>>> @@ -581,7 +581,7 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
>>>     * is disabled.  This function cannot be expected to understand these
>>>     * nuances, so the caller must handle them.
>>>     */
>>> -static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
>>> +static notrace bool rcu_preempt_need_deferred_qs(struct task_struct *t)
>>>    {
>>>    	return (__this_cpu_read(rcu_data.cpu_no_qs.b.exp) ||
>>>    		READ_ONCE(t->rcu_read_unlock_special.s)) &&
>>> @@ -595,7 +595,7 @@ static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
>>>     * evaluate safety in terms of interrupt, softirq, and preemption
>>>     * disabling.
>>>     */
>>> -static void rcu_preempt_deferred_qs(struct task_struct *t)
>>> +static notrace void rcu_preempt_deferred_qs(struct task_struct *t)
>>>    {
>>>    	unsigned long flags;
>>> @@ -926,7 +926,7 @@ static bool rcu_preempt_has_tasks(struct rcu_node *rnp)
>>>     * Because there is no preemptible RCU, there can be no deferred quiescent
>>>     * states.
>>>     */
>>> -static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
>>> +static notrace bool rcu_preempt_need_deferred_qs(struct task_struct *t)
>>>    {
>>>    	return false;
>>>    }
>>> @@ -935,7 +935,7 @@ static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
>>>    // period for a quiescent state from this CPU.  Note that requests from
>>>    // tasks are handled when removing the task from the blocked-tasks list
>>>    // below.
>>> -static void rcu_preempt_deferred_qs(struct task_struct *t)
>>> +static notrace void rcu_preempt_deferred_qs(struct task_struct *t)
>>>    {
>>>    	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH rcu 02/12] rcu: Avoid tracing a few functions executed in stop machine
  2022-06-22 15:49           ` Neeraj Upadhyay
@ 2022-06-23  0:29             ` Paul E. McKenney
  0 siblings, 0 replies; 39+ messages in thread
From: Paul E. McKenney @ 2022-06-23  0:29 UTC (permalink / raw)
  To: Neeraj Upadhyay; +Cc: rcu, linux-kernel, kernel-team, rostedt, Patrick Wang

On Wed, Jun 22, 2022 at 09:19:25PM +0530, Neeraj Upadhyay wrote:
> 
> 
> On 6/22/2022 9:05 PM, Paul E. McKenney wrote:
> > On Wed, Jun 22, 2022 at 05:20:35PM +0530, Neeraj Upadhyay wrote:
> > > 
> > > 
> > > On 6/22/2022 3:51 AM, Paul E. McKenney wrote:
> > > > On Tue, Jun 21, 2022 at 11:17:10AM +0530, Neeraj Upadhyay wrote:
> > > > > 
> > > > > 
> > > > > On 6/21/2022 3:50 AM, Paul E. McKenney wrote:
> > > > > > From: Patrick Wang <patrick.wang.shcn@gmail.com>
> > > > > > 
> > > > > > Stop-machine recently started calling additional functions while waiting:
> > > > > > 
> > > > > > ----------------------------------------------------------------
> > > > > > Former stop machine wait loop:
> > > > > > do {
> > > > > >        cpu_relax(); => macro
> > > > > >        ...
> > > > > > } while (curstate != STOPMACHINE_EXIT);
> > > > > > -----------------------------------------------------------------
> > > > > > Current stop machine wait loop:
> > > > > > do {
> > > > > >        stop_machine_yield(cpumask); => function (notraced)
> > > > > >        ...
> > > > > >        touch_nmi_watchdog(); => function (notraced, inside calls also notraced)
> > > > > >        ...
> > > > > >        rcu_momentary_dyntick_idle(); => function (notraced, inside calls traced)
> > > > > > } while (curstate != MULTI_STOP_EXIT);
> > > > > > ------------------------------------------------------------------
> > > > > > 
> > > > > > These functions (and the functions that they call) must be marked
> > > > > > notrace to prevent them from being updated while they are executing.
> > > > > > The consequences of failing to mark these functions can be severe:
> > > > > > 
> > > > > >      rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > > > >      rcu: 	1-...!: (0 ticks this GP) idle=14f/1/0x4000000000000000 softirq=3397/3397 fqs=0
> > > > > >      rcu: 	3-...!: (0 ticks this GP) idle=ee9/1/0x4000000000000000 softirq=5168/5168 fqs=0
> > > > > >      	(detected by 0, t=8137 jiffies, g=5889, q=2 ncpus=4)
> > > > > >      Task dump for CPU 1:
> > > > > >      task:migration/1     state:R  running task     stack:    0 pid:   19 ppid:     2 flags:0x00000000
> > > > > >      Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
> > > > > >      Call Trace:
> > > > > >      Task dump for CPU 3:
> > > > > >      task:migration/3     state:R  running task     stack:    0 pid:   29 ppid:     2 flags:0x00000000
> > > > > >      Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
> > > > > >      Call Trace:
> > > > > >      rcu: rcu_preempt kthread timer wakeup didn't happen for 8136 jiffies! g5889 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
> > > > > >      rcu: 	Possible timer handling issue on cpu=2 timer-softirq=594
> > > > > >      rcu: rcu_preempt kthread starved for 8137 jiffies! g5889 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2
> > > > > >      rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
> > > > > >      rcu: RCU grace-period kthread stack dump:
> > > > > >      task:rcu_preempt     state:I stack:    0 pid:   14 ppid:     2 flags:0x00000000
> > > > > >      Call Trace:
> > > > > >        schedule+0x56/0xc2
> > > > > >        schedule_timeout+0x82/0x184
> > > > > >        rcu_gp_fqs_loop+0x19a/0x318
> > > > > >        rcu_gp_kthread+0x11a/0x140
> > > > > >        kthread+0xee/0x118
> > > > > >        ret_from_exception+0x0/0x14
> > > > > >      rcu: Stack dump where RCU GP kthread last ran:
> > > > > >      Task dump for CPU 2:
> > > > > >      task:migration/2     state:R  running task     stack:    0 pid:   24 ppid:     2 flags:0x00000000
> > > > > >      Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
> > > > > >      Call Trace:
> > > > > > 
> > > > > > This commit therefore marks these functions notrace:
> > > > > >     rcu_preempt_deferred_qs()
> > > > > >     rcu_preempt_need_deferred_qs()
> > > > > >     rcu_preempt_deferred_qs_irqrestore()
> > > > > > 
> > > > > 
> > > > > Only the preemptible RCU definitions are updated; so, this change is not
> > > > > required for non-preemptible RCU case?
> > > > 
> > > > It appears to me to be required.  How about as shown below?
> > > > 
> > > 
> > > Looks good to me.
> > 
> > Thank you!  May I apply your Reviewed-by to both of these?  (1/12 and
> > 2/12.)
> > 
> 
> Yes, sure.
> 
> Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>

Very good, thank you!  I will apply these on my next rebase.

							Thanx, Paul

> Thanks
> Neeraj
> 
> > 							Thanx, Paul
> > 
> > > Thanks
> > > Neeraj
> > > 
> > > > 							Thanx, Paul
> > > > 
> > > > ------------------------------------------------------------------------
> > > > 
> > > > commit 06cfe0c675c93884c3ffc75ec24ece7d0acd7a32
> > > > Author: Patrick Wang <patrick.wang.shcn@gmail.com>
> > > > Date:   Tue Apr 26 18:45:02 2022 +0800
> > > > 
> > > >       rcu: Avoid tracing a few functions executed in stop machine
> > > >       Stop-machine recently started calling additional functions while waiting:
> > > >       ----------------------------------------------------------------
> > > >       Former stop machine wait loop:
> > > >       do {
> > > >           cpu_relax(); => macro
> > > >           ...
> > > >       } while (curstate != STOPMACHINE_EXIT);
> > > >       -----------------------------------------------------------------
> > > >       Current stop machine wait loop:
> > > >       do {
> > > >           stop_machine_yield(cpumask); => function (notraced)
> > > >           ...
> > > >           touch_nmi_watchdog(); => function (notraced, inside calls also notraced)
> > > >           ...
> > > >           rcu_momentary_dyntick_idle(); => function (notraced, inside calls traced)
> > > >       } while (curstate != MULTI_STOP_EXIT);
> > > >       ------------------------------------------------------------------
> > > >       These functions (and the functions that they call) must be marked
> > > >       notrace to prevent them from being updated while they are executing.
> > > >       The consequences of failing to mark these functions can be severe:
> > > >         rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > >         rcu:  1-...!: (0 ticks this GP) idle=14f/1/0x4000000000000000 softirq=3397/3397 fqs=0
> > > >         rcu:  3-...!: (0 ticks this GP) idle=ee9/1/0x4000000000000000 softirq=5168/5168 fqs=0
> > > >               (detected by 0, t=8137 jiffies, g=5889, q=2 ncpus=4)
> > > >         Task dump for CPU 1:
> > > >         task:migration/1     state:R  running task     stack:    0 pid:   19 ppid:     2 flags:0x00000000
> > > >         Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
> > > >         Call Trace:
> > > >         Task dump for CPU 3:
> > > >         task:migration/3     state:R  running task     stack:    0 pid:   29 ppid:     2 flags:0x00000000
> > > >         Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
> > > >         Call Trace:
> > > >         rcu: rcu_preempt kthread timer wakeup didn't happen for 8136 jiffies! g5889 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
> > > >         rcu:  Possible timer handling issue on cpu=2 timer-softirq=594
> > > >         rcu: rcu_preempt kthread starved for 8137 jiffies! g5889 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2
> > > >         rcu:  Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
> > > >         rcu: RCU grace-period kthread stack dump:
> > > >         task:rcu_preempt     state:I stack:    0 pid:   14 ppid:     2 flags:0x00000000
> > > >         Call Trace:
> > > >           schedule+0x56/0xc2
> > > >           schedule_timeout+0x82/0x184
> > > >           rcu_gp_fqs_loop+0x19a/0x318
> > > >           rcu_gp_kthread+0x11a/0x140
> > > >           kthread+0xee/0x118
> > > >           ret_from_exception+0x0/0x14
> > > >         rcu: Stack dump where RCU GP kthread last ran:
> > > >         Task dump for CPU 2:
> > > >         task:migration/2     state:R  running task     stack:    0 pid:   24 ppid:     2 flags:0x00000000
> > > >         Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
> > > >         Call Trace:
> > > >       This commit therefore marks these functions notrace:
> > > >        rcu_preempt_deferred_qs()
> > > >        rcu_preempt_need_deferred_qs()
> > > >        rcu_preempt_deferred_qs_irqrestore()
> > > >       [ paulmck: Apply feedback from Neeraj Upadhyay. ]
> > > >       Signed-off-by: Patrick Wang <patrick.wang.shcn@gmail.com>
> > > >       Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> > > >       Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > > > 
> > > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > > index c8ba0fe17267c..7a07f2ca153e2 100644
> > > > --- a/kernel/rcu/tree_plugin.h
> > > > +++ b/kernel/rcu/tree_plugin.h
> > > > @@ -460,7 +460,7 @@ static bool rcu_preempt_has_tasks(struct rcu_node *rnp)
> > > >     * be quite short, for example, in the case of the call from
> > > >     * rcu_read_unlock_special().
> > > >     */
> > > > -static void
> > > > +static notrace void
> > > >    rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
> > > >    {
> > > >    	bool empty_exp;
> > > > @@ -581,7 +581,7 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
> > > >     * is disabled.  This function cannot be expected to understand these
> > > >     * nuances, so the caller must handle them.
> > > >     */
> > > > -static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
> > > > +static notrace bool rcu_preempt_need_deferred_qs(struct task_struct *t)
> > > >    {
> > > >    	return (__this_cpu_read(rcu_data.cpu_no_qs.b.exp) ||
> > > >    		READ_ONCE(t->rcu_read_unlock_special.s)) &&
> > > > @@ -595,7 +595,7 @@ static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
> > > >     * evaluate safety in terms of interrupt, softirq, and preemption
> > > >     * disabling.
> > > >     */
> > > > -static void rcu_preempt_deferred_qs(struct task_struct *t)
> > > > +static notrace void rcu_preempt_deferred_qs(struct task_struct *t)
> > > >    {
> > > >    	unsigned long flags;
> > > > @@ -926,7 +926,7 @@ static bool rcu_preempt_has_tasks(struct rcu_node *rnp)
> > > >     * Because there is no preemptible RCU, there can be no deferred quiescent
> > > >     * states.
> > > >     */
> > > > -static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
> > > > +static notrace bool rcu_preempt_need_deferred_qs(struct task_struct *t)
> > > >    {
> > > >    	return false;
> > > >    }
> > > > @@ -935,7 +935,7 @@ static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
> > > >    // period for a quiescent state from this CPU.  Note that requests from
> > > >    // tasks are handled when removing the task from the blocked-tasks list
> > > >    // below.
> > > > -static void rcu_preempt_deferred_qs(struct task_struct *t)
> > > > +static notrace void rcu_preempt_deferred_qs(struct task_struct *t)
> > > >    {
> > > >    	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads: [~2022-06-23  0:29 UTC | newest]

Thread overview: 39+ messages
2022-06-20 22:20 [PATCH rcu 0/12] Miscellaneous fixes for v5.20 Paul E. McKenney
2022-06-20 22:20 ` [PATCH rcu 01/12] rcu: Decrease FQS scan wait time in case of callback overloading Paul E. McKenney
2022-06-21  5:29   ` Neeraj Upadhyay
2022-06-21 22:19     ` Paul E. McKenney
2022-06-22 11:46       ` Neeraj Upadhyay
2022-06-20 22:20 ` [PATCH rcu 02/12] rcu: Avoid tracing a few functions executed in stop machine Paul E. McKenney
2022-06-21  5:47   ` Neeraj Upadhyay
2022-06-21 22:21     ` Paul E. McKenney
2022-06-22 11:50       ` Neeraj Upadhyay
2022-06-22 15:35         ` Paul E. McKenney
2022-06-22 15:49           ` Neeraj Upadhyay
2022-06-23  0:29             ` Paul E. McKenney
2022-06-20 22:20 ` [PATCH rcu 03/12] rcu: Add rnp->cbovldmask check in rcutree_migrate_callbacks() Paul E. McKenney
2022-06-21  5:57   ` Neeraj Upadhyay
2022-06-21 22:22     ` Paul E. McKenney
2022-06-20 22:20 ` [PATCH rcu 04/12] rcu: Immediately boost preempted readers for strict grace periods Paul E. McKenney
2022-06-21  6:00   ` Neeraj Upadhyay
2022-06-20 22:20 ` [PATCH rcu 05/12] rcu: Forbid RCU_STRICT_GRACE_PERIOD in TINY_RCU kernels Paul E. McKenney
2022-06-21  6:02   ` Neeraj Upadhyay
2022-06-20 22:20 ` [PATCH rcu 06/12] locking/csd_lock: Change csdlock_debug from early_param to __setup Paul E. McKenney
2022-06-20 22:20 ` [PATCH rcu 07/12] rcu: tiny: Record kvfree_call_rcu() call stack for KASAN Paul E. McKenney
2022-06-21  6:31   ` Neeraj Upadhyay
2022-06-21 19:31     ` Paul E. McKenney
2022-06-21 21:14       ` Marco Elver
2022-06-21 22:17         ` Paul E. McKenney
2022-06-20 22:20 ` [PATCH rcu 08/12] rcu: Cleanup RCU urgency state for offline CPU Paul E. McKenney
2022-06-21  7:03   ` Neeraj Upadhyay
2022-06-21 22:24     ` Paul E. McKenney
2022-06-20 22:20 ` [PATCH rcu 09/12] rcu/kvfree: Remove useless monitor_todo flag Paul E. McKenney
2022-06-21 10:02   ` Neeraj Upadhyay
2022-06-20 22:20 ` [PATCH rcu 10/12] rcu: Initialize first_gp_fqs at declaration in rcu_gp_fqs() Paul E. McKenney
2022-06-20 22:20 ` [PATCH rcu 11/12] rcu/tree: Add comment to describe GP-done condition in fqs loop Paul E. McKenney
2022-06-20 22:20 ` [PATCH rcu 12/12] srcu: Block less aggressively for expedited grace periods Paul E. McKenney
2022-06-21  2:00   ` Zhangfei Gao
2022-06-21  3:15     ` Paul E. McKenney
2022-06-21  7:43   ` Shameerali Kolothum Thodi
2022-06-21 19:36     ` Paul E. McKenney
2022-06-21 10:13   ` Neeraj Upadhyay
2022-06-21 22:25     ` Paul E. McKenney
