All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH rcu 0/11] Miscellaneous fixes for v5.19
@ 2022-04-18 22:53 Paul E. McKenney
  2022-04-18 22:53 ` [PATCH rcu 01/11] rcu: Clarify fill-the-gap comment in rcu_segcblist_advance() Paul E. McKenney
                   ` (11 more replies)
  0 siblings, 12 replies; 24+ messages in thread
From: Paul E. McKenney @ 2022-04-18 22:53 UTC (permalink / raw)
  To: rcu; +Cc: linux-kernel, kernel-team, rostedt

Hello!

This series contains miscellaneous fixes:

1.	Clarify fill-the-gap comment in rcu_segcblist_advance().

2.	Fix rcu_preempt_deferred_qs_irqrestore() strict QS reporting.

3.	Check for jiffies going backwards.

4.	Provide boot-time timeout for CSD lock diagnostics.

5.	Add comments to final rcu_gp_cleanup() "if" statement.

6.	Print number of online CPUs in RCU CPU stall-warning messages.

7.	Fix preemption mode check on synchronize_rcu[_expedited](),
	courtesy of Frederic Weisbecker.

8.	Drop needless initialization of sdp in srcu_gp_start(), courtesy
	of Lukas Bulwahn.

9.	Check for successful spawn of ->boost_kthread_task, courtesy
	of Zqiang.

10.	rcu_sync: Fix comment to properly reflect rcu_sync_exit()
	behavior, courtesy of David Vernet.

11.	Use IRQ_WORK_INIT_HARD() to avoid rcu_read_unlock() hangs,
	courtesy of Zqiang.

						Thanx, Paul

------------------------------------------------------------------------

 b/Documentation/admin-guide/kernel-parameters.txt |   11 ++++++++
 b/kernel/rcu/rcu_segcblist.c                      |    8 +++---
 b/kernel/rcu/srcutree.c                           |    2 -
 b/kernel/rcu/sync.c                               |    2 -
 b/kernel/rcu/tree.c                               |   10 +++++++
 b/kernel/rcu/tree.h                               |    1 
 b/kernel/rcu/tree_plugin.h                        |    1 
 b/kernel/rcu/tree_stall.h                         |    8 +++---
 b/kernel/smp.c                                    |    7 +++--
 kernel/rcu/tree.c                                 |   29 +++++++++++++++++-----
 kernel/rcu/tree_plugin.h                          |   11 ++++++--
 11 files changed, 70 insertions(+), 20 deletions(-)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH rcu 01/11] rcu: Clarify fill-the-gap comment in rcu_segcblist_advance()
  2022-04-18 22:53 [PATCH rcu 0/11] Miscellaneous fixes for v5.19 Paul E. McKenney
@ 2022-04-18 22:53 ` Paul E. McKenney
  2022-04-18 22:53 ` [PATCH rcu 02/11] rcu: Fix rcu_preempt_deferred_qs_irqrestore() strict QS reporting Paul E. McKenney
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 24+ messages in thread
From: Paul E. McKenney @ 2022-04-18 22:53 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, rostedt, Paul E. McKenney,
	Frederic Weisbecker, Neeraj Upadhyay, Boqun Feng

Reported-by: Frederic Weisbecker <frederic@kernel.org>
Reported-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Reported-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/rcu_segcblist.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index 81145c3ece25..c54ea2b6a36b 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -505,10 +505,10 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq)
 		WRITE_ONCE(rsclp->tails[j], rsclp->tails[RCU_DONE_TAIL]);
 
 	/*
-	 * Callbacks moved, so clean up the misordered ->tails[] pointers
-	 * that now point into the middle of the list of ready-to-invoke
-	 * callbacks.  The overall effect is to copy down the later pointers
-	 * into the gap that was created by the now-ready segments.
+	 * Callbacks moved, so there might be an empty RCU_WAIT_TAIL
+	 * and a non-empty RCU_NEXT_READY_TAIL.  If so, copy the
+	 * RCU_NEXT_READY_TAIL segment to fill the RCU_WAIT_TAIL gap
+	 * created by the now-ready-to-invoke segments.
 	 */
 	for (j = RCU_WAIT_TAIL; i < RCU_NEXT_TAIL; i++, j++) {
 		if (rsclp->tails[j] == rsclp->tails[RCU_NEXT_TAIL])
-- 
2.31.1.189.g2e36527f23


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH rcu 02/11] rcu: Fix rcu_preempt_deferred_qs_irqrestore() strict QS reporting
  2022-04-18 22:53 [PATCH rcu 0/11] Miscellaneous fixes for v5.19 Paul E. McKenney
  2022-04-18 22:53 ` [PATCH rcu 01/11] rcu: Clarify fill-the-gap comment in rcu_segcblist_advance() Paul E. McKenney
@ 2022-04-18 22:53 ` Paul E. McKenney
  2022-04-18 22:53 ` [PATCH rcu 03/11] rcu: Check for jiffies going backwards Paul E. McKenney
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 24+ messages in thread
From: Paul E. McKenney @ 2022-04-18 22:53 UTC (permalink / raw)
  To: rcu; +Cc: linux-kernel, kernel-team, rostedt, Paul E. McKenney

Suppose we have a kernel built with both CONFIG_RCU_STRICT_GRACE_PERIOD=y
and CONFIG_PREEMPT=y.  Suppose further that an RCU reader from which RCU
core needs a quiescent state ends in rcu_preempt_deferred_qs_irqrestore().
This function will then invoke rcu_report_qs_rdp() in order to immediately
report that quiescent state.  Unfortunately, it will not have cleared
that reader's CPU's rcu_data structure's ->cpu_no_qs.b.norm field.
As a result, rcu_report_qs_rdp() will take an early exit because it
will believe that this CPU has not yet encountered a quiescent state,
and there will be no reporting of the current quiescent state.

This commit therefore causes rcu_preempt_deferred_qs_irqrestore() to
clear the ->cpu_no_qs.b.norm field before invoking rcu_report_qs_rdp().

Kudos to Boqun Feng and Neeraj Upadhyay for helping with analysis of
this issue!

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree_plugin.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 8360d86db1c0..176639c6215f 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -486,6 +486,7 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
 	t->rcu_read_unlock_special.s = 0;
 	if (special.b.need_qs) {
 		if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD)) {
+			rdp->cpu_no_qs.b.norm = false;
 			rcu_report_qs_rdp(rdp);
 			udelay(rcu_unlock_delay);
 		} else {
-- 
2.31.1.189.g2e36527f23


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH rcu 03/11] rcu: Check for jiffies going backwards
  2022-04-18 22:53 [PATCH rcu 0/11] Miscellaneous fixes for v5.19 Paul E. McKenney
  2022-04-18 22:53 ` [PATCH rcu 01/11] rcu: Clarify fill-the-gap comment in rcu_segcblist_advance() Paul E. McKenney
  2022-04-18 22:53 ` [PATCH rcu 02/11] rcu: Fix rcu_preempt_deferred_qs_irqrestore() strict QS reporting Paul E. McKenney
@ 2022-04-18 22:53 ` Paul E. McKenney
  2022-04-18 22:53 ` [PATCH rcu 04/11] kernel/smp: Provide boot-time timeout for CSD lock diagnostics Paul E. McKenney
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 24+ messages in thread
From: Paul E. McKenney @ 2022-04-18 22:53 UTC (permalink / raw)
  To: rcu; +Cc: linux-kernel, kernel-team, rostedt, Paul E. McKenney, Saravanan D

A report of a 12-jiffy normal RCU CPU stall warning raises interesting
questions about the nature of time on the offending system.  This commit
instruments rcu_sched_clock_irq(), which is RCU's hook into the
scheduling-clock interrupt, checking for the jiffies counter going
backwards.

Reported-by: Saravanan D <sarvanand@fb.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree.c | 10 ++++++++++
 kernel/rcu/tree.h |  1 +
 2 files changed, 11 insertions(+)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index a4b8189455d5..a5ea67454640 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1679,6 +1679,8 @@ static bool __note_gp_changes(struct rcu_node *rnp, struct rcu_data *rdp)
 	rdp->gp_seq = rnp->gp_seq;  /* Remember new grace-period state. */
 	if (ULONG_CMP_LT(rdp->gp_seq_needed, rnp->gp_seq_needed) || rdp->gpwrap)
 		WRITE_ONCE(rdp->gp_seq_needed, rnp->gp_seq_needed);
+	if (IS_ENABLED(CONFIG_PROVE_RCU) && READ_ONCE(rdp->gpwrap))
+		WRITE_ONCE(rdp->last_sched_clock, jiffies);
 	WRITE_ONCE(rdp->gpwrap, false);
 	rcu_gpnum_ovf(rnp, rdp);
 	return ret;
@@ -2609,6 +2611,13 @@ static void rcu_do_batch(struct rcu_data *rdp)
  */
 void rcu_sched_clock_irq(int user)
 {
+	unsigned long j;
+
+	if (IS_ENABLED(CONFIG_PROVE_RCU)) {
+		j = jiffies;
+		WARN_ON_ONCE(time_before(j, __this_cpu_read(rcu_data.last_sched_clock)));
+		__this_cpu_write(rcu_data.last_sched_clock, j);
+	}
 	trace_rcu_utilization(TPS("Start scheduler-tick"));
 	lockdep_assert_irqs_disabled();
 	raw_cpu_inc(rcu_data.ticks_this_gp);
@@ -4179,6 +4188,7 @@ rcu_boot_init_percpu_data(int cpu)
 	rdp->rcu_ofl_gp_flags = RCU_GP_CLEANED;
 	rdp->rcu_onl_gp_seq = rcu_state.gp_seq;
 	rdp->rcu_onl_gp_flags = RCU_GP_CLEANED;
+	rdp->last_sched_clock = jiffies;
 	rdp->cpu = cpu;
 	rcu_boot_init_nocb_percpu_data(rdp);
 }
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 926673ebe355..94b55f669915 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -254,6 +254,7 @@ struct rcu_data {
 	unsigned long rcu_onl_gp_seq;	/* ->gp_seq at last online. */
 	short rcu_onl_gp_flags;		/* ->gp_flags at last online. */
 	unsigned long last_fqs_resched;	/* Time of last rcu_resched(). */
+	unsigned long last_sched_clock;	/* Jiffies of last rcu_sched_clock_irq(). */
 
 	int cpu;
 };
-- 
2.31.1.189.g2e36527f23


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH rcu 04/11] kernel/smp: Provide boot-time timeout for CSD lock diagnostics
  2022-04-18 22:53 [PATCH rcu 0/11] Miscellaneous fixes for v5.19 Paul E. McKenney
                   ` (2 preceding siblings ...)
  2022-04-18 22:53 ` [PATCH rcu 03/11] rcu: Check for jiffies going backwards Paul E. McKenney
@ 2022-04-18 22:53 ` Paul E. McKenney
  2022-04-19  7:11   ` Juergen Gross
  2022-04-18 22:53 ` [PATCH rcu 05/11] rcu: Add comments to final rcu_gp_cleanup() "if" statement Paul E. McKenney
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 24+ messages in thread
From: Paul E. McKenney @ 2022-04-18 22:53 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, rostedt, Paul E. McKenney,
	Rik van Riel, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	Sebastian Andrzej Siewior, Juergen Gross

Debugging of problems involving insanely long-running SMI handlers
proceeds better if the CSD-lock timeout can be adjusted.  This commit
therefore provides a new smp.csd_lock_timeout kernel boot parameter
that specifies the timeout in milliseconds.  The default remains at the
previously hard-coded value of five seconds.

Cc: Rik van Riel <riel@surriel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 Documentation/admin-guide/kernel-parameters.txt | 11 +++++++++++
 kernel/smp.c                                    |  7 +++++--
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 3f1cc5e317ed..645c4c001b16 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5377,6 +5377,17 @@
 	smart2=		[HW]
 			Format: <io1>[,<io2>[,...,<io8>]]
 
+	smp.csd_lock_timeout= [KNL]
+			Specify the period of time in milliseconds
+			that smp_call_function() and friends will wait
+			for a CPU to release the CSD lock.  This is
+			useful when diagnosing bugs involving CPUs
+			disabling interrupts for extended periods
+			of time.  Defaults to 5,000 milliseconds, and
+			setting a value of zero disables this feature.
+			This feature may be more efficiently disabled
+			using the csdlock_debug- kernel parameter.
+
 	smsc-ircc2.nopnp	[HW] Don't use PNP to discover SMC devices
 	smsc-ircc2.ircc_cfg=	[HW] Device configuration I/O port
 	smsc-ircc2.ircc_sir=	[HW] SIR base I/O port
diff --git a/kernel/smp.c b/kernel/smp.c
index 01a7c1706a58..d82439bac401 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -183,7 +183,9 @@ static DEFINE_PER_CPU(smp_call_func_t, cur_csd_func);
 static DEFINE_PER_CPU(void *, cur_csd_info);
 static DEFINE_PER_CPU(struct cfd_seq_local, cfd_seq_local);
 
-#define CSD_LOCK_TIMEOUT (5ULL * NSEC_PER_SEC)
+static ulong csd_lock_timeout = 5000;  /* CSD lock timeout in milliseconds. */
+module_param(csd_lock_timeout, ulong, 0444);
+
 static atomic_t csd_bug_count = ATOMIC_INIT(0);
 static u64 cfd_seq;
 
@@ -329,6 +331,7 @@ static bool csd_lock_wait_toolong(struct __call_single_data *csd, u64 ts0, u64 *
 	u64 ts2, ts_delta;
 	call_single_data_t *cpu_cur_csd;
 	unsigned int flags = READ_ONCE(csd->node.u_flags);
+	unsigned long long csd_lock_timeout_ns = csd_lock_timeout * NSEC_PER_MSEC;
 
 	if (!(flags & CSD_FLAG_LOCK)) {
 		if (!unlikely(*bug_id))
@@ -341,7 +344,7 @@ static bool csd_lock_wait_toolong(struct __call_single_data *csd, u64 ts0, u64 *
 
 	ts2 = sched_clock();
 	ts_delta = ts2 - *ts1;
-	if (likely(ts_delta <= CSD_LOCK_TIMEOUT))
+	if (likely(ts_delta <= csd_lock_timeout_ns || csd_lock_timeout_ns <= 0))
 		return false;
 
 	firsttime = !*bug_id;
-- 
2.31.1.189.g2e36527f23


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH rcu 05/11] rcu: Add comments to final rcu_gp_cleanup() "if" statement
  2022-04-18 22:53 [PATCH rcu 0/11] Miscellaneous fixes for v5.19 Paul E. McKenney
                   ` (3 preceding siblings ...)
  2022-04-18 22:53 ` [PATCH rcu 04/11] kernel/smp: Provide boot-time timeout for CSD lock diagnostics Paul E. McKenney
@ 2022-04-18 22:53 ` Paul E. McKenney
  2022-04-18 22:53 ` [PATCH rcu 06/11] rcu: Print number of online CPUs in RCU CPU stall-warning messages Paul E. McKenney
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 24+ messages in thread
From: Paul E. McKenney @ 2022-04-18 22:53 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, rostedt, Paul E. McKenney, Boqun Feng,
	Frederic Weisbecker, Neeraj Upadhyay, Uladzislau Rezki

The final "if" statement in rcu_gp_cleanup() has proven to be rather
confusing, straightforward though it might have seemed when initially
written.  This commit therefore adds comments to its "then" and "else"
clauses to at least provide a more elevated form of confusion.

Reported-by: Boqun Feng <boqun.feng@gmail.com>
Reported-by: Frederic Weisbecker <frederic@kernel.org>
Reported-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Reported-by: Uladzislau Rezki <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree.c | 25 ++++++++++++++++++++-----
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index a5ea67454640..29669070348e 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2098,14 +2098,29 @@ static noinline void rcu_gp_cleanup(void)
 	/* Advance CBs to reduce false positives below. */
 	offloaded = rcu_rdp_is_offloaded(rdp);
 	if ((offloaded || !rcu_accelerate_cbs(rnp, rdp)) && needgp) {
+
+		// We get here if a grace period was needed (“needgp”)
+		// and the above call to rcu_accelerate_cbs() did not set
+		// the RCU_GP_FLAG_INIT bit in ->gp_state (which records
+		// the need for another grace period).  The purpose
+		// of the “offloaded” check is to avoid invoking
+		// rcu_accelerate_cbs() on an offloaded CPU because we do not
+		// hold the ->nocb_lock needed to safely access an offloaded
+		// ->cblist.  We do not want to acquire that lock because
+		// it can be heavily contended during callback floods.
+
 		WRITE_ONCE(rcu_state.gp_flags, RCU_GP_FLAG_INIT);
 		WRITE_ONCE(rcu_state.gp_req_activity, jiffies);
-		trace_rcu_grace_period(rcu_state.name,
-				       rcu_state.gp_seq,
-				       TPS("newreq"));
+		trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("newreq"));
 	} else {
-		WRITE_ONCE(rcu_state.gp_flags,
-			   rcu_state.gp_flags & RCU_GP_FLAG_INIT);
+
+		// We get here either if there is no need for an
+		// additional grace period or if rcu_accelerate_cbs() has
+		// already set the RCU_GP_FLAG_INIT bit in ->gp_flags. 
+		// So all we need to do is to clear all of the other
+		// ->gp_flags bits.
+
+		WRITE_ONCE(rcu_state.gp_flags, rcu_state.gp_flags & RCU_GP_FLAG_INIT);
 	}
 	raw_spin_unlock_irq_rcu_node(rnp);
 
-- 
2.31.1.189.g2e36527f23


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH rcu 06/11] rcu: Print number of online CPUs in RCU CPU stall-warning messages
  2022-04-18 22:53 [PATCH rcu 0/11] Miscellaneous fixes for v5.19 Paul E. McKenney
                   ` (4 preceding siblings ...)
  2022-04-18 22:53 ` [PATCH rcu 05/11] rcu: Add comments to final rcu_gp_cleanup() "if" statement Paul E. McKenney
@ 2022-04-18 22:53 ` Paul E. McKenney
  2022-04-18 22:53 ` [PATCH rcu 07/11] rcu: Fix preemption mode check on synchronize_rcu[_expedited]() Paul E. McKenney
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 24+ messages in thread
From: Paul E. McKenney @ 2022-04-18 22:53 UTC (permalink / raw)
  To: rcu; +Cc: linux-kernel, kernel-team, rostedt, Paul E. McKenney

RCU's synchronous grace periods act quite differently when there is
only one online CPU, especially in the no-op case in kernels built with
CONFIG_PREEMPTION=n.  This change in behavior can be important debugging
information, so this commit adds the number of online CPUs to the RCU
CPU stall warning messages.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree_stall.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index 0c5d8516516a..268dd79c58e7 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -565,9 +565,9 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
 
 	for_each_possible_cpu(cpu)
 		totqlen += rcu_get_n_cbs_cpu(cpu);
-	pr_cont("\t(detected by %d, t=%ld jiffies, g=%ld, q=%lu)\n",
+	pr_cont("\t(detected by %d, t=%ld jiffies, g=%ld, q=%lu ncpus=%d)\n",
 	       smp_processor_id(), (long)(jiffies - gps),
-	       (long)rcu_seq_current(&rcu_state.gp_seq), totqlen);
+	       (long)rcu_seq_current(&rcu_state.gp_seq), totqlen, rcu_state.n_online_cpus);
 	if (ndetected) {
 		rcu_dump_cpu_stacks();
 
@@ -626,9 +626,9 @@ static void print_cpu_stall(unsigned long gps)
 	raw_spin_unlock_irqrestore_rcu_node(rdp->mynode, flags);
 	for_each_possible_cpu(cpu)
 		totqlen += rcu_get_n_cbs_cpu(cpu);
-	pr_cont("\t(t=%lu jiffies g=%ld q=%lu)\n",
+	pr_cont("\t(t=%lu jiffies g=%ld q=%lu ncpus=%d)\n",
 		jiffies - gps,
-		(long)rcu_seq_current(&rcu_state.gp_seq), totqlen);
+		(long)rcu_seq_current(&rcu_state.gp_seq), totqlen, rcu_state.n_online_cpus);
 
 	rcu_check_gp_kthread_expired_fqs_timer();
 	rcu_check_gp_kthread_starvation();
-- 
2.31.1.189.g2e36527f23


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH rcu 07/11] rcu: Fix preemption mode check on synchronize_rcu[_expedited]()
  2022-04-18 22:53 [PATCH rcu 0/11] Miscellaneous fixes for v5.19 Paul E. McKenney
                   ` (5 preceding siblings ...)
  2022-04-18 22:53 ` [PATCH rcu 06/11] rcu: Print number of online CPUs in RCU CPU stall-warning messages Paul E. McKenney
@ 2022-04-18 22:53 ` Paul E. McKenney
  2022-04-18 22:53 ` [PATCH rcu 08/11] srcu: Drop needless initialization of sdp in srcu_gp_start() Paul E. McKenney
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 24+ messages in thread
From: Paul E. McKenney @ 2022-04-18 22:53 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, rostedt, Frederic Weisbecker,
	Paul E . McKenney, Uladzislau Rezki, Joel Fernandes, Boqun Feng,
	Peter Zijlstra, Neeraj Upadhyay, Valentin Schneider

From: Frederic Weisbecker <frederic@kernel.org>

An early check on synchronize_rcu[_expedited]() tries to determine if
the current CPU is in UP mode on an SMP no-preempt kernel, in which case
there is no need to start a grace period since the current assumed
quiescent state is all we need.

However the preemption mode doesn't take into account the boot selected
preemption mode under CONFIG_PREEMPT_DYNAMIC=y, missing a possible
early return if the running flavour is "none" or "voluntary".

Use the shiny new preempt mode accessors to fix this.  However,
avoid invoking them during early boot because doing so triggers a
WARN_ON_ONCE().

[ paulmck: Update for mainlined API. ]

Reported-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 29669070348e..d3caa82b9954 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3741,7 +3741,9 @@ static int rcu_blocking_is_gp(void)
 {
 	int ret;
 
-	if (IS_ENABLED(CONFIG_PREEMPTION))
+	// Invoking preempt_model_*() too early gets a splat.
+	if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE ||
+	    preempt_model_full() || preempt_model_rt())
 		return rcu_scheduler_active == RCU_SCHEDULER_INACTIVE;
 	might_sleep();  /* Check for RCU read-side critical section. */
 	preempt_disable();
-- 
2.31.1.189.g2e36527f23


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH rcu 08/11] srcu: Drop needless initialization of sdp in srcu_gp_start()
  2022-04-18 22:53 [PATCH rcu 0/11] Miscellaneous fixes for v5.19 Paul E. McKenney
                   ` (6 preceding siblings ...)
  2022-04-18 22:53 ` [PATCH rcu 07/11] rcu: Fix preemption mode check on synchronize_rcu[_expedited]() Paul E. McKenney
@ 2022-04-18 22:53 ` Paul E. McKenney
  2022-04-19  2:03   ` kernel test robot
  2022-04-18 22:53 ` [PATCH rcu 09/11] rcu: Check for successful spawn of ->boost_kthread_task Paul E. McKenney
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 24+ messages in thread
From: Paul E. McKenney @ 2022-04-18 22:53 UTC (permalink / raw)
  To: rcu; +Cc: linux-kernel, kernel-team, rostedt, Lukas Bulwahn, Paul E . McKenney

From: Lukas Bulwahn <lukas.bulwahn@gmail.com>

Commit 9c7ef4c30f12 ("srcu: Make Tree SRCU able to operate without
snp_node array") initializes the local variable sdp differently depending
on the srcu's state in srcu_gp_start().  Either way, this initialization
overwrites the value used when sdp is defined.

This commit therefore drops this pointless definition-time initialization.
Although there is no functional change, compiler code generation may
be affected.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/srcutree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 6833d8887181..e0ef018612a2 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -434,7 +434,7 @@ EXPORT_SYMBOL_GPL(__srcu_read_unlock);
  */
 static void srcu_gp_start(struct srcu_struct *ssp)
 {
-	struct srcu_data *sdp = this_cpu_ptr(ssp->sda);
+	struct srcu_data *sdp;
 	int state;
 
 	lockdep_assert_held(&ACCESS_PRIVATE(ssp, lock));
-- 
2.31.1.189.g2e36527f23


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH rcu 09/11] rcu: Check for successful spawn of ->boost_kthread_task
  2022-04-18 22:53 [PATCH rcu 0/11] Miscellaneous fixes for v5.19 Paul E. McKenney
                   ` (7 preceding siblings ...)
  2022-04-18 22:53 ` [PATCH rcu 08/11] srcu: Drop needless initialization of sdp in srcu_gp_start() Paul E. McKenney
@ 2022-04-18 22:53 ` Paul E. McKenney
  2022-04-18 22:53 ` [PATCH rcu 10/11] rcu_sync: Fix comment to properly reflect rcu_sync_exit() behavior Paul E. McKenney
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 24+ messages in thread
From: Paul E. McKenney @ 2022-04-18 22:53 UTC (permalink / raw)
  To: rcu; +Cc: linux-kernel, kernel-team, rostedt, Zqiang, Paul E . McKenney

From: Zqiang <qiang1.zhang@intel.com>

For the spawning of the priority-boost kthreads can fail, improbable
though this might seem.  This commit therefore refrains from attemoting
to initiate RCU priority boosting when The ->boost_kthread_task pointer
is NULL.

Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree_plugin.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 176639c6215f..5c23aceecd62 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1125,7 +1125,8 @@ static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags)
 	__releases(rnp->lock)
 {
 	raw_lockdep_assert_held_rcu_node(rnp);
-	if (!rcu_preempt_blocked_readers_cgp(rnp) && rnp->exp_tasks == NULL) {
+	if (!rnp->boost_kthread_task ||
+	    (!rcu_preempt_blocked_readers_cgp(rnp) && !rnp->exp_tasks)) {
 		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 		return;
 	}
-- 
2.31.1.189.g2e36527f23


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH rcu 10/11] rcu_sync: Fix comment to properly reflect rcu_sync_exit() behavior
  2022-04-18 22:53 [PATCH rcu 0/11] Miscellaneous fixes for v5.19 Paul E. McKenney
                   ` (8 preceding siblings ...)
  2022-04-18 22:53 ` [PATCH rcu 09/11] rcu: Check for successful spawn of ->boost_kthread_task Paul E. McKenney
@ 2022-04-18 22:53 ` Paul E. McKenney
  2022-04-18 22:53 ` [PATCH rcu 11/11] rcu: Use IRQ_WORK_INIT_HARD() to avoid rcu_read_unlock() hangs Paul E. McKenney
       [not found] ` <20220419085607.2014-1-hdanton@sina.com>
  11 siblings, 0 replies; 24+ messages in thread
From: Paul E. McKenney @ 2022-04-18 22:53 UTC (permalink / raw)
  To: rcu; +Cc: linux-kernel, kernel-team, rostedt, David Vernet, Paul E . McKenney

From: David Vernet <void@manifault.com>

The rcu_sync_enter() function is used by updaters to force RCU readers
(e.g. percpu-rwsem) to use their slow paths during an update.  This is
accomplished by setting the ->gp_state of the rcu_sync structure to
GP_ENTER.  In the case of percpu-rwsem, the readers' slow path waits on
a semaphore instead of just incrementing a reader count.  Each updater
invokes the rcu_sync_exit() function to signal to readers that they
may again take their fastpaths.  The rcu_sync_exit() function sets the
->gp_state of the rcu_sync structure to GP_EXIT, and if all goes well,
after a grace period the ->gp_state reverts back to GP_IDLE.

Unfortunately, the rcu_sync_enter() function currently has a comment
incorrectly stating that rcu_sync_exit() (by an updater) will re-enable
reader "slowpaths".  This patch changes the comment to state that this
function re-enables reader fastpaths.

Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/sync.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/sync.c b/kernel/rcu/sync.c
index 33d896d85902..5cefc702158f 100644
--- a/kernel/rcu/sync.c
+++ b/kernel/rcu/sync.c
@@ -111,7 +111,7 @@ static void rcu_sync_func(struct rcu_head *rhp)
  * a slowpath during the update.  After this function returns, all
  * subsequent calls to rcu_sync_is_idle() will return false, which
  * tells readers to stay off their fastpaths.  A later call to
- * rcu_sync_exit() re-enables reader slowpaths.
+ * rcu_sync_exit() re-enables reader fastpaths.
  *
  * When called in isolation, rcu_sync_enter() must wait for a grace
  * period, however, closely spaced calls to rcu_sync_enter() can
-- 
2.31.1.189.g2e36527f23


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH rcu 11/11] rcu: Use IRQ_WORK_INIT_HARD() to avoid rcu_read_unlock() hangs
  2022-04-18 22:53 [PATCH rcu 0/11] Miscellaneous fixes for v5.19 Paul E. McKenney
                   ` (9 preceding siblings ...)
  2022-04-18 22:53 ` [PATCH rcu 10/11] rcu_sync: Fix comment to properly reflect rcu_sync_exit() behavior Paul E. McKenney
@ 2022-04-18 22:53 ` Paul E. McKenney
       [not found] ` <20220419085607.2014-1-hdanton@sina.com>
  11 siblings, 0 replies; 24+ messages in thread
From: Paul E. McKenney @ 2022-04-18 22:53 UTC (permalink / raw)
  To: rcu; +Cc: linux-kernel, kernel-team, rostedt, Zqiang, Paul E . McKenney

From: Zqiang <qiang1.zhang@intel.com>

When booting kernels built with both CONFIG_RCU_STRICT_GRACE_PERIOD=y
and CONFIG_PREEMPT_RT=y, the rcu_read_unlock_special() function's
invocation of irq_work_queue_on() the init_irq_work() causes the
rcu_preempt_deferred_qs_handler() function to work execute in SCHED_FIFO
irq_work kthreads.  Because rcu_read_unlock_special() is invoked on each
rcu_read_unlock() in such kernels, the amount of work just keeps piling
up, resulting in a boot-time hang.

This commit therefore avoids this hang by using IRQ_WORK_INIT_HARD()
instead of init_irq_work(), but only in kernels built with both
CONFIG_PREEMPT_RT=y and CONFIG_RCU_STRICT_GRACE_PERIOD=y.

Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree_plugin.h | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 5c23aceecd62..2a3715419073 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -661,7 +661,13 @@ static void rcu_read_unlock_special(struct task_struct *t)
 			    expboost && !rdp->defer_qs_iw_pending && cpu_online(rdp->cpu)) {
 				// Get scheduler to re-evaluate and call hooks.
 				// If !IRQ_WORK, FQS scan will eventually IPI.
-				init_irq_work(&rdp->defer_qs_iw, rcu_preempt_deferred_qs_handler);
+				if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) &&
+				    IS_ENABLED(CONFIG_PREEMPT_RT))
+					rdp->defer_qs_iw = IRQ_WORK_INIT_HARD(
+								rcu_preempt_deferred_qs_handler);
+				else
+					init_irq_work(&rdp->defer_qs_iw,
+						      rcu_preempt_deferred_qs_handler);
 				rdp->defer_qs_iw_pending = true;
 				irq_work_queue_on(&rdp->defer_qs_iw, rdp->cpu);
 			}
-- 
2.31.1.189.g2e36527f23


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH rcu 08/11] srcu: Drop needless initialization of sdp in srcu_gp_start()
  2022-04-18 22:53 ` [PATCH rcu 08/11] srcu: Drop needless initialization of sdp in srcu_gp_start() Paul E. McKenney
@ 2022-04-19  2:03   ` kernel test robot
  2022-04-19  3:25       ` Paul E. McKenney
  0 siblings, 1 reply; 24+ messages in thread
From: kernel test robot @ 2022-04-19  2:03 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: llvm, kbuild-all

Hi "Paul,

I love your patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on linux/master v5.18-rc3]
[cannot apply to paulmck-rcu/dev paulmck-rcu/rcu/next next-20220414]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Paul-E-McKenney/Miscellaneous-fixes-for-v5-19/20220419-065730
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git b2d229d4ddb17db541098b83524d901257e93845
config: i386-randconfig-a013 (https://download.01.org/0day-ci/archive/20220419/202204191057.dy5dzVNo-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 429cbac0390654f90bba18a41799464adf31a5ec)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/375cc5ddcf9646bfab6ec257cfa297439ed15273
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Paul-E-McKenney/Miscellaneous-fixes-for-v5-19/20220419-065730
        git checkout 375cc5ddcf9646bfab6ec257cfa297439ed15273
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash kernel/rcu/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> kernel/rcu/srcutree.c:443:25: warning: variable 'sdp' is uninitialized when used here [-Wuninitialized]
           rcu_segcblist_advance(&sdp->srcu_cblist,
                                  ^~~
   kernel/rcu/srcutree.c:437:23: note: initialize the variable 'sdp' to silence this warning
           struct srcu_data *sdp;
                                ^
                                 = NULL
   1 warning generated.


vim +/sdp +443 kernel/rcu/srcutree.c

dad81a2026841b Paul E. McKenney 2017-03-25  431  
dad81a2026841b Paul E. McKenney 2017-03-25  432  /*
dad81a2026841b Paul E. McKenney 2017-03-25  433   * Start an SRCU grace period.
dad81a2026841b Paul E. McKenney 2017-03-25  434   */
aacb5d91ab1bfb Paul E. McKenney 2018-10-28  435  static void srcu_gp_start(struct srcu_struct *ssp)
dad81a2026841b Paul E. McKenney 2017-03-25  436  {
375cc5ddcf9646 Lukas Bulwahn    2022-04-18  437  	struct srcu_data *sdp;
dad81a2026841b Paul E. McKenney 2017-03-25  438  	int state;
dad81a2026841b Paul E. McKenney 2017-03-25  439  
aacb5d91ab1bfb Paul E. McKenney 2018-10-28  440  	lockdep_assert_held(&ACCESS_PRIVATE(ssp, lock));
aacb5d91ab1bfb Paul E. McKenney 2018-10-28  441  	WARN_ON_ONCE(ULONG_CMP_GE(ssp->srcu_gp_seq, ssp->srcu_gp_seq_needed));
eb4c2382272ae7 Dennis Krein     2018-10-26  442  	spin_lock_rcu_node(sdp);  /* Interrupts already disabled. */
da915ad5cf25b5 Paul E. McKenney 2017-04-05 @443  	rcu_segcblist_advance(&sdp->srcu_cblist,
aacb5d91ab1bfb Paul E. McKenney 2018-10-28  444  			      rcu_seq_current(&ssp->srcu_gp_seq));
da915ad5cf25b5 Paul E. McKenney 2017-04-05  445  	(void)rcu_segcblist_accelerate(&sdp->srcu_cblist,
aacb5d91ab1bfb Paul E. McKenney 2018-10-28  446  				       rcu_seq_snap(&ssp->srcu_gp_seq));
eb4c2382272ae7 Dennis Krein     2018-10-26  447  	spin_unlock_rcu_node(sdp);  /* Interrupts remain disabled. */
2da4b2a7fd8de5 Paul E. McKenney 2017-04-25  448  	smp_mb(); /* Order prior store to ->srcu_gp_seq_needed vs. GP start. */
aacb5d91ab1bfb Paul E. McKenney 2018-10-28  449  	rcu_seq_start(&ssp->srcu_gp_seq);
710426068dc60f Paul E. McKenney 2020-01-03  450  	state = rcu_seq_state(ssp->srcu_gp_seq);
dad81a2026841b Paul E. McKenney 2017-03-25  451  	WARN_ON_ONCE(state != SRCU_STATE_SCAN1);
dad81a2026841b Paul E. McKenney 2017-03-25  452  }
dad81a2026841b Paul E. McKenney 2017-03-25  453  

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH rcu 08/11] srcu: Drop needless initialization of sdp in srcu_gp_start()
  2022-04-19  2:03   ` kernel test robot
@ 2022-04-19  3:25       ` Paul E. McKenney
  0 siblings, 0 replies; 24+ messages in thread
From: Paul E. McKenney @ 2022-04-19  3:25 UTC (permalink / raw)
  To: kernel test robot; +Cc: llvm, kbuild-all

On Tue, Apr 19, 2022 at 10:03:17AM +0800, kernel test robot wrote:
> Hi "Paul,
> 
> I love your patch! Perhaps something to improve:
> 
> [auto build test WARNING on linus/master]
> [also build test WARNING on linux/master v5.18-rc3]
> [cannot apply to paulmck-rcu/dev paulmck-rcu/rcu/next next-20220414]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]

I recently updated my patch-generation scripts, and they apparently
need help.  Please ignore this patch.  The 21/21 variant is the correct
version.

							Thanx, Paul

> url:    https://github.com/intel-lab-lkp/linux/commits/Paul-E-McKenney/Miscellaneous-fixes-for-v5-19/20220419-065730
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git b2d229d4ddb17db541098b83524d901257e93845
> config: i386-randconfig-a013 (https://download.01.org/0day-ci/archive/20220419/202204191057.dy5dzVNo-lkp@intel.com/config)
> compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 429cbac0390654f90bba18a41799464adf31a5ec)
> reproduce (this is a W=1 build):
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # https://github.com/intel-lab-lkp/linux/commit/375cc5ddcf9646bfab6ec257cfa297439ed15273
>         git remote add linux-review https://github.com/intel-lab-lkp/linux
>         git fetch --no-tags linux-review Paul-E-McKenney/Miscellaneous-fixes-for-v5-19/20220419-065730
>         git checkout 375cc5ddcf9646bfab6ec257cfa297439ed15273
>         # save the config file to linux build tree
>         mkdir build_dir
>         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash kernel/rcu/
> 
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
> 
> All warnings (new ones prefixed by >>):
> 
> >> kernel/rcu/srcutree.c:443:25: warning: variable 'sdp' is uninitialized when used here [-Wuninitialized]
>            rcu_segcblist_advance(&sdp->srcu_cblist,
>                                   ^~~
>    kernel/rcu/srcutree.c:437:23: note: initialize the variable 'sdp' to silence this warning
>            struct srcu_data *sdp;
>                                 ^
>                                  = NULL
>    1 warning generated.
> 
> 
> vim +/sdp +443 kernel/rcu/srcutree.c
> 
> dad81a2026841b Paul E. McKenney 2017-03-25  431  
> dad81a2026841b Paul E. McKenney 2017-03-25  432  /*
> dad81a2026841b Paul E. McKenney 2017-03-25  433   * Start an SRCU grace period.
> dad81a2026841b Paul E. McKenney 2017-03-25  434   */
> aacb5d91ab1bfb Paul E. McKenney 2018-10-28  435  static void srcu_gp_start(struct srcu_struct *ssp)
> dad81a2026841b Paul E. McKenney 2017-03-25  436  {
> 375cc5ddcf9646 Lukas Bulwahn    2022-04-18  437  	struct srcu_data *sdp;
> dad81a2026841b Paul E. McKenney 2017-03-25  438  	int state;
> dad81a2026841b Paul E. McKenney 2017-03-25  439  
> aacb5d91ab1bfb Paul E. McKenney 2018-10-28  440  	lockdep_assert_held(&ACCESS_PRIVATE(ssp, lock));
> aacb5d91ab1bfb Paul E. McKenney 2018-10-28  441  	WARN_ON_ONCE(ULONG_CMP_GE(ssp->srcu_gp_seq, ssp->srcu_gp_seq_needed));
> eb4c2382272ae7 Dennis Krein     2018-10-26  442  	spin_lock_rcu_node(sdp);  /* Interrupts already disabled. */
> da915ad5cf25b5 Paul E. McKenney 2017-04-05 @443  	rcu_segcblist_advance(&sdp->srcu_cblist,
> aacb5d91ab1bfb Paul E. McKenney 2018-10-28  444  			      rcu_seq_current(&ssp->srcu_gp_seq));
> da915ad5cf25b5 Paul E. McKenney 2017-04-05  445  	(void)rcu_segcblist_accelerate(&sdp->srcu_cblist,
> aacb5d91ab1bfb Paul E. McKenney 2018-10-28  446  				       rcu_seq_snap(&ssp->srcu_gp_seq));
> eb4c2382272ae7 Dennis Krein     2018-10-26  447  	spin_unlock_rcu_node(sdp);  /* Interrupts remain disabled. */
> 2da4b2a7fd8de5 Paul E. McKenney 2017-04-25  448  	smp_mb(); /* Order prior store to ->srcu_gp_seq_needed vs. GP start. */
> aacb5d91ab1bfb Paul E. McKenney 2018-10-28  449  	rcu_seq_start(&ssp->srcu_gp_seq);
> 710426068dc60f Paul E. McKenney 2020-01-03  450  	state = rcu_seq_state(ssp->srcu_gp_seq);
> dad81a2026841b Paul E. McKenney 2017-03-25  451  	WARN_ON_ONCE(state != SRCU_STATE_SCAN1);
> dad81a2026841b Paul E. McKenney 2017-03-25  452  }
> dad81a2026841b Paul E. McKenney 2017-03-25  453  
> 
> -- 
> 0-DAY CI Kernel Test Service
> https://01.org/lkp

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH rcu 08/11] srcu: Drop needless initialization of sdp in srcu_gp_start()
@ 2022-04-19  3:25       ` Paul E. McKenney
  0 siblings, 0 replies; 24+ messages in thread
From: Paul E. McKenney @ 2022-04-19  3:25 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 4585 bytes --]

On Tue, Apr 19, 2022 at 10:03:17AM +0800, kernel test robot wrote:
> Hi "Paul,
> 
> I love your patch! Perhaps something to improve:
> 
> [auto build test WARNING on linus/master]
> [also build test WARNING on linux/master v5.18-rc3]
> [cannot apply to paulmck-rcu/dev paulmck-rcu/rcu/next next-20220414]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]

I recently updated my patch-generation scripts, and they apparently
need help.  Please ignore this patch.  The 21/21 variant is the correct
version.

							Thanx, Paul

> url:    https://github.com/intel-lab-lkp/linux/commits/Paul-E-McKenney/Miscellaneous-fixes-for-v5-19/20220419-065730
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git b2d229d4ddb17db541098b83524d901257e93845
> config: i386-randconfig-a013 (https://download.01.org/0day-ci/archive/20220419/202204191057.dy5dzVNo-lkp(a)intel.com/config)
> compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 429cbac0390654f90bba18a41799464adf31a5ec)
> reproduce (this is a W=1 build):
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # https://github.com/intel-lab-lkp/linux/commit/375cc5ddcf9646bfab6ec257cfa297439ed15273
>         git remote add linux-review https://github.com/intel-lab-lkp/linux
>         git fetch --no-tags linux-review Paul-E-McKenney/Miscellaneous-fixes-for-v5-19/20220419-065730
>         git checkout 375cc5ddcf9646bfab6ec257cfa297439ed15273
>         # save the config file to linux build tree
>         mkdir build_dir
>         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash kernel/rcu/
> 
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
> 
> All warnings (new ones prefixed by >>):
> 
> >> kernel/rcu/srcutree.c:443:25: warning: variable 'sdp' is uninitialized when used here [-Wuninitialized]
>            rcu_segcblist_advance(&sdp->srcu_cblist,
>                                   ^~~
>    kernel/rcu/srcutree.c:437:23: note: initialize the variable 'sdp' to silence this warning
>            struct srcu_data *sdp;
>                                 ^
>                                  = NULL
>    1 warning generated.
> 
> 
> vim +/sdp +443 kernel/rcu/srcutree.c
> 
> dad81a2026841b Paul E. McKenney 2017-03-25  431  
> dad81a2026841b Paul E. McKenney 2017-03-25  432  /*
> dad81a2026841b Paul E. McKenney 2017-03-25  433   * Start an SRCU grace period.
> dad81a2026841b Paul E. McKenney 2017-03-25  434   */
> aacb5d91ab1bfb Paul E. McKenney 2018-10-28  435  static void srcu_gp_start(struct srcu_struct *ssp)
> dad81a2026841b Paul E. McKenney 2017-03-25  436  {
> 375cc5ddcf9646 Lukas Bulwahn    2022-04-18  437  	struct srcu_data *sdp;
> dad81a2026841b Paul E. McKenney 2017-03-25  438  	int state;
> dad81a2026841b Paul E. McKenney 2017-03-25  439  
> aacb5d91ab1bfb Paul E. McKenney 2018-10-28  440  	lockdep_assert_held(&ACCESS_PRIVATE(ssp, lock));
> aacb5d91ab1bfb Paul E. McKenney 2018-10-28  441  	WARN_ON_ONCE(ULONG_CMP_GE(ssp->srcu_gp_seq, ssp->srcu_gp_seq_needed));
> eb4c2382272ae7 Dennis Krein     2018-10-26  442  	spin_lock_rcu_node(sdp);  /* Interrupts already disabled. */
> da915ad5cf25b5 Paul E. McKenney 2017-04-05 @443  	rcu_segcblist_advance(&sdp->srcu_cblist,
> aacb5d91ab1bfb Paul E. McKenney 2018-10-28  444  			      rcu_seq_current(&ssp->srcu_gp_seq));
> da915ad5cf25b5 Paul E. McKenney 2017-04-05  445  	(void)rcu_segcblist_accelerate(&sdp->srcu_cblist,
> aacb5d91ab1bfb Paul E. McKenney 2018-10-28  446  				       rcu_seq_snap(&ssp->srcu_gp_seq));
> eb4c2382272ae7 Dennis Krein     2018-10-26  447  	spin_unlock_rcu_node(sdp);  /* Interrupts remain disabled. */
> 2da4b2a7fd8de5 Paul E. McKenney 2017-04-25  448  	smp_mb(); /* Order prior store to ->srcu_gp_seq_needed vs. GP start. */
> aacb5d91ab1bfb Paul E. McKenney 2018-10-28  449  	rcu_seq_start(&ssp->srcu_gp_seq);
> 710426068dc60f Paul E. McKenney 2020-01-03  450  	state = rcu_seq_state(ssp->srcu_gp_seq);
> dad81a2026841b Paul E. McKenney 2017-03-25  451  	WARN_ON_ONCE(state != SRCU_STATE_SCAN1);
> dad81a2026841b Paul E. McKenney 2017-03-25  452  }
> dad81a2026841b Paul E. McKenney 2017-03-25  453  
> 
> -- 
> 0-DAY CI Kernel Test Service
> https://01.org/lkp

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH rcu 04/11] kernel/smp: Provide boot-time timeout for CSD lock diagnostics
  2022-04-18 22:53 ` [PATCH rcu 04/11] kernel/smp: Provide boot-time timeout for CSD lock diagnostics Paul E. McKenney
@ 2022-04-19  7:11   ` Juergen Gross
  2022-04-19 16:44     ` Paul E. McKenney
  0 siblings, 1 reply; 24+ messages in thread
From: Juergen Gross @ 2022-04-19  7:11 UTC (permalink / raw)
  To: Paul E. McKenney, rcu
  Cc: linux-kernel, kernel-team, rostedt, Rik van Riel, Peter Zijlstra,
	Ingo Molnar, Thomas Gleixner, Sebastian Andrzej Siewior


[-- Attachment #1.1.1: Type: text/plain, Size: 3429 bytes --]

On 19.04.22 00:53, Paul E. McKenney wrote:
> Debugging of problems involving insanely long-running SMI handlers
> proceeds better if the CSD-lock timeout can be adjusted.  This commit
> therefore provides a new smp.csd_lock_timeout kernel boot parameter
> that specifies the timeout in milliseconds.  The default remains at the
> previously hard-coded value of five seconds.
> 
> Cc: Rik van Riel <riel@surriel.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Cc: Juergen Gross <jgross@suse.com>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> ---
>   Documentation/admin-guide/kernel-parameters.txt | 11 +++++++++++
>   kernel/smp.c                                    |  7 +++++--
>   2 files changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 3f1cc5e317ed..645c4c001b16 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -5377,6 +5377,17 @@
>   	smart2=		[HW]
>   			Format: <io1>[,<io2>[,...,<io8>]]
>   
> +	smp.csd_lock_timeout= [KNL]
> +			Specify the period of time in milliseconds
> +			that smp_call_function() and friends will wait
> +			for a CPU to release the CSD lock.  This is
> +			useful when diagnosing bugs involving CPUs
> +			disabling interrupts for extended periods
> +			of time.  Defaults to 5,000 milliseconds, and
> +			setting a value of zero disables this feature.
> +			This feature may be more efficiently disabled
> +			using the csdlock_debug- kernel parameter.
> +
>   	smsc-ircc2.nopnp	[HW] Don't use PNP to discover SMC devices
>   	smsc-ircc2.ircc_cfg=	[HW] Device configuration I/O port
>   	smsc-ircc2.ircc_sir=	[HW] SIR base I/O port
> diff --git a/kernel/smp.c b/kernel/smp.c
> index 01a7c1706a58..d82439bac401 100644
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -183,7 +183,9 @@ static DEFINE_PER_CPU(smp_call_func_t, cur_csd_func);
>   static DEFINE_PER_CPU(void *, cur_csd_info);
>   static DEFINE_PER_CPU(struct cfd_seq_local, cfd_seq_local);
>   
> -#define CSD_LOCK_TIMEOUT (5ULL * NSEC_PER_SEC)
> +static ulong csd_lock_timeout = 5000;  /* CSD lock timeout in milliseconds. */
> +module_param(csd_lock_timeout, ulong, 0444);
> +
>   static atomic_t csd_bug_count = ATOMIC_INIT(0);
>   static u64 cfd_seq;
>   
> @@ -329,6 +331,7 @@ static bool csd_lock_wait_toolong(struct __call_single_data *csd, u64 ts0, u64 *
>   	u64 ts2, ts_delta;
>   	call_single_data_t *cpu_cur_csd;
>   	unsigned int flags = READ_ONCE(csd->node.u_flags);
> +	unsigned long long csd_lock_timeout_ns = csd_lock_timeout * NSEC_PER_MSEC;
>   
>   	if (!(flags & CSD_FLAG_LOCK)) {
>   		if (!unlikely(*bug_id))
> @@ -341,7 +344,7 @@ static bool csd_lock_wait_toolong(struct __call_single_data *csd, u64 ts0, u64 *
>   
>   	ts2 = sched_clock();
>   	ts_delta = ts2 - *ts1;
> -	if (likely(ts_delta <= CSD_LOCK_TIMEOUT))
> +	if (likely(ts_delta <= csd_lock_timeout_ns || csd_lock_timeout_ns <= 0))

csd_lock_timeout_ns is unsigned, so "csd_lock_timeout_ns <= 0" should be rather
"csd_lock_timeout_ns == 0".

With that fixed you can add my

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3149 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH rcu 04/11] kernel/smp: Provide boot-time timeout for CSD lock diagnostics
       [not found] ` <20220419085607.2014-1-hdanton@sina.com>
@ 2022-04-19 13:46   ` Paul E. McKenney
  2022-04-19 14:11     ` Dmitry Vyukov
  0 siblings, 1 reply; 24+ messages in thread
From: Paul E. McKenney @ 2022-04-19 13:46 UTC (permalink / raw)
  To: Hillf Danton; +Cc: Dmitry Vyukov, linux-kernel

On Tue, Apr 19, 2022 at 04:56:07PM +0800, Hillf Danton wrote:
> On Mon, 18 Apr 2022 15:53:52 -0700 Paul E. McKenney wrote:
> > Debugging of problems involving insanely long-running SMI handlers
> > proceeds better if the CSD-lock timeout can be adjusted.  This commit
> > therefore provides a new smp.csd_lock_timeout kernel boot parameter
> > that specifies the timeout in milliseconds.  The default remains at the
> > previously hard-coded value of five seconds.
> > 
> > Cc: Rik van Riel <riel@surriel.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > Cc: Juergen Gross <jgross@suse.com>
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > ---
> >  Documentation/admin-guide/kernel-parameters.txt | 11 +++++++++++
> >  kernel/smp.c                                    |  7 +++++--
> >  2 files changed, 16 insertions(+), 2 deletions(-)
> > 
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > index 3f1cc5e317ed..645c4c001b16 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -5377,6 +5377,17 @@
> >  	smart2=		[HW]
> >  			Format: <io1>[,<io2>[,...,<io8>]]
> >  
> > +	smp.csd_lock_timeout= [KNL]
> > +			Specify the period of time in milliseconds
> > +			that smp_call_function() and friends will wait
> > +			for a CPU to release the CSD lock.  This is
> > +			useful when diagnosing bugs involving CPUs
> > +			disabling interrupts for extended periods
> > +			of time.  Defaults to 5,000 milliseconds, and
> > +			setting a value of zero disables this feature.
> > +			This feature may be more efficiently disabled
> > +			using the csdlock_debug- kernel parameter.
> > +
> 
> Can non-responsive CSD lock detected trigger syzbot (warning) report?

If they enable it by building with CONFIG_CSD_LOCK_WAIT_DEBUG=y, yes.  

							Thanx, Paul

> Hillf
> >  	smsc-ircc2.nopnp	[HW] Don't use PNP to discover SMC devices
> >  	smsc-ircc2.ircc_cfg=	[HW] Device configuration I/O port
> >  	smsc-ircc2.ircc_sir=	[HW] SIR base I/O port
> > diff --git a/kernel/smp.c b/kernel/smp.c
> > index 01a7c1706a58..d82439bac401 100644
> > --- a/kernel/smp.c
> > +++ b/kernel/smp.c
> > @@ -183,7 +183,9 @@ static DEFINE_PER_CPU(smp_call_func_t, cur_csd_func);
> >  static DEFINE_PER_CPU(void *, cur_csd_info);
> >  static DEFINE_PER_CPU(struct cfd_seq_local, cfd_seq_local);
> >  
> > -#define CSD_LOCK_TIMEOUT (5ULL * NSEC_PER_SEC)
> > +static ulong csd_lock_timeout = 5000;  /* CSD lock timeout in milliseconds. */
> > +module_param(csd_lock_timeout, ulong, 0444);
> > +
> >  static atomic_t csd_bug_count = ATOMIC_INIT(0);
> >  static u64 cfd_seq;
> >  
> > @@ -329,6 +331,7 @@ static bool csd_lock_wait_toolong(struct __call_single_data *csd, u64 ts0, u64 *
> >  	u64 ts2, ts_delta;
> >  	call_single_data_t *cpu_cur_csd;
> >  	unsigned int flags = READ_ONCE(csd->node.u_flags);
> > +	unsigned long long csd_lock_timeout_ns = csd_lock_timeout * NSEC_PER_MSEC;
> >  
> >  	if (!(flags & CSD_FLAG_LOCK)) {
> >  		if (!unlikely(*bug_id))
> > @@ -341,7 +344,7 @@ static bool csd_lock_wait_toolong(struct __call_single_data *csd, u64 ts0, u64 *
> >  
> >  	ts2 = sched_clock();
> >  	ts_delta = ts2 - *ts1;
> > -	if (likely(ts_delta <= CSD_LOCK_TIMEOUT))
> > +	if (likely(ts_delta <= csd_lock_timeout_ns || csd_lock_timeout_ns <= 0))
> >  		return false;
> >  
> >  	firsttime = !*bug_id;
> > -- 
> > 2.31.1.189.g2e36527f23

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH rcu 04/11] kernel/smp: Provide boot-time timeout for CSD lock diagnostics
  2022-04-19 13:46   ` [PATCH rcu 04/11] kernel/smp: Provide boot-time timeout for CSD lock diagnostics Paul E. McKenney
@ 2022-04-19 14:11     ` Dmitry Vyukov
  2022-04-19 16:49       ` Paul E. McKenney
  0 siblings, 1 reply; 24+ messages in thread
From: Dmitry Vyukov @ 2022-04-19 14:11 UTC (permalink / raw)
  To: paulmck; +Cc: Hillf Danton, linux-kernel, syzkaller

On Tue, 19 Apr 2022 at 15:46, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Tue, Apr 19, 2022 at 04:56:07PM +0800, Hillf Danton wrote:
> > On Mon, 18 Apr 2022 15:53:52 -0700 Paul E. McKenney wrote:
> > > Debugging of problems involving insanely long-running SMI handlers
> > > proceeds better if the CSD-lock timeout can be adjusted.  This commit
> > > therefore provides a new smp.csd_lock_timeout kernel boot parameter
> > > that specifies the timeout in milliseconds.  The default remains at the
> > > previously hard-coded value of five seconds.
> > >
> > > Cc: Rik van Riel <riel@surriel.com>
> > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > Cc: Ingo Molnar <mingo@kernel.org>
> > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > > Cc: Juergen Gross <jgross@suse.com>
> > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > > ---
> > >  Documentation/admin-guide/kernel-parameters.txt | 11 +++++++++++
> > >  kernel/smp.c                                    |  7 +++++--
> > >  2 files changed, 16 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > > index 3f1cc5e317ed..645c4c001b16 100644
> > > --- a/Documentation/admin-guide/kernel-parameters.txt
> > > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > > @@ -5377,6 +5377,17 @@
> > >     smart2=         [HW]
> > >                     Format: <io1>[,<io2>[,...,<io8>]]
> > >
> > > +   smp.csd_lock_timeout= [KNL]
> > > +                   Specify the period of time in milliseconds
> > > +                   that smp_call_function() and friends will wait
> > > +                   for a CPU to release the CSD lock.  This is
> > > +                   useful when diagnosing bugs involving CPUs
> > > +                   disabling interrupts for extended periods
> > > +                   of time.  Defaults to 5,000 milliseconds, and
> > > +                   setting a value of zero disables this feature.
> > > +                   This feature may be more efficiently disabled
> > > +                   using the csdlock_debug- kernel parameter.
> > > +
> >
> > Can non-responsive CSD lock detected trigger syzbot (warning) report?
>
> If they enable it by building with CONFIG_CSD_LOCK_WAIT_DEBUG=y, yes.

+syzkaller mailing list

Currently we don't enable CONFIG_CSD_LOCK_WAIT_DEBUG in syzbot configs.
Is it a generally useful debugging feature recommended to be enabled
in kernel testing systems?
If we enabled it, we also need to figure out where it fits into the
timeout hierarchy and adjust smp.csd_lock_timeout:
https://github.com/google/syzkaller/blob/master/dashboard/config/linux/bits/x86_64.yml#L15-L40


>                                                         Thanx, Paul
>
> > Hillf
> > >     smsc-ircc2.nopnp        [HW] Don't use PNP to discover SMC devices
> > >     smsc-ircc2.ircc_cfg=    [HW] Device configuration I/O port
> > >     smsc-ircc2.ircc_sir=    [HW] SIR base I/O port
> > > diff --git a/kernel/smp.c b/kernel/smp.c
> > > index 01a7c1706a58..d82439bac401 100644
> > > --- a/kernel/smp.c
> > > +++ b/kernel/smp.c
> > > @@ -183,7 +183,9 @@ static DEFINE_PER_CPU(smp_call_func_t, cur_csd_func);
> > >  static DEFINE_PER_CPU(void *, cur_csd_info);
> > >  static DEFINE_PER_CPU(struct cfd_seq_local, cfd_seq_local);
> > >
> > > -#define CSD_LOCK_TIMEOUT (5ULL * NSEC_PER_SEC)
> > > +static ulong csd_lock_timeout = 5000;  /* CSD lock timeout in milliseconds. */
> > > +module_param(csd_lock_timeout, ulong, 0444);
> > > +
> > >  static atomic_t csd_bug_count = ATOMIC_INIT(0);
> > >  static u64 cfd_seq;
> > >
> > > @@ -329,6 +331,7 @@ static bool csd_lock_wait_toolong(struct __call_single_data *csd, u64 ts0, u64 *
> > >     u64 ts2, ts_delta;
> > >     call_single_data_t *cpu_cur_csd;
> > >     unsigned int flags = READ_ONCE(csd->node.u_flags);
> > > +   unsigned long long csd_lock_timeout_ns = csd_lock_timeout * NSEC_PER_MSEC;
> > >
> > >     if (!(flags & CSD_FLAG_LOCK)) {
> > >             if (!unlikely(*bug_id))
> > > @@ -341,7 +344,7 @@ static bool csd_lock_wait_toolong(struct __call_single_data *csd, u64 ts0, u64 *
> > >
> > >     ts2 = sched_clock();
> > >     ts_delta = ts2 - *ts1;
> > > -   if (likely(ts_delta <= CSD_LOCK_TIMEOUT))
> > > +   if (likely(ts_delta <= csd_lock_timeout_ns || csd_lock_timeout_ns <= 0))
> > >             return false;
> > >
> > >     firsttime = !*bug_id;
> > > --
> > > 2.31.1.189.g2e36527f23

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH rcu 04/11] kernel/smp: Provide boot-time timeout for CSD lock diagnostics
  2022-04-19  7:11   ` Juergen Gross
@ 2022-04-19 16:44     ` Paul E. McKenney
  0 siblings, 0 replies; 24+ messages in thread
From: Paul E. McKenney @ 2022-04-19 16:44 UTC (permalink / raw)
  To: Juergen Gross
  Cc: rcu, linux-kernel, kernel-team, rostedt, Rik van Riel,
	Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	Sebastian Andrzej Siewior

On Tue, Apr 19, 2022 at 09:11:46AM +0200, Juergen Gross wrote:
> On 19.04.22 00:53, Paul E. McKenney wrote:
> > Debugging of problems involving insanely long-running SMI handlers
> > proceeds better if the CSD-lock timeout can be adjusted.  This commit
> > therefore provides a new smp.csd_lock_timeout kernel boot parameter
> > that specifies the timeout in milliseconds.  The default remains at the
> > previously hard-coded value of five seconds.
> > 
> > Cc: Rik van Riel <riel@surriel.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > Cc: Juergen Gross <jgross@suse.com>
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > ---
> >   Documentation/admin-guide/kernel-parameters.txt | 11 +++++++++++
> >   kernel/smp.c                                    |  7 +++++--
> >   2 files changed, 16 insertions(+), 2 deletions(-)
> > 
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > index 3f1cc5e317ed..645c4c001b16 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -5377,6 +5377,17 @@
> >   	smart2=		[HW]
> >   			Format: <io1>[,<io2>[,...,<io8>]]
> > +	smp.csd_lock_timeout= [KNL]
> > +			Specify the period of time in milliseconds
> > +			that smp_call_function() and friends will wait
> > +			for a CPU to release the CSD lock.  This is
> > +			useful when diagnosing bugs involving CPUs
> > +			disabling interrupts for extended periods
> > +			of time.  Defaults to 5,000 milliseconds, and
> > +			setting a value of zero disables this feature.
> > +			This feature may be more efficiently disabled
> > +			using the csdlock_debug- kernel parameter.
> > +
> >   	smsc-ircc2.nopnp	[HW] Don't use PNP to discover SMC devices
> >   	smsc-ircc2.ircc_cfg=	[HW] Device configuration I/O port
> >   	smsc-ircc2.ircc_sir=	[HW] SIR base I/O port
> > diff --git a/kernel/smp.c b/kernel/smp.c
> > index 01a7c1706a58..d82439bac401 100644
> > --- a/kernel/smp.c
> > +++ b/kernel/smp.c
> > @@ -183,7 +183,9 @@ static DEFINE_PER_CPU(smp_call_func_t, cur_csd_func);
> >   static DEFINE_PER_CPU(void *, cur_csd_info);
> >   static DEFINE_PER_CPU(struct cfd_seq_local, cfd_seq_local);
> > -#define CSD_LOCK_TIMEOUT (5ULL * NSEC_PER_SEC)
> > +static ulong csd_lock_timeout = 5000;  /* CSD lock timeout in milliseconds. */
> > +module_param(csd_lock_timeout, ulong, 0444);
> > +
> >   static atomic_t csd_bug_count = ATOMIC_INIT(0);
> >   static u64 cfd_seq;
> > @@ -329,6 +331,7 @@ static bool csd_lock_wait_toolong(struct __call_single_data *csd, u64 ts0, u64 *
> >   	u64 ts2, ts_delta;
> >   	call_single_data_t *cpu_cur_csd;
> >   	unsigned int flags = READ_ONCE(csd->node.u_flags);
> > +	unsigned long long csd_lock_timeout_ns = csd_lock_timeout * NSEC_PER_MSEC;
> >   	if (!(flags & CSD_FLAG_LOCK)) {
> >   		if (!unlikely(*bug_id))
> > @@ -341,7 +344,7 @@ static bool csd_lock_wait_toolong(struct __call_single_data *csd, u64 ts0, u64 *
> >   	ts2 = sched_clock();
> >   	ts_delta = ts2 - *ts1;
> > -	if (likely(ts_delta <= CSD_LOCK_TIMEOUT))
> > +	if (likely(ts_delta <= csd_lock_timeout_ns || csd_lock_timeout_ns <= 0))
> 
> csd_lock_timeout_ns is unsigned, so "csd_lock_timeout_ns <= 0" should be rather
> "csd_lock_timeout_ns == 0".
> 
> With that fixed you can add my
> 
> Reviewed-by: Juergen Gross <jgross@suse.com>

Thank you, and will do on next rebase.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH rcu 04/11] kernel/smp: Provide boot-time timeout for CSD lock diagnostics
  2022-04-19 14:11     ` Dmitry Vyukov
@ 2022-04-19 16:49       ` Paul E. McKenney
       [not found]         ` <20220419231820.2089-1-hdanton@sina.com>
  0 siblings, 1 reply; 24+ messages in thread
From: Paul E. McKenney @ 2022-04-19 16:49 UTC (permalink / raw)
  To: Dmitry Vyukov; +Cc: Hillf Danton, linux-kernel, syzkaller

On Tue, Apr 19, 2022 at 04:11:36PM +0200, Dmitry Vyukov wrote:
> On Tue, 19 Apr 2022 at 15:46, Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Tue, Apr 19, 2022 at 04:56:07PM +0800, Hillf Danton wrote:
> > > On Mon, 18 Apr 2022 15:53:52 -0700 Paul E. McKenney wrote:
> > > > Debugging of problems involving insanely long-running SMI handlers
> > > > proceeds better if the CSD-lock timeout can be adjusted.  This commit
> > > > therefore provides a new smp.csd_lock_timeout kernel boot parameter
> > > > that specifies the timeout in milliseconds.  The default remains at the
> > > > previously hard-coded value of five seconds.
> > > >
> > > > Cc: Rik van Riel <riel@surriel.com>
> > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > Cc: Ingo Molnar <mingo@kernel.org>
> > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > > > Cc: Juergen Gross <jgross@suse.com>
> > > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > > > ---
> > > >  Documentation/admin-guide/kernel-parameters.txt | 11 +++++++++++
> > > >  kernel/smp.c                                    |  7 +++++--
> > > >  2 files changed, 16 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > > > index 3f1cc5e317ed..645c4c001b16 100644
> > > > --- a/Documentation/admin-guide/kernel-parameters.txt
> > > > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > > > @@ -5377,6 +5377,17 @@
> > > >     smart2=         [HW]
> > > >                     Format: <io1>[,<io2>[,...,<io8>]]
> > > >
> > > > +   smp.csd_lock_timeout= [KNL]
> > > > +                   Specify the period of time in milliseconds
> > > > +                   that smp_call_function() and friends will wait
> > > > +                   for a CPU to release the CSD lock.  This is
> > > > +                   useful when diagnosing bugs involving CPUs
> > > > +                   disabling interrupts for extended periods
> > > > +                   of time.  Defaults to 5,000 milliseconds, and
> > > > +                   setting a value of zero disables this feature.
> > > > +                   This feature may be more efficiently disabled
> > > > +                   using the csdlock_debug- kernel parameter.
> > > > +
> > >
> > > Can non-responsive CSD lock detected trigger syzbot (warning) report?
> >
> > If they enable it by building with CONFIG_CSD_LOCK_WAIT_DEBUG=y, yes.
> 
> +syzkaller mailing list
> 
> Currently we don't enable CONFIG_CSD_LOCK_WAIT_DEBUG in syzbot configs.
> Is it a generally useful debugging feature recommended to be enabled
> in kernel testing systems?

With the default value for smp.csd_lock_timeout, it detects CPUs that have
had interrupts disabled for more than five seconds, which can be useful
for detecting what would otherwise be silent response-time failures.

> If we enabled it, we also need to figure out where it fits into the
> timeout hierarchy and adjust smp.csd_lock_timeout:
> https://github.com/google/syzkaller/blob/master/dashboard/config/linux/bits/x86_64.yml#L15-L40

On this, I must defer to you guys.  I can say that the larger the value
that you choose for smp.csd_lock_timeout, the nearer the beginnning of
that group it should go, but you guys knew that already.  ;-)

                                                        Thanx, Paul

> > > Hillf
> > > >     smsc-ircc2.nopnp        [HW] Don't use PNP to discover SMC devices
> > > >     smsc-ircc2.ircc_cfg=    [HW] Device configuration I/O port
> > > >     smsc-ircc2.ircc_sir=    [HW] SIR base I/O port
> > > > diff --git a/kernel/smp.c b/kernel/smp.c
> > > > index 01a7c1706a58..d82439bac401 100644
> > > > --- a/kernel/smp.c
> > > > +++ b/kernel/smp.c
> > > > @@ -183,7 +183,9 @@ static DEFINE_PER_CPU(smp_call_func_t, cur_csd_func);
> > > >  static DEFINE_PER_CPU(void *, cur_csd_info);
> > > >  static DEFINE_PER_CPU(struct cfd_seq_local, cfd_seq_local);
> > > >
> > > > -#define CSD_LOCK_TIMEOUT (5ULL * NSEC_PER_SEC)
> > > > +static ulong csd_lock_timeout = 5000;  /* CSD lock timeout in milliseconds. */
> > > > +module_param(csd_lock_timeout, ulong, 0444);
> > > > +
> > > >  static atomic_t csd_bug_count = ATOMIC_INIT(0);
> > > >  static u64 cfd_seq;
> > > >
> > > > @@ -329,6 +331,7 @@ static bool csd_lock_wait_toolong(struct __call_single_data *csd, u64 ts0, u64 *
> > > >     u64 ts2, ts_delta;
> > > >     call_single_data_t *cpu_cur_csd;
> > > >     unsigned int flags = READ_ONCE(csd->node.u_flags);
> > > > +   unsigned long long csd_lock_timeout_ns = csd_lock_timeout * NSEC_PER_MSEC;
> > > >
> > > >     if (!(flags & CSD_FLAG_LOCK)) {
> > > >             if (!unlikely(*bug_id))
> > > > @@ -341,7 +344,7 @@ static bool csd_lock_wait_toolong(struct __call_single_data *csd, u64 ts0, u64 *
> > > >
> > > >     ts2 = sched_clock();
> > > >     ts_delta = ts2 - *ts1;
> > > > -   if (likely(ts_delta <= CSD_LOCK_TIMEOUT))
> > > > +   if (likely(ts_delta <= csd_lock_timeout_ns || csd_lock_timeout_ns <= 0))
> > > >             return false;
> > > >
> > > >     firsttime = !*bug_id;
> > > > --
> > > > 2.31.1.189.g2e36527f23

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH rcu 04/11] kernel/smp: Provide boot-time timeout for CSD lock diagnostics
       [not found]         ` <20220419231820.2089-1-hdanton@sina.com>
@ 2022-04-20 12:17           ` Dmitry Vyukov
  2022-04-20 13:41             ` Paul E. McKenney
  0 siblings, 1 reply; 24+ messages in thread
From: Dmitry Vyukov @ 2022-04-20 12:17 UTC (permalink / raw)
  To: Hillf Danton; +Cc: linux-kernel, Paul E. McKenney, syzkaller

On Wed, 20 Apr 2022 at 01:18, Hillf Danton <hdanton@sina.com> wrote:
>
> On Tue, 19 Apr 2022 09:49:08 -0700 Paul E. McKenney wrote:
> > On Tue, Apr 19, 2022 at 04:11:36PM +0200, Dmitry Vyukov wrote:
> > > On Tue, 19 Apr 2022 at 15:46, Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > On Tue, Apr 19, 2022 at 04:56:07PM +0800, Hillf Danton wrote:
> > > > > On Mon, 18 Apr 2022 15:53:52 -0700 Paul E. McKenney wrote:
> > > > > > Debugging of problems involving insanely long-running SMI handlers
> > > > > > proceeds better if the CSD-lock timeout can be adjusted.  This commit
> > > > > > therefore provides a new smp.csd_lock_timeout kernel boot parameter
> > > > > > that specifies the timeout in milliseconds.  The default remains at the
> > > > > > previously hard-coded value of five seconds.
> > > > > >
> > > > > > Cc: Rik van Riel <riel@surriel.com>
> > > > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > > > Cc: Ingo Molnar <mingo@kernel.org>
> > > > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > > > Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > > > > > Cc: Juergen Gross <jgross@suse.com>
> > > > > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > > > > > ---
> > > > > >  Documentation/admin-guide/kernel-parameters.txt | 11 +++++++++++
> > > > > >  kernel/smp.c                                    |  7 +++++--
> > > > > >  2 files changed, 16 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > > > > > index 3f1cc5e317ed..645c4c001b16 100644
> > > > > > --- a/Documentation/admin-guide/kernel-parameters.txt
> > > > > > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > > > > > @@ -5377,6 +5377,17 @@
> > > > > >     smart2=         [HW]
> > > > > >                     Format: <io1>[,<io2>[,...,<io8>]]
> > > > > >
> > > > > > +   smp.csd_lock_timeout= [KNL]
> > > > > > +                   Specify the period of time in milliseconds
> > > > > > +                   that smp_call_function() and friends will wait
> > > > > > +                   for a CPU to release the CSD lock.  This is
> > > > > > +                   useful when diagnosing bugs involving CPUs
> > > > > > +                   disabling interrupts for extended periods
> > > > > > +                   of time.  Defaults to 5,000 milliseconds, and
> > > > > > +                   setting a value of zero disables this feature.
> > > > > > +                   This feature may be more efficiently disabled
> > > > > > +                   using the csdlock_debug- kernel parameter.
> > > > > > +
> > > > >
> > > > > Can non-responsive CSD lock detected trigger syzbot (warning) report?
> > > >
> > > > If they enable it by building with CONFIG_CSD_LOCK_WAIT_DEBUG=y, yes.
> > >
> > > +syzkaller mailing list
> > >
> > > Currently we don't enable CONFIG_CSD_LOCK_WAIT_DEBUG in syzbot configs.
> > > Is it a generally useful debugging feature recommended to be enabled
> > > in kernel testing systems?
> >
> > With the default value for smp.csd_lock_timeout, it detects CPUs that have
> > had interrupts disabled for more than five seconds, which can be useful
> > for detecting what would otherwise be silent response-time failures.
>
> The five seconds take precedence over the 143s in reports [1] IMO.
> The shorter timeout helps select reproducers which in turn help find answer
> to questions there.


I've sent https://github.com/google/syzkaller/pull/3090 to enable the config.
5 seconds won't be reliable, I've set it to 100s to match CPU stall timeout.



> Hillf
>
> [1] https://lore.kernel.org/lkml/20220321210140.GK4285@paulmck-ThinkPad-P17-Gen-1/
>
> >
> > > If we enabled it, we also need to figure out where it fits into the
> > > timeout hierarchy and adjust smp.csd_lock_timeout:
> > > https://github.com/google/syzkaller/blob/master/dashboard/config/linux/bits/x86_64.yml#L15-L40
> >
> > On this, I must defer to you guys.  I can say that the larger the value
> > that you choose for smp.csd_lock_timeout, the nearer the beginnning of
> > that group it should go, but you guys knew that already.  ;-)
> >
> >                                                         Thanx, Paul
>
> --
> You received this message because you are subscribed to the Google Groups "syzkaller" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller+unsubscribe@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller/20220419231820.2089-1-hdanton%40sina.com.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH rcu 04/11] kernel/smp: Provide boot-time timeout for CSD lock diagnostics
  2022-04-20 12:17           ` Dmitry Vyukov
@ 2022-04-20 13:41             ` Paul E. McKenney
  2022-04-20 14:00               ` Dmitry Vyukov
  0 siblings, 1 reply; 24+ messages in thread
From: Paul E. McKenney @ 2022-04-20 13:41 UTC (permalink / raw)
  To: Dmitry Vyukov; +Cc: Hillf Danton, linux-kernel, syzkaller

On Wed, Apr 20, 2022 at 02:17:21PM +0200, Dmitry Vyukov wrote:
> On Wed, 20 Apr 2022 at 01:18, Hillf Danton <hdanton@sina.com> wrote:
> > On Tue, 19 Apr 2022 09:49:08 -0700 Paul E. McKenney wrote:
> > > On Tue, Apr 19, 2022 at 04:11:36PM +0200, Dmitry Vyukov wrote:
> > > > On Tue, 19 Apr 2022 at 15:46, Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > > On Tue, Apr 19, 2022 at 04:56:07PM +0800, Hillf Danton wrote:
> > > > > > On Mon, 18 Apr 2022 15:53:52 -0700 Paul E. McKenney wrote:
> > > > > > > Debugging of problems involving insanely long-running SMI handlers
> > > > > > > proceeds better if the CSD-lock timeout can be adjusted.  This commit
> > > > > > > therefore provides a new smp.csd_lock_timeout kernel boot parameter
> > > > > > > that specifies the timeout in milliseconds.  The default remains at the
> > > > > > > previously hard-coded value of five seconds.
> > > > > > >
> > > > > > > Cc: Rik van Riel <riel@surriel.com>
> > > > > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > > > > Cc: Ingo Molnar <mingo@kernel.org>
> > > > > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > > > > Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > > > > > > Cc: Juergen Gross <jgross@suse.com>
> > > > > > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > > > > > > ---
> > > > > > >  Documentation/admin-guide/kernel-parameters.txt | 11 +++++++++++
> > > > > > >  kernel/smp.c                                    |  7 +++++--
> > > > > > >  2 files changed, 16 insertions(+), 2 deletions(-)
> > > > > > >
> > > > > > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > > > > > > index 3f1cc5e317ed..645c4c001b16 100644
> > > > > > > --- a/Documentation/admin-guide/kernel-parameters.txt
> > > > > > > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > > > > > > @@ -5377,6 +5377,17 @@
> > > > > > >     smart2=         [HW]
> > > > > > >                     Format: <io1>[,<io2>[,...,<io8>]]
> > > > > > >
> > > > > > > +   smp.csd_lock_timeout= [KNL]
> > > > > > > +                   Specify the period of time in milliseconds
> > > > > > > +                   that smp_call_function() and friends will wait
> > > > > > > +                   for a CPU to release the CSD lock.  This is
> > > > > > > +                   useful when diagnosing bugs involving CPUs
> > > > > > > +                   disabling interrupts for extended periods
> > > > > > > +                   of time.  Defaults to 5,000 milliseconds, and
> > > > > > > +                   setting a value of zero disables this feature.
> > > > > > > +                   This feature may be more efficiently disabled
> > > > > > > +                   using the csdlock_debug- kernel parameter.
> > > > > > > +
> > > > > >
> > > > > > Can non-responsive CSD lock detected trigger syzbot (warning) report?
> > > > >
> > > > > If they enable it by building with CONFIG_CSD_LOCK_WAIT_DEBUG=y, yes.
> > > >
> > > > +syzkaller mailing list
> > > >
> > > > Currently we don't enable CONFIG_CSD_LOCK_WAIT_DEBUG in syzbot configs.
> > > > Is it a generally useful debugging feature recommended to be enabled
> > > > in kernel testing systems?
> > >
> > > With the default value for smp.csd_lock_timeout, it detects CPUs that have
> > > had interrupts disabled for more than five seconds, which can be useful
> > > for detecting what would otherwise be silent response-time failures.
> >
> > The five seconds take precedence over the 143s in reports [1] IMO.
> > The shorter timeout helps select reproducers which in turn help find answer
> > to questions there.
> 
> I've sent https://github.com/google/syzkaller/pull/3090 to enable the config.
> 5 seconds won't be reliable, I've set it to 100s to match CPU stall timeout.

Thank you, Dmitry!

Is there something I should be doing to enable the RCU CPU stall timeout
to be reduced?

							Thanx, Paul

> > [1] https://lore.kernel.org/lkml/20220321210140.GK4285@paulmck-ThinkPad-P17-Gen-1/
> >
> > >
> > > > If we enabled it, we also need to figure out where it fits into the
> > > > timeout hierarchy and adjust smp.csd_lock_timeout:
> > > > https://github.com/google/syzkaller/blob/master/dashboard/config/linux/bits/x86_64.yml#L15-L40
> > >
> > > On this, I must defer to you guys.  I can say that the larger the value
> > > that you choose for smp.csd_lock_timeout, the nearer the beginnning of
> > > that group it should go, but you guys knew that already.  ;-)
> > >
> > >                                                         Thanx, Paul
> >
> > --
> > You received this message because you are subscribed to the Google Groups "syzkaller" group.
> > To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller+unsubscribe@googlegroups.com.
> > To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller/20220419231820.2089-1-hdanton%40sina.com.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH rcu 04/11] kernel/smp: Provide boot-time timeout for CSD lock diagnostics
  2022-04-20 13:41             ` Paul E. McKenney
@ 2022-04-20 14:00               ` Dmitry Vyukov
  2022-04-20 14:12                 ` Paul E. McKenney
  0 siblings, 1 reply; 24+ messages in thread
From: Dmitry Vyukov @ 2022-04-20 14:00 UTC (permalink / raw)
  To: paulmck; +Cc: Hillf Danton, linux-kernel, syzkaller

On Wed, 20 Apr 2022 at 15:41, Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > > > > > Debugging of problems involving insanely long-running SMI handlers
> > > > > > > > proceeds better if the CSD-lock timeout can be adjusted.  This commit
> > > > > > > > therefore provides a new smp.csd_lock_timeout kernel boot parameter
> > > > > > > > that specifies the timeout in milliseconds.  The default remains at the
> > > > > > > > previously hard-coded value of five seconds.
> > > > > > > >
> > > > > > > > Cc: Rik van Riel <riel@surriel.com>
> > > > > > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > > > > > Cc: Ingo Molnar <mingo@kernel.org>
> > > > > > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > > > > > Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > > > > > > > Cc: Juergen Gross <jgross@suse.com>
> > > > > > > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > > > > > > > ---
> > > > > > > >  Documentation/admin-guide/kernel-parameters.txt | 11 +++++++++++
> > > > > > > >  kernel/smp.c                                    |  7 +++++--
> > > > > > > >  2 files changed, 16 insertions(+), 2 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > > > > > > > index 3f1cc5e317ed..645c4c001b16 100644
> > > > > > > > --- a/Documentation/admin-guide/kernel-parameters.txt
> > > > > > > > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > > > > > > > @@ -5377,6 +5377,17 @@
> > > > > > > >     smart2=         [HW]
> > > > > > > >                     Format: <io1>[,<io2>[,...,<io8>]]
> > > > > > > >
> > > > > > > > +   smp.csd_lock_timeout= [KNL]
> > > > > > > > +                   Specify the period of time in milliseconds
> > > > > > > > +                   that smp_call_function() and friends will wait
> > > > > > > > +                   for a CPU to release the CSD lock.  This is
> > > > > > > > +                   useful when diagnosing bugs involving CPUs
> > > > > > > > +                   disabling interrupts for extended periods
> > > > > > > > +                   of time.  Defaults to 5,000 milliseconds, and
> > > > > > > > +                   setting a value of zero disables this feature.
> > > > > > > > +                   This feature may be more efficiently disabled
> > > > > > > > +                   using the csdlock_debug- kernel parameter.
> > > > > > > > +
> > > > > > >
> > > > > > > Can non-responsive CSD lock detected trigger syzbot (warning) report?
> > > > > >
> > > > > > If they enable it by building with CONFIG_CSD_LOCK_WAIT_DEBUG=y, yes.
> > > > >
> > > > > +syzkaller mailing list
> > > > >
> > > > > Currently we don't enable CONFIG_CSD_LOCK_WAIT_DEBUG in syzbot configs.
> > > > > Is it a generally useful debugging feature recommended to be enabled
> > > > > in kernel testing systems?
> > > >
> > > > With the default value for smp.csd_lock_timeout, it detects CPUs that have
> > > > had interrupts disabled for more than five seconds, which can be useful
> > > > for detecting what would otherwise be silent response-time failures.
> > >
> > > The five seconds take precedence over the 143s in reports [1] IMO.
> > > The shorter timeout helps select reproducers which in turn help find answer
> > > to questions there.
> >
> > I've sent https://github.com/google/syzkaller/pull/3090 to enable the config.
> > 5 seconds won't be reliable, I've set it to 100s to match CPU stall timeout.
>
> Thank you, Dmitry!
>
> Is there something I should be doing to enable the RCU CPU stall timeout
> to be reduced?

No.
We just bumped it to a high enough value to not cause false positives.
If it's stalled, it's stalled.


>                                                         Thanx, Paul
>
> > > [1] https://lore.kernel.org/lkml/20220321210140.GK4285@paulmck-ThinkPad-P17-Gen-1/
> > >
> > > >
> > > > > If we enabled it, we also need to figure out where it fits into the
> > > > > timeout hierarchy and adjust smp.csd_lock_timeout:
> > > > > https://github.com/google/syzkaller/blob/master/dashboard/config/linux/bits/x86_64.yml#L15-L40
> > > >
> > > > On this, I must defer to you guys.  I can say that the larger the value
> > > > that you choose for smp.csd_lock_timeout, the nearer the beginnning of
> > > > that group it should go, but you guys knew that already.  ;-)
> > > >
> > > >                                                         Thanx, Paul
> > >
> > > --
> > > You received this message because you are subscribed to the Google Groups "syzkaller" group.
> > > To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller+unsubscribe@googlegroups.com.
> > > To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller/20220419231820.2089-1-hdanton%40sina.com.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH rcu 04/11] kernel/smp: Provide boot-time timeout for CSD lock diagnostics
  2022-04-20 14:00               ` Dmitry Vyukov
@ 2022-04-20 14:12                 ` Paul E. McKenney
  0 siblings, 0 replies; 24+ messages in thread
From: Paul E. McKenney @ 2022-04-20 14:12 UTC (permalink / raw)
  To: Dmitry Vyukov; +Cc: Hillf Danton, linux-kernel, syzkaller

On Wed, Apr 20, 2022 at 04:00:07PM +0200, Dmitry Vyukov wrote:
> On Wed, 20 Apr 2022 at 15:41, Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > > > > > > Debugging of problems involving insanely long-running SMI handlers
> > > > > > > > > proceeds better if the CSD-lock timeout can be adjusted.  This commit
> > > > > > > > > therefore provides a new smp.csd_lock_timeout kernel boot parameter
> > > > > > > > > that specifies the timeout in milliseconds.  The default remains at the
> > > > > > > > > previously hard-coded value of five seconds.
> > > > > > > > >
> > > > > > > > > Cc: Rik van Riel <riel@surriel.com>
> > > > > > > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > > > > > > Cc: Ingo Molnar <mingo@kernel.org>
> > > > > > > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > > > > > > Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > > > > > > > > Cc: Juergen Gross <jgross@suse.com>
> > > > > > > > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > > > > > > > > ---
> > > > > > > > >  Documentation/admin-guide/kernel-parameters.txt | 11 +++++++++++
> > > > > > > > >  kernel/smp.c                                    |  7 +++++--
> > > > > > > > >  2 files changed, 16 insertions(+), 2 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > > > > > > > > index 3f1cc5e317ed..645c4c001b16 100644
> > > > > > > > > --- a/Documentation/admin-guide/kernel-parameters.txt
> > > > > > > > > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > > > > > > > > @@ -5377,6 +5377,17 @@
> > > > > > > > >     smart2=         [HW]
> > > > > > > > >                     Format: <io1>[,<io2>[,...,<io8>]]
> > > > > > > > >
> > > > > > > > > +   smp.csd_lock_timeout= [KNL]
> > > > > > > > > +                   Specify the period of time in milliseconds
> > > > > > > > > +                   that smp_call_function() and friends will wait
> > > > > > > > > +                   for a CPU to release the CSD lock.  This is
> > > > > > > > > +                   useful when diagnosing bugs involving CPUs
> > > > > > > > > +                   disabling interrupts for extended periods
> > > > > > > > > +                   of time.  Defaults to 5,000 milliseconds, and
> > > > > > > > > +                   setting a value of zero disables this feature.
> > > > > > > > > +                   This feature may be more efficiently disabled
> > > > > > > > > +                   using the csdlock_debug- kernel parameter.
> > > > > > > > > +
> > > > > > > >
> > > > > > > > Can non-responsive CSD lock detected trigger syzbot (warning) report?
> > > > > > >
> > > > > > > If they enable it by building with CONFIG_CSD_LOCK_WAIT_DEBUG=y, yes.
> > > > > >
> > > > > > +syzkaller mailing list
> > > > > >
> > > > > > Currently we don't enable CONFIG_CSD_LOCK_WAIT_DEBUG in syzbot configs.
> > > > > > Is it a generally useful debugging feature recommended to be enabled
> > > > > > in kernel testing systems?
> > > > >
> > > > > With the default value for smp.csd_lock_timeout, it detects CPUs that have
> > > > > had interrupts disabled for more than five seconds, which can be useful
> > > > > for detecting what would otherwise be silent response-time failures.
> > > >
> > > > The five seconds take precedence over the 143s in reports [1] IMO.
> > > > The shorter timeout helps select reproducers which in turn help find answer
> > > > to questions there.
> > >
> > > I've sent https://github.com/google/syzkaller/pull/3090 to enable the config.
> > > 5 seconds won't be reliable, I've set it to 100s to match CPU stall timeout.
> >
> > Thank you, Dmitry!
> >
> > Is there something I should be doing to enable the RCU CPU stall timeout
> > to be reduced?
> 
> No.
> We just bumped it to a high enough value to not cause false positives.
> If it's stalled, it's stalled.

Ah, your goal is to find perma-stalls.  Got it, thank you!

                                                        Thanx, Paul

> > > > [1] https://lore.kernel.org/lkml/20220321210140.GK4285@paulmck-ThinkPad-P17-Gen-1/
> > > >
> > > > >
> > > > > > If we enabled it, we also need to figure out where it fits into the
> > > > > > timeout hierarchy and adjust smp.csd_lock_timeout:
> > > > > > https://github.com/google/syzkaller/blob/master/dashboard/config/linux/bits/x86_64.yml#L15-L40
> > > > >
> > > > > On this, I must defer to you guys.  I can say that the larger the value
> > > > > that you choose for smp.csd_lock_timeout, the nearer the beginnning of
> > > > > that group it should go, but you guys knew that already.  ;-)
> > > > >
> > > > >                                                         Thanx, Paul
> > > >
> > > > --
> > > > You received this message because you are subscribed to the Google Groups "syzkaller" group.
> > > > To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller+unsubscribe@googlegroups.com.
> > > > To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller/20220419231820.2089-1-hdanton%40sina.com.

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2022-04-20 14:17 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-18 22:53 [PATCH rcu 0/11] Miscellaneous fixes for v5.19 Paul E. McKenney
2022-04-18 22:53 ` [PATCH rcu 01/11] rcu: Clarify fill-the-gap comment in rcu_segcblist_advance() Paul E. McKenney
2022-04-18 22:53 ` [PATCH rcu 02/11] rcu: Fix rcu_preempt_deferred_qs_irqrestore() strict QS reporting Paul E. McKenney
2022-04-18 22:53 ` [PATCH rcu 03/11] rcu: Check for jiffies going backwards Paul E. McKenney
2022-04-18 22:53 ` [PATCH rcu 04/11] kernel/smp: Provide boot-time timeout for CSD lock diagnostics Paul E. McKenney
2022-04-19  7:11   ` Juergen Gross
2022-04-19 16:44     ` Paul E. McKenney
2022-04-18 22:53 ` [PATCH rcu 05/11] rcu: Add comments to final rcu_gp_cleanup() "if" statement Paul E. McKenney
2022-04-18 22:53 ` [PATCH rcu 06/11] rcu: Print number of online CPUs in RCU CPU stall-warning messages Paul E. McKenney
2022-04-18 22:53 ` [PATCH rcu 07/11] rcu: Fix preemption mode check on synchronize_rcu[_expedited]() Paul E. McKenney
2022-04-18 22:53 ` [PATCH rcu 08/11] srcu: Drop needless initialization of sdp in srcu_gp_start() Paul E. McKenney
2022-04-19  2:03   ` kernel test robot
2022-04-19  3:25     ` Paul E. McKenney
2022-04-19  3:25       ` Paul E. McKenney
2022-04-18 22:53 ` [PATCH rcu 09/11] rcu: Check for successful spawn of ->boost_kthread_task Paul E. McKenney
2022-04-18 22:53 ` [PATCH rcu 10/11] rcu_sync: Fix comment to properly reflect rcu_sync_exit() behavior Paul E. McKenney
2022-04-18 22:53 ` [PATCH rcu 11/11] rcu: Use IRQ_WORK_INIT_HARD() to avoid rcu_read_unlock() hangs Paul E. McKenney
     [not found] ` <20220419085607.2014-1-hdanton@sina.com>
2022-04-19 13:46   ` [PATCH rcu 04/11] kernel/smp: Provide boot-time timeout for CSD lock diagnostics Paul E. McKenney
2022-04-19 14:11     ` Dmitry Vyukov
2022-04-19 16:49       ` Paul E. McKenney
     [not found]         ` <20220419231820.2089-1-hdanton@sina.com>
2022-04-20 12:17           ` Dmitry Vyukov
2022-04-20 13:41             ` Paul E. McKenney
2022-04-20 14:00               ` Dmitry Vyukov
2022-04-20 14:12                 ` Paul E. McKenney

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.