* [PATCH csd-lock] Decrease console output from CSD-lock timeouts
@ 2023-06-12 20:49 Paul E. McKenney
  2023-06-12 20:50 ` [PATCH csd-lock 1/2] smp: Reduce logging due to dump_stack of CSD waiters Paul E. McKenney
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Paul E. McKenney @ 2023-06-12 20:49 UTC
  To: peterz, jgross, vschneid, yury.norov; +Cc: linux-kernel, imran.f.khan

Hello!

This series contains a couple of patches that reduce the console output
produced by CSD-lock timeouts.

1.	Reduce logging due to dump_stack of CSD waiters, courtesy of
	Imran Khan.

2.	Reduce NMI traffic from CSD waiters to CSD destination, courtesy
	of Imran Khan.

						Thanx, Paul

------------------------------------------------------------------------

 b/kernel/smp.c |    3 ++-
 kernel/smp.c   |   10 +++++++++-
 2 files changed, 11 insertions(+), 2 deletions(-)


* [PATCH csd-lock 1/2] smp: Reduce logging due to dump_stack of CSD waiters
  2023-06-12 20:49 [PATCH csd-lock] Decrease console output from CSD-lock timeouts Paul E. McKenney
@ 2023-06-12 20:50 ` Paul E. McKenney
  2023-06-12 20:50 ` [PATCH csd-lock 2/2] smp: Reduce NMI traffic from CSD waiters to CSD destination Paul E. McKenney
  2023-06-25 15:42 ` [PATCH csd-lock] Decrease console output from CSD-lock timeouts Paul E. McKenney
  2 siblings, 0 replies; 5+ messages in thread
From: Paul E. McKenney @ 2023-06-12 20:50 UTC
  To: peterz, jgross, vschneid, yury.norov
  Cc: linux-kernel, imran.f.khan, kernel-team, Paul E . McKenney

From: Imran Khan <imran.f.khan@oracle.com>

If a waiter is waiting for a CSD lock, its call stack will not change
between the first and subsequent hang detections for that CSD lock.
Therefore, call dump_stack() only on the first detection for a given waiter.

This avoids excessive logging on systems with hundreds of CPUs, where
repeated dump_stack() output from hundreds of waiting CPUs would otherwise
flood the console.

Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/smp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index ab3e5dad6cfe..b7ccba677a0a 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -248,7 +248,8 @@ static bool csd_lock_wait_toolong(struct __call_single_data *csd, u64 ts0, u64 *
 			arch_send_call_function_single_ipi(cpu);
 		}
 	}
-	dump_stack();
+	if (firsttime)
+		dump_stack();
 	*ts1 = ts2;
 
 	return false;
-- 
2.40.1



* [PATCH csd-lock 2/2] smp: Reduce NMI traffic from CSD waiters to CSD destination
  2023-06-12 20:49 [PATCH csd-lock] Decrease console output from CSD-lock timeouts Paul E. McKenney
  2023-06-12 20:50 ` [PATCH csd-lock 1/2] smp: Reduce logging due to dump_stack of CSD waiters Paul E. McKenney
@ 2023-06-12 20:50 ` Paul E. McKenney
  2023-06-25 15:42 ` [PATCH csd-lock] Decrease console output from CSD-lock timeouts Paul E. McKenney
  2 siblings, 0 replies; 5+ messages in thread
From: Paul E. McKenney @ 2023-06-12 20:50 UTC
  To: peterz, jgross, vschneid, yury.norov
  Cc: linux-kernel, imran.f.khan, kernel-team, Paul E . McKenney

From: Imran Khan <imran.f.khan@oracle.com>

On systems with hundreds of CPUs, if most of the CPUs detect a CSD hang,
then all of these waiting CPUs send an NMI to the destination CPU in
order to dump its backtrace.

Given enough NMIs, the destination CPU will spend much of its time
producing backtraces, thus further delaying that CPU's response to the
original CSD IPI.  In the worst case, by the time the destination CPU is
done handling all of these backtrace NMIs, the CSD wait timeout will
have elapsed, so that the waiters resend their backtrace NMIs, further
delaying forward progress.

Therefore, to avoid these delays, issue the backtrace NMI only from
the first waiter.  The destination CPU's other waiters can make use of
the backtrace obtained from the first waiter's NMI.  The destination CPU
re-enables this backtrace NMI when it next flushes its CSD queue.

Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/smp.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index b7ccba677a0a..a1cd21ea8b30 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -43,6 +43,8 @@ static DEFINE_PER_CPU_ALIGNED(struct call_function_data, cfd_data);
 
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct llist_head, call_single_queue);
 
+static DEFINE_PER_CPU(atomic_t, trigger_backtrace) = ATOMIC_INIT(1);
+
 static void __flush_smp_call_function_queue(bool warn_cpu_offline);
 
 int smpcfd_prepare_cpu(unsigned int cpu)
@@ -242,7 +244,8 @@ static bool csd_lock_wait_toolong(struct __call_single_data *csd, u64 ts0, u64 *
 			 *bug_id, !cpu_cur_csd ? "unresponsive" : "handling this request");
 	}
 	if (cpu >= 0) {
-		dump_cpu_task(cpu);
+		if (atomic_cmpxchg_acquire(&per_cpu(trigger_backtrace, cpu), 1, 0))
+			dump_cpu_task(cpu);
 		if (!cpu_cur_csd) {
 			pr_alert("csd: Re-sending CSD lock (#%d) IPI from CPU#%02d to CPU#%02d\n", *bug_id, raw_smp_processor_id(), cpu);
 			arch_send_call_function_single_ipi(cpu);
@@ -423,9 +426,14 @@ static void __flush_smp_call_function_queue(bool warn_cpu_offline)
 	struct llist_node *entry, *prev;
 	struct llist_head *head;
 	static bool warned;
+	atomic_t *tbt;
 
 	lockdep_assert_irqs_disabled();
 
+	/* Allow waiters to send backtrace NMI from here onwards */
+	tbt = this_cpu_ptr(&trigger_backtrace);
+	atomic_set_release(tbt, 1);
+
 	head = this_cpu_ptr(&call_single_queue);
 	entry = llist_del_all(head);
 	entry = llist_reverse_order(entry);
-- 
2.40.1



* Re: [PATCH csd-lock] Decrease console output from CSD-lock timeouts
  2023-06-12 20:49 [PATCH csd-lock] Decrease console output from CSD-lock timeouts Paul E. McKenney
  2023-06-12 20:50 ` [PATCH csd-lock 1/2] smp: Reduce logging due to dump_stack of CSD waiters Paul E. McKenney
  2023-06-12 20:50 ` [PATCH csd-lock 2/2] smp: Reduce NMI traffic from CSD waiters to CSD destination Paul E. McKenney
@ 2023-06-25 15:42 ` Paul E. McKenney
  2023-07-17 15:36   ` Paul E. McKenney
  2 siblings, 1 reply; 5+ messages in thread
From: Paul E. McKenney @ 2023-06-25 15:42 UTC
  To: peterz, jgross, vschneid, yury.norov; +Cc: linux-kernel, imran.f.khan

On Mon, Jun 12, 2023 at 01:49:40PM -0700, Paul E. McKenney wrote:
> Hello!
> 
> This series contains a couple of patches that reduce the console output
> produced by CSD-lock timeouts.
> 
> 1.	Reduce logging due to dump_stack of CSD waiters, courtesy of
> 	Imran Khan.
> 
> 2.	Reduce NMI traffic from CSD waiters to CSD destination, courtesy
> 	of Imran Khan.

Hearing no objections, I plan to send Linus a pull request for these at
the end of this coming week.

						Thanx, Paul

> ------------------------------------------------------------------------
> 
>  b/kernel/smp.c |    3 ++-
>  kernel/smp.c   |   10 +++++++++-
>  2 files changed, 11 insertions(+), 2 deletions(-)


* Re: [PATCH csd-lock] Decrease console output from CSD-lock timeouts
  2023-06-25 15:42 ` [PATCH csd-lock] Decrease console output from CSD-lock timeouts Paul E. McKenney
@ 2023-07-17 15:36   ` Paul E. McKenney
  0 siblings, 0 replies; 5+ messages in thread
From: Paul E. McKenney @ 2023-07-17 15:36 UTC
  To: peterz, jgross, vschneid, yury.norov; +Cc: linux-kernel, imran.f.khan

On Sun, Jun 25, 2023 at 08:42:17AM -0700, Paul E. McKenney wrote:
> On Mon, Jun 12, 2023 at 01:49:40PM -0700, Paul E. McKenney wrote:
> > Hello!
> > 
> > This series contains a couple of patches that reduce the console output
> > produced by CSD-lock timeouts.
> > 
> > 1.	Reduce logging due to dump_stack of CSD waiters, courtesy of
> > 	Imran Khan.
> > 
> > 2.	Reduce NMI traffic from CSD waiters to CSD destination, courtesy
> > 	of Imran Khan.
> 
> Hearing no objections, I plan to send Linus a pull request for these at
> the end of this coming week.

OK, best-laid plans and all that.  Sigh.  Next merge window...

						Thanx, Paul

