From: Peter Zijlstra <peterz@infradead.org>
To: Juergen Gross <jgross@suse.com>
Cc: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
paulmck@kernel.org, mhocko@suse.com,
Jonathan Corbet <corbet@lwn.net>
Subject: Re: [PATCH v2 3/3] kernel/smp: add more data to CSD lock debugging
Date: Mon, 1 Mar 2021 18:07:24 +0100 [thread overview]
Message-ID: <YD0fTLnQTQ7/M7fx@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <20210301101336.7797-4-jgross@suse.com>
On Mon, Mar 01, 2021 at 11:13:36AM +0100, Juergen Gross wrote:
> In order to help identifying problems with IPI handling and remote
> function execution add some more data to IPI debugging code.
>
> There have been multiple reports of cpus looping long times (many
> seconds) in smp_call_function_many() waiting for another cpu executing
> a function like tlb flushing. Most of these reports have been for
> cases where the kernel was running as a guest on top of KVM or Xen
> (there are rumours of that happening under VMWare, too, and even on
> bare metal).
>
> Finding the root cause hasn't been successful yet, even after more than
> 2 years of chasing this bug by different developers.
>
> Commit 35feb60474bf4f7 ("kernel/smp: Provide CSD lock timeout
> diagnostics") tried to address this by adding some debug code and by
> issuing another IPI when a hang was detected. This helped mitigating
> the problem (the repeated IPI unlocks the hang), but the root cause is
> still unknown.
>
> Current available data suggests that either an IPI wasn't sent when it
> should have been, or that the IPI didn't result in the target cpu
> executing the queued function (due to the IPI not reaching the cpu,
> the IPI handler not being called, or the handler not seeing the queued
> request).
>
> Try to add more diagnostic data by introducing a global atomic counter
> which is being incremented when doing critical operations (before and
> after queueing a new request, when sending an IPI, and when dequeueing
> a request). The counter value is stored in percpu variables which can
> be printed out when a hang is detected.
>
> The data of the last event (consisting of sequence counter, source
> cpu, target cpu, and event type) is stored in a global variable. When
> a new event is to be traced, the data of the last event is stored in
> the event related percpu location and the global data is updated with
> the new event's data. This allows to track two events in one data
> location: one by the value of the event data (the event before the
> current one), and one by the location itself (the current event).
>
> A typical printout with a detected hang will look like this:
>
> csd: Detected non-responsive CSD lock (#1) on CPU#1, waiting 5000000003 ns for CPU#06 scf_handler_1+0x0/0x50(0xffffa2a881bb1410).
> csd: CSD lock (#1) handling prior scf_handler_1+0x0/0x50(0xffffa2a8813823c0) request.
> csd: cnt(00008cc): ffff->0000 dequeue (src cpu 0 == empty)
> csd: cnt(00008cd): ffff->0006 idle
> csd: cnt(0003668): 0001->0006 queue
> csd: cnt(0003669): 0001->0006 ipi
> csd: cnt(0003e0f): 0007->000a queue
> csd: cnt(0003e10): 0001->ffff ping
> csd: cnt(0003e71): 0003->0000 ping
> csd: cnt(0003e72): ffff->0006 gotipi
> csd: cnt(0003e73): ffff->0006 handle
> csd: cnt(0003e74): ffff->0006 dequeue (src cpu 0 == empty)
> csd: cnt(0003e7f): 0004->0006 ping
> csd: cnt(0003e80): 0001->ffff pinged
> csd: cnt(0003eb2): 0005->0001 noipi
> csd: cnt(0003eb3): 0001->0006 queue
> csd: cnt(0003eb4): 0001->0006 noipi
> csd: cnt now: 0003f00
>
> The idea is to print only relevant entries. Those are all events which
> are associated with the hang (so sender side events for the source cpu
> of the hanging request, and receiver side events for the target cpu),
> and the related events just before those (for adding data needed to
> identify a possible race). Printing all available data would be
> possible, but this would add large amounts of data printed on larger
> configurations.
>
> Signed-off-by: Juergen Gross <jgross@suse.com>
> Tested-by: Paul E. McKenney <paulmck@kernel.org>
> ---
> V2:
> - add automatic data deciphering and sorting of entries
> - add new trace point for leaving flush_smp_call_function_queue()
> - add information when finding an empty call_single_queue
They do not apply on top of these:
https://lkml.kernel.org/r/20210220231712.2475218-2-namit@vmware.com
:-/
> @@ -290,6 +476,19 @@ static DEFINE_PER_CPU_SHARED_ALIGNED(call_single_data_t, csd_data);
>
> void __smp_call_single_queue(int cpu, struct llist_node *node)
> {
> +#ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
> + if (static_branch_unlikely(&csdlock_debug_extended)) {
> + unsigned int type;
> +
> + type = CSD_TYPE(container_of(node, call_single_data_t,
> + node.llist));
> + if (type == CSD_TYPE_SYNC || type == CSD_TYPE_ASYNC) {
> + __smp_call_single_queue_debug(cpu, node);
> + return;
> + }
> + }
> +#endif
This really ought to be in generic_exec_single(), because there we know
the type matches.
next prev parent reply other threads:[~2021-03-01 20:43 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-03-01 10:13 [PATCH v2 0/3] kernel/smp.c: add more CSD lock debugging Juergen Gross
2021-03-01 10:13 ` [PATCH v2 1/3] kernel/smp: add boot parameter for controlling " Juergen Gross
2021-03-01 14:09 ` [tip: locking/core] locking/csd_lock: Add " tip-bot2 for Juergen Gross
2021-03-06 11:54 ` tip-bot2 for Juergen Gross
2021-03-01 10:13 ` [PATCH v2 2/3] kernel/smp: prepare more " Juergen Gross
2021-03-01 14:09 ` [tip: locking/core] locking/csd_lock: Prepare " tip-bot2 for Juergen Gross
2021-03-06 11:54 ` tip-bot2 for Juergen Gross
2021-03-01 10:13 ` [PATCH v2 3/3] kernel/smp: add more data to " Juergen Gross
2021-03-01 14:09 ` [tip: locking/core] locking/csd_lock: Add " tip-bot2 for Juergen Gross
2021-03-01 17:07 ` Peter Zijlstra [this message]
2021-03-01 19:16 ` [PATCH v2 3/3] kernel/smp: add " Jürgen Groß
2021-03-01 20:44 ` Peter Zijlstra
2021-03-02 7:05 ` Ingo Molnar
2021-03-02 7:14 ` Ingo Molnar
2021-03-02 5:57 ` Jürgen Groß
2021-03-06 11:54 ` [tip: locking/core] locking/csd_lock: Add " tip-bot2 for Juergen Gross
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YD0fTLnQTQ7/M7fx@hirez.programming.kicks-ass.net \
--to=peterz@infradead.org \
--cc=corbet@lwn.net \
--cc=jgross@suse.com \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mhocko@suse.com \
--cc=paulmck@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).