From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Paul Burton <paul.burton@mips.com>,
	James Hogan <jhogan@kernel.org>,
	Ralf Baechle <ralf@linux-mips.org>,
	Huacai Chen <chenhc@lemote.com>,
	linux-mips@linux-mips.org
Subject: [PATCH 4.9 01/66] MIPS: Use async IPIs for arch_trigger_cpumask_backtrace()
Date: Fri, 20 Jul 2018 14:13:18 +0200
Message-ID: <20180720121407.305714136@linuxfoundation.org>
In-Reply-To: <20180720121407.228772286@linuxfoundation.org>

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Paul Burton <paul.burton@mips.com>

commit b63e132b6433a41cf311e8bc382d33fd2b73b505 upstream.

The current MIPS implementation of arch_trigger_cpumask_backtrace() is
broken because it attempts to use synchronous IPIs despite the fact that
it may be run with interrupts disabled.

This means that when arch_trigger_cpumask_backtrace() is invoked, for
example by the RCU CPU stall watchdog, we may:

  - Deadlock due to use of synchronous IPIs with interrupts disabled,
    causing the CPU that's attempting to generate the backtrace output
    to hang itself.

  - Not succeed in generating the desired output from remote CPUs.

  - Produce warnings about this from smp_call_function_many(), for
    example:

    [42760.526910] INFO: rcu_sched detected stalls on CPUs/tasks:
    [42760.535755]  0-...!: (1 GPs behind) idle=ade/140000000000000/0 softirq=526944/526945 fqs=0
    [42760.547874]  1-...!: (0 ticks this GP) idle=e4a/140000000000000/0 softirq=547885/547885 fqs=0
    [42760.559869]  (detected by 2, t=2162 jiffies, g=266689, c=266688, q=33)
    [42760.568927] ------------[ cut here ]------------
    [42760.576146] WARNING: CPU: 2 PID: 1216 at kernel/smp.c:416 smp_call_function_many+0x88/0x20c
    [42760.587839] Modules linked in:
    [42760.593152] CPU: 2 PID: 1216 Comm: sh Not tainted 4.15.4-00373-gee058bb4d0c2 #2
    [42760.603767] Stack : 8e09bd20 8e09bd20 8e09bd20 fffffff0 00000007 00000006 00000000 8e09bca8
    [42760.616937]         95b2b379 95b2b379 807a0080 00000007 81944518 0000018a 00000032 00000000
    [42760.630095]         00000000 00000030 80000000 00000000 806eca74 00000009 8017e2b8 000001a0
    [42760.643169]         00000000 00000002 00000000 8e09baa4 00000008 808b8008 86d69080 8e09bca0
    [42760.656282]         8e09ad50 805e20aa 00000000 00000000 00000000 8017e2b8 00000009 801070ca
    [42760.669424]         ...
    [42760.673919] Call Trace:
    [42760.678672] [<27fde568>] show_stack+0x70/0xf0
    [42760.685417] [<84751641>] dump_stack+0xaa/0xd0
    [42760.692188] [<699d671c>] __warn+0x80/0x92
    [42760.698549] [<68915d41>] warn_slowpath_null+0x28/0x36
    [42760.705912] [<f7c76c1c>] smp_call_function_many+0x88/0x20c
    [42760.713696] [<6bbdfc2a>] arch_trigger_cpumask_backtrace+0x30/0x4a
    [42760.722216] [<f845bd33>] rcu_dump_cpu_stacks+0x6a/0x98
    [42760.729580] [<796e7629>] rcu_check_callbacks+0x672/0x6ac
    [42760.737476] [<059b3b43>] update_process_times+0x18/0x34
    [42760.744981] [<6eb94941>] tick_sched_handle.isra.5+0x26/0x38
    [42760.752793] [<478d3d70>] tick_sched_timer+0x1c/0x50
    [42760.759882] [<e56ea39f>] __hrtimer_run_queues+0xc6/0x226
    [42760.767418] [<e88bbcae>] hrtimer_interrupt+0x88/0x19a
    [42760.775031] [<6765a19e>] gic_compare_interrupt+0x2e/0x3a
    [42760.782761] [<0558bf5f>] handle_percpu_devid_irq+0x78/0x168
    [42760.790795] [<90c11ba2>] generic_handle_irq+0x1e/0x2c
    [42760.798117] [<1b6d462c>] gic_handle_local_int+0x38/0x86
    [42760.805545] [<b2ada1c7>] gic_irq_dispatch+0xa/0x14
    [42760.812534] [<90c11ba2>] generic_handle_irq+0x1e/0x2c
    [42760.820086] [<c7521934>] do_IRQ+0x16/0x20
    [42760.826274] [<9aef3ce6>] plat_irq_dispatch+0x62/0x94
    [42760.833458] [<6a94b53c>] except_vec_vi_end+0x70/0x78
    [42760.840655] [<22284043>] smp_call_function_many+0x1ba/0x20c
    [42760.848501] [<54022b58>] smp_call_function+0x1e/0x2c
    [42760.855693] [<ab9fc705>] flush_tlb_mm+0x2a/0x98
    [42760.862730] [<0844cdd0>] tlb_flush_mmu+0x1c/0x44
    [42760.869628] [<cb259b74>] arch_tlb_finish_mmu+0x26/0x3e
    [42760.877021] [<1aeaaf74>] tlb_finish_mmu+0x18/0x66
    [42760.883907] [<b3fce717>] exit_mmap+0x76/0xea
    [42760.890428] [<c4c8a2f6>] mmput+0x80/0x11a
    [42760.896632] [<a41a08f4>] do_exit+0x1f4/0x80c
    [42760.903158] [<ee01cef6>] do_group_exit+0x20/0x7e
    [42760.909990] [<13fa8d54>] __wake_up_parent+0x0/0x1e
    [42760.917045] [<46cf89d0>] smp_call_function_many+0x1a2/0x20c
    [42760.924893] [<8c21a93b>] syscall_common+0x14/0x1c
    [42760.931765] ---[ end trace 02aa09da9dc52a60 ]---
    [42760.938342] ------------[ cut here ]------------
    [42760.945311] WARNING: CPU: 2 PID: 1216 at kernel/smp.c:291 smp_call_function_single+0xee/0xf8
    ...

This patch switches MIPS' arch_trigger_cpumask_backtrace() to use async
IPIs & smp_call_function_single_async() in order to resolve this
problem. We ensure use of the pre-allocated call_single_data_t
structures is serialized by maintaining a cpumask indicating that
they're busy, and refusing to attempt to send an IPI when a CPU's bit is
set in this mask. This should only happen if a CPU hasn't responded to a
previous backtrace IPI - i.e. if it's hung - and we print a warning to
the console in this case.
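
The busy-mask guard described above can be illustrated outside the kernel.
What follows is only a userspace sketch using C11 atomics, not the kernel
code: busy_mask, test_and_set_busy(), clear_busy() and
try_send_backtrace_request() are hypothetical names standing in for
backtrace_csd_busy, cpumask_test_and_set_cpu(), cpumask_clear_cpu() and the
loop body of raise_backtrace(), and no IPI is actually sent.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical limit for this sketch; the kernel uses a real cpumask. */
#define MAX_CPUS 32u

/* One bit per CPU: set while that CPU's request slot is still in flight. */
static atomic_uint busy_mask;

/* Atomically set the CPU's bit; returns true if it was already set. */
static bool test_and_set_busy(unsigned int cpu)
{
	assert(cpu < MAX_CPUS);
	unsigned int bit = 1u << cpu;

	return (atomic_fetch_or(&busy_mask, bit) & bit) != 0;
}

/* Called on behalf of the target once it has handled the request. */
static void clear_busy(unsigned int cpu)
{
	assert(cpu < MAX_CPUS);
	atomic_fetch_and(&busy_mask, ~(1u << cpu));
}

/*
 * Returns true if the slot was free and a request would be sent,
 * false if the previous request to this CPU has not completed yet.
 */
static bool try_send_backtrace_request(unsigned int cpu)
{
	if (test_and_set_busy(cpu)) {
		fprintf(stderr, "CPU%u has not handled the previous request\n",
			cpu);
		return false;
	}

	/* Real code would fire an asynchronous IPI at this point. */
	return true;
}
```

In this sketch a second request to the same CPU is refused until
clear_busy() has run for it, which mirrors why a hung CPU only produces a
warning rather than a reused, still-queued data structure.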

I've marked this for stable branches as far back as v4.9, to which it
applies cleanly. Strictly speaking the faulty MIPS implementation can be
traced further back to commit 856839b76836 ("MIPS: Add
arch_trigger_all_cpu_backtrace() function") in v3.19, but kernel
versions v3.19 through v4.8 will require further work to backport due to
the rework performed in commit 9a01c3ed5cdb ("nmi_backtrace: add more
trigger_*_cpu_backtrace() methods").

Signed-off-by: Paul Burton <paul.burton@mips.com>
Patchwork: https://patchwork.linux-mips.org/patch/19597/
Cc: James Hogan <jhogan@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Huacai Chen <chenhc@lemote.com>
Cc: linux-mips@linux-mips.org
Cc: stable@vger.kernel.org # v4.9+
Fixes: 856839b76836 ("MIPS: Add arch_trigger_all_cpu_backtrace() function")
Fixes: 9a01c3ed5cdb ("nmi_backtrace: add more trigger_*_cpu_backtrace() methods")
[ Huacai: backported to 4.9: Replace "call_single_data_t" with "struct call_single_data" ]
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/mips/kernel/process.c |   45 ++++++++++++++++++++++++++++++---------------
 1 file changed, 30 insertions(+), 15 deletions(-)

--- a/arch/mips/kernel/process.c
+++ b/arch/mips/kernel/process.c
@@ -26,6 +26,7 @@
 #include <linux/kallsyms.h>
 #include <linux/random.h>
 #include <linux/prctl.h>
+#include <linux/nmi.h>
 
 #include <asm/asm.h>
 #include <asm/bootinfo.h>
@@ -633,28 +634,42 @@ unsigned long arch_align_stack(unsigned
 	return sp & ALMASK;
 }
 
-static void arch_dump_stack(void *info)
-{
-	struct pt_regs *regs;
+static DEFINE_PER_CPU(struct call_single_data, backtrace_csd);
+static struct cpumask backtrace_csd_busy;
 
-	regs = get_irq_regs();
-
-	if (regs)
-		show_regs(regs);
-	else
-		dump_stack();
+static void handle_backtrace(void *info)
+{
+	nmi_cpu_backtrace(get_irq_regs());
+	cpumask_clear_cpu(smp_processor_id(), &backtrace_csd_busy);
 }
 
-void arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self)
+static void raise_backtrace(cpumask_t *mask)
 {
-	long this_cpu = get_cpu();
+	struct call_single_data *csd;
+	int cpu;
 
-	if (cpumask_test_cpu(this_cpu, mask) && !exclude_self)
-		dump_stack();
+	for_each_cpu(cpu, mask) {
+		/*
+		 * If we previously sent an IPI to the target CPU & it hasn't
+		 * cleared its bit in the busy cpumask then it didn't handle
+		 * our previous IPI & it's not safe for us to reuse the
+		 * call_single_data_t.
+		 */
+		if (cpumask_test_and_set_cpu(cpu, &backtrace_csd_busy)) {
+			pr_warn("Unable to send backtrace IPI to CPU%u - perhaps it hung?\n",
+				cpu);
+			continue;
+		}
 
-	smp_call_function_many(mask, arch_dump_stack, NULL, 1);
+		csd = &per_cpu(backtrace_csd, cpu);
+		csd->func = handle_backtrace;
+		smp_call_function_single_async(cpu, csd);
+	}
+}
 
-	put_cpu();
+void arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self)
+{
+	nmi_trigger_cpumask_backtrace(mask, exclude_self, raise_backtrace);
 }
 
 int mips_get_process_fp_mode(struct task_struct *task)


