From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Vitaly Kuznetsov <vkuznets@redhat.com>,
	"K. Y. Srinivasan" <kys@microsoft.com>,
	devel@linuxdriverproject.org,
	Haiyang Zhang <haiyangz@microsoft.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@kernel.org>,
	Sasha Levin <alexander.levin@verizon.com>
Subject: [PATCH 4.9 52/93] x86/hyperv: Handle unknown NMIs on one CPU when unknown_nmi_panic
Date: Mon, 20 Mar 2017 18:51:27 +0100
Message-ID: <20170320174738.365262958@linuxfoundation.org>
In-Reply-To: <20170320174735.243147498@linuxfoundation.org>

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Vitaly Kuznetsov <vkuznets@redhat.com>

[ Upstream commit 59107e2f48831daedc46973ce4988605ab066de3 ]

There is a feature in Hyper-V ('Debug-VM --InjectNonMaskableInterrupt')
which injects an NMI into the guest. We may want to crash the guest and do
kdump on this NMI by enabling unknown_nmi_panic. To make kdump succeed we
need to allow the kdump kernel to re-establish the VMBus connection so it
will see VMBus devices (storage, network, ...).

To properly unload VMBus, making it possible to start over during kdump, we
need to do the following:

 - Send an 'unload' message to the hypervisor. This can be done on any CPU
   so we do this on the crashing CPU.

 - Receive the 'unload finished' reply message. WS2012R2 delivers this
   message to the CPU which was used to establish VMBus connection during
   module load and this CPU may differ from the CPU sending 'unload'.

Receiving a VMBus message means the following:

 - There is a per-CPU slot in memory for one message. This slot can in
   theory be accessed by any CPU.

 - We get an interrupt on the CPU when a message is placed into the slot.

 - When we read the message we need to clear the slot and signal the fact
   to the hypervisor. In case there are more messages to this CPU pending
   the hypervisor will deliver the next message. The signaling is done by
   writing to an MSR so this can only be done on the appropriate CPU.
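
As an illustration of the three points above, here is a minimal user-space
sketch of the slot semantics. It is not the kernel's code: the names
(msg_slot, peek_slot, consume_slot, eom_count) and the CPU count are
hypothetical, and the MSR write is modelled as a plain per-CPU counter that
only the owning CPU may bump.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS  4   /* hypothetical CPU count for this sketch */
#define MSG_NONE 0u  /* slot holds no message */

/* One message slot per CPU, in memory that every CPU can read. */
struct msg_slot {
	uint32_t type;    /* MSG_NONE when the slot is empty */
	uint32_t payload;
};

enum consume_result {
	SLOT_EMPTY,           /* nothing to read */
	SLOT_CLEARED_NO_EOM,  /* read and cleared, hypervisor NOT signalled */
	SLOT_CLEARED_EOM      /* read, cleared and hypervisor signalled */
};

static struct msg_slot slots[NR_CPUS];
/* Stand-in for the per-CPU MSR write that signals end-of-message. */
static unsigned int eom_count[NR_CPUS];

/* Any CPU may look at any slot: it is ordinary memory. */
static uint32_t peek_slot(unsigned int owner)
{
	assert(owner < NR_CPUS);
	return slots[owner].type;
}

/*
 * Read and clear the message in owner's slot while running on running_cpu.
 * Clearing is a memory write and works from anywhere; the end-of-message
 * signal is modelled as possible only when running on the owning CPU.
 */
static enum consume_result consume_slot(unsigned int running_cpu,
					unsigned int owner,
					uint32_t *payload)
{
	assert(running_cpu < NR_CPUS && owner < NR_CPUS);

	if (slots[owner].type == MSG_NONE)
		return SLOT_EMPTY;

	*payload = slots[owner].payload;
	slots[owner].type = MSG_NONE;

	if (running_cpu != owner)
		return SLOT_CLEARED_NO_EOM; /* cannot write another CPU's MSR */

	eom_count[owner]++;
	return SLOT_CLEARED_EOM;
}
```

In this model a remote CPU can observe and even clear a slot, but it cannot
tell the hypervisor to deliver the next message, which is the asymmetry the
rest of this description depends on.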

To avoid doing cross-CPU work on crash, the vmbus_wait_for_unload()
function checks message slots for all CPUs in a loop, waiting for the
'unload finished' message. However, there is an issue which arises when
these conditions are met:

 - We're crashing on a CPU which is different from the one which was used
   to initially contact the hypervisor.

 - The CPU which was used for the initial contact is blocked with interrupts
   disabled and there is a message pending in the message slot.

In this case we won't be able to read the 'unload finished' message on the
crashing CPU. This is reproducible when we receive unknown NMIs on all CPUs
simultaneously: the first CPU entering panic() will proceed to crash and
all other CPUs will stop themselves with interrupts disabled.
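
The failure mode can be reproduced in a small user-space model. This is a
sketch under stated assumptions, not vmbus_wait_for_unload() itself: the
mailbox structure, hv_post(), signal_eom() and poll_for_unload() are
hypothetical names, and the hypervisor is modelled as a short per-CPU queue
that refills the slot only after the owning CPU signals end-of-message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NR_CPUS             2   /* hypothetical: CPU 0 made the initial contact */
#define QUEUE_LEN           4
#define MSG_NONE            0u
#define MSG_OTHER           1u  /* any unrelated message */
#define MSG_UNLOAD_RESPONSE 2u  /* the 'unload finished' reply */

struct mailbox {
	uint32_t slot;               /* the single visible message slot */
	uint32_t pending[QUEUE_LEN]; /* messages the hypervisor still holds */
	unsigned int head, tail;
};

static struct mailbox boxes[NR_CPUS];

static void reset_mailboxes(void)
{
	memset(boxes, 0, sizeof(boxes));
}

/* Hypervisor side: move the next queued message into an empty slot. */
static void hv_deliver(unsigned int cpu)
{
	struct mailbox *b = &boxes[cpu];

	if (b->slot == MSG_NONE && b->head != b->tail)
		b->slot = b->pending[b->head++ % QUEUE_LEN];
}

/* Hypervisor side: queue a message for a CPU. */
static void hv_post(unsigned int cpu, uint32_t type)
{
	struct mailbox *b;

	assert(cpu < NR_CPUS);
	b = &boxes[cpu];
	assert(b->tail - b->head < QUEUE_LEN);
	b->pending[b->tail++ % QUEUE_LEN] = type;
	hv_deliver(cpu);
}

/* Guest side, valid only on the owning CPU: clear the slot and signal. */
static void signal_eom(unsigned int cpu)
{
	boxes[cpu].slot = MSG_NONE;
	hv_deliver(cpu);
}

/*
 * Poll every CPU's slot from crashing_cpu. A stale message can only be
 * retired when it sits in the crashing CPU's own slot.
 */
static bool poll_for_unload(unsigned int crashing_cpu, unsigned int max_rounds)
{
	for (unsigned int round = 0; round < max_rounds; round++) {
		for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++) {
			uint32_t type = boxes[cpu].slot;

			if (type == MSG_NONE)
				continue;
			if (type == MSG_UNLOAD_RESPONSE)
				return true;
			if (cpu == crashing_cpu)
				signal_eom(cpu);
		}
	}
	return false; /* gave up: the reply never became visible */
}
```

With an unrelated message already sitting in CPU 0's slot and the reply
queued behind it, polling from CPU 1 never sees the reply in this model,
while polling from CPU 0 does.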

The suggested solution is to handle unknown NMIs for Hyper-V guests only on
the first CPU which gets them. This will allow us to rely on the VMBus
interrupt handler being able to receive the 'unload finished' message in
case it is delivered to a different CPU.
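
The 'first CPU wins' idea can be tried outside the kernel with C11 atomics.
The sketch below is a user-space analogue only: claim_unknown_nmi(),
panic_enabled and the SKETCH_* return values are hypothetical stand-ins, and
atomic_compare_exchange_strong() takes the place of the kernel's
atomic_cmpxchg().

```c
#include <assert.h>
#include <stdatomic.h>

/* Local stand-ins for the handler's two outcomes. */
enum {
	SKETCH_NMI_DONE,    /* pass the NMI on to the next handler */
	SKETCH_NMI_HANDLED  /* swallow the NMI on this CPU */
};

/* -1 means no CPU has claimed the unknown NMI yet. */
static atomic_int nmi_cpu = -1;
/* Stand-in for the unknown_nmi_panic setting. */
static int panic_enabled = 1;

static int claim_unknown_nmi(int cpu)
{
	int expected = -1;

	assert(cpu >= 0);

	/* Without panic-on-unknown-NMI there is nothing to serialize. */
	if (!panic_enabled)
		return SKETCH_NMI_DONE;

	/*
	 * Only the first caller swaps -1 for its own id. Every later caller,
	 * including a repeat call from the winner, finds the slot taken.
	 */
	if (!atomic_compare_exchange_strong(&nmi_cpu, &expected, cpu))
		return SKETCH_NMI_HANDLED;

	return SKETCH_NMI_DONE;
}
```

Only the winning CPU lets the NMI continue down the chain towards the
panic path; the others report it as handled and keep running, so they can
still take interrupts.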

The issue is not reproducible on WS2016 as Debug-VM delivers NMI to the
boot CPU only; WS2012R2 and earlier Hyper-V versions are affected.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: devel@linuxdriverproject.org
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Link: http://lkml.kernel.org/r/20161202100720.28121-1-vkuznets@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kernel/cpu/mshyperv.c |   24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -31,6 +31,7 @@
 #include <asm/apic.h>
 #include <asm/timer.h>
 #include <asm/reboot.h>
+#include <asm/nmi.h>
 
 struct ms_hyperv_info ms_hyperv;
 EXPORT_SYMBOL_GPL(ms_hyperv);
@@ -158,6 +159,26 @@ static unsigned char hv_get_nmi_reason(v
 	return 0;
 }
 
+#ifdef CONFIG_X86_LOCAL_APIC
+/*
+ * Prior to WS2016 Debug-VM sends NMIs to all CPUs which makes
+ * it difficult to process CHANNELMSG_UNLOAD in case of crash. Handle
+ * unknown NMI on the first CPU which gets it.
+ */
+static int hv_nmi_unknown(unsigned int val, struct pt_regs *regs)
+{
+	static atomic_t nmi_cpu = ATOMIC_INIT(-1);
+
+	if (!unknown_nmi_panic)
+		return NMI_DONE;
+
+	if (atomic_cmpxchg(&nmi_cpu, -1, raw_smp_processor_id()) != -1)
+		return NMI_HANDLED;
+
+	return NMI_DONE;
+}
+#endif
+
 static void __init ms_hyperv_init_platform(void)
 {
 	/*
@@ -183,6 +204,9 @@ static void __init ms_hyperv_init_platfo
 		pr_info("HyperV: LAPIC Timer Frequency: %#x\n",
 			lapic_timer_frequency);
 	}
+
+	register_nmi_handler(NMI_UNKNOWN, hv_nmi_unknown, NMI_FLAG_FIRST,
+			     "hv_nmi_unknown");
 #endif
 
 	if (ms_hyperv.features & HV_X64_MSR_TIME_REF_COUNT_AVAILABLE)

