linux-hyperv.vger.kernel.org archive mirror
* [PATCH] x86/hyperv: Suspend/resume the VP assist page for hibernation
@ 2020-04-17  6:29 Dexuan Cui
From: Dexuan Cui @ 2020-04-17  6:29 UTC
  To: bp, haiyangz, hpa, kys, linux-hyperv, linux-kernel, mingo,
	sthemmin, tglx, x86, mikelley, vkuznets, wei.liu
  Cc: Dexuan Cui

Unlike the other CPUs, CPU0 is never offlined during hibernation, so
hv_cpu_die() never runs on it. As a result, in the resume path the "new"
kernel's VP assist page is not suspended (i.e. disabled), and later, when
we jump to the "old" kernel, the page is not properly re-enabled for CPU0
with the page allocated by the old kernel.

So far, the VP assist page is only used by hv_apic_eoi_write(). When the
page is not properly re-enabled, hvp->apic_assist is always 0, so the
HV_X64_MSR_EOI MSR is always written. This is not ideal with respect to
performance, but Hyper-V can still correctly handle this.

The issue is: the hypervisor can still write to the page registered by the
"new" kernel, which by then is part of the old kernel's memory. This can
corrupt the old kernel's memory and sometimes cause unexpected behavior,
e.g. when the old kernel's non-boot CPUs are being onlined in the resume
path, the VM can hang or be killed due to a virtual triple fault.

Fix the issue by calling hv_cpu_die()/hv_cpu_init() in the syscore ops.

Without the fix, hibernation fails roughly once in every 300 to 500 rounds.
With the fix, hibernation passes a long-haul test of 2000 rounds.

Fixes: 05bd330a7fd8 ("x86/hyperv: Suspend/resume the hypercall page for hibernation")
Cc: stable@vger.kernel.org
Signed-off-by: Dexuan Cui <decui@microsoft.com>
---
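Note for readers (not part of the commit message): the EOI behavior
described above can be illustrated with a small userspace model. This is
only a sketch under stated assumptions: struct vp_assist_model,
eoi_msr_writes and model_eoi_write() are hypothetical names invented for
illustration, and the counter stands in for the real HV_X64_MSR_EOI write.
It is not the kernel's hv_apic_eoi_write().

```c
#include <stdint.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the per-CPU VP assist page. Only the
 * apic_assist word matters for this illustration.
 */
struct vp_assist_model {
	uint32_t apic_assist;
};

/* Counts simulated HV_X64_MSR_EOI writes (stand-in for a real wrmsr). */
static unsigned int eoi_msr_writes;

/*
 * Model of the EOI fast path: if bit 0 of apic_assist is set, the
 * hypervisor has indicated that no EOI is required, so the word is
 * cleared and the MSR write is skipped. Otherwise the MSR is written.
 */
static void model_eoi_write(struct vp_assist_model *hvp)
{
	if (hvp != NULL) {
		uint32_t old = hvp->apic_assist;

		hvp->apic_assist = 0;	/* read-and-clear, as an xchg would */
		if (old & 0x1)
			return;
	}

	eoi_msr_writes++;	/* slow path: EOI MSR write */
}
```

If the hypervisor never updates the page the kernel is reading (the
situation described for CPU0 after resume), apic_assist stays 0 and every
call in this model takes the slow path, which matches the "always written"
behavior noted in the commit message.
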
 arch/x86/hyperv/hv_init.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index b0da5320bcff..4d3ce86331a3 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -72,7 +72,8 @@ static int hv_cpu_init(unsigned int cpu)
 	struct page *pg;
 
 	input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
-	pg = alloc_page(GFP_KERNEL);
+	/* hv_cpu_init() can be called with IRQs disabled from hv_resume() */
+	pg = alloc_page(GFP_ATOMIC);
 	if (unlikely(!pg))
 		return -ENOMEM;
 	*input_arg = page_address(pg);
@@ -253,6 +254,7 @@ static int __init hv_pci_init(void)
 static int hv_suspend(void)
 {
 	union hv_x64_msr_hypercall_contents hypercall_msr;
+	int ret;
 
 	/*
 	 * Reset the hypercall page as it is going to be invalidated
@@ -269,12 +271,17 @@ static int hv_suspend(void)
 	hypercall_msr.enable = 0;
 	wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
 
-	return 0;
+	ret = hv_cpu_die(0);
+	return ret;
 }
 
 static void hv_resume(void)
 {
 	union hv_x64_msr_hypercall_contents hypercall_msr;
+	int ret;
+
+	ret = hv_cpu_init(0);
+	WARN_ON(ret);
 
 	/* Re-enable the hypercall page */
 	rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
@@ -287,6 +294,7 @@ static void hv_resume(void)
 	hv_hypercall_pg_saved = NULL;
 }
 
+/* Note: when the ops are called, only CPU0 is online and IRQs are disabled. */
 static struct syscore_ops hv_syscore_ops = {
 	.suspend	= hv_suspend,
 	.resume		= hv_resume,
-- 
2.19.1



