linux-hyperv.vger.kernel.org archive mirror
* [PATCH v2] x86/hyperv: Fix kexec panic/hang issues
@ 2020-12-22  6:55 Dexuan Cui
  2020-12-22 13:45 ` Michael Kelley
  2021-01-05 13:04 ` Wei Liu
  0 siblings, 2 replies; 5+ messages in thread
From: Dexuan Cui @ 2020-12-22  6:55 UTC
  To: tglx, mingo, bp, x86, hpa, linux-hyperv, mikelley, wei.liu,
	vkuznets, jwiesner, ohering
  Cc: linux-kernel, sthemmin, haiyangz, kys, Dexuan Cui

Currently the kexec kernel can panic or hang due to 2 causes:

1) hv_cpu_die() is not called upon kexec, so the hypervisor corrupts the
old VP Assist Pages when the kexec kernel runs. The same issue is fixed
for hibernation in commit 421f090c819d ("x86/hyperv: Suspend/resume the
VP assist page for hibernation"). Now fix it for kexec.

2) hyperv_cleanup() is called too early. In the kexec path, the other CPUs
are stopped in hv_machine_shutdown() -> native_machine_shutdown(), so
between hv_kexec_handler() and native_machine_shutdown(), the other CPUs
can still try to access the hypercall page and cause panic. The workaround
"hv_hypercall_pg = NULL;" in hyperv_cleanup() is unreliable. Move
hyperv_cleanup() to a better place.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
---

Changes in v2:
	Improved the commit log as Michael Kelley suggested.
	No change to v1 otherwise.

 arch/x86/hyperv/hv_init.c       |  4 ++++
 arch/x86/include/asm/mshyperv.h |  2 ++
 arch/x86/kernel/cpu/mshyperv.c  | 18 ++++++++++++++++++
 drivers/hv/vmbus_drv.c          |  2 --
 4 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index e04d90af4c27..4638a52d8eae 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -16,6 +16,7 @@
 #include <asm/hyperv-tlfs.h>
 #include <asm/mshyperv.h>
 #include <asm/idtentry.h>
+#include <linux/kexec.h>
 #include <linux/version.h>
 #include <linux/vmalloc.h>
 #include <linux/mm.h>
@@ -26,6 +27,8 @@
 #include <linux/syscore_ops.h>
 #include <clocksource/hyperv_timer.h>
 
+int hyperv_init_cpuhp;
+
 void *hv_hypercall_pg;
 EXPORT_SYMBOL_GPL(hv_hypercall_pg);
 
@@ -401,6 +404,7 @@ void __init hyperv_init(void)
 
 	register_syscore_ops(&hv_syscore_ops);
 
+	hyperv_init_cpuhp = cpuhp;
 	return;
 
 remove_cpuhp_state:
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index ffc289992d1b..30f76b966857 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -74,6 +74,8 @@ static inline void hv_disable_stimer0_percpu_irq(int irq) {}
 
 
 #if IS_ENABLED(CONFIG_HYPERV)
+extern int hyperv_init_cpuhp;
+
 extern void *hv_hypercall_pg;
 extern void  __percpu  **hyperv_pcpu_input_arg;
 
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index f628e3dc150f..43b54bef5448 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -135,14 +135,32 @@ static void hv_machine_shutdown(void)
 {
 	if (kexec_in_progress && hv_kexec_handler)
 		hv_kexec_handler();
+
+	/*
+	 * Call hv_cpu_die() on all the CPUs, otherwise later the hypervisor
+	 * corrupts the old VP Assist Pages and can crash the kexec kernel.
+	 */
+	if (kexec_in_progress && hyperv_init_cpuhp > 0)
+		cpuhp_remove_state(hyperv_init_cpuhp);
+
+	/* The function calls stop_other_cpus(). */
 	native_machine_shutdown();
+
+	/* Disable the hypercall page when there is only 1 active CPU. */
+	if (kexec_in_progress)
+		hyperv_cleanup();
 }
 
 static void hv_machine_crash_shutdown(struct pt_regs *regs)
 {
 	if (hv_crash_handler)
 		hv_crash_handler(regs);
+
+	/* The function calls crash_smp_send_stop(). */
 	native_machine_crash_shutdown(regs);
+
+	/* Disable the hypercall page when there is only 1 active CPU. */
+	hyperv_cleanup();
 }
 #endif /* CONFIG_KEXEC_CORE */
 #endif /* CONFIG_HYPERV */
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 502f8cd95f6d..d491fdcee61f 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -2550,7 +2550,6 @@ static void hv_kexec_handler(void)
 	/* Make sure conn_state is set as hv_synic_cleanup checks for it */
 	mb();
 	cpuhp_remove_state(hyperv_cpuhp_online);
-	hyperv_cleanup();
 };
 
 static void hv_crash_handler(struct pt_regs *regs)
@@ -2566,7 +2565,6 @@ static void hv_crash_handler(struct pt_regs *regs)
 	cpu = smp_processor_id();
 	hv_stimer_cleanup(cpu);
 	hv_synic_disable_regs(cpu);
-	hyperv_cleanup();
 };
 
 static int hv_synic_suspend(void)
-- 
2.19.1
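
The ordering argument in the commit log can be sketched with a small
user-space model. This is only an illustration under stated assumptions, not
kernel code: struct shutdown_model, the model_*() helpers and the run_*()
functions are hypothetical names invented here, and each helper merely
records the effect that the commit log attributes to the corresponding
kernel call. The "unsafe" flag marks the case the patch is meant to avoid,
namely disabling the hypercall page while more than one CPU is still running.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; these are NOT kernel APIs. */
struct shutdown_model {
	int  online_cpus;        /* CPUs that can still issue hypercalls */
	bool hypercall_page_on;  /* hypercall page still enabled */
	bool vp_assist_on;       /* VP Assist Pages still registered */
	bool unsafe;             /* page was disabled while >1 CPU was running */
};

static struct shutdown_model model_init(int ncpus)
{
	struct shutdown_model m = { ncpus, true, true, false };

	assert(ncpus >= 1);
	return m;
}

/* Models cpuhp_remove_state(hyperv_init_cpuhp), which runs hv_cpu_die(). */
static void model_remove_cpuhp(struct shutdown_model *m)
{
	m->vp_assist_on = false;
}

/* Models native_machine_shutdown() stopping all but the current CPU. */
static void model_stop_other_cpus(struct shutdown_model *m)
{
	m->online_cpus = 1;
}

/* Models hyperv_cleanup() disabling the hypercall page. */
static void model_hyperv_cleanup(struct shutdown_model *m)
{
	if (m->online_cpus > 1)
		m->unsafe = true;
	m->hypercall_page_on = false;
}

/* Before the patch: cleanup ran inside the kexec handler, before CPUs stop. */
struct shutdown_model run_old_order(int ncpus)
{
	struct shutdown_model m = model_init(ncpus);

	model_hyperv_cleanup(&m);   /* called from hv_kexec_handler() */
	model_stop_other_cpus(&m);
	return m;
}

/* After the patch: VP assist teardown, then stop CPUs, then cleanup. */
struct shutdown_model run_new_order(int ncpus)
{
	struct shutdown_model m = model_init(ncpus);

	model_remove_cpuhp(&m);
	model_stop_other_cpus(&m);
	model_hyperv_cleanup(&m);
	return m;
}
```

In this model, run_old_order(4) ends with "unsafe" set and the VP Assist
Pages still registered, while run_new_order(4) ends with both cleared and
only one CPU online when the hypercall page is disabled.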



* RE: [PATCH v2] x86/hyperv: Fix kexec panic/hang issues
  2020-12-22  6:55 [PATCH v2] x86/hyperv: Fix kexec panic/hang issues Dexuan Cui
@ 2020-12-22 13:45 ` Michael Kelley
  2021-01-05 13:04 ` Wei Liu
  1 sibling, 0 replies; 5+ messages in thread
From: Michael Kelley @ 2020-12-22 13:45 UTC
  To: Dexuan Cui, tglx, mingo, bp, x86, hpa, linux-hyperv, wei.liu,
	vkuznets, jwiesner, ohering
  Cc: linux-kernel, Stephen Hemminger, Haiyang Zhang, KY Srinivasan

From: Dexuan Cui <decui@microsoft.com> Sent: Monday, December 21, 2020 10:56 PM
> 
> Currently the kexec kernel can panic or hang due to 2 causes:
> 
> 1) hv_cpu_die() is not called upon kexec, so the hypervisor corrupts the
> old VP Assist Pages when the kexec kernel runs. The same issue is fixed
> for hibernation in commit 421f090c819d ("x86/hyperv: Suspend/resume the
> VP assist page for hibernation"). Now fix it for kexec.
> 
> 2) hyperv_cleanup() is called too early. In the kexec path, the other CPUs
> are stopped in hv_machine_shutdown() -> native_machine_shutdown(), so
> between hv_kexec_handler() and native_machine_shutdown(), the other CPUs
> can still try to access the hypercall page and cause panic. The workaround
> "hv_hypercall_pg = NULL;" in hyperv_cleanup() is unreliable. Move
> hyperv_cleanup() to a better place.
> 
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
> 
> Changes in v2:
> 	Improved the commit log as Michael Kelley suggested.
> 	No change to v1 otherwise.
> 
>  arch/x86/hyperv/hv_init.c       |  4 ++++
>  arch/x86/include/asm/mshyperv.h |  2 ++
>  arch/x86/kernel/cpu/mshyperv.c  | 18 ++++++++++++++++++
>  drivers/hv/vmbus_drv.c          |  2 --
>  4 files changed, 24 insertions(+), 2 deletions(-)
> 

Reviewed-by: Michael Kelley <mikelley@microsoft.com>


* Re: [PATCH v2] x86/hyperv: Fix kexec panic/hang issues
  2020-12-22  6:55 [PATCH v2] x86/hyperv: Fix kexec panic/hang issues Dexuan Cui
  2020-12-22 13:45 ` Michael Kelley
@ 2021-01-05 13:04 ` Wei Liu
  2021-01-05 16:39   ` Michael Kelley
  1 sibling, 1 reply; 5+ messages in thread
From: Wei Liu @ 2021-01-05 13:04 UTC
  To: Dexuan Cui
  Cc: tglx, mingo, bp, x86, hpa, linux-hyperv, mikelley, wei.liu,
	vkuznets, jwiesner, ohering, linux-kernel, sthemmin, haiyangz,
	kys

On Mon, Dec 21, 2020 at 10:55:41PM -0800, Dexuan Cui wrote:
> Currently the kexec kernel can panic or hang due to 2 causes:
> 
> 1) hv_cpu_die() is not called upon kexec, so the hypervisor corrupts the
> old VP Assist Pages when the kexec kernel runs. The same issue is fixed
> for hibernation in commit 421f090c819d ("x86/hyperv: Suspend/resume the
> VP assist page for hibernation"). Now fix it for kexec.
> 
> 2) hyperv_cleanup() is called too early. In the kexec path, the other CPUs
> are stopped in hv_machine_shutdown() -> native_machine_shutdown(), so
> between hv_kexec_handler() and native_machine_shutdown(), the other CPUs
> can still try to access the hypercall page and cause panic. The workaround
> "hv_hypercall_pg = NULL;" in hyperv_cleanup() is unreliable. Move
> hyperv_cleanup() to a better place.
> 
> Signed-off-by: Dexuan Cui <decui@microsoft.com>

The code looks a bit intrusive. On the other hand, this does sound like
something that needs backporting to older stable kernels.

On a more practical note, I need to decide whether to take it via
hyperv-fixes or hyperv-next. What do you think?

Wei.


* RE: [PATCH v2] x86/hyperv: Fix kexec panic/hang issues
  2021-01-05 13:04 ` Wei Liu
@ 2021-01-05 16:39   ` Michael Kelley
  2021-01-05 17:53     ` Wei Liu
  0 siblings, 1 reply; 5+ messages in thread
From: Michael Kelley @ 2021-01-05 16:39 UTC
  To: Wei Liu, Dexuan Cui
  Cc: tglx, mingo, bp, x86, hpa, linux-hyperv, vkuznets, jwiesner,
	ohering, linux-kernel, Stephen Hemminger, Haiyang Zhang,
	KY Srinivasan

From: Wei Liu <wei.liu@kernel.org> Sent: Tuesday, January 5, 2021 5:04 AM
> 
> On Mon, Dec 21, 2020 at 10:55:41PM -0800, Dexuan Cui wrote:
> > Currently the kexec kernel can panic or hang due to 2 causes:
> >
> > 1) hv_cpu_die() is not called upon kexec, so the hypervisor corrupts the
> > old VP Assist Pages when the kexec kernel runs. The same issue is fixed
> > for hibernation in commit 421f090c819d ("x86/hyperv: Suspend/resume the
> > VP assist page for hibernation"). Now fix it for kexec.
> >
> > 2) hyperv_cleanup() is called too early. In the kexec path, the other CPUs
> > are stopped in hv_machine_shutdown() -> native_machine_shutdown(), so
> > between hv_kexec_handler() and native_machine_shutdown(), the other CPUs
> > can still try to access the hypercall page and cause panic. The workaround
> > "hv_hypercall_pg = NULL;" in hyperv_cleanup() is unreliable. Move
> > hyperv_cleanup() to a better place.
> >
> > Signed-off-by: Dexuan Cui <decui@microsoft.com>
> 
> The code looks a bit intrusive. On the other hand, this does sound like
> something that needs backporting to older stable kernels.
> 
> On a more practical note, I need to decide whether to take it via
> hyperv-fixes or hyperv-next. What do you think?
> 

I'd like to see this in hyperv-fixes and backported to older stable kernels.
In its current form, the kexec path in a Hyper-V guest has multiple problems
that make it unreliable, so the downside risk of taking these fixes is minimal
while the upside benefit is considerable.

Michael


* Re: [PATCH v2] x86/hyperv: Fix kexec panic/hang issues
  2021-01-05 16:39   ` Michael Kelley
@ 2021-01-05 17:53     ` Wei Liu
  0 siblings, 0 replies; 5+ messages in thread
From: Wei Liu @ 2021-01-05 17:53 UTC
  To: Michael Kelley
  Cc: Wei Liu, Dexuan Cui, tglx, mingo, bp, x86, hpa, linux-hyperv,
	vkuznets, jwiesner, ohering, linux-kernel, Stephen Hemminger,
	Haiyang Zhang, KY Srinivasan

On Tue, Jan 05, 2021 at 04:39:38PM +0000, Michael Kelley wrote:
> From: Wei Liu <wei.liu@kernel.org> Sent: Tuesday, January 5, 2021 5:04 AM
> > 
> > On Mon, Dec 21, 2020 at 10:55:41PM -0800, Dexuan Cui wrote:
> > > Currently the kexec kernel can panic or hang due to 2 causes:
> > >
> > > 1) hv_cpu_die() is not called upon kexec, so the hypervisor corrupts the
> > > old VP Assist Pages when the kexec kernel runs. The same issue is fixed
> > > for hibernation in commit 421f090c819d ("x86/hyperv: Suspend/resume the
> > > VP assist page for hibernation"). Now fix it for kexec.
> > >
> > > 2) hyperv_cleanup() is called too early. In the kexec path, the other CPUs
> > > are stopped in hv_machine_shutdown() -> native_machine_shutdown(), so
> > > between hv_kexec_handler() and native_machine_shutdown(), the other CPUs
> > > can still try to access the hypercall page and cause panic. The workaround
> > > "hv_hypercall_pg = NULL;" in hyperv_cleanup() is unreliable. Move
> > > hyperv_cleanup() to a better place.
> > >
> > > Signed-off-by: Dexuan Cui <decui@microsoft.com>
> > 
> > The code looks a bit intrusive. On the other hand, this does sound like
> > something that needs backporting to older stable kernels.
> > 
> > On a more practical note, I need to decide whether to take it via
> > hyperv-fixes or hyperv-next. What do you think?
> > 
> 
> I'd like to see this in hyperv-fixes and backported to older stable kernels.
> In its current form, the kexec path in a Hyper-V guest has multiple problems
> that make it unreliable, so the downside risk of taking these fixes is minimal
> while the upside benefit is considerable.

Applied to hyperv-fixes.

Wei.
