linux-hyperv.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] x86/hyperv: Suspend/resume the VP assist page for hibernation
@ 2020-04-17  6:29 Dexuan Cui
  2020-04-17  9:07 ` Wei Liu
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Dexuan Cui @ 2020-04-17  6:29 UTC (permalink / raw)
  To: bp, haiyangz, hpa, kys, linux-hyperv, linux-kernel, mingo,
	sthemmin, tglx, x86, mikelley, vkuznets, wei.liu
  Cc: Dexuan Cui

Unlike the other CPUs, CPU0 is never offlined during hibernation. So in the
resume path, the "new" kernel's VP assist page is not suspended (i.e.
disabled), and later when we jump to the "old" kernel, the page is not
properly re-enabled for CPU0 with the allocated page from the old kernel.

So far, the VP assist page is only used by hv_apic_eoi_write(). When the
page is not properly re-enabled, hvp->apic_assist is always 0, so the
HV_X64_MSR_EOI MSR is always written. This is not ideal with respect to
performance, but Hyper-V can still correctly handle this.

The issue is: the hypervisor can corrupt the old kernel memory, and hence
sometimes cause unexpected behaviors, e.g. when the old kernel's non-boot
CPUs are being onlined in the resume path, the VM can hang or be killed
due to virtual triple fault.

Fix the issue by calling hv_cpu_die()/hv_cpu_init() in the syscore ops.

Without the fix, hibernation can fail at a rate of 1/300 ~ 1/500.
With the fix, hibernation can pass a long-haul test of 2000 rounds.

Fixes: 05bd330a7fd8 ("x86/hyperv: Suspend/resume the hypercall page for hibernation")
Cc: stable@vger.kernel.org
Signed-off-by: Dexuan Cui <decui@microsoft.com>
---
 arch/x86/hyperv/hv_init.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index b0da5320bcff..4d3ce86331a3 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -72,7 +72,8 @@ static int hv_cpu_init(unsigned int cpu)
 	struct page *pg;
 
 	input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
-	pg = alloc_page(GFP_KERNEL);
+	/* hv_cpu_init() can be called with IRQs disabled from hv_resume() */
+	pg = alloc_page(GFP_ATOMIC);
 	if (unlikely(!pg))
 		return -ENOMEM;
 	*input_arg = page_address(pg);
@@ -253,6 +254,7 @@ static int __init hv_pci_init(void)
 static int hv_suspend(void)
 {
 	union hv_x64_msr_hypercall_contents hypercall_msr;
+	int ret;
 
 	/*
 	 * Reset the hypercall page as it is going to be invalidated
@@ -269,12 +271,17 @@ static int hv_suspend(void)
 	hypercall_msr.enable = 0;
 	wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
 
-	return 0;
+	ret = hv_cpu_die(0);
+	return ret;
 }
 
 static void hv_resume(void)
 {
 	union hv_x64_msr_hypercall_contents hypercall_msr;
+	int ret;
+
+	ret = hv_cpu_init(0);
+	WARN_ON(ret);
 
 	/* Re-enable the hypercall page */
 	rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
@@ -287,6 +294,7 @@ static void hv_resume(void)
 	hv_hypercall_pg_saved = NULL;
 }
 
+/* Note: when the ops are called, only CPU0 is online and IRQs are disabled. */
 static struct syscore_ops hv_syscore_ops = {
 	.suspend	= hv_suspend,
 	.resume		= hv_resume,
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] x86/hyperv: Suspend/resume the VP assist page for hibernation
  2020-04-17  6:29 [PATCH] x86/hyperv: Suspend/resume the VP assist page for hibernation Dexuan Cui
@ 2020-04-17  9:07 ` Wei Liu
  2020-04-17 22:44   ` Dexuan Cui
  2020-04-17 10:03 ` Vitaly Kuznetsov
  2020-04-17 11:00 ` Wei Liu
  2 siblings, 1 reply; 12+ messages in thread
From: Wei Liu @ 2020-04-17  9:07 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: bp, haiyangz, hpa, kys, linux-hyperv, linux-kernel, mingo,
	sthemmin, tglx, x86, mikelley, vkuznets, wei.liu

On Thu, Apr 16, 2020 at 11:29:59PM -0700, Dexuan Cui wrote:
> Unlike the other CPUs, CPU0 is never offlined during hibernation. So in the
> resume path, the "new" kernel's VP assist page is not suspended (i.e.
> disabled), and later when we jump to the "old" kernel, the page is not
> properly re-enabled for CPU0 with the allocated page from the old kernel.
> 
> So far, the VP assist page is only used by hv_apic_eoi_write(). When the
> page is not properly re-enabled, hvp->apic_assist is always 0, so the
> HV_X64_MSR_EOI MSR is always written. This is not ideal with respect to
> performance, but Hyper-V can still correctly handle this.
> 
> The issue is: the hypervisor can corrupt the old kernel memory, and hence
> sometimes cause unexpected behaviors, e.g. when the old kernel's non-boot
> CPUs are being onlined in the resume path, the VM can hang or be killed
> due to virtual triple fault.
> 
> Fix the issue by calling hv_cpu_die()/hv_cpu_init() in the syscore ops.
> 
> Without the fix, hibernation can fail at a rate of 1/300 ~ 1/500.
> With the fix, hibernation can pass a long-haul test of 2000 rounds.
> 
> Fixes: 05bd330a7fd8 ("x86/hyperv: Suspend/resume the hypercall page for hibernation")
> Cc: stable@vger.kernel.org
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
>  arch/x86/hyperv/hv_init.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index b0da5320bcff..4d3ce86331a3 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -72,7 +72,8 @@ static int hv_cpu_init(unsigned int cpu)
>  	struct page *pg;
>  
>  	input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
> -	pg = alloc_page(GFP_KERNEL);
> +	/* hv_cpu_init() can be called with IRQs disabled from hv_resume() */
> +	pg = alloc_page(GFP_ATOMIC);

IMHO it would be better to  only tap into the reserve pool if so
required, e.g.

        pg = alloc_page(irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL);

Wei.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] x86/hyperv: Suspend/resume the VP assist page for hibernation
  2020-04-17  6:29 [PATCH] x86/hyperv: Suspend/resume the VP assist page for hibernation Dexuan Cui
  2020-04-17  9:07 ` Wei Liu
@ 2020-04-17 10:03 ` Vitaly Kuznetsov
  2020-04-17 10:55   ` Wei Liu
  2020-04-17 11:00 ` Wei Liu
  2 siblings, 1 reply; 12+ messages in thread
From: Vitaly Kuznetsov @ 2020-04-17 10:03 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: bp, haiyangz, hpa, kys, linux-hyperv, linux-kernel, mingo,
	sthemmin, tglx, x86, mikelley, wei.liu, kvm

Dexuan Cui <decui@microsoft.com> writes:

> Unlike the other CPUs, CPU0 is never offlined during hibernation. So in the
> resume path, the "new" kernel's VP assist page is not suspended (i.e.
> disabled), and later when we jump to the "old" kernel, the page is not
> properly re-enabled for CPU0 with the allocated page from the old kernel.
>
> So far, the VP assist page is only used by hv_apic_eoi_write().

No, not only for that ('git grep hv_get_vp_assist_page')

KVM on Hyper-V also needs VP assist page to use Enlightened VMCS. In
particular, Enlightened VMPTR is written there.

This makes me wonder: how does hibernation work with KVM in case we use
Enlightened VMCS and we have VMs running? We need to make sure VP Assist
page content is preserved.

-- 
Vitaly


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] x86/hyperv: Suspend/resume the VP assist page for hibernation
  2020-04-17 10:03 ` Vitaly Kuznetsov
@ 2020-04-17 10:55   ` Wei Liu
  2020-04-17 12:03     ` Vitaly Kuznetsov
  0 siblings, 1 reply; 12+ messages in thread
From: Wei Liu @ 2020-04-17 10:55 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Dexuan Cui, bp, haiyangz, hpa, kys, linux-hyperv, linux-kernel,
	mingo, sthemmin, tglx, x86, mikelley, wei.liu, kvm

On Fri, Apr 17, 2020 at 12:03:18PM +0200, Vitaly Kuznetsov wrote:
> Dexuan Cui <decui@microsoft.com> writes:
> 
> > Unlike the other CPUs, CPU0 is never offlined during hibernation. So in the
> > resume path, the "new" kernel's VP assist page is not suspended (i.e.
> > disabled), and later when we jump to the "old" kernel, the page is not
> > properly re-enabled for CPU0 with the allocated page from the old kernel.
> >
> > So far, the VP assist page is only used by hv_apic_eoi_write().
> 
> No, not only for that ('git grep hv_get_vp_assist_page')
> 
> KVM on Hyper-V also needs VP assist page to use Enlightened VMCS. In
> particular, Enlightened VMPTR is written there.
> 
> This makes me wonder: how does hibernation work with KVM in case we use
> Enlightened VMCS and we have VMs running? We need to make sure VP Assist
> page content is preserved.

The page itself is preserved, isn't it?

hv_cpu_die never frees the vp_assit page. It merely disables it.
hv_cpu_init only allocates a new page if necessary.

Wei.

> 
> -- 
> Vitaly
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] x86/hyperv: Suspend/resume the VP assist page for hibernation
  2020-04-17  6:29 [PATCH] x86/hyperv: Suspend/resume the VP assist page for hibernation Dexuan Cui
  2020-04-17  9:07 ` Wei Liu
  2020-04-17 10:03 ` Vitaly Kuznetsov
@ 2020-04-17 11:00 ` Wei Liu
  2020-04-17 23:47   ` Dexuan Cui
  2 siblings, 1 reply; 12+ messages in thread
From: Wei Liu @ 2020-04-17 11:00 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: bp, haiyangz, hpa, kys, linux-hyperv, linux-kernel, mingo,
	sthemmin, tglx, x86, mikelley, vkuznets, wei.liu

On Thu, Apr 16, 2020 at 11:29:59PM -0700, Dexuan Cui wrote:
> Unlike the other CPUs, CPU0 is never offlined during hibernation. So in the
> resume path, the "new" kernel's VP assist page is not suspended (i.e.
> disabled), and later when we jump to the "old" kernel, the page is not
> properly re-enabled for CPU0 with the allocated page from the old kernel.
> 
> So far, the VP assist page is only used by hv_apic_eoi_write(). When the
> page is not properly re-enabled, hvp->apic_assist is always 0, so the
> HV_X64_MSR_EOI MSR is always written. This is not ideal with respect to
> performance, but Hyper-V can still correctly handle this.
> 
> The issue is: the hypervisor can corrupt the old kernel memory, and hence
> sometimes cause unexpected behaviors, e.g. when the old kernel's non-boot
> CPUs are being onlined in the resume path, the VM can hang or be killed
> due to virtual triple fault.

I don't quite follow here.

The first sentence is rather alarming -- why would Hyper-V corrupt
guest's memory (kernel or not)?

Secondly, code below only specifies cpu0. What does it do with non-boot
cpus on the resume path?

Wei.

> 
> Fix the issue by calling hv_cpu_die()/hv_cpu_init() in the syscore ops.
> 
> Without the fix, hibernation can fail at a rate of 1/300 ~ 1/500.
> With the fix, hibernation can pass a long-haul test of 2000 rounds.
> 
> Fixes: 05bd330a7fd8 ("x86/hyperv: Suspend/resume the hypercall page for hibernation")
> Cc: stable@vger.kernel.org
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
>  arch/x86/hyperv/hv_init.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index b0da5320bcff..4d3ce86331a3 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -72,7 +72,8 @@ static int hv_cpu_init(unsigned int cpu)
>  	struct page *pg;
>  
>  	input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
> -	pg = alloc_page(GFP_KERNEL);
> +	/* hv_cpu_init() can be called with IRQs disabled from hv_resume() */
> +	pg = alloc_page(GFP_ATOMIC);
>  	if (unlikely(!pg))
>  		return -ENOMEM;
>  	*input_arg = page_address(pg);
> @@ -253,6 +254,7 @@ static int __init hv_pci_init(void)
>  static int hv_suspend(void)
>  {
>  	union hv_x64_msr_hypercall_contents hypercall_msr;
> +	int ret;
>  
>  	/*
>  	 * Reset the hypercall page as it is going to be invalidated
> @@ -269,12 +271,17 @@ static int hv_suspend(void)
>  	hypercall_msr.enable = 0;
>  	wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
>  
> -	return 0;
> +	ret = hv_cpu_die(0);
> +	return ret;
>  }
>  
>  static void hv_resume(void)
>  {
>  	union hv_x64_msr_hypercall_contents hypercall_msr;
> +	int ret;
> +
> +	ret = hv_cpu_init(0);
> +	WARN_ON(ret);
>  
>  	/* Re-enable the hypercall page */
>  	rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> @@ -287,6 +294,7 @@ static void hv_resume(void)
>  	hv_hypercall_pg_saved = NULL;
>  }
>  
> +/* Note: when the ops are called, only CPU0 is online and IRQs are disabled. */
>  static struct syscore_ops hv_syscore_ops = {
>  	.suspend	= hv_suspend,
>  	.resume		= hv_resume,
> -- 
> 2.19.1
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] x86/hyperv: Suspend/resume the VP assist page for hibernation
  2020-04-17 10:55   ` Wei Liu
@ 2020-04-17 12:03     ` Vitaly Kuznetsov
  2020-04-17 13:08       ` Wei Liu
  2020-04-17 23:07       ` Dexuan Cui
  0 siblings, 2 replies; 12+ messages in thread
From: Vitaly Kuznetsov @ 2020-04-17 12:03 UTC (permalink / raw)
  To: Wei Liu
  Cc: Dexuan Cui, bp, haiyangz, hpa, kys, linux-hyperv, linux-kernel,
	mingo, sthemmin, tglx, x86, mikelley, wei.liu, kvm

Wei Liu <wei.liu@kernel.org> writes:

> On Fri, Apr 17, 2020 at 12:03:18PM +0200, Vitaly Kuznetsov wrote:
>> Dexuan Cui <decui@microsoft.com> writes:
>> 
>> > Unlike the other CPUs, CPU0 is never offlined during hibernation. So in the
>> > resume path, the "new" kernel's VP assist page is not suspended (i.e.
>> > disabled), and later when we jump to the "old" kernel, the page is not
>> > properly re-enabled for CPU0 with the allocated page from the old kernel.
>> >
>> > So far, the VP assist page is only used by hv_apic_eoi_write().
>> 
>> No, not only for that ('git grep hv_get_vp_assist_page')
>> 
>> KVM on Hyper-V also needs VP assist page to use Enlightened VMCS. In
>> particular, Enlightened VMPTR is written there.
>> 
>> This makes me wonder: how does hibernation work with KVM in case we use
>> Enlightened VMCS and we have VMs running? We need to make sure VP Assist
>> page content is preserved.
>
> The page itself is preserved, isn't it?
>

Right, unlike hyperv_pcpu_input_arg is is not freed.

> hv_cpu_die never frees the vp_assit page. It merely disables it.
> hv_cpu_init only allocates a new page if necessary.

I'm not really sure that Hyper-V will like us when we disable VP Assist
page and have an active L2 guest using Enlightened VMCS, who knows what
it caches and when. I'll try to at least test if/how it works.

This all is not really related to Dexuan's patch)

-- 
Vitaly


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] x86/hyperv: Suspend/resume the VP assist page for hibernation
  2020-04-17 12:03     ` Vitaly Kuznetsov
@ 2020-04-17 13:08       ` Wei Liu
  2020-04-17 23:07       ` Dexuan Cui
  1 sibling, 0 replies; 12+ messages in thread
From: Wei Liu @ 2020-04-17 13:08 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Wei Liu, Dexuan Cui, bp, haiyangz, hpa, kys, linux-hyperv,
	linux-kernel, mingo, sthemmin, tglx, x86, mikelley, kvm

On Fri, Apr 17, 2020 at 02:03:38PM +0200, Vitaly Kuznetsov wrote:
> Wei Liu <wei.liu@kernel.org> writes:
> 
> > On Fri, Apr 17, 2020 at 12:03:18PM +0200, Vitaly Kuznetsov wrote:
> >> Dexuan Cui <decui@microsoft.com> writes:
> >> 
> >> > Unlike the other CPUs, CPU0 is never offlined during hibernation. So in the
> >> > resume path, the "new" kernel's VP assist page is not suspended (i.e.
> >> > disabled), and later when we jump to the "old" kernel, the page is not
> >> > properly re-enabled for CPU0 with the allocated page from the old kernel.
> >> >
> >> > So far, the VP assist page is only used by hv_apic_eoi_write().
> >> 
> >> No, not only for that ('git grep hv_get_vp_assist_page')
> >> 
> >> KVM on Hyper-V also needs VP assist page to use Enlightened VMCS. In
> >> particular, Enlightened VMPTR is written there.
> >> 
> >> This makes me wonder: how does hibernation work with KVM in case we use
> >> Enlightened VMCS and we have VMs running? We need to make sure VP Assist
> >> page content is preserved.
> >
> > The page itself is preserved, isn't it?
> >
> 
> Right, unlike hyperv_pcpu_input_arg is is not freed.
> 
> > hv_cpu_die never frees the vp_assit page. It merely disables it.
> > hv_cpu_init only allocates a new page if necessary.
> 
> I'm not really sure that Hyper-V will like us when we disable VP Assist
> page and have an active L2 guest using Enlightened VMCS, who knows what
> it caches and when. I'll try to at least test if/how it works.
> 

I'm curious to know that as well. :-)

> This all is not really related to Dexuan's patch)

Right.

Wei.

> 
> -- 
> Vitaly
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] x86/hyperv: Suspend/resume the VP assist page for hibernation
  2020-04-17  9:07 ` Wei Liu
@ 2020-04-17 22:44   ` Dexuan Cui
  0 siblings, 0 replies; 12+ messages in thread
From: Dexuan Cui @ 2020-04-17 22:44 UTC (permalink / raw)
  To: Wei Liu
  Cc: bp, Haiyang Zhang, hpa, KY Srinivasan, linux-hyperv,
	linux-kernel, mingo, Stephen Hemminger, tglx, x86,
	Michael Kelley, vkuznets

> From: Wei Liu <wei.liu@kernel.org>
> Sent: Friday, April 17, 2020 2:08 AM
> > @@ -72,7 +72,8 @@ static int hv_cpu_init(unsigned int cpu)
> >  	struct page *pg;
> >
> >  	input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
> > -	pg = alloc_page(GFP_KERNEL);
> > +	/* hv_cpu_init() can be called with IRQs disabled from hv_resume() */
> > +	pg = alloc_page(GFP_ATOMIC);
> 
> IMHO it would be better to  only tap into the reserve pool if so
> required, e.g.
> 
>         pg = alloc_page(irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL);
> 
> Wei.

Ok, I'll follow the suggestion.

BTW, there are indeed some usages like this, but not a lot:
grep irqs_disabled drivers/acpi include/acpi drivers/trace -nr |grep GFP_ATOMIC | grep GFP_KERNEL

Thanks,
-- Dexuan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] x86/hyperv: Suspend/resume the VP assist page for hibernation
  2020-04-17 12:03     ` Vitaly Kuznetsov
  2020-04-17 13:08       ` Wei Liu
@ 2020-04-17 23:07       ` Dexuan Cui
  1 sibling, 0 replies; 12+ messages in thread
From: Dexuan Cui @ 2020-04-17 23:07 UTC (permalink / raw)
  To: vkuznets, Wei Liu
  Cc: bp, Haiyang Zhang, hpa, KY Srinivasan, linux-hyperv,
	linux-kernel, mingo, Stephen Hemminger, tglx, x86,
	Michael Kelley, wei.liu, kvm

> From: Vitaly Kuznetsov <vkuznets@redhat.com>
> Sent: Friday, April 17, 2020 5:04 AM
> To: Wei Liu <wei.liu@kernel.org>
> 
> Wei Liu <wei.liu@kernel.org> writes:
> 
> > On Fri, Apr 17, 2020 at 12:03:18PM +0200, Vitaly Kuznetsov wrote:
> >> Dexuan Cui <decui@microsoft.com> writes:
> >>
> >> > Unlike the other CPUs, CPU0 is never offlined during hibernation. So in the
> >> > resume path, the "new" kernel's VP assist page is not suspended (i.e.
> >> > disabled), and later when we jump to the "old" kernel, the page is not
> >> > properly re-enabled for CPU0 with the allocated page from the old kernel.
> >> >
> >> > So far, the VP assist page is only used by hv_apic_eoi_write().
> >>
> >> No, not only for that ('git grep hv_get_vp_assist_page')

Sorry, I unintentionally ignored that, as I have few knowledge about the
optimization for nested virtualization. :-)

> >> KVM on Hyper-V also needs VP assist page to use Enlightened VMCS. In
> >> particular, Enlightened VMPTR is written there.
> >>
> >> This makes me wonder: how does hibernation work with KVM in case we
> use
> >> Enlightened VMCS and we have VMs running? We need to make sure VP
> Assist
> >> page content is preserved.
> >
> > The page itself is preserved, isn't it?
> >
> 
> Right, unlike hyperv_pcpu_input_arg is is not freed.
> 
> > hv_cpu_die never frees the vp_assit page. It merely disables it.
> > hv_cpu_init only allocates a new page if necessary.
> 
> I'm not really sure that Hyper-V will like us when we disable VP Assist
> page and have an active L2 guest using Enlightened VMCS, who knows what
> it caches and when. I'll try to at least test if/how it works.
> 
> This all is not really related to Dexuan's patch)
> --
> Vitaly

It looks you imply that: if there is no active L2 guests, it should be safe to 
disable/reenable the assist page upon hibernation?

Can you please write a patch for KVM (when KVM runs on Hyper-V) to abort
the hibernation request if there is any active L2 guest? The pm_notifier can 
be used for this.

Thanks,
-- Dexuan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] x86/hyperv: Suspend/resume the VP assist page for hibernation
  2020-04-17 11:00 ` Wei Liu
@ 2020-04-17 23:47   ` Dexuan Cui
  2020-04-20 12:08     ` Wei Liu
  0 siblings, 1 reply; 12+ messages in thread
From: Dexuan Cui @ 2020-04-17 23:47 UTC (permalink / raw)
  To: Wei Liu
  Cc: bp, Haiyang Zhang, hpa, KY Srinivasan, linux-hyperv,
	linux-kernel, mingo, Stephen Hemminger, tglx, x86,
	Michael Kelley, vkuznets

> From: Wei Liu <wei.liu@kernel.org>
> Sent: Friday, April 17, 2020 4:00 AM
> To: Dexuan Cui <decui@microsoft.com>
> 
> On Thu, Apr 16, 2020 at 11:29:59PM -0700, Dexuan Cui wrote:
> > Unlike the other CPUs, CPU0 is never offlined during hibernation. So in the
> > resume path, the "new" kernel's VP assist page is not suspended (i.e.
> > disabled), and later when we jump to the "old" kernel, the page is not
> > properly re-enabled for CPU0 with the allocated page from the old kernel.
> >
> > So far, the VP assist page is only used by hv_apic_eoi_write(). When the
> > page is not properly re-enabled, hvp->apic_assist is always 0, so the
> > HV_X64_MSR_EOI MSR is always written. This is not ideal with respect to
> > performance, but Hyper-V can still correctly handle this.
> >
> > The issue is: the hypervisor can corrupt the old kernel memory, and hence
> > sometimes cause unexpected behaviors, e.g. when the old kernel's non-boot
> > CPUs are being onlined in the resume path, the VM can hang or be killed
> > due to virtual triple fault.
> 
> I don't quite follow here.
> 
> The first sentence is rather alarming -- why would Hyper-V corrupt
> guest's memory (kernel or not)?

Without this patch, after the VM resumes from hibernation, the hypervisor 
still thinks the assist page of vCPU0 points to the physical page allocated by
the "new" kernel (the "new" kernel started up freshly, loaded the saved state 
of the "old" kernel from disk into memory, and jumped to the "old" kernel),
but the same physical page can be allocated to store something different in
the "old" kernel (which is the currently running kernel, since the VM resumed).

Conceptually, it looks Hyper-V writes into the assist page from time to time,
e.g. for the EOI optimization. This "corrupts" the page for the "old" kernel.

I'm not absolutely sure if this explains the strange hang issue or triple fault
I occasionally saw in my long-haul hibernation test, but with this patch,
I never reproduce the strange hang/triple fault issue again, so I think this
patch works.

> Secondly, code below only specifies cpu0. What does it do with non-boot
> cpus on the resume path?
> 
> Wei.

hyperv_init() registers hv_cpu_init()/hv_cpu_die() to the cpuhp framework:

cpuhp = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/hyperv_init:online",
                       hv_cpu_init, hv_cpu_die);

In the hibernation procedure, the non-boot CPUs are automatically disabled
and reenabled, so hv_cpu_init()/hv_cpu_die() are automatically called for them,
e.g. in the resume path, see:
    hibernation_restore()
        resume_target_kernel()
            hibernate_resume_nonboot_cpu_disable()
                disable_nonboot_cpus() 
            syscore_suspend()
                hv_cpu_die(0)  // Added by this patch
            swsusp_arch_resume()
                relocate_restore_code()
                    restore_image()
                        jump to the old kernel and we return from 
                        the swsusp_arch_suspend() in create_image()
                            syscore_resume()
                                hv_cpu_init(0) // Added by this patch.
                            suspend_enable_secondary_cpus()
                            dpm_resume_start()
                            ...
Thanks,
-- Dexuan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] x86/hyperv: Suspend/resume the VP assist page for hibernation
  2020-04-17 23:47   ` Dexuan Cui
@ 2020-04-20 12:08     ` Wei Liu
  2020-04-20 16:40       ` Dexuan Cui
  0 siblings, 1 reply; 12+ messages in thread
From: Wei Liu @ 2020-04-20 12:08 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: Wei Liu, bp, Haiyang Zhang, hpa, KY Srinivasan, linux-hyperv,
	linux-kernel, mingo, Stephen Hemminger, tglx, x86,
	Michael Kelley, vkuznets

On Fri, Apr 17, 2020 at 11:47:41PM +0000, Dexuan Cui wrote:
> > From: Wei Liu <wei.liu@kernel.org>
> > Sent: Friday, April 17, 2020 4:00 AM
> > To: Dexuan Cui <decui@microsoft.com>
> > 
> > On Thu, Apr 16, 2020 at 11:29:59PM -0700, Dexuan Cui wrote:
> > > Unlike the other CPUs, CPU0 is never offlined during hibernation. So in the
> > > resume path, the "new" kernel's VP assist page is not suspended (i.e.
> > > disabled), and later when we jump to the "old" kernel, the page is not
> > > properly re-enabled for CPU0 with the allocated page from the old kernel.
> > >
> > > So far, the VP assist page is only used by hv_apic_eoi_write(). When the
> > > page is not properly re-enabled, hvp->apic_assist is always 0, so the
> > > HV_X64_MSR_EOI MSR is always written. This is not ideal with respect to
> > > performance, but Hyper-V can still correctly handle this.
> > >
> > > The issue is: the hypervisor can corrupt the old kernel memory, and hence
> > > sometimes cause unexpected behaviors, e.g. when the old kernel's non-boot
> > > CPUs are being onlined in the resume path, the VM can hang or be killed
> > > due to virtual triple fault.
> > 
> > I don't quite follow here.
> > 
> > The first sentence is rather alarming -- why would Hyper-V corrupt
> > guest's memory (kernel or not)?
> 
> Without this patch, after the VM resumes from hibernation, the hypervisor 
> still thinks the assist page of vCPU0 points to the physical page allocated by
> the "new" kernel (the "new" kernel started up freshly, loaded the saved state 
> of the "old" kernel from disk into memory, and jumped to the "old" kernel),
> but the same physical page can be allocated to store something different in
> the "old" kernel (which is the currently running kernel, since the VM resumed).
> 
> Conceptually, it looks Hyper-V writes into the assist page from time to time,
> e.g. for the EOI optimization. This "corrupts" the page for the "old" kernel.
> 
> I'm not absolutely sure if this explains the strange hang issue or triple fault
> I occasionally saw in my long-haul hibernation test, but with this patch,
> I never reproduce the strange hang/triple fault issue again, so I think this
> patch works.

OK. I wouldn't be surprised if the corruption ends up changing code in
the kernel which further triggers triple fault etc. 

I would suggest make this clear in the commit message to not give the
impression that Hyper-V has this weird behaviour of corrupting guest
memory for no reason.

We can replace the paragraph starting with "The issue is: ..." with:

---
Linux needs to update Hyper-V the correct VP assist page to prevent
Hyper-V from writing to a stale page, which causes guest memory
corruption.  The memory corruption may have caused some of the hangs and
triple faults we saw during non-boot CPUs resume.
---

This What do you think?

Wei.

> 
> > Secondly, code below only specifies cpu0. What does it do with non-boot
> > cpus on the resume path?
> > 
> > Wei.
> 
> hyperv_init() registers hv_cpu_init()/hv_cpu_die() to the cpuhp framework:
> 
> cpuhp = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/hyperv_init:online",
>                        hv_cpu_init, hv_cpu_die);
> 
> In the hibernation procedure, the non-boot CPUs are automatically disabled
> and reenabled, so hv_cpu_init()/hv_cpu_die() are automatically called for them,
> e.g. in the resume path, see:
>     hibernation_restore()
>         resume_target_kernel()
>             hibernate_resume_nonboot_cpu_disable()
>                 disable_nonboot_cpus() 
>             syscore_suspend()
>                 hv_cpu_die(0)  // Added by this patch
>             swsusp_arch_resume()
>                 relocate_restore_code()
>                     restore_image()
>                         jump to the old kernel and we return from 
>                         the swsusp_arch_suspend() in create_image()
>                             syscore_resume()
>                                 hv_cpu_init(0) // Added by this patch.
>                             suspend_enable_secondary_cpus()
>                             dpm_resume_start()
>                             ...
> Thanks,
> -- Dexuan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] x86/hyperv: Suspend/resume the VP assist page for hibernation
  2020-04-20 12:08     ` Wei Liu
@ 2020-04-20 16:40       ` Dexuan Cui
  0 siblings, 0 replies; 12+ messages in thread
From: Dexuan Cui @ 2020-04-20 16:40 UTC (permalink / raw)
  To: Wei Liu
  Cc: bp, Haiyang Zhang, hpa, KY Srinivasan, linux-hyperv,
	linux-kernel, mingo, Stephen Hemminger, tglx, x86,
	Michael Kelley, vkuznets

> From: Wei Liu <wei.liu@kernel.org>
> Sent: Monday, April 20, 2020 5:08 AM
> 
> I would suggest make this clear in the commit message to not give the
> impression that Hyper-V has this weird behaviour of corrupting guest
> memory for no reason.
> 
> We can replace the paragraph starting with "The issue is: ..." with:
> 
> ---
> Linux needs to update Hyper-V the correct VP assist page to prevent
> Hyper-V from writing to a stale page, which causes guest memory
> corruption.  The memory corruption may have caused some of the hangs and
> triple faults we saw during non-boot CPUs resume.
> ---
> This What do you think?

This version is much better. I'll use it. Thanks, Wei!

I'll post a v2 with the updated comments. I'll document Vitaly's concern about
nested virtualization in the comment.
 
Thanks,
-- Dexuan

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2020-04-20 16:41 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-17  6:29 [PATCH] x86/hyperv: Suspend/resume the VP assist page for hibernation Dexuan Cui
2020-04-17  9:07 ` Wei Liu
2020-04-17 22:44   ` Dexuan Cui
2020-04-17 10:03 ` Vitaly Kuznetsov
2020-04-17 10:55   ` Wei Liu
2020-04-17 12:03     ` Vitaly Kuznetsov
2020-04-17 13:08       ` Wei Liu
2020-04-17 23:07       ` Dexuan Cui
2020-04-17 11:00 ` Wei Liu
2020-04-17 23:47   ` Dexuan Cui
2020-04-20 12:08     ` Wei Liu
2020-04-20 16:40       ` Dexuan Cui

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).