linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH] x86, hyperv: fix kernel panic when kexec on HyperV VM
@ 2019-02-26 15:56 Kairui Song
  2019-02-26 16:39 ` Vitaly Kuznetsov
  2019-02-27 12:01 ` Peter Zijlstra
  0 siblings, 2 replies; 6+ messages in thread
From: Kairui Song @ 2019-02-26 15:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Vitaly Kuznetsov, Dave Young, x86, devel, Kairui Song

When hypercalls is used for sending IPIs, kexec will fail with a kernel
panic like this:

kexec_core: Starting new kernel
BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
PGD 8000000057995067 P4D 8000000057995067 PUD 57990067 PMD 0
Oops: 0002 [#1] SMP PTI
CPU: 0 PID: 1016 Comm: kexec Not tainted 4.18.16-300.fc29.x86_64 #1
Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v3.0 03/02/2018
RIP: 0010:0xffffc9000001d000
Code: Bad RIP value.
RSP: 0018:ffffc9000495bcf0 EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffffc9000001d000 RCX: 0000000000020015
RDX: 000000007f553000 RSI: 0000000000000000 RDI: ffffc9000495bd28
RBP: 0000000000000002 R08: 0000000000000000 R09: ffffffff8238aaf8
R10: ffffffff8238aae0 R11: 0000000000000000 R12: ffff88007f553008
R13: 0000000000000001 R14: ffff8800ff553000 R15: 0000000000000000
FS:  00007ff5c0e67b80(0000) GS:ffff880078e00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffc9000001cfd6 CR3: 000000004f678006 CR4: 00000000003606f0
Call Trace:
 ? __send_ipi_mask+0x1c6/0x2d0
 ? hv_send_ipi_mask_allbutself+0x6d/0xb0
 ? mp_save_irq+0x70/0x70
 ? __ioapic_read_entry+0x32/0x50
 ? ioapic_read_entry+0x39/0x50
 ? clear_IO_APIC_pin+0xb8/0x110
 ? native_stop_other_cpus+0x6e/0x170
 ? native_machine_shutdown+0x22/0x40
 ? kernel_kexec+0x136/0x156
 ? __do_sys_reboot+0x1be/0x210
 ? kmem_cache_free+0x1b1/0x1e0
 ? __dentry_kill+0x10b/0x160
 ? _cond_resched+0x15/0x30
 ? dentry_kill+0x47/0x170
 ? dput.part.34+0xc6/0x100
 ? __fput+0x147/0x220
 ? _cond_resched+0x15/0x30
 ? task_work_run+0x38/0xa0
 ? do_syscall_64+0x5b/0x160
 ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables sunrpc vfat fat crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_rapl_perf hv_balloon joydev xfs libcrc32c hv_storvsc serio_raw scsi_transport_fc hv_netvsc hyperv_keyboard hyperv_fb hid_hyperv crc32c_intel hv_vmbus

That's because HyperV's machine_ops.shutdown allow registering a hook to
be called upon shutdown, hv_vmbus will invalidate the hypercall page
using this hook. But hv_hypercall_pg is still pointing to this invalid
page, any hypercall based operation will panic the kernel. And kexec
progress will send IPIs for stopping CPUs.

This fix this by simply reset hv_hypercall_pg to NULL when the page is
revoked to avoid any misuse. IPI sending will fallback to use non
hypercall based method. This only happens on kexec / kdump so setting to
NULL should be good enough.

Fixes: 68bb7bfb7985 ("X86/Hyper-V: Enable IPI enlightenments")
Signed-off-by: Kairui Song <kasong@redhat.com>

---

I'm not sure about the details of what happened after the

wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);

But this fix should be valid, please let me know if I get anything
wrong, thanks.

 arch/x86/hyperv/hv_init.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 7abb09e2eeb8..92291c18d716 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -406,6 +406,10 @@ void hyperv_cleanup(void)
 	/* Reset our OS id */
 	wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
 
+	/* Cleanup page reference before reset the page */
+	hv_hypercall_pg = NULL;
+	wmb();
+
 	/* Reset the hypercall page */
 	hypercall_msr.as_uint64 = 0;
 	wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH] x86, hyperv: fix kernel panic when kexec on HyperV VM
  2019-02-26 15:56 [RFC PATCH] x86, hyperv: fix kernel panic when kexec on HyperV VM Kairui Song
@ 2019-02-26 16:39 ` Vitaly Kuznetsov
  2019-02-27 12:01 ` Peter Zijlstra
  1 sibling, 0 replies; 6+ messages in thread
From: Vitaly Kuznetsov @ 2019-02-26 16:39 UTC (permalink / raw)
  To: Kairui Song, linux-kernel
  Cc: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Dave Young, x86, devel, Kairui Song

Kairui Song <kasong@redhat.com> writes:

> When hypercalls is used for sending IPIs, kexec will fail with a kernel
> panic like this:
>
> kexec_core: Starting new kernel
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
> PGD 8000000057995067 P4D 8000000057995067 PUD 57990067 PMD 0
> Oops: 0002 [#1] SMP PTI
> CPU: 0 PID: 1016 Comm: kexec Not tainted 4.18.16-300.fc29.x86_64 #1
> Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v3.0 03/02/2018
> RIP: 0010:0xffffc9000001d000
> Code: Bad RIP value.
> RSP: 0018:ffffc9000495bcf0 EFLAGS: 00010046
> RAX: 0000000000000000 RBX: ffffc9000001d000 RCX: 0000000000020015
> RDX: 000000007f553000 RSI: 0000000000000000 RDI: ffffc9000495bd28
> RBP: 0000000000000002 R08: 0000000000000000 R09: ffffffff8238aaf8
> R10: ffffffff8238aae0 R11: 0000000000000000 R12: ffff88007f553008
> R13: 0000000000000001 R14: ffff8800ff553000 R15: 0000000000000000
> FS:  00007ff5c0e67b80(0000) GS:ffff880078e00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffffc9000001cfd6 CR3: 000000004f678006 CR4: 00000000003606f0
> Call Trace:
>  ? __send_ipi_mask+0x1c6/0x2d0
>  ? hv_send_ipi_mask_allbutself+0x6d/0xb0
>  ? mp_save_irq+0x70/0x70
>  ? __ioapic_read_entry+0x32/0x50
>  ? ioapic_read_entry+0x39/0x50
>  ? clear_IO_APIC_pin+0xb8/0x110
>  ? native_stop_other_cpus+0x6e/0x170
>  ? native_machine_shutdown+0x22/0x40
>  ? kernel_kexec+0x136/0x156
>  ? __do_sys_reboot+0x1be/0x210
>  ? kmem_cache_free+0x1b1/0x1e0
>  ? __dentry_kill+0x10b/0x160
>  ? _cond_resched+0x15/0x30
>  ? dentry_kill+0x47/0x170
>  ? dput.part.34+0xc6/0x100
>  ? __fput+0x147/0x220
>  ? _cond_resched+0x15/0x30
>  ? task_work_run+0x38/0xa0
>  ? do_syscall_64+0x5b/0x160
>  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
> Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables sunrpc vfat fat crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_rapl_perf hv_balloon joydev xfs libcrc32c hv_storvsc serio_raw scsi_transport_fc hv_netvsc hyperv_keyboard hyperv_fb hid_hyperv crc32c_intel hv_vmbus
>
> That's because HyperV's machine_ops.shutdown allow registering a hook to
> be called upon shutdown, hv_vmbus will invalidate the hypercall page
> using this hook. But hv_hypercall_pg is still pointing to this invalid
> page, any hypercall based operation will panic the kernel. And kexec
> progress will send IPIs for stopping CPUs.
>
> This fix this by simply reset hv_hypercall_pg to NULL when the page is
> revoked to avoid any misuse. IPI sending will fallback to use non
> hypercall based method. This only happens on kexec / kdump so setting to
> NULL should be good enough.
>
> Fixes: 68bb7bfb7985 ("X86/Hyper-V: Enable IPI enlightenments")
> Signed-off-by: Kairui Song <kasong@redhat.com>
>
> ---
>
> I'm not sure about the details of what happened after the
>
> wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
>
> But this fix should be valid, please let me know if I get anything
> wrong, thanks.
>
>  arch/x86/hyperv/hv_init.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 7abb09e2eeb8..92291c18d716 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -406,6 +406,10 @@ void hyperv_cleanup(void)
>  	/* Reset our OS id */
>  	wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
>  
> +	/* Cleanup page reference before reset the page */
> +	hv_hypercall_pg = NULL;
> +	wmb();
> +
>  	/* Reset the hypercall page */
>  	hypercall_msr.as_uint64 = 0;
>  	wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);

This all looks correct to me. We need to reset HV_X64_MSR_HYPERCALL as
the hypercall page will remain 'special' otherwise so dumping it may
have undesired consequences (though, I think that last time I checked it
it was possible to read from this page without issues - and this should
be enough for kdump. But I'd rather keep things as they are as one
additional wrmsr doesn't hurt).

-- 
Vitaly

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH] x86, hyperv: fix kernel panic when kexec on HyperV VM
  2019-02-26 15:56 [RFC PATCH] x86, hyperv: fix kernel panic when kexec on HyperV VM Kairui Song
  2019-02-26 16:39 ` Vitaly Kuznetsov
@ 2019-02-27 12:01 ` Peter Zijlstra
  2019-02-27 14:55   ` Kairui Song
  1 sibling, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2019-02-27 12:01 UTC (permalink / raw)
  To: Kairui Song
  Cc: linux-kernel, K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Sasha Levin, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Vitaly Kuznetsov, Dave Young, x86, devel

On Tue, Feb 26, 2019 at 11:56:15PM +0800, Kairui Song wrote:
>  arch/x86/hyperv/hv_init.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 7abb09e2eeb8..92291c18d716 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -406,6 +406,10 @@ void hyperv_cleanup(void)
>  	/* Reset our OS id */
>  	wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
>  
> +	/* Cleanup page reference before reset the page */
> +	hv_hypercall_pg = NULL;
> +	wmb();

What do we need that SFENCE for? Any why does it lack a comment?

> +
>  	/* Reset the hypercall page */
>  	hypercall_msr.as_uint64 = 0;
>  	wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH] x86, hyperv: fix kernel panic when kexec on HyperV VM
  2019-02-27 12:01 ` Peter Zijlstra
@ 2019-02-27 14:55   ` Kairui Song
  2019-03-05 12:28     ` Peter Zijlstra
  0 siblings, 1 reply; 6+ messages in thread
From: Kairui Song @ 2019-02-27 14:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linux Kernel Mailing List, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Sasha Levin, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Vitaly Kuznetsov, Dave Young,
	the arch/x86 maintainers, devel

On Wed, Feb 27, 2019 at 8:02 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, Feb 26, 2019 at 11:56:15PM +0800, Kairui Song wrote:
> >  arch/x86/hyperv/hv_init.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> > index 7abb09e2eeb8..92291c18d716 100644
> > --- a/arch/x86/hyperv/hv_init.c
> > +++ b/arch/x86/hyperv/hv_init.c
> > @@ -406,6 +406,10 @@ void hyperv_cleanup(void)
> >       /* Reset our OS id */
> >       wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
> >
> > +     /* Cleanup page reference before reset the page */
> > +     hv_hypercall_pg = NULL;
> > +     wmb();
>
> What do we need that SFENCE for? Any why does it lack a comment?

Hi, that's for ensuring the hv_hypercall_pg is reset to NULL before
the following wrmsr call. The wrmsr call will make the pointer address
invalid.
I can add some comment in V2 if this is OK.


--
Best Regards,
Kairui Song

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH] x86, hyperv: fix kernel panic when kexec on HyperV VM
  2019-02-27 14:55   ` Kairui Song
@ 2019-03-05 12:28     ` Peter Zijlstra
  2019-03-05 13:32       ` Kairui Song
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2019-03-05 12:28 UTC (permalink / raw)
  To: Kairui Song
  Cc: Linux Kernel Mailing List, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Sasha Levin, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Vitaly Kuznetsov, Dave Young,
	the arch/x86 maintainers, devel

On Wed, Feb 27, 2019 at 10:55:46PM +0800, Kairui Song wrote:
> On Wed, Feb 27, 2019 at 8:02 PM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Tue, Feb 26, 2019 at 11:56:15PM +0800, Kairui Song wrote:
> > >  arch/x86/hyperv/hv_init.c | 4 ++++
> > >  1 file changed, 4 insertions(+)
> > >
> > > diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> > > index 7abb09e2eeb8..92291c18d716 100644
> > > --- a/arch/x86/hyperv/hv_init.c
> > > +++ b/arch/x86/hyperv/hv_init.c
> > > @@ -406,6 +406,10 @@ void hyperv_cleanup(void)
> > >       /* Reset our OS id */
> > >       wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
> > >
> > > +     /* Cleanup page reference before reset the page */
> > > +     hv_hypercall_pg = NULL;
> > > +     wmb();
> >
> > What do we need that SFENCE for? Any why does it lack a comment?
> 
> Hi, that's for ensuring the hv_hypercall_pg is reset to NULL before
> the following wrmsr call. The wrmsr call will make the pointer address
> invalid.

WRMSR is a serializing instruction (except for TSC_DEADLINE and the
X2APIC).

> I can add some comment in V2 if this is OK.

Barriers must always have a comment.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH] x86, hyperv: fix kernel panic when kexec on HyperV VM
  2019-03-05 12:28     ` Peter Zijlstra
@ 2019-03-05 13:32       ` Kairui Song
  0 siblings, 0 replies; 6+ messages in thread
From: Kairui Song @ 2019-03-05 13:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linux Kernel Mailing List, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Sasha Levin, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Vitaly Kuznetsov, Dave Young,
	the arch/x86 maintainers, devel

On Tue, Mar 5, 2019 at 8:28 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, Feb 27, 2019 at 10:55:46PM +0800, Kairui Song wrote:
> > On Wed, Feb 27, 2019 at 8:02 PM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Tue, Feb 26, 2019 at 11:56:15PM +0800, Kairui Song wrote:
> > > >  arch/x86/hyperv/hv_init.c | 4 ++++
> > > >  1 file changed, 4 insertions(+)
> > > >
> > > > diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> > > > index 7abb09e2eeb8..92291c18d716 100644
> > > > --- a/arch/x86/hyperv/hv_init.c
> > > > +++ b/arch/x86/hyperv/hv_init.c
> > > > @@ -406,6 +406,10 @@ void hyperv_cleanup(void)
> > > >       /* Reset our OS id */
> > > >       wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
> > > >
> > > > +     /* Cleanup page reference before reset the page */
> > > > +     hv_hypercall_pg = NULL;
> > > > +     wmb();
> > >
> > > What do we need that SFENCE for? Any why does it lack a comment?
> >
> > Hi, that's for ensuring the hv_hypercall_pg is reset to NULL before
> > the following wrmsr call. The wrmsr call will make the pointer address
> > invalid.
>
> WRMSR is a serializing instruction (except for TSC_DEADLINE and the
> X2APIC).
>

Many thanks for the info, I'm not aware of the exception condition, V2
is sent, will drop the barrier in V3 then.

-- 
Best Regards,
Kairui Song

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2019-03-05 13:32 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-26 15:56 [RFC PATCH] x86, hyperv: fix kernel panic when kexec on HyperV VM Kairui Song
2019-02-26 16:39 ` Vitaly Kuznetsov
2019-02-27 12:01 ` Peter Zijlstra
2019-02-27 14:55   ` Kairui Song
2019-03-05 12:28     ` Peter Zijlstra
2019-03-05 13:32       ` Kairui Song

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).