[v2] x86, hyperv: fix kernel panic when kexec on HyperV
diff mbox series

Message ID 20190305121703.29769-1-kasong@redhat.com
State Superseded
Headers show
Series
  • [v2] x86, hyperv: fix kernel panic when kexec on HyperV
Related show

Commit Message

Kairui Song March 5, 2019, 12:17 p.m. UTC
After commit 68bb7bfb7985 ("X86/Hyper-V: Enable IPI enlightenments"),
kexec will fail with a kernel panic like this:

kexec_core: Starting new kernel
BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
PGD 8000000057995067 P4D 8000000057995067 PUD 57990067 PMD 0
Oops: 0002 [#1] SMP PTI
CPU: 0 PID: 1016 Comm: kexec Not tainted 4.18.16-300.fc29.x86_64 #1
Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v3.0 03/02/2018
RIP: 0010:0xffffc9000001d000
Code: Bad RIP value.
RSP: 0018:ffffc9000495bcf0 EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffffc9000001d000 RCX: 0000000000020015
RDX: 000000007f553000 RSI: 0000000000000000 RDI: ffffc9000495bd28
RBP: 0000000000000002 R08: 0000000000000000 R09: ffffffff8238aaf8
R10: ffffffff8238aae0 R11: 0000000000000000 R12: ffff88007f553008
R13: 0000000000000001 R14: ffff8800ff553000 R15: 0000000000000000
FS:  00007ff5c0e67b80(0000) GS:ffff880078e00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffc9000001cfd6 CR3: 000000004f678006 CR4: 00000000003606f0
Call Trace:
 ? __send_ipi_mask+0x1c6/0x2d0
 ? hv_send_ipi_mask_allbutself+0x6d/0xb0
 ? mp_save_irq+0x70/0x70
 ? __ioapic_read_entry+0x32/0x50
 ? ioapic_read_entry+0x39/0x50
 ? clear_IO_APIC_pin+0xb8/0x110
 ? native_stop_other_cpus+0x6e/0x170
 ? native_machine_shutdown+0x22/0x40
 ? kernel_kexec+0x136/0x156
 ? __do_sys_reboot+0x1be/0x210
 ? kmem_cache_free+0x1b1/0x1e0
 ? __dentry_kill+0x10b/0x160
 ? _cond_resched+0x15/0x30
 ? dentry_kill+0x47/0x170
 ? dput.part.34+0xc6/0x100
 ? __fput+0x147/0x220
 ? _cond_resched+0x15/0x30
 ? task_work_run+0x38/0xa0
 ? do_syscall_64+0x5b/0x160
 ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables sunrpc vfat fat crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_rapl_perf hv_balloon joydev xfs libcrc32c hv_storvsc serio_raw scsi_transport_fc hv_netvsc hyperv_keyboard hyperv_fb hid_hyperv crc32c_intel hv_vmbus

That's because now we may use hypercall for sending IPI, the
hypercall page will be reset very early upon kexec reboot, but kexec
will need to send IPI for stopping CPUs, and it will reference this
no longer usable page, then kernel panics.

To fix it, simply reset hv_hypercall_pg to NULL before the page is
reset to avoid any misuse, IPI sending will fallback to use non
hypercall based method. This only happens on kexec / kdump so setting to
NULL should be good enough.

Fixes: 68bb7bfb7985 ("X86/Hyper-V: Enable IPI enlightenments")
Signed-off-by: Kairui Song <kasong@redhat.com>

---

Update from V1:
- Add comment for the barrier.

 arch/x86/hyperv/hv_init.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Peter Zijlstra March 5, 2019, 12:33 p.m. UTC | #1
On Tue, Mar 05, 2019 at 08:17:03PM +0800, Kairui Song wrote:
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 7abb09e2eeb8..34aa1e953dfc 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -406,6 +406,12 @@ void hyperv_cleanup(void)
>  	/* Reset our OS id */
>  	wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
>  
> +	/* Cleanup hypercall page reference before reset the page */
> +	hv_hypercall_pg = NULL;
> +
> +	/* Make sure page reference is cleared before wrmsr */

This comment forgets to tell us who cares about this. And why the wrmsr
itself isn't serializing enough.

> +	wmb();
> +
>  	/* Reset the hypercall page */
>  	hypercall_msr.as_uint64 = 0;
>  	wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);

That looks like a fake MSR; and you're telling me that VMEXIT doesn't
serialize?
Kairui Song March 5, 2019, 1:34 p.m. UTC | #2
On Tue, Mar 5, 2019 at 8:33 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, Mar 05, 2019 at 08:17:03PM +0800, Kairui Song wrote:
> > diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> > index 7abb09e2eeb8..34aa1e953dfc 100644
> > --- a/arch/x86/hyperv/hv_init.c
> > +++ b/arch/x86/hyperv/hv_init.c
> > @@ -406,6 +406,12 @@ void hyperv_cleanup(void)
> >       /* Reset our OS id */
> >       wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
> >
> > +     /* Cleanup hypercall page reference before reset the page */
> > +     hv_hypercall_pg = NULL;
> > +
> > +     /* Make sure page reference is cleared before wrmsr */
>
> This comment forgets to tell us who cares about this. And why the wrmsr
> itself isn't serializing enough.
>
> > +     wmb();
> > +
> >       /* Reset the hypercall page */
> >       hypercall_msr.as_uint64 = 0;
> >       wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
>
> That looks like a fake MSR; and you're telling me that VMEXIT doesn't
> serialize?

Thanks for the review, seem I being a bit paranoid on this. Will drop
it and send a v3 if no one has any other complaint.

--
Best Regards,
Kairui Song
Peter Zijlstra March 5, 2019, 3:21 p.m. UTC | #3
On Tue, Mar 05, 2019 at 09:34:13PM +0800, Kairui Song wrote:
> On Tue, Mar 5, 2019 at 8:33 PM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Tue, Mar 05, 2019 at 08:17:03PM +0800, Kairui Song wrote:
> > > diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> > > index 7abb09e2eeb8..34aa1e953dfc 100644
> > > --- a/arch/x86/hyperv/hv_init.c
> > > +++ b/arch/x86/hyperv/hv_init.c
> > > @@ -406,6 +406,12 @@ void hyperv_cleanup(void)
> > >       /* Reset our OS id */
> > >       wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
> > >
> > > +     /* Cleanup hypercall page reference before reset the page */
> > > +     hv_hypercall_pg = NULL;
> > > +
> > > +     /* Make sure page reference is cleared before wrmsr */
> >
> > This comment forgets to tell us who cares about this. And why the wrmsr
> > itself isn't serializing enough.
> >
> > > +     wmb();
> > > +
> > >       /* Reset the hypercall page */
> > >       hypercall_msr.as_uint64 = 0;
> > >       wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> >
> > That looks like a fake MSR; and you're telling me that VMEXIT doesn't
> > serialize?
> 
> Thanks for the review, seem I being a bit paranoid on this. Will drop
> it and send a v3 if no one has any other complaint.

Also; our wrmsr implementation has a "memory" clobber on, which means
you don't even need a compiler barrier to force emit that store before
the wrmsr.

You might want to retain the comment that explains this ordering though.

	hv_hypercall_pg = NULL;
	/*
	 * text that explains why hv_hypercall_pg must be set NULL
	 * before the wrmsr
	 */
	wrmsrl(...);

Patch
diff mbox series

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 7abb09e2eeb8..34aa1e953dfc 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -406,6 +406,12 @@  void hyperv_cleanup(void)
 	/* Reset our OS id */
 	wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
 
+	/* Cleanup hypercall page reference before reset the page */
+	hv_hypercall_pg = NULL;
+
+	/* Make sure page reference is cleared before wrmsr */
+	wmb();
+
 	/* Reset the hypercall page */
 	hypercall_msr.as_uint64 = 0;
 	wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);