* [PATCH] x86/hvm: Fix boot on systems where HVM isn't available
@ 2022-02-04 17:34 Andrew Cooper
2022-02-05 9:47 ` Roger Pau Monné
0 siblings, 1 reply; 4+ messages in thread
From: Andrew Cooper @ 2022-02-04 17:34 UTC (permalink / raw)
To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné, Wei Liu
c/s 27a63cdac388 ("x86/HVM: convert remaining hvm_funcs hook invocations to
alt-call") went too far with dropping NULL function pointer checks.
smp_callin() calls hvm_cpu_up() unconditionally. When the platform doesn't
support HVM, hvm_enable() exits without filling in hvm_funcs, after which the
altcall pass nukes the (now unconditional) indirect call, causing:
(XEN) ----[ Xen-4.17.0-10.18-d x86_64 debug=y Not tainted ]----
(XEN) CPU: 1
(XEN) RIP: e008:[<ffff82d04034bef5>] start_secondary+0x393/0x3b7
(XEN) RFLAGS: 0000000000010086 CONTEXT: hypervisor
...
(XEN) Xen code around <ffff82d04034bef5> (start_secondary+0x393/0x3b7):
(XEN) ff ff 8b 05 1b 84 17 00 <0f> 0b 0f ff ff 90 89 c3 85 c0 0f 84 db fe ff ff
...
(XEN) Xen call trace:
(XEN) [<ffff82d04034bef5>] R start_secondary+0x393/0x3b7
(XEN) [<ffff82d0402000e2>] F __high_start+0x42/0x60
To make matters worse, __stop_this_cpu() calls hvm_cpu_down() unconditionally
too, so what happen next is:
(XEN) ----[ Xen-4.17.0-10.18-d x86_64 debug=y Not tainted ]----
(XEN) CPU: 0
(XEN) RIP: e008:[<ffff82d04034ab02>] __stop_this_cpu+0x12/0x3c
(XEN) RFLAGS: 0000000000010046 CONTEXT: hypervisor
...
(XEN) Xen code around <ffff82d04034ab02> (__stop_this_cpu+0x12/0x3c):
(XEN) 48 89 e5 e8 8a 1d fd ff <0f> 0b 0f ff ff 90 0f 06 db e3 48 89 e0 48 0d ff
...
(XEN) Xen call trace:
(XEN) [<ffff82d04034ab02>] R __stop_this_cpu+0x12/0x3c
(XEN) [<ffff82d04034ac15>] F smp_send_stop+0xdd/0xf8
(XEN) [<ffff82d04034a229>] F machine_restart+0xa2/0x298
(XEN) [<ffff82d04034a42a>] F arch/x86/shutdown.c#__machine_restart+0xb/0x11
(XEN) [<ffff82d04022fd15>] F smp_call_function_interrupt+0xbf/0xea
(XEN) [<ffff82d04034acc6>] F call_function_interrupt+0x35/0x37
(XEN) [<ffff82d040331a70>] F do_IRQ+0xa3/0x6b5
(XEN) [<ffff82d04039482a>] F common_interrupt+0x10a/0x120
(XEN) [<ffff82d04031f649>] F __udelay+0x3a/0x51
(XEN) [<ffff82d04034d5fb>] F __cpu_up+0x48f/0x734
(XEN) [<ffff82d040203c2b>] F cpu_up+0x7d/0xde
(XEN) [<ffff82d0404543d3>] F __start_xen+0x200b/0x2618
(XEN) [<ffff82d0402000ef>] F __high_start+0x4f/0x60
which recurses until hitting a stack overflow. The #DF handler, which resets
its stack on each invocation, loops indefinitely.
Reinstate the NULL function pointer checks for hvm_cpu_{up,down}().
Fixes: 27a63cdac388 ("x86/HVM: convert remaining hvm_funcs hook invocations to alt-call")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
RFC. Not tested yet on the imacted hardware. It's a Xeon PHI with another
werid thing in need of debugging. First boot is fine, while second
boot (loading microcode this time) has a problem with vmx.
I wonder if we want to modify the callers to check for HVM being enabled,
rather than leaving the NULL pointer checks in a position where they're liable
to be reaped again.
Also, the #UD handler really should identify 0f 0b 0f ff ff as the
clobbered-altcall sequence, and provide a message to that effect.
---
xen/arch/x86/include/asm/hvm/hvm.h | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/xen/arch/x86/include/asm/hvm/hvm.h b/xen/arch/x86/include/asm/hvm/hvm.h
index a441cbc22159..2dd08567f730 100644
--- a/xen/arch/x86/include/asm/hvm/hvm.h
+++ b/xen/arch/x86/include/asm/hvm/hvm.h
@@ -553,12 +553,16 @@ static inline void hvm_invlpg(struct vcpu *v, unsigned long linear)
static inline int hvm_cpu_up(void)
{
- return alternative_call(hvm_funcs.cpu_up);
+ if ( hvm_funcs.cpu_up )
+ return alternative_call(hvm_funcs.cpu_up);
+
+ return 0;
}
static inline void hvm_cpu_down(void)
{
- alternative_vcall(hvm_funcs.cpu_down);
+ if ( hvm_funcs.cpu_down )
+ alternative_vcall(hvm_funcs.cpu_down);
}
static inline unsigned int hvm_get_insn_bytes(struct vcpu *v, uint8_t *buf)
--
2.11.0
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH] x86/hvm: Fix boot on systems where HVM isn't available
2022-02-04 17:34 [PATCH] x86/hvm: Fix boot on systems where HVM isn't available Andrew Cooper
@ 2022-02-05 9:47 ` Roger Pau Monné
2022-02-07 8:29 ` Jan Beulich
0 siblings, 1 reply; 4+ messages in thread
From: Roger Pau Monné @ 2022-02-05 9:47 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Xen-devel, Jan Beulich, Wei Liu
On Fri, Feb 04, 2022 at 05:34:05PM +0000, Andrew Cooper wrote:
> c/s 27a63cdac388 ("x86/HVM: convert remaining hvm_funcs hook invocations to
> alt-call") went too far with dropping NULL function pointer checks.
>
> smp_callin() calls hvm_cpu_up() unconditionally. When the platform doesn't
> support HVM, hvm_enable() exits without filling in hvm_funcs, after which the
> altcall pass nukes the (now unconditional) indirect call, causing:
>
> (XEN) ----[ Xen-4.17.0-10.18-d x86_64 debug=y Not tainted ]----
> (XEN) CPU: 1
> (XEN) RIP: e008:[<ffff82d04034bef5>] start_secondary+0x393/0x3b7
> (XEN) RFLAGS: 0000000000010086 CONTEXT: hypervisor
> ...
> (XEN) Xen code around <ffff82d04034bef5> (start_secondary+0x393/0x3b7):
> (XEN) ff ff 8b 05 1b 84 17 00 <0f> 0b 0f ff ff 90 89 c3 85 c0 0f 84 db fe ff ff
> ...
> (XEN) Xen call trace:
> (XEN) [<ffff82d04034bef5>] R start_secondary+0x393/0x3b7
> (XEN) [<ffff82d0402000e2>] F __high_start+0x42/0x60
>
> To make matters worse, __stop_this_cpu() calls hvm_cpu_down() unconditionally
> too, so what happen next is:
>
> (XEN) ----[ Xen-4.17.0-10.18-d x86_64 debug=y Not tainted ]----
> (XEN) CPU: 0
> (XEN) RIP: e008:[<ffff82d04034ab02>] __stop_this_cpu+0x12/0x3c
> (XEN) RFLAGS: 0000000000010046 CONTEXT: hypervisor
> ...
> (XEN) Xen code around <ffff82d04034ab02> (__stop_this_cpu+0x12/0x3c):
> (XEN) 48 89 e5 e8 8a 1d fd ff <0f> 0b 0f ff ff 90 0f 06 db e3 48 89 e0 48 0d ff
> ...
> (XEN) Xen call trace:
> (XEN) [<ffff82d04034ab02>] R __stop_this_cpu+0x12/0x3c
> (XEN) [<ffff82d04034ac15>] F smp_send_stop+0xdd/0xf8
> (XEN) [<ffff82d04034a229>] F machine_restart+0xa2/0x298
> (XEN) [<ffff82d04034a42a>] F arch/x86/shutdown.c#__machine_restart+0xb/0x11
> (XEN) [<ffff82d04022fd15>] F smp_call_function_interrupt+0xbf/0xea
> (XEN) [<ffff82d04034acc6>] F call_function_interrupt+0x35/0x37
> (XEN) [<ffff82d040331a70>] F do_IRQ+0xa3/0x6b5
> (XEN) [<ffff82d04039482a>] F common_interrupt+0x10a/0x120
> (XEN) [<ffff82d04031f649>] F __udelay+0x3a/0x51
> (XEN) [<ffff82d04034d5fb>] F __cpu_up+0x48f/0x734
> (XEN) [<ffff82d040203c2b>] F cpu_up+0x7d/0xde
> (XEN) [<ffff82d0404543d3>] F __start_xen+0x200b/0x2618
> (XEN) [<ffff82d0402000ef>] F __high_start+0x4f/0x60
>
> which recurses until hitting a stack overflow. The #DF handler, which resets
> its stack on each invocation, loops indefinitely.
>
> Reinstate the NULL function pointer checks for hvm_cpu_{up,down}().
>
> Fixes: 27a63cdac388 ("x86/HVM: convert remaining hvm_funcs hook invocations to alt-call")
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> CC: Wei Liu <wl@xen.org>
>
> RFC. Not tested yet on the imacted hardware. It's a Xeon PHI with another
> werid thing in need of debugging. First boot is fine, while second
> boot (loading microcode this time) has a problem with vmx.
>
> I wonder if we want to modify the callers to check for HVM being enabled,
> rather than leaving the NULL pointer checks in a position where they're liable
> to be reaped again.
What about adding a couple of comments to hvm_cpu_{up,down} to note
they are called unconditionally regardless of whether HVM is present
or not?
Thanks, Roger.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] x86/hvm: Fix boot on systems where HVM isn't available
2022-02-05 9:47 ` Roger Pau Monné
@ 2022-02-07 8:29 ` Jan Beulich
2022-02-07 16:00 ` Andrew Cooper
0 siblings, 1 reply; 4+ messages in thread
From: Jan Beulich @ 2022-02-07 8:29 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Roger Pau Monné
On 05.02.2022 10:47, Roger Pau Monné wrote:
> On Fri, Feb 04, 2022 at 05:34:05PM +0000, Andrew Cooper wrote:
>> c/s 27a63cdac388 ("x86/HVM: convert remaining hvm_funcs hook invocations to
>> alt-call") went too far with dropping NULL function pointer checks.
Oh, I'm sorry, I should have noticed this.
>> smp_callin() calls hvm_cpu_up() unconditionally. When the platform doesn't
>> support HVM, hvm_enable() exits without filling in hvm_funcs, after which the
>> altcall pass nukes the (now unconditional) indirect call, causing:
>>
>> (XEN) ----[ Xen-4.17.0-10.18-d x86_64 debug=y Not tainted ]----
>> (XEN) CPU: 1
>> (XEN) RIP: e008:[<ffff82d04034bef5>] start_secondary+0x393/0x3b7
>> (XEN) RFLAGS: 0000000000010086 CONTEXT: hypervisor
>> ...
>> (XEN) Xen code around <ffff82d04034bef5> (start_secondary+0x393/0x3b7):
>> (XEN) ff ff 8b 05 1b 84 17 00 <0f> 0b 0f ff ff 90 89 c3 85 c0 0f 84 db fe ff ff
>> ...
>> (XEN) Xen call trace:
>> (XEN) [<ffff82d04034bef5>] R start_secondary+0x393/0x3b7
>> (XEN) [<ffff82d0402000e2>] F __high_start+0x42/0x60
>>
>> To make matters worse, __stop_this_cpu() calls hvm_cpu_down() unconditionally
>> too, so what happen next is:
>>
>> (XEN) ----[ Xen-4.17.0-10.18-d x86_64 debug=y Not tainted ]----
>> (XEN) CPU: 0
>> (XEN) RIP: e008:[<ffff82d04034ab02>] __stop_this_cpu+0x12/0x3c
>> (XEN) RFLAGS: 0000000000010046 CONTEXT: hypervisor
>> ...
>> (XEN) Xen code around <ffff82d04034ab02> (__stop_this_cpu+0x12/0x3c):
>> (XEN) 48 89 e5 e8 8a 1d fd ff <0f> 0b 0f ff ff 90 0f 06 db e3 48 89 e0 48 0d ff
>> ...
>> (XEN) Xen call trace:
>> (XEN) [<ffff82d04034ab02>] R __stop_this_cpu+0x12/0x3c
>> (XEN) [<ffff82d04034ac15>] F smp_send_stop+0xdd/0xf8
>> (XEN) [<ffff82d04034a229>] F machine_restart+0xa2/0x298
>> (XEN) [<ffff82d04034a42a>] F arch/x86/shutdown.c#__machine_restart+0xb/0x11
>> (XEN) [<ffff82d04022fd15>] F smp_call_function_interrupt+0xbf/0xea
>> (XEN) [<ffff82d04034acc6>] F call_function_interrupt+0x35/0x37
>> (XEN) [<ffff82d040331a70>] F do_IRQ+0xa3/0x6b5
>> (XEN) [<ffff82d04039482a>] F common_interrupt+0x10a/0x120
>> (XEN) [<ffff82d04031f649>] F __udelay+0x3a/0x51
>> (XEN) [<ffff82d04034d5fb>] F __cpu_up+0x48f/0x734
>> (XEN) [<ffff82d040203c2b>] F cpu_up+0x7d/0xde
>> (XEN) [<ffff82d0404543d3>] F __start_xen+0x200b/0x2618
>> (XEN) [<ffff82d0402000ef>] F __high_start+0x4f/0x60
>>
>> which recurses until hitting a stack overflow. The #DF handler, which resets
>> its stack on each invocation, loops indefinitely.
>>
>> Reinstate the NULL function pointer checks for hvm_cpu_{up,down}().
>>
>> Fixes: 27a63cdac388 ("x86/HVM: convert remaining hvm_funcs hook invocations to alt-call")
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> CC: Jan Beulich <JBeulich@suse.com>
>> CC: Roger Pau Monné <roger.pau@citrix.com>
>> CC: Wei Liu <wl@xen.org>
>>
>> RFC. Not tested yet on the imacted hardware. It's a Xeon PHI with another
>> werid thing in need of debugging. First boot is fine, while second
>> boot (loading microcode this time) has a problem with vmx.
Sounds not unfamiliar: My meanwhile oldish Romley needs to be cold-
booted for VMX to actually be usable (not locked) on APs.
>> I wonder if we want to modify the callers to check for HVM being enabled,
>> rather than leaving the NULL pointer checks in a position where they're liable
>> to be reaped again.
>
> What about adding a couple of comments to hvm_cpu_{up,down} to note
> they are called unconditionally regardless of whether HVM is present
> or not?
I second this as the perhaps better alternative: The S3 path is
similarly affected (and you may want to mention this in the
description), so this would mean up to 5 conditionals (at the
source level) instead of the just two you get away with here.
Jan
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] x86/hvm: Fix boot on systems where HVM isn't available
2022-02-07 8:29 ` Jan Beulich
@ 2022-02-07 16:00 ` Andrew Cooper
0 siblings, 0 replies; 4+ messages in thread
From: Andrew Cooper @ 2022-02-07 16:00 UTC (permalink / raw)
To: Jan Beulich; +Cc: Xen-devel, Wei Liu, Roger Pau Monne
On 07/02/2022 08:29, Jan Beulich wrote:
> On 05.02.2022 10:47, Roger Pau Monné wrote:
>> On Fri, Feb 04, 2022 at 05:34:05PM +0000, Andrew Cooper wrote:
>>> c/s 27a63cdac388 ("x86/HVM: convert remaining hvm_funcs hook invocations to
>>> alt-call") went too far with dropping NULL function pointer checks.
> Oh, I'm sorry, I should have noticed this.
>
>>> smp_callin() calls hvm_cpu_up() unconditionally. When the platform doesn't
>>> support HVM, hvm_enable() exits without filling in hvm_funcs, after which the
>>> altcall pass nukes the (now unconditional) indirect call, causing:
>>>
>>> (XEN) ----[ Xen-4.17.0-10.18-d x86_64 debug=y Not tainted ]----
>>> (XEN) CPU: 1
>>> (XEN) RIP: e008:[<ffff82d04034bef5>] start_secondary+0x393/0x3b7
>>> (XEN) RFLAGS: 0000000000010086 CONTEXT: hypervisor
>>> ...
>>> (XEN) Xen code around <ffff82d04034bef5> (start_secondary+0x393/0x3b7):
>>> (XEN) ff ff 8b 05 1b 84 17 00 <0f> 0b 0f ff ff 90 89 c3 85 c0 0f 84 db fe ff ff
>>> ...
>>> (XEN) Xen call trace:
>>> (XEN) [<ffff82d04034bef5>] R start_secondary+0x393/0x3b7
>>> (XEN) [<ffff82d0402000e2>] F __high_start+0x42/0x60
>>>
>>> To make matters worse, __stop_this_cpu() calls hvm_cpu_down() unconditionally
>>> too, so what happen next is:
>>>
>>> (XEN) ----[ Xen-4.17.0-10.18-d x86_64 debug=y Not tainted ]----
>>> (XEN) CPU: 0
>>> (XEN) RIP: e008:[<ffff82d04034ab02>] __stop_this_cpu+0x12/0x3c
>>> (XEN) RFLAGS: 0000000000010046 CONTEXT: hypervisor
>>> ...
>>> (XEN) Xen code around <ffff82d04034ab02> (__stop_this_cpu+0x12/0x3c):
>>> (XEN) 48 89 e5 e8 8a 1d fd ff <0f> 0b 0f ff ff 90 0f 06 db e3 48 89 e0 48 0d ff
>>> ...
>>> (XEN) Xen call trace:
>>> (XEN) [<ffff82d04034ab02>] R __stop_this_cpu+0x12/0x3c
>>> (XEN) [<ffff82d04034ac15>] F smp_send_stop+0xdd/0xf8
>>> (XEN) [<ffff82d04034a229>] F machine_restart+0xa2/0x298
>>> (XEN) [<ffff82d04034a42a>] F arch/x86/shutdown.c#__machine_restart+0xb/0x11
>>> (XEN) [<ffff82d04022fd15>] F smp_call_function_interrupt+0xbf/0xea
>>> (XEN) [<ffff82d04034acc6>] F call_function_interrupt+0x35/0x37
>>> (XEN) [<ffff82d040331a70>] F do_IRQ+0xa3/0x6b5
>>> (XEN) [<ffff82d04039482a>] F common_interrupt+0x10a/0x120
>>> (XEN) [<ffff82d04031f649>] F __udelay+0x3a/0x51
>>> (XEN) [<ffff82d04034d5fb>] F __cpu_up+0x48f/0x734
>>> (XEN) [<ffff82d040203c2b>] F cpu_up+0x7d/0xde
>>> (XEN) [<ffff82d0404543d3>] F __start_xen+0x200b/0x2618
>>> (XEN) [<ffff82d0402000ef>] F __high_start+0x4f/0x60
>>>
>>> which recurses until hitting a stack overflow. The #DF handler, which resets
>>> its stack on each invocation, loops indefinitely.
>>>
>>> Reinstate the NULL function pointer checks for hvm_cpu_{up,down}().
>>>
>>> Fixes: 27a63cdac388 ("x86/HVM: convert remaining hvm_funcs hook invocations to alt-call")
>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Thanks.
>
>>> ---
>>> CC: Jan Beulich <JBeulich@suse.com>
>>> CC: Roger Pau Monné <roger.pau@citrix.com>
>>> CC: Wei Liu <wl@xen.org>
>>>
>>> RFC. Not tested yet on the imacted hardware. It's a Xeon PHI with another
>>> werid thing in need of debugging. First boot is fine, while second
>>> boot (loading microcode this time) has a problem with vmx.
> Sounds not unfamiliar: My meanwhile oldish Romley needs to be cold-
> booted for VMX to actually be usable (not locked) on APs.
This is something which goes wrong as a consequence of loading microcode.
>>> I wonder if we want to modify the callers to check for HVM being enabled,
>>> rather than leaving the NULL pointer checks in a position where they're liable
>>> to be reaped again.
>> What about adding a couple of comments to hvm_cpu_{up,down} to note
>> they are called unconditionally regardless of whether HVM is present
>> or not?
> I second this as the perhaps better alternative: The S3 path is
> similarly affected (and you may want to mention this in the
> description), so this would mean up to 5 conditionals (at the
> source level) instead of the just two you get away with here.
Ok. I've added:
/* Called in boot/resume paths. Must cope with no HVM support. */
and:
/* Called in shutdown paths. Must cope with no HVM support. */
~Andrew
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2022-02-07 16:01 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-04 17:34 [PATCH] x86/hvm: Fix boot on systems where HVM isn't available Andrew Cooper
2022-02-05 9:47 ` Roger Pau Monné
2022-02-07 8:29 ` Jan Beulich
2022-02-07 16:00 ` Andrew Cooper
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.