linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] x86/stackframe/32: repair 32-bit Xen PV
@ 2019-10-07 10:41 Jan Beulich
  2019-10-24  7:40 ` Ping: " Jan Beulich
  2019-10-24 16:11 ` Andy Lutomirski
  0 siblings, 2 replies; 7+ messages in thread
From: Jan Beulich @ 2019-10-07 10:41 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: lkml

Once again RPL checks have been introduced which don't account for a
32-bit kernel living in ring 1 when running in a PV Xen domain. The
case in FIXUP_FRAME has been preventing boot; adjust BUG_IF_WRONG_CR3
as well just in case.

Fixes: 3c88c692c287 ("x86/stackframe/32: Provide consistent pt_regs")
Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -48,6 +48,17 @@
 
 #include "calling.h"
 
+#ifndef CONFIG_XEN_PV
+# define USER_SEGMENT_RPL_MASK SEGMENT_RPL_MASK
+#else
+/*
+ * When running paravirtualized on Xen the kernel runs in ring 1, and hence
+ * simple mask based tests (i.e. ones not comparing against USER_RPL) have to
+ * ignore bit 0. See also the C-level get_kernel_rpl().
+ */
+# define USER_SEGMENT_RPL_MASK (SEGMENT_RPL_MASK & ~1)
+#endif
+
 	.section .entry.text, "ax"
 
 /*
@@ -172,7 +183,7 @@
 	ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
 	.if \no_user_check == 0
 	/* coming from usermode? */
-	testl	$SEGMENT_RPL_MASK, PT_CS(%esp)
+	testl	$USER_SEGMENT_RPL_MASK, PT_CS(%esp)
 	jz	.Lend_\@
 	.endif
 	/* On user-cr3? */
@@ -217,7 +228,7 @@
 	testl	$X86_EFLAGS_VM, 4*4(%esp)
 	jnz	.Lfrom_usermode_no_fixup_\@
 #endif
-	testl	$SEGMENT_RPL_MASK, 3*4(%esp)
+	testl	$USER_SEGMENT_RPL_MASK, 3*4(%esp)
 	jnz	.Lfrom_usermode_no_fixup_\@
 
 	orl	$CS_FROM_KERNEL, 3*4(%esp)

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Ping: [PATCH] x86/stackframe/32: repair 32-bit Xen PV
  2019-10-07 10:41 [PATCH] x86/stackframe/32: repair 32-bit Xen PV Jan Beulich
@ 2019-10-24  7:40 ` Jan Beulich
  2019-10-24 16:11 ` Andy Lutomirski
  1 sibling, 0 replies; 7+ messages in thread
From: Jan Beulich @ 2019-10-24  7:40 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: lkml, xen-devel

On 07.10.2019 12:41, Jan Beulich wrote:
> Once again RPL checks have been introduced which don't account for a
> 32-bit kernel living in ring 1 when running in a PV Xen domain. The
> case in FIXUP_FRAME has been preventing boot; adjust BUG_IF_WRONG_CR3
> as well just in case.
> 
> Fixes: 3c88c692c287 ("x86/stackframe/32: Provide consistent pt_regs")
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Ping?

I'd like to further note that there appears to a likely related
2nd problem - I'm seeing seemingly random attempts to enter VM86
mode when running PV under Xen. I suspect a never written eflags
value to get inspected. While the issue here kills the kernel
reliably during boot, that other issue sometimes allows the
system to at least come up in a partly functional way (depending
on which user processes get killed because of there not being
any VM86 mode available when running PV under [64-bit] Xen).

Jan

> --- a/arch/x86/entry/entry_32.S
> +++ b/arch/x86/entry/entry_32.S
> @@ -48,6 +48,17 @@
>  
>  #include "calling.h"
>  
> +#ifndef CONFIG_XEN_PV
> +# define USER_SEGMENT_RPL_MASK SEGMENT_RPL_MASK
> +#else
> +/*
> + * When running paravirtualized on Xen the kernel runs in ring 1, and hence
> + * simple mask based tests (i.e. ones not comparing against USER_RPL) have to
> + * ignore bit 0. See also the C-level get_kernel_rpl().
> + */
> +# define USER_SEGMENT_RPL_MASK (SEGMENT_RPL_MASK & ~1)
> +#endif
> +
>  	.section .entry.text, "ax"
>  
>  /*
> @@ -172,7 +183,7 @@
>  	ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
>  	.if \no_user_check == 0
>  	/* coming from usermode? */
> -	testl	$SEGMENT_RPL_MASK, PT_CS(%esp)
> +	testl	$USER_SEGMENT_RPL_MASK, PT_CS(%esp)
>  	jz	.Lend_\@
>  	.endif
>  	/* On user-cr3? */
> @@ -217,7 +228,7 @@
>  	testl	$X86_EFLAGS_VM, 4*4(%esp)
>  	jnz	.Lfrom_usermode_no_fixup_\@
>  #endif
> -	testl	$SEGMENT_RPL_MASK, 3*4(%esp)
> +	testl	$USER_SEGMENT_RPL_MASK, 3*4(%esp)
>  	jnz	.Lfrom_usermode_no_fixup_\@
>  
>  	orl	$CS_FROM_KERNEL, 3*4(%esp)
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] x86/stackframe/32: repair 32-bit Xen PV
  2019-10-07 10:41 [PATCH] x86/stackframe/32: repair 32-bit Xen PV Jan Beulich
  2019-10-24  7:40 ` Ping: " Jan Beulich
@ 2019-10-24 16:11 ` Andy Lutomirski
  2019-10-24 16:31   ` [Xen-devel] " Andrew Cooper
  2019-10-25  6:06   ` Jan Beulich
  1 sibling, 2 replies; 7+ messages in thread
From: Andy Lutomirski @ 2019-10-24 16:11 UTC (permalink / raw)
  To: Jan Beulich, xen-devel, X86 ML, Peter Zijlstra; +Cc: Andy Lutomirski, lkml

On Mon, Oct 7, 2019 at 3:41 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> Once again RPL checks have been introduced which don't account for a
> 32-bit kernel living in ring 1 when running in a PV Xen domain. The
> case in FIXUP_FRAME has been preventing boot; adjust BUG_IF_WRONG_CR3
> as well just in case.

I'm okay with the generated code, but IMO the macro is too indirect
for something that's trivial.

>
> Fixes: 3c88c692c287 ("x86/stackframe/32: Provide consistent pt_regs")
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> --- a/arch/x86/entry/entry_32.S
> +++ b/arch/x86/entry/entry_32.S
> @@ -48,6 +48,17 @@
>
>  #include "calling.h"
>
> +#ifndef CONFIG_XEN_PV
> +# define USER_SEGMENT_RPL_MASK SEGMENT_RPL_MASK
> +#else
> +/*
> + * When running paravirtualized on Xen the kernel runs in ring 1, and hence
> + * simple mask based tests (i.e. ones not comparing against USER_RPL) have to
> + * ignore bit 0. See also the C-level get_kernel_rpl().
> + */

How about:

/*
 * When running on Xen PV, the actual %cs register in the kernel is 1, not 0.
 * If we need to distinguish between a %cs from kernel mode and a %cs from
 * user mode, we can do test $2 instead of test $3.
 */
#define USER_SEGMENT_RPL_MASK 2

but...

> +# define USER_SEGMENT_RPL_MASK (SEGMENT_RPL_MASK & ~1)
> +#endif
> +
>         .section .entry.text, "ax"
>
>  /*
> @@ -172,7 +183,7 @@
>         ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
>         .if \no_user_check == 0
>         /* coming from usermode? */
> -       testl   $SEGMENT_RPL_MASK, PT_CS(%esp)
> +       testl   $USER_SEGMENT_RPL_MASK, PT_CS(%esp)

Shouldn't PT_CS(%esp) be 0 if we came from the kernel?  I'm guessing
the actual bug is in whatever code put 1 in here in the first place.

In other words, I'm having trouble understanding why there is any
context in which some value would be 3 for user mode and 1 for kernel
mode.  Obviously if we're manually IRETing to kernel mode, we need to
set CS to 1, but if we're filling in our own PT_CS, we should just
write 0.

The supposedly offending commit (""x86/stackframe/32: Provide
consistent pt_regs") looks correct to me, so I suspect that the
problem is elsewhere.  Or is it intentional that Xen PV's asm
(arch/x86/xen/whatever) sticks 1 into the CS field on the stack?

Also, why are we supporting 32-bit Linux PV guests at all?  Can we
just delete this code instead?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Xen-devel] [PATCH] x86/stackframe/32: repair 32-bit Xen PV
  2019-10-24 16:11 ` Andy Lutomirski
@ 2019-10-24 16:31   ` Andrew Cooper
  2019-10-24 16:33     ` Andy Lutomirski
  2019-10-25  6:06   ` Jan Beulich
  1 sibling, 1 reply; 7+ messages in thread
From: Andrew Cooper @ 2019-10-24 16:31 UTC (permalink / raw)
  To: Andy Lutomirski, Jan Beulich, xen-devel, X86 ML, Peter Zijlstra; +Cc: lkml

On 24/10/2019 17:11, Andy Lutomirski wrote:
>> +# define USER_SEGMENT_RPL_MASK (SEGMENT_RPL_MASK & ~1)
>> +#endif
>> +
>>         .section .entry.text, "ax"
>>
>>  /*
>> @@ -172,7 +183,7 @@
>>         ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
>>         .if \no_user_check == 0
>>         /* coming from usermode? */
>> -       testl   $SEGMENT_RPL_MASK, PT_CS(%esp)
>> +       testl   $USER_SEGMENT_RPL_MASK, PT_CS(%esp)
> Shouldn't PT_CS(%esp) be 0 if we came from the kernel?  I'm guessing
> the actual bug is in whatever code put 1 in here in the first place.

Ring1 kernels (32bit) consistently see RPL1 everywhere under Xen.

Back in the days of a 32bit Xen, int $0x80 really was wired directly
from ring 3 to 1, and didn't bounce through Xen.  This isn't possible in
long mode, because all IDT gates are required to be 64bit code segments.

Ring3 kernels (64bit) consistently see RPL0 everywhere under Xen,
because presumably this was less invasive when designing the ABI.

~Andrew

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Xen-devel] [PATCH] x86/stackframe/32: repair 32-bit Xen PV
  2019-10-24 16:31   ` [Xen-devel] " Andrew Cooper
@ 2019-10-24 16:33     ` Andy Lutomirski
  0 siblings, 0 replies; 7+ messages in thread
From: Andy Lutomirski @ 2019-10-24 16:33 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Andy Lutomirski, Jan Beulich, xen-devel, X86 ML, Peter Zijlstra, lkml

On Thu, Oct 24, 2019 at 9:32 AM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>
> On 24/10/2019 17:11, Andy Lutomirski wrote:
> >> +# define USER_SEGMENT_RPL_MASK (SEGMENT_RPL_MASK & ~1)
> >> +#endif
> >> +
> >>         .section .entry.text, "ax"
> >>
> >>  /*
> >> @@ -172,7 +183,7 @@
> >>         ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
> >>         .if \no_user_check == 0
> >>         /* coming from usermode? */
> >> -       testl   $SEGMENT_RPL_MASK, PT_CS(%esp)
> >> +       testl   $USER_SEGMENT_RPL_MASK, PT_CS(%esp)
> > Shouldn't PT_CS(%esp) be 0 if we came from the kernel?  I'm guessing
> > the actual bug is in whatever code put 1 in here in the first place.
>
> Ring1 kernels (32bit) consistently see RPL1 everywhere under Xen.
>
> Back in the days of a 32bit Xen, int $0x80 really was wired directly
> from ring 3 to 1, and didn't bounce through Xen.  This isn't possible in
> long mode, because all IDT gates are required to be 64bit code segments.
>
> Ring3 kernels (64bit) consistently see RPL0 everywhere under Xen,
> because presumably this was less invasive when designing the ABI.
>

OK, gotcha.

So I'm fine with this patch if you improve the comment and definition.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] x86/stackframe/32: repair 32-bit Xen PV
  2019-10-24 16:11 ` Andy Lutomirski
  2019-10-24 16:31   ` [Xen-devel] " Andrew Cooper
@ 2019-10-25  6:06   ` Jan Beulich
  2019-10-25  6:09     ` Jürgen Groß
  1 sibling, 1 reply; 7+ messages in thread
From: Jan Beulich @ 2019-10-25  6:06 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: X86 ML, Peter Zijlstra, lkml, xen-devel, Juergen Gross

On 24.10.2019 18:11, Andy Lutomirski wrote:
> On Mon, Oct 7, 2019 at 3:41 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> Once again RPL checks have been introduced which don't account for a
>> 32-bit kernel living in ring 1 when running in a PV Xen domain. The
>> case in FIXUP_FRAME has been preventing boot; adjust BUG_IF_WRONG_CR3
>> as well just in case.
> 
> I'm okay with the generated code, but IMO the macro is too indirect
> for something that's trivial.
> 
>>
>> Fixes: 3c88c692c287 ("x86/stackframe/32: Provide consistent pt_regs")
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>
>> --- a/arch/x86/entry/entry_32.S
>> +++ b/arch/x86/entry/entry_32.S
>> @@ -48,6 +48,17 @@
>>
>>   #include "calling.h"
>>
>> +#ifndef CONFIG_XEN_PV
>> +# define USER_SEGMENT_RPL_MASK SEGMENT_RPL_MASK
>> +#else
>> +/*
>> + * When running paravirtualized on Xen the kernel runs in ring 1, and hence
>> + * simple mask based tests (i.e. ones not comparing against USER_RPL) have to
>> + * ignore bit 0. See also the C-level get_kernel_rpl().
>> + */
> 
> How about:
> 
> /*
>   * When running on Xen PV, the actual %cs register in the kernel is 1, not 0.
>   * If we need to distinguish between a %cs from kernel mode and a %cs from
>   * user mode, we can do test $2 instead of test $3.
>   */
> #define USER_SEGMENT_RPL_MASK 2

I.e. you're fine using just the single bit in all configurations?

>> +# define USER_SEGMENT_RPL_MASK (SEGMENT_RPL_MASK & ~1)
>> +#endif
>> +
>>          .section .entry.text, "ax"
>>
>>   /*
>> @@ -172,7 +183,7 @@
>>          ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
>>          .if \no_user_check == 0
>>          /* coming from usermode? */
>> -       testl   $SEGMENT_RPL_MASK, PT_CS(%esp)
>> +       testl   $USER_SEGMENT_RPL_MASK, PT_CS(%esp)
> 
> Shouldn't PT_CS(%esp) be 0 if we came from the kernel?  I'm guessing
> the actual bug is in whatever code put 1 in here in the first place.
> 
> In other words, I'm having trouble understanding why there is any
> context in which some value would be 3 for user mode and 1 for kernel
> mode.  Obviously if we're manually IRETing to kernel mode, we need to
> set CS to 1, but if we're filling in our own PT_CS, we should just
> write 0.
> 
> The supposedly offending commit (""x86/stackframe/32: Provide
> consistent pt_regs") looks correct to me, so I suspect that the
> problem is elsewhere.  Or is it intentional that Xen PV's asm
> (arch/x86/xen/whatever) sticks 1 into the CS field on the stack?

Manually created / updated frames _could_ in principle modify the
RPL, but ones coming from hardware (old 32-bit hypervisors) or Xen
(64-bit hypervisors) will have an RPL of 1, as already said by
Andrew. We could in principle also add a VM assist for the
hypervisor to store an RPL of 0, but I'd expect this to require
further kernel changes, and together with the old behavior still
being required to support I'm unconvinced this would be worth it.

> Also, why are we supporting 32-bit Linux PV guests at all?  Can we
> just delete this code instead?

This was already suggested by Jürgen (now also CC-ed), but in reply
it was pointed out that the process would be to first deprecate the
code, and remove it only a couple of releases later if no-one comes
up with a reason to retain it.

Jan

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] x86/stackframe/32: repair 32-bit Xen PV
  2019-10-25  6:06   ` Jan Beulich
@ 2019-10-25  6:09     ` Jürgen Groß
  0 siblings, 0 replies; 7+ messages in thread
From: Jürgen Groß @ 2019-10-25  6:09 UTC (permalink / raw)
  To: Jan Beulich, Andy Lutomirski; +Cc: X86 ML, Peter Zijlstra, lkml, xen-devel

On 25.10.19 08:06, Jan Beulich wrote:
> On 24.10.2019 18:11, Andy Lutomirski wrote:
>> On Mon, Oct 7, 2019 at 3:41 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>
>>> Once again RPL checks have been introduced which don't account for a
>>> 32-bit kernel living in ring 1 when running in a PV Xen domain. The
>>> case in FIXUP_FRAME has been preventing boot; adjust BUG_IF_WRONG_CR3
>>> as well just in case.
>>
>> I'm okay with the generated code, but IMO the macro is too indirect
>> for something that's trivial.
>>
>>>
>>> Fixes: 3c88c692c287 ("x86/stackframe/32: Provide consistent pt_regs")
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>
>>> --- a/arch/x86/entry/entry_32.S
>>> +++ b/arch/x86/entry/entry_32.S
>>> @@ -48,6 +48,17 @@
>>>
>>>    #include "calling.h"
>>>
>>> +#ifndef CONFIG_XEN_PV
>>> +# define USER_SEGMENT_RPL_MASK SEGMENT_RPL_MASK
>>> +#else
>>> +/*
>>> + * When running paravirtualized on Xen the kernel runs in ring 1, and hence
>>> + * simple mask based tests (i.e. ones not comparing against USER_RPL) have to
>>> + * ignore bit 0. See also the C-level get_kernel_rpl().
>>> + */
>>
>> How about:
>>
>> /*
>>    * When running on Xen PV, the actual %cs register in the kernel is 1, not 0.
>>    * If we need to distinguish between a %cs from kernel mode and a %cs from
>>    * user mode, we can do test $2 instead of test $3.
>>    */
>> #define USER_SEGMENT_RPL_MASK 2
> 
> I.e. you're fine using just the single bit in all configurations?
> 
>>> +# define USER_SEGMENT_RPL_MASK (SEGMENT_RPL_MASK & ~1)
>>> +#endif
>>> +
>>>           .section .entry.text, "ax"
>>>
>>>    /*
>>> @@ -172,7 +183,7 @@
>>>           ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
>>>           .if \no_user_check == 0
>>>           /* coming from usermode? */
>>> -       testl   $SEGMENT_RPL_MASK, PT_CS(%esp)
>>> +       testl   $USER_SEGMENT_RPL_MASK, PT_CS(%esp)
>>
>> Shouldn't PT_CS(%esp) be 0 if we came from the kernel?  I'm guessing
>> the actual bug is in whatever code put 1 in here in the first place.
>>
>> In other words, I'm having trouble understanding why there is any
>> context in which some value would be 3 for user mode and 1 for kernel
>> mode.  Obviously if we're manually IRETing to kernel mode, we need to
>> set CS to 1, but if we're filling in our own PT_CS, we should just
>> write 0.
>>
>> The supposedly offending commit (""x86/stackframe/32: Provide
>> consistent pt_regs") looks correct to me, so I suspect that the
>> problem is elsewhere.  Or is it intentional that Xen PV's asm
>> (arch/x86/xen/whatever) sticks 1 into the CS field on the stack?
> 
> Manually created / updated frames _could_ in principle modify the
> RPL, but ones coming from hardware (old 32-bit hypervisors) or Xen
> (64-bit hypervisors) will have an RPL of 1, as already said by
> Andrew. We could in principle also add a VM assist for the
> hypervisor to store an RPL of 0, but I'd expect this to require
> further kernel changes, and together with the old behavior still
> being required to support I'm unconvinced this would be worth it.
> 
>> Also, why are we supporting 32-bit Linux PV guests at all?  Can we
>> just delete this code instead?
> 
> This was already suggested by Jürgen (now also CC-ed), but in reply
> it was pointed out that the process would be to first deprecate the
> code, and remove it only a couple of releases later if no-one comes
> up with a reason to retain it.

Thanks for the reminder.

I'll send a patch with the deprecation warning for 32-bit PV.


Juergen

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-10-25  6:09 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-07 10:41 [PATCH] x86/stackframe/32: repair 32-bit Xen PV Jan Beulich
2019-10-24  7:40 ` Ping: " Jan Beulich
2019-10-24 16:11 ` Andy Lutomirski
2019-10-24 16:31   ` [Xen-devel] " Andrew Cooper
2019-10-24 16:33     ` Andy Lutomirski
2019-10-25  6:06   ` Jan Beulich
2019-10-25  6:09     ` Jürgen Groß

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).