All of lore.kernel.org
 help / color / mirror / Atom feed
* Xen PV PTE ABI (or lack thereof)
@ 2016-01-20 20:10 Andrew Cooper
  2016-01-21 11:07 ` Jan Beulich
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Cooper @ 2016-01-20 20:10 UTC (permalink / raw)
  To: Xen-devel List, Jan Beulich, Tim Deegan, George Dunlap, Huaitong Han

Hi,

Pagetable entries are a shared medium between Xen and PV guests.

To the best of my knowledge, there is nothing written down about the ABI
here, and there are increasing problems as new hardware features become
available.

First of all, SMEP and SMAP.  32bit PV guests are subject to Xen's
SMEP/SMAP choices, because of running in ring 1.

SMAP in particular is problematic because older Linux guests do fall
foul of it; they don't understand what a SMAP pagefault is, and enter an
infinite loop of pagefaults.  SMEP is also problematic because it breaks
any guest wishing to use a shared address space between kernel and
user.  (I had some fun getting the test framework to function until I
twigged what was happening).

Both of these are regressions; older guests relying on existing
behaviour cease to function on newer hardware/Xen despite identical
settings.


For the PTE bits, _PAGE_GNTTAB (bit 62) is used exclusively in debug
build (so there is a guest observable difference between running on a
debug and a non-debug Xen), and the comment beside it even identifies
that it breaks BSD guests.  PTE bits 62:59 used by hardware if  CR4.PKE
is set.  Currently this means that we are not able to support Protection
Key for PV guests (although this restriction technically only applies to
debug builds of the hypervisor).

The other PTE bit used by Xen is _PAGE_GUEST_KERNEL (bit 52).  This bit
is used to notice when a 64bit PV guest attempts to override the fixup
Xen applies to its PTEs.  Xen unilaterally sets _PAGE_GLOBAL for user
pages, and clears _PAGE_GLOBAL for supervisor mappings, setting
_PAGE_USER in both cases as the PV kernel runs in ring3.  The only thing
_PAGE_GUEST_KERNEL is used for is to notice when the kernel deliberately
tries to create a _PAGE_GUEST_KERNEL|_PAGE_GLOBAL, at which point a
warning is logged and the kernel overridden.


Neither of the used PTE bits exist in the Xen public ABI.  Neither of
them serve a purpose other than a debugging aid.

I propose hiding them behind CONFIG_PV_PTE_DEBUG and declaring an ABI of
"all bits available for guest use".


The other question is what we do when it comes to %cr4 and PV guests.

The current SMAP issue is a blocker for XenServer, and I have some nasty
logic to fix up behind the guests back.  I have only just discovered the
SMEP issue, but it is still a regression (again, nothing states that a
PV guest must have a split address space;  segmentation is a perfectly
valid option in 32bit guests).  The PK issue is one which shouldn't be
an issue for us to implement in PV guests.

I am leaning towards allowing a toolstack to permit a PV guest to be
able to play with a few more CR4 bits.  We can't give a guest kernel
complete carte blanche, because of the security implications.  However,
we do already context switch CR4 for PV guests, so a few extra bits  on
a "nominated safe" domain is no extra hassle.

Thoughts?

~Andrew

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Xen PV PTE ABI (or lack thereof)
  2016-01-20 20:10 Xen PV PTE ABI (or lack thereof) Andrew Cooper
@ 2016-01-21 11:07 ` Jan Beulich
  2016-01-21 11:16   ` Andrew Cooper
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Beulich @ 2016-01-21 11:07 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: George Dunlap, Huaitong Han, Tim Deegan, Xen-devel List

>>> On 20.01.16 at 21:10, <andrew.cooper3@citrix.com> wrote:
> First of all, SMEP and SMAP.  32bit PV guests are subject to Xen's
> SMEP/SMAP choices, because of running in ring 1.
> 
> SMAP in particular is problematic because older Linux guests do fall
> foul of it; they don't understand what a SMAP pagefault is, and enter an
> infinite loop of pagefaults.  SMEP is also problematic because it breaks
> any guest wishing to use a shared address space between kernel and
> user.  (I had some fun getting the test framework to function until I
> twigged what was happening).
> 
> Both of these are regressions; older guests relying on existing
> behaviour cease to function on newer hardware/Xen despite identical
> settings.

And for both of them there simply should be a way for the guest to
state whether it's compatible (which should be the case for anything
we can't deal with completely transparently to guests).

> For the PTE bits, _PAGE_GNTTAB (bit 62) is used exclusively in debug
> build (so there is a guest observable difference between running on a
> debug and a non-debug Xen), and the comment beside it even identifies
> that it breaks BSD guests.  PTE bits 62:59 used by hardware if  CR4.PKE
> is set.  Currently this means that we are not able to support Protection
> Key for PV guests (although this restriction technically only applies to
> debug builds of the hypervisor).
> 
> The other PTE bit used by Xen is _PAGE_GUEST_KERNEL (bit 52).  This bit
> is used to notice when a 64bit PV guest attempts to override the fixup
> Xen applies to its PTEs.  Xen unilaterally sets _PAGE_GLOBAL for user
> pages, and clears _PAGE_GLOBAL for supervisor mappings, setting
> _PAGE_USER in both cases as the PV kernel runs in ring3.  The only thing
> _PAGE_GUEST_KERNEL is used for is to notice when the kernel deliberately
> tries to create a _PAGE_GUEST_KERNEL|_PAGE_GLOBAL, at which point a
> warning is logged and the kernel overridden.
> 
> 
> Neither of the used PTE bits exist in the Xen public ABI.  Neither of
> them serve a purpose other than a debugging aid.
> 
> I propose hiding them behind CONFIG_PV_PTE_DEBUG and declaring an ABI of
> "all bits available for guest use".

And a kernel using any of the conflicting bits would then become
unusable on a hypervisor with that debug option enabled? I'd
rather see us document the state things are in...

> The other question is what we do when it comes to %cr4 and PV guests.
> 
> The current SMAP issue is a blocker for XenServer, and I have some nasty
> logic to fix up behind the guests back.  I have only just discovered the
> SMEP issue, but it is still a regression (again, nothing states that a
> PV guest must have a split address space;  segmentation is a perfectly
> valid option in 32bit guests).  The PK issue is one which shouldn't be
> an issue for us to implement in PV guests.
> 
> I am leaning towards allowing a toolstack to permit a PV guest to be
> able to play with a few more CR4 bits.  We can't give a guest kernel
> complete carte blanche, because of the security implications.  However,
> we do already context switch CR4 for PV guests, so a few extra bits  on
> a "nominated safe" domain is no extra hassle.

Sounds reasonable.

Jan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Xen PV PTE ABI (or lack thereof)
  2016-01-21 11:07 ` Jan Beulich
@ 2016-01-21 11:16   ` Andrew Cooper
  2016-01-21 12:59     ` Jan Beulich
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Cooper @ 2016-01-21 11:16 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, Huaitong Han, Tim Deegan, Xen-devel List

On 21/01/16 11:07, Jan Beulich wrote:
>>>> On 20.01.16 at 21:10, <andrew.cooper3@citrix.com> wrote:
>> First of all, SMEP and SMAP.  32bit PV guests are subject to Xen's
>> SMEP/SMAP choices, because of running in ring 1.
>>
>> SMAP in particular is problematic because older Linux guests do fall
>> foul of it; they don't understand what a SMAP pagefault is, and enter an
>> infinite loop of pagefaults.  SMEP is also problematic because it breaks
>> any guest wishing to use a shared address space between kernel and
>> user.  (I had some fun getting the test framework to function until I
>> twigged what was happening).
>>
>> Both of these are regressions; older guests relying on existing
>> behaviour cease to function on newer hardware/Xen despite identical
>> settings.
> And for both of them there simply should be a way for the guest to
> state whether it's compatible (which should be the case for anything
> we can't deal with completely transparently to guests).
>
>> For the PTE bits, _PAGE_GNTTAB (bit 62) is used exclusively in debug
>> build (so there is a guest observable difference between running on a
>> debug and a non-debug Xen), and the comment beside it even identifies
>> that it breaks BSD guests.  PTE bits 62:59 used by hardware if  CR4.PKE
>> is set.  Currently this means that we are not able to support Protection
>> Key for PV guests (although this restriction technically only applies to
>> debug builds of the hypervisor).
>>
>> The other PTE bit used by Xen is _PAGE_GUEST_KERNEL (bit 52).  This bit
>> is used to notice when a 64bit PV guest attempts to override the fixup
>> Xen applies to its PTEs.  Xen unilaterally sets _PAGE_GLOBAL for user
>> pages, and clears _PAGE_GLOBAL for supervisor mappings, setting
>> _PAGE_USER in both cases as the PV kernel runs in ring3.  The only thing
>> _PAGE_GUEST_KERNEL is used for is to notice when the kernel deliberately
>> tries to create a _PAGE_GUEST_KERNEL|_PAGE_GLOBAL, at which point a
>> warning is logged and the kernel overridden.
>>
>>
>> Neither of the used PTE bits exist in the Xen public ABI.  Neither of
>> them serve a purpose other than a debugging aid.
>>
>> I propose hiding them behind CONFIG_PV_PTE_DEBUG and declaring an ABI of
>> "all bits available for guest use".
> And a kernel using any of the conflicting bits would then become
> unusable on a hypervisor with that debug option enabled? I'd
> rather see us document the state things are in...

_PAGE_GNTMAP is already states:

/*
 * Debug option: Ensure that granted mappings are not implicitly unmapped.
 * WARNING: This will need to be disabled to run OSes that use the spare PTE
 * bits themselves (e.g., *BSD).
 */

I was intending to have CONFIG_PV_PTE_DEBUG as an EXPERT option,
disabled by default even in debug builds.

There should not be an ABI difference between release and "normal" debug
builds.

~Andrew

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Xen PV PTE ABI (or lack thereof)
  2016-01-21 11:16   ` Andrew Cooper
@ 2016-01-21 12:59     ` Jan Beulich
  2016-01-21 13:17       ` Andrew Cooper
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Beulich @ 2016-01-21 12:59 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: George Dunlap, Huaitong Han, Tim Deegan, Xen-devel List

>>> On 21.01.16 at 12:16, <andrew.cooper3@citrix.com> wrote:
> On 21/01/16 11:07, Jan Beulich wrote:
>>>>> On 20.01.16 at 21:10, <andrew.cooper3@citrix.com> wrote:
>>> First of all, SMEP and SMAP.  32bit PV guests are subject to Xen's
>>> SMEP/SMAP choices, because of running in ring 1.
>>>
>>> SMAP in particular is problematic because older Linux guests do fall
>>> foul of it; they don't understand what a SMAP pagefault is, and enter an
>>> infinite loop of pagefaults.  SMEP is also problematic because it breaks
>>> any guest wishing to use a shared address space between kernel and
>>> user.  (I had some fun getting the test framework to function until I
>>> twigged what was happening).
>>>
>>> Both of these are regressions; older guests relying on existing
>>> behaviour cease to function on newer hardware/Xen despite identical
>>> settings.
>> And for both of them there simply should be a way for the guest to
>> state whether it's compatible (which should be the case for anything
>> we can't deal with completely transparently to guests).
>>
>>> For the PTE bits, _PAGE_GNTTAB (bit 62) is used exclusively in debug
>>> build (so there is a guest observable difference between running on a
>>> debug and a non-debug Xen), and the comment beside it even identifies
>>> that it breaks BSD guests.  PTE bits 62:59 used by hardware if  CR4.PKE
>>> is set.  Currently this means that we are not able to support Protection
>>> Key for PV guests (although this restriction technically only applies to
>>> debug builds of the hypervisor).
>>>
>>> The other PTE bit used by Xen is _PAGE_GUEST_KERNEL (bit 52).  This bit
>>> is used to notice when a 64bit PV guest attempts to override the fixup
>>> Xen applies to its PTEs.  Xen unilaterally sets _PAGE_GLOBAL for user
>>> pages, and clears _PAGE_GLOBAL for supervisor mappings, setting
>>> _PAGE_USER in both cases as the PV kernel runs in ring3.  The only thing
>>> _PAGE_GUEST_KERNEL is used for is to notice when the kernel deliberately
>>> tries to create a _PAGE_GUEST_KERNEL|_PAGE_GLOBAL, at which point a
>>> warning is logged and the kernel overridden.
>>>
>>>
>>> Neither of the used PTE bits exist in the Xen public ABI.  Neither of
>>> them serve a purpose other than a debugging aid.
>>>
>>> I propose hiding them behind CONFIG_PV_PTE_DEBUG and declaring an ABI of
>>> "all bits available for guest use".
>> And a kernel using any of the conflicting bits would then become
>> unusable on a hypervisor with that debug option enabled? I'd
>> rather see us document the state things are in...
> 
> _PAGE_GNTMAP is already states:
> 
> /*
>  * Debug option: Ensure that granted mappings are not implicitly unmapped.
>  * WARNING: This will need to be disabled to run OSes that use the spare PTE
>  * bits themselves (e.g., *BSD).
>  */

But that's (assuming the use of the two bits were spelled out) a
guest OS not fully playing by the spec. To me, "available" PTE bits
being shared implies that some of them may be claimed by Xen,
while others may be claimed by guests. You're right that this needs
to be written down, but I don't think we need to go as far as
forbidding Xen to use any of them. And even less so should we
preclude their use for any purpose going forward.

In the end, with (as it seems) not much effort this could even be
made dynamic: A guest could advertise which of the bits it doesn't
use, and then Xen could pick two of them for the two purposes it
currently needs them for. Should a guest leave no or just one bit
available, the debugging aid could then be disabled.

> I was intending to have CONFIG_PV_PTE_DEBUG as an EXPERT option,
> disabled by default even in debug builds.
> 
> There should not be an ABI difference between release and "normal" debug
> builds.

Well, I see your point, but as said above I'm not convinced
disabling all that code is the right solution. In fact, what you
propose is not far away from removing that code altogether.

Jan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Xen PV PTE ABI (or lack thereof)
  2016-01-21 12:59     ` Jan Beulich
@ 2016-01-21 13:17       ` Andrew Cooper
  2016-01-21 13:55         ` Jan Beulich
  2016-01-21 14:29         ` David Vrabel
  0 siblings, 2 replies; 10+ messages in thread
From: Andrew Cooper @ 2016-01-21 13:17 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, Huaitong Han, Tim Deegan, Xen-devel List

On 21/01/16 12:59, Jan Beulich wrote:
>>
>>>> For the PTE bits, _PAGE_GNTTAB (bit 62) is used exclusively in debug
>>>> build (so there is a guest observable difference between running on a
>>>> debug and a non-debug Xen), and the comment beside it even identifies
>>>> that it breaks BSD guests.  PTE bits 62:59 used by hardware if  CR4.PKE
>>>> is set.  Currently this means that we are not able to support Protection
>>>> Key for PV guests (although this restriction technically only applies to
>>>> debug builds of the hypervisor).
>>>>
>>>> The other PTE bit used by Xen is _PAGE_GUEST_KERNEL (bit 52).  This bit
>>>> is used to notice when a 64bit PV guest attempts to override the fixup
>>>> Xen applies to its PTEs.  Xen unilaterally sets _PAGE_GLOBAL for user
>>>> pages, and clears _PAGE_GLOBAL for supervisor mappings, setting
>>>> _PAGE_USER in both cases as the PV kernel runs in ring3.  The only thing
>>>> _PAGE_GUEST_KERNEL is used for is to notice when the kernel deliberately
>>>> tries to create a _PAGE_GUEST_KERNEL|_PAGE_GLOBAL, at which point a
>>>> warning is logged and the kernel overridden.
>>>>
>>>>
>>>> Neither of the used PTE bits exist in the Xen public ABI.  Neither of
>>>> them serve a purpose other than a debugging aid.
>>>>
>>>> I propose hiding them behind CONFIG_PV_PTE_DEBUG and declaring an ABI of
>>>> "all bits available for guest use".
>>> And a kernel using any of the conflicting bits would then become
>>> unusable on a hypervisor with that debug option enabled? I'd
>>> rather see us document the state things are in...
>> _PAGE_GNTMAP is already states:
>>
>> /*
>>  * Debug option: Ensure that granted mappings are not implicitly unmapped.
>>  * WARNING: This will need to be disabled to run OSes that use the spare PTE
>>  * bits themselves (e.g., *BSD).
>>  */
> But that's (assuming the use of the two bits were spelled out) a
> guest OS not fully playing by the spec. To me, "available" PTE bits
> being shared implies that some of them may be claimed by Xen,
> while others may be claimed by guests. You're right that this needs
> to be written down, but I don't think we need to go as far as
> forbidding Xen to use any of them. And even less so should we
> preclude their use for any purpose going forward.

I do not want to make an ABI which mandates the use of certain bits by Xen.

As we see with the Protection Key feature, newer hardware feature start
using bits which were previously software available, and we absolutely
don't want to be in a position where our ABI prevents us from ever
supporting a new feature.

> In the end, with (as it seems) not much effort this could even be
> made dynamic: A guest could advertise which of the bits it doesn't
> use, and then Xen could pick two of them for the two purposes it
> currently needs them for. Should a guest leave no or just one bit
> available, the debugging aid could then be disabled.

This is unnecessarily complicated.  It only helps going forward, and
leaves Xen with substantially more complicated PTE handling.

What happens if at some point later, we try to boot a PV guest which has
a different PTE layout to what Xen has chosen?  We definitely can't
update every PTE with a newly-chosen layout.

>
>> I was intending to have CONFIG_PV_PTE_DEBUG as an EXPERT option,
>> disabled by default even in debug builds.
>>
>> There should not be an ABI difference between release and "normal" debug
>> builds.
> Well, I see your point, but as said above I'm not convinced
> disabling all that code is the right solution. In fact, what you
> propose is not far away from removing that code altogether.

The two bits are only used for specialised debugging.  They should be
relegated to people doing specific debugging, and not interfere with the
overwhelming majority of cases where Xen doesn't need to use any
software available PTE bits.

~Andrew

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Xen PV PTE ABI (or lack thereof)
  2016-01-21 13:17       ` Andrew Cooper
@ 2016-01-21 13:55         ` Jan Beulich
  2016-01-21 15:10           ` Andrew Cooper
  2016-01-21 14:29         ` David Vrabel
  1 sibling, 1 reply; 10+ messages in thread
From: Jan Beulich @ 2016-01-21 13:55 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: George Dunlap, Huaitong Han, TimDeegan, Xen-devel List

>>> On 21.01.16 at 14:17, <andrew.cooper3@citrix.com> wrote:
> On 21/01/16 12:59, Jan Beulich wrote:
>>>
>>>>> For the PTE bits, _PAGE_GNTTAB (bit 62) is used exclusively in debug
>>>>> build (so there is a guest observable difference between running on a
>>>>> debug and a non-debug Xen), and the comment beside it even identifies
>>>>> that it breaks BSD guests.  PTE bits 62:59 used by hardware if  CR4.PKE
>>>>> is set.  Currently this means that we are not able to support Protection
>>>>> Key for PV guests (although this restriction technically only applies to
>>>>> debug builds of the hypervisor).
>>>>>
>>>>> The other PTE bit used by Xen is _PAGE_GUEST_KERNEL (bit 52).  This bit
>>>>> is used to notice when a 64bit PV guest attempts to override the fixup
>>>>> Xen applies to its PTEs.  Xen unilaterally sets _PAGE_GLOBAL for user
>>>>> pages, and clears _PAGE_GLOBAL for supervisor mappings, setting
>>>>> _PAGE_USER in both cases as the PV kernel runs in ring3.  The only thing
>>>>> _PAGE_GUEST_KERNEL is used for is to notice when the kernel deliberately
>>>>> tries to create a _PAGE_GUEST_KERNEL|_PAGE_GLOBAL, at which point a
>>>>> warning is logged and the kernel overridden.
>>>>>
>>>>>
>>>>> Neither of the used PTE bits exist in the Xen public ABI.  Neither of
>>>>> them serve a purpose other than a debugging aid.
>>>>>
>>>>> I propose hiding them behind CONFIG_PV_PTE_DEBUG and declaring an ABI of
>>>>> "all bits available for guest use".
>>>> And a kernel using any of the conflicting bits would then become
>>>> unusable on a hypervisor with that debug option enabled? I'd
>>>> rather see us document the state things are in...
>>> _PAGE_GNTMAP is already states:
>>>
>>> /*
>>>  * Debug option: Ensure that granted mappings are not implicitly unmapped.
>>>  * WARNING: This will need to be disabled to run OSes that use the spare PTE
>>>  * bits themselves (e.g., *BSD).
>>>  */
>> But that's (assuming the use of the two bits were spelled out) a
>> guest OS not fully playing by the spec. To me, "available" PTE bits
>> being shared implies that some of them may be claimed by Xen,
>> while others may be claimed by guests. You're right that this needs
>> to be written down, but I don't think we need to go as far as
>> forbidding Xen to use any of them. And even less so should we
>> preclude their use for any purpose going forward.
> 
> I do not want to make an ABI which mandates the use of certain bits by Xen.

Agreed, but that doesn't match what I said: We also shouldn't
make an ABI precluding use of some of the bits by Xen.

>> In the end, with (as it seems) not much effort this could even be
>> made dynamic: A guest could advertise which of the bits it doesn't
>> use, and then Xen could pick two of them for the two purposes it
>> currently needs them for. Should a guest leave no or just one bit
>> available, the debugging aid could then be disabled.
> 
> This is unnecessarily complicated.  It only helps going forward, and
> leaves Xen with substantially more complicated PTE handling.

Substantially? Instead of using a constant, we'd use a per-domain
variable. Not much of a complication afaict.

> What happens if at some point later, we try to boot a PV guest which has
> a different PTE layout to what Xen has chosen?  We definitely can't
> update every PTE with a newly-chosen layout.

Just to re-iterate: A per domain selection of bit mask(s).

>>> I was intending to have CONFIG_PV_PTE_DEBUG as an EXPERT option,
>>> disabled by default even in debug builds.
>>>
>>> There should not be an ABI difference between release and "normal" debug
>>> builds.
>> Well, I see your point, but as said above I'm not convinced
>> disabling all that code is the right solution. In fact, what you
>> propose is not far away from removing that code altogether.
> 
> The two bits are only used for specialised debugging.  They should be
> relegated to people doing specific debugging, and not interfere with the
> overwhelming majority of cases where Xen doesn't need to use any
> software available PTE bits.

Your repeated claim that _PAGE_GUEST_KERNEL is purely debugging only
makes we wonder how you would mean to adjust adjust_guest_l1e() with
that flag gone (most notably the last of its if()-s).

Jan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Xen PV PTE ABI (or lack thereof)
  2016-01-21 13:17       ` Andrew Cooper
  2016-01-21 13:55         ` Jan Beulich
@ 2016-01-21 14:29         ` David Vrabel
  2016-01-21 14:37           ` Andrew Cooper
  1 sibling, 1 reply; 10+ messages in thread
From: David Vrabel @ 2016-01-21 14:29 UTC (permalink / raw)
  To: Andrew Cooper, Jan Beulich
  Cc: George Dunlap, Huaitong Han, Tim Deegan, Xen-devel List

On 21/01/16 13:17, Andrew Cooper wrote:
> 
> As we see with the Protection Key feature, newer hardware feature start
> using bits which were previously software available, and we absolutely
> don't want to be in a position where our ABI prevents us from ever
> supporting a new feature.

I don't see a problem in not supporting new features in /PV/ guests.

I think we should document the PV ABI as-is. i.e., that these two PTE
bits are not available for PV guest use.

David

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Xen PV PTE ABI (or lack thereof)
  2016-01-21 14:29         ` David Vrabel
@ 2016-01-21 14:37           ` Andrew Cooper
  2016-01-21 14:53             ` David Vrabel
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Cooper @ 2016-01-21 14:37 UTC (permalink / raw)
  To: David Vrabel, Jan Beulich
  Cc: George Dunlap, Huaitong Han, Tim Deegan, Xen-devel List

On 21/01/16 14:29, David Vrabel wrote:
> On 21/01/16 13:17, Andrew Cooper wrote:
>> As we see with the Protection Key feature, newer hardware feature start
>> using bits which were previously software available, and we absolutely
>> don't want to be in a position where our ABI prevents us from ever
>> supporting a new feature.
> I don't see a problem in not supporting new features in /PV/ guests.

I know you want to kill PV guests.  I am not going to deliberately
cripple PV guests in an attempt to achieve that goal.

>
> I think we should document the PV ABI as-is. i.e., that these two PTE
> bits are not available for PV guest use.

This is already known to break some guests.  It shouldn't stay as it
currently is.

~Andrew

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Xen PV PTE ABI (or lack thereof)
  2016-01-21 14:37           ` Andrew Cooper
@ 2016-01-21 14:53             ` David Vrabel
  0 siblings, 0 replies; 10+ messages in thread
From: David Vrabel @ 2016-01-21 14:53 UTC (permalink / raw)
  To: Andrew Cooper, David Vrabel, Jan Beulich
  Cc: George Dunlap, Huaitong Han, Tim Deegan, Xen-devel List

On 21/01/16 14:37, Andrew Cooper wrote:

>> I think we should document the PV ABI as-is. i.e., that these two PTE
>> bits are not available for PV guest use.
> 
> This is already known to break some guests.  It shouldn't stay as it
> currently is.

It's unfortunate that the current ABI is incompatible with some guests
in some situations, but it can't be fixed without adding a mechanism for
the guest to query or negotiate PTE bit availability since existing
guests have to be able to run on older hypervisors.

David

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Xen PV PTE ABI (or lack thereof)
  2016-01-21 13:55         ` Jan Beulich
@ 2016-01-21 15:10           ` Andrew Cooper
  0 siblings, 0 replies; 10+ messages in thread
From: Andrew Cooper @ 2016-01-21 15:10 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, Huaitong Han, TimDeegan, Xen-devel List

On 21/01/16 13:55, Jan Beulich wrote:

>>>> I was intending to have CONFIG_PV_PTE_DEBUG as an EXPERT option,
>>>> disabled by default even in debug builds.
>>>>
>>>> There should not be an ABI difference between release and "normal" debug
>>>> builds.
>>> Well, I see your point, but as said above I'm not convinced
>>> disabling all that code is the right solution. In fact, what you
>>> propose is not far away from removing that code altogether.
>> The two bits are only used for specialised debugging.  They should be
>> relegated to people doing specific debugging, and not interfere with the
>> overwhelming majority of cases where Xen doesn't need to use any
>> software available PTE bits.
> Your repeated claim that _PAGE_GUEST_KERNEL is purely debugging only
> makes we wonder how you would mean to adjust adjust_guest_l1e() with
> that flag gone (most notably the last of its if()-s).

diff --git a/xen/arch/x86/mm.c
b/xen/arch/x86/mm.c                                                                                                                                      

index b81d1fd..46ef5ce
100644                                                                                                                                                           

---
a/xen/arch/x86/mm.c                                                                                                                                                                 

+++
b/xen/arch/x86/mm.c                                                                                                                                                                 

@@ -1060,10 +1060,11 @@
get_page_from_l4e(                                                                                                                                              

                  == (_PAGE_GUEST_KERNEL|_PAGE_GLOBAL)
)                     
\                                                                                                         

                 MEM_LOG("Global bit is set to kernel page
%lx",             
\                                                                                                         

                        
l1e_get_pfn((pl1e)));                               
\                                                                                                         

-            if ( !(l1e_get_flags((pl1e)) & _PAGE_USER)
)                    
\                                                                                                         

-                l1e_add_flags((pl1e),
(_PAGE_GUEST_KERNEL|_PAGE_USER));     
\                                                                                                         

-            if ( !(l1e_get_flags((pl1e)) & _PAGE_GUEST_KERNEL)
)            
\                                                                                                         

-                l1e_add_flags((pl1e),
(_PAGE_GLOBAL|_PAGE_USER));           
\                                                                                                         

+            if ( l1e_get_flags((pl1e)) & _PAGE_USER
)                       
\                                                                                                         

+                l1e_add_flags((pl1e),
_PAGE_GLOBAL);                        
\                                                                                                         

+           
else                                                            
\                                                                                                         

+                l1e_remove_flags((pl1e),
_PAGE_GLOBAL);                     
\                                                                                                         

+            l1e_add_flags((pl1e),
_PAGE_USER);                              
\                                                                                                         

        
}                                                                   
\                                                                                                         

     } while ( 0 )


_PAGE_GUEST_KERNEL isn't in the ABI, which means that the 2nd if() is
the only piece of code which validly sets it.

Read-modify-write operations already don't function correctly as
_PAGE_GUEST_KERNEL is a hidden saturating bit from the guests point of view.

~Andrew

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2016-01-21 15:10 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-20 20:10 Xen PV PTE ABI (or lack thereof) Andrew Cooper
2016-01-21 11:07 ` Jan Beulich
2016-01-21 11:16   ` Andrew Cooper
2016-01-21 12:59     ` Jan Beulich
2016-01-21 13:17       ` Andrew Cooper
2016-01-21 13:55         ` Jan Beulich
2016-01-21 15:10           ` Andrew Cooper
2016-01-21 14:29         ` David Vrabel
2016-01-21 14:37           ` Andrew Cooper
2016-01-21 14:53             ` David Vrabel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.