* [PATCH] include/public: add new elf note for support of huge physical addresses
From: Juergen Gross @ 2017-08-14 10:21 UTC
  To: xen-devel
  Cc: Juergen Gross, sstabellini, wei.liu2, George.Dunlap,
	andrew.cooper3, ian.jackson, tim, jbeulich

Current pv guests will only see physical addresses up to 46 bits wide.
In order to be able to run on a host supporting 5 level paging and to
make use of any possible memory page there, physical addresses with up
to 52 bits have to be supported.

As Xen needs to know whether a pv guest can handle such large addresses
the kernel of the guest has to advertise this capability.

Add a new ELF note for the maximum physical address the kernel can
make use of.

Please note that it is not required for a pv guest to support 5 level
paging in order to use high physical addresses.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
As I'd like to add support for large physical addresses in pv guests
sooner rather than later to the Linux kernel, I'm suggesting this
public interface change well before any 5-level paging support is added
to Xen.
---
 xen/include/public/elfnote.h | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/xen/include/public/elfnote.h b/xen/include/public/elfnote.h
index 936aa65822..8d76437f19 100644
--- a/xen/include/public/elfnote.h
+++ b/xen/include/public/elfnote.h
@@ -212,9 +212,18 @@
 #define XEN_ELFNOTE_PHYS32_ENTRY 18
 
 /*
+ * Maximum physical address size the kernel can handle.
+ *
+ * All memory of the PV guest must be allocated below this boundary,
+ * as the guest kernel can't handle page table entries with MFNs referring
+ * to memory above this value.
+ */
+#define XEN_ELFNOTE_MAXPHYS_SIZE 19
+
+/*
  * The number of the highest elfnote defined.
  */
-#define XEN_ELFNOTE_MAX XEN_ELFNOTE_PHYS32_ENTRY
+#define XEN_ELFNOTE_MAX XEN_ELFNOTE_MAXPHYS_SIZE
 
 /*
  * System information exported through crash notes.
-- 
2.12.3
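
As a hedged illustration of how a guest kernel binary could carry the
proposed note (this is not the actual Linux implementation, which emits
its Xen ELF notes via assembler macros; the ".note.Xen" section name and
the value 52 below are assumptions, and the layout simply follows the
generic ELF note format):

#include <stdint.h>

#define XEN_ELFNOTE_MAXPHYS_SIZE 19   /* value proposed in the patch above */

/* Generic ELF note layout: namesz/descsz/type, 4-byte padded name, desc. */
struct xen_elfnote_maxphys {
    uint32_t namesz, descsz, type;
    char     name[4];                 /* "Xen" plus NUL terminator */
    uint32_t desc;                    /* supported physical address bits */
};

static const struct xen_elfnote_maxphys maxphys_note
    __attribute__((used, section(".note.Xen"), aligned(4))) = {
    .namesz = 4,
    .descsz = 4,
    .type   = XEN_ELFNOTE_MAXPHYS_SIZE,
    .name   = "Xen",
    .desc   = 52,                     /* example: kernel handles 52-bit MFNs */
};

Building this into a binary and inspecting it with "readelf -n" should show
the note alongside the other Xen notes.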




* Re: [PATCH] include/public: add new elf note for support of huge physical addresses
From: Jan Beulich @ 2017-08-14 10:29 UTC
  To: Juergen Gross
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, xen-devel

>>> On 14.08.17 at 12:21, <jgross@suse.com> wrote:
> Current pv guests will only see physical addresses up to 46 bits wide.
> In order to be able to run on a host supporting 5 level paging and to
> make use of any possible memory page there, physical addresses with up
> to 52 bits have to be supported.

Is this a Xen shortcoming or a Linux one (I assume the latter)?

> --- a/xen/include/public/elfnote.h
> +++ b/xen/include/public/elfnote.h
> @@ -212,9 +212,18 @@
>  #define XEN_ELFNOTE_PHYS32_ENTRY 18
>  
>  /*
> + * Maximum physical address size the kernel can handle.
> + *
> + * All memory of the PV guest must be allocated below this boundary,
> + * as the guest kernel can't handle page table entries with MFNs referring
> + * to memory above this value.
> + */
> +#define XEN_ELFNOTE_MAXPHYS_SIZE 19

Without use in the hypervisor or tools I don't see what good
introducing this note will do.

Jan




* Re: [PATCH] include/public: add new elf note for support of huge physical addresses
From: Juergen Gross @ 2017-08-14 10:35 UTC
  To: Jan Beulich
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, xen-devel

On 14/08/17 12:29, Jan Beulich wrote:
>>>> On 14.08.17 at 12:21, <jgross@suse.com> wrote:
>> Current pv guests will only see physical addresses up to 46 bits wide.
>> In order to be able to run on a host supporting 5 level paging and to
>> make use of any possible memory page there, physical addresses with up
>> to 52 bits have to be supported.
> 
> Is this a Xen shortcoming or a Linux one (I assume the latter)?

It is a shortcoming of the Xen pv interface.

>> --- a/xen/include/public/elfnote.h
>> +++ b/xen/include/public/elfnote.h
>> @@ -212,9 +212,18 @@
>>  #define XEN_ELFNOTE_PHYS32_ENTRY 18
>>  
>>  /*
>> + * Maximum physical address size the kernel can handle.
>> + *
>> + * All memory of the PV guest must be allocated below this boundary,
>> + * as the guest kernel can't handle page table entries with MFNs referring
>> + * to memory above this value.
>> + */
>> +#define XEN_ELFNOTE_MAXPHYS_SIZE 19
> 
> Without use in the hypervisor or tools I don't see what good
> introducing this note will do.

The Linux kernel could make use of it from e.g. kernel 4.14 on. So in
case Xen supports 5-level paging hosts, let's say in Xen 4.12, it could
run Linux pv guests with kernel 4.14 making use of high memory addresses.

In case we don't define the note (or do it rather late) pv guests would
have to be restricted to the low 64TB of host memory.


Juergen




* Re: [PATCH] include/public: add new elf note for support of huge physical addresses
From: Jan Beulich @ 2017-08-14 10:48 UTC
  To: Juergen Gross
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, xen-devel

>>> On 14.08.17 at 12:35, <jgross@suse.com> wrote:
> On 14/08/17 12:29, Jan Beulich wrote:
>>>>> On 14.08.17 at 12:21, <jgross@suse.com> wrote:
>>> Current pv guests will only see physical addresses up to 46 bits wide.
>>> In order to be able to run on a host supporting 5 level paging and to
>>> make use of any possible memory page there, physical addresses with up
>>> to 52 bits have to be supported.
>> 
>> Is this a Xen shortcoming or a Linux one (I assume the latter)?
> 
> It is a shortcoming of the Xen pv interface.

Please be more precise: Where in the interface do we have a
restriction to 46 bits?

>>> --- a/xen/include/public/elfnote.h
>>> +++ b/xen/include/public/elfnote.h
>>> @@ -212,9 +212,18 @@
>>>  #define XEN_ELFNOTE_PHYS32_ENTRY 18
>>>  
>>>  /*
>>> + * Maximum physical address size the kernel can handle.
>>> + *
>>> + * All memory of the PV guest must be allocated below this boundary,
>>> + * as the guest kernel can't handle page table entries with MFNs referring
>>> + * to memory above this value.
>>> + */
>>> +#define XEN_ELFNOTE_MAXPHYS_SIZE 19
>> 
>> Without use in the hypervisor or tools I don't see what good
>> introducing this note will do.
> 
> The Linux kernel could make use of it from e.g. kernel 4.14 on. So in
> case supports 5 level paging hosts lets say in Xen 4.12 it could run
> Linux pv guests with kernel 4.14 making use of high memory addresses.
> 
> In case we don't define the note (or do it rather late) pv guests would
> have to be restricted to the low 64TB of host memory.

No matter what you say here - I can't see how defining the note
alone will help.

Jan




* Re: [PATCH] include/public: add new elf note for support of huge physical addresses
From: Juergen Gross @ 2017-08-14 11:05 UTC
  To: Jan Beulich
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, xen-devel

On 14/08/17 12:48, Jan Beulich wrote:
>>>> On 14.08.17 at 12:35, <jgross@suse.com> wrote:
>> On 14/08/17 12:29, Jan Beulich wrote:
>>>>>> On 14.08.17 at 12:21, <jgross@suse.com> wrote:
>>>> Current pv guests will only see physical addresses up to 46 bits wide.
>>>> In order to be able to run on a host supporting 5 level paging and to
>>>> make use of any possible memory page there, physical addresses with up
>>>> to 52 bits have to be supported.
>>>
>>> Is this a Xen shortcoming or a Linux one (I assume the latter)?
>>
>> It is a shortcoming of the Xen pv interface.
> 
> Please be more precise: Where in the interface to we have a
> restriction to 46 bits?

We have no definition that the mfn width in a pte can be larger than
the pfn width for a given architecture (in this case a 4-level paging
64-bit x86 host).

So Xen has to assume that a guest not saying otherwise is limited
to mfns not exceeding a 4-level host's maximum addresses.

Or would you like to not limit current pv guests to the lower 64TB and
risk them crashing, just because they interpreted the lack of any
specific mfn width definition in another way than you do?

>>>> --- a/xen/include/public/elfnote.h
>>>> +++ b/xen/include/public/elfnote.h
>>>> @@ -212,9 +212,18 @@
>>>>  #define XEN_ELFNOTE_PHYS32_ENTRY 18
>>>>  
>>>>  /*
>>>> + * Maximum physical address size the kernel can handle.
>>>> + *
>>>> + * All memory of the PV guest must be allocated below this boundary,
>>>> + * as the guest kernel can't handle page table entries with MFNs referring
>>>> + * to memory above this value.
>>>> + */
>>>> +#define XEN_ELFNOTE_MAXPHYS_SIZE 19
>>>
>>> Without use in the hypervisor or tools I don't see what good
>>> introducing this note will do.
>>
>> The Linux kernel could make use of it from e.g. kernel 4.14 on. So in
>> case supports 5 level paging hosts lets say in Xen 4.12 it could run
>> Linux pv guests with kernel 4.14 making use of high memory addresses.
>>
>> In case we don't define the note (or do it rather late) pv guests would
>> have to be restricted to the low 64TB of host memory.
> 
> No matter what you say here - I can't see how defining the note
> alone will help.

It will help to introduce support for large mfns in Linux _now_
instead of having to wait.

This can easily be compared to the support of 5-level paging in the
kernel happening right now: when 5-level paging machines become
available in the future you won't be limited to a rather recent kernel,
but can use one that is already part of some distribution.


Juergen



* Re: [PATCH] include/public: add new elf note for support of huge physical addresses
From: Jan Beulich @ 2017-08-14 11:40 UTC
  To: Juergen Gross
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, xen-devel

>>> On 14.08.17 at 13:05, <jgross@suse.com> wrote:
> On 14/08/17 12:48, Jan Beulich wrote:
>>>>> On 14.08.17 at 12:35, <jgross@suse.com> wrote:
>>> On 14/08/17 12:29, Jan Beulich wrote:
>>>>>>> On 14.08.17 at 12:21, <jgross@suse.com> wrote:
>>>>> Current pv guests will only see physical addresses up to 46 bits wide.
>>>>> In order to be able to run on a host supporting 5 level paging and to
>>>>> make use of any possible memory page there, physical addresses with up
>>>>> to 52 bits have to be supported.
>>>>
>>>> Is this a Xen shortcoming or a Linux one (I assume the latter)?
>>>
>>> It is a shortcoming of the Xen pv interface.
>> 
>> Please be more precise: Where in the interface to we have a
>> restriction to 46 bits?
> 
> We have no definition that the mfn width in a pte can be larger than
> the pfn width for a given architecture (in this case a 4 level paging
> 64 bit x86 host).
> 
> So Xen has to assume a guest not telling otherwise has to be limited
> to mfns not exceeding 4 level hosts maximum addresses.

The number of page table levels affects only virtual address
width. Physical addresses can architecturally be 52 bits wide,
and what CPUID extended leaf 8 provides is what limits
physical address width.
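
A small sketch of the limit mentioned above (using GCC's <cpuid.h> helper;
CPUID leaf 0x80000008 reports the physical and linear address widths):

#include <cpuid.h>
#include <stdio.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* CPUID leaf 0x80000008: EAX[7:0] = physical address bits,
     * EAX[15:8] = linear (virtual) address bits. */
    if (__get_cpuid(0x80000008, &eax, &ebx, &ecx, &edx)) {
        printf("physical address bits: %u\n", eax & 0xff);
        printf("virtual address bits:  %u\n", (eax >> 8) & 0xff);
    }
    return 0;
}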

> Or would you like to not limit current pv guests to the lower 64TB and
> risk them crashing, just because they interpreted the lack of any
> specific mfn width definition in another way as you do?

Again - you saying "current pv guests" rather than "current
Linux PV guests" makes me assume you've found some
limitation in the PV ABI. Yet so far you didn't point out where
that is, which then again makes me assume you're talking
about a Linux limitation.

>>>>> --- a/xen/include/public/elfnote.h
>>>>> +++ b/xen/include/public/elfnote.h
>>>>> @@ -212,9 +212,18 @@
>>>>>  #define XEN_ELFNOTE_PHYS32_ENTRY 18
>>>>>  
>>>>>  /*
>>>>> + * Maximum physical address size the kernel can handle.
>>>>> + *
>>>>> + * All memory of the PV guest must be allocated below this boundary,
>>>>> + * as the guest kernel can't handle page table entries with MFNs referring
>>>>> + * to memory above this value.
>>>>> + */
>>>>> +#define XEN_ELFNOTE_MAXPHYS_SIZE 19
>>>>
>>>> Without use in the hypervisor or tools I don't see what good
>>>> introducing this note will do.
>>>
>>> The Linux kernel could make use of it from e.g. kernel 4.14 on. So in
>>> case supports 5 level paging hosts lets say in Xen 4.12 it could run
>>> Linux pv guests with kernel 4.14 making use of high memory addresses.
>>>
>>> In case we don't define the note (or do it rather late) pv guests would
>>> have to be restricted to the low 64TB of host memory.
>> 
>> No matter what you say here - I can't see how defining the note
>> alone will help.
> 
> It will help to introduce the support in Linux for large mfns _now_
> instead of having to wait.

How will that help (other than by knowing the numerical value
for the note)? Once again, without the hypervisor taking any
action upon seeing the note I don't see why on hardware with
wider than 46 bit physical addresses (all(?) AMD CPUs have 48
iirc) the intended effect will be achieved.

> This can be easily compared to the support of 5 level paging in the
> kernel happening right now: When the 5 level paging machines are
> available in the future you won't be limited to a rather recent kernel,
> but you can use one already being part of some distribution.

Yes and no. Since we don't mean to introduce 5-level PV guests,
we're not adding respective MMU ops anyway. If we did, it would
still seem strange to introduce, say, MMUEXT_PIN_L5_TABLE
without also implementing it. But yes, it would be possible, just
that other than here there really would not be a need for the
hypervisor to do anything for it as long as it doesn't itself know
of 5 page table levels.

Jan




* Re: [PATCH] include/public: add new elf note for support of huge physical addresses
From: Juergen Gross @ 2017-08-14 12:21 UTC
  To: Jan Beulich
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, xen-devel

On 14/08/17 13:40, Jan Beulich wrote:
>>>> On 14.08.17 at 13:05, <jgross@suse.com> wrote:
>> On 14/08/17 12:48, Jan Beulich wrote:
>>>>>> On 14.08.17 at 12:35, <jgross@suse.com> wrote:
>>>> On 14/08/17 12:29, Jan Beulich wrote:
>>>>>>>> On 14.08.17 at 12:21, <jgross@suse.com> wrote:
>>>>>> Current pv guests will only see physical addresses up to 46 bits wide.
>>>>>> In order to be able to run on a host supporting 5 level paging and to
>>>>>> make use of any possible memory page there, physical addresses with up
>>>>>> to 52 bits have to be supported.
>>>>>
>>>>> Is this a Xen shortcoming or a Linux one (I assume the latter)?
>>>>
>>>> It is a shortcoming of the Xen pv interface.
>>>
>>> Please be more precise: Where in the interface to we have a
>>> restriction to 46 bits?
>>
>> We have no definition that the mfn width in a pte can be larger than
>> the pfn width for a given architecture (in this case a 4 level paging
>> 64 bit x86 host).
>>
>> So Xen has to assume a guest not telling otherwise has to be limited
>> to mfns not exceeding 4 level hosts maximum addresses.
> 
> The number of page table levels affects only virtual address
> width. Physical addresses can architecturally be 52 bits wide,
> and what CPUID extended leaf 8 provides is what limits
> physical address width.

Yes.

OTOH up to now there have been no x86 platforms supporting more than
46 bits of physical address width (at least AFAIK), and this limit is
explicitly specified for all current processors.

>> Or would you like to not limit current pv guests to the lower 64TB and
>> risk them crashing, just because they interpreted the lack of any
>> specific mfn width definition in another way as you do?
> 
> Again - you saying "current pv guests" rather than "current
> Linux PV guests" makes me assume you've found some
> limitation in the PV ABI. Yet so far you didn't point out where
> that is, which then again makes me assume you're talking
> about a Linux limitation.

Yes, I am talking of Linux here.

And no, you are wrong that I haven't pointed out where the limitation
is: I have said that the PV ABI nowhere states that MFNs can be wider
than any current processor's PFNs.

So when being pedantic you are right: the Linux kernel is violating
the specification by not being able to run on a processor specifying
physical address width to be 52 bits via CPUID.

OTOH as there hasn't been any such processor up to now this was no
problem for Linux.

We could say, of course, that this is a problem of Linux which should be
fixed. I think this wouldn't be a wise thing to do: we don't want to
point fingers at Linux, but we want a smooth user experience
with Xen. So we need some kind of interface to handle the current
situation that no Linux kernel up to 4.13 will be able to make use of
physical host memory above 64TB. Again: I don't think we want to let
those kernels just crash and tell the users it's Linux's fault and that
they should either use a new kernel or KVM.

>>>>>> --- a/xen/include/public/elfnote.h
>>>>>> +++ b/xen/include/public/elfnote.h
>>>>>> @@ -212,9 +212,18 @@
>>>>>>  #define XEN_ELFNOTE_PHYS32_ENTRY 18
>>>>>>  
>>>>>>  /*
>>>>>> + * Maximum physical address size the kernel can handle.
>>>>>> + *
>>>>>> + * All memory of the PV guest must be allocated below this boundary,
>>>>>> + * as the guest kernel can't handle page table entries with MFNs referring
>>>>>> + * to memory above this value.
>>>>>> + */
>>>>>> +#define XEN_ELFNOTE_MAXPHYS_SIZE 19
>>>>>
>>>>> Without use in the hypervisor or tools I don't see what good
>>>>> introducing this note will do.
>>>>
>>>> The Linux kernel could make use of it from e.g. kernel 4.14 on. So in
>>>> case supports 5 level paging hosts lets say in Xen 4.12 it could run
>>>> Linux pv guests with kernel 4.14 making use of high memory addresses.
>>>>
>>>> In case we don't define the note (or do it rather late) pv guests would
>>>> have to be restricted to the low 64TB of host memory.
>>>
>>> No matter what you say here - I can't see how defining the note
>>> alone will help.
>>
>> It will help to introduce the support in Linux for large mfns _now_
>> instead of having to wait.
> 
> How will that help (other than by knowing the numerical value
> for the note)? Once again, without the hypervisor taking any
> action upon seeing the note I don't see why on hardware with
> wider than 46 bit physical addresses (all(?) AMD CPUs have 48
> iirc) the intended effect will be achieved.

It will help to define the interface. Using the ELF note in the Linux
kernel won't help now, but it will in the future. And we won't be able to
patch all 4.14 kernels in the world to suddenly use that ELF note. This
can be done via an upstream patch only.

>> This can be easily compared to the support of 5 level paging in the
>> kernel happening right now: When the 5 level paging machines are
>> available in the future you won't be limited to a rather recent kernel,
>> but you can use one already being part of some distribution.
> 
> Yes and no. Since we don't mean to introduce 5-level PV guests,
> we're not adding respective MMU ops anyway. If we would, it
> would still seem strange to introduced, say, MMUEXT_PIN_L5_TABLE
> without also implementing it. But yes, it would be possible, just
> that other than here there really would not be a need for the
> hypervisor to do anything for it as long as it doesn't itself know
> of 5 page table levels.

The patch I'm thinking of would just avoid masking away MFN bits as
is done today. Look at pte_mfn_to_pfn(): the MFN is obtained by
masking the pte value with PTE_PFN_MASK. I'd like to use
XEN_PTE_PFN_MASK instead, allowing for 52-bit physical addresses.

So we wouldn't need any other new interfaces. It's just the handling of
pv pte values that differs, by widening the mask. And this would
touch pv-specific code only.
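
A minimal sketch of the mask widening described above (illustrative only:
XEN_PTE_PFN_MASK is the name proposed in this mail, and the helper merely
mirrors the idea of pte_mfn_to_pfn(), not its actual implementation):

#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12
/* Architecturally, PTE bits 12..51 can hold the frame number. */
#define XEN_PTE_PFN_MASK \
    (((1ULL << 52) - 1) & ~((1ULL << SKETCH_PAGE_SHIFT) - 1))

/* Extract the MFN from a pv pte without truncating it to the width
 * implied by the host's current paging mode. */
static inline uint64_t sketch_pte_to_mfn(uint64_t pte)
{
    return (pte & XEN_PTE_PFN_MASK) >> SKETCH_PAGE_SHIFT;
}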

I could do the Linux patch without the new ELF note. But this would mean
Xen couldn't tell whether a pv domain is capable of using memory above
64TB or not. So we would have to locate _all_ pv guests below 64TB or we
would risk crashing domains. With the ELF note we can avoid this
dilemma.

As long as there is no code in the hypervisor (or Xen tools) to handle
the ELF note we are in the same position as without the ELF note. But
we can add handling for it later, and all of a sudden even "old" kernels
using the ELF note would benefit from it.


Juergen



* Re: [PATCH] include/public: add new elf note for support of huge physical addresses
From: Andrew Cooper @ 2017-08-14 12:36 UTC
  To: Juergen Gross, xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, ian.jackson, tim, jbeulich

On 14/08/17 11:21, Juergen Gross wrote:
> Current pv guests will only see physical addresses up to 46 bits wide.
> In order to be able to run on a host supporting 5 level paging and to
> make use of any possible memory page there, physical addresses with up
> to 52 bits have to be supported.
>
> As Xen needs to know whether a pv guest can handle such large addresses
> the kernel of the guest has to advertise this capability.
>
> Add a new ELF note for the maximum physical address the kernel can
> make use of.
>
> Please note that it is not required for a pv guest to support 5 level
> paging in order to use high physical addresses.
>
> Signed-off-by: Juergen Gross <jgross@suse.com>

Why?

With PAE paging, the maximum physical address width is 52 bits, and has
been like that for a decade now.  5-level paging doesn't change this.

Are you saying that there is a Linux limitation where it doesn't cope
properly with 52 bits of width in the pagetables?

A note like this is fine in principle if it is in fact needed, but I
don't understand where the need arises.

~Andrew

P.S. you are aware that all guests are constrained to 16TB anyway,
because of the gnttab v1 32bit frame field?  In the case of PV guests,
that’s the 16TB MFN boundary.
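
A sketch of the arithmetic behind that P.S. (the struct below is merely
modeled on the public grant_entry_v1 layout, shown here for illustration):

#include <stdint.h>
#include <stdio.h>

struct grant_entry_v1_sketch {
    uint16_t flags;
    uint16_t domid;
    uint32_t frame;                       /* frame number is only 32 bits wide */
};

int main(void)
{
    uint64_t max_frames = 1ULL << 32;     /* what a 32-bit frame field can name */
    uint64_t limit = max_frames << 12;    /* 4KiB per frame */

    printf("grant v1 addressable limit: %llu TiB\n",
           (unsigned long long)(limit >> 40));   /* prints 16 */
    return 0;
}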



* Re: [PATCH] include/public: add new elf note for support of huge physical addresses
From: Juergen Gross @ 2017-08-14 12:56 UTC
  To: Andrew Cooper, xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, ian.jackson, tim, jbeulich

On 14/08/17 14:36, Andrew Cooper wrote:
> On 14/08/17 11:21, Juergen Gross wrote:
>> Current pv guests will only see physical addresses up to 46 bits wide.
>> In order to be able to run on a host supporting 5 level paging and to
>> make use of any possible memory page there, physical addresses with up
>> to 52 bits have to be supported.
>>
>> As Xen needs to know whether a pv guest can handle such large addresses
>> the kernel of the guest has to advertise this capability.
>>
>> Add a new ELF note for the maximum physical address the kernel can
>> make use of.
>>
>> Please note that it is not required for a pv guest to support 5 level
>> paging in order to use high physical addresses.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
> 
> Why?
> 
> With PAE paging, the maximum physical address width is 52 bits, and has
> been like that for a decade now.  5-level paging doesn't change this.
> 
> Are you saying that there is a Linux limitation where it doesn't cope
> properly with 52 bits of width in the pagetables?

Yes. See PTE_PFN_MASK in the Linux kernel.

> A note like this is fine in principle if it is in fact needed, but I
> don't understand where the need arises.
> 
> ~Andrew
> 
> P.S. you are aware that all guests are constrained to 16TB anyway,
> because of the gnttab v1 32bit frame field?  In the case of PV guests,
> that’s the 16TB MFN boundary.

No, up to now I haven't thought of this.

This requires a major interface change, I guess. :-(


Juergen



* Re: [PATCH] include/public: add new elf note for support of huge physical addresses
From: Andrew Cooper @ 2017-08-14 13:12 UTC
  To: Juergen Gross, xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, ian.jackson, tim, jbeulich

On 14/08/17 13:56, Juergen Gross wrote:
> On 14/08/17 14:36, Andrew Cooper wrote:
>> On 14/08/17 11:21, Juergen Gross wrote:
>>> Current pv guests will only see physical addresses up to 46 bits wide.
>>> In order to be able to run on a host supporting 5 level paging and to
>>> make use of any possible memory page there, physical addresses with up
>>> to 52 bits have to be supported.
>>>
>>> As Xen needs to know whether a pv guest can handle such large addresses
>>> the kernel of the guest has to advertise this capability.
>>>
>>> Add a new ELF note for the maximum physical address the kernel can
>>> make use of.
>>>
>>> Please note that it is not required for a pv guest to support 5 level
>>> paging in order to use high physical addresses.
>>>
>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> Why?
>>
>> With PAE paging, the maximum physical address width is 52 bits, and has
>> been like that for a decade now.  5-level paging doesn't change this.
>>
>> Are you saying that there is a Linux limitation where it doesn't cope
>> properly with 52 bits of width in the pagetables?
> Yes. See PTE_PFN_MASK in Linux kernel.

:(

Why does Linux limit itself to 46 bits outside of 5LEVEL mode?  Loads of
AMD hardware already has 48 bits of physical address space.

It seems reasonable for Linux to limit itself to 44 bits in a 32-bit PAE
build.  Xen's limit of 128G is far lower.

What happens if Linux reads a grant PTE written by Xen which refers
to a frame above this limit?  It looks like there is a latent bug, and
the higher address bits would be considered part of the upper flags.

It looks like such a flag is necessary for PV guests, but it also needs
to come with some other fixups in Xen.

~Andrew



* Re: [PATCH] include/public: add new elf note for support of huge physical addresses
From: Jan Beulich @ 2017-08-14 13:18 UTC
  To: Juergen Gross
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, xen-devel

>>> On 14.08.17 at 14:21, <jgross@suse.com> wrote:
> On 14/08/17 13:40, Jan Beulich wrote:
>>>>> On 14.08.17 at 13:05, <jgross@suse.com> wrote:
>>> On 14/08/17 12:48, Jan Beulich wrote:
>>>>>>> On 14.08.17 at 12:35, <jgross@suse.com> wrote:
>>>>> On 14/08/17 12:29, Jan Beulich wrote:
>>>>>>>>> On 14.08.17 at 12:21, <jgross@suse.com> wrote:
>>>>>>> Current pv guests will only see physical addresses up to 46 bits wide.
>>>>>>> In order to be able to run on a host supporting 5 level paging and to
>>>>>>> make use of any possible memory page there, physical addresses with up
>>>>>>> to 52 bits have to be supported.
>>>>>>
>>>>>> Is this a Xen shortcoming or a Linux one (I assume the latter)?
>>>>>
>>>>> It is a shortcoming of the Xen pv interface.
>>>>
>>>> Please be more precise: Where in the interface to we have a
>>>> restriction to 46 bits?
>>>
>>> We have no definition that the mfn width in a pte can be larger than
>>> the pfn width for a given architecture (in this case a 4 level paging
>>> 64 bit x86 host).
>>>
>>> So Xen has to assume a guest not telling otherwise has to be limited
>>> to mfns not exceeding 4 level hosts maximum addresses.
>> 
>> The number of page table levels affects only virtual address
>> width. Physical addresses can architecturally be 52 bits wide,
>> and what CPUID extended leaf 8 provides is what limits
>> physical address width.
> 
> Yes.
> 
> OTOH up to now there have been no x86 platforms supporting more than
> 46 bits physical address width (at least AFAIK), and this limit is
> explicitly specified for all current processors.

As said, AMD CPUs support 48 bits (and actually have hypertransport
stuff sitting at the top end, just not RAM extending that far).

>>> Or would you like to not limit current pv guests to the lower 64TB and
>>> risk them crashing, just because they interpreted the lack of any
>>> specific mfn width definition in another way as you do?
>> 
>> Again - you saying "current pv guests" rather than "current
>> Linux PV guests" makes me assume you've found some
>> limitation in the PV ABI. Yet so far you didn't point out where
>> that is, which then again makes me assume you're talking
>> about a Linux limitation.
> 
> Yes, I am talking of Linux here.
> 
> And no, you are wrong that I haven't pointed out where the limitation
> is: I have said that the PV ABI nowhere states that MFNs can be wider
> than any current processor's PFNs.

Why would it need to? The relevant limits are imposed by CPUID
output. There's no PV ABI aspect here.

> So when being pedantic you are right: the Linux kernel is violating
> the specification by not being able to run on a processor specifying
> physical address width to be 52 bits via CPUID.
> 
> OTOH as there hasn't been any such processor up to now this was no
> problem for Linux.
> 
> We could say, of course, this is a problem of Linux which should be
> fixed. I think this wouldn't be a wise thing to do: we don't want to
> do finger pointing at Linux, but we want a smooth user's experience
> with Xen. So we need some kind of interface to handle the current
> situation that no Linux kernel up to 4.13 will be able to make use of
> physical host memory above 64TB. Again: I don't think we want to let
> those kernel's just crash and tell the users its Linux' fault, they
> should either use a new kernel or KVM.

That's all fine, just that I'd expect you to make the hypervisor at
once honor the new note. Before accepting the addition to the
ABI, I'd at least like to see sketched out how the resulting
restriction would be enforced by the hypervisor. With the way
we do this for 32-bit guests I don't expect this to be entirely
straightforward.

>>> This can be easily compared to the support of 5 level paging in the
>>> kernel happening right now: When the 5 level paging machines are
>>> available in the future you won't be limited to a rather recent kernel,
>>> but you can use one already being part of some distribution.
>> 
>> Yes and no. Since we don't mean to introduce 5-level PV guests,
>> we're not adding respective MMU ops anyway. If we would, it
>> would still seem strange to introduced, say, MMUEXT_PIN_L5_TABLE
>> without also implementing it. But yes, it would be possible, just
>> that other than here there really would not be a need for the
>> hypervisor to do anything for it as long as it doesn't itself know
>> of 5 page table levels.
> 
> The patch I'm thinking of would just avoid masking away MFN bits as
> it is done today. Look in pte_mfn_to_pfn(): the MFN is obtained by
> masking the pte value with PTE_PFN_MASK. I'd like to use
> XEN_PTE_PFN_MASK instead allowing for 52 bit physical addresses.

Hmm, so you mean to nevertheless fix this on the Linux side.
Which then makes me wonder again - what do you need the note
for if you want to make Linux behave properly? By now it feels
like I'm really missing some of your rationale and/or intended plan
of action here.

> So we wouldn't need any other new interfaces. Its just handling of
> pv pte values which is different by widening the mask. And this would
> touch pv-specific code only.
> 
> I could do the Linux patch without the new ELF note. But this would mean
> Xen couldn't tell whether a pv domain is capable to use memory above
> 64TB or not. So we would have to locate _all_ pv guests below 64TB or we
> would risk crashing domains. With the ELF note we can avoid this
> dilemma.

I don't follow: It seems like you're implying that in the absence of
the note we'd restrict PV guests that way. But why would we? We
should not penalize non-Linux PV guests just because Linux has a
restriction. IOW the note needs to be present for a restriction to
be enforced, which in turn means the hypervisor first needs to
honor the note. Otherwise running a 4-level hypervisor on 5-level
capable hardware (with wider than 46-bit physical addresses)
would break Linux as well.

Jan



* Re: [PATCH] include/public: add new elf note for support of huge physical addresses
From: Juergen Gross @ 2017-08-14 14:23 UTC
  To: Jan Beulich
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, xen-devel

On 14/08/17 15:18, Jan Beulich wrote:
>>>> On 14.08.17 at 14:21, <jgross@suse.com> wrote:
>> On 14/08/17 13:40, Jan Beulich wrote:
>>>>>> On 14.08.17 at 13:05, <jgross@suse.com> wrote:
>>>> On 14/08/17 12:48, Jan Beulich wrote:
>>>>>>>> On 14.08.17 at 12:35, <jgross@suse.com> wrote:
>>>>>> On 14/08/17 12:29, Jan Beulich wrote:
>>>>>>>>>> On 14.08.17 at 12:21, <jgross@suse.com> wrote:
>>>>>>>> Current pv guests will only see physical addresses up to 46 bits wide.
>>>>>>>> In order to be able to run on a host supporting 5 level paging and to
>>>>>>>> make use of any possible memory page there, physical addresses with up
>>>>>>>> to 52 bits have to be supported.
>>>>>>>
>>>>>>> Is this a Xen shortcoming or a Linux one (I assume the latter)?
>>>>>>
>>>>>> It is a shortcoming of the Xen pv interface.
>>>>>
>>>>> Please be more precise: Where in the interface to we have a
>>>>> restriction to 46 bits?
>>>>
>>>> We have no definition that the mfn width in a pte can be larger than
>>>> the pfn width for a given architecture (in this case a 4 level paging
>>>> 64 bit x86 host).
>>>>
>>>> So Xen has to assume a guest not telling otherwise has to be limited
>>>> to mfns not exceeding 4 level hosts maximum addresses.
>>>
>>> The number of page table levels affects only virtual address
>>> width. Physical addresses can architecturally be 52 bits wide,
>>> and what CPUID extended leaf 8 provides is what limits
>>> physical address width.
>>
>> Yes.
>>
>> OTOH up to now there have been no x86 platforms supporting more than
>> 46 bits physical address width (at least AFAIK), and this limit is
>> explicitly specified for all current processors.
> 
> As said, AMD CPUs support 48 bits (and actually have hypertransport
> stuff sitting at the top end, just not RAM extending that far).
> 
>>>> Or would you like to not limit current pv guests to the lower 64TB and
>>>> risk them crashing, just because they interpreted the lack of any
>>>> specific mfn width definition in another way as you do?
>>>
>>> Again - you saying "current pv guests" rather than "current
>>> Linux PV guests" makes me assume you've found some
>>> limitation in the PV ABI. Yet so far you didn't point out where
>>> that is, which then again makes me assume you're talking
>>> about a Linux limitation.
>>
>> Yes, I am talking of Linux here.
>>
>> And no, you are wrong that I haven't pointed out where the limitation
>> is: I have said that the PV ABI nowhere states that MFNs can be wider
>> than any current processor's PFNs.
> 
> Why would it need to? The relevant limits are imposed by CPUID
> output. There's no PV ABI aspect here.
> 
>> So when being pedantic you are right: the Linux kernel is violating
>> the specification by not being able to run on a processor specifying
>> physical address width to be 52 bits via CPUID.
>>
>> OTOH as there hasn't been any such processor up to now this was no
>> problem for Linux.
>>
>> We could say, of course, this is a problem of Linux which should be
>> fixed. I think this wouldn't be a wise thing to do: we don't want to
>> do finger pointing at Linux, but we want a smooth user's experience
>> with Xen. So we need some kind of interface to handle the current
>> situation that no Linux kernel up to 4.13 will be able to make use of
>> physical host memory above 64TB. Again: I don't think we want to let
>> those kernel's just crash and tell the users its Linux' fault, they
>> should either use a new kernel or KVM.
> 
> That's all fine, just that I'd expect you to make the hypervisor at
> once honor the new note. Before accepting the addition to the
> ABI, I'd at least like to see sketched out how the resulting
> restriction would be enforced by the hypervisor. With the way
> we do this for 32-bit guests I don't expect this to be entirely
> straightforward.

The minimal implementation would be to allocate memory for the guest
like today, then free all memory above the boundary the guest can
handle, and issue a message in case some memory was freed. This
would tell the user what happened and why his guest wasn't started
or has less memory than intended. This would be much better than
random crashes at random times.
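
A toy model (not Xen code) of the boundary check implied here: a guest
advertising N physical address bits can only be given machine frames
below 2^(N-12).

#include <stdint.h>
#include <stdio.h>

#define SKETCH_PAGE_SHIFT 12

/* Can a guest advertising maxphys_bits address this machine frame? */
static int mfn_usable_by_guest(uint64_t mfn, unsigned int maxphys_bits)
{
    uint64_t limit_mfn = 1ULL << (maxphys_bits - SKETCH_PAGE_SHIFT);
    return mfn < limit_mfn;
}

int main(void)
{
    /* A guest advertising 46 bits tops out at 64TB of machine memory. */
    printf("%d\n", mfn_usable_by_guest(1ULL << 35, 46)); /* frame at 128TB -> 0 */
    printf("%d\n", mfn_usable_by_guest(1ULL << 33, 46)); /* frame at 32TB  -> 1 */
    return 0;
}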

>>>> This can be easily compared to the support of 5 level paging in the
>>>> kernel happening right now: When the 5 level paging machines are
>>>> available in the future you won't be limited to a rather recent kernel,
>>>> but you can use one already being part of some distribution.
>>>
>>> Yes and no. Since we don't mean to introduce 5-level PV guests,
>>> we're not adding respective MMU ops anyway. If we would, it
>>> would still seem strange to introduced, say, MMUEXT_PIN_L5_TABLE
>>> without also implementing it. But yes, it would be possible, just
>>> that other than here there really would not be a need for the
>>> hypervisor to do anything for it as long as it doesn't itself know
>>> of 5 page table levels.
>>
>> The patch I'm thinking of would just avoid masking away MFN bits as
>> it is done today. Look in pte_mfn_to_pfn(): the MFN is obtained by
>> masking the pte value with PTE_PFN_MASK. I'd like to use
>> XEN_PTE_PFN_MASK instead allowing for 52 bit physical addresses.
> 
> Hmm, so you mean to nevertheless fix this on the Linux side.
> Which then makes me wonder again - what do you need the note
> for if you want to make Linux behave properly? By now it feels
> like I'm really missing some of your rationale and/or intended plan
> of action here.

The problem is that not all Linux versions behave properly, only future
ones. I believe we can't assume everyone will only ever use
the most recent kernel for a guest.

>> So we wouldn't need any other new interfaces. Its just handling of
>> pv pte values which is different by widening the mask. And this would
>> touch pv-specific code only.
>>
>> I could do the Linux patch without the new ELF note. But this would mean
>> Xen couldn't tell whether a pv domain is capable to use memory above
>> 64TB or not. So we would have to locate _all_ pv guests below 64TB or we
>> would risk crashing domains. With the ELF note we can avoid this
>> dilemma.
> 
> I don't follow: It seems like you're implying that in the absence of
> the note we'd restrict PV guests that way. But why would we? We
> should not penalize non-Linux PV guests just because Linux has a
> restriction.

Hmm, that's a valid point.

> IOW the note needs to be present for a restriction to
> be enforced, which in turn means the hypervisor first needs to
> honor the note.

I don't think so. How would you get the note into already existing
kernels having the restriction?

> Otherwise running a 4-level hypervisor on 5-level
> capable hardware (with wider than 46-bit physical addresses)
> would break Linux as well.

Right. OTOH on such a host a bare metal 4-level Linux wouldn't run either
(or only with less memory).

With Andrew's comment regarding grant v1 restricting _all_ current pv
domains using this version to the first 16TB, the complete discussion
might be moot. So do we need an ELF note specifying whether a pv domain
supports grant v2 in order to position it above 16TB? Or should these
semantics be implied by a kernel specifying a max physical address
above 16TB?


Juergen



* Re: [PATCH] include/public: add new elf note for support of huge physical addresses
From: Jan Beulich @ 2017-08-14 14:31 UTC
  To: Juergen Gross
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, xen-devel

>>> On 14.08.17 at 16:23, <jgross@suse.com> wrote:
> On 14/08/17 15:18, Jan Beulich wrote:
>> IOW the note needs to be present for a restriction to
>> be enforced, which in turn means the hypervisor first needs to
>> honor the note.
> 
> I don't think so. How would you get the note into already existing
> kernels having the restriction?

Well, we'd additionally need a guest config setting or some such.

>> Otherwise running a 4-level hypervisor on 5-level
>> capable hardware (with wider than 46-bit physical addresses)
>> would break Linux as well.
> 
> Right. OTOH on such a host bare metal 4-level Linux wouldn't run either
> (or only using less memory).
> 
> With Andrew's comment regarding grant v1 restricting _all_ current pv
> domains using this version to the first 16TB the complete discussion
> might be moot. So do we need an ELF note specifying whether a pv domain
> supports grant v2 in order to position it above 16TB? Or should this
> semantics be included in a kernel specifying its max physical address
> supported above 16TB?

First of all we'd need to enforce the 16TB boundary in the
hypervisor. Then we could have a note relaxing this; whether
this is the note you propose or a separate one is secondary.

Jan



