* Xen/ARM API issue (page size)
From: Elliott Mitchell @ 2021-07-08  0:32 UTC
  To: xen-devel

Hopefully I'm not about to show the limits of my knowledge...

Quite a few values passed to Xen via hypercalls include a page number.
This makes sense as that maps to the hardware.  Problem is, I cannot help
but notice aarch64 allows for 4KB, 16KB and 64KB pages.

I don't know how flexible aarch64 is.  I don't know whether an aarch64
core can support multiple page sizes.  My tentative reading of
information seemed to suggest a typical aarch64 core /could/ allow
multiple page sizes.

What happens if a system (and Xen) is set up to support 64KB pages, but a
particular domain has been built strictly with 4KB page support?

What if a particular domain wanted to use 64KB pages (4KB being too
granular), but Xen was set to use 4KB pages?

What if a system had two domains which were set for different page sizes,
but the two needed to interact?


Then you have things like VCPUOP_register_vcpu_info.  The structure is
set up as an mfn and offset.  With the /actual/ page size being used there,
it is troublesome.  Several places might work better if pure 64-bit
addresses were used, but with alignment requirements specified.
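
For reference, the structure (as defined in Xen's public header
xen/include/public/vcpu.h, if I'm reading it correctly) is:

    struct vcpu_register_vcpu_info {
        uint64_t mfn;    /* mfn of page to place vcpu_info */
        uint32_t offset; /* offset within page */
        uint32_t rsvd;   /* unused */
    };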

Then there is a question of what happens when we get a core which has
more than 64 physical address bits (seems a few years off, but for a long
time 32 seemed high).


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445





* Re: Xen/ARM API issue (page size)
From: Andrew Cooper @ 2021-07-08  1:05 UTC
  To: Elliott Mitchell, xen-devel

On 08/07/2021 01:32, Elliott Mitchell wrote:
> Hopefully I'm not about to show the limits of my knowledge...
>
> Quite a few values passed to Xen via hypercalls include a page number.
> This makes sense as that maps to the hardware.  Problem is, I cannot help
> but notice aarch64 allows for 4KB, 16KB and 64KB pages.

Yes - page size is a known error throughout the ABI, seeing as Xen started
on x86 and 4k was the only size considered at the time.

32bit frame numbers were all the rage between the Pentium 2 (1997) and
the advent of 64bit systems (~2006), because they let you efficiently
reference up to 16T of physical memory, rather than being limited to 4G
if you used byte addresses instead.

It will be addressed in ABIv2 design (if I ever get enough time to write
everything down and make a start).

> I don't know how flexible aarch64 is.  I don't know whether an aarch64
> core can support multiple page sizes.  My tentative reading of
> information seemed to suggest a typical aarch64 core /could/ allow
> multiple page sizes.
>
> What happens if a system (and Xen) is set up to support 64KB pages, but a
> particular domain has been built strictly with 4KB page support?
>
> What if a particular domain wanted to use 64KB pages (4KB being too
> granular), but Xen was set to use 4KB pages?
>
> What if a system had two domains which were set for different page sizes,
> but the two needed to interact?

I'm afraid I'll have to defer to the arm folk to answer this, but my
understanding is that it should be possible to support guests compiled
with, and using, different page sizes (given a suitable ABI).

> Then you have things like VCPUOP_register_vcpu_info.  The structure is
> set up as an mfn and offset.  With the /actual/ page size being used there,
> it is troublesome.  Several places might work better if pure 64-bit
> addresses were used, but with alignment requirements specified.

The way to fix size problems is to mandate that all addresses in the ABI
are full byte addresses, not frame numbers.  When alignment is required,
and it frequently is, it is fine to use the lower bits for metadata.
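
As an illustration (names made up; this is not a real interface):

    #include <stdint.h>

    /* A full byte address, 4k-aligned, leaving the low 12 bits for flags. */
    #define XEN_ADDR_MASK  (~(uint64_t)0xfff)
    #define XEN_FLAG_FOO   ((uint64_t)1 << 0)     /* hypothetical flag */

    static uint64_t encode(uint64_t byte_addr)    /* 4k-aligned address */
    {
        return (byte_addr & XEN_ADDR_MASK) | XEN_FLAG_FOO;
    }

    static uint64_t decode_addr(uint64_t arg)
    {
        return arg & XEN_ADDR_MASK;               /* drop the metadata */
    }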

Critically, what this means is that you don't need separate API/ABIs
based on page size.  e.g. "please balloon out this page" operates "on
the alignment the guest is using", rather than needing separate ops for
4k/2M/1G (to list the x86 page sizes only).

> Then there is a question of what happens when we get a core which has
> more than 64 physical address bits (seems a few years off, but for a long
> time 32 seemed high).

riscv128 is already being discussed, and current generation x86 servers
already have 52 address bits and are using them all (partly NVDIMMs
which take up huge swathes of address space, and the various encrypted
RAM technologies which steal upper address bits for key-ids).

The only sensible way to address this is to introduce new ops mirroring
existing ones, using larger integers.  e.g. get_e820 and get_e820_2
where the latter returns __uint128_t's instead of uint64_t's (or whatever).

Whenever you're talking about systems like this, Xen has to be compiled
for the widest data type, and we know the datatype used by guest kernels
(based on their control settings).  All the compatibility layer needs to
do is zero extend 64bit addresses to form 128bit ones.
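
As a sketch (the type and helper are hypothetical):

    #include <stdint.h>

    typedef unsigned __int128 xen_addr128_t;  /* hypothetical wide address */

    /* Compat shim: widen a 64bit guest address to a 128bit one. */
    static inline xen_addr128_t compat_widen(uint64_t gaddr)
    {
        return (xen_addr128_t)gaddr;           /* unsigned cast zero extends */
    }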

~Andrew




* Re: Xen/ARM API issue (page size)
From: Julien Grall @ 2021-07-08 16:06 UTC
  To: Andrew Cooper, Elliott Mitchell, xen-devel

Hi,

I will answer the two e-mails at the same time as my answer will be 
similar :).

On 08/07/2021 02:05, Andrew Cooper wrote:
> On 08/07/2021 01:32, Elliott Mitchell wrote:
>> Hopefully I'm not about to show the limits of my knowledge...
>>
>> Quite a few values passed to Xen via hypercalls include a page number.
>> This makes sense as that maps to the hardware.  Problem is, I cannot help
>> but notice aarch64 allows for 4KB, 16KB and 64KB pages.
> 
> Yes - page size is a known error throughout the ABI, seeing as Xen started
> on x86 and 4k was the only size considered at the time.
> 
> 32bit frame numbers were all the rage between the Pentium 2 (1997) and
> the advent of 64bit systems (~2006), because they let you efficiently
> reference up to 16T of physical memory, rather than being limited to 4G
> if you used byte addresses instead.
> 
> It will be addressed in ABIv2 design (if I ever get enough time to write
> everything down and make a start).

IIRC, ABIv2 will only focus on the interface between the hypervisor and
the guests. However, I think we will also need to update the PV protocol
so that the two domains agree on the page granularity used.

> 
>> I don't know how flexible aarch64 is.  I don't know whether an aarch64
>> core can support multiple page sizes.  My tentative reading of
>> information seemed to suggest a typical aarch64 core /could/ allow
>> multiple page sizes.

The Arm architecture allows the hypervisor and the kernel to each choose
their own granularity. IOW, a kernel may use 4KB while the hypervisor uses 64KB.

Most arm64 cores support all the page granularities. That said, this is
not a requirement from the Arm ARM, so it may be possible to have cores
only supporting a subset of the page granularities.

>>
>> What happens if a system (and Xen) is set up to support 64KB pages, but a
>> particular domain has been built strictly with 4KB page support?

If the processor only supports 64KB, then you would not be able to boot a
4KB kernel there.

>>
>> What if a particular domain wanted to use 64KB pages (4KB being too
>> granular), but Xen was set to use 4KB pages?

Today the hypercall ABI uses the same page granularity as the
hypervisor. IOW, the domain would need to break its pages into 4KB chunks
to talk to the hypervisor.

FWIW, this is how Linux with 64KB/16KB page granularity is able to run 
on current Xen.
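
Roughly like this (a sketch of the idea, not the actual Linux code):

    /* How a 64KB-page kernel drives a 4KB-granularity hypercall ABI. */
    #define XEN_PAGE_SHIFT   12     /* ABI granularity: 4KB */
    #define GUEST_PAGE_SHIFT 16     /* kernel granularity: 64KB */
    #define XEN_PFN_PER_PAGE (1UL << (GUEST_PAGE_SHIFT - XEN_PAGE_SHIFT))

    static void for_each_xen_pfn(unsigned long guest_pfn,
                                 void (*fn)(unsigned long xen_pfn))
    {
        /* One 64KB guest page covers 16 consecutive 4KB Xen frames. */
        unsigned long xen_pfn = guest_pfn << (GUEST_PAGE_SHIFT - XEN_PAGE_SHIFT);
        unsigned long i;

        for ( i = 0; i < XEN_PFN_PER_PAGE; i++ )
            fn(xen_pfn + i);
    }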

>>
>> What if a system had two domains which were set for different page sizes,
>> but the two needed to interact?

They would need to agree on the page granularity used. At the moment,
this is implicitly fixed to 4KB.

> 
> I'm afraid I'll have to defer to the arm folk to answer this, but my
> understanding is that it should be possible to support guests compiled
> with, and using, different page sizes (given a suitable ABI).
> 
>> Then you have things like VCPUOP_register_vcpu_info.  The structure is
>> set up as an mfn and offset.  With the /actual/ page size being used there,
>> it is troublesome.  Several places might work better if pure 64-bit
>> addresses were used, but with alignment requirements specified.
> 
> The way to fix size problems is to mandate that all addresses in the ABI
> are full byte addresses, not frame numbers.  When alignment is required,
> and it frequently is, it is fine to use the lower bits for metadata.
> 
> Critically, what this means is that you don't need separate API/ABIs
> based on page size.  e.g. "please balloon out this page" operates "on
> the alignment the guest is using", rather than needing separate ops for
> 4k/2M/1G (to list the x86 page sizes only).

I think the full address is not sufficient here. The stage-2 page-table
(aka EPT on x86) uses the page granularity of the hypervisor.

So for anything that requires a change in the P2M, the domain needs to
make sure the address is aligned to the page granularity of the hypervisor.
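
I.e. something along these lines (a sketch; the name and shift are made
up for the example):

    /* An address used in a P2M operation must be aligned to the
     * hypervisor's granularity, whatever granularity the guest uses. */
    #define HYP_PAGE_SHIFT 16   /* e.g. a 64KB hypervisor */

    static int p2m_addr_ok(unsigned long addr)
    {
        return !(addr & ((1UL << HYP_PAGE_SHIFT) - 1));
    }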

Cheers,

-- 
Julien Grall



* Re: Xen/ARM API issue (page size)
From: Elliott Mitchell @ 2021-07-08 22:05 UTC
  To: Julien Grall; +Cc: Andrew Cooper, xen-devel

On Thu, Jul 08, 2021 at 05:06:42PM +0100, Julien Grall wrote:
> On 08/07/2021 02:05, Andrew Cooper wrote:
> > On 08/07/2021 01:32, Elliott Mitchell wrote:
> >> Hopefully I'm not about to show the limits of my knowledge...
> >>
> >> Quite a few values passed to Xen via hypercalls include a page number.
> >> This makes sense as that maps to the hardware.  Problem is, I cannot help
> >> but notice aarch64 allows for 4KB, 16KB and 64KB pages.
> > 
> > Yes - page size is a known error throughout the ABI, seeing as Xen started
> > on x86 and 4k was the only size considered at the time.
> > 
> > 32bit frame numbers were all the rage between the Pentium 2 (1997) and
> > the advent of 64bit systems (~2006), because they let you efficiently
> > reference up to 16T of physical memory, rather than being limited to 4G
> > if you used byte addresses instead.
> > 
> > It will be addressed in ABIv2 design (if I ever get enough time to write
> > everything down and make a start).
> 
> IIRC, ABIv2 will only focus on the interface between the hypervisor and
> the guests. However, I think we will also need to update the PV protocol
> so that the two domains agree on the page granularity used.

I'm inclined to concur with Andrew Cooper here.  It makes a fair bit of
sense to consistently use full addresses across the entire ABI, and just
specify alignment so the lower bits end up as zeroes.


> Most arm64 cores support all the page granularities. That said, this is
> not a requirement from the Arm ARM, so it may be possible to have cores
> only supporting a subset of the page granularities.

At which point it is possible to have a device where the page size(s)
supported by some cores are disjoint from the page size(s) supported by
other cores.

I imagine someone has plans.  An obvious use case would be a cellphone
chip with a low-power core for the modem and a high-power OS core.


> >> What happens if a system (and Xen) is set up to support 64KB pages, but a
> >> particular domain has been built strictly with 4KB page support?
> 
> If the processor only supports 64KB, then you would not be able to boot a
> 4KB kernel there.

I was being explicit about covering both cases of distinct page sizes
between Xen and domain (Xen with smaller page size, domain with smaller
page size).


> >> What if a particular domain wanted to use 64KB pages (4KB being too
> >> granular), but Xen was set to use 4KB pages?
> Today the hypercall ABI uses the same page granularity as the
> hypervisor. IOW, the domain would need to break its pages into 4KB chunks
> to talk to the hypervisor.
> 
> FWIW, this is how Linux with 64KB/16KB page granularity is able to run 
> on current Xen.

Breaking pages up is generally easier than putting them back together.
The good news is this could be handled similarly to DMA operations, with
a few pages reserved for interaction with small-page domains.


> >> What if a system had two domains which were set for different page sizes,
> >> but the two needed to interact?
> 
> They would need to agree on the page granularity used. At the moment,
> this is implicitly fixed to 4KB.

"implicitly" -> "undocumented" -> "guess" -> "12 hour build wasted"

For the case I'm concerned with, the history is a decent hint, but not
being explicitly documented is Bad.  In the Xen ABI there are too many
references to "page size" without the page size being defined as 4KB.

In a few years there may be someone on this list who assumes "page size"
means whatever page size is in use, and who will be rather annoyed that it
means 4096 when both Xen and their OS are using 65536.


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445





* Re: Xen/ARM API issue (page size)
From: Julien Grall @ 2021-07-09  9:19 UTC
  To: Elliott Mitchell; +Cc: Andrew Cooper, xen-devel

Hi Elliott,

On 08/07/2021 23:05, Elliott Mitchell wrote:
> On Thu, Jul 08, 2021 at 05:06:42PM +0100, Julien Grall wrote:
>> On 08/07/2021 02:05, Andrew Cooper wrote:
>>> On 08/07/2021 01:32, Elliott Mitchell wrote:
>>>> Hopefully I'm not about to show the limits of my knowledge...
>>>>
>>>> Quite a few values passed to Xen via hypercalls include a page number.
>>>> This makes sense as that maps to the hardware.  Problem is, I cannot help
>>>> but notice aarch64 allows for 4KB, 16KB and 64KB pages.
>>>
>>> Yes - page size is a known error throughout the ABI, seeing as Xen started
>>> on x86 and 4k was the only size considered at the time.
>>>
>>> 32bit frame numbers were all the rage between the Pentium 2 (1997) and
>>> the advent of 64bit systems (~2006), because they let you efficiently
>>> reference up to 16T of physical memory, rather than being limited to 4G
>>> if you used byte addresses instead.
>>>
>>> It will be addressed in ABIv2 design (if I ever get enough time to write
>>> everything down and make a start).
>>
>> IIRC, ABIv2 will only focus on the interface between the hypervisor and
>> the guests. However, I think we will also need to update the PV protocol
>> so that the two domains agree on the page granularity used.
> 
> I'm inclined to concur with Andrew Cooper here.  It makes a fair bit of
> sense to consistently use full addresses across the entire ABI, and just
> specify alignment so the lower bits end up as zeroes.
> 
> 
>> Most arm64 cores support all the page granularities. That said, this is
>> not a requirement from the Arm ARM, so it may be possible to have cores
>> only supporting a subset of the page granularities.
> 
> At which point it is possible to have a device where the page size(s)
> supported by some cores are disjoint from the page size(s) supported by
> other cores.
Well yes, it is possible to have cores with incompatible features.
However, the software may decide not to support that configuration.

For instance, Linux will sanitize the CPU features and may not boot (or
prevent a CPU from booting) if it can't find a valid subset.

In the case of the page granularity, all the cores where the OS will run
need to support a common page granularity. Linux will have to be built
with PAGE_SIZE set to that granularity (Linux cannot switch granularity
dynamically).
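
For instance, on arm64 the granularity is a build-time choice, via
exactly one of the following in the kernel config:

    CONFIG_ARM64_4K_PAGES=y
    CONFIG_ARM64_16K_PAGES=y
    CONFIG_ARM64_64K_PAGES=y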

> I imagine someone has plans.  An obvious use case would be a cellphone
> chip with a low-power core for the modem and a high-power OS core.

So long as the OS is running on just the high-power core, it is fine.

> 
> 
>>>> What happens if a system (and Xen) is set up to support 64KB pages, but a
>>>> particular domain has been built strictly with 4KB page support?
>>
>> If the processor only supports 64KB, then you would not be able to boot a
>> 4KB kernel there.
> 
> I was being explicit about covering both cases of distinct page sizes
> between Xen and domain (Xen with smaller page size, domain with smaller
> page size).

Ok. I think I covered the case where Xen will use a smaller page
granularity compared to the domain. But I haven't covered the opposite.

With the current ABI, a guest would need to be modified so it will
allocate memory and talk to the hypervisor using 64KB chunks (assuming
this is what the hypervisor was built with).

>>>> What if a particular domain wanted to use 64KB pages (4KB being too
>>>> granular), but Xen was set to use 4KB pages?
>> Today the hypercall ABI uses the same page granularity as the
>> hypervisor. IOW, the domain would need to break its pages into 4KB chunks
>> to talk to the hypervisor.
>>
>> FWIW, this is how Linux with 64KB/16KB page granularity is able to run
>> on current Xen.
> 
> Breaking pages up is generally easier than putting them back together.

IIRC, on Linux we are not putting them back together. Instead, we are
wasting 60KB for every page to keep the code simple. Obviously, this
could be improved... But this is not something I have had time to look at.

[...]

>>>> What if a system had two domains which were set for different page sizes,
>>>> but the two needed to interact?
>>
>> They would need to agree on the page granularity used. At the moment,
>> this is implicitly fixed to 4KB.
> 
> "implicitly" -> "undocumented" -> "guess" -> "12 hour build wasted"
> 
> For the case I'm concerned with, the history is a decent hint, but not
> being explicitly documented is Bad.  In the Xen ABI there are too many
> references to "page size" without the page size being defined as 4KB.
> 
> In a few years there may be someone on this list who assumes "page size"
> means whatever page size is in use, and who will be rather annoyed that it
> means 4096 when both Xen and their OS are using 65536.

The documentation in Xen (including the ABI) is not at its best. The 
community is currently working on improving it.

You are welcome to help contribute, in particular around the page size.

Cheers,

-- 
Julien Grall

