* [RFC] ARM64: 4 level page table translation for 4KB pages
@ 2014-03-31  3:51 Jungseok Lee
  2014-03-31  6:56 ` Arnd Bergmann
  0 siblings, 1 reply; 24+ messages in thread
From: Jungseok Lee @ 2014-03-31  3:51 UTC (permalink / raw)
  To: linux-arm-kernel

Dear All,

The current ARM64 kernel cannot support 4KB pages for the 40-bit physical
address space described in [1], due to one major issue and one minor issue.

Firstly, the kernel logical memory map (0xffffffc000000000-0xffffffffffffffff)
cannot cover the DRAM region from 544GB to 1024GB in [1]. Specifically, the
ARM64 kernel fails to create a mapping for this region in the map_mem function
(arch/arm64/mm/mmu.c), since __phys_to_virt overflows for addresses in this
region. I've used 3.14-rc8 + Fast Models to validate this statement.

Secondly, the vmemmap space is not enough to cover more than about 585GB of
physical address space. Fortunately, this issue can be resolved by utilizing
the extra vmemmap space (0xffffffbe00000000-0xffffffbffbbfffff) in [2].
However, it would not cover systems with a couple of terabytes of DRAM.

Therefore, 4 level page table translation would need to be implemented for
4KB pages on platforms with a 40-bit physical address space. Someone might
suggest using 64KB pages in this case, but I'm not sure how to deal with
the internal memory fragmentation.

I would like to contribute 4 level page table translation upstream, targeting
the 3.16 kernel, if there is no existing work on it. I saw some related RFC
patches a couple of months ago, but they do not seem to have been merged into
the maintainer's tree.


Best Regards,
Jungseok Lee


References
----------
[1]: Principles of ARM Memory Maps, White Paper, Issue C

[2]: Documentation/arm64/memory.txt


* [RFC] ARM64: 4 level page table translation for 4KB pages
  2014-03-31  3:51 [RFC] ARM64: 4 level page table translation for 4KB pages Jungseok Lee
@ 2014-03-31  6:56 ` Arnd Bergmann
  2014-03-31 11:31   ` Catalin Marinas
  0 siblings, 1 reply; 24+ messages in thread
From: Arnd Bergmann @ 2014-03-31  6:56 UTC (permalink / raw)
  To: linux-arm-kernel

On Monday 31 March 2014 12:51:07 Jungseok Lee wrote:
> Current ARM64 kernel cannot support 4KB pages for 40-bit physical address
> space described in [1] due to one major issue + one minor issue.
> 
> Firstly, kernel logical memory map (0xffffffc000000000-0xffffffffffffffff)
> cannot cover DRAM region from 544GB to 1024GB in [1]. Specifically, ARM64
> kernel fails to create mapping for this region in map_mem function
> (arch/arm64/mm/mmu.c) since __phys_to_virt for this region reaches to
> address overflow. I've used 3.14-rc8+Fast Models to validate the statement.

It took me a while to understand what is going on, but it essentially comes
down to the logical memory map (0xffffffc000000000-0xffffffffffffffff)
being able to represent only RAM in the first 256GB of address space.

More importantly, this means that any system following [1] will only be
able to use 32GB of RAM, which is a much more severe restriction than
what it sounds like at first.

> Secondly, vmemmap space is not enough to cover over about 585GB physical
> address space. Fortunately, this issue can be resolved as utilizing an extra
> vmemmap space (0xffffffbe00000000-0xffffffbffbbfffff) in [2]. However,
> it would not cover systems having a couple of terabytes DRAM.

This one can be trivially changed by taking more space out of the vmalloc
area, to go much higher if necessary. vmemmap space is always just a fraction
of the linear mapping size, so we can accommodate it by definition if we
find space to fit the physical memory.

> Therefore, it would be needed to implement 4 level page table translations
> for 4KB pages on 40-bit physical address space platforms. Someone might
> suggest use of 64KB pages in this case, but I'm not sure about how to
> deal with internal memory fragmentation.
> 
> I would like to contribute 4 level page table translations to upstream,
> the target of which is 3.16 kernel, if there is no movement on it. I saw
> some related RFC patches a couple of months ago, but they didn't seem to 
> be merged into maintainer's tree.

I think you are answering the wrong question here. Four level page tables
should not be required to support >32GB of RAM, that would be very silly.
There are good reasons to use a 50-bit virtual address space in user
land, e.g. for supporting data base applications that mmap huge files.

If this is not the goal however, we should not pay for the overhead
of the extra page table in user space. I can see two other possible
solutions for the problem:

a) always use a four-level page table in kernel space, regardless of
whether we do it in user space. We can move the kernel mappings down
in address space either by one 512GB entry to 0xffffff0000000000, or
to match the 64k-page location at 0xfffffc0000000000, or all the way
to 0xfffc000000000000. In any case, we can have all the dynamic
mappings within one 512GB area and pretend we have a three-level
page table for them, while the rest of DRAM is mapped statically at
early boot time using 512GB large pages.

b) If there is a reasonable assumption that everybody is using the
memory map from [1], then we can change the __virt_to_phys
and __phys_to_virt functions to accommodate that and move everything
into a flat contiguous virtual address space of 256GB. This would
also enable the use of a more efficient mem_map array instead of the
vmemmap, but would break running on any system that doesn't follow
the same convention. I have no idea yet how common this memory map
is, so I can't tell if this would be a realistic solution for what
you are targeting. We clearly wouldn't do it if it implies distributions
to ship an extra kernel binary for systems based on different memory
maps.

	Arnd
 
> References
> ----------
> [1]: Principles of ARM Memory Maps, Whit Paper, Issue C
> 
> [2]: Documentation/arm64/memory.txt
> 


* [RFC] ARM64: 4 level page table translation for 4KB pages
  2014-03-31  6:56 ` Arnd Bergmann
@ 2014-03-31 11:31   ` Catalin Marinas
  2014-03-31 12:45     ` Catalin Marinas
                       ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Catalin Marinas @ 2014-03-31 11:31 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Mar 31, 2014 at 07:56:53AM +0100, Arnd Bergmann wrote:
> On Monday 31 March 2014 12:51:07 Jungseok Lee wrote:
> > Current ARM64 kernel cannot support 4KB pages for 40-bit physical address
> > space described in [1] due to one major issue + one minor issue.
> > 
> > Firstly, kernel logical memory map (0xffffffc000000000-0xffffffffffffffff)
> > cannot cover DRAM region from 544GB to 1024GB in [1]. Specifically, ARM64
> > kernel fails to create mapping for this region in map_mem function
> > (arch/arm64/mm/mmu.c) since __phys_to_virt for this region reaches to
> > address overflow. I've used 3.14-rc8+Fast Models to validate the statement.
> 
> It took me a while to understand what is going on, but it essentially comes
> down to the logical memory map (0xffffffc000000000-0xffffffffffffffff)
> being able to represent only RAM in the first 256GB of address space.
> 
> More importantly, this means that any system following [1] will only be
> able to use 32GB of RAM, which is a much more severe restriction than
> what it sounds like at first.

On a 64-bit platform, do we still need the alias at the bottom and the
512-544GB hole (even for 32-bit DMA, top address bits can be wired to
512GB)? Only the idmap would need 4 levels, but that's static, we don't
need to switch Linux to 4-levels. Otherwise the memory is too sparse.

Of course, if you have 512GB of RAM and you want 4K pages, 3 levels are
no longer enough (with 64K pages you get to 42-bit VA space).

> > Secondly, vmemmap space is not enough to cover over about 585GB physical
> > address space. Fortunately, this issue can be resolved as utilizing an extra
> > vmemmap space (0xffffffbe00000000-0xffffffbffbbfffff) in [2]. However,
> > it would not cover systems having a couple of terabytes DRAM.
> 
> This one can be trivially changed by taking more space out of the vmalloc
> area, to go much higher if necessary. vmemmap space is always just a fraction
> of the linear mapping size, so we can accommodate it by definition if we
> find space to fit the physical memory.

vmemmap is the total range / page size * sizeof(struct page). So for 1TB
range and 4K pages we would need 8GB (the current value, unless I
miscalculated the above). Anyway, you can't cover 1TB range with
3-levels.

> > Therefore, it would be needed to implement 4 level page table translations
> > for 4KB pages on 40-bit physical address space platforms. Someone might
> > suggest use of 64KB pages in this case, but I'm not sure about how to
> > deal with internal memory fragmentation.
> > 
> > I would like to contribute 4 level page table translations to upstream,
> > the target of which is 3.16 kernel, if there is no movement on it. I saw
> > some related RFC patches a couple of months ago, but they didn't seem to 
> > be merged into maintainer's tree.
> 
> I think you are answering the wrong question here. Four level page tables
> should not be required to support >32GB of RAM, that would be very silly.

I agree, we should only enable 4-levels of page table if we have close
to 512GB of RAM or the range is too sparse, but I would actually push
back on the hardware guys to keep it tighter.

> There are good reasons to use a 50 bit virtual address space in user
> land, e.g. for supporting data base applications that mmap huge files.
> 
> If this is not the goal however, we should not pay for the overhead
> of the extra page table in user space. I can see two other possible
> solutions for the problem:
> 
> a) always use a four-level page table in kernel space, regardless of
> whether we do it in user space. We can move the kernel mappings down
> in address space either by one 512GB entry to 0xffffff0000000000, or
> to match the 64k-page location at 0xfffffc0000000000, or all the way
> to to 0xfffc000000000000. In any case, we can have all the dynamic
> mappings within one 512GB area and pretend we have a three-level
> page table for them, while the rest of DRAM is mapped statically at
> early boot time using 512GB large pages.

That's a workaround but we end up with two (or more) kernel pgds - one
for vmalloc, ioremap etc. and another (static) one for the kernel linear
mapping. So far there isn't any memory mapping carved out but we have to
be careful in the future.

However, kernel page table walking would be a bit slower with 4-levels.

> b) If there is a reasonable assumption that everybody is using the
> memory map from [1], then we can change the __virt_to_phys
> and __phys_to_virt functions to accomodate that and move everything
> into a flat contiguous virtual address space of 256GB. This would
> also enable the use of a more efficient mem_map array instead of the
> vmemmap, but would break running on any system that doesn't follow
> the same convention. I have no idea yet how common this memory map
> is, so I can't tell if this would be a realistic solution for what
> you are targeting. We clearly wouldn't do it if it implies distributions
> to ship an extra kernel binary for systems based on different memory
> maps.

We end up with hacks like the Realview phys/virt conversion. I don't
think we can guarantee that all ARMv8 platforms would follow the above
guidance.

-- 
Catalin


* [RFC] ARM64: 4 level page table translation for 4KB pages
  2014-03-31 11:31   ` Catalin Marinas
@ 2014-03-31 12:45     ` Catalin Marinas
  2014-03-31 12:58       ` Arnd Bergmann
  2014-03-31 12:53     ` Arnd Bergmann
  2014-04-01  0:42     ` Jungseok Lee
  2 siblings, 1 reply; 24+ messages in thread
From: Catalin Marinas @ 2014-03-31 12:45 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Mar 31, 2014 at 12:31:14PM +0100, Catalin Marinas wrote:
> On Mon, Mar 31, 2014 at 07:56:53AM +0100, Arnd Bergmann wrote:
> > On Monday 31 March 2014 12:51:07 Jungseok Lee wrote:
> > > Current ARM64 kernel cannot support 4KB pages for 40-bit physical address
> > > space described in [1] due to one major issue + one minor issue.
> > > 
> > > Firstly, kernel logical memory map (0xffffffc000000000-0xffffffffffffffff)
> > > cannot cover DRAM region from 544GB to 1024GB in [1]. Specifically, ARM64
> > > kernel fails to create mapping for this region in map_mem function
> > > (arch/arm64/mm/mmu.c) since __phys_to_virt for this region reaches to
> > > address overflow. I've used 3.14-rc8+Fast Models to validate the statement.
[...]
> > a) always use a four-level page table in kernel space, regardless of
> > whether we do it in user space. We can move the kernel mappings down
> > in address space either by one 512GB entry to 0xffffff0000000000, or
> > to match the 64k-page location at 0xfffffc0000000000, or all the way
> > to to 0xfffc000000000000. In any case, we can have all the dynamic
> > mappings within one 512GB area and pretend we have a three-level
> > page table for them, while the rest of DRAM is mapped statically at
> > early boot time using 512GB large pages.
> 
> That's a workaround but we end up with two (or more) kernel pgds - one
> for vmalloc, ioremap etc. and another (static) one for the kernel linear
> mapping. So far there isn't any memory mapping carved out but we have to
> be careful in the future.
> 
> However, kernel page table walking would be a bit slower with 4-levels.

Yet another approach would be to enable 4 levels of page tables (no
nopud.h) in the kernel, with pgd_offset_k doing the right thing for 4
levels, but with user space configured to 3 levels only: pgd_offset
would return 0, while pud_offset would do what pgd_offset currently
implements for 3 levels.

This solves the page table walk latency for user space but not for the kernel.
Anyway, if the hardware memory map is so sparse (a real SoC, not the
spec), I don't think we have other ways to support it with 3-levels of
page table for the kernel, unless we hack __virt_to_phys/__phys_to_virt.

-- 
Catalin


* [RFC] ARM64: 4 level page table translation for 4KB pages
  2014-03-31 11:31   ` Catalin Marinas
  2014-03-31 12:45     ` Catalin Marinas
@ 2014-03-31 12:53     ` Arnd Bergmann
  2014-03-31 15:27       ` Catalin Marinas
  2014-04-01  0:42     ` Jungseok Lee
  2 siblings, 1 reply; 24+ messages in thread
From: Arnd Bergmann @ 2014-03-31 12:53 UTC (permalink / raw)
  To: linux-arm-kernel

On Monday 31 March 2014 12:31:14 Catalin Marinas wrote:
> On Mon, Mar 31, 2014 at 07:56:53AM +0100, Arnd Bergmann wrote:
> > On Monday 31 March 2014 12:51:07 Jungseok Lee wrote:
> > > Current ARM64 kernel cannot support 4KB pages for 40-bit physical address
> > > space described in [1] due to one major issue + one minor issue.
> > > 
> > > Firstly, kernel logical memory map (0xffffffc000000000-0xffffffffffffffff)
> > > cannot cover DRAM region from 544GB to 1024GB in [1]. Specifically, ARM64
> > > kernel fails to create mapping for this region in map_mem function
> > > (arch/arm64/mm/mmu.c) since __phys_to_virt for this region reaches to
> > > address overflow. I've used 3.14-rc8+Fast Models to validate the statement.
> > 
> > It took me a while to understand what is going on, but it essentially comes
> > down to the logical memory map (0xffffffc000000000-0xffffffffffffffff)
> > being able to represent only RAM in the first 256GB of address space.
> > 
> > More importantly, this means that any system following [1] will only be
> > able to use 32GB of RAM, which is a much more severe restriction than
> > what it sounds like at first.
> 
> On a 64-bit platform, do we still need the alias at the bottom and the
> 512-544GB hole (even for 32-bit DMA, top address bits can be wired to
> 512GB)? Only the idmap would need 4 levels, but that's static, we don't
> need to switch Linux to 4-levels. Otherwise the memory is too sparse.

I think we should keep a static virtual-to-physical mapping, and keep
relocating the kernel at compile time without a hack like ARM_PATCH_PHYS_VIRT
if at all possible. Further, the same document that describes the
"much-too-sparse" memory map also says that there should be no alias,
so we have to load the kernel at 0x8000.0000 physical and address most of
the memory at 0x80.0000.0000.

> Of course, if you have 512GB of RAM and you want 4K pages, 3 levels are
> no longer enough (with 64K pages you get to 42-bit VA space).

Right, that is a separate issue. I don't know at what point we'll have
to address this one. For now, we have to break the 32GB barrier, then
we can think about the 256GB barrier ;-)

> > > Secondly, vmemmap space is not enough to cover over about 585GB physical
> > > address space. Fortunately, this issue can be resolved as utilizing an extra
> > > vmemmap space (0xffffffbe00000000-0xffffffbffbbfffff) in [2]. However,
> > > it would not cover systems having a couple of terabytes DRAM.
> > 
> > This one can be trivially changed by taking more space out of the vmalloc
> > area, to go much higher if necessary. vmemmap space is always just a fraction
> > of the linear mapping size, so we can accommodate it by definition if we
> > find space to fit the physical memory.
> 
> vmemmap is the total range / page size * sizeof(struct page). So for 1TB
> range and 4K pages we would need 8GB (the current value, unless I
> miscalculated the above). Anyway, you can't cover 1TB range with
> 3-levels.

The size of 'struct page' depends on a couple of configuration variables.
If they are all enabled, you might need a bit more, even for configurations
that don't have that much address space.

> > > Therefore, it would be needed to implement 4 level page table translations
> > > for 4KB pages on 40-bit physical address space platforms. Someone might
> > > suggest use of 64KB pages in this case, but I'm not sure about how to
> > > deal with internal memory fragmentation.
> > > 
> > > I would like to contribute 4 level page table translations to upstream,
> > > the target of which is 3.16 kernel, if there is no movement on it. I saw
> > > some related RFC patches a couple of months ago, but they didn't seem to 
> > > be merged into maintainer's tree.
> > 
> > I think you are answering the wrong question here. Four level page tables
> > should not be required to support >32GB of RAM, that would be very silly.
> 
> I agree, we should only enable 4-levels of page table if we have close
> to 512GB of RAM or the range is too sparse but I would actually push
> back on the hardware guys to keep it tighter.

But remember this part:

> > There are good reasons to use a 50 bit virtual address space in user
> > land, e.g. for supporting data base applications that mmap huge files.

You may actually need 4-level tables even if you have much less installed
memory, depending on how the application is written. Note that x86, powerpc
and s390 all chose to use 4-level tables for 64-bit kernels all the
time, even though they can also use 3-level or 5-level in some cases.

> > If this is not the goal however, we should not pay for the overhead
> > of the extra page table in user space. I can see two other possible
> > solutions for the problem:
> > 
> > a) always use a four-level page table in kernel space, regardless of
> > whether we do it in user space. We can move the kernel mappings down
> > in address space either by one 512GB entry to 0xffffff0000000000, or
> > to match the 64k-page location at 0xfffffc0000000000, or all the way
> > to to 0xfffc000000000000. In any case, we can have all the dynamic
> > mappings within one 512GB area and pretend we have a three-level
> > page table for them, while the rest of DRAM is mapped statically at
> > early boot time using 512GB large pages.
> 
> That's a workaround but we end up with two (or more) kernel pgds - one
> for vmalloc, ioremap etc. and another (static) one for the kernel linear
> mapping. So far there isn't any memory mapping carved out but we have to
> be careful in the future.
> 
> However, kernel page table walking would be a bit slower with 4-levels.

Do we actually walk the kernel page tables that often? With what I suggested,
we can still pretend that it's 3-level for all practical purposes, since
you wouldn't walk the page tables for the linear mapping.

> > b) If there is a reasonable assumption that everybody is using the
> > memory map from [1], then we can change the __virt_to_phys
> > and __phys_to_virt functions to accomodate that and move everything
> > into a flat contiguous virtual address space of 256GB. This would
> > also enable the use of a more efficient mem_map array instead of the
> > vmemmap, but would break running on any system that doesn't follow
> > the same convention. I have no idea yet how common this memory map
> > is, so I can't tell if this would be a realistic solution for what
> > you are targeting. We clearly wouldn't do it if it implies distributions
> > to ship an extra kernel binary for systems based on different memory
> > maps.
> 
> We end up with hacks like the Realview phys/virt conversion. I don't
> think we can guarantee that all ARMv8 platforms would follow the above
> guidance.

What I was thinking is that if all SBSA machines, for instance, follow this
model, then some distros that only support those machines anyway can
turn it on.

	Arnd


* [RFC] ARM64: 4 level page table translation for 4KB pages
  2014-03-31 12:45     ` Catalin Marinas
@ 2014-03-31 12:58       ` Arnd Bergmann
  2014-03-31 15:00         ` Catalin Marinas
  0 siblings, 1 reply; 24+ messages in thread
From: Arnd Bergmann @ 2014-03-31 12:58 UTC (permalink / raw)
  To: linux-arm-kernel

On Monday 31 March 2014 13:45:51 Catalin Marinas wrote:
> On Mon, Mar 31, 2014 at 12:31:14PM +0100, Catalin Marinas wrote:
> > On Mon, Mar 31, 2014 at 07:56:53AM +0100, Arnd Bergmann wrote:
> > > On Monday 31 March 2014 12:51:07 Jungseok Lee wrote:
> > > > Current ARM64 kernel cannot support 4KB pages for 40-bit physical address
> > > > space described in [1] due to one major issue + one minor issue.
> > > > 
> > > > Firstly, kernel logical memory map (0xffffffc000000000-0xffffffffffffffff)
> > > > cannot cover DRAM region from 544GB to 1024GB in [1]. Specifically, ARM64
> > > > kernel fails to create mapping for this region in map_mem function
> > > > (arch/arm64/mm/mmu.c) since __phys_to_virt for this region reaches to
> > > > address overflow. I've used 3.14-rc8+Fast Models to validate the statement.
> [...]
> > > a) always use a four-level page table in kernel space, regardless of
> > > whether we do it in user space. We can move the kernel mappings down
> > > in address space either by one 512GB entry to 0xffffff0000000000, or
> > > to match the 64k-page location at 0xfffffc0000000000, or all the way
> > > to to 0xfffc000000000000. In any case, we can have all the dynamic
> > > mappings within one 512GB area and pretend we have a three-level
> > > page table for them, while the rest of DRAM is mapped statically at
> > > early boot time using 512GB large pages.
> > 
> > That's a workaround but we end up with two (or more) kernel pgds - one
> > for vmalloc, ioremap etc. and another (static) one for the kernel linear
> > mapping. So far there isn't any memory mapping carved out but we have to
> > be careful in the future.
> > 
> > However, kernel page table walking would be a bit slower with 4-levels.
> 
> Yet another approach would be to enable 4-levels of page tables (no
> nopud.h) in the kernel with pgd_offset_k doing the right thing for 4
> levels but user space configured to 3-levels only and pgd_offset
> returning 0 while pud_offset does what pgd_offset currently implements
> for 3 levels.

Either I was unclear earlier, or I misunderstand what you are saying
here. How is that different from what I wrote above?

> This solves the page table walk latency for user but not for kernel.
> Anyway, if the hardware memory map is so sparse (a real SoC, not the
> spec), I don't think we have other ways to support it with 3-levels of
> page table for the kernel, unless we hack __virt_to_phys/__phys_to_virt.

Right. I also wonder whether the SoCs are configurable in the way they
map the memory; they might just be able to do something smarter than
what the document says and map the >2GB memory contiguously starting at
0x08.8000.0000.

	Arnd


* [RFC] ARM64: 4 level page table translation for 4KB pages
  2014-03-31 12:58       ` Arnd Bergmann
@ 2014-03-31 15:00         ` Catalin Marinas
  0 siblings, 0 replies; 24+ messages in thread
From: Catalin Marinas @ 2014-03-31 15:00 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Mar 31, 2014 at 01:58:42PM +0100, Arnd Bergmann wrote:
> On Monday 31 March 2014 13:45:51 Catalin Marinas wrote:
> > On Mon, Mar 31, 2014 at 12:31:14PM +0100, Catalin Marinas wrote:
> > > On Mon, Mar 31, 2014 at 07:56:53AM +0100, Arnd Bergmann wrote:
> > > > On Monday 31 March 2014 12:51:07 Jungseok Lee wrote:
> > > > > Current ARM64 kernel cannot support 4KB pages for 40-bit physical address
> > > > > space described in [1] due to one major issue + one minor issue.
> > > > > 
> > > > > Firstly, kernel logical memory map (0xffffffc000000000-0xffffffffffffffff)
> > > > > cannot cover DRAM region from 544GB to 1024GB in [1]. Specifically, ARM64
> > > > > kernel fails to create mapping for this region in map_mem function
> > > > > (arch/arm64/mm/mmu.c) since __phys_to_virt for this region reaches to
> > > > > address overflow. I've used 3.14-rc8+Fast Models to validate the statement.
> > [...]
> > > > a) always use a four-level page table in kernel space, regardless of
> > > > whether we do it in user space. We can move the kernel mappings down
> > > > in address space either by one 512GB entry to 0xffffff0000000000, or
> > > > to match the 64k-page location at 0xfffffc0000000000, or all the way
> > > > to to 0xfffc000000000000. In any case, we can have all the dynamic
> > > > mappings within one 512GB area and pretend we have a three-level
> > > > page table for them, while the rest of DRAM is mapped statically at
> > > > early boot time using 512GB large pages.
> > > 
> > > That's a workaround but we end up with two (or more) kernel pgds - one
> > > for vmalloc, ioremap etc. and another (static) one for the kernel linear
> > > mapping. So far there isn't any memory mapping carved out but we have to
> > > be careful in the future.
> > > 
> > > However, kernel page table walking would be a bit slower with 4-levels.
> > 
> > Yet another approach would be to enable 4-levels of page tables (no
> > nopud.h) in the kernel with pgd_offset_k doing the right thing for 4
> > levels but user space configured to 3-levels only and pgd_offset
> > returning 0 while pud_offset does what pgd_offset currently implements
> > for 3 levels.
> 
> Either I was unclear earlier, or I misunderstand what you are saying
> here. How is that different from what I wrote above?

It probably isn't; it was just my reading of whether we include
pgtable-nopud.h or not (I thought you said we shouldn't, so that we
pretend we still have 3 levels, with the kernel mapping created
statically at boot but any dynamic mappings using the nopud macros).

> > This solves the page table walk latency for user but not for kernel.
> > Anyway, if the hardware memory map is so sparse (a real SoC, not the
> > spec), I don't think we have other ways to support it with 3-levels of
> > page table for the kernel, unless we hack __virt_to_phys/__phys_to_virt.
> 
> Right. I also wonder if the SoCs are configurable with the way they
> map the memory, they might just be able to do something smarter than
> what the document says and map the >2GB memory contiguously starting at
> 0x08.8000.0000.

This document is pre-ARMv8 (and extended for ARMv8 afterwards) but it
requires that some memory is placed within the 4GB range to be able to
boot in 32-bit mode with the MMU disabled.

What I understood from the hardware guys in the past is that for 4GB of
RAM (or more), they want to place it at 4GB (or at 32GB for 32GB, etc.)
for the chip select. They can create an alias at 2GB, and ARM recommends
hiding the top aliased memory (we have enough fun with software aliases;
hardware ones create some more). But on platforms like Keystone, such
hiding doesn't happen AFAICT.

Arguably, you don't need such a low alias on ARMv8, but you never know
(at least some low secure memory is still useful in case a secure OS is
32-bit only).

-- 
Catalin


* [RFC] ARM64: 4 level page table translation for 4KB pages
  2014-03-31 12:53     ` Arnd Bergmann
@ 2014-03-31 15:27       ` Catalin Marinas
  2014-03-31 23:11         ` Arnd Bergmann
  2014-04-01  0:44         ` 정성진
  0 siblings, 2 replies; 24+ messages in thread
From: Catalin Marinas @ 2014-03-31 15:27 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Mar 31, 2014 at 01:53:20PM +0100, Arnd Bergmann wrote:
> On Monday 31 March 2014 12:31:14 Catalin Marinas wrote:
> > On Mon, Mar 31, 2014 at 07:56:53AM +0100, Arnd Bergmann wrote:
> > > On Monday 31 March 2014 12:51:07 Jungseok Lee wrote:
> > > > Current ARM64 kernel cannot support 4KB pages for 40-bit physical address
> > > > space described in [1] due to one major issue + one minor issue.
> > > > 
> > > > Firstly, kernel logical memory map (0xffffffc000000000-0xffffffffffffffff)
> > > > cannot cover DRAM region from 544GB to 1024GB in [1]. Specifically, ARM64
> > > > kernel fails to create mapping for this region in map_mem function
> > > > (arch/arm64/mm/mmu.c) since __phys_to_virt for this region reaches to
> > > > address overflow. I've used 3.14-rc8+Fast Models to validate the statement.
> > > 
> > > It took me a while to understand what is going on, but it essentially comes
> > > down to the logical memory map (0xffffffc000000000-0xffffffffffffffff)
> > > being able to represent only RAM in the first 256GB of address space.
> > > 
> > > More importantly, this means that any system following [1] will only be
> > > able to use 32GB of RAM, which is a much more severe restriction than
> > > what it sounds like at first.
> > 
> > On a 64-bit platform, do we still need the alias at the bottom and the
> > 512-544GB hole (even for 32-bit DMA, top address bits can be wired to
> > 512GB)? Only the idmap would need 4 levels, but that's static, we don't
> > need to switch Linux to 4-levels. Otherwise the memory is too sparse.
> 
> I think we should keep a static virtual-to-physical mapping,

Just so that I understand: with a PHYS_OFFSET of 0?

> and to keep
> relocating the kernel at compile time without a hack like ARM_PATCH_PHYS_VIRT
> if at all possible.

and the kernel running at a virtual alias at a higher position than the
end of the mapped RAM? IIUC x86_64 does something similar.

> > > > Secondly, vmemmap space is not enough to cover over about 585GB physical
> > > > address space. Fortunately, this issue can be resolved as utilizing an extra
> > > > vmemmap space (0xffffffbe00000000-0xffffffbffbbfffff) in [2]. However,
> > > > it would not cover systems having a couple of terabytes DRAM.
> > > 
> > > This one can be trivially changed by taking more space out of the vmalloc
> > > area, to go much higher if necessary. vmemmap space is always just a fraction
> > > of the linear mapping size, so we can accommodate it by definition if we
> > > find space to fit the physical memory.
> > 
> > vmemmap is the total range / page size * sizeof(struct page). So for 1TB
> > range and 4K pages we would need 8GB (the current value, unless I
> > miscalculated the above). Anyway, you can't cover 1TB range with
> > 3-levels.
> 
> The size of 'struct page' depends on a couple of configuration variables.
> If they are all enabled, you might need a bit more, even for configurations
> that don't have that much address space.

Yes. We could make vmemmap configurable at run-time or just go for a
maximum value.

> > > > Therefore, it would be needed to implement 4 level page table translations
> > > > for 4KB pages on 40-bit physical address space platforms. Someone might
> > > > suggest use of 64KB pages in this case, but I'm not sure about how to
> > > > deal with internal memory fragmentation.
> > > > 
> > > > I would like to contribute 4 level page table translations to upstream,
> > > > the target of which is 3.16 kernel, if there is no movement on it. I saw
> > > > some related RFC patches a couple of months ago, but they didn't seem to 
> > > > be merged into maintainer's tree.
> > > 
> > > I think you are answering the wrong question here. Four level page tables
> > > should not be required to support >32GB of RAM, that would be very silly.
> > 
> > I agree, we should only enable 4-levels of page table if we have close
> > to 512GB of RAM or the range is too sparse but I would actually push
> > back on the hardware guys to keep it tighter.
> 
> But remember this part:
> 
> > > There are good reasons to use a 50 bit virtual address space in user
> > > land, e.g. for supporting data base applications that mmap huge files.
> 
> You may actually need 4-level tables even if you have much less installed
> memory, depending on how the application is written. Note that x86, powerpc
> and s390 all chose to use 4-level tables for 64-bit kernels all the
> time, even though they can also use 3-level or 5-level in some cases.

I don't mind 4-level tables by default but I would still keep a
configuration option (or at least doing some benchmarks to assess the
impact before switching permanently to 4-levels). There are mobile
platforms that don't really need as much VA space (and people are even
talking about ILP32).

> > > If this is not the goal however, we should not pay for the overhead
> > > of the extra page table in user space. I can see two other possible
> > > solutions for the problem:
> > > 
> > > a) always use a four-level page table in kernel space, regardless of
> > > whether we do it in user space. We can move the kernel mappings down
> > > in address space either by one 512GB entry to 0xffffff0000000000, or
> > > to match the 64k-page location at 0xfffffc0000000000, or all the way
> > > to 0xfffc000000000000. In any case, we can have all the dynamic
> > > mappings within one 512GB area and pretend we have a three-level
> > > page table for them, while the rest of DRAM is mapped statically at
> > > early boot time using 512GB large pages.
> > 
> > That's a workaround but we end up with two (or more) kernel pgds - one
> > for vmalloc, ioremap etc. and another (static) one for the kernel linear
> > mapping. So far there isn't any memory mapping carved out but we have to
> > be careful in the future.
> > 
> > However, kernel page table walking would be a bit slower with 4-levels.
> 
> Do we actually walk the kernel page tables that often? With what I suggested,
> we can still pretend that it's 3-level for all practical purposes, since
> you wouldn't walk the page tables for the linear mapping.

I was referring to hardware page table walk (TLB miss). Again, we need
some benchmarks (it gets worse in a guest as it needs to walk the stage
2 for each stage 1 level miss; if you are really unlucky you can have up
to 24 memory accesses for a TLB miss with two translation stages and 4
levels each).

-- 
Catalin

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RFC] ARM64: 4 level page table translation for 4KB pages
  2014-03-31 15:27       ` Catalin Marinas
@ 2014-03-31 23:11         ` Arnd Bergmann
  2014-04-01 13:23           ` Catalin Marinas
  2014-04-01  0:44         ` 정성진
  1 sibling, 1 reply; 24+ messages in thread
From: Arnd Bergmann @ 2014-03-31 23:11 UTC (permalink / raw)
  To: linux-arm-kernel

On Monday 31 March 2014 16:27:19 Catalin Marinas wrote:
> On Mon, Mar 31, 2014 at 01:53:20PM +0100, Arnd Bergmann wrote:
> > On Monday 31 March 2014 12:31:14 Catalin Marinas wrote:
> > > On Mon, Mar 31, 2014 at 07:56:53AM +0100, Arnd Bergmann wrote:
> > > > On Monday 31 March 2014 12:51:07 Jungseok Lee wrote:
> > > > > Current ARM64 kernel cannot support 4KB pages for 40-bit physical address
> > > > > space described in [1] due to one major issue + one minor issue.
> > > > > 
> > > > > Firstly, kernel logical memory map (0xffffffc000000000-0xffffffffffffffff)
> > > > > cannot cover DRAM region from 544GB to 1024GB in [1]. Specifically, ARM64
> > > > > kernel fails to create mapping for this region in map_mem function
> > > > > (arch/arm64/mm/mmu.c) since __phys_to_virt for this region reaches to
> > > > > address overflow. I've used 3.14-rc8+Fast Models to validate the statement.
> > > > 
> > > > It took me a while to understand what is going on, but it essentially comes
> > > > down to the logical memory map (0xffffffc000000000-0xffffffffffffffff)
> > > > being able to represent only RAM in the first 256GB of address space.
> > > > 
> > > > More importantly, this means that any system following [1] will only be
> > > > able to use 32GB of RAM, which is a much more severe restriction than
> > > > what it sounds like at first.
> > > 
> > > On a 64-bit platform, do we still need the alias at the bottom and the
> > > 512-544GB hole (even for 32-bit DMA, top address bits can be wired to
> > > 512GB)? Only the idmap would need 4 levels, but that's static, we don't
> > > need to switch Linux to 4-levels. Otherwise the memory is too sparse.
> > 
> > I think we should keep a static virtual-to-physical mapping,
> 
> Just so that I understand: with a PHYS_OFFSET of 0?

I hadn't realized at first that it's variable, but I guess 0 would be the easiest,
otherwise we wouldn't be able to use 512GB pages to map the high memory range.

> > and to keep
> > relocating the kernel at compile time without a hack like ARM_PATCH_PHYS_VIRT
> > if at all possible.
> 
> and the kernel running at a virtual alias at a higher position than the
> end of the mapped RAM? IIUC x86_64 does something similar.

That would work, yes.

Another idea is to always run the kernel at PAGE_OFFSET, as today, but create
an alias there if there isn't already RAM at that location with the fixed
PHYS_OFFSET.

> > > > > Secondly, vmemmap space is not enough to cover over about 585GB physical
> > > > > address space. Fortunately, this issue can be resolved as utilizing an extra
> > > > > vmemmap space (0xffffffbe00000000-0xffffffbffbbfffff) in [2]. However,
> > > > > it would not cover systems having a couple of terabytes DRAM.
> > > > 
> > > > This one can be trivially changed by taking more space out of the vmalloc
> > > > area, to go much higher if necessary. vmemmap space is always just a fraction
> > > > of the linear mapping size, so we can accommodate it by definition if we
> > > > find space to fit the physical memory.
> > > 
> > > vmemmap is the total range / page size * sizeof(struct page). So for 1TB
> > > range and 4K pages we would need 8GB (the current value, unless I
> > > miscalculated the above). Anyway, you can't cover 1TB range with
> > > 3-levels.
> > 
> > The size of 'struct page' depends on a couple of configuration variables.
> > If they are all enabled, you might need a bit more, even for configurations
> > that don't have that much address space.
> 
> Yes. We could make vmemmap configurable at run-time or just go for a
> maximum value.

I would just aim for 'large enough': pick a reasonable maximum RAM size
and then leave space for four times as much mem_map as we need.

> > > > > Therefore, it would be needed to implement 4 level page table translations
> > > > > for 4KB pages on 40-bit physical address space platforms. Someone might
> > > > > suggest use of 64KB pages in this case, but I'm not sure about how to
> > > > > deal with internal memory fragmentation.
> > > > > 
> > > > > I would like to contribute 4 level page table translations to upstream,
> > > > > the target of which is 3.16 kernel, if there is no movement on it. I saw
> > > > > some related RFC patches a couple of months ago, but they didn't seem to 
> > > > > be merged into maintainer's tree.
> > > > 
> > > > I think you are answering the wrong question here. Four level page tables
> > > > should not be required to support >32GB of RAM, that would be very silly.
> > > 
> > > I agree, we should only enable 4-levels of page table if we have close
> > > to 512GB of RAM or the range is too sparse but I would actually push
> > > back on the hardware guys to keep it tighter.
> > 
> > But remember this part:
> > 
> > > > There are good reasons to use a 50 bit virtual address space in user
> > > > land, e.g. for supporting data base applications that mmap huge files.
> > 
> > You may actually need 4-level tables even if you have much less installed
> > memory, depending on how the application is written. Note that x86, powerpc
> > and s390 all chose to use 4-level tables for 64-bit kernels all the
> > time, even though they can also use 3-level or 5-level in some cases.
> 
> I don't mind 4-level tables by default but I would still keep a
> configuration option (or at least doing some benchmarks to assess the
> impact before switching permanently to 4-levels). There are mobile
> platforms that don't really need as much VA space (and people are even
> talking about ILP32).

Yes, I wasn't suggesting we do it all the time. A related question
is whether we would also want to support 3-level 64k page tables, to
extend the addressable area from 42 bit (4TB) to 55 bit (large enough).
Is that actually a supported configuration?

> > > > If this is not the goal however, we should not pay for the overhead
> > > > of the extra page table in user space. I can see two other possible
> > > > solutions for the problem:
> > > > 
> > > > a) always use a four-level page table in kernel space, regardless of
> > > > whether we do it in user space. We can move the kernel mappings down
> > > > in address space either by one 512GB entry to 0xffffff0000000000, or
> > > > to match the 64k-page location at 0xfffffc0000000000, or all the way
> > > > to 0xfffc000000000000. In any case, we can have all the dynamic
> > > > mappings within one 512GB area and pretend we have a three-level
> > > > page table for them, while the rest of DRAM is mapped statically at
> > > > early boot time using 512GB large pages.
> > > 
> > > That's a workaround but we end up with two (or more) kernel pgds - one
> > > for vmalloc, ioremap etc. and another (static) one for the kernel linear
> > > mapping. So far there isn't any memory mapping carved out but we have to
> > > be careful in the future.
> > > 
> > > However, kernel page table walking would be a bit slower with 4-levels.
> > 
> > Do we actually walk the kernel page tables that often? With what I suggested,
> > we can still pretend that it's 3-level for all practical purposes, since
> > you wouldn't walk the page tables for the linear mapping.
> 
> I was referring to hardware page table walk (TLB miss). Again, we need
> some benchmarks (it gets worse in a guest as it needs to walk the stage
> 2 for each stage 1 level miss; if you are really unlucky you can have up
> to 24 memory accesses for a TLB miss with two translation stages and 4
> levels each).

Ah right, of course. It would only be important for MMIO mappings though, as
the linear mapping can be done using 1GB or 512GB large pages, and these
tend to not cause noticeable overhead during lookup.

	Arnd


* [RFC] ARM64: 4 level page table translation for 4KB pages
  2014-03-31 11:31   ` Catalin Marinas
  2014-03-31 12:45     ` Catalin Marinas
  2014-03-31 12:53     ` Arnd Bergmann
@ 2014-04-01  0:42     ` Jungseok Lee
  2 siblings, 0 replies; 24+ messages in thread
From: Jungseok Lee @ 2014-04-01  0:42 UTC (permalink / raw)
  To: linux-arm-kernel

On Monday, March 31, 2014 8:31 PM, Catalin Marinas wrote:
> On Mon, Mar 31, 2014 at 07:56:53AM +0100, Arnd Bergmann wrote:
> > On Monday 31 March 2014 12:51:07 Jungseok Lee wrote:
> > > Current ARM64 kernel cannot support 4KB pages for 40-bit physical address
> > > space described in [1] due to one major issue + one minor issue.
> > >
> > > Firstly, kernel logical memory map (0xffffffc000000000-0xffffffffffffffff)
> > > cannot cover DRAM region from 544GB to 1024GB in [1]. Specifically, ARM64
> > > kernel fails to create mapping for this region in map_mem function
> > > (arch/arm64/mm/mmu.c) since __phys_to_virt for this region reaches to
> > > address overflow. I've used 3.14-rc8+Fast Models to validate the statement.
> >
> > It took me a while to understand what is going on, but it essentially comes
> > down to the logical memory map (0xffffffc000000000-0xffffffffffffffff)
> > being able to represent only RAM in the first 256GB of address space.
> >
> > More importantly, this means that any system following [1] will only be
> > able to use 32GB of RAM, which is a much more severe restriction than
> > what it sounds like at first.
> 
> On a 64-bit platform, do we still need the alias at the bottom and the
> 512-544GB hole (even for 32-bit DMA, top address bits can be wired to
> 512GB)? Only the idmap would need 4 levels, but that's static, we don't
> need to switch Linux to 4-levels. Otherwise the memory is too sparse.
> 
> Of course, if you have 512GB of RAM and you want 4K pages, 3 levels are
> no longer enough (with 64K pages you get to 42-bit VA space).
> 
> > > Secondly, vmemmap space is not enough to cover over about 585GB physical
> > > address space. Fortunately, this issue can be resolved as utilizing an extra
> > > vmemmap space (0xffffffbe00000000-0xffffffbffbbfffff) in [2]. However,
> > > it would not cover systems having a couple of terabytes DRAM.
> >
> > This one can be trivially changed by taking more space out of the vmalloc
> > area, to go much higher if necessary. vmemmap space is always just a fraction
> > of the linear mapping size, so we can accommodate it by definition if we
> > find space to fit the physical memory.
> 
> vmemmap is the total range / page size * sizeof(struct page). So for 1TB
> range and 4K pages we would need 8GB (the current value, unless I
> miscalculated the above). Anyway, you can't cover 1TB range with
> 3-levels.
> 
> > > Therefore, it would be needed to implement 4 level page table translations
> > > for 4KB pages on 40-bit physical address space platforms. Someone might
> > > suggest use of 64KB pages in this case, but I'm not sure about how to
> > > deal with internal memory fragmentation.
> > >
> > > I would like to contribute 4 level page table translations to upstream,
> > > the target of which is 3.16 kernel, if there is no movement on it. I saw
> > > some related RFC patches a couple of months ago, but they didn't seem to
> > > be merged into maintainer's tree.
> >
> > I think you are answering the wrong question here. Four level page tables
> > should not be required to support >32GB of RAM, that would be very silly.
> 
> I agree, we should only enable 4-levels of page table if we have close
> to 512GB of RAM or the range is too sparse but I would actually push
> back on the hardware guys to keep it tighter.

This is my point. If an SoC design follows the document [1], RAM beyond
32GB has to be placed from 544GB upwards. Under this severe restriction,
even a 64GB system would use the region from 544GB to 576GB for only
32GB. As you said, the memory map is too sparse. Naturally, that leads
to enabling 4-level page tables unless the __virt_to_phys and
__phys_to_virt functions are hacked.

I agree that we should enable 4-level page tables when one of the above
conditions is met.

> > There are good reasons to use a 50 bit virtual address space in user
> > land, e.g. for supporting data base applications that mmap huge files.
> >
> > If this is not the goal however, we should not pay for the overhead
> > of the extra page table in user space. I can see two other possible
> > solutions for the problem:
> >
> > a) always use a four-level page table in kernel space, regardless of
> > whether we do it in user space. We can move the kernel mappings down
> > in address space either by one 512GB entry to 0xffffff0000000000, or
> > to match the 64k-page location at 0xfffffc0000000000, or all the way
> > to 0xfffc000000000000. In any case, we can have all the dynamic
> > mappings within one 512GB area and pretend we have a three-level
> > page table for them, while the rest of DRAM is mapped statically at
> > early boot time using 512GB large pages.
> 
> That's a workaround but we end up with two (or more) kernel pgds - one
> for vmalloc, ioremap etc. and another (static) one for the kernel linear
> mapping. So far there isn't any memory mapping carved out but we have to
> be careful in the future.
>
> However, kernel page table walking would be a bit slower with 4-levels.
> 
> > b) If there is a reasonable assumption that everybody is using the
> > memory map from [1], then we can change the __virt_to_phys
> > and __phys_to_virt functions to accomodate that and move everything
> > into a flat contiguous virtual address space of 256GB. This would
> > also enable the use of a more efficient mem_map array instead of the
> > vmemmap, but would break running on any system that doesn't follow
> > the same convention. I have no idea yet how common this memory map
> > is, so I can't tell if this would be a realistic solution for what
> > you are targeting. We clearly wouldn't do it if it implies distributions
> > to ship an extra kernel binary for systems based on different memory
> > maps.
> 
> We end up with hacks like the Realview phys/virt conversion. I don't
> think we can guarantee that all ARMv8 platforms would follow the above
> guidance.
> 
> --
> Catalin


* [RFC] ARM64: 4 level page table translation for 4KB pages
  2014-03-31 15:27       ` Catalin Marinas
  2014-03-31 23:11         ` Arnd Bergmann
@ 2014-04-01  0:44         ` 정성진
  2014-04-01  9:46           ` Catalin Marinas
  1 sibling, 1 reply; 24+ messages in thread
From: 정성진 @ 2014-04-01  0:44 UTC (permalink / raw)
  To: linux-arm-kernel

On Tuesday, April 01, 2014 12:27 AM Catalin Marinas wrote:
> On Mon, Mar 31, 2014 at 01:53:20PM +0100, Arnd Bergmann wrote:
> > On Monday 31 March 2014 12:31:14 Catalin Marinas wrote:
> > > On Mon, Mar 31, 2014 at 07:56:53AM +0100, Arnd Bergmann wrote:
> > > > On Monday 31 March 2014 12:51:07 Jungseok Lee wrote:
> > > > > Current ARM64 kernel cannot support 4KB pages for 40-bit physical address
> > > > > space described in [1] due to one major issue + one minor issue.
> > > > >
> > > > > Firstly, kernel logical memory map (0xffffffc000000000-0xffffffffffffffff)
> > > > > cannot cover DRAM region from 544GB to 1024GB in [1]. Specifically, ARM64
> > > > > kernel fails to create mapping for this region in map_mem function
> > > > > (arch/arm64/mm/mmu.c) since __phys_to_virt for this region reaches to
> > > > > address overflow. I've used 3.14-rc8+Fast Models to validate the statement.
> > > >
> > > > It took me a while to understand what is going on, but it essentially comes
> > > > down to the logical memory map (0xffffffc000000000-0xffffffffffffffff)
> > > > being able to represent only RAM in the first 256GB of address space.
> > > >
> > > > More importantly, this means that any system following [1] will only be
> > > > able to use 32GB of RAM, which is a much more severe restriction than
> > > > what it sounds like at first.
> > >
> > > On a 64-bit platform, do we still need the alias at the bottom and the
> > > 512-544GB hole (even for 32-bit DMA, top address bits can be wired to
> > > 512GB)? Only the idmap would need 4 levels, but that's static, we don't
> > > need to switch Linux to 4-levels. Otherwise the memory is too sparse.
> >
> > I think we should keep a static virtual-to-physical mapping,
> 
> Just so that I understand: with a PHYS_OFFSET of 0?
> 
> > and to keep
> > relocating the kernel at compile time without a hack like ARM_PATCH_PHYS_VIRT
> > if at all possible.
> 
> and the kernel running at a virtual alias at a higher position than the
> end of the mapped RAM? IIUC x86_64 does something similar.
> 
> > > > > Secondly, vmemmap space is not enough to cover over about 585GB physical
> > > > > address space. Fortunately, this issue can be resolved as utilizing an extra
> > > > > vmemmap space (0xffffffbe00000000-0xffffffbffbbfffff) in [2]. However,
> > > > > it would not cover systems having a couple of terabytes DRAM.
> > > >
> > > > This one can be trivially changed by taking more space out of the vmalloc
> > > > area, to go much higher if necessary. vmemmap space is always just a fraction
> > > > of the linear mapping size, so we can accommodate it by definition if we
> > > > find space to fit the physical memory.
> > >
> > > vmemmap is the total range / page size * sizeof(struct page). So for 1TB
> > > range and 4K pages we would need 8GB (the current value, unless I
> > > miscalculated the above). Anyway, you can't cover 1TB range with
> > > 3-levels.
> >
> > The size of 'struct page' depends on a couple of configuration variables.
> > If they are all enabled, you might need a bit more, even for configurations
> > that don't have that much address space.
> 
> Yes. We could make vmemmap configurable at run-time or just go for a
> maximum value.
> 
> > > > > Therefore, it would be needed to implement 4 level page table translations
> > > > > for 4KB pages on 40-bit physical address space platforms. Someone might
> > > > > suggest use of 64KB pages in this case, but I'm not sure about how to
> > > > > deal with internal memory fragmentation.
> > > > >
> > > > > I would like to contribute 4 level page table translations to upstream,
> > > > > the target of which is 3.16 kernel, if there is no movement on it. I saw
> > > > > some related RFC patches a couple of months ago, but they didn't seem to
> > > > > be merged into maintainer's tree.
> > > >
> > > > I think you are answering the wrong question here. Four level page tables
> > > > should not be required to support >32GB of RAM, that would be very silly.
> > >
> > > I agree, we should only enable 4-levels of page table if we have close
> > > to 512GB of RAM or the range is too sparse but I would actually push
> > > back on the hardware guys to keep it tighter.
> >
> > But remember this part:
> >
> > > > There are good reasons to use a 50 bit virtual address space in user
> > > > land, e.g. for supporting data base applications that mmap huge files.
> >
> > You may actually need 4-level tables even if you have much less installed
> > memory, depending on how the application is written. Note that x86, powerpc
> > and s390 all chose to use 4-level tables for 64-bit kernels all the
> > time, even though they can also use 3-level or 5-level in some cases.
> 
> I don't mind 4-level tables by default but I would still keep a
> configuration option (or at least doing some benchmarks to assess the
> impact before switching permanently to 4-levels). There are mobile
> platforms that don't really need as much VA space (and people are even
> talking about ILP32).

Hi,
How about keeping 3-level tables by default and enabling 4-level tables
with a config option?
Asymmetric levels for kernel and user land would make the code complicated.
And usually more memory means that user applications tend to use more
memory, so I suggest the same virtual space for both.

> 
> > > > If this is not the goal however, we should not pay for the overhead
> > > > of the extra page table in user space. I can see two other possible
> > > > solutions for the problem:
> > > >
> > > > a) always use a four-level page table in kernel space, regardless of
> > > > whether we do it in user space. We can move the kernel mappings down
> > > > in address space either by one 512GB entry to 0xffffff0000000000, or
> > > > to match the 64k-page location at 0xfffffc0000000000, or all the way
> > > > to 0xfffc000000000000. In any case, we can have all the dynamic
> > > > mappings within one 512GB area and pretend we have a three-level
> > > > page table for them, while the rest of DRAM is mapped statically at
> > > > early boot time using 512GB large pages.
> > >
> > > That's a workaround but we end up with two (or more) kernel pgds - one
> > > for vmalloc, ioremap etc. and another (static) one for the kernel linear
> > > mapping. So far there isn't any memory mapping carved out but we have to
> > > be careful in the future.
> > >
> > > However, kernel page table walking would be a bit slower with 4-levels.
> >
> > Do we actually walk the kernel page tables that often? With what I suggested,
> > we can still pretend that it's 3-level for all practical purposes, since
> > you wouldn't walk the page tables for the linear mapping.
> 
> I was referring to hardware page table walk (TLB miss). Again, we need
> some benchmarks (it gets worse in a guest as it needs to walk the stage
> 2 for each stage 1 level miss; if you are really unlucky you can have up
> to 24 memory accesses for a TLB miss with two translation stages and 4
> levels each).
> 
> --
> Catalin


* [RFC] ARM64: 4 level page table translation for 4KB pages
  2014-04-01  0:44         ` 정성진
@ 2014-04-01  9:46           ` Catalin Marinas
  2014-04-01 10:13             ` 정성진
  0 siblings, 1 reply; 24+ messages in thread
From: Catalin Marinas @ 2014-04-01  9:46 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Apr 01, 2014 at 01:44:36AM +0100, 정성진 wrote:
> On Tuesday, April 01, 2014 12:27 AM Catalin Marinas wrote:
> > I don't mind 4-level tables by default but I would still keep a
> > configuration option (or at least doing some benchmarks to assess the
> > impact before switching permanently to 4-levels). There are mobile
> > platforms that don't really need as much VA space (and people are even
> > talking about ILP32).
> 
> How about keeping 3-level tables by default and enabling 4-level tables
> with a config option?

We want single image, so the default should cover all platforms. If
someone wants to deploy a more specific kernel (e.g. for a mobile
platform), they could change the configuration to more suitable ones.

> Asymmetric levels for kernel and user land would make the code complicated.
> And usually more memory means that user applications tend to use more
> memory, so I suggest the same virtual space for both.

I don't really get the "more memory means ..." above ;). It's more
virtual space, the user application would not extend to use all of it
(most likely won't even notice).

Asymmetry wouldn't make things much more complicated. You can even
pretend you have 4 levels configured but use pgd_offset tricks to
effectively use 3 hardware levels.

-- 
Catalin


* [RFC] ARM64: 4 level page table translation for 4KB pages
  2014-04-01  9:46           ` Catalin Marinas
@ 2014-04-01 10:13             ` 정성진
  2014-04-01 11:22               ` Catalin Marinas
  0 siblings, 1 reply; 24+ messages in thread
From: 정성진 @ 2014-04-01 10:13 UTC (permalink / raw)
  To: linux-arm-kernel

On Tuesday, April 01, 2014 6:47 PM, Catalin Marinas wrote:
> On Tue, Apr 01, 2014 at 01:44:36AM +0100, 정성진 wrote:
> > On Tuesday, April 01, 2014 12:27 AM Catalin Marinas wrote:
> > > I don't mind 4-level tables by default but I would still keep a
> > > configuration option (or at least doing some benchmarks to assess the
> > > impact before switching permanently to 4-levels). There are mobile
> > > platforms that don't really need as much VA space (and people are even
> > > talking about ILP32).
> >
> > How about keeping 3-level tables by default and enabling 4-level tables
> > with a config option?
> 
> We want single image, so the default should cover all platforms. If
> someone wants to deploy a more specific kernel (e.g. for a mobile
> platform), they could change the configuration to more suitable ones.
A single image has merit, but I think one setting should be the default
configuration and the other a specific kernel for smaller or larger
memory platforms.
Which one do you want as the default configuration, 3 levels or 4 levels?
> 
> > Asymmetric levels for kernel and user land would make the code complicated.
> > And usually more memory means that user applications tend to use more
> > memory, so I suggest the same virtual space for both.
> 
> I don't really get the "more memory means ..." above ;). It's more
> virtual space, the user application would not extend to use all of it
> (most likely won't even notice).
The reason we need a 4-level page table in the first place is that we
need bigger physical memory and the kernel does not have enough virtual
space. Please consider that this is not only about making the virtual
space bigger; it is about letting the system use large physical memory
for user applications.
> 
> Asymmetry wouldn't make things much more complicated. You can even
> pretend you have 4 levels configured but use pgd_offset tricks to
> effectively use 3 hardware levels.
User applications might also need that much memory, and hence a 4-level
page table, in cases where the kernel needs 4 levels.
> 
> --
> Catalin


* [RFC] ARM64: 4 level page table translation for 4KB pages
  2014-04-01 10:13             ` 정성진
@ 2014-04-01 11:22               ` Catalin Marinas
  2014-04-01 23:35                 ` Sungjinn Chung
  0 siblings, 1 reply; 24+ messages in thread
From: Catalin Marinas @ 2014-04-01 11:22 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Apr 01, 2014 at 11:13:27AM +0100, 정성진 wrote:
> On Tuesday, April 01, 2014 6:47 PM, Catalin Marinas wrote:
> > On Tue, Apr 01, 2014 at 01:44:36AM +0100, 정성진 wrote:
> > > On Tuesday, April 01, 2014 12:27 AM Catalin Marinas wrote:
> > > > I don't mind 4-level tables by default but I would still keep a
> > > > configuration option (or at least doing some benchmarks to assess the
> > > > impact before switching permanently to 4-levels). There are mobile
> > > > platforms that don't really need as much VA space (and people are even
> > > > talking about ILP32).
> > >
> > > How about keeping 3-level tables by default and enabling 4-level tables
> > > with a config option?
> > 
> > We want single image, so the default should cover all platforms. If
> > someone wants to deploy a more specific kernel (e.g. for a mobile
> > platform), they could change the configuration to more suitable ones.
> Single image would have merit but I think one way should be default 
> Configuration and another should be specific kernel for smaller or lager 
> memory platform.
> Which one do you want as default configuration among 3 level and 4 level?

The default should be 4 levels to cover all cases. You can tweak the
config for your platform if you want to deploy a specific kernel image.

> > > Asymmetric levels for kernel and user land would make the code complicated.
> > > And usually more memory means that user applications tend to use more
> > > memory, so I suggest the same virtual space for both.
> > 
> > I don't really get the "more memory means ..." above ;). It's more
> > virtual space, the user application would not extend to use all of it
> > (most likely won't even notice).
> The reason why we need 4 level page table is originally is that we need
> bigger physical memory and kernel does not have enough virtual space.

So, you need more virtual space in the kernel to be able to linearly map
the physical address space. That's fine.

> Please consider this is not only for making virtual bigger.
> It's to make system use large physical memory for user application.

Do you want to use more than 512GB of physical memory for a user
application?

-- 
Catalin


* [RFC] ARM64: 4 level page table translation for 4KB pages
  2014-03-31 23:11         ` Arnd Bergmann
@ 2014-04-01 13:23           ` Catalin Marinas
  2014-04-02  3:58             ` Jungseok Lee
  0 siblings, 1 reply; 24+ messages in thread
From: Catalin Marinas @ 2014-04-01 13:23 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Apr 01, 2014 at 12:11:34AM +0100, Arnd Bergmann wrote:
> On Monday 31 March 2014 16:27:19 Catalin Marinas wrote:
> > On Mon, Mar 31, 2014 at 01:53:20PM +0100, Arnd Bergmann wrote:
> > > On Monday 31 March 2014 12:31:14 Catalin Marinas wrote:
> > > > On Mon, Mar 31, 2014 at 07:56:53AM +0100, Arnd Bergmann wrote:
> > > > > On Monday 31 March 2014 12:51:07 Jungseok Lee wrote:
> > > > > > Current ARM64 kernel cannot support 4KB pages for 40-bit physical address
> > > > > > space described in [1] due to one major issue + one minor issue.
> > > > > > 
> > > > > > Firstly, kernel logical memory map (0xffffffc000000000-0xffffffffffffffff)
> > > > > > cannot cover DRAM region from 544GB to 1024GB in [1]. Specifically, ARM64
> > > > > > kernel fails to create mapping for this region in map_mem function
> > > > > > (arch/arm64/mm/mmu.c) since __phys_to_virt for this region reaches to
> > > > > > address overflow. I've used 3.14-rc8+Fast Models to validate the statement.
> > > > > 
> > > > > It took me a while to understand what is going on, but it essentially comes
> > > > > down to the logical memory map (0xffffffc000000000-0xffffffffffffffff)
> > > > > being able to represent only RAM in the first 256GB of address space.
> > > > > 
> > > > > More importantly, this means that any system following [1] will only be
> > > > > able to use 32GB of RAM, which is a much more severe restriction than
> > > > > what it sounds like at first.
> > > > 
> > > > On a 64-bit platform, do we still need the alias at the bottom and the
> > > > 512-544GB hole (even for 32-bit DMA, top address bits can be wired to
> > > > 512GB)? Only the idmap would need 4 levels, but that's static, we don't
> > > > need to switch Linux to 4-levels. Otherwise the memory is too sparse.
> > > 


> > > I think we should keep a static virtual-to-physical mapping,
> > 
> > Just so that I understand: with a PHYS_OFFSET of 0?
> 
> I hadn't realized at first that it's variable, but I guess 0 would be the easiest,
> otherwise we wouldn't be able to use 512GB pages to map the high memory range.
> 
> > > and to keep
> > > relocating the kernel at compile time without a hack like ARM_PATCH_PHYS_VIRT
> > > if at all possible.
> > 
> > and the kernel running at a virtual alias at a higher position than the
> > end of the mapped RAM? IIUC x86_64 does something similar.
> 
> That would work, yes.
> 
> Another idea is to always run the kernel at PAGE_OFFSET, as today, but create
> an alias there if there isn't already RAM at that location with the fixed
> PHYS_OFFSET.

As long as we don't have overlap in the VA space between the start of
RAM and the end of the mapped kernel.

There may be other tricky bits with KVM and how the EL2 code is mapped.

> > > > > There are good reasons to use a 50 bit virtual address space in user
> > > > > land, e.g. for supporting data base applications that mmap huge files.
> > > 
> > > You may actually need 4-level tables even if you have much less installed
> > > memory, depending on how the application is written. Note that x86, powerpc
> > > and s390 all chose to use 4-level tables for 64-bit kernels all the
> > > time, even though they can also use 3-level or 5-level in some cases.
> > 
> > I don't mind 4-level tables by default but I would still keep a
> > configuration option (or at least doing some benchmarks to assess the
> > impact before switching permanently to 4-levels). There are mobile
> > platforms that don't really need as much VA space (and people are even
> > talking about ILP32).
> 
> Yes, I wasn't suggesting we do it all the time. A related question
> is whether we would also want to support 3-level 64k page tables, to
> extend the addressable area from 42 bit (4TB) to 55 bit (large enough).
> Is that actually a supported configuration?

It can go up to 48-bit maximum (with some extra reserved bits in the
architecture, just in case more will be needed).

On some previous patches I've seen posted for 4-levels I asked that 64K
and 4K page configurations are decoupled from the pgtable-?level.h
macros so that if we ever need 3-levels with 64K it's easy to enable.
For the time being, I don't see a need (well, unless someone plans to
have 1TB of memory and use the exponential memory map document).

-- 
Catalin


* [RFC] ARM64: 4 level page table translation for 4KB pages
  2014-04-01 11:22               ` Catalin Marinas
@ 2014-04-01 23:35                 ` Sungjinn Chung
  0 siblings, 0 replies; 24+ messages in thread
From: Sungjinn Chung @ 2014-04-01 23:35 UTC (permalink / raw)
  To: linux-arm-kernel

On Tuesday, April 01, 2014 8:23 PM, Catalin Marinas wrote:
> On Tue, Apr 01, 2014 at 11:13:27AM +0100, 정성진 wrote:
> > On Tuesday, April 01, 2014 6:47 PM, Catalin Marinas wrote:
> > > On Tue, Apr 01, 2014 at 01:44:36AM +0100, 정성진 wrote:
> > > > On Tuesday, April 01, 2014 12:27 AM Catalin Marinas wrote:
> > > > > I don't mind 4-level tables by default but I would still keep a
> > > > > configuration option (or at least doing some benchmarks to assess the
> > > > > impact before switching permanently to 4-levels). There are mobile
> > > > > platforms that don't really need as much VA space (and people are even
> > > > > talking about ILP32).
> > > >
> > > > How about keep 3-level table by default and enable 4-level table with
> > > > config option?
> > >
> > > We want single image, so the default should cover all platforms. If
> > > someone wants to deploy a more specific kernel (e.g. for a mobile
> > > platform), they could change the configuration to more suitable ones.
> > A single image would have merit, but I think one way should be the default
> > configuration and another should be a specific kernel for smaller or larger
> > memory platforms.
> > Which one do you want as the default configuration, 3-level or 4-level?
> 
> The default should be 4 levels to cover all cases. You can tweak the
> config for your platform if you want to deploy a specific kernel image.
Ok.
> 
> > > > Asymmetric levels for kernel and userland would make the code complicated.
> > > > And usually more memory means that user applications tend to use more memory.
> > > > So I suggest the same virtual address space for both.
> > >
> > > I don't really get the "more memory means ..." above ;). It's more
> > > virtual space, the user application would not extend to use all of it
> > > (most likely won't even notice).
> > The original reason why we need 4-level page tables is that we need
> > bigger physical memory and the kernel does not have enough virtual space.
> 
> So, you need more virtual space in the kernel to be able to linearly map
> the physical address space. That's fine.
> 
> > Please consider that this is not only for making the virtual space bigger.
> > It's to let the system use large physical memory for user applications.
> 
> Do you want to use more than 512GB of physical memory for a user
> application?
I guess in-memory databases might be the first area.
> 
> --
> Catalin


* [RFC] ARM64: 4 level page table translation for 4KB pages
  2014-04-01 13:23           ` Catalin Marinas
@ 2014-04-02  3:58             ` Jungseok Lee
  2014-04-02  9:01               ` Catalin Marinas
  0 siblings, 1 reply; 24+ messages in thread
From: Jungseok Lee @ 2014-04-02  3:58 UTC (permalink / raw)
  To: linux-arm-kernel

On Tuesday, April 01, 2014 10:23 PM, Catalin Marinas wrote:
> On Tue, Apr 01, 2014 at 12:11:34AM +0100, Arnd Bergmann wrote:
> > On Monday 31 March 2014 16:27:19 Catalin Marinas wrote:
> > > On Mon, Mar 31, 2014 at 01:53:20PM +0100, Arnd Bergmann wrote:
> > > > On Monday 31 March 2014 12:31:14 Catalin Marinas wrote:
> > > > > On Mon, Mar 31, 2014 at 07:56:53AM +0100, Arnd Bergmann wrote:
> > > > > > On Monday 31 March 2014 12:51:07 Jungseok Lee wrote:
> > > > > > > Current ARM64 kernel cannot support 4KB pages for 40-bit physical address
> > > > > > > space described in [1] due to one major issue + one minor issue.
> > > > > > >
> > > > > > > Firstly, kernel logical memory map (0xffffffc000000000-0xffffffffffffffff)
> > > > > > > cannot cover DRAM region from 544GB to 1024GB in [1]. Specifically, ARM64
> > > > > > > kernel fails to create mapping for this region in map_mem function
> > > > > > > (arch/arm64/mm/mmu.c) since __phys_to_virt for this region reaches to
> > > > > > > address overflow. I've used 3.14-rc8+Fast Models to validate the statement.
> > > > > >
> > > > > > It took me a while to understand what is going on, but it essentially comes
> > > > > > down to the logical memory map (0xffffffc000000000-0xffffffffffffffff)
> > > > > > being able to represent only RAM in the first 256GB of address space.
> > > > > >
> > > > > > More importantly, this means that any system following [1] will only be
> > > > > > able to use 32GB of RAM, which is a much more severe restriction than
> > > > > > what it sounds like at first.
> > > > >
> > > > > On a 64-bit platform, do we still need the alias at the bottom and the
> > > > > 512-544GB hole (even for 32-bit DMA, top address bits can be wired to
> > > > > 512GB)? Only the idmap would need 4 levels, but that's static, we don't
> > > > > need to switch Linux to 4-levels. Otherwise the memory is too sparse.
> > > >
> 
> 
> > > > I think we should keep a static virtual-to-physical mapping,
> > >
> > > Just so that I understand: with a PHYS_OFFSET of 0?
> >
> > I hadn't realized at first that it's variable, but I guess 0 would be the easiest,
> > otherwise we wouldn't be able to use 512GB pages to map the high memory range.
> >
> > > > and to keep
> > > > relocating the kernel at compile time without a hack like ARM_PATCH_PHYS_VIRT
> > > > if at all possible.
> > >
> > > and the kernel running at a virtual alias at a higher position than the
> > > end of the mapped RAM? IIUC x86_64 does something similar.
> >
> > That would work, yes.
> >
> > Another idea is to always run the kernel at PAGE_OFFSET, as today, but create
> > an alias there if there isn't already RAM at that location with the fixed
> > PHYS_OFFSET.
> 
> As long as we don't have overlap in the VA space between the start of
> RAM and the end of the mapped kernel.
> 
> There may be other tricky bits with KVM and how the EL2 code is mapped.
> 
> > > > > > There are good reasons to use a 50 bit virtual address space in user
> > > > > > land, e.g. for supporting data base applications that mmap huge files.
> > > >
> > > > You may actually need 4-level tables even if you have much less installed
> > > > memory, depending on how the application is written. Note that x86, powerpc
> > > > and s390 all chose to use 4-level tables for 64-bit kernels all the
> > > > time, even though they can also use 3-level or 5-level in some cases.
> > >
> > > I don't mind 4-level tables by default but I would still keep a
> > > configuration option (or at least doing some benchmarks to assess the
> > > impact before switching permanently to 4-levels). There are mobile
> > > platforms that don't really need as much VA space (and people are even
> > > talking about ILP32).
> >
> > Yes, I wasn't suggesting we do it all the time. A related question
> > is whether we would also want to support 3-level 64k page tables, to
> > extend the addressable area from 42 bit (4TB) to 55 bit (large enough).
> > Is that actually a supported configuration?
> 
> It can go up to 48-bit maximum (with some extra reserved bits in the
> architecture, just in case more will be needed).
> 
> On some previous patches I've seen posted for 4-levels I asked that 64K
> and 4K page configurations are decoupled from the pgtable-?level.h
> macros so that if we ever need 3-levels with 64K it's easy to enable.

Is your request to decouple the page size from the number of page table levels?
In other words, would you like to prepare 4 options, 1) 4KB+3-level, 2)
4KB+4-level, 3) 64KB+2-level and 4) 64KB+3-level, combining page size
with page table levels in the kernel configuration?

Best Regards
Jungseok Lee


* [RFC] ARM64: 4 level page table translation for 4KB pages
  2014-04-02  3:58             ` Jungseok Lee
@ 2014-04-02  9:01               ` Catalin Marinas
  2014-04-02 15:24                 ` Catalin Marinas
  0 siblings, 1 reply; 24+ messages in thread
From: Catalin Marinas @ 2014-04-02  9:01 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Apr 02, 2014 at 04:58:39AM +0100, Jungseok Lee wrote:
> On Tuesday, April 01, 2014 10:23 PM, Catalin Marinas wrote:
> > On some previous patches I've seen posted for 4-levels I asked that 64K
> > and 4K page configurations are decoupled from the pgtable-?level.h
> > macros so that if we ever need 3-levels with 64K it's easy to enable.
> 
> Is your request to decouple the page size from the number of page table levels?
> In other words, would you like to prepare 4 options, 1) 4KB+3-level, 2)
> 4KB+4-level, 3) 64KB+2-level and 4) 64KB+3-level, combining page size
> with page table levels in the kernel configuration?

We can still use two options to make things less confusing: one for the
page size (4K/64K) and another for the size of the virtual address
space. From those two we can infer the number of levels required. We can
limit the options to 39-bit, 42-bit (only for 64K pages) and the
architecture's current maximum of 48-bit (for both 4K and 64K).

As for the code refactoring: PTRS_PER_PTE etc. correspond
to the page size rather than the number of levels, so move them to
pgtable-{4k,64k}-hwdef.h. From the VA_BITS option above you can
calculate PTRS_PER_PGD (1 << (VA_BITS - PGDIR_SHIFT)). If this is too
large, you need a bigger PGDIR_SHIFT (another level). This needs some
#ifdef's in pgtable-hwdef.h and maybe another pgtable-types.h. The
asm/memory.h file already defines the offsets in terms of VA_BITS, so
for the initial patches we should keep the same layout to make the
4-level patches easier to review.

(and in the memory.txt doc I think we should make the end address
exclusive as it's easier to follow and can be compared with the kernel
virtual memory layout printed during boot)
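As a sketch of how the number of levels and PTRS_PER_PGD fall out of VA_BITS and the page size (the helper name and the ceiling-division step are illustrative assumptions, not the kernel's actual macros):

```python
# Illustrative sketch of the geometry derivation discussed above.
def pgtable_geometry(va_bits, page_shift):
    bits_per_level = page_shift - 3          # 8-byte descriptors per page
    ptrs_per_pte = 1 << bits_per_level
    # levels needed to translate va_bits with this page size (ceil division)
    levels = -(-(va_bits - page_shift) // bits_per_level)
    pgdir_shift = page_shift + (levels - 1) * bits_per_level
    ptrs_per_pgd = 1 << (va_bits - pgdir_shift)
    return levels, ptrs_per_pte, pgdir_shift, ptrs_per_pgd

print(pgtable_geometry(39, 12))   # (3, 512, 30, 512)    4K pages, 3 levels
print(pgtable_geometry(48, 12))   # (4, 512, 39, 512)    4K pages, 4 levels
print(pgtable_geometry(42, 16))   # (2, 8192, 29, 8192)  64K pages, 2 levels
```

If PTRS_PER_PGD would exceed one page's worth of entries, another level is needed, which is exactly the "bigger PGDIR_SHIFT" case described above.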

-- 
Catalin


* [RFC] ARM64: 4 level page table translation for 4KB pages
  2014-04-02  9:01               ` Catalin Marinas
@ 2014-04-02 15:24                 ` Catalin Marinas
  2014-04-02 22:41                   ` Jungseok Lee
  2014-04-03  2:15                   ` Sungjinn Chung
  0 siblings, 2 replies; 24+ messages in thread
From: Catalin Marinas @ 2014-04-02 15:24 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Apr 02, 2014 at 10:01:38AM +0100, Catalin Marinas wrote:
> On Wed, Apr 02, 2014 at 04:58:39AM +0100, Jungseok Lee wrote:
> > On Tuesday, April 01, 2014 10:23 PM, Catalin Marinas wrote:
> > > On some previous patches I've seen posted for 4-levels I asked that 64K
> > > and 4K page configurations are decoupled from the pgtable-?level.h
> > > macros so that if we ever need 3-levels with 64K it's easy to enable.
> > 
> > Is your request to decouple the page size from the number of page table levels?
> > In other words, would you like to prepare 4 options, 1) 4KB+3-level, 2)
> > 4KB+4-level, 3) 64KB+2-level and 4) 64KB+3-level, combining page size
> > with page table levels in the kernel configuration?
> 
> We can still use two options to make things less confusing: one for the
> page size (4K/64K) and another for the size of the virtual address
> space. From those two we can infer the number of levels required. We can
> limit the options to 39-bit, 42-bit (only for 64K pages) and the
> architecture's current maximum of 48-bit (for both 4K and 64K).

Another reason to decouple the page size is that we have 16K pages
specified in the ARM ARM (and we'll get hardware implementations at some
point). As a simple formula for the max VA space we can cover (capped at
48-bit):

VA_BITS = (PAGE_SHIFT - 3) * levels + PAGE_SHIFT

With 16K pages and 3 levels we can cover 47 bits. So we'll eventually
have the following VA bits options:

39 if 4K (3 levels)
42 if 64K (2 levels)
47 if 16K (3 levels)
48 if 4K || 16K || 64K (4/4/3 levels depending on page size)

The latter because of the architecture maximum.
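The formula and table above can be checked mechanically; a small sketch (the function name is made up for illustration, and the 48-bit cap is the architectural maximum mentioned in the thread):

```python
# Sketch checking the VA_BITS formula from the message above.
def max_va_bits(page_shift, levels, cap=48):
    # VA_BITS = (PAGE_SHIFT - 3) * levels + PAGE_SHIFT, capped at 48 bits
    return min((page_shift - 3) * levels + page_shift, cap)

print(max_va_bits(12, 3))   # 39: 4K pages, 3 levels
print(max_va_bits(16, 2))   # 42: 64K pages, 2 levels
print(max_va_bits(14, 3))   # 47: 16K pages, 3 levels
print(max_va_bits(12, 4))   # 48: 4K pages, 4 levels
print(max_va_bits(16, 3))   # 48: 64K pages, 3 levels (55 capped to 48)
```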

-- 
Catalin


* [RFC] ARM64: 4 level page table translation for 4KB pages
  2014-04-02 15:24                 ` Catalin Marinas
@ 2014-04-02 22:41                   ` Jungseok Lee
  2014-04-03  2:15                   ` Sungjinn Chung
  1 sibling, 0 replies; 24+ messages in thread
From: Jungseok Lee @ 2014-04-02 22:41 UTC (permalink / raw)
  To: linux-arm-kernel

On Thursday, April 03, 2014 12:25 AM, Catalin Marinas wrote:
> On Wed, Apr 02, 2014 at 10:01:38AM +0100, Catalin Marinas wrote:
> > On Wed, Apr 02, 2014 at 04:58:39AM +0100, Jungseok Lee wrote:
> > > On Tuesday, April 01, 2014 10:23 PM, Catalin Marinas wrote:
> > > > On some previous patches I've seen posted for 4-levels I asked that 64K
> > > > and 4K page configurations are decoupled from the pgtable-?level.h
> > > > macros so that if we ever need 3-levels with 64K it's easy to enable.
> > >
> > > Is your request to decouple the page size from the number of page table levels?
> > > In other words, would you like to prepare 4 options, 1) 4KB+3-level, 2)
> > > 4KB+4-level, 3) 64KB+2-level and 4) 64KB+3-level, combining page size
> > > with page table levels in the kernel configuration?
> >
> > We can still use two options to make things less confusing: one for the
> > page size (4K/64K) and another for the size of the virtual address
> > space. From those two we can infer the number of levels required. We can
> > limit the options to 39-bit, 42-bit (only for 64K pages) and the
> > architecture's current maximum of 48-bit (for both 4K and 64K).
> 
> Another reason to decouple the page size is that we have 16K pages
> specified in the ARM ARM (and we'll get hardware implementations at some
> point). As a simple formula for the max VA space we can cover (capped at
> 48-bit):
> VA_BITS = (PAGE_SHIFT - 3) * levels + PAGE_SHIFT
> 
> With 16K pages and 3 levels we can cover 47 bits. So we'll eventually
> have the following VA bits options:
> 
> 39 if 4K (3 levels)
> 42 if 64K (2 levels)
> 47 if 16K (3 levels)
> 48 if 4K || 16K || 64K (4/4/3 levels depending on page size)
> 
> The latter because of the architecture maximum.

Okay. Thanks for clarification.

Jungseok Lee


* [RFC] ARM64: 4 level page table translation for 4KB pages
  2014-04-02 15:24                 ` Catalin Marinas
  2014-04-02 22:41                   ` Jungseok Lee
@ 2014-04-03  2:15                   ` Sungjinn Chung
  2014-04-03  8:38                     ` Catalin Marinas
  1 sibling, 1 reply; 24+ messages in thread
From: Sungjinn Chung @ 2014-04-03  2:15 UTC (permalink / raw)
  To: linux-arm-kernel

On Thursday, April 03, 2014 12:25 AM, Catalin Marinas wrote:
> On Wed, Apr 02, 2014 at 10:01:38AM +0100, Catalin Marinas wrote:
> > On Wed, Apr 02, 2014 at 04:58:39AM +0100, Jungseok Lee wrote:
> > > On Tuesday, April 01, 2014 10:23 PM, Catalin Marinas wrote:
> > > > On some previous patches I've seen posted for 4-levels I asked that 64K
> > > > and 4K page configurations are decoupled from the pgtable-?level.h
> > > > macros so that if we ever need 3-levels with 64K it's easy to enable.
> > >
> > > Is your request to decouple the page size from the number of page table levels?
> > > In other words, would you like to prepare 4 options, 1) 4KB+3-level, 2)
> > > 4KB+4-level, 3) 64KB+2-level and 4) 64KB+3-level, combining page size
> > > with page table levels in the kernel configuration?
> >
> > We can still use two options to make things less confusing: one for the
> > page size (4K/64K) and another for the size of the virtual address
> > space. From those two we can infer the number of levels required. We can
> > limit the options to 39-bit, 42-bit (only for 64K pages) and the
> > architecture's current maximum of 48-bit (for both 4K and 64K).
> 
> Another reason to decouple the page size is that we have 16K pages
> specified in the ARM ARM (and we'll get hardware implementations at some
> point). As a simple formula for the max VA space we can cover (capped at
> 48-bit):
> 
> VA_BITS = (PAGE_SHIFT - 3) * levels + PAGE_SHIFT
> 
> With 16K pages and 3 levels we can cover 47 bits. So we'll eventually
> have the following VA bits options:
> 
> 39 if 4K (3 levels)
> 42 if 64K (2 levels)
> 47 if 16K (3 levels)
> 48 if 4K || 16K || 64K (4/4/3 levels depending on page size)
Separating page size and VA bits looks great to me.

I think we can focus on only 4K and 64K at this point.
I'm worried about validation issues for 64K.
After we secure the code for 4K and 64K with refactoring,
16K might not be a big thing. Jungseok or somebody else can do it.
What is your opinion?
> 
> The latter because of the architecture maximum.
> 
> --
> Catalin


* [RFC] ARM64: 4 level page table translation for 4KB pages
  2014-04-03  2:15                   ` Sungjinn Chung
@ 2014-04-03  8:38                     ` Catalin Marinas
  2014-04-03  9:14                       ` Sungjinn Chung
  0 siblings, 1 reply; 24+ messages in thread
From: Catalin Marinas @ 2014-04-03  8:38 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Apr 03, 2014 at 03:15:09AM +0100, Sungjinn Chung wrote:
> On Thursday, April 03, 2014 12:25 AM, Catalin Marinas wrote:
> > Another reason to decouple the page size is that we have 16K pages
> > specified in the ARM ARM (and we'll get hardware implementations at some
> > point). As a simple formula for the max VA space we can cover (capped at
> > 48-bit):
> > 
> > VA_BITS = (PAGE_SHIFT - 3) * levels + PAGE_SHIFT
> > 
> > With 16K pages and 3 levels we can cover 47 bits. So we'll eventually
> > have the following VA bits options:
> > 
> > 39 if 4K (3 levels)
> > 42 if 64K (2 levels)
> > 47 if 16K (3 levels)
> > 48 if 4K || 16K || 64K (4/4/3 levels depending on page size)
> 
> Separating page size and VA bits looks great to me.
> 
> I think we can focus on only 4K and 64K at this point.

Yes.

> I'm worried about validation issues for 64K.

Why? The CPUs I'm aware of implement this feature.

> After we secure the code for 4K and 64K with refactoring,
> 16K might not be a big thing.

Indeed, let's do the refactoring first.

> Jungseok or somebody else can do it. What is your opinion?

If you have time, please go ahead. I could do this as well but probably
early May.

-- 
Catalin


* [RFC] ARM64: 4 level page table translation for 4KB pages
  2014-04-03  8:38                     ` Catalin Marinas
@ 2014-04-03  9:14                       ` Sungjinn Chung
  2014-04-03  9:17                         ` Catalin Marinas
  0 siblings, 1 reply; 24+ messages in thread
From: Sungjinn Chung @ 2014-04-03  9:14 UTC (permalink / raw)
  To: linux-arm-kernel

On Thursday, April 03, 2014 5:38 PM, Catalin Marinas wrote:
> On Thu, Apr 03, 2014 at 03:15:09AM +0100, Sungjinn Chung wrote:
> > On Thursday, April 03, 2014 12:25 AM, Catalin Marinas wrote:
> > > Another reason to decouple the page size is that we have 16K pages
> > > specified in the ARM ARM (and we'll get hardware implementations at some
> > > point). As a simple formula for the max VA space we can cover (capped at
> > > 48-bit):
> > >
> > > VA_BITS = (PAGE_SHIFT - 3) * levels + PAGE_SHIFT
> > >
> > > With 16K pages and 3 levels we can cover 47 bits. So we'll eventually
> > > have the following VA bits options:
> > >
> > > 39 if 4K (3 levels)
> > > 42 if 64K (2 levels)
> > > 47 if 16K (3 levels)
> > > 48 if 4K || 16K || 64K (4/4/3 levels depending on page size)
> >
> > Separation for page size and VA bits looks great to me.
> >
> > I think we can focus on only 4K and 64K at this point.
> 
> Yes.
> 
> > I'm worried about validation issues for 64K.
> 
> Why? The CPUs I'm aware of implement this feature.
Sorry for my mistake. I meant 16K so please just ignore.
> 
> > After we secure the code for 4K and 64K with refactoring,
> > 16K might not be a big thing.
> 
> Indeed, let's do the refactoring first.
> 
> > Jungseok or somebody else can do it. How about your opinion?
> 
> If you have time, please go ahead. I could do this as well but probably
> early May.
Ok. Let's talk about it later, after we get the first patchset from Jungseok.
> 
> --
> Catalin


* [RFC] ARM64: 4 level page table translation for 4KB pages
  2014-04-03  9:14                       ` Sungjinn Chung
@ 2014-04-03  9:17                         ` Catalin Marinas
  0 siblings, 0 replies; 24+ messages in thread
From: Catalin Marinas @ 2014-04-03  9:17 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Apr 03, 2014 at 10:14:53AM +0100, Sungjinn Chung wrote:
> On Thursday, April 03, 2014 5:38 PM, Catalin Marinas wrote:
> > On Thu, Apr 03, 2014 at 03:15:09AM +0100, Sungjinn Chung wrote:
> > > I'm worried about validation issues for 64K.
> > 
> > Why? The CPUs I'm aware of implement this feature.
> 
> Sorry for my mistake. I meant 16K so please just ignore.

A software model should help initially and probably FPGA afterwards.

-- 
Catalin


end of thread, other threads:[~2014-04-03  9:17 UTC | newest]

Thread overview: 24+ messages
2014-03-31  3:51 [RFC] ARM64: 4 level page table translation for 4KB pages Jungseok Lee
2014-03-31  6:56 ` Arnd Bergmann
2014-03-31 11:31   ` Catalin Marinas
2014-03-31 12:45     ` Catalin Marinas
2014-03-31 12:58       ` Arnd Bergmann
2014-03-31 15:00         ` Catalin Marinas
2014-03-31 12:53     ` Arnd Bergmann
2014-03-31 15:27       ` Catalin Marinas
2014-03-31 23:11         ` Arnd Bergmann
2014-04-01 13:23           ` Catalin Marinas
2014-04-02  3:58             ` Jungseok Lee
2014-04-02  9:01               ` Catalin Marinas
2014-04-02 15:24                 ` Catalin Marinas
2014-04-02 22:41                   ` Jungseok Lee
2014-04-03  2:15                   ` Sungjinn Chung
2014-04-03  8:38                     ` Catalin Marinas
2014-04-03  9:14                       ` Sungjinn Chung
2014-04-03  9:17                         ` Catalin Marinas
2014-04-01  0:44         ` 정성진
2014-04-01  9:46           ` Catalin Marinas
2014-04-01 10:13             ` 정성진
2014-04-01 11:22               ` Catalin Marinas
2014-04-01 23:35                 ` Sungjinn Chung
2014-04-01  0:42     ` Jungseok Lee
