* [PATCH 0/3] x86/mm/doc: Clean up mm.txt
From: Baoquan He @ 2018-10-06 8:43 UTC
To: mingo; +Cc: linux-kernel, x86, linux-doc, tglx, thgarnie, corbet, Baoquan He
This cleanup was suggested by Ingo.

First, it fixes the confusion in the mm layout tables by describing
each memory region in a consistent style.

Second, it moves the KASLR wording out of the mm layout tables into a
separate section, so that the tables list only the non-KASLR layout,
and adds KASLR documentation at the end of mm.txt.

It also updates the description of KERNEL_IMAGE_SIZE in
arch/x86/include/asm/page_64_types.h.

v2->v3:
Ingo helped to polish the patch logs and code comments; this repost
applies his suggestions.

v1->v2:
Resent because of typos and incorrect descriptions in the v1 post.
Baoquan He (3):
x86/KASLR: Update KERNEL_IMAGE_SIZE description
x86/mm/doc: Clean up the memory region layout descriptions
x86/doc/kaslr.txt: Create a separate part of document abourt KASLR at
the end of file
Documentation/x86/x86_64/mm.txt | 150 +++++++++++++++++++++++------------
arch/x86/include/asm/page_64_types.h | 15 ++--
2 files changed, 107 insertions(+), 58 deletions(-)
--
2.13.6
* [PATCH 1/3] x86/KASLR: Update KERNEL_IMAGE_SIZE description
From: Baoquan He @ 2018-10-06 8:43 UTC
To: mingo; +Cc: linux-kernel, x86, linux-doc, tglx, thgarnie, corbet, Baoquan He
Currently CONFIG_RANDOMIZE_BASE=y is set by default, which makes some of the
old comments above the KERNEL_IMAGE_SIZE definition out of date. Update them
to the current state of affairs.
Signed-off-by: Baoquan He <bhe@redhat.com>
---
arch/x86/include/asm/page_64_types.h | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 6afac386a434..cd0cf1c568b4 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -59,13 +59,16 @@
#endif
/*
- * Kernel image size is limited to 1GiB due to the fixmap living in the
- * next 1GiB (see level2_kernel_pgt in arch/x86/kernel/head_64.S). Use
- * 512MiB by default, leaving 1.5GiB for modules once the page tables
- * are fully set up. If kernel ASLR is configured, it can extend the
- * kernel page table mapping, reducing the size of the modules area.
+ * Maximum kernel image size is limited to 1 GiB, due to the fixmap living
+ * in the next 1 GiB (see level2_kernel_pgt in arch/x86/kernel/head_64.S).
+ *
+ * On KASLR use 1 GiB by default, leaving 1 GiB for modules once the
+ * page tables are fully set up.
+ *
+ * If KASLR is disabled we can shrink it to 0.5 GiB and increase the size
+ * of the modules area to 1.5 GiB.
*/
-#if defined(CONFIG_RANDOMIZE_BASE)
+#ifdef CONFIG_RANDOMIZE_BASE
#define KERNEL_IMAGE_SIZE (1024 * 1024 * 1024)
#else
#define KERNEL_IMAGE_SIZE (512 * 1024 * 1024)
--
2.13.6
* [PATCH 2/3] x86/mm/doc: Clean up the memory region layout descriptions
From: Baoquan He @ 2018-10-06 8:43 UTC
To: mingo; +Cc: linux-kernel, x86, linux-doc, tglx, thgarnie, corbet, Baoquan He
In Documentation/x86/x86_64/mm.txt, the style of the memory region
layout descriptions is a little confusing:
- sizes in TB are mixed with 'bits'
- sometimes a size is mentioned in the description and sometimes not
- holes are sometimes listed by address and sometimes only as an 'unused hole' line
So make them use a consistent style.
Signed-off-by: Baoquan He <bhe@redhat.com>
---
Documentation/x86/x86_64/mm.txt | 84 ++++++++++++++++++++---------------------
1 file changed, 42 insertions(+), 42 deletions(-)
diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt
index 5432a96d31ff..b4bc95c9790e 100644
--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -1,55 +1,55 @@
Virtual memory map with 4 level page tables:
-0000000000000000 - 00007fffffffffff (=47 bits) user space, different per mm
-hole caused by [47:63] sign extension
-ffff800000000000 - ffff87ffffffffff (=43 bits) guard hole, reserved for hypervisor
-ffff880000000000 - ffffc7ffffffffff (=64 TB) direct mapping of all phys. memory
-ffffc80000000000 - ffffc8ffffffffff (=40 bits) hole
-ffffc90000000000 - ffffe8ffffffffff (=45 bits) vmalloc/ioremap space
-ffffe90000000000 - ffffe9ffffffffff (=40 bits) hole
-ffffea0000000000 - ffffeaffffffffff (=40 bits) virtual memory map (1TB)
-... unused hole ...
-ffffec0000000000 - fffffbffffffffff (=44 bits) kasan shadow memory (16TB)
-... unused hole ...
+0000000000000000 - 00007fffffffffff (=47 bits, 128 TB) user space, different per mm
+ hole caused by [47:63] sign extension
+ffff800000000000 - ffff87ffffffffff (=43 bits, 8 TB) guard hole, reserved for hypervisor
+ffff880000000000 - ffffc7ffffffffff (=46 bits, 64 TB) direct mapping of all phys. memory (page_offset_base)
+ffffc80000000000 - ffffc8ffffffffff (=40 bits, 1 TB) unused hole
+ffffc90000000000 - ffffe8ffffffffff (=45 bits, 32 TB) vmalloc/ioremap space (vmalloc_base)
+ffffe90000000000 - ffffe9ffffffffff (=40 bits, 1 TB) unused hole
+ffffea0000000000 - ffffeaffffffffff (=40 bits, 1 TB) virtual memory map (vmemmap_base)
+ffffeb0000000000 - ffffebffffffffff (=40 bits, 1 TB) unused hole
+ffffec0000000000 - fffffbffffffffff (=44 bits, 16 TB) kasan shadow memory
+fffffc0000000000 - fffffdffffffffff (=41 bits, 2 TB) unused hole
vaddr_end for KASLR
-fffffe0000000000 - fffffe7fffffffff (=39 bits) cpu_entry_area mapping
-fffffe8000000000 - fffffeffffffffff (=39 bits) LDT remap for PTI
-ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks
-... unused hole ...
-ffffffef00000000 - fffffffeffffffff (=64 GB) EFI region mapping space
-... unused hole ...
-ffffffff80000000 - ffffffff9fffffff (=512 MB) kernel text mapping, from phys 0
-ffffffffa0000000 - fffffffffeffffff (1520 MB) module mapping space
+fffffe0000000000 - fffffe7fffffffff (=39 bits, 512 GB) cpu_entry_area mapping
+fffffe8000000000 - fffffeffffffffff (=39 bits, 512 GB) LDT remap for PTI
+ffffff0000000000 - ffffff7fffffffff (=39 bits, 512 GB) %esp fixup stacks
+ffffff8000000000 - fffffffeefffffff (~39 bits, ~507 GB) unused hole
+ffffffef00000000 - fffffffeffffffff (=36 bits, 64 GB) EFI region mapping space
+ffffffff00000000 - ffffffff7fffffff (=31 bits, 2 GB) unused hole
+ffffffff80000000 - ffffffff9fffffff (=29 bits, 512 MB) kernel text mapping, from phys 0
+ffffffffa0000000 - fffffffffeffffff (~31 bits, 1520 MB) module mapping space
[fixmap start] - ffffffffff5fffff kernel-internal fixmap range
-ffffffffff600000 - ffffffffff600fff (=4 kB) legacy vsyscall ABI
-ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole
+ffffffffff600000 - ffffffffff600fff ( =4 kB) legacy vsyscall ABI
+ffffffffffe00000 - ffffffffffffffff ( =2 MB) unused hole
Virtual memory map with 5 level page tables:
-0000000000000000 - 00ffffffffffffff (=56 bits) user space, different per mm
-hole caused by [56:63] sign extension
-ff00000000000000 - ff0fffffffffffff (=52 bits) guard hole, reserved for hypervisor
-ff10000000000000 - ff8fffffffffffff (=55 bits) direct mapping of all phys. memory
-ff90000000000000 - ff9fffffffffffff (=52 bits) LDT remap for PTI
-ffa0000000000000 - ffd1ffffffffffff (=54 bits) vmalloc/ioremap space (12800 TB)
-ffd2000000000000 - ffd3ffffffffffff (=49 bits) hole
-ffd4000000000000 - ffd5ffffffffffff (=49 bits) virtual memory map (512TB)
-... unused hole ...
-ffdf000000000000 - fffffc0000000000 (=53 bits) kasan shadow memory (8PB)
-... unused hole ...
+0000000000000000 - 00ffffffffffffff (=56 bits, 64 PB) user space, different per mm
+ hole caused by [56:63] sign extension
+ff00000000000000 - ff0fffffffffffff (=52 bits, 4 PB) guard hole, reserved for hypervisor
+ff10000000000000 - ff8fffffffffffff (=55 bits, 32 PB) direct mapping of all phys. memory (page_offset_base)
+ff90000000000000 - ff9fffffffffffff (=52 bits, 4 PB) LDT remap for PTI
+ffa0000000000000 - ffd1ffffffffffff (=53 bits, 12800 TB) vmalloc/ioremap space (vmalloc_base)
+ffd2000000000000 - ffd3ffffffffffff (=49 bits, 512 TB) unused hole
+ffd4000000000000 - ffd5ffffffffffff (=49 bits, 512 TB) virtual memory map (vmemmap_base)
+ffd6000000000000 - ffdeffffffffffff (~51 bits, 2304 TB) unused hole
+ffdf000000000000 - fffffdffffffffff (~53 bits, ~8 PB) kasan shadow memory
+fffffc0000000000 - fffffdffffffffff (=41 bits, 2 TB) unused hole
vaddr_end for KASLR
-fffffe0000000000 - fffffe7fffffffff (=39 bits) cpu_entry_area mapping
-... unused hole ...
-ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks
-... unused hole ...
-ffffffef00000000 - fffffffeffffffff (=64 GB) EFI region mapping space
-... unused hole ...
-ffffffff80000000 - ffffffff9fffffff (=512 MB) kernel text mapping, from phys 0
-ffffffffa0000000 - fffffffffeffffff (1520 MB) module mapping space
+fffffe0000000000 - fffffe7fffffffff (=39 bits, 512 GB) cpu_entry_area mapping
+fffffe8000000000 - fffffeffffffffff (=39 bits, 512 GB) unused hole
+ffffff0000000000 - ffffff7fffffffff (=39 bits, 512 GB) %esp fixup stacks
+ffffff8000000000 - ffffffeeffffffff (~39 bits, 444 GB) unused hole
+ffffffef00000000 - fffffffeffffffff (=36 bits, 64 GB) EFI region mapping space
+ffffffff00000000 - ffffffff7fffffff (31 bits, 2 GB) unused hole
+ffffffff80000000 - ffffffff9fffffff (=29 bits, 512 MB) kernel text mapping, from phys 0
+ffffffffa0000000 - fffffffffeffffff (~31 bits, 1520 MB) module mapping space
[fixmap start] - ffffffffff5fffff kernel-internal fixmap range
-ffffffffff600000 - ffffffffff600fff (=4 kB) legacy vsyscall ABI
-ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole
+ffffffffff600000 - ffffffffff600fff ( =4 kB) legacy vsyscall ABI
+ffffffffffe00000 - ffffffffffffffff ( =2 MB) unused hole
Architecture defines a 64-bit virtual address. Implementations can support
less. Currently supported are 48- and 57-bit virtual addresses. Bits 63
--
2.13.6
* [PATCH 3/3] x86/doc/kaslr.txt: Create a separate part of document abourt KASLR at the end of file
From: Baoquan He @ 2018-10-06 8:43 UTC
To: mingo; +Cc: linux-kernel, x86, linux-doc, tglx, thgarnie, corbet, Baoquan He
Keep the original content as the first part, listing only the static mm
layout tables for the non-KASLR case. Then add a KASLR-related
description at the end.
Signed-off-by: Baoquan He <bhe@redhat.com>
---
Documentation/x86/x86_64/mm.txt | 64 +++++++++++++++++++++++++++++++++++------
1 file changed, 55 insertions(+), 9 deletions(-)
diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt
index b4bc95c9790e..549fcae596e0 100644
--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -1,5 +1,5 @@
-Virtual memory map with 4 level page tables:
+MM layout in non-KASLR case:
0000000000000000 - 00007fffffffffff (=47 bits, 128 TB) user space, different per mm
hole caused by [47:63] sign extension
@@ -12,7 +12,6 @@ ffffea0000000000 - ffffeaffffffffff (=40 bits, 1 TB) virtual memory map (vme
ffffeb0000000000 - ffffebffffffffff (=40 bits, 1 TB) unused hole
ffffec0000000000 - fffffbffffffffff (=44 bits, 16 TB) kasan shadow memory
fffffc0000000000 - fffffdffffffffff (=41 bits, 2 TB) unused hole
- vaddr_end for KASLR
fffffe0000000000 - fffffe7fffffffff (=39 bits, 512 GB) cpu_entry_area mapping
fffffe8000000000 - fffffeffffffffff (=39 bits, 512 GB) LDT remap for PTI
ffffff0000000000 - ffffff7fffffffff (=39 bits, 512 GB) %esp fixup stacks
@@ -38,7 +37,6 @@ ffd4000000000000 - ffd5ffffffffffff (=49 bits, 512 TB) virtual memory map (vme
ffd6000000000000 - ffdeffffffffffff (~51 bits, 2304 TB) unused hole
ffdf000000000000 - fffffdffffffffff (~53 bits, ~8 PB) kasan shadow memory
fffffc0000000000 - fffffdffffffffff (=41 bits, 2 TB) unused hole
- vaddr_end for KASLR
fffffe0000000000 - fffffe7fffffffff (=39 bits, 512 GB) cpu_entry_area mapping
fffffe8000000000 - fffffeffffffffff (=39 bits, 512 GB) unused hole
ffffff0000000000 - ffffff7fffffffff (=39 bits, 512 GB) %esp fixup stacks
@@ -70,10 +68,58 @@ memory window (this size is arbitrary, it can be raised later if needed).
The mappings are not part of any other kernel PGD and are only available
during EFI runtime calls.
-Note that if CONFIG_RANDOMIZE_MEMORY is enabled, the direct mapping of all
-physical memory, vmalloc/ioremap space and virtual memory map are randomized.
-Their order is preserved but their base will be offset early at boot time.
+MM layout related to KASLR
+=========================================================================
-Be very careful vs. KASLR when changing anything here. The KASLR address
-range must not overlap with anything except the KASAN shadow area, which is
-correct as KASAN disables KASLR.
+Kernel Address Space Layout Randomization (KASLR) consists of two parts
+which work together to enhance the security of the Linux kernel:
+
+ - Kernel text KASLR
+ - Memory region KASLR
+
+Kernel text KASLR
+-----------------
+The physical and virtual addresses of the kernel text are randomized
+independently of each other. The physical address of the kernel can
+be anywhere below 64 TB in 4-level paging mode and below 4 PB in
+5-level paging mode, while the virtual address of the kernel is
+restricted to the 1 GB range
+[0xffffffff80000000, 0xffffffffbfffffff].
+
+ffffffff80000000 - ffffffffbfffffff (1 GB) kernel text mapping, from phys 0
+ffffffffc0000000 - fffffffffeffffff (1 GB) module mapping space
+
+Note: Kernel text KASLR uses a 1 GB space to randomize the position of
+the kernel image, and it is enabled by default. If the KASLR config
+option CONFIG_RANDOMIZE_BASE is not enabled, the kernel image space
+shrinks to 512 MB and the modules area grows to 1.5 GB accordingly.
+
+Memory region KASLR
+-------------------
+If CONFIG_RANDOMIZE_MEMORY is enabled, the three memory regions below
+are randomized. Their order is preserved, but their base addresses are
+offset early at boot time.
+
+ - direct mapping region
+ - vmalloc region
+ - vmemmap region
+
+The KASLR address range must not overlap with anything except the KASAN
+shadow area, which is correct as KASAN disables KASLR.
+
+Taking 4-level paging mode as an example, the range from the original
+starting address of the direct mapping region for physical RAM to the
+starting address of the cpu_entry_area mapping region, namely
+[0xffff880000000000 - 0xfffffdffffffffff], 118 TB in total, is the
+virtual address space within which memory region KASLR may move those
+memory regions around. After KASLR has run, their layout looks like
+this:
+
+Name            Starting address    Size                                          Aligned
+-----------------------------------------------------------------------------------------------
+direct mapping  page_offset_base    [actual size of system RAM + 10 TB padding]   1 GB
+*guard hole     random              random                                        1 GB
+vmalloc         vmalloc_base        32 TB                                         1 GB
+*guard hole     random              random                                        1 GB
+vmemmap         vmemmap_base        1 TB                                          1 GB
+*guard hole     random              random                                        1 GB
--
2.13.6
* Re: [PATCH 0/3] x86/mm/doc: Clean up mm.txt
From: Baoquan He @ 2018-10-06 11:28 UTC
To: mingo; +Cc: linux-kernel, x86, linux-doc, tglx, thgarnie, corbet
On 10/06/18 at 04:43pm, Baoquan He wrote:
> This clean up is suggested by Ingo.
>
> It firstly fix the confusions in mm layout tables by unifying
> each memory region description in the consistent style.
>
> Secondly take the KASLR words out of the mm layout tables to make
> it as a separate section to only list mm layout in non-KASLR case.
> Then add KASLR document at the end of mm.txt.
>
> Meanwhile update description about KERNEL_IMAGE_SIZE in
> arch/x86/include/asm/page_64_types.h .
Sorry, this is the v3 post. I forgot to add the v3 tag to the patch subjects.
* Re: [PATCH 0/3] x86/mm/doc: Clean up mm.txt
From: Ingo Molnar @ 2018-10-06 12:21 UTC
To: Baoquan He
Cc: linux-kernel, x86, linux-doc, tglx, thgarnie, corbet,
Borislav Petkov, Peter Zijlstra, Linus Torvalds, Andy Lutomirski
* Baoquan He <bhe@redhat.com> wrote:
> Baoquan He (3):
> x86/KASLR: Update KERNEL_IMAGE_SIZE description
> x86/mm/doc: Clean up the memory region layout descriptions
> x86/doc/kaslr.txt: Create a separate part of document abourt KASLR at
> the end of file
Thanks, patches #1 and #2 are looking good and I'll apply them with some minor fixes, and I'll
comment about patch #3 separately.
I also wrote a larger patch enhancing the descriptions some more, I'll send that separately.
Thanks,
Ingo
* [PATCH 4/3] x86/mm/doc: Enhance the x86-64 virtual memory layout descriptions
From: Ingo Molnar @ 2018-10-06 12:22 UTC
To: Baoquan He
Cc: linux-kernel, x86, linux-doc, tglx, thgarnie, corbet,
Borislav Petkov, H. Peter Anvin, Andy Lutomirski, Peter Zijlstra,
Linus Torvalds, Andrew Morton
After the cleanups from Baoquan He, make it even more readable:
- Remove the 'bits' area size column: it's pretty pointless and was even
wrong for some of the entries. Given that KB, MB, GB and TB are 10, 20,
30 and 40 bits, an "8 TB" size description makes it obvious that it's
43 bits.
- Introduce an "offset" column:
--------------------------------------------------------------------------------
start addr | offset | end addr | size | VM area description
-----------------|------------|------------------|---------|--------------------
...
ffff880000000000 | -120 TB | ffffc7ffffffffff | 64 TB | direct mapping of all physical memory (page_offset_base),
this is what limits max physical memory supported.
The -120 TB notation makes it obvious where this particular virtual memory
region starts: 120 TB down from the top of the 64-bit virtual memory space.
Especially the layout of the kernel mappings is a *lot* more obvious when
written this way, plus it's much easier to compare it with the size column
and understand/check/validate and modify the kernel's layout in the future.
- Mark the point from which the 47-bit and 56-bit kernel layouts are 100% identical:
this starts at the -512 GB offset and the EFI region.
- Re-shuffle the size descriptions to be continuous blocks of sizes, instead of the
often mixed sizes, i.e. write "0.5 TB" instead of "512 GB" if we are still in
the TB-granular region of the map.
- Make the 47-bit and 56-bit descriptions use the *exact* same layout and wording,
and only differ where there's a material difference. This makes it easy to compare
the two tables side by side by switching between two terminal tabs.
- Plus enhance a lot of other stylistic/typographical details: make the tables
explicitly tabular, add headers, enhance certain entries, etc. etc.
Note that there are some apparent errors in the tables as well, but I'll fix
them in a separate patch to make it easier to review/validate.
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: corbet@lwn.net
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: thgarnie@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
Documentation/x86/x86_64/mm.txt | 172 ++++++++++++++++++++++++++++------------
kernel/sched/core.c | 6 +
2 files changed, 128 insertions(+), 50 deletions(-)
Index: tip/Documentation/x86/x86_64/mm.txt
===================================================================
--- tip.orig/Documentation/x86/x86_64/mm.txt
+++ tip/Documentation/x86/x86_64/mm.txt
@@ -1,55 +1,127 @@
-Virtual memory map with 4 level page tables:
+========================================================
+| Complete virtual memory map with 4-level page tables |
+========================================================
-0000000000000000 - 00007fffffffffff (=47 bits, 128 TB) user space, different per mm
- hole caused by [47:63] sign extension
-ffff800000000000 - ffff87ffffffffff (=43 bits, 8 TB) guard hole, reserved for hypervisor
-ffff880000000000 - ffffc7ffffffffff (=46 bits, 64 TB) direct mapping of all phys. memory (page_offset_base)
-ffffc80000000000 - ffffc8ffffffffff (=40 bits, 1 TB) unused hole
-ffffc90000000000 - ffffe8ffffffffff (=45 bits, 32 TB) vmalloc/ioremap space (vmalloc_base)
-ffffe90000000000 - ffffe9ffffffffff (=40 bits, 1 TB) unused hole
-ffffea0000000000 - ffffeaffffffffff (=40 bits, 1 TB) virtual memory map (vmemmap_base)
-ffffeb0000000000 - ffffebffffffffff (=40 bits, 1 TB) unused hole
-ffffec0000000000 - fffffbffffffffff (=44 bits, 16 TB) kasan shadow memory
-fffffc0000000000 - fffffdffffffffff (=41 bits, 2 TB) unused hole
- vaddr_end for KASLR
-fffffe0000000000 - fffffe7fffffffff (=39 bits, 512 GB) cpu_entry_area mapping
-fffffe8000000000 - fffffeffffffffff (=39 bits, 512 GB) LDT remap for PTI
-ffffff0000000000 - ffffff7fffffffff (=39 bits, 512 GB) %esp fixup stacks
-ffffff8000000000 - fffffffeefffffff (~39 bits, ~507 GB) unused hole
-ffffffef00000000 - fffffffeffffffff (=36 bits, 64 GB) EFI region mapping space
-ffffffff00000000 - ffffffff7fffffff (=31 bits, 2 GB) unused hole
-ffffffff80000000 - ffffffff9fffffff (=29 bits, 512 MB) kernel text mapping, from phys 0
-ffffffffa0000000 - fffffffffeffffff (~31 bits, 1520 MB) module mapping space
-[fixmap start] - ffffffffff5fffff kernel-internal fixmap range
-ffffffffff600000 - ffffffffff600fff ( =4 kB) legacy vsyscall ABI
-ffffffffffe00000 - ffffffffffffffff ( =2 MB) unused hole
-
-Virtual memory map with 5 level page tables:
-
-0000000000000000 - 00ffffffffffffff (=56 bits, 64 PB) user space, different per mm
- hole caused by [56:63] sign extension
-ff00000000000000 - ff0fffffffffffff (=52 bits, 4 PB) guard hole, reserved for hypervisor
-ff10000000000000 - ff8fffffffffffff (=55 bits, 32 PB) direct mapping of all phys. memory (page_offset_base)
-ff90000000000000 - ff9fffffffffffff (=52 bits, 4 PB) LDT remap for PTI
-ffa0000000000000 - ffd1ffffffffffff (=53 bits, 12800 TB) vmalloc/ioremap space (vmalloc_base)
-ffd2000000000000 - ffd3ffffffffffff (=49 bits, 512 TB) unused hole
-ffd4000000000000 - ffd5ffffffffffff (=49 bits, 512 TB) virtual memory map (vmemmap_base)
-ffd6000000000000 - ffdeffffffffffff (~51 bits, 2304 TB) unused hole
-ffdf000000000000 - fffffdffffffffff (~53 bits, ~8 PB) kasan shadow memory
-fffffc0000000000 - fffffdffffffffff (=41 bits, 2 TB) unused hole
- vaddr_end for KASLR
-fffffe0000000000 - fffffe7fffffffff (=39 bits, 512 GB) cpu_entry_area mapping
-fffffe8000000000 - fffffeffffffffff (=39 bits, 512 GB) unused hole
-ffffff0000000000 - ffffff7fffffffff (=39 bits, 512 GB) %esp fixup stacks
-ffffff8000000000 - ffffffeeffffffff (~39 bits, 444 GB) unused hole
-ffffffef00000000 - fffffffeffffffff (=36 bits, 64 GB) EFI region mapping space
-ffffffff00000000 - ffffffff7fffffff (31 bits, 2 GB) unused hole
-ffffffff80000000 - ffffffff9fffffff (=29 bits, 512 MB) kernel text mapping, from phys 0
-ffffffffa0000000 - fffffffffeffffff (~31 bits, 1520 MB) module mapping space
-[fixmap start] - ffffffffff5fffff kernel-internal fixmap range
-ffffffffff600000 - ffffffffff600fff ( =4 kB) legacy vsyscall ABI
-ffffffffffe00000 - ffffffffffffffff ( =2 MB) unused hole
+Notes:
+
+ - Negative addresses such as "-23 TB" are absolute addresses in bytes, counted down
+ from the top of the 64-bit address space. It's easier to understand the layout
+ when seen both in absolute addresses and in distance-from-top notation.
+
+ For example 0xffffe90000000000 == -23 TB, it's 23 TB lower than the top of the
+ 64-bit address space (ffffffffffffffff).
+
+ Note that as we get closer to the top of the address space, the notation changes
+ from TB to GB and then MB/KB.
+
+ - "16M TB" might look weird at first sight, but it's an easier to visualize size
+ notation than "16 EB", which few will recognize at first sight as 16 exabytes.
+ It also nicely shows how incredibly large the 64-bit address space is.
+
+--------------------------------------------------------------------------------
+start addr | offset | end addr | size | VM area description
+-----------------|------------|------------------|---------|--------------------
+0000000000000000 | 0 | 00007fffffffffff | 128 TB | user-space virtual memory, different per mm
+ |
+0000800000000000 | +128 TB | ffff7fffffffffff | ~16M TB | ... huge, almost 64 bits wide hole of non-canonical
+ virtual memory addresses up to the -128 TB
+ starting offset of kernel mappings.
+ |
+ |----------------------------------------------------
+ | kernel-space virtual memory, shared between all processes:
+ |
+ffff800000000000 | -128 TB | ffff87ffffffffff | 8 TB | ... guard hole, also reserved for hypervisor
+ffff880000000000 | -120 TB | ffffc7ffffffffff | 64 TB | direct mapping of all physical memory (page_offset_base),
+ this is what limits max physical memory supported.
+ffffc80000000000 | -56 TB | ffffc8ffffffffff | 1 TB | ... unused hole
+ffffc90000000000 | -55 TB | ffffe8ffffffffff | 32 TB | vmalloc/ioremap space (vmalloc_base)
+ffffe90000000000 | -23 TB | ffffe9ffffffffff | 1 TB | ... unused hole
+ffffea0000000000 | -22 TB | ffffeaffffffffff | 1 TB | virtual memory map (vmemmap_base)
+ffffeb0000000000 | -21 TB | ffffebffffffffff | 1 TB | ... unused hole
+ffffec0000000000 | -20 TB | fffffbffffffffff | 16 TB | KASAN shadow memory
+fffffc0000000000 | -4 TB | fffffdffffffffff | 2 TB | ... unused hole
+ vaddr_end for KASLR
+
+fffffe0000000000 | -2 TB | fffffe7fffffffff | 512 GB | cpu_entry_area mapping
+fffffe8000000000 | -1.5 TB | fffffeffffffffff | 512 GB | LDT remap for PTI
+ffffff0000000000 | -1 TB | ffffff7fffffffff | 512 GB | %esp fixup stacks
+
+# Identical layout to the 56-bit one from here on:
+
+ffffff8000000000 | -512 GB | fffffffeefffffff | ~507 GB | ... unused hole
+ffffffef00000000 | -68 GB | fffffffeffffffff | 64 GB | EFI region mapping space
+ffffffff00000000 | -4 GB | ffffffff7fffffff | 2 GB | ... unused hole
+ffffffff80000000 | -2 GB | ffffffff9fffffff | 512 MB | kernel text mapping, mapped to physical address 0
+ffffffff80000000 |-2048 MB
+
+ffffffffa0000000 |-1536 MB | fffffffffeffffff | 1520 MB | module mapping space
+ffffffffff000000 | -16 MB
+
+ FIXADDR_START | ~-11 MB | ffffffffff5fffff | | kernel-internal fixmap range with variable size,
+ typical size is around ~0.5 MB
+
+ffffffffff600000 | -10 MB | ffffffffff600fff | 4 kB | legacy vsyscall ABI
+ffffffffffe00000 | -2 MB | ffffffffffffffff | 2 MB | ... unused hole
+----------------------------------------------------------------------------
+
+
+========================================================
+| Complete virtual memory map with 5-level page tables |
+========================================================
+
+Notes:
+
+ - With 56-bit addresses, user-space memory gets expanded by a factor of 512x,
+ from 0.125 PB to 64 PB. All kernel mappings shift down to the -64 PB starting
+ offset and many of the regions expand to support the much larger amount of
+ physical memory that can be addressed.
+
+--------------------------------------------------------------------------------
+start addr | offset | end addr | size | VM area description
+-----------------|------------|------------------|---------|--------------------
+0000000000000000 | 0 | 00ffffffffffffff | 64 PB | user-space virtual memory, different per mm
+ |
+0000800000000000 | +64 PB | ffff7fffffffffff | ~16K PB | ... huge, still almost 64 bits wide hole of non-canonical
+ virtual memory addresses up to the -128 TB
+ starting offset of kernel mappings.
+ |
+ |----------------------------------------------------
+ | kernel-space virtual memory, shared between all processes:
+ |
+ff00000000000000 | -64 PB | ff0fffffffffffff | 4 PB | ... guard hole, also reserved for hypervisor
+ff10000000000000 | -60 PB | ff8fffffffffffff | 32 PB | direct mapping of all physical memory (page_offset_base),
+ this is what limits max physical memory supported.
+ff90000000000000 | -28 PB | ff9fffffffffffff | 4 PB | LDT remap for PTI
+ffa0000000000000 | -24 PB | ffd1ffffffffffff | 12.5 PB | vmalloc/ioremap space (vmalloc_base)
+ffd2000000000000 | -11.5 PB | ffd3ffffffffffff | 0.5 PB | ... unused hole
+ffd4000000000000 | -11 PB | ffd5ffffffffffff | 0.5 PB | virtual memory map (vmemmap_base)
+ffd6000000000000 | -10.5 PB | ffdeffffffffffff | 2.25 PB | ... unused hole
+ffdf000000000000 | -8.25 PB | fffffbffffffffff | ~8 PB | KASAN shadow memory
+fffffc0000000000 | -4 TB | fffffdffffffffff | 2 TB | ... unused hole
+ vaddr_end for KASLR
+
+fffffe0000000000 | -2 TB | fffffe7fffffffff | 0.5 TB | cpu_entry_area mapping
+fffffe8000000000 | -1.5 TB | fffffeffffffffff | 0.5 TB | ... unused hole
+ffffff0000000000 | -1 TB | ffffff7fffffffff | 0.5 TB | %esp fixup stacks
+ffffff8000000000 | -0.5 TB | ffffffeeffffffff | 444 GB | ... unused hole
+
+# Identical layout to the 47-bit one from here on:
+
+ffffffef00000000 | -68 GB | fffffffeffffffff | 64 GB | EFI region mapping space
+ffffffff00000000 | -4 GB | ffffffff7fffffff | 2 GB | ... unused hole
+ffffffff80000000 | -2 GB | ffffffff9fffffff | 512 MB | kernel text mapping, mapped to physical address 0
+ffffffff80000000 |-2048 MB
+
+ffffffffa0000000 |-1536 MB | fffffffffeffffff | 1520 MB | module mapping space
+ffffffffff000000 | -16 MB
+
+ FIXADDR_START | ~-11 MB | ffffffffff5fffff | | kernel-internal fixmap range with variable size,
+ typical size is around 0.5 MB
+
+ffffffffff600000 | -10 MB | ffffffffff600fff | 4 kB | legacy vsyscall ABI
+ffffffffffe00000 | -2 MB | ffffffffffffffff | 2 MB | ... unused hole
+----------------------------------------------------------------------------
Architecture defines a 64-bit virtual address. Implementations can support
less. Currently supported are 48- and 57-bit virtual addresses. Bits 63
* Re: [PATCH 4/3] x86/mm/doc: Enhance the x86-64 virtual memory layout descriptions
2018-10-06 12:22 ` [PATCH 4/3] x86/mm/doc: Enhance the x86-64 virtual memory layout descriptions Ingo Molnar
@ 2018-10-06 12:33 ` Ingo Molnar
2018-10-06 14:41 ` Baoquan He
2018-10-06 14:38 ` [PATCH 4/3 v2] " Ingo Molnar
1 sibling, 1 reply; 19+ messages in thread
From: Ingo Molnar @ 2018-10-06 12:33 UTC (permalink / raw)
To: Baoquan He, Kirill A. Shutemov
Cc: linux-kernel, x86, linux-doc, tglx, thgarnie, corbet,
Borislav Petkov, H. Peter Anvin, Andy Lutomirski, Peter Zijlstra,
Linus Torvalds, Andrew Morton
* Ingo Molnar <mingo@kernel.org> wrote:
> +========================================================
> +| Complete virtual memory map with 4-level page tables |
> +========================================================
> +--------------------------------------------------------------------------------
> +start addr | offset | end addr | size | VM area description
> +-----------------|------------|------------------|---------|--------------------
> +
> +# Identical layout to the 56-bit one from here on:
> +
> +ffffff8000000000 | -512 GB | fffffffeefffffff | ~507 GB | ... unused hole
> +ffffffef00000000 | -68 GB | fffffffeffffffff | 64 GB | EFI region mapping space
> +========================================================
> +| Complete virtual memory map with 5-level page tables |
> +========================================================
> +ffffff8000000000 | -0.5 TB | ffffffeeffffffff | 444 GB | ... unused hole
> +
> +# Identical layout to the 47-bit one from here on:
> +
> +ffffffef00000000 | -68 GB | fffffffeffffffff | 64 GB | EFI region mapping space
So patch #2 appears to have introduced an error/typo in the 47-bit table. Note the weird size
and discontinuity of the 'unused hole' in the 47-bit table, and compare it with 56-bit table:
fffffffeefffffff
ffffffeeffffffff
(Note how the incorrect end address was cargo-cult-copied into the 'size' field of ~507 GB...)
The correct number is the 56-bit one, and both tables should show the following identical
layout:
ffffff8000000000 | -512 GB | ffffffeeffffffff | 444 GB | ... unused hole
ffffffef00000000 | -68 GB | fffffffeffffffff | 64 GB | EFI region mapping space
Agreed?
Thanks,
Ingo
* [tip:x86/mm] x86/KASLR: Update KERNEL_IMAGE_SIZE description
2018-10-06 8:43 ` [PATCH 1/3] x86/KASLR: Update KERNEL_IMAGE_SIZE description Baoquan He
@ 2018-10-06 13:06 ` tip-bot for Baoquan He
0 siblings, 0 replies; 19+ messages in thread
From: tip-bot for Baoquan He @ 2018-10-06 13:06 UTC (permalink / raw)
To: linux-tip-commits
Cc: peterz, luto, dvlasenk, tglx, bhe, luto, torvalds, bp, hpa,
linux-kernel, mingo, dave.hansen, brgerst, riel
Commit-ID: 06d4a462e954756f3d3d54e6f3f1bdc2e6f592a9
Gitweb: https://git.kernel.org/tip/06d4a462e954756f3d3d54e6f3f1bdc2e6f592a9
Author: Baoquan He <bhe@redhat.com>
AuthorDate: Sat, 6 Oct 2018 16:43:25 +0800
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Sat, 6 Oct 2018 14:46:46 +0200
x86/KASLR: Update KERNEL_IMAGE_SIZE description
Currently CONFIG_RANDOMIZE_BASE=y is set by default, which makes some of the
old comments above the KERNEL_IMAGE_SIZE definition out of date. Update them
to the current state of affairs.
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: corbet@lwn.net
Cc: linux-doc@vger.kernel.org
Cc: thgarnie@google.com
Link: http://lkml.kernel.org/r/20181006084327.27467-2-bhe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
arch/x86/include/asm/page_64_types.h | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 6afac386a434..cd0cf1c568b4 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -59,13 +59,16 @@
#endif
/*
- * Kernel image size is limited to 1GiB due to the fixmap living in the
- * next 1GiB (see level2_kernel_pgt in arch/x86/kernel/head_64.S). Use
- * 512MiB by default, leaving 1.5GiB for modules once the page tables
- * are fully set up. If kernel ASLR is configured, it can extend the
- * kernel page table mapping, reducing the size of the modules area.
+ * Maximum kernel image size is limited to 1 GiB, due to the fixmap living
+ * in the next 1 GiB (see level2_kernel_pgt in arch/x86/kernel/head_64.S).
+ *
+ * On KASLR use 1 GiB by default, leaving 1 GiB for modules once the
+ * page tables are fully set up.
+ *
+ * If KASLR is disabled we can shrink it to 0.5 GiB and increase the size
+ * of the modules area to 1.5 GiB.
*/
-#if defined(CONFIG_RANDOMIZE_BASE)
+#ifdef CONFIG_RANDOMIZE_BASE
#define KERNEL_IMAGE_SIZE (1024 * 1024 * 1024)
#else
#define KERNEL_IMAGE_SIZE (512 * 1024 * 1024)
* [tip:x86/mm] x86/mm/doc: Clean up the x86-64 virtual memory layout descriptions
2018-10-06 8:43 ` [PATCH 2/3] x86/mm/doc: Clean up the memory region layout descriptions Baoquan He
@ 2018-10-06 13:07 ` tip-bot for Baoquan He
0 siblings, 0 replies; 19+ messages in thread
From: tip-bot for Baoquan He @ 2018-10-06 13:07 UTC (permalink / raw)
To: linux-tip-commits
Cc: hpa, riel, luto, luto, bp, dave.hansen, peterz, tglx, dvlasenk,
mingo, torvalds, brgerst, linux-kernel, bhe
Commit-ID: 5b12904065798fee8b153a506ac7b72d5ebbe26c
Gitweb: https://git.kernel.org/tip/5b12904065798fee8b153a506ac7b72d5ebbe26c
Author: Baoquan He <bhe@redhat.com>
AuthorDate: Sat, 6 Oct 2018 16:43:26 +0800
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Sat, 6 Oct 2018 14:46:47 +0200
x86/mm/doc: Clean up the x86-64 virtual memory layout descriptions
In Documentation/x86/x86_64/mm.txt, the description of the x86-64 virtual
memory layout has become a confusing hodgepodge of inconsistencies:
- there's a hard to read mixture of 'TB' and 'bits' notation
- the entries sometimes mention a size in the description and sometimes not
- sometimes they list holes by address, sometimes only as an 'unused hole' line
So make it all a coherent, readable, well organized description.
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: corbet@lwn.net
Cc: linux-doc@vger.kernel.org
Cc: thgarnie@google.com
Link: http://lkml.kernel.org/r/20181006084327.27467-3-bhe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
Documentation/x86/x86_64/mm.txt | 84 ++++++++++++++++++++---------------------
1 file changed, 42 insertions(+), 42 deletions(-)
diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt
index 5432a96d31ff..b4bc95c9790e 100644
--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -1,55 +1,55 @@
Virtual memory map with 4 level page tables:
-0000000000000000 - 00007fffffffffff (=47 bits) user space, different per mm
-hole caused by [47:63] sign extension
-ffff800000000000 - ffff87ffffffffff (=43 bits) guard hole, reserved for hypervisor
-ffff880000000000 - ffffc7ffffffffff (=64 TB) direct mapping of all phys. memory
-ffffc80000000000 - ffffc8ffffffffff (=40 bits) hole
-ffffc90000000000 - ffffe8ffffffffff (=45 bits) vmalloc/ioremap space
-ffffe90000000000 - ffffe9ffffffffff (=40 bits) hole
-ffffea0000000000 - ffffeaffffffffff (=40 bits) virtual memory map (1TB)
-... unused hole ...
-ffffec0000000000 - fffffbffffffffff (=44 bits) kasan shadow memory (16TB)
-... unused hole ...
+0000000000000000 - 00007fffffffffff (=47 bits, 128 TB) user space, different per mm
+ hole caused by [47:63] sign extension
+ffff800000000000 - ffff87ffffffffff (=43 bits, 8 TB) guard hole, reserved for hypervisor
+ffff880000000000 - ffffc7ffffffffff (=46 bits, 64 TB) direct mapping of all phys. memory (page_offset_base)
+ffffc80000000000 - ffffc8ffffffffff (=40 bits, 1 TB) unused hole
+ffffc90000000000 - ffffe8ffffffffff (=45 bits, 32 TB) vmalloc/ioremap space (vmalloc_base)
+ffffe90000000000 - ffffe9ffffffffff (=40 bits, 1 TB) unused hole
+ffffea0000000000 - ffffeaffffffffff (=40 bits, 1 TB) virtual memory map (vmemmap_base)
+ffffeb0000000000 - ffffebffffffffff (=40 bits, 1 TB) unused hole
+ffffec0000000000 - fffffbffffffffff (=44 bits, 16 TB) kasan shadow memory
+fffffc0000000000 - fffffdffffffffff (=41 bits, 2 TB) unused hole
vaddr_end for KASLR
-fffffe0000000000 - fffffe7fffffffff (=39 bits) cpu_entry_area mapping
-fffffe8000000000 - fffffeffffffffff (=39 bits) LDT remap for PTI
-ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks
-... unused hole ...
-ffffffef00000000 - fffffffeffffffff (=64 GB) EFI region mapping space
-... unused hole ...
-ffffffff80000000 - ffffffff9fffffff (=512 MB) kernel text mapping, from phys 0
-ffffffffa0000000 - fffffffffeffffff (1520 MB) module mapping space
+fffffe0000000000 - fffffe7fffffffff (=39 bits, 512 GB) cpu_entry_area mapping
+fffffe8000000000 - fffffeffffffffff (=39 bits, 512 GB) LDT remap for PTI
+ffffff0000000000 - ffffff7fffffffff (=39 bits, 512 GB) %esp fixup stacks
+ffffff8000000000 - fffffffeefffffff (~39 bits, ~507 GB) unused hole
+ffffffef00000000 - fffffffeffffffff (=36 bits, 64 GB) EFI region mapping space
+ffffffff00000000 - ffffffff7fffffff (=31 bits, 2 GB) unused hole
+ffffffff80000000 - ffffffff9fffffff (=29 bits, 512 MB) kernel text mapping, from phys 0
+ffffffffa0000000 - fffffffffeffffff (~31 bits, 1520 MB) module mapping space
[fixmap start] - ffffffffff5fffff kernel-internal fixmap range
-ffffffffff600000 - ffffffffff600fff (=4 kB) legacy vsyscall ABI
-ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole
+ffffffffff600000 - ffffffffff600fff ( =4 kB) legacy vsyscall ABI
+ffffffffffe00000 - ffffffffffffffff ( =2 MB) unused hole
Virtual memory map with 5 level page tables:
-0000000000000000 - 00ffffffffffffff (=56 bits) user space, different per mm
-hole caused by [56:63] sign extension
-ff00000000000000 - ff0fffffffffffff (=52 bits) guard hole, reserved for hypervisor
-ff10000000000000 - ff8fffffffffffff (=55 bits) direct mapping of all phys. memory
-ff90000000000000 - ff9fffffffffffff (=52 bits) LDT remap for PTI
-ffa0000000000000 - ffd1ffffffffffff (=54 bits) vmalloc/ioremap space (12800 TB)
-ffd2000000000000 - ffd3ffffffffffff (=49 bits) hole
-ffd4000000000000 - ffd5ffffffffffff (=49 bits) virtual memory map (512TB)
-... unused hole ...
-ffdf000000000000 - fffffc0000000000 (=53 bits) kasan shadow memory (8PB)
-... unused hole ...
+0000000000000000 - 00ffffffffffffff (=56 bits, 64 PB) user space, different per mm
+ hole caused by [56:63] sign extension
+ff00000000000000 - ff0fffffffffffff (=52 bits, 4 PB) guard hole, reserved for hypervisor
+ff10000000000000 - ff8fffffffffffff (=55 bits, 32 PB) direct mapping of all phys. memory (page_offset_base)
+ff90000000000000 - ff9fffffffffffff (=52 bits, 4 PB) LDT remap for PTI
+ffa0000000000000 - ffd1ffffffffffff (=53 bits, 12800 TB) vmalloc/ioremap space (vmalloc_base)
+ffd2000000000000 - ffd3ffffffffffff (=49 bits, 512 TB) unused hole
+ffd4000000000000 - ffd5ffffffffffff (=49 bits, 512 TB) virtual memory map (vmemmap_base)
+ffd6000000000000 - ffdeffffffffffff (~51 bits, 2304 TB) unused hole
+ffdf000000000000 - fffffdffffffffff (~53 bits, ~8 PB) kasan shadow memory
+fffffc0000000000 - fffffdffffffffff (=41 bits, 2 TB) unused hole
vaddr_end for KASLR
-fffffe0000000000 - fffffe7fffffffff (=39 bits) cpu_entry_area mapping
-... unused hole ...
-ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks
-... unused hole ...
-ffffffef00000000 - fffffffeffffffff (=64 GB) EFI region mapping space
-... unused hole ...
-ffffffff80000000 - ffffffff9fffffff (=512 MB) kernel text mapping, from phys 0
-ffffffffa0000000 - fffffffffeffffff (1520 MB) module mapping space
+fffffe0000000000 - fffffe7fffffffff (=39 bits, 512 GB) cpu_entry_area mapping
+fffffe8000000000 - fffffeffffffffff (=39 bits, 512 GB) unused hole
+ffffff0000000000 - ffffff7fffffffff (=39 bits, 512 GB) %esp fixup stacks
+ffffff8000000000 - ffffffeeffffffff (~39 bits, 444 GB) unused hole
+ffffffef00000000 - fffffffeffffffff (=36 bits, 64 GB) EFI region mapping space
+ffffffff00000000 - ffffffff7fffffff (31 bits, 2 GB) unused hole
+ffffffff80000000 - ffffffff9fffffff (=29 bits, 512 MB) kernel text mapping, from phys 0
+ffffffffa0000000 - fffffffffeffffff (~31 bits, 1520 MB) module mapping space
[fixmap start] - ffffffffff5fffff kernel-internal fixmap range
-ffffffffff600000 - ffffffffff600fff (=4 kB) legacy vsyscall ABI
-ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole
+ffffffffff600000 - ffffffffff600fff ( =4 kB) legacy vsyscall ABI
+ffffffffffe00000 - ffffffffffffffff ( =2 MB) unused hole
Architecture defines a 64-bit virtual address. Implementations can support
less. Currently supported are 48- and 57-bit virtual addresses. Bits 63
* [PATCH 4/3 v2] x86/mm/doc: Enhance the x86-64 virtual memory layout descriptions
2018-10-06 12:22 ` [PATCH 4/3] x86/mm/doc: Enhance the x86-64 virtual memory layout descriptions Ingo Molnar
2018-10-06 12:33 ` Ingo Molnar
@ 2018-10-06 14:38 ` Ingo Molnar
2018-10-06 15:02 ` Baoquan He
2018-10-06 17:03 ` Ingo Molnar
1 sibling, 2 replies; 19+ messages in thread
From: Ingo Molnar @ 2018-10-06 14:38 UTC (permalink / raw)
To: Baoquan He
Cc: linux-kernel, x86, linux-doc, tglx, thgarnie, corbet,
Borislav Petkov, H. Peter Anvin, Andy Lutomirski, Peter Zijlstra,
Linus Torvalds, Andrew Morton
Find a new iteration below, fixed the bug, prettified the table some more.
Thanks,
Ingo
=========>
After the cleanups from Baoquan He, make it even more readable:
- Remove the 'bits' area size column: it's pretty pointless and was even
wrong for some of the entries. Given that KB, MB, GB, TB and PB are 10, 20,
30, 40 and 50 bits, an "8 TB" size description makes it obvious that it's
43 bits.
- Introduce an "offset" column:
--------------------------------------------------------------------------------
start addr | offset | end addr | size | VM area description
-----------------|------------|------------------|---------|--------------------
...
ffff880000000000 | -120 TB | ffffc7ffffffffff | 64 TB | direct mapping of all physical memory (page_offset_base),
this is what limits max physical memory supported.
The -120 TB notation makes it obvious where this particular virtual memory
region starts: 120 TB down from the top of the 64-bit virtual memory space.
Especially the layout of the kernel mappings is a *lot* more obvious when
written this way, plus it's much easier to compare it with the size column
and understand/check/validate and modify the kernel's layout in the future.
- Mark the part from where the 47-bit and 56-bit kernel layouts are 100% identical,
this starts at the -512 GB offset and the EFI region.
- Re-shuffle the size descriptions into continuous blocks of the same unit, instead of the
often mixed units. I.e. write "0.5 TB" instead of "512 GB" if we are still in
the TB-granular region of the map.
- Make the 47-bit and 56-bit descriptions use the *exact* same layout and wording,
and only differ where there's a material difference. This makes it easy to compare
the two tables side by side by switching between two terminal tabs.
- Plus enhance a lot of other stylistic/typographical details: make the tables
explicitly tabular, add headers, enhance certain entries, etc. etc.
Note that there are some apparent errors in the tables as well, but I'll fix
them in a separate patch to make it easier to review/validate.
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: corbet@lwn.net
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: thgarnie@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
Documentation/x86/x86_64/mm.txt | 171 ++++++++++++++++++++++++++++------------
1 file changed, 120 insertions(+), 51 deletions(-)
diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt
index b4bc95c9790e..702898633b00 100644
--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -1,55 +1,124 @@
+====================================================
+Complete virtual memory map with 4-level page tables
+====================================================
-Virtual memory map with 4 level page tables:
-
-0000000000000000 - 00007fffffffffff (=47 bits, 128 TB) user space, different per mm
- hole caused by [47:63] sign extension
-ffff800000000000 - ffff87ffffffffff (=43 bits, 8 TB) guard hole, reserved for hypervisor
-ffff880000000000 - ffffc7ffffffffff (=46 bits, 64 TB) direct mapping of all phys. memory (page_offset_base)
-ffffc80000000000 - ffffc8ffffffffff (=40 bits, 1 TB) unused hole
-ffffc90000000000 - ffffe8ffffffffff (=45 bits, 32 TB) vmalloc/ioremap space (vmalloc_base)
-ffffe90000000000 - ffffe9ffffffffff (=40 bits, 1 TB) unused hole
-ffffea0000000000 - ffffeaffffffffff (=40 bits, 1 TB) virtual memory map (vmemmap_base)
-ffffeb0000000000 - ffffebffffffffff (=40 bits, 1 TB) unused hole
-ffffec0000000000 - fffffbffffffffff (=44 bits, 16 TB) kasan shadow memory
-fffffc0000000000 - fffffdffffffffff (=41 bits, 2 TB) unused hole
- vaddr_end for KASLR
-fffffe0000000000 - fffffe7fffffffff (=39 bits, 512 GB) cpu_entry_area mapping
-fffffe8000000000 - fffffeffffffffff (=39 bits, 512 GB) LDT remap for PTI
-ffffff0000000000 - ffffff7fffffffff (=39 bits, 512 GB) %esp fixup stacks
-ffffff8000000000 - fffffffeefffffff (~39 bits, ~507 GB) unused hole
-ffffffef00000000 - fffffffeffffffff (=36 bits, 64 GB) EFI region mapping space
-ffffffff00000000 - ffffffff7fffffff (=31 bits, 2 GB) unused hole
-ffffffff80000000 - ffffffff9fffffff (=29 bits, 512 MB) kernel text mapping, from phys 0
-ffffffffa0000000 - fffffffffeffffff (~31 bits, 1520 MB) module mapping space
-[fixmap start] - ffffffffff5fffff kernel-internal fixmap range
-ffffffffff600000 - ffffffffff600fff ( =4 kB) legacy vsyscall ABI
-ffffffffffe00000 - ffffffffffffffff ( =2 MB) unused hole
-
-Virtual memory map with 5 level page tables:
-
-0000000000000000 - 00ffffffffffffff (=56 bits, 64 PB) user space, different per mm
- hole caused by [56:63] sign extension
-ff00000000000000 - ff0fffffffffffff (=52 bits, 4 PB) guard hole, reserved for hypervisor
-ff10000000000000 - ff8fffffffffffff (=55 bits, 32 PB) direct mapping of all phys. memory (page_offset_base)
-ff90000000000000 - ff9fffffffffffff (=52 bits, 4 PB) LDT remap for PTI
-ffa0000000000000 - ffd1ffffffffffff (=53 bits, 12800 TB) vmalloc/ioremap space (vmalloc_base)
-ffd2000000000000 - ffd3ffffffffffff (=49 bits, 512 TB) unused hole
-ffd4000000000000 - ffd5ffffffffffff (=49 bits, 512 TB) virtual memory map (vmemmap_base)
-ffd6000000000000 - ffdeffffffffffff (~51 bits, 2304 TB) unused hole
-ffdf000000000000 - fffffdffffffffff (~53 bits, ~8 PB) kasan shadow memory
-fffffc0000000000 - fffffdffffffffff (=41 bits, 2 TB) unused hole
- vaddr_end for KASLR
-fffffe0000000000 - fffffe7fffffffff (=39 bits, 512 GB) cpu_entry_area mapping
-fffffe8000000000 - fffffeffffffffff (=39 bits, 512 GB) unused hole
-ffffff0000000000 - ffffff7fffffffff (=39 bits, 512 GB) %esp fixup stacks
-ffffff8000000000 - ffffffeeffffffff (~39 bits, 444 GB) unused hole
-ffffffef00000000 - fffffffeffffffff (=36 bits, 64 GB) EFI region mapping space
-ffffffff00000000 - ffffffff7fffffff (31 bits, 2 GB) unused hole
-ffffffff80000000 - ffffffff9fffffff (=29 bits, 512 MB) kernel text mapping, from phys 0
-ffffffffa0000000 - fffffffffeffffff (~31 bits, 1520 MB) module mapping space
-[fixmap start] - ffffffffff5fffff kernel-internal fixmap range
-ffffffffff600000 - ffffffffff600fff ( =4 kB) legacy vsyscall ABI
-ffffffffffe00000 - ffffffffffffffff ( =2 MB) unused hole
+Notes:
+
+ - Negative addresses such as "-23 TB" are absolute addresses in bytes, counted down
+ from the top of the 64-bit address space. It's easier to understand the layout
+ when seen both in absolute addresses and in distance-from-top notation.
+
+ For example 0xffffe90000000000 == -23 TB, it's 23 TB lower than the top of the
+ 64-bit address space (ffffffffffffffff).
+
+ Note that as we get closer to the top of the address space, the notation changes
+ from TB to GB and then MB/KB.
+
+ - "16M TB" might look weird at first sight, but it's an easier to visualize size
+ notation than "16 EB", which few will recognize at first sight as 16 exabytes.
+ It also nicely shows how incredibly large the 64-bit address space is.
+
+========================================================================================================================
+ Start addr | Offset | End addr | Size | VM area description
+========================================================================================================================
+ | | | |
+ 0000000000000000 | 0 | 00007fffffffffff | 128 TB | user-space virtual memory, different per mm
+__________________|____________|__________________|_________|___________________________________________________________
+ | | | |
+ 0000800000000000 | +128 TB | ffff7fffffffffff | ~16M TB | ... huge, almost 64 bits wide hole of non-canonical
+ | | | | virtual memory addresses up to the -128 TB
+ | | | | starting offset of kernel mappings.
+__________________|____________|__________________|_________|___________________________________________________________
+ |
+ | Kernel-space virtual memory, shared between all processes:
+____________________________________________________________|___________________________________________________________
+ | | | |
+ ffff800000000000 | -128 TB | ffff87ffffffffff | 8 TB | ... guard hole, also reserved for hypervisor
+ ffff880000000000 | -120 TB | ffffc7ffffffffff | 64 TB | direct mapping of all physical memory (page_offset_base)
+ ffffc80000000000 | -56 TB | ffffc8ffffffffff | 1 TB | ... unused hole
+ ffffc90000000000 | -55 TB | ffffe8ffffffffff | 32 TB | vmalloc/ioremap space (vmalloc_base)
+ ffffe90000000000 | -23 TB | ffffe9ffffffffff | 1 TB | ... unused hole
+ ffffea0000000000 | -22 TB | ffffeaffffffffff | 1 TB | virtual memory map (vmemmap_base)
+ ffffeb0000000000 | -21 TB | ffffebffffffffff | 1 TB | ... unused hole
+ ffffec0000000000 | -20 TB | fffffbffffffffff | 16 TB | KASAN shadow memory
+ fffffc0000000000 | -4 TB | fffffdffffffffff | 2 TB | ... unused hole
+ | | | | vaddr_end for KASLR
+ fffffe0000000000 | -2 TB | fffffe7fffffffff | 0.5 TB | cpu_entry_area mapping
+ fffffe8000000000 | -1.5 TB | fffffeffffffffff | 0.5 TB | LDT remap for PTI
+ ffffff0000000000 | -1 TB | ffffff7fffffffff | 0.5 TB | %esp fixup stacks
+__________________|____________|__________________|_________|____________________________________________________________
+ |
+ | Identical layout to the 56-bit one from here on:
+____________________________________________________________|____________________________________________________________
+ | | | |
+ ffffff8000000000 | -512 GB | ffffffeeffffffff | 444 GB | ... unused hole
+ ffffffef00000000 | -68 GB | fffffffeffffffff | 64 GB | EFI region mapping space
+ ffffffff00000000 | -4 GB | ffffffff7fffffff | 2 GB | ... unused hole
+ ffffffff80000000 | -2 GB | ffffffff9fffffff | 512 MB | kernel text mapping, mapped to physical address 0
+ ffffffff80000000 |-2048 MB | | |
+ ffffffffa0000000 |-1536 MB | fffffffffeffffff | 1520 MB | module mapping space
+ ffffffffff000000 | -16 MB | | |
+ FIXADDR_START | ~-11 MB | ffffffffff5fffff | ~0.5 MB | kernel-internal fixmap range, variable size and offset
+ ffffffffff600000 | -10 MB | ffffffffff600fff | 4 kB | legacy vsyscall ABI
+ ffffffffffe00000 | -2 MB | ffffffffffffffff | 2 MB | ... unused hole
+__________________|____________|__________________|_________|___________________________________________________________
+
+
+====================================================
+Complete virtual memory map with 5-level page tables
+====================================================
+
+Notes:
+
+ - With 56-bit addresses, user-space memory gets expanded by a factor of 512x,
+ from 0.125 PB to 64 PB. All kernel mappings shift down to the -64 PB starting
+ offset and many of the regions expand to support the much larger physical
+ memory sizes that can be addressed.
+
+========================================================================================================================
+ Start addr | Offset | End addr | Size | VM area description
+========================================================================================================================
+ | | | |
+ 0000000000000000 | 0 | 00ffffffffffffff | 64 PB | user-space virtual memory, different per mm
+__________________|____________|__________________|_________|___________________________________________________________
+ | | | |
+ 0100000000000000 | +64 PB | feffffffffffffff | ~16K PB | ... huge, still almost 64 bits wide hole of non-canonical
+ | | | | virtual memory addresses up to the -64 PB
+ | | | | starting offset of kernel mappings.
+__________________|____________|__________________|_________|___________________________________________________________
+ |
+ | Kernel-space virtual memory, shared between all processes:
+____________________________________________________________|___________________________________________________________
+ | | | |
+ ff00000000000000 | -64 PB | ff0fffffffffffff | 4 PB | ... guard hole, also reserved for hypervisor
+ ff10000000000000 | -60 PB | ff8fffffffffffff | 32 PB | direct mapping of all physical memory (page_offset_base)
+ ff90000000000000 | -28 PB | ff9fffffffffffff | 4 PB | LDT remap for PTI
+ ffa0000000000000 | -24 PB | ffd1ffffffffffff | 12.5 PB | vmalloc/ioremap space (vmalloc_base)
+ ffd2000000000000 | -11.5 PB | ffd3ffffffffffff | 0.5 PB | ... unused hole
+ ffd4000000000000 | -11 PB | ffd5ffffffffffff | 0.5 PB | virtual memory map (vmemmap_base)
+ ffd6000000000000 | -10.5 PB | ffdeffffffffffff | 2.25 PB | ... unused hole
+ ffdf000000000000 | -8.25 PB | fffffbffffffffff | ~8 PB | KASAN shadow memory
+ fffffc0000000000 | -4 TB | fffffdffffffffff | 2 TB | ... unused hole
+ | | | | vaddr_end for KASLR
+ fffffe0000000000 | -2 TB | fffffe7fffffffff | 0.5 TB | cpu_entry_area mapping
+ fffffe8000000000 | -1.5 TB | fffffeffffffffff | 0.5 TB | ... unused hole
+ ffffff0000000000 | -1 TB | ffffff7fffffffff | 0.5 TB | %esp fixup stacks
+__________________|____________|__________________|_________|____________________________________________________________
+ |
+ | Identical layout to the 47-bit one from here on:
+____________________________________________________________|____________________________________________________________
+ | | | |
+ ffffff8000000000 | -512 GB | ffffffeeffffffff | 444 GB | ... unused hole
+ ffffffef00000000 | -68 GB | fffffffeffffffff | 64 GB | EFI region mapping space
+ ffffffff00000000 | -4 GB | ffffffff7fffffff | 2 GB | ... unused hole
+ ffffffff80000000 | -2 GB | ffffffff9fffffff | 512 MB | kernel text mapping, mapped to physical address 0
+ ffffffff80000000 |-2048 MB | | |
+ ffffffffa0000000 |-1536 MB | fffffffffeffffff | 1520 MB | module mapping space
+ ffffffffff000000 | -16 MB | | |
+ FIXADDR_START | ~-11 MB | ffffffffff5fffff | ~0.5 MB | kernel-internal fixmap range, variable size and offset
+ ffffffffff600000 | -10 MB | ffffffffff600fff | 4 kB | legacy vsyscall ABI
+ ffffffffffe00000 | -2 MB | ffffffffffffffff | 2 MB | ... unused hole
+__________________|____________|__________________|_________|___________________________________________________________
Architecture defines a 64-bit virtual address. Implementations can support
less. Currently supported are 48- and 57-bit virtual addresses. Bits 63
* Re: [PATCH 4/3] x86/mm/doc: Enhance the x86-64 virtual memory layout descriptions
2018-10-06 12:33 ` Ingo Molnar
@ 2018-10-06 14:41 ` Baoquan He
0 siblings, 0 replies; 19+ messages in thread
From: Baoquan He @ 2018-10-06 14:41 UTC (permalink / raw)
To: Ingo Molnar
Cc: Kirill A. Shutemov, linux-kernel, x86, linux-doc, tglx, thgarnie,
corbet, Borislav Petkov, H. Peter Anvin, Andy Lutomirski,
Peter Zijlstra, Linus Torvalds, Andrew Morton
On 10/06/18 at 02:33pm, Ingo Molnar wrote:
>
> * Ingo Molnar <mingo@kernel.org> wrote:
>
> > +========================================================
> > +| Complete virtual memory map with 4-level page tables |
> > +========================================================
>
> > +--------------------------------------------------------------------------------
> > +start addr | offset | end addr | size | VM area description
> > +-----------------|------------|------------------|---------|--------------------
>
> > +
> > +# Identical layout to the 56-bit one from here on:
> > +
> > +ffffff8000000000 | -512 GB | fffffffeefffffff | ~507 GB | ... unused hole
> > +ffffffef00000000 | -68 GB | fffffffeffffffff | 64 GB | EFI region mapping space
>
> > +========================================================
> > +| Complete virtual memory map with 5-level page tables |
> > +========================================================
>
> > +ffffff8000000000 | -0.5 TB | ffffffeeffffffff | 444 GB | ... unused hole
> > +
> > +# Identical layout to the 47-bit one from here on:
> > +
> > +ffffffef00000000 | -68 GB | fffffffeffffffff | 64 GB | EFI region mapping space
>
> So patch #2 appears to have introduced an error/typo in the 47-bit table. Note the weird size
> and discontinuity of the 'unused hole' in the 47-bit table, and compare it with 56-bit table:
>
> fffffffeefffffff
> ffffffeeffffffff
>
> (Note how the incorrect end address was cargo-cult-copied into the 'size' field of ~507 GB...)
>
> The correct number is the 56-bit one, and both tables should show the following identical
> layout:
>
> ffffff8000000000 | -512 GB | ffffffeeffffffff | 444 GB | ... unused hole
> ffffffef00000000 | -68 GB | fffffffeffffffff | 64 GB | EFI region mapping space
>
> Agreed?
Yes, you are right. I wondered why the size is a weird unaligned value.
Sorry about that.
Thanks
Baoquan
* Re: [PATCH 4/3 v2] x86/mm/doc: Enhance the x86-64 virtual memory layout descriptions
2018-10-06 14:38 ` [PATCH 4/3 v2] " Ingo Molnar
@ 2018-10-06 15:02 ` Baoquan He
2018-10-06 17:03 ` Ingo Molnar
1 sibling, 0 replies; 19+ messages in thread
From: Baoquan He @ 2018-10-06 15:02 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, x86, linux-doc, tglx, thgarnie, corbet,
Borislav Petkov, H. Peter Anvin, Andy Lutomirski, Peter Zijlstra,
Linus Torvalds, Andrew Morton
On 10/06/18 at 04:38pm, Ingo Molnar wrote:
> +Notes:
> +
> + - Negative addresses such as "-23 TB" are absolute addresses in bytes, counted down
> + from the top of the 64-bit address space. It's easier to understand the layout
> + when seen both in absolute addresses and in distance-from-top notation.
> +
> + For example 0xffffe90000000000 == -23 TB, it's 23 TB lower than the top of the
> + 64-bit address space (ffffffffffffffff).
> +
> + Note that as we get closer to the top of the address space, the notation changes
> + from TB to GB and then MB/KB.
> +
> + - "16M TB" might look weird at first sight, but it's an easier to visualize size
> + notation than "16 EB", which few will recognize at first sight as 16 exabytes.
> + It also nicely shows how incredibly large the 64-bit address space is.
Thanks, this looks much better than the old version and my change.
Reviewed-by: Baoquan He <bhe@redhat.com>
Thanks
Baoquan
> +
> +========================================================================================================================
> + Start addr | Offset | End addr | Size | VM area description
> +========================================================================================================================
> + | | | |
> + 0000000000000000 | 0 | 00007fffffffffff | 128 TB | user-space virtual memory, different per mm
> +__________________|____________|__________________|_________|___________________________________________________________
> + | | | |
> + 0000800000000000 | +128 TB | ffff7fffffffffff | ~16M TB | ... huge, almost 64 bits wide hole of non-canonical
> + | | | | virtual memory addresses up to the -128 TB
> + | | | | starting offset of kernel mappings.
> +__________________|____________|__________________|_________|___________________________________________________________
> + |
> + | Kernel-space virtual memory, shared between all processes:
> +____________________________________________________________|___________________________________________________________
> + | | | |
> + ffff800000000000 | -128 TB | ffff87ffffffffff | 8 TB | ... guard hole, also reserved for hypervisor
> + ffff880000000000 | -120 TB | ffffc7ffffffffff | 64 TB | direct mapping of all physical memory (page_offset_base)
> + ffffc80000000000 | -56 TB | ffffc8ffffffffff | 1 TB | ... unused hole
> + ffffc90000000000 | -55 TB | ffffe8ffffffffff | 32 TB | vmalloc/ioremap space (vmalloc_base)
> + ffffe90000000000 | -23 TB | ffffe9ffffffffff | 1 TB | ... unused hole
> + ffffea0000000000 | -22 TB | ffffeaffffffffff | 1 TB | virtual memory map (vmemmap_base)
> + ffffeb0000000000 | -21 TB | ffffebffffffffff | 1 TB | ... unused hole
> + ffffec0000000000 | -20 TB | fffffbffffffffff | 16 TB | KASAN shadow memory
> + fffffc0000000000 | -4 TB | fffffdffffffffff | 2 TB | ... unused hole
> + | | | | vaddr_end for KASLR
> + fffffe0000000000 | -2 TB | fffffe7fffffffff | 0.5 TB | cpu_entry_area mapping
> + fffffe8000000000 | -1.5 TB | fffffeffffffffff | 0.5 TB | LDT remap for PTI
> + ffffff0000000000 | -1 TB | ffffff7fffffffff | 0.5 TB | %esp fixup stacks
> +__________________|____________|__________________|_________|____________________________________________________________
> + |
> + | Identical layout to the 56-bit one from here on:
> +____________________________________________________________|____________________________________________________________
> + | | | |
> + ffffff8000000000 | -512 GB | ffffffeeffffffff | 444 GB | ... unused hole
> + ffffffef00000000 | -68 GB | fffffffeffffffff | 64 GB | EFI region mapping space
> + ffffffff00000000 | -4 GB | ffffffff7fffffff | 2 GB | ... unused hole
> + ffffffff80000000 | -2 GB | ffffffff9fffffff | 512 MB | kernel text mapping, mapped to physical address 0
> + ffffffff80000000 |-2048 MB | | |
> + ffffffffa0000000 |-1536 MB | fffffffffeffffff | 1520 MB | module mapping space
> + ffffffffff000000 | -16 MB | | |
> + FIXADDR_START | ~-11 MB | ffffffffff5fffff | ~0.5 MB | kernel-internal fixmap range, variable size and offset
> + ffffffffff600000 | -10 MB | ffffffffff600fff | 4 kB | legacy vsyscall ABI
> + ffffffffffe00000 | -2 MB | ffffffffffffffff | 2 MB | ... unused hole
> +__________________|____________|__________________|_________|___________________________________________________________
> +
> +
> +====================================================
> +Complete virtual memory map with 5-level page tables
> +====================================================
> +
> +Notes:
> +
> + - With 56-bit addresses, user-space memory gets expanded by a factor of 512x,
> + from 0.125 PB to 64 PB. All kernel mappings shift down to the -64 PB starting
> + offset and many of the regions expand to support the much larger amount of
> + physical memory.
> +
> +========================================================================================================================
> + Start addr | Offset | End addr | Size | VM area description
> +========================================================================================================================
> + | | | |
> + 0000000000000000 | 0 | 00ffffffffffffff | 64 PB | user-space virtual memory, different per mm
> +__________________|____________|__________________|_________|___________________________________________________________
> + | | | |
> + 0100000000000000 | +64 PB | feffffffffffffff | ~16K PB | ... huge, still almost 64 bits wide hole of non-canonical
> + | | | | virtual memory addresses up to the -64 PB
> + | | | | starting offset of kernel mappings.
> +__________________|____________|__________________|_________|___________________________________________________________
> + |
> + | Kernel-space virtual memory, shared between all processes:
> +____________________________________________________________|___________________________________________________________
> + | | | |
> + ff00000000000000 | -64 PB | ff0fffffffffffff | 4 PB | ... guard hole, also reserved for hypervisor
> + ff10000000000000 | -60 PB | ff8fffffffffffff | 32 PB | direct mapping of all physical memory (page_offset_base)
> + ff90000000000000 | -28 PB | ff9fffffffffffff | 4 PB | LDT remap for PTI
> + ffa0000000000000 | -24 PB | ffd1ffffffffffff | 12.5 PB | vmalloc/ioremap space (vmalloc_base)
> + ffd2000000000000 | -11.5 PB | ffd3ffffffffffff | 0.5 PB | ... unused hole
> + ffd4000000000000 | -11 PB | ffd5ffffffffffff | 0.5 PB | virtual memory map (vmemmap_base)
> + ffd6000000000000 | -10.5 PB | ffdeffffffffffff | 2.25 PB | ... unused hole
> + ffdf000000000000 | -8.25 PB | fffffbffffffffff | ~8 PB | KASAN shadow memory
> + fffffc0000000000 | -4 TB | fffffdffffffffff | 2 TB | ... unused hole
> + | | | | vaddr_end for KASLR
> + fffffe0000000000 | -2 TB | fffffe7fffffffff | 0.5 TB | cpu_entry_area mapping
> + fffffe8000000000 | -1.5 TB | fffffeffffffffff | 0.5 TB | ... unused hole
> + ffffff0000000000 | -1 TB | ffffff7fffffffff | 0.5 TB | %esp fixup stacks
> +__________________|____________|__________________|_________|____________________________________________________________
> + |
> + | Identical layout to the 47-bit one from here on:
> +____________________________________________________________|____________________________________________________________
> + | | | |
> + ffffff8000000000 | -512 GB | ffffffeeffffffff | 444 GB | ... unused hole
> + ffffffef00000000 | -68 GB | fffffffeffffffff | 64 GB | EFI region mapping space
> + ffffffff00000000 | -4 GB | ffffffff7fffffff | 2 GB | ... unused hole
> + ffffffff80000000 | -2 GB | ffffffff9fffffff | 512 MB | kernel text mapping, mapped to physical address 0
> + ffffffff80000000 |-2048 MB | | |
> + ffffffffa0000000 |-1536 MB | fffffffffeffffff | 1520 MB | module mapping space
> + ffffffffff000000 | -16 MB | | |
> + FIXADDR_START | ~-11 MB | ffffffffff5fffff | ~0.5 MB | kernel-internal fixmap range, variable size and offset
> + ffffffffff600000 | -10 MB | ffffffffff600fff | 4 kB | legacy vsyscall ABI
> + ffffffffffe00000 | -2 MB | ffffffffffffffff | 2 MB | ... unused hole
> +__________________|____________|__________________|_________|___________________________________________________________
>
> Architecture defines a 64-bit virtual address. Implementations can support
> less. Currently supported are 48- and 57-bit virtual addresses. Bits 63
* Re: [PATCH 4/3 v2] x86/mm/doc: Enhance the x86-64 virtual memory layout descriptions
2018-10-06 14:38 ` [PATCH 4/3 v2] " Ingo Molnar
2018-10-06 15:02 ` Baoquan He
@ 2018-10-06 17:03 ` Ingo Molnar
2018-10-06 22:17 ` Andy Lutomirski
1 sibling, 1 reply; 19+ messages in thread
From: Ingo Molnar @ 2018-10-06 17:03 UTC (permalink / raw)
To: Baoquan He, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
Kirill A. Shutemov
Cc: linux-kernel, x86, linux-doc, tglx, thgarnie, corbet,
Borislav Petkov, H. Peter Anvin, Andy Lutomirski, Peter Zijlstra,
Linus Torvalds, Andrew Morton
There's one PTI related layout asymmetry I noticed between 4-level and 5-level kernels:
47-bit:
> + |
> + | Kernel-space virtual memory, shared between all processes:
> +____________________________________________________________|___________________________________________________________
> + | | | |
> + ffff800000000000 | -128 TB | ffff87ffffffffff | 8 TB | ... guard hole, also reserved for hypervisor
> + ffff880000000000 | -120 TB | ffffc7ffffffffff | 64 TB | direct mapping of all physical memory (page_offset_base)
> + ffffc80000000000 | -56 TB | ffffc8ffffffffff | 1 TB | ... unused hole
> + ffffc90000000000 | -55 TB | ffffe8ffffffffff | 32 TB | vmalloc/ioremap space (vmalloc_base)
> + ffffe90000000000 | -23 TB | ffffe9ffffffffff | 1 TB | ... unused hole
> + ffffea0000000000 | -22 TB | ffffeaffffffffff | 1 TB | virtual memory map (vmemmap_base)
> + ffffeb0000000000 | -21 TB | ffffebffffffffff | 1 TB | ... unused hole
> + ffffec0000000000 | -20 TB | fffffbffffffffff | 16 TB | KASAN shadow memory
> + fffffc0000000000 | -4 TB | fffffdffffffffff | 2 TB | ... unused hole
> + | | | | vaddr_end for KASLR
> + fffffe0000000000 | -2 TB | fffffe7fffffffff | 0.5 TB | cpu_entry_area mapping
> + fffffe8000000000 | -1.5 TB | fffffeffffffffff | 0.5 TB | LDT remap for PTI
> + ffffff0000000000 | -1 TB | ffffff7fffffffff | 0.5 TB | %esp fixup stacks
> +__________________|____________|__________________|_________|____________________________________________________________
> + |
56-bit:
> + |
> + | Kernel-space virtual memory, shared between all processes:
> +____________________________________________________________|___________________________________________________________
> + | | | |
> + ff00000000000000 | -64 PB | ff0fffffffffffff | 4 PB | ... guard hole, also reserved for hypervisor
> + ff10000000000000 | -60 PB | ff8fffffffffffff | 32 PB | direct mapping of all physical memory (page_offset_base)
> + ff90000000000000 | -28 PB | ff9fffffffffffff | 4 PB | LDT remap for PTI
> + ffa0000000000000 | -24 PB | ffd1ffffffffffff | 12.5 PB | vmalloc/ioremap space (vmalloc_base)
> + ffd2000000000000 | -11.5 PB | ffd3ffffffffffff | 0.5 PB | ... unused hole
> + ffd4000000000000 | -11 PB | ffd5ffffffffffff | 0.5 PB | virtual memory map (vmemmap_base)
> + ffd6000000000000 | -10.5 PB | ffdeffffffffffff | 2.25 PB | ... unused hole
> + ffdf000000000000 | -8.25 PB | fffffbffffffffff | ~8 PB | KASAN shadow memory
> + fffffc0000000000 | -4 TB | fffffdffffffffff | 2 TB | ... unused hole
> + | | | | vaddr_end for KASLR
> + fffffe0000000000 | -2 TB | fffffe7fffffffff | 0.5 TB | cpu_entry_area mapping
> + fffffe8000000000 | -1.5 TB | fffffeffffffffff | 0.5 TB | ... unused hole
> + ffffff0000000000 | -1 TB | ffffff7fffffffff | 0.5 TB | %esp fixup stacks
The two layouts are very similar beyond the shift in the offset and the region sizes, except for
one big asymmetry: the placement of the LDT remap for PTI.
Is there any fundamental reason why the LDT area is mapped into a 4 petabyte (!) area on 56-bit
kernels, instead of being at the -1.5 TB offset like on 47-bit kernels?
The only reason I can see for this is that it's currently coded at the PGD level only:
static void map_ldt_struct_to_user(struct mm_struct *mm)
{
	pgd_t *pgd = pgd_offset(mm, LDT_BASE_ADDR);

	if (static_cpu_has(X86_FEATURE_PTI) && !mm->context.ldt)
		set_pgd(kernel_to_user_pgdp(pgd), *pgd);
}
( BTW., the 4 petabyte size of the area is misleading: a 5-level PGD entry covers 256 TB of
virtual memory, i.e. 0.25 PB, not 4 PB. So in reality we have a 0.25 PB area there, used up
by the LDT mapping in a single PGD entry, plus a 3.75 PB hole after that. )
... but unless I'm missing something it's not really fundamental for it to be at the PGD level
- it could be two levels lower as well, and it could move back to the same place where it's on
the 47-bit kernel.
The LDT mapping operation is pretty heavy already, and the actual use of the LDT is not
impacted by where it's mapped, as the LDT is per mm so no remapping is required on context
switch.
I.e. could we move the LDT over to the same place? This would make an even larger area of the
address space identical between 47-bit and 56-bit kernels:
|
| Identical layout to the 47-bit one from here on:
____________________________________________________________|____________________________________________________________
| | | |
fffffc0000000000 | -4 TB | fffffdffffffffff | 2 TB | ... unused hole
| | | | vaddr_end for KASLR
fffffe0000000000 | -2 TB | fffffe7fffffffff | 0.5 TB | cpu_entry_area mapping
fffffe8000000000 | -1.5 TB | fffffeffffffffff | 0.5 TB | LDT remap for PTI
ffffff0000000000 | -1 TB | ffffff7fffffffff | 0.5 TB | %esp fixup stacks
ffffff8000000000 | -512 GB | ffffffeeffffffff | 444 GB | ... unused hole
ffffffef00000000 | -68 GB | fffffffeffffffff | 64 GB | EFI region mapping space
ffffffff00000000 | -4 GB | ffffffff7fffffff | 2 GB | ... unused hole
ffffffff80000000 | -2 GB | ffffffff9fffffff | 512 MB | kernel text mapping, mapped to physical address 0
ffffffff80000000 |-2048 MB | | |
ffffffffa0000000 |-1536 MB | fffffffffeffffff | 1520 MB | module mapping space
ffffffffff000000 | -16 MB | | |
FIXADDR_START | ~-11 MB | ffffffffff5fffff | ~0.5 MB | kernel-internal fixmap range, variable size and offset
ffffffffff600000 | -10 MB | ffffffffff600fff | 4 kB | legacy vsyscall ABI
ffffffffffe00000 | -2 MB | ffffffffffffffff | 2 MB | ... unused hole
__________________|____________|__________________|_________|___________________________________________________________
And the rest would basically just be 4 areas: the direct-mapping, vmalloc, vmemmap and KASAN
areas - which are scaled according to whether it's a 47-bit or 56-bit kernel.
Thoughts?
Thanks,
Ingo
* Re: [PATCH 4/3 v2] x86/mm/doc: Enhance the x86-64 virtual memory layout descriptions
2018-10-06 17:03 ` Ingo Molnar
@ 2018-10-06 22:17 ` Andy Lutomirski
2018-10-09 0:35 ` Baoquan He
0 siblings, 1 reply; 19+ messages in thread
From: Andy Lutomirski @ 2018-10-06 22:17 UTC (permalink / raw)
To: Ingo Molnar
Cc: Baoquan He, Andrew Lutomirski, Dave Hansen, Peter Zijlstra,
Kirill A. Shutemov, LKML, X86 ML, linux-doc, Thomas Gleixner,
Thomas Garnier, Jonathan Corbet, Borislav Petkov, H. Peter Anvin,
Linus Torvalds, Andrew Morton
On Sat, Oct 6, 2018 at 10:03 AM Ingo Molnar <mingo@kernel.org> wrote:
>
>
> There's one PTI related layout asymmetry I noticed between 4-level and 5-level kernels:
>
> 47-bit:
> > + |
> > + | Kernel-space virtual memory, shared between all processes:
> > +____________________________________________________________|___________________________________________________________
> > + | | | |
> > + ffff800000000000 | -128 TB | ffff87ffffffffff | 8 TB | ... guard hole, also reserved for hypervisor
> > + ffff880000000000 | -120 TB | ffffc7ffffffffff | 64 TB | direct mapping of all physical memory (page_offset_base)
> > + ffffc80000000000 | -56 TB | ffffc8ffffffffff | 1 TB | ... unused hole
> > + ffffc90000000000 | -55 TB | ffffe8ffffffffff | 32 TB | vmalloc/ioremap space (vmalloc_base)
> > + ffffe90000000000 | -23 TB | ffffe9ffffffffff | 1 TB | ... unused hole
> > + ffffea0000000000 | -22 TB | ffffeaffffffffff | 1 TB | virtual memory map (vmemmap_base)
> > + ffffeb0000000000 | -21 TB | ffffebffffffffff | 1 TB | ... unused hole
> > + ffffec0000000000 | -20 TB | fffffbffffffffff | 16 TB | KASAN shadow memory
> > + fffffc0000000000 | -4 TB | fffffdffffffffff | 2 TB | ... unused hole
> > + | | | | vaddr_end for KASLR
> > + fffffe0000000000 | -2 TB | fffffe7fffffffff | 0.5 TB | cpu_entry_area mapping
> > + fffffe8000000000 | -1.5 TB | fffffeffffffffff | 0.5 TB | LDT remap for PTI
> > + ffffff0000000000 | -1 TB | ffffff7fffffffff | 0.5 TB | %esp fixup stacks
> > +__________________|____________|__________________|_________|____________________________________________________________
> > + |
>
> 56-bit:
> > + |
> > + | Kernel-space virtual memory, shared between all processes:
> > +____________________________________________________________|___________________________________________________________
> > + | | | |
> > + ff00000000000000 | -64 PB | ff0fffffffffffff | 4 PB | ... guard hole, also reserved for hypervisor
> > + ff10000000000000 | -60 PB | ff8fffffffffffff | 32 PB | direct mapping of all physical memory (page_offset_base)
> > + ff90000000000000 | -28 PB | ff9fffffffffffff | 4 PB | LDT remap for PTI
> > + ffa0000000000000 | -24 PB | ffd1ffffffffffff | 12.5 PB | vmalloc/ioremap space (vmalloc_base)
> > + ffd2000000000000 | -11.5 PB | ffd3ffffffffffff | 0.5 PB | ... unused hole
> > + ffd4000000000000 | -11 PB | ffd5ffffffffffff | 0.5 PB | virtual memory map (vmemmap_base)
> > + ffd6000000000000 | -10.5 PB | ffdeffffffffffff | 2.25 PB | ... unused hole
> > + ffdf000000000000 | -8.25 PB | fffffbffffffffff | ~8 PB | KASAN shadow memory
> > + fffffc0000000000 | -4 TB | fffffdffffffffff | 2 TB | ... unused hole
> > + | | | | vaddr_end for KASLR
> > + fffffe0000000000 | -2 TB | fffffe7fffffffff | 0.5 TB | cpu_entry_area mapping
> > + fffffe8000000000 | -1.5 TB | fffffeffffffffff | 0.5 TB | ... unused hole
> > + ffffff0000000000 | -1 TB | ffffff7fffffffff | 0.5 TB | %esp fixup stacks
>
> The two layouts are very similar beyond the shift in the offset and the region sizes, except for
> one big asymmetry: the placement of the LDT remap for PTI.
>
> Is there any fundamental reason why the LDT area is mapped into a 4 petabyte (!) area on 56-bit
> kernels, instead of being at the -1.5 TB offset like on 47-bit kernels?
>
> The only reason I can see for this is that it's currently coded at the PGD level only:
>
> static void map_ldt_struct_to_user(struct mm_struct *mm)
> {
> 	pgd_t *pgd = pgd_offset(mm, LDT_BASE_ADDR);
>
> 	if (static_cpu_has(X86_FEATURE_PTI) && !mm->context.ldt)
> 		set_pgd(kernel_to_user_pgdp(pgd), *pgd);
> }
>
> ( BTW., the 4 petabyte size of the area is misleading: a 5-level PGD entry covers 256 TB of
> virtual memory, i.e 0.25 PB, not 4 PB. So in reality we have a 0.25 PB area there, used up
> by the LDT mapping in a single PGD entry, plus a 3.75 PB hole after that. )
>
> ... but unless I'm missing something it's not really fundamental for it to be at the PGD level
> - it could be two levels lower as well, and it could move back to the same place where it's on
> the 47-bit kernel.
>
The subtlety is that, if it's lower than the PGD level, there end up
being some tables that are private to each LDT-using mm that map
things other than the LDT. Those tables cover the same address range
as some corresponding tables in init_mm, and if those tables in
init_mm change after the LDT mapping is set up, the changes won't
propagate.
So it probably could be made to work, but it would take some extra care.
* Re: [PATCH 4/3 v2] x86/mm/doc: Enhance the x86-64 virtual memory layout descriptions
2018-10-06 22:17 ` Andy Lutomirski
@ 2018-10-09 0:35 ` Baoquan He
2018-10-09 4:48 ` Baoquan He
0 siblings, 1 reply; 19+ messages in thread
From: Baoquan He @ 2018-10-09 0:35 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Ingo Molnar, Dave Hansen, Peter Zijlstra, Kirill A. Shutemov,
LKML, X86 ML, linux-doc, Thomas Gleixner, Thomas Garnier,
Jonathan Corbet, Borislav Petkov, H. Peter Anvin, Linus Torvalds,
Andrew Morton
Hi Andy, Ingo
On 10/06/18 at 03:17pm, Andy Lutomirski wrote:
> On Sat, Oct 6, 2018 at 10:03 AM Ingo Molnar <mingo@kernel.org> wrote:
> > ... but unless I'm missing something it's not really fundamental for it to be at the PGD level
> > - it could be two levels lower as well, and it could move back to the same place where it's on
> > the 47-bit kernel.
> >
>
> The subtlety is that, if it's lower than the PGD level, there end up
> being some tables that are private to each LDT-using mm that map
> things other than the LDT. Those tables cover the same address range
> as some corresponding tables in init_mm, and if those tables in
> init_mm change after the LDT mapping is set up, the changes won't
> propagate.
>
> So it probably could be made to work, but it would take some extra care.
I didn't know the LDT well before. After some investigation, it seems it is
mainly used by user-space programs like Wine, which call the modify_ldt
syscall to protect/isolate something, and Xen also uses it. I still don't
know how they use it to manipulate code/data segments.
From the current kernel code, the LDT can contain an array of 8192 entries,
each 8 bytes, when PTI is not enabled. If PTI is enabled, it's doubled:
there are 2 slots to map, 2 * 8192 * 8 = 128 KB in all. So one PMD entry
can cover it.
In 4-level paging mode, we reserve 512 GB of virtual address space for
mapping it, and that 512 GB is one PGD entry. In 5-level paging mode, we
reserve 4 PB for mapping the LDT, and leave the old 512 GB space next to
the cpu_entry_area mapping empty as an unused hole. Maybe we can still put
the LDT map for PTI in the old place, after the cpu_entry_area mapping, in
5-level mode too. Then in 5-level mode, 512 GB is only one p4d entry, and
it sits in the last pgd entry. Each pgd entry covers a 256 TB area, and the
last pgd entry points to a p4d table which always exists since it contains
the kernel text mapping, etc. If the LDT takes one entry in this
always-existing p4d table, maybe it can still work as it does now, when it
owns whole pgd entries (the 4 PB actually costs 16 pgd entries).
Most importantly, won't putting the LDT map for PTI inside the KASLR area
cause a bug if we randomize the direct mapping/vmalloc/vmemmap regions so
that they overlap the LDT map area? We didn't take the LDT into
consideration when doing memory region KASLR.
4-level virtual memory layout:
ffff800000000000 | -128 TB | ffff87ffffffffff | 8 TB | ... guard hole, also reserved for hypervisor
ffff880000000000 | -120 TB | ffffc7ffffffffff | 64 TB | direct mapping of all physical memory (page_offset_base)
ffffc80000000000 | -56 TB | ffffc8ffffffffff | 1 TB | ... unused hole
ffffc90000000000 | -55 TB | ffffe8ffffffffff | 32 TB | vmalloc/ioremap space (vmalloc_base)
ffffe90000000000 | -23 TB | ffffe9ffffffffff | 1 TB | ... unused hole
ffffea0000000000 | -22 TB | ffffeaffffffffff | 1 TB | virtual memory map (vmemmap_base)
ffffeb0000000000 | -21 TB | ffffebffffffffff | 1 TB | ... unused hole
ffffec0000000000 | -20 TB | fffffbffffffffff | 16 TB | KASAN shadow memory
fffffc0000000000 | -4 TB | fffffdffffffffff | 2 TB | ... unused hole
| | | | vaddr_end for KASLR
fffffe0000000000 | -2 TB | fffffe7fffffffff | 0.5 TB | cpu_entry_area mapping
fffffe8000000000 | -1.5 TB | fffffeffffffffff | 0.5 TB | LDT remap for PTI
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ffffff0000000000 | -1 TB | ffffff7fffffffff | 0.5 TB | %esp fixup stacks
5-level virtual memory layout:
ff10000000000000 | -60 PB | ff8fffffffffffff | 32 PB | direct mapping of all physical memory (page_offset_base)
ff90000000000000 | -28 PB | ff9fffffffffffff | 4 PB | LDT remap for PTI
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ffa0000000000000 | -24 PB | ffd1ffffffffffff | 12.5 PB | vmalloc/ioremap space (vmalloc_base)
ffd2000000000000 | -11.5 PB | ffd3ffffffffffff | 0.5 PB | ... unused hole
ffd4000000000000 | -11 PB | ffd5ffffffffffff | 0.5 PB | virtual memory map (vmemmap_base)
ffd6000000000000 | -10.5 PB | ffdeffffffffffff | 2.25 PB | ... unused hole
ffdf000000000000 | -8.25 PB | fffffbffffffffff | ~8 PB | KASAN shadow memory
fffffc0000000000 | -4 TB | fffffdffffffffff | 2 TB | ... unused hole
| | | | vaddr_end for KASLR
fffffe0000000000 | -2 TB | fffffe7fffffffff | 0.5 TB | cpu_entry_area mapping
fffffe8000000000 | -1.5 TB | fffffeffffffffff | 0.5 TB | ... unused hole
ffffff0000000000 | -1 TB | ffffff7fffffffff | 0.5 TB | %esp fixup stacks
* Re: [PATCH 4/3 v2] x86/mm/doc: Enhance the x86-64 virtual memory layout descriptions
2018-10-09 0:35 ` Baoquan He
@ 2018-10-09 4:48 ` Baoquan He
0 siblings, 0 replies; 19+ messages in thread
From: Baoquan He @ 2018-10-09 4:48 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Ingo Molnar, Dave Hansen, Peter Zijlstra, Kirill A. Shutemov,
LKML, X86 ML, linux-doc, Thomas Gleixner, Thomas Garnier,
Jonathan Corbet, Borislav Petkov, H. Peter Anvin, Linus Torvalds,
Andrew Morton
On 10/09/18 at 08:35am, Baoquan He wrote:
> Hi Andy, Ingo
>
> On 10/06/18 at 03:17pm, Andy Lutomirski wrote:
> > On Sat, Oct 6, 2018 at 10:03 AM Ingo Molnar <mingo@kernel.org> wrote:
> > > ... but unless I'm missing something it's not really fundamental for it to be at the PGD level
> > > - it could be two levels lower as well, and it could move back to the same place where it's on
> > > the 47-bit kernel.
> > >
> >
> > The subtlety is that, if it's lower than the PGD level, there end up
> > being some tables that are private to each LDT-using mm that map
> > things other than the LDT. Those tables cover the same address range
> > as some corresponding tables in init_mm, and if those tables in
> > init_mm change after the LDT mapping is set up, the changes won't
> > propagate.
> >
> > So it probably could be made to work, but it would take some extra care.
>
> In 4-level paging mode, we reserve 512 GB virtual address space for it to
> map, the 512 GB is one PGD entry. In 5-level paging mode, we reserve 4
> PB for mapping LDT, and leave the previous 512 GB space next to
> cpu_entry_area mapping empty as unused hole. Maybe we can still put LDT
> map for PTI in the old place, after cpu_entry_area mapping in 5-level.
> Then in 5-level, 512 GB is only one p4d entry, however it's in the last
> pgd entry, each pgd points to 256 TB area, and the last pgd entry will
> points to p4d table which always exists in system since it contains
> kernel text mapping etc. Now if LDT take one entry in the always
> existing p4d table, maybe it can still works as before it owns a whole
> pgd entry, oh, no, 4 PB will cost 16 pgd entries.
Sorry, I was too long-winded. What I mean is that a 512 GB LDT map would
occupy one p4d entry alone, and the corresponding pgd entry and p4d table
are always present, populated and unchanged. So there might not be any
page table change that needs to propagate. Not sure if there's any other
risk in this case.
Thanks
Baoquan
>
> Most importantly, putting LDT map for PTI in KASLR area, won't it cause
> code bug, if we randomize the direct mapping/vmalloc/vmemmap to make them
> overlap with LDT map area? We didn't take LDT into consideration when do
> memory region KASLR.
>
>
> 4-level virtual memory layout:
>
> ffff800000000000 | -128 TB | ffff87ffffffffff | 8 TB | ... guard hole, also reserved for hypervisor
> ffff880000000000 | -120 TB | ffffc7ffffffffff | 64 TB | direct mapping of all physical memory (page_offset_base)
> ffffc80000000000 | -56 TB | ffffc8ffffffffff | 1 TB | ... unused hole
> ffffc90000000000 | -55 TB | ffffe8ffffffffff | 32 TB | vmalloc/ioremap space (vmalloc_base)
> ffffe90000000000 | -23 TB | ffffe9ffffffffff | 1 TB | ... unused hole
> ffffea0000000000 | -22 TB | ffffeaffffffffff | 1 TB | virtual memory map (vmemmap_base)
> ffffeb0000000000 | -21 TB | ffffebffffffffff | 1 TB | ... unused hole
> ffffec0000000000 | -20 TB | fffffbffffffffff | 16 TB | KASAN shadow memory
> fffffc0000000000 | -4 TB | fffffdffffffffff | 2 TB | ... unused hole
> | | | | vaddr_end for KASLR
> fffffe0000000000 | -2 TB | fffffe7fffffffff | 0.5 TB | cpu_entry_area mapping
> fffffe8000000000 | -1.5 TB | fffffeffffffffff | 0.5 TB | LDT remap for PTI
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> ffffff0000000000 | -1 TB | ffffff7fffffffff | 0.5 TB | %esp fixup stacks
>
> 5-level virtual memory layout:
>
> ff10000000000000 | -60 PB | ff8fffffffffffff | 32 PB | direct mapping of all physical memory (page_offset_base)
> ff90000000000000 | -28 PB | ff9fffffffffffff | 4 PB | LDT remap for PTI
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> ffa0000000000000 | -24 PB | ffd1ffffffffffff | 12.5 PB | vmalloc/ioremap space (vmalloc_base)
> ffd2000000000000 | -11.5 PB | ffd3ffffffffffff | 0.5 PB | ... unused hole
> ffd4000000000000 | -11 PB | ffd5ffffffffffff | 0.5 PB | virtual memory map (vmemmap_base)
> ffd6000000000000 | -10.5 PB | ffdeffffffffffff | 2.25 PB | ... unused hole
> ffdf000000000000 | -8.25 PB | fffffbffffffffff | ~8 PB | KASAN shadow memory
> fffffc0000000000 | -4 TB | fffffdffffffffff | 2 TB | ... unused hole
> | | | | vaddr_end for KASLR
> fffffe0000000000 | -2 TB | fffffe7fffffffff | 0.5 TB | cpu_entry_area mapping
> fffffe8000000000 | -1.5 TB | fffffeffffffffff | 0.5 TB | ... unused hole
> ffffff0000000000 | -1 TB | ffffff7fffffffff | 0.5 TB | %esp fixup stacks
* Re: [PATCH 0/3] x86/mm/doc: Clean up mm.txt
2018-09-21 2:05 [PATCH 0/3] x86/mm/doc: Clean up mm.txt Baoquan He
@ 2018-09-27 0:02 ` Baoquan He
0 siblings, 0 replies; 19+ messages in thread
From: Baoquan He @ 2018-09-27 0:02 UTC (permalink / raw)
To: mingo, tglx, hpa
Cc: linux-kernel, kirill.shutemov, x86, thgarnie, corbet, linux-doc, peterz
On 09/21/18 at 10:05am, Baoquan He wrote:
> This clean up is suggested by Ingo.
This series is messy; I have sent v2. So NACK this v1 series.
>
> It first fixes the confusion in the mm layout tables by unifying
> each memory region description into a consistent style.
>
> Secondly, it takes the KASLR wording out of the mm layout tables into
> a separate section, so the tables only list the mm layout for the non-KASLR case.
> Then it adds KASLR documentation at the end of mm.txt.
>
> Meanwhile, it updates the documentation about KERNEL_IMAGE_SIZE in
> arch/x86/include/asm/page_64_types.h.
>
> Baoquan He (3):
> x86/KASLR: Update document about KERNEL_IMAGE_SIZE
> x86/mm/doc: Clean up the memory region layout descriptions
> x86/doc/kaslr.txt: Create a separate part of document abourt KASLR at
> the end of file
>
> Documentation/x86/x86_64/mm.txt | 138 +++++++++++++++++++++++------------
> arch/x86/include/asm/page_64_types.h | 7 +-
> 2 files changed, 96 insertions(+), 49 deletions(-)
>
> --
> 2.13.6
>
* [PATCH 0/3] x86/mm/doc: Clean up mm.txt
@ 2018-09-21 2:05 Baoquan He
2018-09-27 0:02 ` Baoquan He
0 siblings, 1 reply; 19+ messages in thread
From: Baoquan He @ 2018-09-21 2:05 UTC (permalink / raw)
To: mingo, tglx, hpa
Cc: linux-kernel, kirill.shutemov, x86, thgarnie, corbet, linux-doc,
peterz, Baoquan He
This cleanup was suggested by Ingo.
First, it fixes the confusion in the mm layout tables by describing
each memory region in a consistent style.
Second, it moves the KASLR wording out of the mm layout tables into a
separate section, so that the tables list only the non-KASLR layout.
It then adds a KASLR document at the end of mm.txt.
It also updates the KERNEL_IMAGE_SIZE description in
arch/x86/include/asm/page_64_types.h.
Baoquan He (3):
x86/KASLR: Update document about KERNEL_IMAGE_SIZE
x86/mm/doc: Clean up the memory region layout descriptions
x86/doc/kaslr.txt: Create a separate part of document abourt KASLR at
the end of file
Documentation/x86/x86_64/mm.txt | 138 +++++++++++++++++++++++------------
arch/x86/include/asm/page_64_types.h | 7 +-
2 files changed, 96 insertions(+), 49 deletions(-)
--
2.13.6
Thread overview: 19+ messages
2018-10-06 8:43 [PATCH 0/3] x86/mm/doc: Clean up mm.txt Baoquan He
2018-10-06 8:43 ` [PATCH 1/3] x86/KASLR: Update KERNEL_IMAGE_SIZE description Baoquan He
2018-10-06 13:06 ` [tip:x86/mm] " tip-bot for Baoquan He
2018-10-06 8:43 ` [PATCH 2/3] x86/mm/doc: Clean up the memory region layout descriptions Baoquan He
2018-10-06 13:07 ` [tip:x86/mm] x86/mm/doc: Clean up the x86-64 virtual memory " tip-bot for Baoquan He
2018-10-06 8:43 ` [PATCH 3/3] x86/doc/kaslr.txt: Create a separate part of document abourt KASLR at the end of file Baoquan He
2018-10-06 11:28 ` [PATCH 0/3] x86/mm/doc: Clean up mm.txt Baoquan He
2018-10-06 12:21 ` Ingo Molnar
2018-10-06 12:22 ` [PATCH 4/3] x86/mm/doc: Enhance the x86-64 virtual memory layout descriptions Ingo Molnar
2018-10-06 12:33 ` Ingo Molnar
2018-10-06 14:41 ` Baoquan He
2018-10-06 14:38 ` [PATCH 4/3 v2] " Ingo Molnar
2018-10-06 15:02 ` Baoquan He
2018-10-06 17:03 ` Ingo Molnar
2018-10-06 22:17 ` Andy Lutomirski
2018-10-09 0:35 ` Baoquan He
2018-10-09 4:48 ` Baoquan He
-- strict thread matches above, loose matches on Subject: below --
2018-09-21 2:05 [PATCH 0/3] x86/mm/doc: Clean up mm.txt Baoquan He
2018-09-27 0:02 ` Baoquan He