linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/3] x86/mm/doc: Clean up mm.txt
@ 2018-10-06  8:43 Baoquan He
  2018-10-06  8:43 ` [PATCH 1/3] x86/KASLR: Update KERNEL_IMAGE_SIZE description Baoquan He
                   ` (5 more replies)
  0 siblings, 6 replies; 19+ messages in thread
From: Baoquan He @ 2018-10-06  8:43 UTC
  To: mingo; +Cc: linux-kernel, x86, linux-doc, tglx, thgarnie, corbet, Baoquan He

This cleanup was suggested by Ingo.

First, it resolves the confusion in the mm layout tables by describing
every memory region in one consistent style.

Second, it takes the KASLR wording out of the mm layout tables, so that
they list only the mm layout in the non-KASLR case, and adds a separate
KASLR section at the end of mm.txt.

It also updates the description of KERNEL_IMAGE_SIZE in
arch/x86/include/asm/page_64_types.h.

v2->v3:
Ingo helped to prettify the patch log and code comments; reposted
after updating according to Ingo's suggestions.

v1->v2:

Resent as v2 because some typos and incorrect descriptions were found in
the v1 post.

Baoquan He (3):
  x86/KASLR: Update KERNEL_IMAGE_SIZE description
  x86/mm/doc: Clean up the memory region layout descriptions
  x86/doc/kaslr.txt: Create a separate part of document abourt KASLR at
    the end of file

 Documentation/x86/x86_64/mm.txt      | 150 +++++++++++++++++++++++------------
 arch/x86/include/asm/page_64_types.h |  15 ++--
 2 files changed, 107 insertions(+), 58 deletions(-)

-- 
2.13.6



* [PATCH 1/3] x86/KASLR: Update KERNEL_IMAGE_SIZE description
  2018-10-06  8:43 [PATCH 0/3] x86/mm/doc: Clean up mm.txt Baoquan He
@ 2018-10-06  8:43 ` Baoquan He
  2018-10-06 13:06   ` [tip:x86/mm] " tip-bot for Baoquan He
  2018-10-06  8:43 ` [PATCH 2/3] x86/mm/doc: Clean up the memory region layout descriptions Baoquan He
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Baoquan He @ 2018-10-06  8:43 UTC
  To: mingo; +Cc: linux-kernel, x86, linux-doc, tglx, thgarnie, corbet, Baoquan He

Currently CONFIG_RANDOMIZE_BASE=y is set by default, which makes some of the
old comments above the KERNEL_IMAGE_SIZE definition out of date. Update them
to the current state of affairs.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 arch/x86/include/asm/page_64_types.h | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 6afac386a434..cd0cf1c568b4 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -59,13 +59,16 @@
 #endif
 
 /*
- * Kernel image size is limited to 1GiB due to the fixmap living in the
- * next 1GiB (see level2_kernel_pgt in arch/x86/kernel/head_64.S). Use
- * 512MiB by default, leaving 1.5GiB for modules once the page tables
- * are fully set up. If kernel ASLR is configured, it can extend the
- * kernel page table mapping, reducing the size of the modules area.
+ * Maximum kernel image size is limited to 1 GiB, due to the fixmap living
+ * in the next 1 GiB (see level2_kernel_pgt in arch/x86/kernel/head_64.S).
+ *
+ * On KASLR use 1 GiB by default, leaving 1 GiB for modules once the
+ * page tables are fully set up.
+ *
+ * If KASLR is disabled we can shrink it to 0.5 GiB and increase the size
+ * of the modules area to 1.5 GiB.
  */
-#if defined(CONFIG_RANDOMIZE_BASE)
+#ifdef CONFIG_RANDOMIZE_BASE
 #define KERNEL_IMAGE_SIZE	(1024 * 1024 * 1024)
 #else
 #define KERNEL_IMAGE_SIZE	(512 * 1024 * 1024)
-- 
2.13.6
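
As a rough cross-check of the sizes quoted in the updated comment, the
standalone C sketch below derives the module area size from the kernel image
size. The start address 0xffffffff80000000 and the last module byte
0xfffffffffeffffff are taken from the mm.txt tables in this series; the helper
name module_area_bytes() is made up for illustration and is not a kernel
function.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)

/* Addresses as listed in Documentation/x86/x86_64/mm.txt in this series. */
#define KERNEL_TEXT_START 0xffffffff80000000ULL
#define MODULES_LAST_BYTE 0xfffffffffeffffffULL

/*
 * Hypothetical helper (not a kernel API): number of bytes left for the
 * module mapping space once kernel_image_size bytes are reserved for
 * the kernel text mapping.
 */
uint64_t module_area_bytes(uint64_t kernel_image_size)
{
    uint64_t modules_start = KERNEL_TEXT_START + kernel_image_size;

    assert(modules_start <= MODULES_LAST_BYTE);
    return MODULES_LAST_BYTE - modules_start + 1;
}
```

With these inputs a 512 MiB image leaves 1520 MiB for modules and a 1 GiB
image leaves 1008 MiB, so the "1.5 GiB" and "1 GiB" figures in the comment are
rounded values.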



* [PATCH 2/3] x86/mm/doc: Clean up the memory region layout descriptions
  2018-10-06  8:43 [PATCH 0/3] x86/mm/doc: Clean up mm.txt Baoquan He
  2018-10-06  8:43 ` [PATCH 1/3] x86/KASLR: Update KERNEL_IMAGE_SIZE description Baoquan He
@ 2018-10-06  8:43 ` Baoquan He
  2018-10-06 13:07   ` [tip:x86/mm] x86/mm/doc: Clean up the x86-64 virtual memory " tip-bot for Baoquan He
  2018-10-06  8:43 ` [PATCH 3/3] x86/doc/kaslr.txt: Create a separate part of document abourt KASLR at the end of file Baoquan He
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Baoquan He @ 2018-10-06  8:43 UTC
  To: mingo; +Cc: linux-kernel, x86, linux-doc, tglx, thgarnie, corbet, Baoquan He

In Documentation/x86/x86_64/mm.txt, the style of the memory region layout
descriptions is a little confusing:

 - sizes are given in a mix of TB and 'bits'
 - a size is sometimes mentioned in the description and sometimes not
 - holes are sometimes listed by address, sometimes only as an 'unused hole' line

So fix them up to use one consistent style.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 Documentation/x86/x86_64/mm.txt | 84 ++++++++++++++++++++---------------------
 1 file changed, 42 insertions(+), 42 deletions(-)

diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt
index 5432a96d31ff..b4bc95c9790e 100644
--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -1,55 +1,55 @@
 
 Virtual memory map with 4 level page tables:
 
-0000000000000000 - 00007fffffffffff (=47 bits) user space, different per mm
-hole caused by [47:63] sign extension
-ffff800000000000 - ffff87ffffffffff (=43 bits) guard hole, reserved for hypervisor
-ffff880000000000 - ffffc7ffffffffff (=64 TB) direct mapping of all phys. memory
-ffffc80000000000 - ffffc8ffffffffff (=40 bits) hole
-ffffc90000000000 - ffffe8ffffffffff (=45 bits) vmalloc/ioremap space
-ffffe90000000000 - ffffe9ffffffffff (=40 bits) hole
-ffffea0000000000 - ffffeaffffffffff (=40 bits) virtual memory map (1TB)
-... unused hole ...
-ffffec0000000000 - fffffbffffffffff (=44 bits) kasan shadow memory (16TB)
-... unused hole ...
+0000000000000000 - 00007fffffffffff (=47 bits,   128 TB) user space, different per mm
+				    hole caused by [47:63] sign extension
+ffff800000000000 - ffff87ffffffffff (=43 bits,     8 TB) guard hole, reserved for hypervisor
+ffff880000000000 - ffffc7ffffffffff (=46 bits,    64 TB) direct mapping of all phys. memory (page_offset_base)
+ffffc80000000000 - ffffc8ffffffffff (=40 bits,     1 TB) unused hole
+ffffc90000000000 - ffffe8ffffffffff (=45 bits,    32 TB) vmalloc/ioremap space (vmalloc_base)
+ffffe90000000000 - ffffe9ffffffffff (=40 bits,     1 TB) unused hole
+ffffea0000000000 - ffffeaffffffffff (=40 bits,     1 TB) virtual memory map (vmemmap_base)
+ffffeb0000000000 - ffffebffffffffff (=40 bits,     1 TB) unused hole
+ffffec0000000000 - fffffbffffffffff (=44 bits,    16 TB) kasan shadow memory
+fffffc0000000000 - fffffdffffffffff (=41 bits,     2 TB) unused hole
 				    vaddr_end for KASLR
-fffffe0000000000 - fffffe7fffffffff (=39 bits) cpu_entry_area mapping
-fffffe8000000000 - fffffeffffffffff (=39 bits) LDT remap for PTI
-ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks
-... unused hole ...
-ffffffef00000000 - fffffffeffffffff (=64 GB) EFI region mapping space
-... unused hole ...
-ffffffff80000000 - ffffffff9fffffff (=512 MB)  kernel text mapping, from phys 0
-ffffffffa0000000 - fffffffffeffffff (1520 MB) module mapping space
+fffffe0000000000 - fffffe7fffffffff (=39 bits,   512 GB) cpu_entry_area mapping
+fffffe8000000000 - fffffeffffffffff (=39 bits,   512 GB) LDT remap for PTI
+ffffff0000000000 - ffffff7fffffffff (=39 bits,   512 GB) %esp fixup stacks
+ffffff8000000000 - fffffffeefffffff (~39 bits,  ~507 GB) unused hole
+ffffffef00000000 - fffffffeffffffff (=36 bits,    64 GB) EFI region mapping space
+ffffffff00000000 - ffffffff7fffffff (=31 bits,     2 GB) unused hole
+ffffffff80000000 - ffffffff9fffffff (=29 bits,   512 MB) kernel text mapping, from phys 0
+ffffffffa0000000 - fffffffffeffffff (~31 bits,  1520 MB) module mapping space
 [fixmap start]   - ffffffffff5fffff kernel-internal fixmap range
-ffffffffff600000 - ffffffffff600fff (=4 kB) legacy vsyscall ABI
-ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole
+ffffffffff600000 - ffffffffff600fff (             =4 kB) legacy vsyscall ABI
+ffffffffffe00000 - ffffffffffffffff (             =2 MB) unused hole
 
 Virtual memory map with 5 level page tables:
 
-0000000000000000 - 00ffffffffffffff (=56 bits) user space, different per mm
-hole caused by [56:63] sign extension
-ff00000000000000 - ff0fffffffffffff (=52 bits) guard hole, reserved for hypervisor
-ff10000000000000 - ff8fffffffffffff (=55 bits) direct mapping of all phys. memory
-ff90000000000000 - ff9fffffffffffff (=52 bits) LDT remap for PTI
-ffa0000000000000 - ffd1ffffffffffff (=54 bits) vmalloc/ioremap space (12800 TB)
-ffd2000000000000 - ffd3ffffffffffff (=49 bits) hole
-ffd4000000000000 - ffd5ffffffffffff (=49 bits) virtual memory map (512TB)
-... unused hole ...
-ffdf000000000000 - fffffc0000000000 (=53 bits) kasan shadow memory (8PB)
-... unused hole ...
+0000000000000000 - 00ffffffffffffff (=56 bits,    64 PB) user space, different per mm
+				    hole caused by [56:63] sign extension
+ff00000000000000 - ff0fffffffffffff (=52 bits,     4 PB) guard hole, reserved for hypervisor
+ff10000000000000 - ff8fffffffffffff (=55 bits,    32 PB) direct mapping of all phys. memory (page_offset_base)
+ff90000000000000 - ff9fffffffffffff (=52 bits,     4 PB) LDT remap for PTI
+ffa0000000000000 - ffd1ffffffffffff (=53 bits, 12800 TB) vmalloc/ioremap space (vmalloc_base)
+ffd2000000000000 - ffd3ffffffffffff (=49 bits,   512 TB) unused hole
+ffd4000000000000 - ffd5ffffffffffff (=49 bits,   512 TB) virtual memory map (vmemmap_base)
+ffd6000000000000 - ffdeffffffffffff (~51 bits,  2304 TB) unused hole
+ffdf000000000000 - fffffdffffffffff (~53 bits,    ~8 PB) kasan shadow memory
+fffffc0000000000 - fffffdffffffffff (=41 bits,     2 TB) unused hole
 				    vaddr_end for KASLR
-fffffe0000000000 - fffffe7fffffffff (=39 bits) cpu_entry_area mapping
-... unused hole ...
-ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks
-... unused hole ...
-ffffffef00000000 - fffffffeffffffff (=64 GB) EFI region mapping space
-... unused hole ...
-ffffffff80000000 - ffffffff9fffffff (=512 MB)  kernel text mapping, from phys 0
-ffffffffa0000000 - fffffffffeffffff (1520 MB) module mapping space
+fffffe0000000000 - fffffe7fffffffff (=39 bits,   512 GB) cpu_entry_area mapping
+fffffe8000000000 - fffffeffffffffff (=39 bits,   512 GB) unused hole
+ffffff0000000000 - ffffff7fffffffff (=39 bits,   512 GB) %esp fixup stacks
+ffffff8000000000 - ffffffeeffffffff (~39 bits,   444 GB) unused hole
+ffffffef00000000 - fffffffeffffffff (=36 bits,    64 GB) EFI region mapping space
+ffffffff00000000 - ffffffff7fffffff (31 bits,      2 GB) unused hole
+ffffffff80000000 - ffffffff9fffffff (=29 bits,   512 MB) kernel text mapping, from phys 0
+ffffffffa0000000 - fffffffffeffffff (~31 bits,  1520 MB) module mapping space
 [fixmap start]   - ffffffffff5fffff kernel-internal fixmap range
-ffffffffff600000 - ffffffffff600fff (=4 kB) legacy vsyscall ABI
-ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole
+ffffffffff600000 - ffffffffff600fff (             =4 kB) legacy vsyscall ABI
+ffffffffffe00000 - ffffffffffffffff (             =2 MB) unused hole
 
 Architecture defines a 64-bit virtual address. Implementations can support
 less. Currently supported are 48- and 57-bit virtual addresses. Bits 63
-- 
2.13.6



* [PATCH 3/3] x86/doc/kaslr.txt: Create a separate part of document abourt KASLR at the end of file
  2018-10-06  8:43 [PATCH 0/3] x86/mm/doc: Clean up mm.txt Baoquan He
  2018-10-06  8:43 ` [PATCH 1/3] x86/KASLR: Update KERNEL_IMAGE_SIZE description Baoquan He
  2018-10-06  8:43 ` [PATCH 2/3] x86/mm/doc: Clean up the memory region layout descriptions Baoquan He
@ 2018-10-06  8:43 ` Baoquan He
  2018-10-06 11:28 ` [PATCH 0/3] x86/mm/doc: Clean up mm.txt Baoquan He
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 19+ messages in thread
From: Baoquan He @ 2018-10-06  8:43 UTC
  To: mingo; +Cc: linux-kernel, x86, linux-doc, tglx, thgarnie, corbet, Baoquan He

Keep the original content as the first part, listing only the static mm
layout tables for the non-KASLR case, then add a KASLR-related description
at the end.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 Documentation/x86/x86_64/mm.txt | 64 +++++++++++++++++++++++++++++++++++------
 1 file changed, 55 insertions(+), 9 deletions(-)

diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt
index b4bc95c9790e..549fcae596e0 100644
--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -1,5 +1,5 @@
 
-Virtual memory map with 4 level page tables:
+MM layout in non-KASLR case:
 
 0000000000000000 - 00007fffffffffff (=47 bits,   128 TB) user space, different per mm
 				    hole caused by [47:63] sign extension
@@ -12,7 +12,6 @@ ffffea0000000000 - ffffeaffffffffff (=40 bits,     1 TB) virtual memory map (vme
 ffffeb0000000000 - ffffebffffffffff (=40 bits,     1 TB) unused hole
 ffffec0000000000 - fffffbffffffffff (=44 bits,    16 TB) kasan shadow memory
 fffffc0000000000 - fffffdffffffffff (=41 bits,     2 TB) unused hole
-				    vaddr_end for KASLR
 fffffe0000000000 - fffffe7fffffffff (=39 bits,   512 GB) cpu_entry_area mapping
 fffffe8000000000 - fffffeffffffffff (=39 bits,   512 GB) LDT remap for PTI
 ffffff0000000000 - ffffff7fffffffff (=39 bits,   512 GB) %esp fixup stacks
@@ -38,7 +37,6 @@ ffd4000000000000 - ffd5ffffffffffff (=49 bits,   512 TB) virtual memory map (vme
 ffd6000000000000 - ffdeffffffffffff (~51 bits,  2304 TB) unused hole
 ffdf000000000000 - fffffdffffffffff (~53 bits,    ~8 PB) kasan shadow memory
 fffffc0000000000 - fffffdffffffffff (=41 bits,     2 TB) unused hole
-				    vaddr_end for KASLR
 fffffe0000000000 - fffffe7fffffffff (=39 bits,   512 GB) cpu_entry_area mapping
 fffffe8000000000 - fffffeffffffffff (=39 bits,   512 GB) unused hole
 ffffff0000000000 - ffffff7fffffffff (=39 bits,   512 GB) %esp fixup stacks
@@ -70,10 +68,58 @@ memory window (this size is arbitrary, it can be raised later if needed).
 The mappings are not part of any other kernel PGD and are only available
 during EFI runtime calls.
 
-Note that if CONFIG_RANDOMIZE_MEMORY is enabled, the direct mapping of all
-physical memory, vmalloc/ioremap space and virtual memory map are randomized.
-Their order is preserved but their base will be offset early at boot time.
+MM layout related to KASLR
+=========================================================================
 
-Be very careful vs. KASLR when changing anything here. The KASLR address
-range must not overlap with anything except the KASAN shadow area, which is
-correct as KASAN disables KASLR.
+Kernel Address Space Layout Randomization (KASLR) consists of two parts
+which work together to enhance the security of the Linux kernel:
+
+ - Kernel text KASLR
+ - Memory region KASLR
+
+Kernel text KASLR
+-----------------
+The physical address and the virtual address of the kernel text are
+randomized independently of each other. The physical address of the
+kernel can be anywhere below 64 TB in 4-level paging mode and below
+4 PB in 5-level paging mode, while the virtual address of the kernel
+is restricted to the 1 GB range
+[0xffffffff80000000, 0xffffffffbfffffff].
+
+ffffffff80000000 - ffffffffbfffffff (1 GB)  kernel text mapping, from phys 0
+ffffffffc0000000 - fffffffffeffffff (1 GB) module mapping space
+
+Note: Kernel text KASLR uses a 1 GB range to randomize the position of the
+kernel image, and it is enabled by default. If the KASLR config option
+CONFIG_RANDOMIZE_BASE is not enabled, the space for the kernel image shrinks
+to 512 MB and the size of the modules area grows accordingly to 1.5 GB.
+
+Memory region KASLR
+-------------------
+If CONFIG_RANDOMIZE_MEMORY is enabled, the below three memory regions
+are randomized. Their order is preserved but their base will be offset
+early at boot time.
+
+   - direct mapping region
+   - vmalloc region
+   - vmemmap region
+
+The KASLR address range must not overlap with anything except the KASAN
+shadow area, which is correct as KASAN disables KASLR.
+
+Taking 4-level paging mode as an example, the range from the original
+starting address of the direct mapping region for physical RAM up to the
+starting address of the cpu_entry_area mapping region, namely
+[0xffff880000000000 - 0xfffffdffffffffff], 118 TB in all, is the virtual
+address space within which memory region KASLR is allowed to move those
+memory regions around. After the KASLR manipulation is done, their
+layout looks like:
+
+Name            Starting address        Size                                         Aligned
+-----------------------------------------------------------------------------------------------
+direct mapping  page_offset_base        [actual size of system RAM + 10 TB padding]  1 GB
+*guard hole     random                  random                                       1 GB
+vmalloc         vmalloc_base            32 TB                                        1 GB
+*guard hole     random                  random                                       1 GB
+vmemmap         vmemmap_base            1 TB                                         1 GB
+*guard hole     random                  random                                       1 GB
-- 
2.13.6



* Re: [PATCH 0/3] x86/mm/doc: Clean up mm.txt
  2018-10-06  8:43 [PATCH 0/3] x86/mm/doc: Clean up mm.txt Baoquan He
                   ` (2 preceding siblings ...)
  2018-10-06  8:43 ` [PATCH 3/3] x86/doc/kaslr.txt: Create a separate part of document abourt KASLR at the end of file Baoquan He
@ 2018-10-06 11:28 ` Baoquan He
  2018-10-06 12:21 ` Ingo Molnar
  2018-10-06 12:22 ` [PATCH 4/3] x86/mm/doc: Enhance the x86-64 virtual memory layout descriptions Ingo Molnar
  5 siblings, 0 replies; 19+ messages in thread
From: Baoquan He @ 2018-10-06 11:28 UTC
  To: mingo; +Cc: linux-kernel, x86, linux-doc, tglx, thgarnie, corbet

On 10/06/18 at 04:43pm, Baoquan He wrote:
> This cleanup was suggested by Ingo.
> 
> First, it resolves the confusion in the mm layout tables by describing
> every memory region in one consistent style.
> 
> Second, it takes the KASLR wording out of the mm layout tables, so that
> they list only the mm layout in the non-KASLR case, and adds a separate
> KASLR section at the end of mm.txt.
> 
> It also updates the description of KERNEL_IMAGE_SIZE in
> arch/x86/include/asm/page_64_types.h.

Sorry, this is the v3 post. I forgot to add the v3 tag to the patch subject.

> 
> v2->v3:
> Ingo helped to prettify the patch log and code comments; reposted
> after updating according to Ingo's suggestions.
> 
> v1->v2:
> 
> Resent as v2 because some typos and incorrect descriptions were found in
> the v1 post.
> 
> Baoquan He (3):
>   x86/KASLR: Update KERNEL_IMAGE_SIZE description
>   x86/mm/doc: Clean up the memory region layout descriptions
>   x86/doc/kaslr.txt: Create a separate part of document abourt KASLR at
>     the end of file
> 
>  Documentation/x86/x86_64/mm.txt      | 150 +++++++++++++++++++++++------------
>  arch/x86/include/asm/page_64_types.h |  15 ++--
>  2 files changed, 107 insertions(+), 58 deletions(-)
> 
> -- 
> 2.13.6
> 


* Re: [PATCH 0/3] x86/mm/doc: Clean up mm.txt
  2018-10-06  8:43 [PATCH 0/3] x86/mm/doc: Clean up mm.txt Baoquan He
                   ` (3 preceding siblings ...)
  2018-10-06 11:28 ` [PATCH 0/3] x86/mm/doc: Clean up mm.txt Baoquan He
@ 2018-10-06 12:21 ` Ingo Molnar
  2018-10-06 12:22 ` [PATCH 4/3] x86/mm/doc: Enhance the x86-64 virtual memory layout descriptions Ingo Molnar
  5 siblings, 0 replies; 19+ messages in thread
From: Ingo Molnar @ 2018-10-06 12:21 UTC
  To: Baoquan He
  Cc: linux-kernel, x86, linux-doc, tglx, thgarnie, corbet,
	Borislav Petkov, Peter Zijlstra, Linus Torvalds, Andy Lutomirski


* Baoquan He <bhe@redhat.com> wrote:

> This cleanup was suggested by Ingo.
> 
> First, it resolves the confusion in the mm layout tables by describing
> every memory region in one consistent style.
> 
> Second, it takes the KASLR wording out of the mm layout tables, so that
> they list only the mm layout in the non-KASLR case, and adds a separate
> KASLR section at the end of mm.txt.
> 
> It also updates the description of KERNEL_IMAGE_SIZE in
> arch/x86/include/asm/page_64_types.h.
> 
> v2->v3:
> Ingo helped to prettify the patch log and code comments; reposted
> after updating according to Ingo's suggestions.
> 
> v1->v2:
> 
> Resent as v2 because some typos and incorrect descriptions were found in
> the v1 post.
> 
> Baoquan He (3):
>   x86/KASLR: Update KERNEL_IMAGE_SIZE description
>   x86/mm/doc: Clean up the memory region layout descriptions
>   x86/doc/kaslr.txt: Create a separate part of document abourt KASLR at
>     the end of file

Thanks, patches #1 and #2 are looking good and I'll apply them with some minor fixes.
I'll comment on patch #3 separately.

I also wrote a larger patch enhancing the descriptions some more; I'll send that separately.

Thanks,

	Ingo


* [PATCH 4/3] x86/mm/doc: Enhance the x86-64 virtual memory layout descriptions
  2018-10-06  8:43 [PATCH 0/3] x86/mm/doc: Clean up mm.txt Baoquan He
                   ` (4 preceding siblings ...)
  2018-10-06 12:21 ` Ingo Molnar
@ 2018-10-06 12:22 ` Ingo Molnar
  2018-10-06 12:33   ` Ingo Molnar
  2018-10-06 14:38   ` [PATCH 4/3 v2] " Ingo Molnar
  5 siblings, 2 replies; 19+ messages in thread
From: Ingo Molnar @ 2018-10-06 12:22 UTC
  To: Baoquan He
  Cc: linux-kernel, x86, linux-doc, tglx, thgarnie, corbet,
	Borislav Petkov, H. Peter Anvin, Andy Lutomirski, Peter Zijlstra,
	Linus Torvalds, Andrew Morton

After the cleanups from Baoquan He, make it even more readable:

 - Remove the 'bits' area size column: it's pretty pointless and was even
   wrong for some of the entries. Given that MB, GB, TB and PB are 20, 30,
   40 and 50 bits, a "8 TB" size description makes it obvious that it's
   43 bits.

 - Introduce an "offset" column:

    --------------------------------------------------------------------------------
    start addr       | offset     | end addr         |  size   | VM area description
    -----------------|------------|------------------|---------|--------------------
    ...
    ffff880000000000 | -120    TB | ffffc7ffffffffff |   64 TB | direct mapping of all physical memory (page_offset_base),
                                                                 this is what limits max physical memory supported.

   The -120 TB notation makes it obvious where this particular virtual memory
   region starts: 120 TB down from the top of the 64-bit virtual memory space.
   Especially the layout of the kernel mappings is a *lot* more obvious when
   written this way, plus it's much easier to compare it with the size column
   and understand/check/validate and modify the kernel's layout in the future.

 - Mark the part from where the 47-bit and 56-bit kernel layouts are 100% identical,
   this starts at the -512 GB offset and the EFI region.

 - Re-shuffle the size descriptions to be continuous blocks of sizes, instead of the
   often mixed size. I.e. write "0.5 TB" instead of "512 GB" if we are still in
   the TB-granular region of the map.

 - Make the 47-bit and 56-bit descriptions use the *exact* same layout and wording,
   and only differ where there's a material difference. This makes it easy to compare
   the two tables side by side by switching between two terminal tabs.

 - Plus enhance a lot of other stylistic/typographical details: make the tables
   explicitly tabular, add headers, enhance certain entries, etc. etc.

Note that there are some apparent errors in the tables as well, but I'll fix
them in a separate patch to make it easier to review/validate.
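
The "offset" column described above is the distance of an address below the
top of the 64-bit address space. A minimal C sketch of that conversion
follows, assuming TB-aligned addresses in the kernel half; the function name
offset_from_top_tb() is invented for this example.

```c
#include <stdint.h>

#define TB_SHIFT 40

/*
 * Hypothetical helper: distance of addr from the top of the 64-bit
 * address space, in whole TB, reported as a negative number.
 * Only meaningful for TB-aligned kernel-half addresses; anything less
 * than 1 TB below the top truncates to 0.
 */
int64_t offset_from_top_tb(uint64_t addr)
{
    /* 2^64 - addr, computed modulo 2^64 in unsigned arithmetic. */
    uint64_t below_top = (uint64_t)0 - addr;

    return -(int64_t)(below_top >> TB_SHIFT);
}
```

Applied to ffff880000000000 this gives -120, and applied to ffffe90000000000
it gives -23, matching the offsets shown in the table.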

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: corbet@lwn.net
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: thgarnie@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 Documentation/x86/x86_64/mm.txt |  172 ++++++++++++++++++++++++++++------------
 kernel/sched/core.c             |    6 +
 2 files changed, 128 insertions(+), 50 deletions(-)

Index: tip/Documentation/x86/x86_64/mm.txt
===================================================================
--- tip.orig/Documentation/x86/x86_64/mm.txt
+++ tip/Documentation/x86/x86_64/mm.txt
@@ -1,55 +1,127 @@
 
-Virtual memory map with 4 level page tables:
+========================================================
+| Complete virtual memory map with 4-level page tables |
+========================================================
 
-0000000000000000 - 00007fffffffffff (=47 bits,   128 TB) user space, different per mm
-				    hole caused by [47:63] sign extension
-ffff800000000000 - ffff87ffffffffff (=43 bits,     8 TB) guard hole, reserved for hypervisor
-ffff880000000000 - ffffc7ffffffffff (=46 bits,    64 TB) direct mapping of all phys. memory (page_offset_base)
-ffffc80000000000 - ffffc8ffffffffff (=40 bits,     1 TB) unused hole
-ffffc90000000000 - ffffe8ffffffffff (=45 bits,    32 TB) vmalloc/ioremap space (vmalloc_base)
-ffffe90000000000 - ffffe9ffffffffff (=40 bits,     1 TB) unused hole
-ffffea0000000000 - ffffeaffffffffff (=40 bits,     1 TB) virtual memory map (vmemmap_base)
-ffffeb0000000000 - ffffebffffffffff (=40 bits,     1 TB) unused hole
-ffffec0000000000 - fffffbffffffffff (=44 bits,    16 TB) kasan shadow memory
-fffffc0000000000 - fffffdffffffffff (=41 bits,     2 TB) unused hole
-				    vaddr_end for KASLR
-fffffe0000000000 - fffffe7fffffffff (=39 bits,   512 GB) cpu_entry_area mapping
-fffffe8000000000 - fffffeffffffffff (=39 bits,   512 GB) LDT remap for PTI
-ffffff0000000000 - ffffff7fffffffff (=39 bits,   512 GB) %esp fixup stacks
-ffffff8000000000 - fffffffeefffffff (~39 bits,  ~507 GB) unused hole
-ffffffef00000000 - fffffffeffffffff (=36 bits,    64 GB) EFI region mapping space
-ffffffff00000000 - ffffffff7fffffff (=31 bits,     2 GB) unused hole
-ffffffff80000000 - ffffffff9fffffff (=29 bits,   512 MB) kernel text mapping, from phys 0
-ffffffffa0000000 - fffffffffeffffff (~31 bits,  1520 MB) module mapping space
-[fixmap start]   - ffffffffff5fffff kernel-internal fixmap range
-ffffffffff600000 - ffffffffff600fff (             =4 kB) legacy vsyscall ABI
-ffffffffffe00000 - ffffffffffffffff (             =2 MB) unused hole
-
-Virtual memory map with 5 level page tables:
-
-0000000000000000 - 00ffffffffffffff (=56 bits,    64 PB) user space, different per mm
-				    hole caused by [56:63] sign extension
-ff00000000000000 - ff0fffffffffffff (=52 bits,     4 PB) guard hole, reserved for hypervisor
-ff10000000000000 - ff8fffffffffffff (=55 bits,    32 PB) direct mapping of all phys. memory (page_offset_base)
-ff90000000000000 - ff9fffffffffffff (=52 bits,     4 PB) LDT remap for PTI
-ffa0000000000000 - ffd1ffffffffffff (=53 bits, 12800 TB) vmalloc/ioremap space (vmalloc_base)
-ffd2000000000000 - ffd3ffffffffffff (=49 bits,   512 TB) unused hole
-ffd4000000000000 - ffd5ffffffffffff (=49 bits,   512 TB) virtual memory map (vmemmap_base)
-ffd6000000000000 - ffdeffffffffffff (~51 bits,  2304 TB) unused hole
-ffdf000000000000 - fffffdffffffffff (~53 bits,    ~8 PB) kasan shadow memory
-fffffc0000000000 - fffffdffffffffff (=41 bits,     2 TB) unused hole
-				    vaddr_end for KASLR
-fffffe0000000000 - fffffe7fffffffff (=39 bits,   512 GB) cpu_entry_area mapping
-fffffe8000000000 - fffffeffffffffff (=39 bits,   512 GB) unused hole
-ffffff0000000000 - ffffff7fffffffff (=39 bits,   512 GB) %esp fixup stacks
-ffffff8000000000 - ffffffeeffffffff (~39 bits,   444 GB) unused hole
-ffffffef00000000 - fffffffeffffffff (=36 bits,    64 GB) EFI region mapping space
-ffffffff00000000 - ffffffff7fffffff (31 bits,      2 GB) unused hole
-ffffffff80000000 - ffffffff9fffffff (=29 bits,   512 MB) kernel text mapping, from phys 0
-ffffffffa0000000 - fffffffffeffffff (~31 bits,  1520 MB) module mapping space
-[fixmap start]   - ffffffffff5fffff kernel-internal fixmap range
-ffffffffff600000 - ffffffffff600fff (             =4 kB) legacy vsyscall ABI
-ffffffffffe00000 - ffffffffffffffff (             =2 MB) unused hole
+Notes:
+
+ - Negative addresses such as "-23 TB" are absolute addresses in bytes, counted down
+   from the top of the 64-bit address space. It's easier to understand the layout
+   when seen both in absolute addresses and in distance-from-top notation.
+
+   For example 0xffffe90000000000 == -23 TB, it's 23 TB lower than the top of the
+   64-bit address space (ffffffffffffffff).
+
+   Note that as we get closer to the top of the address space, the notation changes
+   from TB to GB and then MB/KB.
+
+ - "16M TB" might look weird at first sight, but it's an easier to visualize size
+   notation than "16 EB", which few will recognize at first sight as 16 exabytes.
+   It also shows it nicely how incredibly large 64-bit address space is.
+
+--------------------------------------------------------------------------------
+start addr       | offset     | end addr         |  size   | VM area description
+-----------------|------------|------------------|---------|--------------------
+0000000000000000 |    0       | 00007fffffffffff |  128 TB | user-space virtual memory, different per mm
+                                                           |
+0000800000000000 | +128    TB | ffff7fffffffffff | ~16M TB | ... huge, almost 64 bits wide hole of non-canonical
+                                                                 virtual memory addresses up to the -128 TB
+                                                                 starting offset of kernel mappings.
+                                                           |
+                                                           |----------------------------------------------------
+                                                           | kernel-space virtual memory, shared between all processes:
+                                                           |
+ffff800000000000 | -128    TB | ffff87ffffffffff |    8 TB | ... guard hole, also reserved for hypervisor
+ffff880000000000 | -120    TB | ffffc7ffffffffff |   64 TB | direct mapping of all physical memory (page_offset_base),
+                                                             this is what limits max physical memory supported.
+ffffc80000000000 |  -56    TB | ffffc8ffffffffff |    1 TB | ... unused hole
+ffffc90000000000 |  -55    TB | ffffe8ffffffffff |   32 TB | vmalloc/ioremap space (vmalloc_base)
+ffffe90000000000 |  -23    TB | ffffe9ffffffffff |    1 TB | ... unused hole
+ffffea0000000000 |  -22    TB | ffffeaffffffffff |    1 TB | virtual memory map (vmemmap_base)
+ffffeb0000000000 |  -21    TB | ffffebffffffffff |    1 TB | ... unused hole
+ffffec0000000000 |  -20    TB | fffffbffffffffff |   16 TB | KASAN shadow memory
+fffffc0000000000 |   -4    TB | fffffdffffffffff |    2 TB | ... unused hole
+                                                             vaddr_end for KASLR
+
+fffffe0000000000 |   -2    TB | fffffe7fffffffff |  512 GB | cpu_entry_area mapping
+fffffe8000000000 |   -1.5  TB | fffffeffffffffff |  512 GB | LDT remap for PTI
+ffffff0000000000 |   -1    TB | ffffff7fffffffff |  512 GB | %esp fixup stacks
+
+# Identical layout to the 56-bit one from here on:
+
+ffffff8000000000 | -512    GB | fffffffeefffffff | ~507 GB | ... unused hole
+ffffffef00000000 |  -68    GB | fffffffeffffffff |   64 GB | EFI region mapping space
+ffffffff00000000 |   -4    GB | ffffffff7fffffff |    2 GB | ... unused hole
+ffffffff80000000 |   -2    GB | ffffffff9fffffff |  512 MB | kernel text mapping, mapped to physical address 0
+ffffffff80000000 |-2048    MB
+
+ffffffffa0000000 |-1536    MB | fffffffffeffffff | 1520 MB | module mapping space
+ffffffffff000000 |  -16    MB
+
+   FIXADDR_START | ~-11    MB | ffffffffff5fffff |         | kernel-internal fixmap range with variable size,
+                                                             typical size is around ~0.5 MB
+
+ffffffffff600000 |  -10    MB | ffffffffff600fff |    4 kB | legacy vsyscall ABI
+ffffffffffe00000 |   -2    MB | ffffffffffffffff |    2 MB | ... unused hole
+----------------------------------------------------------------------------
+
+
+========================================================
+| Complete virtual memory map with 5-level page tables |
+========================================================
+
+Notes:
+
+ - With 56-bit addresses, user-space memory gets expanded by a factor of 512x,
+   from 0.125 PB to 64 PB. All kernel mappings shift down to the -64 PB starting
+   offset and many of the regions expand to support the much larger physical
+   memory supported.
+
+--------------------------------------------------------------------------------
+start addr       | offset     | end addr         |  size   | VM area description
+-----------------|------------|------------------|---------|--------------------
+0000000000000000 |    0       | 00ffffffffffffff |   64 PB | user-space virtual memory, different per mm
+                                                           |
+0000800000000000 |  +64    PB | ffff7fffffffffff | ~16K PB | ... huge, still almost 64 bits wide hole of non-canonical
+                                                                 virtual memory addresses up to the -128 TB
+                                                                 starting offset of kernel mappings.
+                                                           |
+                                                           |----------------------------------------------------
+                                                           | kernel-space virtual memory, shared between all processes:
+                                                           |
+ff00000000000000 |  -64    PB | ff0fffffffffffff |    4 PB | ... guard hole, also reserved for hypervisor
+ff10000000000000 |  -60    PB | ff8fffffffffffff |   32 PB | direct mapping of all physical memory (page_offset_base),
+                                                             this is what limits max physical memory supported.
+ff90000000000000 |  -28    PB | ff9fffffffffffff |    4 PB | LDT remap for PTI
+ffa0000000000000 |  -24    PB | ffd1ffffffffffff | 12.5 PB | vmalloc/ioremap space (vmalloc_base)
+ffd2000000000000 |  -11.5  PB | ffd3ffffffffffff |  0.5 PB | ... unused hole
+ffd4000000000000 |  -11    PB | ffd5ffffffffffff |  0.5 PB | virtual memory map (vmemmap_base)
+ffd6000000000000 |  -10.5  PB | ffdeffffffffffff | 2.25 PB | ... unused hole
+ffdf000000000000 |   -8.25 PB | fffffdffffffffff |   ~8 PB | KASAN shadow memory
+fffffc0000000000 |   -4    TB | fffffdffffffffff |    2 TB | ... unused hole
+                                                             vaddr_end for KASLR
+
+fffffe0000000000 |   -2    TB | fffffe7fffffffff |  0.5 TB | cpu_entry_area mapping
+fffffe8000000000 |   -1.5  TB | fffffeffffffffff |  0.5 TB | ... unused hole
+ffffff0000000000 |   -1    TB | ffffff7fffffffff |  0.5 TB | %esp fixup stacks
+ffffff8000000000 |   -0.5  TB | ffffffeeffffffff |  444 GB | ... unused hole
+
+# Identical layout to the 47-bit one from here on:
+
+ffffffef00000000 |  -68    GB | fffffffeffffffff |   64 GB | EFI region mapping space
+ffffffff00000000 |   -4    GB | ffffffff7fffffff |    2 GB | ... unused hole
+ffffffff80000000 |   -2    GB | ffffffff9fffffff |  512 MB | kernel text mapping, mapped to physical address 0
+ffffffff80000000 |-2048    MB
+
+ffffffffa0000000 |-1536    MB | fffffffffeffffff | 1520 MB | module mapping space
+ffffffffff000000 |  -16    MB
+
+   FIXADDR_START | ~-11    MB | ffffffffff5fffff |         | kernel-internal fixmap range with variable size,
+                                                             typical size is around ~0.5 MB
+
+ffffffffff600000 |  -10    MB | ffffffffff600fff |    4 kB | legacy vsyscall ABI
+ffffffffffe00000 |   -2    MB | ffffffffffffffff |    2 MB | ... unused hole
+----------------------------------------------------------------------------
 
 Architecture defines a 64-bit virtual address. Implementations can support
 less. Currently supported are 48- and 57-bit virtual addresses. Bits 63



* Re: [PATCH 4/3] x86/mm/doc: Enhance the x86-64 virtual memory layout descriptions
  2018-10-06 12:22 ` [PATCH 4/3] x86/mm/doc: Enhance the x86-64 virtual memory layout descriptions Ingo Molnar
@ 2018-10-06 12:33   ` Ingo Molnar
  2018-10-06 14:41     ` Baoquan He
  2018-10-06 14:38   ` [PATCH 4/3 v2] " Ingo Molnar
  1 sibling, 1 reply; 19+ messages in thread
From: Ingo Molnar @ 2018-10-06 12:33 UTC (permalink / raw)
  To: Baoquan He, Kirill A. Shutemov
  Cc: linux-kernel, x86, linux-doc, tglx, thgarnie, corbet,
	Borislav Petkov, H. Peter Anvin, Andy Lutomirski, Peter Zijlstra,
	Linus Torvalds, Andrew Morton


* Ingo Molnar <mingo@kernel.org> wrote:

> +========================================================
> +| Complete virtual memory map with 4-level page tables |
> +========================================================

> +--------------------------------------------------------------------------------
> +start addr       | offset     | end addr         |  size   | VM area description
> +-----------------|------------|------------------|---------|--------------------

> +
> +# Identical layout to the 56-bit one from here on:
> +
> +ffffff8000000000 | -512    GB | fffffffeefffffff | ~507 GB | ... unused hole
> +ffffffef00000000 |  -68    GB | fffffffeffffffff |   64 GB | EFI region mapping space

> +========================================================
> +| Complete virtual memory map with 5-level page tables |
> +========================================================

> +ffffff8000000000 |   -0.5  TB | ffffffeeffffffff |  444 GB | ... unused hole
> +
> +# Identical layout to the 47-bit one from here on:
> +
> +ffffffef00000000 |  -68    GB | fffffffeffffffff |   64 GB | EFI region mapping space

So patch #2 appears to have introduced an error/typo in the 47-bit table. Note the weird size
and discontinuity of the 'unused hole' in the 47-bit table, and compare it with the 56-bit table:

  fffffffeefffffff
  ffffffeeffffffff

(Note how the incorrect end address was cargo-cult-copied into the 'size' field of ~507 GB...)

The correct number is the 56-bit one, and both tables should show the following identical 
layout:

  ffffff8000000000 | -512    GB | ffffffeeffffffff |  444 GB | ... unused hole
  ffffffef00000000 |  -68    GB | fffffffeffffffff |   64 GB | EFI region mapping space

Agreed?

Thanks,

	Ingo


* [tip:x86/mm] x86/KASLR: Update KERNEL_IMAGE_SIZE description
  2018-10-06  8:43 ` [PATCH 1/3] x86/KASLR: Update KERNEL_IMAGE_SIZE description Baoquan He
@ 2018-10-06 13:06   ` tip-bot for Baoquan He
  0 siblings, 0 replies; 19+ messages in thread
From: tip-bot for Baoquan He @ 2018-10-06 13:06 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: peterz, luto, dvlasenk, tglx, bhe, luto, torvalds, bp, hpa,
	linux-kernel, mingo, dave.hansen, brgerst, riel

Commit-ID:  06d4a462e954756f3d3d54e6f3f1bdc2e6f592a9
Gitweb:     https://git.kernel.org/tip/06d4a462e954756f3d3d54e6f3f1bdc2e6f592a9
Author:     Baoquan He <bhe@redhat.com>
AuthorDate: Sat, 6 Oct 2018 16:43:25 +0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sat, 6 Oct 2018 14:46:46 +0200

x86/KASLR: Update KERNEL_IMAGE_SIZE description

Currently CONFIG_RANDOMIZE_BASE=y is set by default, which makes some of the
old comments above the KERNEL_IMAGE_SIZE definition out of date. Update them
to the current state of affairs.

Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: corbet@lwn.net
Cc: linux-doc@vger.kernel.org
Cc: thgarnie@google.com
Link: http://lkml.kernel.org/r/20181006084327.27467-2-bhe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/page_64_types.h | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 6afac386a434..cd0cf1c568b4 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -59,13 +59,16 @@
 #endif
 
 /*
- * Kernel image size is limited to 1GiB due to the fixmap living in the
- * next 1GiB (see level2_kernel_pgt in arch/x86/kernel/head_64.S). Use
- * 512MiB by default, leaving 1.5GiB for modules once the page tables
- * are fully set up. If kernel ASLR is configured, it can extend the
- * kernel page table mapping, reducing the size of the modules area.
+ * Maximum kernel image size is limited to 1 GiB, due to the fixmap living
+ * in the next 1 GiB (see level2_kernel_pgt in arch/x86/kernel/head_64.S).
+ *
+ * On KASLR use 1 GiB by default, leaving 1 GiB for modules once the
+ * page tables are fully set up.
+ *
+ * If KASLR is disabled we can shrink it to 0.5 GiB and increase the size
+ * of the modules area to 1.5 GiB.
  */
-#if defined(CONFIG_RANDOMIZE_BASE)
+#ifdef CONFIG_RANDOMIZE_BASE
 #define KERNEL_IMAGE_SIZE	(1024 * 1024 * 1024)
 #else
 #define KERNEL_IMAGE_SIZE	(512 * 1024 * 1024)


* [tip:x86/mm] x86/mm/doc: Clean up the x86-64 virtual memory layout descriptions
  2018-10-06  8:43 ` [PATCH 2/3] x86/mm/doc: Clean up the memory region layout descriptions Baoquan He
@ 2018-10-06 13:07   ` tip-bot for Baoquan He
  0 siblings, 0 replies; 19+ messages in thread
From: tip-bot for Baoquan He @ 2018-10-06 13:07 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, riel, luto, luto, bp, dave.hansen, peterz, tglx, dvlasenk,
	mingo, torvalds, brgerst, linux-kernel, bhe

Commit-ID:  5b12904065798fee8b153a506ac7b72d5ebbe26c
Gitweb:     https://git.kernel.org/tip/5b12904065798fee8b153a506ac7b72d5ebbe26c
Author:     Baoquan He <bhe@redhat.com>
AuthorDate: Sat, 6 Oct 2018 16:43:26 +0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sat, 6 Oct 2018 14:46:47 +0200

x86/mm/doc: Clean up the x86-64 virtual memory layout descriptions

In Documentation/x86/x86_64/mm.txt, the description of the x86-64 virtual
memory layout has become a confusing hodgepodge of inconsistencies:

 - there's a hard to read mixture of 'TB' and 'bits' notation
 - the entries sometimes mention a size in the description and sometimes not
 - sometimes they list holes by address, sometimes only as an 'unused hole' line

So make it all a coherent, readable, well organized description.

Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: corbet@lwn.net
Cc: linux-doc@vger.kernel.org
Cc: thgarnie@google.com
Link: http://lkml.kernel.org/r/20181006084327.27467-3-bhe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 Documentation/x86/x86_64/mm.txt | 84 ++++++++++++++++++++---------------------
 1 file changed, 42 insertions(+), 42 deletions(-)

diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt
index 5432a96d31ff..b4bc95c9790e 100644
--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -1,55 +1,55 @@
 
 Virtual memory map with 4 level page tables:
 
-0000000000000000 - 00007fffffffffff (=47 bits) user space, different per mm
-hole caused by [47:63] sign extension
-ffff800000000000 - ffff87ffffffffff (=43 bits) guard hole, reserved for hypervisor
-ffff880000000000 - ffffc7ffffffffff (=64 TB) direct mapping of all phys. memory
-ffffc80000000000 - ffffc8ffffffffff (=40 bits) hole
-ffffc90000000000 - ffffe8ffffffffff (=45 bits) vmalloc/ioremap space
-ffffe90000000000 - ffffe9ffffffffff (=40 bits) hole
-ffffea0000000000 - ffffeaffffffffff (=40 bits) virtual memory map (1TB)
-... unused hole ...
-ffffec0000000000 - fffffbffffffffff (=44 bits) kasan shadow memory (16TB)
-... unused hole ...
+0000000000000000 - 00007fffffffffff (=47 bits,   128 TB) user space, different per mm
+				    hole caused by [47:63] sign extension
+ffff800000000000 - ffff87ffffffffff (=43 bits,     8 TB) guard hole, reserved for hypervisor
+ffff880000000000 - ffffc7ffffffffff (=46 bits,    64 TB) direct mapping of all phys. memory (page_offset_base)
+ffffc80000000000 - ffffc8ffffffffff (=40 bits,     1 TB) unused hole
+ffffc90000000000 - ffffe8ffffffffff (=45 bits,    32 TB) vmalloc/ioremap space (vmalloc_base)
+ffffe90000000000 - ffffe9ffffffffff (=40 bits,     1 TB) unused hole
+ffffea0000000000 - ffffeaffffffffff (=40 bits,     1 TB) virtual memory map (vmemmap_base)
+ffffeb0000000000 - ffffebffffffffff (=40 bits,     1 TB) unused hole
+ffffec0000000000 - fffffbffffffffff (=44 bits,    16 TB) kasan shadow memory
+fffffc0000000000 - fffffdffffffffff (=41 bits,     2 TB) unused hole
 				    vaddr_end for KASLR
-fffffe0000000000 - fffffe7fffffffff (=39 bits) cpu_entry_area mapping
-fffffe8000000000 - fffffeffffffffff (=39 bits) LDT remap for PTI
-ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks
-... unused hole ...
-ffffffef00000000 - fffffffeffffffff (=64 GB) EFI region mapping space
-... unused hole ...
-ffffffff80000000 - ffffffff9fffffff (=512 MB)  kernel text mapping, from phys 0
-ffffffffa0000000 - fffffffffeffffff (1520 MB) module mapping space
+fffffe0000000000 - fffffe7fffffffff (=39 bits,   512 GB) cpu_entry_area mapping
+fffffe8000000000 - fffffeffffffffff (=39 bits,   512 GB) LDT remap for PTI
+ffffff0000000000 - ffffff7fffffffff (=39 bits,   512 GB) %esp fixup stacks
+ffffff8000000000 - fffffffeefffffff (~39 bits,  ~507 GB) unused hole
+ffffffef00000000 - fffffffeffffffff (=36 bits,    64 GB) EFI region mapping space
+ffffffff00000000 - ffffffff7fffffff (=31 bits,     2 GB) unused hole
+ffffffff80000000 - ffffffff9fffffff (=29 bits,   512 MB) kernel text mapping, from phys 0
+ffffffffa0000000 - fffffffffeffffff (~31 bits,  1520 MB) module mapping space
 [fixmap start]   - ffffffffff5fffff kernel-internal fixmap range
-ffffffffff600000 - ffffffffff600fff (=4 kB) legacy vsyscall ABI
-ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole
+ffffffffff600000 - ffffffffff600fff (             =4 kB) legacy vsyscall ABI
+ffffffffffe00000 - ffffffffffffffff (             =2 MB) unused hole
 
 Virtual memory map with 5 level page tables:
 
-0000000000000000 - 00ffffffffffffff (=56 bits) user space, different per mm
-hole caused by [56:63] sign extension
-ff00000000000000 - ff0fffffffffffff (=52 bits) guard hole, reserved for hypervisor
-ff10000000000000 - ff8fffffffffffff (=55 bits) direct mapping of all phys. memory
-ff90000000000000 - ff9fffffffffffff (=52 bits) LDT remap for PTI
-ffa0000000000000 - ffd1ffffffffffff (=54 bits) vmalloc/ioremap space (12800 TB)
-ffd2000000000000 - ffd3ffffffffffff (=49 bits) hole
-ffd4000000000000 - ffd5ffffffffffff (=49 bits) virtual memory map (512TB)
-... unused hole ...
-ffdf000000000000 - fffffc0000000000 (=53 bits) kasan shadow memory (8PB)
-... unused hole ...
+0000000000000000 - 00ffffffffffffff (=56 bits,    64 PB) user space, different per mm
+				    hole caused by [56:63] sign extension
+ff00000000000000 - ff0fffffffffffff (=52 bits,     4 PB) guard hole, reserved for hypervisor
+ff10000000000000 - ff8fffffffffffff (=55 bits,    32 PB) direct mapping of all phys. memory (page_offset_base)
+ff90000000000000 - ff9fffffffffffff (=52 bits,     4 PB) LDT remap for PTI
+ffa0000000000000 - ffd1ffffffffffff (=53 bits, 12800 TB) vmalloc/ioremap space (vmalloc_base)
+ffd2000000000000 - ffd3ffffffffffff (=49 bits,   512 TB) unused hole
+ffd4000000000000 - ffd5ffffffffffff (=49 bits,   512 TB) virtual memory map (vmemmap_base)
+ffd6000000000000 - ffdeffffffffffff (~51 bits,  2304 TB) unused hole
+ffdf000000000000 - fffffdffffffffff (~53 bits,    ~8 PB) kasan shadow memory
+fffffc0000000000 - fffffdffffffffff (=41 bits,     2 TB) unused hole
 				    vaddr_end for KASLR
-fffffe0000000000 - fffffe7fffffffff (=39 bits) cpu_entry_area mapping
-... unused hole ...
-ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks
-... unused hole ...
-ffffffef00000000 - fffffffeffffffff (=64 GB) EFI region mapping space
-... unused hole ...
-ffffffff80000000 - ffffffff9fffffff (=512 MB)  kernel text mapping, from phys 0
-ffffffffa0000000 - fffffffffeffffff (1520 MB) module mapping space
+fffffe0000000000 - fffffe7fffffffff (=39 bits,   512 GB) cpu_entry_area mapping
+fffffe8000000000 - fffffeffffffffff (=39 bits,   512 GB) unused hole
+ffffff0000000000 - ffffff7fffffffff (=39 bits,   512 GB) %esp fixup stacks
+ffffff8000000000 - ffffffeeffffffff (~39 bits,   444 GB) unused hole
+ffffffef00000000 - fffffffeffffffff (=36 bits,    64 GB) EFI region mapping space
+ffffffff00000000 - ffffffff7fffffff (31 bits,      2 GB) unused hole
+ffffffff80000000 - ffffffff9fffffff (=29 bits,   512 MB) kernel text mapping, from phys 0
+ffffffffa0000000 - fffffffffeffffff (~31 bits,  1520 MB) module mapping space
 [fixmap start]   - ffffffffff5fffff kernel-internal fixmap range
-ffffffffff600000 - ffffffffff600fff (=4 kB) legacy vsyscall ABI
-ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole
+ffffffffff600000 - ffffffffff600fff (             =4 kB) legacy vsyscall ABI
+ffffffffffe00000 - ffffffffffffffff (             =2 MB) unused hole
 
 Architecture defines a 64-bit virtual address. Implementations can support
 less. Currently supported are 48- and 57-bit virtual addresses. Bits 63


* [PATCH 4/3 v2] x86/mm/doc: Enhance the x86-64 virtual memory layout descriptions
  2018-10-06 12:22 ` [PATCH 4/3] x86/mm/doc: Enhance the x86-64 virtual memory layout descriptions Ingo Molnar
  2018-10-06 12:33   ` Ingo Molnar
@ 2018-10-06 14:38   ` Ingo Molnar
  2018-10-06 15:02     ` Baoquan He
  2018-10-06 17:03     ` Ingo Molnar
  1 sibling, 2 replies; 19+ messages in thread
From: Ingo Molnar @ 2018-10-06 14:38 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-kernel, x86, linux-doc, tglx, thgarnie, corbet,
	Borislav Petkov, H. Peter Anvin, Andy Lutomirski, Peter Zijlstra,
	Linus Torvalds, Andrew Morton


Find a new iteration below, fixed the bug, prettified the table some more.

Thanks,

	Ingo

=========>
After the cleanups from Baoquan He, make it even more readable:

 - Remove the 'bits' area size column: it's pretty pointless and was even
   wrong for some of the entries. Given that MB, GB, TB, PB are 20, 30,
   40 and 50 bits, a "8 TB" size description makes it obvious that it's
   43 bits.

 - Introduce an "offset" column:

    --------------------------------------------------------------------------------
    start addr       | offset     | end addr         |  size   | VM area description
    -----------------|------------|------------------|---------|--------------------
    ...
    ffff880000000000 | -120    TB | ffffc7ffffffffff |   64 TB | direct mapping of all physical memory (page_offset_base),
                                                                 this is what limits max physical memory supported.

   The -120 TB notation makes it obvious where this particular virtual memory
   region starts: 120 TB down from the top of the 64-bit virtual memory space.
   Especially the layout of the kernel mappings is a *lot* more obvious when
   written this way, plus it's much easier to compare it with the size column
   and understand/check/validate and modify the kernel's layout in the future.

 - Mark the part from where the 47-bit and 56-bit kernel layouts are 100% identical,
   this starts at the -512 GB offset and the EFI region.

 - Re-shuffle the size descriptions to be continuous blocks of sizes, instead of the
   often mixed size. I.e. write "0.5 TB" instead of "512 GB" if we are still in
   the TB-granular region of the map.

 - Make the 47-bit and 56-bit descriptions use the *exact* same layout and wording,
   and only differ where there's a material difference. This makes it easy to compare
   the two tables side by side by switching between two terminal tabs.

 - Plus enhance a lot of other stylistic/typographical details: make the tables
   explicitly tabular, add headers, enhance certain entries, etc. etc.

Note that there are some apparent errors in the tables as well, but I'll fix
them in a separate patch to make it easier to review/validate.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: corbet@lwn.net
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: thgarnie@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 Documentation/x86/x86_64/mm.txt | 171 ++++++++++++++++++++++++++++------------
 1 file changed, 120 insertions(+), 51 deletions(-)

diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt
index b4bc95c9790e..702898633b00 100644
--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -1,55 +1,124 @@
+====================================================
+Complete virtual memory map with 4-level page tables
+====================================================
 
-Virtual memory map with 4 level page tables:
-
-0000000000000000 - 00007fffffffffff (=47 bits,   128 TB) user space, different per mm
-				    hole caused by [47:63] sign extension
-ffff800000000000 - ffff87ffffffffff (=43 bits,     8 TB) guard hole, reserved for hypervisor
-ffff880000000000 - ffffc7ffffffffff (=46 bits,    64 TB) direct mapping of all phys. memory (page_offset_base)
-ffffc80000000000 - ffffc8ffffffffff (=40 bits,     1 TB) unused hole
-ffffc90000000000 - ffffe8ffffffffff (=45 bits,    32 TB) vmalloc/ioremap space (vmalloc_base)
-ffffe90000000000 - ffffe9ffffffffff (=40 bits,     1 TB) unused hole
-ffffea0000000000 - ffffeaffffffffff (=40 bits,     1 TB) virtual memory map (vmemmap_base)
-ffffeb0000000000 - ffffebffffffffff (=40 bits,     1 TB) unused hole
-ffffec0000000000 - fffffbffffffffff (=44 bits,    16 TB) kasan shadow memory
-fffffc0000000000 - fffffdffffffffff (=41 bits,     2 TB) unused hole
-				    vaddr_end for KASLR
-fffffe0000000000 - fffffe7fffffffff (=39 bits,   512 GB) cpu_entry_area mapping
-fffffe8000000000 - fffffeffffffffff (=39 bits,   512 GB) LDT remap for PTI
-ffffff0000000000 - ffffff7fffffffff (=39 bits,   512 GB) %esp fixup stacks
-ffffff8000000000 - fffffffeefffffff (~39 bits,  ~507 GB) unused hole
-ffffffef00000000 - fffffffeffffffff (=36 bits,    64 GB) EFI region mapping space
-ffffffff00000000 - ffffffff7fffffff (=31 bits,     2 GB) unused hole
-ffffffff80000000 - ffffffff9fffffff (=29 bits,   512 MB) kernel text mapping, from phys 0
-ffffffffa0000000 - fffffffffeffffff (~31 bits,  1520 MB) module mapping space
-[fixmap start]   - ffffffffff5fffff kernel-internal fixmap range
-ffffffffff600000 - ffffffffff600fff (             =4 kB) legacy vsyscall ABI
-ffffffffffe00000 - ffffffffffffffff (             =2 MB) unused hole
-
-Virtual memory map with 5 level page tables:
-
-0000000000000000 - 00ffffffffffffff (=56 bits,    64 PB) user space, different per mm
-				    hole caused by [56:63] sign extension
-ff00000000000000 - ff0fffffffffffff (=52 bits,     4 PB) guard hole, reserved for hypervisor
-ff10000000000000 - ff8fffffffffffff (=55 bits,    32 PB) direct mapping of all phys. memory (page_offset_base)
-ff90000000000000 - ff9fffffffffffff (=52 bits,     4 PB) LDT remap for PTI
-ffa0000000000000 - ffd1ffffffffffff (=53 bits, 12800 TB) vmalloc/ioremap space (vmalloc_base)
-ffd2000000000000 - ffd3ffffffffffff (=49 bits,   512 TB) unused hole
-ffd4000000000000 - ffd5ffffffffffff (=49 bits,   512 TB) virtual memory map (vmemmap_base)
-ffd6000000000000 - ffdeffffffffffff (~51 bits,  2304 TB) unused hole
-ffdf000000000000 - fffffdffffffffff (~53 bits,    ~8 PB) kasan shadow memory
-fffffc0000000000 - fffffdffffffffff (=41 bits,     2 TB) unused hole
-				    vaddr_end for KASLR
-fffffe0000000000 - fffffe7fffffffff (=39 bits,   512 GB) cpu_entry_area mapping
-fffffe8000000000 - fffffeffffffffff (=39 bits,   512 GB) unused hole
-ffffff0000000000 - ffffff7fffffffff (=39 bits,   512 GB) %esp fixup stacks
-ffffff8000000000 - ffffffeeffffffff (~39 bits,   444 GB) unused hole
-ffffffef00000000 - fffffffeffffffff (=36 bits,    64 GB) EFI region mapping space
-ffffffff00000000 - ffffffff7fffffff (31 bits,      2 GB) unused hole
-ffffffff80000000 - ffffffff9fffffff (=29 bits,   512 MB) kernel text mapping, from phys 0
-ffffffffa0000000 - fffffffffeffffff (~31 bits,  1520 MB) module mapping space
-[fixmap start]   - ffffffffff5fffff kernel-internal fixmap range
-ffffffffff600000 - ffffffffff600fff (             =4 kB) legacy vsyscall ABI
-ffffffffffe00000 - ffffffffffffffff (             =2 MB) unused hole
+Notes:
+
+ - Negative addresses such as "-23 TB" are absolute addresses in bytes, counted down
+   from the top of the 64-bit address space. It's easier to understand the layout
+   when seen both in absolute addresses and in distance-from-top notation.
+
+   For example 0xffffe90000000000 == -23 TB, it's 23 TB lower than the top of the
+   64-bit address space (ffffffffffffffff).
+
+   Note that as we get closer to the top of the address space, the notation changes
+   from TB to GB and then MB/KB.
+
+ - "16M TB" might look weird at first sight, but it's an easier to visualize size
+   notation than "16 EB", which few will recognize at first sight as 16 exabytes.
+   It also shows nicely how incredibly large the 64-bit address space is.
+
+========================================================================================================================
+    Start addr    |   Offset   |     End addr     |  Size   | VM area description
+========================================================================================================================
+                  |            |                  |         |
+ 0000000000000000 |    0       | 00007fffffffffff |  128 TB | user-space virtual memory, different per mm
+__________________|____________|__________________|_________|___________________________________________________________
+                  |            |                  |         |
+ 0000800000000000 | +128    TB | ffff7fffffffffff | ~16M TB | ... huge, almost 64 bits wide hole of non-canonical
+                  |            |                  |         |     virtual memory addresses up to the -128 TB
+                  |            |                  |         |     starting offset of kernel mappings.
+__________________|____________|__________________|_________|___________________________________________________________
+                                                            |
+                                                            | Kernel-space virtual memory, shared between all processes:
+____________________________________________________________|___________________________________________________________
+                  |            |                  |         |
+ ffff800000000000 | -128    TB | ffff87ffffffffff |    8 TB | ... guard hole, also reserved for hypervisor
+ ffff880000000000 | -120    TB | ffffc7ffffffffff |   64 TB | direct mapping of all physical memory (page_offset_base)
+ ffffc80000000000 |  -56    TB | ffffc8ffffffffff |    1 TB | ... unused hole
+ ffffc90000000000 |  -55    TB | ffffe8ffffffffff |   32 TB | vmalloc/ioremap space (vmalloc_base)
+ ffffe90000000000 |  -23    TB | ffffe9ffffffffff |    1 TB | ... unused hole
+ ffffea0000000000 |  -22    TB | ffffeaffffffffff |    1 TB | virtual memory map (vmemmap_base)
+ ffffeb0000000000 |  -21    TB | ffffebffffffffff |    1 TB | ... unused hole
+ ffffec0000000000 |  -20    TB | fffffbffffffffff |   16 TB | KASAN shadow memory
+ fffffc0000000000 |   -4    TB | fffffdffffffffff |    2 TB | ... unused hole
+                  |            |                  |         | vaddr_end for KASLR
+ fffffe0000000000 |   -2    TB | fffffe7fffffffff |  0.5 TB | cpu_entry_area mapping
+ fffffe8000000000 |   -1.5  TB | fffffeffffffffff |  0.5 TB | LDT remap for PTI
+ ffffff0000000000 |   -1    TB | ffffff7fffffffff |  0.5 TB | %esp fixup stacks
+__________________|____________|__________________|_________|____________________________________________________________
+                                                            |
+                                                            | Identical layout to the 56-bit one from here on:
+____________________________________________________________|____________________________________________________________
+                  |            |                  |         |
+ ffffff8000000000 | -512    GB | ffffffeeffffffff |  444 GB | ... unused hole
+ ffffffef00000000 |  -68    GB | fffffffeffffffff |   64 GB | EFI region mapping space
+ ffffffff00000000 |   -4    GB | ffffffff7fffffff |    2 GB | ... unused hole
+ ffffffff80000000 |   -2    GB | ffffffff9fffffff |  512 MB | kernel text mapping, mapped to physical address 0
+ ffffffff80000000 |-2048    MB |                  |         |
+ ffffffffa0000000 |-1536    MB | fffffffffeffffff | 1520 MB | module mapping space
+ ffffffffff000000 |  -16    MB |                  |         |
+    FIXADDR_START | ~-11    MB | ffffffffff5fffff | ~0.5 MB | kernel-internal fixmap range, variable size and offset
+ ffffffffff600000 |  -10    MB | ffffffffff600fff |    4 kB | legacy vsyscall ABI
+ ffffffffffe00000 |   -2    MB | ffffffffffffffff |    2 MB | ... unused hole
+__________________|____________|__________________|_________|___________________________________________________________
+
+
+====================================================
+Complete virtual memory map with 5-level page tables
+====================================================
+
+Notes:
+
+ - With 56-bit addresses, user-space memory gets expanded by a factor of 512x,
+   from 0.125 PB to 64 PB. All kernel mappings shift down to the -64 PB starting
+   offset and many of the regions expand to support the much larger physical
+   memory supported.
+
+========================================================================================================================
+    Start addr    |   Offset   |     End addr     |  Size   | VM area description
+========================================================================================================================
+                  |            |                  |         |
+ 0000000000000000 |    0       | 00ffffffffffffff |   64 PB | user-space virtual memory, different per mm
+__________________|____________|__________________|_________|___________________________________________________________
+                  |            |                  |         |
+ 0100000000000000 |  +64    PB | feffffffffffffff | ~16K PB | ... huge, still almost 64 bits wide hole of non-canonical
+                  |            |                  |         |     virtual memory addresses up to the -64 PB
+                  |            |                  |         |     starting offset of kernel mappings.
+__________________|____________|__________________|_________|___________________________________________________________
+                                                            |
+                                                            | Kernel-space virtual memory, shared between all processes:
+____________________________________________________________|___________________________________________________________
+                  |            |                  |         |
+ ff00000000000000 |  -64    PB | ff0fffffffffffff |    4 PB | ... guard hole, also reserved for hypervisor
+ ff10000000000000 |  -60    PB | ff8fffffffffffff |   32 PB | direct mapping of all physical memory (page_offset_base)
+ ff90000000000000 |  -28    PB | ff9fffffffffffff |    4 PB | LDT remap for PTI
+ ffa0000000000000 |  -24    PB | ffd1ffffffffffff | 12.5 PB | vmalloc/ioremap space (vmalloc_base)
+ ffd2000000000000 |  -11.5  PB | ffd3ffffffffffff |  0.5 PB | ... unused hole
+ ffd4000000000000 |  -11    PB | ffd5ffffffffffff |  0.5 PB | virtual memory map (vmemmap_base)
+ ffd6000000000000 |  -10.5  PB | ffdeffffffffffff | 2.25 PB | ... unused hole
+ ffdf000000000000 |   -8.25 PB | fffffbffffffffff |   ~8 PB | KASAN shadow memory
+ fffffc0000000000 |   -4    TB | fffffdffffffffff |    2 TB | ... unused hole
+                  |            |                  |         | vaddr_end for KASLR
+ fffffe0000000000 |   -2    TB | fffffe7fffffffff |  0.5 TB | cpu_entry_area mapping
+ fffffe8000000000 |   -1.5  TB | fffffeffffffffff |  0.5 TB | ... unused hole
+ ffffff0000000000 |   -1    TB | ffffff7fffffffff |  0.5 TB | %esp fixup stacks
+__________________|____________|__________________|_________|____________________________________________________________
+                                                            |
+                                                            | Identical layout to the 47-bit one from here on:
+____________________________________________________________|____________________________________________________________
+                  |            |                  |         |
+ ffffff8000000000 | -512    GB | ffffffeeffffffff |  444 GB | ... unused hole
+ ffffffef00000000 |  -68    GB | fffffffeffffffff |   64 GB | EFI region mapping space
+ ffffffff00000000 |   -4    GB | ffffffff7fffffff |    2 GB | ... unused hole
+ ffffffff80000000 |   -2    GB | ffffffff9fffffff |  512 MB | kernel text mapping, mapped to physical address 0
+ ffffffff80000000 |-2048    MB |                  |         |
+ ffffffffa0000000 |-1536    MB | fffffffffeffffff | 1520 MB | module mapping space
+ ffffffffff000000 |  -16    MB |                  |         |
+    FIXADDR_START | ~-11    MB | ffffffffff5fffff | ~0.5 MB | kernel-internal fixmap range, variable size and offset
+ ffffffffff600000 |  -10    MB | ffffffffff600fff |    4 kB | legacy vsyscall ABI
+ ffffffffffe00000 |   -2    MB | ffffffffffffffff |    2 MB | ... unused hole
+__________________|____________|__________________|_________|___________________________________________________________
 
 Architecture defines a 64-bit virtual address. Implementations can support
 less. Currently supported are 48- and 57-bit virtual addresses. Bits 63

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/3] x86/mm/doc: Enhance the x86-64 virtual memory layout descriptions
  2018-10-06 12:33   ` Ingo Molnar
@ 2018-10-06 14:41     ` Baoquan He
  0 siblings, 0 replies; 19+ messages in thread
From: Baoquan He @ 2018-10-06 14:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kirill A. Shutemov, linux-kernel, x86, linux-doc, tglx, thgarnie,
	corbet, Borislav Petkov, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Linus Torvalds, Andrew Morton

On 10/06/18 at 02:33pm, Ingo Molnar wrote:
> 
> * Ingo Molnar <mingo@kernel.org> wrote:
> 
> > +========================================================
> > +| Complete virtual memory map with 4-level page tables |
> > +========================================================
> 
> > +--------------------------------------------------------------------------------
> > +start addr       | offset     | end addr         |  size   | VM area description
> > +-----------------|------------|------------------|---------|--------------------
> 
> > +
> > +# Identical layout to the 56-bit one from here on:
> > +
> > +ffffff8000000000 | -512    GB | fffffffeefffffff | ~507 GB | ... unused hole
> > +ffffffef00000000 |  -68    GB | fffffffeffffffff |   64 GB | EFI region mapping space
> 
> > +========================================================
> > +| Complete virtual memory map with 5-level page tables |
> > +========================================================
> 
> > +ffffff8000000000 |   -0.5  TB | ffffffeeffffffff |  444 GB | ... unused hole
> > +
> > +# Identical layout to the 47-bit one from here on:
> > +
> > +ffffffef00000000 |  -68    GB | fffffffeffffffff |   64 GB | EFI region mapping space
> 
> So patch #2 appears to have introduced an error/typo in the 47-bit table. Note the weird size
> and discontinuity of the 'unused hole' in the 47-bit table, and compare it with the 56-bit table:
> 
>   fffffffeefffffff
>   ffffffeeffffffff
> 
> (Note how the incorrect end address was cargo-cult-copied into the 'size' field of ~507 GB...)
> 
> The correct number is the 56-bit one, and both tables should show the following identical 
> layout:
> 
>   ffffff8000000000 | -512    GB | ffffffeeffffffff |  444 GB | ... unused hole
>   ffffffef00000000 |  -68    GB | fffffffeffffffff |   64 GB | EFI region mapping space
> 
> Agreed?

Yes, you are right. I had wondered why the size was such a weird, unaligned value.
Sorry about that.

Thanks
Baoquan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/3 v2] x86/mm/doc: Enhance the x86-64 virtual memory layout descriptions
  2018-10-06 14:38   ` [PATCH 4/3 v2] " Ingo Molnar
@ 2018-10-06 15:02     ` Baoquan He
  2018-10-06 17:03     ` Ingo Molnar
  1 sibling, 0 replies; 19+ messages in thread
From: Baoquan He @ 2018-10-06 15:02 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, x86, linux-doc, tglx, thgarnie, corbet,
	Borislav Petkov, H. Peter Anvin, Andy Lutomirski, Peter Zijlstra,
	Linus Torvalds, Andrew Morton

On 10/06/18 at 04:38pm, Ingo Molnar wrote:
> +Notes:
> +
> + - Negative addresses such as "-23 TB" are absolute addresses in bytes, counted down
> +   from the top of the 64-bit address space. It's easier to understand the layout
> +   when seen both in absolute addresses and in distance-from-top notation.
> +
> +   For example 0xffffe90000000000 == -23 TB, it's 23 TB lower than the top of the
> +   64-bit address space (ffffffffffffffff).
> +
> +   Note that as we get closer to the top of the address space, the notation changes
> +   from TB to GB and then MB/KB.
> +
> + - "16M TB" might look weird at first sight, but it's an easier-to-visualize size
> +   notation than "16 EB", which few will recognize at first sight as 16 exabytes.
> +   It also nicely shows how incredibly large the 64-bit address space is.

Thanks, this looks much better than the old version and my change.

Reviewed-by: Baoquan He <bhe@redhat.com>

Thanks
Baoquan
> +
> +========================================================================================================================
> +    Start addr    |   Offset   |     End addr     |  Size   | VM area description
> +========================================================================================================================
> +                  |            |                  |         |
> + 0000000000000000 |    0       | 00007fffffffffff |  128 TB | user-space virtual memory, different per mm
> +__________________|____________|__________________|_________|___________________________________________________________
> +                  |            |                  |         |
> + 0000800000000000 | +128    TB | ffff7fffffffffff | ~16M TB | ... huge, almost 64 bits wide hole of non-canonical
> +                  |            |                  |         |     virtual memory addresses up to the -128 TB
> +                  |            |                  |         |     starting offset of kernel mappings.
> +__________________|____________|__________________|_________|___________________________________________________________
> +                                                            |
> +                                                            | Kernel-space virtual memory, shared between all processes:
> +____________________________________________________________|___________________________________________________________
> +                  |            |                  |         |
> + ffff800000000000 | -128    TB | ffff87ffffffffff |    8 TB | ... guard hole, also reserved for hypervisor
> + ffff880000000000 | -120    TB | ffffc7ffffffffff |   64 TB | direct mapping of all physical memory (page_offset_base)
> + ffffc80000000000 |  -56    TB | ffffc8ffffffffff |    1 TB | ... unused hole
> + ffffc90000000000 |  -55    TB | ffffe8ffffffffff |   32 TB | vmalloc/ioremap space (vmalloc_base)
> + ffffe90000000000 |  -23    TB | ffffe9ffffffffff |    1 TB | ... unused hole
> + ffffea0000000000 |  -22    TB | ffffeaffffffffff |    1 TB | virtual memory map (vmemmap_base)
> + ffffeb0000000000 |  -21    TB | ffffebffffffffff |    1 TB | ... unused hole
> + ffffec0000000000 |  -20    TB | fffffbffffffffff |   16 TB | KASAN shadow memory
> + fffffc0000000000 |   -4    TB | fffffdffffffffff |    2 TB | ... unused hole
> +                  |            |                  |         | vaddr_end for KASLR
> + fffffe0000000000 |   -2    TB | fffffe7fffffffff |  0.5 TB | cpu_entry_area mapping
> + fffffe8000000000 |   -1.5  TB | fffffeffffffffff |  0.5 TB | LDT remap for PTI
> + ffffff0000000000 |   -1    TB | ffffff7fffffffff |  0.5 TB | %esp fixup stacks
> +__________________|____________|__________________|_________|____________________________________________________________
> +                                                            |
> +                                                            | Identical layout to the 47-bit one from here on:
> +____________________________________________________________|____________________________________________________________
> +                  |            |                  |         |
> + ffffff8000000000 | -512    GB | ffffffeeffffffff |  444 GB | ... unused hole
> + ffffffef00000000 |  -68    GB | fffffffeffffffff |   64 GB | EFI region mapping space
> + ffffffff00000000 |   -4    GB | ffffffff7fffffff |    2 GB | ... unused hole
> + ffffffff80000000 |   -2    GB | ffffffff9fffffff |  512 MB | kernel text mapping, mapped to physical address 0
> + ffffffff80000000 |-2048    MB |                  |         |
> + ffffffffa0000000 |-1536    MB | fffffffffeffffff | 1520 MB | module mapping space
> + ffffffffff000000 |  -16    MB |                  |         |
> +    FIXADDR_START | ~-11    MB | ffffffffff5fffff | ~0.5 MB | kernel-internal fixmap range, variable size and offset
> + ffffffffff600000 |  -10    MB | ffffffffff600fff |    4 kB | legacy vsyscall ABI
> + ffffffffffe00000 |   -2    MB | ffffffffffffffff |    2 MB | ... unused hole
> +__________________|____________|__________________|_________|___________________________________________________________
> +
> +
> +====================================================
> +Complete virtual memory map with 5-level page tables
> +====================================================
> +
> +Notes:
> +
> + - With 56-bit addresses, user-space memory gets expanded by a factor of 512x,
> +   from 0.125 PB to 64 PB. All kernel mappings shift down to the -64 PB starting
> +   offset and many of the regions expand to support the much larger physical
> +   memory supported.
> +
> +========================================================================================================================
> +    Start addr    |   Offset   |     End addr     |  Size   | VM area description
> +========================================================================================================================
> +                  |            |                  |         |
> + 0000000000000000 |    0       | 00ffffffffffffff |   64 PB | user-space virtual memory, different per mm
> +__________________|____________|__________________|_________|___________________________________________________________
> +                  |            |                  |         |
> + 0100000000000000 |  +64    PB | feffffffffffffff | ~16K PB | ... huge, still almost 64 bits wide hole of non-canonical
> +                  |            |                  |         |     virtual memory addresses up to the -64 PB
> +                  |            |                  |         |     starting offset of kernel mappings.
> +__________________|____________|__________________|_________|___________________________________________________________
> +                                                            |
> +                                                            | Kernel-space virtual memory, shared between all processes:
> +____________________________________________________________|___________________________________________________________
> +                  |            |                  |         |
> + ff00000000000000 |  -64    PB | ff0fffffffffffff |    4 PB | ... guard hole, also reserved for hypervisor
> + ff10000000000000 |  -60    PB | ff8fffffffffffff |   32 PB | direct mapping of all physical memory (page_offset_base)
> + ff90000000000000 |  -28    PB | ff9fffffffffffff |    4 PB | LDT remap for PTI
> + ffa0000000000000 |  -24    PB | ffd1ffffffffffff | 12.5 PB | vmalloc/ioremap space (vmalloc_base)
> + ffd2000000000000 |  -11.5  PB | ffd3ffffffffffff |  0.5 PB | ... unused hole
> + ffd4000000000000 |  -11    PB | ffd5ffffffffffff |  0.5 PB | virtual memory map (vmemmap_base)
> + ffd6000000000000 |  -10.5  PB | ffdeffffffffffff | 2.25 PB | ... unused hole
> + ffdf000000000000 |   -8.25 PB | fffffbffffffffff |   ~8 PB | KASAN shadow memory
> + fffffc0000000000 |   -4    TB | fffffdffffffffff |    2 TB | ... unused hole
> +                  |            |                  |         | vaddr_end for KASLR
> + fffffe0000000000 |   -2    TB | fffffe7fffffffff |  0.5 TB | cpu_entry_area mapping
> + fffffe8000000000 |   -1.5  TB | fffffeffffffffff |  0.5 TB | ... unused hole
> + ffffff0000000000 |   -1    TB | ffffff7fffffffff |  0.5 TB | %esp fixup stacks
> +__________________|____________|__________________|_________|____________________________________________________________
> +                                                            |
> +                                                            | Identical layout to the 47-bit one from here on:
> +____________________________________________________________|____________________________________________________________
> +                  |            |                  |         |
> + ffffff8000000000 | -512    GB | ffffffeeffffffff |  444 GB | ... unused hole
> + ffffffef00000000 |  -68    GB | fffffffeffffffff |   64 GB | EFI region mapping space
> + ffffffff00000000 |   -4    GB | ffffffff7fffffff |    2 GB | ... unused hole
> + ffffffff80000000 |   -2    GB | ffffffff9fffffff |  512 MB | kernel text mapping, mapped to physical address 0
> + ffffffff80000000 |-2048    MB |                  |         |
> + ffffffffa0000000 |-1536    MB | fffffffffeffffff | 1520 MB | module mapping space
> + ffffffffff000000 |  -16    MB |                  |         |
> +    FIXADDR_START | ~-11    MB | ffffffffff5fffff | ~0.5 MB | kernel-internal fixmap range, variable size and offset
> + ffffffffff600000 |  -10    MB | ffffffffff600fff |    4 kB | legacy vsyscall ABI
> + ffffffffffe00000 |   -2    MB | ffffffffffffffff |    2 MB | ... unused hole
> +__________________|____________|__________________|_________|___________________________________________________________
>  
>  Architecture defines a 64-bit virtual address. Implementations can support
>  less. Currently supported are 48- and 57-bit virtual addresses. Bits 63

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/3 v2] x86/mm/doc: Enhance the x86-64 virtual memory layout descriptions
  2018-10-06 14:38   ` [PATCH 4/3 v2] " Ingo Molnar
  2018-10-06 15:02     ` Baoquan He
@ 2018-10-06 17:03     ` Ingo Molnar
  2018-10-06 22:17       ` Andy Lutomirski
  1 sibling, 1 reply; 19+ messages in thread
From: Ingo Molnar @ 2018-10-06 17:03 UTC (permalink / raw)
  To: Baoquan He, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Kirill A. Shutemov
  Cc: linux-kernel, x86, linux-doc, tglx, thgarnie, corbet,
	Borislav Petkov, H. Peter Anvin, Andy Lutomirski, Peter Zijlstra,
	Linus Torvalds, Andrew Morton


There's one PTI related layout asymmetry I noticed between 4-level and 5-level kernels:

  47-bit:
> +                                                            |
> +                                                            | Kernel-space virtual memory, shared between all processes:
> +____________________________________________________________|___________________________________________________________
> +                  |            |                  |         |
> + ffff800000000000 | -128    TB | ffff87ffffffffff |    8 TB | ... guard hole, also reserved for hypervisor
> + ffff880000000000 | -120    TB | ffffc7ffffffffff |   64 TB | direct mapping of all physical memory (page_offset_base)
> + ffffc80000000000 |  -56    TB | ffffc8ffffffffff |    1 TB | ... unused hole
> + ffffc90000000000 |  -55    TB | ffffe8ffffffffff |   32 TB | vmalloc/ioremap space (vmalloc_base)
> + ffffe90000000000 |  -23    TB | ffffe9ffffffffff |    1 TB | ... unused hole
> + ffffea0000000000 |  -22    TB | ffffeaffffffffff |    1 TB | virtual memory map (vmemmap_base)
> + ffffeb0000000000 |  -21    TB | ffffebffffffffff |    1 TB | ... unused hole
> + ffffec0000000000 |  -20    TB | fffffbffffffffff |   16 TB | KASAN shadow memory
> + fffffc0000000000 |   -4    TB | fffffdffffffffff |    2 TB | ... unused hole
> +                  |            |                  |         | vaddr_end for KASLR
> + fffffe0000000000 |   -2    TB | fffffe7fffffffff |  0.5 TB | cpu_entry_area mapping
> + fffffe8000000000 |   -1.5  TB | fffffeffffffffff |  0.5 TB | LDT remap for PTI
> + ffffff0000000000 |   -1    TB | ffffff7fffffffff |  0.5 TB | %esp fixup stacks
> +__________________|____________|__________________|_________|____________________________________________________________
> +                                                            |

  56-bit:
> +                                                            |
> +                                                            | Kernel-space virtual memory, shared between all processes:
> +____________________________________________________________|___________________________________________________________
> +                  |            |                  |         |
> + ff00000000000000 |  -64    PB | ff0fffffffffffff |    4 PB | ... guard hole, also reserved for hypervisor
> + ff10000000000000 |  -60    PB | ff8fffffffffffff |   32 PB | direct mapping of all physical memory (page_offset_base)
> + ff90000000000000 |  -28    PB | ff9fffffffffffff |    4 PB | LDT remap for PTI
> + ffa0000000000000 |  -24    PB | ffd1ffffffffffff | 12.5 PB | vmalloc/ioremap space (vmalloc_base)
> + ffd2000000000000 |  -11.5  PB | ffd3ffffffffffff |  0.5 PB | ... unused hole
> + ffd4000000000000 |  -11    PB | ffd5ffffffffffff |  0.5 PB | virtual memory map (vmemmap_base)
> + ffd6000000000000 |  -10.5  PB | ffdeffffffffffff | 2.25 PB | ... unused hole
> + ffdf000000000000 |   -8.25 PB | fffffbffffffffff |   ~8 PB | KASAN shadow memory
> + fffffc0000000000 |   -4    TB | fffffdffffffffff |    2 TB | ... unused hole
> +                  |            |                  |         | vaddr_end for KASLR
> + fffffe0000000000 |   -2    TB | fffffe7fffffffff |  0.5 TB | cpu_entry_area mapping
> + fffffe8000000000 |   -1.5  TB | fffffeffffffffff |  0.5 TB | ... unused hole
> + ffffff0000000000 |   -1    TB | ffffff7fffffffff |  0.5 TB | %esp fixup stacks

The two layouts are very similar beyond the shift in the offset and the region sizes, except
for one big asymmetry: the placement of the LDT remap for PTI.

Is there any fundamental reason why the LDT area is mapped into a 4 petabyte (!) area on 56-bit 
kernels, instead of being at the -1.5 TB offset like on 47-bit kernels?

The only reason I can see is that it's currently coded at the PGD level only:

static void map_ldt_struct_to_user(struct mm_struct *mm)
{
        pgd_t *pgd = pgd_offset(mm, LDT_BASE_ADDR); 

        if (static_cpu_has(X86_FEATURE_PTI) && !mm->context.ldt)
                set_pgd(kernel_to_user_pgdp(pgd), *pgd);
}

( BTW., the 4 petabyte size of the area is misleading: a 5-level PGD entry covers 256 TB of 
  virtual memory, i.e. 0.25 PB, not 4 PB. So in reality we have a 0.25 PB area there, used up
  by the LDT mapping in a single PGD entry, plus a 3.75 PB hole after that. )

... but unless I'm missing something it's not really fundamental for it to be at the PGD level 
- it could be two levels lower as well, and it could move back to the same place where it's on 
the 47-bit kernel.

The LDT mapping operation is pretty heavy already, and the actual use of the LDT is not 
impacted by where it's mapped, as the LDT is per mm so no remapping is required on context 
switch.

I.e. could we move the LDT over to the same place? This would make an even larger area of the 
address space identical between 47-bit and 56-bit kernels:

                                                            |
                                                            | Identical layout to the 47-bit one from here on:
____________________________________________________________|____________________________________________________________
                  |            |                  |         |
 fffffc0000000000 |   -4    TB | fffffdffffffffff |    2 TB | ... unused hole
                  |            |                  |         | vaddr_end for KASLR
 fffffe0000000000 |   -2    TB | fffffe7fffffffff |  0.5 TB | cpu_entry_area mapping
 fffffe8000000000 |   -1.5  TB | fffffeffffffffff |  0.5 TB | LDT remap for PTI
 ffffff0000000000 |   -1    TB | ffffff7fffffffff |  0.5 TB | %esp fixup stacks
 ffffff8000000000 | -512    GB | ffffffeeffffffff |  444 GB | ... unused hole
 ffffffef00000000 |  -68    GB | fffffffeffffffff |   64 GB | EFI region mapping space
 ffffffff00000000 |   -4    GB | ffffffff7fffffff |    2 GB | ... unused hole
 ffffffff80000000 |   -2    GB | ffffffff9fffffff |  512 MB | kernel text mapping, mapped to physical address 0
 ffffffff80000000 |-2048    MB |                  |         |
 ffffffffa0000000 |-1536    MB | fffffffffeffffff | 1520 MB | module mapping space
 ffffffffff000000 |  -16    MB |                  |         |
    FIXADDR_START | ~-11    MB | ffffffffff5fffff | ~0.5 MB | kernel-internal fixmap range, variable size and offset
 ffffffffff600000 |  -10    MB | ffffffffff600fff |    4 kB | legacy vsyscall ABI
 ffffffffffe00000 |   -2    MB | ffffffffffffffff |    2 MB | ... unused hole
__________________|____________|__________________|_________|___________________________________________________________

And the rest would basically just be 4 areas: the direct-mapping, vmalloc, vmemmap and KASAN 
areas - which are scaled according to whether it's a 47-bit or 56-bit kernel.

Thoughts?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/3 v2] x86/mm/doc: Enhance the x86-64 virtual memory layout descriptions
  2018-10-06 17:03     ` Ingo Molnar
@ 2018-10-06 22:17       ` Andy Lutomirski
  2018-10-09  0:35         ` Baoquan He
  0 siblings, 1 reply; 19+ messages in thread
From: Andy Lutomirski @ 2018-10-06 22:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Baoquan He, Andrew Lutomirski, Dave Hansen, Peter Zijlstra,
	Kirill A. Shutemov, LKML, X86 ML, linux-doc, Thomas Gleixner,
	Thomas Garnier, Jonathan Corbet, Borislav Petkov, H. Peter Anvin,
	Linus Torvalds, Andrew Morton

On Sat, Oct 6, 2018 at 10:03 AM Ingo Molnar <mingo@kernel.org> wrote:
>
>
> There's one PTI related layout asymmetry I noticed between 4-level and 5-level kernels:
>
>   47-bit:
> > +                                                            |
> > +                                                            | Kernel-space virtual memory, shared between all processes:
> > +____________________________________________________________|___________________________________________________________
> > +                  |            |                  |         |
> > + ffff800000000000 | -128    TB | ffff87ffffffffff |    8 TB | ... guard hole, also reserved for hypervisor
> > + ffff880000000000 | -120    TB | ffffc7ffffffffff |   64 TB | direct mapping of all physical memory (page_offset_base)
> > + ffffc80000000000 |  -56    TB | ffffc8ffffffffff |    1 TB | ... unused hole
> > + ffffc90000000000 |  -55    TB | ffffe8ffffffffff |   32 TB | vmalloc/ioremap space (vmalloc_base)
> > + ffffe90000000000 |  -23    TB | ffffe9ffffffffff |    1 TB | ... unused hole
> > + ffffea0000000000 |  -22    TB | ffffeaffffffffff |    1 TB | virtual memory map (vmemmap_base)
> > + ffffeb0000000000 |  -21    TB | ffffebffffffffff |    1 TB | ... unused hole
> > + ffffec0000000000 |  -20    TB | fffffbffffffffff |   16 TB | KASAN shadow memory
> > + fffffc0000000000 |   -4    TB | fffffdffffffffff |    2 TB | ... unused hole
> > +                  |            |                  |         | vaddr_end for KASLR
> > + fffffe0000000000 |   -2    TB | fffffe7fffffffff |  0.5 TB | cpu_entry_area mapping
> > + fffffe8000000000 |   -1.5  TB | fffffeffffffffff |  0.5 TB | LDT remap for PTI
> > + ffffff0000000000 |   -1    TB | ffffff7fffffffff |  0.5 TB | %esp fixup stacks
> > +__________________|____________|__________________|_________|____________________________________________________________
> > +                                                            |
>
>   56-bit:
> > +                                                            |
> > +                                                            | Kernel-space virtual memory, shared between all processes:
> > +____________________________________________________________|___________________________________________________________
> > +                  |            |                  |         |
> > + ff00000000000000 |  -64    PB | ff0fffffffffffff |    4 PB | ... guard hole, also reserved for hypervisor
> > + ff10000000000000 |  -60    PB | ff8fffffffffffff |   32 PB | direct mapping of all physical memory (page_offset_base)
> > + ff90000000000000 |  -28    PB | ff9fffffffffffff |    4 PB | LDT remap for PTI
> > + ffa0000000000000 |  -24    PB | ffd1ffffffffffff | 12.5 PB | vmalloc/ioremap space (vmalloc_base)
> > + ffd2000000000000 |  -11.5  PB | ffd3ffffffffffff |  0.5 PB | ... unused hole
> > + ffd4000000000000 |  -11    PB | ffd5ffffffffffff |  0.5 PB | virtual memory map (vmemmap_base)
> > + ffd6000000000000 |  -10.5  PB | ffdeffffffffffff | 2.25 PB | ... unused hole
> > + ffdf000000000000 |   -8.25 PB | fffffbffffffffff |   ~8 PB | KASAN shadow memory
> > + fffffc0000000000 |   -4    TB | fffffdffffffffff |    2 TB | ... unused hole
> > +                  |            |                  |         | vaddr_end for KASLR
> > + fffffe0000000000 |   -2    TB | fffffe7fffffffff |  0.5 TB | cpu_entry_area mapping
> > + fffffe8000000000 |   -1.5  TB | fffffeffffffffff |  0.5 TB | ... unused hole
> > + ffffff0000000000 |   -1    TB | ffffff7fffffffff |  0.5 TB | %esp fixup stacks
>
> The two layouts are very similar beyond the shift in the offset and the region sizes, except
> for one big asymmetry: the placement of the LDT remap for PTI.
>
> Is there any fundamental reason why the LDT area is mapped into a 4 petabyte (!) area on 56-bit
> kernels, instead of being at the -1.5 TB offset like on 47-bit kernels?
>
> The only reason I can see is that it's currently coded at the PGD level only:
>
> static void map_ldt_struct_to_user(struct mm_struct *mm)
> {
>         pgd_t *pgd = pgd_offset(mm, LDT_BASE_ADDR);
>
>         if (static_cpu_has(X86_FEATURE_PTI) && !mm->context.ldt)
>                 set_pgd(kernel_to_user_pgdp(pgd), *pgd);
> }
>
> ( BTW., the 4 petabyte size of the area is misleading: a 5-level PGD entry covers 256 TB of
>   virtual memory, i.e. 0.25 PB, not 4 PB. So in reality we have a 0.25 PB area there, used up
>   by the LDT mapping in a single PGD entry, plus a 3.75 PB hole after that. )
>
> ... but unless I'm missing something it's not really fundamental for it to be at the PGD level
> - it could be two levels lower as well, and it could move back to the same place where it's on
> the 47-bit kernel.
>

The subtlety is that, if it's lower than the PGD level, there end up
being some tables that are private to each LDT-using mm that map
things other than the LDT.  Those tables cover the same address range
as some corresponding tables in init_mm, and if those tables in
init_mm change after the LDT mapping is set up, the changes won't
propagate.

So it probably could be made to work, but it would take some extra care.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/3 v2] x86/mm/doc: Enhance the x86-64 virtual memory layout descriptions
  2018-10-06 22:17       ` Andy Lutomirski
@ 2018-10-09  0:35         ` Baoquan He
  2018-10-09  4:48           ` Baoquan He
  0 siblings, 1 reply; 19+ messages in thread
From: Baoquan He @ 2018-10-09  0:35 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ingo Molnar, Dave Hansen, Peter Zijlstra, Kirill A. Shutemov,
	LKML, X86 ML, linux-doc, Thomas Gleixner, Thomas Garnier,
	Jonathan Corbet, Borislav Petkov, H. Peter Anvin, Linus Torvalds,
	Andrew Morton

Hi Andy, Ingo

On 10/06/18 at 03:17pm, Andy Lutomirski wrote:
> On Sat, Oct 6, 2018 at 10:03 AM Ingo Molnar <mingo@kernel.org> wrote:
> > ... but unless I'm missing something it's not really fundamental for it to be at the PGD level
> > - it could be two levels lower as well, and it could move back to the same place where it's on
> > the 47-bit kernel.
> >
> 
> The subtlety is that, if it's lower than the PGD level, there end up
> being some tables that are private to each LDT-using mm that map
> things other than the LDT.  Those tables cover the same address range
> as some corresponding tables in init_mm, and if those tables in
> init_mm change after the LDT mapping is set up, the changes won't
> propagate.
> 
> So it probably could be made to work, but it would take some extra care.

I didn't know the LDT well before. After some investigation, it seems
that it is mainly used by user-space programs like Wine, which call the
modify_ldt syscall to protect/isolate something, and Xen also uses it.
I still don't know how they use it to manipulate code/data segments.

From the current kernel code, an LDT can contain an array of up to 8192
entries of 8 bytes each, i.e. 64 KB, when PTI is not enabled. If PTI is
enabled there are 2 slots to map, so it's doubled: 2 * 8192 * 8 = 128 KB
in all. So one pmd entry can cover it.
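
A quick check of that arithmetic, as a sketch only. The constant names are
illustrative rather than kernel identifiers, and the 2 MB coverage of one pmd
entry assumes 4 KB pages:

```python
# Numbers taken from the paragraph above; names are illustrative only.
LDT_ENTRIES = 8192
LDT_ENTRY_SIZE = 8          # bytes per descriptor
PTI_SLOTS = 2               # two slots are mapped when PTI is enabled

# Assumption: 4 KB pages, so one pmd entry maps 512 * 4 KB = 2 MB.
PMD_ENTRY_COVERAGE = 512 * 4096

ldt_bytes = LDT_ENTRIES * LDT_ENTRY_SIZE   # one LDT
pti_bytes = PTI_SLOTS * ldt_bytes          # both PTI slots
fits_in_one_pmd = pti_bytes <= PMD_ENTRY_COVERAGE

print(ldt_bytes // 1024, "KB per LDT,", pti_bytes // 1024, "KB with PTI,",
      "fits in one pmd entry:", fits_in_one_pmd)
```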

In 4-level paging mode, we reserve 512 GB of virtual address space for the
LDT mapping, which is one PGD entry. In 5-level paging mode, we reserve 4
PB for mapping the LDT, and leave the previous 512 GB space next to the
cpu_entry_area mapping empty as an unused hole. Maybe we can still put the
LDT map for PTI in the old place, after the cpu_entry_area mapping, in
5-level. In 5-level, 512 GB is only one p4d entry. However, it's in the
last pgd entry (each pgd entry covers a 256 TB area), and the last pgd
entry points to a p4d table which always exists in the system since it
contains the kernel text mapping etc. So if the LDT takes one entry in that
always-existing p4d table, maybe it can still work as it did when it owned
a whole pgd entry. Oh, no, wait: 4 PB costs 16 pgd entries.
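
The entry sizes mentioned above can be derived as follows. This is a minimal
sketch assuming 4 KB pages and 9 address bits per page-table level; the
helper entry_coverage() is hypothetical, not a kernel function:

```python
PAGE_SHIFT = 12       # assumption: 4 KB pages
BITS_PER_LEVEL = 9    # assumption: 512 entries per table

def entry_coverage(level):
    """Bytes mapped by one entry at `level` (1 = pte, 2 = pmd, 3 = pud, 4, 5)."""
    return 1 << (PAGE_SHIFT + BITS_PER_LEVEL * (level - 1))

GB, TB, PB = 1 << 30, 1 << 40, 1 << 50

pgd_4level = entry_coverage(4)   # top level in 4-level paging
p4d_5level = entry_coverage(4)   # same size, but one level below the top in 5-level
pgd_5level = entry_coverage(5)   # top level in 5-level paging

# Number of 5-level pgd entries consumed by a 4 PB reservation.
ldt_pgd_entries = (4 * PB) // pgd_5level

print(pgd_4level // GB, "GB,", pgd_5level // TB, "TB,", ldt_pgd_entries, "pgd entries")
```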

Most importantly, won't putting the LDT map for PTI in the KASLR area cause
a bug if we randomize the direct mapping/vmalloc/vmemmap regions so that
they overlap with the LDT map area? We didn't take the LDT into
consideration when doing memory region KASLR.
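
To make the concern concrete, here is a small range check using the addresses
from the two layout tables in this mail. It is only a sketch: it assumes the
randomized window runs from the direct mapping base up to the "vaddr_end for
KASLR" marker, and overlaps() is a hypothetical helper:

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if the inclusive ranges [a_start, a_end] and [b_start, b_end] intersect."""
    return a_start <= b_end and b_start <= a_end

# "vaddr_end for KASLR" row: the start of the cpu_entry_area mapping.
VADDR_END = 0xfffffe0000000000

# Assumption: the randomized window starts at the direct mapping base.
layouts = {
    "4-level": {"kaslr": (0xffff880000000000, VADDR_END - 1),
                "ldt":   (0xfffffe8000000000, 0xfffffeffffffffff)},
    "5-level": {"kaslr": (0xff10000000000000, VADDR_END - 1),
                "ldt":   (0xff90000000000000, 0xff9fffffffffffff)},
}

ldt_in_kaslr_window = {
    name: overlaps(*ranges["kaslr"], *ranges["ldt"])
    for name, ranges in layouts.items()
}
print(ldt_in_kaslr_window)
```

Under these assumptions the LDT remap lies outside the window in the 4-level
layout and inside it in the 5-level layout.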


4-level virtual memory layout:

ffff800000000000 | -128    TB | ffff87ffffffffff |    8 TB | ... guard hole, also reserved for hypervisor
ffff880000000000 | -120    TB | ffffc7ffffffffff |   64 TB | direct mapping of all physical memory (page_offset_base)
ffffc80000000000 |  -56    TB | ffffc8ffffffffff |    1 TB | ... unused hole
ffffc90000000000 |  -55    TB | ffffe8ffffffffff |   32 TB | vmalloc/ioremap space (vmalloc_base)
ffffe90000000000 |  -23    TB | ffffe9ffffffffff |    1 TB | ... unused hole
ffffea0000000000 |  -22    TB | ffffeaffffffffff |    1 TB | virtual memory map (vmemmap_base)
ffffeb0000000000 |  -21    TB | ffffebffffffffff |    1 TB | ... unused hole
ffffec0000000000 |  -20    TB | fffffbffffffffff |   16 TB | KASAN shadow memory
fffffc0000000000 |   -4    TB | fffffdffffffffff |    2 TB | ... unused hole
                 |            |                  |         | vaddr_end for KASLR
fffffe0000000000 |   -2    TB | fffffe7fffffffff |  0.5 TB | cpu_entry_area mapping
fffffe8000000000 |   -1.5  TB | fffffeffffffffff |  0.5 TB | LDT remap for PTI
					^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^	
ffffff0000000000 |   -1    TB | ffffff7fffffffff |  0.5 TB | %esp fixup stacks

5-level virtual memory layout:

ff10000000000000 |  -60    PB | ff8fffffffffffff |   32 PB | direct mapping of all physical memory (page_offset_base)
ff90000000000000 |  -28    PB | ff9fffffffffffff |    4 PB | LDT remap for PTI
					^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ffa0000000000000 |  -24    PB | ffd1ffffffffffff | 12.5 PB | vmalloc/ioremap space (vmalloc_base)
ffd2000000000000 |  -11.5  PB | ffd3ffffffffffff |  0.5 PB | ... unused hole
ffd4000000000000 |  -11    PB | ffd5ffffffffffff |  0.5 PB | virtual memory map (vmemmap_base)
ffd6000000000000 |  -10.5  PB | ffdeffffffffffff | 2.25 PB | ... unused hole
ffdf000000000000 |   -8.25 PB | fffffbffffffffff |   ~8 PB | KASAN shadow memory
fffffc0000000000 |   -4    TB | fffffdffffffffff |    2 TB | ... unused hole
                 |            |                  |         | vaddr_end for KASLR
fffffe0000000000 |   -2    TB | fffffe7fffffffff |  0.5 TB | cpu_entry_area mapping
fffffe8000000000 |   -1.5  TB | fffffeffffffffff |  0.5 TB | ... unused hole
ffffff0000000000 |   -1    TB | ffffff7fffffffff |  0.5 TB | %esp fixup stacks
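
For anyone who wants to re-check the offset and size columns of the tables
above, a minimal sketch follows. row_columns() is a hypothetical helper: it
measures the offset from the top of the 64-bit address space and treats the
end address as inclusive:

```python
TB, PB = 1 << 40, 1 << 50

def row_columns(start, end, unit):
    """Return (offset from the top of the address space, size), both in `unit`."""
    offset = (start - (1 << 64)) / unit
    size = (end - start + 1) / unit
    return offset, size

# 4-level: direct mapping of all physical memory
direct_map_4level = row_columns(0xffff880000000000, 0xffffc7ffffffffff, TB)
# 5-level: LDT remap for PTI
ldt_remap_5level = row_columns(0xff90000000000000, 0xff9fffffffffffff, PB)

print(direct_map_4level, ldt_remap_5level)
```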


* Re: [PATCH 4/3 v2] x86/mm/doc: Enhance the x86-64 virtual memory layout descriptions
  2018-10-09  0:35         ` Baoquan He
@ 2018-10-09  4:48           ` Baoquan He
  0 siblings, 0 replies; 19+ messages in thread
From: Baoquan He @ 2018-10-09  4:48 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ingo Molnar, Dave Hansen, Peter Zijlstra, Kirill A. Shutemov,
	LKML, X86 ML, linux-doc, Thomas Gleixner, Thomas Garnier,
	Jonathan Corbet, Borislav Petkov, H. Peter Anvin, Linus Torvalds,
	Andrew Morton

On 10/09/18 at 08:35am, Baoquan He wrote:
> Hi Andy, Ingo
> 
> On 10/06/18 at 03:17pm, Andy Lutomirski wrote:
> > On Sat, Oct 6, 2018 at 10:03 AM Ingo Molnar <mingo@kernel.org> wrote:
> > > ... but unless I'm missing something it's not really fundamental for it to be at the PGD level
> > > - it could be two levels lower as well, and it could move back to the same place where it's on
> > > the 47-bit kernel.
> > >
> > 
> > The subtlety is that, if it's lower than the PGD level, there end up
> > being some tables that are private to each LDT-using mm that map
> > things other than the LDT.  Those tables cover the same address range
> > as some corresponding tables in init_mm, and if those tables in
> > init_mm change after the LDT mapping is set up, the changes won't
> > propagate.
> > 
> > So it probably could be made to work, but it would take some extra care.
> 
> In 4-level paging mode, we reserve 512 GB virtual address space for it to
> map, the 512 GB is one PGD entry. In 5-level paging mode, we reserve 4
> PB for mapping LDT, and leave the previous 512 GB space next to
> cpu_entry_area mapping empty as unused hole. Maybe we can still put LDT
> map for PTI in the old place, after cpu_entry_area mapping in 5-level.
> Then in 5-level, 512 GB is only one p4d entry, however it's in the last
> pgd entry, each pgd points to 256 TB area, and the last pgd entry will
> points to p4d table which always exists in system since it contains
> kernel text mapping etc. Now if LDT take one entry in the always
> existing p4d table, maybe it can still works as before it owns a whole
> pgd entry, oh, no, 4 PB will cost 16 pgd entries.

Sorry, I was too long-winded. What I mean here is that the 512 GB LDT map
will occupy one p4d entry alone, and the corresponding pgd and p4d tables
are always present, populated and unchanged. So it might not need any page
table change to propagate. Not sure if there's any other risk in this
case.
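
As a rough illustration of that point, the table indexes of the old 512 GB
LDT slot in 5-level paging can be computed like this. It is a sketch only:
table_index() is a hypothetical helper, the shifts assume 4 KB pages with 9
bits per level, and the kernel text base used for comparison is an
assumption that does not appear in the tables of this mail:

```python
def table_index(addr, shift):
    """Index into a 512-entry page table for the level selected by `shift`."""
    return (addr >> shift) & 0x1ff

PGDIR_SHIFT_5LEVEL = 48   # assumption: pgd index bits in 5-level paging
P4D_SHIFT = 39            # assumption: p4d index bits in 5-level paging

LDT_OLD_SLOT = 0xfffffe8000000000   # 512 GB slot after cpu_entry_area
KERNEL_TEXT = 0xffffffff80000000    # assumption: base of the kernel text mapping

ldt_pgd = table_index(LDT_OLD_SLOT, PGDIR_SHIFT_5LEVEL)
text_pgd = table_index(KERNEL_TEXT, PGDIR_SHIFT_5LEVEL)
ldt_p4d = table_index(LDT_OLD_SLOT, P4D_SHIFT)
text_p4d = table_index(KERNEL_TEXT, P4D_SHIFT)

print("pgd index:", ldt_pgd, text_pgd, "p4d index:", ldt_p4d, text_p4d)
```

With these assumptions both addresses fall under the same last pgd entry,
and the LDT slot gets a p4d entry of its own, separate from the one used by
the kernel text.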

Thanks
Baoquan

> 
> Most importantly, putting LDT map for PTI in KASLR area, won't it cause
> code bug, if we randomize the direct mapping/vmalloc/vmemmap to make them
> overlap with LDT map area? We didn't take LDT into consideration when do
> memory region KASLR.
> 
> 
> 4-level virtual memory layout:
> 
> ffff800000000000 | -128    TB | ffff87ffffffffff |    8 TB | ... guard hole, also reserved for hypervisor
> ffff880000000000 | -120    TB | ffffc7ffffffffff |   64 TB | direct mapping of all physical memory (page_offset_base)
> ffffc80000000000 |  -56    TB | ffffc8ffffffffff |    1 TB | ... unused hole
> ffffc90000000000 |  -55    TB | ffffe8ffffffffff |   32 TB | vmalloc/ioremap space (vmalloc_base)
> ffffe90000000000 |  -23    TB | ffffe9ffffffffff |    1 TB | ... unused hole
> ffffea0000000000 |  -22    TB | ffffeaffffffffff |    1 TB | virtual memory map (vmemmap_base)
> ffffeb0000000000 |  -21    TB | ffffebffffffffff |    1 TB | ... unused hole
> ffffec0000000000 |  -20    TB | fffffbffffffffff |   16 TB | KASAN shadow memory
> fffffc0000000000 |   -4    TB | fffffdffffffffff |    2 TB | ... unused hole
>                  |            |                  |         | vaddr_end for KASLR
> fffffe0000000000 |   -2    TB | fffffe7fffffffff |  0.5 TB | cpu_entry_area mapping
> fffffe8000000000 |   -1.5  TB | fffffeffffffffff |  0.5 TB | LDT remap for PTI
> 					^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^	
> ffffff0000000000 |   -1    TB | ffffff7fffffffff |  0.5 TB | %esp fixup stacks
> 
> 5-level virtual memory layout:
> 
> ff10000000000000 |  -60    PB | ff8fffffffffffff |   32 PB | direct mapping of all physical memory (page_offset_base)
> ff90000000000000 |  -28    PB | ff9fffffffffffff |    4 PB | LDT remap for PTI
> 					^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> ffa0000000000000 |  -24    PB | ffd1ffffffffffff | 12.5 PB | vmalloc/ioremap space (vmalloc_base)
> ffd2000000000000 |  -11.5  PB | ffd3ffffffffffff |  0.5 PB | ... unused hole
> ffd4000000000000 |  -11    PB | ffd5ffffffffffff |  0.5 PB | virtual memory map (vmemmap_base)
> ffd6000000000000 |  -10.5  PB | ffdeffffffffffff | 2.25 PB | ... unused hole
> ffdf000000000000 |   -8.25 PB | fffffbffffffffff |   ~8 PB | KASAN shadow memory
> fffffc0000000000 |   -4    TB | fffffdffffffffff |    2 TB | ... unused hole
>                  |            |                  |         | vaddr_end for KASLR
> fffffe0000000000 |   -2    TB | fffffe7fffffffff |  0.5 TB | cpu_entry_area mapping
> fffffe8000000000 |   -1.5  TB | fffffeffffffffff |  0.5 TB | ... unused hole
> ffffff0000000000 |   -1    TB | ffffff7fffffffff |  0.5 TB | %esp fixup stacks


* Re: [PATCH 0/3] x86/mm/doc: Clean up mm.txt
  2018-09-21  2:05 [PATCH 0/3] x86/mm/doc: Clean up mm.txt Baoquan He
@ 2018-09-27  0:02 ` Baoquan He
  0 siblings, 0 replies; 19+ messages in thread
From: Baoquan He @ 2018-09-27  0:02 UTC (permalink / raw)
  To: mingo, tglx, hpa
  Cc: linux-kernel, kirill.shutemov, x86, thgarnie, corbet, linux-doc, peterz

On 09/21/18 at 10:05am, Baoquan He wrote:
> This clean up is suggested by Ingo.

This series is messy and I have sent v2, so NACK this v1 series.

> 
> It firstly fix the confusions in mm layout tables by unifying
> each memory region description in the consistent style.
> 
> Secondly take the KASLR words out of the mm layout tables to make
> it as a separate section to only list mm layout in non-KASLR case.
> Then add KASLR document at the end of mm.txt.
> 
> Meanwhile update document about KERNEL_IMAGE_SIZE in
> arch/x86/include/asm/page_64_types.h .
> 
> Baoquan He (3):
>   x86/KASLR: Update document about KERNEL_IMAGE_SIZE
>   x86/mm/doc: Clean up the memory region layout descriptions
>   x86/doc/kaslr.txt: Create a separate part of document abourt KASLR at
>     the end of file
> 
>  Documentation/x86/x86_64/mm.txt      | 138 +++++++++++++++++++++++------------
>  arch/x86/include/asm/page_64_types.h |   7 +-
>  2 files changed, 96 insertions(+), 49 deletions(-)
> 
> -- 
> 2.13.6
> 


* [PATCH 0/3] x86/mm/doc: Clean up mm.txt
@ 2018-09-21  2:05 Baoquan He
  2018-09-27  0:02 ` Baoquan He
  0 siblings, 1 reply; 19+ messages in thread
From: Baoquan He @ 2018-09-21  2:05 UTC (permalink / raw)
  To: mingo, tglx, hpa
  Cc: linux-kernel, kirill.shutemov, x86, thgarnie, corbet, linux-doc,
	peterz, Baoquan He

This cleanup was suggested by Ingo.

First, it fixes the confusion in the mm layout tables by describing
each memory region in a consistent style.

Second, it takes the KASLR wording out of the mm layout tables, so that
they only list the mm layout in the non-KASLR case, and adds a separate
KASLR section at the end of mm.txt.

It also updates the documentation about KERNEL_IMAGE_SIZE in
arch/x86/include/asm/page_64_types.h.

Baoquan He (3):
  x86/KASLR: Update document about KERNEL_IMAGE_SIZE
  x86/mm/doc: Clean up the memory region layout descriptions
  x86/doc/kaslr.txt: Create a separate part of document abourt KASLR at
    the end of file

 Documentation/x86/x86_64/mm.txt      | 138 +++++++++++++++++++++++------------
 arch/x86/include/asm/page_64_types.h |   7 +-
 2 files changed, 96 insertions(+), 49 deletions(-)

-- 
2.13.6



end of thread, other threads:[~2018-10-09  4:48 UTC | newest]

Thread overview: 19+ messages
2018-10-06  8:43 [PATCH 0/3] x86/mm/doc: Clean up mm.txt Baoquan He
2018-10-06  8:43 ` [PATCH 1/3] x86/KASLR: Update KERNEL_IMAGE_SIZE description Baoquan He
2018-10-06 13:06   ` [tip:x86/mm] " tip-bot for Baoquan He
2018-10-06  8:43 ` [PATCH 2/3] x86/mm/doc: Clean up the memory region layout descriptions Baoquan He
2018-10-06 13:07   ` [tip:x86/mm] x86/mm/doc: Clean up the x86-64 virtual memory " tip-bot for Baoquan He
2018-10-06  8:43 ` [PATCH 3/3] x86/doc/kaslr.txt: Create a separate part of document abourt KASLR at the end of file Baoquan He
2018-10-06 11:28 ` [PATCH 0/3] x86/mm/doc: Clean up mm.txt Baoquan He
2018-10-06 12:21 ` Ingo Molnar
2018-10-06 12:22 ` [PATCH 4/3] x86/mm/doc: Enhance the x86-64 virtual memory layout descriptions Ingo Molnar
2018-10-06 12:33   ` Ingo Molnar
2018-10-06 14:41     ` Baoquan He
2018-10-06 14:38   ` [PATCH 4/3 v2] " Ingo Molnar
2018-10-06 15:02     ` Baoquan He
2018-10-06 17:03     ` Ingo Molnar
2018-10-06 22:17       ` Andy Lutomirski
2018-10-09  0:35         ` Baoquan He
2018-10-09  4:48           ` Baoquan He
  -- strict thread matches above, loose matches on Subject: below --
2018-09-21  2:05 [PATCH 0/3] x86/mm/doc: Clean up mm.txt Baoquan He
2018-09-27  0:02 ` Baoquan He
