* [PATCH v3 0/6] Several patches to fix code bugs, improve documents and clean up
@ 2019-02-16 14:00 Baoquan He
  2019-02-16 14:00 ` [PATCH v3 1/6] x86/mm/KASLR: Improve code comments about struct kaslr_memory_region Baoquan He
                   ` (5 more replies)
  0 siblings, 6 replies; 23+ messages in thread
From: Baoquan He @ 2019-02-16 14:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: tglx, mingo, bp, hpa, dave.hansen, luto, peterz, x86, travis,
	thgarnie, keescook, akpm, yamada.masahiro, kirill, Baoquan He

The v2 post was here:
https://lkml.org/lkml/2018/9/9/56

During the v2 patch review, Ingo suggested to:
 (1) improve the documentation in Documentation/x86/x86_64/mm.txt;
 (2) improve the documentation about struct kaslr_memory_region;
 (3) open code the unnecessary function get_padding();
 (4) improve the memory region randomization to be at 2M granularity.

*** Part (1)
has been done with the commits below:
d52888aa2753 x86/mm: Move LDT remap out of KASLR region on 5-level paging
32b89760ddf4 x86/mm/doc: Enhance the x86-64 virtual memory layout descriptions
5b1290406579 x86/mm/doc: Clean up the x86-64 virtual memory layout descriptions

*** Part (4)
has been investigated; the code change can be found here:
https://github.com/baoquan-he/linux/commits/mm-kaslr-2m-aligned

From the test results and Table 4-15 of the Intel manual, changing the
memory region randomization to 2M granularity is not doable.

  Table 4-15. Format of an IA-32e Page-Directory-Pointer-Table Entry (PDPTE) that Maps a 1-GByte Page

The current memory region KASLR is at 1 GB granularity, PUD aligned.
When I tested the above patches on a KVM guest, the system worked well
with 1 GB of memory deployed. With 4 GB, the KVM guest couldn't boot up.
Finally I read the Intel CPU manual and found that a 1 GB page mapping
must be backed by a 1 GB aligned physical address, while the 2M granularity
randomization breaks that alignment and causes the boot failure.
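
As a rough illustration (a sketch of the constraint, not code from the
patches; the helper name is made up, and it relies on the kernel's
IS_ALIGNED()/PUD_SIZE helpers):

	/*
	 * A PUD entry that maps a 1 GB page can only point at a 1 GB
	 * aligned physical frame (Table 4-15).  The direct mapping uses
	 * virt = phys + page_offset_base, so page_offset_base itself must
	 * stay 1 GB aligned, otherwise virt and phys can't both satisfy
	 * the alignment and the 1 GB mapping can't be used.
	 */
	static bool can_use_1gb_page(unsigned long virt, unsigned long phys)
	{
		return IS_ALIGNED(virt, PUD_SIZE) && IS_ALIGNED(phys, PUD_SIZE);
	}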

But we can still improve the granularity in 5-level paging mode from 512
GB to 1 GB; this will be posted soon.
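
Roughly, the idea is to round the next region up to PUD_SIZE (1 GB) instead
of P4D_SIZE (512 GB) in kernel_randomize_memory() when 5-level paging is
enabled, something like this untested sketch:

	/* round to 1 GB regardless of paging mode (untested sketch) */
	vaddr = round_up(vaddr + 1, PUD_SIZE);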

*** This patchset
includes the original three code bug fix patches from v2, plus two new
patches which improve the code comments about kaslr_memory_region and
open code the unnecessary function get_padding(); it also carries the
known SGI UV bug fix.

Note:
The SGI UV bug fix has not been tested yet. The idea was approved by SGI UV
expert Mike Travis, and the old post below was tested and has been merged
into our RHEL distros. This new change doesn't alter the way the size of
the direct mapping section is calculated, it only wraps the calculation
code into a new function, calc_direct_mapping_size(), according to Ingo's
suggestion.
https://lkml.org/lkml/2017/5/18/102


Baoquan He (6):
  x86/mm/KASLR: Improve code comments about struct kaslr_memory_region
  x86/mm/KASLR: Open code unnecessary function get_padding
  mm: Add build time sanity check for struct page size
  x86/mm/KASLR: Fix the wrong calculation of memory region initial size
  x86/mm/KASLR: Calculate the actual size of vmemmap region
  x86/mm/KASLR: Do not adapt the size of the direct mapping section for
    SGI UV system

 arch/x86/mm/kaslr.c | 131 +++++++++++++++++++++++++++++++++++---------
 mm/page_alloc.c     |   2 +
 2 files changed, 107 insertions(+), 26 deletions(-)

-- 
2.17.2


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v3 1/6] x86/mm/KASLR: Improve code comments about struct kaslr_memory_region
  2019-02-16 14:00 [PATCH v3 0/6] Several patches to fix code bugs, improve documents and clean up Baoquan He
@ 2019-02-16 14:00 ` Baoquan He
  2019-02-17 17:07   ` Kees Cook
  2019-02-16 14:00 ` [PATCH v3 2/6] x86/mm/KASLR: Open code unnecessary function get_padding Baoquan He
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 23+ messages in thread
From: Baoquan He @ 2019-02-16 14:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: tglx, mingo, bp, hpa, dave.hansen, luto, peterz, x86, travis,
	thgarnie, keescook, akpm, yamada.masahiro, kirill, Baoquan He

The old comment above kaslr_memory_region is not clear enough to explain
the concepts of memory region KASLR.

[Ingo suggested this and helped to prettify the text]

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 arch/x86/mm/kaslr.c | 55 ++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 52 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
index 3f452ffed7e9..d7c6e4e8e48e 100644
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -42,10 +42,59 @@
 static const unsigned long vaddr_end = CPU_ENTRY_AREA_BASE;
 
 /*
- * Memory regions randomized by KASLR (except modules that use a separate logic
- * earlier during boot). The list is ordered based on virtual addresses. This
- * order is kept after randomization.
+ * 'struct kasl_memory_region' entries represent continuous chunks of
+ * kernel virtual memory regions, to be randomized by KASLR.
+ *
+ * ( The exception is the module space virtual memory window which
+ *   uses separate logic earlier during bootup. )
+ *
+ * Currently there are three such regions: the physical memory mapping,
+ * vmalloc and vmemmap regions.
+ *
+ * The array below has the entries ordered based on virtual addresses.
+ * The order is kept after randomization, i.e. the randomized
+ * virtual addresses of these regions are still ascending.
+ *
+ * Here are the fields:
+ *
+ * @base: points to a global variable used by the MM to get the
+ * virtual base address of any of the above regions. This allows the
+ * early KASLR code to modify these base addresses early during bootup,
+ * on a per bootup basis, without the MM code even being aware of whether
+ * it got changed and to what value.
+ *
+ * When KASLR is active then the MM code makes sure that for each region
+ * there's such a single, dynamic, global base address 'unsigned long'
+ * variable available for the KASLR code to point to and modify directly:
+ *
+ *	{ &page_offset_base, 0 },
+ *	{ &vmalloc_base,     0 },
+ *	{ &vmemmap_base,     1 },
+ *
+ * @size_tb: size in TB of each memory region. Thereinto, the size of
+ * the physical memory mapping region is variable, calculated according
+ * to the actual size of system RAM in order to save more space for
+ * randomization. The rest are fixed values related to paging mode.
+ *
+ * @size_tb: is the size of each memory region after randomization, and
+ * its unit is TB.
+ *
+ * Physical memory mapping: (actual RAM size + 10 TB padding)
+ * Vmalloc: 32 TB
+ * Vmemmap: 1 TB
+ *
+ * When randomize the layout, their order are kept, still the physical
+ * memory mapping region is handled fistly, next vmalloc and vmemmap.
+ * E.g the physical memory region, we limit the starting address to be
+ * taken from the 1st 1/3 part of the whole available virtual address
+ * space which is from 0xffff880000000000 to 0xfffffe0000000000, namely
+ * the original starting address of the physical memory mapping region
+ * to the starting address of cpu_entry_area mapping region. Once a random
+ * address is chosen for the physical memory mapping, we jump over the
+ * region and add 1G to begin the next region handling with the remaining
+ * available space.
  */
+
 static __initdata struct kaslr_memory_region {
 	unsigned long *base;
 	unsigned long size_tb;
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 2/6] x86/mm/KASLR: Open code unnecessary function get_padding
  2019-02-16 14:00 [PATCH v3 0/6] Several patches to fix code bugs, improve documents and clean up Baoquan He
  2019-02-16 14:00 ` [PATCH v3 1/6] x86/mm/KASLR: Improve code comments about struct kaslr_memory_region Baoquan He
@ 2019-02-16 14:00 ` Baoquan He
  2019-02-17 17:14   ` Kees Cook
  2019-02-16 14:00 ` [PATCH v3 3/6] mm: Add build time sanity check for struct page size Baoquan He
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 23+ messages in thread
From: Baoquan He @ 2019-02-16 14:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: tglx, mingo, bp, hpa, dave.hansen, luto, peterz, x86, travis,
	thgarnie, keescook, akpm, yamada.masahiro, kirill, Baoquan He

It's used only twice and we do bit shifts in the parent function
anyway so it's not like it's hiding some uninteresting detail.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Baoquan He <bhe@redhat.com>
---
 arch/x86/mm/kaslr.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
index d7c6e4e8e48e..bf680929fe26 100644
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -104,12 +104,6 @@ static __initdata struct kaslr_memory_region {
 	{ &vmemmap_base, 1 },
 };
 
-/* Get size in bytes used by the memory region */
-static inline unsigned long get_padding(struct kaslr_memory_region *region)
-{
-	return (region->size_tb << TB_SHIFT);
-}
-
 /*
  * Apply no randomization if KASLR was disabled at boot or if KASAN
  * is enabled. KASAN shadow mappings rely on regions being PGD aligned.
@@ -161,7 +155,7 @@ void __init kernel_randomize_memory(void)
 	/* Calculate entropy available between regions */
 	remain_entropy = vaddr_end - vaddr_start;
 	for (i = 0; i < ARRAY_SIZE(kaslr_regions); i++)
-		remain_entropy -= get_padding(&kaslr_regions[i]);
+		remain_entropy -= kaslr_regions[i].size_tb << TB_SHIFT;
 
 	prandom_seed_state(&rand_state, kaslr_get_random_long("Memory"));
 
@@ -185,7 +179,7 @@ void __init kernel_randomize_memory(void)
 		 * Jump the region and add a minimum padding based on
 		 * randomization alignment.
 		 */
-		vaddr += get_padding(&kaslr_regions[i]);
+		vaddr += kaslr_regions[i].size_tb << TB_SHIFT;
 		if (pgtable_l5_enabled())
 			vaddr = round_up(vaddr + 1, P4D_SIZE);
 		else
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 3/6] mm: Add build time sanity check for struct page size
  2019-02-16 14:00 [PATCH v3 0/6] Several patches to fix code bugs, improve documents and clean up Baoquan He
  2019-02-16 14:00 ` [PATCH v3 1/6] x86/mm/KASLR: Improve code comments about struct kaslr_memory_region Baoquan He
  2019-02-16 14:00 ` [PATCH v3 2/6] x86/mm/KASLR: Open code unnecessary function get_padding Baoquan He
@ 2019-02-16 14:00 ` Baoquan He
  2019-02-17 16:50   ` Kees Cook
  2019-02-16 14:00 ` [PATCH v3 4/6] x86/mm/KASLR: Fix the wrong calculation of memory region initial size Baoquan He
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 23+ messages in thread
From: Baoquan He @ 2019-02-16 14:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: tglx, mingo, bp, hpa, dave.hansen, luto, peterz, x86, travis,
	thgarnie, keescook, akpm, yamada.masahiro, kirill, Baoquan He

The size of struct page might be larger than 64 bytes if debug options
are enabled, or if fields are added for debugging intentionally. Yet an
upper limit should be enforced at build time to trigger an alert in case
the size grows too big to boot the system, warning people to check in
advance whether it was done on purpose.

Here 1/4 of PAGE_SIZE is chosen as the limit, since the system must
already be insane at that value. For systems with PAGE_SIZE larger than
4KB, 1KB is simply taken.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 mm/page_alloc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 35fdde041f5c..eb6c8e22333b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -67,6 +67,7 @@
 #include <linux/lockdep.h>
 #include <linux/nmi.h>
 #include <linux/psi.h>
+#include <linux/sizes.h>
 
 #include <asm/sections.h>
 #include <asm/tlbflush.h>
@@ -7084,6 +7085,7 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
 	unsigned long start_pfn, end_pfn;
 	int i, nid;
 
+	BUILD_BUG_ON(sizeof(struct page) > min_t(size_t, SZ_1K, PAGE_SIZE));
 	/* Record where the zone boundaries are */
 	memset(arch_zone_lowest_possible_pfn, 0,
 				sizeof(arch_zone_lowest_possible_pfn));
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 4/6] x86/mm/KASLR: Fix the wrong calculation of memory region initial size
  2019-02-16 14:00 [PATCH v3 0/6] Several patches to fix code bugs, improve documents and clean up Baoquan He
                   ` (2 preceding siblings ...)
  2019-02-16 14:00 ` [PATCH v3 3/6] mm: Add build time sanity check for struct page size Baoquan He
@ 2019-02-16 14:00 ` Baoquan He
  2019-02-17 16:53   ` Kees Cook
  2019-02-16 14:00 ` [PATCH v3 5/6] x86/mm/KASLR: Calculate the actual size of vmemmap region Baoquan He
  2019-02-16 14:00 ` [PATCH v3 6/6] x86/mm/KASLR: Do not adapt the size of the direct mapping section for SGI UV system Baoquan He
  5 siblings, 1 reply; 23+ messages in thread
From: Baoquan He @ 2019-02-16 14:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: tglx, mingo, bp, hpa, dave.hansen, luto, peterz, x86, travis,
	thgarnie, keescook, akpm, yamada.masahiro, kirill, Baoquan He

In memory region KASLR, __PHYSICAL_MASK_SHIFT is taken to calculate
the initial size of the direct mapping region. This is correct in
the old code where __PHYSICAL_MASK_SHIFT was equal to MAX_PHYSMEM_BITS,
46 bits, and only 4-level mode was supported.

Later, in commit b83ce5ee91471d ("x86/mm/64: Make __PHYSICAL_MASK_SHIFT
always 52"), __PHYSICAL_MASK_SHIFT was changed to be always 52 bits, no
matter it's 5-level or 4-level. This is wrong for 4-level paging. Then
when we adapt physical memory region size based on available memory, it
will overflow if the amount of system RAM and the padding is bigger
than 64 TB.

In fact, here MAX_PHYSMEM_BITS should be used instead. Fix it by
replacing __PHYSICAL_MASK_SHIFT with MAX_PHYSMEM_BITS.
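
For illustration (a rough calculation; the 70 TB figure is only an example):

    1 << (__PHYSICAL_MASK_SHIFT - TB_SHIFT) = 1 << (52 - 40) = 4096 TB
    1 << (MAX_PHYSMEM_BITS - TB_SHIFT)      = 1 << (46 - 40) =   64 TB  (4-level)

With, say, 70 TB of RAM plus the default 10 TB padding, memory_tb (80 TB)
is smaller than 4096 TB and is taken as the region size, even though
4-level paging only reserves 64 TB of virtual space for the direct mapping.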

Fixes: b83ce5ee9147 ("x86/mm/64: Make __PHYSICAL_MASK_SHIFT always 52")
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Thomas Garnier <thgarnie@google.com>
Signed-off-by: Baoquan He <bhe@redhat.com>
---
 arch/x86/mm/kaslr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
index bf680929fe26..97768df923e3 100644
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -137,7 +137,7 @@ void __init kernel_randomize_memory(void)
 	if (!kaslr_memory_enabled())
 		return;
 
-	kaslr_regions[0].size_tb = 1 << (__PHYSICAL_MASK_SHIFT - TB_SHIFT);
+	kaslr_regions[0].size_tb = 1 << (MAX_PHYSMEM_BITS - TB_SHIFT);
 	kaslr_regions[1].size_tb = VMALLOC_SIZE_TB;
 
 	/*
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 5/6] x86/mm/KASLR: Calculate the actual size of vmemmap region
  2019-02-16 14:00 [PATCH v3 0/6] Several patches to fix code bugs, improve documents and clean up Baoquan He
                   ` (3 preceding siblings ...)
  2019-02-16 14:00 ` [PATCH v3 4/6] x86/mm/KASLR: Fix the wrong calculation of memory region initial size Baoquan He
@ 2019-02-16 14:00 ` Baoquan He
  2019-02-17 17:25   ` Kees Cook
  2019-02-16 14:00 ` [PATCH v3 6/6] x86/mm/KASLR: Do not adapt the size of the direct mapping section for SGI UV system Baoquan He
  5 siblings, 1 reply; 23+ messages in thread
From: Baoquan He @ 2019-02-16 14:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: tglx, mingo, bp, hpa, dave.hansen, luto, peterz, x86, travis,
	thgarnie, keescook, akpm, yamada.masahiro, kirill, Baoquan He

Vmemmap region has different maximum size depending on paging mode.
Now its size is hardcoded as 1TB in memory KASLR, this is not
right for 5-level paging mode. It will cause overflow if vmemmap
region is randomized to be adjacent to cpu_entry_area region and
its actual size is bigger than 1TB.

So here calculate how many TB by the actual size of vmemmap region
and align up to 1TB boundary.
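
For illustration (a rough calculation, assuming 4 KB pages and a 64-byte
struct page):

    4-level: 64 TB RAM max -> 2^46 / 2^12 * 2^6 = 2^40 bytes =  1 TB of vmemmap
    5-level:  4 PB RAM max -> 2^52 / 2^12 * 2^6 = 2^46 bytes = 64 TB of vmemmap

so the hardcoded 1 TB only matches 4-level paging.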

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 arch/x86/mm/kaslr.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
index 97768df923e3..ca12ed4e5239 100644
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -101,7 +101,7 @@ static __initdata struct kaslr_memory_region {
 } kaslr_regions[] = {
 	{ &page_offset_base, 0 },
 	{ &vmalloc_base, 0 },
-	{ &vmemmap_base, 1 },
+	{ &vmemmap_base, 0 },
 };
 
 /*
@@ -121,6 +121,7 @@ void __init kernel_randomize_memory(void)
 	unsigned long rand, memory_tb;
 	struct rnd_state rand_state;
 	unsigned long remain_entropy;
+	unsigned long vmemmap_size;
 
 	vaddr_start = pgtable_l5_enabled() ? __PAGE_OFFSET_BASE_L5 : __PAGE_OFFSET_BASE_L4;
 	vaddr = vaddr_start;
@@ -152,6 +153,14 @@ void __init kernel_randomize_memory(void)
 	if (memory_tb < kaslr_regions[0].size_tb)
 		kaslr_regions[0].size_tb = memory_tb;
 
+	/*
+	 * Calculate how many TB vmemmap region needs, and align to
+	 * 1TB boundary.
+	 */
+	vmemmap_size = (kaslr_regions[0].size_tb << (TB_SHIFT - PAGE_SHIFT)) *
+		sizeof(struct page);
+	kaslr_regions[2].size_tb = DIV_ROUND_UP(vmemmap_size, 1UL << TB_SHIFT);
+
 	/* Calculate entropy available between regions */
 	remain_entropy = vaddr_end - vaddr_start;
 	for (i = 0; i < ARRAY_SIZE(kaslr_regions); i++)
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 6/6] x86/mm/KASLR: Do not adapt the size of the direct mapping section for SGI UV system
  2019-02-16 14:00 [PATCH v3 0/6] Several patches to fix code bugs, improve documents and clean up Baoquan He
                   ` (4 preceding siblings ...)
  2019-02-16 14:00 ` [PATCH v3 5/6] x86/mm/KASLR: Calculate the actual size of vmemmap region Baoquan He
@ 2019-02-16 14:00 ` Baoquan He
  2019-02-17  2:09   ` Baoquan He
  5 siblings, 1 reply; 23+ messages in thread
From: Baoquan He @ 2019-02-16 14:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: tglx, mingo, bp, hpa, dave.hansen, luto, peterz, x86, travis,
	thgarnie, keescook, akpm, yamada.masahiro, kirill, Baoquan He

On SGI UV system, kernel often hangs when KASLR is enabled. Disabling
KASLR makes kernel work well.

The back trace is:

kernel BUG at arch/x86/mm/init_64.c:311!
invalid opcode: 0000 [#1] SMP
[...]
RIP: 0010:__init_extra_mapping+0x188/0x196
[...]
Call Trace:
 init_extra_mapping_uc+0x13/0x15
 map_high+0x67/0x75
 map_mmioh_high_uv3+0x20a/0x219
 uv_system_init_hub+0x12d9/0x1496
 uv_system_init+0x27/0x29
 native_smp_prepare_cpus+0x28d/0x2d8
 kernel_init_freeable+0xdd/0x253
 ? rest_init+0x80/0x80
 kernel_init+0xe/0x110
 ret_from_fork+0x2c/0x40

This is because the SGI UV system needs to map its MMIOH regions into the
direct mapping section, and that mapping happens in rest_init(), which is
much later than the call to kernel_randomize_memory() that does mm KASLR.
So mm KASLR can't account for the size of the MMIOH regions when
calculating the needed size of address space for the direct mapping
section.

When KASLR is disabled, there is 64 TB of address space for both system RAM
and the MMIOH regions to share. When KASLR is enabled, the current mm KASLR
code only reserves the actual size of system RAM plus an extra 10 TB for
the direct mapping. Thus the later MMIOH mapping could go beyond the upper
bound of the direct mapping and step into the VMALLOC or VMEMMAP area,
and then the BUG_ON() in __init_extra_mapping() will be triggered.

E.g. on the SGI UV3 machine where this bug was reported, there are two
MMIOH regions:

[    1.519001] UV: Map MMIOH0_HI 0xffc00000000 - 0x100000000000
[    1.523001] UV: Map MMIOH1_HI 0x100000000000 - 0x200000000000

They are [16TB-16G, 16TB) and [16TB, 32TB). On this machine, 512 GB of RAM
is spread out across 1 TB regions. The above two SGI MMIOH regions will
also be mapped into the direct mapping section.

To fix it, we need to check whether it's an SGI UV system by calling
is_early_uv_system() in kernel_randomize_memory(). If it is, do not adapt
the size of the direct mapping section, just keep it as is, e.g. 64 TB in
4-level paging mode.
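
For illustration (a rough calculation; the 512 GB RAM figure is from the
machine above and the 10 TB is the default
CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING): the adapted direct mapping size
would be DIV_ROUND_UP(512 GB, 1 TB) + 10 TB = 11 TB, while the MMIOH
regions extend up to 32 TB, far beyond the randomized direct mapping's
upper bound.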

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 arch/x86/mm/kaslr.c | 57 +++++++++++++++++++++++++++++++++------------
 1 file changed, 42 insertions(+), 15 deletions(-)

diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
index ca12ed4e5239..754b5da91d43 100644
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -29,6 +29,7 @@
 #include <asm/pgtable.h>
 #include <asm/setup.h>
 #include <asm/kaslr.h>
+#include <asm/uv/uv.h>
 
 #include "mm_internal.h"
 
@@ -113,15 +114,51 @@ static inline bool kaslr_memory_enabled(void)
 	return kaslr_enabled() && !IS_ENABLED(CONFIG_KASAN);
 }
 
+/*
+ * Even though a huge virtual address space is reserved for the direct
+ * mapping of physical memory, e.g in 4-level pageing mode, it's 64TB,
+ * rare system can own enough physical memory to use it up, most are
+ * even less than 1TB. So with KASLR enabled, we adapt the size of
+ * direct mapping area to size of actual physical memory plus the
+ * configured padding CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING.
+ * The left part will be taken out to join memory randomization.
+ *
+ * Note that UV system is an exception, its MMIOH region need be mapped
+ * into the direct mapping area too, while the size can't be got until
+ * rest_init() calling. Hence for UV system, do not adapt the size
+ * of direct mapping area.
+ */
+static inline unsigned long calc_direct_mapping_size(void)
+{
+	unsigned long size_tb, memory_tb;
+
+	/*
+	 * Update Physical memory mapping to available and
+	 * add padding if needed (especially for memory hotplug support).
+	 */
+	memory_tb = DIV_ROUND_UP(max_pfn << PAGE_SHIFT, 1UL << TB_SHIFT) +
+		CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
+
+	size_tb = 1 << (MAX_PHYSMEM_BITS - TB_SHIFT);
+
+	/*
+	 * Adapt phyiscal memory region size based on available memory if
+	 * it's not UV system.
+	 */
+	if (memory_tb < size_tb && !is_early_uv_system())
+		size_tb = memory_tb;
+
+	return size_tb;
+}
+
 /* Initialize base and padding for each memory region randomized with KASLR */
 void __init kernel_randomize_memory(void)
 {
-	size_t i;
-	unsigned long vaddr_start, vaddr;
-	unsigned long rand, memory_tb;
-	struct rnd_state rand_state;
+	unsigned long vaddr_start, vaddr, rand;
 	unsigned long remain_entropy;
 	unsigned long vmemmap_size;
+	struct rnd_state rand_state;
+	size_t i;
 
 	vaddr_start = pgtable_l5_enabled() ? __PAGE_OFFSET_BASE_L5 : __PAGE_OFFSET_BASE_L4;
 	vaddr = vaddr_start;
@@ -138,20 +175,10 @@ void __init kernel_randomize_memory(void)
 	if (!kaslr_memory_enabled())
 		return;
 
-	kaslr_regions[0].size_tb = 1 << (MAX_PHYSMEM_BITS - TB_SHIFT);
+	kaslr_regions[0].size_tb = calc_direct_mapping_size();
 	kaslr_regions[1].size_tb = VMALLOC_SIZE_TB;
 
-	/*
-	 * Update Physical memory mapping to available and
-	 * add padding if needed (especially for memory hotplug support).
-	 */
 	BUG_ON(kaslr_regions[0].base != &page_offset_base);
-	memory_tb = DIV_ROUND_UP(max_pfn << PAGE_SHIFT, 1UL << TB_SHIFT) +
-		CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
-
-	/* Adapt phyiscal memory region size based on available memory */
-	if (memory_tb < kaslr_regions[0].size_tb)
-		kaslr_regions[0].size_tb = memory_tb;
 
 	/*
 	 * Calculate how many TB vmemmap region needs, and align to
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 6/6] x86/mm/KASLR: Do not adapt the size of the direct mapping section for SGI UV system
  2019-02-16 14:00 ` [PATCH v3 6/6] x86/mm/KASLR: Do not adapt the size of the direct mapping section for SGI UV system Baoquan He
@ 2019-02-17  2:09   ` Baoquan He
  2019-02-18 19:24     ` Mike Travis
  0 siblings, 1 reply; 23+ messages in thread
From: Baoquan He @ 2019-02-17  2:09 UTC (permalink / raw)
  To: travis, mike.travis
  Cc: tglx, mingo, bp, hpa, dave.hansen, luto, peterz, x86, thgarnie,
	linux-kernel, keescook, akpm, yamada.masahiro, kirill

Hi Mike,

On 02/16/19 at 10:00pm, Baoquan He wrote:
> On SGI UV system, kernel often hangs when KASLR is enabled. Disabling
> KASLR makes kernel work well.

I wrapped the code which calculates the size of the direct mapping section
into a new function, calc_direct_mapping_size(), as Ingo suggested. This
code change has passed basic testing, but hasn't been tested on an SGI UV
machine by reproducing the bug first, since that needs a UV machine with a
large enough UV module installed.

To reproduce it, we can apply patches 0001~0005. Once reproduced, patch
0006 can be applied on top to check whether the bug is fixed. Please help
check whether the code is OK; if you have such a machine, I can run a test
on it.

Thanks
Baoquan

> 
> The back trace is:
> 
> kernel BUG at arch/x86/mm/init_64.c:311!
> invalid opcode: 0000 [#1] SMP
> [...]
> RIP: 0010:__init_extra_mapping+0x188/0x196
> [...]
> Call Trace:
>  init_extra_mapping_uc+0x13/0x15
>  map_high+0x67/0x75
>  map_mmioh_high_uv3+0x20a/0x219
>  uv_system_init_hub+0x12d9/0x1496
>  uv_system_init+0x27/0x29
>  native_smp_prepare_cpus+0x28d/0x2d8
>  kernel_init_freeable+0xdd/0x253
>  ? rest_init+0x80/0x80
>  kernel_init+0xe/0x110
>  ret_from_fork+0x2c/0x40
> 
> This is because the SGI UV system needs to map its MMIOH regions into the
> direct mapping section, and that mapping happens in rest_init(), which is
> much later than the call to kernel_randomize_memory() that does mm KASLR.
> So mm KASLR can't account for the size of the MMIOH regions when
> calculating the needed size of address space for the direct mapping
> section.
> 
> When KASLR is disabled, there is 64 TB of address space for both system RAM
> and the MMIOH regions to share. When KASLR is enabled, the current mm KASLR
> code only reserves the actual size of system RAM plus an extra 10 TB for
> the direct mapping. Thus the later MMIOH mapping could go beyond the upper
> bound of the direct mapping and step into the VMALLOC or VMEMMAP area,
> and then the BUG_ON() in __init_extra_mapping() will be triggered.
> 
> E.g. on the SGI UV3 machine where this bug was reported, there are two
> MMIOH regions:
> 
> [    1.519001] UV: Map MMIOH0_HI 0xffc00000000 - 0x100000000000
> [    1.523001] UV: Map MMIOH1_HI 0x100000000000 - 0x200000000000
> 
> They are [16TB-16G, 16TB) and [16TB, 32TB). On this machine, 512 GB of RAM
> is spread out across 1 TB regions. The above two SGI MMIOH regions will
> also be mapped into the direct mapping section.
> 
> To fix it, we need to check whether it's an SGI UV system by calling
> is_early_uv_system() in kernel_randomize_memory(). If it is, do not adapt
> the size of the direct mapping section, just keep it as is, e.g. 64 TB in
> 4-level paging mode.
> 
> Signed-off-by: Baoquan He <bhe@redhat.com>
> ---
>  arch/x86/mm/kaslr.c | 57 +++++++++++++++++++++++++++++++++------------
>  1 file changed, 42 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
> index ca12ed4e5239..754b5da91d43 100644
> --- a/arch/x86/mm/kaslr.c
> +++ b/arch/x86/mm/kaslr.c
> @@ -29,6 +29,7 @@
>  #include <asm/pgtable.h>
>  #include <asm/setup.h>
>  #include <asm/kaslr.h>
> +#include <asm/uv/uv.h>
>  
>  #include "mm_internal.h"
>  
> @@ -113,15 +114,51 @@ static inline bool kaslr_memory_enabled(void)
>  	return kaslr_enabled() && !IS_ENABLED(CONFIG_KASAN);
>  }
>  
> +/*
> + * Even though a huge virtual address space is reserved for the direct
> + * mapping of physical memory, e.g in 4-level pageing mode, it's 64TB,
> + * rare system can own enough physical memory to use it up, most are
> + * even less than 1TB. So with KASLR enabled, we adapt the size of
> + * direct mapping area to size of actual physical memory plus the
> + * configured padding CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING.
> + * The left part will be taken out to join memory randomization.
> + *
> + * Note that UV system is an exception, its MMIOH region need be mapped
> + * into the direct mapping area too, while the size can't be got until
> + * rest_init() calling. Hence for UV system, do not adapt the size
> + * of direct mapping area.
> + */
> +static inline unsigned long calc_direct_mapping_size(void)
> +{
> +	unsigned long size_tb, memory_tb;
> +
> +	/*
> +	 * Update Physical memory mapping to available and
> +	 * add padding if needed (especially for memory hotplug support).
> +	 */
> +	memory_tb = DIV_ROUND_UP(max_pfn << PAGE_SHIFT, 1UL << TB_SHIFT) +
> +		CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
> +
> +	size_tb = 1 << (MAX_PHYSMEM_BITS - TB_SHIFT);
> +
> +	/*
> +	 * Adapt phyiscal memory region size based on available memory if
> +	 * it's not UV system.
> +	 */
> +	if (memory_tb < size_tb && !is_early_uv_system())
> +		size_tb = memory_tb;
> +
> +	return size_tb;
> +}
> +
>  /* Initialize base and padding for each memory region randomized with KASLR */
>  void __init kernel_randomize_memory(void)
>  {
> -	size_t i;
> -	unsigned long vaddr_start, vaddr;
> -	unsigned long rand, memory_tb;
> -	struct rnd_state rand_state;
> +	unsigned long vaddr_start, vaddr, rand;
>  	unsigned long remain_entropy;
>  	unsigned long vmemmap_size;
> +	struct rnd_state rand_state;
> +	size_t i;
>  
>  	vaddr_start = pgtable_l5_enabled() ? __PAGE_OFFSET_BASE_L5 : __PAGE_OFFSET_BASE_L4;
>  	vaddr = vaddr_start;
> @@ -138,20 +175,10 @@ void __init kernel_randomize_memory(void)
>  	if (!kaslr_memory_enabled())
>  		return;
>  
> -	kaslr_regions[0].size_tb = 1 << (MAX_PHYSMEM_BITS - TB_SHIFT);
> +	kaslr_regions[0].size_tb = calc_direct_mapping_size();
>  	kaslr_regions[1].size_tb = VMALLOC_SIZE_TB;
>  
> -	/*
> -	 * Update Physical memory mapping to available and
> -	 * add padding if needed (especially for memory hotplug support).
> -	 */
>  	BUG_ON(kaslr_regions[0].base != &page_offset_base);
> -	memory_tb = DIV_ROUND_UP(max_pfn << PAGE_SHIFT, 1UL << TB_SHIFT) +
> -		CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
> -
> -	/* Adapt phyiscal memory region size based on available memory */
> -	if (memory_tb < kaslr_regions[0].size_tb)
> -		kaslr_regions[0].size_tb = memory_tb;
>  
>  	/*
>  	 * Calculate how many TB vmemmap region needs, and align to
> -- 
> 2.17.2
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 3/6] mm: Add build time sanity check for struct page size
  2019-02-16 14:00 ` [PATCH v3 3/6] mm: Add build time sanity check for struct page size Baoquan He
@ 2019-02-17 16:50   ` Kees Cook
  2019-02-18  8:07     ` Baoquan He
  0 siblings, 1 reply; 23+ messages in thread
From: Kees Cook @ 2019-02-17 16:50 UTC (permalink / raw)
  To: Baoquan He
  Cc: LKML, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	X86 ML, Mike Travis, Thomas Garnier, Andrew Morton,
	Masahiro Yamada, Kirill A. Shutemov

On Sat, Feb 16, 2019 at 6:02 AM Baoquan He <bhe@redhat.com> wrote:
>
> The size of struct page might be larger than 64 bytes if debug options
> are enabled, or if fields are added for debugging intentionally. Yet an
> upper limit should be enforced at build time to trigger an alert in case
> the size grows too big to boot the system, warning people to check in
> advance whether it was done on purpose.
>
> Here 1/4 of PAGE_SIZE is chosen as the limit, since the system must
> already be insane at that value. For systems with PAGE_SIZE larger than
> 4KB, 1KB is simply taken.
>
> Signed-off-by: Baoquan He <bhe@redhat.com>
> ---
>  mm/page_alloc.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 35fdde041f5c..eb6c8e22333b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -67,6 +67,7 @@
>  #include <linux/lockdep.h>
>  #include <linux/nmi.h>
>  #include <linux/psi.h>
> +#include <linux/sizes.h>
>
>  #include <asm/sections.h>
>  #include <asm/tlbflush.h>
> @@ -7084,6 +7085,7 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
>         unsigned long start_pfn, end_pfn;
>         int i, nid;
>
> +       BUILD_BUG_ON(sizeof(struct page) > min_t(size_t, SZ_1K, PAGE_SIZE));

Are there systems with PAGE_SIZE < 1K? Maybe this should just be a
direct SZ_1K check?
(Also, perhaps this should use the new static_assert where struct page
is defined?)

-Kees

>         /* Record where the zone boundaries are */
>         memset(arch_zone_lowest_possible_pfn, 0,
>                                 sizeof(arch_zone_lowest_possible_pfn));
> --
> 2.17.2
>


-- 
Kees Cook

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 4/6] x86/mm/KASLR: Fix the wrong calculation of memory region initial size
  2019-02-16 14:00 ` [PATCH v3 4/6] x86/mm/KASLR: Fix the wrong calculation of memory region initial size Baoquan He
@ 2019-02-17 16:53   ` Kees Cook
  2019-02-18  8:30     ` Baoquan He
  0 siblings, 1 reply; 23+ messages in thread
From: Kees Cook @ 2019-02-17 16:53 UTC (permalink / raw)
  To: Baoquan He
  Cc: LKML, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	X86 ML, Mike Travis, Thomas Garnier, Andrew Morton,
	Masahiro Yamada, Kirill A. Shutemov

On Sat, Feb 16, 2019 at 6:03 AM Baoquan He <bhe@redhat.com> wrote:
>
> In memory region KASLR, __PHYSICAL_MASK_SHIFT is taken to calculate
> the initial size of the direct mapping region. This is correct in
> the old code where __PHYSICAL_MASK_SHIFT was equal to MAX_PHYSMEM_BITS,
> 46 bits, and only 4-level mode was supported.
>
> Later, in commit b83ce5ee91471d ("x86/mm/64: Make __PHYSICAL_MASK_SHIFT
> always 52"), __PHYSICAL_MASK_SHIFT was changed to be always 52 bits, no
> matter it's 5-level or 4-level. This is wrong for 4-level paging. Then
> when we adapt physical memory region size based on available memory, it
> will overflow if the amount of system RAM and the padding is bigger
> than 64 TB.
>
> In fact, here MAX_PHYSMEM_BITS should be used instead. Fix it by
> replacing __PHYSICAL_MASK_SHIFT with MAX_PHYSMEM_BITS.
>
> Fixes: b83ce5ee9147 ("x86/mm/64: Make __PHYSICAL_MASK_SHIFT always 52")
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Reviewed-by: Thomas Garnier <thgarnie@google.com>
> Signed-off-by: Baoquan He <bhe@redhat.com>

Nice catch! I wish I had a system with >64TB RAM. ;)

Acked-by: Kees Cook <keescook@chromium.org>

> ---
>  arch/x86/mm/kaslr.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
> index bf680929fe26..97768df923e3 100644
> --- a/arch/x86/mm/kaslr.c
> +++ b/arch/x86/mm/kaslr.c
> @@ -137,7 +137,7 @@ void __init kernel_randomize_memory(void)
>         if (!kaslr_memory_enabled())
>                 return;
>
> -       kaslr_regions[0].size_tb = 1 << (__PHYSICAL_MASK_SHIFT - TB_SHIFT);
> +       kaslr_regions[0].size_tb = 1 << (MAX_PHYSMEM_BITS - TB_SHIFT);
>         kaslr_regions[1].size_tb = VMALLOC_SIZE_TB;
>
>         /*
> --
> 2.17.2
>


-- 
Kees Cook

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 1/6] x86/mm/KASLR: Improve code comments about struct kaslr_memory_region
  2019-02-16 14:00 ` [PATCH v3 1/6] x86/mm/KASLR: Improve code comments about struct kaslr_memory_region Baoquan He
@ 2019-02-17 17:07   ` Kees Cook
  2019-02-18  3:17     ` Baoquan He
  2019-03-12  0:55     ` Baoquan He
  0 siblings, 2 replies; 23+ messages in thread
From: Kees Cook @ 2019-02-17 17:07 UTC (permalink / raw)
  To: Baoquan He
  Cc: LKML, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	X86 ML, Mike Travis, Thomas Garnier, Andrew Morton,
	Masahiro Yamada, Kirill A. Shutemov

On Sat, Feb 16, 2019 at 6:01 AM Baoquan He <bhe@redhat.com> wrote:
>
> The old comment above kaslr_memory_region is not clear enough to explain
> the concepts of memory region KASLR.
>
> [Ingo suggested this and helped to prettify the text]
>
> Signed-off-by: Baoquan He <bhe@redhat.com>
> ---
>  arch/x86/mm/kaslr.c | 55 ++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 52 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
> index 3f452ffed7e9..d7c6e4e8e48e 100644
> --- a/arch/x86/mm/kaslr.c
> +++ b/arch/x86/mm/kaslr.c
> @@ -42,10 +42,59 @@
>  static const unsigned long vaddr_end = CPU_ENTRY_AREA_BASE;
>
>  /*
> - * Memory regions randomized by KASLR (except modules that use a separate logic
> - * earlier during boot). The list is ordered based on virtual addresses. This
> - * order is kept after randomization.
> + * 'struct kasl_memory_region' entries represent continuous chunks of

Typo: struct kaslr_memory_region

Also, while you're rewriting this, how about putting it in full
kern-doc format? (You're already using the "@field" style...) I think
you just need the "/**" header...

/**
 * struct name.... - short description...

> + * kernel virtual memory regions, to be randomized by KASLR.
> + *
> + * ( The exception is the module space virtual memory window which
> + *   uses separate logic earlier during bootup. )
> + *
> + * Currently there are three such regions: the physical memory mapping,
> + * vmalloc and vmemmap regions.
> + *
> + * The array below has the entries ordered based on virtual addresses.
> + * The order is kept after randomization, i.e. the randomized
> + * virtual addresses of these regions are still ascending.
> + *
> + * Here are the fields:
> + *
> + * @base: points to a global variable used by the MM to get the
> + * virtual base address of any of the above regions. This allows the
> + * early KASLR code to modify these base addresses early during bootup,
> + * on a per bootup basis, without the MM code even being aware of whether
> + * it got changed and to what value.
> + *
> + * When KASLR is active then the MM code makes sure that for each region
> + * there's such a single, dynamic, global base address 'unsigned long'
> + * variable available for the KASLR code to point to and modify directly:
> + *
> + *     { &page_offset_base, 0 },
> + *     { &vmalloc_base,     0 },
> + *     { &vmemmap_base,     1 },
> + *
> + * @size_tb: size in TB of each memory region. Thereinto, the size of

nit: "Thereinto" is odd. I'd say "Therefore".

> + * the physical memory mapping region is variable, calculated according
> + * to the actual size of system RAM in order to save more space for
> + * randomization. The rest are fixed values related to paging mode.
> + *
> + * @size_tb: is the size of each memory region after randomization, and
> + * its unit is TB.

Redundant lines?

> + *
> + * Physical memory mapping: (actual RAM size + 10 TB padding)
> + * Vmalloc: 32 TB
> + * Vmemmap: 1 TB
> + *
> + * When randomize the layout, their order are kept, still the physical
> + * memory mapping region is handled fistly, next vmalloc and vmemmap.

typo: "first"

> + * E.g the physical memory region, we limit the starting address to be
> + * taken from the 1st 1/3 part of the whole available virtual address
> + * space which is from 0xffff880000000000 to 0xfffffe0000000000, namely
> + * the original starting address of the physical memory mapping region
> + * to the starting address of cpu_entry_area mapping region. Once a random
> + * address is chosen for the physical memory mapping, we jump over the
> + * region and add 1G to begin the next region handling with the remaining
> + * available space.

Should the "operation" comments (rather than the struct field
comments) be moved to the start of the kernel_randomize_memory()
function instead?

-Kees

>   */
> +
>  static __initdata struct kaslr_memory_region {
>         unsigned long *base;
>         unsigned long size_tb;
> --
> 2.17.2
>


-- 
Kees Cook

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/6] x86/mm/KASLR: Open code unnecessary function get_padding
  2019-02-16 14:00 ` [PATCH v3 2/6] x86/mm/KASLR: Open code unnecessary function get_padding Baoquan He
@ 2019-02-17 17:14   ` Kees Cook
  0 siblings, 0 replies; 23+ messages in thread
From: Kees Cook @ 2019-02-17 17:14 UTC (permalink / raw)
  To: Baoquan He
  Cc: LKML, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	X86 ML, Mike Travis, Thomas Garnier, Andrew Morton,
	Masahiro Yamada, Kirill A. Shutemov

On Sat, Feb 16, 2019 at 6:02 AM Baoquan He <bhe@redhat.com> wrote:
>
> It's used only twice and we do bit shifts in the parent function
> anyway so it's not like it's hiding some uninteresting detail.
>
> Suggested-by: Ingo Molnar <mingo@kernel.org>
> Signed-off-by: Baoquan He <bhe@redhat.com>

Acked-by: Kees Cook <keescook@chromium.org>

-Kees

> ---
>  arch/x86/mm/kaslr.c | 10 ++--------
>  1 file changed, 2 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
> index d7c6e4e8e48e..bf680929fe26 100644
> --- a/arch/x86/mm/kaslr.c
> +++ b/arch/x86/mm/kaslr.c
> @@ -104,12 +104,6 @@ static __initdata struct kaslr_memory_region {
>         { &vmemmap_base, 1 },
>  };
>
> -/* Get size in bytes used by the memory region */
> -static inline unsigned long get_padding(struct kaslr_memory_region *region)
> -{
> -       return (region->size_tb << TB_SHIFT);
> -}
> -
>  /*
>   * Apply no randomization if KASLR was disabled at boot or if KASAN
>   * is enabled. KASAN shadow mappings rely on regions being PGD aligned.
> @@ -161,7 +155,7 @@ void __init kernel_randomize_memory(void)
>         /* Calculate entropy available between regions */
>         remain_entropy = vaddr_end - vaddr_start;
>         for (i = 0; i < ARRAY_SIZE(kaslr_regions); i++)
> -               remain_entropy -= get_padding(&kaslr_regions[i]);
> +               remain_entropy -= kaslr_regions[i].size_tb << TB_SHIFT;
>
>         prandom_seed_state(&rand_state, kaslr_get_random_long("Memory"));
>
> @@ -185,7 +179,7 @@ void __init kernel_randomize_memory(void)
>                  * Jump the region and add a minimum padding based on
>                  * randomization alignment.
>                  */
> -               vaddr += get_padding(&kaslr_regions[i]);
> +               vaddr += kaslr_regions[i].size_tb << TB_SHIFT;
>                 if (pgtable_l5_enabled())
>                         vaddr = round_up(vaddr + 1, P4D_SIZE);
>                 else
> --
> 2.17.2
>


-- 
Kees Cook

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 5/6] x86/mm/KASLR: Calculate the actual size of vmemmap region
  2019-02-16 14:00 ` [PATCH v3 5/6] x86/mm/KASLR: Calculate the actual size of vmemmap region Baoquan He
@ 2019-02-17 17:25   ` Kees Cook
  2019-02-18  9:50     ` Baoquan He
  0 siblings, 1 reply; 23+ messages in thread
From: Kees Cook @ 2019-02-17 17:25 UTC (permalink / raw)
  To: Baoquan He
  Cc: LKML, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	X86 ML, Mike Travis, Thomas Garnier, Andrew Morton,
	Masahiro Yamada, Kirill A. Shutemov

On Sat, Feb 16, 2019 at 6:04 AM Baoquan He <bhe@redhat.com> wrote:
>
> Vmemmap region has different maximum size depending on paging mode.
> Now its size is hardcoded as 1TB in memory KASLR, this is not
> right for 5-level paging mode. It will cause overflow if vmemmap
> region is randomized to be adjacent to cpu_entry_area region and
> its actual size is bigger than 1TB.
>
> So here calculate how many TB by the actual size of vmemmap region
> and align up to 1TB boundary.
>
> Signed-off-by: Baoquan He <bhe@redhat.com>
> ---
>  arch/x86/mm/kaslr.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
> index 97768df923e3..ca12ed4e5239 100644
> --- a/arch/x86/mm/kaslr.c
> +++ b/arch/x86/mm/kaslr.c
> @@ -101,7 +101,7 @@ static __initdata struct kaslr_memory_region {
>  } kaslr_regions[] = {
>         { &page_offset_base, 0 },
>         { &vmalloc_base, 0 },
> -       { &vmemmap_base, 1 },
> +       { &vmemmap_base, 0 },
>  };
>
>  /*
> @@ -121,6 +121,7 @@ void __init kernel_randomize_memory(void)
>         unsigned long rand, memory_tb;
>         struct rnd_state rand_state;
>         unsigned long remain_entropy;
> +       unsigned long vmemmap_size;
>
>         vaddr_start = pgtable_l5_enabled() ? __PAGE_OFFSET_BASE_L5 : __PAGE_OFFSET_BASE_L4;
>         vaddr = vaddr_start;
> @@ -152,6 +153,14 @@ void __init kernel_randomize_memory(void)
>         if (memory_tb < kaslr_regions[0].size_tb)
>                 kaslr_regions[0].size_tb = memory_tb;
>
> +       /*
> +        * Calculate how many TB vmemmap region needs, and align to
> +        * 1TB boundary.

Can you describe why this is the right calculation? (This will help
explain why 4-level is different from 5-level here.)

> +        */
> +       vmemmap_size = (kaslr_regions[0].size_tb << (TB_SHIFT - PAGE_SHIFT)) *
> +               sizeof(struct page);
> +       kaslr_regions[2].size_tb = DIV_ROUND_UP(vmemmap_size, 1UL << TB_SHIFT);
> +
>         /* Calculate entropy available between regions */
>         remain_entropy = vaddr_end - vaddr_start;
>         for (i = 0; i < ARRAY_SIZE(kaslr_regions); i++)
> --
> 2.17.2
>

Otherwise, yeah, looks great!

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 1/6] x86/mm/KASLR: Improve code comments about struct kaslr_memory_region
  2019-02-17 17:07   ` Kees Cook
@ 2019-02-18  3:17     ` Baoquan He
  2019-03-12  3:45       ` Baoquan He
  2019-03-12  0:55     ` Baoquan He
  1 sibling, 1 reply; 23+ messages in thread
From: Baoquan He @ 2019-02-18  3:17 UTC (permalink / raw)
  To: Kees Cook
  Cc: LKML, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	X86 ML, Mike Travis, Thomas Garnier, Andrew Morton,
	Masahiro Yamada, Kirill A. Shutemov

On 02/17/19 at 09:07am, Kees Cook wrote:
> > diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
> > index 3f452ffed7e9..d7c6e4e8e48e 100644
> > --- a/arch/x86/mm/kaslr.c
> > +++ b/arch/x86/mm/kaslr.c
> > @@ -42,10 +42,59 @@
> >  static const unsigned long vaddr_end = CPU_ENTRY_AREA_BASE;
> >
> >  /*
> > - * Memory regions randomized by KASLR (except modules that use a separate logic
> > - * earlier during boot). The list is ordered based on virtual addresses. This
> > - * order is kept after randomization.
> > + * 'struct kasl_memory_region' entries represent continuous chunks of
> 
> Typo: struct kaslr_memory_region

Will change.

Thanks for reviewing this patchset and for the great suggestions.

> 
> Also, while you're rewriting this, how about putting it in full
> kern-doc format? (You're already using the "@field" style...) I think
> you just need the "/**" header...

Sure, will update.

> 
> /**
>  * struct name.... - short description...
> 
> > + * kernel virtual memory regions, to be randomized by KASLR.
> > + *
> > + * ( The exception is the module space virtual memory window which
> > + *   uses separate logic earlier during bootup. )
> > + *
> > + * Currently there are three such regions: the physical memory mapping,
> > + * vmalloc and vmemmap regions.
> > + *
> > + * The array below has the entries ordered based on virtual addresses.
> > + * The order is kept after randomization, i.e. the randomized
> > + * virtual addresses of these regions are still ascending.
> > + *
> > + * Here are the fields:
> > + *
> > + * @base: points to a global variable used by the MM to get the
> > + * virtual base address of any of the above regions. This allows the
> > + * early KASLR code to modify these base addresses early during bootup,
> > + * on a per bootup basis, without the MM code even being aware of whether
> > + * it got changed and to what value.
> > + *
> > + * When KASLR is active then the MM code makes sure that for each region
> > + * there's such a single, dynamic, global base address 'unsigned long'
> > + * variable available for the KASLR code to point to and modify directly:
> > + *
> > + *     { &page_offset_base, 0 },
> > + *     { &vmalloc_base,     0 },
> > + *     { &vmemmap_base,     1 },
> > + *
> > + * @size_tb: size in TB of each memory region. Thereinto, the size of
> 
> nit: "Thereinto" is odd. I'd say "Therefore".

Will replace it with 'Therefore'.

> 
> > + * the physical memory mapping region is variable, calculated according
> > + * to the actual size of system RAM in order to save more space for
> > + * randomization. The rest are fixed values related to paging mode.
> > + *
> > + * @size_tb: is the size of each memory region after randomization, and
> > + * its unit is TB.
> 
> Redundant lines?

I added it on purpose to stress these regions and their sizes; I can
remove this line, or edit it like:


* @size_tb: is the size of each memory region after randomization, and
* its unit is TB:
*     Physical memory mapping: (actual RAM size + 10 TB padding)
*     Vmalloc: 32 TB
*     Vmemmap: 1 TB

> 
> > + *
> > + * Physical memory mapping: (actual RAM size + 10 TB padding)
> > + * Vmalloc: 32 TB
> > + * Vmemmap: 1 TB
> > + *
> > + * When randomize the layout, their order are kept, still the physical
> > + * memory mapping region is handled fistly, next vmalloc and vmemmap.
> 
> typo: "first"

Will change.

> 
> > + * E.g the physical memory region, we limit the starting address to be
> > + * taken from the 1st 1/3 part of the whole available virtual address
> > + * space which is from 0xffff880000000000 to 0xfffffe0000000000, namely
> > + * the original starting address of the physical memory mapping region
> > + * to the starting address of cpu_entry_area mapping region. Once a random
> > + * address is chosen for the physical memory mapping, we jump over the
> > + * region and add 1G to begin the next region handling with the remaining
> > + * available space.
> 
> Should the "operation" comments (rather than the struct field
> comments) be moved to the start of the kernel_randomize_memory()
> function instead?

This paragraph describes the order in which the regions are handled,
and incidentally gives an example to detail it. Since struct
kaslr_memory_region is the core and only data structure that memory
KASLR handles, I put it here.

> 
> -Kees
> 
> >   */
> > +
> >  static __initdata struct kaslr_memory_region {
> >         unsigned long *base;
> >         unsigned long size_tb;
> > --
> > 2.17.2
> >
> 
> 
> -- 
> Kees Cook

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 3/6] mm: Add build time sanity check for struct page size
  2019-02-17 16:50   ` Kees Cook
@ 2019-02-18  8:07     ` Baoquan He
  0 siblings, 0 replies; 23+ messages in thread
From: Baoquan He @ 2019-02-18  8:07 UTC (permalink / raw)
  To: Kees Cook, Kirill A. Shutemov
  Cc: LKML, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	X86 ML, Mike Travis, Thomas Garnier, Andrew Morton,
	Masahiro Yamada

On 02/17/19 at 08:50am, Kees Cook wrote:
> On Sat, Feb 16, 2019 at 6:02 AM Baoquan He <bhe@redhat.com> wrote:
> >
> > The size of struct page might be larger than 64 bytes if debug options
> > are enabled, or if fields are added for debugging intentionally. Yet an
> > upper limit should be enforced at build time to trigger an alert in case
> > the size grows too big to boot the system, warning people to check in
> > advance whether it was done on purpose.
> >
> > Here 1/4 of PAGE_SIZE is chosen as the limit, since the system must
> > already be insane at that value. For systems with PAGE_SIZE larger than
> > 4KB, 1KB is simply taken.
> >
> > Signed-off-by: Baoquan He <bhe@redhat.com>
> > ---
> >  mm/page_alloc.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 35fdde041f5c..eb6c8e22333b 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -67,6 +67,7 @@
> >  #include <linux/lockdep.h>
> >  #include <linux/nmi.h>
> >  #include <linux/psi.h>
> > +#include <linux/sizes.h>
> >
> >  #include <asm/sections.h>
> >  #include <asm/tlbflush.h>
> > @@ -7084,6 +7085,7 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
> >         unsigned long start_pfn, end_pfn;
> >         int i, nid;
> >
> > +       BUILD_BUG_ON(sizeof(struct page) > min_t(size_t, SZ_1K, PAGE_SIZE));
> 
> Are there systems with PAGE_SIZE < 1K? Maybe this should just be a
> direct SZ_1K check?

This check was suggested by Kirill; I forgot to add his "Suggested-by",
sorry about that.

Originally he suggested to add code in generic place like this:

	BUILD_BUG_ON(sizeof(struct page) < min(SZ_1K, PAGE_SIZE/4));

In a later post, the kbuild test robot reported a build error on the
i386 ARCH:

http://lkml.kernel.org/r/20180911074733.GX1740@192.168.1.3

From the report, I thought 'PAGE_SIZE/4' was also related to the macro
expansion, so I changed it to min_t(size_t, SZ_1K, PAGE_SIZE).

Just now I tried the build again, and changing it back to 'PAGE_SIZE/4'
also works:

	BUILD_BUG_ON(sizeof(struct page) > min_t(size_t, SZ_1K, PAGE_SIZE/4));

I guess Kirill wanted to make the check self-explanatory by suggesting it
as min(SZ_1K, PAGE_SIZE/4), to stress the 'PAGE_SIZE/4'. As he said in the
mail thread, "If struct page is more than 1/4 of PAGE_SIZE something is
horribly broken".

lkml.kernel.org/r/20180903102642.rmzawwqsqjvh2mkb@kshutemo-mobl1

I'm just trying to give more details about this addition, not to defend
it. I am fine with taking either of them.


> (Also, perhaps this should use the new static_assert where struct page
> is defined?)

I searched with 'git grep' and didn't find a static_assert macro or
function, and there's nothing in include/linux/mm_types.h either. Could
you please be more specific?

Thanks
Baoquan
> 
> >         /* Record where the zone boundaries are */
> >         memset(arch_zone_lowest_possible_pfn, 0,
> >                                 sizeof(arch_zone_lowest_possible_pfn));
> > --
> > 2.17.2
> >
> 
> 
> -- 
> Kees Cook

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 4/6] x86/mm/KASLR: Fix the wrong calculation of memory region initial size
  2019-02-17 16:53   ` Kees Cook
@ 2019-02-18  8:30     ` Baoquan He
  0 siblings, 0 replies; 23+ messages in thread
From: Baoquan He @ 2019-02-18  8:30 UTC (permalink / raw)
  To: Kees Cook
  Cc: LKML, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	X86 ML, Mike Travis, Thomas Garnier, Andrew Morton,
	Masahiro Yamada, Kirill A. Shutemov

On 02/17/19 at 08:53am, Kees Cook wrote:
> On Sat, Feb 16, 2019 at 6:03 AM Baoquan He <bhe@redhat.com> wrote:
> >
> > In memory region KASLR, __PHYSICAL_MASK_SHIFT is taken to calculate
> > the initial size of the direct mapping region. This is correct in
> > the old code where __PHYSICAL_MASK_SHIFT was equal to MAX_PHYSMEM_BITS,
> > 46 bits, and only 4-level mode was supported.
> >
> > Later, in commit b83ce5ee91471d ("x86/mm/64: Make __PHYSICAL_MASK_SHIFT
> > always 52"), __PHYSICAL_MASK_SHIFT was changed to be always 52 bits, no
> > matter it's 5-level or 4-level. This is wrong for 4-level paging. Then
> > when we adapt physical memory region size based on available memory, it
> > will overflow if the amount of system RAM and the padding is bigger
> > than 64 TB.
> >
> > In fact, here MAX_PHYSMEM_BITS should be used instead. Fix it by
> > replacing __PHYSICAL_MASK_SHIFT with MAX_PHYSMEM_BITS.
> >
> > Fixes: b83ce5ee9147 ("x86/mm/64: Make __PHYSICAL_MASK_SHIFT always 52")
> > Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > Reviewed-by: Thomas Garnier <thgarnie@google.com>
> > Signed-off-by: Baoquan He <bhe@redhat.com>
> 
> Nice catch! I wish I had a system with >64TB RAM. ;)
> 
> Acked-by: Kees Cook <keescook@chromium.org>

Thanks for reviewing and acking. I don't have a system with 64 TB of RAM
either; this fix came from code reading. In patch 0006, the UV system
issue was a serious regression when I introduced KASLR into RHEL. Even
though a RHEL-only fix has been merged into our distros, the tracker bug
which tracks the upstream fix comes back to me during the planning stage
of each RHEL version. After Kirill pushed the 5-level code, the SGI UV
developers said the old bug couldn't be reproduced any more on the
upstream kernel; I read the code and found that the code bug fixed in
this patch hides the SGI UV issue :-).

> 
> > ---
> >  arch/x86/mm/kaslr.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
> > index bf680929fe26..97768df923e3 100644
> > --- a/arch/x86/mm/kaslr.c
> > +++ b/arch/x86/mm/kaslr.c
> > @@ -137,7 +137,7 @@ void __init kernel_randomize_memory(void)
> >         if (!kaslr_memory_enabled())
> >                 return;
> >
> > -       kaslr_regions[0].size_tb = 1 << (__PHYSICAL_MASK_SHIFT - TB_SHIFT);
> > +       kaslr_regions[0].size_tb = 1 << (MAX_PHYSMEM_BITS - TB_SHIFT);
> >         kaslr_regions[1].size_tb = VMALLOC_SIZE_TB;
> >
> >         /*
> > --
> > 2.17.2
> >
> 
> 
> -- 
> Kees Cook

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 5/6] x86/mm/KASLR: Calculate the actual size of vmemmap region
  2019-02-17 17:25   ` Kees Cook
@ 2019-02-18  9:50     ` Baoquan He
  2019-02-18 10:09       ` Baoquan He
  2019-02-18 10:11       ` Baoquan He
  0 siblings, 2 replies; 23+ messages in thread
From: Baoquan He @ 2019-02-18  9:50 UTC (permalink / raw)
  To: Kees Cook
  Cc: LKML, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	X86 ML, Mike Travis, Thomas Garnier, Andrew Morton,
	Masahiro Yamada, Kirill A. Shutemov

On 02/17/19 at 09:25am, Kees Cook wrote:
> On Sat, Feb 16, 2019 at 6:04 AM Baoquan He <bhe@redhat.com> wrote:
> >
> > Vmemmap region has different maximum size depending on paging mode.
> > Now its size is hardcoded as 1TB in memory KASLR, this is not
> > right for 5-level paging mode. It will cause overflow if vmemmap
> > region is randomized to be adjacent to cpu_entry_area region and
> > its actual size is bigger than 1TB.
> >
> > So here calculate how many TB by the actual size of vmemmap region
> > and align up to 1TB boundary.
> >
> > Signed-off-by: Baoquan He <bhe@redhat.com>
> > ---
> >  arch/x86/mm/kaslr.c | 11 ++++++++++-
> >  1 file changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
> > index 97768df923e3..ca12ed4e5239 100644
> > --- a/arch/x86/mm/kaslr.c
> > +++ b/arch/x86/mm/kaslr.c
> > @@ -101,7 +101,7 @@ static __initdata struct kaslr_memory_region {
> >  } kaslr_regions[] = {
> >         { &page_offset_base, 0 },
> >         { &vmalloc_base, 0 },
> > -       { &vmemmap_base, 1 },
> > +       { &vmemmap_base, 0 },
> >  };
> >
> >  /*
> > @@ -121,6 +121,7 @@ void __init kernel_randomize_memory(void)
> >         unsigned long rand, memory_tb;
> >         struct rnd_state rand_state;
> >         unsigned long remain_entropy;
> > +       unsigned long vmemmap_size;
> >
> >         vaddr_start = pgtable_l5_enabled() ? __PAGE_OFFSET_BASE_L5 : __PAGE_OFFSET_BASE_L4;
> >         vaddr = vaddr_start;
> > @@ -152,6 +153,14 @@ void __init kernel_randomize_memory(void)
> >         if (memory_tb < kaslr_regions[0].size_tb)
> >                 kaslr_regions[0].size_tb = memory_tb;
> >
> > +       /*
> > +        * Calculate how many TB vmemmap region needs, and align to
> > +        * 1TB boundary.
> 
> Can you describe why this is the right calculation? (This will help
> explain why 4-level is different from 5-level here.)

In the old code, the size of vmemmap is hardcoded as 1 TB. This is true
in 4-level paging mode, 64 TB RAM supported at most, and usually
sizeof(struct page) is 64 Bytes, it happens to be 1 TB.
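
A quick back-of-the-envelope check of that 4-level figure, as a small
user-space sketch (assuming 4 KB pages and a 64-byte struct page):

  #include <stdio.h>

  int main(void)
  {
          unsigned long max_ram = 64UL << 40;     /* 64 TB, the 4-level maximum */
          unsigned long pages = max_ram >> 12;    /* number of 4 KB page frames */
          unsigned long vmemmap = pages * 64;     /* one 64-byte struct page each */

          printf("vmemmap needs %lu TB\n", vmemmap >> 40);        /* prints 1 */
          return 0;
  }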

However, in 5-level paging mode, 4 PB is the biggest RAM size we can
support, it's (4 PB)/64 == 1<<48, namely 256 TB area needed for vmemmap,
assuming sizeof(struct page) is 64 Bytes here.

So, the hardcoding of 1 TB is not correct for 5-level paging mode. 

Thanks
Baoquan

> 
> > +        */
> > +       vmemmap_size = (kaslr_regions[0].size_tb << (TB_SHIFT - PAGE_SHIFT)) *
> > +               sizeof(struct page);
> > +       kaslr_regions[2].size_tb = DIV_ROUND_UP(vmemmap_size, 1UL << TB_SHIFT);
> > +

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 5/6] x86/mm/KASLR: Calculate the actual size of vmemmap region
  2019-02-18  9:50     ` Baoquan He
@ 2019-02-18 10:09       ` Baoquan He
  2019-02-18 10:11       ` Baoquan He
  1 sibling, 0 replies; 23+ messages in thread
From: Baoquan He @ 2019-02-18 10:09 UTC (permalink / raw)
  To: Kees Cook
  Cc: LKML, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	X86 ML, Mike Travis, Thomas Garnier, Andrew Morton,
	Masahiro Yamada, Kirill A. Shutemov

On 02/18/19 at 05:50pm, Baoquan He wrote:
> On 02/17/19 at 09:25am, Kees Cook wrote:
> > On Sat, Feb 16, 2019 at 6:04 AM Baoquan He <bhe@redhat.com> wrote:
> > >
> > > Vmemmap region has different maximum size depending on paging mode.
> > > Now its size is hardcoded as 1TB in memory KASLR, this is not
> > > right for 5-level paging mode. It will cause overflow if vmemmap
> > > region is randomized to be adjacent to cpu_entry_area region and
> > > its actual size is bigger than 1TB.
> > >
> > > So here calculate how many TB by the actual size of vmemmap region
> > > and align up to 1TB boundary.
> > >
> > > Signed-off-by: Baoquan He <bhe@redhat.com>
> > > ---
> > >  arch/x86/mm/kaslr.c | 11 ++++++++++-
> > >  1 file changed, 10 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
> > > index 97768df923e3..ca12ed4e5239 100644
> > > --- a/arch/x86/mm/kaslr.c
> > > +++ b/arch/x86/mm/kaslr.c
> > > @@ -101,7 +101,7 @@ static __initdata struct kaslr_memory_region {
> > >  } kaslr_regions[] = {
> > >         { &page_offset_base, 0 },
> > >         { &vmalloc_base, 0 },
> > > -       { &vmemmap_base, 1 },
> > > +       { &vmemmap_base, 0 },
> > >  };
> > >
> > >  /*
> > > @@ -121,6 +121,7 @@ void __init kernel_randomize_memory(void)
> > >         unsigned long rand, memory_tb;
> > >         struct rnd_state rand_state;
> > >         unsigned long remain_entropy;
> > > +       unsigned long vmemmap_size;
> > >
> > >         vaddr_start = pgtable_l5_enabled() ? __PAGE_OFFSET_BASE_L5 : __PAGE_OFFSET_BASE_L4;
> > >         vaddr = vaddr_start;
> > > @@ -152,6 +153,14 @@ void __init kernel_randomize_memory(void)
> > >         if (memory_tb < kaslr_regions[0].size_tb)
> > >                 kaslr_regions[0].size_tb = memory_tb;
> > >
> > > +       /*
> > > +        * Calculate how many TB vmemmap region needs, and align to
> > > +        * 1TB boundary.
> > 
> > Can you describe why this is the right calculation? (This will help
> > explain why 4-level is different from 5-level here.)
> 
> In the old code, the size of vmemmap is hardcoded as 1 TB. This is true
> in 4-level paging mode, 64 TB RAM supported at most, and usually
> sizeof(struct page) is 64 Bytes, it happens to be 1 TB.
> 
> However, in 5-level paging mode, 4 PB is the biggest RAM size we can
> support, it's (4 PB)/64 == 1<<48, namely 256 TB area needed for vmemmap,
> assuming sizeof(struct page) is 64 Bytes here.
> 
> So, the hardcoding of 1 TB is not correct for 5-level paging mode. 
> 
> Thanks
> Baoquan
> 
> > 
> > > +        */
> > > +       vmemmap_size = (kaslr_regions[0].size_tb << (TB_SHIFT - PAGE_SHIFT)) *
> > > +               sizeof(struct page);
> > > +       kaslr_regions[2].size_tb = DIV_ROUND_UP(vmemmap_size, 1UL << TB_SHIFT);

I forgot to mention what this patch is trying to do. As you can see, we
just use the adjusted size of the direct mapping section, which includes
the possible later hotplugged memory region, to calculate the size
actually needed for vmemmap, then align it up to 1 TB. For 4-level paging
mode it is still 1 TB. For 5-level paging mode it will be smaller than
256 TB, which is the figure for the case of 4 PB of RAM installed.
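
To make that concrete, here is a minimal user-space sketch of the same
formula (not kernel code; it assumes 4 KB pages, a 64-byte struct page,
TB_SHIFT == 40, and a hypothetical 11 TB direct mapping region, i.e.
1 TB of RAM plus the default 10 TB padding):

  #include <stdio.h>

  #define DIV_ROUND_UP(n, d)      (((n) + (d) - 1) / (d))

  int main(void)
  {
          unsigned long tb_shift = 40, page_shift = 12, page_struct = 64;
          unsigned long size_tb = 11;     /* adjusted direct mapping size, in TB */

          /* bytes needed to hold one struct page per 4 KB page frame */
          unsigned long vmemmap_size =
                  (size_tb << (tb_shift - page_shift)) * page_struct;
          /* round up to a whole number of TB, as the patch does */
          unsigned long vmemmap_tb = DIV_ROUND_UP(vmemmap_size, 1UL << tb_shift);

          printf("vmemmap region needs %lu TB\n", vmemmap_tb);    /* prints 1 */
          return 0;
  }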

Thanks
Baoquan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 5/6] x86/mm/KASLR: Calculate the actual size of vmemmap region
  2019-02-18  9:50     ` Baoquan He
  2019-02-18 10:09       ` Baoquan He
@ 2019-02-18 10:11       ` Baoquan He
  1 sibling, 0 replies; 23+ messages in thread
From: Baoquan He @ 2019-02-18 10:11 UTC (permalink / raw)
  To: Kees Cook
  Cc: LKML, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	X86 ML, Mike Travis, Thomas Garnier, Andrew Morton,
	Masahiro Yamada, Kirill A. Shutemov

On 02/18/19 at 05:50pm, Baoquan He wrote:
> On 02/17/19 at 09:25am, Kees Cook wrote:
> > On Sat, Feb 16, 2019 at 6:04 AM Baoquan He <bhe@redhat.com> wrote:
> > >
> > > Vmemmap region has different maximum size depending on paging mode.
> > > Now its size is hardcoded as 1TB in memory KASLR, this is not
> > > right for 5-level paging mode. It will cause overflow if vmemmap
> > > region is randomized to be adjacent to cpu_entry_area region and
> > > its actual size is bigger than 1TB.
> > >
> > > So here calculate how many TB by the actual size of vmemmap region
> > > and align up to 1TB boundary.
> > >
> > > Signed-off-by: Baoquan He <bhe@redhat.com>
> > > ---
> > >  arch/x86/mm/kaslr.c | 11 ++++++++++-
> > >  1 file changed, 10 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
> > > index 97768df923e3..ca12ed4e5239 100644
> > > --- a/arch/x86/mm/kaslr.c
> > > +++ b/arch/x86/mm/kaslr.c
> > > @@ -101,7 +101,7 @@ static __initdata struct kaslr_memory_region {
> > >  } kaslr_regions[] = {
> > >         { &page_offset_base, 0 },
> > >         { &vmalloc_base, 0 },
> > > -       { &vmemmap_base, 1 },
> > > +       { &vmemmap_base, 0 },
> > >  };
> > >
> > >  /*
> > > @@ -121,6 +121,7 @@ void __init kernel_randomize_memory(void)
> > >         unsigned long rand, memory_tb;
> > >         struct rnd_state rand_state;
> > >         unsigned long remain_entropy;
> > > +       unsigned long vmemmap_size;
> > >
> > >         vaddr_start = pgtable_l5_enabled() ? __PAGE_OFFSET_BASE_L5 : __PAGE_OFFSET_BASE_L4;
> > >         vaddr = vaddr_start;
> > > @@ -152,6 +153,14 @@ void __init kernel_randomize_memory(void)
> > >         if (memory_tb < kaslr_regions[0].size_tb)
> > >                 kaslr_regions[0].size_tb = memory_tb;
> > >
> > > +       /*
> > > +        * Calculate how many TB vmemmap region needs, and align to
> > > +        * 1TB boundary.
> > 
> > Can you describe why this is the right calculation? (This will help
> > explain why 4-level is different from 5-level here.)
> 
> In the old code, the size of vmemmap is hardcoded as 1 TB. This is true
> in 4-level paging mode, 64 TB RAM supported at most, and usually
> sizeof(struct page) is 64 Bytes, it happens to be 1 TB.
> 
> However, in 5-level paging mode, 4 PB is the biggest RAM size we can
> support, it's (4 PB)/64 == 1<<48, namely 256 TB area needed for vmemmap,

Sorry, this should be (4 PB)/64 == 1<<46, i.e. 64 TB is the maximum size
needed for vmemmap.

> assuming sizeof(struct page) is 64 Bytes here.
> 
> So, the hardcoding of 1 TB is not correct for 5-level paging mode. 
> 
> Thanks
> Baoquan
> 
> > 
> > > +        */
> > > +       vmemmap_size = (kaslr_regions[0].size_tb << (TB_SHIFT - PAGE_SHIFT)) *
> > > +               sizeof(struct page);
> > > +       kaslr_regions[2].size_tb = DIV_ROUND_UP(vmemmap_size, 1UL << TB_SHIFT);
> > > +

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 6/6] x86/mm/KASLR: Do not adapt the size of the direct mapping section for SGI UV system
  2019-02-17  2:09   ` Baoquan He
@ 2019-02-18 19:24     ` Mike Travis
  2019-02-19  0:04       ` Baoquan He
  0 siblings, 1 reply; 23+ messages in thread
From: Mike Travis @ 2019-02-18 19:24 UTC (permalink / raw)
  To: Baoquan He, Mike Travis
  Cc: tglx, mingo, bp, hpa, dave.hansen, luto, peterz, x86, thgarnie,
	linux-kernel, keescook, akpm, yamada.masahiro, kirill,
	Russ Anderson, Dimitri Sivanich

Hi Baoquan,

There was a breakage in the TSC code (for UV) where the change was to 
introduce an early TSC "adjustment".  This bypassed the UV auto setting 
of TSC by BIOS because it came before the UV init code could indicate 
that the TSC was already in sync.  (I believe I already sent you a note 
about this?)

To fix this, I sent in two patches, the first had this 
is_early_uv_system() function defined and the second had the check to 
avoid adjusting TSC too early.

The commit referred to is:

> commit 20a8378aa9dd108a01cb0e695599f5257a885c4b
> Author: Mike Travis <mike.travis@hpe.com>
> Date:   Tue Oct 2 13:01:45 2018 -0500
> 
>     x86/platform/uv: Provide is_early_uv_system()
>     
>     Introduce is_early_uv_system() which uses efi.uv_systab to decide early
>     in the boot process whether the kernel is running on a UV system.
>     
>     This is needed to skip other early setup/init code that might break
>     the UV platform if done too early such as before necessary ACPI tables
>     parsing takes place.
>     

Check to see if that function is defined in the kernels you are pushing 
this patch to.

As to testing on a UV system, I can schedule the time to do that, just 
send me the details.

Thanks,
Mike

On 2/16/2019 6:09 PM, Baoquan He wrote:
> Hi Mike,
> 
> On 02/16/19 at 10:00pm, Baoquan He wrote:
>> On SGI UV system, kernel often hangs when KASLR is enabled. Disabling
>> KASLR makes kernel work well.
> 
> I wrap codes which calculate the size of the direct mapping section
> into a new function calc_direct_mapping_size() as Ingo suggested. This
> code change has passed basic testing, but hasn't been tested on a
> SGI UV machine after reproducing since it needs UV machine with UV
> module installed of enough size.
> 
> To reproduce it, we can apply patches 0001~0005. If reproduced, patch
> 0006 can be applied on top to check if bug is fixed. Please help check
> if the code is OK, if you have a machine, I can have a test.
> 
> Thanks
> Baoquan
> 
>>
>> The back trace is:
>>
>> kernel BUG at arch/x86/mm/init_64.c:311!
>> invalid opcode: 0000 [#1] SMP
>> [...]
>> RIP: 0010:__init_extra_mapping+0x188/0x196
>> [...]
>> Call Trace:
>>   init_extra_mapping_uc+0x13/0x15
>>   map_high+0x67/0x75
>>   map_mmioh_high_uv3+0x20a/0x219
>>   uv_system_init_hub+0x12d9/0x1496
>>   uv_system_init+0x27/0x29
>>   native_smp_prepare_cpus+0x28d/0x2d8
>>   kernel_init_freeable+0xdd/0x253
>>   ? rest_init+0x80/0x80
>>   kernel_init+0xe/0x110
>>   ret_from_fork+0x2c/0x40
>>
>> This is because the SGI UV system need map its MMIOH region to the direct
>> mapping section, and the mapping happens in rest_init() which is much
>> later than the calling of kernel_randomize_memory() to do mm KASLR. So
>> mm KASLR can't count in the size of the MMIOH region when calculate the
>> needed size of address space for the direct mapping section.
>>
>> When KASLR is disabled, there are 64TB address space for both system RAM
>> and the MMIOH regions to share. When KASLR is enabled, the current code
>> of mm KASLR only reserves the actual size of system RAM plus extra 10TB
>> for the direct mapping. Thus later the MMIOH mapping could go beyond
>> the upper bound of the direct mapping to step into VMALLOC or VMEMMAP area.
>> Then BUG_ON() in __init_extra_mapping() will be triggered.
>>
>> E.g on the SGI UV3 machine where this bug was reported , there are two
>> MMIOH regions:
>>
>> [    1.519001] UV: Map MMIOH0_HI 0xffc00000000 - 0x100000000000
>> [    1.523001] UV: Map MMIOH1_HI 0x100000000000 - 0x200000000000
>>
>> They are [16TB-16G, 16TB) and [16TB, 32TB). On this machine, 512G RAM are
>> spread out to 1TB regions. Then above two SGI MMIOH regions also will be
>> mapped into the direct mapping section.
>>
>> To fix it, we need check if it's SGI UV system by calling
>> is_early_uv_system() in kernel_randomize_memory(). If yes, do not adapt
>> thesize of the direct mapping section, just keep it as is, e.g in level-4
>> paging mode, 64TB.
>>
>> Signed-off-by: Baoquan He <bhe@redhat.com>
>> ---
>>   arch/x86/mm/kaslr.c | 57 +++++++++++++++++++++++++++++++++------------
>>   1 file changed, 42 insertions(+), 15 deletions(-)
>>
>> diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
>> index ca12ed4e5239..754b5da91d43 100644
>> --- a/arch/x86/mm/kaslr.c
>> +++ b/arch/x86/mm/kaslr.c
>> @@ -29,6 +29,7 @@
>>   #include <asm/pgtable.h>
>>   #include <asm/setup.h>
>>   #include <asm/kaslr.h>
>> +#include <asm/uv/uv.h>
>>   
>>   #include "mm_internal.h"
>>   
>> @@ -113,15 +114,51 @@ static inline bool kaslr_memory_enabled(void)
>>   	return kaslr_enabled() && !IS_ENABLED(CONFIG_KASAN);
>>   }
>>   
>> +/*
>> + * Even though a huge virtual address space is reserved for the direct
>> + * mapping of physical memory, e.g in 4-level pageing mode, it's 64TB,
>> + * rare system can own enough physical memory to use it up, most are
>> + * even less than 1TB. So with KASLR enabled, we adapt the size of
>> + * direct mapping area to size of actual physical memory plus the
>> + * configured padding CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING.
>> + * The left part will be taken out to join memory randomization.
>> + *
>> + * Note that UV system is an exception, its MMIOH region need be mapped
>> + * into the direct mapping area too, while the size can't be got until
>> + * rest_init() calling. Hence for UV system, do not adapt the size
>> + * of direct mapping area.
>> + */
>> +static inline unsigned long calc_direct_mapping_size(void)
>> +{
>> +	unsigned long size_tb, memory_tb;
>> +
>> +	/*
>> +	 * Update Physical memory mapping to available and
>> +	 * add padding if needed (especially for memory hotplug support).
>> +	 */
>> +	memory_tb = DIV_ROUND_UP(max_pfn << PAGE_SHIFT, 1UL << TB_SHIFT) +
>> +		CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
>> +
>> +	size_tb = 1 << (MAX_PHYSMEM_BITS - TB_SHIFT);
>> +
>> +	/*
>> +	 * Adapt phyiscal memory region size based on available memory if
>> +	 * it's not UV system.
>> +	 */
>> +	if (memory_tb < size_tb && !is_early_uv_system())
>> +		size_tb = memory_tb;
>> +
>> +	return size_tb;
>> +}
>> +
>>   /* Initialize base and padding for each memory region randomized with KASLR */
>>   void __init kernel_randomize_memory(void)
>>   {
>> -	size_t i;
>> -	unsigned long vaddr_start, vaddr;
>> -	unsigned long rand, memory_tb;
>> -	struct rnd_state rand_state;
>> +	unsigned long vaddr_start, vaddr, rand;
>>   	unsigned long remain_entropy;
>>   	unsigned long vmemmap_size;
>> +	struct rnd_state rand_state;
>> +	size_t i;
>>   
>>   	vaddr_start = pgtable_l5_enabled() ? __PAGE_OFFSET_BASE_L5 : __PAGE_OFFSET_BASE_L4;
>>   	vaddr = vaddr_start;
>> @@ -138,20 +175,10 @@ void __init kernel_randomize_memory(void)
>>   	if (!kaslr_memory_enabled())
>>   		return;
>>   
>> -	kaslr_regions[0].size_tb = 1 << (MAX_PHYSMEM_BITS - TB_SHIFT);
>> +	kaslr_regions[0].size_tb = calc_direct_mapping_size();
>>   	kaslr_regions[1].size_tb = VMALLOC_SIZE_TB;
>>   
>> -	/*
>> -	 * Update Physical memory mapping to available and
>> -	 * add padding if needed (especially for memory hotplug support).
>> -	 */
>>   	BUG_ON(kaslr_regions[0].base != &page_offset_base);
>> -	memory_tb = DIV_ROUND_UP(max_pfn << PAGE_SHIFT, 1UL << TB_SHIFT) +
>> -		CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
>> -
>> -	/* Adapt phyiscal memory region size based on available memory */
>> -	if (memory_tb < kaslr_regions[0].size_tb)
>> -		kaslr_regions[0].size_tb = memory_tb;
>>   
>>   	/*
>>   	 * Calculate how many TB vmemmap region needs, and align to
>> -- 
>> 2.17.2
>>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 6/6] x86/mm/KASLR: Do not adapt the size of the direct mapping section for SGI UV system
  2019-02-18 19:24     ` Mike Travis
@ 2019-02-19  0:04       ` Baoquan He
  0 siblings, 0 replies; 23+ messages in thread
From: Baoquan He @ 2019-02-19  0:04 UTC (permalink / raw)
  To: Mike Travis
  Cc: tglx, mingo, bp, hpa, dave.hansen, luto, peterz, x86, thgarnie,
	linux-kernel, keescook, akpm, yamada.masahiro, kirill,
	Russ Anderson, Dimitri Sivanich

On 02/18/19 at 11:24am, Mike Travis wrote:
> Hi Baoquan,
> 
> There was a breakage in the TSC code (for UV) where the change was to
> introduce an early TSC "adjustment".  This bypassed the UV auto setting of
> TSC by BIOS because it came before the UV init code could indicate that the
> TSC was already in sync.  (I believe I already sent you a note about this?)
> 
> To fix this, I sent in two patches, the first had this is_early_uv_system()
> function defined and the second had the check to avoid adjusting TSC too
> early.
> 
> The commit referred to is:
> 
> > commit 20a8378aa9dd108a01cb0e695599f5257a885c4b
> > Author: Mike Travis <mike.travis@hpe.com>
> > Date:   Tue Oct 2 13:01:45 2018 -0500
> > 
> >     x86/platform/uv: Provide is_early_uv_system()
> >     Introduce is_early_uv_system() which uses efi.uv_systab to decide early
> >     in the boot process whether the kernel is running on a UV system.
> >     This is needed to skip other early setup/init code that might break
> >     the UV platform if done too early such as before necessary ACPI tables
> >     parsing takes place.
> 
> Check to see if that function is defined in the kernels you are pushing this
> patch to.

Thanks, Mike. Yes, I received your mail before, so I use the
is_early_uv_system() function directly in this patch.

> 
> As to testing on a UV system, I can schedule the time to do that, just send
> me the details.

The bug is triggered when the UV MMIOH region reaches beyond

  "(the size of the actual RAM) + CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING"

Here CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING is 10 TB by default.

E.g. on a UV system with 1 TB of RAM, 1 TB + 10 TB = 11 TB. If the UV
MMIOH region is located beyond [0, 11TB], it may trigger the bug. I
suggest setting up a UV system like this:

The actual RAM can be just several GB; 20 GB is enough.

Then install the UV modules above 11 TB, e.g. if the size of the UV
region is 16 TB, you can put it in [16TB, 32TB].

We can discuss and decide the setting according to the UV module size.
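
To make the trigger condition concrete, a small user-space sketch (not
kernel code; the 20 GB of RAM and the MMIOH range ending at 32 TB are
just the hypothetical figures suggested above):

  #include <stdio.h>

  int main(void)
  {
          unsigned long tb = 1UL << 40;           /* bytes per TB */
          unsigned long ram = 20UL << 30;         /* ~20 GB of system RAM */
          unsigned long padding_tb = 10;          /* CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING */

          /* what mm KASLR reserves: RAM rounded up to 1 TB, plus the padding */
          unsigned long direct_map_end = ((ram + tb - 1) / tb + padding_tb) * tb;
          unsigned long mmioh_end = 32 * tb;      /* MMIOH mapped up to 32 TB */

          if (mmioh_end > direct_map_end)
                  printf("MMIOH ends above the shrunken direct mapping -> BUG_ON()\n");
          return 0;
  }

With patch 0006 applied (or with KASLR disabled), the direct mapping
keeps its full 64 TB on such a system, so the condition above would not
be hit.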

Thanks
Baoquan
> 
> 
> On 2/16/2019 6:09 PM, Baoquan He wrote:
> > Hi Mike,
> > 
> > On 02/16/19 at 10:00pm, Baoquan He wrote:
> > > On SGI UV system, kernel often hangs when KASLR is enabled. Disabling
> > > KASLR makes kernel work well.
> > 
> > I wrap codes which calculate the size of the direct mapping section
> > into a new function calc_direct_mapping_size() as Ingo suggested. This
> > code change has passed basic testing, but hasn't been tested on a
> > SGI UV machine after reproducing since it needs UV machine with UV
> > module installed of enough size.
> > 
> > To reproduce it, we can apply patches 0001~0005. If reproduced, patch
> > 0006 can be applied on top to check if bug is fixed. Please help check
> > if the code is OK, if you have a machine, I can have a test.
> > 
> > Thanks
> > Baoquan
> > 
> > > 
> > > The back trace is:
> > > 
> > > kernel BUG at arch/x86/mm/init_64.c:311!
> > > invalid opcode: 0000 [#1] SMP
> > > [...]
> > > RIP: 0010:__init_extra_mapping+0x188/0x196
> > > [...]
> > > Call Trace:
> > >   init_extra_mapping_uc+0x13/0x15
> > >   map_high+0x67/0x75
> > >   map_mmioh_high_uv3+0x20a/0x219
> > >   uv_system_init_hub+0x12d9/0x1496
> > >   uv_system_init+0x27/0x29
> > >   native_smp_prepare_cpus+0x28d/0x2d8
> > >   kernel_init_freeable+0xdd/0x253
> > >   ? rest_init+0x80/0x80
> > >   kernel_init+0xe/0x110
> > >   ret_from_fork+0x2c/0x40
> > > 
> > > This is because the SGI UV system need map its MMIOH region to the direct
> > > mapping section, and the mapping happens in rest_init() which is much
> > > later than the calling of kernel_randomize_memory() to do mm KASLR. So
> > > mm KASLR can't count in the size of the MMIOH region when calculate the
> > > needed size of address space for the direct mapping section.
> > > 
> > > When KASLR is disabled, there are 64TB address space for both system RAM
> > > and the MMIOH regions to share. When KASLR is enabled, the current code
> > > of mm KASLR only reserves the actual size of system RAM plus extra 10TB
> > > for the direct mapping. Thus later the MMIOH mapping could go beyond
> > > the upper bound of the direct mapping to step into VMALLOC or VMEMMAP area.
> > > Then BUG_ON() in __init_extra_mapping() will be triggered.
> > > 
> > > E.g on the SGI UV3 machine where this bug was reported , there are two
> > > MMIOH regions:
> > > 
> > > [    1.519001] UV: Map MMIOH0_HI 0xffc00000000 - 0x100000000000
> > > [    1.523001] UV: Map MMIOH1_HI 0x100000000000 - 0x200000000000
> > > 
> > > They are [16TB-16G, 16TB) and [16TB, 32TB). On this machine, 512G RAM are
> > > spread out to 1TB regions. Then above two SGI MMIOH regions also will be
> > > mapped into the direct mapping section.
> > > 
> > > To fix it, we need check if it's SGI UV system by calling
> > > is_early_uv_system() in kernel_randomize_memory(). If yes, do not adapt
> > > thesize of the direct mapping section, just keep it as is, e.g in level-4
> > > paging mode, 64TB.
> > > 
> > > Signed-off-by: Baoquan He <bhe@redhat.com>
> > > ---
> > >   arch/x86/mm/kaslr.c | 57 +++++++++++++++++++++++++++++++++------------
> > >   1 file changed, 42 insertions(+), 15 deletions(-)
> > > 
> > > diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
> > > index ca12ed4e5239..754b5da91d43 100644
> > > --- a/arch/x86/mm/kaslr.c
> > > +++ b/arch/x86/mm/kaslr.c
> > > @@ -29,6 +29,7 @@
> > >   #include <asm/pgtable.h>
> > >   #include <asm/setup.h>
> > >   #include <asm/kaslr.h>
> > > +#include <asm/uv/uv.h>
> > >   #include "mm_internal.h"
> > > @@ -113,15 +114,51 @@ static inline bool kaslr_memory_enabled(void)
> > >   	return kaslr_enabled() && !IS_ENABLED(CONFIG_KASAN);
> > >   }
> > > +/*
> > > + * Even though a huge virtual address space is reserved for the direct
> > > + * mapping of physical memory, e.g in 4-level pageing mode, it's 64TB,
> > > + * rare system can own enough physical memory to use it up, most are
> > > + * even less than 1TB. So with KASLR enabled, we adapt the size of
> > > + * direct mapping area to size of actual physical memory plus the
> > > + * configured padding CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING.
> > > + * The left part will be taken out to join memory randomization.
> > > + *
> > > + * Note that UV system is an exception, its MMIOH region need be mapped
> > > + * into the direct mapping area too, while the size can't be got until
> > > + * rest_init() calling. Hence for UV system, do not adapt the size
> > > + * of direct mapping area.
> > > + */
> > > +static inline unsigned long calc_direct_mapping_size(void)
> > > +{
> > > +	unsigned long size_tb, memory_tb;
> > > +
> > > +	/*
> > > +	 * Update Physical memory mapping to available and
> > > +	 * add padding if needed (especially for memory hotplug support).
> > > +	 */
> > > +	memory_tb = DIV_ROUND_UP(max_pfn << PAGE_SHIFT, 1UL << TB_SHIFT) +
> > > +		CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
> > > +
> > > +	size_tb = 1 << (MAX_PHYSMEM_BITS - TB_SHIFT);
> > > +
> > > +	/*
> > > +	 * Adapt phyiscal memory region size based on available memory if
> > > +	 * it's not UV system.
> > > +	 */
> > > +	if (memory_tb < size_tb && !is_early_uv_system())
> > > +		size_tb = memory_tb;
> > > +
> > > +	return size_tb;
> > > +}
> > > +
> > >   /* Initialize base and padding for each memory region randomized with KASLR */
> > >   void __init kernel_randomize_memory(void)
> > >   {
> > > -	size_t i;
> > > -	unsigned long vaddr_start, vaddr;
> > > -	unsigned long rand, memory_tb;
> > > -	struct rnd_state rand_state;
> > > +	unsigned long vaddr_start, vaddr, rand;
> > >   	unsigned long remain_entropy;
> > >   	unsigned long vmemmap_size;
> > > +	struct rnd_state rand_state;
> > > +	size_t i;
> > >   	vaddr_start = pgtable_l5_enabled() ? __PAGE_OFFSET_BASE_L5 : __PAGE_OFFSET_BASE_L4;
> > >   	vaddr = vaddr_start;
> > > @@ -138,20 +175,10 @@ void __init kernel_randomize_memory(void)
> > >   	if (!kaslr_memory_enabled())
> > >   		return;
> > > -	kaslr_regions[0].size_tb = 1 << (MAX_PHYSMEM_BITS - TB_SHIFT);
> > > +	kaslr_regions[0].size_tb = calc_direct_mapping_size();
> > >   	kaslr_regions[1].size_tb = VMALLOC_SIZE_TB;
> > > -	/*
> > > -	 * Update Physical memory mapping to available and
> > > -	 * add padding if needed (especially for memory hotplug support).
> > > -	 */
> > >   	BUG_ON(kaslr_regions[0].base != &page_offset_base);
> > > -	memory_tb = DIV_ROUND_UP(max_pfn << PAGE_SHIFT, 1UL << TB_SHIFT) +
> > > -		CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
> > > -
> > > -	/* Adapt phyiscal memory region size based on available memory */
> > > -	if (memory_tb < kaslr_regions[0].size_tb)
> > > -		kaslr_regions[0].size_tb = memory_tb;
> > >   	/*
> > >   	 * Calculate how many TB vmemmap region needs, and align to
> > > -- 
> > > 2.17.2
> > > 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 1/6] x86/mm/KASLR: Improve code comments about struct kaslr_memory_region
  2019-02-17 17:07   ` Kees Cook
  2019-02-18  3:17     ` Baoquan He
@ 2019-03-12  0:55     ` Baoquan He
  1 sibling, 0 replies; 23+ messages in thread
From: Baoquan He @ 2019-03-12  0:55 UTC (permalink / raw)
  To: Kees Cook
  Cc: LKML, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	X86 ML, Mike Travis, Thomas Garnier, Andrew Morton,
	Masahiro Yamada, Kirill A. Shutemov

On 02/17/19 at 09:07am, Kees Cook wrote:
> > + * E.g the physical memory region, we limit the starting address to be
> > + * taken from the 1st 1/3 part of the whole available virtual address
> > + * space which is from 0xffff880000000000 to 0xfffffe0000000000, namely
> > + * the original starting address of the physical memory mapping region
> > + * to the starting address of cpu_entry_area mapping region. Once a random
> > + * address is chosen for the physical memory mapping, we jump over the
> > + * region and add 1G to begin the next region handling with the remaining
> > + * available space.
> 
> Should the "operation" comments (rather than the struct field
> comments) be moved to the start of the kernel_randomize_memory()
> function instead?

Rethinking about this, I think you are right. This paragraph had better
be moved above kernel_randomize_memory(), to explain its behaviour.
Will update and repost. Thanks.

Thanks
Baoquan

> 
> >   */
> > +
> >  static __initdata struct kaslr_memory_region {
> >         unsigned long *base;
> >         unsigned long size_tb;
> > --
> > 2.17.2
> >
> 
> 
> -- 
> Kees Cook

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 1/6] x86/mm/KASLR: Improve code comments about struct kaslr_memory_region
  2019-02-18  3:17     ` Baoquan He
@ 2019-03-12  3:45       ` Baoquan He
  0 siblings, 0 replies; 23+ messages in thread
From: Baoquan He @ 2019-03-12  3:45 UTC (permalink / raw)
  To: Kees Cook
  Cc: LKML, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	X86 ML, Mike Travis, Thomas Garnier, Andrew Morton,
	Masahiro Yamada, Kirill A. Shutemov

On 02/18/19 at 11:17am, Baoquan He wrote:
> > > + * When KASLR is active then the MM code makes sure that for each region
> > > + * there's such a single, dynamic, global base address 'unsigned long'
> > > + * variable available for the KASLR code to point to and modify directly:
> > > + *
> > > + *     { &page_offset_base, 0 },
> > > + *     { &vmalloc_base,     0 },
> > > + *     { &vmemmap_base,     1 },
> > > + *
> > > + * @size_tb: size in TB of each memory region. Thereinto, the size of
> > 
> > nit: "Thereinto" is odd. I'd say "Therefore".


Rechecking this, there might be a misunderstanding. With 'Thereinto' I
wanted to say that, among the sizes of these three memory regions, the
size of the physical memory mapping region is variable. I just looked it
up in a dictionary and it seems right. 'Therefore' is equivalent to
'Hence', which is not what is intended here. Am I right?
> 
> Will replace it with 'Therefore'.
> 
> > 
> > > + * the physical memory mapping region is variable, calculated according
> > > + * to the actual size of system RAM in order to save more space for
> > > + * randomization. The rest are fixed values related to paging mode.
> > > + *
> > > + * @size_tb: is the size of each memory region after randomization, and
> > > + * its unit is TB.

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2019-03-12  3:45 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-16 14:00 [PATCH v3 0/6] Several patches to fix code bugs, improve documents and clean up Baoquan He
2019-02-16 14:00 ` [PATCH v3 1/6] x86/mm/KASLR: Improve code comments about struct kaslr_memory_region Baoquan He
2019-02-17 17:07   ` Kees Cook
2019-02-18  3:17     ` Baoquan He
2019-03-12  3:45       ` Baoquan He
2019-03-12  0:55     ` Baoquan He
2019-02-16 14:00 ` [PATCH v3 2/6] x86/mm/KASLR: Open code unnecessary function get_padding Baoquan He
2019-02-17 17:14   ` Kees Cook
2019-02-16 14:00 ` [PATCH v3 3/6] mm: Add build time sanity check for struct page size Baoquan He
2019-02-17 16:50   ` Kees Cook
2019-02-18  8:07     ` Baoquan He
2019-02-16 14:00 ` [PATCH v3 4/6] x86/mm/KASLR: Fix the wrong calculation of memory region initial size Baoquan He
2019-02-17 16:53   ` Kees Cook
2019-02-18  8:30     ` Baoquan He
2019-02-16 14:00 ` [PATCH v3 5/6] x86/mm/KASLR: Calculate the actual size of vmemmap region Baoquan He
2019-02-17 17:25   ` Kees Cook
2019-02-18  9:50     ` Baoquan He
2019-02-18 10:09       ` Baoquan He
2019-02-18 10:11       ` Baoquan He
2019-02-16 14:00 ` [PATCH v3 6/6] x86/mm/KASLR: Do not adapt the size of the direct mapping section for SGI UV system Baoquan He
2019-02-17  2:09   ` Baoquan He
2019-02-18 19:24     ` Mike Travis
2019-02-19  0:04       ` Baoquan He

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.