* [PATCH 0/4] Boot-time switching between 4- and 5-level paging for 4.15, Part 2
@ 2017-10-20 19:59 ` Kirill A. Shutemov
From: Kirill A. Shutemov @ 2017-10-20 19:59 UTC
  To: Ingo Molnar, Linus Torvalds, x86, Thomas Gleixner, H. Peter Anvin
  Cc: Andy Lutomirski, Cyrill Gorcunov, Borislav Petkov, Andi Kleen,
	linux-mm, linux-kernel, Kirill A. Shutemov

Hi Ingo,

Here's the second bunch of patches that prepare the kernel for boot-time
switching between paging modes.

It's a small one. I hope we can get it in quickly. :)

I include the zsmalloc patch again. We need something to address the
issue. If we find a better solution later, we can come back to the topic
and rework it.

Apart from the zsmalloc patch, the patchset includes changes to the
decompression code. I have reworked these patches. They are not split
exactly the way you described before, but I hope the result is sensible
anyway.

Please review and consider applying.

Kirill A. Shutemov (4):
  mm/zsmalloc: Prepare to variable MAX_PHYSMEM_BITS
  x86/boot/compressed/64: Detect and handle 5-level paging at boot-time
  x86/boot/compressed/64: Introduce place_trampoline()
  x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G

 arch/x86/boot/compressed/head_64.S          | 99 ++++++++++++++++++++---------
 arch/x86/boot/compressed/pagetable.c        | 61 ++++++++++++++++++
 arch/x86/boot/compressed/pagetable.h        | 18 ++++++
 arch/x86/include/asm/pgtable-3level_types.h |  1 +
 arch/x86/include/asm/pgtable_64_types.h     |  2 +
 mm/zsmalloc.c                               | 13 ++--
 6 files changed, 158 insertions(+), 36 deletions(-)
 create mode 100644 arch/x86/boot/compressed/pagetable.h

-- 
2.14.2

* [PATCH 1/4] mm/zsmalloc: Prepare to variable MAX_PHYSMEM_BITS
@ 2017-10-20 19:59 ` Kirill A. Shutemov
From: Kirill A. Shutemov @ 2017-10-20 19:59 UTC
  To: Ingo Molnar, Linus Torvalds, x86, Thomas Gleixner, H. Peter Anvin
  Cc: Andy Lutomirski, Cyrill Gorcunov, Borislav Petkov, Andi Kleen,
	linux-mm, linux-kernel, Kirill A. Shutemov, Minchan Kim,
	Nitin Gupta, Sergey Senozhatsky

With boot-time switching between paging modes, MAX_PHYSMEM_BITS will
become a variable.

Let's use the maximum value MAX_PHYSMEM_BITS can have for the
CONFIG_X86_5LEVEL=y configuration to define the zsmalloc data structures.

The patch introduces MAX_POSSIBLE_PHYSMEM_BITS to cover this case.
It also suits well to handle the PAE special case.
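
To illustrate the problem this avoids (a sketch based on the next part of
the patchset, where MAX_PHYSMEM_BITS becomes (pgtable_l5_enabled ? 52 : 46)):
the chain _PFN_BITS -> OBJ_INDEX_BITS -> ZS_MIN_ALLOC_SIZE ->
ZS_SIZE_CLASSES then stops being a compile-time constant, and the
file-scope array in mm/zsmalloc.c

	struct size_class *size_class[ZS_SIZE_CLASSES];

becomes a variably modified type, which the compiler rejects. Sizing the
data structures with a compile-time maximum keeps the array bound constant.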

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
---
 arch/x86/include/asm/pgtable-3level_types.h |  1 +
 arch/x86/include/asm/pgtable_64_types.h     |  2 ++
 mm/zsmalloc.c                               | 13 +++++++------
 3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/pgtable-3level_types.h b/arch/x86/include/asm/pgtable-3level_types.h
index b8a4341faafa..3fe1d107a875 100644
--- a/arch/x86/include/asm/pgtable-3level_types.h
+++ b/arch/x86/include/asm/pgtable-3level_types.h
@@ -43,5 +43,6 @@ typedef union {
  */
 #define PTRS_PER_PTE	512
 
+#define MAX_POSSIBLE_PHYSMEM_BITS	36
 
 #endif /* _ASM_X86_PGTABLE_3LEVEL_DEFS_H */
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 06470da156ba..39075df30b8a 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -39,6 +39,8 @@ typedef struct { pteval_t pte; } pte_t;
 #define P4D_SIZE	(_AC(1, UL) << P4D_SHIFT)
 #define P4D_MASK	(~(P4D_SIZE - 1))
 
+#define MAX_POSSIBLE_PHYSMEM_BITS	52
+
 #else /* CONFIG_X86_5LEVEL */
 
 /*
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 7c38e850a8fc..7bde01c55c90 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -82,18 +82,19 @@
  * This is made more complicated by various memory models and PAE.
  */
 
-#ifndef MAX_PHYSMEM_BITS
-#ifdef CONFIG_HIGHMEM64G
-#define MAX_PHYSMEM_BITS 36
-#else /* !CONFIG_HIGHMEM64G */
+#ifndef MAX_POSSIBLE_PHYSMEM_BITS
+#ifdef MAX_PHYSMEM_BITS
+#define MAX_POSSIBLE_PHYSMEM_BITS MAX_PHYSMEM_BITS
+#else
 /*
  * If this definition of MAX_PHYSMEM_BITS is used, OBJ_INDEX_BITS will just
  * be PAGE_SHIFT
  */
-#define MAX_PHYSMEM_BITS BITS_PER_LONG
+#define MAX_POSSIBLE_PHYSMEM_BITS BITS_PER_LONG
 #endif
 #endif
-#define _PFN_BITS		(MAX_PHYSMEM_BITS - PAGE_SHIFT)
+
+#define _PFN_BITS		(MAX_POSSIBLE_PHYSMEM_BITS - PAGE_SHIFT)
 
 /*
  * Memory for allocating for handle keeps object position by
-- 
2.14.2

* [PATCH 2/4] x86/boot/compressed/64: Detect and handle 5-level paging at boot-time
@ 2017-10-20 19:59 ` Kirill A. Shutemov
From: Kirill A. Shutemov @ 2017-10-20 19:59 UTC
  To: Ingo Molnar, Linus Torvalds, x86, Thomas Gleixner, H. Peter Anvin
  Cc: Andy Lutomirski, Cyrill Gorcunov, Borislav Petkov, Andi Kleen,
	linux-mm, linux-kernel, Kirill A. Shutemov

This patch prepares the decompression code for boot-time switching
between 4- and 5-level paging.
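
For reference, the check added below boils down to testing
CPUID.(EAX=7,ECX=0):ECX bit 16; the "X86_FEATURE_LA57 & 31" expression
extracts that bit number from the kernel's feature-word encoding. A
stand-alone sketch of the same test ("la57_supported" is an illustrative
name; the patch additionally verifies that leaf 7 exists first):

	static int la57_supported(void)
	{
		unsigned int eax = 7, ebx = 0, ecx = 0, edx = 0;

		/* CPUID leaf 7, subleaf 0: structured extended features */
		native_cpuid(&eax, &ebx, &ecx, &edx);

		return !!(ecx & (1 << 16));	/* ECX bit 16 == LA57 */
	}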

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/boot/compressed/head_64.S   | 16 ++++++++++++----
 arch/x86/boot/compressed/pagetable.c | 19 +++++++++++++++++++
 2 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index b4a5d284391c..6ac8239af2b6 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -288,10 +288,18 @@ ENTRY(startup_64)
 	leaq	boot_stack_end(%rbx), %rsp
 
 #ifdef CONFIG_X86_5LEVEL
-	/* Check if 5-level paging has already enabled */
-	movq	%cr4, %rax
-	testl	$X86_CR4_LA57, %eax
-	jnz	lvl5
+	/*
+	 * Check if we need to enable 5-level paging.
+	 * RSI holds real mode data and needs to be preserved across
+	 * a function call.
+	 */
+	pushq	%rsi
+	call	need_to_enabled_l5
+	popq	%rsi
+
+	/* If need_to_enabled_l5() returned zero, we're done here. */
+	cmpq	$0, %rax
+	je	lvl5
 
 	/*
 	 * At this point we are in long mode with 4-level paging enabled,
diff --git a/arch/x86/boot/compressed/pagetable.c b/arch/x86/boot/compressed/pagetable.c
index f1aa43854bed..76d25a82e3ac 100644
--- a/arch/x86/boot/compressed/pagetable.c
+++ b/arch/x86/boot/compressed/pagetable.c
@@ -149,3 +149,22 @@ void finalize_identity_maps(void)
 {
 	write_cr3(top_level_pgt);
 }
+
+#ifdef CONFIG_X86_5LEVEL
+int need_to_enabled_l5(void)
+{
+	/* Check if leaf 7 is supported. */
+	if (native_cpuid_eax(0) < 7)
+		return 0;
+
+	/* Check if la57 is supported. */
+	if (!(native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31))))
+		return 0;
+
+	/* Check if 5-level paging has already been enabled. */
+	if (native_read_cr4() & X86_CR4_LA57)
+		return 0;
+
+	return 1;
+}
+#endif
-- 
2.14.2

* [PATCH 3/4] x86/boot/compressed/64: Introduce place_trampoline()
@ 2017-10-20 19:59 ` Kirill A. Shutemov
From: Kirill A. Shutemov @ 2017-10-20 19:59 UTC
  To: Ingo Molnar, Linus Torvalds, x86, Thomas Gleixner, H. Peter Anvin
  Cc: Andy Lutomirski, Cyrill Gorcunov, Borislav Petkov, Andi Kleen,
	linux-mm, linux-kernel, Kirill A. Shutemov

If the bootloader enables 64-bit mode with 4-level paging, we need to
switch over to 5-level paging. The switch requires paging to be
disabled. That works fine if the kernel itself is loaded below 4G.

If the bootloader put the kernel above 4G (not sure if anybody does
this), we would lose control as soon as paging is disabled, because the
code becomes unreachable.

To handle the situation, we need a trampoline in lower memory that takes
care of switching on 5-level paging.

Apart from the trampoline code itself, we also need a place to store the
top-level page table in lower memory, as we don't have a way to load a
64-bit value into CR3 from 32-bit mode. We only really need 8 bytes
there, as we only use the very first entry of the page table, but we
allocate a whole page anyway. We cannot have the code in the same page,
because there's a hazard that a CPU would read the page table
speculatively and get confused seeing garbage.

This patch introduces place_trampoline(), which finds the right spot in
lower memory for the trampoline, copies the trampoline code there and
sets up a new top-level page table for 5-level paging.

At this point we do all the preparation, but do not use the trampoline
yet. That will be done in the following patch.
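
For orientation, the trampoline occupies two pages, laid out as defined
in the new pagetable.h below (offsets assume the x86 PAGE_SIZE of 4K):

	offset 0x0000 (LVL5_TRAMPOLINE_PGTABLE_OFF): top-level page table;
		only entry 0 is used
	offset 0x1000 (LVL5_TRAMPOLINE_CODE_OFF): 32-bit trampoline code,
		at most LVL5_TRAMPOLINE_CODE_SIZE (0x40) bytes
	offset 0x2000 (LVL5_TRAMPOLINE_STACK_END): end of trampoline memory;
		the trampoline stack grows down from here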

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/boot/compressed/head_64.S   | 13 +++++++++++
 arch/x86/boot/compressed/pagetable.c | 42 ++++++++++++++++++++++++++++++++++++
 arch/x86/boot/compressed/pagetable.h | 18 ++++++++++++++++
 3 files changed, 73 insertions(+)
 create mode 100644 arch/x86/boot/compressed/pagetable.h

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 6ac8239af2b6..4d1555b39de0 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -315,6 +315,18 @@ ENTRY(startup_64)
 	 * The first step is go into compatibility mode.
 	 */
 
+	/*
+	 * Find a suitable place for the trampoline and populate it.
+	 * The address will be stored in RCX.
+	 *
+	 * RSI holds real mode data and needs to be preserved across
+	 * a function call.
+	 */
+	pushq	%rsi
+	call	place_trampoline
+	popq	%rsi
+	movq	%rax, %rcx
+
 	/* Clear additional page table */
 	leaq	lvl5_pgtable(%rbx), %rdi
 	xorq	%rax, %rax
@@ -474,6 +486,7 @@ relocated:
 
 	.code32
 #ifdef CONFIG_X86_5LEVEL
+ENTRY(lvl5_trampoline_src)
 compatible_mode:
 	/* Setup data and stack segments */
 	movl	$__KERNEL_DS, %eax
diff --git a/arch/x86/boot/compressed/pagetable.c b/arch/x86/boot/compressed/pagetable.c
index 76d25a82e3ac..a04fb69a453f 100644
--- a/arch/x86/boot/compressed/pagetable.c
+++ b/arch/x86/boot/compressed/pagetable.c
@@ -23,6 +23,8 @@
 #undef CONFIG_AMD_MEM_ENCRYPT
 
 #include "misc.h"
+#include "pagetable.h"
+#include "../string.h"
 
 /* These actually do the work of building the kernel identity maps. */
 #include <asm/init.h>
@@ -167,4 +169,44 @@ int need_to_enabled_l5(void)
 
 	return 1;
 }
+
+#define BIOS_START_MIN		0x20000U	/* 128K, less than this is insane */
+#define BIOS_START_MAX		0x9f000U	/* 640K, absolute maximum */
+
+unsigned long *place_trampoline(void)
+{
+	unsigned long bios_start, ebda_start, trampoline_start, *trampoline;
+
+	/* Based on reserve_bios_regions() */
+
+	ebda_start = *(unsigned short *)0x40e << 4;
+	bios_start = *(unsigned short *)0x413 << 10;
+
+	if (bios_start < BIOS_START_MIN || bios_start > BIOS_START_MAX)
+		bios_start = BIOS_START_MAX;
+
+	if (ebda_start > BIOS_START_MIN && ebda_start < bios_start)
+		bios_start = ebda_start;
+
+	/* Place trampoline below end of low memory, aligned to 4k */
+	trampoline_start = bios_start - LVL5_TRAMPOLINE_SIZE;
+	trampoline_start = round_down(trampoline_start, PAGE_SIZE);
+
+	trampoline = (unsigned long *)trampoline_start;
+
+	/* Clear trampoline memory first */
+	memset(trampoline, 0, LVL5_TRAMPOLINE_SIZE);
+
+	/* Copy trampoline code in place */
+	memcpy(trampoline + LVL5_TRAMPOLINE_CODE_OFF / sizeof(unsigned long),
+			&lvl5_trampoline_src, LVL5_TRAMPOLINE_CODE_SIZE);
+
+	/*
+	 * Set up the current CR3 value as the first and only entry in the new
+	 * top-level page table.
+	 */
+	trampoline[0] = __read_cr3() + _PAGE_TABLE_NOENC;
+
+	return trampoline;
+}
 #endif
diff --git a/arch/x86/boot/compressed/pagetable.h b/arch/x86/boot/compressed/pagetable.h
new file mode 100644
index 000000000000..906436cc1c02
--- /dev/null
+++ b/arch/x86/boot/compressed/pagetable.h
@@ -0,0 +1,18 @@
+#ifndef BOOT_COMPRESSED_PAGETABLE_H
+#define BOOT_COMPRESSED_PAGETABLE_H
+
+#define LVL5_TRAMPOLINE_SIZE		(2 * PAGE_SIZE)
+
+#define LVL5_TRAMPOLINE_PGTABLE_OFF	0
+
+#define LVL5_TRAMPOLINE_CODE_OFF	PAGE_SIZE
+#define LVL5_TRAMPOLINE_CODE_SIZE	0x40
+
+#define LVL5_TRAMPOLINE_STACK_END	LVL5_TRAMPOLINE_SIZE
+
+#ifndef __ASSEMBLER__
+
+extern void (*lvl5_trampoline_src)(void *return_ptr);
+
+#endif /* __ASSEMBLER__ */
+#endif /* BOOT_COMPRESSED_PAGETABLE_H */
-- 
2.14.2

* [PATCH 4/4] x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G
@ 2017-10-20 19:59 ` Kirill A. Shutemov
From: Kirill A. Shutemov @ 2017-10-20 19:59 UTC
  To: Ingo Molnar, Linus Torvalds, x86, Thomas Gleixner, H. Peter Anvin
  Cc: Andy Lutomirski, Cyrill Gorcunov, Borislav Petkov, Andi Kleen,
	linux-mm, linux-kernel, Kirill A. Shutemov

This patch addresses a shortcoming in the current boot process on
machines that support 5-level paging.

If the bootloader enables 64-bit mode with 4-level paging, we need to
switch over to 5-level paging. The switch requires paging to be
disabled. That works fine if the kernel itself is loaded below 4G.

If the bootloader put the kernel above 4G (not sure if anybody does
this), we would lose control as soon as paging is disabled, because the
code becomes unreachable.

This patch implements a trampoline in lower memory to handle this
situation.

We only need the memory for a very short time, until the main kernel
image sets up its own page tables.
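
For orientation, the resulting control flow with the trampoline (a
summary of the changes below):

	startup_64 (64-bit, kernel image possibly above 4G)
	  -> place_trampoline()		 # copy code and page table below 640K
	  -> far return to __KERNEL32_CS # jump into the trampoline copy
	trampoline (32-bit, below 4G)
	  -> load %ds/%ss, switch to the trampoline stack
	  -> clear CR0.PG		 # paging off: why we must be below 4G
	  -> point CR3 at the trampoline page table, set CR4.PAE and CR4.LA57
	  -> set CR0.PG again, far return to __KERNEL_CS (64-bit)
	  -> jmp *%rdi			 # back to the lvl5 label in the image
	lvl5 (64-bit, 5-level paging enabled)
	  -> restore boot_stack_end and continue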

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/boot/compressed/head_64.S | 72 ++++++++++++++++++++++++--------------
 1 file changed, 45 insertions(+), 27 deletions(-)

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 4d1555b39de0..e8331f5a77f4 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -32,6 +32,7 @@
 #include <asm/processor-flags.h>
 #include <asm/asm-offsets.h>
 #include <asm/bootparam.h>
+#include "pagetable.h"
 
 /*
  * Locally defined symbols should be marked hidden:
@@ -288,6 +289,19 @@ ENTRY(startup_64)
 	leaq	boot_stack_end(%rbx), %rsp
 
 #ifdef CONFIG_X86_5LEVEL
+/*
+ * We need a trampoline in lower memory to switch from 4- to 5-level paging
+ * for cases when the bootloader put the kernel above 4G, but didn't enable
+ * 5-level paging for us.
+ *
+ * We also have to have the top page table in lower memory, as we don't have
+ * a way to load a 64-bit value into CR3 from 32-bit mode. We only need
+ * 8 bytes there, as we only use the very first entry of the page table, but
+ * we allocate a whole page anyway. We cannot have the code in the same page,
+ * because there's a hazard that a CPU would read the page table speculatively
+ * and get confused seeing garbage.
+ */
+
 	/*
 	 * Check if we need to enable 5-level paging.
 	 * RSI holds real mode data and need to be preserved across
@@ -309,8 +323,8 @@ ENTRY(startup_64)
 	 * long mode would trigger #GP. So we need to switch off long mode
 	 * first.
 	 *
-	 * NOTE: This is not going to work if bootloader put us above 4G
-	 * limit.
+	 * We use a trampoline in lower memory to handle the situation when
+	 * the bootloader put the kernel image above 4G.
 	 *
 	 * The first step is go into compatibility mode.
 	 */
@@ -327,26 +341,20 @@ ENTRY(startup_64)
 	popq	%rsi
 	movq	%rax, %rcx
 
-	/* Clear additional page table */
-	leaq	lvl5_pgtable(%rbx), %rdi
-	xorq	%rax, %rax
-	movq	$(PAGE_SIZE/8), %rcx
-	rep	stosq
-
 	/*
-	 * Set up the current CR3 value as the first and only entry in the new
-	 * top-level page table.
+	 * Load the address of lvl5 into RDI.
+	 * It will be used as the return address from the trampoline.
 	 */
-	movq	%cr3, %rdi
-	leaq	0x7 (%rdi), %rax
-	movq	%rax, lvl5_pgtable(%rbx)
+	leaq	lvl5(%rip), %rdi
 
 	/* Switch to compatibility mode (CS.L = 0 CS.D = 1) via far return */
 	pushq	$__KERNEL32_CS
-	leaq	compatible_mode(%rip), %rax
+	leaq	LVL5_TRAMPOLINE_CODE_OFF(%rcx), %rax
 	pushq	%rax
 	lretq
 lvl5:
+	/* Restore the stack; the 32-bit trampoline uses its own stack */
+	leaq	boot_stack_end(%rbx), %rsp
 #endif
 
 	/* Zero EFLAGS */
@@ -484,22 +492,30 @@ relocated:
  */
 	jmp	*%rax
 
-	.code32
 #ifdef CONFIG_X86_5LEVEL
+	.code32
+/*
+ * This is the 32-bit trampoline that will be copied over to low memory.
+ *
+ * RDI contains the return address (which might be above 4G).
+ * ECX contains the base address of the trampoline memory.
+ */
 ENTRY(lvl5_trampoline_src)
-compatible_mode:
 	/* Setup data and stack segments */
 	movl	$__KERNEL_DS, %eax
 	movl	%eax, %ds
 	movl	%eax, %ss
 
+	/* Set up a new stack at the end of the trampoline memory */
+	leal	LVL5_TRAMPOLINE_STACK_END (%ecx), %esp
+
 	/* Disable paging */
 	movl	%cr0, %eax
 	btrl	$X86_CR0_PG_BIT, %eax
 	movl	%eax, %cr0
 
 	/* Point CR3 to 5-level paging */
-	leal	lvl5_pgtable(%ebx), %eax
+	leal	LVL5_TRAMPOLINE_PGTABLE_OFF (%ecx), %eax
 	movl	%eax, %cr3
 
 	/* Enable PAE and LA57 mode */
@@ -507,23 +523,29 @@ compatible_mode:
 	orl	$(X86_CR4_PAE | X86_CR4_LA57), %eax
 	movl	%eax, %cr4
 
-	/* Calculate address we are running at */
-	call	1f
-1:	popl	%edi
-	subl	$1b, %edi
+	/* Calculate address of lvl5_enabled once we are in trampoline */
+	leal	lvl5_enabled - lvl5_trampoline_src + LVL5_TRAMPOLINE_CODE_OFF (%ecx), %eax
 
 	/* Prepare stack for far return to Long Mode */
 	pushl	$__KERNEL_CS
-	leal	lvl5(%edi), %eax
-	push	%eax
+	pushl	%eax
 
 	/* Enable paging back */
 	movl	$(X86_CR0_PG | X86_CR0_PE), %eax
 	movl	%eax, %cr0
 
 	lret
+
+	.code64
+lvl5_enabled:
+	/* Return from trampoline */
+	jmp	*%rdi
+
+	/* Bound size of trampoline code */
+	.org	lvl5_trampoline_src + LVL5_TRAMPOLINE_CODE_SIZE
 #endif
 
+	.code32
 no_longmode:
 	/* This isn't an x86-64 CPU so hang */
 1:
@@ -581,7 +603,3 @@ boot_stack_end:
 	.balign 4096
 pgtable:
 	.fill BOOT_PGT_SIZE, 1, 0
-#ifdef CONFIG_X86_5LEVEL
-lvl5_pgtable:
-	.fill PAGE_SIZE, 1, 0
-#endif
-- 
2.14.2

* Re: [PATCH 1/4] mm/zsmalloc: Prepare to variable MAX_PHYSMEM_BITS
@ 2017-10-21  1:43 ` Nitin Gupta
From: Nitin Gupta @ 2017-10-21  1:43 UTC
  To: Kirill A. Shutemov
  Cc: Ingo Molnar, Linus Torvalds, x86, Thomas Gleixner,
	H. Peter Anvin, Andy Lutomirski, Cyrill Gorcunov,
	Borislav Petkov, Andi Kleen, linux-mm, linux-kernel, Minchan Kim,
	Sergey Senozhatsky

On Fri, Oct 20, 2017 at 12:59 PM, Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
> With boot-time switching between paging modes, MAX_PHYSMEM_BITS will
> become a variable.
>
> Let's use the maximum value MAX_PHYSMEM_BITS can have for the
> CONFIG_X86_5LEVEL=y configuration to define the zsmalloc data structures.
>
> The patch introduces MAX_POSSIBLE_PHYSMEM_BITS to cover this case.
> It also suits well to handle the PAE special case.
>


I see that with your upcoming patch, MAX_PHYSMEM_BITS is turned into a
variable for the x86_64 case: (pgtable_l5_enabled ? 52 : 46).

Even with this change, I don't see a need for this new
MAX_POSSIBLE_PHYSMEM_BITS constant.


> -#ifndef MAX_PHYSMEM_BITS
> -#ifdef CONFIG_HIGHMEM64G
> -#define MAX_PHYSMEM_BITS 36
> -#else /* !CONFIG_HIGHMEM64G */
> +#ifndef MAX_POSSIBLE_PHYSMEM_BITS
> +#ifdef MAX_PHYSMEM_BITS
> +#define MAX_POSSIBLE_PHYSMEM_BITS MAX_PHYSMEM_BITS
> +#else


This ifdef on HIGHMEM64G is redundant, as x86 already defines
MAX_PHYSMEM_BITS = 36 in the PAE case. So all that zsmalloc should do is:

#ifndef MAX_PHYSMEM_BITS
#define MAX_PHYSMEM_BITS BITS_PER_LONG
#endif

... and then no change is needed for the rest of the derived constants like
_PFN_BITS.

It is up to every arch to define the correct MAX_PHYSMEM_BITS (variable
or constant) based on whatever configurations the arch supports. If not
defined, zsmalloc picks a reasonable default of BITS_PER_LONG.

I will send a patch which removes the ifdef on CONFIG_HIGHMEM64G.

Thanks,
Nitin

* Re: [PATCH 1/4] mm/zsmalloc: Prepare to variable MAX_PHYSMEM_BITS
@ 2017-10-21  8:32 ` Kirill A. Shutemov
From: Kirill A. Shutemov @ 2017-10-21  8:32 UTC
  To: Nitin Gupta
  Cc: Kirill A. Shutemov, Ingo Molnar, Linus Torvalds, x86,
	Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Cyrill Gorcunov, Borislav Petkov, Andi Kleen, linux-mm,
	linux-kernel, Minchan Kim, Sergey Senozhatsky

On Fri, Oct 20, 2017 at 06:43:55PM -0700, Nitin Gupta wrote:
> On Fri, Oct 20, 2017 at 12:59 PM, Kirill A. Shutemov
> <kirill.shutemov@linux.intel.com> wrote:
> > With boot-time switching between paging modes, MAX_PHYSMEM_BITS will
> > become a variable.
> >
> > Let's use the maximum value MAX_PHYSMEM_BITS can have for the
> > CONFIG_X86_5LEVEL=y configuration to define the zsmalloc data structures.
> >
> > The patch introduces MAX_POSSIBLE_PHYSMEM_BITS to cover this case.
> > It also suits well to handle the PAE special case.
> >
> 
> 
> I see that with your upcoming patch, MAX_PHYSMEM_BITS is turned into a
> variable for the x86_64 case: (pgtable_l5_enabled ? 52 : 46).
> 
> Even with this change, I don't see a need for this new
> MAX_POSSIBLE_PHYSMEM_BITS constant.

This is the error I'm talking about:

mm/zsmalloc.c:249:21: error: variably modified ‘size_class’ at file scope
  struct size_class *size_class[ZS_SIZE_CLASSES];

ZS_SIZE_CLASSES
  ZS_MIN_ALLOC_SIZE
    OBJ_INDEX_BITS
      _PFN_BITS
        MAX_PHYSMEM_BITS
	  (pgtable_l5_enabled ? 52 : 46)

Check without this patch but with the full patchset applied.
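
A minimal stand-alone illustration of the same class of error (with a
made-up bound standing in for MAX_PHYSMEM_BITS):

	int l5_enabled;			/* runtime value, like pgtable_l5_enabled */
	#define BOUND (l5_enabled ? 52 : 46)
	int pfn_map[BOUND - 12];	/* gcc: variably modified 'pfn_map' at file scope */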

-- 
 Kirill A. Shutemov

* Re: [PATCH 1/4] mm/zsmalloc: Prepare to variable MAX_PHYSMEM_BITS
@ 2017-10-23  3:10 ` Minchan Kim
From: Minchan Kim @ 2017-10-23  3:10 UTC
  To: Kirill A. Shutemov
  Cc: Ingo Molnar, Linus Torvalds, x86, Thomas Gleixner,
	H. Peter Anvin, Andy Lutomirski, Cyrill Gorcunov,
	Borislav Petkov, Andi Kleen, linux-mm, linux-kernel, Nitin Gupta,
	Sergey Senozhatsky

On Fri, Oct 20, 2017 at 10:59:31PM +0300, Kirill A. Shutemov wrote:
> With boot-time switching between paging modes, MAX_PHYSMEM_BITS will
> become a variable.
> 
> Let's use the maximum value MAX_PHYSMEM_BITS can have for the
> CONFIG_X86_5LEVEL=y configuration to define the zsmalloc data structures.
> 
> The patch introduces MAX_POSSIBLE_PHYSMEM_BITS to cover this case.
> It also suits well to handle the PAE special case.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Nitin Gupta <ngupta@vflare.org>
> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>

Nitin:

I think this patch works and it is the best Kirill can do for now. So if
you have a better idea to clean it up, let's make that another patch,
independent of this patch series.

Thanks.

* Re: [PATCH 1/4] mm/zsmalloc: Prepare to variable MAX_PHYSMEM_BITS
@ 2017-10-23  5:25 ` Nitin Gupta
From: Nitin Gupta @ 2017-10-23  5:25 UTC
  To: Minchan Kim
  Cc: Kirill A. Shutemov, Ingo Molnar, Linus Torvalds, x86,
	Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Cyrill Gorcunov, Borislav Petkov, Andi Kleen, linux-mm,
	linux-kernel, Sergey Senozhatsky

On Sun, Oct 22, 2017 at 8:10 PM, Minchan Kim <minchan@kernel.org> wrote:
> On Fri, Oct 20, 2017 at 10:59:31PM +0300, Kirill A. Shutemov wrote:
>> With boot-time switching between paging modes, MAX_PHYSMEM_BITS will
>> become a variable.
>>
>> Let's use the maximum value MAX_PHYSMEM_BITS can have for the
>> CONFIG_X86_5LEVEL=y configuration to define the zsmalloc data structures.
>>
>> The patch introduces MAX_POSSIBLE_PHYSMEM_BITS to cover this case.
>> It also suits well to handle the PAE special case.
>>
>> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> Cc: Minchan Kim <minchan@kernel.org>
>> Cc: Nitin Gupta <ngupta@vflare.org>
>> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
> Acked-by: Minchan Kim <minchan@kernel.org>
>
> Nitin:
>
> I think this patch works and it is the best Kirill can do for now. So if
> you have a better idea to clean it up, let's make that another patch,
> independent of this patch series.
>


I was looking into dynamically allocating the size_class array to avoid
that compile error, but yes, that can be done in a future patch. So, for
this patch:

Reviewed-by: Nitin Gupta <ngupta@vflare.org>

Thread overview: 18+ messages
2017-10-20 19:59 [PATCH 0/4] Boot-time switching between 4- and 5-level paging for 4.15, Part 2 Kirill A. Shutemov
2017-10-20 19:59 ` [PATCH 1/4] mm/zsmalloc: Prepare to variable MAX_PHYSMEM_BITS Kirill A. Shutemov
2017-10-21  1:43   ` Nitin Gupta
2017-10-21  8:32     ` Kirill A. Shutemov
2017-10-23  3:10   ` Minchan Kim
2017-10-23  5:25     ` Nitin Gupta
2017-10-20 19:59 ` [PATCH 2/4] x86/boot/compressed/64: Detect and handle 5-level paging at boot-time Kirill A. Shutemov
2017-10-20 19:59 ` [PATCH 3/4] x86/boot/compressed/64: Introduce place_trampoline() Kirill A. Shutemov
2017-10-20 19:59 ` [PATCH 4/4] x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G Kirill A. Shutemov