* [PATCH v3 00/11] arm64: update/clarify/relax Image and FDT placement rules
@ 2015-04-10 13:53 Ard Biesheuvel
  2015-04-10 13:53 ` [PATCH v3 01/11] arm64: reduce ID map to a single page Ard Biesheuvel
                   ` (10 more replies)
  0 siblings, 11 replies; 29+ messages in thread
From: Ard Biesheuvel @ 2015-04-10 13:53 UTC (permalink / raw)
  To: linux-arm-kernel

This series came about after Mark Rutland pointed out that the current
FDT placement logic used by the EFI stub is flawed. It then turned out
that the documentation for both the Image and FDT placement was incorrect
as well, or at the very least confusing.

Changes since v2:
This is a complete overhaul of the previous version. The FDT changes are mostly
equivalent, but have been reimplemented in a way that does not rely on the
linear mapping having been initialized yet. This includes changes to the fixmap
code itself so that it does not rely on that either. Combined with the ID map
reduction in patch #1, this paves the way for relaxing the Image placement
requirements as well, i.e., the kernel Image can now be placed anywhere in
memory without affecting the accessibility of memory below it, or causing the
resulting mapping to be less efficient due to physical and virtual memory not
being relatively aligned.

Changes since v1:
- patch #1: split off reservation of the FDT binary itself from the memreserve
  processing, since the former assumes the FDT is accessed via the linear
  mapping, which we are about to change
- patch #2: mention the older, stricter FDT placement rules in booting.txt,
  get rid of early_print,
  use correct format specifier for phys_addr_t,
  use R/O mapping for FDT
- patches #3 .. #5: add R-b, minor style and grammar tweaks

Ard Biesheuvel (11):
  arm64: reduce ID map to a single page
  arm64: drop sleep_idmap_phys and clean up cpu_resume()
  of/fdt: split off FDT self reservation from memreserve processing
  arm64: use fixmap region for permanent FDT mapping
  arm64/efi: adapt to relaxed FDT placement requirements
  arm64: implement our own early_init_dt_add_memory_arch()
  arm64: fixmap: allow init before linear mapping is set up
  arm64: mm: explicitly bootstrap the linear mapping
  arm64: move kernel mapping out of linear region
  arm64: allow kernel Image to be loaded anywhere in physical memory
  arm64/efi: adapt to relaxed kernel Image placement requirements

 Documentation/arm64/booting.txt         |  25 ++-
 arch/arm/mm/init.c                      |   1 +
 arch/arm64/include/asm/compiler.h       |   2 +
 arch/arm64/include/asm/efi.h            |   8 +-
 arch/arm64/include/asm/fixmap.h         |   9 ++
 arch/arm64/include/asm/memory.h         |  15 +-
 arch/arm64/include/asm/mmu.h            |   1 +
 arch/arm64/kernel/Makefile              |   1 +
 arch/arm64/kernel/efi-stub.c            |   4 +-
 arch/arm64/kernel/head.S                |  60 ++-----
 arch/arm64/kernel/setup.c               |  45 +++---
 arch/arm64/kernel/sleep.S               |   9 +-
 arch/arm64/kernel/suspend.c             |   3 -
 arch/arm64/kernel/vmlinux.lds.S         |  35 ++++-
 arch/arm64/mm/init.c                    |  35 +++++
 arch/arm64/mm/mmu.c                     | 271 ++++++++++++++++++++++++--------
 arch/arm64/mm/proc.S                    |   3 +-
 arch/powerpc/kernel/prom.c              |   1 +
 drivers/firmware/efi/libstub/arm-stub.c |   5 +-
 drivers/firmware/efi/libstub/efistub.h  |   1 -
 drivers/firmware/efi/libstub/fdt.c      |  23 +--
 drivers/of/fdt.c                        |  19 ++-
 include/linux/of_fdt.h                  |   2 +
 23 files changed, 380 insertions(+), 198 deletions(-)

-- 
1.8.3.2

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v3 01/11] arm64: reduce ID map to a single page
  2015-04-10 13:53 [PATCH v3 00/11] arm64: update/clarify/relax Image and FDT placement rules Ard Biesheuvel
@ 2015-04-10 13:53 ` Ard Biesheuvel
  2015-04-13 12:53   ` Mark Rutland
  2015-04-10 13:53 ` [PATCH v3 02/11] arm64: drop sleep_idmap_phys and clean up cpu_resume() Ard Biesheuvel
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 29+ messages in thread
From: Ard Biesheuvel @ 2015-04-10 13:53 UTC (permalink / raw)
  To: linux-arm-kernel

Commit ea8c2e112445 ("arm64: Extend the idmap to the whole kernel
image") changed the early page table code so that the entire kernel
Image is covered by the identity map. This allows functions that
need to enable or disable the MMU to reside anywhere in the kernel
Image.

However, this change has the unfortunate side effect that the Image
cannot cross a physical 512 MB alignment boundary anymore, since the
early page table code cannot deal with the Image crossing a /virtual/
512 MB alignment boundary.

So instead, reduce the ID map to a single page, which is populated by
the contents of the .idmap.text section. Only three functions reside
there at the moment: __enable_mmu(), cpu_resume_mmu() and cpu_reset().
If new code is introduced that needs to manipulate the MMU state, it
should be added to this section as well.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/kernel/head.S        | 13 +++++++------
 arch/arm64/kernel/sleep.S       |  2 ++
 arch/arm64/kernel/vmlinux.lds.S | 11 ++++++++++-
 arch/arm64/mm/proc.S            |  3 ++-
 4 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 19f915e8f6e0..f54125a95a6d 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -382,7 +382,7 @@ __create_page_tables:
 	 * Create the identity mapping.
 	 */
 	mov	x0, x25				// idmap_pg_dir
-	adrp	x3, KERNEL_START		// __pa(KERNEL_START)
+	adrp	x3, __idmap_text_start		// __pa(__idmap_text_start)
 
 #ifndef CONFIG_ARM64_VA_BITS_48
 #define EXTRA_SHIFT	(PGDIR_SHIFT + PAGE_SHIFT - 3)
@@ -405,11 +405,11 @@ __create_page_tables:
 
 	/*
 	 * Calculate the maximum allowed value for TCR_EL1.T0SZ so that the
-	 * entire kernel image can be ID mapped. As T0SZ == (64 - #bits used),
+	 * entire ID map region can be mapped. As T0SZ == (64 - #bits used),
 	 * this number conveniently equals the number of leading zeroes in
-	 * the physical address of KERNEL_END.
+	 * the physical address of __idmap_text_end.
 	 */
-	adrp	x5, KERNEL_END
+	adrp	x5, __idmap_text_end
 	clz	x5, x5
 	cmp	x5, TCR_T0SZ(VA_BITS)	// default T0SZ small enough?
 	b.ge	1f			// .. then skip additional level
@@ -424,8 +424,8 @@ __create_page_tables:
 #endif
 
 	create_pgd_entry x0, x3, x5, x6
-	mov	x5, x3				// __pa(KERNEL_START)
-	adr_l	x6, KERNEL_END			// __pa(KERNEL_END)
+	mov	x5, x3				// __pa(__idmap_text_start)
+	adr_l	x6, __idmap_text_end		// __pa(__idmap_text_end)
 	create_block_map x0, x7, x3, x5, x6
 
 	/*
@@ -669,6 +669,7 @@ ENDPROC(__secondary_switched)
  *
  * other registers depend on the function called upon completion
  */
+	.section	".idmap.text", #alloc, #execinstr
 __enable_mmu:
 	ldr	x5, =vectors
 	msr	vbar_el1, x5
diff --git a/arch/arm64/kernel/sleep.S b/arch/arm64/kernel/sleep.S
index ede186cdd452..04dc9aa2831e 100644
--- a/arch/arm64/kernel/sleep.S
+++ b/arch/arm64/kernel/sleep.S
@@ -130,12 +130,14 @@ ENDPROC(__cpu_suspend_enter)
 /*
  * x0 must contain the sctlr value retrieved from restored context
  */
+	.pushsection	".idmap.text", #alloc, #execinstr
 ENTRY(cpu_resume_mmu)
 	ldr	x3, =cpu_resume_after_mmu
 	msr	sctlr_el1, x0		// restore sctlr_el1
 	isb
 	br	x3			// global jump to virtual address
 ENDPROC(cpu_resume_mmu)
+	.popsection
 cpu_resume_after_mmu:
 	mov	x0, #0			// return zero on success
 	ldp	x19, x20, [sp, #16]
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index a2c29865c3fe..98073332e2d0 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -38,6 +38,12 @@ jiffies = jiffies_64;
 	*(.hyp.text)					\
 	VMLINUX_SYMBOL(__hyp_text_end) = .;
 
+#define IDMAP_TEXT					\
+	. = ALIGN(SZ_4K);				\
+	VMLINUX_SYMBOL(__idmap_text_start) = .;		\
+	*(.idmap.text)					\
+	VMLINUX_SYMBOL(__idmap_text_end) = .;
+
 /*
  * The size of the PE/COFF section that covers the kernel image, which
  * runs from stext to _edata, must be a round multiple of the PE/COFF
@@ -95,6 +101,7 @@ SECTIONS
 			SCHED_TEXT
 			LOCK_TEXT
 			HYPERVISOR_TEXT
+			IDMAP_TEXT
 			*(.fixup)
 			*(.gnu.warning)
 		. = ALIGN(16);
@@ -167,11 +174,13 @@ SECTIONS
 }
 
 /*
- * The HYP init code can't be more than a page long,
+ * The HYP init code and ID map text can't be longer than a page each,
  * and should not cross a page boundary.
  */
 ASSERT(__hyp_idmap_text_end - (__hyp_idmap_text_start & ~(SZ_4K - 1)) <= SZ_4K,
 	"HYP init code too big or misaligned")
+ASSERT(__idmap_text_end - (__idmap_text_start & ~(SZ_4K - 1)) <= SZ_4K,
+	"ID map text too big or misaligned")
 
 /*
  * If padding is applied before .head.text, virt<->phys conversions will fail.
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index cdd754e19b9b..09da618906a3 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -67,7 +67,7 @@ ENDPROC(cpu_cache_off)
  *
  *	- loc   - location to jump to for soft reset
  */
-	.align	5
+	.pushsection	".idmap.text", #alloc, #execinstr
 ENTRY(cpu_reset)
 	mrs	x1, sctlr_el1
 	bic	x1, x1, #1
@@ -75,6 +75,7 @@ ENTRY(cpu_reset)
 	isb
 	ret	x0
 ENDPROC(cpu_reset)
+	.popsection
 
 ENTRY(cpu_soft_restart)
 	/* Save address of cpu_reset() and reset address */
-- 
1.8.3.2

* [PATCH v3 02/11] arm64: drop sleep_idmap_phys and clean up cpu_resume()
  2015-04-10 13:53 [PATCH v3 00/11] arm64: update/clarify/relax Image and FDT placement rules Ard Biesheuvel
  2015-04-10 13:53 ` [PATCH v3 01/11] arm64: reduce ID map to a single page Ard Biesheuvel
@ 2015-04-10 13:53 ` Ard Biesheuvel
  2015-04-10 13:53 ` [PATCH v3 03/11] of/fdt: split off FDT self reservation from memreserve processing Ard Biesheuvel
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 29+ messages in thread
From: Ard Biesheuvel @ 2015-04-10 13:53 UTC (permalink / raw)
  To: linux-arm-kernel

Two cleanups of the asm function cpu_resume():
- The global variable sleep_idmap_phys always points to idmap_pg_dir,
  so we can just use that value directly in the CPU resume path.
- Unclutter the load of sleep_save_sp::save_ptr_stash_phys.

Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/kernel/sleep.S   | 7 ++-----
 arch/arm64/kernel/suspend.c | 3 ---
 2 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kernel/sleep.S b/arch/arm64/kernel/sleep.S
index 04dc9aa2831e..3c98fe1c1c2f 100644
--- a/arch/arm64/kernel/sleep.S
+++ b/arch/arm64/kernel/sleep.S
@@ -164,15 +164,12 @@ ENTRY(cpu_resume)
 #else
 	mov	x7, xzr
 #endif
-	adrp	x0, sleep_save_sp
-	add	x0, x0, #:lo12:sleep_save_sp
-	ldr	x0, [x0, #SLEEP_SAVE_SP_PHYS]
+	ldr_l	x0, sleep_save_sp + SLEEP_SAVE_SP_PHYS
 	ldr	x0, [x0, x7, lsl #3]
 	/* load sp from context */
 	ldr	x2, [x0, #CPU_CTX_SP]
-	adrp	x1, sleep_idmap_phys
 	/* load physical address of identity map page table in x1 */
-	ldr	x1, [x1, #:lo12:sleep_idmap_phys]
+	adrp	x1, idmap_pg_dir
 	mov	sp, x2
 	/*
 	 * cpu_do_resume expects x0 to contain context physical address
diff --git a/arch/arm64/kernel/suspend.c b/arch/arm64/kernel/suspend.c
index d7daf45ae7a2..f6073c27d65f 100644
--- a/arch/arm64/kernel/suspend.c
+++ b/arch/arm64/kernel/suspend.c
@@ -118,7 +118,6 @@ int __cpu_suspend(unsigned long arg, int (*fn)(unsigned long))
 }
 
 struct sleep_save_sp sleep_save_sp;
-phys_addr_t sleep_idmap_phys;
 
 static int __init cpu_suspend_init(void)
 {
@@ -132,9 +131,7 @@ static int __init cpu_suspend_init(void)
 
 	sleep_save_sp.save_ptr_stash = ctx_ptr;
 	sleep_save_sp.save_ptr_stash_phys = virt_to_phys(ctx_ptr);
-	sleep_idmap_phys = virt_to_phys(idmap_pg_dir);
 	__flush_dcache_area(&sleep_save_sp, sizeof(struct sleep_save_sp));
-	__flush_dcache_area(&sleep_idmap_phys, sizeof(sleep_idmap_phys));
 
 	return 0;
 }
-- 
1.8.3.2

* [PATCH v3 03/11] of/fdt: split off FDT self reservation from memreserve processing
  2015-04-10 13:53 [PATCH v3 00/11] arm64: update/clarify/relax Image and FDT placement rules Ard Biesheuvel
  2015-04-10 13:53 ` [PATCH v3 01/11] arm64: reduce ID map to a single page Ard Biesheuvel
  2015-04-10 13:53 ` [PATCH v3 02/11] arm64: drop sleep_idmap_phys and clean up cpu_resume() Ard Biesheuvel
@ 2015-04-10 13:53 ` Ard Biesheuvel
  2015-04-10 13:53 ` [PATCH v3 04/11] arm64: use fixmap region for permanent FDT mapping Ard Biesheuvel
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 29+ messages in thread
From: Ard Biesheuvel @ 2015-04-10 13:53 UTC (permalink / raw)
  To: linux-arm-kernel

This splits off the reservation of the memory occupied by the FDT
binary itself from the processing of the memory reservations it
contains. This is necessary because the physical address of the FDT,
which is needed to perform the reservation, may not be known to the
FDT driver core, i.e., it may be mapped outside the linear direct
mapping, in which case __pa() returns a bogus value.

Cc: Russell King <linux@arm.linux.org.uk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Rob Herring <robh@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm/mm/init.c         |  1 +
 arch/arm64/mm/init.c       |  1 +
 arch/powerpc/kernel/prom.c |  1 +
 drivers/of/fdt.c           | 19 ++++++++++++++-----
 include/linux/of_fdt.h     |  2 ++
 5 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 1609b022a72f..0b8657c36fe4 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -317,6 +317,7 @@ void __init arm_memblock_init(const struct machine_desc *mdesc)
 	if (mdesc->reserve)
 		mdesc->reserve();
 
+	early_init_fdt_reserve_self();
 	early_init_fdt_scan_reserved_mem();
 
 	/* reserve memory for DMA contiguous allocations */
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index ae85da6307bb..fa2389b0f7f0 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -170,6 +170,7 @@ void __init arm64_memblock_init(void)
 		memblock_reserve(__virt_to_phys(initrd_start), initrd_end - initrd_start);
 #endif
 
+	early_init_fdt_reserve_self();
 	early_init_fdt_scan_reserved_mem();
 
 	/* 4GB maximum for 32-bit only capable devices */
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index b8e15c678960..093ccfb384af 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -573,6 +573,7 @@ static void __init early_reserve_mem_dt(void)
 	int len;
 	const __be32 *prop;
 
+	early_init_fdt_reserve_self();
 	early_init_fdt_scan_reserved_mem();
 
 	dt_root = of_get_flat_dt_root();
diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 3a896c9aeb74..bbb35cdb06e8 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -561,11 +561,6 @@ void __init early_init_fdt_scan_reserved_mem(void)
 	if (!initial_boot_params)
 		return;
 
-	/* Reserve the dtb region */
-	early_init_dt_reserve_memory_arch(__pa(initial_boot_params),
-					  fdt_totalsize(initial_boot_params),
-					  0);
-
 	/* Process header /memreserve/ fields */
 	for (n = 0; ; n++) {
 		fdt_get_mem_rsv(initial_boot_params, n, &base, &size);
@@ -579,6 +574,20 @@ void __init early_init_fdt_scan_reserved_mem(void)
 }
 
 /**
+ * early_init_fdt_reserve_self() - reserve the memory used by the FDT blob
+ */
+void __init early_init_fdt_reserve_self(void)
+{
+	if (!initial_boot_params)
+		return;
+
+	/* Reserve the dtb region */
+	early_init_dt_reserve_memory_arch(__pa(initial_boot_params),
+					  fdt_totalsize(initial_boot_params),
+					  0);
+}
+
+/**
  * of_scan_flat_dt - scan flattened tree blob and call callback on each.
  * @it: callback function
  * @data: context data pointer
diff --git a/include/linux/of_fdt.h b/include/linux/of_fdt.h
index 0ff360d5b3b3..6ef6b33238d3 100644
--- a/include/linux/of_fdt.h
+++ b/include/linux/of_fdt.h
@@ -62,6 +62,7 @@ extern int early_init_dt_scan_chosen(unsigned long node, const char *uname,
 extern int early_init_dt_scan_memory(unsigned long node, const char *uname,
 				     int depth, void *data);
 extern void early_init_fdt_scan_reserved_mem(void);
+extern void early_init_fdt_reserve_self(void);
 extern void early_init_dt_add_memory_arch(u64 base, u64 size);
 extern int early_init_dt_reserve_memory_arch(phys_addr_t base, phys_addr_t size,
 					     bool no_map);
@@ -89,6 +90,7 @@ extern u64 fdt_translate_address(const void *blob, int node_offset);
 extern void of_fdt_limit_memory(int limit);
 #else /* CONFIG_OF_FLATTREE */
 static inline void early_init_fdt_scan_reserved_mem(void) {}
+static inline void early_init_fdt_reserve_self(void) {}
 static inline const char *of_flat_dt_get_machine_name(void) { return NULL; }
 static inline void unflatten_device_tree(void) {}
 static inline void unflatten_and_copy_device_tree(void) {}
-- 
1.8.3.2

* [PATCH v3 04/11] arm64: use fixmap region for permanent FDT mapping
  2015-04-10 13:53 [PATCH v3 00/11] arm64: update/clarify/relax Image and FDT placement rules Ard Biesheuvel
                   ` (2 preceding siblings ...)
  2015-04-10 13:53 ` [PATCH v3 03/11] of/fdt: split off FDT self reservation from memreserve processing Ard Biesheuvel
@ 2015-04-10 13:53 ` Ard Biesheuvel
  2015-04-13 15:02   ` Mark Rutland
  2015-04-10 13:53 ` [PATCH v3 05/11] arm64/efi: adapt to relaxed FDT placement requirements Ard Biesheuvel
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 29+ messages in thread
From: Ard Biesheuvel @ 2015-04-10 13:53 UTC (permalink / raw)
  To: linux-arm-kernel

Currently, the FDT blob needs to be in the same 512 MB region as
the kernel, so that it can be mapped into the kernel virtual memory
space very early on using a minimal set of statically allocated
translation tables.

Now that we have early fixmap support, we can relax this restriction
by moving the permanent FDT mapping to the fixmap region instead.
This way, the FDT blob may be anywhere in memory.

This also moves the vetting of the FDT to setup.c, since the early
init code in head.S does not handle mapping of the FDT anymore.
At the same time, fix up some comments in head.S that have gone stale.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 Documentation/arm64/booting.txt |  8 +++----
 arch/arm64/include/asm/fixmap.h |  9 ++++++++
 arch/arm64/include/asm/mmu.h    |  1 +
 arch/arm64/kernel/Makefile      |  1 +
 arch/arm64/kernel/head.S        | 39 +--------------------------------
 arch/arm64/kernel/setup.c       | 45 ++++++++++++++++++++------------------
 arch/arm64/mm/init.c            |  1 -
 arch/arm64/mm/mmu.c             | 48 +++++++++++++++++++++++++++++++++++++++++
 8 files changed, 88 insertions(+), 64 deletions(-)

diff --git a/Documentation/arm64/booting.txt b/Documentation/arm64/booting.txt
index f3c05b5f9f08..6396460f6085 100644
--- a/Documentation/arm64/booting.txt
+++ b/Documentation/arm64/booting.txt
@@ -45,11 +45,11 @@ sees fit.)
 
 Requirement: MANDATORY
 
-The device tree blob (dtb) must be placed on an 8-byte boundary within
-the first 512 megabytes from the start of the kernel image and must not
-cross a 2-megabyte boundary. This is to allow the kernel to map the
-blob using a single section mapping in the initial page tables.
+The device tree blob (dtb) must be placed on an 8-byte boundary and must
+not exceed 2 megabytes in size.
 
+NOTE: versions prior to v4.2 also require that the dtb be placed within
+512 MB of the 2 MB aligned boundary right below the kernel Image.
 
 3. Decompress the kernel image
 ------------------------------
diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h
index 926495686554..fb3557c6901f 100644
--- a/arch/arm64/include/asm/fixmap.h
+++ b/arch/arm64/include/asm/fixmap.h
@@ -32,6 +32,15 @@
  */
 enum fixed_addresses {
 	FIX_HOLE,
+
+	/*
+	 * Reserve 4 MB of virtual space for the FDT at the top of the fixmap
+	 * region. Keep this at the top so it remains 2 MB aligned.
+	 */
+#define FIX_FDT_SIZE		SZ_4M
+	FIX_FDT_END,
+	FIX_FDT = FIX_FDT_END + FIX_FDT_SIZE / PAGE_SIZE - 1,
+
 	FIX_EARLYCON_MEM_BASE,
 	FIX_TEXT_POKE0,
 	__end_of_permanent_fixed_addresses,
diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index 3d311761e3c2..79fcfb048884 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -34,5 +34,6 @@ extern void init_mem_pgprot(void);
 extern void create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
 			       unsigned long virt, phys_addr_t size,
 			       pgprot_t prot);
+extern void *fixmap_remap_fdt(phys_addr_t dt_phys);
 
 #endif
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index d5e70747c7a2..772831cf6937 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -6,6 +6,7 @@ CPPFLAGS_vmlinux.lds	:= -DTEXT_OFFSET=$(TEXT_OFFSET)
 AFLAGS_head.o		:= -DTEXT_OFFSET=$(TEXT_OFFSET)
 CFLAGS_efi-stub.o 	:= -DTEXT_OFFSET=$(TEXT_OFFSET)
 CFLAGS_armv8_deprecated.o := -I$(src)
+CFLAGS_setup.o		:= -I$(srctree)/scripts/dtc/libfdt/
 
 CFLAGS_REMOVE_ftrace.o = -pg
 CFLAGS_REMOVE_insn.o = -pg
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index f54125a95a6d..208ca21868cc 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -237,8 +237,6 @@ ENTRY(stext)
 	bl	el2_setup			// Drop to EL1, w20=cpu_boot_mode
 	adrp	x24, __PHYS_OFFSET
 	bl	set_cpu_boot_mode_flag
-
-	bl	__vet_fdt
 	bl	__create_page_tables		// x25=TTBR0, x26=TTBR1
 	/*
 	 * The following calls CPU setup code, see arch/arm64/mm/proc.S for
@@ -270,24 +268,6 @@ preserve_boot_args:
 ENDPROC(preserve_boot_args)
 
 /*
- * Determine validity of the x21 FDT pointer.
- * The dtb must be 8-byte aligned and live in the first 512M of memory.
- */
-__vet_fdt:
-	tst	x21, #0x7
-	b.ne	1f
-	cmp	x21, x24
-	b.lt	1f
-	mov	x0, #(1 << 29)
-	add	x0, x0, x24
-	cmp	x21, x0
-	b.ge	1f
-	ret
-1:
-	mov	x21, #0
-	ret
-ENDPROC(__vet_fdt)
-/*
  * Macro to create a table entry to the next page.
  *
  *	tbl:	page table address
@@ -348,8 +328,7 @@ ENDPROC(__vet_fdt)
  * required to get the kernel running. The following sections are required:
  *   - identity mapping to enable the MMU (low address, TTBR0)
  *   - first few MB of the kernel linear mapping to jump to once the MMU has
- *     been enabled, including the FDT blob (TTBR1)
- *   - pgd entry for fixed mappings (TTBR1)
+ *     been enabled
  */
 __create_page_tables:
 	adrp	x25, idmap_pg_dir
@@ -439,22 +418,6 @@ __create_page_tables:
 	create_block_map x0, x7, x3, x5, x6
 
 	/*
-	 * Map the FDT blob (maximum 2MB; must be within 512MB of
-	 * PHYS_OFFSET).
-	 */
-	mov	x3, x21				// FDT phys address
-	and	x3, x3, #~((1 << 21) - 1)	// 2MB aligned
-	mov	x6, #PAGE_OFFSET
-	sub	x5, x3, x24			// subtract PHYS_OFFSET
-	tst	x5, #~((1 << 29) - 1)		// within 512MB?
-	csel	x21, xzr, x21, ne		// zero the FDT pointer
-	b.ne	1f
-	add	x5, x5, x6			// __va(FDT blob)
-	add	x6, x5, #1 << 21		// 2MB for the FDT blob
-	sub	x6, x6, #1			// inclusive range
-	create_block_map x0, x7, x3, x5, x6
-1:
-	/*
 	 * Since the page tables have been populated with non-cacheable
 	 * accesses (MMU disabled), invalidate the idmap and swapper page
 	 * tables again to remove any speculatively loaded cache lines.
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 51ef97274b52..40675229e95c 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -45,6 +45,7 @@
 #include <linux/of_platform.h>
 #include <linux/efi.h>
 #include <linux/personality.h>
+#include <linux/libfdt.h>
 
 #include <asm/fixmap.h>
 #include <asm/cpu.h>
@@ -103,18 +104,6 @@ static struct resource mem_res[] = {
 #define kernel_code mem_res[0]
 #define kernel_data mem_res[1]
 
-void __init early_print(const char *str, ...)
-{
-	char buf[256];
-	va_list ap;
-
-	va_start(ap, str);
-	vsnprintf(buf, sizeof(buf), str, ap);
-	va_end(ap);
-
-	printk("%s", buf);
-}
-
 /*
  * The recorded values of x0 .. x3 upon kernel entry.
  */
@@ -324,17 +313,31 @@ static void __init setup_processor(void)
 
 static void __init setup_machine_fdt(phys_addr_t dt_phys)
 {
-	if (!dt_phys || !early_init_dt_scan(phys_to_virt(dt_phys))) {
-		early_print("\n"
-			"Error: invalid device tree blob at physical address 0x%p (virtual address 0x%p)\n"
-			"The dtb must be 8-byte aligned and passed in the first 512MB of memory\n"
-			"\nPlease check your bootloader.\n",
-			dt_phys, phys_to_virt(dt_phys));
+	void *dt_virt = NULL;
+
+	if (dt_phys && (dt_phys & 7) == 0)
+		dt_virt = fixmap_remap_fdt(dt_phys);
+
+	/*
+	 * Before passing the dt_virt pointer to early_init_dt_scan(), we have
+	 * to ensure that the FDT size as reported in the FDT itself does not
+	 * exceed the window we just mapped for it.
+	 */
+	if (!dt_virt ||
+	    fdt_check_header(dt_virt) != 0 ||
+	    (dt_phys & (SZ_2M - 1)) + fdt_totalsize(dt_virt) > FIX_FDT_SIZE ||
+	    !early_init_dt_scan(dt_virt)) {
+		pr_crit("\n"
+			"Error: invalid device tree blob at physical address %pa (virtual address 0x%p)\n"
+			"The dtb must be 8-byte aligned and must not exceed 2 MB in size\n"
+			"\nPlease check your bootloader.",
+			&dt_phys, dt_virt);
 
 		while (true)
 			cpu_relax();
 	}
 
+	memblock_reserve(dt_phys, fdt_totalsize(dt_virt));
 	dump_stack_set_arch_desc("%s (DT)", of_flat_dt_get_machine_name());
 }
 
@@ -372,6 +375,9 @@ void __init setup_arch(char **cmdline_p)
 {
 	setup_processor();
 
+	early_fixmap_init();
+	early_ioremap_init();
+
 	setup_machine_fdt(__fdt_pointer);
 
 	init_mm.start_code = (unsigned long) _text;
@@ -381,9 +387,6 @@ void __init setup_arch(char **cmdline_p)
 
 	*cmdline_p = boot_command_line;
 
-	early_fixmap_init();
-	early_ioremap_init();
-
 	parse_early_param();
 
 	/*
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index fa2389b0f7f0..ae85da6307bb 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -170,7 +170,6 @@ void __init arm64_memblock_init(void)
 		memblock_reserve(__virt_to_phys(initrd_start), initrd_end - initrd_start);
 #endif
 
-	early_init_fdt_reserve_self();
 	early_init_fdt_scan_reserved_mem();
 
 	/* 4GB maximum for 32-bit only capable devices */
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 428aaf86c95b..60be58a160a2 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -643,3 +643,51 @@ void __set_fixmap(enum fixed_addresses idx,
 		flush_tlb_kernel_range(addr, addr+PAGE_SIZE);
 	}
 }
+
+void *__init fixmap_remap_fdt(phys_addr_t dt_phys)
+{
+	const u64 dt_virt_base = __fix_to_virt(FIX_FDT);
+	phys_addr_t dt_phys_base = dt_phys & ~(SZ_2M - 1);
+	u64 pfn = dt_phys_base >> PAGE_SHIFT;
+	u64 pfn_end = pfn + (FIX_FDT_SIZE >> PAGE_SHIFT);
+	pgprot_t prot = PAGE_KERNEL | PTE_RDONLY;
+
+	/*
+	 * Make sure that the FDT region can be mapped without the need to
+	 * allocate additional translation table pages, or perform physical
+	 * to virtual address translations via the linear mapping.
+	 *
+	 * On 4k pages, we'll use section mappings for the region so we
+	 * only have to be in the same PUD as the rest of the fixmap.
+	 * On 64k pages, we need to be in the same PMD as well, as the region
+	 * will be mapped using PTEs.
+	 */
+	BUILD_BUG_ON(dt_virt_base & (SZ_2M - 1));
+
+	if (IS_ENABLED(CONFIG_ARM64_64K_PAGES)) {
+		pte_t *pte;
+
+		BUILD_BUG_ON(__fix_to_virt(FIX_FDT_END) >> PMD_SHIFT !=
+			     __fix_to_virt(FIX_BTMAP_BEGIN) >> PMD_SHIFT);
+
+		/* map the FDT using 64k pages */
+		pte = fixmap_pte(dt_virt_base);
+		do {
+			set_pte(pte++, pfn_pte(pfn++, prot));
+		} while (pfn < pfn_end);
+	} else {
+		pmd_t *pmd;
+
+		BUILD_BUG_ON(__fix_to_virt(FIX_FDT_END) >> PUD_SHIFT !=
+			     __fix_to_virt(FIX_BTMAP_BEGIN) >> PUD_SHIFT);
+
+		/* map the FDT using 2 MB sections */
+		pmd = fixmap_pmd(dt_virt_base);
+		do {
+			set_pmd(pmd++, pfn_pmd(pfn, mk_sect_prot(prot)));
+			pfn += PMD_SIZE >> PAGE_SHIFT;
+		} while (pfn < pfn_end);
+	}
+
+	return (void *)(dt_virt_base + dt_phys - dt_phys_base);
+}
-- 
1.8.3.2

* [PATCH v3 05/11] arm64/efi: adapt to relaxed FDT placement requirements
  2015-04-10 13:53 [PATCH v3 00/11] arm64: update/clarify/relax Image and FDT placement rules Ard Biesheuvel
                   ` (3 preceding siblings ...)
  2015-04-10 13:53 ` [PATCH v3 04/11] arm64: use fixmap region for permanent FDT mapping Ard Biesheuvel
@ 2015-04-10 13:53 ` Ard Biesheuvel
  2015-04-13 16:20   ` Mark Rutland
  2015-04-10 13:53 ` [PATCH v3 06/11] arm64: implement our own early_init_dt_add_memory_arch() Ard Biesheuvel
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 29+ messages in thread
From: Ard Biesheuvel @ 2015-04-10 13:53 UTC (permalink / raw)
  To: linux-arm-kernel

With the relaxed FDT placement requirements in place, we can change
the allocation strategy used by the stub to put the FDT image higher
up in memory. At the same time, reduce the minimal alignment to 8 bytes,
and impose a 2 MB size limit, as per the new requirements.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/include/asm/efi.h            |  8 +++-----
 drivers/firmware/efi/libstub/arm-stub.c |  5 ++---
 drivers/firmware/efi/libstub/efistub.h  |  1 -
 drivers/firmware/efi/libstub/fdt.c      | 23 +++++++++++++----------
 4 files changed, 18 insertions(+), 19 deletions(-)

diff --git a/arch/arm64/include/asm/efi.h b/arch/arm64/include/asm/efi.h
index ef572206f1c3..bd513dd663b9 100644
--- a/arch/arm64/include/asm/efi.h
+++ b/arch/arm64/include/asm/efi.h
@@ -39,12 +39,10 @@ extern void efi_init(void);
 /* arch specific definitions used by the stub code */
 
 /*
- * AArch64 requires the DTB to be 8-byte aligned in the first 512MiB from
- * start of kernel and may not cross a 2MiB boundary. We set alignment to
- * 2MiB so we know it won't cross a 2MiB boundary.
+ * AArch64 requires the DTB to be 8-byte aligned and not exceed 2MiB in size.
  */
-#define EFI_FDT_ALIGN	SZ_2M   /* used by allocate_new_fdt_and_exit_boot() */
-#define MAX_FDT_OFFSET	SZ_512M
+#define EFI_FDT_ALIGN		8
+#define EFI_FDT_MAX_SIZE	SZ_2M
 
 #define efi_call_early(f, ...) sys_table_arg->boottime->f(__VA_ARGS__)
 
diff --git a/drivers/firmware/efi/libstub/arm-stub.c b/drivers/firmware/efi/libstub/arm-stub.c
index dcae482a9a17..f54c76a4fd32 100644
--- a/drivers/firmware/efi/libstub/arm-stub.c
+++ b/drivers/firmware/efi/libstub/arm-stub.c
@@ -269,9 +269,8 @@ unsigned long efi_entry(void *handle, efi_system_table_t *sys_table,
 
 	new_fdt_addr = fdt_addr;
 	status = allocate_new_fdt_and_exit_boot(sys_table, handle,
-				&new_fdt_addr, dram_base + MAX_FDT_OFFSET,
-				initrd_addr, initrd_size, cmdline_ptr,
-				fdt_addr, fdt_size);
+				&new_fdt_addr, initrd_addr, initrd_size,
+				cmdline_ptr, fdt_addr, fdt_size);
 
 	/*
 	 * If all went well, we need to return the FDT address to the
diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h
index 47437b16b186..c8e096094ea9 100644
--- a/drivers/firmware/efi/libstub/efistub.h
+++ b/drivers/firmware/efi/libstub/efistub.h
@@ -35,7 +35,6 @@ efi_status_t update_fdt(efi_system_table_t *sys_table, void *orig_fdt,
 efi_status_t allocate_new_fdt_and_exit_boot(efi_system_table_t *sys_table,
 					    void *handle,
 					    unsigned long *new_fdt_addr,
-					    unsigned long max_addr,
 					    u64 initrd_addr, u64 initrd_size,
 					    char *cmdline_ptr,
 					    unsigned long fdt_addr,
diff --git a/drivers/firmware/efi/libstub/fdt.c b/drivers/firmware/efi/libstub/fdt.c
index 91da56c4fd54..ace5ed70a88e 100644
--- a/drivers/firmware/efi/libstub/fdt.c
+++ b/drivers/firmware/efi/libstub/fdt.c
@@ -165,10 +165,6 @@ fdt_set_fail:
 	return EFI_LOAD_ERROR;
 }
 
-#ifndef EFI_FDT_ALIGN
-#define EFI_FDT_ALIGN EFI_PAGE_SIZE
-#endif
-
 /*
  * Allocate memory for a new FDT, then add EFI, commandline, and
  * initrd related fields to the FDT.  This routine increases the
@@ -186,7 +182,6 @@ fdt_set_fail:
 efi_status_t allocate_new_fdt_and_exit_boot(efi_system_table_t *sys_table,
 					    void *handle,
 					    unsigned long *new_fdt_addr,
-					    unsigned long max_addr,
 					    u64 initrd_addr, u64 initrd_size,
 					    char *cmdline_ptr,
 					    unsigned long fdt_addr,
@@ -197,6 +192,7 @@ efi_status_t allocate_new_fdt_and_exit_boot(efi_system_table_t *sys_table,
 	unsigned long mmap_key;
 	efi_memory_desc_t *memory_map, *runtime_map;
 	unsigned long new_fdt_size;
+	void *fdt_alloc;
 	efi_status_t status;
 	int runtime_entry_count = 0;
 
@@ -221,14 +217,21 @@ efi_status_t allocate_new_fdt_and_exit_boot(efi_system_table_t *sys_table,
 	 * will allocate a bigger buffer if this ends up being too
 	 * small, so a rough guess is OK here.
 	 */
-	new_fdt_size = fdt_size + EFI_PAGE_SIZE;
+	new_fdt_size = fdt_size + EFI_PAGE_SIZE + EFI_FDT_ALIGN;
 	while (1) {
-		status = efi_high_alloc(sys_table, new_fdt_size, EFI_FDT_ALIGN,
-					new_fdt_addr, max_addr);
+		if (new_fdt_size > EFI_FDT_MAX_SIZE) {
+			pr_efi_err(sys_table, "FDT size exceeds EFI_FDT_MAX_SIZE.\n");
+			goto fail;
+		}
+		status = sys_table->boottime->allocate_pool(EFI_LOADER_DATA,
+							    new_fdt_size,
+							    &fdt_alloc);
 		if (status != EFI_SUCCESS) {
 			pr_efi_err(sys_table, "Unable to allocate memory for new device tree.\n");
 			goto fail;
 		}
+		*new_fdt_addr = round_up((unsigned long)fdt_alloc,
+					 EFI_FDT_ALIGN);
 
 		/*
 		 * Now that we have done our final memory allocation (and free)
@@ -258,7 +261,7 @@ efi_status_t allocate_new_fdt_and_exit_boot(efi_system_table_t *sys_table,
 			 * to get new one that reflects the free/alloc we do
 			 * on the device tree buffer.
 			 */
-			efi_free(sys_table, new_fdt_size, *new_fdt_addr);
+			sys_table->boottime->free_pool(&fdt_alloc);
 			sys_table->boottime->free_pool(memory_map);
 			new_fdt_size += EFI_PAGE_SIZE;
 		} else {
@@ -316,7 +319,7 @@ fail_free_mmap:
 	sys_table->boottime->free_pool(memory_map);
 
 fail_free_new_fdt:
-	efi_free(sys_table, new_fdt_size, *new_fdt_addr);
+	sys_table->boottime->free_pool(&fdt_alloc);
 
 fail:
 	sys_table->boottime->free_pool(runtime_map);
-- 
1.8.3.2
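[Editorial note: the new allocation scheme above over-allocates from the EFI pool by EFI_FDT_ALIGN bytes and then aligns the FDT base inside the buffer with round_up(). A minimal sketch of that placement arithmetic, using a hypothetical helper name (not a kernel or EFI API):

```c
#include <assert.h>
#include <stdint.h>

#define EFI_FDT_ALIGN  8ULL
/* round_up() as in the kernel, for power-of-two alignments */
#define round_up(x, a) (((x) + (a) - 1) & ~((a) - 1))

/*
 * Sketch: the stub asks allocate_pool() for fdt_size plus EFI_FDT_ALIGN
 * bytes of slack, then derives the 8-byte aligned FDT base inside the
 * buffer. Pool allocations are only guaranteed 8-byte aligned, so the
 * slack guarantees the aligned copy still fits.
 */
static uint64_t place_fdt(uint64_t pool_base, uint64_t fdt_size,
                          uint64_t *bytes_needed)
{
    *bytes_needed = fdt_size + EFI_FDT_ALIGN;   /* over-allocate     */
    return round_up(pool_base, EFI_FDT_ALIGN);  /* aligned FDT base  */
}
```

Since UEFI pool allocations are already 8-byte aligned, the rounding is effectively a no-op today; it documents the alignment requirement rather than relying on it implicitly.]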

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 06/11] arm64: implement our own early_init_dt_add_memory_arch()
  2015-04-10 13:53 [PATCH v3 00/11] arm64: update/clarify/relax Image and FDT placement rules Ard Biesheuvel
                   ` (4 preceding siblings ...)
  2015-04-10 13:53 ` [PATCH v3 05/11] arm64/efi: adapt to relaxed FDT placement requirements Ard Biesheuvel
@ 2015-04-10 13:53 ` Ard Biesheuvel
  2015-04-10 13:53 ` [PATCH v3 07/11] arm64: fixmap: allow init before linear mapping is set up Ard Biesheuvel
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 29+ messages in thread
From: Ard Biesheuvel @ 2015-04-10 13:53 UTC (permalink / raw)
  To: linux-arm-kernel

Override the __weak early_init_dt_add_memory_arch() with our own
version. Initially, this version just ignores MAX_PHYS_ADDR, but
later on, we will modify it so it can deal with PHYS_OFFSET not
having been assigned yet.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/mm/init.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index ae85da6307bb..48175b769074 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -372,3 +372,32 @@ static int __init keepinitrd_setup(char *__unused)
 
 __setup("keepinitrd", keepinitrd_setup);
 #endif
+
+void __init early_init_dt_add_memory_arch(u64 base, u64 size)
+{
+	const u64 phys_offset = __pa(PAGE_OFFSET);
+
+	if (!PAGE_ALIGNED(base)) {
+		if (size < PAGE_SIZE - (base & ~PAGE_MASK)) {
+			pr_warn("Ignoring memory block 0x%llx - 0x%llx\n",
+				base, base + size);
+			return;
+		}
+		size -= PAGE_SIZE - (base & ~PAGE_MASK);
+		base = PAGE_ALIGN(base);
+	}
+	size &= PAGE_MASK;
+
+	if (base + size < phys_offset) {
+		pr_warning("Ignoring memory block 0x%llx - 0x%llx\n",
+			   base, base + size);
+		return;
+	}
+	if (base < phys_offset) {
+		pr_warning("Ignoring memory range 0x%llx - 0x%llx\n",
+			   base, phys_offset);
+		size -= phys_offset - base;
+		base = phys_offset;
+	}
+	memblock_add(base, size);
+}
-- 
1.8.3.2
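[Editorial note: the alignment handling at the top of early_init_dt_add_memory_arch() can be modelled in isolation. A simplified sketch with a hypothetical helper name; the kernel version warns and returns instead of reporting failure via the return value:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000ULL
#define PAGE_MASK (~(PAGE_SIZE - 1))

/*
 * Sketch of the clipping logic: trim a memory block to whole pages.
 * Returns false when no full page survives (the kernel ignores the
 * block in that case).
 */
static bool clip_to_page_granule(uint64_t *base, uint64_t *size)
{
    if (*base & ~PAGE_MASK) {
        /* bytes between base and the next page boundary */
        uint64_t head = PAGE_SIZE - (*base & ~PAGE_MASK);

        if (*size < head)
            return false;                 /* block vanishes entirely */
        *size -= head;
        *base = (*base + PAGE_SIZE - 1) & PAGE_MASK;  /* PAGE_ALIGN  */
    }
    *size &= PAGE_MASK;                   /* drop the partial tail   */
    return *size != 0;
}
```
]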


* [PATCH v3 07/11] arm64: fixmap: allow init before linear mapping is set up
  2015-04-10 13:53 [PATCH v3 00/11] arm64: update/clarify/relax Image and FDT placement rules Ard Biesheuvel
                   ` (5 preceding siblings ...)
  2015-04-10 13:53 ` [PATCH v3 06/11] arm64: implement our own early_init_dt_add_memory_arch() Ard Biesheuvel
@ 2015-04-10 13:53 ` Ard Biesheuvel
  2015-04-14 10:47   ` Mark Rutland
  2015-04-10 13:53 ` [PATCH v3 08/11] arm64: mm: explicitly bootstrap the linear mapping Ard Biesheuvel
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 29+ messages in thread
From: Ard Biesheuvel @ 2015-04-10 13:53 UTC (permalink / raw)
  To: linux-arm-kernel

This reworks early_ioremap_init() so it populates the various levels
of translation tables while taking the following into account:
- be prepared for any of the levels to have been populated already, as
  this may be the case once we move the kernel text mapping out of the
  linear mapping;
- don't rely on __va() to translate the physical address in a page table
  entry to a virtual address, since this produces linear mapping addresses;
  instead, use the fact that at any level, we know exactly which page in
  swapper_pg_dir an entry could be pointing to if it points anywhere.

This allows us to defer the initialization of the linear mapping until
after we have figured out where our RAM resides in the physical address
space.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/include/asm/compiler.h |   2 +
 arch/arm64/kernel/vmlinux.lds.S   |  14 +++--
 arch/arm64/mm/mmu.c               | 117 +++++++++++++++++++++++++-------------
 3 files changed, 90 insertions(+), 43 deletions(-)

diff --git a/arch/arm64/include/asm/compiler.h b/arch/arm64/include/asm/compiler.h
index ee35fd0f2236..dd342af63673 100644
--- a/arch/arm64/include/asm/compiler.h
+++ b/arch/arm64/include/asm/compiler.h
@@ -27,4 +27,6 @@
  */
 #define __asmeq(x, y)  ".ifnc " x "," y " ; .err ; .endif\n\t"
 
+#define __pgdir		__attribute__((section(".pgdir"),aligned(PAGE_SIZE)))
+
 #endif	/* __ASM_COMPILER_H */
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 98073332e2d0..604f285d3832 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -160,11 +160,15 @@ SECTIONS
 
 	BSS_SECTION(0, 0, 0)
 
-	. = ALIGN(PAGE_SIZE);
-	idmap_pg_dir = .;
-	. += IDMAP_DIR_SIZE;
-	swapper_pg_dir = .;
-	. += SWAPPER_DIR_SIZE;
+	.pgdir (NOLOAD) : {
+		. = ALIGN(PAGE_SIZE);
+		idmap_pg_dir = .;
+		. += IDMAP_DIR_SIZE;
+		swapper_pg_dir = .;
+		__swapper_bs_region = . + PAGE_SIZE;
+		. += SWAPPER_DIR_SIZE;
+		*(.pgdir)
+	}
 
 	_end = .;
 
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 60be58a160a2..c0427b5c90c7 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -341,6 +341,70 @@ static void __init __map_memblock(phys_addr_t start, phys_addr_t end)
 }
 #endif
 
+struct mem_bootstrap_region {
+#if CONFIG_ARM64_PGTABLE_LEVELS > 3
+	pud_t	pud[PTRS_PER_PUD];
+#endif
+#if CONFIG_ARM64_PGTABLE_LEVELS > 2
+	pmd_t	pmd[PTRS_PER_PMD];
+#endif
+	pte_t	pte[PTRS_PER_PTE];
+};
+
+static void __init bootstrap_mem_region(unsigned long addr,
+					struct mem_bootstrap_region *reg,
+					pmd_t **ppmd, pte_t **ppte)
+{
+	/*
+	 * Avoid using the linear phys-to-virt translation __va() so that we
+	 * can use this code before the linear mapping is set up. Note that
+	 * any populated entries at any level can only point into swapper_pg_dir
+	 * since no other translation table pages have been allocated yet.
+	 * So at each level, we either need to populate it, or it has already
+	 * been populated by a swapper_pg_dir table at the same level, in which
+	 * case we can figure out its virtual address without applying __va()
+	 * on the contents of the entry, using the following struct.
+	 */
+	extern struct mem_bootstrap_region __swapper_bs_region;
+
+	pgd_t *pgd = pgd_offset_k(addr);
+	pud_t *pud = (pud_t *)pgd;
+	pmd_t *pmd = (pmd_t *)pud;
+
+#if CONFIG_ARM64_PGTABLE_LEVELS > 3
+	if (pgd_none(*pgd)) {
+		clear_page(reg->pud);
+		pgd_populate(&init_mm, pgd, reg->pud);
+		pud = reg->pud;
+	} else {
+		pud = __swapper_bs_region.pud;
+	}
+	pud += pud_index(addr);
+#endif
+
+#if CONFIG_ARM64_PGTABLE_LEVELS > 2
+	if (pud_none(*pud)) {
+		clear_page(reg->pmd);
+		pud_populate(&init_mm, pud, reg->pmd);
+		*ppmd = reg->pmd;
+	} else {
+		*ppmd = __swapper_bs_region.pmd;
+	}
+	pmd = *ppmd + pmd_index(addr);
+#endif
+
+	if (!ppte)
+		return;
+
+	if (pmd_none(*pmd)) {
+		clear_page(reg->pte);
+		pmd_populate_kernel(&init_mm, pmd, reg->pte);
+		*ppte = reg->pte;
+	} else {
+		*ppte = __swapper_bs_region.pte;
+	}
+}
+
 static void __init map_mem(void)
 {
 	struct memblock_region *reg;
@@ -554,58 +618,35 @@ void vmemmap_free(unsigned long start, unsigned long end)
 }
 #endif	/* CONFIG_SPARSEMEM_VMEMMAP */
 
-static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
-#if CONFIG_ARM64_PGTABLE_LEVELS > 2
-static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss;
-#endif
-#if CONFIG_ARM64_PGTABLE_LEVELS > 3
-static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss;
-#endif
-
-static inline pud_t * fixmap_pud(unsigned long addr)
-{
-	pgd_t *pgd = pgd_offset_k(addr);
-
-	BUG_ON(pgd_none(*pgd) || pgd_bad(*pgd));
-
-	return pud_offset(pgd, addr);
-}
+static pmd_t *fixmap_pmd_dir __initdata = (pmd_t *)swapper_pg_dir;
+static pte_t *fixmap_pte_dir;
 
-static inline pmd_t * fixmap_pmd(unsigned long addr)
+static __always_inline pmd_t *fixmap_pmd(unsigned long addr)
 {
-	pud_t *pud = fixmap_pud(addr);
-
-	BUG_ON(pud_none(*pud) || pud_bad(*pud));
-
-	return pmd_offset(pud, addr);
+#if CONFIG_ARM64_PGTABLE_LEVELS > 2
+	return fixmap_pmd_dir + pmd_index(addr);
+#else
+	return fixmap_pmd_dir + pgd_index(addr);
+#endif
 }
 
-static inline pte_t * fixmap_pte(unsigned long addr)
+static inline pte_t *fixmap_pte(unsigned long addr)
 {
-	pmd_t *pmd = fixmap_pmd(addr);
-
-	BUG_ON(pmd_none(*pmd) || pmd_bad(*pmd));
-
-	return pte_offset_kernel(pmd, addr);
+	return fixmap_pte_dir + pte_index(addr);
 }
 
 void __init early_fixmap_init(void)
 {
-	pgd_t *pgd;
-	pud_t *pud;
+	static struct mem_bootstrap_region fixmap_bs_region __pgdir;
 	pmd_t *pmd;
-	unsigned long addr = FIXADDR_START;
 
-	pgd = pgd_offset_k(addr);
-	pgd_populate(&init_mm, pgd, bm_pud);
-	pud = pud_offset(pgd, addr);
-	pud_populate(&init_mm, pud, bm_pmd);
-	pmd = pmd_offset(pud, addr);
-	pmd_populate_kernel(&init_mm, pmd, bm_pte);
+	bootstrap_mem_region(FIXADDR_START, &fixmap_bs_region, &fixmap_pmd_dir,
+			     &fixmap_pte_dir);
+	pmd = fixmap_pmd(FIXADDR_START);
 
 	/*
 	 * The boot-ioremap range spans multiple pmds, for which
-	 * we are not preparted:
+	 * we are not prepared:
 	 */
 	BUILD_BUG_ON((__fix_to_virt(FIX_BTMAP_BEGIN) >> PMD_SHIFT)
 		     != (__fix_to_virt(FIX_BTMAP_END) >> PMD_SHIFT));
-- 
1.8.3.2


* [PATCH v3 08/11] arm64: mm: explicitly bootstrap the linear mapping
  2015-04-10 13:53 [PATCH v3 00/11] arm64: update/clarify/relax Image and FDT placement rules Ard Biesheuvel
                   ` (6 preceding siblings ...)
  2015-04-10 13:53 ` [PATCH v3 07/11] arm64: fixmap: allow init before linear mapping is set up Ard Biesheuvel
@ 2015-04-10 13:53 ` Ard Biesheuvel
  2015-04-10 13:53 ` [PATCH v3 09/11] arm64: move kernel mapping out of linear region Ard Biesheuvel
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 29+ messages in thread
From: Ard Biesheuvel @ 2015-04-10 13:53 UTC (permalink / raw)
  To: linux-arm-kernel

In preparation of moving the kernel text out of the linear
mapping, ensure that the part of the kernel Image that contains
the statically allocated page tables is made accessible via the
linear mapping before performing the actual mapping of all of
memory. This is needed by the normal mapping routines, which rely
on the linear mapping to walk the page tables while manipulating
them.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/kernel/vmlinux.lds.S |  10 +++-
 arch/arm64/mm/mmu.c             | 104 ++++++++++++++++++++++++++++------------
 2 files changed, 81 insertions(+), 33 deletions(-)

diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 604f285d3832..b7cdf4feb9f1 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -160,8 +160,7 @@ SECTIONS
 
 	BSS_SECTION(0, 0, 0)
 
-	.pgdir (NOLOAD) : {
-		. = ALIGN(PAGE_SIZE);
+	.pgdir (NOLOAD) : ALIGN(SZ_1M) {
 		idmap_pg_dir = .;
 		. += IDMAP_DIR_SIZE;
 		swapper_pg_dir = .;
@@ -187,6 +186,13 @@ ASSERT(__idmap_text_end - (__idmap_text_start & ~(SZ_4K - 1)) <= SZ_4K,
 	"ID map text too big or misaligned")
 
 /*
+ * The pgdir region needs to be mappable using a single PMD or PUD sized region,
+ * so it should not cross a 512 MB or 1 GB alignment boundary, respectively
+ * (depending on page size). So align to an upper bound of its size.
+ */
+ASSERT(SIZEOF(.pgdir) < ALIGNOF(.pgdir), ".pgdir size exceeds its alignment")
+
+/*
  * If padding is applied before .head.text, virt<->phys conversions will fail.
  */
 ASSERT(_text == (PAGE_OFFSET + TEXT_OFFSET), "HEAD is misaligned")
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index c0427b5c90c7..ea35ec911393 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -405,26 +405,83 @@ static void __init bootstrap_mem_region(unsigned long addr,
 	}
 }
 
+static void __init bootstrap_linear_mapping(void)
+{
+	/*
+	 * Bootstrap the linear range that covers swapper_pg_dir so that the
+	 * statically allocated page tables are accessible via the linear
+	 * mapping. This allows us to start using the normal create_mapping()
+	 * logic which relies on the ability to translate physical addresses
+	 * contained in page table entries to virtual addresses using __va().
+	 */
+	static struct mem_bootstrap_region linear_bs_region __pgdir;
+	const phys_addr_t swapper_phys = __pa(swapper_pg_dir);
+	const unsigned long swapper_virt = __phys_to_virt(swapper_phys);
+	struct memblock_region *reg;
+	pmd_t *pmd;
+	pte_t *pte;
+
+	bootstrap_mem_region(swapper_virt, &linear_bs_region, &pmd,
+			     IS_ENABLED(CONFIG_ARM64_64K_PAGES) ? &pte : NULL);
+
+	/* now find the memblock that covers swapper_pg_dir, and clip */
+	for_each_memblock(memory, reg) {
+		phys_addr_t start = reg->base;
+		phys_addr_t end = start + reg->size;
+		unsigned long vstart, vend;
+
+		if (start > swapper_phys || end <= swapper_phys)
+			continue;
+
+#ifdef CONFIG_ARM64_64K_PAGES
+		/* clip the region to PMD size */
+		vstart = max(swapper_virt & PMD_MASK, __phys_to_virt(start));
+		vend = min(round_up(swapper_virt, PMD_SIZE),
+			   __phys_to_virt(end));
+
+		vstart = round_up(vstart, PAGE_SIZE);
+		vend = round_down(vend, PAGE_SIZE);
+
+		pte += pte_index(vstart);
+		do {
+			set_pte(pte++, __pte(__pa(vstart) | PAGE_KERNEL_EXEC));
+			vstart += PAGE_SIZE;
+		} while (vstart < vend);
+#else
+		/* clip the region to PUD size */
+		vstart = max(swapper_virt & PUD_MASK, __phys_to_virt(start));
+		vend = min(round_up(swapper_virt, PUD_SIZE),
+			   __phys_to_virt(end));
+
+		vstart = round_up(vstart, PMD_SIZE);
+		vend = round_down(vend, PMD_SIZE);
+
+		pmd += pmd_index(vstart);
+		do {
+			set_pmd(pmd++,
+				__pmd(__pa(vstart) | PROT_SECT_NORMAL_EXEC));
+			vstart += PMD_SIZE;
+		} while (vstart < vend);
+#endif
+
+		/*
+		 * Temporarily limit the memblock range. We need to do this as
+		 * create_mapping requires puds, pmds and ptes to be allocated
+		 * from memory addressable from the initial direct kernel
+		 * mapping.
+		 */
+		memblock_set_current_limit(__pa(vend));
+
+		return;
+	}
+	BUG();
+}
+
 static void __init map_mem(void)
 {
 	struct memblock_region *reg;
-	phys_addr_t limit;
 
-	/*
-	 * Temporarily limit the memblock range. We need to do this as
-	 * create_mapping requires puds, pmds and ptes to be allocated from
-	 * memory addressable from the initial direct kernel mapping.
-	 *
-	 * The initial direct kernel mapping, located at swapper_pg_dir, gives
</gr-replace>
-	 * us PUD_SIZE (4K pages) or PMD_SIZE (64K pages) memory starting from
-	 * PHYS_OFFSET (which must be aligned to 2MB as per
-	 * Documentation/arm64/booting.txt).
-	 */
-	if (IS_ENABLED(CONFIG_ARM64_64K_PAGES))
-		limit = PHYS_OFFSET + PMD_SIZE;
-	else
-		limit = PHYS_OFFSET + PUD_SIZE;
-	memblock_set_current_limit(limit);
+	bootstrap_linear_mapping();
 
 	/* map all the memory banks */
 	for_each_memblock(memory, reg) {
@@ -434,21 +491,6 @@ static void __init map_mem(void)
 		if (start >= end)
 			break;
 
-#ifndef CONFIG_ARM64_64K_PAGES
-		/*
-		 * For the first memory bank align the start address and
-		 * current memblock limit to prevent create_mapping() from
-		 * allocating pte page tables from unmapped memory.
-		 * When 64K pages are enabled, the pte page table for the
-		 * first PGDIR_SIZE is already present in swapper_pg_dir.
-		 */
-		if (start < limit)
-			start = ALIGN(start, PMD_SIZE);
-		if (end < limit) {
-			limit = end & PMD_MASK;
-			memblock_set_current_limit(limit);
-		}
-#endif
 		__map_memblock(start, end);
 	}
 
-- 
1.8.3.2
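[Editorial note: the vstart/vend computation in bootstrap_linear_mapping() intersects a memblock with the PUD-sized window covering swapper_pg_dir, then shrinks to PMD granularity for section mappings. A sketch of that clipping arithmetic under illustrative constants (4K pages, PHYS_OFFSET and PAGE_OFFSET values are assumptions for the demo):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_OFFSET 0xffff800000000000ULL
#define PUD_SIZE    0x40000000ULL   /* 1 GB */
#define PUD_MASK    (~(PUD_SIZE - 1))
#define PMD_SIZE    0x200000ULL     /* 2 MB */

#define round_up(x, a)   (((x) + (a) - 1) & ~((uint64_t)(a) - 1))
#define round_down(x, a) ((x) & ~((uint64_t)(a) - 1))

static uint64_t phys_offset = 0x80000000ULL;  /* assumed PHYS_OFFSET */

static uint64_t phys_to_virt(uint64_t p)
{
    return p - phys_offset + PAGE_OFFSET;     /* linear mapping */
}

/*
 * Intersect the memblock [start, end) with the PUD-sized window that
 * contains swapper_virt, then round inward to PMD boundaries so the
 * result can be covered with section mappings.
 */
static void clip_to_pud_window(uint64_t swapper_virt,
                               uint64_t start, uint64_t end,
                               uint64_t *vstart, uint64_t *vend)
{
    uint64_t lo = swapper_virt & PUD_MASK;
    uint64_t hi = round_up(swapper_virt, PUD_SIZE);
    uint64_t vs = phys_to_virt(start), ve = phys_to_virt(end);

    *vstart = round_up(lo > vs ? lo : vs, PMD_SIZE);
    *vend   = round_down(hi < ve ? hi : ve, PMD_SIZE);
}
```
]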


* [PATCH v3 09/11] arm64: move kernel mapping out of linear region
  2015-04-10 13:53 [PATCH v3 00/11] arm64: update/clarify/relax Image and FDT placement rules Ard Biesheuvel
                   ` (7 preceding siblings ...)
  2015-04-10 13:53 ` [PATCH v3 08/11] arm64: mm: explicitly bootstrap the linear mapping Ard Biesheuvel
@ 2015-04-10 13:53 ` Ard Biesheuvel
  2015-04-10 13:53 ` [PATCH v3 10/11] arm64: allow kernel Image to be loaded anywhere in physical memory Ard Biesheuvel
  2015-04-10 13:53 ` [PATCH v3 11/11] arm64/efi: adapt to relaxed kernel Image placement requirements Ard Biesheuvel
  10 siblings, 0 replies; 29+ messages in thread
From: Ard Biesheuvel @ 2015-04-10 13:53 UTC (permalink / raw)
  To: linux-arm-kernel

This moves the primary mapping of the kernel Image out of
the linear region. This is a preparatory step towards allowing
the kernel Image to reside anywhere in physical memory without
affecting the ability to map all of it efficiently.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/include/asm/memory.h | 15 ++++++++++++---
 arch/arm64/kernel/head.S        |  8 ++++----
 arch/arm64/kernel/vmlinux.lds.S |  4 ++--
 arch/arm64/mm/mmu.c             |  2 ++
 4 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index f800d45ea226..d92268614ddc 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -38,8 +38,10 @@
  */
 #define PCI_IO_SIZE		SZ_16M
 
+#define KIMAGE_OFFSET		SZ_64M
+
 /*
- * PAGE_OFFSET - the virtual address of the start of the kernel image (top
+ * PAGE_OFFSET - the virtual address of the base of the linear mapping (top
  *		 (VA_BITS - 1))
  * VA_BITS - the maximum number of bits for virtual addresses.
  * TASK_SIZE - the maximum size of a user space task.
@@ -49,7 +51,8 @@
  */
 #define VA_BITS			(CONFIG_ARM64_VA_BITS)
 #define PAGE_OFFSET		(UL(0xffffffffffffffff) << (VA_BITS - 1))
-#define MODULES_END		(PAGE_OFFSET)
+#define KIMAGE_VADDR		(PAGE_OFFSET - KIMAGE_OFFSET)
+#define MODULES_END		KIMAGE_VADDR
 #define MODULES_VADDR		(MODULES_END - SZ_64M)
 #define PCI_IO_END		(MODULES_VADDR - SZ_2M)
 #define PCI_IO_START		(PCI_IO_END - PCI_IO_SIZE)
@@ -77,7 +80,11 @@
  * private definitions which should NOT be used outside memory.h
  * files.  Use virt_to_phys/phys_to_virt/__pa/__va instead.
  */
-#define __virt_to_phys(x)	(((phys_addr_t)(x) - PAGE_OFFSET + PHYS_OFFSET))
+#define __virt_to_phys(x) ({						\
+	long __x = (long)(x) - PAGE_OFFSET;				\
+	__x >= 0 ? (phys_addr_t)(__x + PHYS_OFFSET) : 			\
+		   (phys_addr_t)(__x + PHYS_OFFSET + image_offset); })
+
 #define __phys_to_virt(x)	((unsigned long)((x) - PHYS_OFFSET + PAGE_OFFSET))
 
 /*
@@ -113,6 +120,8 @@ extern phys_addr_t		memstart_addr;
 /* PHYS_OFFSET - the physical address of the start of memory. */
 #define PHYS_OFFSET		({ memstart_addr; })
 
+extern u64 image_offset;
+
 /*
  * PFNs are used to describe any physical page; this means
  * PFN 0 == physical address 0.
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 208ca21868cc..729526a45f60 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -36,8 +36,6 @@
 #include <asm/page.h>
 #include <asm/virt.h>
 
-#define __PHYS_OFFSET	(KERNEL_START - TEXT_OFFSET)
-
 #if (TEXT_OFFSET & 0xfff) != 0
 #error TEXT_OFFSET must be at least 4KB aligned
 #elif (PAGE_OFFSET & 0x1fffff) != 0
@@ -58,6 +56,8 @@
 
 #define KERNEL_START	_text
 #define KERNEL_END	_end
+#define KERNEL_BASE	(KERNEL_START - TEXT_OFFSET)
+
 
 /*
  * Initial memory map attributes.
@@ -235,7 +235,7 @@ section_table:
 ENTRY(stext)
 	bl	preserve_boot_args
 	bl	el2_setup			// Drop to EL1, w20=cpu_boot_mode
-	adrp	x24, __PHYS_OFFSET
+	adrp	x24, KERNEL_BASE
 	bl	set_cpu_boot_mode_flag
 	bl	__create_page_tables		// x25=TTBR0, x26=TTBR1
 	/*
@@ -411,7 +411,7 @@ __create_page_tables:
 	 * Map the kernel image (starting with PHYS_OFFSET).
 	 */
 	mov	x0, x26				// swapper_pg_dir
-	mov	x5, #PAGE_OFFSET
+	ldr	x5, =KERNEL_BASE
 	create_pgd_entry x0, x5, x3, x6
 	ldr	x6, =KERNEL_END			// __va(KERNEL_END)
 	mov	x3, x24				// phys offset
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index b7cdf4feb9f1..7241dd428462 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -84,7 +84,7 @@ SECTIONS
 		*(.discard.*)
 	}
 
-	. = PAGE_OFFSET + TEXT_OFFSET;
+	. = KIMAGE_VADDR + TEXT_OFFSET;
 
 	.head.text : {
 		_text = .;
@@ -195,4 +195,4 @@ ASSERT(SIZEOF(.pgdir) < ALIGNOF(.pgdir), ".pgdir size exceeds its alignment")
 /*
  * If padding is applied before .head.text, virt<->phys conversions will fail.
  */
-ASSERT(_text == (PAGE_OFFSET + TEXT_OFFSET), "HEAD is misaligned")
+ASSERT(_text == (KIMAGE_VADDR + TEXT_OFFSET), "HEAD is misaligned")
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index ea35ec911393..4e2d696c5b9e 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -49,6 +49,8 @@ u64 idmap_t0sz = TCR_T0SZ(VA_BITS);
 struct page *empty_zero_page;
 EXPORT_SYMBOL(empty_zero_page);
 
+u64 image_offset __read_mostly = KIMAGE_OFFSET;
+
 pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 			      unsigned long size, pgprot_t vma_prot)
 {
-- 
1.8.3.2
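[Editorial note: the two-sided __virt_to_phys() above can be illustrated with a standalone model. The constants below (VA layout, PHYS_OFFSET, image_offset values) are assumptions for the demo, not taken from a particular configuration:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_OFFSET   0xffff800000000000ULL
#define KIMAGE_OFFSET 0x4000000ULL                 /* 64 MB */
#define KIMAGE_VADDR  (PAGE_OFFSET - KIMAGE_OFFSET)

static uint64_t phys_offset;   /* PHYS_OFFSET: start of RAM          */
static uint64_t image_offset;  /* extra offset of the kernel window  */

/*
 * Sketch of the patched __virt_to_phys(): addresses at or above
 * PAGE_OFFSET use the plain linear translation; addresses below it
 * (the kernel image window) get image_offset folded in as well.
 */
static uint64_t virt_to_phys(uint64_t va)
{
    int64_t x = (int64_t)(va - PAGE_OFFSET);  /* negative for image */

    return x >= 0 ? (uint64_t)x + phys_offset
                  : (uint64_t)x + phys_offset + image_offset;
}
```

With RAM starting at 0x80000000 and the Image loaded 1 GB into RAM, image_offset becomes KIMAGE_OFFSET + 1 GB and KIMAGE_VADDR translates back to the Image's true physical base.]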


* [PATCH v3 10/11] arm64: allow kernel Image to be loaded anywhere in physical memory
  2015-04-10 13:53 [PATCH v3 00/11] arm64: update/clarify/relax Image and FDT placement rules Ard Biesheuvel
                   ` (8 preceding siblings ...)
  2015-04-10 13:53 ` [PATCH v3 09/11] arm64: move kernel mapping out of linear region Ard Biesheuvel
@ 2015-04-10 13:53 ` Ard Biesheuvel
  2015-04-14 14:36   ` Mark Rutland
  2015-04-10 13:53 ` [PATCH v3 11/11] arm64/efi: adapt to relaxed kernel Image placement requirements Ard Biesheuvel
  10 siblings, 1 reply; 29+ messages in thread
From: Ard Biesheuvel @ 2015-04-10 13:53 UTC (permalink / raw)
  To: linux-arm-kernel

This relaxes the kernel Image placement requirements, so that it
may be placed at any 2 MB aligned offset in physical memory.

This is accomplished by ignoring PHYS_OFFSET when installing
memblocks, and accounting for the apparent virtual offset of
the kernel Image (in addition to the 64 MB that it is moved
below PAGE_OFFSET). As a result, virtual address references
below PAGE_OFFSET are correctly mapped onto physical references
into the kernel Image regardless of where it sits in memory.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 Documentation/arm64/booting.txt | 17 +++++++----------
 arch/arm64/mm/init.c            | 32 +++++++++++++++++++-------------
 2 files changed, 26 insertions(+), 23 deletions(-)

diff --git a/Documentation/arm64/booting.txt b/Documentation/arm64/booting.txt
index 6396460f6085..811d93548bdc 100644
--- a/Documentation/arm64/booting.txt
+++ b/Documentation/arm64/booting.txt
@@ -110,16 +110,13 @@ Header notes:
   depending on selected features, and is effectively unbound.
 
 The Image must be placed text_offset bytes from a 2MB aligned base
-address near the start of usable system RAM and called there. Memory
-below that base address is currently unusable by Linux, and therefore it
-is strongly recommended that this location is the start of system RAM.
-At least image_size bytes from the start of the image must be free for
-use by the kernel.
-
-Any memory described to the kernel (even that below the 2MB aligned base
-address) which is not marked as reserved from the kernel e.g. with a
-memreserve region in the device tree) will be considered as available to
-the kernel.
+address anywhere in usable system RAM and called there. At least
+image_size bytes from the start of the image must be free for use
+by the kernel.
+
+Any memory described to the kernel which is not marked as reserved from
+the kernel (e.g. with a memreserve region in the device tree) will be
+considered as available to the kernel.
 
 Before jumping into the kernel, the following conditions must be met:
 
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 48175b769074..18234c7cf6e6 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -375,8 +375,6 @@ __setup("keepinitrd", keepinitrd_setup);
 
 void __init early_init_dt_add_memory_arch(u64 base, u64 size)
 {
-	const u64 phys_offset = __pa(PAGE_OFFSET);
-
 	if (!PAGE_ALIGNED(base)) {
 		if (size < PAGE_SIZE - (base & ~PAGE_MASK)) {
 			pr_warn("Ignoring memory block 0x%llx - 0x%llx\n",
@@ -388,16 +386,24 @@ void __init early_init_dt_add_memory_arch(u64 base, u64 size)
 	}
 	size &= PAGE_MASK;
 
-	if (base + size < phys_offset) {
-		pr_warning("Ignoring memory block 0x%llx - 0x%llx\n",
-			   base, base + size);
-		return;
-	}
-	if (base < phys_offset) {
-		pr_warning("Ignoring memory range 0x%llx - 0x%llx\n",
-			   base, phys_offset);
-		size -= phys_offset - base;
-		base = phys_offset;
-	}
 	memblock_add(base, size);
+
+	/*
+	 * Set memstart_addr to the base of the lowest physical memory region,
+	 * rounded down to PUD/PMD alignment so we can map it efficiently.
+	 * Since this also affects the apparent offset of the kernel image in
+	 * the virtual address space, increase image_offset by the same amount
+	 * that we decrease memstart_addr.
+	 */
+	if (!memstart_addr || memstart_addr > base) {
+		u64 new_memstart_addr;
+
+		if (IS_ENABLED(CONFIG_ARM64_64K_PAGES))
+			new_memstart_addr = base & PMD_MASK;
+		else
+			new_memstart_addr = base & PUD_MASK;
+
+		image_offset += memstart_addr - new_memstart_addr;
+		memstart_addr = new_memstart_addr;
+	}
 }
-- 
1.8.3.2
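[Editorial note: the memstart_addr update added above depends on memstart_addr having been seeded in head.S with the kernel Image's physical base. A sketch of the bookkeeping under that assumption, with illustrative 4K-page constants:

```c
#include <assert.h>
#include <stdint.h>

#define PUD_SIZE 0x40000000ULL          /* 1 GB with 4K pages */
#define PUD_MASK (~(PUD_SIZE - 1))

static uint64_t memstart_addr;          /* seeded with Image phys base */
static uint64_t image_offset = 0x4000000ULL;  /* KIMAGE_OFFSET (64 MB) */

/*
 * Sketch of the update in early_init_dt_add_memory_arch(): lower
 * memstart_addr to the PUD-aligned base of the lowest region seen so
 * far, and grow image_offset by the same amount so that virt<->phys
 * translations of kernel image addresses remain correct.
 */
static void update_memstart(uint64_t base)
{
    if (!memstart_addr || memstart_addr > base) {
        uint64_t new_memstart = base & PUD_MASK;

        image_offset += memstart_addr - new_memstart;
        memstart_addr = new_memstart;
    }
}
```

For an Image loaded at 0xC0000000 in RAM starting at 0x80000000, memstart_addr drops by 1 GB and image_offset grows by the same 1 GB, keeping the sum used by __virt_to_phys() constant.]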


* [PATCH v3 11/11] arm64/efi: adapt to relaxed kernel Image placement requirements
  2015-04-10 13:53 [PATCH v3 00/11] arm64: update/clarify/relax Image and FDT placement rules Ard Biesheuvel
                   ` (9 preceding siblings ...)
  2015-04-10 13:53 ` [PATCH v3 10/11] arm64: allow kernel Image to be loaded anywhere in physical memory Ard Biesheuvel
@ 2015-04-10 13:53 ` Ard Biesheuvel
  10 siblings, 0 replies; 29+ messages in thread
From: Ard Biesheuvel @ 2015-04-10 13:53 UTC (permalink / raw)
  To: linux-arm-kernel

This adapts the EFI stub kernel placement to the new relaxed
requirements, by placing the kernel Image at the highest available
2 MB offset in physical memory.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/kernel/efi-stub.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/efi-stub.c b/arch/arm64/kernel/efi-stub.c
index f5374065ad53..c2141ed8a043 100644
--- a/arch/arm64/kernel/efi-stub.c
+++ b/arch/arm64/kernel/efi-stub.c
@@ -28,8 +28,8 @@ efi_status_t __init handle_kernel_image(efi_system_table_t *sys_table,
 	kernel_size = _edata - _text;
 	if (*image_addr != (dram_base + TEXT_OFFSET)) {
 		kernel_memsize = kernel_size + (_end - _edata);
-		status = efi_low_alloc(sys_table, kernel_memsize + TEXT_OFFSET,
-				       SZ_2M, reserve_addr);
+		status = efi_high_alloc(sys_table, kernel_memsize + TEXT_OFFSET,
+				       SZ_2M, reserve_addr, ULONG_MAX);
 		if (status != EFI_SUCCESS) {
 			pr_efi_err(sys_table, "Failed to relocate kernel\n");
 			return status;
-- 
1.8.3.2


* [PATCH v3 01/11] arm64: reduce ID map to a single page
  2015-04-10 13:53 ` [PATCH v3 01/11] arm64: reduce ID map to a single page Ard Biesheuvel
@ 2015-04-13 12:53   ` Mark Rutland
  2015-04-13 12:56     ` Ard Biesheuvel
  0 siblings, 1 reply; 29+ messages in thread
From: Mark Rutland @ 2015-04-13 12:53 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Ard,

On Fri, Apr 10, 2015 at 02:53:45PM +0100, Ard Biesheuvel wrote:
> Commit ea8c2e112445 ("arm64: Extend the idmap to the whole kernel
> image") changed the early page table code so that the entire kernel
> Image is covered by the identity map. This allows functions that
> need to enable or disable the MMU to reside anywhere in the kernel
> Image.
> 
> However, this change has the unfortunate side effect that the Image
> cannot cross a physical 512 MB alignment boundary anymore, since the
> early page table code cannot deal with the Image crossing a /virtual/
> 512 MB alignment boundary.
> 
> So instead, reduce the ID map to a single page, that is populated by
> the contents of the .idmap.text section. Only three functions reside
> there at the moment: __enable_mmu(), cpu_resume_mmu() and cpu_reset().

It would be worth mentioning in the cover letter which branch this is
based on (arm64 for-next/core?), given the __enable_mmu + __turn_mmu_on
folding isn't in mainline yet.

> If new code is introduced that needs to manipulate the MMU state, it
> should be added to this section as well.
> 
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

Other than the minor nit below this looks good to me, and I've given it
a spin on Juno with 4K and 64K pages. Assuming you can fix that up:

Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>

> @@ -669,6 +669,7 @@ ENDPROC(__secondary_switched)
>   *
>   * other registers depend on the function called upon completion
>   */
> +	.section	".idmap.text", #alloc, #execinstr

We should use "ax" rather than #alloc, #execinstr to keep things
consistent, unless there's some difference that I'm missing?

I've tested the patch locally with all instances changed to "ax".

Thanks,
Mark.


* [PATCH v3 01/11] arm64: reduce ID map to a single page
  2015-04-13 12:53   ` Mark Rutland
@ 2015-04-13 12:56     ` Ard Biesheuvel
  0 siblings, 0 replies; 29+ messages in thread
From: Ard Biesheuvel @ 2015-04-13 12:56 UTC (permalink / raw)
  To: linux-arm-kernel

On 13 April 2015 at 14:53, Mark Rutland <mark.rutland@arm.com> wrote:
> Hi Ard,
>
> On Fri, Apr 10, 2015 at 02:53:45PM +0100, Ard Biesheuvel wrote:
>> Commit ea8c2e112445 ("arm64: Extend the idmap to the whole kernel
>> image") changed the early page table code so that the entire kernel
>> Image is covered by the identity map. This allows functions that
>> need to enable or disable the MMU to reside anywhere in the kernel
>> Image.
>>
>> However, this change has the unfortunate side effect that the Image
>> cannot cross a physical 512 MB alignment boundary anymore, since the
>> early page table code cannot deal with the Image crossing a /virtual/
>> 512 MB alignment boundary.
>>
>> So instead, reduce the ID map to a single page that is populated by
>> the contents of the .idmap.text section. Only three functions reside
>> there at the moment: __enable_mmu(), cpu_resume_mmu() and cpu_reset().
>
> It would be worth mentioning in the cover letter which branch this is
> based on (arm64 for-next/core?), given the __enable_mmu + __turn_mmu_on
> folding isn't in mainline yet.
>

Ah yes. That just feels so long ago already :-)

>> If new code is introduced that needs to manipulate the MMU state, it
>> should be added to this section as well.
>>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>
> Other than the minor nit below this looks good to me, and I've given it
> a spin on Juno with 4K and 64K pages. Assuming you can fix that up:
>
> Reviewed-by: Mark Rutland <mark.rutland@arm.com>
> Tested-by: Mark Rutland <mark.rutland@arm.com>
>

Thanks!

>> @@ -669,6 +669,7 @@ ENDPROC(__secondary_switched)
>>   *
>>   * other registers depend on the function called upon completion
>>   */
>> +     .section        ".idmap.text", #alloc, #execinstr
>
> We should use "ax" rather than #alloc, #execinstr to keep things
> consistent, unless there's some difference that I'm missing?
>
> I've tested the patch locally with all instances changed to "ax".
>

OK, I will change it. I have a personal preference for the wordy form
when it comes to things like this (try googling for "ax"), but as far
as I know, they are 100% equivalent.
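For reference, the two spellings below ask the assembler for the same
SHF_ALLOC | SHF_EXECINSTR section flags (sketch only; the section name is
the one from the patch):

```asm
	/* ARM-style type list */
	.section	".idmap.text", #alloc, #execinstr

	/* equivalent GAS flag string */
	.section	".idmap.text", "ax"
```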

-- 
Ard.


* [PATCH v3 04/11] arm64: use fixmap region for permanent FDT mapping
  2015-04-10 13:53 ` [PATCH v3 04/11] arm64: use fixmap region for permanent FDT mapping Ard Biesheuvel
@ 2015-04-13 15:02   ` Mark Rutland
  2015-04-13 15:15     ` Ard Biesheuvel
  0 siblings, 1 reply; 29+ messages in thread
From: Mark Rutland @ 2015-04-13 15:02 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Ard,

> -The device tree blob (dtb) must be placed on an 8-byte boundary within
> -the first 512 megabytes from the start of the kernel image and must not
> -cross a 2-megabyte boundary. This is to allow the kernel to map the
> -blob using a single section mapping in the initial page tables.
> +The device tree blob (dtb) must be placed on an 8-byte boundary and must
> +not exceed 2 megabytes in size.
>  
> +NOTE: versions prior to v4.2 also require that the dtb be placed within
> +512 MB of the 2 MB aligned boundary right below the kernel Image.

This reads a little odd (it sounds like you could load the DTB 512M
below the boundary too).

How about:

NOTE: versions prior to v4.2 also require that the DTB be placed within
the 512 MB region starting at text_offset bytes below the kernel Image.

[...]

>  enum fixed_addresses {
>  	FIX_HOLE,
> +
> +	/*
> +	 * Reserve 4 MB of virtual space for the FDT at the top of the fixmap
> +	 * region. Keep this at the top so it remains 2 MB aligned.
> +	 */
> +#define FIX_FDT_SIZE		SZ_4M
> +	FIX_FDT_END,
> +	FIX_FDT = FIX_FDT_END + FIX_FDT_SIZE / PAGE_SIZE - 1,
> +
>  	FIX_EARLYCON_MEM_BASE,
>  	FIX_TEXT_POKE0,
>  	__end_of_permanent_fixed_addresses,

[...]

> +	/*
> +	 * Before passing the dt_virt pointer to early_init_dt_scan(), we have
> +	 * to ensure that the FDT size as reported in the FDT itself does not
> +	 * exceed the window we just mapped for it.
> +	 */
> +	if (!dt_virt ||
> +	    fdt_check_header(dt_virt) != 0 ||
> +	    (dt_phys & (SZ_2M - 1)) + fdt_totalsize(dt_virt) > FIX_FDT_SIZE ||
> +	    !early_init_dt_scan(dt_virt)) {

Doesn't this allow a DTB > 2M (violating the boot protocol's 2M
restriction)?

Perhaps we want a MAX_FDT_SIZE, and have FIX_FDT_SIZE be MAX_FDT_SIZE +
2M. Then we can check that the DTB <= MAX_FDT_SIZE, and we know we will
always be able to map it. 

Otherwise we could get intermittent failures for DTBs larger than 2MB,
depending on where they're loaded. Always failing those for now would
give us a consistent early warning that we need to bump MAX_FDT_SIZE.

[...]

> +void *__init fixmap_remap_fdt(phys_addr_t dt_phys)
> +{
> +	const u64 dt_virt_base = __fix_to_virt(FIX_FDT);
> +	phys_addr_t dt_phys_base = dt_phys & ~(SZ_2M - 1);

Do we definitely retain the high bits here? I don't have the C integer
promotion rules paged into my head, but the definition in
include/linux/sizes.h makes me suspicious, given it should have an int
type:

#define SZ_2M                           0x00200000

> +	u64 pfn = dt_phys_base >> PAGE_SHIFT;
> +	u64 pfn_end = pfn + (FIX_FDT_SIZE >> PAGE_SHIFT);
> +	pgprot_t prot = PAGE_KERNEL | PTE_RDONLY;
> +
> +	/*
> +	 * Make sure that the FDT region can be mapped without the need to
> +	 * allocate additional translation table pages, or perform physical
> +	 * to virtual address translations via the linear mapping.
> +	 *
> +	 * On 4k pages, we'll use section mappings for the region so we
> +	 * only have to be in the same PUD as the rest of the fixmap.
> +	 * On 64k pages, we need to be in the same PMD as well, as the region
> +	 * will be mapped using PTEs.
> +	 */

It would be nice to have the last two statements swapped (with
appropriate wording fixups) to match the order of the if branches.

Otherwise this looks good to me.

Mark.

> +	BUILD_BUG_ON(dt_virt_base & (SZ_2M - 1));
> +
> +	if (IS_ENABLED(CONFIG_ARM64_64K_PAGES)) {
> +		pte_t *pte;
> +
> +		BUILD_BUG_ON(__fix_to_virt(FIX_FDT_END) >> PMD_SHIFT !=
> +			     __fix_to_virt(FIX_BTMAP_BEGIN) >> PMD_SHIFT);
> +
> +		/* map the FDT using 64k pages */
> +		pte = fixmap_pte(dt_virt_base);
> +		do {
> +			set_pte(pte++, pfn_pte(pfn++, prot));
> +		} while (pfn < pfn_end);
> +	} else {
> +		pmd_t *pmd;
> +
> +		BUILD_BUG_ON(__fix_to_virt(FIX_FDT_END) >> PUD_SHIFT !=
> +			     __fix_to_virt(FIX_BTMAP_BEGIN) >> PUD_SHIFT);
> +
> +		/* map the FDT using 2 MB sections */
> +		pmd = fixmap_pmd(dt_virt_base);
> +		do {
> +			set_pmd(pmd++, pfn_pmd(pfn, mk_sect_prot(prot)));
> +			pfn += PMD_SIZE >> PAGE_SHIFT;
> +		} while (pfn < pfn_end);
> +	}
> +
> +	return (void *)(dt_virt_base + dt_phys - dt_phys_base);
> +}
> -- 
> 1.8.3.2
> 


* [PATCH v3 04/11] arm64: use fixmap region for permanent FDT mapping
  2015-04-13 15:02   ` Mark Rutland
@ 2015-04-13 15:15     ` Ard Biesheuvel
  2015-04-13 15:26       ` Mark Rutland
  0 siblings, 1 reply; 29+ messages in thread
From: Ard Biesheuvel @ 2015-04-13 15:15 UTC (permalink / raw)
  To: linux-arm-kernel

On 13 April 2015 at 17:02, Mark Rutland <mark.rutland@arm.com> wrote:
> Hi Ard,
>
>> -The device tree blob (dtb) must be placed on an 8-byte boundary within
>> -the first 512 megabytes from the start of the kernel image and must not
>> -cross a 2-megabyte boundary. This is to allow the kernel to map the
>> -blob using a single section mapping in the initial page tables.
>> +The device tree blob (dtb) must be placed on an 8-byte boundary and must
>> +not exceed 2 megabytes in size.
>>
>> +NOTE: versions prior to v4.2 also require that the dtb be placed within
>> +512 MB of the 2 MB aligned boundary right below the kernel Image.
>
> This reads a little odd (it sounds like you could load the DTB 512M
> below the boundary too).
>
> How about:
>
> NOTE: versions prior to v4.2 also require that the DTB be placed within
> the 512 MB region starting at text_offset bytes below the kernel Image.
>

OK

> [...]
>
>>  enum fixed_addresses {
>>       FIX_HOLE,
>> +
>> +     /*
>> +      * Reserve 4 MB of virtual space for the FDT at the top of the fixmap
>> +      * region. Keep this at the top so it remains 2 MB aligned.
>> +      */
>> +#define FIX_FDT_SIZE         SZ_4M
>> +     FIX_FDT_END,
>> +     FIX_FDT = FIX_FDT_END + FIX_FDT_SIZE / PAGE_SIZE - 1,
>> +
>>       FIX_EARLYCON_MEM_BASE,
>>       FIX_TEXT_POKE0,
>>       __end_of_permanent_fixed_addresses,
>
> [...]
>
>> +     /*
>> +      * Before passing the dt_virt pointer to early_init_dt_scan(), we have
>> +      * to ensure that the FDT size as reported in the FDT itself does not
>> +      * exceed the window we just mapped for it.
>> +      */
>> +     if (!dt_virt ||
>> +         fdt_check_header(dt_virt) != 0 ||
>> +         (dt_phys & (SZ_2M - 1)) + fdt_totalsize(dt_virt) > FIX_FDT_SIZE ||
>> +         !early_init_dt_scan(dt_virt)) {
>
> Doesn't this allow a DTB > 2M (violating the boot protocol's 2M
> restriction)?
>

Yes it does, and I was wondering what to do about it. We could be
pedantic, and fail here with no user visible feedback about what went
wrong, or we could be lax here, and add a warning later (similar to
the x1/x2/x3 check) that the boot protocol was violated if the FDT
exceeds 2 MB but otherwise proceed normally.

> Perhaps we want a MAX_FDT_SIZE, and have FIX_FDT_SIZE be MAX_FDT_SIZE +
> 2M. Then we can check that the DTB <= MAX_FDT_SIZE, and we know we will
> always be able to map it.
>
> Otherwise we could get intermittent failures for DTBs larger than 2MB,
> depending on where they're loaded. Always failing those for now would
> give us a consistent early warning that we need to bump MAX_FDT_SIZE.
>

Consistent, early, but emitted before we even have a console ...

> [...]
>
>> +void *__init fixmap_remap_fdt(phys_addr_t dt_phys)
>> +{
>> +     const u64 dt_virt_base = __fix_to_virt(FIX_FDT);
>> +     phys_addr_t dt_phys_base = dt_phys & ~(SZ_2M - 1);
>
> Do we definitely retain the high bits here? I don't have the C integer
> promotion rules paged into my head, but the definition in
> include/linux/sizes.h makes me suspicious, given it should have an int
> type:
>
> #define SZ_2M                           0x00200000
>

I will use round_down() instead

>> +     u64 pfn = dt_phys_base >> PAGE_SHIFT;
>> +     u64 pfn_end = pfn + (FIX_FDT_SIZE >> PAGE_SHIFT);
>> +     pgprot_t prot = PAGE_KERNEL | PTE_RDONLY;
>> +
>> +     /*
>> +      * Make sure that the FDT region can be mapped without the need to
>> +      * allocate additional translation table pages, or perform physical
>> +      * to virtual address translations via the linear mapping.
>> +      *
>> +      * On 4k pages, we'll use section mappings for the region so we
>> +      * only have to be in the same PUD as the rest of the fixmap.
>> +      * On 64k pages, we need to be in the same PMD as well, as the region
>> +      * will be mapped using PTEs.
>> +      */
>
> It would be nice to have the last two statements swapped (with
> appropriate wording fixups) to match the order of the if branches.
>

ok

> Otherwise this looks good to me.
>

thanks.

-- 
Ard.

>> +     BUILD_BUG_ON(dt_virt_base & (SZ_2M - 1));
>> +
>> +     if (IS_ENABLED(CONFIG_ARM64_64K_PAGES)) {
>> +             pte_t *pte;
>> +
>> +             BUILD_BUG_ON(__fix_to_virt(FIX_FDT_END) >> PMD_SHIFT !=
>> +                          __fix_to_virt(FIX_BTMAP_BEGIN) >> PMD_SHIFT);
>> +
>> +             /* map the FDT using 64k pages */
>> +             pte = fixmap_pte(dt_virt_base);
>> +             do {
>> +                     set_pte(pte++, pfn_pte(pfn++, prot));
>> +             } while (pfn < pfn_end);
>> +     } else {
>> +             pmd_t *pmd;
>> +
>> +             BUILD_BUG_ON(__fix_to_virt(FIX_FDT_END) >> PUD_SHIFT !=
>> +                          __fix_to_virt(FIX_BTMAP_BEGIN) >> PUD_SHIFT);
>> +
>> +             /* map the FDT using 2 MB sections */
>> +             pmd = fixmap_pmd(dt_virt_base);
>> +             do {
>> +                     set_pmd(pmd++, pfn_pmd(pfn, mk_sect_prot(prot)));
>> +                     pfn += PMD_SIZE >> PAGE_SHIFT;
>> +             } while (pfn < pfn_end);
>> +     }
>> +
>> +     return (void *)(dt_virt_base + dt_phys - dt_phys_base);
>> +}
>> --
>> 1.8.3.2
>>


* [PATCH v3 04/11] arm64: use fixmap region for permanent FDT mapping
  2015-04-13 15:15     ` Ard Biesheuvel
@ 2015-04-13 15:26       ` Mark Rutland
  2015-04-13 15:45         ` Ard Biesheuvel
  0 siblings, 1 reply; 29+ messages in thread
From: Mark Rutland @ 2015-04-13 15:26 UTC (permalink / raw)
  To: linux-arm-kernel

> >> +     /*
> >> +      * Before passing the dt_virt pointer to early_init_dt_scan(), we have
> >> +      * to ensure that the FDT size as reported in the FDT itself does not
> >> +      * exceed the window we just mapped for it.
> >> +      */
> >> +     if (!dt_virt ||
> >> +         fdt_check_header(dt_virt) != 0 ||
> >> +         (dt_phys & (SZ_2M - 1)) + fdt_totalsize(dt_virt) > FIX_FDT_SIZE ||
> >> +         !early_init_dt_scan(dt_virt)) {
> >
> > Doesn't this allow a DTB > 2M (violating the boot protocol's 2M
> > restriction)?
> >
> 
> Yes it does, and I was wondering what to do about it. We could be
> pedantic, and fail here with no user visible feedback about what went
> wrong, or we could be lax here, and add a warning later (similar to
> the x1/x2/x3 check) that the boot protocol was violated if the FDT
> exceeds 2 MB but otherwise proceed normally.
> 
> > Perhaps we want a MAX_FDT_SIZE, and have FIX_FDT_SIZE be MAX_FDT_SIZE +
> > 2M. Then we can check that the DTB <= MAX_FDT_SIZE, and we know we will
> > always be able to map it.
> >
> > Otherwise we could get intermittent failures for DTBs larger than 2MB,
> > depending on where they're loaded. Always failing those for now would
> > give us a consistent early warning that we need to bump MAX_FDT_SIZE.
> >
> 
> Consistent, early, but emitted before we even have a console ...

When I say warning, I mean that boot will consistently fail, rather than
there being a pr_warn.

People seem to wait until their platform is hilariously broken before
reporting bugs...

> >> +void *__init fixmap_remap_fdt(phys_addr_t dt_phys)
> >> +{
> >> +     const u64 dt_virt_base = __fix_to_virt(FIX_FDT);
> >> +     phys_addr_t dt_phys_base = dt_phys & ~(SZ_2M - 1);
> >
> > Do we definitely retain the high bits here? I don't have the C integer
> > promotion rules paged into my head, but the definition in
> > include/linux/sizes.h makes me suspicious, given it should have an int
> > type:
> >
> > #define SZ_2M                           0x00200000
> >
> 
> I will use round_down() instead

Ok.

Mark.


* [PATCH v3 04/11] arm64: use fixmap region for permanent FDT mapping
  2015-04-13 15:26       ` Mark Rutland
@ 2015-04-13 15:45         ` Ard Biesheuvel
  2015-04-13 16:26           ` Mark Rutland
  0 siblings, 1 reply; 29+ messages in thread
From: Ard Biesheuvel @ 2015-04-13 15:45 UTC (permalink / raw)
  To: linux-arm-kernel

On 13 April 2015 at 17:26, Mark Rutland <mark.rutland@arm.com> wrote:
>> >> +     /*
>> >> +      * Before passing the dt_virt pointer to early_init_dt_scan(), we have
>> >> +      * to ensure that the FDT size as reported in the FDT itself does not
>> >> +      * exceed the window we just mapped for it.
>> >> +      */
>> >> +     if (!dt_virt ||
>> >> +         fdt_check_header(dt_virt) != 0 ||
>> >> +         (dt_phys & (SZ_2M - 1)) + fdt_totalsize(dt_virt) > FIX_FDT_SIZE ||
>> >> +         !early_init_dt_scan(dt_virt)) {
>> >
>> > Doesn't this allow a DTB > 2M (violating the boot protocol's 2M
>> > restriction)?
>> >
>>
>> Yes it does, and I was wondering what to do about it. We could be
>> pedantic, and fail here with no user visible feedback about what went
>> wrong, or we could be lax here, and add a warning later (similar to
>> the x1/x2/x3 check) that the boot protocol was violated if the FDT
>> exceeds 2 MB but otherwise proceed normally.
>>
>> > Perhaps we want a MAX_FDT_SIZE, and have FIX_FDT_SIZE be MAX_FDT_SIZE +
>> > 2M. Then we can check that the DTB <= MAX_FDT_SIZE, and we know we will
>> > always be able to map it.
>> >
>> > Otherwise we could get intermittent failures for DTBs larger than 2MB,
>> > depending on where they're loaded. Always failing those for now would
>> > give us a consistent early warning that we need to bump MAX_FDT_SIZE.
>> >
>>
>> Consistent, early, but emitted before we even have a console ...
>
> When I say warning, I mean that boot will consistently fail, rather than
> there being a pr_warn.
>
> People seem to wait until their platform is hilariously broken before
> reporting bugs...
>

Yeah, that is true. At this stage, there is still room to be pedantic.

I will take your MAX_FDT_SIZE/FIX_FDT_SIZE suggestion, and check
fdt_totalsize() against the former instead.

I was wondering: this code now always maps the full 4 MB, even on 64k
pages, although in the majority of cases the actual FDT is much
smaller. Do you think there is a downside to that?

-- 
Ard.


>> >> +void *__init fixmap_remap_fdt(phys_addr_t dt_phys)
>> >> +{
>> >> +     const u64 dt_virt_base = __fix_to_virt(FIX_FDT);
>> >> +     phys_addr_t dt_phys_base = dt_phys & ~(SZ_2M - 1);
>> >
>> > Do we definitely retain the high bits here? I don't have the C integer
>> > promotion rules paged into my head, but the definition in
>> > include/linux/sizes.h makes me suspicious, given it should have an int
>> > type:
>> >
>> > #define SZ_2M                           0x00200000
>> >
>>
>> I will use round_down() instead
>
> Ok.
>
> Mark.


* [PATCH v3 05/11] arm64/efi: adapt to relaxed FDT placement requirements
  2015-04-10 13:53 ` [PATCH v3 05/11] arm64/efi: adapt to relaxed FDT placement requirements Ard Biesheuvel
@ 2015-04-13 16:20   ` Mark Rutland
  2015-04-13 16:25     ` Ard Biesheuvel
  0 siblings, 1 reply; 29+ messages in thread
From: Mark Rutland @ 2015-04-13 16:20 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Apr 10, 2015 at 02:53:49PM +0100, Ard Biesheuvel wrote:
> With the relaxed FDT placement requirements in place, we can change
> the allocation strategy used by the stub to put the FDT image higher
> up in memory. At the same time, reduce the minimal alignment to 8 bytes,
> and impose a 2 MB size limit, as per the new requirements.
> 
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
>  arch/arm64/include/asm/efi.h            |  8 +++-----
>  drivers/firmware/efi/libstub/arm-stub.c |  5 ++---
>  drivers/firmware/efi/libstub/efistub.h  |  1 -
>  drivers/firmware/efi/libstub/fdt.c      | 23 +++++++++++++----------
>  4 files changed, 18 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/efi.h b/arch/arm64/include/asm/efi.h
> index ef572206f1c3..bd513dd663b9 100644
> --- a/arch/arm64/include/asm/efi.h
> +++ b/arch/arm64/include/asm/efi.h
> @@ -39,12 +39,10 @@ extern void efi_init(void);
>  /* arch specific definitions used by the stub code */
>  
>  /*
> - * AArch64 requires the DTB to be 8-byte aligned in the first 512MiB from
> - * start of kernel and may not cross a 2MiB boundary. We set alignment to
> - * 2MiB so we know it won't cross a 2MiB boundary.
> + * AArch64 requires the DTB to be 8-byte aligned and not exceed 2MiB in size.
>   */
> -#define EFI_FDT_ALIGN	SZ_2M   /* used by allocate_new_fdt_and_exit_boot() */
> -#define MAX_FDT_OFFSET	SZ_512M
> +#define EFI_FDT_ALIGN		8
> +#define EFI_FDT_MAX_SIZE	SZ_2M

Is there any way we can organise things so we can share the kernel's
MAX_FDT_SIZE definition, so as to keep the stub and kernel in sync?

Otherwise, the patch looks good to me.

Mark.

>  
>  #define efi_call_early(f, ...) sys_table_arg->boottime->f(__VA_ARGS__)
>  
> diff --git a/drivers/firmware/efi/libstub/arm-stub.c b/drivers/firmware/efi/libstub/arm-stub.c
> index dcae482a9a17..f54c76a4fd32 100644
> --- a/drivers/firmware/efi/libstub/arm-stub.c
> +++ b/drivers/firmware/efi/libstub/arm-stub.c
> @@ -269,9 +269,8 @@ unsigned long efi_entry(void *handle, efi_system_table_t *sys_table,
>  
>  	new_fdt_addr = fdt_addr;
>  	status = allocate_new_fdt_and_exit_boot(sys_table, handle,
> -				&new_fdt_addr, dram_base + MAX_FDT_OFFSET,
> -				initrd_addr, initrd_size, cmdline_ptr,
> -				fdt_addr, fdt_size);
> +				&new_fdt_addr, initrd_addr, initrd_size,
> +				cmdline_ptr, fdt_addr, fdt_size);
>  
>  	/*
>  	 * If all went well, we need to return the FDT address to the
> diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h
> index 47437b16b186..c8e096094ea9 100644
> --- a/drivers/firmware/efi/libstub/efistub.h
> +++ b/drivers/firmware/efi/libstub/efistub.h
> @@ -35,7 +35,6 @@ efi_status_t update_fdt(efi_system_table_t *sys_table, void *orig_fdt,
>  efi_status_t allocate_new_fdt_and_exit_boot(efi_system_table_t *sys_table,
>  					    void *handle,
>  					    unsigned long *new_fdt_addr,
> -					    unsigned long max_addr,
>  					    u64 initrd_addr, u64 initrd_size,
>  					    char *cmdline_ptr,
>  					    unsigned long fdt_addr,
> diff --git a/drivers/firmware/efi/libstub/fdt.c b/drivers/firmware/efi/libstub/fdt.c
> index 91da56c4fd54..ace5ed70a88e 100644
> --- a/drivers/firmware/efi/libstub/fdt.c
> +++ b/drivers/firmware/efi/libstub/fdt.c
> @@ -165,10 +165,6 @@ fdt_set_fail:
>  	return EFI_LOAD_ERROR;
>  }
>  
> -#ifndef EFI_FDT_ALIGN
> -#define EFI_FDT_ALIGN EFI_PAGE_SIZE
> -#endif
> -
>  /*
>   * Allocate memory for a new FDT, then add EFI, commandline, and
>   * initrd related fields to the FDT.  This routine increases the
> @@ -186,7 +182,6 @@ fdt_set_fail:
>  efi_status_t allocate_new_fdt_and_exit_boot(efi_system_table_t *sys_table,
>  					    void *handle,
>  					    unsigned long *new_fdt_addr,
> -					    unsigned long max_addr,
>  					    u64 initrd_addr, u64 initrd_size,
>  					    char *cmdline_ptr,
>  					    unsigned long fdt_addr,
> @@ -197,6 +192,7 @@ efi_status_t allocate_new_fdt_and_exit_boot(efi_system_table_t *sys_table,
>  	unsigned long mmap_key;
>  	efi_memory_desc_t *memory_map, *runtime_map;
>  	unsigned long new_fdt_size;
> +	void *fdt_alloc;
>  	efi_status_t status;
>  	int runtime_entry_count = 0;
>  
> @@ -221,14 +217,21 @@ efi_status_t allocate_new_fdt_and_exit_boot(efi_system_table_t *sys_table,
>  	 * will allocate a bigger buffer if this ends up being too
>  	 * small, so a rough guess is OK here.
>  	 */
> -	new_fdt_size = fdt_size + EFI_PAGE_SIZE;
> +	new_fdt_size = fdt_size + EFI_PAGE_SIZE + EFI_FDT_ALIGN;
>  	while (1) {
> -		status = efi_high_alloc(sys_table, new_fdt_size, EFI_FDT_ALIGN,
> -					new_fdt_addr, max_addr);
> +		if (new_fdt_size > EFI_FDT_MAX_SIZE) {
> +			pr_efi_err(sys_table, "FDT size exceeds EFI_FDT_MAX_SIZE.\n");
> +			goto fail;
> +		}
> +		status = sys_table->boottime->allocate_pool(EFI_LOADER_DATA,
> +							    new_fdt_size,
> +							    &fdt_alloc);
>  		if (status != EFI_SUCCESS) {
>  			pr_efi_err(sys_table, "Unable to allocate memory for new device tree.\n");
>  			goto fail;
>  		}
> +		*new_fdt_addr = round_up((unsigned long)fdt_alloc,
> +					 EFI_FDT_ALIGN);
>  
>  		/*
>  		 * Now that we have done our final memory allocation (and free)
> @@ -258,7 +261,7 @@ efi_status_t allocate_new_fdt_and_exit_boot(efi_system_table_t *sys_table,
>  			 * to get new one that reflects the free/alloc we do
>  			 * on the device tree buffer.
>  			 */
> -			efi_free(sys_table, new_fdt_size, *new_fdt_addr);
> +			sys_table->boottime->free_pool(&fdt_alloc);
>  			sys_table->boottime->free_pool(memory_map);
>  			new_fdt_size += EFI_PAGE_SIZE;
>  		} else {
> @@ -316,7 +319,7 @@ fail_free_mmap:
>  	sys_table->boottime->free_pool(memory_map);
>  
>  fail_free_new_fdt:
> -	efi_free(sys_table, new_fdt_size, *new_fdt_addr);
> +	sys_table->boottime->free_pool(&fdt_alloc);
>  
>  fail:
>  	sys_table->boottime->free_pool(runtime_map);
> -- 
> 1.8.3.2
> 


* [PATCH v3 05/11] arm64/efi: adapt to relaxed FDT placement requirements
  2015-04-13 16:20   ` Mark Rutland
@ 2015-04-13 16:25     ` Ard Biesheuvel
  2015-04-13 16:35       ` Mark Rutland
  0 siblings, 1 reply; 29+ messages in thread
From: Ard Biesheuvel @ 2015-04-13 16:25 UTC (permalink / raw)
  To: linux-arm-kernel

On 13 April 2015 at 18:20, Mark Rutland <mark.rutland@arm.com> wrote:
> On Fri, Apr 10, 2015 at 02:53:49PM +0100, Ard Biesheuvel wrote:
>> With the relaxed FDT placement requirements in place, we can change
>> the allocation strategy used by the stub to put the FDT image higher
>> up in memory. At the same time, reduce the minimal alignment to 8 bytes,
>> and impose a 2 MB size limit, as per the new requirements.
>>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> ---
>>  arch/arm64/include/asm/efi.h            |  8 +++-----
>>  drivers/firmware/efi/libstub/arm-stub.c |  5 ++---
>>  drivers/firmware/efi/libstub/efistub.h  |  1 -
>>  drivers/firmware/efi/libstub/fdt.c      | 23 +++++++++++++----------
>>  4 files changed, 18 insertions(+), 19 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/efi.h b/arch/arm64/include/asm/efi.h
>> index ef572206f1c3..bd513dd663b9 100644
>> --- a/arch/arm64/include/asm/efi.h
>> +++ b/arch/arm64/include/asm/efi.h
>> @@ -39,12 +39,10 @@ extern void efi_init(void);
>>  /* arch specific definitions used by the stub code */
>>
>>  /*
>> - * AArch64 requires the DTB to be 8-byte aligned in the first 512MiB from
>> - * start of kernel and may not cross a 2MiB boundary. We set alignment to
>> - * 2MiB so we know it won't cross a 2MiB boundary.
>> + * AArch64 requires the DTB to be 8-byte aligned and not exceed 2MiB in size.
>>   */
>> -#define EFI_FDT_ALIGN        SZ_2M   /* used by allocate_new_fdt_and_exit_boot() */
>> -#define MAX_FDT_OFFSET       SZ_512M
>> +#define EFI_FDT_ALIGN                8
>> +#define EFI_FDT_MAX_SIZE     SZ_2M
>
> Is there any way we can organise things so we can share the kernel's
> MAX_FDT_SIZE definition, so as to keep the stub and kernel in sync?
>

I'll put MAX_FDT_SIZE in asm/page.h (unless you can propose a better place).
That way, both fixmap.h and efi.h can refer to it.


> Otherwise, the patch looks good to me.
>
> Mark.
>
>>
>>  #define efi_call_early(f, ...) sys_table_arg->boottime->f(__VA_ARGS__)
>>
>> diff --git a/drivers/firmware/efi/libstub/arm-stub.c b/drivers/firmware/efi/libstub/arm-stub.c
>> index dcae482a9a17..f54c76a4fd32 100644
>> --- a/drivers/firmware/efi/libstub/arm-stub.c
>> +++ b/drivers/firmware/efi/libstub/arm-stub.c
>> @@ -269,9 +269,8 @@ unsigned long efi_entry(void *handle, efi_system_table_t *sys_table,
>>
>>       new_fdt_addr = fdt_addr;
>>       status = allocate_new_fdt_and_exit_boot(sys_table, handle,
>> -                             &new_fdt_addr, dram_base + MAX_FDT_OFFSET,
>> -                             initrd_addr, initrd_size, cmdline_ptr,
>> -                             fdt_addr, fdt_size);
>> +                             &new_fdt_addr, initrd_addr, initrd_size,
>> +                             cmdline_ptr, fdt_addr, fdt_size);
>>
>>       /*
>>        * If all went well, we need to return the FDT address to the
>> diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h
>> index 47437b16b186..c8e096094ea9 100644
>> --- a/drivers/firmware/efi/libstub/efistub.h
>> +++ b/drivers/firmware/efi/libstub/efistub.h
>> @@ -35,7 +35,6 @@ efi_status_t update_fdt(efi_system_table_t *sys_table, void *orig_fdt,
>>  efi_status_t allocate_new_fdt_and_exit_boot(efi_system_table_t *sys_table,
>>                                           void *handle,
>>                                           unsigned long *new_fdt_addr,
>> -                                         unsigned long max_addr,
>>                                           u64 initrd_addr, u64 initrd_size,
>>                                           char *cmdline_ptr,
>>                                           unsigned long fdt_addr,
>> diff --git a/drivers/firmware/efi/libstub/fdt.c b/drivers/firmware/efi/libstub/fdt.c
>> index 91da56c4fd54..ace5ed70a88e 100644
>> --- a/drivers/firmware/efi/libstub/fdt.c
>> +++ b/drivers/firmware/efi/libstub/fdt.c
>> @@ -165,10 +165,6 @@ fdt_set_fail:
>>       return EFI_LOAD_ERROR;
>>  }
>>
>> -#ifndef EFI_FDT_ALIGN
>> -#define EFI_FDT_ALIGN EFI_PAGE_SIZE
>> -#endif
>> -
>>  /*
>>   * Allocate memory for a new FDT, then add EFI, commandline, and
>>   * initrd related fields to the FDT.  This routine increases the
>> @@ -186,7 +182,6 @@ fdt_set_fail:
>>  efi_status_t allocate_new_fdt_and_exit_boot(efi_system_table_t *sys_table,
>>                                           void *handle,
>>                                           unsigned long *new_fdt_addr,
>> -                                         unsigned long max_addr,
>>                                           u64 initrd_addr, u64 initrd_size,
>>                                           char *cmdline_ptr,
>>                                           unsigned long fdt_addr,
>> @@ -197,6 +192,7 @@ efi_status_t allocate_new_fdt_and_exit_boot(efi_system_table_t *sys_table,
>>       unsigned long mmap_key;
>>       efi_memory_desc_t *memory_map, *runtime_map;
>>       unsigned long new_fdt_size;
>> +     void *fdt_alloc;
>>       efi_status_t status;
>>       int runtime_entry_count = 0;
>>
>> @@ -221,14 +217,21 @@ efi_status_t allocate_new_fdt_and_exit_boot(efi_system_table_t *sys_table,
>>        * will allocate a bigger buffer if this ends up being too
>>        * small, so a rough guess is OK here.
>>        */
>> -     new_fdt_size = fdt_size + EFI_PAGE_SIZE;
>> +     new_fdt_size = fdt_size + EFI_PAGE_SIZE + EFI_FDT_ALIGN;
>>       while (1) {
>> -             status = efi_high_alloc(sys_table, new_fdt_size, EFI_FDT_ALIGN,
>> -                                     new_fdt_addr, max_addr);
>> +             if (new_fdt_size > EFI_FDT_MAX_SIZE) {
>> +                     pr_efi_err(sys_table, "FDT size exceeds EFI_FDT_MAX_SIZE.\n");
>> +                     goto fail;
>> +             }
>> +             status = sys_table->boottime->allocate_pool(EFI_LOADER_DATA,
>> +                                                         new_fdt_size,
>> +                                                         &fdt_alloc);
>>               if (status != EFI_SUCCESS) {
>>                       pr_efi_err(sys_table, "Unable to allocate memory for new device tree.\n");
>>                       goto fail;
>>               }
>> +             *new_fdt_addr = round_up((unsigned long)fdt_alloc,
>> +                                      EFI_FDT_ALIGN);
>>
>>               /*
>>                * Now that we have done our final memory allocation (and free)
>> @@ -258,7 +261,7 @@ efi_status_t allocate_new_fdt_and_exit_boot(efi_system_table_t *sys_table,
>>                        * to get new one that reflects the free/alloc we do
>>                        * on the device tree buffer.
>>                        */
>> -                     efi_free(sys_table, new_fdt_size, *new_fdt_addr);
>> +                     sys_table->boottime->free_pool(&fdt_alloc);
>>                       sys_table->boottime->free_pool(memory_map);
>>                       new_fdt_size += EFI_PAGE_SIZE;
>>               } else {
>> @@ -316,7 +319,7 @@ fail_free_mmap:
>>       sys_table->boottime->free_pool(memory_map);
>>
>>  fail_free_new_fdt:
>> -     efi_free(sys_table, new_fdt_size, *new_fdt_addr);
>> +     sys_table->boottime->free_pool(&fdt_alloc);
>>
>>  fail:
>>       sys_table->boottime->free_pool(runtime_map);
>> --
>> 1.8.3.2
>>


* [PATCH v3 04/11] arm64: use fixmap region for permanent FDT mapping
  2015-04-13 15:45         ` Ard Biesheuvel
@ 2015-04-13 16:26           ` Mark Rutland
  2015-04-14  7:44             ` Ard Biesheuvel
  0 siblings, 1 reply; 29+ messages in thread
From: Mark Rutland @ 2015-04-13 16:26 UTC (permalink / raw)
  To: linux-arm-kernel

> I was wondering: this code now always maps the full 4 MB, even on 64k
> pages, although in the majority of cases the actual FDT is much
> smaller. Do you think there is a downside to that?

The two cases I can think of as problematic are when the FDT is in the
last 2M of RAM, or in a 2M region immediately before some carveout which
won't use MT_NORMAL attributes.

We could solve those by only mapping however many 2M chunks are
necessary, or we could keep the requirement that it fits within a
naturally-aligned 2M region for now.
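For illustration, the first option boils down to computing how many naturally-aligned 2M blocks the FDT actually spans; a sketch of that arithmetic (plain C, not the actual fixmap code):

```c
/* Hedged sketch: count the naturally-aligned 2 MiB blocks an FDT spans,
 * so only those blocks need fixmap slots, rather than a fixed 4 MiB. */
#include <stdint.h>

#define SZ_2M 0x200000ULL

static uint64_t block_start(uint64_t pa)
{
	/* round a physical address down to its 2 MiB block base */
	return pa & ~(SZ_2M - 1);
}

static uint64_t fdt_blocks_needed(uint64_t fdt_pa, uint64_t fdt_size)
{
	uint64_t first = block_start(fdt_pa);
	uint64_t last  = block_start(fdt_pa + fdt_size - 1);

	return (last - first) / SZ_2M + 1;
}
```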

I don't see that this should yield significant problems otherwise.

Mark.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v3 05/11] arm64/efi: adapt to relaxed FDT placement requirements
  2015-04-13 16:25     ` Ard Biesheuvel
@ 2015-04-13 16:35       ` Mark Rutland
  2015-04-13 16:36         ` Ard Biesheuvel
  0 siblings, 1 reply; 29+ messages in thread
From: Mark Rutland @ 2015-04-13 16:35 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Apr 13, 2015 at 05:25:55PM +0100, Ard Biesheuvel wrote:
> On 13 April 2015 at 18:20, Mark Rutland <mark.rutland@arm.com> wrote:
> > On Fri, Apr 10, 2015 at 02:53:49PM +0100, Ard Biesheuvel wrote:
> >> With the relaxed FDT placement requirements in place, we can change
> >> the allocation strategy used by the stub to put the FDT image higher
> >> up in memory. At the same time, reduce the minimal alignment to 8 bytes,
> >> and impose a 2 MB size limit, as per the new requirements.
> >>
> >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> >> ---
> >>  arch/arm64/include/asm/efi.h            |  8 +++-----
> >>  drivers/firmware/efi/libstub/arm-stub.c |  5 ++---
> >>  drivers/firmware/efi/libstub/efistub.h  |  1 -
> >>  drivers/firmware/efi/libstub/fdt.c      | 23 +++++++++++++----------
> >>  4 files changed, 18 insertions(+), 19 deletions(-)
> >>
> >> diff --git a/arch/arm64/include/asm/efi.h b/arch/arm64/include/asm/efi.h
> >> index ef572206f1c3..bd513dd663b9 100644
> >> --- a/arch/arm64/include/asm/efi.h
> >> +++ b/arch/arm64/include/asm/efi.h
> >> @@ -39,12 +39,10 @@ extern void efi_init(void);
> >>  /* arch specific definitions used by the stub code */
> >>
> >>  /*
> >> - * AArch64 requires the DTB to be 8-byte aligned in the first 512MiB from
> >> - * start of kernel and may not cross a 2MiB boundary. We set alignment to
> >> - * 2MiB so we know it won't cross a 2MiB boundary.
> >> + * AArch64 requires the DTB to be 8-byte aligned and not exceed 2MiB in size.
> >>   */
> >> -#define EFI_FDT_ALIGN        SZ_2M   /* used by allocate_new_fdt_and_exit_boot() */
> >> -#define MAX_FDT_OFFSET       SZ_512M
> >> +#define EFI_FDT_ALIGN                8
> >> +#define EFI_FDT_MAX_SIZE     SZ_2M
> >
> > Is there any way we can organise things so we can share the kernel's
> > MAX_FDT_SIZE definition, so as to keep the stub and kernel in sync?
> >
> 
> I'll put MAX_FDT_SIZE in asm/page.h (unless you can propose a better place)
> That way, both fixmap.h and efi.h can refer to it.

Using page.h feels a little weird, but I can't see that a better header
exists currently.

We could add asm/boot.h for things related to boot protocol that we want
to have shareable with the stub, but I can't see anything else that we'd
want to put in there at the moment. If there are other things we could
deduplicate it would probably make sense, but it seems overkill for a
single definition. :/

Mark.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v3 05/11] arm64/efi: adapt to relaxed FDT placement requirements
  2015-04-13 16:35       ` Mark Rutland
@ 2015-04-13 16:36         ` Ard Biesheuvel
  0 siblings, 0 replies; 29+ messages in thread
From: Ard Biesheuvel @ 2015-04-13 16:36 UTC (permalink / raw)
  To: linux-arm-kernel

On 13 April 2015 at 18:35, Mark Rutland <mark.rutland@arm.com> wrote:
> On Mon, Apr 13, 2015 at 05:25:55PM +0100, Ard Biesheuvel wrote:
>> On 13 April 2015 at 18:20, Mark Rutland <mark.rutland@arm.com> wrote:
>> > On Fri, Apr 10, 2015 at 02:53:49PM +0100, Ard Biesheuvel wrote:
>> >> With the relaxed FDT placement requirements in place, we can change
>> >> the allocation strategy used by the stub to put the FDT image higher
>> >> up in memory. At the same time, reduce the minimal alignment to 8 bytes,
>> >> and impose a 2 MB size limit, as per the new requirements.
>> >>
>> >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> >> ---
>> >>  arch/arm64/include/asm/efi.h            |  8 +++-----
>> >>  drivers/firmware/efi/libstub/arm-stub.c |  5 ++---
>> >>  drivers/firmware/efi/libstub/efistub.h  |  1 -
>> >>  drivers/firmware/efi/libstub/fdt.c      | 23 +++++++++++++----------
>> >>  4 files changed, 18 insertions(+), 19 deletions(-)
>> >>
>> >> diff --git a/arch/arm64/include/asm/efi.h b/arch/arm64/include/asm/efi.h
>> >> index ef572206f1c3..bd513dd663b9 100644
>> >> --- a/arch/arm64/include/asm/efi.h
>> >> +++ b/arch/arm64/include/asm/efi.h
>> >> @@ -39,12 +39,10 @@ extern void efi_init(void);
>> >>  /* arch specific definitions used by the stub code */
>> >>
>> >>  /*
>> >> - * AArch64 requires the DTB to be 8-byte aligned in the first 512MiB from
>> >> - * start of kernel and may not cross a 2MiB boundary. We set alignment to
>> >> - * 2MiB so we know it won't cross a 2MiB boundary.
>> >> + * AArch64 requires the DTB to be 8-byte aligned and not exceed 2MiB in size.
>> >>   */
>> >> -#define EFI_FDT_ALIGN        SZ_2M   /* used by allocate_new_fdt_and_exit_boot() */
>> >> -#define MAX_FDT_OFFSET       SZ_512M
>> >> +#define EFI_FDT_ALIGN                8
>> >> +#define EFI_FDT_MAX_SIZE     SZ_2M
>> >
>> > Is there any way we can organise things so we can share the kernel's
>> > MAX_FDT_SIZE definition, so as to keep the stub and kernel in sync?
>> >
>>
>> I'll put MAX_FDT_SIZE in asm/page.h (unless you can propose a better place)
>> That way, both fixmap.h and efi.h can refer to it.
>
> Using page.h feels a little weird, but I can't see that a better header
> exists currently.
>
> We could add asm/boot.h for things related to boot protocol that we want
> to have shareable with the stub, but I can't see anything else that we'd
> want to put in there at the moment. If there are other things we could
> deduplicate it would probably make sense, but it seems overkill for a
> single definition. :/
>

I was going to add '#define MIN_FDT_ALIGN 8' to it as well, and refer
to it from efi.h.

I think asm/boot.h is not so bad actually.
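For reference, a minimal asm/boot.h along those lines could look like this (a sketch based on the values discussed above, not the final merged header):

```c
/* Hypothetical arch/arm64/include/asm/boot.h, sketched from this thread. */
#ifndef __ASM_BOOT_H
#define __ASM_BOOT_H

#define SZ_2M 0x00200000	/* normally pulled in via <linux/sizes.h> */

/* arm64 boot protocol constraints on the FDT, shared with the EFI stub */
#define MIN_FDT_ALIGN	8
#define MAX_FDT_SIZE	SZ_2M

#endif /* __ASM_BOOT_H */
```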

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v3 04/11] arm64: use fixmap region for permanent FDT mapping
  2015-04-13 16:26           ` Mark Rutland
@ 2015-04-14  7:44             ` Ard Biesheuvel
  2015-04-14  8:57               ` Mark Rutland
  0 siblings, 1 reply; 29+ messages in thread
From: Ard Biesheuvel @ 2015-04-14  7:44 UTC (permalink / raw)
  To: linux-arm-kernel


On 13 Apr 2015, at 18:26, Mark Rutland <mark.rutland@arm.com> wrote:

>> I was wondering: this code now always maps the full 4 MB, even on 64k
>> pages, though in the majority of cases the actual FDT is much
>> smaller. Do you think there is a downside to that?
> 
> The two cases I can think of as problematic are when the FDT is in the
> last 2M of RAM, or in a 2M region immediately before some carveout which
> won't use MT_NORMAL attributes.
> 
> We could solve those by only mapping however many 2M chunks are
> necessary, or we could keep the requirement that it fits within a
> naturally-aligned 2M region for now.
> 

I strongly prefer the former, since otherwise you always need to verify the allocation explicitly, for an FDT of any size. It turns out the logic is easily updated to move the vetting into fixmap_remap_fdt(): check the magic and size after mapping the first chunk, and set pfn_end accordingly.

> I don't see that this should yield significant problems otherwise.
> 
> Mark.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v3 04/11] arm64: use fixmap region for permanent FDT mapping
  2015-04-14  7:44             ` Ard Biesheuvel
@ 2015-04-14  8:57               ` Mark Rutland
  0 siblings, 0 replies; 29+ messages in thread
From: Mark Rutland @ 2015-04-14  8:57 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Apr 14, 2015 at 08:44:27AM +0100, Ard Biesheuvel wrote:
> 
> On 13 Apr 2015, at 18:26, Mark Rutland <mark.rutland@arm.com> wrote:
> 
> >> I was wondering: this code now always maps the full 4 MB, even on 64k
> >> pages, though in the majority of cases the actual FDT is much
> >> smaller. Do you think there is a downside to that?
> > 
> > The two cases I can think of as problematic are when the FDT is in the
> > last 2M of RAM, or in a 2M region immediately before some carveout which
> > won't use MT_NORMAL attributes.
> > 
> > We could solve those by only mapping however many 2M chunks are
> > necessary, or we could keep the requirement that it fits within a
> > naturally-aligned 2M region for now.
> > 
> 
> I strongly prefer the former, since otherwise you always need to
> verify the allocation explicitly, for an FDT of any size. It turns
> out the logic is easily updated to move the vetting into
> fixmap_remap_fdt(): check the magic and size after mapping the
> first chunk, and set pfn_end accordingly.

I'm happy with that; it also decays to the old behaviour for anything
that works currently.

We should probably make it clear in the booting document that the kernel
should be expected to map the DTB in 2M chunks, so people know they
shouldn't share any 2M chunk with a carveout.

Mark.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v3 07/11] arm64: fixmap: allow init before linear mapping is set up
  2015-04-10 13:53 ` [PATCH v3 07/11] arm64: fixmap: allow init before linear mapping is set up Ard Biesheuvel
@ 2015-04-14 10:47   ` Mark Rutland
  2015-04-14 11:02     ` Ard Biesheuvel
  0 siblings, 1 reply; 29+ messages in thread
From: Mark Rutland @ 2015-04-14 10:47 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Ard,

On Fri, Apr 10, 2015 at 02:53:51PM +0100, Ard Biesheuvel wrote:
> This reworks early_ioremap_init() so it populates the various levels
> of translation tables while taking the following into account:
> - be prepared for any of the levels to have been populated already, as
>   this may be the case once we move the kernel text mapping out of the
>   linear mapping;
> - don't rely on __va() to translate the physical address in a page table
>   entry to a virtual address, since this produces linear mapping addresses;
>   instead, use the fact that at any level, we know exactly which page in
>   swapper_pg_dir an entry could be pointing to if it points anywhere.

Can we not use Catalin's PHYS_OFFSET swizzling trick instead? i.e.

 * Set PHYS_OFFSET so __va hits in the text mapping.

 * Create the fixmap entries.

 * Parse the DTB or UEFI memory map, figure out the real PHYS_OFFSET.
 
 * Create linear mapping for the initial tables.

 * Swap PHYS_OFFSET for the real version, and update init_mm->pgd to
   point at the linear map alias of the swapper pgd.

It seemed like that would require less open-coding of table manipulation
code, as we could use __va() early.
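The translation behind that trick can be modelled with toy helpers; the constants below are illustrative, not the kernel's:

```c
/* Toy model of the linear-map translation behind the PHYS_OFFSET trick. */
#include <stdint.h>

#define TOY_PAGE_OFFSET 0xffffffc000000000ULL	/* example 39-bit VA split */

static uint64_t toy_va(uint64_t pa, uint64_t phys_offset)
{
	return pa - phys_offset + TOY_PAGE_OFFSET;
}

static uint64_t toy_pa(uint64_t va, uint64_t phys_offset)
{
	return va - TOY_PAGE_OFFSET + phys_offset;
}
```

Picking a provisional phys_offset equal to the physical base of the kernel text makes toy_va() of any text physical address land in the text mapping; once the real base of RAM is known, the same helpers address the linear map after phys_offset is swapped for the real value.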

Is there a limitation with that approach that I'm missing?

Otherwise, comments below.

> This allows us to defer the initialization of the linear mapping until
> after we have figured out where our RAM resides in the physical address
> space.
> 
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
>  arch/arm64/include/asm/compiler.h |   2 +
>  arch/arm64/kernel/vmlinux.lds.S   |  14 +++--
>  arch/arm64/mm/mmu.c               | 117 +++++++++++++++++++++++++-------------
>  3 files changed, 90 insertions(+), 43 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/compiler.h b/arch/arm64/include/asm/compiler.h
> index ee35fd0f2236..dd342af63673 100644
> --- a/arch/arm64/include/asm/compiler.h
> +++ b/arch/arm64/include/asm/compiler.h
> @@ -27,4 +27,6 @@
>   */
>  #define __asmeq(x, y)  ".ifnc " x "," y " ; .err ; .endif\n\t"
>  
> +#define __pgdir		__attribute__((section(".pgdir"),aligned(PAGE_SIZE)))
> +
>  #endif	/* __ASM_COMPILER_H */
> diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
> index 98073332e2d0..604f285d3832 100644
> --- a/arch/arm64/kernel/vmlinux.lds.S
> +++ b/arch/arm64/kernel/vmlinux.lds.S
> @@ -160,11 +160,15 @@ SECTIONS
>  
>  	BSS_SECTION(0, 0, 0)
>  
> -	. = ALIGN(PAGE_SIZE);
> -	idmap_pg_dir = .;
> -	. += IDMAP_DIR_SIZE;
> -	swapper_pg_dir = .;
> -	. += SWAPPER_DIR_SIZE;
> +	.pgdir (NOLOAD) : {
> +		. = ALIGN(PAGE_SIZE);
> +		idmap_pg_dir = .;
> +		. += IDMAP_DIR_SIZE;
> +		swapper_pg_dir = .;
> +		__swapper_bs_region = . + PAGE_SIZE;
> +		. += SWAPPER_DIR_SIZE;
> +		*(.pgdir)
> +	}
>  
>  	_end = .;
>  
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 60be58a160a2..c0427b5c90c7 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -341,6 +341,70 @@ static void __init __map_memblock(phys_addr_t start, phys_addr_t end)
>  }
>  #endif
>  
> +struct mem_bootstrap_region {

The "region" naming is a little confusing IMO. To me it sounds like
something akin to a VMA rather than a set of tables.

> +#if CONFIG_ARM64_PGTABLE_LEVELS > 3
> +	pud_t	pud[PTRS_PER_PUD];
> +#endif
> +#if CONFIG_ARM64_PGTABLE_LEVELS > 2
> +	pmd_t	pmd[PTRS_PER_PMD];
> +#endif
> +	pte_t	pte[PTRS_PER_PTE];
> +};
> +
> +static void __init bootstrap_mem_region(unsigned long addr,
> +					struct mem_bootstrap_region *reg,
> +					pmd_t **ppmd, pte_t **ppte)
> +{
> +	/*
> +	 * Avoid using the linear phys-to-virt translation __va() so that we
> +	 * can use this code before the linear mapping is set up. Note that
> +	 * any populated entries at any level can only point into swapper_pg_dir
> +	 * since no other translation table pages have been allocated yet.
> +	 * So at each level, we either need to populate it, or it has already
> +	 * been populated by a swapper_pg_dir table at the same level, in which
> +	 * case we can figure out its virtual address without applying __va()
> +	 * on the contents of the entry, using the following struct.
> +	 */
> +	extern struct mem_bootstrap_region __swapper_bs_region;
> +
> +	pgd_t *pgd = pgd_offset_k(addr);
> +	pud_t *pud = (pud_t *)pgd;
> +	pmd_t *pmd = (pmd_t *)pud;
> +
> +#if CONFIG_ARM64_PGTABLE_LEVELS > 3
> +	if (pgd_none(*pgd)) {
> +		clear_page(reg->pud);
> +		pgd_populate(&init_mm, pgd, reg->pud);

What's PHYS_OFFSET expected to be at this point (for the purposes of
__pa() in the *_populate*() macros)?

If we're relying on __pa() to convert text->phys, won't __va() convert
phys->text at this point?

> +		pud = reg->pud;
> +	} else {
> +		pud = __swapper_bs_region.pud;
> +	}
> +	pud += pud_index(addr);
> +#endif

Can we free the unused reg tables in the else cases? If __pa() works we
should be able to hand them to memblock, no?

[...]

>  	/*
>  	 * The boot-ioremap range spans multiple pmds, for which
> -	 * we are not preparted:
> +	 * we are not prepared:

This typo has been bugging me for ages. I'll be glad to see it go!

Mark.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v3 07/11] arm64: fixmap: allow init before linear mapping is set up
  2015-04-14 10:47   ` Mark Rutland
@ 2015-04-14 11:02     ` Ard Biesheuvel
  2015-04-14 13:41       ` Mark Rutland
  0 siblings, 1 reply; 29+ messages in thread
From: Ard Biesheuvel @ 2015-04-14 11:02 UTC (permalink / raw)
  To: linux-arm-kernel

On 14 April 2015 at 12:47, Mark Rutland <mark.rutland@arm.com> wrote:
> Hi Ard,
>
> On Fri, Apr 10, 2015 at 02:53:51PM +0100, Ard Biesheuvel wrote:
>> This reworks early_ioremap_init() so it populates the various levels
>> of translation tables while taking the following into account:
>> - be prepared for any of the levels to have been populated already, as
>>   this may be the case once we move the kernel text mapping out of the
>>   linear mapping;
>> - don't rely on __va() to translate the physical address in a page table
>>   entry to a virtual address, since this produces linear mapping addresses;
>>   instead, use the fact that at any level, we know exactly which page in
>>   swapper_pg_dir an entry could be pointing to if it points anywhere.
>
> Can we not use Catalin's PHYS_OFFSET swizzling trick instead? i.e.
>
>  * Set PHYS_OFFSET so __va hits in the text mapping.
>
>  * Create the fixmap entries.
>
>  * Parse the DTB or UEFI memory map, figure out the real PHYS_OFFSET.
>
>  * Create linear mapping for the initial tables.
>
>  * Swap PHYS_OFFSET for the real version, and update init_mm->pgd to
>    point at the linear map alias of the swapper pgd.
>
> It seemed like that would require less open-coding of table manipulation
> code, as we could use __va() early.
>
> Is there a limitation with that approach that I'm missing?
>

I didn't quite catch Catalin's suggestion to mean the above, but the
way you put it seems viable to me. I'll have a go and see how far I
get with it.

> Otherwise, comments below.
>
>> This allows us to defer the initialization of the linear mapping until
>> after we have figured out where our RAM resides in the physical address
>> space.
>>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> ---
>>  arch/arm64/include/asm/compiler.h |   2 +
>>  arch/arm64/kernel/vmlinux.lds.S   |  14 +++--
>>  arch/arm64/mm/mmu.c               | 117 +++++++++++++++++++++++++-------------
>>  3 files changed, 90 insertions(+), 43 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/compiler.h b/arch/arm64/include/asm/compiler.h
>> index ee35fd0f2236..dd342af63673 100644
>> --- a/arch/arm64/include/asm/compiler.h
>> +++ b/arch/arm64/include/asm/compiler.h
>> @@ -27,4 +27,6 @@
>>   */
>>  #define __asmeq(x, y)  ".ifnc " x "," y " ; .err ; .endif\n\t"
>>
>> +#define __pgdir              __attribute__((section(".pgdir"),aligned(PAGE_SIZE)))
>> +
>>  #endif       /* __ASM_COMPILER_H */
>> diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
>> index 98073332e2d0..604f285d3832 100644
>> --- a/arch/arm64/kernel/vmlinux.lds.S
>> +++ b/arch/arm64/kernel/vmlinux.lds.S
>> @@ -160,11 +160,15 @@ SECTIONS
>>
>>       BSS_SECTION(0, 0, 0)
>>
>> -     . = ALIGN(PAGE_SIZE);
>> -     idmap_pg_dir = .;
>> -     . += IDMAP_DIR_SIZE;
>> -     swapper_pg_dir = .;
>> -     . += SWAPPER_DIR_SIZE;
>> +     .pgdir (NOLOAD) : {
>> +             . = ALIGN(PAGE_SIZE);
>> +             idmap_pg_dir = .;
>> +             . += IDMAP_DIR_SIZE;
>> +             swapper_pg_dir = .;
>> +             __swapper_bs_region = . + PAGE_SIZE;
>> +             . += SWAPPER_DIR_SIZE;
>> +             *(.pgdir)
>> +     }
>>
>>       _end = .;
>>
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index 60be58a160a2..c0427b5c90c7 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -341,6 +341,70 @@ static void __init __map_memblock(phys_addr_t start, phys_addr_t end)
>>  }
>>  #endif
>>
>> +struct mem_bootstrap_region {
>
> The "region" naming is a little confusing IMO. To me it sounds like
> something akin to a VMA rather than a set of tables.
>

ok

>> +#if CONFIG_ARM64_PGTABLE_LEVELS > 3
>> +     pud_t   pud[PTRS_PER_PUD];
>> +#endif
>> +#if CONFIG_ARM64_PGTABLE_LEVELS > 2
>> +     pmd_t   pmd[PTRS_PER_PMD];
>> +#endif
>> +     pte_t   pte[PTRS_PER_PTE];
>> +};
>> +
>> +static void __init bootstrap_mem_region(unsigned long addr,
>> +                                     struct mem_bootstrap_region *reg,
>> +                                     pmd_t **ppmd, pte_t **ppte)
>> +{
>> +     /*
>> +      * Avoid using the linear phys-to-virt translation __va() so that we
>> +      * can use this code before the linear mapping is set up. Note that
>> +      * any populated entries at any level can only point into swapper_pg_dir
>> +      * since no other translation table pages have been allocated yet.
>> +      * So at each level, we either need to populate it, or it has already
>> +      * been populated by a swapper_pg_dir table at the same level, in which
>> +      * case we can figure out its virtual address without applying __va()
>> +      * on the contents of the entry, using the following struct.
>> +      */
>> +     extern struct mem_bootstrap_region __swapper_bs_region;
>> +
>> +     pgd_t *pgd = pgd_offset_k(addr);
>> +     pud_t *pud = (pud_t *)pgd;
>> +     pmd_t *pmd = (pmd_t *)pud;
>> +
>> +#if CONFIG_ARM64_PGTABLE_LEVELS > 3
>> +     if (pgd_none(*pgd)) {
>> +             clear_page(reg->pud);
>> +             pgd_populate(&init_mm, pgd, reg->pud);
>
> What's PHYS_OFFSET expected to be at this point (for the purposes of
> __pa() in the *_populate*() macros)?
>
> If we're relying on __pa() to convert text->phys, won't __va() convert
> phys->text at this point?
>

At this point, yes. But later on, when the kernel text moves out of
the linear region, __pa() takes into account whether the input VA is
above or below PAGE_OFFSET, and adds the kernel image offset in the
latter case.
Changing __va() so it implements the inverse would be a can of worms
I'd rather keep closed.
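A toy model of the asymmetric __pa() described above (all names and constants are illustrative, not kernel code):

```c
/* Toy __pa(): linear-map VAs translate via the base of RAM, while
 * kernel-image VAs (below PAGE_OFFSET) use the image's own offsets. */
#include <stdint.h>

#define TOY_PAGE_OFFSET 0xffffffc000000000ULL	/* example value */

static uint64_t toy_pa(uint64_t va, uint64_t memstart,
		       uint64_t kimage_vaddr, uint64_t kimage_phys)
{
	if (va >= TOY_PAGE_OFFSET)
		return va - TOY_PAGE_OFFSET + memstart;	/* linear region */
	return va - kimage_vaddr + kimage_phys;		/* kernel image  */
}
```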

>> +             pud = reg->pud;
>> +     } else {
>> +             pud = __swapper_bs_region.pud;
>> +     }
>> +     pud += pud_index(addr);
>> +#endif
>
> Can we free the unused reg tables in the else cases? If __pa() works we
> should be able to hand them to memblock, no?
>

Only if we put the memblock_reserve() of the kernel image before
early_fixmap_init(), otherwise we are freeing only to reserve it again
later.

> [...]
>
>>       /*
>>        * The boot-ioremap range spans multiple pmds, for which
>> -      * we are not preparted:
>> +      * we are not prepared:
>
> This typo has been bugging me for ages. I'll be glad to see it go!
>

:-)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v3 07/11] arm64: fixmap: allow init before linear mapping is set up
  2015-04-14 11:02     ` Ard Biesheuvel
@ 2015-04-14 13:41       ` Mark Rutland
  0 siblings, 0 replies; 29+ messages in thread
From: Mark Rutland @ 2015-04-14 13:41 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Apr 14, 2015 at 12:02:13PM +0100, Ard Biesheuvel wrote:
> On 14 April 2015 at 12:47, Mark Rutland <mark.rutland@arm.com> wrote:
> > Hi Ard,
> >
> > On Fri, Apr 10, 2015 at 02:53:51PM +0100, Ard Biesheuvel wrote:
> >> This reworks early_ioremap_init() so it populates the various levels
> >> of translation tables while taking the following into account:
> >> - be prepared for any of the levels to have been populated already, as
> >>   this may be the case once we move the kernel text mapping out of the
> >>   linear mapping;
> >> - don't rely on __va() to translate the physical address in a page table
> >>   entry to a virtual address, since this produces linear mapping addresses;
> >>   instead, use the fact that at any level, we know exactly which page in
> >>   swapper_pg_dir an entry could be pointing to if it points anywhere.
> >
> > Can we not use Catalin's PHYS_OFFSET swizzling trick instead? i.e.
> >
> >  * Set PHYS_OFFSET so __va hits in the text mapping.
> >
> >  * Create the fixmap entries.
> >
> >  * Parse the DTB or UEFI memory map, figure out the real PHYS_OFFSET.
> >
> >  * Create linear mapping for the initial tables.
> >
> >  * Swap PHYS_OFFSET for the real version, and update init_mm->pgd to
> >    point at the linear map alias of the swapper pgd.
> >
> > It seemed like that would require less open-coding of table manipulation
> > code, as we could use __va() early.
> >
> > Is there a limitation with that approach that I'm missing?
> >
> 
> I didn't quite catch Catalin's suggestion to mean the above, but the
> way you put it seems viable to me. I'll have a go and see how far I
> get with it.

We discussed it (and wrote it up) on the plane back from the FW summit,
and it may have made more sense in person than it did on the list; I've
only skimmed Catalin's responses. ;)

> >> +#if CONFIG_ARM64_PGTABLE_LEVELS > 3
> >> +     pud_t   pud[PTRS_PER_PUD];
> >> +#endif
> >> +#if CONFIG_ARM64_PGTABLE_LEVELS > 2
> >> +     pmd_t   pmd[PTRS_PER_PMD];
> >> +#endif
> >> +     pte_t   pte[PTRS_PER_PTE];
> >> +};
> >> +
> >> +static void __init bootstrap_mem_region(unsigned long addr,
> >> +                                     struct mem_bootstrap_region *reg,
> >> +                                     pmd_t **ppmd, pte_t **ppte)
> >> +{
> >> +     /*
> >> +      * Avoid using the linear phys-to-virt translation __va() so that we
> >> +      * can use this code before the linear mapping is set up. Note that
> >> +      * any populated entries at any level can only point into swapper_pg_dir
> >> +      * since no other translation table pages have been allocated yet.
> >> +      * So at each level, we either need to populate it, or it has already
> >> +      * been populated by a swapper_pg_dir table at the same level, in which
> >> +      * case we can figure out its virtual address without applying __va()
> >> +      * on the contents of the entry, using the following struct.
> >> +      */
> >> +     extern struct mem_bootstrap_region __swapper_bs_region;
> >> +
> >> +     pgd_t *pgd = pgd_offset_k(addr);
> >> +     pud_t *pud = (pud_t *)pgd;
> >> +     pmd_t *pmd = (pmd_t *)pud;
> >> +
> >> +#if CONFIG_ARM64_PGTABLE_LEVELS > 3
> >> +     if (pgd_none(*pgd)) {
> >> +             clear_page(reg->pud);
> >> +             pgd_populate(&init_mm, pgd, reg->pud);
> >
> > What's PHYS_OFFSET expected to be at this point (for the purposes of
> > __pa() in the *_populate*() macros)?
> >
> > If we're relying on __pa() to convert text->phys, won't __va() convert
> > phys->text at this point?
> >
> 
> At this point, yes. But later on, when the kernel text moves out of
> the linear region, __pa() takes into account whether the input VA is
> above or below PAGE_OFFSET, and adds the kernel image offset in the
> latter case.

Ah, I see.

> Changing __va() so it implements the inverse would be a can of worms
> I'd rather keep closed.

I completely agree.

> >> +             pud = reg->pud;
> >> +     } else {
> >> +             pud = __swapper_bs_region.pud;
> >> +     }
> >> +     pud += pud_index(addr);
> >> +#endif
> >
> > Can we free the unused reg tables in the else cases? If __pa() works we
> > should be able to hand them to memblock, no?
> >
> 
> Only if we put the memblock_reserve() of the kernel image before
> early_fixmap_init(), otherwise we are freeing only to reserve it again
> later.

Damn. That gets really painful with the memory limit (which does a
memblock_remove), early_param, and so on. There are horrible ordering
dependencies between those.

We could get around that with an early page allocator. Have each user
(just fixmap and linear map init?) place an upper bound on the pages
they need into .pgtbl.pool, have them allocate from there as needed, and
immediately after the memblock_reserve of the kernel, unreserve (remove
+ add) any remaining pages.
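A minimal sketch of such an early pool allocator (the section placement is omitted, and every name here is an assumption, not existing kernel code):

```c
/* Hypothetical early page allocator: a statically sized pool of pages,
 * handed out one at a time; unused pages can be released later. */
#include <stddef.h>
#include <stdint.h>

#define EARLY_PAGE_SIZE  4096
#define EARLY_POOL_PAGES 8	/* upper bound placed by the users */

static uint8_t pgtbl_pool[EARLY_POOL_PAGES * EARLY_PAGE_SIZE]
	__attribute__((aligned(EARLY_PAGE_SIZE)));
static size_t pgtbl_used;	/* pages handed out so far */

static void *early_pgtable_alloc(void)
{
	if (pgtbl_used >= EARLY_POOL_PAGES)
		return NULL;			/* pool exhausted */
	return &pgtbl_pool[pgtbl_used++ * EARLY_PAGE_SIZE];
}

/* pages never handed out could be returned to memblock (remove + add) */
static size_t early_pgtable_unused_pages(void)
{
	return EARLY_POOL_PAGES - pgtbl_used;
}
```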

Mark.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v3 10/11] arm64: allow kernel Image to be loaded anywhere in physical memory
  2015-04-10 13:53 ` [PATCH v3 10/11] arm64: allow kernel Image to be loaded anywhere in physical memory Ard Biesheuvel
@ 2015-04-14 14:36   ` Mark Rutland
  0 siblings, 0 replies; 29+ messages in thread
From: Mark Rutland @ 2015-04-14 14:36 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Apr 10, 2015 at 02:53:54PM +0100, Ard Biesheuvel wrote:
> This relaxes the kernel Image placement requirements, so that it
> may be placed at any 2 MB aligned offset in physical memory.
> 
> This is accomplished by ignoring PHYS_OFFSET when installing
> memblocks, and accounting for the apparent virtual offset of
> the kernel Image (in addition to the 64 MB that it is moved
> below PAGE_OFFSET). As a result, virtual address references
> below PAGE_OFFSET are correctly mapped onto physical references
> into the kernel Image regardless of where it sits in memory.
> 
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
>  Documentation/arm64/booting.txt | 17 +++++++----------
>  arch/arm64/mm/init.c            | 32 +++++++++++++++++++-------------
>  2 files changed, 26 insertions(+), 23 deletions(-)
> 
> diff --git a/Documentation/arm64/booting.txt b/Documentation/arm64/booting.txt
> index 6396460f6085..811d93548bdc 100644
> --- a/Documentation/arm64/booting.txt
> +++ b/Documentation/arm64/booting.txt
> @@ -110,16 +110,13 @@ Header notes:
>    depending on selected features, and is effectively unbound.
>  
>  The Image must be placed text_offset bytes from a 2MB aligned base
> -address near the start of usable system RAM and called there. Memory
> -below that base address is currently unusable by Linux, and therefore it
> -is strongly recommended that this location is the start of system RAM.
> -At least image_size bytes from the start of the image must be free for
> -use by the kernel.
> -
> -Any memory described to the kernel (even that below the 2MB aligned base
> -address) which is not marked as reserved from the kernel e.g. with a
> -memreserve region in the device tree) will be considered as available to
> -the kernel.
> +address anywhere in usable system RAM and called there. At least
> +image_size bytes from the start of the image must be free for use
> +by the kernel.
> +
> +Any memory described to the kernel which is not marked as reserved from
> +the kernel (e.g. with a memreserve region in the device tree) will be
> +considered as available to the kernel.

As with the other docs changes we'll need a note w.r.t. the behaviour of
older kernels. This might also be worth a feature bitmap bit so loaders
can do the best thing for the given kernel version.

> +	/*
> +	 * Set memstart_addr to the base of the lowest physical memory region,
> +	 * rounded down to PUD/PMD alignment so we can map it efficiently.
> +	 * Since this also affects the apparent offset of the kernel image in
> +	 * the virtual address space, increase image_offset by the same amount
> +	 * that we decrease memstart_addr.
> +	 */
> +	if (!memstart_addr || memstart_addr > base) {
> +		u64 new_memstart_addr;
> +
> +		if (IS_ENABLED(CONFIG_ARM64_64K_PAGES))
> +			new_memstart_addr = base & PMD_MASK;
> +		else
> +			new_memstart_addr = base & PUD_MASK;
> +
> +		image_offset += memstart_addr - new_memstart_addr;
> +		memstart_addr = new_memstart_addr;
> +	}

There's one slight snag with this. Given sufficient memory (e.g. more
than 512GB) and a sufficiently small VA size (e.g 39 bit), if the kernel
is loaded at the end of RAM we might not cover it in the linear mapping.

It would be nice if we could detect that and warn/stop if possible
(earlycon should be up by this point), rather than blowing up in strange
ways. The other option is to try to limit the memstart_addr such that we
know that we can map the kernel text (removing the unusable memory).
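The check itself is simple arithmetic; a hedged sketch, where the 256 GiB span matches a 39-bit VA linear region but should be treated as an example value:

```c
/* Sketch of the suggested sanity check: the kernel image must fall
 * within the linear region starting at memstart_addr. */
#include <stdbool.h>
#include <stdint.h>

#define TOY_LINEAR_REGION_SIZE (256ULL << 30)	/* example: 39-bit VA */

static bool kernel_image_mappable(uint64_t memstart, uint64_t kimage_end_pa)
{
	return kimage_end_pa - memstart <= TOY_LINEAR_REGION_SIZE;
}
```

A boot-time warning could fire when this returns false, instead of letting the kernel blow up in strange ways later.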

Mark.

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2015-04-14 14:36 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-10 13:53 [PATCH v3 00/11] arm64: update/clarify/relax Image and FDT placement rules Ard Biesheuvel
2015-04-10 13:53 ` [PATCH v3 01/11] arm64: reduce ID map to a single page Ard Biesheuvel
2015-04-13 12:53   ` Mark Rutland
2015-04-13 12:56     ` Ard Biesheuvel
2015-04-10 13:53 ` [PATCH v3 02/11] arm64: drop sleep_idmap_phys and clean up cpu_resume() Ard Biesheuvel
2015-04-10 13:53 ` [PATCH v3 03/11] of/fdt: split off FDT self reservation from memreserve processing Ard Biesheuvel
2015-04-10 13:53 ` [PATCH v3 04/11] arm64: use fixmap region for permanent FDT mapping Ard Biesheuvel
2015-04-13 15:02   ` Mark Rutland
2015-04-13 15:15     ` Ard Biesheuvel
2015-04-13 15:26       ` Mark Rutland
2015-04-13 15:45         ` Ard Biesheuvel
2015-04-13 16:26           ` Mark Rutland
2015-04-14  7:44             ` Ard Biesheuvel
2015-04-14  8:57               ` Mark Rutland
2015-04-10 13:53 ` [PATCH v3 05/11] arm64/efi: adapt to relaxed FDT placement requirements Ard Biesheuvel
2015-04-13 16:20   ` Mark Rutland
2015-04-13 16:25     ` Ard Biesheuvel
2015-04-13 16:35       ` Mark Rutland
2015-04-13 16:36         ` Ard Biesheuvel
2015-04-10 13:53 ` [PATCH v3 06/11] arm64: implement our own early_init_dt_add_memory_arch() Ard Biesheuvel
2015-04-10 13:53 ` [PATCH v3 07/11] arm64: fixmap: allow init before linear mapping is set up Ard Biesheuvel
2015-04-14 10:47   ` Mark Rutland
2015-04-14 11:02     ` Ard Biesheuvel
2015-04-14 13:41       ` Mark Rutland
2015-04-10 13:53 ` [PATCH v3 08/11] arm64: mm: explicitly bootstrap the linear mapping Ard Biesheuvel
2015-04-10 13:53 ` [PATCH v3 09/11] arm64: move kernel mapping out of linear region Ard Biesheuvel
2015-04-10 13:53 ` [PATCH v3 10/11] arm64: allow kernel Image to be loaded anywhere in physical memory Ard Biesheuvel
2015-04-14 14:36   ` Mark Rutland
2015-04-10 13:53 ` [PATCH v3 11/11] arm64/efi: adapt to relaxed kernel Image placement requirements Ard Biesheuvel
