linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH v3 0/7] arm64: relax Image placement rules
@ 2015-11-16 11:23 Ard Biesheuvel
  2015-11-16 11:23 ` [PATCH v3 1/7] of/fdt: make memblock minimum physical address arch configurable Ard Biesheuvel
                   ` (6 more replies)
  0 siblings, 7 replies; 28+ messages in thread
From: Ard Biesheuvel @ 2015-11-16 11:23 UTC (permalink / raw)
  To: linux-arm-kernel

This series updates the mapping of the kernel Image and the linear mapping of
system memory to allow more freedom in the choice of Image placement without
affecting the accessibility of system RAM below the kernel Image or the
mapping efficiency (i.e., memory can always be mapped in 512 MB or 1 GB blocks).

An added benefit of having the freedom to put the Image anywhere in physical
memory is that it simplifies the logic that needs to be applied when using
kexec to boot another kernel. Furthermore, moving the kernel out of the linear
mapping is a preparatory step towards implementing kASLR. This will be addressed
in a followup series.

Changes since v2 (http://marc.info/?l=linux-arm-kernel&m=144309211310754):
- rebased onto v4.4-rc1, which adds support for 16k pages, resulting in minor
  changes required to various parts of the code
- fix behavior under CONFIG_DEBUG_RODATA

Changes since v1:
- dropped somewhat unrelated patch #1 and patches #2 and #3 that have been
  merged separately
- rebased onto v4.2-rc3
- tweak the generic early_init_dt_add_memory_arch for our purposes rather than
  clone the implementation completely

Known issues:
- the mem= command line parameter works correctly now, but removes memory from
  the bottom first before clipping from the top, which may be undesirable since
  it may discard precious memory below the 4 GB boundary.

Patch #1 refactors the generic early_init_dt_add_memory_arch implementation to
allow the minimum memblock address to be overridden by the architecture.

Patch #2 changes the memblock_reserve logic so that statically allocated page
table pages that end up unused are not reserved in memblock.

Patch #3 refactors early_fixmap_init() so that we can reuse its core for
bootstrapping other memory mappings.

Patch #4 bootstraps the linear mapping explicitly. Up until now, this was done
implicitly due to the fact that the linear mapping starts at the base of the
kernel Image.

Patch #5 moves the mapping of the kernel Image outside of the linear mapping.

Patch #6 changes the attributes of the linear mapping to non-executable since
we don't execute code from it anymore.

Patch #7 allows the kernel to be loaded at any 2 MB aligned offset in physical
memory, by assigning PHYS_OFFSET based on the available memory and not based on
the physical address of the base of the kernel Image.
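
For reference, the kernel VA layout that results from this series looks roughly
like this, in ascending order (KIMAGE_OFFSET == MAX_KIMG_SIZE == 64 MB):

  MODULES_VADDR .. MODULES_END      module area (64 MB)
  KIMAGE_VADDR  .. PAGE_OFFSET      kernel Image mapping (KIMAGE_VADDR == MODULES_END
                                    == PAGE_OFFSET - 64 MB)
  PAGE_OFFSET   .. end of VA space  linear mapping of system RAM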

Ard Biesheuvel (7):
  of/fdt: make memblock minimum physical address arch configurable
  arm64: use more granular reservations for static page table
    allocations
  arm64: split off early mapping code from early_fixmap_init()
  arm64: mm: explicitly bootstrap the linear mapping
  arm64: move kernel mapping out of linear region
  arm64: map linear region as non-executable
  arm64: allow kernel Image to be loaded anywhere in physical memory

 Documentation/arm64/booting.txt         |  12 +-
 arch/arm64/include/asm/boot.h           |   7 +
 arch/arm64/include/asm/compiler.h       |   2 +
 arch/arm64/include/asm/kernel-pgtable.h |   5 +-
 arch/arm64/include/asm/memory.h         |  27 ++-
 arch/arm64/kernel/head.S                |  18 +-
 arch/arm64/kernel/vmlinux.lds.S         |  39 +++-
 arch/arm64/mm/dump.c                    |   3 +-
 arch/arm64/mm/init.c                    |  54 ++++-
 arch/arm64/mm/mmu.c                     | 224 +++++++++++++-------
 drivers/of/fdt.c                        |   5 +-
 11 files changed, 284 insertions(+), 112 deletions(-)

-- 
1.9.1

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v3 1/7] of/fdt: make memblock minimum physical address arch configurable
  2015-11-16 11:23 [PATCH v3 0/7] arm64: relax Image placement rules Ard Biesheuvel
@ 2015-11-16 11:23 ` Ard Biesheuvel
  2015-11-16 11:23 ` [PATCH v3 2/7] arm64: use more granular reservations for static page table allocations Ard Biesheuvel
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 28+ messages in thread
From: Ard Biesheuvel @ 2015-11-16 11:23 UTC (permalink / raw)
  To: linux-arm-kernel

By default, early_init_dt_add_memory_arch() ignores memory below
the base of the kernel image since it won't be addressable via the
linear mapping. However, this is not appropriate anymore once we
decouple the kernel text mapping from the linear mapping, so archs
may want to drop the low limit entirely. So allow the minimum to be
overridden by setting MIN_MEMBLOCK_ADDR.
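
For illustration only: an arch that wants to accept all of DRAM at this stage,
regardless of where the kernel was loaded, could override it from its
asm/memory.h along these lines (arm64 ends up doing exactly this at the end of
this series):

  /* accept all memory during early FDT scanning; clip it later if needed */
  #define MIN_MEMBLOCK_ADDR	0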

Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 drivers/of/fdt.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index d2430298a309..0455564f8cbc 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -971,13 +971,16 @@ int __init early_init_dt_scan_chosen(unsigned long node, const char *uname,
 }
 
 #ifdef CONFIG_HAVE_MEMBLOCK
+#ifndef MIN_MEMBLOCK_ADDR
+#define MIN_MEMBLOCK_ADDR	__pa(PAGE_OFFSET)
+#endif
 #ifndef MAX_MEMBLOCK_ADDR
 #define MAX_MEMBLOCK_ADDR	((phys_addr_t)~0)
 #endif
 
 void __init __weak early_init_dt_add_memory_arch(u64 base, u64 size)
 {
-	const u64 phys_offset = __pa(PAGE_OFFSET);
+	const u64 phys_offset = MIN_MEMBLOCK_ADDR;
 
 	if (!PAGE_ALIGNED(base)) {
 		if (size < PAGE_SIZE - (base & ~PAGE_MASK)) {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 2/7] arm64: use more granular reservations for static page table allocations
  2015-11-16 11:23 [PATCH v3 0/7] arm64: relax Image placement rules Ard Biesheuvel
  2015-11-16 11:23 ` [PATCH v3 1/7] of/fdt: make memblock minimum physical address arch configurable Ard Biesheuvel
@ 2015-11-16 11:23 ` Ard Biesheuvel
  2015-11-16 11:23 ` [PATCH v3 3/7] arm64: split off early mapping code from early_fixmap_init() Ard Biesheuvel
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 28+ messages in thread
From: Ard Biesheuvel @ 2015-11-16 11:23 UTC (permalink / raw)
  To: linux-arm-kernel

Before introducing new statically allocated page tables and increasing
their alignment in subsequent patches, update the reservation logic
so that only pages that are in actual use end up as reserved with
memblock.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/mm/init.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 17bf39ac83ba..b3b0175d7135 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -36,6 +36,7 @@
 #include <linux/swiotlb.h>
 
 #include <asm/fixmap.h>
+#include <asm/kernel-pgtable.h>
 #include <asm/memory.h>
 #include <asm/sections.h>
 #include <asm/setup.h>
@@ -165,11 +166,13 @@ void __init arm64_memblock_init(void)
 	 * Register the kernel text, kernel data, initrd, and initial
 	 * pagetables with memblock.
 	 */
-	memblock_reserve(__pa(_text), _end - _text);
+	memblock_reserve(__pa(_text), __bss_stop - _text);
 #ifdef CONFIG_BLK_DEV_INITRD
 	if (initrd_start)
 		memblock_reserve(__virt_to_phys(initrd_start), initrd_end - initrd_start);
 #endif
+	memblock_reserve(__pa(idmap_pg_dir), IDMAP_DIR_SIZE);
+	memblock_reserve(__pa(swapper_pg_dir), SWAPPER_DIR_SIZE);
 
 	early_init_fdt_scan_reserved_mem();
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 3/7] arm64: split off early mapping code from early_fixmap_init()
  2015-11-16 11:23 [PATCH v3 0/7] arm64: relax Image placement rules Ard Biesheuvel
  2015-11-16 11:23 ` [PATCH v3 1/7] of/fdt: make memblock minimum physical address arch configurable Ard Biesheuvel
  2015-11-16 11:23 ` [PATCH v3 2/7] arm64: use more granular reservations for static page table allocations Ard Biesheuvel
@ 2015-11-16 11:23 ` Ard Biesheuvel
  2015-12-03 12:18   ` Mark Rutland
  2015-11-16 11:23 ` [PATCH v3 4/7] arm64: mm: explicitly bootstrap the linear mapping Ard Biesheuvel
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 28+ messages in thread
From: Ard Biesheuvel @ 2015-11-16 11:23 UTC (permalink / raw)
  To: linux-arm-kernel

This splits off and generalises the population of the statically
allocated fixmap page tables so that we may reuse it later for
the linear mapping once we move the kernel text mapping out of it.

This also involves taking into account that table entries at any of
the levels we are populating may have been populated already, since
the fixmap mapping might no longer be disjoint from other early
mappings all the way up to the pgd level.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/include/asm/compiler.h |  2 +
 arch/arm64/kernel/vmlinux.lds.S   | 12 ++--
 arch/arm64/mm/mmu.c               | 60 ++++++++++++++------
 3 files changed, 51 insertions(+), 23 deletions(-)

diff --git a/arch/arm64/include/asm/compiler.h b/arch/arm64/include/asm/compiler.h
index ee35fd0f2236..dd342af63673 100644
--- a/arch/arm64/include/asm/compiler.h
+++ b/arch/arm64/include/asm/compiler.h
@@ -27,4 +27,6 @@
  */
 #define __asmeq(x, y)  ".ifnc " x "," y " ; .err ; .endif\n\t"
 
+#define __pgdir		__attribute__((section(".pgdir"),aligned(PAGE_SIZE)))
+
 #endif	/* __ASM_COMPILER_H */
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 1ee2c3937d4e..87a596246ec7 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -164,11 +164,13 @@ SECTIONS
 
 	BSS_SECTION(0, 0, 0)
 
-	. = ALIGN(PAGE_SIZE);
-	idmap_pg_dir = .;
-	. += IDMAP_DIR_SIZE;
-	swapper_pg_dir = .;
-	. += SWAPPER_DIR_SIZE;
+	.pgdir (NOLOAD) : ALIGN(PAGE_SIZE) {
+		idmap_pg_dir = .;
+		. += IDMAP_DIR_SIZE;
+		swapper_pg_dir = .;
+		. += SWAPPER_DIR_SIZE;
+		*(.pgdir)
+	}
 
 	_end = .;
 
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 32ddd893da9a..4f397a87c2be 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -396,6 +396,44 @@ static void __init __map_memblock(phys_addr_t start, phys_addr_t end)
 }
 #endif
 
+struct bootstrap_pgtables {
+	pte_t	pte[PTRS_PER_PTE];
+	pmd_t	pmd[PTRS_PER_PMD > 1 ? PTRS_PER_PMD : 0];
+	pud_t	pud[PTRS_PER_PUD > 1 ? PTRS_PER_PUD : 0];
+};
+
+static void __init bootstrap_early_mapping(unsigned long addr,
+					   struct bootstrap_pgtables *reg,
+					   bool pte_level)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+
+	pgd = pgd_offset_k(addr);
+	if (pgd_none(*pgd)) {
+		clear_page(reg->pud);
+		memblock_reserve(__pa(reg->pud), PAGE_SIZE);
+		pgd_populate(&init_mm, pgd, reg->pud);
+	}
+	pud = pud_offset(pgd, addr);
+	if (pud_none(*pud)) {
+		clear_page(reg->pmd);
+		memblock_reserve(__pa(reg->pmd), PAGE_SIZE);
+		pud_populate(&init_mm, pud, reg->pmd);
+	}
+
+	if (!pte_level)
+		return;
+
+	pmd = pmd_offset(pud, addr);
+	if (pmd_none(*pmd)) {
+		clear_page(reg->pte);
+		memblock_reserve(__pa(reg->pte), PAGE_SIZE);
+		pmd_populate_kernel(&init_mm, pmd, reg->pte);
+	}
+}
+
 static void __init map_mem(void)
 {
 	struct memblock_region *reg;
@@ -598,14 +636,6 @@ void vmemmap_free(unsigned long start, unsigned long end)
 }
 #endif	/* CONFIG_SPARSEMEM_VMEMMAP */
 
-static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
-#if CONFIG_PGTABLE_LEVELS > 2
-static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss;
-#endif
-#if CONFIG_PGTABLE_LEVELS > 3
-static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss;
-#endif
-
 static inline pud_t * fixmap_pud(unsigned long addr)
 {
 	pgd_t *pgd = pgd_offset_k(addr);
@@ -635,21 +665,15 @@ static inline pte_t * fixmap_pte(unsigned long addr)
 
 void __init early_fixmap_init(void)
 {
-	pgd_t *pgd;
-	pud_t *pud;
+	static struct bootstrap_pgtables fixmap_bs_pgtables __pgdir;
 	pmd_t *pmd;
-	unsigned long addr = FIXADDR_START;
 
-	pgd = pgd_offset_k(addr);
-	pgd_populate(&init_mm, pgd, bm_pud);
-	pud = pud_offset(pgd, addr);
-	pud_populate(&init_mm, pud, bm_pmd);
-	pmd = pmd_offset(pud, addr);
-	pmd_populate_kernel(&init_mm, pmd, bm_pte);
+	bootstrap_early_mapping(FIXADDR_START, &fixmap_bs_pgtables, true);
+	pmd = fixmap_pmd(FIXADDR_START);
 
 	/*
 	 * The boot-ioremap range spans multiple pmds, for which
-	 * we are not preparted:
+	 * we are not prepared:
 	 */
 	BUILD_BUG_ON((__fix_to_virt(FIX_BTMAP_BEGIN) >> PMD_SHIFT)
 		     != (__fix_to_virt(FIX_BTMAP_END) >> PMD_SHIFT));
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 4/7] arm64: mm: explicitly bootstrap the linear mapping
  2015-11-16 11:23 [PATCH v3 0/7] arm64: relax Image placement rules Ard Biesheuvel
                   ` (2 preceding siblings ...)
  2015-11-16 11:23 ` [PATCH v3 3/7] arm64: split off early mapping code from early_fixmap_init() Ard Biesheuvel
@ 2015-11-16 11:23 ` Ard Biesheuvel
  2015-11-16 11:23 ` [PATCH v3 5/7] arm64: move kernel mapping out of linear region Ard Biesheuvel
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 28+ messages in thread
From: Ard Biesheuvel @ 2015-11-16 11:23 UTC (permalink / raw)
  To: linux-arm-kernel

In preparation for moving the kernel text out of the linear
mapping, ensure that the part of the kernel Image that contains
the statically allocated page tables is made accessible via the
linear mapping before performing the actual mapping of all of
memory. This is needed by the normal mapping routines, which rely
on the linear mapping to walk the page tables while manipulating
them.

In addition, explicitly map the start of DRAM and set the memblock
limit so that all early memblock allocations are done from a region
that is guaranteed to be mapped.
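
For reference, the resulting early mapping sequence in map_mem() is roughly:

  map_mem()
    bootstrap_linear_mapping(va_offset)
      bootstrap_region(&bs_pgdir_low,  memblock_start_of_DRAM(), va_offset)  /* start of DRAM */
      memblock_set_current_limit(...)      /* early allocations come from mapped RAM only */
      bootstrap_region(&bs_pgdir_high, __pa(swapper_pg_dir), va_offset)      /* kernel page tables */
    for_each_memblock(memory, reg)
      __map_memblock(start, end)           /* map the rest of memory as usual */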

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/kernel/vmlinux.lds.S | 18 +++-
 arch/arm64/mm/mmu.c             | 93 +++++++++++++++-----
 2 files changed, 86 insertions(+), 25 deletions(-)

diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 87a596246ec7..63fca196c09e 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -72,6 +72,17 @@ PECOFF_FILE_ALIGNMENT = 0x200;
 #define ALIGN_DEBUG_RO_MIN(min)		. = ALIGN(min);
 #endif
 
+/*
+ * The pgdir region needs to be mappable using a single PMD or PUD sized region,
+ * so align it to a power-of-2 upper bound of its size. 16k/4 levels needs 20
+ * pages at the most, every other config needs at most 16 pages.
+ */
+#if defined(CONFIG_ARM64_16K_PAGES) && CONFIG_ARM64_PGTABLE_LEVELS == 4
+#define PGDIR_ALIGN	(32 * PAGE_SIZE)
+#else
+#define PGDIR_ALIGN	(16 * PAGE_SIZE)
+#endif
+
 SECTIONS
 {
 	/*
@@ -164,7 +175,7 @@ SECTIONS
 
 	BSS_SECTION(0, 0, 0)
 
-	.pgdir (NOLOAD) : ALIGN(PAGE_SIZE) {
+	.pgdir (NOLOAD) : ALIGN(PGDIR_ALIGN) {
 		idmap_pg_dir = .;
 		. += IDMAP_DIR_SIZE;
 		swapper_pg_dir = .;
@@ -189,6 +200,11 @@ ASSERT(__idmap_text_end - (__idmap_text_start & ~(SZ_4K - 1)) <= SZ_4K,
 	"ID map text too big or misaligned")
 
 /*
+ * Check that the chosen PGDIR_ALIGN value is sufficient.
+ */
+ASSERT(SIZEOF(.pgdir) <= ALIGNOF(.pgdir), ".pgdir size exceeds its alignment")
+
+/*
  * If padding is applied before .head.text, virt<->phys conversions will fail.
  */
 ASSERT(_text == (PAGE_OFFSET + TEXT_OFFSET), "HEAD is misaligned")
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 4f397a87c2be..81bb49eaa1a3 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -434,23 +434,86 @@ static void __init bootstrap_early_mapping(unsigned long addr,
 	}
 }
 
-static void __init map_mem(void)
+/*
+ * Bootstrap a memory mapping in such a way that it does not require allocation
+ * of page tables beyond the ones that were allocated statically by
+ * bootstrap_early_mapping().
+ * This is done by finding the memblock that covers pa_base, intersecting it
+ * with the naturally aligned 512 MB, 32 MB or 1 GB region (depending on page
+ * size) that also covers pa_base, and (on 4k pages) rounding to section size.
+ */
+static unsigned long __init bootstrap_region(struct bootstrap_pgtables *reg,
+					     phys_addr_t pa_base,
+					     unsigned long va_offset)
 {
-	struct memblock_region *reg;
-	phys_addr_t limit;
+	unsigned long va_base = __phys_to_virt(pa_base) + va_offset;
+	struct memblock_region *mr;
+
+	bootstrap_early_mapping(va_base, reg, !ARM64_SWAPPER_USES_SECTION_MAPS);
+
+	for_each_memblock(memory, mr) {
+		phys_addr_t start = mr->base;
+		phys_addr_t end = start + mr->size;
+		unsigned long vstart, vend;
+
+		if (start > pa_base || end <= pa_base)
+			continue;
+
+		/* clip the region to PMD size */
+		vstart = max(round_down(va_base, 1 << SWAPPER_TABLE_SHIFT),
+			     round_up(__phys_to_virt(start) + va_offset,
+				      SWAPPER_BLOCK_SIZE));
+		vend = min(round_up(va_base + 1, 1 << SWAPPER_TABLE_SHIFT),
+			   round_down(__phys_to_virt(end) + va_offset,
+				      SWAPPER_BLOCK_SIZE));
+
+		create_mapping(__pa(vstart - va_offset), vstart, vend - vstart,
+			       PAGE_KERNEL_EXEC);
+
+		return vend;
+	}
+	return 0;
+}
+
+/*
+ * Bootstrap the linear ranges that cover the start of DRAM and swapper_pg_dir
+ * so that the statically allocated page tables as well as newly allocated ones
+ * are accessible via the linear mapping.
+ */
+static void __init bootstrap_linear_mapping(unsigned long va_offset)
+{
+	static struct bootstrap_pgtables __pgdir bs_pgdir_low, bs_pgdir_high;
+	unsigned long vend;
+
+	/* Bootstrap the mapping for the beginning of RAM */
+	vend = bootstrap_region(&bs_pgdir_low, memblock_start_of_DRAM(),
+				va_offset);
+	BUG_ON(vend == 0);
 
 	/*
 	 * Temporarily limit the memblock range. We need to do this as
 	 * create_mapping requires puds, pmds and ptes to be allocated from
-	 * memory addressable from the initial direct kernel mapping.
+	 * memory addressable from the early linear mapping.
 	 *
 	 * The initial direct kernel mapping, located at swapper_pg_dir, gives
 	 * us PUD_SIZE (with SECTION maps) or PMD_SIZE (without SECTION maps,
 	 * memory starting from PHYS_OFFSET (which must be aligned to 2MB as
 	 * per Documentation/arm64/booting.txt).
 	 */
-	limit = PHYS_OFFSET + SWAPPER_INIT_MAP_SIZE;
-	memblock_set_current_limit(limit);
+	memblock_set_current_limit(__pa(vend - va_offset));
+
+	/* Bootstrap the linear mapping of the kernel image */
+	vend = bootstrap_region(&bs_pgdir_high, __pa(swapper_pg_dir),
+				va_offset);
+	if (vend == 0)
+		panic("Kernel image not covered by memblock");
+}
+
+static void __init map_mem(void)
+{
+	struct memblock_region *reg;
+
+	bootstrap_linear_mapping(0);
 
 	/* map all the memory banks */
 	for_each_memblock(memory, reg) {
@@ -460,24 +523,6 @@ static void __init map_mem(void)
 		if (start >= end)
 			break;
 
-		if (ARM64_SWAPPER_USES_SECTION_MAPS) {
-			/*
-			 * For the first memory bank align the start address and
-			 * current memblock limit to prevent create_mapping() from
-			 * allocating pte page tables from unmapped memory. With
-			 * the section maps, if the first block doesn't end on section
-			 * size boundary, create_mapping() will try to allocate a pte
-			 * page, which may be returned from an unmapped area.
-			 * When section maps are not used, the pte page table for the
-			 * current limit is already present in swapper_pg_dir.
-			 */
-			if (start < limit)
-				start = ALIGN(start, SECTION_SIZE);
-			if (end < limit) {
-				limit = end & SECTION_MASK;
-				memblock_set_current_limit(limit);
-			}
-		}
 		__map_memblock(start, end);
 	}
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 5/7] arm64: move kernel mapping out of linear region
  2015-11-16 11:23 [PATCH v3 0/7] arm64: relax Image placement rules Ard Biesheuvel
                   ` (3 preceding siblings ...)
  2015-11-16 11:23 ` [PATCH v3 4/7] arm64: mm: explicitly bootstrap the linear mapping Ard Biesheuvel
@ 2015-11-16 11:23 ` Ard Biesheuvel
  2015-12-07 12:26   ` Catalin Marinas
  2015-11-16 11:23 ` [PATCH v3 6/7] arm64: map linear region as non-executable Ard Biesheuvel
  2015-11-16 11:23 ` [PATCH v3 7/7] arm64: allow kernel Image to be loaded anywhere in physical memory Ard Biesheuvel
  6 siblings, 1 reply; 28+ messages in thread
From: Ard Biesheuvel @ 2015-11-16 11:23 UTC (permalink / raw)
  To: linux-arm-kernel

This moves the primary mapping of the kernel Image out of
the linear region. This is a preparatory step towards allowing
the kernel Image to reside anywhere in physical memory without
affecting the ability to map all of it efficiently.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/include/asm/boot.h           |  7 +++++++
 arch/arm64/include/asm/kernel-pgtable.h |  5 +++--
 arch/arm64/include/asm/memory.h         | 20 +++++++++++++++++---
 arch/arm64/kernel/head.S                | 18 +++++++++++++-----
 arch/arm64/kernel/vmlinux.lds.S         | 11 +++++++++--
 arch/arm64/mm/dump.c                    |  3 ++-
 arch/arm64/mm/mmu.c                     | 19 ++++++++++++++-----
 7 files changed, 65 insertions(+), 18 deletions(-)

diff --git a/arch/arm64/include/asm/boot.h b/arch/arm64/include/asm/boot.h
index 81151b67b26b..092d1096ce9a 100644
--- a/arch/arm64/include/asm/boot.h
+++ b/arch/arm64/include/asm/boot.h
@@ -11,4 +11,11 @@
 #define MIN_FDT_ALIGN		8
 #define MAX_FDT_SIZE		SZ_2M
 
+/*
+ * arm64 requires the kernel image to be 2 MB aligned and
+ * not exceed 64 MB in size.
+ */
+#define MIN_KIMG_ALIGN		SZ_2M
+#define MAX_KIMG_SIZE		SZ_64M
+
 #endif
diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h
index a459714ee29e..daa8a7b9917a 100644
--- a/arch/arm64/include/asm/kernel-pgtable.h
+++ b/arch/arm64/include/asm/kernel-pgtable.h
@@ -70,8 +70,9 @@
 /*
  * Initial memory map attributes.
  */
-#define SWAPPER_PTE_FLAGS	(PTE_TYPE_PAGE | PTE_AF | PTE_SHARED)
-#define SWAPPER_PMD_FLAGS	(PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S)
+#define SWAPPER_PTE_FLAGS	(PTE_TYPE_PAGE | PTE_AF | PTE_SHARED | PTE_UXN)
+#define SWAPPER_PMD_FLAGS	(PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S | \
+				 PMD_SECT_UXN)
 
 #if ARM64_SWAPPER_USES_SECTION_MAPS
 #define SWAPPER_MM_MMUFLAGS	(PMD_ATTRINDX(MT_NORMAL) | SWAPPER_PMD_FLAGS)
diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 853953cd1f08..3148691bc80a 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -24,6 +24,7 @@
 #include <linux/compiler.h>
 #include <linux/const.h>
 #include <linux/types.h>
+#include <asm/boot.h>
 #include <asm/sizes.h>
 
 /*
@@ -39,7 +40,12 @@
 #define PCI_IO_SIZE		SZ_16M
 
 /*
- * PAGE_OFFSET - the virtual address of the start of the kernel image (top
+ * Offset below PAGE_OFFSET where to map the kernel Image.
+ */
+#define KIMAGE_OFFSET		MAX_KIMG_SIZE
+
+/*
+ * PAGE_OFFSET - the virtual address of the base of the linear mapping (top
  *		 (VA_BITS - 1))
  * VA_BITS - the maximum number of bits for virtual addresses.
  * VA_START - the first kernel virtual address.
@@ -51,7 +57,8 @@
 #define VA_BITS			(CONFIG_ARM64_VA_BITS)
 #define VA_START		(UL(0xffffffffffffffff) << VA_BITS)
 #define PAGE_OFFSET		(UL(0xffffffffffffffff) << (VA_BITS - 1))
-#define MODULES_END		(PAGE_OFFSET)
+#define KIMAGE_VADDR		(PAGE_OFFSET - KIMAGE_OFFSET)
+#define MODULES_END		KIMAGE_VADDR
 #define MODULES_VADDR		(MODULES_END - SZ_64M)
 #define PCI_IO_END		(MODULES_VADDR - SZ_2M)
 #define PCI_IO_START		(PCI_IO_END - PCI_IO_SIZE)
@@ -75,7 +82,11 @@
  * private definitions which should NOT be used outside memory.h
  * files.  Use virt_to_phys/phys_to_virt/__pa/__va instead.
  */
-#define __virt_to_phys(x)	(((phys_addr_t)(x) - PAGE_OFFSET + PHYS_OFFSET))
+#define __virt_to_phys(x) ({						\
+	long __x = (long)(x) - PAGE_OFFSET;				\
+	__x >= 0 ? (phys_addr_t)(__x + PHYS_OFFSET) : 			\
+		   (phys_addr_t)(__x + PHYS_OFFSET + kernel_va_offset); })
+
 #define __phys_to_virt(x)	((unsigned long)((x) - PHYS_OFFSET + PAGE_OFFSET))
 
 /*
@@ -106,6 +117,8 @@ extern phys_addr_t		memstart_addr;
 /* PHYS_OFFSET - the physical address of the start of memory. */
 #define PHYS_OFFSET		({ memstart_addr; })
 
+extern u64 kernel_va_offset;
+
 /*
  * The maximum physical address that the linear direct mapping
  * of system RAM can cover. (PAGE_OFFSET can be interpreted as
@@ -113,6 +126,7 @@ extern phys_addr_t		memstart_addr;
  * maximum size of the linear mapping.)
  */
 #define MAX_MEMBLOCK_ADDR	({ memstart_addr - PAGE_OFFSET - 1; })
+#define MIN_MEMBLOCK_ADDR	__pa(KIMAGE_VADDR)
 
 /*
  * PFNs are used to describe any physical page; this means
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 23cfc08fc8ba..d3e4b5d6a8d2 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -38,8 +38,6 @@
 #include <asm/thread_info.h>
 #include <asm/virt.h>
 
-#define __PHYS_OFFSET	(KERNEL_START - TEXT_OFFSET)
-
 #if (TEXT_OFFSET & 0xfff) != 0
 #error TEXT_OFFSET must be at least 4KB aligned
 #elif (PAGE_OFFSET & 0x1fffff) != 0
@@ -50,6 +48,8 @@
 
 #define KERNEL_START	_text
 #define KERNEL_END	_end
+#define KERNEL_BASE	(KERNEL_START - TEXT_OFFSET)
+
 
 /*
  * Kernel startup entry point.
@@ -210,7 +210,15 @@ section_table:
 ENTRY(stext)
 	bl	preserve_boot_args
 	bl	el2_setup			// Drop to EL1, w20=cpu_boot_mode
-	adrp	x24, __PHYS_OFFSET
+
+	/*
+	 * Before the linear mapping has been set up, __va() translations will
+	 * not produce usable virtual addresses unless we tweak PHYS_OFFSET to
+	 * compensate for the offset between the kernel mapping and the base of
+	 * the linear mapping. We will undo this in map_mem().
+	 */
+	adrp	x24, KERNEL_BASE + KIMAGE_OFFSET
+
 	bl	set_cpu_boot_mode_flag
 	bl	__create_page_tables		// x25=TTBR0, x26=TTBR1
 	/*
@@ -389,10 +397,10 @@ __create_page_tables:
 	 * Map the kernel image (starting with PHYS_OFFSET).
 	 */
 	mov	x0, x26				// swapper_pg_dir
-	mov	x5, #PAGE_OFFSET
+	ldr	x5, =KERNEL_BASE
 	create_pgd_entry x0, x5, x3, x6
 	ldr	x6, =KERNEL_END			// __va(KERNEL_END)
-	mov	x3, x24				// phys offset
+	adrp	x3, KERNEL_BASE			// real PHYS_OFFSET
 	create_block_map x0, x7, x3, x5, x6
 
 	/*
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 63fca196c09e..84f780e6b039 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -7,6 +7,7 @@
 #include <asm-generic/vmlinux.lds.h>
 #include <asm/kernel-pgtable.h>
 #include <asm/thread_info.h>
+#include <asm/boot.h>
 #include <asm/memory.h>
 #include <asm/page.h>
 #include <asm/pgtable.h>
@@ -99,7 +100,7 @@ SECTIONS
 		*(.discard.*)
 	}
 
-	. = PAGE_OFFSET + TEXT_OFFSET;
+	. = KIMAGE_VADDR + TEXT_OFFSET;
 
 	.head.text : {
 		_text = .;
@@ -207,4 +208,10 @@ ASSERT(SIZEOF(.pgdir) <= ALIGNOF(.pgdir), ".pgdir size exceeds its alignment")
 /*
  * If padding is applied before .head.text, virt<->phys conversions will fail.
  */
-ASSERT(_text == (PAGE_OFFSET + TEXT_OFFSET), "HEAD is misaligned")
+ASSERT(_text == (KIMAGE_VADDR + TEXT_OFFSET), "HEAD is misaligned")
+
+/*
+ * Make sure the memory footprint of the kernel Image does not exceed the limit.
+ */
+ASSERT(_end - _text + TEXT_OFFSET <= MAX_KIMG_SIZE,
+	"Kernel Image memory footprint exceeds MAX_KIMG_SIZE")
diff --git a/arch/arm64/mm/dump.c b/arch/arm64/mm/dump.c
index 5a22a119a74c..f6272f450688 100644
--- a/arch/arm64/mm/dump.c
+++ b/arch/arm64/mm/dump.c
@@ -63,7 +63,8 @@ static struct addr_marker address_markers[] = {
 	{ PCI_IO_END,		"PCI I/O end" },
 	{ MODULES_VADDR,	"Modules start" },
 	{ MODULES_END,		"Modules end" },
-	{ PAGE_OFFSET,		"Kernel Mapping" },
+	{ KIMAGE_VADDR,		"Kernel Mapping" },
+	{ PAGE_OFFSET,		"Linear Mapping" },
 	{ -1,			NULL },
 };
 
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 81bb49eaa1a3..c7ba171951c8 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -51,6 +51,9 @@ u64 idmap_t0sz = TCR_T0SZ(VA_BITS);
 struct page *empty_zero_page;
 EXPORT_SYMBOL(empty_zero_page);
 
+u64 kernel_va_offset __read_mostly;
+EXPORT_SYMBOL(kernel_va_offset);
+
 pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 			      unsigned long size, pgprot_t vma_prot)
 {
@@ -479,6 +482,9 @@ static unsigned long __init bootstrap_region(struct bootstrap_pgtables *reg,
  * Bootstrap the linear ranges that cover the start of DRAM and swapper_pg_dir
  * so that the statically allocated page tables as well as newly allocated ones
  * are accessible via the linear mapping.
+ * Since at this point, PHYS_OFFSET is still biased to redirect __va()
+ * translations into the kernel text mapping, we need to apply an
+ * explicit va_offset to calculate linear virtual addresses.
  */
 static void __init bootstrap_linear_mapping(unsigned long va_offset)
 {
@@ -513,7 +519,10 @@ static void __init map_mem(void)
 {
 	struct memblock_region *reg;
 
-	bootstrap_linear_mapping(0);
+	bootstrap_linear_mapping(KIMAGE_OFFSET);
+
+	kernel_va_offset = KIMAGE_OFFSET;
+	memstart_addr -= KIMAGE_OFFSET;
 
 	/* map all the memory banks */
 	for_each_memblock(memory, reg) {
@@ -535,12 +544,12 @@ static void __init fixup_executable(void)
 #ifdef CONFIG_DEBUG_RODATA
 	/* now that we are actually fully mapped, make the start/end more fine grained */
 	if (!IS_ALIGNED((unsigned long)_stext, SWAPPER_BLOCK_SIZE)) {
-		unsigned long aligned_start = round_down(__pa(_stext),
+		unsigned long aligned_start = round_down((unsigned long)_stext,
 							 SWAPPER_BLOCK_SIZE);
 
-		create_mapping(aligned_start, __phys_to_virt(aligned_start),
-				__pa(_stext) - aligned_start,
-				PAGE_KERNEL);
+		create_mapping(__pa(_stext), aligned_start,
+			       (unsigned long)_stext - aligned_start,
+			       PAGE_KERNEL);
 	}
 
 	if (!IS_ALIGNED((unsigned long)__init_end, SWAPPER_BLOCK_SIZE)) {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 6/7] arm64: map linear region as non-executable
  2015-11-16 11:23 [PATCH v3 0/7] arm64: relax Image placement rules Ard Biesheuvel
                   ` (4 preceding siblings ...)
  2015-11-16 11:23 ` [PATCH v3 5/7] arm64: move kernel mapping out of linear region Ard Biesheuvel
@ 2015-11-16 11:23 ` Ard Biesheuvel
  2015-12-07 16:19   ` Catalin Marinas
  2015-11-16 11:23 ` [PATCH v3 7/7] arm64: allow kernel Image to be loaded anywhere in physical memory Ard Biesheuvel
  6 siblings, 1 reply; 28+ messages in thread
From: Ard Biesheuvel @ 2015-11-16 11:23 UTC (permalink / raw)
  To: linux-arm-kernel

Now that we have moved the kernel text out of the linear region, there
is no longer a reason to map the linear region as executable. This also
allows us to completely get rid of the __map_memblock() variant that
only maps some of it as executable when CONFIG_DEBUG_RODATA is selected.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/mm/mmu.c | 41 +-------------------
 1 file changed, 2 insertions(+), 39 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index c7ba171951c8..526eeb7e1e97 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -357,47 +357,10 @@ static void create_mapping_late(phys_addr_t phys, unsigned long virt,
 				phys, virt, size, prot, late_alloc);
 }
 
-#ifdef CONFIG_DEBUG_RODATA
 static void __init __map_memblock(phys_addr_t start, phys_addr_t end)
 {
-	/*
-	 * Set up the executable regions using the existing section mappings
-	 * for now. This will get more fine grained later once all memory
-	 * is mapped
-	 */
-	unsigned long kernel_x_start = round_down(__pa(_stext), SWAPPER_BLOCK_SIZE);
-	unsigned long kernel_x_end = round_up(__pa(__init_end), SWAPPER_BLOCK_SIZE);
-
-	if (end < kernel_x_start) {
-		create_mapping(start, __phys_to_virt(start),
-			end - start, PAGE_KERNEL);
-	} else if (start >= kernel_x_end) {
-		create_mapping(start, __phys_to_virt(start),
-			end - start, PAGE_KERNEL);
-	} else {
-		if (start < kernel_x_start)
-			create_mapping(start, __phys_to_virt(start),
-				kernel_x_start - start,
-				PAGE_KERNEL);
-		create_mapping(kernel_x_start,
-				__phys_to_virt(kernel_x_start),
-				kernel_x_end - kernel_x_start,
-				PAGE_KERNEL_EXEC);
-		if (kernel_x_end < end)
-			create_mapping(kernel_x_end,
-				__phys_to_virt(kernel_x_end),
-				end - kernel_x_end,
-				PAGE_KERNEL);
-	}
-
-}
-#else
-static void __init __map_memblock(phys_addr_t start, phys_addr_t end)
-{
-	create_mapping(start, __phys_to_virt(start), end - start,
-			PAGE_KERNEL_EXEC);
+	create_mapping(start, __phys_to_virt(start), end - start, PAGE_KERNEL);
 }
-#endif
 
 struct bootstrap_pgtables {
 	pte_t	pte[PTRS_PER_PTE];
@@ -471,7 +434,7 @@ static unsigned long __init bootstrap_region(struct bootstrap_pgtables *reg,
 				      SWAPPER_BLOCK_SIZE));
 
 		create_mapping(__pa(vstart - va_offset), vstart, vend - vstart,
-			       PAGE_KERNEL_EXEC);
+			       PAGE_KERNEL);
 
 		return vend;
 	}
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 7/7] arm64: allow kernel Image to be loaded anywhere in physical memory
  2015-11-16 11:23 [PATCH v3 0/7] arm64: relax Image placement rules Ard Biesheuvel
                   ` (5 preceding siblings ...)
  2015-11-16 11:23 ` [PATCH v3 6/7] arm64: map linear region as non-executable Ard Biesheuvel
@ 2015-11-16 11:23 ` Ard Biesheuvel
  2015-12-07 15:30   ` Catalin Marinas
  6 siblings, 1 reply; 28+ messages in thread
From: Ard Biesheuvel @ 2015-11-16 11:23 UTC (permalink / raw)
  To: linux-arm-kernel

This relaxes the kernel Image placement requirements, so that it
may be placed at any 2 MB aligned offset in physical memory.

This is accomplished by ignoring PHYS_OFFSET when installing
memblocks, and accounting for the apparent virtual offset of
the kernel Image (in addition to the 64 MB that it is moved
below PAGE_OFFSET). As a result, virtual address references
below PAGE_OFFSET are correctly mapped onto physical references
into the kernel Image regardless of where it sits in memory.
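
To put some made-up numbers on it: assume DRAM starts at 0x80000000 and the
2 MB aligned base of the Image sits at 0x96000000. Then

  PHYS_OFFSET at boot (head.S)      = 0x96000000 + KIMAGE_OFFSET     = 0x9a000000
  new_memstart_addr (in map_mem)    = round_down(0x80000000, SZ_1G)  = 0x80000000
  kernel_va_offset = new_va_offset  = 0x9a000000 - 0x80000000        = 0x1a000000

so a __pa() translation of a kernel image virtual address adds kernel_va_offset
on top of the final PHYS_OFFSET and resolves to 0x96000000 + offset, while the
linear mapping itself covers all of DRAM starting at 0x80000000.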

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 Documentation/arm64/booting.txt | 12 ++---
 arch/arm64/include/asm/memory.h |  9 ++--
 arch/arm64/mm/init.c            | 49 +++++++++++++++++++-
 arch/arm64/mm/mmu.c             | 29 ++++++++++--
 4 files changed, 83 insertions(+), 16 deletions(-)

diff --git a/Documentation/arm64/booting.txt b/Documentation/arm64/booting.txt
index 701d39d3171a..f190e708bb9b 100644
--- a/Documentation/arm64/booting.txt
+++ b/Documentation/arm64/booting.txt
@@ -117,14 +117,14 @@ Header notes:
   depending on selected features, and is effectively unbound.
 
 The Image must be placed text_offset bytes from a 2MB aligned base
-address near the start of usable system RAM and called there. Memory
-below that base address is currently unusable by Linux, and therefore it
-is strongly recommended that this location is the start of system RAM.
-The region between the 2 MB aligned base address and the start of the
-image has no special significance to the kernel, and may be used for
-other purposes.
+address anywhere in usable system RAM and called there. The region
+between the 2 MB aligned base address and the start of the image has no
+special significance to the kernel, and may be used for other purposes.
 At least image_size bytes from the start of the image must be free for
 use by the kernel.
+NOTE: versions prior to v4.5 cannot make use of memory below the
+physical offset of the Image so it is recommended that the Image be
+placed as close as possible to the start of system RAM.
 
 Any memory described to the kernel (even that below the start of the
 image) which is not marked as reserved from the kernel (e.g., with a
diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 3148691bc80a..d6a237bda1f9 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -120,13 +120,10 @@ extern phys_addr_t		memstart_addr;
 extern u64 kernel_va_offset;
 
 /*
- * The maximum physical address that the linear direct mapping
- * of system RAM can cover. (PAGE_OFFSET can be interpreted as
- * a 2's complement signed quantity and negated to derive the
- * maximum size of the linear mapping.)
+ * Allow all memory at the discovery stage. We will clip it later.
  */
-#define MAX_MEMBLOCK_ADDR	({ memstart_addr - PAGE_OFFSET - 1; })
-#define MIN_MEMBLOCK_ADDR	__pa(KIMAGE_VADDR)
+#define MIN_MEMBLOCK_ADDR	0
+#define MAX_MEMBLOCK_ADDR	U64_MAX
 
 /*
  * PFNs are used to describe any physical page; this means
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index b3b0175d7135..29a7dc5327b6 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -35,6 +35,7 @@
 #include <linux/efi.h>
 #include <linux/swiotlb.h>
 
+#include <asm/boot.h>
 #include <asm/fixmap.h>
 #include <asm/kernel-pgtable.h>
 #include <asm/memory.h>
@@ -158,9 +159,55 @@ static int __init early_mem(char *p)
 }
 early_param("mem", early_mem);
 
+static void __init enforce_memory_limit(void)
+{
+	const phys_addr_t kbase = round_down(__pa(_text), MIN_KIMG_ALIGN);
+	u64 to_remove = memblock_phys_mem_size() - memory_limit;
+	phys_addr_t max_addr = 0;
+	struct memblock_region *r;
+
+	if (memory_limit == (phys_addr_t)ULLONG_MAX)
+		return;
+
+	/*
+	 * The kernel may be high up in physical memory, so try to apply the
+	 * limit below the kernel first, and only let the generic handling
+	 * take over if it turns out we haven't clipped enough memory yet.
+	 */
+	for_each_memblock(memory, r) {
+		if (r->base + r->size > kbase) {
+			u64 rem = min(to_remove, kbase - r->base);
+
+			max_addr = r->base + rem;
+			to_remove -= rem;
+			break;
+		}
+		if (to_remove <= r->size) {
+			max_addr = r->base + to_remove;
+			to_remove = 0;
+			break;
+		}
+		to_remove -= r->size;
+	}
+
+	memblock_remove(0, max_addr);
+
+	if (to_remove)
+		memblock_enforce_memory_limit(memory_limit);
+}
+
 void __init arm64_memblock_init(void)
 {
-	memblock_enforce_memory_limit(memory_limit);
+	/*
+	 * Remove the memory that we will not be able to cover
+	 * with the linear mapping.
+	 */
+	const s64 linear_region_size = -(s64)PAGE_OFFSET;
+
+	memblock_remove(round_down(memblock_start_of_DRAM(), SZ_1G) +
+			linear_region_size, ULLONG_MAX);
+
+	enforce_memory_limit();
 
 	/*
 	 * Register the kernel text, kernel data, initrd, and initial
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 526eeb7e1e97..1b9d7e48ba1e 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -21,6 +21,7 @@
 #include <linux/kernel.h>
 #include <linux/errno.h>
 #include <linux/init.h>
+#include <linux/initrd.h>
 #include <linux/libfdt.h>
 #include <linux/mman.h>
 #include <linux/nodemask.h>
@@ -481,11 +482,33 @@ static void __init bootstrap_linear_mapping(unsigned long va_offset)
 static void __init map_mem(void)
 {
 	struct memblock_region *reg;
+	u64 new_memstart_addr;
+	u64 new_va_offset;
 
-	bootstrap_linear_mapping(KIMAGE_OFFSET);
+	/*
+	 * Select a suitable value for the base of physical memory.
+	 * This should be equal to or below the lowest usable physical
+	 * memory address, and aligned to PUD/PMD size so that we can map
+	 * it efficiently.
+	 */
+	new_memstart_addr = round_down(memblock_start_of_DRAM(), SZ_1G);
+
+	/*
+	 * Calculate the offset between the kernel text mapping that exists
+	 * outside of the linear mapping, and its mapping in the linear region.
+	 */
+	new_va_offset = memstart_addr - new_memstart_addr;
+
+	bootstrap_linear_mapping(new_va_offset);
 
-	kernel_va_offset = KIMAGE_OFFSET;
-	memstart_addr -= KIMAGE_OFFSET;
+	kernel_va_offset = new_va_offset;
+	memstart_addr = new_memstart_addr;
+
+	/* Recalculate virtual addresses of initrd region */
+	if (IS_ENABLED(CONFIG_BLK_DEV_INITRD) && initrd_start) {
+		initrd_start += new_va_offset;
+		initrd_end += new_va_offset;
+	}
 
 	/* map all the memory banks */
 	for_each_memblock(memory, reg) {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 3/7] arm64: split off early mapping code from early_fixmap_init()
  2015-11-16 11:23 ` [PATCH v3 3/7] arm64: split off early mapping code from early_fixmap_init() Ard Biesheuvel
@ 2015-12-03 12:18   ` Mark Rutland
  2015-12-03 13:31     ` Ard Biesheuvel
  2015-12-08 12:40     ` Will Deacon
  0 siblings, 2 replies; 28+ messages in thread
From: Mark Rutland @ 2015-12-03 12:18 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Ard,

Apologies that it's taken me so long to get around to this...

On Mon, Nov 16, 2015 at 12:23:14PM +0100, Ard Biesheuvel wrote:
> This splits off and generalises the population of the statically
> allocated fixmap page tables so that we may reuse it later for
> the linear mapping once we move the kernel text mapping out of it.
> 
> This also involves taking into account that table entries at any of
> the levels we are populating may have been populated already, since
> the fixmap mapping might not be disjoint up to the pgd level anymore
> from other early mappings.

As a heads-up, for avoiding TLB conflicts, I'm currently working on
alternative way of creating the kernel page tables which will definitely
conflict here, and may or may not supersede this approach.

By adding new FIX_{PGD,PUD,PMD,PTE} indices to the fixmap, we can
allocate page tables from anywhere via memblock, and temporarily map
them as we need to.

That would avoid the need for the bootstrap tables. In head.S we'd only
need to create a temporary (coarse-grained, RWX) kernel mapping (with
the fixmap bolted on). Later we would create a whole new set of tables
with a fine-grained kernel mapping and a full linear mapping using the
new fixmap entries to temporarily map tables, then switch over to those
atomically.
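
Very roughly, and purely as an illustration (FIX_PTE here being one of the new
indices, and the helper names made up), mapping a table would look something
like:

	static pte_t *early_pte_map(phys_addr_t pte_phys)
	{
		/* temporarily map the memblock-allocated table so we can write to it */
		__set_fixmap(FIX_PTE, pte_phys, PAGE_KERNEL);
		return (pte_t *)fix_to_virt(FIX_PTE);
	}

	static void early_pte_unmap(void)
	{
		clear_fixmap(FIX_PTE);
	}

with the table pages themselves coming straight from memblock_alloc().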

Otherwise, one minor comment below.

> +static void __init bootstrap_early_mapping(unsigned long addr,
> +					   struct bootstrap_pgtables *reg,
> +					   bool pte_level)

The only caller in this patch passes true for pte_level.

Can we not introduce the argument when it is first needed? Or at least
have something in the commit message as to why we'll need it later?

>  	/*
>  	 * The boot-ioremap range spans multiple pmds, for which
> -	 * we are not preparted:
> +	 * we are not prepared:
>  	 */

I cannot wait to see this typo go!

Otherwise, this looks fine to me.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v3 3/7] arm64: split off early mapping code from early_fixmap_init()
  2015-12-03 12:18   ` Mark Rutland
@ 2015-12-03 13:31     ` Ard Biesheuvel
  2015-12-03 13:59       ` Mark Rutland
  2015-12-07 16:08       ` Catalin Marinas
  2015-12-08 12:40     ` Will Deacon
  1 sibling, 2 replies; 28+ messages in thread
From: Ard Biesheuvel @ 2015-12-03 13:31 UTC (permalink / raw)
  To: linux-arm-kernel

On 3 December 2015 at 13:18, Mark Rutland <mark.rutland@arm.com> wrote:
> Hi Ard,
>
> Apologies that it's taken me so long to get around to this...
>
> On Mon, Nov 16, 2015 at 12:23:14PM +0100, Ard Biesheuvel wrote:
>> This splits off and generalises the population of the statically
>> allocated fixmap page tables so that we may reuse it later for
>> the linear mapping once we move the kernel text mapping out of it.
>>
>> This also involves taking into account that table entries at any of
>> the levels we are populating may have been populated already, since
>> the fixmap mapping might not be disjoint up to the pgd level anymore
>> from other early mappings.
>
> As a heads-up, for avoiding TLB conflicts, I'm currently working on
> alternative way of creating the kernel page tables which will definitely
> conflict here, and may or may not supersede this approach.
>
> By adding new FIX_{PGD,PUD,PMD,PTE} indices to the fixmap, we can
> allocate page tables from anywhere via memblock, and temporarily map
> them as we need to.
>

Interesting. So how are you dealing with the va<->pa translations and
vice versa that occur all over the place in create_mapping() et al ?

> That would avoid the need for the bootstrap tables. In head.S we'd only
> need to create a temporary (coarse-grained, RWX) kernel mapping (with
> the fixmap bolted on). Later we would create a whole new set of tables
> with a fine-grained kernel mapping and a full linear mapping using the
> new fixmap entries to temporarily map tables, then switch over to those
> atomically.
>

If we change back to a full linear mapping, are we back to not putting
the Image astride a 1GB/32MB/512MB boundary (depending on page size)?

Anyway, to illustrate where I am headed with this: in my next version
of this series, I intend to move the kernel mapping to the start of
the vmalloc area, which gets moved up 64 MB to make room for the
module area (which also moves down). That way, we can still load
modules as before, but no longer have a need for a dedicated carveout
for the kernel below PAGE_OFFSET.

The next step is then to move the kernel Image up inside the vmalloc
area based on some randomness we get from the bootloader, and relocate
it in place (using the same approach as in the patches I sent out
beginning of this year). I have implemented module PLTs so that the
Image and the modules no longer need to be within 128 MB of each
other, which means that we can have full KASLR for modules and Image,
and also place the kernel anywhere in physical memory. The module PLTs
would be a runtime penalty only, i.e., a KASLR capable kernel running
without KASLR would not incur the penalty of branching via PLTs. The
only build time option is -mcmodel=large for modules so that data
symbol references are absolute, but that is unlikely to hurt
performance.

> Otherwise, one minor comment below.
>
>> +static void __init bootstrap_early_mapping(unsigned long addr,
>> +                                        struct bootstrap_pgtables *reg,
>> +                                        bool pte_level)
>
> The only caller in this patch passes true for pte_level.
>
> Can we not introduce the argument when it is first needed? Or at least
> have something in the commit message as to why we'll need it later?
>

Yes, that should be possible.

>>       /*
>>        * The boot-ioremap range spans multiple pmds, for which
>> -      * we are not preparted:
>> +      * we are not prepared:
>>        */
>
> I cannot wait to see this typo go!
>
> Otherwise, this looks fine to me.
>

Thanks Mark

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v3 3/7] arm64: split off early mapping code from early_fixmap_init()
  2015-12-03 13:31     ` Ard Biesheuvel
@ 2015-12-03 13:59       ` Mark Rutland
  2015-12-03 14:05         ` Ard Biesheuvel
  2015-12-07 16:08       ` Catalin Marinas
  1 sibling, 1 reply; 28+ messages in thread
From: Mark Rutland @ 2015-12-03 13:59 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Dec 03, 2015 at 02:31:19PM +0100, Ard Biesheuvel wrote:
> On 3 December 2015 at 13:18, Mark Rutland <mark.rutland@arm.com> wrote:
> > Hi Ard,
> >
> > Apologies that it's taken me so long to get around to this...
> >
> > On Mon, Nov 16, 2015 at 12:23:14PM +0100, Ard Biesheuvel wrote:
> >> This splits off and generalises the population of the statically
> >> allocated fixmap page tables so that we may reuse it later for
> >> the linear mapping once we move the kernel text mapping out of it.
> >>
> >> This also involves taking into account that table entries at any of
> >> the levels we are populating may have been populated already, since
> >> the fixmap mapping might not be disjoint up to the pgd level anymore
> >> from other early mappings.
> >
> > As a heads-up, for avoiding TLB conflicts, I'm currently working on
> > alternative way of creating the kernel page tables which will definitely
> > conflict here, and may or may not supersede this approach.
> >
> > By adding new FIX_{PGD,PUD,PMD,PTE} indices to the fixmap, we can
> > allocate page tables from anywhere via memblock, and temporarily map
> > them as we need to.
> >
> 
> Interesting. So how are you dealing with the va<->pa translations and
> vice versa that occur all over the place in create_mapping() et al ?

By rewriting create_mapping() et al to not do that ;)

That's requiring a fair amount of massaging, but so far I've not hit
anything that renders the approach impossible.

> > That would avoid the need for the bootstrap tables. In head.S we'd only
> > need to create a temporary (coarse-grained, RWX) kernel mapping (with
> > the fixmap bolted on). Later we would create a whole new set of tables
> > with a fine-grained kernel mapping and a full linear mapping using the
> > new fixmap entries to temporarily map tables, then switch over to those
> > atomically.
> >
> 
> If we change back to a full linear mapping, are we back to not putting
> the Image astride a 1GB/32MB/512MB boundary (depending on page size)?

I'm not exactly sure what you mean here.

The kernel mapping may inhibit using large section mappings, but this is
necessary anyway due to permission changes at sub-section granularity
(e.g. in fixup_init).

The idea is that when the kernel tables are set up, things are mapped at
the largest possible granularity that permits later permission changes
without breaking/making sections (such that we can avoid TLB conflicts).

So we'd map the kernel and memory in segments, where no two segments
share a common last-level entry (i.e. they're all at least page-aligned,
and don't share a section with another segment).

We'd have separate segments for:
* memory below TEXT_OFFSET
* text
* rodata
* init
* altinstr (I think this can be folded into rodata)
* bss / data, tables
* memory above _end

Later I think it should be relatively simple to move the memory segment
mapping for split-VA.

> Anyway, to illustrate where I am headed with this: in my next version
> of this series, I intend to move the kernel mapping to the start of
> the vmalloc area, which gets moved up 64 MB to make room for the
> module area (which also moves down). That way, we can still load
> modules as before, but no longer have a need for a dedicated carveout
> for the kernel below PAGE_OFFSET.

Ok.

> The next step is then to move the kernel Image up inside the vmalloc
> area based on some randomness we get from the bootloader, and relocate
> it in place (using the same approach as in the patches I sent out
> beginning of this year). I have implemented module PLTs so that the
> Image and the modules no longer need to be within 128 MB of each
> other, which means that we can have full KASLR for modules and Image,
> and also place the kernel anywhere in physical memory. The module PLTs
> would be a runtime penalty only, i.e., a KASLR capable kernel running
> without KASLR would not incur the penalty of branching via PLTs. The
> only build time option is -mcmodel=large for modules so that data
> symbol references are absolute, but that is unlikely to hurt
> performance.

I'm certainly interested in seeing this!

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v3 3/7] arm64: split off early mapping code from early_fixmap_init()
  2015-12-03 13:59       ` Mark Rutland
@ 2015-12-03 14:05         ` Ard Biesheuvel
  0 siblings, 0 replies; 28+ messages in thread
From: Ard Biesheuvel @ 2015-12-03 14:05 UTC (permalink / raw)
  To: linux-arm-kernel

On 3 December 2015 at 14:59, Mark Rutland <mark.rutland@arm.com> wrote:
> On Thu, Dec 03, 2015 at 02:31:19PM +0100, Ard Biesheuvel wrote:
>> On 3 December 2015 at 13:18, Mark Rutland <mark.rutland@arm.com> wrote:
>> > Hi Ard,
>> >
>> > Apologies that it's taken me so long to get around to this...
>> >
>> > On Mon, Nov 16, 2015 at 12:23:14PM +0100, Ard Biesheuvel wrote:
>> >> This splits off and generalises the population of the statically
>> >> allocated fixmap page tables so that we may reuse it later for
>> >> the linear mapping once we move the kernel text mapping out of it.
>> >>
>> >> This also involves taking into account that table entries at any of
>> >> the levels we are populating may have been populated already, since
>> >> the fixmap mapping might not be disjoint up to the pgd level anymore
>> >> from other early mappings.
>> >
>> > As a heads-up, for avoiding TLB conflicts, I'm currently working on
>> > alternative way of creating the kernel page tables which will definitely
>> > conflict here, and may or may not supersede this approach.
>> >
>> > By adding new FIX_{PGD,PUD,PMD,PTE} indices to the fixmap, we can
>> > allocate page tables from anywhere via memblock, and temporarily map
>> > them as we need to.
>> >
>>
>> Interesting. So how are you dealing with the va<->pa translations and
>> vice versa that occur all over the place in create_mapping() et al ?
>
> By rewriting create_mapping() et al to not do that ;)
>
> That's requiring a fair amount of massaging, but so far I've not hit
> anything that renders the approach impossible.
>
>> > That would avoid the need for the bootstrap tables. In head.S we'd only
>> > need to create a temporary (coarse-grained, RWX) kernel mapping (with
>> > the fixmap bolted on). Later we would create a whole new set of tables
>> > with a fine-grained kernel mapping and a full linear mapping using the
>> > new fixmap entries to temporarily map tables, then switch over to those
>> > atomically.
>> >
>>
>> If we change back to a full linear mapping, are we back to not putting
>> the Image astride a 1GB/32MB/512MB boundary (depending on page size)?
>
> I'm not exactly sure what you mean here.
>

Apologies, I misread 'linear mapping' as 'id mapping', which of course
are two different things entirely.

> The kernel mapping may inhibit using large section mappings, but this is
> necessary anyway due to permission changes at sub-section granularity
> (e.g. in fixup_init).
>
> The idea is that when the kernel tables are set up, things are mapped at
> the largest possible granularity that permits later permission changes
> without breaking/making sections (such that we can avoid TLB conflicts).
>
> So we'd map the kernel and memory in segments, where no two segments
> share a common last-level entry (i.e. they're all at least page-aligned,
> and don't share a section with another segment).
>
> We'd have separate segments for:
> * memory below TEXT_OFFSET
> * text
> * rodata
> * init
> * altinstr (I think this can be folded into rodata)
> * bss / data, tables
> * memory above _end
>
> Later I think it should be relatively simple to move the memory segment
> mapping for split-VA.
>

I'd need to see it to understand, I guess, but getting rid of the
pa<->va translations is definitely an improvement for the stuff I am
trying to do, and would probably make it a lot cleaner.

>> Anyway, to illustrate where I am headed with this: in my next version
>> of this series, I intend to move the kernel mapping to the start of
>> the vmalloc area, which gets moved up 64 MB to make room for the
>> module area (which also moves down). That way, we can still load
>> modules as before, but no longer have a need for a dedicated carveout
>> for the kernel below PAGE_OFFSET.
>
> Ok.
>
>> The next step is then to move the kernel Image up inside the vmalloc
>> area based on some randomness we get from the bootloader, and relocate
>> it in place (using the same approach as in the patches I sent out
>> beginning of this year). I have implemented module PLTs so that the
>> Image and the modules no longer need to be within 128 MB of each
>> other, which means that we can have full KASLR for modules and Image,
>> and also place the kernel anywhere in physical memory. The module PLTs
>> would be a runtime penalty only, i.e., a KASLR capable kernel running
>> without KASLR would not incur the penalty of branching via PLTs. The
>> only build time option is -mcmodel=large for modules so that data
>> symbol references are absolute, but that is unlikely to hurt
>> performance.
>
> I'm certainly interested in seeing this!
>

I have patches for all of this, only they don't live on the same branch yet :-)

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v3 5/7] arm64: move kernel mapping out of linear region
  2015-11-16 11:23 ` [PATCH v3 5/7] arm64: move kernel mapping out of linear region Ard Biesheuvel
@ 2015-12-07 12:26   ` Catalin Marinas
  2015-12-07 12:33     ` Ard Biesheuvel
  0 siblings, 1 reply; 28+ messages in thread
From: Catalin Marinas @ 2015-12-07 12:26 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Nov 16, 2015 at 12:23:16PM +0100, Ard Biesheuvel wrote:
> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index 23cfc08fc8ba..d3e4b5d6a8d2 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
[...]
> @@ -210,7 +210,15 @@ section_table:
>  ENTRY(stext)
>  	bl	preserve_boot_args
>  	bl	el2_setup			// Drop to EL1, w20=cpu_boot_mode
> -	adrp	x24, __PHYS_OFFSET
> +
> +	/*
> +	 * Before the linear mapping has been set up, __va() translations will
> +	 * not produce usable virtual addresses unless we tweak PHYS_OFFSET to
> +	 * compensate for the offset between the kernel mapping and the base of
> +	 * the linear mapping. We will undo this in map_mem().
> +	 */

Minor typo in comment: I guess you meant "__pa() translations will not
produce usable...".

> diff --git a/arch/arm64/mm/dump.c b/arch/arm64/mm/dump.c
> index 5a22a119a74c..f6272f450688 100644
> --- a/arch/arm64/mm/dump.c
> +++ b/arch/arm64/mm/dump.c
> @@ -63,7 +63,8 @@ static struct addr_marker address_markers[] = {
>  	{ PCI_IO_END,		"PCI I/O end" },
>  	{ MODULES_VADDR,	"Modules start" },
>  	{ MODULES_END,		"Modules end" },
> -	{ PAGE_OFFSET,		"Kernel Mapping" },
> +	{ KIMAGE_VADDR,		"Kernel Mapping" },
> +	{ PAGE_OFFSET,		"Linear Mapping" },
>  	{ -1,			NULL },
>  };

Apart from this, please change the pr_notice() in mem_init() to show the
linear mapping at the end (keep them in ascending order).

-- 
Catalin

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v3 5/7] arm64: move kernel mapping out of linear region
  2015-12-07 12:26   ` Catalin Marinas
@ 2015-12-07 12:33     ` Ard Biesheuvel
  2015-12-07 12:34       ` Ard Biesheuvel
  0 siblings, 1 reply; 28+ messages in thread
From: Ard Biesheuvel @ 2015-12-07 12:33 UTC (permalink / raw)
  To: linux-arm-kernel

On 7 December 2015 at 13:26, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Mon, Nov 16, 2015 at 12:23:16PM +0100, Ard Biesheuvel wrote:
>> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
>> index 23cfc08fc8ba..d3e4b5d6a8d2 100644
>> --- a/arch/arm64/kernel/head.S
>> +++ b/arch/arm64/kernel/head.S
> [...]
>> @@ -210,7 +210,15 @@ section_table:
>>  ENTRY(stext)
>>       bl      preserve_boot_args
>>       bl      el2_setup                       // Drop to EL1, w20=cpu_boot_mode
>> -     adrp    x24, __PHYS_OFFSET
>> +
>> +     /*
>> +      * Before the linear mapping has been set up, __va() translations will
>> +      * not produce usable virtual addresses unless we tweak PHYS_OFFSET to
>> +      * compensate for the offset between the kernel mapping and the base of
>> +      * the linear mapping. We will undo this in map_mem().
>> +      */
>
> Minor typo in comment: I guess you meant "__pa() translations will not
> produce usable...".
>

No, not quite. __va() translations will normally produce addresses in
the linear mapping, which will not be set up when we first start using
it in create_mapping(). So until that time, we have to redirect __va()
translations into the kernel mapping, where swapper_pg_dir is
shadowed. I am hoping that Mark's planned changes to create_mapping()
will make this unnecessary, but I haven't seen any of his code yet.

As far as __pa() is concerned, that translation is actually tweaked so
it will always produce usable addresses, regardless of whether the
bias is still set or not. The reason is that va-to-pa translations are
always unambiguous.
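
To make that concrete, here is a toy illustration of the arithmetic
(stand-alone program; the constants and the 64 MB offset are made-up
values, only the shape of the __va() calculation matches the real
macros):

#include <inttypes.h>
#include <stdio.h>

/* illustration values only -- not the kernel's actual layout */
#define PAGE_OFFSET	0xffffffc000000000ULL	/* base of the linear region */
#define KIMAGE_VADDR	0xffffffbffc000000ULL	/* kernel mapping, 64 MB below it */

static uint64_t memstart_addr;			/* i.e., PHYS_OFFSET */

/* models __va(): pa - PHYS_OFFSET + PAGE_OFFSET */
static uint64_t va(uint64_t pa)
{
	return pa - memstart_addr + PAGE_OFFSET;
}

int main(void)
{
	uint64_t kernel_pa = 0x80080000;	/* example Image load address */

	/* early boot: bias memstart_addr so va(kernel_pa) == KIMAGE_VADDR */
	memstart_addr = kernel_pa - (KIMAGE_VADDR - PAGE_OFFSET);
	printf("early va(kernel) = 0x%" PRIx64 "\n", va(kernel_pa));

	/* map_mem(): undo the bias once the linear mapping exists */
	memstart_addr = 0x80000000;		/* real base of DRAM */
	printf("later va(kernel) = 0x%" PRIx64 "\n", va(kernel_pa));
	return 0;
}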

>> diff --git a/arch/arm64/mm/dump.c b/arch/arm64/mm/dump.c
>> index 5a22a119a74c..f6272f450688 100644
>> --- a/arch/arm64/mm/dump.c
>> +++ b/arch/arm64/mm/dump.c
>> @@ -63,7 +63,8 @@ static struct addr_marker address_markers[] = {
>>       { PCI_IO_END,           "PCI I/O end" },
>>       { MODULES_VADDR,        "Modules start" },
>>       { MODULES_END,          "Modules end" },
>> -     { PAGE_OFFSET,          "Kernel Mapping" },
>> +     { KIMAGE_VADDR,         "Kernel Mapping" },
>> +     { PAGE_OFFSET,          "Linear Mapping" },
>>       { -1,                   NULL },
>>  };
>
> Apart from this, please change the pr_notice() in mem_init() to show the
> linear mapping at the end (keep them in ascending order).
>

OK

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v3 5/7] arm64: move kernel mapping out of linear region
  2015-12-07 12:33     ` Ard Biesheuvel
@ 2015-12-07 12:34       ` Ard Biesheuvel
  2015-12-07 15:37         ` Catalin Marinas
  0 siblings, 1 reply; 28+ messages in thread
From: Ard Biesheuvel @ 2015-12-07 12:34 UTC (permalink / raw)
  To: linux-arm-kernel

On 7 December 2015 at 13:33, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 7 December 2015 at 13:26, Catalin Marinas <catalin.marinas@arm.com> wrote:
>> On Mon, Nov 16, 2015 at 12:23:16PM +0100, Ard Biesheuvel wrote:
>>> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
>>> index 23cfc08fc8ba..d3e4b5d6a8d2 100644
>>> --- a/arch/arm64/kernel/head.S
>>> +++ b/arch/arm64/kernel/head.S
>> [...]
>>> @@ -210,7 +210,15 @@ section_table:
>>>  ENTRY(stext)
>>>       bl      preserve_boot_args
>>>       bl      el2_setup                       // Drop to EL1, w20=cpu_boot_mode
>>> -     adrp    x24, __PHYS_OFFSET
>>> +
>>> +     /*
>>> +      * Before the linear mapping has been set up, __va() translations will
>>> +      * not produce usable virtual addresses unless we tweak PHYS_OFFSET to
>>> +      * compensate for the offset between the kernel mapping and the base of
>>> +      * the linear mapping. We will undo this in map_mem().
>>> +      */
>>
>> Minor typo in comment: I guess you meant "__pa() translations will not
>> produce usable...".
>>
>
> No, not quite. __va() translations will normally produce addresses in
> the linear mapping, which will not be set up when we first start using
> it in create_mapping(). So until that time, we have to redirect __va()
> translations into the kernel mapping, where swapper_pg_dir is
> shadowed. I am hoping that Mark's planned changes to create_mapping()
> will make this unnecessary, but I haven't seen any of his code yet.
>
> As far as __pa() is concerned, that translation is actually tweaked so
> it will always produce usable addresses, regardless of whether the
> bias is still set or not. The reason is that va-to-pa translations are
> always unambiguous.
>

... so of course, the comment is still wrong, -> s/virtual/physical/ addresses


>>> diff --git a/arch/arm64/mm/dump.c b/arch/arm64/mm/dump.c
>>> index 5a22a119a74c..f6272f450688 100644
>>> --- a/arch/arm64/mm/dump.c
>>> +++ b/arch/arm64/mm/dump.c
>>> @@ -63,7 +63,8 @@ static struct addr_marker address_markers[] = {
>>>       { PCI_IO_END,           "PCI I/O end" },
>>>       { MODULES_VADDR,        "Modules start" },
>>>       { MODULES_END,          "Modules end" },
>>> -     { PAGE_OFFSET,          "Kernel Mapping" },
>>> +     { KIMAGE_VADDR,         "Kernel Mapping" },
>>> +     { PAGE_OFFSET,          "Linear Mapping" },
>>>       { -1,                   NULL },
>>>  };
>>
>> Apart from this, please change the pr_notice() in mem_init() to show the
>> linear mapping at the end (keep them in ascending order).
>>
>
> OK

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v3 7/7] arm64: allow kernel Image to be loaded anywhere in physical memory
  2015-11-16 11:23 ` [PATCH v3 7/7] arm64: allow kernel Image to be loaded anywhere in physical memory Ard Biesheuvel
@ 2015-12-07 15:30   ` Catalin Marinas
  2015-12-07 15:40     ` Ard Biesheuvel
  0 siblings, 1 reply; 28+ messages in thread
From: Catalin Marinas @ 2015-12-07 15:30 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Nov 16, 2015 at 12:23:18PM +0100, Ard Biesheuvel wrote:
> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> index 3148691bc80a..d6a237bda1f9 100644
> --- a/arch/arm64/include/asm/memory.h
> +++ b/arch/arm64/include/asm/memory.h
> @@ -120,13 +120,10 @@ extern phys_addr_t		memstart_addr;
>  extern u64 kernel_va_offset;
>  
>  /*
> - * The maximum physical address that the linear direct mapping
> - * of system RAM can cover. (PAGE_OFFSET can be interpreted as
> - * a 2's complement signed quantity and negated to derive the
> - * maximum size of the linear mapping.)
> + * Allow all memory at the discovery stage. We will clip it later.
>   */
> -#define MAX_MEMBLOCK_ADDR	({ memstart_addr - PAGE_OFFSET - 1; })
> -#define MIN_MEMBLOCK_ADDR	__pa(KIMAGE_VADDR)
> +#define MIN_MEMBLOCK_ADDR	0
> +#define MAX_MEMBLOCK_ADDR	U64_MAX

Just in case we get some random memblock information, shall we cap the
maximum to PHYS_MASK?

> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index b3b0175d7135..29a7dc5327b6 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -158,9 +159,55 @@ static int __init early_mem(char *p)
>  }
>  early_param("mem", early_mem);
>  
> +static void __init enforce_memory_limit(void)
> +{
> +	const phys_addr_t kbase = round_down(__pa(_text), MIN_KIMG_ALIGN);
> +	u64 to_remove = memblock_phys_mem_size() - memory_limit;
> +	phys_addr_t max_addr = 0;
> +	struct memblock_region *r;
> +
> +	if (memory_limit == (phys_addr_t)ULLONG_MAX)
> +		return;
> +
> +	/*
> +	 * The kernel may be high up in physical memory, so try to apply the
> +	 * limit below the kernel first, and only let the generic handling
> +	 * take over if it turns out we haven't clipped enough memory yet.
> +	 */
> +	for_each_memblock(memory, r) {
> +		if (r->base + r->size > kbase) {
> +			u64 rem = min(to_remove, kbase - r->base);
> +
> +			max_addr = r->base + rem;
> +			to_remove -= rem;
> +			break;
> +		}
> +		if (to_remove <= r->size) {
> +			max_addr = r->base + to_remove;
> +			to_remove = 0;
> +			break;
> +		}
> +		to_remove -= r->size;
> +	}
> +
> +	memblock_remove(0, max_addr);

I don't fully get the reason for this function. Do you want to keep the
kernel around in memblock? How do we guarantee that the call below
wouldn't remove it anyway?

> +
> +	if (to_remove)
> +		memblock_enforce_memory_limit(memory_limit);

Shouldn't this be memblock_enforce_memory_limit(to_remove)?

> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 526eeb7e1e97..1b9d7e48ba1e 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c

> @@ -481,11 +482,33 @@ static void __init bootstrap_linear_mapping(unsigned long va_offset)
>  static void __init map_mem(void)
>  {
>  	struct memblock_region *reg;
> +	u64 new_memstart_addr;
> +	u64 new_va_offset;
>  
> -	bootstrap_linear_mapping(KIMAGE_OFFSET);
> +	/*
> +	 * Select a suitable value for the base of physical memory.
> +	 * This should be equal to or below the lowest usable physical
> +	 * memory address, and aligned to PUD/PMD size so that we can map
> +	 * it efficiently.
> +	 */
> +	new_memstart_addr = round_down(memblock_start_of_DRAM(), SZ_1G);

With this trick, we can no longer assume we have a mapping at
PAGE_OFFSET. I don't think we break any expectations but we probably
don't free the unused memmap at the beginning. We can probably set
prev_end to this rounded down address in free_unused_memmap().

-- 
Catalin

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v3 5/7] arm64: move kernel mapping out of linear region
  2015-12-07 12:34       ` Ard Biesheuvel
@ 2015-12-07 15:37         ` Catalin Marinas
  0 siblings, 0 replies; 28+ messages in thread
From: Catalin Marinas @ 2015-12-07 15:37 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Dec 07, 2015 at 01:34:19PM +0100, Ard Biesheuvel wrote:
> On 7 December 2015 at 13:33, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> > On 7 December 2015 at 13:26, Catalin Marinas <catalin.marinas@arm.com> wrote:
> >> On Mon, Nov 16, 2015 at 12:23:16PM +0100, Ard Biesheuvel wrote:
> >>> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> >>> index 23cfc08fc8ba..d3e4b5d6a8d2 100644
> >>> --- a/arch/arm64/kernel/head.S
> >>> +++ b/arch/arm64/kernel/head.S
> >> [...]
> >>> @@ -210,7 +210,15 @@ section_table:
> >>>  ENTRY(stext)
> >>>       bl      preserve_boot_args
> >>>       bl      el2_setup                       // Drop to EL1, w20=cpu_boot_mode
> >>> -     adrp    x24, __PHYS_OFFSET
> >>> +
> >>> +     /*
> >>> +      * Before the linear mapping has been set up, __va() translations will
> >>> +      * not produce usable virtual addresses unless we tweak PHYS_OFFSET to
> >>> +      * compensate for the offset between the kernel mapping and the base of
> >>> +      * the linear mapping. We will undo this in map_mem().
> >>> +      */
> >>
> >> Minor typo in comment: I guess you meant "__pa() translations will not
> >> produce usable...".
> >
> > No, not quite. __va() translations will normally produce addresses in
> > the linear mapping, which will not be set up when we first start using
> > it in create_mapping(). So until that time, we have to redirect __va()
> > translations into the kernel mapping, where swapper_pg_dir is
> > shadowed.

I guessed what you meant and I remember the reason based on past
discussions, only that to me "linear mapping" sounds like something in
virtual space while __va() generates a linear mapping -> physical
translation (just some wording, nothing serious).

> > I am hoping that Mark's planned changes to create_mapping()
> > will make this unnecessary, but I haven't seen any of his code yet.

Not sure, I haven't seen the details yet.

> > As far as __pa() is concerned, that translation is actually tweaked so
> > it will always produce usable addresses, regardless of whether the
> > bias is still set or not. The reason is that va-to-pa translations are
> > always unambiguous.

Only that very early during boot, memstart_addr is still based on the
kernel load address rather than memblock_start_of_DRAM(), which is why I
thought you meant __pa().

> ... so of course, the comment is still wrong, -> s/virtual/physical/
> addresses

This would do.

-- 
Catalin

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v3 7/7] arm64: allow kernel Image to be loaded anywhere in physical memory
  2015-12-07 15:30   ` Catalin Marinas
@ 2015-12-07 15:40     ` Ard Biesheuvel
  2015-12-07 16:43       ` Catalin Marinas
  0 siblings, 1 reply; 28+ messages in thread
From: Ard Biesheuvel @ 2015-12-07 15:40 UTC (permalink / raw)
  To: linux-arm-kernel

On 7 December 2015 at 16:30, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Mon, Nov 16, 2015 at 12:23:18PM +0100, Ard Biesheuvel wrote:
>> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
>> index 3148691bc80a..d6a237bda1f9 100644
>> --- a/arch/arm64/include/asm/memory.h
>> +++ b/arch/arm64/include/asm/memory.h
>> @@ -120,13 +120,10 @@ extern phys_addr_t              memstart_addr;
>>  extern u64 kernel_va_offset;
>>
>>  /*
>> - * The maximum physical address that the linear direct mapping
>> - * of system RAM can cover. (PAGE_OFFSET can be interpreted as
>> - * a 2's complement signed quantity and negated to derive the
>> - * maximum size of the linear mapping.)
>> + * Allow all memory at the discovery stage. We will clip it later.
>>   */
>> -#define MAX_MEMBLOCK_ADDR    ({ memstart_addr - PAGE_OFFSET - 1; })
>> -#define MIN_MEMBLOCK_ADDR    __pa(KIMAGE_VADDR)
>> +#define MIN_MEMBLOCK_ADDR    0
>> +#define MAX_MEMBLOCK_ADDR    U64_MAX
>
> Just in case we get some random memblock information, shall we cap the
> maximum to PHYS_MASK?
>

Yes, that makes sense.
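
Something like the below, I suppose (sketch only; I haven't checked
whether PHYS_MASK is already visible where these macros live, so take it
as the idea rather than the patch):

 #define MIN_MEMBLOCK_ADDR	0
-#define MAX_MEMBLOCK_ADDR	U64_MAX
+#define MAX_MEMBLOCK_ADDR	((phys_addr_t)PHYS_MASK)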

>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>> index b3b0175d7135..29a7dc5327b6 100644
>> --- a/arch/arm64/mm/init.c
>> +++ b/arch/arm64/mm/init.c
>> @@ -158,9 +159,55 @@ static int __init early_mem(char *p)
>>  }
>>  early_param("mem", early_mem);
>>
>> +static void __init enforce_memory_limit(void)
>> +{
>> +     const phys_addr_t kbase = round_down(__pa(_text), MIN_KIMG_ALIGN);
>> +     u64 to_remove = memblock_phys_mem_size() - memory_limit;
>> +     phys_addr_t max_addr = 0;
>> +     struct memblock_region *r;
>> +
>> +     if (memory_limit == (phys_addr_t)ULLONG_MAX)
>> +             return;
>> +
>> +     /*
>> +      * The kernel may be high up in physical memory, so try to apply the
>> +      * limit below the kernel first, and only let the generic handling
>> +      * take over if it turns out we haven't clipped enough memory yet.
>> +      */
>> +     for_each_memblock(memory, r) {
>> +             if (r->base + r->size > kbase) {
>> +                     u64 rem = min(to_remove, kbase - r->base);
>> +
>> +                     max_addr = r->base + rem;
>> +                     to_remove -= rem;
>> +                     break;
>> +             }
>> +             if (to_remove <= r->size) {
>> +                     max_addr = r->base + to_remove;
>> +                     to_remove = 0;
>> +                     break;
>> +             }
>> +             to_remove -= r->size;
>> +     }
>> +
>> +     memblock_remove(0, max_addr);
>
> I don't fully get the reason for this function. Do you want to keep the
> kernel around in memblock? How do we guarantee that the call below
> wouldn't remove it anyway?
>

The problem is that the ordinary memblock_enforce_memory_limit()
removes memory from the top, which means it will happily remove the
memory that covers your kernel image if it happens to be loaded high
up in physical memory (e.g., with mem=1G and the Image loaded at 4 GB,
clipping from the top would drop the Image itself).

>> +
>> +     if (to_remove)
>> +             memblock_enforce_memory_limit(memory_limit);
>
> Shouldn't this be memblock_enforce_memory_limit(to_remove)?
>

No, it takes the memory limit as input. 'to_remove + memory_limit'
will be exactly the remaining memory at this point.

>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index 526eeb7e1e97..1b9d7e48ba1e 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>
>> @@ -481,11 +482,33 @@ static void __init bootstrap_linear_mapping(unsigned long va_offset)
>>  static void __init map_mem(void)
>>  {
>>       struct memblock_region *reg;
>> +     u64 new_memstart_addr;
>> +     u64 new_va_offset;
>>
>> -     bootstrap_linear_mapping(KIMAGE_OFFSET);
>> +     /*
>> +      * Select a suitable value for the base of physical memory.
>> +      * This should be equal to or below the lowest usable physical
>> +      * memory address, and aligned to PUD/PMD size so that we can map
>> +      * it efficiently.
>> +      */
>> +     new_memstart_addr = round_down(memblock_start_of_DRAM(), SZ_1G);
>
> With this trick, we can no longer assume we have a mapping at
> PAGE_OFFSET. I don't think we break any expectations but we probably
> don't free the unused memmap at the beginning. We can probably set
> prev_end to this rounded down address in free_unused_memmap().
>

I will look into that.
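
The rough idea would be something like this, I suppose (untested sketch,
assuming prev_end in free_unused_memmap() is the pfn that is currently
initialised to 0):

--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ ... @@ static void __init free_unused_memmap(void)
-	unsigned long start, prev_end = 0;
+	unsigned long start, prev_end =
+		__phys_to_pfn(round_down(memblock_start_of_DRAM(), SZ_1G));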

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v3 3/7] arm64: split off early mapping code from early_fixmap_init()
  2015-12-03 13:31     ` Ard Biesheuvel
  2015-12-03 13:59       ` Mark Rutland
@ 2015-12-07 16:08       ` Catalin Marinas
  2015-12-07 16:13         ` Ard Biesheuvel
  1 sibling, 1 reply; 28+ messages in thread
From: Catalin Marinas @ 2015-12-07 16:08 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Dec 03, 2015 at 02:31:19PM +0100, Ard Biesheuvel wrote:
> On 3 December 2015 at 13:18, Mark Rutland <mark.rutland@arm.com> wrote:
> > As a heads-up, for avoiding TLB conflicts, I'm currently working on
> > an alternative way of creating the kernel page tables which will definitely
> > conflict here, and may or may not supersede this approach.
> >
> > By adding new FIX_{PGD,PUD,PMD,PTE} indices to the fixmap, we can
> > allocate page tables from anywhere via memblock, and temporarily map
> > them as we need to.
[...]
> > That would avoid the need for the bootstrap tables. In head.S we'd only
> > need to create a temporary (coarse-grained, RWX) kernel mapping (with
> > the fixmap bolted on). Later we would create a whole new set of tables
> > with a fine-grained kernel mapping and a full linear mapping using the
> > new fixmap entries to temporarily map tables, then switch over to those
> > atomically.

If we separate the kernel image mapping from the linear one, I think
things would be slightly simpler to avoid TLB conflicts (but I haven't
looked at Mark's patches yet).
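
To illustrate the fixmap-slot idea, I imagine something along these
lines (purely a guess at the shape of Mark's unposted code; the FIX_PMD
index and the helper names are made up):

/*
 * Temporarily map a memblock-allocated pmd table through a dedicated
 * fixmap slot so that we can write its entries, then tear the window
 * down again.
 */
static pmd_t *pmd_set_fixmap(phys_addr_t pmd_phys)
{
	return (pmd_t *)set_fixmap_offset(FIX_PMD, pmd_phys);
}

static void pmd_clear_fixmap(void)
{
	clear_fixmap(FIX_PMD);
}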

> If we change back to a full linear mapping, are we back to not putting
> the Image astride a 1GB/32MB/512MB boundary (depending on page size)?
> 
> Anyway, to illustrate where I am headed with this: in my next version
> of this series, I intend to move the kernel mapping to the start of
> the vmalloc area, which gets moved up 64 MB to make room for the
> module area (which also moves down). That way, we can still load
> modules as before, but no longer have a need for a dedicated carveout
> for the kernel below PAGE_OFFSET.

This makes sense, I guess it can be easily added to the existing series
just by changing the KIMAGE_OFFSET macro.

> The next step is then to move the kernel Image up inside the vmalloc
> area based on some randomness we get from the bootloader, and relocate
> it in place (using the same approach as in the patches I sent out
> beginning of this year). I have implemented module PLTs so that the
> Image and the modules no longer need to be within 128 MB of each
> other, which means that we can have full KASLR for modules and Image,
> and also place the kernel anywhere in physical memory. The module PLTs
> would be a runtime penalty only, i.e., a KASLR capable kernel running
> without KASLR would not incur the penalty of branching via PLTs. The
> only build time option is -mcmodel=large for modules so that data
> symbol references are absolute, but that is unlikely to hurt
> performance.

I guess full KASLR would be conditional on a config option.

-- 
Catalin

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v3 3/7] arm64: split off early mapping code from early_fixmap_init()
  2015-12-07 16:08       ` Catalin Marinas
@ 2015-12-07 16:13         ` Ard Biesheuvel
  0 siblings, 0 replies; 28+ messages in thread
From: Ard Biesheuvel @ 2015-12-07 16:13 UTC (permalink / raw)
  To: linux-arm-kernel

On 7 December 2015 at 17:08, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Thu, Dec 03, 2015 at 02:31:19PM +0100, Ard Biesheuvel wrote:
>> On 3 December 2015 at 13:18, Mark Rutland <mark.rutland@arm.com> wrote:
>> > As a heads-up, for avoiding TLB conflicts, I'm currently working on
>> > an alternative way of creating the kernel page tables which will definitely
>> > conflict here, and may or may not supersede this approach.
>> >
>> > By adding new FIX_{PGD,PUD,PMD,PTE} indices to the fixmap, we can
>> > allocate page tables from anywhere via memblock, and temporarily map
>> > them as we need to.
> [...]
>> > That would avoid the need for the bootstrap tables. In head.S we'd only
>> > need to create a temporary (coarse-grained, RWX) kernel mapping (with
>> > the fixmap bolted on). Later we would create a whole new set of tables
>> > with a fine-grained kernel mapping and a full linear mapping using the
>> > new fixmap entries to temporarily map tables, then switch over to those
>> > atomically.
>
> If we separate the kernel image mapping from the linear one, I think
> things would be slightly simpler to avoid TLB conflicts (but I haven't
> looked at Mark's patches yet).
>
>> If we change back to a full linear mapping, are we back to not putting
>> the Image astride a 1GB/32MB/512MB boundary (depending on page size)?
>>
>> Anyway, to illustrate where I am headed with this: in my next version
>> of this series, I intend to move the kernel mapping to the start of
>> the vmalloc area, which gets moved up 64 MB to make room for the
>> module area (which also moves down). That way, we can still load
>> modules as before, but no longer have a need for a dedicated carveout
>> for the kernel below PAGE_OFFSET.
>
> This makes sense, I guess it can be easily added to the existing series
> just by changing the KIMAGE_OFFSET macro.
>

Indeed. The only difference is that the VM area needs to be reserved
explicitly, to prevent vmalloc() from reusing it.
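
For illustration, that reservation could look roughly like this (sketch
only; the variable name, the VM_MAP flag and deriving the size from _end
are my assumptions, not the actual patch):

static struct vm_struct vmlinux_vm __initdata = {
	.addr	= (void *)KIMAGE_VADDR,
	.flags	= VM_MAP,
};

static void __init reserve_kernel_vm_area(void)
{
	/* cover the whole Image, rounded up to the swapper block size */
	vmlinux_vm.size = round_up((unsigned long)_end - KIMAGE_VADDR,
				   SWAPPER_BLOCK_SIZE);
	vm_area_add_early(&vmlinux_vm);
}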

>> The next step is then to move the kernel Image up inside the vmalloc
>> area based on some randomness we get from the bootloader, and relocate
>> it in place (using the same approach as in the patches I sent out
>> beginning of this year). I have implemented module PLTs so that the
>> Image and the modules no longer need to be within 128 MB of each
>> other, which means that we can have full KASLR for modules and Image,
>> and also place the kernel anywhere in physical memory. The module PLTs
>> would be a runtime penalty only, i.e., a KASLR capable kernel running
>> without KASLR would not incur the penalty of branching via PLTs. The
>> only build time option is -mcmodel=large for modules so that data
>> symbol references are absolute, but that is unlikely to hurt
>> performance.
>
> I guess full KASLR would be conditional on a config option.
>

Yes. But it would be nice if the only build time penalty is the use of
-mcmodel=large for modules, so that distro kernels can enable KASLR
unconditionally (especially since -mcmodel=large is likely to be
enabled for distro kernels anyway, due to the A53 erratum that
requires it.)

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v3 6/7] arm64: map linear region as non-executable
  2015-11-16 11:23 ` [PATCH v3 6/7] arm64: map linear region as non-executable Ard Biesheuvel
@ 2015-12-07 16:19   ` Catalin Marinas
  2015-12-07 16:22     ` Ard Biesheuvel
  0 siblings, 1 reply; 28+ messages in thread
From: Catalin Marinas @ 2015-12-07 16:19 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Nov 16, 2015 at 12:23:17PM +0100, Ard Biesheuvel wrote:
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index c7ba171951c8..526eeb7e1e97 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -357,47 +357,10 @@ static void create_mapping_late(phys_addr_t phys, unsigned long virt,
>  				phys, virt, size, prot, late_alloc);
>  }
>  
> -#ifdef CONFIG_DEBUG_RODATA
>  static void __init __map_memblock(phys_addr_t start, phys_addr_t end)
>  {
> -	/*
> -	 * Set up the executable regions using the existing section mappings
> -	 * for now. This will get more fine grained later once all memory
> -	 * is mapped
> -	 */
> -	unsigned long kernel_x_start = round_down(__pa(_stext), SWAPPER_BLOCK_SIZE);
> -	unsigned long kernel_x_end = round_up(__pa(__init_end), SWAPPER_BLOCK_SIZE);
> -
> -	if (end < kernel_x_start) {
> -		create_mapping(start, __phys_to_virt(start),
> -			end - start, PAGE_KERNEL);
> -	} else if (start >= kernel_x_end) {
> -		create_mapping(start, __phys_to_virt(start),
> -			end - start, PAGE_KERNEL);
> -	} else {
> -		if (start < kernel_x_start)
> -			create_mapping(start, __phys_to_virt(start),
> -				kernel_x_start - start,
> -				PAGE_KERNEL);
> -		create_mapping(kernel_x_start,
> -				__phys_to_virt(kernel_x_start),
> -				kernel_x_end - kernel_x_start,
> -				PAGE_KERNEL_EXEC);
> -		if (kernel_x_end < end)
> -			create_mapping(kernel_x_end,
> -				__phys_to_virt(kernel_x_end),
> -				end - kernel_x_end,
> -				PAGE_KERNEL);
> -	}
> -
> -}
> -#else
> -static void __init __map_memblock(phys_addr_t start, phys_addr_t end)
> -{
> -	create_mapping(start, __phys_to_virt(start), end - start,
> -			PAGE_KERNEL_EXEC);
> +	create_mapping(start, __phys_to_virt(start), end - start, PAGE_KERNEL);
>  }
> -#endif
>  
>  struct bootstrap_pgtables {
>  	pte_t	pte[PTRS_PER_PTE];
> @@ -471,7 +434,7 @@ static unsigned long __init bootstrap_region(struct bootstrap_pgtables *reg,
>  				      SWAPPER_BLOCK_SIZE));
>  
>  		create_mapping(__pa(vstart - va_offset), vstart, vend - vstart,
> -			       PAGE_KERNEL_EXEC);
> +			       PAGE_KERNEL);
>  
>  		return vend;
>  	}

These make sense. However, shall we go a step further and unmap the
kernel image completely from the linear mapping, maybe based on
CONFIG_DEBUG_RODATA? The mark_rodata_ro() function changes the text to
read-only but you can still get writable access to it via
__va(__pa(_stext)).

-- 
Catalin

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v3 6/7] arm64: map linear region as non-executable
  2015-12-07 16:19   ` Catalin Marinas
@ 2015-12-07 16:22     ` Ard Biesheuvel
  2015-12-07 16:27       ` Catalin Marinas
  0 siblings, 1 reply; 28+ messages in thread
From: Ard Biesheuvel @ 2015-12-07 16:22 UTC (permalink / raw)
  To: linux-arm-kernel

On 7 December 2015 at 17:19, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Mon, Nov 16, 2015 at 12:23:17PM +0100, Ard Biesheuvel wrote:
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index c7ba171951c8..526eeb7e1e97 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -357,47 +357,10 @@ static void create_mapping_late(phys_addr_t phys, unsigned long virt,
>>                               phys, virt, size, prot, late_alloc);
>>  }
>>
>> -#ifdef CONFIG_DEBUG_RODATA
>>  static void __init __map_memblock(phys_addr_t start, phys_addr_t end)
>>  {
>> -     /*
>> -      * Set up the executable regions using the existing section mappings
>> -      * for now. This will get more fine grained later once all memory
>> -      * is mapped
>> -      */
>> -     unsigned long kernel_x_start = round_down(__pa(_stext), SWAPPER_BLOCK_SIZE);
>> -     unsigned long kernel_x_end = round_up(__pa(__init_end), SWAPPER_BLOCK_SIZE);
>> -
>> -     if (end < kernel_x_start) {
>> -             create_mapping(start, __phys_to_virt(start),
>> -                     end - start, PAGE_KERNEL);
>> -     } else if (start >= kernel_x_end) {
>> -             create_mapping(start, __phys_to_virt(start),
>> -                     end - start, PAGE_KERNEL);
>> -     } else {
>> -             if (start < kernel_x_start)
>> -                     create_mapping(start, __phys_to_virt(start),
>> -                             kernel_x_start - start,
>> -                             PAGE_KERNEL);
>> -             create_mapping(kernel_x_start,
>> -                             __phys_to_virt(kernel_x_start),
>> -                             kernel_x_end - kernel_x_start,
>> -                             PAGE_KERNEL_EXEC);
>> -             if (kernel_x_end < end)
>> -                     create_mapping(kernel_x_end,
>> -                             __phys_to_virt(kernel_x_end),
>> -                             end - kernel_x_end,
>> -                             PAGE_KERNEL);
>> -     }
>> -
>> -}
>> -#else
>> -static void __init __map_memblock(phys_addr_t start, phys_addr_t end)
>> -{
>> -     create_mapping(start, __phys_to_virt(start), end - start,
>> -                     PAGE_KERNEL_EXEC);
>> +     create_mapping(start, __phys_to_virt(start), end - start, PAGE_KERNEL);
>>  }
>> -#endif
>>
>>  struct bootstrap_pgtables {
>>       pte_t   pte[PTRS_PER_PTE];
>> @@ -471,7 +434,7 @@ static unsigned long __init bootstrap_region(struct bootstrap_pgtables *reg,
>>                                     SWAPPER_BLOCK_SIZE));
>>
>>               create_mapping(__pa(vstart - va_offset), vstart, vend - vstart,
>> -                            PAGE_KERNEL_EXEC);
>> +                            PAGE_KERNEL);
>>
>>               return vend;
>>       }
>
> These make sense. However, shall we go a step further and unmap the
> kernel image completely from the linear mapping, maybe based on
> CONFIG_DEBUG_RODATA? The mark_rodata_ro() function changes the text to
> read-only but you can still get writable access to it via
> __va(__pa(_stext)).
>

If we can tolerate the fragmentation, then yes, let's unmap it
completely. As long as we don't unmap the .pgdir section, since that
will be referenced via the linear mapping.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v3 6/7] arm64: map linear region as non-executable
  2015-12-07 16:22     ` Ard Biesheuvel
@ 2015-12-07 16:27       ` Catalin Marinas
  0 siblings, 0 replies; 28+ messages in thread
From: Catalin Marinas @ 2015-12-07 16:27 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Dec 07, 2015 at 05:22:32PM +0100, Ard Biesheuvel wrote:
> On 7 December 2015 at 17:19, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Mon, Nov 16, 2015 at 12:23:17PM +0100, Ard Biesheuvel wrote:
> >> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> >> index c7ba171951c8..526eeb7e1e97 100644
> >> --- a/arch/arm64/mm/mmu.c
> >> +++ b/arch/arm64/mm/mmu.c
> >> @@ -357,47 +357,10 @@ static void create_mapping_late(phys_addr_t phys, unsigned long virt,
> >>                               phys, virt, size, prot, late_alloc);
> >>  }
> >>
> >> -#ifdef CONFIG_DEBUG_RODATA
> >>  static void __init __map_memblock(phys_addr_t start, phys_addr_t end)
> >>  {
> >> -     /*
> >> -      * Set up the executable regions using the existing section mappings
> >> -      * for now. This will get more fine grained later once all memory
> >> -      * is mapped
> >> -      */
> >> -     unsigned long kernel_x_start = round_down(__pa(_stext), SWAPPER_BLOCK_SIZE);
> >> -     unsigned long kernel_x_end = round_up(__pa(__init_end), SWAPPER_BLOCK_SIZE);
> >> -
> >> -     if (end < kernel_x_start) {
> >> -             create_mapping(start, __phys_to_virt(start),
> >> -                     end - start, PAGE_KERNEL);
> >> -     } else if (start >= kernel_x_end) {
> >> -             create_mapping(start, __phys_to_virt(start),
> >> -                     end - start, PAGE_KERNEL);
> >> -     } else {
> >> -             if (start < kernel_x_start)
> >> -                     create_mapping(start, __phys_to_virt(start),
> >> -                             kernel_x_start - start,
> >> -                             PAGE_KERNEL);
> >> -             create_mapping(kernel_x_start,
> >> -                             __phys_to_virt(kernel_x_start),
> >> -                             kernel_x_end - kernel_x_start,
> >> -                             PAGE_KERNEL_EXEC);
> >> -             if (kernel_x_end < end)
> >> -                     create_mapping(kernel_x_end,
> >> -                             __phys_to_virt(kernel_x_end),
> >> -                             end - kernel_x_end,
> >> -                             PAGE_KERNEL);
> >> -     }
> >> -
> >> -}
> >> -#else
> >> -static void __init __map_memblock(phys_addr_t start, phys_addr_t end)
> >> -{
> >> -     create_mapping(start, __phys_to_virt(start), end - start,
> >> -                     PAGE_KERNEL_EXEC);
> >> +     create_mapping(start, __phys_to_virt(start), end - start, PAGE_KERNEL);
> >>  }
> >> -#endif
> >>
> >>  struct bootstrap_pgtables {
> >>       pte_t   pte[PTRS_PER_PTE];
> >> @@ -471,7 +434,7 @@ static unsigned long __init bootstrap_region(struct bootstrap_pgtables *reg,
> >>                                     SWAPPER_BLOCK_SIZE));
> >>
> >>               create_mapping(__pa(vstart - va_offset), vstart, vend - vstart,
> >> -                            PAGE_KERNEL_EXEC);
> >> +                            PAGE_KERNEL);
> >>
> >>               return vend;
> >>       }
> >
> > These make sense. However, shall we go a step further and unmap the
> > kernel image completely from the linear mapping, maybe based on
> > CONFIG_DEBUG_RODATA? The mark_rodata_ro() function changes the text to
> > read-only but you can still get writable access to it via
> > __va(__pa(_stext)).
> 
> If we can tolerate the fragmentation, then yes, let's unmap it
> completely. As long as we don't unmap the .pgdir section, since that
> will be referenced via the linear mapping.

I think we should do this in the mark_rodata_ro() function *if*
CONFIG_DEBUG_RODATA is enabled; otherwise we leave them as they are
(non-exec linear mapping).

The problem, as before, is potential TLB conflicts that Mark is going to
solve ;).
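
Something like the below, perhaps (untested sketch, reusing the
create_mapping_late() prototype from the diff above; whether the alias
should merely become read-only or be removed outright is a separate
question):

/*
 * In mark_rodata_ro(): additionally demote the linear alias of the
 * kernel text so that __va(__pa(_stext)) is no longer writable.
 */
create_mapping_late(__pa(_stext),
		    (unsigned long)__phys_to_virt(__pa(_stext)),
		    (unsigned long)_etext - (unsigned long)_stext,
		    PAGE_KERNEL_RO);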

-- 
Catalin

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v3 7/7] arm64: allow kernel Image to be loaded anywhere in physical memory
  2015-12-07 15:40     ` Ard Biesheuvel
@ 2015-12-07 16:43       ` Catalin Marinas
  0 siblings, 0 replies; 28+ messages in thread
From: Catalin Marinas @ 2015-12-07 16:43 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Dec 07, 2015 at 04:40:20PM +0100, Ard Biesheuvel wrote:
> On 7 December 2015 at 16:30, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Mon, Nov 16, 2015 at 12:23:18PM +0100, Ard Biesheuvel wrote:
> >> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> >> index b3b0175d7135..29a7dc5327b6 100644
> >> --- a/arch/arm64/mm/init.c
> >> +++ b/arch/arm64/mm/init.c
> >> @@ -158,9 +159,55 @@ static int __init early_mem(char *p)
> >>  }
> >>  early_param("mem", early_mem);
> >>
> >> +static void __init enforce_memory_limit(void)
> >> +{
> >> +     const phys_addr_t kbase = round_down(__pa(_text), MIN_KIMG_ALIGN);
> >> +     u64 to_remove = memblock_phys_mem_size() - memory_limit;
> >> +     phys_addr_t max_addr = 0;
> >> +     struct memblock_region *r;
> >> +
> >> +     if (memory_limit == (phys_addr_t)ULLONG_MAX)
> >> +             return;
> >> +
> >> +     /*
> >> +      * The kernel may be high up in physical memory, so try to apply the
> >> +      * limit below the kernel first, and only let the generic handling
> >> +      * take over if it turns out we haven't clipped enough memory yet.
> >> +      */
> >> +     for_each_memblock(memory, r) {
> >> +             if (r->base + r->size > kbase) {
> >> +                     u64 rem = min(to_remove, kbase - r->base);
> >> +
> >> +                     max_addr = r->base + rem;
> >> +                     to_remove -= rem;
> >> +                     break;
> >> +             }
> >> +             if (to_remove <= r->size) {
> >> +                     max_addr = r->base + to_remove;
> >> +                     to_remove = 0;
> >> +                     break;
> >> +             }
> >> +             to_remove -= r->size;
> >> +     }
> >> +
> >> +     memblock_remove(0, max_addr);
> >
> > I don't fully get the reason for this function. Do you want to keep the
> > kernel around in memblock? How do we guarantee that the call below
> > wouldn't remove it anyway?
> 
> The problem is that the ordinary memblock_enforce_memory_limit()
> removes memory from the top, which means it will happily remove the
> memory that covers your kernel image if it happens to be loaded high
> up in physical memory.

We could fix the memblock_reserve() call on the kernel image, but apart
from that we don't care about memblock's knowledge of the kernel text. A
potential problem is freeing the init memory, which assumes it's present
in the linear mapping, though we could add additional checks here as
well. Is memblock_end_of_DRAM() adjusted to the new maximum address
after memblock_enforce_memory_limit()?

> >> +
> >> +     if (to_remove)
> >> +             memblock_enforce_memory_limit(memory_limit);
> >
> > Shouldn't this be memblock_enforce_memory_limit(to_remove)?
> 
> No, it takes the memory limit as input. 'to_remove + memory_limit'
> will be exactly the remaining memory at this point.

You are right, I thought it's the memory to remove.

-- 
Catalin

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v3 3/7] arm64: split off early mapping code from early_fixmap_init()
  2015-12-03 12:18   ` Mark Rutland
  2015-12-03 13:31     ` Ard Biesheuvel
@ 2015-12-08 12:40     ` Will Deacon
  2015-12-08 13:29       ` Ard Biesheuvel
  1 sibling, 1 reply; 28+ messages in thread
From: Will Deacon @ 2015-12-08 12:40 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Dec 03, 2015 at 12:18:40PM +0000, Mark Rutland wrote:
> Apologies that it's taken me so long to get around to this...
> 
> On Mon, Nov 16, 2015 at 12:23:14PM +0100, Ard Biesheuvel wrote:
> > This splits off and generalises the population of the statically
> > allocated fixmap page tables so that we may reuse it later for
> > the linear mapping once we move the kernel text mapping out of it.
> > 
> > This also involves taking into account that table entries at any of
> > the levels we are populating may have been populated already, since
> > the fixmap mapping might not be disjoint up to the pgd level anymore
> > from other early mappings.
> 
> As a heads-up, for avoiding TLB conflicts, I'm currently working on
> an alternative way of creating the kernel page tables which will definitely
> conflict here, and may or may not supersede this approach.

Given that the Christmas break is around the corner and your TLB series
is probably going to take some time to get right, I suggest we persevere
with Ard's current patch series for 4.5 and merge the TLB conflict solution
for 4.6. I don't want us to end up in a situation where this is needlessly
blocked on something that isn't quite ready.

Any objections? If not, Ard -- can you post a new version of this, please?

Will

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v3 3/7] arm64: split off early mapping code from early_fixmap_init()
  2015-12-08 12:40     ` Will Deacon
@ 2015-12-08 13:29       ` Ard Biesheuvel
  2015-12-08 13:51         ` Will Deacon
  0 siblings, 1 reply; 28+ messages in thread
From: Ard Biesheuvel @ 2015-12-08 13:29 UTC (permalink / raw)
  To: linux-arm-kernel

On 8 December 2015 at 13:40, Will Deacon <will.deacon@arm.com> wrote:
> On Thu, Dec 03, 2015 at 12:18:40PM +0000, Mark Rutland wrote:
>> Apologies that it's taken me so long to get around to this...
>>
>> On Mon, Nov 16, 2015 at 12:23:14PM +0100, Ard Biesheuvel wrote:
>> > This splits off and generalises the population of the statically
>> > allocated fixmap page tables so that we may reuse it later for
>> > the linear mapping once we move the kernel text mapping out of it.
>> >
>> > This also involves taking into account that table entries at any of
>> > the levels we are populating may have been populated already, since
>> > the fixmap mapping might not be disjoint up to the pgd level anymore
>> > from other early mappings.
>>
>> As a heads-up, for avoiding TLB conflicts, I'm currently working on
>> an alternative way of creating the kernel page tables which will definitely
>> conflict here, and may or may not supersede this approach.
>
> Given that the Christmas break is around the corner and your TLB series
> is probably going to take some time to get right, I suggest we persevere
> with Ard's current patch series for 4.5 and merge the TLB conflict solution
> for 4.6. I don't want us to end up in a situation where this is needlessly
> blocked on something that isn't quite ready.
>
> Any objections? If not, Ard -- can you post a new version of this, please?
>

Happy to post a new version, with the following remarks:
- my current private tree has evolved in the meantime, and I am now
putting the kernel image at the base of the vmalloc region (and the
module region right before)
- I think Mark's changes would allow me to deobfuscate the VA bias
that redirects __va() translations into the kernel VA space rather
than the linear mapping

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v3 3/7] arm64: split off early mapping code from early_fixmap_init()
  2015-12-08 13:29       ` Ard Biesheuvel
@ 2015-12-08 13:51         ` Will Deacon
  2015-12-15 19:19           ` Ard Biesheuvel
  0 siblings, 1 reply; 28+ messages in thread
From: Will Deacon @ 2015-12-08 13:51 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Dec 08, 2015 at 02:29:33PM +0100, Ard Biesheuvel wrote:
> On 8 December 2015 at 13:40, Will Deacon <will.deacon@arm.com> wrote:
> > On Thu, Dec 03, 2015 at 12:18:40PM +0000, Mark Rutland wrote:
> >> Apologies that it's taken me so long to get around to this...
> >>
> >> On Mon, Nov 16, 2015 at 12:23:14PM +0100, Ard Biesheuvel wrote:
> >> > This splits off and generalises the population of the statically
> >> > allocated fixmap page tables so that we may reuse it later for
> >> > the linear mapping once we move the kernel text mapping out of it.
> >> >
> >> > This also involves taking into account that table entries at any of
> >> > the levels we are populating may have been populated already, since
> >> > the fixmap mapping might not be disjoint up to the pgd level anymore
> >> > from other early mappings.
> >>
> >> As a heads-up, for avoiding TLB conflicts, I'm currently working on
> >> an alternative way of creating the kernel page tables which will definitely
> >> conflict here, and may or may not supersede this approach.
> >
> > Given that the Christmas break is around the corner and your TLB series
> > is probably going to take some time to get right, I suggest we persevere
> > with Ard's current patch series for 4.5 and merge the TLB conflict solution
> > for 4.6. I don't want us to end up in a situation where this is needlessly
> > blocked on something that isn't quite ready.
> >
> > Any objections? If not, Ard -- can you post a new version of this, please?
> >
> 
> Happy to post a new version, with the following remarks:
> - my current private tree has evolved in the meantime, and I am now
> putting the kernel image at the base of the vmalloc region (and the
> module region right before)
> - I think Mark's changes would allow me to deobfuscate the VA bias
> that redirects __va() translations into the kernel VA space rather
> than the linear mapping

I'll leave that up to you. I'm just trying to avoid you growing a dependency
on something that's unlikely to make it for 4.5. If Mark separates out the
parts you need, perhaps that offers us some middle ground.

Will

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v3 3/7] arm64: split off early mapping code from early_fixmap_init()
  2015-12-08 13:51         ` Will Deacon
@ 2015-12-15 19:19           ` Ard Biesheuvel
  0 siblings, 0 replies; 28+ messages in thread
From: Ard Biesheuvel @ 2015-12-15 19:19 UTC (permalink / raw)
  To: linux-arm-kernel

On 8 December 2015 at 14:51, Will Deacon <will.deacon@arm.com> wrote:
> On Tue, Dec 08, 2015 at 02:29:33PM +0100, Ard Biesheuvel wrote:
>> On 8 December 2015 at 13:40, Will Deacon <will.deacon@arm.com> wrote:
>> > On Thu, Dec 03, 2015 at 12:18:40PM +0000, Mark Rutland wrote:
>> >> Apologies that it's taken me so long to get around to this...
>> >>
>> >> On Mon, Nov 16, 2015 at 12:23:14PM +0100, Ard Biesheuvel wrote:
>> >> > This splits off and generalises the population of the statically
>> >> > allocated fixmap page tables so that we may reuse it later for
>> >> > the linear mapping once we move the kernel text mapping out of it.
>> >> >
>> >> > This also involves taking into account that table entries at any of
>> >> > the levels we are populating may have been populated already, since
>> >> > the fixmap mapping might not be disjoint up to the pgd level anymore
>> >> > from other early mappings.
>> >>
>> >> As a heads-up, for avoiding TLB conflicts, I'm currently working on
>> >> an alternative way of creating the kernel page tables which will definitely
>> >> conflict here, and may or may not supersede this approach.
>> >
>> > Given that the Christmas break is around the corner and your TLB series
>> > is probably going to take some time to get right, I suggest we persevere
>> > with Ard's current patch series for 4.5 and merge the TLB conflict solution
>> > for 4.6. I don't want us to end up in a situation where this is needlessly
>> > blocked on something that isn't quite ready.
>> >
>> > Any objections? If not, Ard -- can you post a new version of this, please?
>> >
>>
>> Happy to post a new version, with the following remarks:
>> - my current private tree has evolved in the meantime, and I am now
>> putting the kernel image at the base of the vmalloc region (and the
>> module region right before)
>> - I think Mark's changes would allow me to deobfuscate the VA bias
>> that redirects __va() translations into the kernel VA space rather
>> than the linear mapping
>
> I'll leave that up to you. I'm just trying to avoid you growing a dependency
> on something that's unlikely to make it for 4.5. If Mark separates out the
> parts you need, perhaps that offers us some middle ground.
>

I have played around with Mark's code a bit, and it looks like a huge
improvement for the split-VA patches as well: I have a patch that
removes early_fixmap_init()'s dependency on the linear mapping, and
combined with Mark's patches to use the fixmap for manipulating the
page tables, it seems I no longer need the VA bias to redirect __va()
translations into the kernel mapping early on.

^ permalink raw reply	[flat|nested] 28+ messages in thread

Thread overview: 28+ messages
2015-11-16 11:23 [PATCH v3 0/7] arm64: relax Image placement rules Ard Biesheuvel
2015-11-16 11:23 ` [PATCH v3 1/7] of/fdt: make memblock minimum physical address arch configurable Ard Biesheuvel
2015-11-16 11:23 ` [PATCH v3 2/7] arm64: use more granular reservations for static page table allocations Ard Biesheuvel
2015-11-16 11:23 ` [PATCH v3 3/7] arm64: split off early mapping code from early_fixmap_init() Ard Biesheuvel
2015-12-03 12:18   ` Mark Rutland
2015-12-03 13:31     ` Ard Biesheuvel
2015-12-03 13:59       ` Mark Rutland
2015-12-03 14:05         ` Ard Biesheuvel
2015-12-07 16:08       ` Catalin Marinas
2015-12-07 16:13         ` Ard Biesheuvel
2015-12-08 12:40     ` Will Deacon
2015-12-08 13:29       ` Ard Biesheuvel
2015-12-08 13:51         ` Will Deacon
2015-12-15 19:19           ` Ard Biesheuvel
2015-11-16 11:23 ` [PATCH v3 4/7] arm64: mm: explicitly bootstrap the linear mapping Ard Biesheuvel
2015-11-16 11:23 ` [PATCH v3 5/7] arm64: move kernel mapping out of linear region Ard Biesheuvel
2015-12-07 12:26   ` Catalin Marinas
2015-12-07 12:33     ` Ard Biesheuvel
2015-12-07 12:34       ` Ard Biesheuvel
2015-12-07 15:37         ` Catalin Marinas
2015-11-16 11:23 ` [PATCH v3 6/7] arm64: map linear region as non-executable Ard Biesheuvel
2015-12-07 16:19   ` Catalin Marinas
2015-12-07 16:22     ` Ard Biesheuvel
2015-12-07 16:27       ` Catalin Marinas
2015-11-16 11:23 ` [PATCH v3 7/7] arm64: allow kernel Image to be loaded anywhere in physical memory Ard Biesheuvel
2015-12-07 15:30   ` Catalin Marinas
2015-12-07 15:40     ` Ard Biesheuvel
2015-12-07 16:43       ` Catalin Marinas
