* [PATCH roundup 0/4] extend VA range of ID map for core kernel and KVM
@ 2015-03-06 14:34 ` Ard Biesheuvel
  0 siblings, 0 replies; 18+ messages in thread
From: Ard Biesheuvel @ 2015-03-06 14:34 UTC (permalink / raw)
  To: marc.zyngier, christoffer.dall, linux, linux-arm-kernel, kvmarm
  Cc: will.deacon, arnd, Ard Biesheuvel

These are the VA range patches presented as a coherent set. The bounce page
removal and the 'HYP init code too big' fix are probably no longer
prerequisites now that I have switched to merging the HYP runtime map with the
HYP ID map rather than with the kernel ID map, but I would strongly prefer to
keep this as a single series.

Ard Biesheuvel (3):
  arm64: mm: increase VA range of identity map
  ARM, arm64: kvm: get rid of the bounce page
  arm64: KVM: use ID map with increased VA range if required

Arnd Bergmann (1):
  ARM: KVM: avoid "HYP init code too big" error

 arch/arm/include/asm/kvm_mmu.h         | 10 +++++
 arch/arm/kernel/vmlinux.lds.S          | 35 ++++++++++++++---
 arch/arm/kvm/init.S                    |  3 ++
 arch/arm/kvm/mmu.c                     | 69 +++++++++++++++-------------------
 arch/arm64/include/asm/kvm_mmu.h       | 33 ++++++++++++++++
 arch/arm64/include/asm/mmu_context.h   | 43 +++++++++++++++++++++
 arch/arm64/include/asm/page.h          |  6 ++-
 arch/arm64/include/asm/pgtable-hwdef.h |  7 +++-
 arch/arm64/kernel/head.S               | 38 +++++++++++++++++++
 arch/arm64/kernel/smp.c                |  1 +
 arch/arm64/kernel/vmlinux.lds.S        | 18 ++++++---
 arch/arm64/kvm/hyp-init.S              | 26 +++++++++++++
 arch/arm64/mm/mmu.c                    |  7 +++-
 arch/arm64/mm/proc-macros.S            | 11 ++++++
 arch/arm64/mm/proc.S                   |  3 ++
 15 files changed, 256 insertions(+), 54 deletions(-)

-- 
1.8.3.2

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH roundup 1/4] arm64: mm: increase VA range of identity map
  2015-03-06 14:34 ` Ard Biesheuvel
@ 2015-03-06 14:34   ` Ard Biesheuvel
  -1 siblings, 0 replies; 18+ messages in thread
From: Ard Biesheuvel @ 2015-03-06 14:34 UTC (permalink / raw)
  To: marc.zyngier, christoffer.dall, linux, linux-arm-kernel, kvmarm
  Cc: will.deacon, arnd, Ard Biesheuvel

The page size and the number of translation levels, and hence the supported
virtual address range, are build-time configurables on arm64 whose optimal
values are use-case dependent. However, in the current implementation, if
the system's RAM is located at a very high offset, the virtual address range
needs to reflect that merely because the identity mapping, which is only used
to enable or disable the MMU, requires the extended virtual range to map the
physical memory at an equal virtual offset.

This patch relaxes that requirement by increasing the number of translation
levels for the identity mapping only, and only when actually needed, i.e.,
when the offset of system RAM is found to be out of reach at runtime.
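
As a rough illustration (not kernel code), the decision made in head.S below
can be modelled in user space as follows; VA_BITS=39 is just an example
configuration and the helper name is made up:

/*
 * Illustrative model of the ID map T0SZ choice: T0SZ defaults to
 * 64 - VA_BITS and is lowered to the number of leading zeroes of the
 * kernel's end PA when that PA does not fit in VA_BITS bits (in which
 * case head.S also configures the extra translation level).
 */
#include <stdint.h>
#include <stdio.h>

#define VA_BITS 39      /* example: 4 KB pages, 3 levels */

static unsigned int idmap_t0sz_for(uint64_t kernel_end_pa)
{
        unsigned int default_t0sz = 64 - VA_BITS;
        unsigned int t0sz = __builtin_clzll(kernel_end_pa); /* like the clz in head.S */

        /* PA reachable with the default VA range: keep the configured T0SZ */
        return t0sz >= default_t0sz ? default_t0sz : t0sz;
}

int main(void)
{
        printf("t0sz=%u\n", idmap_t0sz_for(0x80000000ULL));   /* RAM ends at 2 GB: 25, no extra level */
        printf("t0sz=%u\n", idmap_t0sz_for(0x8000001000ULL)); /* just above 512 GB: 24, extra level */
        return 0;
}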

Tested-by: Laura Abbott <lauraa@codeaurora.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/include/asm/mmu_context.h   | 43 ++++++++++++++++++++++++++++++++++
 arch/arm64/include/asm/page.h          |  6 +++--
 arch/arm64/include/asm/pgtable-hwdef.h |  7 +++++-
 arch/arm64/kernel/head.S               | 38 ++++++++++++++++++++++++++++++
 arch/arm64/kernel/smp.c                |  1 +
 arch/arm64/mm/mmu.c                    |  7 +++++-
 arch/arm64/mm/proc-macros.S            | 11 +++++++++
 arch/arm64/mm/proc.S                   |  3 +++
 8 files changed, 112 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index a9eee33dfa62..ecf2d060036b 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -64,6 +64,49 @@ static inline void cpu_set_reserved_ttbr0(void)
 	: "r" (ttbr));
 }
 
+/*
+ * TCR.T0SZ value to use when the ID map is active. Usually equals
+ * TCR_T0SZ(VA_BITS), unless system RAM is positioned very high in
+ * physical memory, in which case it will be smaller.
+ */
+extern u64 idmap_t0sz;
+
+static inline bool __cpu_uses_extended_idmap(void)
+{
+	return (!IS_ENABLED(CONFIG_ARM64_VA_BITS_48) &&
+		unlikely(idmap_t0sz != TCR_T0SZ(VA_BITS)));
+}
+
+static inline void __cpu_set_tcr_t0sz(u64 t0sz)
+{
+	unsigned long tcr;
+
+	if (__cpu_uses_extended_idmap())
+		asm volatile (
+		"	mrs	%0, tcr_el1	;"
+		"	bfi	%0, %1, %2, %3	;"
+		"	msr	tcr_el1, %0	;"
+		"	isb"
+		: "=&r" (tcr)
+		: "r"(t0sz), "I"(TCR_T0SZ_OFFSET), "I"(TCR_TxSZ_WIDTH));
+}
+
+/*
+ * Set TCR.T0SZ to the value appropriate for activating the identity map.
+ */
+static inline void cpu_set_idmap_tcr_t0sz(void)
+{
+	__cpu_set_tcr_t0sz(idmap_t0sz);
+}
+
+/*
+ * Set TCR.T0SZ to its default value (based on VA_BITS)
+ */
+static inline void cpu_set_default_tcr_t0sz(void)
+{
+	__cpu_set_tcr_t0sz(TCR_T0SZ(VA_BITS));
+}
+
 static inline void switch_new_context(struct mm_struct *mm)
 {
 	unsigned long flags;
diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
index 22b16232bd60..3d02b1869eb8 100644
--- a/arch/arm64/include/asm/page.h
+++ b/arch/arm64/include/asm/page.h
@@ -33,7 +33,9 @@
  * image. Both require pgd, pud (4 levels only) and pmd tables to (section)
  * map the kernel. With the 64K page configuration, swapper and idmap need to
  * map to pte level. The swapper also maps the FDT (see __create_page_tables
- * for more information).
+ * for more information). Note that the number of ID map translation levels
+ * could be increased on the fly if system RAM is out of reach for the default
+ * VA range, so 3 pages are reserved in all cases.
  */
 #ifdef CONFIG_ARM64_64K_PAGES
 #define SWAPPER_PGTABLE_LEVELS	(CONFIG_ARM64_PGTABLE_LEVELS)
@@ -42,7 +44,7 @@
 #endif
 
 #define SWAPPER_DIR_SIZE	(SWAPPER_PGTABLE_LEVELS * PAGE_SIZE)
-#define IDMAP_DIR_SIZE		(SWAPPER_DIR_SIZE)
+#define IDMAP_DIR_SIZE		(3 * PAGE_SIZE)
 
 #ifndef __ASSEMBLY__
 
diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
index 5f930cc9ea83..847e864202cc 100644
--- a/arch/arm64/include/asm/pgtable-hwdef.h
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -143,7 +143,12 @@
 /*
  * TCR flags.
  */
-#define TCR_TxSZ(x)		(((UL(64) - (x)) << 16) | ((UL(64) - (x)) << 0))
+#define TCR_T0SZ_OFFSET		0
+#define TCR_T1SZ_OFFSET		16
+#define TCR_T0SZ(x)		((UL(64) - (x)) << TCR_T0SZ_OFFSET)
+#define TCR_T1SZ(x)		((UL(64) - (x)) << TCR_T1SZ_OFFSET)
+#define TCR_TxSZ(x)		(TCR_T0SZ(x) | TCR_T1SZ(x))
+#define TCR_TxSZ_WIDTH		6
 #define TCR_IRGN_NC		((UL(0) << 8) | (UL(0) << 24))
 #define TCR_IRGN_WBWA		((UL(1) << 8) | (UL(1) << 24))
 #define TCR_IRGN_WT		((UL(2) << 8) | (UL(2) << 24))
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 8ce88e08c030..a3612eadab3c 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -387,6 +387,44 @@ __create_page_tables:
 	mov	x0, x25				// idmap_pg_dir
 	ldr	x3, =KERNEL_START
 	add	x3, x3, x28			// __pa(KERNEL_START)
+
+#ifndef CONFIG_ARM64_VA_BITS_48
+#define EXTRA_SHIFT	(PGDIR_SHIFT + PAGE_SHIFT - 3)
+#define EXTRA_PTRS	(1 << (48 - EXTRA_SHIFT))
+
+	/*
+	 * If VA_BITS < 48, it may be too small to allow for an ID mapping to be
+	 * created that covers system RAM if that is located sufficiently high
+	 * in the physical address space. So for the ID map, use an extended
+	 * virtual range in that case, by configuring an additional translation
+	 * level.
+	 * First, we have to verify our assumption that the current value of
+	 * VA_BITS was chosen such that all translation levels are fully
+	 * utilised, and that lowering T0SZ will always result in an additional
+	 * translation level to be configured.
+	 */
+#if VA_BITS != EXTRA_SHIFT
+#error "Mismatch between VA_BITS and page size/number of translation levels"
+#endif
+
+	/*
+	 * Calculate the maximum allowed value for TCR_EL1.T0SZ so that the
+	 * entire kernel image can be ID mapped. As T0SZ == (64 - #bits used),
+	 * this number conveniently equals the number of leading zeroes in
+	 * the physical address of KERNEL_END.
+	 */
+	adrp	x5, KERNEL_END
+	clz	x5, x5
+	cmp	x5, TCR_T0SZ(VA_BITS)	// default T0SZ small enough?
+	b.ge	1f			// .. then skip additional level
+
+	adrp	x6, idmap_t0sz
+	str	x5, [x6, :lo12:idmap_t0sz]
+
+	create_table_entry x0, x3, EXTRA_SHIFT, EXTRA_PTRS, x5, x6
+1:
+#endif
+
 	create_pgd_entry x0, x3, x5, x6
 	ldr	x6, =KERNEL_END
 	mov	x5, x3				// __pa(KERNEL_START)
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 328b8ce4b007..74554dfcce73 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -151,6 +151,7 @@ asmlinkage void secondary_start_kernel(void)
 	 */
 	cpu_set_reserved_ttbr0();
 	flush_tlb_all();
+	cpu_set_default_tcr_t0sz();
 
 	preempt_disable();
 	trace_hardirqs_off();
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index c6daaf6c6f97..c4f60393383e 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -40,6 +40,8 @@
 
 #include "mm.h"
 
+u64 idmap_t0sz = TCR_T0SZ(VA_BITS);
+
 /*
  * Empty_zero_page is a special page that is used for zero-initialized data
  * and COW.
@@ -454,6 +456,7 @@ void __init paging_init(void)
 	 */
 	cpu_set_reserved_ttbr0();
 	flush_tlb_all();
+	cpu_set_default_tcr_t0sz();
 }
 
 /*
@@ -461,8 +464,10 @@ void __init paging_init(void)
  */
 void setup_mm_for_reboot(void)
 {
-	cpu_switch_mm(idmap_pg_dir, &init_mm);
+	cpu_set_reserved_ttbr0();
 	flush_tlb_all();
+	cpu_set_idmap_tcr_t0sz();
+	cpu_switch_mm(idmap_pg_dir, &init_mm);
 }
 
 /*
diff --git a/arch/arm64/mm/proc-macros.S b/arch/arm64/mm/proc-macros.S
index 005d29e2977d..c17fdd6a19bc 100644
--- a/arch/arm64/mm/proc-macros.S
+++ b/arch/arm64/mm/proc-macros.S
@@ -52,3 +52,14 @@
 	mov	\reg, #4			// bytes per word
 	lsl	\reg, \reg, \tmp		// actual cache line size
 	.endm
+
+/*
+ * tcr_set_idmap_t0sz - update TCR.T0SZ so that we can load the ID map
+ */
+	.macro	tcr_set_idmap_t0sz, valreg, tmpreg
+#ifndef CONFIG_ARM64_VA_BITS_48
+	adrp	\tmpreg, idmap_t0sz
+	ldr	\tmpreg, [\tmpreg, #:lo12:idmap_t0sz]
+	bfi	\valreg, \tmpreg, #TCR_T0SZ_OFFSET, #TCR_TxSZ_WIDTH
+#endif
+	.endm
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 28eebfb6af76..cdd754e19b9b 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -156,6 +156,7 @@ ENTRY(cpu_do_resume)
 	msr	cpacr_el1, x6
 	msr	ttbr0_el1, x1
 	msr	ttbr1_el1, x7
+	tcr_set_idmap_t0sz x8, x7
 	msr	tcr_el1, x8
 	msr	vbar_el1, x9
 	msr	mdscr_el1, x10
@@ -233,6 +234,8 @@ ENTRY(__cpu_setup)
 	 */
 	ldr	x10, =TCR_TxSZ(VA_BITS) | TCR_CACHE_FLAGS | TCR_SMP_FLAGS | \
 			TCR_TG_FLAGS | TCR_ASID16 | TCR_TBI0
+	tcr_set_idmap_t0sz	x10, x9
+
 	/*
 	 * Read the PARange bits from ID_AA64MMFR0_EL1 and set the IPS bits in
 	 * TCR_EL1.
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH roundup 2/4] ARM: KVM: avoid "HYP init code too big" error
  2015-03-06 14:34 ` Ard Biesheuvel
@ 2015-03-06 14:34   ` Ard Biesheuvel
  -1 siblings, 0 replies; 18+ messages in thread
From: Ard Biesheuvel @ 2015-03-06 14:34 UTC (permalink / raw)
  To: marc.zyngier, christoffer.dall, linux, linux-arm-kernel, kvmarm
  Cc: will.deacon, arnd, Ard Biesheuvel

From: Arnd Bergmann <arnd@arndb.de>

When building large kernels, the linker will emit lots of veneers
into the .hyp.idmap.text section, which causes it to grow beyond
one page, and that triggers the build error.

This moves the section into .rodata instead, which avoids the
veneers and is safe because the code is not executed directly
but remapped by the hypervisor into its own executable address
space.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
[ardb: move the ALIGN() to .rodata as well, update log s/copied/remapped/]
Tested-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm/kernel/vmlinux.lds.S | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index b31aa73e8076..2787eb8d3616 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -22,11 +22,15 @@
 	ALIGN_FUNCTION();						\
 	VMLINUX_SYMBOL(__idmap_text_start) = .;				\
 	*(.idmap.text)							\
-	VMLINUX_SYMBOL(__idmap_text_end) = .;				\
+	VMLINUX_SYMBOL(__idmap_text_end) = .;
+
+#define IDMAP_RODATA							\
+	.rodata : {							\
 	. = ALIGN(32);							\
 	VMLINUX_SYMBOL(__hyp_idmap_text_start) = .;			\
 	*(.hyp.idmap.text)						\
-	VMLINUX_SYMBOL(__hyp_idmap_text_end) = .;
+	VMLINUX_SYMBOL(__hyp_idmap_text_end) = .;			\
+	}
 
 #ifdef CONFIG_HOTPLUG_CPU
 #define ARM_CPU_DISCARD(x)
@@ -124,6 +128,7 @@ SECTIONS
 	. = ALIGN(1<<SECTION_SHIFT);
 #endif
 	RO_DATA(PAGE_SIZE)
+	IDMAP_RODATA
 
 	. = ALIGN(4);
 	__ex_table : AT(ADDR(__ex_table) - LOAD_OFFSET) {
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH roundup 3/4] ARM, arm64: kvm: get rid of the bounce page
  2015-03-06 14:34 ` Ard Biesheuvel
@ 2015-03-06 14:34   ` Ard Biesheuvel
  -1 siblings, 0 replies; 18+ messages in thread
From: Ard Biesheuvel @ 2015-03-06 14:34 UTC (permalink / raw)
  To: marc.zyngier, christoffer.dall, linux, linux-arm-kernel, kvmarm
  Cc: will.deacon, arnd, Ard Biesheuvel

The HYP init bounce page is a runtime construct that ensures that the
HYP init code does not cross a page boundary. However, this is something
we can do perfectly well at build time, by aligning the code appropriately.

For arm64, we just align to 4 KB, and enforce that the code size is less
than 4 KB, regardless of the chosen page size.

For ARM, the whole code is less than 256 bytes, so we tweak the linker
script to align it to a power-of-2 upper bound of the code size.

Note that this also fixes a benign off-by-one error in the original bounce
page code, where a bounce page would be allocated unnecessarily if the code
was exactly 1 page in size.
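
For reference, a quick user-space sketch of the boundary test introduced in
the linker scripts and in kvm_mmu_init() below ("end" is exclusive, so the
last byte of the range is end - 1); the addresses are made-up examples:

#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE       4096UL
#define PAGE_MASK       (~(PAGE_SIZE - 1))

static int crosses_page(uint64_t start, uint64_t end)
{
        return ((start ^ (end - 1)) & PAGE_MASK) != 0;
}

int main(void)
{
        /*
         * Exactly one page, page aligned: does not cross a boundary, but the
         * old (start ^ end) & PAGE_MASK test would have claimed it does --
         * that is the benign off-by-one mentioned above.
         */
        assert(!crosses_page(0x1000, 0x2000));
        /* two bytes straddling a page boundary: does cross */
        assert(crosses_page(0x1fff, 0x2001));
        return 0;
}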

Tested-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm/kernel/vmlinux.lds.S   | 26 ++++++++++++++++++++++---
 arch/arm/kvm/init.S             |  3 +++
 arch/arm/kvm/mmu.c              | 42 +++++------------------------------------
 arch/arm64/kernel/vmlinux.lds.S | 18 ++++++++++++------
 4 files changed, 43 insertions(+), 46 deletions(-)

diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index 2787eb8d3616..85db1669bfe3 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -26,12 +26,28 @@
 
 #define IDMAP_RODATA							\
 	.rodata : {							\
-	. = ALIGN(32);							\
+	. = ALIGN(HYP_IDMAP_ALIGN);					\
 	VMLINUX_SYMBOL(__hyp_idmap_text_start) = .;			\
 	*(.hyp.idmap.text)						\
 	VMLINUX_SYMBOL(__hyp_idmap_text_end) = .;			\
 	}
 
+/*
+ * If the HYP idmap .text section is populated, it needs to be positioned
+ * such that it will not cross a page boundary in the final output image.
+ * So align it to the section size rounded up to the next power of 2.
+ * If __hyp_idmap_size is undefined, the section will be empty so define
+ * it as 0 in that case.
+ */
+PROVIDE(__hyp_idmap_size = 0);
+
+#define HYP_IDMAP_ALIGN							\
+	__hyp_idmap_size == 0 ? 0 :					\
+	__hyp_idmap_size <= 0x100 ? 0x100 :				\
+	__hyp_idmap_size <= 0x200 ? 0x200 :				\
+	__hyp_idmap_size <= 0x400 ? 0x400 :				\
+	__hyp_idmap_size <= 0x800 ? 0x800 : 0x1000
+
 #ifdef CONFIG_HOTPLUG_CPU
 #define ARM_CPU_DISCARD(x)
 #define ARM_CPU_KEEP(x)		x
@@ -351,8 +367,12 @@ SECTIONS
  */
 ASSERT((__proc_info_end - __proc_info_begin), "missing CPU support")
 ASSERT((__arch_info_end - __arch_info_begin), "no machine record defined")
+
 /*
- * The HYP init code can't be more than a page long.
+ * The HYP init code can't be more than a page long,
+ * and should not cross a page boundary.
  * The above comment applies as well.
  */
-ASSERT(((__hyp_idmap_text_end - __hyp_idmap_text_start) <= PAGE_SIZE), "HYP init code too big")
+ASSERT(((__hyp_idmap_text_end - 1) & PAGE_MASK) -
+	(__hyp_idmap_text_start & PAGE_MASK) == 0,
+	"HYP init code too big or unaligned")
diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
index 3988e72d16ff..11fb1d56f449 100644
--- a/arch/arm/kvm/init.S
+++ b/arch/arm/kvm/init.S
@@ -157,3 +157,6 @@ target:	@ We're now in the trampoline code, switch page tables
 __kvm_hyp_init_end:
 
 	.popsection
+
+	.global	__hyp_idmap_size
+	.set	__hyp_idmap_size, __kvm_hyp_init_end - __kvm_hyp_init
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 3e6859bc3e11..42a24d6b003b 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -37,7 +37,6 @@ static pgd_t *boot_hyp_pgd;
 static pgd_t *hyp_pgd;
 static DEFINE_MUTEX(kvm_hyp_pgd_mutex);
 
-static void *init_bounce_page;
 static unsigned long hyp_idmap_start;
 static unsigned long hyp_idmap_end;
 static phys_addr_t hyp_idmap_vector;
@@ -405,9 +404,6 @@ void free_boot_hyp_pgd(void)
 	if (hyp_pgd)
 		unmap_range(NULL, hyp_pgd, TRAMPOLINE_VA, PAGE_SIZE);
 
-	free_page((unsigned long)init_bounce_page);
-	init_bounce_page = NULL;
-
 	mutex_unlock(&kvm_hyp_pgd_mutex);
 }
 
@@ -1498,39 +1494,11 @@ int kvm_mmu_init(void)
 	hyp_idmap_end = kvm_virt_to_phys(__hyp_idmap_text_end);
 	hyp_idmap_vector = kvm_virt_to_phys(__kvm_hyp_init);
 
-	if ((hyp_idmap_start ^ hyp_idmap_end) & PAGE_MASK) {
-		/*
-		 * Our init code is crossing a page boundary. Allocate
-		 * a bounce page, copy the code over and use that.
-		 */
-		size_t len = __hyp_idmap_text_end - __hyp_idmap_text_start;
-		phys_addr_t phys_base;
-
-		init_bounce_page = (void *)__get_free_page(GFP_KERNEL);
-		if (!init_bounce_page) {
-			kvm_err("Couldn't allocate HYP init bounce page\n");
-			err = -ENOMEM;
-			goto out;
-		}
-
-		memcpy(init_bounce_page, __hyp_idmap_text_start, len);
-		/*
-		 * Warning: the code we just copied to the bounce page
-		 * must be flushed to the point of coherency.
-		 * Otherwise, the data may be sitting in L2, and HYP
-		 * mode won't be able to observe it as it runs with
-		 * caches off at that point.
-		 */
-		kvm_flush_dcache_to_poc(init_bounce_page, len);
-
-		phys_base = kvm_virt_to_phys(init_bounce_page);
-		hyp_idmap_vector += phys_base - hyp_idmap_start;
-		hyp_idmap_start = phys_base;
-		hyp_idmap_end = phys_base + len;
-
-		kvm_info("Using HYP init bounce page @%lx\n",
-			 (unsigned long)phys_base);
-	}
+	/*
+	 * We rely on the linker script to ensure at build time that the HYP
+	 * init code does not cross a page boundary.
+	 */
+	BUG_ON((hyp_idmap_start ^ (hyp_idmap_end - 1)) & PAGE_MASK);
 
 	hyp_pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, hyp_pgd_order);
 	boot_hyp_pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, hyp_pgd_order);
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 5d9d2dca530d..9e447f983fae 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -23,10 +23,14 @@ jiffies = jiffies_64;
 
 #define HYPERVISOR_TEXT					\
 	/*						\
-	 * Force the alignment to be compatible with	\
-	 * the vectors requirements			\
+	 * Align to 4 KB so that			\
+	 * a) the HYP vector table is at its minimum	\
+	 *    alignment of 2048 bytes			\
+	 * b) the HYP init code will not cross a page	\
+	 *    boundary if its size does not exceed	\
+	 *    4 KB (see related ASSERT() below)		\
 	 */						\
-	. = ALIGN(2048);				\
+	. = ALIGN(SZ_4K);				\
 	VMLINUX_SYMBOL(__hyp_idmap_text_start) = .;	\
 	*(.hyp.idmap.text)				\
 	VMLINUX_SYMBOL(__hyp_idmap_text_end) = .;	\
@@ -163,10 +167,12 @@ SECTIONS
 }
 
 /*
- * The HYP init code can't be more than a page long.
+ * The HYP init code can't be more than a page long,
+ * and should not cross a page boundary.
  */
-ASSERT(((__hyp_idmap_text_start + PAGE_SIZE) > __hyp_idmap_text_end),
-       "HYP init code too big")
+ASSERT(((__hyp_idmap_text_end - 1) & ~(SZ_4K - 1)) -
+	(__hyp_idmap_text_start & ~(SZ_4K - 1)) == 0,
+	"HYP init code too big or unaligned")
 
 /*
  * If padding is applied before .head.text, virt<->phys conversions will fail.
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH roundup 4/4] arm64: KVM: use ID map with increased VA range if required
  2015-03-06 14:34 ` Ard Biesheuvel
@ 2015-03-06 14:34   ` Ard Biesheuvel
  -1 siblings, 0 replies; 18+ messages in thread
From: Ard Biesheuvel @ 2015-03-06 14:34 UTC (permalink / raw)
  To: marc.zyngier, christoffer.dall, linux, linux-arm-kernel, kvmarm
  Cc: will.deacon, arnd, Ard Biesheuvel

This patch modifies the HYP init code so it can deal with system
RAM residing at an offset which exceeds the reach of VA_BITS.

Like for EL1, this involves configuring an additional level of
translation for the ID map. However, in the case of EL2, this implies
that all translations use the extra level, as we cannot seamlessly
switch between translation tables with different numbers of
translation levels.

So add an extra translation table at the root level. Since the
ID map and the runtime HYP map are guaranteed not to overlap, they
can share this root level, and we can essentially merge these two
tables into one.
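
To illustrate the sharing (rough sketch, not kernel code; VA_BITS=39 and the
address below are made-up examples): each entry of the extra root level covers
1 << VA_BITS bytes, the runtime HYP mappings live in the low VA range covered
by entry 0, and the ID map of the init code sits at its high physical address,
so the two select different entries whenever the extended ID map is in use:

#include <stdint.h>
#include <stdio.h>

#define VA_BITS 39

int main(void)
{
        uint64_t hyp_idmap_start = 0x8000200000ULL;  /* example PA above 1 << VA_BITS */

        unsigned int runtime_idx = 0;                           /* runtime HYP map (low VAs) */
        unsigned int idmap_idx = hyp_idmap_start >> VA_BITS;    /* ID map of the init code */

        printf("runtime map -> entry %u, ID map -> entry %u\n",
               runtime_idx, idmap_idx);
        return 0;
}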

Tested-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm/include/asm/kvm_mmu.h   | 10 ++++++++++
 arch/arm/kvm/mmu.c               | 27 +++++++++++++++++++++++++--
 arch/arm64/include/asm/kvm_mmu.h | 33 +++++++++++++++++++++++++++++++++
 arch/arm64/kvm/hyp-init.S        | 26 ++++++++++++++++++++++++++
 4 files changed, 94 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 37ca2a4c6f09..617a30d00c1d 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -270,6 +270,16 @@ static inline void __kvm_flush_dcache_pud(pud_t pud)
 void kvm_set_way_flush(struct kvm_vcpu *vcpu);
 void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
 
+static inline bool __kvm_cpu_uses_extended_idmap(void)
+{
+	return false;
+}
+
+static inline void __kvm_extend_hypmap(pgd_t *boot_hyp_pgd,
+				       pgd_t *hyp_pgd,
+				       pgd_t *merged_hyp_pgd,
+				       unsigned long hyp_idmap_start) { }
+
 #endif	/* !__ASSEMBLY__ */
 
 #endif /* __ARM_KVM_MMU_H__ */
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 42a24d6b003b..69c2b4ce6160 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -35,6 +35,7 @@ extern char  __hyp_idmap_text_start[], __hyp_idmap_text_end[];
 
 static pgd_t *boot_hyp_pgd;
 static pgd_t *hyp_pgd;
+static pgd_t *merged_hyp_pgd;
 static DEFINE_MUTEX(kvm_hyp_pgd_mutex);
 
 static unsigned long hyp_idmap_start;
@@ -434,6 +435,11 @@ void free_hyp_pgds(void)
 		free_pages((unsigned long)hyp_pgd, hyp_pgd_order);
 		hyp_pgd = NULL;
 	}
+	if (merged_hyp_pgd) {
+		clear_page(merged_hyp_pgd);
+		free_page((unsigned long)merged_hyp_pgd);
+		merged_hyp_pgd = NULL;
+	}
 
 	mutex_unlock(&kvm_hyp_pgd_mutex);
 }
@@ -1473,12 +1479,18 @@ void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu)
 
 phys_addr_t kvm_mmu_get_httbr(void)
 {
-	return virt_to_phys(hyp_pgd);
+	if (__kvm_cpu_uses_extended_idmap())
+		return virt_to_phys(merged_hyp_pgd);
+	else
+		return virt_to_phys(hyp_pgd);
 }
 
 phys_addr_t kvm_mmu_get_boot_httbr(void)
 {
-	return virt_to_phys(boot_hyp_pgd);
+	if (__kvm_cpu_uses_extended_idmap())
+		return virt_to_phys(merged_hyp_pgd);
+	else
+		return virt_to_phys(boot_hyp_pgd);
 }
 
 phys_addr_t kvm_get_idmap_vector(void)
@@ -1521,6 +1533,17 @@ int kvm_mmu_init(void)
 		goto out;
 	}
 
+	if (__kvm_cpu_uses_extended_idmap()) {
+		merged_hyp_pgd = (pgd_t *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+		if (!merged_hyp_pgd) {
+			kvm_err("Failed to allocate extra HYP pgd\n");
+			goto out;
+		}
+		__kvm_extend_hypmap(boot_hyp_pgd, hyp_pgd, merged_hyp_pgd,
+				    hyp_idmap_start);
+		return 0;
+	}
+
 	/* Map the very same page at the trampoline VA */
 	err = 	__create_hyp_mappings(boot_hyp_pgd,
 				      TRAMPOLINE_VA, TRAMPOLINE_VA + PAGE_SIZE,
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 6458b5373142..edfe6864bc28 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -68,6 +68,8 @@
 #include <asm/pgalloc.h>
 #include <asm/cachetype.h>
 #include <asm/cacheflush.h>
+#include <asm/mmu_context.h>
+#include <asm/pgtable.h>
 
 #define KERN_TO_HYP(kva)	((unsigned long)kva - PAGE_OFFSET + HYP_PAGE_OFFSET)
 
@@ -305,5 +307,36 @@ static inline void __kvm_flush_dcache_pud(pud_t pud)
 void kvm_set_way_flush(struct kvm_vcpu *vcpu);
 void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
 
+static inline bool __kvm_cpu_uses_extended_idmap(void)
+{
+	return __cpu_uses_extended_idmap();
+}
+
+static inline void __kvm_extend_hypmap(pgd_t *boot_hyp_pgd,
+				       pgd_t *hyp_pgd,
+				       pgd_t *merged_hyp_pgd,
+				       unsigned long hyp_idmap_start)
+{
+	int idmap_idx;
+
+	/*
+	 * Use the first entry to access the HYP mappings. It is
+	 * guaranteed to be free, otherwise we wouldn't use an
+	 * extended idmap.
+	 */
+	VM_BUG_ON(pgd_val(merged_hyp_pgd[0]));
+	merged_hyp_pgd[0] = __pgd(__pa(hyp_pgd) | PMD_TYPE_TABLE);
+
+	/*
+	 * Create another extended level entry that points to the boot HYP map,
+	 * which contains an ID mapping of the HYP init code. We essentially
+	 * merge the boot and runtime HYP maps by doing so, but they don't
+	 * overlap anyway, so this is fine.
+	 */
+	idmap_idx = hyp_idmap_start >> VA_BITS;
+	VM_BUG_ON(pgd_val(merged_hyp_pgd[idmap_idx]));
+	merged_hyp_pgd[idmap_idx] = __pgd(__pa(boot_hyp_pgd) | PMD_TYPE_TABLE);
+}
+
 #endif /* __ASSEMBLY__ */
 #endif /* __ARM64_KVM_MMU_H__ */
diff --git a/arch/arm64/kvm/hyp-init.S b/arch/arm64/kvm/hyp-init.S
index c3191168a994..e7e75bea3fa6 100644
--- a/arch/arm64/kvm/hyp-init.S
+++ b/arch/arm64/kvm/hyp-init.S
@@ -20,6 +20,7 @@
 #include <asm/assembler.h>
 #include <asm/kvm_arm.h>
 #include <asm/kvm_mmu.h>
+#include <asm/pgtable-hwdef.h>
 
 	.text
 	.pushsection	.hyp.idmap.text, "ax"
@@ -65,6 +66,26 @@ __do_hyp_init:
 	and	x4, x4, x5
 	ldr	x5, =TCR_EL2_FLAGS
 	orr	x4, x4, x5
+
+#ifndef CONFIG_ARM64_VA_BITS_48
+	/*
+	 * If we are running with VA_BITS < 48, we may be running with an extra
+	 * level of translation in the ID map. This is only the case if system
+	 * RAM is out of range for the currently configured page size and number
+	 * of translation levels, in which case we will also need the extra
+	 * level for the HYP ID map, or we won't be able to enable the EL2 MMU.
+	 *
+	 * However, at EL2, there is only one TTBR register, and we can't switch
+	 * between translation tables *and* update TCR_EL2.T0SZ at the same
+	 * time. Bottom line: we need the extra level in *both* our translation
+	 * tables.
+	 *
+	 * So use the same T0SZ value we use for the ID map.
+	 */
+	adrp	x5, idmap_t0sz			// get ID map TCR.T0SZ
+	ldr	x5, [x5, :lo12:idmap_t0sz]
+	bfi	x4, x5, TCR_T0SZ_OFFSET, TCR_TxSZ_WIDTH
+#endif
 	msr	tcr_el2, x4
 
 	ldr	x4, =VTCR_EL2_FLAGS
@@ -91,6 +112,10 @@ __do_hyp_init:
 	msr	sctlr_el2, x4
 	isb
 
+	/* Skip the trampoline dance if we merged the boot and runtime PGDs */
+	cmp	x0, x1
+	b.eq	merged
+
 	/* MMU is now enabled. Get ready for the trampoline dance */
 	ldr	x4, =TRAMPOLINE_VA
 	adr	x5, target
@@ -105,6 +130,7 @@ target: /* We're now in the trampoline code, switch page tables */
 	tlbi	alle2
 	dsb	sy
 
+merged:
 	/* Set the stack and new vectors */
 	kern_hyp_va	x2
 	mov	sp, x2
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH roundup 2/4] ARM: KVM: avoid "HYP init code too big" error
  2015-03-06 14:34   ` Ard Biesheuvel
@ 2015-03-09 19:09     ` Russell King - ARM Linux
  -1 siblings, 0 replies; 18+ messages in thread
From: Russell King - ARM Linux @ 2015-03-09 19:09 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: arnd, marc.zyngier, will.deacon, kvmarm, linux-arm-kernel

On Fri, Mar 06, 2015 at 03:34:40PM +0100, Ard Biesheuvel wrote:
> From: Arnd Bergmann <arnd@arndb.de>
> 
> When building large kernels, the linker will emit lots of veneers
> into the .hyp.idmap.text section, which causes it to grow beyond
> one page, and that triggers the build error.
> 
> This moves the section into .rodata instead, which avoids the
> veneers and is safe because the code is not executed directly
> but remapped by the hypervisor into its own executable address
> space.
> 
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> [ardb: move the ALIGN() to .rodata as well, update log s/copied/remapped/]
> Tested-by: Marc Zyngier <marc.zyngier@arm.com>
> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
>  arch/arm/kernel/vmlinux.lds.S | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
> index b31aa73e8076..2787eb8d3616 100644
> --- a/arch/arm/kernel/vmlinux.lds.S
> +++ b/arch/arm/kernel/vmlinux.lds.S
> @@ -22,11 +22,15 @@
>  	ALIGN_FUNCTION();						\
>  	VMLINUX_SYMBOL(__idmap_text_start) = .;				\
>  	*(.idmap.text)							\
> -	VMLINUX_SYMBOL(__idmap_text_end) = .;				\
> +	VMLINUX_SYMBOL(__idmap_text_end) = .;
> +
> +#define IDMAP_RODATA							\
> +	.rodata : {							\

We already have a .rodata section defined by RO_DATA().  Quite how this
interacts with the existing .rodata section, I don't know, but it
probably won't be right.  Have you checked what effect this has?

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH roundup 2/4] ARM: KVM: avoid "HYP init code too big" error
  2015-03-09 19:09     ` Russell King - ARM Linux
@ 2015-03-10  9:56       ` Ard Biesheuvel
  -1 siblings, 0 replies; 18+ messages in thread
From: Ard Biesheuvel @ 2015-03-10  9:56 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Arnd Bergmann, Marc Zyngier, Will Deacon, kvmarm, linux-arm-kernel

(resend with complete cc)

On 9 March 2015 at 20:09, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Fri, Mar 06, 2015 at 03:34:40PM +0100, Ard Biesheuvel wrote:
>> From: Arnd Bergmann <arnd@arndb.de>
>>
>> When building large kernels, the linker will emit lots of veneers
>> into the .hyp.idmap.text section, which causes it to grow beyond
>> one page, and that triggers the build error.
>>
>> This moves the section into .rodata instead, which avoids the
>> veneers and is safe because the code is not executed directly
>> but remapped by the hypervisor into its own executable address
>> space.
>>
>> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
>> [ardb: move the ALIGN() to .rodata as well, update log s/copied/remapped/]
>> Tested-by: Marc Zyngier <marc.zyngier@arm.com>
>> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> ---
>>  arch/arm/kernel/vmlinux.lds.S | 9 +++++++--
>>  1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
>> index b31aa73e8076..2787eb8d3616 100644
>> --- a/arch/arm/kernel/vmlinux.lds.S
>> +++ b/arch/arm/kernel/vmlinux.lds.S
>> @@ -22,11 +22,15 @@
>>       ALIGN_FUNCTION();                                               \
>>       VMLINUX_SYMBOL(__idmap_text_start) = .;                         \
>>       *(.idmap.text)                                                  \
>> -     VMLINUX_SYMBOL(__idmap_text_end) = .;                           \
>> +     VMLINUX_SYMBOL(__idmap_text_end) = .;
>> +
>> +#define IDMAP_RODATA                                                 \
>> +     .rodata : {                                                     \
>
> We already have a .rodata section defined by RO_DATA().  Quite how this
> interacts with the existing .rodata section, I don't know, but it
> probably won't be right.  Have you checked what effect this has?
>

Here are just the .rodata lines from 'readelf -S vmlinux', with and
without the patch applied:

[ 4] .rodata           PROGBITS        c0752000 552000 310620 00   A  0   0 64

[ 4] .rodata           PROGBITS        c0752000 552000 3106c0 00  AX  0   0 64

There is still only a single .rodata section, so it appears binutils is quite happy with this.

If the A -> AX bothers you, we could fold in the following hunk as well.

--- a/arch/arm/kvm/init.S
+++ b/arch/arm/kvm/init.S
@@ -52,7 +52,7 @@
  */

        .text
-       .pushsection    .hyp.idmap.text,"ax"
+       .pushsection    .hyp.idmap.text,"a"
        .align 5
 __kvm_hyp_init:
        .globl __kvm_hyp_init

which takes care of that.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH roundup 1/4] arm64: mm: increase VA range of identity map
  2015-03-06 14:34   ` Ard Biesheuvel
@ 2015-03-16 14:28     ` Christoffer Dall
  -1 siblings, 0 replies; 18+ messages in thread
From: Christoffer Dall @ 2015-03-16 14:28 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux, arnd, marc.zyngier, will.deacon, kvmarm, linux-arm-kernel

On Fri, Mar 06, 2015 at 03:34:39PM +0100, Ard Biesheuvel wrote:
> The page size and the number of translation levels, and hence the supported
> virtual address range, are build-time configurables on arm64 whose optimal
> values are use case dependent. However, in the current implementation, if
> the system's RAM is located at a very high offset, the virtual address range
> needs to reflect that merely because the identity mapping, which is only used
> to enable or disable the MMU, requires the extended virtual range to map the
> physical memory at an equal virtual offset.
> 
> This patch relaxes that requirement, by increasing the number of translation
> levels for the identity mapping only, and only when actually needed, i.e.,
> when system RAM's offset is found to be out of reach at runtime.
> 
> Tested-by: Laura Abbott <lauraa@codeaurora.org>
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> Tested-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
>  arch/arm64/include/asm/mmu_context.h   | 43 ++++++++++++++++++++++++++++++++++
>  arch/arm64/include/asm/page.h          |  6 +++--
>  arch/arm64/include/asm/pgtable-hwdef.h |  7 +++++-
>  arch/arm64/kernel/head.S               | 38 ++++++++++++++++++++++++++++++
>  arch/arm64/kernel/smp.c                |  1 +
>  arch/arm64/mm/mmu.c                    |  7 +++++-
>  arch/arm64/mm/proc-macros.S            | 11 +++++++++
>  arch/arm64/mm/proc.S                   |  3 +++
>  8 files changed, 112 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
> index a9eee33dfa62..ecf2d060036b 100644
> --- a/arch/arm64/include/asm/mmu_context.h
> +++ b/arch/arm64/include/asm/mmu_context.h
> @@ -64,6 +64,49 @@ static inline void cpu_set_reserved_ttbr0(void)
>  	: "r" (ttbr));
>  }
>  
> +/*
> + * TCR.T0SZ value to use when the ID map is active. Usually equals
> + * TCR_T0SZ(VA_BITS), unless system RAM is positioned very high in
> + * physical memory, in which case it will be smaller.
> + */
> +extern u64 idmap_t0sz;
> +
> +static inline bool __cpu_uses_extended_idmap(void)
> +{
> +	return (!IS_ENABLED(CONFIG_ARM64_VA_BITS_48) &&
> +		unlikely(idmap_t0sz != TCR_T0SZ(VA_BITS)));
> +}
> +
> +static inline void __cpu_set_tcr_t0sz(u64 t0sz)
> +{
> +	unsigned long tcr;
> +
> +	if (__cpu_uses_extended_idmap())
> +		asm volatile (
> +		"	mrs	%0, tcr_el1	;"
> +		"	bfi	%0, %1, %2, %3	;"
> +		"	msr	tcr_el1, %0	;"
> +		"	isb"
> +		: "=&r" (tcr)
> +		: "r"(t0sz), "I"(TCR_T0SZ_OFFSET), "I"(TCR_TxSZ_WIDTH));
> +}
> +
> +/*
> + * Set TCR.T0SZ to the value appropriate for activating the identity map.
> + */
> +static inline void cpu_set_idmap_tcr_t0sz(void)
> +{
> +	__cpu_set_tcr_t0sz(idmap_t0sz);
> +}
> +
> +/*
> + * Set TCR.T0SZ to its default value (based on VA_BITS)
> + */
> +static inline void cpu_set_default_tcr_t0sz(void)
> +{
> +	__cpu_set_tcr_t0sz(TCR_T0SZ(VA_BITS));
> +}
> +
>  static inline void switch_new_context(struct mm_struct *mm)
>  {
>  	unsigned long flags;
> diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
> index 22b16232bd60..3d02b1869eb8 100644
> --- a/arch/arm64/include/asm/page.h
> +++ b/arch/arm64/include/asm/page.h
> @@ -33,7 +33,9 @@
>   * image. Both require pgd, pud (4 levels only) and pmd tables to (section)
>   * map the kernel. With the 64K page configuration, swapper and idmap need to
>   * map to pte level. The swapper also maps the FDT (see __create_page_tables
> - * for more information).
> + * for more information). Note that the number of ID map translation levels
> + * could be increased on the fly if system RAM is out of reach for the default
> + * VA range, so 3 pages are reserved in all cases.
>   */
>  #ifdef CONFIG_ARM64_64K_PAGES
>  #define SWAPPER_PGTABLE_LEVELS	(CONFIG_ARM64_PGTABLE_LEVELS)
> @@ -42,7 +44,7 @@
>  #endif
>  
>  #define SWAPPER_DIR_SIZE	(SWAPPER_PGTABLE_LEVELS * PAGE_SIZE)
> -#define IDMAP_DIR_SIZE		(SWAPPER_DIR_SIZE)
> +#define IDMAP_DIR_SIZE		(3 * PAGE_SIZE)
>  
>  #ifndef __ASSEMBLY__
>  
> diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
> index 5f930cc9ea83..847e864202cc 100644
> --- a/arch/arm64/include/asm/pgtable-hwdef.h
> +++ b/arch/arm64/include/asm/pgtable-hwdef.h
> @@ -143,7 +143,12 @@
>  /*
>   * TCR flags.
>   */
> -#define TCR_TxSZ(x)		(((UL(64) - (x)) << 16) | ((UL(64) - (x)) << 0))
> +#define TCR_T0SZ_OFFSET		0
> +#define TCR_T1SZ_OFFSET		16
> +#define TCR_T0SZ(x)		((UL(64) - (x)) << TCR_T0SZ_OFFSET)
> +#define TCR_T1SZ(x)		((UL(64) - (x)) << TCR_T1SZ_OFFSET)
> +#define TCR_TxSZ(x)		(TCR_T0SZ(x) | TCR_T1SZ(x))
> +#define TCR_TxSZ_WIDTH		6
>  #define TCR_IRGN_NC		((UL(0) << 8) | (UL(0) << 24))
>  #define TCR_IRGN_WBWA		((UL(1) << 8) | (UL(1) << 24))
>  #define TCR_IRGN_WT		((UL(2) << 8) | (UL(2) << 24))
> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index 8ce88e08c030..a3612eadab3c 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -387,6 +387,44 @@ __create_page_tables:
>  	mov	x0, x25				// idmap_pg_dir
>  	ldr	x3, =KERNEL_START
>  	add	x3, x3, x28			// __pa(KERNEL_START)
> +
> +#ifndef CONFIG_ARM64_VA_BITS_48
> +#define EXTRA_SHIFT	(PGDIR_SHIFT + PAGE_SHIFT - 3)
> +#define EXTRA_PTRS	(1 << (48 - EXTRA_SHIFT))

How does this math work exactly?

I also had to look at the create_pgd_entry macros to understand that these
mean the shift for the 'extra' pgtable, and not the extra amount of
shifts compared to PGDIR_SHIFT.  Not sure if that warrants a comment?


> +
> +	/*
> +	 * If VA_BITS < 48, it may be too small to allow for an ID mapping to be
> +	 * created that covers system RAM if that is located sufficiently high
> +	 * in the physical address space. So for the ID map, use an extended
> +	 * virtual range in that case, by configuring an additional translation
> +	 * level.
> +	 * First, we have to verify our assumption that the current value of
> +	 * VA_BITS was chosen such that all translation levels are fully
> +	 * utilised, and that lowering T0SZ will always result in an additional
> +	 * translation level to be configured.
> +	 */
> +#if VA_BITS != EXTRA_SHIFT
> +#error "Mismatch between VA_BITS and page size/number of translation levels"
> +#endif
> +
> +	/*
> +	 * Calculate the maximum allowed value for TCR_EL1.T0SZ so that the
> +	 * entire kernel image can be ID mapped. As T0SZ == (64 - #bits used),
> +	 * this number conveniently equals the number of leading zeroes in
> +	 * the physical address of KERNEL_END.
> +	 */
> +	adrp	x5, KERNEL_END
> +	clz	x5, x5
> +	cmp	x5, TCR_T0SZ(VA_BITS)	// default T0SZ small enough?
> +	b.ge	1f			// .. then skip additional level
> +
> +	adrp	x6, idmap_t0sz
> +	str	x5, [x6, :lo12:idmap_t0sz]
> +
> +	create_table_entry x0, x3, EXTRA_SHIFT, EXTRA_PTRS, x5, x6

Can you explain how the subsequent call to create_pgd_entry with the
same tbl (x0) value ends up passing the right pointer from the extra
level, to the pgd, to the block mappings?

> +1:
> +#endif
> +
>  	create_pgd_entry x0, x3, x5, x6
>  	ldr	x6, =KERNEL_END
>  	mov	x5, x3				// __pa(KERNEL_START)
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 328b8ce4b007..74554dfcce73 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -151,6 +151,7 @@ asmlinkage void secondary_start_kernel(void)
>  	 */
>  	cpu_set_reserved_ttbr0();
>  	flush_tlb_all();
> +	cpu_set_default_tcr_t0sz();
>  
>  	preempt_disable();
>  	trace_hardirqs_off();
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index c6daaf6c6f97..c4f60393383e 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -40,6 +40,8 @@
>  
>  #include "mm.h"
>  
> +u64 idmap_t0sz = TCR_T0SZ(VA_BITS);
> +
>  /*
>   * Empty_zero_page is a special page that is used for zero-initialized data
>   * and COW.
> @@ -454,6 +456,7 @@ void __init paging_init(void)
>  	 */
>  	cpu_set_reserved_ttbr0();
>  	flush_tlb_all();
> +	cpu_set_default_tcr_t0sz();
>  }
>  
>  /*
> @@ -461,8 +464,10 @@ void __init paging_init(void)
>   */
>  void setup_mm_for_reboot(void)
>  {
> -	cpu_switch_mm(idmap_pg_dir, &init_mm);
> +	cpu_set_reserved_ttbr0();
>  	flush_tlb_all();
> +	cpu_set_idmap_tcr_t0sz();
> +	cpu_switch_mm(idmap_pg_dir, &init_mm);
>  }
>  
>  /*
> diff --git a/arch/arm64/mm/proc-macros.S b/arch/arm64/mm/proc-macros.S
> index 005d29e2977d..c17fdd6a19bc 100644
> --- a/arch/arm64/mm/proc-macros.S
> +++ b/arch/arm64/mm/proc-macros.S
> @@ -52,3 +52,14 @@
>  	mov	\reg, #4			// bytes per word
>  	lsl	\reg, \reg, \tmp		// actual cache line size
>  	.endm
> +
> +/*
> + * tcr_set_idmap_t0sz - update TCR.T0SZ so that we can load the ID map
> + */
> +	.macro	tcr_set_idmap_t0sz, valreg, tmpreg
> +#ifndef CONFIG_ARM64_VA_BITS_48
> +	adrp	\tmpreg, idmap_t0sz
> +	ldr	\tmpreg, [\tmpreg, #:lo12:idmap_t0sz]
> +	bfi	\valreg, \tmpreg, #TCR_T0SZ_OFFSET, #TCR_TxSZ_WIDTH
> +#endif
> +	.endm
> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
> index 28eebfb6af76..cdd754e19b9b 100644
> --- a/arch/arm64/mm/proc.S
> +++ b/arch/arm64/mm/proc.S
> @@ -156,6 +156,7 @@ ENTRY(cpu_do_resume)
>  	msr	cpacr_el1, x6
>  	msr	ttbr0_el1, x1
>  	msr	ttbr1_el1, x7
> +	tcr_set_idmap_t0sz x8, x7
>  	msr	tcr_el1, x8
>  	msr	vbar_el1, x9
>  	msr	mdscr_el1, x10
> @@ -233,6 +234,8 @@ ENTRY(__cpu_setup)
>  	 */
>  	ldr	x10, =TCR_TxSZ(VA_BITS) | TCR_CACHE_FLAGS | TCR_SMP_FLAGS | \
>  			TCR_TG_FLAGS | TCR_ASID16 | TCR_TBI0
> +	tcr_set_idmap_t0sz	x10, x9
> +
>  	/*
>  	 * Read the PARange bits from ID_AA64MMFR0_EL1 and set the IPS bits in
>  	 * TCR_EL1.
> -- 
> 1.8.3.2
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH roundup 1/4] arm64: mm: increase VA range of identity map
  2015-03-16 14:28     ` Christoffer Dall
@ 2015-03-16 14:39       ` Ard Biesheuvel
  -1 siblings, 0 replies; 18+ messages in thread
From: Ard Biesheuvel @ 2015-03-16 14:39 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Russell King - ARM Linux, Arnd Bergmann, Marc Zyngier,
	Will Deacon, kvmarm, linux-arm-kernel

On 16 March 2015 at 15:28, Christoffer Dall <christoffer.dall@linaro.org> wrote:
> On Fri, Mar 06, 2015 at 03:34:39PM +0100, Ard Biesheuvel wrote:
>> The page size and the number of translation levels, and hence the supported
>> virtual address range, are build-time configurables on arm64 whose optimal
>> values are use case dependent. However, in the current implementation, if
>> the system's RAM is located at a very high offset, the virtual address range
>> needs to reflect that merely because the identity mapping, which is only used
>> to enable or disable the MMU, requires the extended virtual range to map the
>> physical memory at an equal virtual offset.
>>
>> This patch relaxes that requirement, by increasing the number of translation
>> levels for the identity mapping only, and only when actually needed, i.e.,
>> when system RAM's offset is found to be out of reach at runtime.
>>
>> Tested-by: Laura Abbott <lauraa@codeaurora.org>
>> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
>> Tested-by: Marc Zyngier <marc.zyngier@arm.com>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> ---
>>  arch/arm64/include/asm/mmu_context.h   | 43 ++++++++++++++++++++++++++++++++++
>>  arch/arm64/include/asm/page.h          |  6 +++--
>>  arch/arm64/include/asm/pgtable-hwdef.h |  7 +++++-
>>  arch/arm64/kernel/head.S               | 38 ++++++++++++++++++++++++++++++
>>  arch/arm64/kernel/smp.c                |  1 +
>>  arch/arm64/mm/mmu.c                    |  7 +++++-
>>  arch/arm64/mm/proc-macros.S            | 11 +++++++++
>>  arch/arm64/mm/proc.S                   |  3 +++
>>  8 files changed, 112 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
>> index a9eee33dfa62..ecf2d060036b 100644
>> --- a/arch/arm64/include/asm/mmu_context.h
>> +++ b/arch/arm64/include/asm/mmu_context.h
>> @@ -64,6 +64,49 @@ static inline void cpu_set_reserved_ttbr0(void)
>>       : "r" (ttbr));
>>  }
>>
>> +/*
>> + * TCR.T0SZ value to use when the ID map is active. Usually equals
>> + * TCR_T0SZ(VA_BITS), unless system RAM is positioned very high in
>> + * physical memory, in which case it will be smaller.
>> + */
>> +extern u64 idmap_t0sz;
>> +
>> +static inline bool __cpu_uses_extended_idmap(void)
>> +{
>> +     return (!IS_ENABLED(CONFIG_ARM64_VA_BITS_48) &&
>> +             unlikely(idmap_t0sz != TCR_T0SZ(VA_BITS)));
>> +}
>> +
>> +static inline void __cpu_set_tcr_t0sz(u64 t0sz)
>> +{
>> +     unsigned long tcr;
>> +
>> +     if (__cpu_uses_extended_idmap())
>> +             asm volatile (
>> +             "       mrs     %0, tcr_el1     ;"
>> +             "       bfi     %0, %1, %2, %3  ;"
>> +             "       msr     tcr_el1, %0     ;"
>> +             "       isb"
>> +             : "=&r" (tcr)
>> +             : "r"(t0sz), "I"(TCR_T0SZ_OFFSET), "I"(TCR_TxSZ_WIDTH));
>> +}
>> +
>> +/*
>> + * Set TCR.T0SZ to the value appropriate for activating the identity map.
>> + */
>> +static inline void cpu_set_idmap_tcr_t0sz(void)
>> +{
>> +     __cpu_set_tcr_t0sz(idmap_t0sz);
>> +}
>> +
>> +/*
>> + * Set TCR.T0SZ to its default value (based on VA_BITS)
>> + */
>> +static inline void cpu_set_default_tcr_t0sz(void)
>> +{
>> +     __cpu_set_tcr_t0sz(TCR_T0SZ(VA_BITS));
>> +}
>> +
>>  static inline void switch_new_context(struct mm_struct *mm)
>>  {
>>       unsigned long flags;
>> diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
>> index 22b16232bd60..3d02b1869eb8 100644
>> --- a/arch/arm64/include/asm/page.h
>> +++ b/arch/arm64/include/asm/page.h
>> @@ -33,7 +33,9 @@
>>   * image. Both require pgd, pud (4 levels only) and pmd tables to (section)
>>   * map the kernel. With the 64K page configuration, swapper and idmap need to
>>   * map to pte level. The swapper also maps the FDT (see __create_page_tables
>> - * for more information).
>> + * for more information). Note that the number of ID map translation levels
>> + * could be increased on the fly if system RAM is out of reach for the default
>> + * VA range, so 3 pages are reserved in all cases.
>>   */
>>  #ifdef CONFIG_ARM64_64K_PAGES
>>  #define SWAPPER_PGTABLE_LEVELS       (CONFIG_ARM64_PGTABLE_LEVELS)
>> @@ -42,7 +44,7 @@
>>  #endif
>>
>>  #define SWAPPER_DIR_SIZE     (SWAPPER_PGTABLE_LEVELS * PAGE_SIZE)
>> -#define IDMAP_DIR_SIZE               (SWAPPER_DIR_SIZE)
>> +#define IDMAP_DIR_SIZE               (3 * PAGE_SIZE)
>>
>>  #ifndef __ASSEMBLY__
>>
>> diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
>> index 5f930cc9ea83..847e864202cc 100644
>> --- a/arch/arm64/include/asm/pgtable-hwdef.h
>> +++ b/arch/arm64/include/asm/pgtable-hwdef.h
>> @@ -143,7 +143,12 @@
>>  /*
>>   * TCR flags.
>>   */
>> -#define TCR_TxSZ(x)          (((UL(64) - (x)) << 16) | ((UL(64) - (x)) << 0))
>> +#define TCR_T0SZ_OFFSET              0
>> +#define TCR_T1SZ_OFFSET              16
>> +#define TCR_T0SZ(x)          ((UL(64) - (x)) << TCR_T0SZ_OFFSET)
>> +#define TCR_T1SZ(x)          ((UL(64) - (x)) << TCR_T1SZ_OFFSET)
>> +#define TCR_TxSZ(x)          (TCR_T0SZ(x) | TCR_T1SZ(x))
>> +#define TCR_TxSZ_WIDTH               6
>>  #define TCR_IRGN_NC          ((UL(0) << 8) | (UL(0) << 24))
>>  #define TCR_IRGN_WBWA                ((UL(1) << 8) | (UL(1) << 24))
>>  #define TCR_IRGN_WT          ((UL(2) << 8) | (UL(2) << 24))
>> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
>> index 8ce88e08c030..a3612eadab3c 100644
>> --- a/arch/arm64/kernel/head.S
>> +++ b/arch/arm64/kernel/head.S
>> @@ -387,6 +387,44 @@ __create_page_tables:
>>       mov     x0, x25                         // idmap_pg_dir
>>       ldr     x3, =KERNEL_START
>>       add     x3, x3, x28                     // __pa(KERNEL_START)
>> +
>> +#ifndef CONFIG_ARM64_VA_BITS_48
>> +#define EXTRA_SHIFT  (PGDIR_SHIFT + PAGE_SHIFT - 3)
>> +#define EXTRA_PTRS   (1 << (48 - EXTRA_SHIFT))
>
> How does this math work exactly?
>

PAGE_SHIFT - 3 is the number of bits translated at each level.
EXTRA_SHIFT is the number of low VA bits left to the levels below the
extra one, i.e. the bit position at which the extra root table starts
translating.
EXTRA_PTRS is the number of entries (64-bit words) in that root table.
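
As a worked example (a sketch, not taken from the patch), assume the common
4 KB page / 39-bit VA configuration, where PGDIR_SHIFT is 30:

#include <assert.h>

#define PAGE_SHIFT	12			/* 4 KB pages (assumed) */
#define VA_BITS		39			/* 3 translation levels */
#define PGDIR_SHIFT	30			/* 9 bits (PAGE_SHIFT - 3) per level */

#define EXTRA_SHIFT	(PGDIR_SHIFT + PAGE_SHIFT - 3)	/* 30 + 12 - 3 = 39 */
#define EXTRA_PTRS	(1 << (48 - EXTRA_SHIFT))	/* 1 << 9 = 512 */

int main(void)
{
	assert(EXTRA_SHIFT == VA_BITS);	/* the build-time check in the patch */
	assert(EXTRA_PTRS * 8 == 4096);	/* the extra root table fits in one page */
	return 0;
}

With T0SZ lowered, the VA bits above the 39-bit range index this 512-entry
table, and the original pgd hangs off one of its slots.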

> I also had to look at the create_pgd_entry macros to understand that these
> mean the shift for the 'extra' pgtable, and not the extra amount of
> shifts compared to PGDIR_SHIFT.  Not sure if that warrants a comment?
>

I am not sure if I understand what 'the extra amount of shifts' means,
so I should at least add a comment that that's not it :-)
But yes, I can clarify that.

>
>> +
>> +     /*
>> +      * If VA_BITS < 48, it may be too small to allow for an ID mapping to be
>> +      * created that covers system RAM if that is located sufficiently high
>> +      * in the physical address space. So for the ID map, use an extended
>> +      * virtual range in that case, by configuring an additional translation
>> +      * level.
>> +      * First, we have to verify our assumption that the current value of
>> +      * VA_BITS was chosen such that all translation levels are fully
>> +      * utilised, and that lowering T0SZ will always result in an additional
>> +      * translation level to be configured.
>> +      */
>> +#if VA_BITS != EXTRA_SHIFT
>> +#error "Mismatch between VA_BITS and page size/number of translation levels"
>> +#endif
>> +
>> +     /*
>> +      * Calculate the maximum allowed value for TCR_EL1.T0SZ so that the
>> +      * entire kernel image can be ID mapped. As T0SZ == (64 - #bits used),
>> +      * this number conveniently equals the number of leading zeroes in
>> +      * the physical address of KERNEL_END.
>> +      */
>> +     adrp    x5, KERNEL_END
>> +     clz     x5, x5
>> +     cmp     x5, TCR_T0SZ(VA_BITS)   // default T0SZ small enough?
>> +     b.ge    1f                      // .. then skip additional level
>> +
>> +     adrp    x6, idmap_t0sz
>> +     str     x5, [x6, :lo12:idmap_t0sz]
>> +
>> +     create_table_entry x0, x3, EXTRA_SHIFT, EXTRA_PTRS, x5, x6
>
> Can you explain how the subsequent call to create_pgd_entry with the
> same tbl (x0) value ends up passing the right pointer from the extra
> level, to the pgd, to the block mappings?
>

x0 is not preserved by the macro but incremented by 1 page.

Look at create_pgd_entry: it calls create_table_entry twice with the
same \tbl register, but each call sets another level.
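
In other words (a hedged C model, not the actual head.S macros; the index
arithmetic and the table-descriptor bits below are assumptions for
illustration):

#include <stdint.h>

#define PAGE_SIZE	4096UL		/* assumed 4 KB pages */
#define TABLE_TYPE	0x3UL		/* assumed table-descriptor type bits */

/* Model of create_table_entry: link the current table to the next page
 * of the contiguous idmap_pg_dir allocation, then advance the caller's
 * table pointer to that page. */
static void model_create_table_entry(uint64_t **tbl, uint64_t virt,
				     unsigned int shift, unsigned int ptrs)
{
	uint64_t index = (virt >> shift) & (ptrs - 1);
	uint64_t next = (uint64_t)(uintptr_t)*tbl + PAGE_SIZE;

	(*tbl)[index] = next | TABLE_TYPE;	/* this level points at the next page */
	*tbl = (uint64_t *)(uintptr_t)next;	/* caller's tbl now names that page */
}

So after the extra call above, x0 already points at the page that
create_pgd_entry will populate as the pgd, which is how the extra level ends
up chained on top of the normal ID map tables.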

>> +1:
>> +#endif
>> +
>>       create_pgd_entry x0, x3, x5, x6
>>       ldr     x6, =KERNEL_END
>>       mov     x5, x3                          // __pa(KERNEL_START)
>> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
>> index 328b8ce4b007..74554dfcce73 100644
>> --- a/arch/arm64/kernel/smp.c
>> +++ b/arch/arm64/kernel/smp.c
>> @@ -151,6 +151,7 @@ asmlinkage void secondary_start_kernel(void)
>>        */
>>       cpu_set_reserved_ttbr0();
>>       flush_tlb_all();
>> +     cpu_set_default_tcr_t0sz();
>>
>>       preempt_disable();
>>       trace_hardirqs_off();
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index c6daaf6c6f97..c4f60393383e 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -40,6 +40,8 @@
>>
>>  #include "mm.h"
>>
>> +u64 idmap_t0sz = TCR_T0SZ(VA_BITS);
>> +
>>  /*
>>   * Empty_zero_page is a special page that is used for zero-initialized data
>>   * and COW.
>> @@ -454,6 +456,7 @@ void __init paging_init(void)
>>        */
>>       cpu_set_reserved_ttbr0();
>>       flush_tlb_all();
>> +     cpu_set_default_tcr_t0sz();
>>  }
>>
>>  /*
>> @@ -461,8 +464,10 @@ void __init paging_init(void)
>>   */
>>  void setup_mm_for_reboot(void)
>>  {
>> -     cpu_switch_mm(idmap_pg_dir, &init_mm);
>> +     cpu_set_reserved_ttbr0();
>>       flush_tlb_all();
>> +     cpu_set_idmap_tcr_t0sz();
>> +     cpu_switch_mm(idmap_pg_dir, &init_mm);
>>  }
>>
>>  /*
>> diff --git a/arch/arm64/mm/proc-macros.S b/arch/arm64/mm/proc-macros.S
>> index 005d29e2977d..c17fdd6a19bc 100644
>> --- a/arch/arm64/mm/proc-macros.S
>> +++ b/arch/arm64/mm/proc-macros.S
>> @@ -52,3 +52,14 @@
>>       mov     \reg, #4                        // bytes per word
>>       lsl     \reg, \reg, \tmp                // actual cache line size
>>       .endm
>> +
>> +/*
>> + * tcr_set_idmap_t0sz - update TCR.T0SZ so that we can load the ID map
>> + */
>> +     .macro  tcr_set_idmap_t0sz, valreg, tmpreg
>> +#ifndef CONFIG_ARM64_VA_BITS_48
>> +     adrp    \tmpreg, idmap_t0sz
>> +     ldr     \tmpreg, [\tmpreg, #:lo12:idmap_t0sz]
>> +     bfi     \valreg, \tmpreg, #TCR_T0SZ_OFFSET, #TCR_TxSZ_WIDTH
>> +#endif
>> +     .endm
>> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
>> index 28eebfb6af76..cdd754e19b9b 100644
>> --- a/arch/arm64/mm/proc.S
>> +++ b/arch/arm64/mm/proc.S
>> @@ -156,6 +156,7 @@ ENTRY(cpu_do_resume)
>>       msr     cpacr_el1, x6
>>       msr     ttbr0_el1, x1
>>       msr     ttbr1_el1, x7
>> +     tcr_set_idmap_t0sz x8, x7
>>       msr     tcr_el1, x8
>>       msr     vbar_el1, x9
>>       msr     mdscr_el1, x10
>> @@ -233,6 +234,8 @@ ENTRY(__cpu_setup)
>>        */
>>       ldr     x10, =TCR_TxSZ(VA_BITS) | TCR_CACHE_FLAGS | TCR_SMP_FLAGS | \
>>                       TCR_TG_FLAGS | TCR_ASID16 | TCR_TBI0
>> +     tcr_set_idmap_t0sz      x10, x9
>> +
>>       /*
>>        * Read the PARange bits from ID_AA64MMFR0_EL1 and set the IPS bits in
>>        * TCR_EL1.
>> --
>> 1.8.3.2
>>

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2015-03-16 14:39 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-06 14:34 [PATCH roundup 0/4] extend VA range of ID map for core kernel and KVM Ard Biesheuvel
2015-03-06 14:34 ` Ard Biesheuvel
2015-03-06 14:34 ` [PATCH roundup 1/4] arm64: mm: increase VA range of identity map Ard Biesheuvel
2015-03-06 14:34   ` Ard Biesheuvel
2015-03-16 14:28   ` Christoffer Dall
2015-03-16 14:28     ` Christoffer Dall
2015-03-16 14:39     ` Ard Biesheuvel
2015-03-16 14:39       ` Ard Biesheuvel
2015-03-06 14:34 ` [PATCH roundup 2/4] ARM: KVM: avoid "HYP init code too big" error Ard Biesheuvel
2015-03-06 14:34   ` Ard Biesheuvel
2015-03-09 19:09   ` Russell King - ARM Linux
2015-03-09 19:09     ` Russell King - ARM Linux
2015-03-10  9:56     ` Ard Biesheuvel
2015-03-10  9:56       ` Ard Biesheuvel
2015-03-06 14:34 ` [PATCH roundup 3/4] ARM, arm64: kvm: get rid of the bounce page Ard Biesheuvel
2015-03-06 14:34   ` Ard Biesheuvel
2015-03-06 14:34 ` [PATCH roundup 4/4] arm64: KVM: use ID map with increased VA range if required Ard Biesheuvel
2015-03-06 14:34   ` Ard Biesheuvel
