* [RFC PATCH 0/3] arm64: KVM: use increased VA range if needed
@ 2015-02-26 15:29 Ard Biesheuvel
  2015-02-26 15:29 ` [RFC PATCH 1/3] ARM, arm64: kvm: get rid of the bounce page Ard Biesheuvel
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Ard Biesheuvel @ 2015-02-26 15:29 UTC (permalink / raw)
  To: linux-arm-kernel

This series addresses the issue where KVM fails to initialize when running
on a system whose RAM is located higher in the physical address space than
an ID map confined to VA_BITS of virtual address space can cover.
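
For example, with 4k pages and VA_BITS == 39 (the configuration tested
below), the ID map can only reach physical addresses below 1 << 39
(512 GB), so RAM that starts at or above that address cannot be identity
mapped without an extra level of translation.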

Tested on AMD Seattle with 4k pages/VA_BITS==39.

Ard Biesheuvel (3):
  ARM, arm64: kvm: get rid of the bounce page
  arm64: make ID map shareable with EL2
  arm64: KVM: use ID map with increased VA range if required

 arch/arm/kernel/vmlinux.lds.S   | 12 +++++++++---
 arch/arm/kvm/init.S             | 11 +++++++++++
 arch/arm/kvm/mmu.c              | 42 +++++------------------------------------
 arch/arm64/kernel/head.S        | 22 +++++++++++++++++----
 arch/arm64/kernel/vmlinux.lds.S | 20 +++++++++++++-------
 arch/arm64/kvm/hyp-init.S       | 40 +++++++++++++++++++++++++++++++++++++--
 6 files changed, 94 insertions(+), 53 deletions(-)

-- 
1.8.3.2


* [RFC PATCH 1/3] ARM, arm64: kvm: get rid of the bounce page
  2015-02-26 15:29 [RFC PATCH 0/3] arm64: KVM: use increased VA range if needed Ard Biesheuvel
@ 2015-02-26 15:29 ` Ard Biesheuvel
  2015-02-26 16:10   ` Marc Zyngier
  2015-02-26 15:29 ` [RFC PATCH 2/3] arm64: make ID map shareable with EL2 Ard Biesheuvel
  2015-02-26 15:29 ` [RFC PATCH 3/3] arm64: KVM: use ID map with increased VA range if required Ard Biesheuvel
  2 siblings, 1 reply; 17+ messages in thread
From: Ard Biesheuvel @ 2015-02-26 15:29 UTC (permalink / raw)
  To: linux-arm-kernel

The HYP init bounce page is a runtime construct that ensures that the
HYP init code does not cross a page boundary. However, this is something
we can do perfectly well at build time, by aligning the code appropriately.

For arm64, we just align to 4 KB, and enforce that the code size is less
than 4 KB, regardless of the chosen page size.

For ARM, the whole code is less than 256 bytes, so we tweak the linker
script to align at a power-of-2 upper bound of the code size.

Note that this also fixes a benign off-by-one error in the original bounce
page code, where a bounce page would be allocated unnecessarily if the code
was exactly 1 page in size.
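
To illustrate the off-by-one with made-up numbers: if the code starts at
a page-aligned address and is exactly PAGE_SIZE bytes long, the old check
compared the start against one-past-the-end and fired spuriously, whereas
comparing the first and last byte of the code does the right thing:

	/* hypothetical: code at 0x1000, exactly 0x1000 bytes, 4k pages */
	(0x1000 ^ 0x2000) & PAGE_MASK		/* old check: != 0, bounce   */
	(0x1000 ^ (0x2000 - 1)) & PAGE_MASK	/* new check: == 0, no cross */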

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm/kernel/vmlinux.lds.S   | 12 +++++++++---
 arch/arm/kvm/init.S             | 11 +++++++++++
 arch/arm/kvm/mmu.c              | 42 +++++------------------------------------
 arch/arm64/kernel/vmlinux.lds.S | 18 ++++++++++++------
 4 files changed, 37 insertions(+), 46 deletions(-)

diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index b31aa73e8076..8179d3903dee 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -23,7 +23,7 @@
 	VMLINUX_SYMBOL(__idmap_text_start) = .;				\
 	*(.idmap.text)							\
 	VMLINUX_SYMBOL(__idmap_text_end) = .;				\
-	. = ALIGN(32);							\
+	. = ALIGN(1 << __hyp_idmap_align_order);			\
 	VMLINUX_SYMBOL(__hyp_idmap_text_start) = .;			\
 	*(.hyp.idmap.text)						\
 	VMLINUX_SYMBOL(__hyp_idmap_text_end) = .;
@@ -346,8 +346,14 @@ SECTIONS
  */
 ASSERT((__proc_info_end - __proc_info_begin), "missing CPU support")
 ASSERT((__arch_info_end - __arch_info_begin), "no machine record defined")
+
 /*
- * The HYP init code can't be more than a page long.
+ * The HYP init code can't be more than a page long,
+ * and should not cross a page boundary.
  * The above comment applies as well.
  */
-ASSERT(((__hyp_idmap_text_end - __hyp_idmap_text_start) <= PAGE_SIZE), "HYP init code too big")
+ASSERT(((__hyp_idmap_text_end - 1) & PAGE_MASK) -
+	(__hyp_idmap_text_start & PAGE_MASK) == 0,
+	"HYP init code too big or unaligned")
+ASSERT(__hyp_idmap_size <= (1 << __hyp_idmap_align_order),
+	"__hyp_idmap_size should be <= (1 << __hyp_idmap_align_order)")
diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
index 3988e72d16ff..7a279bc8e0e1 100644
--- a/arch/arm/kvm/init.S
+++ b/arch/arm/kvm/init.S
@@ -157,3 +157,14 @@ target:	@ We're now in the trampoline code, switch page tables
 __kvm_hyp_init_end:
 
 	.popsection
+
+	/*
+	 * When making changes to this file, make sure that the value of
+	 * __hyp_idmap_align_order is updated if the size of the code ends up
+	 * exceeding (1 << __hyp_idmap_align_order). This helps ensure that the
+	 * code never crosses a page boundary, without wasting too much memory
+	 * on aligning to PAGE_SIZE.
+	 */
+	.global	__hyp_idmap_size, __hyp_idmap_align_order
+	.set	__hyp_idmap_size, __kvm_hyp_init_end - __kvm_hyp_init
+	.set	__hyp_idmap_align_order, 8
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 3e6859bc3e11..42a24d6b003b 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -37,7 +37,6 @@ static pgd_t *boot_hyp_pgd;
 static pgd_t *hyp_pgd;
 static DEFINE_MUTEX(kvm_hyp_pgd_mutex);
 
-static void *init_bounce_page;
 static unsigned long hyp_idmap_start;
 static unsigned long hyp_idmap_end;
 static phys_addr_t hyp_idmap_vector;
@@ -405,9 +404,6 @@ void free_boot_hyp_pgd(void)
 	if (hyp_pgd)
 		unmap_range(NULL, hyp_pgd, TRAMPOLINE_VA, PAGE_SIZE);
 
-	free_page((unsigned long)init_bounce_page);
-	init_bounce_page = NULL;
-
 	mutex_unlock(&kvm_hyp_pgd_mutex);
 }
 
@@ -1498,39 +1494,11 @@ int kvm_mmu_init(void)
 	hyp_idmap_end = kvm_virt_to_phys(__hyp_idmap_text_end);
 	hyp_idmap_vector = kvm_virt_to_phys(__kvm_hyp_init);
 
-	if ((hyp_idmap_start ^ hyp_idmap_end) & PAGE_MASK) {
-		/*
-		 * Our init code is crossing a page boundary. Allocate
-		 * a bounce page, copy the code over and use that.
-		 */
-		size_t len = __hyp_idmap_text_end - __hyp_idmap_text_start;
-		phys_addr_t phys_base;
-
-		init_bounce_page = (void *)__get_free_page(GFP_KERNEL);
-		if (!init_bounce_page) {
-			kvm_err("Couldn't allocate HYP init bounce page\n");
-			err = -ENOMEM;
-			goto out;
-		}
-
-		memcpy(init_bounce_page, __hyp_idmap_text_start, len);
-		/*
-		 * Warning: the code we just copied to the bounce page
-		 * must be flushed to the point of coherency.
-		 * Otherwise, the data may be sitting in L2, and HYP
-		 * mode won't be able to observe it as it runs with
-		 * caches off at that point.
-		 */
-		kvm_flush_dcache_to_poc(init_bounce_page, len);
-
-		phys_base = kvm_virt_to_phys(init_bounce_page);
-		hyp_idmap_vector += phys_base - hyp_idmap_start;
-		hyp_idmap_start = phys_base;
-		hyp_idmap_end = phys_base + len;
-
-		kvm_info("Using HYP init bounce page @%lx\n",
-			 (unsigned long)phys_base);
-	}
+	/*
+	 * We rely on the linker script to ensure at build time that the HYP
+	 * init code does not cross a page boundary.
+	 */
+	BUG_ON((hyp_idmap_start ^ (hyp_idmap_end - 1)) & PAGE_MASK);
 
 	hyp_pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, hyp_pgd_order);
 	boot_hyp_pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, hyp_pgd_order);
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 5d9d2dca530d..17383c257a7d 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -23,10 +23,14 @@ jiffies = jiffies_64;
 
 #define HYPERVISOR_TEXT					\
 	/*						\
-	 * Force the alignment to be compatible with	\
-	 * the vectors requirements			\
+	 * Align to 4K so that				\
+	 * a) the HYP vector table is at its minimum	\
+	 *    alignment of 2048 bytes			\
+	 * b) the HYP init code will not cross a page	\
+	 *    boundary if its size does not exceed	\
+	 *    4K (see related ASSERT() below)		\
 	 */						\
-	. = ALIGN(2048);				\
+	. = ALIGN(SZ_4K);				\
 	VMLINUX_SYMBOL(__hyp_idmap_text_start) = .;	\
 	*(.hyp.idmap.text)				\
 	VMLINUX_SYMBOL(__hyp_idmap_text_end) = .;	\
@@ -163,10 +167,12 @@ SECTIONS
 }
 
 /*
- * The HYP init code can't be more than a page long.
+ * The HYP init code can't be more than a page long,
+ * and should not cross a page boundary.
  */
-ASSERT(((__hyp_idmap_text_start + PAGE_SIZE) > __hyp_idmap_text_end),
-       "HYP init code too big")
+ASSERT(((__hyp_idmap_text_end - 1) & ~(SZ_4K - 1)) -
+	(__hyp_idmap_text_start & ~(SZ_4K - 1)) == 0,
+	"HYP init code too big or unaligned")
 
 /*
  * If padding is applied before .head.text, virt<->phys conversions will fail.
-- 
1.8.3.2


* [RFC PATCH 2/3] arm64: make ID map shareable with EL2
  2015-02-26 15:29 [RFC PATCH 0/3] arm64: KVM: use increased VA range if needed Ard Biesheuvel
  2015-02-26 15:29 ` [RFC PATCH 1/3] ARM, arm64: kvm: get rid of the bounce page Ard Biesheuvel
@ 2015-02-26 15:29 ` Ard Biesheuvel
  2015-02-26 15:29 ` [RFC PATCH 3/3] arm64: KVM: use ID map with increased VA range if required Ard Biesheuvel
  2 siblings, 0 replies; 17+ messages in thread
From: Ard Biesheuvel @ 2015-02-26 15:29 UTC (permalink / raw)
  To: linux-arm-kernel

This changes the ID map creation in head.S so that it can be reused
by KVM to ID map the HYP init code in EL2.

Since the architecture defines the AP[1] bit as RES1 in page table entries
used at EL2, we must ensure that the HYP init code is mapped with the AP[1]
bit set. However, AP[1] means 'writable at EL0' when set at EL1, which
automatically implies PXN == 1 (rendering the entire ID map non-executable)
unless we also set AP[2], which means read-only at both EL0 and EL1.

To prevent having to make the entire ID map read-only, we split the ID map
into two regions, and only set AP[2:1] == 0b11 for the first region, which
covers the HYP .text section. Note that this also moves that section to
before _stext, removing it from the region that is executable at EL1 at
runtime. This is a nice bonus, in fact, since this code should never be
executable at EL1 anyway.
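
For reference, a sketch of the descriptor bits involved; the bit
positions below are the arm64 kernel's stage 1 definitions, but the
comments are mine and not part of the patch:

	#define PTE_USER	(1UL << 6)	/* AP[1]: RES1 at EL2 */
	#define PTE_RDONLY	(1UL << 7)	/* AP[2]: read-only */

	/*
	 * AP[2:1] == 0b11 makes the mapping valid at EL2 and read-only at
	 * EL0/EL1, so the implied-PXN rule for EL0-writable regions does
	 * not apply and the code stays executable.
	 */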

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/kernel/head.S        | 22 ++++++++++++++++++----
 arch/arm64/kernel/vmlinux.lds.S |  2 +-
 2 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index a3612eadab3c..0b3fb672640c 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -79,8 +79,10 @@
 
 #ifdef CONFIG_ARM64_64K_PAGES
 #define MM_MMUFLAGS	PTE_ATTRINDX(MT_NORMAL) | PTE_FLAGS
+#define MM_MMUFLAGS_HYP	MM_MMUFLAGS | PTE_HYP | PTE_RDONLY
 #else
 #define MM_MMUFLAGS	PMD_ATTRINDX(MT_NORMAL) | PMD_FLAGS
+#define MM_MMUFLAGS_HYP	MM_MMUFLAGS | PMD_HYP | PMD_SECT_RDONLY
 #endif
 
 /*
@@ -379,8 +381,6 @@ __create_page_tables:
 	cmp	x0, x6
 	b.lo	1b
 
-	ldr	x7, =MM_MMUFLAGS
-
 	/*
 	 * Create the identity mapping.
 	 */
@@ -426,9 +426,23 @@ __create_page_tables:
 #endif
 
 	create_pgd_entry x0, x3, x5, x6
-	ldr	x6, =KERNEL_END
+
+	/*
+	 * Map the first region -which also contains the HYP text sections- with
+	 * HYP compatible attributes, so that we can share the ID map with KVM.
+	 */
+	ldr	x7, =MM_MMUFLAGS_HYP
+	adrp	x6, _stext			// __pa(_stext)
 	mov	x5, x3				// __pa(KERNEL_START)
-	add	x6, x6, x28			// __pa(KERNEL_END)
+	create_block_map x0, x7, x3, x5, x6
+
+	/*
+	 * Map everything else with the default attributes.
+	 */
+	ldr	x7, =MM_MMUFLAGS
+	adrp	x3, (_stext + BLOCK_SIZE - 1)	// next block after _stext
+	adrp	x6, KERNEL_END
+	mov	x5, x3
 	create_block_map x0, x7, x3, x5, x6
 
 	/*
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 17383c257a7d..4874ebca4b3e 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -83,6 +83,7 @@ SECTIONS
 	.head.text : {
 		_text = .;
 		HEAD_TEXT
+		HYPERVISOR_TEXT
 	}
 	ALIGN_DEBUG_RO
 	.text : {			/* Real text segment		*/
@@ -94,7 +95,6 @@ SECTIONS
 			TEXT_TEXT
 			SCHED_TEXT
 			LOCK_TEXT
-			HYPERVISOR_TEXT
 			*(.fixup)
 			*(.gnu.warning)
 		. = ALIGN(16);
-- 
1.8.3.2


* [RFC PATCH 3/3] arm64: KVM: use ID map with increased VA range if required
  2015-02-26 15:29 [RFC PATCH 0/3] arm64: KVM: use increased VA range if needed Ard Biesheuvel
  2015-02-26 15:29 ` [RFC PATCH 1/3] ARM, arm64: kvm: get rid of the bounce page Ard Biesheuvel
  2015-02-26 15:29 ` [RFC PATCH 2/3] arm64: make ID map shareable with EL2 Ard Biesheuvel
@ 2015-02-26 15:29 ` Ard Biesheuvel
  2015-02-26 16:11   ` Catalin Marinas
  2015-02-26 18:06   ` Marc Zyngier
  2 siblings, 2 replies; 17+ messages in thread
From: Ard Biesheuvel @ 2015-02-26 15:29 UTC (permalink / raw)
  To: linux-arm-kernel

This patch modifies the HYP init code so it can deal with system
RAM residing at an offset which exceeds the reach of VA_BITS.

Like for EL1, this involves configuring an additional level of
translation for the ID map. However, in case of EL2, this implies
that all translations use the extra level, as we cannot seamlessly
switch between translation tables with different numbers of
translation levels. For this reason, the ID map is merged with
the runtime HYP map, since they don't overlap anyway.
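
In rough C pseudocode, ignoring caches and barriers, the change below
amounts to the following (idmap_t0sz, idmap_pg_dir and PMD_TYPE_TABLE
are the kernel symbols the code uses; the helper names are made up):

	/* x0 = boot HYP pgd, x1 = runtime HYP pgd (both physical) */
	u64 tcr = (read_tcr_el1() & TCR_EL2_MASK) | TCR_EL2_FLAGS;

	if (idmap_t0sz < TCR_T0SZ(VA_BITS)) {	/* ID map has an extra level */
		tcr = set_t0sz(tcr, idmap_t0sz);/* widen the EL2 VA range too */
		/* hang the runtime HYP pgd off entry #0 of the ID map root */
		idmap_pg_dir[0] = x1 | PMD_TYPE_TABLE;
		x0 = __pa(idmap_pg_dir);	/* merged table goes in TTBR0_EL2 */
		x1 = 0;				/* flag: skip the trampoline dance */
	}
	write_ttbr0_el2(x0);
	write_tcr_el2(tcr);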

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/kvm/hyp-init.S | 40 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 38 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/hyp-init.S b/arch/arm64/kvm/hyp-init.S
index c3191168a994..0af16bce6316 100644
--- a/arch/arm64/kvm/hyp-init.S
+++ b/arch/arm64/kvm/hyp-init.S
@@ -20,6 +20,7 @@
 #include <asm/assembler.h>
 #include <asm/kvm_arm.h>
 #include <asm/kvm_mmu.h>
+#include <asm/pgtable-hwdef.h>
 
 	.text
 	.pushsection	.hyp.idmap.text, "ax"
@@ -58,13 +59,44 @@ __invalid:
 	 */
 __do_hyp_init:
 
-	msr	ttbr0_el2, x0
-
 	mrs	x4, tcr_el1
 	ldr	x5, =TCR_EL2_MASK
 	and	x4, x4, x5
 	ldr	x5, =TCR_EL2_FLAGS
 	orr	x4, x4, x5
+
+#ifndef CONFIG_ARM64_VA_BITS_48
+	/*
+	 * If we are running with VA_BITS < 48, we may be running with an extra
+	 * level of translation in the ID map. This is only the case if system
+	 * RAM is out of range for the currently configured page size and number
+	 * of translation levels, in which case we will also need the extra
+	 * level for the HYP ID map, or we won't be able to enable the EL2 MMU.
+	 *
+	 * However, at EL2, there is only one TTBR register, and we can't switch
+	 * between translation tables *and* update TCR_EL2.T0SZ at the same
+	 * time. Bottom line: we need the extra level in *both* our translation
+	 * tables.
+	 *
+	 * Fortunately, there is an easy way out: the existing ID map, with the
+	 * extra level, can be reused for both. The kernel image is already
+	 * identity mapped at a high virtual offset, which leaves VA_BITS of
+	 * address space at the low end to put our runtime HYP mappings.
+	 */
+	adrp	x5, idmap_t0sz			// get ID map TCR.T0SZ
+	ldr	x5, [x5, :lo12:idmap_t0sz]
+	cmp	x5, TCR_T0SZ(VA_BITS)		// extra level configured?
+	b.ge	1f				// if not, skip
+
+	bfi	x4, x5, TCR_T0SZ_OFFSET, TCR_TxSZ_WIDTH
+
+	adrp	x0, idmap_pg_dir	// get root of ID map
+	orr	x5, x1, PMD_TYPE_TABLE	// table entry pointing at HYP pgd
+	str	x5, [x0]		// store at offset #0
+	mov	x1, #0
+1:
+#endif
+	msr	ttbr0_el2, x0
 	msr	tcr_el2, x4
 
 	ldr	x4, =VTCR_EL2_FLAGS
@@ -91,6 +123,9 @@ __do_hyp_init:
 	msr	sctlr_el2, x4
 	isb
 
+	/* Skip the trampoline dance if we merged the boot and runtime PGDs */
+	cbz	x1, merged
+
 	/* MMU is now enabled. Get ready for the trampoline dance */
 	ldr	x4, =TRAMPOLINE_VA
 	adr	x5, target
@@ -105,6 +140,7 @@ target: /* We're now in the trampoline code, switch page tables */
 	tlbi	alle2
 	dsb	sy
 
+merged:
 	/* Set the stack and new vectors */
 	kern_hyp_va	x2
 	mov	sp, x2
-- 
1.8.3.2


* [RFC PATCH 1/3] ARM, arm64: kvm: get rid of the bounce page
  2015-02-26 15:29 ` [RFC PATCH 1/3] ARM, arm64: kvm: get rid of the bounce page Ard Biesheuvel
@ 2015-02-26 16:10   ` Marc Zyngier
  2015-02-26 17:31     ` Ard Biesheuvel
  0 siblings, 1 reply; 17+ messages in thread
From: Marc Zyngier @ 2015-02-26 16:10 UTC (permalink / raw)
  To: linux-arm-kernel

On 26/02/15 15:29, Ard Biesheuvel wrote:
> The HYP init bounce page is a runtime construct that ensures that the
> HYP init code does not cross a page boundary. However, this is something
> we can do perfectly well at build time, by aligning the code appropriately.
> 
> For arm64, we just align to 4 KB, and enforce that the code size is less
> than 4 KB, regardless of the chosen page size.
> 
> For ARM, the whole code is less than 256 bytes, so we tweak the linker
> script to align at a power-of-2 upper bound of the code size.
> 
> Note that this also fixes a benign off-by-one error in the original bounce
> page code, where a bounce page would be allocated unnecessarily if the code
> was exactly 1 page in size.

I really like this simplification. Can you please check that it still
works on 32-bit with this patch from Arnd?

https://www.mail-archive.com/kvm@vger.kernel.org/msg112364.html

Another question below:

> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
>  arch/arm/kernel/vmlinux.lds.S   | 12 +++++++++---
>  arch/arm/kvm/init.S             | 11 +++++++++++
>  arch/arm/kvm/mmu.c              | 42 +++++------------------------------------
>  arch/arm64/kernel/vmlinux.lds.S | 18 ++++++++++++------
>  4 files changed, 37 insertions(+), 46 deletions(-)
> 
> diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
> index b31aa73e8076..8179d3903dee 100644
> --- a/arch/arm/kernel/vmlinux.lds.S
> +++ b/arch/arm/kernel/vmlinux.lds.S
> @@ -23,7 +23,7 @@
>  	VMLINUX_SYMBOL(__idmap_text_start) = .;				\
>  	*(.idmap.text)							\
>  	VMLINUX_SYMBOL(__idmap_text_end) = .;				\
> -	. = ALIGN(32);							\
> +	. = ALIGN(1 << __hyp_idmap_align_order);			\
>  	VMLINUX_SYMBOL(__hyp_idmap_text_start) = .;			\
>  	*(.hyp.idmap.text)						\
>  	VMLINUX_SYMBOL(__hyp_idmap_text_end) = .;
> @@ -346,8 +346,14 @@ SECTIONS
>   */
>  ASSERT((__proc_info_end - __proc_info_begin), "missing CPU support")
>  ASSERT((__arch_info_end - __arch_info_begin), "no machine record defined")
> +
>  /*
> - * The HYP init code can't be more than a page long.
> + * The HYP init code can't be more than a page long,
> + * and should not cross a page boundary.
>   * The above comment applies as well.
>   */
> -ASSERT(((__hyp_idmap_text_end - __hyp_idmap_text_start) <= PAGE_SIZE), "HYP init code too big")
> +ASSERT(((__hyp_idmap_text_end - 1) & PAGE_MASK) -
> +	(__hyp_idmap_text_start & PAGE_MASK) == 0,
> +	"HYP init code too big or unaligned")
> +ASSERT(__hyp_idmap_size <= (1 << __hyp_idmap_align_order),
> +	"__hyp_idmap_size should be <= (1 << __hyp_idmap_align_order)")
> diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
> index 3988e72d16ff..7a279bc8e0e1 100644
> --- a/arch/arm/kvm/init.S
> +++ b/arch/arm/kvm/init.S
> @@ -157,3 +157,14 @@ target:	@ We're now in the trampoline code, switch page tables
>  __kvm_hyp_init_end:
>  
>  	.popsection
> +
> +	/*
> +	 * When making changes to this file, make sure that the value of
> +	 * __hyp_idmap_align_order is updated if the size of the code ends up
> +	 * exceeding (1 << __hyp_idmap_align_order). This helps ensure that the
> +	 * code never crosses a page boundary, without wasting too much memory
> +	 * on aligning to PAGE_SIZE.
> +	 */
> +	.global	__hyp_idmap_size, __hyp_idmap_align_order
> +	.set	__hyp_idmap_size, __kvm_hyp_init_end - __kvm_hyp_init
> +	.set	__hyp_idmap_align_order, 8

Is there a way to generate this __hyp_idmap_align_order automatically?
We're already pretty close to this order-8 (256 byte) limit...

> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 3e6859bc3e11..42a24d6b003b 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -37,7 +37,6 @@ static pgd_t *boot_hyp_pgd;
>  static pgd_t *hyp_pgd;
>  static DEFINE_MUTEX(kvm_hyp_pgd_mutex);
>  
> -static void *init_bounce_page;
>  static unsigned long hyp_idmap_start;
>  static unsigned long hyp_idmap_end;
>  static phys_addr_t hyp_idmap_vector;
> @@ -405,9 +404,6 @@ void free_boot_hyp_pgd(void)
>  	if (hyp_pgd)
>  		unmap_range(NULL, hyp_pgd, TRAMPOLINE_VA, PAGE_SIZE);
>  
> -	free_page((unsigned long)init_bounce_page);
> -	init_bounce_page = NULL;
> -
>  	mutex_unlock(&kvm_hyp_pgd_mutex);
>  }
>  
> @@ -1498,39 +1494,11 @@ int kvm_mmu_init(void)
>  	hyp_idmap_end = kvm_virt_to_phys(__hyp_idmap_text_end);
>  	hyp_idmap_vector = kvm_virt_to_phys(__kvm_hyp_init);
>  
> -	if ((hyp_idmap_start ^ hyp_idmap_end) & PAGE_MASK) {
> -		/*
> -		 * Our init code is crossing a page boundary. Allocate
> -		 * a bounce page, copy the code over and use that.
> -		 */
> -		size_t len = __hyp_idmap_text_end - __hyp_idmap_text_start;
> -		phys_addr_t phys_base;
> -
> -		init_bounce_page = (void *)__get_free_page(GFP_KERNEL);
> -		if (!init_bounce_page) {
> -			kvm_err("Couldn't allocate HYP init bounce page\n");
> -			err = -ENOMEM;
> -			goto out;
> -		}
> -
> -		memcpy(init_bounce_page, __hyp_idmap_text_start, len);
> -		/*
> -		 * Warning: the code we just copied to the bounce page
> -		 * must be flushed to the point of coherency.
> -		 * Otherwise, the data may be sitting in L2, and HYP
> -		 * mode won't be able to observe it as it runs with
> -		 * caches off at that point.
> -		 */
> -		kvm_flush_dcache_to_poc(init_bounce_page, len);
> -
> -		phys_base = kvm_virt_to_phys(init_bounce_page);
> -		hyp_idmap_vector += phys_base - hyp_idmap_start;
> -		hyp_idmap_start = phys_base;
> -		hyp_idmap_end = phys_base + len;
> -
> -		kvm_info("Using HYP init bounce page @%lx\n",
> -			 (unsigned long)phys_base);
> -	}
> +	/*
> +	 * We rely on the linker script to ensure at build time that the HYP
> +	 * init code does not cross a page boundary.
> +	 */
> +	BUG_ON((hyp_idmap_start ^ (hyp_idmap_end - 1)) & PAGE_MASK);
>  
>  	hyp_pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, hyp_pgd_order);
>  	boot_hyp_pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, hyp_pgd_order);
> diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
> index 5d9d2dca530d..17383c257a7d 100644
> --- a/arch/arm64/kernel/vmlinux.lds.S
> +++ b/arch/arm64/kernel/vmlinux.lds.S
> @@ -23,10 +23,14 @@ jiffies = jiffies_64;
>  
>  #define HYPERVISOR_TEXT					\
>  	/*						\
> -	 * Force the alignment to be compatible with	\
> -	 * the vectors requirements			\
> +	 * Align to 4K so that				\
> +	 * a) the HYP vector table is at its minimum	\
> +	 *    alignment of 2048 bytes			\
> +	 * b) the HYP init code will not cross a page	\
> +	 *    boundary if its size does not exceed	\
> +	 *    4K (see related ASSERT() below)		\
>  	 */						\
> -	. = ALIGN(2048);				\
> +	. = ALIGN(SZ_4K);				\
>  	VMLINUX_SYMBOL(__hyp_idmap_text_start) = .;	\
>  	*(.hyp.idmap.text)				\
>  	VMLINUX_SYMBOL(__hyp_idmap_text_end) = .;	\
> @@ -163,10 +167,12 @@ SECTIONS
>  }
>  
>  /*
> - * The HYP init code can't be more than a page long.
> + * The HYP init code can't be more than a page long,
> + * and should not cross a page boundary.
>   */
> -ASSERT(((__hyp_idmap_text_start + PAGE_SIZE) > __hyp_idmap_text_end),
> -       "HYP init code too big")
> +ASSERT(((__hyp_idmap_text_end - 1) & ~(SZ_4K - 1)) -
> +	(__hyp_idmap_text_start & ~(SZ_4K - 1)) == 0,
> +	"HYP init code too big or unaligned")
>  
>  /*
>   * If padding is applied before .head.text, virt<->phys conversions will fail.
> 

Otherwise looks pretty good.

	M.
-- 
Jazz is not dead. It just smells funny...


* [RFC PATCH 3/3] arm64: KVM: use ID map with increased VA range if required
  2015-02-26 15:29 ` [RFC PATCH 3/3] arm64: KVM: use ID map with increased VA range if required Ard Biesheuvel
@ 2015-02-26 16:11   ` Catalin Marinas
  2015-02-26 16:29     ` Catalin Marinas
  2015-02-26 16:56     ` Ard Biesheuvel
  2015-02-26 18:06   ` Marc Zyngier
  1 sibling, 2 replies; 17+ messages in thread
From: Catalin Marinas @ 2015-02-26 16:11 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Feb 26, 2015 at 03:29:07PM +0000, Ard Biesheuvel wrote:
> +	/*
> +	 * If we are running with VA_BITS < 48, we may be running with an extra
> +	 * level of translation in the ID map. This is only the case if system
> +	 * RAM is out of range for the currently configured page size and number
> +	 * of translation levels, in which case we will also need the extra
> +	 * level for the HYP ID map, or we won't be able to enable the EL2 MMU.
> +	 *
> +	 * However, at EL2, there is only one TTBR register, and we can't switch
> +	 * between translation tables *and* update TCR_EL2.T0SZ at the same
> +	 * time. Bottom line: we need the extra level in *both* our translation
> +	 * tables.

It doesn't sound too nice but I'm not sure we can do much about it.

> +	 *
> +	 * Fortunately, there is an easy way out: the existing ID map, with the
> +	 * extra level, can be reused for both. The kernel image is already
> +	 * identity mapped at a high virtual offset, which leaves VA_BITS of
> +	 * address space at the low end to put our runtime HYP mappings.
> +	 */

Another aspect here is that the Hyp VA starts at HYP_PAGE_OFFSET which
is 1 << (VA_BITS - 1). On some platforms we may get an overlap with the
physical memory (and the original idmap). Could we switch to the
dedicated hyp TTBR at this point, with the extra level? Functions like
__create_hyp_mapping() don't need to know about this extra level as
hyp_pgd only knows about 4 levels but TTBR0_EL2 would point to the extra
level.

-- 
Catalin


* [RFC PATCH 3/3] arm64: KVM: use ID map with increased VA range if required
  2015-02-26 16:11   ` Catalin Marinas
@ 2015-02-26 16:29     ` Catalin Marinas
  2015-02-26 16:56     ` Ard Biesheuvel
  1 sibling, 0 replies; 17+ messages in thread
From: Catalin Marinas @ 2015-02-26 16:29 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Feb 26, 2015 at 04:11:47PM +0000, Catalin Marinas wrote:
> On Thu, Feb 26, 2015 at 03:29:07PM +0000, Ard Biesheuvel wrote:
> > +	/*
> > +	 * If we are running with VA_BITS < 48, we may be running with an extra
> > +	 * level of translation in the ID map. This is only the case if system
> > +	 * RAM is out of range for the currently configured page size and number
> > +	 * of translation levels, in which case we will also need the extra
> > +	 * level for the HYP ID map, or we won't be able to enable the EL2 MMU.
> > +	 *
> > +	 * However, at EL2, there is only one TTBR register, and we can't switch
> > +	 * between translation tables *and* update TCR_EL2.T0SZ at the same
> > +	 * time. Bottom line: we need the extra level in *both* our translation
> > +	 * tables.
> 
> It doesn't sound too nice but I'm not sure we can do much about it.
> 
> > +	 *
> > +	 * Fortunately, there is an easy way out: the existing ID map, with the
> > +	 * extra level, can be reused for both. The kernel image is already
> > +	 * identity mapped at a high virtual offset, which leaves VA_BITS of
> > +	 * address space at the low end to put our runtime HYP mappings.
> > +	 */
> 
> Another aspect here is that the Hyp VA starts at HYP_PAGE_OFFSET which
> is 1 << (VA_BITS - 1). On some platforms we may get an overlap with the
> physical memory (and the original idmap). Could we switch to the
> dedicated hyp TTBR at this point, with the extra level? Functions like
> __create_hyp_mapping() don't need to know about this extra level as
> hyp_pgd only knows about 4 levels but TTBR0_EL2 would point to the extra
> level.

I meant "hyp_pgd only knows about 3 levels (or 2, depending on the
configuration; same as the host kernel TTBR1_EL1)"

-- 
Catalin


* [RFC PATCH 3/3] arm64: KVM: use ID map with increased VA range if required
  2015-02-26 16:11   ` Catalin Marinas
  2015-02-26 16:29     ` Catalin Marinas
@ 2015-02-26 16:56     ` Ard Biesheuvel
  2015-02-26 18:07       ` Catalin Marinas
  1 sibling, 1 reply; 17+ messages in thread
From: Ard Biesheuvel @ 2015-02-26 16:56 UTC (permalink / raw)
  To: linux-arm-kernel

On 26 February 2015 at 16:11, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Thu, Feb 26, 2015 at 03:29:07PM +0000, Ard Biesheuvel wrote:
>> +     /*
>> +      * If we are running with VA_BITS < 48, we may be running with an extra
>> +      * level of translation in the ID map. This is only the case if system
>> +      * RAM is out of range for the currently configured page size and number
>> +      * of translation levels, in which case we will also need the extra
>> +      * level for the HYP ID map, or we won't be able to enable the EL2 MMU.
>> +      *
>> +      * However, at EL2, there is only one TTBR register, and we can't switch
>> +      * between translation tables *and* update TCR_EL2.T0SZ at the same
>> +      * time. Bottom line: we need the extra level in *both* our translation
>> +      * tables.
>
> It doesn't sound too nice but I'm not sure we can do much about it.
>

I don't think there is.

>> +      *
>> +      * Fortunately, there is an easy way out: the existing ID map, with the
>> +      * extra level, can be reused for both. The kernel image is already
>> +      * identity mapped at a high virtual offset, which leaves VA_BITS of
>> +      * address space at the low end to put our runtime HYP mappings.
>> +      */
>
> Another aspect here is that the Hyp VA starts at HYP_PAGE_OFFSET which
> is 1 << (VA_BITS - 1). On some platforms we may get an overlap with the
> physical memory (and the original idmap). Could we switch to the
> dedicated hyp TTBR at this point, with the extra level? Functions like
> __create_hyp_mapping() don't need to know about this extra level as
> hyp_pgd only knows about 4 levels but TTBR0_EL2 would point to the extra
> level.
>

The original ID map only covers the kernel image, not all of system
RAM (and it doesn't have to). The only way the hyp VA range could
overlap with the existing ID mapping while we are running with the
additional level is if KERNEL_START < (1<<VA_BITS) and KERNEL_END >=
(1<<VA_BITS). I don't think the pgtable code in head.S can deal with
that anyway atm. Otherwise, I don't see how they would ever overlap.

If we are not running with the additional level, then we keep the
separate translation tables. We could decide to reuse the existing ID
map in that case as well, but this is fairly trivial to implement so I
left it out for now.
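
To put numbers on that, for the VA_BITS == 39 configuration this was
tested with: the runtime HYP VAs live in [1 << 38, 1 << 39), i.e. the
256 GB to 512 GB range, and the extra level is only in use when the
kernel's physical address is at or above 1 << 39, so the ID map and the
HYP VA range can only collide in the straddling case described above.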


* [RFC PATCH 1/3] ARM, arm64: kvm: get rid of the bounce page
  2015-02-26 16:10   ` Marc Zyngier
@ 2015-02-26 17:31     ` Ard Biesheuvel
  2015-02-26 18:24       ` Marc Zyngier
  0 siblings, 1 reply; 17+ messages in thread
From: Ard Biesheuvel @ 2015-02-26 17:31 UTC (permalink / raw)
  To: linux-arm-kernel

On 26 February 2015 at 16:10, Marc Zyngier <marc.zyngier@arm.com> wrote:
> On 26/02/15 15:29, Ard Biesheuvel wrote:
>> The HYP init bounce page is a runtime construct that ensures that the
>> HYP init code does not cross a page boundary. However, this is something
>> we can do perfectly well at build time, by aligning the code appropriately.
>>
>> For arm64, we just align to 4 KB, and enforce that the code size is less
>> than 4 KB, regardless of the chosen page size.
>>
>> For ARM, the whole code is less than 256 bytes, so we tweak the linker
>> script to align at a power-of-2 upper bound of the code size.
>>
>> Note that this also fixes a benign off-by-one error in the original bounce
>> page code, where a bounce page would be allocated unnecessarily if the code
>> was exactly 1 page in size.
>
> I really like this simplification. Can you please check that it still
> works on 32-bit with this patch from Arnd?
>
> https://www.mail-archive.com/kvm@vger.kernel.org/msg112364.html
>

Yes, it does.

Note that the kernel's RODATA permissions shouldn't affect whether
this code is executable or not in HYP mode, so I think this code
belongs in RODATA in the 1st place.

> Another question below:
>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> ---
>>  arch/arm/kernel/vmlinux.lds.S   | 12 +++++++++---
>>  arch/arm/kvm/init.S             | 11 +++++++++++
>>  arch/arm/kvm/mmu.c              | 42 +++++------------------------------------
>>  arch/arm64/kernel/vmlinux.lds.S | 18 ++++++++++++------
>>  4 files changed, 37 insertions(+), 46 deletions(-)
>>
>> diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
>> index b31aa73e8076..8179d3903dee 100644
>> --- a/arch/arm/kernel/vmlinux.lds.S
>> +++ b/arch/arm/kernel/vmlinux.lds.S
>> @@ -23,7 +23,7 @@
>>       VMLINUX_SYMBOL(__idmap_text_start) = .;                         \
>>       *(.idmap.text)                                                  \
>>       VMLINUX_SYMBOL(__idmap_text_end) = .;                           \
>> -     . = ALIGN(32);                                                  \
>> +     . = ALIGN(1 << __hyp_idmap_align_order);                        \
>>       VMLINUX_SYMBOL(__hyp_idmap_text_start) = .;                     \
>>       *(.hyp.idmap.text)                                              \
>>       VMLINUX_SYMBOL(__hyp_idmap_text_end) = .;
>> @@ -346,8 +346,14 @@ SECTIONS
>>   */
>>  ASSERT((__proc_info_end - __proc_info_begin), "missing CPU support")
>>  ASSERT((__arch_info_end - __arch_info_begin), "no machine record defined")
>> +
>>  /*
>> - * The HYP init code can't be more than a page long.
>> + * The HYP init code can't be more than a page long,
>> + * and should not cross a page boundary.
>>   * The above comment applies as well.
>>   */
>> -ASSERT(((__hyp_idmap_text_end - __hyp_idmap_text_start) <= PAGE_SIZE), "HYP init code too big")
>> +ASSERT(((__hyp_idmap_text_end - 1) & PAGE_MASK) -
>> +     (__hyp_idmap_text_start & PAGE_MASK) == 0,
>> +     "HYP init code too big or unaligned")
>> +ASSERT(__hyp_idmap_size <= (1 << __hyp_idmap_align_order),
>> +     "__hyp_idmap_size should be <= (1 << __hyp_idmap_align_order)")
>> diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
>> index 3988e72d16ff..7a279bc8e0e1 100644
>> --- a/arch/arm/kvm/init.S
>> +++ b/arch/arm/kvm/init.S
>> @@ -157,3 +157,14 @@ target:  @ We're now in the trampoline code, switch page tables
>>  __kvm_hyp_init_end:
>>
>>       .popsection
>> +
>> +     /*
>> +      * When making changes to this file, make sure that the value of
>> +      * __hyp_idmap_align_order is updated if the size of the code ends up
>> +      * exceeding (1 << __hyp_idmap_align_order). This helps ensure that the
>> +      * code never crosses a page boundary, without wasting too much memory
>> +      * on aligning to PAGE_SIZE.
>> +      */
>> +     .global __hyp_idmap_size, __hyp_idmap_align_order
>> +     .set    __hyp_idmap_size, __kvm_hyp_init_end - __kvm_hyp_init
>> +     .set    __hyp_idmap_align_order, 8
>
> Is there a way to generate this __hyp_idmap_align_order automatically?
> We're already pretty close to this order-8 (256 byte) limit...
>

This seems to work:

#define HYP_IDMAP_ALIGN \
__hyp_idmap_size <= 0x100 ? 0x100 : \
__hyp_idmap_size <= 0x200 ? 0x200 : \
__hyp_idmap_size <= 0x400 ? 0x400 : \
__hyp_idmap_size <= 0x800 ? 0x800 : 0x1000

and

. = ALIGN(HYP_IDMAP_ALIGN); \

and we are limited at 1 page anyway.

Should I respin and include the move to RODATA at the same time?
Or would you like me to rebase onto Arnd's patch?

-- 
Ard.

>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 3e6859bc3e11..42a24d6b003b 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -37,7 +37,6 @@ static pgd_t *boot_hyp_pgd;
>>  static pgd_t *hyp_pgd;
>>  static DEFINE_MUTEX(kvm_hyp_pgd_mutex);
>>
>> -static void *init_bounce_page;
>>  static unsigned long hyp_idmap_start;
>>  static unsigned long hyp_idmap_end;
>>  static phys_addr_t hyp_idmap_vector;
>> @@ -405,9 +404,6 @@ void free_boot_hyp_pgd(void)
>>       if (hyp_pgd)
>>               unmap_range(NULL, hyp_pgd, TRAMPOLINE_VA, PAGE_SIZE);
>>
>> -     free_page((unsigned long)init_bounce_page);
>> -     init_bounce_page = NULL;
>> -
>>       mutex_unlock(&kvm_hyp_pgd_mutex);
>>  }
>>
>> @@ -1498,39 +1494,11 @@ int kvm_mmu_init(void)
>>       hyp_idmap_end = kvm_virt_to_phys(__hyp_idmap_text_end);
>>       hyp_idmap_vector = kvm_virt_to_phys(__kvm_hyp_init);
>>
>> -     if ((hyp_idmap_start ^ hyp_idmap_end) & PAGE_MASK) {
>> -             /*
>> -              * Our init code is crossing a page boundary. Allocate
>> -              * a bounce page, copy the code over and use that.
>> -              */
>> -             size_t len = __hyp_idmap_text_end - __hyp_idmap_text_start;
>> -             phys_addr_t phys_base;
>> -
>> -             init_bounce_page = (void *)__get_free_page(GFP_KERNEL);
>> -             if (!init_bounce_page) {
>> -                     kvm_err("Couldn't allocate HYP init bounce page\n");
>> -                     err = -ENOMEM;
>> -                     goto out;
>> -             }
>> -
>> -             memcpy(init_bounce_page, __hyp_idmap_text_start, len);
>> -             /*
>> -              * Warning: the code we just copied to the bounce page
>> -              * must be flushed to the point of coherency.
>> -              * Otherwise, the data may be sitting in L2, and HYP
>> -              * mode won't be able to observe it as it runs with
>> -              * caches off at that point.
>> -              */
>> -             kvm_flush_dcache_to_poc(init_bounce_page, len);
>> -
>> -             phys_base = kvm_virt_to_phys(init_bounce_page);
>> -             hyp_idmap_vector += phys_base - hyp_idmap_start;
>> -             hyp_idmap_start = phys_base;
>> -             hyp_idmap_end = phys_base + len;
>> -
>> -             kvm_info("Using HYP init bounce page @%lx\n",
>> -                      (unsigned long)phys_base);
>> -     }
>> +     /*
>> +      * We rely on the linker script to ensure at build time that the HYP
>> +      * init code does not cross a page boundary.
>> +      */
>> +     BUG_ON((hyp_idmap_start ^ (hyp_idmap_end - 1)) & PAGE_MASK);
>>
>>       hyp_pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, hyp_pgd_order);
>>       boot_hyp_pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, hyp_pgd_order);
>> diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
>> index 5d9d2dca530d..17383c257a7d 100644
>> --- a/arch/arm64/kernel/vmlinux.lds.S
>> +++ b/arch/arm64/kernel/vmlinux.lds.S
>> @@ -23,10 +23,14 @@ jiffies = jiffies_64;
>>
>>  #define HYPERVISOR_TEXT                                      \
>>       /*                                              \
>> -      * Force the alignment to be compatible with    \
>> -      * the vectors requirements                     \
>> +      * Align to 4K so that                          \
>> +      * a) the HYP vector table is at its minimum    \
>> +      *    alignment of 2048 bytes                   \
>> +      * b) the HYP init code will not cross a page   \
>> +      *    boundary if its size does not exceed      \
>> +      *    4K (see related ASSERT() below)           \
>>        */                                             \
>> -     . = ALIGN(2048);                                \
>> +     . = ALIGN(SZ_4K);                               \
>>       VMLINUX_SYMBOL(__hyp_idmap_text_start) = .;     \
>>       *(.hyp.idmap.text)                              \
>>       VMLINUX_SYMBOL(__hyp_idmap_text_end) = .;       \
>> @@ -163,10 +167,12 @@ SECTIONS
>>  }
>>
>>  /*
>> - * The HYP init code can't be more than a page long.
>> + * The HYP init code can't be more than a page long,
>> + * and should not cross a page boundary.
>>   */
>> -ASSERT(((__hyp_idmap_text_start + PAGE_SIZE) > __hyp_idmap_text_end),
>> -       "HYP init code too big")
>> +ASSERT(((__hyp_idmap_text_end - 1) & ~(SZ_4K - 1)) -
>> +     (__hyp_idmap_text_start & ~(SZ_4K - 1)) == 0,
>> +     "HYP init code too big or unaligned")
>>
>>  /*
>>   * If padding is applied before .head.text, virt<->phys conversions will fail.
>>
>
> Otherwise looks pretty good.
>
>         M.
> --
> Jazz is not dead. It just smells funny...


* [RFC PATCH 3/3] arm64: KVM: use ID map with increased VA range if required
  2015-02-26 15:29 ` [RFC PATCH 3/3] arm64: KVM: use ID map with increased VA range if required Ard Biesheuvel
  2015-02-26 16:11   ` Catalin Marinas
@ 2015-02-26 18:06   ` Marc Zyngier
  2015-02-26 18:17     ` Ard Biesheuvel
  1 sibling, 1 reply; 17+ messages in thread
From: Marc Zyngier @ 2015-02-26 18:06 UTC (permalink / raw)
  To: linux-arm-kernel

On 26/02/15 15:29, Ard Biesheuvel wrote:
> This patch modifies the HYP init code so it can deal with system
> RAM residing at an offset which exceeds the reach of VA_BITS.
> 
> Like for EL1, this involves configuring an additional level of
> translation for the ID map. However, in case of EL2, this implies
> that all translations use the extra level, as we cannot seamlessly
> switch between translation tables with different numbers of
> translation levels. For this reason, the ID map is merged with
> the runtime HYP map, since they don't overlap anyway.
> 
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
>  arch/arm64/kvm/hyp-init.S | 40 ++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 38 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp-init.S b/arch/arm64/kvm/hyp-init.S
> index c3191168a994..0af16bce6316 100644
> --- a/arch/arm64/kvm/hyp-init.S
> +++ b/arch/arm64/kvm/hyp-init.S
> @@ -20,6 +20,7 @@
>  #include <asm/assembler.h>
>  #include <asm/kvm_arm.h>
>  #include <asm/kvm_mmu.h>
> +#include <asm/pgtable-hwdef.h>
>  
>  	.text
>  	.pushsection	.hyp.idmap.text, "ax"
> @@ -58,13 +59,44 @@ __invalid:
>  	 */
>  __do_hyp_init:
>  
> -	msr	ttbr0_el2, x0
> -
>  	mrs	x4, tcr_el1
>  	ldr	x5, =TCR_EL2_MASK
>  	and	x4, x4, x5
>  	ldr	x5, =TCR_EL2_FLAGS
>  	orr	x4, x4, x5
> +
> +#ifndef CONFIG_ARM64_VA_BITS_48
> +	/*
> +	 * If we are running with VA_BITS < 48, we may be running with an extra
> +	 * level of translation in the ID map. This is only the case if system
> +	 * RAM is out of range for the currently configured page size and number
> +	 * of translation levels, in which case we will also need the extra
> +	 * level for the HYP ID map, or we won't be able to enable the EL2 MMU.
> +	 *
> +	 * However, at EL2, there is only one TTBR register, and we can't switch
> +	 * between translation tables *and* update TCR_EL2.T0SZ at the same
> +	 * time. Bottom line: we need the extra level in *both* our translation
> +	 * tables.
> +	 *
> +	 * Fortunately, there is an easy way out: the existing ID map, with the
> +	 * extra level, can be reused for both. The kernel image is already
> +	 * identity mapped at a high virtual offset, which leaves VA_BITS of
> +	 * address space at the low end to put our runtime HYP mappings.
> +	 */
> +	adrp	x5, idmap_t0sz			// get ID map TCR.T0SZ
> +	ldr	x5, [x5, :lo12:idmap_t0sz]
> +	cmp	x5, TCR_T0SZ(VA_BITS)		// extra level configured?
> +	b.ge	1f				// if not, skip
> +
> +	bfi	x4, x5, TCR_T0SZ_OFFSET, TCR_TxSZ_WIDTH
> +
> +	adrp	x0, idmap_pg_dir	// get root of ID map
> +	orr	x5, x1, PMD_TYPE_TABLE	// table entry pointing at HYP pgd
> +	str	x5, [x0]		// store at offset #0
> +	mov	x1, #0
> +1:

Wow. This is making my head spin a bit.

Nitpick: shouldn't you use PUD_TYPE_TABLE instead of PMD_TYPE_TABLE
(yeah, I know, that's the same thing...)?

> +#endif
> +	msr	ttbr0_el2, x0
>  	msr	tcr_el2, x4
>  
>  	ldr	x4, =VTCR_EL2_FLAGS
> @@ -91,6 +123,9 @@ __do_hyp_init:
>  	msr	sctlr_el2, x4
>  	isb
>  
> +	/* Skip the trampoline dance if we merged the boot and runtime PGDs */
> +	cbz	x1, merged
> +
>  	/* MMU is now enabled. Get ready for the trampoline dance */
>  	ldr	x4, =TRAMPOLINE_VA
>  	adr	x5, target
> @@ -105,6 +140,7 @@ target: /* We're now in the trampoline code, switch page tables */
>  	tlbi	alle2
>  	dsb	sy
>  
> +merged:
>  	/* Set the stack and new vectors */
>  	kern_hyp_va	x2
>  	mov	sp, x2
> 

The one thing that worries me here is that our EL1 idmap is not a strict
idmap anymore. I really wonder what we can break with that...

But hey, it's so mad I like it! ;-)

	M.
-- 
Jazz is not dead. It just smells funny...


* [RFC PATCH 3/3] arm64: KVM: use ID map with increased VA range if required
  2015-02-26 16:56     ` Ard Biesheuvel
@ 2015-02-26 18:07       ` Catalin Marinas
  0 siblings, 0 replies; 17+ messages in thread
From: Catalin Marinas @ 2015-02-26 18:07 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Feb 26, 2015 at 04:56:32PM +0000, Ard Biesheuvel wrote:
> On 26 February 2015 at 16:11, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Thu, Feb 26, 2015 at 03:29:07PM +0000, Ard Biesheuvel wrote:
> >> +     /*
> >> +      * If we are running with VA_BITS < 48, we may be running with an extra
> >> +      * level of translation in the ID map. This is only the case if system
> >> +      * RAM is out of range for the currently configured page size and number
> >> +      * of translation levels, in which case we will also need the extra
> >> +      * level for the HYP ID map, or we won't be able to enable the EL2 MMU.
> >> +      *
> >> +      * However, at EL2, there is only one TTBR register, and we can't switch
> >> +      * between translation tables *and* update TCR_EL2.T0SZ at the same
> >> +      * time. Bottom line: we need the extra level in *both* our translation
> >> +      * tables.
> >
> > It doesn't sound too nice but I'm not sure we can do much about it.
> 
> I don't think there is.

Talking to Marc, I think there are some tricks which involve setting
HYP_PAGE_OFFSET to 0 and the level 0 table page would have the first
entry pointing to level 2 of hyp_pgd directly (skipping level 1). While
keeping the same TTBR0_EL2 as idmap, first jump to a low address. The
page table walker only reads 3 levels before it hits a block entry and
treats that as 2MB section. Once you run in this low VA, change T0SZ to
3 levels and the same entry now gets down to 4K page mappings.

But even if the above works, I think it's too complicated and we won't
understand the code anymore.

> >> +      *
> >> +      * Fortunately, there is an easy way out: the existing ID map, with the
> >> +      * extra level, can be reused for both. The kernel image is already
> >> +      * identity mapped at a high virtual offset, which leaves VA_BITS of
> >> +      * address space at the low end to put our runtime HYP mappings.
> >> +      */
> >
> > Another aspect here is that the Hyp VA starts at HYP_PAGE_OFFSET which
> > is 1 << (VA_BITS - 1). On some platforms we may get an overlap with the
> > physical memory (and the original idmap). Could we switch to the
> > dedicated hyp TTBR at this point, with the extra level? Functions like
> > __create_hyp_mapping() don't need to know about this extra level as
> > hyp_pgd only knows about 4 levels but TTBR0_EL2 would point to the extra
> > level.
> 
> The original id map only covers the kernel image, not all of system
> ram (and it doesn't have to). The only way the hyp VA range could
> overlap with the existing ID mapping while we are running with the
> additional level is if KERNEL_START < (1<<VA_BITS) and KERNEL_END >=
> (1<<VA_BITS). I don't think the pgtable code in head.S can deal with
> that anyway atm. Otherwise, I don't see how they would ever overlap.
> 
> If we are not running with the additional level, then we keep the
> separate translation tables. We could decide to reuse the existing ID
> map in that case as well, but this is fairly trivial to implement so I
> left it out for now.

This last case is what I was confused about. So with KERNEL_END within
VA_BITS, we don't get the additional level anyway, and the above idmap
tricks aren't needed.

The only downside is that the kernel idmap now has all the hyp mappings.
I don't think it's a problem.

-- 
Catalin


* [RFC PATCH 3/3] arm64: KVM: use ID map with increased VA range if required
  2015-02-26 18:06   ` Marc Zyngier
@ 2015-02-26 18:17     ` Ard Biesheuvel
  2015-02-26 18:51       ` Marc Zyngier
  0 siblings, 1 reply; 17+ messages in thread
From: Ard Biesheuvel @ 2015-02-26 18:17 UTC (permalink / raw)
  To: linux-arm-kernel

On 26 February 2015 at 18:06, Marc Zyngier <marc.zyngier@arm.com> wrote:
> On 26/02/15 15:29, Ard Biesheuvel wrote:
>> This patch modifies the HYP init code so it can deal with system
>> RAM residing at an offset which exceeds the reach of VA_BITS.
>>
>> Like for EL1, this involves configuring an additional level of
>> translation for the ID map. However, in case of EL2, this implies
>> that all translations use the extra level, as we cannot seamlessly
>> switch between translation tables with different numbers of
>> translation levels. For this reason, the ID map is merged with
>> the runtime HYP map, since they don't overlap anyway.
>>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> ---
>>  arch/arm64/kvm/hyp-init.S | 40 ++++++++++++++++++++++++++++++++++++++--
>>  1 file changed, 38 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/hyp-init.S b/arch/arm64/kvm/hyp-init.S
>> index c3191168a994..0af16bce6316 100644
>> --- a/arch/arm64/kvm/hyp-init.S
>> +++ b/arch/arm64/kvm/hyp-init.S
>> @@ -20,6 +20,7 @@
>>  #include <asm/assembler.h>
>>  #include <asm/kvm_arm.h>
>>  #include <asm/kvm_mmu.h>
>> +#include <asm/pgtable-hwdef.h>
>>
>>       .text
>>       .pushsection    .hyp.idmap.text, "ax"
>> @@ -58,13 +59,44 @@ __invalid:
>>        */
>>  __do_hyp_init:
>>
>> -     msr     ttbr0_el2, x0
>> -
>>       mrs     x4, tcr_el1
>>       ldr     x5, =TCR_EL2_MASK
>>       and     x4, x4, x5
>>       ldr     x5, =TCR_EL2_FLAGS
>>       orr     x4, x4, x5
>> +
>> +#ifndef CONFIG_ARM64_VA_BITS_48
>> +     /*
>> +      * If we are running with VA_BITS < 48, we may be running with an extra
>> +      * level of translation in the ID map. This is only the case if system
>> +      * RAM is out of range for the currently configured page size and number
>> +      * of translation levels, in which case we will also need the extra
>> +      * level for the HYP ID map, or we won't be able to enable the EL2 MMU.
>> +      *
>> +      * However, at EL2, there is only one TTBR register, and we can't switch
>> +      * between translation tables *and* update TCR_EL2.T0SZ at the same
>> +      * time. Bottom line: we need the extra level in *both* our translation
>> +      * tables.
>> +      *
>> +      * Fortunately, there is an easy way out: the existing ID map, with the
>> +      * extra level, can be reused for both. The kernel image is already
>> +      * identity mapped at a high virtual offset, which leaves VA_BITS of
>> +      * address space at the low end to put our runtime HYP mappings.
>> +      */
>> +     adrp    x5, idmap_t0sz                  // get ID map TCR.T0SZ
>> +     ldr     x5, [x5, :lo12:idmap_t0sz]
>> +     cmp     x5, TCR_T0SZ(VA_BITS)           // extra level configured?
>> +     b.ge    1f                              // if not, skip
>> +
>> +     bfi     x4, x5, TCR_T0SZ_OFFSET, TCR_TxSZ_WIDTH
>> +
>> +     adrp    x0, idmap_pg_dir        // get root of ID map
>> +     orr     x5, x1, PMD_TYPE_TABLE  // table entry pointing at HYP pgd
>> +     str     x5, [x0]                // store at offset #0
>> +     mov     x1, #0
>> +1:
>
> Wow. This is making my head spin a bit.
>
> Nitpick: shouldn't you use PUD_TYPE_TABLE instead of PMD_TYPE_TABLE
> (yeah, I know, that's the same thing...)?
>

I think both are equally inappropriate, considering that this is the
level above PGD. But I am happy to switch one for the other ...


>> +#endif
>> +     msr     ttbr0_el2, x0
>>       msr     tcr_el2, x4
>>
>>       ldr     x4, =VTCR_EL2_FLAGS
>> @@ -91,6 +123,9 @@ __do_hyp_init:
>>       msr     sctlr_el2, x4
>>       isb
>>
>> +     /* Skip the trampoline dance if we merged the boot and runtime PGDs */
>> +     cbz     x1, merged
>> +
>>       /* MMU is now enabled. Get ready for the trampoline dance */
>>       ldr     x4, =TRAMPOLINE_VA
>>       adr     x5, target
>> @@ -105,6 +140,7 @@ target: /* We're now in the trampoline code, switch page tables */
>>       tlbi    alle2
>>       dsb     sy
>>
>> +merged:
>>       /* Set the stack and new vectors */
>>       kern_hyp_va     x2
>>       mov     sp, x2
>>
>
> The one thing that worries me here is that our EL1 idmap is not a strict
> idmap anymore. I really wonder what we can break with that...
>

Yes, that is the question I had myself. The other issue is that we
never remove the entry we add at the root level in this code, which I
don't think matters that much but it would be good if someone could
confirm.

> But hey, it's so mad I like it! ;-)
>

Thanks!

-- 
Ard.


* [RFC PATCH 1/3] ARM, arm64: kvm: get rid of the bounce page
  2015-02-26 17:31     ` Ard Biesheuvel
@ 2015-02-26 18:24       ` Marc Zyngier
  2015-02-26 18:41         ` Ard Biesheuvel
  0 siblings, 1 reply; 17+ messages in thread
From: Marc Zyngier @ 2015-02-26 18:24 UTC (permalink / raw)
  To: linux-arm-kernel

On 26/02/15 17:31, Ard Biesheuvel wrote:
> On 26 February 2015 at 16:10, Marc Zyngier <marc.zyngier@arm.com> wrote:
>> On 26/02/15 15:29, Ard Biesheuvel wrote:
>>> The HYP init bounce page is a runtime construct that ensures that the
>>> HYP init code does not cross a page boundary. However, this is something
>>> we can do perfectly well at build time, by aligning the code appropriately.
>>>
>>> For arm64, we just align to 4 KB, and enforce that the code size is less
>>> than 4 KB, regardless of the chosen page size.
>>>
>>> For ARM, the whole code is less than 256 bytes, so we tweak the linker
>>> script to align at a power-of-2 upper bound of the code size.
>>>
>>> Note that this also fixes a benign off-by-one error in the original bounce
>>> page code, where a bounce page would be allocated unnecessarily if the code
>>> was exactly 1 page in size.
>>
>> I really like this simplification. Can you please check that it still
>> works on 32-bit with this patch from Arnd?
>>
>> https://www.mail-archive.com/kvm@vger.kernel.org/msg112364.html
>>
> 
> Yes, it does.
> 
> Note that the kernel's RODATA permissions shouldn't affect whether
> this code is executable or not in HYP mode, so I think this code
> belongs in RODATA in the 1st place.

Yup. Maybe we should do the same for arm64, shouldn't we?

>> Another question below:
>>
>>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>>> ---
>>>  arch/arm/kernel/vmlinux.lds.S   | 12 +++++++++---
>>>  arch/arm/kvm/init.S             | 11 +++++++++++
>>>  arch/arm/kvm/mmu.c              | 42 +++++------------------------------------
>>>  arch/arm64/kernel/vmlinux.lds.S | 18 ++++++++++++------
>>>  4 files changed, 37 insertions(+), 46 deletions(-)
>>>
>>> diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
>>> index b31aa73e8076..8179d3903dee 100644
>>> --- a/arch/arm/kernel/vmlinux.lds.S
>>> +++ b/arch/arm/kernel/vmlinux.lds.S
>>> @@ -23,7 +23,7 @@
>>>       VMLINUX_SYMBOL(__idmap_text_start) = .;                         \
>>>       *(.idmap.text)                                                  \
>>>       VMLINUX_SYMBOL(__idmap_text_end) = .;                           \
>>> -     . = ALIGN(32);                                                  \
>>> +     . = ALIGN(1 << __hyp_idmap_align_order);                        \
>>>       VMLINUX_SYMBOL(__hyp_idmap_text_start) = .;                     \
>>>       *(.hyp.idmap.text)                                              \
>>>       VMLINUX_SYMBOL(__hyp_idmap_text_end) = .;
>>> @@ -346,8 +346,14 @@ SECTIONS
>>>   */
>>>  ASSERT((__proc_info_end - __proc_info_begin), "missing CPU support")
>>>  ASSERT((__arch_info_end - __arch_info_begin), "no machine record defined")
>>> +
>>>  /*
>>> - * The HYP init code can't be more than a page long.
>>> + * The HYP init code can't be more than a page long,
>>> + * and should not cross a page boundary.
>>>   * The above comment applies as well.
>>>   */
>>> -ASSERT(((__hyp_idmap_text_end - __hyp_idmap_text_start) <= PAGE_SIZE), "HYP init code too big")
>>> +ASSERT(((__hyp_idmap_text_end - 1) & PAGE_MASK) -
>>> +     (__hyp_idmap_text_start & PAGE_MASK) == 0,
>>> +     "HYP init code too big or unaligned")
>>> +ASSERT(__hyp_idmap_size <= (1 << __hyp_idmap_align_order),
>>> +     "__hyp_idmap_size should be <= (1 << __hyp_idmap_align_order)")
>>> diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
>>> index 3988e72d16ff..7a279bc8e0e1 100644
>>> --- a/arch/arm/kvm/init.S
>>> +++ b/arch/arm/kvm/init.S
>>> @@ -157,3 +157,14 @@ target:  @ We're now in the trampoline code, switch page tables
>>>  __kvm_hyp_init_end:
>>>
>>>       .popsection
>>> +
>>> +     /*
>>> +      * When making changes to this file, make sure that the value of
>>> +      * __hyp_idmap_align_order is updated if the size of the code ends up
>>> +      * exceeding (1 << __hyp_idmap_align_order). This helps ensure that the
>>> +      * code never crosses a page boundary, without wasting too much memory
>>> +      * on aligning to PAGE_SIZE.
>>> +      */
>>> +     .global __hyp_idmap_size, __hyp_idmap_align_order
>>> +     .set    __hyp_idmap_size, __kvm_hyp_init_end - __kvm_hyp_init
>>> +     .set    __hyp_idmap_align_order, 8
>>
>> Is there a way to generate this __hyp_idmap_align_order automatically?
>> We're already pretty close to this 256-byte (order-8) limit...
>>
> 
> This seems to work:
> 
> #define HYP_IDMAP_ALIGN \
> __hyp_idmap_size <= 0x100 ? 0x100 : \
> __hyp_idmap_size <= 0x200 ? 0x200 : \
> __hyp_idmap_size <= 0x400 ? 0x400 : \
> __hyp_idmap_size <= 0x800 ? 0x800 : 0x1000
> 
> and
> 
> . = ALIGN(HYP_IDMAP_ALIGN); \
> 
> and we are limited at 1 page anyway.

Ah, excellent.

> Should I respin and include the move to RODATA at the same time?
> Or would you like me to rebase onto Arnd's patch?

[adding Arnd on CC]

Rebasing on top of Arnd's patch seems fair, as he came up with the idea
in the first place.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 17+ messages in thread
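
For readers following the linker-script details above, the new ASSERT
and the HYP_IDMAP_ALIGN ternary chain can be paraphrased in C. This is
an illustrative sketch only: fits_in_one_page() and hyp_idmap_align()
are names made up here, and the real checks run at link time, not at
runtime.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define PAGE_SHIFT 12
    #define PAGE_SIZE  (1UL << PAGE_SHIFT)
    #define PAGE_MASK  (~(PAGE_SIZE - 1))

    /* True if [start, start + size) lies entirely within one page. */
    static bool fits_in_one_page(uintptr_t start, size_t size)
    {
            return ((start + size - 1) & PAGE_MASK) == (start & PAGE_MASK);
    }

    /*
     * Smallest power-of-two alignment covering the code, capped at one
     * page: the same clamped upper bound that the HYP_IDMAP_ALIGN chain
     * of ternaries computes in the linker script.
     */
    static size_t hyp_idmap_align(size_t size)
    {
            size_t align = 0x100;   /* the 32-bit init code is < 256 bytes */

            while (align < size && align < PAGE_SIZE)
                    align <<= 1;
            return align;
    }

Note that computing with (start + size - 1) lets an exactly page-sized,
page-aligned region pass, which is the benign off-by-one the commit
message says the old runtime bounce-page test got wrong.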

* [RFC PATCH 1/3] ARM, arm64: kvm: get rid of the bounce page
  2015-02-26 18:24       ` Marc Zyngier
@ 2015-02-26 18:41         ` Ard Biesheuvel
  2015-02-27  8:19           ` Ard Biesheuvel
  0 siblings, 1 reply; 17+ messages in thread
From: Ard Biesheuvel @ 2015-02-26 18:41 UTC (permalink / raw)
  To: linux-arm-kernel

On 26 February 2015 at 18:24, Marc Zyngier <marc.zyngier@arm.com> wrote:
> On 26/02/15 17:31, Ard Biesheuvel wrote:
>> On 26 February 2015 at 16:10, Marc Zyngier <marc.zyngier@arm.com> wrote:
>>> On 26/02/15 15:29, Ard Biesheuvel wrote:
>>>> The HYP init bounce page is a runtime construct that ensures that the
>>>> HYP init code does not cross a page boundary. However, this is something
>>>> we can do perfectly well at build time, by aligning the code appropriately.
>>>>
>>>> For arm64, we just align to 4 KB, and enforce that the code size is less
>>>> than 4 KB, regardless of the chosen page size.
>>>>
>>>> For ARM, the whole code is less than 256 bytes, so we tweak the linker
>>>> script to align at a power of 2 upper bound of the code size
>>>>
>>>> Note that this also fixes a benign off-by-one error in the original bounce
>>>> page code, where a bounce page would be allocated unnecessarily if the code
>>>> was exactly 1 page in size.
>>>
>>> I really like this simplification. Can you please check that it still
>>> works on 32-bit with this patch from Arnd?
>>>
>>> https://www.mail-archive.com/kvm@vger.kernel.org/msg112364.html
>>>
>>
>> Yes, it does.
>>
>> Note that the kernel's RODATA permissions shouldn't affect whether
>> this code is executable or not in HYP mode, so I think this code
>> belongs in RODATA in the 1st place.
>
> Yup. Maybe we should do the same for arm64, shouldn't we?
>

In fact, patch 2/2 of the series I sent earlier today has a similar
side effect, by putting the HYP idmap before _stext.

>>> Another question below:
>>>
>>>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>>>> ---
>>>>  arch/arm/kernel/vmlinux.lds.S   | 12 +++++++++---
>>>>  arch/arm/kvm/init.S             | 11 +++++++++++
>>>>  arch/arm/kvm/mmu.c              | 42 +++++------------------------------------
>>>>  arch/arm64/kernel/vmlinux.lds.S | 18 ++++++++++++------
>>>>  4 files changed, 37 insertions(+), 46 deletions(-)
>>>>
>>>> diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
>>>> index b31aa73e8076..8179d3903dee 100644
>>>> --- a/arch/arm/kernel/vmlinux.lds.S
>>>> +++ b/arch/arm/kernel/vmlinux.lds.S
>>>> @@ -23,7 +23,7 @@
>>>>       VMLINUX_SYMBOL(__idmap_text_start) = .;                         \
>>>>       *(.idmap.text)                                                  \
>>>>       VMLINUX_SYMBOL(__idmap_text_end) = .;                           \
>>>> -     . = ALIGN(32);                                                  \
>>>> +     . = ALIGN(1 << __hyp_idmap_align_order);                        \
>>>>       VMLINUX_SYMBOL(__hyp_idmap_text_start) = .;                     \
>>>>       *(.hyp.idmap.text)                                              \
>>>>       VMLINUX_SYMBOL(__hyp_idmap_text_end) = .;
>>>> @@ -346,8 +346,14 @@ SECTIONS
>>>>   */
>>>>  ASSERT((__proc_info_end - __proc_info_begin), "missing CPU support")
>>>>  ASSERT((__arch_info_end - __arch_info_begin), "no machine record defined")
>>>> +
>>>>  /*
>>>> - * The HYP init code can't be more than a page long.
>>>> + * The HYP init code can't be more than a page long,
>>>> + * and should not cross a page boundary.
>>>>   * The above comment applies as well.
>>>>   */
>>>> -ASSERT(((__hyp_idmap_text_end - __hyp_idmap_text_start) <= PAGE_SIZE), "HYP init code too big")
>>>> +ASSERT(((__hyp_idmap_text_end - 1) & PAGE_MASK) -
>>>> +     (__hyp_idmap_text_start & PAGE_MASK) == 0,
>>>> +     "HYP init code too big or unaligned")
>>>> +ASSERT(__hyp_idmap_size <= (1 << __hyp_idmap_align_order),
>>>> +     "__hyp_idmap_size should be <= (1 << __hyp_idmap_align_order)")
>>>> diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
>>>> index 3988e72d16ff..7a279bc8e0e1 100644
>>>> --- a/arch/arm/kvm/init.S
>>>> +++ b/arch/arm/kvm/init.S
>>>> @@ -157,3 +157,14 @@ target:  @ We're now in the trampoline code, switch page tables
>>>>  __kvm_hyp_init_end:
>>>>
>>>>       .popsection
>>>> +
>>>> +     /*
>>>> +      * When making changes to this file, make sure that the value of
>>>> +      * __hyp_idmap_align_order is updated if the size of the code ends up
>>>> +      * exceeding (1 << __hyp_idmap_align_order). This helps ensure that the
>>>> +      * code never crosses a page boundary, without wasting too much memory
>>>> +      * on aligning to PAGE_SIZE.
>>>> +      */
>>>> +     .global __hyp_idmap_size, __hyp_idmap_align_order
>>>> +     .set    __hyp_idmap_size, __kvm_hyp_init_end - __kvm_hyp_init
>>>> +     .set    __hyp_idmap_align_order, 8
>>>
>>> Is there a way to generate this __hyp_idmap_align_order automatically?
>>> We're already pretty close to this 256-byte (order-8) limit...
>>>
>>
>> This seems to work:
>>
>> #define HYP_IDMAP_ALIGN \
>> __hyp_idmap_size <= 0x100 ? 0x100 : \
>> __hyp_idmap_size <= 0x200 ? 0x200 : \
>> __hyp_idmap_size <= 0x400 ? 0x400 : \
>> __hyp_idmap_size <= 0x800 ? 0x800 : 0x1000
>>
>> and
>>
>> . = ALIGN(HYP_IDMAP_ALIGN); \
>>
>> and we are limited at 1 page anyway.
>
> Ah, excellent.
>
>> Should I respin and include the move to RODATA at the same time?
>> Or would you like me to rebase onto Arnd's patch?
>
> [adding Arnd on CC]
>
> Rebasing on top of Arnd's patch seems fair, as he came up with the idea
>> in the first place.
>

OK, that's fine.

@Arnd: I think you should move the ALIGN(32) along with the idmap bits
into .rodata.
Could you cc me on the updated patch? I will rebase this on top of it then.

Thanks,
Ard.

^ permalink raw reply	[flat|nested] 17+ messages in thread
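
For context on what removing the bounce page buys on the C side: with
the alignment enforced at link time, the runtime setup no longer needs
to allocate a page and copy the init code into it. A rough sketch of
the resulting shape follows; create_hyp_idmap() is an assumed name for
illustration, not a function from the patch, which actually reworks
arch/arm/kvm/mmu.c.

    extern char __hyp_idmap_text_start[], __hyp_idmap_text_end[];

    static phys_addr_t hyp_idmap_start, hyp_idmap_end;

    static int init_hyp_idmap(void)
    {
            hyp_idmap_start = virt_to_phys(__hyp_idmap_text_start);
            hyp_idmap_end   = virt_to_phys(__hyp_idmap_text_end);

            /*
             * The linker script now guarantees that the init code does
             * not cross a page boundary, so it can be identity mapped
             * in place: no bounce page allocation, no memcpy.
             */
            return create_hyp_idmap(hyp_idmap_start, hyp_idmap_end);
    }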

* [RFC PATCH 3/3] arm64: KVM: use ID map with increased VA range if required
  2015-02-26 18:17     ` Ard Biesheuvel
@ 2015-02-26 18:51       ` Marc Zyngier
  2015-02-26 19:04         ` Ard Biesheuvel
  0 siblings, 1 reply; 17+ messages in thread
From: Marc Zyngier @ 2015-02-26 18:51 UTC (permalink / raw)
  To: linux-arm-kernel

On 26/02/15 18:17, Ard Biesheuvel wrote:
> On 26 February 2015 at 18:06, Marc Zyngier <marc.zyngier@arm.com> wrote:
>> On 26/02/15 15:29, Ard Biesheuvel wrote:
>>> This patch modifies the HYP init code so it can deal with system
>>> RAM residing at an offset which exceeds the reach of VA_BITS.
>>>
>>> Like for EL1, this involves configuring an additional level of
>>> translation for the ID map. However, in case of EL2, this implies
>>> that all translations use the extra level, as we cannot seamlessly
>>> switch between translation tables with different numbers of
>>> translation levels. For this reason, the ID map is merged with
>>> the runtime HYP map, since they don't overlap anyway.
>>>
>>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>>> ---
>>>  arch/arm64/kvm/hyp-init.S | 40 ++++++++++++++++++++++++++++++++++++++--
>>>  1 file changed, 38 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/arm64/kvm/hyp-init.S b/arch/arm64/kvm/hyp-init.S
>>> index c3191168a994..0af16bce6316 100644
>>> --- a/arch/arm64/kvm/hyp-init.S
>>> +++ b/arch/arm64/kvm/hyp-init.S
>>> @@ -20,6 +20,7 @@
>>>  #include <asm/assembler.h>
>>>  #include <asm/kvm_arm.h>
>>>  #include <asm/kvm_mmu.h>
>>> +#include <asm/pgtable-hwdef.h>
>>>
>>>       .text
>>>       .pushsection    .hyp.idmap.text, "ax"
>>> @@ -58,13 +59,44 @@ __invalid:
>>>        */
>>>  __do_hyp_init:
>>>
>>> -     msr     ttbr0_el2, x0
>>> -
>>>       mrs     x4, tcr_el1
>>>       ldr     x5, =TCR_EL2_MASK
>>>       and     x4, x4, x5
>>>       ldr     x5, =TCR_EL2_FLAGS
>>>       orr     x4, x4, x5
>>> +
>>> +#ifndef CONFIG_ARM64_VA_BITS_48
>>> +     /*
>>> +      * If we are running with VA_BITS < 48, we may be running with an extra
>>> +      * level of translation in the ID map. This is only the case if system
>>> +      * RAM is out of range for the currently configured page size and number
>>> +      * of translation levels, in which case we will also need the extra
>>> +      * level for the HYP ID map, or we won't be able to enable the EL2 MMU.
>>> +      *
>>> +      * However, at EL2, there is only one TTBR register, and we can't switch
>>> +      * between translation tables *and* update TCR_EL2.T0SZ at the same
>>> +      * time. Bottom line: we need the extra level in *both* our translation
>>> +      * tables.
>>> +      *
>>> +      * Fortunately, there is an easy way out: the existing ID map, with the
>>> +      * extra level, can be reused for both. The kernel image is already
>>> +      * identity mapped at a high virtual offset, which leaves VA_BITS of
>>> +      * address space at the low end to put our runtime HYP mappings.
>>> +      */
>>> +     adrp    x5, idmap_t0sz                  // get ID map TCR.T0SZ
>>> +     ldr     x5, [x5, :lo12:idmap_t0sz]
>>> +     cmp     x5, TCR_T0SZ(VA_BITS)           // extra level configured?
>>> +     b.ge    1f                              // if not, skip
>>> +
>>> +     bfi     x4, x5, TCR_T0SZ_OFFSET, TCR_TxSZ_WIDTH
>>> +
>>> +     adrp    x0, idmap_pg_dir        // get root of ID map
>>> +     orr     x5, x1, PMD_TYPE_TABLE  // table entry pointing at HYP pgd
>>> +     str     x5, [x0]                // store at offset #0
>>> +     mov     x1, #0
>>> +1:
>>
>> Wow. This is making my head spin a bit.
>>
>> Nitpick: shouldn't you use PUD_TYPE_TABLE instead of PMD_TYPE_TABLE
>> (yeah, I know, that's the same thing...)?
>>
> 
> I think both are equally inappropriate, considering that this is the
> level above PGD. But I am happy to switch one for the other ...

Yeah, brainfart here. We'd need a PGD_TYPE_TABLE, but I don't think it's
worth the hassle.

> 
>>> +#endif
>>> +     msr     ttbr0_el2, x0
>>>       msr     tcr_el2, x4
>>>
>>>       ldr     x4, =VTCR_EL2_FLAGS
>>> @@ -91,6 +123,9 @@ __do_hyp_init:
>>>       msr     sctlr_el2, x4
>>>       isb
>>>
>>> +     /* Skip the trampoline dance if we merged the boot and runtime PGDs */
>>> +     cbz     x1, merged
>>> +
>>>       /* MMU is now enabled. Get ready for the trampoline dance */
>>>       ldr     x4, =TRAMPOLINE_VA
>>>       adr     x5, target
>>> @@ -105,6 +140,7 @@ target: /* We're now in the trampoline code, switch page tables */
>>>       tlbi    alle2
>>>       dsb     sy
>>>
>>> +merged:
>>>       /* Set the stack and new vectors */
>>>       kern_hyp_va     x2
>>>       mov     sp, x2
>>>
>>
>> The one thing that worries me here is that our EL1 idmap is not a strict
>> idmap anymore. I really wonder what we can break with that...
>>
> 
> Yes, that is the question I had myself. The other issue is that we
> never remove the entry we add at the root level in this code, which I
> don't think matters that much but it would be good if someone could
> confirm.

Do you mean having both the idmap and the runtime mappings in EL2? I'm
trying to break it, but I don't see any obvious issues, except for the
semi-religious fear of keeping unused mappings around....

> Worst case, we could have a hyp-specific idmap, and nuke/populate the
idmap pgd entry as we see fit. One thing at a time...

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 17+ messages in thread
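
The branch at the top of __do_hyp_init is the crux of this patch, so it
is worth restating in C. A sketch under stated assumptions: the VA_BITS
value and the TCR_T0SZ() encoding are simplified here, and the real
code is assembly, with idmap_t0sz set up by the EL1 boot path.

    #include <stdbool.h>

    #define VA_BITS      39              /* example: 4k pages, 3 levels */
    #define TCR_T0SZ(x)  (64 - (x))      /* T0SZ field for an x-bit VA space */

    extern unsigned long idmap_t0sz;     /* T0SZ the EL1 ID map was built with */

    static bool idmap_has_extra_level(void)
    {
            /*
             * A T0SZ smaller than TCR_T0SZ(VA_BITS) means the ID map
             * covers more than VA_BITS of input address space, i.e. it
             * was built with an extra translation level because RAM
             * sits above what VA_BITS can reach.
             */
            return idmap_t0sz < TCR_T0SZ(VA_BITS);
    }

When the extra level is present, the code widens TCR_EL2.T0SZ to match,
splices the HYP pgd into slot 0 of idmap_pg_dir, and zeroes x1 so that
the later "cbz x1, merged" skips the trampoline dance: boot and runtime
mappings now live in the same translation table.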

* [RFC PATCH 3/3] arm64: KVM: use ID map with increased VA range if required
  2015-02-26 18:51       ` Marc Zyngier
@ 2015-02-26 19:04         ` Ard Biesheuvel
  0 siblings, 0 replies; 17+ messages in thread
From: Ard Biesheuvel @ 2015-02-26 19:04 UTC (permalink / raw)
  To: linux-arm-kernel

On 26 February 2015 at 18:51, Marc Zyngier <marc.zyngier@arm.com> wrote:
> On 26/02/15 18:17, Ard Biesheuvel wrote:
>> On 26 February 2015 at 18:06, Marc Zyngier <marc.zyngier@arm.com> wrote:
>>> On 26/02/15 15:29, Ard Biesheuvel wrote:
>>>> This patch modifies the HYP init code so it can deal with system
>>>> RAM residing at an offset which exceeds the reach of VA_BITS.
>>>>
>>>> Like for EL1, this involves configuring an additional level of
>>>> translation for the ID map. However, in case of EL2, this implies
>>>> that all translations use the extra level, as we cannot seamlessly
>>>> switch between translation tables with different numbers of
>>>> translation levels. For this reason, the ID map is merged with
>>>> the runtime HYP map, since they don't overlap anyway.
>>>>
>>>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>>>> ---
>>>>  arch/arm64/kvm/hyp-init.S | 40 ++++++++++++++++++++++++++++++++++++++--
>>>>  1 file changed, 38 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/kvm/hyp-init.S b/arch/arm64/kvm/hyp-init.S
>>>> index c3191168a994..0af16bce6316 100644
>>>> --- a/arch/arm64/kvm/hyp-init.S
>>>> +++ b/arch/arm64/kvm/hyp-init.S
>>>> @@ -20,6 +20,7 @@
>>>>  #include <asm/assembler.h>
>>>>  #include <asm/kvm_arm.h>
>>>>  #include <asm/kvm_mmu.h>
>>>> +#include <asm/pgtable-hwdef.h>
>>>>
>>>>       .text
>>>>       .pushsection    .hyp.idmap.text, "ax"
>>>> @@ -58,13 +59,44 @@ __invalid:
>>>>        */
>>>>  __do_hyp_init:
>>>>
>>>> -     msr     ttbr0_el2, x0
>>>> -
>>>>       mrs     x4, tcr_el1
>>>>       ldr     x5, =TCR_EL2_MASK
>>>>       and     x4, x4, x5
>>>>       ldr     x5, =TCR_EL2_FLAGS
>>>>       orr     x4, x4, x5
>>>> +
>>>> +#ifndef CONFIG_ARM64_VA_BITS_48
>>>> +     /*
>>>> +      * If we are running with VA_BITS < 48, we may be running with an extra
>>>> +      * level of translation in the ID map. This is only the case if system
>>>> +      * RAM is out of range for the currently configured page size and number
>>>> +      * of translation levels, in which case we will also need the extra
>>>> +      * level for the HYP ID map, or we won't be able to enable the EL2 MMU.
>>>> +      *
>>>> +      * However, at EL2, there is only one TTBR register, and we can't switch
>>>> +      * between translation tables *and* update TCR_EL2.T0SZ at the same
>>>> +      * time. Bottom line: we need the extra level in *both* our translation
>>>> +      * tables.
>>>> +      *
>>>> +      * Fortunately, there is an easy way out: the existing ID map, with the
>>>> +      * extra level, can be reused for both. The kernel image is already
>>>> +      * identity mapped at a high virtual offset, which leaves VA_BITS of
>>>> +      * address space at the low end to put our runtime HYP mappings.
>>>> +      */
>>>> +     adrp    x5, idmap_t0sz                  // get ID map TCR.T0SZ
>>>> +     ldr     x5, [x5, :lo12:idmap_t0sz]
>>>> +     cmp     x5, TCR_T0SZ(VA_BITS)           // extra level configured?
>>>> +     b.ge    1f                              // if not, skip
>>>> +
>>>> +     bfi     x4, x5, TCR_T0SZ_OFFSET, TCR_TxSZ_WIDTH
>>>> +
>>>> +     adrp    x0, idmap_pg_dir        // get root of ID map
>>>> +     orr     x5, x1, PMD_TYPE_TABLE  // table entry pointing at HYP pgd
>>>> +     str     x5, [x0]                // store at offset #0
>>>> +     mov     x1, #0
>>>> +1:
>>>
>>> Wow. This is making my head spin a bit.
>>>
>>> Nitpick: shouldn't you use PUD_TYPE_TABLE instead of PMD_TYPE_TABLE
>>> (yeah, I know, that's the same thing...)?
>>>
>>
>> I think both are equally inappropriate, considering that this is the
>> level above PGD. But I am happy to switch one for the other ...
>
> Yeah, brainfart here. We'd need a PGD_TYPE_TABLE, but I don't think it's
> worth the hassle.
>
>>
>>>> +#endif
>>>> +     msr     ttbr0_el2, x0
>>>>       msr     tcr_el2, x4
>>>>
>>>>       ldr     x4, =VTCR_EL2_FLAGS
>>>> @@ -91,6 +123,9 @@ __do_hyp_init:
>>>>       msr     sctlr_el2, x4
>>>>       isb
>>>>
>>>> +     /* Skip the trampoline dance if we merged the boot and runtime PGDs */
>>>> +     cbz     x1, merged
>>>> +
>>>>       /* MMU is now enabled. Get ready for the trampoline dance */
>>>>       ldr     x4, =TRAMPOLINE_VA
>>>>       adr     x5, target
>>>> @@ -105,6 +140,7 @@ target: /* We're now in the trampoline code, switch page tables */
>>>>       tlbi    alle2
>>>>       dsb     sy
>>>>
>>>> +merged:
>>>>       /* Set the stack and new vectors */
>>>>       kern_hyp_va     x2
>>>>       mov     sp, x2
>>>>
>>>
>>> The one thing that worries me here is that our EL1 idmap is not a strict
>>> idmap anymore. I really wonder what we can break with that...
>>>
>>
>> Yes, that is the question I had myself. The other issue is that we
>> never remove the entry we add at the root level in this code, which I
>> don't think matters that much but it would be good if someone could
>> confirm.
>
> Do you mean having both the idmap and the runtime mappings in EL2? I'm
> trying to break it, but I don't see any obvious issues, except for the
> semi-religious fear of keeping unused mappings around....
>

No, I think that part is sound. My concern is that the pointer to
hyp_pgd is never removed if hyp_pgd itself is ever taken down and
freed.


> Worst case, we could have a hyp-specific idmap, and nuke/populate the
> idmap pgd entry as we see fit. One thing at a time...
>
>         M.
> --
> Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 17+ messages in thread
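
Marc's "nuke/populate the idmap pgd entry" fallback, together with
Ard's concern about the spliced entry outliving hyp_pgd, could look
roughly like the sketch below. Purely illustrative: none of this is in
the series, and unsplice_hyp_pgd() is an assumed name.

    extern pgd_t idmap_pg_dir[];

    static void unsplice_hyp_pgd(void)
    {
            /*
             * Clear the root-level entry that __do_hyp_init pointed at
             * hyp_pgd, so that no stale table descriptor survives once
             * the HYP page tables are torn down and freed.
             */
            idmap_pg_dir[0] = __pgd(0);

            /*
             * A real implementation would also need the EL2 TLB
             * invalidated (tlbi alle2) before the pages are reused.
             */
    }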

* [RFC PATCH 1/3] ARM, arm64: kvm: get rid of the bounce page
  2015-02-26 18:41         ` Ard Biesheuvel
@ 2015-02-27  8:19           ` Ard Biesheuvel
  0 siblings, 0 replies; 17+ messages in thread
From: Ard Biesheuvel @ 2015-02-27  8:19 UTC (permalink / raw)
  To: linux-arm-kernel

On 26 February 2015 at 18:41, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 26 February 2015 at 18:24, Marc Zyngier <marc.zyngier@arm.com> wrote:
>> On 26/02/15 17:31, Ard Biesheuvel wrote:
>>> On 26 February 2015 at 16:10, Marc Zyngier <marc.zyngier@arm.com> wrote:
>>>> On 26/02/15 15:29, Ard Biesheuvel wrote:
>>>>> The HYP init bounce page is a runtime construct that ensures that the
>>>>> HYP init code does not cross a page boundary. However, this is something
>>>>> we can do perfectly well at build time, by aligning the code appropriately.
>>>>>
>>>>> For arm64, we just align to 4 KB, and enforce that the code size is less
>>>>> than 4 KB, regardless of the chosen page size.
>>>>>
>>>>> For ARM, the whole code is less than 256 bytes, so we tweak the linker
>>>>> script to align at a power of 2 upper bound of the code size
>>>>>
>>>>> Note that this also fixes a benign off-by-one error in the original bounce
>>>>> page code, where a bounce page would be allocated unnecessarily if the code
>>>>> was exactly 1 page in size.
>>>>
>>>> I really like this simplification. Can you please check that it still
>>>> works on 32-bit with this patch from Arnd?
>>>>
>>>> https://www.mail-archive.com/kvm@vger.kernel.org/msg112364.html
>>>>
>>>
>>> Yes, it does.
>>>
>>> Note that the kernel's RODATA permissions shouldn't affect whether
>>> this code is executable or not in HYP mode, so I think this code
>>> belongs in RODATA in the 1st place.
>>
>> Yup. Maybe we should do the same for arm64, shouldn't we?
>>
>
> In fact, patch 2/2 of the series I sent earlier today has a similar
> side effect, by putting the HYP idmap before _stext.
>
>>>> Another question below:
>>>>
>>>>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>>>>> ---
>>>>>  arch/arm/kernel/vmlinux.lds.S   | 12 +++++++++---
>>>>>  arch/arm/kvm/init.S             | 11 +++++++++++
>>>>>  arch/arm/kvm/mmu.c              | 42 +++++------------------------------------
>>>>>  arch/arm64/kernel/vmlinux.lds.S | 18 ++++++++++++------
>>>>>  4 files changed, 37 insertions(+), 46 deletions(-)
>>>>>
>>>>> diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
>>>>> index b31aa73e8076..8179d3903dee 100644
>>>>> --- a/arch/arm/kernel/vmlinux.lds.S
>>>>> +++ b/arch/arm/kernel/vmlinux.lds.S
>>>>> @@ -23,7 +23,7 @@
>>>>>       VMLINUX_SYMBOL(__idmap_text_start) = .;                         \
>>>>>       *(.idmap.text)                                                  \
>>>>>       VMLINUX_SYMBOL(__idmap_text_end) = .;                           \
>>>>> -     . = ALIGN(32);                                                  \
>>>>> +     . = ALIGN(1 << __hyp_idmap_align_order);                        \
>>>>>       VMLINUX_SYMBOL(__hyp_idmap_text_start) = .;                     \
>>>>>       *(.hyp.idmap.text)                                              \
>>>>>       VMLINUX_SYMBOL(__hyp_idmap_text_end) = .;
>>>>> @@ -346,8 +346,14 @@ SECTIONS
>>>>>   */
>>>>>  ASSERT((__proc_info_end - __proc_info_begin), "missing CPU support")
>>>>>  ASSERT((__arch_info_end - __arch_info_begin), "no machine record defined")
>>>>> +
>>>>>  /*
>>>>> - * The HYP init code can't be more than a page long.
>>>>> + * The HYP init code can't be more than a page long,
>>>>> + * and should not cross a page boundary.
>>>>>   * The above comment applies as well.
>>>>>   */
>>>>> -ASSERT(((__hyp_idmap_text_end - __hyp_idmap_text_start) <= PAGE_SIZE), "HYP init code too big")
>>>>> +ASSERT(((__hyp_idmap_text_end - 1) & PAGE_MASK) -
>>>>> +     (__hyp_idmap_text_start & PAGE_MASK) == 0,
>>>>> +     "HYP init code too big or unaligned")
>>>>> +ASSERT(__hyp_idmap_size <= (1 << __hyp_idmap_align_order),
>>>>> +     "__hyp_idmap_size should be <= (1 << __hyp_idmap_align_order)")
>>>>> diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
>>>>> index 3988e72d16ff..7a279bc8e0e1 100644
>>>>> --- a/arch/arm/kvm/init.S
>>>>> +++ b/arch/arm/kvm/init.S
>>>>> @@ -157,3 +157,14 @@ target:  @ We're now in the trampoline code, switch page tables
>>>>>  __kvm_hyp_init_end:
>>>>>
>>>>>       .popsection
>>>>> +
>>>>> +     /*
>>>>> +      * When making changes to this file, make sure that the value of
>>>>> +      * __hyp_idmap_align_order is updated if the size of the code ends up
>>>>> +      * exceeding (1 << __hyp_idmap_align_order). This helps ensure that the
>>>>> +      * code never crosses a page boundary, without wasting too much memory
>>>>> +      * on aligning to PAGE_SIZE.
>>>>> +      */
>>>>> +     .global __hyp_idmap_size, __hyp_idmap_align_order
>>>>> +     .set    __hyp_idmap_size, __kvm_hyp_init_end - __kvm_hyp_init
>>>>> +     .set    __hyp_idmap_align_order, 8
>>>>
>>>> Is there a way to generate this __hyp_idmap_align_order automatically?
>>>> We're already pretty close to this 256-byte (order-8) limit...
>>>>
>>>
>>> This seems to work:
>>>
>>> #define HYP_IDMAP_ALIGN \
>>> __hyp_idmap_size <= 0x100 ? 0x100 : \
>>> __hyp_idmap_size <= 0x200 ? 0x200 : \
>>> __hyp_idmap_size <= 0x400 ? 0x400 : \
>>> __hyp_idmap_size <= 0x800 ? 0x800 : 0x1000
>>>
>>> and
>>>
>>> . = ALIGN(HYP_IDMAP_ALIGN); \
>>>
>>> and we are limited at 1 page anyway.
>>
>> Ah, excellent.
>>
>>> Should I respin and include the move to RODATA at the same time?
>>> Or would you like me to rebase onto Arnd's patch?
>>
>> [adding Arnd on CC]
>>
>> Rebasing on top of Arnd's patch seems fair, as he came up with the idea
>> in the first place.
>>
>
> OK, that's fine.
>
> @Arnd: I think you should move the ALIGN(32) along with the idmap bits
> into .rodata.
> Could you cc me on the updated patch? I will rebase this on top of it then.

Actually, it makes much more sense for me just to update the patch and
resend it, so that is what I am about to do.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2015-02-27  8:19 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-26 15:29 [RFC PATCH 0/3] arm64: KVM: use increased VA range if needed Ard Biesheuvel
2015-02-26 15:29 ` [RFC PATCH 1/3] ARM, arm64: kvm: get rid of the bounce page Ard Biesheuvel
2015-02-26 16:10   ` Marc Zyngier
2015-02-26 17:31     ` Ard Biesheuvel
2015-02-26 18:24       ` Marc Zyngier
2015-02-26 18:41         ` Ard Biesheuvel
2015-02-27  8:19           ` Ard Biesheuvel
2015-02-26 15:29 ` [RFC PATCH 2/3] arm64: make ID map shareable with EL2 Ard Biesheuvel
2015-02-26 15:29 ` [RFC PATCH 3/3] arm64: KVM: use ID map with increased VA range if required Ard Biesheuvel
2015-02-26 16:11   ` Catalin Marinas
2015-02-26 16:29     ` Catalin Marinas
2015-02-26 16:56     ` Ard Biesheuvel
2015-02-26 18:07       ` Catalin Marinas
2015-02-26 18:06   ` Marc Zyngier
2015-02-26 18:17     ` Ard Biesheuvel
2015-02-26 18:51       ` Marc Zyngier
2015-02-26 19:04         ` Ard Biesheuvel
