* [PATCH 00/15] arm/arm64: KVM: Merge boot and runtime page tables
@ 2016-06-07 10:58 ` Marc Zyngier
  0 siblings, 0 replies; 90+ messages in thread
From: Marc Zyngier @ 2016-06-07 10:58 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: linux-arm-kernel, kvm, kvmarm

Until now, we've been setting up KVM using two sets of page tables:
one for the "boot" where we perform the basic MMU setup, and one for
the runtime.

Switching between the two was thought to be safe, but we've recently
realized that it is not: it is not enough to ensure that the VA->PA
mapping is consistent when switching TTBR0_EL2; the intermediate
translations have to be the same as well. If the TLB can return two
different values for an intermediate translation, we're screwed
(TLB conflicts).

At that point, the only safe thing to do is to never change TTBR0_EL2,
which means that we need to make the idmap page part of the runtime
page tables.

The series starts with a bit of brain dumping explaining what we're
trying to do. This might not be useful as a merge candidate, but it
was useful for me to put this somewhere. It goes on to revamp the
whole notion of the HYP VA range, making it runtime patchable. It
then always merges the idmap and runtime page tables into a single
set, leading to quite a lot of simplification in the init/teardown
code. In the process, 32bit KVM gains the ability to tear down the
HYP page tables and vectors, which brings kexec a bit closer.

This has been tested on Seattle, Juno, the FVP model (both v8.0 and
v8.1), Cubietruck and Midway, and is based on 4.7-rc2.

Thanks,

	M.

Marc Zyngier (15):
  arm64: KVM: Merged page tables documentation
  arm64: KVM: Kill HYP_PAGE_OFFSET
  arm64: Add ARM64_HYP_OFFSET_LOW capability
  arm64: KVM: Define HYP offset masks
  arm64: KVM: Refactor kern_hyp_va/hyp_kern_va to deal with multiple
    offsets
  arm/arm64: KVM: Export __hyp_text_start/end symbols
  arm64: KVM: Runtime detection of lower HYP offset
  arm/arm64: KVM: Always have merged page tables
  arm64: KVM: Simplify HYP init/teardown
  arm/arm64: KVM: Drop boot_pgd
  arm/arm64: KVM: Kill free_boot_hyp_pgd
  arm: KVM: Simplify HYP init
  arm: KVM: Allow hyp teardown
  arm/arm64: KVM: Prune unused #defines
  arm/arm64: KVM: Check that IDMAP doesn't intersect with VA range

 arch/arm/include/asm/kvm_asm.h      |   2 +
 arch/arm/include/asm/kvm_host.h     |  25 +++-----
 arch/arm/include/asm/kvm_mmu.h      |  11 ----
 arch/arm/include/asm/virt.h         |   4 ++
 arch/arm/kvm/arm.c                  |  20 ++----
 arch/arm/kvm/init.S                 |  56 ++++++----------
 arch/arm/kvm/mmu.c                  | 125 ++++++++++++++++--------------------
 arch/arm64/include/asm/cpufeature.h |   3 +-
 arch/arm64/include/asm/kvm_host.h   |  17 ++---
 arch/arm64/include/asm/kvm_hyp.h    |  28 ++++----
 arch/arm64/include/asm/kvm_mmu.h    | 100 ++++++++++++++++++++++++-----
 arch/arm64/include/asm/virt.h       |   4 ++
 arch/arm64/kernel/cpufeature.c      |  19 ++++++
 arch/arm64/kvm/hyp-init.S           |  61 +++---------------
 arch/arm64/kvm/hyp/entry.S          |  19 ------
 arch/arm64/kvm/hyp/hyp-entry.S      |  15 +++++
 arch/arm64/kvm/reset.c              |  28 --------
 17 files changed, 240 insertions(+), 297 deletions(-)

-- 
2.1.4


^ permalink raw reply	[flat|nested] 90+ messages in thread
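
To make the constraint concrete, here is a minimal C sketch (not from
the series; the VA_BITS value and helper name are invented) of the
invariant the series establishes: the runtime HYP tables must
themselves cover the idmap page, so that TTBR0_EL2 never needs to
change.

#include <stdbool.h>

#define VA_BITS		39UL			/* example configuration */
#define HYP_VA_SIZE	(1UL << VA_BITS)	/* VA span of the HYP tables */

/* The idmap maps VA == PA, so the trampoline's physical address must be
 * a valid VA under the runtime HYP tables for the merge to be possible. */
bool idmap_fits_in_runtime_tables(unsigned long idmap_pa)
{
	return idmap_pa < HYP_VA_SIZE;
}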

* [PATCH 01/15] arm64: KVM: Merged page tables documentation
  2016-06-07 10:58 ` Marc Zyngier
@ 2016-06-07 10:58   ` Marc Zyngier
  -1 siblings, 0 replies; 90+ messages in thread
From: Marc Zyngier @ 2016-06-07 10:58 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: linux-arm-kernel, kvm, kvmarm

Since dealing with VA ranges tends to hurt my brain badly, let's
start with a bit of documentation that will hopefully help in
understanding what comes next...

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/include/asm/kvm_mmu.h | 45 +++++++++++++++++++++++++++++++++++++---
 1 file changed, 42 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index f05ac27..00bc277 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -29,10 +29,49 @@
  *
  * Instead, give the HYP mode its own VA region at a fixed offset from
  * the kernel by just masking the top bits (which are all ones for a
- * kernel address).
+ * kernel address). We need to find out how many bits to mask.
  *
- * ARMv8.1 (using VHE) does have a TTBR1_EL2, and doesn't use these
- * macros (the entire kernel runs at EL2).
+ * We want to build a set of page tables that cover both parts of the
+ * idmap (the trampoline page used to initialize EL2), and our normal
+ * runtime VA space, at the same time.
+ *
+ * Given that the kernel uses VA_BITS for its entire address space,
+ * and that half of that space (VA_BITS - 1) is used for the linear
+ * mapping, we can limit the EL2 space to the same size.
+ *
+ * The main question is "Within the VA_BITS space, does EL2 use the
+ * top or the bottom half of that space to shadow the kernel's linear
+ * mapping?". As we need to idmap the trampoline page, this is
+ * determined by the range in which this page lives.
+ *
+ * If the page is in the bottom half, we have to use the top half. If
+ * the page is in the top half, we have to use the bottom half:
+ *
+ * if (PA(T)[VA_BITS - 1] == 1)
+ *	HYP_VA_RANGE = [0 ... (1 << (VA_BITS - 1)) - 1]
+ * else
+ *	HYP_VA_RANGE = [(1 << (VA_BITS - 1)) ... (1 << VA_BITS) - 1]
+ *
+ * In practice, the second case can be simplified to
+ *	HYP_VA_RANGE = [0 ... (1 << VA_BITS) - 1]
+ * because we'll never get anything in the bottom range.
+ *
+ * This of course assumes that the trampoline page exists within the
+ * VA_BITS range. If it doesn't, then it means we're in the odd case
+ * where the kernel idmap (as well as HYP) uses more levels than the
+ * kernel runtime page tables (as seen when the kernel is configured
+ * for 4k pages, 39bits VA, and yet memory lives just above that
+ * limit, forcing the idmap to use 4 levels of page tables while the
+ * kernel itself only uses 3). In this particular case, it doesn't
+ * matter which side of VA_BITS we use, as we're guaranteed not to
+ * conflict with anything.
+ *
+ * An alternative would be to always use 4 levels of page tables for
+ * EL2, no matter what the kernel does. But who wants more levels than
+ * strictly necessary?
+ *
+ * Thankfully, ARMv8.1 (using VHE) does have a TTBR1_EL2, and doesn't
+ * need any of this madness (the entire kernel runs at EL2).
  */
 #define HYP_PAGE_OFFSET_SHIFT	VA_BITS
 #define HYP_PAGE_OFFSET_MASK	((UL(1) << HYP_PAGE_OFFSET_SHIFT) - 1)
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 90+ messages in thread
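
For readers who prefer code to prose, the range-selection rule in the
comment above can be modelled as a small stand-alone C program (the
VA_BITS value, function name and sample addresses are invented for
illustration):

#include <stdio.h>

#define VA_BITS 39UL			/* example: 4k pages, 39-bit VA */

static void pick_hyp_va_range(unsigned long trampoline_pa)
{
	unsigned long half = 1UL << (VA_BITS - 1);

	if (trampoline_pa & half)	/* PA(T)[VA_BITS - 1] == 1 */
		printf("HYP_VA_RANGE = [0 .. %#lx]\n", half - 1);
	else				/* trampoline lives in the bottom half */
		printf("HYP_VA_RANGE = [%#lx .. %#lx]\n",
		       half, (1UL << VA_BITS) - 1);
}

int main(void)
{
	pick_hyp_va_range(0x4080000000UL);	/* bit 38 set: use the low half */
	pick_hyp_va_range(0x0080000000UL);	/* bit 38 clear: use the high half */
	return 0;
}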

* [PATCH 02/15] arm64: KVM: Kill HYP_PAGE_OFFSET
  2016-06-07 10:58 ` Marc Zyngier
@ 2016-06-07 10:58   ` Marc Zyngier
  -1 siblings, 0 replies; 90+ messages in thread
From: Marc Zyngier @ 2016-06-07 10:58 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: linux-arm-kernel, kvm, kvmarm

HYP_PAGE_OFFSET is not massively useful. And the way we use it
in KERN_HYP_VA is inconsistent with the equivalent operation in
EL2, where we use a mask instead.

Let's replace the uses of HYP_PAGE_OFFSET with HYP_PAGE_OFFSET_MASK,
and get rid of the pointless macro.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/include/asm/kvm_hyp.h | 5 ++---
 arch/arm64/include/asm/kvm_mmu.h | 3 +--
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 44eaff7..61d01a9 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -38,11 +38,10 @@ static inline unsigned long __kern_hyp_va(unsigned long v)
 
 static inline unsigned long __hyp_kern_va(unsigned long v)
 {
-	u64 offset = PAGE_OFFSET - HYP_PAGE_OFFSET;
-	asm volatile(ALTERNATIVE("add %0, %0, %1",
+	asm volatile(ALTERNATIVE("orr %0, %0, %1",
 				 "nop",
 				 ARM64_HAS_VIRT_HOST_EXTN)
-		     : "+r" (v) : "r" (offset));
+		     : "+r" (v) : "i" (~HYP_PAGE_OFFSET_MASK));
 	return v;
 }
 
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 00bc277..d162372 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -75,7 +75,6 @@
  */
 #define HYP_PAGE_OFFSET_SHIFT	VA_BITS
 #define HYP_PAGE_OFFSET_MASK	((UL(1) << HYP_PAGE_OFFSET_SHIFT) - 1)
-#define HYP_PAGE_OFFSET		(PAGE_OFFSET & HYP_PAGE_OFFSET_MASK)
 
 /*
  * Our virtual mapping for the idmap-ed MMU-enable code. Must be
@@ -109,7 +108,7 @@ alternative_endif
 #include <asm/mmu_context.h>
 #include <asm/pgtable.h>
 
-#define KERN_TO_HYP(kva)	((unsigned long)kva - PAGE_OFFSET + HYP_PAGE_OFFSET)
+#define KERN_TO_HYP(kva)	((unsigned long)kva & HYP_PAGE_OFFSET_MASK)
 
 /*
  * We currently only support a 40bit IPA.
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 90+ messages in thread
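
To see why a single AND is equivalent to the old offset arithmetic for
linear-map addresses, consider this self-contained sketch (the
PAGE_OFFSET model and the sample address are illustrative, not the
kernel's actual layout):

#include <assert.h>

#define VA_BITS			39UL
#define PAGE_OFFSET		(~0UL << (VA_BITS - 1))	/* simplified model */
#define HYP_PAGE_OFFSET_MASK	((1UL << VA_BITS) - 1)
#define HYP_PAGE_OFFSET		(PAGE_OFFSET & HYP_PAGE_OFFSET_MASK)

static unsigned long kern_to_hyp_old(unsigned long kva)
{
	return kva - PAGE_OFFSET + HYP_PAGE_OFFSET;	/* offset arithmetic */
}

static unsigned long kern_to_hyp_new(unsigned long kva)
{
	return kva & HYP_PAGE_OFFSET_MASK;		/* single AND */
}

int main(void)
{
	unsigned long kva = PAGE_OFFSET + 0x123456;	/* a linear-map address */

	assert(kern_to_hyp_old(kva) == kern_to_hyp_new(kva));
	return 0;
}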

* [PATCH 03/15] arm64: Add ARM64_HYP_OFFSET_LOW capability
  2016-06-07 10:58 ` Marc Zyngier
@ 2016-06-07 10:58   ` Marc Zyngier
  -1 siblings, 0 replies; 90+ messages in thread
From: Marc Zyngier @ 2016-06-07 10:58 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: linux-arm-kernel, kvm, kvmarm

As we need to tell the rest of the kernel which region of the HYP VA
space is safe to use, add a capability indicating that KVM should use
the [VA_BITS-2:0] range.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/include/asm/cpufeature.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 224efe7..d40edbb 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -36,8 +36,9 @@
 #define ARM64_HAS_VIRT_HOST_EXTN		11
 #define ARM64_WORKAROUND_CAVIUM_27456		12
 #define ARM64_HAS_32BIT_EL0			13
+#define ARM64_HYP_OFFSET_LOW			14
 
-#define ARM64_NCAPS				14
+#define ARM64_NCAPS				15
 
 #ifndef __ASSEMBLY__
 
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [PATCH 04/15] arm64: KVM: Define HYP offset masks
  2016-06-07 10:58 ` Marc Zyngier
@ 2016-06-07 10:58   ` Marc Zyngier
  -1 siblings, 0 replies; 90+ messages in thread
From: Marc Zyngier @ 2016-06-07 10:58 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, linux-arm-kernel, kvmarm

Define the two possible HYP VA regions in terms of VA_BITS,
and keep HYP_PAGE_OFFSET_MASK as a temporary compatibility
definition.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/include/asm/kvm_mmu.h | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index d162372..e45df1b 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -73,8 +73,12 @@
  * Thankfully, ARMv8.1 (using VHE) does have a TTBR1_EL2, and doesn't
  * need any of this madness (the entire kernel runs at EL2).
  */
-#define HYP_PAGE_OFFSET_SHIFT	VA_BITS
-#define HYP_PAGE_OFFSET_MASK	((UL(1) << HYP_PAGE_OFFSET_SHIFT) - 1)
+
+#define HYP_PAGE_OFFSET_HIGH_MASK	((UL(1) << VA_BITS) - 1)
+#define HYP_PAGE_OFFSET_LOW_MASK	((UL(1) << (VA_BITS - 1)) - 1)
+
+/* Temporary compat define */
+#define HYP_PAGE_OFFSET_MASK		HYP_PAGE_OFFSET_HIGH_MASK
 
 /*
  * Our virtual mapping for the idmap-ed MMU-enable code. Must be
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 90+ messages in thread
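
As a quick sanity check, this throwaway snippet (purely illustrative)
prints the two masks for a 39-bit and a 48-bit VA configuration:

#include <stdio.h>

int main(void)
{
	unsigned int bits;

	/* show the HIGH/LOW masks for two common VA_BITS configurations */
	for (bits = 39; bits <= 48; bits += 9) {
		unsigned long high = (1UL << bits) - 1;
		unsigned long low  = (1UL << (bits - 1)) - 1;

		printf("VA_BITS=%u HIGH_MASK=%#018lx LOW_MASK=%#018lx\n",
		       bits, high, low);
	}
	return 0;
}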

* [PATCH 05/15] arm64: KVM: Refactor kern_hyp_va/hyp_kern_va to deal with multiple offsets
  2016-06-07 10:58 ` Marc Zyngier
@ 2016-06-07 10:58   ` Marc Zyngier
  -1 siblings, 0 replies; 90+ messages in thread
From: Marc Zyngier @ 2016-06-07 10:58 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: linux-arm-kernel, kvm, kvmarm

As we move towards a selectable HYP VA range, it is obvious that
we don't want to test a variable to find out whether we need to use
the bottom VA range, the top VA range, or the address as is
(for VHE).

Instead, we can expand our current helpers to generate the right
mask or nop with code patching. We default to using the top VA
space, with alternatives to switch to the bottom one or to nop
out the instructions.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/include/asm/kvm_hyp.h | 27 ++++++++++++--------------
 arch/arm64/include/asm/kvm_mmu.h | 42 +++++++++++++++++++++++++++++++++++++---
 2 files changed, 51 insertions(+), 18 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 61d01a9..dd4904b 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -25,24 +25,21 @@
 
 #define __hyp_text __section(.hyp.text) notrace
 
-static inline unsigned long __kern_hyp_va(unsigned long v)
-{
-	asm volatile(ALTERNATIVE("and %0, %0, %1",
-				 "nop",
-				 ARM64_HAS_VIRT_HOST_EXTN)
-		     : "+r" (v) : "i" (HYP_PAGE_OFFSET_MASK));
-	return v;
-}
-
-#define kern_hyp_va(v) (typeof(v))(__kern_hyp_va((unsigned long)(v)))
-
 static inline unsigned long __hyp_kern_va(unsigned long v)
 {
-	asm volatile(ALTERNATIVE("orr %0, %0, %1",
-				 "nop",
+	u64 mask;
+
+	asm volatile(ALTERNATIVE("mov %0, %1",
+				 "mov %0, %2",
+				 ARM64_HYP_OFFSET_LOW)
+		     : "=r" (mask)
+		     : "i" (~HYP_PAGE_OFFSET_HIGH_MASK),
+		       "i" (~HYP_PAGE_OFFSET_LOW_MASK));
+	asm volatile(ALTERNATIVE("nop",
+				 "mov %0, xzr",
 				 ARM64_HAS_VIRT_HOST_EXTN)
-		     : "+r" (v) : "i" (~HYP_PAGE_OFFSET_MASK));
-	return v;
+		     : "+r" (mask));
+	return v | mask;
 }
 
 #define hyp_kern_va(v) (typeof(v))(__hyp_kern_va((unsigned long)(v)))
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index e45df1b..889330b 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -95,13 +95,33 @@
 /*
  * Convert a kernel VA into a HYP VA.
  * reg: VA to be converted.
+ *
+ * This generates the following sequences:
+ * - High mask:
+ *		and x0, x0, #HYP_PAGE_OFFSET_HIGH_MASK
+ *		nop
+ * - Low mask:
+ *		and x0, x0, #HYP_PAGE_OFFSET_HIGH_MASK
+ *		and x0, x0, #HYP_PAGE_OFFSET_LOW_MASK
+ * - VHE:
+ *		nop
+ *		nop
+ *
+ * The "low mask" version works because the mask is a strict subset of
+ * the "high mask", hence performing the first mask for nothing.
+ * Should be completely invisible on any viable CPU.
  */
 .macro kern_hyp_va	reg
-alternative_if_not ARM64_HAS_VIRT_HOST_EXTN	
-	and	\reg, \reg, #HYP_PAGE_OFFSET_MASK
+alternative_if_not ARM64_HAS_VIRT_HOST_EXTN
+	and     \reg, \reg, #HYP_PAGE_OFFSET_HIGH_MASK
 alternative_else
 	nop
 alternative_endif
+alternative_if_not ARM64_HYP_OFFSET_LOW
+	nop
+alternative_else
+	and     \reg, \reg, #HYP_PAGE_OFFSET_LOW_MASK
+alternative_endif
 .endm
 
 #else
@@ -112,7 +132,23 @@ alternative_endif
 #include <asm/mmu_context.h>
 #include <asm/pgtable.h>
 
-#define KERN_TO_HYP(kva)	((unsigned long)kva & HYP_PAGE_OFFSET_MASK)
+static inline unsigned long __kern_hyp_va(unsigned long v)
+{
+	asm volatile(ALTERNATIVE("and %0, %0, %1",
+				 "nop",
+				 ARM64_HAS_VIRT_HOST_EXTN)
+		     : "+r" (v)
+		     : "i" (HYP_PAGE_OFFSET_HIGH_MASK));
+	asm volatile(ALTERNATIVE("nop",
+				 "and %0, %0, %1",
+				 ARM64_HYP_OFFSET_LOW)
+		     : "+r" (v)
+		     : "i" (HYP_PAGE_OFFSET_LOW_MASK));
+	return v;
+}
+
+#define kern_hyp_va(v) 	(typeof(v))(__kern_hyp_va((unsigned long)(v)))
+#define KERN_TO_HYP(v)	kern_hyp_va(v)
 
 /*
  * We currently only support a 40bit IPA.
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 90+ messages in thread
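
The three patched sequences can be modelled in plain C as follows (a
sketch; the enum and function name are invented, and VA_BITS is just
an example value):

#define VA_BITS				48UL	/* example */
#define HYP_PAGE_OFFSET_HIGH_MASK	((1UL << VA_BITS) - 1)
#define HYP_PAGE_OFFSET_LOW_MASK	((1UL << (VA_BITS - 1)) - 1)

enum hyp_va_mode { HYP_VA_HIGH, HYP_VA_LOW, HYP_VA_VHE };

unsigned long model_kern_hyp_va(unsigned long va, enum hyp_va_mode mode)
{
	switch (mode) {
	case HYP_VA_HIGH:	/* and HIGH_MASK ; nop */
		return va & HYP_PAGE_OFFSET_HIGH_MASK;
	case HYP_VA_LOW:	/* and HIGH_MASK ; and LOW_MASK (a subset) */
		return va & HYP_PAGE_OFFSET_HIGH_MASK & HYP_PAGE_OFFSET_LOW_MASK;
	case HYP_VA_VHE:	/* nop ; nop - kernel VAs are used directly */
	default:
		return va;
	}
}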

* [PATCH 06/15] arm/arm64: KVM: Export __hyp_text_start/end symbols
  2016-06-07 10:58 ` Marc Zyngier
@ 2016-06-07 10:58   ` Marc Zyngier
  -1 siblings, 0 replies; 90+ messages in thread
From: Marc Zyngier @ 2016-06-07 10:58 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: linux-arm-kernel, kvm, kvmarm

Declare the __hyp_text_start/end symbols in asm/virt.h so that
they can be reused without having to declare them locally.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/virt.h   | 4 ++++
 arch/arm/kvm/mmu.c            | 2 --
 arch/arm64/include/asm/virt.h | 4 ++++
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/virt.h b/arch/arm/include/asm/virt.h
index d4ceaf5..a2e75b8 100644
--- a/arch/arm/include/asm/virt.h
+++ b/arch/arm/include/asm/virt.h
@@ -80,6 +80,10 @@ static inline bool is_kernel_in_hyp_mode(void)
 	return false;
 }
 
+/* The section containing the hypervisor idmap text */
+extern char __hyp_idmap_text_start[];
+extern char __hyp_idmap_text_end[];
+
 /* The section containing the hypervisor text */
 extern char __hyp_text_start[];
 extern char __hyp_text_end[];
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 45c43ae..d6ecbf1 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -32,8 +32,6 @@
 
 #include "trace.h"
 
-extern char  __hyp_idmap_text_start[], __hyp_idmap_text_end[];
-
 static pgd_t *boot_hyp_pgd;
 static pgd_t *hyp_pgd;
 static pgd_t *merged_hyp_pgd;
diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
index dcbcf8d..88aa8ec 100644
--- a/arch/arm64/include/asm/virt.h
+++ b/arch/arm64/include/asm/virt.h
@@ -82,6 +82,10 @@ extern void verify_cpu_run_el(void);
 static inline void verify_cpu_run_el(void) {}
 #endif
 
+/* The section containing the hypervisor idmap text */
+extern char __hyp_idmap_text_start[];
+extern char __hyp_idmap_text_end[];
+
 /* The section containing the hypervisor text */
 extern char __hyp_text_start[];
 extern char __hyp_text_end[];
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [PATCH 07/15] arm64: KVM: Runtime detection of lower HYP offset
  2016-06-07 10:58 ` Marc Zyngier
@ 2016-06-07 10:58   ` Marc Zyngier
  -1 siblings, 0 replies; 90+ messages in thread
From: Marc Zyngier @ 2016-06-07 10:58 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, linux-arm-kernel, kvmarm

Add the code that enables the switch to the lower HYP VA range.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/kernel/cpufeature.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 811773d..ffb3e14d 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -726,6 +726,19 @@ static bool runs_at_el2(const struct arm64_cpu_capabilities *entry, int __unused
 	return is_kernel_in_hyp_mode();
 }
 
+static bool hyp_offset_low(const struct arm64_cpu_capabilities *entry,
+			   int __unused)
+{
+	phys_addr_t idmap_addr = virt_to_phys(__hyp_idmap_text_start);
+
+	/*
+	 * Activate the lower HYP offset only if:
+	 * - the idmap doesn't clash with it,
+	 * - the kernel is not running at EL2.
+	 */
+	return idmap_addr > GENMASK(VA_BITS - 2, 0) && !is_kernel_in_hyp_mode();
+}
+
 static const struct arm64_cpu_capabilities arm64_features[] = {
 	{
 		.desc = "GIC system register CPU interface",
@@ -803,6 +816,12 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
 		.field_pos = ID_AA64PFR0_EL0_SHIFT,
 		.min_field_value = ID_AA64PFR0_EL0_32BIT_64BIT,
 	},
+	{
+		.desc = "Reduced HYP mapping offset",
+		.capability = ARM64_HYP_OFFSET_LOW,
+		.def_scope = SCOPE_SYSTEM,
+		.matches = hyp_offset_low,
+	},
 	{},
 };
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 90+ messages in thread
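
Spelled out, the test means: the low range covers
[0 .. 2^(VA_BITS-1) - 1], and it may only be used if the idmap'd text
sits entirely above it. A stand-alone formulation (illustrative, with
a local GENMASK_ULL definition and an example VA_BITS) could look
like this:

#include <stdbool.h>

#define VA_BITS		39UL	/* example */
#define GENMASK_ULL(h, l) \
	((~0ULL << (l)) & (~0ULL >> (63 - (h))))

bool low_range_is_free(unsigned long long idmap_addr)
{
	/* true when the idmap lies above [0 .. 2^(VA_BITS-1) - 1] */
	return idmap_addr > GENMASK_ULL(VA_BITS - 2, 0);
}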

* [PATCH 08/15] arm/arm64: KVM: Always have merged page tables
  2016-06-07 10:58 ` Marc Zyngier
@ 2016-06-07 10:58   ` Marc Zyngier
  -1 siblings, 0 replies; 90+ messages in thread
From: Marc Zyngier @ 2016-06-07 10:58 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: linux-arm-kernel, kvm, kvmarm

We're in a position where we can now always have "merged" page
tables, where both the runtime mapping and the idmap coexist.

This results in some code being removed, but there is more to come.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/kvm/mmu.c     | 74 +++++++++++++++++++++++---------------------------
 arch/arm64/kvm/reset.c | 31 +++++----------------
 2 files changed, 41 insertions(+), 64 deletions(-)

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index d6ecbf1..9a17e14 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -492,13 +492,12 @@ void free_boot_hyp_pgd(void)
 
 	if (boot_hyp_pgd) {
 		unmap_hyp_range(boot_hyp_pgd, hyp_idmap_start, PAGE_SIZE);
-		unmap_hyp_range(boot_hyp_pgd, TRAMPOLINE_VA, PAGE_SIZE);
 		free_pages((unsigned long)boot_hyp_pgd, hyp_pgd_order);
 		boot_hyp_pgd = NULL;
 	}
 
 	if (hyp_pgd)
-		unmap_hyp_range(hyp_pgd, TRAMPOLINE_VA, PAGE_SIZE);
+		unmap_hyp_range(hyp_pgd, hyp_idmap_start, PAGE_SIZE);
 
 	mutex_unlock(&kvm_hyp_pgd_mutex);
 }
@@ -1690,7 +1689,7 @@ phys_addr_t kvm_mmu_get_boot_httbr(void)
 	if (__kvm_cpu_uses_extended_idmap())
 		return virt_to_phys(merged_hyp_pgd);
 	else
-		return virt_to_phys(boot_hyp_pgd);
+		return virt_to_phys(hyp_pgd);
 }
 
 phys_addr_t kvm_get_idmap_vector(void)
@@ -1703,6 +1702,22 @@ phys_addr_t kvm_get_idmap_start(void)
 	return hyp_idmap_start;
 }
 
+static int kvm_map_idmap_text(pgd_t *pgd)
+{
+	int err;
+
+	/* Create the idmap in the boot page tables */
+	err = 	__create_hyp_mappings(pgd,
+				      hyp_idmap_start, hyp_idmap_end,
+				      __phys_to_pfn(hyp_idmap_start),
+				      PAGE_HYP);
+	if (err)
+		kvm_err("Failed to idmap %lx-%lx\n",
+			hyp_idmap_start, hyp_idmap_end);
+
+	return err;
+}
+
 int kvm_mmu_init(void)
 {
 	int err;
@@ -1718,27 +1733,25 @@ int kvm_mmu_init(void)
 	BUG_ON((hyp_idmap_start ^ (hyp_idmap_end - 1)) & PAGE_MASK);
 
 	hyp_pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, hyp_pgd_order);
-	boot_hyp_pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, hyp_pgd_order);
-
-	if (!hyp_pgd || !boot_hyp_pgd) {
+	if (!hyp_pgd) {
 		kvm_err("Hyp mode PGD not allocated\n");
 		err = -ENOMEM;
 		goto out;
 	}
 
-	/* Create the idmap in the boot page tables */
-	err = 	__create_hyp_mappings(boot_hyp_pgd,
-				      hyp_idmap_start, hyp_idmap_end,
-				      __phys_to_pfn(hyp_idmap_start),
-				      PAGE_HYP);
+	if (__kvm_cpu_uses_extended_idmap()) {
+		boot_hyp_pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
+							 hyp_pgd_order);
+		if (!boot_hyp_pgd) {
+			kvm_err("Hyp boot PGD not allocated\n");
+			err = -ENOMEM;
+			goto out;
+		}
 
-	if (err) {
-		kvm_err("Failed to idmap %lx-%lx\n",
-			hyp_idmap_start, hyp_idmap_end);
-		goto out;
-	}
+		err = kvm_map_idmap_text(boot_hyp_pgd);
+		if (err)
+			goto out;
 
-	if (__kvm_cpu_uses_extended_idmap()) {
 		merged_hyp_pgd = (pgd_t *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
 		if (!merged_hyp_pgd) {
 			kvm_err("Failed to allocate extra HYP pgd\n");
@@ -1746,29 +1759,10 @@ int kvm_mmu_init(void)
 		}
 		__kvm_extend_hypmap(boot_hyp_pgd, hyp_pgd, merged_hyp_pgd,
 				    hyp_idmap_start);
-		return 0;
-	}
-
-	/* Map the very same page at the trampoline VA */
-	err = 	__create_hyp_mappings(boot_hyp_pgd,
-				      TRAMPOLINE_VA, TRAMPOLINE_VA + PAGE_SIZE,
-				      __phys_to_pfn(hyp_idmap_start),
-				      PAGE_HYP);
-	if (err) {
-		kvm_err("Failed to map trampoline @%lx into boot HYP pgd\n",
-			TRAMPOLINE_VA);
-		goto out;
-	}
-
-	/* Map the same page again into the runtime page tables */
-	err = 	__create_hyp_mappings(hyp_pgd,
-				      TRAMPOLINE_VA, TRAMPOLINE_VA + PAGE_SIZE,
-				      __phys_to_pfn(hyp_idmap_start),
-				      PAGE_HYP);
-	if (err) {
-		kvm_err("Failed to map trampoline @%lx into runtime HYP pgd\n",
-			TRAMPOLINE_VA);
-		goto out;
+	} else {
+		err = kvm_map_idmap_text(hyp_pgd);
+		if (err)
+			goto out;
 	}
 
 	return 0;
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index b1ad730..d044ca3 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -133,30 +133,13 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 	return kvm_timer_vcpu_reset(vcpu, cpu_vtimer_irq);
 }
 
-extern char __hyp_idmap_text_start[];
-
 unsigned long kvm_hyp_reset_entry(void)
 {
-	if (!__kvm_cpu_uses_extended_idmap()) {
-		unsigned long offset;
-
-		/*
-		 * Find the address of __kvm_hyp_reset() in the trampoline page.
-		 * This is present in the running page tables, and the boot page
-		 * tables, so we call the code here to start the trampoline
-		 * dance in reverse.
-		 */
-		offset = (unsigned long)__kvm_hyp_reset
-			 - ((unsigned long)__hyp_idmap_text_start & PAGE_MASK);
-
-		return TRAMPOLINE_VA + offset;
-	} else {
-		/*
-		 * KVM is running with merged page tables, which don't have the
-		 * trampoline page mapped. We know the idmap is still mapped,
-		 * but can't be called into directly. Use
-		 * __extended_idmap_trampoline to do the call.
-		 */
-		return (unsigned long)kvm_ksym_ref(__extended_idmap_trampoline);
-	}
+	/*
+	 * KVM is running with merged page tables, which don't have the
+	 * trampoline page mapped. We know the idmap is still mapped,
+	 * but can't be called into directly. Use
+	 * __extended_idmap_trampoline to do the call.
+	 */
+	return (unsigned long)kvm_ksym_ref(__extended_idmap_trampoline);
 }
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 90+ messages in thread
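
The resulting control flow in kvm_mmu_init() boils down to the toy
model below (all types and helper names are stand-ins rather than the
kernel's, and error unwinding is trimmed):

#include <stdbool.h>
#include <stdlib.h>

typedef struct { unsigned long entries[512]; } pgd_t;	/* stand-in type */

static pgd_t *hyp_pgd, *boot_hyp_pgd, *merged_hyp_pgd;

static pgd_t *alloc_pgd(void) { return calloc(1, sizeof(pgd_t)); }
static int map_idmap_text(pgd_t *pgd) { (void)pgd; return 0; /* stub */ }
static void merge_roots(pgd_t *boot, pgd_t *run, pgd_t *merged)
{ (void)boot; (void)run; (void)merged; /* stub */ }

int init_hyp_tables(bool extended_idmap)
{
	hyp_pgd = alloc_pgd();
	if (!hyp_pgd)
		return -1;

	if (extended_idmap) {
		/* The idmap needs more levels than the runtime tables:
		 * keep a separate boot pgd and fold both into one root. */
		boot_hyp_pgd = alloc_pgd();
		merged_hyp_pgd = alloc_pgd();
		if (!boot_hyp_pgd || !merged_hyp_pgd)
			return -1;
		if (map_idmap_text(boot_hyp_pgd))
			return -1;
		merge_roots(boot_hyp_pgd, hyp_pgd, merged_hyp_pgd);
	} else {
		/* Common case: the idmap goes straight into the runtime pgd. */
		if (map_idmap_text(hyp_pgd))
			return -1;
	}
	return 0;
}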

* [PATCH 09/15] arm64: KVM: Simplify HYP init/teardown
  2016-06-07 10:58 ` Marc Zyngier
@ 2016-06-07 10:58   ` Marc Zyngier
  -1 siblings, 0 replies; 90+ messages in thread
From: Marc Zyngier @ 2016-06-07 10:58 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, linux-arm-kernel, kvmarm

Now that we only have the "merged page tables" case to deal with,
there are a number of things we can simplify in the HYP code (both
at init and teardown time).

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/include/asm/kvm_host.h | 12 ++------
 arch/arm64/kvm/hyp-init.S         | 61 +++++----------------------------------
 arch/arm64/kvm/hyp/entry.S        | 19 ------------
 arch/arm64/kvm/hyp/hyp-entry.S    | 15 ++++++++++
 arch/arm64/kvm/reset.c            | 11 -------
 5 files changed, 26 insertions(+), 92 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 49095fc..88462c3 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -48,7 +48,6 @@
 int __attribute_const__ kvm_target_cpu(void);
 int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
 int kvm_arch_dev_ioctl_check_extension(long ext);
-unsigned long kvm_hyp_reset_entry(void);
 void __extended_idmap_trampoline(phys_addr_t boot_pgd, phys_addr_t idmap_start);
 
 struct kvm_arch {
@@ -357,19 +356,14 @@ static inline void __cpu_init_hyp_mode(phys_addr_t boot_pgd_ptr,
 	 * Call initialization code, and switch to the full blown
 	 * HYP code.
 	 */
-	__kvm_call_hyp((void *)boot_pgd_ptr, pgd_ptr,
-		       hyp_stack_ptr, vector_ptr);
+	__kvm_call_hyp((void *)pgd_ptr, hyp_stack_ptr, vector_ptr);
 }
 
+void __kvm_hyp_teardown(void);
 static inline void __cpu_reset_hyp_mode(phys_addr_t boot_pgd_ptr,
 					phys_addr_t phys_idmap_start)
 {
-	/*
-	 * Call reset code, and switch back to stub hyp vectors.
-	 * Uses __kvm_call_hyp() to avoid kaslr's kvm_ksym_ref() translation.
-	 */
-	__kvm_call_hyp((void *)kvm_hyp_reset_entry(),
-		       boot_pgd_ptr, phys_idmap_start);
+	kvm_call_hyp(__kvm_hyp_teardown, phys_idmap_start);
 }
 
 static inline void kvm_arch_hardware_unsetup(void) {}
diff --git a/arch/arm64/kvm/hyp-init.S b/arch/arm64/kvm/hyp-init.S
index a873a6d..6b29d3d 100644
--- a/arch/arm64/kvm/hyp-init.S
+++ b/arch/arm64/kvm/hyp-init.S
@@ -53,10 +53,9 @@ __invalid:
 	b	.
 
 	/*
-	 * x0: HYP boot pgd
-	 * x1: HYP pgd
-	 * x2: HYP stack
-	 * x3: HYP vectors
+	 * x0: HYP pgd
+	 * x1: HYP stack
+	 * x2: HYP vectors
 	 */
 __do_hyp_init:
 
@@ -110,71 +109,27 @@ __do_hyp_init:
 	msr	sctlr_el2, x4
 	isb
 
-	/* Skip the trampoline dance if we merged the boot and runtime PGDs */
-	cmp	x0, x1
-	b.eq	merged
-
-	/* MMU is now enabled. Get ready for the trampoline dance */
-	ldr	x4, =TRAMPOLINE_VA
-	adr	x5, target
-	bfi	x4, x5, #0, #PAGE_SHIFT
-	br	x4
-
-target: /* We're now in the trampoline code, switch page tables */
-	msr	ttbr0_el2, x1
-	isb
-
-	/* Invalidate the old TLBs */
-	tlbi	alle2
-	dsb	sy
-
-merged:
 	/* Set the stack and new vectors */
+	kern_hyp_va	x1
+	mov	sp, x1
 	kern_hyp_va	x2
-	mov	sp, x2
-	kern_hyp_va	x3
-	msr	vbar_el2, x3
+	msr	vbar_el2, x2
 
 	/* Hello, World! */
 	eret
 ENDPROC(__kvm_hyp_init)
 
 	/*
-	 * Reset kvm back to the hyp stub. This is the trampoline dance in
-	 * reverse. If kvm used an extended idmap, __extended_idmap_trampoline
-	 * calls this code directly in the idmap. In this case switching to the
-	 * boot tables is a no-op.
-	 *
-	 * x0: HYP boot pgd
-	 * x1: HYP phys_idmap_start
+	 * Reset kvm back to the hyp stub.
 	 */
 ENTRY(__kvm_hyp_reset)
-	/* We're in trampoline code in VA, switch back to boot page tables */
-	msr	ttbr0_el2, x0
-	isb
-
-	/* Ensure the PA branch doesn't find a stale tlb entry or stale code. */
-	ic	iallu
-	tlbi	alle2
-	dsb	sy
-	isb
-
-	/* Branch into PA space */
-	adr	x0, 1f
-	bfi	x1, x0, #0, #PAGE_SHIFT
-	br	x1
-
 	/* We're now in idmap, disable MMU */
-1:	mrs	x0, sctlr_el2
+	mrs	x0, sctlr_el2
 	ldr	x1, =SCTLR_ELx_FLAGS
 	bic	x0, x0, x1		// Clear SCTL_M and etc
 	msr	sctlr_el2, x0
 	isb
 
-	/* Invalidate the old TLBs */
-	tlbi	alle2
-	dsb	sy
-
 	/* Install stub vectors */
 	adr_l	x0, __hyp_stub_vectors
 	msr	vbar_el2, x0
diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
index 70254a6..ce9e5e5 100644
--- a/arch/arm64/kvm/hyp/entry.S
+++ b/arch/arm64/kvm/hyp/entry.S
@@ -164,22 +164,3 @@ alternative_endif
 
 	eret
 ENDPROC(__fpsimd_guest_restore)
-
-/*
- * When using the extended idmap, we don't have a trampoline page we can use
- * while we switch pages tables during __kvm_hyp_reset. Accessing the idmap
- * directly would be ideal, but if we're using the extended idmap then the
- * idmap is located above HYP_PAGE_OFFSET, and the address will be masked by
- * kvm_call_hyp using kern_hyp_va.
- *
- * x0: HYP boot pgd
- * x1: HYP phys_idmap_start
- */
-ENTRY(__extended_idmap_trampoline)
-	mov	x4, x1
-	adr_l	x3, __kvm_hyp_reset
-
-	/* insert __kvm_hyp_reset()s offset into phys_idmap_start */
-	bfi	x4, x3, #0, #PAGE_SHIFT
-	br	x4
-ENDPROC(__extended_idmap_trampoline)
diff --git a/arch/arm64/kvm/hyp/hyp-entry.S b/arch/arm64/kvm/hyp/hyp-entry.S
index 2d87f36..f6d9694 100644
--- a/arch/arm64/kvm/hyp/hyp-entry.S
+++ b/arch/arm64/kvm/hyp/hyp-entry.S
@@ -62,6 +62,21 @@ ENTRY(__vhe_hyp_call)
 	isb
 	ret
 ENDPROC(__vhe_hyp_call)
+
+/*
+ * Compute the idmap address of __kvm_hyp_reset based on the idmap
+ * start passed as a parameter, and jump there.
+ *
+ * x0: HYP phys_idmap_start
+ */
+ENTRY(__kvm_hyp_teardown)
+	mov	x4, x0
+	adr_l	x3, __kvm_hyp_reset
+
+	/* insert __kvm_hyp_reset()s offset into phys_idmap_start */
+	bfi	x4, x3, #0, #PAGE_SHIFT
+	br	x4
+ENDPROC(__kvm_hyp_teardown)
 	
 el1_sync:				// Guest trapped into EL2
 	save_x0_to_x3
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index d044ca3..deee1b1 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -132,14 +132,3 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 	/* Reset timer */
 	return kvm_timer_vcpu_reset(vcpu, cpu_vtimer_irq);
 }
-
-unsigned long kvm_hyp_reset_entry(void)
-{
-	/*
-	 * KVM is running with merged page tables, which don't have the
-	 * trampoline page mapped. We know the idmap is still mapped,
-	 * but can't be called into directly. Use
-	 * __extended_idmap_trampoline to do the call.
-	 */
-	return (unsigned long)kvm_ksym_ref(__extended_idmap_trampoline);
-}
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 90+ messages in thread
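
The one non-obvious step is the bfi in __kvm_hyp_teardown; in C, the
computed branch target amounts to the following (a sketch with
made-up parameter names):

#define PAGE_SHIFT	12UL
#define PAGE_SIZE	(1UL << PAGE_SHIFT)

unsigned long teardown_branch_target(unsigned long phys_idmap_start,
				     unsigned long kvm_hyp_reset_va)
{
	/* keep the page frame of the idmap, splice in the in-page offset
	 * of __kvm_hyp_reset: that is the idmap address to branch to */
	return (phys_idmap_start & ~(PAGE_SIZE - 1)) |
	       (kvm_hyp_reset_va & (PAGE_SIZE - 1));
}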

@@ -164,22 +164,3 @@ alternative_endif
 
 	eret
 ENDPROC(__fpsimd_guest_restore)
-
-/*
- * When using the extended idmap, we don't have a trampoline page we can use
- * while we switch pages tables during __kvm_hyp_reset. Accessing the idmap
- * directly would be ideal, but if we're using the extended idmap then the
- * idmap is located above HYP_PAGE_OFFSET, and the address will be masked by
- * kvm_call_hyp using kern_hyp_va.
- *
- * x0: HYP boot pgd
- * x1: HYP phys_idmap_start
- */
-ENTRY(__extended_idmap_trampoline)
-	mov	x4, x1
-	adr_l	x3, __kvm_hyp_reset
-
-	/* insert __kvm_hyp_reset()s offset into phys_idmap_start */
-	bfi	x4, x3, #0, #PAGE_SHIFT
-	br	x4
-ENDPROC(__extended_idmap_trampoline)
diff --git a/arch/arm64/kvm/hyp/hyp-entry.S b/arch/arm64/kvm/hyp/hyp-entry.S
index 2d87f36..f6d9694 100644
--- a/arch/arm64/kvm/hyp/hyp-entry.S
+++ b/arch/arm64/kvm/hyp/hyp-entry.S
@@ -62,6 +62,21 @@ ENTRY(__vhe_hyp_call)
 	isb
 	ret
 ENDPROC(__vhe_hyp_call)
+
+/*
+ * Compute the idmap address of __kvm_hyp_reset based on the idmap
+ * start passed as a parameter, and jump there.
+ *
+ * x0: HYP phys_idmap_start
+ */
+ENTRY(__kvm_hyp_teardown)
+	mov	x4, x0
+	adr_l	x3, __kvm_hyp_reset
+
+	/* insert __kvm_hyp_reset()s offset into phys_idmap_start */
+	bfi	x4, x3, #0, #PAGE_SHIFT
+	br	x4
+ENDPROC(__kvm_hyp_teardown)
 	
 el1_sync:				// Guest trapped into EL2
 	save_x0_to_x3
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index d044ca3..deee1b1 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -132,14 +132,3 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 	/* Reset timer */
 	return kvm_timer_vcpu_reset(vcpu, cpu_vtimer_irq);
 }
-
-unsigned long kvm_hyp_reset_entry(void)
-{
-	/*
-	 * KVM is running with merged page tables, which don't have the
-	 * trampoline page mapped. We know the idmap is still mapped,
-	 * but can't be called into directly. Use
-	 * __extended_idmap_trampoline to do the call.
-	 */
-	return (unsigned long)kvm_ksym_ref(__extended_idmap_trampoline);
-}
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [PATCH 10/15] arm/arm64: KVM: Drop boot_pgd
  2016-06-07 10:58 ` Marc Zyngier
@ 2016-06-07 10:58   ` Marc Zyngier
  -1 siblings, 0 replies; 90+ messages in thread
From: Marc Zyngier @ 2016-06-07 10:58 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, linux-arm-kernel, kvmarm

Since we now only have one set of page tables, the concept of
boot_pgd is useless and can be removed. We still keep it as
an element of the "extended idmap" thing.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_host.h   |  8 +++-----
 arch/arm/include/asm/kvm_mmu.h    |  1 -
 arch/arm/kvm/arm.c                | 15 +++------------
 arch/arm/kvm/mmu.c                |  8 --------
 arch/arm64/include/asm/kvm_host.h |  6 ++----
 arch/arm64/include/asm/kvm_mmu.h  |  1 -
 6 files changed, 8 insertions(+), 31 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 96387d4..020f4eb 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -241,8 +241,7 @@ int kvm_arm_coproc_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *);
 int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
 		int exception_index);
 
-static inline void __cpu_init_hyp_mode(phys_addr_t boot_pgd_ptr,
-				       phys_addr_t pgd_ptr,
+static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr,
 				       unsigned long hyp_stack_ptr,
 				       unsigned long vector_ptr)
 {
@@ -272,12 +271,11 @@ static inline void __cpu_init_stage2(void)
 	kvm_call_hyp(__init_stage2_translation);
 }
 
-static inline void __cpu_reset_hyp_mode(phys_addr_t boot_pgd_ptr,
-					phys_addr_t phys_idmap_start)
+static inline void __cpu_reset_hyp_mode(phys_addr_t phys_idmap_start)
 {
 	/*
 	 * TODO
-	 * kvm_call_reset(boot_pgd_ptr, phys_idmap_start);
+	 * kvm_call_reset(phys_idmap_start);
 	 */
 }
 
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index f9a6506..898e7c8 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -65,7 +65,6 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run);
 void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu);
 
 phys_addr_t kvm_mmu_get_httbr(void);
-phys_addr_t kvm_mmu_get_boot_httbr(void);
 phys_addr_t kvm_get_idmap_vector(void);
 phys_addr_t kvm_get_idmap_start(void);
 int kvm_mmu_init(void);
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 893941e..7f424fc 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -1038,7 +1038,6 @@ long kvm_arch_vm_ioctl(struct file *filp,
 
 static void cpu_init_hyp_mode(void *dummy)
 {
-	phys_addr_t boot_pgd_ptr;
 	phys_addr_t pgd_ptr;
 	unsigned long hyp_stack_ptr;
 	unsigned long stack_page;
@@ -1047,13 +1046,12 @@ static void cpu_init_hyp_mode(void *dummy)
 	/* Switch from the HYP stub to our own HYP init vector */
 	__hyp_set_vectors(kvm_get_idmap_vector());
 
-	boot_pgd_ptr = kvm_mmu_get_boot_httbr();
 	pgd_ptr = kvm_mmu_get_httbr();
 	stack_page = __this_cpu_read(kvm_arm_hyp_stack_page);
 	hyp_stack_ptr = stack_page + PAGE_SIZE;
 	vector_ptr = (unsigned long)kvm_ksym_ref(__kvm_hyp_vector);
 
-	__cpu_init_hyp_mode(boot_pgd_ptr, pgd_ptr, hyp_stack_ptr, vector_ptr);
+	__cpu_init_hyp_mode(pgd_ptr, hyp_stack_ptr, vector_ptr);
 	__cpu_init_stage2();
 
 	kvm_arm_init_debug();
@@ -1075,15 +1073,8 @@ static void cpu_hyp_reinit(void)
 
 static void cpu_hyp_reset(void)
 {
-	phys_addr_t boot_pgd_ptr;
-	phys_addr_t phys_idmap_start;
-
-	if (!is_kernel_in_hyp_mode()) {
-		boot_pgd_ptr = kvm_mmu_get_boot_httbr();
-		phys_idmap_start = kvm_get_idmap_start();
-
-		__cpu_reset_hyp_mode(boot_pgd_ptr, phys_idmap_start);
-	}
+	if (!is_kernel_in_hyp_mode())
+		__cpu_reset_hyp_mode(kvm_get_idmap_start());
 }
 
 static void _kvm_arch_hardware_enable(void *discard)
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 9a17e14..647dbb2 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -1684,14 +1684,6 @@ phys_addr_t kvm_mmu_get_httbr(void)
 		return virt_to_phys(hyp_pgd);
 }
 
-phys_addr_t kvm_mmu_get_boot_httbr(void)
-{
-	if (__kvm_cpu_uses_extended_idmap())
-		return virt_to_phys(merged_hyp_pgd);
-	else
-		return virt_to_phys(hyp_pgd);
-}
-
 phys_addr_t kvm_get_idmap_vector(void)
 {
 	return hyp_idmap_vector;
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 88462c3..6731d4e 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -347,8 +347,7 @@ int kvm_perf_teardown(void);
 
 struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
 
-static inline void __cpu_init_hyp_mode(phys_addr_t boot_pgd_ptr,
-				       phys_addr_t pgd_ptr,
+static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr,
 				       unsigned long hyp_stack_ptr,
 				       unsigned long vector_ptr)
 {
@@ -360,8 +359,7 @@ static inline void __cpu_init_hyp_mode(phys_addr_t boot_pgd_ptr,
 }
 
 void __kvm_hyp_teardown(void);
-static inline void __cpu_reset_hyp_mode(phys_addr_t boot_pgd_ptr,
-					phys_addr_t phys_idmap_start)
+static inline void __cpu_reset_hyp_mode(phys_addr_t phys_idmap_start)
 {
 	kvm_call_hyp(__kvm_hyp_teardown, phys_idmap_start);
 }
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 889330b..7a23e18 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -175,7 +175,6 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run);
 void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu);
 
 phys_addr_t kvm_mmu_get_httbr(void);
-phys_addr_t kvm_mmu_get_boot_httbr(void);
 phys_addr_t kvm_get_idmap_vector(void);
 phys_addr_t kvm_get_idmap_start(void);
 int kvm_mmu_init(void);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [PATCH 10/15] arm/arm64: KVM: Drop boot_pgd
@ 2016-06-07 10:58   ` Marc Zyngier
  0 siblings, 0 replies; 90+ messages in thread
From: Marc Zyngier @ 2016-06-07 10:58 UTC (permalink / raw)
  To: linux-arm-kernel

Since we now only have one set of page tables, the concept of
boot_pgd is useless and can be removed. We still keep it as
an element of the "extended idmap" thing.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_host.h   |  8 +++-----
 arch/arm/include/asm/kvm_mmu.h    |  1 -
 arch/arm/kvm/arm.c                | 15 +++------------
 arch/arm/kvm/mmu.c                |  8 --------
 arch/arm64/include/asm/kvm_host.h |  6 ++----
 arch/arm64/include/asm/kvm_mmu.h  |  1 -
 6 files changed, 8 insertions(+), 31 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 96387d4..020f4eb 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -241,8 +241,7 @@ int kvm_arm_coproc_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *);
 int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
 		int exception_index);
 
-static inline void __cpu_init_hyp_mode(phys_addr_t boot_pgd_ptr,
-				       phys_addr_t pgd_ptr,
+static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr,
 				       unsigned long hyp_stack_ptr,
 				       unsigned long vector_ptr)
 {
@@ -272,12 +271,11 @@ static inline void __cpu_init_stage2(void)
 	kvm_call_hyp(__init_stage2_translation);
 }
 
-static inline void __cpu_reset_hyp_mode(phys_addr_t boot_pgd_ptr,
-					phys_addr_t phys_idmap_start)
+static inline void __cpu_reset_hyp_mode(phys_addr_t phys_idmap_start)
 {
 	/*
 	 * TODO
-	 * kvm_call_reset(boot_pgd_ptr, phys_idmap_start);
+	 * kvm_call_reset(phys_idmap_start);
 	 */
 }
 
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index f9a6506..898e7c8 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -65,7 +65,6 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run);
 void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu);
 
 phys_addr_t kvm_mmu_get_httbr(void);
-phys_addr_t kvm_mmu_get_boot_httbr(void);
 phys_addr_t kvm_get_idmap_vector(void);
 phys_addr_t kvm_get_idmap_start(void);
 int kvm_mmu_init(void);
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 893941e..7f424fc 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -1038,7 +1038,6 @@ long kvm_arch_vm_ioctl(struct file *filp,
 
 static void cpu_init_hyp_mode(void *dummy)
 {
-	phys_addr_t boot_pgd_ptr;
 	phys_addr_t pgd_ptr;
 	unsigned long hyp_stack_ptr;
 	unsigned long stack_page;
@@ -1047,13 +1046,12 @@ static void cpu_init_hyp_mode(void *dummy)
 	/* Switch from the HYP stub to our own HYP init vector */
 	__hyp_set_vectors(kvm_get_idmap_vector());
 
-	boot_pgd_ptr = kvm_mmu_get_boot_httbr();
 	pgd_ptr = kvm_mmu_get_httbr();
 	stack_page = __this_cpu_read(kvm_arm_hyp_stack_page);
 	hyp_stack_ptr = stack_page + PAGE_SIZE;
 	vector_ptr = (unsigned long)kvm_ksym_ref(__kvm_hyp_vector);
 
-	__cpu_init_hyp_mode(boot_pgd_ptr, pgd_ptr, hyp_stack_ptr, vector_ptr);
+	__cpu_init_hyp_mode(pgd_ptr, hyp_stack_ptr, vector_ptr);
 	__cpu_init_stage2();
 
 	kvm_arm_init_debug();
@@ -1075,15 +1073,8 @@ static void cpu_hyp_reinit(void)
 
 static void cpu_hyp_reset(void)
 {
-	phys_addr_t boot_pgd_ptr;
-	phys_addr_t phys_idmap_start;
-
-	if (!is_kernel_in_hyp_mode()) {
-		boot_pgd_ptr = kvm_mmu_get_boot_httbr();
-		phys_idmap_start = kvm_get_idmap_start();
-
-		__cpu_reset_hyp_mode(boot_pgd_ptr, phys_idmap_start);
-	}
+	if (!is_kernel_in_hyp_mode())
+		__cpu_reset_hyp_mode(kvm_get_idmap_start());
 }
 
 static void _kvm_arch_hardware_enable(void *discard)
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 9a17e14..647dbb2 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -1684,14 +1684,6 @@ phys_addr_t kvm_mmu_get_httbr(void)
 		return virt_to_phys(hyp_pgd);
 }
 
-phys_addr_t kvm_mmu_get_boot_httbr(void)
-{
-	if (__kvm_cpu_uses_extended_idmap())
-		return virt_to_phys(merged_hyp_pgd);
-	else
-		return virt_to_phys(hyp_pgd);
-}
-
 phys_addr_t kvm_get_idmap_vector(void)
 {
 	return hyp_idmap_vector;
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 88462c3..6731d4e 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -347,8 +347,7 @@ int kvm_perf_teardown(void);
 
 struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
 
-static inline void __cpu_init_hyp_mode(phys_addr_t boot_pgd_ptr,
-				       phys_addr_t pgd_ptr,
+static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr,
 				       unsigned long hyp_stack_ptr,
 				       unsigned long vector_ptr)
 {
@@ -360,8 +359,7 @@ static inline void __cpu_init_hyp_mode(phys_addr_t boot_pgd_ptr,
 }
 
 void __kvm_hyp_teardown(void);
-static inline void __cpu_reset_hyp_mode(phys_addr_t boot_pgd_ptr,
-					phys_addr_t phys_idmap_start)
+static inline void __cpu_reset_hyp_mode(phys_addr_t phys_idmap_start)
 {
 	kvm_call_hyp(__kvm_hyp_teardown, phys_idmap_start);
 }
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 889330b..7a23e18 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -175,7 +175,6 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run);
 void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu);
 
 phys_addr_t kvm_mmu_get_httbr(void);
-phys_addr_t kvm_mmu_get_boot_httbr(void);
 phys_addr_t kvm_get_idmap_vector(void);
 phys_addr_t kvm_get_idmap_start(void);
 int kvm_mmu_init(void);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [PATCH 11/15] arm/arm64: KVM: Kill free_boot_hyp_pgd
  2016-06-07 10:58 ` Marc Zyngier
@ 2016-06-07 10:58   ` Marc Zyngier
  -1 siblings, 0 replies; 90+ messages in thread
From: Marc Zyngier @ 2016-06-07 10:58 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, linux-arm-kernel, kvmarm

There is no way to free the boot PGD, because it doesn't exist
anymore as a standalone entity.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_mmu.h   |  1 -
 arch/arm/kvm/arm.c               |  4 ----
 arch/arm/kvm/mmu.c               | 30 +++++++-----------------------
 arch/arm64/include/asm/kvm_mmu.h |  1 -
 4 files changed, 7 insertions(+), 29 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 898e7c8..ea32d39 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -51,7 +51,6 @@
 
 int create_hyp_mappings(void *from, void *to);
 int create_hyp_io_mappings(void *from, void *to, phys_addr_t);
-void free_boot_hyp_pgd(void);
 void free_hyp_pgds(void);
 
 void stage2_unmap_vm(struct kvm *kvm);
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 7f424fc..ca5ac1a 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -1322,10 +1322,6 @@ static int init_hyp_mode(void)
 		}
 	}
 
-#ifndef CONFIG_HOTPLUG_CPU
-	free_boot_hyp_pgd();
-#endif
-
 	/* set size of VMID supported by CPU */
 	kvm_vmid_bits = kvm_get_vmid_bits();
 	kvm_info("%d-bit VMID\n", kvm_vmid_bits);
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 647dbb2..46b8604 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -482,27 +482,6 @@ static void unmap_hyp_range(pgd_t *pgdp, phys_addr_t start, u64 size)
 }
 
 /**
- * free_boot_hyp_pgd - free HYP boot page tables
- *
- * Free the HYP boot page tables. The bounce page is also freed.
- */
-void free_boot_hyp_pgd(void)
-{
-	mutex_lock(&kvm_hyp_pgd_mutex);
-
-	if (boot_hyp_pgd) {
-		unmap_hyp_range(boot_hyp_pgd, hyp_idmap_start, PAGE_SIZE);
-		free_pages((unsigned long)boot_hyp_pgd, hyp_pgd_order);
-		boot_hyp_pgd = NULL;
-	}
-
-	if (hyp_pgd)
-		unmap_hyp_range(hyp_pgd, hyp_idmap_start, PAGE_SIZE);
-
-	mutex_unlock(&kvm_hyp_pgd_mutex);
-}
-
-/**
  * free_hyp_pgds - free Hyp-mode page tables
  *
  * Assumes hyp_pgd is a page table used strictly in Hyp-mode and
@@ -516,11 +495,16 @@ void free_hyp_pgds(void)
 {
 	unsigned long addr;
 
-	free_boot_hyp_pgd();
-
 	mutex_lock(&kvm_hyp_pgd_mutex);
 
+	if (boot_hyp_pgd) {
+		unmap_hyp_range(boot_hyp_pgd, hyp_idmap_start, PAGE_SIZE);
+		free_pages((unsigned long)boot_hyp_pgd, hyp_pgd_order);
+		boot_hyp_pgd = NULL;
+	}
+
 	if (hyp_pgd) {
+		unmap_hyp_range(hyp_pgd, hyp_idmap_start, PAGE_SIZE);
 		for (addr = PAGE_OFFSET; virt_addr_valid(addr); addr += PGDIR_SIZE)
 			unmap_hyp_range(hyp_pgd, KERN_TO_HYP(addr), PGDIR_SIZE);
 		for (addr = VMALLOC_START; is_vmalloc_addr((void*)addr); addr += PGDIR_SIZE)
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 7a23e18..3c4dd4e 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -161,7 +161,6 @@ static inline unsigned long __kern_hyp_va(unsigned long v)
 
 int create_hyp_mappings(void *from, void *to);
 int create_hyp_io_mappings(void *from, void *to, phys_addr_t);
-void free_boot_hyp_pgd(void);
 void free_hyp_pgds(void);
 
 void stage2_unmap_vm(struct kvm *kvm);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [PATCH 11/15] arm/arm64: KVM: Kill free_boot_hyp_pgd
@ 2016-06-07 10:58   ` Marc Zyngier
  0 siblings, 0 replies; 90+ messages in thread
From: Marc Zyngier @ 2016-06-07 10:58 UTC (permalink / raw)
  To: linux-arm-kernel

There is no way to free the boot PGD, because it doesn't exist
anymore as a standalone entity.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_mmu.h   |  1 -
 arch/arm/kvm/arm.c               |  4 ----
 arch/arm/kvm/mmu.c               | 30 +++++++-----------------------
 arch/arm64/include/asm/kvm_mmu.h |  1 -
 4 files changed, 7 insertions(+), 29 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 898e7c8..ea32d39 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -51,7 +51,6 @@
 
 int create_hyp_mappings(void *from, void *to);
 int create_hyp_io_mappings(void *from, void *to, phys_addr_t);
-void free_boot_hyp_pgd(void);
 void free_hyp_pgds(void);
 
 void stage2_unmap_vm(struct kvm *kvm);
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 7f424fc..ca5ac1a 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -1322,10 +1322,6 @@ static int init_hyp_mode(void)
 		}
 	}
 
-#ifndef CONFIG_HOTPLUG_CPU
-	free_boot_hyp_pgd();
-#endif
-
 	/* set size of VMID supported by CPU */
 	kvm_vmid_bits = kvm_get_vmid_bits();
 	kvm_info("%d-bit VMID\n", kvm_vmid_bits);
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 647dbb2..46b8604 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -482,27 +482,6 @@ static void unmap_hyp_range(pgd_t *pgdp, phys_addr_t start, u64 size)
 }
 
 /**
- * free_boot_hyp_pgd - free HYP boot page tables
- *
- * Free the HYP boot page tables. The bounce page is also freed.
- */
-void free_boot_hyp_pgd(void)
-{
-	mutex_lock(&kvm_hyp_pgd_mutex);
-
-	if (boot_hyp_pgd) {
-		unmap_hyp_range(boot_hyp_pgd, hyp_idmap_start, PAGE_SIZE);
-		free_pages((unsigned long)boot_hyp_pgd, hyp_pgd_order);
-		boot_hyp_pgd = NULL;
-	}
-
-	if (hyp_pgd)
-		unmap_hyp_range(hyp_pgd, hyp_idmap_start, PAGE_SIZE);
-
-	mutex_unlock(&kvm_hyp_pgd_mutex);
-}
-
-/**
  * free_hyp_pgds - free Hyp-mode page tables
  *
  * Assumes hyp_pgd is a page table used strictly in Hyp-mode and
@@ -516,11 +495,16 @@ void free_hyp_pgds(void)
 {
 	unsigned long addr;
 
-	free_boot_hyp_pgd();
-
 	mutex_lock(&kvm_hyp_pgd_mutex);
 
+	if (boot_hyp_pgd) {
+		unmap_hyp_range(boot_hyp_pgd, hyp_idmap_start, PAGE_SIZE);
+		free_pages((unsigned long)boot_hyp_pgd, hyp_pgd_order);
+		boot_hyp_pgd = NULL;
+	}
+
 	if (hyp_pgd) {
+		unmap_hyp_range(hyp_pgd, hyp_idmap_start, PAGE_SIZE);
 		for (addr = PAGE_OFFSET; virt_addr_valid(addr); addr += PGDIR_SIZE)
 			unmap_hyp_range(hyp_pgd, KERN_TO_HYP(addr), PGDIR_SIZE);
 		for (addr = VMALLOC_START; is_vmalloc_addr((void*)addr); addr += PGDIR_SIZE)
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 7a23e18..3c4dd4e 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -161,7 +161,6 @@ static inline unsigned long __kern_hyp_va(unsigned long v)
 
 int create_hyp_mappings(void *from, void *to);
 int create_hyp_io_mappings(void *from, void *to, phys_addr_t);
-void free_boot_hyp_pgd(void);
 void free_hyp_pgds(void);
 
 void stage2_unmap_vm(struct kvm *kvm);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [PATCH 12/15] arm: KVM: Simplify HYP init
  2016-06-07 10:58 ` Marc Zyngier
@ 2016-06-07 10:58   ` Marc Zyngier
  -1 siblings, 0 replies; 90+ messages in thread
From: Marc Zyngier @ 2016-06-07 10:58 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, linux-arm-kernel, kvmarm

Just like for arm64, we can now make the HYP setup a lot simpler,
and initialise it in one go (instead of the two phases we
currently have).
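
To illustrate the "r2-r3" point in the comment below: with LPAE, the
PGD pointer is a 64bit phys_addr_t, and per the AAPCS a 64bit argument
passed as the third parameter lands in the even/odd pair r2/r3 (low
word in r2 on a little-endian kernel, which is what rr_lo_hi() deals
with). A rough user-space sketch, using a made-up PGD address and
assuming little-endian:

	#include <stdio.h>

	int main(void)
	{
		/* hypothetical PGD physical address, above 4GB to show the split */
		unsigned long long pgd_ptr = 0x00000001badc0000ULL;

		unsigned int r2 = (unsigned int)(pgd_ptr & 0xffffffffULL); /* low word  */
		unsigned int r3 = (unsigned int)(pgd_ptr >> 32);           /* high word */

		printf("r2 = 0x%08x, r3 = 0x%08x\n", r2, r3);
		return 0;
	}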

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_host.h | 15 +++++--------
 arch/arm/kvm/init.S             | 49 ++++++++---------------------------------
 2 files changed, 14 insertions(+), 50 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 020f4eb..eafbfd5 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -250,18 +250,13 @@ static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr,
 	 * code. The init code doesn't need to preserve these
 	 * registers as r0-r3 are already callee saved according to
 	 * the AAPCS.
-	 * Note that we slightly misuse the prototype by casing the
+	 * Note that we slightly misuse the prototype by casting the
 	 * stack pointer to a void *.
-	 *
-	 * We don't have enough registers to perform the full init in
-	 * one go.  Install the boot PGD first, and then install the
-	 * runtime PGD, stack pointer and vectors. The PGDs are always
-	 * passed as the third argument, in order to be passed into
-	 * r2-r3 to the init code (yes, this is compliant with the
-	 * PCS!).
-	 */
 
-	kvm_call_hyp(NULL, 0, boot_pgd_ptr);
+	 * The PGDs are always passed as the third argument, in order
+	 * to be passed into r2-r3 to the init code (yes, this is
+	 * compliant with the PCS!).
+	 */
 
 	kvm_call_hyp((void*)hyp_stack_ptr, vector_ptr, pgd_ptr);
 }
diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
index 1f9ae17..b82a99d 100644
--- a/arch/arm/kvm/init.S
+++ b/arch/arm/kvm/init.S
@@ -32,23 +32,13 @@
  *       r2,r3 = Hypervisor pgd pointer
  *
  * The init scenario is:
- * - We jump in HYP with four parameters: boot HYP pgd, runtime HYP pgd,
- *   runtime stack, runtime vectors
- * - Enable the MMU with the boot pgd
- * - Jump to a target into the trampoline page (remember, this is the same
- *   physical page!)
- * - Now switch to the runtime pgd (same VA, and still the same physical
- *   page!)
+ * - We jump in HYP with 3 parameters: runtime HYP pgd, runtime stack,
+ *   runtime vectors
  * - Invalidate TLBs
  * - Set stack and vectors
+ * - Setup the page tables
+ * - Enable the MMU
  * - Profit! (or eret, if you only care about the code).
- *
- * As we only have four registers available to pass parameters (and we
- * need six), we split the init in two phases:
- * - Phase 1: r0 = 0, r1 = 0, r2,r3 contain the boot PGD.
- *   Provides the basic HYP init, and enable the MMU.
- * - Phase 2: r0 = ToS, r1 = vectors, r2,r3 contain the runtime PGD.
- *   Switches to the runtime PGD, set stack and vectors.
  */
 
 	.text
@@ -68,8 +58,11 @@ __kvm_hyp_init:
 	W(b)	.
 
 __do_hyp_init:
-	cmp	r0, #0			@ We have a SP?
-	bne	phase2			@ Yes, second stage init
+	@ Set stack pointer
+	mov	sp, r0
+
+	@ Set HVBAR to point to the HYP vectors
+	mcr	p15, 4, r1, c12, c0, 0	@ HVBAR
 
 	@ Set the HTTBR to point to the hypervisor PGD pointer passed
 	mcrr	p15, 4, rr_lo_hi(r2, r3), c2
@@ -114,33 +107,9 @@ __do_hyp_init:
  THUMB(	ldr	r2, =(HSCTLR_M | HSCTLR_A | HSCTLR_TE)		)
 	orr	r1, r1, r2
 	orr	r0, r0, r1
-	isb
 	mcr	p15, 4, r0, c1, c0, 0	@ HSCR
-
-	@ End of init phase-1
-	eret
-
-phase2:
-	@ Set stack pointer
-	mov	sp, r0
-
-	@ Set HVBAR to point to the HYP vectors
-	mcr	p15, 4, r1, c12, c0, 0	@ HVBAR
-
-	@ Jump to the trampoline page
-	ldr	r0, =TRAMPOLINE_VA
-	adr	r1, target
-	bfi	r0, r1, #0, #PAGE_SHIFT
-	ret	r0
-
-target:	@ We're now in the trampoline code, switch page tables
-	mcrr	p15, 4, rr_lo_hi(r2, r3), c2
 	isb
 
-	@ Invalidate the old TLBs
-	mcr	p15, 4, r0, c8, c7, 0	@ TLBIALLH
-	dsb	ish
-
 	eret
 
 	.ltorg
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [PATCH 12/15] arm: KVM: Simplify HYP init
@ 2016-06-07 10:58   ` Marc Zyngier
  0 siblings, 0 replies; 90+ messages in thread
From: Marc Zyngier @ 2016-06-07 10:58 UTC (permalink / raw)
  To: linux-arm-kernel

Just like for arm64, we can now make the HYP setup a lot simpler,
and initialise it in one go (instead of the two phases we
currently have).

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_host.h | 15 +++++--------
 arch/arm/kvm/init.S             | 49 ++++++++---------------------------------
 2 files changed, 14 insertions(+), 50 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 020f4eb..eafbfd5 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -250,18 +250,13 @@ static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr,
 	 * code. The init code doesn't need to preserve these
 	 * registers as r0-r3 are already callee saved according to
 	 * the AAPCS.
-	 * Note that we slightly misuse the prototype by casing the
+	 * Note that we slightly misuse the prototype by casting the
 	 * stack pointer to a void *.
-	 *
-	 * We don't have enough registers to perform the full init in
-	 * one go.  Install the boot PGD first, and then install the
-	 * runtime PGD, stack pointer and vectors. The PGDs are always
-	 * passed as the third argument, in order to be passed into
-	 * r2-r3 to the init code (yes, this is compliant with the
-	 * PCS!).
-	 */
 
-	kvm_call_hyp(NULL, 0, boot_pgd_ptr);
+	 * The PGDs are always passed as the third argument, in order
+	 * to be passed into r2-r3 to the init code (yes, this is
+	 * compliant with the PCS!).
+	 */
 
 	kvm_call_hyp((void*)hyp_stack_ptr, vector_ptr, pgd_ptr);
 }
diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
index 1f9ae17..b82a99d 100644
--- a/arch/arm/kvm/init.S
+++ b/arch/arm/kvm/init.S
@@ -32,23 +32,13 @@
  *       r2,r3 = Hypervisor pgd pointer
  *
  * The init scenario is:
- * - We jump in HYP with four parameters: boot HYP pgd, runtime HYP pgd,
- *   runtime stack, runtime vectors
- * - Enable the MMU with the boot pgd
- * - Jump to a target into the trampoline page (remember, this is the same
- *   physical page!)
- * - Now switch to the runtime pgd (same VA, and still the same physical
- *   page!)
+ * - We jump in HYP with 3 parameters: runtime HYP pgd, runtime stack,
+ *   runtime vectors
  * - Invalidate TLBs
  * - Set stack and vectors
+ * - Setup the page tables
+ * - Enable the MMU
  * - Profit! (or eret, if you only care about the code).
- *
- * As we only have four registers available to pass parameters (and we
- * need six), we split the init in two phases:
- * - Phase 1: r0 = 0, r1 = 0, r2,r3 contain the boot PGD.
- *   Provides the basic HYP init, and enable the MMU.
- * - Phase 2: r0 = ToS, r1 = vectors, r2,r3 contain the runtime PGD.
- *   Switches to the runtime PGD, set stack and vectors.
  */
 
 	.text
@@ -68,8 +58,11 @@ __kvm_hyp_init:
 	W(b)	.
 
 __do_hyp_init:
-	cmp	r0, #0			@ We have a SP?
-	bne	phase2			@ Yes, second stage init
+	@ Set stack pointer
+	mov	sp, r0
+
+	@ Set HVBAR to point to the HYP vectors
+	mcr	p15, 4, r1, c12, c0, 0	@ HVBAR
 
 	@ Set the HTTBR to point to the hypervisor PGD pointer passed
 	mcrr	p15, 4, rr_lo_hi(r2, r3), c2
@@ -114,33 +107,9 @@ __do_hyp_init:
  THUMB(	ldr	r2, =(HSCTLR_M | HSCTLR_A | HSCTLR_TE)		)
 	orr	r1, r1, r2
 	orr	r0, r0, r1
-	isb
 	mcr	p15, 4, r0, c1, c0, 0	@ HSCR
-
-	@ End of init phase-1
-	eret
-
-phase2:
-	@ Set stack pointer
-	mov	sp, r0
-
-	@ Set HVBAR to point to the HYP vectors
-	mcr	p15, 4, r1, c12, c0, 0	@ HVBAR
-
-	@ Jump to the trampoline page
-	ldr	r0, =TRAMPOLINE_VA
-	adr	r1, target
-	bfi	r0, r1, #0, #PAGE_SHIFT
-	ret	r0
-
-target:	@ We're now in the trampoline code, switch page tables
-	mcrr	p15, 4, rr_lo_hi(r2, r3), c2
 	isb
 
-	@ Invalidate the old TLBs
-	mcr	p15, 4, r0, c8, c7, 0	@ TLBIALLH
-	dsb	ish
-
 	eret
 
 	.ltorg
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [PATCH 13/15] arm: KVM: Allow hyp teardown
  2016-06-07 10:58 ` Marc Zyngier
@ 2016-06-07 10:58   ` Marc Zyngier
  -1 siblings, 0 replies; 90+ messages in thread
From: Marc Zyngier @ 2016-06-07 10:58 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: linux-arm-kernel, kvm, kvmarm

So far, KVM was getting in the way of kexec on 32bit (and the arm64
kexec hackers couldn't be bothered to fix it on 32bit...).

With simpler page tables, tearing KVM down becomes very easy, so
let's just do it.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_asm.h    |  2 ++
 arch/arm/include/asm/kvm_host.h   |  8 +++-----
 arch/arm/kvm/arm.c                |  3 ++-
 arch/arm/kvm/init.S               | 15 +++++++++++++++
 arch/arm64/include/asm/kvm_host.h |  3 ++-
 5 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 3d5a5cd..58faff5 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -66,6 +66,8 @@ extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
 
 extern void __init_stage2_translation(void);
+
+extern void __kvm_hyp_reset(unsigned long);
 #endif
 
 #endif /* __ARM_KVM_ASM_H__ */
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index eafbfd5..58d0b69 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -266,12 +266,10 @@ static inline void __cpu_init_stage2(void)
 	kvm_call_hyp(__init_stage2_translation);
 }
 
-static inline void __cpu_reset_hyp_mode(phys_addr_t phys_idmap_start)
+static inline void __cpu_reset_hyp_mode(unsigned long vector_ptr,
+					phys_addr_t phys_idmap_start)
 {
-	/*
-	 * TODO
-	 * kvm_call_reset(phys_idmap_start);
-	 */
+	kvm_call_hyp((void *)virt_to_idmap(__kvm_hyp_reset), vector_ptr);
 }
 
 static inline int kvm_arch_dev_ioctl_check_extension(long ext)
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index ca5ac1a..fe20532 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -1074,7 +1074,8 @@ static void cpu_hyp_reinit(void)
 static void cpu_hyp_reset(void)
 {
 	if (!is_kernel_in_hyp_mode())
-		__cpu_reset_hyp_mode(kvm_get_idmap_start());
+		__cpu_reset_hyp_mode(hyp_default_vectors,
+				     kvm_get_idmap_start());
 }
 
 static void _kvm_arch_hardware_enable(void *discard)
diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
index b82a99d..bf89c91 100644
--- a/arch/arm/kvm/init.S
+++ b/arch/arm/kvm/init.S
@@ -112,6 +112,21 @@ __do_hyp_init:
 
 	eret
 
+	@ r0 : stub vectors address
+ENTRY(__kvm_hyp_reset)
+	/* We're now in idmap, disable MMU */
+	mrc	p15, 4, r1, c1, c0, 0	@ HSCTLR
+	ldr	r2, =(HSCTLR_M | HSCTLR_A | HSCTLR_C | HSCTLR_I)
+	bic	r1, r1, r2
+	mcr	p15, 4, r1, c1, c0, 0	@ HSCTLR
+
+	/* Install stub vectors */
+	mcr	p15, 4, r0, c12, c0, 0	@ HVBAR
+	isb
+
+	eret
+ENDPROC(__kvm_hyp_reset)
+
 	.ltorg
 
 	.globl __kvm_hyp_init_end
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 6731d4e..69d5cc2d 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -359,7 +359,8 @@ static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr,
 }
 
 void __kvm_hyp_teardown(void);
-static inline void __cpu_reset_hyp_mode(phys_addr_t phys_idmap_start)
+static inline void __cpu_reset_hyp_mode(unsigned long vector_ptr,
+					phys_addr_t phys_idmap_start)
 {
 	kvm_call_hyp(__kvm_hyp_teardown, phys_idmap_start);
 }
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [PATCH 13/15] arm: KVM: Allow hyp teardown
@ 2016-06-07 10:58   ` Marc Zyngier
  0 siblings, 0 replies; 90+ messages in thread
From: Marc Zyngier @ 2016-06-07 10:58 UTC (permalink / raw)
  To: linux-arm-kernel

So far, KVM was getting in the way of kexec on 32bit (and the arm64
kexec hackers couldn't be bothered to fix it on 32bit...).

With simpler page tables, tearing KVM down becomes very easy, so
let's just do it.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_asm.h    |  2 ++
 arch/arm/include/asm/kvm_host.h   |  8 +++-----
 arch/arm/kvm/arm.c                |  3 ++-
 arch/arm/kvm/init.S               | 15 +++++++++++++++
 arch/arm64/include/asm/kvm_host.h |  3 ++-
 5 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 3d5a5cd..58faff5 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -66,6 +66,8 @@ extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
 
 extern void __init_stage2_translation(void);
+
+extern void __kvm_hyp_reset(unsigned long);
 #endif
 
 #endif /* __ARM_KVM_ASM_H__ */
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index eafbfd5..58d0b69 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -266,12 +266,10 @@ static inline void __cpu_init_stage2(void)
 	kvm_call_hyp(__init_stage2_translation);
 }
 
-static inline void __cpu_reset_hyp_mode(phys_addr_t phys_idmap_start)
+static inline void __cpu_reset_hyp_mode(unsigned long vector_ptr,
+					phys_addr_t phys_idmap_start)
 {
-	/*
-	 * TODO
-	 * kvm_call_reset(phys_idmap_start);
-	 */
+	kvm_call_hyp((void *)virt_to_idmap(__kvm_hyp_reset), vector_ptr);
 }
 
 static inline int kvm_arch_dev_ioctl_check_extension(long ext)
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index ca5ac1a..fe20532 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -1074,7 +1074,8 @@ static void cpu_hyp_reinit(void)
 static void cpu_hyp_reset(void)
 {
 	if (!is_kernel_in_hyp_mode())
-		__cpu_reset_hyp_mode(kvm_get_idmap_start());
+		__cpu_reset_hyp_mode(hyp_default_vectors,
+				     kvm_get_idmap_start());
 }
 
 static void _kvm_arch_hardware_enable(void *discard)
diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
index b82a99d..bf89c91 100644
--- a/arch/arm/kvm/init.S
+++ b/arch/arm/kvm/init.S
@@ -112,6 +112,21 @@ __do_hyp_init:
 
 	eret
 
+	@ r0 : stub vectors address
+ENTRY(__kvm_hyp_reset)
+	/* We're now in idmap, disable MMU */
+	mrc	p15, 4, r1, c1, c0, 0	@ HSCTLR
+	ldr	r2, =(HSCTLR_M | HSCTLR_A | HSCTLR_C | HSCTLR_I)
+	bic	r1, r1, r2
+	mcr	p15, 4, r1, c1, c0, 0	@ HSCTLR
+
+	/* Install stub vectors */
+	mcr	p15, 4, r0, c12, c0, 0	@ HVBAR
+	isb
+
+	eret
+ENDPROC(__kvm_hyp_reset)
+
 	.ltorg
 
 	.globl __kvm_hyp_init_end
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 6731d4e..69d5cc2d 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -359,7 +359,8 @@ static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr,
 }
 
 void __kvm_hyp_teardown(void);
-static inline void __cpu_reset_hyp_mode(phys_addr_t phys_idmap_start)
+static inline void __cpu_reset_hyp_mode(unsigned long vector_ptr,
+					phys_addr_t phys_idmap_start)
 {
 	kvm_call_hyp(__kvm_hyp_teardown, phys_idmap_start);
 }
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [PATCH 14/15] arm/arm64: KVM: Prune unused #defines
  2016-06-07 10:58 ` Marc Zyngier
@ 2016-06-07 10:58   ` Marc Zyngier
  -1 siblings, 0 replies; 90+ messages in thread
From: Marc Zyngier @ 2016-06-07 10:58 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, linux-arm-kernel, kvmarm

We can now remove a number of dead #defines, thanks to the trampoline
code being gone.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_mmu.h   |  9 ---------
 arch/arm64/include/asm/kvm_mmu.h | 10 ----------
 2 files changed, 19 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index ea32d39..a0c6cf4 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -26,18 +26,9 @@
  * We directly use the kernel VA for the HYP, as we can directly share
  * the mapping (HTTBR "covers" TTBR1).
  */
-#define HYP_PAGE_OFFSET_MASK	UL(~0)
-#define HYP_PAGE_OFFSET		PAGE_OFFSET
 #define KERN_TO_HYP(kva)	(kva)
 
 /*
- * Our virtual mapping for the boot-time MMU-enable code. Must be
- * shared across all the page-tables. Conveniently, we use the vectors
- * page, where no kernel data will ever be shared with HYP.
- */
-#define TRAMPOLINE_VA		UL(CONFIG_VECTORS_BASE)
-
-/*
  * KVM_MMU_CACHE_MIN_PAGES is the number of stage2 page table translation levels.
  */
 #define KVM_MMU_CACHE_MIN_PAGES	2
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 3c4dd4e..5b8a5622 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -77,16 +77,6 @@
 #define HYP_PAGE_OFFSET_HIGH_MASK	((UL(1) << VA_BITS) - 1)
 #define HYP_PAGE_OFFSET_LOW_MASK	((UL(1) << (VA_BITS - 1)) - 1)
 
-/* Temporary compat define */
-#define HYP_PAGE_OFFSET_MASK		HYP_PAGE_OFFSET_HIGH_MASK
-
-/*
- * Our virtual mapping for the idmap-ed MMU-enable code. Must be
- * shared across all the page-tables. Conveniently, we use the last
- * possible page, where no kernel mapping will ever exist.
- */
-#define TRAMPOLINE_VA		(HYP_PAGE_OFFSET_MASK & PAGE_MASK)
-
 #ifdef __ASSEMBLY__
 
 #include <asm/alternative.h>
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [PATCH 14/15] arm/arm64: KVM: Prune unused #defines
@ 2016-06-07 10:58   ` Marc Zyngier
  0 siblings, 0 replies; 90+ messages in thread
From: Marc Zyngier @ 2016-06-07 10:58 UTC (permalink / raw)
  To: linux-arm-kernel

We can now remove a number of dead #defines, thanks to the trampoline
code being gone.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_mmu.h   |  9 ---------
 arch/arm64/include/asm/kvm_mmu.h | 10 ----------
 2 files changed, 19 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index ea32d39..a0c6cf4 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -26,18 +26,9 @@
  * We directly use the kernel VA for the HYP, as we can directly share
  * the mapping (HTTBR "covers" TTBR1).
  */
-#define HYP_PAGE_OFFSET_MASK	UL(~0)
-#define HYP_PAGE_OFFSET		PAGE_OFFSET
 #define KERN_TO_HYP(kva)	(kva)
 
 /*
- * Our virtual mapping for the boot-time MMU-enable code. Must be
- * shared across all the page-tables. Conveniently, we use the vectors
- * page, where no kernel data will ever be shared with HYP.
- */
-#define TRAMPOLINE_VA		UL(CONFIG_VECTORS_BASE)
-
-/*
  * KVM_MMU_CACHE_MIN_PAGES is the number of stage2 page table translation levels.
  */
 #define KVM_MMU_CACHE_MIN_PAGES	2
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 3c4dd4e..5b8a5622 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -77,16 +77,6 @@
 #define HYP_PAGE_OFFSET_HIGH_MASK	((UL(1) << VA_BITS) - 1)
 #define HYP_PAGE_OFFSET_LOW_MASK	((UL(1) << (VA_BITS - 1)) - 1)
 
-/* Temporary compat define */
-#define HYP_PAGE_OFFSET_MASK		HYP_PAGE_OFFSET_HIGH_MASK
-
-/*
- * Our virtual mapping for the idmap-ed MMU-enable code. Must be
- * shared across all the page-tables. Conveniently, we use the last
- * possible page, where no kernel mapping will ever exist.
- */
-#define TRAMPOLINE_VA		(HYP_PAGE_OFFSET_MASK & PAGE_MASK)
-
 #ifdef __ASSEMBLY__
 
 #include <asm/alternative.h>
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [PATCH 15/15] arm/arm64: KVM: Check that IDMAP doesn't intersect with VA range
  2016-06-07 10:58 ` Marc Zyngier
@ 2016-06-07 10:58   ` Marc Zyngier
  -1 siblings, 0 replies; 90+ messages in thread
From: Marc Zyngier @ 2016-06-07 10:58 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, linux-arm-kernel, kvmarm

This is more of a safety measure than anything else: if we end up
with an idmap page that intersects with the range picked for the
HYP VA space, abort the KVM setup, as it is unsafe to go
further.

I cannot imagine it happening on 64bit (we have a mechanism to
work around it), but it could potentially occur on a 32bit system
with the kernel loaded high enough in memory so that it conflicts
with the kernel VA.
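
To make the 32bit failure mode concrete: on 32bit, KERN_TO_HYP(kva) is
simply (kva), so the HYP VA range is the kernel VA range, and the new
check below boils down to "does the idmap page sit inside the kernel VA
window?". A rough user-space sketch of that condition, using made-up
addresses and assuming the usual 3G/1G split (PAGE_OFFSET at
0xc0000000):

	#include <stdio.h>
	#include <stdbool.h>

	#define PAGE_OFFSET		0xc0000000UL	/* assumed 3G/1G split */
	#define KERN_TO_HYP(kva)	(kva)		/* 32bit: HYP reuses kernel VAs */

	static bool idmap_intersects_hyp_va(unsigned long hyp_idmap_start)
	{
		/* same condition as the check added to kvm_mmu_init() below */
		return hyp_idmap_start >= KERN_TO_HYP(PAGE_OFFSET) &&
		       hyp_idmap_start <  KERN_TO_HYP(~0UL);
	}

	int main(void)
	{
		/* kernel loaded low: the idmap page is below PAGE_OFFSET */
		printf("low:  %d\n", idmap_intersects_hyp_va(0x40012000UL));	/* 0 */
		/* kernel loaded high enough that the idmap page hits the VA range */
		printf("high: %d\n", idmap_intersects_hyp_va(0xc0012000UL));	/* 1 */
		return 0;
	}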

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/kvm/mmu.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 46b8604..819517d 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -1708,6 +1708,21 @@ int kvm_mmu_init(void)
 	 */
 	BUG_ON((hyp_idmap_start ^ (hyp_idmap_end - 1)) & PAGE_MASK);
 
+	kvm_info("IDMAP page: %lx\n", hyp_idmap_start);
+	kvm_info("HYP VA range: %lx:%lx\n",
+		 KERN_TO_HYP(PAGE_OFFSET), KERN_TO_HYP(~0UL));
+
+	if (hyp_idmap_start >= KERN_TO_HYP(PAGE_OFFSET) &&
+	    hyp_idmap_start <  KERN_TO_HYP(~0UL)) {
+		/*
+		 * The idmap page is intersecting with the VA space,
+		 * it is not safe to continue further.
+		 */
+		kvm_err("IDMAP intersecting with HYP VA, unable to continue\n");
+		err = -EINVAL;
+		goto out;
+	}
+
 	hyp_pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, hyp_pgd_order);
 	if (!hyp_pgd) {
 		kvm_err("Hyp mode PGD not allocated\n");
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [PATCH 15/15] arm/arm64: KVM: Check that IDMAP doesn't intersect with VA range
@ 2016-06-07 10:58   ` Marc Zyngier
  0 siblings, 0 replies; 90+ messages in thread
From: Marc Zyngier @ 2016-06-07 10:58 UTC (permalink / raw)
  To: linux-arm-kernel

This is more of a safety measure than anything else: if we end up
with an idmap page that intersects with the range picked for the
HYP VA space, abort the KVM setup, as it is unsafe to go
further.

I cannot imagine it happening on 64bit (we have a mechanism to
work around it), but it could potentially occur on a 32bit system
with the kernel loaded high enough in memory so that it conflicts
with the kernel VA.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/kvm/mmu.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 46b8604..819517d 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -1708,6 +1708,21 @@ int kvm_mmu_init(void)
 	 */
 	BUG_ON((hyp_idmap_start ^ (hyp_idmap_end - 1)) & PAGE_MASK);
 
+	kvm_info("IDMAP page: %lx\n", hyp_idmap_start);
+	kvm_info("HYP VA range: %lx:%lx\n",
+		 KERN_TO_HYP(PAGE_OFFSET), KERN_TO_HYP(~0UL));
+
+	if (hyp_idmap_start >= KERN_TO_HYP(PAGE_OFFSET) &&
+	    hyp_idmap_start <  KERN_TO_HYP(~0UL)) {
+		/*
+		 * The idmap page is intersecting with the VA space,
+		 * it is not safe to continue further.
+		 */
+		kvm_err("IDMAP intersecting with HYP VA, unable to continue\n");
+		err = -EINVAL;
+		goto out;
+	}
+
 	hyp_pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, hyp_pgd_order);
 	if (!hyp_pgd) {
 		kvm_err("Hyp mode PGD not allocated\n");
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* Re: [PATCH 01/15] arm64: KVM: Merged page tables documentation
  2016-06-07 10:58   ` Marc Zyngier
@ 2016-06-27 13:28     ` Christoffer Dall
  -1 siblings, 0 replies; 90+ messages in thread
From: Christoffer Dall @ 2016-06-27 13:28 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, linux-arm-kernel, kvmarm

On Tue, Jun 07, 2016 at 11:58:21AM +0100, Marc Zyngier wrote:
> Since dealing with VA ranges tends to hurt my brain badly, let's
> start with a bit of documentation that will hopefully help
> understanding what comes next...
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/include/asm/kvm_mmu.h | 45 +++++++++++++++++++++++++++++++++++++---
>  1 file changed, 42 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index f05ac27..00bc277 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -29,10 +29,49 @@
>   *
>   * Instead, give the HYP mode its own VA region at a fixed offset from
>   * the kernel by just masking the top bits (which are all ones for a
> - * kernel address).
> + * kernel address). We need to find out how many bits to mask.
>   *
> - * ARMv8.1 (using VHE) does have a TTBR1_EL2, and doesn't use these
> - * macros (the entire kernel runs at EL2).
> + * We want to build a set of page tables that cover both parts of the
> + * idmap (the trampoline page used to initialize EL2), and our normal
> + * runtime VA space, at the same time.
> + *
> + * Given that the kernel uses VA_BITS for its entire address space,
> + * and that half of that space (VA_BITS - 1) is used for the linear
> + * mapping, we can limit the EL2 space to the same size.

we can also limit the EL2 space to (VA_BITS - 1).

> + *
> + * The main question is "Within the VA_BITS space, does EL2 use the
> + * top or the bottom half of that space to shadow the kernel's linear
> + * mapping?". As we need to idmap the trampoline page, this is
> + * determined by the range in which this page lives.
> + *
> + * If the page is in the bottom half, we have to use the top half. If
> + * the page is in the top half, we have to use the bottom half:
> + *
> + * if (PA(T)[VA_BITS - 1] == 1)
> + *	HYP_VA_RANGE = [0 ... (1 << (VA_BITS - 1)) - 1]
> + * else
> + *	HYP_VA_RANGE = [(1 << (VA_BITS - 1)) ... (1 << VA_BITS) - 1]

Is this pseudo code or what am I looking at?  What is T?

I don't understand what this is saying.

Can this be written using known constructs such as hyp_idmap_end,
PHYS_OFFSET etc.?

And perhaps the pseudo code should define HYP_VA_SHIFT instead of the
range to simplify it, at least I'm confused.

> + *
> + * In practice, the second case can be simplified to
> + *	HYP_VA_RANGE = [0 ... (1 << VA_BITS) - 1]
> + * because we'll never get anything in the bottom range.

and now I'm more confused, are we not supposed to map the idmap in the
bottom range?  Is this part of the comment necessary?

> + *
> + * This of course assumes that the trampoline page exists within the
> + * VA_BITS range. If it doesn't, then it means we're in the odd case
> + * where the kernel idmap (as well as HYP) uses more levels than the
> + * kernel runtime page tables (as seen when the kernel is configured
> + * for 4k pages, 39bits VA, and yet memory lives just above that
> + * limit, forcing the idmap to use 4 levels of page tables while the
> + * kernel itself only uses 3). In this particular case, it doesn't
> + * matter which side of VA_BITS we use, as we're guaranteed not to
> + * conflict with anything.
> + *
> + * An alternative would be to always use 4 levels of page tables for
> + * EL2, no matter what the kernel does. But who wants more levels than
> + * strictly necessary?
> + *
> + * Thankfully, ARMv8.1 (using VHE) does have a TTBR1_EL2, and doesn't
> + * need any of this madness (the entire kernel runs at EL2).

Not sure how these last two paragraphs help understanding what this
patch set is about to implement, as they seem to raise more questions
than they answer, but I will proceed to trying to read the code...


Thanks,
-Christoffer

>   */
>  #define HYP_PAGE_OFFSET_SHIFT	VA_BITS
>  #define HYP_PAGE_OFFSET_MASK	((UL(1) << HYP_PAGE_OFFSET_SHIFT) - 1)
> -- 
> 2.1.4
> 

^ permalink raw reply	[flat|nested] 90+ messages in thread

* [PATCH 01/15] arm64: KVM: Merged page tables documentation
@ 2016-06-27 13:28     ` Christoffer Dall
  0 siblings, 0 replies; 90+ messages in thread
From: Christoffer Dall @ 2016-06-27 13:28 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jun 07, 2016 at 11:58:21AM +0100, Marc Zyngier wrote:
> Since dealing with VA ranges tends to hurt my brain badly, let's
> start with a bit of documentation that will hopefully help
> understanding what comes next...
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/include/asm/kvm_mmu.h | 45 +++++++++++++++++++++++++++++++++++++---
>  1 file changed, 42 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index f05ac27..00bc277 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -29,10 +29,49 @@
>   *
>   * Instead, give the HYP mode its own VA region at a fixed offset from
>   * the kernel by just masking the top bits (which are all ones for a
> - * kernel address).
> + * kernel address). We need to find out how many bits to mask.
>   *
> - * ARMv8.1 (using VHE) does have a TTBR1_EL2, and doesn't use these
> - * macros (the entire kernel runs at EL2).
> + * We want to build a set of page tables that cover both parts of the
> + * idmap (the trampoline page used to initialize EL2), and our normal
> + * runtime VA space, at the same time.
> + *
> + * Given that the kernel uses VA_BITS for its entire address space,
> + * and that half of that space (VA_BITS - 1) is used for the linear
> + * mapping, we can limit the EL2 space to the same size.

we can also limit the EL2 space to (VA_BITS - 1).

> + *
> + * The main question is "Within the VA_BITS space, does EL2 use the
> + * top or the bottom half of that space to shadow the kernel's linear
> + * mapping?". As we need to idmap the trampoline page, this is
> + * determined by the range in which this page lives.
> + *
> + * If the page is in the bottom half, we have to use the top half. If
> + * the page is in the top half, we have to use the bottom half:
> + *
> + * if (PA(T)[VA_BITS - 1] == 1)
> + *	HYP_VA_RANGE = [0 ... (1 << (VA_BITS - 1)) - 1]
> + * else
> + *	HYP_VA_RANGE = [(1 << (VA_BITS - 1)) ... (1 << VA_BITS) - 1]

Is this pseudo code or what am I looking at?  What is T?

I don't understand what this is saying.

Can this be written using known constructs such as hyp_idmap_end,
PHYS_OFFSET etc.?

And perhaps the pseudo code should define HYP_VA_SHIFT instead of the
range to simplify it, at least I'm confused.

> + *
> + * In practice, the second case can be simplified to
> + *	HYP_VA_RANGE = [0 ... (1 << VA_BITS) - 1]
> + * because we'll never get anything in the bottom range.

and now I'm more confused, are we not supposed to map the idmap in the
bottom range?  Is this part of the comment necessary?

> + *
> + * This of course assumes that the trampoline page exists within the
> + * VA_BITS range. If it doesn't, then it means we're in the odd case
> + * where the kernel idmap (as well as HYP) uses more levels than the
> + * kernel runtime page tables (as seen when the kernel is configured
> + * for 4k pages, 39bits VA, and yet memory lives just above that
> + * limit, forcing the idmap to use 4 levels of page tables while the
> + * kernel itself only uses 3). In this particular case, it doesn't
> + * matter which side of VA_BITS we use, as we're guaranteed not to
> + * conflict with anything.
> + *
> + * An alternative would be to always use 4 levels of page tables for
> + * EL2, no matter what the kernel does. But who wants more levels than
> + * strictly necessary?
> + *
> + * Thankfully, ARMv8.1 (using VHE) does have a TTBR1_EL2, and doesn't
> + * need any of this madness (the entire kernel runs at EL2).

Not sure how these last two paragraphs help understanding what this
patch set is about to implement, as they seem to raise more questions
than they answer, but I will proceed to trying to read the code...


Thanks,
-Christoffer

>   */
>  #define HYP_PAGE_OFFSET_SHIFT	VA_BITS
>  #define HYP_PAGE_OFFSET_MASK	((UL(1) << HYP_PAGE_OFFSET_SHIFT) - 1)
> -- 
> 2.1.4
> 

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 00/15] arm/arm64: KVM: Merge boot and runtime page tables
  2016-06-07 10:58 ` Marc Zyngier
@ 2016-06-27 13:29   ` Christoffer Dall
  -1 siblings, 0 replies; 90+ messages in thread
From: Christoffer Dall @ 2016-06-27 13:29 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, linux-arm-kernel, kvmarm

On Tue, Jun 07, 2016 at 11:58:20AM +0100, Marc Zyngier wrote:
> Until now, we've been setting up KVM using two sets of page tables:
> one for the "boot" where we perform the basic MMU setup, and one for
> the runtime.
> 
> Switching between the two was though to be safe, but we've recently
> realized that it is not: it is not enough to ensure that the VA->PA
> mapping is consistent when switching TTBR0_EL2, but we also have to
> ensure that the intermediate translations are the same as well. If the
> TLB can return two different values for intermediate translations,
> we're screwed (TLB conflicts).

Just a clarification: Intermediate Translations here means the
page table levels of translations for a single stage of translation?

Is there a valid reference to the architecture specification for this?

Thanks,
-Christoffer


> 
> At that point, the only safe thing to do is to never change TTBR0_EL2,
> which means that we need to make the idmap page part of the runtime
> page tables.
> 
> The series starts with a bit of brain dumping explaining what we're
> trying to do. This might not be useful as a merge candidate, but it
> was useful for me to put this somewhere. It goes on revamping the
> whole notion of HYP VA range, making it runtime patchable. It then
> always merge idmap and runtime page table into one set, leading to
> quite a lot of simplification in the init/teardown code. In the
> process, 32bit KVM gains the ability to teardown the HYP page-tables
> and vectors, which makes kexec a bit closer.
> 
> This has been tested on Seattle, Juno, the FVP model (both v8.0 and
> v8.1), Cubietruck and Midway, and is based on 4.7-rc2.
> 
> Thanks,
> 
> 	M.
> 
> Marc Zyngier (15):
>   arm64: KVM: Merged page tables documentation
>   arm64: KVM: Kill HYP_PAGE_OFFSET
>   arm64: Add ARM64_HYP_OFFSET_LOW capability
>   arm64: KVM: Define HYP offset masks
>   arm64: KVM: Refactor kern_hyp_va/hyp_kern_va to deal with multiple
>     offsets
>   arm/arm64: KVM: Export __hyp_text_start/end symbols
>   arm64: KVM: Runtime detection of lower HYP offset
>   arm/arm64: KVM: Always have merged page tables
>   arm64: KVM: Simplify HYP init/teardown
>   arm/arm64: KVM: Drop boot_pgd
>   arm/arm64: KVM: Kill free_boot_hyp_pgd
>   arm: KVM: Simplify HYP init
>   arm: KVM: Allow hyp teardown
>   arm/arm64: KVM: Prune unused #defines
>   arm/arm64: KVM: Check that IDMAP doesn't intersect with VA range
> 
>  arch/arm/include/asm/kvm_asm.h      |   2 +
>  arch/arm/include/asm/kvm_host.h     |  25 +++-----
>  arch/arm/include/asm/kvm_mmu.h      |  11 ----
>  arch/arm/include/asm/virt.h         |   4 ++
>  arch/arm/kvm/arm.c                  |  20 ++----
>  arch/arm/kvm/init.S                 |  56 ++++++----------
>  arch/arm/kvm/mmu.c                  | 125 ++++++++++++++++--------------------
>  arch/arm64/include/asm/cpufeature.h |   3 +-
>  arch/arm64/include/asm/kvm_host.h   |  17 ++---
>  arch/arm64/include/asm/kvm_hyp.h    |  28 ++++----
>  arch/arm64/include/asm/kvm_mmu.h    | 100 ++++++++++++++++++++++++-----
>  arch/arm64/include/asm/virt.h       |   4 ++
>  arch/arm64/kernel/cpufeature.c      |  19 ++++++
>  arch/arm64/kvm/hyp-init.S           |  61 +++---------------
>  arch/arm64/kvm/hyp/entry.S          |  19 ------
>  arch/arm64/kvm/hyp/hyp-entry.S      |  15 +++++
>  arch/arm64/kvm/reset.c              |  28 --------
>  17 files changed, 240 insertions(+), 297 deletions(-)
> 
> -- 
> 2.1.4
> 

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 02/15] arm64: KVM: Kill HYP_PAGE_OFFSET
  2016-06-07 10:58   ` Marc Zyngier
@ 2016-06-27 13:47     ` Christoffer Dall
  -1 siblings, 0 replies; 90+ messages in thread
From: Christoffer Dall @ 2016-06-27 13:47 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: linux-arm-kernel, kvm, kvmarm

On Tue, Jun 07, 2016 at 11:58:22AM +0100, Marc Zyngier wrote:
> HYP_PAGE_OFFSET is not massively useful. And the way we use it
> in KERN_HYP_VA is inconsistent with the equivalent operation in
> EL2, where we use a mask instead.
> 
> Let's replace the uses of HYP_PAGE_OFFSET with HYP_PAGE_OFFSET_MASK,
> and get rid of the pointless macro.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/include/asm/kvm_hyp.h | 5 ++---
>  arch/arm64/include/asm/kvm_mmu.h | 3 +--
>  2 files changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
> index 44eaff7..61d01a9 100644
> --- a/arch/arm64/include/asm/kvm_hyp.h
> +++ b/arch/arm64/include/asm/kvm_hyp.h
> @@ -38,11 +38,10 @@ static inline unsigned long __kern_hyp_va(unsigned long v)
>  
>  static inline unsigned long __hyp_kern_va(unsigned long v)
>  {
> -	u64 offset = PAGE_OFFSET - HYP_PAGE_OFFSET;
> -	asm volatile(ALTERNATIVE("add %0, %0, %1",
> +	asm volatile(ALTERNATIVE("orr %0, %0, %1",
>  				 "nop",
>  				 ARM64_HAS_VIRT_HOST_EXTN)
> -		     : "+r" (v) : "r" (offset));
> +		     : "+r" (v) : "i" (~HYP_PAGE_OFFSET_MASK));

for some reason this is hurting my brain.  I can't easily see that the
two implementations are equivalent.

I can see that the kernel-to-hyp masking is trivially correct, but are
we always sure that the upper bits that we mask off are always set?

>  	return v;
>  }
>  
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 00bc277..d162372 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -75,7 +75,6 @@
>   */
>  #define HYP_PAGE_OFFSET_SHIFT	VA_BITS
>  #define HYP_PAGE_OFFSET_MASK	((UL(1) << HYP_PAGE_OFFSET_SHIFT) - 1)
> -#define HYP_PAGE_OFFSET		(PAGE_OFFSET & HYP_PAGE_OFFSET_MASK)
>  
>  /*
>   * Our virtual mapping for the idmap-ed MMU-enable code. Must be
> @@ -109,7 +108,7 @@ alternative_endif
>  #include <asm/mmu_context.h>
>  #include <asm/pgtable.h>
>  
> -#define KERN_TO_HYP(kva)	((unsigned long)kva - PAGE_OFFSET + HYP_PAGE_OFFSET)
> +#define KERN_TO_HYP(kva)	((unsigned long)kva & HYP_PAGE_OFFSET_MASK)
>  

Why do we have both kern_hyp_va() and KERN_TO_HYP and how are they
related again?

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 01/15] arm64: KVM: Merged page tables documentation
  2016-06-27 13:28     ` Christoffer Dall
@ 2016-06-27 14:06       ` Marc Zyngier
  -1 siblings, 0 replies; 90+ messages in thread
From: Marc Zyngier @ 2016-06-27 14:06 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: linux-arm-kernel, kvm, kvmarm

On 27/06/16 14:28, Christoffer Dall wrote:
> On Tue, Jun 07, 2016 at 11:58:21AM +0100, Marc Zyngier wrote:
>> Since dealing with VA ranges tends to hurt my brain badly, let's
>> start with a bit of documentation that will hopefully help
>> understanding what comes next...
>>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  arch/arm64/include/asm/kvm_mmu.h | 45 +++++++++++++++++++++++++++++++++++++---
>>  1 file changed, 42 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
>> index f05ac27..00bc277 100644
>> --- a/arch/arm64/include/asm/kvm_mmu.h
>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>> @@ -29,10 +29,49 @@
>>   *
>>   * Instead, give the HYP mode its own VA region at a fixed offset from
>>   * the kernel by just masking the top bits (which are all ones for a
>> - * kernel address).
>> + * kernel address). We need to find out how many bits to mask.
>>   *
>> - * ARMv8.1 (using VHE) does have a TTBR1_EL2, and doesn't use these
>> - * macros (the entire kernel runs at EL2).
>> + * We want to build a set of page tables that cover both parts of the
>> + * idmap (the trampoline page used to initialize EL2), and our normal
>> + * runtime VA space, at the same time.
>> + *
>> + * Given that the kernel uses VA_BITS for its entire address space,
>> + * and that half of that space (VA_BITS - 1) is used for the linear
>> + * mapping, we can limit the EL2 space to the same size.
> 
> we can also limit the EL2 space to (VA_BITS - 1).
> 
>> + *
>> + * The main question is "Within the VA_BITS space, does EL2 use the
>> + * top or the bottom half of that space to shadow the kernel's linear
>> + * mapping?". As we need to idmap the trampoline page, this is
>> + * determined by the range in which this page lives.
>> + *
>> + * If the page is in the bottom half, we have to use the top half. If
>> + * the page is in the top half, we have to use the bottom half:
>> + *
>> + * if (PA(T)[VA_BITS - 1] == 1)
>> + *	HYP_VA_RANGE = [0 ... (1 << (VA_BITS - 1)) - 1]
>> + * else
>> + *	HYP_VA_RANGE = [(1 << (VA_BITS - 1)) ... (1 << VA_BITS) - 1]
> 
> Is this pseudo code or what am I looking at?  What is T?

Pseudocode indeed. T is the "trampoline page".

> I don't understand what this is saying.

This is giving you the range of HYP VAs that can be safely used to map
kernel ranges.

> Can this be written using known constructs such as hyp_idmap_end,
> PHYS_OFFSET etc.?

I'm not sure. We're trying to determine the VA range that doesn't
conflict with a physical range. I don't see how introducing PHYS_OFFSET
is going to help, because we're only interested in a single page (the
trampoline page).

> And perhaps the pseudo code should define HYP_VA_SHIFT instead of the
> range to simplify it, at least I'm confused.

I think HYP_VA_SHIFT is actually contributing to the confusion, because
it has no practical impact on anything.

> 
>> + *
>> + * In practice, the second case can be simplified to
>> + *	HYP_VA_RANGE = [0 ... (1 << VA_BITS) - 1]
>> + * because we'll never get anything in the bottom range.
> 
> and now I'm more confused, are we not supposed to map the idmap in the
> bottom range?  Is this part of the comment necessary?

Well, I found it useful when I wrote it. What I meant is that we're
never going to alias a kernel mapping there.

> 
>> + *
>> + * This of course assumes that the trampoline page exists within the
>> + * VA_BITS range. If it doesn't, then it means we're in the odd case
>> + * where the kernel idmap (as well as HYP) uses more levels than the
>> + * kernel runtime page tables (as seen when the kernel is configured
>> + * for 4k pages, 39bits VA, and yet memory lives just above that
>> + * limit, forcing the idmap to use 4 levels of page tables while the
>> + * kernel itself only uses 3). In this particular case, it doesn't
>> + * matter which side of VA_BITS we use, as we're guaranteed not to
>> + * conflict with anything.
>> + *
>> + * An alternative would be to always use 4 levels of page tables for
>> + * EL2, no matter what the kernel does. But who wants more levels than
>> + * strictly necessary?
>> + *
>> + * Thankfully, ARMv8.1 (using VHE) does have a TTBR1_EL2, and doesn't
>> + * need any of this madness (the entire kernel runs at EL2).
> 
> Not sure how these two last paragraphs helps understanding what this
> patch set is about to implement, as it seems to raise more questions
> than answer them, but I will proceed to trying to read the code...

As I said, I found this blurb useful when I was trying to reason about
the problem. I don't mind it being dropped.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 00/15] arm/arm64: KVM: Merge boot and runtime page tables
  2016-06-27 13:29   ` Christoffer Dall
@ 2016-06-27 14:12     ` Marc Zyngier
  -1 siblings, 0 replies; 90+ messages in thread
From: Marc Zyngier @ 2016-06-27 14:12 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: linux-arm-kernel, kvm, kvmarm

On 27/06/16 14:29, Christoffer Dall wrote:
> On Tue, Jun 07, 2016 at 11:58:20AM +0100, Marc Zyngier wrote:
>> Until now, we've been setting up KVM using two sets of page tables:
>> one for the "boot" where we perform the basic MMU setup, and one for
>> the runtime.
>>
>> Switching between the two was though to be safe, but we've recently
>> realized that it is not: it is not enough to ensure that the VA->PA
>> mapping is consistent when switching TTBR0_EL2, but we also have to
>> ensure that the intermediate translations are the same as well. If the
>> TLB can return two different values for intermediate translations,
>> we're screwed (TLB conflicts).
> 
> Just a clarification: Intermediate Translations here means the
> page table levels of translations for a single stage of translation?

It does indeed.

> Is there a valid reference to the architecture specification for this?

D4.7.1 (General TLB maintenance requirements) talks a bit about
intermediate caching. G4.9.5 (TLB conflicts abort) is also of interest.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 02/15] arm64: KVM: Kill HYP_PAGE_OFFSET
  2016-06-27 13:47     ` Christoffer Dall
@ 2016-06-27 14:20       ` Marc Zyngier
  -1 siblings, 0 replies; 90+ messages in thread
From: Marc Zyngier @ 2016-06-27 14:20 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, linux-arm-kernel, kvmarm

On 27/06/16 14:47, Christoffer Dall wrote:
> On Tue, Jun 07, 2016 at 11:58:22AM +0100, Marc Zyngier wrote:
>> HYP_PAGE_OFFSET is not massively useful. And the way we use it
>> in KERN_HYP_VA is inconsistent with the equivalent operation in
>> EL2, where we use a mask instead.
>>
>> Let's replace the uses of HYP_PAGE_OFFSET with HYP_PAGE_OFFSET_MASK,
>> and get rid of the pointless macro.
>>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  arch/arm64/include/asm/kvm_hyp.h | 5 ++---
>>  arch/arm64/include/asm/kvm_mmu.h | 3 +--
>>  2 files changed, 3 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
>> index 44eaff7..61d01a9 100644
>> --- a/arch/arm64/include/asm/kvm_hyp.h
>> +++ b/arch/arm64/include/asm/kvm_hyp.h
>> @@ -38,11 +38,10 @@ static inline unsigned long __kern_hyp_va(unsigned long v)
>>  
>>  static inline unsigned long __hyp_kern_va(unsigned long v)
>>  {
>> -	u64 offset = PAGE_OFFSET - HYP_PAGE_OFFSET;
>> -	asm volatile(ALTERNATIVE("add %0, %0, %1",
>> +	asm volatile(ALTERNATIVE("orr %0, %0, %1",
>>  				 "nop",
>>  				 ARM64_HAS_VIRT_HOST_EXTN)
>> -		     : "+r" (v) : "r" (offset));
>> +		     : "+r" (v) : "i" (~HYP_PAGE_OFFSET_MASK));
> 
> for some reason this is hurting my brain.  I can't easily see that the
> two implementations are equivalent.
> 
> I can see that the kernel-to-hyp masking is trivially correct, but are
> we always sure that the upper bits that we mask off are always set?

A kernel address always has the top bits set. That's a given, and a
property of the architecture (bits [63:VA_BITS] are set to one). See
D4.2.1 and the definition of a Virtual Address (top VA subrange).
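
(Illustrative sketch, not part of the original mail: a standalone check of
that property, assuming VA_BITS = 39 and a made-up linear-map address.
Because bits [63:VA_BITS] of a kernel VA are all ones, masking them off and
then orr-ing them back round-trips the address.)

#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define VA_BITS			39
#define HYP_PAGE_OFFSET_SHIFT	VA_BITS
#define HYP_PAGE_OFFSET_MASK	((1ULL << HYP_PAGE_OFFSET_SHIFT) - 1)

int main(void)
{
	uint64_t kva  = 0xffffffc012345000ULL;		/* sample kernel VA */
	uint64_t hyp  = kva & HYP_PAGE_OFFSET_MASK;	/* kern_hyp_va */
	uint64_t back = hyp | ~HYP_PAGE_OFFSET_MASK;	/* hyp_kern_va as "orr" */

	assert((kva >> VA_BITS) == 0x1ffffff);	/* bits [63:VA_BITS] all ones */
	assert(back == kva);			/* orr-ing ~mask restores them */
	printf("kva=%" PRIx64 " hyp=%" PRIx64 " back=%" PRIx64 "\n",
	       kva, hyp, back);
	return 0;
}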

> 
>>  	return v;
>>  }
>>  
>> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
>> index 00bc277..d162372 100644
>> --- a/arch/arm64/include/asm/kvm_mmu.h
>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>> @@ -75,7 +75,6 @@
>>   */
>>  #define HYP_PAGE_OFFSET_SHIFT	VA_BITS
>>  #define HYP_PAGE_OFFSET_MASK	((UL(1) << HYP_PAGE_OFFSET_SHIFT) - 1)
>> -#define HYP_PAGE_OFFSET		(PAGE_OFFSET & HYP_PAGE_OFFSET_MASK)
>>  
>>  /*
>>   * Our virtual mapping for the idmap-ed MMU-enable code. Must be
>> @@ -109,7 +108,7 @@ alternative_endif
>>  #include <asm/mmu_context.h>
>>  #include <asm/pgtable.h>
>>  
>> -#define KERN_TO_HYP(kva)	((unsigned long)kva - PAGE_OFFSET + HYP_PAGE_OFFSET)
>> +#define KERN_TO_HYP(kva)	((unsigned long)kva & HYP_PAGE_OFFSET_MASK)
>>  
> 
> Why do we have both kern_hyp_va() and KERN_TO_HYP and how are they
> related again?

That's because kern_hyp_va used to be reserved for the assembly code, and
KERN_TO_HYP was used in C code. We could (and probably should) unify them.
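
(Not part of the original mail: one possible shape of that unification,
sketched with the VHE "nop" alternative elided for brevity. Patch 05,
quoted further down the thread, ends up doing essentially this by making
KERN_TO_HYP() an alias of kern_hyp_va().)

static inline unsigned long __kern_hyp_va(unsigned long v)
{
	return v & HYP_PAGE_OFFSET_MASK;	/* non-VHE case only */
}

#define kern_hyp_va(v)		((typeof(v))(__kern_hyp_va((unsigned long)(v))))
#define KERN_TO_HYP(kva)	kern_hyp_va(kva)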

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 01/15] arm64: KVM: Merged page tables documentation
  2016-06-27 14:06       ` Marc Zyngier
@ 2016-06-28 11:46         ` Christoffer Dall
  -1 siblings, 0 replies; 90+ messages in thread
From: Christoffer Dall @ 2016-06-28 11:46 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: linux-arm-kernel, kvm, kvmarm

On Mon, Jun 27, 2016 at 03:06:11PM +0100, Marc Zyngier wrote:
> On 27/06/16 14:28, Christoffer Dall wrote:
> > On Tue, Jun 07, 2016 at 11:58:21AM +0100, Marc Zyngier wrote:
> >> Since dealing with VA ranges tends to hurt my brain badly, let's
> >> start with a bit of documentation that will hopefully help
> >> understanding what comes next...
> >>
> >> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> >> ---
> >>  arch/arm64/include/asm/kvm_mmu.h | 45 +++++++++++++++++++++++++++++++++++++---
> >>  1 file changed, 42 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> >> index f05ac27..00bc277 100644
> >> --- a/arch/arm64/include/asm/kvm_mmu.h
> >> +++ b/arch/arm64/include/asm/kvm_mmu.h
> >> @@ -29,10 +29,49 @@
> >>   *
> >>   * Instead, give the HYP mode its own VA region at a fixed offset from
> >>   * the kernel by just masking the top bits (which are all ones for a
> >> - * kernel address).
> >> + * kernel address). We need to find out how many bits to mask.
> >>   *
> >> - * ARMv8.1 (using VHE) does have a TTBR1_EL2, and doesn't use these
> >> - * macros (the entire kernel runs at EL2).
> >> + * We want to build a set of page tables that cover both parts of the
> >> + * idmap (the trampoline page used to initialize EL2), and our normal
> >> + * runtime VA space, at the same time.
> >> + *
> >> + * Given that the kernel uses VA_BITS for its entire address space,
> >> + * and that half of that space (VA_BITS - 1) is used for the linear
> >> + * mapping, we can limit the EL2 space to the same size.
> > 
> > we can also limit the EL2 space to (VA_BITS - 1).
> > 
> >> + *
> >> + * The main question is "Within the VA_BITS space, does EL2 use the
> >> + * top or the bottom half of that space to shadow the kernel's linear
> >> + * mapping?". As we need to idmap the trampoline page, this is
> >> + * determined by the range in which this page lives.
> >> + *
> >> + * If the page is in the bottom half, we have to use the top half. If
> >> + * the page is in the top half, we have to use the bottom half:
> >> + *
> >> + * if (PA(T)[VA_BITS - 1] == 1)
> >> + *	HYP_VA_RANGE = [0 ... (1 << (VA_BITS - 1)) - 1]
> >> + * else
> >> + *	HYP_VA_RANGE = [(1 << (VA_BITS - 1)) ... (1 << VA_BITS) - 1]
> > 
> > Is this pseudo code or what am I looking at?  What is T?
> 
> Pseudocode indeed. T is the "trampoline page".
> 
> > I don't understand what this is saying.
> 
> This is giving you the range of HYP VAs that can be safely used to map
> kernel ranges.

Ah, by PA(T)[bit_nr] you mean the value of an individual bit 'bit_nr' ?

I just think I choked on the pseudocode syntax, perhaps this is easier
to understand?

T = __virt_to_phys(__hyp_idmap_text_start)
if (T & BIT(VA_BITS - 1))
	HYP_VA_MIN = 0  //idmap in upper half
else
	HYP_VA_MIN = 1 << (VA_BITS - 1)
HYP_VA_MAX = HYP_VA_MIN + (1 << (VA_BITS - 1)) - 1
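
(Not part of the original mail: the suggestion above as a compilable
kernel-style sketch; the helper name and the use of __hyp_idmap_text_start
are illustrative only.)

static unsigned long hyp_va_min, hyp_va_max;

static void compute_hyp_va_range(void)
{
	phys_addr_t t = __virt_to_phys((unsigned long)__hyp_idmap_text_start);

	if (t & BIT(VA_BITS - 1))
		hyp_va_min = 0;				/* idmap in upper half */
	else
		hyp_va_min = UL(1) << (VA_BITS - 1);	/* idmap in lower half */

	hyp_va_max = hyp_va_min + (UL(1) << (VA_BITS - 1)) - 1;
}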

> 
> > Can this be written using known constructs such as hyp_idmap_end,
> > PHYS_OFFSET etc.?
> 
> I'm not sure. We're trying to determine the VA range that doesn't
> conflict with a physical range. I don't see how introducing PHYS_OFFSET
> is going to help, because we're only interested in a single page (the
> trampoline page).
> 
> > And perhaps the pseudo code should define HYP_VA_SHIFT instead of the
> > range to simplify it, at least I'm confused.
> 
> I think HYP_VA_SHIFT is actually contributing to the confusion, because
> it has no practical impact on anything.
> 

I was rambling, my suggestion above is basically what I meant.

> > 
> >> + *
> >> + * In practice, the second case can be simplified to
> >> + *	HYP_VA_RANGE = [0 ... (1 << VA_BITS) - 1]
> >> + * because we'll never get anything in the bottom range.
> > 
> > and now I'm more confused, are we not supposed to map the idmap in the
> > bottom range?  Is this part of the comment necessary?
> 
> Well, I found it useful when I wrote it. What I meant is that we're
> never going to alias a kernel mapping there.

I think we should merge the documentation, this stuff is tricky so
having it properly documented is important IMHO.

The confusing part here is that we just said above that the HYP VA range
may have to live in the upper part because the lower part would be used
for the idmap, so why can we use it anyway?

Is the point that you'll be done with the idmap at some point?

> 
> > 
> >> + *
> >> + * This of course assumes that the trampoline page exists within the
> >> + * VA_BITS range. If it doesn't, then it means we're in the odd case
> >> + * where the kernel idmap (as well as HYP) uses more levels than the
> >> + * kernel runtime page tables (as seen when the kernel is configured
> >> + * for 4k pages, 39bits VA, and yet memory lives just above that
> >> + * limit, forcing the idmap to use 4 levels of page tables while the
> >> + * kernel itself only uses 3). In this particular case, it doesn't
> >> + * matter which side of VA_BITS we use, as we're guaranteed not to
> >> + * conflict with anything.
> >> + *
> >> + * An alternative would be to always use 4 levels of page tables for
> >> + * EL2, no matter what the kernel does. But who wants more levels than
> >> + * strictly necessary?

Our expectation here is that using an additional level is slower for TLB
misses, so we want to avoid this, correct?  Also, does the kernel never
use 4 levels of page tables, so that this is always an option?

I appreciate the tongue-in-cheek, but since this hurts my brain (badly)
I want to get rid of anything here that leaves the reader with open
questions.

I don't mind trying to rewrite some of this, just have to make sure I
actually understand it first.

> >> + *
> >> + * Thankfully, ARMv8.1 (using VHE) does have a TTBR1_EL2, and doesn't
> >> + * need any of this madness (the entire kernel runs at EL2).

So here I would simply state that using VHE, there are no separate hyp
mappings and all KVM functionality is already mapped as part of the main
kernel mappings, and none of this applies in that case.  Perhaps that's
what you said already, and I just misread it for some reason.

> > 
> > Not sure how these two last paragraphs helps understanding what this
> > patch set is about to implement, as it seems to raise more questions
> > than answer them, but I will proceed to trying to read the code...
> 
> As I said, I found this blurb useful when I was trying to reason about
> the problem. I don't mind it being dropped.
> 

I would prefer if we can tweak it so I also understand it and then
actually merge it.  That also makes it easier for me to review the patch
set :)

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 02/15] arm64: KVM: Kill HYP_PAGE_OFFSET
  2016-06-27 14:20       ` Marc Zyngier
@ 2016-06-28 12:03         ` Christoffer Dall
  -1 siblings, 0 replies; 90+ messages in thread
From: Christoffer Dall @ 2016-06-28 12:03 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, linux-arm-kernel, kvmarm

On Mon, Jun 27, 2016 at 03:20:23PM +0100, Marc Zyngier wrote:
> On 27/06/16 14:47, Christoffer Dall wrote:
> > On Tue, Jun 07, 2016 at 11:58:22AM +0100, Marc Zyngier wrote:
> >> HYP_PAGE_OFFSET is not massively useful. And the way we use it
> >> in KERN_HYP_VA is inconsistent with the equivalent operation in
> >> EL2, where we use a mask instead.
> >>
> >> Let's replace the uses of HYP_PAGE_OFFSET with HYP_PAGE_OFFSET_MASK,
> >> and get rid of the pointless macro.
> >>
> >> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> >> ---
> >>  arch/arm64/include/asm/kvm_hyp.h | 5 ++---
> >>  arch/arm64/include/asm/kvm_mmu.h | 3 +--
> >>  2 files changed, 3 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
> >> index 44eaff7..61d01a9 100644
> >> --- a/arch/arm64/include/asm/kvm_hyp.h
> >> +++ b/arch/arm64/include/asm/kvm_hyp.h
> >> @@ -38,11 +38,10 @@ static inline unsigned long __kern_hyp_va(unsigned long v)
> >>  
> >>  static inline unsigned long __hyp_kern_va(unsigned long v)
> >>  {
> >> -	u64 offset = PAGE_OFFSET - HYP_PAGE_OFFSET;
> >> -	asm volatile(ALTERNATIVE("add %0, %0, %1",
> >> +	asm volatile(ALTERNATIVE("orr %0, %0, %1",
> >>  				 "nop",
> >>  				 ARM64_HAS_VIRT_HOST_EXTN)
> >> -		     : "+r" (v) : "r" (offset));
> >> +		     : "+r" (v) : "i" (~HYP_PAGE_OFFSET_MASK));
> > 
> > for some reason this is hurting my brain.  I can't easily see that the
> > two implementations are equivalent.
> > 
> > I can see that the kernel-to-hyp masking is trivially correct, but are
> > we always sure that the upper bits that we mask off are always set?
> 
> A kernel address always has the top bits set. That's a given, and a
> property of the architecture (bits [63:VA_BITS] are set to one. See
> D4.2.1 and the definition of a Virtual Address (top VA subrange).
> 

This part I understood, but I somehow had the impression that
HYP_PAGE_OFFSET_MASK could mask off more than (63 - VA_BITS + 1) bits,
but looking at the definition of HYP_PAGE_OFFSET_MASK it clearly cannot.

What can I say, I probably shouldn't have looked at code yesterday.

> > 
> >>  	return v;
> >>  }
> >>  
> >> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> >> index 00bc277..d162372 100644
> >> --- a/arch/arm64/include/asm/kvm_mmu.h
> >> +++ b/arch/arm64/include/asm/kvm_mmu.h
> >> @@ -75,7 +75,6 @@
> >>   */
> >>  #define HYP_PAGE_OFFSET_SHIFT	VA_BITS
> >>  #define HYP_PAGE_OFFSET_MASK	((UL(1) << HYP_PAGE_OFFSET_SHIFT) - 1)
> >> -#define HYP_PAGE_OFFSET		(PAGE_OFFSET & HYP_PAGE_OFFSET_MASK)
> >>  
> >>  /*
> >>   * Our virtual mapping for the idmap-ed MMU-enable code. Must be
> >> @@ -109,7 +108,7 @@ alternative_endif
> >>  #include <asm/mmu_context.h>
> >>  #include <asm/pgtable.h>
> >>  
> >> -#define KERN_TO_HYP(kva)	((unsigned long)kva - PAGE_OFFSET + HYP_PAGE_OFFSET)
> >> +#define KERN_TO_HYP(kva)	((unsigned long)kva & HYP_PAGE_OFFSET_MASK)
> >>  
> > 
> > Why do we have both kern_hyp_va() and KERN_TO_HYP and how are they
> > related again?
> 
> That's because kern_hyp_va used to be reserved to the assembly code, and
> KERN_TO_HYP used in C code. We could (and probably should) unify them.
> 
If we can, that would be good.


Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 05/15] arm64: KVM: Refactor kern_hyp_va/hyp_kern_va to deal with multiple offsets
  2016-06-07 10:58   ` Marc Zyngier
@ 2016-06-28 12:42     ` Christoffer Dall
  -1 siblings, 0 replies; 90+ messages in thread
From: Christoffer Dall @ 2016-06-28 12:42 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, linux-arm-kernel, kvmarm

On Tue, Jun 07, 2016 at 11:58:25AM +0100, Marc Zyngier wrote:
> As we move towards a selectable HYP VA range, it is obvious that
> we don't want to test a variable to find out if we need to use
> the bottom VA range, the top VA range, or use the address as is
> (for VHE).
> 
> Instead, we can expand our current helpers to generate the right
> mask or nop with code patching. We default to using the top VA
> space, with alternatives to switch to the bottom one or to nop
> out the instructions.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/include/asm/kvm_hyp.h | 27 ++++++++++++--------------
>  arch/arm64/include/asm/kvm_mmu.h | 42 +++++++++++++++++++++++++++++++++++++---
>  2 files changed, 51 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
> index 61d01a9..dd4904b 100644
> --- a/arch/arm64/include/asm/kvm_hyp.h
> +++ b/arch/arm64/include/asm/kvm_hyp.h
> @@ -25,24 +25,21 @@
>  
>  #define __hyp_text __section(.hyp.text) notrace
>  
> -static inline unsigned long __kern_hyp_va(unsigned long v)
> -{
> -	asm volatile(ALTERNATIVE("and %0, %0, %1",
> -				 "nop",
> -				 ARM64_HAS_VIRT_HOST_EXTN)
> -		     : "+r" (v) : "i" (HYP_PAGE_OFFSET_MASK));
> -	return v;
> -}
> -
> -#define kern_hyp_va(v) (typeof(v))(__kern_hyp_va((unsigned long)(v)))
> -
>  static inline unsigned long __hyp_kern_va(unsigned long v)
>  {
> -	asm volatile(ALTERNATIVE("orr %0, %0, %1",
> -				 "nop",
> +	u64 mask;
> +
> +	asm volatile(ALTERNATIVE("mov %0, %1",
> +				 "mov %0, %2",
> +				 ARM64_HYP_OFFSET_LOW)
> +		     : "=r" (mask)
> +		     : "i" (~HYP_PAGE_OFFSET_HIGH_MASK),
> +		       "i" (~HYP_PAGE_OFFSET_LOW_MASK));
> +	asm volatile(ALTERNATIVE("nop",
> +				 "mov %0, xzr",
>  				 ARM64_HAS_VIRT_HOST_EXTN)
> -		     : "+r" (v) : "i" (~HYP_PAGE_OFFSET_MASK));
> -	return v;
> +		     : "+r" (mask));
> +	return v | mask;

If mask is ~HYP_PAGE_OFFSET_LOW_MASK how can you be sure that setting
bit (VA_BITS - 1) is always the right thing to do to generate a kernel
address?

This is kind of what I asked before, only now there's an extra bit that is
not guaranteed by the architecture to be set for the kernel range, I
think.
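
(Not part of the original mail: a standalone worked example of the two
masks, assuming VA_BITS = 39 and the definitions from patch 04, which is
not quoted here: HIGH = (1 << VA_BITS) - 1 and LOW = (1 << (VA_BITS - 1)) - 1.
It only illustrates the kern_hyp_va direction and the "strict subset"
remark in the quoted comment; it does not answer the question above.)

#include <assert.h>
#include <stdint.h>

#define VA_BITS				39
#define HYP_PAGE_OFFSET_HIGH_MASK	((1ULL << VA_BITS) - 1)	      /* 0x7fffffffff */
#define HYP_PAGE_OFFSET_LOW_MASK	((1ULL << (VA_BITS - 1)) - 1) /* 0x3fffffffff */

int main(void)
{
	uint64_t kva = 0xffffffc012345000ULL;	/* made-up kernel VA */

	/* the low mask is a strict subset of the high mask ... */
	assert((HYP_PAGE_OFFSET_LOW_MASK & HYP_PAGE_OFFSET_HIGH_MASK) ==
	       HYP_PAGE_OFFSET_LOW_MASK);

	/* ... so "and high, then and low" is just "and low" (bit 38 cleared too) */
	assert(((kva & HYP_PAGE_OFFSET_HIGH_MASK) & HYP_PAGE_OFFSET_LOW_MASK) ==
	       (kva & HYP_PAGE_OFFSET_LOW_MASK));
	return 0;
}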

>  }
>  
>  #define hyp_kern_va(v) (typeof(v))(__hyp_kern_va((unsigned long)(v)))
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index e45df1b..889330b 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -95,13 +95,33 @@
>  /*
>   * Convert a kernel VA into a HYP VA.
>   * reg: VA to be converted.
> + *
> + * This generates the following sequences:
> + * - High mask:
> + *		and x0, x0, #HYP_PAGE_OFFSET_HIGH_MASK
> + *		nop
> + * - Low mask:
> + *		and x0, x0, #HYP_PAGE_OFFSET_HIGH_MASK
> + *		and x0, x0, #HYP_PAGE_OFFSET_LOW_MASK
> + * - VHE:
> + *		nop
> + *		nop
> + *
> + * The "low mask" version works because the mask is a strict subset of
> + * the "high mask", hence performing the first mask for nothing.
> + * Should be completely invisible on any viable CPU.
>   */
>  .macro kern_hyp_va	reg
> -alternative_if_not ARM64_HAS_VIRT_HOST_EXTN	
> -	and	\reg, \reg, #HYP_PAGE_OFFSET_MASK
> +alternative_if_not ARM64_HAS_VIRT_HOST_EXTN
> +	and     \reg, \reg, #HYP_PAGE_OFFSET_HIGH_MASK
>  alternative_else
>  	nop
>  alternative_endif
> +alternative_if_not ARM64_HYP_OFFSET_LOW
> +	nop
> +alternative_else
> +	and     \reg, \reg, #HYP_PAGE_OFFSET_LOW_MASK
> +alternative_endif
>  .endm
>  
>  #else
> @@ -112,7 +132,23 @@ alternative_endif
>  #include <asm/mmu_context.h>
>  #include <asm/pgtable.h>
>  
> -#define KERN_TO_HYP(kva)	((unsigned long)kva & HYP_PAGE_OFFSET_MASK)
> +static inline unsigned long __kern_hyp_va(unsigned long v)
> +{
> +	asm volatile(ALTERNATIVE("and %0, %0, %1",
> +				 "nop",
> +				 ARM64_HAS_VIRT_HOST_EXTN)
> +		     : "+r" (v)
> +		     : "i" (HYP_PAGE_OFFSET_HIGH_MASK));
> +	asm volatile(ALTERNATIVE("nop",
> +				 "and %0, %0, %1",
> +				 ARM64_HYP_OFFSET_LOW)
> +		     : "+r" (v)
> +		     : "i" (HYP_PAGE_OFFSET_LOW_MASK));

how is the second operation a nop for VHE? Is this because
ARM64_HYP_OFFSET_LOW will never be set for VHE?

> +	return v;
> +}
> +
> +#define kern_hyp_va(v) 	(typeof(v))(__kern_hyp_va((unsigned long)(v)))
> +#define KERN_TO_HYP(v)	kern_hyp_va(v)

looks like there's room for some unification/cleanup here as well.

>  
>  /*
>   * We currently only support a 40bit IPA.
> -- 
> 2.1.4
> 

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 09/15] arm64: KVM: Simplify HYP init/teardown
  2016-06-07 10:58   ` Marc Zyngier
@ 2016-06-28 21:31     ` Christoffer Dall
  -1 siblings, 0 replies; 90+ messages in thread
From: Christoffer Dall @ 2016-06-28 21:31 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: linux-arm-kernel, kvm, kvmarm

On Tue, Jun 07, 2016 at 11:58:29AM +0100, Marc Zyngier wrote:
> Now that we only have the "merged page tables" case to deal with,
> there is a bunch of things we can simplify in the HYP code (both
> at init and teardown time).
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/include/asm/kvm_host.h | 12 ++------
>  arch/arm64/kvm/hyp-init.S         | 61 +++++----------------------------------
>  arch/arm64/kvm/hyp/entry.S        | 19 ------------
>  arch/arm64/kvm/hyp/hyp-entry.S    | 15 ++++++++++
>  arch/arm64/kvm/reset.c            | 11 -------
>  5 files changed, 26 insertions(+), 92 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 49095fc..88462c3 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -48,7 +48,6 @@
>  int __attribute_const__ kvm_target_cpu(void);
>  int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
>  int kvm_arch_dev_ioctl_check_extension(long ext);
> -unsigned long kvm_hyp_reset_entry(void);
>  void __extended_idmap_trampoline(phys_addr_t boot_pgd, phys_addr_t idmap_start);
>  
>  struct kvm_arch {
> @@ -357,19 +356,14 @@ static inline void __cpu_init_hyp_mode(phys_addr_t boot_pgd_ptr,
>  	 * Call initialization code, and switch to the full blown
>  	 * HYP code.
>  	 */
> -	__kvm_call_hyp((void *)boot_pgd_ptr, pgd_ptr,
> -		       hyp_stack_ptr, vector_ptr);
> +	__kvm_call_hyp((void *)pgd_ptr, hyp_stack_ptr, vector_ptr);
>  }
>  
> +void __kvm_hyp_teardown(void);
>  static inline void __cpu_reset_hyp_mode(phys_addr_t boot_pgd_ptr,
>  					phys_addr_t phys_idmap_start)
>  {
> -	/*
> -	 * Call reset code, and switch back to stub hyp vectors.
> -	 * Uses __kvm_call_hyp() to avoid kaslr's kvm_ksym_ref() translation.
> -	 */
> -	__kvm_call_hyp((void *)kvm_hyp_reset_entry(),
> -		       boot_pgd_ptr, phys_idmap_start);
> +	kvm_call_hyp(__kvm_hyp_teardown, phys_idmap_start);
>  }
>  
>  static inline void kvm_arch_hardware_unsetup(void) {}
> diff --git a/arch/arm64/kvm/hyp-init.S b/arch/arm64/kvm/hyp-init.S
> index a873a6d..6b29d3d 100644
> --- a/arch/arm64/kvm/hyp-init.S
> +++ b/arch/arm64/kvm/hyp-init.S
> @@ -53,10 +53,9 @@ __invalid:
>  	b	.
>  
>  	/*
> -	 * x0: HYP boot pgd
> -	 * x1: HYP pgd
> -	 * x2: HYP stack
> -	 * x3: HYP vectors
> +	 * x0: HYP pgd
> +	 * x1: HYP stack
> +	 * x2: HYP vectors
>  	 */
>  __do_hyp_init:
>  
> @@ -110,71 +109,27 @@ __do_hyp_init:
>  	msr	sctlr_el2, x4
>  	isb
>  
> -	/* Skip the trampoline dance if we merged the boot and runtime PGDs */
> -	cmp	x0, x1
> -	b.eq	merged
> -
> -	/* MMU is now enabled. Get ready for the trampoline dance */
> -	ldr	x4, =TRAMPOLINE_VA
> -	adr	x5, target
> -	bfi	x4, x5, #0, #PAGE_SHIFT
> -	br	x4
> -
> -target: /* We're now in the trampoline code, switch page tables */
> -	msr	ttbr0_el2, x1
> -	isb
> -
> -	/* Invalidate the old TLBs */
> -	tlbi	alle2
> -	dsb	sy
> -
> -merged:
>  	/* Set the stack and new vectors */
> +	kern_hyp_va	x1
> +	mov	sp, x1
>  	kern_hyp_va	x2
> -	mov	sp, x2
> -	kern_hyp_va	x3
> -	msr	vbar_el2, x3
> +	msr	vbar_el2, x2
>  
>  	/* Hello, World! */
>  	eret
>  ENDPROC(__kvm_hyp_init)
>  
>  	/*
> -	 * Reset kvm back to the hyp stub. This is the trampoline dance in
> -	 * reverse. If kvm used an extended idmap, __extended_idmap_trampoline
> -	 * calls this code directly in the idmap. In this case switching to the
> -	 * boot tables is a no-op.
> -	 *
> -	 * x0: HYP boot pgd
> -	 * x1: HYP phys_idmap_start
> +	 * Reset kvm back to the hyp stub.
>  	 */
>  ENTRY(__kvm_hyp_reset)
> -	/* We're in trampoline code in VA, switch back to boot page tables */
> -	msr	ttbr0_el2, x0
> -	isb
> -
> -	/* Ensure the PA branch doesn't find a stale tlb entry or stale code. */
> -	ic	iallu
> -	tlbi	alle2
> -	dsb	sy
> -	isb
> -
> -	/* Branch into PA space */
> -	adr	x0, 1f
> -	bfi	x1, x0, #0, #PAGE_SHIFT
> -	br	x1
> -
>  	/* We're now in idmap, disable MMU */
> -1:	mrs	x0, sctlr_el2
> +	mrs	x0, sctlr_el2
>  	ldr	x1, =SCTLR_ELx_FLAGS
>  	bic	x0, x0, x1		// Clear SCTL_M and etc
>  	msr	sctlr_el2, x0
>  	isb
>  
> -	/* Invalidate the old TLBs */
> -	tlbi	alle2
> -	dsb	sy
> -

why can we get rid of the above two lines now?

>  	/* Install stub vectors */
>  	adr_l	x0, __hyp_stub_vectors
>  	msr	vbar_el2, x0
> diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
> index 70254a6..ce9e5e5 100644
> --- a/arch/arm64/kvm/hyp/entry.S
> +++ b/arch/arm64/kvm/hyp/entry.S
> @@ -164,22 +164,3 @@ alternative_endif
>  
>  	eret
>  ENDPROC(__fpsimd_guest_restore)
> -
> -/*
> - * When using the extended idmap, we don't have a trampoline page we can use
> - * while we switch pages tables during __kvm_hyp_reset. Accessing the idmap
> - * directly would be ideal, but if we're using the extended idmap then the
> - * idmap is located above HYP_PAGE_OFFSET, and the address will be masked by
> - * kvm_call_hyp using kern_hyp_va.
> - *
> - * x0: HYP boot pgd
> - * x1: HYP phys_idmap_start
> - */
> -ENTRY(__extended_idmap_trampoline)
> -	mov	x4, x1
> -	adr_l	x3, __kvm_hyp_reset
> -
> -	/* insert __kvm_hyp_reset()s offset into phys_idmap_start */
> -	bfi	x4, x3, #0, #PAGE_SHIFT
> -	br	x4
> -ENDPROC(__extended_idmap_trampoline)
> diff --git a/arch/arm64/kvm/hyp/hyp-entry.S b/arch/arm64/kvm/hyp/hyp-entry.S
> index 2d87f36..f6d9694 100644
> --- a/arch/arm64/kvm/hyp/hyp-entry.S
> +++ b/arch/arm64/kvm/hyp/hyp-entry.S
> @@ -62,6 +62,21 @@ ENTRY(__vhe_hyp_call)
>  	isb
>  	ret
>  ENDPROC(__vhe_hyp_call)
> +
> +/*
> + * Compute the idmap address of __kvm_hyp_reset based on the idmap
> + * start passed as a parameter, and jump there.
> + *
> + * x0: HYP phys_idmap_start
> + */
> +ENTRY(__kvm_hyp_teardown)
> +	mov	x4, x0
> +	adr_l	x3, __kvm_hyp_reset
> +
> +	/* insert __kvm_hyp_reset()s offset into phys_idmap_start */
> +	bfi	x4, x3, #0, #PAGE_SHIFT
> +	br	x4
> +ENDPROC(__kvm_hyp_teardown)
>  	
>  el1_sync:				// Guest trapped into EL2
>  	save_x0_to_x3
> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> index d044ca3..deee1b1 100644
> --- a/arch/arm64/kvm/reset.c
> +++ b/arch/arm64/kvm/reset.c
> @@ -132,14 +132,3 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
>  	/* Reset timer */
>  	return kvm_timer_vcpu_reset(vcpu, cpu_vtimer_irq);
>  }
> -
> -unsigned long kvm_hyp_reset_entry(void)
> -{
> -	/*
> -	 * KVM is running with merged page tables, which don't have the
> -	 * trampoline page mapped. We know the idmap is still mapped,
> -	 * but can't be called into directly. Use
> -	 * __extended_idmap_trampoline to do the call.
> -	 */
> -	return (unsigned long)kvm_ksym_ref(__extended_idmap_trampoline);
> -}
> -- 
> 2.1.4
> 

I'm not sure I understand why we needed the kvm_hyp_reset_entry
indirection before, but the resulting code here looks good to me.
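
(For my own notes: the bfi in __kvm_hyp_teardown builds the branch target
as (phys_idmap_start & PAGE_MASK) | (__kvm_hyp_reset & ~PAGE_MASK), i.e.
the physical alias of __kvm_hyp_reset inside the idmap page, which only
works because __kvm_hyp_reset lives in the idmap text section.)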

Thanks,
-Christoffer


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 08/15] arm/arm64: KVM: Always have merged page tables
  2016-06-07 10:58   ` Marc Zyngier
@ 2016-06-28 21:43     ` Christoffer Dall
  -1 siblings, 0 replies; 90+ messages in thread
From: Christoffer Dall @ 2016-06-28 21:43 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, linux-arm-kernel, kvmarm

On Tue, Jun 07, 2016 at 11:58:28AM +0100, Marc Zyngier wrote:
> We're in a position where we can now always have "merged" page
> tables, where both the runtime mapping and the idmap coexist.
> 
> This results in some code being removed, but there is more to come.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm/kvm/mmu.c     | 74 +++++++++++++++++++++++---------------------------
>  arch/arm64/kvm/reset.c | 31 +++++----------------
>  2 files changed, 41 insertions(+), 64 deletions(-)
> 
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index d6ecbf1..9a17e14 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -492,13 +492,12 @@ void free_boot_hyp_pgd(void)
>  
>  	if (boot_hyp_pgd) {
>  		unmap_hyp_range(boot_hyp_pgd, hyp_idmap_start, PAGE_SIZE);
> -		unmap_hyp_range(boot_hyp_pgd, TRAMPOLINE_VA, PAGE_SIZE);
>  		free_pages((unsigned long)boot_hyp_pgd, hyp_pgd_order);
>  		boot_hyp_pgd = NULL;
>  	}
>  
>  	if (hyp_pgd)
> -		unmap_hyp_range(hyp_pgd, TRAMPOLINE_VA, PAGE_SIZE);
> +		unmap_hyp_range(hyp_pgd, hyp_idmap_start, PAGE_SIZE);
>  
>  	mutex_unlock(&kvm_hyp_pgd_mutex);
>  }
> @@ -1690,7 +1689,7 @@ phys_addr_t kvm_mmu_get_boot_httbr(void)
>  	if (__kvm_cpu_uses_extended_idmap())
>  		return virt_to_phys(merged_hyp_pgd);
>  	else
> -		return virt_to_phys(boot_hyp_pgd);
> +		return virt_to_phys(hyp_pgd);
>  }
>  
>  phys_addr_t kvm_get_idmap_vector(void)
> @@ -1703,6 +1702,22 @@ phys_addr_t kvm_get_idmap_start(void)
>  	return hyp_idmap_start;
>  }
>  
> +static int kvm_map_idmap_text(pgd_t *pgd)
> +{
> +	int err;
> +
> +	/* Create the idmap in the boot page tables */
> +	err = 	__create_hyp_mappings(pgd,
> +				      hyp_idmap_start, hyp_idmap_end,
> +				      __phys_to_pfn(hyp_idmap_start),
> +				      PAGE_HYP);
> +	if (err)
> +		kvm_err("Failed to idmap %lx-%lx\n",
> +			hyp_idmap_start, hyp_idmap_end);
> +
> +	return err;
> +}
> +
>  int kvm_mmu_init(void)
>  {
>  	int err;
> @@ -1718,27 +1733,25 @@ int kvm_mmu_init(void)
>  	BUG_ON((hyp_idmap_start ^ (hyp_idmap_end - 1)) & PAGE_MASK);
>  
>  	hyp_pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, hyp_pgd_order);
> -	boot_hyp_pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, hyp_pgd_order);
> -
> -	if (!hyp_pgd || !boot_hyp_pgd) {
> +	if (!hyp_pgd) {
>  		kvm_err("Hyp mode PGD not allocated\n");
>  		err = -ENOMEM;
>  		goto out;
>  	}
>  
> -	/* Create the idmap in the boot page tables */
> -	err = 	__create_hyp_mappings(boot_hyp_pgd,
> -				      hyp_idmap_start, hyp_idmap_end,
> -				      __phys_to_pfn(hyp_idmap_start),
> -				      PAGE_HYP);
> +	if (__kvm_cpu_uses_extended_idmap()) {
> +		boot_hyp_pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
> +							 hyp_pgd_order);
> +		if (!boot_hyp_pgd) {
> +			kvm_err("Hyp boot PGD not allocated\n");
> +			err = -ENOMEM;
> +			goto out;
> +		}
>  
> -	if (err) {
> -		kvm_err("Failed to idmap %lx-%lx\n",
> -			hyp_idmap_start, hyp_idmap_end);
> -		goto out;
> -	}
> +		err = kvm_map_idmap_text(boot_hyp_pgd);
> +		if (err)
> +			goto out;
>  
> -	if (__kvm_cpu_uses_extended_idmap()) {
>  		merged_hyp_pgd = (pgd_t *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
>  		if (!merged_hyp_pgd) {
>  			kvm_err("Failed to allocate extra HYP pgd\n");
> @@ -1746,29 +1759,10 @@ int kvm_mmu_init(void)
>  		}
>  		__kvm_extend_hypmap(boot_hyp_pgd, hyp_pgd, merged_hyp_pgd,
>  				    hyp_idmap_start);
> -		return 0;
> -	}
> -
> -	/* Map the very same page at the trampoline VA */
> -	err = 	__create_hyp_mappings(boot_hyp_pgd,
> -				      TRAMPOLINE_VA, TRAMPOLINE_VA + PAGE_SIZE,
> -				      __phys_to_pfn(hyp_idmap_start),
> -				      PAGE_HYP);
> -	if (err) {
> -		kvm_err("Failed to map trampoline @%lx into boot HYP pgd\n",
> -			TRAMPOLINE_VA);
> -		goto out;
> -	}
> -
> -	/* Map the same page again into the runtime page tables */
> -	err = 	__create_hyp_mappings(hyp_pgd,
> -				      TRAMPOLINE_VA, TRAMPOLINE_VA + PAGE_SIZE,
> -				      __phys_to_pfn(hyp_idmap_start),
> -				      PAGE_HYP);
> -	if (err) {
> -		kvm_err("Failed to map trampoline @%lx into runtime HYP pgd\n",
> -			TRAMPOLINE_VA);
> -		goto out;
> +	} else {
> +		err = kvm_map_idmap_text(hyp_pgd);
> +		if (err)
> +			goto out;

Something I'm not clear on:

how can we always have merged pgtables on 32-bit ARM at this point?

why is there not a potential conflict at this point in the series
between the runtime hyp mappings and the idmaps?

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 12/15] arm: KVM: Simplify HYP init
  2016-06-07 10:58   ` Marc Zyngier
@ 2016-06-28 21:50     ` Christoffer Dall
  -1 siblings, 0 replies; 90+ messages in thread
From: Christoffer Dall @ 2016-06-28 21:50 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, linux-arm-kernel, kvmarm

On Tue, Jun 07, 2016 at 11:58:32AM +0100, Marc Zyngier wrote:
> Just like for arm64, we can now make the HYP setup a lot simpler,
> and we can now initialise it in one go (instead of the two
> phases we currently have).
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm/include/asm/kvm_host.h | 15 +++++--------
>  arch/arm/kvm/init.S             | 49 ++++++++---------------------------------
>  2 files changed, 14 insertions(+), 50 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 020f4eb..eafbfd5 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -250,18 +250,13 @@ static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr,
>  	 * code. The init code doesn't need to preserve these
>  	 * registers as r0-r3 are already callee saved according to
>  	 * the AAPCS.
> -	 * Note that we slightly misuse the prototype by casing the
> +	 * Note that we slightly misuse the prototype by casting the
>  	 * stack pointer to a void *.
> -	 *
> -	 * We don't have enough registers to perform the full init in
> -	 * one go.  Install the boot PGD first, and then install the
> -	 * runtime PGD, stack pointer and vectors. The PGDs are always
> -	 * passed as the third argument, in order to be passed into
> -	 * r2-r3 to the init code (yes, this is compliant with the
> -	 * PCS!).
> -	 */
>  
> -	kvm_call_hyp(NULL, 0, boot_pgd_ptr);
> +	 * The PGDs are always passed as the third argument, in order
> +	 * to be passed into r2-r3 to the init code (yes, this is
> +	 * compliant with the PCS!).
> +	 */
>  
>  	kvm_call_hyp((void*)hyp_stack_ptr, vector_ptr, pgd_ptr);
>  }
> diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
> index 1f9ae17..b82a99d 100644
> --- a/arch/arm/kvm/init.S
> +++ b/arch/arm/kvm/init.S
> @@ -32,23 +32,13 @@
>   *       r2,r3 = Hypervisor pgd pointer
>   *
>   * The init scenario is:
> - * - We jump in HYP with four parameters: boot HYP pgd, runtime HYP pgd,
> - *   runtime stack, runtime vectors
> - * - Enable the MMU with the boot pgd
> - * - Jump to a target into the trampoline page (remember, this is the same
> - *   physical page!)
> - * - Now switch to the runtime pgd (same VA, and still the same physical
> - *   page!)
> + * - We jump in HYP with 3 parameters: runtime HYP pgd, runtime stack,
> + *   runtime vectors

probably just call this HYP pgd, HYP stack, and HYP vectors now

>   * - Invalidate TLBs
>   * - Set stack and vectors
> + * - Setup the page tables
> + * - Enable the MMU
>   * - Profit! (or eret, if you only care about the code).
> - *
> - * As we only have four registers available to pass parameters (and we
> - * need six), we split the init in two phases:
> - * - Phase 1: r0 = 0, r1 = 0, r2,r3 contain the boot PGD.
> - *   Provides the basic HYP init, and enable the MMU.
> - * - Phase 2: r0 = ToS, r1 = vectors, r2,r3 contain the runtime PGD.
> - *   Switches to the runtime PGD, set stack and vectors.
>   */
>  
>  	.text
> @@ -68,8 +58,11 @@ __kvm_hyp_init:
>  	W(b)	.
>  
>  __do_hyp_init:
> -	cmp	r0, #0			@ We have a SP?
> -	bne	phase2			@ Yes, second stage init
> +	@ Set stack pointer
> +	mov	sp, r0
> +
> +	@ Set HVBAR to point to the HYP vectors
> +	mcr	p15, 4, r1, c12, c0, 0	@ HVBAR
>  
>  	@ Set the HTTBR to point to the hypervisor PGD pointer passed
>  	mcrr	p15, 4, rr_lo_hi(r2, r3), c2
> @@ -114,33 +107,9 @@ __do_hyp_init:
>   THUMB(	ldr	r2, =(HSCTLR_M | HSCTLR_A | HSCTLR_TE)		)
>  	orr	r1, r1, r2
>  	orr	r0, r0, r1
> -	isb
>  	mcr	p15, 4, r0, c1, c0, 0	@ HSCR
> -
> -	@ End of init phase-1
> -	eret
> -
> -phase2:
> -	@ Set stack pointer
> -	mov	sp, r0
> -
> -	@ Set HVBAR to point to the HYP vectors
> -	mcr	p15, 4, r1, c12, c0, 0	@ HVBAR
> -
> -	@ Jump to the trampoline page
> -	ldr	r0, =TRAMPOLINE_VA
> -	adr	r1, target
> -	bfi	r0, r1, #0, #PAGE_SHIFT
> -	ret	r0
> -
> -target:	@ We're now in the trampoline code, switch page tables
> -	mcrr	p15, 4, rr_lo_hi(r2, r3), c2
>  	isb
>  
> -	@ Invalidate the old TLBs
> -	mcr	p15, 4, r0, c8, c7, 0	@ TLBIALLH
> -	dsb	ish

how are we sure there are no stale entries in the TLB beyond the idmap
region?  Did we take care of this during kernel boot?  What about
hotplug/suspend stuff?

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 15/15] arm/arm64: KVM: Check that IDMAP doesn't intersect with VA range
  2016-06-07 10:58   ` Marc Zyngier
@ 2016-06-28 22:01     ` Christoffer Dall
  -1 siblings, 0 replies; 90+ messages in thread
From: Christoffer Dall @ 2016-06-28 22:01 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, linux-arm-kernel, kvmarm

On Tue, Jun 07, 2016 at 11:58:35AM +0100, Marc Zyngier wrote:
> This is more of a safety measure than anything else: If we end up
> with an idmap page that intersects with the range picked for the
> HYP VA space, abort the KVM setup, as it is unsafe to go
> further.
> 
> I cannot imagine it happening on 64bit (we have a mechanism to
> work around it), but could potentially occur on a 32bit system with
> the kernel loaded high enough in memory so that it conflicts with
> the kernel VA.

ah, you had a patch for this...

does this even work for enabling the MMU during kernel boot or how do
they deal with it?

> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm/kvm/mmu.c | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 46b8604..819517d 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -1708,6 +1708,21 @@ int kvm_mmu_init(void)
>  	 */
>  	BUG_ON((hyp_idmap_start ^ (hyp_idmap_end - 1)) & PAGE_MASK);
>  
> +	kvm_info("IDMAP page: %lx\n", hyp_idmap_start);
> +	kvm_info("HYP VA range: %lx:%lx\n",
> +		 KERN_TO_HYP(PAGE_OFFSET), KERN_TO_HYP(~0UL));
> +
> +	if (hyp_idmap_start >= KERN_TO_HYP(PAGE_OFFSET) &&
> +	    hyp_idmap_start <  KERN_TO_HYP(~0UL)) {

why is the second part of this clause necessary?

> +		/*
> +		 * The idmap page is intersecting with the VA space,
> +		 * it is not safe to continue further.
> +		 */
> +		kvm_err("IDMAP intersecting with HYP VA, unable to continue\n");
> +		err = -EINVAL;
> +		goto out;
> +	}
> +
>  	hyp_pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, hyp_pgd_order);
>  	if (!hyp_pgd) {
>  		kvm_err("Hyp mode PGD not allocated\n");
> -- 
> 2.1.4
> 

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 01/15] arm64: KVM: Merged page tables documentation
  2016-06-28 11:46         ` Christoffer Dall
@ 2016-06-29  9:05           ` Marc Zyngier
  -1 siblings, 0 replies; 90+ messages in thread
From: Marc Zyngier @ 2016-06-29  9:05 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, linux-arm-kernel, kvmarm

On Tue, 28 Jun 2016 13:46:08 +0200
Christoffer Dall <christoffer.dall@linaro.org> wrote:

> On Mon, Jun 27, 2016 at 03:06:11PM +0100, Marc Zyngier wrote:
> > On 27/06/16 14:28, Christoffer Dall wrote:
> > > On Tue, Jun 07, 2016 at 11:58:21AM +0100, Marc Zyngier wrote:
> > >> Since dealing with VA ranges tends to hurt my brain badly, let's
> > >> start with a bit of documentation that will hopefully help
> > >> understanding what comes next...
> > >>
> > >> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> > >> ---
> > >>  arch/arm64/include/asm/kvm_mmu.h | 45 +++++++++++++++++++++++++++++++++++++---
> > >>  1 file changed, 42 insertions(+), 3 deletions(-)
> > >>
> > >> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> > >> index f05ac27..00bc277 100644
> > >> --- a/arch/arm64/include/asm/kvm_mmu.h
> > >> +++ b/arch/arm64/include/asm/kvm_mmu.h
> > >> @@ -29,10 +29,49 @@
> > >>   *
> > >>   * Instead, give the HYP mode its own VA region at a fixed offset from
> > >>   * the kernel by just masking the top bits (which are all ones for a
> > >> - * kernel address).
> > >> + * kernel address). We need to find out how many bits to mask.
> > >>   *
> > >> - * ARMv8.1 (using VHE) does have a TTBR1_EL2, and doesn't use these
> > >> - * macros (the entire kernel runs at EL2).
> > >> + * We want to build a set of page tables that cover both parts of the
> > >> + * idmap (the trampoline page used to initialize EL2), and our normal
> > >> + * runtime VA space, at the same time.
> > >> + *
> > >> + * Given that the kernel uses VA_BITS for its entire address space,
> > >> + * and that half of that space (VA_BITS - 1) is used for the linear
> > >> + * mapping, we can limit the EL2 space to the same size.
> > > 
> > > we can also limit the EL2 space to (VA_BITS - 1).
> > > 
> > >> + *
> > >> + * The main question is "Within the VA_BITS space, does EL2 use the
> > >> + * top or the bottom half of that space to shadow the kernel's linear
> > >> + * mapping?". As we need to idmap the trampoline page, this is
> > >> + * determined by the range in which this page lives.
> > >> + *
> > >> + * If the page is in the bottom half, we have to use the top half. If
> > >> + * the page is in the top half, we have to use the bottom half:
> > >> + *
> > >> + * if (PA(T)[VA_BITS - 1] == 1)
> > >> + *	HYP_VA_RANGE = [0 ... (1 << (VA_BITS - 1)) - 1]
> > >> + * else
> > >> + *	HYP_VA_RANGE = [(1 << (VA_BITS - 1)) ... (1 << VA_BITS) - 1]
> > > 
> > > Is this pseudo code or what am I looking at?  What is T?
> > 
> > Pseudocode indeed. T is the "trampoline page".
> > 
> > > I don't understand what this is saying.
> > 
> > This is giving you the range of HYP VAs that can be safely used to map
> > kernel ranges.
> 
> Ah, by PA(T)[bit_nr] you mean the value of an individual bit 'bit_nr' ?
> 
> I just think I choked on the pseudocode syntax, perhaps this is easier
> to understand?
> 
> T = __virt_to_phys(__hyp_idmap_text_start)
> if (T & BIT(VA_BITS - 1))
> 	HYP_VA_MIN = 0  //idmap in upper half
> else
> 	HYP_VA_MIN = 1 << (VA_BITS - 1)
> HYP_VA_MAX = HYP_VA_MIN + (1 << (VA_BITS - 1)) - 1

Yup, that's equivalent.
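
To put illustrative numbers on it (4K pages, VA_BITS == 39, so
BIT(38) == 0x40_0000_0000): if bit 38 of the idmap's PA is set, HYP gets
the bottom half [0 .. 0x3f_ffff_ffff] of the VA_BITS space; otherwise it
gets the top half [0x40_0000_0000 .. 0x7f_ffff_ffff].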

[...]

> > >> + *
> > >> + * In practice, the second case can be simplified to
> > >> + *	HYP_VA_RANGE = [0 ... (1 << VA_BITS) - 1]
> > >> + * because we'll never get anything in the bottom range.
> > > 
> > > and now I'm more confused, are we not supposed to map the idmap in the
> > > bottom range?  Is this part of the comment necessary?
> > 
> > Well, I found it useful when I wrote it. What I meant is that we're
> > never going to alias a kernel mapping there.
> 
> I think we should merge the documentation, this stuff is tricky so
> having it properly documented is important IMHO.
>
> The confusing part here is that we just said above that the HYP VA range
> may have to live in the upper part because the lower part would be used
> for the idmap, so why can we use it anyway?
> 
> Is the point that you'll be done with the idmap at some point?

No, the idmap has to stay (you definitely need it in order to enable
the MMU). It is not so much that we can or cannot use the bottom range, this
is simply where the idmap lives (the remark is confusing). The usable
VA space (to map kernel objects) is still between HYP_VA_MIN and
HYP_VA_MAX, as per your above definition.

> > 
> > > 
> > >> + *
> > >> + * This of course assumes that the trampoline page exists within the
> > >> + * VA_BITS range. If it doesn't, then it means we're in the odd case
> > >> + * where the kernel idmap (as well as HYP) uses more levels than the
> > >> + * kernel runtime page tables (as seen when the kernel is configured
> > >> + * for 4k pages, 39bits VA, and yet memory lives just above that
> > >> + * limit, forcing the idmap to use 4 levels of page tables while the
> > >> + * kernel itself only uses 3). In this particular case, it doesn't
> > >> + * matter which side of VA_BITS we use, as we're guaranteed not to
> > >> + * conflict with anything.
> > >> + *
> > >> + * An alternative would be to always use 4 levels of page tables for
> > >> + * EL2, no matter what the kernel does. But who wants more levels than
> > >> + * strictly necessary?
> 
> Our expectation here is that using an additional level is slower for TLB
> misses, so we want to avoid this, correct?  Also, does the kernel never
> use 4 levels of page tables, so that this is always an option?

An additional level is likely to increase the latency of a miss by an
additional 30% (compared to a 3 level miss). The kernel itself may be
configured for 4 levels, in which case we follow whatever it does.
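
To put numbers on the odd case above (4K pages again): with VA_BITS == 39,
three levels can only idmap physical addresses below 1 << 39 = 512GB, so a
system whose memory (and therefore the idmap page) sits above 512GB forces
the idmap, and hence HYP, to four levels, while the kernel's runtime
tables stay at three.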

> I appreciate the tongue-in-cheek, but since this hurts my brain (badly)
> I want to get rid of anything here that leaves the reader with open
> questions.
> 
> I don't mind trying to rewrite some of this, just have to make sure I
> actually understand it first.
> 
> > >> + *
> > >> + * Thankfully, ARMv8.1 (using VHE) does have a TTBR1_EL2, and doesn't
> > >> + * need any of this madness (the entire kernel runs at EL2).
> 
> So here I would simply state that using VHE, there are no separate hyp
> mappings and all KVM functionality is already mapped as part of the main
> kernel mappings, and none of this applies in that case.  Perhaps that's
> what you said already, and I just misread it for some reason.

Sure. I'll rewrite the thing.

> > > 
> > > Not sure how these two last paragraphs helps understanding what this
> > > patch set is about to implement, as it seems to raise more questions
> > > than answer them, but I will proceed to trying to read the code...
> > 
> > As I said, I found this blurb useful when I was trying to reason about
> > the problem. I don't mind it being dropped.
> > 
> 
> I would prefer if we can tweak it so I also understand it and then
> actually merge it.  That also makes it easier for me to review the patch
> set :)

Works for me!

	M.
-- 
Jazz is not dead. It just smells funny.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 05/15] arm64: KVM: Refactor kern_hyp_va/hyp_kern_va to deal with multiple offsets
  2016-06-28 12:42     ` Christoffer Dall
@ 2016-06-30  9:22       ` Marc Zyngier
  -1 siblings, 0 replies; 90+ messages in thread
From: Marc Zyngier @ 2016-06-30  9:22 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: linux-arm-kernel, kvm, kvmarm

On 28/06/16 13:42, Christoffer Dall wrote:
> On Tue, Jun 07, 2016 at 11:58:25AM +0100, Marc Zyngier wrote:
>> As we move towards a selectable HYP VA range, it is obvious that
>> we don't want to test a variable to find out if we need to use
>> the bottom VA range, the top VA range, or use the address as is
>> (for VHE).
>>
>> Instead, we can expand our current helpers to generate the right
>> mask or nop with code patching. We default to using the top VA
>> space, with alternatives to switch to the bottom one or to nop
>> out the instructions.
>>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  arch/arm64/include/asm/kvm_hyp.h | 27 ++++++++++++--------------
>>  arch/arm64/include/asm/kvm_mmu.h | 42 +++++++++++++++++++++++++++++++++++++---
>>  2 files changed, 51 insertions(+), 18 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
>> index 61d01a9..dd4904b 100644
>> --- a/arch/arm64/include/asm/kvm_hyp.h
>> +++ b/arch/arm64/include/asm/kvm_hyp.h
>> @@ -25,24 +25,21 @@
>>  
>>  #define __hyp_text __section(.hyp.text) notrace
>>  
>> -static inline unsigned long __kern_hyp_va(unsigned long v)
>> -{
>> -	asm volatile(ALTERNATIVE("and %0, %0, %1",
>> -				 "nop",
>> -				 ARM64_HAS_VIRT_HOST_EXTN)
>> -		     : "+r" (v) : "i" (HYP_PAGE_OFFSET_MASK));
>> -	return v;
>> -}
>> -
>> -#define kern_hyp_va(v) (typeof(v))(__kern_hyp_va((unsigned long)(v)))
>> -
>>  static inline unsigned long __hyp_kern_va(unsigned long v)
>>  {
>> -	asm volatile(ALTERNATIVE("orr %0, %0, %1",
>> -				 "nop",
>> +	u64 mask;
>> +
>> +	asm volatile(ALTERNATIVE("mov %0, %1",
>> +				 "mov %0, %2",
>> +				 ARM64_HYP_OFFSET_LOW)
>> +		     : "=r" (mask)
>> +		     : "i" (~HYP_PAGE_OFFSET_HIGH_MASK),
>> +		       "i" (~HYP_PAGE_OFFSET_LOW_MASK));
>> +	asm volatile(ALTERNATIVE("nop",
>> +				 "mov %0, xzr",
>>  				 ARM64_HAS_VIRT_HOST_EXTN)
>> -		     : "+r" (v) : "i" (~HYP_PAGE_OFFSET_MASK));
>> -	return v;
>> +		     : "+r" (mask));
>> +	return v | mask;
> 
> If mask is ~HYP_PAGE_OFFSET_LOW_MASK how can you be sure that setting
> bit (VA_BITS - 1) is always the right thing to do to generate a kernel
> address?

It has taken me a while, but I think I finally see what you mean. We
have no idea whether that bit was set or not.
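
To make this concrete, here is an illustrative example (not from the
series itself; it assumes VA_BITS=39 and reads the low mask as
((1UL << (VA_BITS - 1)) - 1)):

#define VA_BITS				39
#define HYP_PAGE_OFFSET_LOW_MASK	((1UL << (VA_BITS - 1)) - 1)

static void example(void)
{
	/* A kernel VA whose bit (VA_BITS - 1) happens to be clear: */
	unsigned long kva  = 0xffffff8000100000UL;
	unsigned long hva  = kva & HYP_PAGE_OFFSET_LOW_MASK;	/* kern_hyp_va, low range  */
	unsigned long back = hva | ~HYP_PAGE_OFFSET_LOW_MASK;	/* proposed hyp_kern_va    */

	/* back is 0xffffffc000100000UL, not kva: bit 38 was set unconditionally. */
}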

> This is kind of what I asked before only now there's an extra bit not
> guaranteed by the architecture to be set for the kernel range, I
> think.

Yeah, I finally connected the couple of neurons left up there (that's
what remains after the whole brexit braindamage). This doesn't work (or
rather it only works sometimes). The good news is that I also realized we
don't need any of that crap.

The only case we currently use a HVA->KVA transformation is to pass the
panic string down to panic(), and we can perfectly prevent
__kvm_hyp_teardown from ever being evaluated as a HVA with a bit of
asm-foo. This allows us to get rid of this whole function.

> 
>>  }
>>  
>>  #define hyp_kern_va(v) (typeof(v))(__hyp_kern_va((unsigned long)(v)))
>> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
>> index e45df1b..889330b 100644
>> --- a/arch/arm64/include/asm/kvm_mmu.h
>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>> @@ -95,13 +95,33 @@
>>  /*
>>   * Convert a kernel VA into a HYP VA.
>>   * reg: VA to be converted.
>> + *
>> + * This generates the following sequences:
>> + * - High mask:
>> + *		and x0, x0, #HYP_PAGE_OFFSET_HIGH_MASK
>> + *		nop
>> + * - Low mask:
>> + *		and x0, x0, #HYP_PAGE_OFFSET_HIGH_MASK
>> + *		and x0, x0, #HYP_PAGE_OFFSET_LOW_MASK
>> + * - VHE:
>> + *		nop
>> + *		nop
>> + *
>> + * The "low mask" version works because the mask is a strict subset of
>> + * the "high mask", hence performing the first mask for nothing.
>> + * Should be completely invisible on any viable CPU.
>>   */
>>  .macro kern_hyp_va	reg
>> -alternative_if_not ARM64_HAS_VIRT_HOST_EXTN	
>> -	and	\reg, \reg, #HYP_PAGE_OFFSET_MASK
>> +alternative_if_not ARM64_HAS_VIRT_HOST_EXTN
>> +	and     \reg, \reg, #HYP_PAGE_OFFSET_HIGH_MASK
>>  alternative_else
>>  	nop
>>  alternative_endif
>> +alternative_if_not ARM64_HYP_OFFSET_LOW
>> +	nop
>> +alternative_else
>> +	and     \reg, \reg, #HYP_PAGE_OFFSET_LOW_MASK
>> +alternative_endif
>>  .endm
>>  
>>  #else
>> @@ -112,7 +132,23 @@ alternative_endif
>>  #include <asm/mmu_context.h>
>>  #include <asm/pgtable.h>
>>  
>> -#define KERN_TO_HYP(kva)	((unsigned long)kva & HYP_PAGE_OFFSET_MASK)
>> +static inline unsigned long __kern_hyp_va(unsigned long v)
>> +{
>> +	asm volatile(ALTERNATIVE("and %0, %0, %1",
>> +				 "nop",
>> +				 ARM64_HAS_VIRT_HOST_EXTN)
>> +		     : "+r" (v)
>> +		     : "i" (HYP_PAGE_OFFSET_HIGH_MASK));
>> +	asm volatile(ALTERNATIVE("nop",
>> +				 "and %0, %0, %1",
>> +				 ARM64_HYP_OFFSET_LOW)
>> +		     : "+r" (v)
>> +		     : "i" (HYP_PAGE_OFFSET_LOW_MASK));
> 
> how is the second operation a nop for VHE? Is this because
> ARM64_HYP_OFFSET_LOW will never be set for VHE?

That's because VHE has no notion of an offset at all (we end up with two
nops).

>> +	return v;
>> +}
>> +
>> +#define kern_hyp_va(v) 	(typeof(v))(__kern_hyp_va((unsigned long)(v)))
>> +#define KERN_TO_HYP(v)	kern_hyp_va(v)
> 
> looks like there's room for some unification/cleanup here as well.

Indeed. I'll queue up an additional patch at the end to clean the bulk
of the code.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 05/15] arm64: KVM: Refactor kern_hyp_va/hyp_kern_va to deal with multiple offsets
  2016-06-30  9:22       ` Marc Zyngier
@ 2016-06-30 10:16         ` Marc Zyngier
  -1 siblings, 0 replies; 90+ messages in thread
From: Marc Zyngier @ 2016-06-30 10:16 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: linux-arm-kernel, kvm, kvmarm

On 30/06/16 10:22, Marc Zyngier wrote:
> On 28/06/16 13:42, Christoffer Dall wrote:
>> On Tue, Jun 07, 2016 at 11:58:25AM +0100, Marc Zyngier wrote:
>>> As we move towards a selectable HYP VA range, it is obvious that
>>> we don't want to test a variable to find out if we need to use
>>> the bottom VA range, the top VA range, or use the address as is
>>> (for VHE).
>>>
>>> Instead, we can expand our current helpers to generate the right
>>> mask or nop with code patching. We default to using the top VA
>>> space, with alternatives to switch to the bottom one or to nop
>>> out the instructions.
>>>
>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>> ---
>>>  arch/arm64/include/asm/kvm_hyp.h | 27 ++++++++++++--------------
>>>  arch/arm64/include/asm/kvm_mmu.h | 42 +++++++++++++++++++++++++++++++++++++---
>>>  2 files changed, 51 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
>>> index 61d01a9..dd4904b 100644
>>> --- a/arch/arm64/include/asm/kvm_hyp.h
>>> +++ b/arch/arm64/include/asm/kvm_hyp.h
>>> @@ -25,24 +25,21 @@
>>>  
>>>  #define __hyp_text __section(.hyp.text) notrace
>>>  
>>> -static inline unsigned long __kern_hyp_va(unsigned long v)
>>> -{
>>> -	asm volatile(ALTERNATIVE("and %0, %0, %1",
>>> -				 "nop",
>>> -				 ARM64_HAS_VIRT_HOST_EXTN)
>>> -		     : "+r" (v) : "i" (HYP_PAGE_OFFSET_MASK));
>>> -	return v;
>>> -}
>>> -
>>> -#define kern_hyp_va(v) (typeof(v))(__kern_hyp_va((unsigned long)(v)))
>>> -
>>>  static inline unsigned long __hyp_kern_va(unsigned long v)
>>>  {
>>> -	asm volatile(ALTERNATIVE("orr %0, %0, %1",
>>> -				 "nop",
>>> +	u64 mask;
>>> +
>>> +	asm volatile(ALTERNATIVE("mov %0, %1",
>>> +				 "mov %0, %2",
>>> +				 ARM64_HYP_OFFSET_LOW)
>>> +		     : "=r" (mask)
>>> +		     : "i" (~HYP_PAGE_OFFSET_HIGH_MASK),
>>> +		       "i" (~HYP_PAGE_OFFSET_LOW_MASK));
>>> +	asm volatile(ALTERNATIVE("nop",
>>> +				 "mov %0, xzr",
>>>  				 ARM64_HAS_VIRT_HOST_EXTN)
>>> -		     : "+r" (v) : "i" (~HYP_PAGE_OFFSET_MASK));
>>> -	return v;
>>> +		     : "+r" (mask));
>>> +	return v | mask;
>>
>> If mask is ~HYP_PAGE_OFFSET_LOW_MASK how can you be sure that setting
>> bit (VA_BITS - 1) is always the right thing to do to generate a kernel
>> address?
> 
> It has taken me a while, but I think I finally see what you mean. We
> have no idea whether that bit was set or not.
> 
>> This is kind of what I asked before only now there's an extra bit not
>> guaranteed by the architecture to be set for the kernel range, I
>> think.
> 
> Yeah, I finally connected the couple of neurons left up there (that's
> what remains after the whole brexit braindamage). This doesn't work (or
> rather it only works sometimes). The good news is that I also realized we
> don't need any of that crap.
> 
> The only case we currently use a HVA->KVA transformation is to pass the
> panic string down to panic(), and we can perfectly prevent
> __kvm_hyp_teardown from ever being evaluated as a HVA with a bit of
> asm-foo. This allows us to get rid of this whole function.

Here's what I meant by this:

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 437cfad..c19754d 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -299,9 +299,16 @@ static const char __hyp_panic_string[] = "HYP panic:\nPS:%08llx PC:%016llx ESR:%
 
 static void __hyp_text __hyp_call_panic_nvhe(u64 spsr, u64 elr, u64 par)
 {
-	unsigned long str_va = (unsigned long)__hyp_panic_string;
+	unsigned long str_va;
 
-	__hyp_do_panic(hyp_kern_va(str_va),
+	/*
+	 * Force the panic string to be loaded from the literal pool,
+	 * making sure it is a kernel address and not a PC-relative
+	 * reference.
+	 */
+	asm volatile("ldr %0, =__hyp_panic_string" : "=r" (str_va));
+
+	__hyp_do_panic(str_va,
 		       spsr,  elr,
 		       read_sysreg(esr_el2),   read_sysreg_el2(far),
 		       read_sysreg(hpfar_el2), par,

With that in place, we can entirely get rid of hyp_kern_va().
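
For reference, a sketch of why the two forms differ (illustrative
only, not part of the patch; these lines are meant to sit in
__hyp_call_panic_nvhe() alongside the existing declaration):

	/*
	 * Taking the address in C lets the compiler emit a PC-relative
	 * sequence (typically adrp/add), which evaluates to wherever the
	 * object is currently mapped, i.e. a HYP VA when running at EL2.
	 * The "ldr reg, =symbol" pseudo-instruction instead loads the
	 * symbol's link-time address (a kernel VA) from a literal pool.
	 */
	unsigned long pc_rel = (unsigned long)__hyp_panic_string;	/* HYP VA at EL2 */
	unsigned long literal;
	asm volatile("ldr %0, =__hyp_panic_string" : "=r" (literal));	/* kernel VA */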

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* Re: [PATCH 05/15] arm64: KVM: Refactor kern_hyp_va/hyp_kern_va to deal with multiple offsets
  2016-06-30 10:16         ` Marc Zyngier
@ 2016-06-30 10:26           ` Christoffer Dall
  -1 siblings, 0 replies; 90+ messages in thread
From: Christoffer Dall @ 2016-06-30 10:26 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, linux-arm-kernel, kvmarm

On Thu, Jun 30, 2016 at 11:16:44AM +0100, Marc Zyngier wrote:
> On 30/06/16 10:22, Marc Zyngier wrote:
> > On 28/06/16 13:42, Christoffer Dall wrote:
> >> On Tue, Jun 07, 2016 at 11:58:25AM +0100, Marc Zyngier wrote:
> >>> As we move towards a selectable HYP VA range, it is obvious that
> >>> we don't want to test a variable to find out if we need to use
> >>> the bottom VA range, the top VA range, or use the address as is
> >>> (for VHE).
> >>>
> >>> Instead, we can expand our current helpers to generate the right
> >>> mask or nop with code patching. We default to using the top VA
> >>> space, with alternatives to switch to the bottom one or to nop
> >>> out the instructions.
> >>>
> >>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> >>> ---
> >>>  arch/arm64/include/asm/kvm_hyp.h | 27 ++++++++++++--------------
> >>>  arch/arm64/include/asm/kvm_mmu.h | 42 +++++++++++++++++++++++++++++++++++++---
> >>>  2 files changed, 51 insertions(+), 18 deletions(-)
> >>>
> >>> diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
> >>> index 61d01a9..dd4904b 100644
> >>> --- a/arch/arm64/include/asm/kvm_hyp.h
> >>> +++ b/arch/arm64/include/asm/kvm_hyp.h
> >>> @@ -25,24 +25,21 @@
> >>>  
> >>>  #define __hyp_text __section(.hyp.text) notrace
> >>>  
> >>> -static inline unsigned long __kern_hyp_va(unsigned long v)
> >>> -{
> >>> -	asm volatile(ALTERNATIVE("and %0, %0, %1",
> >>> -				 "nop",
> >>> -				 ARM64_HAS_VIRT_HOST_EXTN)
> >>> -		     : "+r" (v) : "i" (HYP_PAGE_OFFSET_MASK));
> >>> -	return v;
> >>> -}
> >>> -
> >>> -#define kern_hyp_va(v) (typeof(v))(__kern_hyp_va((unsigned long)(v)))
> >>> -
> >>>  static inline unsigned long __hyp_kern_va(unsigned long v)
> >>>  {
> >>> -	asm volatile(ALTERNATIVE("orr %0, %0, %1",
> >>> -				 "nop",
> >>> +	u64 mask;
> >>> +
> >>> +	asm volatile(ALTERNATIVE("mov %0, %1",
> >>> +				 "mov %0, %2",
> >>> +				 ARM64_HYP_OFFSET_LOW)
> >>> +		     : "=r" (mask)
> >>> +		     : "i" (~HYP_PAGE_OFFSET_HIGH_MASK),
> >>> +		       "i" (~HYP_PAGE_OFFSET_LOW_MASK));
> >>> +	asm volatile(ALTERNATIVE("nop",
> >>> +				 "mov %0, xzr",
> >>>  				 ARM64_HAS_VIRT_HOST_EXTN)
> >>> -		     : "+r" (v) : "i" (~HYP_PAGE_OFFSET_MASK));
> >>> -	return v;
> >>> +		     : "+r" (mask));
> >>> +	return v | mask;
> >>
> >> If mask is ~HYP_PAGE_OFFSET_LOW_MASK how can you be sure that setting
> >> bit (VA_BITS - 1) is always the right thing to do to generate a kernel
> >> address?
> > 
> > It has taken me a while, but I think I finally see what you mean. We
> > have no idea whether that bit was set or not.
> > 
> >> This is kind of what I asked before only now there's an extra bit not
> >> guaranteed by the architecture to be set for the kernel range, I
> >> think.
> > 
> > Yeah, I finally connected the couple of neurons left up there (that's
> > what remains after the whole brexit braindamage). This doesn't work (or
> > rather it only works sometimes). The good news is that I also realized we
> > don't need any of that crap.
> > 
> > The only case we currently use a HVA->KVA transformation is to pass the
> > panic string down to panic(), and we can perfectly prevent
> > __kvm_hyp_teardown from ever being evaluated as a HVA with a bit of
> > asm-foo. This allows us to get rid of this whole function.
> 
> Here's what I meant by this:
> 
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 437cfad..c19754d 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -299,9 +299,16 @@ static const char __hyp_panic_string[] = "HYP panic:\nPS:%08llx PC:%016llx ESR:%
>  
>  static void __hyp_text __hyp_call_panic_nvhe(u64 spsr, u64 elr, u64 par)
>  {
> -	unsigned long str_va = (unsigned long)__hyp_panic_string;
> +	unsigned long str_va;
>  
> -	__hyp_do_panic(hyp_kern_va(str_va),
> +	/*
> +	 * Force the panic string to be loaded from the literal pool,
> +	 * making sure it is a kernel address and not a PC-relative
> +	 * reference.
> +	 */
> +	asm volatile("ldr %0, =__hyp_panic_string" : "=r" (str_va));
> +
> +	__hyp_do_panic(str_va,
>  		       spsr,  elr,
>  		       read_sysreg(esr_el2),   read_sysreg_el2(far),
>  		       read_sysreg(hpfar_el2), par,
> 
> With that in place, we can entirely get rid of hyp_kern_va().
> 

Looks good to me, there's really no need to get that string pointer via
a hyp address.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 05/15] arm64: KVM: Refactor kern_hyp_va/hyp_kern_va to deal with multiple offsets
  2016-06-30 10:16         ` Marc Zyngier
@ 2016-06-30 10:42           ` Ard Biesheuvel
  -1 siblings, 0 replies; 90+ messages in thread
From: Ard Biesheuvel @ 2016-06-30 10:42 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Christoffer Dall, linux-arm-kernel, KVM devel mailing list, kvmarm

On 30 June 2016 at 12:16, Marc Zyngier <marc.zyngier@arm.com> wrote:
> On 30/06/16 10:22, Marc Zyngier wrote:
>> On 28/06/16 13:42, Christoffer Dall wrote:
>>> On Tue, Jun 07, 2016 at 11:58:25AM +0100, Marc Zyngier wrote:
>>>> As we move towards a selectable HYP VA range, it is obvious that
>>>> we don't want to test a variable to find out if we need to use
>>>> the bottom VA range, the top VA range, or use the address as is
>>>> (for VHE).
>>>>
>>>> Instead, we can expand our current helpers to generate the right
>>>> mask or nop with code patching. We default to using the top VA
>>>> space, with alternatives to switch to the bottom one or to nop
>>>> out the instructions.
>>>>
>>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>>> ---
>>>>  arch/arm64/include/asm/kvm_hyp.h | 27 ++++++++++++--------------
>>>>  arch/arm64/include/asm/kvm_mmu.h | 42 +++++++++++++++++++++++++++++++++++++---
>>>>  2 files changed, 51 insertions(+), 18 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
>>>> index 61d01a9..dd4904b 100644
>>>> --- a/arch/arm64/include/asm/kvm_hyp.h
>>>> +++ b/arch/arm64/include/asm/kvm_hyp.h
>>>> @@ -25,24 +25,21 @@
>>>>
>>>>  #define __hyp_text __section(.hyp.text) notrace
>>>>
>>>> -static inline unsigned long __kern_hyp_va(unsigned long v)
>>>> -{
>>>> -   asm volatile(ALTERNATIVE("and %0, %0, %1",
>>>> -                            "nop",
>>>> -                            ARM64_HAS_VIRT_HOST_EXTN)
>>>> -                : "+r" (v) : "i" (HYP_PAGE_OFFSET_MASK));
>>>> -   return v;
>>>> -}
>>>> -
>>>> -#define kern_hyp_va(v) (typeof(v))(__kern_hyp_va((unsigned long)(v)))
>>>> -
>>>>  static inline unsigned long __hyp_kern_va(unsigned long v)
>>>>  {
>>>> -   asm volatile(ALTERNATIVE("orr %0, %0, %1",
>>>> -                            "nop",
>>>> +   u64 mask;
>>>> +
>>>> +   asm volatile(ALTERNATIVE("mov %0, %1",
>>>> +                            "mov %0, %2",
>>>> +                            ARM64_HYP_OFFSET_LOW)
>>>> +                : "=r" (mask)
>>>> +                : "i" (~HYP_PAGE_OFFSET_HIGH_MASK),
>>>> +                  "i" (~HYP_PAGE_OFFSET_LOW_MASK));
>>>> +   asm volatile(ALTERNATIVE("nop",
>>>> +                            "mov %0, xzr",
>>>>                              ARM64_HAS_VIRT_HOST_EXTN)
>>>> -                : "+r" (v) : "i" (~HYP_PAGE_OFFSET_MASK));
>>>> -   return v;
>>>> +                : "+r" (mask));
>>>> +   return v | mask;
>>>
>>> If mask is ~HYP_PAGE_OFFSET_LOW_MASK how can you be sure that setting
>>> bit (VA_BITS - 1) is always the right thing to do to generate a kernel
>>> address?
>>
>> It has taken me a while, but I think I finally see what you mean. We
>> have no idea whether that bit was set or not.
>>
>>> This is kind of what I asked before only now there's an extra bit not
>>> guaranteed by the architecture to be set for the kernel range, I
>>> think.
>>
>> Yeah, I finally connected the couple of neurons left up there (that's
>> what remains after the whole brexit braindamage). This doesn't work (or
>> rather it only works sometimes). The good news is that I also realized we
>> don't need any of that crap.
>>
>> The only case we currently use a HVA->KVA transformation is to pass the
>> panic string down to panic(), and we can perfectly prevent
>> __kvm_hyp_teardown from ever being evaluated as a HVA with a bit of
>> asm-foo. This allows us to get rid of this whole function.
>
> Here's what I meant by this:
>
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 437cfad..c19754d 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -299,9 +299,16 @@ static const char __hyp_panic_string[] = "HYP panic:\nPS:%08llx PC:%016llx ESR:%
>
>  static void __hyp_text __hyp_call_panic_nvhe(u64 spsr, u64 elr, u64 par)
>  {
> -       unsigned long str_va = (unsigned long)__hyp_panic_string;
> +       unsigned long str_va;
>
> -       __hyp_do_panic(hyp_kern_va(str_va),
> +       /*
> +        * Force the panic string to be loaded from the literal pool,
> +        * making sure it is a kernel address and not a PC-relative
> +        * reference.
> +        */
> +       asm volatile("ldr %0, =__hyp_panic_string" : "=r" (str_va));
> +

Wouldn't it suffice to make  __hyp_panic_string a non-static pointer
to const char? That way, it will be statically initialized with a
kernel VA, and the external linkage forces the compiler to evaluate
its value at runtime.
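
As a sketch of what that might look like (illustrative only, not a
tested patch; the string contents are taken from the existing
declaration in switch.c):

const char *__hyp_panic_string =
	"HYP panic:\nPS:%08llx PC:%016llx ESR:%08llx\n"
	"FAR:%016llx HPFAR:%016llx PAR:%016llx\nVCPU:%p\n";

With external linkage the pointer is statically initialised with the
string's kernel VA, so the compiler has to load that value at runtime
instead of folding in a PC-relative address.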


> +       __hyp_do_panic(str_va,
>                        spsr,  elr,
>                        read_sysreg(esr_el2),   read_sysreg_el2(far),
>                        read_sysreg(hpfar_el2), par,
>
> With that in place, we can entirely get rid of hyp_kern_va().
>
>         M.
> --
> Jazz is not dead. It just smells funny...
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 05/15] arm64: KVM: Refactor kern_hyp_va/hyp_kern_va to deal with multiple offsets
  2016-06-30 10:42           ` Ard Biesheuvel
@ 2016-06-30 11:02             ` Marc Zyngier
  -1 siblings, 0 replies; 90+ messages in thread
From: Marc Zyngier @ 2016-06-30 11:02 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Christoffer Dall, linux-arm-kernel, KVM devel mailing list, kvmarm

On 30/06/16 11:42, Ard Biesheuvel wrote:
> On 30 June 2016 at 12:16, Marc Zyngier <marc.zyngier@arm.com> wrote:
>> On 30/06/16 10:22, Marc Zyngier wrote:
>>> On 28/06/16 13:42, Christoffer Dall wrote:
>>>> On Tue, Jun 07, 2016 at 11:58:25AM +0100, Marc Zyngier wrote:
>>>>> As we move towards a selectable HYP VA range, it is obvious that
>>>>> we don't want to test a variable to find out if we need to use
>>>>> the bottom VA range, the top VA range, or use the address as is
>>>>> (for VHE).
>>>>>
>>>>> Instead, we can expand our current helpers to generate the right
>>>>> mask or nop with code patching. We default to using the top VA
>>>>> space, with alternatives to switch to the bottom one or to nop
>>>>> out the instructions.
>>>>>
>>>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>>>> ---
>>>>>  arch/arm64/include/asm/kvm_hyp.h | 27 ++++++++++++--------------
>>>>>  arch/arm64/include/asm/kvm_mmu.h | 42 +++++++++++++++++++++++++++++++++++++---
>>>>>  2 files changed, 51 insertions(+), 18 deletions(-)
>>>>>
>>>>> diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
>>>>> index 61d01a9..dd4904b 100644
>>>>> --- a/arch/arm64/include/asm/kvm_hyp.h
>>>>> +++ b/arch/arm64/include/asm/kvm_hyp.h
>>>>> @@ -25,24 +25,21 @@
>>>>>
>>>>>  #define __hyp_text __section(.hyp.text) notrace
>>>>>
>>>>> -static inline unsigned long __kern_hyp_va(unsigned long v)
>>>>> -{
>>>>> -   asm volatile(ALTERNATIVE("and %0, %0, %1",
>>>>> -                            "nop",
>>>>> -                            ARM64_HAS_VIRT_HOST_EXTN)
>>>>> -                : "+r" (v) : "i" (HYP_PAGE_OFFSET_MASK));
>>>>> -   return v;
>>>>> -}
>>>>> -
>>>>> -#define kern_hyp_va(v) (typeof(v))(__kern_hyp_va((unsigned long)(v)))
>>>>> -
>>>>>  static inline unsigned long __hyp_kern_va(unsigned long v)
>>>>>  {
>>>>> -   asm volatile(ALTERNATIVE("orr %0, %0, %1",
>>>>> -                            "nop",
>>>>> +   u64 mask;
>>>>> +
>>>>> +   asm volatile(ALTERNATIVE("mov %0, %1",
>>>>> +                            "mov %0, %2",
>>>>> +                            ARM64_HYP_OFFSET_LOW)
>>>>> +                : "=r" (mask)
>>>>> +                : "i" (~HYP_PAGE_OFFSET_HIGH_MASK),
>>>>> +                  "i" (~HYP_PAGE_OFFSET_LOW_MASK));
>>>>> +   asm volatile(ALTERNATIVE("nop",
>>>>> +                            "mov %0, xzr",
>>>>>                              ARM64_HAS_VIRT_HOST_EXTN)
>>>>> -                : "+r" (v) : "i" (~HYP_PAGE_OFFSET_MASK));
>>>>> -   return v;
>>>>> +                : "+r" (mask));
>>>>> +   return v | mask;
>>>>
>>>> If mask is ~HYP_PAGE_OFFSET_LOW_MASK how can you be sure that setting
>>>> bit (VA_BITS - 1) is always the right thing to do to generate a kernel
>>>> address?
>>>
>>> It has taken me a while, but I think I finally see what you mean. We
>>> have no idea whether that bit was set or not.
>>>
>>>> This is kind of what I asked before only now there's an extra bit not
>>>> guaranteed by the architecture to be set for the kernel range, I
>>>> think.
>>>
>>> Yeah, I finally connected the couple of neurons left up there (that's
>>> what remains after the whole brexit braindamage). This doesn't work (or
>>> rather it only works sometimes). The good news is that I also realized we
>>> don't need any of that crap.
>>>
>>> The only case we currently use a HVA->KVA transformation is to pass the
>>> panic string down to panic(), and we can perfectly prevent
>>> __kvm_hyp_teardown from ever being evaluated as a HVA with a bit of
>>> asm-foo. This allows us to get rid of this whole function.
>>
>> Here's what I meant by this:
>>
>> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
>> index 437cfad..c19754d 100644
>> --- a/arch/arm64/kvm/hyp/switch.c
>> +++ b/arch/arm64/kvm/hyp/switch.c
>> @@ -299,9 +299,16 @@ static const char __hyp_panic_string[] = "HYP panic:\nPS:%08llx PC:%016llx ESR:%
>>
>>  static void __hyp_text __hyp_call_panic_nvhe(u64 spsr, u64 elr, u64 par)
>>  {
>> -       unsigned long str_va = (unsigned long)__hyp_panic_string;
>> +       unsigned long str_va;
>>
>> -       __hyp_do_panic(hyp_kern_va(str_va),
>> +       /*
>> +        * Force the panic string to be loaded from the literal pool,
>> +        * making sure it is a kernel address and not a PC-relative
>> +        * reference.
>> +        */
>> +       asm volatile("ldr %0, =__hyp_panic_string" : "=r" (str_va));
>> +
> 
> Wouldn't it suffice to make  __hyp_panic_string a non-static pointer
> to const char? That way, it will be statically initialized with a
> kernel VA, and the external linkage forces the compiler to evaluate
> its value at runtime.

Yup, that would work as well. The only nit is that the pointer needs to be
in the __hyp_text section, and my compiler is shouting at me with this:

  CC      arch/arm64/kvm/hyp/switch.o
arch/arm64/kvm/hyp/switch.c: In function '__hyp_call_panic_vhe':
arch/arm64/kvm/hyp/switch.c:298:13: error: __hyp_panic_string causes a section type conflict with __fpsimd_enabled_nvhe
 const char *__hyp_panic_string __section(.hyp.text) = "HYP panic:\nPS:%08llx PC:%016llx ESR:%08llx\nFAR:%016llx HPFAR:%016llx PAR:%016llx\nVCPU:%p\n";
             ^
arch/arm64/kvm/hyp/switch.c:22:24: note: '__fpsimd_enabled_nvhe' was declared here
 static bool __hyp_text __fpsimd_enabled_nvhe(void)

Any clue?

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 05/15] arm64: KVM: Refactor kern_hyp_va/hyp_kern_va to deal with multiple offsets
  2016-06-30 11:02             ` Marc Zyngier
@ 2016-06-30 11:10               ` Ard Biesheuvel
  -1 siblings, 0 replies; 90+ messages in thread
From: Ard Biesheuvel @ 2016-06-30 11:10 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvmarm, linux-arm-kernel, KVM devel mailing list

On 30 June 2016 at 13:02, Marc Zyngier <marc.zyngier@arm.com> wrote:
> On 30/06/16 11:42, Ard Biesheuvel wrote:
>> On 30 June 2016 at 12:16, Marc Zyngier <marc.zyngier@arm.com> wrote:
>>> On 30/06/16 10:22, Marc Zyngier wrote:
>>>> On 28/06/16 13:42, Christoffer Dall wrote:
>>>>> On Tue, Jun 07, 2016 at 11:58:25AM +0100, Marc Zyngier wrote:
>>>>>> As we move towards a selectable HYP VA range, it is obvious that
>>>>>> we don't want to test a variable to find out if we need to use
>>>>>> the bottom VA range, the top VA range, or use the address as is
>>>>>> (for VHE).
>>>>>>
>>>>>> Instead, we can expand our current helpers to generate the right
>>>>>> mask or nop with code patching. We default to using the top VA
>>>>>> space, with alternatives to switch to the bottom one or to nop
>>>>>> out the instructions.
>>>>>>
>>>>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>>>>> ---
>>>>>>  arch/arm64/include/asm/kvm_hyp.h | 27 ++++++++++++--------------
>>>>>>  arch/arm64/include/asm/kvm_mmu.h | 42 +++++++++++++++++++++++++++++++++++++---
>>>>>>  2 files changed, 51 insertions(+), 18 deletions(-)
>>>>>>
>>>>>> diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
>>>>>> index 61d01a9..dd4904b 100644
>>>>>> --- a/arch/arm64/include/asm/kvm_hyp.h
>>>>>> +++ b/arch/arm64/include/asm/kvm_hyp.h
>>>>>> @@ -25,24 +25,21 @@
>>>>>>
>>>>>>  #define __hyp_text __section(.hyp.text) notrace
>>>>>>
>>>>>> -static inline unsigned long __kern_hyp_va(unsigned long v)
>>>>>> -{
>>>>>> -   asm volatile(ALTERNATIVE("and %0, %0, %1",
>>>>>> -                            "nop",
>>>>>> -                            ARM64_HAS_VIRT_HOST_EXTN)
>>>>>> -                : "+r" (v) : "i" (HYP_PAGE_OFFSET_MASK));
>>>>>> -   return v;
>>>>>> -}
>>>>>> -
>>>>>> -#define kern_hyp_va(v) (typeof(v))(__kern_hyp_va((unsigned long)(v)))
>>>>>> -
>>>>>>  static inline unsigned long __hyp_kern_va(unsigned long v)
>>>>>>  {
>>>>>> -   asm volatile(ALTERNATIVE("orr %0, %0, %1",
>>>>>> -                            "nop",
>>>>>> +   u64 mask;
>>>>>> +
>>>>>> +   asm volatile(ALTERNATIVE("mov %0, %1",
>>>>>> +                            "mov %0, %2",
>>>>>> +                            ARM64_HYP_OFFSET_LOW)
>>>>>> +                : "=r" (mask)
>>>>>> +                : "i" (~HYP_PAGE_OFFSET_HIGH_MASK),
>>>>>> +                  "i" (~HYP_PAGE_OFFSET_LOW_MASK));
>>>>>> +   asm volatile(ALTERNATIVE("nop",
>>>>>> +                            "mov %0, xzr",
>>>>>>                              ARM64_HAS_VIRT_HOST_EXTN)
>>>>>> -                : "+r" (v) : "i" (~HYP_PAGE_OFFSET_MASK));
>>>>>> -   return v;
>>>>>> +                : "+r" (mask));
>>>>>> +   return v | mask;
>>>>>
>>>>> If mask is ~HYP_PAGE_OFFSET_LOW_MASK how can you be sure that setting
>>>>> bit (VA_BITS - 1) is always the right thing to do to generate a kernel
>>>>> address?
>>>>
>>>> It has taken me a while, but I think I finally see what you mean. We
>>>> have no idea whether that bit was set or not.
>>>>
>>>>> This is kind of what I asked before only now there's an extra bit not
>>>>> guaranteed by the architecture to be set for the kernel range, I
>>>>> think.
>>>>
>>>> Yeah, I finally connected the couple of neurons left up there (that's
>>>> what remains after the whole brexit braindamage). This doesn't work (or
>>>> rather it only works sometimes). The good news is that I also realized we
>>>> don't need any of that crap.
>>>>
>>>> The only case we currently use a HVA->KVA transformation is to pass the
>>>> panic string down to panic(), and we can perfectly prevent
>>>> __kvm_hyp_teardown from ever being evaluated as a HVA with a bit of
>>>> asm-foo. This allows us to get rid of this whole function.
>>>
>>> Here's what I meant by this:
>>>
>>> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
>>> index 437cfad..c19754d 100644
>>> --- a/arch/arm64/kvm/hyp/switch.c
>>> +++ b/arch/arm64/kvm/hyp/switch.c
>>> @@ -299,9 +299,16 @@ static const char __hyp_panic_string[] = "HYP panic:\nPS:%08llx PC:%016llx ESR:%
>>>
>>>  static void __hyp_text __hyp_call_panic_nvhe(u64 spsr, u64 elr, u64 par)
>>>  {
>>> -       unsigned long str_va = (unsigned long)__hyp_panic_string;
>>> +       unsigned long str_va;
>>>
>>> -       __hyp_do_panic(hyp_kern_va(str_va),
>>> +       /*
>>> +        * Force the panic string to be loaded from the literal pool,
>>> +        * making sure it is a kernel address and not a PC-relative
>>> +        * reference.
>>> +        */
>>> +       asm volatile("ldr %0, =__hyp_panic_string" : "=r" (str_va));
>>> +
>>
>> Wouldn't it suffice to make  __hyp_panic_string a non-static pointer
>> to const char? That way, it will be statically initialized with a
>> kernel VA, and the external linkage forces the compiler to evaluate
>> its value at runtime.
>
> Yup, that would work as well. The only nit is that the pointer needs to be
> in the __hyp_text section, and my compiler is shouting at me with this:
>
>   CC      arch/arm64/kvm/hyp/switch.o
> arch/arm64/kvm/hyp/switch.c: In function '__hyp_call_panic_vhe':
> arch/arm64/kvm/hyp/switch.c:298:13: error: __hyp_panic_string causes a section type conflict with __fpsimd_enabled_nvhe
>  const char *__hyp_panic_string __section(.hyp.text) = "HYP panic:\nPS:%08llx PC:%016llx ESR:%08llx\nFAR:%016llx HPFAR:%016llx PAR:%016llx\nVCPU:%p\n";
>              ^
> arch/arm64/kvm/hyp/switch.c:22:24: note: '__fpsimd_enabled_nvhe' was declared here
>  static bool __hyp_text __fpsimd_enabled_nvhe(void)
>
> Any clue?
>

The pointer is writable/non-exec and the code is readonly/exec, so it
makes sense for the compiler to complain about this. It needs to be
non-const, though, to prevent the compiler from short-circuiting the
evaluation, so the only solution would be to add a .hyp.data section
to the linker script, and put the __hyp_panic_string pointer in there.

Not worth the trouble, perhaps ...
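
For illustration, a minimal sketch of that .hyp.data alternative (nothing
below is from the series: the __hyp_data macro and the matching linker
script entry are assumptions made purely for the example):

/* Hypothetical: a writable HYP section, so the pointer no longer has to
 * live in the read-only/executable .hyp.text section it conflicts with.
 */
#define __hyp_data __section(.hyp.data)

/* Non-static pointer with external linkage: the compiler has to load its
 * value at runtime, and the stored value is the link-time (kernel) VA.
 */
const char *__hyp_panic_string __hyp_data =
	"HYP panic:\nPS:%08llx PC:%016llx ESR:%08llx\n"
	"FAR:%016llx HPFAR:%016llx PAR:%016llx\nVCPU:%p\n";

The linker script would also have to collect *(.hyp.data) into a new
output section, which is exactly the plumbing deemed not worth it above.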

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 05/15] arm64: KVM: Refactor kern_hyp_va/hyp_kern_va to deal with multiple offsets
  2016-06-30 11:10               ` Ard Biesheuvel
@ 2016-06-30 11:57                 ` Marc Zyngier
  -1 siblings, 0 replies; 90+ messages in thread
From: Marc Zyngier @ 2016-06-30 11:57 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: kvmarm, linux-arm-kernel, KVM devel mailing list

On 30/06/16 12:10, Ard Biesheuvel wrote:
> On 30 June 2016 at 13:02, Marc Zyngier <marc.zyngier@arm.com> wrote:
>> On 30/06/16 11:42, Ard Biesheuvel wrote:
>>> On 30 June 2016 at 12:16, Marc Zyngier <marc.zyngier@arm.com> wrote:
>>>> On 30/06/16 10:22, Marc Zyngier wrote:
>>>>> On 28/06/16 13:42, Christoffer Dall wrote:
>>>>>> On Tue, Jun 07, 2016 at 11:58:25AM +0100, Marc Zyngier wrote:
>>>>>>> As we move towards a selectable HYP VA range, it is obvious that
>>>>>>> we don't want to test a variable to find out if we need to use
>>>>>>> the bottom VA range, the top VA range, or use the address as is
>>>>>>> (for VHE).
>>>>>>>
>>>>>>> Instead, we can expand our current helpers to generate the right
>>>>>>> mask or nop with code patching. We default to using the top VA
>>>>>>> space, with alternatives to switch to the bottom one or to nop
>>>>>>> out the instructions.
>>>>>>>
>>>>>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>>>>>> ---
>>>>>>>  arch/arm64/include/asm/kvm_hyp.h | 27 ++++++++++++--------------
>>>>>>>  arch/arm64/include/asm/kvm_mmu.h | 42 +++++++++++++++++++++++++++++++++++++---
>>>>>>>  2 files changed, 51 insertions(+), 18 deletions(-)
>>>>>>>
>>>>>>> diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
>>>>>>> index 61d01a9..dd4904b 100644
>>>>>>> --- a/arch/arm64/include/asm/kvm_hyp.h
>>>>>>> +++ b/arch/arm64/include/asm/kvm_hyp.h
>>>>>>> @@ -25,24 +25,21 @@
>>>>>>>
>>>>>>>  #define __hyp_text __section(.hyp.text) notrace
>>>>>>>
>>>>>>> -static inline unsigned long __kern_hyp_va(unsigned long v)
>>>>>>> -{
>>>>>>> -   asm volatile(ALTERNATIVE("and %0, %0, %1",
>>>>>>> -                            "nop",
>>>>>>> -                            ARM64_HAS_VIRT_HOST_EXTN)
>>>>>>> -                : "+r" (v) : "i" (HYP_PAGE_OFFSET_MASK));
>>>>>>> -   return v;
>>>>>>> -}
>>>>>>> -
>>>>>>> -#define kern_hyp_va(v) (typeof(v))(__kern_hyp_va((unsigned long)(v)))
>>>>>>> -
>>>>>>>  static inline unsigned long __hyp_kern_va(unsigned long v)
>>>>>>>  {
>>>>>>> -   asm volatile(ALTERNATIVE("orr %0, %0, %1",
>>>>>>> -                            "nop",
>>>>>>> +   u64 mask;
>>>>>>> +
>>>>>>> +   asm volatile(ALTERNATIVE("mov %0, %1",
>>>>>>> +                            "mov %0, %2",
>>>>>>> +                            ARM64_HYP_OFFSET_LOW)
>>>>>>> +                : "=r" (mask)
>>>>>>> +                : "i" (~HYP_PAGE_OFFSET_HIGH_MASK),
>>>>>>> +                  "i" (~HYP_PAGE_OFFSET_LOW_MASK));
>>>>>>> +   asm volatile(ALTERNATIVE("nop",
>>>>>>> +                            "mov %0, xzr",
>>>>>>>                              ARM64_HAS_VIRT_HOST_EXTN)
>>>>>>> -                : "+r" (v) : "i" (~HYP_PAGE_OFFSET_MASK));
>>>>>>> -   return v;
>>>>>>> +                : "+r" (mask));
>>>>>>> +   return v | mask;
>>>>>>
>>>>>> If mask is ~HYP_PAGE_OFFSET_LOW_MASK how can you be sure that setting
>>>>>> bit (VA_BITS - 1) is always the right thing to do to generate a kernel
>>>>>> address?
>>>>>
>>>>> It has taken me a while, but I think I finally see what you mean. We
>>>>> have no idea whether that bit was set or not.
>>>>>
>>>>>> This is kind of what I asked before only now there's an extra bit not
>>>>>> guaranteed by the architecture to be set for the kernel range, I
>>>>>> think.
>>>>>
>>>>> Yeah, I finally connected the couple of neurons left up there (that's
>>>>> what remains after the whole brexit braindamage). This doesn't work (or
>>>>> rather it only works sometimes). The good news is that I also realized we
>>>>> don't need any of that crap.
>>>>>
>>>>> The only case where we currently use an HVA->KVA transformation is to pass
>>>>> the panic string down to panic(), and we can perfectly prevent
>>>>> __kvm_hyp_teardown from ever being evaluated as an HVA with a bit of
>>>>> asm-foo. This allows us to get rid of this whole function.
>>>>
>>>> Here's what I meant by this:
>>>>
>>>> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
>>>> index 437cfad..c19754d 100644
>>>> --- a/arch/arm64/kvm/hyp/switch.c
>>>> +++ b/arch/arm64/kvm/hyp/switch.c
>>>> @@ -299,9 +299,16 @@ static const char __hyp_panic_string[] = "HYP panic:\nPS:%08llx PC:%016llx ESR:%
>>>>
>>>>  static void __hyp_text __hyp_call_panic_nvhe(u64 spsr, u64 elr, u64 par)
>>>>  {
>>>> -       unsigned long str_va = (unsigned long)__hyp_panic_string;
>>>> +       unsigned long str_va;
>>>>
>>>> -       __hyp_do_panic(hyp_kern_va(str_va),
>>>> +       /*
>>>> +        * Force the panic string to be loaded from the literal pool,
>>>> +        * making sure it is a kernel address and not a PC-relative
>>>> +        * reference.
>>>> +        */
>>>> +       asm volatile("ldr %0, =__hyp_panic_string" : "=r" (str_va));
>>>> +
>>>
>>> Wouldn't it suffice to make  __hyp_panic_string a non-static pointer
>>> to const char? That way, it will be statically initialized with a
>>> kernel VA, and the external linkage forces the compiler to evaluate
>>> its value at runtime.
>>
>> Yup, that would work as well. The only nit is that the pointer needs to be
>> in the __hyp_text section, and my compiler is shouting at me with this:
>>
>>   CC      arch/arm64/kvm/hyp/switch.o
>> arch/arm64/kvm/hyp/switch.c: In function '__hyp_call_panic_vhe':
>> arch/arm64/kvm/hyp/switch.c:298:13: error: __hyp_panic_string causes a section type conflict with __fpsimd_enabled_nvhe
>>  const char *__hyp_panic_string __section(.hyp.text) = "HYP panic:\nPS:%08llx PC:%016llx ESR:%08llx\nFAR:%016llx HPFAR:%016llx PAR:%016llx\nVCPU:%p\n";
>>              ^
>> arch/arm64/kvm/hyp/switch.c:22:24: note: '__fpsimd_enabled_nvhe' was declared here
>>  static bool __hyp_text __fpsimd_enabled_nvhe(void)
>>
>> Any clue?
>>
> 
> The pointer is writable/non-exec and the code is readonly/exec, so it
> makes sense for the compiler to complain about this. It needs to be
> non-const, though, to prevent the compiler from short-circuiting the
> evaluation, so the only solution would be to add a .hyp.data section
> to the linker script, and put the __hyp_panic_string pointer in there.
> 
> Not worth the trouble, perhaps ...

Yeah. Slightly overkill for something that is not meant to be used...
I'll keep the asm hack for now, with a mental note to move this to a
.hyp.data section if we ever create one for other reasons.
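
To spell out why the literal-pool load is the form being kept, here is a
sketch (the helper name is made up; the mechanism simply restates the
comment carried in the diff above):

/* Sketch only: obtaining a kernel VA for the panic string while running
 * at EL2 under the HYP mapping.
 */
static void __hyp_text example_get_panic_string(unsigned long *str_va)
{
	/*
	 * Taking the address directly, as in
	 *
	 *	*str_va = (unsigned long)__hyp_panic_string;
	 *
	 * lets the compiler form it PC-relatively (adrp/add), which at
	 * EL2 is a HYP VA. The literal-pool load below instead yields
	 * the link-time kernel VA, which is what panic() needs at EL1.
	 */
	asm volatile("ldr %0, =__hyp_panic_string" : "=r" (*str_va));
}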

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 09/15] arm64: KVM: Simplify HYP init/teardown
  2016-06-28 21:31     ` Christoffer Dall
@ 2016-06-30 12:10       ` Marc Zyngier
  -1 siblings, 0 replies; 90+ messages in thread
From: Marc Zyngier @ 2016-06-30 12:10 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: linux-arm-kernel, kvm, kvmarm

On 28/06/16 22:31, Christoffer Dall wrote:
> On Tue, Jun 07, 2016 at 11:58:29AM +0100, Marc Zyngier wrote:
>> Now that we only have the "merged page tables" case to deal with,
>> there is a bunch of things we can simplify in the HYP code (both
>> at init and teardown time).
>>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  arch/arm64/include/asm/kvm_host.h | 12 ++------
>>  arch/arm64/kvm/hyp-init.S         | 61 +++++----------------------------------
>>  arch/arm64/kvm/hyp/entry.S        | 19 ------------
>>  arch/arm64/kvm/hyp/hyp-entry.S    | 15 ++++++++++
>>  arch/arm64/kvm/reset.c            | 11 -------
>>  5 files changed, 26 insertions(+), 92 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index 49095fc..88462c3 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -48,7 +48,6 @@
>>  int __attribute_const__ kvm_target_cpu(void);
>>  int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
>>  int kvm_arch_dev_ioctl_check_extension(long ext);
>> -unsigned long kvm_hyp_reset_entry(void);
>>  void __extended_idmap_trampoline(phys_addr_t boot_pgd, phys_addr_t idmap_start);
>>  
>>  struct kvm_arch {
>> @@ -357,19 +356,14 @@ static inline void __cpu_init_hyp_mode(phys_addr_t boot_pgd_ptr,
>>  	 * Call initialization code, and switch to the full blown
>>  	 * HYP code.
>>  	 */
>> -	__kvm_call_hyp((void *)boot_pgd_ptr, pgd_ptr,
>> -		       hyp_stack_ptr, vector_ptr);
>> +	__kvm_call_hyp((void *)pgd_ptr, hyp_stack_ptr, vector_ptr);
>>  }
>>  
>> +void __kvm_hyp_teardown(void);
>>  static inline void __cpu_reset_hyp_mode(phys_addr_t boot_pgd_ptr,
>>  					phys_addr_t phys_idmap_start)
>>  {
>> -	/*
>> -	 * Call reset code, and switch back to stub hyp vectors.
>> -	 * Uses __kvm_call_hyp() to avoid kaslr's kvm_ksym_ref() translation.
>> -	 */
>> -	__kvm_call_hyp((void *)kvm_hyp_reset_entry(),
>> -		       boot_pgd_ptr, phys_idmap_start);
>> +	kvm_call_hyp(__kvm_hyp_teardown, phys_idmap_start);
>>  }
>>  
>>  static inline void kvm_arch_hardware_unsetup(void) {}
>> diff --git a/arch/arm64/kvm/hyp-init.S b/arch/arm64/kvm/hyp-init.S
>> index a873a6d..6b29d3d 100644
>> --- a/arch/arm64/kvm/hyp-init.S
>> +++ b/arch/arm64/kvm/hyp-init.S
>> @@ -53,10 +53,9 @@ __invalid:
>>  	b	.
>>  
>>  	/*
>> -	 * x0: HYP boot pgd
>> -	 * x1: HYP pgd
>> -	 * x2: HYP stack
>> -	 * x3: HYP vectors
>> +	 * x0: HYP pgd
>> +	 * x1: HYP stack
>> +	 * x2: HYP vectors
>>  	 */
>>  __do_hyp_init:
>>  
>> @@ -110,71 +109,27 @@ __do_hyp_init:
>>  	msr	sctlr_el2, x4
>>  	isb
>>  
>> -	/* Skip the trampoline dance if we merged the boot and runtime PGDs */
>> -	cmp	x0, x1
>> -	b.eq	merged
>> -
>> -	/* MMU is now enabled. Get ready for the trampoline dance */
>> -	ldr	x4, =TRAMPOLINE_VA
>> -	adr	x5, target
>> -	bfi	x4, x5, #0, #PAGE_SHIFT
>> -	br	x4
>> -
>> -target: /* We're now in the trampoline code, switch page tables */
>> -	msr	ttbr0_el2, x1
>> -	isb
>> -
>> -	/* Invalidate the old TLBs */
>> -	tlbi	alle2
>> -	dsb	sy
>> -
>> -merged:
>>  	/* Set the stack and new vectors */
>> +	kern_hyp_va	x1
>> +	mov	sp, x1
>>  	kern_hyp_va	x2
>> -	mov	sp, x2
>> -	kern_hyp_va	x3
>> -	msr	vbar_el2, x3
>> +	msr	vbar_el2, x2
>>  
>>  	/* Hello, World! */
>>  	eret
>>  ENDPROC(__kvm_hyp_init)
>>  
>>  	/*
>> -	 * Reset kvm back to the hyp stub. This is the trampoline dance in
>> -	 * reverse. If kvm used an extended idmap, __extended_idmap_trampoline
>> -	 * calls this code directly in the idmap. In this case switching to the
>> -	 * boot tables is a no-op.
>> -	 *
>> -	 * x0: HYP boot pgd
>> -	 * x1: HYP phys_idmap_start
>> +	 * Reset kvm back to the hyp stub.
>>  	 */
>>  ENTRY(__kvm_hyp_reset)
>> -	/* We're in trampoline code in VA, switch back to boot page tables */
>> -	msr	ttbr0_el2, x0
>> -	isb
>> -
>> -	/* Ensure the PA branch doesn't find a stale tlb entry or stale code. */
>> -	ic	iallu
>> -	tlbi	alle2
>> -	dsb	sy
>> -	isb
>> -
>> -	/* Branch into PA space */
>> -	adr	x0, 1f
>> -	bfi	x1, x0, #0, #PAGE_SHIFT
>> -	br	x1
>> -
>>  	/* We're now in idmap, disable MMU */
>> -1:	mrs	x0, sctlr_el2
>> +	mrs	x0, sctlr_el2
>>  	ldr	x1, =SCTLR_ELx_FLAGS
>>  	bic	x0, x0, x1		// Clear SCTL_M and etc
>>  	msr	sctlr_el2, x0
>>  	isb
>>  
>> -	/* Invalidate the old TLBs */
>> -	tlbi	alle2
>> -	dsb	sy
>> -
> 
> why can we get rid of the above two lines now?

We never really needed them, as we always invalidate TLBs before enabling
the MMU. Simply disabling the MMU is enough here.

> 
>>  	/* Install stub vectors */
>>  	adr_l	x0, __hyp_stub_vectors
>>  	msr	vbar_el2, x0
>> diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
>> index 70254a6..ce9e5e5 100644
>> --- a/arch/arm64/kvm/hyp/entry.S
>> +++ b/arch/arm64/kvm/hyp/entry.S
>> @@ -164,22 +164,3 @@ alternative_endif
>>  
>>  	eret
>>  ENDPROC(__fpsimd_guest_restore)
>> -
>> -/*
>> - * When using the extended idmap, we don't have a trampoline page we can use
>> - * while we switch pages tables during __kvm_hyp_reset. Accessing the idmap
>> - * directly would be ideal, but if we're using the extended idmap then the
>> - * idmap is located above HYP_PAGE_OFFSET, and the address will be masked by
>> - * kvm_call_hyp using kern_hyp_va.
>> - *
>> - * x0: HYP boot pgd
>> - * x1: HYP phys_idmap_start
>> - */
>> -ENTRY(__extended_idmap_trampoline)
>> -	mov	x4, x1
>> -	adr_l	x3, __kvm_hyp_reset
>> -
>> -	/* insert __kvm_hyp_reset()s offset into phys_idmap_start */
>> -	bfi	x4, x3, #0, #PAGE_SHIFT
>> -	br	x4
>> -ENDPROC(__extended_idmap_trampoline)
>> diff --git a/arch/arm64/kvm/hyp/hyp-entry.S b/arch/arm64/kvm/hyp/hyp-entry.S
>> index 2d87f36..f6d9694 100644
>> --- a/arch/arm64/kvm/hyp/hyp-entry.S
>> +++ b/arch/arm64/kvm/hyp/hyp-entry.S
>> @@ -62,6 +62,21 @@ ENTRY(__vhe_hyp_call)
>>  	isb
>>  	ret
>>  ENDPROC(__vhe_hyp_call)
>> +
>> +/*
>> + * Compute the idmap address of __kvm_hyp_reset based on the idmap
>> + * start passed as a parameter, and jump there.
>> + *
>> + * x0: HYP phys_idmap_start
>> + */
>> +ENTRY(__kvm_hyp_teardown)
>> +	mov	x4, x0
>> +	adr_l	x3, __kvm_hyp_reset
>> +
>> +	/* insert __kvm_hyp_reset()s offset into phys_idmap_start */
>> +	bfi	x4, x3, #0, #PAGE_SHIFT
>> +	br	x4
>> +ENDPROC(__kvm_hyp_teardown)
>>  	
>>  el1_sync:				// Guest trapped into EL2
>>  	save_x0_to_x3
>> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
>> index d044ca3..deee1b1 100644
>> --- a/arch/arm64/kvm/reset.c
>> +++ b/arch/arm64/kvm/reset.c
>> @@ -132,14 +132,3 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
>>  	/* Reset timer */
>>  	return kvm_timer_vcpu_reset(vcpu, cpu_vtimer_irq);
>>  }
>> -
>> -unsigned long kvm_hyp_reset_entry(void)
>> -{
>> -	/*
>> -	 * KVM is running with merged page tables, which don't have the
>> -	 * trampoline page mapped. We know the idmap is still mapped,
>> -	 * but can't be called into directly. Use
>> -	 * __extended_idmap_trampoline to do the call.
>> -	 */
>> -	return (unsigned long)kvm_ksym_ref(__extended_idmap_trampoline);
>> -}
>> -- 
>> 2.1.4
>>
> 
> I'm not sure I understand why we needed the kvm_hyp_reset_entry
> indirection before, but the resulting code here looks good to me.

We still have an indirection; it is just a bit cleaner: we cannot call
directly into the reset function located in the idmap, as its address
is not strictly a kernel address, and the kern_hyp_va macro will mess
with the function address. This is why we go via:

__cpu_reset_hyp_mode -> __kvm_hyp_teardown -> __kvm_hyp_reset

__cpu_reset_hyp_mode is the arch-agnostic entry point,
__kvm_hyp_teardown is a normal HYP function, and __kvm_hyp_reset is the
real thing in the idmap page.
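
Put as a sketch (the wrapper below is illustrative only; the constraint it
encodes is the one stated in the comment removed from entry.S above):

/*
 * kvm_call_hyp() has its function argument translated with kern_hyp_va
 * at EL2, which is only valid for HYP-mapped kernel text. The reset code
 * in the idmap must therefore be reached via __kvm_hyp_teardown, which
 * rebuilds the target address from phys_idmap_start.
 */
static inline void example_reset_hyp(phys_addr_t phys_idmap_start)
{
	/* fine: __kvm_hyp_teardown is ordinary HYP text */
	kvm_call_hyp(__kvm_hyp_teardown, phys_idmap_start);

	/* would be mangled by kern_hyp_va, hence never done: */
	/* kvm_call_hyp(__kvm_hyp_reset, phys_idmap_start); */
}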

Is that clearer?

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 08/15] arm/arm64: KVM: Always have merged page tables
  2016-06-28 21:43     ` Christoffer Dall
@ 2016-06-30 12:27       ` Marc Zyngier
  -1 siblings, 0 replies; 90+ messages in thread
From: Marc Zyngier @ 2016-06-30 12:27 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: linux-arm-kernel, kvm, kvmarm

On 28/06/16 22:43, Christoffer Dall wrote:
> On Tue, Jun 07, 2016 at 11:58:28AM +0100, Marc Zyngier wrote:
>> We're in a position where we can now always have "merged" page
>> tables, where both the runtime mapping and the idmap coexist.
>>
>> This results in some code being removed, but there is more to come.
>>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  arch/arm/kvm/mmu.c     | 74 +++++++++++++++++++++++---------------------------
>>  arch/arm64/kvm/reset.c | 31 +++++----------------
>>  2 files changed, 41 insertions(+), 64 deletions(-)
>>
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index d6ecbf1..9a17e14 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -492,13 +492,12 @@ void free_boot_hyp_pgd(void)
>>  
>>  	if (boot_hyp_pgd) {
>>  		unmap_hyp_range(boot_hyp_pgd, hyp_idmap_start, PAGE_SIZE);
>> -		unmap_hyp_range(boot_hyp_pgd, TRAMPOLINE_VA, PAGE_SIZE);
>>  		free_pages((unsigned long)boot_hyp_pgd, hyp_pgd_order);
>>  		boot_hyp_pgd = NULL;
>>  	}
>>  
>>  	if (hyp_pgd)
>> -		unmap_hyp_range(hyp_pgd, TRAMPOLINE_VA, PAGE_SIZE);
>> +		unmap_hyp_range(hyp_pgd, hyp_idmap_start, PAGE_SIZE);
>>  
>>  	mutex_unlock(&kvm_hyp_pgd_mutex);
>>  }
>> @@ -1690,7 +1689,7 @@ phys_addr_t kvm_mmu_get_boot_httbr(void)
>>  	if (__kvm_cpu_uses_extended_idmap())
>>  		return virt_to_phys(merged_hyp_pgd);
>>  	else
>> -		return virt_to_phys(boot_hyp_pgd);
>> +		return virt_to_phys(hyp_pgd);
>>  }
>>  
>>  phys_addr_t kvm_get_idmap_vector(void)
>> @@ -1703,6 +1702,22 @@ phys_addr_t kvm_get_idmap_start(void)
>>  	return hyp_idmap_start;
>>  }
>>  
>> +static int kvm_map_idmap_text(pgd_t *pgd)
>> +{
>> +	int err;
>> +
>> +	/* Create the idmap in the boot page tables */
>> +	err = 	__create_hyp_mappings(pgd,
>> +				      hyp_idmap_start, hyp_idmap_end,
>> +				      __phys_to_pfn(hyp_idmap_start),
>> +				      PAGE_HYP);
>> +	if (err)
>> +		kvm_err("Failed to idmap %lx-%lx\n",
>> +			hyp_idmap_start, hyp_idmap_end);
>> +
>> +	return err;
>> +}
>> +
>>  int kvm_mmu_init(void)
>>  {
>>  	int err;
>> @@ -1718,27 +1733,25 @@ int kvm_mmu_init(void)
>>  	BUG_ON((hyp_idmap_start ^ (hyp_idmap_end - 1)) & PAGE_MASK);
>>  
>>  	hyp_pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, hyp_pgd_order);
>> -	boot_hyp_pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, hyp_pgd_order);
>> -
>> -	if (!hyp_pgd || !boot_hyp_pgd) {
>> +	if (!hyp_pgd) {
>>  		kvm_err("Hyp mode PGD not allocated\n");
>>  		err = -ENOMEM;
>>  		goto out;
>>  	}
>>  
>> -	/* Create the idmap in the boot page tables */
>> -	err = 	__create_hyp_mappings(boot_hyp_pgd,
>> -				      hyp_idmap_start, hyp_idmap_end,
>> -				      __phys_to_pfn(hyp_idmap_start),
>> -				      PAGE_HYP);
>> +	if (__kvm_cpu_uses_extended_idmap()) {
>> +		boot_hyp_pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
>> +							 hyp_pgd_order);
>> +		if (!boot_hyp_pgd) {
>> +			kvm_err("Hyp boot PGD not allocated\n");
>> +			err = -ENOMEM;
>> +			goto out;
>> +		}
>>  
>> -	if (err) {
>> -		kvm_err("Failed to idmap %lx-%lx\n",
>> -			hyp_idmap_start, hyp_idmap_end);
>> -		goto out;
>> -	}
>> +		err = kvm_map_idmap_text(boot_hyp_pgd);
>> +		if (err)
>> +			goto out;
>>  
>> -	if (__kvm_cpu_uses_extended_idmap()) {
>>  		merged_hyp_pgd = (pgd_t *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
>>  		if (!merged_hyp_pgd) {
>>  			kvm_err("Failed to allocate extra HYP pgd\n");
>> @@ -1746,29 +1759,10 @@ int kvm_mmu_init(void)
>>  		}
>>  		__kvm_extend_hypmap(boot_hyp_pgd, hyp_pgd, merged_hyp_pgd,
>>  				    hyp_idmap_start);
>> -		return 0;
>> -	}
>> -
>> -	/* Map the very same page at the trampoline VA */
>> -	err = 	__create_hyp_mappings(boot_hyp_pgd,
>> -				      TRAMPOLINE_VA, TRAMPOLINE_VA + PAGE_SIZE,
>> -				      __phys_to_pfn(hyp_idmap_start),
>> -				      PAGE_HYP);
>> -	if (err) {
>> -		kvm_err("Failed to map trampoline @%lx into boot HYP pgd\n",
>> -			TRAMPOLINE_VA);
>> -		goto out;
>> -	}
>> -
>> -	/* Map the same page again into the runtime page tables */
>> -	err = 	__create_hyp_mappings(hyp_pgd,
>> -				      TRAMPOLINE_VA, TRAMPOLINE_VA + PAGE_SIZE,
>> -				      __phys_to_pfn(hyp_idmap_start),
>> -				      PAGE_HYP);
>> -	if (err) {
>> -		kvm_err("Failed to map trampoline @%lx into runtime HYP pgd\n",
>> -			TRAMPOLINE_VA);
>> -		goto out;
>> +	} else {
>> +		err = kvm_map_idmap_text(hyp_pgd);
>> +		if (err)
>> +			goto out;
> 
> Something I'm not clear on:
> 
> how can we always have merged pgtables on 32-bit ARM at this point?
> 
> why is there not a potential conflict at this point in the series
> between the runtime hyp mappings and the idmaps?

The problem is slightly different. On 32bit, the HYP mapping completely
covers the whole address space, just like the kernel. But if your idmap
and the kernel VA overlap, you are in a very weird position. Actually,
you'll even have trouble booting into the kernel.

So my take on this is that it has already been solved by making sure the
kernel is loaded at a physical address that doesn't alias with the kernel
VA range. If it hasn't been, then they are probably not running mainline Linux.
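
As a concrete (made-up) example of that 32bit overlap, assuming the usual
3G/1G split and KERN_TO_HYP being an identity on 32bit:

/*
 * Illustration only - the addresses are invented.
 *
 * Kernel VAs (and the HYP runtime mappings that mirror them) start at
 * 0xC0000000, while the idmap page is mapped at its physical address.
 * A kernel sitting in RAM at or above physical 0xC0000000 therefore
 * produces an idmap VA that lands inside the runtime range:
 */
#define EXAMPLE_PAGE_OFFSET	0xC0000000UL	/* start of kernel VAs  */
#define EXAMPLE_IDMAP_START	0xC1200000UL	/* PA of the idmap page */

static inline int example_idmap_clashes(void)
{
	return EXAMPLE_IDMAP_START >= EXAMPLE_PAGE_OFFSET;	/* 1 -> unsafe */
}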

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 12/15] arm: KVM: Simplify HYP init
  2016-06-28 21:50     ` Christoffer Dall
@ 2016-06-30 12:31       ` Marc Zyngier
  -1 siblings, 0 replies; 90+ messages in thread
From: Marc Zyngier @ 2016-06-30 12:31 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, linux-arm-kernel, kvmarm

On 28/06/16 22:50, Christoffer Dall wrote:
> On Tue, Jun 07, 2016 at 11:58:32AM +0100, Marc Zyngier wrote:
>> Just like for arm64, we can now make the HYP setup a lot simpler,
>> and we can now initialise it in one go (instead of the two
>> phases we currently have).
>>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  arch/arm/include/asm/kvm_host.h | 15 +++++--------
>>  arch/arm/kvm/init.S             | 49 ++++++++---------------------------------
>>  2 files changed, 14 insertions(+), 50 deletions(-)
>>
>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>> index 020f4eb..eafbfd5 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -250,18 +250,13 @@ static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr,
>>  	 * code. The init code doesn't need to preserve these
>>  	 * registers as r0-r3 are already callee saved according to
>>  	 * the AAPCS.
>> -	 * Note that we slightly misuse the prototype by casing the
>> +	 * Note that we slightly misuse the prototype by casting the
>>  	 * stack pointer to a void *.
>> -	 *
>> -	 * We don't have enough registers to perform the full init in
>> -	 * one go.  Install the boot PGD first, and then install the
>> -	 * runtime PGD, stack pointer and vectors. The PGDs are always
>> -	 * passed as the third argument, in order to be passed into
>> -	 * r2-r3 to the init code (yes, this is compliant with the
>> -	 * PCS!).
>> -	 */
>>  
>> -	kvm_call_hyp(NULL, 0, boot_pgd_ptr);
>> +	 * The PGDs are always passed as the third argument, in order
>> +	 * to be passed into r2-r3 to the init code (yes, this is
>> +	 * compliant with the PCS!).
>> +	 */
>>  
>>  	kvm_call_hyp((void*)hyp_stack_ptr, vector_ptr, pgd_ptr);
>>  }
>> diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
>> index 1f9ae17..b82a99d 100644
>> --- a/arch/arm/kvm/init.S
>> +++ b/arch/arm/kvm/init.S
>> @@ -32,23 +32,13 @@
>>   *       r2,r3 = Hypervisor pgd pointer
>>   *
>>   * The init scenario is:
>> - * - We jump in HYP with four parameters: boot HYP pgd, runtime HYP pgd,
>> - *   runtime stack, runtime vectors
>> - * - Enable the MMU with the boot pgd
>> - * - Jump to a target into the trampoline page (remember, this is the same
>> - *   physical page!)
>> - * - Now switch to the runtime pgd (same VA, and still the same physical
>> - *   page!)
>> + * - We jump in HYP with 3 parameters: runtime HYP pgd, runtime stack,
>> + *   runtime vectors
> 
> probably just call this HYP pgd, HYP stack, and HYP vectors now

Yup.

>>   * - Invalidate TLBs
>>   * - Set stack and vectors
>> + * - Setup the page tables
>> + * - Enable the MMU
>>   * - Profit! (or eret, if you only care about the code).
>> - *
>> - * As we only have four registers available to pass parameters (and we
>> - * need six), we split the init in two phases:
>> - * - Phase 1: r0 = 0, r1 = 0, r2,r3 contain the boot PGD.
>> - *   Provides the basic HYP init, and enable the MMU.
>> - * - Phase 2: r0 = ToS, r1 = vectors, r2,r3 contain the runtime PGD.
>> - *   Switches to the runtime PGD, set stack and vectors.
>>   */
>>  
>>  	.text
>> @@ -68,8 +58,11 @@ __kvm_hyp_init:
>>  	W(b)	.
>>  
>>  __do_hyp_init:
>> -	cmp	r0, #0			@ We have a SP?
>> -	bne	phase2			@ Yes, second stage init
>> +	@ Set stack pointer
>> +	mov	sp, r0
>> +
>> +	@ Set HVBAR to point to the HYP vectors
>> +	mcr	p15, 4, r1, c12, c0, 0	@ HVBAR
>>  
>>  	@ Set the HTTBR to point to the hypervisor PGD pointer passed
>>  	mcrr	p15, 4, rr_lo_hi(r2, r3), c2
>> @@ -114,33 +107,9 @@ __do_hyp_init:
>>   THUMB(	ldr	r2, =(HSCTLR_M | HSCTLR_A | HSCTLR_TE)		)
>>  	orr	r1, r1, r2
>>  	orr	r0, r0, r1
>> -	isb
>>  	mcr	p15, 4, r0, c1, c0, 0	@ HSCR
>> -
>> -	@ End of init phase-1
>> -	eret
>> -
>> -phase2:
>> -	@ Set stack pointer
>> -	mov	sp, r0
>> -
>> -	@ Set HVBAR to point to the HYP vectors
>> -	mcr	p15, 4, r1, c12, c0, 0	@ HVBAR
>> -
>> -	@ Jump to the trampoline page
>> -	ldr	r0, =TRAMPOLINE_VA
>> -	adr	r1, target
>> -	bfi	r0, r1, #0, #PAGE_SHIFT
>> -	ret	r0
>> -
>> -target:	@ We're now in the trampoline code, switch page tables
>> -	mcrr	p15, 4, rr_lo_hi(r2, r3), c2
>>  	isb
>>  
>> -	@ Invalidate the old TLBs
>> -	mcr	p15, 4, r0, c8, c7, 0	@ TLBIALLH
>> -	dsb	ish
> 
> how are we sure there are no stale entries in the TLB beyond the idmap
> region?  Did we take care of this during kernel boot?  What about
> hotplug/suspend stuff?

This is done just before installing the page tables (not visible in this
patch). Hotplug/suspend goes through the same path as well, so it should
all be taken care of.
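
For completeness, the single-phase init this converges on is the one call
already visible in the kvm_host.h hunk above (sketched here with a made-up
wrapper name and simplified types):

/*
 * Everything is passed in one go: the stack pointer in r0, the vectors
 * in r1, and the 64-bit PGD pointer last so the AAPCS puts it in the
 * r2/r3 register pair.
 */
static inline void example_cpu_init_hyp_mode(phys_addr_t pgd_ptr,
					     unsigned long hyp_stack_ptr,
					     unsigned long vector_ptr)
{
	kvm_call_hyp((void *)hyp_stack_ptr, vector_ptr, pgd_ptr);
}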

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 90+ messages in thread

* [PATCH 12/15] arm: KVM: Simplify HYP init
@ 2016-06-30 12:31       ` Marc Zyngier
  0 siblings, 0 replies; 90+ messages in thread
From: Marc Zyngier @ 2016-06-30 12:31 UTC (permalink / raw)
  To: linux-arm-kernel

On 28/06/16 22:50, Christoffer Dall wrote:
> On Tue, Jun 07, 2016 at 11:58:32AM +0100, Marc Zyngier wrote:
>> Just like for arm64, we can now make the HYP setup a lot simpler,
>> and we can now initialise it in one go (instead of the two
>> phases we currently have).
>>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  arch/arm/include/asm/kvm_host.h | 15 +++++--------
>>  arch/arm/kvm/init.S             | 49 ++++++++---------------------------------
>>  2 files changed, 14 insertions(+), 50 deletions(-)
>>
>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>> index 020f4eb..eafbfd5 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -250,18 +250,13 @@ static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr,
>>  	 * code. The init code doesn't need to preserve these
>>  	 * registers as r0-r3 are already callee saved according to
>>  	 * the AAPCS.
>> -	 * Note that we slightly misuse the prototype by casing the
>> +	 * Note that we slightly misuse the prototype by casting the
>>  	 * stack pointer to a void *.
>> -	 *
>> -	 * We don't have enough registers to perform the full init in
>> -	 * one go.  Install the boot PGD first, and then install the
>> -	 * runtime PGD, stack pointer and vectors. The PGDs are always
>> -	 * passed as the third argument, in order to be passed into
>> -	 * r2-r3 to the init code (yes, this is compliant with the
>> -	 * PCS!).
>> -	 */
>>  
>> -	kvm_call_hyp(NULL, 0, boot_pgd_ptr);
>> +	 * The PGDs are always passed as the third argument, in order
>> +	 * to be passed into r2-r3 to the init code (yes, this is
>> +	 * compliant with the PCS!).
>> +	 */
>>  
>>  	kvm_call_hyp((void*)hyp_stack_ptr, vector_ptr, pgd_ptr);
>>  }
>> diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
>> index 1f9ae17..b82a99d 100644
>> --- a/arch/arm/kvm/init.S
>> +++ b/arch/arm/kvm/init.S
>> @@ -32,23 +32,13 @@
>>   *       r2,r3 = Hypervisor pgd pointer
>>   *
>>   * The init scenario is:
>> - * - We jump in HYP with four parameters: boot HYP pgd, runtime HYP pgd,
>> - *   runtime stack, runtime vectors
>> - * - Enable the MMU with the boot pgd
>> - * - Jump to a target into the trampoline page (remember, this is the same
>> - *   physical page!)
>> - * - Now switch to the runtime pgd (same VA, and still the same physical
>> - *   page!)
>> + * - We jump in HYP with 3 parameters: runtime HYP pgd, runtime stack,
>> + *   runtime vectors
> 
> probably just call this HYP pgd, HYP stack, and HYP vectors now

Yup.

>>   * - Invalidate TLBs
>>   * - Set stack and vectors
>> + * - Setup the page tables
>> + * - Enable the MMU
>>   * - Profit! (or eret, if you only care about the code).
>> - *
>> - * As we only have four registers available to pass parameters (and we
>> - * need six), we split the init in two phases:
>> - * - Phase 1: r0 = 0, r1 = 0, r2,r3 contain the boot PGD.
>> - *   Provides the basic HYP init, and enable the MMU.
>> - * - Phase 2: r0 = ToS, r1 = vectors, r2,r3 contain the runtime PGD.
>> - *   Switches to the runtime PGD, set stack and vectors.
>>   */
>>  
>>  	.text
>> @@ -68,8 +58,11 @@ __kvm_hyp_init:
>>  	W(b)	.
>>  
>>  __do_hyp_init:
>> -	cmp	r0, #0			@ We have a SP?
>> -	bne	phase2			@ Yes, second stage init
>> +	@ Set stack pointer
>> +	mov	sp, r0
>> +
>> +	@ Set HVBAR to point to the HYP vectors
>> +	mcr	p15, 4, r1, c12, c0, 0	@ HVBAR
>>  
>>  	@ Set the HTTBR to point to the hypervisor PGD pointer passed
>>  	mcrr	p15, 4, rr_lo_hi(r2, r3), c2
>> @@ -114,33 +107,9 @@ __do_hyp_init:
>>   THUMB(	ldr	r2, =(HSCTLR_M | HSCTLR_A | HSCTLR_TE)		)
>>  	orr	r1, r1, r2
>>  	orr	r0, r0, r1
>> -	isb
>>  	mcr	p15, 4, r0, c1, c0, 0	@ HSCR
>> -
>> -	@ End of init phase-1
>> -	eret
>> -
>> -phase2:
>> -	@ Set stack pointer
>> -	mov	sp, r0
>> -
>> -	@ Set HVBAR to point to the HYP vectors
>> -	mcr	p15, 4, r1, c12, c0, 0	@ HVBAR
>> -
>> -	@ Jump to the trampoline page
>> -	ldr	r0, =TRAMPOLINE_VA
>> -	adr	r1, target
>> -	bfi	r0, r1, #0, #PAGE_SHIFT
>> -	ret	r0
>> -
>> -target:	@ We're now in the trampoline code, switch page tables
>> -	mcrr	p15, 4, rr_lo_hi(r2, r3), c2
>>  	isb
>>  
>> -	@ Invalidate the old TLBs
>> -	mcr	p15, 4, r0, c8, c7, 0	@ TLBIALLH
>> -	dsb	ish
> 
> how are we sure there are no stale entries in the TLB beyond the idmap
> region?  Did we take care of this during kernel boot?  What about
> hotplug/suspend stuff?

This is done just before installing the page tables (not visible in this
patch). Hotplug/suspend goes through the same path as well, so it should
all be taken care of.
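
To make that concrete, here is a rough C-level sketch of the point being
made (illustration only; the helper names are made up, and the real code
is the CP15 assembly in arch/arm/kvm/init.S):

	/* Stand-ins for the CP15 sequences in init.S -- illustration only. */
	static void invalidate_hyp_tlbs(void) { }		/* TLBIALLH + DSB */
	static void install_hyp_pgd(unsigned long long pgd) { }	/* write HTTBR    */
	static void enable_hyp_mmu(void) { }			/* set HSCTLR.M   */

	/* Stale TLB entries are flushed before the new tables can ever
	 * be walked, so nothing left over from the bootloader survives. */
	static void hyp_enable_sketch(unsigned long long pgd)
	{
		invalidate_hyp_tlbs();
		install_hyp_pgd(pgd);
		enable_hyp_mmu();
	}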

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 15/15] arm/arm64: KVM: Check that IDMAP doesn't intersect with VA range
  2016-06-28 22:01     ` Christoffer Dall
@ 2016-06-30 12:51       ` Marc Zyngier
  -1 siblings, 0 replies; 90+ messages in thread
From: Marc Zyngier @ 2016-06-30 12:51 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: linux-arm-kernel, kvm, kvmarm

On 28/06/16 23:01, Christoffer Dall wrote:
> On Tue, Jun 07, 2016 at 11:58:35AM +0100, Marc Zyngier wrote:
>> This is more of a safety measure than anything else: If we end-up
>> with an idmap page that intersect with the range picked for the
>> the HYP VA space, abort the KVM setup, as it is unsafe to go
>> further.
>>
>> I cannot imagine it happening on 64bit (we have a mechanism to
>> work around it), but could potentially occur on a 32bit system with
>> the kernel loaded high enough in memory so that in conflicts with
>> the kernel VA.
> 
> ah, you had a patch for this...
> 
> does this even work for enabling the MMU during kernel boot or how do
> they deal with it?

As I said in a reply to an earlier patch, this must already be taken care
of by the bootloader, which has to make sure that the kernel's physical
memory does not alias with the kernel VAs. Pretty scary.
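
As a rough illustration of the constraint (made-up example values, not
kernel code): on 32-bit the idmap page is mapped at a VA equal to its PA,
so a kernel loaded high enough in physical memory would put that page
inside the kernel VA window.

	#include <stdbool.h>

	#define EXAMPLE_PAGE_OFFSET	0xc0000000UL	/* typical 3G/1G split */

	static bool idmap_clashes_with_kernel_va(unsigned long idmap_pa)
	{
		/* VA == PA for the idmap page, so it has to stay below
		 * the kernel (and hence HYP) VA window. */
		return idmap_pa >= EXAMPLE_PAGE_OFFSET;
	}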

> 
>>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  arch/arm/kvm/mmu.c | 15 +++++++++++++++
>>  1 file changed, 15 insertions(+)
>>
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 46b8604..819517d 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -1708,6 +1708,21 @@ int kvm_mmu_init(void)
>>  	 */
>>  	BUG_ON((hyp_idmap_start ^ (hyp_idmap_end - 1)) & PAGE_MASK);
>>  
>> +	kvm_info("IDMAP page: %lx\n", hyp_idmap_start);
>> +	kvm_info("HYP VA range: %lx:%lx\n",
>> +		 KERN_TO_HYP(PAGE_OFFSET), KERN_TO_HYP(~0UL));
>> +
>> +	if (hyp_idmap_start >= KERN_TO_HYP(PAGE_OFFSET) &&
>> +	    hyp_idmap_start <  KERN_TO_HYP(~0UL)) {
> 
> why is the second part of this clause necessary?

We want to check that our clash-avoiding mechanism works.

Since we're translating the kernel VA downwards (by clearing the top
bits), we can definitely end up in a situation where the idmap is above
the translated "top of the kernel" (that's the "low mask" option). So it
is definitely worth checking that we really don't get any aliasing. This
has been quite useful when debugging this code.
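
A sketch of what the check guards against (illustrative C, not the actual
kernel macros; the mask stands in for the runtime-patched HYP offset):

	#include <stdbool.h>

	static unsigned long kern_to_hyp(unsigned long kva, unsigned long mask)
	{
		return kva & mask;	/* translate downwards by clearing top bits */
	}

	static bool idmap_intersects_hyp_range(unsigned long idmap_start,
					       unsigned long page_offset,
					       unsigned long mask)
	{
		unsigned long start = kern_to_hyp(page_offset, mask);
		unsigned long end   = kern_to_hyp(~0UL, mask);

		/* The second bound matters: with the low mask the idmap
		 * may legitimately sit above the translated top of the
		 * kernel range, in which case there is no clash. */
		return idmap_start >= start && idmap_start < end;
	}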

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 15/15] arm/arm64: KVM: Check that IDMAP doesn't intersect with VA range
  2016-06-30 12:51       ` Marc Zyngier
@ 2016-06-30 13:27         ` Christoffer Dall
  -1 siblings, 0 replies; 90+ messages in thread
From: Christoffer Dall @ 2016-06-30 13:27 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, linux-arm-kernel, kvmarm

On Thu, Jun 30, 2016 at 01:51:00PM +0100, Marc Zyngier wrote:
> On 28/06/16 23:01, Christoffer Dall wrote:
> > On Tue, Jun 07, 2016 at 11:58:35AM +0100, Marc Zyngier wrote:
> >> This is more of a safety measure than anything else: If we end-up
> >> with an idmap page that intersect with the range picked for the
> >> the HYP VA space, abort the KVM setup, as it is unsafe to go
> >> further.
> >>
> >> I cannot imagine it happening on 64bit (we have a mechanism to
> >> work around it), but could potentially occur on a 32bit system with
> >> the kernel loaded high enough in memory so that in conflicts with
> >> the kernel VA.
> > 
> > ah, you had a patch for this...
> > 
> > does this even work for enabling the MMU during kernel boot or how do
> > they deal with it?
> 
> As I said in a reply to an earlier patch, this must already taken care
> of by the bootloader, making sure that the kernel physical memory does
> not alias with the VAs. Pretty scary.
> 
> > 
> >>
> >> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> >> ---
> >>  arch/arm/kvm/mmu.c | 15 +++++++++++++++
> >>  1 file changed, 15 insertions(+)
> >>
> >> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> >> index 46b8604..819517d 100644
> >> --- a/arch/arm/kvm/mmu.c
> >> +++ b/arch/arm/kvm/mmu.c
> >> @@ -1708,6 +1708,21 @@ int kvm_mmu_init(void)
> >>  	 */
> >>  	BUG_ON((hyp_idmap_start ^ (hyp_idmap_end - 1)) & PAGE_MASK);
> >>  
> >> +	kvm_info("IDMAP page: %lx\n", hyp_idmap_start);
> >> +	kvm_info("HYP VA range: %lx:%lx\n",
> >> +		 KERN_TO_HYP(PAGE_OFFSET), KERN_TO_HYP(~0UL));
> >> +
> >> +	if (hyp_idmap_start >= KERN_TO_HYP(PAGE_OFFSET) &&
> >> +	    hyp_idmap_start <  KERN_TO_HYP(~0UL)) {
> > 
> > why is the second part of this clause necessary?
> 
> We want to check that our clash avoiding mechanism works.
> 
> Since we're translating the kernel VA downwards (by clearing the top
> bits), we can definitely end-up in a situation where the idmap is above
> the translated "top of the kernel" (that's the "low mask" option). So it
> is definitely worth checking that we really don't get any aliasing. This
> has been quite useful when debugging this code.
> 
Right, I thought about this only in the context of 32-bit and got
confused.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 08/15] arm/arm64: KVM: Always have merged page tables
  2016-06-30 12:27       ` Marc Zyngier
@ 2016-06-30 13:28         ` Christoffer Dall
  -1 siblings, 0 replies; 90+ messages in thread
From: Christoffer Dall @ 2016-06-30 13:28 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, linux-arm-kernel, kvmarm

On Thu, Jun 30, 2016 at 01:27:05PM +0100, Marc Zyngier wrote:
> On 28/06/16 22:43, Christoffer Dall wrote:
> > On Tue, Jun 07, 2016 at 11:58:28AM +0100, Marc Zyngier wrote:
> >> We're in a position where we can now always have "merged" page
> >> tables, where both the runtime mapping and the idmap coexist.
> >>
> >> This results in some code being removed, but there is more to come.
> >>
> >> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> >> ---
> >>  arch/arm/kvm/mmu.c     | 74 +++++++++++++++++++++++---------------------------
> >>  arch/arm64/kvm/reset.c | 31 +++++----------------
> >>  2 files changed, 41 insertions(+), 64 deletions(-)
> >>
> >> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> >> index d6ecbf1..9a17e14 100644
> >> --- a/arch/arm/kvm/mmu.c
> >> +++ b/arch/arm/kvm/mmu.c
> >> @@ -492,13 +492,12 @@ void free_boot_hyp_pgd(void)
> >>  
> >>  	if (boot_hyp_pgd) {
> >>  		unmap_hyp_range(boot_hyp_pgd, hyp_idmap_start, PAGE_SIZE);
> >> -		unmap_hyp_range(boot_hyp_pgd, TRAMPOLINE_VA, PAGE_SIZE);
> >>  		free_pages((unsigned long)boot_hyp_pgd, hyp_pgd_order);
> >>  		boot_hyp_pgd = NULL;
> >>  	}
> >>  
> >>  	if (hyp_pgd)
> >> -		unmap_hyp_range(hyp_pgd, TRAMPOLINE_VA, PAGE_SIZE);
> >> +		unmap_hyp_range(hyp_pgd, hyp_idmap_start, PAGE_SIZE);
> >>  
> >>  	mutex_unlock(&kvm_hyp_pgd_mutex);
> >>  }
> >> @@ -1690,7 +1689,7 @@ phys_addr_t kvm_mmu_get_boot_httbr(void)
> >>  	if (__kvm_cpu_uses_extended_idmap())
> >>  		return virt_to_phys(merged_hyp_pgd);
> >>  	else
> >> -		return virt_to_phys(boot_hyp_pgd);
> >> +		return virt_to_phys(hyp_pgd);
> >>  }
> >>  
> >>  phys_addr_t kvm_get_idmap_vector(void)
> >> @@ -1703,6 +1702,22 @@ phys_addr_t kvm_get_idmap_start(void)
> >>  	return hyp_idmap_start;
> >>  }
> >>  
> >> +static int kvm_map_idmap_text(pgd_t *pgd)
> >> +{
> >> +	int err;
> >> +
> >> +	/* Create the idmap in the boot page tables */
> >> +	err = 	__create_hyp_mappings(pgd,
> >> +				      hyp_idmap_start, hyp_idmap_end,
> >> +				      __phys_to_pfn(hyp_idmap_start),
> >> +				      PAGE_HYP);
> >> +	if (err)
> >> +		kvm_err("Failed to idmap %lx-%lx\n",
> >> +			hyp_idmap_start, hyp_idmap_end);
> >> +
> >> +	return err;
> >> +}
> >> +
> >>  int kvm_mmu_init(void)
> >>  {
> >>  	int err;
> >> @@ -1718,27 +1733,25 @@ int kvm_mmu_init(void)
> >>  	BUG_ON((hyp_idmap_start ^ (hyp_idmap_end - 1)) & PAGE_MASK);
> >>  
> >>  	hyp_pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, hyp_pgd_order);
> >> -	boot_hyp_pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, hyp_pgd_order);
> >> -
> >> -	if (!hyp_pgd || !boot_hyp_pgd) {
> >> +	if (!hyp_pgd) {
> >>  		kvm_err("Hyp mode PGD not allocated\n");
> >>  		err = -ENOMEM;
> >>  		goto out;
> >>  	}
> >>  
> >> -	/* Create the idmap in the boot page tables */
> >> -	err = 	__create_hyp_mappings(boot_hyp_pgd,
> >> -				      hyp_idmap_start, hyp_idmap_end,
> >> -				      __phys_to_pfn(hyp_idmap_start),
> >> -				      PAGE_HYP);
> >> +	if (__kvm_cpu_uses_extended_idmap()) {
> >> +		boot_hyp_pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
> >> +							 hyp_pgd_order);
> >> +		if (!boot_hyp_pgd) {
> >> +			kvm_err("Hyp boot PGD not allocated\n");
> >> +			err = -ENOMEM;
> >> +			goto out;
> >> +		}
> >>  
> >> -	if (err) {
> >> -		kvm_err("Failed to idmap %lx-%lx\n",
> >> -			hyp_idmap_start, hyp_idmap_end);
> >> -		goto out;
> >> -	}
> >> +		err = kvm_map_idmap_text(boot_hyp_pgd);
> >> +		if (err)
> >> +			goto out;
> >>  
> >> -	if (__kvm_cpu_uses_extended_idmap()) {
> >>  		merged_hyp_pgd = (pgd_t *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
> >>  		if (!merged_hyp_pgd) {
> >>  			kvm_err("Failed to allocate extra HYP pgd\n");
> >> @@ -1746,29 +1759,10 @@ int kvm_mmu_init(void)
> >>  		}
> >>  		__kvm_extend_hypmap(boot_hyp_pgd, hyp_pgd, merged_hyp_pgd,
> >>  				    hyp_idmap_start);
> >> -		return 0;
> >> -	}
> >> -
> >> -	/* Map the very same page at the trampoline VA */
> >> -	err = 	__create_hyp_mappings(boot_hyp_pgd,
> >> -				      TRAMPOLINE_VA, TRAMPOLINE_VA + PAGE_SIZE,
> >> -				      __phys_to_pfn(hyp_idmap_start),
> >> -				      PAGE_HYP);
> >> -	if (err) {
> >> -		kvm_err("Failed to map trampoline @%lx into boot HYP pgd\n",
> >> -			TRAMPOLINE_VA);
> >> -		goto out;
> >> -	}
> >> -
> >> -	/* Map the same page again into the runtime page tables */
> >> -	err = 	__create_hyp_mappings(hyp_pgd,
> >> -				      TRAMPOLINE_VA, TRAMPOLINE_VA + PAGE_SIZE,
> >> -				      __phys_to_pfn(hyp_idmap_start),
> >> -				      PAGE_HYP);
> >> -	if (err) {
> >> -		kvm_err("Failed to map trampoline @%lx into runtime HYP pgd\n",
> >> -			TRAMPOLINE_VA);
> >> -		goto out;
> >> +	} else {
> >> +		err = kvm_map_idmap_text(hyp_pgd);
> >> +		if (err)
> >> +			goto out;
> > 
> > Something I'm not clear on:
> > 
> > how can we always have merged pgtables on 32-bit ARM at this point?
> > 
> > why is there not a potential conflict at this point in the series
> > between the runtime hyp mappings and the idmaps?
> 
> The problem is slightly different. On 32bit, the HYP mapping completely
> covers the whole address space, just like the kernel. But if your idmap
> and the kernel VA overlap, you are in a very weird position. Actually,
> you'll even have trouble booting into the kernel.
> 
> So my take on this is that it has already been solved by making sure the
> kernel is loaded at an address that won't alias with the kernel VA. If
> it hasn't, then they are probably not running mainline Linux.
> 
Ok, thanks for the explanation!

-Christoffer

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 09/15] arm64: KVM: Simplify HYP init/teardown
  2016-06-30 12:10       ` Marc Zyngier
@ 2016-06-30 13:31         ` Christoffer Dall
  -1 siblings, 0 replies; 90+ messages in thread
From: Christoffer Dall @ 2016-06-30 13:31 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, linux-arm-kernel, kvmarm

On Thu, Jun 30, 2016 at 01:10:33PM +0100, Marc Zyngier wrote:
> On 28/06/16 22:31, Christoffer Dall wrote:
> > On Tue, Jun 07, 2016 at 11:58:29AM +0100, Marc Zyngier wrote:
> >> Now that we only have the "merged page tables" case to deal with,
> >> there is a bunch of things we can simplify in the HYP code (both
> >> at init and teardown time).
> >>
> >> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> >> ---
> >>  arch/arm64/include/asm/kvm_host.h | 12 ++------
> >>  arch/arm64/kvm/hyp-init.S         | 61 +++++----------------------------------
> >>  arch/arm64/kvm/hyp/entry.S        | 19 ------------
> >>  arch/arm64/kvm/hyp/hyp-entry.S    | 15 ++++++++++
> >>  arch/arm64/kvm/reset.c            | 11 -------
> >>  5 files changed, 26 insertions(+), 92 deletions(-)
> >>
> >> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> >> index 49095fc..88462c3 100644
> >> --- a/arch/arm64/include/asm/kvm_host.h
> >> +++ b/arch/arm64/include/asm/kvm_host.h
> >> @@ -48,7 +48,6 @@
> >>  int __attribute_const__ kvm_target_cpu(void);
> >>  int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
> >>  int kvm_arch_dev_ioctl_check_extension(long ext);
> >> -unsigned long kvm_hyp_reset_entry(void);
> >>  void __extended_idmap_trampoline(phys_addr_t boot_pgd, phys_addr_t idmap_start);
> >>  
> >>  struct kvm_arch {
> >> @@ -357,19 +356,14 @@ static inline void __cpu_init_hyp_mode(phys_addr_t boot_pgd_ptr,
> >>  	 * Call initialization code, and switch to the full blown
> >>  	 * HYP code.
> >>  	 */
> >> -	__kvm_call_hyp((void *)boot_pgd_ptr, pgd_ptr,
> >> -		       hyp_stack_ptr, vector_ptr);
> >> +	__kvm_call_hyp((void *)pgd_ptr, hyp_stack_ptr, vector_ptr);
> >>  }
> >>  
> >> +void __kvm_hyp_teardown(void);
> >>  static inline void __cpu_reset_hyp_mode(phys_addr_t boot_pgd_ptr,
> >>  					phys_addr_t phys_idmap_start)
> >>  {
> >> -	/*
> >> -	 * Call reset code, and switch back to stub hyp vectors.
> >> -	 * Uses __kvm_call_hyp() to avoid kaslr's kvm_ksym_ref() translation.
> >> -	 */
> >> -	__kvm_call_hyp((void *)kvm_hyp_reset_entry(),
> >> -		       boot_pgd_ptr, phys_idmap_start);
> >> +	kvm_call_hyp(__kvm_hyp_teardown, phys_idmap_start);
> >>  }
> >>  
> >>  static inline void kvm_arch_hardware_unsetup(void) {}
> >> diff --git a/arch/arm64/kvm/hyp-init.S b/arch/arm64/kvm/hyp-init.S
> >> index a873a6d..6b29d3d 100644
> >> --- a/arch/arm64/kvm/hyp-init.S
> >> +++ b/arch/arm64/kvm/hyp-init.S
> >> @@ -53,10 +53,9 @@ __invalid:
> >>  	b	.
> >>  
> >>  	/*
> >> -	 * x0: HYP boot pgd
> >> -	 * x1: HYP pgd
> >> -	 * x2: HYP stack
> >> -	 * x3: HYP vectors
> >> +	 * x0: HYP pgd
> >> +	 * x1: HYP stack
> >> +	 * x2: HYP vectors
> >>  	 */
> >>  __do_hyp_init:
> >>  
> >> @@ -110,71 +109,27 @@ __do_hyp_init:
> >>  	msr	sctlr_el2, x4
> >>  	isb
> >>  
> >> -	/* Skip the trampoline dance if we merged the boot and runtime PGDs */
> >> -	cmp	x0, x1
> >> -	b.eq	merged
> >> -
> >> -	/* MMU is now enabled. Get ready for the trampoline dance */
> >> -	ldr	x4, =TRAMPOLINE_VA
> >> -	adr	x5, target
> >> -	bfi	x4, x5, #0, #PAGE_SHIFT
> >> -	br	x4
> >> -
> >> -target: /* We're now in the trampoline code, switch page tables */
> >> -	msr	ttbr0_el2, x1
> >> -	isb
> >> -
> >> -	/* Invalidate the old TLBs */
> >> -	tlbi	alle2
> >> -	dsb	sy
> >> -
> >> -merged:
> >>  	/* Set the stack and new vectors */
> >> +	kern_hyp_va	x1
> >> +	mov	sp, x1
> >>  	kern_hyp_va	x2
> >> -	mov	sp, x2
> >> -	kern_hyp_va	x3
> >> -	msr	vbar_el2, x3
> >> +	msr	vbar_el2, x2
> >>  
> >>  	/* Hello, World! */
> >>  	eret
> >>  ENDPROC(__kvm_hyp_init)
> >>  
> >>  	/*
> >> -	 * Reset kvm back to the hyp stub. This is the trampoline dance in
> >> -	 * reverse. If kvm used an extended idmap, __extended_idmap_trampoline
> >> -	 * calls this code directly in the idmap. In this case switching to the
> >> -	 * boot tables is a no-op.
> >> -	 *
> >> -	 * x0: HYP boot pgd
> >> -	 * x1: HYP phys_idmap_start
> >> +	 * Reset kvm back to the hyp stub.
> >>  	 */
> >>  ENTRY(__kvm_hyp_reset)
> >> -	/* We're in trampoline code in VA, switch back to boot page tables */
> >> -	msr	ttbr0_el2, x0
> >> -	isb
> >> -
> >> -	/* Ensure the PA branch doesn't find a stale tlb entry or stale code. */
> >> -	ic	iallu
> >> -	tlbi	alle2
> >> -	dsb	sy
> >> -	isb
> >> -
> >> -	/* Branch into PA space */
> >> -	adr	x0, 1f
> >> -	bfi	x1, x0, #0, #PAGE_SHIFT
> >> -	br	x1
> >> -
> >>  	/* We're now in idmap, disable MMU */
> >> -1:	mrs	x0, sctlr_el2
> >> +	mrs	x0, sctlr_el2
> >>  	ldr	x1, =SCTLR_ELx_FLAGS
> >>  	bic	x0, x0, x1		// Clear SCTL_M and etc
> >>  	msr	sctlr_el2, x0
> >>  	isb
> >>  
> >> -	/* Invalidate the old TLBs */
> >> -	tlbi	alle2
> >> -	dsb	sy
> >> -
> > 
> > why can we get rid of the above two lines now?
> 
> We never really needed them, as we always invalid TLBs before enabling
> the MMU. Simply disabling the MMU is enough here.
> 
> > 
> >>  	/* Install stub vectors */
> >>  	adr_l	x0, __hyp_stub_vectors
> >>  	msr	vbar_el2, x0
> >> diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
> >> index 70254a6..ce9e5e5 100644
> >> --- a/arch/arm64/kvm/hyp/entry.S
> >> +++ b/arch/arm64/kvm/hyp/entry.S
> >> @@ -164,22 +164,3 @@ alternative_endif
> >>  
> >>  	eret
> >>  ENDPROC(__fpsimd_guest_restore)
> >> -
> >> -/*
> >> - * When using the extended idmap, we don't have a trampoline page we can use
> >> - * while we switch pages tables during __kvm_hyp_reset. Accessing the idmap
> >> - * directly would be ideal, but if we're using the extended idmap then the
> >> - * idmap is located above HYP_PAGE_OFFSET, and the address will be masked by
> >> - * kvm_call_hyp using kern_hyp_va.
> >> - *
> >> - * x0: HYP boot pgd
> >> - * x1: HYP phys_idmap_start
> >> - */
> >> -ENTRY(__extended_idmap_trampoline)
> >> -	mov	x4, x1
> >> -	adr_l	x3, __kvm_hyp_reset
> >> -
> >> -	/* insert __kvm_hyp_reset()s offset into phys_idmap_start */
> >> -	bfi	x4, x3, #0, #PAGE_SHIFT
> >> -	br	x4
> >> -ENDPROC(__extended_idmap_trampoline)
> >> diff --git a/arch/arm64/kvm/hyp/hyp-entry.S b/arch/arm64/kvm/hyp/hyp-entry.S
> >> index 2d87f36..f6d9694 100644
> >> --- a/arch/arm64/kvm/hyp/hyp-entry.S
> >> +++ b/arch/arm64/kvm/hyp/hyp-entry.S
> >> @@ -62,6 +62,21 @@ ENTRY(__vhe_hyp_call)
> >>  	isb
> >>  	ret
> >>  ENDPROC(__vhe_hyp_call)
> >> +
> >> +/*
> >> + * Compute the idmap address of __kvm_hyp_reset based on the idmap
> >> + * start passed as a parameter, and jump there.
> >> + *
> >> + * x0: HYP phys_idmap_start
> >> + */
> >> +ENTRY(__kvm_hyp_teardown)
> >> +	mov	x4, x0
> >> +	adr_l	x3, __kvm_hyp_reset
> >> +
> >> +	/* insert __kvm_hyp_reset()s offset into phys_idmap_start */
> >> +	bfi	x4, x3, #0, #PAGE_SHIFT
> >> +	br	x4
> >> +ENDPROC(__kvm_hyp_teardown)
> >>  	
> >>  el1_sync:				// Guest trapped into EL2
> >>  	save_x0_to_x3
> >> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> >> index d044ca3..deee1b1 100644
> >> --- a/arch/arm64/kvm/reset.c
> >> +++ b/arch/arm64/kvm/reset.c
> >> @@ -132,14 +132,3 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
> >>  	/* Reset timer */
> >>  	return kvm_timer_vcpu_reset(vcpu, cpu_vtimer_irq);
> >>  }
> >> -
> >> -unsigned long kvm_hyp_reset_entry(void)
> >> -{
> >> -	/*
> >> -	 * KVM is running with merged page tables, which don't have the
> >> -	 * trampoline page mapped. We know the idmap is still mapped,
> >> -	 * but can't be called into directly. Use
> >> -	 * __extended_idmap_trampoline to do the call.
> >> -	 */
> >> -	return (unsigned long)kvm_ksym_ref(__extended_idmap_trampoline);
> >> -}
> >> -- 
> >> 2.1.4
> >>
> > 
> > I'm not sure I understand why we needed the kvm_hyp_reset_entry
> > indirection before, but the resulting code here looks good to me.
> 
> We still have an indirection, it is just a bit cleaner: We cannot call
> directly into the reset function located in the idmap, as the function
> is not strictly a kernel address, and the kern_hyp_va macro will mess
> with the function address. This is why we go via:
> 
> __cpu_reset_hyp_mode -> __kvm_hyp_teardown -> __kvm_hyp_reset
> 
> __cpu_reset_hyp_mode is the arch-agnostic entry point,
> __kvm_hyp_teardown is a normal HYP function, and __kvm_hyp_reset is the
> real thing in the idmap page.
> 
> Is that clearer?
> 
Didn't we have

__cpu_reset_hyp_mode -> kvm_hyp_reset_entry ->
__extended_idmap_trampoline -> __kvm_hyp_reset before, so one more level
of indirection, which we are not removing?

In any case, both versions of the code look correct; perhaps it was
simply that kvm_hyp_reset_entry was implemented for both arm/arm64
before?

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 12/15] arm: KVM: Simplify HYP init
  2016-06-30 12:31       ` Marc Zyngier
@ 2016-06-30 13:32         ` Christoffer Dall
  -1 siblings, 0 replies; 90+ messages in thread
From: Christoffer Dall @ 2016-06-30 13:32 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, linux-arm-kernel, kvmarm

On Thu, Jun 30, 2016 at 01:31:52PM +0100, Marc Zyngier wrote:
> On 28/06/16 22:50, Christoffer Dall wrote:
> > On Tue, Jun 07, 2016 at 11:58:32AM +0100, Marc Zyngier wrote:
> >> Just like for arm64, we can now make the HYP setup a lot simpler,
> >> and we can now initialise it in one go (instead of the two
> >> phases we currently have).
> >>
> >> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> >> ---
> >>  arch/arm/include/asm/kvm_host.h | 15 +++++--------
> >>  arch/arm/kvm/init.S             | 49 ++++++++---------------------------------
> >>  2 files changed, 14 insertions(+), 50 deletions(-)
> >>
> >> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> >> index 020f4eb..eafbfd5 100644
> >> --- a/arch/arm/include/asm/kvm_host.h
> >> +++ b/arch/arm/include/asm/kvm_host.h
> >> @@ -250,18 +250,13 @@ static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr,
> >>  	 * code. The init code doesn't need to preserve these
> >>  	 * registers as r0-r3 are already callee saved according to
> >>  	 * the AAPCS.
> >> -	 * Note that we slightly misuse the prototype by casing the
> >> +	 * Note that we slightly misuse the prototype by casting the
> >>  	 * stack pointer to a void *.
> >> -	 *
> >> -	 * We don't have enough registers to perform the full init in
> >> -	 * one go.  Install the boot PGD first, and then install the
> >> -	 * runtime PGD, stack pointer and vectors. The PGDs are always
> >> -	 * passed as the third argument, in order to be passed into
> >> -	 * r2-r3 to the init code (yes, this is compliant with the
> >> -	 * PCS!).
> >> -	 */
> >>  
> >> -	kvm_call_hyp(NULL, 0, boot_pgd_ptr);
> >> +	 * The PGDs are always passed as the third argument, in order
> >> +	 * to be passed into r2-r3 to the init code (yes, this is
> >> +	 * compliant with the PCS!).
> >> +	 */
> >>  
> >>  	kvm_call_hyp((void*)hyp_stack_ptr, vector_ptr, pgd_ptr);
> >>  }
> >> diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
> >> index 1f9ae17..b82a99d 100644
> >> --- a/arch/arm/kvm/init.S
> >> +++ b/arch/arm/kvm/init.S
> >> @@ -32,23 +32,13 @@
> >>   *       r2,r3 = Hypervisor pgd pointer
> >>   *
> >>   * The init scenario is:
> >> - * - We jump in HYP with four parameters: boot HYP pgd, runtime HYP pgd,
> >> - *   runtime stack, runtime vectors
> >> - * - Enable the MMU with the boot pgd
> >> - * - Jump to a target into the trampoline page (remember, this is the same
> >> - *   physical page!)
> >> - * - Now switch to the runtime pgd (same VA, and still the same physical
> >> - *   page!)
> >> + * - We jump in HYP with 3 parameters: runtime HYP pgd, runtime stack,
> >> + *   runtime vectors
> > 
> > probably just call this HYP pgd, HYP stack, and HYP vectors now
> 
> Yup.
> 
> >>   * - Invalidate TLBs
> >>   * - Set stack and vectors
> >> + * - Setup the page tables
> >> + * - Enable the MMU
> >>   * - Profit! (or eret, if you only care about the code).
> >> - *
> >> - * As we only have four registers available to pass parameters (and we
> >> - * need six), we split the init in two phases:
> >> - * - Phase 1: r0 = 0, r1 = 0, r2,r3 contain the boot PGD.
> >> - *   Provides the basic HYP init, and enable the MMU.
> >> - * - Phase 2: r0 = ToS, r1 = vectors, r2,r3 contain the runtime PGD.
> >> - *   Switches to the runtime PGD, set stack and vectors.
> >>   */
> >>  
> >>  	.text
> >> @@ -68,8 +58,11 @@ __kvm_hyp_init:
> >>  	W(b)	.
> >>  
> >>  __do_hyp_init:
> >> -	cmp	r0, #0			@ We have a SP?
> >> -	bne	phase2			@ Yes, second stage init
> >> +	@ Set stack pointer
> >> +	mov	sp, r0
> >> +
> >> +	@ Set HVBAR to point to the HYP vectors
> >> +	mcr	p15, 4, r1, c12, c0, 0	@ HVBAR
> >>  
> >>  	@ Set the HTTBR to point to the hypervisor PGD pointer passed
> >>  	mcrr	p15, 4, rr_lo_hi(r2, r3), c2
> >> @@ -114,33 +107,9 @@ __do_hyp_init:
> >>   THUMB(	ldr	r2, =(HSCTLR_M | HSCTLR_A | HSCTLR_TE)		)
> >>  	orr	r1, r1, r2
> >>  	orr	r0, r0, r1
> >> -	isb
> >>  	mcr	p15, 4, r0, c1, c0, 0	@ HSCR
> >> -
> >> -	@ End of init phase-1
> >> -	eret
> >> -
> >> -phase2:
> >> -	@ Set stack pointer
> >> -	mov	sp, r0
> >> -
> >> -	@ Set HVBAR to point to the HYP vectors
> >> -	mcr	p15, 4, r1, c12, c0, 0	@ HVBAR
> >> -
> >> -	@ Jump to the trampoline page
> >> -	ldr	r0, =TRAMPOLINE_VA
> >> -	adr	r1, target
> >> -	bfi	r0, r1, #0, #PAGE_SHIFT
> >> -	ret	r0
> >> -
> >> -target:	@ We're now in the trampoline code, switch page tables
> >> -	mcrr	p15, 4, rr_lo_hi(r2, r3), c2
> >>  	isb
> >>  
> >> -	@ Invalidate the old TLBs
> >> -	mcr	p15, 4, r0, c8, c7, 0	@ TLBIALLH
> >> -	dsb	ish
> > 
> > how are we sure there are no stale entries in the TLB beyond the idmap
> > region?  Did we take care of this during kernel boot?  What about
> > hotplug/suspend stuff?
> 
> This is done just before installing the page tables (not visible in this
> patch). Hotplug/suspend goes through the same path as well, so it should
> be all taken care of.
> 

Right, ok.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 90+ messages in thread

Thread overview: 90+ messages
2016-06-07 10:58 [PATCH 00/15] arm/arm64: KVM: Merge boot and runtime page tables Marc Zyngier
2016-06-07 10:58 ` Marc Zyngier
2016-06-07 10:58 ` [PATCH 01/15] arm64: KVM: Merged page tables documentation Marc Zyngier
2016-06-07 10:58   ` Marc Zyngier
2016-06-27 13:28   ` Christoffer Dall
2016-06-27 13:28     ` Christoffer Dall
2016-06-27 14:06     ` Marc Zyngier
2016-06-27 14:06       ` Marc Zyngier
2016-06-28 11:46       ` Christoffer Dall
2016-06-28 11:46         ` Christoffer Dall
2016-06-29  9:05         ` Marc Zyngier
2016-06-29  9:05           ` Marc Zyngier
2016-06-07 10:58 ` [PATCH 02/15] arm64: KVM: Kill HYP_PAGE_OFFSET Marc Zyngier
2016-06-07 10:58   ` Marc Zyngier
2016-06-27 13:47   ` Christoffer Dall
2016-06-27 13:47     ` Christoffer Dall
2016-06-27 14:20     ` Marc Zyngier
2016-06-27 14:20       ` Marc Zyngier
2016-06-28 12:03       ` Christoffer Dall
2016-06-28 12:03         ` Christoffer Dall
2016-06-07 10:58 ` [PATCH 03/15] arm64: Add ARM64_HYP_OFFSET_LOW capability Marc Zyngier
2016-06-07 10:58   ` Marc Zyngier
2016-06-07 10:58 ` [PATCH 04/15] arm64: KVM: Define HYP offset masks Marc Zyngier
2016-06-07 10:58   ` Marc Zyngier
2016-06-07 10:58 ` [PATCH 05/15] arm64: KVM: Refactor kern_hyp_va/hyp_kern_va to deal with multiple offsets Marc Zyngier
2016-06-07 10:58   ` Marc Zyngier
2016-06-28 12:42   ` Christoffer Dall
2016-06-28 12:42     ` Christoffer Dall
2016-06-30  9:22     ` Marc Zyngier
2016-06-30  9:22       ` Marc Zyngier
2016-06-30 10:16       ` Marc Zyngier
2016-06-30 10:16         ` Marc Zyngier
2016-06-30 10:26         ` Christoffer Dall
2016-06-30 10:26           ` Christoffer Dall
2016-06-30 10:42         ` Ard Biesheuvel
2016-06-30 10:42           ` Ard Biesheuvel
2016-06-30 11:02           ` Marc Zyngier
2016-06-30 11:02             ` Marc Zyngier
2016-06-30 11:10             ` Ard Biesheuvel
2016-06-30 11:10               ` Ard Biesheuvel
2016-06-30 11:57               ` Marc Zyngier
2016-06-30 11:57                 ` Marc Zyngier
2016-06-07 10:58 ` [PATCH 06/15] arm/arm64: KVM: Export __hyp_text_start/end symbols Marc Zyngier
2016-06-07 10:58   ` Marc Zyngier
2016-06-07 10:58 ` [PATCH 07/15] arm64: KVM: Runtime detection of lower HYP offset Marc Zyngier
2016-06-07 10:58   ` Marc Zyngier
2016-06-07 10:58 ` [PATCH 08/15] arm/arm64: KVM: Always have merged page tables Marc Zyngier
2016-06-07 10:58   ` Marc Zyngier
2016-06-28 21:43   ` Christoffer Dall
2016-06-28 21:43     ` Christoffer Dall
2016-06-30 12:27     ` Marc Zyngier
2016-06-30 12:27       ` Marc Zyngier
2016-06-30 13:28       ` Christoffer Dall
2016-06-30 13:28         ` Christoffer Dall
2016-06-07 10:58 ` [PATCH 09/15] arm64: KVM: Simplify HYP init/teardown Marc Zyngier
2016-06-07 10:58   ` Marc Zyngier
2016-06-28 21:31   ` Christoffer Dall
2016-06-28 21:31     ` Christoffer Dall
2016-06-30 12:10     ` Marc Zyngier
2016-06-30 12:10       ` Marc Zyngier
2016-06-30 13:31       ` Christoffer Dall
2016-06-30 13:31         ` Christoffer Dall
2016-06-07 10:58 ` [PATCH 10/15] arm/arm64: KVM: Drop boot_pgd Marc Zyngier
2016-06-07 10:58   ` Marc Zyngier
2016-06-07 10:58 ` [PATCH 11/15] arm/arm64: KVM: Kill free_boot_hyp_pgd Marc Zyngier
2016-06-07 10:58   ` Marc Zyngier
2016-06-07 10:58 ` [PATCH 12/15] arm: KVM: Simplify HYP init Marc Zyngier
2016-06-07 10:58   ` Marc Zyngier
2016-06-28 21:50   ` Christoffer Dall
2016-06-28 21:50     ` Christoffer Dall
2016-06-30 12:31     ` Marc Zyngier
2016-06-30 12:31       ` Marc Zyngier
2016-06-30 13:32       ` Christoffer Dall
2016-06-30 13:32         ` Christoffer Dall
2016-06-07 10:58 ` [PATCH 13/15] arm: KVM: Allow hyp teardown Marc Zyngier
2016-06-07 10:58   ` Marc Zyngier
2016-06-07 10:58 ` [PATCH 14/15] arm/arm64: KVM: Prune unused #defines Marc Zyngier
2016-06-07 10:58   ` Marc Zyngier
2016-06-07 10:58 ` [PATCH 15/15] arm/arm64: KVM: Check that IDMAP doesn't intersect with VA range Marc Zyngier
2016-06-07 10:58   ` Marc Zyngier
2016-06-28 22:01   ` Christoffer Dall
2016-06-28 22:01     ` Christoffer Dall
2016-06-30 12:51     ` Marc Zyngier
2016-06-30 12:51       ` Marc Zyngier
2016-06-30 13:27       ` Christoffer Dall
2016-06-30 13:27         ` Christoffer Dall
2016-06-27 13:29 ` [PATCH 00/15] arm/arm64: KVM: Merge boot and runtime page tables Christoffer Dall
2016-06-27 13:29   ` Christoffer Dall
2016-06-27 14:12   ` Marc Zyngier
2016-06-27 14:12     ` Marc Zyngier
