* [PATCH v3 0/8] KVM: arm64: Hypervisor stack enhancements
From: Kalesh Singh @ 2022-02-24  5:13 UTC (permalink / raw)
  Cc: will, maz, qperret, tabba, surenb, kernel-team, Kalesh Singh,
	James Morse, Alexandru Elisei, Suzuki K Poulose, Catalin Marinas,
	Mark Rutland, Mark Brown, Masami Hiramatsu, Peter Collingbourne,
	Madhavan T. Venkataraman, Andrew Walbran, Andrew Scull,
	Paolo Bonzini, Zenghui Yu, Ard Biesheuvel, linux-arm-kernel,
	kvmarm, linux-kernel

Hi all,

This is v3 of the nVHE hypervisor stack enhancements.

Previous versions can be found at:
v2: https://lore.kernel.org/r/20220222165212.2005066-1-kaleshsingh@google.com/
v1: https://lore.kernel.org/r/20220210224220.4076151-1-kaleshsingh@google.com/

The main update in this version is that the unwinder now reuses the core
logic of the regular kernel stack unwinder to avoid duplicating code, per
Mark. It also fixes the other issues identified in v2.

The previous cover letter (with updated call trace) has been copied below.

Thanks,
Kalesh

-----

This series is based on 5.17-rc5 and adds the following stack features to
the KVM nVHE hypervisor:

== Hyp Stack Guard Pages ==

Based on the technique used by arm64 VMAP_STACK to detect overflow: the
stack is aligned to twice its size, which ensures that the 'stack shift'
bit of any valid SP is 0. This bit can be tested in the exception entry
to detect overflow without corrupting GPRs.
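
To make the alignment trick concrete, here is a minimal user-space sketch
of the check. It is not code from this series: the page size, the
EX_-prefixed names and the addresses in the usage note are assumptions
chosen for illustration, and the real check is done in assembly at
exception entry.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed for illustration: 4 KiB pages and a one-page stack. */
#define EX_STACK_SHIFT	12
#define EX_STACK_SIZE	(UINT64_C(1) << EX_STACK_SHIFT)

/*
 * The stack occupies [base, base + EX_STACK_SIZE), with base aligned to
 * 2 * EX_STACK_SIZE. Every address inside the stack therefore has bit
 * EX_STACK_SHIFT clear, while addresses just below base (where an
 * overflowing SP ends up) have it set.
 */
static bool ex_sp_overflowed(uint64_t sp)
{
	return (sp & EX_STACK_SIZE) != 0;
}
```

With a hypothetical base of 0x4000, ex_sp_overflowed(0x4ff0) is false and
ex_sp_overflowed(0x3ff0) is true. Note that an SP equal to
base + EX_STACK_SIZE (an empty stack) also has the bit set, so a check of
this kind only makes sense once something has been pushed.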

== Hyp Stack Unwinder ==

Based on the arm64 kernel stack unwinder
(See: arch/arm64/kernel/stacktrace.c)

The unwinding and dumping of the hyp stack is not enabled by default and
depends on CONFIG_NVHE_EL2_DEBUG to avoid potential information leaks.

When CONFIG_NVHE_EL2_DEBUG is enabled, the host stage 2 protection is
disabled, allowing the host to read the hypervisor stack pages and unwind
the stack from EL1. This allows us to print the hypervisor stacktrace
before panicking the host, as shown below.

Example call trace:

[   98.916444][  T426] kvm [426]: nVHE hyp panic at: [<ffffffc0096156fc>] __kvm_nvhe_overflow_stack+0x8/0x34!
[   98.918360][  T426] nVHE HYP call trace:
[   98.918692][  T426] kvm [426]: [<ffffffc009615aac>] __kvm_nvhe_cpu_prepare_nvhe_panic_info+0x4c/0x68
[   98.919545][  T426] kvm [426]: [<ffffffc0096159a4>] __kvm_nvhe_hyp_panic+0x2c/0xe8
[   98.920107][  T426] kvm [426]: [<ffffffc009615ad8>] __kvm_nvhe_hyp_panic_bad_stack+0x10/0x10
[   98.920665][  T426] kvm [426]: [<ffffffc009610a4c>] __kvm_nvhe___kvm_hyp_host_vector+0x24c/0x794
[   98.921292][  T426] kvm [426]: [<ffffffc009615718>] __kvm_nvhe_overflow_stack+0x24/0x34
. . .

[   98.973382][  T426] kvm [426]: [<ffffffc009615718>] __kvm_nvhe_overflow_stack+0x24/0x34
[   98.973816][  T426] kvm [426]: [<ffffffc0096152f4>] __kvm_nvhe___kvm_vcpu_run+0x38/0x438
[   98.974255][  T426] kvm [426]: [<ffffffc009616f80>] __kvm_nvhe_handle___kvm_vcpu_run+0x1c4/0x364
[   98.974719][  T426] kvm [426]: [<ffffffc009616928>] __kvm_nvhe_handle_trap+0xa8/0x130
[   98.975152][  T426] kvm [426]: [<ffffffc009610064>] __kvm_nvhe___host_exit+0x64/0x64
[   98.975588][  T426] ---- end of nVHE HYP call trace ----
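
A trace like the one above comes from walking frame records. The sketch
below shows the general idea under the usual AArch64 layout (the word at
fp holds the caller's fp, the word at fp + 8 holds the return address).
It is a simplified illustration and not the unwinder from this series:
struct ex_stack, ex_read() and ex_unwind() are hypothetical names, and the
stack is modelled as a plain array.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one stack, for illustration only. */
struct ex_stack {
	uint64_t base;		/* lowest address of the stack */
	size_t nwords;		/* size of the stack in 8-byte words */
	const uint64_t *mem;	/* readable copy of the stack contents */
};

/* Reads the 8-byte word at addr; returns 0 if addr is not on the stack. */
static int ex_read(const struct ex_stack *s, uint64_t addr, uint64_t *val)
{
	size_t idx;

	if (addr < s->base || (addr & 7) != 0)
		return 0;
	idx = (size_t)((addr - s->base) / 8);
	if (idx >= s->nwords)
		return 0;
	*val = s->mem[idx];
	return 1;
}

/*
 * Follows the chain of frame records starting at fp. Stores up to max
 * return addresses in pcs and returns how many were found.
 */
static size_t ex_unwind(const struct ex_stack *s, uint64_t fp,
			uint64_t *pcs, size_t max)
{
	size_t n = 0;

	while (n < max) {
		uint64_t next_fp, lr;

		if (!ex_read(s, fp, &next_fp) || !ex_read(s, fp + 8, &lr))
			break;
		pcs[n++] = lr;
		/* Each record must sit above the previous one, else stop. */
		if (next_fp <= fp)
			break;
		fp = next_fp;
	}
	return n;
}
```

The bounds check in ex_read() and the "must move upwards" test are what
keep the walk from looping or reading outside the stack when a record is
corrupt.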


Kalesh Singh (8):
  KVM: arm64: Introduce hyp_alloc_private_va_range()
  KVM: arm64: Introduce pkvm_alloc_private_va_range()
  KVM: arm64: Add guard pages for KVM nVHE hypervisor stack
  KVM: arm64: Add guard pages for pKVM (protected nVHE) hypervisor stack
  KVM: arm64: Detect and handle hypervisor stack overflows
  KVM: arm64: Add hypervisor overflow stack
  KVM: arm64: Unwind and dump nVHE HYP stacktrace
  KVM: arm64: Symbolize the nVHE HYP backtrace

 arch/arm64/include/asm/kvm_asm.h     |  20 +++
 arch/arm64/include/asm/kvm_mmu.h     |   4 +
 arch/arm64/include/asm/stacktrace.h  |  12 ++
 arch/arm64/kernel/stacktrace.c       | 210 ++++++++++++++++++++++++---
 arch/arm64/kvm/Kconfig               |   5 +-
 arch/arm64/kvm/arm.c                 |  34 ++++-
 arch/arm64/kvm/handle_exit.c         |  16 +-
 arch/arm64/kvm/hyp/include/nvhe/mm.h |   3 +-
 arch/arm64/kvm/hyp/nvhe/host.S       |  29 ++++
 arch/arm64/kvm/hyp/nvhe/hyp-main.c   |   5 +-
 arch/arm64/kvm/hyp/nvhe/mm.c         |  51 ++++---
 arch/arm64/kvm/hyp/nvhe/setup.c      |  25 +++-
 arch/arm64/kvm/hyp/nvhe/switch.c     |  30 +++-
 arch/arm64/kvm/mmu.c                 |  62 +++++---
 scripts/kallsyms.c                   |   2 +-
 15 files changed, 422 insertions(+), 86 deletions(-)


base-commit: cfb92440ee71adcc2105b0890bb01ac3cddb8507
-- 
2.35.1.473.g83b2b277ed-goog


* [PATCH v3 1/8] KVM: arm64: Introduce hyp_alloc_private_va_range()
From: Kalesh Singh @ 2022-02-24  5:13 UTC (permalink / raw)
  Cc: will, maz, qperret, tabba, surenb, kernel-team, Kalesh Singh,
	James Morse, Alexandru Elisei, Suzuki K Poulose, Catalin Marinas,
	Mark Rutland, Mark Brown, Masami Hiramatsu, Peter Collingbourne,
	Madhavan T. Venkataraman, Andrew Scull, Paolo Bonzini,
	Ard Biesheuvel, linux-arm-kernel, kvmarm, linux-kernel

hyp_alloc_private_va_range() can be used to reserve private VA ranges
in the nVHE hypervisor. Also update __create_hyp_private_mapping()
to allow specifying an alignment for the private VA mapping.

These will be used to implement stack guard pages for the KVM nVHE
hypervisor (nVHE Hyp mode / not pKVM) in a subsequent patch in the series.
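
As a rough illustration of the address arithmetic, the sketch below
reproduces in user space the two steps the allocator performs: subtract
the page-aligned size from io_map_base, then round the result down to the
requested alignment. The EX_/ex_ names and the 4 KiB page size are
assumptions for this example, the alignment is assumed to be a power of
two, and the locking and the BIT(VA_BITS - 1) overflow check are left out.

```c
#include <stdint.h>

#define EX_PAGE_SIZE	UINT64_C(4096)	/* assumed page size */

/* Round x up to a multiple of a (a must be a power of two). */
static uint64_t ex_align_up(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Round x down to a multiple of a (a must be a power of two). */
static uint64_t ex_align_down(uint64_t x, uint64_t a)
{
	return x & ~(a - 1);
}

/*
 * Returns the base of a range of at least size bytes that ends at or
 * below io_map_base and starts on an align boundary.
 */
static uint64_t ex_alloc_below(uint64_t io_map_base, uint64_t size,
			       uint64_t align)
{
	uint64_t base = io_map_base - ex_align_up(size, EX_PAGE_SIZE);

	return ex_align_down(base, align);
}
```

For example, with io_map_base at 0x10000, a one-page request aligned to
two pages yields 0xe000: the range shrinks to 0xf000 first and is then
rounded down to the next 0x2000 boundary.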

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
---

Changes in v3:
  - Handle null ptr in IS_ERR_OR_NULL checks, per Mark
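
The reason for this change is that PTR_ERR() of a NULL value is 0, so
returning it from an IS_ERR_OR_NULL() branch reports success. The sketch
below models both behaviours in user space. The ex_ helpers are
hypothetical stand-ins written for this example (the 4095 limit mirrors
the kernel's MAX_ERRNO), not the kernel macros themselves.

```c
#include <errno.h>
#include <stdbool.h>

#define EX_MAX_ERRNO	4095UL

/* True for 0 and for values in the top EX_MAX_ERRNO addresses. */
static bool ex_is_err_or_null(unsigned long v)
{
	return v == 0 || v >= (unsigned long)-EX_MAX_ERRNO;
}

/* Previous pattern: a NULL value is returned as 0, i.e. "success". */
static long ex_old_result(unsigned long v)
{
	if (ex_is_err_or_null(v))
		return (long)v;
	return 0;
}

/* v3 pattern: a NULL value is reported as -ENOMEM instead. */
static long ex_new_result(unsigned long v)
{
	if (ex_is_err_or_null(v))
		return v ? (long)v : -ENOMEM;
	return 0;
}
```

Here ex_old_result(0) returns 0 even though no address was produced,
while ex_new_result(0) returns -ENOMEM. Encoded error values are passed
through unchanged by both.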

 arch/arm64/include/asm/kvm_mmu.h |  4 +++
 arch/arm64/kvm/mmu.c             | 62 ++++++++++++++++++++------------
 2 files changed, 43 insertions(+), 23 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 81839e9a8a24..0b0c71302b92 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -153,6 +153,10 @@ static __always_inline unsigned long __kern_hyp_va(unsigned long v)
 int kvm_share_hyp(void *from, void *to);
 void kvm_unshare_hyp(void *from, void *to);
 int create_hyp_mappings(void *from, void *to, enum kvm_pgtable_prot prot);
+unsigned long hyp_alloc_private_va_range(size_t size, size_t align);
+int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
+				size_t align, unsigned long *haddr,
+				enum kvm_pgtable_prot prot);
 int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
 			   void __iomem **kaddr,
 			   void __iomem **haddr);
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index bc2aba953299..fc09536c8197 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -457,22 +457,16 @@ int create_hyp_mappings(void *from, void *to, enum kvm_pgtable_prot prot)
 	return 0;
 }
 
-static int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
-					unsigned long *haddr,
-					enum kvm_pgtable_prot prot)
+
+/*
+ * Allocates a private VA range below io_map_base.
+ *
+ * @size:	The size of the VA range to reserve.
+ * @align:	The required alignment for the allocation.
+ */
+unsigned long hyp_alloc_private_va_range(size_t size, size_t align)
 {
 	unsigned long base;
-	int ret = 0;
-
-	if (!kvm_host_owns_hyp_mappings()) {
-		base = kvm_call_hyp_nvhe(__pkvm_create_private_mapping,
-					 phys_addr, size, prot);
-		if (IS_ERR_OR_NULL((void *)base))
-			return PTR_ERR((void *)base);
-		*haddr = base;
-
-		return 0;
-	}
 
 	mutex_lock(&kvm_hyp_pgd_mutex);
 
@@ -484,8 +478,8 @@ static int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
 	 *
 	 * The allocated size is always a multiple of PAGE_SIZE.
 	 */
-	size = PAGE_ALIGN(size + offset_in_page(phys_addr));
-	base = io_map_base - size;
+	base = io_map_base - PAGE_ALIGN(size);
+	base = ALIGN_DOWN(base, align);
 
 	/*
 	 * Verify that BIT(VA_BITS - 1) hasn't been flipped by
@@ -493,20 +487,42 @@ static int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
 	 * overflowed the idmap/IO address range.
 	 */
 	if ((base ^ io_map_base) & BIT(VA_BITS - 1))
-		ret = -ENOMEM;
+		base = (unsigned long)ERR_PTR(-ENOMEM);
 	else
 		io_map_base = base;
 
 	mutex_unlock(&kvm_hyp_pgd_mutex);
 
-	if (ret)
-		goto out;
+	return base;
+}
+
+int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
+				size_t align, unsigned long *haddr,
+				enum kvm_pgtable_prot prot)
+{
+	unsigned long addr;
+	int ret = 0;
+
+	if (!kvm_host_owns_hyp_mappings()) {
+		addr = kvm_call_hyp_nvhe(__pkvm_create_private_mapping,
+					 phys_addr, size, prot);
+		if (IS_ERR_OR_NULL((void *)addr))
+			return addr ? PTR_ERR((void *)addr) : -ENOMEM;
+		*haddr = addr;
+
+		return 0;
+	}
+
+	size += offset_in_page(phys_addr);
+	addr = hyp_alloc_private_va_range(size, align);
+	if (IS_ERR_OR_NULL((void *)addr))
+		return addr ? PTR_ERR((void *)addr) : -ENOMEM;
 
-	ret = __create_hyp_mappings(base, size, phys_addr, prot);
+	ret = __create_hyp_mappings(addr, size, phys_addr, prot);
 	if (ret)
 		goto out;
 
-	*haddr = base + offset_in_page(phys_addr);
+	*haddr = addr + offset_in_page(phys_addr);
 out:
 	return ret;
 }
@@ -537,7 +553,7 @@ int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
 		return 0;
 	}
 
-	ret = __create_hyp_private_mapping(phys_addr, size,
+	ret = __create_hyp_private_mapping(phys_addr, size, PAGE_SIZE,
 					   &addr, PAGE_HYP_DEVICE);
 	if (ret) {
 		iounmap(*kaddr);
@@ -564,7 +580,7 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
 
 	BUG_ON(is_kernel_in_hyp_mode());
 
-	ret = __create_hyp_private_mapping(phys_addr, size,
+	ret = __create_hyp_private_mapping(phys_addr, size, PAGE_SIZE,
 					   &addr, PAGE_HYP_EXEC);
 	if (ret) {
 		*haddr = NULL;
-- 
2.35.1.473.g83b2b277ed-goog


* [PATCH v3 2/8] KVM: arm64: Introduce pkvm_alloc_private_va_range()
From: Kalesh Singh @ 2022-02-24  5:13 UTC (permalink / raw)
  Cc: will, maz, qperret, tabba, surenb, kernel-team, Kalesh Singh,
	James Morse, Alexandru Elisei, Suzuki K Poulose, Catalin Marinas,
	Mark Rutland, Mark Brown, Masami Hiramatsu, Peter Collingbourne,
	Madhavan T. Venkataraman, Andrew Walbran, Andrew Scull,
	Paolo Bonzini, Ard Biesheuvel, linux-arm-kernel, kvmarm,
	linux-kernel

pkvm_alloc_private_va_range() can be used to reserve private VA ranges
in the pKVM nVHE hypervisor. Also update __pkvm_create_private_mapping()
to allow specifying an alignment for the private VA mapping.

These will be used to implement stack guard pages for pKVM nVHE hypervisor
(in a subsequent patch in the series).

Credit to Quentin Perret <qperret@google.com> for the idea of moving
private VA allocation out of __pkvm_create_private_mapping().

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
---

Changes in v3:
  - Handle null ptr in IS_ERR_OR_NULL checks, per Mark

Changes in v2:
  - Allow specifying an alignment for the private VA allocations, per Marc

 arch/arm64/kvm/hyp/include/nvhe/mm.h |  3 +-
 arch/arm64/kvm/hyp/nvhe/hyp-main.c   |  5 +--
 arch/arm64/kvm/hyp/nvhe/mm.c         | 51 ++++++++++++++++++----------
 arch/arm64/kvm/mmu.c                 |  2 +-
 4 files changed, 40 insertions(+), 21 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mm.h b/arch/arm64/kvm/hyp/include/nvhe/mm.h
index 2d08510c6cc1..05d06ad00347 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mm.h
@@ -20,7 +20,8 @@ int pkvm_cpu_set_vector(enum arm64_hyp_spectre_vector slot);
 int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot);
 int pkvm_create_mappings_locked(void *from, void *to, enum kvm_pgtable_prot prot);
 unsigned long __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
-					    enum kvm_pgtable_prot prot);
+					size_t align, enum kvm_pgtable_prot prot);
+unsigned long pkvm_alloc_private_va_range(size_t size, size_t align);
 
 static inline void hyp_vmemmap_range(phys_addr_t phys, unsigned long size,
 				     unsigned long *start, unsigned long *end)
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 5e2197db0d32..96b2312a0f1d 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -158,9 +158,10 @@ static void handle___pkvm_create_private_mapping(struct kvm_cpu_context *host_ct
 {
 	DECLARE_REG(phys_addr_t, phys, host_ctxt, 1);
 	DECLARE_REG(size_t, size, host_ctxt, 2);
-	DECLARE_REG(enum kvm_pgtable_prot, prot, host_ctxt, 3);
+	DECLARE_REG(size_t, align, host_ctxt, 3);
+	DECLARE_REG(enum kvm_pgtable_prot, prot, host_ctxt, 4);
 
-	cpu_reg(host_ctxt, 1) = __pkvm_create_private_mapping(phys, size, prot);
+	cpu_reg(host_ctxt, 1) = __pkvm_create_private_mapping(phys, size, align, prot);
 }
 
 static void handle___pkvm_prot_finalize(struct kvm_cpu_context *host_ctxt)
diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
index 526a7d6fa86f..f35468ec639d 100644
--- a/arch/arm64/kvm/hyp/nvhe/mm.c
+++ b/arch/arm64/kvm/hyp/nvhe/mm.c
@@ -37,26 +37,46 @@ static int __pkvm_create_mappings(unsigned long start, unsigned long size,
 	return err;
 }
 
-unsigned long __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
-					    enum kvm_pgtable_prot prot)
+/*
+ * Allocates a private VA range above __io_map_base.
+ *
+ * @size:	The size of the VA range to reserve.
+ * @align:	The required alignment for the allocation.
+ */
+unsigned long pkvm_alloc_private_va_range(size_t size, size_t align)
 {
-	unsigned long addr;
-	int err;
+	unsigned long base, addr;
 
 	hyp_spin_lock(&pkvm_pgd_lock);
 
-	size = PAGE_ALIGN(size + offset_in_page(phys));
-	addr = __io_map_base;
-	__io_map_base += size;
+	addr = ALIGN(__io_map_base, align);
+
+	/* The allocated size is always a multiple of PAGE_SIZE */
+	base = addr + PAGE_ALIGN(size);
 
 	/* Are we overflowing on the vmemmap ? */
-	if (__io_map_base > __hyp_vmemmap) {
-		__io_map_base -= size;
+	if (base > __hyp_vmemmap)
 		addr = (unsigned long)ERR_PTR(-ENOMEM);
+	else
+		__io_map_base = base;
+
+	hyp_spin_unlock(&pkvm_pgd_lock);
+
+	return addr;
+}
+
+unsigned long __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
+					size_t align, enum kvm_pgtable_prot prot)
+{
+	unsigned long addr;
+	int err;
+
+	size += offset_in_page(phys);
+	addr = pkvm_alloc_private_va_range(size, align);
+	if (IS_ERR((void *)addr))
 		goto out;
-	}
 
-	err = kvm_pgtable_hyp_map(&pkvm_pgtable, addr, size, phys, prot);
+	err = __pkvm_create_mappings(addr, size, phys, prot);
 	if (err) {
 		addr = (unsigned long)ERR_PTR(err);
 		goto out;
@@ -64,8 +84,6 @@ unsigned long __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
 
 	addr = addr + offset_in_page(phys);
 out:
-	hyp_spin_unlock(&pkvm_pgd_lock);
-
 	return addr;
 }
 
@@ -152,11 +170,10 @@ int hyp_map_vectors(void)
 		return 0;
 
 	phys = __hyp_pa(__bp_harden_hyp_vecs);
-	bp_base = (void *)__pkvm_create_private_mapping(phys,
-							__BP_HARDEN_HYP_VECS_SZ,
-							PAGE_HYP_EXEC);
+	bp_base = (void *)__pkvm_create_private_mapping(phys, __BP_HARDEN_HYP_VECS_SZ,
+							PAGE_SIZE, PAGE_HYP_EXEC);
 	if (IS_ERR_OR_NULL(bp_base))
-		return PTR_ERR(bp_base);
+		return bp_base ? PTR_ERR(bp_base) : -ENOMEM;
 
 	__hyp_bp_vect_base = bp_base;
 
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index fc09536c8197..298e6d8439ef 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -505,7 +505,7 @@ int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
 
 	if (!kvm_host_owns_hyp_mappings()) {
 		addr = kvm_call_hyp_nvhe(__pkvm_create_private_mapping,
-					 phys_addr, size, prot);
+					 phys_addr, size, align, prot);
 		if (IS_ERR_OR_NULL((void *)addr))
 			return addr ? PTR_ERR((void *)addr) : -ENOMEM;
 		*haddr = addr;
-- 
2.35.1.473.g83b2b277ed-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v3 3/8] KVM: arm64: Add guard pages for KVM nVHE hypervisor stack
  2022-02-24  5:13 ` Kalesh Singh
  (?)
@ 2022-02-24  5:13   ` Kalesh Singh
  -1 siblings, 0 replies; 60+ messages in thread
From: Kalesh Singh @ 2022-02-24  5:13 UTC (permalink / raw)
  Cc: will, maz, qperret, tabba, surenb, kernel-team, Kalesh Singh,
	James Morse, Alexandru Elisei, Suzuki K Poulose, Catalin Marinas,
	Mark Rutland, Mark Brown, Masami Hiramatsu, Peter Collingbourne,
	Madhavan T. Venkataraman, Andrew Walbran, Andrew Scull,
	Paolo Bonzini, linux-arm-kernel, kvmarm, linux-kernel

Map the stack pages in the flexible private VA range and allocate
guard pages below the stack as unbacked VA space. The stack is aligned
to twice its size to aid overflow detection (implemented in a subsequent
patch in the series).

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
---

Changes in v3:
  - Handle null ptr in IS_ERR_OR_NULL checks, per Mark
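
Note for readers: a small userspace sketch of why aligning the stack to
twice its size helps overflow detection. It assumes a one-page (4 KiB)
stack; MODEL_STACK_SHIFT, sp_overflowed() and the other names are
hypothetical. The series itself performs the check in the exception entry
(in a subsequent patch), so this only models the arithmetic.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_STACK_SHIFT 12	/* assumption: stack is one 4 KiB page */
#define MODEL_STACK_SIZE  (1UL << MODEL_STACK_SHIFT)

/* True when @base is aligned to twice the stack size. */
static bool stack_base_ok(unsigned long base)
{
	return (base & (2 * MODEL_STACK_SIZE - 1)) == 0;
}

/* Address one past the last byte of a stack that starts at @base. */
static unsigned long stack_top(unsigned long base)
{
	assert(stack_base_ok(base));
	return base + MODEL_STACK_SIZE;
}

/*
 * True when the 'stack shift' bit of @sp is set.
 *
 * With a suitably aligned base, every address in [base, base + size) has
 * this bit clear, and every address in the page directly below base (the
 * guard page) has it set. The empty-stack value returned by stack_top()
 * also has the bit set, so this models a test made after SP has already
 * been moved down by at least one byte.
 */
static bool sp_overflowed(unsigned long sp)
{
	return (sp & MODEL_STACK_SIZE) != 0;
}
```

With base = 0x10000, sp = 0x10ff0 is inside the stack and has bit 12 clear,
while sp = 0xfff0 (16 bytes into the guard page) has it set.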

 arch/arm64/include/asm/kvm_asm.h |  1 +
 arch/arm64/kvm/arm.c             | 32 +++++++++++++++++++++++++++++---
 2 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index d5b0386ef765..2e277f2ed671 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -169,6 +169,7 @@ struct kvm_nvhe_init_params {
 	unsigned long tcr_el2;
 	unsigned long tpidr_el2;
 	unsigned long stack_hyp_va;
+	unsigned long stack_pa;
 	phys_addr_t pgd_pa;
 	unsigned long hcr_el2;
 	unsigned long vttbr;
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index ecc5958e27fe..7a23630c4a7f 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1541,7 +1541,6 @@ static void cpu_prepare_hyp_mode(int cpu)
 	tcr |= (idmap_t0sz & GENMASK(TCR_TxSZ_WIDTH - 1, 0)) << TCR_T0SZ_OFFSET;
 	params->tcr_el2 = tcr;
 
-	params->stack_hyp_va = kern_hyp_va(per_cpu(kvm_arm_hyp_stack_page, cpu) + PAGE_SIZE);
 	params->pgd_pa = kvm_mmu_get_httbr();
 	if (is_protected_kvm_enabled())
 		params->hcr_el2 = HCR_HOST_NVHE_PROTECTED_FLAGS;
@@ -1990,14 +1989,41 @@ static int init_hyp_mode(void)
 	 * Map the Hyp stack pages
 	 */
 	for_each_possible_cpu(cpu) {
+		struct kvm_nvhe_init_params *params = per_cpu_ptr_nvhe_sym(kvm_init_params, cpu);
 		char *stack_page = (char *)per_cpu(kvm_arm_hyp_stack_page, cpu);
-		err = create_hyp_mappings(stack_page, stack_page + PAGE_SIZE,
-					  PAGE_HYP);
+		unsigned long stack_hyp_va, guard_hyp_va;
 
+		/*
+		 * Private mappings are allocated downwards from io_map_base
+		 * so allocate the stack first then the guard page.
+		 *
+		 * The stack is aligned to twice its size to facilitate overflow
+		 * detection.
+		 */
+		err = __create_hyp_private_mapping(__pa(stack_page), PAGE_SIZE,
+						PAGE_SIZE * 2, &stack_hyp_va, PAGE_HYP);
 		if (err) {
 			kvm_err("Cannot map hyp stack\n");
 			goto out_err;
 		}
+
+		/* Allocate unbacked private VA range for stack guard page */
+		guard_hyp_va = hyp_alloc_private_va_range(PAGE_SIZE, PAGE_SIZE);
+		if (IS_ERR_OR_NULL((void *)guard_hyp_va)) {
+			err = guard_hyp_va ? PTR_ERR((void *)guard_hyp_va) : -ENOMEM;
+			kvm_err("Cannot allocate hyp stack guard page\n");
+			goto out_err;
+		}
+
+		/*
+		 * Save the stack PA in nvhe_init_params. This will be needed to recreate
+		 * the stack mapping in protected nVHE mode. __hyp_pa() won't do the right
+		 * thing there, since the stack has been mapped in the flexible private
+		 * VA space.
+		 */
+		params->stack_pa = __pa(stack_page) + PAGE_SIZE;
+
+		params->stack_hyp_va = stack_hyp_va + PAGE_SIZE;
 	}
 
 	for_each_possible_cpu(cpu) {
-- 
2.35.1.473.g83b2b277ed-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v3 4/8] KVM: arm64: Add guard pages for pKVM (protected nVHE) hypervisor stack
  2022-02-24  5:13 ` Kalesh Singh
  (?)
@ 2022-02-24  5:13   ` Kalesh Singh
  -1 siblings, 0 replies; 60+ messages in thread
From: Kalesh Singh @ 2022-02-24  5:13 UTC (permalink / raw)
  Cc: will, maz, qperret, tabba, surenb, kernel-team, Kalesh Singh,
	James Morse, Alexandru Elisei, Suzuki K Poulose, Catalin Marinas,
	Mark Rutland, Mark Brown, Masami Hiramatsu, Peter Collingbourne,
	Madhavan T. Venkataraman, Andrew Scull, linux-arm-kernel, kvmarm,
	linux-kernel

Map the stack pages in the flexible private VA range and allocate
guard pages below the stack as unbacked VA space. The stack is aligned
to twice its size to aid overflow detection (implemented in a subsequent
patch in the series).

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
---

Changes in v3:
  - Handle null ptr in IS_ERR_OR_NULL checks, per Mark
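
Note for readers: a userspace model of the error-pointer encoding behind
the v3 change above. The model_* helpers are hypothetical stand-ins that
mirror the kernel's ERR_PTR(), IS_ERR(), IS_ERR_OR_NULL() and PTR_ERR();
addr_to_errno() is an illustrative name. The sketch shows that PTR_ERR() of
a NULL value is 0, which a caller would read as success, hence the explicit
-ENOMEM fallback.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_MAX_ERRNO 4095	/* mirrors the kernel's MAX_ERRNO */

/* Encode a negative errno in an address-sized value, like ERR_PTR(). */
static unsigned long model_err_ptr(long err)
{
	assert(err < 0 && err >= -MODEL_MAX_ERRNO);
	return (unsigned long)err;
}

/* Like IS_ERR(): only the top MODEL_MAX_ERRNO values count as errors. */
static bool model_is_err(unsigned long addr)
{
	return addr >= (unsigned long)-MODEL_MAX_ERRNO;
}

/* Like IS_ERR_OR_NULL(). */
static bool model_is_err_or_null(unsigned long addr)
{
	return addr == 0 || model_is_err(addr);
}

/* Like PTR_ERR(): a plain cast, so a NULL input yields 0. */
static long model_ptr_err(unsigned long addr)
{
	return (long)addr;
}

/*
 * Turn the result of an allocation into an errno-style return value:
 * 0 for a usable address, the encoded errno for an error value, and
 * -ENOMEM for NULL (which carries no errno of its own).
 */
static int addr_to_errno(unsigned long addr)
{
	if (!model_is_err_or_null(addr))
		return 0;

	return addr ? (int)model_ptr_err(addr) : -ENOMEM;
}
```

Without the NULL branch, addr_to_errno(0) would return model_ptr_err(0),
which is 0, and the failed allocation would go unnoticed by the caller.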

 arch/arm64/kvm/hyp/nvhe/setup.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 27af337f9fea..5f3a4002f9c5 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -105,11 +105,28 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
 		if (ret)
 			return ret;
 
-		end = (void *)per_cpu_ptr(&kvm_init_params, i)->stack_hyp_va;
+		/*
+		 * Private mappings are allocated upwards from __io_map_base
+		 * so allocate the guard page first then the stack.
+		 */
+		start = (void *)pkvm_alloc_private_va_range(PAGE_SIZE, PAGE_SIZE);
+		if (IS_ERR_OR_NULL(start))
+			return start ? PTR_ERR(start) : -ENOMEM;
+
+		/*
+		 * The stack is aligned to twice its size to facilitate overflow
+		 * detection.
+		 */
+		end = (void *)per_cpu_ptr(&kvm_init_params, i)->stack_pa;
 		start = end - PAGE_SIZE;
-		ret = pkvm_create_mappings(start, end, PAGE_HYP);
-		if (ret)
-			return ret;
+		start = (void *)__pkvm_create_private_mapping((phys_addr_t)start,
+					PAGE_SIZE, PAGE_SIZE * 2, PAGE_HYP);
+		if (IS_ERR_OR_NULL(start))
+			return start ? PTR_ERR(start) : -ENOMEM;
+		end = start + PAGE_SIZE;
+
+		/* Update stack_hyp_va to end of the stack's private VA range */
+		per_cpu_ptr(&kvm_init_params, i)->stack_hyp_va = (unsigned long) end;
 	}
 
 	/*
-- 
2.35.1.473.g83b2b277ed-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v3 4/8] KVM: arm64: Add guard pages for pKVM (protected nVHE) hypervisor stack
@ 2022-02-24  5:13   ` Kalesh Singh
  0 siblings, 0 replies; 60+ messages in thread
From: Kalesh Singh @ 2022-02-24  5:13 UTC (permalink / raw)
  Cc: kernel-team, Catalin Marinas, linux-arm-kernel, will,
	Peter Collingbourne, maz, linux-kernel, Madhavan T. Venkataraman,
	Mark Brown, Masami Hiramatsu, Kalesh Singh, surenb, kvmarm

Maps the stack pages in the flexible private VA range and allocates
guard pages below the stack as unbacked VA space. The stack is aligned
to twice its size to aid overflow detection (implemented in a subsequent
patch in the series).

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
---

Changes in v3:
  - Handle null ptr in IS_ERR_OR_NULL checks, per Mark

 arch/arm64/kvm/hyp/nvhe/setup.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 27af337f9fea..5f3a4002f9c5 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -105,11 +105,28 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
 		if (ret)
 			return ret;
 
-		end = (void *)per_cpu_ptr(&kvm_init_params, i)->stack_hyp_va;
+		/*
+		 * Private mappings are allocated upwards from __io_map_base
+		 * so allocate the guard page first then the stack.
+		 */
+		start = (void *)pkvm_alloc_private_va_range(PAGE_SIZE, PAGE_SIZE);
+		if (IS_ERR_OR_NULL(start))
+			return start ? PTR_ERR(start) : -ENOMEM;
+
+		/*
+		 * The stack is aligned to twice its size to facilitate overflow
+		 * detection.
+		 */
+		end = (void *)per_cpu_ptr(&kvm_init_params, i)->stack_pa;
 		start = end - PAGE_SIZE;
-		ret = pkvm_create_mappings(start, end, PAGE_HYP);
-		if (ret)
-			return ret;
+		start = (void *)__pkvm_create_private_mapping((phys_addr_t)start,
+					PAGE_SIZE, PAGE_SIZE * 2, PAGE_HYP);
+		if (IS_ERR_OR_NULL(start))
+			return start ? PTR_ERR(start) : -ENOMEM;
+		end = start + PAGE_SIZE;
+
+		/* Update stack_hyp_va to end of the stack's private VA range */
+		per_cpu_ptr(&kvm_init_params, i)->stack_hyp_va = (unsigned long) end;
 	}
 
 	/*
-- 
2.35.1.473.g83b2b277ed-goog

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v3 5/8] KVM: arm64: Detect and handle hypervisor stack overflows
  2022-02-24  5:13 ` Kalesh Singh
  (?)
@ 2022-02-24  5:13   ` Kalesh Singh
  -1 siblings, 0 replies; 60+ messages in thread
From: Kalesh Singh @ 2022-02-24  5:13 UTC (permalink / raw)
  Cc: will, maz, qperret, tabba, surenb, kernel-team, Kalesh Singh,
	James Morse, Alexandru Elisei, Suzuki K Poulose, Catalin Marinas,
	Mark Rutland, Mark Brown, Masami Hiramatsu, Peter Collingbourne,
	Madhavan T. Venkataraman, Andrew Walbran, Andrew Scull,
	Zenghui Yu, linux-arm-kernel, kvmarm, linux-kernel

The hypervisor stacks (for both nVHE Hyp mode and nVHE protected mode)
are PAGE_SIZE in size and aligned to twice that size, meaning that the
PAGE_SHIFT bit of any valid stack address is 0. This allows us to
conveniently check for overflow in the exception entry without corrupting
any GPRs. We won't recover from a stack overflow, so panic the hypervisor.

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
---

Changes in v3:
  - Remove test_sp_overflow macro, per Mark
  - Add asmlinkage attribute for hyp_panic, hyp_panic_bad_stack, per Ard
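
Note for readers of the host.S hunk: the overflow test and the
register-preserving add/sub sequence can be modelled in plain C. This is
only a sketch and not part of the patch. struct regs, sp_overflowed() and
check_and_restore() are names made up for illustration, and PAGE_SHIFT is
assumed to be 12 (4 KiB pages).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical model of the two registers the sequence touches. */
struct regs {
	uint64_t sp;
	uint64_t x0;
};

/*
 * A one-page stack aligned to 2 * PAGE_SIZE occupies
 * [base, base + PAGE_SIZE) with bit PAGE_SHIFT of base clear, so every
 * address inside it has that bit clear. The page directly below the
 * stack (the guard page) has the bit set.
 */
static bool sp_overflowed(uint64_t sp)
{
	return (sp >> PAGE_SHIFT) & 1;
}

/*
 * C model of the add/sub/tbnz/sub/sub sequence. Unsigned arithmetic
 * wraps, so the identities hold for any register values.
 * Returns true when the overflow branch would be taken.
 */
static bool check_and_restore(struct regs *r)
{
	r->sp = r->sp + r->x0;	/* sp'  = sp + x0              */
	r->x0 = r->sp - r->x0;	/* x0'  = sp' - x0  = sp       */
	if (sp_overflowed(r->x0))
		return true;	/* tbnz taken, nothing restored */
	r->x0 = r->sp - r->x0;	/* x0'' = sp' - x0' = x0       */
	r->sp = r->sp - r->x0;	/* sp'' = sp' - x0'' = sp      */
	return false;
}
```

On the overflow path the model, like the assembly, leaves x0 holding the
faulting SP and restores neither register. That is acceptable here because
the handler goes on to panic.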

 arch/arm64/kvm/hyp/nvhe/host.S   | 24 ++++++++++++++++++++++++
 arch/arm64/kvm/hyp/nvhe/switch.c |  7 ++++++-
 2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/host.S b/arch/arm64/kvm/hyp/nvhe/host.S
index 3d613e721a75..749961bfa5ba 100644
--- a/arch/arm64/kvm/hyp/nvhe/host.S
+++ b/arch/arm64/kvm/hyp/nvhe/host.S
@@ -153,6 +153,18 @@ SYM_FUNC_END(__host_hvc)
 
 .macro invalid_host_el2_vect
 	.align 7
+
+	/*
+	 * Test whether the SP has overflowed, without corrupting a GPR.
+	 * nVHE hypervisor stacks are aligned so that SP & (1 << PAGE_SHIFT)
+	 * should always be zero.
+	 */
+	add	sp, sp, x0			// sp' = sp + x0
+	sub	x0, sp, x0			// x0' = sp' - x0 = (sp + x0) - x0 = sp
+	tbnz	x0, #PAGE_SHIFT, .L__hyp_sp_overflow\@
+	sub	x0, sp, x0			// x0'' = sp' - x0' = (sp + x0) - sp = x0
+	sub	sp, sp, x0			// sp'' = sp' - x0 = (sp + x0) - x0 = sp
+
 	/* If a guest is loaded, panic out of it. */
 	stp	x0, x1, [sp, #-16]!
 	get_loaded_vcpu x0, x1
@@ -165,6 +177,18 @@ SYM_FUNC_END(__host_hvc)
 	 * been partially clobbered by __host_enter.
 	 */
 	b	hyp_panic
+
+.L__hyp_sp_overflow\@:
+	/*
+	 * Reset SP to the top of the stack, to allow handling the hyp_panic.
+	 * This corrupts the stack but is ok, since we won't be attempting
+	 * any unwinding here.
+	 */
+	ldr_this_cpu	x0, kvm_init_params + NVHE_INIT_STACK_HYP_VA, x1
+	mov	sp, x0
+
+	bl	hyp_panic_bad_stack
+	ASM_BUG()
 .endm
 
 .macro invalid_host_el1_vect
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index 6410d21d8695..703a5d3f611b 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -347,7 +347,7 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 	return exit_code;
 }
 
-void __noreturn hyp_panic(void)
+asmlinkage void __noreturn hyp_panic(void)
 {
 	u64 spsr = read_sysreg_el2(SYS_SPSR);
 	u64 elr = read_sysreg_el2(SYS_ELR);
@@ -369,6 +369,11 @@ void __noreturn hyp_panic(void)
 	unreachable();
 }
 
+asmlinkage void __noreturn hyp_panic_bad_stack(void)
+{
+	hyp_panic();
+}
+
 asmlinkage void kvm_unexpected_el2_exception(void)
 {
 	return __kvm_unexpected_el2_exception();
-- 
2.35.1.473.g83b2b277ed-goog



* [PATCH v3 6/8] KVM: arm64: Add hypervisor overflow stack
  2022-02-24  5:13 ` Kalesh Singh
  (?)
@ 2022-02-24  5:13   ` Kalesh Singh
  -1 siblings, 0 replies; 60+ messages in thread
From: Kalesh Singh @ 2022-02-24  5:13 UTC (permalink / raw)
  Cc: will, maz, qperret, tabba, surenb, kernel-team, Kalesh Singh,
	James Morse, Alexandru Elisei, Suzuki K Poulose, Catalin Marinas,
	Mark Rutland, Mark Brown, Masami Hiramatsu, Peter Collingbourne,
	Madhavan T. Venkataraman, Andrew Walbran, Andrew Scull,
	Zenghui Yu, Ard Biesheuvel, Paolo Bonzini, linux-arm-kernel,
	kvmarm, linux-kernel

Allocate and switch to a 16-byte aligned secondary stack on overflow. This
provides us with stack space to better handle overflows, and is used in
a subsequent patch to dump the hypervisor stacktrace. The overflow stack
is only allocated if CONFIG_NVHE_EL2_DEBUG is enabled, as hypervisor
stacktraces are a debug feature dependent on CONFIG_NVHE_EL2_DEBUG.

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
---
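
Note: the layout this patch sets up can be pictured with a small
user-space C sketch. It is not part of the patch. NR_CPUS, overflow_stack
and overflow_stack_top() are hypothetical stand-ins for the per-CPU
machinery, and PAGE_SIZE is assumed to be 4096. The stack grows down, so
the initial SP is the address one past the end of the array, which is what
"hyp_overflow_stack + PAGE_SIZE" computes in host.S.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

#define PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */
#define NR_CPUS   4		/* hypothetical CPU count  */

/*
 * One page-sized stack per CPU. Aligning the whole array to 16 bytes
 * also aligns every row, since each row is PAGE_SIZE bytes long.
 */
static alignas(16) unsigned long overflow_stack[NR_CPUS][PAGE_SIZE / sizeof(long)];

/* Initial SP for a CPU: the top (highest address) of its stack. */
static uintptr_t overflow_stack_top(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	return (uintptr_t)overflow_stack[cpu] + PAGE_SIZE;
}
```

Because the base is 16-byte aligned and PAGE_SIZE is a multiple of 16, the
resulting top-of-stack value is 16-byte aligned too.
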
 arch/arm64/kvm/hyp/nvhe/host.S   | 5 +++++
 arch/arm64/kvm/hyp/nvhe/switch.c | 5 +++++
 2 files changed, 10 insertions(+)

diff --git a/arch/arm64/kvm/hyp/nvhe/host.S b/arch/arm64/kvm/hyp/nvhe/host.S
index 749961bfa5ba..367a01e8abed 100644
--- a/arch/arm64/kvm/hyp/nvhe/host.S
+++ b/arch/arm64/kvm/hyp/nvhe/host.S
@@ -179,6 +179,10 @@ SYM_FUNC_END(__host_hvc)
 	b	hyp_panic
 
 .L__hyp_sp_overflow\@:
+#ifdef CONFIG_NVHE_EL2_DEBUG
+	/* Switch to the overflow stack */
+	adr_this_cpu sp, hyp_overflow_stack + PAGE_SIZE, x0
+#else
 	/*
 	 * Reset SP to the top of the stack, to allow handling the hyp_panic.
 	 * This corrupts the stack but is ok, since we won't be attempting
@@ -186,6 +190,7 @@ SYM_FUNC_END(__host_hvc)
 	 */
 	ldr_this_cpu	x0, kvm_init_params + NVHE_INIT_STACK_HYP_VA, x1
 	mov	sp, x0
+#endif
 
 	bl	hyp_panic_bad_stack
 	ASM_BUG()
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index 703a5d3f611b..efc20273a352 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -34,6 +34,11 @@ DEFINE_PER_CPU(struct kvm_host_data, kvm_host_data);
 DEFINE_PER_CPU(struct kvm_cpu_context, kvm_hyp_ctxt);
 DEFINE_PER_CPU(unsigned long, kvm_hyp_vector);
 
+#ifdef CONFIG_NVHE_EL2_DEBUG
+DEFINE_PER_CPU(unsigned long [PAGE_SIZE/sizeof(long)], hyp_overflow_stack)
+	__aligned(16);
+#endif
+
 static void __activate_traps(struct kvm_vcpu *vcpu)
 {
 	u64 val;
-- 
2.35.1.473.g83b2b277ed-goog



* [PATCH v3 7/8] KVM: arm64: Unwind and dump nVHE HYP stacktrace
  2022-02-24  5:13 ` Kalesh Singh
  (?)
@ 2022-02-24  5:13   ` Kalesh Singh
  -1 siblings, 0 replies; 60+ messages in thread
From: Kalesh Singh @ 2022-02-24  5:13 UTC (permalink / raw)
  Cc: will, maz, qperret, tabba, surenb, kernel-team, Kalesh Singh,
	James Morse, Alexandru Elisei, Suzuki K Poulose, Catalin Marinas,
	Mark Rutland, Mark Brown, Masami Hiramatsu, Peter Collingbourne,
	Madhavan T. Venkataraman, Andrew Scull, Paolo Bonzini,
	Ard Biesheuvel, linux-arm-kernel, kvmarm, linux-kernel

Unwind the hypervisor stack from EL1 when CONFIG_NVHE_EL2_DEBUG is enabled.
This is possible because CONFIG_NVHE_EL2_DEBUG disables the host stage 2
protection, which allows the host to access the hypervisor stack pages in EL1.

Unwinding and dumping hyp call traces is gated on CONFIG_NVHE_EL2_DEBUG
to avoid the potential leaking of information to the host.

A simple stack overflow test produces the following output:

[  580.376051][  T412] kvm: nVHE hyp panic at: ffffffc0116145c4!
[  580.378034][  T412] kvm [412]: nVHE HYP call trace:
[  580.378591][  T412] kvm [412]:  [<ffffffc011614934>]
[  580.378993][  T412] kvm [412]:  [<ffffffc01160fa48>]
[  580.379386][  T412] kvm [412]:  [<ffffffc0116145dc>]  // Non-terminating recursive call
[  580.379772][  T412] kvm [412]:  [<ffffffc0116145dc>]
[  580.380158][  T412] kvm [412]:  [<ffffffc0116145dc>]
[  580.380544][  T412] kvm [412]:  [<ffffffc0116145dc>]
[  580.380928][  T412] kvm [412]:  [<ffffffc0116145dc>]
. . .

Since nVHE hyp symbols are not included by kallsyms to avoid issues
with aliasing, we fall back to the vmlinux addresses. Symbolizing the
addresses is handled in the next patch in this series.

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
---

Changes in v3:
  - The nvhe hyp stack unwinder now makes use of the core logic from the
    regular kernel unwinder to avoid duplication, per Mark

Changes in v2:
  - Add cpu_prepare_nvhe_panic_info()
  - Move updating the panic info to hyp_panic(), so that unwinding also
    works for conventional nVHE Hyp-mode.
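
Note: the core idea of unwinding a stack that lives in another address
space can be sketched in user-space C as below. This is a simplified
model, not the code in this patch. It handles a single stack only (no
overflow stack, no stacks_done tracking), and struct stack_map,
hyp_to_kern() and unwind() are hypothetical names. Each frame record is
assumed to be two 64-bit words, the saved FP followed by the saved LR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_SIZE 4096UL	/* assumption: one 4 KiB page */

/* The same stack memory as seen from the two address spaces. */
struct stack_map {
	uint64_t hyp_base;		/* stack base as a hypervisor VA */
	const unsigned char *kern;	/* stack base as seen by the unwinder */
};

/* Translate a hyp VA by its offset from the stack base, or NULL if out of range. */
static const unsigned char *hyp_to_kern(const struct stack_map *m,
					uint64_t hyp_va, size_t size)
{
	if (hyp_va < m->hyp_base || hyp_va - m->hyp_base > STACK_SIZE - size)
		return NULL;
	return m->kern + (hyp_va - m->hyp_base);
}

/* Record up to max PCs starting from (fp, pc); returns how many were stored. */
static size_t unwind(const struct stack_map *m, uint64_t fp, uint64_t pc,
		     uint64_t *pcs, size_t max)
{
	uint64_t prev_fp = 0;
	size_t n = 0;

	assert(m && pcs);
	while (n < max) {
		const unsigned char *rec;
		uint64_t next[2];

		pcs[n++] = pc;
		/* Reject misaligned records and records that do not move up. */
		if ((fp & 0x7) || fp <= prev_fp)
			break;
		rec = hyp_to_kern(m, fp, sizeof(next));
		if (!rec)
			break;
		memcpy(next, rec, sizeof(next));	/* { saved FP, saved LR } */
		prev_fp = fp;
		fp = next[0];
		pc = next[1];
	}
	return n;
}
```

The "fp <= prev_fp" test is what stops the walk on a cyclic or
self-referencing frame record, and the bounds check in hyp_to_kern() plays
the role of the on-stack check before any dereference.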

 arch/arm64/include/asm/kvm_asm.h    |  19 +++
 arch/arm64/include/asm/stacktrace.h |  12 ++
 arch/arm64/kernel/stacktrace.c      | 210 +++++++++++++++++++++++++---
 arch/arm64/kvm/Kconfig              |   5 +-
 arch/arm64/kvm/arm.c                |   2 +-
 arch/arm64/kvm/handle_exit.c        |   3 +
 arch/arm64/kvm/hyp/nvhe/switch.c    |  18 +++
 7 files changed, 243 insertions(+), 26 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 2e277f2ed671..16efdf150a37 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -176,6 +176,25 @@ struct kvm_nvhe_init_params {
 	unsigned long vtcr;
 };
 
+#ifdef CONFIG_NVHE_EL2_DEBUG
+/*
+ * Used by the host in EL1 to dump the nVHE hypervisor backtrace on
+ * hyp_panic. This is possible because CONFIG_NVHE_EL2_DEBUG disables
+ * the host stage 2 protection. See: __hyp_do_panic()
+ *
+ * @hyp_stack_base:             hyp VA of the hyp_stack base.
+ * @hyp_overflow_stack_base:    hyp VA of the hyp_overflow_stack base.
+ * @fp:                         hyp FP where the backtrace begins.
+ * @pc:                         hyp PC where the backtrace begins.
+ */
+struct kvm_nvhe_panic_info {
+	unsigned long hyp_stack_base;
+	unsigned long hyp_overflow_stack_base;
+	unsigned long fp;
+	unsigned long pc;
+};
+#endif /* CONFIG_NVHE_EL2_DEBUG */
+
 /* Translate a kernel address @ptr into its equivalent linear mapping */
 #define kvm_ksym_ref(ptr)						\
 	({								\
diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
index e77cdef9ca29..18611a51cf14 100644
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -22,6 +22,10 @@ enum stack_type {
 	STACK_TYPE_OVERFLOW,
 	STACK_TYPE_SDEI_NORMAL,
 	STACK_TYPE_SDEI_CRITICAL,
+#ifdef CONFIG_NVHE_EL2_DEBUG
+	STACK_TYPE_KVM_NVHE_HYP,
+	STACK_TYPE_KVM_NVHE_OVERFLOW,
+#endif /* CONFIG_NVHE_EL2_DEBUG */
 	__NR_STACK_TYPES
 };
 
@@ -147,4 +151,12 @@ static inline bool on_accessible_stack(const struct task_struct *tsk,
 	return false;
 }
 
+#ifdef CONFIG_NVHE_EL2_DEBUG
+void kvm_nvhe_dump_backtrace(unsigned long hyp_offset);
+#else
+static inline void kvm_nvhe_dump_backtrace(unsigned long hyp_offset)
+{
+}
+#endif /* CONFIG_NVHE_EL2_DEBUG */
+
 #endif	/* __ASM_STACKTRACE_H */
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index e4103e085681..6ec85cb69b1f 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -15,6 +15,8 @@
 
 #include <asm/irq.h>
 #include <asm/pointer_auth.h>
+#include <asm/kvm_asm.h>
+#include <asm/kvm_hyp.h>
 #include <asm/stack_pointer.h>
 #include <asm/stacktrace.h>
 
@@ -64,26 +66,15 @@ NOKPROBE_SYMBOL(start_backtrace);
  * records (e.g. a cycle), determined based on the location and fp value of A
  * and the location (but not the fp value) of B.
  */
-static int notrace unwind_frame(struct task_struct *tsk,
-				struct stackframe *frame)
+static int notrace __unwind_frame(struct stackframe *frame, struct stack_info *info,
+		unsigned long (*translate_fp)(unsigned long, enum stack_type))
 {
 	unsigned long fp = frame->fp;
-	struct stack_info info;
-
-	if (!tsk)
-		tsk = current;
-
-	/* Final frame; nothing to unwind */
-	if (fp == (unsigned long)task_pt_regs(tsk)->stackframe)
-		return -ENOENT;
 
 	if (fp & 0x7)
 		return -EINVAL;
 
-	if (!on_accessible_stack(tsk, fp, 16, &info))
-		return -EINVAL;
-
-	if (test_bit(info.type, frame->stacks_done))
+	if (test_bit(info->type, frame->stacks_done))
 		return -EINVAL;
 
 	/*
@@ -94,28 +85,62 @@ static int notrace unwind_frame(struct task_struct *tsk,
 	 *
 	 * TASK -> IRQ -> OVERFLOW -> SDEI_NORMAL
 	 * TASK -> SDEI_NORMAL -> SDEI_CRITICAL -> OVERFLOW
+	 * KVM_NVHE_HYP -> KVM_NVHE_OVERFLOW
 	 *
 	 * ... but the nesting itself is strict. Once we transition from one
 	 * stack to another, it's never valid to unwind back to that first
 	 * stack.
 	 */
-	if (info.type == frame->prev_type) {
+	if (info->type == frame->prev_type) {
 		if (fp <= frame->prev_fp)
 			return -EINVAL;
 	} else {
 		set_bit(frame->prev_type, frame->stacks_done);
 	}
 
+	/* Record fp as prev_fp before attempting to get the next fp */
+	frame->prev_fp = fp;
+
+	/*
+	 * If fp is not from the current address space perform the
+	 * necessary translation before dereferencing it to get next fp.
+	 */
+	if (translate_fp)
+		fp = translate_fp(fp, info->type);
+	if (!fp)
+		return -EINVAL;
+
 	/*
 	 * Record this frame record's values and location. The prev_fp and
-	 * prev_type are only meaningful to the next unwind_frame() invocation.
+	 * prev_type are only meaningful to the next __unwind_frame() invocation.
 	 */
 	frame->fp = READ_ONCE_NOCHECK(*(unsigned long *)(fp));
 	frame->pc = READ_ONCE_NOCHECK(*(unsigned long *)(fp + 8));
-	frame->prev_fp = fp;
-	frame->prev_type = info.type;
-
 	frame->pc = ptrauth_strip_insn_pac(frame->pc);
+	frame->prev_type = info->type;
+
+	return 0;
+}
+
+static int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
+{
+	unsigned long fp = frame->fp;
+	struct stack_info info;
+	int err;
+
+	if (!tsk)
+		tsk = current;
+
+	/* Final frame; nothing to unwind */
+	if (fp == (unsigned long)task_pt_regs(tsk)->stackframe)
+		return -ENOENT;
+
+	if (!on_accessible_stack(tsk, fp, 16, &info))
+		return -EINVAL;
+
+	err = __unwind_frame(frame, &info, NULL);
+	if (err)
+		return err;
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 	if (tsk->ret_stack &&
@@ -143,20 +168,27 @@ static int notrace unwind_frame(struct task_struct *tsk,
 }
 NOKPROBE_SYMBOL(unwind_frame);
 
-static void notrace walk_stackframe(struct task_struct *tsk,
-				    struct stackframe *frame,
-				    bool (*fn)(void *, unsigned long), void *data)
+static void notrace __walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
+		bool (*fn)(void *, unsigned long), void *data,
+		int (*unwind_frame_fn)(struct task_struct *tsk, struct stackframe *frame))
 {
 	while (1) {
 		int ret;
 
 		if (!fn(data, frame->pc))
 			break;
-		ret = unwind_frame(tsk, frame);
+		ret = unwind_frame_fn(tsk, frame);
 		if (ret < 0)
 			break;
 	}
 }
+
+static void notrace walk_stackframe(struct task_struct *tsk,
+				    struct stackframe *frame,
+				    bool (*fn)(void *, unsigned long), void *data)
+{
+	__walk_stackframe(tsk, frame, fn, data, unwind_frame);
+}
 NOKPROBE_SYMBOL(walk_stackframe);
 
 static bool dump_backtrace_entry(void *arg, unsigned long where)
@@ -210,3 +242,135 @@ noinline notrace void arch_stack_walk(stack_trace_consume_fn consume_entry,
 
 	walk_stackframe(task, &frame, consume_entry, cookie);
 }
+
+#ifdef CONFIG_NVHE_EL2_DEBUG
+DECLARE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
+DECLARE_KVM_NVHE_PER_CPU(unsigned long [PAGE_SIZE/sizeof(long)], hyp_overflow_stack);
+DECLARE_KVM_NVHE_PER_CPU(struct kvm_nvhe_panic_info, kvm_panic_info);
+
+static inline bool kvm_nvhe_on_overflow_stack(unsigned long sp, unsigned long size,
+				 struct stack_info *info)
+{
+	struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
+	unsigned long low = (unsigned long)panic_info->hyp_overflow_stack_base;
+	unsigned long high = low + PAGE_SIZE;
+
+	return on_stack(sp, size, low, high, STACK_TYPE_KVM_NVHE_OVERFLOW, info);
+}
+
+static inline bool kvm_nvhe_on_hyp_stack(unsigned long sp, unsigned long size,
+				 struct stack_info *info)
+{
+	struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
+	unsigned long low = (unsigned long)panic_info->hyp_stack_base;
+	unsigned long high = low + PAGE_SIZE;
+
+	return on_stack(sp, size, low, high, STACK_TYPE_KVM_NVHE_HYP, info);
+}
+
+static inline bool kvm_nvhe_on_accessible_stack(unsigned long sp, unsigned long size,
+				       struct stack_info *info)
+{
+	if (info)
+		info->type = STACK_TYPE_UNKNOWN;
+
+	if (kvm_nvhe_on_hyp_stack(sp, size, info))
+		return true;
+	if (kvm_nvhe_on_overflow_stack(sp, size, info))
+		return true;
+
+	return false;
+}
+
+static unsigned long kvm_nvhe_hyp_stack_kern_va(unsigned long addr)
+{
+	struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
+	unsigned long hyp_base, kern_base, hyp_offset;
+
+	hyp_base = (unsigned long)panic_info->hyp_stack_base;
+	hyp_offset = addr - hyp_base;
+
+	kern_base = (unsigned long)*this_cpu_ptr(&kvm_arm_hyp_stack_page);
+
+	return kern_base + hyp_offset;
+}
+
+static unsigned long kvm_nvhe_overflow_stack_kern_va(unsigned long addr)
+{
+	struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
+	unsigned long hyp_base, kern_base, hyp_offset;
+
+	hyp_base = (unsigned long)panic_info->hyp_overflow_stack_base;
+	hyp_offset = addr - hyp_base;
+
+	kern_base = (unsigned long)this_cpu_ptr_nvhe_sym(hyp_overflow_stack);
+
+	return kern_base + hyp_offset;
+}
+
+/*
+ * Convert KVM nVHE hypervisor stack VA to a kernel VA.
+ *
+ * The nVHE hypervisor stack is mapped in the flexible 'private' VA range, to allow
+ * for guard pages below the stack. Consequently, the fixed offset address
+ * translation macros won't work here.
+ *
+ * The kernel VA is calculated as an offset from the kernel VA of the hypervisor
+ * stack base. See: kvm_nvhe_hyp_stack_kern_va(), kvm_nvhe_overflow_stack_kern_va()
+ */
+static unsigned long kvm_nvhe_stack_kern_va(unsigned long addr,
+					enum stack_type type)
+{
+	switch (type) {
+	case STACK_TYPE_KVM_NVHE_HYP:
+		return kvm_nvhe_hyp_stack_kern_va(addr);
+	case STACK_TYPE_KVM_NVHE_OVERFLOW:
+		return kvm_nvhe_overflow_stack_kern_va(addr);
+	default:
+		return 0UL;
+	}
+}
+
+static int notrace kvm_nvhe_unwind_frame(struct task_struct *tsk,
+					struct stackframe *frame)
+{
+	struct stack_info info;
+
+	if (!kvm_nvhe_on_accessible_stack(frame->fp, 16, &info))
+		return -EINVAL;
+
+	return __unwind_frame(frame, &info, kvm_nvhe_stack_kern_va);
+}
+
+static bool kvm_nvhe_dump_backtrace_entry(void *arg, unsigned long where)
+{
+	unsigned long va_mask = GENMASK_ULL(vabits_actual - 1, 0);
+	unsigned long hyp_offset = (unsigned long)arg;
+
+	where &= va_mask;	/* Mask tags */
+	where += hyp_offset;	/* Convert to kern addr */
+
+	kvm_err("[<%016lx>] %pB\n", where, (void *)where);
+
+	return true;
+}
+
+static void notrace kvm_nvhe_walk_stackframe(struct task_struct *tsk,
+				    struct stackframe *frame,
+				    bool (*fn)(void *, unsigned long), void *data)
+{
+	__walk_stackframe(tsk, frame, fn, data, kvm_nvhe_unwind_frame);
+}
+
+void kvm_nvhe_dump_backtrace(unsigned long hyp_offset)
+{
+	struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
+	struct stackframe frame;
+
+	start_backtrace(&frame, panic_info->fp, panic_info->pc);
+	pr_err("nVHE HYP call trace:\n");
+	kvm_nvhe_walk_stackframe(NULL, &frame, kvm_nvhe_dump_backtrace_entry,
+					(void *)hyp_offset);
+	pr_err("---- end of nVHE HYP call trace ----\n");
+}
+#endif /* CONFIG_NVHE_EL2_DEBUG */
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 8a5fbbf084df..75f2c8255ff0 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -51,8 +51,9 @@ config NVHE_EL2_DEBUG
 	depends on KVM
 	help
 	  Say Y here to enable the debug mode for the non-VHE KVM EL2 object.
-	  Failure reports will BUG() in the hypervisor. This is intended for
-	  local EL2 hypervisor development.
+	  Failure reports will BUG() in the hypervisor, and panics will print
+	  the hypervisor call stack. This is intended for local EL2 hypervisor
+	  development.
 
 	  If unsure, say N.
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 7a23630c4a7f..66c07c04eb52 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -49,7 +49,7 @@ DEFINE_STATIC_KEY_FALSE(kvm_protected_mode_initialized);
 
 DECLARE_KVM_HYP_PER_CPU(unsigned long, kvm_hyp_vector);
 
-static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
+DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
 unsigned long kvm_arm_hyp_percpu_base[NR_CPUS];
 DECLARE_KVM_NVHE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
 
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index e3140abd2e2e..ff69dff33700 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -17,6 +17,7 @@
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_mmu.h>
 #include <asm/debug-monitors.h>
+#include <asm/stacktrace.h>
 #include <asm/traps.h>
 
 #include <kvm/arm_hypercalls.h>
@@ -326,6 +327,8 @@ void __noreturn __cold nvhe_hyp_panic_handler(u64 esr, u64 spsr,
 		kvm_err("nVHE hyp panic at: %016llx!\n", elr_virt + hyp_offset);
 	}
 
+	kvm_nvhe_dump_backtrace(hyp_offset);
+
 	/*
 	 * Hyp has panicked and we're going to handle that by panicking the
 	 * kernel. The kernel offset will be revealed in the panic so we're
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index efc20273a352..b8ecffc47424 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -37,6 +37,22 @@ DEFINE_PER_CPU(unsigned long, kvm_hyp_vector);
 #ifdef CONFIG_NVHE_EL2_DEBUG
 DEFINE_PER_CPU(unsigned long [PAGE_SIZE/sizeof(long)], hyp_overflow_stack)
 	__aligned(16);
+DEFINE_PER_CPU(struct kvm_nvhe_panic_info, kvm_panic_info);
+
+static inline void cpu_prepare_nvhe_panic_info(void)
+{
+	struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr(&kvm_panic_info);
+	struct kvm_nvhe_init_params *params = this_cpu_ptr(&kvm_init_params);
+
+	panic_info->hyp_stack_base = (unsigned long)(params->stack_hyp_va - PAGE_SIZE);
+	panic_info->hyp_overflow_stack_base = (unsigned long)this_cpu_ptr(hyp_overflow_stack);
+	panic_info->fp = (unsigned long)__builtin_frame_address(0);
+	panic_info->pc = _THIS_IP_;
+}
+#else
+static inline void cpu_prepare_nvhe_panic_info(void)
+{
+}
 #endif
 
 static void __activate_traps(struct kvm_vcpu *vcpu)
@@ -360,6 +376,8 @@ asmlinkage void __noreturn hyp_panic(void)
 	struct kvm_cpu_context *host_ctxt;
 	struct kvm_vcpu *vcpu;
 
+	cpu_prepare_nvhe_panic_info();
+
 	host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
 	vcpu = host_ctxt->__hyp_running_vcpu;
 
-- 
2.35.1.473.g83b2b277ed-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v3 7/8] KVM: arm64: Unwind and dump nVHE HYP stacktrace
@ 2022-02-24  5:13   ` Kalesh Singh
  0 siblings, 0 replies; 60+ messages in thread
From: Kalesh Singh @ 2022-02-24  5:13 UTC (permalink / raw)
  Cc: will, maz, qperret, tabba, surenb, kernel-team, Kalesh Singh,
	James Morse, Alexandru Elisei, Suzuki K Poulose, Catalin Marinas,
	Mark Rutland, Mark Brown, Masami Hiramatsu, Peter Collingbourne,
	Madhavan T. Venkataraman, Andrew Scull, Paolo Bonzini,
	Ard Biesheuvel, linux-arm-kernel, kvmarm, linux-kernel

Unwind the hypervisor stack from EL1 when CONFIG_NVHE_EL2_DEBUG is enabled.
This is possible because CONFIG_NVHE_EL2_DEBUG disables the host stage 2
protection, which allows the host to access the hypervisor stack pages.
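
As a rough user-space illustration of the approach (not the kernel code
itself), the sketch below walks frame records whose frame pointers are hyp
VAs by translating each one to a locally readable address before it is
dereferenced. The names stack_page, HYP_STACK_BASE, hyp_to_host() and
unwind() are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS	64
/* Hypothetical hyp VA of the stack base; a real value comes from the hypervisor. */
#define HYP_STACK_BASE	UINT64_C(0x40000000)

/* Stand-in for the host's own mapping of the hypervisor stack page. */
static uint64_t stack_page[STACK_WORDS];

/* Translate a hyp VA of a 16-byte frame record to a host pointer, or NULL. */
static const uint64_t *hyp_to_host(uint64_t hyp_va)
{
	if (hyp_va < HYP_STACK_BASE ||
	    hyp_va + 16 > HYP_STACK_BASE + sizeof(stack_page))
		return NULL;

	return (const uint64_t *)((const char *)stack_page +
				  (hyp_va - HYP_STACK_BASE));
}

/* Follow {next fp, return address} records; returns the number of PCs stored. */
static int unwind(uint64_t fp, uint64_t pc, uint64_t *out, int max)
{
	uint64_t prev_fp = 0;
	int n = 0;

	assert(out && max > 0);

	while (n < max) {
		const uint64_t *rec;

		out[n++] = pc;

		if ((fp & 0x7) || fp <= prev_fp)
			break;		/* misaligned, or not moving up the stack */

		rec = hyp_to_host(fp);
		if (!rec)
			break;		/* fp is outside the known stack */

		prev_fp = fp;		/* compare in hyp VA space, as the patch does */
		fp = rec[0];
		pc = rec[1];
	}

	return n;
}
```

Seeding stack_page with a couple of records and calling unwind() with the
hyp VA of the first one reports the starting PC followed by each saved
return address. The walk ends as soon as a frame pointer fails the
alignment, ordering or range check.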

Unwinding and dumping hyp call traces is gated on CONFIG_NVHE_EL2_DEBUG
to avoid potentially leaking information to the host.

A simple stack overflow test produces the following output:

[  580.376051][  T412] kvm: nVHE hyp panic at: ffffffc0116145c4!
[  580.378034][  T412] kvm [412]: nVHE HYP call trace:
[  580.378591][  T412] kvm [412]:  [<ffffffc011614934>]
[  580.378993][  T412] kvm [412]:  [<ffffffc01160fa48>]
[  580.379386][  T412] kvm [412]:  [<ffffffc0116145dc>]  // Non-terminating recursive call
[  580.379772][  T412] kvm [412]:  [<ffffffc0116145dc>]
[  580.380158][  T412] kvm [412]:  [<ffffffc0116145dc>]
[  580.380544][  T412] kvm [412]:  [<ffffffc0116145dc>]
[  580.380928][  T412] kvm [412]:  [<ffffffc0116145dc>]
. . .

Since nVHE hyp symbols are excluded from kallsyms to avoid issues with
aliasing, we fall back to printing the vmlinux addresses. Symbolizing the
addresses is handled in the next patch in this series.
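
For reference, the per-entry address conversion done before printing can be
sketched in plain C as below. It masks off the bits above the VA width and
then adds the hyp offset. The helper names va_mask() and hyp_pc_to_kern()
are hypothetical, and the VA width and offset in the usage note are made-up
values chosen only to produce an address of the same shape as the trace
above.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with bits [va_bits-1:0] set, similar to GENMASK_ULL(va_bits - 1, 0). */
static uint64_t va_mask(unsigned int va_bits)
{
	assert(va_bits >= 1 && va_bits <= 64);

	return va_bits == 64 ? ~UINT64_C(0) : (UINT64_C(1) << va_bits) - 1;
}

/* Convert a hyp return address into the kernel-image address to print. */
static uint64_t hyp_pc_to_kern(uint64_t where, unsigned int va_bits,
			       uint64_t hyp_offset)
{
	where &= va_mask(va_bits);	/* drop the upper (tag) bits */
	where += hyp_offset;		/* rebase onto the kernel image */

	return where;
}
```

With an assumed 39-bit VA width and an assumed offset of 0xffffffc000000000,
hyp_pc_to_kern(0xab00000011614934, 39, 0xffffffc000000000) evaluates to
0xffffffc011614934.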

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
---

Changes in v3:
  - The nvhe hyp stack unwinder now makes use of the core logic from the
    regular kernel unwinder to avoid duplication, per Mark

Changes in v2:
  - Add cpu_prepare_nvhe_panic_info()
  - Move updating the panic info to hyp_panic(), so that unwinding also
    works for conventional nVHE Hyp-mode.
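
The core logic shared with the regular unwinder includes the stack nesting
check: frame pointers must strictly increase while on one stack, and once
the walk leaves a stack it may never return to it. A minimal sketch of that
check, assuming simplified hypothetical types (walk_state, check_step()) in
place of struct stackframe and the kernel bitmap helpers:

```c
#include <assert.h>
#include <stdint.h>

enum stack_type { STACK_UNKNOWN, STACK_HYP, STACK_OVERFLOW, NR_STACK_TYPES };

struct walk_state {
	unsigned long stacks_done;	/* bitmap of stacks the walk has left */
	enum stack_type prev_type;	/* stack of the previous frame record */
	uint64_t prev_fp;		/* location of the previous frame record */
};

/* Return 0 if stepping to a record at (type, fp) is legal, -1 otherwise. */
static int check_step(struct walk_state *st, enum stack_type type, uint64_t fp)
{
	assert(st && type < NR_STACK_TYPES);

	if (st->stacks_done & (1UL << type))
		return -1;		/* would re-enter a stack already left */

	if (type == st->prev_type) {
		if (fp <= st->prev_fp)
			return -1;	/* same stack: must move strictly upward */
	} else {
		st->stacks_done |= 1UL << st->prev_type;
	}

	st->prev_fp = fp;
	st->prev_type = type;

	return 0;
}
```

A walk that starts on the overflow stack and then moves to the hyp stack is
accepted step by step, while a later record that points back into the
overflow stack is rejected.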

 arch/arm64/include/asm/kvm_asm.h    |  19 +++
 arch/arm64/include/asm/stacktrace.h |  12 ++
 arch/arm64/kernel/stacktrace.c      | 210 +++++++++++++++++++++++++---
 arch/arm64/kvm/Kconfig              |   5 +-
 arch/arm64/kvm/arm.c                |   2 +-
 arch/arm64/kvm/handle_exit.c        |   3 +
 arch/arm64/kvm/hyp/nvhe/switch.c    |  18 +++
 7 files changed, 243 insertions(+), 26 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 2e277f2ed671..16efdf150a37 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -176,6 +176,25 @@ struct kvm_nvhe_init_params {
 	unsigned long vtcr;
 };
 
+#ifdef CONFIG_NVHE_EL2_DEBUG
+/*
+ * Used by the host in EL1 to dump the nVHE hypervisor backtrace on
+ * hyp_panic. This is possible because CONFIG_NVHE_EL2_DEBUG disables
+ * the host stage 2 protection. See: __hyp_do_panic()
+ *
+ * @hyp_stack_base:             hyp VA of the hyp_stack base.
+ * @hyp_overflow_stack_base:    hyp VA of the hyp_overflow_stack base.
+ * @fp:                         hyp FP where the backtrace begins.
+ * @pc:                         hyp PC where the backtrace begins.
+ */
+struct kvm_nvhe_panic_info {
+	unsigned long hyp_stack_base;
+	unsigned long hyp_overflow_stack_base;
+	unsigned long fp;
+	unsigned long pc;
+};
+#endif /* CONFIG_NVHE_EL2_DEBUG */
+
 /* Translate a kernel address @ptr into its equivalent linear mapping */
 #define kvm_ksym_ref(ptr)						\
 	({								\
diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
index e77cdef9ca29..18611a51cf14 100644
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -22,6 +22,10 @@ enum stack_type {
 	STACK_TYPE_OVERFLOW,
 	STACK_TYPE_SDEI_NORMAL,
 	STACK_TYPE_SDEI_CRITICAL,
+#ifdef CONFIG_NVHE_EL2_DEBUG
+	STACK_TYPE_KVM_NVHE_HYP,
+	STACK_TYPE_KVM_NVHE_OVERFLOW,
+#endif /* CONFIG_NVHE_EL2_DEBUG */
 	__NR_STACK_TYPES
 };
 
@@ -147,4 +151,12 @@ static inline bool on_accessible_stack(const struct task_struct *tsk,
 	return false;
 }
 
+#ifdef CONFIG_NVHE_EL2_DEBUG
+void kvm_nvhe_dump_backtrace(unsigned long hyp_offset);
+#else
+static inline void kvm_nvhe_dump_backtrace(unsigned long hyp_offset)
+{
+}
+#endif /* CONFIG_NVHE_EL2_DEBUG */
+
 #endif	/* __ASM_STACKTRACE_H */
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index e4103e085681..6ec85cb69b1f 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -15,6 +15,8 @@
 
 #include <asm/irq.h>
 #include <asm/pointer_auth.h>
+#include <asm/kvm_asm.h>
+#include <asm/kvm_hyp.h>
 #include <asm/stack_pointer.h>
 #include <asm/stacktrace.h>
 
@@ -64,26 +66,15 @@ NOKPROBE_SYMBOL(start_backtrace);
  * records (e.g. a cycle), determined based on the location and fp value of A
  * and the location (but not the fp value) of B.
  */
-static int notrace unwind_frame(struct task_struct *tsk,
-				struct stackframe *frame)
+static int notrace __unwind_frame(struct stackframe *frame, struct stack_info *info,
+		unsigned long (*translate_fp)(unsigned long, enum stack_type))
 {
 	unsigned long fp = frame->fp;
-	struct stack_info info;
-
-	if (!tsk)
-		tsk = current;
-
-	/* Final frame; nothing to unwind */
-	if (fp == (unsigned long)task_pt_regs(tsk)->stackframe)
-		return -ENOENT;
 
 	if (fp & 0x7)
 		return -EINVAL;
 
-	if (!on_accessible_stack(tsk, fp, 16, &info))
-		return -EINVAL;
-
-	if (test_bit(info.type, frame->stacks_done))
+	if (test_bit(info->type, frame->stacks_done))
 		return -EINVAL;
 
 	/*
@@ -94,28 +85,62 @@ static int notrace unwind_frame(struct task_struct *tsk,
 	 *
 	 * TASK -> IRQ -> OVERFLOW -> SDEI_NORMAL
 	 * TASK -> SDEI_NORMAL -> SDEI_CRITICAL -> OVERFLOW
+	 * KVM_NVHE_HYP -> KVM_NVHE_OVERFLOW
 	 *
 	 * ... but the nesting itself is strict. Once we transition from one
 	 * stack to another, it's never valid to unwind back to that first
 	 * stack.
 	 */
-	if (info.type == frame->prev_type) {
+	if (info->type == frame->prev_type) {
 		if (fp <= frame->prev_fp)
 			return -EINVAL;
 	} else {
 		set_bit(frame->prev_type, frame->stacks_done);
 	}
 
+	/* Record fp as prev_fp before attempting to get the next fp */
+	frame->prev_fp = fp;
+
+	/*
+	 * If fp is not from the current address space perform the
+	 * necessary translation before dereferencing it to get next fp.
+	 */
+	if (translate_fp)
+		fp = translate_fp(fp, info->type);
+	if (!fp)
+		return -EINVAL;
+
 	/*
 	 * Record this frame record's values and location. The prev_fp and
-	 * prev_type are only meaningful to the next unwind_frame() invocation.
+	 * prev_type are only meaningful to the next __unwind_frame() invocation.
 	 */
 	frame->fp = READ_ONCE_NOCHECK(*(unsigned long *)(fp));
 	frame->pc = READ_ONCE_NOCHECK(*(unsigned long *)(fp + 8));
-	frame->prev_fp = fp;
-	frame->prev_type = info.type;
-
 	frame->pc = ptrauth_strip_insn_pac(frame->pc);
+	frame->prev_type = info->type;
+
+	return 0;
+}
+
+static int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
+{
+	unsigned long fp = frame->fp;
+	struct stack_info info;
+	int err;
+
+	if (!tsk)
+		tsk = current;
+
+	/* Final frame; nothing to unwind */
+	if (fp == (unsigned long)task_pt_regs(tsk)->stackframe)
+		return -ENOENT;
+
+	if (!on_accessible_stack(tsk, fp, 16, &info))
+		return -EINVAL;
+
+	err = __unwind_frame(frame, &info, NULL);
+	if (err)
+		return err;
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 	if (tsk->ret_stack &&
@@ -143,20 +168,27 @@ static int notrace unwind_frame(struct task_struct *tsk,
 }
 NOKPROBE_SYMBOL(unwind_frame);
 
-static void notrace walk_stackframe(struct task_struct *tsk,
-				    struct stackframe *frame,
-				    bool (*fn)(void *, unsigned long), void *data)
+static void notrace __walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
+		bool (*fn)(void *, unsigned long), void *data,
+		int (*unwind_frame_fn)(struct task_struct *tsk, struct stackframe *frame))
 {
 	while (1) {
 		int ret;
 
 		if (!fn(data, frame->pc))
 			break;
-		ret = unwind_frame(tsk, frame);
+		ret = unwind_frame_fn(tsk, frame);
 		if (ret < 0)
 			break;
 	}
 }
+
+static void notrace walk_stackframe(struct task_struct *tsk,
+				    struct stackframe *frame,
+				    bool (*fn)(void *, unsigned long), void *data)
+{
+	__walk_stackframe(tsk, frame, fn, data, unwind_frame);
+}
 NOKPROBE_SYMBOL(walk_stackframe);
 
 static bool dump_backtrace_entry(void *arg, unsigned long where)
@@ -210,3 +242,135 @@ noinline notrace void arch_stack_walk(stack_trace_consume_fn consume_entry,
 
 	walk_stackframe(task, &frame, consume_entry, cookie);
 }
+
+#ifdef CONFIG_NVHE_EL2_DEBUG
+DECLARE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
+DECLARE_KVM_NVHE_PER_CPU(unsigned long [PAGE_SIZE/sizeof(long)], hyp_overflow_stack);
+DECLARE_KVM_NVHE_PER_CPU(struct kvm_nvhe_panic_info, kvm_panic_info);
+
+static inline bool kvm_nvhe_on_overflow_stack(unsigned long sp, unsigned long size,
+				 struct stack_info *info)
+{
+	struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
+	unsigned long low = (unsigned long)panic_info->hyp_overflow_stack_base;
+	unsigned long high = low + PAGE_SIZE;
+
+	return on_stack(sp, size, low, high, STACK_TYPE_KVM_NVHE_OVERFLOW, info);
+}
+
+static inline bool kvm_nvhe_on_hyp_stack(unsigned long sp, unsigned long size,
+				 struct stack_info *info)
+{
+	struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
+	unsigned long low = (unsigned long)panic_info->hyp_stack_base;
+	unsigned long high = low + PAGE_SIZE;
+
+	return on_stack(sp, size, low, high, STACK_TYPE_KVM_NVHE_HYP, info);
+}
+
+static inline bool kvm_nvhe_on_accessible_stack(unsigned long sp, unsigned long size,
+				       struct stack_info *info)
+{
+	if (info)
+		info->type = STACK_TYPE_UNKNOWN;
+
+	if (kvm_nvhe_on_hyp_stack(sp, size, info))
+		return true;
+	if (kvm_nvhe_on_overflow_stack(sp, size, info))
+		return true;
+
+	return false;
+}
+
+static unsigned long kvm_nvhe_hyp_stack_kern_va(unsigned long addr)
+{
+	struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
+	unsigned long hyp_base, kern_base, hyp_offset;
+
+	hyp_base = (unsigned long)panic_info->hyp_stack_base;
+	hyp_offset = addr - hyp_base;
+
+	kern_base = (unsigned long)*this_cpu_ptr(&kvm_arm_hyp_stack_page);
+
+	return kern_base + hyp_offset;
+}
+
+static unsigned long kvm_nvhe_overflow_stack_kern_va(unsigned long addr)
+{
+	struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
+	unsigned long hyp_base, kern_base, hyp_offset;
+
+	hyp_base = (unsigned long)panic_info->hyp_overflow_stack_base;
+	hyp_offset = addr - hyp_base;
+
+	kern_base = (unsigned long)this_cpu_ptr_nvhe_sym(hyp_overflow_stack);
+
+	return kern_base + hyp_offset;
+}
+
+/*
+ * Convert KVM nVHE hypervisor stack VA to a kernel VA.
+ *
+ * The nVHE hypervisor stack is mapped in the flexible 'private' VA range, to allow
+ * for guard pages below the stack. Consequently, the fixed offset address
+ * translation macros won't work here.
+ *
+ * The kernel VA is calculated as an offset from the kernel VA of the hypervisor
+ * stack base. See: kvm_nvhe_hyp_stack_kern_va(), kvm_nvhe_overflow_stack_kern_va()
+ */
+static unsigned long kvm_nvhe_stack_kern_va(unsigned long addr,
+					enum stack_type type)
+{
+	switch (type) {
+	case STACK_TYPE_KVM_NVHE_HYP:
+		return kvm_nvhe_hyp_stack_kern_va(addr);
+	case STACK_TYPE_KVM_NVHE_OVERFLOW:
+		return kvm_nvhe_overflow_stack_kern_va(addr);
+	default:
+		return 0UL;
+	}
+}
+
+static int notrace kvm_nvhe_unwind_frame(struct task_struct *tsk,
+					struct stackframe *frame)
+{
+	struct stack_info info;
+
+	if (!kvm_nvhe_on_accessible_stack(frame->fp, 16, &info))
+		return -EINVAL;
+
+	return __unwind_frame(frame, &info, kvm_nvhe_stack_kern_va);
+}
+
+static bool kvm_nvhe_dump_backtrace_entry(void *arg, unsigned long where)
+{
+	unsigned long va_mask = GENMASK_ULL(vabits_actual - 1, 0);
+	unsigned long hyp_offset = (unsigned long)arg;
+
+	where &= va_mask;	/* Mask tags */
+	where += hyp_offset;	/* Convert to kern addr */
+
+	kvm_err("[<%016lx>] %pB\n", where, (void *)where);
+
+	return true;
+}
+
+static void notrace kvm_nvhe_walk_stackframe(struct task_struct *tsk,
+				    struct stackframe *frame,
+				    bool (*fn)(void *, unsigned long), void *data)
+{
+	__walk_stackframe(tsk, frame, fn, data, kvm_nvhe_unwind_frame);
+}
+
+void kvm_nvhe_dump_backtrace(unsigned long hyp_offset)
+{
+	struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
+	struct stackframe frame;
+
+	start_backtrace(&frame, panic_info->fp, panic_info->pc);
+	pr_err("nVHE HYP call trace:\n");
+	kvm_nvhe_walk_stackframe(NULL, &frame, kvm_nvhe_dump_backtrace_entry,
+					(void *)hyp_offset);
+	pr_err("---- end of nVHE HYP call trace ----\n");
+}
+#endif /* CONFIG_NVHE_EL2_DEBUG */
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 8a5fbbf084df..75f2c8255ff0 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -51,8 +51,9 @@ config NVHE_EL2_DEBUG
 	depends on KVM
 	help
 	  Say Y here to enable the debug mode for the non-VHE KVM EL2 object.
-	  Failure reports will BUG() in the hypervisor. This is intended for
-	  local EL2 hypervisor development.
+	  Failure reports will BUG() in the hypervisor, and panics will print
+	  the hypervisor call stack. This is intended for local EL2 hypervisor
+	  development.
 
 	  If unsure, say N.
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 7a23630c4a7f..66c07c04eb52 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -49,7 +49,7 @@ DEFINE_STATIC_KEY_FALSE(kvm_protected_mode_initialized);
 
 DECLARE_KVM_HYP_PER_CPU(unsigned long, kvm_hyp_vector);
 
-static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
+DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
 unsigned long kvm_arm_hyp_percpu_base[NR_CPUS];
 DECLARE_KVM_NVHE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
 
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index e3140abd2e2e..ff69dff33700 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -17,6 +17,7 @@
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_mmu.h>
 #include <asm/debug-monitors.h>
+#include <asm/stacktrace.h>
 #include <asm/traps.h>
 
 #include <kvm/arm_hypercalls.h>
@@ -326,6 +327,8 @@ void __noreturn __cold nvhe_hyp_panic_handler(u64 esr, u64 spsr,
 		kvm_err("nVHE hyp panic at: %016llx!\n", elr_virt + hyp_offset);
 	}
 
+	kvm_nvhe_dump_backtrace(hyp_offset);
+
 	/*
 	 * Hyp has panicked and we're going to handle that by panicking the
 	 * kernel. The kernel offset will be revealed in the panic so we're
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index efc20273a352..b8ecffc47424 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -37,6 +37,22 @@ DEFINE_PER_CPU(unsigned long, kvm_hyp_vector);
 #ifdef CONFIG_NVHE_EL2_DEBUG
 DEFINE_PER_CPU(unsigned long [PAGE_SIZE/sizeof(long)], hyp_overflow_stack)
 	__aligned(16);
+DEFINE_PER_CPU(struct kvm_nvhe_panic_info, kvm_panic_info);
+
+static inline void cpu_prepare_nvhe_panic_info(void)
+{
+	struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr(&kvm_panic_info);
+	struct kvm_nvhe_init_params *params = this_cpu_ptr(&kvm_init_params);
+
+	panic_info->hyp_stack_base = (unsigned long)(params->stack_hyp_va - PAGE_SIZE);
+	panic_info->hyp_overflow_stack_base = (unsigned long)this_cpu_ptr(hyp_overflow_stack);
+	panic_info->fp = (unsigned long)__builtin_frame_address(0);
+	panic_info->pc = _THIS_IP_;
+}
+#else
+static inline void cpu_prepare_nvhe_panic_info(void)
+{
+}
 #endif
 
 static void __activate_traps(struct kvm_vcpu *vcpu)
@@ -360,6 +376,8 @@ asmlinkage void __noreturn hyp_panic(void)
 	struct kvm_cpu_context *host_ctxt;
 	struct kvm_vcpu *vcpu;
 
+	cpu_prepare_nvhe_panic_info();
+
 	host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
 	vcpu = host_ctxt->__hyp_running_vcpu;
 
-- 
2.35.1.473.g83b2b277ed-goog



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v3 7/8] KVM: arm64: Unwind and dump nVHE HYP stacktrace
@ 2022-02-24  5:13   ` Kalesh Singh
  0 siblings, 0 replies; 60+ messages in thread
From: Kalesh Singh @ 2022-02-24  5:13 UTC (permalink / raw)
  Cc: Catalin Marinas, Kalesh Singh, will, kvmarm, maz,
	Madhavan T. Venkataraman, kernel-team, surenb, Mark Brown,
	Peter Collingbourne, linux-arm-kernel, linux-kernel,
	Masami Hiramatsu, Paolo Bonzini

Unwind the hypervisor stack from EL1 when CONFIG_NVHE_EL2_DEBUG is enabled.
This is possible because CONFIG_NVHE_EL2_DEBUG disables the host stage 2
protection, which allows the host to access the hypervisor stack pages.

Unwinding and dumping hyp call traces is gated on CONFIG_NVHE_EL2_DEBUG
to avoid potentially leaking information to the host.

A simple stack overflow test produces the following output:

[  580.376051][  T412] kvm: nVHE hyp panic at: ffffffc0116145c4!
[  580.378034][  T412] kvm [412]: nVHE HYP call trace:
[  580.378591][  T412] kvm [412]:  [<ffffffc011614934>]
[  580.378993][  T412] kvm [412]:  [<ffffffc01160fa48>]
[  580.379386][  T412] kvm [412]:  [<ffffffc0116145dc>]  // Non-terminating recursive call
[  580.379772][  T412] kvm [412]:  [<ffffffc0116145dc>]
[  580.380158][  T412] kvm [412]:  [<ffffffc0116145dc>]
[  580.380544][  T412] kvm [412]:  [<ffffffc0116145dc>]
[  580.380928][  T412] kvm [412]:  [<ffffffc0116145dc>]
. . .

Since nVHE hyp symbols are excluded from kallsyms to avoid issues with
aliasing, we fall back to printing the vmlinux addresses. Symbolizing the
addresses is handled in the next patch in this series.

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
---

Changes in v3:
  - The nvhe hyp stack unwinder now makes use of the core logic from the
    regular kernel unwinder to avoid duplication, per Mark

Changes in v2:
  - Add cpu_prepare_nvhe_panic_info()
  - Move updating the panic info to hyp_panic(), so that unwinding also
    works for conventional nVHE Hyp-mode.

 arch/arm64/include/asm/kvm_asm.h    |  19 +++
 arch/arm64/include/asm/stacktrace.h |  12 ++
 arch/arm64/kernel/stacktrace.c      | 210 +++++++++++++++++++++++++---
 arch/arm64/kvm/Kconfig              |   5 +-
 arch/arm64/kvm/arm.c                |   2 +-
 arch/arm64/kvm/handle_exit.c        |   3 +
 arch/arm64/kvm/hyp/nvhe/switch.c    |  18 +++
 7 files changed, 243 insertions(+), 26 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 2e277f2ed671..16efdf150a37 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -176,6 +176,25 @@ struct kvm_nvhe_init_params {
 	unsigned long vtcr;
 };
 
+#ifdef CONFIG_NVHE_EL2_DEBUG
+/*
+ * Used by the host in EL1 to dump the nVHE hypervisor backtrace on
+ * hyp_panic. This is possible because CONFIG_NVHE_EL2_DEBUG disables
+ * the host stage 2 protection. See: __hyp_do_panic()
+ *
+ * @hyp_stack_base:             hyp VA of the hyp_stack base.
+ * @hyp_overflow_stack_base:    hyp VA of the hyp_overflow_stack base.
+ * @fp:                         hyp FP where the backtrace begins.
+ * @pc:                         hyp PC where the backtrace begins.
+ */
+struct kvm_nvhe_panic_info {
+	unsigned long hyp_stack_base;
+	unsigned long hyp_overflow_stack_base;
+	unsigned long fp;
+	unsigned long pc;
+};
+#endif /* CONFIG_NVHE_EL2_DEBUG */
+
 /* Translate a kernel address @ptr into its equivalent linear mapping */
 #define kvm_ksym_ref(ptr)						\
 	({								\
diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
index e77cdef9ca29..18611a51cf14 100644
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -22,6 +22,10 @@ enum stack_type {
 	STACK_TYPE_OVERFLOW,
 	STACK_TYPE_SDEI_NORMAL,
 	STACK_TYPE_SDEI_CRITICAL,
+#ifdef CONFIG_NVHE_EL2_DEBUG
+	STACK_TYPE_KVM_NVHE_HYP,
+	STACK_TYPE_KVM_NVHE_OVERFLOW,
+#endif /* CONFIG_NVHE_EL2_DEBUG */
 	__NR_STACK_TYPES
 };
 
@@ -147,4 +151,12 @@ static inline bool on_accessible_stack(const struct task_struct *tsk,
 	return false;
 }
 
+#ifdef CONFIG_NVHE_EL2_DEBUG
+void kvm_nvhe_dump_backtrace(unsigned long hyp_offset);
+#else
+static inline void kvm_nvhe_dump_backtrace(unsigned long hyp_offset)
+{
+}
+#endif /* CONFIG_NVHE_EL2_DEBUG */
+
 #endif	/* __ASM_STACKTRACE_H */
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index e4103e085681..6ec85cb69b1f 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -15,6 +15,8 @@
 
 #include <asm/irq.h>
 #include <asm/pointer_auth.h>
+#include <asm/kvm_asm.h>
+#include <asm/kvm_hyp.h>
 #include <asm/stack_pointer.h>
 #include <asm/stacktrace.h>
 
@@ -64,26 +66,15 @@ NOKPROBE_SYMBOL(start_backtrace);
  * records (e.g. a cycle), determined based on the location and fp value of A
  * and the location (but not the fp value) of B.
  */
-static int notrace unwind_frame(struct task_struct *tsk,
-				struct stackframe *frame)
+static int notrace __unwind_frame(struct stackframe *frame, struct stack_info *info,
+		unsigned long (*translate_fp)(unsigned long, enum stack_type))
 {
 	unsigned long fp = frame->fp;
-	struct stack_info info;
-
-	if (!tsk)
-		tsk = current;
-
-	/* Final frame; nothing to unwind */
-	if (fp == (unsigned long)task_pt_regs(tsk)->stackframe)
-		return -ENOENT;
 
 	if (fp & 0x7)
 		return -EINVAL;
 
-	if (!on_accessible_stack(tsk, fp, 16, &info))
-		return -EINVAL;
-
-	if (test_bit(info.type, frame->stacks_done))
+	if (test_bit(info->type, frame->stacks_done))
 		return -EINVAL;
 
 	/*
@@ -94,28 +85,62 @@ static int notrace unwind_frame(struct task_struct *tsk,
 	 *
 	 * TASK -> IRQ -> OVERFLOW -> SDEI_NORMAL
 	 * TASK -> SDEI_NORMAL -> SDEI_CRITICAL -> OVERFLOW
+	 * KVM_NVHE_HYP -> KVM_NVHE_OVERFLOW
 	 *
 	 * ... but the nesting itself is strict. Once we transition from one
 	 * stack to another, it's never valid to unwind back to that first
 	 * stack.
 	 */
-	if (info.type == frame->prev_type) {
+	if (info->type == frame->prev_type) {
 		if (fp <= frame->prev_fp)
 			return -EINVAL;
 	} else {
 		set_bit(frame->prev_type, frame->stacks_done);
 	}
 
+	/* Record fp as prev_fp before attempting to get the next fp */
+	frame->prev_fp = fp;
+
+	/*
+	 * If fp is not from the current address space perform the
+	 * necessary translation before dereferencing it to get next fp.
+	 */
+	if (translate_fp)
+		fp = translate_fp(fp, info->type);
+	if (!fp)
+		return -EINVAL;
+
 	/*
 	 * Record this frame record's values and location. The prev_fp and
-	 * prev_type are only meaningful to the next unwind_frame() invocation.
+	 * prev_type are only meaningful to the next __unwind_frame() invocation.
 	 */
 	frame->fp = READ_ONCE_NOCHECK(*(unsigned long *)(fp));
 	frame->pc = READ_ONCE_NOCHECK(*(unsigned long *)(fp + 8));
-	frame->prev_fp = fp;
-	frame->prev_type = info.type;
-
 	frame->pc = ptrauth_strip_insn_pac(frame->pc);
+	frame->prev_type = info->type;
+
+	return 0;
+}
+
+static int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
+{
+	unsigned long fp = frame->fp;
+	struct stack_info info;
+	int err;
+
+	if (!tsk)
+		tsk = current;
+
+	/* Final frame; nothing to unwind */
+	if (fp == (unsigned long)task_pt_regs(tsk)->stackframe)
+		return -ENOENT;
+
+	if (!on_accessible_stack(tsk, fp, 16, &info))
+		return -EINVAL;
+
+	err = __unwind_frame(frame, &info, NULL);
+	if (err)
+		return err;
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 	if (tsk->ret_stack &&
@@ -143,20 +168,27 @@ static int notrace unwind_frame(struct task_struct *tsk,
 }
 NOKPROBE_SYMBOL(unwind_frame);
 
-static void notrace walk_stackframe(struct task_struct *tsk,
-				    struct stackframe *frame,
-				    bool (*fn)(void *, unsigned long), void *data)
+static void notrace __walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
+		bool (*fn)(void *, unsigned long), void *data,
+		int (*unwind_frame_fn)(struct task_struct *tsk, struct stackframe *frame))
 {
 	while (1) {
 		int ret;
 
 		if (!fn(data, frame->pc))
 			break;
-		ret = unwind_frame(tsk, frame);
+		ret = unwind_frame_fn(tsk, frame);
 		if (ret < 0)
 			break;
 	}
 }
+
+static void notrace walk_stackframe(struct task_struct *tsk,
+				    struct stackframe *frame,
+				    bool (*fn)(void *, unsigned long), void *data)
+{
+	__walk_stackframe(tsk, frame, fn, data, unwind_frame);
+}
 NOKPROBE_SYMBOL(walk_stackframe);
 
 static bool dump_backtrace_entry(void *arg, unsigned long where)
@@ -210,3 +242,135 @@ noinline notrace void arch_stack_walk(stack_trace_consume_fn consume_entry,
 
 	walk_stackframe(task, &frame, consume_entry, cookie);
 }
+
+#ifdef CONFIG_NVHE_EL2_DEBUG
+DECLARE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
+DECLARE_KVM_NVHE_PER_CPU(unsigned long [PAGE_SIZE/sizeof(long)], hyp_overflow_stack);
+DECLARE_KVM_NVHE_PER_CPU(struct kvm_nvhe_panic_info, kvm_panic_info);
+
+static inline bool kvm_nvhe_on_overflow_stack(unsigned long sp, unsigned long size,
+				 struct stack_info *info)
+{
+	struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
+	unsigned long low = (unsigned long)panic_info->hyp_overflow_stack_base;
+	unsigned long high = low + PAGE_SIZE;
+
+	return on_stack(sp, size, low, high, STACK_TYPE_KVM_NVHE_OVERFLOW, info);
+}
+
+static inline bool kvm_nvhe_on_hyp_stack(unsigned long sp, unsigned long size,
+				 struct stack_info *info)
+{
+	struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
+	unsigned long low = (unsigned long)panic_info->hyp_stack_base;
+	unsigned long high = low + PAGE_SIZE;
+
+	return on_stack(sp, size, low, high, STACK_TYPE_KVM_NVHE_HYP, info);
+}
+
+static inline bool kvm_nvhe_on_accessible_stack(unsigned long sp, unsigned long size,
+				       struct stack_info *info)
+{
+	if (info)
+		info->type = STACK_TYPE_UNKNOWN;
+
+	if (kvm_nvhe_on_hyp_stack(sp, size, info))
+		return true;
+	if (kvm_nvhe_on_overflow_stack(sp, size, info))
+		return true;
+
+	return false;
+}
+
+static unsigned long kvm_nvhe_hyp_stack_kern_va(unsigned long addr)
+{
+	struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
+	unsigned long hyp_base, kern_base, hyp_offset;
+
+	hyp_base = (unsigned long)panic_info->hyp_stack_base;
+	hyp_offset = addr - hyp_base;
+
+	kern_base = (unsigned long)*this_cpu_ptr(&kvm_arm_hyp_stack_page);
+
+	return kern_base + hyp_offset;
+}
+
+static unsigned long kvm_nvhe_overflow_stack_kern_va(unsigned long addr)
+{
+	struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
+	unsigned long hyp_base, kern_base, hyp_offset;
+
+	hyp_base = (unsigned long)panic_info->hyp_overflow_stack_base;
+	hyp_offset = addr - hyp_base;
+
+	kern_base = (unsigned long)this_cpu_ptr_nvhe_sym(hyp_overflow_stack);
+
+	return kern_base + hyp_offset;
+}
+
+/*
+ * Convert KVM nVHE hypervisor stack VA to a kernel VA.
+ *
+ * The nVHE hypervisor stack is mapped in the flexible 'private' VA range, to allow
+ * for guard pages below the stack. Consequently, the fixed offset address
+ * translation macros won't work here.
+ *
+ * The kernel VA is calculated as an offset from the kernel VA of the hypervisor
+ * stack base. See: kvm_nvhe_hyp_stack_kern_va(), kvm_nvhe_overflow_stack_kern_va()
+ */
+static unsigned long kvm_nvhe_stack_kern_va(unsigned long addr,
+					enum stack_type type)
+{
+	switch (type) {
+	case STACK_TYPE_KVM_NVHE_HYP:
+		return kvm_nvhe_hyp_stack_kern_va(addr);
+	case STACK_TYPE_KVM_NVHE_OVERFLOW:
+		return kvm_nvhe_overflow_stack_kern_va(addr);
+	default:
+		return 0UL;
+	}
+}
+
+static int notrace kvm_nvhe_unwind_frame(struct task_struct *tsk,
+					struct stackframe *frame)
+{
+	struct stack_info info;
+
+	if (!kvm_nvhe_on_accessible_stack(frame->fp, 16, &info))
+		return -EINVAL;
+
+	return __unwind_frame(frame, &info, kvm_nvhe_stack_kern_va);
+}
+
+static bool kvm_nvhe_dump_backtrace_entry(void *arg, unsigned long where)
+{
+	unsigned long va_mask = GENMASK_ULL(vabits_actual - 1, 0);
+	unsigned long hyp_offset = (unsigned long)arg;
+
+	where &= va_mask;	/* Mask tags */
+	where += hyp_offset;	/* Convert to kern addr */
+
+	kvm_err("[<%016lx>] %pB\n", where, (void *)where);
+
+	return true;
+}
+
+static void notrace kvm_nvhe_walk_stackframe(struct task_struct *tsk,
+				    struct stackframe *frame,
+				    bool (*fn)(void *, unsigned long), void *data)
+{
+	__walk_stackframe(tsk, frame, fn, data, kvm_nvhe_unwind_frame);
+}
+
+void kvm_nvhe_dump_backtrace(unsigned long hyp_offset)
+{
+	struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
+	struct stackframe frame;
+
+	start_backtrace(&frame, panic_info->fp, panic_info->pc);
+	pr_err("nVHE HYP call trace:\n");
+	kvm_nvhe_walk_stackframe(NULL, &frame, kvm_nvhe_dump_backtrace_entry,
+					(void *)hyp_offset);
+	pr_err("---- end of nVHE HYP call trace ----\n");
+}
+#endif /* CONFIG_NVHE_EL2_DEBUG */
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 8a5fbbf084df..75f2c8255ff0 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -51,8 +51,9 @@ config NVHE_EL2_DEBUG
 	depends on KVM
 	help
 	  Say Y here to enable the debug mode for the non-VHE KVM EL2 object.
-	  Failure reports will BUG() in the hypervisor. This is intended for
-	  local EL2 hypervisor development.
+	  Failure reports will BUG() in the hypervisor, and panics will print
+	  the hypervisor call stack. This is intended for local EL2 hypervisor
+	  development.
 
 	  If unsure, say N.
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 7a23630c4a7f..66c07c04eb52 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -49,7 +49,7 @@ DEFINE_STATIC_KEY_FALSE(kvm_protected_mode_initialized);
 
 DECLARE_KVM_HYP_PER_CPU(unsigned long, kvm_hyp_vector);
 
-static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
+DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
 unsigned long kvm_arm_hyp_percpu_base[NR_CPUS];
 DECLARE_KVM_NVHE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
 
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index e3140abd2e2e..ff69dff33700 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -17,6 +17,7 @@
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_mmu.h>
 #include <asm/debug-monitors.h>
+#include <asm/stacktrace.h>
 #include <asm/traps.h>
 
 #include <kvm/arm_hypercalls.h>
@@ -326,6 +327,8 @@ void __noreturn __cold nvhe_hyp_panic_handler(u64 esr, u64 spsr,
 		kvm_err("nVHE hyp panic at: %016llx!\n", elr_virt + hyp_offset);
 	}
 
+	kvm_nvhe_dump_backtrace(hyp_offset);
+
 	/*
 	 * Hyp has panicked and we're going to handle that by panicking the
 	 * kernel. The kernel offset will be revealed in the panic so we're
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index efc20273a352..b8ecffc47424 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -37,6 +37,22 @@ DEFINE_PER_CPU(unsigned long, kvm_hyp_vector);
 #ifdef CONFIG_NVHE_EL2_DEBUG
 DEFINE_PER_CPU(unsigned long [PAGE_SIZE/sizeof(long)], hyp_overflow_stack)
 	__aligned(16);
+DEFINE_PER_CPU(struct kvm_nvhe_panic_info, kvm_panic_info);
+
+static inline void cpu_prepare_nvhe_panic_info(void)
+{
+	struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr(&kvm_panic_info);
+	struct kvm_nvhe_init_params *params = this_cpu_ptr(&kvm_init_params);
+
+	panic_info->hyp_stack_base = (unsigned long)(params->stack_hyp_va - PAGE_SIZE);
+	panic_info->hyp_overflow_stack_base = (unsigned long)this_cpu_ptr(hyp_overflow_stack);
+	panic_info->fp = (unsigned long)__builtin_frame_address(0);
+	panic_info->pc = _THIS_IP_;
+}
+#else
+static inline void cpu_prepare_nvhe_panic_info(void)
+{
+}
 #endif
 
 static void __activate_traps(struct kvm_vcpu *vcpu)
@@ -360,6 +376,8 @@ asmlinkage void __noreturn hyp_panic(void)
 	struct kvm_cpu_context *host_ctxt;
 	struct kvm_vcpu *vcpu;
 
+	cpu_prepare_nvhe_panic_info();
+
 	host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
 	vcpu = host_ctxt->__hyp_running_vcpu;
 
-- 
2.35.1.473.g83b2b277ed-goog

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v3 8/8] KVM: arm64: Symbolize the nVHE HYP backtrace
  2022-02-24  5:13 ` Kalesh Singh
  (?)
@ 2022-02-24  5:13   ` Kalesh Singh
  -1 siblings, 0 replies; 60+ messages in thread
From: Kalesh Singh @ 2022-02-24  5:13 UTC (permalink / raw)
  Cc: will, maz, qperret, tabba, surenb, kernel-team, Kalesh Singh,
	James Morse, Alexandru Elisei, Suzuki K Poulose, Catalin Marinas,
	Mark Rutland, Mark Brown, Masami Hiramatsu, Peter Collingbourne,
	Madhavan T. Venkataraman, Andrew Walbran, Andrew Scull,
	linux-arm-kernel, kvmarm, linux-kernel

Reintroduce the __kvm_nvhe_ symbols in kallsyms, ignoring the local
symbols in this namespace. The local symbols are not informative and
can cause aliasing issues when symbolizing the addresses.

With the necessary symbols now in kallsyms we can symbolize nVHE
stacktrace addresses using the %pB print format specifier.
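
As a rough illustration of the filtering change, the sketch below assumes
that the entries in the kallsyms ignore list are matched as plain string
prefixes. The names sketch_ignored_prefixes and sketch_is_ignored, and the
sample symbol names in the note after it, are hypothetical; only the
"__kvm_nvhe_$" entry is taken from this patch.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical subset of the ignore list. Only the "__kvm_nvhe_$" entry
 * comes from this patch; prefix matching is an assumption of the sketch.
 */
static const char *const sketch_ignored_prefixes[] = {
	"__crc_",
	"__efistub_",
	"__kvm_nvhe_$",
	NULL,
};

/* Return true if the symbol name starts with any ignored prefix. */
static bool sketch_is_ignored(const char *name)
{
	const char *const *p;

	for (p = sketch_ignored_prefixes; *p; p++)
		if (!strncmp(name, *p, strlen(*p)))
			return true;

	return false;
}
```

Under these assumptions a local name such as "__kvm_nvhe_$x" is still
dropped, while "__kvm_nvhe_hyp_panic" is kept and can be symbolized.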

Example call trace:

[   98.916444][  T426] kvm [426]: nVHE hyp panic at: [<ffffffc0096156fc>] __kvm_nvhe_overflow_stack+0x8/0x34!
[   98.918360][  T426] nVHE HYP call trace:
[   98.918692][  T426] kvm [426]: [<ffffffc009615aac>] __kvm_nvhe_cpu_prepare_nvhe_panic_info+0x4c/0x68
[   98.919545][  T426] kvm [426]: [<ffffffc0096159a4>] __kvm_nvhe_hyp_panic+0x2c/0xe8
[   98.920107][  T426] kvm [426]: [<ffffffc009615ad8>] __kvm_nvhe_hyp_panic_bad_stack+0x10/0x10
[   98.920665][  T426] kvm [426]: [<ffffffc009610a4c>] __kvm_nvhe___kvm_hyp_host_vector+0x24c/0x794
[   98.921292][  T426] kvm [426]: [<ffffffc009615718>] __kvm_nvhe_overflow_stack+0x24/0x34
. . .

[   98.973382][  T426] kvm [426]: [<ffffffc009615718>] __kvm_nvhe_overflow_stack+0x24/0x34
[   98.973816][  T426] kvm [426]: [<ffffffc0096152f4>] __kvm_nvhe___kvm_vcpu_run+0x38/0x438
[   98.974255][  T426] kvm [426]: [<ffffffc009616f80>] __kvm_nvhe_handle___kvm_vcpu_run+0x1c4/0x364
[   98.974719][  T426] kvm [426]: [<ffffffc009616928>] __kvm_nvhe_handle_trap+0xa8/0x130
[   98.975152][  T426] kvm [426]: [<ffffffc009610064>] __kvm_nvhe___host_exit+0x64/0x64
[   98.975588][  T426] ---- end of nVHE HYP call trace ----
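
The addresses in a trace like the one above are hypervisor return addresses
that kvm_nvhe_dump_backtrace_entry() (added in the previous patch) converts
before printing: the bits above the VA width are masked off and hyp_offset is
added. A minimal user-space sketch of that arithmetic follows; the helper
names are hypothetical, and va_bits stands in for vabits_actual.

```c
#include <stdint.h>

/* Mask with bits [high_bit:0] set, valid for 0 <= high_bit <= 63. */
static uint64_t sketch_low_mask(unsigned int high_bit)
{
	return ~0ULL >> (63 - high_bit);
}

/*
 * Model of the conversion done per backtrace entry: drop the bits above
 * the VA width ("mask tags"), then add the hyp-to-kernel offset.
 */
static uint64_t sketch_hyp_to_kern(uint64_t where, unsigned int va_bits,
				   uint64_t hyp_offset)
{
	where &= sketch_low_mask(va_bits - 1);
	where += hyp_offset;

	return where;
}
```

The resulting value is what gets passed to the %pB specifier; the inputs
used to exercise this sketch are made up and do not come from a real trace.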

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
---

Changes in v2:
  - Fix printk warnings - %p expects (void *)

 arch/arm64/kvm/handle_exit.c | 13 +++++--------
 scripts/kallsyms.c           |  2 +-
 2 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index ff69dff33700..3a5c32017c6b 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -296,13 +296,8 @@ void __noreturn __cold nvhe_hyp_panic_handler(u64 esr, u64 spsr,
 	u64 elr_in_kimg = __phys_to_kimg(elr_phys);
 	u64 hyp_offset = elr_in_kimg - kaslr_offset() - elr_virt;
 	u64 mode = spsr & PSR_MODE_MASK;
+	u64 panic_addr = elr_virt + hyp_offset;
 
-	/*
-	 * The nVHE hyp symbols are not included by kallsyms to avoid issues
-	 * with aliasing. That means that the symbols cannot be printed with the
-	 * "%pS" format specifier, so fall back to the vmlinux address if
-	 * there's no better option.
-	 */
 	if (mode != PSR_MODE_EL2t && mode != PSR_MODE_EL2h) {
 		kvm_err("Invalid host exception to nVHE hyp!\n");
 	} else if (ESR_ELx_EC(esr) == ESR_ELx_EC_BRK64 &&
@@ -322,9 +317,11 @@ void __noreturn __cold nvhe_hyp_panic_handler(u64 esr, u64 spsr,
 		if (file)
 			kvm_err("nVHE hyp BUG at: %s:%u!\n", file, line);
 		else
-			kvm_err("nVHE hyp BUG at: %016llx!\n", elr_virt + hyp_offset);
+			kvm_err("nVHE hyp BUG at: [<%016llx>] %pB!\n", panic_addr,
+					(void *)panic_addr);
 	} else {
-		kvm_err("nVHE hyp panic at: %016llx!\n", elr_virt + hyp_offset);
+		kvm_err("nVHE hyp panic at: [<%016llx>] %pB!\n", panic_addr,
+				(void *)panic_addr);
 	}
 
 	kvm_nvhe_dump_backtrace(hyp_offset);
diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
index 54ad86d13784..19aba43d9da4 100644
--- a/scripts/kallsyms.c
+++ b/scripts/kallsyms.c
@@ -111,7 +111,7 @@ static bool is_ignored_symbol(const char *name, char type)
 		".LASANPC",		/* s390 kasan local symbols */
 		"__crc_",		/* modversions */
 		"__efistub_",		/* arm64 EFI stub namespace */
-		"__kvm_nvhe_",		/* arm64 non-VHE KVM namespace */
+		"__kvm_nvhe_$",		/* arm64 local symbols in non-VHE KVM namespace */
 		"__AArch64ADRPThunk_",	/* arm64 lld */
 		"__ARMV5PILongThunk_",	/* arm lld */
 		"__ARMV7PILongThunk_",
-- 
2.35.1.473.g83b2b277ed-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH v3 1/8] KVM: arm64: Introduce hyp_alloc_private_va_range()
  2022-02-24  5:13   ` Kalesh Singh
  (?)
@ 2022-02-24 12:24     ` Fuad Tabba
  -1 siblings, 0 replies; 60+ messages in thread
From: Fuad Tabba @ 2022-02-24 12:24 UTC (permalink / raw)
  To: Kalesh Singh
  Cc: will, maz, qperret, surenb, kernel-team, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Catalin Marinas,
	Mark Rutland, Mark Brown, Masami Hiramatsu, Peter Collingbourne,
	Madhavan T. Venkataraman, Andrew Scull, Paolo Bonzini,
	Ard Biesheuvel, linux-arm-kernel, kvmarm, linux-kernel

Hi Kalesh,

On Thu, Feb 24, 2022 at 5:16 AM Kalesh Singh <kaleshsingh@google.com> wrote:
>
> hyp_alloc_private_va_range() can be used to reserve private VA ranges
> in the nVHE hypervisor. Also update  __create_hyp_private_mapping()
> to allow specifying an alignment for the private VA mapping.
>
> These will be used to implement stack guard pages for KVM nVHE hypervisor
> (nVHE Hyp mode / not pKVM), in a subsequent patch in the series.
>
> Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> ---
>
> Changes in v3:
>   - Handle null ptr in IS_ERR_OR_NULL checks, per Mark
>
>  arch/arm64/include/asm/kvm_mmu.h |  4 +++
>  arch/arm64/kvm/mmu.c             | 62 ++++++++++++++++++++------------
>  2 files changed, 43 insertions(+), 23 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 81839e9a8a24..0b0c71302b92 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -153,6 +153,10 @@ static __always_inline unsigned long __kern_hyp_va(unsigned long v)
>  int kvm_share_hyp(void *from, void *to);
>  void kvm_unshare_hyp(void *from, void *to);
>  int create_hyp_mappings(void *from, void *to, enum kvm_pgtable_prot prot);
> +unsigned long hyp_alloc_private_va_range(size_t size, size_t align);
> +int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
> +                               size_t align, unsigned long *haddr,
> +                               enum kvm_pgtable_prot prot);
>  int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
>                            void __iomem **kaddr,
>                            void __iomem **haddr);
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index bc2aba953299..fc09536c8197 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -457,22 +457,16 @@ int create_hyp_mappings(void *from, void *to, enum kvm_pgtable_prot prot)
>         return 0;
>  }
>
> -static int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
> -                                       unsigned long *haddr,
> -                                       enum kvm_pgtable_prot prot)
> +
> +/*
> + * Allocates a private VA range below io_map_base.
> + *
> + * @size:      The size of the VA range to reserve.
> + * @align:     The required alignment for the allocation.
> + */

Many of the functions in this file use the kernel-doc format, and your
added comments are close, but not quite conformant. If you want to use
kernel-doc for these, you can refer to:
https://www.kernel.org/doc/html/latest/doc-guide/kernel-doc.html
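
For illustration, the comment might look roughly like this in kernel-doc
form. The wording of the Return: section is an assumption inferred from the
code in this patch, not something the patch states:

```c
#include <stddef.h>

/**
 * hyp_alloc_private_va_range - Allocates a private VA range below io_map_base.
 * @size:	The size of the VA range to reserve.
 * @align:	The required alignment for the allocation.
 *
 * Return: The base address of the reserved range on success, or an
 * ERR_PTR()-encoded -ENOMEM value cast to unsigned long on failure.
 */
unsigned long hyp_alloc_private_va_range(size_t size, size_t align);
```

The differences from the comment in the patch are the opening "/**", the
function name on the first line, and a separate Return: section.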

> +unsigned long hyp_alloc_private_va_range(size_t size, size_t align)
>  {
>         unsigned long base;
> -       int ret = 0;
> -
> -       if (!kvm_host_owns_hyp_mappings()) {
> -               base = kvm_call_hyp_nvhe(__pkvm_create_private_mapping,
> -                                        phys_addr, size, prot);
> -               if (IS_ERR_OR_NULL((void *)base))
> -                       return PTR_ERR((void *)base);
> -               *haddr = base;
> -
> -               return 0;
> -       }
>
>         mutex_lock(&kvm_hyp_pgd_mutex);
>
> @@ -484,8 +478,8 @@ static int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
>          *
>          * The allocated size is always a multiple of PAGE_SIZE.
>          */
> -       size = PAGE_ALIGN(size + offset_in_page(phys_addr));
> -       base = io_map_base - size;
> +       base = io_map_base - PAGE_ALIGN(size);
> +       base = ALIGN_DOWN(base, align);
>
>         /*
>          * Verify that BIT(VA_BITS - 1) hasn't been flipped by
> @@ -493,20 +487,42 @@ static int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
>          * overflowed the idmap/IO address range.
>          */
>         if ((base ^ io_map_base) & BIT(VA_BITS - 1))
> -               ret = -ENOMEM;
> +               base = (unsigned long)ERR_PTR(-ENOMEM);
>         else
>                 io_map_base = base;
>
>         mutex_unlock(&kvm_hyp_pgd_mutex);
>
> -       if (ret)
> -               goto out;
> +       return base;
> +}
> +
> +int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
> +                               size_t align, unsigned long *haddr,
> +                               enum kvm_pgtable_prot prot)
> +{
> +       unsigned long addr;
> +       int ret = 0;
> +
> +       if (!kvm_host_owns_hyp_mappings()) {
> +               addr = kvm_call_hyp_nvhe(__pkvm_create_private_mapping,
> +                                        phys_addr, size, prot);
> +               if (IS_ERR_OR_NULL((void *)addr))
> +                       return addr ? PTR_ERR((void *)addr) : -ENOMEM;
> +               *haddr = addr;
> +
> +               return 0;
> +       }
> +
> +       size += offset_in_page(phys_addr);

The size is no longer page-aligned here, which was the behavior before this
patch. However, looking at where it's used, this seems to be fine
because the users of size align it if necessary.
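
To make the alignment point concrete, here is a small user-space model of
the base computation in hyp_alloc_private_va_range() as quoted above. It
assumes 4K pages and a power-of-two align; the SKETCH_* macros and the
function name are hypothetical stand-ins for the kernel helpers.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE	4096ULL	/* assumption: 4K pages */
#define SKETCH_PAGE_ALIGN(x)	\
	(((x) + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1))
#define SKETCH_ALIGN_DOWN(x, a)	((x) & ~((uint64_t)(a) - 1))

/*
 * Model of the allocation arithmetic: the size is rounded up to a whole
 * number of pages inside the allocator, so an unaligned size from the
 * caller still reserves whole pages.
 */
static uint64_t sketch_private_va_base(uint64_t io_map_base, uint64_t size,
				       uint64_t align)
{
	uint64_t base = io_map_base - SKETCH_PAGE_ALIGN(size);

	return SKETCH_ALIGN_DOWN(base, align);
}
```

With an unaligned size of 0x1234 and a page-sized align, this model
reserves two full pages below io_map_base.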

Thanks,
/fuad



> +       addr = hyp_alloc_private_va_range(size, align);
> +       if (IS_ERR_OR_NULL((void *)addr))
> +               return addr ? PTR_ERR((void *)addr) : -ENOMEM;
>
> -       ret = __create_hyp_mappings(base, size, phys_addr, prot);
> +       ret = __create_hyp_mappings(addr, size, phys_addr, prot);
>         if (ret)
>                 goto out;
>
> -       *haddr = base + offset_in_page(phys_addr);
> +       *haddr = addr + offset_in_page(phys_addr);
>  out:
>         return ret;
>  }
> @@ -537,7 +553,7 @@ int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
>                 return 0;
>         }
>
> -       ret = __create_hyp_private_mapping(phys_addr, size,
> +       ret = __create_hyp_private_mapping(phys_addr, size, PAGE_SIZE,
>                                            &addr, PAGE_HYP_DEVICE);
>         if (ret) {
>                 iounmap(*kaddr);
> @@ -564,7 +580,7 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
>
>         BUG_ON(is_kernel_in_hyp_mode());
>
> -       ret = __create_hyp_private_mapping(phys_addr, size,
> +       ret = __create_hyp_private_mapping(phys_addr, size, PAGE_SIZE,
>                                            &addr, PAGE_HYP_EXEC);
>         if (ret) {
>                 *haddr = NULL;
> --
> 2.35.1.473.g83b2b277ed-goog
>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v3 1/8] KVM: arm64: Introduce hyp_alloc_private_va_range()
@ 2022-02-24 12:24     ` Fuad Tabba
  0 siblings, 0 replies; 60+ messages in thread
From: Fuad Tabba @ 2022-02-24 12:24 UTC (permalink / raw)
  To: Kalesh Singh
  Cc: kernel-team, will, Peter Collingbourne, maz, linux-kernel,
	kvmarm, Madhavan T. Venkataraman, Mark Brown, Masami Hiramatsu,
	Catalin Marinas, Paolo Bonzini, surenb, linux-arm-kernel

Hi Kalesh,

On Thu, Feb 24, 2022 at 5:16 AM Kalesh Singh <kaleshsingh@google.com> wrote:
>
> hyp_alloc_private_va_range() can be used to reserve private VA ranges
> in the nVHE hypervisor. Also update  __create_hyp_private_mapping()
> to allow specifying an alignment for the private VA mapping.
>
> These will be used to implement stack guard pages for KVM nVHE hypervisor
> (nVHE Hyp mode / not pKVM), in a subsequent patch in the series.
>
> Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> ---
>
> Changes in v3:
>   - Handle null ptr in IS_ERR_OR_NULL checks, per Mark
>
>  arch/arm64/include/asm/kvm_mmu.h |  4 +++
>  arch/arm64/kvm/mmu.c             | 62 ++++++++++++++++++++------------
>  2 files changed, 43 insertions(+), 23 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 81839e9a8a24..0b0c71302b92 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -153,6 +153,10 @@ static __always_inline unsigned long __kern_hyp_va(unsigned long v)
>  int kvm_share_hyp(void *from, void *to);
>  void kvm_unshare_hyp(void *from, void *to);
>  int create_hyp_mappings(void *from, void *to, enum kvm_pgtable_prot prot);
> +unsigned long hyp_alloc_private_va_range(size_t size, size_t align);
> +int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
> +                               size_t align, unsigned long *haddr,
> +                               enum kvm_pgtable_prot prot);
>  int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
>                            void __iomem **kaddr,
>                            void __iomem **haddr);
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index bc2aba953299..fc09536c8197 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -457,22 +457,16 @@ int create_hyp_mappings(void *from, void *to, enum kvm_pgtable_prot prot)
>         return 0;
>  }
>
> -static int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
> -                                       unsigned long *haddr,
> -                                       enum kvm_pgtable_prot prot)
> +
> +/*
> + * Allocates a private VA range below io_map_base.
> + *
> + * @size:      The size of the VA range to reserve.
> + * @align:     The required alignment for the allocation.
> + */

Many of the functions in this file use the kernel-doc format, and your
added comments are close, but not quite conforment. If you want to use
the kernel-doc for these you can refer to:
https://www.kernel.org/doc/html/latest/doc-guide/kernel-doc.html

> +unsigned long hyp_alloc_private_va_range(size_t size, size_t align)
>  {
>         unsigned long base;
> -       int ret = 0;
> -
> -       if (!kvm_host_owns_hyp_mappings()) {
> -               base = kvm_call_hyp_nvhe(__pkvm_create_private_mapping,
> -                                        phys_addr, size, prot);
> -               if (IS_ERR_OR_NULL((void *)base))
> -                       return PTR_ERR((void *)base);
> -               *haddr = base;
> -
> -               return 0;
> -       }
>
>         mutex_lock(&kvm_hyp_pgd_mutex);
>
> @@ -484,8 +478,8 @@ static int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
>          *
>          * The allocated size is always a multiple of PAGE_SIZE.
>          */
> -       size = PAGE_ALIGN(size + offset_in_page(phys_addr));
> -       base = io_map_base - size;
> +       base = io_map_base - PAGE_ALIGN(size);
> +       base = ALIGN_DOWN(base, align);
>
>         /*
>          * Verify that BIT(VA_BITS - 1) hasn't been flipped by
> @@ -493,20 +487,42 @@ static int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
>          * overflowed the idmap/IO address range.
>          */
>         if ((base ^ io_map_base) & BIT(VA_BITS - 1))
> -               ret = -ENOMEM;
> +               base = (unsigned long)ERR_PTR(-ENOMEM);
>         else
>                 io_map_base = base;
>
>         mutex_unlock(&kvm_hyp_pgd_mutex);
>
> -       if (ret)
> -               goto out;
> +       return base;
> +}
> +
> +int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
> +                               size_t align, unsigned long *haddr,
> +                               enum kvm_pgtable_prot prot)
> +{
> +       unsigned long addr;
> +       int ret = 0;
> +
> +       if (!kvm_host_owns_hyp_mappings()) {
> +               addr = kvm_call_hyp_nvhe(__pkvm_create_private_mapping,
> +                                        phys_addr, size, prot);
> +               if (IS_ERR_OR_NULL((void *)addr))
> +                       return addr ? PTR_ERR((void *)addr) : -ENOMEM;
> +               *haddr = addr;
> +
> +               return 0;
> +       }
> +
> +       size += offset_in_page(phys_addr);

You're not page-aligning the size, which was the behavior before this
patch. However, looking at where it's being used it seems to be fine
because the users of size would align it if necessary.

Thanks,
/fuad



> +       addr = hyp_alloc_private_va_range(size, align);
> +       if (IS_ERR_OR_NULL((void *)addr))
> +               return addr ? PTR_ERR((void *)addr) : -ENOMEM;
>
> -       ret = __create_hyp_mappings(base, size, phys_addr, prot);
> +       ret = __create_hyp_mappings(addr, size, phys_addr, prot);
>         if (ret)
>                 goto out;
>
> -       *haddr = base + offset_in_page(phys_addr);
> +       *haddr = addr + offset_in_page(phys_addr);
>  out:
>         return ret;
>  }
> @@ -537,7 +553,7 @@ int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
>                 return 0;
>         }
>
> -       ret = __create_hyp_private_mapping(phys_addr, size,
> +       ret = __create_hyp_private_mapping(phys_addr, size, PAGE_SIZE,
>                                            &addr, PAGE_HYP_DEVICE);
>         if (ret) {
>                 iounmap(*kaddr);
> @@ -564,7 +580,7 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
>
>         BUG_ON(is_kernel_in_hyp_mode());
>
> -       ret = __create_hyp_private_mapping(phys_addr, size,
> +       ret = __create_hyp_private_mapping(phys_addr, size, PAGE_SIZE,
>                                            &addr, PAGE_HYP_EXEC);
>         if (ret) {
>                 *haddr = NULL;
> --
> 2.35.1.473.g83b2b277ed-goog
>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v3 2/8] KVM: arm64: Introduce pkvm_alloc_private_va_range()
  2022-02-24  5:13   ` Kalesh Singh
  (?)
@ 2022-02-24 12:25     ` Fuad Tabba
  -1 siblings, 0 replies; 60+ messages in thread
From: Fuad Tabba @ 2022-02-24 12:25 UTC (permalink / raw)
  To: Kalesh Singh
  Cc: will, maz, qperret, surenb, kernel-team, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Catalin Marinas,
	Mark Rutland, Mark Brown, Masami Hiramatsu, Peter Collingbourne,
	Madhavan T. Venkataraman, Andrew Walbran, Andrew Scull,
	Paolo Bonzini, Ard Biesheuvel, linux-arm-kernel, kvmarm,
	linux-kernel

Hi Kalesh,

I really like how this makes the code cleaner in general. A couple of
small nits below.

On Thu, Feb 24, 2022 at 5:17 AM 'Kalesh Singh' via kernel-team
<kernel-team@android.com> wrote:
>
> pkvm_alloc_private_va_range() can be used to reserve private VA ranges
> in the pKVM nVHE hypervisor. Also update __pkvm_create_private_mapping()
> to allow specifying an alignment for the private VA mapping.
>
> These will be used to implement stack guard pages for pKVM nVHE hypervisor
> (in a subsequent patch in the series).
>
> Credits to Quentin Perret <qperret@google.com> for the idea of moving
> private VA allocation out of __pkvm_create_private_mapping()
>
> Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> ---
>
> Changes in v3:
>   - Handle null ptr in IS_ERR_OR_NULL checks, per Mark
>
> Changes in v2:
>   - Allow specifying an alignment for the private VA allocations, per Marc
>
>  arch/arm64/kvm/hyp/include/nvhe/mm.h |  3 +-
>  arch/arm64/kvm/hyp/nvhe/hyp-main.c   |  5 +--
>  arch/arm64/kvm/hyp/nvhe/mm.c         | 51 ++++++++++++++++++----------
>  arch/arm64/kvm/mmu.c                 |  2 +-
>  4 files changed, 40 insertions(+), 21 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/mm.h b/arch/arm64/kvm/hyp/include/nvhe/mm.h
> index 2d08510c6cc1..05d06ad00347 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/mm.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/mm.h
> @@ -20,7 +20,8 @@ int pkvm_cpu_set_vector(enum arm64_hyp_spectre_vector slot);
>  int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot);
>  int pkvm_create_mappings_locked(void *from, void *to, enum kvm_pgtable_prot prot);
>  unsigned long __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
> -                                           enum kvm_pgtable_prot prot);
> +                                       size_t align, enum kvm_pgtable_prot prot);

Minor nit: the alignment of this does not match how it was before,
i.e., it's not in line with the other function parameters. Yet it
still goes over 80 characters.

> +unsigned long pkvm_alloc_private_va_range(size_t size, size_t align);
>
>  static inline void hyp_vmemmap_range(phys_addr_t phys, unsigned long size,
>                                      unsigned long *start, unsigned long *end)
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> index 5e2197db0d32..96b2312a0f1d 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> @@ -158,9 +158,10 @@ static void handle___pkvm_create_private_mapping(struct kvm_cpu_context *host_ct
>  {
>         DECLARE_REG(phys_addr_t, phys, host_ctxt, 1);
>         DECLARE_REG(size_t, size, host_ctxt, 2);
> -       DECLARE_REG(enum kvm_pgtable_prot, prot, host_ctxt, 3);
> +       DECLARE_REG(size_t, align, host_ctxt, 3);
> +       DECLARE_REG(enum kvm_pgtable_prot, prot, host_ctxt, 4);
>
> -       cpu_reg(host_ctxt, 1) = __pkvm_create_private_mapping(phys, size, prot);
> +       cpu_reg(host_ctxt, 1) = __pkvm_create_private_mapping(phys, size, align, prot);
>  }
>
>  static void handle___pkvm_prot_finalize(struct kvm_cpu_context *host_ctxt)
> diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
> index 526a7d6fa86f..f35468ec639d 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mm.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mm.c
> @@ -37,26 +37,46 @@ static int __pkvm_create_mappings(unsigned long start, unsigned long size,
>         return err;
>  }
>
> -unsigned long __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
> -                                           enum kvm_pgtable_prot prot)
> +/*
> + * Allocates a private VA range above __io_map_base.
> + *
> + * @size:      The size of the VA range to reserve.
> + * @align:     The required alignment for the allocation.
> + */
> +unsigned long pkvm_alloc_private_va_range(size_t size, size_t align)
>  {
> -       unsigned long addr;
> -       int err;
> +       unsigned long base, addr;
>
>         hyp_spin_lock(&pkvm_pgd_lock);
>
> -       size = PAGE_ALIGN(size + offset_in_page(phys));
> -       addr = __io_map_base;
> -       __io_map_base += size;
> +       addr = ALIGN(__io_map_base, align);
> +
> +       /* The allocated size is always a multiple of PAGE_SIZE */
> +       base = addr + PAGE_ALIGN(size);
>
>         /* Are we overflowing on the vmemmap ? */
> -       if (__io_map_base > __hyp_vmemmap) {
> -               __io_map_base -= size;
> +       if (base > __hyp_vmemmap)
>                 addr = (unsigned long)ERR_PTR(-ENOMEM);
> +       else
> +               __io_map_base = base;
> +
> +       hyp_spin_unlock(&pkvm_pgd_lock);
> +
> +       return addr;
> +}
> +
> +unsigned long __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
> +                                       size_t align, enum kvm_pgtable_prot prot)
> +{
> +       unsigned long addr;
> +       int err;
> +
> +       size += offset_in_page(phys);

Same as in the previous patch: the old code page-aligned the size, but
this change does not. However, looking at the callers and callees, this
seems to be fine, since the size is aligned where needed.
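
As a rough illustration, the upward allocation in the quoted hunk can be
modelled in user space as below. This is only a sketch under assumptions:
the sketch_* names, the 4 KiB page size and the address values are
invented for the example, sketch_limit merely stands in for
__hyp_vmemmap, a return value of 0 replaces ERR_PTR(-ENOMEM), and the
spinlock is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed 4 KiB pages; the kernel takes PAGE_SIZE from its configuration. */
#define SKETCH_PAGE_SIZE	4096UL
#define SKETCH_ALIGN(x, a)	(((x) + (a) - 1) & ~((a) - 1))
#define SKETCH_PAGE_ALIGN(x)	SKETCH_ALIGN((x), SKETCH_PAGE_SIZE)

/* Hypothetical stand-ins for __io_map_base and __hyp_vmemmap. */
static unsigned long sketch_io_map_base = 0x10000000UL;
static const unsigned long sketch_limit = 0x10010000UL;

/* Returns the start of the reserved range, or 0 if it would cross the limit. */
static unsigned long sketch_alloc_up(size_t size, size_t align)
{
	unsigned long addr, end;

	assert(align && (align & (align - 1)) == 0);	/* power of two */

	/* Align the start first, then reserve a whole number of pages. */
	addr = SKETCH_ALIGN(sketch_io_map_base, align);
	end = addr + SKETCH_PAGE_ALIGN(size);

	if (end > sketch_limit)
		return 0;	/* the base is left untouched on failure */

	sketch_io_map_base = end;
	return addr;
}
```

Because the base is only updated on success, the sketch needs no
roll-back step on the failure path, which matches the shape of the new
code in the hunk.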

Thanks,
/fuad

> +       addr = pkvm_alloc_private_va_range(size, align);
> +       if (IS_ERR((void *)addr))
>                 goto out;
> -       }
>
> -       err = kvm_pgtable_hyp_map(&pkvm_pgtable, addr, size, phys, prot);
> +       err = __pkvm_create_mappings(addr, size, phys, prot);
>         if (err) {
>                 addr = (unsigned long)ERR_PTR(err);
>                 goto out;
> @@ -64,8 +84,6 @@ unsigned long __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
>
>         addr = addr + offset_in_page(phys);
>  out:
> -       hyp_spin_unlock(&pkvm_pgd_lock);
> -
>         return addr;
>  }
>
> @@ -152,11 +170,10 @@ int hyp_map_vectors(void)
>                 return 0;
>
>         phys = __hyp_pa(__bp_harden_hyp_vecs);
> -       bp_base = (void *)__pkvm_create_private_mapping(phys,
> -                                                       __BP_HARDEN_HYP_VECS_SZ,
> -                                                       PAGE_HYP_EXEC);
> +       bp_base = (void *)__pkvm_create_private_mapping(phys, __BP_HARDEN_HYP_VECS_SZ,
> +                                                       PAGE_SIZE, PAGE_HYP_EXEC);
>         if (IS_ERR_OR_NULL(bp_base))
> -               return PTR_ERR(bp_base);
> +               return bp_base ? PTR_ERR(bp_base) : -ENOMEM;
>
>         __hyp_bp_vect_base = bp_base;
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index fc09536c8197..298e6d8439ef 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -505,7 +505,7 @@ int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
>
>         if (!kvm_host_owns_hyp_mappings()) {
>                 addr = kvm_call_hyp_nvhe(__pkvm_create_private_mapping,
> -                                        phys_addr, size, prot);
> +                                        phys_addr, size, align, prot);
>                 if (IS_ERR_OR_NULL((void *)addr))
>                         return addr ? PTR_ERR((void *)addr) : -ENOMEM;
>                 *haddr = addr;
> --
> 2.35.1.473.g83b2b277ed-goog
>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v3 2/8] KVM: arm64: Introduce pkvm_alloc_private_va_range()
@ 2022-02-24 12:25     ` Fuad Tabba
  0 siblings, 0 replies; 60+ messages in thread
From: Fuad Tabba @ 2022-02-24 12:25 UTC (permalink / raw)
  To: Kalesh Singh
  Cc: will, maz, qperret, surenb, kernel-team, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Catalin Marinas,
	Mark Rutland, Mark Brown, Masami Hiramatsu, Peter Collingbourne,
	Madhavan T. Venkataraman, Andrew Walbran, Andrew Scull,
	Paolo Bonzini, Ard Biesheuvel, linux-arm-kernel, kvmarm,
	linux-kernel

Hi Kalesh,

I really like how this makes the code cleaner in general. A couple of
small nits below.

On Thu, Feb 24, 2022 at 5:17 AM 'Kalesh Singh' via kernel-team
<kernel-team@android.com> wrote:
>
> pkvm_hyp_alloc_private_va_range() can be used to reserve private VA ranges
> in the pKVM nVHE hypervisor (). Also update __pkvm_create_private_mapping()
> to allow specifying an alignment for the private VA mapping.
>
> These will be used to implement stack guard pages for pKVM nVHE hypervisor
> (in a subsequent patch in the series).
>
> Credits to Quentin Perret <qperret@google.com> for the idea of moving
> private VA allocation out of __pkvm_create_private_mapping()
>
> Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> ---
>
> Changes in v3:
>   - Handle null ptr in IS_ERR_OR_NULL checks, per Mark
>
> Changes in v2:
>   - Allow specifying an alignment for the private VA allocations, per Marc
>
>  arch/arm64/kvm/hyp/include/nvhe/mm.h |  3 +-
>  arch/arm64/kvm/hyp/nvhe/hyp-main.c   |  5 +--
>  arch/arm64/kvm/hyp/nvhe/mm.c         | 51 ++++++++++++++++++----------
>  arch/arm64/kvm/mmu.c                 |  2 +-
>  4 files changed, 40 insertions(+), 21 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/mm.h b/arch/arm64/kvm/hyp/include/nvhe/mm.h
> index 2d08510c6cc1..05d06ad00347 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/mm.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/mm.h
> @@ -20,7 +20,8 @@ int pkvm_cpu_set_vector(enum arm64_hyp_spectre_vector slot);
>  int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot);
>  int pkvm_create_mappings_locked(void *from, void *to, enum kvm_pgtable_prot prot);
>  unsigned long __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
> -                                           enum kvm_pgtable_prot prot);
> +                                       size_t align, enum kvm_pgtable_prot prot);

Minor nit: the alignment of this does not match how it was before,
i.e., it's not in line with the other function parameters. Yet it
still goes over 80 characters.

> +unsigned long pkvm_alloc_private_va_range(size_t size, size_t align);
>
>  static inline void hyp_vmemmap_range(phys_addr_t phys, unsigned long size,
>                                      unsigned long *start, unsigned long *end)
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> index 5e2197db0d32..96b2312a0f1d 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> @@ -158,9 +158,10 @@ static void handle___pkvm_create_private_mapping(struct kvm_cpu_context *host_ct
>  {
>         DECLARE_REG(phys_addr_t, phys, host_ctxt, 1);
>         DECLARE_REG(size_t, size, host_ctxt, 2);
> -       DECLARE_REG(enum kvm_pgtable_prot, prot, host_ctxt, 3);
> +       DECLARE_REG(size_t, align, host_ctxt, 3);
> +       DECLARE_REG(enum kvm_pgtable_prot, prot, host_ctxt, 4);
>
> -       cpu_reg(host_ctxt, 1) = __pkvm_create_private_mapping(phys, size, prot);
> +       cpu_reg(host_ctxt, 1) = __pkvm_create_private_mapping(phys, size, align, prot);
>  }
>
>  static void handle___pkvm_prot_finalize(struct kvm_cpu_context *host_ctxt)
> diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
> index 526a7d6fa86f..f35468ec639d 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mm.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mm.c
> @@ -37,26 +37,46 @@ static int __pkvm_create_mappings(unsigned long start, unsigned long size,
>         return err;
>  }
>
> -unsigned long __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
> -                                           enum kvm_pgtable_prot prot)
> +/*
> + * Allocates a private VA range above __io_map_base.
> + *
> + * @size:      The size of the VA range to reserve.
> + * @align:     The required alignment for the allocation.
> + */
> +unsigned long pkvm_alloc_private_va_range(size_t size, size_t align)
>  {
> -       unsigned long addr;
> -       int err;
> +       unsigned long base, addr;
>
>         hyp_spin_lock(&pkvm_pgd_lock);
>
> -       size = PAGE_ALIGN(size + offset_in_page(phys));
> -       addr = __io_map_base;
> -       __io_map_base += size;
> +       addr = ALIGN(__io_map_base, align);
> +
> +       /* The allocated size is always a multiple of PAGE_SIZE */
> +       base = addr + PAGE_ALIGN(size);
>
>         /* Are we overflowing on the vmemmap ? */
> -       if (__io_map_base > __hyp_vmemmap) {
> -               __io_map_base -= size;
> +       if (base > __hyp_vmemmap)
>                 addr = (unsigned long)ERR_PTR(-ENOMEM);
> +       else
> +               __io_map_base = base;
> +
> +       hyp_spin_unlock(&pkvm_pgd_lock);
> +
> +       return addr;
> +}
> +
> +unsigned long __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
> +                                       size_t align, enum kvm_pgtable_prot prot)
> +{
> +       unsigned long addr;
> +       int err;
> +
> +       size += offset_in_page(phys);

Same as in the patch before, the previous code would align the size
but not this change. However, looking at the callers and callees this
seems to be fine, since it's aligned when needed.

Thanks,
/fuad

> +       addr = pkvm_alloc_private_va_range(size, align);
> +       if (IS_ERR((void *)addr))
>                 goto out;
> -       }
>
> -       err = kvm_pgtable_hyp_map(&pkvm_pgtable, addr, size, phys, prot);
> +       err = __pkvm_create_mappings(addr, size, phys, prot);
>         if (err) {
>                 addr = (unsigned long)ERR_PTR(err);
>                 goto out;
> @@ -64,8 +84,6 @@ unsigned long __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
>
>         addr = addr + offset_in_page(phys);
>  out:
> -       hyp_spin_unlock(&pkvm_pgd_lock);
> -
>         return addr;
>  }
>
> @@ -152,11 +170,10 @@ int hyp_map_vectors(void)
>                 return 0;
>
>         phys = __hyp_pa(__bp_harden_hyp_vecs);
> -       bp_base = (void *)__pkvm_create_private_mapping(phys,
> -                                                       __BP_HARDEN_HYP_VECS_SZ,
> -                                                       PAGE_HYP_EXEC);
> +       bp_base = (void *)__pkvm_create_private_mapping(phys, __BP_HARDEN_HYP_VECS_SZ,
> +                                                       PAGE_SIZE, PAGE_HYP_EXEC);
>         if (IS_ERR_OR_NULL(bp_base))
> -               return PTR_ERR(bp_base);
> +               return bp_base ? PTR_ERR(bp_base) : -ENOMEM;
>
>         __hyp_bp_vect_base = bp_base;
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index fc09536c8197..298e6d8439ef 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -505,7 +505,7 @@ int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
>
>         if (!kvm_host_owns_hyp_mappings()) {
>                 addr = kvm_call_hyp_nvhe(__pkvm_create_private_mapping,
> -                                        phys_addr, size, prot);
> +                                        phys_addr, size, align, prot);
>                 if (IS_ERR_OR_NULL((void *)addr))
>                         return addr ? PTR_ERR((void *)addr) : -ENOMEM;
>                 *haddr = addr;
> --
> 2.35.1.473.g83b2b277ed-goog
>

* Re: [PATCH v3 3/8] KVM: arm64: Add guard pages for KVM nVHE hypervisor stack
  2022-02-24  5:13   ` Kalesh Singh
  (?)
@ 2022-02-24 12:26     ` Fuad Tabba
  -1 siblings, 0 replies; 60+ messages in thread
From: Fuad Tabba @ 2022-02-24 12:26 UTC (permalink / raw)
  To: Kalesh Singh
  Cc: will, maz, qperret, surenb, kernel-team, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Catalin Marinas,
	Mark Rutland, Mark Brown, Masami Hiramatsu, Peter Collingbourne,
	Madhavan T. Venkataraman, Andrew Walbran, Andrew Scull,
	Paolo Bonzini, linux-arm-kernel, kvmarm, linux-kernel

Hi Kalesh,



On Thu, Feb 24, 2022 at 5:18 AM Kalesh Singh <kaleshsingh@google.com> wrote:
>
> Maps the stack pages in the flexible private VA range and allocates
> guard pages below the stack as unbacked VA space. The stack is aligned
> to twice its size to aid overflow detection (implemented in a subsequent
> patch in the series).
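
As a side note on why aligning the stack to twice its size helps: with a
stack of size S whose base is a multiple of 2*S, every address inside the
stack has bit log2(S) clear, while every address in the S bytes just below
the base (where the guard page sits) has it set. The following is only a
userspace sketch of that arithmetic, not code from this series.
SKETCH_STACK_SHIFT and both helper names are made up for illustration, and
a 4 KiB page is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values: a one-page stack with 4 KiB pages. */
#define SKETCH_STACK_SHIFT 12u
#define SKETCH_STACK_SIZE  ((uintptr_t)1 << SKETCH_STACK_SHIFT)

/*
 * With the base aligned to 2 * SKETCH_STACK_SIZE, addresses in
 * [base, base + size) have bit SKETCH_STACK_SHIFT clear and addresses in
 * the guard region [base - size, base) have it set.
 */
static bool sketch_sp_overflowed(uintptr_t sp)
{
	return (sp >> SKETCH_STACK_SHIFT) & 1u;
}

/* Round addr down to a 2 * size boundary, as an allocator might. */
static uintptr_t sketch_align_stack_base(uintptr_t addr)
{
	return addr & ~((SKETCH_STACK_SIZE << 1) - 1);
}
```

Note that in this sketch the initial SP (base + size) also has the bit set,
so such a test is only meaningful on an SP that has already been
decremented into the stack.
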
>
> Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> ---
>
> Changes in v3:
>   - Handle null ptr in IS_ERR_OR_NULL checks, per Mark
>
>  arch/arm64/include/asm/kvm_asm.h |  1 +
>  arch/arm64/kvm/arm.c             | 32 +++++++++++++++++++++++++++++---
>  2 files changed, 30 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index d5b0386ef765..2e277f2ed671 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -169,6 +169,7 @@ struct kvm_nvhe_init_params {
>         unsigned long tcr_el2;
>         unsigned long tpidr_el2;
>         unsigned long stack_hyp_va;
> +       unsigned long stack_pa;
>         phys_addr_t pgd_pa;
>         unsigned long hcr_el2;
>         unsigned long vttbr;
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index ecc5958e27fe..7a23630c4a7f 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -1541,7 +1541,6 @@ static void cpu_prepare_hyp_mode(int cpu)
>         tcr |= (idmap_t0sz & GENMASK(TCR_TxSZ_WIDTH - 1, 0)) << TCR_T0SZ_OFFSET;
>         params->tcr_el2 = tcr;
>
> -       params->stack_hyp_va = kern_hyp_va(per_cpu(kvm_arm_hyp_stack_page, cpu) + PAGE_SIZE);
>         params->pgd_pa = kvm_mmu_get_httbr();
>         if (is_protected_kvm_enabled())
>                 params->hcr_el2 = HCR_HOST_NVHE_PROTECTED_FLAGS;
> @@ -1990,14 +1989,41 @@ static int init_hyp_mode(void)
>          * Map the Hyp stack pages
>          */
>         for_each_possible_cpu(cpu) {
> +               struct kvm_nvhe_init_params *params = per_cpu_ptr_nvhe_sym(kvm_init_params, cpu);
>                 char *stack_page = (char *)per_cpu(kvm_arm_hyp_stack_page, cpu);
> -               err = create_hyp_mappings(stack_page, stack_page + PAGE_SIZE,
> -                                         PAGE_HYP);
> +               unsigned long stack_hyp_va, guard_hyp_va;
>
> +               /*
> +                * Private mappings are allocated downwards from io_map_base
> +                * so allocate the stack first then the guard page.
> +                *
> +                * The stack is aligned to twice its size to facilitate overflow
> +                * detection.
> +                */
> +               err = __create_hyp_private_mapping(__pa(stack_page), PAGE_SIZE,
> +                                               PAGE_SIZE * 2, &stack_hyp_va, PAGE_HYP);
>                 if (err) {
>                         kvm_err("Cannot map hyp stack\n");
>                         goto out_err;
>                 }
> +
> +               /* Allocate unbacked private VA range for stack guard page */
> +               guard_hyp_va = hyp_alloc_private_va_range(PAGE_SIZE, PAGE_SIZE);
> +               if (IS_ERR_OR_NULL((void *)guard_hyp_va)) {
> +                       err = guard_hyp_va ? PTR_ERR((void *)guard_hyp_va) : -ENOMEM;

I am a bit confused by this check. hyp_alloc_private_va_range() always
returns ERR_PTR(-ENOMEM) if there's an error, so the NULL case should
not arise here. Mark's comment (if I understood it correctly) was about
how you were handling errors *in* hyp_alloc_private_va_range(), rather
than in calls *to* hyp_alloc_private_va_range().
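
To illustrate the distinction: if an allocator reports failure only through
ERR_PTR() values, a caller can rely on IS_ERR() alone, and the NULL branch
of an IS_ERR_OR_NULL() check never triggers. The sketch below uses
userspace stand-ins for the <linux/err.h> helpers and a hypothetical
allocator, sketch_alloc_va(). It is not hyp_alloc_private_va_range()
itself, whose exact behaviour should be checked against the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins that mimic the kernel's <linux/err.h> helpers. */
#define SKETCH_MAX_ERRNO 4095

static inline void *sketch_err_ptr(intptr_t err) { return (void *)err; }
static inline intptr_t sketch_ptr_err(const void *p) { return (intptr_t)p; }

static inline bool sketch_is_err(const void *p)
{
	/* Error values occupy the top SKETCH_MAX_ERRNO addresses. */
	return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

static inline bool sketch_is_err_or_null(const void *p)
{
	return !p || sketch_is_err(p);
}

/*
 * Hypothetical downward allocator: on failure it returns
 * ERR_PTR(-ENOMEM) and never 0. Assumes *cursor >= floor.
 */
static uintptr_t sketch_alloc_va(uintptr_t *cursor, uintptr_t floor,
				 uintptr_t size)
{
	if (*cursor - floor < size)
		return (uintptr_t)sketch_err_ptr(-ENOMEM);
	*cursor -= size;
	return *cursor;
}

/* Caller side: given the contract above, IS_ERR() alone is sufficient. */
static int sketch_map_guard(uintptr_t *cursor, uintptr_t floor,
			    uintptr_t size, uintptr_t *va)
{
	uintptr_t addr = sketch_alloc_va(cursor, floor, size);

	if (sketch_is_err((void *)addr))
		return (int)sketch_ptr_err((void *)addr);
	*va = addr;
	return 0;
}
```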

> +                       kvm_err("Cannot allocate hyp stack guard page\n");
> +                       goto out_err;
> +               }
> +
> +               /*
> +                * Save the stack PA in nvhe_init_params. This will be needed to recreate
> +                * the stack mapping in protected nVHE mode. __hyp_pa() won't do the right
> +                * thing there, since the stack has been mapped in the flexible private
> +                * VA space.
> +                */

Nit: These comments go over 80 columns, unlike other comments that
you've added in this file.

Thanks,
/fuad

> +               params->stack_pa = __pa(stack_page) + PAGE_SIZE;
> +
> +               params->stack_hyp_va = stack_hyp_va + PAGE_SIZE;
>         }
>
>         for_each_possible_cpu(cpu) {
> --
> 2.35.1.473.g83b2b277ed-goog
>


* Re: [PATCH v3 6/8] KVM: arm64: Add hypervisor overflow stack
  2022-02-24  5:13   ` Kalesh Singh
  (?)
@ 2022-02-24 12:26     ` Fuad Tabba
  -1 siblings, 0 replies; 60+ messages in thread
From: Fuad Tabba @ 2022-02-24 12:26 UTC (permalink / raw)
  To: Kalesh Singh
  Cc: will, maz, qperret, surenb, kernel-team, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Catalin Marinas,
	Mark Rutland, Mark Brown, Masami Hiramatsu, Peter Collingbourne,
	Madhavan T. Venkataraman, Andrew Walbran, Andrew Scull,
	Zenghui Yu, Ard Biesheuvel, Paolo Bonzini, linux-arm-kernel,
	kvmarm, linux-kernel

Hi Kalesh,

On Thu, Feb 24, 2022 at 5:21 AM Kalesh Singh <kaleshsingh@google.com> wrote:
>
> Allocate and switch to a 16-byte aligned secondary stack on overflow. This
> provides us stack space to better handle overflows, and is used in
> a subsequent patch to dump the hypervisor stacktrace. The overflow stack
> is only allocated if CONFIG_NVHE_EL2_DEBUG is enabled, as hypervisor
> stacktraces are a debug feature dependent on CONFIG_NVHE_EL2_DEBUG.
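
For reference, the shape of such an overflow stack can be sketched in plain
C: one page of longs with 16-byte alignment (AArch64 generally expects SP
to be 16-byte aligned when it is used to access memory), and an initial SP
one past the highest element because the stack grows down. The names and
the 4 KiB page size below are assumptions for illustration only. The patch
itself uses DEFINE_PER_CPU and sets SP from assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical page size for this sketch. */
#define SKETCH_PAGE_SIZE 4096u

/* One page worth of longs, aligned to 16 bytes. */
static _Alignas(16) unsigned long
	sketch_overflow_stack[SKETCH_PAGE_SIZE / sizeof(long)];

/* Stacks grow down, so the initial SP is one past the last element. */
static uintptr_t sketch_overflow_stack_top(void)
{
	return (uintptr_t)sketch_overflow_stack + sizeof(sketch_overflow_stack);
}
```
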
>
> Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> ---
>  arch/arm64/kvm/hyp/nvhe/host.S   | 5 +++++
>  arch/arm64/kvm/hyp/nvhe/switch.c | 5 +++++
>  2 files changed, 10 insertions(+)
>
> diff --git a/arch/arm64/kvm/hyp/nvhe/host.S b/arch/arm64/kvm/hyp/nvhe/host.S
> index 749961bfa5ba..367a01e8abed 100644
> --- a/arch/arm64/kvm/hyp/nvhe/host.S
> +++ b/arch/arm64/kvm/hyp/nvhe/host.S
> @@ -179,6 +179,10 @@ SYM_FUNC_END(__host_hvc)
>         b       hyp_panic
>
>  .L__hyp_sp_overflow\@:
> +#ifdef CONFIG_NVHE_EL2_DEBUG
> +       /* Switch to the overflow stack */
> +       adr_this_cpu sp, hyp_overflow_stack + PAGE_SIZE, x0
> +#else
>         /*
>          * Reset SP to the top of the stack, to allow handling the hyp_panic.
>          * This corrupts the stack but is ok, since we won't be attempting
> @@ -186,6 +190,7 @@ SYM_FUNC_END(__host_hvc)
>          */

Nit: Maybe you should update this comment as well, since whether it
corrupts the stack or not depends on what happens above with
CONFIG_NVHE_EL2_DEBUG.

Thanks,
/fuad

>         ldr_this_cpu    x0, kvm_init_params + NVHE_INIT_STACK_HYP_VA, x1
>         mov     sp, x0
> +#endif
>
>         bl      hyp_panic_bad_stack
>         ASM_BUG()
> diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
> index 703a5d3f611b..efc20273a352 100644
> --- a/arch/arm64/kvm/hyp/nvhe/switch.c
> +++ b/arch/arm64/kvm/hyp/nvhe/switch.c
> @@ -34,6 +34,11 @@ DEFINE_PER_CPU(struct kvm_host_data, kvm_host_data);
>  DEFINE_PER_CPU(struct kvm_cpu_context, kvm_hyp_ctxt);
>  DEFINE_PER_CPU(unsigned long, kvm_hyp_vector);
>
> +#ifdef CONFIG_NVHE_EL2_DEBUG
> +DEFINE_PER_CPU(unsigned long [PAGE_SIZE/sizeof(long)], hyp_overflow_stack)
> +       __aligned(16);
> +#endif
> +
>  static void __activate_traps(struct kvm_vcpu *vcpu)
>  {
>         u64 val;
> --
> 2.35.1.473.g83b2b277ed-goog
>


* Re: [PATCH v3 7/8] KVM: arm64: Unwind and dump nVHE HYP stacktrace
  2022-02-24  5:13   ` Kalesh Singh
  (?)
@ 2022-02-24 12:28     ` Fuad Tabba
  -1 siblings, 0 replies; 60+ messages in thread
From: Fuad Tabba @ 2022-02-24 12:28 UTC (permalink / raw)
  To: Kalesh Singh
  Cc: will, maz, qperret, surenb, kernel-team, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Catalin Marinas,
	Mark Rutland, Mark Brown, Masami Hiramatsu, Peter Collingbourne,
	Madhavan T. Venkataraman, Andrew Scull, Paolo Bonzini,
	Ard Biesheuvel, linux-arm-kernel, kvmarm, linux-kernel

Hi Kalesh,

On Thu, Feb 24, 2022 at 5:22 AM Kalesh Singh <kaleshsingh@google.com> wrote:
>
> Unwind the stack in EL1, when CONFIG_NVHE_EL2_DEBUG is enabled. This is
> possible because CONFIG_NVHE_EL2_DEBUG disables the host stage 2 protection,
> which allows the host to access the hypervisor stack pages in EL1.

If my understanding is correct, this comment would be clearer if it
said that CONFIG_NVHE_EL2_DEBUG allows host stage 2 protection to be
disabled on a hyp_panic. As written, a reader might think that a
CONFIG_NVHE_EL2_DEBUG kernel always runs without host stage 2
protection.

>
> Unwinding and dumping hyp call traces is gated on CONFIG_NVHE_EL2_DEBUG
> to avoid the potential leaking of information to the host.
>
> A simple stack overflow test produces the following output:
>
> [  580.376051][  T412] kvm: nVHE hyp panic at: ffffffc0116145c4!
> [  580.378034][  T412] kvm [412]: nVHE HYP call trace:
> [  580.378591][  T412] kvm [412]:  [<ffffffc011614934>]
> [  580.378993][  T412] kvm [412]:  [<ffffffc01160fa48>]
> [  580.379386][  T412] kvm [412]:  [<ffffffc0116145dc>]  // Non-terminating recursive call
> [  580.379772][  T412] kvm [412]:  [<ffffffc0116145dc>]
> [  580.380158][  T412] kvm [412]:  [<ffffffc0116145dc>]
> [  580.380544][  T412] kvm [412]:  [<ffffffc0116145dc>]
> [  580.380928][  T412] kvm [412]:  [<ffffffc0116145dc>]
> . . .
>
> Since nVHE hyp symbols are not included by kallsyms to avoid issues
> with aliasing, we fall back to the vmlinux addresses. Symbolizing the
> addresses is handled in the next patch in this series.
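
For readers less familiar with how a trace like the one above is produced:
an AArch64 frame record is a pair {caller's fp, return address}, and the
unwinder follows the fp chain upwards, rejecting misaligned records and
records that do not move towards the stack base. The sketch below is a
simplified userspace model of that idea with an fp translation hook, in the
spirit of the translate_fp callback in this patch. It omits stack types and
stacks_done tracking, and every sketch_* name, the fake SKETCH_HYP_BASE
address and the sample return addresses are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fake "hyp VA" at which the simulated stack below appears to live. */
#define SKETCH_HYP_BASE UINT64_C(0x8000)

/* Three frame records: {caller's fp, return address}; fp 0 ends the chain. */
static _Alignas(16) uint64_t sketch_stack[6] = {
	SKETCH_HYP_BASE + 16, 0x1111,
	SKETCH_HYP_BASE + 32, 0x2222,
	0,                    0x3333,
};

struct sketch_frame {
	uint64_t fp;
	uint64_t pc;
	uint64_t prev_fp;	/* 0 means "no previous record yet" */
};

/* Translates a foreign fp to a host pointer; returns 0 on failure. */
typedef uintptr_t (*sketch_translate_fn)(uint64_t fp);

static uintptr_t sketch_translate(uint64_t fp)
{
	if (fp < SKETCH_HYP_BASE ||
	    fp - SKETCH_HYP_BASE > sizeof(sketch_stack) - 16)
		return 0;
	return (uintptr_t)sketch_stack + (uintptr_t)(fp - SKETCH_HYP_BASE);
}

/* Step one record up the chain; returns 0 on success, -1 to stop. */
static int sketch_unwind_frame(struct sketch_frame *frame,
			       sketch_translate_fn translate)
{
	const uint64_t *rec;
	uintptr_t host;
	uint64_t fp;

	assert(frame != NULL);
	fp = frame->fp;

	if (fp & 0x7)
		return -1;
	if (frame->prev_fp && fp <= frame->prev_fp)
		return -1;

	/* Record the untranslated fp before following it. */
	frame->prev_fp = fp;

	host = translate ? translate(fp) : (uintptr_t)fp;
	if (!host)
		return -1;

	rec = (const uint64_t *)host;
	frame->fp = rec[0];
	frame->pc = rec[1];
	return 0;
}

/* Walk from start_fp, storing up to max return addresses in pcs. */
static size_t sketch_walk(uint64_t start_fp, uint64_t *pcs, size_t max)
{
	struct sketch_frame frame = { .fp = start_fp, .pc = 0, .prev_fp = 0 };
	size_t n = 0;

	while (n < max && sketch_unwind_frame(&frame, sketch_translate) == 0)
		pcs[n++] = frame.pc;
	return n;
}
```

The monotonic fp check is also what stops a walk over a corrupted chain that
points back at itself, since the next fp would not be greater than prev_fp.
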
>
> Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> ---
>
> Changes in v3:
>   - The nvhe hyp stack unwinder now makes use of the core logic from the
>     regular kernel unwinder to avoid duplication, per Mark
>
> Changes in v2:
>   - Add cpu_prepare_nvhe_panic_info()
>   - Move updating the panic info to hyp_panic(), so that unwinding also
>     works for conventional nVHE Hyp-mode.
>
>  arch/arm64/include/asm/kvm_asm.h    |  19 +++
>  arch/arm64/include/asm/stacktrace.h |  12 ++
>  arch/arm64/kernel/stacktrace.c      | 210 +++++++++++++++++++++++++---
>  arch/arm64/kvm/Kconfig              |   5 +-
>  arch/arm64/kvm/arm.c                |   2 +-
>  arch/arm64/kvm/handle_exit.c        |   3 +
>  arch/arm64/kvm/hyp/nvhe/switch.c    |  18 +++
>  7 files changed, 243 insertions(+), 26 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index 2e277f2ed671..16efdf150a37 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -176,6 +176,25 @@ struct kvm_nvhe_init_params {
>         unsigned long vtcr;
>  };
>
> +#ifdef CONFIG_NVHE_EL2_DEBUG
> +/*
> + * Used by the host in EL1 to dump the nVHE hypervisor backtrace on
> + * hyp_panic. This is possible because CONFIG_NVHE_EL2_DEBUG disables
> + * the host stage 2 protection. See: __hyp_do_panic()

Same as my comment above.

> + *
> + * @hyp_stack_base:             hyp VA of the hyp_stack base.
> + * @hyp_overflow_stack_base:    hyp VA of the hyp_overflow_stack base.
> + * @fp:                         hyp FP where the backtrace begins.
> + * @pc:                         hyp PC where the backtrace begins.
> + */
> +struct kvm_nvhe_panic_info {
> +       unsigned long hyp_stack_base;
> +       unsigned long hyp_overflow_stack_base;
> +       unsigned long fp;
> +       unsigned long pc;
> +};
> +#endif /* CONFIG_NVHE_EL2_DEBUG */
> +
>  /* Translate a kernel address @ptr into its equivalent linear mapping */
>  #define kvm_ksym_ref(ptr)                                              \
>         ({                                                              \
> diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
> index e77cdef9ca29..18611a51cf14 100644
> --- a/arch/arm64/include/asm/stacktrace.h
> +++ b/arch/arm64/include/asm/stacktrace.h
> @@ -22,6 +22,10 @@ enum stack_type {
>         STACK_TYPE_OVERFLOW,
>         STACK_TYPE_SDEI_NORMAL,
>         STACK_TYPE_SDEI_CRITICAL,
> +#ifdef CONFIG_NVHE_EL2_DEBUG
> +       STACK_TYPE_KVM_NVHE_HYP,
> +       STACK_TYPE_KVM_NVHE_OVERFLOW,
> +#endif /* CONFIG_NVHE_EL2_DEBUG */
>         __NR_STACK_TYPES
>  };
>
> @@ -147,4 +151,12 @@ static inline bool on_accessible_stack(const struct task_struct *tsk,
>         return false;
>  }
>
> +#ifdef CONFIG_NVHE_EL2_DEBUG
> +void kvm_nvhe_dump_backtrace(unsigned long hyp_offset);
> +#else
> +static inline void kvm_nvhe_dump_backtrace(unsigned long hyp_offset)
> +{
> +}
> +#endif /* CONFIG_NVHE_EL2_DEBUG */
> +
>  #endif /* __ASM_STACKTRACE_H */
> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> index e4103e085681..6ec85cb69b1f 100644
> --- a/arch/arm64/kernel/stacktrace.c
> +++ b/arch/arm64/kernel/stacktrace.c
> @@ -15,6 +15,8 @@
>
>  #include <asm/irq.h>
>  #include <asm/pointer_auth.h>
> +#include <asm/kvm_asm.h>
> +#include <asm/kvm_hyp.h>
>  #include <asm/stack_pointer.h>
>  #include <asm/stacktrace.h>
>
> @@ -64,26 +66,15 @@ NOKPROBE_SYMBOL(start_backtrace);
>   * records (e.g. a cycle), determined based on the location and fp value of A
>   * and the location (but not the fp value) of B.
>   */
> -static int notrace unwind_frame(struct task_struct *tsk,
> -                               struct stackframe *frame)
> +static int notrace __unwind_frame(struct stackframe *frame, struct stack_info *info,
> +               unsigned long (*translate_fp)(unsigned long, enum stack_type))
>  {
>         unsigned long fp = frame->fp;
> -       struct stack_info info;
> -
> -       if (!tsk)
> -               tsk = current;
> -
> -       /* Final frame; nothing to unwind */
> -       if (fp == (unsigned long)task_pt_regs(tsk)->stackframe)
> -               return -ENOENT;
>
>         if (fp & 0x7)
>                 return -EINVAL;
>
> -       if (!on_accessible_stack(tsk, fp, 16, &info))
> -               return -EINVAL;
> -
> -       if (test_bit(info.type, frame->stacks_done))
> +       if (test_bit(info->type, frame->stacks_done))
>                 return -EINVAL;
>
>         /*
> @@ -94,28 +85,62 @@ static int notrace unwind_frame(struct task_struct *tsk,
>          *
>          * TASK -> IRQ -> OVERFLOW -> SDEI_NORMAL
>          * TASK -> SDEI_NORMAL -> SDEI_CRITICAL -> OVERFLOW
> +        * KVM_NVHE_HYP -> KVM_NVHE_OVERFLOW
>          *
>          * ... but the nesting itself is strict. Once we transition from one
>          * stack to another, it's never valid to unwind back to that first
>          * stack.
>          */
> -       if (info.type == frame->prev_type) {
> +       if (info->type == frame->prev_type) {
>                 if (fp <= frame->prev_fp)
>                         return -EINVAL;
>         } else {
>                 set_bit(frame->prev_type, frame->stacks_done);
>         }
>
> +       /* Record fp as prev_fp before attempting to get the next fp */
> +       frame->prev_fp = fp;
> +
> +       /*
> +        * If fp is not from the current address space perform the
> +        * necessary translation before dereferencing it to get next fp.
> +        */
> +       if (translate_fp)
> +               fp = translate_fp(fp, info->type);
> +       if (!fp)
> +               return -EINVAL;
> +
>         /*
>          * Record this frame record's values and location. The prev_fp and
> -        * prev_type are only meaningful to the next unwind_frame() invocation.
> +        * prev_type are only meaningful to the next __unwind_frame() invocation.
>          */
>         frame->fp = READ_ONCE_NOCHECK(*(unsigned long *)(fp));
>         frame->pc = READ_ONCE_NOCHECK(*(unsigned long *)(fp + 8));
> -       frame->prev_fp = fp;
> -       frame->prev_type = info.type;
> -
>         frame->pc = ptrauth_strip_insn_pac(frame->pc);
> +       frame->prev_type = info->type;
> +
> +       return 0;
> +}
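
For illustration, here is a minimal standalone sketch of the split that
__unwind_frame() makes above: the generic step checks alignment and that fp
moves strictly up the stack, records the untranslated fp, and only then asks
an optional callback to turn fp into something that can be dereferenced
locally. This is plain userspace C, not kernel code. It tracks a single stack
only (no stacks_done bitmap), and every name in it (frame_state, window,
unwind_one, window_translate, count_frames) is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct stackframe. */
struct frame_state {
	uint64_t fp;		/* frame pointer in the foreign address space */
	uint64_t pc;
	uint64_t prev_fp;	/* last fp seen, for the monotonicity check */
};

/* Describes one foreign stack and where the same memory is visible locally. */
struct window {
	uint64_t foreign_base;
	const uint64_t *local_base;
	uint64_t size;		/* in bytes */
};

/* Maps a foreign fp to a local pointer; returns NULL if it cannot. */
typedef const uint64_t *(*translate_fn)(uint64_t fp, void *ctx);

static const uint64_t *window_translate(uint64_t fp, void *ctx)
{
	const struct window *w = ctx;

	/* The 16-byte frame record must fit inside the window. */
	if (fp < w->foreign_base || fp - w->foreign_base + 16 > w->size)
		return NULL;
	return w->local_base + (fp - w->foreign_base) / sizeof(uint64_t);
}

static int unwind_one(struct frame_state *st, translate_fn translate, void *ctx)
{
	const uint64_t *rec;
	uint64_t fp = st->fp;

	if (fp & 0x7)			/* frame records are 8-byte aligned */
		return -EINVAL;
	if (fp <= st->prev_fp)		/* must move strictly up the stack */
		return -EINVAL;

	st->prev_fp = fp;		/* record the untranslated value first */

	rec = translate ? translate(fp, ctx) : (const uint64_t *)(uintptr_t)fp;
	if (!rec)
		return -EINVAL;

	st->fp = rec[0];		/* next frame record */
	st->pc = rec[1];		/* return address */
	return 0;
}

/* Counts the entries a backtrace would print, including the starting pc. */
static int count_frames(uint64_t fp, uint64_t pc, translate_fn t, void *ctx)
{
	struct frame_state st = { .fp = fp, .pc = pc, .prev_fp = 0 };
	int n = 1;

	while (unwind_one(&st, t, ctx) == 0)
		n++;
	return n;
}
```

Recording prev_fp before translating keeps the monotonicity check in the
foreign address space, so two frames are always compared in the same space.
A frame record that points back at itself stops the walk on the next step.
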
> +
> +static int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
> +{
> +       unsigned long fp = frame->fp;
> +       struct stack_info info;
> +       int err;
> +
> +       if (!tsk)
> +               tsk = current;
> +
> +       /* Final frame; nothing to unwind */
> +       if (fp == (unsigned long)task_pt_regs(tsk)->stackframe)
> +               return -ENOENT;
> +
> +       if (!on_accessible_stack(tsk, fp, 16, &info))
> +               return -EINVAL;
> +
> +       err = __unwind_frame(frame, &info, NULL);
> +       if (err)
> +               return err;
>
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>         if (tsk->ret_stack &&
> @@ -143,20 +168,27 @@ static int notrace unwind_frame(struct task_struct *tsk,
>  }
>  NOKPROBE_SYMBOL(unwind_frame);
>
> -static void notrace walk_stackframe(struct task_struct *tsk,
> -                                   struct stackframe *frame,
> -                                   bool (*fn)(void *, unsigned long), void *data)
> +static void notrace __walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
> +               bool (*fn)(void *, unsigned long), void *data,
> +               int (*unwind_frame_fn)(struct task_struct *tsk, struct stackframe *frame))
>  {
>         while (1) {
>                 int ret;
>
>                 if (!fn(data, frame->pc))
>                         break;
> -               ret = unwind_frame(tsk, frame);
> +               ret = unwind_frame_fn(tsk, frame);
>                 if (ret < 0)
>                         break;
>         }
>  }
> +
> +static void notrace walk_stackframe(struct task_struct *tsk,
> +                                   struct stackframe *frame,
> +                                   bool (*fn)(void *, unsigned long), void *data)
> +{
> +       __walk_stackframe(tsk, frame, fn, data, unwind_frame);
> +}
>  NOKPROBE_SYMBOL(walk_stackframe);
>
>  static bool dump_backtrace_entry(void *arg, unsigned long where)
> @@ -210,3 +242,135 @@ noinline notrace void arch_stack_walk(stack_trace_consume_fn consume_entry,
>
>         walk_stackframe(task, &frame, consume_entry, cookie);
>  }
> +
> +#ifdef CONFIG_NVHE_EL2_DEBUG
> +DECLARE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
> +DECLARE_KVM_NVHE_PER_CPU(unsigned long [PAGE_SIZE/sizeof(long)], hyp_overflow_stack);
> +DECLARE_KVM_NVHE_PER_CPU(struct kvm_nvhe_panic_info, kvm_panic_info);
> +
> +static inline bool kvm_nvhe_on_overflow_stack(unsigned long sp, unsigned long size,
> +                                struct stack_info *info)
> +{
> +       struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
> +       unsigned long low = (unsigned long)panic_info->hyp_overflow_stack_base;
> +       unsigned long high = low + PAGE_SIZE;
> +
> +       return on_stack(sp, size, low, high, STACK_TYPE_KVM_NVHE_OVERFLOW, info);
> +}
> +
> +static inline bool kvm_nvhe_on_hyp_stack(unsigned long sp, unsigned long size,
> +                                struct stack_info *info)
> +{
> +       struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
> +       unsigned long low = (unsigned long)panic_info->hyp_stack_base;
> +       unsigned long high = low + PAGE_SIZE;
> +
> +       return on_stack(sp, size, low, high, STACK_TYPE_KVM_NVHE_HYP, info);
> +}
> +
> +static inline bool kvm_nvhe_on_accessible_stack(unsigned long sp, unsigned long size,
> +                                      struct stack_info *info)
> +{
> +       if (info)
> +               info->type = STACK_TYPE_UNKNOWN;
> +
> +       if (kvm_nvhe_on_hyp_stack(sp, size, info))
> +               return true;
> +       if (kvm_nvhe_on_overflow_stack(sp, size, info))
> +               return true;
> +
> +       return false;
> +}
> +
> +static unsigned long kvm_nvhe_hyp_stack_kern_va(unsigned long addr)
> +{
> +       struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
> +       unsigned long hyp_base, kern_base, hyp_offset;
> +
> +       hyp_base = (unsigned long)panic_info->hyp_stack_base;
> +       hyp_offset = addr - hyp_base;
> +
> +       kern_base = (unsigned long)*this_cpu_ptr(&kvm_arm_hyp_stack_page);
> +
> +       return kern_base + hyp_offset;
> +}
> +
> +static unsigned long kvm_nvhe_overflow_stack_kern_va(unsigned long addr)
> +{
> +       struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
> +       unsigned long hyp_base, kern_base, hyp_offset;
> +
> +       hyp_base = (unsigned long)panic_info->hyp_overflow_stack_base;
> +       hyp_offset = addr - hyp_base;
> +
> +       kern_base = (unsigned long)this_cpu_ptr_nvhe_sym(hyp_overflow_stack);
> +
> +       return kern_base + hyp_offset;
> +}
> +
> +/*
> + * Convert KVM nVHE hypervisor stack VA to a kernel VA.
> + *
> + * The nVHE hypervisor stack is mapped in the flexible 'private' VA range, to allow
> + * for guard pages below the stack. Consequently, the fixed offset address
> + * translation macros won't work here.
> + *
> + * The kernel VA is calculated as an offset from the kernel VA of the hypervisor
> + * stack base. See: kvm_nvhe_hyp_stack_kern_va(),  kvm_nvhe_overflow_stack_kern_va()
> + */
> +static unsigned long kvm_nvhe_stack_kern_va(unsigned long addr,
> +                                       enum stack_type type)
> +{
> +       switch (type) {
> +       case STACK_TYPE_KVM_NVHE_HYP:
> +               return kvm_nvhe_hyp_stack_kern_va(addr);
> +       case STACK_TYPE_KVM_NVHE_OVERFLOW:
> +               return kvm_nvhe_overflow_stack_kern_va(addr);
> +       default:
> +               return 0UL;
> +       }
> +}
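
The base-plus-offset translation described in the comment above can be
sketched in isolation as follows. This is a userspace approximation under
stated assumptions, not the kernel implementation: the region table stands in
for the per-CPU panic info, the addresses are made up, and the names (region,
in_region, classify, to_local) are hypothetical. It assumes that
foreign_base + size does not wrap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum region_type { REGION_UNKNOWN, REGION_MAIN, REGION_OVERFLOW };

struct region {
	enum region_type type;
	uint64_t foreign_base;	/* base VA as seen by the other address space */
	uint64_t local_base;	/* base VA of the same memory as seen locally */
	uint64_t size;		/* in bytes */
};

/* True if [addr, addr + len) lies entirely inside the region. */
static bool in_region(const struct region *r, uint64_t addr, uint64_t len)
{
	if (addr < r->foreign_base || addr + len < addr)
		return false;
	return addr + len <= r->foreign_base + r->size;
}

/* Finds which region an access falls in, or REGION_UNKNOWN if none. */
static enum region_type classify(const struct region *rs, int n,
				 uint64_t addr, uint64_t len)
{
	for (int i = 0; i < n; i++)
		if (in_region(&rs[i], addr, len))
			return rs[i].type;
	return REGION_UNKNOWN;
}

/* Returns 0 for an unknown type, like the default case in the patch. */
static uint64_t to_local(const struct region *rs, int n,
			 enum region_type type, uint64_t addr)
{
	for (int i = 0; i < n; i++)
		if (rs[i].type == type)
			return rs[i].local_base + (addr - rs[i].foreign_base);
	return 0;
}
```

Returning 0 for an unknown type lets the caller treat a failed translation
the same way as any other invalid frame pointer.
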
> +
> +static int notrace kvm_nvhe_unwind_frame(struct task_struct *tsk,
> +                                       struct stackframe *frame)
> +{
> +       struct stack_info info;
> +
> +       if (!kvm_nvhe_on_accessible_stack(frame->fp, 16, &info))
> +               return -EINVAL;
> +
> +       return __unwind_frame(frame, &info, kvm_nvhe_stack_kern_va);
> +}
> +
> +static bool kvm_nvhe_dump_backtrace_entry(void *arg, unsigned long where)
> +{
> +       unsigned long va_mask = GENMASK_ULL(vabits_actual - 1, 0);
> +       unsigned long hyp_offset = (unsigned long)arg;
> +
> +       where &= va_mask;       /* Mask tags */
> +       where += hyp_offset;    /* Convert to kern addr */
> +
> +       kvm_err("[<%016lx>] %pB\n", where, (void *)where);
> +
> +       return true;
> +}
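
The two-step address conversion in kvm_nvhe_dump_backtrace_entry() (mask off
the bits above the VA width, then add an offset) can be illustrated with a
small standalone sketch. The helper names (low_mask, hyp_pc_to_kern) are
hypothetical, and the VA width and offset used with them are arbitrary values
chosen for the example, not values taken from a real system.

```c
#include <assert.h>
#include <stdint.h>

/* Bits [va_bits - 1, 0] set; valid for 1 <= va_bits <= 64. */
static uint64_t low_mask(unsigned int va_bits)
{
	return ~UINT64_C(0) >> (64 - va_bits);
}

/* Drops the upper (tag) bits of an address, then rebases it by offset. */
static uint64_t hyp_pc_to_kern(uint64_t where, unsigned int va_bits,
			       uint64_t offset)
{
	where &= low_mask(va_bits);	/* drop tag bits */
	return where + offset;		/* unsigned wrap-around is well defined */
}
```

With a 39-bit VA width, for example, an address of 0xff000000123456f0 and an
offset of 0xffffffc000000000 yield 0xffffffc0123456f0.
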
> +
> +static void notrace kvm_nvhe_walk_stackframe(struct task_struct *tsk,
> +                                   struct stackframe *frame,
> +                                   bool (*fn)(void *, unsigned long), void *data)
> +{
> +       __walk_stackframe(tsk, frame, fn, data, kvm_nvhe_unwind_frame);
> +}
> +
> +void kvm_nvhe_dump_backtrace(unsigned long hyp_offset)
> +{
> +       struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
> +       struct stackframe frame;
> +
> +       start_backtrace(&frame, panic_info->fp, panic_info->pc);
> +       pr_err("nVHE HYP call trace:\n");
> +       kvm_nvhe_walk_stackframe(NULL, &frame, kvm_nvhe_dump_backtrace_entry,
> +                                       (void *)hyp_offset);
> +       pr_err("---- end of nVHE HYP call trace ----\n");
> +}
> +#endif /* CONFIG_NVHE_EL2_DEBUG */
> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> index 8a5fbbf084df..75f2c8255ff0 100644
> --- a/arch/arm64/kvm/Kconfig
> +++ b/arch/arm64/kvm/Kconfig
> @@ -51,8 +51,9 @@ config NVHE_EL2_DEBUG
>         depends on KVM
>         help
>           Say Y here to enable the debug mode for the non-VHE KVM EL2 object.
> -         Failure reports will BUG() in the hypervisor. This is intended for
> -         local EL2 hypervisor development.
> +         Failure reports will BUG() in the hypervisor; and panics will print
> +         the hypervisor call stack. This is intended for local EL2 hypervisor
> +         development.

Nit: maybe for clarity you could rephrase as "calls to hyp_panic()
will result in printing the hypervisor call stack".

Thanks,
/fuad


>
>           If unsure, say N.
>
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 7a23630c4a7f..66c07c04eb52 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -49,7 +49,7 @@ DEFINE_STATIC_KEY_FALSE(kvm_protected_mode_initialized);
>
>  DECLARE_KVM_HYP_PER_CPU(unsigned long, kvm_hyp_vector);
>
> -static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
> +DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
>  unsigned long kvm_arm_hyp_percpu_base[NR_CPUS];
>  DECLARE_KVM_NVHE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
>
> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index e3140abd2e2e..ff69dff33700 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -17,6 +17,7 @@
>  #include <asm/kvm_emulate.h>
>  #include <asm/kvm_mmu.h>
>  #include <asm/debug-monitors.h>
> +#include <asm/stacktrace.h>
>  #include <asm/traps.h>
>
>  #include <kvm/arm_hypercalls.h>
> @@ -326,6 +327,8 @@ void __noreturn __cold nvhe_hyp_panic_handler(u64 esr, u64 spsr,
>                 kvm_err("nVHE hyp panic at: %016llx!\n", elr_virt + hyp_offset);
>         }
>
> +       kvm_nvhe_dump_backtrace(hyp_offset);
> +
>         /*
>          * Hyp has panicked and we're going to handle that by panicking the
>          * kernel. The kernel offset will be revealed in the panic so we're
> diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
> index efc20273a352..b8ecffc47424 100644
> --- a/arch/arm64/kvm/hyp/nvhe/switch.c
> +++ b/arch/arm64/kvm/hyp/nvhe/switch.c
> @@ -37,6 +37,22 @@ DEFINE_PER_CPU(unsigned long, kvm_hyp_vector);
>  #ifdef CONFIG_NVHE_EL2_DEBUG
>  DEFINE_PER_CPU(unsigned long [PAGE_SIZE/sizeof(long)], hyp_overflow_stack)
>         __aligned(16);
> +DEFINE_PER_CPU(struct kvm_nvhe_panic_info, kvm_panic_info);
> +
> +static inline void cpu_prepare_nvhe_panic_info(void)
> +{
> +       struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr(&kvm_panic_info);
> +       struct kvm_nvhe_init_params *params = this_cpu_ptr(&kvm_init_params);
> +
> +       panic_info->hyp_stack_base = (unsigned long)(params->stack_hyp_va - PAGE_SIZE);
> +       panic_info->hyp_overflow_stack_base = (unsigned long)this_cpu_ptr(hyp_overflow_stack);
> +       panic_info->fp = (unsigned long)__builtin_frame_address(0);
> +       panic_info->pc = _THIS_IP_;
> +}
> +#else
> +static inline void cpu_prepare_nvhe_panic_info(void)
> +{
> +}
>  #endif
>
>  static void __activate_traps(struct kvm_vcpu *vcpu)
> @@ -360,6 +376,8 @@ asmlinkage void __noreturn hyp_panic(void)
>         struct kvm_cpu_context *host_ctxt;
>         struct kvm_vcpu *vcpu;
>
> +       cpu_prepare_nvhe_panic_info();
> +
>         host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
>         vcpu = host_ctxt->__hyp_running_vcpu;
>
> --
> 2.35.1.473.g83b2b277ed-goog
>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v3 7/8] KVM: arm64: Unwind and dump nVHE HYP stacktrace
@ 2022-02-24 12:28     ` Fuad Tabba
  0 siblings, 0 replies; 60+ messages in thread
From: Fuad Tabba @ 2022-02-24 12:28 UTC (permalink / raw)
  To: Kalesh Singh
  Cc: will, maz, qperret, surenb, kernel-team, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Catalin Marinas,
	Mark Rutland, Mark Brown, Masami Hiramatsu, Peter Collingbourne,
	Madhavan T. Venkataraman, Andrew Scull, Paolo Bonzini,
	Ard Biesheuvel, linux-arm-kernel, kvmarm, linux-kernel

Hi Kalesh,

On Thu, Feb 24, 2022 at 5:22 AM Kalesh Singh <kaleshsingh@google.com> wrote:
>
> Unwind the stack in EL1, when CONFIG_NVHE_EL2_DEBUG is enabled. This is
> possible because CONFIG_NVHE_EL2_DEBUG disables the host stage 2 protection
> which allows host to access the hypervisor stack pages in EL1.

For this comment to be clearer, and if my understanding is correct, I
think that it should say that CONFIG_NVHE_EL2_DEBUG allows host stage
2 protection to be disabled on a hyp_panic. Otherwise, on reading the
comment one might think that CONFIG_NVHE_EL2_DEBUG runs without host
stage 2 protection at all.

>
> Unwinding and dumping hyp call traces is gated on CONFIG_NVHE_EL2_DEBUG
> to avoid the potential leaking of information to the host.
>
> A simple stack overflow test produces the following output:
>
> [  580.376051][  T412] kvm: nVHE hyp panic at: ffffffc0116145c4!
> [  580.378034][  T412] kvm [412]: nVHE HYP call trace:
> [  580.378591][  T412] kvm [412]:  [<ffffffc011614934>]
> [  580.378993][  T412] kvm [412]:  [<ffffffc01160fa48>]
> [  580.379386][  T412] kvm [412]:  [<ffffffc0116145dc>]  // Non-terminating recursive call
> [  580.379772][  T412] kvm [412]:  [<ffffffc0116145dc>]
> [  580.380158][  T412] kvm [412]:  [<ffffffc0116145dc>]
> [  580.380544][  T412] kvm [412]:  [<ffffffc0116145dc>]
> [  580.380928][  T412] kvm [412]:  [<ffffffc0116145dc>]
> . . .
>
> Since nVHE hyp symbols are not included by kallsyms to avoid issues
> with aliasing, we fall back to the vmlinux addresses. Symbolizing the
> addresses is handled in the next patch in this series.
>
> Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> ---
>
> Changes in v3:
>   - The nvhe hyp stack unwinder now makes use of the core logic from the
>     regular kernel unwinder to avoid duplication, per Mark
>
> Changes in v2:
>   - Add cpu_prepare_nvhe_panic_info()
>   - Move updating the panic info to hyp_panic(), so that unwinding also
>     works for conventional nVHE Hyp-mode.
>
>  arch/arm64/include/asm/kvm_asm.h    |  19 +++
>  arch/arm64/include/asm/stacktrace.h |  12 ++
>  arch/arm64/kernel/stacktrace.c      | 210 +++++++++++++++++++++++++---
>  arch/arm64/kvm/Kconfig              |   5 +-
>  arch/arm64/kvm/arm.c                |   2 +-
>  arch/arm64/kvm/handle_exit.c        |   3 +
>  arch/arm64/kvm/hyp/nvhe/switch.c    |  18 +++
>  7 files changed, 243 insertions(+), 26 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index 2e277f2ed671..16efdf150a37 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -176,6 +176,25 @@ struct kvm_nvhe_init_params {
>         unsigned long vtcr;
>  };
>
> +#ifdef CONFIG_NVHE_EL2_DEBUG
> +/*
> + * Used by the host in EL1 to dump the nVHE hypervisor backtrace on
> + * hyp_panic. This is possible because CONFIG_NVHE_EL2_DEBUG disables
> + * the host stage 2 protection. See: __hyp_do_panic()

Same as my comment above.

> + *
> + * @hyp_stack_base:             hyp VA of the hyp_stack base.
> + * @hyp_overflow_stack_base:    hyp VA of the hyp_overflow_stack base.
> + * @fp:                         hyp FP where the backtrace begins.
> + * @pc:                         hyp PC where the backtrace begins.
> + */
> +struct kvm_nvhe_panic_info {
> +       unsigned long hyp_stack_base;
> +       unsigned long hyp_overflow_stack_base;
> +       unsigned long fp;
> +       unsigned long pc;
> +};
> +#endif /* CONFIG_NVHE_EL2_DEBUG */
> +
>  /* Translate a kernel address @ptr into its equivalent linear mapping */
>  #define kvm_ksym_ref(ptr)                                              \
>         ({                                                              \
> diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
> index e77cdef9ca29..18611a51cf14 100644
> --- a/arch/arm64/include/asm/stacktrace.h
> +++ b/arch/arm64/include/asm/stacktrace.h
> @@ -22,6 +22,10 @@ enum stack_type {
>         STACK_TYPE_OVERFLOW,
>         STACK_TYPE_SDEI_NORMAL,
>         STACK_TYPE_SDEI_CRITICAL,
> +#ifdef CONFIG_NVHE_EL2_DEBUG
> +       STACK_TYPE_KVM_NVHE_HYP,
> +       STACK_TYPE_KVM_NVHE_OVERFLOW,
> +#endif /* CONFIG_NVHE_EL2_DEBUG */
>         __NR_STACK_TYPES
>  };
>
> @@ -147,4 +151,12 @@ static inline bool on_accessible_stack(const struct task_struct *tsk,
>         return false;
>  }
>
> +#ifdef CONFIG_NVHE_EL2_DEBUG
> +void kvm_nvhe_dump_backtrace(unsigned long hyp_offset);
> +#else
> +static inline void kvm_nvhe_dump_backtrace(unsigned long hyp_offset)
> +{
> +}
> +#endif /* CONFIG_NVHE_EL2_DEBUG */
> +
>  #endif /* __ASM_STACKTRACE_H */
> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> index e4103e085681..6ec85cb69b1f 100644
> --- a/arch/arm64/kernel/stacktrace.c
> +++ b/arch/arm64/kernel/stacktrace.c
> @@ -15,6 +15,8 @@
>
>  #include <asm/irq.h>
>  #include <asm/pointer_auth.h>
> +#include <asm/kvm_asm.h>
> +#include <asm/kvm_hyp.h>
>  #include <asm/stack_pointer.h>
>  #include <asm/stacktrace.h>
>
> @@ -64,26 +66,15 @@ NOKPROBE_SYMBOL(start_backtrace);
>   * records (e.g. a cycle), determined based on the location and fp value of A
>   * and the location (but not the fp value) of B.
>   */
> -static int notrace unwind_frame(struct task_struct *tsk,
> -                               struct stackframe *frame)
> +static int notrace __unwind_frame(struct stackframe *frame, struct stack_info *info,
> +               unsigned long (*translate_fp)(unsigned long, enum stack_type))
>  {
>         unsigned long fp = frame->fp;
> -       struct stack_info info;
> -
> -       if (!tsk)
> -               tsk = current;
> -
> -       /* Final frame; nothing to unwind */
> -       if (fp == (unsigned long)task_pt_regs(tsk)->stackframe)
> -               return -ENOENT;
>
>         if (fp & 0x7)
>                 return -EINVAL;
>
> -       if (!on_accessible_stack(tsk, fp, 16, &info))
> -               return -EINVAL;
> -
> -       if (test_bit(info.type, frame->stacks_done))
> +       if (test_bit(info->type, frame->stacks_done))
>                 return -EINVAL;
>
>         /*
> @@ -94,28 +85,62 @@ static int notrace unwind_frame(struct task_struct *tsk,
>          *
>          * TASK -> IRQ -> OVERFLOW -> SDEI_NORMAL
>          * TASK -> SDEI_NORMAL -> SDEI_CRITICAL -> OVERFLOW
> +        * KVM_NVHE_HYP -> KVM_NVHE_OVERFLOW
>          *
>          * ... but the nesting itself is strict. Once we transition from one
>          * stack to another, it's never valid to unwind back to that first
>          * stack.
>          */
> -       if (info.type == frame->prev_type) {
> +       if (info->type == frame->prev_type) {
>                 if (fp <= frame->prev_fp)
>                         return -EINVAL;
>         } else {
>                 set_bit(frame->prev_type, frame->stacks_done);
>         }
>
> +       /* Record fp as prev_fp before attempting to get the next fp */
> +       frame->prev_fp = fp;
> +
> +       /*
> +        * If fp is not from the current address space perform the
> +        * necessary translation before dereferencing it to get next fp.
> +        */
> +       if (translate_fp)
> +               fp = translate_fp(fp, info->type);
> +       if (!fp)
> +               return -EINVAL;
> +
>         /*
>          * Record this frame record's values and location. The prev_fp and
> -        * prev_type are only meaningful to the next unwind_frame() invocation.
> +        * prev_type are only meaningful to the next __unwind_frame() invocation.
>          */
>         frame->fp = READ_ONCE_NOCHECK(*(unsigned long *)(fp));
>         frame->pc = READ_ONCE_NOCHECK(*(unsigned long *)(fp + 8));
> -       frame->prev_fp = fp;
> -       frame->prev_type = info.type;
> -
>         frame->pc = ptrauth_strip_insn_pac(frame->pc);
> +       frame->prev_type = info->type;
> +
> +       return 0;
> +}
> +
> +static int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
> +{
> +       unsigned long fp = frame->fp;
> +       struct stack_info info;
> +       int err;
> +
> +       if (!tsk)
> +               tsk = current;
> +
> +       /* Final frame; nothing to unwind */
> +       if (fp == (unsigned long)task_pt_regs(tsk)->stackframe)
> +               return -ENOENT;
> +
> +       if (!on_accessible_stack(tsk, fp, 16, &info))
> +               return -EINVAL;
> +
> +       err = __unwind_frame(frame, &info, NULL);
> +       if (err)
> +               return err;
>
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>         if (tsk->ret_stack &&
> @@ -143,20 +168,27 @@ static int notrace unwind_frame(struct task_struct *tsk,
>  }
>  NOKPROBE_SYMBOL(unwind_frame);
>
> -static void notrace walk_stackframe(struct task_struct *tsk,
> -                                   struct stackframe *frame,
> -                                   bool (*fn)(void *, unsigned long), void *data)
> +static void notrace __walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
> +               bool (*fn)(void *, unsigned long), void *data,
> +               int (*unwind_frame_fn)(struct task_struct *tsk, struct stackframe *frame))
>  {
>         while (1) {
>                 int ret;
>
>                 if (!fn(data, frame->pc))
>                         break;
> -               ret = unwind_frame(tsk, frame);
> +               ret = unwind_frame_fn(tsk, frame);
>                 if (ret < 0)
>                         break;
>         }
>  }
> +
> +static void notrace walk_stackframe(struct task_struct *tsk,
> +                                   struct stackframe *frame,
> +                                   bool (*fn)(void *, unsigned long), void *data)
> +{
> +       __walk_stackframe(tsk, frame, fn, data, unwind_frame);
> +}
>  NOKPROBE_SYMBOL(walk_stackframe);
>
>  static bool dump_backtrace_entry(void *arg, unsigned long where)
> @@ -210,3 +242,135 @@ noinline notrace void arch_stack_walk(stack_trace_consume_fn consume_entry,
>
>         walk_stackframe(task, &frame, consume_entry, cookie);
>  }
> +
> +#ifdef CONFIG_NVHE_EL2_DEBUG
> +DECLARE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
> +DECLARE_KVM_NVHE_PER_CPU(unsigned long [PAGE_SIZE/sizeof(long)], hyp_overflow_stack);
> +DECLARE_KVM_NVHE_PER_CPU(struct kvm_nvhe_panic_info, kvm_panic_info);
> +
> +static inline bool kvm_nvhe_on_overflow_stack(unsigned long sp, unsigned long size,
> +                                struct stack_info *info)
> +{
> +       struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
> +       unsigned long low = (unsigned long)panic_info->hyp_overflow_stack_base;
> +       unsigned long high = low + PAGE_SIZE;
> +
> +       return on_stack(sp, size, low, high, STACK_TYPE_KVM_NVHE_OVERFLOW, info);
> +}
> +
> +static inline bool kvm_nvhe_on_hyp_stack(unsigned long sp, unsigned long size,
> +                                struct stack_info *info)
> +{
> +       struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
> +       unsigned long low = (unsigned long)panic_info->hyp_stack_base;
> +       unsigned long high = low + PAGE_SIZE;
> +
> +       return on_stack(sp, size, low, high, STACK_TYPE_KVM_NVHE_HYP, info);
> +}
> +
> +static inline bool kvm_nvhe_on_accessible_stack(unsigned long sp, unsigned long size,
> +                                      struct stack_info *info)
> +{
> +       if (info)
> +               info->type = STACK_TYPE_UNKNOWN;
> +
> +       if (kvm_nvhe_on_hyp_stack(sp, size, info))
> +               return true;
> +       if (kvm_nvhe_on_overflow_stack(sp, size, info))
> +               return true;
> +
> +       return false;
> +}
> +
> +static unsigned long kvm_nvhe_hyp_stack_kern_va(unsigned long addr)
> +{
> +       struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
> +       unsigned long hyp_base, kern_base, hyp_offset;
> +
> +       hyp_base = (unsigned long)panic_info->hyp_stack_base;
> +       hyp_offset = addr - hyp_base;
> +
> +       kern_base = (unsigned long)*this_cpu_ptr(&kvm_arm_hyp_stack_page);
> +
> +       return kern_base + hyp_offset;
> +}
> +
> +static unsigned long kvm_nvhe_overflow_stack_kern_va(unsigned long addr)
> +{
> +       struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
> +       unsigned long hyp_base, kern_base, hyp_offset;
> +
> +       hyp_base = (unsigned long)panic_info->hyp_overflow_stack_base;
> +       hyp_offset = addr - hyp_base;
> +
> +       kern_base = (unsigned long)this_cpu_ptr_nvhe_sym(hyp_overflow_stack);
> +
> +       return kern_base + hyp_offset;
> +}
> +
> +/*
> + * Convert KVM nVHE hypervisor stack VA to a kernel VA.
> + *
> + * The nVHE hypervisor stack is mapped in the flexible 'private' VA range, to allow
> + * for guard pages below the stack. Consequently, the fixed offset address
> + * translation macros won't work here.
> + *
> + * The kernel VA is calculated as an offset from the kernel VA of the hypervisor
> + * stack base. See: kvm_nvhe_hyp_stack_kern_va(),  kvm_nvhe_overflow_stack_kern_va()
> + */
> +static unsigned long kvm_nvhe_stack_kern_va(unsigned long addr,
> +                                       enum stack_type type)
> +{
> +       switch (type) {
> +       case STACK_TYPE_KVM_NVHE_HYP:
> +               return kvm_nvhe_hyp_stack_kern_va(addr);
> +       case STACK_TYPE_KVM_NVHE_OVERFLOW:
> +               return kvm_nvhe_overflow_stack_kern_va(addr);
> +       default:
> +               return 0UL;
> +       }
> +}
> +
> +static int notrace kvm_nvhe_unwind_frame(struct task_struct *tsk,
> +                                       struct stackframe *frame)
> +{
> +       struct stack_info info;
> +
> +       if (!kvm_nvhe_on_accessible_stack(frame->fp, 16, &info))
> +               return -EINVAL;
> +
> +       return  __unwind_frame(frame, &info, kvm_nvhe_stack_kern_va);
> +}
> +
> +static bool kvm_nvhe_dump_backtrace_entry(void *arg, unsigned long where)
> +{
> +       unsigned long va_mask = GENMASK_ULL(vabits_actual - 1, 0);
> +       unsigned long hyp_offset = (unsigned long)arg;
> +
> +       where &= va_mask;       /* Mask tags */
> +       where += hyp_offset;    /* Convert to kern addr */
> +
> +       kvm_err("[<%016lx>] %pB\n", where, (void *)where);
> +
> +       return true;
> +}
> +
> +static void notrace kvm_nvhe_walk_stackframe(struct task_struct *tsk,
> +                                   struct stackframe *frame,
> +                                   bool (*fn)(void *, unsigned long), void *data)
> +{
> +       __walk_stackframe(tsk, frame, fn, data, kvm_nvhe_unwind_frame);
> +}
> +
> +void kvm_nvhe_dump_backtrace(unsigned long hyp_offset)
> +{
> +       struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
> +       struct stackframe frame;
> +
> +       start_backtrace(&frame, panic_info->fp, panic_info->pc);
> +       pr_err("nVHE HYP call trace:\n");
> +       kvm_nvhe_walk_stackframe(NULL, &frame, kvm_nvhe_dump_backtrace_entry,
> +                                       (void *)hyp_offset);
> +       pr_err("---- end of nVHE HYP call trace ----\n");
> +}
> +#endif /* CONFIG_NVHE_EL2_DEBUG */
> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> index 8a5fbbf084df..75f2c8255ff0 100644
> --- a/arch/arm64/kvm/Kconfig
> +++ b/arch/arm64/kvm/Kconfig
> @@ -51,8 +51,9 @@ config NVHE_EL2_DEBUG
>         depends on KVM
>         help
>           Say Y here to enable the debug mode for the non-VHE KVM EL2 object.
> -         Failure reports will BUG() in the hypervisor. This is intended for
> -         local EL2 hypervisor development.
> +         Failure reports will BUG() in the hypervisor; and panics will print
> +         the hypervisor call stack. This is intended for local EL2 hypervisor
> +         development.

Nit: maybe for clarity you could rephrase as "calls to hyp_panic()
will result in printing the hypervisor call stack".

Thanks,
/fuad


>
>           If unsure, say N.
>
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 7a23630c4a7f..66c07c04eb52 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -49,7 +49,7 @@ DEFINE_STATIC_KEY_FALSE(kvm_protected_mode_initialized);
>
>  DECLARE_KVM_HYP_PER_CPU(unsigned long, kvm_hyp_vector);
>
> -static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
> +DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
>  unsigned long kvm_arm_hyp_percpu_base[NR_CPUS];
>  DECLARE_KVM_NVHE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
>
> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index e3140abd2e2e..ff69dff33700 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -17,6 +17,7 @@
>  #include <asm/kvm_emulate.h>
>  #include <asm/kvm_mmu.h>
>  #include <asm/debug-monitors.h>
> +#include <asm/stacktrace.h>
>  #include <asm/traps.h>
>
>  #include <kvm/arm_hypercalls.h>
> @@ -326,6 +327,8 @@ void __noreturn __cold nvhe_hyp_panic_handler(u64 esr, u64 spsr,
>                 kvm_err("nVHE hyp panic at: %016llx!\n", elr_virt + hyp_offset);
>         }
>
> +       kvm_nvhe_dump_backtrace(hyp_offset);
> +
>         /*
>          * Hyp has panicked and we're going to handle that by panicking the
>          * kernel. The kernel offset will be revealed in the panic so we're
> diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
> index efc20273a352..b8ecffc47424 100644
> --- a/arch/arm64/kvm/hyp/nvhe/switch.c
> +++ b/arch/arm64/kvm/hyp/nvhe/switch.c
> @@ -37,6 +37,22 @@ DEFINE_PER_CPU(unsigned long, kvm_hyp_vector);
>  #ifdef CONFIG_NVHE_EL2_DEBUG
>  DEFINE_PER_CPU(unsigned long [PAGE_SIZE/sizeof(long)], hyp_overflow_stack)
>         __aligned(16);
> +DEFINE_PER_CPU(struct kvm_nvhe_panic_info, kvm_panic_info);
> +
> +static inline void cpu_prepare_nvhe_panic_info(void)
> +{
> +       struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr(&kvm_panic_info);
> +       struct kvm_nvhe_init_params *params = this_cpu_ptr(&kvm_init_params);
> +
> +       panic_info->hyp_stack_base = (unsigned long)(params->stack_hyp_va - PAGE_SIZE);
> +       panic_info->hyp_overflow_stack_base = (unsigned long)this_cpu_ptr(hyp_overflow_stack);
> +       panic_info->fp = (unsigned long)__builtin_frame_address(0);
> +       panic_info->pc = _THIS_IP_;
> +}
> + #else
> +static inline void cpu_prepare_nvhe_panic_info(void)
> +{
> +}
>  #endif
>
>  static void __activate_traps(struct kvm_vcpu *vcpu)
> @@ -360,6 +376,8 @@ asmlinkage void __noreturn hyp_panic(void)
>         struct kvm_cpu_context *host_ctxt;
>         struct kvm_vcpu *vcpu;
>
> +       cpu_prepare_nvhe_panic_info();
> +
>         host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
>         vcpu = host_ctxt->__hyp_running_vcpu;
>
> --
> 2.35.1.473.g83b2b277ed-goog
>
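
For readers following kvm_nvhe_stack_kern_va() in the quoted patch, the translation amounts to taking the offset of the hyp address from the hyp stack base and adding it to the kernel VA of the same stack. The following is only a minimal sketch of that arithmetic under stated assumptions: every sketch_* name is hypothetical, and the explicit bounds check is an addition for illustration (the patch instead validates the address with kvm_nvhe_on_accessible_stack() before translating).

```c
/*
 * Hypothetical description of one stack that is mapped at two addresses:
 * once in the hypervisor's private VA range and once in the kernel.
 */
struct sketch_stack_map {
	unsigned long hyp_base;   /* hyp VA of the lowest stack address */
	unsigned long kern_base;  /* kernel VA of the same memory */
	unsigned long size;       /* stack size in bytes */
};

/*
 * Translate a hyp stack address to the matching kernel address by
 * preserving its offset from the stack base. Returns 0 when the address
 * is outside the stack, which a caller can treat as "cannot unwind".
 */
static unsigned long sketch_kern_va(const struct sketch_stack_map *map,
				    unsigned long hyp_addr)
{
	unsigned long offset;

	if (hyp_addr < map->hyp_base)
		return 0;

	offset = hyp_addr - map->hyp_base;
	if (offset >= map->size)
		return 0;

	return map->kern_base + offset;
}
```

Returning 0 for an untranslatable address matches the convention in the quoted __unwind_frame(), which returns -EINVAL when translate_fp() yields 0.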

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v3 1/8] KVM: arm64: Introduce hyp_alloc_private_va_range()
  2022-02-24 12:24     ` Fuad Tabba
  (?)
@ 2022-02-24 17:20       ` Kalesh Singh
  -1 siblings, 0 replies; 60+ messages in thread
From: Kalesh Singh @ 2022-02-24 17:20 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: Will Deacon, Marc Zyngier, Quentin Perret, Suren Baghdasaryan,
	Cc: Android Kernel, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Catalin Marinas, Mark Rutland, Mark Brown,
	Masami Hiramatsu, Peter Collingbourne, Madhavan T. Venkataraman,
	Andrew Scull, Paolo Bonzini, Ard Biesheuvel,
	moderated list:ARM64 PORT (AARCH64 ARCHITECTURE),
	kvmarm, LKML

On Thu, Feb 24, 2022 at 4:25 AM Fuad Tabba <tabba@google.com> wrote:
>
> Hi Kalesh,
>
> On Thu, Feb 24, 2022 at 5:16 AM Kalesh Singh <kaleshsingh@google.com> wrote:
> >
> > hyp_alloc_private_va_range() can be used to reserve private VA ranges
> > in the nVHE hypervisor. Also update  __create_hyp_private_mapping()
> > to allow specifying an alignment for the private VA mapping.
> >
> > These will be used to implement stack guard pages for KVM nVHE hypervisor
> > (nVHE Hyp mode / not pKVM), in a subsequent patch in the series.
> >
> > Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> > ---
> >
> > Changes in v3:
> >   - Handle null ptr in IS_ERR_OR_NULL checks, per Mark
> >
> >  arch/arm64/include/asm/kvm_mmu.h |  4 +++
> >  arch/arm64/kvm/mmu.c             | 62 ++++++++++++++++++++------------
> >  2 files changed, 43 insertions(+), 23 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> > index 81839e9a8a24..0b0c71302b92 100644
> > --- a/arch/arm64/include/asm/kvm_mmu.h
> > +++ b/arch/arm64/include/asm/kvm_mmu.h
> > @@ -153,6 +153,10 @@ static __always_inline unsigned long __kern_hyp_va(unsigned long v)
> >  int kvm_share_hyp(void *from, void *to);
> >  void kvm_unshare_hyp(void *from, void *to);
> >  int create_hyp_mappings(void *from, void *to, enum kvm_pgtable_prot prot);
> > +unsigned long hyp_alloc_private_va_range(size_t size, size_t align);
> > +int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
> > +                               size_t align, unsigned long *haddr,
> > +                               enum kvm_pgtable_prot prot);
> >  int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
> >                            void __iomem **kaddr,
> >                            void __iomem **haddr);
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index bc2aba953299..fc09536c8197 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -457,22 +457,16 @@ int create_hyp_mappings(void *from, void *to, enum kvm_pgtable_prot prot)
> >         return 0;
> >  }
> >
> > -static int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
> > -                                       unsigned long *haddr,
> > -                                       enum kvm_pgtable_prot prot)
> > +
> > +/*
> > + * Allocates a private VA range below io_map_base.
> > + *
> > + * @size:      The size of the VA range to reserve.
> > + * @align:     The required alignment for the allocation.
> > + */
>
> Many of the functions in this file use the kernel-doc format, and your
> added comments are close, but not quite conformant. If you want to use
> kernel-doc for these, you can refer to:
> https://www.kernel.org/doc/html/latest/doc-guide/kernel-doc.html

Hi Fuad,

Thanks for the pointer. I will update the function comments to match
when I send the next version.

>
> > +unsigned long hyp_alloc_private_va_range(size_t size, size_t align)
> >  {
> >         unsigned long base;
> > -       int ret = 0;
> > -
> > -       if (!kvm_host_owns_hyp_mappings()) {
> > -               base = kvm_call_hyp_nvhe(__pkvm_create_private_mapping,
> > -                                        phys_addr, size, prot);
> > -               if (IS_ERR_OR_NULL((void *)base))
> > -                       return PTR_ERR((void *)base);
> > -               *haddr = base;
> > -
> > -               return 0;
> > -       }
> >
> >         mutex_lock(&kvm_hyp_pgd_mutex);
> >
> > @@ -484,8 +478,8 @@ static int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
> >          *
> >          * The allocated size is always a multiple of PAGE_SIZE.
> >          */
> > -       size = PAGE_ALIGN(size + offset_in_page(phys_addr));
> > -       base = io_map_base - size;
> > +       base = io_map_base - PAGE_ALIGN(size);
> > +       base = ALIGN_DOWN(base, align);
> >
> >         /*
> >          * Verify that BIT(VA_BITS - 1) hasn't been flipped by
> > @@ -493,20 +487,42 @@ static int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
> >          * overflowed the idmap/IO address range.
> >          */
> >         if ((base ^ io_map_base) & BIT(VA_BITS - 1))
> > -               ret = -ENOMEM;
> > +               base = (unsigned long)ERR_PTR(-ENOMEM);
> >         else
> >                 io_map_base = base;
> >
> >         mutex_unlock(&kvm_hyp_pgd_mutex);
> >
> > -       if (ret)
> > -               goto out;
> > +       return base;
> > +}
> > +
> > +int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
> > +                               size_t align, unsigned long *haddr,
> > +                               enum kvm_pgtable_prot prot)
> > +{
> > +       unsigned long addr;
> > +       int ret = 0;
> > +
> > +       if (!kvm_host_owns_hyp_mappings()) {
> > +               addr = kvm_call_hyp_nvhe(__pkvm_create_private_mapping,
> > +                                        phys_addr, size, prot);
> > +               if (IS_ERR_OR_NULL((void *)addr))
> > +                       return addr ? PTR_ERR((void *)addr) : -ENOMEM;
> > +               *haddr = addr;
> > +
> > +               return 0;
> > +       }
> > +
> > +       size += offset_in_page(phys_addr);
>
> You're not page-aligning the size, which was the behavior before this
> patch. However, looking at where it's being used it seems to be fine
> because the users of size would align it if necessary.

This is now done by hyp_alloc_private_va_range() when calculating the new base:
 ...
 * The allocated size is always a multiple of PAGE_SIZE.
 */
base = io_map_base - PAGE_ALIGN(size);
...
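
To make the arithmetic concrete, here is a minimal userspace sketch of the base calculation. It assumes 4 KiB pages and a power-of-two alignment; the sketch_* names are hypothetical stand-ins for PAGE_ALIGN(), ALIGN_DOWN() and the allocator itself, and the locking and VA_BITS overflow check are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for illustration only: the kernel's PAGE_SIZE depends on config. */
#define SKETCH_PAGE_SIZE 4096UL

/* Round x up to the next multiple of a (a must be a power of two). */
static unsigned long sketch_align_up(unsigned long x, unsigned long a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Round x down to the previous multiple of a (a must be a power of two). */
static unsigned long sketch_align_down(unsigned long x, unsigned long a)
{
	return x & ~(a - 1);
}

/*
 * Hypothetical stand-in for the base calculation: carve a range out of the
 * space just below io_map_base. The size is rounded up to whole pages first,
 * then the resulting base is rounded down to the requested alignment, so the
 * reserved range is always a multiple of the page size.
 */
static unsigned long sketch_alloc_base(unsigned long io_map_base,
				       size_t size, size_t align)
{
	unsigned long base;

	assert(align != 0 && (align & (align - 1)) == 0);

	base = io_map_base - sketch_align_up(size, SKETCH_PAGE_SIZE);
	return sketch_align_down(base, align);
}
```

With io_map_base at 0x100000, a request of 0x1234 bytes with page alignment reserves two pages and yields a base of 0xFE000, so the caller can pass an unaligned size.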

Thanks,
Kalesh

>
> Thanks,
> /fuad
>
>
>
> > +       addr = hyp_alloc_private_va_range(size, align);
> > +       if (IS_ERR_OR_NULL((void *)addr))
> > +               return addr ? PTR_ERR((void *)addr) : -ENOMEM;
> >
> > -       ret = __create_hyp_mappings(base, size, phys_addr, prot);
> > +       ret = __create_hyp_mappings(addr, size, phys_addr, prot);
> >         if (ret)
> >                 goto out;
> >
> > -       *haddr = base + offset_in_page(phys_addr);
> > +       *haddr = addr + offset_in_page(phys_addr);
> >  out:
> >         return ret;
> >  }
> > @@ -537,7 +553,7 @@ int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
> >                 return 0;
> >         }
> >
> > -       ret = __create_hyp_private_mapping(phys_addr, size,
> > +       ret = __create_hyp_private_mapping(phys_addr, size, PAGE_SIZE,
> >                                            &addr, PAGE_HYP_DEVICE);
> >         if (ret) {
> >                 iounmap(*kaddr);
> > @@ -564,7 +580,7 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
> >
> >         BUG_ON(is_kernel_in_hyp_mode());
> >
> > -       ret = __create_hyp_private_mapping(phys_addr, size,
> > +       ret = __create_hyp_private_mapping(phys_addr, size, PAGE_SIZE,
> >                                            &addr, PAGE_HYP_EXEC);
> >         if (ret) {
> >                 *haddr = NULL;
> > --
> > 2.35.1.473.g83b2b277ed-goog
> >

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v3 1/8] KVM: arm64: Introduce hyp_alloc_private_va_range()
@ 2022-02-24 17:20       ` Kalesh Singh
  0 siblings, 0 replies; 60+ messages in thread
From: Kalesh Singh @ 2022-02-24 17:20 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: Will Deacon, Marc Zyngier, Quentin Perret, Suren Baghdasaryan,
	Cc: Android Kernel, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Catalin Marinas, Mark Rutland, Mark Brown,
	Masami Hiramatsu, Peter Collingbourne, Madhavan T. Venkataraman,
	Andrew Scull, Paolo Bonzini, Ard Biesheuvel,
	moderated list:ARM64 PORT (AARCH64 ARCHITECTURE),
	kvmarm, LKML

On Thu, Feb 24, 2022 at 4:25 AM Fuad Tabba <tabba@google.com> wrote:
>
> Hi Kalesh,
>
> On Thu, Feb 24, 2022 at 5:16 AM Kalesh Singh <kaleshsingh@google.com> wrote:
> >
> > hyp_alloc_private_va_range() can be used to reserve private VA ranges
> > in the nVHE hypervisor. Also update  __create_hyp_private_mapping()
> > to allow specifying an alignment for the private VA mapping.
> >
> > These will be used to implement stack guard pages for KVM nVHE hypervisor
> > (nVHE Hyp mode / not pKVM), in a subsequent patch in the series.
> >
> > Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> > ---
> >
> > Changes in v3:
> >   - Handle null ptr in IS_ERR_OR_NULL checks, per Mark
> >
> >  arch/arm64/include/asm/kvm_mmu.h |  4 +++
> >  arch/arm64/kvm/mmu.c             | 62 ++++++++++++++++++++------------
> >  2 files changed, 43 insertions(+), 23 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> > index 81839e9a8a24..0b0c71302b92 100644
> > --- a/arch/arm64/include/asm/kvm_mmu.h
> > +++ b/arch/arm64/include/asm/kvm_mmu.h
> > @@ -153,6 +153,10 @@ static __always_inline unsigned long __kern_hyp_va(unsigned long v)
> >  int kvm_share_hyp(void *from, void *to);
> >  void kvm_unshare_hyp(void *from, void *to);
> >  int create_hyp_mappings(void *from, void *to, enum kvm_pgtable_prot prot);
> > +unsigned long hyp_alloc_private_va_range(size_t size, size_t align);
> > +int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
> > +                               size_t align, unsigned long *haddr,
> > +                               enum kvm_pgtable_prot prot);
> >  int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
> >                            void __iomem **kaddr,
> >                            void __iomem **haddr);
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index bc2aba953299..fc09536c8197 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -457,22 +457,16 @@ int create_hyp_mappings(void *from, void *to, enum kvm_pgtable_prot prot)
> >         return 0;
> >  }
> >
> > -static int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
> > -                                       unsigned long *haddr,
> > -                                       enum kvm_pgtable_prot prot)
> > +
> > +/*
> > + * Allocates a private VA range below io_map_base.
> > + *
> > + * @size:      The size of the VA range to reserve.
> > + * @align:     The required alignment for the allocation.
> > + */
>
> Many of the functions in this file use the kernel-doc format, and your
> added comments are close, but not quite conforment. If you want to use
> the kernel-doc for these you can refer to:
> https://www.kernel.org/doc/html/latest/doc-guide/kernel-doc.html

Hi Fuad,

Thanks for the pointer. I will update the function comments to match
when I send the next version.

>
> > +unsigned long hyp_alloc_private_va_range(size_t size, size_t align)
> >  {
> >         unsigned long base;
> > -       int ret = 0;
> > -
> > -       if (!kvm_host_owns_hyp_mappings()) {
> > -               base = kvm_call_hyp_nvhe(__pkvm_create_private_mapping,
> > -                                        phys_addr, size, prot);
> > -               if (IS_ERR_OR_NULL((void *)base))
> > -                       return PTR_ERR((void *)base);
> > -               *haddr = base;
> > -
> > -               return 0;
> > -       }
> >
> >         mutex_lock(&kvm_hyp_pgd_mutex);
> >
> > @@ -484,8 +478,8 @@ static int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
> >          *
> >          * The allocated size is always a multiple of PAGE_SIZE.
> >          */
> > -       size = PAGE_ALIGN(size + offset_in_page(phys_addr));
> > -       base = io_map_base - size;
> > +       base = io_map_base - PAGE_ALIGN(size);
> > +       base = ALIGN_DOWN(base, align);
> >
> >         /*
> >          * Verify that BIT(VA_BITS - 1) hasn't been flipped by
> > @@ -493,20 +487,42 @@ static int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
> >          * overflowed the idmap/IO address range.
> >          */
> >         if ((base ^ io_map_base) & BIT(VA_BITS - 1))
> > -               ret = -ENOMEM;
> > +               base = (unsigned long)ERR_PTR(-ENOMEM);
> >         else
> >                 io_map_base = base;
> >
> >         mutex_unlock(&kvm_hyp_pgd_mutex);
> >
> > -       if (ret)
> > -               goto out;
> > +       return base;
> > +}
> > +
> > +int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
> > +                               size_t align, unsigned long *haddr,
> > +                               enum kvm_pgtable_prot prot)
> > +{
> > +       unsigned long addr;
> > +       int ret = 0;
> > +
> > +       if (!kvm_host_owns_hyp_mappings()) {
> > +               addr = kvm_call_hyp_nvhe(__pkvm_create_private_mapping,
> > +                                        phys_addr, size, prot);
> > +               if (IS_ERR_OR_NULL((void *)addr))
> > +                       return addr ? PTR_ERR((void *)addr) : -ENOMEM;
> > +               *haddr = addr;
> > +
> > +               return 0;
> > +       }
> > +
> > +       size += offset_in_page(phys_addr);
>
> You're not page-aligning the size, which was the behavior before this
> patch. However, looking at where it's being used it seems to be fine
> because the users of size would align it if necessary.

This is now done by hyp_alloc_private_va_range() when calculating the
new base:

        ...
         * The allocated size is always a multiple of PAGE_SIZE.
         */
        base = io_map_base - PAGE_ALIGN(size);
        ...
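
To make the arithmetic concrete, here is a small user-space model of the
downward allocation. It is only a sketch: the model_* names, the 4K page
size, VA_BITS = 48 and the starting value of the base are assumptions
for illustration, the macros are simplified stand-ins for the kernel
ones, and the kvm_hyp_pgd_mutex locking is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define PAGE_SIZE        UINT64_C(4096)          /* assumed 4K pages */
#define VA_BITS          48                      /* assumed for the model */
#define BIT(n)           (UINT64_C(1) << (n))
#define PAGE_ALIGN(x)    (((x) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))
#define ALIGN_DOWN(x, a) ((x) & ~((uint64_t)(a) - 1))

/* Hypothetical starting point; the real io_map_base is set up elsewhere. */
static uint64_t model_io_map_base = UINT64_C(0x400000000000);

/*
 * Model of the allocation step: move the base down by the page-aligned
 * size, then round it down to the requested alignment.
 */
static uint64_t model_hyp_alloc_va(uint64_t size, uint64_t align)
{
	uint64_t base;

	/* The model only handles power-of-two alignments. */
	assert(align && (align & (align - 1)) == 0);

	base = model_io_map_base - PAGE_ALIGN(size);
	base = ALIGN_DOWN(base, align);

	/* A flipped BIT(VA_BITS - 1) means the range was exhausted. */
	if ((base ^ model_io_map_base) & BIT(VA_BITS - 1))
		return (uint64_t)(int64_t)-ENOMEM;

	model_io_map_base = base;
	return base;
}
```

With these assumed values, a 100-byte request with PAGE_SIZE alignment
still consumes a whole page, which is why the caller can pass a size
that is not page-aligned.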

Thanks,
Kalesh

>
> Thanks,
> /fuad
>
>
>
> > +       addr = hyp_alloc_private_va_range(size, align);
> > +       if (IS_ERR_OR_NULL((void *)addr))
> > +               return addr ? PTR_ERR((void *)addr) : -ENOMEM;
> >
> > -       ret = __create_hyp_mappings(base, size, phys_addr, prot);
> > +       ret = __create_hyp_mappings(addr, size, phys_addr, prot);
> >         if (ret)
> >                 goto out;
> >
> > -       *haddr = base + offset_in_page(phys_addr);
> > +       *haddr = addr + offset_in_page(phys_addr);
> >  out:
> >         return ret;
> >  }
> > @@ -537,7 +553,7 @@ int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
> >                 return 0;
> >         }
> >
> > -       ret = __create_hyp_private_mapping(phys_addr, size,
> > +       ret = __create_hyp_private_mapping(phys_addr, size, PAGE_SIZE,
> >                                            &addr, PAGE_HYP_DEVICE);
> >         if (ret) {
> >                 iounmap(*kaddr);
> > @@ -564,7 +580,7 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
> >
> >         BUG_ON(is_kernel_in_hyp_mode());
> >
> > -       ret = __create_hyp_private_mapping(phys_addr, size,
> > +       ret = __create_hyp_private_mapping(phys_addr, size, PAGE_SIZE,
> >                                            &addr, PAGE_HYP_EXEC);
> >         if (ret) {
> >                 *haddr = NULL;
> > --
> > 2.35.1.473.g83b2b277ed-goog
> >


* Re: [PATCH v3 2/8] KVM: arm64: Introduce pkvm_alloc_private_va_range()
  2022-02-24 12:25     ` Fuad Tabba
  (?)
@ 2022-02-24 17:28       ` Kalesh Singh
  -1 siblings, 0 replies; 60+ messages in thread
From: Kalesh Singh @ 2022-02-24 17:28 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: Will Deacon, Marc Zyngier, Quentin Perret, Suren Baghdasaryan,
	Cc: Android Kernel, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Catalin Marinas, Mark Rutland, Mark Brown,
	Masami Hiramatsu, Peter Collingbourne, Madhavan T. Venkataraman,
	Andrew Walbran, Andrew Scull, Paolo Bonzini, Ard Biesheuvel,
	moderated list:ARM64 PORT (AARCH64 ARCHITECTURE),
	kvmarm, LKML

On Thu, Feb 24, 2022 at 4:26 AM Fuad Tabba <tabba@google.com> wrote:
>
> Hi Kalesh,
>
> I really like how this makes the code cleaner in general. A couple of
> small nits below.
>
> On Thu, Feb 24, 2022 at 5:17 AM 'Kalesh Singh' via kernel-team
> <kernel-team@android.com> wrote:
> >
> > > pkvm_alloc_private_va_range() can be used to reserve private VA ranges
> > > in the pKVM nVHE hypervisor. Also update __pkvm_create_private_mapping()
> > > to allow specifying an alignment for the private VA mapping.
> >
> > These will be used to implement stack guard pages for pKVM nVHE hypervisor
> > (in a subsequent patch in the series).
> >
> > Credits to Quentin Perret <qperret@google.com> for the idea of moving
> > private VA allocation out of __pkvm_create_private_mapping()
> >
> > Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> > ---
> >
> > Changes in v3:
> >   - Handle null ptr in IS_ERR_OR_NULL checks, per Mark
> >
> > Changes in v2:
> >   - Allow specifying an alignment for the private VA allocations, per Marc
> >
> >  arch/arm64/kvm/hyp/include/nvhe/mm.h |  3 +-
> >  arch/arm64/kvm/hyp/nvhe/hyp-main.c   |  5 +--
> >  arch/arm64/kvm/hyp/nvhe/mm.c         | 51 ++++++++++++++++++----------
> >  arch/arm64/kvm/mmu.c                 |  2 +-
> >  4 files changed, 40 insertions(+), 21 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/hyp/include/nvhe/mm.h b/arch/arm64/kvm/hyp/include/nvhe/mm.h
> > index 2d08510c6cc1..05d06ad00347 100644
> > --- a/arch/arm64/kvm/hyp/include/nvhe/mm.h
> > +++ b/arch/arm64/kvm/hyp/include/nvhe/mm.h
> > @@ -20,7 +20,8 @@ int pkvm_cpu_set_vector(enum arm64_hyp_spectre_vector slot);
> >  int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot);
> >  int pkvm_create_mappings_locked(void *from, void *to, enum kvm_pgtable_prot prot);
> >  unsigned long __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
> > -                                           enum kvm_pgtable_prot prot);
> > +                                       size_t align, enum kvm_pgtable_prot prot);
>
> Minor nit: the alignment of this does not match how it was before,
> i.e., it's not in line with the other function parameters. Yet it
> still goes over 80 characters.

Ack
>
> > +unsigned long pkvm_alloc_private_va_range(size_t size, size_t align);
> >
> >  static inline void hyp_vmemmap_range(phys_addr_t phys, unsigned long size,
> >                                      unsigned long *start, unsigned long *end)
> > diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > index 5e2197db0d32..96b2312a0f1d 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > @@ -158,9 +158,10 @@ static void handle___pkvm_create_private_mapping(struct kvm_cpu_context *host_ct
> >  {
> >         DECLARE_REG(phys_addr_t, phys, host_ctxt, 1);
> >         DECLARE_REG(size_t, size, host_ctxt, 2);
> > -       DECLARE_REG(enum kvm_pgtable_prot, prot, host_ctxt, 3);
> > +       DECLARE_REG(size_t, align, host_ctxt, 3);
> > +       DECLARE_REG(enum kvm_pgtable_prot, prot, host_ctxt, 4);
> >
> > -       cpu_reg(host_ctxt, 1) = __pkvm_create_private_mapping(phys, size, prot);
> > +       cpu_reg(host_ctxt, 1) = __pkvm_create_private_mapping(phys, size, align, prot);
> >  }
> >
> >  static void handle___pkvm_prot_finalize(struct kvm_cpu_context *host_ctxt)
> > diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
> > index 526a7d6fa86f..f35468ec639d 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/mm.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/mm.c
> > @@ -37,26 +37,46 @@ static int __pkvm_create_mappings(unsigned long start, unsigned long size,
> >         return err;
> >  }
> >
> > -unsigned long __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
> > -                                           enum kvm_pgtable_prot prot)
> > +/*
> > + * Allocates a private VA range above __io_map_base.
> > + *
> > + * @size:      The size of the VA range to reserve.
> > + * @align:     The required alignment for the allocation.
> > + */
> > +unsigned long pkvm_alloc_private_va_range(size_t size, size_t align)
> >  {
> > -       unsigned long addr;
> > -       int err;
> > +       unsigned long base, addr;
> >
> >         hyp_spin_lock(&pkvm_pgd_lock);
> >
> > -       size = PAGE_ALIGN(size + offset_in_page(phys));
> > -       addr = __io_map_base;
> > -       __io_map_base += size;
> > +       addr = ALIGN(__io_map_base, align);
> > +
> > +       /* The allocated size is always a multiple of PAGE_SIZE */
> > +       base = addr + PAGE_ALIGN(size);
> >
> >         /* Are we overflowing on the vmemmap ? */
> > -       if (__io_map_base > __hyp_vmemmap) {
> > -               __io_map_base -= size;
> > +       if (base > __hyp_vmemmap)
> >                 addr = (unsigned long)ERR_PTR(-ENOMEM);
> > +       else
> > +               __io_map_base = base;
> > +
> > +       hyp_spin_unlock(&pkvm_pgd_lock);
> > +
> > +       return addr;
> > +}
> > +
> > +unsigned long __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
> > +                                       size_t align, enum kvm_pgtable_prot prot)
> > +{
> > +       unsigned long addr;
> > +       int err;
> > +
> > +       size += offset_in_page(phys);
>
> Same as in the patch before, the previous code would align the size
> but not this change. However, looking at the callers and callees this
> seems to be fine, since it's aligned when needed.

This is now handled by pkvm_alloc_private_va_range(), so the caller
doesn't need to:

        ...
        /* The allocated size is always a multiple of PAGE_SIZE */
        base = addr + PAGE_ALIGN(size);
        ...
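
As a rough illustration of how the two pieces fit together, the
user-space model below has the mapping helper add the in-page offset to
the size and the allocator do the page rounding. Everything named
model_* is hypothetical, the address values are made up, and the
spinlock and the actual page-table mapping are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define PAGE_SIZE         UINT64_C(4096)        /* assumed 4K pages */
#define PAGE_ALIGN(x)     (((x) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))
#define ALIGN_UP(x, a)    (((x) + (a) - 1) & ~((uint64_t)(a) - 1))
#define OFFSET_IN_PAGE(p) ((p) & (PAGE_SIZE - 1))
#define MODEL_ERR(e)      ((uint64_t)(int64_t)-(e))

/* Made-up bounds standing in for __io_map_base and __hyp_vmemmap. */
static uint64_t model_io_map_base = UINT64_C(0x10001000);
static const uint64_t model_vmemmap = UINT64_C(0x10010000);

/* Upward allocator: align the start, then reserve whole pages. */
static uint64_t model_pkvm_alloc_va(uint64_t size, uint64_t align)
{
	uint64_t addr, end;

	/* The model only handles power-of-two alignments. */
	assert(align && (align & (align - 1)) == 0);

	addr = ALIGN_UP(model_io_map_base, align);
	end = addr + PAGE_ALIGN(size);

	if (end > model_vmemmap)
		return MODEL_ERR(ENOMEM);

	model_io_map_base = end;
	return addr;
}

/* The caller only accounts for the in-page offset of the physical address. */
static uint64_t model_create_private_mapping(uint64_t phys, uint64_t size,
					     uint64_t align)
{
	uint64_t addr;

	size += OFFSET_IN_PAGE(phys);

	addr = model_pkvm_alloc_va(size, align);
	if (addr >= MODEL_ERR(4095))	/* IS_ERR()-like check */
		return addr;

	/* The real code would create the page-table mapping here. */
	return addr + OFFSET_IN_PAGE(phys);
}
```

In this model, mapping 0x200 bytes at a physical address with offset
0xF00 reserves two pages, because the offset pushes the range across a
page boundary before the allocator rounds it up.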

Thanks,
Kalesh
>
> Thanks,
> /fuad
>
> > +       addr = pkvm_alloc_private_va_range(size, align);
> > +       if (IS_ERR((void *)addr))
> >                 goto out;
> > -       }
> >
> > -       err = kvm_pgtable_hyp_map(&pkvm_pgtable, addr, size, phys, prot);
> > +       err = __pkvm_create_mappings(addr, size, phys, prot);
> >         if (err) {
> >                 addr = (unsigned long)ERR_PTR(err);
> >                 goto out;
> > @@ -64,8 +84,6 @@ unsigned long __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
> >
> >         addr = addr + offset_in_page(phys);
> >  out:
> > -       hyp_spin_unlock(&pkvm_pgd_lock);
> > -
> >         return addr;
> >  }
> >
> > @@ -152,11 +170,10 @@ int hyp_map_vectors(void)
> >                 return 0;
> >
> >         phys = __hyp_pa(__bp_harden_hyp_vecs);
> > -       bp_base = (void *)__pkvm_create_private_mapping(phys,
> > -                                                       __BP_HARDEN_HYP_VECS_SZ,
> > -                                                       PAGE_HYP_EXEC);
> > +       bp_base = (void *)__pkvm_create_private_mapping(phys, __BP_HARDEN_HYP_VECS_SZ,
> > +                                                       PAGE_SIZE, PAGE_HYP_EXEC);
> >         if (IS_ERR_OR_NULL(bp_base))
> > -               return PTR_ERR(bp_base);
> > +               return bp_base ? PTR_ERR(bp_base) : -ENOMEM;
> >
> >         __hyp_bp_vect_base = bp_base;
> >
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index fc09536c8197..298e6d8439ef 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -505,7 +505,7 @@ int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
> >
> >         if (!kvm_host_owns_hyp_mappings()) {
> >                 addr = kvm_call_hyp_nvhe(__pkvm_create_private_mapping,
> > -                                        phys_addr, size, prot);
> > +                                        phys_addr, size, align, prot);
> >                 if (IS_ERR_OR_NULL((void *)addr))
> >                         return addr ? PTR_ERR((void *)addr) : -ENOMEM;
> >                 *haddr = addr;
> > --
> > 2.35.1.473.g83b2b277ed-goog
> >
> > --
> > To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
> >

* Re: [PATCH v3 2/8] KVM: arm64: Introduce pkvm_alloc_private_va_range()
@ 2022-02-24 17:28       ` Kalesh Singh
  0 siblings, 0 replies; 60+ messages in thread
From: Kalesh Singh @ 2022-02-24 17:28 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: Cc: Android Kernel, Andrew Walbran, Will Deacon,
	Peter Collingbourne, Marc Zyngier, LKML, kvmarm,
	Madhavan T. Venkataraman, Mark Brown, Masami Hiramatsu,
	Catalin Marinas, Paolo Bonzini, Suren Baghdasaryan,
	moderated list:ARM64 PORT (AARCH64 ARCHITECTURE)

On Thu, Feb 24, 2022 at 4:26 AM Fuad Tabba <tabba@google.com> wrote:
>
> Hi Kalesh,
>
> I really like how this makes the code cleaner in general. A couple of
> small nits below.
>
> On Thu, Feb 24, 2022 at 5:17 AM 'Kalesh Singh' via kernel-team
> <kernel-team@android.com> wrote:
> >
> > > pkvm_alloc_private_va_range() can be used to reserve private VA ranges
> > > in the pKVM nVHE hypervisor. Also update __pkvm_create_private_mapping()
> > > to allow specifying an alignment for the private VA mapping.
> >
> > These will be used to implement stack guard pages for pKVM nVHE hypervisor
> > (in a subsequent patch in the series).
> >
> > Credits to Quentin Perret <qperret@google.com> for the idea of moving
> > private VA allocation out of __pkvm_create_private_mapping()
> >
> > Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> > ---
> >
> > Changes in v3:
> >   - Handle null ptr in IS_ERR_OR_NULL checks, per Mark
> >
> > Changes in v2:
> >   - Allow specifying an alignment for the private VA allocations, per Marc
> >
> >  arch/arm64/kvm/hyp/include/nvhe/mm.h |  3 +-
> >  arch/arm64/kvm/hyp/nvhe/hyp-main.c   |  5 +--
> >  arch/arm64/kvm/hyp/nvhe/mm.c         | 51 ++++++++++++++++++----------
> >  arch/arm64/kvm/mmu.c                 |  2 +-
> >  4 files changed, 40 insertions(+), 21 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/hyp/include/nvhe/mm.h b/arch/arm64/kvm/hyp/include/nvhe/mm.h
> > index 2d08510c6cc1..05d06ad00347 100644
> > --- a/arch/arm64/kvm/hyp/include/nvhe/mm.h
> > +++ b/arch/arm64/kvm/hyp/include/nvhe/mm.h
> > @@ -20,7 +20,8 @@ int pkvm_cpu_set_vector(enum arm64_hyp_spectre_vector slot);
> >  int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot);
> >  int pkvm_create_mappings_locked(void *from, void *to, enum kvm_pgtable_prot prot);
> >  unsigned long __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
> > -                                           enum kvm_pgtable_prot prot);
> > +                                       size_t align, enum kvm_pgtable_prot prot);
>
> Minor nit: the alignment of this does not match how it was before,
> i.e., it's not in line with the other function parameters. Yet it
> still goes over 80 characters.

Ack
>
> > +unsigned long pkvm_alloc_private_va_range(size_t size, size_t align);
> >
> >  static inline void hyp_vmemmap_range(phys_addr_t phys, unsigned long size,
> >                                      unsigned long *start, unsigned long *end)
> > diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > index 5e2197db0d32..96b2312a0f1d 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > @@ -158,9 +158,10 @@ static void handle___pkvm_create_private_mapping(struct kvm_cpu_context *host_ct
> >  {
> >         DECLARE_REG(phys_addr_t, phys, host_ctxt, 1);
> >         DECLARE_REG(size_t, size, host_ctxt, 2);
> > -       DECLARE_REG(enum kvm_pgtable_prot, prot, host_ctxt, 3);
> > +       DECLARE_REG(size_t, align, host_ctxt, 3);
> > +       DECLARE_REG(enum kvm_pgtable_prot, prot, host_ctxt, 4);
> >
> > -       cpu_reg(host_ctxt, 1) = __pkvm_create_private_mapping(phys, size, prot);
> > +       cpu_reg(host_ctxt, 1) = __pkvm_create_private_mapping(phys, size, align, prot);
> >  }
> >
> >  static void handle___pkvm_prot_finalize(struct kvm_cpu_context *host_ctxt)
> > diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
> > index 526a7d6fa86f..f35468ec639d 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/mm.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/mm.c
> > @@ -37,26 +37,46 @@ static int __pkvm_create_mappings(unsigned long start, unsigned long size,
> >         return err;
> >  }
> >
> > -unsigned long __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
> > -                                           enum kvm_pgtable_prot prot)
> > +/*
> > + * Allocates a private VA range above __io_map_base.
> > + *
> > + * @size:      The size of the VA range to reserve.
> > + * @align:     The required alignment for the allocation.
> > + */
> > +unsigned long pkvm_alloc_private_va_range(size_t size, size_t align)
> >  {
> > -       unsigned long addr;
> > -       int err;
> > +       unsigned long base, addr;
> >
> >         hyp_spin_lock(&pkvm_pgd_lock);
> >
> > -       size = PAGE_ALIGN(size + offset_in_page(phys));
> > -       addr = __io_map_base;
> > -       __io_map_base += size;
> > +       addr = ALIGN(__io_map_base, align);
> > +
> > +       /* The allocated size is always a multiple of PAGE_SIZE */
> > +       base = addr + PAGE_ALIGN(size);
> >
> >         /* Are we overflowing on the vmemmap ? */
> > -       if (__io_map_base > __hyp_vmemmap) {
> > -               __io_map_base -= size;
> > +       if (base > __hyp_vmemmap)
> >                 addr = (unsigned long)ERR_PTR(-ENOMEM);
> > +       else
> > +               __io_map_base = base;
> > +
> > +       hyp_spin_unlock(&pkvm_pgd_lock);
> > +
> > +       return addr;
> > +}
> > +
> > +unsigned long __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
> > +                                       size_t align, enum kvm_pgtable_prot prot)
> > +{
> > +       unsigned long addr;
> > +       int err;
> > +
> > +       size += offset_in_page(phys);
>
> Same as in the patch before, the previous code would align the size
> but not this change. However, looking at the callers and callees this
> seems to be fine, since it's aligned when needed.

This is now handled by pkvm_alloc_private_va_range(), so the caller
doesn't need to page-align the size itself:

        ...
        /* The allocated size is always a multiple of PAGE_SIZE */
        base = addr + PAGE_ALIGN(size);
        ...
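
To illustrate the allocation logic quoted above, here is a minimal
user-space sketch of the same bump-allocator pattern. It is not the
hypervisor code: ALIGN_UP, alloc_va_range(), va_limit and the addresses
are made up for the example, locking is omitted, and failure is reported
as 0 instead of ERR_PTR(-ENOMEM).

```c
#include <assert.h>

#define PAGE_SIZE 4096UL
/* Round x up to the next multiple of a; a must be a power of two. */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((a) - 1))

/* Hypothetical stand-ins for __io_map_base and __hyp_vmemmap. */
static unsigned long io_map_base = 0x10000000UL;
static const unsigned long va_limit = 0x10008000UL;

/*
 * Reserve a VA range of at least `size` bytes starting at an address
 * aligned to `align`. The reserved length is rounded up to a whole
 * number of pages, so callers may pass an unaligned size.
 * Returns the start address, or 0 if the range would cross va_limit
 * (assumption for this sketch; the patch returns ERR_PTR(-ENOMEM)).
 */
unsigned long alloc_va_range(unsigned long size, unsigned long align)
{
	unsigned long addr, end;

	assert(align && (align & (align - 1)) == 0);

	addr = ALIGN_UP(io_map_base, align);
	end = addr + ALIGN_UP(size, PAGE_SIZE);

	if (end > va_limit)
		return 0;

	/* Only commit the new base once the range is known to fit. */
	io_map_base = end;
	return addr;
}
```

With these made-up values, a 100-byte request aligned to PAGE_SIZE still
consumes a full page, and a following one-page request aligned to
2 * PAGE_SIZE skips ahead to the next 8 KiB boundary. A request that does
not fit leaves io_map_base untouched, which is why the patch no longer
needs the "__io_map_base -= size" rollback.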

Thanks,
Kalesh
>
> Thanks,
> /fuad
>
> > +       addr = pkvm_alloc_private_va_range(size, align);
> > +       if (IS_ERR((void *)addr))
> >                 goto out;
> > -       }
> >
> > -       err = kvm_pgtable_hyp_map(&pkvm_pgtable, addr, size, phys, prot);
> > +       err = __pkvm_create_mappings(addr, size, phys, prot);
> >         if (err) {
> >                 addr = (unsigned long)ERR_PTR(err);
> >                 goto out;
> > @@ -64,8 +84,6 @@ unsigned long __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
> >
> >         addr = addr + offset_in_page(phys);
> >  out:
> > -       hyp_spin_unlock(&pkvm_pgd_lock);
> > -
> >         return addr;
> >  }
> >
> > @@ -152,11 +170,10 @@ int hyp_map_vectors(void)
> >                 return 0;
> >
> >         phys = __hyp_pa(__bp_harden_hyp_vecs);
> > -       bp_base = (void *)__pkvm_create_private_mapping(phys,
> > -                                                       __BP_HARDEN_HYP_VECS_SZ,
> > -                                                       PAGE_HYP_EXEC);
> > +       bp_base = (void *)__pkvm_create_private_mapping(phys, __BP_HARDEN_HYP_VECS_SZ,
> > +                                                       PAGE_SIZE, PAGE_HYP_EXEC);
> >         if (IS_ERR_OR_NULL(bp_base))
> > -               return PTR_ERR(bp_base);
> > +               return bp_base ? PTR_ERR(bp_base) : -ENOMEM;
> >
> >         __hyp_bp_vect_base = bp_base;
> >
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index fc09536c8197..298e6d8439ef 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -505,7 +505,7 @@ int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
> >
> >         if (!kvm_host_owns_hyp_mappings()) {
> >                 addr = kvm_call_hyp_nvhe(__pkvm_create_private_mapping,
> > -                                        phys_addr, size, prot);
> > +                                        phys_addr, size, align, prot);
> >                 if (IS_ERR_OR_NULL((void *)addr))
> >                         return addr ? PTR_ERR((void *)addr) : -ENOMEM;
> >                 *haddr = addr;
> > --
> > 2.35.1.473.g83b2b277ed-goog
> >
> > --
> > To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
> >

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v3 3/8] KVM: arm64: Add guard pages for KVM nVHE hypervisor stack
  2022-02-24 12:26     ` Fuad Tabba
  (?)
@ 2022-02-24 17:54       ` Kalesh Singh
  -1 siblings, 0 replies; 60+ messages in thread
From: Kalesh Singh @ 2022-02-24 17:54 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: Will Deacon, Marc Zyngier, Quentin Perret, Suren Baghdasaryan,
	Cc: Android Kernel, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Catalin Marinas, Mark Rutland, Mark Brown,
	Masami Hiramatsu, Peter Collingbourne, Madhavan T. Venkataraman,
	Andrew Walbran, Andrew Scull, Paolo Bonzini,
	moderated list:ARM64 PORT (AARCH64 ARCHITECTURE),
	kvmarm, LKML

On Thu, Feb 24, 2022 at 4:26 AM Fuad Tabba <tabba@google.com> wrote:
>
> Hi Kalesh,
>
>
>
> On Thu, Feb 24, 2022 at 5:18 AM Kalesh Singh <kaleshsingh@google.com> wrote:
> >
> > Maps the stack pages in the flexible private VA range and allocates
> > guard pages below the stack as unbacked VA space. The stack is aligned
> > to twice its size to aid overflow detection (implemented in a subsequent
> > patch in the series).
> >
> > Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> > ---
> >
> > Changes in v3:
> >   - Handle null ptr in IS_ERR_OR_NULL checks, per Mark
> >
> >  arch/arm64/include/asm/kvm_asm.h |  1 +
> >  arch/arm64/kvm/arm.c             | 32 +++++++++++++++++++++++++++++---
> >  2 files changed, 30 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> > index d5b0386ef765..2e277f2ed671 100644
> > --- a/arch/arm64/include/asm/kvm_asm.h
> > +++ b/arch/arm64/include/asm/kvm_asm.h
> > @@ -169,6 +169,7 @@ struct kvm_nvhe_init_params {
> >         unsigned long tcr_el2;
> >         unsigned long tpidr_el2;
> >         unsigned long stack_hyp_va;
> > +       unsigned long stack_pa;
> >         phys_addr_t pgd_pa;
> >         unsigned long hcr_el2;
> >         unsigned long vttbr;
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index ecc5958e27fe..7a23630c4a7f 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -1541,7 +1541,6 @@ static void cpu_prepare_hyp_mode(int cpu)
> >         tcr |= (idmap_t0sz & GENMASK(TCR_TxSZ_WIDTH - 1, 0)) << TCR_T0SZ_OFFSET;
> >         params->tcr_el2 = tcr;
> >
> > -       params->stack_hyp_va = kern_hyp_va(per_cpu(kvm_arm_hyp_stack_page, cpu) + PAGE_SIZE);
> >         params->pgd_pa = kvm_mmu_get_httbr();
> >         if (is_protected_kvm_enabled())
> >                 params->hcr_el2 = HCR_HOST_NVHE_PROTECTED_FLAGS;
> > @@ -1990,14 +1989,41 @@ static int init_hyp_mode(void)
> >          * Map the Hyp stack pages
> >          */
> >         for_each_possible_cpu(cpu) {
> > +               struct kvm_nvhe_init_params *params = per_cpu_ptr_nvhe_sym(kvm_init_params, cpu);
> >                 char *stack_page = (char *)per_cpu(kvm_arm_hyp_stack_page, cpu);
> > -               err = create_hyp_mappings(stack_page, stack_page + PAGE_SIZE,
> > -                                         PAGE_HYP);
> > +               unsigned long stack_hyp_va, guard_hyp_va;
> >
> > +               /*
> > +                * Private mappings are allocated downwards from io_map_base
> > +                * so allocate the stack first then the guard page.
> > +                *
> > +                * The stack is aligned to twice its size to facilitate overflow
> > +                * detection.
> > +                */
> > +               err = __create_hyp_private_mapping(__pa(stack_page), PAGE_SIZE,
> > +                                               PAGE_SIZE * 2, &stack_hyp_va, PAGE_HYP);
> >                 if (err) {
> >                         kvm_err("Cannot map hyp stack\n");
> >                         goto out_err;
> >                 }
> > +
> > +               /* Allocate unbacked private VA range for stack guard page */
> > +               guard_hyp_va = hyp_alloc_private_va_range(PAGE_SIZE, PAGE_SIZE);
> > +               if (IS_ERR_OR_NULL((void *)guard_hyp_va)) {
> > +                       err = guard_hyp_va ? PTR_ERR((void *)guard_hyp_va) : -ENOMEM;
>
> I am a bit confused by this check. hyp_alloc_private_va_range() always
> returns ERR_PTR(-ENOMEM) if there's an error. Mark's comment (if I
> understood it correctly) was about how you were handling it *in*
> hyp_alloc_private_va_range(), rather than calls *to*
> hyp_alloc_private_va_range().

Mark's comments were about the callers. I think the address can still be
NULL without -ENOMEM being returned (judging from what the check was
before hyp_alloc_private_va_range() was introduced). You make a good
point, though: I think we can handle any potential NULL inside
*_alloc_private_va_range() and drop the use of PTR_ERR() with
IS_ERR_OR_NULL() (which does not seem like a good idea in general).
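
For reference, a small user-space sketch of why PTR_ERR() combined with
IS_ERR_OR_NULL() is fragile. The lower-case helpers below are stand-ins
written for this example that mirror the kernel macros in
include/linux/err.h; the three status_*() functions are hypothetical and
only model the caller-side checks discussed here.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in the top 4095 addresses. */
static inline void *err_ptr(long error)
{
	assert(error < 0 && error >= -MAX_ERRNO);
	return (void *)(intptr_t)error;
}

static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline int is_err_or_null(const void *ptr)
{
	return !ptr || is_err(ptr);
}

/* Fragile: ptr_err(NULL) is 0, so a NULL result is reported as success. */
int status_naive(const void *p)
{
	if (is_err_or_null(p))
		return (int)ptr_err(p);
	return 0;
}

/* The v3 caller-side pattern: map NULL to -ENOMEM explicitly. */
int status_v3(const void *p)
{
	if (is_err_or_null(p))
		return p ? (int)ptr_err(p) : -ENOMEM;
	return 0;
}

/* If the allocator never returns NULL, a plain is_err() check suffices. */
int status_simple(const void *p)
{
	if (is_err(p))
		return (int)ptr_err(p);
	return 0;
}
```

The third variant is the direction proposed above: once the allocator
itself guarantees that failure is always an error pointer, the callers
can drop the NULL special case.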

>
> > +                       kvm_err("Cannot allocate hyp stack guard page\n");
> > +                       goto out_err;
> > +               }
> > +
> > +               /*
> > +                * Save the stack PA in nvhe_init_params. This will be needed to recreate
> > +                * the stack mapping in protected nVHE mode. __hyp_pa() won't do the right
> > +                * thing there, since the stack has been mapped in the flexible private
> > +                * VA space.
> > +                */
>
> Nit: These comments go over 80 columns, unlike other comments that
> you've added in this file.

Ack. I'll update in the next version.

Thanks,
Kalesh

>
> Thanks,
> /fuad
>
> > +               params->stack_pa = __pa(stack_page) + PAGE_SIZE;
> > +
> > +               params->stack_hyp_va = stack_hyp_va + PAGE_SIZE;
> >         }
> >
> >         for_each_possible_cpu(cpu) {
> > --
> > 2.35.1.473.g83b2b277ed-goog
> >

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v3 6/8] KVM: arm64: Add hypervisor overflow stack
  2022-02-24 12:26     ` Fuad Tabba
  (?)
@ 2022-02-24 17:56       ` Kalesh Singh
  -1 siblings, 0 replies; 60+ messages in thread
From: Kalesh Singh @ 2022-02-24 17:56 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: Will Deacon, Marc Zyngier, Quentin Perret, Suren Baghdasaryan,
	Cc: Android Kernel, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Catalin Marinas, Mark Rutland, Mark Brown,
	Masami Hiramatsu, Peter Collingbourne, Madhavan T. Venkataraman,
	Andrew Walbran, Andrew Scull, Zenghui Yu, Ard Biesheuvel,
	Paolo Bonzini, moderated list:ARM64 PORT (AARCH64 ARCHITECTURE),
	kvmarm, LKML

On Thu, Feb 24, 2022 at 4:27 AM Fuad Tabba <tabba@google.com> wrote:
>
> Hi Kalesh,
>
> On Thu, Feb 24, 2022 at 5:21 AM Kalesh Singh <kaleshsingh@google.com> wrote:
> >
> > Allocate and switch to 16-byte aligned secondary stack on overflow. This
> > provides us stack space to better handle overflows; and is used in
> > a subsequent patch to dump the hypervisor stacktrace. The overflow stack
> > is only allocated if CONFIG_NVHE_EL2_DEBUG is enabled, as hypervisor
> > stacktraces is a debug feature dependent on CONFIG_NVHE_EL2_DEBUG.
> >
> > Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> > ---
> >  arch/arm64/kvm/hyp/nvhe/host.S   | 5 +++++
> >  arch/arm64/kvm/hyp/nvhe/switch.c | 5 +++++
> >  2 files changed, 10 insertions(+)
> >
> > diff --git a/arch/arm64/kvm/hyp/nvhe/host.S b/arch/arm64/kvm/hyp/nvhe/host.S
> > index 749961bfa5ba..367a01e8abed 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/host.S
> > +++ b/arch/arm64/kvm/hyp/nvhe/host.S
> > @@ -179,6 +179,10 @@ SYM_FUNC_END(__host_hvc)
> >         b       hyp_panic
> >
> >  .L__hyp_sp_overflow\@:
> > +#ifdef CONFIG_NVHE_EL2_DEBUG
> > +       /* Switch to the overflow stack */
> > +       adr_this_cpu sp, hyp_overflow_stack + PAGE_SIZE, x0
> > +#else
> >         /*
> >          * Reset SP to the top of the stack, to allow handling the hyp_panic.
> >          * This corrupts the stack but is ok, since we won't be attempting
> > @@ -186,6 +190,7 @@ SYM_FUNC_END(__host_hvc)
> >          */
>
> Nit: Maybe you should update this comment as well, since whether it
> corrupts the stack or not depends on what happens above with
> CONFIG_NVHE_EL2_DEBUG.

Ack, will update it in the next version.

Thanks,
Kalesh
>
> Thanks,
> /fuad
>
> >         ldr_this_cpu    x0, kvm_init_params + NVHE_INIT_STACK_HYP_VA, x1
> >         mov     sp, x0
> > +#endif
> >
> >         bl      hyp_panic_bad_stack
> >         ASM_BUG()
> > diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
> > index 703a5d3f611b..efc20273a352 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/switch.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/switch.c
> > @@ -34,6 +34,11 @@ DEFINE_PER_CPU(struct kvm_host_data, kvm_host_data);
> >  DEFINE_PER_CPU(struct kvm_cpu_context, kvm_hyp_ctxt);
> >  DEFINE_PER_CPU(unsigned long, kvm_hyp_vector);
> >
> > +#ifdef CONFIG_NVHE_EL2_DEBUG
> > +DEFINE_PER_CPU(unsigned long [PAGE_SIZE/sizeof(long)], hyp_overflow_stack)
> > +       __aligned(16);
> > +#endif
> > +
> >  static void __activate_traps(struct kvm_vcpu *vcpu)
> >  {
> >         u64 val;
> > --
> > 2.35.1.473.g83b2b277ed-goog
> >

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v3 7/8] KVM: arm64: Unwind and dump nVHE HYP stacktrace
  2022-02-24 12:28     ` Fuad Tabba
  (?)
@ 2022-02-24 18:08       ` Kalesh Singh
  -1 siblings, 0 replies; 60+ messages in thread
From: Kalesh Singh @ 2022-02-24 18:08 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: Will Deacon, Marc Zyngier, Quentin Perret, Suren Baghdasaryan,
	Cc: Android Kernel, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Catalin Marinas, Mark Rutland, Mark Brown,
	Masami Hiramatsu, Peter Collingbourne, Madhavan T. Venkataraman,
	Andrew Scull, Paolo Bonzini, Ard Biesheuvel,
	moderated list:ARM64 PORT (AARCH64 ARCHITECTURE),
	kvmarm, LKML

On Thu, Feb 24, 2022 at 4:28 AM Fuad Tabba <tabba@google.com> wrote:
>
> Hi Kalesh,
>
> On Thu, Feb 24, 2022 at 5:22 AM Kalesh Singh <kaleshsingh@google.com> wrote:
> >
> > Unwind the stack in EL1, when CONFIG_NVHE_EL2_DEBUG is enabled. This is
> > possible because CONFIG_NVHE_EL2_DEBUG disables the host stage 2 protection
> > which allows host to access the hypervisor stack pages in EL1.
>
> For this comment to be clearer, and if my understanding is correct, I
> think that it should say that CONFIG_NVHE_EL2_DEBUG allows host stage
> 2 protection to be disabled on a hyp_panic. Otherwise, on reading the
> comment one might think that CONFIG_NVHE_EL2_DEBUG runs without host
> stage 2 protection at all.

Your understanding is correct: the host stage 2 protection is only
disabled on a hyp_panic(). I'll rephrase to make it clearer.

>
> >
> > Unwinding and dumping hyp call traces is gated on CONFIG_NVHE_EL2_DEBUG
> > to avoid the potential leaking of information to the host.
> >
> > A simple stack overflow test produces the following output:
> >
> > [  580.376051][  T412] kvm: nVHE hyp panic at: ffffffc0116145c4!
> > [  580.378034][  T412] kvm [412]: nVHE HYP call trace:
> > [  580.378591][  T412] kvm [412]:  [<ffffffc011614934>]
> > [  580.378993][  T412] kvm [412]:  [<ffffffc01160fa48>]
> > [  580.379386][  T412] kvm [412]:  [<ffffffc0116145dc>]  // Non-terminating recursive call
> > [  580.379772][  T412] kvm [412]:  [<ffffffc0116145dc>]
> > [  580.380158][  T412] kvm [412]:  [<ffffffc0116145dc>]
> > [  580.380544][  T412] kvm [412]:  [<ffffffc0116145dc>]
> > [  580.380928][  T412] kvm [412]:  [<ffffffc0116145dc>]
> > . . .
> >
> > Since nVHE hyp symbols are not included by kallsyms to avoid issues
> > with aliasing, we fall back to the vmlinux addresses. Symbolizing the
> > addresses is handled in the next patch in this series.
> >
> > Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> > ---
> >
> > Changes in v3:
> >   - The nvhe hyp stack unwinder now makes use of the core logic from the
> >     regular kernel unwinder to avoid duplication, per Mark
> >
> > Changes in v2:
> >   - Add cpu_prepare_nvhe_panic_info()
> >   - Move updating the panic info to hyp_panic(), so that unwinding also
> >     works for conventional nVHE Hyp-mode.
> >
> >  arch/arm64/include/asm/kvm_asm.h    |  19 +++
> >  arch/arm64/include/asm/stacktrace.h |  12 ++
> >  arch/arm64/kernel/stacktrace.c      | 210 +++++++++++++++++++++++++---
> >  arch/arm64/kvm/Kconfig              |   5 +-
> >  arch/arm64/kvm/arm.c                |   2 +-
> >  arch/arm64/kvm/handle_exit.c        |   3 +
> >  arch/arm64/kvm/hyp/nvhe/switch.c    |  18 +++
> >  7 files changed, 243 insertions(+), 26 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> > index 2e277f2ed671..16efdf150a37 100644
> > --- a/arch/arm64/include/asm/kvm_asm.h
> > +++ b/arch/arm64/include/asm/kvm_asm.h
> > @@ -176,6 +176,25 @@ struct kvm_nvhe_init_params {
> >         unsigned long vtcr;
> >  };
> >
> > +#ifdef CONFIG_NVHE_EL2_DEBUG
> > +/*
> > + * Used by the host in EL1 to dump the nVHE hypervisor backtrace on
> > + * hyp_panic. This is possible because CONFIG_NVHE_EL2_DEBUG disables
> > + * the host stage 2 protection. See: __hyp_do_panic()
>
> Same as my comment above.

Ack
>
> > + *
> > + * @hyp_stack_base:             hyp VA of the hyp_stack base.
> > + * @hyp_overflow_stack_base:    hyp VA of the hyp_overflow_stack base.
> > + * @fp:                         hyp FP where the backtrace begins.
> > + * @pc:                         hyp PC where the backtrace begins.
> > + */
> > +struct kvm_nvhe_panic_info {
> > +       unsigned long hyp_stack_base;
> > +       unsigned long hyp_overflow_stack_base;
> > +       unsigned long fp;
> > +       unsigned long pc;
> > +};
> > +#endif /* CONFIG_NVHE_EL2_DEBUG */
> > +
> >  /* Translate a kernel address @ptr into its equivalent linear mapping */
> >  #define kvm_ksym_ref(ptr)                                              \
> >         ({                                                              \
> > diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
> > index e77cdef9ca29..18611a51cf14 100644
> > --- a/arch/arm64/include/asm/stacktrace.h
> > +++ b/arch/arm64/include/asm/stacktrace.h
> > @@ -22,6 +22,10 @@ enum stack_type {
> >         STACK_TYPE_OVERFLOW,
> >         STACK_TYPE_SDEI_NORMAL,
> >         STACK_TYPE_SDEI_CRITICAL,
> > +#ifdef CONFIG_NVHE_EL2_DEBUG
> > +       STACK_TYPE_KVM_NVHE_HYP,
> > +       STACK_TYPE_KVM_NVHE_OVERFLOW,
> > +#endif /* CONFIG_NVHE_EL2_DEBUG */
> >         __NR_STACK_TYPES
> >  };
> >
> > @@ -147,4 +151,12 @@ static inline bool on_accessible_stack(const struct task_struct *tsk,
> >         return false;
> >  }
> >
> > +#ifdef CONFIG_NVHE_EL2_DEBUG
> > +void kvm_nvhe_dump_backtrace(unsigned long hyp_offset);
> > +#else
> > +static inline void kvm_nvhe_dump_backtrace(unsigned long hyp_offset)
> > +{
> > +}
> > +#endif /* CONFIG_NVHE_EL2_DEBUG */
> > +
> >  #endif /* __ASM_STACKTRACE_H */
> > diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> > index e4103e085681..6ec85cb69b1f 100644
> > --- a/arch/arm64/kernel/stacktrace.c
> > +++ b/arch/arm64/kernel/stacktrace.c
> > @@ -15,6 +15,8 @@
> >
> >  #include <asm/irq.h>
> >  #include <asm/pointer_auth.h>
> > +#include <asm/kvm_asm.h>
> > +#include <asm/kvm_hyp.h>
> >  #include <asm/stack_pointer.h>
> >  #include <asm/stacktrace.h>
> >
> > @@ -64,26 +66,15 @@ NOKPROBE_SYMBOL(start_backtrace);
> >   * records (e.g. a cycle), determined based on the location and fp value of A
> >   * and the location (but not the fp value) of B.
> >   */
> > -static int notrace unwind_frame(struct task_struct *tsk,
> > -                               struct stackframe *frame)
> > +static int notrace __unwind_frame(struct stackframe *frame, struct stack_info *info,
> > +               unsigned long (*translate_fp)(unsigned long, enum stack_type))
> >  {
> >         unsigned long fp = frame->fp;
> > -       struct stack_info info;
> > -
> > -       if (!tsk)
> > -               tsk = current;
> > -
> > -       /* Final frame; nothing to unwind */
> > -       if (fp == (unsigned long)task_pt_regs(tsk)->stackframe)
> > -               return -ENOENT;
> >
> >         if (fp & 0x7)
> >                 return -EINVAL;
> >
> > -       if (!on_accessible_stack(tsk, fp, 16, &info))
> > -               return -EINVAL;
> > -
> > -       if (test_bit(info.type, frame->stacks_done))
> > +       if (test_bit(info->type, frame->stacks_done))
> >                 return -EINVAL;
> >
> >         /*
> > @@ -94,28 +85,62 @@ static int notrace unwind_frame(struct task_struct *tsk,
> >          *
> >          * TASK -> IRQ -> OVERFLOW -> SDEI_NORMAL
> >          * TASK -> SDEI_NORMAL -> SDEI_CRITICAL -> OVERFLOW
> > +        * KVM_NVHE_HYP -> KVM_NVHE_OVERFLOW
> >          *
> >          * ... but the nesting itself is strict. Once we transition from one
> >          * stack to another, it's never valid to unwind back to that first
> >          * stack.
> >          */
> > -       if (info.type == frame->prev_type) {
> > +       if (info->type == frame->prev_type) {
> >                 if (fp <= frame->prev_fp)
> >                         return -EINVAL;
> >         } else {
> >                 set_bit(frame->prev_type, frame->stacks_done);
> >         }
> >
> > +       /* Record fp as prev_fp before attempting to get the next fp */
> > +       frame->prev_fp = fp;
> > +
> > +       /*
> > +        * If fp is not from the current address space perform the
> > +        * necessary translation before dereferencing it to get next fp.
> > +        */
> > +       if (translate_fp)
> > +               fp = translate_fp(fp, info->type);
> > +       if (!fp)
> > +               return -EINVAL;
> > +
> >         /*
> >          * Record this frame record's values and location. The prev_fp and
> > -        * prev_type are only meaningful to the next unwind_frame() invocation.
> > +        * prev_type are only meaningful to the next __unwind_frame() invocation.
> >          */
> >         frame->fp = READ_ONCE_NOCHECK(*(unsigned long *)(fp));
> >         frame->pc = READ_ONCE_NOCHECK(*(unsigned long *)(fp + 8));
> > -       frame->prev_fp = fp;
> > -       frame->prev_type = info.type;
> > -
> >         frame->pc = ptrauth_strip_insn_pac(frame->pc);
> > +       frame->prev_type = info->type;
> > +
> > +       return 0;
> > +}
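
To illustrate the shape of the refactored helper above, here is a userspace sketch of a single-stack unwind step with an optional address-translation callback. It assumes the AAPCS64 frame record layout (saved FP first, then the return address), which matches the reads at fp and fp + 8 in the patch. All names (sketch_unwind_step(), fake_stack, FOREIGN_BASE and so on) are hypothetical, and the stack-type bookkeeping (stacks_done, prev_type) is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* AAPCS64 frame record: saved caller FP followed by the return address. */
struct frame_record {
	uintptr_t fp;
	uintptr_t lr;
};

struct sketch_frame {
	uintptr_t fp;
	uintptr_t pc;
	uintptr_t prev_fp;
};

typedef uintptr_t (*translate_fn)(uintptr_t fp);

#define REC_SIZE     sizeof(struct frame_record)
#define FOREIGN_BASE ((uintptr_t)0x1000) /* hypothetical VA in the other address space */

static struct frame_record fake_stack[3];

/* Map a "foreign" FP onto fake_stack; returns 0 if it is out of range. */
static uintptr_t foreign_to_local(uintptr_t fp)
{
	uintptr_t off = fp - FOREIGN_BASE;

	if (fp < FOREIGN_BASE || off > sizeof(fake_stack) - REC_SIZE)
		return 0;
	return (uintptr_t)fake_stack + off;
}

/* One unwind step; returns 0 on success, -1 if the record looks invalid. */
static int sketch_unwind_step(struct sketch_frame *frame, translate_fn translate)
{
	uintptr_t fp = frame->fp;
	const struct frame_record *rec;

	if (fp & 0x7)             /* frame records must be 8-byte aligned */
		return -1;
	if (fp <= frame->prev_fp) /* on one stack, records must move upwards */
		return -1;

	/* Record the untranslated FP before translating, as the patch does. */
	frame->prev_fp = fp;

	if (translate)
		fp = translate(fp);
	if (!fp)
		return -1;

	rec = (const struct frame_record *)fp;
	frame->fp = rec->fp;
	frame->pc = rec->lr;
	return 0;
}

/* Walk three chained records and collect their return addresses. */
static int sketch_walk(uintptr_t *pcs, int max)
{
	struct sketch_frame frame = { .fp = FOREIGN_BASE, .pc = 0, .prev_fp = 0 };
	int n = 0;

	fake_stack[0] = (struct frame_record){ FOREIGN_BASE + 1 * REC_SIZE, 0xa0 };
	fake_stack[1] = (struct frame_record){ FOREIGN_BASE + 2 * REC_SIZE, 0xb0 };
	fake_stack[2] = (struct frame_record){ 0, 0xc0 }; /* FP of 0 ends the walk */

	while (n < max && sketch_unwind_step(&frame, foreign_to_local) == 0)
		pcs[n++] = frame.pc;
	return n;
}
```

Keeping prev_fp in the foreign address space means the monotonicity check compares like with like, which appears to be why the patch moves the prev_fp assignment ahead of the translation.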
> > +
> > +static int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
> > +{
> > +       unsigned long fp = frame->fp;
> > +       struct stack_info info;
> > +       int err;
> > +
> > +       if (!tsk)
> > +               tsk = current;
> > +
> > +       /* Final frame; nothing to unwind */
> > +       if (fp == (unsigned long)task_pt_regs(tsk)->stackframe)
> > +               return -ENOENT;
> > +
> > +       if (!on_accessible_stack(tsk, fp, 16, &info))
> > +               return -EINVAL;
> > +
> > +       err = __unwind_frame(frame, &info, NULL);
> > +       if (err)
> > +               return err;
> >
> >  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
> >         if (tsk->ret_stack &&
> > @@ -143,20 +168,27 @@ static int notrace unwind_frame(struct task_struct *tsk,
> >  }
> >  NOKPROBE_SYMBOL(unwind_frame);
> >
> > -static void notrace walk_stackframe(struct task_struct *tsk,
> > -                                   struct stackframe *frame,
> > -                                   bool (*fn)(void *, unsigned long), void *data)
> > +static void notrace __walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
> > +               bool (*fn)(void *, unsigned long), void *data,
> > +               int (*unwind_frame_fn)(struct task_struct *tsk, struct stackframe *frame))
> >  {
> >         while (1) {
> >                 int ret;
> >
> >                 if (!fn(data, frame->pc))
> >                         break;
> > -               ret = unwind_frame(tsk, frame);
> > +               ret = unwind_frame_fn(tsk, frame);
> >                 if (ret < 0)
> >                         break;
> >         }
> >  }
> > +
> > +static void notrace walk_stackframe(struct task_struct *tsk,
> > +                                   struct stackframe *frame,
> > +                                   bool (*fn)(void *, unsigned long), void *data)
> > +{
> > +       __walk_stackframe(tsk, frame, fn, data, unwind_frame);
> > +}
> >  NOKPROBE_SYMBOL(walk_stackframe);
> >
> >  static bool dump_backtrace_entry(void *arg, unsigned long where)
> > @@ -210,3 +242,135 @@ noinline notrace void arch_stack_walk(stack_trace_consume_fn consume_entry,
> >
> >         walk_stackframe(task, &frame, consume_entry, cookie);
> >  }
> > +
> > +#ifdef CONFIG_NVHE_EL2_DEBUG
> > +DECLARE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
> > +DECLARE_KVM_NVHE_PER_CPU(unsigned long [PAGE_SIZE/sizeof(long)], hyp_overflow_stack);
> > +DECLARE_KVM_NVHE_PER_CPU(struct kvm_nvhe_panic_info, kvm_panic_info);
> > +
> > +static inline bool kvm_nvhe_on_overflow_stack(unsigned long sp, unsigned long size,
> > +                                struct stack_info *info)
> > +{
> > +       struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
> > +       unsigned long low = (unsigned long)panic_info->hyp_overflow_stack_base;
> > +       unsigned long high = low + PAGE_SIZE;
> > +
> > +       return on_stack(sp, size, low, high, STACK_TYPE_KVM_NVHE_OVERFLOW, info);
> > +}
> > +
> > +static inline bool kvm_nvhe_on_hyp_stack(unsigned long sp, unsigned long size,
> > +                                struct stack_info *info)
> > +{
> > +       struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
> > +       unsigned long low = (unsigned long)panic_info->hyp_stack_base;
> > +       unsigned long high = low + PAGE_SIZE;
> > +
> > +       return on_stack(sp, size, low, high, STACK_TYPE_KVM_NVHE_HYP, info);
> > +}
> > +
> > +static inline bool kvm_nvhe_on_accessible_stack(unsigned long sp, unsigned long size,
> > +                                      struct stack_info *info)
> > +{
> > +       if (info)
> > +               info->type = STACK_TYPE_UNKNOWN;
> > +
> > +       if (kvm_nvhe_on_hyp_stack(sp, size, info))
> > +               return true;
> > +       if (kvm_nvhe_on_overflow_stack(sp, size, info))
> > +               return true;
> > +
> > +       return false;
> > +}
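
The three helpers above reduce to a bounds check per stack followed by a classification. A small sketch of that logic, modelled on the range test that on_stack() performs in asm/stacktrace.h; the enum values, the 4 KiB page size and the base addresses used in any call are assumptions made for illustration:

```c
#include <assert.h>
#include <stdbool.h>

enum sketch_stack_type { SKETCH_UNKNOWN, SKETCH_HYP, SKETCH_OVERFLOW };

#define SKETCH_PAGE_SIZE 4096UL /* assumption: 4 KiB pages */

/* True if the object [sp, sp + size) lies entirely within [low, high). */
static bool sketch_on_stack(unsigned long sp, unsigned long size,
			    unsigned long low, unsigned long high)
{
	if (!low)
		return false;
	/* Reject objects that start too low, wrap around, or run past the top. */
	if (sp < low || sp + size < sp || sp + size > high)
		return false;
	return true;
}

/* Try the hyp stack first, then the overflow stack, as the patch does. */
static enum sketch_stack_type sketch_classify(unsigned long sp, unsigned long size,
					      unsigned long hyp_base,
					      unsigned long overflow_base)
{
	if (sketch_on_stack(sp, size, hyp_base, hyp_base + SKETCH_PAGE_SIZE))
		return SKETCH_HYP;
	if (sketch_on_stack(sp, size, overflow_base, overflow_base + SKETCH_PAGE_SIZE))
		return SKETCH_OVERFLOW;
	return SKETCH_UNKNOWN;
}
```

The size argument is 16 in the unwinder because a frame record holds two 8-byte values, so both must fit below the top of the stack.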
> > +
> > +static unsigned long kvm_nvhe_hyp_stack_kern_va(unsigned long addr)
> > +{
> > +       struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
> > +       unsigned long hyp_base, kern_base, hyp_offset;
> > +
> > +       hyp_base = (unsigned long)panic_info->hyp_stack_base;
> > +       hyp_offset = addr - hyp_base;
> > +
> > +       kern_base = (unsigned long)*this_cpu_ptr(&kvm_arm_hyp_stack_page);
> > +
> > +       return kern_base + hyp_offset;
> > +}
> > +
> > +static unsigned long kvm_nvhe_overflow_stack_kern_va(unsigned long addr)
> > +{
> > +       struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
> > +       unsigned long hyp_base, kern_base, hyp_offset;
> > +
> > +       hyp_base = (unsigned long)panic_info->hyp_overflow_stack_base;
> > +       hyp_offset = addr - hyp_base;
> > +
> > +       kern_base = (unsigned long)this_cpu_ptr_nvhe_sym(hyp_overflow_stack);
> > +
> > +       return kern_base + hyp_offset;
> > +}
> > +
> > +/*
> > + * Convert KVM nVHE hypervisor stack VA to a kernel VA.
> > + *
> > + * The nVHE hypervisor stack is mapped in the flexible 'private' VA range, to allow
> > + * for guard pages below the stack. Consequently, the fixed offset address
> > + * translation macros won't work here.
> > + *
> > + * The kernel VA is calculated as an offset from the kernel VA of the hypervisor
> > + * stack base. See: kvm_nvhe_hyp_stack_kern_va(),  kvm_nvhe_overflow_stack_kern_va()
> > + */
> > +static unsigned long kvm_nvhe_stack_kern_va(unsigned long addr,
> > +                                       enum stack_type type)
> > +{
> > +       switch (type) {
> > +       case STACK_TYPE_KVM_NVHE_HYP:
> > +               return kvm_nvhe_hyp_stack_kern_va(addr);
> > +       case STACK_TYPE_KVM_NVHE_OVERFLOW:
> > +               return kvm_nvhe_overflow_stack_kern_va(addr);
> > +       default:
> > +               return 0UL;
> > +       }
> > +}
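
As a worked example of the translation described in the comment above: the kernel VA is the kernel-side base of the stack plus the offset of the address from the hyp-side base. The sketch below uses made-up addresses and a hypothetical struct sketch_stack_map. Unlike the patch, which relies on the earlier kvm_nvhe_on_accessible_stack() check, it also range-checks the input so that it is safe to call on its own.

```c
#include <assert.h>

/* Hypothetical description of one stack, for illustration only. */
struct sketch_stack_map {
	unsigned long hyp_base;  /* base of the stack as seen from EL2 */
	unsigned long kern_base; /* base of the same page in the kernel mapping */
	unsigned long size;
};

/* Returns the kernel VA for a hyp VA on this stack, or 0 if out of range. */
static unsigned long sketch_hyp_to_kern_va(const struct sketch_stack_map *map,
					   unsigned long hyp_addr)
{
	unsigned long offset = hyp_addr - map->hyp_base;

	if (hyp_addr < map->hyp_base || offset >= map->size)
		return 0UL;
	return map->kern_base + offset;
}
```

With hyp_base = 0x4000 and kern_base = 0x9000, the hyp address 0x4ff0 has offset 0xff0 and therefore maps to 0x9ff0.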
> > +
> > +static int notrace kvm_nvhe_unwind_frame(struct task_struct *tsk,
> > +                                       struct stackframe *frame)
> > +{
> > +       struct stack_info info;
> > +
> > +       if (!kvm_nvhe_on_accessible_stack(frame->fp, 16, &info))
> > +               return -EINVAL;
> > +
> > +       return __unwind_frame(frame, &info, kvm_nvhe_stack_kern_va);
> > +}
> > +
> > +static bool kvm_nvhe_dump_backtrace_entry(void *arg, unsigned long where)
> > +{
> > +       unsigned long va_mask = GENMASK_ULL(vabits_actual - 1, 0);
> > +       unsigned long hyp_offset = (unsigned long)arg;
> > +
> > +       where &= va_mask;       /* Mask tags */
> > +       where += hyp_offset;    /* Convert to kern addr */
> > +
> > +       kvm_err("[<%016lx>] %pB\n", where, (void *)where);
> > +
> > +       return true;
> > +}
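
The address fix-up in kvm_nvhe_dump_backtrace_entry() is two steps: keep only the low VA bits, then add the hyp-to-kernel offset. A sketch of that arithmetic with fixed-width types; the VA width of 39 bits, the input PC and the offset in the example are made-up values chosen only to show the mechanics, and the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low 'bits' bits set; valid for 0 < bits < 64. */
static uint64_t sketch_va_mask(unsigned int bits)
{
	return (UINT64_C(1) << bits) - 1;
}

/* Strip the upper bits of a hyp PC, then rebase it to a kernel address. */
static uint64_t sketch_hyp_pc_to_kern(uint64_t where, unsigned int va_bits,
				      uint64_t hyp_offset)
{
	where &= sketch_va_mask(va_bits); /* drop tag and other upper bits */
	where += hyp_offset;              /* unsigned add, wraps modulo 2^64 */
	return where;
}
```

For example, with 39 VA bits, an input of 0xab00004011614934 masks down to 0x4011614934, and adding an offset of 0xffffff8000000000 gives 0xffffffc011614934.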
> > +
> > +static void notrace kvm_nvhe_walk_stackframe(struct task_struct *tsk,
> > +                                   struct stackframe *frame,
> > +                                   bool (*fn)(void *, unsigned long), void *data)
> > +{
> > +       __walk_stackframe(tsk, frame, fn, data, kvm_nvhe_unwind_frame);
> > +}
> > +
> > +void kvm_nvhe_dump_backtrace(unsigned long hyp_offset)
> > +{
> > +       struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
> > +       struct stackframe frame;
> > +
> > +       start_backtrace(&frame, panic_info->fp, panic_info->pc);
> > +       pr_err("nVHE HYP call trace:\n");
> > +       kvm_nvhe_walk_stackframe(NULL, &frame, kvm_nvhe_dump_backtrace_entry,
> > +                                       (void *)hyp_offset);
> > +       pr_err("---- end of nVHE HYP call trace ----\n");
> > +}
> > +#endif /* CONFIG_NVHE_EL2_DEBUG */
> > diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> > index 8a5fbbf084df..75f2c8255ff0 100644
> > --- a/arch/arm64/kvm/Kconfig
> > +++ b/arch/arm64/kvm/Kconfig
> > @@ -51,8 +51,9 @@ config NVHE_EL2_DEBUG
> >         depends on KVM
> >         help
> >           Say Y here to enable the debug mode for the non-VHE KVM EL2 object.
> > -         Failure reports will BUG() in the hypervisor. This is intended for
> > -         local EL2 hypervisor development.
> > +         Failure reports will BUG() in the hypervisor; and panics will print
> > +         the hypervisor call stack. This is intended for local EL2 hypervisor
> > +         development.
>
> Nit: maybe for clarity you could rephrase as "calls to hyp_panic()
> will result in printing the hypervisor call stack".

Ack. I'll update in the next version.

Thanks,
Kalesh
>
> Thanks,
> /fuad
>
>
> >
> >           If unsure, say N.
> >
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index 7a23630c4a7f..66c07c04eb52 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -49,7 +49,7 @@ DEFINE_STATIC_KEY_FALSE(kvm_protected_mode_initialized);
> >
> >  DECLARE_KVM_HYP_PER_CPU(unsigned long, kvm_hyp_vector);
> >
> > -static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
> > +DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
> >  unsigned long kvm_arm_hyp_percpu_base[NR_CPUS];
> >  DECLARE_KVM_NVHE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
> >
> > diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> > index e3140abd2e2e..ff69dff33700 100644
> > --- a/arch/arm64/kvm/handle_exit.c
> > +++ b/arch/arm64/kvm/handle_exit.c
> > @@ -17,6 +17,7 @@
> >  #include <asm/kvm_emulate.h>
> >  #include <asm/kvm_mmu.h>
> >  #include <asm/debug-monitors.h>
> > +#include <asm/stacktrace.h>
> >  #include <asm/traps.h>
> >
> >  #include <kvm/arm_hypercalls.h>
> > @@ -326,6 +327,8 @@ void __noreturn __cold nvhe_hyp_panic_handler(u64 esr, u64 spsr,
> >                 kvm_err("nVHE hyp panic at: %016llx!\n", elr_virt + hyp_offset);
> >         }
> >
> > +       kvm_nvhe_dump_backtrace(hyp_offset);
> > +
> >         /*
> >          * Hyp has panicked and we're going to handle that by panicking the
> >          * kernel. The kernel offset will be revealed in the panic so we're
> > diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
> > index efc20273a352..b8ecffc47424 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/switch.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/switch.c
> > @@ -37,6 +37,22 @@ DEFINE_PER_CPU(unsigned long, kvm_hyp_vector);
> >  #ifdef CONFIG_NVHE_EL2_DEBUG
> >  DEFINE_PER_CPU(unsigned long [PAGE_SIZE/sizeof(long)], hyp_overflow_stack)
> >         __aligned(16);
> > +DEFINE_PER_CPU(struct kvm_nvhe_panic_info, kvm_panic_info);
> > +
> > +static inline void cpu_prepare_nvhe_panic_info(void)
> > +{
> > +       struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr(&kvm_panic_info);
> > +       struct kvm_nvhe_init_params *params = this_cpu_ptr(&kvm_init_params);
> > +
> > +       panic_info->hyp_stack_base = (unsigned long)(params->stack_hyp_va - PAGE_SIZE);
> > +       panic_info->hyp_overflow_stack_base = (unsigned long)this_cpu_ptr(hyp_overflow_stack);
> > +       panic_info->fp = (unsigned long)__builtin_frame_address(0);
> > +       panic_info->pc = _THIS_IP_;
> > +}
> > +#else
> > +static inline void cpu_prepare_nvhe_panic_info(void)
> > +{
> > +}
> >  #endif
> >
> >  static void __activate_traps(struct kvm_vcpu *vcpu)
> > @@ -360,6 +376,8 @@ asmlinkage void __noreturn hyp_panic(void)
> >         struct kvm_cpu_context *host_ctxt;
> >         struct kvm_vcpu *vcpu;
> >
> > +       cpu_prepare_nvhe_panic_info();
> > +
> >         host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
> >         vcpu = host_ctxt->__hyp_running_vcpu;
> >
> > --
> > 2.35.1.473.g83b2b277ed-goog
> >


> > +       hyp_offset = addr - hyp_base;
> > +
> > +       kern_base = (unsigned long)*this_cpu_ptr(&kvm_arm_hyp_stack_page);
> > +
> > +       return kern_base + hyp_offset;
> > +}
> > +
> > +static unsigned long kvm_nvhe_overflow_stack_kern_va(unsigned long addr)
> > +{
> > +       struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
> > +       unsigned long hyp_base, kern_base, hyp_offset;
> > +
> > +       hyp_base = (unsigned long)panic_info->hyp_overflow_stack_base;
> > +       hyp_offset = addr - hyp_base;
> > +
> > +       kern_base = (unsigned long)this_cpu_ptr_nvhe_sym(hyp_overflow_stack);
> > +
> > +       return kern_base + hyp_offset;
> > +}
> > +
> > +/*
> > + * Convert KVM nVHE hypervisor stack VA to a kernel VA.
> > + *
> > + * The nVHE hypervisor stack is mapped in the flexible 'private' VA range, to allow
> > + * for guard pages below the stack. Consequently, the fixed offset address
> > + * translation macros won't work here.
> > + *
> > + * The kernel VA is calculated as an offset from the kernel VA of the hypervisor
> > + * stack base. See: kvm_nvhe_hyp_stack_kern_va(),  kvm_nvhe_overflow_stack_kern_va()
> > + */
> > +static unsigned long kvm_nvhe_stack_kern_va(unsigned long addr,
> > +                                       enum stack_type type)
> > +{
> > +       switch (type) {
> > +       case STACK_TYPE_KVM_NVHE_HYP:
> > +               return kvm_nvhe_hyp_stack_kern_va(addr);
> > +       case STACK_TYPE_KVM_NVHE_OVERFLOW:
> > +               return kvm_nvhe_overflow_stack_kern_va(addr);
> > +       default:
> > +               return 0UL;
> > +       }
> > +}
> > +
> > +static int notrace kvm_nvhe_unwind_frame(struct task_struct *tsk,
> > +                                       struct stackframe *frame)
> > +{
> > +       struct stack_info info;
> > +
> > +       if (!kvm_nvhe_on_accessible_stack(frame->fp, 16, &info))
> > +               return -EINVAL;
> > +
> > +       return __unwind_frame(frame, &info, kvm_nvhe_stack_kern_va);
> > +}
> > +
> > +static bool kvm_nvhe_dump_backtrace_entry(void *arg, unsigned long where)
> > +{
> > +       unsigned long va_mask = GENMASK_ULL(vabits_actual - 1, 0);
> > +       unsigned long hyp_offset = (unsigned long)arg;
> > +
> > +       where &= va_mask;       /* Mask tags */
> > +       where += hyp_offset;    /* Convert to kern addr */
> > +
> > +       kvm_err("[<%016lx>] %pB\n", where, (void *)where);
> > +
> > +       return true;
> > +}
> > +
> > +static void notrace kvm_nvhe_walk_stackframe(struct task_struct *tsk,
> > +                                   struct stackframe *frame,
> > +                                   bool (*fn)(void *, unsigned long), void *data)
> > +{
> > +       __walk_stackframe(tsk, frame, fn, data, kvm_nvhe_unwind_frame);
> > +}
> > +
> > +void kvm_nvhe_dump_backtrace(unsigned long hyp_offset)
> > +{
> > +       struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
> > +       struct stackframe frame;
> > +
> > +       start_backtrace(&frame, panic_info->fp, panic_info->pc);
> > +       pr_err("nVHE HYP call trace:\n");
> > +       kvm_nvhe_walk_stackframe(NULL, &frame, kvm_nvhe_dump_backtrace_entry,
> > +                                       (void *)hyp_offset);
> > +       pr_err("---- end of nVHE HYP call trace ----\n");
> > +}
> > +#endif /* CONFIG_NVHE_EL2_DEBUG */
> > diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> > index 8a5fbbf084df..75f2c8255ff0 100644
> > --- a/arch/arm64/kvm/Kconfig
> > +++ b/arch/arm64/kvm/Kconfig
> > @@ -51,8 +51,9 @@ config NVHE_EL2_DEBUG
> >         depends on KVM
> >         help
> >           Say Y here to enable the debug mode for the non-VHE KVM EL2 object.
> > -         Failure reports will BUG() in the hypervisor. This is intended for
> > -         local EL2 hypervisor development.
> > +         Failure reports will BUG() in the hypervisor; and panics will print
> > +         the hypervisor call stack. This is intended for local EL2 hypervisor
> > +         development.
>
> Nit: maybe for clarity you could rephrase as "calls to hyp_panic()
> will result in printing the hypervisor call stack".

Ack. I'll update in the next version.

Thanks,
Kalesh
>
> Thanks,
> /fuad
>
>
> >
> >           If unsure, say N.
> >
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index 7a23630c4a7f..66c07c04eb52 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -49,7 +49,7 @@ DEFINE_STATIC_KEY_FALSE(kvm_protected_mode_initialized);
> >
> >  DECLARE_KVM_HYP_PER_CPU(unsigned long, kvm_hyp_vector);
> >
> > -static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
> > +DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
> >  unsigned long kvm_arm_hyp_percpu_base[NR_CPUS];
> >  DECLARE_KVM_NVHE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
> >
> > diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> > index e3140abd2e2e..ff69dff33700 100644
> > --- a/arch/arm64/kvm/handle_exit.c
> > +++ b/arch/arm64/kvm/handle_exit.c
> > @@ -17,6 +17,7 @@
> >  #include <asm/kvm_emulate.h>
> >  #include <asm/kvm_mmu.h>
> >  #include <asm/debug-monitors.h>
> > +#include <asm/stacktrace.h>
> >  #include <asm/traps.h>
> >
> >  #include <kvm/arm_hypercalls.h>
> > @@ -326,6 +327,8 @@ void __noreturn __cold nvhe_hyp_panic_handler(u64 esr, u64 spsr,
> >                 kvm_err("nVHE hyp panic at: %016llx!\n", elr_virt + hyp_offset);
> >         }
> >
> > +       kvm_nvhe_dump_backtrace(hyp_offset);
> > +
> >         /*
> >          * Hyp has panicked and we're going to handle that by panicking the
> >          * kernel. The kernel offset will be revealed in the panic so we're
> > diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
> > index efc20273a352..b8ecffc47424 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/switch.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/switch.c
> > @@ -37,6 +37,22 @@ DEFINE_PER_CPU(unsigned long, kvm_hyp_vector);
> >  #ifdef CONFIG_NVHE_EL2_DEBUG
> >  DEFINE_PER_CPU(unsigned long [PAGE_SIZE/sizeof(long)], hyp_overflow_stack)
> >         __aligned(16);
> > +DEFINE_PER_CPU(struct kvm_nvhe_panic_info, kvm_panic_info);
> > +
> > +static inline void cpu_prepare_nvhe_panic_info(void)
> > +{
> > +       struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr(&kvm_panic_info);
> > +       struct kvm_nvhe_init_params *params = this_cpu_ptr(&kvm_init_params);
> > +
> > +       panic_info->hyp_stack_base = (unsigned long)(params->stack_hyp_va - PAGE_SIZE);
> > +       panic_info->hyp_overflow_stack_base = (unsigned long)this_cpu_ptr(hyp_overflow_stack);
> > +       panic_info->fp = (unsigned long)__builtin_frame_address(0);
> > +       panic_info->pc = _THIS_IP_;
> > +}
> > +#else
> > +static inline void cpu_prepare_nvhe_panic_info(void)
> > +{
> > +}
> >  #endif
> >
> >  static void __activate_traps(struct kvm_vcpu *vcpu)
> > @@ -360,6 +376,8 @@ asmlinkage void __noreturn hyp_panic(void)
> >         struct kvm_cpu_context *host_ctxt;
> >         struct kvm_vcpu *vcpu;
> >
> > +       cpu_prepare_nvhe_panic_info();
> > +
> >         host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
> >         vcpu = host_ctxt->__hyp_running_vcpu;
> >
> > --
> > 2.35.1.473.g83b2b277ed-goog
> >
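
For readers following the stack address translation in the quoted patch: the
comment above kvm_nvhe_stack_kern_va() describes computing the kernel VA as an
offset from the kernel VA of the stack base. A minimal user-space sketch of
that arithmetic follows. The struct and function names here are hypothetical
and are not part of the patch; unlike the patch, which relies on the earlier
kvm_nvhe_on_accessible_stack() check, this sketch folds the range check into
the translation itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-CPU data the patch reads. */
struct stack_mapping {
	uint64_t hyp_base;	/* hyp VA of the lowest byte of the stack */
	uint64_t kern_base;	/* kernel VA that aliases the same byte */
	uint64_t size;		/* stack size in bytes */
};

/*
 * Return the kernel VA that aliases hyp_addr, or 0 if hyp_addr does not
 * fall inside the stack described by m.
 */
static uint64_t hyp_to_kern_va(const struct stack_mapping *m, uint64_t hyp_addr)
{
	assert(m && m->size != 0);

	if (hyp_addr < m->hyp_base || hyp_addr - m->hyp_base >= m->size)
		return 0;

	/* Same offset from the base, just in the other address space. */
	return m->kern_base + (hyp_addr - m->hyp_base);
}
```

Returning 0 for an untranslatable address mirrors the default case of
kvm_nvhe_stack_kern_va(), which __unwind_frame() turns into -EINVAL.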


* Re: [PATCH v3 0/8] KVM: arm64: Hypervisor stack enhancements
  2022-02-24  5:13 ` Kalesh Singh
  (?)
@ 2022-02-25  3:59   ` Kalesh Singh
  -1 siblings, 0 replies; 60+ messages in thread
From: Kalesh Singh @ 2022-02-25  3:59 UTC (permalink / raw)
  Cc: Will Deacon, Marc Zyngier, Quentin Perret, Fuad Tabba,
	Suren Baghdasaryan, Cc: Android Kernel, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Catalin Marinas,
	Mark Rutland, Mark Brown, Masami Hiramatsu, Peter Collingbourne,
	Madhavan T. Venkataraman, Andrew Walbran, Andrew Scull,
	Paolo Bonzini, Zenghui Yu, Ard Biesheuvel,
	moderated list:ARM64 PORT (AARCH64 ARCHITECTURE),
	kvmarm, LKML

On Wed, Feb 23, 2022 at 9:15 PM Kalesh Singh <kaleshsingh@google.com> wrote:
>
> Hi all,
>
> This is v3 of the nVHE hypervisor stack enhancements.

Please find the latest version v4, posted at:
https://lore.kernel.org/r/20220225033548.1912117-1-kaleshsingh@google.com/

Thanks,
Kalesh

>
> Previous versions can be found at:
> v2: https://lore.kernel.org/r/20220222165212.2005066-1-kaleshsingh@google.com/
> v1: https://lore.kernel.org/r/20220210224220.4076151-1-kaleshsingh@google.com/
>
> The main update in this version is that the unwinder now uses the core logic
> from the regular kernel stack unwinder to avoid duplicate code, per Mark, along
> with fixes for the other issues identified in v2.
>
> The previous cover letter (with updated call trace) has been copied below.
>
> Thanks,
> Kalesh
>
> -----
>
> This series is based on 5.17-rc5 and adds the following stack features to
> the KVM nVHE hypervisor:
>
> == Hyp Stack Guard Pages ==
>
> Based on the technique used by arm64 VMAP_STACK to detect overflow:
> the stack is aligned to twice its size, which ensures that the
> 'stack shift' bit of any valid SP is 0. The 'stack shift' bit can be
> tested in the exception entry to detect overflow without corrupting GPRs.
>
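
To make the 'stack shift' test above concrete, here is a minimal userspace
sketch in C. It is not the code from this series. It assumes a 4 KiB stack
(a shift of 12) whose base is aligned to twice the stack size, so that an SP
which runs below the base has the bit set. The names struct regs,
hyp_sp_overflowed() and demo_overflow_check() are hypothetical. The add/sub
sequence is modelled on the one the arm64 kernel entry code uses for
VMAP_STACK: it moves SP through x0 and back, so no scratch register is needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption for illustration: a 4 KiB stack, so the 'stack shift' is 12. */
#define STACK_SHIFT 12u
#define STACK_SIZE  (UINT64_C(1) << STACK_SHIFT)

/* Hypothetical model of the two registers live at exception entry. */
struct regs {
	uint64_t sp;
	uint64_t x0;
};

/*
 * Test the 'stack shift' bit of SP using only sp and x0, then restore both.
 * Unsigned arithmetic wraps, as the hardware registers would.
 */
bool hyp_sp_overflowed(struct regs *r)
{
	bool overflow;

	r->sp = r->sp + r->x0;	/* sp'  = sp + x0 */
	r->x0 = r->sp - r->x0;	/* x0'  = sp' - x0 = sp */
	overflow = (r->x0 >> STACK_SHIFT) & 1;
	r->x0 = r->sp - r->x0;	/* x0'' = sp' - x0' = x0 */
	r->sp = r->sp - r->x0;	/* sp'' = sp' - x0'' = sp */

	return overflow;
}

void demo_overflow_check(void)
{
	const uint64_t base = 0x40000;	/* aligned to 2 * STACK_SIZE */
	struct regs ok = { .sp = base + STACK_SIZE - 16, .x0 = 0xdeadbeef };
	struct regs bad = { .sp = base - 16, .x0 = 0xdeadbeef };

	assert(!hyp_sp_overflowed(&ok));	/* SP inside the stack */
	assert(hyp_sp_overflowed(&bad));	/* SP ran below the base */
	assert(ok.x0 == 0xdeadbeef && bad.x0 == 0xdeadbeef);
}
```

In the sketch both registers are restored unconditionally; real entry code
would branch to the overflow handler as soon as the bit is found set.
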
> == Hyp Stack Unwinder ==
>
> Based on the arm64 kernel stack unwinder
> (See: arch/arm64/kernel/stacktrace.c)
>
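
For readers unfamiliar with frame-pointer unwinding, the idea can be sketched
in userspace C as below. This is only an illustration under stated
assumptions, not the unwinder in arch/arm64/kernel/stacktrace.c: it assumes
each frame record is a {previous fp, return address} pair of 64-bit words, as
in the AArch64 procedure call standard, and that the final record has a zero
fp. The names fp_in_bounds(), unwind_sketch() and demo_unwind() are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A frame record is two 64-bit words: [0] = caller's fp, [1] = return pc. */
#define RECORD_SIZE 16u

/* Accept fp only if a whole, 8-byte-aligned record fits in [low, high). */
static bool fp_in_bounds(uint64_t fp, uint64_t low, uint64_t high)
{
	return fp >= low && fp <= high && high - fp >= RECORD_SIZE &&
	       (fp & 0x7) == 0;
}

/* Walk the chain of frame records, storing each pc; returns entries stored. */
size_t unwind_sketch(uint64_t fp, uint64_t pc, uint64_t low, uint64_t high,
		     uint64_t *trace, size_t max_entries)
{
	size_t n = 0;

	while (n < max_entries) {
		trace[n++] = pc;

		if (!fp_in_bounds(fp, low, high))
			break;

		const uint64_t *record = (const uint64_t *)(uintptr_t)fp;
		uint64_t next_fp = record[0];

		pc = record[1];
		/* Records must move toward higher addresses, or we could loop. */
		if (next_fp != 0 && next_fp <= fp)
			break;
		fp = next_fp;
	}

	return n;
}

size_t demo_unwind(uint64_t *trace, size_t max_entries)
{
	/* Fake stack holding three chained records at indices 2, 6 and 10. */
	uint64_t stack[12] = { 0 };
	uint64_t low = (uint64_t)(uintptr_t)&stack[0];
	uint64_t high = (uint64_t)(uintptr_t)&stack[12];

	stack[2] = (uint64_t)(uintptr_t)&stack[6];
	stack[3] = 0x1000;
	stack[6] = (uint64_t)(uintptr_t)&stack[10];
	stack[7] = 0x2000;
	stack[10] = 0;		/* final record: no caller */
	stack[11] = 0x3000;

	return unwind_sketch((uint64_t)(uintptr_t)&stack[2], 0x500,
			     low, high, trace, max_entries);
}
```

The bounds check is what matters when the stack being walked belongs to
another context: every fp is validated before it is dereferenced.
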
> The unwinding and dumping of the hyp stack is not enabled by default and
> depends on CONFIG_NVHE_EL2_DEBUG to avoid potential information leaks.
>
> When CONFIG_NVHE_EL2_DEBUG is enabled, the host stage 2 protection is
> disabled, allowing the host to read the hypervisor stack pages and unwind
> the stack from EL1. This allows us to print the hypervisor stacktrace
> before panicking the host, as shown below.
>
> Example call trace:
>
> [   98.916444][  T426] kvm [426]: nVHE hyp panic at: [<ffffffc0096156fc>] __kvm_nvhe_overflow_stack+0x8/0x34!
> [   98.918360][  T426] nVHE HYP call trace:
> [   98.918692][  T426] kvm [426]: [<ffffffc009615aac>] __kvm_nvhe_cpu_prepare_nvhe_panic_info+0x4c/0x68
> [   98.919545][  T426] kvm [426]: [<ffffffc0096159a4>] __kvm_nvhe_hyp_panic+0x2c/0xe8
> [   98.920107][  T426] kvm [426]: [<ffffffc009615ad8>] __kvm_nvhe_hyp_panic_bad_stack+0x10/0x10
> [   98.920665][  T426] kvm [426]: [<ffffffc009610a4c>] __kvm_nvhe___kvm_hyp_host_vector+0x24c/0x794
> [   98.921292][  T426] kvm [426]: [<ffffffc009615718>] __kvm_nvhe_overflow_stack+0x24/0x34
> . . .
>
> [   98.973382][  T426] kvm [426]: [<ffffffc009615718>] __kvm_nvhe_overflow_stack+0x24/0x34
> [   98.973816][  T426] kvm [426]: [<ffffffc0096152f4>] __kvm_nvhe___kvm_vcpu_run+0x38/0x438
> [   98.974255][  T426] kvm [426]: [<ffffffc009616f80>] __kvm_nvhe_handle___kvm_vcpu_run+0x1c4/0x364
> [   98.974719][  T426] kvm [426]: [<ffffffc009616928>] __kvm_nvhe_handle_trap+0xa8/0x130
> [   98.975152][  T426] kvm [426]: [<ffffffc009610064>] __kvm_nvhe___host_exit+0x64/0x64
> [   98.975588][  T426] ---- end of nVHE HYP call trace ----
>
>
> Kalesh Singh (8):
>   KVM: arm64: Introduce hyp_alloc_private_va_range()
>   KVM: arm64: Introduce pkvm_alloc_private_va_range()
>   KVM: arm64: Add guard pages for KVM nVHE hypervisor stack
>   KVM: arm64: Add guard pages for pKVM (protected nVHE) hypervisor stack
>   KVM: arm64: Detect and handle hypervisor stack overflows
>   KVM: arm64: Add hypervisor overflow stack
>   KVM: arm64: Unwind and dump nVHE HYP stacktrace
>   KVM: arm64: Symbolize the nVHE HYP backtrace
>
>  arch/arm64/include/asm/kvm_asm.h     |  20 +++
>  arch/arm64/include/asm/kvm_mmu.h     |   4 +
>  arch/arm64/include/asm/stacktrace.h  |  12 ++
>  arch/arm64/kernel/stacktrace.c       | 210 ++++++++++++++++++++++++---
>  arch/arm64/kvm/Kconfig               |   5 +-
>  arch/arm64/kvm/arm.c                 |  34 ++++-
>  arch/arm64/kvm/handle_exit.c         |  16 +-
>  arch/arm64/kvm/hyp/include/nvhe/mm.h |   3 +-
>  arch/arm64/kvm/hyp/nvhe/host.S       |  29 ++++
>  arch/arm64/kvm/hyp/nvhe/hyp-main.c   |   5 +-
>  arch/arm64/kvm/hyp/nvhe/mm.c         |  51 ++++---
>  arch/arm64/kvm/hyp/nvhe/setup.c      |  25 +++-
>  arch/arm64/kvm/hyp/nvhe/switch.c     |  30 +++-
>  arch/arm64/kvm/mmu.c                 |  62 +++++---
>  scripts/kallsyms.c                   |   2 +-
>  15 files changed, 422 insertions(+), 86 deletions(-)
>
>
> base-commit: cfb92440ee71adcc2105b0890bb01ac3cddb8507
> --
> 2.35.1.473.g83b2b277ed-goog
>

end of thread

Thread overview: 60+ messages
2022-02-24  5:13 [PATCH v3 0/8] KVM: arm64: Hypervisor stack enhancements Kalesh Singh
2022-02-24  5:13 ` Kalesh Singh
2022-02-24  5:13 ` Kalesh Singh
2022-02-24  5:13 ` [PATCH v3 1/8] KVM: arm64: Introduce hyp_alloc_private_va_range() Kalesh Singh
2022-02-24  5:13   ` Kalesh Singh
2022-02-24  5:13   ` Kalesh Singh
2022-02-24 12:24   ` Fuad Tabba
2022-02-24 12:24     ` Fuad Tabba
2022-02-24 12:24     ` Fuad Tabba
2022-02-24 17:20     ` Kalesh Singh
2022-02-24 17:20       ` Kalesh Singh
2022-02-24 17:20       ` Kalesh Singh
2022-02-24  5:13 ` [PATCH v3 2/8] KVM: arm64: Introduce pkvm_alloc_private_va_range() Kalesh Singh
2022-02-24  5:13   ` Kalesh Singh
2022-02-24  5:13   ` Kalesh Singh
2022-02-24 12:25   ` Fuad Tabba
2022-02-24 12:25     ` Fuad Tabba
2022-02-24 12:25     ` Fuad Tabba
2022-02-24 17:28     ` Kalesh Singh
2022-02-24 17:28       ` Kalesh Singh
2022-02-24 17:28       ` Kalesh Singh
2022-02-24  5:13 ` [PATCH v3 3/8] KVM: arm64: Add guard pages for KVM nVHE hypervisor stack Kalesh Singh
2022-02-24  5:13   ` Kalesh Singh
2022-02-24  5:13   ` Kalesh Singh
2022-02-24 12:26   ` Fuad Tabba
2022-02-24 12:26     ` Fuad Tabba
2022-02-24 12:26     ` Fuad Tabba
2022-02-24 17:54     ` Kalesh Singh
2022-02-24 17:54       ` Kalesh Singh
2022-02-24 17:54       ` Kalesh Singh
2022-02-24  5:13 ` [PATCH v3 4/8] KVM: arm64: Add guard pages for pKVM (protected nVHE) " Kalesh Singh
2022-02-24  5:13   ` Kalesh Singh
2022-02-24  5:13   ` Kalesh Singh
2022-02-24  5:13 ` [PATCH v3 5/8] KVM: arm64: Detect and handle hypervisor stack overflows Kalesh Singh
2022-02-24  5:13   ` Kalesh Singh
2022-02-24  5:13   ` Kalesh Singh
2022-02-24  5:13 ` [PATCH v3 6/8] KVM: arm64: Add hypervisor overflow stack Kalesh Singh
2022-02-24  5:13   ` Kalesh Singh
2022-02-24  5:13   ` Kalesh Singh
2022-02-24 12:26   ` Fuad Tabba
2022-02-24 12:26     ` Fuad Tabba
2022-02-24 12:26     ` Fuad Tabba
2022-02-24 17:56     ` Kalesh Singh
2022-02-24 17:56       ` Kalesh Singh
2022-02-24 17:56       ` Kalesh Singh
2022-02-24  5:13 ` [PATCH v3 7/8] KVM: arm64: Unwind and dump nVHE HYP stacktrace Kalesh Singh
2022-02-24  5:13   ` Kalesh Singh
2022-02-24  5:13   ` Kalesh Singh
2022-02-24 12:28   ` Fuad Tabba
2022-02-24 12:28     ` Fuad Tabba
2022-02-24 12:28     ` Fuad Tabba
2022-02-24 18:08     ` Kalesh Singh
2022-02-24 18:08       ` Kalesh Singh
2022-02-24 18:08       ` Kalesh Singh
2022-02-24  5:13 ` [PATCH v3 8/8] KVM: arm64: Symbolize the nVHE HYP backtrace Kalesh Singh
2022-02-24  5:13   ` Kalesh Singh
2022-02-24  5:13   ` Kalesh Singh
2022-02-25  3:59 ` [PATCH v3 0/8] KVM: arm64: Hypervisor stack enhancements Kalesh Singh
2022-02-25  3:59   ` Kalesh Singh
2022-02-25  3:59   ` Kalesh Singh