* [PATCH 0/7] KVM: arm64: Hypervisor stack enhancements
@ 2022-02-10 22:41 ` Kalesh Singh
  0 siblings, 0 replies; 36+ messages in thread
From: Kalesh Singh @ 2022-02-10 22:41 UTC (permalink / raw)
  Cc: will, maz, qperret, tabba, surenb, kernel-team, Kalesh Singh,
	Catalin Marinas, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Ard Biesheuvel, Mark Rutland, Pasha Tatashin, Joey Gouly,
	Peter Collingbourne, Andrew Walbran, Andrew Scull, Paolo Bonzini,
	linux-arm-kernel, linux-kernel, kvmarm

This series is based on v5.17-rc3 and adds the following stack features to
the KVM nVHE hypervisor:

== Hyp Stack Guard Pages ==

Based on the technique used by arm64 VMAP_STACK to detect overflow:
the stack is aligned to twice its size, which ensures that the
'stack shift' bit of any valid SP is 0. This bit can be tested on
exception entry to detect overflow without corrupting GPRs.
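
For illustration only (not part of the series), the same check expressed
in C, assuming 4K pages and a single-page hyp stack aligned to
2 * PAGE_SIZE (the HYP_PAGE_* names below are stand-ins for the real
macros):

#define HYP_PAGE_SHIFT	12UL
#define HYP_PAGE_SIZE	(1UL << HYP_PAGE_SHIFT)

/* Returns non-zero if SP has slipped below the stack into the guard page */
static inline unsigned long hyp_sp_overflowed(unsigned long sp)
{
	/*
	 * The stack page starts on a 2 * PAGE_SIZE boundary, so every
	 * address inside it has bit PAGE_SHIFT clear; an SP that has
	 * dropped into the guard page below has that bit set.
	 */
	return sp & HYP_PAGE_SIZE;
}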

== Hyp Stack Unwinder ==

Based on the arm64 kernel stack unwinder
(See: arch/arm64/kernel/stacktrace.c)
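
As in the kernel unwinder, the walk follows AAPCS64 frame records: each
frame pointer points at a saved {previous FP, LR} pair. A minimal sketch
of that walk (illustrative only; on_hyp_stack() and consume_entry() are
hypothetical stand-ins for the real range checks and printing):

struct frame_record {
	unsigned long fp;	/* caller's frame record */
	unsigned long lr;	/* return address for this frame */
};

static void walk_hyp_frames(unsigned long fp,
			    int (*on_hyp_stack)(unsigned long fp),
			    void (*consume_entry)(unsigned long pc))
{
	while (fp && on_hyp_stack(fp)) {
		const struct frame_record *frame = (const void *)fp;

		consume_entry(frame->lr);	/* record the return address */
		fp = frame->fp;			/* step to the caller's frame */
	}
}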

The unwinding and dumping of the hyp stack is not enabled by default and
depends on CONFIG_NVHE_EL2_DEBUG to avoid potential information leaks.

When CONFIG_NVHE_EL2_DEBUG is enabled, the host stage 2 protection is
disabled, allowing the host to read the hypervisor stack pages and unwind
the stack from EL1. This allows us to print the hypervisor stacktrace
before panicking the host, as shown below:

kvm [408]: nVHE hyp panic at: \
           [<ffffffc01161460c>] __kvm_nvhe_overflow_stack+0x10/0x34!
kvm [408]: nVHE HYP call trace:
kvm [408]: [<ffffffc011614974>] __kvm_nvhe_hyp_panic_bad_stack+0xc/0x10
kvm [408]: [<ffffffc01160fa48>] __kvm_nvhe___kvm_hyp_host_vector+0x248/0x794
kvm [408]: [<ffffffc01161461c>] __kvm_nvhe_overflow_stack+0x20/0x34
. . .
kvm [408]: [<ffffffc01161461c>] __kvm_nvhe_overflow_stack+0x20/0x34
kvm [408]: [<ffffffc01161421c>] __kvm_nvhe___kvm_vcpu_run+0x2c/0x40c
kvm [408]: [<ffffffc011615e14>] __kvm_nvhe_handle___kvm_vcpu_run+0x1c8/0x36c
kvm [408]: [<ffffffc0116157c4>] __kvm_nvhe_handle_trap+0xa4/0x124
kvm [408]: [<ffffffc01160f060>] __kvm_nvhe___host_exit+0x60/0x64
kvm [408]: ---- end of nVHE HYP call trace ----


Kalesh Singh (3):
  KVM: arm64: Add Hyp overflow stack
  KVM: arm64: Unwind and dump nVHE HYP stacktrace
  KVM: arm64: Symbolize the nVHE HYP backtrace

Quentin Perret (4):
  KVM: arm64: Map the stack pages in the 'private' range
  KVM: arm64: Factor out private range VA allocation
  arm64: asm: Introduce test_sp_overflow macro
  KVM: arm64: Allocate guard pages near hyp stacks

 arch/arm64/include/asm/assembler.h   |  11 +
 arch/arm64/include/asm/kvm_asm.h     |  17 ++
 arch/arm64/kernel/entry.S            |   9 +-
 arch/arm64/kvm/Makefile              |   1 +
 arch/arm64/kvm/arm.c                 |   2 +-
 arch/arm64/kvm/handle_exit.c         |  14 +-
 arch/arm64/kvm/hyp/include/nvhe/mm.h |   1 +
 arch/arm64/kvm/hyp/nvhe/host.S       |  21 ++
 arch/arm64/kvm/hyp/nvhe/mm.c         |  28 ++-
 arch/arm64/kvm/hyp/nvhe/setup.c      |  63 +++++-
 arch/arm64/kvm/hyp/nvhe/switch.c     |  22 ++
 arch/arm64/kvm/stacktrace.c          | 290 +++++++++++++++++++++++++++
 arch/arm64/kvm/stacktrace.h          |  17 ++
 scripts/kallsyms.c                   |   2 +-
 14 files changed, 468 insertions(+), 30 deletions(-)
 create mode 100644 arch/arm64/kvm/stacktrace.c
 create mode 100644 arch/arm64/kvm/stacktrace.h


base-commit: dfd42facf1e4ada021b939b4e19c935dcdd55566
-- 
2.35.1.265.g69c8d7142f-goog


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 1/7] KVM: arm64: Map the stack pages in the 'private' range
  2022-02-10 22:41 ` Kalesh Singh
  (?)
@ 2022-02-10 22:41   ` Kalesh Singh
  -1 siblings, 0 replies; 36+ messages in thread
From: Kalesh Singh @ 2022-02-10 22:41 UTC (permalink / raw)
  Cc: will, maz, qperret, tabba, surenb, kernel-team, Kalesh Singh,
	Catalin Marinas, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Ard Biesheuvel, Mark Rutland, Pasha Tatashin, Joey Gouly,
	Peter Collingbourne, Andrew Walbran, Andrew Scull, Paolo Bonzini,
	linux-arm-kernel, linux-kernel, kvmarm

From: Quentin Perret <qperret@google.com>

In preparation for introducing guard pages for the stacks, map them
in the 'private' range of the EL2 VA space in which the VA to PA
relation is flexible when running in protected mode.

Signed-off-by: Quentin Perret <qperret@google.com>
[Kalesh - Refactor, add comments, resolve conflicts,
          use  __pkvm_create_private_mapping()]
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
---
 arch/arm64/kvm/hyp/nvhe/setup.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 27af337f9fea..99e178cf4249 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -105,11 +105,19 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
 		if (ret)
 			return ret;
 
-		end = (void *)per_cpu_ptr(&kvm_init_params, i)->stack_hyp_va;
+		/* Map stack pages in the 'private' VA range */
+		end = (void *)__hyp_pa(per_cpu_ptr(&kvm_init_params, i)->stack_hyp_va);
 		start = end - PAGE_SIZE;
-		ret = pkvm_create_mappings(start, end, PAGE_HYP);
-		if (ret)
-			return ret;
+		start = (void *)__pkvm_create_private_mapping((phys_addr_t)start,
+					PAGE_SIZE, PAGE_HYP);
+		if (IS_ERR_OR_NULL(start))
+			return PTR_ERR(start);
+		end = start + PAGE_SIZE;
+		/*
+		 * Update stack_hyp_va to the end of the stack page's
+		 * allocated 'private' VA range.
+		 */
+		per_cpu_ptr(&kvm_init_params, i)->stack_hyp_va = (unsigned long) end;
 	}
 
 	/*
-- 
2.35.1.265.g69c8d7142f-goog


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 2/7] KVM: arm64: Factor out private range VA allocation
  2022-02-10 22:41 ` Kalesh Singh
  (?)
@ 2022-02-10 22:41   ` Kalesh Singh
  -1 siblings, 0 replies; 36+ messages in thread
From: Kalesh Singh @ 2022-02-10 22:41 UTC (permalink / raw)
  Cc: will, maz, qperret, tabba, surenb, kernel-team, Kalesh Singh,
	Catalin Marinas, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Ard Biesheuvel, Mark Rutland, Pasha Tatashin, Joey Gouly,
	Peter Collingbourne, Andrew Walbran, Andrew Scull, Paolo Bonzini,
	linux-arm-kernel, linux-kernel, kvmarm

From: Quentin Perret <qperret@google.com>

__pkvm_create_private_mapping() is currently responsible for allocating
VA space in the hypervisor's "private" range and creating stage-1
mappings. In order to allow reusing the VA space allocation logic from
other places, let's factor it out into a standalone function. This will
be used to allocate private VA ranges for hypervisor stack guard pages
in a subsequent patch in this series.

Signed-off-by: Quentin Perret <qperret@google.com>
[Kalesh - Resolve conflicts and make hyp_alloc_private_va_range
          available outside nvhe/mm.c, update commit message]
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
---
 arch/arm64/kvm/hyp/include/nvhe/mm.h |  1 +
 arch/arm64/kvm/hyp/nvhe/mm.c         | 28 +++++++++++++++++++---------
 2 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mm.h b/arch/arm64/kvm/hyp/include/nvhe/mm.h
index 2d08510c6cc1..f53fb0e406db 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mm.h
@@ -21,6 +21,7 @@ int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot);
 int pkvm_create_mappings_locked(void *from, void *to, enum kvm_pgtable_prot prot);
 unsigned long __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
 					    enum kvm_pgtable_prot prot);
+unsigned long hyp_alloc_private_va_range(size_t size);
 
 static inline void hyp_vmemmap_range(phys_addr_t phys, unsigned long size,
 				     unsigned long *start, unsigned long *end)
diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
index 526a7d6fa86f..e196441e072f 100644
--- a/arch/arm64/kvm/hyp/nvhe/mm.c
+++ b/arch/arm64/kvm/hyp/nvhe/mm.c
@@ -37,6 +37,22 @@ static int __pkvm_create_mappings(unsigned long start, unsigned long size,
 	return err;
 }
 
+unsigned long hyp_alloc_private_va_range(size_t size)
+{
+	unsigned long addr = __io_map_base;
+
+	hyp_assert_lock_held(&pkvm_pgd_lock);
+	__io_map_base += PAGE_ALIGN(size);
+
+	/* Are we overflowing on the vmemmap ? */
+	if (__io_map_base > __hyp_vmemmap) {
+		__io_map_base = addr;
+		addr = (unsigned long)ERR_PTR(-ENOMEM);
+	}
+
+	return addr;
+}
+
 unsigned long __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
 					    enum kvm_pgtable_prot prot)
 {
@@ -45,16 +61,10 @@ unsigned long __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
 
 	hyp_spin_lock(&pkvm_pgd_lock);
 
-	size = PAGE_ALIGN(size + offset_in_page(phys));
-	addr = __io_map_base;
-	__io_map_base += size;
-
-	/* Are we overflowing on the vmemmap ? */
-	if (__io_map_base > __hyp_vmemmap) {
-		__io_map_base -= size;
-		addr = (unsigned long)ERR_PTR(-ENOMEM);
+	size = size + offset_in_page(phys);
+	addr = hyp_alloc_private_va_range(size);
+	if (IS_ERR((void *)addr))
 		goto out;
-	}
 
 	err = kvm_pgtable_hyp_map(&pkvm_pgtable, addr, size, phys, prot);
 	if (err) {
-- 
2.35.1.265.g69c8d7142f-goog


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 3/7] arm64: asm: Introduce test_sp_overflow macro
  2022-02-10 22:41 ` Kalesh Singh
  (?)
@ 2022-02-10 22:41   ` Kalesh Singh
  -1 siblings, 0 replies; 36+ messages in thread
From: Kalesh Singh @ 2022-02-10 22:41 UTC (permalink / raw)
  Cc: will, maz, qperret, tabba, surenb, kernel-team, Kalesh Singh,
	Catalin Marinas, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Ard Biesheuvel, Mark Rutland, Pasha Tatashin, Joey Gouly,
	Peter Collingbourne, Andrew Walbran, Andrew Scull,
	linux-arm-kernel, linux-kernel, kvmarm

From: Quentin Perret <qperret@google.com>

The asm entry code in the kernel uses a trick to check if VMAP'd stacks
have overflowed by aligning them at THREAD_SIZE * 2 granularity and
checking the SP's THREAD_SHIFT bit.

Protected KVM will soon make use of a similar trick to detect stack
overflows, so factor out the asm code into a reusable macro.

Signed-off-by: Quentin Perret <qperret@google.com>
[Kalesh - Resolve minor conflicts]
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
---
 arch/arm64/include/asm/assembler.h | 11 +++++++++++
 arch/arm64/kernel/entry.S          |  9 ++-------
 2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index e8bd0af0141c..ad40eb0eee83 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -850,4 +850,15 @@ alternative_endif
 
 #endif /* GNU_PROPERTY_AARCH64_FEATURE_1_DEFAULT */
 
+/*
+ * Test whether the SP has overflowed, without corrupting a GPR.
+ */
+.macro test_sp_overflow shift, label
+	add	sp, sp, x0			// sp' = sp + x0
+	sub	x0, sp, x0			// x0' = sp' - x0 = (sp + x0) - x0 = sp
+	tbnz	x0, #\shift, \label
+	sub	x0, sp, x0			// x0'' = sp' - x0' = (sp + x0) - sp = x0
+	sub	sp, sp, x0			// sp'' = sp' - x0 = (sp + x0) - x0 = sp
+.endm
+
 #endif	/* __ASM_ASSEMBLER_H */
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 772ec2ecf488..2632bc47b348 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -53,16 +53,11 @@ alternative_else_nop_endif
 	sub	sp, sp, #PT_REGS_SIZE
 #ifdef CONFIG_VMAP_STACK
 	/*
-	 * Test whether the SP has overflowed, without corrupting a GPR.
 	 * Task and IRQ stacks are aligned so that SP & (1 << THREAD_SHIFT)
 	 * should always be zero.
 	 */
-	add	sp, sp, x0			// sp' = sp + x0
-	sub	x0, sp, x0			// x0' = sp' - x0 = (sp + x0) - x0 = sp
-	tbnz	x0, #THREAD_SHIFT, 0f
-	sub	x0, sp, x0			// x0'' = sp' - x0' = (sp + x0) - sp = x0
-	sub	sp, sp, x0			// sp'' = sp' - x0 = (sp + x0) - x0 = sp
-	b	el\el\ht\()_\regsize\()_\label
+	test_sp_overflow THREAD_SHIFT, 0f
+	b       el\el\ht\()_\regsize\()_\label
 
 0:
 	/*
-- 
2.35.1.265.g69c8d7142f-goog


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 4/7] KVM: arm64: Allocate guard pages near hyp stacks
  2022-02-10 22:41 ` Kalesh Singh
  (?)
@ 2022-02-10 22:41   ` Kalesh Singh
  -1 siblings, 0 replies; 36+ messages in thread
From: Kalesh Singh @ 2022-02-10 22:41 UTC (permalink / raw)
  Cc: will, maz, qperret, tabba, surenb, kernel-team, Kalesh Singh,
	Catalin Marinas, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Ard Biesheuvel, Mark Rutland, Pasha Tatashin, Joey Gouly,
	Peter Collingbourne, Andrew Scull, linux-arm-kernel,
	linux-kernel, kvmarm

From: Quentin Perret <qperret@google.com>

Allocate unbacked VA space underneath each stack page to ensure stack
overflows get trapped and don't corrupt memory silently.

The stack is PAGE_SIZE in size and aligned to twice its size, meaning that
any valid stack address has the PAGE_SHIFT bit as 0. This allows us to
easily check for overflow on exception entry without corrupting any GPRs.

Signed-off-by: Quentin Perret <qperret@google.com>
[ Kalesh - Update commit text and comments,
           refactor, add overflow handling ]
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
---
 arch/arm64/kvm/hyp/nvhe/host.S   | 16 ++++++++++++++++
 arch/arm64/kvm/hyp/nvhe/setup.c  | 19 ++++++++++++++++++-
 arch/arm64/kvm/hyp/nvhe/switch.c |  5 +++++
 3 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/host.S b/arch/arm64/kvm/hyp/nvhe/host.S
index 3d613e721a75..78e4b612ac06 100644
--- a/arch/arm64/kvm/hyp/nvhe/host.S
+++ b/arch/arm64/kvm/hyp/nvhe/host.S
@@ -153,6 +153,10 @@ SYM_FUNC_END(__host_hvc)
 
 .macro invalid_host_el2_vect
 	.align 7
+
+	/* Test stack overflow without corrupting GPRs */
+	test_sp_overflow PAGE_SHIFT, .L__hyp_sp_overflow\@
+
 	/* If a guest is loaded, panic out of it. */
 	stp	x0, x1, [sp, #-16]!
 	get_loaded_vcpu x0, x1
@@ -165,6 +169,18 @@ SYM_FUNC_END(__host_hvc)
 	 * been partially clobbered by __host_enter.
 	 */
 	b	hyp_panic
+
+.L__hyp_sp_overflow\@:
+	/*
+	 * Reset SP to the top of the stack, to allow handling the hyp_panic.
+	 * This corrupts the stack but is ok, since we won't be attempting
+	 * any unwinding here.
+	 */
+	ldr_this_cpu	x0, kvm_init_params + NVHE_INIT_STACK_HYP_VA, x1
+	mov	sp, x0
+
+	bl	hyp_panic_bad_stack
+	ASM_BUG()
 .endm
 
 .macro invalid_host_el1_vect
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 99e178cf4249..114053dff228 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -105,7 +105,24 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
 		if (ret)
 			return ret;
 
-		/* Map stack pages in the 'private' VA range */
+		/*
+		 * Allocate 'private' VA range for stack guard pages.
+		 *
+		 * The 'private' VA range grows upward and stacks downwards, so
+		 * allocate the guard page first. But make sure to align the
+		 * stack itself with PAGE_SIZE * 2 granularity to ease overflow
+		 * detection in the entry assembly code.
+		 */
+		do {
+			start = (void *)hyp_alloc_private_va_range(PAGE_SIZE);
+			if (IS_ERR(start))
+				return PTR_ERR(start);
+		} while (IS_ALIGNED((u64) start, PAGE_SIZE * 2));
+
+		/*
+		 * Map stack pages in the 'private' VA range above the allocated
+		 * guard pages.
+		 */
 		end = (void *)__hyp_pa(per_cpu_ptr(&kvm_init_params, i)->stack_hyp_va);
 		start = end - PAGE_SIZE;
 		start = (void *)__pkvm_create_private_mapping((phys_addr_t)start,
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index 6410d21d8695..5a2e1ab79913 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -369,6 +369,11 @@ void __noreturn hyp_panic(void)
 	unreachable();
 }
 
+void __noreturn hyp_panic_bad_stack(void)
+{
+	hyp_panic();
+}
+
 asmlinkage void kvm_unexpected_el2_exception(void)
 {
 	return __kvm_unexpected_el2_exception();
-- 
2.35.1.265.g69c8d7142f-goog


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 5/7] KVM: arm64: Add Hyp overflow stack
  2022-02-10 22:41 ` Kalesh Singh
  (?)
@ 2022-02-10 22:41   ` Kalesh Singh
  -1 siblings, 0 replies; 36+ messages in thread
From: Kalesh Singh @ 2022-02-10 22:41 UTC (permalink / raw)
  Cc: will, maz, qperret, tabba, surenb, kernel-team, Kalesh Singh,
	Catalin Marinas, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Ard Biesheuvel, Mark Rutland, Pasha Tatashin, Joey Gouly,
	Peter Collingbourne, Andrew Scull, linux-arm-kernel,
	linux-kernel, kvmarm

Allocate and switch to a 16-byte aligned secondary stack on overflow. This
provides us stack space to better handle overflows, and is used in
a subsequent patch to dump the hypervisor stacktrace. The overflow stack
is only allocated if CONFIG_NVHE_EL2_DEBUG is enabled, as hypervisor
stacktracing is a debug feature dependent on CONFIG_NVHE_EL2_DEBUG.

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
---
 arch/arm64/kvm/hyp/nvhe/host.S  | 5 +++++
 arch/arm64/kvm/hyp/nvhe/setup.c | 5 +++++
 2 files changed, 10 insertions(+)

diff --git a/arch/arm64/kvm/hyp/nvhe/host.S b/arch/arm64/kvm/hyp/nvhe/host.S
index 78e4b612ac06..751a4b9e429f 100644
--- a/arch/arm64/kvm/hyp/nvhe/host.S
+++ b/arch/arm64/kvm/hyp/nvhe/host.S
@@ -171,6 +171,10 @@ SYM_FUNC_END(__host_hvc)
 	b	hyp_panic
 
 .L__hyp_sp_overflow\@:
+#ifdef CONFIG_NVHE_EL2_DEBUG
+	/* Switch to the overflow stack */
+	adr_this_cpu sp, hyp_overflow_stack + PAGE_SIZE, x0
+#else
 	/*
 	 * Reset SP to the top of the stack, to allow handling the hyp_panic.
 	 * This corrupts the stack but is ok, since we won't be attempting
@@ -178,6 +182,7 @@ SYM_FUNC_END(__host_hvc)
 	 */
 	ldr_this_cpu	x0, kvm_init_params + NVHE_INIT_STACK_HYP_VA, x1
 	mov	sp, x0
+#endif
 
 	bl	hyp_panic_bad_stack
 	ASM_BUG()
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 114053dff228..39937fa6a1b2 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -20,6 +20,11 @@
 
 unsigned long hyp_nr_cpus;
 
+#ifdef CONFIG_NVHE_EL2_DEBUG
+DEFINE_PER_CPU(unsigned long [PAGE_SIZE/sizeof(long)], hyp_overflow_stack)
+	__aligned(16);
+#endif
+
 #define hyp_percpu_size ((unsigned long)__per_cpu_end - \
 			 (unsigned long)__per_cpu_start)
 
-- 
2.35.1.265.g69c8d7142f-goog


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 6/7] KVM: arm64: Unwind and dump nVHE HYP stacktrace
  2022-02-10 22:41 ` Kalesh Singh
  (?)
@ 2022-02-10 22:41   ` Kalesh Singh
  -1 siblings, 0 replies; 36+ messages in thread
From: Kalesh Singh @ 2022-02-10 22:41 UTC (permalink / raw)
  Cc: will, maz, qperret, tabba, surenb, kernel-team, Kalesh Singh,
	Catalin Marinas, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Ard Biesheuvel, Mark Rutland, Pasha Tatashin, Joey Gouly,
	Peter Collingbourne, Andrew Walbran, Andrew Scull, Paolo Bonzini,
	linux-arm-kernel, linux-kernel, kvmarm

Unwind the stack in EL1 when CONFIG_NVHE_EL2_DEBUG is enabled. This is
possible because CONFIG_NVHE_EL2_DEBUG disables the host stage 2 protection,
which allows the host to access the hypervisor stack pages in EL1.

Unwinding and dumping hyp call traces is gated on CONFIG_NVHE_EL2_DEBUG
to avoid potentially leaking information to the host.

A simple stack overflow test produces the following output:

[  580.376051][  T412] kvm: nVHE hyp panic at: ffffffc0116145c4!
[  580.378034][  T412] kvm [412]: nVHE HYP call trace (vmlinux addresses):
[  580.378591][  T412] kvm [412]:  [<ffffffc011614934>]
[  580.378993][  T412] kvm [412]:  [<ffffffc01160fa48>]
[  580.379386][  T412] kvm [412]:  [<ffffffc0116145dc>]  // Non-terminating recursive call
[  580.379772][  T412] kvm [412]:  [<ffffffc0116145dc>]
[  580.380158][  T412] kvm [412]:  [<ffffffc0116145dc>]
[  580.380544][  T412] kvm [412]:  [<ffffffc0116145dc>]
[  580.380928][  T412] kvm [412]:  [<ffffffc0116145dc>]
. . .

Since the nVHE hyp symbols are not included in kallsyms (to avoid issues
with aliasing), we fall back to the vmlinux addresses. Symbolizing the
addresses is handled in the next patch in this series.
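
As an aside (editor's illustration, not part of the patch): the unwinder
added below walks standard AArch64 frame records, reading the saved fp at
offset 0 and the saved lr at offset 8 of each record. A minimal C sketch of
the layout hyp_unwind_frame() assumes (the struct name is hypothetical):

	/*
	 * Illustrative only: one AArch64 frame record, as consumed by
	 * hyp_unwind_frame(). The record lives at the address held in x29.
	 */
	struct frame_record {
		unsigned long fp;	/* [x29]:     caller's frame record */
		unsigned long lr;	/* [x29 + 8]: return address; pc = lr - 4 */
	};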

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
---
 arch/arm64/include/asm/kvm_asm.h |  17 ++
 arch/arm64/kvm/Makefile          |   1 +
 arch/arm64/kvm/arm.c             |   2 +-
 arch/arm64/kvm/handle_exit.c     |   3 +
 arch/arm64/kvm/hyp/nvhe/setup.c  |  25 +++
 arch/arm64/kvm/hyp/nvhe/switch.c |  17 ++
 arch/arm64/kvm/stacktrace.c      | 290 +++++++++++++++++++++++++++++++
 arch/arm64/kvm/stacktrace.h      |  17 ++
 8 files changed, 371 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/kvm/stacktrace.c
 create mode 100644 arch/arm64/kvm/stacktrace.h

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index d5b0386ef765..f2b4c2ae5905 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -175,6 +175,23 @@ struct kvm_nvhe_init_params {
 	unsigned long vtcr;
 };
 
+#ifdef CONFIG_NVHE_EL2_DEBUG
+/*
+ * Used by the host in EL1 to dump the nVHE hypervisor backtrace on
+ * hyp_panic. This is possible because CONFIG_NVHE_EL2_DEBUG disables
+ * the host stage 2 protection. See: __hyp_do_panic()
+ *
+ * @hyp_stack_base:		hyp VA of the hyp_stack base.
+ * @hyp_overflow_stack_base:	hyp VA of the hyp_overflow_stack base.
+ * @start_fp:			hyp FP where the hyp backtrace should begin.
+ */
+struct kvm_nvhe_panic_info {
+	unsigned long hyp_stack_base;
+	unsigned long hyp_overflow_stack_base;
+	unsigned long start_fp;
+};
+#endif
+
 /* Translate a kernel address @ptr into its equivalent linear mapping */
 #define kvm_ksym_ref(ptr)						\
 	({								\
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 91861fd8b897..262b5c58cc62 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -23,6 +23,7 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
 	 vgic/vgic-its.o vgic/vgic-debug.o
 
 kvm-$(CONFIG_HW_PERF_EVENTS)  += pmu-emul.o
+kvm-$(CONFIG_NVHE_EL2_DEBUG)  += stacktrace.o
 
 always-y := hyp_constants.h hyp-constants.s
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index ecc5958e27fe..f779436919ad 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -49,7 +49,7 @@ DEFINE_STATIC_KEY_FALSE(kvm_protected_mode_initialized);
 
 DECLARE_KVM_HYP_PER_CPU(unsigned long, kvm_hyp_vector);
 
-static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
+DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
 unsigned long kvm_arm_hyp_percpu_base[NR_CPUS];
 DECLARE_KVM_NVHE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
 
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index e3140abd2e2e..b038c32a3236 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -23,6 +23,7 @@
 
 #define CREATE_TRACE_POINTS
 #include "trace_handle_exit.h"
+#include "stacktrace.h"
 
 typedef int (*exit_handle_fn)(struct kvm_vcpu *);
 
@@ -326,6 +327,8 @@ void __noreturn __cold nvhe_hyp_panic_handler(u64 esr, u64 spsr,
 		kvm_err("nVHE hyp panic at: %016llx!\n", elr_virt + hyp_offset);
 	}
 
+	hyp_dump_backtrace(hyp_offset);
+
 	/*
 	 * Hyp has panicked and we're going to handle that by panicking the
 	 * kernel. The kernel offset will be revealed in the panic so we're
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 39937fa6a1b2..3d7720d25acb 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -23,6 +23,29 @@ unsigned long hyp_nr_cpus;
 #ifdef CONFIG_NVHE_EL2_DEBUG
 DEFINE_PER_CPU(unsigned long [PAGE_SIZE/sizeof(long)], hyp_overflow_stack)
 	__aligned(16);
+
+DEFINE_PER_CPU(struct kvm_nvhe_panic_info, kvm_panic_info);
+
+static void init_nvhe_panic_info(void)
+{
+	struct kvm_nvhe_panic_info *panic_info;
+	struct kvm_nvhe_init_params *params;
+	int cpu;
+
+	for (cpu = 0; cpu < hyp_nr_cpus; cpu++) {
+		panic_info = per_cpu_ptr(&kvm_panic_info, cpu);
+		params = per_cpu_ptr(&kvm_init_params, cpu);
+
+		panic_info->hyp_stack_base = (unsigned long)(params->stack_hyp_va - PAGE_SIZE);
+		panic_info->hyp_overflow_stack_base
+			= (unsigned long)per_cpu_ptr(hyp_overflow_stack, cpu);
+		panic_info->start_fp = 0;
+	}
+}
+#else
+static inline void init_nvhe_panic_info(void)
+{
+}
 #endif
 
 #define hyp_percpu_size ((unsigned long)__per_cpu_end - \
@@ -140,6 +163,8 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
 		 * allocated 'private' VA range.
 		 */
 		per_cpu_ptr(&kvm_init_params, i)->stack_hyp_va = (unsigned long) end;
+
+		init_nvhe_panic_info();
 	}
 
 	/*
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index 5a2e1ab79913..8f8cd0c02e1a 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -34,6 +34,21 @@ DEFINE_PER_CPU(struct kvm_host_data, kvm_host_data);
 DEFINE_PER_CPU(struct kvm_cpu_context, kvm_hyp_ctxt);
 DEFINE_PER_CPU(unsigned long, kvm_hyp_vector);
 
+#ifdef CONFIG_NVHE_EL2_DEBUG
+DECLARE_PER_CPU(struct kvm_nvhe_panic_info, kvm_panic_info);
+
+static void update_nvhe_panic_info_fp(void)
+{
+	struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr(&kvm_panic_info);
+
+	panic_info->start_fp = (unsigned long)__builtin_frame_address(0);
+}
+#else
+static inline void update_nvhe_panic_info_fp(void)
+{
+}
+#endif
+
 static void __activate_traps(struct kvm_vcpu *vcpu)
 {
 	u64 val;
@@ -355,6 +370,8 @@ void __noreturn hyp_panic(void)
 	struct kvm_cpu_context *host_ctxt;
 	struct kvm_vcpu *vcpu;
 
+	update_nvhe_panic_info_fp();
+
 	host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
 	vcpu = host_ctxt->__hyp_running_vcpu;
 
diff --git a/arch/arm64/kvm/stacktrace.c b/arch/arm64/kvm/stacktrace.c
new file mode 100644
index 000000000000..3990a616ab66
--- /dev/null
+++ b/arch/arm64/kvm/stacktrace.c
@@ -0,0 +1,290 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Stack unwinder for EL2 nVHE hypervisor.
+ *
+ * Code mostly copied from the arm64 kernel stack unwinder
+ * and adapted to the nVHE hypervisor.
+ *
+ * See: arch/arm64/kernel/stacktrace.c
+ *
+ * CONFIG_NVHE_EL2_DEBUG disables the host stage-2 protection
+ * allowing us to access the hypervisor stack pages and
+ * consequently unwind its stack from the host in EL1.
+ *
+ * See: __hyp_do_panic()
+ */
+
+#include <asm/kvm_asm.h>
+#include <asm/kvm_hyp.h>
+#include <linux/kvm_host.h>
+#include "stacktrace.h"
+
+DECLARE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
+DECLARE_KVM_NVHE_PER_CPU(unsigned long [PAGE_SIZE/sizeof(long)], hyp_overflow_stack);
+DECLARE_KVM_NVHE_PER_CPU(struct kvm_nvhe_panic_info, kvm_panic_info);
+
+enum hyp_stack_type {
+	HYP_STACK_TYPE_UNKNOWN,
+	HYP_STACK_TYPE_HYP,
+	HYP_STACK_TYPE_OVERFLOW,
+	__NR_HYP_STACK_TYPES
+};
+
+struct hyp_stack_info {
+	unsigned long low;
+	unsigned long high;
+	enum hyp_stack_type type;
+};
+
+/*
+ * A snapshot of a frame record or fp/lr register values, along with some
+ * accounting information necessary for robust unwinding.
+ *
+ * @fp:          The fp value in the frame record (or the real fp)
+ * @pc:          The pc value calculated from lr in the frame record.
+ *
+ * @stacks_done: Stacks which have been entirely unwound, for which it is no
+ *               longer valid to unwind to.
+ *
+ * @prev_fp:     The fp that pointed to this frame record, or a synthetic value
+ *               of 0. This is used to ensure that within a stack, each
+ *               subsequent frame record is at an increasing address.
+ * @prev_type:   The type of stack this frame record was on, or a synthetic
+ *               value of HYP_STACK_TYPE_UNKNOWN. This is used to detect a
+ *               transition from one stack to another.
+ */
+struct hyp_stackframe {
+	unsigned long fp;
+	unsigned long pc;
+	DECLARE_BITMAP(stacks_done, __NR_HYP_STACK_TYPES);
+	unsigned long prev_fp;
+	enum hyp_stack_type prev_type;
+};
+
+static inline bool __on_hyp_stack(unsigned long hyp_sp, unsigned long size,
+				unsigned long low, unsigned long high,
+				enum hyp_stack_type type,
+				struct hyp_stack_info *info)
+{
+	if (!low)
+		return false;
+
+	if (hyp_sp < low || hyp_sp + size < hyp_sp || hyp_sp + size > high)
+		return false;
+
+	if (info) {
+		info->low = low;
+		info->high = high;
+		info->type = type;
+	}
+	return true;
+}
+
+static inline bool on_hyp_overflow_stack(unsigned long hyp_sp, unsigned long size,
+				 struct hyp_stack_info *info)
+{
+	struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
+	unsigned long low = (unsigned long)panic_info->hyp_overflow_stack_base;
+	unsigned long high = low + PAGE_SIZE;
+
+	return __on_hyp_stack(hyp_sp, size, low, high, HYP_STACK_TYPE_OVERFLOW, info);
+}
+
+static inline bool on_hyp_stack(unsigned long hyp_sp, unsigned long size,
+				 struct hyp_stack_info *info)
+{
+	struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
+	unsigned long low = (unsigned long)panic_info->hyp_stack_base;
+	unsigned long high = low + PAGE_SIZE;
+
+	return __on_hyp_stack(hyp_sp, size, low, high, HYP_STACK_TYPE_HYP, info);
+}
+
+static inline bool on_hyp_accessible_stack(unsigned long hyp_sp, unsigned long size,
+				       struct hyp_stack_info *info)
+{
+	if (info)
+		info->type = HYP_STACK_TYPE_UNKNOWN;
+
+	if (on_hyp_stack(hyp_sp, size, info))
+		return true;
+	if (on_hyp_overflow_stack(hyp_sp, size, info))
+		return true;
+
+	return false;
+}
+
+static unsigned long __hyp_stack_kern_va(unsigned long hyp_va)
+{
+	struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
+	unsigned long hyp_base, kern_base, hyp_offset;
+
+	hyp_base = (unsigned long)panic_info->hyp_stack_base;
+	hyp_offset = hyp_va - hyp_base;
+
+	kern_base = (unsigned long)*this_cpu_ptr(&kvm_arm_hyp_stack_page);
+
+	return kern_base + hyp_offset;
+}
+
+static unsigned long __hyp_overflow_stack_kern_va(unsigned long hyp_va)
+{
+	struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
+	unsigned long hyp_base, kern_base, hyp_offset;
+
+	hyp_base = (unsigned long)panic_info->hyp_overflow_stack_base;
+	hyp_offset = hyp_va - hyp_base;
+
+	kern_base = (unsigned long)this_cpu_ptr_nvhe_sym(hyp_overflow_stack);
+
+	return kern_base + hyp_offset;
+}
+
+/*
+ * Convert hypervisor stack VA to a kernel VA.
+ *
+ * The hypervisor stack is mapped in the flexible 'private' VA range, to allow
+ * for guard pages below the stack. Consequently, the fixed offset address
+ * translation macros won't work here.
+ *
+ * The kernel VA is calculated as an offset from the kernel VA of the hypervisor
+ * stack base. See: __hyp_stack_kern_va(),  __hyp_overflow_stack_kern_va()
+ */
+static unsigned long hyp_stack_kern_va(unsigned long hyp_va,
+					enum hyp_stack_type stack_type)
+{
+	switch (stack_type) {
+	case HYP_STACK_TYPE_HYP:
+		return __hyp_stack_kern_va(hyp_va);
+	case HYP_STACK_TYPE_OVERFLOW:
+		return __hyp_overflow_stack_kern_va(hyp_va);
+	default:
+		return 0UL;
+	}
+}
+
+/*
+ * Unwind from one frame record (A) to the next frame record (B).
+ *
+ * We terminate early if the location of B indicates a malformed chain of frame
+ * records (e.g. a cycle), determined based on the location and fp value of A
+ * and the location (but not the fp value) of B.
+ */
+static int notrace hyp_unwind_frame(struct hyp_stackframe *frame)
+{
+	unsigned long fp = frame->fp, fp_kern_va;
+	struct hyp_stack_info info;
+
+	if (fp & 0x7)
+		return -EINVAL;
+
+	if (!on_hyp_accessible_stack(fp, 16, &info))
+		return -EINVAL;
+
+	if (test_bit(info.type, frame->stacks_done))
+		return -EINVAL;
+
+	/*
+	 * As stacks grow downward, any valid record on the same stack must be
+	 * at a strictly higher address than the prior record.
+	 *
+	 * Stacks can nest in the following order:
+	 *
+	 * HYP -> OVERFLOW
+	 *
+	 * ... but the nesting itself is strict. Once we transition from one
+	 * stack to another, it's never valid to unwind back to that first
+	 * stack.
+	 */
+	if (info.type == frame->prev_type) {
+		if (fp <= frame->prev_fp)
+			return -EINVAL;
+	} else {
+		set_bit(frame->prev_type, frame->stacks_done);
+	}
+
+	/* Translate the hyp stack address to a kernel address */
+	fp_kern_va = hyp_stack_kern_va(fp, info.type);
+	if (!fp_kern_va)
+		return -EINVAL;
+
+	/*
+	 * Record this frame record's values and location. The prev_fp and
+	 * prev_type are only meaningful to the next hyp_unwind_frame()
+	 * invocation.
+	 */
+	frame->fp = READ_ONCE_NOCHECK(*(unsigned long *)(fp_kern_va));
+	/* PC = LR - 4; All aarch64 instructions are 32-bits in size */
+	frame->pc = READ_ONCE_NOCHECK(*(unsigned long *)(fp_kern_va + 8)) - 4;
+	frame->prev_fp = fp;
+	frame->prev_type = info.type;
+
+	return 0;
+}
+
+/*
+ * AArch64 PCS assigns the frame pointer to x29.
+ *
+ * A simple function prologue looks like this:
+ *	sub	sp, sp, #0x10
+ *	stp	x29, x30, [sp]
+ *	mov	x29, sp
+ *
+ * A simple function epilogue looks like this:
+ *	mov	sp, x29
+ *	ldp	x29, x30, [sp]
+ *	add	sp, sp, #0x10
+ */
+static void hyp_start_backtrace(struct hyp_stackframe *frame, unsigned long fp)
+{
+	frame->fp = fp;
+
+	/*
+	 * Prime the first unwind.
+	 *
+	 * In hyp_unwind_frame() we'll check that the FP points to a valid
+	 * stack, which can't be HYP_STACK_TYPE_UNKNOWN, and the first unwind
+	 * will be treated as a transition to whichever stack that happens to
+	 * be. The prev_fp value won't be used, but we set it to 0 such that
+	 * it is definitely not an accessible stack address. The first frame
+	 * (hyp_panic()) is skipped, so we also set PC to 0.
+	 */
+	bitmap_zero(frame->stacks_done, __NR_HYP_STACK_TYPES);
+	frame->pc = frame->prev_fp = 0;
+	frame->prev_type = HYP_STACK_TYPE_UNKNOWN;
+}
+
+static void hyp_dump_backtrace_entry(unsigned long hyp_pc, unsigned long hyp_offset)
+{
+	unsigned long va_mask = GENMASK_ULL(vabits_actual - 1, 0);
+
+	hyp_pc &= va_mask;
+	hyp_pc += hyp_offset;
+
+	kvm_err(" [<%016llx>]\n", hyp_pc);
+}
+
+void hyp_dump_backtrace(unsigned long hyp_offset)
+{
+	struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
+	struct hyp_stackframe frame;
+	int frame_nr = 0;
+	int skip = 1;		/* Skip the first frame: hyp_panic() */
+
+	kvm_err("nVHE HYP call trace (vmlinux addresses):\n");
+
+	hyp_start_backtrace(&frame, (unsigned long)panic_info->start_fp);
+
+	do {
+		if (skip) {
+			skip--;
+			continue;
+		}
+
+		hyp_dump_backtrace_entry(frame.pc, hyp_offset);
+
+		frame_nr++;
+	} while (!hyp_unwind_frame(&frame));
+
+	kvm_err("---- end of nVHE HYP call trace ----\n");
+}
diff --git a/arch/arm64/kvm/stacktrace.h b/arch/arm64/kvm/stacktrace.h
new file mode 100644
index 000000000000..40c397394b9b
--- /dev/null
+++ b/arch/arm64/kvm/stacktrace.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Stack unwinder for EL2 nVHE hypervisor.
+ */
+
+#ifndef __KVM_HYP_STACKTRACE_H
+#define __KVM_HYP_STACKTRACE_H
+
+#ifdef CONFIG_NVHE_EL2_DEBUG
+void hyp_dump_backtrace(unsigned long hyp_offset);
+#else
+static inline void hyp_dump_backtrace(unsigned long hyp_offset)
+{
+}
+#endif /* CONFIG_NVHE_EL2_DEBUG */
+
+#endif /* __KVM_HYP_STACKTRACE_H */
-- 
2.35.1.265.g69c8d7142f-goog


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 7/7] KVM: arm64: Symbolize the nVHE HYP backtrace
@ 2022-02-10 22:41   ` Kalesh Singh
  0 siblings, 0 replies; 36+ messages in thread
From: Kalesh Singh @ 2022-02-10 22:41 UTC (permalink / raw)
  Cc: will, maz, qperret, tabba, surenb, kernel-team, Kalesh Singh,
	Catalin Marinas, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Ard Biesheuvel, Mark Rutland, Pasha Tatashin, Joey Gouly,
	Peter Collingbourne, Andrew Walbran, Andrew Scull,
	linux-arm-kernel, linux-kernel, kvmarm

Reintroduce the __kvm_nvhe_ symbols in kallsyms, ignoring the local
symbols in this namespace. The local symbols are not informative and can
cause aliasing issues when symbolizing the addresses.

With the necessary symbols now in kallsyms, we can symbolize nVHE
stacktrace addresses using the %pB print format specifier.
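
For illustration (editor's sketch, not part of the patch): %pB resolves a
text address to symbol+offset via kallsyms, compensating for the return
address pointing just past the call site. The address used below is a
made-up example value:

	#include <linux/printk.h>

	/* Illustrative only: symbolize a hypothetical hyp text address. */
	static void demo_symbolize(void)
	{
		unsigned long panic_addr = 0xffffffc0116145ccUL;

		printk(KERN_ERR "kvm: nVHE hyp panic at: [<%016lx>] %pB!\n",
		       panic_addr, (void *)panic_addr);
	}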

Some sample call traces:

-------
[  167.018598][  T407] kvm [407]: nVHE hyp panic at: [<ffffffc0116145cc>] __kvm_nvhe_overflow_stack+0x10/0x34!
[  167.020841][  T407] kvm [407]: nVHE HYP call trace:
[  167.021371][  T407] kvm [407]: [<ffffffc011614934>] __kvm_nvhe_hyp_panic_bad_stack+0xc/0x10
[  167.021972][  T407] kvm [407]: [<ffffffc01160fa48>] __kvm_nvhe___kvm_hyp_host_vector+0x248/0x794
[  167.022572][  T407] kvm [407]: [<ffffffc0116145dc>] __kvm_nvhe_overflow_stack+0x20/0x34
[  167.023135][  T407] kvm [407]: [<ffffffc0116145dc>] __kvm_nvhe_overflow_stack+0x20/0x34
[  167.023699][  T407] kvm [407]: [<ffffffc0116145dc>] __kvm_nvhe_overflow_stack+0x20/0x34
[  167.024261][  T407] kvm [407]: [<ffffffc0116145dc>] __kvm_nvhe_overflow_stack+0x20/0x34
. . .

-------
[  166.161699][  T409] kvm [409]: Invalid host exception to nVHE hyp!
[  166.163789][  T409] kvm [409]: nVHE HYP call trace:
[  166.164709][  T409] kvm [409]: [<ffffffc011614fa0>] __kvm_nvhe_handle___kvm_vcpu_run+0x198/0x21c
[  166.165352][  T409] kvm [409]: [<ffffffc011614980>] __kvm_nvhe_handle_trap+0xa4/0x124
[  166.165911][  T409] kvm [409]: [<ffffffc01160f060>] __kvm_nvhe___host_exit+0x60/0x64
[  166.166657][  T409] Kernel panic - not syncing: HYP panic:
. . .
-------

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
---
 arch/arm64/kvm/handle_exit.c | 11 +++--------
 arch/arm64/kvm/stacktrace.c  |  2 +-
 scripts/kallsyms.c           |  2 +-
 3 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index b038c32a3236..d7f0f295aebf 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -296,13 +296,8 @@ void __noreturn __cold nvhe_hyp_panic_handler(u64 esr, u64 spsr,
 	u64 elr_in_kimg = __phys_to_kimg(elr_phys);
 	u64 hyp_offset = elr_in_kimg - kaslr_offset() - elr_virt;
 	u64 mode = spsr & PSR_MODE_MASK;
+	u64 panic_addr = elr_virt + hyp_offset;
 
-	/*
-	 * The nVHE hyp symbols are not included by kallsyms to avoid issues
-	 * with aliasing. That means that the symbols cannot be printed with the
-	 * "%pS" format specifier, so fall back to the vmlinux address if
-	 * there's no better option.
-	 */
 	if (mode != PSR_MODE_EL2t && mode != PSR_MODE_EL2h) {
 		kvm_err("Invalid host exception to nVHE hyp!\n");
 	} else if (ESR_ELx_EC(esr) == ESR_ELx_EC_BRK64 &&
@@ -322,9 +317,9 @@ void __noreturn __cold nvhe_hyp_panic_handler(u64 esr, u64 spsr,
 		if (file)
 			kvm_err("nVHE hyp BUG at: %s:%u!\n", file, line);
 		else
-			kvm_err("nVHE hyp BUG at: %016llx!\n", elr_virt + hyp_offset);
+			kvm_err("nVHE hyp BUG at: [<%016llx>] %pB!\n", panic_addr, panic_addr);
 	} else {
-		kvm_err("nVHE hyp panic at: %016llx!\n", elr_virt + hyp_offset);
+		kvm_err("nVHE hyp panic at: [<%016llx>] %pB!\n", panic_addr, panic_addr);
 	}
 
 	hyp_dump_backtrace(hyp_offset);
diff --git a/arch/arm64/kvm/stacktrace.c b/arch/arm64/kvm/stacktrace.c
index 3990a616ab66..4d12ffee9cc6 100644
--- a/arch/arm64/kvm/stacktrace.c
+++ b/arch/arm64/kvm/stacktrace.c
@@ -261,7 +261,7 @@ static void hyp_dump_backtrace_entry(unsigned long hyp_pc, unsigned long hyp_off
 	hyp_pc &= va_mask;
 	hyp_pc += hyp_offset;
 
-	kvm_err(" [<%016llx>]\n", hyp_pc);
+	kvm_err("[<%016llx>] %pB\n", hyp_pc, hyp_pc);
 }
 
 void hyp_dump_backtrace(unsigned long hyp_offset)
diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
index 54ad86d13784..19aba43d9da4 100644
--- a/scripts/kallsyms.c
+++ b/scripts/kallsyms.c
@@ -111,7 +111,7 @@ static bool is_ignored_symbol(const char *name, char type)
 		".LASANPC",		/* s390 kasan local symbols */
 		"__crc_",		/* modversions */
 		"__efistub_",		/* arm64 EFI stub namespace */
-		"__kvm_nvhe_",		/* arm64 non-VHE KVM namespace */
+		"__kvm_nvhe_$",		/* arm64 local symbols in non-VHE KVM namespace */
 		"__AArch64ADRPThunk_",	/* arm64 lld */
 		"__ARMV5PILongThunk_",	/* arm lld */
 		"__ARMV7PILongThunk_",
-- 
2.35.1.265.g69c8d7142f-goog


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/7] KVM: arm64: Hypervisor stack enhancements
  2022-02-10 22:41 ` Kalesh Singh
  (?)
@ 2022-02-14 11:41   ` Marc Zyngier
  -1 siblings, 0 replies; 36+ messages in thread
From: Marc Zyngier @ 2022-02-14 11:41 UTC (permalink / raw)
  To: Kalesh Singh
  Cc: Pasha Tatashin, will, Peter Collingbourne, kernel-team,
	linux-kernel, Joey Gouly, kvmarm, Andrew Walbran,
	Catalin Marinas, Paolo Bonzini, surenb, linux-arm-kernel

On Thu, 10 Feb 2022 22:41:41 +0000,
Kalesh Singh <kaleshsingh@google.com> wrote:
> 
> This series is based on v5.17-rc3 and adds the following stack features to
> the KVM nVHE hypervisor:
> 
> == Hyp Stack Guard Pages ==
> 
> Based on the technique used by arm64 VMAP_STACK to detect overflow.
> i.e. the stack is aligned to twice its size which ensure that the 
> 'stack shift' bit of any valid SP is 0. The 'stack shift' bit can be
> tested in the exception entry to detect overflow without corrupting GPRs.

Having quickly parsed the code, this seems to only be effective for
pKVM and the EL2-allocated stack. Is there any technical reason not to
implement this for the much more common case of 'classic' KVM in nVHE
mode?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 4/7] KVM: arm64: Allocate guard pages near hyp stacks
  2022-02-10 22:41   ` Kalesh Singh
  (?)
@ 2022-02-14 14:06     ` Marc Zyngier
  -1 siblings, 0 replies; 36+ messages in thread
From: Marc Zyngier @ 2022-02-14 14:06 UTC (permalink / raw)
  To: Kalesh Singh
  Cc: will, qperret, tabba, surenb, kernel-team, Catalin Marinas,
	James Morse, Alexandru Elisei, Suzuki K Poulose, Ard Biesheuvel,
	Mark Rutland, Pasha Tatashin, Joey Gouly, Peter Collingbourne,
	Andrew Scull, linux-arm-kernel, linux-kernel, kvmarm

On Thu, 10 Feb 2022 22:41:45 +0000,
Kalesh Singh <kaleshsingh@google.com> wrote:
> 
> From: Quentin Perret <qperret@google.com>
> 
> Allocate unbacked VA space underneath each stack page to ensure stack
> overflows get trapped and don't corrupt memory silently.
> 
> The stack is aligned to twice its size (PAGE_SIZE), meaning that any
> valid stack address has PAGE_SHIFT bit as 0. This allows us to easily
> check for overflow in the exception entry without corrupting any GPRs.
> 
> Signed-off-by: Quentin Perret <qperret@google.com>
> [ Kalesh - Update commit text and comments,
>            refactor, add overflow handling ]
> Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> ---
>  arch/arm64/kvm/hyp/nvhe/host.S   | 16 ++++++++++++++++
>  arch/arm64/kvm/hyp/nvhe/setup.c  | 19 ++++++++++++++++++-
>  arch/arm64/kvm/hyp/nvhe/switch.c |  5 +++++
>  3 files changed, 39 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kvm/hyp/nvhe/host.S b/arch/arm64/kvm/hyp/nvhe/host.S
> index 3d613e721a75..78e4b612ac06 100644
> --- a/arch/arm64/kvm/hyp/nvhe/host.S
> +++ b/arch/arm64/kvm/hyp/nvhe/host.S
> @@ -153,6 +153,10 @@ SYM_FUNC_END(__host_hvc)
>  
>  .macro invalid_host_el2_vect
>  	.align 7
> +
> +	/* Test stack overflow without corrupting GPRs */
> +	test_sp_overflow PAGE_SHIFT, .L__hyp_sp_overflow\@
> +

I am definitely concerned with this in a system not using pKVM (which
is on average 100% of the upstream users so far! ;-). This is more or
less guaranteed to do the wrong thing 50% of the time, depending on
the alignment of the stack.
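
For context, the property the test relies on: a PAGE_SIZE hyp stack whose
base is aligned to 2 * PAGE_SIZE has bit PAGE_SHIFT clear in every valid
SP, while an SP that has dropped into the guard page directly below the
stack has that bit set. A minimal C illustration of the bit test (the
function name is made up; this is not the asm macro from patch 3/7):

#define PAGE_SHIFT	12			/* assuming 4K pages */
#define PAGE_SIZE	(1UL << PAGE_SHIFT)

/*
 * Stack occupies [stack_base, stack_base + PAGE_SIZE) with stack_base
 * aligned to 2 * PAGE_SIZE; the guard page sits just below stack_base.
 */
static inline int hyp_sp_overflowed(unsigned long sp)
{
	/* 0 for any SP inside the stack, 1 once SP is in the guard page. */
	return !!(sp & (1UL << PAGE_SHIFT));
}

If a stack lacks that 2 * PAGE_SIZE alignment guarantee, whether the bit
is set for in-stack SPs depends on where the stack happened to land,
which is presumably the 50% failure mode described above.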

>  	/* If a guest is loaded, panic out of it. */
>  	stp	x0, x1, [sp, #-16]!
>  	get_loaded_vcpu x0, x1
> @@ -165,6 +169,18 @@ SYM_FUNC_END(__host_hvc)
>  	 * been partially clobbered by __host_enter.
>  	 */
>  	b	hyp_panic
> +
> +.L__hyp_sp_overflow\@:
> +	/*
> +	 * Reset SP to the top of the stack, to allow handling the hyp_panic.
> +	 * This corrupts the stack but is ok, since we won't be attempting
> +	 * any unwinding here.
> +	 */
> +	ldr_this_cpu	x0, kvm_init_params + NVHE_INIT_STACK_HYP_VA, x1
> +	mov	sp, x0
> +
> +	bl	hyp_panic_bad_stack
> +	ASM_BUG()
>  .endm
>  
>  .macro invalid_host_el1_vect
> diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
> index 99e178cf4249..114053dff228 100644
> --- a/arch/arm64/kvm/hyp/nvhe/setup.c
> +++ b/arch/arm64/kvm/hyp/nvhe/setup.c
> @@ -105,7 +105,24 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
>  		if (ret)
>  			return ret;
>  
> -		/* Map stack pages in the 'private' VA range */
> +		/*
> +		 * Allocate 'private' VA range for stack guard pages.
> +		 *
> +		 * The 'private' VA range grows upward and stacks downwards, so
> +		 * allocate the guard page first. But make sure to align the
> +		 * stack itself with PAGE_SIZE * 2 granularity to ease overflow
> +		 * detection in the entry assembly code.
> +		 */
> +		do {
> +			start = (void *)hyp_alloc_private_va_range(PAGE_SIZE);
> +			if (IS_ERR(start))
> +				return PTR_ERR(start);
> +		} while (IS_ALIGNED((u64) start, PAGE_SIZE * 2));

This seems cumbersome. Can't we tweak hyp_alloc_private_va_range() to
perform the required alignment? It could easily be convinced to return
an address that is aligned on the size of the region, which would
avoid this sort of loop.
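
Something along the lines of the sketch below, perhaps. The names and the
bare bump-pointer design are stand-ins only, not the actual
hyp_alloc_private_va_range() implementation, which would also need its
locking and range checks:

static unsigned long io_map_base;	/* assumed top of the private VA range */

/*
 * Return a region of 'size' bytes (size a power of two) whose base is
 * aligned on 'size', and bump the allocation pointer past it.
 */
static unsigned long alloc_private_va_aligned(unsigned long size)
{
	unsigned long base = (io_map_base + size - 1) & ~(size - 1);

	io_map_base = base + size;
	return base;
}

The stack setup could then request the alignment it needs in a single
call instead of retrying allocations until one happens to land on the
right boundary.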

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/7] KVM: arm64: Hypervisor stack enhancements
  2022-02-14 11:41   ` Marc Zyngier
  (?)
@ 2022-02-14 21:54     ` Kalesh Singh
  -1 siblings, 0 replies; 36+ messages in thread
From: Kalesh Singh @ 2022-02-14 21:54 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Will Deacon, Quentin Perret, Fuad Tabba, Suren Baghdasaryan,
	Cc: Android Kernel, Catalin Marinas, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Ard Biesheuvel, Mark Rutland,
	Pasha Tatashin, Joey Gouly, Peter Collingbourne, Andrew Walbran,
	Andrew Scull, Paolo Bonzini,
	moderated list:ARM64 PORT (AARCH64 ARCHITECTURE),
	LKML, kvmarm

On Mon, Feb 14, 2022 at 3:41 AM Marc Zyngier <maz@kernel.org> wrote:
>
> On Thu, 10 Feb 2022 22:41:41 +0000,
> Kalesh Singh <kaleshsingh@google.com> wrote:
> >
> > This series is based on v5.17-rc3 and adds the following stack features to
> > the KVM nVHE hypervisor:
> >
> > == Hyp Stack Guard Pages ==
> >
> > Based on the technique used by arm64 VMAP_STACK to detect overflow.
> > i.e. the stack is aligned to twice its size which ensure that the
> > 'stack shift' bit of any valid SP is 0. The 'stack shift' bit can be
> > tested in the exception entry to detect overflow without corrupting GPRs.
>
> Having quickly parsed the code, this seems to only be effective for
> pKVM and the EL2-allocated stack. Is there any technical reason not to
> implement this for the much more common case of 'classic' KVM in nVHE
> mode?

Hi Marc,

No technical reason. We hadn't thought of it from that perspective.
It's a good idea. I'll look into this and repost a new version.

Thanks,
Kalesh

>
> Thanks,
>
>         M.
>
> --
> Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 4/7] KVM: arm64: Allocate guard pages near hyp stacks
  2022-02-14 14:06     ` Marc Zyngier
  (?)
@ 2022-02-14 22:03       ` Kalesh Singh
  -1 siblings, 0 replies; 36+ messages in thread
From: Kalesh Singh @ 2022-02-14 22:03 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Will Deacon, Quentin Perret, Fuad Tabba, Suren Baghdasaryan,
	Cc: Android Kernel, Catalin Marinas, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Ard Biesheuvel, Mark Rutland,
	Pasha Tatashin, Joey Gouly, Peter Collingbourne, Andrew Scull,
	moderated list:ARM64 PORT (AARCH64 ARCHITECTURE),
	LKML, kvmarm

On Mon, Feb 14, 2022 at 6:06 AM Marc Zyngier <maz@kernel.org> wrote:
>
> On Thu, 10 Feb 2022 22:41:45 +0000,
> Kalesh Singh <kaleshsingh@google.com> wrote:
> >
> > From: Quentin Perret <qperret@google.com>
> >
> > Allocate unbacked VA space underneath each stack page to ensure stack
> > overflows get trapped and don't corrupt memory silently.
> >
> > The stack is aligned to twice its size (PAGE_SIZE), meaning that any
> > valid stack address has PAGE_SHIFT bit as 0. This allows us to easily
> > check for overflow in the exception entry without corrupting any GPRs.
> >
> > Signed-off-by: Quentin Perret <qperret@google.com>
> > [ Kalesh - Update commit text and comments,
> >            refactor, add overflow handling ]
> > Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> > ---
> >  arch/arm64/kvm/hyp/nvhe/host.S   | 16 ++++++++++++++++
> >  arch/arm64/kvm/hyp/nvhe/setup.c  | 19 ++++++++++++++++++-
> >  arch/arm64/kvm/hyp/nvhe/switch.c |  5 +++++
> >  3 files changed, 39 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/arm64/kvm/hyp/nvhe/host.S b/arch/arm64/kvm/hyp/nvhe/host.S
> > index 3d613e721a75..78e4b612ac06 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/host.S
> > +++ b/arch/arm64/kvm/hyp/nvhe/host.S
> > @@ -153,6 +153,10 @@ SYM_FUNC_END(__host_hvc)
> >
> >  .macro invalid_host_el2_vect
> >       .align 7
> > +
> > +     /* Test stack overflow without corrupting GPRs */
> > +     test_sp_overflow PAGE_SHIFT, .L__hyp_sp_overflow\@
> > +
>
> I am definitely concerned with this in a system not using pKVM (which
> is on average 100% of the upstream users so far! ;-). This is more or
> less guaranteed to do the wrong thing 50% of the times, depending on
> the alignment of the stack.

Thanks for pointing this out, Marc. I'll rework this to work for both
protected and non-protected modes.

>
> >       /* If a guest is loaded, panic out of it. */
> >       stp     x0, x1, [sp, #-16]!
> >       get_loaded_vcpu x0, x1
> > @@ -165,6 +169,18 @@ SYM_FUNC_END(__host_hvc)
> >        * been partially clobbered by __host_enter.
> >        */
> >       b       hyp_panic
> > +
> > +.L__hyp_sp_overflow\@:
> > +     /*
> > +      * Reset SP to the top of the stack, to allow handling the hyp_panic.
> > +      * This corrupts the stack but is ok, since we won't be attempting
> > +      * any unwinding here.
> > +      */
> > +     ldr_this_cpu    x0, kvm_init_params + NVHE_INIT_STACK_HYP_VA, x1
> > +     mov     sp, x0
> > +
> > +     bl      hyp_panic_bad_stack
> > +     ASM_BUG()
> >  .endm
> >
> >  .macro invalid_host_el1_vect
> > diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
> > index 99e178cf4249..114053dff228 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/setup.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/setup.c
> > @@ -105,7 +105,24 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
> >               if (ret)
> >                       return ret;
> >
> > -             /* Map stack pages in the 'private' VA range */
> > +             /*
> > +              * Allocate 'private' VA range for stack guard pages.
> > +              *
> > +              * The 'private' VA range grows upward and stacks downwards, so
> > +              * allocate the guard page first. But make sure to align the
> > +              * stack itself with PAGE_SIZE * 2 granularity to ease overflow
> > +              * detection in the entry assembly code.
> > +              */
> > +             do {
> > +                     start = (void *)hyp_alloc_private_va_range(PAGE_SIZE);
> > +                     if (IS_ERR(start))
> > +                             return PTR_ERR(start);
> > +             } while (IS_ALIGNED((u64) start, PAGE_SIZE * 2));
>
> This seems cumbersome. Can't we tweak hyp_alloc_private_va_range() to
> perform the required alignment? It could easily be convinced to return
> an address that is aligned on the size of the region, which would
> avoid this sort of loop.

Ack. I'll update it for v2.

- Kalesh

>
> Thanks,
>
>         M.
>
> --
> Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2022-02-15  9:01 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-10 22:41 [PATCH 0/7] KVM: arm64: Hypervisor stack enhancements Kalesh Singh
2022-02-10 22:41 ` Kalesh Singh
2022-02-10 22:41 ` Kalesh Singh
2022-02-10 22:41 ` [PATCH 1/7] KVM: arm64: Map the stack pages in the 'private' range Kalesh Singh
2022-02-10 22:41   ` Kalesh Singh
2022-02-10 22:41   ` Kalesh Singh
2022-02-10 22:41 ` [PATCH 2/7] KVM: arm64: Factor out private range VA allocation Kalesh Singh
2022-02-10 22:41   ` Kalesh Singh
2022-02-10 22:41   ` Kalesh Singh
2022-02-10 22:41 ` [PATCH 3/7] arm64: asm: Introduce test_sp_overflow macro Kalesh Singh
2022-02-10 22:41   ` Kalesh Singh
2022-02-10 22:41   ` Kalesh Singh
2022-02-10 22:41 ` [PATCH 4/7] KVM: arm64: Allocate guard pages near hyp stacks Kalesh Singh
2022-02-10 22:41   ` Kalesh Singh
2022-02-10 22:41   ` Kalesh Singh
2022-02-14 14:06   ` Marc Zyngier
2022-02-14 14:06     ` Marc Zyngier
2022-02-14 14:06     ` Marc Zyngier
2022-02-14 22:03     ` Kalesh Singh
2022-02-14 22:03       ` Kalesh Singh
2022-02-14 22:03       ` Kalesh Singh
2022-02-10 22:41 ` [PATCH 5/7] KVM: arm64: Add Hyp overflow stack Kalesh Singh
2022-02-10 22:41   ` Kalesh Singh
2022-02-10 22:41   ` Kalesh Singh
2022-02-10 22:41 ` [PATCH 6/7] KVM: arm64: Unwind and dump nVHE HYP stacktrace Kalesh Singh
2022-02-10 22:41   ` Kalesh Singh
2022-02-10 22:41   ` Kalesh Singh
2022-02-10 22:41 ` [PATCH 7/7] KVM: arm64: Symbolize the nVHE HYP backtrace Kalesh Singh
2022-02-10 22:41   ` Kalesh Singh
2022-02-10 22:41   ` Kalesh Singh
2022-02-14 11:41 ` [PATCH 0/7] KVM: arm64: Hypervisor stack enhancements Marc Zyngier
2022-02-14 11:41   ` Marc Zyngier
2022-02-14 11:41   ` Marc Zyngier
2022-02-14 21:54   ` Kalesh Singh
2022-02-14 21:54     ` Kalesh Singh
2022-02-14 21:54     ` Kalesh Singh
