All of lore.kernel.org
* [PATCH 0/4] KVM: arm64: A510 errata workaround and fixes for fixup_guest_exit()
@ 2022-01-25 15:37 ` James Morse
  0 siblings, 0 replies; 18+ messages in thread
From: James Morse @ 2022-01-25 15:37 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: Anshuman Khandual, Marc Zyngier, Catalin Marinas, Will Deacon

Hello!

Early Cortex-A510 parts have a nasty erratum where two ERETs,
pointer-auth and software step conspire to corrupt SPSR_EL2. A
guest can only trigger this when it is being stepped by EL2, which
gives EL2 the opportunity to work around the erratum. Patch 4 does
this; the SDEN is available from:
https://developer.arm.com/documentation/SDEN2397239/900

Patches 2 and 3 fix two issues in the adjacent code: a stale esr value
could be used to alter ELR_EL2 when an IRQ synchronises an SError, and
when an HVC synchronises an SError, the HVC may be handled twice (not
just executed twice).


There are three series that would add the Cortex-A510 part macros. I've picked
Anshuman's patch that does this, on the assumption that this makes someone's
life easier. I haven't spotted that patch on the arm64/for-next/fixes branch,
so I've not included the hash in the prerequisite field of the CC: stable tag.

Let me know if you want this reposted once that value is known.

This series is based on v5.17-rc1 and can be retrieved from:
https://git.gitlab.arm.com/linux-arm/linux-jm.git a510_errata/kvm_bits/v1
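
For example, to fetch it locally:

  git fetch https://git.gitlab.arm.com/linux-arm/linux-jm.git a510_errata/kvm_bits/v1
  git checkout FETCH_HEAD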


Thanks,

James

Anshuman Khandual (1):
  arm64: Add Cortex-A510 CPU part definition

James Morse (3):
  KVM: arm64: Avoid consuming a stale esr value when SError occur
  KVM: arm64: Stop handle_exit() from handling HVC twice when an SError
    occurs
  KVM: arm64: Workaround Cortex-A510's single-step and PAC trap errata

 Documentation/arm64/silicon-errata.rst  |  2 ++
 arch/arm64/Kconfig                      | 16 +++++++++++++++
 arch/arm64/include/asm/cputype.h        |  2 ++
 arch/arm64/kernel/cpu_errata.c          |  8 ++++++++
 arch/arm64/kvm/handle_exit.c            |  8 ++++++++
 arch/arm64/kvm/hyp/include/hyp/switch.h | 27 +++++++++++++++++++++----
 arch/arm64/tools/cpucaps                |  1 +
 7 files changed, 60 insertions(+), 4 deletions(-)

-- 
2.30.2

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/4] arm64: Add Cortex-A510 CPU part definition
  2022-01-25 15:37 ` James Morse
@ 2022-01-25 15:38   ` James Morse
  -1 siblings, 0 replies; 18+ messages in thread
From: James Morse @ 2022-01-25 15:38 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: Anshuman Khandual, Marc Zyngier, Catalin Marinas, Will Deacon

From: Anshuman Khandual <anshuman.khandual@arm.com>

Add the CPU part numbers for the new Arm designs.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Suzuki Poulose <suzuki.poulose@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/arm64/include/asm/cputype.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h
index 19b8441aa8f2..e8fdc10395b6 100644
--- a/arch/arm64/include/asm/cputype.h
+++ b/arch/arm64/include/asm/cputype.h
@@ -73,6 +73,7 @@
 #define ARM_CPU_PART_CORTEX_A76		0xD0B
 #define ARM_CPU_PART_NEOVERSE_N1	0xD0C
 #define ARM_CPU_PART_CORTEX_A77		0xD0D
+#define ARM_CPU_PART_CORTEX_A510	0xD46
 #define ARM_CPU_PART_CORTEX_A710	0xD47
 #define ARM_CPU_PART_NEOVERSE_N2	0xD49
 
@@ -115,6 +116,7 @@
 #define MIDR_CORTEX_A76	MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A76)
 #define MIDR_NEOVERSE_N1 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_NEOVERSE_N1)
 #define MIDR_CORTEX_A77	MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A77)
+#define MIDR_CORTEX_A510 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A510)
 #define MIDR_CORTEX_A710 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A710)
 #define MIDR_NEOVERSE_N2 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_NEOVERSE_N2)
 #define MIDR_THUNDERX	MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX)
-- 
2.30.2

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/4] KVM: arm64: Avoid consuming a stale esr value when SError occur
  2022-01-25 15:37 ` James Morse
@ 2022-01-25 15:38   ` James Morse
  -1 siblings, 0 replies; 18+ messages in thread
From: James Morse @ 2022-01-25 15:38 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: Anshuman Khandual, Marc Zyngier, Catalin Marinas, Will Deacon

When any exception other than an IRQ occurs, the CPU updates the ESR_EL2
register with the exception syndrome. An SError may also become pending,
and will be synchronised by KVM. KVM notes the exception type, and whether
an SError was synchronised, in exit_code.
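
(For reference, exit_code carries both of these in one word; the encoding
macros in arch/arm64/include/asm/kvm_asm.h look roughly like this:

  #define ARM_EXIT_WITH_SERROR_BIT  31
  #define ARM_EXCEPTION_CODE(x)     ((x) & ~(1U << ARM_EXIT_WITH_SERROR_BIT))
  #define ARM_SERROR_PENDING(x)     !!((x) & (1U << ARM_EXIT_WITH_SERROR_BIT))

i.e. the low bits hold the exception type, and bit 31 records whether an
SError was synchronised.)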

When an exception other than an IRQ occurs, fixup_guest_exit() updates
vcpu->arch.fault.esr_el2 from the hardware register. When an SError was
synchronised, the vcpu esr value is used to determine if the exception
was due to an HVC. If so, ELR_EL2 is moved back one instruction. This
is so that KVM can process the SError first, and re-execute the HVC if
the guest survives the SError.
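
For reference, the rollback in question, later in fixup_guest_exit(), looks
roughly like:

  if (esr_ec == ESR_ELx_EC_HVC32 || esr_ec == ESR_ELx_EC_HVC64)
          write_sysreg_el2(read_sysreg_el2(SYS_ELR) - 4, SYS_ELR);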

But if an IRQ synchronises an SError, the vcpu's esr value is stale.
If the previous non-IRQ exception was an HVC, KVM will corrupt ELR_EL2,
causing an unrelated guest instruction to be executed twice.

Check ARM_EXCEPTION_CODE() before messing with ELR_EL2; IRQs don't
update this register, so there is no need to check it in that case.

Fixes: defe21f49bc9 ("KVM: arm64: Move PC rollback on SError to HYP")
Cc: stable@vger.kernel.org
Reported-by: Steven Price <steven.price@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/arm64/kvm/hyp/include/hyp/switch.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
index 58e14f8ead23..331dd10821df 100644
--- a/arch/arm64/kvm/hyp/include/hyp/switch.h
+++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
@@ -424,7 +424,8 @@ static inline bool fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
 	if (ARM_EXCEPTION_CODE(*exit_code) != ARM_EXCEPTION_IRQ)
 		vcpu->arch.fault.esr_el2 = read_sysreg_el2(SYS_ESR);
 
-	if (ARM_SERROR_PENDING(*exit_code)) {
+	if (ARM_SERROR_PENDING(*exit_code) &&
+	    ARM_EXCEPTION_CODE(*exit_code) != ARM_EXCEPTION_IRQ) {
 		u8 esr_ec = kvm_vcpu_trap_get_class(vcpu);
 
 		/*
-- 
2.30.2

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/4] KVM: arm64: Stop handle_exit() from handling HVC twice when an SError occurs
  2022-01-25 15:37 ` James Morse
@ 2022-01-25 15:38   ` James Morse
  -1 siblings, 0 replies; 18+ messages in thread
From: James Morse @ 2022-01-25 15:38 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: Anshuman Khandual, Marc Zyngier, Catalin Marinas, Will Deacon

Prior to commit defe21f49bc9 ("KVM: arm64: Move PC rollback on SError to
HYP"), when an SError is synchronised due to another exception, KVM
handles the SError first. If the guest survives, the instruction that
triggered the original exception is re-executed to handle the first
exception. HVC is treated as a special case as the instruction wouldn't
normally be re-executed, as it's not a trap.

Commit defe21f49bc9 didn't preserve the behaviour of the 'return 1'
that skips the rest of handle_exit().

Since commit defe21f49bc9, KVM will try to handle the SError and the
original exception at the same time. When the exception was an HVC,
fixup_guest_exit() has already rolled back ELR_EL2, meaning if the
guest has virtual SError masked, it will execute and handle the HVC
twice.

Restore the original behaviour.

Fixes: defe21f49bc9 ("KVM: arm64: Move PC rollback on SError to HYP")
Cc: stable@vger.kernel.org
Signed-off-by: James Morse <james.morse@arm.com>
---
It may be possible to remove both this patch, and the HVC handling code
in fixup_guest_exit(). This means KVM would always handle the exception
and the SError. This may result in unnecessary work if the guest takes
the virtual SError when it is next restarted, but should be harmless if
SError are always re-injected using HCR_EL2.VSE.
---
 arch/arm64/kvm/handle_exit.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index fd2dd26caf91..e3140abd2e2e 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -228,6 +228,14 @@ int handle_exit(struct kvm_vcpu *vcpu, int exception_index)
 {
 	struct kvm_run *run = vcpu->run;
 
+	if (ARM_SERROR_PENDING(exception_index)) {
+		/*
+		 * The SError is handled by handle_exit_early(). If the guest
+		 * survives it will re-execute the original instruction.
+		 */
+		return 1;
+	}
+
 	exception_index = ARM_EXCEPTION_CODE(exception_index);
 
 	switch (exception_index) {
-- 
2.30.2

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 4/4] KVM: arm64: Workaround Cortex-A510's single-step and PAC trap errata
  2022-01-25 15:37 ` James Morse
@ 2022-01-25 15:38   ` James Morse
  -1 siblings, 0 replies; 18+ messages in thread
From: James Morse @ 2022-01-25 15:38 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: Anshuman Khandual, Marc Zyngier, Catalin Marinas, Will Deacon

Cortex-A510's erratum #2077057 causes SPSR_EL2 to be corrupted when
single-stepping authenticated ERET instructions. A single step is
expected, but a pointer authentication trap is taken instead. The
erratum causes SPSR_EL1 to be copied to SPSR_EL2, which could allow
EL1 to cause a return to EL2 with a guest controlled ELR_EL2.

Because the conditions require an ERET into active-not-pending state,
this is only a problem for the EL2 when EL2 is stepping EL1. In this case
the previous SPSR_EL2 value is preserved in struct kvm_vcpu, and can be
restored.
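
For context, "EL2 stepping EL1" means a VMM/debugger has enabled single-step
for the vcpu; a minimal sketch of the userspace request, assuming vcpu_fd is
an already-created vcpu file descriptor:

  #include <linux/kvm.h>
  #include <sys/ioctl.h>

  struct kvm_guest_debug dbg = {
          .control = KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_SINGLESTEP,
  };
  ioctl(vcpu_fd, KVM_SET_GUEST_DEBUG, &dbg);

Only a guest being debugged this way can meet the erratum's conditions.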

Cc: stable@vger.kernel.org # ${GITHASHHERE}: arm64: Add Cortex-A510 CPU part definition
Cc: stable@vger.kernel.org
Signed-off-by: James Morse <james.morse@arm.com>
---
 Documentation/arm64/silicon-errata.rst  |  2 ++
 arch/arm64/Kconfig                      | 16 ++++++++++++++++
 arch/arm64/kernel/cpu_errata.c          |  8 ++++++++
 arch/arm64/kvm/hyp/include/hyp/switch.h | 24 +++++++++++++++++++++---
 arch/arm64/tools/cpucaps                |  1 +
 5 files changed, 48 insertions(+), 3 deletions(-)

diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst
index 5342e895fb60..ac1ae34564c9 100644
--- a/Documentation/arm64/silicon-errata.rst
+++ b/Documentation/arm64/silicon-errata.rst
@@ -92,6 +92,8 @@ stable kernels.
 +----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Cortex-A77      | #1508412        | ARM64_ERRATUM_1508412       |
 +----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Cortex-A510     | #2077057        | ARM64_ERRATUM_2077057       |
++----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Cortex-A710     | #2119858        | ARM64_ERRATUM_2119858       |
 +----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Cortex-A710     | #2054223        | ARM64_ERRATUM_2054223       |
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 6978140edfa4..02b542ec18c8 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -670,6 +670,22 @@ config ARM64_ERRATUM_1508412
 config ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE
 	bool
 
+config ARM64_ERRATUM_2077057
+	bool "Cortex-A510: 2077057: workaround software-step corrupting SPSR_EL2"
+	help
+	  This option adds the workaround for ARM Cortex-A510 erratum 2077057.
+	  Affected Cortex-A510 may corrupt SPSR_EL2 when a step exception is
+	  expected, but a Pointer Authentication trap is taken instead. The
+	  erratum causes SPSR_EL1 to be copied to SPSR_EL2, which could allow
+	  EL1 to cause a return to EL2 with a guest controlled ELR_EL2.
+
+	  This can only happen when EL2 is stepping EL1.
+
+	  When these conditions occur, the SPSR_EL2 value is unchanged from the
+	  previous guest entry, and can be restored from the in-memory copy.
+
+	  If unsure, say Y.
+
 config ARM64_ERRATUM_2119858
 	bool "Cortex-A710: 2119858: workaround TRBE overwriting trace data in FILL mode"
 	default y
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 9e1c1aef9ebd..04a014c63251 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -597,6 +597,14 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
 		.type = ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE,
 		CAP_MIDR_RANGE_LIST(trbe_write_out_of_range_cpus),
 	},
+#endif
+#ifdef CONFIG_ARM64_ERRATUM_2077057
+	{
+		.desc = "ARM erratum 2077057",
+		.capability = ARM64_WORKAROUND_2077057,
+		.type = ARM64_CPUCAP_LOCAL_CPU_ERRATUM,
+		ERRATA_MIDR_REV_RANGE(MIDR_CORTEX_A510, 0, 0, 2),
+	},
 #endif
 	{
 	}
diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
index 331dd10821df..93bf140cc972 100644
--- a/arch/arm64/kvm/hyp/include/hyp/switch.h
+++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
@@ -409,6 +409,8 @@ static inline bool kvm_hyp_handle_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
  */
 static inline bool fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
 {
+	u8 esr_ec;
+
 	/*
 	 * Save PSTATE early so that we can evaluate the vcpu mode
 	 * early on.
@@ -421,13 +423,13 @@ static inline bool fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
 	 */
 	early_exit_filter(vcpu, exit_code);
 
-	if (ARM_EXCEPTION_CODE(*exit_code) != ARM_EXCEPTION_IRQ)
+	if (ARM_EXCEPTION_CODE(*exit_code) != ARM_EXCEPTION_IRQ) {
 		vcpu->arch.fault.esr_el2 = read_sysreg_el2(SYS_ESR);
+		esr_ec = kvm_vcpu_trap_get_class(vcpu);
+	}
 
 	if (ARM_SERROR_PENDING(*exit_code) &&
 	    ARM_EXCEPTION_CODE(*exit_code) != ARM_EXCEPTION_IRQ) {
-		u8 esr_ec = kvm_vcpu_trap_get_class(vcpu);
-
 		/*
 		 * HVC already have an adjusted PC, which we need to
 		 * correct in order to return to after having injected
@@ -440,6 +442,22 @@ static inline bool fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
 			write_sysreg_el2(read_sysreg_el2(SYS_ELR) - 4, SYS_ELR);
 	}
 
+	/*
+	 * Check for the conditions of Cortex-A510's #2077057. When these occur
+	 * SPSR_EL2 can't be trusted, but isn't needed either as it is
+	 * unchanged from the value in vcpu_gp_regs(vcpu)->pstate.
+	 * Did we just take a PAC exception when a step exception was expected?
+	 */
+	if (IS_ENABLED(CONFIG_ARM64_ERRATUM_2077057) &&
+	    cpus_have_const_cap(ARM64_WORKAROUND_2077057) &&
+	    ARM_EXCEPTION_CODE(*exit_code) != ARM_EXCEPTION_IRQ &&
+	    esr_ec == ESR_ELx_EC_PAC &&
+	    vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) {
+		/* Active-not-pending? */
+		if (*vcpu_cpsr(vcpu) & DBG_SPSR_SS)
+			write_sysreg_el2(*vcpu_cpsr(vcpu), SYS_SPSR);
+	}
+
 	/*
 	 * We're using the raw exception code in order to only process
 	 * the trap if no SError is pending. We will come back to the
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index 870c39537dd0..2e7cd3fecca6 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -55,6 +55,7 @@ WORKAROUND_1418040
 WORKAROUND_1463225
 WORKAROUND_1508412
 WORKAROUND_1542419
+WORKAROUND_2077057
 WORKAROUND_TRBE_OVERWRITE_FILL_MODE
 WORKAROUND_TSB_FLUSH_FAILURE
 WORKAROUND_TRBE_WRITE_OUT_OF_RANGE
-- 
2.30.2

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/4] KVM: arm64: Workaround Cortex-A510's single-step and PAC trap errata
  2022-01-25 15:38   ` James Morse
@ 2022-01-25 16:51     ` Marc Zyngier
  -1 siblings, 0 replies; 18+ messages in thread
From: Marc Zyngier @ 2022-01-25 16:51 UTC (permalink / raw)
  To: James Morse
  Cc: Catalin Marinas, Anshuman Khandual, Will Deacon, kvmarm,
	linux-arm-kernel

Hi James,

Thanks for this. I guess.

On Tue, 25 Jan 2022 15:38:03 +0000,
James Morse <james.morse@arm.com> wrote:
> 
> Cortex-A510's erratum #2077057 causes SPSR_EL2 to be corrupted when
> single-stepping authenticated ERET instructions. A single step is
> expected, but a pointer authentication trap is taken instead. The
> erratum causes SPSR_EL1 to be copied to SPSR_EL2, which could allow
> EL1 to cause a return to EL2 with a guest controlled ELR_EL2.
> 
> Because the conditions require an ERET into active-not-pending state,
> this is only a problem for the EL2 when EL2 is stepping EL1. In this case
> the previous SPSR_EL2 value is preserved in struct kvm_vcpu, and can be
> restored.

Urgh. That's pretty nasty :-(.

> 
> Cc: stable@vger.kernel.org # ${GITHASHHERE}: arm64: Add Cortex-A510 CPU part definition
> Cc: stable@vger.kernel.org
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  Documentation/arm64/silicon-errata.rst  |  2 ++
>  arch/arm64/Kconfig                      | 16 ++++++++++++++++
>  arch/arm64/kernel/cpu_errata.c          |  8 ++++++++
>  arch/arm64/kvm/hyp/include/hyp/switch.h | 24 +++++++++++++++++++++---
>  arch/arm64/tools/cpucaps                |  1 +
>  5 files changed, 48 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst
> index 5342e895fb60..ac1ae34564c9 100644
> --- a/Documentation/arm64/silicon-errata.rst
> +++ b/Documentation/arm64/silicon-errata.rst
> @@ -92,6 +92,8 @@ stable kernels.
>  +----------------+-----------------+-----------------+-----------------------------+
>  | ARM            | Cortex-A77      | #1508412        | ARM64_ERRATUM_1508412       |
>  +----------------+-----------------+-----------------+-----------------------------+
> +| ARM            | Cortex-A510     | #2077057        | ARM64_ERRATUM_2077057       |
> ++----------------+-----------------+-----------------+-----------------------------+
>  | ARM            | Cortex-A710     | #2119858        | ARM64_ERRATUM_2119858       |
>  +----------------+-----------------+-----------------+-----------------------------+
>  | ARM            | Cortex-A710     | #2054223        | ARM64_ERRATUM_2054223       |
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 6978140edfa4..02b542ec18c8 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -670,6 +670,22 @@ config ARM64_ERRATUM_1508412
>  config ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE
>  	bool
>  
> +config ARM64_ERRATUM_2077057
> +	bool "Cortex-A510: 2077057: workaround software-step corrupting SPSR_EL2"
> +	help
> +	  This option adds the workaround for ARM Cortex-A510 erratum 2077057.
> +	  Affected Cortex-A510 may corrupt SPSR_EL2 when a step exception is
> +	  expected, but a Pointer Authentication trap is taken instead. The
> +	  erratum causes SPSR_EL1 to be copied to SPSR_EL2, which could allow
> +	  EL1 to cause a return to EL2 with a guest controlled ELR_EL2.
> +
> +	  This can only happen when EL2 is stepping EL1.
> +
> +	  When these conditions occur, the SPSR_EL2 value is unchanged from the
> +	  previous guest entry, and can be restored from the in-memory copy.
> +
> +	  If unsure, say Y.
> +
>  config ARM64_ERRATUM_2119858
>  	bool "Cortex-A710: 2119858: workaround TRBE overwriting trace data in FILL mode"
>  	default y
> diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
> index 9e1c1aef9ebd..04a014c63251 100644
> --- a/arch/arm64/kernel/cpu_errata.c
> +++ b/arch/arm64/kernel/cpu_errata.c
> @@ -597,6 +597,14 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
>  		.type = ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE,
>  		CAP_MIDR_RANGE_LIST(trbe_write_out_of_range_cpus),
>  	},
> +#endif
> +#ifdef CONFIG_ARM64_ERRATUM_2077057
> +	{
> +		.desc = "ARM erratum 2077057",
> +		.capability = ARM64_WORKAROUND_2077057,
> +		.type = ARM64_CPUCAP_LOCAL_CPU_ERRATUM,
> +		ERRATA_MIDR_REV_RANGE(MIDR_CORTEX_A510, 0, 0, 2),
> +	},
>  #endif
>  	{
>  	}
> diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
> index 331dd10821df..93bf140cc972 100644
> --- a/arch/arm64/kvm/hyp/include/hyp/switch.h
> +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
> @@ -409,6 +409,8 @@ static inline bool kvm_hyp_handle_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
>   */
>  static inline bool fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
>  {
> +	u8 esr_ec;
> +
>  	/*
>  	 * Save PSTATE early so that we can evaluate the vcpu mode
>  	 * early on.
> @@ -421,13 +423,13 @@ static inline bool fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
>  	 */
>  	early_exit_filter(vcpu, exit_code);
>  
> -	if (ARM_EXCEPTION_CODE(*exit_code) != ARM_EXCEPTION_IRQ)
> +	if (ARM_EXCEPTION_CODE(*exit_code) != ARM_EXCEPTION_IRQ) {
>  		vcpu->arch.fault.esr_el2 = read_sysreg_el2(SYS_ESR);
> +		esr_ec = kvm_vcpu_trap_get_class(vcpu);
> +	}
>  
>  	if (ARM_SERROR_PENDING(*exit_code) &&
>  	    ARM_EXCEPTION_CODE(*exit_code) != ARM_EXCEPTION_IRQ) {
> -		u8 esr_ec = kvm_vcpu_trap_get_class(vcpu);
> -
>  		/*
>  		 * HVC already have an adjusted PC, which we need to
>  		 * correct in order to return to after having injected
> @@ -440,6 +442,22 @@ static inline bool fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
>  			write_sysreg_el2(read_sysreg_el2(SYS_ELR) - 4, SYS_ELR);
>  	}
>  
> +	/*
> +	 * Check for the conditions of Cortex-A510's #2077057. When these occur
> +	 * SPSR_EL2 can't be trusted, but isn't needed either as it is
> +	 * unchanged from the value in vcpu_gp_regs(vcpu)->pstate.
> +	 * Did we just take a PAC exception when a step exception was expected?
> +	 */
> +	if (IS_ENABLED(CONFIG_ARM64_ERRATUM_2077057) &&

nit: we can drop this IS_ENABLED()...

> +	    cpus_have_const_cap(ARM64_WORKAROUND_2077057) &&

and make this a final cap. Being an ARM64_CPUCAP_LOCAL_CPU_ERRATUM, we
won't accept late CPUs on a system that wasn't affected until then.

> +	    ARM_EXCEPTION_CODE(*exit_code) != ARM_EXCEPTION_IRQ &&
> +	    esr_ec == ESR_ELx_EC_PAC &&
> +	    vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) {
> +		/* Active-not-pending? */
> +		if (*vcpu_cpsr(vcpu) & DBG_SPSR_SS)
> +			write_sysreg_el2(*vcpu_cpsr(vcpu), SYS_SPSR);

Err... Isn't this way too late? The function starts with:

	vcpu->arch.ctxt.regs.pstate = read_sysreg_el2(SYS_SPSR);

which is just another way to write:

	*vcpu_cpsr(vcpu) = read_sysreg_el2(SYS_SPSR);

By that time, the vcpu's PSTATE is terminally corrupted.

I think you need to hoist this workaround way up, before we call into
early_exit_filter() as it will assume that the guest pstate is correct
(this is used by both pKVM and the NV code).

Something like this (untested):

diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
index 93bf140cc972..a8a1502db237 100644
--- a/arch/arm64/kvm/hyp/include/hyp/switch.h
+++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
@@ -402,6 +402,26 @@ static inline bool kvm_hyp_handle_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
 	return false;
 }
 
+static inline void synchronize_vcpu_pstate(struct kvm_vcpu *vcpu,
+					   u64 *exit_code)
+{
+	/*
+	 * Check for the conditions of Cortex-A510's #2077057. When these occur
+	 * SPSR_EL2 can't be trusted, but isn't needed either as it is
+	 * unchanged from the value in vcpu_gp_regs(vcpu)->pstate.
+	 * Did we just take a PAC exception when a step exception (being in the
+	 * Active-not-pending state) was expected?
+	 */
+	if (cpus_have_final_cap(ARM64_WORKAROUND_2077057)	&&
+	    ARM_EXCEPTION_CODE(*exit_code) != ARM_EXCEPTION_IRQ	&&
+	    kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_PAC	&&
+	    vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP		&&
+	    *vcpu_cpsr(vcpu) & DBG_SPSR_SS)
+		write_sysreg_el2(*vcpu_cpsr(vcpu), SYS_SPSR);
+
+	*vcpu_cpsr(vcpu) = read_sysreg_el2(SYS_SPSR);
+}
+
 /*
  * Return true when we were able to fixup the guest exit and should return to
  * the guest, false when we should restore the host state and return to the
@@ -415,7 +435,7 @@ static inline bool fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
 	 * Save PSTATE early so that we can evaluate the vcpu mode
 	 * early on.
 	 */
-	vcpu->arch.ctxt.regs.pstate = read_sysreg_el2(SYS_SPSR);
+	synchronize_vcpu_pstate(vcpu, exit_code);
 
 	/*
 	 * Check whether we want to repaint the state one way or
@@ -442,22 +462,6 @@ static inline bool fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
 			write_sysreg_el2(read_sysreg_el2(SYS_ELR) - 4, SYS_ELR);
 	}
 
-	/*
-	 * Check for the conditions of Cortex-A510's #2077057. When these occur
-	 * SPSR_EL2 can't be trusted, but isn't needed either as it is
-	 * unchanged from the value in vcpu_gp_regs(vcpu)->pstate.
-	 * Did we just take a PAC exception when a step exception was expected?
-	 */
-	if (IS_ENABLED(CONFIG_ARM64_ERRATUM_2077057) &&
-	    cpus_have_const_cap(ARM64_WORKAROUND_2077057) &&
-	    ARM_EXCEPTION_CODE(*exit_code) != ARM_EXCEPTION_IRQ &&
-	    esr_ec == ESR_ELx_EC_PAC &&
-	    vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) {
-		/* Active-not-pending? */
-		if (*vcpu_cpsr(vcpu) & DBG_SPSR_SS)
-			write_sysreg_el2(*vcpu_cpsr(vcpu), SYS_SPSR);
-	}
-
 	/*
 	 * We're using the raw exception code in order to only process
 	 * the trap if no SError is pending. We will come back to the

> +	}
> +
>  	/*
>  	 * We're using the raw exception code in order to only process
>  	 * the trap if no SError is pending. We will come back to the
> diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
> index 870c39537dd0..2e7cd3fecca6 100644
> --- a/arch/arm64/tools/cpucaps
> +++ b/arch/arm64/tools/cpucaps
> @@ -55,6 +55,7 @@ WORKAROUND_1418040
>  WORKAROUND_1463225
>  WORKAROUND_1508412
>  WORKAROUND_1542419
> +WORKAROUND_2077057
>  WORKAROUND_TRBE_OVERWRITE_FILL_MODE
>  WORKAROUND_TSB_FLUSH_FAILURE
>  WORKAROUND_TRBE_WRITE_OUT_OF_RANGE

Other than that, I'm happy to take the series as a whole ASAP, if only
for the two pretty embarrassing bug fixes. If you can respin it
shortly and address the comments above, I'd like it into -rc2.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/4] KVM: arm64: Workaround Cortex-A510's single-step and PAC trap errata
  2022-01-25 16:51     ` Marc Zyngier
@ 2022-01-25 18:19       ` James Morse
  -1 siblings, 0 replies; 18+ messages in thread
From: James Morse @ 2022-01-25 18:19 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Catalin Marinas, Anshuman Khandual, Will Deacon, kvmarm,
	linux-arm-kernel

Hi Marc,

On 25/01/2022 16:51, Marc Zyngier wrote:
> On Tue, 25 Jan 2022 15:38:03 +0000,
> James Morse <james.morse@arm.com> wrote:
>>
>> Cortex-A510's erratum #2077057 causes SPSR_EL2 to be corrupted when
>> single-stepping authenticated ERET instructions. A single step is
>> expected, but a pointer authentication trap is taken instead. The
>> erratum causes SPSR_EL1 to be copied to SPSR_EL2, which could allow
>> EL1 to cause a return to EL2 with a guest controlled ELR_EL2.
>>
>> Because the conditions require an ERET into active-not-pending state,
>> this is only a problem for the EL2 when EL2 is stepping EL1. In this case
>> the previous SPSR_EL2 value is preserved in struct kvm_vcpu, and can be
>> restored.

> Urgh. That's pretty nasty :-(.

>> diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
>> index 331dd10821df..93bf140cc972 100644
>> --- a/arch/arm64/kvm/hyp/include/hyp/switch.h
>> +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
>> @@ -440,6 +442,22 @@ static inline bool fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
>>  			write_sysreg_el2(read_sysreg_el2(SYS_ELR) - 4, SYS_ELR);
>>  	}
>>  
>> +	/*
>> +	 * Check for the conditions of Cortex-A510's #2077057. When these occur
>> +	 * SPSR_EL2 can't be trusted, but isn't needed either as it is
>> +	 * unchanged from the value in vcpu_gp_regs(vcpu)->pstate.
>> +	 * Did we just take a PAC exception when a step exception was expected?
>> +	 */
>> +	if (IS_ENABLED(CONFIG_ARM64_ERRATUM_2077057) &&
> 
> nit: we can drop this IS_ENABLED()...

Hmmm, I thought dead code elimination was a good thing!
Without the cpu_errata.c match (which is also guarded by #ifdef), the cap will never be
true, even if an affected CPU showed up. This way the compiler knows it can remove all this.
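
To illustrate with a made-up CONFIG_FOO and cap: when the option is =n,
IS_ENABLED() folds to a compile-time 0, so the whole block (cap check
included) becomes dead code the compiler can drop:

  if (IS_ENABLED(CONFIG_FOO) &&                  /* constant 0 when CONFIG_FOO=n */
      cpus_have_final_cap(ARM64_WORKAROUND_FOO) &&
      other_conditions(vcpu))
          do_workaround(vcpu);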


>> +	    cpus_have_const_cap(ARM64_WORKAROUND_2077057) &&
> 
> and make this a final cap. Being a ARM64_CPUCAP_LOCAL_CPU_ERRATUM, we
> won't accept late CPUs on a system that wasn't affected until then.
> 
>> +	    ARM_EXCEPTION_CODE(*exit_code) != ARM_EXCEPTION_IRQ &&
>> +	    esr_ec == ESR_ELx_EC_PAC &&
>> +	    vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) {
>> +		/* Active-not-pending? */
>> +		if (*vcpu_cpsr(vcpu) & DBG_SPSR_SS)
>> +			write_sysreg_el2(*vcpu_cpsr(vcpu), SYS_SPSR);
> 
> Err... Isn't this way too late? The function starts with:
> 
> 	vcpu->arch.ctxt.regs.pstate = read_sysreg_el2(SYS_SPSR);
> 
> which is just another way to write:
> 
> 	*vcpu_cpsr(vcpu) = read_sysreg_el2(SYS_SPSR);
> 
> By that time, the vcpu's PSTATE is terminally corrupted.

Yes - bother. Staring at it didn't let me spot that!
I can hit the conditions to test this, but the model doesn't have the imagination to
actually corrupt the SPSR.


> I think you need to hoist this workaround way up, before we call into
> early_exit_filter() as it will assume that the guest pstate is correct
> (this is used by both pKVM and the NV code).
> 
> Something like this (untested):
> 
> diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
> index 93bf140cc972..a8a1502db237 100644
> --- a/arch/arm64/kvm/hyp/include/hyp/switch.h
> +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
> @@ -402,6 +402,26 @@ static inline bool kvm_hyp_handle_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
>  	return false;
>  }
>  
> +static inline void synchronize_vcpu_pstate(struct kvm_vcpu *vcpu,
> +					   u64 *exit_code)
> +{
> +	/*
> +	 * Check for the conditions of Cortex-A510's #2077057. When these occur
> +	 * SPSR_EL2 can't be trusted, but isn't needed either as it is
> +	 * unchanged from the value in vcpu_gp_regs(vcpu)->pstate.
> +	 * Did we just take a PAC exception when a step exception (being in the
> +	 * Active-not-pending state) was expected?
> +	 */
> +	if (cpus_have_final_cap(ARM64_WORKAROUND_2077057)	&&
> +	    ARM_EXCEPTION_CODE(*exit_code) != ARM_EXCEPTION_IRQ	&&

> +	    kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_PAC	&&

The vcpu's esr_el2 isn't yet set:
| ESR_ELx_EC(read_sysreg_el2(SYS_ESR)) == ESR_ELx_EC_PAC

(and I'll shuffle the order so this is last, as it's an extra sysreg read)
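
So after hoisting, the check ends up something like this (untested sketch of the
ordering only - the real thing will be in v2):

|	if (cpus_have_final_cap(ARM64_WORKAROUND_2077057) &&
|	    ARM_EXCEPTION_CODE(*exit_code) != ARM_EXCEPTION_IRQ &&
|	    vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP &&
|	    *vcpu_cpsr(vcpu) & DBG_SPSR_SS &&
|	    ESR_ELx_EC(read_sysreg_el2(SYS_ESR)) == ESR_ELx_EC_PAC)
|		write_sysreg_el2(*vcpu_cpsr(vcpu), SYS_SPSR);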


> +	    vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP		&&
> +	    *vcpu_cpsr(vcpu) & DBG_SPSR_SS)
> +		write_sysreg_el2(*vcpu_cpsr(vcpu), SYS_SPSR);
> +
> +	*vcpu_cpsr(vcpu) = read_sysreg_el2(SYS_SPSR);
> +}
> +
>  /*
>   * Return true when we were able to fixup the guest exit and should return to
>   * the guest, false when we should restore the host state and return to the
> @@ -415,7 +435,7 @@ static inline bool fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
>  	 * Save PSTATE early so that we can evaluate the vcpu mode
>  	 * early on.
>  	 */
> -	vcpu->arch.ctxt.regs.pstate = read_sysreg_el2(SYS_SPSR);
> +	synchronize_vcpu_pstate(vcpu, exit_code);

Even better, that saves the noise from moving esr_ec around!


> Other than that, I'm happy to take the series as a whole ASAP, if only
> for the two pretty embarrassing bug fixes. If you can respin it
> shortly and address the comments above, I'd like it into -rc2.

Will do. Shout if you strongly care about the IS_ENABLED().


Thanks,

James

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/4] KVM: arm64: Workaround Cortex-A510's single-step and PAC trap errata
  2022-01-25 18:19       ` James Morse
@ 2022-01-25 18:36         ` Marc Zyngier
  -1 siblings, 0 replies; 18+ messages in thread
From: Marc Zyngier @ 2022-01-25 18:36 UTC (permalink / raw)
  To: James Morse
  Cc: Catalin Marinas, Anshuman Khandual, Will Deacon, kvmarm,
	linux-arm-kernel

On Tue, 25 Jan 2022 18:19:45 +0000,
James Morse <james.morse@arm.com> wrote:
> 
> Hi Marc,
> 
> On 25/01/2022 16:51, Marc Zyngier wrote:
> > On Tue, 25 Jan 2022 15:38:03 +0000,
> > James Morse <james.morse@arm.com> wrote:
> >>
> >> Cortex-A510's erratum #2077057 causes SPSR_EL2 to be corrupted when
> >> single-stepping authenticated ERET instructions. A single step is
> >> expected, but a pointer authentication trap is taken instead. The
> >> erratum causes SPSR_EL1 to be copied to SPSR_EL2, which could allow
> >> EL1 to cause a return to EL2 with a guest controlled ELR_EL2.
> >>
> >> Because the conditions require an ERET into active-not-pending state,
> >> this is only a problem for the EL2 when EL2 is stepping EL1. In this case
> >> the previous SPSR_EL2 value is preserved in struct kvm_vcpu, and can be
> >> restored.
> 
> > Urgh. That's pretty nasty :-(.
> 
> >> diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
> >> index 331dd10821df..93bf140cc972 100644
> >> --- a/arch/arm64/kvm/hyp/include/hyp/switch.h
> >> +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
> >> @@ -440,6 +442,22 @@ static inline bool fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
> >>  			write_sysreg_el2(read_sysreg_el2(SYS_ELR) - 4, SYS_ELR);
> >>  	}
> >>  
> >> +	/*
> >> +	 * Check for the conditions of Cortex-A510's #2077057. When these occur
> >> +	 * SPSR_EL2 can't be trusted, but isn't needed either as it is
> >> +	 * unchanged from the value in vcpu_gp_regs(vcpu)->pstate.
> >> +	 * Did we just take a PAC exception when a step exception was expected?
> >> +	 */
> >> +	if (IS_ENABLED(CONFIG_ARM64_ERRATUM_2077057) &&
> > 
> > nit: we can drop this IS_ENABLED()...
> 
> Hmmm, I thought dead code elimination was a good thing!
> Without the cpu_errata.c match, (which is also guarded by #ifdef),
> the cap will never be true, even if an affected CPU showed up. This
> way the compiler knows it can remove all this.

I don't dispute that. However, experience shows that the more of these
we sprinkle around, the quicker this code bitrots, as we don't build
for all the possible configurations. In general, we hardly have any
dependencies on configuration symbols, and we rely on static keys to keep
things from being terribly sucky.
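
(Not that the cap check is expensive anyway: once the capabilities are finalised,
something like

	if (cpus_have_final_cap(ARM64_WORKAROUND_2077057))

boils down to a branch patched in via the static key machinery, so an unaffected
system pays practically nothing for having the code always built in.)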

> 
> 
> >> +	    cpus_have_const_cap(ARM64_WORKAROUND_2077057) &&
> > 
> > and make this a final cap. Being a ARM64_CPUCAP_LOCAL_CPU_ERRATUM, we
> > won't accept late CPUs on a system that wasn't affected until then.
> > 
> >> +	    ARM_EXCEPTION_CODE(*exit_code) != ARM_EXCEPTION_IRQ &&
> >> +	    esr_ec == ESR_ELx_EC_PAC &&
> >> +	    vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) {
> >> +		/* Active-not-pending? */
> >> +		if (*vcpu_cpsr(vcpu) & DBG_SPSR_SS)
> >> +			write_sysreg_el2(*vcpu_cpsr(vcpu), SYS_SPSR);
> > 
> > Err... Isn't this way too late? The function starts with:
> > 
> > 	vcpu->arch.ctxt.regs.pstate = read_sysreg_el2(SYS_SPSR);
> > 
> > which is just another way to write:
> > 
> > 	*vcpu_cpsr(vcpu) = read_sysreg_el2(SYS_SPSR);
> > 
> > By that time, the vcpu's PSTATE is terminally corrupted.
> 
> Yes - bother. Staring at it didn't let me spot that!
> I can hit the conditions to test this, but the model doesn't have the imagination to
> actually corrupt the SPSR.

Bad model. ;-)

> 
> 
> > I think you need to hoist this workaround way up, before we call into
> > early_exit_filter() as it will assume that the guest pstate is correct
> > (this is used by both pKVM and the NV code).
> > 
> > Something like this (untested):
> > 
> > diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
> > index 93bf140cc972..a8a1502db237 100644
> > --- a/arch/arm64/kvm/hyp/include/hyp/switch.h
> > +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
> > @@ -402,6 +402,26 @@ static inline bool kvm_hyp_handle_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
> >  	return false;
> >  }
> >  
> > +static inline void synchronize_vcpu_pstate(struct kvm_vcpu *vcpu,
> > +					   u64 *exit_code)
> > +{
> > +	/*
> > +	 * Check for the conditions of Cortex-A510's #2077057. When these occur
> > +	 * SPSR_EL2 can't be trusted, but isn't needed either as it is
> > +	 * unchanged from the value in vcpu_gp_regs(vcpu)->pstate.
> > +	 * Did we just take a PAC exception when a step exception (being in the
> > +	 * Active-not-pending state) was expected?
> > +	 */
> > +	if (cpus_have_final_cap(ARM64_WORKAROUND_2077057)	&&
> > +	    ARM_EXCEPTION_CODE(*exit_code) != ARM_EXCEPTION_IRQ	&&
> 
> > +	    kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_PAC	&&
> 
> The vcpu's esr_el2 isn't yet set:
> | ESR_ELx_EC(read_sysreg_el2(SYS_ESR)) == ESR_ELx_EC_PAC

Ah, well spotted!

> 
> (and I'll shuffle the order so this is last, as it's an extra sysreg read)
> 
> 
> > +	    vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP		&&
> > +	    *vcpu_cpsr(vcpu) & DBG_SPSR_SS)
> > +		write_sysreg_el2(*vcpu_cpsr(vcpu), SYS_SPSR);
> > +
> > +	*vcpu_cpsr(vcpu) = read_sysreg_el2(SYS_SPSR);
> > +}
> > +
> >  /*
> >   * Return true when we were able to fixup the guest exit and should return to
> >   * the guest, false when we should restore the host state and return to the
> > @@ -415,7 +435,7 @@ static inline bool fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
> >  	 * Save PSTATE early so that we can evaluate the vcpu mode
> >  	 * early on.
> >  	 */
> > -	vcpu->arch.ctxt.regs.pstate = read_sysreg_el2(SYS_SPSR);
> > +	synchronize_vcpu_pstate(vcpu, exit_code);
> 
> Even better, that saves the noise from moving esr_ec around!

Yup, the patch becomes marginally smaller.

> 
> 
> > Other than that, I'm happy to take the series as a whole ASAP, if only
> > for the two pretty embarrassing bug fixes. If you can respin it
> > shortly and address the comments above, I'd like it into -rc2.
> 
> Will do. Shout if you strongly care about the IS_ENABLED().

I'd really like it gone. Otherwise, I'll never compile that code.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/4] KVM: arm64: Workaround Cortex-A510's single-step and PAC trap errata
  2022-01-25 18:36         ` Marc Zyngier
@ 2022-01-26 16:49           ` James Morse
  -1 siblings, 0 replies; 18+ messages in thread
From: James Morse @ 2022-01-26 16:49 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Catalin Marinas, Anshuman Khandual, Will Deacon, kvmarm,
	linux-arm-kernel

Hi Marc,

On 25/01/2022 18:36, Marc Zyngier wrote:
> On Tue, 25 Jan 2022 18:19:45 +0000,
> James Morse <james.morse@arm.com> wrote:
>> On 25/01/2022 16:51, Marc Zyngier wrote:
>>> On Tue, 25 Jan 2022 15:38:03 +0000,
>>> James Morse <james.morse@arm.com> wrote:
>>>>
>>>> Cortex-A510's erratum #2077057 causes SPSR_EL2 to be corrupted when
>>>> single-stepping authenticated ERET instructions. A single step is
>>>> expected, but a pointer authentication trap is taken instead. The
>>>> erratum causes SPSR_EL1 to be copied to SPSR_EL2, which could allow
>>>> EL1 to cause a return to EL2 with a guest controlled ELR_EL2.
>>>>
>>>> Because the conditions require an ERET into active-not-pending state,
>>>> this is only a problem for the EL2 when EL2 is stepping EL1. In this case
>>>> the previous SPSR_EL2 value is preserved in struct kvm_vcpu, and can be
>>>> restored.

>>>> diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
>>>> index 331dd10821df..93bf140cc972 100644
>>>> --- a/arch/arm64/kvm/hyp/include/hyp/switch.h
>>>> +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
>>>> @@ -440,6 +442,22 @@ static inline bool fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
>>>>  			write_sysreg_el2(read_sysreg_el2(SYS_ELR) - 4, SYS_ELR);
>>>>  	}
>>>>  
>>>> +	/*
>>>> +	 * Check for the conditions of Cortex-A510's #2077057. When these occur
>>>> +	 * SPSR_EL2 can't be trusted, but isn't needed either as it is
>>>> +	 * unchanged from the value in vcpu_gp_regs(vcpu)->pstate.
>>>> +	 * Did we just take a PAC exception when a step exception was expected?
>>>> +	 */
>>>> +	if (IS_ENABLED(CONFIG_ARM64_ERRATUM_2077057) &&
>>>
>>> nit: we can drop this IS_ENABLED()...
>>
>> Hmmm, I thought dead code elimination was a good thing!
>> Without the cpu_errata.c match, (which is also guarded by #ifdef),
>> the cap will never be true, even if an affected CPU showed up. This
>> way the compiler knows it can remove all this.

> I don't dispute that. However, experience shows that the more of these
> we sprinkle around, the quicker this code bitrots, as we don't build
> for all the possible configurations. In general, we hardly have any
> dependencies on configuration symbols, and we rely on static keys to keep
> things from being terribly sucky.

I'll remove it - but I'm not convinced:

If this were in an #ifdef block, I agree that would prevent the compiler from even seeing
the code: it would certainly bitrot if that Kconfig isn't selected, and it would not be
build-tested properly in all the bizarre configurations that only CI systems try.
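
With a plain #ifdef the block isn't even type-checked when the option is off. As a
made-up example:

| #ifdef CONFIG_ARM64_ERRATUM_2077057
| 	synchronize_vcpu_pstate(42);	/* bogus call, never diagnosed */
| #endif

compiles silently with the option disabled, because the preprocessor throws the block
away before the compiler ever parses it.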

But the IS_ENABLED() stuff lets the compiler see inside that block, which means it can
still check the types etc. before eventually removing it all, as the condition becomes
"if (false)".

For example:
|	if (IS_ENABLED(CONFIG_ARM64_ERRATUM_2077057))
|		synchronize_vcpu_pstate(42);

This fails to compile regardless of the value of the Kconfig symbol:
| In file included from arch/arm64/kvm/hyp/vhe/switch.c:7:
| ./arch/arm64/kvm/hyp/include/hyp/switch.h: In function ‘synchronize_vcpu_pstate’:
| ./arch/arm64/kvm/hyp/include/hyp/switch.h:415:27: warning: passing argument 1 of
| ‘synchronize_vcpu_pstate’ makes pointer from integer without a cast [-Wint-conversion]
|   415 |   synchronize_vcpu_pstate(42);
|
| ./arch/arm64/kvm/hyp/include/hyp/switch.h:405:61: note: expected ‘struct kvm_vcpu *’ but
| argument is of type ‘int’
|  405 | static inline void synchronize_vcpu_pstate(struct kvm_vcpu *vcpu, u64 *exit_code)

etc.

| cat .config | grep 2077057
| # CONFIG_ARM64_ERRATUM_2077057 is not set


I guess you're worried about link errors.

[..]

>>> Other than that, I'm happy to take the series as a whole ASAP, if only
>>> for the two pretty embarrassing bug fixes. If you can respin it
>>> shortly and address the comments above, I'd like it into -rc2.
>>
>> Will do. Shout if you strongly care about the IS_ENABLED().
> 
> I'd really like it gone. Otherwise, I'll never compile that code.


Thanks,

James

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2022-01-26 17:29 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-25 15:37 [PATCH 0/4] KVM: arm64: A510 errata workaround and fixes for fixup_guest_exit() James Morse
2022-01-25 15:37 ` James Morse
2022-01-25 15:38 ` [PATCH 1/4] arm64: Add Cortex-A510 CPU part definition James Morse
2022-01-25 15:38   ` James Morse
2022-01-25 15:38 ` [PATCH 2/4] KVM: arm64: Avoid consuming a stale esr value when SError occur James Morse
2022-01-25 15:38   ` James Morse
2022-01-25 15:38 ` [PATCH 3/4] KVM: arm64: Stop handle_exit() from handling HVC twice when an SError occurs James Morse
2022-01-25 15:38   ` James Morse
2022-01-25 15:38 ` [PATCH 4/4] KVM: arm64: Workaround Cortex-A510's single-step and PAC trap errata James Morse
2022-01-25 15:38   ` James Morse
2022-01-25 16:51   ` Marc Zyngier
2022-01-25 16:51     ` Marc Zyngier
2022-01-25 18:19     ` James Morse
2022-01-25 18:19       ` James Morse
2022-01-25 18:36       ` Marc Zyngier
2022-01-25 18:36         ` Marc Zyngier
2022-01-26 16:49         ` James Morse
2022-01-26 16:49           ` James Morse
